I think you need groupby with sum of the NaN values:
    df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int).reset_index(name='count')
    print (df2)
         A      B  count
    0  bar    one      0
    1  bar  three      0
    2  bar    two      1
    3  foo    one      2
    4  foo  three      1
    5  foo    two      2
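For reference, here is a self-contained sketch of the same idea. The sample DataFrame is an assumption, reconstructed from the output printed in this answer, and the name nan_counts is only illustrative. isnull() returns a boolean Series, so summing it within each (A, B) group gives the number of missing values.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the output shown in this answer (an assumption).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

# True marks a missing value in C; summing the booleans per (A, B) group
# counts the NaNs, and reset_index turns the result into a regular DataFrame.
nan_counts = (
    df['C'].isnull()
      .groupby([df['A'], df['B']])
      .sum()
      .astype(int)
      .reset_index(name='count')
)
print(nan_counts)
```

Grouping by the Series df['A'] and df['B'] rather than by column names is what allows the boolean mask, which has no A or B column of its own, to be grouped.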
If you need to filter first, use boolean indexing:
    df = df[df['A'] == 'foo']
    df2 = df.C.isnull().groupby([df['A'],df['B']]).sum().astype(int)
    print (df2)
    A    B
    foo  one      2
         three    1
         two      2
Or, more simply:
    df = df[df['A'] == 'foo']
    df2 = df['B'].value_counts()
    print (df2)
    one      2
    two      2
    three    1
    Name: B, dtype: int64
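Note that value_counts counts rows per B, so it matches the NaN count only when every C in the filtered rows is missing, as it is for the foo rows here. A possible variant, sketched below under the assumption that the sample data looks like the output in this answer, adds the isnull() condition to the mask so that only missing values are counted. The names mask and foo_nan_counts are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the output shown in this answer (an assumption).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

# Keep only rows where A is 'foo' AND C is missing, then count rows per B.
mask = (df['A'] == 'foo') & df['C'].isnull()
foo_nan_counts = df.loc[mask, 'B'].value_counts()
print(foo_nan_counts)
```

Unlike the groupby version, this drops any B value that has no missing C at all instead of reporting it with a count of 0.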
EDIT: The solution is very similar, just add transform:
    df['D'] = df.C.isnull().groupby([df['A'],df['B']]).transform('sum').astype(int)
    print (df)
         A      B     C  D
    0  foo    one   NaN  2
    1  bar    one  bla2  0
    2  foo    two   NaN  2
    3  bar  three  bla3  0
    4  foo    two   NaN  2
    5  bar    two   NaN  1
    6  foo    one   NaN  2
    7  foo  three   NaN  1
A similar solution:
    df['D'] = df.C.isnull()
    df['D'] = df.groupby(['A','B'])['D'].transform('sum').astype(int)
    print (df)
         A      B     C  D
    0  foo    one   NaN  2
    1  bar    one  bla2  0
    2  foo    two   NaN  2
    3  bar  three  bla3  0
    4  foo    two   NaN  2
    5  bar    two   NaN  1
    6  foo    one   NaN  2
    7  foo  three   NaN  1
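The difference between sum and transform('sum') is the shape of the result: sum returns one row per group, while transform broadcasts the group total back to every original row, which is why it can be assigned straight to a new column. A minimal runnable sketch, again assuming sample data reconstructed from the output above:

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the output shown in this answer (an assumption).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [np.nan, 'bla2', np.nan, 'bla3', np.nan, np.nan, np.nan, np.nan],
})

is_missing = df['C'].isnull()
groups = is_missing.groupby([df['A'], df['B']])

# One value per (A, B) group.
per_group = groups.sum().astype(int)

# One value per original row, aligned with df's index.
df['D'] = groups.transform('sum').astype(int)
print(df)
```

Here per_group has 6 entries, one for each distinct (A, B) pair, while df['D'] has 8, one for each row.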