I think you need to add reset_index, and then use sort_values with ascending=False instead of sort, because sort returns:
FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
So sort with:
.sort_values(['count'], ascending=False)
df = (df[['STNAME', 'CTYNAME']]
      .groupby(['STNAME'])['CTYNAME']
      .count()
      .reset_index(name='count')
      .sort_values(['count'], ascending=False)
      .head(5))
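To see why reset_index is needed, here is a minimal step-by-step sketch on the sample data from this answer. The intermediate names counts, counts_df and top5 are only for illustration. groupby(...).count() returns a Series, and reset_index(name='count') turns it into a DataFrame whose values column is called count. That column name is what sort_values(['count'], ...) then refers to.

```python
import pandas as pd

# Sample data from this answer
df = pd.DataFrame({'STNAME': list('abscscbcdbcsscae'),
                   'CTYNAME': [4, 5, 6, 5, 6, 2, 3, 4, 5, 6, 4, 5, 4, 3, 6, 5]})

# Step 1: groupby + count gives a Series indexed by STNAME
counts = df[['STNAME', 'CTYNAME']].groupby(['STNAME'])['CTYNAME'].count()
print(type(counts).__name__)        # Series

# Step 2: reset_index(name='count') turns the Series into a DataFrame
# and names the values column 'count'
counts_df = counts.reset_index(name='count')
print(counts_df.columns.tolist())   # ['STNAME', 'count']

# Step 3: sort by that column in descending order and keep five rows
top5 = counts_df.sort_values(['count'], ascending=False).head(5)
print(top5)
```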
Sample:
import pandas as pd
df = pd.DataFrame({'STNAME': list('abscscbcdbcsscae'),
                   'CTYNAME': [4, 5, 6, 5, 6, 2, 3, 4, 5, 6, 4, 5, 4, 3, 6, 5]})
print(df)
    CTYNAME STNAME
0         4      a
1         5      b
2         6      s
3         5      c
4         6      s
5         2      c
6         3      b
7         4      c
8         5      d
9         6      b
10        4      c
11        5      s
12        4      s
13        3      c
14        6      a
15        5      e
df = (df[['STNAME', 'CTYNAME']]
      .groupby(['STNAME'])['CTYNAME']
      .count()
      .reset_index(name='count')
      .sort_values(['count'], ascending=False)
      .head(5))
print(df)
  STNAME  count
2      c      5
5      s      4
1      b      3
0      a      2
3      d      1
But it seems you actually need Series.nlargest:
df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].count().nlargest(5)
Or:
df = df[['STNAME','CTYNAME']].groupby(['STNAME'])['CTYNAME'].size().nlargest(5)
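nlargest(5) returns the same values as a full descending sort followed by head(5); the pandas documentation describes nlargest as faster than sort_values(ascending=False).head(n) when n is small relative to the Series. A small check on the sample data from this answer, assuming a reasonably recent pandas version (the names via_nlargest and via_sort are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'STNAME': list('abscscbcdbcsscae'),
                   'CTYNAME': [4, 5, 6, 5, 6, 2, 3, 4, 5, 6, 4, 5, 4, 3, 6, 5]})

counts = df[['STNAME', 'CTYNAME']].groupby(['STNAME'])['CTYNAME'].count()

# nlargest returns the 5 largest values, already in descending order
via_nlargest = counts.nlargest(5)

# The same values via a full descending sort followed by head(5)
via_sort = counts.sort_values(ascending=False).head(5)

print(via_nlargest.tolist())  # [5, 4, 3, 2, 1]
print(via_sort.tolist())      # [5, 4, 3, 2, 1]
```

Both keep STNAME as the index; add .reset_index(name=...) if you need a DataFrame.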
The difference between size and count is that size counts NaN values, while count does not.
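A minimal sketch of that difference, using hypothetical data in which state x has one county whose CTYNAME is missing (NaN):

```python
import numpy as np
import pandas as pd

# Hypothetical data: one CTYNAME value for state 'x' is missing
df = pd.DataFrame({'STNAME': ['x', 'x', 'y'],
                   'CTYNAME': ['A', np.nan, 'B']})

g = df.groupby('STNAME')['CTYNAME']

print(g.size().tolist())   # [2, 1]  size includes the NaN row
print(g.count().tolist())  # [1, 1]  count skips the NaN row
```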
Sample:
import pandas as pd
df = pd.DataFrame({'STNAME': list('abscscbcdbcsscae'),
                   'CTYNAME': [4, 5, 6, 5, 6, 2, 3, 4, 5, 6, 4, 5, 4, 3, 6, 5]})
print(df)
    CTYNAME STNAME
0         4      a
1         5      b
2         6      s
3         5      c
4         6      s
5         2      c
6         3      b
7         4      c
8         5      d
9         6      b
10        4      c
11        5      s
12        4      s
13        3      c
14        6      a
15        5      e
df = (df[['STNAME', 'CTYNAME']]
      .groupby(['STNAME'])['CTYNAME']
      .size()
      .nlargest(5)
      .reset_index(name='top5'))
print(df)
  STNAME  top5
0      c     5
1      s     4
2      b     3
3      a     2
4      d     1
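In the sample, d and e both have a count of 1, so only one of them fits in the top 5. With the default keep='first', nlargest keeps the tied label that comes first in the index. With keep='all' (available in newer pandas versions), it returns every tied label, so the result can be longer than n. A sketch on the same sample data (the name sizes is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'STNAME': list('abscscbcdbcsscae'),
                   'CTYNAME': [4, 5, 6, 5, 6, 2, 3, 4, 5, 6, 4, 5, 4, 3, 6, 5]})

sizes = df[['STNAME', 'CTYNAME']].groupby(['STNAME'])['CTYNAME'].size()

# keep='first' (default): of the tied 'd' and 'e', only 'd' is kept
print(sizes.nlargest(5).index.tolist())              # ['c', 's', 'b', 'a', 'd']

# keep='all': all labels tied with the smallest kept value are returned
print(sizes.nlargest(5, keep='all').index.tolist())  # ['c', 's', 'b', 'a', 'd', 'e']
```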