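To return a DataFrame after groupby, there are two possible solutions.
The first is the parameter as_index=False, which works nicely with the count, sum and mean functions.
The second is the function reset_index, which creates new columns from the levels of the index and is the more general solution.

df = ttm.groupby(['clienthostid'], as_index=False, sort=False)['LoginDaysSum'].count()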
print (df)
   clienthostid  LoginDaysSum
0             1             4
1             3             2

df = ttm.groupby(['clienthostid'], sort=False)['LoginDaysSum'].count().reset_index()
print (df)
   clienthostid  LoginDaysSum
0             1             4
1             3             2
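For anyone who wants to run the two variants side by side, here is a minimal self-contained sketch. The sample frame is an assumption: it is reconstructed from the printed ttm shown further down in this answer, with usersidid still numeric.

```python
import pandas as pd

# Sample data reconstructed from the printed frame in this answer (an assumption).
ttm = pd.DataFrame({
    "usersidid": [12, 11, 5, 6, 10, 8],
    "clienthostid": [1, 1, 1, 1, 3, 3],
    "eventSumTotal": [60, 240, 5, 16, 270, 18],
    "LoginDaysSum": [3, 3, 3, 2, 3, 2],
    "score": [1728, 1331, 125, 216, 1000, 512],
})

# Variant 1: keep the group key as a regular column via as_index=False.
by_param = ttm.groupby(["clienthostid"], as_index=False, sort=False)["LoginDaysSum"].count()

# Variant 2: aggregate to a Series, then move the index level into a column.
by_reset = ttm.groupby(["clienthostid"], sort=False)["LoginDaysSum"].count().reset_index()

print(by_param)
print(by_reset)
```

Both variants should give a DataFrame with the columns clienthostid and LoginDaysSum and a default integer index.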
For the second requirement, remove as_index=False and add reset_index instead:
#output is `Series`
a = ttm.groupby(['clienthostid'], sort=False)['LoginDaysSum'].apply(lambda x: x.iloc[0] / x.iloc[1])
print (a)
clienthostid
1    1.0
3    1.5
Name: LoginDaysSum, dtype: float64

print (type(a))
<class 'pandas.core.series.Series'>

print (a.index)
Int64Index([1, 3], dtype='int64', name='clienthostid')

df1 = ttm.groupby(['clienthostid'], sort=False)['LoginDaysSum'].apply(lambda x: x.iloc[0] / x.iloc[1]).reset_index(name='ratio')
print (df1)
   clienthostid  ratio
0             1    1.0
1             3    1.5
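One caveat with x.iloc[1]: it raises an IndexError for any group that has only one row. A possible guard is sketched below. The helper name first_over_second, the extra single-row group with clienthostid 7, and the choice to return NaN are all assumptions made for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd

# Small frame with one single-row group (clienthostid 7) added for illustration.
ttm = pd.DataFrame({
    "clienthostid": [1, 1, 1, 1, 3, 3, 7],
    "LoginDaysSum": [3, 3, 3, 2, 3, 2, 5],
})

def first_over_second(x):
    """Hypothetical helper: first value divided by second, NaN if the group is too short."""
    if len(x) < 2:
        return np.nan
    return x.iloc[0] / x.iloc[1]

ratio = (
    ttm.groupby("clienthostid", sort=False)["LoginDaysSum"]
    .apply(first_over_second)
    .reset_index(name="ratio")  # name= labels the values column of the Series
)
print(ratio)
```

The name argument is what turns the unnamed values of the Series into the ratio column; without it the column would keep the name LoginDaysSum.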
Why are some columns gone?
I think the cause may be the automatic exclusion of nuisance columns:
#convert column to str
ttm.usersidid = ttm.usersidid.astype(str) + 'aa'
print (ttm)
  usersidid  clienthostid  eventSumTotal  LoginDaysSum  score
0      12aa             1             60             3   1728
1      11aa             1            240             3   1331
3       5aa             1              5             3    125
4       6aa             1             16             2    216
2      10aa             3            270             3   1000
5       8aa             3             18             2    512

#removed str column userid
a = ttm.groupby(['clienthostid'], sort=False).sum()
print (a)
              eventSumTotal  LoginDaysSum  score
clienthostid
1                       321            11   3400
3                       288             5   1512
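Whether a string column is dropped silently depends on the pandas version: the output above comes from an older release, and to my knowledge pandas 2.0 and later no longer exclude such columns automatically. A sketch that should not rely on that behaviour is to state the intent explicitly, either by selecting the numeric columns or by passing numeric_only=True. The sample frame is again reconstructed from the printout above (an assumption).

```python
import pandas as pd

# Frame reconstructed from the printed output above; usersidid is a string column.
ttm = pd.DataFrame({
    "usersidid": ["12aa", "11aa", "5aa", "6aa", "10aa", "8aa"],
    "clienthostid": [1, 1, 1, 1, 3, 3],
    "eventSumTotal": [60, 240, 5, 16, 270, 18],
    "LoginDaysSum": [3, 3, 3, 2, 3, 2],
    "score": [1728, 1331, 125, 216, 1000, 512],
})

# Option 1: select the columns to aggregate before calling sum.
numeric_cols = ["eventSumTotal", "LoginDaysSum", "score"]
explicit = ttm.groupby("clienthostid", sort=False)[numeric_cols].sum()

# Option 2: ask sum to skip non-numeric columns.
flagged = ttm.groupby("clienthostid", sort=False).sum(numeric_only=True)

print(explicit)
print(flagged)
```

Either way the result no longer depends on what the installed version does with non-numeric columns by default.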
What is the difference between size and count in pandas?
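As a quick illustration of that distinction: size reports the number of rows in each group, while count reports only the non-missing values. The frame below is a made-up example with hypothetical column names key and val.

```python
import numpy as np
import pandas as pd

# Made-up data: group "a" contains one missing value.
df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "val": [1.0, np.nan, 3.0],
})

grouped = df.groupby("key")["val"]

sizes = grouped.size()    # rows per group, NaN included
counts = grouped.count()  # non-NaN values per group

print(sizes)
print(counts)
```

For group a the two results differ because of the missing value; for group b they agree.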