from pandas import DataFrame, to_datetime

data = [('product_a', '08/31/2013'), ('product_b', '08/31/2013'), ('product_c', '08/31/2013'),
        ('product_a', '09/30/2013'), ('product_b', '09/30/2013'), ('product_c', '09/30/2013'),
        ('product_a', '10/31/2013'), ('product_b', '10/31/2013'), ('product_c', '10/31/2013')]
product_df = DataFrame(data, columns=['prod_desc', 'activity_month'])

# Parse the MM/DD/YYYY strings and store them back as YYYY-MM-DD strings
product_df['activity_month'] = to_datetime(
    product_df['activity_month'], format='%m/%d/%Y').dt.strftime('%Y-%m-%d')

# DataFrame.sort was removed from pandas; sort_values replaces it
product_df = product_df.sort_values(['prod_desc', 'activity_month'])

# Number each product's months (this is the step that produces NaN)
product_df['month_num'] = product_df.groupby(['prod_desc']).size()
However, this gives NaN for month_num.
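The NaN values come from index alignment: groupby(...).size() returns a Series indexed by the group keys (product_a, product_b, ...), while the DataFrame keeps its integer row index, so assigning that Series as a column matches no labels. A minimal sketch on a made-up three-row frame (the data is hypothetical, chosen only to show the effect):

```python
import pandas as pd

# Hypothetical three-row frame, just enough to show the effect
df = pd.DataFrame({
    "prod_desc": ["product_a", "product_b", "product_a"],
    "activity_month": ["2013-08-31", "2013-08-31", "2013-09-30"],
})

sizes = df.groupby("prod_desc").size()
# sizes is indexed by product name, not by row position
print(sizes.index.tolist())  # ['product_a', 'product_b']

# Column assignment aligns on the index labels 0, 1, 2, none of which
# appear in sizes.index, so every value becomes NaN
df["month_num"] = sizes
print(df["month_num"].isna().all())  # True
```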
This is what I want:
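prod_desc  activity_month  month_num
product_a  2013-08-31      1
product_a  2013-09-30      2
product_a  2013-10-31      3
product_b  2013-08-31      1
product_b  2013-09-30      2
product_b  2013-10-31      3
product_c  2013-08-31      1
product_c  2013-09-30      2
product_c  2013-10-31      3

Solution

groupby is the right idea, but the right method is cumcount. The session below was run on a frame whose grouping column is named product_desc (and which has extra prod_count and pct_ch columns); with the question's frame, group by 'prod_desc' instead: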
>>> product_df['month_num'] = product_df.groupby('product_desc').cumcount()
>>> product_df
  product_desc  activity_month  prod_count     pct_ch  month_num
0    product_a      2014-01-01          53        NaN          0
3    product_a      2014-02-01          52  -0.018868          1
6    product_a      2014-03-01          50  -0.038462          2
1    product_b      2014-01-01          44        NaN          0
4    product_b      2014-02-01          43  -0.022727          1
7    product_b      2014-03-01          41  -0.046512          2
2    product_c      2014-01-01          36        NaN          0
5    product_c      2014-02-01          35  -0.027778          1
8    product_c      2014-03-01          34  -0.028571          2
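cumcount numbers the rows of each group 0, 1, 2, ... in the order they currently appear in the frame, so the rows should be sorted by month before numbering. A minimal sketch on a hypothetical frame whose rows are deliberately out of date order:

```python
import pandas as pd

# Hypothetical frame with product_a's months out of order
df = pd.DataFrame({
    "prod_desc": ["product_a", "product_a", "product_b", "product_a"],
    "activity_month": ["2013-10-31", "2013-08-31", "2013-08-31", "2013-09-30"],
})

# Without sorting, numbering follows the existing row order
unsorted_nums = df.groupby("prod_desc").cumcount()
print(unsorted_nums.tolist())  # [0, 1, 0, 2]

# Sorting first makes the numbering follow the months
df = df.sort_values(["prod_desc", "activity_month"])
df["month_num"] = df.groupby("prod_desc").cumcount()
print(df)
```

Unlike size(), cumcount returns a Series with the same index as the frame, so the assignment lines up row by row instead of producing NaN.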
If you really want it to start at 1, do this:
>>> product_df['month_num'] = product_df.groupby('product_desc').cumcount() + 1
>>> product_df
  product_desc  activity_month  prod_count     pct_ch  month_num
0    product_a      2014-01-01          53        NaN          1
3    product_a      2014-02-01          52  -0.018868          2
6    product_a      2014-03-01          50  -0.038462          3
1    product_b      2014-01-01          44        NaN          1
4    product_b      2014-02-01          43  -0.022727          2
7    product_b      2014-03-01          41  -0.046512          3
2    product_c      2014-01-01          36        NaN          1
5    product_c      2014-02-01          35  -0.027778          2
8    product_c      2014-03-01          34  -0.028571          3
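Putting it together with the question's own column names (prod_desc, activity_month), the full pipeline might be sketched as follows. It assumes a pandas version with sort_values (DataFrame.sort no longer exists) and recreates the nine-row data set implied by the desired output; it is one reasonable arrangement, not the only one.

```python
import pandas as pd

# Nine rows: three products, three month-end dates each
data = [
    ("product_a", "08/31/2013"), ("product_b", "08/31/2013"), ("product_c", "08/31/2013"),
    ("product_a", "09/30/2013"), ("product_b", "09/30/2013"), ("product_c", "09/30/2013"),
    ("product_a", "10/31/2013"), ("product_b", "10/31/2013"), ("product_c", "10/31/2013"),
]
product_df = pd.DataFrame(data, columns=["prod_desc", "activity_month"])

# Parse MM/DD/YYYY strings into real datetimes so sorting is chronological
product_df["activity_month"] = pd.to_datetime(product_df["activity_month"], format="%m/%d/%Y")

# Order rows by product, then by month, before numbering them
product_df = product_df.sort_values(["prod_desc", "activity_month"]).reset_index(drop=True)

# cumcount is 0-based, so add 1 to start each product's months at 1
product_df["month_num"] = product_df.groupby("prod_desc").cumcount() + 1
print(product_df)
```

Keeping activity_month as datetime64 rather than formatted strings means the sort is chronological regardless of string format; if YYYY-MM-DD strings are needed for display, .dt.strftime('%Y-%m-%d') can be applied after numbering.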
Feel free to share; please credit the source when reposting: 内存溢出