Initial DataFrame:
           dt user  val
0  2016-01-01    a    1
1  2016-01-02    a   33
2  2016-01-05    b    2
3  2016-01-06    b    1
First, convert the dt column to datetime:
import pandas as pd
x['dt'] = pd.to_datetime(x['dt'])
Then build the full daily date range and the array of unique users:
dates = x.set_index('dt').resample('D').asfreq().index
>> DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03', '2016-01-04',
                  '2016-01-05', '2016-01-06'],
                 dtype='datetime64[ns]', name='dt', freq='D')
users = x['user'].unique()
>> array(['a', 'b'], dtype=object)
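As a cross-check on the step above, the same daily range can be built straight from the minimum and maximum date with pd.date_range. This is a minimal sketch, not part of the original answer; the variable name dates_alt is introduced here only for illustration.

```python
import pandas as pd

# Sample data from the start of the answer.
x = pd.DataFrame({
    "dt": ["2016-01-01", "2016-01-02", "2016-01-05", "2016-01-06"],
    "user": ["a", "a", "b", "b"],
    "val": [1, 33, 2, 1],
})
x["dt"] = pd.to_datetime(x["dt"])

# Range obtained through resampling, as in the text.
dates = x.set_index("dt").resample("D").asfreq().index

# Assumed alternative: build the range from the first and last date.
dates_alt = pd.date_range(x["dt"].min(), x["dt"].max(), freq="D", name="dt")

print(dates.equals(dates_alt))
```

The date_range form may be the safer choice when the same date appears in more than one row, since asfreq reindexes the frame and can raise on duplicate index labels.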
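With these you can create a MultiIndex holding every (date, user) combination:
idx = pd.MultiIndex.from_product((dates, users), names=['dt', 'user'])
>> MultiIndex(levels=[[2016-01-01 00:00:00, 2016-01-02 00:00:00, 2016-01-03 00:00:00,
                       2016-01-04 00:00:00, 2016-01-05 00:00:00, 2016-01-06 00:00:00],
                      ['a', 'b']],
              labels=[[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
                      [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]],
              names=['dt', 'user'])
(This repr comes from an older pandas release; newer versions call labels "codes" and print the index as a list of tuples.)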
Use it to reindex your DataFrame, filling the missing combinations with 0:
x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index()
Out:
            dt user  val
0   2016-01-01    a    1
1   2016-01-01    b    0
2   2016-01-02    a   33
3   2016-01-02    b    0
4   2016-01-03    a    0
5   2016-01-03    b    0
6   2016-01-04    a    0
7   2016-01-04    b    0
8   2016-01-05    a    0
9   2016-01-05    b    2
10  2016-01-06    a    0
11  2016-01-06    b    1
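The steps so far can be collected into one runnable script. The following is a sketch under stated assumptions: the helper name fill_missing_dates and its parameters are hypothetical, pd.date_range is swapped in for the resample call, and each (date, user) pair is assumed to occur at most once, because reindex requires a unique index.

```python
import pandas as pd


def fill_missing_dates(df, date_col="dt", key_col="user", fill_value=0):
    """Hypothetical helper: one row per (date, key) pair, gaps filled."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])

    # Every calendar day between the first and last observed date.
    dates = pd.date_range(out[date_col].min(), out[date_col].max(), freq="D")
    keys = out[key_col].unique()

    # Cartesian product of all dates and all keys.
    idx = pd.MultiIndex.from_product((dates, keys), names=[date_col, key_col])

    # Assumes (date, key) pairs are unique; reindex raises on duplicates.
    return (
        out.set_index([date_col, key_col])
        .reindex(idx, fill_value=fill_value)
        .reset_index()
    )


x = pd.DataFrame({
    "dt": ["2016-01-01", "2016-01-02", "2016-01-05", "2016-01-06"],
    "user": ["a", "a", "b", "b"],
    "val": [1, 33, 2, 1],
})
filled = fill_missing_dates(x)
print(filled)
```

Passing fill_value to reindex keeps val as an integer column; without it the new rows would hold NaN and the column would be upcast to float.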
The result can then be sorted by user:
x.set_index(['dt', 'user']).reindex(idx, fill_value=0).reset_index().sort_values(by='user')
Out:
            dt user  val
0   2016-01-01    a    1
2   2016-01-02    a   33
4   2016-01-03    a    0
6   2016-01-04    a    0
8   2016-01-05    a    0
10  2016-01-06    a    0
1   2016-01-01    b    0
3   2016-01-02    b    0
5   2016-01-03    b    0
7   2016-01-04    b    0
9   2016-01-05    b    2
11  2016-01-06    b    1
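One caveat on the sort: sort_values uses quicksort by default, which is not guaranteed to be stable, so the date order within each user is not strictly promised when sorting on user alone. A minimal sketch that sorts on both columns and renumbers the rows follows; the variable names filled and result are chosen here for illustration.

```python
import pandas as pd

x = pd.DataFrame({
    "dt": pd.to_datetime(["2016-01-01", "2016-01-02", "2016-01-05", "2016-01-06"]),
    "user": ["a", "a", "b", "b"],
    "val": [1, 33, 2, 1],
})

# Full (date, user) grid, built as in the steps above.
dates = pd.date_range(x["dt"].min(), x["dt"].max(), freq="D")
idx = pd.MultiIndex.from_product((dates, x["user"].unique()), names=["dt", "user"])
filled = x.set_index(["dt", "user"]).reindex(idx, fill_value=0).reset_index()

# Sort by user first, then by date, and renumber the rows from 0.
result = filled.sort_values(by=["user", "dt"]).reset_index(drop=True)
print(result)
```

Dropping the old index is optional; keep it if the original row positions are still needed.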