I think you can first convert the Date column with to_datetime, then fill in the missing days using groupby with resample, and finally apply rolling:
    import pandas as pd

    test['Date'] = pd.to_datetime(test['Date'])

    # Resample each user to daily frequency so that missing days appear as NaN rows
    df = test.groupby('User').apply(lambda x: x.set_index('Date').resample('1D').first())
    print(df)

                     User  Value
    User Date
    John 2016-04-01  John    2.0
         2016-04-02  John    3.0
         2016-04-03   NaN    NaN
         2016-04-04   NaN    NaN
         2016-04-05   NaN    NaN
         2016-04-06  John    6.0
    Mike 2016-04-01  Mike    1.0
         2016-04-02  Mike    1.0
         2016-04-03  Mike    4.5
         2016-04-04  Mike    1.0
         2016-04-05  Mike    2.0
         2016-04-06  Mike    3.0

    # shift() excludes the current day; rolling(window=2) then averages the two previous days
    df1 = (df.groupby(level=0)['Value']
             .apply(lambda x: x.shift().rolling(min_periods=1, window=2).mean())
             .reset_index(name='Value_Average_Past_2_days'))
    print(df1)

        User       Date  Value_Average_Past_2_days
    0   John 2016-04-01                        NaN
    1   John 2016-04-02                       2.00
    2   John 2016-04-03                       2.50
    3   John 2016-04-04                       3.00
    4   John 2016-04-05                        NaN
    5   John 2016-04-06                        NaN
    6   Mike 2016-04-01                        NaN
    7   Mike 2016-04-02                       1.00
    8   Mike 2016-04-03                       1.00
    9   Mike 2016-04-04                       2.75
    10  Mike 2016-04-05                       2.75
    11  Mike 2016-04-06                       1.50

    # Join the averages back onto the original rows
    print(pd.merge(test, df1, on=['Date', 'User'], how='left'))

            Date  User  Value  Value_Average_Past_2_days
    0 2016-04-01  Mike    1.0                        NaN
    1 2016-04-01  John    2.0                        NaN
    2 2016-04-02  Mike    1.0                       1.00
    3 2016-04-02  John    3.0                       2.00
    4 2016-04-03  Mike    4.5                       1.00
    5 2016-04-04  Mike    1.0                       2.75
    6 2016-04-05  Mike    2.0                       2.75
    7 2016-04-06  Mike    3.0                       1.50
    8 2016-04-06  John    6.0                        NaN
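For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same three steps (convert, resample per user, shifted rolling mean, then merge). It is a variation, not the exact code above: it uses groupby(...).resample and transform in place of the two apply calls, which I would expect to be less sensitive to how different pandas versions build the index. The sample `test` frame is an assumption reconstructed from the printed merge output, and the names `daily`, `past` and `result` are only illustrative.

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption;
# the original `test` frame is not shown in the answer).
test = pd.DataFrame({
    "Date": ["2016-04-01", "2016-04-01", "2016-04-02", "2016-04-02",
             "2016-04-03", "2016-04-04", "2016-04-05", "2016-04-06",
             "2016-04-06"],
    "User": ["Mike", "John", "Mike", "John", "Mike", "Mike", "Mike",
             "Mike", "John"],
    "Value": [1.0, 2.0, 1.0, 3.0, 4.5, 1.0, 2.0, 3.0, 6.0],
})

# Step 1: make Date a real datetime column.
test["Date"] = pd.to_datetime(test["Date"])

# Step 2: one row per user per calendar day; days with no data become NaN.
daily = (
    test.set_index("Date")
        .groupby("User")["Value"]
        .resample("1D")
        .first()
)

# Step 3: within each user, shift by one day so the current day is excluded,
# then average over a 2-day window (a single available day is enough).
past = (
    daily.groupby(level="User")
         .transform(lambda s: s.shift().rolling(window=2, min_periods=1).mean())
         .rename("Value_Average_Past_2_days")
         .reset_index()
)

# Step 4: attach the averages to the original rows.
result = test.merge(past, on=["Date", "User"], how="left")
print(result)
```

The left merge keeps the original row order of `test`, so only the dates that actually had data appear in the final result, while the gaps filled in by resample still influence the rolling window.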