Not as elegant as jezrael's answer, but easier for me to understand…
You can apply a custom function f that combines cumsum, cumcount and astype:
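    def f(x):
        # Block id: increases by 1 at every row where stat != 0
        block = (x['stat'] != 0).cumsum()
        # Position inside the block, plus 1 for the first block (no reset seen yet)
        x['streak'] = x.groupby(block).cumcount() + (block == 0).astype(int)
        return x

    # group_keys=False keeps the original flat index on newer pandas versions
    df = df.groupby('loser', sort=False, group_keys=False).apply(f)
    print(df)

       time winner loser  stat  streak
    0     1      A     B     0       1
    1     2      C     B     0       2
    2     3      D     B     1       0
    3     4      E     B     0       1
    4     5      F     A     0       1
    5     6      G     A     0       2
    6     7      H     A     0       3
    7     8      I     A     1       0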
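To make this easier to follow, here is the same function with the intermediate columns kept:

    def f(x):
        x['c'] = (x['stat'] != 0).cumsum()    # block id, bumped at each stat != 0
        x['a'] = (x['c'] == 0).astype(int)    # 1 only in the first block
        x['b'] = x.groupby('c').cumcount()    # position inside the block
        x['streak'] = x.groupby('c').cumcount() + x['a']
        return x

    # group_keys=False keeps the original flat index on newer pandas versions
    df = df.groupby('loser', sort=False, group_keys=False).apply(f)
    print(df)

       time winner loser  stat  c  a  b  streak
    0     1      A     B     0  0  1  0       1
    1     2      C     B     0  0  1  1       2
    2     3      D     B     1  1  0  0       0
    3     4      E     B     0  1  0  1       1
    4     5      F     A     0  0  1  0       1
    5     6      G     A     0  0  1  1       2
    6     7      H     A     0  0  1  2       3
    7     8      I     A     1  1  0  0       0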
First, define a function that handles a single loser:
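    import numpy as np

    def f(df):
        # Running count of rows with stat == 0
        df['streak2'] = (df['stat'] == 0).cumsum()
        # Record the running count at each stat == 1 row, then carry it forward
        df['cumsum'] = np.nan
        df.loc[df['stat'] == 1, 'cumsum'] = df['streak2']
        df['cumsum'] = df['cumsum'].ffill()    # replaces fillna(method='ffill')
        df['cumsum'] = df['cumsum'].fillna(0)  # rows before the first 1
        df['streak'] = df['streak2'] - df['cumsum']
        # Drop the helper columns
        df.drop(['streak2', 'cumsum'], axis=1, inplace=True)
        return df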
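The streak is really a cumsum, but it has to be reset every time stat is 1. So we subtract the value the cumsum had at the row where stat is 1, and keep subtracting that value until the next 1.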
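The reset trick can be seen in isolation on a single Series. This is a minimal sketch, assuming made-up stat values for one loser; the names running and baseline are illustrative and do not come from the answer:

```python
import pandas as pd

# Hypothetical stat values for a single loser (1 marks the end of a streak)
stat = pd.Series([0, 0, 1, 0, 0, 0, 1, 0])

# Running count of rows with stat == 0; it never resets by itself
running = (stat == 0).cumsum()

# Value of the running count at each stat == 1 row, carried forward,
# and 0 before the first 1 has been seen
baseline = running.where(stat == 1).ffill().fillna(0)

# Subtracting the baseline restarts the count after every 1
streak = (running - baseline).astype(int)
print(streak.tolist())
```

Each row with stat equal to 1 ends up with a streak of 0, and counting starts again from 1 on the next row.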
Then groupby the loser and apply:
    df.groupby('loser').apply(f)
The result is as expected.
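As a sanity check, the sample data from the printed output can be rebuilt and run through the same subtract-the-baseline idea. This sketch is a variant rather than the answer's own code: it assumes grouped cumsum and ffill are acceptable in place of apply, and add_streak, running and baseline are hypothetical names.

```python
import pandas as pd

# Sample data reconstructed from the printed output in this thread
df = pd.DataFrame({
    "time": [1, 2, 3, 4, 5, 6, 7, 8],
    "winner": list("ACDEFGHI"),
    "loser": list("BBBBAAAA"),
    "stat": [0, 0, 1, 0, 0, 0, 0, 1],
})

def add_streak(frame):
    """Return a copy of frame with a per-loser 'streak' column (hypothetical helper)."""
    out = frame.copy()
    # Per-loser running count of rows with stat == 0
    running = out["stat"].eq(0).astype(int).groupby(out["loser"]).cumsum()
    # Running count at each stat == 1 row, carried forward within each loser
    baseline = running.where(out["stat"].eq(1))
    baseline = baseline.groupby(out["loser"]).ffill().fillna(0)
    out["streak"] = (running - baseline).astype(int)
    return out

result = add_streak(df)
print(result)
```

Working on a copy leaves the input frame untouched, and avoiding apply sidesteps the differences in how pandas versions index the result of groupby.apply.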