That is entirely possible:
df.groupby(((df.Start - df.End.shift(1)) > 10).cumsum()).agg({'Start': 'min', 'End': 'max', 'Value1': 'sum', 'Value2': 'sum'})
Explanation:
start_end_differences = df.Start - df.End.shift(1)  # shift moves the End series down by one row
threshold_selector = start_end_differences > 10  # boolean Series; True marks a row where the gap is more than 10
groups = threshold_selector.cumsum()  # running count of the Trues (1), giving an integer series starting from 0
df.groupby(groups).agg({'Start': 'min'})  # the aggregation is self-explanatory
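To make the intermediate steps concrete, here is a minimal sketch on made-up data. The original input frame is not shown in this answer, so the five rows below are hypothetical; they are assumed to be sorted by Start, which the shift-based comparison relies on.

```python
import pandas as pd

# Hypothetical input, sorted by Start (not taken from the original question).
df = pd.DataFrame({
    "Start":  [1, 12, 25, 100, 135],
    "End":    [10, 20, 42, 130, 162],
    "Value1": [3, 2, 5, 16, 20],
    "Value2": [20, 10, 20, 12, 10],
})

# Gap between each row's Start and the previous row's End (NaN for the first row).
gaps = df["Start"] - df["End"].shift(1)

# True where a new group begins. NaN > 10 is False, so the first row stays in group 0.
is_new_group = gaps > 10

# The running count of True values gives one integer label per row.
groups = is_new_group.cumsum()
print(groups.tolist())  # [0, 0, 0, 1, 1]

merged = df.groupby(groups).agg(
    {"Start": "min", "End": "max", "Value1": "sum", "Value2": "sum"}
)
print(merged)
```

Only the fourth row has a gap above 10 (100 - 42 = 58), so the first three rows form one group and the last two form another.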
Here is a general solution that does not depend on the names of the other columns:
cols = df.columns.difference(['Start', 'End'])
grps = df.Start.sub(df.End.shift()).gt(10).cumsum()
gpby = df.groupby(grps)
gpby.agg(dict(Start='min', End='max')).join(gpby[cols].sum())

   Start  End  Value1  Value2
0      1   42      10      50
1    100  162      36      22
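If this is needed in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the function name `merge_close_rows`, its `max_gap`, `start`, and `end` parameters, the sample rows, and the extra `Value3` column are all hypothetical and not part of the original answer. It assumes the frame is already sorted by the start column and that every remaining column is numeric and should be summed.

```python
import pandas as pd

def merge_close_rows(df, max_gap=10, start="Start", end="End"):
    """Merge consecutive rows whose start is within max_gap of the previous end.

    Assumes df is sorted by `start`; all other columns are summed per group.
    """
    # Every column except the two interval bounds.
    other = list(df.columns.difference([start, end]))
    # A new group starts wherever the gap to the previous end exceeds max_gap.
    labels = df[start].sub(df[end].shift()).gt(max_gap).cumsum()
    grouped = df.groupby(labels)
    bounds = grouped.agg({start: "min", end: "max"})
    return bounds.join(grouped[other].sum())

# Hypothetical data with an extra column to show that no value column is hard-coded.
df = pd.DataFrame({
    "Start":  [1, 12, 25, 100, 135],
    "End":    [10, 20, 42, 130, 162],
    "Value1": [3, 2, 5, 16, 20],
    "Value2": [20, 10, 20, 12, 10],
    "Value3": [1, 1, 1, 1, 1],
})

result = merge_close_rows(df)
print(result)
```

Note that `columns.difference` returns the remaining column names in sorted order, so the summed columns may appear in a different order than in the input frame.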