Merging pandas DataFrame rows based on a condition

heywire • 2022-12-15 • Notes • 21 views


This is entirely possible:

df.groupby(((df.Start - df.End.shift(1)) > 10).cumsum()).agg({'Start': 'min', 'End': 'max', 'Value1': 'sum', 'Value2': 'sum'})

Explanation:

start_end_differences = df.Start - df.End.shift(1)  # shift moves the series down by one row, so each Start is compared with the previous row's End
threshold_selector = start_end_differences > 10  # boolean series where True marks a row whose difference is more than 10
groups = threshold_selector.cumsum()  # sums up the Trues (each counts as 1), giving an integer series that starts from 0
df.groupby(groups).agg({'Start': 'min'})  # the aggregation is self-explanatory
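To make the intermediate steps concrete, here is a minimal sketch run on a small made-up DataFrame. The sample values are an assumption (the post does not show its input data); they are simply chosen so that the first three rows and the last two rows form two runs separated by a gap larger than 10.

```python
import pandas as pd

# Hypothetical sample data, not taken from the original post.
df = pd.DataFrame({
    "Start":  [1, 12, 30, 100, 135],
    "End":    [10, 25, 42, 130, 162],
    "Value1": [3, 4, 3, 16, 20],
    "Value2": [20, 10, 20, 10, 12],
})

# Gap between each row's Start and the previous row's End.
# The first row has no predecessor, so its gap is NaN.
gap = df["Start"] - df["End"].shift(1)   # NaN, 2, 5, 58, 5

# NaN > 10 evaluates to False, so the first row never starts a new group.
new_group = gap > 10                     # False, False, False, True, False

# Running count of True values: one group label per row.
groups = new_group.cumsum()              # 0, 0, 0, 1, 1

merged = df.groupby(groups).agg(
    {"Start": "min", "End": "max", "Value1": "sum", "Value2": "sum"}
)
print(merged)
```

Note that this approach compares each row only with the row directly before it, so it presumably expects the rows to be sorted by Start already.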

Here is a general solution that does not depend on the other columns:

cols = df.columns.difference(['Start', 'End'])
grps = df.Start.sub(df.End.shift()).gt(10).cumsum()
gpby = df.groupby(grps)
gpby.agg(dict(Start='min', End='max')).join(gpby[cols].sum())

   Start  End  Value1  Value2
0      1   42      10      50
1    100  162      36      22
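If the same merge is needed in several places, the general solution could be wrapped in a small helper. The sketch below is one possible way to do that: the function name `merge_close_rows` and its `start`, `end` and `max_gap` parameters are hypothetical names introduced here for illustration, and the sample data is made up. It assumes the rows are sorted by the start column and that every remaining column can be summed.

```python
import pandas as pd

def merge_close_rows(df, start="Start", end="End", max_gap=10):
    """Merge consecutive rows whose gap (start minus previous end) is at most max_gap."""
    # Every column other than the two interval bounds gets summed.
    other_cols = list(df.columns.difference([start, end]))
    # A new group begins wherever the gap to the previous row exceeds max_gap.
    groups = df[start].sub(df[end].shift()).gt(max_gap).cumsum()
    grouped = df.groupby(groups)
    bounds = grouped.agg({start: "min", end: "max"})
    return bounds.join(grouped[other_cols].sum())

# Hypothetical sample data, not taken from the original post.
df = pd.DataFrame({
    "Start":  [1, 12, 30, 100, 135],
    "End":    [10, 25, 42, 130, 162],
    "Value1": [3, 4, 3, 16, 20],
    "Value2": [20, 10, 20, 10, 12],
})

result = merge_close_rows(df)              # threshold of 10, as in the post
tighter = merge_close_rows(df, max_gap=4)  # a stricter threshold yields more groups
print(result)
print(tighter)
```

Making the threshold a parameter keeps the grouping logic identical to the one-liner while letting the caller decide how large a gap still counts as "the same run".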

Feel free to share. When reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/zaji/5631052.html
