python - Filtering a DataFrame by the maximum element in groupby pairs

I have a DataFrame with four columns:

    import pandas as pd

    df = pd.DataFrame({
        'order_ID':   [134, 101, 131, 159, 101, 189, 120, 102, 134, 130, 231, 421, 141, 129, 141, 101],
        'user_ID':    [24, 10, 24, 12, 24, 10, 10, 24, 21, 12, 12, 10, 12, 17, 24, 12],
        'product_ID': [1004, 1041, 1078, 1001, 1001, 1074, 1001, 1019, 1021, 1004, 1001, 1010, 1004, 1004, 1017, 1004],
        'sector':     ['a', 'a', 'b', 'd', 'c', 'a', 'c', 'a', 'c', 'a', 'b', 'c', 'a', 'b', 'a', 'a'],
    })

Sorted by user_ID and product_ID, the data is:

    order_ID  product_ID  sector  user_ID
         120        1001       c       10
         421        1010       c       10
         101        1041       a       10
         189        1074       a       10
         159        1001       d       12
         231        1001       b       12
         130        1004       a       12
         141        1004       a       12
         101        1004       a       12
         129        1004       b       17
         134        1021       c       21
         101        1001       c       24
         134        1004       a       24
         141        1017       a       24
         102        1019       a       24
         131        1078       b       24

For each product_ID, I want to filter the DataFrame by selecting, for each (product_ID, user_ID) pair, the rows of that user whose order_ID is greater than the largest order_ID associated with the pair.

For example, for product_ID 1001 the largest order_ID associated with user_ID 10 is 120, the largest for user_ID 12 is 231, and the largest for user_ID 24 is 101. So for product_ID 1001 I would like to return this DataFrame:

    df2 = pd.DataFrame({'order_ID':   [421, 189, 134, 141, 102, 131],
                        'product_ID': [1010, 1074, 1004, 1017, 1019, 1078],
                        'sector':     ['c', 'a', 'a', 'a', 'a', 'b'],
                        'user_ID':    [10, 10, 24, 24, 24, 24]})

    order_ID  product_ID  sector  user_ID
         421        1010       c       10
         189        1074       a       10
         134        1004       a       24
         141        1017       a       24
         102        1019       a       24
         131        1078       b       24
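The example for product_ID 1001 can be checked with a per-user threshold. This is only a sketch: `df` is rebuilt from the table in the question, and the names `wanted`, `threshold`, and `result` are illustrative rather than taken from the original post.

```python
import pandas as pd

# Data rebuilt from the table shown in the question.
df = pd.DataFrame({
    'order_ID':   [134, 101, 131, 159, 101, 189, 120, 102, 134, 130, 231, 421, 141, 129, 141, 101],
    'user_ID':    [24, 10, 24, 12, 24, 10, 10, 24, 21, 12, 12, 10, 12, 17, 24, 12],
    'product_ID': [1004, 1041, 1078, 1001, 1001, 1074, 1001, 1019, 1021, 1004, 1001, 1010, 1004, 1004, 1017, 1004],
    'sector':     ['a', 'a', 'b', 'd', 'c', 'a', 'c', 'a', 'c', 'a', 'b', 'c', 'a', 'b', 'a', 'a'],
})

wanted = 1001  # product_ID used in the worked example

# Largest order_ID each user has for the wanted product.
threshold = df[df['product_ID'] == wanted].groupby('user_ID')['order_ID'].max()

# Look up each row's user in that Series. Users who never ordered the
# product get NaN, and a comparison with NaN is False, so they drop out.
result = df[df['order_ID'] > df['user_ID'].map(threshold)]

print(threshold)
print(result.sort_values(['user_ID', 'product_ID']))
```

The thresholds come out as 120, 231, and 101 for users 10, 12, and 24, and the filtered rows are the six listed above.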

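For product_ID 1004 there is no data associated with user_ID 10, so no rows are returned. For user_ID 12 the largest order_ID is 141 and it is associated with 1004; since no order_ID associated with user_ID 12 is larger, no rows are returned. For user_ID 17 there is only one entry, and it is associated with product_ID 1004, so no other product_ID is associated with user_ID 17 and there can be no larger order_ID. Finally, for user_ID 24 the largest order_ID associated with product_ID 1004 is 134. In this case product_ID 1017 has order_ID 141, so its row must be returned.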

In summary, the output for product_ID 1004 is

    order_ID  product_ID  sector  user_ID
         141        1017       a       24

I want to repeat this operation for every product_ID and store the resulting DataFrames in a list.
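Collecting one result per product_ID could look like the sketch below. The helper name `rows_above_product_max` and the variables `product_ids` and `frames` are hypothetical, and `df` is rebuilt from the table in the question. The helper uses `where` plus `groupby(...).transform('max')` as one possible way to express the rule, not necessarily the approach the asker had in mind.

```python
import pandas as pd

# Data rebuilt from the table shown in the question.
df = pd.DataFrame({
    'order_ID':   [134, 101, 131, 159, 101, 189, 120, 102, 134, 130, 231, 421, 141, 129, 141, 101],
    'user_ID':    [24, 10, 24, 12, 24, 10, 10, 24, 21, 12, 12, 10, 12, 17, 24, 12],
    'product_ID': [1004, 1041, 1078, 1001, 1001, 1074, 1001, 1019, 1021, 1004, 1001, 1010, 1004, 1004, 1017, 1004],
    'sector':     ['a', 'a', 'b', 'd', 'c', 'a', 'c', 'a', 'c', 'a', 'b', 'c', 'a', 'b', 'a', 'a'],
})


def rows_above_product_max(frame, wanted_product_ID):
    """Rows whose order_ID exceeds their user's largest order_ID for the product."""
    # Keep order_ID only on rows of the wanted product; other rows become NaN.
    product_orders = frame['order_ID'].where(frame['product_ID'] == wanted_product_ID)
    # Broadcast each user's maximum back onto all of that user's rows.
    threshold = product_orders.groupby(frame['user_ID']).transform('max')
    # Users without the product have a NaN threshold, so they match nothing.
    return frame[frame['order_ID'] > threshold]


# One filtered DataFrame per product_ID, in ascending product_ID order.
product_ids = sorted(df['product_ID'].unique())
frames = [rows_above_product_max(df, pid) for pid in product_ids]

for pid, frame in zip(product_ids, frames):
    print(pid, len(frame))
```

If the results need to be looked up by product later, a dict keyed by product_ID may be more convenient than a positional list.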

I think the solution involves grouping by user_ID and then filtering on order_ID and product_ID, but this is where I am stuck:

    df3 = df.groupby('user_ID')
    for key, val in df3:
        # Sort each user's rows by order and product to inspect them.
        d = val.sort_values(['order_ID', 'product_ID'])
        print(d)

Solution

I'm not sure this is the most efficient solution, but it works:

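    import pandas as pd

    def get_dataframe_for_product_ID(your_input_df, wanted_product_ID):
        # Collect each user's matching rows, then concatenate once at the end.
        pieces = []
        for key, val in your_input_df.groupby('user_ID'):
            # Largest order_ID this user has for the wanted product (NaN if none).
            threshold = val[val.product_ID == wanted_product_ID].order_ID.max()
            # Comparing against NaN is always False, so such users add no rows.
            pieces.append(val[val.order_ID > threshold])
        # Fall back to an empty frame with the same columns if there were no groups.
        return pd.concat(pieces) if pieces else your_input_df.iloc[0:0]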
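To try the function, one option is to rebuild `df` from the question's table and call it for the two products discussed. The snippet restates the function (collecting the pieces in a list) so that it runs on its own; the names `for_1001` and `for_1004` are illustrative.

```python
import pandas as pd

# Data rebuilt from the table shown in the question.
df = pd.DataFrame({
    'order_ID':   [134, 101, 131, 159, 101, 189, 120, 102, 134, 130, 231, 421, 141, 129, 141, 101],
    'user_ID':    [24, 10, 24, 12, 24, 10, 10, 24, 21, 12, 12, 10, 12, 17, 24, 12],
    'product_ID': [1004, 1041, 1078, 1001, 1001, 1074, 1001, 1019, 1021, 1004, 1001, 1010, 1004, 1004, 1017, 1004],
    'sector':     ['a', 'a', 'b', 'd', 'c', 'a', 'c', 'a', 'c', 'a', 'b', 'c', 'a', 'b', 'a', 'a'],
})


def get_dataframe_for_product_ID(your_input_df, wanted_product_ID):
    # Same per-user logic as the answer, restated so the snippet is self-contained.
    pieces = []
    for _, val in your_input_df.groupby('user_ID'):
        # Largest order_ID this user has for the wanted product (NaN if none).
        threshold = val[val.product_ID == wanted_product_ID].order_ID.max()
        # Rows of this user with a larger order_ID; empty when threshold is NaN.
        pieces.append(val[val.order_ID > threshold])
    return pd.concat(pieces)


for_1001 = get_dataframe_for_product_ID(df, 1001)
for_1004 = get_dataframe_for_product_ID(df, 1004)

print(for_1001)
print(for_1004)
```

For 1001 this reproduces the six rows listed in the question. For 1004, on the table as listed, it returns the 1017 row for user 24 and also user 12's orders 159 and 231, because both exceed 141. The question's walkthrough lists only the first of these, so the expected output there may rest on a condition the post does not spell out.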

Original article: http://outofmemory.cn/langs/1196346.html
