Merging two pandas DataFrames results in "duplicate" columns


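The extra columns with the suffixes '_x' and '_y' appear because both DataFrames contain columns with the same name that are not among the merge keys, so pandas keeps both copies and adds a suffix to tell them apart. In that case you need to drop the extra '_y' columns and rename the '_x' columns: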

In [145]:
# define our drop function
def drop_y(df):
    # list comprehension of the cols that end with '_y'
    to_drop = [x for x in df if x.endswith('_y')]
    df.drop(to_drop, axis=1, inplace=True)

drop_y(merged)
merged

Out[145]:
               key dept_name_x res_name_x  year_x  need  holding  no_of_inv  inv_cost_wo_ice
0  DeptA_ResA_2015       DeptA       ResA    2015     1        1          1          1000000
1  DeptA_ResA_2016       DeptA       ResA    2016     1        1          0                0
2  DeptA_ResA_2017       DeptA       ResA    2017     1        1          0                0

In [146]:
# func to rename '_x' cols
def rename_x(df):
    # col[:-2] removes exactly the trailing '_x'; rstrip('_x') would strip
    # any run of trailing '_' and 'x' characters (e.g. 'tax_x' -> 'ta')
    renamed = {col: col[:-2] for col in df if col.endswith('_x')}
    df.rename(columns=renamed, inplace=True)

rename_x(merged)
merged

Out[146]:
               key dept_name res_name  year  need  holding  no_of_inv  inv_cost_wo_ice
0  DeptA_ResA_2015     DeptA     ResA  2015     1        1          1          1000000
1  DeptA_ResA_2016     DeptA     ResA  2016     1        1          0                0
2  DeptA_ResA_2017     DeptA     ResA  2017     1        1          0                0
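For readers who want to reproduce the behaviour end to end, here is a minimal, self-contained sketch. The sample data is hypothetical (modeled on the output shown above, not taken from the original question), and `strip_merge_suffixes` is an illustrative helper name, not a pandas function. It merges on 'key' only, so the other shared columns come back with suffixes, and then cleans them up without mutating the input.

```python
import pandas as pd

# Hypothetical sample data, modeled on the output shown above.
holding_df = pd.DataFrame({
    "key": ["DeptA_ResA_2015", "DeptA_ResA_2016", "DeptA_ResA_2017"],
    "dept_name": ["DeptA"] * 3,
    "res_name": ["ResA"] * 3,
    "year": [2015, 2016, 2017],
    "need": [1, 1, 1],
    "holding": [1, 1, 1],
})
invest_df = pd.DataFrame({
    "key": ["DeptA_ResA_2015"],
    "dept_name": ["DeptA"],
    "res_name": ["ResA"],
    "year": [2015],
    "no_of_inv": [1],
    "inv_cost_wo_ice": [1000000],
})

# Merging on 'key' only: dept_name, res_name and year exist in both frames,
# so pandas keeps both copies and tags them '_x' (left) and '_y' (right).
merged = pd.merge(holding_df, invest_df, on="key", how="left")
suffixed = [c for c in merged.columns if c.endswith(("_x", "_y"))]


def strip_merge_suffixes(df):
    """Return a copy without '_y' columns and with '_x' removed from names."""
    kept = df.drop(columns=[c for c in df.columns if c.endswith("_y")])
    return kept.rename(
        columns={c: c[:-2] for c in kept.columns if c.endswith("_x")}
    )


cleaned = strip_merge_suffixes(merged).fillna(0)
print(suffixed)
print(list(cleaned.columns))
```

Returning a new DataFrame instead of using inplace=True is a design choice in this sketch; it leaves `merged` untouched so the suffixed columns can still be inspected afterwards.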

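Edit: if you add the shared columns to the merge keys, they are no longer duplicated. Rows of holding_df without a match on those columns get NaN in the columns that come from invest_df, which fillna(0) then replaces with 0: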

merge_df = pd.merge(holding_df, invest_df, on=['key', 'dept_name', 'res_name', 'year'], how='left').fillna(0)
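The merge call above can be checked with a small sketch. The two frames below are hypothetical; only their column names follow the call above. Instead of typing the key list by hand, the sketch derives the shared column names, which is an assumption about how one might generalise the approach rather than something the original answer does.

```python
import pandas as pd

# Hypothetical frames; only the column names follow the merge call above.
holding_df = pd.DataFrame({
    "key": ["DeptA_ResA_2015", "DeptA_ResA_2016"],
    "dept_name": ["DeptA", "DeptA"],
    "res_name": ["ResA", "ResA"],
    "year": [2015, 2016],
    "holding": [1, 1],
})
invest_df = pd.DataFrame({
    "key": ["DeptA_ResA_2015"],
    "dept_name": ["DeptA"],
    "res_name": ["ResA"],
    "year": [2015],
    "no_of_inv": [1],
})

# Column names present in both frames, in holding_df's order.
shared = [c for c in holding_df.columns if c in invest_df.columns]

# Every shared column is a merge key, so no '_x'/'_y' copies are created.
# The 2016 row has no match in invest_df; its NaN becomes 0 via fillna(0).
merge_df = pd.merge(holding_df, invest_df, on=shared, how="left").fillna(0)
print(merge_df)
```

Leaving out `on` altogether gives the same keys here, because pandas then joins on the column names common to both frames; listing them explicitly keeps the intent visible.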


Original URL: http://outofmemory.cn/zaji/5632103.html
