Python:如何通过保留第一个数据框的信息来合并列上的两个数据框?

Python:如何通过保留第一个数据框的信息来合并列上的两个数据框?,第1张

Python:如何通过保留第一个数据框的信息来合并列上的两个数据框?
Sample:df1 = pd.Dataframe({'Name': ['Tom', 'Sara', 'Eva', 'Jack', 'Laura'],          'Age': [34, 18, 44, 27, 30]})#print (df1)df3 = df1.copy()df2 = pd.Dataframe({'Name': ['Tom', 'Paul', 'Eva', 'Jack', 'Michelle'],          'Sex': ['M', 'M', 'F', 'M', 'F']})#print (df2)

使用map由

Series
创建人
set_index

df1['Sex'] = df1['Name'].map(df2.set_index('Name')['Sex'])print (df1)    Name  Age  Sex0    Tom   34    M1   Sara   18  NaN2    Eva   44    F3   Jack   27    M4  Laura   30  NaN

merge左连接的替代解决方案

df = df3.merge(df2[['Name','Sex']], on='Name', how='left')print (df)    Name  Age  Sex0    Tom   34    M1   Sara   18  NaN2    Eva   44    F3   Jack   27    M4  Laura   30  NaN

如果需要通过多列映射(例如Year和Code),则需要merge左连接:

df1 = pd.Dataframe({'Name': ['Tom', 'Sara', 'Eva', 'Jack', 'Laura'],          'Year':[2000,2003,2003,2004,2007],         'Code':[1,2,3,4,4],         'Age': [34, 18, 44, 27, 30]})print (df1)    Name  Year  Code  Age0    Tom  2000     1   341   Sara  2003     2   182    Eva  2003     3   443   Jack  2004     4   274  Laura  2007     4   30df2 = pd.Dataframe({'Name': ['Tom', 'Paul', 'Eva', 'Jack', 'Michelle'],          'Sex': ['M', 'M', 'F', 'M', 'F'],         'Year':[2001,2003,2003,2004,2007],         'Code':[1,2,3,5,3],         'Val':[21,34,23,44,67]})print (df2)       Name Sex  Year  Code  Val0       Tom   M  2001     1   211      Paul   M  2003     2   342       Eva   F  2003     3   233      Jack   M  2004     5   444  Michelle   F  2007     3   67
#merge by all columnsdf = df1.merge(df2, on=['Year','Code'], how='left')print (df)  Name_x  Year  Code  Age Name_y  Sex   Val0    Tom  2000     1   34    NaN  NaN   NaN1   Sara  2003     2   18   Paul    M  34.02    Eva  2003     3   44    Eva    F  23.03   Jack  2004     4   27    NaN  NaN   NaN4  Laura  2007     4   30    NaN  NaN   NaN#specified columns - columns for join (Year, Code) need always + appended columns (Val)df = df1.merge(df2[['Year','Code', 'Val']], on=['Year','Code'], how='left')print (df)    Name  Year  Code  Age   Val0    Tom  2000     1   34   NaN1   Sara  2003     2   18  34.02    Eva  2003     3   44  23.03   Jack  2004     4   27   NaN4  Laura  2007     4   30   NaN

如果获取错误map意味着按连接列重复,则在这里Name:

df1 = pd.Dataframe({'Name': ['Tom', 'Sara', 'Eva', 'Jack', 'Laura'],          'Age': [34, 18, 44, 27, 30]})print (df1)    Name  Age0    Tom   341   Sara   182    Eva   443   Jack   274  Laura   30df3, df4 = df1.copy(), df1.copy()df2 = pd.Dataframe({'Name': ['Tom', 'Tom', 'Eva', 'Jack', 'Michelle'],          'Val': [1,2,3,4,5]})print (df2)       Name  Val0       Tom    1 <-duplicated name Tom1       Tom    2 <-duplicated name Tom2       Eva    33      Jack    44  Michelle    5s = df2.set_index('Name')['Val']df1['New'] = df1['Name'].map(s)print (df1)

InvalidIndexError
:重新索引仅对唯一值的Index对象有效

解决方案通过删除重复项Dataframe.drop_duplicates,或dict在最后一次重复匹配中使用map by :

#default keep first values = df2.drop_duplicates('Name').set_index('Name')['Val']print (s)NameTom         1Eva         3Jack        4Michelle    5Name: Val, dtype: int64df1['New'] = df1['Name'].map(s)print (df1)    Name  Age  New0    Tom   34  1.01   Sara   18  NaN2    Eva   44  3.03   Jack   27  4.04  Laura   30  NaN
#add parameter for keep last value s = df2.drop_duplicates('Name', keep='last').set_index('Name')['Val']print (s)NameTom         2Eva         3Jack        4Michelle    5Name: Val, dtype: int64df3['New'] = df3['Name'].map(s)print (df3)    Name  Age  New0    Tom   34  2.01   Sara   18  NaN2    Eva   44  3.03   Jack   27  4.04  Laura   30  NaN
#map by dictionaryd = dict(zip(df2['Name'], df2['Val']))print (d){'Tom': 2, 'Eva': 3, 'Jack': 4, 'Michelle': 5}df4['New'] = df4['Name'].map(d)print (df4)    Name  Age  New0    Tom   34  2.01   Sara   18  NaN2    Eva   44  3.03   Jack   27  4.04  Laura   30  NaN


欢迎分享,转载请注明来源:内存溢出

原文地址: http://outofmemory.cn/zaji/4942883.html

(0)
打赏 微信扫一扫 微信扫一扫 支付宝扫一扫 支付宝扫一扫
上一篇 2022-11-13
下一篇 2022-11-13

发表评论

登录后才能评论

评论列表(0条)

保存