In my opinion, this is somewhat more pythonic code (for your example), with no explicit loops:
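    import pandas as pd
    from fuzzywuzzy import fuzz  # assumed source of fuzz.ratio

    def get_donors(row):
        # Score every donor against this fundraiser: doubled name similarity
        # when the emails match exactly, otherwise 1 (always below the cutoff).
        d = donors.apply(lambda x: fuzz.ratio(x['name'], row['name']) * 2
                         if row['Email'] == x['Email'] else 1, axis=1)
        d = d[d >= 75]  # keep only sufficiently good matches
        if len(d) == 0:
            v = [''] * 3  # no match: leave the donor columns empty
        else:
            # Take the fields of the best-scoring donor (label-based lookup).
            v = donors.loc[d.idxmax(), ['name', 'Email', 'Date']].values
        return pd.Series(v, index=['donor name', 'donor email', 'donor date'])

    pd.concat((fundraisers, fundraisers.apply(get_donors, axis=1)), axis=1)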
Output:
                      Date            Email        name donor name      donor email           donor date
    0  2013-03-27 10:00:00           a@a.ca    John Doe   John Doe           a@a.ca  2013-03-01 10:39:00
    1  2013-03-01 10:39:00           a@a.ca    John Doe   John Doe           a@a.ca  2013-03-01 10:39:00
    2  2013-03-02 10:39:00           d@d.ca  Kathy test   Kat test           d@d.ca  2013-03-27 10:39:00
    3  2013-03-03 10:39:00     asdf@asdf.ca   Tes Ester
    4  2013-03-04 10:39:00   something@a.ca    Jane Doe   Jane Doe  something@a.ca  2013-03-04 10:39:00
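To look at the matching rule on its own, here is a minimal sketch in plain Python, without pandas. It is only an illustration under some assumptions: `ratio` is a hypothetical stand-in built on the standard library's `difflib.SequenceMatcher` rather than `fuzz.ratio`, so scores may differ slightly from fuzzywuzzy, and the `donors` list is sample data loosely inferred from the output above. `best_donor` is likewise a name made up for this sketch.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    # Stand-in for fuzz.ratio (assumption): similarity on a 0-100 scale.
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_donor(fundraiser, donors, threshold=75):
    """Return the best-scoring donor dict, or None if no score reaches the threshold."""
    best, best_score = None, threshold - 1
    for donor in donors:
        # Name similarity counts (doubled) only when the emails match exactly;
        # otherwise the score is 1, which never reaches the threshold.
        if donor["Email"] == fundraiser["Email"]:
            score = ratio(donor["name"], fundraiser["name"]) * 2
        else:
            score = 1
        # Strict comparison keeps the first donor on ties.
        if score > best_score:
            best, best_score = donor, score
    return best


# Hypothetical sample data, loosely based on the output shown above.
donors = [
    {"name": "John Doe", "Email": "a@a.ca", "Date": "2013-03-01 10:39:00"},
    {"name": "Kat test", "Email": "d@d.ca", "Date": "2013-03-27 10:39:00"},
    {"name": "Jane Doe", "Email": "something@a.ca", "Date": "2013-03-04 10:39:00"},
]

match = best_donor({"name": "Kathy test", "Email": "d@d.ca"}, donors)
no_match = best_donor({"name": "Tes Ester", "Email": "asdf@asdf.ca"}, donors)
```

Because the name score is doubled before it is compared with 75, a raw similarity of roughly 38 or more is enough once the emails agree. In effect the email acts as the key and the name check only filters out clearly unrelated names.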