Using np.where is faster. Following a pattern similar to the one you used with replace:

    df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])
    df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])
However, using a nested np.where is slightly faster:

    df['col1'] = np.where(df['col1'] == 0, np.where(df['col2'] == 0, df['col3'], df['col2']), df['col1'])
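To see that the two variants should agree, here is a minimal sketch on a small DataFrame. The sample values and the helper names `fill_split` and `fill_nested` are my own assumptions for illustration, since the question's data is not reproduced in this answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not the DataFrame from the question).
df = pd.DataFrame({
    "col1": [1, 0, 0, 0],
    "col2": [5, 6, 0, 0],
    "col3": [9, 8, 7, 0],
})

def fill_split(frame):
    # Two passes: fall back to col2 first, then to col3 for remaining zeros.
    out = frame.copy()
    out["col1"] = np.where(out["col1"] == 0, out["col2"], out["col1"])
    out["col1"] = np.where(out["col1"] == 0, out["col3"], out["col1"])
    return out

def fill_nested(frame):
    # One pass: the inner np.where picks col2, or col3 where col2 is zero.
    out = frame.copy()
    out["col1"] = np.where(
        out["col1"] == 0,
        np.where(out["col2"] == 0, out["col3"], out["col2"]),
        out["col1"],
    )
    return out

split_result = fill_split(df)
nested_result = fill_nested(df)
print(split_result["col1"].tolist())
print(nested_result.equals(split_result))
```

The nested form assigns to col1 only once, which is presumably where the small speedup comes from.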
Timings

    # df is the 6-row example DataFrame from the question, repeated 10**4 times
    df = pd.concat([df]*10**4, ignore_index=True)

    def root_nested(df):
        df['col1'] = np.where(df['col1'] == 0, np.where(df['col2'] == 0, df['col3'], df['col2']), df['col1'])
        return df

    def root_split(df):
        df['col1'] = np.where(df['col1'] == 0, df['col2'], df['col1'])
        df['col1'] = np.where(df['col1'] == 0, df['col3'], df['col1'])
        return df

    def pir2(df):
        df['col1'] = df.where(df.ne(0), np.nan).bfill(axis=1).col1.fillna(0)
        return df

    def pir2_2(df):
        slc = (df.values != 0).argmax(axis=1)
        return df.values[np.arange(slc.shape[0]), slc]

    def andrew(df):
        df.col1[df.col1 == 0] = df.col2
        df.col1[df.col1 == 0] = df.col3
        return df

    def pablo(df):
        df['col1'] = df['col1'].replace(0, df['col2'])
        df['col1'] = df['col1'].replace(0, df['col3'])
        return df
I get the following timings:

    %timeit root_nested(df.copy())
    100 loops, best of 3: 2.25 ms per loop

    %timeit root_split(df.copy())
    100 loops, best of 3: 2.62 ms per loop

    %timeit pir2(df.copy())
    100 loops, best of 3: 6.25 ms per loop

    %timeit pir2_2(df.copy())
    1 loop, best of 3: 2.4 ms per loop

    %timeit andrew(df.copy())
    100 loops, best of 3: 8.55 ms per loop
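I tried timing your method as well, but it ran for several minutes without finishing. For comparison, timing your method on just the 6-row example DataFrame (rather than the larger one tested above) took 12.8 ms.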
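If you want to reproduce this kind of comparison outside IPython, where `%timeit` is not available, a rough sketch with the standard `timeit` module could look like the following. The sample data and the `best_ms` helper are assumptions of mine, not part of the benchmark above, and the numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical 6-row frame standing in for the question's example data.
small = pd.DataFrame({
    "col1": [1, 0, 0, 0, 4, 0],
    "col2": [5, 6, 0, 0, 0, 3],
    "col3": [9, 8, 7, 0, 2, 1],
})
big = pd.concat([small] * 10**4, ignore_index=True)

def root_nested(df):
    df["col1"] = np.where(
        df["col1"] == 0,
        np.where(df["col2"] == 0, df["col3"], df["col2"]),
        df["col1"],
    )
    return df

def root_split(df):
    df["col1"] = np.where(df["col1"] == 0, df["col2"], df["col1"])
    df["col1"] = np.where(df["col1"] == 0, df["col3"], df["col1"])
    return df

def best_ms(func, frame, repeat=3, number=10):
    # Hypothetical helper: best average time per call, in milliseconds.
    # Each call gets a fresh copy so the functions do not see mutated input.
    times = timeit.repeat(lambda: func(frame.copy()), repeat=repeat, number=number)
    return min(times) / number * 1000

for func in (root_nested, root_split):
    print(f"{func.__name__}: {best_ms(func, big):.2f} ms per call")
```

As in the timings above, the copy is made inside the timed call, so its cost is included for every variant equally.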