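First back-fill the NaNs along each row with bfill(axis=1), so that the first non-missing value of every row ends up in the first column. Then select that first column with iloc, and use fillna('unknown') for rows in which every value is missing: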
df['result'] = df[['c1','c2','c3','c4']].bfill(axis=1).iloc[:, 0].fillna('unknown')
Or, if the columns to check are all columns after the first one, select them by position:
df['result'] = df.iloc[:, 1:].bfill(axis=1).iloc[:, 0].fillna('unknown')
print (df)
   ID   c1   c2  c3   c4 result
0   1    a    b   a  NaN      a
1   2  NaN   cc  dd   cc     cc
2   3  NaN   ee  ff   ee     ee
3   4  NaN  NaN  gg   gg     gg
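For anyone who wants to run this end to end, here is a minimal sketch. The sample data is reconstructed from the printed output, and the fifth row (ID 5, all values missing) is a hypothetical addition that is not in the original data, included only so the 'unknown' fallback is visible. The intermediate names cols, filled and first_valid are also just for illustration.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the printed output. The fifth row (ID 5) is a
# hypothetical addition so that the fillna('unknown') fallback is exercised.
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'c1': ['a', np.nan, np.nan, np.nan, np.nan],
    'c2': ['b', 'cc', 'ee', np.nan, np.nan],
    'c3': ['a', 'dd', 'ff', 'gg', np.nan],
    'c4': [np.nan, 'cc', 'ee', 'gg', np.nan],
})

cols = ['c1', 'c2', 'c3', 'c4']

# Back-fill along each row: every NaN is replaced by the next valid value to its right.
filled = df[cols].bfill(axis=1)

# After back-filling, the first column holds the first non-missing value of each row
# (or NaN if the whole row was missing).
first_valid = filled.iloc[:, 0]

# Rows with no valid value at all fall back to 'unknown'.
df['result'] = first_valid.fillna('unknown')

print(df)
```

Splitting the one-liner into three steps makes it easier to inspect filled and see what the back-fill did before the first column is picked out.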
Performance:
df = pd.concat([df] * 1000, ignore_index=True)

In [220]: %timeit df['result'] = df[['c1','c2','c3','c4']].bfill(axis=1).iloc[:, 0].fillna('unknown')
100 loops, best of 3: 2.78 ms per loop

In [221]: %timeit df['result'] = df.iloc[:, 1:].bfill(axis=1).iloc[:, 0].fillna('unknown')
100 loops, best of 3: 2.7 ms per loop

#jpp solution
In [222]: %%timeit
     ...: cols = df.iloc[:, 1:].T.apply(pd.Series.first_valid_index)
     ...:
     ...: df['result'] = [df.loc[i, cols[i]] for i in range(len(df.index))]
     ...:
1 loop, best of 3: 180 ms per loop

#cᴏʟᴅsᴘᴇᴇᴅ's solution
In [223]: %timeit df['result'] = df.stack().groupby(level=0).first()
1 loop, best of 3: 606 ms per loop