How to spread a column in a Pandas DataFrame


Use pivot or unstack:

    # df = df[['gene_symbol', 'sample_id', 'fc']]
    df = df.pivot(index='gene_symbol', columns='sample_id', values='fc')
    print(df)
    sample_id       S1     S2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  125.0

    # Alternative, starting again from the original long-format df
    df = df.set_index(['gene_symbol', 'sample_id'])['fc'].unstack(fill_value=0)
    print(df)
    sample_id       S1     S2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  125.0
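For readers who want to run both approaches side by side, here is a minimal self-contained sketch. The sample values are taken from the output shown above; the variable names long_df, wide_pivot and wide_unstack are only illustrative and do not come from the original answer.

```python
import pandas as pd

# Illustrative long-format data: one row per (gene_symbol, sample_id) pair.
long_df = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2'],
})

# Approach 1: pivot spreads sample_id values into columns.
wide_pivot = long_df.pivot(index='gene_symbol', columns='sample_id', values='fc')

# Approach 2: move both keys into the index, then unstack the inner level.
wide_unstack = (
    long_df.set_index(['gene_symbol', 'sample_id'])['fc']
    .unstack(fill_value=0)
)

print(wide_pivot)
print(wide_unstack)
```

Both calls start from the same long-format frame, so neither overwrites the input of the other.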

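However, if the same gene_symbol and sample_id pair appears more than once, you need pivot_table, or an aggregation with groupby. The mean used here can be swapped for sum, median, …: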

    import pandas as pd

    df = pd.DataFrame({
        'fc': [100, 100, 112, 1.3, 14, 125, 100],
        'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
        'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
    })
    print(df)
          fc gene_symbol sample_id
    0  100.0           a        S1
    1  100.0           b        S1
    2  112.0           c        S1
    3    1.3           a        S2
    4   14.0           b        S2
    5  125.0           c        S2 <- same c, S2, different fc
    6  100.0           c        S2 <- same c, S2, different fc

    df = df.pivot(index='gene_symbol', columns='sample_id', values='fc')

ValueError: Index contains duplicate entries, cannot reshape
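Before choosing an aggregation, it can help to see which rows cause the error. The following is a small sketch, not part of the original answer, that uses DataFrame.duplicated to list every row whose (gene_symbol, sample_id) pair occurs more than once. The names df_dup, key_cols and dup_rows are illustrative.

```python
import pandas as pd

# Same sample data as above, including the duplicated (c, S2) pair.
df_dup = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125, 100],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
})

# keep=False marks every member of a duplicated group, not only the repeats.
key_cols = ['gene_symbol', 'sample_id']
dup_rows = df_dup[df_dup.duplicated(subset=key_cols, keep=False)]
print(dup_rows)
```

If dup_rows is empty, plain pivot should work; otherwise the duplicated pairs have to be aggregated first.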

    df = df.pivot_table(index='gene_symbol', columns='sample_id', values='fc', aggfunc='mean')
    print(df)
    sample_id       S1     S2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  112.5

    # Alternative, starting again from the original long-format df
    df = df.groupby(['gene_symbol', 'sample_id'])['fc'].mean().unstack(fill_value=0)
    print(df)
    sample_id       S1     S2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  112.5
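To illustrate how the choice of aggregation changes the duplicated cell, the sketch below applies sum through pivot_table and max through groupby to the same sample data. The choice of sum and max and the names summed and maxed are assumptions made for illustration only.

```python
import pandas as pd

df_dup = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125, 100],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
})

# Sum the duplicated values instead of averaging them.
summed = df_dup.pivot_table(
    index='gene_symbol', columns='sample_id', values='fc', aggfunc='sum'
)

# Keep only the largest value per (gene_symbol, sample_id) pair.
maxed = (
    df_dup.groupby(['gene_symbol', 'sample_id'])['fc']
    .max()
    .unstack(fill_value=0)
)

print(summed)
print(maxed)
```

Only the (c, S2) cell differs between the two results, because it is the only pair with more than one row.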

Edit: to clean up the result, set the columns name to None and call reset_index:

    df.columns.name = None
    df = df.reset_index()
    print(df)
      gene_symbol     S1     S2
    0           a  100.0    1.3
    1           b  100.0   14.0
    2           c  112.0  112.5
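The same cleanup can also be written as a single method chain. This is one possible variant rather than the original author's code: it uses rename_axis(None, axis=1) in place of assigning to df.columns.name, and the names df_dup and tidy are illustrative.

```python
import pandas as pd

df_dup = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125, 100],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
})

tidy = (
    df_dup.pivot_table(index='gene_symbol', columns='sample_id',
                       values='fc', aggfunc='mean')
    .rename_axis(None, axis=1)  # drop the 'sample_id' label on the columns
    .reset_index()              # turn gene_symbol back into a regular column
)
print(tidy)
```

Chaining avoids mutating an intermediate frame in place, which can be convenient inside a longer pipeline.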
