How to spread a column in a Pandas DataFrame

Use pivot or unstack:

    #df = df[['gene_symbol', 'sample_id', 'fc']]
    df = df.pivot(index='gene_symbol', columns='sample_id', values='fc')
    print (df)
    sample_id       S1     S2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  125.0

    df = df.set_index(['gene_symbol', 'sample_id'])['fc'].unstack(fill_value=0)
    print (df)
    sample_id       S1     S2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  125.0
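The two approaches above can be compared side by side. The following is a minimal sketch, assuming a long-format frame with one row per (gene_symbol, sample_id) pair; the sample values mirror the output shown above, and the variable names `wide_pivot` and `wide_unstack` are illustrative only.

```python
import pandas as pd

# Assumed sample data: one row per (gene_symbol, sample_id) pair, no duplicates.
df = pd.DataFrame({
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2'],
    'fc': [100, 100, 112, 1.3, 14, 125],
})

# Approach 1: pivot spreads sample_id values into columns.
wide_pivot = df.pivot(index='gene_symbol', columns='sample_id', values='fc')

# Approach 2: build a two-level index, then move the inner level to columns.
# fill_value only matters when some (gene_symbol, sample_id) pair is missing.
wide_unstack = df.set_index(['gene_symbol', 'sample_id'])['fc'].unstack(fill_value=0)

print(wide_pivot)
print(wide_unstack)
```

One practical difference: with a missing pair, pivot leaves NaN in that cell, while unstack(fill_value=0) writes 0 there.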

But if there are duplicates, you need pivot_table, or aggregation with groupby. The mean can be changed to sum, median, …:

    import pandas as pd

    df = pd.DataFrame({
        'fc': [100, 100, 112, 1.3, 14, 125, 100],
        'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
        'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
    })
    print (df)
          fc gene_symbol sample_id
    0  100.0           a        S1
    1  100.0           b        S1
    2  112.0           c        S1
    3    1.3           a        S2
    4   14.0           b        S2
    5  125.0           c        S2 <- same c, S2, different fc
    6  100.0           c        S2 <- same c, S2, different fc

    # pivot fails here because the pair (c, S2) appears twice
    df = df.pivot(index='gene_symbol', columns='sample_id', values='fc')

ValueError: Index contains duplicate entries, cannot reshape
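Before choosing an aggregation it can help to see which rows cause the error. The sketch below is one possible way to do that, not part of the original answer: it catches the ValueError from pivot and lists the offending rows with `DataFrame.duplicated`. The names `error_message` and `dups` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125, 100],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
})

# pivot cannot decide which fc value to place in the (c, S2) cell.
try:
    df.pivot(index='gene_symbol', columns='sample_id', values='fc')
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# keep=False marks every member of a duplicated (gene_symbol, sample_id) pair.
dups = df[df.duplicated(subset=['gene_symbol', 'sample_id'], keep=False)]
print(dups)
```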

    df = df.pivot_table(index='gene_symbol', columns='sample_id', values='fc', aggfunc='mean')
    print (df)
    sample_id       S1     S2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  112.5

    df = df.groupby(['gene_symbol', 'sample_id'])['fc'].mean().unstack(fill_value=0)
    print (df)
    sample_id       S1     S2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  112.5
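To illustrate swapping the aggregation, here is a short sketch that applies sum through pivot_table and median through groupby to the same duplicated data. The choice of which function goes with which approach is arbitrary; either approach should accept either aggregation. The names `by_sum` and `by_median` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125, 100],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
})

# Sum the duplicated (c, S2) rows: 125 + 100.
by_sum = df.pivot_table(index='gene_symbol', columns='sample_id',
                        values='fc', aggfunc='sum')

# Median of the duplicated (c, S2) rows: midpoint of 100 and 125.
by_median = (df.groupby(['gene_symbol', 'sample_id'])['fc']
               .median()
               .unstack(fill_value=0))

print(by_sum)
print(by_median)
```

Cells backed by a single row are unchanged by either aggregation; only the (c, S2) cell differs between the two results.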

EDIT:

To clean up the result, set the columns name to None and call reset_index:

    df.columns.name = None
    df = df.reset_index()
    print (df)
      gene_symbol     S1     S2
    0           a  100.0    1.3
    1           b  100.0   14.0
    2           c  112.0  112.5
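The same cleanup can also be written as a single chain. This is an alternative sketch, not the original answer's method: it uses `rename_axis(columns=None)` in place of assigning to `columns.name`, and the name `flat` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125, 100],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
})

flat = (
    df.pivot_table(index='gene_symbol', columns='sample_id',
                   values='fc', aggfunc='mean')
      .rename_axis(columns=None)   # drop the 'sample_id' label above the columns
      .reset_index()               # turn gene_symbol back into a regular column
)
print(flat)
```

Chaining avoids mutating the intermediate frame in place, which can be convenient when the wide table is built inside a larger expression.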


Sharing is welcome; when reposting, please credit the source: 内存溢出

Original URL: https://outofmemory.cn/zaji/5643543.html
