Use pivot or unstack:
    # df = df[['gene_symbol', 'sample_id', 'fc']]
    df = df.pivot(index='gene_symbol', columns='sample_id', values='fc')
    print(df)
    sample_id       S1     S2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  125.0
    df = df.set_index(['gene_symbol', 'sample_id'])['fc'].unstack(fill_value=0)
    print(df)
    sample_id       S1     S2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  125.0
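As a self-contained check, the two approaches can be run side by side. This is a minimal sketch: the input frame is an assumption reconstructed from the printed output above (the original input is not shown here), and the names wide_pivot and wide_unstack are illustrative only.

```python
import pandas as pd

# Assumed input, reconstructed from the printed result above:
# one row per (gene_symbol, sample_id) pair, no duplicates.
df = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c'],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2'],
})

# Approach 1: pivot reshapes long data to wide directly.
wide_pivot = df.pivot(index='gene_symbol', columns='sample_id', values='fc')

# Approach 2: build a two-level index, then move sample_id into the columns.
# fill_value=0 only matters when some (gene, sample) pair is missing.
wide_unstack = df.set_index(['gene_symbol', 'sample_id'])['fc'].unstack(fill_value=0)

# Compare the two results cell by cell.
print((wide_pivot.values == wide_unstack.values).all())
```

The main practical difference is how missing pairs are handled: pivot leaves NaN, while unstack(fill_value=0) fills them with 0.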
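However, if the gene_symbol/sample_id pairs contain duplicates, you need pivot_table or an aggregation with groupby. The mean can be swapped for sum, median, and so on: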
    df = pd.DataFrame({
        'fc': [100, 100, 112, 1.3, 14, 125, 100],
        'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
        'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
    })
    print(df)
          fc gene_symbol sample_id
    0  100.0           a        S1
    1  100.0           b        S1
    2  112.0           c        S1
    3    1.3           a        S2
    4   14.0           b        S2
    5  125.0           c        S2  <- same c, S2, different fc
    6  100.0           c        S2  <- same c, S2, different fc

    df = df.pivot(index='gene_symbol', columns='sample_id', values='fc')

ValueError: Index contains duplicate entries, cannot reshape
    df = df.pivot_table(index='gene_symbol', columns='sample_id', values='fc', aggfunc='mean')
    print(df)
    sample_id       S1     S2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  112.5
    df = df.groupby(['gene_symbol', 'sample_id'])['fc'].mean().unstack(fill_value=0)
    print(df)
    sample_id       S1     S2
    gene_symbol
    a            100.0    1.3
    b            100.0   14.0
    c            112.0  112.5
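To illustrate swapping the aggregation, here is a small sketch that reuses the duplicated sample data. The helper to_wide is hypothetical, introduced only for this example; it simply forwards its argument to the aggfunc parameter of pivot_table.

```python
import pandas as pd

# Same sample data as above: (c, S2) appears twice, with fc 125 and 100.
df = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125, 100],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
})

def to_wide(frame, aggfunc='mean'):
    """Hypothetical helper: reshape to wide form, resolving duplicates with aggfunc."""
    return frame.pivot_table(index='gene_symbol', columns='sample_id',
                             values='fc', aggfunc=aggfunc)

wide_mean = to_wide(df)              # duplicates averaged
wide_sum = to_wide(df, 'sum')        # duplicates added together
wide_median = to_wide(df, 'median')  # middle value of the duplicates

print(wide_sum)
```

Only the duplicated cell (c, S2) differs between the results; every other cell has a single value, so any of these aggregations returns it unchanged.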
EDIT:
To tidy up the result, set the columns name to None and call reset_index:
    df.columns.name = None
    df = df.reset_index()
    print(df)
      gene_symbol     S1     S2
    0           a  100.0    1.3
    1           b  100.0   14.0
    2           c  112.0  112.5
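The same cleanup can also be written as one method chain. This is a sketch of an alternative spelling, not the approach used above: rename_axis(None, axis=1) is swapped in for assigning to df.columns.name, and the name flat is illustrative.

```python
import pandas as pd

# Same sample data as above, including the duplicated (c, S2) pair.
df = pd.DataFrame({
    'fc': [100, 100, 112, 1.3, 14, 125, 100],
    'gene_symbol': ['a', 'b', 'c', 'a', 'b', 'c', 'c'],
    'sample_id': ['S1', 'S1', 'S1', 'S2', 'S2', 'S2', 'S2'],
})

flat = (
    df.pivot_table(index='gene_symbol', columns='sample_id',
                   values='fc', aggfunc='mean')
      .rename_axis(None, axis=1)  # drop the 'sample_id' label on the columns axis
      .reset_index()              # turn gene_symbol back into an ordinary column
)
print(flat)
```

The chained form avoids mutating an intermediate frame in place, which can be convenient when the reshape is one step in a longer pipeline.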