python - How to melt values and their errors with pandas



I have the following wide dataset:

    import pandas as pd
    from io import StringIO

    testcsv = """P,N,N_relerr,F,F_relerr
    10,6073.98,0.0022,61.973,0.0036
    12,6412.97,0.0021,65.405,0.0036
    4,4141.24,0.0019,42.8202,0.0032
    6,5009.83,0.0019,51.9615,0.0031
    8,5601.87,0.0025,57.8129,0.0042
    """
    csvfile = StringIO(testcsv)
    df = pd.read_csv(csvfile)

        P   N           N_relerr  F         F_relerr
    0   10  6073.98     0.0022    61.9730   0.0036
    1   12  6412.97     0.0021    65.4050   0.0036
    2   4   4141.24     0.0019    42.8202   0.0032
    3   6   5009.83     0.0019    51.9615   0.0031
    4   8   5601.87     0.0025    57.8129   0.0042

I want to turn it into a long dataset that holds the "counts" (the N and F columns) together with their associated errors (N_relerr and F_relerr):

        P   which   count       err
    0   10  N       6073.9800   0.0022
    1   12  N       6412.9700   0.0021
    2   4   N       4141.2400   0.0019
    3   6   N       5009.8300   0.0019
    4   8   N       5601.8700   0.0025
    5   10  F       61.9730     0.0036
    6   12  F       65.4050     0.0036
    7   4   F       42.8202     0.0032
    8   6   F       51.9615     0.0031
    9   8   F       57.8129     0.0042

I need this format to plot error bars with plotnine, with the 'N' and 'F' counts distinguished from each other. My current, very ugly solution is:

    # Melt the counts and the errors separately, then glue the error column on.
    dflong = (df[['P', 'N', 'F']]
              .melt(id_vars=['P'], var_name='which', value_name='count'))
    dferr = (df[['P', 'N_relerr', 'F_relerr']]
             .melt(id_vars=['P'], value_name='count_relerr'))
    dflong['err'] = dferr['count_relerr'].copy()

My guess is that there is an elegant way to do this using MultiIndex columns together with stack, starting from a dataset that looks like this:

                N                   F
        P       counts    relerr    counts    relerr
    0   10      6073.98   0.0022    61.9730   0.0036
    1   12      6412.97   0.0021    65.4050   0.0036
    2   4       4141.24   0.0019    42.8202   0.0032
    3   6       5009.83   0.0019    51.9615   0.0031
    4   8       5601.87   0.0025    57.8129   0.0042

I can create that dataframe like this:

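    # Map each flat column name to a (top level, sub level) tuple.
    cols = {'P': 'P',
            'N': ('N', 'counts'), 'N_relerr': ('N', 'relerr'),
            'F': ('F', 'counts'), 'F_relerr': ('F', 'relerr')}
    nested_df = df.rename(columns=cols)
    # Pad the remaining plain names (here only 'P') to tuples with an empty top level.
    nested_df.columns = [c if isinstance(c, tuple) else ('', c)
                         for c in nested_df.columns]
    nested_df.columns = pd.MultiIndex.from_tuples(nested_df.columns)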

(I think there must be a better way to do that), but I haven't figured out how to use stack effectively to get what I want.

Does anyone know the canonical solution? Thanks!
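
For reference, the MultiIndex-plus-stack idea described in the question can be sketched roughly as follows. This is only an illustration, not part of the original post: it assumes that P is moved into the row index first (so it does not need an empty top-level label), and the level name `which` and the sub-level labels `count` and `err` are chosen here to match the desired output.

```python
import pandas as pd
from io import StringIO

testcsv = """P,N,N_relerr,F,F_relerr
10,6073.98,0.0022,61.973,0.0036
12,6412.97,0.0021,65.405,0.0036
4,4141.24,0.0019,42.8202,0.0032
6,5009.83,0.0019,51.9615,0.0031
8,5601.87,0.0025,57.8129,0.0042
"""
df = pd.read_csv(StringIO(testcsv))

# Assumption: keep P in the row index so every remaining column
# can be labelled as a (which, kind) pair.
wide = df.set_index('P')
wide.columns = pd.MultiIndex.from_tuples(
    [('N', 'count'), ('N', 'err'), ('F', 'count'), ('F', 'err')],
    names=['which', None],
)

# Move the outer column level ("which") into the row index,
# leaving 'count' and 'err' as the only columns.
stacked = wide.stack(level='which')

# Turn the (P, which) index back into ordinary columns.
long_df = stacked.reset_index()
print(long_df)
```

The row order differs from the table in the question (rows are grouped by P rather than by N/F), and recent pandas versions may emit a FutureWarning about the stack implementation, so treat this as a starting point rather than a canonical recipe.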

Best answer: You can use pd.wide_to_long, which is well suited to this kind of "simultaneous melt"; you only need to rename the columns slightly.

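    import pandas as pd
    from io import StringIO

    testcsv = """P,N,N_relerr,F,F_relerr
    10,6073.98,0.0022,61.973,0.0036
    12,6412.97,0.0021,65.405,0.0036
    4,4141.24,0.0019,42.8202,0.0032
    6,5009.83,0.0019,51.9615,0.0031
    8,5601.87,0.0025,57.8129,0.0042
    """
    csvfile = StringIO(testcsv)
    df = pd.read_csv(csvfile)

    # Rename columns with set_axis so they follow the <stub>_<suffix> pattern.
    # (The old inplace=False argument is omitted: set_axis returns a new frame
    # by default, and the argument no longer exists in pandas 2.x.)
    d1 = df.set_axis(['P', 'Count_N', 'Err_N', 'Count_F', 'Err_F'], axis=1)

    # Use pd.wide_to_long to reshape the dataframe
    pd.wide_to_long(d1, ['Count', 'Err'], 'P', 'which', sep='_', suffix='.+')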

Output:

                  Count     Err
    P  which
    10 N      6073.9800  0.0022
    12 N      6412.9700  0.0021
    4  N      4141.2400  0.0019
    6  N      5009.8300  0.0019
    8  N      5601.8700  0.0025
    10 F        61.9730  0.0036
    12 F        65.4050  0.0036
    4  F        42.8202  0.0032
    6  F        51.9615  0.0031
    8  F        57.8129  0.0042
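
The result above is indexed by (P, which), whereas the question asks for plain columns named P, which, count and err. A minimal sketch of that last step is shown below. The `ymin` and `ymax` columns are a hypothetical addition, not part of the original answer: they assume the `*_relerr` values are relative errors that should be scaled by the count to get absolute bounds, which is the form plotnine's `geom_errorbar` expects for its `ymin` and `ymax` aesthetics.

```python
import pandas as pd
from io import StringIO

testcsv = """P,N,N_relerr,F,F_relerr
10,6073.98,0.0022,61.973,0.0036
12,6412.97,0.0021,65.405,0.0036
4,4141.24,0.0019,42.8202,0.0032
6,5009.83,0.0019,51.9615,0.0031
8,5601.87,0.0025,57.8129,0.0042
"""
df = pd.read_csv(StringIO(testcsv))

# Same reshaping as in the answer.
d1 = df.set_axis(['P', 'Count_N', 'Err_N', 'Count_F', 'Err_F'], axis=1)
long_indexed = pd.wide_to_long(d1, ['Count', 'Err'], i='P', j='which',
                               sep='_', suffix='.+')

# Turn the (P, which) index into ordinary columns and use the
# column names requested in the question.
dflong = (long_indexed
          .reset_index()
          .rename(columns={'Count': 'count', 'Err': 'err'}))

# Hypothetical extra step: treat err as a relative error and derive
# absolute lower/upper bounds for drawing error bars.
dflong['ymin'] = dflong['count'] * (1 - dflong['err'])
dflong['ymax'] = dflong['count'] * (1 + dflong['err'])
print(dflong)
```

If the errors are already absolute, the last two lines would instead subtract and add `err` directly.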

Original URL: http://outofmemory.cn/langs/1199467.html
