I have the following wide dataset:

import pandas as pd
from io import StringIO

testcsv = """P,N,N_relerr,F,F_relerr
10,6073.98,0.0022,61.973,0.0036
12,6412.97,0.0021,65.405,0.0036
4,4141.24,0.0019,42.8202,0.0032
6,5009.83,0.0019,51.9615,0.0031
8,5601.87,0.0025,57.8129,0.0042
"""
csvfile = StringIO(testcsv)
df = pd.read_csv(csvfile)

    P        N  N_relerr        F  F_relerr
0  10  6073.98    0.0022  61.9730    0.0036
1  12  6412.97    0.0021  65.4050    0.0036
2   4  4141.24    0.0019  42.8202    0.0032
3   6  5009.83    0.0019  51.9615    0.0031
4   8  5601.87    0.0025  57.8129    0.0042
I want to turn it into a long dataset that holds the counts (columns N and F) together with their associated errors (N_relerr and F_relerr):

    P which      count     err
0  10     N  6073.9800  0.0022
1  12     N  6412.9700  0.0021
2   4     N  4141.2400  0.0019
3   6     N  5009.8300  0.0019
4   8     N  5601.8700  0.0025
5  10     F    61.9730  0.0036
6  12     F    65.4050  0.0036
7   4     F    42.8202  0.0032
8   6     F    51.9615  0.0031
9   8     F    57.8129  0.0042
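I need this format to draw error bars with plotnine, with the 'N' and 'F' counts distinguished from each other. My current, very ugly solution is:

# Melt the counts and the errors separately, then glue them together by row position
dflong = (df[['P', 'N', 'F']]
          .melt(id_vars=['P'], var_name='which', value_name='count'))
dferr = (df[['P', 'N_relerr', 'F_relerr']]
         .melt(id_vars=['P'], value_name='count_relerr'))
dflong['err'] = dferr['count_relerr'].copy()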
My guess is that there is an elegant way to do this with MultiIndex columns and stack, starting from a dataset that looks like this:

             N                F
    P   counts  relerr   counts  relerr
0  10  6073.98  0.0022  61.9730  0.0036
1  12  6412.97  0.0021  65.4050  0.0036
2   4  4141.24  0.0019  42.8202  0.0032
3   6  5009.83  0.0019  51.9615  0.0031
4   8  5601.87  0.0025  57.8129  0.0042
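I can create that DataFrame like this:

# Map each flat column name to a (group, field) tuple; 'P' stays a plain label
cols = {'P': 'P',
        'N': ('N', 'counts'),
        'N_relerr': ('N', 'relerr'),
        'F': ('F', 'counts'),
        'F_relerr': ('F', 'relerr')}
nested_df = df.rename(columns=cols)
# Pad non-tuple labels so that every column is a 2-tuple
nested_df.columns = [c if isinstance(c, tuple) else ('', c)
                     for c in nested_df.columns]
nested_df.columns = pd.MultiIndex.from_tuples(nested_df.columns)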
(I think there must be a better way), but I haven't figured out how to use stack effectively to get what I want.
Does anyone know a canonical solution? Thanks!
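Best answer

You can use pd.wide_to_long, which is well suited to this kind of "simultaneous melt"; you only need to rename the columns slightly.

import pandas as pd
from io import StringIO

testcsv = """P,N,N_relerr,F,F_relerr
10,6073.98,0.0022,61.973,0.0036
12,6412.97,0.0021,65.405,0.0036
4,4141.24,0.0019,42.8202,0.0032
6,5009.83,0.0019,51.9615,0.0031
8,5601.87,0.0025,57.8129,0.0042
"""
csvfile = StringIO(testcsv)
df = pd.read_csv(csvfile)

# Rename columns with set_axis so they share the stubs 'Count' and 'Err'
# (set_axis returns a new DataFrame; the inplace argument was removed in pandas 2.0)
d1 = df.set_axis(['P', 'Count_N', 'Err_N', 'Count_F', 'Err_F'], axis=1)

# Use pd.wide_to_long to reshape the DataFrame
pd.wide_to_long(d1, ['Count', 'Err'], 'P', 'which', sep='_', suffix='.+')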
Output:

              Count     Err
P  which
10 N      6073.9800  0.0022
12 N      6412.9700  0.0021
4  N      4141.2400  0.0019
6  N      5009.8300  0.0019
8  N      5601.8700  0.0025
10 F        61.9730  0.0036
12 F        65.4050  0.0036
4  F        42.8202  0.0032
6  F        51.9615  0.0031
8  F        57.8129  0.0042
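
The MultiIndex-plus-stack route guessed at in the question can probably be made to work as well. The following is only a sketch under assumptions, not part of the original answer: the names wide and long_df are made up for illustration, and the column tuples are written by hand to match the five columns of the sample data. The idea is to move P into the index, label the remaining columns as (which, field) pairs, and then stack the "which" level into the rows.

```python
import pandas as pd
from io import StringIO

testcsv = """P,N,N_relerr,F,F_relerr
10,6073.98,0.0022,61.973,0.0036
12,6412.97,0.0021,65.405,0.0036
4,4141.24,0.0019,42.8202,0.0032
6,5009.83,0.0019,51.9615,0.0031
8,5601.87,0.0025,57.8129,0.0042
"""
df = pd.read_csv(StringIO(testcsv))

# Keep P out of the column MultiIndex by making it the row index
wide = df.set_index("P")

# Hand-written (which, field) labels; assumes the column order N, N_relerr, F, F_relerr
wide.columns = pd.MultiIndex.from_tuples(
    [("N", "count"), ("N", "err"), ("F", "count"), ("F", "err")],
    names=["which", None],
)

# Stack the 'which' level into the rows, then flatten the index into columns
long_df = wide.stack(level="which").reset_index()
print(long_df)
```

The row order from stack may differ from the wide_to_long result depending on the pandas version, so sort explicitly (for example with sort_values) if the order matters for plotting. Recent pandas 2.x releases may also emit a FutureWarning about the stack implementation.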