Extracting XML into a DataFrame with the parent attribute as column headers


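I suggest first parsing everything into a single DataFrame, much as you are already doing (see the implementation below), and then reshaping it to fit your requirements.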

What you are looking for at that point is pivot:

In [11]: df
Out[11]:
  child  Time  grandchild
0  blah  1200         100
1  blah  1300          30
2   abc  1200           2
3   abc  1300           4
4   abc  1400           2

In [12]: df.pivot(index='Time', columns='child', values='grandchild')
Out[12]:
child  abc  blah
Time
1200     2   100
1300     4    30
1400     2   NaN
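As a minimal, self-contained sketch of the reshaping step: the frame below simply retypes the sample values from the session above, and the keyword form of pivot is used because pandas 2.0 and later no longer accept these arguments positionally. The variable name wide is only illustrative.

```python
import pandas as pd

# Long-format sample data retyped from the session above.
df = pd.DataFrame(
    {
        "child": ["blah", "blah", "abc", "abc", "abc"],
        "Time": [1200, 1300, 1200, 1300, 1400],
        "grandchild": [100, 30, 2, 4, 2],
    }
)

# One row per Time, one column per child name.
# (Time, child) pairs that never occur in the data become NaN.
wide = df.pivot(index="Time", columns="child", values="grandchild")
print(wide)
```

Because blah has no reading at 1400, that cell is missing, which is why the result cannot stay purely integer.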

I suggest parsing the file first and then collecting the values you need into a list of tuples:

from lxml import etree

# etree.parse returns an ElementTree; getroot() gives the root element.
root = etree.parse(file_name).getroot()
# The elements of interest sit under the first child of the root.
parents = list(root[0])

In [21]: elems = [(p.attrib['name'], int(c.attrib['Time']), int(gc.text))
   ....:          for p in parents
   ....:          for c in p
   ....:          for gc in c]

In [22]: elems
Out[22]:
[('blah', 1200, 100),
 ('blah', 1300, 30),
 ('blah', 1400, 70),
 ('abc', 1200, 2),
 ('abc', 1300, 4),
 ('abc', 1400, 2)]
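To try the extraction without a file on disk, here is a rough sketch that runs on an inline string. Two things are assumptions here: the XML layout (and the tag names root, wrapper, parent, child, grandchild) is only inferred from the comprehension above, and the standard library's xml.etree.ElementTree is swapped in for lxml so the snippet has no third-party dependency.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout inferred from the comprehension above:
# root > wrapper > parent(name=...) > child(Time=...) > grandchild(text).
SAMPLE_XML = """
<root>
  <wrapper>
    <parent name="blah">
      <child Time="1200"><grandchild>100</grandchild></child>
      <child Time="1300"><grandchild>30</grandchild></child>
    </parent>
    <parent name="abc">
      <child Time="1200"><grandchild>2</grandchild></child>
    </parent>
  </wrapper>
</root>
"""

root = ET.fromstring(SAMPLE_XML.strip())
parents = list(root[0])  # elements under the first child of the root

# One tuple per grandchild: (parent name, child Time, grandchild value).
elems = [
    (p.attrib["name"], int(c.attrib["Time"]), int(gc.text))
    for p in parents
    for c in p
    for gc in c
]
print(elems)
```

Iterating an element directly yields its child elements, so no getchildren() call is needed.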

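For multiple files, you can even cram this into one longer list comprehension (here files is a list of XML files). Unless you have a huge number of XML files, this shouldn't be too slow…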

elems = [(p.attrib['name'], int(c.attrib['Time']), int(gc.text))
         for f in files
         for p in etree.parse(f).getroot()[0]
         for c in p
         for gc in c]
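If the single comprehension gets hard to read, the same multi-file loop can be written as a small generator. This is only a sketch: iter_rows, TEMPLATE, and the temporary files are hypothetical names and fixtures made up for the demonstration, the XML layout is the same assumed one as before, and the standard library parser stands in for lxml.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def iter_rows(paths):
    """Yield (name, Time, value) tuples from each XML file in turn."""
    for path in paths:
        root = ET.parse(path).getroot()
        for parent in root[0]:  # elements under the first child of the root
            for child in parent:
                for grandchild in child:
                    yield (
                        parent.attrib["name"],
                        int(child.attrib["Time"]),
                        int(grandchild.text),
                    )


# Hypothetical one-parent file used only to exercise the function.
TEMPLATE = (
    '<root><wrapper><parent name="{name}">'
    '<child Time="1200"><grandchild>{value}</grandchild></child>'
    "</parent></wrapper></root>"
)

with tempfile.TemporaryDirectory() as tmp:
    files = []
    for name, value in [("blah", 100), ("abc", 2)]:
        path = Path(tmp) / f"{name}.xml"
        path.write_text(TEMPLATE.format(name=name, value=value), encoding="utf-8")
        files.append(path)
    # Consume the generator while the temporary files still exist.
    elems = list(iter_rows(files))

print(elems)
```

A generator also keeps only one parsed tree in memory at a time, which may matter if the files are large.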

Put them in a DataFrame:

In [23]: pd.DataFrame(elems, columns=['child', 'Time', 'grandchild'])
Out[23]:
  child  Time  grandchild
0  blah  1200         100
1  blah  1300          30
2  blah  1400          70
3   abc  1200           2
4   abc  1300           4
5   abc  1400           2

Then do the pivot. :)
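One thing to watch for when rows come from several files: pivot raises a ValueError if the same (Time, child) pair appears more than once. A hedged sketch of one way around that is pivot_table with an explicit aggregation. The sample tuples and the choice of "sum" are assumptions for illustration; which aggregation is right depends on your data.

```python
import pandas as pd

# Sample rows with a duplicated ("blah", 1200) pair, as might happen
# when two files report the same parent and Time.
elems = [("blah", 1200, 100), ("blah", 1200, 5), ("abc", 1200, 2), ("abc", 1300, 4)]
df = pd.DataFrame(elems, columns=["child", "Time", "grandchild"])

try:
    df.pivot(index="Time", columns="child", values="grandchild")
    pivot_failed = False
except ValueError as exc:
    # Duplicate (index, columns) pairs cannot be reshaped by pivot.
    pivot_failed = True
    print("pivot failed:", exc)

# pivot_table combines duplicates with the given aggregation instead.
wide = df.pivot_table(
    index="Time", columns="child", values="grandchild", aggfunc="sum"
)
print(wide)
```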


Original URL: http://outofmemory.cn/zaji/5667112.html
