Extracting field lists from reStructuredText

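You could try something like the code below. Rather than using the publish_parts method, I used publish_doctree to get the pseudo-XML representation of your document. I then converted that to an XML DOM in order to extract all the field elements. Finally, for each field element I take its first field_name and field_body elements.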

from docutils.core import publish_doctree

source = """Some text ...

:foo: bar

Some text ...
"""

# Parse reStructuredText input, returning the Docutils doctree as
# an `xml.dom.minidom.Document` instance.
doctree = publish_doctree(source).asdom()

# Get all field elements in the document.
fields = doctree.getElementsByTagName('field')

d = {}

for field in fields:
    # I am assuming that `getElementsByTagName` only returns one element.
    field_name = field.getElementsByTagName('field_name')[0]
    field_body = field.getElementsByTagName('field_body')[0]

    # The attribute is spelled `nodeValue`; the text lives in a child Text node.
    d[field_name.firstChild.nodeValue] = \
        " ".join(c.firstChild.nodeValue for c in field_body.childNodes)

print(d)  # Prints {'foo': 'bar'}
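The firstChild.nodeValue lookups above only work when the first child of each node is plain text. As a minimal sketch of a more forgiving approach, the helper below walks a minidom node and joins the text of all its descendants. The name node_text and the sample XML are hypothetical: the XML is a hand-written stand-in for one field whose body contains inline markup, not actual Docutils output.

```python
from xml.dom import minidom


def node_text(node):
    """Concatenate the text of every descendant Text node of `node`."""
    if node.nodeType == node.TEXT_NODE:
        return node.nodeValue
    return "".join(node_text(child) for child in node.childNodes)


# Hand-written stand-in (an assumption) for a field with inline markup.
SAMPLE = (
    "<field><field_name>foo</field_name>"
    "<field_body><paragraph><emphasis>bar</emphasis> baz</paragraph>"
    "</field_body></field>"
)

field = minidom.parseString(SAMPLE).documentElement
name = field.getElementsByTagName("field_name")[0]
body = field.getElementsByTagName("field_body")[0]

# An element's own nodeValue is None; its text sits in child Text nodes.
print(name.nodeValue, name.firstChild.nodeValue)
print(node_text(name), "=", node_text(body))
```

With a helper like this, the dictionary in the loop could be filled as d[node_text(field_name)] = node_text(field_body), at the cost of flattening any inline markup in the field body.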

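The xml.dom module isn't the easiest to work with (why do I need to use .firstChild.nodeValue rather than just .nodeValue, for example?), so you may prefer the xml.etree.ElementTree module, which I find a lot easier to work with. If you use lxml, you can also use XPath notation to find all of the field, field_name and field_body elements.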
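As a rough sketch of the ElementTree alternative, the function below takes the doctree serialized as an XML string and builds the same dictionary. The name extract_fields and SAMPLE_XML are hypothetical; SAMPLE_XML is a hand-written stand-in for the serialized doctree so the example runs with the standard library alone. With Docutils installed, the string could presumably be obtained from publish_doctree(source).asdom().toxml().

```python
import xml.etree.ElementTree as ET


def extract_fields(xml_text):
    """Map each field name to its body text in a doctree XML string."""
    root = ET.fromstring(xml_text)
    result = {}
    # ".//field" is part of ElementTree's limited XPath subset.
    for field in root.findall(".//field"):
        name = field.find("field_name")
        body = field.find("field_body")
        if name is None or body is None:
            continue  # skip malformed fields
        key = "".join(name.itertext())
        # Join the text of each body child (usually paragraphs) with spaces.
        result[key] = " ".join("".join(child.itertext()) for child in body)
    return result


# Hand-written stand-in (an assumption) for the serialized doctree.
SAMPLE_XML = """<document>
  <paragraph>Some text ...</paragraph>
  <field_list>
    <field>
      <field_name>foo</field_name>
      <field_body><paragraph>bar</paragraph></field_body>
    </field>
  </field_list>
  <paragraph>Some text ...</paragraph>
</document>"""

print(extract_fields(SAMPLE_XML))
```

Here itertext() gathers text from nested elements, so there is no need to reach through a first child. lxml accepts similar path expressions through its own xpath() method, which is not shown here.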


Original article: http://outofmemory.cn/zaji/5674776.html
