Can we use XPath with BeautifulSoup?

No, BeautifulSoup by itself does not support XPath expressions.

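An alternative library, lxml, does support XPath 1.0. It has a BeautifulSoup-compatible mode in which it tries to parse broken HTML the way Soup does. However, the default lxml HTML parser does just as good a job of parsing broken HTML, and I believe it is faster.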

Once you have parsed your document into an lxml tree, you can use the .xpath() method to search for elements.

    try:
        # Python 2
        from urllib2 import urlopen
    except ImportError:
        # Python 3
        from urllib.request import urlopen

    from lxml import etree

    url = "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
    response = urlopen(url)
    htmlparser = etree.HTMLParser()
    tree = etree.parse(response, htmlparser)
    # xpathselector is a placeholder for your own XPath 1.0 expression string
    tree.xpath(xpathselector)
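To try the .xpath() method without a network request, here is a minimal sketch that parses an in-memory string instead. The HTML sample and the variable names are made up for illustration; the markup is deliberately broken (unclosed cells and rows) to show the default lxml HTML parser recovering from it.

```python
from lxml import etree

# Made-up, deliberately broken HTML: <td>, <tr> and <table> are never closed.
BROKEN_HTML = """<html><body>
<table id="foobar">
  <tr><td class="empformbody">Alice<td class="other">x
  <tr><td class="empformbody">Bob
</body></html>"""

parser = etree.HTMLParser()
root = etree.fromstring(BROKEN_HTML, parser)

# XPath 1.0 expression: every <td> whose class attribute is exactly "empformbody".
cells = root.xpath('//td[@class="empformbody"]')
names = [cell.text.strip() for cell in cells]
print(names)
```

Note that @class="empformbody" is an exact string comparison, so a cell with several classes would not match this particular expression.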

There is also a dedicated lxml.html module with additional functionality.
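As a rough illustration of that additional functionality, the sketch below uses two HTML-specific helpers that lxml.html elements provide on top of .xpath(): text_content() and make_links_absolute(). The sample markup and the base URL are assumptions chosen for the example.

```python
import lxml.html

# Made-up sample document for illustration.
SAMPLE = (
    "<html><body>"
    "<p>Hello <b>world</b></p>"
    '<a href="/about">About</a>'
    "</body></html>"
)

root = lxml.html.fromstring(SAMPLE)

# text_content() concatenates the text of an element and all its descendants.
paragraph = root.xpath("//p")[0]
text = paragraph.text_content()

# make_links_absolute() rewrites relative links in place against a base URL.
root.make_links_absolute("http://www.example.com/")
hrefs = root.xpath("//a/@href")

print(text)
print(hrefs)
```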

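Note that in the above example I passed the response object directly to lxml, because having the parser read directly from the stream is more efficient than first reading the response into a large string. To do the same with the requests library, set stream=True and pass in the response.raw object after enabling transparent transport decompression: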

    import lxml.html
    import requests

    url = "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
    response = requests.get(url, stream=True)
    # Have the raw stream transparently decompress gzip/deflate transport encoding
    response.raw.decode_content = True
    tree = lxml.html.parse(response.raw)
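The parser only needs a file-like object to read from, which is why a raw response stream works. A small offline sketch, assuming an io.BytesIO object as a stand-in for the network stream (the HTML bytes are invented for the example):

```python
import io

import lxml.html

# io.BytesIO stands in for a network stream such as response.raw (assumption for the demo).
html_bytes = (
    b"<html><head><title>Demo</title></head>"
    b"<body><p>streamed</p></body></html>"
)
stream = io.BytesIO(html_bytes)

# lxml.html.parse() reads from the file-like object and returns an ElementTree.
tree = lxml.html.parse(stream)

title = tree.xpath("//title/text()")
paragraphs = tree.xpath("//p/text()")
print(title, paragraphs)
```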

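You may also be interested in the CSS selector support; the CSSSelector class translates CSS statements into XPath expressions, which makes your search for td.empformbody that much easier: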

    # lxml.cssselect requires the separate cssselect package to be installed
    from lxml.cssselect import CSSSelector

    td_empformbody = CSSSelector('td.empformbody')
    for elem in td_empformbody(tree):
        # Do something with these table cells.
        pass

Coming full circle: BeautifulSoup itself does have very complete CSS selector support:

    for cell in soup.select('table#foobar td.empformbody'):
        # Do something with these table cells.
        pass
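For completeness, here is a self-contained sketch of the same selector with BeautifulSoup. It assumes the bs4 package is installed and uses Python's built-in html.parser backend; the table markup is invented for the example.

```python
from bs4 import BeautifulSoup

# Made-up sample table for illustration.
HTML = """<table id="foobar">
<tr><td class="empformbody first">A</td><td class="other">B</td></tr>
<tr><td class="empformbody">C</td></tr>
</table>"""

soup = BeautifulSoup(HTML, "html.parser")

# select() takes a CSS selector and returns a list of matching tags.
cells = [
    cell.get_text(strip=True)
    for cell in soup.select("table#foobar td.empformbody")
]
print(cells)
```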

Sharing is welcome; when reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/zaji/5616570.html
