Navigating the HTML DOM in Python

Several modules can do this; two common choices are lxml and BeautifulSoup.

Here is an lxml example:

import urllib.request
import lxml.html

# Download the page as raw bytes
mysite = urllib.request.urlopen('http://www.google.com').read()
lxml_mysite = lxml.html.fromstring(mysite)

description = lxml_mysite.xpath("//meta[@name='description']")[0]  # meta tag description
text = description.get('content')  # content attribute of the tag

>>> print(text)
"Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for."
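To show navigation beyond a single lookup, here is a minimal lxml sketch that runs without network access. The sample markup and all variable names are made up for illustration and are not part of the original answer.

```python
import lxml.html

# Hypothetical sample markup, used so the sketch runs offline.
HTML = """
<html>
  <head>
    <title>Sample page</title>
    <meta name="description" content="A short sample description.">
  </head>
  <body>
    <ul id="links">
      <li><a href="/first">First</a></li>
      <li><a href="/second">Second</a></li>
    </ul>
  </body>
</html>
"""

root = lxml.html.fromstring(HTML)

# xpath() returns a list, so check for matches before indexing.
matches = root.xpath("//meta[@name='description']")
description = matches[0].get("content") if matches else None

# Collect the target of every link inside the list.
hrefs = [a.get("href") for a in root.xpath("//ul[@id='links']//a")]

# Move from a node to its parent and to its next sibling element.
first_li = root.xpath("//ul[@id='links']/li")[0]
parent_tag = first_li.getparent().tag
next_text = first_li.getnext().text_content()

print(description, hrefs, parent_tag, next_text)
```

Checking the XPath result before indexing avoids an IndexError on pages that have no description meta tag.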

And here is a BeautifulSoup example:

import urllib.request
from bs4 import BeautifulSoup

# Download the page as raw bytes
mysite = urllib.request.urlopen('http://www.google.com').read()
# Name the parser explicitly so the result does not depend on which parsers are installed
soup_mysite = BeautifulSoup(mysite, 'html.parser')

description = soup_mysite.find("meta", {"name": "description"})  # meta tag description
text = description['content']  # text of content attribute

>>> print(text)
u"Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for."
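The same kind of tree navigation can be sketched with BeautifulSoup on an inline string. As before, the sample markup and variable names are hypothetical, and the built-in "html.parser" backend is an assumption chosen to avoid extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used so the sketch runs offline.
HTML = """
<html>
  <head>
    <title>Sample page</title>
    <meta name="description" content="A short sample description.">
  </head>
  <body>
    <ul id="links">
      <li><a href="/first">First</a></li>
      <li><a href="/second">Second</a></li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find() returns None when nothing matches, so guard before subscripting.
meta = soup.find("meta", {"name": "description"})
description = meta["content"] if meta is not None else None

# Collect the target of every link inside the list.
links = soup.find(id="links")
hrefs = [a["href"] for a in links.find_all("a")]

# Move from a node to its parent and to its next sibling tag.
first_li = links.find("li")
parent_tag = first_li.parent.name
next_text = first_li.find_next_sibling("li").get_text()

print(description, hrefs, parent_tag, next_text)
```

Using find_next_sibling("li") rather than the next_sibling attribute skips the whitespace text nodes that sit between tags.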

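Note that BeautifulSoup returns a unicode string (hence the u prefix in its output), whereas lxml does not. This difference is a Python 2 behavior, where lxml returns a plain byte string for ASCII-only text; under Python 3 both libraries return str. Depending on your needs, this may be helpful or harmful.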


