Navigating the HTML DOM in Python


There are many different modules you can use, for example lxml or BeautifulSoup.

Here is an lxml example:

import urllib.request

import lxml.html

# Download the page (requires network access); read() returns bytes
mysite = urllib.request.urlopen('http://www.google.com').read()
lxml_mysite = lxml.html.fromstring(mysite)
# xpath() returns a list; [0] takes the first match (IndexError if there is none)
description = lxml_mysite.xpath("//meta[@name='description']")[0]  # meta tag description
text = description.get('content')  # content attribute of the tag

print(text)
# Sample output:
# Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for.
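The example above needs network access and depends on whatever Google serves at the moment. Below is a minimal offline sketch, assuming a small hypothetical inline page (the `HTML` string, its `id="main"` div and its links are invented for illustration). Besides the same XPath lookup, it shows a few lxml.html calls for moving around the tree: `getparent()`, `get_element_by_id()`, iterating over an element's children, and `iterfind()`.

```python
import lxml.html

# Hypothetical inline page, used only so the sketch runs without network access
HTML = """
<html>
  <head>
    <title>Example page</title>
    <meta name="description" content="A short page description.">
  </head>
  <body>
    <div id="main">
      <p class="intro">Hello</p>
      <a href="/first">First link</a>
      <a href="/second">Second link</a>
    </div>
  </body>
</html>
"""

# For a full document, fromstring() returns the root <html> element
root = lxml.html.fromstring(HTML)

# Same meta-description lookup as in the example above
meta = root.xpath("//meta[@name='description']")[0]
description = meta.get("content")

# Move up the tree: the parent of <meta> is <head>
head = meta.getparent()

# Jump straight to an element by its id attribute (KeyError if missing)
main = root.get_element_by_id("main")

# Iterating an element yields its direct child elements in document order
child_tags = [child.tag for child in main]

# iterfind() takes a simple ElementPath expression relative to the element
links = [(a.text, a.get("href")) for a in main.iterfind("a")]

print(description)
print(head.tag, child_tags)
print(links)
```

XPath is convenient for one-off lookups like the meta tag; the parent/child methods are handier when you already hold an element and want to walk relative to it.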

And here is a BeautifulSoup example:

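import urllib.request

from bs4 import BeautifulSoup

# Download the page (requires network access)
mysite = urllib.request.urlopen('http://www.google.com').read()
# Name the parser explicitly; otherwise bs4 guesses one and emits a warning
soup_mysite = BeautifulSoup(mysite, 'html.parser')
# find() returns None if no matching tag exists
description = soup_mysite.find("meta", {"name": "description"})  # meta tag description
text = description['content']  # text of content attribute

print(text)
# Sample output:
# Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for.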
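The same kind of navigation can be sketched with BeautifulSoup against a hypothetical inline page (all markup below is invented for illustration), using `find()`, the `.parent` attribute, `find_all(recursive=False)` for direct children, and `.string`. It passes `"html.parser"`, Python's built-in parser, explicitly; `"lxml"` should also work if lxml is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical inline page, used only so the sketch runs without network access
HTML = """
<html>
  <head>
    <title>Example page</title>
    <meta name="description" content="A short page description.">
  </head>
  <body>
    <div id="main">
      <p class="intro">Hello</p>
      <a href="/first">First link</a>
      <a href="/second">Second link</a>
    </div>
  </body>
</html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Same meta-description lookup as in the example above
meta = soup.find("meta", attrs={"name": "description"})
description = meta["content"]

# Move up the tree: .parent of <meta> is the <head> tag
head_name = meta.parent.name

# Keyword arguments to find() filter on attributes, here id="main"
main = soup.find(id="main")

# recursive=False limits find_all() to direct children (tags only, no text)
child_tags = [child.name for child in main.find_all(recursive=False)]

# Every <a> under main, with its text and href attribute
links = [(a.get_text(), a["href"]) for a in main.find_all("a")]

# soup.title is shorthand for soup.find("title")
title = soup.title.string

print(description)
print(head_name, child_tags)
print(links, title)
```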

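Note that under Python 2, BeautifulSoup returns unicode strings while lxml may return plain byte strings (for ASCII-only text). Depending on your needs, this can be useful or harmful. Under Python 3, which the examples above target, both libraries return str.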
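To check this on your own interpreter, a small sketch like the following extracts the same attribute with both libraries and prints the resulting types. The inline markup, with a non-ASCII character in the description, is hypothetical. On Python 3 both values are expected to be `str` and equal to each other.

```python
import sys

import lxml.html
from bs4 import BeautifulSoup

# Hypothetical page whose description contains a non-ASCII character (é)
HTML = ('<html><head><meta name="description" content="Caf\u00e9 listings">'
        '</head><body></body></html>')

# lxml: XPath lookup, then read the attribute
lxml_text = lxml.html.fromstring(HTML).xpath("//meta[@name='description']")[0].get("content")

# BeautifulSoup: find() the tag, then index the attribute
soup = BeautifulSoup(HTML, "html.parser")
bs4_text = soup.find("meta", attrs={"name": "description"})["content"]

# Report the interpreter version and the type each library returned
print("Python", sys.version_info.major)
print("lxml:", type(lxml_text).__name__, repr(lxml_text))
print("BeautifulSoup:", type(bs4_text).__name__, repr(bs4_text))
```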

Feel free to share; please credit the source when reposting: 内存溢出

Original article: https://outofmemory.cn/zaji/5650628.html
