Can I supply a URL to lxml.etree.parse on Python 3?


The documentation says I can:

lxml can parse from a local file, an HTTP URL or an FTP URL. It also
auto-detects and reads gzip-compressed XML files (.gz).

(From http://lxml.de/parsing.html, under "Parsers")

But a quick experiment seems to suggest otherwise:

Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:45:13) [MSC v.1600 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> parser = etree.HTMLParser()
>>> from urllib.request import urlopen
>>> with urlopen('https://pypi.python.org/simple') as f:
...   tree = etree.parse(f, parser)
...
>>> tree2 = etree.parse('https://pypi.python.org/simple', parser)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 3299, in lxml.etree.parse (src\lxml\lxml.etree.c:72655)
  File "parser.pxi", line 1791, in lxml.etree._parseDocument (src\lxml\lxml.etree.c:106263)
  File "parser.pxi", line 1817, in lxml.etree._parseDocumentFromURL (src\lxml\lxml.etree.c:106564)
  File "parser.pxi", line 1721, in lxml.etree._parseDocFromFile (src\lxml\lxml.etree.c:105561)
  File "parser.pxi", line 1122, in lxml.etree._BaseParser._parseDocFromFile (src\lxml\lxml.etree.c:100456)
  File "parser.pxi", line 580, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:94543)
  File "parser.pxi", line 690, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:96003)
  File "parser.pxi", line 618, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:95015)
OSError: Error reading file 'https://pypi.python.org/simple': Failed to load external entity "https://pypi.python.org/simple"
>>>

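I can use the urlopen approach, but the documentation seems to imply that passing a URL is somehow better. Also, if the documentation is inaccurate, I'm a bit worried about relying on lxml, especially if I start needing to do more complex things.

What is the correct way to parse HTML with lxml from a known URL? And where should I look to find it documented?

Update: I get the same error if I use an http URL instead of an https URL.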

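Solution

The problem is that lxml does not support HTTPS URLs, and http://pypi.python.org/simple redirects to the HTTPS version, which is why the plain http URL fails with the same error.

So for any secure website, you need to read the URL yourself: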

from lxml import etree
from urllib.request import urlopen

parser = etree.HTMLParser()

with urlopen('https://pypi.python.org/simple') as f:
    tree = etree.parse(f, parser)
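
The workaround above can be wrapped in a small helper. The following is only a sketch: the names parse_html_stream and parse_html_url, the timeout default, and the in-memory sample document are assumptions made for illustration and are not part of lxml or of the original answer. It passes the final URL to etree.parse through the base_url argument, so the tree still records where the document came from even though lxml did not fetch it itself.

```python
from io import BytesIO
from urllib.request import urlopen

from lxml import etree


def parse_html_stream(stream, base_url=None):
    """Parse HTML from an open binary file-like object (hypothetical helper)."""
    parser = etree.HTMLParser()
    # base_url records the document's origin, since lxml only sees a stream.
    return etree.parse(stream, parser, base_url=base_url)


def parse_html_url(url, timeout=10):
    """Fetch the page with urllib, then hand the response to lxml.

    urllib handles HTTPS and follows redirects, so lxml never has to
    open the URL itself. The timeout value is an arbitrary assumption.
    """
    with urlopen(url, timeout=timeout) as response:
        # geturl() gives the URL that was actually retrieved after redirects.
        return parse_html_stream(response, base_url=response.geturl())


# Offline demonstration: an in-memory document stands in for the network
# response, so this part runs without contacting any server.
sample = BytesIO(b"<html><body><a href='lxml/'>lxml</a></body></html>")
tree = parse_html_stream(sample, base_url="https://pypi.python.org/simple/")
links = tree.xpath("//a/@href")
```

For a live page the call would be parse_html_url('https://pypi.python.org/simple'), which needs network access. Keeping the parsing step separate from the download makes it possible to try the parsing logic on local data first.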