Error importing Selector from scrapy.selector

转身之间 • 2022-11-12 • Notes • 7 views

Try importing HtmlXPathSelector instead.

    from scrapy.selector import HtmlXPathSelector

Then use the .select() method to query your HTML with XPath. For example:

    sel = HtmlXPathSelector(response)
    site_names = sel.select('//ul/li')

A complete example would look like this:

    from scrapy.spider import BaseSpider
    from scrapy.selector import HtmlXPathSelector


    class DmozSpider(BaseSpider):
        name = "dmoz"
        allowed_domains = ["dmoz.org"]
        start_urls = [
            "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
            "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
        ]

        def parse(self, response):
            # Wrap the response so it can be queried with XPath.
            sel = HtmlXPathSelector(response)
            # Each <li> under a <ul> is treated as one site entry.
            sites = sel.select('//ul/li')
            for site in sites:
                title = site.select('a/text()').extract()
                link = site.select('a/@href').extract()
                desc = site.select('text()').extract()
                print title, link, desc
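To check what the three XPath expressions in the spider pick out without running a crawl, here is a minimal sketch that applies the same queries to a small hand-written snippet. It uses the standard library's xml.etree.ElementTree as a stand-in for HtmlXPathSelector, so it only illustrates the selection logic and is not Scrapy's API. The SAMPLE markup and the extract_sites helper are hypothetical names made up for this illustration, and ElementTree needs well-formed markup, which real pages often are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup standing in for a downloaded page.
SAMPLE = """
<html><body>
  <ul>
    <li><a href="http://example.com/a">Book A</a> - first description</li>
    <li><a href="http://example.com/b">Book B</a> - second description</li>
  </ul>
</body></html>
"""


def extract_sites(markup):
    """Return (title, link, desc) tuples for every <li> inside a <ul>."""
    root = ET.fromstring(markup.strip())
    rows = []
    for site in root.findall(".//ul/li"):  # counterpart of '//ul/li'
        anchor = site.find("a")
        title = anchor.text                # counterpart of 'a/text()'
        link = anchor.get("href")          # counterpart of 'a/@href'
        # Text that sits directly inside <li>, before and after the link,
        # which is roughly what 'text()' matches in this snippet.
        desc = ((site.text or "") + (anchor.tail or "")).strip()
        rows.append((title, link, desc))
    return rows


rows = extract_sites(SAMPLE)
for row in rows:
    print(row)
```

Unlike .extract() in the spider, which returns a list of strings for each query, this sketch returns a single string per field to keep the output easy to read.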

Hope this helps!

Sharing is welcome; when reposting, please credit the source: 内存溢出

Original URL: http://outofmemory.cn/zaji/4929658.html
