CrawlerProcess vs. CrawlerRunner


Scrapy's documentation does a poor job of giving practical examples of the two classes.

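CrawlerProcess
CrawlerProcess assumes that Scrapy is the only thing that will use Twisted's reactor. If you use threads in Python to run other code, that is not always true. Take the following as an example.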

    from scrapy.crawler import CrawlerProcess
    import scrapy

    def notThreadSafe(x):
        """do something that isn't thread-safe"""
        # ...

    class MySpider1(scrapy.Spider):
        # Your first spider definition
        ...

    class MySpider2(scrapy.Spider):
        # Your second spider definition
        ...

    process = CrawlerProcess()
    process.crawl(MySpider1)
    process.crawl(MySpider2)
    process.start()  # the script will block here until all crawling jobs are finished
    notThreadSafe(3)  # it will get executed when the crawlers stop

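As you can see, the function only runs after the crawlers have stopped. What if I want it to run while the crawlers are still crawling, inside the same reactor?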

    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner
    import scrapy

    def notThreadSafe(x):
        """do something that isn't thread-safe"""
        # ...

    class MySpider1(scrapy.Spider):
        # Your first spider definition
        ...

    class MySpider2(scrapy.Spider):
        # Your second spider definition
        ...

    runner = CrawlerRunner()
    runner.crawl(MySpider1)
    runner.crawl(MySpider2)
    d = runner.join()
    d.addBoth(lambda _: reactor.stop())  # stop the reactor once all crawls finish
    reactor.callFromThread(notThreadSafe, 3)  # queued to run on the reactor thread
    reactor.run()  # it will run both crawlers and the code inside the function

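The Runner class is not limited to this use case. You may want custom setup on your reactor (deferreds, threads, getPage, custom error reporting, and so on).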




Original article: http://outofmemory.cn/zaji/5620366.html
