This article demonstrates how to run Scrapy from inside a Python script, shared here for your reference. Because Twisted's reactor cannot be restarted once it has stopped, the script below runs each crawl in a child process and passes the scraped items back through a multiprocessing Queue. The code is as follows:
#!/usr/bin/python
import os
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'project.settings')  # must be set before the other Scrapy imports

from scrapy import log, signals, project
from scrapy.xlib.pydispatch import dispatcher
from scrapy.conf import settings
from scrapy.crawler import CrawlerProcess
from multiprocessing import Process, Queue

class CrawlerScript():
    def __init__(self):
        self.crawler = CrawlerProcess(settings)
        if not hasattr(project, 'crawler'):
            self.crawler.install()
        self.crawler.configure()
        self.items = []
        # collect every scraped item via the item_passed signal
        dispatcher.connect(self._item_passed, signals.item_passed)

    def _item_passed(self, item):
        self.items.append(item)

    def _crawl(self, queue, spider_name):
        spider = self.crawler.spiders.create(spider_name)
        if spider:
            self.crawler.queue.append_spider(spider)
        self.crawler.start()
        self.crawler.stop()
        queue.put(self.items)

    def crawl(self, spider):
        # run each crawl in its own child process, since the Twisted
        # reactor cannot be restarted within the same process
        queue = Queue()
        p = Process(target=self._crawl, args=(queue, spider))
        p.start()
        p.join()
        return queue.get(True)

# Usage
if __name__ == "__main__":
    log.start()
    # This example runs spider1 once and then spider2 three times.
    items = []
    crawler = CrawlerScript()
    items.append(crawler.crawl('spider1'))
    for i in range(3):
        items.append(crawler.crawl('spider2'))
    print items
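Note that the script above targets the legacy Scrapy 0.x API (scrapy.conf, scrapy.project, scrapy.xlib.pydispatch), which was removed in Scrapy 1.0. In modern Scrapy, CrawlerProcess manages the reactor itself and can queue several crawls in one process. Here is a minimal sketch, assuming Scrapy >= 1.0 and a project whose spiders are registered under the names 'spider1' and 'spider2' as in the example above:

#!/usr/bin/python
# Minimal sketch for Scrapy >= 1.0 (assumed environment): CrawlerProcess
# resolves spider names through the project's spider loader.
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

process = CrawlerProcess(get_project_settings())
process.crawl('spider1')  # 'spider1'/'spider2' are the hypothetical names from the example above
process.crawl('spider2')
process.start()  # blocks until all queued crawls have finished

To collect the scraped items the way the class above does, connect a callback to the item_scraped signal on each crawler before starting the process.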
Hopefully this article is of some help to readers doing Python programming.