I would do something like this:
```python
from scrapy.spiders import CrawlSpider
from scrapy.selector import Selector
from selenium import webdriver
import time

class FooSpider(CrawlSpider):
    name = 'foo'
    allow_domains = 'foo.com'
    start_urls = ['foo.com']

    def __init__(self, *args, **kwargs):
        super(FooSpider, self).__init__(*args, **kwargs)
        self.download_delay = 0.25
        self.browser = webdriver.Firefox()
        self.browser.implicitly_wait(60)

    def parse_foo(self, response):
        self.browser.get(response.url)  # load the response URL in the browser
        button = self.browser.find_element_by_xpath("path")  # find the element to click
        button.click()  # click it
        time.sleep(1)  # wait until the page is fully loaded
        source = self.browser.page_source  # get the source of the loaded page
        sel = Selector(text=source)  # create a Selector object
        data = sel.xpath('path/to/the/data')  # select the data
        ...
```
However, it is better not to wait for a fixed amount of time. So instead of time.sleep(1), you can use one of the approaches described at http://www.obeythetestinggoat.com/how-to-get-selenium-to-wait-for-page-load-after-a-click.html.