Headless infinite scrolling with Selenium

This is the combination of approaches that made it work for me in headless mode:

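  • Switch to PhantomJS
  • Masquerade as a different browser by setting a custom User-Agent string
  • Before scrolling to the last tweet, scroll to the top of the page first (and repeat this top-then-bottom scroll a few times for reliability)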
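
The list boils down to a simple control loop: count the loaded items, nudge the page several times, and stop once a full round of nudges adds nothing new. The following is a minimal, Selenium-free sketch of that loop so it can run on its own. `scroll_until_stable`, `count_items`, `nudge_page` and `FakeFeed` are hypothetical names; in the real script they would correspond to counting `li[data-item-id]` elements and running the scroll-to-top / scroll-to-last-tweet JavaScript. The toy feed assumes a page that needs several scroll events per batch, and unlike the Selenium version it checks the count immediately instead of waiting up to 30 seconds.

```python
from typing import Callable


def scroll_until_stable(
    count_items: Callable[[], int],
    nudge_page: Callable[[], None],
    nudges_per_round: int = 5,
    max_rounds: int = 1000,
) -> int:
    """Nudge the page in rounds until a round loads no new items; return the final count."""
    count = count_items()
    for _ in range(max_rounds):
        # Several top-then-bottom scrolls per round give the lazy loader
        # more chances to fire before we decide the feed has ended.
        for _ in range(nudges_per_round):
            nudge_page()
        new_count = count_items()
        if new_count <= count:
            return new_count  # no growth in a full round: treat the feed as exhausted
        count = new_count
    return count


class FakeFeed:
    """Hypothetical toy page: every `nudges_per_load`-th nudge loads `batch` more items."""

    def __init__(self, total: int, batch: int = 20, nudges_per_load: int = 3) -> None:
        self.total = total
        self.batch = batch
        self.nudges_per_load = nudges_per_load
        self.loaded = min(batch, total)
        self.nudges = 0

    def count(self) -> int:
        return self.loaded

    def nudge(self) -> None:
        self.nudges += 1
        if self.nudges % self.nudges_per_load == 0:
            self.loaded = min(self.loaded + self.batch, self.total)


feed = FakeFeed(total=95)
print(scroll_until_stable(feed.count, feed.nudge))  # prints 95
single = FakeFeed(total=95)
print(scroll_until_stable(single.count, single.nudge, nudges_per_round=1))  # prints 20
```

In this toy model a single nudge per round gives up after the first batch, while five nudges per round reach the end of the feed, which mirrors the intuition behind repeating the scroll several times.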

Code:

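import time

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait


class wait_for_more_than_n_elements_to_be_present(object):
    # custom wait condition: true once more than `count` elements match `locator`
    def __init__(self, locator, count):
        self.locator = locator
        self.count = count

    def __call__(self, driver):
        return len(driver.find_elements(*self.locator)) > self.count


def return_html_pre(url):
    # webdriver.PhantomJS exists only in Selenium 3.x; it was removed in Selenium 4
    # pretend to be desktop Chrome via a custom User-Agent string
    dcap = dict(webdriver.DesiredCapabilities.PHANTOMJS)
    dcap["phantomjs.page.settings.userAgent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.57 Safari/537.36"
    driver = webdriver.PhantomJS(desired_capabilities=dcap)
    driver.maximize_window()
    driver.get(url)

    # initial wait for the tweets to load
    wait = WebDriverWait(driver, 30)
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "li[data-item-id]")))

    # scroll down to the last tweet until no more tweets are loaded
    while True:
        tweets = driver.find_elements(By.CSS_SELECTOR, "li[data-item-id]")
        number_of_tweets = len(tweets)
        print(number_of_tweets)

        # move to the top and then to the last tweet, 5 times in a row
        for _ in range(5):
            driver.execute_script("window.scrollTo(0, 0)")
            driver.execute_script("arguments[0].scrollIntoView(true);", tweets[-1])
            time.sleep(0.5)

        # wait up to 30 s for new tweets; a timeout means nothing more will load
        try:
            wait.until(wait_for_more_than_n_elements_to_be_present((By.CSS_SELECTOR, "li[data-item-id]"), number_of_tweets))
        except TimeoutException:
            break

    html = driver.page_source
    driver.quit()
    return html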
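
The loop in the code above terminates through the wait timeout: `WebDriverWait.until` keeps calling the condition object with the driver until it returns something truthy, and raises `TimeoutException` if that never happens, which the `except` clause turns into `break`. The following Selenium-free model illustrates that contract. `poll_until`, `more_than_n_elements` and `FakeDriver` are hypothetical names, and `poll_until` is only a simplified approximation of `WebDriverWait.until` (it raises Python's built-in `TimeoutError` rather than Selenium's exception).

```python
import time
from typing import Any, Callable, List, Tuple


def poll_until(condition: Callable[[Any], Any], driver: Any,
               timeout: float, interval: float = 0.5) -> Any:
    """Call condition(driver) until it is truthy; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition(driver)
        if value:
            return value
        if time.monotonic() > deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


class more_than_n_elements:
    """Condition object: truthy once more than `count` elements match `locator`."""

    def __init__(self, locator: Tuple[str, str], count: int) -> None:
        self.locator = locator
        self.count = count

    def __call__(self, driver: Any) -> bool:
        return len(driver.find_elements(*self.locator)) > self.count


class FakeDriver:
    """Hypothetical stand-in: each find_elements() call 'loads' one more item, up to `cap`."""

    def __init__(self, start: int, cap: int) -> None:
        self.n = start
        self.cap = cap

    def find_elements(self, by: str, value: str) -> List[object]:
        self.n = min(self.n + 1, self.cap)
        return [object()] * self.n


locator = ("css selector", "li[data-item-id]")
growing = FakeDriver(start=10, cap=12)
print(poll_until(more_than_n_elements(locator, 11), growing, timeout=1, interval=0.01))  # prints True

stuck = FakeDriver(start=10, cap=10)
try:
    poll_until(more_than_n_elements(locator, 10), stuck, timeout=0.05, interval=0.01)
except TimeoutError:
    print("no new items: stop scrolling")
```

Passing the element count from the previous round as `count` is what makes the timeout mean "the feed stopped growing" rather than "the page is empty".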


Source: 内存溢出 (outofmemory.cn)
Original URL: http://outofmemory.cn/zaji/5641952.html