Within the current code:
Python 2.X

    import urllib2
    from BeautifulSoup import BeautifulSoup

    site = "http://en.wikipedia.org/wiki/StackOverflow"
    hdr = {'User-Agent': 'Mozilla/5.0'}  # send a browser-like User-Agent
    req = urllib2.Request(site, headers=hdr)
    page = urllib2.urlopen(req)
    soup = BeautifulSoup(page)
    print soup

Python 3.X

    from bs4 import BeautifulSoup
    from urllib.request import Request, urlopen

    site = "http://en.wikipedia.org/wiki/StackOverflow"
    hdr = {'User-Agent': 'Mozilla/5.0'}  # send a browser-like User-Agent
    req = Request(site, headers=hdr)
    page = urlopen(req)
    soup = BeautifulSoup(page, "html.parser")  # name the parser explicitly
    print(soup)

Python 3.X with Selenium (executes JavaScript functions)

    from selenium import webdriver as driver

    # PhantomJS is the driver used here; it is deprecated in later Selenium releases
    browser = driver.PhantomJS()
    browser.get("http://en.wikipedia.org/wiki/StackOverflow")  # get() returns None
    assert "Stack Overflow - Wikipedia" in browser.title
The modified version works because Wikipedia checks that the User-Agent belongs to a "popular browser".
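To see what the header change amounts to, the sketch below compares the User-Agent that urllib sends by default with the one set explicitly on a Request. It makes no network call. The helper name make_request is hypothetical and only for illustration; the URL and the 'Mozilla/5.0' value are the ones used in the answer.

```python
from urllib.request import Request, build_opener


def make_request(url, user_agent="Mozilla/5.0"):
    """Build a Request that carries an explicit User-Agent header.

    make_request is a hypothetical helper, not part of urllib.
    """
    return Request(url, headers={"User-Agent": user_agent})


req = make_request("http://en.wikipedia.org/wiki/StackOverflow")

# urllib normalises header names with str.capitalize(), so the key is "User-agent".
custom_agent = req.get_header("User-agent")

# A default opener adds its own User-agent header, of the form "Python-urllib/<version>".
default_agent = dict(build_opener().addheaders)["User-agent"]

print("custom :", custom_agent)
print("default:", default_agent)
```

The default value identifies the client as a Python script, which is presumably what the server rejects; passing the headers dict replaces it with a browser-like string.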