XPath in Practice: zbj.com (猪八戒网)
Key point: open the page, right-click and choose Inspect, then right-click the element in the source panel and choose Copy XPath.
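A copied XPath can be tried out offline against a small HTML snippet before it is pointed at the live page. The sketch below assumes a made-up snippet (`SAMPLE_HTML` and its tags are hypothetical and are not the real zbj.com markup) and only illustrates the pattern of one absolute path for the list of items plus relative `./` paths inside each item.

```python
from lxml import etree

# Hypothetical markup standing in for a real page (assumption, not zbj.com's HTML)
SAMPLE_HTML = """
<html><body>
  <div id="list">
    <div class="item"><span>¥100</span><p>Shop A</p></div>
    <div class="item"><span>¥250</span><p>Shop B</p></div>
  </div>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# Absolute path: the kind of expression copied from the browser's inspector
items = tree.xpath("/html/body/div/div")

# Relative paths start with "./" and are evaluated against each item
prices = [item.xpath("./span/text()")[0].strip("¥") for item in items]
names = [item.xpath("./p/text()")[0] for item in items]

print(prices, names)
```

Absolute paths such as `/html/body/div[6]/...` break as soon as the page layout shifts, so it is worth re-copying the XPath whenever the selection unexpectedly comes back empty.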
# In[]
#### XPath in practice: zbj.com
## 1. Fetch the page source
## 2. Extract and parse the data
import requests
from lxml import etree

url = "https://beijing.zbj.com/search/f/?type=new&kw=saas"
resp = requests.get(url)
# print(resp.text)

# Parse the HTML
html = etree.HTML(resp.text)

# Select the div of each service provider
divs = html.xpath("/html/body/div[6]/div/div/div[2]/div[5]/div[1]/div")
for div in divs:
    price = div.xpath("./div/div/a[2]/div[2]/div[1]/span[1]/text()")[0].strip("¥")
    businiss = div.xpath("./div/div/a[1]/div[1]/p/text()")[1].strip('\n')
    title = "saas" + div.xpath("./div/div/a[2]/div[2]/div[2]/p/text()")[0]
    num = div.xpath("./div/div/a[2]/div[2]/div[1]/span[2]/text()")[0]
    location = div.xpath("./div/div/a[1]/div[1]/div/span/text()")[0]

    # xpath() returns an empty list when a field is missing, so fall back to "NULL"
    star = div.xpath("./div/div/a[1]/div[2]/span[2]/i[2]/text()")
    star_level = star[0] if star else "NULL"

    issue = div.xpath("./div/div/a[1]/div[2]/span[3]/i/text()")
    location_issue = issue[0] if issue else "NULL"
    # print(location_issue)

    # Print one tab-separated row per service provider
    print(location + '\t' + star_level + '\t' + location_issue + '\t' + businiss + '\t' + title + '\t' + price + '\t' + num)
    # print(div.xpath("./div/div/a[1]/div[2]/span[3]/i/text()"))
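The checks for a missing field in the loop above could be factored into one small helper. This is only a sketch: `first_text` is a hypothetical name that does not appear in the original post, and the one-line HTML fragment is invented for the demonstration.

```python
from lxml import etree


def first_text(node, path, default="NULL"):
    """Return the first text match for `path`, stripped, or `default` if nothing matches."""
    matches = node.xpath(path)
    return matches[0].strip() if matches else default


# Invented fragment used only to exercise the helper
card = etree.HTML("<div><span class='city'> Beijing </span></div>")

city = first_text(card, "//span[@class='city']/text()")
stars = first_text(card, "//i/text()")  # no <i> element, so the default is used
print(city, stars)
```

With a helper like this, each field in the loop becomes a single call, and a missing star rating or location can no longer raise an `IndexError`.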