Fetching a Website's Source Code
Steps and Code
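After getting comfortable with basic Python syntax, I wanted to practice on one of Python's application areas, and web crawlers came to mind. This article covers only the most basic steps for fetching a website's source code.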
1. Import the required libraries
import requests
import re
import time
import json
2. Set up the request headers that imitate a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
}
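To check that the custom header really ends up on the request, the request can be built without being sent. The following is a minimal sketch that assumes only the `requests` library; the URL is used purely as an illustration and nothing goes over the network.

```python
import requests

headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/92.0.4515.131 Safari/537.36')
}

# Build the request object without sending it (no network access needed).
request = requests.Request('GET', 'https://www.maoyan.com/board/4', headers=headers)
prepared = request.prepare()

# The prepared request carries exactly the User-Agent supplied above.
sent_user_agent = prepared.headers['User-Agent']
print(sent_user_agent)

# For comparison: the default User-Agent that requests uses when none is given.
# It starts with "python-requests/", which some sites may refuse to serve.
default_user_agent = requests.utils.default_user_agent()
print(default_user_agent)
```

Comparing the two printed values shows why the header is set: without it, the request openly identifies itself as a script.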
3. Fetch the page and write it to a file
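Complete code:

import requests
import re    # not used yet in this minimal version
import time  # not used yet in this minimal version
import json


def get_one_page(url):
    # Send a browser-like User-Agent so the site is less likely to reject the request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
    }
    response = requests.get(url, headers=headers)
    # Return the page source only when the request succeeded
    if response.status_code == 200:
        return response.text
    return None


def write_to_file(content):
    # Append to result.txt; the with block closes the file automatically
    with open('result.txt', 'a', encoding='utf-8') as f:
        # json.dumps serializes the content (here the HTML string) as one JSON-encoded line
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def main():
    url = 'https://www.maoyan.com/board/4'
    html = get_one_page(url)
    # Skip writing when the request did not return status 200
    if html is not None:
        write_to_file(html)


if __name__ == '__main__':
    main()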
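Because the content is passed through `json.dumps` before it is written, the file holds one JSON-encoded string per line rather than the raw HTML. Below is a minimal sketch of reading such a file back, assuming each record ends with a real newline. The helper names `append_record` and `read_records`, the sample string, and the temporary file are hypothetical and serve only as illustration.

```python
import json
import os
import tempfile


def append_record(path, content):
    # Append one JSON-encoded record per line (hypothetical helper).
    with open(path, 'a', encoding='utf-8') as f:
        f.write(json.dumps(content, ensure_ascii=False) + '\n')


def read_records(path):
    # Decode every non-empty line back into the original Python object.
    with open(path, encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]


# Stand-in for a fetched page; a real page would come from an HTTP request.
sample_html = '<html>\n<body>猫眼电影 "TOP100"</body>\n</html>'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'result.txt')
    append_record(path, sample_html)
    append_record(path, sample_html)
    records = read_records(path)

print(len(records))
print(records[0] == sample_html)
```

Newlines and quotes inside the HTML are escaped by `json.dumps`, so each page stays on a single line and `json.loads` restores it exactly. If only the raw HTML is wanted, writing the string directly without `json.dumps` would be the simpler choice.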