The page being crawled is UTF-8 encoded, but when I save its content with the script below,
from urllib.request import urlopen

def get_url():
    url = 'https://www.hao123.com/'
    resp = urlopen(url)
    with open('baidu.html', mode='w') as file:
        content = resp.read()
        # print(f)
        # file.write(f)
        file.write(content.decode("UTF-8"))
        print('file is done!!')

if __name__ == '__main__':
    get_url()
I get the following error:
UnicodeEncodeError: 'gbk' codec can't encode character '\u2022' in position 252532: illegal multibyte sequence

2. Problem and solution
The cause is that Windows opens files with the system default 'gbk' encoding; changing the encoding to 'UTF-8' when opening the output file fixes it:
with open('baidu.html', mode='w', encoding="utf-8") as file:
That is, encoding="utf-8" is added to the open() call.
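For reference, here is a minimal corrected sketch of the script, assuming the target page is still https://www.hao123.com/ and is served as UTF-8; the locale check is only there to illustrate why 'gbk' appears in the traceback on a Chinese-locale Windows machine:

import locale
from urllib.request import urlopen

def get_url():
    url = 'https://www.hao123.com/'
    resp = urlopen(url)
    content = resp.read()  # raw bytes from the response
    # On a Chinese-locale Windows install this usually prints 'cp936' (gbk),
    # which is what open() falls back to when no encoding is given.
    print('default text encoding:', locale.getpreferredencoding())
    # Pass an explicit encoding so the platform default is not used.
    with open('baidu.html', mode='w', encoding='utf-8') as file:
        file.write(content.decode('utf-8'))
    print('file is done!!')

if __name__ == '__main__':
    get_url()

Alternatively, since resp.read() already returns bytes, the file can be opened with mode='wb' and the bytes written directly, which avoids decoding and re-encoding altogether.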