Why does the Content-Length header differ from a manually calculated length?

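The Content-Length header reflects the size, in bytes, of the response body as it was transmitted. That is not the same as the length of the text or content attribute, because the response may be compressed. requests decompresses the response for you.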
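To make the three lengths concrete, here is a minimal offline sketch. It does not use requests at all; the body text is made up for illustration, and gzip.compress() stands in for whatever compression a server would apply.

```python
import gzip

# Hypothetical response body: repetitive HTML with a few non-ASCII characters.
text = "<p>naïve café</p>\n" * 500

body = text.encode("utf-8")   # decompressed bytes, comparable to r.content
wire = gzip.compress(body)    # compressed bytes, what Content-Length would count

content_length = len(wire)    # size on the wire
byte_count = len(body)        # size after decompression
char_count = len(text)        # size after decoding UTF-8 to str

print(content_length, byte_count, char_count)
```

The compressed size is the smallest of the three, and the character count is smaller than the byte count because each non-ASCII character here takes two bytes in UTF-8.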

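You would have to bypass a lot of internal plumbing to get at the original, compressed, raw content, and then touch some more internals if you want the response object to keep working correctly afterwards. The "easiest" method is to enable streaming, then read from the raw socket: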

    import requests
    from io import BytesIO

    r = requests.get(url, stream=True)

    # read directly from the raw urllib3 connection
    raw_content = r.raw.read()
    content_length = len(raw_content)

    # replace the internal file-object to serve the data again
    r.raw._fp = BytesIO(raw_content)

Demo:

    >>> import requests
    >>> from io import BytesIO
    >>> url = "https://stackoverflow.com"
    >>> r = requests.get(url, stream=True)
    >>> r.headers['Content-Encoding'] # a compressed response
    'gzip'
    >>> r.headers['Content-Length']   # the raw response contains 52055 bytes of compressed data
    '52055'
    >>> r.headers['Content-Type']     # we are served UTF-8 HTML data
    'text/html; charset=utf-8'
    >>> raw_content = r.raw.read()
    >>> len(raw_content)   # the raw content body length
    52055
    >>> r.raw._fp = BytesIO(raw_content)
    >>> len(r.content)    # the decompressed binary content, byte count
    258719
    >>> len(r.text)       # the Unicode content decoded from UTF-8, character count
    258658

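This reads the complete response into memory, so don't use it if you expect large responses! In that case, you could instead use shutil.copyfileobj() to copy the data from the r.raw file into a spooled temporary file (which switches to an on-disk file once a set size is reached), get the size of that file, and then install that file as r.raw._fp.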

A function that adds a Content-Length header to any response that is missing one could look like this:

    import requests
    import shutil
    import tempfile

    def ensure_content_length(
        url, *args, method='GET', session=None, max_size=2**20,  # 1 MiB
        **kwargs
    ):
        kwargs['stream'] = True
        session = session or requests.Session()
        r = session.request(method, url, *args, **kwargs)
        if 'Content-Length' not in r.headers:
            # stream content into a temporary file so we can get the real size
            spool = tempfile.SpooledTemporaryFile(max_size)
            shutil.copyfileobj(r.raw, spool)
            r.headers['Content-Length'] = str(spool.tell())
            spool.seek(0)
            # replace the original socket with our temporary file
            # (_fp is a private urllib3 attribute and may change between versions)
            r.raw._fp.close()
            r.raw._fp = spool
        return r

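This accepts an existing session and lets you specify the request method too. Adjust max_size as needed for your memory constraints. Demo on https://github.com, which lacks a Content-Length header: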

    >>> r = ensure_content_length('https://github.com/')
    >>> r
    <Response [200]>
    >>> r.headers['Content-Length']
    '14490'
    >>> len(r.content)
    54814

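Note that if there is no Content-Encoding header, or its value is set to identity, and Content-Length is available, then you can rely on Content-Length being the full size of the response. In that case no compression has been applied, so the transmitted body and the decoded body are the same bytes.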
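The rule above can be sketched as a small check. The helper name content_length_is_body_size is hypothetical and not part of requests; it takes any mapping of header names to values, such as r.headers.

```python
def content_length_is_body_size(headers):
    """Return True if Content-Length can be read as the decoded body size."""
    # Header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    if "content-length" not in lowered:
        return False
    # A missing Content-Encoding header is treated the same as "identity".
    encoding = lowered.get("content-encoding", "identity").strip().lower()
    return encoding == "identity"


print(content_length_is_body_size({"Content-Length": "120"}))
print(content_length_is_body_size(
    {"Content-Length": "52055", "Content-Encoding": "gzip"}
))
```

The first call prints True and the second prints False, since a gzip-encoded body will be larger once decompressed.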

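As a side note: you should not use sys.getsizeof() if what you are after is the length of a bytes or str object (the number of bytes or characters in that object). sys.getsizeof() gives you the internal memory footprint of the Python object, which covers more than just the bytes or characters it holds. See "What is the difference between len() and sys.getsizeof() methods in python?"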
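A short illustration of the difference, using an arbitrary five-byte value. The exact number reported by sys.getsizeof() depends on the Python implementation and version, so it is not shown here.

```python
import sys

data = b"hello"

length = len(data)               # number of bytes held by the object: 5
footprint = sys.getsizeof(data)  # memory used by the object, including overhead

print(length)
print(footprint)
```

On CPython the footprint is larger than the length because it includes the object header.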


Original URL: http://outofmemory.cn/zaji/5021261.html
