Python returns the wrong string length when using special characters

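UTF-8 is a Unicode encoding that uses more than one byte for special characters. If you don't want the length of the encoded string, simply decode it and call len() on the resulting unicode object (not on the str object!).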

Here are some examples (Python 2):

>>> # creates a str literal (with utf-8 encoding, if this was
>>> # specified on the beginning of the file):
>>> len('ë́aúlt')
9
>>> # creates a unicode literal (you should generally use this
>>> # version if you are dealing with special characters):
>>> len(u'ë́aúlt')
6
>>> # the same str literal (written in an encoded notation):
>>> len('\xc3\xab\xcc\x81a\xc3\xbalt')
9
>>> # you can convert any str to a unicode object by decoding() it:
>>> len('\xc3\xab\xcc\x81a\xc3\xbalt'.decode('utf-8'))
6
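For readers on Python 3, the same distinction can be sketched as follows. This is a minimal illustration rather than part of the original answer; it assumes Python 3, where str holds code points and bytes holds encoded data, and the variable names are made up for the example.

```python
# Python 3 sketch: str counts code points, bytes counts encoded bytes.
# 'ë' + combining acute accent + 'aúlt', written with escapes to be explicit.
text = "\u00eb\u0301a\u00falt"
encoded = text.encode("utf-8")   # the UTF-8 bytes of the same text

print(len(text))      # number of code points
print(len(encoded))   # number of bytes

# Decoding the bytes gives back the original text.
decoded = encoded.decode("utf-8")
print(len(decoded))
```

The text has 6 code points but 9 UTF-8 bytes, which matches the two lengths in the session above.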

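Of course, you can also access single characters in a unicode object just as you would in a str object (both inherit from basestring and therefore share the same methods):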

>>> test = u'ë́aúlt'
>>> print test[0]
ë
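Note that indexing works on code points, so the acute accent in this example is a separate element rather than part of test[0]. A small Python 3 sketch using the standard unicodedata module shows this; it is an illustration added here, not part of the original answer.

```python
import unicodedata

# 'ë' + combining acute accent + 'aúlt'
test = "\u00eb\u0301a\u00falt"

first = test[0]    # the base letter with diaeresis
second = test[1]   # the accent, stored as its own code point

print(unicodedata.name(first))
print(unicodedata.name(second))

# A non-zero combining class marks a character that attaches to the previous one.
is_combining = unicodedata.combining(second) != 0
print(is_combining)
```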

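If you develop localized applications, it is generally a good idea to use only unicode objects internally, by decoding all input you receive. Once the work is done, you can encode the result as 'UTF-8' again. If you stick to this principle, you will never see your server crash because of an internal UnicodeDecodeError ;)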
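The decode-on-input, encode-on-output principle could look roughly like this in Python 3. The function name handle_request and the upper-casing step are hypothetical placeholders for real application logic, and UTF-8 input is assumed.

```python
def handle_request(raw_input: bytes) -> bytes:
    """Hypothetical handler: decode at the edge, work on text, encode on exit."""
    text = raw_input.decode("utf-8")   # bytes -> text as soon as input arrives
    result = text.upper()              # internal work touches text only
    return result.encode("utf-8")      # text -> bytes just before output


response = handle_request(b"\xc3\xbalt")   # UTF-8 bytes for 'últ'
print(response.decode("utf-8"))
```

Keeping the conversions at the two boundaries means the code in between never has to guess which encoding a value is in.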

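PS: Please note that the str and unicode data types have changed significantly in Python 3. In Python 3 there are only unicode strings and plain byte strings, and the two can no longer be mixed. That should help to avoid common pitfalls with unicode handling...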

Regards, Christoph


Original article: http://outofmemory.cn/zaji/5644963.html
