How do I iterate over Unicode characters in Python 3?

On Python 3.2.1 with a narrow Unicode build:

PythonWin 3.2.1 (default, Jul 10 2011, 21:51:15) [MSC v.1500 32 bit (Intel)] on win32.
Portions Copyright 1994-2008 Mark Hammond - see 'Help/about PythonWin' for further copyright information.
>>> import sys
>>> sys.maxunicode
65535

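What you found (strings are stored as UTF-16 code units, so each character above U+FFFF appears as a surrogate pair):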

>>> s = "abc\u20ac\U00010302\U0010fffd"
>>> len(s)
8
>>> for c in s:
...     print('U+{:04X}'.format(ord(c)))
...
U+0061
U+0062
U+0063
U+20AC
U+D800
U+DF02
U+DBFF
U+DFFD
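The four extra values in that output are UTF-16 surrogate pairs. As a rough illustration of where they come from, the sketch below splits a code point above U+FFFF into its high and low surrogates using the standard UTF-16 arithmetic. The helper name `to_surrogate_pair` is made up for this example and is not part of any library.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF are encoded as surrogate pairs")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits select the low surrogate
    return high, low


for cp in (0x10302, 0x10FFFD):
    high, low = to_surrogate_pair(cp)
    print('U+{:04X} -> U+{:04X} U+{:04X}'.format(cp, high, low))
```

For U+10302 and U+10FFFD this gives the pairs U+D800 U+DF02 and U+DBFF U+DFFD, the same values the narrow build printed one at a time.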

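The workaround (encode to UTF-32, which uses exactly four bytes per code point, then unpack the bytes as unsigned 32-bit integers):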

>>> import struct
>>> s = s.encode('utf-32-be')
>>> struct.unpack('>{}L'.format(len(s)//4), s)
(97, 98, 99, 8364, 66306, 1114109)
>>> for i in struct.unpack('>{}L'.format(len(s)//4), s):
...     print('U+{:04X}'.format(i))
...
U+0061
U+0062
U+0063
U+20AC
U+10302
U+10FFFD
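The same workaround can be wrapped in a small function so the original string is not overwritten. This is only a sketch: the name `code_points` is an assumption made for the example, and it relies on nothing beyond `str.encode` and `struct.unpack` as used in the session above.

```python
import struct


def code_points(text):
    """Return the Unicode code points of text as a list of integers.

    Hypothetical helper: encodes to big-endian UTF-32 (four bytes per
    code point, no BOM) and unpacks the bytes as unsigned 32-bit integers.
    """
    data = text.encode('utf-32-be')
    count = len(data) // 4
    return list(struct.unpack('>{}L'.format(count), data))


s = "abc\u20ac\U00010302\U0010fffd"
for cp in code_points(s):
    print('U+{:04X}'.format(cp))
```

Because the result does not depend on how the interpreter stores strings internally, the function should return the same six values on a narrow build and on later Python versions.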

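Update for Python 3.3:

Python 3.3 removed the distinction between narrow and wide builds (PEP 393), so iteration now works the way the OP expected: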

>>> s = "abc\u20ac\U00010302\U0010fffd"
>>> len(s)
6
>>> for c in s:
...     print('U+{:04X}'.format(ord(c)))
...
U+0061
U+0062
U+0063
U+20AC
U+10302
U+10FFFD
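On Python 3.3 or later, the loop above can be condensed into a list comprehension over the string itself. The sketch below assumes such an interpreter; `format_code_points` is an illustrative name, not an existing API.

```python
import sys


def format_code_points(text):
    """Return one 'U+XXXX' label per code point by iterating the str directly.

    Hypothetical helper; assumes Python 3.3+, where every element of a str
    is a full code point.
    """
    return ['U+{:04X}'.format(ord(ch)) for ch in text]


s = "abc\u20ac\U00010302\U0010fffd"
labels = format_code_points(s)
print(sys.maxunicode == 0x10FFFF)  # True on Python 3.3 and later
print(len(s), labels)
```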

Original URL: http://outofmemory.cn/zaji/5646263.html