On Python 3.2.1 with a narrow Unicode build:
PythonWin 3.2.1 (default, Jul 10 2011, 21:51:15) [MSC v.1500 32 bit (Intel)] on win32.
Portions Copyright 1994-2008 Mark Hammond - see 'Help/about PythonWin' for further copyright information.
>>> import sys
>>> sys.maxunicode
65535
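What you've discovered (UTF-16 encoding): on a narrow build, each code point above U+FFFF is stored as a surrogate pair, so len() and iteration see UTF-16 code units rather than code points:
>>> s = "abc\u20ac\U00010302\U0010fffd"
>>> len(s)
8
>>> for c in s:
...     print('U+{:04X}'.format(ord(c)))
...
U+0061
U+0062
U+0063
U+20AC
U+D800
U+DF02
U+DBFF
U+DFFD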
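The surrogate values in that output follow from the standard UTF-16 arithmetic. A minimal sketch, assuming a hypothetical helper named to_surrogates (it is not part of the original answer), shows how U+10302 becomes the pair D800 DF02:

```python
def to_surrogates(code_point):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError('code point is not in a supplementary plane')
    offset = code_point - 0x10000        # 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low


for cp in (0x10302, 0x10FFFD):
    high, low = to_surrogates(cp)
    print('U+{:04X} -> U+{:04X} U+{:04X}'.format(cp, high, low))
```

The two pairs it prints match the last four lines of the narrow-build session above.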
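A way around it is to encode to UTF-32 and unpack the 4-byte values, which yields one integer per code point:
>>> import struct
>>> s=s.encode('utf-32-be')
>>> struct.unpack('>{}L'.format(len(s)//4),s)
(97, 98, 99, 8364, 66306, 1114109)
>>> for i in struct.unpack('>{}L'.format(len(s)//4),s):
...     print('U+{:04X}'.format(i))
...
U+0061
U+0062
U+0063
U+20AC
U+10302
U+10FFFD
Update for Python 3.3:
Python 3.3 no longer distinguishes narrow and wide builds (PEP 393), so it now works the way the OP expects: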
>>> s = "abc\u20ac\U00010302\U0010fffd"
>>> len(s)
6
>>> for c in s:
...     print('U+{:04X}'.format(ord(c)))
...
U+0061
U+0062
U+0063
U+20AC
U+10302
U+10FFFD
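For code that may have to run on either kind of interpreter, the UTF-32 trick can be wrapped in a function. The sketch below is only an illustration: code_points and utf16_units are hypothetical names, and utf16_units is added here to show the code units a narrow build would have exposed for the same string.

```python
import struct


def code_points(text):
    """Return the code points of text as integers (hypothetical helper)."""
    raw = text.encode('utf-32-be')   # 4 bytes per code point, no BOM
    return list(struct.unpack('>{}L'.format(len(raw) // 4), raw))


def utf16_units(text):
    """Return the UTF-16 code units of text (hypothetical helper)."""
    raw = text.encode('utf-16-be')   # 2 bytes per code unit, no BOM
    return list(struct.unpack('>{}H'.format(len(raw) // 2), raw))


s = "abc\u20ac\U00010302\U0010fffd"
print(['U+{:04X}'.format(cp) for cp in code_points(s)])
print(['U+{:04X}'.format(u) for u in utf16_units(s)])
```

The first list has six entries (code points) and the second has eight (code units), which mirrors the difference between the 3.3 session and the narrow-build session.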