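Update 1
I don't understand. In UTF-16, doesn't every character take 2 bytes instead of one, so that it differs from ASCII? For example, the character E (U+0045) in UTF-16 is 0xfeff0045, that is, 0xfeff followed by 0x0045, although some encodings swap the byte order. Do you have to check for 0xfeff and conclude that it can't be ASCII, or something like that?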
Answer
Here is what the W3C says about this:
The XML encoding declaration functions
as an internal label on each entity,
indicating which character encoding is
in use. Before an XML processor can
read the internal label, however, it
apparently has to know what character
encoding is in use--which is what the
internal label is trying to indicate.
In the general case, this is a
hopeless situation. It is not entirely
hopeless in XML, however, because XML
limits the general case in two ways:
each implementation is assumed to
support only a finite set of character
encodings, and the XML encoding
declaration is restricted in position
and content in order to make it
feasible to autodetect the character
encoding in use in each entity in
normal cases.
http://www.w3.org/TR/2000/REC-xml-20001006#sec-guessing
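To make the idea concrete, here is a minimal Python sketch of that kind of autodetection. It is not the W3C's reference algorithm: it covers only the byte-order marks and the no-BOM "<?" byte patterns for UTF-8, UTF-16 and UTF-32, and leaves out the EBCDIC and unusual byte-order cases that the linked appendix also discusses. The names `guess_family` and `sniff_xml_encoding` are made up for this illustration, and falling back to UTF-8 when nothing else matches is an assumption based on XML's default.

```python
import codecs
import re

# (leading bytes, Python codec name) pairs. Order matters: the UTF-32 LE BOM
# (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE), so UTF-32 is tested first.
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32"),     # 00 00 FE FF
    (codecs.BOM_UTF32_LE, "utf-32"),     # FF FE 00 00
    (codecs.BOM_UTF8, "utf-8-sig"),      # EF BB BF
    (codecs.BOM_UTF16_BE, "utf-16"),     # FE FF
    (codecs.BOM_UTF16_LE, "utf-16"),     # FF FE
    # No BOM: the document must start with "<?xml", so the position of the
    # zero bytes around "<" and "?" reveals the code unit width and byte order.
    (b"\x00\x00\x00<", "utf-32-be"),
    (b"<\x00\x00\x00", "utf-32-le"),
    (b"\x00<\x00?", "utf-16-be"),
    (b"<\x00?\x00", "utf-16-le"),
]

# Matches the encoding name inside an XML declaration at the start of the text.
_DECL = re.compile(
    r"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def guess_family(head: bytes) -> str:
    """Guess just enough about the encoding to read the XML declaration."""
    for signature, codec_name in _SIGNATURES:
        if head.startswith(signature):
            return codec_name
    # Anything else is assumed to encode ASCII characters as single bytes.
    return "ascii-compatible"


def sniff_xml_encoding(data: bytes) -> str:
    """Return the declared encoding, or the detected family if none is declared."""
    head = data[:256]
    family = guess_family(head)
    if family == "ascii-compatible":
        # latin-1 maps every byte to a character, so this decode never fails.
        text = head.decode("latin-1")
    else:
        # The "utf-16", "utf-32" and "utf-8-sig" codecs consume the BOM.
        text = head.decode(family, errors="ignore")
    match = _DECL.match(text)
    if match:
        return match.group(1)
    return "utf-8" if family == "ascii-compatible" else family
```

For example, `sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>')` finds the declaration by reading the first bytes as single-byte characters. On the question's example: 0xfeff is the byte-order mark rather than part of E itself, so a big-endian UTF-16 stream that starts with a BOM and then E is the four bytes FE FF 00 45, and seeing FE FF first is exactly what rules out an ASCII-compatible encoding.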