Effective Way to Find Any File's Encoding

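The `StreamReader.CurrentEncoding` property rarely returns the correct text file encoding for me. I've had more success determining a file's endianness by analyzing its byte order mark (BOM). If the file has no BOM, this approach cannot determine the file's encoding.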
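One likely reason `CurrentEncoding` disappoints is timing. According to the .NET documentation, its value can change after the first call to a `Read` method, because BOM autodetection is deferred until the reader first fills its buffer. Right after construction it simply reports the encoding passed to the constructor (UTF-8 by default). The minimal sketch below assumes a hypothetical helper named `ReaderEncoding.Detect` and calls `Peek()` to force that first buffer fill before asking. If in doubt about `Peek()` on a given runtime, a single `Read()` call serves the same purpose.

```csharp
using System.IO;
using System.Text;

public static class ReaderEncoding
{
    // Hypothetical helper: lets StreamReader sniff the BOM and falls back to
    // the supplied encoding when the file has no recognizable BOM.
    public static Encoding Detect(string path, Encoding fallback)
    {
        using (var reader = new StreamReader(path, fallback, detectEncodingFromByteOrderMarks: true))
        {
            // Peek() fills the internal buffer, which is when BOM detection runs.
            reader.Peek();
            return reader.CurrentEncoding;
        }
    }
}
```

`StreamReader` recognizes UTF-8, UTF-16 and UTF-32 BOMs in either byte order (not UTF-7), and like the method below it has nothing to go on when the BOM is missing.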

*Updated 4/08/2020 to include UTF-32LE detection and to return the correct encoding for UTF-32BE.
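The UTF-32LE update matters because the UTF-32LE BOM begins with the UTF-16LE BOM, so the UTF-32LE test has to run first. A quick way to see the byte sequences involved is to format each encoding's `GetPreamble()` result; the sketch below (the class name `BomTable` is illustrative only) does that for the encodings the method checks, except UTF-7.

```csharp
using System;
using System.Text;

public static class BomTable
{
    // Formats an encoding's BOM (its preamble) as space-separated hex bytes.
    public static string Hex(Encoding encoding) =>
        BitConverter.ToString(encoding.GetPreamble()).Replace("-", " ");

    public static void Print()
    {
        Console.WriteLine($"UTF-8    {Hex(Encoding.UTF8)}");                 // EF BB BF
        Console.WriteLine($"UTF-16LE {Hex(Encoding.Unicode)}");              // FF FE
        Console.WriteLine($"UTF-16BE {Hex(Encoding.BigEndianUnicode)}");     // FE FF
        Console.WriteLine($"UTF-32LE {Hex(Encoding.UTF32)}");                // FF FE 00 00
        Console.WriteLine($"UTF-32BE {Hex(new UTF32Encoding(true, true))}"); // 00 00 FE FF
    }
}
```

Because `FF FE` is a prefix of `FF FE 00 00`, checking UTF-16LE first would misreport every UTF-32LE file as UTF-16LE. The reverse ambiguity remains: a UTF-16LE file whose first character is U+0000 looks like UTF-32LE.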

```csharp
using System.IO;
using System.Text;

/// <summary>
/// Determines a text file's encoding by analyzing its byte order mark (BOM).
/// Defaults to ASCII when detection of the text file's endianness fails.
/// </summary>
/// <param name="filename">The text file to analyze.</param>
/// <returns>The detected encoding.</returns>
public static Encoding GetEncoding(string filename)
{
    // Read the BOM (unread positions stay 0 for files shorter than 4 bytes)
    var bom = new byte[4];
    int bytesRead;
    using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read))
    {
        bytesRead = file.Read(bom, 0, 4);
    }

    // Analyze the BOM
    if (bom[0] == 0x2b && bom[1] == 0x2f && bom[2] == 0x76) return Encoding.UTF7; // obsolete (SYSLIB0001) in .NET 5+
    if (bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf) return Encoding.UTF8;
    // Require 4 real bytes so a BOM-only UTF-16LE file (FF FE) is not mistaken for UTF-32LE
    if (bytesRead >= 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0) return Encoding.UTF32; // UTF-32LE
    if (bom[0] == 0xff && bom[1] == 0xfe) return Encoding.Unicode; // UTF-16LE
    if (bom[0] == 0xfe && bom[1] == 0xff) return Encoding.BigEndianUnicode; // UTF-16BE
    if (bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff) return new UTF32Encoding(true, true); // UTF-32BE

    // We actually have no idea what the encoding is if we reach this point, so
    // you may wish to return null instead of defaulting to ASCII
    return Encoding.ASCII;
}
```
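Following the comment at the end of the method, here is a sketch of a variant that returns `null` when no BOM is found and also reports the BOM's length, so a caller can skip those bytes before decoding. It works on a byte span rather than a file name, which keeps it easy to test. The names `BomSniffer` and `Detect` are hypothetical, and UTF-7 is deliberately left out because `Encoding.UTF7` is marked obsolete in .NET 5 and later.

```csharp
using System;
using System.Text;

public static class BomSniffer
{
    // BOM signatures; UTF-32LE must be tested before its UTF-16LE prefix.
    private static ReadOnlySpan<byte> Utf32LE => new byte[] { 0xFF, 0xFE, 0x00, 0x00 };
    private static ReadOnlySpan<byte> Utf32BE => new byte[] { 0x00, 0x00, 0xFE, 0xFF };
    private static ReadOnlySpan<byte> Utf8 => new byte[] { 0xEF, 0xBB, 0xBF };
    private static ReadOnlySpan<byte> Utf16LE => new byte[] { 0xFF, 0xFE };
    private static ReadOnlySpan<byte> Utf16BE => new byte[] { 0xFE, 0xFF };

    // Returns the encoding indicated by a leading BOM, or null if there is none.
    // bomLength receives the number of BOM bytes to skip (0 when null is returned).
    // StartsWith fails on inputs shorter than the signature, so short files are safe.
    public static Encoding? Detect(ReadOnlySpan<byte> bytes, out int bomLength)
    {
        if (bytes.StartsWith(Utf32LE)) { bomLength = 4; return Encoding.UTF32; }
        if (bytes.StartsWith(Utf32BE)) { bomLength = 4; return new UTF32Encoding(bigEndian: true, byteOrderMark: true); }
        if (bytes.StartsWith(Utf8)) { bomLength = 3; return Encoding.UTF8; }
        if (bytes.StartsWith(Utf16LE)) { bomLength = 2; return Encoding.Unicode; }
        if (bytes.StartsWith(Utf16BE)) { bomLength = 2; return Encoding.BigEndianUnicode; }
        bomLength = 0;
        return null;
    }
}
```

To read a file with it, one option is to load the bytes with `File.ReadAllBytes`, call `Detect`, and decode with `encoding.GetString(bytes, bomLength, bytes.Length - bomLength)`, since `GetString` does not strip the BOM on its own. When the result is `null`, choosing a fallback is left to the caller.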

Original article: https://outofmemory.cn/zaji/5505797.html