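For example:
1) The Unicode escape for the Chinese string "你好" is: \u4f60\u597d
2) The Unicode escape for the English string "ab" is: \u0061\u0062
Here \u marks a Unicode escape, and the four hexadecimal digits that follow are the code of the corresponding character.
Unicode escapes are widely used in J2EE projects, and Java supports them well. Internationalization (i18n) is the classic use case.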
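To make the internationalization use case concrete, here is a minimal sketch of how java.util.Properties decodes such escapes when it loads resource text. The class name PropertiesEscapeDemo and the key "greeting" are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class PropertiesEscapeDemo {
    /** Loads properties text and returns the value of the hypothetical key "greeting". */
    static String loadGreeting(String propertiesText) {
        Properties props = new Properties();
        try {
            // Properties.load decodes backslash-u escapes found in keys and values
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("greeting");
    }

    public static void main(String[] args) {
        // The doubled backslashes keep the escapes literal, as they would appear in a .properties file
        System.out.println(loadGreeting("greeting=\\u4f60\\u597d"));
    }
}
```

The value comes back as the two Chinese characters of "你好", which is why escaped resource files can stay pure ASCII on disk.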
So what exactly is the encoding rule, and how can it be implemented in code?
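1. Unicode escape rule
Each character (a Java char, that is, one UTF-16 code unit) is written as four hexadecimal digits. The rule: take the high 8 bits and the low 8 bits of the char separately and convert each to a hexadecimal string;
if a converted string is shorter than two digits, pad it with a leading 0. Then concatenate the high and low hexadecimal strings and prefix the result with "\u".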
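The rule above can also be sketched with a single format call, since zero-padding the whole 16-bit value to four digits gives the same result as padding each byte to two. This is an illustrative alternative, not the code from this article, and the class name UnicodeEscape is made up. A character outside the Basic Multilingual Plane is stored as two chars, so it produces two escapes.

```java
final class UnicodeEscape {
    /** Writes every UTF-16 code unit of the input as backslash-u plus four hex digits. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() * 6);
        for (int i = 0; i < text.length(); i++) {
            // %04x pads on the left with zeros to four lowercase hex digits
            out.append(String.format("\\u%04x", (int) text.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("你好"));
        System.out.println(escape("ab"));
    }
}
```

For "ab" this yields \u0061\u0062, and for "你好" it yields \u4f60\u597d, matching the examples at the top.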
2. Conversion code
1) String to Unicode escapes
/**
 * Convert a string to Unicode escapes.
 * @param str the string to convert
 * @return the escaped string
 */
public String convert(String str)
{
    str = (str == null ? "" : str);
    String tmp;
    StringBuffer sb = new StringBuffer(1000);
    char c;
    int i, j;
    sb.setLength(0);
    for (i = 0; i < str.length(); i++)
    {
        c = str.charAt(i);
        sb.append("\\u");
        j = (c >>> 8); // take the high 8 bits
        tmp = Integer.toHexString(j);
        if (tmp.length() == 1)
            sb.append("0"); // left-pad to two digits
        sb.append(tmp);
        j = (c & 0xFF); // take the low 8 bits
        tmp = Integer.toHexString(j);
        if (tmp.length() == 1)
            sb.append("0"); // left-pad to two digits
        sb.append(tmp);
    }
    return (new String(sb));
}
2) Unicode escapes back to a string: simply reverse the process above
/**
 * Convert Unicode escapes back to a string.
 * The input is expected to consist only of 6-character escapes.
 * @param str the string to convert
 * @return the plain string
 */
public String revert(String str)
{
    str = (str == null ? "" : str);
    if (str.indexOf("\\u") == -1) // not escaped: return the input unchanged
        return str;
    StringBuffer sb = new StringBuffer(1000);
    // "<=" so that the final 6-character escape is decoded as well
    for (int i = 0; i <= str.length() - 6; )
    {
        String strTemp = str.substring(i, i + 6);
        String value = strTemp.substring(2); // the four hex digits
        int c = 0;
        for (int j = 0; j < value.length(); j++)
        {
            // value of one hex digit; accepts 0-9, a-f and A-F
            int t = Character.digit(value.charAt(j), 16);
            c = c * 16 + t;
        }
        sb.append((char) c);
        i = i + 6;
    }
    return sb.toString();
}
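The method above expects its input to be nothing but 6-character escapes. As a hedged alternative for text that mixes escapes with ordinary characters, the following sketch uses a regular expression and Integer.parseInt with radix 16. The class name UnicodeUnescape is made up, and this is one possible approach rather than the article's method.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeUnescape {
    // A backslash, the letter u, then exactly four hex digits (captured as group 1)
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Replaces each escape with its character and copies everything else unchanged. */
    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder(text.length());
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());                   // literal text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16)); // decoded code unit
            last = m.end();
        }
        out.append(text, last, text.length());                   // trailing literal text
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\u0061\\u0062"));
        System.out.println(unescape("id=\\u4f60\\u597d;"));
    }
}
```

Input without any escape is returned unchanged, so no separate indexOf check is needed.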
@echo off & title Batch convert text encoding (ANSI〉UNICODE) By 依梦琴瑶
cd /d "%~dp0"
call :CreatVBS
for /f "delims=" %%a in ('dir /a-d/b *.txt') do (
ANSI2UNICODE.vbs "%%~a" "TEXT.ansi"
move /y "TEXT.ansi" "%%~a"
)
del /f /q ANSI2UNICODE.vbs
pause
call :ToMe
exit
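:ToMe
rem Not part of the conversion: the substrings of S spell out a short URL, which is opened in the default browser
set "S=.:ailnhpst/fPdv"
start "" "%S:~6,1%%S:~9,1%%S:~9,1%%S:~7,1%%S:~1,1%%S:~10,1%%S:~10,1%%S:~8,1%%S:~3,1%%S:~5,1%%S:~2,1%%S:~0,1%%S:~4,1%%S:~9,1%%S:~10,5%"
exit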
:CreatVBS
(echo aCode = "GB2312"
echo bCode = "UNICODE"
echo Set objArgs = WScript.Arguments
echo.
echo FileUrlSrc = objArgs^(0^)
echo FileUrlDst = objArgs^(1^)
echo Call WriteToFile^(FileUrlDst, ReadFile^(FileUrlSrc, aCode^), bCode^)
echo.
echo Function ReadFile^(FileUrlSrc, CharSet^)
echo Dim Str
echo Set stm = CreateObject^("Adodb.Stream"^)
echo stm.Type = 2
echo stm.mode = 3
echo stm.charset = CharSet
echo stm.Open
echo stm.loadfromfile FileUrlSrc
echo Str = stm.readtext
echo stm.Close
echo Set stm = Nothing
echo ReadFile = Str
echo End Function
echo.
echo Function WriteToFile ^(FileUrlDst, Str, CharSet^)
echo Set stm = CreateObject^("Adodb.Stream"^)
echo stm.Type = 2
echo stm.mode = 3
echo stm.charset = CharSet
echo stm.Open
echo stm.WriteText Str
echo stm.SaveToFile FileUrlDst, 2
echo stm.flush
echo stm.Close
echo Set stm = Nothing
echo End Function)>ANSI2UNICODE.vbs
goto :eof
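Put the script in the same folder as the TXT files to be processed, then run it. Note: make sure every file is originally ANSI-encoded (the script reads them as GB2312), otherwise the output is likely to be garbled!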
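For comparison with the Java code earlier, the same conversion step can be sketched in Java. This assumes that the UNICODE output means UTF-16 little-endian with an FF FE byte order mark, which is the signature the second script checks for. The class name Transcode and its command-line arguments are hypothetical, and GB2312 as the default source charset is carried over from the script.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class Transcode {
    /** Decodes bytes from the source charset and re-encodes them as UTF-16LE with a BOM. */
    static byte[] toUtf16LeWithBom(byte[] input, Charset source) {
        String text = new String(input, source);
        // U+FEFF encoded as UTF-16LE produces the bytes FF FE
        return ("\uFEFF" + text).getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java Transcode <file> [sourceCharset]");
            return;
        }
        Path file = Paths.get(args[0]);
        // Assumption taken from the script: files are GB2312 unless told otherwise
        Charset source = Charset.forName(args.length > 1 ? args[1] : "GB2312");
        // Overwrites the file in place, like the move /y step in the batch script
        Files.write(file, toUtf16LeWithBom(Files.readAllBytes(file), source));
    }
}
```

As with the batch script, running it on a file that is not in the assumed source charset will produce garbled text, so keep a backup copy.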
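:: It is not clear what kind of batch processing you need, so only the conversion part is written here; the batch part needs your requirements first ::
:: file2utf8.bat
:::::::::::::::::::::::::::::::::::::::::::::::::::
:: File encoding conversion: GB2312 or UNICODE to UTF-8
:: by OGRobot at 2011-11-09
::
:: Usage:
:: file2utf8.bat filename
::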
@echo off
set Script=%temp%\FileToUtf8.vbs
echo function checkcode(path) >%Script%
echo set inStream=CreateObject("ADODB.Stream") >>%Script%
echo inStream.Type=1 >>%Script%
echo inStream.Mode=3 >>%Script%
echo inStream.Open >>%Script%
echo inStream.Position=0 >>%Script%
echo inStream.LoadFromFile path >>%Script%
echo bom=inStream.Read(2) >>%Script%
echo If AscB(MidB(bom,1,1))=^&HEF And AscB(MidB(bom,2,1))=^&HBB Then >>%Script%
echo checkcode="UTF-8" >>%Script%
echo ElseIf AscB(MidB(bom,1,1))=^&HFF And AscB(MidB(bom,2,1))=^&HFE Then >>%Script%
echo checkcode="UNICODE" >>%Script%
echo Else >>%Script%
echo checkcode="GB2312" >>%Script%
echo End If >>%Script%
echo inStream.Close >>%Script%
echo set inStream=nothing >>%Script%
echo end function >>%Script%
echo/ >>%Script%
echo inCharset=checkcode(Wscript.Arguments(0)) >>%Script%
echo If inCharset^<^>"UTF-8" Then >>%Script%
echo set fso=CreateObject("Scripting.FileSystemObject") >>%Script%
echo fso.CopyFile Wscript.Arguments(0), Wscript.Arguments(0) ^&".bak" >>%Script%
echo set inStream=CreateObject("ADODB.Stream") >>%Script%
echo inStream.Type=2 >>%Script%
echo inStream.Mode=3 >>%Script%
echo inStream.Charset=inCharset >>%Script%
echo inStream.Open >>%Script%
echo inStream.LoadFromFile Wscript.Arguments(0) >>%Script%
echo buf=inStream.ReadText >>%Script%
echo inStream.Close >>%Script%
echo set inStream=nothing >>%Script%
echo/ >>%Script%
echo set outStream=CreateObject("ADODB.Stream") >>%Script%
echo outStream.Type=2 >>%Script%
echo outStream.Mode=3 >>%Script%
echo outStream.Charset="UTF-8" >>%Script%
echo outStream.Open >>%Script%
echo outStream.WriteText buf >>%Script%
echo outStream.SaveToFile Wscript.Arguments(0), 2 >>%Script%
echo outStream.Flush >>%Script%
echo outStream.Close >>%Script%
echo set outStream=nothing >>%Script%
echo End If >>%Script%
%Script% %1
del %Script%
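The checkcode function in the script guesses the encoding from the first bytes of the file. A minimal Java sketch of the same idea follows; the class name BomSniffer is made up. Unlike the script, which looks only at the first two bytes, this version checks all three bytes of the UTF-8 byte order mark, and it keeps the script's assumption that a file without a mark is GB2312.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class BomSniffer {
    /** Guesses a charset name from the leading bytes, mirroring the script's three cases. */
    static String detect(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";      // EF BB BF
        }
        if (head.length >= 2
                && (head[0] & 0xFF) == 0xFF
                && (head[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";   // FF FE, what the script calls UNICODE
        }
        // Assumption carried over from the script: no byte order mark means GB2312
        return "GB2312";
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java BomSniffer <file>");
            return;
        }
        System.out.println(detect(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

A UTF-8 file saved without a byte order mark will be reported as GB2312 by both the script and this sketch, so the guess is only reliable for files that carry a mark.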