Posted on 2012-02-24 14:54
小明
Category: Development Log
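MySQL's latin1 is not the same as either the standard latin1 (ISO-8859-1) or cp1252. Compared with ISO-8859-1, it also assigns characters to 0x80-0x9f; compared with cp1252, it additionally maps five bytes: 0x81, 0x8d, 0x8f, 0x90, and 0x9d.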
http://dev.mysql.com/doc/refman/5.0/en/charset-we-sets.html
latin1 is the default character set. MySQL's latin1 is the same as the Windows cp1252 character set. This means it is the same as the official ISO 8859-1 or IANA (Internet Assigned Numbers Authority) latin1, except that IANA latin1 treats the code points between 0x80 and 0x9f as “undefined,” whereas cp1252, and therefore MySQL's latin1, assign characters for those positions. For example, 0x80 is the Euro sign. For the “undefined” entries in cp1252, MySQL translates 0x81 to Unicode 0x0081, 0x8d to 0x008d, 0x8f to 0x008f, 0x90 to 0x0090, and 0x9d to 0x009d.
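The mapping described in the manual excerpt can be sketched in Java roughly as follows. This is only an illustration: the class name MysqlLatin1Decode and the method decode are hypothetical, and it assumes the JDK in use ships the windows-1252 charset (common, but not one of the charsets every JVM is required to provide).

```java
import java.nio.charset.Charset;

final class MysqlLatin1Decode {
    // Assumption: windows-1252 is available in this JDK.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode a single byte the way the manual describes MySQL's latin1:
    // cp1252 everywhere, except the five bytes cp1252 leaves undefined,
    // which map to the Unicode code point with the same value.
    static char decode(byte b) {
        int v = b & 0xFF;
        switch (v) {
            case 0x81: case 0x8D: case 0x8F: case 0x90: case 0x9D:
                return (char) v;
            default:
                return new String(new byte[] { b }, CP1252).charAt(0);
        }
    }

    public static void main(String[] args) {
        System.out.printf("0x80 -> U+%04X%n", (int) decode((byte) 0x80)); // 0x80 -> U+20AC
        System.out.printf("0x81 -> U+%04X%n", (int) decode((byte) 0x81)); // 0x81 -> U+0081
    }
}
```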
As a result, in Java, converting such strings with the standard iso-8859-1 or cp1252 charsets can produce garbled text:
s.getBytes("iso-8859-1") or s.getBytes("cp1252");
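A small check of why neither standard charset is enough on its own. The sample is an assumption made for illustration: the UTF-8 bytes of "Á‘" (C3 81 E2 80 98) read back through MySQL's latin1, which yields the five characters in MOJIBAKE. The class and member names are hypothetical.

```java
import java.nio.charset.Charset;

final class StandardCharsetLoss {
    // UTF-8 bytes C3 81 E2 80 98 as MySQL's latin1 would present them.
    static final String MOJIBAKE = "\u00C3\u0081\u00E2\u20AC\u02DC";
    static final byte[] ORIGINAL = {
        (byte) 0xC3, (byte) 0x81, (byte) 0xE2, (byte) 0x80, (byte) 0x98
    };

    static byte[] encode(String s, String charsetName) {
        // Unmappable characters are replaced with '?' (0x3F).
        return s.getBytes(Charset.forName(charsetName));
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // ISO-8859-1 cannot encode U+20AC or U+02DC.
        System.out.println(hex(encode(MOJIBAKE, "ISO-8859-1")));   // C3 81 E2 3F 3F
        // cp1252 cannot encode U+0081.
        System.out.println(hex(encode(MOJIBAKE, "windows-1252"))); // C3 3F E2 80 98
    }
}
```

Each charset loses a different part of the original byte sequence, so the recovered bytes are no longer valid for the text that was stored.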
I wrote the following code to work around this problem:
// Requires java.io.UnsupportedEncodingException and a class-level logger.
// Maps each char back to the byte MySQL's latin1 read it from,
// then decodes the resulting bytes as UTF-8.
private String convertCharset(String s) {
    if (s == null) {
        return null;
    }
    try {
        int length = s.length();
        byte[] buffer = new byte[length];
        for (int i = 0; i < length; ++i) {
            char c = s.charAt(i);
            switch (c) {
                // MySQL maps 0x81 to Unicode 0x0081, 0x8d to 0x008d, 0x8f to 0x008f,
                // 0x90 to 0x0090, and 0x9d to 0x009d, so map these straight back.
                case 0x0081:
                case 0x008d:
                case 0x008f:
                case 0x0090:
                case 0x009d:
                    buffer[i] = (byte) c;
                    break;
                default:
                    // Everything else follows cp1252; unmappable chars become '?'.
                    buffer[i] = Character.toString(c).getBytes("cp1252")[0];
            }
        }
        return new String(buffer, "utf-8");
    } catch (UnsupportedEncodingException e) {
        logger.error("charset convert error", e);
    }
    return null;
}
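For a quick check of the approach, here is a self-contained, compact variant of the method above together with a sample call. It is a sketch, not the original code: the class name ConvertCharsetDemo is hypothetical, it uses Charset objects instead of charset names (so no checked exception or logger is needed), and the sample input assumes the stored bytes were the UTF-8 encoding of "Á‘".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ConvertCharsetDemo {
    // Assumption: windows-1252 is available in this JDK.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    static String convertCharset(String s) {
        if (s == null) {
            return null;
        }
        byte[] buffer = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // The five code points MySQL uses for bytes undefined in cp1252.
            boolean passThrough =
                c == 0x81 || c == 0x8D || c == 0x8F || c == 0x90 || c == 0x9D;
            buffer[i] = passThrough
                ? (byte) c
                : String.valueOf(c).getBytes(CP1252)[0];
        }
        return new String(buffer, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 bytes C3 81 E2 80 98 as read through MySQL's latin1.
        String fromMysql = "\u00C3\u0081\u00E2\u20AC\u02DC";
        System.out.println(convertCharset(fromMysql).equals("\u00C1\u2018")); // true
    }
}
```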