We often put constants into a properties file. Today I ran into two problems while reading one back:
1: some of the content was lost on retrieval
2: the retrieved text came out garbled (mojibake)
First, our properties file looks roughly like this:
# Friendly links
news.link.inner.href = http://www.baidu.com,http://www.baidu.com,http://www.baidu.com
new.link.inner.title = 百度1,百度2,百度3
1: Fixing the lost content
Admittedly this file is a bit special: it backs the data for a drop-down navigation menu, so that later on users can extend the data simply by appending entries, which must be separated by commas.
At first I read it like this:
private void initiallink() {
    String innerlinkstr = null;
    String innertitlestr = null;
    String outerlinkstr = null;
    String outertitlestr = null;
    StringTokenizer innerlink = null;
    StringTokenizer innertitle = null;
    StringTokenizer outerlink = null;
    StringTokenizer outertitle = null;
    InputStream in = this.getClass().getResourceAsStream("/conf/netedu.properties");
    try {
        try {
            Properties props = new Properties();
            props.load(in);
            innerlinkstr = props.getProperty("news.link.inner.href");
            innertitlestr = props.getProperty("new.link.inner.title");
            outerlinkstr = props.getProperty("news.link.outer.href");
            outertitlestr = props.getProperty("new.link.outer.title");
            innerlink = new StringTokenizer(innerlinkstr, ",");
            innertitle = new StringTokenizer(innertitlestr, ",");
            outerlink = new StringTokenizer(outerlinkstr, ",");
            outertitle = new StringTokenizer(outertitlestr, ",");
            innermap = this.getlinks(innertitle, innerlink);
            outermap = this.getlinks(outertitle, outerlink);
        } finally {
            in.close();
        }
    } catch (Exception ex) {
        log.debug("Error reading properties from /conf/netedu.properties");
    }
}
The getLinks method is as follows:
/**
 * @param titles StringTokenizer holding the link titles
 * @param links  StringTokenizer holding the link URLs
 * @return HashMap of (title, link)
 * Both parameters should contain the same number of tokens.
 */
private HashMap getlinks(StringTokenizer titles, StringTokenizer links) {
    HashMap results = new HashMap();
    for (int i = 0; i < titles.countTokens(); i++)
        results.put((String) titles.nextElement(), (String) links.nextElement());
    return results;
}
But by the time it reached the JSP, the resulting map had only 2 entries — just 百度1 and 百度2 made it out (I only knew which two after fixing the mojibake). Debugging in Eclipse confirmed that the for loop in getLinks really did run fewer times than expected. Why? (I couldn't figure it out and stewed over it for half a day.) I had no choice but to make the method longer (annoyingly):
/**
 * @param titles StringTokenizer holding the link titles
 * @param links  StringTokenizer holding the link URLs
 * @return HashMap of (title, link)
 * Both parameters should contain the same number of tokens.
 */
private HashMap getlinks(StringTokenizer titles, StringTokenizer links) {
    HashMap results = new HashMap();
    int len = titles.countTokens();
    String[] temp1 = new String[len];
    String[] temp2 = new String[len];
    for (int i = 0; i < len; i++) {
        temp1[i] = (String) titles.nextElement();
        temp2[i] = (String) links.nextElement();
    }
    for (int i = 0; i < len; i++) {
        results.put(temp1[i], temp2[i]);
    }
    return results;
}
The processing is practically identical — why do the results differ?
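The difference is that StringTokenizer.countTokens() returns the number of tokens *remaining*, not the total: every call to nextElement() shrinks it. So a loop bounded by countTokens() that also consumes tokens exits after roughly half of them, while the second version caches the count in len before consuming anything. A minimal standalone sketch (the class name is my own):

```java
import java.util.StringTokenizer;

public class CountTokensDemo {
    public static void main(String[] args) {
        // Three tokens, but the loop below runs only twice:
        // countTokens() drops by one each time nextElement() is called,
        // so the bound shrinks while i grows.
        StringTokenizer t = new StringTokenizer("a,b,c", ",");
        int iterations = 0;
        for (int i = 0; i < t.countTokens(); i++) {
            t.nextElement();   // consumes a token
            iterations++;
        }
        System.out.println(iterations); // prints 2, not 3
    }
}
```

With three tokens the loop runs only twice — which matches the two entries (百度1, 百度2) seen in the JSP.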
2: Fixing the mojibake
The mojibake is actually simple; I just hadn't paid attention at first. Our machines use GBK, while the JVM reads .properties files as ISO-8859-1 (Latin-1), expecting non-Latin-1 characters as \uXXXX escapes (and my code did no conversion along the way), so we have to handle the encoding ourselves.
I used the native2ascii tool bundled with the JDK, specifying the source encoding with -encoding:
native2ascii -encoding GBK sourcefilename destfilename
Then the /conf/netedu.properties file used in
InputStream in = this.getClass().getResourceAsStream("/conf/netedu.properties");
just needs to be replaced by destfilename, and you're done.
Except that what you will then see is something like this:
#\u5385\u5185\u94fe\u63a5-----\u7528,\u9694\u5f00
news.link.inner.href = http://www.baidu.com,http://www.baidu.com,http://www.baidu.com
new.link.inner.title = \u767e\u5ea61,\u767e\u5ea62,\u767e\u5ea63
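The escaped file reads back correctly because Properties.load() decodes \uXXXX escapes on its own. A minimal sketch of that behavior (the key demo.title and its value are illustrative, not from the real file):

```java
import java.io.ByteArrayInputStream;
import java.util.Properties;

public class EscapeDemo {
    public static void main(String[] args) throws Exception {
        // The escaped form of 百度1, exactly as native2ascii would write it.
        String line = "demo.title=\\u767e\\u5ea61";
        Properties props = new Properties();
        // Properties.load() reads the bytes as ISO-8859-1 and decodes the escapes.
        props.load(new ByteArrayInputStream(line.getBytes("ISO-8859-1")));
        System.out.println(props.getProperty("demo.title")); // prints 百度1
    }
}
```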
Of course you can't stare at a pile of hex escapes all day, so you can decode them back with -reverse:
native2ascii -reverse sourcefilename destfilename
In my view, converting whole files this way makes sense for large amounts of text — say, internationalizing an entire project — but for a single drop-down box it is overkill (and users would have one more project script to run). So instead we can re-encode the titles (the "百度…" strings) while iterating over the StringTokenizer. To do this, change this part of getLinks():
for (int i = 0; i < len; i++) {
    temp1[i] = (String) titles.nextElement();
    temp2[i] = (String) links.nextElement();
}
to:
for (int i = 0; i < len; i++) {
    String s = (String) titles.nextElement();
    try {
        temp1[i] = new String(s.getBytes("ISO-8859-1"), "GBK");
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    temp2[i] = (String) links.nextElement();
}
Note: "ISO-8859-1" and "GBK" are, respectively, the source and target encodings of the conversion.
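The round trip is lossless because ISO-8859-1 maps every byte 0x00-0xFF to exactly one character, so getBytes("ISO-8859-1") recovers the original GBK bytes intact. A minimal sketch simulating the whole failure and fix (class name is my own):

```java
import java.io.UnsupportedEncodingException;

public class MojibakeDemo {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String original = "百度1";
        // Simulate GBK bytes being read as ISO-8859-1: each byte becomes
        // one Latin-1 character, so the text looks garbled but no byte is lost.
        String garbled = new String(original.getBytes("GBK"), "ISO-8859-1");
        // The fix from the article: pull the raw bytes back out via ISO-8859-1
        // and decode them with the encoding they were really written in.
        String fixed = new String(garbled.getBytes("ISO-8859-1"), "GBK");
        System.out.println(fixed.equals(original)); // prints true
    }
}
```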
For detailed usage of native2ascii, consult the relevant documentation, e.g.:
native2ascii - Native-to-ASCII Converter
Converts a file with native-encoded characters (characters which are non-Latin 1 and non-Unicode) to one with Unicode-encoded characters.
SYNOPSIS
native2ascii [options] [inputfile [outputfile]]
DESCRIPTION
The Java compiler and other Java tools can only process files which contain Latin-1 and/or Unicode-encoded (\udddd notation) characters. native2ascii converts files which contain other character encodings into files containing Latin-1 and/or Unicode-encoded characters.
If outputfile is omitted, standard output is used for output. If, in addition, inputfile is omitted, standard input is used for input.
OPTIONS
-reverse
Perform the reverse operation: convert a file with Latin-1 and/or Unicode encoded characters to one with native-encoded characters.
-encoding encoding_name
Specify the encoding name which is used by the conversion procedure. The default encoding is taken from System property file.encoding. The encoding_name string must be a string taken from the first column of the table below.
-------------------------------------------------------------
Converter Description
Class
-------------------------------------------------------------
8859_1 ISO 8859-1
8859_2 ISO 8859-2
8859_3 ISO 8859-3
8859_4 ISO 8859-4
8859_5 ISO 8859-5
8859_6 ISO 8859-6
8859_7 ISO 8859-7
8859_8 ISO 8859-8
8859_9 ISO 8859-9
Big5 Big5, Traditional Chinese
CNS11643 CNS 11643, Traditional Chinese
Cp037 USA, Canada(Bilingual, French), Netherlands,
Portugal, Brazil, Australia
Cp1006 IBM AIX Pakistan (Urdu)
Cp1025 IBM Multilingual Cyrillic: Bulgaria, Bosnia,
Herzegovinia, Macedonia(FYR)
Cp1026 IBM Latin-5, Turkey
Cp1046 IBM Open Edition US EBCDIC
Cp1097 IBM Iran(Farsi)/Persian
Cp1098 IBM Iran(Farsi)/Persian (PC)
Cp1112 IBM Latvia, Lithuania
Cp1122 IBM Estonia
Cp1123 IBM Ukraine
Cp1124 IBM AIX Ukraine
Cp1125 IBM Ukraine (PC)
Cp1250 Windows Eastern European
Cp1251 Windows Cyrillic
Cp1252 Windows Latin-1
Cp1253 Windows Greek
Cp1254 Windows Turkish
Cp1255 Windows Hebrew
Cp1256 Windows Arabic
Cp1257 Windows Baltic
Cp1258 Windows Vietnamese
Cp1381 IBM OS/2, DOS People's Republic of China (PRC)
Cp1383 IBM AIX People's Republic of China (PRC)
Cp273 IBM Austria, Germany
Cp277 IBM Denmark, Norway
Cp278 IBM Finland, Sweden
Cp280 IBM Italy
Cp284 IBM Catalan/Spain, Spanish Latin America
Cp285 IBM United Kingdom, Ireland
Cp297 IBM France
Cp33722 IBM-eucJP - Japanese (superset of 5050)
Cp420 IBM Arabic
Cp424 IBM Hebrew
Cp437 MS-DOS United States, Australia, New Zealand,
South Africa
Cp500 EBCDIC 500V1
Cp737 PC Greek
Cp775 PC Baltic
Cp838 IBM Thailand extended SBCS
Cp850 MS-DOS Latin-1
Cp852 MS-DOS Latin-2
Cp855 IBM Cyrillic
Cp857 IBM Turkish
Cp860 MS-DOS Portuguese
Cp861 MS-DOS Icelandic
Cp862 PC Hebrew
Cp863 MS-DOS Canadian French
Cp864 PC Arabic
Cp865 MS-DOS Nordic
Cp866 MS-DOS Russian
Cp868 MS-DOS Pakistan
Cp869 IBM Modern Greek
Cp870 IBM Multilingual Latin-2
Cp871 IBM Iceland
Cp874 IBM Thai
Cp875 IBM Greek
Cp918 IBM Pakistan(Urdu)
Cp921 IBM Latvia, Lithuania (AIX, DOS)
Cp922 IBM Estonia (AIX, DOS)
Cp930 Japanese Katakana-Kanji mixed with 4370 UDC,
superset of 5026
Cp933 Korean Mixed with 1880 UDC, superset of 5029
Cp935 Simplified Chinese Host mixed with 1880 UDC,
superset of 5031
Cp937 Traditional Chinese Host mixed with 6204 UDC,