love fish: one day the great Peng rises with the wind, soaring straight up ninety thousand li

Getting Pinyin from Chinese Characters, Java Version (repost)

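Two approaches come to mind so far:
1. Get a Chinese character to pinyin lookup table and load it into a map. A table for GB2312 can be found here:
http://zh.transwiki.org/wiki/index.php/GB2312%E6%B1%89%E5%AD%97%E6%8B%BC%E9%9F%B3%E5%AF%B9%E7%85%A7%E8%A1%A8
2. The approach below looks neat, but its results are only so-so. The fields and methods are members of a single class: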
    // GB2312 codes (as signed 16-bit values) at which each pinyin syllable starts;
    // pyvalue[i] pairs with pystr[i].
    private static int[] pyvalue = new int[] { -20319, -20317, -20304, -20295,
            -20292, -20283, -20265, -20257, -20242, -20230, -20051, -20036,
            -20032, -20026, -20002, -19990, -19986, -19982, -19976, -19805,
            -19784, -19775, -19774, -19763, -19756, -19751, -19746, -19741,
            -19739, -19728, -19725, -19715, -19540, -19531, -19525, -19515,
            -19500, -19484, -19479, -19467, -19289, -19288, -19281, -19275,
            -19270, -19263, -19261, -19249, -19243, -19242, -19238, -19235,
            -19227, -19224, -19218, -19212, -19038, -19023, -19018, -19006,
            -19003, -18996, -18977, -18961, -18952, -18783, -18774, -18773,
            -18763, -18756, -18741, -18735, -18731, -18722, -18710, -18697,
            -18696, -18526, -18518, -18501, -18490, -18478, -18463, -18448,
            -18447, -18446, -18239, -18237, -18231, -18220, -18211, -18201,
            -18184, -18183, -18181, -18012, -17997, -17988, -17970, -17964,
            -17961, -17950, -17947, -17931, -17928, -17922, -17759, -17752,
            -17733, -17730, -17721, -17703, -17701, -17697, -17692, -17683,
            -17676, -17496, -17487, -17482, -17468, -17454, -17433, -17427,
            -17417, -17202, -17185, -16983, -16970, -16942, -16915, -16733,
            -16708, -16706, -16689, -16664, -16657, -16647, -16474, -16470,
            -16465, -16459, -16452, -16448, -16433, -16429, -16427, -16423,
            -16419, -16412, -16407, -16403, -16401, -16393, -16220, -16216,
            -16212, -16205, -16202, -16187, -16180, -16171, -16169, -16158,
            -16155, -15959, -15958, -15944, -15933, -15920, -15915, -15903,
            -15889, -15878, -15707, -15701, -15681, -15667, -15661, -15659,
            -15652, -15640, -15631, -15625, -15454, -15448, -15436, -15435,
            -15419, -15416, -15408, -15394, -15385, -15377, -15375, -15369,
            -15363, -15362, -15183, -15180, -15165, -15158, -15153, -15150,
            -15149, -15144, -15143, -15141, -15140, -15139, -15128, -15121,
            -15119, -15117, -15110, -15109, -14941, -14937, -14933, -14930,
            -14929, -14928, -14926, -14922, -14921, -14914, -14908, -14902,
            -14894, -14889, -14882, -14873, -14871, -14857, -14678, -14674,
            -14670, -14668, -14663, -14654, -14645, -14630, -14594, -14429,
            -14407, -14399, -14384, -14379, -14368, -14355, -14353, -14345,
            -14170, -14159, -14151, -14149, -14145, -14140, -14137, -14135,
            -14125, -14123, -14122, -14112, -14109, -14099, -14097, -14094,
            -14092, -14090, -14087, -14083, -13917, -13914, -13910, -13907,
            -13906, -13905, -13896, -13894, -13878, -13870, -13859, -13847,
            -13831, -13658, -13611, -13601, -13406, -13404, -13400, -13398,
            -13395, -13391, -13387, -13383, -13367, -13359, -13356, -13343,
            -13340, -13329, -13326, -13318, -13147, -13138, -13120, -13107,
            -13096, -13095, -13091, -13076, -13068, -13063, -13060, -12888,
            -12875, -12871, -12860, -12858, -12852, -12849, -12838, -12831,
            -12829, -12812, -12802, -12607, -12597, -12594, -12585, -12556,
            -12359, -12346, -12320, -12300, -12120, -12099, -12089, -12074,
            -12067, -12058, -12039, -11867, -11861, -11847, -11831, -11798,
            -11781, -11604, -11589, -11536, -11358, -11340, -11339, -11324,
            -11303, -11097, -11077, -11067, -11055, -11052, -11045, -11041,
            -11038, -11024, -11020, -11019, -11018, -11014, -10838, -10832,
            -10815, -10800, -10790, -10780, -10764, -10587, -10544, -10533,
            -10519, -10331, -10329, -10328, -10322, -10315, -10309, -10307,
            -10296, -10281, -10274, -10270, -10262, -10260, -10256, -10254 };

    // Pinyin syllables in the same order as pyvalue.
    private static String[] pystr = new String[] { "a", "ai", "an", "ang",
            "ao", "ba", "bai", "ban", "bang", "bao", "bei", "ben", "beng",
            "bi", "bian", "biao", "bie", "bin", "bing", "bo", "bu", "ca",
            "cai", "can", "cang", "cao", "ce", "ceng", "cha", "chai", "chan",
            "chang", "chao", "che", "chen", "cheng", "chi", "chong", "chou",
            "chu", "chuai", "chuan", "chuang", "chui", "chun", "chuo", "ci",
            "cong", "cou", "cu", "cuan", "cui", "cun", "cuo", "da", "dai",
            "dan", "dang", "dao", "de", "deng", "di", "dian", "diao", "die",
            "ding", "diu", "dong", "dou", "du", "duan", "dui", "dun", "duo",
            "e", "en", "er", "fa", "fan", "fang", "fei", "fen", "feng", "fo",
            "fou", "fu", "ga", "gai", "gan", "gang", "gao", "ge", "gei", "gen",
            "geng", "gong", "gou", "gu", "gua", "guai", "guan", "guang", "gui",
            "gun", "guo", "ha", "hai", "han", "hang", "hao", "he", "hei",
            "hen", "heng", "hong", "hou", "hu", "hua", "huai", "huan", "huang",
            "hui", "hun", "huo", "ji", "jia", "jian", "jiang", "jiao", "jie",
            "jin", "jing", "jiong", "jiu", "ju", "juan", "jue", "jun", "ka",
            "kai", "kan", "kang", "kao", "ke", "ken", "keng", "kong", "kou",
            "ku", "kua", "kuai", "kuan", "kuang", "kui", "kun", "kuo", "la",
            "lai", "lan", "lang", "lao", "le", "lei", "leng", "li", "lia",
            "lian", "liang", "liao", "lie", "lin", "ling", "liu", "long",
            "lou", "lu", "lv", "luan", "lue", "lun", "luo", "ma", "mai", "man",
            "mang", "mao", "me", "mei", "men", "meng", "mi", "mian", "miao",
            "mie", "min", "ming", "miu", "mo", "mou", "mu", "na", "nai", "nan",
            "nang", "nao", "ne", "nei", "nen", "neng", "ni", "nian", "niang",
            "niao", "nie", "nin", "ning", "niu", "nong", "nu", "nv", "nuan",
            "nue", "nuo", "o", "ou", "pa", "pai", "pan", "pang", "pao", "pei",
            "pen", "peng", "pi", "pian", "piao", "pie", "pin", "ping", "po",
            "pu", "qi", "qia", "qian", "qiang", "qiao", "qie", "qin", "qing",
            "qiong", "qiu", "qu", "quan", "que", "qun", "ran", "rang", "rao",
            "re", "ren", "reng", "ri", "rong", "rou", "ru", "ruan", "rui",
            "run", "ruo", "sa", "sai", "san", "sang", "sao", "se", "sen",
            "seng", "sha", "shai", "shan", "shang", "shao", "she", "shen",
            "sheng", "shi", "shou", "shu", "shua", "shuai", "shuan", "shuang",
            "shui", "shun", "shuo", "si", "song", "sou", "su", "suan", "sui",
            "sun", "suo", "ta", "tai", "tan", "tang", "tao", "te", "teng",
            "ti", "tian", "tiao", "tie", "ting", "tong", "tou", "tu", "tuan",
            "tui", "tun", "tuo", "wa", "wai", "wan", "wang", "wei", "wen",
            "weng", "wo", "wu", "xi", "xia", "xian", "xiang", "xiao", "xie",
            "xin", "xing", "xiong", "xiu", "xu", "xuan", "xue", "xun", "ya",
            "yan", "yang", "yao", "ye", "yi", "yin", "ying", "yo", "yong",
            "you", "yu", "yuan", "yue", "yun", "za", "zai", "zan", "zang",
            "zao", "ze", "zei", "zen", "zeng", "zha", "zhai", "zhan", "zhang",
            "zhao", "zhe", "zhen", "zheng", "zhi", "zhong", "zhou", "zhu",
            "zhua", "zhuai", "zhuan", "zhuang", "zhui", "zhun", "zhuo", "zi",
            "zong", "zou", "zu", "zuan", "zui", "zun", "zuo" };

    // Returns the GB2312 code of a one-character string as a signed 16-bit value
    // (negative for Chinese characters), the plain character code for a
    // single-byte character, or 0 on error.
    public static int getChsAscii(String chs) {
        int asc = 0;
        try {
            byte[] bytes = chs.getBytes("gb2312");
            if (bytes == null || bytes.length > 2 || bytes.length <= 0) { // error
                // log
                System.out.println("error");
            } else if (bytes.length == 1) { // single-byte (English) character
                asc = bytes[0];
            } else { // two bytes: Chinese character
                // Java bytes are signed; adding 256 recovers the unsigned values.
                int highByte = 256 + bytes[0];
                int lowByte = 256 + bytes[1];
                asc = (256 * highByte + lowByte) - 256 * 256;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return asc;
    }

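    // Converts a one-character string to pinyin; returns null when no pinyin is known.
    public static String convert(String str) {
        String result = null;
        int ascii = getChsAscii(str);
        if (ascii > 0 && ascii < 160) {
            // Single-byte character: return it unchanged.
            result = String.valueOf((char) ascii);
        } else if (ascii >= -20319 && ascii <= -10247) {
            // Only GB2312 level-1 characters (0xB0A1 to 0xD7F9) are ordered by
            // pinyin. Without this range check, every level-2 character would
            // match the last table entry and come out as "zuo".
            for (int i = (pyvalue.length - 1); i >= 0; i--) {
                if (pyvalue[i] <= ascii) {
                    result = pystr[i];
                    break;
                }
            }
        }
        return result;
    }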

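    // Converts a whole string: Chinese characters become pinyin, everything
    // else is copied through unchanged.
    public static String getPy1(String chs) {
        StringBuilder py = new StringBuilder();
        String key, value;
        for (int i = 0; i < chs.length(); i++) {
            key = chs.substring(i, i + 1);
            // Measure the length in GB2312 explicitly. key.getBytes() uses the
            // platform default charset, and a Chinese character is 3 bytes in UTF-8.
            if (key.getBytes(java.nio.charset.Charset.forName("GB2312")).length == 2) {
                value = convert(key);
                if (value == null) {
                    value = "unknown";
                }
            } else {
                value = key;
            }
            py.append(value);
        }
        return py.toString();
    }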
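
To make the arithmetic in getChsAscii concrete, here is a minimal self-contained sketch. The class name Gb2312CodeDemo and the method signedCode are made up for illustration, and it assumes the JDK in use ships the GB2312 charset. The character 啊 is encoded in GB2312 as the bytes 0xB0 0xA1; read as a 16-bit number and shifted down by 65536, that gives the first value in pyvalue.

```java
import java.nio.charset.Charset;

public class Gb2312CodeDemo {
    // Hypothetical helper: the signed 16-bit GB2312 code of a two-byte character.
    static int signedCode(String ch) {
        byte[] b = ch.getBytes(Charset.forName("GB2312"));
        if (b.length != 2) {
            throw new IllegalArgumentException("not a two-byte GB2312 character");
        }
        int high = b[0] & 0xFF; // unsigned value of the first byte
        int low = b[1] & 0xFF;  // unsigned value of the second byte
        return ((high << 8) | low) - 0x10000;
    }

    public static void main(String[] args) {
        // "\u554a" is 啊: bytes 0xB0 0xA1, so 0xB0A1 - 65536 = -20319
        System.out.println(signedCode("\u554a"));
    }
}
```

Masking with 0xFF does the same job as adding 256 in the listing above, since a byte from a two-byte GB2312 character is always negative when read as a Java byte.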





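# re: Getting Pinyin from Chinese Characters, Java Version 2007-01-05 10:34 yerba

The second method fails on some special characters. For example, the fairly common "琪" comes out as "zuo". I don't know whether that is a problem with your data.
Also, neither of your two methods can recognize polyphonic characters. To handle those, my approach is to export the dictionary of the Microsoft Pinyin input method, which contains every reading.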
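
A likely explanation for the "zuo" result: GB2312 orders only its level-1 characters (0xB0A1 to 0xD7F9) by pinyin, while level-2 characters are ordered by radical and stroke count. A table of syllable start codes can therefore only work for level 1, and any code above the last entry matches "zuo" unless the range is checked. The sketch below is one way to test that range; the class and method names are hypothetical, and it assumes the JDK provides the GB2312 charset.

```java
import java.nio.charset.Charset;

public class Gb2312LevelCheck {
    static final int LEVEL1_FIRST = 0xB0A1; // first level-1 character
    static final int LEVEL1_LAST = 0xD7F9;  // last level-1 character

    // True only for characters in the pinyin-ordered level-1 block of GB2312.
    static boolean isLevel1(char c) {
        byte[] b = String.valueOf(c).getBytes(Charset.forName("GB2312"));
        if (b.length != 2) {
            return false; // single-byte, or not encodable in GB2312
        }
        int code = ((b[0] & 0xFF) << 8) | (b[1] & 0xFF);
        return code >= LEVEL1_FIRST && code <= LEVEL1_LAST;
    }

    public static void main(String[] args) {
        System.out.println(isLevel1('\u554a')); // 啊
        System.out.println(isLevel1('\u742a')); // 琪
    }
}
```

Characters for which isLevel1 returns false need a real lookup table; the code-range trick cannot produce their pinyin.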

# re: Getting Pinyin from Chinese Characters, Java Version 2007-01-05 11:15 Ivan Chen

Try pinyin4j: http://pinyin4j.sf.net/
It is very good.

# re: Getting Pinyin from Chinese Characters, Java Version 2007-01-05 12:22 junmy [anonymous]

Here is a .NET version,
see: http://www.cnblogs.com/jillzhang/archive/2006/10/30/544596.html

# re: Getting Pinyin from Chinese Characters, Java Version 2007-01-05 16:11 Alex

Nice, worth using as a reference.

# re: Getting Pinyin from Chinese Characters, Java Version 2007-01-05 22:21 blackbat

Can the first method handle every Chinese character?

# re: Getting Pinyin from Chinese Characters, Java Version 2007-01-07 14:52 slx

Whether it covers every character depends on how complete your character table is.
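
The table-driven first method can be sketched as follows. The three entries are placeholders chosen for illustration; a real table would be loaded from a full character-to-pinyin list. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class PinyinTableLookup {
    // Placeholder table; coverage is exactly what is put in here.
    private static final Map<Character, String> TABLE = new HashMap<>();
    static {
        TABLE.put('\u4e2d', "zhong"); // 中
        TABLE.put('\u56fd', "guo");   // 国
        TABLE.put('\u742a', "qi");    // 琪
    }

    // Replaces every character found in the table; copies the rest unchanged.
    static String toPinyin(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            String py = TABLE.get(c);
            sb.append(py != null ? py : String.valueOf(c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPinyin("\u4e2d\u56fdA")); // 中国A -> zhongguoA
    }
}
```

Because the lookup is keyed on the character itself, it does not depend on GB2312 ordering, so it works for any character the table contains.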

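# re: Getting Pinyin from Chinese Characters, Java Version 2007-01-10 10:00 jyb

The table exported from Microsoft Pinyin is fairly complete; it is probably GBK. As for polyphonic characters, I think they have to be handled case by case. If you are asked to convert a person's Chinese name or a place name to pinyin, you cannot simply emit every reading of a polyphonic character. Is 重庆 "chongqing", or "chongzhongqing"?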
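
One common way to deal with this case-by-case problem is to look up known words before falling back to single characters. The sketch below only illustrates the idea; the two small tables and all names in it are hypothetical, and a real system would need a much larger phrase dictionary.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class PhraseFirstPinyin {
    // Hypothetical phrase table: words whose reading differs from the per-character default.
    private static final Map<String, String> PHRASES = new LinkedHashMap<>();
    // Hypothetical per-character defaults.
    private static final Map<Character, String> CHARS = new HashMap<>();
    static {
        PHRASES.put("\u91cd\u5e86", "chongqing"); // 重庆
        CHARS.put('\u91cd', "zhong");             // 重, default reading
        CHARS.put('\u5e86', "qing");              // 庆
    }

    static String toPinyin(String text) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            boolean matched = false;
            // Try every known phrase at the current position first.
            for (Map.Entry<String, String> e : PHRASES.entrySet()) {
                if (text.startsWith(e.getKey(), i)) {
                    sb.append(e.getValue());
                    i += e.getKey().length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                // No phrase matched: fall back to the single-character table.
                char c = text.charAt(i);
                String py = CHARS.get(c);
                sb.append(py != null ? py : String.valueOf(c));
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPinyin("\u91cd\u5e86")); // 重庆 -> chongqing
        System.out.println(toPinyin("\u5e86\u91cd")); // 庆重 -> qingzhong
    }
}
```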

# re: Getting Pinyin from Chinese Characters, Java Version [not logged in] 2007-01-19 10:08 Leon

Nice, very useful.

posted on 2007-02-26 09:36 by liaojiyong. Category: Java

