DANCE WITH JAVA

Developing high-quality systems

Comparing Lucene's English Analyzers

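Below is a comparison of several commonly used English analyzers. The differences between them are described in the comments and messages printed by the program.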
SimpleAnalyzer
StandardAnalyzer
WhitespaceAnalyzer
StopAnalyzer

package analyzer;

import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// Written against the Lucene 2.x API (TokenStream.next() / Token.termText()),
// which was current when this post was written; both methods were removed in Lucene 3.0.
public class TestAnalyzer {

    private static String testString1 = "The quick brown fox jumped over the lazy dogs";

    private static String testString2 = "xy&z mail is - xyz@sohu.com";

    // Runs the analyzer over the text and prints one token per line.
    // Uses the TokenStream type instead of casting to Tokenizer or StopFilter,
    // because the concrete class returned by tokenStream() differs per analyzer.
    private static void printTokens(Analyzer analyzer, String testString) throws Exception {
        TokenStream ts = analyzer.tokenStream("", new StringReader(testString));
        Token t;
        while ((t = ts.next()) != null) {
            System.out.println(t.termText());
        }
        ts.close();
    }

    public static void testWhitespace(String testString) throws Exception {
        // Headings go to System.out too, so they cannot interleave out of order
        // with the tokens (System.err is a separate stream).
        System.out.println("=====Whitespace analyzer====");
        System.out.println("Method: split on whitespace only");
        printTokens(new WhitespaceAnalyzer(), testString);
    }

    public static void testSimple(String testString) throws Exception {
        System.out.println("=====Simple analyzer====");
        System.out.println("Method: split on whitespace and every non-letter character, then lowercase");
        printTokens(new SimpleAnalyzer(), testString);
    }

    public static void testStop(String testString) throws Exception {
        System.out.println("=====stop analyzer====");
        System.out.println("Method: split like SimpleAnalyzer, then remove stop words, i.e. words with no real meaning such as is, are, in, on, the");
        printTokens(new StopAnalyzer(), testString);
    }

    public static void testStandard(String testString) throws Exception {
        System.out.println("=====standard analyzer====");
        System.out.println("Method: grammar-based splitting that keeps e-mail addresses and names like xy&z intact, lowercases, removes stop words, and supports Chinese");
        printTokens(new StandardAnalyzer(), testString);
    }

    public static void main(String[] args) throws Exception {
        // String testString = testString1;
        String testString = testString2;
        System.out.println(testString);
        testWhitespace(testString);
        testSimple(testString);
        testStop(testString);
        testStandard(testString);
    }
}
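The splitting rules printed by the program can also be tried without Lucene on the classpath. The following is a minimal plain-Java sketch, assuming the behavior described above: it is not Lucene code, the class `AnalyzerSketch` and its methods are hypothetical, and the stop-word set is modelled on StopAnalyzer's English default list (treat the exact list as an assumption). StandardAnalyzer is left out, because its grammar (e-mail addresses, names such as xy&z, Chinese characters) is too involved for a short sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical plain-Java approximation of three Lucene analyzers; NOT Lucene's implementation.
final class AnalyzerSketch {

    // Assumption: modelled on StopAnalyzer's default English stop words.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
            "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
            "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"));

    // WhitespaceAnalyzer-like: split on runs of whitespace, keep everything else unchanged.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // SimpleAnalyzer-like: a token is a maximal run of letters; any other character
    // (space, digit, '&', '@', '.', '-') ends the current token; tokens are lowercased.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // StopAnalyzer-like: the SimpleAnalyzer-like tokens with stop words removed.
    static List<String> stop(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : simple(text)) {
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "xy&z mail is - xyz@sohu.com";
        System.out.println("whitespace: " + whitespace(text));
        System.out.println("simple: " + simple(text));
        System.out.println("stop: " + stop(text));
    }
}
```

For testString2 the sketch yields [xy&z, mail, is, -, xyz@sohu.com] for the whitespace rule, [xy, z, mail, is, xyz, sohu, com] for the simple rule, and the same list without "is" for the stop rule. This makes the trade-off visible: splitting on non-letters removes punctuation such as "-", but it also breaks "xy&z" and the e-mail address into pieces.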

Posted on 2007-06-20 16:46 by dreamstone. Category: Search engines (Lucene)

# re: Comparing Lucene's English Analyzers 2007-06-20 18:02 good

Nice.
