Lucene in Action ch2 gives a systematic walkthrough of indexing; let's take a look.
1. The indexing process
First, the data to be indexed has to be converted to text, because Lucene can only index text. The text is then run through analysis, which filters out the stop words mentioned in ch1, and after that the index is built. The index Lucene builds is an inverted index.
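To make the analysis step concrete, here is a minimal sketch of my own (not from the book; it assumes the Lucene 1.4-era API the book uses, where TokenStream.next() returns a Token) that prints the terms an Analyzer actually emits, so you can see the stop words being dropped before the inverted index is built:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical demo class (not from the book): shows which terms survive analysis.
public class AnalysisDemo {
  public static void main(String[] args) throws IOException {
    Analyzer analyzer = new StopAnalyzer();   // lowercases and removes common English stop words
    TokenStream stream = analyzer.tokenStream("contents",
        new StringReader("The quick brown fox and the lazy dog"));
    Token token;
    while ((token = stream.next()) != null) { // Lucene 1.4-era TokenStream API
      System.out.println(token.termText());   // prints: quick, brown, fox, lazy, dog
    }
  }
}

Only the terms printed here would end up in the inverted index for that field.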
2. Basic index operations
The basic operations are: add, delete, and update.
I. Adding
Let's look at some example code, BaseIndexingTestCase:
package lia.indexing;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;

import junit.framework.TestCase;
import java.io.IOException;

/**
 * Base test case that builds a small index before each test runs.
 */
public abstract class BaseIndexingTestCase extends TestCase {
  protected String[] keywords = {"1", "2"};
  protected String[] unindexed = {"Netherlands", "Italy"};
  protected String[] unstored = {"Amsterdam has lots of bridges",
                                 "Venice has lots of canals"};
  protected String[] text = {"Amsterdam", "Venice"};
  protected Directory dir;

  // setUp method: create the index directory and add the documents
  protected void setUp() throws IOException {
    String indexDir =
        System.getProperty("java.io.tmpdir", "tmp") +
        System.getProperty("file.separator") + "index-dir";
    dir = FSDirectory.getDirectory(indexDir, true);
    addDocuments(dir);
  }

  protected void addDocuments(Directory dir)
      throws IOException {
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
                                         true);   // obtain an IndexWriter instance
    writer.setUseCompoundFile(isCompound());
    for (int i = 0; i < keywords.length; i++) {
      Document doc = new Document();               // add a document
      doc.add(Field.Keyword("id", keywords[i]));
      doc.add(Field.UnIndexed("country", unindexed[i]));
      doc.add(Field.UnStored("contents", unstored[i]));
      doc.add(Field.Text("city", text[i]));
      writer.addDocument(doc);
    }
    writer.optimize();                             // optimize the index
    writer.close();
  }

  // subclasses can override this to supply a different Analyzer
  protected Analyzer getAnalyzer() {
    return new SimpleAnalyzer();
  }

  // subclasses can also override this to decide whether the index uses the compound file format
  protected boolean isCompound() {
    return true;
  }

  // test adding documents with IndexWriter
  public void testIndexWriter() throws IOException {
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
                                         false);
    assertEquals(keywords.length, writer.docCount());
    writer.close();
  }

  // test IndexReader
  public void testIndexReader() throws IOException {
    IndexReader reader = IndexReader.open(dir);
    assertEquals(keywords.length, reader.maxDoc());
    assertEquals(keywords.length, reader.numDocs());
    reader.close();
  }
}
This is a test superclass that other test cases can extend to test different pieces of functionality; the code above carries detailed comments.
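As a hypothetical illustration (the subclass name is mine, not from the book), a concrete test case only needs to override the hooks it cares about, for example swapping in a StopAnalyzer and a non-compound index:

package lia.indexing;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopAnalyzer;

// Hypothetical subclass: reuses setUp()/addDocuments() from the base class
// and only changes the Analyzer and the compound-file setting.
public class StopAnalyzerIndexingTest extends BaseIndexingTestCase {
  protected Analyzer getAnalyzer() {
    return new StopAnalyzer();   // stop words are removed at index time
  }
  protected boolean isCompound() {
    return false;                // write a multi-file (non-compound) index
  }
}

The inherited testIndexWriter() and testIndexReader() then run against the index built with these settings.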
When adding Fields you will sometimes have to deal with synonyms. There are two ways to add them:
a. Combine the synonym group into a single String and add it to the field in one shot.
b. Add each synonym, one by one, to the same field as the base word, as follows:
String baseWord = "fast";
String[] synonyms = {"quick", "rapid", "speedy"};
Document doc = new Document();
doc.add(Field.Text("word", baseWord));
for (int i = 0; i < synonyms.length; i++) {
doc.add(Field.Text("word", synonyms[i]));
}
This way, Lucene internally adds every one of these values to the single Field named word, so at search time any of the given words will match the document.
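As a rough sketch (the class name and the use of RAMDirectory are my own; it assumes the Lucene 1.4-era search API of IndexSearcher, Hits, and TermQuery), searching on any one of the synonyms finds the document:

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

// Hypothetical demo class: index "fast" plus its synonyms under one field name,
// then search on a synonym.
public class SynonymFieldDemo {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), true);

    Document doc = new Document();
    doc.add(Field.Text("word", "fast"));
    String[] synonyms = {"quick", "rapid", "speedy"};
    for (int i = 0; i < synonyms.length; i++) {
      doc.add(Field.Text("word", synonyms[i]));   // same field name, multiple values
    }
    writer.addDocument(doc);
    writer.close();

    // a query on any one synonym matches the document
    IndexSearcher searcher = new IndexSearcher(dir);
    Hits hits = searcher.search(new TermQuery(new Term("word", "speedy")));
    System.out.println("hits: " + hits.length());  // expected: 1
    searcher.close();
  }
}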