Lucene In Action ch2
gives a systematic treatment of indexing. Let's take a look.
1. The indexing process
First, the data to be indexed must be converted to text, because Lucene can only index text. An Analyzer then filters the text, removing the so-called stop words mentioned in ch1, and then the index is built. The index Lucene builds is an inverted index.
2. Basic index operations
The basic operations are: adding, deleting, and updating.
I. Adding
Let's look at some example code, BaseIndexingTestCase.java:
package lia.indexing;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;

import junit.framework.TestCase;
import java.io.IOException;

/**
 *
 */
public abstract class BaseIndexingTestCase extends TestCase {
  protected String[] keywords = {"1", "2"};
  protected String[] unindexed = {"Netherlands", "Italy"};
  protected String[] unstored = {"Amsterdam has lots of bridges",
                                 "Venice has lots of canals"};
  protected String[] text = {"Amsterdam", "Venice"};
  protected Directory dir;

  // setUp method: creates the index directory and adds the documents
  protected void setUp() throws IOException {
    String indexDir =
      System.getProperty("java.io.tmpdir", "tmp") +
      System.getProperty("file.separator") + "index-dir";
    dir = FSDirectory.getDirectory(indexDir, true);
    addDocuments(dir);
  }

  protected void addDocuments(Directory dir)
      throws IOException {
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      true);                              // obtain an IndexWriter instance
    writer.setUseCompoundFile(isCompound());
    for (int i = 0; i < keywords.length; i++) {
      Document doc = new Document();      // add a document
      doc.add(Field.Keyword("id", keywords[i]));
      doc.add(Field.UnIndexed("country", unindexed[i]));
      doc.add(Field.UnStored("contents", unstored[i]));
      doc.add(Field.Text("city", text[i]));
      writer.addDocument(doc);
    }
    writer.optimize();                    // optimize the index
    writer.close();
  }

  // override this method to supply a different Analyzer
  protected Analyzer getAnalyzer() {
    return new SimpleAnalyzer();
  }

  // override this method to control whether the compound index format is used
  protected boolean isCompound() {
    return true;
  }

  // test that documents were added
  public void testIndexWriter() throws IOException {
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      false);
    assertEquals(keywords.length, writer.docCount());
    writer.close();
  }

  // test IndexReader
  public void testIndexReader() throws IOException {
    IndexReader reader = IndexReader.open(dir);
    assertEquals(keywords.length, reader.maxDoc());
    assertEquals(keywords.length, reader.numDocs());
    reader.close();
  }
}
This is a test superclass that other test cases can extend to test different functionality. The listing above is annotated with comments.
When adding Fields you may run into synonyms. There are two ways to add them:
a. Create a group of synonyms and loop over it, adding each one as a separate single-String Field.
b. Add the synonyms to the same field as the base word, like this:
String baseWord = "fast";
String[] synonyms = {"quick", "rapid", "speedy"};
Document doc = new Document();
doc.add(Field.Text("word", baseWord));
for (int i = 0; i < synonyms.length; i++) {
doc.add(Field.Text("word", synonyms[i]));
}
Internally, Lucene adds each of these words to a single Field named word, so at search time you can use any one of the given words to find the document.
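For example (a minimal sketch, assuming the document above has been committed to an index in dir and the usual search classes are imported), searching for any one of the synonyms finds the document:
IndexSearcher searcher = new IndexSearcher(dir);
Hits hits = searcher.search(new TermQuery(new Term("word", "rapid")));
// the same document matches for "fast", "quick", "rapid", or "speedy"
searcher.close();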
II. Deleting
Documents are deleted through IndexReader. IndexReader does not remove a Document from the index immediately; it marks it as Deleted, and the document is actually removed when the IndexReader's close() method is called. This is much like Hibernate's lazy loading, which doesn't load data right away but only when you actually use it.
Here is a test example:
DocumentDeleteTest.java
package lia.indexing;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;

import java.io.IOException;

/**
 *
 */
public class DocumentDeleteTest extends BaseIndexingTestCase {
  // test deletion before segments are merged
  public void testDeleteBeforeIndexMerge() throws IOException {
    IndexReader reader = IndexReader.open(dir);
    assertEquals(2, reader.maxDoc());    // the next document number is 2
    assertEquals(2, reader.numDocs());   // number of documents in the index
    reader.delete(1);                    // delete the document numbered 1

    assertTrue(reader.isDeleted(1));     // document 1 is marked as deleted
    assertTrue(reader.hasDeletions());   // the index contains documents marked as deleted
    assertEquals(2, reader.maxDoc());    // note these two lines:
    assertEquals(1, reader.numDocs());   // only one undeleted document remains

    reader.close();

    reader = IndexReader.open(dir);

    assertEquals(2, reader.maxDoc());    // reopen the reader and check again
    assertEquals(1, reader.numDocs());

    reader.close();
  }

  // test deletion after segments are merged
  public void testDeleteAfterIndexMerge() throws IOException {
    IndexReader reader = IndexReader.open(dir);
    assertEquals(2, reader.maxDoc());
    assertEquals(2, reader.numDocs());
    reader.delete(1);
    reader.close();

    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      false);
    writer.optimize();                   // optimizing merges segments and renumbers the documents below
    writer.close();

    reader = IndexReader.open(dir);

    assertFalse(reader.isDeleted(1));
    assertFalse(reader.hasDeletions());
    assertEquals(1, reader.maxDoc());    // note the difference from the previous test
    assertEquals(1, reader.numDocs());

    reader.close();
  }
}
The DocumentDeleteTest class extends the class above and is annotated with comments.
A Document can be deleted by its internal Lucene document number, but that number is not fixed: it changes after a Merge. You can also delete by search, removing all documents that match a condition. For example:
IndexReader reader = IndexReader.open(dir);
reader.delete(new Term("city", "Amsterdam"));
reader.close();
This deletes only the documents that match the Term. Be careful with this method: one slip and you may delete all of your data.
As for why Lucene deletes documents through IndexReader rather than IndexWriter, see the explanation in Lucene in Action:
You may wonder why Lucene performs Document deletion from IndexReader
and not IndexWriter instances. That question is asked in the Lucene community
every few months, probably due to imperfect and perhaps misleading class names.
Lucene users often think that IndexWriter is the only class that can modify an
index and that IndexReader accesses an index in a read-only fashion. In reality,
IndexWriter touches only the list of index segments and a small subset of index
files when segments are merged. On the other hand, IndexReader knows how to
parse all index files and make sense out of them. When a Document is deleted,
IndexReader first needs to locate the segment containing the specified Document
before it can mark it as deleted. There are currently no plans to change either the
names or behavior of these two Lucene classes.
Note: because deletion is deferred, you still have a chance to restore documents marked as deleted before the Reader is closed; just call the undeleteAll() method, as sketched below.
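A minimal sketch of restoring a pending deletion (assuming the two-document index from the tests above):
IndexReader reader = IndexReader.open(dir);
reader.delete(1);                   // mark document 1 as deleted
reader.undeleteAll();               // change your mind: unmark all pending deletions
assertEquals(2, reader.numDocs());  // both documents are visible again
reader.close();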
III. Updating
Lucene has no in-place update; you can only delete the old document and then add the new one, as the following example shows:
DocumentUpdateTest.java
package lia.indexing;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;

import java.io.IOException;

/**
 *
 */
public class DocumentUpdateTest extends BaseIndexingTestCase {

  public void testUpdate() throws IOException {

    assertEquals(1, getHitCount("city", "Amsterdam"));

    IndexReader reader = IndexReader.open(dir);
    reader.delete(new Term("city", "Amsterdam"));  // delete the old document
    reader.close();

    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      false);
    Document doc = new Document();
    doc.add(Field.Keyword("id", "1"));
    doc.add(Field.UnIndexed("country", "Russia"));
    doc.add(Field.UnStored("contents",
      "St. Petersburg has lots of bridges"));
    doc.add(Field.Text("city", "St. Petersburg")); // re-add the document
    writer.addDocument(doc);                       // the update is complete
    writer.optimize();
    writer.close();

    assertEquals(0, getHitCount("city", "Amsterdam"));
    assertEquals(1, getHitCount("city", "Petersburg"));
  }

  protected Analyzer getAnalyzer() {
    return new WhitespaceAnalyzer();
  }

  private int getHitCount(String fieldName, String searchString)
      throws IOException {
    IndexSearcher searcher = new IndexSearcher(dir);
    Term t = new Term(fieldName, searchString);
    Query query = new TermQuery(t);
    Hits hits = searcher.search(query);
    int hitCount = hits.length();
    searcher.close();
    return hitCount;
  }
}
As for why documents are updated this way, I suspect it has to do with how Lucene builds its index; look up the related material for details.
When updating documents in batches, following the steps below will improve performance (a sketch follows the list):
1 Open IndexReader.
2 Delete all the Documents you need to delete.
3 Close IndexReader.
4 Open IndexWriter.
5 Add all the Documents you need to add.
6 Close IndexWriter.
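A minimal sketch of this pattern, assuming an existing index directory dir; the arrays termsToDelete and docsToAdd are hypothetical placeholders for whatever your application collects:
// steps 1-3: one IndexReader performs all the deletions
IndexReader reader = IndexReader.open(dir);
for (int i = 0; i < termsToDelete.length; i++) {
  reader.delete(termsToDelete[i]);
}
reader.close();

// steps 4-6: one IndexWriter performs all the additions
IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), false);
for (int i = 0; i < docsToAdd.length; i++) {
  writer.addDocument(docsToAdd[i]);
}
writer.close();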
3. Boosting Documents and Fields
A boost is a weighting factor that determines how important a piece of data is. For example, you can give less important material a small boost value; the default is 1.0.
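A minimal sketch (the 0.1 and 1.5 values are only illustrative choices); Document exposes a setBoost(float) method:
Document doc = new Document();
doc.add(Field.Keyword("id", "42"));
doc.add(Field.Text("city", "Venice"));
doc.setBoost(0.1f);  // this whole document counts for less than the default 1.0
// an individual Field can be boosted the same way:
// doc.getField("city").setBoost(1.5f);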
4. Indexing dates
Dates matter, and Lucene gives you two ways to index them: Field.Keyword(String, Date), or the DateField class. For example, you can index today's date like this:
Document doc = new Document();
doc.add(Field.Keyword("indexDate", new Date()));
Internally, Lucene uses the DateField class to convert the Date into a String for storage. Note that with this method the Java platform determines the precision of the date, currently down to the millisecond. In practice you may not need that much precision; a day is often enough. In that case you can convert the date to a String such as "YYYYMMDD" yourself before indexing, which also lets you search by year or month. For example, to search by year, a PrefixQuery on YYYY is all you need; PrefixQuery is discussed in ch3.
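A minimal sketch of day-precision indexing (SimpleDateFormat and the field name "modified" are my own illustrative choices, assuming java.text.SimpleDateFormat and java.util.Date are imported):
SimpleDateFormat dayFormat = new SimpleDateFormat("yyyyMMdd");
Document doc = new Document();
doc.add(Field.Keyword("modified", dayFormat.format(new Date())));  // e.g. "20070105"
// at search time a PrefixQuery for the year matches every date within it:
// Query byYear = new PrefixQuery(new Term("modified", "2007"));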
5. Indexing numbers
There are two distinct cases.
I. Numbers inside text, e.g. "Mt. Everest is 8848 meters tall." Here you can simply index the number as part of the String.
II. Numbers that will be used for range searches or sorting. Because numbers are stored as Strings, they sort in lexicographic order. Padding the numbers with leading zeros makes lexicographic order agree with numeric order. Take 7, 20, and 71: unpadded they sort as 20, 7, 71; padded they sort as 07, 20, 71, which is correct.
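A minimal sketch of zero-padding before indexing (the 5-digit width is an arbitrary illustrative choice, assuming java.text.DecimalFormat is imported):
DecimalFormat pad = new DecimalFormat("00000");
Document doc = new Document();
doc.add(Field.Keyword("size", pad.format(7)));  // indexed as "00007"
// "00007" < "00020" < "00071" lexicographically, matching numeric order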
6. Indexing Fields used for sorting
When a search returns Hits, Lucene orders the results by the default Score. If you want to order results by your own criterion, such as the value of a Field, that Field must be Indexed and must not be tokenized (for example, Field.Keyword). Values of Fields used for sorting must be convertible to Integers, Floats, or Strings:
Field.Keyword("size", "4096"); //
Field.Keyword("price", "10.99");
Field.Keyword("author", "Arthur C. Clark");
Although we’ve indexed numeric values as Strings, you can specify the correct
Field type (such as Integer or Long) at sort time, as described in section 5.1.7.
Note: Fields used for sorting have to be indexed and must not be tokenized.
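At search time you can then sort on such a field (a sketch using the Sort API added in Lucene 1.4; someQuery is a hypothetical placeholder for your query):
IndexSearcher searcher = new IndexSearcher(dir);
// parse the "size" keyword values as integers and order the hits by them
Sort bySize = new Sort(new SortField("size", SortField.INT));
Hits hits = searcher.search(someQuery, bySize);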
7. Controlling the indexing process
I. Tuning indexing performance
When you're handling large amounts of data, Lucene's default settings may not suit your situation. Several parameters let you tune Lucene's indexing performance:
| IndexWriter variable | System property | Default value | Description |
| mergeFactor | org.apache.lucene.mergeFactor | 10 | Controls the size and frequency of segment merges |
| maxMergeDocs | org.apache.lucene.maxMergeDocs | Integer.MAX_VALUE | Limits the number of Documents in a segment |
| minMergeDocs | org.apache.lucene.minMergeDocs | 10 | Controls the amount of RAM used when indexing |
For what these parameters actually do, let's look at the book's original text, since it loses something in translation. :)
IndexWriter’s mergeFactor lets you control how many Documents to store in memory
before writing them to the disk, as well as how often to merge multiple index
segments together. (Index segments are covered in appendix B.) With the default
value of 10, Lucene stores 10 Documents in memory before writing them to a single
segment on the disk. The mergeFactor value of 10 also means that once the
number of segments on the disk has reached the power of 10, Lucene merges
these segments into a single segment.
For instance, if you set mergeFactor to 10, a new segment is created on the disk
for every 10 Documents added to the index. When the tenth segment of size 10 is
added, all 10 are merged into a single segment of size 100. When 10 such segments
of size 100 have been added, they’re merged into a single segment containing
1,000 Documents, and so on. Therefore, at any time, there are no more than 9
segments in the index, and the size of each merged segment is the power of 10.
There is a small exception to this rule that has to do with maxMergeDocs,
another IndexWriter instance variable: While merging segments, Lucene ensures that no segment with more than maxMergeDocs Documents is created. For instance,
suppose you set maxMergeDocs to 1,000. When you add the ten-thousandth Document,
instead of merging multiple segments into a single segment of size 10,000,
Lucene creates the tenth segment of size 1,000 and keeps adding new segments
of size 1,000 for every 1,000 Documents added.
Now that you’ve seen how mergeFactor and maxMergeDocs work, you can
deduce that using a higher value for mergeFactor causes Lucene to use more RAM
but let it write data to disk less frequently, consequently speeding up the indexing
process. A lower mergeFactor uses less memory and causes the index to be
updated more frequently, which makes it more up to date but also slows down the
indexing process. Similarly, a higher maxMergeDocs is better suited for batch
indexing, and a lower maxMergeDocs is better for more interactive indexing. Be
aware that because a higher mergeFactor means less frequent merges, it results in
an index with more index files. Although this doesn’t affect indexing performance,
it may slow searching, because Lucene will need to open, read, and process
more index files.
minMergeDocs is another IndexWriter instance variable that affects indexing
performance. Its value controls how many Documents have to be buffered before
they’re merged to a segment. The minMergeDocs parameter lets you trade in
more of your RAM for faster indexing. Unlike mergeFactor, this parameter
doesn’t affect the size of index segments on disk.
Here is a demo program, followed by several timed runs:
package lia.indexing;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 *
 */
public class IndexTuningDemo {

  public static void main(String[] args) throws Exception {
    int docsInIndex = Integer.parseInt(args[0]);

    // create an index called 'index-dir' in a temp directory
    Directory dir = FSDirectory.getDirectory(
      System.getProperty("java.io.tmpdir", "tmp") +
      System.getProperty("file.separator") + "index-dir", true);
    Analyzer analyzer = new SimpleAnalyzer();
    IndexWriter writer = new IndexWriter(dir, analyzer, true);

    // set the variables that affect indexing speed
    writer.mergeFactor = Integer.parseInt(args[1]);
    writer.maxMergeDocs = Integer.parseInt(args[2]);
    writer.minMergeDocs = Integer.parseInt(args[3]);
    writer.infoStream = System.out;

    System.out.println("Merge factor: " + writer.mergeFactor);
    System.out.println("Max merge docs: " + writer.maxMergeDocs);
    System.out.println("Min merge docs: " + writer.minMergeDocs);

    long start = System.currentTimeMillis();
    for (int i = 0; i < docsInIndex; i++) {
      Document doc = new Document();
      doc.add(Field.Text("fieldname", "Bibamus"));
      writer.addDocument(doc);
    }
    writer.close();
    long stop = System.currentTimeMillis();
    System.out.println("Time: " + (stop - start) + " ms");
  }
}
Results from different runs:
% java lia.indexing.IndexTuningDemo 100000 10 9999999 10
Merge factor: 10
Max merge docs: 9999999
Min merge docs: 10
Time: 74136 ms
% java lia.indexing.IndexTuningDemo 100000 100 9999999 10
Merge factor: 100
Max merge docs: 9999999
Min merge docs: 10
Time: 68307 ms
% java lia.indexing.IndexTuningDemo 100000 10 9999999 100
Merge factor: 10
Max merge docs: 9999999
Min merge docs: 100
Time: 54050 ms
% java lia.indexing.IndexTuningDemo 100000 100 9999999 100
Merge factor: 100
Max merge docs: 9999999
Min merge docs: 100
Time: 47831 ms
% java lia.indexing.IndexTuningDemo 100000 100 9999999 1000
Merge factor: 100
Max merge docs: 9999999
Min merge docs: 1000
Time: 44235 ms
% java lia.indexing.IndexTuningDemo 100000 1000 9999999 1000
Merge factor: 1000
Max merge docs: 9999999
Min merge docs: 1000
Time: 44223 ms
% java -server -Xms128m -Xmx256m lia.indexing.IndexTuningDemo 100000 1000 9999999 1000
Merge factor: 1000
Max merge docs: 9999999
Min merge docs: 1000
Time: 36335 ms
% java lia.indexing.IndexTuningDemo 100000 1000 9999999 10000
Exception in thread "main" java.lang.OutOfMemoryError
You can see the effect of each parameter in the runs above.
Note: increasing the mergeFactor and minMergeDocs values speeds up indexing, but only up to a point. If the values are too large, indexing consumes more RAM and eventually runs out of memory, as in the last run above.
II. Indexing in RAM
With RAMDirectory you can build the index in memory instead of on disk, which speeds up processing; of course, the index is gone once the machine shuts down. To exploit RAM's speed, you can use a RAMDirectory as a buffer for batch processing and flush it to an FSDirectory once it reaches a certain size. The steps are:
1 Create an FSDirectory-based index.
2 Create a RAMDirectory-based index.
3 Add Documents to the RAMDirectory-based index.
4 Every so often, flush everything buffered in RAMDirectory into FSDirectory.
5 Go to step 3. (Who says GOTO is dead?)
In Lucene-flavored pseudocode:
FSDirectory fsDir = FSDirectory.getDirectory("/tmp/index", true);
RAMDirectory ramDir = new RAMDirectory();
IndexWriter fsWriter = new IndexWriter(fsDir, new SimpleAnalyzer(), true);
IndexWriter ramWriter = new IndexWriter(ramDir, new SimpleAnalyzer(), true);
while (there are documents to index) {
  ... create Document ...
  ramWriter.addDocument(doc);
  if (condition for flushing memory to disk has been met) {
    fsWriter.addIndexes(new Directory[] {ramDir});  // flush the RAM buffer into the disk index
    ramWriter.close();
    ramWriter = new IndexWriter(ramDir, new SimpleAnalyzer(), true);  // create a fresh buffer
  }
}
You can also fill several RAMDirectory instances in parallel and then merge them into the FSDirectory to gain even more speed.
III. Limiting Field size with maxFieldLength
The maxFieldLength parameter sets the maximum number of Terms indexed for a Field; anything beyond it is ignored and never indexed, so it naturally can't be found by a search. An example:
package lia.indexing;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;

import junit.framework.TestCase;
import java.io.IOException;

/**
 *
 */
public class FieldLengthTest extends TestCase {

  private Directory dir;
  private String[] keywords = {"1", "2"};
  private String[] unindexed = {"Netherlands", "Italy"};
  private String[] unstored = {"Amsterdam has lots of bridges",
                               "Venice has lots of canals"};
  private String[] text = {"Amsterdam", "Venice"};

  protected void setUp() throws IOException {
    String indexDir =
      System.getProperty("java.io.tmpdir", "tmp") +
      System.getProperty("file.separator") + "index-dir";
    dir = FSDirectory.getDirectory(indexDir, true);
  }

  public void testFieldSize() throws IOException {
    addDocuments(dir, 10);                               // index the first 10 Terms of each field
    assertEquals(1, getHitCount("contents", "bridges")); // "bridges" is among the first 10 Terms, so it was indexed

    addDocuments(dir, 1);                                // index only the first Term of each field
    assertEquals(0, getHitCount("contents", "bridges")); // "bridges" was not indexed this time
  }

  private int getHitCount(String fieldName, String searchString)
      throws IOException {
    IndexSearcher searcher = new IndexSearcher(dir);
    Term t = new Term(fieldName, searchString);
    Query query = new TermQuery(t);
    Hits hits = searcher.search(query);
    int hitCount = hits.length();
    searcher.close();
    return hitCount;
  }

  private void addDocuments(Directory dir, int maxFieldLength)
      throws IOException {
    IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(),
      true);
    writer.maxFieldLength = maxFieldLength;  // set the maxFieldLength value
    for (int i = 0; i < keywords.length; i++) {
      Document doc = new Document();
      doc.add(Field.Keyword("id", keywords[i]));
      doc.add(Field.UnIndexed("country", unindexed[i]));
      doc.add(Field.UnStored("contents", unstored[i]));
      doc.add(Field.Text("city", text[i]));
      writer.addDocument(doc);
    }
    writer.optimize();
    writer.close();
  }
}
8. Optimizing an index
Optimizing an index improves search speed. The optimization process temporarily needs double the disk space, because it merges the small segments and only deletes the old files after it finishes.
Note: optimizing an index only affects the speed of searches against that index, and doesn't affect the speed of indexing.
Contrary to a popular belief, optimizing an index doesn’t improve
indexing speed. Optimizing an index improves only the speed of
searching by minimizing the number of index files that need to be
opened, processed, and searched. Optimize an index only at the end of
the indexing process, when you know the index will remain unmodified
for a while.
9. Concurrency, thread safety, and the locking mechanism
Concurrency rules:
- All read-only operations may run concurrently.
- Read-only operations may run concurrently even while the index is being modified.
- Operations that modify an index cannot run concurrently; an index may be modified by only one thread at a time.
- Optimizing, merging, and adding to an index are all modifying operations.
IndexWriter and IndexReader instances can be shared by multiple threads; they synchronize internally, so no external synchronization is needed.
The locking mechanism is implemented with file locks. When locking isn't needed, for example when the index lives on a CD-ROM, the mechanism can be disabled, as sketched below.
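A minimal sketch of disabling locking; disableLuceneLocks is the system property the Lucene 1.4-era FSDirectory consults, but verify the name against your version:
// must be set before the first FSDirectory is opened
System.setProperty("disableLuceneLocks", "true");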
If you see an Exception like the following:
java.io.IOException: Lock obtain timed out
at org.apache.lucene.store.Lock.obtain(Lock.java:97)
at
org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:173)
at lia.indexing.LockTest.testWriteLock(LockTest.java:34)
it means you've misused Lucene's API; go find the bug. :)
10. Debugging indexing
By setting the writer.infoStream variable, e.g.
writer.infoStream = System.out;
you can make Lucene print indexing information, which gives you some insight into its internal workings.
11. Summary
This chapter has given you a solid understanding of how a Lucene index operates.
In addition to adding Documents to an index, you should now be able to
remove and update indexed Documents as well as manipulate a couple of indexing
factors to fine-tune several aspects of indexing to meet your needs. The
knowledge about concurrency, thread-safety, and locking is essential if you’re
using Lucene in a multithreaded application or a multiprocess system. By now
you should be dying to learn how to search with Lucene, and that’s what you’ll
read about in the next chapter.
Well, after eight hours of hard work, ch2 finally makes sense, and indexing no longer confuses me. Tiring, but I'm quite happy. Writing it down here for future reference.
posted on 2007-01-05 10:10 by Lansing
Category: Search Engines