Lucene In Action ch2
gives a systematic treatment of indexing. Let's take a look.
1. The indexing process
First, the data to be indexed must be converted to text, because Lucene can only index text. An Analyzer then filters the text, removing the so-called stop words mentioned in ch1, and then the index is built. The index Lucene builds is an inverted index.
2. Basic index operations
The basic operations are: adding, deleting, and updating.
I. Adding
Let's look at some example code, BaseIndexingTestCase.java:
package lia.indexing;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;

import junit.framework.TestCase;
import java.io.IOException;

/**
 *
 */
public abstract class BaseIndexingTestCase extends TestCase {
  protected String[] keywords = {"1", "2"};
  protected String[] unindexed = {"Netherlands", "Italy"};
  protected String[] unstored = {"Amsterdam has lots of bridges",
                                 "Venice has lots of canals"};
  protected String[] text = {"Amsterdam", "Venice"};
  protected Directory dir;

  // setUp method: creates the index directory and adds the documents
  protected void setUp() throws IOException {
    String indexDir =
      System.getProperty("java.io.tmpdir", "tmp") +
      System.getProperty("file.separator") + "index-dir";
    dir = FSDirectory.getDirectory(indexDir, true);
    addDocuments(dir);
  }

  protected void addDocuments(Directory dir)
      throws IOException {
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      true);                              // obtain an IndexWriter instance
    writer.setUseCompoundFile(isCompound());
    for (int i = 0; i < keywords.length; i++) {
      Document doc = new Document();      // add a document
      doc.add(Field.Keyword("id", keywords[i]));
      doc.add(Field.UnIndexed("country", unindexed[i]));
      doc.add(Field.UnStored("contents", unstored[i]));
      doc.add(Field.Text("city", text[i]));
      writer.addDocument(doc);
    }
    writer.optimize();                    // optimize the index
    writer.close();
  }

  // override this method to supply a different Analyzer
  protected Analyzer getAnalyzer() {
    return new SimpleAnalyzer();
  }

  // override this method to control whether the compound index format is used
  protected boolean isCompound() {
    return true;
  }

  // test that documents were added
  public void testIndexWriter() throws IOException {
    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      false);
    assertEquals(keywords.length, writer.docCount());
    writer.close();
  }

  // test IndexReader
  public void testIndexReader() throws IOException {
    IndexReader reader = IndexReader.open(dir);
    assertEquals(keywords.length, reader.maxDoc());
    assertEquals(keywords.length, reader.numDocs());
    reader.close();
  }
}
This is a test superclass that other test cases can extend to test different functionality. The listing above is annotated with comments.
When adding Fields you may run into synonyms. There are two ways to add them:
a. Create a group of synonyms and loop over it, adding each one as a separate single-String Field.
b. Add the synonyms to the same field as the base word, like this:
String baseWord = "fast";
String[] synonyms = {"quick", "rapid", "speedy"};
Document doc = new Document();
doc.add(Field.Text("word", baseWord));
for (int i = 0; i < synonyms.length; i++) {
doc.add(Field.Text("word", synonyms[i]));
}
Internally, Lucene adds each of these words to a single Field named word, so at search time you can use any one of the given words to find the document.
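For example (a minimal sketch, assuming the document above has been committed to an index in dir and the usual search classes are imported), searching for any one of the synonyms finds the document:
IndexSearcher searcher = new IndexSearcher(dir);
Hits hits = searcher.search(new TermQuery(new Term("word", "rapid")));
// the same document matches for "fast", "quick", "rapid", or "speedy"
searcher.close();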
II. Deleting
Documents are deleted through IndexReader. IndexReader does not remove a Document from the index immediately; it marks it as Deleted, and the document is actually removed when the IndexReader's close() method is called. This is much like Hibernate's lazy loading, which doesn't load data right away but only when you actually use it.
Here is a test example:
DocumentDeleteTest.java
package lia.indexing;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;

import java.io.IOException;

/**
 *
 */
public class DocumentDeleteTest extends BaseIndexingTestCase {
  // test deletion before segments are merged
  public void testDeleteBeforeIndexMerge() throws IOException {
    IndexReader reader = IndexReader.open(dir);
    assertEquals(2, reader.maxDoc());    // the next document number is 2
    assertEquals(2, reader.numDocs());   // number of documents in the index
    reader.delete(1);                    // delete the document numbered 1

    assertTrue(reader.isDeleted(1));     // document 1 is marked as deleted
    assertTrue(reader.hasDeletions());   // the index contains documents marked as deleted
    assertEquals(2, reader.maxDoc());    // note these two lines:
    assertEquals(1, reader.numDocs());   // only one undeleted document remains

    reader.close();

    reader = IndexReader.open(dir);

    assertEquals(2, reader.maxDoc());    // reopen the reader and check again
    assertEquals(1, reader.numDocs());

    reader.close();
  }

  // test deletion after segments are merged
  public void testDeleteAfterIndexMerge() throws IOException {
    IndexReader reader = IndexReader.open(dir);
    assertEquals(2, reader.maxDoc());
    assertEquals(2, reader.numDocs());
    reader.delete(1);
    reader.close();

    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      false);
    writer.optimize();                   // optimizing merges segments and renumbers the documents below
    writer.close();

    reader = IndexReader.open(dir);

    assertFalse(reader.isDeleted(1));
    assertFalse(reader.hasDeletions());
    assertEquals(1, reader.maxDoc());    // note the difference from the previous test
    assertEquals(1, reader.numDocs());

    reader.close();
  }
}
The DocumentDeleteTest class extends the class above and is annotated with comments.
A Document can be deleted by its internal Lucene document number, but that number is not fixed: it changes after a Merge. You can also delete by search, removing all documents that match a condition. For example:
IndexReader reader = IndexReader.open(dir);
reader.delete(new Term("city", "Amsterdam"));
reader.close();
This deletes only the documents that match the Term. Be careful with this method: one slip and you may delete all of your data.
As for why Lucene deletes documents through IndexReader rather than IndexWriter, see the explanation in Lucene in Action:
You may wonder why Lucene performs Document deletion from IndexReader
and not IndexWriter instances. That question is asked in the Lucene community
every few months, probably due to imperfect and perhaps misleading class names.
Lucene users often think that IndexWriter is the only class that can modify an
index and that IndexReader accesses an index in a read-only fashion. In reality,
IndexWriter touches only the list of index segments and a small subset of index
files when segments are merged. On the other hand, IndexReader knows how to
parse all index files and make sense out of them. When a Document is deleted,
IndexReader first needs to locate the segment containing the specified Document
before it can mark it as deleted. There are currently no plans to change either the
names or behavior of these two Lucene classes.
Note: because deletion is deferred, you still have a chance to restore documents marked as deleted before the Reader is closed; just call the undeleteAll() method, as sketched below.
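A minimal sketch of restoring a pending deletion (assuming the two-document index from the tests above):
IndexReader reader = IndexReader.open(dir);
reader.delete(1);                   // mark document 1 as deleted
reader.undeleteAll();               // change your mind: unmark all pending deletions
assertEquals(2, reader.numDocs());  // both documents are visible again
reader.close();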
III. Updating
Lucene has no in-place update; you can only delete the old document and then add the new one, as the following example shows:
DocumentUpdateTest.java
package lia.indexing;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;

import java.io.IOException;

/**
 *
 */
public class DocumentUpdateTest extends BaseIndexingTestCase {

  public void testUpdate() throws IOException {

    assertEquals(1, getHitCount("city", "Amsterdam"));

    IndexReader reader = IndexReader.open(dir);
    reader.delete(new Term("city", "Amsterdam"));  // delete the old document
    reader.close();

    IndexWriter writer = new IndexWriter(dir, getAnalyzer(),
      false);
    Document doc = new Document();
    doc.add(Field.Keyword("id", "1"));
    doc.add(Field.UnIndexed("country", "Russia"));
    doc.add(Field.UnStored("contents",
      "St. Petersburg has lots of bridges"));
    doc.add(Field.Text("city", "St. Petersburg")); // re-add the document
    writer.addDocument(doc);                       // the update is complete
    writer.optimize();
    writer.close();

    assertEquals(0, getHitCount("city", "Amsterdam"));
    assertEquals(1, getHitCount("city", "Petersburg"));
  }

  protected Analyzer getAnalyzer() {
    return new WhitespaceAnalyzer();
  }

  private int getHitCount(String fieldName, String searchString)
      throws IOException {
    IndexSearcher searcher = new IndexSearcher(dir);
    Term t = new Term(fieldName, searchString);
    Query query = new TermQuery(t);
    Hits hits = searcher.search(query);
    int hitCount = hits.length();
    searcher.close();
    return hitCount;
  }
}
As for why documents are updated this way, I suspect it has to do with how Lucene builds its index; look up the related material for details.
When updating documents in batches, following the steps below will improve performance (a sketch follows the list):
1 Open IndexReader.
2 Delete all the Documents you need to delete.
3 Close IndexReader.
4 Open IndexWriter.
5 Add all the Documents you need to add.
6 Close IndexWriter.
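A minimal sketch of this pattern, assuming an existing index directory dir; the arrays termsToDelete and docsToAdd are hypothetical placeholders for whatever your application collects:
// steps 1-3: one IndexReader performs all the deletions
IndexReader reader = IndexReader.open(dir);
for (int i = 0; i < termsToDelete.length; i++) {
  reader.delete(termsToDelete[i]);
}
reader.close();

// steps 4-6: one IndexWriter performs all the additions
IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), false);
for (int i = 0; i < docsToAdd.length; i++) {
  writer.addDocument(docsToAdd[i]);
}
writer.close();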
3. Boosting Documents and Fields
A boost is a weighting factor that determines how important a piece of data is. For example, you can give less important material a small boost value; the default is 1.0.
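A minimal sketch (the 0.1 and 1.5 values are only illustrative choices); Document exposes a setBoost(float) method:
Document doc = new Document();
doc.add(Field.Keyword("id", "42"));
doc.add(Field.Text("city", "Venice"));
doc.setBoost(0.1f);  // this whole document counts for less than the default 1.0
// an individual Field can be boosted the same way:
// doc.getField("city").setBoost(1.5f);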
4. Indexing dates
Dates matter, and Lucene gives you two ways to index them: Field.Keyword(String, Date), or the DateField class. For example, you can index today's date like this:
Document doc = new Document();
doc.add(Field.Keyword("indexDate", new Date()));
Internally, Lucene uses the DateField class to convert the Date into a String for storage. Note that with this method the Java platform determines the precision of the date, currently down to the millisecond. In practice you may not need that much precision; a day is often enough. In that case you can convert the date to a String such as "YYYYMMDD" yourself before indexing, which also lets you search by year or month. For example, to search by year, a PrefixQuery on YYYY is all you need; PrefixQuery is discussed in ch3.
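A minimal sketch of day-precision indexing (SimpleDateFormat and the field name "modified" are my own illustrative choices, assuming java.text.SimpleDateFormat and java.util.Date are imported):
SimpleDateFormat dayFormat = new SimpleDateFormat("yyyyMMdd");
Document doc = new Document();
doc.add(Field.Keyword("modified", dayFormat.format(new Date())));  // e.g. "20070105"
// at search time a PrefixQuery for the year matches every date within it:
// Query byYear = new PrefixQuery(new Term("modified", "2007"));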
5. Indexing numbers
There are two distinct cases.
I. Numbers inside text, e.g. "Mt. Everest is 8848 meters tall." Here you can simply index the number as part of the String.
II. Numbers that will be used for range searches or sorting. Because numbers are stored as Strings, they sort in lexicographic order. Padding the numbers with leading zeros makes lexicographic order agree with numeric order. Take 7, 20, and 71: unpadded they sort as 20, 7, 71; padded they sort as 07, 20, 71, which is correct.
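A minimal sketch of zero-padding before indexing (the 5-digit width is an arbitrary illustrative choice, assuming java.text.DecimalFormat is imported):
DecimalFormat pad = new DecimalFormat("00000");
Document doc = new Document();
doc.add(Field.Keyword("size", pad.format(7)));  // indexed as "00007"
// "00007" < "00020" < "00071" lexicographically, matching numeric order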
6. Indexing Fields used for sorting
When a search returns Hits, Lucene orders the results by the default Score. If you want to order results by your own criterion, such as the value of a Field, that Field must be Indexed and must not be tokenized (for example, Field.Keyword). Values of Fields used for sorting must be convertible to Integers, Floats, or Strings:
Field.Keyword("size", "4096"); //
Field.Keyword("price", "10.99");
Field.Keyword("author", "Arthur C. Clark");
Although we’ve indexed numeric values as Strings, you can specify the correct
Field type (such as Integer or Long) at sort time, as described in section 5.1.7.
Note: Fields used for sorting have to be indexed and must not be tokenized.
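At search time you can then sort on such a field (a sketch using the Sort API added in Lucene 1.4; someQuery is a hypothetical placeholder for your query):
IndexSearcher searcher = new IndexSearcher(dir);
// parse the "size" keyword values as integers and order the hits by them
Sort bySize = new Sort(new SortField("size", SortField.INT));
Hits hits = searcher.search(someQuery, bySize);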
7. Controlling the indexing process
I. Tuning indexing performance
When you're handling large amounts of data, Lucene's default settings may not suit your situation. Several parameters let you tune Lucene's indexing performance:
| IndexWriter variable | System property | Default value | Description |
| mergeFactor | org.apache.lucene.mergeFactor | 10 | Controls the size and frequency of segment merges |
| maxMergeDocs | org.apache.lucene.maxMergeDocs | Integer.MAX_VALUE | Limits the number of Documents in a segment |
| minMergeDocs | org.apache.lucene.minMergeDocs | 10 | Controls the amount of RAM used when indexing |
For what these parameters actually do, let's look at the book's original text, since it loses something in translation. :)
IndexWriter’s mergeFactor lets you control how many Documents to store in memory
before writing them to the disk, as well as how often to merge multiple index
segments together. (Index segments are covered in appendix B.) With the default
value of 10, Lucene stores 10 Documents in memory before writing them to a single
segment on the disk. The mergeFactor value of 10 also means that once the
number of segments on the disk has reached the power of 10, Lucene merges
these segments into a single segment.
For instance, if you set mergeFactor to 10, a new segment is created on the disk
for every 10 Documents added to the index. When the tenth segment of size 10 is
added, all 10 are merged into a single segment of size 100. When 10 such segments
of size 100 have been added, they’re merged into a single segment containing
1,000 Documents, and so on. Therefore, at any time, there are no more than 9
segments in the index, and the size of each merged segment is the power of 10.
There is a small exception to this rule that has to do with maxMergeDocs,
another IndexWriter instance variable: While merging segments, Lucene ensures that no segment with more than maxMergeDocs Documents is created. For instance,
suppose you set maxMergeDocs to 1,000. When you add the ten-thousandth Document,
instead of merging multiple segments into a single segment of size 10,000,
Lucene creates the tenth segment of size 1,000 and keeps adding new segments
of size 1,000 for every 1,000 Documents added.
Now that you’ve seen how mergeFactor and maxMergeDocs work, you can
deduce that using a higher value for mergeFactor causes Lucene to use more RAM
but let it write data to disk less frequently, consequently speeding up the indexing
process. A lower mergeFactor uses less memory and causes the index to be
updated more frequently, which makes it more up to date but also slows down the
indexing process. Similarly, a higher maxMergeDocs is better suited for batch
indexing, and a lower maxMergeDocs is better for more interactive indexing. Be
aware that because a higher mergeFactor means less frequent merges, it results in
an index with more index files. Although this doesn’t affect indexing performance,
it may slow searching, because Lucene will need to open, read, and process
more index files.
minMergeDocs is another IndexWriter instance variable that affects indexing
performance. Its value controls how many Documents have to be buffered before
they’re merged to a segment. The minMergeDocs parameter lets you trade in
more of your RAM for faster indexing. Unlike mergeFactor, this parameter
doesn’t affect the size of index segments on disk.
Here is a demo program, followed by several timed runs:
package lia.indexing;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

/**
 *
 */
public class IndexTuningDemo {

  public static void main(String[] args) throws Exception {
    int docsInIndex = Integer.parseInt(args[0]);

    // create an index called 'index-dir' in a temp directory
    Directory dir = FSDirectory.getDirectory(
      System.getProperty("java.io.tmpdir", "tmp") +
      System.getProperty("file.separator") + "index-dir", true);
    Analyzer analyzer = new SimpleAnalyzer();
    IndexWriter writer = new IndexWriter(dir, analyzer, true);

    // set the variables that affect indexing speed
    writer.mergeFactor = Integer.parseInt(args[1]);
    writer.maxMergeDocs = Integer.parseInt(args[2]);
    writer.minMergeDocs = Integer.parseInt(args[3]);
    writer.infoStream = System.out;

    System.out.println("Merge factor: " + writer.mergeFactor);
    System.out.println("Max merge docs: " + writer.maxMergeDocs);
    System.out.println("Min merge docs: " + writer.minMergeDocs);

    long start = System.currentTimeMillis();
    for (int i = 0; i < docsInIndex; i++) {
      Document doc = new Document();
      doc.add(Field.Text("fieldname", "Bibamus"));
      writer.addDocument(doc);
    }
    writer.close();
    long stop = System.currentTimeMillis();
    System.out.println("Time: " + (stop - start) + " ms");
  }
}
Results from different runs:
% java lia.indexing.IndexTuningDemo 100000 10 9999999 10
Merge factor: 10
Max merge docs: 9999999
Min merge docs: 10
Time: 74136 ms
% java lia.indexing.IndexTuningDemo 100000 100 9999999 10
Merge factor: 100
Max merge docs: 9999999
Min merge docs: 10
Time: 68307 ms
% java lia.indexing.IndexTuningDemo 100000 10 9999999 100
Merge factor: 10
Max merge docs: 9999999
Min merge docs: 100
Time: 54050 ms
% java lia.indexing.IndexTuningDemo 100000 100 9999999 100
Merge factor: 100
Max merge docs: 9999999
Min merge docs: 100
Time: 47831 ms
% java lia.indexing.IndexTuningDemo 100000 100 9999999 1000
Merge factor: 100
Max merge docs: 9999999
Min merge docs: 1000
Time: 44235 ms
% java lia.indexing.IndexTuningDemo 100000 1000 9999999 1000
Merge factor: 1000
Max merge docs: 9999999
Min merge docs: 1000
Time: 44223 ms
% java -server -Xms128m -Xmx256m lia.indexing.IndexTuningDemo 100000 1000 9999999 1000
Merge factor: 1000
Max merge docs: 9999999
Min merge docs: 1000
Time: 36335 ms
% java lia.indexing.IndexTuningDemo 100000 1000 9999999 10000
Exception in thread "main" java.lang.OutOfMemoryError
You can see the effect of each parameter in the runs above.
Note: increasing the mergeFactor and minMergeDocs values speeds up indexing, but only up to a point. If the values are too large, indexing consumes more RAM and eventually runs out of memory, as in the last run above.
II. Indexing in RAM
With RAMDirectory you can build the index in memory instead of on disk, which speeds up processing; of course, the index is gone once the machine shuts down. To exploit RAM's speed, you can use a RAMDirectory as a buffer for batch processing and flush it to an FSDirectory once it reaches a certain size. The steps are:
1 Create an FSDirectory-based index.
2 Create a RAMDirectory-based index.
3 Add Documents to the RAMDirectory-based index.
4 Every so often, flush everything buffered in RAMDirectory into FSDirectory.
5 Go to step 3. (Who says GOTO is dead?)
In Lucene-flavored pseudocode:
FSDirectory fsDir = FSDirectory.getDirectory("/tmp/index", true);
RAMDirectory ramDir = new RAMDirectory();
IndexWriter fsWriter = new IndexWriter(fsDir, new SimpleAnalyzer(), true);
IndexWriter ramWriter = new IndexWriter(ramDir, new SimpleAnalyzer(), true);
while (there are documents to index) {
  ... create Document ...
  ramWriter.addDocument(doc);
  if (condition for flushing memory to disk has been met) {
    fsWriter.addIndexes(new Directory[] {ramDir});  // flush the RAM buffer into the disk index
    ramWriter.close();
    ramWriter = new IndexWriter(ramDir, new SimpleAnalyzer(), true);  // create a fresh buffer
  }
}
You can also fill several RAMDirectory instances in parallel and then merge them into the FSDirectory to gain even more speed.
III. Limiting Field size with maxFieldLength
The maxFieldLength parameter sets the maximum number of Terms indexed for a Field; anything beyond it is ignored and never indexed, so it naturally can't be found by a search. An example:
package lia.indexing;

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;

import junit.framework.TestCase;
import java.io.IOException;

/**
 *
 */
public class FieldLengthTest extends TestCase {

  private Directory dir;
  private String[] keywords = {"1", "2"};
  private String[] unindexed = {"Netherlands", "Italy"};
  private String[] unstored = {"Amsterdam has lots of bridges",
                               "Venice has lots of canals"};
  private String[] text = {"Amsterdam", "Venice"};

  protected void setUp() throws IOException {
    String indexDir =
      System.getProperty("java.io.tmpdir", "tmp") +
      System.getProperty("file.separator") + "index-dir";
    dir = FSDirectory.getDirectory(indexDir, true);
  }

  public void testFieldSize() throws IOException {
    addDocuments(dir, 10);                               // index the first 10 Terms of each field
    assertEquals(1, getHitCount("contents", "bridges")); // "bridges" is among the first 10 Terms, so it was indexed

    addDocuments(dir, 1);                                // index only the first Term of each field
    assertEquals(0, getHitCount("contents", "bridges")); // "bridges" was not indexed this time
  }

  private int getHitCount(String fieldName, String searchString)
      throws IOException {
    IndexSearcher searcher = new IndexSearcher(dir);
    Term t = new Term(fieldName, searchString);
    Query query = new TermQuery(t);
    Hits hits = searcher.search(query);
    int hitCount = hits.length();
    searcher.close();
    return hitCount;
  }

  private void addDocuments(Directory dir, int maxFieldLength)
      throws IOException {
    IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(),
      true);
    writer.maxFieldLength = maxFieldLength;  // set the maxFieldLength value
    for (int i = 0; i < keywords.length; i++) {
      Document doc = new Document();
      doc.add(Field.Keyword("id", keywords[i]));
      doc.add(Field.UnIndexed("country", unindexed[i]));
      doc.add(Field.UnStored("contents", unstored[i]));
      doc.add(Field.Text("city", text[i]));
      writer.addDocument(doc);
    }
    writer.optimize();
    writer.close();
  }
}
8. Optimizing an index
Optimizing an index improves search speed. The optimization process temporarily needs double the disk space, because it merges the small segments and only deletes the old files after it finishes.
Note: optimizing an index only affects the speed of searches against that index, and doesn't affect the speed of indexing.
Contrary to a popular belief, optimizing an index doesn’t improve
indexing speed. Optimizing an index improves only the speed of
searching by minimizing the number of index files that need to be
opened, processed, and searched. Optimize an index only at the end of
the indexing process, when you know the index will remain unmodified
for a while.
9. Concurrency, thread safety, and the locking mechanism
Concurrency rules:
- All read-only operations may run concurrently.
- Read-only operations may run concurrently even while the index is being modified.
- Operations that modify an index cannot run concurrently; an index may be modified by only one thread at a time.
- Optimizing, merging, and adding to an index are all modifying operations.
IndexWriter and IndexReader instances can be shared by multiple threads; they synchronize internally, so no external synchronization is needed.
The locking mechanism is implemented with file locks. When locking isn't needed, for example when the index lives on a CD-ROM, the mechanism can be disabled, as sketched below.
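A minimal sketch of disabling locking; disableLuceneLocks is the system property the Lucene 1.4-era FSDirectory consults, but verify the name against your version:
// must be set before the first FSDirectory is opened
System.setProperty("disableLuceneLocks", "true");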
If you see an Exception like the following:
java.io.IOException: Lock obtain timed out
at org.apache.lucene.store.Lock.obtain(Lock.java:97)
at
org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:173)
at lia.indexing.LockTest.testWriteLock(LockTest.java:34)
it means you've misused Lucene's API; go find the bug. :)
10. Debugging indexing
By setting the writer.infoStream variable, e.g.
writer.infoStream = System.out;
you can make Lucene print indexing information, which gives you some insight into its internal workings.
11. Summary
This chapter has given you a solid understanding of how a Lucene index operates.
In addition to adding Documents to an index, you should now be able to
remove and update indexed Documents as well as manipulate a couple of indexing
factors to fine-tune several aspects of indexing to meet your needs. The
knowledge about concurrency, thread-safety, and locking is essential if you’re
using Lucene in a multithreaded application or a multiprocess system. By now
you should be dying to learn how to search with Lucene, and that’s what you’ll
read about in the next chapter.
Well, after eight hours of hard work, ch2 finally makes sense, and indexing no longer confuses me. Tiring, but I'm quite happy. Writing it down here for future reference.
posted on 2007-01-05 10:10 by Lansing
Category: Search Engines