Lucene in Action, ch. 1 notes: basic concepts

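In chapter 1 the author explains what Lucene is and what it can be used for, then walks through an indexing and searching example and uses it to introduce a few basic (core) concepts, giving the reader an overview of Lucene. The chapter closes with a look at the search frameworks that are currently popular.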

Here we mainly look at the indexing and searching example and pick up some basic concepts from it.

    package lia.meetlucene;

    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    import java.io.File;
    import java.io.IOException;
    import java.io.FileReader;
    import java.util.Date;

    /**
     * This code was originally written for
     * Erik's Lucene intro java.net article
     */
    public class Indexer {

      public static void main(String[] args) throws Exception {
        if (args.length != 2) {
          throw new Exception("Usage: java " + Indexer.class.getName()
            + " <index dir> <data dir>");
        }
        File indexDir = new File(args[0]); // create the Lucene index in this directory
        File dataDir = new File(args[1]);  // directory holding the files to be indexed

        long start = new Date().getTime();
        int numIndexed = index(indexDir, dataDir);
        long end = new Date().getTime();

        System.out.println("Indexing " + numIndexed + " files took "
          + (end - start) + " milliseconds");
      }

      public static int index(File indexDir, File dataDir) throws IOException {
        if (!dataDir.exists() || !dataDir.isDirectory()) {
          throw new IOException(dataDir
            + " does not exist or is not a directory");
        }

        IndexWriter writer = new IndexWriter(indexDir,
          new StandardAnalyzer(), true); // (1) create the Lucene index
        writer.setUseCompoundFile(false);

        indexDirectory(writer, dataDir);

        int numIndexed = writer.docCount();
        writer.optimize();
        writer.close(); // close index
        return numIndexed;
      }

      private static void indexDirectory(IndexWriter writer, File dir)
          throws IOException {
        File[] files = dir.listFiles();

        for (int i = 0; i < files.length; i++) {
          File f = files[i];
          if (f.isDirectory()) {
            indexDirectory(writer, f); // (2) recurse
          } else if (f.getName().endsWith(".txt")) {
            indexFile(writer, f);
          }
        }
      }

      private static void indexFile(IndexWriter writer, File f)
          throws IOException {
        if (f.isHidden() || !f.exists() || !f.canRead()) {
          return;
        }

        System.out.println("Indexing " + f.getCanonicalPath());

        Document doc = new Document();
        doc.add(Field.Text("contents", new FileReader(f)));       // (3) index file content
        doc.add(Field.Keyword("filename", f.getCanonicalPath())); // (4) index file name
        writer.addDocument(doc); // (5) add document to the Lucene index
      }
    }

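The Indexer above uses only a few lines of the Lucene API to index the files under a directory. It walks the data directory recursively and indexes every readable, non-hidden file whose name ends in .txt. It takes two arguments at run time: the directory in which to store the index, and the directory containing the files to index.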
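To see which files the Indexer would pick up before actually building an index, the traversal can be tried on its own. The following is a minimal sketch that uses only the JDK, not Lucene. The class name TxtFileCollector and the method collect are hypothetical names introduced for illustration; the filter conditions are copied from indexDirectory and indexFile above, with one added assumption: a null result from listFiles() is treated as an empty directory.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the book's code): lists the files
// that the Indexer's directory walk would hand to Lucene.
public class TxtFileCollector {

    // Recursively collects readable, non-hidden files ending in ".txt",
    // mirroring the checks in indexDirectory and indexFile.
    public static List<File> collect(File dir) {
        List<File> found = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            // Assumption: treat "not a directory" or an I/O error as empty.
            return found;
        }
        for (File f : entries) {
            if (f.isDirectory()) {
                found.addAll(collect(f)); // recurse into subdirectories
            } else if (f.getName().endsWith(".txt")
                    && !f.isHidden() && f.exists() && f.canRead()) {
                found.add(f);
            }
        }
        return found;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java TxtFileCollector <data dir>");
            return;
        }
        for (File f : collect(new File(args[0]))) {
            System.out.println(f.getCanonicalPath());
        }
    }
}
```

Running it against the same data directory that will later be passed to the Indexer prints one canonical path per file that would be indexed. The order of the output follows listFiles(), which does not guarantee any particular ordering.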

The class above needs the following Lucene classes to carry out indexing:

■ IndexWriter

■
