
Planning for next job

1. Develop a search engine just for weblogs (the main work will be on the web crawler; the indexer and searcher parts have already been done for XML-based information retrieval)

Motivation:
   a. Weblogs have become more and more popular recently
   b. Though there are some weblog search engines such as Technorati and Blogdigger, there still seems to be a lot of work to do
   c. The weblog feed formats (RSS 2.0 & Atom) are XML-based and fairly standardized, which is very close to my current work on XML-based information retrieval
   d. Easily extensible to crawling XML-based information websites other than weblogs

HOWTO:
      a. Use GData to feed in XML-based information
or    b. Use an open source crawler + Lucene (similar idea in this article)
or    c. Develop my own simple crawler package and merge it into my Shemy project, which is a search engine design with a clustering structure, based on Lucene

      Likely preference: c > a > b (most open source crawlers are built to handle much more complex web pages and links, while a weblog feed is simpler, so the crawler for it should be lighter)
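
To give a feel for how light the feed-handling side of option c could be, here is a minimal sketch using only the XML parser that ships with the JDK. It is not Shemy code: the class name FeedEntryExtractor, the Entry type and its fields are all made up for illustration, and it only pulls out the title and link of each RSS 2.0 item or Atom entry. Fetching feeds over HTTP and handing the entries to a Lucene indexer are left out.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: extracts title/link pairs from an RSS 2.0 or Atom feed. */
public class FeedEntryExtractor {

    /** One feed entry; the field names are illustrative only. */
    public static final class Entry {
        public final String title;
        public final String link;

        Entry(String title, String link) {
            this.title = title;
            this.link = link;
        }
    }

    /** Parses feed XML held in a string and returns its entries in document order. */
    public static List<Entry> parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Feeds come from arbitrary sites, so refuse DOCTYPE declarations.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Atom uses <feed>/<entry>; RSS 2.0 uses <rss>/<channel>/<item>.
        boolean atom = "feed".equals(doc.getDocumentElement().getLocalName());
        NodeList items = doc.getElementsByTagNameNS("*", atom ? "entry" : "item");

        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            String link = atom ? atomLink(item) : childText(item, "link");
            entries.add(new Entry(childText(item, "title"), link));
        }
        return entries;
    }

    /** Returns the trimmed text of the first direct child with this local name, or "". */
    private static String childText(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && localName.equals(n.getLocalName())) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }

    /** Atom keeps the URL in the href attribute; take the first link with no rel or rel="alternate". */
    private static String atomLink(Element entry) {
        for (Node n = entry.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "link".equals(n.getLocalName())) {
                Element link = (Element) n;
                String rel = link.getAttribute("rel");
                if (rel.isEmpty() || "alternate".equals(rel)) {
                    return link.getAttribute("href");
                }
            }
        }
        return "";
    }
}
```

Calling FeedEntryExtractor.parse(feedXml) on a downloaded feed gives a list of entries that could each become one document for the indexer. Since both formats reduce to the same small set of fields, the crawler has no need for the link-following and HTML clean-up machinery of a general-purpose crawler, which is the main reason for leaning towards option c.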

Requirement/Functionality Analysis : (in progress)

Schedule: (in progress)

2. Explore performance tuning of search to improve the Shemy kernel

posted on 2006-05-17 06:36 Dedian

Copyright © Dedian