Avenue U

4.2 Title

The text in an HTML page's title tag has always played a vital role in web page retrieval. At the beginning of this project, a large number of experiments were conducted using the title method. It was believed that the success rate would reach 90% when title text was used as the query, provided the query was composed carefully and properly. Figure 4.12 shows that the title method is also stable as the number of words in the query changes.

It is important to note that, although the classic methods give better results in Figure 4.1 to Figure 4.10, this only means that the HTML extraction performs well: it filters out the structural HTML tags and functional scripts that could otherwise be major distractions when the methods are applied to the target page, because the basic retrieval process is designed only for pure text without structural tags. For example, if they are not filtered out or removed in the pre-processing step, HTML tags such as 'td' and 'tr' will have very high term frequencies, and function or variable names in JavaScript will have very low document frequencies. With the title method, by contrast, extraction is much easier, because only the text between <title> and </title> is needed.
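As a rough illustration of why the title method needs so little pre-processing, the sketch below pulls out only the text between <title> and </title> and keeps the first few words as a query. It is not the code used in this project: the names `TitleExtractor` and `title_query`, and the choice of simply taking the leading words, are assumptions made for the example.

```python
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects only the text found between <title> and </title>."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text inside <td>, <script> and so on is ignored entirely.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace into single spaces.
        return " ".join("".join(self._parts).split())


def title_query(html, max_words):
    """Hypothetical helper: use the first max_words title words as the query."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.title.split()[:max_words])


page = (
    "<html><head><title>Breaking News: Markets Rally Today</title>"
    "<script>var counter = 1;</script></head>"
    "<body><table><tr><td>cell</td></tr></table></body></html>"
)
print(title_query(page, 3))
```

Because nothing outside the title element is ever collected, tags like 'td' and 'tr' and the JavaScript variable names never reach the term statistics.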

Title tag (words in query)   Google    Google %   Yahoo     Yahoo %
3                            82.00     36.44%     72.00     32.00%
4                            91.00     40.44%     86.00     38.22%
5                            111.00    49.33%     94.00     41.78%
6                            116.00    51.56%     99.00     44.00%
7                            116.00    51.56%     102.00    45.33%
8                            115.00    51.11%     102.00    45.33%
9                            115.00    51.11%     101.00    44.89%
10                           115.00    51.11%     101.00    44.89%
11                           115.00    51.11%     102.00    45.33%
12                           117.00    52.00%     102.00    45.33%
13                           118.00    52.44%     103.00    45.78%
14                           126.00    56.00%     111.00    49.33%
15                           127.00    56.44%     112.00    49.78%
Average                      112.62    50.05%     99.00     44.00%

Table 4.11
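The percentages in Table 4.11 can be reproduced from the counts with a short check. The sketch below assumes a test set of 225 pages; that figure is not stated in the text and is inferred only from the fact that every count divided by 225 matches the listed rate. The names `rate` and `average` are illustrative.

```python
# Assumption: 225 test pages, inferred from the table (e.g. 72 / 225 = 32.00%).
TOTAL_PAGES = 225

# Counts from Table 4.11 for queries of 3 to 15 title words.
google = [82, 91, 111, 116, 116, 115, 115, 115, 115, 117, 118, 126, 127]
yahoo = [72, 86, 94, 99, 102, 102, 101, 101, 102, 102, 103, 111, 112]


def rate(count, total=TOTAL_PAGES):
    """Percentage of the assumed test set, rounded to two decimals."""
    return round(100.0 * count / total, 2)


def average(counts):
    """Mean count over all query lengths, rounded to two decimals."""
    return round(sum(counts) / len(counts), 2)


print(average(google), rate(average(google)))
print(average(yahoo), rate(average(yahoo)))
```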

 

[Figure 4.12, panels (a) and (b): Using title terms as the search query]

posted on 2009-06-18 08:25 by JosephQuinn. Category: My Master-degree Project

Copyright © JosephQuinn