Avenue U


3.2 Search Engine Selection

As two of the most powerful search engines, Google and Yahoo have the strongest capabilities for searching the surface web. They also provide a range of specialized search functions, such as web search, news search, and image search, that are familiar to people all over the world. A general experiment exploring search capabilities that does not test these two search engines would certainly not be conclusive.

However, implementing a detailed search capability test on Google, Yahoo, or any other search engine requires programming against their result pages, which are rendered from different HTML templates. For example, Figure3.3 and Figure3.5 show the pages returned by Google and Yahoo for the same query, “job search”; differences in result quality are not compared in this section. The two result pages share a very similar layout: the search input box sits at the top, the main content is on the left and takes up more than two thirds of the space, and the remaining third on the right is given to commercial websites as advertisements. The HTML behind the pages, however, is structured quite differently, so the result title, result summary, and result URL cannot be extracted with a single general template; each search engine needs its own extraction algorithm.
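To illustrate why one extraction rule per search engine is needed, here is a minimal Python sketch. It assumes two invented result templates, `engine_a` and `engine_b`; the class names `title`, `snippet`, `hit-link`, and `abstract` are hypothetical and are not the actual Google or Yahoo markup. The parser itself is shared, but the rules that locate the title, URL, and summary differ per engine.

```python
from html.parser import HTMLParser


class ResultParser(HTMLParser):
    """Extracts title, URL and summary using per-engine (tag, class) rules."""

    def __init__(self, title_rule, summary_rule):
        super().__init__()
        self.title_rule = title_rule      # (tag, css_class) of the result link
        self.summary_rule = summary_rule  # (tag, css_class) of the summary element
        self.results = []
        self._field = None    # field currently collecting text
        self._end_tag = None  # tag name that closes the current field

    @staticmethod
    def _matches(rule, tag, attrs):
        rule_tag, rule_class = rule
        classes = (attrs.get("class") or "").split()
        return tag == rule_tag and rule_class in classes

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            return  # inline markup such as <b> inside a field is ignored
        attrs = dict(attrs)
        if self._matches(self.title_rule, tag, attrs):
            self.results.append(
                {"title": "", "url": attrs.get("href") or "", "summary": ""}
            )
            self._field, self._end_tag = "title", tag
        elif self.results and self._matches(self.summary_rule, tag, attrs):
            self._field, self._end_tag = "summary", tag

    def handle_endtag(self, tag):
        if tag == self._end_tag:
            self._field = self._end_tag = None

    def handle_data(self, data):
        if self._field is not None:
            self.results[-1][self._field] += data


# Hypothetical per-engine rules; real result pages need rules read off their own HTML.
RULES = {
    "engine_a": {"title_rule": ("a", "title"), "summary_rule": ("span", "snippet")},
    "engine_b": {"title_rule": ("a", "hit-link"), "summary_rule": ("p", "abstract")},
}


def extract(engine, html):
    """Parse one result page with the rules registered for the given engine."""
    parser = ResultParser(**RULES[engine])
    parser.feed(html)
    parser.close()
    return parser.results


# Invented sample pages, one per hypothetical template.
PAGE_A = ('<li><a class="title" href="http://example.com/jobs">Job <b>Search</b></a>'
          '<span class="snippet">Find jobs.</span></li>')
PAGE_B = ('<div><a class="hit-link" href="http://example.org/careers">Careers</a>'
          '<p class="abstract">Browse openings.</p></div>')
```

Calling `extract("engine_a", PAGE_A)` yields one result, while applying the same rules to `PAGE_B` yields an empty list, which is the practical meaning of "one extracting algorithm for one search engine". The sketch closes a field on the first matching end tag, so a summary element that nests another element of the same tag would need a more careful parser.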

Figure3.2 shows a segment of the HTML behind the Google result page; the text shown in Figure3.3 also appears in the red boxes in Figure3.2.

Figure3.2

Figure3.3

Figure3.4 shows a segment of the HTML behind the Yahoo web search result page; the text shown in Figure3.5 also appears in the red square in Figure3.4.

Figure3.4

Before parsing the HTML in Figure3.2 and Figure3.4, note that both Google and Yahoo provide easier ways for developers to obtain and extract result information: their APIs. It is therefore necessary to examine these APIs carefully first. The different kinds of API provided by Google and Yahoo also show quite different capabilities in returning result links.

Figure3.5

posted on 2009-06-18 07:16 JosephQuinn Views(187) Comments(0) Category: My Master-degree Project
