3.2 Search Engine Selection


As two of the most powerful search engines, Google and Yahoo have the strongest abilities in searching the surface web. They also provide many special search functions, such as web search, news search and image search, that are familiar to people all over the world. A general experiment exploring search abilities that did not test these two engines would certainly not be conclusive.

Meanwhile, testing the search ability of Google, Yahoo or any other search engine in detail requires programming against each engine's result pages, because each engine renders its results with a different HTML template. For example, Figure 3.3 and Figure 3.5 are the pages returned by Google and Yahoo for the same query, “job search”; their differences and relative quality are not compared in this section. The two result pages look very similar: the search input box is at the top, the main results are on the left and take up more than two-thirds of the width, and the remaining third on the right is given to advertisements for commercial websites. The HTML behind the two pages, however, is structured quite differently. As a result, the result title, summary and URL cannot be extracted with a single general template; each search engine needs its own extraction algorithm.
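The need for one extraction algorithm per engine can be sketched as a single parser driven by a separate rule set for each engine. This is a minimal sketch using Python's standard `html.parser`; `EngineTemplate`, `ResultExtractor`, the class names (`result`, `summary`, `res`, `abstract`) and the sample markup are all hypothetical stand-ins, not the real Google or Yahoo templates, which differ from these and change over time.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


@dataclass
class SearchResult:
    title: str
    url: str
    summary: str


@dataclass(frozen=True)
class EngineTemplate:
    """Per-engine markup rules. All tag/class values used below are hypothetical."""
    result_tag: str     # element wrapping one result
    result_class: str   # class that marks that wrapper as a result
    summary_tag: str    # element holding the summary text
    summary_class: str  # class that marks the summary element


class ResultExtractor(HTMLParser):
    """Extracts (title, URL, summary) for every result block matching one template."""

    def __init__(self, template):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.t = template
        self.results = []
        self._cur = None          # SearchResult being built; None outside a result
        self._block_depth = 0     # nesting of result_tag elements in the current result
        self._summary_depth = 0   # nesting of summary_tag elements in the summary
        self._in_title = False    # inside the first <a href> of the result

    @staticmethod
    def _classes(attrs):
        return (dict(attrs).get("class") or "").split()

    def handle_starttag(self, tag, attrs):
        t = self.t
        if self._cur is None:
            if tag == t.result_tag and t.result_class in self._classes(attrs):
                self._cur = SearchResult("", "", "")
                self._block_depth = 1
            return
        if tag == t.result_tag:
            self._block_depth += 1
        if self._summary_depth and tag == t.summary_tag:
            self._summary_depth += 1
        elif tag == t.summary_tag and t.summary_class in self._classes(attrs):
            self._summary_depth = 1
        # The first link with an href is taken as the result title and URL.
        href = dict(attrs).get("href")
        if tag == "a" and href and not self._cur.url:
            self._cur.url = href
            self._in_title = True

    def handle_endtag(self, tag):
        if self._cur is None:
            return
        if tag == "a":
            self._in_title = False
        if self._summary_depth and tag == self.t.summary_tag:
            self._summary_depth -= 1
        if tag == self.t.result_tag:
            self._block_depth -= 1
            if self._block_depth == 0:
                r = self._cur
                # Collapse whitespace left over from the markup.
                self.results.append(SearchResult(" ".join(r.title.split()), r.url,
                                                 " ".join(r.summary.split())))
                self._cur = None
                self._summary_depth = 0
                self._in_title = False

    def handle_data(self, data):
        if self._cur is None:
            return
        if self._in_title:
            self._cur.title += data
        elif self._summary_depth:
            self._cur.summary += data


def extract(html, template):
    parser = ResultExtractor(template)
    parser.feed(html)
    parser.close()
    return parser.results


# Two hypothetical templates standing in for Google-style and Yahoo-style markup.
TEMPLATE_A = EngineTemplate("li", "result", "div", "summary")
TEMPLATE_B = EngineTemplate("div", "res", "div", "abstract")

SAMPLE_A = """<ol><li class="result"><h3><a href="http://example.com/jobs">Job <b>search</b> site</a></h3>
<div class="summary">Find a <b>job</b> near you.</div></li></ol>"""
SAMPLE_B = """<div class="res"><div><h3><a href="http://example.org/careers">Careers</a></h3></div>
<div class="abstract">Open positions &amp; advice.</div></div>"""

if __name__ == "__main__":
    print(extract(SAMPLE_A, TEMPLATE_A))
    print(extract(SAMPLE_B, TEMPLATE_A))  # wrong template for this markup: []
```

Running the wrong template over a page finds nothing at all, which is the practical reason a separate rule set (or, for larger differences, a separate parser) has to be maintained for each engine.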

Figure 3.2 is a segment of the HTML source of the Google result page; the text shown in Figure 3.3 appears in the red boxes in Figure 3.2.

Figure 3.2: HTML segment of the Google result page

Figure 3.3: Google result page for the query “job search”

Here is a segment of the HTML source of the Yahoo web search result page.

Figure 3.4: HTML segment of the Yahoo result page

The text shown in Figure 3.5 also appears in the red box in Figure 3.4.

Before parsing the HTML in Figure 3.2 and Figure 3.4, however, note that both Google and Yahoo offer developers easier ways to extract result information through their APIs, so these APIs should be examined carefully first. The various APIs provided by Google and Yahoo also differ considerably in their ability to return result links.
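Why an API is easier than scraping can be illustrated with a structured response: instead of a parser per HTML template, each API only needs a small mapping from its field names to title, URL and summary. This is a minimal sketch using Python's standard `json` module; the response shapes, `ApiMapping`, `parse_api_response` and every field name below are hypothetical assumptions, since the real Google and Yahoo APIs each define their own wrappers, field names and limits on returned results.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    title: str
    url: str
    summary: str


@dataclass(frozen=True)
class ApiMapping:
    """Where one (hypothetical) JSON API puts its result list and result fields."""
    results_path: tuple  # keys leading from the top-level object to the result list
    title: str
    url: str
    summary: str


def parse_api_response(body, mapping):
    """Turn a JSON response body into SearchResult records using one API's mapping."""
    node = json.loads(body)
    for key in mapping.results_path:  # walk down to the list of results
        node = node[key]
    return [
        SearchResult(item[mapping.title], item[mapping.url], item.get(mapping.summary, ""))
        for item in node
    ]


# Hypothetical mappings: the real Google and Yahoo APIs use their own field names.
API_A = ApiMapping(("response", "results"), "title", "url", "content")
API_B = ApiMapping(("resultset",), "headline", "link", "abstract")

SAMPLE_A = json.dumps({"response": {"results": [
    {"title": "Job search site", "url": "http://example.com/jobs",
     "content": "Find a job near you."}]}})
SAMPLE_B = json.dumps({"resultset": [
    {"headline": "Careers", "link": "http://example.org/careers"}]})

if __name__ == "__main__":
    print(parse_api_response(SAMPLE_A, API_A))
    print(parse_api_response(SAMPLE_B, API_B))  # a missing summary becomes ""
```

The mapping only removes the parsing work; how many result links each API actually returns still has to be checked against that API's own documentation.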

Figure 3.5: Yahoo result page for the query “job search”

Posted on 2009-06-18 07:16 by JosephQuinn. Category: My Master-degree Project
