As two of the most powerful
search engines, Google and Yahoo are among the most capable at searching the
surface web. They also provide a range of specialized search functions, such as
web search, news search, and image search, which are familiar to people all over
the world. A general experiment on search capability that does not test these
two search engines would certainly not be conclusive.
Meanwhile, implementing
a detailed search ability test on Google, Yahoo, or any other search engine
requires programming against its result pages, because each engine renders its
results with a different HTML template. For example, Figure3.3 and Figure3.5 show
the pages returned by Google and Yahoo for the same query, “job search”;
differences in result quality are not compared in this section. The two result
pages have a very similar layout: the search input box is at the top, the main
content is on the left and takes up more than two thirds of the width, and the
remaining third on the right is given to commercial websites as advertisements.
The HTML behind the pages, however, is structured quite differently. As a
result, the result title, result summary, and result URL cannot be extracted
with a single general template; each search engine needs its own extraction
algorithm.
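The need for one extraction routine per engine can be illustrated with a minimal Python sketch. It is not the implementation used in this work. The two HTML snippets, the engine keys `google_like` and `yahoo_like`, and every tag and class name in `ENGINE_RULES` are hypothetical stand-ins for the real templates, chosen only to show that the same three fields sit in different markup. The sketch also assumes that the title is the first field of each result.

```python
from html.parser import HTMLParser

# Hypothetical rules: field -> (tag, class). These names are illustrative
# assumptions and do not reproduce the real Google or Yahoo templates.
ENGINE_RULES = {
    "google_like": {"title": ("h3", "r"), "summary": ("div", "s"), "url": ("cite", "u")},
    "yahoo_like": {"title": ("a", "ttl"), "summary": ("p", "abs"), "url": ("span", "lnk")},
}

# Hypothetical result snippets carrying the same content in different markup.
GOOGLE_LIKE = (
    '<li class="g"><h3 class="r"><a href="http://example.com/jobs">'
    'Job <em>Search</em></a></h3><div class="s">Find jobs here.</div>'
    '<cite class="u">example.com/jobs</cite></li>'
)
YAHOO_LIKE = (
    '<div class="res"><a class="ttl" href="http://example.com/jobs">Job Search</a>'
    '<span class="lnk">example.com/jobs</span>'
    '<p class="abs">Find jobs here.</p></div>'
)


class ResultExtractor(HTMLParser):
    """Collects title, summary and URL text according to one engine's rules."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.results = []
        self._field = None   # field currently being captured, if any
        self._tag = None     # tag that opened the current field
        self._depth = 0      # nesting depth of that same tag
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for field, (want_tag, want_class) in self.rules.items():
            if tag == want_tag and want_class in classes:
                if field == "title":
                    # Assumption: a title starts a new result record.
                    self.results.append({})
                self._field, self._tag, self._depth = field, tag, 1
                self._buf = []
                return

    def handle_endtag(self, tag):
        if self._field is None or tag != self._tag:
            return
        self._depth -= 1
        if self._depth == 0:
            if self.results:
                self.results[-1][self._field] = "".join(self._buf).strip()
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._buf.append(data)


def extract(html, engine):
    """Parse one result page with the rules registered for `engine`."""
    parser = ResultExtractor(ENGINE_RULES[engine])
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    print(extract(GOOGLE_LIKE, "google_like"))
    print(extract(YAHOO_LIKE, "yahoo_like"))
    # Applying the wrong engine's rules finds nothing.
    print(extract(GOOGLE_LIKE, "yahoo_like"))
```

With matching rules both snippets yield the same title, summary, and URL, while applying one engine's rules to the other engine's page yields no results. That is why the extraction logic has to be maintained separately for each search engine.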
Here is a segment of the HTML
from the Google result page. The text shown in Figure3.3 is also marked by the red boxes in Figure3.2.
Figure3.2
Figure3.3
Here is a
segment of the HTML from the Yahoo web search result page.
Figure3.4
The text shown in Figure3.5 is also marked by
the red box in Figure3.4.
Before parsing the
HTML in Figure3.2 and Figure3.4, it is worth noting that both Google and Yahoo
provide APIs that give developers an easier way to retrieve and extract result
information. It is therefore necessary to examine these APIs carefully first.
The different kinds of API provided by Google and Yahoo also show quite
different capabilities in returning the result links.
Figure3.5