Because sentence rank performed so well, a further experiment was conducted:
applying sentence rank to real news web pages. To keep the report within
length, this section implements only the undirected graph with 10 terms per
query. The successful retrieval rate reported below is high when the cosine
similarity between two web pages is computed using equations 4.1 and 4.2. Ten
terms per query means that only the first 10 words of the selected sentence are
used, stop words included, which is consistent with section 4.6. Unlike
locating the exact address of a web page, this comparison finds documents on a
similar topic by comparing the web pages at two different URLs; the details are
introduced in section 3.4.
[Equations 4.1 and 4.2]
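The query construction described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `build_query` and the use of whitespace splitting as the tokenizer are assumptions, since the text only states that the first 10 words of the selected sentence are used and that stop words are kept.

```python
def build_query(sentence: str, max_terms: int = 10) -> str:
    """Return the first `max_terms` words of a sentence as a query string.

    Assumption: words are separated by whitespace. Stop words are kept
    on purpose, matching the description in the text.
    """
    terms = sentence.split()
    return " ".join(terms[:max_terms])


if __name__ == "__main__":
    # Hypothetical sentence, used only to show the truncation.
    selected = "The quick brown fox jumps over the lazy dog near the river bank today"
    print(build_query(selected))
```

A sentence shorter than 10 words is returned unchanged, so the query never contains more terms than the sentence itself.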
Three search engines are employed in this section: Yahoo News Search, Yahoo Web
Search and Google Web Search. Unlike sections 4.1 to 4.6, which count only a
URL string match as a successful retrieval, section 4.7 takes document
similarity into consideration: if the value of equation 4.2 is greater than
0.9, a threshold also accepted in the research of S. T. Park and Xiaojun Wang,
the retrieval is counted as successful. This section uses 183 pages, all from
Yahoo News on May 4, 2009; the corresponding URLs are listed in Appendix D.
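The similarity test can be illustrated with a short sketch. It assumes raw term frequencies as the vector weights and a simple lowercase word tokenizer; the exact weighting defined by equations 4.1 and 4.2 is not reproduced here, so these choices and the function names are illustrative only. The 0.9 threshold is the one stated in the text.

```python
import math
import re
from collections import Counter

SIMILARITY_THRESHOLD = 0.9  # threshold stated in the text


def term_vector(text: str) -> Counter:
    # Assumption: lowercase alphanumeric tokens weighted by raw frequency.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Dot product over the terms of `a`; a Counter returns 0 for missing terms.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_successful_retrieval(original_text: str, retrieved_text: str,
                            threshold: float = SIMILARITY_THRESHOLD) -> bool:
    # A retrieval counts as successful when the similarity exceeds the threshold.
    similarity = cosine_similarity(term_vector(original_text),
                                   term_vector(retrieved_text))
    return similarity > threshold
```

Under this sketch, two pages carrying the same article text score close to 1 and count as a success even when their URLs differ, which is the distinction from the URL string match used in the earlier sections.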
| Search Engine | Success Count | Success Rate |
| Yahoo News Search | 171 | 93.44% |
| Yahoo Web Search | 178 | 97.27% |
| Google Web Search | 177 | 96.72% |
Table 4.29
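The success rates in the table follow directly from the success counts and the 183 test pages. The sketch below reproduces that arithmetic; the variable and function names are illustrative assumptions.

```python
TOTAL_PAGES = 183  # number of Yahoo News pages used in this section

# Success counts as reported in the table.
success_counts = {
    "Yahoo News Search": 171,
    "Yahoo Web Search": 178,
    "Google Web Search": 177,
}


def success_rate(count: int, total: int = TOTAL_PAGES) -> float:
    """Return the success rate as a percentage of the total pages."""
    return 100.0 * count / total


if __name__ == "__main__":
    for engine, count in success_counts.items():
        print(f"{engine}: {success_rate(count):.2f}%")
```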
Figure 4.32, panels (a) and (b)
As panel (b) of Figure 4.32 shows, the success rate is above 90% for all three
search engines, which satisfies the project's initial requirements using a
single text retrieval method.