Xiaojun Wang's paper covered word rank on all words in a document, on all nouns and verbs in a document, and on all nouns and adjectives in a document, plus two methods that combine word rank and DF: WordRank3DF2 and WordRank4DF1. In this project, the tests cover word rank on all words and word rank on nouns and verbs; word rank on nouns and adjectives is not included, and neither is word rank based on a directed weighted graph. Three combined methods are tested: WordRank3DF2, WordRank4DF1 and WordRank5DF5. In addition, TF is taken into consideration by adding three further groups: WordRank3TFIDF2, WordRank4TFIDF1 and WordRank5TFIDF5.
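This section does not spell out how word rank itself is computed. As a rough illustration only, the Python sketch below assumes a TextRank-style scheme on an undirected weighted graph: each distinct word is a node, two words are linked when they occur in the same sliding window, the edge weight counts those co-occurrences, and scores come from PageRank-style power iteration. The window size, the damping factor 0.85 and the function name `word_rank` are hypothetical choices, not details taken from Xiaojun Wang's paper or from this project.

```python
from collections import defaultdict


def word_rank(tokens, window=2, damping=0.85, iterations=100, tol=1e-9):
    """Hypothetical TextRank-style word rank on an undirected weighted graph.

    Assumption: two distinct words are linked when they occur together in a
    sliding window of `window` tokens; the edge weight counts co-occurrences.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0  # undirected: store both directions

    nodes = list(graph)
    if not nodes:
        return {}
    strength = {n: sum(graph[n].values()) for n in nodes}  # weighted degree
    score = {n: 1.0 for n in nodes}

    for _ in range(iterations):
        # each neighbour passes on its score in proportion to the edge weight
        new_score = {
            n: (1 - damping)
            + damping * sum(graph[m][n] / strength[m] * score[m] for m in graph[n])
            for n in nodes
        }
        converged = max(abs(new_score[n] - score[n]) for n in nodes) < tol
        score = new_score
        if converged:
            break
    return score


if __name__ == "__main__":
    doc = "search engine query words rank search engine result pages".split()
    for word, s in sorted(word_rank(doc).items(), key=lambda kv: -kv[1]):
        print(f"{word:8s} {s:.3f}")
```

Words that co-occur with many different neighbours end up with the highest scores; a query of n words would then take the top n words. Treating the graph as undirected means each co-occurrence contributes equally in both directions, which is what distinguishes this variant from the directed weighted graph left out of the tests.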
| WordRank (words per query) | Google (pages) | Google (%) | Yahoo (pages) | Yahoo (%) |
|---|---|---|---|---|
| 3 | 47.00 | 20.89% | 46.00 | 20.44% |
| 4 | 73.00 | 32.44% | 68.00 | 30.22% |
| 5 | 93.00 | 41.33% | 88.00 | 39.11% |
| 6 | 99.00 | 44.00% | 98.00 | 43.56% |
| 7 | 119.00 | 52.89% | 119.00 | 52.89% |
| 8 | 133.00 | 59.11% | 119.00 | 52.89% |
| 9 | 145.00 | 64.44% | 127.00 | 56.44% |
| 10 | 152.00 | 67.56% | 133.00 | 59.11% |
| 11 | 149.00 | 66.22% | 129.00 | 57.33% |
| 12 | 155.00 | 68.89% | 129.00 | 57.33% |
| 13 | 155.00 | 68.89% | 129.00 | 57.33% |
| 14 | 156.00 | 69.33% | 135.00 | 60.00% |
| 15 | 156.00 | 69.33% | 130.00 | 57.78% |
| Average | 125.54 | 55.79% | 111.54 | 49.57% |

Table 4.16
Figure 4.17 (a), (b) Successfully retrieved page counts per 225 pages and the corresponding percentages for undirected graph word rank.
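Every percentage in Tables 4.16 to 4.23 is the success count divided by the 225 pages of a run, and each Average row is the mean over the 13 query lengths (3 to 15 words). The small Python check below reproduces that arithmetic for the Google column of Table 4.16; the helper names `percent` and `average_row` are illustrative only.

```python
PAGES_PER_RUN = 225           # each run retrieves 225 pages
QUERY_LENGTHS = range(3, 16)  # 3..15 words per query (13 table rows)


def percent(count, total=PAGES_PER_RUN):
    """Success count as a percentage of the pages in one run, 2 decimals."""
    return round(100 * count / total, 2)


def average_row(counts, total=PAGES_PER_RUN):
    """Mean count over all query lengths and the matching percentage."""
    mean = sum(counts) / len(counts)
    return round(mean, 2), percent(mean, total)


# Google column of Table 4.16 (undirected graph word rank)
google_wordrank = [47, 73, 93, 99, 119, 133, 145, 152, 149, 155, 155, 156, 156]
assert len(google_wordrank) == len(QUERY_LENGTHS)

print(percent(google_wordrank[0]))   # 3-word queries: 20.89
print(average_row(google_wordrank))  # (125.54, 55.79)
```

Because every row shares the same 225-page denominator, averaging the percentage column gives the same value as converting the average count.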
| NounsVerbs (words per query) | Google (pages) | Google (%) | Yahoo (pages) | Yahoo (%) |
|---|---|---|---|---|
| 3 | 29.00 | 12.89% | 24.00 | 10.67% |
| 4 | 52.00 | 23.11% | 51.00 | 22.67% |
| 5 | 72.00 | 32.00% | 67.00 | 29.78% |
| 6 | 85.00 | 37.78% | 82.00 | 36.44% |
| 7 | 100.00 | 44.44% | 108.00 | 48.00% |
| 8 | 110.00 | 48.89% | 109.00 | 48.44% |
| 9 | 125.00 | 55.56% | 117.00 | 52.00% |
| 10 | 129.00 | 57.33% | 120.00 | 53.33% |
| 11 | 134.00 | 59.56% | 121.00 | 53.78% |
| 12 | 134.00 | 59.56% | 121.00 | 53.78% |
| 13 | 138.00 | 61.33% | 130.00 | 57.78% |
| 14 | 136.00 | 60.44% | 123.00 | 54.67% |
| 15 | 140.00 | 62.22% | 130.00 | 57.78% |
| Average | 106.46 | 47.32% | 100.23 | 44.55% |

Table 4.17
Figure 4.18 (a), (b) Successfully retrieved page counts per 225 pages and the corresponding percentages for undirected graph nouns and verbs rank.
| WR3DF2 (words per query) | Google (pages) | Google (%) | Yahoo (pages) | Yahoo (%) |
|---|---|---|---|---|
| 3 | 128.00 | 56.89% | 122.00 | 54.22% |
| 4 | 151.00 | 67.11% | 139.00 | 61.78% |
| 5 | 165.00 | 73.33% | 148.00 | 65.78% |
| 6 | 168.00 | 74.67% | 145.00 | 64.44% |
| 7 | 170.00 | 75.56% | 141.00 | 62.67% |
| 8 | 169.00 | 75.11% | 141.00 | 62.67% |
| 9 | 167.00 | 74.22% | 136.00 | 60.44% |
| 10 | 165.00 | 73.33% | 127.00 | 56.44% |
| 11 | 165.00 | 73.33% | 125.00 | 55.56% |
| 12 | 165.00 | 73.33% | 131.00 | 58.22% |
| 13 | 163.00 | 72.44% | 131.00 | 58.22% |
| 14 | 160.00 | 71.11% | 132.00 | 58.67% |
| 15 | 161.00 | 71.56% | 134.00 | 59.56% |
| Average | 161.31 | 71.69% | 134.77 | 59.90% |

Table 4.18
Figure 4.19 (a), (b) Successfully retrieved page counts per 225 pages and the corresponding percentages for undirected graph WordRank3DF2.
| WR4DF1 (words per query) | Google (pages) | Google (%) | Yahoo (pages) | Yahoo (%) |
|---|---|---|---|---|
| 3 | 47.00 | 20.89% | 47.00 | 20.89% |
| 4 | 73.00 | 32.44% | 68.00 | 30.22% |
| 5 | 150.00 | 66.67% | 139.00 | 61.78% |
| 6 | 155.00 | 68.89% | 130.00 | 57.78% |
| 7 | 159.00 | 70.67% | 135.00 | 60.00% |
| 8 | 161.00 | 71.56% | 129.00 | 57.33% |
| 9 | 165.00 | 73.33% | 132.00 | 58.67% |
| 10 | 167.00 | 74.22% | 138.00 | 61.33% |
| 11 | 169.00 | 75.11% | 144.00 | 64.00% |
| 12 | 170.00 | 75.56% | 148.00 | 65.78% |
| 13 | 170.00 | 75.56% | 154.00 | 68.44% |
| 14 | 171.00 | 76.00% | 149.00 | 66.22% |
| 15 | 172.00 | 76.44% | 148.00 | 65.78% |
| Average | 148.38 | 65.95% | 127.77 | 56.79% |

Table 4.19
Figure 4.20 (a), (b) Successfully retrieved page counts per 225 pages and the corresponding percentages for undirected graph WordRank4DF1.
| WR5DF5 (words per query) | Google (pages) | Google (%) | Yahoo (pages) | Yahoo (%) |
|---|---|---|---|---|
| 3 | 128.00 | 56.89% | 121.00 | 53.78% |
| 4 | 151.00 | 67.11% | 141.00 | 62.67% |
| 5 | 157.00 | 69.78% | 140.00 | 62.22% |
| 6 | 168.00 | 74.67% | 144.00 | 64.00% |
| 7 | 167.00 | 74.22% | 142.00 | 63.11% |
| 8 | 169.00 | 75.11% | 140.00 | 62.22% |
| 9 | 168.00 | 74.67% | 142.00 | 63.11% |
| 10 | 167.00 | 74.22% | 134.00 | 59.56% |
| 11 | 165.00 | 73.33% | 142.00 | 63.11% |
| 12 | 163.00 | 72.44% | 131.00 | 58.22% |
| 13 | 163.00 | 72.44% | 137.00 | 60.89% |
| 14 | 163.00 | 72.44% | 131.00 | 58.22% |
| 15 | 163.00 | 72.44% | 131.00 | 58.22% |
| Average | 160.92 | 71.52% | 136.62 | 60.72% |

Table 4.20
Figure 4.21 (a), (b) Successfully retrieved page counts per 225 pages and the corresponding percentages for undirected graph WordRank5DF5.
| WR3TFIDF2 (words per query) | Google (pages) | Google (%) | Yahoo (pages) | Yahoo (%) |
|---|---|---|---|---|
| 3 | 38.00 | 16.89% | 38.00 | 16.89% |
| 4 | 55.00 | 24.44% | 47.00 | 20.89% |
| 5 | 85.00 | 37.78% | 86.00 | 38.22% |
| 6 | 91.00 | 40.44% | 85.00 | 37.78% |
| 7 | 120.00 | 53.33% | 92.00 | 40.89% |
| 8 | 124.00 | 55.11% | 99.00 | 44.00% |
| 9 | 139.00 | 61.78% | 111.00 | 49.33% |
| 10 | 146.00 | 64.89% | 114.00 | 50.67% |
| 11 | 145.00 | 64.44% | 119.00 | 52.89% |
| 12 | 155.00 | 68.89% | 129.00 | 57.33% |
| 13 | 152.00 | 67.56% | 123.00 | 54.67% |
| 14 | 159.00 | 70.67% | 136.00 | 60.44% |
| 15 | 160.00 | 71.11% | 131.00 | 58.22% |
| Average | 120.69 | 53.64% | 100.77 | 44.79% |

Table 4.21
Figure 4.22 (a), (b) Successfully retrieved page counts per 225 pages and the corresponding percentages for undirected graph WordRank3TFIDF2.
| WR4TFIDF1 (words per query) | Google (pages) | Google (%) | Yahoo (pages) | Yahoo (%) |
|---|---|---|---|---|
| 3 | 48.00 | 21.33% | 48.00 | 21.33% |
| 4 | 69.00 | 30.67% | 64.00 | 28.44% |
| 5 | 82.00 | 36.44% | 72.00 | 32.00% |
| 6 | 96.00 | 42.67% | 84.00 | 37.33% |
| 7 | 110.00 | 48.89% | 102.00 | 45.33% |
| 8 | 126.00 | 56.00% | 111.00 | 49.33% |
| 9 | 137.00 | 60.89% | 125.00 | 55.56% |
| 10 | 143.00 | 63.56% | 128.00 | 56.89% |
| 11 | 156.00 | 69.33% | 136.00 | 60.44% |
| 12 | 158.00 | 70.22% | 139.00 | 61.78% |
| 13 | 160.00 | 71.11% | 131.00 | 58.22% |
| 14 | 161.00 | 71.56% | 129.00 | 57.33% |
| 15 | 159.00 | 70.67% | 132.00 | 58.67% |
| Average | 123.46 | 54.87% | 107.77 | 47.90% |

Table 4.22
Figure 4.23 (a), (b) Successfully retrieved page counts per 225 pages and the corresponding percentages for undirected graph WordRank4TFIDF1.
| WR5TFIDF5 (words per query) | Google (pages) | Google (%) | Yahoo (pages) | Yahoo (%) |
|---|---|---|---|---|
| 3 | 38.00 | 16.89% | 38.00 | 16.89% |
| 4 | 55.00 | 24.44% | 47.00 | 20.89% |
| 5 | 78.00 | 34.67% | 67.00 | 29.78% |
| 6 | 92.00 | 40.89% | 85.00 | 37.78% |
| 7 | 105.00 | 46.67% | 97.00 | 43.11% |
| 8 | 124.00 | 55.11% | 99.00 | 44.00% |
| 9 | 128.00 | 56.89% | 106.00 | 47.11% |
| 10 | 139.00 | 61.78% | 107.00 | 47.56% |
| 11 | 144.00 | 64.00% | 114.00 | 50.67% |
| 12 | 143.00 | 63.56% | 122.00 | 54.22% |
| 13 | 146.00 | 64.89% | 124.00 | 55.11% |
| 14 | 155.00 | 68.89% | 129.00 | 57.33% |
| 15 | 156.00 | 69.33% | 123.00 | 54.67% |
| Average | 115.62 | 51.38% | 96.77 | 43.01% |

Table 4.23
Figure 4.24 (a), (b) Successfully retrieved page counts per 225 pages and the corresponding percentages for undirected graph WordRank5TFIDF5.
The charts above show two facts:
1. The initial success retrieval rate (at 3 words per query) increases when DF takes a larger part in WordRankxDFy or WordRankxTFIDFy. This mirrors the basic methods in section 4.1, such as TFxDFy and TFIDFxDFy.
2. Once a query exceeds 10 words, the success retrieval rates level off and stay stable. This also mirrors the basic methods.
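Both observations can be checked directly against the tables. The Python sketch below takes the Google columns of Tables 4.18 to 4.20 and reports the success rate for the shortest (3-word) query and the spread between the highest and lowest success rates once a query exceeds 10 words. The function names and the use of max minus min as a flatness measure are illustrative assumptions, not part of the original evaluation.

```python
TOTAL_PAGES = 225
QUERY_LENGTHS = list(range(3, 16))  # words per query, one entry per table row

# Google columns copied from Tables 4.18-4.20
google_counts = {
    "WR3DF2": [128, 151, 165, 168, 170, 169, 167, 165, 165, 165, 163, 160, 161],
    "WR4DF1": [47, 73, 150, 155, 159, 161, 165, 167, 169, 170, 170, 171, 172],
    "WR5DF5": [128, 151, 157, 168, 167, 169, 168, 167, 165, 163, 163, 163, 163],
}


def startup_rate(counts):
    """Success percentage for the shortest (3-word) query."""
    return round(100 * counts[0] / TOTAL_PAGES, 2)


def plateau_spread(counts, after=10):
    """Max minus min success percentage over queries longer than `after` words."""
    tail = [c for n, c in zip(QUERY_LENGTHS, counts) if n > after]
    return round(100 * (max(tail) - min(tail)) / TOTAL_PAGES, 2)


for name, counts in google_counts.items():
    print(f"{name}: start {startup_rate(counts)}%, "
          f"spread beyond 10 words {plateau_spread(counts)} points")
```

With these counts the spread beyond 10 words stays within about 2.2 percentage points for all three methods, which is consistent with observation 2.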
Figure 4.25 Comparison of all word-rank-related methods
The average success retrieval percentages are shown in Figure 4.25. WR3DF2 and WR5DF5 give the two best results among the word-rank-related methods: their Google averages exceed 70% (71.69% and 71.52%), and their Yahoo averages are around 60% (59.90% and 60.72%).
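The comparison summarized in Figure 4.25 can be recomputed from the Average rows of Tables 4.16 to 4.23. The Python sketch below only re-sorts those published averages per search engine; it adds no new measurements, and the names `averages` and `top_methods` are illustrative.

```python
TOTAL_PAGES = 225

# Average successful pages per run as (Google, Yahoo), from the Average rows
# of Tables 4.16-4.23 (already rounded to two decimals there)
averages = {
    "WordRank": (125.54, 111.54),
    "NounsVerbs": (106.46, 100.23),
    "WR3DF2": (161.31, 134.77),
    "WR4DF1": (148.38, 127.77),
    "WR5DF5": (160.92, 136.62),
    "WR3TFIDF2": (120.69, 100.77),
    "WR4TFIDF1": (123.46, 107.77),
    "WR5TFIDF5": (115.62, 96.77),
}
ENGINE_COLUMN = {"google": 0, "yahoo": 1}


def top_methods(engine, k=2):
    """Return the k methods with the highest average and their percentages."""
    col = ENGINE_COLUMN[engine]
    ranked = sorted(averages, key=lambda m: averages[m][col], reverse=True)
    return [(m, round(100 * averages[m][col] / TOTAL_PAGES, 2)) for m in ranked[:k]]


print(top_methods("google"))  # [('WR3DF2', 71.69), ('WR5DF5', 71.52)]
print(top_methods("yahoo"))   # [('WR5DF5', 60.72), ('WR3DF2', 59.9)]
```

The ranking places WR3DF2 and WR5DF5 first and second for both engines, with the order swapped on Yahoo, where the WR3DF2 average falls just under 60%.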