In the previous section, a graph-based ranking algorithm over words was introduced. In this section, the vertices of the graph are sentences instead. Compared to word rank, sentence rank preserves much more linguistic information, and its results are more understandable because the LS is now composed of whole sentences rather than terms and phrases.
Similar to the word rank graph, and leaving aside the un-weighted case, there are 2 kinds of graph construction for the sentences in a document: the undirected weighted graph and the directed weighted graph. Moreover, Rada and Paul proposed a further split of the latter, classifying directed weighted graphs into directed forward weighted graphs and directed backward weighted graphs [9]. In the undirected weighted graph, the assumption is that every sentence in the document is connected to all other sentences. In the directed forward weighted graph, every sentence points to all the following sentences in the text, so every sentence receives edges from all the preceding sentences. In the directed backward weighted graph, every sentence points to all the preceding sentences in the text, so every sentence receives edges from all the following sentences.
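As an illustration, the following Python sketch builds all three constructions from a list of sentences. This is a minimal sketch under assumed names: the dictionary-of-weighted-edges representation and the function names are choices made for this example, not anything prescribed in [9] or [10], and similarity() stands for the measure defined in equation 2-10 below.

    def build_graph(sentences, similarity, mode="undirected"):
        """Return a dict mapping (i, j) -> weight for the chosen construction.

        mode: "undirected" connects every pair of sentences in both directions;
              "forward"    points every sentence to all following sentences;
              "backward"   points every sentence to all preceding sentences.
        """
        n = len(sentences)
        edges = {}
        for i in range(n):
            for j in range(i + 1, n):
                w = similarity(sentences[i], sentences[j])
                if w == 0:
                    continue  # no shared word: leave the pair unconnected
                if mode == "undirected":
                    edges[(i, j)] = w
                    edges[(j, i)] = w  # keep both directions for the undirected case
                elif mode == "forward":
                    edges[(i, j)] = w  # earlier sentence -> later sentence
                elif mode == "backward":
                    edges[(j, i)] = w  # later sentence -> earlier sentence
        return edges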
Figure2.17 shows a 5-sentence passage, and Figure2.18 shows the 3 graphs in detail. In Figure2.18 (a), as an undirected graph, every sentence is connected to the other 4 sentences by a pair of incoming and outgoing edges. In Figure2.18 (b), as a directed forward graph, sentence 1 points to sentences 2, 3, 4, 5; sentence 2 points to sentences 3, 4, 5; sentence 3 points to sentences 4, 5; sentence 4 points to sentence 5; and sentence 5 has no outgoing edge. As a directed backward graph, Figure2.18 (c) reverses the edges of (b): sentence 5 points to sentences 1, 2, 3, 4; sentence 4 points to sentences 1, 2, 3; sentence 3 points to sentences 1, 2; sentence 2 points to sentence 1; and sentence 1 has no outgoing edge.
Figure2.18: (a) undirected, (b) directed forward, (c) directed backward
Therefore, once the direction has been decided, the weight of each connection is the similarity of the two sentences. As in equation 2-10 [10], the number of common terms shared by both sentences is divided by the lengths of the two sentences in log form, where the length can simply be taken as the number of terms in a sentence:

\[
\mathrm{Similarity}(S_i, S_j) = \frac{\left|\{\, W_k \mid W_k \in S_i \ \wedge\ W_k \in S_j \,\}\right|}{\log(|S_i|) + \log(|S_j|)} \tag{2-10}
\]
Generally, $W_k$ is a word appearing in both $S_i$ and $S_j$. $W_k$ can also match words with the same meaning but different forms, such as "interested" and "interesting", which can further help in finding similar or related documents. If there is no common word $W_k$, then the edge between $S_i$ and $S_j$ can be removed.
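A minimal sketch of equation 2-10 in the same style, assuming each sentence is already tokenized into a non-empty list of words; stemming or lemmatization, which would let "interested" and "interesting" count as a match, is left out for brevity.

    import math

    def similarity(s_i, s_j):
        """Equation 2-10: words shared by two tokenized sentences,
        normalized by the sum of the log sentence lengths."""
        overlap = len(set(s_i) & set(s_j))
        if overlap == 0:
            return 0.0  # no common word: the edge is removed
        denom = math.log(len(s_i)) + math.log(len(s_j))
        # Guard against two one-word sentences, where the denominator is 0.
        return overlap / denom if denom > 0 else 0.0

This is the similarity() argument expected by build_graph above.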
After iterating equations 2-8 and 2-9, the sentences with the highest rank are selected from the text and taken as the abstract. In [9], the 4 highest-ranked sentences are extracted after the iteration, resulting in a summary of about 100 words. Rada and Paul also evaluated the abstracts produced by this graph ranking algorithm against other methods, such as HITS and the Positional Power Function (POS), using the ROUGE evaluation [15]. It turned out that the sentence rank derived from PageRank performed well. The details are not included in this report.
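For concreteness, the ranking and extraction step can be sketched under the same assumptions. The update below is TextRank's weighted graph ranking formula, which this report presumably numbers as equation 2-9; the damping factor d = 0.85 and the fixed iteration count stand in for a proper convergence test.

    def rank_sentences(edges, n, d=0.85, iters=50):
        """Iterate the weighted score of each of the n sentences, where
        edges maps (j, i) -> weight for an edge from sentence j to i."""
        score = [1.0] * n
        out_sum = [0.0] * n
        for (j, i), w in edges.items():
            out_sum[j] += w  # total outgoing weight of sentence j
        for _ in range(iters):
            new = [1.0 - d] * n
            for (j, i), w in edges.items():
                # Each incoming neighbour j contributes a share of its
                # score proportional to the weight of the edge j -> i.
                new[i] += d * (w / out_sum[j]) * score[j]
            score = new
        return score

    # Putting the pieces together and keeping the 4 top-ranked sentences, as in [9]:
    # edges = build_graph(sentences, similarity, mode="undirected")
    # scores = rank_sentences(edges, len(sentences))
    # summary = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)[:4]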
In conclusion, Rada and Paul found that, with this sentence rank algorithm, the abstract formed from the extracted sentences is informative for the given text, and that the method requires neither deep linguistic knowledge nor domain- or language-specific annotated corpora [10].