In the previous section, a graph-based ranking algorithm over words was introduced. In this section, the vertices of the graph are sentences instead. Compared to word rank, sentence rank preserves much more linguistic information, and its results are more understandable because the LS is now composed of whole sentences rather than terms and phrases.
Similar to the word rank graph, and leaving aside the un-weighted case, there are 2 kinds of graph construction for the sentences in a document: the undirected weighted graph and the directed weighted graph. Moreover, Rada and Paul proposed a further split of the latter, classifying directed weighted graphs into directed forward weighted graphs and directed backward weighted graphs [9]. In the undirected weighted graph, the assumption is that every sentence in the document is connected to all other sentences. In the directed forward weighted graph, every sentence points to all the following sentences in the text, so every sentence receives edges from all the preceding sentences. In the directed backward weighted graph, every sentence points to all the preceding sentences in the text, so every sentence receives edges from all the following sentences.
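As an illustration, the following Python sketch builds all three constructions from a list of sentences. This is a minimal sketch under assumed names: the dictionary-of-weighted-edges representation and the function names are choices made for this example, not anything prescribed in [9] or [10], and similarity() stands for the measure defined in equation 2-10 below.

    def build_graph(sentences, similarity, mode="undirected"):
        """Return a dict mapping (i, j) -> weight for the chosen construction.

        mode: "undirected" connects every pair of sentences in both directions;
              "forward"    points every sentence to all following sentences;
              "backward"   points every sentence to all preceding sentences.
        """
        n = len(sentences)
        edges = {}
        for i in range(n):
            for j in range(i + 1, n):
                w = similarity(sentences[i], sentences[j])
                if w == 0:
                    continue  # no shared word: leave the pair unconnected
                if mode == "undirected":
                    edges[(i, j)] = w
                    edges[(j, i)] = w  # keep both directions for the undirected case
                elif mode == "forward":
                    edges[(i, j)] = w  # earlier sentence -> later sentence
                elif mode == "backward":
                    edges[(j, i)] = w  # later sentence -> earlier sentence
        return edges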
Figure2.17 shows a 5-sentence passage, and Figure2.18 shows the 3 graphs in detail. In Figure2.18 (a), as an undirected graph, every sentence is connected to the other 4 sentences by a pair of incoming and outgoing edges. In Figure2.18 (b), as a directed forward graph, sentence 1 points to sentences 2, 3, 4, 5; sentence 2 points to sentences 3, 4, 5; sentence 3 points to sentences 4, 5; sentence 4 points to sentence 5; and sentence 5 has no outgoing edge. As a directed backward graph, Figure2.18 (c) reverses the edges of (b): sentence 5 points to sentences 1, 2, 3, 4; sentence 4 points to sentences 1, 2, 3; sentence 3 points to sentences 1, 2; sentence 2 points to sentence 1; and sentence 1 has no outgoing edge.
Figure2.18: (a) undirected, (b) directed forward, (c) directed backward
Therefore, once the direction has been decided, the weight of each connection is the similarity of the two sentences. As in equation 2-10 [10], the number of common terms shared by both sentences is divided by the lengths of the two sentences in log form, where the length can simply be taken as the number of terms in a sentence:

\[
\mathrm{Similarity}(S_i, S_j) = \frac{\left|\{\, W_k \mid W_k \in S_i \ \wedge\ W_k \in S_j \,\}\right|}{\log(|S_i|) + \log(|S_j|)} \tag{2-10}
\]
Generally, $W_k$ is a word appearing in both $S_i$ and $S_j$. $W_k$ can also match words with the same meaning but different forms, such as "interested" and "interesting", which can further help in finding similar or related documents. If there is no common word $W_k$, then the edge between $S_i$ and $S_j$ can be removed.
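A minimal sketch of equation 2-10 in the same style, assuming each sentence is already tokenized into a non-empty list of words; stemming or lemmatization, which would let "interested" and "interesting" count as a match, is left out for brevity.

    import math

    def similarity(s_i, s_j):
        """Equation 2-10: words shared by two tokenized sentences,
        normalized by the sum of the log sentence lengths."""
        overlap = len(set(s_i) & set(s_j))
        if overlap == 0:
            return 0.0  # no common word: the edge is removed
        denom = math.log(len(s_i)) + math.log(len(s_j))
        # Guard against two one-word sentences, where the denominator is 0.
        return overlap / denom if denom > 0 else 0.0

This is the similarity() argument expected by build_graph above.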
After iterating equations 2-8 and 2-9, the sentences with the highest rank are selected from the text and taken as the abstract. In [9], the 4 highest-ranked sentences are extracted after the iteration, resulting in a summary of about 100 words. Rada and Paul also evaluated the abstracts produced by this graph ranking algorithm against other methods, such as HITS and the Positional Power Function (POS), using the ROUGE evaluation [15]. It turned out that the sentence rank derived from PageRank performed well. The details are not included in this report.
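For concreteness, the ranking and extraction step can be sketched under the same assumptions. The update below is TextRank's weighted graph ranking formula, which this report presumably numbers as equation 2-9; the damping factor d = 0.85 and the fixed iteration count stand in for a proper convergence test.

    def rank_sentences(edges, n, d=0.85, iters=50):
        """Iterate the weighted score of each of the n sentences, where
        edges maps (j, i) -> weight for an edge from sentence j to i."""
        score = [1.0] * n
        out_sum = [0.0] * n
        for (j, i), w in edges.items():
            out_sum[j] += w  # total outgoing weight of sentence j
        for _ in range(iters):
            new = [1.0 - d] * n
            for (j, i), w in edges.items():
                # Each incoming neighbour j contributes a share of its
                # score proportional to the weight of the edge j -> i.
                new[i] += d * (w / out_sum[j]) * score[j]
            score = new
        return score

    # Putting the pieces together and keeping the 4 top-ranked sentences, as in [9]:
    # edges = build_graph(sentences, similarity, mode="undirected")
    # scores = rank_sentences(edges, len(sentences))
    # summary = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)[:4]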
In conclusion, Rada and Paul found that, with this sentence rank algorithm, the abstract formed from the extracted sentences is informative for the given text, and that the method requires neither deep linguistic knowledge nor domain- or language-specific annotated corpora [10].