List of Tables
List of Figures
Abstract
1 Introduction
2 Related Work
2.1 Seung-Taek Park’s Study on Lexical Signatures
2.2 Martin Klein and Michael Nelson’s Study on Lexical Signatures
2.3 Robust Hyperlinks
2.4 Michal Cutler’s Study on HTML Structure
2.5 Graph-Based Ranking Algorithms
2.5.1 Word-Rank
2.5.2 Word-Rank on Web Pages
2.5.3 Sentence-Rank
2.5.4 Sentence-Rank on Web Pages
3 Experiment Design and Setup
3.1 Experiment Steps
3.2 Search Engine Selection
3.2.1 Google AJAX Search API
3.2.2 Google Base Data API
3.2.3 Extracting Google Results by Brute Force
3.2.4 Yahoo Web Search API and News Search API
3.3 Data Set
3.3.1 Page Quality
3.3.2 HTML Parsing and Text Extraction
3.3.3 Query Length
3.4 Result Page Comparison
3.5 Deep Web Search Engine
4 Experimental Results and Analysis
4.1 The Basics
4.2 Title
4.3 Google Search Tips: Meta Keywords and Meta Description
4.4 Word-Rank
4.5 Randomly Picked Sentence
4.6 Sentence-Rank
4.7 Sentence-Rank on Yahoo News Pages
5 Conclusion
5.1 Summary
5.2 Limitations
5.2.1 HTML Parsing and Text Extraction
5.2.2 Solution
References
Appendix A
Appendix B
Appendix C
Appendix D
Posted on 2009-06-15 04:11 by JosephQuinn. Category: My Master-degree Project