Stripping tags from an HTML page with a regular expression

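This one is fairly simple. The regular expression is "<[^>]*>", which reads as "a string that starts with <, is followed by any number of characters other than >, and ends with >".
The point of doing this is to get the so-called plain text, which makes the next processing step easier.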

The code is as follows:

/**
 * Remove all "<>" tags in the text
 * @param tagText
 * @return the clean text without tags
 */
public String removeTags( String tagText )
{
    return tagText.replaceAll("<[^>]*>", "");
}
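
To see how the pattern behaves on real input, here is a small sketch built around the same regular expression. The class name TagStripper and the sample strings are made up for illustration and are not part of the original code. It precompiles the pattern, which is an optional variation that avoids recompiling the regex on every call.

```java
import java.util.regex.Pattern;

public class TagStripper {
    // Same regex as above, compiled once and reused across calls.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    /** Removes every "<...>" run from the input text. */
    public static String removeTags(String tagText) {
        return TAG.matcher(tagText).replaceAll("");
    }

    public static void main(String[] args) {
        // Ordinary markup: every tag is removed, the text stays.
        System.out.println(removeTags("<p>Hello, <b>world</b>!</p>"));

        // A '>' inside an attribute value ends the match early,
        // so part of the tag leaks into the result.
        System.out.println(removeTags("<a title=\"a > b\">link</a>"));

        // Entities are not decoded; "&amp;" is left as it is.
        System.out.println(removeTags("<i>a &amp; b</i>"));
    }
}
```

The first call prints "Hello, world!", while the second leaves the fragment ` b">link` behind. So the regex is good enough for quickly getting plain text out of simple markup, but pages with such attribute values, comments, or script blocks probably call for a real HTML parser.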

Posted on 2009-11-06 22:19 by 甜菜侯爵