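Now that we know the FeedParser class is only an intermediary and that the real parsing work is done by the parser for each protocol version, let's start with the parsers for the RSS 0.9.x / 2.0 family.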

★RSS_0_91_Parser

/**
 * Private constructor suppresses generation of a (public) default
 * constructor.
 */
private RSS_0_91_Parser() {
}

/**
 * Holder of the RSS_0_91_Parser instance.
 */
private static class RSS_0_91_ParserHolder {
    private static RSS_0_91_Parser instance = new RSS_0_91_Parser();
}

/**
 * Get the RSS_0_91_Parser instance.
 */
public static RSS_0_91_Parser getInstance() {
    return RSS_0_91_ParserHolder.instance;
}

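As you can see, every parser is a singleton. What is unusual is that the singleton is kept in a static nested class. Personally I find this a little redundant here, although it is worth knowing that this is the initialization-on-demand holder idiom: the holder class is not initialized until getInstance() is first called, so the instance is created lazily and in a thread-safe way. The main properties of a static nested class are:
A. An instance of the static nested class can be created without first creating an instance of the outer class
B. A static nested class can directly access only the static fields and static methods of the outer class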

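Nested classes are typically used to hide implementation details. Such a class is usually meaningful to only one other class and of no use to the rest. Defining it as a standalone class in the package would look out of place, so it is defined inside the class that needs it, which keeps the detail hidden.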

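Also note that only the static members of a static nested class can be referenced in the form OuterClass.InnerClass.staticMethod(). A static nested class may still declare instance members; those are reached through an instance as usual.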
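To make the behaviour of the nested holder class concrete, here is a minimal sketch. DemoParser, Holder and the EVENTS list are hypothetical names invented for this illustration and are not part of Informa. The sketch only records when the instance is created, to show that using the outer class does not by itself initialize the holder.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parser class; not the Informa source.
class DemoParser {
    // Records what happened and in which order, so laziness can be observed.
    static final List<String> EVENTS = new ArrayList<>();

    private DemoParser() {
        EVENTS.add("instance created");
    }

    // The JVM initializes this nested class only on first use,
    // and class initialization is guaranteed to be thread-safe.
    private static class Holder {
        private static final DemoParser INSTANCE = new DemoParser();
    }

    static DemoParser getInstance() {
        return Holder.INSTANCE;
    }

    // Using the outer class alone does not trigger Holder's initialization.
    static void touch() {
        EVENTS.add("outer class used");
    }

    public static void main(String[] args) {
        touch();
        DemoParser first = getInstance();
        DemoParser second = getInstance();
        System.out.println(EVENTS);
        System.out.println(first == second);
    }
}
```

Because the holder is only touched inside getInstance(), the "instance created" event is recorded after "outer class used", and both calls return the same object.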

The core of this class is the parse method which, as expected, starts from the channel node.

Element channel = root.getChild("channel");
ChannelIF chnl = cBuilder.createChannel(channel,
        channel.getChildTextTrim("title"));

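First it fetches the channel child of the root node of the XML document, then reads the channel's title with JDOM's text accessor getChildTextTrim. Next comes the call to ChannelBuilder's createChannel. This is where Java polymorphism comes in: each builder implementation constructs the channel in its own way. With the Hibernate implementation, the process is as follows: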

public ChannelIF createChannel(Element channelElement, String title,
        String location) {
    ChannelIF obj = null;
    // look for an existing channel with the same location first
    if (location != null) {
        Query query = session
                .createQuery("from Channel as channel where channel.locationString = ? ");
        query.setString(0, location);
        obj = (ChannelIF) query.uniqueResult();
    }
    // nothing found: create a new channel and persist it
    if (obj == null) {
        obj = new Channel(channelElement, title, location);
        session.save(obj);
    } else {
        logger.info("Found already existing channel instance with location "
                + location);
    }
    return obj;
}

It first tries to load the channel from the database; if none is found, it creates one and persists it.
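The difference between builder implementations can be sketched without Hibernate. In the example below every type (Channel, Builder, MemoryBuilder, StoreBackedBuilder) is a simplified, hypothetical stand-in rather than the Informa API, and a HashMap keyed by location plays the role of the database. Unlike the Hibernate code, this sketch does not store channels whose location is null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified types for illustration; not the Informa API.
class BuilderDemo {
    static class Channel {
        final String title;
        final String location;

        Channel(String title, String location) {
            this.title = title;
            this.location = location;
        }
    }

    interface Builder {
        Channel createChannel(String title, String location);
    }

    // In-memory style: always builds a fresh object.
    static class MemoryBuilder implements Builder {
        public Channel createChannel(String title, String location) {
            return new Channel(title, location);
        }
    }

    // Persistence style: load by location first, create and store otherwise.
    static class StoreBackedBuilder implements Builder {
        // The map stands in for the database table of channels.
        private final Map<String, Channel> store = new HashMap<>();

        public Channel createChannel(String title, String location) {
            Channel found = (location != null) ? store.get(location) : null;
            if (found == null) {
                found = new Channel(title, location);
                if (location != null) {
                    store.put(location, found); // "persist" the new channel
                }
            }
            return found;
        }
    }

    public static void main(String[] args) {
        Builder memory = new MemoryBuilder();
        Builder stored = new StoreBackedBuilder();
        String url = "http://example.org/feed.xml";
        // Same call, different behaviour depending on the implementation.
        System.out.println(memory.createChannel("A", url) == memory.createChannel("A", url));
        System.out.println(stored.createChannel("A", url) == stored.createChannel("A", url));
    }
}
```

The caller only sees the Builder interface, which is the same arrangement that lets the parser stay unaware of whether channels end up in memory or in a database.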

The code below then reads the channel's child nodes and its items.

// 1..n item elements
List items = channel.getChildren("item");
Iterator i = items.iterator();
while (i.hasNext()) {
    Element item = (Element) i.next();
    ParserUtils.matchCaseOfChildren(item, new String[] { "title",
            "link", "description", "source", "enclosure" });

    // get title element
    Element elTitle = item.getChild("title");
    String strTitle = "<No Title>";
    if (elTitle != null) {
        strTitle = elTitle.getTextTrim();
    }
    if (logger.isDebugEnabled()) {
        logger.debug("Item element found (" + strTitle + ").");
    }

    // get link element
    Element elLink = item.getChild("link");
    String strLink = "";
    if (elLink != null) {
        strLink = elLink.getTextTrim();
    }

    // get description element
    Element elDesc = item.getChild("description");
    String strDesc = "";
    if (elDesc != null) {
        strDesc = elDesc.getTextTrim();
    }

    // generate new RSS item (link to article)
    ItemIF rssItem = cBuilder.createItem(item, chnl, strTitle, strDesc,
            ParserUtils.getURL(strLink));
    rssItem.setFound(dateParsed);

    // get source element (an RSS 0.92 element)
    Element source = item.getChild("source");
    if (source != null) {
        String sourceName = source.getTextTrim();
        Attribute sourceAttribute = source.getAttribute("url");
        if (sourceAttribute != null) {
            String location = sourceAttribute.getValue().trim();
            ItemSourceIF itemSource = cBuilder.createItemSource(
                    rssItem, sourceName, location, null);
            rssItem.setSource(itemSource);
        }
    }

    // get enclosure element (an RSS 0.92 element)
    Element enclosure = item.getChild("enclosure");
    if (enclosure != null) {
        URL location = null;
        String type = null;
        int length = -1;
        Attribute urlAttribute = enclosure.getAttribute("url");
        if (urlAttribute != null) {
            location = ParserUtils.getURL(urlAttribute.getValue()
                    .trim());
        }
        Attribute typeAttribute = enclosure.getAttribute("type");
        if (typeAttribute != null) {
            type = typeAttribute.getValue().trim();
        }
        Attribute lengthAttribute = enclosure.getAttribute("length");
        if (lengthAttribute != null) {
            try {
                length = Integer.parseInt(lengthAttribute.getValue()
                        .trim());
            } catch (NumberFormatException e) {
                // malformed length: keep the default of -1
                logger.warn(e);
            }
        }
        ItemEnclosureIF itemEnclosure = cBuilder.createItemEnclosure(
                rssItem, location, type, length);
        rssItem.setEnclosure(itemEnclosure);
    }
} // end of the loop over item elements

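As you can see, the parsing generally follows these steps:
A. Get a child element of channel
B. If that child element has children or attributes of its own, descend into them recursively
C. Call the builder's createXxx method for that child element to load or create it
D. Call the setXxx method of the Channel (or of the owning item, as with rssItem.setSource above) to attach the child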
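The four steps can be tried out without any third-party library. The sketch below deliberately swaps JDOM for the DOM parser that ships with the JDK, and ItemWalkDemo, Item, child and text are hypothetical names made up for this illustration. It reads each item's title, link and enclosure length, using the same defaults as the excerpt above ("<No Title>", an empty link, and -1 for an unknown length).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustration with the JDK DOM API instead of JDOM; all names are hypothetical.
class ItemWalkDemo {
    static class Item {
        final String title;
        final String link;
        int enclosureLength = -1; // -1 means "unknown"

        Item(String title, String link) {
            this.title = title;
            this.link = link;
        }
    }

    // Step A: fetch the first direct child element with the given name, or null.
    static Element child(Element parent, String name) {
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Trimmed text of a child element, or the fallback when the child is missing.
    static String text(Element parent, String name, String fallback) {
        Element el = child(parent, name);
        return el != null ? el.getTextContent().trim() : fallback;
    }

    static List<Item> parseItems(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<Item> result = new ArrayList<>();
        Element channel = child(doc.getDocumentElement(), "channel");
        if (channel == null) {
            return result;
        }
        NodeList nodes = channel.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n.getNodeType() != Node.ELEMENT_NODE || !"item".equals(n.getNodeName())) {
                continue;
            }
            Element itemEl = (Element) n;
            // Step C: create the object for this child element.
            Item item = new Item(text(itemEl, "title", "<No Title>"), text(itemEl, "link", ""));
            // Step B: descend into the attributes of a nested element.
            Element enclosure = child(itemEl, "enclosure");
            if (enclosure != null && enclosure.hasAttribute("length")) {
                try {
                    item.enclosureLength = Integer.parseInt(enclosure.getAttribute("length").trim());
                } catch (NumberFormatException e) {
                    // malformed length: keep the -1 default
                }
            }
            // Step D: attach the finished child to its owner.
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<rss version=\"0.92\"><channel><title>Demo</title>"
                + "<item><title> First </title><link>http://example.org/1</link>"
                + "<enclosure url=\"http://example.org/a.mp3\" length=\"123\"/></item>"
                + "<item><enclosure length=\"abc\"/></item>"
                + "</channel></rss>";
        for (Item item : parseItems(xml)) {
            System.out.println(item.title + " | " + item.link + " | " + item.enclosureLength);
        }
    }
}
```

The second item in the sample feed has no title and a non-numeric length, so it exercises both fallback paths.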

The complete parsing sequence for RSS 0.91 is as follows:

================== Root element ==================
1. channel

================== Required elements ==================
2. title
3. description
4. link

================== Optional elements ==================
5. language
6. item
7. image
8. textinput
9. copyright
10. rating
11. pubDate
12. lastBuildDate
13. docs
14. managingEditor
15. webMaster
16. cloud

★RSS_2_0_Parser

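Comparing the 0.91 and 2.0 parsers shows that the overall parsing process is almost identical. The two biggest differences are:

A. Starting with RSS 2.0, support for namespaces was added
B. Parsing was added for several elements that are new in 2.0

In the RSS_2_0_Parser class, every element is looked up by both its name and its namespace; the default namespace is "". In addition, the RSS 2.0 parser handles the subject, category, author, creator, comments and guid elements, none of which exist in the 0.91 protocol.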
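Looking elements up by name plus namespace can be illustrated with the JDK's own DOM parser, used here in place of JDOM. NamespaceDemo, parse and firstText are hypothetical helpers written for this sketch. It assumes a creator element in the Dublin Core namespace (http://purl.org/dc/elements/1.1/) and an un-namespaced author element. Note that the DOM API represents "no namespace" as null, whereas JDOM uses an empty URI.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Illustration with the JDK DOM API instead of JDOM; all names are hypothetical.
class NamespaceDemo {
    // Dublin Core element set namespace URI.
    static final String DC = "http://purl.org/dc/elements/1.1/";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in the JDK parser
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Text of the first element matching namespace URI + local name, or null.
    // Pass null as namespaceUri to match elements that are in no namespace.
    static String firstText(Document doc, String namespaceUri, String localName) {
        NodeList nodes = doc.getElementsByTagNameNS(namespaceUri, localName);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : null;
    }

    public static void main(String[] args) {
        String xml = "<rss version=\"2.0\" xmlns:dc=\"" + DC + "\"><channel>"
                + "<title>Demo</title>"
                + "<item><author>plain@example.org</author>"
                + "<dc:creator>Jane</dc:creator></item>"
                + "</channel></rss>";
        Document doc = parse(xml);
        System.out.println(firstText(doc, null, "author"));  // element without a namespace
        System.out.println(firstText(doc, DC, "creator"));   // element in the dc namespace
        System.out.println(firstText(doc, null, "creator")); // wrong namespace: no match
    }
}
```

The last lookup finds nothing because the local name alone is not enough; the namespace has to match as well, which is why the 2.0 parser passes both.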


posted on 2009-12-30 10:45 Paul Lin  Category: J2SE
