spiders

We can go as far as our thinking takes us!


April 12, 2006

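AJAX websites

1. http://www.netvibes.com/

Third-party personal start page: Netvibes

A while ago I introduced ProtoPage, a customizable personal start page service. Netvibes offers a similar service, but its features are far more "Geek Style" than ProtoPage's, and it is better suited to bloggers.

Netvibes' features and interface draw on the best of Start.com and iGoogle.

The Netvibes search module can search Google, Yahoo, MSN, and Wikipedia. Its feed reading is powerful and very similar to Start.com's; it is a complete online feed reader. Like iGoogle, Netvibes can also read Gmail, showing subject lines and message summaries. Its weather forecast service is better than those of Start.com and iGoogle and covers major cities worldwide. Netvibes also has a WebNote sticky-note feature for jotting things down at any time, similar to the Sidebar in GDS.

Like Start.com, Netvibes can be used either anonymously or as a logged-in user; once you log in, changes to the page are saved on the server.

On the interface side, Netvibes also uses AJAX: modules can be dragged around in Internet Explorer and Firefox, and the page title can be edited in place.

The personal start pages from MSN, Google, or Yahoo are certainly stable, but a third-party service like Netvibes can offer more flexibility. For users who rely on services from several sites at once, it may be the better solution.

Netvibes: one of the most distinctive personalized start pages at the moment. In China, okrss and 周博通 have copied its code. Netvibes' code really is concise and worth studying.

www.okrss.com
http://www.potu.com/my/

2. http://www.eskobo.com/default.aspx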

posted @ 2006-04-14 09:31 spiders

An XMLHttpRequest implementation that works across IE, Firefox, and Opera.

From: http://www.scss.com.au/family/andrew/webdesign/xmlhttprequest/
/*

Cross-Browser XMLHttpRequest v1.2
=================================

Emulate Gecko 'XMLHttpRequest()' functionality in IE and Opera. Opera requires
the Sun Java Runtime Environment <http://www.java.com/>.

by Andrew Gregory
http://www.scss.com.au/family/andrew/webdesign/xmlhttprequest/

This work is licensed under the Creative Commons Attribution License. To view a
copy of this license, visit http://creativecommons.org/licenses/by-sa/2.5/ or
send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California
94305, USA.

Attribution: Leave my name and web address in this script intact.

Not Supported in Opera
----------------------
* user/password authentication
* responseXML data member

Not Fully Supported in Opera
----------------------------
* async requests
* abort()
* getAllResponseHeaders(), getAllResponseHeader(header)

*/
// IE support
if (window.ActiveXObject && !window.XMLHttpRequest) {
  window.XMLHttpRequest = function() {
    var msxmls = new Array(
      'Msxml2.XMLHTTP.5.0',
      'Msxml2.XMLHTTP.4.0',
      'Msxml2.XMLHTTP.3.0',
      'Msxml2.XMLHTTP',
      'Microsoft.XMLHTTP');
    for (var i = 0; i < msxmls.length; i++) {
      try {
        return new ActiveXObject(msxmls[i]);
      } catch (e) {
      }
    }
    return null;
  };
}
// Gecko support
/* ;-) */
// Opera support
if (window.opera && !window.XMLHttpRequest) {
  window.XMLHttpRequest = function() {
    this.readyState = 0; // 0=uninitialized,1=loading,2=loaded,3=interactive,4=complete
    this.status = 0; // HTTP status codes
    this.statusText = '';
    this._headers = [];
    this._aborted = false;
    this._async = true;
    this._defaultCharset = 'ISO-8859-1';
    this._getCharset = function() {
      var charset = this._defaultCharset;
      var contentType = this.getResponseHeader('Content-type').toUpperCase();
      var val = contentType.indexOf('CHARSET=');
      if (val != -1) {
        // skip past 'CHARSET=' (8 characters) so only the charset name remains
        charset = contentType.substring(val + 8);
      }
      val = charset.indexOf(';');
      if (val != -1) {
        charset = charset.substring(0, val);
      }
      val = charset.indexOf(',');
      if (val != -1) {
        charset = charset.substring(0, val);
      }
      return charset;
    };
    this.abort = function() {
      this._aborted = true;
    };
    this.getAllResponseHeaders = function() {
      return this.getAllResponseHeader('*');
    };
    this.getAllResponseHeader = function(header) {
      var ret = '';
      for (var i = 0; i < this._headers.length; i++) {
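        // HTTP header names are case-insensitive, so compare in lower case
        if (header == '*' || String(this._headers[i].h).toLowerCase() == String(header).toLowerCase()) {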
          ret += this._headers[i].h + ': ' + this._headers[i].v + '\n';
        }
      }
      return ret;
    };
    this.getResponseHeader = function(header) {
      var ret = this.getAllResponseHeader(header);
      var i = ret.indexOf('\n');
      if (i != -1) {
        ret = ret.substring(0, i);
      }
      return ret;
    };
    this.setRequestHeader = function(header, value) {
      this._headers[this._headers.length] = {h:header, v:value};
    };
    this.open = function(method, url, async, user, password) {
      this.method = method;
      this.url = url;
      this._async = true;
      this._aborted = false;
      this._headers = [];
      if (arguments.length >= 3) {
        this._async = async;
      }
      if (arguments.length > 3) {
        opera.postError('XMLHttpRequest.open() - user/password not supported');
      }
      this.readyState = 1;
      if (this.onreadystatechange) {
        this.onreadystatechange();
      }
    };
    this.send = function(data) {
      if (!navigator.javaEnabled()) {
        alert("XMLHttpRequest.send() - Java must be installed and enabled.");
        return;
      }
      if (this._async) {
        setTimeout(this._sendasync, 0, this, data);
        // this is not really asynchronous and won't execute until the current
        // execution context ends
      } else {
        this._sendsync(data);
      }
    };
    this._sendasync = function(req, data) {
      if (!req._aborted) {
        req._sendsync(data);
      }
    };
    this._sendsync = function(data) {
      this.readyState = 2;
      if (this.onreadystatechange) {
        this.onreadystatechange();
      }
      // open connection
      var url = new java.net.URL(new java.net.URL(window.location.href), this.url);
      var conn = url.openConnection();
      for (var i = 0; i < this._headers.length; i++) {
        conn.setRequestProperty(this._headers[i].h, this._headers[i].v);
      }
      this._headers = [];
      if (this.method == 'POST') {
        // POST data
        conn.setDoOutput(true);
        var wr = new java.io.OutputStreamWriter(conn.getOutputStream(), this._getCharset());
        wr.write(data);
        wr.flush();
        wr.close();
      }
      // read response headers
      // NOTE: the getHeaderField() methods always return nulls for me :(
      var gotContentEncoding = false;
      var gotContentLength = false;
      var gotContentType = false;
      var gotDate = false;
      var gotExpiration = false;
      var gotLastModified = false;
      for (var i = 0; ; i++) {
        var hdrName = conn.getHeaderFieldKey(i);
        var hdrValue = conn.getHeaderField(i);
        if (hdrName == null && hdrValue == null) {
          break;
        }
        if (hdrName != null) {
          this._headers[this._headers.length] = {h:hdrName, v:hdrValue};
          switch (hdrName.toLowerCase()) {
            case 'content-encoding': gotContentEncoding = true; break;
            case 'content-length'  : gotContentLength   = true; break;
            case 'content-type'    : gotContentType     = true; break;
            case 'date'            : gotDate            = true; break;
            case 'expires'         : gotExpiration      = true; break;
            case 'last-modified'   : gotLastModified    = true; break;
          }
        }
      }
      // try to fill in any missing header information
      var val;
      val = conn.getContentEncoding();
      if (val != null && !gotContentEncoding) this._headers[this._headers.length] = {h:'Content-encoding', v:val};
      val = conn.getContentLength();
      if (val != -1 && !gotContentLength) this._headers[this._headers.length] = {h:'Content-length', v:val};
      val = conn.getContentType();
      if (val != null && !gotContentType) this._headers[this._headers.length] = {h:'Content-type', v:val};
      val = conn.getDate();
      if (val != 0 && !gotDate) this._headers[this._headers.length] = {h:'Date', v:(new Date(val)).toUTCString()};
      val = conn.getExpiration();
      if (val != 0 && !gotExpiration) this._headers[this._headers.length] = {h:'Expires', v:(new Date(val)).toUTCString()};
      val = conn.getLastModified();
      if (val != 0 && !gotLastModified) this._headers[this._headers.length] = {h:'Last-modified', v:(new Date(val)).toUTCString()};
      // read response data
      var reqdata = '';
      var stream = conn.getInputStream();
      if (stream) {
        var reader = new java.io.BufferedReader(new java.io.InputStreamReader(stream, this._getCharset()));
        var line;
        while ((line = reader.readLine()) != null) {
          if (this.readyState == 2) {
            this.readyState = 3;
            if (this.onreadystatechange) {
              this.onreadystatechange();
            }
          }
          reqdata += line + '\n';
        }
        reader.close();
        this.status = 200;
        this.statusText = 'OK';
        this.responseText = reqdata;
        this.readyState = 4;
        if (this.onreadystatechange) {
          this.onreadystatechange();
        }
        if (this.onload) {
          this.onload();
        }
      } else {
        // error
        this.status = 404;
        this.statusText = 'Not Found';
        this.responseText = '';
        this.readyState = 4;
        if (this.onreadystatechange) {
          this.onreadystatechange();
        }
        if (this.onerror) {
          this.onerror();
        }
      }
    };
  };
}
// ActiveXObject emulation
if (!window.ActiveXObject && window.XMLHttpRequest) {
  window.ActiveXObject = function(type) {
    switch (type.toLowerCase()) {
      case 'microsoft.xmlhttp':
      case 'msxml2.xmlhttp':
      case 'msxml2.xmlhttp.3.0':
      case 'msxml2.xmlhttp.4.0':
      case 'msxml2.xmlhttp.5.0':
        return new XMLHttpRequest();
    }
    return null;
  };
}
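
With the script above loaded, calling code can create requests the same way in every supported browser. The following is only a minimal usage sketch: `fetchText` and its `XhrCtor` parameter are hypothetical names that do not appear in the original script, and the constructor is injectable purely so the helper can be tried outside a browser.

```javascript
// Hypothetical helper: GET a URL and pass the response text to a callback.
// XhrCtor defaults to the global XMLHttpRequest (native, or emulated by the script above).
function fetchText(url, callback, XhrCtor) {
  var Ctor = XhrCtor || XMLHttpRequest;
  var req = new Ctor();
  req.onreadystatechange = function () {
    if (req.readyState !== 4) {
      return; // request not complete yet
    }
    if (req.status === 200) {
      callback(null, req.responseText);
    } else {
      callback(new Error('HTTP ' + req.status + ' ' + req.statusText), null);
    }
  };
  req.open('GET', url, true); // third argument requests asynchronous mode
  req.send(null);
}
```

The handler is assigned before `open()` because the Opera emulation already fires `onreadystatechange` from inside `open()`.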

posted @ 2006-04-13 16:30 spiders

RSS specification (English | Chinese)


What is RSS?

RSS is a Web content syndication format.

Its name is an acronym for Really Simple Syndication.

RSS is a dialect of XML. All RSS files must conform to the XML 1.0 specification, as published on the World Wide Web Consortium (W3C) website.

At the top level, an RSS document is a <rss> element, with a mandatory attribute called version, that specifies the version of RSS that the document conforms to. If it conforms to this specification, the version attribute must be 2.0.

Subordinate to the <rss> element is a single <channel> element, which contains information about the channel (metadata) and its contents.

Sample files

Here are sample files for: RSS 0.91, 0.92 and 2.0.

Note that the sample files may point to documents and services that no longer exist. The 0.91 sample was created when the 0.91 docs were written. Maintaining a trail of samples seems like a good idea.

About this document

This document represents the status of RSS as of the Fall of 2002, version 2.0.1.

It incorporates all changes and additions, starting with the basic spec for RSS 0.91 (June 2000) and includes new features introduced in RSS 0.92 (December 2000) and RSS 0.94 (August 2002).

Change notes are here.

First we document the required and optional sub-elements of <channel>; and then document the sub-elements of <item>. The final sections answer frequently asked questions, and provide a roadmap for future evolution, and guidelines for extending RSS.

Required channel elements

Here's a list of the required channel elements, each with a brief description, an example, and where available, a pointer to a more complete description.

Element: title
Description: The name of the channel. It's how people refer to your service. If you have an HTML website that contains the same information as your RSS file, the title of your channel should be the same as the title of your website.
Example: GoUpstate.com News Headlines

Element: link
Description: The URL to the HTML website corresponding to the channel.
Example: http://www.goupstate.com/

Element: description
Description: Phrase or sentence describing the channel.
Example: The latest news from GoUpstate.com, a Spartanburg Herald-Journal Web site.
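
The top-level structure and the three required channel elements can be put together as in the sketch below. It is only an illustration under stated assumptions: `escapeXml` and `buildMinimalRss` are hypothetical helper names, and the output layout (indentation, XML declaration) is a choice made here, not something the specification prescribes.

```javascript
// Escape the five XML special characters so text is safe inside an element.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: build a minimal RSS 2.0 document, that is,
// <rss version="2.0"> wrapping one <channel> with title, link, and description.
function buildMinimalRss(channel) {
  return '<?xml version="1.0"?>\n' +
    '<rss version="2.0">\n' +
    '  <channel>\n' +
    '    <title>' + escapeXml(channel.title) + '</title>\n' +
    '    <link>' + escapeXml(channel.link) + '</link>\n' +
    '    <description>' + escapeXml(channel.description) + '</description>\n' +
    '  </channel>\n' +
    '</rss>\n';
}
```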


Optional channel elements

Here's a list of optional channel elements.

Element: language
Description: The language the channel is written in. This allows aggregators to group all Italian language sites, for example, on a single page. A list of allowable values for this element, as provided by Netscape, is here. You may also use values defined by the W3C.
Example: en-us

Element: copyright
Description: Copyright notice for content in the channel.
Example: Copyright 2002, Spartanburg Herald-Journal

Element: managingEditor
Description: Email address for person responsible for editorial content.
Example: geo@herald.com (George Matesky)

Element: webMaster
Description: Email address for person responsible for technical issues relating to channel.
Example: betty@herald.com (Betty Guernsey)

Element: pubDate
Description: The publication date for the content in the channel. For example, the New York Times publishes on a daily basis, the publication date flips once every 24 hours. That's when the pubDate of the channel changes. All date-times in RSS conform to the Date and Time Specification of RFC 822, with the exception that the year may be expressed with two characters or four characters (four preferred).
Example: Sat, 07 Sep 2002 00:00:01 GMT

Element: lastBuildDate
Description: The last time the content of the channel changed.
Example: Sat, 07 Sep 2002 09:42:31 GMT

Element: category
Description: Specify one or more categories that the channel belongs to. Follows the same rules as the <item>-level category element. More info.
Example: <category>Newspapers</category>

Element: generator
Description: A string indicating the program used to generate the channel.
Example: MightyInHouse Content System v2.3

Element: docs
Description: A URL that points to the documentation for the format used in the RSS file. It's probably a pointer to this page. It's for people who might stumble across an RSS file on a Web server 25 years from now and wonder what it is.
Example: http://backend.userland.com/rss

Element: cloud
Description: Allows processes to register with a cloud to be notified of updates to the channel, implementing a lightweight publish-subscribe protocol for RSS feeds. More info here.
Example: <cloud domain="rpc.sys.com" port="80" path="/RPC2" registerProcedure="pingMe" protocol="soap"/>

Element: ttl
Description: ttl stands for time to live. It's a number of minutes that indicates how long a channel can be cached before refreshing from the source. More info here.
Example: <ttl>60</ttl>

Element: image
Description: Specifies a GIF, JPEG or PNG image that can be displayed with the channel. More info here.

Element: textInput
Description: Specifies a text input box that can be displayed with the channel. More info here.

Element: skipHours
Description: A hint for aggregators telling them which hours they can skip. More info here.

Element: skipDays
Description: A hint for aggregators telling them which days they can skip. More info here.
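
The pubDate and lastBuildDate examples above use the RFC 822 layout with a four-digit year and GMT as the zone. A minimal sketch of producing that layout follows; `toRssDate` and `pad2` are hypothetical names, and always emitting GMT is a simplification chosen here (RFC 822 also allows other zones).

```javascript
var DAY_NAMES = ['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'];
var MONTH_NAMES = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                   'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

// Left-pad a number to two digits.
function pad2(n) {
  return (n < 10 ? '0' : '') + n;
}

// Hypothetical helper: format a Date as "Sat, 07 Sep 2002 00:00:01 GMT".
function toRssDate(date) {
  return DAY_NAMES[date.getUTCDay()] + ', ' +
    pad2(date.getUTCDate()) + ' ' +
    MONTH_NAMES[date.getUTCMonth()] + ' ' +
    date.getUTCFullYear() + ' ' +
    pad2(date.getUTCHours()) + ':' +
    pad2(date.getUTCMinutes()) + ':' +
    pad2(date.getUTCSeconds()) + ' GMT';
}
```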


<image> sub-element of <channel>

<image> is an optional sub-element of <channel>, which contains three required and three optional sub-elements.

<url> is the URL of a GIF, JPEG or PNG image that represents the channel.

<title> describes the image; it's used in the ALT attribute of the HTML <img> tag when the channel is rendered in HTML.

<link> is the URL of the site; when the channel is rendered, the image is a link to the site. (Note, in practice the image <title> and <link> should have the same value as the channel's <title> and <link>.)

Optional elements include <width> and <height>, numbers, indicating the width and height of the image in pixels. <description> contains text that is included in the TITLE attribute of the link formed around the image in the HTML rendering.

Maximum value for width is 144, default value is 88.

Maximum value for height is 400, default value is 31.
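
A reader of the feed might apply these defaults and limits roughly as follows. This is a sketch only: `resolveImageSize` and `resolveDimension` are hypothetical names, and both the lower bound of 1 and the choice to throw on out-of-range values are assumptions, since the text states only the maximum and default.

```javascript
// Limits and defaults as stated above: width max 144 (default 88), height max 400 (default 31).
var IMAGE_LIMITS = {
  width:  { max: 144, fallback: 88 },
  height: { max: 400, fallback: 31 }
};

// Return the default when the value is absent; otherwise validate it.
function resolveDimension(name, value) {
  var limit = IMAGE_LIMITS[name];
  if (value === undefined || value === null || value === '') {
    return limit.fallback;
  }
  var n = Number(value);
  // Assumption: values must be positive integers; the spec only states the maximum.
  if (!(n >= 1 && n <= limit.max && Math.floor(n) === n)) {
    throw new RangeError(name + ' must be an integer from 1 to ' + limit.max);
  }
  return n;
}

// Hypothetical helper: resolve both dimensions of an <image> element.
function resolveImageSize(image) {
  return {
    width: resolveDimension('width', image.width),
    height: resolveDimension('height', image.height)
  };
}
```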

<cloud> sub-element of <channel>

<cloud> is an optional sub-element of <channel>.

It specifies a web service that supports the rssCloud interface which can be implemented in HTTP-POST, XML-RPC or SOAP 1.1.

Its purpose is to allow processes to register with a cloud to be notified of updates to the channel, implementing a lightweight publish-subscribe protocol for RSS feeds.

<cloud domain="radio.xmlstoragesystem.com" port="80" path="/RPC2" registerProcedure="xmlStorageSystem.rssPleaseNotify" protocol="xml-rpc" />

In this example, to request notification on the channel it appears in, you would send an XML-RPC message to radio.xmlstoragesystem.com on port 80, with a path of /RPC2. The procedure to call is xmlStorageSystem.rssPleaseNotify.

A full explanation of this element and the rssCloud interface is here.

<ttl> sub-element of <channel>

<ttl> is an optional sub-element of <channel>.

ttl stands for time to live. It's a number of minutes that indicates how long a channel can be cached before refreshing from the source. This makes it possible for RSS sources to be managed by a file-sharing network such as Gnutella.

Example: <ttl>60</ttl>
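
An aggregator could use the ttl value to decide whether a cached copy is still usable. A minimal sketch, assuming timestamps are kept in milliseconds; `isCacheFresh` is a hypothetical name.

```javascript
// Hypothetical helper: true while the cached channel is younger than its ttl.
// fetchedAtMs and nowMs are millisecond timestamps; ttlMinutes comes from <ttl>.
function isCacheFresh(fetchedAtMs, ttlMinutes, nowMs) {
  var ageMs = nowMs - fetchedAtMs;
  return ageMs < ttlMinutes * 60 * 1000;
}
```

With `<ttl>60</ttl>`, a channel fetched 59 minutes ago is still served from the cache, while one fetched 60 minutes ago is refreshed from the source.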

<textInput> sub-element of <channel>

A channel may optionally contain a <textInput> sub-element, which contains four required sub-elements.

<title> -- The label of the Submit button in the text input area.

<description> -- Explains the text input area.

<name> -- The name of the text object in the text input area.

<link> -- The URL of the CGI script that processes text input requests.

The purpose of the <textInput> element is something of a mystery. You can use it to specify a search engine box. Or to allow a reader to provide feedback. Most aggregators ignore it.


Elements of <item>

A channel may contain any number of <item>s. An item may represent a "story" -- much like a story in a newspaper or magazine; if so its description is a synopsis of the story, and the link points to the full story. An item may also be complete in itself, if so, the description contains the text (entity-encoded HTML is allowed), and the link and title may be omitted. All elements of an item are optional, however at least one of title or description must be present.

Element: title
Description: The title of the item.
Example: Venice Film Festival Tries to Quit Sinking

Element: link
Description: The URL of the item.
Example: http://www.nytimes.com/2002/09/07/movies/07FEST.html

Element: description
Description: The item synopsis.
Example: Some of the most heated chatter at the Venice Film Festival this week was about the way that the arrival of the stars at the Palazzo del Cinema was being staged.

Element: author
Description: Email address of the author of the item. More.
Example: oprah@oxygen.net

Element: category
Description: Includes the item in one or more categories. More.
Example: Simpsons Characters

Element: comments
Description: URL of a page for comments relating to the item. More.
Example: http://www.myblog.org/cgi-local/mt/mt-comments.cgi?entry_id=290

Element: enclosure
Description: Describes a media object that is attached to the item. More.
Example: <enclosure url="http://live.curry.com/mp3/celebritySCms.mp3" length="1069871" type="audio/mpeg"/>

Element: guid
Description: A string that uniquely identifies the item. More.
Example: <guid isPermaLink="true">http://inessential.com/2002/09/01.php#a2</guid>

Element: pubDate
Description: Indicates when the item was published. More.
Example: Sun, 19 May 2002 15:21:36 GMT

Element: source
Description: The RSS channel that the item came from. More.
Example: <source url="http://www.quotationspage.com/data/qotd.rss">Quotes of the Day</source>
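
The one hard rule for items, that at least one of title or description must be present, can be checked as in the sketch below. `isValidItem` and `hasText` are hypothetical names, and treating a whitespace-only value as absent is an assumption made here rather than something the text states.

```javascript
// True when the value is a string with at least one non-whitespace character.
// Assumption: whitespace-only content counts as "not present".
function hasText(value) {
  return typeof value === 'string' && value.replace(/\s+/g, '') !== '';
}

// Hypothetical helper: an item needs a title or a description (or both).
function isValidItem(item) {
  return hasText(item.title) || hasText(item.description);
}
```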


<source> sub-element of <item>

<source> is an optional sub-element of <item>.

Its value is the name of the RSS channel that the item came from, derived from its <title>. It has one required attribute, url, which links to the XMLization of the source.

<source url="http://static.userland.com/tomalak/links2.xml">Tomalak's Realm</source>

The purpose of this element is to propagate credit for links, to publicize the sources of news items. It's used in the post command in the Radio UserLand aggregator. It should be generated automatically when forwarding an item from an aggregator to a weblog authoring tool.

<enclosure> sub-element of <item>

<enclosure> is an optional sub-element of <item>.

It has three required attributes. url says where the enclosure is located, length says how big it is in bytes, and type says what its type is, a standard MIME type.

The url must be an http url.

<enclosure url="http://www.scripting.com/mp3s/weatherReportSuite.mp3" length="12216320" type="audio/mpeg" />

A use-case narrative for this element is here.
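
The three required attributes can be checked with a small validator such as the one sketched here. `validateEnclosure` is a hypothetical name; reading "http url" as "starts with http://" and accepting any `type/subtype` string as a MIME type are simplifying assumptions.

```javascript
// Hypothetical helper: return a list of problems found in an enclosure's
// attributes; an empty list means all three required attributes look valid.
function validateEnclosure(enc) {
  var errors = [];
  // Assumption: "must be an http url" is read literally as the http:// scheme.
  if (typeof enc.url !== 'string' || !/^http:\/\//i.test(enc.url)) {
    errors.push('url must be an http URL');
  }
  // length is the size in bytes, so it must be a non-negative whole number.
  if (!/^\d+$/.test(String(enc.length))) {
    errors.push('length must be a byte count');
  }
  // Assumption: a loose type/subtype check stands in for full MIME validation.
  if (typeof enc.type !== 'string' || !/^[^\/\s]+\/[^\/\s]+$/.test(enc.type)) {
    errors.push('type must be a MIME type');
  }
  return errors;
}
```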

<category> sub-element of <item>

<category> is an optional sub-element of <item>.

It has one optional attribute, domain, a string that identifies a categorization taxonomy.

The value of the element is a forward-slash-separated string that identifies a hierarchic location in the indicated taxonomy. Processors may establish conventions for the interpretation of categories. Two examples are provided below:

<category>Grateful Dead</category>

<category domain="http://www.fool.com/cusips">MSFT</category>

You may include as many category elements as you need to, for different domains, and to have an item cross-referenced in different parts of the same domain.
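
Because the value is a forward-slash-separated path within a taxonomy, a processor can split it into levels. A minimal sketch: `parseCategory` is a hypothetical name, and a multi-level value such as `Music/Rock/Live` is an invented illustration, not an example from the specification.

```javascript
// Hypothetical helper: split a category value into its hierarchy levels and
// keep the optional domain attribute (null when the attribute is absent).
function parseCategory(value, domain) {
  return {
    domain: domain === undefined ? null : domain,
    path: String(value).split('/')
  };
}
```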

<pubDate> sub-element of <item>

<pubDate> is an optional sub-element of <item>.

Its value is a date, indicating when the item was published. If it's a date in the future, aggregators may choose to not display the item until that date.

<pubDate>Sun, 19 May 2002 15:21:36 GMT</pubDate>

<guid> sub-element of <item>

<guid> is an optional sub-element of <item>.

guid stands for globally unique identifier. It's a string that uniquely identifies the item. When present, an aggregator may choose to use this string to determine if an item is new.

<guid>http://some.server.com/weblogItem3207</guid>

There are no rules for the syntax of a guid. Aggregators must view them as a string. It's up to the source of the feed to establish the uniqueness of the string.

If the guid element has an attribute named "isPermaLink" with a value of true, the reader may assume that it is a permalink to the item, that is, a url that can be opened in a Web browser, that points to the full item described by the <item> element. An example:

<guid isPermaLink="true">http://inessential.com/2002/09/01.php#a2</guid>

isPermaLink is optional, its default value is true. If its value is false, the guid may not be assumed to be a url, or a url to anything in particular.
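
The default for isPermaLink is easy to get wrong, so here is a sketch of how a reader might resolve it. `guidPermalink` is a hypothetical name, and the `{ value, isPermaLink }` object shape is an assumption about how a parser would hand over the element.

```javascript
// Hypothetical helper: return the guid as a permalink URL when it may be
// treated as one, otherwise null. A missing isPermaLink defaults to true.
function guidPermalink(guid) {
  if (!guid || !guid.value) {
    return null;
  }
  var flag = guid.isPermaLink;
  var isPermaLink = (flag === undefined || flag === null)
    ? true
    : String(flag).toLowerCase() === 'true';
  return isPermaLink ? guid.value : null;
}
```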

<comments> sub-element of <item>

<comments> is an optional sub-element of <item>.

If present, it is the url of the comments page for the item.

<comments>http://rateyourmusic.com/yaccs/commentsn/blogId=705245&itemId=271</comments>

<author> sub-element of <item>

<author> is an optional sub-element of <item>.

It's the email address of the author of the item. For newspapers and magazines syndicating via RSS, the author is the person who wrote the article that the <item> describes. For collaborative weblogs, the author of the item might be different from the managing editor or webmaster. For a weblog authored by a single individual it would make sense to omit the <author> element.

<author>lawyer@boyer.net (Lawyer Boyer)</author>

Comments

RSS places restrictions on the first non-whitespace characters of the data in <link> and <url> elements. The data in these elements must begin with an IANA-registered URI scheme, such as http://, https://, news://, mailto: and ftp://. Prior to RSS 2.0, the specification only allowed http:// and ftp://, however, in practice other URI schemes were in use by content developers and supported by aggregators. Aggregators may have limits on the URI schemes they support. Content developers should not assume that all aggregators support all schemes.
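
An aggregator that supports only some URI schemes might check link data as sketched below. `linkScheme`, `isSupportedLink`, and the `SUPPORTED_SCHEMES` list are hypothetical; the list simply reuses the schemes named in the paragraph above and is not a registry of IANA schemes.

```javascript
// Assumption: this aggregator supports exactly the schemes named above.
var SUPPORTED_SCHEMES = ['http', 'https', 'news', 'mailto', 'ftp'];

// Hypothetical helper: return the lower-cased URI scheme that starts the data,
// ignoring leading whitespace, or null when no scheme is present.
function linkScheme(data) {
  var match = /^\s*([A-Za-z][A-Za-z0-9+.\-]*):/.exec(String(data));
  return match ? match[1].toLowerCase() : null;
}

// True when the data begins with a scheme this aggregator supports.
function isSupportedLink(data) {
  var scheme = linkScheme(data);
  return scheme !== null && SUPPORTED_SCHEMES.indexOf(scheme) !== -1;
}
```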

In RSS 0.91, various elements are restricted to 500 or 100 characters. There can be no more than 15 <item>s in a 0.91 <channel>. There are no string-length or XML-level limits in RSS 0.92 and greater. Processors may impose their own limits, and generators may have preferences that say no more than a certain number of <item>s can appear in a channel, or that strings are limited in length.

In RSS 2.0, a provision is made for linking a channel to its identifier in a cataloging system, using the channel-level category feature, described above. For example, to link a channel to its Syndic8 identifier, include a category element as a sub-element of <channel>, with domain "Syndic8", and value the identifier for your channel in the Syndic8 database. The appropriate category element for Scripting News would be <category domain="Syndic8">1765</category>.

A frequently asked question about <guid>s is how do they compare to <link>s. Aren't they the same thing? Yes, in some content systems, and no in others. In some systems, <link> is a permalink to a weblog item. However, in other systems, each <item> is a synopsis of a longer article, <link> points to the article, and <guid> is the permalink to the weblog entry. In all cases, it's recommended that you provide the guid, and if possible make it a permalink. This enables aggregators to not repeat items, even if there have been editing changes.

If you have questions about the RSS 2.0 format, please post them on the RSS2-Support mail list, hosted by Sjoerd Visscher. This is not a debating list, but serves as a support resource for users, authors and developers who are creating and using content in RSS 2.0 format.

Extending RSS

RSS originated in 1999, and has strived to be a simple, easy to understand format, with relatively modest goals. After it became a popular format, developers wanted to extend it using modules defined in namespaces, as specified by the W3C.

RSS 2.0 adds that capability, following a simple rule. An RSS feed may contain elements not described on this page, only if those elements are defined in a namespace.

The elements defined in this document are not themselves members of a namespace, so that RSS 2.0 can remain compatible with previous versions in the following sense -- a version 0.91 or 0.92 file is also a valid 2.0 file. If the elements of RSS 2.0 were in a namespace, this constraint would break, a version 0.9x file would not be a valid 2.0 file.

Here's an example of a file that makes use of elements in namespaces, authored by Mark Pilgrim.

Roadmap

RSS is by no means a perfect format, but it is very popular and widely supported. Having a settled spec is something RSS has needed for a long time. The purpose of this work is to help it become an unchanging thing, to foster growth in the market that is developing around it, and to clear the path for innovation in new syndication formats. Therefore, the RSS spec is, for all practical purposes, frozen at version 2.0.1. We anticipate possible 2.0.2 or 2.0.3 versions, etc. only for the purpose of clarifying the specification, not for adding new features to the format. Subsequent work should happen in modules, using namespaces, and in completely new syndication formats, with new names.

Copyright and disclaimer

© Copyright 1997-2002 UserLand Software. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and these paragraphs are included on all such copies and derivative works.

This document may not be modified in any way, such as by removing the copyright notice or references to UserLand or other organizations. Further, while these copyright restrictions apply to the written RSS specification, no claim of ownership is made by UserLand to the format it describes. Any party may, for commercial or non-commercial purposes, implement this format without royalty or license fee to UserLand. The limited permissions granted herein are perpetual and will not be revoked by UserLand or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and USERLAND DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

 


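RSS 2.0 Standard

Source: http://www.cpcwedu.com/Document/WEBOfficial/095447158.htm

What is RSS?

RSS is a format for syndicating site content.

Its name is short for Really Simple Syndication.

RSS is a dialect of XML. All RSS documents conform to the XML 1.0 specification, which is published on the W3C website.

A summary of the RSS version history is here.

In an RSS document, the outermost element is <rss>. It must carry a version attribute that states which version of the RSS specification the document conforms to. If an RSS document conforms to this specification, its version attribute must be 2.0.

The <rss> element has a single child element, <channel>, which contains information about the channel (metadata) and its contents.

Sample files

Here are some RSS sample files: RSS 0.91, 0.92, and 2.0.

Note that the links and servers these sample files point to may no longer exist. The 0.91 sample file was created when the 0.91 documentation was written. Maintaining a history of sample files seems like a good idea.

About this document

This document was completed in the fall of 2002; its version is 2.0.1.

It incorporates all changes and additions starting from the RSS 0.91 specification (2000), as well as the new features introduced in RSS 0.92 (December 2000) and RSS 0.94 (August 2002).

For the detailed change history, see here.

This document first describes the required and optional channel elements, then the sub-elements of <item>. It closes by answering some frequently asked questions and providing a roadmap for future development and guidelines for extending RSS.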


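Required channel elements

Below is a list of the required channel elements, each with a brief description, an example, and, where available, a link to a more detailed description.

01.● title
Name: title
Description: The name of the channel. It is how people refer to your service. If you have an HTML website with the same content as your RSS file, the value of your title element should match the title of your website.
Example: GoUpstate.com News Headlines

02.● link
Name: link
Description: The URL of the website corresponding to the channel.
Example: http://www.goupstate.com/

03.● description
Name: description
Description: A description of the channel.
Example: The latest news from GoUpstate.com, a Spartanburg Herald-Journal Web site.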

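Optional channel elements

Below is a list of the optional channel elements.

01.● language
Name: language
Description: The language the channel is written in. For example, this lets a site group all Italian-language sites together. For the allowable values of this element, see the list provided by Netscape, or the list defined by the W3C.
Example: en-us

02.● copyright
Name: copyright
Description: Copyright notice for the content in the channel.
Example: Copyright 2002, Spartanburg Herald-Journal

03.● managingEditor
Name: managingEditor
Description: Email address of the person responsible for the channel's editorial content.
Example: geo@herald.com (George Matesky)

04.● webMaster
Name: webMaster
Description: Email address of the person responsible for the channel's technical support.
Example: betty@herald.com (Betty Guernsey)

05.● pubDate
Name: pubDate
Description: The publication date of the channel's content. All dates and times must conform to the RFC 822 specification, except that the year may be written with two or four digits (four preferred).
Example: Sat, 07 Sep 2002 00:00:01 GMT

06.● lastBuildDate
Name: lastBuildDate
Description: The last time the content of the channel changed.
Example: Sat, 07 Sep 2002 09:42:31 GMT

07.● category
Name: category
Description: Specifies one or more categories the channel belongs to. Follows the same rules as the item-level category element. For details, see here.
Example: <category>Newspapers</category>

08.● generator
Name: generator
Description: A string naming the program used to generate the channel.
Example: MightyInHouse Content System v2.3

09.● docs
Name: docs
Description: A URL pointing to the documentation for the format used in the RSS file.
Example: http://blogs.law.harvard.edu/tech/rss

10.● cloud
Name: cloud
Description: Allows processes to register with a cloud to be notified of updates to the channel, implementing a lightweight publish-subscribe protocol for RSS feeds. For details, see here.
Example: <cloud domain="rpc.sys.com" port="80" path="/RPC2" registerProcedure="pingMe" protocol="soap"/>

11.● ttl
Name: ttl
Description: ttl stands for time to live. It indicates how long the channel may be cached before it is refreshed from the source. For details, see here.
Example: <ttl>60</ttl>

12.● image
Name: image
Description: Specifies a GIF, JPEG, or PNG image that can be displayed with the channel. For details, see here.

13.● rating
Name: rating
Description: The PICS content rating for the channel.

14.● textInput
Name: textInput
Description: Specifies a text input box that can be displayed with the channel. For details, see here.

15.● skipHours
Name: skipHours
Description: A hint telling aggregators which hours they can skip. For details, see here.

16.● skipDays
Name: skipDays
Description: A hint telling aggregators which days they can skip. For details, see here.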

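<image> sub-element of <channel>

<image> is an optional sub-element of <channel>; it contains three required and three optional sub-elements.

<url> is the URL of a GIF, JPEG, or PNG image file that represents the channel.

<title> describes the image; when the channel is rendered in HTML, it is used as the alt attribute of the HTML <img> tag.

<link> is the URL of the site; when the channel is rendered, the image links to that site. (In practice, the image <title> and <link> should have the same values as the channel's <title> and <link>.)

Optional elements include <width> and <height>, numbers giving the width and height of the image in pixels. <description> contains the text used in the TITLE attribute of the link formed around the image in the HTML rendering.

Maximum value for width is 144, default value is 88.

Maximum value for height is 400, default value is 31.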

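<cloud> sub-element of <channel>

<cloud> is an optional sub-element of <channel>.

It specifies a web service that supports the rssCloud interface, which can be implemented in HTTP-POST, XML-RPC, or SOAP 1.1.

Its purpose is to allow processes to register with a cloud to be notified of updates to the channel, implementing a lightweight publish-subscribe protocol for RSS feeds.

<cloud domain="rpc.sys.com" port="80" path="/RPC2" registerProcedure="myCloud.rssPleaseNotify" protocol="xml-rpc" />

In this example, to request notification on the channel, you would send an XML-RPC message to rpc.sys.com on port 80, with a path of /RPC2. The procedure to call is myCloud.rssPleaseNotify.

A full explanation of this element and the rssCloud interface is here.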

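<ttl> sub-element of <channel>

<ttl> is an optional sub-element of <channel>.

ttl stands for time to live. It indicates how long the channel may be cached before it is refreshed from the source. This makes it possible for RSS sources to be managed by a file-sharing network such as Gnutella.

Example: <ttl>60</ttl>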

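<textInput> sub-element of <channel>

A channel may optionally contain a <textInput> sub-element, which contains four required sub-elements.

<title> -- The label of the Submit button in the text input area.

<description> -- Explains the text input area.

<name> -- The name of the text object in the text input area.

<link> -- The URL of the CGI script that processes text input requests.

The purpose of the <textInput> element is something of a mystery. You can use it to provide a search engine box, or to let readers give feedback. Most aggregators ignore it.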

<item>的元素

一个频道可以包含许多<item>元素。一个项目可以代表一个"故事" ——比如说一份报纸或杂志上的故事;如果是这样的话,那么项目的描述则是故事的摘要,项目的链接则指向整个故事的链接位置。一个项目也可以本身是完整的,如果是这样的话,项目的描述就包含了文本(整体以HTML格式编码是可以的;参见 例子),而链接和标题可以省略。项目的所有元素都是可选的,但是至少要包含一个标题(title)或描述(description)。

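01.● title
Name: title
Description: The title of the item.
Example: Venice Film Festival Tries to Quit Sinking

02.● link
Name: link
Description: The URL of the item.
Example: http://nytimes.com/2004/12/07FEST.html

03.● description
Name: description
Description: The item synopsis.
Example: Some of the most heated chatter at the Venice Film Festival this week was about the way that the arrival of the stars at the Palazzo del Cinema was being staged.

04.● author
Name: author
Description: Email address of the author of the item. See the <author> section below.

05.● category
Name: category
Description: Places the item in one or more categories. See the <category> section below.

06.● comments
Name: comments
Description: URL of a page of comments relating to the item. See the <comments> section below.

07.● enclosure
Name: enclosure
Description: A media object attached to the item. See the <enclosure> section below.

08.● guid
Name: guid
Description: A string that uniquely identifies the item. See the <guid> section below.

09.● pubDate
Name: pubDate
Description: When the item was published. See the <pubDate> section below.

10.● source
Name: source
Description: The RSS channel that the item came from. See the <source> section below.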
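The one hard requirement stated above, that an item carries at least a title or a description, is easy to check. A minimal sketch, assuming the item has already been parsed into a plain object whose properties hold the element text (isValidItem and that object shape are hypothetical):

```javascript
// Hypothetical check: every element of <item> is optional, but at
// least one of title or description must be present and non-empty.
function isValidItem(item) {
  const hasText = (v) => typeof v === "string" && v.trim() !== "";
  return hasText(item.title) || hasText(item.description);
}
```

An item with only a link fails this check; an item with only a description passes.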

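<source> sub-element of <item>

<source> is an optional sub-element of <item>.

Its value is the name of the RSS channel that the item came from, derived from its <title>. It has one required attribute, url, which links to the XML serialization of the source.

<source url="http://www.tomalak.org/links2.xml">Tomalak's Realm</source>

The purpose of this element is to propagate credit for links and so publicize the sources of news items. It can be used in the Post command of an aggregator. It should be generated automatically when an item is forwarded from an aggregator to a weblog authoring tool.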

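<enclosure> sub-element of <item>

<enclosure> is an optional sub-element of <item>.

It has three required attributes: url says where the enclosure is located, length says how big it is in bytes, and type gives its standard MIME type.

The url must be an http url.

<enclosure url="http://www.scripting.com/mp3s/weatherReportSuite.mp3" length="12216320" type="audio/mpeg" />

A use-case narrative for this element is available separately.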
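The three required attributes and the http restriction can be checked along these lines. This is a sketch under assumptions: enclosureProblems is a hypothetical name, the attributes are assumed to arrive as strings in a plain object, and the pattern follows the text literally by accepting only http:// (a reader that also wants https:// would relax it).

```javascript
// Hypothetical validator: returns a list of problems, empty if none.
function enclosureProblems(attrs) {
  const problems = [];
  for (const name of ["url", "length", "type"]) {
    if (attrs[name] === undefined || attrs[name] === "") {
      problems.push(`missing ${name}`);
    }
  }
  if (attrs.url && !/^http:\/\//i.test(attrs.url)) {
    problems.push("url is not an http URL");
  }
  if (attrs.length && !/^\d+$/.test(attrs.length)) {
    problems.push("length is not a byte count");
  }
  return problems;
}
```

The enclosure example above produces an empty list.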

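<category> sub-element of <item>

<category> is an optional sub-element of <item>.

It has one optional attribute, domain, a string that identifies a categorization taxonomy.

The value of the element is a forward-slash-separated string that identifies a hierarchic location in the indicated taxonomy. Processors may establish conventions for the interpretation of categories. Two examples:

<category>Grateful Dead</category>

<category domain="http://www.fool.com/cusips">MSFT</category>

You may include as many <category> elements as you need, for different domains, and may cross-reference an item in different parts of the same domain.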
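Splitting a category value into its hierarchy can be sketched as below. parseCategory and the shape of its result are hypothetical, and the path "Music/Rock/Grateful Dead" in the usage note is an invented illustration, not taken from the text.

```javascript
// Hypothetical helper: turns the text of <category> and its optional
// domain attribute into a structured value.
function parseCategory(text, domain) {
  return {
    domain: domain === undefined ? null : domain, // null: no taxonomy given
    path: text
      .split("/")               // hierarchy levels are slash-separated
      .map((s) => s.trim())
      .filter((s) => s !== ""), // ignore empty segments
  };
}
```

parseCategory("Music/Rock/Grateful Dead") yields a three-level path, while the plain "Grateful Dead" example above yields a single level.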

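<pubDate> sub-element of <item>

<pubDate> is an optional sub-element of <item>.

Its value is a date indicating when the item was published. If it is a date in the future, aggregators may choose not to display the item until that date.

<pubDate>Sun, 19 May 2002 15:21:36 GMT</pubDate>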
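An aggregator that honours future dates might do something like this. It is only a sketch: isPublished is a hypothetical name, `now` is assumed to be a millisecond timestamp, and showing items whose date cannot be parsed is a choice made for this example.

```javascript
// Hypothetical helper: should an item with this <pubDate> be shown yet?
function isPublished(pubDateText, now) {
  const t = Date.parse(pubDateText); // handles "Sun, 19 May 2002 15:21:36 GMT"
  if (Number.isNaN(t)) return true;  // unparseable date: show the item anyway
  return t <= now;                   // hide items dated in the future
}
```

For the example above, the item becomes visible at exactly 15:21:36 GMT on 19 May 2002 and not a second earlier.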

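<guid> sub-element of <item>

<guid> is an optional sub-element of <item>.

guid stands for globally unique identifier. It is a string that uniquely identifies the <item>. When present, an aggregator may choose to use this string to determine whether the <item> is new.

<guid>http://some.server.com/weblogItem3207</guid>

There are no rules for the syntax of a guid. Aggregators must treat it as a string. It is up to the source of the feed to establish the uniqueness of the string.

If the guid element has an isPermaLink attribute with a value of true, the reader may assume that it is a permalink to the item, that is, a url that can be opened in a web browser and that points to the full item described by the <item> element. For example:

<guid isPermaLink="true">http://inessential.com/2002/09/01.php#a2</guid>

isPermaLink is optional; its default value is true. If its value is false, the guid may not be assumed to be a url, or a url to anything in particular.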
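The isPermaLink default is easy to get wrong, so here is a minimal sketch of the rule. permalinkFromGuid is a hypothetical helper, and the parsed guid is assumed to be a plain object holding the element text in `value` and the attribute string (if any) in `isPermaLink`.

```javascript
// Hypothetical helper: returns the guid as a permalink URL, or null
// when the guid must be treated as an opaque string.
function permalinkFromGuid(guid) {
  // isPermaLink defaults to true when the attribute is absent.
  const flag = guid.isPermaLink === undefined ? "true" : guid.isPermaLink;
  return flag === "true" ? guid.value : null;
}
```

In both cases the guid string itself remains the identifier used to detect duplicates; only its use as a link changes.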

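<comments> sub-element of <item>

<comments> is an optional sub-element of <item>.

If present, it is the url of the comments page for the item.

<comments>http://ekzemplo.com/entry/4403/comments</comments>

More about comments is documented separately.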

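<author> sub-element of <item>

<author> is an optional sub-element of <item>.

It is the email address of the author of the item. For newspapers and magazines syndicating via RSS, the author is the person who wrote the article that the item describes. For collaborative weblogs, the author of the item might be different from the managing editor or webmaster. For a weblog maintained by a single individual it makes sense to omit the <author> element.

<author>lawyer@boyer.net (Lawyer Boyer)</author>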
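The example above follows an "email (display name)" shape. A reader could split it as sketched below; parseAuthor is a hypothetical helper, and the pattern is deliberately loose. It is an assumption of this example, not an address validator.

```javascript
// Hypothetical helper: splits "email (Name)" into its two parts.
// Returns null when no email-like token is found.
function parseAuthor(text) {
  const m = /^\s*(\S+@\S+?)\s*(?:\((.*)\))?\s*$/.exec(text);
  if (!m) return null;
  return {
    email: m[1],
    name: m[2] === undefined ? null : m[2].trim(), // name part is optional
  };
}
```

A bare address without the parenthesised name is accepted too, with name set to null.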

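Comments

RSS places restrictions on the first non-whitespace characters of the data in <link> and <url> elements. The data in these elements must begin with an IANA-registered URI scheme, such as http://, https://, news://, mailto: and ftp://. Before RSS 2.0, the specification allowed only http:// and ftp://; in practice, however, other URI schemes were widely used by content developers and supported by aggregators. Aggregators may limit the URI schemes they support, and content developers should not assume that all aggregators support all schemes.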
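The rule about the first non-whitespace characters can be sketched as follows. uriScheme and canFollow are hypothetical helpers, and the SUPPORTED set is an arbitrary example of the kind of limit an aggregator might impose.

```javascript
// Hypothetical helper: returns the lower-cased URI scheme that starts
// the data (leading whitespace skipped), or null if there is none.
function uriScheme(text) {
  const m = /^\s*([A-Za-z][A-Za-z0-9+.-]*):/.exec(text);
  return m ? m[1].toLowerCase() : null;
}

// Assumption: this particular aggregator only follows these schemes.
const SUPPORTED = new Set(["http", "https", "ftp"]);

function canFollow(text) {
  const scheme = uriScheme(text);
  return scheme !== null && SUPPORTED.has(scheme);
}
```

A <link> holding mailto: data is well-formed by the rule above, yet this example aggregator would decline to follow it, which is exactly the situation content developers are warned about.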

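In RSS 0.91, various elements are restricted to 500 or 100 characters, and a 0.91-compliant channel may contain no more than 15 <item> elements. In RSS 0.92 and later there are no string-length or XML-level limits. Processors may impose their own limits, and generators may have preferences of their own, for example no more than a certain number of <item> elements in a channel, or strings limited to a certain length.

RSS 2.0 makes provision for linking a channel to its identifier in a cataloging system, using the channel-level category feature described above. For example, to link a channel to its Syndic8 identifier, include a category element as a sub-element of <channel>, with domain "Syndic8" and, as its value, the identifier for your channel in the Syndic8 database. Such a category element would be written as <category domain="Syndic8">1765</category>.

A frequently asked question is how <guid> differs from <link>. Aren't they the same thing? In some content systems they are, and in others they are not. In some systems, <link> is a permalink to a weblog item. In other systems, however, each <item> is a synopsis of a longer article: <link> points to the article, and <guid> is the permalink to the weblog entry. In all cases it is recommended that you provide the guid and, if possible, make it a permalink. This lets aggregators avoid repeating items even when the content has been edited.

If you have questions about the RSS 2.0 format, please post them to the RSS2-Support mailing list maintained by Sjoerd Visscher. The list is not a debating forum; it is a support resource for authors and developers who are creating and using content in RSS 2.0.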

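Extending RSS

RSS originated in 1999 and has strived to be a simple, easy-to-understand format. As it became a popular format, developers wanted to extend it using modules defined in namespaces, as specified by the W3C.

RSS 2.0 adds that capability, following a simple rule: an RSS feed may contain elements not described in this document only if those elements are defined in a namespace.

The elements defined in this document are not themselves members of a namespace, so RSS 2.0 remains compatible with previous versions in the following sense: a version 0.91 or 0.92 file is also a valid 2.0 file. If the elements of RSS 2.0 were in a namespace, this constraint would break, and a version 0.9x file would not be a valid 2.0 file.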

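Roadmap

RSS is by no means a perfect format, but it is very popular and widely supported. Becoming a fixed specification is something RSS has needed for a long time. The purpose of this work is to help it become an unchanging thing, to foster growth in the market that is developing around it, and to clear the path for innovation in new syndication formats. Therefore the RSS specification is, for all practical purposes, frozen at version 2.0.1. Possible 2.0.2 or 2.0.3 versions are anticipated only for the purpose of clarifying the specification, not for adding new features to the format. Subsequent work should happen in modules, using namespaces, and in completely new syndication formats with new names.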

License and authorship

RSS 2.0 is offered by the Berkman Center for Internet & Society at Harvard Law School under the terms of the Attribution/Share Alike Creative Commons license. The author of this document is Dave Winer, founder of UserLand Software and a fellow at the Berkman Center.

posted @ 2006-04-13 16:07 spiders Views (776) | Comments (0)

Using META tags in web pages

META tags go inside the <head>...</head> of each page. The ones most of us know are:

<meta name="GENERATOR" content="Microsoft FrontPage 3.0"> names the authoring tool;
<meta name="KEYWORDS" content="..."> lists the keywords;
<meta name="DESCRIPTION" content="..."> describes the page;

<meta http-equiv="Content-Type" content="text/html; charset=gb_2312-80"> and
<meta http-equiv="Content-Language" content="zh-CN"> declare the character set and language used.

So there are two kinds of META tag: name and http-equiv.

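name is mainly used to describe the page. Its value is paired with content so that search engine robots can find and classify the page (nearly all search engines now use robots that read META values automatically to classify your pages). The most important are DESCRIPTION (how your site is described in the engine) and KEYWORDS (the keywords the engine classifies by); you should put both into every one of your pages. You can also control whether search engines index the page at all, using:
<meta name="ROBOTS" content="all | none | index | noindex | follow | nofollow"> where:
"all" means the page is indexed and the links on it are followed;
"none" means the page is not indexed and the links on it are not followed;
"index" means the page is indexed;
"follow" means the links on the page are followed;
"noindex" means the page is not indexed, but the links on it may be followed;
"nofollow" means the page may be indexed, but the links on it are not followed.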
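The ROBOTS values listed above combine into two independent switches, index and follow. A minimal sketch of how a crawler might read them: parseRobots is a hypothetical helper, comma-separated values (as in "noindex, follow") are the usual way several directives are combined, and defaulting to index plus follow when nothing is specified is an assumption of this example.

```javascript
// Hypothetical helper: reduces a ROBOTS content string to two flags.
function parseRobots(content) {
  const policy = { index: true, follow: true }; // assumed default
  for (const token of content.toLowerCase().split(",")) {
    switch (token.trim()) {
      case "all": policy.index = true; policy.follow = true; break;
      case "none": policy.index = false; policy.follow = false; break;
      case "index": policy.index = true; break;
      case "noindex": policy.index = false; break;
      case "follow": policy.follow = true; break;
      case "nofollow": policy.follow = false; break;
      default: break; // unknown directives are ignored
    }
  }
  return policy;
}
```

Keeping the two flags separate makes it clear that "nofollow" says nothing about indexing and "noindex" says nothing about links.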

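http-equiv, as the name suggests, plays the role of an HTTP header and can directly affect how the page is delivered. Some straightforward examples:

1. Refresh automatically and go to a new page
<meta http-equiv="Refresh" content="10; url= http://yourlink"> goes to http://yourlink after 10 seconds;

2. Add transition effects between pages
<meta http-equiv="Page-Enter" content="revealTrans(duration=10, transition=50)">
<meta http-equiv="Page-Exit" content="revealTrans(duration=20, transition=6)">
Added to a page, these produce special effects on entering and leaving it; this is the Format/Page Transition feature of FrontPage 98. Note that the page they are added to must not be a frame page;

3. Keep the page out of the cache
<meta http-equiv="pragma" content="no-cache">
<meta http-equiv="expires" content="wed, 26 Feb 1997 08:21:57 GMT">
Have a look at http://www.internet.com: once you go offline, its home page can no longer be pulled from the cache. (It is also an excellent site about building web sites.)

4. Set the target window
<meta http-equiv="window-target" content="_top">
This is meant to stop other people from loading the page inside a frame. (I tried it, though, and it did not seem to work.)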
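For the Refresh example in item 1, the content value packs a delay and a target into one string. The sketch below shows one way to pull them apart; parseRefresh is a hypothetical helper, and the pattern only covers the "seconds; url=target" shape shown above, not everything browsers tolerate.

```javascript
// Hypothetical helper: parses the content of a Refresh META tag.
// Returns null when the value does not start with a number of seconds.
function parseRefresh(content) {
  const m = /^\s*(\d+)\s*(?:;\s*(?:url\s*=\s*)?(.*?))?\s*$/i.exec(content);
  if (!m) return null;
  return {
    seconds: Number(m[1]),
    url: m[2] ? m[2] : null, // null: reload the current page
  };
}
```

The space after "url=" in the example above is tolerated by this pattern; a value of just "5" means reload the same page after five seconds.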

META can do much more, for example the question everyone cares about, "how to be ranked near the top of search results" (http://vancouver-webpages.com/VWbot/mk-metas.html). You can read further at:
http://webdeveloper.com/categories/html/html_metatag_res.html
http://vancouver-webpages.com/META/
http://www.nlc-bnc.ca/ifla/II/metadata.htm

posted @ 2006-04-12 08:54 spiders Views (197) | Comments (0)