semi-sleep

1        Overview

An XML Schema mainly defines references and datatypes. A reference is an <element> or <attribute> declaration, and each one refers to a certain datatype. A datatype may in turn contain references (<attribute> inside <complexType>, and <element> inside <complexContent>).

1.1    Difference from OO languages

XML Schema defines a type system much like an OO language does. Note that in an OO language the class definition is mixed with the code that initializes and operates on instances, while in XML the schema and the instance document are separate.

1.2    Inheritance

As in an OO language, a datatype can be derived from another one, and datatypes are polymorphic: wherever a parent datatype is expected, you can use the “xsi:type” attribute in the instance document to say that an instance of a child datatype is used instead.
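
As a minimal sketch of this polymorphism (the names Address, USAddress and shipTo are hypothetical, chosen only for illustration), an element declared with a parent type can carry an instance of a child type:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Address">
    <xsd:sequence>
      <xsd:element name="street" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- USAddress derives from Address and appends a zip element -->
  <xsd:complexType name="USAddress">
    <xsd:complexContent>
      <xsd:extension base="Address">
        <xsd:sequence>
          <xsd:element name="zip" type="xsd:string"/>
        </xsd:sequence>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
  <!-- declared with the parent type -->
  <xsd:element name="shipTo" type="Address"/>
</xsd:schema>
<!-- instance document: xsi:type selects the child type -->
<shipTo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="USAddress">
  <street>1 Main St</street>
  <zip>12345</zip>
</shipTo>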

1.3    Substitution

A reference such as <element> is merely a named link to a datatype. You can use substitution (by specifying the “substitutionGroup” attribute, which names the head element) to allow a link to be replaced by other links. Substitution can form a tree, just like inheritance does. The “name” of the child link can be any valid name, and its “type” must be the same as the parent link’s “type” or a datatype derived from it.
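
A small sketch of a substitution group may help here. In the W3C schema language the attribute is spelled “substitutionGroup”; the element names comment, shipComment, customerComment and order are hypothetical:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- head of the substitution group -->
  <xsd:element name="comment" type="xsd:string"/>
  <!-- members: different names, same type as the head -->
  <xsd:element name="shipComment" type="xsd:string" substitutionGroup="comment"/>
  <xsd:element name="customerComment" type="xsd:string" substitutionGroup="comment"/>
  <xsd:element name="order">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="comment" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<!-- instance document: members stand in for the head -->
<order>
  <comment>plain comment</comment>
  <shipComment>leave at the door</shipComment>
  <customerComment>this is a gift</customerComment>
</order>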

1.4    Global and local

Both references and datatypes can be global or local. Only global ones can be referred to by “ref”, “substitutionGroup”, “type”, “base” and “xsi:type”.

Instead of using the “type” attribute to name a global datatype, you can define a local datatype in place.

A local datatype can still use “base” on <extension> to reference a global datatype. The “substitutionGroup” attribute has no such local form: it is only allowed on global <element> declarations. Since a local <element> cannot be used anywhere else, joining a substitution group would be meaningless for it anyway.

1.5    Final and block

The “final” attribute affects the schema itself, while the “block” attribute affects the instance document.

The “final” attribute of <element> prohibits substitution by elements whose datatype is derived in a certain way (by restriction, by extension, or both). Note that there is no way, at the schema level, to prohibit substitution by an element of the same datatype.

The “block” attribute of <element> accepts the same values and, in addition, a “substitution” value, which prohibits substitution in the instance document.

1.6    Abstract

The “abstract” attribute only affects the instance document. When used with <element>, it means that the element cannot appear in an instance document and can only serve as a substitution head. When used with <complexType>, it means that the type cannot be used directly in an instance document; use “xsi:type” to specify a child type.
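
To illustrate the element case, here is a sketch with hypothetical element names (shape, circle, drawing):
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- abstract head: cannot appear in an instance itself -->
  <xsd:element name="shape" type="xsd:string" abstract="true"/>
  <xsd:element name="circle" type="xsd:string" substitutionGroup="shape"/>
  <xsd:element name="drawing">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="shape" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<!-- valid: a member of the group replaces the abstract head -->
<drawing><circle>r=1</circle></drawing>
<!-- invalid: shape is abstract -->
<drawing><shape>r=1</shape></drawing>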

2        Built-in datatypes

Built-in datatypes consist of built-in primitive datatypes and built-in derived datatypes. Datatypes defined by a schema author are called user-derived datatypes.

The built-in datatypes live together with the other XML Schema definitions in the namespace "http://www.w3.org/2001/XMLSchema". They are also available on their own in the namespace "http://www.w3.org/2001/XMLSchema-datatypes".

2.1.String

A string can be classified by the following aspects:

§    Can it contain carriage returns, line feeds or tabs (whiteSpace “preserve”), or are they replaced by spaces (whiteSpace “replace”)? The latter is a normalizedString.

§    Is it normalized, with no leading or trailing spaces and no internal sequences of two or more spaces (whiteSpace “collapse”)? If so, it is a token.

§    Does it contain only name characters (no whitespace)? If so, it can be a Name or an NMTOKEN.

§    Name characters fall into the following groups: letters, digits and other characters (. : _ -).

§    The first character decides between Name (starts with :, _ or a letter), NCName (non-colonized name: starts with _ or a letter and contains no colon at all) and NMTOKEN (any name character may come first).

§    A string may have external constraints: an ID must be unique, an IDREF must refer to an existing ID, an ENTITY must refer to an unparsed entity declared in the DTD, and a QName (qualified name) consists of an optional prefix and a local part separated by a colon, where both parts are NCNames and the prefix must be bound to a namespace.

2.2.Number

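Numbers fall into 2 groups: those with a finite value space (float, double, and [unsigned](long|int|short|byte)) and those with an infinite value space (decimal and [[non](positive|negative)]integer, that is integer, positiveInteger, negativeInteger, nonPositiveInteger and nonNegativeInteger). Note that decimal can hold any finite decimal number of arbitrary precision.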

2.3.Binary

The datatypes base64Binary and hexBinary share the same value space (both are byte arrays) but differ in lexical space (one is Base64 encoded, the other is hexadecimal encoded).

2.4.Date time

Among the datatypes related to time, dateTime, date, gYearMonth and gYear represent an exact period; duration represents the length of a period; time, gMonthDay, gDay and gMonth represent a period that recurs (time recurs every day, gMonthDay and gMonth every year, gDay every month).
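
For orientation, the lexical forms of these datatypes look as follows. The element names are hypothetical; each comment names the datatype of the element that follows it:
<samples>
  <!-- dateTime -->
  <postedAt>2009-01-05T16:31:00</postedAt>
  <!-- date -->
  <postedOn>2009-01-05</postedOn>
  <!-- gYearMonth -->
  <month>2009-01</month>
  <!-- gYear -->
  <year>2009</year>
  <!-- duration: 1 year, 2 months, 3 days, 4 hours -->
  <warranty>P1Y2M3DT4H</warranty>
  <!-- time -->
  <opensAt>09:00:00</opensAt>
  <!-- gMonthDay -->
  <birthday>--01-05</birthday>
  <!-- gDay -->
  <payday>---05</payday>
  <!-- gMonth -->
  <taxMonth>--01</taxMonth>
</samples>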

3        Inheritance overview

The xml schema type system:
Any Type (contains attribute?)
    - Simple Type
    - Complex Type (contains element?)
        - Simple Content
        - Complex Content (can be mixed with character?)
            - Element Only
            - Mixed
For the cells marked "√", unless otherwise noted, the new type must be derived from a type of the same kind.

                                      Create New  By List  By Union  By Restriction  By Extension
Simple Type                                       √        √         √
Complex Type with Simple Content                                     √               √ [1]
Complex Type with Complex Content
(Element Only)                        √ [2]                          √               √
Complex Type with Complex Content
(Mixed)                               √ [2]                          √               √

[1] can extend from a Simple Type

[2] no need to use <complexContent>; use <sequence>, <choice> or <all> directly

4        Inheritance for simple type

A datatype is composed of a value space, a lexical space and facets. The relationship between value space and lexical space is one-to-many: each value may have several lexical representations (for example, the decimal value 1 can be written as 1, 1.0 or +1).

An atomic datatype is a simple datatype such as string or decimal, or a type derived from one of them by restriction, so datatypes derived by list or union are not atomic.

4.1.By restriction

A facet defines one aspect of a value space (possibly by way of the lexical space).

Fundamental facets describe read-only properties of a datatype: every datatype has them, but you cannot modify them. I believe they are mainly useful for describing the built-in datatypes so that these can be mapped to other languages. The fundamental facets are: equal, ordered, bounded, cardinality and numeric.

Restriction (constraining) facets are placed inside the <restriction> tag, and several facets can be used together. All facets except pattern and enumeration have a “fixed” attribute, which states whether a derived datatype is prevented from overriding the facet. All facets have a “value” attribute to specify the value. The only content a facet may have is an optional <annotation>. Some facets, such as pattern and enumeration, can appear more than once.
<restriction base="xxx">
    <facet_xxx fixed="true|false" value="xxx"><annotation/></facet_xxx>
</restriction>

§    The whiteSpace, enumeration and pattern facets apply to almost all datatypes.

§    The maxExclusive, maxInclusive, minExclusive and minInclusive facets apply to ordered datatypes such as numbers and dates; totalDigits and fractionDigits apply to decimal and the types derived from it.

§    The length, maxLength and minLength facets apply to strings, lists and byte arrays.
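
The facet groups above can be sketched with three user-derived types. The type names percent, zipCode and size are hypothetical:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- range facets on a number; the upper bound cannot be changed by derived types -->
  <xsd:simpleType name="percent">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="0"/>
      <xsd:maxInclusive value="100" fixed="true"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- pattern facet: exactly five digits -->
  <xsd:simpleType name="zipCode">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[0-9]{5}"/>
    </xsd:restriction>
  </xsd:simpleType>
  <!-- enumeration facet, repeated once per allowed value -->
  <xsd:simpleType name="size">
    <xsd:restriction base="xsd:token">
      <xsd:enumeration value="small"/>
      <xsd:enumeration value="large"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:schema>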

4.2.By list

The value of a list datatype consists of space-separated value(s) of the given itemType:
<simpleType name='sizes'>
  <list itemType='decimal'/>
</simpleType>
<cerealSizes xsi:type='sizes'> 8 10.5 12 </cerealSizes>
The value of a list datatype is always split on whitespace first, so the following code contains 18 items, not 3:
<simpleType name='listOfString'>
  <list itemType='string'/>
</simpleType>
<someElement xsi:type='listOfString'>
  this is not list item 1
  this is not list item 2
  this is not list item 3
</someElement>

The pattern facet of a list datatype applies to the whole list, not to a single item.

4.3.By union

A union datatype is the union of its member types. The order in which the member types are defined is significant: a value is validated against the first member type that accepts it, unless this is overridden with “xsi:type”:
<xsd:element name='size'>
  <xsd:simpleType>
    <xsd:union>
      <xsd:simpleType>
        <xsd:restriction base='integer'/>
      </xsd:simpleType>
      <xsd:simpleType>
        <xsd:restriction base='string'/>
      </xsd:simpleType>
    </xsd:union>
  </xsd:simpleType>
</xsd:element>
<size>1</size>
<size>large</size>
<size xsi:type='xsd:string'>1</size>

5        Definition and inheritance for complex type

5.1.Choice, sequence and all

Compared to <sequence>, <all> allows its elements to appear in any order, while <choice> allows only one of its alternatives to appear.

5.2.The mixed content model

Both <complexType> and <complexContent> have a "mixed" attribute, which specifies a mixed content model. Note that there is no way to specify a datatype for the text in "mixed" content; it is always treated as character data.
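
A short sketch of a mixed content model, with hypothetical element names (letter, name, orderId):
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="letter">
    <!-- mixed="true": character data may appear between the child elements -->
    <xsd:complexType mixed="true">
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="orderId" type="xsd:positiveInteger"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<!-- instance document: the child elements are typed, the surrounding text is not -->
<letter>Dear <name>Ms. Smith</name>, your order <orderId>1032</orderId> has shipped.</letter>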

5.3.Derived by restriction and extension

About restriction: if A is derived from B by restriction, then an instance of A is always valid for B. For a simple type only facets are allowed, so this holds naturally. For a complex type it means that you can use restriction to narrow down the valid usage of some attributes and elements.

About extension: extension works more like a template. If A is derived from B by extension, the result is equivalent to a datatype created by first copying B's definition and then appending A's definition. So extension can be used to add new attributes or elements to an existing type, and an instance of either A or B may be invalid for the other. Complex content is a little special in that the element definitions of B and A can be considered to be wrapped, in that order, in a bigger <sequence>.
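
Both kinds of derivation can be sketched on one base type. The type names Person, PersonWithOneNickname and Employee are hypothetical:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="Person">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="nickname" type="xsd:string" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>
  <!-- restriction: repeats the content model with a narrower occurrence range -->
  <xsd:complexType name="PersonWithOneNickname">
    <xsd:complexContent>
      <xsd:restriction base="Person">
        <xsd:sequence>
          <xsd:element name="name" type="xsd:string"/>
          <xsd:element name="nickname" type="xsd:string" minOccurs="1" maxOccurs="1"/>
        </xsd:sequence>
      </xsd:restriction>
    </xsd:complexContent>
  </xsd:complexType>
  <!-- extension: salary is appended after name and nickname, plus a new attribute -->
  <xsd:complexType name="Employee">
    <xsd:complexContent>
      <xsd:extension base="Person">
        <xsd:sequence>
          <xsd:element name="salary" type="xsd:decimal"/>
        </xsd:sequence>
        <xsd:attribute name="id" type="xsd:ID"/>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:schema>
Every PersonWithOneNickname is also a valid Person, while an Employee is not a valid Person (it has an extra salary element) and a Person is not a valid Employee (the salary element is missing).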

5.4.The “default” and “fixed” attributes

Both <element> and <attribute> have “fixed” and “default” attributes, which are mutually exclusive. The “default” attribute means that the value is set to the given one when it is missing: for an attribute, when the attribute is absent; for an element, when the element is present but empty. Beyond what “default” does, the “fixed” attribute enforces that the value, if present, must be the same as the given one.
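
A sketch for attributes, using the hypothetical names item, currency and version:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="item">
    <xsd:complexType>
      <xsd:simpleContent>
        <xsd:extension base="xsd:string">
          <xsd:attribute name="currency" type="xsd:string" default="USD"/>
          <xsd:attribute name="version" type="xsd:string" fixed="1.0"/>
        </xsd:extension>
      </xsd:simpleContent>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<!-- valid: currency is treated as USD and version as 1.0 -->
<item>book</item>
<!-- invalid: version is present but differs from the fixed value -->
<item version="2.0">book</item>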

5.5.The “form”, “elementFormDefault” and “attributeFormDefault” attributes

The “form” attribute only applies to local <element> and <attribute> declarations (they are considered local even when nested inside a global <group> or <attributeGroup>). Because they are local, they are not automatically tied to a namespace, so you can choose the form in which they appear, that is whether they must be qualified with a namespace.

§    If the value is “unqualified”, the name must appear without a namespace (blank namespace) in the instance document.

§    If the value is “qualified”, the name must be in the target namespace.

The <schema> tag has “elementFormDefault” and “attributeFormDefault” attributes to set a default value for all local declarations; both default to “unqualified”.
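
The effect is easiest to see in an instance document. In this sketch the namespace http://example.com/po and the element names are hypothetical:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.com/po"
            elementFormDefault="unqualified">
  <xsd:element name="order">
    <xsd:complexType>
      <xsd:sequence>
        <!-- local, follows the schema-wide default -->
        <xsd:element name="note" type="xsd:string"/>
        <!-- local, overrides the default -->
        <xsd:element name="code" type="xsd:string" form="qualified"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
<!-- instance document: the global element is always qualified -->
<po:order xmlns:po="http://example.com/po">
  <note>unqualified local element</note>
  <po:code>qualified local element</po:code>
</po:order>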

5.6.The “ref” attribute

Use the “ref” attribute to refer to a global <element>, <attribute>, <group> or <attributeGroup>. Note that many attributes, such as “name” and “type”, cannot be present when “ref” is used.

5.7.The “maxOccurs”, “minOccurs” and “use” attributes

The “maxOccurs” and “minOccurs” attributes apply to <element>, <any>, <choice>, <sequence>, <all> and <group>. Note that they can only be used in local scope. The <all> tag and all its child tags must have “maxOccurs” set to 1 and “minOccurs” set to 1 or 0; I think this limitation is there to eliminate ambiguous definitions or to ease XML processor implementation.

The “use” attribute plays a similar role for <attribute>. Its values are:

§    “prohibited”, must appear 0 times

§    “required”, must appear exactly 1 time

§    “optional”, can appear 0 or 1 times
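
A sketch combining occurrence constraints and “use”, with hypothetical names (book, title, author, subtitle, isbn, lang):
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="book">
    <xsd:complexType>
      <xsd:sequence>
        <!-- defaults: minOccurs="1" maxOccurs="1" -->
        <xsd:element name="title" type="xsd:string"/>
        <!-- one or more -->
        <xsd:element name="author" type="xsd:string" maxOccurs="unbounded"/>
        <!-- zero or one -->
        <xsd:element name="subtitle" type="xsd:string" minOccurs="0"/>
      </xsd:sequence>
      <xsd:attribute name="isbn" type="xsd:string" use="required"/>
      <xsd:attribute name="lang" type="xsd:language" use="optional"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>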

5.8.The “nillable” attribute

For <element>, if “nillable” is set to true, the "xsi:nil" attribute can be used in the instance document to state explicitly that the element's content (no matter whether it is simple or complex content) is null. For example, if an element is of type "long", you cannot set it to null unless you use "xsi:nil".
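
A minimal sketch, assuming a hypothetical element named shipDate:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="shipDate" type="xsd:date" nillable="true"/>
</xsd:schema>
<!-- valid: explicitly null -->
<shipDate xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:nil="true"/>
<!-- invalid: an empty string is not a date -->
<shipDate/>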

5.9.Grouping

Grouping is another way to build a modularized schema: use <group> for elements and <attributeGroup> for attributes. Note that <group> cannot contain <element> or <any> as its direct child; they must be wrapped in <sequence>, <choice> or <all>.
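
A sketch of both kinds of group and their use through “ref”. The names nameParts, audit and person are hypothetical:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- element group: the elements are wrapped in a sequence -->
  <xsd:group name="nameParts">
    <xsd:sequence>
      <xsd:element name="first" type="xsd:string"/>
      <xsd:element name="last" type="xsd:string"/>
    </xsd:sequence>
  </xsd:group>
  <xsd:attributeGroup name="audit">
    <xsd:attribute name="createdBy" type="xsd:string"/>
    <xsd:attribute name="createdOn" type="xsd:date"/>
  </xsd:attributeGroup>
  <xsd:element name="person">
    <xsd:complexType>
      <xsd:group ref="nameParts"/>
      <xsd:attributeGroup ref="audit"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>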

6        Wildcard

6.1.Any and any attribute

<any> applies to elements and <anyAttribute> applies to attributes; both have “namespace” and “processContents” attributes.

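For “namespace”, here are the valid values:

§    “##any”, any namespace, or no namespace at all (the default).

§    “##other”, any namespace other than the target namespace.

§    A whitespace-separated list of namespace URIs, which may also include “##targetNamespace” (the target namespace) and “##local” (no namespace).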

For “processContents”, here are the valid values:

§    “strict”, obtain the schema (error if not found) and validate the content (error if invalid). This is the default.

§    “lax”, obtain the schema (skip validation if not found) and validate the content (error if invalid).

§    “skip”, do nothing.
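
A sketch of both wildcards in one type. The namespace http://example.com/env and the element names envelope and header are hypothetical:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.com/env"
            elementFormDefault="qualified">
  <xsd:element name="envelope">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="header" type="xsd:string"/>
        <!-- any number of elements from other namespaces, validated only if a schema is found -->
        <xsd:any namespace="##other" processContents="lax"
                 minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <!-- any attribute from any namespace, never validated -->
      <xsd:anyAttribute namespace="##any" processContents="skip"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>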

6.2.Any type and any simple type

The anySimpleType is quite special: all primitive types are considered to be derived from it by restriction, yet they are still called "primitive" (normally, primitive types do not derive from anything). The anyType is the base type of anySimpleType and of all complex types.

You can use anyType and anySimpleType in attributes such as "type" or "base". anyType allows any content and attributes, while anySimpleType allows any text content. You can also use “xsi:type” to specify the actual type being used.

These datatypes are called ur-types (the German prefix "ur" means original).

7        Namespace

7.1.Several schema namespaces

http://www.w3.org/2001/XMLSchema

http://www.w3.org/2001/XMLSchema-datatypes

http://www.w3.org/2001/XMLSchema-instance

7.2.Difference between prefixed / default / blank namespace

Elements with a prefixed namespace and elements with a default namespace both belong to some namespace, though they appear in different forms.

Elements with a blank namespace belong to no namespace, though they appear in the same form as elements with a default namespace.

7.3.Namespace regarding the “name” and “type” attributes

Values of the “name” attribute (e.g. of <element>, <attribute> or <simpleType>) are considered to be declared inside the “targetNamespace” of the <schema> tag. Values of the “type” (or “ref”, “base”…) attribute carry no such assumption: you should always qualify them with a namespace (though you may use the default namespace, so that they look the same as the values in the “name” attribute).
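
A sketch of the asymmetry, assuming a hypothetical namespace http://example.com/po bound to the prefix po:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:po="http://example.com/po"
            targetNamespace="http://example.com/po">
  <!-- name="Price" declares Price inside the target namespace -->
  <xsd:simpleType name="Price">
    <xsd:restriction base="xsd:decimal"/>
  </xsd:simpleType>
  <!-- type is a qualified name: po:Price, not just Price -->
  <xsd:element name="total" type="po:Price"/>
</xsd:schema>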

8        Constraint

8.1.Unique, key and keyref

The <unique>, <key> and <keyref> tags are used to specify constraints on the instance document. The <selector> and <field> tags are nested inside these tags to retrieve a list of values.

§    <unique>, values in the list must either be null (absent) or unique.

§    <key>, values in the list must not be null and must be unique.

§    <keyref>, values in the list must correspond to values retrieved by the <unique> or <key> whose “name” is given in the “refer” attribute.

8.2.Selector and field

Both <selector> and <field> have an “xpath” attribute. For <selector>, it selects a list of elements; for <field>, it selects a value from each element in that list. Note that the <field> tag may appear several times to specify a composite value (like a composite primary key in a relational database).

8.3.Xpath scope and name scope

For the “xpath” attribute of <selector>, the context node is the element declared by the closest enclosing <element>. Only a restricted subset of XPath with relative paths is allowed, so you must place <unique>, <key> or <keyref> inside the proper <element>; in fact, these tags are only allowed to be nested inside the <element> tag.

As for name scope, though <unique>, <key> and <keyref> are never in global scope (never directly inside <schema>), they always have global names, which must be unique within the target namespace and can be referred to from the “refer” attribute of any <keyref>.
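
A sketch of a key and a matching keyref, both placed inside the element that encloses the selected items. All names (catalog, product, orderLine, sku, skuRef, productKey, orderLineRef) are hypothetical:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="catalog">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="product" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="sku" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="orderLine" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="skuRef" type="xsd:string" use="required"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
    <!-- every product must have a unique sku -->
    <xsd:key name="productKey">
      <xsd:selector xpath="product"/>
      <xsd:field xpath="@sku"/>
    </xsd:key>
    <!-- every skuRef must match the sku of some product -->
    <xsd:keyref name="orderLineRef" refer="productKey">
      <xsd:selector xpath="orderLine"/>
      <xsd:field xpath="@skuRef"/>
    </xsd:keyref>
  </xsd:element>
</xsd:schema>
Both selector paths are relative to catalog, the element whose declaration contains the constraints.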

9        Import, include, redefine and schema

9.1.Import, include and redefine

The <import> tag imports a schema from a different namespace into the current schema, and the <include> tag includes a schema from the same namespace. The <redefine> tag acts like <include>, and in addition makes it possible to redefine the datatypes, groups and attribute groups defined inside the included schema.

All these tags have a “schemaLocation” attribute.

§    The <import> tag has an additional “namespace” attribute, which may be used to guess the schema location if the “schemaLocation” attribute is missing. The “namespace” attribute must be exactly the same as the “targetNamespace” attribute of the imported schema.

§    The <include> tag must refer to a schema that has the same target namespace as the current schema, or no target namespace at all.

§    The <redefine> tag further requires that a redefined datatype is always derived from itself, while a redefined group or attribute group should contain exactly one reference to itself or, without such a reference, be a valid restriction of the original definition.
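
A sketch of <include> and <import> side by side. The namespaces, the file names po-types.xsd and address.xsd, and the type addr:Address are hypothetical:
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:addr="http://example.com/address"
            targetNamespace="http://example.com/po">
  <!-- same target namespace (or none): include -->
  <xsd:include schemaLocation="po-types.xsd"/>
  <!-- different target namespace: import, with a matching namespace attribute -->
  <xsd:import namespace="http://example.com/address" schemaLocation="address.xsd"/>
  <!-- a type from the imported schema, referenced through its prefix -->
  <xsd:element name="shipTo" type="addr:Address"/>
</xsd:schema>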

9.2.Schema

The <schema> tag is the root tag of a schema. It has attributes such as “targetNamespace”, “version” and “xml:lang” that provide information about the current schema, and attributes such as “finalDefault”, “blockDefault”, “elementFormDefault” and “attributeFormDefault” that provide default values.

10 Schema instance

The following attributes are defined in the namespace “http://www.w3.org/2001/XMLSchema-instance”.

10.1.        The “schemaLocation” and “noNamespaceSchemaLocation”

The “schemaLocation” attribute accepts a list of URI pairs; each pair contains a namespace and a schema location. The “noNamespaceSchemaLocation” attribute accepts only one URI, which is treated as a schema location.

These attributes can appear anywhere as long as they come before (or at) the first element to validate. Note that if the root element does not contain a “schemaLocation” attribute, then to my understanding the root element's content is treated as "skip" (refer to the "processContents" attribute of <any> and <anyAttribute>), so no validation is performed.
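
Two instance documents sketch the difference. The namespace http://example.com/po and the file names po.xsd and order.xsd are hypothetical:
<!-- namespaced document: pairs of namespace and schema location -->
<po:order xmlns:po="http://example.com/po"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://example.com/po po.xsd">
  <!-- content to be validated against po.xsd -->
</po:order>
<!-- document without a namespace: a single schema location -->
<order xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="order.xsd">
  <!-- content to be validated against order.xsd -->
</order>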

10.2.        The “type”

Used to specify the actual type of an element.

10.3.        The “nil”

Used to specify that an element is null.

11 Other

11.1.        Schema documentation

<annotation> contains <appinfo> (for machines) and <documentation> (for humans). Each has a "source" attribute that refers to a URI; <documentation> may also have an "xml:lang" attribute.

11.2.        About DTD

A schema can be used together with a DTD. Note that a schema cannot define entities like a DTD can, while a DTD does not support local elements, namespaces, grouping or inheritance.

11.3.        Root element

There is no way to use the schema to force which element must be the root element; this information has to be given to the XML processor.

12 Questions

Why are float and double not derived from decimal?

Why does <complexContent> need the “mixed” attribute?

Why not force <choice>, <sequence>, <all> and <group> to be placed inside <complexContent>?

What is <notation> used for?


posted on 2009-01-05 16:31 by semi-sleep, category: xml
