If your XML documents contain document data (e.g., FrameMaker documents stored in XML format), then DOM is a natural fit for your solution. If you are building a document information management system, you will probably have to deal with a lot of document data. An example is the DataChannel RIO product, which can index and organize information that comes from many kinds of document sources (such as Word and Excel files). In this case, DOM is well suited to giving programs access to the information stored in these documents.
However, if you are dealing mostly with structured data (the equivalent of serialized Java objects in XML), DOM is not the best choice. That is when SAX might be a better fit.
If the information stored in your XML documents is machine-readable (and machine-generated) data, then SAX is the right API for giving your programs access to it. Machine-readable, machine-generated data includes things like:
- Java object properties stored in XML format
- queries that are formulated using some kind of text-based query language (SQL, XQL, OQL)
- result sets that are generated from queries (this might include data from relational database tables encoded into XML)
So machine-generated data is information for which you would normally create your own data structures and classes in Java. A simple example is an address book that contains information about persons, as shown in Figure 1. This address book XML file is not like a word processor document; rather, it is a document that contains pure data, encoded as text using XML.
When your data is of this kind, you have to create your own data structures and classes (object models) anyway in order to manage, manipulate, and persist it. SAX lets you quickly write a handler class that creates instances of your object models from the data stored in your XML documents. An example is a SAX document handler that reads an XML document containing my address book and creates an AddressBook object that can be used to access this information. The first SAX tutorial shows you how to do this. The address book XML document contains person elements, which contain name and email elements. My AddressBook object model contains the following classes:
- AddressBook class, which is a container for Person objects
- Person class, which is a container for name and email String objects.
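As a rough illustration of the object model just listed, the two classes might look something like the sketch below. The field names, constructor, and accessor methods are my own assumptions for illustration; the tutorial's actual classes may be shaped differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Person is a container for a name and an email, both plain Strings.
// (Constructor and getter names are assumptions, not taken from the tutorial.)
class Person {
    private final String name;
    private final String email;

    Person(String name, String email) {
        this.name = name;
        this.email = email;
    }

    String getName() { return name; }

    String getEmail() { return email; }
}

// AddressBook is a container for Person objects.
class AddressBook {
    private final List<Person> persons = new ArrayList<>();

    void addPerson(Person person) {
        persons.add(person);
    }

    // Read-only view, so callers cannot modify the book behind its back.
    List<Person> getPersons() {
        return Collections.unmodifiableList(persons);
    }
}
```

Nothing in these classes knows about XML. That is the point: the object model exists for the program's benefit, and the XML document is just one way of persisting it.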
So my "SAX address book document handler" is responsible for turning person elements into Person objects, and then storing them all in an AddressBook object. This document handler turns the name and email elements into String objects.
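A minimal sketch of such a handler follows. It uses the SAX 2 DefaultHandler class that ships with the JDK, which may not be the same handler interface the tutorial uses. The root element name addressbook, the sample data, and the compact nested Person and AddressBook stand-ins are assumptions made so that the listing compiles on its own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AddressBookHandler extends DefaultHandler {
    // Compact stand-ins for the object model (assumed shapes).
    static class Person { String name; String email; }
    static class AddressBook { final List<Person> persons = new ArrayList<>(); }

    private final AddressBook book = new AddressBook();
    private final StringBuilder text = new StringBuilder();
    private Person current; // the person element being read, if any

    AddressBook getAddressBook() { return book; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // discard text collected before this element
        if (qName.equals("person")) current = new Person();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver one element's text in several chunks.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) return; // ignore anything outside a person element
        switch (qName) {
            case "name":   current.name = text.toString().trim(); break;
            case "email":  current.email = text.toString().trim(); break;
            case "person": book.persons.add(current); current = null; break;
            default: break;
        }
    }

    // Convenience method: parse an XML string and return the populated object model.
    static AddressBook parse(String xml) {
        try {
            AddressBookHandler handler = new AddressBookHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getAddressBook();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse address book", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<addressbook><person><name>Jane Doe</name>"
                   + "<email>jane@example.com</email></person></addressbook>";
        for (Person p : parse(xml).persons) {
            System.out.println(p.name + " <" + p.email + ">");
        }
    }
}
```

The handler never holds the whole document in memory. It keeps only the Person under construction and the text of the element currently being read, which is why this style suits data that maps directly onto objects.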
The SAX document handler you write performs element-to-object mapping. If your information is structured in a way that makes this mapping easy to create, you should use the SAX API. On the other hand, if your data is much better represented as a tree, then you should use DOM.
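For contrast, here is a small sketch of the tree-oriented style, using the JDK's built-in DOM parser. The chapter, title, para, and em element names are invented for illustration. The code walks the parsed tree recursively and builds an indented outline of its elements, which is the kind of task where the tree itself is the natural data structure.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomOutline {
    // Appends each element name, indented by its depth in the tree.
    static void outline(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) out.append("  ");
            out.append(node.getNodeName()).append('\n');
        }
        // Visit children in document order; text nodes have none.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            outline(child, depth + 1, out);
        }
    }

    // Parses an XML string into a DOM tree and returns its element outline.
    static String outline(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            outline(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        // Mixed content (text interleaved with elements) is typical of document data.
        System.out.print(outline(
                "<chapter><title>Intro</title><para>Some <em>mixed</em> text</para></chapter>"));
    }
}
```

No custom object model is needed here: the program works directly on the nodes the parser built, whereas the SAX approach discards the document structure once it has been mapped onto objects.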