This chapter is based on Chapter 8 of Erik Ray's book [Ray 2001], Chapters 8 and 9 of the Deitel et al. tome [Deitel 2001], and additional material from the web.
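An XML document has a hierarchical structure that can be modelled as a tree. SAX and DOM are two dramatically different APIs for accessing the information in XML documents: SAX reports the document piece by piece as it is read, while DOM builds the whole tree in memory.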
SAX (the Simple API for XML) was developed collaboratively by members of the XML-DEV mailing list, not by the W3C, and was released in 1998. SAX-based parsers invoke methods when markup (e.g. a start tag or an end tag) is encountered. No tree structure is created; data is passed to the application from the XML document as it is found. SAX parsers are typically used for reading XML documents that will not be modified.
SAX-based parsers are available for a variety of programming languages, with C++, Java, and Perl being the most popular. SAX is based on an event-driven model that uses callbacks to handle processing. Figure 1 lists what an event-driven program can and can't do.
[Figure 1: What an event-driven program can do and can't do.]
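The callback model can be sketched with the SAX module in Python's standard library. This is a minimal illustration, not the code of any parser named in this chapter; the `CD_XML` sample and its element names (`collection`, `cd`, `title`, `artist`) are assumptions standing in for our CD collection document.

```python
import xml.sax

# Hypothetical stand-in for the CD collection document (element names assumed).
CD_XML = (b"<collection><cd><title>Kind of Blue</title>"
          b"<artist>Miles Davis</artist></cd></collection>")


class EventRecorder(xml.sax.ContentHandler):
    """Records each SAX callback in the order the parser fires it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called when the parser meets a start tag.
        self.events.append(("start", name))

    def endElement(self, name):
        # Called when the parser meets an end tag.
        self.events.append(("end", name))

    def characters(self, content):
        # Text may arrive in several chunks; whitespace-only chunks are skipped.
        if content.strip():
            self.events.append(("text", content))


def record_events(xml_bytes):
    """Parse xml_bytes and return the list of recorded events."""
    handler = EventRecorder()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events
```

Calling `record_events(CD_XML)` returns the events in document order, beginning with the start of `collection` and ending with its end tag. The parser itself keeps no record of them; only the handler's own list does.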
SAX parsing is fast and consumes little memory, making it ideal for processing XML on the server side, e.g. to translate XML into HTML for viewing in a traditional web browser. Figure 2 illustrates the events that occur when a SAX parser reads our CD collection document.
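As a rough illustration of the server-side use mentioned above, the following sketch turns a CD collection into an HTML list while streaming through the document. The element names `collection` and `title`, and the helper `titles_to_html`, are assumptions made for this example.

```python
import xml.sax
from html import escape


class CdListToHtml(xml.sax.ContentHandler):
    """Emits an HTML <ul> with one <li> per <title> element (names assumed)."""

    def __init__(self):
        super().__init__()
        self.parts = []        # HTML fragments, in output order
        self._in_title = False
        self._buf = []         # text chunks of the current <title>

    def startElement(self, name, attrs):
        if name == "collection":
            self.parts.append("<ul>")
        elif name == "title":
            self._in_title = True
            self._buf = []

    def characters(self, content):
        # Only text inside <title> is kept; everything else is ignored.
        if self._in_title:
            self._buf.append(content)

    def endElement(self, name):
        if name == "title":
            self.parts.append("<li>" + escape("".join(self._buf)) + "</li>")
            self._in_title = False
        elif name == "collection":
            self.parts.append("</ul>")


def titles_to_html(xml_bytes):
    """Return an HTML list of the CD titles found in xml_bytes."""
    handler = CdListToHtml()
    xml.sax.parseString(xml_bytes, handler)
    return "".join(handler.parts)
```

The handler stores only the text of the title currently being read, so its working state stays small however long the input document is.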
A SAX parser forgets each event as soon as it has been reported, but SAX can be used as the basis of a more complex API, such as DOM (see below). Three of the most popular SAX implementations are Xerces [Xerces] from the Apache Project, JAXP [JAXP] from Sun Microsystems, and MSXML [MSXML] from Microsoft. In fact, all three support both SAX and DOM.
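To show how a tree-building API can be layered on top of SAX events, here is a minimal sketch in Python. The `Node` class is a hypothetical, stripped-down structure invented for this example; it is not the W3C DOM interface, and real DOM builders are considerably more elaborate.

```python
import xml.sax


class Node:
    """Hypothetical minimal tree node (not the W3C DOM Node interface)."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.text = ""


class TreeBuilder(xml.sax.ContentHandler):
    """Builds a tree of Node objects from SAX events using a stack."""

    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []   # currently open elements, innermost last

    def startElement(self, name, attrs):
        node = Node(name)
        if self._stack:
            self._stack[-1].children.append(node)  # attach to open parent
        else:
            self.root = node                       # first element is the root
        self._stack.append(node)

    def characters(self, content):
        # Text is accumulated on the innermost open element.
        if self._stack:
            self._stack[-1].text += content

    def endElement(self, name):
        self._stack.pop()                          # element is now closed


def build_tree(xml_bytes):
    """Parse xml_bytes and return the root Node of the resulting tree."""
    builder = TreeBuilder()
    xml.sax.parseString(xml_bytes, builder)
    return builder.root
```

The stack is what supplies the memory that SAX itself lacks: it records which elements are still open, so each new element can be attached to its parent.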
The W3C provides a standard recommendation, called the XML Document Object Model (DOM), for building a tree structure in memory for XML documents. A DOM-based parser exposes (i.e. makes available) a library, called the DOM Application Programming Interface (API), that allows data in an XML document to be accessed and modified by manipulating the nodes in a DOM tree, such as our CD collection tree.
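A brief sketch of reading and modifying nodes through a DOM API follows, using `xml.dom.minidom` from Python's standard library. The function `retitle_first_cd`, the `year` element, and the document layout are assumptions made for this illustration only.

```python
from xml.dom import minidom


def retitle_first_cd(xml_text, new_title, year):
    """Change the first <title> and add a <year> sibling (names assumed)."""
    doc = minidom.parseString(xml_text)

    # Access: locate the first <title> element anywhere in the tree.
    title = doc.getElementsByTagName("title")[0]

    # Modify: the title's text lives in a child text node, not on the element.
    title.firstChild.data = new_title

    # Extend: create a new element node and attach it to the title's parent.
    year_node = doc.createElement("year")
    year_node.appendChild(doc.createTextNode(year))
    title.parentNode.appendChild(year_node)

    # Serialize the modified tree, starting from the root element.
    return doc.documentElement.toxml()
```

Unlike the SAX approach, the whole document is held in memory here, which is what makes it possible to revisit and change nodes after parsing has finished.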
Actually programming a DOM-based parser in a language such as Java or Perl is beyond the scope of this chapter. However, manipulating a DOM tree using XSL transformations is the subject of a later chapter. Further details concerning the DOM can be found on the W3C web site [W3C] or on the independent tutorial site [W3Schools].