More XML

This chapter is based on Chapters 1 and 2 of Eric Ray's book [Ray 2001], Chapters 4, 5, 20 and 21 of the Deitel et al. tome [Deitel 2001], Part 1 of Elliotte Rusty Harold's book [Harold 1999], plus additional material from the web.

Introduction

Figure 1 illustrates how a collection of CDs may be represented in XML.

<?xml version="1.0"?>
<!DOCTYPE cdcollection SYSTEM "cd-collection.dtd">

<cdcollection>
  <album id="540 590-2">
    <title>Sheryl Crow</title>
    <artist>Sheryl Crow</artist>
    <label>A &amp; M Records</label>
    <track time="4:56">maybe angels</track>
    <track time="3:50">a change</track>
    <track time="4:51">home</track>
    <track time="3:58">sweet rosalyn</track>
    <track time="5:23">if it makes you happy</track>
    <track time="4:27">redemption day</track>
    <track time="3:07">hard to make a stand</track>
    <track time="4:16">everyday is a winding road</track>
    <track time="4:43">love is a good thing</track>
    <track time="3:30">oh marie</track>
    <track time="4:58">superstar</track>
    <track time="4:34">the book</track>
    <track time="3:55">ordinary morning</track>
    <track time="3:20">free man</track>
  </album>
  <album id="332 80-2">
    <title>Slide on This</title>
    <artist>Ronnie Wood</artist>
    <label>KOCH International</label>
    <track>Somebody Else Might</track>
    <track>Testify</track>
    <track>Ain't Rock'n Roll</track>
    <track>Josephine</track>
    <track>Knock Yer Teeth Out</track>
    <track>Ragtime Annie (Lillie's Bordello)</track>
    <track>Must Be Love</track>
    <track>Fear For Your Future</track>
    <track>Show Me</track>
    <track>Always Wanted More</track>
    <track>Thinkin'</track>
    <track>Like It</track>
    <track>Breath On Me</track>
    <track>Somebody Else Might (Remix)</track>
  </album>
</cdcollection>


Figure 1 : CD Collection in XML

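The document root is <cdcollection>, which contains one or more <album> elements. Each <album> in turn contains a <title>, an <artist>, a <label> and a sequence of <track> elements. Figure 2 illustrates the collection rendered in HTML, with the individual tracks as items in an ordered list.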

Sheryl Crow

Sheryl Crow

  1. maybe angels (4:56)
  2. a change (3:50)
  3. home (4:51)
  4. sweet rosalyn (3:58)
  5. if it makes you happy (5:23)
  6. redemption day (4:27)
  7. hard to make a stand (3:07)
  8. everyday is a winding road (4:16)
  9. love is a good thing (4:43)
  10. oh marie (3:30)
  11. superstar (4:58)
  12. the book (4:34)
  13. ordinary morning (3:55)
  14. free man (3:20)

Slide on This

Ronnie Wood

  1. Somebody Else Might
  2. Testify
  3. Ain't Rock'n Roll
  4. Josephine
  5. Knock Yer Teeth Out
  6. Ragtime Annie (Lillie's Bordello)
  7. Must Be Love
  8. Fear For Your Future
  9. Show Me
  10. Always Wanted More
  11. Thinkin'
  12. Like It
  13. Breath On Me
  14. Somebody Else Might (Remix)

Figure 2 : CD Collection in HTML
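The chapter produces output like Figure 2 with XSL later on. Purely as an illustration of the same transformation, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The names CD_XML, album_to_html and collection_to_html are hypothetical, the sample is Figure 1 cut down to two tracks per album, and the <h2>/<p> markup chosen for the title and artist is an assumption, not necessarily how Figure 2 was produced. Because track times are optional attributes, the code reads them with get(), which returns None when the attribute is absent.

```python
import xml.etree.ElementTree as ET
from html import escape

# Abbreviated copy of Figure 1 (two tracks per album). The XML declaration and
# DOCTYPE are left out; ElementTree does not load external DTDs anyway.
CD_XML = """<cdcollection>
  <album id="540 590-2">
    <title>Sheryl Crow</title>
    <artist>Sheryl Crow</artist>
    <label>A &amp; M Records</label>
    <track time="4:56">maybe angels</track>
    <track time="3:50">a change</track>
  </album>
  <album id="332 80-2">
    <title>Slide on This</title>
    <artist>Ronnie Wood</artist>
    <label>KOCH International</label>
    <track>Somebody Else Might</track>
    <track>Testify</track>
  </album>
</cdcollection>"""


def album_to_html(album):
    """Render one <album> as a heading, the artist, and an ordered list of tracks."""
    parts = [f"<h2>{escape(album.findtext('title'))}</h2>",
             f"<p>{escape(album.findtext('artist'))}</p>",
             "<ol>"]
    for track in album.findall("track"):
        item = escape(track.text)
        time = track.get("time")  # None when the track has no time attribute
        if time is not None:
            item += f" ({escape(time)})"
        parts.append(f"  <li>{item}</li>")
    parts.append("</ol>")
    return "\n".join(parts)


def collection_to_html(xml_text):
    """Parse the whole collection and render every <album> in document order."""
    root = ET.fromstring(xml_text)
    return "\n".join(album_to_html(a) for a in root.findall("album"))


print(collection_to_html(CD_XML))
```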

Note that track times, if known, are provided as attributes of <track> instead of as elements in their own right. Figure 3 illustrates the tree-like structure that all XML documents possess.

Figure 3 : CD Collection Tree
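Figure 3 shows the tree as a picture. The same structure can be printed as text by a short recursive walk from the root. The sketch below is one way to do it in Python with xml.etree.ElementTree; it uses a one-album excerpt of Figure 1, and the helper outline() is a hypothetical name that lists each element with its attributes and any text it directly contains.

```python
import xml.etree.ElementTree as ET

# One-album excerpt of Figure 1, trimmed to two tracks to keep the output short.
SAMPLE = """<cdcollection>
  <album id="540 590-2">
    <title>Sheryl Crow</title>
    <artist>Sheryl Crow</artist>
    <label>A &amp; M Records</label>
    <track time="4:56">maybe angels</track>
    <track time="3:50">a change</track>
  </album>
</cdcollection>"""


def outline(element, depth=0):
    """Return the subtree rooted at `element` as indented lines, one per element."""
    label = element.tag
    if element.attrib:
        attrs = " ".join(f'{k}="{v}"' for k, v in element.attrib.items())
        label += f" [{attrs}]"
    text = (element.text or "").strip()  # ignore indentation-only text
    if text:
        label += f": {text}"
    lines = ["  " * depth + label]
    for child in element:  # children are visited in document order
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(SAMPLE))))
```

Note that the parser has already turned the entity reference &amp; into a plain &, so the label prints as "A & M Records".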

This structure enables us to process or parse the document by starting at the root and visiting the nodes of interest. For example, we could visit just the <artist> nodes and generate an alphabetic list of artists. This will be demonstrated when we look at the Extensible Stylesheet Language (XSL).
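Ahead of the XSL version, here is a minimal sketch of visiting only the <artist> nodes, assuming Python's xml.etree.ElementTree. Element.iter() walks the whole tree from the root and yields only the elements with the given tag. The trimmed sample and the name artists_alphabetically are illustrative.

```python
import xml.etree.ElementTree as ET

# Only the parts of Figure 1 that matter here: each album's <title> and <artist>.
COLLECTION = """<cdcollection>
  <album id="540 590-2"><title>Sheryl Crow</title><artist>Sheryl Crow</artist></album>
  <album id="332 80-2"><title>Slide on This</title><artist>Ronnie Wood</artist></album>
</cdcollection>"""


def artists_alphabetically(xml_text):
    """Visit every <artist> node below the root; return the unique names, sorted."""
    root = ET.fromstring(xml_text)
    names = {artist.text.strip() for artist in root.iter("artist")}
    return sorted(names, key=str.casefold)  # case-insensitive alphabetic order


print(artists_alphabetically(COLLECTION))  # ['Ronnie Wood', 'Sheryl Crow']
```

The names sort by first name because each <artist> holds a single string. Sorting by surname would require splitting the name into separate child elements, the same kind of structuring discussed for dates in the next section.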

Attributes versus Elements

There are no hard and fast rules about when to use child elements and when to use attributes. In general, the data itself should be stored in elements, with information about the data (metadata) stored in attributes. A useful test is to imagine the document with all the tags (and their attributes) removed: the basic information should still be present. Attributes are good places for storing IDs, URLs, references, and other information not directly relevant to the reader of the document. Keep the following points in mind.

The first two points are illustrated by the XML fragment given in Figure 4.

<article date="10/11/2000">
  Yet More XML
</article>


Figure 4 : Date Attribute
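The "remove all the tags" test from the guidelines above can be applied mechanically. This Python sketch (the helper name text_only is hypothetical) keeps only the character data of a fragment. For Figure 4 the date disappears along with the attribute, while the element-based version given as Figure 5 keeps it.

```python
import xml.etree.ElementTree as ET

FIGURE_4 = """<article date="10/11/2000">
  Yet More XML
</article>"""

FIGURE_5 = """<article>
  <date>
    <day>10</day>
    <month>11</month>
    <year>2000</year>
  </date>
  <title>
    Yet More XML
  </title>
</article>"""


def text_only(xml_text):
    """Discard all tags and attributes; keep character data with whitespace collapsed."""
    root = ET.fromstring(xml_text)
    return " ".join("".join(root.itertext()).split())


print(text_only(FIGURE_4))  # Yet More XML
print(text_only(FIGURE_5))  # 10 11 2000 Yet More XML
```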

The value of the date attribute has internal structure, signified by the / characters. That structure is difficult to extract, and the value is ambiguous: depending on local convention, 10/11/2000 means either 10 November 2000 or October 11, 2000, so parsers and people entering data may interpret the date differently from you. Figure 5 illustrates the alternative approach.

<article>
  <date>
    <day>10</day>
    <month>11</month>
    <year>2000</year>
  </date>
  <title>
    Yet More XML
  </title>
</article>


Figure 5 : Date Element

Now the XML is unambiguous. With CSS or XSL it is easy to format the date, or even to omit parts of it. The structure also allows more than one <date> to be associated with an article, for example one for each revision. Note that once the <article> element has child elements, it is preferable to wrap its text in an explicitly named element, such as <title> in Figure 5, rather than leave its meaning implicit, as in Figure 4.
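To show the difference in practice, here is a Python sketch (the helper name article_dates is hypothetical) that turns each <date> child of Figure 5 into a datetime.date. It returns a list because an article may carry several <date> elements, and since every part is named, nothing has to be guessed. The final lines parse the attribute value from Figure 4 with two strptime formats, one per convention, and obtain two different dates.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

FIGURE_5 = """<article>
  <date>
    <day>10</day>
    <month>11</month>
    <year>2000</year>
  </date>
  <title>
    Yet More XML
  </title>
</article>"""


def article_dates(xml_text):
    """Build a datetime.date from every <date> child of the root element."""
    root = ET.fromstring(xml_text)
    return [date(int(d.findtext("year")), int(d.findtext("month")), int(d.findtext("day")))
            for d in root.findall("date")]


print(article_dates(FIGURE_5))  # [datetime.date(2000, 11, 10)]

# By contrast, the attribute value from Figure 4 yields two different dates
# depending on which convention the reader assumes.
raw = "10/11/2000"
print(datetime.strptime(raw, "%d/%m/%Y").date())  # 2000-11-10 (day first)
print(datetime.strptime(raw, "%m/%d/%Y").date())  # 2000-10-11 (month first)
```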

References

  1. [Deitel 2001] Deitel et al., 2001.
  2. [Harold 1999] Elliotte Rusty Harold, 1999.
  3. [Ray 2001] Eric Ray, 2001.