Extensible Stylesheet Language (XSL) and XSL Transformations (XSLT)

This chapter is based on Chapter 6 of Erik Ray's book [Ray 2001], Chapter 12 of the Deitel et al. tome [Deitel 2001], Chapter 12 of Elliotte Rusty Harold's book [Harold 1999], and additional material from the web.

There is a good tutorial available in the Guide to the XML Galaxy from Zvon.

Introduction

So far, our XML documents, and their associated DTDs, have been static objects. We now see how XSL is used to transform these documents into other useful formats. To do this we need the original XML document, a stylesheet written in the XSLT language, and some suitable software (freely available on the web :-).

Here are some reasons for transforming an XML document into another form.

Store in one format, display in another
Convert the document into HTML (for the web) or PDF (for display and printing). This can be combined with CSS for fine-grained control, and the output can be XHTML.
Convert to a more useful format
A variation on the above.
Compact (compress) the document
Remove the elements and attributes that are not required for a particular application.
Use the document as a front-end to database queries
A CGI script generates the XPointer references, and a suitable stylesheet retrieves the relevant data and builds the result tree, which could consist of a number of XHTML files.

Obviously these tasks could be achieved by writing suitable programs in languages such as Perl and Visual Basic. XSLT is designed to perform these transformations (and nothing else) efficiently. It is also easy to learn and read.

Transformations

XSL Transformations (XSLT) is one part of XSL. It is a language for specifying how to match nodes in a source document and what output to produce for each match. An XML document is represented as a tree, with each part of its structure represented as a node in the tree. There are seven different types of node.

Element
An element node can contain other elements, plus any other node type except the root node.
Attribute
A name/value pair attached to an element. It is called a leaf node because it has no children.
Text
Another leaf node. It is a child of an element, but not necessarily the only child.
Comment
A node even though it technically does not contribute to the content of the document.
Processing instruction
Like a comment, included for completeness, even though it has meaning only for a particular XML processor.
Namespace
A namespace declaration is not treated as an ordinary attribute because it affects how element and attribute names are interpreted throughout its scope.
Root node
An abstract point above the document element containing everything in the document.

Figure 1 illustrates an XML document containing all these types of nodes.

<?xml version="1.0"?>
<!-- Mick's favourite -->
<sandwich xmlns="http://www.dcs.bbk.ac.uk/~mick/ns">
  <ingredient type="Bovril">
    Savoury spread
    <?knife spread thickly?>
  </ingredient>
  <ingredient>
    Bread
    <!-- Brown wholemeal preferably -->
  </ingredient>
</sandwich>


Figure 1 : Mick's Sandwich

Figure 2 is the arboreal view of Mick's Sandwich. The root node contains a comment and the document root element, <sandwich>, which in turn contains a namespace declaration and two child elements of type <ingredient>.

[Image not reproduced: Mick's Tree]
Figure 2 : Mick's XSL Tree

The first <ingredient> contains an attribute (type="Bovril"), some text (Savoury spread), and a processing instruction (target knife, content "spread thickly"). The second <ingredient> contains some text (Bread) and a comment.
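
One way to see these node types for yourself is to run a small reporting stylesheet over Figure 1. The following is only a sketch: the file name is made up, and it assumes a processor that supports the XPath namespace axis (some do not). Expect it to list the implicit xml namespace for every element as well as the declared one.

```xml
<?xml version="1.0"?>
<!-- nodetypes.xsl (hypothetical name): print one line per node in the source tree -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <!-- Drop whitespace-only text nodes so that only real text is reported -->
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <xsl:text>root&#10;</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- match="*" matches an element whatever namespace it is in -->
  <xsl:template match="*">
    <xsl:text>element: </xsl:text>
    <xsl:value-of select="local-name()"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Namespace nodes cannot be matched by a pattern, so loop over them here -->
    <xsl:for-each select="namespace::*">
      <xsl:text>namespace: </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
    <!-- Attributes are not children, so they must be selected explicitly -->
    <xsl:apply-templates select="@*|node()"/>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:text>attribute: </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>=</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="text()">
    <xsl:text>text: </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <xsl:template match="comment()">
    <xsl:text>comment: </xsl:text>
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- name() of a processing instruction is its target, e.g. knife -->
  <xsl:template match="processing-instruction()">
    <xsl:text>processing instruction: </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```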

XSLT relies on the principle that we can break down the transformation into smaller, more manageable chunks. Each transformation rule focuses on one level, without dealing with the rest of the tree. It contains references to other rules that carry out the processing all the way down to the leaves. Continuing with the sandwich example:

  1. Process the root node. As this is the outermost node, set up any outer containers, e.g. <HTML>, <HEAD>, and <BODY> elements. Process the branches.
  2. Process the <sandwich> element. Generate a suitable heading.
  3. Process each <ingredient> element as a bulleted list item containing the text node.
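
The three steps above can be sketched as a stylesheet along the following lines. This is an illustration rather than part of the original notes: the file name, the prefix m, and the heading text are invented. The prefix is needed because the elements of Figure 1 live in a default namespace, so an unprefixed pattern such as match="sandwich" would not match them.

```xml
<?xml version="1.0"?>
<!-- sandwich2html.xsl (hypothetical name): one template per step -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:m="http://www.dcs.bbk.ac.uk/~mick/ns"
                exclude-result-prefixes="m">

  <!-- Step 1: the root node sets up the outer containers -->
  <xsl:template match="/">
    <HTML>
      <HEAD><TITLE>Sandwich</TITLE></HEAD>
      <BODY>
        <xsl:apply-templates/>
      </BODY>
    </HTML>
  </xsl:template>

  <!-- Step 2: the sandwich element generates a heading and opens the list -->
  <xsl:template match="m:sandwich">
    <H1>Sandwich ingredients</H1>
    <UL>
      <xsl:apply-templates select="m:ingredient"/>
    </UL>
  </xsl:template>

  <!-- Step 3: each ingredient becomes a list item holding its first text node -->
  <xsl:template match="m:ingredient">
    <LI><xsl:value-of select="normalize-space(text()[1])"/></LI>
  </xsl:template>

</xsl:stylesheet>
```

Each template deals with one level only and hands the rest of the work down with <xsl:apply-templates>. The comment and the processing instruction are dropped silently because nothing selects them.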

This is where the sandwich example runs out of steam! Instead, consider the on-line calendar seen earlier. The notion of containment is an important benefit of XML markup. A rule matching a container element (e.g. <date>) can be used to set up the surrounding structure for the children (e.g. <event>) to follow.

Structure and Templates

XSLT is designed to make it easy to work on the document tree by defining template rules interspersed with output. The processor can tell the two apart because the XSLT elements are in the XSLT namespace, which is conventionally bound to the prefix xsl. The special elements <xsl:apply-templates> and <xsl:value-of> navigate around the tree and insert node values respectively. Figure 3 is the complete stylesheet for generating the HTML version of the on-line calendar. Note that, in general, each element of the DTD has a corresponding template indicating what transformations (e.g. markup) are to be performed.

<?xml version="1.0"?>
<!-- cal2html.xsl -->

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="calendar">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="year">
    <H1>Calendar for <xsl:value-of select="@value"/></H1>
    <UL>
    <xsl:apply-templates/>
    </UL>
  </xsl:template>

  <xsl:template match="date">
    <LI>
      <xsl:value-of select="@day"/>/<xsl:value-of select="@month"/>
      <OL>
      <xsl:apply-templates/>
      </OL>
    </LI>
  </xsl:template>

  <xsl:template match="event">
    <LI value="{@time}">
      <xsl:apply-templates/>
    </LI>
  </xsl:template>

</xsl:stylesheet>


Figure 3 : On-line Calendar Stylesheet

The stylesheet has to be a well-formed XML document. The root element is <xsl:stylesheet> (or its synonym <xsl:transform>) and it contains a number of <xsl:template match="..."> elements. These specify how to process particular elements. By default, elements are processed in document order, with leaf elements represented by their text nodes. Individual attribute values are obtained using the <xsl:value-of select="@..."/> empty element. The construct {@...}, known as an attribute value template, is a convenient way of including attribute values inside the attributes of output elements.
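
To see what the {@...} shorthand saves, here is a sketch of the <event> template from Figure 3 written the long way, with <xsl:attribute> building the same value attribute. It is wrapped in a stylesheet of its own only so that it can be read in isolation.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Long form of <LI value="{@time}"> from Figure 3 -->
  <xsl:template match="event">
    <LI>
      <!-- xsl:attribute must come before any child content of the LI -->
      <xsl:attribute name="value">
        <xsl:value-of select="@time"/>
      </xsl:attribute>
      <xsl:apply-templates/>
    </LI>
  </xsl:template>

</xsl:stylesheet>
```

The long form is worth knowing for the cases where the attribute name itself has to be computed, or where the attribute is added conditionally inside an <xsl:if>.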

Returning to Figure 3: the stylesheet matches the root element <calendar> and applies templates to its children; this is the same as the default behaviour. Any <year> element generates an HTML <H1> heading and applies templates to its children, which become items in an unordered list. Each <date> element outputs the values of the day and month attributes and applies templates to its children, which become items in an ordered list.

Finally, each <event> element uses the value of its time attribute as its value in the ordered list and, by default, its text is used as the content of the list item.
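
It is easier to follow Figure 3 with an input document in view. The fragment below is only a guess at the shape of the calendar, inferred from the element and attribute names used in the templates; the year, dates, times, and event text are all invented.

```xml
<?xml version="1.0"?>
<!-- Hypothetical input: structure inferred from Figure 3, data made up -->
<calendar>
  <year value="2002">
    <date day="14" month="3">
      <!-- time becomes the value attribute of the LI, so it is kept numeric here -->
      <event time="1000">XML lecture</event>
      <event time="1400">Project meeting</event>
    </date>
    <date day="21" month="3">
      <event time="1100">Coursework deadline</event>
    </date>
  </year>
</calendar>
```

With this input, the <year> template would produce one heading and one unordered list, each <date> would become an item labelled 14/3 or 21/3, and each <event> would become a numbered item inside that date's ordered list.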

There are a number of other XSLT constructs that provide better control over the way the document tree is processed. Figure 4 is a listing of the stylesheet used to transform our CD collection into HTML.

<?xml version="1.0"?>
 
<xsl:stylesheet version="1.0"
		xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  
  <xsl:template match="cdcollection">
    <xsl:apply-templates select="album"/>
  </xsl:template>
  
  <xsl:template match="album">
    <H1>
    <xsl:apply-templates select="title"/>
    </H1>
    <H2>
    <xsl:apply-templates select="artist"/>
    </H2>
    <OL>
    <xsl:for-each select="track">
      <LI>
      <xsl:value-of select="text()"/>
      <xsl:if test="@time">
        (<xsl:value-of select="@time"/>)
      </xsl:if>
      </LI>
    </xsl:for-each>
    </OL>
  </xsl:template>

  <xsl:template match="artist">
    <xsl:value-of select="text()"/>
    <xsl:if test="position()!=last()">, </xsl:if>
  </xsl:template>

</xsl:stylesheet>


Figure 4 : CD Collection Stylesheet

This figure has introduced the following new constructs (XSLT elements).

<xsl:if test="...">
If the specified test evaluates to true, then the element's contents are processed.
<xsl:for-each select="...">
Iterate through the selected node set.
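
An <xsl:for-each> loop can usually be replaced by <xsl:apply-templates> plus a separate template, which is often easier to reuse. As a sketch, the track list from Figure 4 might be rewritten as follows (headings omitted for brevity).

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="album">
    <OL>
      <!-- Instead of looping here, hand each track to its own template -->
      <xsl:apply-templates select="track"/>
    </OL>
  </xsl:template>

  <!-- The former body of the xsl:for-each loop -->
  <xsl:template match="track">
    <LI>
      <xsl:value-of select="text()"/>
      <!-- Only tracks that carry a time attribute show a duration -->
      <xsl:if test="@time">
        (<xsl:value-of select="@time"/>)
      </xsl:if>
    </LI>
  </xsl:template>

</xsl:stylesheet>
```

Which style to prefer is largely a matter of taste: the loop keeps everything in one place, while the template form lets other rules process <track> elements in the same way.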

Default Rules

XSLT defines a set of default rules that makes it easier to write stylesheets. If no rule from the stylesheet matches, these default rules are invoked. These are the default rules for each type of node.

Root node
Processing starts at the root node. The default rule processes the entire tree by applying templates to all the children.
<xsl:template match="/">
  <xsl:apply-templates/>
</xsl:template>
Element
The processor must touch every element in the tree so that it doesn't miss any branches for which rules are defined.
<xsl:template match="*">
  <xsl:apply-templates/>
</xsl:template>
Attribute
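The value of an attribute is copied to the result tree whenever templates are applied to it. Note that <xsl:apply-templates/> without a select attribute processes child nodes only, so attributes are reached only when they are selected explicitly (e.g. select="@*").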
<xsl:template match="@*">
  <xsl:value-of select="."/>
</xsl:template>
Text
It's inconvenient to include the <xsl:value-of> element in every template to output text; this is done by default.
<xsl:template match="text()">
  <xsl:value-of select="."/>
</xsl:template>
Comment
These are omitted by default using an empty element.
<xsl:template match="comment()"/>
Processing instruction
These are omitted by default using an empty element.
<xsl:template match="processing-instruction()"/>
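
Any of these default rules can be overridden by a rule of your own. As a sketch, the following stylesheet replaces the default text rule with an empty one, so that nothing is output unless a template asks for it. It assumes a CD collection in which only albums have <title> elements.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Override the default text rule: text nodes now produce no output -->
  <xsl:template match="text()"/>

  <!-- The default element rule still walks the tree and finds every title -->
  <xsl:template match="title">
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

The default root and element rules do the navigation, so the stylesheet only has to say what is different about <title>.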

The minimal stylesheet that matches any well-formed XML document is illustrated in Figure 5.

<?xml version="1.0"?>
 
<xsl:stylesheet version="1.0"
		xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

</xsl:stylesheet>


Figure 5 : Default Stylesheet

Applying this bland stylesheet to our CD collection produces the predictable result illustrated in Figure 6.


    Sheryl Crow
    Sheryl Crow
    A & M Records
    maybe angels
    a change
    home
    sweet rosalyn
    if it makes you happy
    redemption day
    hard to make a stand
    everyday is a winding road
    love is a good thing
    oh marie
    superstar
    the book
    ordinary morning
    free man
    Slide on This
    Ronnie Wood
    KOCH International
    Somebody Else Might
    Testify
    Ain't Rock'n Roll
    Josephine
    Knock Yer Teeth Out
    Ragtime Annie (Lillie's Bordello)
    Must Be Love
    Fear For Your Future
    Show Me
    Always Wanted More
    Thinkin'
    Like It
    Breath On Me
    Somebody Else Might (Remix)


Figure 6 : Default CD Collection

Note that whitespace has been preserved as text nodes which are output. The HTML output given earlier ignores this whitespace.
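
If the stray whitespace is unwanted, the processor can be asked to discard it. The following is a minimal sketch, assuming that each text node of interest is the sole content of its element (as the titles, artists, and tracks appear to be here).

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Remove whitespace-only text nodes from the source tree before processing -->
  <xsl:strip-space elements="*"/>

  <!-- Replace the default text rule: trim each text node and end it with a newline -->
  <xsl:template match="text()">
    <xsl:value-of select="normalize-space(.)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the CD collection, this should give one title, artist, or track name per line without the indentation seen in Figure 6.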

XSLT Element Roundup

The following is a non-exhaustive list of XSLT elements in alphabetical order.

<xsl:apply-templates select="...">
Apply the matching templates to the selected nodes.
<xsl:attribute name="...">
Generate an attribute name/value pair.
<xsl:attribute-set name="...">
Create a set of attribute name/value pairs which can be accessed by their common name.
<xsl:call-template name="...">
Call a named template explicitly.
<xsl:choose>
Offer a choice between several options with <xsl:when> elements evaluated sequentially.
<xsl:comment>
Create an output comment.
<xsl:element name="...">
Create an element.
<xsl:for-each select="...">
Iterate through the selected node set.
<xsl:if test="...">
Content is output if the test evaluates to true.
<xsl:number value="...">
Output numbers in a variety of formats.
<xsl:otherwise>
Optional last element inside an <xsl:choose> element.
<xsl:output method="..."/>
Provide a classification for output (XML, HTML, or plain text).
<xsl:param name="...">
Declare a named parameter, optionally with a default value, for use in a template or in the stylesheet as a whole.
<xsl:sort select="..."/>
Sort the selected nodes by a key expression; used inside <xsl:apply-templates> or <xsl:for-each>.
<xsl:template match="...">
Process the template if a node matches.
<xsl:template name="...">
XSLT version of a subroutine.
<xsl:text>
Output literal text, giving exact control over whitespace.
<xsl:transform>
An alternative name for <xsl:stylesheet>.
<xsl:value-of select="...">
Evaluate an expression and output its string value.
<xsl:variable name="...">
Define a constant value for use elsewhere.
<xsl:when test="...">
A child element of <xsl:choose>.
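
Several of these elements are easiest to understand when seen working together. The following sketch lists the albums of the CD collection sorted by title, with a track count produced by a named template. The template name describe-length and the parameter name n are invented for the example, and <xsl:with-param>, which is not in the list above, is the element that passes a value to a parameter.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <xsl:template match="cdcollection">
    <UL>
      <!-- xsl:sort reorders the selected albums by the string value of title -->
      <xsl:apply-templates select="album">
        <xsl:sort select="title"/>
      </xsl:apply-templates>
    </UL>
  </xsl:template>

  <xsl:template match="album">
    <!-- xsl:variable binds a name to a value that cannot be changed later -->
    <xsl:variable name="tracks" select="count(track)"/>
    <LI>
      <xsl:value-of select="title"/>
      <xsl:text>: </xsl:text>
      <!-- Call the named template, passing the count as parameter n -->
      <xsl:call-template name="describe-length">
        <xsl:with-param name="n" select="$tracks"/>
      </xsl:call-template>
    </LI>
  </xsl:template>

  <!-- Named template (hypothetical): the XSLT version of a subroutine -->
  <xsl:template name="describe-length">
    <!-- Default value 0 is used if the caller passes nothing -->
    <xsl:param name="n" select="0"/>
    <!-- The first xsl:when whose test is true wins; xsl:otherwise is the fallback -->
    <xsl:choose>
      <xsl:when test="$n = 0">no tracks listed</xsl:when>
      <xsl:when test="$n = 1">1 track</xsl:when>
      <xsl:otherwise><xsl:value-of select="$n"/> tracks</xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```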

There are a number of additional elements providing advanced techniques; these are beyond the scope of this chapter. Consult Erik Ray's book [Ray 2001] for further details and examples.

References

  1. Zvon.org, XSLT Tutorial.