Extensible Stylesheet Language

Peter Wood

Extensible Stylesheet Language (XSL)

XSL example

XSL rules

XSL Formatting

Ways of Presenting XML

Different ways of presenting XML

XSLT Program Skeleton

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="...">
      ...
  </xsl:template>

  <xsl:template match="...">
      ...
  </xsl:template>

  ...

</xsl:stylesheet>

Data Model - example document

<CD publisher="Deutsche Grammophon"
    length="PT1H13M37S" >
  <composer>Johannes Brahms</composer>
  <performance>
    <composition>Piano Concerto No. 2</composition>
    <soloist>Emil Gilels</soloist>
    <orchestra>Berlin Philharmonic</orchestra>
    <conductor>Eugen Jochum</conductor>
  </performance>
  <performance>
    <composition>Fantasias Op. 116</composition>
    <soloist>Emil Gilels</soloist>
  </performance>
</CD>
  • note that the CD element now has two attributes

Data Model - example tree

  • document is viewed as a tree (hierarchy) of nodes

Tree representation of CD example

Data Model Description

  • 7 types of node:
    • root, element, attribute, namespace, text, comment, processing instruction
  • root of tree is different from (and parent of) root element of the document (CD in example)
  • in the tree diagram of the CD example
    • the special root node is red
    • element nodes are yellow
    • attribute nodes are pink
    • text nodes are green
  • each element node has an associated set of attribute nodes
  • attribute nodes are not children of element nodes (although an element is the parent of each of its attributes)
  • order of child element nodes is significant
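To see these points in running code, here is a minimal sketch that parses the CD example with the JDK's DOM parser and prints its tree. It needs Java 15 or later for text blocks. It assumes that DOM's Document node can stand in for the XPath root node. The two data models are close but not identical; for example, DOM keeps whitespace-only text nodes, which the sketch skips when printing. The class and method names are hypothetical. Attributes are printed beside their element because, as in XPath, they are not among its child nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CdTree {
    // The CD example from the slides.
    static final String CD = """
        <CD publisher="Deutsche Grammophon" length="PT1H13M37S">
          <composer>Johannes Brahms</composer>
          <performance>
            <composition>Piano Concerto No. 2</composition>
            <soloist>Emil Gilels</soloist>
            <orchestra>Berlin Philharmonic</orchestra>
            <conductor>Eugen Jochum</conductor>
          </performance>
          <performance>
            <composition>Fantasias Op. 116</composition>
            <soloist>Emil Gilels</soloist>
          </performance>
        </CD>
        """;

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Appends one line per node, indented by depth. Attributes are shown
    // beside their element, because getChildNodes() never returns them.
    static void dump(Node node, String indent, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE -> out.append(indent).append("(root)\n");
            case Node.ELEMENT_NODE -> {
                out.append(indent).append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    out.append(" @").append(a.getNodeName())
                       .append('=').append(a.getNodeValue());
                }
                out.append('\n');
            }
            case Node.TEXT_NODE -> {
                String text = node.getNodeValue().strip();
                if (!text.isEmpty()) { // skip indentation-only whitespace
                    out.append(indent).append('"').append(text).append("\"\n");
                }
            }
            default -> { }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            dump(children.item(i), indent + "  ", out);
        }
    }

    static String tree() {
        StringBuilder out = new StringBuilder();
        dump(parse(CD), "", out);
        return out.toString();
    }

    static int attributesOfCd() {
        return parse(CD).getDocumentElement().getAttributes().getLength();
    }

    // Counts attribute nodes among the children of CD (expected: none).
    static int attributeChildrenOfCd() {
        NodeList children = parse(CD).getDocumentElement().getChildNodes();
        int n = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ATTRIBUTE_NODE) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.print(tree());
    }
}
```

The first line printed is the root node. The CD element, the root element of the document, is its only child.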

XSLT Processing Model

  • processor (e.g., browser) reads an input XML document and an XSLT stylesheet
  • input XML document viewed as a tree (the source tree)
  • processing starts at root node of source tree
  • a single node is processed by
    • finding the template rule with the best matching pattern
    • once found, executing the template instructions (often selecting more nodes to process) and creating a fragment of the output document (result tree)
    • if not found, proceeding with the list of child nodes
  • a node list is processed by processing each node in order
  • process continues recursively until no new source nodes are selected

RSS Example

<rss>
  <channel>
    <title> ... </title>
      ...
    <item>
      <title> ... </title>
      <description> ... </description>
      <link> ... </link>
      <pubDate> ... </pubDate>
    </item>
      ...
    <item>
      <title> ... </title>
      <description> ... </description>
      <link> ... </link>
      <pubDate> ... </pubDate>
    </item>
  </channel>
</rss>

Example of an XSLT Rule

  <xsl:template match="channel">
    <html>
      <xsl:apply-templates select="item"/>
    </html>
  </xsl:template>
  • template matches channel elements
  • the value of the match attribute is an XPath expression in general (see later)
  • the matched element is called the context node
  • <html> and </html> form a literal result element: they construct an html element in the output
  • <xsl:apply-templates select="item"/> is an instruction to apply templates to all item children of the context node
  • the select attribute value is also an XPath expression
  • patterns allowed in match are a subset of expressions allowed in select

Another Example of an XSLT Rule

  <xsl:template match="item">
    <p>
      <xsl:value-of select="title"/>
    </p>
  </xsl:template>
  • template matches item elements
  • <p> and </p> are literals constructing a result element named p
  • xsl:value-of is an instruction to output the string value of what is selected by its select attribute (for a set of nodes, the value of the first node in document order)

Example: RSS headlines

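  • an XSLT processor takes as input the source document (rss-fragment.xml) and a stylesheet containing the two rules above (rss-headlines.xsl)
  • applying the stylesheet to the source gives
    • the HTML output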
        <html>
          <p>Policewoman shot during burglary</p>
          <p>Lebanon marks Hariri anniversary</p>
          <p>MPs to vote on full smoking ban</p>
        </html>
      
      (see rss-fragment-headlines.html)
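The headline transformation can also be reproduced with the XSLT 1.0 processor built into the JDK, through the standard javax.xml.transform API, instead of XT or Saxon. This is a minimal sketch and assumes Java 15+ for text blocks. The RSS text is a hypothetical cut-down stand-in for rss-fragment.xml: only the three headlines come from the output above. The class name is also hypothetical. The exact whitespace in the output may differ between processors.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class RssHeadlines {
    // Hypothetical stand-in for rss-fragment.xml: the item titles are the
    // headlines shown above; the channel title is made up.
    static final String RSS = """
        <rss>
          <channel>
            <title>Example news channel</title>
            <item><title>Policewoman shot during burglary</title></item>
            <item><title>Lebanon marks Hariri anniversary</title></item>
            <item><title>MPs to vote on full smoking ban</title></item>
          </channel>
        </rss>
        """;

    // The two template rules from the slides inside the stylesheet skeleton.
    static final String XSL = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="channel">
            <html>
              <xsl:apply-templates select="item"/>
            </html>
          </xsl:template>
          <xsl:template match="item">
            <p>
              <xsl:value-of select="title"/>
            </p>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Applies a stylesheet (given as text) to a source document (given as text).
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, RSS));
    }
}
```

The channel title is not output, because the channel rule only applies templates to the item children.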

Saxon and XT

  • Saxon and XT are XSLT processors written in Java
  • to run xt in the labs, you can use the batch file xt.bat in n:\xmltools
  • e.g., running the following from the command line
    n:\xmltools\xt rss-fragment.xml rss-headlines.xsl rss-fragment-headlines.html
    
  • takes rss-fragment.xml and rss-headlines.xsl as input
  • produces rss-fragment-headlines.html as output
  • to use Saxon in the labs, you can use the batch file saxon.bat in n:\SaxonHE
    n:\SaxonHE\saxon
       rss-fragment.xml rss-headlines.xsl rss-fragment-headlines.html
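A similar command-line run can be sketched in Java with the JDK's built-in JAXP transformer, which implements XSLT 1.0. This is not how xt.bat or saxon.bat work internally. RunXslt is a hypothetical class name, and the program simply takes the same three arguments: source, stylesheet and output.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical command-line wrapper, used like the xt batch file:
//   java RunXslt rss-fragment.xml rss-headlines.xsl rss-fragment-headlines.html
final class RunXslt {
    static void run(String source, String stylesheet, String output) {
        // The output stream is opened here and closed automatically afterwards.
        try (OutputStream out = new FileOutputStream(output)) {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new File(stylesheet)));
            t.transform(new StreamSource(new File(source)), new StreamResult(out));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: java RunXslt source.xml stylesheet.xsl output.html");
            System.exit(1);
        }
        run(args[0], args[1], args[2]);
    }
}
```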
    

Using a stylesheet processing instruction

  • web browsers have XSLT processors built into them
  • can be invoked by including a stylesheet processing instruction in the XML source file
  • processing instruction comes after the XML declaration and before the root element
  • an example might be:
    <?xml-stylesheet href="rss-headlines.xsl" type="text/xsl" ?>
    
    where the value of href is a URI and the value of type is a MIME type
  • using the above stylesheet in our RSS fragment yields rss-fragment-headlines.xml (view the source to see the stylesheet processing instruction)

Example: RSS descriptions

applying the stylesheet rss.xsl comprising

  <xsl:template match="channel">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <table border="1">
          <xsl:apply-templates select="item"/>
        </table>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="item">
    <tr>
      <td><xsl:value-of select="title"/></td>
      <td><xsl:value-of select="description"/></td>
    </tr>
  </xsl:template>

to rss-fragment.xml yields rss-fragment.html; the same result appears when rss-fragment-xsl.xml is viewed in a browser, since that file contains the corresponding stylesheet processing instruction

Example: RSS headlines (again)

  • can use one rule instead of two:
      <xsl:template match="channel">
        <html>
          <xsl:for-each select="item">
            <p>
              <xsl:value-of select="title"/>
            </p>
          </xsl:for-each>
        </html>
      </xsl:template>
    
  • xsl:for-each selects all item children of channel
  • instructions given as contents of xsl:for-each element are applied to each item in turn
  • note that title selects the title child elements of the current item, since inside xsl:for-each each item in turn is the context node
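A quick way to check that the single for-each rule behaves like the two-rule version is to run both on the same input and compare the results. This minimal sketch uses the JDK's built-in XSLT 1.0 processor and assumes Java 15+ for text blocks. The two-item RSS source and the class name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ForEachVsApply {
    // Hypothetical RSS source with two items.
    static final String RSS = """
        <rss>
          <channel>
            <title>Example channel</title>
            <item><title>First story</title></item>
            <item><title>Second story</title></item>
          </channel>
        </rss>
        """;

    // Two rules: channel applies templates to its items; item outputs a p.
    static final String TWO_RULES = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="channel">
            <html><xsl:apply-templates select="item"/></html>
          </xsl:template>
          <xsl:template match="item">
            <p><xsl:value-of select="title"/></p>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // One rule: xsl:for-each processes the items inline.
    static final String ONE_RULE = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="channel">
            <html>
              <xsl:for-each select="item">
                <p><xsl:value-of select="title"/></p>
              </xsl:for-each>
            </html>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(TWO_RULES, RSS));
        System.out.println(transform(ONE_RULE, RSS));
    }
}
```

The for-each version keeps the per-item instructions next to the code that selects the items. The two-rule version lets the item rule be reused wherever items are selected.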

Some XPath expressions

  • XPath is a general language for selecting nodes from an XML document tree, used in
    • match attribute of xsl:template element
    • select attribute of xsl:apply-templates, xsl:value-of and xsl:for-each elements
  • we've seen the simplest kinds of expressions: simple element names like channel
  • can build up paths of names:
    channel/title
    
    selects all title children of channel children of the current context node
  • can select the parent of the context node: ..
  • can select the context node itself: .
  • can select the special extra root node of the tree: /
  • can select descendants of the root node:
    //title
    
    selects all title elements that are children of the root node or of any of its descendants, i.e., every title element in the document
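These expressions can be tried outside XSLT with the JDK's javax.xml.xpath API, which implements XPath 1.0. In this sketch the context node is chosen with an absolute path. It stands in for the node that XSLT would supply. The sample feed and the helper names (names, texts) are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathBasics {
    // Hypothetical feed: one channel title and two item titles.
    static final String RSS = """
        <rss>
          <channel>
            <title>Example feed</title>
            <item><title>First story</title></item>
            <item><title>Second story</title></item>
          </channel>
        </rss>
        """;

    static final XPath XP = XPathFactory.newInstance().newXPath();
    static final Document DOC = parse(RSS);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Evaluates expr relative to the node selected by contextPath.
    static NodeList select(String contextPath, String expr) {
        try {
            Node context = (Node) XP.evaluate(contextPath, DOC, XPathConstants.NODE);
            return (NodeList) XP.evaluate(expr, context, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Names of the selected nodes, space-separated ("#document" is the root).
    static String names(String contextPath, String expr) {
        NodeList nodes = select(contextPath, expr);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(nodes.item(i).getNodeName());
        }
        return sb.toString();
    }

    // Text content of the selected nodes, separated by "|".
    static String texts(String contextPath, String expr) {
        NodeList nodes = select(contextPath, expr);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (i > 0) sb.append('|');
            sb.append(nodes.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(texts("/rss", "channel/title"));
        System.out.println(names("/rss/channel", ".."));
        System.out.println(names("/rss/channel/item[1]", "//title"));
    }
}
```

Note that //title gives the same three nodes whatever the context, because it starts from the root.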

XPath expressions

  • an XPath expression is either
    • an absolute expression or
    • a relative expression
  • an absolute expression
    • starts with /
    • is followed by a relative expression
    • and is evaluated starting at the root node
  • a relative expression is
    • a sequence of location steps
    • each separated by /
  • example (absolute expression comprising 2 steps):
    /item/title

Relative expressions

  • relative expression is evaluated with respect to an initial context (set of nodes)
  • initial context is defined externally (e.g. by XSLT)
    <xsl:template match="item">
      <xsl:value-of select="title"/>
    </xsl:template>
    
    context for title given by item
  • each location step
    • is evaluated with respect to some context
    • produces a set of nodes which
    • provides the context for the next location step

Simple subset of XPath

  • subset uses abbreviated syntax
  • a location step has one of 3 forms:
    • it is empty, i.e., the step between the two slashes in //
    • element-name predicates
    • @attribute-name predicates
  • an empty step means search all descendants of each node in the context
  • element-name means find all child elements of each node in the context which have the given name
  • @attribute-name means find the attribute node of each node in the context which has the given name
  • optional predicates (each enclosed in [ and ]) filter out nodes

Example: using XPath (1)

  • output all title elements from RSS feed
  • first rule is
    <xsl:template match="/">
      <html>
        <body>
            <xsl:apply-templates select="//title"/>
        </body>
      </html>
    </xsl:template>
    
  • rule matches only the root node of the document (match="/")
  • the select attribute causes templates to be applied only to title descendants of the root node

Example: using XPath (2)

  • other rules are
      <xsl:template match="channel/title">
          <h1><xsl:value-of select="."/></h1>
      </xsl:template>
    
      <xsl:template match="image/title"/>
    
      <xsl:template match="item/title">
        <p>
          <b><xsl:value-of select="."/></b><br />
          <xsl:value-of select="../description"/>
        </p>
      </xsl:template>
    
  • the first rule matches title elements that are children of channel elements
  • the matched element (title) is selected using .
  • the second rule matches title elements that are children of image elements and does nothing (we will see why later)
  • the third rule matches title elements that are children of item elements
  • the description element, which is a sibling of the matched title, is selected using ../description
  • the result of applying rss-xpath.xsl is rss-fragment-xpath.xml (rss-fragment-xpath.html)

Built-in template rules (1)

  • if the root node or an element node is selected by a stylesheet but no rule matches it, the processor applies templates to each of the node's children
  • the following rule is effectively built in:
    <xsl:template match="*|/">
      <xsl:apply-templates/>
    </xsl:template>
    
  • the pattern *|/ matches any element node and the root node, but this built-in rule has lower priority than any rule in the stylesheet
  • apply-templates with no select attribute applies rules to all child nodes

Built-in template rules (2)

  • if a text or attribute node is selected by a stylesheet but no rule matches it, the processor outputs the node's value
  • the following rule is effectively built in:
    <xsl:template match="text()|@*">
      <xsl:value-of select="."/>
    </xsl:template>
    
  • text() matches text nodes
  • @* matches attribute nodes: @ refers to attributes and * matches any name
  • | matches either of its operands (text() or @*)
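Both built-in rules can be observed in two steps. First, run a stylesheet that contains no template rules at all. Then run one that adds a single rule selecting an attribute. This sketch uses the JDK's XSLT 1.0 processor and assumes Java 15+. The text output method is used so that only character data is produced. The shortened CD document and the class name are hypothetical. The publisher value appears only when @publisher is selected explicitly. This is because apply-templates with no select attribute processes child nodes, and attributes are not children.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class BuiltInRules {
    // Shortened version of the CD example.
    static final String CD = """
        <CD publisher="Deutsche Grammophon" length="PT1H13M37S">
          <composer>Johannes Brahms</composer>
          <performance>
            <composition>Fantasias Op. 116</composition>
            <soloist>Emil Gilels</soloist>
          </performance>
        </CD>
        """;

    // No template rules: only the built-in rules fire, so all text is output.
    static final String EMPTY = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
        </xsl:stylesheet>
        """;

    // One rule for CD that selects an attribute; the built-in text()|@* rule
    // then outputs its value. CD's children are no longer processed.
    static final String PUBLISHER = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="text"/>
          <xsl:template match="CD">
            <xsl:apply-templates select="@publisher"/>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(EMPTY, CD));
        System.out.println(transform(PUBLISHER, CD));
    }
}
```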

More XPath examples

  • consider file cd.xml
  • view results using
  • /CDlist/CD: all child CD elements of the CDlist element that is the child of the root
  • //composer: all composer elements that are descendants of the root
  • //performance/composer: all composer child elements of performance elements which are descendants of the root
  • //performance[composer]: all performance elements that have a composer element as a child
  • //CD[performance/date]: all CD elements that have a performance element as a child that has a date element as a child
  • //performance[conductor][date]: all performance elements that have both conductor and date elements as children
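Since cd.xml itself is not reproduced here, this sketch evaluates the same expressions against a small hypothetical CD list. The first CD is the example from earlier. The second CD is invented, with placeholder names and date, so that every expression selects something. The sketch uses the JDK's javax.xml.xpath API and reports how many nodes each expression selects. The class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CdPaths {
    // Hypothetical stand-in for cd.xml (second CD uses placeholder data).
    static final String CDS = """
        <CDlist>
          <CD publisher="Deutsche Grammophon">
            <composer>Johannes Brahms</composer>
            <performance>
              <composition>Piano Concerto No. 2</composition>
              <soloist>Emil Gilels</soloist>
              <orchestra>Berlin Philharmonic</orchestra>
              <conductor>Eugen Jochum</conductor>
            </performance>
            <performance>
              <composition>Fantasias Op. 116</composition>
              <soloist>Emil Gilels</soloist>
            </performance>
          </CD>
          <CD publisher="Example Records">
            <performance>
              <composer>Composer A</composer>
              <composition>Work 1</composition>
              <conductor>Conductor B</conductor>
              <date>2001</date>
            </performance>
            <performance>
              <composer>Composer C</composer>
              <composition>Work 2</composition>
            </performance>
          </CD>
        </CDlist>
        """;

    static final Document DOC = parse(CDS);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Number of nodes selected by an absolute path.
    static int count(String path) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(path, DOC, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String[] paths = {"/CDlist/CD", "//composer", "//performance/composer",
                "//performance[composer]", "//CD[performance/date]",
                "//performance[conductor][date]"};
        for (String p : paths) System.out.println(p + ": " + count(p));
    }
}
```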

Predicates

  • predicates filter out nodes from an ordered node-set S
  • evaluate predicate on each node x in node-set S with
    • x as the context node
    • the size of S as the context size
    • the position of x in S as the context position
  • predicate comprises
    • Boolean expressions: using and, or, not, =, ...
    • numerical expressions: using +, -, ...
    • node-set expressions: location paths filtered by predicates
    • node-set functions

Node-Set Functions

  • last(): returns context size
  • position(): returns context position
  • count(S): returns number of nodes in S
  • name(S): returns name of first node in S
  • id(S): returns the elements whose ID-type attribute has a value in S
  • e.g.
    • position()=2: true if node is 2nd in the context
    • position()=last(): true if node is last in the context

Examples

  • count(//performance): the number of performance elements
  • //performance[not(date)]: performance elements that do not have a date element as a child
  • all CD elements that have "Deutsche Grammophon" as the value of their publisher attribute and have more than one performance element as a child:
    //CD[@publisher="Deutsche Grammophon"
        and count(performance) > 1]
    
    or
    //CD[@publisher="Deutsche Grammophon"]
        [count(performance) > 1]
    
    or
    //CD[count(performance) > 1]
        [@publisher="Deutsche Grammophon"]
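The functions and predicates on these slides can be checked in the same way. This sketch uses the JDK's javax.xml.xpath API on a hypothetical list of three CDs, with placeholder compositions and dates. As in the CD example from earlier, it treats publisher as an attribute (@publisher). It also shows two further points. First, the three forms of the last query select the same CD. Second, position() and last() in //performance[...] are evaluated relative to each CD's performance children, not to all performances in the document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CdFunctions {
    // Hypothetical data: two Deutsche Grammophon CDs and one other.
    static final String CDS = """
        <CDlist>
          <CD publisher="Deutsche Grammophon">
            <performance><composition>Work 1</composition><date>1990</date></performance>
            <performance><composition>Work 2</composition></performance>
          </CD>
          <CD publisher="Deutsche Grammophon">
            <performance><composition>Work 3</composition></performance>
          </CD>
          <CD publisher="Example Records">
            <performance><composition>Work 4</composition></performance>
            <performance><composition>Work 5</composition><date>2005</date></performance>
          </CD>
        </CDlist>
        """;

    static final XPath XP = XPathFactory.newInstance().newXPath();
    static final Document DOC = parse(CDS);

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Evaluates a numeric expression such as count(...).
    static double number(String expr) {
        try {
            return (Double) XP.evaluate(expr, DOC, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Compositions of the performance elements selected by path, joined by "|".
    static String compositions(String path) {
        try {
            NodeList nodes = (NodeList) XP.evaluate(path, DOC, XPathConstants.NODESET);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < nodes.getLength(); i++) {
                if (i > 0) sb.append('|');
                sb.append(XP.evaluate("composition", nodes.item(i)));
            }
            return sb.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(number("count(//performance)"));
        System.out.println(compositions("//performance[position()=last()]"));
    }
}
```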
    

Example: Notes with table of contents

  • notes for this course (in XHTML)
  • each chapter (e.g. XSL) is in a separate file, e.g., xsl.html
  • each section (slide) is inside a div element, with a class attribute value of slide
  • for notes, we want to
    • generate table of contents
    • number each section, e.g., 6.32
    • output everything else "as is"
  • there are other div elements used for notes, with class attribute values of handout and page-break

Notes example (part 1)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:param name="chapter"/>

<xsl:template match="/html">
  <html>
    <!-- Part 2 goes here -->
    <body>
      <!-- Output heading -->
      <h1>
        <xsl:value-of select="$chapter"/>.
        <xsl:value-of select="head/title"/>
      </h1>
      <hr />

      <!-- Part 3 (table of contents) goes here -->
      <!-- Output the contents, numbering each h1 as an h2 -->
      <xsl:apply-templates select="body/div" />

    </body>
  </html>
</xsl:template>

<!-- Parts 4 (matching slides) and 5 (matching other divs) go here -->

</xsl:stylesheet>
  • value for parameter named chapter is supplied on command line to XT, e.g.:
    xt xsl.html notes-new.xsl notes.html chapter=6
    
  • parameter value is referenced using $chapter
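With the standard Java API, the equivalent of chapter=6 on the XT command line is Transformer.setParameter. This minimal sketch uses the JDK's built-in XSLT 1.0 processor and assumes Java 15+. It uses a cut-down version of part 1 that only outputs the heading. The input page and the class name are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ChapterParam {
    // Hypothetical chapter page.
    static final String PAGE = """
        <html>
          <head><title>XSL</title></head>
          <body><div class="slide"><h1>XSL example</h1></div></body>
        </html>
        """;

    // Cut-down part 1: the chapter parameter and the heading only.
    static final String XSL = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:param name="chapter"/>
          <xsl:template match="/html">
            <html>
              <body>
                <h1>
                  <xsl:value-of select="$chapter"/>.
                  <xsl:value-of select="head/title"/>
                </h1>
              </body>
            </html>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String render(String chapter) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            // JAXP counterpart of "chapter=6" on the XT command line.
            t.setParameter("chapter", chapter);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(PAGE)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render(args.length > 0 ? args[0] : "6"));
    }
}
```

If no value is supplied, the parameter has the empty string as its value, so the heading would start with just ".".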

Notes example (part 2)

<!-- Output title and scripts -->
<head>
  <title><xsl:value-of select="head/title"/></title>
  <link rel="stylesheet" href="../notes.css" type="text/css" />
  <xsl:copy-of select="head/style"/>
  <xsl:copy-of select="head/script[not(contains(@src,'slidy.js'))]"/>
</head>
  • output the title of the source document as the title of the result
  • output a link to a stylesheet
  • copy-of copies the selected input (whole tree rooted at each selected node) to the output
  • copy all style elements in the head
  • copy all script elements in the head, except any whose src attribute contains slidy.js

Notes example (part 3)

<!-- Table of contents has one entry for each h1 -->
<ol>
  <xsl:for-each select="body/div[@class='slide']/h1">
    <li><xsl:value-of select="."/></li>
  </xsl:for-each>
</ol>
<hr />
  • use ol to number each section in the table of contents
  • context of for-each is the html element in the source
  • [@class='slide'] is a predicate: selects those div elements that have a class attribute whose value is slide

Notes example (part 4)

<xsl:template match="div[@class='slide']">
  <xsl:variable name="section"><xsl:number count="div[@class='slide']"/></xsl:variable>
  <xsl:for-each select="*">
    <xsl:choose>
      <xsl:when test="name()='h1'">
        <h2>
          <xsl:value-of select="$chapter"/>.
          <xsl:value-of select="$section"/>.
          <xsl:value-of select="."/>
        </h2>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:for-each>
</xsl:template>
  • xsl:variable works like xsl:param: it stores a value in a named variable (here section), referenced as $section
  • xsl:number is replaced by a number in the output: the position of the current div element among its sibling div elements that satisfy div[@class='slide']
  • use choose for conditional processing:
    • contents of when processed if result of test expression is true
    • contents of otherwise processed if result of every test expression is false
  • name() function returns the name of the context node (here, the child element currently being processed by for-each)

Notes example (part 5)

<xsl:template match="div">
  <xsl:if test="@class='handout' or @class='page-break'">
    <xsl:copy-of select="."/>
  </xsl:if>
</xsl:template>
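  • matches every div element, but for div elements with class slide the rule in part 4 takes precedence, since its pattern (with a predicate) has a higher default priority
  • xsl:if is used for conditional processing where there is no "else" part
  • output the entire div element (with its contents) if it has a class value of handout or page-break; other div elements produce no output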
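Parts 4 and 5 can be exercised together on a tiny chapter file. This sketch uses the JDK's XSLT 1.0 processor and assumes Java 15+. The input page, its div contents and the class name are hypothetical, and the /html rule is a cut-down stand-in for part 1. The expected results are as follows: the slide headings are numbered 6.1 and 6.2, the handout div is copied, and the div with an unrecognised class is dropped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SlideNumbering {
    // Hypothetical chapter file: two slides, a handout and an unrelated div.
    static final String PAGE = """
        <html>
          <head><title>XSL</title></head>
          <body>
            <div class="slide"><h1>First</h1><p>one</p></div>
            <div class="handout"><p>note</p></div>
            <div class="other"><p>dropped</p></div>
            <div class="slide"><h1>Second</h1><p>two</p></div>
          </body>
        </html>
        """;

    // Parts 4 and 5 from the slides, plus a cut-down /html rule.
    static final String XSL = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:param name="chapter"/>
          <xsl:template match="/html">
            <html><body><xsl:apply-templates select="body/div"/></body></html>
          </xsl:template>
          <xsl:template match="div[@class='slide']">
            <xsl:variable name="section"><xsl:number count="div[@class='slide']"/></xsl:variable>
            <xsl:for-each select="*">
              <xsl:choose>
                <xsl:when test="name()='h1'">
                  <h2>
                    <xsl:value-of select="$chapter"/>.
                    <xsl:value-of select="$section"/>.
                    <xsl:value-of select="."/>
                  </h2>
                </xsl:when>
                <xsl:otherwise>
                  <xsl:copy-of select="."/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:for-each>
          </xsl:template>
          <xsl:template match="div">
            <xsl:if test="@class='handout' or @class='page-break'">
              <xsl:copy-of select="."/>
            </xsl:if>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String render(String chapter) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setParameter("chapter", chapter);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(PAGE)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("6"));
    }
}
```

The headings keep the line breaks from the template text, e.g. "6." and "1." appear on separate lines in the output, which a browser renders as "6. 1. First".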

Notes example (complete)

Some other XSLT elements

  • xsl:element element allows an element to be created with a computed name
  • xsl:attribute element can be used to add attributes to result elements
  • literal data characters may also be wrapped in an xsl:text element
  • xsl:comment element is instantiated to create a comment node in the result tree
  • sorting specified by adding xsl:sort elements as children of xsl:apply-templates or xsl:for-each element
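Here is a small stylesheet that exercises each of these elements:
  • xsl:sort orders the items by title
  • xsl:element computes each element name from the position in sorted order
  • xsl:attribute adds an href attribute
  • xsl:text holds literal text with a trailing space
  • xsl:comment emits a comment

This sketch uses the JDK's XSLT 1.0 processor and assumes Java 15+. The input, the entry-N element names, the example.com links and the class name are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class OtherElements {
    // Hypothetical input, deliberately not in title order.
    static final String SOURCE = """
        <channel>
          <item><title>Zebra crossing</title><link>http://example.com/z</link></item>
          <item><title>Apple harvest</title><link>http://example.com/a</link></item>
        </channel>
        """;

    static final String XSL = """
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" omit-xml-declaration="yes"/>
          <xsl:template match="channel">
            <list>
              <xsl:comment>sorted by title</xsl:comment>
              <xsl:for-each select="item">
                <xsl:sort select="title"/>
                <!-- element name computed from the position in sorted order -->
                <xsl:element name="{concat('entry-', position())}">
                  <xsl:attribute name="href"><xsl:value-of select="link"/></xsl:attribute>
                  <xsl:text>Headline: </xsl:text>
                  <xsl:value-of select="title"/>
                </xsl:element>
              </xsl:for-each>
            </list>
          </xsl:template>
        </xsl:stylesheet>
        """;

    static String transform(String xsl, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, SOURCE));
    }
}
```

Inside xsl:for-each with xsl:sort, position() refers to the sorted order. That is why the entry for "Apple harvest" is named entry-1.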

Exercises

  1. Consider an XML representation of information about students on an MSc programme. All information should be represented using elements rather than attributes. The root element of the document is programme. A programme has a degree, whose value might be "MSc", and a year, whose value might be "2014/2015". These elements are followed by the results for the programme. The results are partitioned into distinction, merit, pass and fail. Within each is a sequence of name elements, each containing the name of a person having achieved the corresponding result for the programme.

    Write an XSL template rule that, when matched against an XML document described above, produces an HTML document comprising a list of names of those students who obtained distinctions. The title of the document should be assembled from the contents of the degree and year elements, so that the answer when run on the document with the values suggested above would be "MSc (2014/2015)". There should be a level-2 heading "List of Distinctions", followed by an unnumbered list of names of students who obtained a distinction.


  2. Write an XSLT program which will transform an XML document of the form:
    <teaches>
      <teaches-tuple course="IWT" lecturer="Peter Wood"/>
      <teaches-tuple course="CS" lecturer="Szabolcs Mikulas"/>
    </teaches>
    
    into one of the form:
    <teaches>
      <teaches-tuple>
        <course>IWT</course>
        <lecturer>Peter Wood</lecturer>
      </teaches-tuple>
      <teaches-tuple>
        <course>CS</course>
        <lecturer>Szabolcs Mikulas</lecturer>
      </teaches-tuple>
    </teaches>
    
    You can assume that teaches is the root element, and that the course and lecturer attributes are required. Obviously your program should work for any number of occurrences of the teaches-tuple element.


  3. For this exercise the source XML document is booker.xml. This file contains information about winners of the Booker prize. You should save a copy of this file in the directory where you intend to do the exercise. You will need to look at the document in order to see how the elements are structured.
    • Write an XSLT program to extract the titles of all books that have won the Booker prize. The output should be in HTML and each book title should be inserted inside double quote marks and should constitute a separate HTML paragraph.
    • Write an XSLT program to produce a table of winners of the Booker prize. The output should be in HTML, with a heading "Winners of the Booker Prize" (excluding the quotes). This should be followed by a table with column headings "Author", "Book title" and "Year" (excluding the quotes). Each row of the table should include the author, title and year of a Booker prize winner.

      Note that the rows in the table are ordered by author name. You can change this ordering by using the xsl:sort element. This empty element is placed as the contents of an xsl:apply-templates element or as the first child of an xsl:for-each element. Attributes include order, with values "ascending" (the default) and "descending", data-type, with values "text" (the default) and "number", and select, to order elements by the values of, for example, one of their child elements.

      Modify your program to order the table by descending year of award.

Links to more information

XSLT is covered in Chapters 6 and 7 of [Jacobs] and in Chapter 5 of [Moller and Schwartzbach].