Java Programming Notes # 2208
- Some Details Regarding XSLT
- Discussion and Sample Code
- Run the Program
- What's Next?
- Complete Program Listings
In the previous lesson entitled
Implementing Default XSLT Behavior in Java, I
explained default XSLT behavior,
and showed you how to write Java code that mimics default XSLT behavior.
The Java program named Dom11 that I developed in that lesson serves as
a skeleton for more
This lesson updates Dom11 into a new program that tests and
exercises several methods that were not
tested by the samples used in the previous lesson.
I will show that once you have a library of Java methods that emulate XSLT elements, it is no more difficult to write a Java program to transform an XML document than it is to write an XSL stylesheet to transform the same document.
JAXP is an API designed
to help you write programs for creating and processing XML
documents. It is a critical part of Sun's Java Web Services Developer
Pack (JWSDP).
This lesson is one in a series designed to help you understand how to use JAXP and how to use the JWSDP.
The first lesson in the series was entitled Java API for XML Processing (JAXP), Getting Started. The previous lesson was entitled Java JAXP, Implementing Default XSLT Behavior in Java.
XML
XML is an acronym for the eXtensible Markup Language. I will assume that you already understand XML, and will teach you how to use JAXP to write programs for creating and processing XML documents.
XSL and XSLT
XSL is an acronym for Extensible Stylesheet Language. XSLT is an acronym for XSL Transformations.
- Transforming non-XML documents into XML documents.
- Transforming XML documents into other XML documents.
- Transforming XML documents into non-XML documents.
You may find it useful to open another copy of this lesson in a separate browser window. That will make it easier for you to scroll back and forth among the different listings and figures while you are reading about them.
I recommend that you also study the other lessons in my extensive collection of online Java and XML tutorials. You will find those lessons published at Gamelan.com. As of the date of this writing, Gamelan doesn't maintain a consolidated index of my tutorial lessons, and sometimes they are difficult to locate there. You will find a consolidated index at www.DickBaldwin.com.
A tree structure in memory
A DOM parser can be used to
create a tree structure in memory that represents an XML
document. In Java, that tree structure is encapsulated in an
object of the interface type Document.
Many operations are possible
Given an object of type Document (often called a DOM tree), there
are many methods that
can be invoked on the object to perform a variety of operations.
For example, it is possible to write Java code to:
- Move nodes from one location in the tree to another location in the tree
- Delete nodes
- Insert new nodes
- Recursively traverse the tree, extracting information about the nodes along the way
- Various combinations of the above
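The operations in the list above can be sketched with the standard org.w3c.dom API. This is a minimal illustration rather than code from Dom11 or Dom12: the class name DomOpsSketch, the helper methods, and the small inline XML string are all assumptions made for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the lesson's programs.
class DomOpsSketch {

    // Parse an XML string into a DOM tree (checked exceptions are wrapped for brevity).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Recursively traverse the tree, recording element names in document order.
    static void collectElementNames(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(node.getNodeName()).append(' ');
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectElementNames(children.item(i), out);
        }
    }

    static String demo() {
        Document doc = parse(
                "<top><theData><title>Java</title><price>$9.95</price></theData></top>");
        Element theData = (Element) doc.getDocumentElement().getFirstChild();
        Node title = theData.getFirstChild();
        Node price = theData.getLastChild();

        // Insert a new node ahead of the price element.
        Element author = doc.createElement("author");
        author.appendChild(doc.createTextNode("R.Baldwin"));
        theData.insertBefore(author, price);

        // Move a node: appending an existing node first detaches it from its old position.
        theData.appendChild(title);

        // Delete a node.
        theData.removeChild(price);

        // Traverse the modified tree.
        StringBuilder names = new StringBuilder();
        collectElementNames(doc, names);
        return names.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints: top theData author title
    }
}
```

The inline XML contains no line breaks, so getFirstChild returns an element rather than a whitespace text node. With a formatted file, the extraneous text nodes would have to be skipped.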
Two ways to transform an XML document
There are at least two ways to transform the contents of an XML
document into another document:
- By writing Java code to manipulate the DOM tree and perform the transformation.
- By using XSLT to perform the transformation.
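For comparison with the Java approach, the XSLT approach can be driven from Java through the JAXP transformation API. The following is a minimal sketch, not the lesson's Dom12 stylesheet: the class name XsltSketch, the inline XML, and the one-rule stylesheet are assumptions, and xsl:output method='text' is used only to keep the result free of an XML declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class; the XML and stylesheet below are illustrative only.
class XsltSketch {

    static final String XML =
            "<top><theData><title>Java</title></theData></top>";

    // One template rule for title; every other node falls through to the built-in rules.
    static final String XSL =
            "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
          + "<xsl:output method='text'/>"
          + "<xsl:template match='title'>Title: <xsl:value-of select='.'/></xsl:template>"
          + "</xsl:stylesheet>";

    static String transform(String xml, String xsl) {
        try {
            // Compile the stylesheet into a Transformer, then apply it to the XML source.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL)); // prints: Title: Java
    }
}
```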
As is usually the case, there are advantages and disadvantages to
both approaches.
As an example of an advantage provided by XSLT, if it is possible to perform the required transformation using XSLT, that approach will probably require you to write less code than would be required to perform the same transformation by writing a Java program from scratch. However, I will show that once you have a library of Java methods that emulate XSLT elements, it is no more difficult to write a Java program to transform an XML document than it is to write an XSL stylesheet to transform the same document.
XSLT can be difficult
In my opinion, it is much easier to debug a Java program than it is to debug
an XSL stylesheet that doesn't work properly. However, the use of
a good XSLT debugger may resolve that difference.
provides more detailed control
library of Java methods
This is one of several lessons that show you
how to write the skeleton of a Java library containing methods that
emulate the most common XSLT elements. Once you have the library,
writing Java code to transform XML documents consists mainly of writing
a short driver program to access and use those methods. Thus,
given the proper library of methods, it is no more difficult to write a
Java program to perform the transformation than it is to write
an XSL stylesheet to perform the same transformation.
not my primary purpose
However, my primary purpose in these lessons is not to provide such a library, but rather to help you understand how to use a DOM tree to create, modify, and manipulate XML documents. By comparing Java code that manipulates a DOM tree with similar XSLT operations, you will have an opportunity to learn a little about XSLT in the process of learning how to manipulate a DOM tree using Java code.
Assume that an XML document has been parsed to produce a DOM
tree in memory that represents the XML document.
An XSLT processor starts examining the DOM tree at its root.
It obtains instructions from the XSLT stylesheet telling it how to
traverse the
tree, and how to treat each node that it encounters along the way.
and applying matching template rules
As each node is encountered, the processor searches the stylesheet
looking for a template rule that governs how to treat nodes of that
type. If the processor finds
a template rule that matches the node type, it performs the operations
indicated by the template rule. If it doesn't find a matching
template rule, it
executes a built-in template rule appropriate to that node. (I explained the behavior of the built-in
template rules in the previous lesson.)
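The search-then-fall-back behavior described above can be sketched in Java as follows. This is an illustration under stated assumptions, not the library developed in these lessons: the RULES map is a hypothetical stand-in for a stylesheet that matches on element name only, and applyBuiltInRule emulates just the built-in rules for the root, element, text, comment, and processing-instruction nodes.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are not taken from Dom11 or Dom12.
class TemplateDispatchSketch {

    // Stand-in for a stylesheet: maps an element name to a "template rule".
    static final Map<String, BiConsumer<Node, StringBuilder>> RULES = new HashMap<>();
    static {
        RULES.put("title", (node, out) ->
                out.append('[').append(node.getTextContent()).append(']'));
    }

    // Look for a matching rule; fall back to the built-in rule if none is found.
    static void process(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            BiConsumer<Node, StringBuilder> rule = RULES.get(node.getNodeName());
            if (rule != null) {
                rule.accept(node, out);
                return;
            }
        }
        applyBuiltInRule(node, out);
    }

    static void applyBuiltInRule(Node node, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
            case Node.ELEMENT_NODE: {
                // Built-in rule for the root and for elements: process every child.
                NodeList children = node.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    process(children.item(i), out);
                }
                break;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                // Built-in rule for text: copy the text to the output.
                out.append(node.getNodeValue());
                break;
            default:
                // Built-in rule for comments and processing instructions: do nothing.
                break;
        }
    }

    static String run(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            process(doc, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        // prints: [Java]$9.95
        System.out.println(run(
                "<top><!-- skipped --><theData><title>Java</title>"
              + "<price>$9.95</price></theData></top>"));
    }
}
```

Only title has an explicit rule here, so the price text reaches the output through the built-in rules and the comment is dropped.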
in the XSLT stylesheet elements
You can think of the XSLT process as operating on an input DOM tree to produce an output DOM tree. If the template rule being applied contains literal text, that literal text is used to create text nodes in the output tree.
An XPath expression can be
used to point to a specific node and to
establish that node as the context node. Once a context node is
established, there are at least two XSLT elements that can be used to
traverse the children of that node:
- xsl:apply-templates
- xsl:for-each
The first of these, xsl:apply-templates,
examines all child nodes of the context node that match
an optional select
attribute. If the optional select attribute is omitted, then
all child nodes of the context node are examined.
(When combined with a default template rule, this often results in a recursive examination and processing of all descendant nodes of the context node.)
As each child node is examined, it is processed using a matching template rule or a built-in template rule.
Iterative operation
The second XSLT element in the above list, xsl:for-each, executes an iterative
examination of all child nodes of the context node that
match a required select attribute.
Note that unlike with the xsl:apply-templates
element, the select attribute
is not optional for this element.
The processor examines all child nodes of the context node that match the select attribute. As each child node is examined, it becomes the current node and is processed using the template contained in the body of the xsl:for-each element.
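The iterative behavior of xsl:for-each can be approximated in Java with a simple loop over the child nodes. The sketch below is a simplification and not the method used in Dom12: forEachChild is a hypothetical helper, and its select parameter is assumed to be a plain child element name rather than a general XPath expression.

```java
import java.io.StringReader;
import java.util.StringJoiner;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the lesson's programs.
class ForEachSketch {

    // Run body once for each child element of contextNode whose name equals select.
    static void forEachChild(Node contextNode, String select, Consumer<Element> body) {
        NodeList children = contextNode.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && child.getNodeName().equals(select)) {
                body.accept((Element) child);
            }
        }
    }

    // Nested iteration: for each theData child of the root element, for each title child.
    static String titles(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringJoiner out = new StringJoiner(",");
            forEachChild(doc.getDocumentElement(), "theData", theData ->
                    forEachChild(theData, "title", title ->
                            out.add(title.getTextContent())));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        // prints: Java,Python,XML
        System.out.println(titles(
                "<top><theData><title>Java</title></theData>"
              + "<theData><title>Python</title></theData>"
              + "<theData><title>XML</title></theData></top>"));
    }
}
```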
Let's see some code
I will begin by discussing the XML file named Dom12.xml (shown in Listing 25 near the end of the
lesson) along with
the XSL stylesheet file named Dom12.xsl
(shown in Listing 26).
A Java program named Dom12
After explaining the transformation produced by applying this stylesheet to this XML document, I will explain the transformation produced by processing the XML file with a Java program named Dom12 (shown in Listing 24) that mimics the behavior of the XSLT transformation.
The XML file shown in Listing 25 is relatively straightforward. A tree view of the XML file is shown in Figure 1. (This XML file is both well-formed and valid.) I used alternating colors of red and blue to identify successive nodes named theData. The reason for doing this will become apparent later.
#document DOCUMENT_NODE
  top DOCUMENT_TYPE_NODE
  #comment COMMENT_NODE
  #comment COMMENT_NODE
  dummy-target PROCESSING_INSTRUCTION_NODE
  xml-stylesheet PROCESSING_INSTRUCTION_NODE
  false-target PROCESSING_INSTRUCTION_NODE
  top ELEMENT_NODE
    theData ELEMENT_NODE
      Attribute: attr=Dummy Attr Value
      title ELEMENT_NODE
        #text Java
        subtitle ELEMENT_NODE
          Attribute: position=Low
          #text really
          part1 ELEMENT_NODE
            #text This is part 1
          part2 ELEMENT_NODE
            #text This is part 2
        #text rules
      author ELEMENT_NODE
        #text R.Baldwin
      price ELEMENT_NODE
        #text $9.95
    theData ELEMENT_NODE
      title ELEMENT_NODE
        #text Python
      author ELEMENT_NODE
        #text R.Baldwin
      price ELEMENT_NODE
        #text $15.42
    theData ELEMENT_NODE
      title ELEMENT_NODE
        #text XML
      author ELEMENT_NODE
        #text R.Baldwin
      price ELEMENT_NODE
        #text $19.60
Figure 1
(This tree view of the XML file was produced using a program named DomTree02, which was discussed in an earlier lesson. Note that in order to make the tree view more meaningful, I manually removed extraneous line breaks and text nodes associated with those line breaks. The extraneous line breaks in Figure 1 were caused by extraneous line breaks in the XML file. The extraneous line breaks in the XML file were placed there for cosmetic reasons and to force it to fit into this narrow publication format.)
A database of books
As you may already have figured out, this XML document represents a small database containing information about fictitious books.
It is important to note, however, that the structure and content of this XML file were not intended to have any purpose other than to illustrate the concepts being covered in this lesson. In other words, some of the structure makes no sense with regard to a database containing information about books.
Recall that an XSL stylesheet is itself an XML file, and can therefore be represented as a tree. Figure 2 presents an abbreviated tree view of the stylesheet shown in Listing 26. I colored each of the five template rules in this view with alternating colors of red and blue to make them easier to identify visually.
(As is often the case with XSL stylesheets, this stylesheet file is well-formed but it is not valid.)
#document DOCUMENT_NODE
  xsl:stylesheet ELEMENT_NODE
    Attribute: xmlns:xsl=http:
      //www.w3.org/1999/XSL/Transform
    Attribute: version=1.0
    xsl:template ELEMENT_NODE
      Attribute: match=/
      #text A Match Root
      xsl:apply-templates ELEMENT_NODE
        Attribute: select=top
    xsl:template ELEMENT_NODE
      Attribute: match=top
      #text B Match top
      xsl:apply-templates ELEMENT_NODE
        Attribute: select=theData
    xsl:template ELEMENT_NODE
      Attribute: match=theData
      #text C Match theData and show attribute
      xsl:value-of ELEMENT_NODE
        Attribute: select=@attr
      xsl:apply-templates ELEMENT_NODE
        Attribute: select=title
    xsl:template ELEMENT_NODE
      Attribute: match=title
      #text D Match title and show value of title as context
      xsl:value-of ELEMENT_NODE
        Attribute: select=.
      #text E Show value of subtitle
      xsl:value-of ELEMENT_NODE
        Attribute: select=subtitle
      xsl:apply-templates ELEMENT_NODE
        Attribute: select=subtitle
    xsl:template ELEMENT_NODE
      Attribute: match=subtitle
      #text F match subtitle and show value of attribute
      xsl:value-of ELEMENT_NODE
        Attribute: select=@position
      #text G Show value of subtitle as context node
      xsl:value-of ELEMENT_NODE
        Attribute: select=.
Figure 2
I refer to this as an abbreviated tree view because I manually deleted comment nodes and extraneous text nodes in order to emphasize the important elements in the stylesheet.
(Extraneous text nodes occur as a result of inserting line breaks in the original XSL document for cosmetic purposes. Note that I also manually entered a line break in the third line of Figure 2 to force the material to fit into this narrow publication format.)
The root element
The root node of all XML documents is the document node. In addition to the root node, there is also a root element, and it is important not to confuse the two.
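The distinction between the root node and the root element can be checked directly against a DOM tree. The following sketch assumes a small inline document with a comment ahead of the root element; the class name RootVersusRootElement and the format of the returned string are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the lesson's programs.
class RootVersusRootElement {

    static String describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // The root node is the document node itself.
            String rootNode = doc.getNodeName();
            // The root element is the one element child of the document node.
            String rootElement = doc.getDocumentElement().getNodeName();
            // The document node can have other children, such as comments.
            int childCount = doc.getChildNodes().getLength();
            return rootNode + " " + rootElement + " " + childCount;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // prints: #document top 2
        System.out.println(describe("<!-- a comment --><top><theData/></top>"));
    }
}
```

The count of 2 reflects the comment node and the top element, both of which are children of the document node.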
As you can see from Figure 2, the root element in the XSL document is of type xsl:stylesheet. The root element has two attributes, each of which is standard for XSL stylesheets.
The first attribute points to the XSLT namespace URI, which you can read about in the W3C Recommendation. The second attribute provides the XSLT version.
Children of the root element node
The root element node in Figure 2 has five child nodes, each of which is a template rule. (I discussed template rules in detail in the previous lesson.)
Each of the five child nodes of the root element node has a match pattern. The five match patterns in the order that they appear in Figure 2 are as follows:
- match=/ (root node)
- match=top (matches element node named top)
- match=theData (matches element node named theData)
- match=title (matches element node named title)
- match=subtitle (matches element node named subtitle)
(Note that the Java program discussed later produces essentially the same output as the XSLT transformation.)
The result of performing an XSLT transformation by applying the XSL stylesheet shown in Listing 26 to the XML file shown in Listing 25 is shown in Figure 3.
I will explain the operations in the XSLT transformation that produced each line of text in Figure 3.
<?xml version="1.0" encoding="UTF-8"?>
A Match Root
B Match top
C Match theData and show attribute
Dummy Attr Value
D Match title and show value of title as context
Java really This is part 1 This is part 2 rules
E Show value of subtitle
really This is part 1 This is part 2
F match subtitle and show value of attribute
Low
G Show value of subtitle as context node
really This is part 1 This is part 2
C Match theData and show attribute
D Match title and show value of title as context
Python
E Show value of subtitle
C Match theData and show attribute
D Match title and show value of title as context
XML
E Show value of subtitle
Figure 3
(Note that I manually deleted a couple of extraneous line breaks from the output shown in Figure 3.)
The first line of text in the output shown in Figure 3 is an XML declaration that is produced automatically by the XSLT transformer available with JAXP.
(Note, however, that the existence of this line of text doesn't cause the document to be an XML document. This document cannot be parsed as an XML document. An attempt to do so results in various parser errors.)
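The claim that such output cannot be parsed can be confirmed with a short well-formedness check. This is a sketch under stated assumptions: the class name WellFormedCheck is hypothetical, and the sample text is only the first lines of the output rather than the complete result.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class; not part of the lesson's programs.
class WellFormedCheck {

    // Returns true if the text parses as XML, false if the parser rejects it.
    static boolean isWellFormed(String text) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            // The parser reports malformed input as a SAXException.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("Parser could not be run", e);
        }
    }

    public static void main(String[] args) {
        // An XML declaration followed by bare text has no root element.
        String transformOutput =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\nA Match Root\nB Match top";
        System.out.println(isWellFormed(transformOutput));
        System.out.println(isWellFormed("<?xml version=\"1.0\"?><top/>"));
    }
}
```

The first call returns false because the text after the declaration is not enclosed in a root element. The second call returns true.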