What's New in XPath 2.0?

by Steven Holzner

Get a head start on XPath 2.0 by reviewing the new features.

As of this writing, XPath 2.0 is still in Working Draft form, but it's now stabilized, giving us the chance to work with it. XPath 2.0 is described this way by W3C—just as you'd describe XPath 1.0, in fact:

"The primary purpose of XPath is to address parts of an XML document. XPath uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values. XPath gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document."

Although the primary purpose of XPath hasn't changed in this new version, much of the actual specification has. You'll still be able to use the familiar path steps, each made up of an axis (XPath 2.0 uses the same axes as XPath 1.0), followed by a node test, followed by zero or more predicates. However, much of the terminology has changed, along with some basic concepts; for example, XPath 2.0 supports sequences instead of node-sets. I go into more detail on this in my book XPath Kick Start: Navigating XML with XPath 1.0 and 2.0.

XPath 2.0, XQuery 1.0, and XSLT 2.0 are all tied together, and XPath 2.0 is the common denominator. The W3C groups working on these standards have been working together closely. One way of looking at what's been going on is that XSLT 2.0 and XQuery 1.0 are designed to share as much as possible—and that what they share is in fact XPath 2.0.

So why XPath 2.0? What's it got that XPath 1.0 doesn't have? There are many answers, but one of the main ones is support for new data types. As you know, XPath 1.0 supports only these data types:

  • string

  • boolean

  • node-set

  • number

That was okay long ago, but things have changed. In particular, W3C has been moving toward XML Schema for its data types, and XPath 2.0 follows suit by supporting all the simple primitive types built into XML Schema. There are 19 such types in all, including many that XPath 1.0 doesn't support, such as data types for dates, URIs, and so on.

The XPath 2.0 data model also supports data types that you can derive from these data types in your own XML schema. We're going to see how to work with these various types ourselves.

XML Schema - If you're not familiar with XML schema, you can get all the details at http://www.w3.org/TR/xmlschema-0/, http://www.w3.org/TR/xmlschema-1/, and http://www.w3.org/TR/xmlschema-2/. Another good resource is the book Sams Teach Yourself XML in 21 Days (ISBN: 0672325764).

XPath 2.0 also gives you tremendously more power than XPath 1.0 did. There are dozens of new built-in functions that you can use now, and many more operators. These functions and operators are far more type-aware than what we've seen in XPath 1.0.

Also new in XPath 2.0 are sequences, which replace the familiar node-sets from XPath 1.0. In fact, all XPath 2.0 expressions evaluate to sequences, as we're going to see. And you can now declare variables inside XPath 2.0 expressions themselves, rather than only referencing ones supplied by the host language.

The current working draft for XPath 2.0 is at http://www.w3.org/TR/xpath20/. This document tells you about XPath 2.0 in some detail, but it doesn't provide the whole story. In addition, there are companion documents outlining the XPath 2.0 data model (which tells you how XPath 2.0 sees an XML document), the data types used in XPath 2.0, and the functions and operators available.

You still create location paths in XPath 2.0, of course, and build them from location steps. A location step, as in XPath 1.0, can contain an axis, a node test, and a predicate. The allowable axes are the same as in XPath 1.0. However, there are differences already: the namespace axis is deprecated in XPath 2.0, which means it's considered obsolete. It's included for backward compatibility, but is not available at all in XQuery 1.0.

Handling Nodes

Although the data types have changed, the node kinds are more or less the same in XPath 2.0 compared to XPath 1.0. As you recall, you can have these kinds of nodes in XPath 1.0: root nodes, element nodes, attribute nodes, processing instruction nodes, comment nodes, text nodes, and namespace nodes. There is one difference in XPath 2.0, however—root nodes are now called document nodes instead, ending a long-standing confusion.

Handling Data Types

As also mentioned, one of the main motivations behind XPath 2.0 was to expand the data types available. XPath 1.0 supported Booleans, node-sets, strings, and numbers, but that was pretty basic. XPath 2.0 supports all the primitive simple types built into XML schema, as well as the types you can derive by restriction from the primitive simple types, which gives you a great deal more control over data typing. Here are the simple primitive types—the xs namespace corresponds to "http://www.w3.org/2001/XMLSchema":

  • xs:string

  • xs:boolean

  • xs:decimal

  • xs:float

  • xs:double

  • xs:duration

  • xs:dateTime

  • xs:time

  • xs:date

  • xs:gYearMonth

  • xs:gYear

  • xs:gMonthDay

  • xs:gDay

  • xs:gMonth

  • xs:hexBinary

  • xs:base64Binary

  • xs:anyURI

  • xs:QName

  • xs:NOTATION


Besides these types, you can also use types derived from primitive simple types by restriction. Collectively, these simple primitive types and the types derived from primitive simple types by restriction are called atomic types. And XPath 2.0 sequences can contain both atomic values and nodes.
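To make the idea of atomic types a little more concrete, here's a minimal sketch of a few typed expressions. It assumes the xs prefix is bound to the XML Schema namespace shown earlier; each line is a separate expression, and the values are made up for illustration.

```xpath
xs:date("2004-04-23") lt xs:date("2004-12-31")    (: true, dates compare as dates :)
xs:decimal("19.95") + 1                           (: 20.95 :)
"42" instance of xs:string                        (: true :)
xs:integer("42") instance of xs:decimal           (: true, xs:integer is derived from xs:decimal :)
```

The last line shows derivation by restriction at work: a value of a derived type also counts as an instance of the primitive type it was derived from.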

Working with Sequences

Every XPath 2.0 expression (that is, anything an XPath processor can evaluate, including expressions that return nodes from a document or string values and so on) evaluates to a sequence. Here's the XPath 2.0 definition of a sequence:

  • A sequence is an ordered collection of zero or more items.

  • An item is either an atomic value or a node.

  • An atomic value is a value in the value space of an XML Schema atomic type, as defined in the XML Schema specification. Atomic values can either be simple primitive types, or be derived by restriction from these types.

  • A node is one of the seven node kinds described in the XQuery 1.0 and XPath 2.0 Data Model document.

  • A sequence containing exactly one item is called a singleton sequence. An item is identical to a singleton sequence containing that item.

  • A sequence containing zero items is called an empty sequence.

Sequences can contain nodes or atomic values. As we've seen, an atomic value is a value of one of the 19 built-in simple primitive data types defined in the XML schema specification, or a type derived from them by restriction.

Sequences are the successor to node-sets. Besides nodes, they also let you work with simple data items. The term "sequence" is really a catch-all way to refer to data you can work with in XPath 2.0, either an atomic value or a node, or a collection of such items. Sequences can be made up of a single item or multiple items; it's all the same to XPath 2.0. Giving them one name, sequence, is an easy way to let you handle single or multiple items (even though the term "sequence" is not very apt for single items).

Sequences can be constructed with this kind of syntax: (1, 2, 3), which is a sequence of the atomic values 1, 2, and 3. In fact, the comma is an operator in XPath 2.0—the sequence construction operator. You can also extract items from a sequence using the [] operator. Here's an example:

(4, 5, 6)[2]

This expression returns the value 5. You can also use the range operator, to, to create sequences, as in this example:

(1 to 1000)

Note that you cannot nest sequences—that is, if you have a sequence (1, 2) and then try to nest that in another sequence as ((1, 2), 3), the result is simply the sequence (1, 2, 3).
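Here's a short sketch of the construction rules just described, using only literal values so that no input document is assumed. Each line is a separate expression.

```xpath
count(((1, 2), 3))        (: 3, the nested parentheses are flattened :)
((1, 2), (3, 4))[3]       (: 3 :)
(10 to 12, 20)            (: (10, 11, 12, 20), since "to" binds tighter than the comma :)
(1 to 5)[. mod 2 = 1]     (: (1, 3, 5), a predicate filters a sequence :)
count(())                 (: 0, the empty sequence :)
```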

Sequences are also ordered, which is different from node-sets in XPath 1.0. For example, take a look at this sequence:

(//planet/mass, //planet/name)

Here, we're creating a sequence in which <mass> elements from our planetary data XML document come before <name> elements—which is the opposite of the way these elements appear in actual document order. But the order of these elements as we've specified them is preserved in the sequence we're creating here.

Here's another way in which XPath 2.0 differs from 1.0—sequences, unlike node-sets, can have duplicate items. For example, take a look at this sequence:

(//planet/mass, //planet/name, //planet/mass)

Here, we're creating a sequence of all <mass> elements, followed by all <name> elements—followed by all <mass> elements again. This is legal in sequences, but not in node-sets. (In fact, the very definition of XPath 1.0 node-sets precludes duplicate items.)

Ordered Versus Unordered Sequences - Here's something to know behind the scenes about sequences versus node-sets. W3C wanted to make life a little easier for people moving from XPath 1.0 to 2.0, so the way sequences are constructed is designed to be somewhat node-set friendly.

Although node-sets are unordered, node-sets are usually constructed in document order. XSLT 2.0 is designed to work on sequences in sequence order, but in order to be compatible with XPath 1.0, path expressions are designed to always return their results using document order by default.

Also, duplicates are removed from the results by default, which means the sequence you get from a path expression is usually going to be the same as the node-set you'd get.
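You can see the difference between the comma operator and a path expression by counting items. The sketch below assumes the same planetary data document used in the earlier examples; each line is a separate expression.

```xpath
count((//planet/mass, //planet/mass))      (: twice the number of <mass> elements :)
count((//planet/mass, //planet/mass)/.)    (: the number of <mass> elements, because the path step removes duplicates :)
count(//planet/mass | //planet/mass)       (: the same count, because a union also removes duplicates :)
```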

So that's what sequences are all about in general—instead of only supporting one multiple-item construct, the node-set, XPath 2.0 supports sequences, which can contain multiple simple-typed data items as well as nodes.

The for Expression

Sequences are more than just a new concept—XPath 2.0 is really centered around them. There are whole new expressions designed to work with sequences, such as the for expression. This expression is designed to let you handle sequences by looping, or iterating, over all items in a sequence.

Here's a preview that also puts XPath 2.0 variables to work. Say that you wanted to find the average planetary mass in our planets example. The first step is to collect every <mass> element, and the for expression makes that easy. Here's what that might look like:

for $variable in /planets/planet return $variable/mass 

Notice what we're doing here: we're using the for expression to loop over all <planet> elements and return the <mass> child of each one. We do that with a range variable named $variable, which the for expression itself binds (in XPath 1.0, variables could only be supplied by the host language, such as XSLT). Variable references in XPath 2.0 start with a $ preceding a normal XML-legal name, so you can use any legal XML name here, like $var, $numberProducts, $name, and so on.

We're using the path expression /planets/planet to return a sequence holding all <planet> elements in the document. How do we return the <mass> elements of these <planet> elements in a sequence? We can use the return keyword, as you see here. In this case, the expression we want to return each time through the loop is $variable/mass, and because $variable holds a new <planet> element each time through the loop, we'll get a sequence of all <mass> elements this way.

To get the average mass of the planets, you could use the avg function this way:

avg(for $variable in /planets/planet return $variable/mass)

Note that you could also write our for expression as

for $variable in /planets/planet/mass return $variable 

This does the same thing that the expression /planets/planet/mass does: it returns a sequence of <mass> elements. Here's another example, where we're multiplying the miles per gallon of a number of cars by their fuel capacity to get their total operating ranges (assuming each car is a <car> child of the <cars> document element):

for $variable in /cars/car return $variable/milesPerGallon * $variable/gasCapacity

That's how the for expression works in general, like this:

for $variable in sequence return expression
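To see the general form in action without needing any input document, here's a small sketch that uses literal sequences. Each line is a separate expression.

```xpath
for $i in (1 to 5) return $i * $i                   (: (1, 4, 9, 16, 25) :)
sum(for $i in (1 to 4) return $i * $i)              (: 30 :)
for $x in (1, 2), $y in (10, 20) return $x + $y     (: (11, 21, 12, 22) :)
```

The last line binds two variables in one for expression; the second variable runs through all its values for each value of the first, which is why the results come out in that order.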

The if Expression

Besides the for expression, you can now use the conditional if expression in XPath 2.0. Being able to use conditional expressions like if and loop expressions like for in XPath adds a lot of the programming power of true programming languages to XPath 2.0.

Here's an example of an if expression, which finds the minimum of two temperatures (which you can also do with the XPath 2.0 min function):

if ($temperature1 < $temperature2) then $temperature1 else $temperature2 

Here, we're comparing the value in $temperature1 to the value in $temperature2. If $temperature1 holds a value that is less than the value in $temperature2, this if expression returns the value in $temperature1; otherwise, it returns the value in $temperature2.

This has the feel of a true programming language, and there's a lot more power here than in XPath 1.0. Now, you're allowed to branch to different expressions based on a test expression.
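Because if is an expression rather than a statement, you can use it anywhere a value is expected, including inside a for expression. Here's a minimal sketch with made-up temperature values:

```xpath
for $t in (12, 30, 25)
return if ($t > 20) then "warm" else "cool"
(: ("cool", "warm", "warm") :)
```

Note that the else branch is required in XPath 2.0; if you have nothing to return, you can use the empty sequence, as in else ().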

The some and every Expressions

You can use a rudimentary form of a conditional expression in XPath 1.0—for example, this expression, as used in a predicate:

/planets/planet[1]/name = "Mars"

would be true if any <name> element in the first <planet> element had the text "Mars".

XPath 2.0 extends this kind of checking. You can either perform the same test in XPath 2.0 using the same syntax, or you can use the some expression. A some expression is true if at least one item in a sequence satisfies the test given in its satisfies clause, like this:

some $variable in /planets/planet[1]/name satisfies $variable = "Mars"

In this case, this expression returns true if at least one <name> element in the first <planet> element has the text "Mars".

You can also perform other kinds of tests here, such as this expression, which is true if any <radius> element in the first <planet> element contains a value greater than 2000:

some $variable in /planets/planet[1]/radius satisfies $variable > 2000

You can also insist that every <radius> element in the first <planet> element contains a value greater than 2000 if you use the every expression instead of some, like this:

every $variable in /planets/planet[1]/radius satisfies $variable > 2000
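Here's a quick sketch of how some and every behave, using literal sequences so that the results don't depend on any particular document. Each line is a separate expression.

```xpath
some $x in (1, 5, 9) satisfies $x > 8      (: true, 9 passes the test :)
every $x in (1, 5, 9) satisfies $x > 8     (: false, 1 and 5 fail the test :)
some $x in () satisfies $x > 8             (: false, there is no item to pass :)
every $x in () satisfies $x > 8            (: true, there is no item to fail :)
```

The last two lines are worth remembering: on an empty sequence, some is always false and every is always true.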

Unions, Intersections, and Differences

In XPath 1.0, you could use the | operator to create the union (that is, the combination) of two node-sets, as in this case, where we're matching attributes and child nodes in an XSLT template:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"/>

 <xsl:template match="distance[preceding::*/name='Mercury']">
  <distance>This planet is farther than Mercury from the sun.</distance>
 </xsl:template>
 <!-- Identity template: copy everything else unchanged -->
 <xsl:template match="@*|node()">
  <xsl:copy>
   <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>


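In XPath 2.0, you can create not only unions like this, but also intersections, which contain all the nodes that two sequences have in common, and differences, which contain all the nodes in the first sequence that are not also in the second. All three operators work on sequences of nodes only, and they return their results in document order with duplicates removed.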

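Let's take a look at how this works. For example, to get the same result as the previous XPath 1.0 example, we can use the union operator in the select expression (XSLT 2.0 match patterns still require the | form):

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml"/>

 <xsl:template match="distance[preceding::*/name='Mercury']">
  <distance>This planet is farther than Mercury from the sun.</distance>
 </xsl:template>
 <!-- Identity template: copy everything else unchanged -->
 <xsl:template match="@*|node()">
  <xsl:copy>
   <xsl:apply-templates select="@* union node()"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>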


In addition, XPath 2.0 introduces the intersect operator, which returns the intersection of two sequences (that is, all those items they have in common). For example, if the variable $planets holds a sequence of <planet> elements, we could create a sequence of the <planet> elements that $planets has in common with the planets in our planetary data document, like this:

$planets intersect /planets/planet

To find the difference between two sequences, you can use the except operator. For example, if you wanted to find all items in $planets that were not also in the sequence returned by /planets/planet, you could use except this way:

$planets except /planets/planet
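Here's a sketch of intersect and except working on two selections from the same document. It assumes that each <planet> element has <mass> and <radius> children with numeric content, as in our planetary data; the thresholds are made up for illustration.

```xpath
(: planets that appear in both selections, in document order :)
/planets/planet[mass > 1] intersect /planets/planet[radius > 2000]

(: planets in the first selection that are not in the second :)
/planets/planet[mass > 1] except /planets/planet[radius > 2000]
```

Keep in mind that except is not symmetric: swapping the two operands generally gives a different result. Also, because these operators accept only nodes, an expression like (1, 2) intersect (2, 3) raises a type error.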

Here's something else that's new in XPath 2.0: you can now combine multiple node tests in a single location step by using a parenthesized expression. Here's an example:

planets/(mass | day)/text()

Here's what that would look like in XPath 1.0:

planets/mass/text() | planets/day/text()

And, as already mentioned, there are many new functions in XPath 2.0. One of the specific tasks that W3C undertook in XPath 2.0 was to augment its string-processing capabilities. Accordingly, you'll find more string functions in XPath 2.0, including upper-case, lower-case, matches, replace, and tokenize.

Note in particular the matches, replace, and tokenize functions—these functions use regular expressions, a powerful new addition to XPath. Regular expressions let you create patterns to use in matching text. Regular expression patterns use their own syntax—for example, the pattern \d{3}-\d{3}-\d{4} matches U.S. phone numbers, like 888-555-1111. Being able to use regular expressions like this is very powerful because you can match the text in a node to the patterns you're searching for.
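Here's a brief sketch of the three regular-expression functions at work on the phone number pattern above. The input strings are made up for illustration, and each line is a separate expression.

```xpath
matches("888-555-1111", "^\d{3}-\d{3}-\d{4}$")      (: true :)
matches("call 888-555-1111 now", "\d{3}-\d{4}")     (: true, an unanchored pattern can match a substring :)
replace("888-555-1111", "-", ".")                   (: "888.555.1111" :)
replace("888-555-1111", "(\d{3})-(\d{3})-(\d{4})", "($1) $2-$3")    (: "(888) 555-1111" :)
tokenize("888-555-1111", "-")                       (: ("888", "555", "1111") :)
```

In the replacement string, $1, $2, and $3 refer to the text captured by the parenthesized groups in the pattern. Note also that a backslash has no special meaning inside an XPath string literal, so you write \d just once.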


You can also create XPath 2.0 comments using the delimiters (: and :). Here's an example:

(: Check for at least one planet with the name Mars :)
some $variable in /planets/planet[1]/name satisfies $variable = "Mars"

Comments may be nested.

That completes our XPath 2.0 overview—now you've gotten an idea of the kinds of things that are different in XPath 2.0. Besides what we've seen in these few examples, there are plenty of additional new expressions coming up, such as cast, treat, and instance of. My next article will provide you with some XPath 2.0 examples.

About the Author

Steven Holzner is an award-winning author who has been writing about XML topics such as XSLT as long as they've been around. He's the author of XPath Kick Start : Navigating XML with XPath 1.0 and 2.0 (published by Sams Publishing), and has written 67 books, all on programming topics, selling well over a million copies. His books have been translated into 16 languages around the world and include a good number of industry bestsellers. He's a former contributing editor of PC Magazine, graduated from MIT, and received his Ph.D. at Cornell. He's been on the faculty of both MIT and Cornell, and also teaches corporate seminars around the country.

This article was originally published on Friday Apr 23rd 2004