XML and Scripting Languages

Wednesday Oct 18th 2000 by Piroz Mohseni

Piroz Mohseni looks at how some of the most popular scripting languages today work with XML.

Scripting languages have been helping programmers crunch data for a long time. For years, Unix shell scripting languages have been used to process system logs, configuration files, and other text files. Lately, newer scripting languages like Perl, Tcl, Python, PHP, and REBOL have become popular for handling data in flat files.

In the past, these flat files were generally filled with straight text, sometimes comma-delimited or tab-delimited, sometimes unstructured. Today, however, they might be filled with XML, as XML rapidly becomes the standard platform-independent data format. But just how good are different scripting languages at processing XML?


The instructions that make up computer programs are well-structured and could potentially be expressed in XML. Not surprisingly, new scripting languages that use XML are emerging. XMLScript (http://www.xmlscript.org) is one such language where XML documents make up both the data and the program.

Interestingly, the literature at the XMLScript Web site compares XMLScript with XSLT. XSLT (XML stylesheets) essentially solves the problem of data transformation, processing XML data according to the rules spelled out in stylesheets. Data transformation, however, is really a programming problem, not a stylesheet problem. What's more, XSLT constructs are not always intuitive and can be hard to understand. XMLScript provides an alternative to XSLT, one that may be more intuitive as well as less expensive in terms of performance.

Here is a short code segment in C++:

#include <cstdio>
int main() {
      for (int count = 1; count <= 10; count++) {
              printf("hello world");
      }
}

The above code segment would be expressed like this in XMLScript:

    <_for from="1" to="10">
         hello world
    </_for>

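To make the idea of "the program is an XML document" concrete, here is a minimal Python sketch of a toy interpreter for a single `_for` element. It is not XMLScript's implementation; the function name `run_for` and the rule that the loop simply repeats the element's text are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def run_for(source):
    """Interpret one <_for from=".." to=".."> element by repeating its text.

    Hypothetical helper: real XMLScript semantics may differ.
    """
    node = ET.fromstring(source)
    if node.tag != "_for":
        raise ValueError(f"unsupported instruction: {node.tag}")
    start, stop = int(node.get("from")), int(node.get("to"))
    body = (node.text or "").strip()
    # Inclusive bounds, mirroring the C++ loop condition `count <= 10`.
    return [body for _ in range(start, stop + 1)]


program = '<_for from="1" to="10">hello world</_for>'
output = run_for(program)
print(len(output), "lines, each reading:", output[0])
```

The point of the sketch is only that an ordinary XML parser is enough to read the program, so the same tools handle both code and data.
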
In addition to XMLScript, there is also XSLScript. Like XSLT, XSLScript primarily addresses the fundamental data transformation problem, but it offers a more familiar, easier-to-write syntax for XSL. Its "compiler" takes XSLScript code and generates regular XSLT instructions, which are then processed by an XSLT processor. XSLScript has built-in shortcuts that reduce typing, and its instructions map one-to-one onto XSLT instructions. Because the compiler leaves anything that is already XSLT untranslated, you can easily mix XSLT and XSLScript instructions.


Perl has its roots in text processing, and many CGI scripts are written in it. Perl modules package independent code that performs specific functions, and XML support is well established in the Perl community. The XML::Parser::PerlSAX module supports the SAX event-oriented interface, and XML::DOM supports the DOM interface. The following lines parse an XML document and create the DOM tree:

    use XML::DOM;

    # $cfgfile is assumed to hold the path of the XML file to parse
    my $parser = XML::DOM::Parser->new(NoExpand => 1);
    my $doc = $parser->parsefile($cfgfile);

You can then call various subroutines to navigate the tree.
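Because DOM is a language-neutral interface, the navigation step looks much the same in other languages. The following is a small Python sketch using the standard `xml.dom.minidom` module rather than XML::DOM; the `<config>` document, the `server` element, and the `list_servers` helper are invented for illustration.

```python
from xml.dom import minidom

# Hypothetical configuration document used only for this example.
CONFIG = """<config>
  <server name="alpha" port="8080"/>
  <server name="beta" port="8081"/>
</config>"""


def list_servers(xml_text):
    """Parse the text into a DOM tree and collect (name, port) pairs."""
    doc = minidom.parseString(xml_text)
    servers = []
    # getElementsByTagName walks the whole tree below the document node.
    for node in doc.getElementsByTagName("server"):
        servers.append((node.getAttribute("name"), int(node.getAttribute("port"))))
    return servers


print(list_servers(CONFIG))
```
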

The XML::Twig module builds partial trees, which makes it well suited to large XML documents. One of the big problems with Java-based parsers when they first came out was that they would parse only complete XML documents, even though you are often interested in just a portion of a large document. The XML::Twig module gives you that flexibility and reduces the overhead of parsing the entire document.
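The same "keep only the part you care about" idea can be sketched in Python with the standard library's `xml.etree.ElementTree.iterparse`. This is an analogue of the approach, not XML::Twig itself; the catalog document and the `titles_only` helper are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: only the <title> elements are of interest.
CATALOG = b"""<catalog>
  <book><title>Learning Perl</title><body>long text</body></book>
  <book><title>Programming Python</title><body>long text</body></book>
</catalog>"""


def titles_only(stream, wanted="title"):
    """Stream through the document, keeping the text of wanted elements only."""
    found = []
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == wanted:
            found.append(elem.text)
        # Discard the element's contents once it has been handled,
        # so the in-memory tree stays small.
        elem.clear()
    return found


print(titles_only(io.BytesIO(CATALOG)))
```

The parser still reads every byte of the input, but the program never holds the full tree, which is where most of the memory cost of a large document comes from.
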


The closest JavaScript comes to supporting XML is Xparse. This is a relatively small piece of JavaScript code, not a full parser, but it can read an XML document and draw a tree representation of it. It understands elements, processing instructions, comments, and text. The code can be found at http://www.jeremie.com/Dev/XML. It is a good piece of code to read if you are interested in learning how XML parsing is done: because it is not a full XML parser, it is less complex and therefore easier to read.
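To show what "drawing a tree representation" amounts to, here is a short Python sketch that prints the four node kinds mentioned above as an indented outline. It uses the standard `xml.dom.minidom` module and is not a port of Xparse; the sample document and the `draw_tree` helper are invented for illustration.

```python
from xml.dom import Node, minidom

# Hypothetical sample containing an element, a PI, a comment, and text.
SAMPLE = "<note><?render fast?><!-- draft --><to>Ann</to></note>"


def draw_tree(node, depth=0, lines=None):
    """Return an indented outline of elements, PIs, comments, and text."""
    lines = [] if lines is None else lines
    indent = "  " * depth
    if node.nodeType == Node.ELEMENT_NODE:
        lines.append(f"{indent}element: {node.tagName}")
    elif node.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
        lines.append(f"{indent}pi: {node.target}")
    elif node.nodeType == Node.COMMENT_NODE:
        lines.append(f"{indent}comment: {node.data.strip()}")
    elif node.nodeType == Node.TEXT_NODE and node.data.strip():
        lines.append(f"{indent}text: {node.data.strip()}")
    # Recurse into children; leaf node types simply have none.
    for child in node.childNodes:
        draw_tree(child, depth + 1, lines)
    return lines


tree_lines = draw_tree(minidom.parseString(SAMPLE).documentElement)
print("\n".join(tree_lines))
```
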


Tcl is another common scripting language. With commercial products like StoryServer, a Web content management system from Vignette Corp. that is written in Tcl, it's clear that XML support for Tcl is needed. The TclXML package from Steve Ball is available from http://www.zveno.com/zm.cgi/in-tclxml. It contains two parsers. The first, called TclExpat, is a Tcl interface to James Clark's expat XML parser. The other is written in Tcl itself and is commonly referred to as the native TclXML parser.


Python's support of XML is worth mentioning as well. As with many Python modules, the best place to start is the Python XML SIG and its PyXML distribution. The distribution may not have all the latest modules, but collectively it includes much of what you need to get started with XML. One of the modules, xmllib, is a non-validating, low-level parser. It works similarly to SAX in that the application programmer overrides a series of methods to handle various document elements.
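The override-a-method style described here can be sketched with Python's standard `xml.sax` module, used in place of xmllib (which is no longer shipped with current Python releases). The `TagCounter` class and `count_tags` helper are hypothetical names chosen for the example.

```python
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Count how often each element name starts in the document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser once per opening tag.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_tags(xml_bytes):
    """Run the event-driven parse and return the collected counts."""
    handler = TagCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.counts


print(count_tags(b"<list><item>a</item><item>b</item></list>"))
```

No tree is built at any point: the application sees each event once and keeps only what it chooses to record.
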

For speed, consider pyexpat, a wrapper for the expat parser (which is written in C). pyexpat is a non-validating parser. If you are looking for a validating parser, the only one I was able to find is xmlproc, which is written in Python. The PyXML distribution includes support for SAX and DOM. Given the dynamic nature of the SIG, you should always check its web site for the latest in Python XML development.
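As a rough illustration of what a non-validating expat wrapper gives you, the sketch below uses `xml.parsers.expat`, the module through which the Python standard library exposes expat today. The helper names `element_names` and `is_well_formed` are assumptions made for the example.

```python
import xml.parsers.expat


def element_names(xml_text):
    """Return element names in document order using an expat callback."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return names


def is_well_formed(xml_text):
    """Check well-formedness only; no DTD validation is performed."""
    try:
        xml.parsers.expat.ParserCreate().Parse(xml_text, True)
        return True
    except xml.parsers.expat.ExpatError:
        return False


print(element_names("<a><b/><c/></a>"))
print(is_well_formed("<a><b></a>"))
```

A document can pass `is_well_formed` and still violate its DTD, which is the gap a validating parser such as xmlproc is meant to close.
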


PHP was one of the first scripting languages to embrace XML, through James Clark's expat library. The latest version of the language, PHP4, includes support for the W3C DOM. Another effort within the PHP community aims to fully support Java classes. Once that is done, PHP will be able to use the XML parsing and XSL processing capabilities of Java. It seems, however, that the current path is to develop XML capabilities both natively within the language itself and through interfaces to Java. You can learn more about PHP and its XML capabilities by visiting http://www.xmlhack.com/read.php?item=338 and http://www.php.net/manual/ref.xml.php.


Scripting languages have been helping programmers crunch data for years, and they're quick, effective solutions. Since XML is rapidly becoming a popular data format, it makes sense for scripting languages to embrace it and provide programmers with interfaces for manipulating, reading, and writing XML documents. Popular scripting languages have done that by supporting SAX and DOM and, where it makes sense, by introducing innovative solutions based on the strengths of the particular language. We have also seen "new" languages forming in which the program itself is an XML document. The dynamics are exciting and very encouraging.

Author Bio

Piroz Mohseni is President of Bita Technologies, a consulting company which focuses on business improvement through effective usage of technology. His areas of interest include enterprise Java, XML, and e-commerce applications. Contact him at mohseni@bita-tech.com.
