XML and Scripting Languages

by Piroz Mohseni


In the old days, when data was stored in plain text files, scripting languages were quite popular. They provided a quicker way to read, process, and write text data than traditional compiled languages did. Languages like Perl gained a lot of popularity because of their regular expression support and their text-handling abilities. With a lot of data now moving to XML, it becomes an interesting exercise to see how scripting languages have evolved to support XML-based data.

Perl has its roots in text processing, and there are many CGI scripts written in Perl. Perl modules allow for the creation of independent code that performs specific functions. As you might expect, support for XML and XML-related standards is well established in the Perl community. The XML::Parser::PerlSAX module supports the SAX event-oriented interface. XML::DOM supports the DOM interface. For example, the following lines parse an XML document and create the DOM tree:

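use XML::DOM;

# Build a DOM parser; NoExpand => 1 leaves entity references unexpanded.
my $parser = XML::DOM::Parser->new(NoExpand => 1);
# Parse the file named in $cfgfile (set elsewhere) into a DOM tree.
my $doc = $parser->parsefile($cfgfile);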

You can then call methods on $doc, such as getElementsByTagName, to navigate the tree.

The XML::Twig module builds partial trees, which are good for handling large XML documents, something that many Java-based parsers still have trouble with. Often you are interested in only a portion of a large XML document, and the XML::Twig module gives you that flexibility and reduces the overhead of parsing the entire document. You can get a good feel for the depth and scope of XML coverage in the Perl community by visiting http://search.cpan.org/ and searching for XML.
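The partial-tree idea is not specific to Perl. As a rough illustration only, the sketch below uses Python (covered in the next section) and its standard-library xml.etree.ElementTree.iterparse function. This is a different tool from XML::Twig, not a port of it. The sample log document and the iter_matching and error_messages helpers are hypothetical names made up for this example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = (
    b"<log>"
    b"<entry level='info'>started</entry>"
    b"<entry level='error'>disk full</entry>"
    b"<note>ignored</note>"
    b"</log>"
)


def iter_matching(source, tag):
    """Yield each complete <tag> element as soon as its end tag is parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem
            # Once the caller is done with it, drop the element's text,
            # attributes, and children. The emptied element itself is still
            # referenced by its parent, so this is only a partial saving.
            elem.clear()


def error_messages(xml_bytes):
    """Collect the text of every <entry> whose level attribute is 'error'."""
    return [
        entry.text
        for entry in iter_matching(io.BytesIO(xml_bytes), "entry")
        if entry.get("level") == "error"
    ]


if __name__ == "__main__":
    print(error_messages(SAMPLE))
```

The caller only ever sees one entry element at a time, which is the property that makes this style attractive for documents too large to hold comfortably as a full tree.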

Python's support for XML is worth mentioning as well. As with many Python modules, the best place to start is the Python XML SIG (http://www.python.org/sigs/xml-sig/). The PyXML distribution (http://sourceforge.net/projects/pyxml) includes much of what you need to get started with XML. For example, xmlproc is a basic validating XML parser, PySAX gives you SAX 1 and SAX 2 compliant drivers, and 4DOM is a DOM Level 2 implementation. You should also visit http://pyxml.sourceforge.net/topics/software.html for additional XML-related software, including soaplib, Pyxie (parsers), XSV (schema validator), and PyTREX (a Python implementation of TREX).
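To make the SAX and DOM styles concrete, here is a minimal sketch. It assumes the xml.sax and xml.dom.minidom modules that ship in Python's standard library rather than the PyXML distribution itself. The catalog document and the helper names (TitleHandler, titles_via_sax, titles_via_dom) are invented for illustration.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document, invented for this illustration.
SAMPLE = """<catalog>
  <book id="b1"><title>Learning Perl</title></book>
  <book id="b2"><title>Programming Python</title></book>
</catalog>"""


class TitleHandler(xml.sax.ContentHandler):
    """SAX style: react to parser events and collect <title> text."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # The parser may deliver text in several chunks, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


def titles_via_sax(xml_text):
    """Stream through the document without building a tree."""
    handler = TitleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.titles


def titles_via_dom(xml_text):
    """DOM style: build the whole tree first, then query it."""
    doc = minidom.parseString(xml_text)
    return [node.firstChild.data for node in doc.getElementsByTagName("title")]


if __name__ == "__main__":
    print(titles_via_sax(SAMPLE))
    print(titles_via_dom(SAMPLE))
```

Both functions return the same list of titles. The SAX version never holds the whole document in memory, while the DOM version trades memory for the convenience of querying a tree.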

Tcl is another common scripting language. The TclXML package from Steve Ball of Zveno is available from http://www.zveno.com/zm.cgi/in-tclxml. It contains two parsers. The first, called TclExpat, is a Tcl interface to James Clark's expat XML parser. The other is a parser written in Tcl and is commonly referred to as the native TclXML parser. Tcl is also used to write XSLT extensions.

Scripting languages have been helping programmers crunch data for many years. They are very effective when quick solutions need to be created. Since XML is rapidly becoming a popular data format, it makes sense for scripting languages to embrace it and provide programmers with interfaces for reading, manipulating, and writing XML documents. Popular scripting languages have done that by supporting SAX and DOM and, where it makes sense, by introducing innovative solutions based on the strengths of the particular language. The dynamics are exciting and very encouraging.

About the Author

Piroz Mohseni is president of Bita Technologies, focusing on business improvement through the effective use of technology. His areas of interest include enterprise Java, XML, and e-commerce applications.

# # #

This article was originally published on Thursday Apr 26th 2001