Part two of a two-part article
In part one of this article, Scripting in and around Java, we introduced a large number of scripting languages based on the Java platform. We found that some were implementations or hybrids of existing scripting languages, some were created to script regular Java code, and others were new languages in their own right. In this part, we take a closer look at two of the scripting languages introduced previously: Compaq's Web Language (formerly known as WebL) and JPython. We include some example code that shows off language features and capabilities.
JPython is an implementation of the Python programming language targeted to the Java virtual machine. It has all the features of the Python programming language, including the standard Python library, and provides easy access to the entire Java class library. JPython was developed as an alternative implementation of the Python programming language that could exploit the portability of the Java virtual machine.
Python is a scripting language that is similar in many ways to Perl and Tcl. The main distinguishing characteristic is that Python is generally considered easier to read and write. It's an object-oriented language with a simple object model, and the syntax is fairly Pascal-like, avoiding "line-noise" modifiers and operators. Many people find Python to be a good language for rapidly engineering large, reusable applications.
Like most scripting languages, Python is dynamically typed; the programmer does not specify types for variables in the source code, and any type errors can only be caught at runtime. Python code can be put into source files and executed, or it can be typed directly into the interpreter at a prompt, with each statement executed as it is written. This feature is highly useful when combined with JPython. Because it is so easy to access Java classes from JPython, users can prod and poke Java applications interactively. That makes it simple to tweak Swing or AWT interfaces: add widgets, see how they look, then remove or reposition them.
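As a small illustration of the dynamic typing described above, the sketch below defines a function with no declared parameter type and shows a type error surfacing only when the offending call runs. The function name add_one is made up for this example.

```python
def add_one(value):
    # No parameter type is declared; any object may be passed in.
    return value + 1


ok = add_one(41)  # an int supports "+ 1"

try:
    add_one("41")  # a str does not; the error appears only when this line runs
    caught = None
except TypeError as exc:
    caught = exc
```

Typing the same statements at an interactive prompt shows the error immediately after the second call, which is the kind of quick feedback the interpreter is useful for.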
(J)Python provides all the things one expects from a very high-level language, including high-level data types such as hash tables (associative arrays) and lists of arbitrary length. It handles memory management automatically, as would generally be expected from this class of language. It also allows for dynamic evaluation of Python code. In the JPython environment, Python code is compiled to Java bytecode and run on the fly.
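The following sketch shows the high-level data types and dynamic evaluation mentioned above, in plain Python. The variable names are illustrative only.

```python
# A dictionary is Python's hash table (associative array).
ages = {"alice": 30, "bob": 25}
ages["carol"] = 41

# A list can grow to arbitrary length and can mix element types.
items = [1, "two", 3.0]
items.append(ages)

# eval() evaluates an expression held in a string at runtime.
total = eval("ages['alice'] + ages['bob']", {"ages": ages})

# exec() runs statements held in a string; here it defines a function
# inside an explicit namespace dictionary.
namespace = {}
exec("def double(x):\n    return 2 * x", namespace)
doubled = namespace["double"](21)
```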
A tutorial on Python by Guido van Rossum can be found at Python.org.
JPython vs. Python
There are some slight differences between Python and JPython, although they strive to be the same language. One of the most obvious differences is the accessibility of particular libraries. Python has its own standard library, and JPython has almost complete access. But JPython also gives access to the entire Java library, including Java's GUI facilities. In JPython, one can even subclass Java classes. For example, implementing the canonical "Hello, World!" applet in JPython is done by extending Java's Applet class:
    import java

    class ExampleApplet(java.applet.Applet):
        def paint(self, gc):
            gc.drawString("Hello, world!", 0, 0)
Notice that we've overridden the paint method in Java's Applet class. In the Python declaration, though, it takes two arguments instead of one. That is because Python methods do not have an implicit notion of "this", as exists in Java. Instead, the instance is passed as the first parameter when a method is called. Therefore, call "paint" with only one argument, but declare it with two. By tradition, the first parameter is always called "self".
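The declare-with-two, call-with-one rule can be seen in plain Python without any Java classes. The class name Greeter below is hypothetical and stands in for the applet.

```python
class Greeter:
    # Declared with two parameters: the instance ("self") and the text.
    def paint(self, text):
        return "painted: " + text


greeter = Greeter()

# Called with one argument; Python supplies the instance as "self".
result = greeter.paint("Hello, world!")

# The same call written out with the instance passed by hand.
same = Greeter.paint(greeter, "Hello, world!")
```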
One interesting issue is that Java allows a method to have multiple definitions, each with a different signature. Since Python is dynamically typed, a Python method cannot have multiple signatures. So what does JPython do? From Python, all overloads of a Java method appear to be a single method. The proper one to call is selected at runtime, based on the actual dynamic types of the arguments. There are other small differences between the two languages that are handled seamlessly. For example, Python's constructors are always named __init__, whereas in Java they are named after the class. JPython seamlessly translates calls to __init__ on a Java object into calls to the Java constructor.
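To make the idea of choosing an overload at runtime concrete, here is a minimal sketch in plain Python. It is not JPython's actual selection algorithm; the Overloaded class and the println example are invented for illustration, and the matching rule (first registered signature whose types all match) is an assumption.

```python
class Overloaded:
    """Holds several implementations and picks one per call by argument types."""

    def __init__(self):
        self._candidates = []  # list of (parameter types, function)

    def register(self, *types):
        def decorator(func):
            self._candidates.append((types, func))
            return func
        return decorator

    def __call__(self, *args):
        # Inspect the dynamic types of the actual arguments at call time.
        for types, func in self._candidates:
            if len(types) == len(args) and all(
                isinstance(arg, t) for arg, t in zip(args, types)
            ):
                return func(*args)
        raise TypeError("no overload matches %r" % (args,))


println = Overloaded()


@println.register(int)
def _println_int(value):
    return "int:%d" % value


@println.register(str)
def _println_str(value):
    return "str:" + value


@println.register(str, int)
def _println_str_int(text, times):
    return text * times
```

The caller sees a single name, println, just as a JPython script sees one method name for a set of Java overloads.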
Another interesting difference between the two languages is that Java only allows single inheritance, whereas Python allows for multiple inheritance. JPython's solution is to allow multiple inheritance, unless inheriting from a class implemented in Java, in which case only single inheritance is allowed.
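For readers who have not used multiple inheritance in Python, a minimal pure-Python sketch follows. The class names are illustrative; none of them is implemented in Java, so the single-inheritance restriction described above does not come into play.

```python
class Walker:
    def move(self):
        return "walk"


class Swimmer:
    def swim(self):
        return "swim"


# Duck inherits from both base classes at once.
class Duck(Walker, Swimmer):
    pass


duck = Duck()
abilities = [duck.move(), duck.swim()]

# The method resolution order lists the classes Python searches, in order.
order = [cls.__name__ for cls in Duck.__mro__]
```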
There are several more differences between the languages. Most of them are not things you would generally notice. For example, where JPython will treat "001.1" as a floating-point literal, non-Java implementations of Python will not allow it. More noticeably, JPython's string routines have Unicode support that Python still lacks. In many of the places where the two implementations differ, JPython's behavior is actually the more correct one! There are only a few places where JPython is not as functional as Python. In particular, a few modules in the Python standard library are not available for JPython, including the interface to the Tk widget set. That is no huge loss, since AWT and Swing are far more portable.
Below is a GUI-driven program that allows you to grep a mailbox for messages matching a certain specification. The pattern matching is done via regular expressions. Any matching messages are dumped to a temporary mailbox. The UI uses AWT and is very primitive; there are plenty of improvements one could easily make to this program. We went for bare-bones functionality to simplify the example; we wanted to be able to show how a JPython program could use both Java libraries (e.g., java.awt) and Python libraries (e.g., the regular expression library) at the same time.
On Unix machines, if JPython is installed where the script expects it, you can save this script as "mailgrep", make it executable, and then run it as a command. Otherwise, you should save it as "mailgrep.py" and run it through the JPython interpreter.
Notice in the example that Java's ActionListener functionality is circumvented by using a JPython shortcut. The actionPerformed parameter to AWT constructors automates all that work. JPython also takes advantage of JavaBeans properties, allowing the Python programmer to access them directly as if they were variables.
Figure 1. Mail Grep Utility GUI.
Listing 1. Source to grep a mailbox for messages matching a certain specification.
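Separately from the AWT program captioned above, the pattern-matching core of a mailbox grep can be sketched in plain Python. This is an assumption-laden illustration, not the article's listing: it has no GUI, it treats any line starting with "From " as a message boundary (the usual mbox convention), and the function names split_mbox and grep_mailbox and the sample addresses are invented.

```python
import re
import tempfile


def split_mbox(text):
    """Split mbox-style text into messages; each starts at a 'From ' line."""
    messages, current = [], []
    for line in text.splitlines(True):
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


def grep_mailbox(text, pattern):
    """Return the messages that match the regular expression."""
    regex = re.compile(pattern, re.IGNORECASE | re.MULTILINE)
    return [msg for msg in split_mbox(text) if regex.search(msg)]


SAMPLE = (
    "From alice@example.com Mon Jan  1 00:00:00 2000\n"
    "Subject: JPython release\n"
    "\n"
    "New build is out.\n"
    "From bob@example.com Mon Jan  1 00:01:00 2000\n"
    "Subject: lunch\n"
    "\n"
    "Noon?\n"
)

matches = grep_mailbox(SAMPLE, r"^Subject:.*jpython")

# Dump the matching messages to a temporary mailbox file.
with tempfile.NamedTemporaryFile("w", suffix=".mbox", delete=False) as out:
    out.writelines(matches)
    temp_path = out.name
```

In a JPython version, the same two functions could sit behind an AWT text field and button, with the button's actionPerformed handler calling grep_mailbox.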
Compaq's Web Language or WebL
Compaq's Web Language (formerly WebL) was originally developed at Digital, before Digital was bought by Compaq. It was created to process documents over the Internet, specifically HTML documents over HTTP. That simple introduction, however, does not do justice to the breadth and depth of the language's functionality. Compaq's Web Language (CWL hereafter) can also speak any Internet protocol that Java can, handles three different DTDs for HTML, and can process XML documents. Granted, vanilla Java and many other scripting languages can do these same things with enough engineering work. However, CWL provides some fairly powerful abstractions that hide the details and let the CWL script author focus on the functional logic needed to complete the task.
CWL's primitive or "value" types include all the usual suspects. Types such as Bool, Char, String, Int, and Real should be familiar to every programmer. CWL also provides Set and List, Fun(ction) and Meth(od), and Object. Objects in CWL contain fields, and fields can be any value type. Object-oriented programming in CWL is accomplished by adding Funs and Meths to Object types.
Since programming for the Internet can be a bit error-prone due to Web pages disappearing, network traffic problems, and the occasional distributed denial-of-service attack, CWL builds upon Java's exception-handling mechanism to provide a robust method for dealing with network connectivity issues. CWL services are used to return "page" objects, described in the next section. With a few simple statements, a script that fetches a URL from the Internet can: 1) time out and retry the same or a different URL; 2) attempt to connect to two different URLs, taking the result from the first service to finish; 3) keep trying to retrieve the URL forever; or 4) use any combination of 1, 2, and 3 that makes sense for your application.
The two main "service" routines fetch a URL. They take a URL and optional query parameters to simulate HTML form submissions and CGI queries, and both return a Page object. Two service calls can be combined for either sequential or concurrent execution using the "?" or "|" operators, respectively. The timeout and retry functions also return Page objects but take other service routines as parameters. The timeout function accepts a number of milliseconds, and the retry function will keep at it until it succeeds.
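The service combinators described above can be imitated in Python to show how they compose. This is a sketch under stated assumptions, not CWL code or its implementation: the names in_turn, race, timeout, and retry are invented, services are modeled as zero-argument callables that raise on failure, and the fake services return strings instead of fetching real URLs.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def in_turn(first, second):
    """Sequential combination: run first; if it fails, fall back to second."""
    def service():
        try:
            return first()
        except Exception:
            return second()
    return service


def race(first, second):
    """Concurrent combination: run both and take whichever succeeds first."""
    def service():
        pool = ThreadPoolExecutor(max_workers=2)
        try:
            futures = [pool.submit(first), pool.submit(second)]
            error = None
            for future in as_completed(futures):
                try:
                    return future.result()
                except Exception as exc:
                    error = exc
            raise error  # both services failed
        finally:
            pool.shutdown(wait=False)  # do not wait for the loser
    return service


def timeout(milliseconds, inner):
    """Fail if inner does not finish within the given number of milliseconds."""
    def service():
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            return pool.submit(inner).result(timeout=milliseconds / 1000.0)
        finally:
            pool.shutdown(wait=False)
    return service


def retry(inner, pause=0.0):
    """Keep calling inner until it succeeds."""
    def service():
        while True:
            try:
                return inner()
            except Exception:
                time.sleep(pause)
    return service


# Fake services standing in for real page fetches (hypothetical data).
def fast_page():
    return "<html>mirror</html>"


def slow_page():
    time.sleep(0.3)
    return "<html>primary</html>"


attempts = []


def flaky_page():
    attempts.append(1)
    if len(attempts) < 3:
        raise IOError("connection reset")
    return "<html>third try</html>"


page_a = in_turn(timeout(50, slow_page), fast_page)()  # times out, falls back
page_b = race(slow_page, fast_page)()                   # first to finish wins
page_c = retry(flaky_page)()                            # succeeds on try three
```

Because every combinator both accepts and returns a service, they nest freely, which is what lets a script express "a combination of 1, 2, or 3" in a single expression.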
When a Service or combination of Services succeeds, a Page object is returned. A Page object represents the hierarchical XML or HTML content of the requested URL. Content is parsed based upon the MIME type of the retrieved data. Given that most Web browsers will accept and render HTML that is poorly or downright incorrectly formatted, CWL must occasionally make changes to HTML content in order to create a proper hierarchical representation. A Page object is similar to the Document object in the W3C's DOM structure model. It would be nice if a CWL Page object did follow the DOM structure, but that's a choice CWL made before the DOM existed.
CWL provides a set of routines that operate on Page objects that allow for searching and modification of its content. There are also three Page-related data types: Tag, Piece, and PieceSet.
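As a rough analogue of searching a parsed page for particular elements, the Python sketch below collects every link in an HTML fragment, including one inside markup that is not properly closed. It uses Python's standard html.parser module rather than any CWL routine; the LinkCollector class, the find_links function, and the sample markup are all invented for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects each <a> element as a (href, text) pair."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_links(html):
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# The <p> element is never closed; the parser tolerates it.
PAGE_HTML = '<p>See <a href="/a">first <b>bold</b></a> and <a href="/b">second</a>'
links = find_links(PAGE_HTML)
```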
Using the standard modules packaged with CWL is accomplished through the import statement. Importing a module provides a script with all exported Objects and variables from the module. Modules are CWL statements stored in ".webl" files that are executed when imported. CWL comes with just under a dozen modules. Modules for Strings, URLs, and Cookies are self-explanatory. CWL also includes a WebServer module and a module to control a Web browser. Probably the most powerful module is the Java module.
The Java module allows CWL access to any Java class in the CLASSPATH when CWL is invoked. Java objects can be created and their public methods invoked through the use of the Java module and the special "j-object" CWL type. This module expands CWL's functionality to include any Java APIs not explicitly available through a module or through a built-in.
Converting CWL primitive values into Java value types, and vice versa, adds some slight overhead. Because of this conversion, when an overloaded Java method or constructor differs only in its value-type parameters, CWL uses the "widest" value type to determine which overload to invoke.
For example, the Java module can be used to gain access to the JDBC API. Just about every portal provides stock quotes. Using CWL's built-in functionality, a script can retrieve a page of stock quotes from the portal's site and extract the ticker symbols and values from the Page object. It can then store the values in a database, using the Java module to access the JDBC API. This script could be run every day after the markets close, or periodically during the day to watch for sudden movements.
Listing 2. Using the Java module.
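The extract-and-store flow of the stock-quote scenario can also be sketched in Python. This is an analogue, not the CWL listing: it uses Python's sqlite3 module where the CWL script would use JDBC through the Java module, it parses a hard-coded HTML fragment instead of fetching a portal page, and the table layout, class names, ticker symbols, and function names are all made up.

```python
import re
import sqlite3

# Stand-in for a fetched quotes page (hypothetical markup and data).
PAGE = """
<table>
  <tr><td class="sym">ABC</td><td class="val">12.50</td></tr>
  <tr><td class="sym">XYZ</td><td class="val">101.25</td></tr>
</table>
"""

ROW = re.compile(r'<td class="sym">(\w+)</td><td class="val">([\d.]+)</td>')


def extract_quotes(html):
    """Pull (symbol, price) pairs out of the page text."""
    return [(symbol, float(value)) for symbol, value in ROW.findall(html)]


def store_quotes(conn, quotes):
    """Insert the quotes into a table, creating it on first use."""
    conn.execute("CREATE TABLE IF NOT EXISTS quotes (symbol TEXT, price REAL)")
    conn.executemany("INSERT INTO quotes (symbol, price) VALUES (?, ?)", quotes)
    conn.commit()


conn = sqlite3.connect(":memory:")
store_quotes(conn, extract_quotes(PAGE))
rows = conn.execute("SELECT symbol, price FROM quotes ORDER BY symbol").fetchall()
```

Run on a schedule, a script of this shape would append one batch of rows per run, which is enough to watch for the sudden movements mentioned above.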
In this article, we've explored two Java-based scripting languages. For those already fluent in Python, JPython should be an easy next step. JPython is also quite good for automating unit and system testing, since it can work directly with Java classes and objects. Compaq's Web Language specializes in fetching, parsing, and generating HTML and XML content.
About the Authors
Tom O'Connor is an application engineer at Surety.com.
John Viega is a senior research associate and consultant at Reliable Software Technologies, as well as an adjunct professor of computer science at the Virginia Polytechnic Institute. He is the author of Mailman, the GNU mailing list manager, and ITS4, a tool for finding security vulnerabilities in C and C++ code.