May 25, 1999
Data access via the Web
by Piroz Mohseni
When the Web revolution began, there was a huge rush to "Web-enable" various corporate resources, especially databases.
This phenomenon was perhaps best exemplified (and publicized) by the Web version of the Federal Express package-tracking system: any customer, from anywhere, could get tracking information from the FedEx Web site. The push to bring databases to the Web is still going on; in fact, some corporate databases use the Web as their only user interface. We are, however, seeing another phenomenon emerge: applications that act as integrators for other applications.
Suppose a company has three major database systems; call them A, B, and C. Each has been Web-enabled through technologies such as servlets, CGI, and application servers (e.g., ColdFusion). It is now possible to write a Web application that uses systems A, B, and C as its data sources and provides a consistent user interface to all three. Rather than accessing each database directly through technologies like ODBC, JDBC, or CORBA, you can reuse the Web interface that was already built for it. Granted, this integration scheme will not work in every case. For example, if the data or feature you need is not Web-enabled, you have to find another way to reach it.
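One way to structure such an integrator, sketched in Java under stated assumptions: each Web-enabled system sits behind a small adapter, and the integrator queries all of them with the same key so the user sees one consistent result. The WebSource interface and the source names are hypothetical; a real adapter would send HTTP requests to that system's existing Web front end and parse the HTML it returns.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an integrator over several Web-enabled systems (A, B, C).
// The WebSource interface is hypothetical: each implementation would talk
// HTTP to one system's existing Web front end.
final class Integrator {

    // One adapter per back-end system.
    @FunctionalInterface
    interface WebSource {
        String lookup(String key) throws Exception;
    }

    private final Map<String, WebSource> sources;

    Integrator(Map<String, WebSource> sources) {
        this.sources = new LinkedHashMap<>(sources); // keep the caller's order
    }

    // Queries every source with the same key and collects the answers by
    // source name, giving the caller one consistent view.
    Map<String, String> lookupAll(String key) {
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, WebSource> e : sources.entrySet()) {
            try {
                results.put(e.getKey(), e.getValue().lookup(key));
            } catch (Exception ex) {
                // A site that is down or has changed should not sink the whole page.
                results.put(e.getKey(), "unavailable: " + ex.getMessage());
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Stubs standing in for real HTTP adapters to systems A, B, and C.
        Map<String, WebSource> stubs = new LinkedHashMap<>();
        stubs.put("A", key -> "record " + key + " from A");
        stubs.put("B", key -> "record " + key + " from B");
        stubs.put("C", key -> "record " + key + " from C");
        System.out.println(new Integrator(stubs).lookupAll("42"));
    }
}
```

Catching failures per source is a design choice: one unreachable site degrades a single entry instead of the whole integrated page.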
Consider an integrated package-tracking system. You ask the user to type a tracking number and select the shipper from a pull-down menu. Your program then accesses that shipper's Web site, interacts with it as if a user were visiting the site, retrieves the tracking information, and presents it to the user. The user sees only your user interface; the back-end processing is hidden. Fortunately, most programming languages provide HTTP support, either built in or through libraries, so you can simulate a browser programmatically.
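A minimal Java sketch of the dispatch step, assuming each shipper's tracking page accepts the tracking number as a query parameter. The shipper names and URLs are hypothetical placeholders, not real endpoints.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an integrator's dispatch step: map the shipper chosen from the
// pull-down menu to that shipper's tracking URL.
// NOTE: the shipper names and URLs below are hypothetical placeholders.
final class TrackingDispatcher {
    private static final Map<String, String> TRACKING_URLS = new LinkedHashMap<>();
    static {
        TRACKING_URLS.put("ShipperA", "http://tracking.shipper-a.example/track?number=");
        TRACKING_URLS.put("ShipperB", "http://www.shipper-b.example/cgi-bin/track.cgi?id=");
    }

    // Builds the URL our program would request on the user's behalf.
    static String trackingUrl(String shipper, String trackingNumber) {
        String base = TRACKING_URLS.get(shipper);
        if (base == null) {
            throw new IllegalArgumentException("Unknown shipper: " + shipper);
        }
        // URL-encode the user's input so spaces and symbols survive the trip.
        return base + URLEncoder.encode(trackingNumber, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(trackingUrl("ShipperA", "1Z 999"));
    }
}
```

Here trackingUrl("ShipperA", "1Z 999") yields http://tracking.shipper-a.example/track?number=1Z+999. Keeping the per-site details in one table means adding a GET-based shipper is a data change rather than new code.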
To do this, you need to familiarize yourself with the site you want to integrate into your application: which program processes each HTML form, and what output each program produces. In HTTP, FORM data is sent to a back-end program (e.g., a CGI script or a servlet) using either the GET method or the POST method. With GET, the data is appended to the URL as a query string; with POST, it is sent in the body of the request. Once you have this information, you are ready to begin the integration. For discussion purposes, we assume there is an HTML form that accepts a last name and an ID number and returns the person's telephone number, fax number, and e-mail address. We will show code fragments for accomplishing this simple interaction in Perl (CGI) and Java (servlet). Here is the HTML form, which is probably displayed on a page surrounded by advertisements and other promotional material.
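To make the GET/POST distinction concrete, here is a small Java sketch that builds the request a browser would send for our form. The field names lastname and id, and the action path /getinfo.cgi, are assumptions for illustration; the real field names come from the NAME attributes of the site's INPUT tags. Both methods use the same application/x-www-form-urlencoded encoding; only where the data travels differs.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: how browser-submitted FORM data is encoded for GET and for POST.
// The field names "lastname" and "id" and the path "/getinfo.cgi" are hypothetical.
final class FormEncoding {

    // application/x-www-form-urlencoded: URL-encoded name=value pairs joined by '&'.
    static String encode(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    // GET: the encoded data is appended to the ACTION URL after a '?'.
    static String getRequestLine(String action, Map<String, String> fields) {
        return "GET " + action + "?" + encode(fields) + " HTTP/1.0";
    }

    // POST: the same encoded data travels in the request body instead,
    // announced by Content-Type and Content-Length headers.
    static String postRequest(String action, Map<String, String> fields) {
        String body = encode(fields);
        return "POST " + action + " HTTP/1.0\r\n"
                + "Content-Type: application/x-www-form-urlencoded\r\n"
                + "Content-Length: " + body.getBytes(StandardCharsets.UTF_8).length + "\r\n"
                + "\r\n"
                + body;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("lastname", "O'Brien");
        fields.put("id", "12345");
        System.out.println(getRequestLine("/getinfo.cgi", fields));
        System.out.println(postRequest("/getinfo.cgi", fields));
    }
}
```

Running main prints GET /getinfo.cgi?lastname=O%27Brien&id=12345 HTTP/1.0 followed by the POST request; note that the apostrophe in O'Brien becomes %27 either way.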
Perl
It is very easy to simulate a browser in Perl. Almost all the relevant HTTP functions are wrapped in a package called the Library for WWW access in Perl (LWP). For our case we use LWP::UserAgent, which for our purposes is simply a mini-browser. Because you are accessing Web resources programmatically rather than through a GUI browser, you need to be familiar with how the HTTP protocol works. After the user fills out the HTML form and clicks the Submit button, the form's contents are packaged into an HTTP request message and sent to the server specified by the form's action attribute. In this case, since the action does not name a server explicitly, the request goes to the same server that hosted the HTML form. That server passes the information on to the CGI program called getinfo.cgi. The CGI program does some processing and generates output, which the server sends back to the browser (in this case, our program) as an HTTP response message. Our program must then interpret that response and extract the information it needs. Note that the response contains HTML: our data is mixed with HTML tags, so our code must do some data extraction. For example, we may get a response like this:
User Information
Telephone: 123-123-1234
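The Perl version relies on LWP::UserAgent; as a rough Java counterpart, the following sketch uses the standard HttpURLConnection class as the mini-browser and a regular expression for the extraction step. It assumes a hypothetical URL and form field names, a UTF-8 (or plain ASCII) page, and that each value in the response follows a "Label:" pattern like the Telephone line above. Treat it as a sketch under those assumptions rather than a definitive implementation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of a Java "mini-browser": submit form data with POST, read the
// HTML response, and extract labelled values from it.
// ASSUMPTIONS (hypothetical): the target URL, the form field names, and a
// response in which each value follows "Label:" as in the example above.
final class GetInfoClient {

    // Sends the form data as a browser would after Submit is clicked and
    // returns the body of the HTTP response (the HTML from the CGI program).
    static String post(String url, String formBody) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(formBody.getBytes(StandardCharsets.UTF_8));
        }
        // Assumes the page is UTF-8 (or plain ASCII).
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    // Returns the text following "label:" up to the next tag or line break,
    // or null if the label does not appear in the HTML.
    static String extract(String html, String label) {
        Pattern p = Pattern.compile(Pattern.quote(label) + ":\\s*([^<\\r\\n]+)");
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Offline demo; the surrounding tags are assumed markup.
        String html = "<HTML><BODY><H2>User Information</H2>\n"
                + "Telephone: 123-123-1234<BR>\n</BODY></HTML>";
        System.out.println(extract(html, "Telephone")); // prints 123-123-1234
    }
}
```

Against a live site you would call something like post("http://server.example/getinfo.cgi", "lastname=Smith&id=12345") and pass the returned HTML to extract(html, "Telephone"). Matching on page text is fragile: if the site changes its layout or labels, the extraction breaks, so it pays to keep the patterns in one place.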