perrochon@acm.org
This article deals with the problem of integrating other information systems (IS) into the World Wide Web (W3). It defines some notions and concepts and gives a short overview of the different approaches.
The first experiments took place in summer 1992, when Arthur Secret at CERN implemented HTOracle, a simple gateway from W3 to Oracle. HTOracle proved the feasibility of the approach. Since then, ongoing work has resulted in more sophisticated products. In fact, many organizations are now starting to build internal applications based on IS and the W3. Cross-platform development, client/server architecture, simplicity of development and ease of use are very attractive reasons to take a close look at the W3.
However, in contrast to "traditional" client/server computing, there is almost no "middleware" available for the W3. The fancy tools available to build complex client/server systems are not compatible with the W3, so W3 "middleware" has to be implemented from scratch. That said, many products have been brought to market since the author wrote the first sentences of this article some months ago, and the situation should soon become much more comfortable.
This article does not try to evaluate individual products. Given the speed of improvements and the variety of products, that is beyond the scope of a single article, and any such evaluation would be outdated the day it was finished. This article only classifies the different approaches to the problem and gives some hints on how future developments can be judged and classified.
Nevertheless, Table 2 contains a list of currently available products for IS-W3 integration. Using existing products has several advantages over "home-baked" solutions. Besides reducing the costs and risks of development, products (especially commercial products) are quite well tested and are used in many places, so a lot of knowledge and support is available.
A legacy IS is an IS that significantly resists modification and evolution. In our context, the term stands for all kinds of traditional IS that were designed and implemented without regard for the special needs of today's global IS like the W3. At the moment, this still includes all commercial database management systems. In the following, we discuss how to connect a W3 browser to a legacy IS despite the resistance of the legacy IS.
One of the problems of legacy IS is their statefulness. State information is the sum of all the data stored inside the legacy IS concerning the ongoing interaction with a client. Storing state information about the history of communications allows stateful protocols. A stateful protocol is a protocol in which the meaning of a message depends on previous messages. Stateful information servers and stateful protocols are closely related. There are good reasons for stateful servers, e.g. efficiency: less information has to be transported over the network. Unfortunately, W3 browsers do not support state. A stateless server is always in the same (single) state, and its response to one and the same query is always the same, independent of previous interactions. The Hypertext Transfer Protocol (HTTP) is by definition a stateless protocol.
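The difference between the two kinds of server can be illustrated with a minimal Python sketch. The class and function names (StatefulCursor, stateless_handle) and the sample rows are assumptions made up for this illustration; they do not describe any particular product.

```python
class StatefulCursor:
    """Stateful server: the meaning of "next" depends on previous messages."""

    def __init__(self, rows):
        self._rows = rows
        self._position = 0  # state kept on the server between requests

    def handle(self, message):
        if message == "first":
            self._position = 0
        elif message == "next":
            self._position += 1
        else:
            raise ValueError(f"unknown message: {message!r}")
        return self._rows[self._position]


def stateless_handle(rows, request):
    """Stateless server: the request itself carries everything needed."""
    return rows[request["index"]]


ROWS = ["row 0", "row 1", "row 2"]  # made-up data

cursor = StatefulCursor(ROWS)
# The same message "next" yields a different answer each time.
stateful_replies = [cursor.handle("first"), cursor.handle("next"), cursor.handle("next")]
# The same request always yields the same answer.
stateless_replies = [stateless_handle(ROWS, {"index": 1}) for _ in range(2)]
```

The stateful variant sends shorter messages, which is the efficiency argument made above, but a repeated "next" is only meaningful if the server remembers what came before.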
To our knowledge, no gateway on the market deals with this problem properly and to its full extent. A gateway can easily be tested by resubmitting a cached page. Normally this page was generated earlier, when the legacy IS was in another state. In the meantime things have changed, and the old page cannot be processed correctly.
An IS can be considered as having three components: one or several applications, a database service or database management system (DBMS), and the data. The architecture of legacy IS varies widely from a set of well-structured modules to unstructured and non-decomposable blocks.
The W3 client can be considered
as a new application with a new user interface, which is added to the legacy
IS. Ideally, we want to connect the W3 browser directly to the DBMS of the legacy
IS. Unfortunately, some legacy IS do not have a DBMS module. If they do, the
DBMS may not support the type of connectivity required by the W3 (e.g. stateless
protocol, normally no user identification, ...).
On the W3, most information is stored as static Hypertext Markup Language (HTML) documents. When legacy IS are accessed, the hypertext needs to be generated dynamically. As it is not the purpose of a DBMS to produce HTML on the fly, another component is required.
Gateways bridge the gap. In general, gateways insulate certain components of an IS from changes being made to other components. A gateway is a program that allows two independent information systems to exchange data. A gateway can alter data in both structure and format, but it does not change the values of the data.
According to these definitions, gateways are the solution to our problem of integrating the W3 and legacy information systems. Gateways translate requests and data between the W3 browser and the legacy IS. In our context, gateways especially help developers to concentrate on the hard part of the integration: designing the new W3 user interface. Low-level aspects of communication are more or less hidden.
The gateways classified below interact simultaneously with the W3 and the legacy IS. The following two-dimensional classification is based on the design of these two interactions. First we classify depending on the interaction with the IS, then depending on the interaction with the W3 client.
Some gateways access the data
files of the legacy IS directly. This is only possible if the format of the data
is known and the files are accessible.
With sophisticated gateways, only the format of the data in the files is described, queries are specified, and output masks for the presentation of the retrieved data are designed. The queries are then interpreted in the gateway. If no suitable gateway is available, one has to be developed with one of the techniques mentioned below (CGI, SSI).
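The three ingredients named above (format description, query, output mask) can be sketched in a few lines of Python. The fixed-width record layout, the field names, the mask syntax and the sample data are all assumptions invented for this illustration.

```python
from html import escape

# Hypothetical description of a fixed-width legacy data file:
# (field name, start column, end column).
FORMAT = [("id", 0, 4), ("name", 4, 14), ("city", 14, 24)]

# Hypothetical output mask for one retrieved record.
MASK = "<li>{name} ({city})</li>"


def parse_line(line, fmt=FORMAT):
    """Cut one line of the data file into named fields."""
    return {name: line[start:end].strip() for name, start, end in fmt}


def run_query(lines, field, value):
    """Interpret a simple equality query inside the gateway."""
    records = (parse_line(line) for line in lines)
    return [rec for rec in records if rec[field] == value]


def render(records, mask=MASK):
    """Fill the output mask with each record and wrap the result in a list."""
    items = "".join(
        mask.format(**{key: escape(val) for key, val in rec.items()})
        for rec in records
    )
    return f"<ul>{items}</ul>"


# Made-up contents of a legacy data file.
DATA = [
    "0001Meier     Zurich    ",
    "0002Dupont    Geneva    ",
    "0003Keller    Zurich    ",
]
html_page = render(run_query(DATA, "city", "Zurich"))
```

Only FORMAT and MASK would have to change for a different data file; the interpreting code stays the same, which is what makes a generic data gateway attractive.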
The data gateway approach is especially successful if the legacy data is stored in the Standard Generalized Markup Language (SGML) format or can be exported to SGML. As HTML is an application of SGML, such documents can be integrated into the W3 very easily.
If we have a legacy IS with
a modern DBMS, the gateway should access the DBMS. This is the most convenient
approach and thus normally selected for commercial gateways and "home-baked" solutions.
Simple gateways replace direct access to data files with simple queries to the
DBMS, which then acts as a search engine. The presentation of the retrieved data
is similar to the first approach. More complex gateways also exploit the programming
features of databases (e.g. stored procedures) and very powerful W3 applications
can be programmed and stored mostly inside the database. Some DBMS companies already
offer their own gateways. For the most important DBMS, public domain gateways are
freely available.
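A simple database gateway of the kind described above might look like the following Python sketch. It uses an in-memory SQLite database as a stand-in for the legacy DBMS; the table, its contents and the function name query_to_html are assumptions for illustration only.

```python
import sqlite3
from html import escape


def query_to_html(conn, sql, params=()):
    """Run a query and present the result set as an HTML table."""
    cursor = conn.execute(sql, params)
    header = "".join(f"<th>{escape(col[0])}</th>" for col in cursor.description)
    rows = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in cursor
    )
    return f"<table><tr>{header}</tr>{rows}</table>"


# Stand-in for the legacy DBMS: an in-memory SQLite database with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (name TEXT, stock INTEGER)")
conn.executemany("INSERT INTO parts VALUES (?, ?)", [("bolt", 40), ("nut", 0)])

# The DBMS acts as the search engine; the gateway only formats the answer.
table_html = query_to_html(
    conn, "SELECT name, stock FROM parts WHERE stock > ?", (0,)
)
```

The query is passed with a parameter placeholder rather than pasted together from request text, so that values coming from the W3 client cannot alter the SQL statement.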
Database gateways can be further classified by the number of layers between the gateway and the DBMS. While many gateways access the DBMS directly, some have an additional layer in between, e.g. one based on the Open Database Connectivity (ODBC) standard.
The third approach is based
on the following idea: As legacy IS already have a user interface, a gateway that
accesses the user interface could be developed. It handles the translation from
a stateless protocol like the Hypertext Transfer Protocol (HTTP) into the stateful
one required by most legacy IS applications (e.g. VT-100, IBM's 3270). This approach
is especially useful if no other interface to access the legacy IS exists.
For generic translators the user interface (and not the data!) of a legacy IS is described in a formal way [Perrochon et al. 95]. This description mainly covers form and function, and not the "look and feel".
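One conceivable way to translate stateless requests into a stateful dialogue is to let every request carry the complete sequence of key presses and to replay it in a fresh session. The Python sketch below illustrates only this idea; it is an assumption of this example and not the method of any gateway cited here. The toy terminal, its screens and its keys are made up.

```python
class LegacyTerminal:
    """Toy stand-in for a stateful terminal application (hypothetical)."""

    # A very small formal description of the user interface:
    # (current screen, key pressed) -> next screen.
    SCREENS = {
        ("main", "1"): "orders",
        ("main", "2"): "customers",
        ("orders", "b"): "main",
        ("customers", "b"): "main",
    }

    def __init__(self):
        self.screen = "main"  # the state lives inside the legacy application

    def send_key(self, key):
        # Unknown keys leave the current screen unchanged.
        self.screen = self.SCREENS.get((self.screen, key), self.screen)
        return self.screen


def handle_request(keys):
    """Serve one stateless request by replaying its full key history."""
    terminal = LegacyTerminal()  # fresh session, always starts at "main"
    for key in keys:
        terminal.send_key(key)
    return f"<h1>{terminal.screen}</h1>"


first = handle_request(["1"])
again = handle_request(["1"])  # resubmitting the request gives the same page
other = handle_request(["1", "b", "2"])
```

Replaying is simple and makes resubmitted requests behave predictably, but it costs one full dialogue per request and is only safe when the replayed steps do not modify data in the legacy IS.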
With the Common Gateway Interface (CGI), when a client sends a
request to a server (1), the server starts an external program and forwards
the request to this program (2). The program then calculates the results with
the help of the IS (3) and sends them back to the W3 server (4). Afterwards,
the W3 server sends them to the client (5).
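The external program in this scheme can be very small. The following Python sketch assumes that the W3 server passes the query in the QUERY_STRING environment variable and reads the reply from standard output, as CGI servers do; the lookup function and its phone book are made-up stand-ins for the legacy IS.

```python
import os
from html import escape
from urllib.parse import parse_qs


def lookup(name):
    """Hypothetical stand-in for the call into the legacy IS (step 3)."""
    phone_book = {"meier": "555-0101", "dupont": "555-0102"}
    return phone_book.get(name.lower(), "unknown")


def cgi_response(environ):
    """Build the complete CGI output: header line, blank line, HTML body."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", [""])[0]
    body = f"<p>{escape(name)}: {escape(lookup(name))}</p>"
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # Step 4: the result goes back to the W3 server via standard output.
    print(cgi_response(os.environ), end="")
```

Because the server starts a new process for every request, the program cannot keep any state between two requests unless it stores that state elsewhere.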
A server side include (SSI)
consists of a special sequence of characters (tokens) inside an HTML page. As
the page is sent from the W3 server to the requesting client (2,5), the page is
scanned by the server for these special tokens. When a token is found, the server
interprets the instructions in the token and performs an action based on the token
data (3,4). This is similar to the "mail-merge" function of a word processor where
a document is merged with a database. However, in our case, the result is only
one single document with all the data in it.
Some of the gateways listed in Table 2 are based on SSI. SSI can be implemented in the server itself or can be provided by a CGI program that merges the HTML document and the data from the legacy system.
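The merging step can be sketched in Python as a simple token substitution. The token syntax used here is hypothetical and only loosely modelled on HTML comments; real servers define their own directives. The data values are made up.

```python
import re

# Hypothetical token syntax for this sketch.
TOKEN = re.compile(r'<!--#data key="([^"]+)" -->')


def expand_includes(page, fetch):
    """Replace every token in the page with the data fetched for its key."""
    return TOKEN.sub(lambda match: fetch(match.group(1)), page)


LEGACY_DATA = {"stock": "40", "price": "1.20"}  # made-up legacy values

template = '<p>In stock: <!--#data key="stock" -->, price: <!--#data key="price" --></p>'
merged = expand_includes(template, lambda key: LEGACY_DATA.get(key, ""))
```

The page author works with ordinary HTML and only marks the places where data is to appear; the fetch function hides how the legacy IS is actually queried.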
Some of the gateways are their
own server. The advantage of this is normally a big improvement in performance
and a greater flexibility when designing the new W3 application. This is the way
all database companies go. They are selling (or announcing) dedicated servers
that are highly optimized to interact with their DBMS.
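A gateway that is its own server answers HTTP requests directly instead of being started by a separate W3 server. The Python sketch below uses the standard http.server module; the /stock path, the part parameter and the STOCK dictionary standing in for the DBMS are assumptions for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

STOCK = {"bolt": 40, "nut": 0}  # made-up stand-in for the DBMS


def render(path):
    """Map a request path such as /stock?part=bolt to a status and a page."""
    url = urlparse(path)
    part = parse_qs(url.query).get("part", [""])[0]
    if url.path != "/stock" or part not in STOCK:
        return 404, "<p>not found</p>"
    return 200, f"<p>{part}: {STOCK[part]}</p>"


class GatewayHandler(BaseHTTPRequestHandler):
    """Answers GET requests itself; no separate W3 server or CGI process."""

    def do_GET(self):
        status, html = render(self.path)
        body = html.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the gateway as its own server until interrupted."""
    HTTPServer(("", port), GatewayHandler).serve_forever()
```

Calling serve() starts the gateway on the given port. Since the process keeps running between requests, it can hold an open connection to the DBMS, which is one plausible source of the performance gain mentioned above.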
Recently, a new approach has
emerged. The idea is to distribute the code of the application and send it to
the client. The client then executes the code locally on the user's machine. Although
the existing application of the legacy system will probably not fit completely
into the W3, this idea allows parts of the user interface of the legacy
system to be rebuilt inside the W3. Of course, there must be additional access control mechanisms,
usually including authentication, to maintain security and system integrity of
the client computer.
The simplest approach is to send a compiled program to the user and start it there. Besides considerable security risks, this approach depends on the user's computer type.
There are also interpreted mobile code systems available. Java (<http://java.sun.com/>) from Sun is designed especially for the W3. Java is a programming language similar in syntax to C++, but similar in other ways to Smalltalk and Objective-C. The system supports secure loading, so that code from untrusted sources can be added dynamically. The demonstration application for Java, HotJava, is a complete web browser. Applications written in Java are executable on any computer with a Java-compatible browser. Another approach is based on Safe-Tcl [van Doorn et al. 95].
Mobile code systems (MCS) are still a matter of promising research. So far, no gateways based on the MCS approach are available.
There are many gateways available already. The future will bring us specialized W3 servers, each tailored for one IS product and supported by its vendor. This conforms to the right-sizing approach, which requires separate modules for separate function blocks.
[Perrochon et al. 95] Perrochon, L., Fischer, R. IDLE: Unified W3 Access to Interactive Servers. The Third International World Wide Web Conference 1995 (WWW'95), Darmstadt, April 1995. Computer Networks and ISDN Systems 27 (1995) 927-938. See <http://www.inf.ethz.ch/department/IS/ea/tsp/>.
[Ronchetti 95] Ronchetti, M. Face Lift: using WWW technology for an external re-engineering of old applications. The Third International World Wide Web Conference 1995 (WWW'95), Darmstadt, April 1995. See <http://www.inf.unitn.it/~ronchet/CBT/>.
[Varela et al. 94] Varela, C. A., Hayes, C. C. Providing Data on the Web: From Examples to Programs. The Second International WWW Conference '94: Mosaic and the Web, Chicago, USA, October 1994.
Table 2. Overview of available products for the integration of legacy systems into the W3.