W3 "Middleware": Notions and Concepts

Louis Perrochon, Institut für Informationssysteme, ETH Zürich, Switzerland.
perrochon@acm.org
Abstract The "middleware" of the World Wide Web consists of a collection of CGI gateways and server modifications, many of them still research prototypes. However, there is a fast-growing market for products that allow the integration of legacy information systems into the World Wide Web. This article presents some concepts and notions for this very important context. These concepts allow the comparison of different approaches and solutions.

Keywords:
World Wide Web (W3, WWW), database, legacy system, integration, coexistence.

1. Introduction

We urgently have to discuss the integration of legacy information systems into the World Wide Web (W3) for two main reasons: most of the data already stored on computer systems resides inside legacy information systems, and much of the new data not yet stored in computer systems will go into legacy information systems. The second reason needs some additional explanation: W3, the major hype of today's information technology and the most successful information system ever, is still largely based on file system technology. We prefer to use database technology, which has been developed to manage huge amounts of data. Unfortunately, even modern databases must still be considered legacy systems when they are to be integrated into the W3.

This article deals with the problem of integrating other information systems (IS) into the World Wide Web (W3). It defines some notions and concepts and gives a short overview of the different approaches.

The first experiments took place in summer 1992, when Arthur Secret at CERN implemented HTOracle, a simple gateway from W3 to Oracle. HTOracle proved the feasibility of the approach. Since then, ongoing work has resulted in more sophisticated products. In fact, many organizations are now starting to build internal applications based on IS and the W3. Cross-platform development, client/server architecture, simplicity of development and ease of use are very attractive reasons to have a close look at the W3.

Figure 1

However, in contrast to "traditional" client/server computing, there is almost no "middleware" available for the W3. The fancy tools available to build complex client/server systems are not compatible with the W3, so W3 "middleware" is implemented from scratch. However, since the author wrote the first sentences of this article some months ago, many products have been brought to the market. Soon we will have a very comfortable situation.

This article does not try to evaluate different products; that is beyond the scope of a single article because of the speed of improvements and the variety of products. Any such evaluation would be outdated the day it is finished. This article only classifies different approaches to the problem and gives some hints on how future developments can be judged and classified.

Nevertheless, Table 2 contains a list of currently available products for IS-W3 integration. Using existing products has several advantages over "home-baked" solutions. Besides reducing the costs and risks of development, products (especially commercial products) are quite well tested and are used in many places, so there is a lot of knowledge and support available.


2. Definitions

The term information system (IS) stands for a computer system providing information to users. It encompasses the data, often a database management system (DBMS), always applications, support programs, etc. We use the term information server to stress the idea that parts of an IS are accessed (remotely) by other programs. These other programs are called clients or browsers. (Client is the more general term. Browsers are only those clients that allow a human user to access information interactively.) Browsers and information servers together form a new IS. The above definition of IS includes traditional systems, which are accessed by terminal emulation programs, as well as newer systems like the W3.

A legacy IS is an IS that significantly resists modification and evolution. In our context, it stands for all kinds of traditional IS that were designed and implemented without regard for the special needs of today's global IS like the W3. At the moment, this still includes all commercial database management systems. In the following, we will discuss how to connect a W3 browser to a legacy IS despite the resistance of the legacy IS.

One of the problems of legacy IS is their statefulness. State information is the sum of all the data stored inside the legacy IS concerning the ongoing interaction with a client. Storing state information about the history of communications allows stateful protocols. A stateful protocol is a protocol in which the meaning of a message depends on previous messages. Stateful information servers and stateful protocols are closely related. There are good reasons for stateful servers, e.g. efficiency: less information has to be transported over the network. Unfortunately, W3 browsers do not support state. A stateless server is always in the same (single) state, and its response to one and the same query is always the same, independent of previous interactions. The Hypertext Transfer Protocol (HTTP) is by definition a stateless protocol.
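The difference between the two kinds of server can be illustrated with a minimal sketch. The class and function names below are hypothetical and chosen only for illustration; they do not correspond to any product discussed in this article.

```python
class StatefulServer:
    """The meaning of "next" depends on the messages received before it."""

    def __init__(self, records):
        self.records = records
        self.position = 0  # state information kept inside the server

    def handle(self, message):
        if message == "first":
            self.position = 0
        elif message == "next":
            self.position += 1
        return self.records[self.position]


def stateless_handle(records, index):
    """Each request carries everything needed; equal requests get equal replies."""
    return records[index]
```

With the stateful variant, the short message "next" is enough, but its result depends on the history of the conversation. With the stateless variant, the client has to send the full position every time, which is what HTTP requires.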

To our knowledge, no gateway on the market deals with this problem properly and to its full extent. A gateway can easily be tested by resubmitting a cached page. Normally this page was generated earlier, when the legacy IS was in another state. In the meantime things have changed, and the old page cannot be processed correctly.
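The test described above can be simulated in a few lines. The following is only a sketch under stated assumptions: the legacy IS is reduced to a hypothetical two-screen session, and each generated page is assumed to remember the screen it was generated for.

```python
class LegacySession:
    """Hypothetical legacy IS: which inputs are valid depends on the current screen."""

    TRANSITIONS = {("menu", "orders"): "orders", ("orders", "back"): "menu"}

    def __init__(self):
        self.screen = "menu"

    def submit(self, page_screen, choice):
        # A page generated for another screen no longer matches the state of the IS.
        if page_screen != self.screen:
            return "stale"
        self.screen = self.TRANSITIONS.get((self.screen, choice), self.screen)
        return "ok"
```

Submitting the "menu" page once moves the session to the "orders" screen. Resubmitting the same cached "menu" page afterwards is detected as stale here; a gateway without such a check would forward the input to the wrong screen.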

An IS can be considered as having three components: one or several applications, a database service or database management system (DBMS), and the data. The architecture of legacy IS varies widely from a set of well structured modules to unstructured and non decomposable blocks.

Figure 2

The W3 client can be considered as a new application with a new user interface, which is added to the legacy IS. Ideally, we want to connect the W3 browser directly to the DBMS of the legacy IS. Unfortunately, some legacy IS do not have a DBMS module. If they do, the DBMS may not support the type of connectivity required by the W3 (e.g. stateless protocol, normally no user identification, ...).

On the W3, most information is stored as static Hypertext Markup Language (HTML) documents. When legacy IS are accessed, the hypertext needs to be generated dynamically. As it is not the purpose of a DBMS to produce HTML on the fly, another component is required.

Gateways bridge the gap. In general, gateways insulate certain components of an IS from changes being made to other components. A gateway is a program that allows two independent information systems to exchange data. A gateway can alter data in both structure and format. Changes in the value of data are not done in a gateway.
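As an illustration of this definition, the following sketch changes the structure and format of a record (a mapping becomes an HTML table) while leaving the values untouched apart from the character escaping that HTML requires. The function name and the record layout are assumptions made for this example.

```python
from html import escape


def record_to_html(record):
    """Render one record as an HTML table; values are escaped, never changed."""
    rows = "".join(
        "<tr><th>%s</th><td>%s</td></tr>" % (escape(str(key)), escape(str(value)))
        for key, value in record.items()
    )
    return "<table>%s</table>" % rows
```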

According to these definitions, gateways are the solution to our problem of integrating W3 and legacy information systems. Gateways translate requests and data between the W3 browser and the legacy IS. In our context, gateways especially help developers to concentrate on the hard part of the integration: designing the new W3 user interface. Low-level aspects of communication are more or less hidden.

The gateways classified below simultaneously interact with the W3 and the legacy IS. The following two-dimensional classification is based on the design of these two interactions. First we classify by the interaction with the IS, then by the interaction with the W3 client.


3. Gateway Interaction with the Legacy IS

The first dimension of the classification is based on the interaction between the gateway and the legacy IS. As discussed above, we would like the gateway to interact with the DBMS. But this is not always possible.

3.1 Data Gateways

Figure 3

Some gateways access the data files of the legacy IS directly. This is only possible if the format of the data is known and the files are accessible.

With sophisticated gateways, only the format of the data in the files is described, queries are specified, and output masks for the presentation of the retrieved data are designed. The queries are then interpreted in the gateway. If no suitable gateway is available, one has to be developed with one of the techniques mentioned below (CGI, SSI).
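The three ingredients named above (format description, query, output mask) can be sketched as follows. The fixed-width record layout, the field names and the mask are assumptions made for this example and are not taken from any particular product.

```python
from html import escape

# Format description: (field name, start column, end column) of a fixed-width record.
FIELDS = [("id", 0, 4), ("name", 4, 14), ("city", 14, 24)]

# Output mask for one retrieved record.
MASK = "<li>{name} ({city})</li>"


def parse_line(line):
    """Cut one line of the data file into named fields."""
    return {name: line[start:end].strip() for name, start, end in FIELDS}


def run_query(lines, field, value):
    """Interpret a simple equality query inside the gateway."""
    return [rec for rec in map(parse_line, lines) if rec[field] == value]


def render(records):
    """Apply the output mask to every record and wrap the result in a list."""
    items = "".join(
        MASK.format(**{key: escape(val) for key, val in rec.items()})
        for rec in records
    )
    return "<ul>%s</ul>" % items
```

Only `FIELDS` and `MASK` describe the legacy data; the rest of the code is generic, which is the point of a descriptive data gateway.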

The data gateway approach is especially successful if the legacy data is stored in the Standard Generalized Markup Language (SGML) format or can be exported to SGML. As HTML is an application of SGML, such documents can be integrated into the W3 very easily.


3.2 DBMS Gateways

Figure 4

If we have a legacy IS with a modern DBMS, the gateway should access the DBMS. This is the most convenient approach and thus normally selected for commercial gateways and "home-baked" solutions. Simple gateways replace direct access to data files with simple queries to the DBMS, which then acts as a search engine. The presentation of the retrieved data is similar to the first approach. More complex gateways also exploit the programming features of databases (e.g. stored procedures), so that very powerful W3 applications can be programmed and stored mostly inside the database. Some DBMS companies already offer their own gateways. For the most important DBMS, public domain gateways are freely available.
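A simple DBMS gateway of this kind can be sketched as follows. The example assumes an in-memory SQLite database with a hypothetical `staff` table as a stand-in for the DBMS of the legacy IS; a real gateway would connect to the existing database instead.

```python
import sqlite3
from html import escape


def make_demo_db():
    """Create a small in-memory database standing in for the legacy DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO staff VALUES (?, ?)",
        [("Meier", "Zurich"), ("Keller", "Bern")],
    )
    return conn


def staff_page(conn, city):
    """Let the DBMS act as the search engine and present the rows as HTML."""
    rows = conn.execute(
        "SELECT name FROM staff WHERE city = ? ORDER BY name", (city,)
    ).fetchall()
    items = "".join("<li>%s</li>" % escape(name) for (name,) in rows)
    return "<ul>%s</ul>" % items
```

Compared with a data gateway, the query is no longer interpreted in the gateway but passed to the DBMS; the gateway only formats the result.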

Database gateways can be further classified by the number of layers between the gateway and the DBMS. While many gateways access the DBMS directly, some have an additional layer in between, e.g. one based on the Open Database Connectivity (ODBC) standard.


3.3 Interface Gateways

Figure 5

The third approach is based on the following idea: as legacy IS already have a user interface, a gateway that accesses the user interface could be developed. It handles the translation from a stateless protocol like the Hypertext Transfer Protocol (HTTP) into the stateful one required by most legacy IS applications (e.g. VT-100, IBM's 3270). This approach is especially useful if no other interface to access the legacy IS exists.
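One conceivable way to perform this translation is to let every stateless request carry the complete sequence of keystrokes and to replay it in a fresh terminal session. The sketch below illustrates only this idea; the screens and keys are hypothetical, the terminal application is simulated, and the replay strategy is an assumption for this example rather than the method of any gateway cited here.

```python
class TerminalSession:
    """Stand-in for a stateful terminal application with menu screens."""

    SCREENS = {
        "main": {"1": "customers", "2": "orders"},
        "customers": {"0": "main"},
        "orders": {"0": "main"},
    }

    def __init__(self):
        self.screen = "main"

    def send_key(self, key):
        # Unknown keys leave the application on the current screen.
        self.screen = self.SCREENS[self.screen].get(key, self.screen)
        return self.screen


def handle_request(keys):
    """Stateless entry point: replay all keys, then present the final screen."""
    session = TerminalSession()
    for key in keys:
        session.send_key(key)
    return "<h1>%s</h1>" % session.screen
```

Because the whole key sequence is part of the request, the same request always yields the same page, at the price of repeating work on every request.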

For generic translators the user interface (and not the data!) of a legacy IS is described in a formal way [Perrochon et al. 95]. This description mainly covers form and function, and not the "look and feel".


4. Gateway Interaction with the World Wide Web

Gateways do not only access the legacy information system. They must also interact with a W3 client. Most gateways are not directly accessible by the client, but are coupled with a W3 server. This section classifies gateways by how they interact with the W3 client.


4.1 The Common Gateway Interface (CGI)

Most of the gateways listed in Table 2 require a running W3 server capable of starting programs using the Common Gateway Interface (CGI). CGI defines how data is transferred between a W3 server and a gateway. Most W3 servers on UNIX support this standard.

Figure 6

When a client sends a request to a server (1), the server starts an external program and forwards the request to it (2). The program then computes the result with the help of the IS (3) and sends it back to the W3 server (4). Afterwards, the W3 server sends it to the client (5).
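The external program in this sequence can be sketched as a small script. Under CGI, the server passes the query part of the request in the `QUERY_STRING` environment variable, and the program writes a header block, an empty line and the document to its standard output. The parameter `name` and the page content are assumptions made for this example, and the access to the IS in step (3) is left out.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build the complete CGI output for one request."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["unknown"])[0]
    # A real gateway would query the information system here (step 3).
    body = "<html><body><p>Result for %s</p></body></html>" % escape(name)
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # Steps 2 and 4: the server started this program and reads its output.
    sys.stdout.write(cgi_response(os.environ))
```

Since a new process is started for every request, no state survives between two requests unless the program stores it elsewhere.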


4.2 Server Side Includes (SSI)

Figure 7

A server side include (SSI) consists of a special sequence of characters (tokens) inside an HTML page. As the page is sent from the W3 server to the requesting client (2,5), the page is scanned by the server for these special tokens. When a token is found, the server interprets the instructions in the token and performs an action based on the token data (3,4). This is similar to the "mail-merge" function of a word processor where a document is merged with a database. However, in our case, the result is only one single document with all the data in it.
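The scanning and merging step can be sketched with a regular expression. The token syntax `<!--#lookup key="..." -->` used below is hypothetical and was chosen for this example only; actual servers define their own directives.

```python
import re
from html import escape

# Hypothetical token: an HTML comment naming the data item to be merged in.
TOKEN = re.compile(r'<!--#lookup key="([^"]+)" -->')


def expand(page, data):
    """Replace every token in the page with the corresponding data value."""

    def replace(match):
        return escape(str(data.get(match.group(1), "")))

    return TOKEN.sub(replace, page)
```

For example, expanding the page `<p>Stock: <!--#lookup key="stock" --></p>` with the data `{"stock": 42}` yields a single document in which the token has been replaced by the value.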

Some of the gateways listed in Table 2 are based on SSI. SSI can be implemented in the server itself or can be provided by a CGI program that merges the HTML document and the data from the legacy system.


4.3 Gateways as Stand Alone Servers

Figure 8

Some gateways act as their own server. The advantage is normally a big improvement in performance and greater flexibility when designing the new W3 application. This is the way all database companies are going: they are selling (or announcing) dedicated servers that are highly optimized to interact with their DBMS.
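A gateway of this kind answers HTTP requests itself instead of being started by a W3 server. The following minimal sketch assumes a small in-memory table `STOCK` as a stand-in for the information system and a hypothetical query parameter `item`.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

STOCK = {"bolt": 120, "nut": 75}  # stand-in for the information system


class GatewayHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        item = query.get("item", [""])[0]
        if item in STOCK:
            status, text = 200, "%s: %d" % (item, STOCK[item])
        else:
            status, text = 404, "unknown item"
        body = ("<p>%s</p>" % escape(text)).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(port=0):
    """Bind the gateway to a local port; port 0 lets the system choose one."""
    return HTTPServer(("127.0.0.1", port), GatewayHandler)
```

Calling `make_server(8080).serve_forever()` would run the gateway as a long-lived process. Because the process is not restarted for each request, it can keep connections to the IS open, which is one plausible source of the performance advantage mentioned above.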


4.4 Mobile Code Systems (MCS)

Figure 9

Recently, a new approach has emerged. The idea is to distribute the code of the application by sending it to the client. The client then executes the code locally on the user's machine. Although the existing application of the legacy system will probably not fit completely into the W3, this idea allows parts of the user interface of the legacy system to be rebuilt inside the W3. Of course, there must be additional access control mechanisms, usually including authentication, to maintain the security and system integrity of the client computer.

The simplest approach is to send a compiled program to the user and start it there. Besides considerable security risks, this approach depends on the user's computer type.

There are also interpreted mobile code systems available. Java (<http://java.sun.com/>) from Sun is designed especially for the W3. Java is a programming language similar in syntax to C++, but similar in other ways to Smalltalk and Objective C. The system supports secure loading, so that code from untrusted sources can be added dynamically. The demonstration application for Java, HotJava, is a complete web browser. Applications written in Java are executable on any computer with a Java-compatible browser. Another approach is based on Safe-Tcl [van Doorn et al. 95].

MCS are still a matter of promising research. So far, there are no gateways based on the MCS approach available.


5. Open Questions


5.1 Transactions

The W3 does not support transactions very well: as soon as the W3 server has responded to the request of the client, the connection between them is closed. If a single transaction takes only one request and the transaction can be committed immediately, this is no problem. However, if a separate commit or follow-up requests are required, the problem of tracking follow-up requests arises. In addition, no proper user identification is possible on the W3. We have to wait until secure protocols are widely used.
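One common workaround for the tracking problem is to hand the client a token with the first response and to require it in the follow-up request, for example in a hidden form field. The sketch below illustrates this idea under stated assumptions: the class name, the dictionary standing in for the database and the token format are all hypothetical, and the token does not solve the problem of user identification.

```python
import secrets


class PendingTransactions:
    """Remember uncommitted changes between two separate requests."""

    def __init__(self):
        self._pending = {}

    def begin(self, changes):
        # First request: store the changes and return a token for the page.
        token = secrets.token_hex(8)
        self._pending[token] = changes
        return token

    def commit(self, token, database):
        # Follow-up request: apply the changes once, reject unknown tokens.
        changes = self._pending.pop(token, None)
        if changes is None:
            return False
        database.update(changes)
        return True
```

A resubmitted commit request is rejected because its token has already been used. A real gateway would also need to expire tokens that are never committed.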


5.3 "Hard" legacy systems

Some legacy systems may still resist integration. The Development and Applications Research Group of the Department of Computer Science of ETH Zürich is currently conducting further research on this topic. One research project tries to enhance the existing integration approaches, another one concentrates on the migration of legacy information systems. We are also interested in cooperation with industrial partners in this area.


6. Summary

As legacy IS are not designed to be used as W3 servers, gateways are required to integrate legacy IS and the W3. On the other hand, we also want to use a gateway to insulate the IS from the W3. Gateways can be classified by their interaction with both the IS and the W3.

There are many gateways available already. The future will bring us specialized W3 servers, tailored for one IS product supported by the vendor. This conforms to the right-sizing approach which requires separate modules for separate function blocks.



7. References

[van Doorn et al. 95] van Doorn, M., Eliëns, A. Integrating applications and the World Wide Web. The Third International World Wide Web Conference 1995 (WWW'95), Darmstadt, April 1995. Computer Networks and ISDN Systems 27(1995) 1105-1110.

[Perrochon et al. 95] Perrochon, L., Fischer, R. IDLE: Unified W3 Access to Interactive Servers. The Third International World Wide Web Conference 1995 (WWW'95), Darmstadt, April 1995. Computer Networks and ISDN Systems 27(1995) 927-938. See <http://www.inf.ethz.ch/department/IS/ea/tsp/>.

[Ronchetti 95] Ronchetti, M. Face Lift: using WWW technology for an external re-engineering of old applications. The Third International World Wide Web Conference 1995 (WWW'95), Darmstadt, April 1995. See <http://www.inf.unitn.it/~ronchet/CBT/>.

[Varela et al. 94] Varela, C. A., Hayes, C. C. Providing Data on the Web: From Examples to Programs. The Second International WWW Conference '94: Mosaic and the Web, Chicago, USA, October 1994.

8. Tables

Table 1. URLs for various information.

Table 2. Overview of available products for the integration of legacy systems into the W3.


©1995 L. Perrochon