On the Integration of Legacy Information Systems into the World-Wide Web

Louis Perrochon, Institut für Informationssysteme, ETH Zürich, Switzerland.


  1. Introduction
  2. Definitions
  3. Gateway Interaction with the Legacy IS
  4. Gateway Interaction with the World-Wide Web
  5. Research Agenda
  6. Summary
  7. References
  8. Tables

1. Introduction

Tomorrow, we will see the synergy of two technologies: databases and networking. Tomorrow's global information systems will integrate network technology and database technology. However, each world still has its own concepts and technology. They are aware of each other, but they do not collaborate. Networking people have heard of SQL, but have had few opportunities to use it. Database people use the Internet, but its technology is unfamiliar to them. No wonder that the Internet, the major hype of today's computer science and the most successful information system ever, is still based on file system technology.

This article deals with the problem of integrating information systems (IS) and the World-Wide Web (W3). It gives a short overview of the different approaches. The first experiments took place in summer 1992, when Arthur Secret at CERN implemented HTOracle, a simple gateway from W3 to Oracle. HTOracle proved the feasibility of the approach. Since then, ongoing work has resulted in more sophisticated products.

In fact, many organizations are now starting to build internal applications based on IS and the W3. Cross-platform development, client/server architecture, simplicity of development and ease of use are very attractive reasons to take a close look at the W3.

Figure 1

This article does not try to evaluate different products. That is beyond the scope of a single article because of the speed of improvements and the variety of products; any such evaluation would be outdated the day it is finished. This article only classifies different approaches to the problem.

However, Table 2 contains a list of currently available products for IS-W3 integration. Using existing products has several advantages. Besides reducing the cost and risk of development, products (especially commercial products) are quite well tested and are used in many places, so a lot of knowledge and support is available.

2. Definitions

The term information system (IS) stands for a computer system providing information to users. It encompasses the data, sometimes a Database Management System (DBMS), applications, support programs, etc. We use the term information server to stress the idea that parts of an IS are accessed (remotely) by other programs. These other programs are called clients or browsers. (Client is the more general term; browsers are only those clients that allow a human user to access information on the W3.) Browsers and information servers together form a new IS. The above definition of IS includes traditional systems that are accessed by terminal emulation programs, as well as newer systems like the W3.

A legacy IS is an IS that significantly resists modification and evolution. In our context, the term stands for all kinds of traditional IS that were designed and implemented without regard for the special needs of today's public global IS like the W3. At the moment, this includes all database management systems. In the following, we discuss how to connect a W3 browser to a legacy IS despite the resistance of the legacy IS.

An IS can be considered as having three components: 1) one or several applications, 2) a database service or database management system (DBMS), and 3) the data. The architectures of legacy IS vary widely, from a set of well-structured modules to unstructured and non-decomposable blocks.

Figure 2

The W3 client can be considered a new application with a new user interface that is added to the legacy IS. Ideally, we want to connect the W3 browser directly to the DBMS of the legacy IS. Unfortunately, some legacy IS do not have a DBMS module. If they do, the DBMS may not support the type of connectivity required by the W3 (e.g. stateless protocol, normally no user identification, ...).

On the W3, most information is stored as static Hypertext Mark-up Language (HTML) documents. When legacy IS are accessed, the hypertext needs to be generated dynamically. As it is not the purpose of a DBMS to produce HTML on-the-fly, some other component is required.

Software tools, called gateways, have been developed to bridge this gap. In general, gateways insulate certain components of an IS from changes being made to other components. Gateways translate requests and data between the mediated components. In our context, gateways especially help developers to concentrate on the hard part of the integration: designing the new W3 user interface. Low-level aspects of communication are more or less hidden.

The gateways classified below simultaneously interact with the W3 and the legacy IS. The following two-dimensional classification is based on the design of these two interactions.

3. Gateway Interaction with the Legacy IS

The first dimension of the classification is based on the interaction between the gateway and the legacy IS. Again, we would like the gateway to interact with the DBMS. As mentioned above, this is not always possible.

3.1 Data Gateways

Figure 3

Some gateways access the flat data files of the legacy IS directly. This is only possible if the format of the data is known and the files are accessible.

With sophisticated gateways, only the format of the data in the files needs to be described, the queries specified and output masks for the presentation of the retrieved data designed. The queries are then interpreted in the gateway. If no suitable gateway is available, one has to be developed with one of the techniques mentioned below (CGI, SSI).
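As a minimal sketch of this idea, the following Python fragment describes the record format of a flat file, interprets a simple equality query inside the gateway and fills an output mask with the retrieved data. The pipe-delimited format, the field names and the sample records are assumptions made for illustration only; they do not come from any particular product.

```python
import html

# Hypothetical description of the flat-file format: field names in column order.
FIELDS = ["id", "name", "city"]
SEPARATOR = "|"

# Hypothetical output mask for one retrieved record.
ROW_MASK = "<li>{name} ({city})</li>"


def parse_records(lines):
    """Turn raw lines of the flat file into dictionaries keyed by field name."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        values = line.split(SEPARATOR)
        records.append(dict(zip(FIELDS, values)))
    return records


def run_query(records, field, value):
    """Interpret a simple equality query inside the gateway."""
    return [r for r in records if r.get(field) == value]


def render(records):
    """Fill the output mask for every record, escaping the data for HTML."""
    items = [
        ROW_MASK.format(**{k: html.escape(v) for k, v in r.items()})
        for r in records
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Invented sample data standing in for the legacy data file.
sample = ["1|Meier|Zurich", "2|Dupont|Geneva", "3|Keller|Zurich"]
page = render(run_query(parse_records(sample), "city", "Zurich"))
```

In this sketch only FIELDS, SEPARATOR and ROW_MASK would have to change for a different data file, which is the kind of separation between description and interpretation that the paragraph above refers to.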

The data gateway approach is especially successful if the legacy data is stored in the Standard Generalized Markup Language (SGML) format or can be exported to SGML. As HTML is an application of SGML, such documents can be integrated into the W3 very easily.

3.2 DBMS Gateways

Figure 4

If we have a legacy IS with a modern DBMS, the gateway should access the DBMS. This is the most convenient approach and thus normally selected for new information systems. Simple gateways replace direct access to flat data files with simple queries to the DBMS, which then acts as a search engine. The presentation of the retrieved data is similar to the first approach. More complex gateways also exploit the programming features of databases (e.g. stored procedures), so that very powerful W3 applications can be programmed and stored mostly inside the database. Some DBMS companies already offer their own gateways. For most important DBMS, public domain gateways are freely available.

Database gateways can be further classified by the number of layers between the gateway and the DBMS. While many gateways access the DBMS directly, some have an additional layer in between, e.g. ODBC drivers.
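The simple form of a DBMS gateway can be sketched as follows. The example uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for the DBMS of a legacy IS; the table, its columns and the sample rows are invented for illustration.

```python
import html
import sqlite3


def make_demo_db():
    """Create an in-memory database standing in for the legacy DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO staff VALUES (?, ?)",
        [("Meier", "Zurich"), ("Dupont", "Geneva"), ("Keller", "Zurich")],
    )
    return conn


def staff_in_city(conn, city):
    """Let the DBMS do the search; the parameter is bound, not concatenated."""
    cursor = conn.execute(
        "SELECT name, city FROM staff WHERE city = ? ORDER BY name", (city,)
    )
    return cursor.fetchall()


def render_table(rows):
    """Present the retrieved rows as an HTML table."""
    body = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(html.escape(n), html.escape(c))
        for n, c in rows
    )
    return "<table>\n" + body + "\n</table>"


conn = make_demo_db()
page = render_table(staff_in_city(conn, "Zurich"))
```

Compared with a data gateway, the search itself is delegated to the DBMS and the gateway is left with presentation. Binding the user-supplied value as a parameter, rather than pasting it into the SQL text, is a design choice of this sketch and not something prescribed by the classification.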

3.3 Interface Gateways

Figure 5

The third approach is based on the following idea: As legacy IS already have a user interface, a gateway that accesses the user interface could be developed. It handles the translation from a stateless protocol like the Hypertext Transfer Protocol (HTTP) into the stateful one required by most legacy IS applications (e.g. VT-100, IBM's 3270).

For generic translators the user interface (and not the data!) of a legacy IS is described in a formal way [Perrochon et al. 95]. This description mainly covers form and function, and not the "look and feel".
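The translation from stateless requests to a stateful dialogue can be illustrated with a small sketch. It assumes that every W3 request carries a session identifier issued by the gateway, and it replaces the real terminal connection with a hypothetical FakeTerminalSession class; neither the class names nor the dialogue are taken from an existing gateway.

```python
import uuid


class FakeTerminalSession:
    """Hypothetical stand-in for a stateful terminal connection (e.g. VT-100)."""

    def __init__(self):
        self.screen = "LOGIN"

    def send(self, user_input):
        """Advance the dialogue; the reply depends on the current screen."""
        if self.screen == "LOGIN":
            self.screen = "MENU"
            return "Welcome " + user_input + ". Enter a search term."
        return "Results for " + user_input


class InterfaceGateway:
    """Keeps one terminal session per W3 user across stateless requests."""

    def __init__(self):
        self.sessions = {}

    def handle(self, session_id, user_input):
        """Process one request and return (session_id, reply text)."""
        if session_id not in self.sessions:
            # First request of this user: open a new terminal session.
            session_id = uuid.uuid4().hex
            self.sessions[session_id] = FakeTerminalSession()
        reply = self.sessions[session_id].send(user_input)
        return session_id, reply


gateway = InterfaceGateway()
sid, first = gateway.handle(None, "guest")
_, second = gateway.handle(sid, "gateways")
```

The gateway holds the state that HTTP does not carry, so the same input can produce different replies depending on where the user is in the legacy dialogue.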

4. Gateway Interaction with the World-Wide Web

Gateways do not only access the legacy information system. They also must interact with a W3 client. Most of the gateways are not directly accessible by the client, but are coupled with a W3 server. This section classifies gateways on how they interact with the W3 client.

4.1 The Common Gateway Interface (CGI)

Figure 6

Most of the gateways listed in Table 2 require a running W3 server capable of starting programs using the Common Gateway Interface (CGI). CGI defines how data is transferred between a W3 server and a gateway. Most W3 servers on UNIX support this standard.

When a client sends a request to a server (1), the server starts an external program (2) and the request is forwarded to this program (3). The program then calculates the result and sends it back to the W3 server (4). The W3 server then sends it to the client (5).
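The external program in this exchange can be very small. The following sketch shows a CGI program in Python that reads the request from the QUERY_STRING environment variable set by the server and writes a header line, an empty line and an HTML body to its standard output. The parameter name "name" and the reply text are assumptions chosen for illustration.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def cgi_main(environ, out):
    """Read the request from CGI environment variables and write the reply."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["unknown"])[0]
    # A CGI reply starts with header lines, followed by an empty line.
    out.write("Content-Type: text/html\n\n")
    out.write(
        "<html><body><p>Result for " + html.escape(name) + "</p></body></html>\n"
    )


if __name__ == "__main__":
    # When started by the W3 server, the environment carries the request.
    cgi_main(os.environ, sys.stdout)
```

Passing the environment and the output stream as arguments keeps the program testable without a running W3 server. A real gateway would query the legacy IS where this sketch merely echoes the parameter.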

4.2 Server Side Includes (SSI)

Figure 7

A server side include (SSI) consists of a special sequence of characters (tokens) inside an HTML page. As the page is sent from the W3 server to the requesting client (2, 5), the page is scanned by the server for these special tokens. When a token is found, the server interprets the instructions in the token and performs an action based on the token data (3, 4). This is similar to the "mail-merge" function of a word processor, where a document is merged with a database. However, in our case, the result is only one single document with all the data in it.

Some of the gateways listed in Table 2 are based on SSI. SSI can be implemented in the server itself or provided by a CGI program that merges the HTML document and the data from the legacy system.
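The merging step can be sketched in a few lines. The token form below is modelled on the echo directive found in some W3 servers, but it is deliberately simplified, and the exact syntax accepted by any given server should be treated as an assumption. The dictionary stands in for data that a real gateway would fetch from the legacy system.

```python
import re

# Matches tokens of the form <!--#echo var="NAME" --> (simplified).
TOKEN = re.compile(r'<!--#echo var="([A-Za-z_]+)" -->')


def expand(page, lookup):
    """Replace every token with the value returned by lookup(name)."""
    return TOKEN.sub(lambda match: lookup(match.group(1)), page)


# Hypothetical data that a real gateway would fetch from the legacy IS.
legacy_data = {"CUSTOMER": "Meier", "BALANCE": "120.50"}

template = (
    '<p>Customer: <!--#echo var="CUSTOMER" --></p>\n'
    '<p>Balance: <!--#echo var="BALANCE" --></p>'
)
page = expand(template, lambda name: legacy_data.get(name, ""))
```

Everything outside the tokens is passed through unchanged, so the page author works with ordinary HTML and only the tokens refer to the legacy data.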

4.3 Gateways as Stand Alone Servers

Figure 8

Some gateways are their own server. The advantage is normally a big improvement in performance and greater flexibility when designing the new W3 application. This is the way all of today's database companies are going: they are selling dedicated servers that are highly optimized to interact with their DBMS.
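A stand-alone gateway answers HTTP requests itself instead of being started by a W3 server for each request. The following sketch uses the http.server module from the Python standard library; the STAFF dictionary, the "city" parameter and the default port are assumptions, and the dictionary merely stands in for the legacy IS.

```python
import html
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical stand-in for the data held by the legacy IS.
STAFF = {"Zurich": ["Keller", "Meier"], "Geneva": ["Dupont"]}


def build_page(city):
    """Produce the HTML reply for one query."""
    names = STAFF.get(city, [])
    items = "".join("<li>" + html.escape(n) + "</li>" for n in names)
    return "<html><body><ul>" + items + "</ul></body></html>"


class GatewayHandler(BaseHTTPRequestHandler):
    """Speaks HTTP itself, so no separate W3 server or CGI process is needed."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        city = query.get("city", [""])[0]
        body = build_page(city).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        """Suppress the default per-request logging."""


def make_server(port=8080):
    """Create the gateway server; call serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), GatewayHandler)
```

Because the process stays alive between requests, it can keep connections to the legacy IS open, which is one plausible source of the performance advantage mentioned above.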

4.4 Mobile Code Systems (MCS)

Figure 9

Recently, a new approach has emerged. The idea is to distribute the code of the application by sending it to the client, which then executes the code locally on the user's machine. Although the existing application of the legacy system will probably not fit into the W3, this idea allows parts of the user interface of the legacy system to be rebuilt on the W3 and run on the client computer. Of course, there must be additional access control mechanisms, usually including authentication, to maintain the security and system integrity of the client computer.

The simplest approach is to send a compiled program to the user and start it there. Besides considerable security risks, this approach is dependent on the user's computer type.

There are also interpreted mobile code systems available. Java (<http://java.sun.com/>) from Sun is designed especially for the W3. Java is a programming language similar in syntax to C++, but similar in other ways to Smalltalk and Objective C. The system supports secure loading, so that code from untrusted sources can be added dynamically. The demonstration application for Java, HotJava, is a complete web browser. Applications written in Java are executable on any computer with a Java-compatible browser. Another approach is based on Safe-Tcl [van Doorn et al. 95].

MCS are still a matter of promising research. So far, there are no gateways based on the MCS approach available.

5. Research Agenda

5.1 Transactions

The W3 does not support transactions very well: as soon as the W3 server has responded to the request of the client, the connection between them is closed. If a single transaction takes only one request and the transaction can be committed immediately, this is no problem. However, if a separate commit or follow-up requests are required, the problem of tracking follow-up requests arises.
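One conceivable way of tracking follow-up requests, given here only as an illustrative sketch and not as an approach proposed in this article, is to hand the client a token with the first response and to require that token in every follow-up request. The class and method names are hypothetical, and the dictionary called store stands in for the legacy IS.

```python
import uuid


class TransactionTracker:
    """Associates follow-up requests with a pending transaction via a token."""

    def __init__(self):
        self.store = {}    # committed data (stand-in for the legacy IS)
        self.pending = {}  # token -> uncommitted changes

    def begin(self, changes):
        """First request: remember the changes and hand a token to the client."""
        token = uuid.uuid4().hex
        self.pending[token] = dict(changes)
        return token

    def commit(self, token):
        """Follow-up request: apply the changes identified by the token."""
        changes = self.pending.pop(token, None)
        if changes is None:
            return False  # unknown or already finished transaction
        self.store.update(changes)
        return True

    def abort(self, token):
        """Follow-up request: discard the changes identified by the token."""
        return self.pending.pop(token, None) is not None


tracker = TransactionTracker()
token = tracker.begin({"seat 12A": "reserved"})
committed = tracker.commit(token)
```

The sketch leaves open what happens to transactions whose follow-up request never arrives, for example because the user simply closes the browser. A usable solution would at least need a timeout for pending entries.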

5.2 Updates

Besides all the problems involved with transactions, updates also face another problem: there is no proper user identification possible on the W3.

5.3 "Hard" legacy systems

Some legacy systems may still resist integration. The Development and Applications Research Group of the Department of Computer Science of ETH Zurich is currently conducting further research on this topic. One research project tries to enhance the existing integration approaches; another concentrates on the migration of legacy information systems. We are also interested in cooperation with industrial partners in this area.

6. Summary

As legacy IS are not designed to be used as W3 servers, gateways are required to integrate legacy IS and the W3. On the other hand, we also want to use a gateway to insulate the IS from the W3. Gateways can be classified depending on their interaction with both the IS and the W3.

There are many gateways available already. The future will bring us specialized W3 servers, tailored for one IS product and supported by the vendor. This conforms to the right-sizing approach, which requires separate modules for separate function blocks.

7. References

[van Doorn et al. 95] van Doorn, M., Eliëns, A. Integrating applications and the World-Wide Web. The Third International World-Wide Web Conference 1995 (WWW'95), Darmstadt, April 1995. Computer Networks and ISDN Systems 27(1995) 1105-1110. See <http://www.igd.fhg.de/www/www95/papers/48/main.html>.

[Perrochon et al. 95] Perrochon, L., Fischer, R. IDLE: Unified W3-Access to Interactive Servers. The Third International World-Wide Web Conference 1995 (WWW'95), Darmstadt, April 1995. Computer Networks and ISDN Systems 27(1995) 927-938. See <http://www.inf.ethz.ch/department/IS/ea/tsp/>.

[RDA] RDA was published in 1993. It consists of two parts: Part 1 - Generic RDA. ISO 9579-1:1993 and Part 2 - SQL Specialization ISO 9579-2:1993. Part 1 specifies the generic model, service, and protocol for arbitrary database connection and Part 2 specifies additional protocols for connecting databases conforming to SQL. See <http://www.iso.ch/>.

[Ronchetti 95] Ronchetti, M. Face Lift: using WWW technology for an external re-engineering of old applications. The Third International World-Wide Web Conference 1995 (WWW'95), Darmstadt, April 1995. See <http://www.inf.unitn.it/~ronchet/CBT/>.

[SQL 92] SQL is under continual development by the International Organization for Standardization (ISO). The most recent published version was ISO 9075:1992. See <http://www.iso.ch/>.

[Varela et al. 94] Varela, C. A., Hayes, C. C. Providing Data on the Web: From Examples to Programs. The Second International WWW Conference '94: Mosaic and the Web, Chicago, USA, October 1994.

8. Tables

Table 1. URLs for various information.

Table 2. Overview of available products for the integration of legacy systems into the W3.

©1995 L. Perrochon