On the Integration of Legacy Information Systems into the World-Wide Web
Institut für Informationssysteme,
- Gateway Interaction with the Legacy IS
- Gateway Interaction with the World-Wide Web
- Research Agenda
Tomorrow, we will see the synergy of two technologies: databases and
networking. Tomorrow's global information systems will integrate
network technology and database technology. However, each world still
has its own concepts and technology. The two communities are aware of
each other, but they do not collaborate. Networking people have heard
of SQL, but have had few opportunities to use it. Database people use
the Internet, but its technology is unfamiliar to them. Small wonder,
then, that the Internet, the major hype of today's computer science and
the most successful information system ever, is still based on file
system technology.
This article deals with the problem of integrating information systems (IS)
and the World-Wide Web (W3). It gives a short overview of the different
approaches. The first experiments took place in the summer of 1992, when
Arthur Secret at CERN implemented HTOracle, a simple gateway from W3
to Oracle. HTOracle proved the feasibility of the approach. Since
then, ongoing work has resulted in more sophisticated products.
In fact, many organizations are now starting to build internal
applications based on IS and the W3. Cross-platform development, a
client/server architecture, simplicity of development, and ease of use
are very attractive reasons to take a close look at the W3. This article
does not try to evaluate different products. That is beyond the scope of
a single article because of the speed of improvement and the variety of
products; any such evaluation would be outdated the day it was finished.
This article only classifies different approaches to the problem.
However, Table 2 lists currently available products for IS-W3
integration. Using existing products has several advantages. Besides
reducing the cost and risk of development, products (especially
commercial products) are well tested and used in many places, so a lot
of knowledge and support is available.
The term information system (IS) stands for a computer system
providing information to users. It encompasses the data, sometimes a
Database Management System (DBMS), applications, support programs,
etc. We use the term information server to stress the idea
that parts of an IS are accessed (remotely) by other programs. These
other programs are called clients or browsers.
(Client is the more general term. Browsers are only those clients that
allow a human user to access information on the W3.) Browsers and
information servers together form a new IS. The above definition of IS
includes traditional systems that are accessed by terminal emulation
programs, as well as newer systems like the W3.
A legacy IS is an IS that significantly resists modification
and evolution. In our context, it stands for all kinds of traditional
IS that were designed and implemented without regard for the special
needs of today's public global IS like the W3. At the moment, this
includes all database management systems. In the following, we
discuss how to connect a W3 browser to a legacy IS despite the
resistance of the legacy IS.
An IS can be considered as having three components: 1) one or several
applications, 2) a database service or database management system
(DBMS), and 3) the data. The architecture of legacy IS varies widely,
from a set of well-structured modules to unstructured and non-decomposable
systems.
The W3 client can be considered a new application with a new user
interface that is added to the legacy IS. Ideally, we want to connect
the W3 browser directly to the DBMS of the legacy IS. Unfortunately,
some legacy IS do not have a DBMS module. If they do, the DBMS may not
support the type of connectivity required by the W3 (e.g. stateless
protocol, normally no user identification, ...).
On the W3, most information is stored as static Hypertext Mark-up
Language (HTML) documents. When legacy IS are accessed, the hypertext
needs to be generated dynamically. As it is not the purpose of a DBMS
to produce HTML on-the-fly, some other component is required.
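To make this missing component concrete, here is a minimal Python sketch of its core task: turning a query result into an HTML table on the fly. The function name `rows_to_html` and the input shape (a list of column names plus a list of row tuples) are assumptions made for illustration, not part of any particular gateway.

```python
from html import escape


def rows_to_html(columns, rows):
    """Render a query result (column names plus row tuples) as an HTML table.

    Every value is escaped, because legacy data may contain characters such
    as '<' or '&' that a browser would otherwise interpret as markup.
    """
    lines = ["<table>"]
    lines.append("<tr>" + "".join(f"<th>{escape(str(c))}</th>" for c in columns) + "</tr>")
    for row in rows:
        lines.append("<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


print(rows_to_html(["name", "dept"], [("Smith & Co", "R<D")]))
```

Escaping is the one step that cannot be skipped: data stored long before the W3 existed was never meant to be embedded in HTML.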
Software tools called gateways have been developed to
bridge this gap. In general, gateways insulate certain components of
an IS from changes made to other components. Gateways translate
requests and data between the mediated components. In our context,
gateways help us concentrate on the hard part of the
integration: designing the new W3 user interface. Low-level aspects of
communication are more or less hidden.
The gateways classified below simultaneously interact with the W3 and
the legacy IS. The following two-dimensional classification is
based on the design of these two interactions.
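The two dimensions can be written down as a small data structure, which makes it explicit that every gateway occupies one cell of a grid. The enum names below are our own shorthand for the categories of Sections 3 and 4; in practice not every cell is populated (Section 4.4 notes that no gateways based on mobile code exist yet).

```python
from enum import Enum
from itertools import product


class LegacySide(Enum):
    """How the gateway talks to the legacy IS (Section 3)."""
    DATA = "reads the flat data files directly"
    DBMS = "sends queries to the DBMS"
    INTERFACE = "drives the existing user interface"


class W3Side(Enum):
    """How the gateway talks to the W3 client (Section 4)."""
    CGI = "program started by a W3 server via CGI"
    SSI = "server side includes"
    STANDALONE = "gateway is its own W3 server"
    MOBILE_CODE = "code shipped to and run on the client"


# The classification is the Cartesian product of the two dimensions.
design_space = list(product(LegacySide, W3Side))
print(len(design_space))  # 3 x 4 = 12 cells
```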
3. Gateway Interaction with the Legacy IS
The first dimension of the classification is based on the interaction
between the gateway and the legacy IS. Again, we would like the
gateway to interact with the DBMS. As mentioned above, this is not
always possible.
3.1 Data Gateways
Some gateways access the
flat data files of the legacy IS directly. This is only possible if the format
of the data is known and the files are accessible.
With sophisticated gateways, only the format of the data in the files
is described, queries are specified, and output masks for the
presentation of the retrieved data are designed. The queries are then
interpreted in the gateway. If no suitable gateway is available, one
has to be developed with one of the techniques described in Section 4
(e.g. CGI).
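The three ingredients named above (a description of the file format, a query, and an output mask) can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the fixed-width record layout, the field positions in `RECORD_FORMAT`, and the placeholder syntax of `OUTPUT_MASK`. Real data gateways use their own description languages.

```python
from html import escape

# Hypothetical description of a legacy fixed-width record: (field, start, end).
RECORD_FORMAT = [("id", 0, 5), ("name", 5, 25), ("city", 25, 40)]

# Hypothetical output mask; placeholders are filled with escaped field values.
OUTPUT_MASK = "<li>{name} ({city})</li>"


def parse_record(line):
    """Cut one fixed-width line into named fields using RECORD_FORMAT."""
    return {name: line[start:end].strip() for name, start, end in RECORD_FORMAT}


def query(lines, field, value):
    """Interpret a simple equality query over the parsed records."""
    for line in lines:
        record = parse_record(line)
        if record[field] == value:
            yield record


def render(records):
    """Apply the output mask to every matching record."""
    items = [OUTPUT_MASK.format(**{k: escape(v) for k, v in r.items()}) for r in records]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Stand-in for the lines of a legacy data file.
legacy_file = [
    "00001" + "Meier".ljust(20) + "Zurich".ljust(15),
    "00002" + "Dupont".ljust(20) + "Geneva".ljust(15),
]
print(render(query(legacy_file, "city", "Zurich")))
```

Because only the description and the mask are data-specific, the same interpreter could serve any file whose layout can be described this way.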
The data gateway approach is especially successful if the legacy
data is stored in the Standard Generalized Markup Language (SGML) format
or can be exported to SGML. As HTML is one dialect of SGML, such
documents can be integrated into the W3 very easily.
3.2 DBMS Gateways
If we have a legacy IS with
a modern DBMS, the gateway should access the DBMS. This is the most convenient
approach and is thus normally chosen for new information systems. Simple gateways
replace direct access to flat data files with simple queries to the DBMS, which
then acts as a search engine. The presentation of the retrieved data is similar
to the first approach. More complex gateways also exploit the programming features
of databases (e.g. stored procedures), so that very powerful W3 applications can be
programmed and stored mostly inside the database. Some DBMS companies already
offer their own gateways. For most major DBMSs, public domain gateways are
available.
Database gateways can be further classified by the number of layers
between the gateway and the DBMS. While many gateways access the DBMS
directly, some have an additional layer in between,
e.g. ODBC drivers.
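A simple DBMS gateway, where the DBMS acts as the search engine, might look like the following Python sketch. SQLite (through the standard `sqlite3` module) stands in for the legacy DBMS, and the `customer` table and the `search_gateway` function are hypothetical. Stored procedures are not shown, since SQLite has none.

```python
import sqlite3
from html import escape


def search_gateway(conn, city):
    """Answer a W3 request by querying the DBMS instead of scanning files.

    The user-supplied value is passed as a query parameter and never spliced
    into the SQL text, so it cannot change the statement itself.
    """
    rows = conn.execute(
        "SELECT name, city FROM customer WHERE city = ? ORDER BY name", (city,)
    ).fetchall()
    items = "\n".join(f"<li>{escape(n)} ({escape(c)})</li>" for n, c in rows)
    return f"<ul>\n{items}\n</ul>"


# Stand-in for the legacy DBMS: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [("Meier", "Zurich"), ("Dupont", "Geneva"), ("Huber", "Zurich")])
print(search_gateway(conn, "Zurich"))
```

Compared with the flat-file sketch, selection and sorting have moved into the DBMS; the gateway only formats the result.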
3.3 Interface Gateways
The third approach is based
on the following idea: as legacy IS already have a user interface, a gateway that
accesses this user interface can be developed. It handles the translation from
a stateless protocol like the Hypertext Transfer Protocol (HTTP) into the stateful
terminal protocols required by most legacy IS applications (e.g. VT-100, IBM's 3270).
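The core of this translation is keeping a stateful session alive between stateless requests. The Python sketch below assumes one common technique, a session token carried by every request; it is not necessarily how the gateways discussed here do it. `LegacySession` is a hypothetical toy standing in for a terminal dialog, not a terminal emulator.

```python
import secrets


class LegacySession:
    """Hypothetical stand-in for a stateful terminal dialog (e.g. VT-100).

    The reply to an input depends on which screen the session is on.
    """

    def __init__(self):
        self.screen = "LOGIN"

    def send(self, text):
        if self.screen == "LOGIN":
            self.screen = "MENU"
            return f"Welcome {text}. 1=Search 2=Quit"
        if self.screen == "MENU" and text == "1":
            self.screen = "SEARCH"
            return "Enter name:"
        if self.screen == "SEARCH":
            self.screen = "MENU"
            return f"Record for {text}: (not found). 1=Search 2=Quit"
        return "Invalid input"


class InterfaceGateway:
    """Maps stateless requests onto long-lived legacy sessions.

    Each request carries a session token; the gateway keeps the stateful
    session between requests and looks it up by that token. A missing or
    unknown token opens a new session.
    """

    def __init__(self):
        self.sessions = {}  # token -> LegacySession

    def handle(self, token, text):
        if token not in self.sessions:
            token = secrets.token_hex(8)
            self.sessions[token] = LegacySession()
        screen = self.sessions[token].send(text)
        # A real gateway would return HTML with the token in a hidden field or URL.
        return token, screen


gw = InterfaceGateway()
t, screen = gw.handle(None, "alice")  # first request opens a session
t, screen = gw.handle(t, "1")         # follow-up request reuses it
print(screen)  # Enter name:
```

A production gateway would also need to expire sessions whose user simply stops sending requests; that is omitted here.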
For generic translators, the user interface (and not the data!)
of a legacy IS is described in a formal way [Perrochon
et al. 95]. This description mainly covers form and function, and not
the "look and feel".
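To illustrate what describing "form and function" might mean, the Python sketch below uses a hypothetical dictionary format: each field of a terminal screen is given a name, a label, a position, and a length. This is not the description language of [Perrochon et al. 95]; it only shows how one description can drive both directions, generating an HTML form and reading field values back off a captured screen.

```python
from html import escape

# Hypothetical formal description of one legacy screen: which fields exist
# and where they sit on the terminal, not how the screen looks.
SCREEN = {
    "title": "Customer lookup",
    "fields": [
        {"name": "custno", "label": "Customer no.", "row": 3, "col": 20, "length": 6},
        {"name": "name", "label": "Name", "row": 4, "col": 20, "length": 20},
    ],
}


def to_html_form(screen, action):
    """Generate an HTML form with one input per described field."""
    parts = [f"<h1>{escape(screen['title'])}</h1>",
             f'<form method="get" action="{escape(action)}">']
    for field in screen["fields"]:
        parts.append(f'{escape(field["label"])}: '
                     f'<input name="{escape(field["name"])}" maxlength="{field["length"]}">')
    parts.append('<input type="submit"></form>')
    return "\n".join(parts)


def read_fields(screen, lines):
    """Extract field values from a captured terminal screen (list of rows)."""
    return {f["name"]: lines[f["row"]][f["col"]:f["col"] + f["length"]].strip()
            for f in screen["fields"]}


screen_dump = [""] * 24
screen_dump[3] = "Customer no.".ljust(20) + "004711"
screen_dump[4] = "Name".ljust(20) + "Meier"
print(read_fields(SCREEN, screen_dump))  # {'custno': '004711', 'name': 'Meier'}
```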
4. Gateway Interaction with the World-Wide Web
Gateways not only access the legacy information system; they
must also interact with a W3 client. Most gateways are not directly
accessible by the client, but are coupled with a W3 server. This
section classifies gateways by how they interact with the W3 client.
4.1 The Common Gateway Interface (CGI)
Most of the gateways listed
in Table 2 require a running W3 server capable of
starting programs using the Common Gateway Interface (CGI). CGI defines how data
is transferred between a W3 server and a gateway. Most W3 servers on UNIX support
CGI.
When a client sends a request to a server (1), the server starts an
external program (2) and the request is forwarded to this program
(3). The program then computes the result and sends it back to the
W3 server (4). The W3 server then sends it to the client (5).
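The external program in the steps above can be very small. The following Python sketch is a minimal CGI gateway, assuming a GET request whose parameters arrive in the `QUERY_STRING` environment variable; the `name` parameter is hypothetical. It uses only `urllib.parse` and `html` from the standard library. The response is a header, an empty line, and the document, written to standard output for the W3 server to pass on.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build the complete CGI output for one request.

    The W3 server passes the request in environment variables; for a GET
    request, the form data arrives URL-encoded in QUERY_STRING.
    """
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", [""])[0]
    body = f"<html><body><p>You asked for: {escape(name)}</p></body></html>"
    # CGI output: header lines, one empty line, then the document itself.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(cgi_response(os.environ))
```

Separating `cgi_response` from the I/O makes the gateway easy to test without a W3 server. Note that the server starts a fresh process for every request, which is the performance cost that stand-alone servers avoid.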
4.2 Server Side Includes (SSI)
A server side include (SSI)
consists of a special sequence of characters (tokens) inside an HTML page. As
the page is sent from the W3 server to the requesting client (2, 5), the page is
scanned by the server for these special tokens. When a token is found, the server
interprets the instructions in the token and performs an action based on the token
data (3, 4). This is similar to the "mail-merge" function of a word processor, where
a document is merged with a database. However, in our case, the result is only
one single document with all the data in it.
Some of the gateways listed in Table 2 are based on
SSI. SSI can be implemented in the server itself or provided by a CGI program
that merges the HTML document and the data from the legacy system.
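The scan-and-replace step can be sketched in Python with a regular expression. The directive `<!--#legacy field="..." -->` is invented for this example; real servers define their own directives of the same `<!--#... -->` form. The `lookup` callable stands in for whatever query the server would run against the legacy system.

```python
import re
from html import escape

# Hypothetical include token; real servers define their own directives.
TOKEN = re.compile(r'<!--#legacy\s+field="([^"]*)"\s*-->')


def expand_includes(page, lookup):
    """Scan an HTML page for include tokens and replace each with data.

    `lookup` maps a field name to a value from the legacy system.
    """
    return TOKEN.sub(lambda m: escape(str(lookup(m.group(1)))), page)


legacy_record = {"customer": "Meier", "balance": 1200}
template = ('<p>Customer: <!--#legacy field="customer" --></p>\n'
            '<p>Balance: <!--#legacy field="balance" --> CHF</p>')
print(expand_includes(template, legacy_record.get))
```

As in a mail merge, the page author controls the layout and the tokens only mark where legacy data is inserted.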
4.3 Gateways as Stand Alone Servers
Some of the gateways are
their own server. This normally brings a big improvement in performance
and greater flexibility when designing the new W3 application. This is the
route all of today's database companies are taking: they sell dedicated servers
that are highly optimized to interact with their DBMS.
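One source of the performance gain is that a stand-alone server keeps its database connection open across requests instead of starting a new program for each one. The Python sketch below shows that structure using the standard `http.server` module (a simple server not intended for production use) with SQLite standing in for the vendor DBMS; the `customer` table and `make_server` helper are hypothetical.

```python
import sqlite3
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the vendor DBMS. The connection is opened once, when the server
# starts, and reused for every request. check_same_thread=False allows the
# server to run in another thread; HTTPServer handles one request at a time,
# so the connection is never used concurrently.
DB = sqlite3.connect(":memory:", check_same_thread=False)
DB.execute("CREATE TABLE customer (name TEXT, city TEXT)")
DB.executemany("INSERT INTO customer VALUES (?, ?)",
               [("Meier", "Zurich"), ("Dupont", "Geneva")])


class GatewayHandler(BaseHTTPRequestHandler):
    """Answers every GET request directly from the database."""

    def do_GET(self):
        rows = DB.execute("SELECT name, city FROM customer ORDER BY name").fetchall()
        items = "".join(f"<li>{escape(n)} ({escape(c)})</li>" for n, c in rows)
        data = f"<html><body><ul>{items}</ul></body></html>".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence the default per-request log lines


def make_server(port=8080):
    """Create (but do not start) the stand-alone gateway server."""
    return HTTPServer(("127.0.0.1", port), GatewayHandler)
```

Calling `make_server().serve_forever()` would serve the page at `http://127.0.0.1:8080/`. A vendor server would add concurrency, connection pooling, and DBMS-specific optimizations on top of this basic structure.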
4.4 Mobile Code Systems (MCS)
Recently, a new approach
has emerged. The idea is to distribute the code of the application by sending it
to the client, which then executes the code locally on the user's machine.
Although the existing application of the legacy system will probably not fit into
the W3, this idea allows rebuilding parts of the user interface of the legacy
system on the W3 and running them on the client computer. Of course, additional
access control mechanisms, usually including authentication, are needed to maintain
the security and integrity of the client computer.
The simplest approach is to send a compiled program to the user and
start it there. Besides posing considerable security risks, this approach
depends on the user's computer type.
There are also interpreted mobile code systems available. Especially
designed for the W3 is Java
(<http://java.sun.com/>) from Sun. Java is a programming
language similar in syntax to C++, but similar in other ways to
Smalltalk and Objective C. The system supports secure loading, so
that code from untrusted sources can be added dynamically. The
demonstration application for Java, HotJava, is a complete web
browser. Applications written in Java are executable on any computer
with a Java-compatible browser. Another approach is based on Safe-Tcl
[van Doorn et al. 95].
MCS are still a matter of promising research. So far, there are no
gateways based on the MCS approach available.
5. Research Agenda
The W3 does not support transactions very well: as soon as the W3
server has responded to the client's request, the connection between
them is closed. If a transaction takes only one request and can be
committed immediately, this is no problem. However, if a separate
commit or follow-up requests are required, the problem of tracking
follow-up requests arises.
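One possible way to track follow-up requests, sketched here in Python as an assumption rather than an established solution, is to hand the client a transaction token and let the gateway buffer the requested operations until an explicit commit request arrives. Only then are all operations executed, in a single database transaction, so no database locks are held while the gateway waits between requests. The class `DeferredTransactions` is hypothetical, and SQLite stands in for the DBMS.

```python
import secrets
import sqlite3


class DeferredTransactions:
    """Track a multi-request update through a token handed back to the client.

    Follow-up requests only add operations to a pending list kept by the
    gateway; nothing touches the DBMS until the commit request, which runs
    all operations in one database transaction.
    """

    def __init__(self, conn):
        self.conn = conn
        self.pending = {}  # token -> list of (sql, params)

    def begin(self):
        token = secrets.token_hex(8)
        self.pending[token] = []
        return token

    def add(self, token, sql, params=()):
        self.pending[token].append((sql, params))  # KeyError for an unknown token

    def commit(self, token):
        ops = self.pending.pop(token)
        with self.conn:  # commits on success, rolls back if any statement fails
            for sql, params in ops:
                self.conn.execute(sql, params)

    def abort(self, token):
        self.pending.pop(token, None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 0)")
conn.commit()

tx = DeferredTransactions(conn)
t = tx.begin()                                                       # request 1
tx.add(t, "UPDATE account SET balance = balance - 30 WHERE id = 1")  # request 2
tx.add(t, "UPDATE account SET balance = balance + 30 WHERE id = 2")  # request 3
tx.commit(t)                                                         # request 4
print(conn.execute("SELECT balance FROM account ORDER BY id").fetchall())  # [(70,), (30,)]
```

The price of this design is that intermediate results are invisible to the client until commit, and abandoned tokens must eventually be discarded by the gateway.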
Besides all the problems involved with transactions, updates also face
another problem: no proper user identification is possible on the W3.
5.3 "Hard" legacy systems
Some legacy systems may still resist integration. The Development and
Applications Research Group of the Department of Computer Science
of ETH Zurich is currently conducting further research on this
topic. One research project tries to enhance the existing integration
approaches; another concentrates on the migration of legacy
information systems. We are also interested in cooperation with
industrial partners in this area.
As legacy IS are not designed to be used as W3 servers,
gateways are required to integrate legacy IS and the W3. At
the same time, we also want to use a gateway to insulate the IS from
the W3. Gateways can be classified by their interaction with
both the IS and the W3.
Many gateways are already available. The future will bring us
specialized W3 servers, tailored for one IS product and supported by the
vendor. This conforms to the right-sizing approach, which requires
separate modules for separate function blocks.
[van Doorn et al. 95] van Doorn, M., Eliëns, A. Integrating applications
and the World-Wide Web. The Third International World-Wide Web Conference
1995 (WWW'95), Darmstadt, April 1995. Computer Networks and ISDN Systems 27(1995)
1105-1110. See <http://www.igd.fhg.de/www/www95/papers/48/main.html>.
[Perrochon et al. 95] Perrochon, L., Fischer, R. IDLE: Unified W3-Access
to Interactive Servers. The Third International World-Wide Web Conference
1995 (WWW'95), Darmstadt, April 1995. Computer Networks and ISDN Systems 27(1995)
927-938. See <http://www.inf.ethz.ch/department/IS/ea/tsp/>.
[RDA] Remote Database Access (RDA) was published in 1993. It consists of two
parts: Part 1, Generic RDA (ISO 9579-1:1993), and Part 2, SQL Specialization
(ISO 9579-2:1993). Part 1 specifies the generic model, service, and protocol for
arbitrary database connection, and Part 2 specifies additional protocols for
connecting databases conforming to SQL. See <http://www.iso.ch/>.
[Ronchetti 95] Ronchetti, M. Face Lift: using WWW technology for an external
re-engineering of old applications. The Third International World-Wide Web Conference
1995 (WWW'95), Darmstadt, April 1995. See <http://www.inf.unitn.it/~ronchet/CBT/>.
[SQL 92] SQL is under continual development by the International Organization
for Standardization (ISO). The most recent published version was ISO 9075:1992.
[Varela et al. 94] Varela, C. A., Hayes, C. C. Providing Data on the
Web: From Examples to Programs. The Second International WWW
Conference '94: Mosaic and the Web, Chicago, USA, October 1994.
Table 1. URLs for various information.
Table 2. Overview of available products for
the integration of legacy systems into the W3.
©1995 L. Perrochon