perrochon@acm.org
This article deals with the problem of integrating information systems (IS) and the World-Wide Web (W3). It gives a short overview of the different approaches. The first experiments took place in summer 1992, when Arthur Secret at CERN implemented HTOracle, a simple gateway from W3 to Oracle. HTOracle proved the feasibility of the approach. Since then, ongoing work has resulted in more sophisticated products.
In fact, many organizations are now starting to build internal applications based on IS and the W3. Cross-platform development, client/server architecture, simplicity of development, and ease of use are very attractive reasons to take a close look at the W3.
This article does not
try to evaluate different products. That is beyond the scope of a single
article because of the speed of improvements and the variety of products. Any such
evaluation would be outdated the day it was finished. This article only classifies
different approaches to the problem.
However, Table 2 contains a list of currently available products for IS-W3 integration. Using existing products has several advantages. Besides reducing the cost and risk of development, products (especially commercial products) are quite well tested and are used in many places, so a lot of knowledge and support is available.
An IS can be considered as having three components: 1) one or several applications, 2) a database service or database management system (DBMS), and 3) the data. The architecture of legacy IS varies widely, from a set of well-structured modules to unstructured and non-decomposable blocks.
The W3 client can be
considered a new application with a new user interface that is added to the
legacy IS. Ideally, we want to connect the W3 browser directly to the DBMS of
the legacy IS. Unfortunately, some legacy IS do not have a DBMS module. If they
do, the DBMS may not support the type of connectivity required by the W3 (e.g.
stateless protocol, normally no user identification, ...).
On the W3, most information is stored as static Hypertext Mark-up Language (HTML) documents. When legacy IS are accessed, the hypertext needs to be generated dynamically. As it is not the purpose of a DBMS to produce HTML on-the-fly, some other component is required.
Software tools, called gateways, have been developed to bridge this gap. In general, gateways insulate certain components of an IS from changes being made to other components. Gateways translate requests and data between the mediated components. In our context, gateways especially help developers to concentrate on the hard part of the integration: designing the new W3 user interface. Low-level aspects of communication are more or less hidden.
The gateways classified below simultaneously interact with the W3 and the legacy IS. The following two-dimensional classification is based on the design of these two interactions.
Some gateways access the
flat data files of the legacy IS directly. This is only possible if the format
of the data is known and the files are accessible.
With sophisticated gateways, only the format of the data in the files is described, queries are specified, and output masks for the presentation of the retrieved data are designed. The queries are then interpreted by the gateway. If no suitable gateway is available, one has to be developed with one of the techniques mentioned below (CGI, SSI).
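As a minimal sketch of this idea, the following Python fragment describes a fixed-width record format, interprets a simple equality query, and fills an output mask. The field names, column widths, sample data, and mask are assumptions made for illustration. They are not taken from any of the products in Table 2.

```python
from html import escape

# Hypothetical format description: field name -> (start column, end column).
RECORD_FORMAT = {"id": (0, 4), "name": (4, 20), "city": (20, 32)}

# Hypothetical output mask used to present each retrieved record.
OUTPUT_MASK = "<li>{name} ({city})</li>"


def parse_record(line):
    """Cut one fixed-width line into fields according to RECORD_FORMAT."""
    return {field: line[start:end].strip()
            for field, (start, end) in RECORD_FORMAT.items()}


def run_query(lines, field, value):
    """Interpret a simple equality query against the flat file contents."""
    records = (parse_record(line) for line in lines if line.strip())
    return [rec for rec in records if rec[field] == value]


def render(records):
    """Fill the output mask for every record and wrap the result in HTML."""
    items = "\n".join(
        OUTPUT_MASK.format(**{k: escape(v) for k, v in rec.items()})
        for rec in records)
    return "<ul>\n" + items + "\n</ul>"


# Sample flat-file contents (made up for this sketch).
SAMPLE = [
    "0001" + "Alice".ljust(16) + "Zurich".ljust(12),
    "0002" + "Bob".ljust(16) + "Geneva".ljust(12),
    "0003" + "Carol".ljust(16) + "Zurich".ljust(12),
]

print(render(run_query(SAMPLE, "city", "Zurich")))
```

Only RECORD_FORMAT, OUTPUT_MASK, and the query change when another file is to be published. The parsing and rendering code stays the same, which is the point of a generic data gateway.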
The data gateway approach is especially successful if the legacy data is stored in the Standard Generalized Markup Language (SGML) format or can be exported to SGML. As HTML is an application of SGML, such documents can be integrated into the W3 very easily.
If we have a legacy IS with
a modern DBMS, the gateway should access the DBMS. This is the most convenient
approach and thus normally selected for new information systems. Simple gateways
replace direct access to flat data files with simple queries to the DBMS, which
then acts as a search engine. The presentation of the retrieved data is similar
to the first approach. More complex gateways also exploit the programming features
of databases (e.g. stored procedures), so that very powerful W3 applications can be
programmed and stored mostly inside the database. Some DBMS companies already
offer their own gateways. For the most important DBMS, public domain gateways are
freely available.
Database gateways can be further classified by the number of layers between the gateway and the DBMS. While many gateways access the DBMS directly, some have an additional layer in between, e.g. ODBC drivers.
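The simple form of a database gateway can be sketched as follows. The example uses an in-memory SQLite database from the Python standard library as a stand-in for the legacy DBMS. The table, its columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3
from html import escape


def open_demo_database():
    """Create an in-memory database standing in for the legacy DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT, stock INTEGER)")
    conn.executemany("INSERT INTO parts (name, stock) VALUES (?, ?)",
                     [("bolt", 120), ("nut", 0), ("washer", 45)])
    conn.commit()
    return conn


def query_to_html(conn, min_stock):
    """Run a parameterized query and present the rows as an HTML table."""
    cursor = conn.execute(
        "SELECT name, stock FROM parts WHERE stock >= ? ORDER BY name",
        (min_stock,))
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(escape(str(name)), stock)
        for name, stock in cursor)
    return ("<table>\n<tr><th>Name</th><th>Stock</th></tr>\n"
            + rows + "\n</table>")


conn = open_demo_database()
print(query_to_html(conn, 1))
```

The DBMS does the searching and the gateway only formats the result. Passing the user-supplied value as a query parameter, rather than pasting it into the SQL text, keeps input from the W3 client from altering the query.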
The third approach is based
on the following idea: As legacy IS already have a user interface, a gateway that
accesses the user interface could be developed. It handles the translation from
a stateless protocol like the Hypertext Transfer Protocol (HTTP) into the stateful
one required by most legacy IS applications (e.g. VT-100, IBM's 3270).
For generic translators the user interface (and not the data!)
of a legacy IS is described in a formal way [Perrochon
et al. 95]. This description mainly covers form and function, and not
the "look and feel".
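The translation from a stateless to a stateful protocol can be illustrated with a small sketch. The gateway below keeps one session object per client alive between requests and finds it again through a token that the client returns with every request, for example in a hidden form field. The LegacySession class, its two-step dialogue, and the token scheme are hypothetical stand-ins. They do not describe the formal description technique of [Perrochon et al. 95].

```python
import uuid


class LegacySession:
    """Stand-in for a stateful terminal session (e.g. a VT-100 dialogue)."""

    def __init__(self):
        self.screen = "LOGIN"

    def send(self, user_input):
        # Hypothetical two-step dialogue: log in first, then issue commands.
        if self.screen == "LOGIN":
            self.screen = "MENU"
            return "Welcome " + user_input + ". Enter a command."
        return "Result of '" + user_input + "'"


# The gateway keeps the stateful sessions alive between stateless requests.
SESSIONS = {}


def handle_request(token, user_input):
    """Map one stateless request onto the matching stateful session."""
    if token not in SESSIONS:
        token = uuid.uuid4().hex  # unknown client: open a new session
        SESSIONS[token] = LegacySession()
    reply = SESSIONS[token].send(user_input)
    return token, reply


token, reply = handle_request(None, "alice")
print(reply)
token, reply = handle_request(token, "list orders")
print(reply)
```

A real gateway would also have to expire abandoned sessions, since a stateless client never announces that it has left.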
Most of the gateways listed
in Table 2 require a running W3 server capable of
starting programs using the Common Gateway Interface (CGI). CGI defines how data
is transferred between a W3 server and a gateway. Most W3 servers on UNIX support
this standard.
When a client sends a request to a server (1), the server starts an
external program (2) and the request is forwarded to this program
(3). The program then calculates the result and sends it back to the
W3 server (4). The W3 server then sends it to the client (5).
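The external program in this exchange can be very small. The sketch below assumes a GET request, for which the server passes the query in the QUERY_STRING environment variable, and a hypothetical parameter called name. The program writes a header, a blank line, and the generated HTML to standard output, from where the server forwards it to the client.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs


def build_response(query_string):
    """Compute the result for one request: header, blank line, HTML body."""
    params = parse_qs(query_string)
    name = params.get("name", ["world"])[0]  # "name" is a made-up parameter
    body = "<html><body><p>Hello, {}!</p></body></html>".format(escape(name))
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The W3 server sets QUERY_STRING before it starts the program.
    sys.stdout.write(build_response(os.environ.get("QUERY_STRING", "")))
```

A gateway to a legacy IS would replace the greeting with a call to the data files, the DBMS, or the application, as described in the classification above.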
A server side include (SSI)
consists of a special sequence of characters (tokens) inside an HTML page. As
the page is sent from the W3 server to the requesting client (2,5), the page is
scanned by the server for these special tokens. When a token is found, the server
interprets the instructions in the token and performs an action based on the token
data (3,4). This is similar to the "mail-merge" function of a word processor, where
a document is merged with a database. However, in our case, the result is only
one single document with all the data in it.
Some of the gateways listed in Table 2 are based on SSI. SSI can be implemented in the server itself or provided by a CGI program that merges the HTML document and the data from the legacy system.
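The merge step can be sketched in a few lines. The token syntax used here, an HTML comment of the form <!--#data key="..." -->, is an assumption made for this example. Actual servers and gateways define their own token formats. The lookup function stands in for the access to the legacy system.

```python
import re
from html import escape

# Hypothetical token syntax: <!--#data key="FIELD" -->
TOKEN = re.compile(r'<!--#data key="([^"]+)" -->')


def lookup(key, record):
    """Stand-in for fetching a value from the legacy system."""
    return escape(str(record.get(key, "")))


def expand_includes(page, record):
    """Scan the page and replace every token with the data it names."""
    return TOKEN.sub(lambda match: lookup(match.group(1), record), page)


PAGE = ('<p>Customer: <!--#data key="name" -->, '
        'balance: <!--#data key="balance" --></p>')
print(expand_includes(PAGE, {"name": "Smith & Co", "balance": 42}))
```

Because the tokens are HTML comments, a page that is served without expansion still displays correctly. Only the data is missing.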
Some of the gateways are
their own server. The advantage of this is normally a big improvement in performance
and greater flexibility when designing the new W3 application. This is the
way today's database companies are going. They are selling dedicated servers
that are highly optimized to interact with their DBMS.
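A gateway that is its own server answers HTTP requests directly. The sketch below uses the http.server module of the Python standard library. The INVENTORY data and the URL layout are assumptions made for illustration. Since no external program has to be started for each request, this design plausibly accounts for part of the performance gain mentioned above.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for data held by the legacy IS.
INVENTORY = {"/bolt": 120, "/washer": 45}


def render_page(path):
    """Return (status code, HTML body) for the requested path."""
    if path in INVENTORY:
        return 200, "<p>{}: {} in stock</p>".format(
            escape(path[1:]), INVENTORY[path])
    return 404, "<p>Unknown item</p>"


class GatewayHandler(BaseHTTPRequestHandler):
    """Answers HTTP requests itself, without a separate W3 server or CGI."""

    def do_GET(self):
        status, body = render_page(self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Run the gateway as its own server until interrupted."""
    HTTPServer(("", port), GatewayHandler).serve_forever()
```

Calling serve() starts the gateway. A request for /bolt then returns the stock level as an HTML page.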
Recently, a new approach
has emerged. The idea is to distribute the code of the application and send it
to the client. The client then executes the code locally on the user's machine.
Although the existing application of the legacy system will probably not fit into
the W3, this idea allows rebuilding parts of the user interface of the legacy
system on the W3 and running them on the client computer. Of course, there must be
additional access control mechanisms, usually including authentication, to maintain
the security and system integrity of the client computer.
The simplest approach is to send a compiled program to the user and start it there. Besides considerable security risks, this approach is dependent on the user's computer type.
There are also interpreted mobile code systems available. Java (<http://java.sun.com/>) from Sun is designed especially for the W3. Java is a programming language similar in syntax to C++, but similar in other ways to Smalltalk and Objective-C. The system supports secure loading, so that code from untrusted sources can be added dynamically. The demonstration application for Java, HotJava, is a complete web browser. Applications written in Java are executable on any computer with a Java-compatible browser. Another approach is based on Safe-Tcl [van Doorn et al. 95].
Mobile code systems (MCS) are still a matter of promising research. So far, no gateways based on the MCS approach are available.
There are many gateways available already. The future will bring us
specialized W3 servers, each tailored for one IS product and supported by the
vendor. This conforms to the right-sizing approach, which requires
separate modules for separate function blocks.
[Perrochon et al. 95] Perrochon, L., Fischer, R. IDLE: Unified W3-Access to Interactive Servers. The Third International World-Wide Web Conference 1995 (WWW'95), Darmstadt, April 1995. Computer Networks and ISDN Systems 27 (1995) 927-938. See <http://www.inf.ethz.ch/department/IS/ea/tsp/>.
[RDA] RDA was published in 1993. It consists of two parts: Part 1 - Generic RDA. ISO 9579-1:1993 and Part 2 - SQL Specialization ISO 9579-2:1993. Part 1 specifies the generic model, service, and protocol for arbitrary database connection and Part 2 specifies additional protocols for connecting databases conforming to SQL. See <http://www.iso.ch/>.
[Ronchetti 95] Ronchetti, M. Face Lift: using WWW technology for an external re-engineering of old applications. The Third International World-Wide Web Conference 1995 (WWW'95), Darmstadt, April 1995. See <http://www.inf.unitn.it/~ronchet/CBT/>.
[SQL 92] SQL is under continual development by the International Organization for Standardization (ISO). The most recent published version was ISO 9075:1992. See <http://www.iso.ch/>.
[Varela et al. 94] Varela, C. A., Hayes, C. C. Providing Data on the
Web: From Examples to Programs. The Second International WWW
Conference '94: Mosaic and the Web, Chicago, USA, October 1994.
Table 2. Overview of available products for the integration of legacy systems into the W3.