Enlisting Event Patterns for Cyber Battlefield Awareness
Louis Perrochon, Eunhei Jang, Stephane Kasriel, David C. Luckham
Stanford University
{perrochon,ejang,dcl}@pavg.stanford.edu
(Note: Save as HTML in Word does not include EndNotes references. Read the PostScript file if you need those.)
Abstract
Cyber warfare consists to a large degree of reaction to activities happening in the information infrastructure. Better knowledge of the status of this infrastructure at any time allows more appropriate reactions. Context-based event correlation can provide a more appropriate view of the cyber battlefield by providing users a view on the desired level of abstraction. We informally introduce context as the temporal and causal relations between events. Event correlation based on event patterns in a declarative language means we specify what to detect, instead of how to detect. We describe the Stanford University context-based event correlator that is able to process events on-line, as they are generated. It can be reconfigured dynamically while it is running. On the example of intrusion detection, we show how Complex Event Processing (CEP) increases detection rate, reduce false alarms, and detect large-scale attack patterns at an early stage.
1. Introduction
As organizations build ever-larger computer networks, security has become an increasingly important issue. The growth of the Internet has resulted in an increase in the size of individual networks as well as an increase in the volume of traffic flowing through the network. As system log files have grown to contain gigabits of data, it has become almost impossible to manually trace through log files to detect attempts at intrusion activity or other security violations. This has given rise to a need for software tools that automate network monitoring for security intrusions .
Typically, these tools gain knowledge of the target system based on individual events that denote the target system’s activities. When multiple events are considered at the same time we call the process correlation. Correlation allows us to deduce knowledge beyond what is in the individual events. Context-based correlation considers the context of events, including their temporal and causal relations to each other. The Stanford University Complex Event Processor models context of activities as partially ordered sets of events. Partial orders are a natural model of activity in asynchronous distributed systems that may not have a central clock or a single thread of causality. In this article we describe Complex Event Processing (CEP) and its application to cyber warfare in general and intrusion detection in particular.
Some of the results that can be achieved by using context-based event correlation are:
CEP provides on on-line overview of the state of the cyber battlefield. Patterns of events that may lead to a failure (e.g. a DNS server having slower and slower response time) are detected as they occur. CEP provides immediate notification of such conditions. Because the context of events is maintained, user driven drill-down from a notification message back to the root cause is possible. Automatic prioritizing of alerts and quick root cause analysis leads to reduced response time, higher up time and allows system managers to quickly respond to critical situations. CEP also allows automatic response based on pre-defined rules.
This article gives an informal introduction over CEP and the results achieved by applying it to cyber warfare, an ongoing process. More formal discussion of CEP and causal analysis, although not specific to cyber warfare can be found in . Also, we do not have quantitative measurement yet. How to quantitatively measure cyber warfare capabilities is an issue of ongoing research, and we do not have the answer.
This article has two major parts. Section 2 describes CEP applied to information survivability in general terms and section 3 discusses two experiments performed under the DARPA Information Survivability project.
2. The Stanford Event Correlator
This section starts out with our model of context-based event correlation and describes our approach to allow flexible configuration of the correlation. It then describes the architecture of our event correlation engine.
2.1. Context-based Event Correlation
In order to understand complex systems and efficiently deal with complex patterns, simple filters solely based on event content are not enough. CEP event correlation algorithms are based on earlier work on CEP theory (e.g. ). In the Stanford University CEP engine events are stored as complex objects together with their relationships to each other. CEP supports event relationships beyond time, e.g. causality: one event causes another. In today's networked environments, events come from multiple independent sources and not all events are ordered in respect to each other. If the original implicit partial order of events is reduced to a total order then information is lost and non-determinism is introduced . This is typically the case when activity is logged to a log file. Thus we use partial orders to maintain the original relations.
2.2. Concept Abstraction Hierarchies
A fundamental part of Complex Event Processing is the mathematical specification of concept abstraction hierarchies. Hierarchical organization of event pattern processing is a critical component in recognition and classification of complex activity. It ensures that users receive information at the appropriate semantic level. Low level events streams may be our main source of data, (e.g., raw network intrusion detector output). A technique is needed to correlate subsets of these events that signify possible parts of attacks. Hierarchical organization of CEP lets us configure CEP to detect event patterns of low level events and generate high level events that correlate and summarize the data in the lower level pattern instances. Higher level events can be fed to even higher level event processing agents, transitively, until a complex activity is humanly recognizable. An abstraction hierarchy gives us two very important things:
it defines a set of concepts in terms of which views of the target system can be constructed,
it structures the concepts into a hierarchy of levels, and defines mathematical relationships between the concepts in different levels.
We can use aggregator agents containing event pattern mapping rules to specify relationships between the concepts at different levels in the definition of an abstraction hierarchy. This gives us an event processing network (EPN) which is an executable form of hierarchy definition. This EPN will generate a hierarchy of events corresponding to the hierarchy of concepts. We can then add additional event processing agents (EPAs) that construct views consisting of abstract events corresponding to concept levels in a specified hierarchy. When sets of events denoting instances of concepts at one level occur, they trigger aggregators in an EPN corresponding to the hierarchy mappings which then generate abstract events denoting instances of higher level concepts.
Concept hierarchies also work the other way round. The user can specify actions on a high-level of abstractions and event processing agents translate these instructions into corresponding lower level events.
Abstraction Hierarchies are useful for understanding a system as they visualize levels of abstractions often used by humans when thinking about complex systems. Hierarchies are implemented by sequence of aggregation steps (called maps) that create more and more abstract events, all deduced from a basic source of events. The same mechanism can be used to provide different views of the same system to different users. The relation of events is maintained between different views.
2.3. Dynamic Causal Models
Dynamic causal models are one form of event processing. Events in event streams are automatically processed to include genetic information indicating which other events had to happen in order for them to happen. The genetic data in an event allows the events in its causal ancestry to be immediately uncovered by other EPAs, making their operation more efficient and accurate. Causal information must be recognized across multiple event streams. Extracting and developing accurate causal models for C2 environments is another research issue.
2.4. Dynamic Configuration
CEP queries are flexible and configurable at run time so the user can adjust them at any time. This includes starting a new query against an ongoing event stream, that either considers only new events, only old events, or both. A formal pattern language supports the construction of filters and maps . Filters and maps consist of rules of the form if condition then statement. As a simple example, consider the following rule:
PenetrationAttempt(targetIP is ?h) ->
RootActivity(hostIP is ?h)
=> Escalation(target is ?h)
The pattern specifies that two events PenetrationAttempt and RootActivity that are causally related (->) trigger (=>) the creation of an Escalation event. The value of the event parameters targetIP and hostIP is bound to the variable ?host, and has to be the same in both triggering events. This value is also assigned to the target parameter of the new Escalation event.
Having a high level event pattern language allows for rapid configuration of filters and maps. Patterns represent a declarative way of specifying what needs to be detected, instead of specifying how it should be detected. All filters and maps can be added, modified and removed at run time. Easy configuration is especially important for intrusion detection as policies may be site specific. Any intrusion detection system must be tuned for its environment.
2.5. Event Processing Networks
CEP is component-based to allow a variety of different components to process a broad spectrum of events. A component architecture also allows reuse of components across different environments. We build event processing networks (EPNs) consisting of any number of event processing agents (EPAs), namely event suppliers, event processors, and event consumers. EPNs are networks of communicating agents, where the communication topology of the network can vary at run time (i.e., is dynamic). Figure 1@@ shows an overview over the three categories, with broad arrows indicating the (logical) flow of events from suppliers through processors to consumers.
Event sources in our applications are typically middleware sniffers. The target system middleware can be pure TCP/IP, Common Intrusion Detection Framework (CIDF) (see experiment below), or any middleware layer, etc.
Typical examples for event processors are filters and maps. Filters pass on only a subset of their input, maps aggregate multiple events in the input to output events thus generating events on a higher level of abstraction. Any third party event processor can be inserted into an EPN allowing for the integration with other approaches. Typical consumers are viewers like a graphical viewer for partially ordered sets of events, or a tabular viewer of event frequency. Another important consumer in the Common Intrusion Detection Framework context is a responder (Figure 1@@). Responders generate activities in the target system based on events.
EPNs are dynamic in that all EPAs can be added and removed at run time. Newly added agents can ignore all previous events and just start with the current event at the time they are added, or they can try to catch up all events from the beginning. As EPNs are distributed, EPAs can reside on machines distributed across a network.
2.6. Relational Representation
The event pattern language used in CEP has been developed from the RAPIDE pattern language . In this section we give a short introduction to an alternative representation of the same event pattern language that is useful in representing and optimizing Event Processing Networks. The relational representation and possible optimizations are discussed in .
We work in an algebra on the set C of all the event containers in a given universe. We are either working in terms of binary relations over C px C q or in terms of predicates. The two notions are linked: For a relation definition, the following statement holds: x R y = PR(x,y), where PR is a predicate that defines R. In other terms, R={(x,y)|PR(x,y)}. Considering this way of viewing a relation (i.e. by its graph), one can apply the usual set operations to relations.
Composition is defined as:
![]()
This leads to the notion of inverse of a relation R, written R-1, where R;R-1 = Id(C p) and R-1;R = Id(C q)
We will also need to use the concept of power set:
![]()
(where we write y as a vector as a reminder that it is actually a subset of Cq).
Finally, because Event Processing Networks typically have fan-out portions, we also define the concept of pairs or tuples of relations:
![]()
A useful operator which is directly related to the concept of pairs is the dup operator, which transforms x into (x,x).
Finally, we can define conjugation by graphs: the graph of the conjugate of a relation R, written
, is the complement of the graph of R.
One can prove that C , with conjugation and composition has an algebraic structure.
From the basic definitions, we can define the usual set operators:
Intersection:![]()
Union:![]()
Difference:![]()
Given this framework, we can define the CEP patterns. A pattern p is a binary relation p: CxC such that:
if
matches p.
A pattern macro in CEP is a parametrizable pattern. CEP has six predefined pattern macros , which take patterns as parameters and are the glue that makes up the pattern language. These predefined pattern macros express causal relationships; for instance, aà b means "b causally follows a", a||b means "b and a are causally independent", and a~b means "b causally follows a or b and a are causally independent".
Because of their key importance in the language, and because we are following a bottom-up approach, it makes sense to start by expressing those predefined pattern macros, in terms of their predicates. This is summarized in the following:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Here, we assume, according to the causal models, that a causal relation amongst events is known, namely
. This is a partial order, with the notion of equivalent events, noted
.
From the pattern macros, we can work towards more complex patterns, using the following relation: Given two patterns p1 and p2, and a pattern macro +, which is one of à , ~, ||, OR, AND or U, and the corresponding predicates, we have:
![]()
This is very powerful, as it lets us now write every non iterative CEP pattern.
Most of the agents that are used in Event Processing Networks can be defined in terms of the algebra presented in the previous section, possibly using the pattern language. As the number of such Event Processing Agents is quite large (and growing), we will only present here some typical and interesting examples, namely simple filters and maps.
Although it is probably not possible to express all possible EPAs in relational form (maybe not even every possible map), this representation is still useful to optimize an EPN when possible.
The task of a filter is, given a pattern p, to only let pass through those events that are part of a match for p. Filters can be very complex, given the power of the CEP filter language. They can maintain state between matches. Such state makes our representation much harder to write (although it does not make it impossible), and we therefore present the results for stateless filters.
Given this intuitive definition, two definitions are possible for filters:
The first one produces a set of computations, and can be written as:
. Here a given event can participate in several matches.
In the second definition, only the earliest maximal match is produced. Therefore a filter only produces one computation.
The second definition is the one that is currently implemented, primarily for efficiency reasons, and because, in most cases, only the earliest maximal match matters. However the first one is much easier to represent, because our framework does not include the notion of time, i.e. of which events came first, and which match is earliest.
To make up for the inaccuracy between theory and implementation and to make sure that following EPAs in the EPN get a computation rather than a set of computations as an input, we also define a special kind of EPA, called choose, which selects a computation from a set of computations.
Maps are potentially very complex EPAs. The basic idea behind map is that, given a domain and a range, rules will be triggered when events of the domain match a given pattern, and those rules may produce events in the range of the map. A map is usually a list of rules. A rule is usually made of a trigger (a pattern which describes events which will start the map) and a statement (a pattern which defines which event(s) are produced when the trigger matches).
Because maps can be arbitrarily complex (the map language is a large superset of the pattern language), not much can be done to optimize or even predict map behavior in general.
Therefore we will only concentrate on simple maps, which are basically maps that do not maintain any state. These maps can be represented as a list of triggers, which are CEP patterns, and a list of bodies, which are also CEP patterns. Whenever a set of events matches a pattern in the list of triggers, then corresponding body is executed, leading to another set of events. In the end, maps operate on CxC.
Before introducing the representation of maps, we define the matching of a rule
as
A single ruled map is a little more involved than that, because of the notion of consumption of events: a rule will only trigger on the first earliest set of events (in the causal sense). To represent that in our relational representation, we need to introduce the concept of prefixes. This notion is commonly defined in partial orders and therefore we will only say that a prefix is a subset of a computation which respects the causal ordering of events, i.e. if a prefix contains an event then it contains all the events which causally precede it.
We now have the tools to define a single-ruled map:

Here, the cl’s and cr’s are taken from the corresponding matches, defined in Rnatch, and
is the causal preserving disjoint union operator, i.e. it enforces elements to be disjoint while building them monotonically (following the causal order), which can be expressed using the notion of consistent cuts:
![]()
where
stands for "is a consistent cut of". This notion of monotonic union can also be expressed as:
![]()
Multiple ruled maps can be defined in terms of single rule maps, depending on the semantics of triggering: when multiple rules can trigger, only one of them should trigger. The decision of which rule will actually trigger in such cases is not made as of now because it is unclear whether random, round-robin or user specified priorities would be most useful.
2.7. Event Processing Infrastructure
The EPN is supported by an event service. The event service provides repositories for events and event schemas, templates for processors, and a consumer registry. Event Schemas define the format of events forwarded by the event service. The event repository stores events. Processor templates are configurable templates that can be activated to filter, aggregate, or otherwise process events. The consumer registry keeps track of which consumer is interested in which events.
Data needs to be stored persistently because agents may want to access past events, even long after they have happened. Also the number of objects currently under consideration may easily exceed the size of the available main memory, thus CEP requires some way of storing objects temporarily to disk. CEP includes an event repository that keeps track of all the objects. New objects are written into the data store from where agents and viewers read them. A communication service notifies other EPAs when new objects are added.
Events are processed on-line and are displayed to users as soon as they are created, limited only by the speed of the underlying infrastructure. Network sniffers produce relevant events at rates of thousands/minute or more. CEP supports these rates through efficient pattern matching algorithms, distribution and concurrency, and distributed query optimization. Processed events are displayed in viewers shortly after the underlying events have been created by the event source.
One of the future areas of CEP development is event mining. In the context of intrusion detection, event mining will search streams of intrusion alerts for previously unknown attack patterns to improve the added-value of event correlation .

Figure 1. Complex Event Processing (CEP)
3. Application to Information Survivability
3.1. Adoption of the CEP Pattern Language
An intrusion correlator typically is one component in a larger intrusion detection system. The Common Intrusion Detection Framework (CIDF) defines an integration platform for software applications defending in information warfare. Among the intrusion detectors that currently support CIDF are Network Associate Inc.’s Cybercop Scanner , ISS RealSecure , or SRI’s Emerald . A CIDF system consists of any number of distributed, heterogeneous detectors, correlators, and responders. They all communicate using the Common Intrusion Specification Language (CISL). CISL defines General Intrusion Detection Objects (GIDOs) that contain information about potential attacks and counteractions. Some of the ideas involved in CIDF have encouraged the creation of an Internet Engineering Task Force (IETF) working group, named the Intrusion Detection Working Group .
CIDF was a prime target on which to apply CEP. As a first step, we needed to develop a logger that was able to extract the required information out of CIDF GIDOs. However, once we have GIDOs in our internal format, we needed to extend our pattern language to more easily handle GIDOs. The event correlator typically wants to ask a single GIDO a specific question. We call such a question a basic pattern, it is matched against a single GIDO. The available basic patterns did not support the rich tree structure of GIDOs and we extended the basic pattern mechanism as described below. The result of this basic matching process is then used to correlate the GIDO with other GIDOs.
This extension of the pattern language can be implemented modular, within a single event processing agent. In this section, we will illustrate the first steps in developing a basic pattern extension for GIDO.
Let’s assume that the following is the input GIDO for the computation (GIDOs are easily understandable by humans, see for details).
(And
(ByMeansOf
(Attack
(Initiator
(IPV4Address 134.52.160.76)
)
(Target
(IPV4Address 134.52.160.114)
)
(AttackSpecifics
(Certainty 100)
(Severity 50)
(AttackID 000000020000000f)
)
(When
(BeginTime Mon May 24 12:44:17 1999 PDT)
(EndTime Mon May 24 12:44:18 1999 PDT)
)
)
(SendMessage
(Initiator
(IPV4Address 134.52.160.76)
)
(Receiver
(IPV4Address 134.52.160.114)
)
(Message
(IPV4Protocol 0)
(SourceIPV4Address 134.52.160.76)
(DestinationIPV4Address 134.52.160.114)
(ReferAs 0)
)
(When
(BeginTime Mon May 24 12:34:17 1999 PDT)
(EndTime Mon May 24 12:54:18 1999 PDT)
)
)
)
)
This is an attack that began on Monday, May 24, at 12:44. The first words in the parentheses are called sequence identifiers (SIDs) in CIDF terminology and they are standardized in the CIDF specification. Assuming we need to know the certainty of that this attack happened. We would write a basic pattern:
Attack,Certainty
For this input, CEP will search the GIDO for all the paths that contain "Certainty", and therefore will identify the path "And->ByMeansOf->Attack->AttackSpecifics->Certainty->". This will be the only path, and the certainty of this GIDO will be 100
In another case, we may be interested in the method used for this attack, e.g. if we are investigating attacks based on their methods. The basic pattern
ByMeansOf,Attack#true
Notice the string "#true" that comes after the pattern SIDs. This says that the answer should come from the SID that comes after Attack, not the child of Attack. This can also be implied by looking at the question string. Since the use of the conjunction SID "ByMeansOf" is to indicate the way an action is performed, we can see that the question is to find out the mean of the attack for this event. CEP will generate "SendMessage" as the method of this attack
Let's consider another example for more complex case. Suppose that we want to find out when the attack was made, but not really sure how the GIDO looks. This is the typical case, as we need to come up with patterns for GIDOs that have not been produced yet. One may come up with two possible patterns that may lead to the answer, like:
ByMeansOf,AttackSpecifics,When
ByMeansOf,SendMessage,When
First, CEP will iterate through both of these pattern strings, and generate paths according to the criteria mentioned above. For the same sample GIDO shown above, the generated paths will be
And->ByMeansOf->Attack->When->
And->ByMeansOf->SendMessage->When->
Then, the CEP will perform the selection operation to select one best path. Since the second pattern string can only be fully matched with the given GIDO structure, CEP will pick the second path among those two that are found, and extract the time from there.
3.2. Creating an event processing network
In this experiment, the CEP event correlator receives attack notification from detectors using CIDF and performs further correlation to create added value by sharing the results of multiple components. Figure 2@@ shows how CEP is integrated into CIDF. The framework includes middleware that provides secure transport of GIDOs. These GIDOs are formulated in the Common Intrusion Specification Language (CISL). The middleware and the language provide a mechanism for components to communicate (shown as thick arrows). The main components in the framework are detectors and correlators. Detectors monitor a part of the information infrastructure (network in Figure 2@@), such as a (sub)network, a host, or the middleware. Event correlators investigate the output for several purposes like reduction of false alarm or detection of large-scale attack pattern that involve simultaneous attacks to multiple parts of the target system. In the following attack is used for a single action detected by a detector and represented as a GIDO, attack pattern is used for a group of correlated attacks.
CEP/CIDF integration starts with a Log agent that listens to CIDF traffic and intercepts relevant GIDOs. Each GIDOs is then translated into an event in the CEP internal format (Level 0 events). In Figure 2, these level 0 events are stored to disk for later analysis. They are also displayed to the user in a viewer, in this case the CIDF Traffic Viewer. The level 0 events are also the basis for a first event-processing agent, called Extract. Because CEP has been optimized for strongly structured events, and CIDF events have a very flexible structure, we extract relevant information from the level 0 events and create another layer of events with a rigid structure (level 1). These events are then used as input for several other event-processing agents that may be built on top of each other as shown in Figure 2 (Correlate, Prioritize).
The logger, the event-processing agents, the viewers, and their connections as displayed in Figure 2 @@ build a small event-processing network (EPN).

Figure 2. CIDF-CEP Integration
In the following we look at a typical attack patterns going from network-probe to penetration of a host to gaining root-access. We set up an EPN that should alert us of such attack patterns as they happen. Figure 3@@ shows a screenshot of our management application RapNet displaying the EPN used for the following on the top right. RapNet lets you configure individual agents and networks of agents. Networks can be saved, edited, loaded, started, and stopped from within RapNet. The Operators panel displays the library of available agent templates as a hierarchy, grouped into Inputs (producers), Maps and Filters (processors), and Outputs (consumers). The maps group is currently expanded, and shows the five map agents available in the library. Two more entries contain Connections and drawing elements (Decoration) for documentation purposes. The agents in the library can be instantiated by dragging them over to the drawing panel at the right. The drawing panel currently shows the EPN consisting of a logger, four maps (their graphical representation is a box with a missing corner), and four viewers. The output of the logger is fed into a first map (Extract) the output of which is fed into three more maps and one viewer. The output of the three second level maps is fed into a viewer each.

Figure 3. Screenshot of CEP running on CIDF data.
The remaining four windows in Figure 3@@ are all tabular viewers. They correspond to the four boxes on the far right of the RapNet window. The bottom right window shows the GIDOs from the extract agent. It corresponds to the lowest viewer element in the EPN displayed above. This viewer shows GIDOs in our internal format (some false positives have already been filtered out inside the extract agent). The top left window displays the output of the top map in the EPN. It shows ongoing attack patterns in different stages of escalation. In this window, each ongoing attack pattern is displayed as a separate line. The second line shows an alert about unusual activity on one host. CEP assumes this attack is part of a bigger attack pattern, however, no penetration attacks to this host have been observed so far. Thus CEP suspects undetected penetration attacks. The two viewers on the bottom show attacks grouped by attack ID or the source IP number of the attack.
Figure 4@@ shows a tool that lets you review the history of correlated data. The window on the top right is again a control window that manages the other windows. The other two windows are tabular viewers, the upper one displaying the escalation events and the lower one the GIDOs from the extract agent. They display the same data as the top left and bottom right viewers in Figure 3@@. However, in Figure 3@@ each line represented an ongoing attack pattern and was updated on every new attack observed. The viewers in Figure 4@@ display one attack per line, thus an attack pattern is visible over multiple lines.
We now want to investigate the first escalation that is found by our escalation map. It is an attack pattern coming from a host with IP number 134.52.160.125. This escalation corresponds to the fourth line in the top left window of Figure 3@@. In Figure 4@@, it is event 11 in the upper viewer. In Figure 3@@, the attack pattern has proceeded further in time, thus the line reads Repeated penetration attack. At the time displayed in Figure 4@@, the message tells us that a first penetration attack from IP number 134.52.160.125 was just observed. The message contains ESCALATION, because so far we only observed network probes which are considered less harmful. All GIDOs that are correlated to the escalation event selected in the top viewer are automatically selected in the bottom viewer. We can see that before the penetration there was a SATAN scan from this machine, and then two IMAP buffer attacks. This mechanism allows us to quickly view an attack pattern over several levels of abstraction hierarchies.

Figure 4. Screenshot of CIDF data analysis with CEP
A few more things to note: The second column in the lower window displays the detector that observed an attack. In our example, two detectors were involved (IP numbers ending with 114 and 163). In this example CEP does not know the names of these detectors and displays <unknown>. For StackGuard it knows the name, and also displays it (event 9 in the bottom viewer). The escalation viewer displays the highest state of attack patterns in the 5th column. In the example above, this is only Penetration. More serious are the ones labeled Root, as these attackers successfully penetrated a machine and are working on getting root access.
4. Performance Issues
The event rate in the CIDF experiment was only a few events a second. In another experiment we have to deal with much higher data rates. In one particular instance, we used log files produced by a CISCO NetFlow FlowCollector running on the Stanford Network (SUNet) gateway. The FlowCollector logged traffic over a specified time interval and aggregated it into a log file. In this experiment our event correlator watches network traffic and event correlation is used to build a higher level view of the network traffic based on an ongoing series of log files and also to detect suspicious activity like the detectors in the CIDF experiment. In other, scale related experiments, we run EPNs with up to millions of events with no decline in throughput. We are also actively looking for industry partners for further experiments.
5. Related Work
CEP is not intended as a replacement to other intrusion detection tools but rather as a complementary correlator that combines the output of multiple tools to produce better results.
Tools like EMERALD already have slots available to plug in future and third party analysis engines. . EMERALD and its predecessor at SRI (IDES, NIDES) date back to 1992 . They mostly perform statistical analyses of events. Signature analysis and the integration of an expert system added correlation capabilities. This makes EMERALD more advanced than commercial systems that basically support only simple signature analysis, but the full context of events is not yet a part of the correlation. In fact, the overall architecture of EMERALD is very similar to our own architecture as it supports distributed components that work together to perform correlation. Using CIDF as described in the example above EMERALD and CEP components can easily exchange events.
The Purdue University COAST project approached intrusion detection with pattern matching based on colored Petri Nets . Apparently, only a small prototype has been implemented, and performance results are missing . Kumar and Spafford recognized that "the sheer rate at which data is generated" is the most dominant problem, the need for partial orders to model concurrency, online processing, and dynamic patterns (i.e. patterns can be changed at run time). COAST intrusion signatures (i. e. event patterns) are represented as Petri Nets. Transitions are associated with the type of events in the pattern. The incoming events trigger tokens to flow through the net. Events in the middle of the net represent partial matches of patterns. The main problem of this approach is that partial matches are kept around forever. Also, some patterns representing repetition of events are hard to implement in Petri Nets. Last but not least, COAST Petri Nets are based on temporal relations between events only, they do not model causality.
P-BEST , Wisdom and Sense and NADIR are examples of intrusion detection systems that employ rule-based analysis.
A number of commercial products are also available. Due to the proprietary nature of these systems, it is difficult to understand how their correlation engines work. Kane Security Monotor (KSM) checks the security log files on servers and workstations being monitored for certain activities and patterns. KSM analyses these using its SHADOWARE artificial intelligence engine, and determines which events are normal, and which are attempts to breach the systems security. As with all commercial products, information about the capabilities of the correlators is hard to find. Event Orchestrar is the current Network Associates, Inc. correlator. However, as for most products at this stage, the focus is on interoperation with all their tools.
The Open View correlation service as described in is limited to simple filtering and aggregation. SMART correlation books is restricted to simple event correlation . The correlation is not done by an EPN, but rather by single process based on a pre-compiled table of correlations. None of them support context-based correlation. Also, both and are focused on network management events, while CEP is applicable to any domain.
Intrusion Detectors like RealSecure , NetRanger , or are strongly focused on intrusion detection on the packet level. They do not perform complex correlation and produce large amount of alarms. However, their output makes a great input for CEP.
6. Future Directions
Future directions are split into two directions: Implementation and theory. We are constantly improving throughput using engineering methods. Currently we are working on better support for concurrency so EPNs can be more efficiently distributed.
Regarding theory, we are developing new pattern languages tailored to specific domain. For now, we used an existing pattern language for the intrusion domain. Our current language does not support statistical analysis very well. We plan to introduce support for simple statistics, like counting events that fit a certain signature.
We are also planning to make EPNs self-modifying by creating special agents that can modify other agents and create new agents to improve event correlation. As an example, complex and time consuming agents may be turned on and off by other agents only when needed. We need to improve our theoretical foundations to be able to model such self-modification. Also, event patters are often not a priori known. Event mining allows automatic detection of new patterns, irregularities, divergences, and discontinuities on-line . Event mining distinguishes itself from data mining (e.g. ) by the complexity of the underlying data (posets, vs. relational tables as often the case in data mining) and by the time constraints (constant on average per new event).
Regarding implementation, we mostly need to become even faster. This includes optimization for transient events, that need not be stored persistently, making better use of indexes and caching, etc. Additional work lies in interfacing with more systems by writing loggers.
7. References
(Note: Save as HTML in Word does not include EndNotes references. Read the PostScript file if you need those.)