A High Security C4I
Information System for
Emergency Management
Joseph E. Johnson
[email protected]
University of South Carolina
Norfolk VA DARPA Oasis Conference
February 16, 2001
Our Team




Theoretical Physics – 3 PhD Faculty + 2 GRA
Computer Science – 2 PhD Faculty + 1 Post
Doc + 1 GRA
Applied Mathematics – 2 PhD Faculty + 2 GRA
My R&D group – several Oracle / Web / Network
development staff.
Phase 0 - Non-DARPA Work

Historic Foundations & Background
Historical Setting – A real-world
system





Our R&D group has designed and developed the SC Emergency
Management Information System called IRIS.
IRIS is a Web based Java & Oracle 8i system running on IBM
RS/6000 (with North American maps & pager triggers).
IRIS has managed the SC emergency information for over 4 years
including Hurricane Floyd, the largest US peacetime evacuation.
We have a working voice recognition I-O interface to IRIS allowing
updating and querying of the Oracle database by cell phone.
IRIS is a C4I type system that manages all incident reports,
resource requests, messaging, resources and critical facilities
databases and logging of ongoing actions.
Objective


Maintain full operation capability of this
statewide emergency information
management system at 99.999% availability (about 5
minutes of downtime per year).
System is to run via browser on the
Internet with a large number of users at
low security.
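The availability target above translates directly into a yearly downtime budget. The following is a minimal sketch of that arithmetic; the function name and the choice of an average 365.25-day year are assumptions made for illustration.

```python
# Downtime budget implied by an availability target.
# Assumption: an average year of 365.25 days (leap days included).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability: float) -> float:
    """Return the yearly downtime budget in minutes for an availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, value in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
        print(f"{label}: {allowed_downtime_minutes(value):.2f} minutes/year")
```

At 99.999% the budget is a little over five minutes per year, which is the figure quoted in the objective.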
The Problem: IRIS Must Function
Adequately Under All Threats



Acts of Nature: hurricanes, floods, fires,
earthquakes, tornados, loss of power or ISP.
Acts of Man Unintentional: HW/SW & Network
failures & bugs, human error, BNCI disasters.
Acts of Man Intentional: Hackers, terrorists,
disgruntled employees, war, a full spectrum of
acts of criminal intent & intentional BNCI.
Some Good Features






Rather complex roles & permissions are managed using
Oracle with individual IDs.
Use of 128-bit encryption – Secure Sockets Layer (SSL).
Reasonably good firewall.
System not yet successfully hacked (not a challenge).
Separate tables contain a full continuous backup of
every historical image (no data is ever erased or
overwritten).
A complete mirrored system runs at USC to backup the
State system (but with substantial delay).
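The "no data is ever erased or overwritten" feature above can be pictured as an append-only history table. The sketch below is illustrative only: it uses Python's built-in sqlite3 as a stand-in for Oracle, and the table name `incident_history`, its columns, and the helper functions are all hypothetical, not the IRIS schema.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical append-only history table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE incident_history ("
        " version_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " incident_id INTEGER NOT NULL,"
        " status TEXT NOT NULL,"
        " recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record_version(conn: sqlite3.Connection, incident_id: int, status: str) -> None:
    """Append a new image of the incident; earlier images are never updated or deleted."""
    conn.execute(
        "INSERT INTO incident_history (incident_id, status) VALUES (?, ?)",
        (incident_id, status),
    )
    conn.commit()


def current_status(conn: sqlite3.Connection, incident_id: int):
    """Return the most recent status, or None if the incident is unknown."""
    row = conn.execute(
        "SELECT status FROM incident_history"
        " WHERE incident_id = ? ORDER BY version_id DESC LIMIT 1",
        (incident_id,),
    ).fetchone()
    return row[0] if row else None


def history(conn: sqlite3.Connection, incident_id: int) -> list:
    """Return every recorded status for the incident, oldest first."""
    rows = conn.execute(
        "SELECT status FROM incident_history"
        " WHERE incident_id = ? ORDER BY version_id",
        (incident_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

Because updates are expressed as inserts, a bad or malicious change can be rolled back by reading an earlier version rather than by restoring a backup.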
Difficulties So Far









Firewall reconfigurations have blocked users.
False alarms, as well as duplicate & bad data.
ISP failures (before new site).
Power failures (before new site).
Earlier programming errors (e.g. an incident site Lon/Lat is plotted at
the city geocenter which is in the bay).
User misuse of technical terms and codes.
Mistakes of technical staff in system management.
Difficulty of managing a rapidly changing approved user list.
Technical infrastructure problems of a continuing random nature.
Dominant Anticipated Threats





BNCI Catastrophe or Hurricane - Massive
Incident with Staff and Information
Overload.
Loss of Internet & Telephony
Terrorism & Hackers
Earthquake, Tornado, Flood, Fire.
Random Unanticipated Failures
Philosophy – Phase 1:



The dominant threats are correlated with a given
site (earthquake/hurricane, ISP loss, terrorism,
random failures, BCN incident, disgruntled
employees, poor technical personnel actions).
If we can develop a robust and rapid dominant
host site transfer, we minimize site correlated
failure.
Also multiple systems could best handle multiple
regions in the future, as disasters are usually
geo-centered.
IRIS Catalogues Major Threats


The emergency management software
must track the system failures including
power, internet, and computer failures.
It must also track its own failure (on a
replicated system).
Phase I - DARPA Funded
Work
System Replication
Solution Phase 1: System Replication at
Widely Dispersed Sites.



We have installed an IBM H70 at each of USC,
UU, and MHPCC in secure environments.
The IRIS system with current SC real data is
currently being replicated over the three sites.
Oracle replication specific to the needs of this
C4I type system is being studied.
Objectives:




Maximum system availability at the minimum
cost.
Minimum information loss upon fail-over.
Reasonably good security at reasonable cost for
a system with a highly dynamic user base.
Multiple hosts well synchronized with fail-over
and potential immersion in a larger set of hosts.
Reliance on expert staff to monitor
system for additional security



Staff monitors statistical aspects of network
traffic, data density categorized by type,
intensity, threat, and geography (Oracle query-by-example filters).
Staff monitors and manages users added and
deleted and usage by role and individual.
Specialized regional personnel are responsible
for data quality assurance in their area.
Advantages of this Environment





The entire operation is wrapped in Oracle.
Each piece of information is an Oracle record.
County senior administrator oversees all data
submissions which are invisible until approval.
All mail is an Oracle record – no attachments.
“Regular” mail is maintained on a separate NT
server and separate network.
Photos & enclosures are allowed separately.
Phase 2 - DARPA Funded
Work
Network Attacks –
Threats that are not Site
Specific
Philosophy




Read-only access is not a disaster
Write or Use denial is a major disaster.
Keep only one of three systems on-line
replicating to the other two.
Rapidly identify a crisis, identify problem,
and ‘repair’.
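The "one of three on-line, replicating to the other two" idea above can be sketched as a simple selection rule. This is only an illustration under stated assumptions: the `Site` record, the health flag, and the replication-lag values are hypothetical, and choosing the healthy replica with the smallest lag is one plausible policy for minimizing information loss, not necessarily the one used in the project.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    name: str
    healthy: bool
    lag_seconds: float  # how far this replica trails the on-line system (hypothetical metric)


def choose_primary(sites: List[Site], current: str) -> Optional[str]:
    """Keep the current primary if it is healthy; otherwise fail over to the
    healthy replica with the smallest replication lag. Return None if no site is usable."""
    by_name = {s.name: s for s in sites}
    if current in by_name and by_name[current].healthy:
        return current
    candidates = [s for s in sites if s.healthy and s.name != current]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.lag_seconds).name


if __name__ == "__main__":
    # Site names come from the slides; the health and lag values are made up.
    sites = [Site("USC", False, 0.0), Site("UU", True, 12.0), Site("MHPCC", True, 4.0)]
    print(choose_primary(sites, current="USC"))
```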
Approach to Identify Network
Attacks

Treat the network information as a
Complex System
Theoretical Analysis of the Local
Network Traffic


To understand the structure of the
variables for internet host-to-host
communications, we used dumped output
of our local network traffic.
1. Dump of IP traffic at EPD site (get
characteristics of different ‘emergencies’)

2. Dump of IP traffic at a major USC site
(get characteristics of different ‘attacks on a general
system’).
Real Time Identification of Threat

Parameters encapsulated in all IP packets have
been divided into two classes – dynamic (those that
change during propagation) and static (those that are
unchanged).


The information traffic for the host-to-host
communication can be described as a trajectory
in a multi-dimensional static parameter space.
There are well defined patterns in the parameter
subspaces related to the ‘normal’ and ‘abnormal’
network behavior.
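A minimal sketch of the static/dynamic split and the resulting trajectory follows. The slides do not list which fields fall in each class, so the classification here is an assumption: for IPv4, the TTL is decremented at each hop and the header checksum is recomputed as a result, so both are treated as dynamic, while protocol and total length are treated as static (ignoring NAT and fragmentation). The packet dictionaries and field names are hypothetical.

```python
# Assumed classification (IPv4, no NAT, no fragmentation); not taken from the slides.
DYNAMIC_FIELDS = ("ttl", "header_checksum")   # change during propagation
STATIC_FIELDS = ("protocol", "total_length")  # unchanged end to end


def static_point(packet: dict) -> tuple:
    """Project a packet onto the static parameter space."""
    return tuple(packet[name] for name in STATIC_FIELDS)


def trajectory(packets: list, src: str, dst: str) -> list:
    """Return the time-ordered list of (time, static point) for one host-to-host flow."""
    flow = [p for p in packets if p["src"] == src and p["dst"] == dst]
    flow.sort(key=lambda p: p["time"])
    return [(p["time"], static_point(p)) for p in flow]


if __name__ == "__main__":
    # Addresses are from the documentation range 192.0.2.0/24; all values are made up.
    packets = [
        {"time": 2.0, "src": "192.0.2.1", "dst": "192.0.2.9", "protocol": 6,
         "total_length": 1500, "ttl": 57, "header_checksum": 0x1A2B},
        {"time": 1.0, "src": "192.0.2.1", "dst": "192.0.2.9", "protocol": 6,
         "total_length": 60, "ttl": 58, "header_checksum": 0x3C4D},
        {"time": 1.5, "src": "192.0.2.7", "dst": "192.0.2.9", "protocol": 17,
         "total_length": 120, "ttl": 60, "header_checksum": 0x5E6F},
    ]
    for point in trajectory(packets, "192.0.2.1", "192.0.2.9"):
        print(point)
```

Dropping the dynamic fields means that two captures of the same packet at different hops map to the same point, so the trajectory reflects the behavior of the hosts rather than the route taken.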
First Set of Objectives



What is a characteristic dimension of the
network parameter space?
How many nodes are needed to consider
the network as a ‘complex enough’
system?
How does the dimension of the space
depend upon the network topology and
the number of nodes?
Desired Outcomes




A structure of the possible network intrusion in
terms of the network parameters.
A quantitative method for the classification and
characteristics of attacks.
A model independent way to obtain the best
possible (optimized) level for the detection of an
intrusion for a given class of intrusions.
The ability to detect an abnormal network
behavior at the reconnaissance stage of the
attack.
Methods for Pattern Recognition for
Intrusion Detection in Real Time



Fast Fourier Transform (FFT) for obtaining
stable nodes of the network.
Random Matrix Theory simulations to be
able to distinguish between chaotic and
regular network behavior.
Wavelet Analysis (WA) for fast pattern
recognition used for network analysis and
for detection of possible intrusions.
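Two of the methods listed above can be illustrated on a toy series of packet counts per time bin. This is a sketch under stated assumptions, not the project's analysis code: a naive O(n²) discrete Fourier transform stands in for the FFT (it yields the same spectrum), the wavelet step is a single-level Haar transform using the difference-over-two convention, and the input series are invented.

```python
import cmath


def dft_magnitudes(samples: list) -> list:
    """Magnitudes of the DFT of the mean-removed series, for frequencies 0..n/2."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    mags = []
    for k in range(n // 2 + 1):
        total = sum(
            centered[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(total))
    return mags


def dominant_period(samples: list) -> float:
    """Period (in bins) of the strongest non-zero frequency component."""
    mags = dft_magnitudes(samples)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return len(samples) / k


def haar_details(samples: list) -> list:
    """Single-level Haar detail coefficients: half the difference of each adjacent pair."""
    return [(samples[i] - samples[i + 1]) / 2 for i in range(0, len(samples) - 1, 2)]


if __name__ == "__main__":
    periodic = [10, 20, 30, 20] * 4          # regular traffic repeating every 4 bins
    burst = [5, 5, 5, 5, 5, 50, 5, 5]        # one abrupt spike
    print("dominant period:", dominant_period(periodic))
    print("Haar details:", haar_details(burst))
```

The Fourier view picks out stable periodic structure, while a large Haar detail coefficient localizes a sudden change in time, which is why the two are complementary for spotting abnormal traffic.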
Results to Date


We have found patterns related to ‘normal’
and ‘abnormal’ network behavior.
We have also found that for the initial set
of examples under investigation, the
abnormal behavior can be detected at the
reconnaissance stage of the attack.
Conclusions




Phase 1 Replication is operational but needs a
lot of ‘tuning’ to minimize transient data loss and
optimize ‘fail-over’.
Phase II Initial results are promising and will be
reported in detail in 6 months at the next PI
meeting.
If Phase II works, we seek to be able to identify
an attack and its general type and thus activate a
fail-over prior to a system compromise.
We will probably fail-over to a secure
(disconnected from open internet) mode of
operation that only lets core staff onto the fail-
Our Hope



Our hope is that the approach of treating a
selected network space of parameters + time as
a complex system can be scaled to systems of
any size and design.
It is essential to be able to process the vast
amount of network data in real time. We
envision an attached Linux cluster.
We will seek to identify both anomalies and
signatures.
JL Question 1:
What threats is your project considering?


Part 1 considers all threats except network
attacks to a central Web/Oracle C4I
system.
Part 2 considers all network attacks on
any system.
JL Question 2:
What assumptions does your project make?


Part 1 assumes that most threats to a system
are site specific and, if the system is a relational
database, these can be managed by server
replication and fail-over.
Part 2 assumes that the host-to-host network
traffic can be described as a trajectory in a
multi-dimensional space of parameters + time
that behaves as a complex system,
revealing patterns indicating ‘normal’ and
‘abnormal’ behavior.
JL Question 3:
What Policies can your Project Enforce?


Part 1 allows fail-over to a replicated
system based upon any policy indicating
an appropriate threat.
Part 2 allows any policy actions resulting
from abnormal network patterns.
JL Question 4:
What policies can the group of projects enforce?


Part 1 suggests the policy, where possible, of
storing information in relational DB wrappers
and replicating to dual systems on a continuous
basis for mission critical applications.
Part 2 suggests that agents triggered by
abnormal pattern detection in our system could
act in concert with agents from other projects
indicating attack and thus triggering an
appropriate response or human intervention.
Phase III – Non-DARPA Parallel
Work
Phase 3 – Outside and Parallel to
This Project

Cost-Benefit Analysis




Cost/benefit analysis cannot be achieved devoid of
an underlying value of the associated information.
We are working to quantify the cost-benefit of
damage, response, resources, and general system
management.
All attacks result in ‘misinformation’ or ‘lack of
information’, which results in higher costs directly
linked to the ‘information failure’ of the system.
We believe that this problem is of the greatest
importance for all decision makers.
Phase 3 – Outside & Parallel to
This Project

Synthesis of Expert Opinion




Voting weights from both experts and software agents
(such as the FFT/wavelet system we are studying)
can be folded with each other and with human votes
on system malfunction.
Such a system can be self-adjusting.
We have derived associated sets of coupled
nonlinear equations and can show that iterative
solutions can be found with rapid convergence.
We will study these applications if possible at the end
of our current project.
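To make the idea of self-adjusting vote weights concrete, here is a small fixed-point iteration. It is not the set of coupled nonlinear equations derived by the team, which the slides do not give; it is a generic illustration in which each voter's weight shrinks with its distance from the weighted consensus, and the consensus is recomputed until it stops changing. The function name, the `eps` smoothing term, and the sample votes are assumptions.

```python
def fold_votes(votes: list, eps: float = 0.1, tol: float = 1e-9, max_iter: int = 200):
    """Iteratively combine votes in [0, 1] into a consensus.

    Weights are proportional to 1 / (eps + |vote - consensus|), normalized to sum to 1,
    so voters far from the consensus count for less on the next pass.
    Returns (consensus, weights).
    """
    n = len(votes)
    weights = [1.0 / n] * n
    consensus = sum(w * v for w, v in zip(weights, votes))
    for _ in range(max_iter):
        raw = [1.0 / (eps + abs(v - consensus)) for v in votes]
        total = sum(raw)
        weights = [r / total for r in raw]
        updated = sum(w * v for w, v in zip(weights, votes))
        converged = abs(updated - consensus) < tol
        consensus = updated
        if converged:
            break
    return consensus, weights


if __name__ == "__main__":
    # Three sources rate "system malfunction" as likely; one disagrees (made-up values).
    consensus, weights = fold_votes([0.9, 0.8, 0.85, 0.1])
    print(f"consensus = {consensus:.3f}")
    print("weights =", [round(w, 3) for w in weights])
```

In this toy run the dissenting vote ends up with the smallest weight and the consensus moves above the plain average, which is the kind of self-adjustment the slide describes.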
Phase 3 – Outside & Parallel to
this Project

Use of numerical uncertainty and continuous
valued logic as data.
Used to record probabilities of threat as output of both
software agents and on-line experts via their
‘evaluations’ of each threat using a continuous valued
logic developed by the PI.
Use of Markov models.
We are looking at the use of these models for
cost-benefit analysis.
Use of User Agents that constantly test
performance features including timing and report
information will be explored.
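One way Markov models could feed a cost-benefit estimate is sketched below. The slides give no states, probabilities, or costs, so everything here is hypothetical: a three-state chain (normal, degraded, failed) with invented transition probabilities and per-step costs, used to accumulate the expected cost over a number of steps.

```python
def step(dist: dict, transitions: dict) -> dict:
    """Advance a probability distribution over states by one Markov step."""
    nxt = {state: 0.0 for state in transitions}
    for state, prob in dist.items():
        for target, p in transitions[state].items():
            nxt[target] += prob * p
    return nxt


def expected_cost(start: str, transitions: dict, cost: dict, steps: int):
    """Return (accumulated expected cost, final distribution) after `steps` transitions.

    The cost of a step is charged according to the state occupied after the transition.
    """
    dist = {state: (1.0 if state == start else 0.0) for state in transitions}
    total = 0.0
    for _ in range(steps):
        dist = step(dist, transitions)
        total += sum(dist[s] * cost[s] for s in transitions)
    return total, dist


if __name__ == "__main__":
    # All numbers below are invented for illustration.
    transitions = {
        "normal":   {"normal": 0.95, "degraded": 0.04, "failed": 0.01},
        "degraded": {"normal": 0.50, "degraded": 0.40, "failed": 0.10},
        "failed":   {"normal": 0.30, "degraded": 0.00, "failed": 0.70},
    }
    cost = {"normal": 0.0, "degraded": 10.0, "failed": 100.0}
    total, dist = expected_cost("normal", transitions, cost, steps=24)
    print(f"expected cost over 24 steps: {total:.2f}")
    print({s: round(p, 3) for s, p in dist.items()})
```

Comparing the accumulated cost under two transition tables, for example with and without faster fail-over, gives a simple numerical handle on the benefit of a mitigation.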
Acknowledge Support




DARPA
Supporting Sites: USC, UU, MHPCC
SC EPD
My team – that works a lot on their own
initiatives.