SG Security Knoxville-State Analysis-Feb2012

ASAP-SG: Summary
• Project Description:
– Utility-driven, public-private collaborative project to develop
system-level security requirements for smart grid technology
• Needs Addressed:
– Utilities: specification in RFP
– Vendors: reference in build process
– Government: assurance of infrastructure security
– Commissions: protection of public interests
• Approach:
– Architectural team: produce material
– Usability Analysis team: assess effectiveness
– NIST, UtiliSec: review, approve
• Deliverables:
– Strategy & Guiding Principles white paper
– Security Profile Blueprint
– 6 Security Profiles
– Usability Analysis
Schedule: June 2009 – June 2012
Budget: $3M/year
($1.5M Utilities + $1.5M DOE)
Performers: Utilities, EnerNex,
Inguardians, SEI, ORNL
Partners: DOE, EPRI
Release Path: NIST, UCAIug
Contacts:
Bobby Brown
Darren Highfill
ASAP-SG Security Profiles
• Advanced Security Acceleration Project for the Smart Grid
– Prescriptive, actionable guidance
– How to build in and implement security
• Tailored to a set of specific smart grid functions, such as
– Advanced Metering Infrastructure: COMPLETE
– Third Party Data Access: COMPLETE
– Distribution Management: COMPLETE
– WAMPAC (Synchrophasors): COMPLETE
– Substation Automation: IN PROGRESS
– Home Area Networks: PROPOSED
Methods from reliability engineering
and their application to cybersecurity
James Nutaro
Oak Ridge National Laboratory
Outline
• Failure identification
– State transition systems
– Applications
• Failure likelihood
– Markov models
– Applications
• Consequence assessment
– Dynamic models
– Applications
Failure identification
• We will use state transition models to
– Enumerate failures of a system
– Prioritize failures
– Determine failure modes for high-priority failures
– Devise security controls to negate failure modes
What is a state transition system?
• A state transition system has
– A set of state variables
– A range for each state variable
• A state is an assignment of values to the state
variables
• A transition is a change of state
• A trajectory of the system is a sequence of
states (or, equivalently, a sequence of
transitions)
An example of a simple data processing system, part 1
• Two state variables:
– data
– activity
• The data state variable
– Describes if data is presently available for the system
to process
– Range is none and present
• The activity state variable
– Describes what the system is doing
– Range is idle and active
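The definitions above can be sketched in a few lines of Python. This is only an illustration: the names `STATE_VARIABLES` and `enumerate_states` are not from the slides, and a state is represented here as a plain dictionary.

```python
from itertools import product

# Ranges of the two state variables, taken from the slide.
STATE_VARIABLES = {
    "data": ("none", "present"),
    "activity": ("idle", "active"),
}


def enumerate_states(variables):
    """Return every assignment of values to the state variables."""
    names = list(variables)
    return [dict(zip(names, values)) for values in product(*variables.values())]


states = enumerate_states(STATE_VARIABLES)
# A transition is an ordered pair of states, self-transitions included.
transitions = list(product(range(len(states)), repeat=2))

for state in states:
    print(state)
print(len(states), "states and", len(transitions), "possible transitions")
```

Counting ordered pairs of states, self-transitions included, gives the square of the number of states.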
An example of a simple data processing system, part 2
• This system has four states
• It has sixteen possible transitions
– Acceptable transitions are shown in the figure
– The system is designed for executions that involve only these transitions
[Figure: the four states (data=none, activity=idle), (data=present, activity=idle), (data=none, activity=active), and (data=present, activity=active), connected by the acceptable transitions]
An example of a simple data processing system, part 3
• Unacceptable transitions are shown in this figure
• Any execution that includes one of these transitions is a failure: something went wrong
[Figure: the four states (data=none, activity=idle), (data=present, activity=idle), (data=none, activity=active), and (data=present, activity=active), connected by the unacceptable transitions]
Enumerating failure transitions
• The simplest failure is a trajectory that
consists of a single unacceptable transition
– Call this simplest failure a failure transition
• We can enumerate these transitions
– Given N states, there are N*N possible transitions
– M of these occur by design
– The remaining N*N – M are failure transitions
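The enumeration above might be sketched as follows. The set of designed transitions is an assumption made for illustration, since the figure showing the acceptable transitions is not reproduced in this transcript; the names `ACCEPTABLE` and `failure_transitions` are likewise hypothetical.

```python
from itertools import product

# States written as (data, activity) pairs.
STATES = [
    ("none", "idle"),
    ("present", "idle"),
    ("present", "active"),
    ("none", "active"),
]

# ASSUMPTION: a plausible set of transitions that occur by design. The
# original figure is not available, so this set is hypothetical.
ACCEPTABLE = {
    (("none", "idle"), ("none", "idle")),            # keep waiting for data
    (("none", "idle"), ("present", "idle")),         # data arrives
    (("present", "idle"), ("present", "active")),    # start processing
    (("present", "active"), ("present", "active")),  # keep processing
    (("present", "active"), ("none", "idle")),       # finish, data consumed
}


def failure_transitions(states, acceptable):
    """Return every ordered pair of states that is not a designed transition."""
    return [t for t in product(states, repeat=2) if t not in acceptable]


failures = failure_transitions(STATES, ACCEPTABLE)
n, m = len(STATES), len(ACCEPTABLE)
print(f"N*N - M = {n * n} - {m} = {len(failures)} failure transitions")
```

Under this assumed design, the transition from (data=none, activity=idle) to (data=none, activity=active) appears in the failure list, because the system should never become active without data.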
An example of a simple data processing system, part 4
[Figure: the four states (data=none, activity=idle), (data=present, activity=idle), (data=none, activity=active), and (data=present, activity=active) with all transitions drawn; legend: failure transition, acceptable transition]
Failure modes
• Each failure transition has, in general, several causes
• These causes are the failure modes for that failure transition
[Figure: the failure transition from (data=none, activity=idle) to (data=none, activity=active), annotated with a failure mode: driver for the network card incorrectly signals the arrival of a data packet]
Security controls
• Security controls are designed to mitigate, negate, or otherwise render implausible one or more failure modes
[Figure: the failure transition from (data=none, activity=idle) to (data=none, activity=active). A failure mode: driver for the network card incorrectly signals the arrival of a data packet. A security control: require all drivers to be signed and then verified upon loading by the OS kernel]
Which failures to address?
• Most useful models will be much larger than our
example
– As the number of states grows, the number of possible failure
transitions grows as its square
• Thousands upon thousands of failure transitions
– It is infeasible to address all of them
• One solution
– Create a rule for prioritizing failures
– Generate a prioritized list based upon the rule and the model
– Start at the top
– Stop when out of time or money, or when a coverage criterion has
been met (e.g., the top 10% of failures have been addressed)
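The procedure above can be sketched as a ranking followed by a cutoff. Everything specific here is an assumption for illustration: the scores are invented, the budget is modeled simply as the number of failures we can afford to address, and the function name `address_failures` is hypothetical.

```python
def address_failures(failures, score, coverage_percent, budget):
    """Rank failures by score and return the ones to address.

    Work stops when the budget (the number of failures we can afford to
    address) is spent or when the top coverage_percent of the ranked list
    has been covered, whichever comes first.
    """
    ranked = sorted(failures, key=score, reverse=True)
    # Ceiling of coverage_percent percent of the list, in integer arithmetic.
    target = (coverage_percent * len(ranked) + 99) // 100
    return ranked[:min(target, budget)]


# Hypothetical scores: a higher score means a higher priority.
SCORES = {
    f"failure_{i}": s for i, s in enumerate([3, 9, 1, 7, 5, 8, 2, 6, 4, 10])
}

todo = address_failures(SCORES, SCORES.get, coverage_percent=30, budget=5)
print(todo)
```

The prioritization rule is passed in as the `score` function, so the same loop works with a likelihood-based or consequence-based rule.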
Failure likelihood
• We will extend state transition models to
– Estimate the probability of a failure
– Use this as a tool for
• Prioritization
• Estimating the benefit of a security control
• Markov chains will be our primary tool
Markov chain
• State transition model plus a probability for each transition
• Sum of probabilities on the transitions away from a state must equal 1
• At right is an example with two states
[Figure: two-state Markov chain whose transitions are labeled 0.5, 0.5, 0.9, and 0.1]
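A Markov chain of this kind might be stored as a nested dictionary and checked against the sum-to-one rule. The state names `A` and `B`, and the assignment of the four probabilities to particular arrows, are assumptions; the transcript does not say which arrow carries which number.

```python
import math

# ASSUMPTION: illustrative assignment of the four probabilities to arrows.
CHAIN = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"A": 0.9, "B": 0.1},
}


def validate(chain, tol=1e-9):
    """Raise ValueError unless every state's outgoing probabilities sum to 1."""
    for state, outgoing in chain.items():
        for target, p in outgoing.items():
            if target not in chain:
                raise ValueError(f"{state} -> {target}: unknown target state")
            if not 0.0 <= p <= 1.0:
                raise ValueError(f"{state} -> {target}: {p} is not a probability")
        total = sum(outgoing.values())
        if not math.isclose(total, 1.0, abs_tol=tol):
            raise ValueError(f"outgoing probabilities of {state} sum to {total}")
    return True


print(validate(CHAIN))
```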
Basic likelihood assessment
• The probability of a particular failure transition
occurring during an arbitrary execution is
calculated by simulation
– Start in the initial state for the model
– Select a transition at random based on the
probabilities of the outgoing transitions
– Repeat until satisfied (e.g., confidence interval is
sufficiently small)
– Probability of a particular transition is the number
of times it was taken divided by the total number
of transitions taken
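The simulation procedure above could look like the following sketch. It assumes an illustrative two-state chain and, for simplicity, runs a fixed number of steps instead of checking a confidence interval; the function name `estimate_transition_frequencies` is hypothetical.

```python
import random
from collections import Counter

# ASSUMPTION: an illustrative two-state chain, not taken from a real system.
CHAIN = {
    "A": {"A": 0.5, "B": 0.5},
    "B": {"A": 0.9, "B": 0.1},
}


def estimate_transition_frequencies(chain, start, steps, seed=0):
    """Estimate how often each transition is taken during a long execution."""
    rng = random.Random(seed)
    counts = Counter()
    state = start
    for _ in range(steps):
        targets = list(chain[state])
        weights = [chain[state][t] for t in targets]
        # Select the next state at random using the outgoing probabilities.
        nxt = rng.choices(targets, weights=weights)[0]
        counts[(state, nxt)] += 1
        state = nxt
    # Number of times each transition was taken over the total number taken.
    return {transition: n / steps for transition, n in counts.items()}


estimates = estimate_transition_frequencies(CHAIN, "A", steps=50_000)
for transition, p in sorted(estimates.items()):
    print(transition, round(p, 3))
```

A stopping rule based on a confidence interval, as the slide suggests, would replace the fixed `steps` argument.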
Other types of assessments
• Ranking of first failures
– What are my most likely problems?
– For each failure transition, calculate the likelihood
that it will be encountered first during an
execution of the system
• Mean transitions to fail
– How long until I encounter a problem?
– Determine the average number of acceptable
transitions prior to the first failure transition
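Both assessments can be estimated from repeated executions that stop at the first failure transition. The chain below is hypothetical: `idle`, `ready`, `busy`, and `busy_no_data` stand for the four (data, activity) states, and both the probabilities and the choice of failure transitions are invented for illustration.

```python
import random
from collections import Counter

# ASSUMPTION: hypothetical probabilities and failure set.
CHAIN = {
    "idle": {"idle": 0.50, "ready": 0.48, "busy_no_data": 0.02},
    "ready": {"busy": 0.97, "idle": 0.03},
    "busy": {"busy": 0.60, "idle": 0.40},
    "busy_no_data": {"idle": 1.0},
}
FAILURES = {("idle", "busy_no_data"), ("ready", "idle")}


def run_until_failure(chain, failures, start, rng):
    """Return (first failure transition, acceptable transitions before it)."""
    state, acceptable = start, 0
    while True:
        targets = list(chain[state])
        weights = [chain[state][t] for t in targets]
        nxt = rng.choices(targets, weights=weights)[0]
        if (state, nxt) in failures:
            return (state, nxt), acceptable
        acceptable += 1
        state = nxt


def assess(chain, failures, start, runs, seed=0):
    """Rank first failures by frequency and average the transitions to fail."""
    rng = random.Random(seed)
    first, total = Counter(), 0
    for _ in range(runs):
        failure, n = run_until_failure(chain, failures, start, rng)
        first[failure] += 1
        total += n
    ranking = [(f, c / runs) for f, c in first.most_common()]
    return ranking, total / runs


ranking, mean_transitions = assess(CHAIN, FAILURES, "idle", runs=2000)
print(ranking)
print("mean acceptable transitions before first failure:", mean_transitions)
```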
Security controls
• A security control reduces the likelihood of the failure transition that it addresses
[Figure: the failure transition from (data=none, activity=idle) to (data=none, activity=active). A failure mode: driver for the network card incorrectly signals the arrival of a data packet. A security control: require all drivers to be signed and then verified upon loading by the OS kernel]
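One way to explore the effect of a control is to scale down the probability of the failure transition it addresses and recompute the mean transitions to fail. In this sketch the chain, the factor of 0.1, and the choice to move the removed probability onto the `idle` self-loop are all assumptions. The mean is computed by fixed-point iteration on the expected-value equations, a swapped-in alternative to simulation.

```python
# ASSUMPTION: hypothetical chain and failure set, for illustration only.
CHAIN = {
    "idle": {"idle": 0.50, "ready": 0.48, "busy_no_data": 0.02},
    "ready": {"busy": 0.97, "idle": 0.03},
    "busy": {"busy": 0.60, "idle": 0.40},
    "busy_no_data": {"idle": 1.0},
}
FAILURES = {("idle", "busy_no_data"), ("ready", "idle")}


def apply_control(chain, failure, factor, absorb_into):
    """Scale one failure probability by factor; move the rest to absorb_into."""
    src, dst = failure
    controlled = {s: dict(out) for s, out in chain.items()}
    removed = controlled[src][dst] * (1.0 - factor)
    controlled[src][dst] -= removed
    controlled[src][absorb_into] = controlled[src].get(absorb_into, 0.0) + removed
    return controlled


def mean_transitions_to_failure(chain, failures, start, tol=1e-12, max_iter=1_000_000):
    """Expected number of acceptable transitions before the first failure."""
    expected = {s: 0.0 for s in chain}
    for _ in range(max_iter):
        # E[s] = sum over acceptable transitions s -> t of p * (1 + E[t]).
        updated = {
            s: sum(p * (1.0 + expected[t])
                   for t, p in out.items() if (s, t) not in failures)
            for s, out in chain.items()
        }
        change = max(abs(updated[s] - expected[s]) for s in chain)
        expected = updated
        if change < tol:
            break
    return expected[start]


controlled = apply_control(CHAIN, ("idle", "busy_no_data"), 0.1, "idle")
before = mean_transitions_to_failure(CHAIN, FAILURES, "idle")
after = mean_transitions_to_failure(controlled, FAILURES, "idle")
print(f"mean transitions to fail: {before:.1f} before, {after:.1f} after")
```

Comparing `before` and `after` against the cost of the control is one way to frame whether the reduction is worth it.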
Challenges
• Probabilities are difficult to come by in practice
– But there may be sufficient data to make a good guess
– e.g., how likely is it that without authentication you
will be subject to an unauthorized user?
– e.g., how likely is this if you use a particular password
policy?
– Lots of real world experience to build statistics from
here; possibly sufficient data in other cases
• Analysis can be quite involved (i.e., expensive in
terms of time and dollars)
Rewards
• A tool for guiding investment in cyber-security
– To what extent does a security control reduce my
likelihood of a system failure?
– Is the reduction worth the cost?
– How much is enough? Are my expected failure
rates acceptable?
Consequence assessment
• We will extend state transition models to
– Include time and dynamics
– Use this as a tool for
• Estimating the likelihood of unwanted physical effects
• Determining performance requirements for security
solutions
• Assessing risk
• Discrete event models will be our primary tool
Discrete event model
• All the elements of state transition and Markov models plus
– Interactions with the outside world (e.g., the system being controlled)
– Evolution through time
[Figure: block diagram with Input, State, and Output]
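A minimal discrete event sketch of the data processing example is shown below. It is an illustration under assumptions, not the presenter's model: inputs are timestamped data arrivals, the state tracks queued data and current activity, outputs are timestamped results, and the fixed `service_time` is invented.

```python
import heapq


def simulate(arrivals, service_time):
    """Process timestamped data arrivals and return (time, output) pairs."""
    # Pending events ordered by time; an "arrive" sorts before a "done"
    # scheduled for the same instant.
    events = [(t, "arrive") for t in arrivals]
    heapq.heapify(events)
    state = {"data": 0, "activity": "idle"}  # queued items, current activity
    outputs = []
    while events:
        now, kind = heapq.heappop(events)
        if kind == "arrive":      # input from the outside world
            state["data"] += 1
        else:                     # "done": processing finished, emit output
            state["activity"] = "idle"
            outputs.append((now, "result"))
        if state["activity"] == "idle" and state["data"] > 0:
            # Start processing the next item; schedule its completion.
            state["data"] -= 1
            state["activity"] = "active"
            heapq.heappush(events, (now + service_time, "done"))
    return outputs


print(simulate([0.0, 1.0, 1.5], service_time=2.0))
```

Unlike the plain state transition model, each change of state here happens at a specific time, which is what allows timing questions to be asked of the model.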
Method for consequence assessment
[Figure: a discrete event model of the computer system coupled to a dynamic model of the system under control]
Uses of a combined model
• Links failure analysis to physical consequences
• Questions that might be answerable:
– Which failures pose the biggest risk in terms of
physical outcomes?
– How is my risk related to the speed with which I
can find and remove an intruder?
– How does a particular security solution affect
these risks?
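The question about intruder removal speed can be illustrated with a toy combined model. Everything here is an assumption: the physical process is a first-order system integrated with Euler steps, the intruder injects a constant bias into the control signal from time zero, and the only event is the intruder's removal after `removal_delay`.

```python
def peak_deviation(removal_delay, dt=0.01, horizon=20.0, decay=0.5, bias=1.0):
    """Largest deviation of the process from its set point of zero.

    ASSUMPTION: the process follows dx/dt = -decay * x + u, where u equals
    bias while the intruder controls the input and 0 after removal.
    """
    x, peak = 0.0, 0.0
    for i in range(int(round(horizon / dt))):
        t = i * dt
        compromised = t < removal_delay  # event: intruder removed at this time
        u = bias if compromised else 0.0
        x += dt * (-decay * x + u)       # Euler step of the dynamic model
        peak = max(peak, abs(x))
    return peak


for delay in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(f"removal delay {delay:>4}: peak deviation {peak_deviation(delay):.3f}")
```

In this toy model the peak deviation grows with the removal delay, which is the kind of relationship between response speed and physical consequence that a combined model is meant to expose.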
Challenges
• Performance characteristics for some security
solutions may be difficult to obtain
– For example, how quickly does an intrusion
detection system find an intruder?
– How quickly can I remove that intruder?
• Analysis can be very involved (i.e., very
expensive in terms of time and dollars)
Rewards
• A tool for both understanding risk and guiding
investment in cyber-security
– To what extent does a security control reduce my
risk?
– Is the reduction in risk worth the cost?
– How much is enough? Are my expected risks
acceptable?
Comments and questions?