A Language-Based Approach to Confidentiality and Integrity

Download Report

Transcript A Language-Based Approach to Confidentiality and Integrity

Programming Languages
and
Computer Systems @Cornell
Keshav Pingali
CS & ECE
Highlights
•
•
•
•
•
Nationally recognized leaders in programming
languages and systems research
Broad coverage of these areas
Vibrant groups with strong students
Collaborative environment
Principled approach to system-building
Focus areas
•
•
•
•
High-performance systems
– Martin Burtscher, Rajit Manohar, Jose Martinez,
Sally McKee, Keshav Pingali, Radu Rugina,
Evan Speight
Distributed Systems
– Ken Birman, Alan Demers, Robbert van Rennesse, Fred Schneider, Gün
Sirer
Databases
– Alan Demers, Johannes Gehrke,
Jai Shanmugasundaram
Security
– Dexter Kozen, Greg Morrisett, Andrew Myers,
Radu Rugina, Fred Schneider, Tim Teitelbaum
Focus areas
• High-performance systems
– Memory-wall problem:
 Martin Burtscher,Jose Martinez,Sally McKee,Keshav Pingali, Radu
Rugina
– Fault-tolerance:
 Rajit Manohar, Keshav Pingali
– Resource-constrained computing:
 Jose Martinez, Rajit Manohar, Gun Sirer, Evan Speight
•
•
•
Distributed systems
Databases
Security
Memory-wall Problem
•
Processors double in speed every 1.5 yrs.
•
Memory is not keeping pace.
 On most programs, processors sit idle most of the time
– ~90% idle time for ASCI machines at DOE
– ~60% idle time for commercial TP workloads
Solutions to memory wall problem
•
Latency avoidance:
– caches
 Program
restructuring to exploit locality:
McKee, Pingali, Rugina
•
Latency tolerance:
– value prediction
 Burtscher
– multi-threading
 Martinez,
Rugina
Program restructuring for locality
•
Natural way of coding most algorithms results in
programs with poor locality
– Even something as simple as matrix multiplication which has
2
operations on O(N ) data!
•
3
O(N )
Program restructuring by compiler
– Transform program to improve locality

•
(eg) MMM: decompose into a bunch of smaller MMMs each of whose
working set fits in the cache
Techniques depend on data structure
– Dense arrays: Pingali
– Sparse arrays: Pingali
– Recursive data structures: McKee, Rugina
Ongoing projects
•
Immanuel system: (KP with UIUC)
– Estimating working sets to determine transformation parameters
optimally is difficult: architectures are very complex
– Can we use a combination of model-driven and empirical
optimization to determine these parameters?
– Autonomic computing: (IBM)

•
•
Self-configuring, self-optimizing systems
Applying such techniques to recursive data structure
programs (KP/McKee)
Hardware support for streaming applications:McKee
Ongoing projects contd. (Rugina)
•
Complex dynamic data structures: trees/graphs
– Reachability analysis
 Determine
heap locations reachable from stack
 Detect memory leaks, improve garbage collection
– Region analysis
 Compute
accessed regions: sub-lists, sub-trees
– Analysis of parallel programs:
 Threads
concurrently access shared data structures
 Must analyze potential interactions between threads
Value prediction (Burtscher)
•
•
•
•
Processor guesses value returned by load and speculatively computes
with guess
– If memory value = guess, OK
– Else backup and redo computation with correct value
Performance depends on
– Accuracy of predictor
– Overhead of back-up and redo
SPEC benchmarks: 25-50% of ints can be guessed correctly.
Ongoing work:
– use pattern mining to determine new predictors
– Application to trace file compression
Thread-level speculation (Martinez)
•
Problem:
– Data access patterns that escape parallelizing compilers
– Hand parallelization that is too costly
•
Idea: Add hardware for speculative execution
–
–
–
–
•
Sequential codes: threads execute ahead of time
Parallel codes: threads skip synchronization
Hardware watches for data races, corrects on the fly
Keep a fall-back plan: safe thread(s)
Outcome: More performance with less effort
Focus areas
• High-performance systems
– Memory-wall problem:
 Martin Burtscher,Jose Martinez,Sally McKee,Keshav Pingali, Radu
Rugina
– Fault-tolerance:
 Rajit Manohar, Keshav Pingali
– Resource-constrained computing:
 Jose Martinez, Rajit Manohar, Gun Sirer, Evan Speight
•
•
•
Distributed systems
Databases
Security
Fault-tolerance for grid computing (KP)
•
•
Developing a preprocessor that will transparently add application-level
checkpointing to MPI applications
New runtime protocol for non-blocking co-ordinated checkpointing
MPI source code,
no FT consideration
MPI source code
with app. level FT
our preprocessor
FT MPI application
native compiler
FT Application
FT Application
Co-ordination layer
Co-ordination layer
MPI
MPI
Network
Fault-tolerance at nanoscale (Manohar)
•
•
•
How do we design reliable systems with
unreliable devices?
Timing faults: asynchronous circuits
Real faults
– Cellular architecture
– Dynamic translation and virtual machines
– Fault recovery
Focus areas
• High-performance systems
– Memory-wall problem:
 Martin Burtscher,Jose Martinez,Sally McKee,Keshav Pingali, Radu
Rugina
– Fault-tolerance:
 Rajit Manohar, Keshav Pingali
– Resource-constrained computing:
 Jose Martinez, Rajit Manohar, Gun Sirer, Evan Speight
•
•
•
Distributed systems
Databases
Security
Resource-constrained Systems
•
How do we design systems in the presence of
resource constraints?
– Hardware resources: registers, execution units, …
– Energy constraints
•
Solutions:
– Micro-architecture level

Manohar
– Architecture level

Martinez
– OS level

Sirer
Energy-efficient Computing (Manohar)
•
•
•
•
•
•
Asynchronous circuits: no clocks
– Idle until activated (1.2x)
Microarchitecture optimization (1.5x)
Dynamic datapath adaptation (5.6x)
Dynamic voltage scaling
10x reduction in E
– At constant throughput
SNAP: low power asynchronous architecture for sensor networks
– Asynchronous processor optimized for network protocols
– MEMS battery + RF transmitter + sensor
– 100 year lifetime!
Resource-conscious processors (Martinez)
•
•
•
•
Problem:
– Processor-memory speed gap continuously widening
– Processor’s recipe: more in-flight instructions
– Traditional approach: enlarge resources (reg. files, queues, etc.)
– But technology not scalable: Pipeline depth, clock cycle, etc.
Observation:
– Many resources allocated to inactive instructions
 Waiting for data that is not (yet) coming
Idea: Become resource-conscious
– Grab resources when predicted to be needed (as late as possible)
– Release them when unlikely to be needed (as early as possible)
– Keep a fall-back plan: state checkpointing
Outcome: More performance with less silicon
Focus areas
•
•
•
•
High-performance systems
Distributed systems
Wide-area, Mobile and Peer-to-peer
 Ken Birman, Al Demers, Zygmunt Haas, Robbert van
Renesse, Fred B. Schneider, Gün Sirer, Evan Speight
– Astrolabe: Distributed Databases
– Galaxy: Cluster Computing
– COCA: Cornell Online Certification Authority
– MagnetOS: Operating Systems
– Bifrost: Mobile Computing
– SHARP: Routing Algorithms
– Kelips: Efficient Distributed Hash Tables
– Herbivore: Anonymous Communication Substrate
– Karma: Microeconomic Framework for P2P
Databases
Security
Isis, Horus and Ensemble (Birman)
•
Seminal work on group communication systems at Cornell
– The virtual synchrony model
– Group membership and multicast protocols
– Epidemic (gossip-based) algorithms
Isis is still used in some important settings:
– New York Stock Exchange (runs overhead quote display system), Swiss Exchange
(replicates state of the whole exchange in real-time)
– French Air Traffic Control System (PHIDIAS high availability console cluster architecture,
and also as the communication layer for inter-airport flight plan update reporting)
– AEGIS Naval Warship integrated information system
•
Recent focus: Spinglass
•
– State replication ultimately doesn’t scale well enough to deal with a new generation of
really big systems
•
Spinglass project responds to this need
– Goal is to provide robustness but also scalability, stability under stress
– Looks for probabilistic instead of deterministic reliability guarantees
Ken Birman
Five Spinglass Technologies
•
•
•
Astrolabe: A virtual database constructed with peer-to-peer techniques,
tracks system state and supports a novel form of scalable data mining. For
large-scale distributed control.
SelectCast: Uses Astrolabe to support a powerful new kind of IP multicast
with a pub-sub capability
Bimodal Multicast: A protocol that can be layered over SelectCast to provide
scalable, reliable multicast
– All three can be combined to support scalable, fault-tolerant publish-subscribe.
– An initial target is Air Force Joint Battlespace Infosphere
•
•
Kelips: A distributed hash table, for rapidly finding things in very large
networks (like Napster and Gnutella)
Galaxy: Scalable tools for Web Services and Data Center or cluster
management
Ken Birman
Publisher offers new events to a proxy server. Subjects
are partitioned among the server sets. In this example
there are two partitions: yellow and green. Server set
and partition function can adjust dynamically
Scalable Event
Reporting
Subscriber uses Astrolabe
to identify the best servers.
log
Server cluster
Astrolabe manages configuration and
connection parameters, tracks system
membership and state.
Bimodal
multicast
Subjects are partitioned
among servers hence
one subscriber may
make multiple
connections
Virtual “summary” table
SQL query
Name
Avg
Load
WL contact
SMTP contact
SF
2.6
123.45.61.3
123.45.61.17
NJ
1.8
127.16.77.6
127.16.77.11
Paris
3.1
14.66.71.8
14.66.71.12
Name
Load
Weblogic?
SMTP?
Word
Version
swift
2.0
0
1
falcon
1.5
1
cardinal
4.5
1
Name
Load
Weblogic?
SMTP?
Word
Version
6.2
gazelle
1.7
0
0
4.5
0
4.1
zebra
3.2
0
1
6.2
0
6.0
gnu
.5
1
0
6.2
San Francisco
…
SQL query
…
New Jersey
Like the subscribers, each publisher connects to the “best” proxy (or proxies)
given its own location in the network. The one selected must belong to the
partition handling the subject of the event.
Ken Birman
Mobile Computing
•
•
•
•
Mobile systems are increasingly ubiquitous
• Cheap wireless networks enable interconnection
A central problem in mobile computing is routing
• Need to discover and disseminate routes
• Traditional solutions: proactive or reactive
Cornell: hybrid routing protocols
• ZRP: Zone Routing Protocol
• SHARP: A hybrid routing framework
Hybrid protocols support a wider range of
Zygmunt Haas
network conditions and networking applications
Gün Sirer
Bifrost
•
•
•
Mobile applications are stripped-down versions of
their desktop counterparts
A better system for an “almost-always” connected
environment:
– “Page in” needed application functionality
– Dynamically decide where to place the
“execution component”
Bifrost: Extensible, adaptive system over DCOM
– Provides location and device independent access to data
– Bifrost adaptively determines best execution location for a component
– Components migrate between servers and clients using a decision making
process based on fuzzy-set theory
Evan
Speight
– Just-in-time application delivery
MagnetOS
•
Ad hoc networks are an emerging domain, but…
– No programming model
– Hard to develop applications
– Competing applications can jam the network
•
MagnetOS is an adaptive operating system for
power-efficient computing on ad hoc networks
– A unifying single-system image abstraction
– Automatic partitioning
– Transparent component migration
•
Factor of 4-5 increase in system longevity
Gün Sirer
Herbivore
•
Internet communication protocols do not
provide privacy
•
Herbivore: A scalable, efficient, provably
anonymous communication system
•
Provably anonymous
– Dining cryptographer networks
•
Scalable
– Divide and conquer the
network into cliques
•
Efficient
– Wire-level protocol sends only two bits per client
Gün Sirer
Karma
•
The newly-emerging peer-to-peer computing model is
based on a principle of cooperation
– Yet up to 70% of peers freeload, i.e. use many more
resources than they provide to others
•
Karma is a peer-to-peer microeconomic system for
keeping track of the resource contributions of peers
– Associate karma, a measure of their contributions to the
global community, with each participant
– Provide the ability to transfer karma, securely and atomically,
in exchange for resources
– Create a microeconomic foundation for P2P sharing Gün Sirer
Focus areas
•
•
•
High-performance systems
Distributed systems
Databases
–
–
–
–
–
–
•
Amazon: Stream Processing
Cougar: Sensor Databases
Deep Glue: Internet Querying
Himalaya: Data Mining
PEPPER: Peer-to-Peer Databases
Quark: Unifying Databases and IR
Security
Data Stream Processing
Data streams versus traditional databases:
–
–
–
–
–
Persistent relations  High-speed data streams
Interactive queries  Continuous queries
Random access  Sequential access
Huge disk store  Bounded main memory
Central repository  Physically distributed streams
Applications:
– Network monitoring: Network packet streams
– Financial applications: Stream of stock data, news feeds
– Retail, E-business: Stream of transactions
Ongoing Research
•
•
•
•
Synopsis data structures
Approximate data stream query answering
Stream
Synopses
Semantic load shedding
(in memory)
High-speed archival
Stream
Processing
Engine
Data Streams
Archive
(on disk)
(Approximate)
Answer
Sensor Databases
Traditional
Procedural addressing of individual sensor
nodes; user specifies how task
executes, data is processed centrally.
Cougar
Complex declarative querying and tasking.
User isolated from “how the network
works”, in-network distributed
processing. Abstraction of a
centralized, virtual database.
16
10
17
10
Experts on ants estimate that there are
or
ants on earth.
In the year 1997, we produced one transistor per ant.
Research in Sensor DBMS
•
•
•
•
Query-centric sensor data storage
Query language
Query optimization, query processing, catalog
management
Inter-layer optimizations
Querying the Internet: The Present
HTML
File
HTML
File
The Internet
www.google.com
Nobel prize physics 1999
Search
Engine
Who won the
Nobel prize for
Physics in 1999?
Querying the Internet: The Present
HTMLHTML
HTML
HTML File
HTML
File File
File
File
HTMLHTML HTML
File File
File
The Internet
www.google.com
1998 Red BMW price
Search
Engine
Want 1998 Red BMW
No accidents
20% < avg. model price
Querying the Internet: The Future?
(Queryable)
(Queryable)
XML
Source
(Queryable)
XML Source
XML Source
The Internet
www.google+.com
Structured Query Through GUI
Internet
Query
Engine
Want 1998 Red BMW
No accidents
20% < avg. model price
Peer-to-Peer Databases
•
No centralized structure
– Scalability
– Fault-tolerance
•
•
View P2P system as one big database system!
Issue complex queries on data
– Find all “Metallica” CDs that cost less than $10
•
Standing queries
– Let me know when a “Metallica” CD that costs less than $10
becomes available
Querying the Internet: The Future?
(Queryable)
XML Source
(Queryable)
XML Source
The Internet
www.google+.com
Structured Query Through GUI
(Queryable)
XML Source
Want 1998 Red BMW
No accidents
20% < avg. model price
Focus areas
• High-performance systems
• Distributed systems
• Databases
• Security
Some security problems
•
Complex software, critical infrastructure
– downloadable software: user vs. provider
– B2B, financial systems, user profiles
– military information systems
•
Problems:
–
–
–
–
malicious downloaded and mobile code
mutual distrust
privacy and data integrity
untrustworthy hosts and hardware components
A
B
C
D
Language-based Security
Exploit and extend programming language- based
technologies to enable
– specifying security policies
– enforcing security policies
– increasing assurance
– relocating trust
in software systems.
Multiple abstraction levels
•
•
•
•
High-level languages
– Myers, Rugina
Assembly language
– Morrisett
Object code
– Schneider, Morrisett
Boot firmware
– Kozen, Teitelbaum
HLL: Static information flow (Myers)
•
Programs are annotated with
information flow policies for
confidentiality, integrity
•
Compiler checks, possibly transforms Target Code
program to ensure that all
executions obey rules (type
checking + type inference + type
erasure)
•
Loader, run-time validates program
policy against system policies
Source Code
Policy
Policy
Executable code
System
Policy
Type systems for legacy code (JGM)
•
Cyclone: a type-safe dialect of C
– Retains the control & interoperability of C.


You control memory management.
No hidden garbage collection or type-tags.
– But gives the same type-safety guarantees as a high-level
language (e.g., Java or ML):

e.g., can’t access deallocated memory, overflow buffer, …
– Useful in many settings:

Drivers, protocol stacks, kernel modules, etc.
– Ideas have been folded into MS’s development tools.
Inlined Reference Monitors (IRM):FBS
Program
Safety
policy
Rewriter
Object code
satisfying policy
Given a safety policy, we can
automatically rewrite a program
to satisfy the desired policy using
an Inlined Reference Monitor.
The Rewriter:
• inserts checks at all points where
events of interest may occur.
• Makes sure the checks cannot be
bypassed.
• Microsoft .NET supporting the
research.
Typed Assembly Language (JGM)
“RISC”-like type system for assembly language:
– Guarantees machine code is type-safe

no need to trust the rewriter, optimizer, JIT, etc.
– A general set of primitive type constructors:



Independent of a particular source language (e.g., Java).
Can encode high-level language abstractions (e.g., objects, ADTs,
classes, etc.)
Can encode compiler invariants (e.g., calling conventions).
– TALx86 tools and TAL-targeted compilers

Linux and Win32 supported binaries
– CMU, Penn, Princeton all using it in research projects.
BootSafe (Kozen)
•
•
•
•
Purpose: Guard against malicious boot firmware
Context: Open Firmware (IEEE 1275-1994)
Approach: Compile device drivers written in Java to annotated
fcode (Forth VM code), statically verify against a strict
security policy at load time
Components:
– J2F, a certifying Java bytecode to Forth fcode compiler
– Trusted verifier, part of the Open Firmware kernel
– Java API and runtime support package
•
Project status: Compiler, runtime,
12/02
st1 level
verifier operational,
BootSafe architecture
Firmware Development
Open Firmware System
Boot driver
written in Java
trust boundary
Java compiler
Java bytecode
Open Firmware
boot kernel
API
BootSafe
verifier
J2F compiler
device
annotated
fcode
burn into
ROM
annotated
fcode
load
fcode
fcode
fcode
Research that has impact …
•
•
•
•
Compiler optimizations
– Pingali: Intel’s Merced, SGI, DEC DCPI, HP
Virtual synchrony
– Birman, van Renesse and Vogels: Isis Inc, French air traffic control
system, NYSE, etc.
Distributed virtual machines
– Sirer: Appliant Inc, Hewlett-Packard ChaiVM
Inline reference monitors
– Schneider and Morrisett: Microsoft .NET IRM
•
National Academy Trust in Cyberspace study
•
AFRL/Cornell Information Assurance Institute
PL and Systems @Cornell
•
•
•
•
High-performance systems
– Martin Burtscher, Rajit Manohar, Jose Martinez,
Sally McKee, Keshav Pingali, Radu Rugina,
Evan Speight
Distributed Systems
– Ken Birman, Alan Demers, Fred Schneider, Gün Sirer
Databases
– Alan Demers, Johannes Gehkre,
Jai Shanmugasundaram
Security
– Dexter Kozen, Greg Morrisett, Andrew Myers,
Radu Rugina, Fred Schneider, Tim Teitelbaum