20060912_scs

Download Report

Transcript 20060912_scs

Secure Compiler Seminar 9/12
Survey on Design of
Secure Low-Level Languages
Toshihiro YOSHINO, Yonezawa Lab.
<[email protected]>
A Secure Low-Level Language is Needed
 As the secure compiler target
language
 “secure” means it has a method
to prove program’s properties
Memory safe, control flow safe, …
For this, its concrete formal model
should be given
Verifier
 It should also be
low-level
To reduce complexity of JIT compiler
(In other words, TCB)
High-Level
Program
Secure Compiler
Secure
Low-Level
Language
JIT compilation
Machine
Executable
Existing Researches

1.
Two major approaches
TAL, PCC


2.
Extension to conventional assembly languages
Utilizes certain logic (such as type theories) to prove safety
Virtual Machines


Introduce intermediate languages of their own
Many of them adopt safe-by-construction design
e.g. Java VM is semantically safe in memory operation

Java VM, Microsoft CIL, mvm [Franz et al. 2003],
Jinja [Klein et al. 2006], ADL [Yoshino 2006], …
Comparison: PCC vs. Java VM
 PCC
Machine
Independent
 (Extended) Machine code + Proof
 Generated code and proof are machinedependent
 Requires one VC implementation for each
architecture
 Java VM
Machine
Dependent
 Verifier ensures type and control flow safety
 It often restricts optimizations
 Leads to performance degradation
 High cost to perform verification
 Stack is nothing more than a set of untyped
(variable-number) registers
Limitations of
Java Bytecode Verification
 Initialization check is incomplete
Correlation with other variables is not taken into account
Example:
class Test {
int test(boolean b) {
int i;
try { if(b) return 1; i = 2; }
finally { if(b) i = 3; }
return i;
}
}
 Incomplete common subexpression elimination
Cannot eliminate c.s. in address calculations (array refs)
Then What Should We Do?
Maximize machine-independent part
Avoids porting cost of the system
Tradeoff against the size of TCB (Trusted
Computing Base)
But recent works in PCC and TAL (e.g. Foundational
PCC) aim solely to minimize TCB
Reduce proof size and generation cost
PCC requires much effort to produce proof,
because the target’s level is very low
Registers and memory are untyped, etc.
mvm [Franz et al. 2003]
Aimed to find the semantics level that:
Is effective at supporting proof-carrying code
Can also be translated efficiently
into highly performing native code
(on many platforms)
Separated design between VM
layer and PCC layer
[1] M. Franz et al. A Portable Virtual Machine Target for Proof-Carrying Code.
IVME ’03.
mvm: Virtual Machine Design
Register-based architecture
The number of registers is not bounded
Registers are categorized by the type of values:
Integer, Boolean, Pointer, Address
Pointer registers are used to store pointers to heap
objects (more specifically, array heads)
Address registers are for storing results of address
arithmetic
• Bounds check is not performed in arithmetic, so it has to
be checked in higher layer
Heap can be used to store objects
Heap model is explained next
mvm: Virtual Machine Design
Integer
Boolean
1
false
42
true
Pointer
Address
…
…
…
…
label1:
instr
instr …
label2:
…
mvm Virtual Machine
Heap
mvm: Heap Model
mvm heap consists of arrays of objects
Object representation in mvm
Each object is tagged
Tag can only be written with new operation and is
immutable after creation
Two sections of data area: values and pointers
Integers, booleans are stored into the first section
Pointers are stored into the second section
1
42
mvm: Heap Model
 A type is associated with its tag value, layout
and structure
This association is managed by compiler
Layout describes the sizes of data sections
Structure describes the possible substructure inside
pointer section
 Example of type information
{} means disjunction
datatype T =
Int of int
| Pair of int * int
… T list …
<> means a tuple
mvm: Heap Model
 Example of a T list object tree
int*int
T
T list
mvm: Instructions
Arithmetics, Logical calculation
Similar to many other languages ;-)
Branch
Unconditional: goto label
Conditional: brtrue bi, label / brfalse bi, label
Condition must be taken from a boolean register
Jump is allowed only to a label
Conditional by object tag (RTTI): iftag
mvm: Instructions
 Object creation and access
pj = new(tag, ik)
 Creates an array of ik objects with type tag
r = load([sizev, sizep] | tag, pk, offset)
store([sizev, sizep] | tag, pk, offset, r)
 sizes and tag are used to check memory safety
 Pointer registers and address registers
Object access also permit address registers ak
This distinction is for supporting garbage collection
 Address registers always contain “derived” pointers
mvm: Instructions
 Accessing arrays
an = adda([sizev, sizep] | tag, pk, il)
 Calculates address of the il-th element in an array of type
tag stored at pk
in = getlen(pk)
 Guards
Bounds checking: CHECKLEN(pk, il)
Validity checking: CHECKNOTNULL(pk)
Type checking: CHECKTAG(pi, [sizev, sizep] | tag)
 These guards are inserted when static checking failed
An Example mvm Program
Type Safety in mvm Programs
Operations on primitives are all type-safe
Because registers to store values are distinct
Type-safety proofs are needed only for
non-primitive operations
Pointers, arrays and records
For every pointer operation, check that result
pointer:
Points to the beginning of an array, record or value
Points to an object of the correct type
Jinja [Klein et al. 2006]
A Java-like programming language built on
Isabelle/HOL
Formal description of Jinja language, Jinja VM
and compiler are given
Several properties were machine-checked
Big step evaluation and small step evaluation
(atomic operations) are equivalent
Compiler correctness
[2] G. Klein, T. Nipkow. A Machine-Checked Model for a Java-Like Language,
Virtual Machine, and Compiler. TOPLAS 28(4), 2006.
Jinja Language
 “Jinja is not Java”
 Object-oriented language with exceptions
A program is a set of class definitions and,
a class consists of several fields and methods
Method body is an expression
 Overriding is supported as in Java
But not overloading, because it is complicated
 Language is statically typed
Type system ensures that the execution of a well-typed
program never gets stuck
Jinja Language: Language Elements
 Values
 Boolean Bool b, Integer Intg i, Reference Addr a
 Null reference Null, Dummy value Unit
 Expressions
 Val v , binary operations e1 op e2 , Var V , V := e , e1; e2 , …
 Conditional: if (e) e1 else e2 , while (e) e’ / Block: {V:T, e}
 Object construction: new C
 Casting: Cast C e
 Field access: e.F{D} , e.F{D} := e
 D is annotation added in preprocessing (e.g., by typechecker)
 Method call: e.M(e, e, …)
 Exception: throw e , try e1 catch(C V) e2
Jinja Language: Semantics
Big step semantics
Typical operational semantics
State = <Heap, Local Variables>
Detail abbreviated because nothing special
Small step semantics
Finer-grained semantics
One-step evaluation
Useful for formalizing parallelism (?)
Each (small) operation is considered atomic
Not discussed in the paper
Jinja Language: Semantics
Big and small semantics are proven to be
equivalent
wwf-J-prog means “weak well-formedness”,
which is defined by the following properties:
Number of parameter types and of parameter names
are equal
“this” is not included in parameter list
Free variables in the method body only refer to this or
these parameters
Jinja VM
 Similar to Java VM
Stack-based machine with heap
 State = <addr option, heap, frame list>
First element is possibly a generated exception
Third element is a call-stack
 Frame =
<stack, registers, cname, mname, pc>
where stack = value list, registers = value list
Evaluation of operands are done on stack
Registers are for storing local variables
Jinja VM: Instructions
 Basic operations
Push v / Pop
Register operations: Load n / Store n
Arithmetics: IAdd, …
Logical operations: CmpEq, …
 Object manipulation
Construction: New cname
Casting: Checkcast cname
Field access:
Getfield vname cname / Putfield vname cname
Method invocation: Invoke mname n
Jinja VM: Instructions
Control flow operation
Branching: Goto n / IfFalse n
n is relative offset from the instruction
Exit from a method: Return
Exception
Throwing an exception: Throw
Information about exception handlers (try-catch)
are attached to method declarations
Handler is retrieved from there when needed
Jinja VM: Semantics
Please refer to the paper for detail
Basically, straightforward and intuitive
In this level, there are no runtime checks
For example, IAdd (Integer addition) does not
check whether its argument is really integers
Otherwise, the result is unspecified
This kind of checks is performed by a bytecode
verifier
Jinja VM Bytecode Verification
JVM relies on the following assumptions:
Types are correct
No overflow or underflow in stack
Code containment
Register initialization before use
Just the same as Java VM
Bytecode verifier statically ensures these
assumptions
Jinja VM Bytecode Verification
Abstract interpretation
Instead of values, consider only types
Addr 1
Intg 1
Addr 1
Addr 3
0
State
Class A
Int
Class A
Class B
Int
State Type
JVM Program
Addr 2
Addr 1
Class A
Addr 3
Class B
0
Class B
Int
Jinja Compiler and its Correctness
 2-staged compilation
1. Map parameter names to register indices
 Assign local variables to registers
 Gather variable occurrences and use it to lookup
2. Code generation
 expression → instruction list (compE2)
•
Straightforward definition
 Exception table generation (compEx2)
•
Separated from compE2, because exception table must
contain global addresses
Jinja Compiler and its Correctness
 Correctness of compilation

 If a program is weakly well-formed, then:
Heap,
Vars
Jinja
program
Heap,
Vars
Heap,
[Frame]
compilation
JVM
bytecode
Heap,
[]
Implementation of Jinja
 http://afp.sourceforge.net/entries/Jinja.shtml
About 20kLoC in Isabelle/HOL
Over 1,000 theorems are defined
It takes about 25 min. to process these proofs
on a 3GHz Pentium 4 machine with 1GB RAM
Summary of Today’s Talk
 We would need a secure low-level language for the
target of a secure compiler
 Minimize machine-dependent part to reduce implementation cost
 Also reduce cost for proof generation
 To answer this, surveyed two VM projects
 mvm
 Aimed to find the “sweet spot” that reconciles high performance and
small type-safety proofs
 Jinja
 Constructed a unified formal model of a Java-like language, the
underlying VM and compiler
 In contrast to mvm, this research is oriented toward higher-level
languages and compiler’s properties
How about ADL [Yoshino 2006] …?
The position of ADL is close to mvm
To provide a common basis of implementing
verifier for low-level languages
Assumed translation direction is opposite
mvm is an intermediate code of compilation
ADL is designed to simulate real machines
JVM, mvm
Secure L3
ADL
Machine
Code
How about ADL [Yoshino 2006] …?
 ADL takes minimalist approach
Only 7 kinds of commands
Instead, expression-based design to allow complex
formulae to be easily written
 ADL can be used as an intermediate language?
Probably some modification needed
 Register allocation is done, but except for variables
 Minimalist design, however, may increase complexity in
constructing a verification logic
• Abstract interpretation is often not sufficient, so a verification
logic may want to calculate exact values
More References
LLVM Project [Lattner 2000]
http://www.llvm.org/
Use VM for interprocedural optimization
SafeTSA [Amme et al. 2001]
SSA-based language for mobile code security
Dis virtual machine [Winterbottom et al. 1997]
Omniware system [Adl-Tabatabai et al. 1996]
(Typical) Compiler Construction and
Several Intermediate Languages
Lexing /
Parsing
Type
Checking
High-Level
Program
Normalize
(SSA, etc.)
LLVM,
SafeTSA
Java, CIL,
Jinja(VM)
Intermediate Code
Generation
Optimize
mvm
TAL,
PCC
ADL
Register
Allocation
Target Code
Generation
Pretty
Printing
Machine
Executable