20060912_scs
Download
Report
Transcript 20060912_scs
Secure Compiler Seminar 9/12
Survey on Design of
Secure Low-Level Languages
Toshihiro YOSHINO, Yonezawa Lab.
<[email protected]>
A Secure Low-Level Language is Needed
As the secure compiler target
language
“secure” means it has a method
to prove program’s properties
Memory safe, control flow safe, …
For this, its concrete formal model
should be given
Verifier
It should also be
low-level
To reduce complexity of JIT compiler
(In other words, TCB)
High-Level
Program
Secure Compiler
Secure
Low-Level
Language
JIT compilation
Machine
Executable
Existing Researches
1.
Two major approaches
TAL, PCC
2.
Extension to conventional assembly languages
Utilizes certain logic (such as type theories) to prove safety
Virtual Machines
Introduce intermediate languages of their own
Many of them adopt safe-by-construction design
e.g. Java VM is semantically safe in memory operation
Java VM, Microsoft CIL, mvm [Franz et al. 2003],
Jinja [Klein et al. 2006], ADL [Yoshino 2006], …
Comparison: PCC vs. Java VM
PCC
Machine
Independent
(Extended) Machine code + Proof
Generated code and proof are machinedependent
Requires one VC implementation for each
architecture
Java VM
Machine
Dependent
Verifier ensures type and control flow safety
It often restricts optimizations
Leads to performance degradation
High cost to perform verification
Stack is nothing more than a set of untyped
(variable-number) registers
Limitations of
Java Bytecode Verification
Initialization check is incomplete
Correlation with other variables is not taken into account
Example:
class Test {
int test(boolean b) {
int i;
try { if(b) return 1; i = 2; }
finally { if(b) i = 3; }
return i;
}
}
Incomplete common subexpression elimination
Cannot eliminate c.s. in address calculations (array refs)
Then What Should We Do?
Maximize machine-independent part
Avoids porting cost of the system
Tradeoff against the size of TCB (Trusted
Computing Base)
But recent works in PCC and TAL (e.g. Foundational
PCC) aim solely to minimize TCB
Reduce proof size and generation cost
PCC requires much effort to produce proof,
because the target’s level is very low
Registers and memory are untyped, etc.
mvm [Franz et al. 2003]
Aimed to find the semantics level that:
Is effective at supporting proof-carrying code
Can also be translated efficiently
into highly performing native code
(on many platforms)
Separated design between VM
layer and PCC layer
[1] M. Franz et al. A Portable Virtual Machine Target for Proof-Carrying Code.
IVME ’03.
mvm: Virtual Machine Design
Register-based architecture
The number of registers is not bounded
Registers are categorized by the type of values:
Integer, Boolean, Pointer, Address
Pointer registers are used to store pointers to heap
objects (more specifically, array heads)
Address registers are for storing results of address
arithmetic
• Bounds check is not performed in arithmetic, so it has to
be checked in higher layer
Heap can be used to store objects
Heap model is explained next
mvm: Virtual Machine Design
Integer
Boolean
1
false
42
true
Pointer
Address
…
…
…
…
label1:
instr
instr …
label2:
…
mvm Virtual Machine
Heap
mvm: Heap Model
mvm heap consists of arrays of objects
Object representation in mvm
Each object is tagged
Tag can only be written with new operation and is
immutable after creation
Two sections of data area: values and pointers
Integers, booleans are stored into the first section
Pointers are stored into the second section
1
42
mvm: Heap Model
A type is associated with its tag value, layout
and structure
This association is managed by compiler
Layout describes the sizes of data sections
Structure describes the possible substructure inside
pointer section
Example of type information
{} means disjunction
datatype T =
Int of int
| Pair of int * int
… T list …
<> means a tuple
mvm: Heap Model
Example of a T list object tree
int*int
T
T list
mvm: Instructions
Arithmetics, Logical calculation
Similar to many other languages ;-)
Branch
Unconditional: goto label
Conditional: brtrue bi, label / brfalse bi, label
Condition must be taken from a boolean register
Jump is allowed only to a label
Conditional by object tag (RTTI): iftag
mvm: Instructions
Object creation and access
pj = new(tag, ik)
Creates an array of ik objects with type tag
r = load([sizev, sizep] | tag, pk, offset)
store([sizev, sizep] | tag, pk, offset, r)
sizes and tag are used to check memory safety
Pointer registers and address registers
Object access also permit address registers ak
This distinction is for supporting garbage collection
Address registers always contain “derived” pointers
mvm: Instructions
Accessing arrays
an = adda([sizev, sizep] | tag, pk, il)
Calculates address of the il-th element in an array of type
tag stored at pk
in = getlen(pk)
Guards
Bounds checking: CHECKLEN(pk, il)
Validity checking: CHECKNOTNULL(pk)
Type checking: CHECKTAG(pi, [sizev, sizep] | tag)
These guards are inserted when static checking failed
An Example mvm Program
Type Safety in mvm Programs
Operations on primitives are all type-safe
Because registers to store values are distinct
Type-safety proofs are needed only for
non-primitive operations
Pointers, arrays and records
For every pointer operation, check that result
pointer:
Points to the beginning of an array, record or value
Points to an object of the correct type
Jinja [Klein et al. 2006]
A Java-like programming language built on
Isabelle/HOL
Formal description of Jinja language, Jinja VM
and compiler are given
Several properties were machine-checked
Big step evaluation and small step evaluation
(atomic operations) are equivalent
Compiler correctness
[2] G. Klein, T. Nipkow. A Machine-Checked Model for a Java-Like Language,
Virtual Machine, and Compiler. TOPLAS 28(4), 2006.
Jinja Language
“Jinja is not Java”
Object-oriented language with exceptions
A program is a set of class definitions and,
a class consists of several fields and methods
Method body is an expression
Overriding is supported as in Java
But not overloading, because it is complicated
Language is statically typed
Type system ensures that the execution of a well-typed
program never gets stuck
Jinja Language: Language Elements
Values
Boolean Bool b, Integer Intg i, Reference Addr a
Null reference Null, Dummy value Unit
Expressions
Val v , binary operations e1 op e2 , Var V , V := e , e1; e2 , …
Conditional: if (e) e1 else e2 , while (e) e’ / Block: {V:T, e}
Object construction: new C
Casting: Cast C e
Field access: e.F{D} , e.F{D} := e
D is annotation added in preprocessing (e.g., by typechecker)
Method call: e.M(e, e, …)
Exception: throw e , try e1 catch(C V) e2
Jinja Language: Semantics
Big step semantics
Typical operational semantics
State = <Heap, Local Variables>
Detail abbreviated because nothing special
Small step semantics
Finer-grained semantics
One-step evaluation
Useful for formalizing parallelism (?)
Each (small) operation is considered atomic
Not discussed in the paper
Jinja Language: Semantics
Big and small semantics are proven to be
equivalent
wwf-J-prog means “weak well-formedness”,
which is defined by the following properties:
Number of parameter types and of parameter names
are equal
“this” is not included in parameter list
Free variables in the method body only refer to this or
these parameters
Jinja VM
Similar to Java VM
Stack-based machine with heap
State = <addr option, heap, frame list>
First element is possibly a generated exception
Third element is a call-stack
Frame =
<stack, registers, cname, mname, pc>
where stack = value list, registers = value list
Evaluation of operands are done on stack
Registers are for storing local variables
Jinja VM: Instructions
Basic operations
Push v / Pop
Register operations: Load n / Store n
Arithmetics: IAdd, …
Logical operations: CmpEq, …
Object manipulation
Construction: New cname
Casting: Checkcast cname
Field access:
Getfield vname cname / Putfield vname cname
Method invocation: Invoke mname n
Jinja VM: Instructions
Control flow operation
Branching: Goto n / IfFalse n
n is relative offset from the instruction
Exit from a method: Return
Exception
Throwing an exception: Throw
Information about exception handlers (try-catch)
are attached to method declarations
Handler is retrieved from there when needed
Jinja VM: Semantics
Please refer to the paper for detail
Basically, straightforward and intuitive
In this level, there are no runtime checks
For example, IAdd (Integer addition) does not
check whether its argument is really integers
Otherwise, the result is unspecified
This kind of checks is performed by a bytecode
verifier
Jinja VM Bytecode Verification
JVM relies on the following assumptions:
Types are correct
No overflow or underflow in stack
Code containment
Register initialization before use
Just the same as Java VM
Bytecode verifier statically ensures these
assumptions
Jinja VM Bytecode Verification
Abstract interpretation
Instead of values, consider only types
Addr 1
Intg 1
Addr 1
Addr 3
0
State
Class A
Int
Class A
Class B
Int
State Type
JVM Program
Addr 2
Addr 1
Class A
Addr 3
Class B
0
Class B
Int
Jinja Compiler and its Correctness
2-staged compilation
1. Map parameter names to register indices
Assign local variables to registers
Gather variable occurrences and use it to lookup
2. Code generation
expression → instruction list (compE2)
•
Straightforward definition
Exception table generation (compEx2)
•
Separated from compE2, because exception table must
contain global addresses
Jinja Compiler and its Correctness
Correctness of compilation
If a program is weakly well-formed, then:
Heap,
Vars
Jinja
program
Heap,
Vars
Heap,
[Frame]
compilation
JVM
bytecode
Heap,
[]
Implementation of Jinja
http://afp.sourceforge.net/entries/Jinja.shtml
About 20kLoC in Isabelle/HOL
Over 1,000 theorems are defined
It takes about 25 min. to process these proofs
on a 3GHz Pentium 4 machine with 1GB RAM
Summary of Today’s Talk
We would need a secure low-level language for the
target of a secure compiler
Minimize machine-dependent part to reduce implementation cost
Also reduce cost for proof generation
To answer this, surveyed two VM projects
mvm
Aimed to find the “sweet spot” that reconciles high performance and
small type-safety proofs
Jinja
Constructed a unified formal model of a Java-like language, the
underlying VM and compiler
In contrast to mvm, this research is oriented toward higher-level
languages and compiler’s properties
How about ADL [Yoshino 2006] …?
The position of ADL is close to mvm
To provide a common basis of implementing
verifier for low-level languages
Assumed translation direction is opposite
mvm is an intermediate code of compilation
ADL is designed to simulate real machines
JVM, mvm
Secure L3
ADL
Machine
Code
How about ADL [Yoshino 2006] …?
ADL takes minimalist approach
Only 7 kinds of commands
Instead, expression-based design to allow complex
formulae to be easily written
ADL can be used as an intermediate language?
Probably some modification needed
Register allocation is done, but except for variables
Minimalist design, however, may increase complexity in
constructing a verification logic
• Abstract interpretation is often not sufficient, so a verification
logic may want to calculate exact values
More References
LLVM Project [Lattner 2000]
http://www.llvm.org/
Use VM for interprocedural optimization
SafeTSA [Amme et al. 2001]
SSA-based language for mobile code security
Dis virtual machine [Winterbottom et al. 1997]
Omniware system [Adl-Tabatabai et al. 1996]
(Typical) Compiler Construction and
Several Intermediate Languages
Lexing /
Parsing
Type
Checking
High-Level
Program
Normalize
(SSA, etc.)
LLVM,
SafeTSA
Java, CIL,
Jinja(VM)
Intermediate Code
Generation
Optimize
mvm
TAL,
PCC
ADL
Register
Allocation
Target Code
Generation
Pretty
Printing
Machine
Executable