
Specialization Tools and
Techniques for Systematic
Optimization of System Software
Dylan McNamee, Jonathan Walpole,
Calton Pu, Crispin Cowan, Charles
Krasic, Ashvin Goel, Perry Wagle
Presented by: Rizal Arryadi
Background
• Much complexity in OS code arises from the requirement to handle all possible system states.
• There is a conflict between correctness across all applications and high performance for individual applications.
• The micro-kernel approach to OS structure exposes the conflict between performance, modularity, and portability.
Approaches
• Write general-purpose code, but optimize it for a few anticipated common cases
– Problem: the “common” cases vary
• Explicit customization: incorporate customizability into the system structure
– SPIN, Exokernel, Synthesis, Mach, etc.
– Problems:
• Burden falls on system tuners
• Limited access to global system state → optimization opportunities are reduced
• Inferred customization (specialization):
– Automatically derived optimization
– Creates optimized code for common cases
– Restricts code rather than extending it
Process Specialization
• Also called partial evaluation
• Consider a program P, taking two arguments S and D, producing a result R:
run P(S,D) = R
A specialization of P with respect to S, written Ps, satisfies:
run Ps(D) = run P(S,D) = R
• Creates optimized code for common cases
– The burden on the system tuner is reduced
– But it requires a more complex analysis of the system
• Tedious, error-prone
• The result is more complex, harder to debug & maintain
Objective of this paper
• Provide a toolkit that reduces the manual work of specialization
• Evaluate the toolkit’s effectiveness on operating system components
Fundamentals of Specialization
• Specialization Predicates
– States of the system known in advance
• Partial Evaluation
– Given specialization predicates, separate the
static parts from the dynamic parts
• Guards
– Enable/disable specialized code when specialization predicate terms are modified
Three Kinds of Specialization
• Static specialization
– Predicates known at compile time
– Partial Evaluation can be applied before execution
• Dynamic specialization
– Defer specialization until runtime
– The values of spec predicates are not established
until some point during execution
– Once established, hold for the remainder of execution
• Optimistic specialization
– Specialization predicates hold only for bounded time intervals (such predicates are called quasi-invariants)
Steps to specialize a system
1. Identify specialization predicates
• Use the developer’s knowledge
• Locate code that can be optimized
• Estimate the net performance improvement
2. Generate specialized code
• Use partial evaluation to automate it
3. Check when specialization predicates hold
• Needed for dynamic and optimistic specialization
• Locate all places that can cause predicates to change, and “guard” them
4. Replace specialized code
• Replugging: replacing one version of specialized code with another
• Not surprisingly, this raises synchronization issues
Tempo: specialized code generator
• Partial evaluator for C programs
• Challenges in binding-time analysis:
– Side effects in C
• Pointers & aliases
• Structures & arrays
• Functions that modify global state
– These are ignored in conventional approaches but captured well by Tempo
• Tempo features:
– Use sensitivity: accurate treatment of “nonliftable” values
– Flow sensitivity: a variable can be static at some program points and dynamic at others
– Context sensitivity: assigns a specific binding-time description to each calling context
– Return sensitivity: the return value of a side-effecting function can be static
• Supports static specialization
• Supports both compile-time and run-time specialization
Enabling/Disabling Specialized Code:
TypeGuard
• Place guards at the sites of modifications to specialization predicate terms
• To be efficient, the predicate terms must be used more frequently than they are modified.
• Problems:
– May include too many sites
– Different variables can share the same name (locally defined)
– Aliases → pass by reference
– Specialization predicate terms are often not simple scalars, but fields of dynamically allocated structures
Enabling/Disabling Specialized Code:
TypeGuard
• Type based static analysis
• Two-phase approach:
– Phase 1
• Statically identify structure types whose fields are spec
predicate terms
• Extend them to include spec predicate ID (SPID)
• Identify statements that update a guarded field
• Insert the guarding code
– Phase 2
• Set SPID dynamically when specialized code is enabled
• Clear it when specialized code is disabled
• Check it when a spec predicate term is modified
Enabling/Disabling Specialized Code:
TypeGuard
• Guarding example:
current->uid = bar;
would become:
if (current->SPID != NULL)
    current->SPID->update_uid(bar);
else
    current->uid = bar;
Enabling/Disabling Specialized Code:
MemGuard
• TypeGuard issues warnings about alias-producing operations, which must be validated manually.
• MemGuard guarantees complete guard coverage
• Uses memory-protection hardware to write-protect pages that contain specialization predicate terms
• The write-fault handler checks whether the address being written is a specialization predicate term; if so, it performs a guarded write, which may trigger replugging
• Uses HW memory protection → guaranteed to capture all writes to specialization predicate terms
• Drawbacks:
– Coarse (page-level) granularity
– High overheads
Replugger
• Replaces the current code with code that is consistent with the new state of the specialization predicates.
• Problem: concurrent replugging and function invocation
– Solution: synchronization using locks
• Factors affecting the design:
– Concurrent invocation of the same repluggable function
• Can be avoided by associating functions with threads
– Concurrency between replugging and invocation
• With concurrent invocation (i.e., a counting replugger):
– Use a counter to detect whether threads are executing the repluggable function
– Use a stub function, holding_tank, to hold off invocations while the function is being replugged
• Without concurrent invocation:
– Use a boolean flag
Experiments:
Specializing RPC
• Applying Tempo to Sun RPC’s marshaling process
• Static specialization → the specialization predicates are available when stubs are generated.
• Specialization opportunities:
– Some fields in the data structures used by the marshaling code have values known at stub-generation time
• Some details:
– Encoding/decoding dispatch elimination
– Buffer-overflow-check elimination
– Exit status propagation
– Marshaling loop unrolling
Experiments:
Specializing RPC (Cont.)
• Performance
Experiments:
Specializing Packet Filters
• BSD Packet Filter (BPF):
– An interface for selecting packets from a network interface
• Specialization opportunities:
– BPF is executed many times
– To be specialized: the packet interpreter
– “Specializing an interpreter with respect to a particular program effectively compiles that program”
• Either static or dynamic specialization:
– Static: the packet filter program is available in advance
– Dynamic: the packet filter program is presented immediately before execution → the specialization overhead is included in the run time.
Experiments:
Specializing Packet Filters (cont.)
• A packet is read and filtered by a call whose parameters are: the packet filter program, a packet, the length of the original packet, and the amount of data present
• BPF interpreter (listing not shown)
• Recursion (listing not shown)
Experiments:
Specializing Packet Filters (Cont.)
• Performance
• Code size:
– Unspecialized: 550 lines
– Specialized (6-instruction filter): 366 lines
– Specialized (10-instruction filter): 576 lines
Experiments:
Specializing Signals
• Statement: ret = kill(pid, n)
• Specialization opportunity:
– A process might repeatedly send the same signal to the same destination process → both task_structs remain unchanged between sends
Experiments:
Specializing Signals (cont.)
• Optimistic specialization:
– If any predicate terms are modified between signals, the specialized code becomes invalid (e.g., the destination process exits)
• TypeGuard is used to identify the locations that require guarding.
Experiments:
Specializing Signals (Cont.)
• Performance
• Application-level impact:
– 100,000 producer-consumer iterations
– Buffer size: 4
– Unspecialized: 11.9 s
– Specialized: 5.6 s
• Code size:
– 4 functions reduced to 1; 59 LOC reduced to 18
Related Work
• Multistage programming
– Programs that generate other programs
– Tempo is:
• Automatic
• Heterogeneous
• Capable of both static and run-time generation
• A two- or three-stage system
• Aspect-oriented programming
– “Different aspects of a system’s behavior tend to have their own natural form, so while one abstraction framework might do a good job of capturing one aspect, it will do a less good job capturing others”