Transcript David Wood
Transactional Memory
An Overview of Hardware Alternatives
David A. Wood
University of Wisconsin
Transactional Memory Workshop
April 8th, 2005
What’s database got to do with it?
Atomicity
Consistency
Correct at begin and end
All (or some) memory ops,
not just database objects
Isolation
All updates, or none
Partial work not visible
Inputs stay stable
Durability
Survive “system” failures
October 21, 2004
Despite increasing
awareness of failures
Thread-Level Transactional Memory
2
801 Database Storage
Lock bits on virtual memory
128 byte granularity
Added to pagetable and TLB
Caches user’s lock state
Trap on lock conflict
No h/w for logging, abort,
etc.
Only uniprocessors
Memory
PPN
TLB
801 and RS/6000
Was this transactional memory?
October 21, 2004
Tid, Lock bits
Tid
Thread-Level Transactional Memory
CPU
3
SQL/801
“The development of SQL/801 was greatly simplified
because, with minor exceptions, it considers only a
single user. It achieves multiuser concurrency [on a
uniprocessor] by running in multiple processes using
the shared database storage….”
Chang and Mergen, ’88
Largest transactional memory application
Only real hardware transactional memory implementation
No one seems to be looking at what they learned
October 21, 2004
Thread-Level Transactional Memory
4
Basic Transactional Mechanisms
Isolation
Version management
Detect when transactions conflict
Track read and write sets
Record new and old values
Atomicity
Commit new values
Abort back to old values
October 21, 2004
Thread-Level Transactional Memory
5
H/W Transactional Memory Systems
Knight’s Lisp Work
Transactional Memory
Oklahoma Update
SLE/TLR
Transactional Coherence and Consistency
Unbounded TM
Virtual TM
Thread-level TM
October 21, 2004
Thread-Level Transactional Memory
6
Knight’s Lisp Work [’86]
Parallel execution of sequential code
Break program into “transaction blocks”
Execute transactions in parallel
Multiple loads in a transaction
Exactly one store ends the transaction
No register state passed between transactions
Track dependences (i.e., read set)
Abort and restart on conflicting write
Transactions commit in sequential order
October 21, 2004
Broadcast writes on commit
Thread-Level Transactional Memory
7
Knight’s Hardware
Two caches
Dependency cache
Confirm cache
Holds write set
Supports multiple writes
Commits
Check dep. cache
Broadcast writes
Fast aborts
Memory
Tracks read set
Bus monitor detects conflicts
Valid
Old Value
Depends
Old Value
A
New
Value
Invalid
Invalidate Confirm cache
Use old values in Dep. Cache
Immediately restart execution
Dependency
Cache
Confirm
Cache
CPU
Spawned two threads: TLS & TM
October 21, 2004
Thread-Level Transactional Memory
8
H&M’s Transactional Memory [’93]
Targets explicitly parallel (non-functional) codes
Motivated by lock-free data structures
Transactions:
Read and write multiple locations
Commit in arbitrary order
Implicit begin, explicit commit operations
Abort affects memory, not registers
Software manages restarting execution
Validate instruction detects pending abort
Implementation extends cache coherence
Read/Write locks correspond to MOESI states
Add orthogonal transaction states
October 21, 2004
Thread-Level Transactional Memory
9
H&M’s Transactional Memory
Adds Transaction Cache
Before and after image
Even for read-only data
Small, fully associative
Abort on all conflicts
NACK conflicting requests
Abort NACKed transaction
Fast commit and abort
Memory
Stores all data accessed by
transactions
2 copies of each line
Change trans. cache state
October 21, 2004
M
M
XCommit
New Value
S
M
XAbort
Old Value
S
S
XCommit
Old Value
S
XAbort
Old Value
Cache
Transaction
Cache
CPU
Thread-Level Transactional Memory
10
SLE/TLR
Hardware exploits speculative processors
Read sets tracked by coherence protocol
Write set maintained in store queue
Abort restarts execution, including register state
Speculative lock elision (SLE)
Elide locks from the dynamic execution stream
Convert critical sections to optimistic transactions
Concurrently execute non-conflicting transactions
Fall back on explicit locks if conflicts
Transactional Lock Removal (TLR)
Resolve conflicts using priority ordering (timestamps)
Delay lower priority transactions
Deadlock and starvation free
October 21, 2004
Thread-Level Transactional Memory
11
Transactional Coherence and Consistency [’04]
TCC unifies coherence, memory consistency,
and transaction support
Transaction ordering
All transactions, all the time
Ordered, Unordered, Partially Ordered
Supports thread-level speculation
Optimistic concurrency model
Unordered transactions serialize at commit
Conflicts detected at commit
October 21, 2004
Thread-Level Transactional Memory
12
TCC
Write buffer ~4 kB,
holds new values
until commit
Shadow register
file checkpoints
architectural
registers
On-Chip
Interconnect
Broadcast updates
at commit
L2 Cache
Logically Shared
CPU
SRF
October 21, 2004
L1 D
L1 cache tracks read
set, bit per line
Thread-Level Transactional Memory
13
TCC
Commits are sequential
Broadcasts addresses of all updates
Supports large transactions
Serialize all other transactions
Cannot abort large transactions
Grabs and holds the commit bus
Updates affect L2/Mem; no undo
Extensions forthcoming
talk to Kunle and Christos
October 21, 2004
Thread-Level Transactional Memory
14
Unbounded Transactional Memory (UTM)
Unbounded transactions
Arbitrary size
Arbitrary duration
Not limited by interrupts, context switch, etc.
Complex implementation
Not limited by write buffer, cache, or memory
Not justified by performance
Settle for “nearly” unbounded transactions
Much simpler hardware
October 21, 2004
Thread-Level Transactional Memory
15
Transactional Linux
9.355x10^6
make
dbench
10^6
10^4
10^2
Log-log scale
1
1
10
100
1000
Fully associative cache size (64 byte lines)
8144
Almost all of the transactions require < 100 cache lines
99.9% need fewer than 54 cache lines
There are, however, some very large transactions!
>500k-byte fully-associative cache required
October 21, 2004
Thread-Level Transactional Memory
16
Large Transaction Memory (LTM)
Register checkpoints
Cache tracks read and write sets
Snapshot of rename maps
T-bits mark transactional blocks
Cache holds new data values “in place”
O-bit indicates overflow to in-memory hashtable
Memory holds committed state
Abort invalidates all modified blocks
Miss on re-execution
Transactional writes force memory updates
October 21, 2004
Repeated writes (e.g., to local data) are written through
Thread-Level Transactional Memory
17
Virtual Transactional Memory (VTM)
Only an overflow mechanism
No overhead on common in-cache case
Low overhead when no conflict
Shared Bloom Filter rules out conflicts
Filter resides in virtual memory
Higher overhead on possible conflict
Check shared overflow counter on cache miss
Hardware table walk to detect actual conflict
Table resides in virtual memory
Only incurred by large transactions with likely conflict
Supports context switches and paging
October 21, 2004
Thread-Level Transactional Memory
18
801 revisited
Why didn’t 801 database storage succeed?
Answer #1:
Lock bits helped performance and simplified software
Changing lock bits requires TLB shootdown
Too complicated for the benefits?
Not a current problem: transaction h/w is easy
Answer #2:
Not universally available
DB2 was (is) multiplatform
Can’t rely on feature only available in one architecture
Still a relevant concern
October 21, 2004
Thread-Level Transactional Memory
19
Need Standard Transaction Interface
Abstract away resource requirements
Virtualize transactional memory
Support large, long transactions
Transaction semantics between threads
NOT a hardware property
Permit range of implementations
Hardware, software, and combinations
October 21, 2004
Thread-Level Transactional Memory
20
Thread-level Transactional Memory
Abstract mechanisms
Version management
Isolation
Update memory “in place”
Log “before images” to thread level VM
Logically extend memory words with read and write bits
Implementations can be conservative (e.g., blocks)
Atomicity
Commits easy due to in place updates
Aborts trap to user-level software
October 21, 2004
Hardware can accelerate common case
Thread-Level Transactional Memory
21
Conclusions
Make the common case fast
Handle the uncommon case
99+% of transactions fit in hardware
Lots of alternatives
Make both commits and aborts fast
Large transactions will occur, deal with ‘em
Shouldn’t be limited by hardware
Agree on a common abstraction
Success requires multi-platform support
Let vendors compete on price-performance
October 21, 2004
Thread-Level Transactional Memory
22