Slides available - PHARM - University of Wisconsin


Multicore: Panic or Panacea?
Mikko H. Lipasti
Associate Professor
Electrical and Computer Engineering
University of Wisconsin – Madison
http://www.ece.wisc.edu/~pharm

Multicore Mania
First, servers
  IBM Power4, 2001
Then desktops
  AMD Athlon X2, 2005
Then laptops
  Intel Core Duo, 2006
Soon, your cellphone
  ARM MPCore, prototypes for a while now
Sep 18, 2007
Mikko Lipasti-University of Wisconsin

What is behind this trend?
Moore’s Law
Chip power consumption
Single-thread performance trend
[source: Intel]

Dynamic Power
$$P_{dyn} = k \sum_{i \in \text{units}} A_i C_i V_i^2 f_i$$
Static CMOS: current flows when active
  Combinational logic evaluates new inputs
  Flip-flop, latch captures new value (clock edge)
Terms
  C: capacitance of circuit
    wire length, number and size of transistors
  V: supply voltage
  A: activity factor
  f: frequency
Future: Fundamentally power-constrained
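The dynamic power expression on this slide can be evaluated numerically. The following is a minimal sketch, not taken from the slides: the `Unit` container, the example parameter values, and the default k = 1 are all assumptions chosen for illustration.

```python
from typing import Iterable, NamedTuple


class Unit(NamedTuple):
    """One switching unit on the chip (hypothetical container for illustration)."""
    activity: float     # A: activity factor, between 0 and 1
    capacitance: float  # C: switched capacitance in farads
    voltage: float      # V: supply voltage in volts
    frequency: float    # f: clock frequency in hertz


def dynamic_power(units: Iterable[Unit], k: float = 1.0) -> float:
    """Return k * sum over units of A * C * V^2 * f, in watts."""
    return k * sum(u.activity * u.capacitance * u.voltage ** 2 * u.frequency
                   for u in units)


# Assumed example values, not measurements of any real chip.
core = Unit(activity=0.2, capacitance=1e-9, voltage=1.0, frequency=2e9)
# The same unit with supply voltage and frequency both lowered by 20%.
slow = core._replace(voltage=0.8, frequency=1.6e9)

print(dynamic_power([core]))
print(dynamic_power([slow]) / dynamic_power([core]))
```

Because voltage enters squared, lowering V and f together by 20% cuts the dynamic power of this example unit to roughly half (0.8 squared times 0.8, about 0.51).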

Easy answer: Multicore
[Figure: chip layouts for single core, dual core, and quad core]
                    Single Core   Dual Core   Quad Core
Core area           A             ~A/2        ~A/4
Core power          W             ~W/2        ~W/4
Chip power          W + O         W + O’      W + O’’
Core performance    P             0.9P        0.8P
Chip performance    P             1.8P        3.2P
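The chip performance row follows from the core performance row if every core is kept fully busy. A small sketch of that arithmetic, with the single-core performance P normalized to 1.0 (the normalization and the function name are assumptions for illustration):

```python
# Relative per-core performance from the slide's table (single core = 1.0).
CORE_PERF = {1: 1.0, 2: 0.9, 4: 0.8}


def chip_performance(n_cores: int) -> float:
    """Aggregate throughput, assuming all n cores are busy all the time."""
    return n_cores * CORE_PERF[n_cores]


for n in (1, 2, 4):
    print(n, "cores:", chip_performance(n))
```

This is the best case: it assumes perfectly parallel work and ignores the serial portion of the program.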

Amdahl’s Law
[Figure: # CPUs (1 to n) vs. Time, showing the serial portion 1-f and the parallel portion f]
f – fraction that can run in parallel
1-f – fraction that must run serially
$$\text{Speedup} = \frac{1}{(1-f) + \frac{f}{n}}$$
$$\lim_{n \to \infty} \frac{1}{(1-f) + \frac{f}{n}} = \frac{1}{1-f}$$
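Amdahl's Law and its limit can be checked numerically. A minimal sketch (the function names are chosen here for illustration):

```python
def amdahl_speedup(f: float, n: int) -> float:
    """Speedup on n CPUs when a fraction f of the work can run in parallel."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - f) + f / n)


def amdahl_limit(f: float) -> float:
    """Upper bound on speedup as n grows without bound: 1 / (1 - f)."""
    return float("inf") if f == 1.0 else 1.0 / (1.0 - f)


# With 90% parallel code, speedup approaches but never reaches 10.
for n in (1, 2, 4, 16, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
print("limit:", round(amdahl_limit(0.9), 2))
```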

Fixed Chip Power Budget
[Figure: # CPUs (1 to n) vs. Time, showing the serial portion 1-f and the parallel portion f]
Amdahl’s Law
  Ignores (power) cost of n cores
Revised Amdahl’s Law
  More cores → each core is slower
  Parallel speedup < n
  Serial portion (1-f) takes longer
  Also, interconnect and scaling overhead
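The slide does not give a formula for the revised law. One simple way to sketch it, as an assumption, is to divide both the serial and the parallel term by the relative per-core performance; the `core_perf` parameter is hypothetical, and interconnect and scaling overhead are left out.

```python
def fixed_power_speedup(f: float, n: int, core_perf: float) -> float:
    """Speedup over one full-speed core when each of n cores runs at
    core_perf (relative to that full-speed core, between 0 and 1).

    Assumed model: the serial portion runs on one slow core and the
    parallel portion is spread evenly over n slow cores.
    """
    serial_time = (1.0 - f) / core_perf
    parallel_time = f / (n * core_perf)
    return 1.0 / (serial_time + parallel_time)


# Quad core at 0.8 relative per-core performance (value from the earlier table).
print(fixed_power_speedup(1.0, 4, 0.8))  # fully parallel work
print(fixed_power_speedup(0.9, 4, 0.8))  # 90% parallel work
```

With fully parallel work this reproduces the 3.2P quad-core figure from the earlier table. With 90% parallel work the result falls below what classic Amdahl's Law predicts for four full-speed cores, because the serial portion now runs on a slower core.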

Fixed Power Scaling
[Figure: Chip Performance (1 to 128, powers of two) vs. # of cores/chip (1 to 128, powers of two), with curves for 99.9%, 99%, 90%, and 80% parallel code]
Fixed power budget forces slow cores
Serial code quickly dominates
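The model behind the plotted curves is not stated in the slides. The sketch below shows the same qualitative behavior under an assumed power-law model: each doubling of the core count costs every core 10% of its performance. That rate is an assumption picked to match the 0.9P dual-core entry in the earlier table, and `ALPHA`, `core_perf`, and `chip_performance` are names invented here.

```python
import math

# Assumption: per-core performance scales as n ** -ALPHA, with ALPHA chosen so
# that two cores each run at 0.9 of single-core performance.
ALPHA = math.log2(1 / 0.9)


def core_perf(n: int) -> float:
    """Relative per-core performance when the power budget is split n ways."""
    return n ** -ALPHA


def chip_performance(f: float, n: int) -> float:
    """Amdahl speedup with every core slowed down to core_perf(n)."""
    return core_perf(n) / ((1.0 - f) + f / n)


for f in (0.8, 0.9, 0.99, 0.999):
    row = [round(chip_performance(f, n), 1) for n in (1, 2, 4, 8, 16, 32, 64, 128)]
    print(f"{f:.1%} parallel:", row)
```

Under this assumed model, the 80% parallel curve flattens and then declines as cores are added, while the 99.9% parallel curve keeps rising through 128 cores.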

Predictions and Challenges
Parallel scaling limits many-core
  >4 cores only for well-behaved programs
  Optimistic about new applications
  Interconnect overhead
Single-thread performance
  Will degrade unless we innovate
Parallel programming
  Express/extract parallelism in new ways
  Retrain programming workforce

Research Agenda
Programming for parallelism
  Sources of parallelism
  New applications, tools, and approaches
Single-thread performance and power
  Most attractive to programmer/user
Chip multiprocessor overheads
  Interconnect, caches, coherence, fairness

Finding Parallelism
1. Functional parallelism
  Car: {engine, brakes, entertain, nav, …}
  Game: {physics, logic, UI, render, …}
2. Automatic extraction [UW Multiscalar]
  Decompose serial programs
3. Data parallelism
  Vector, matrix, db table, pixels, …
4. Request parallelism
  Web, shared database, telephony, …
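Data parallelism, the third source listed on this slide, can be sketched as splitting a vector into chunks and applying the same operation to each chunk independently. The example is illustrative only; the helper names are hypothetical. In standard CPython builds threads do not execute Python bytecode in parallel, so it shows the decomposition pattern rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(data, n_chunks):
    """Split data into at most n_chunks contiguous pieces of similar size."""
    size = max(1, (len(data) + n_chunks - 1) // n_chunks)
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    """The per-chunk work: identical code applied to different data."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, workers=4):
    """Map the same function over every chunk, then combine partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks(data, workers)))


print(parallel_sum_of_squares(list(range(1000))))
```

`ProcessPoolExecutor` from the same module offers the same `map` interface and runs the chunks in separate processes.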

Balancing Work
Amdahl’s parallel phase f: all cores busy
If not perfectly balanced
  (1-f) term grows (f not fully parallel)
  → Performance scaling suffers
Manageable for data & request parallel apps
Very difficult problem for other two:
  Functional parallelism
  Automatically extracted
Scale power to mismatch [Multiscalar]
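The effect of imbalance can be shown with a small calculation. The sketch below assumes that the parallel phase ends only when the most heavily loaded core finishes; the work units and function name are illustrative, not from the slides.

```python
def speedup(serial_work: float, work_per_core: list) -> float:
    """Speedup over running all of the work on a single core.

    The parallel phase lasts as long as its most heavily loaded core.
    """
    one_core_time = serial_work + sum(work_per_core)
    multi_core_time = serial_work + max(work_per_core)
    return one_core_time / multi_core_time


# 10 units of serial work and 90 units of parallel work on four cores.
balanced = [22.5, 22.5, 22.5, 22.5]
imbalanced = [45.0, 15.0, 15.0, 15.0]

print(round(speedup(10.0, balanced), 2))
print(round(speedup(10.0, imbalanced), 2))
```

In the imbalanced case one core holds half of the parallel work, so three cores sit idle for most of the phase and the four-core chip performs like a perfectly balanced two-core chip.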

Coordinating Work
Synchronization
  Some data somewhere is shared
  Coordinate/order updates and reads
  Otherwise → chaos
Traditionally: locks and mutual exclusion
  Hard to get right, even harder to tune for perf.
Research: Transactional Memory [UW Multifacet]
  Programmer: Declare potential conflict
  Hardware and/or software: speculate & check
  Commit or roll back and retry
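The two approaches on this slide can be contrasted in a toy example. This is only a software analogy under stated assumptions: the `Account` class and its version counter are hypothetical, the optimistic path still uses a short lock to validate and commit, and none of it represents how hardware transactional memory or the cited research systems work.

```python
import threading


class Account:
    """Shared data updated by several threads (hypothetical example)."""

    def __init__(self, balance=0):
        self.balance = balance
        self.version = 0              # bumped on every committed update
        self._lock = threading.Lock()

    def deposit_locked(self, amount):
        """Traditional approach: hold a lock for the whole update."""
        with self._lock:
            self.balance += amount
            self.version += 1

    def deposit_optimistic(self, amount):
        """Speculate without the lock, check for conflicts, commit or retry."""
        while True:
            seen_version = self.version          # read version first
            new_balance = self.balance + amount  # speculative computation
            with self._lock:                     # short validate-and-commit step
                if self.version == seen_version:  # no other commit happened
                    self.balance = new_balance
                    self.version += 1
                    return
            # Conflict detected: discard the speculative result and retry.


def run(method_name, n_threads=4, n_ops=1000):
    """Apply n_ops deposits of 1 from each thread and return the final balance."""
    account = Account()

    def worker():
        for _ in range(n_ops):
            getattr(account, method_name)(1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return account.balance


print(run("deposit_locked"), run("deposit_optimistic"))
```

Both variants end with the same balance. The difference is where the waiting happens: the locked version serializes every update, while the optimistic version only pays for a retry when two updates actually collide.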

Single-thread Performance
Still most attractive source of performance
  Speeds up parallel and serial phases
  Can use it to buy back power
Must focus on power consumption
  Performance benefit ≥ Power cost

Single-thread Performance
Hardware accelerators and circuits
  Domain-specific [UW MESA]
  Reconfigurable [UW Compton]
  VLSI and design automation [UW WISCAD, Kursun]
Increasing frequency
  Seems prohibitive: clock power
  Clever clocking schemes can help [UW Pharm]
Increasing instruction-level parallelism [UW Multiscalar, UW Pharm, UW Smith]
  Without blowing power budget
  Alternatively, reduce power for same performance

Chip Multiprocessor Overheads
Core Interconnect [UW Pharm]
  80% of chip power [Borkar, ISLPED ‘07 panel]
  Need fundamentally different approach
  Revisit circuit switching
Cache coherence [UW Multifacet, Pharm]
  Match workload behavior
  Optimize for on-chip communication

Chip Multiprocessor Overheads
Shared caches [UW Multifacet, Multiscalar, Smith]
  On-chip memory can be shared
  Optimize replacement, replication
Fairness [UW Smith]
  Maintain performance isolation
  Share resources fairly (memory, caches)

Research Groups @ UW
Group         Faculty                 URL
Compton       Kati Compton            www.ece.wisc.edu/~kati
Kursun        Volkan Kursun           www.cae.wisc.edu/~kursun
MESA          Mike Schulte            mesa.ece.wisc.edu
Multifacet    Mark Hill, David Wood   http://www.cs.wisc.edu/multifacet
Multiscalar   Guri Sohi               www.cs.wisc.edu/~mscalar
PHARM         Mikko Lipasti           www.ece.wisc.edu/~pharm
Smith         James Smith             www.engr.wisc.edu/ece/faculty/smith_james.html
Vertical      Karu Sankaralingam      www.cs.wisc.edu/vertical/wiki
WISCAD        Azadeh Davoodi          www.cae.wisc.edu/~adavoodi

Conclusion
Forecast
  Limited multicore (≤4) is here to stay
  Manycore (>4) will find its place
Hardware Challenges
  Single-thread performance and power
  Multicore overhead
Software Challenges
  Finding application parallelism
  Creating correct parallel programs
  Creating scalable parallel programs

Questions?
http://www.ece.wisc.edu/~pharm