21st Century
Computer Architecture
A community white paper
http://cra.org/ccc/docs/init/21stcenturyarchitecturewhitepaper.pdf
Technion, Haifa Israel, June 2013
• Information & Commun. Tech’s Impact
• Semiconductor Technology’s Challenges
• Computer Architecture’s Future
• Example: Bypassing Paged Virtual Memory
White Paper Participants
Sarita Adve, U Illinois *
David H. Albonesi, Cornell U
David Brooks, Harvard U
Luis Ceze, U Washington *
Sandhya Dwarkadas, U Rochester
Joel Emer, Intel/MIT
Babak Falsafi, EPFL
Antonio Gonzalez, Intel/UPC
Mark D. Hill, U Wisconsin *,**
Mary Jane Irwin, Penn State U *
David Kaeli, Northeastern U *
Stephen W. Keckler, NVIDIA/U Texas
Christos Kozyrakis, Stanford U
Alvin Lebeck, Duke U
Milo Martin, U Pennsylvania
José F. Martínez, Cornell U
Margaret Martonosi, Princeton U *
Kunle Olukotun, Stanford U
Mark Oskin, U Washington
Li-Shiuan Peh, M.I.T.
Milos Prvulovic, Georgia Tech
Steven K. Reinhardt, AMD
Michael Schulte, AMD/U Wisconsin
Simha Sethumadhavan, Columbia U
Guri Sohi, U Wisconsin
Daniel Sorin, Duke U
Josep Torrellas, U Illinois *
Thomas F. Wenisch, U Michigan *
David Wood, U Wisconsin *
Katherine Yelick, UC Berkeley/LBNL *
“*” contributed prose; “**” effort coordinator
Thanks to CCC, Erwin Gianchandani & Ed Lazowska for
guidance and Jim Larus & Jeannette Wing for feedback
2
20th Century ICT Set Up
• Information & Communication Technology (ICT)
Has Changed Our World
o <long list omitted>
• Required innovations in algorithms, applications,
programming languages, … , & system software
• Key (invisible) enablers of (cost-)performance gains
o Semiconductor technology (“Moore’s Law”)
o Computer architecture (~80x per Danowitz et al.)
3
Enablers: Technology + Architecture
[Figure with series labeled Architecture and Technology. Source: Danowitz et al., CACM 04/2012, Figure 1]
4
21st Century Promise
• ICT Promises Much More
o Data-centric personalized health care
o Computation-driven scientific discovery
o Human network analysis
o Much more: known & unknown
• Characterized by
o Big Data
o Always Online
o Secure/Private
o …
Whither enablers of future (cost-)performance gains?
5
Technology’s Challenges 1/2
Late 20th Century | The New Reality
Moore’s Law: 2× transistors/chip | Transistor count still 2× BUT…
Dennard Scaling: ~constant power/chip | Gone. Can’t repeatedly double power/chip
6
Classic CMOS Dennard Scaling:
the Science behind Moore’s Law
Scaling:
  Voltage: V/a
  Oxide: tOX/a
Results:
  Power/ckt: 1/a^2
  Power Density: ~Constant
Source: Future of Computing Performance: Game Over or Next Level?, National Academy Press, 2011 (Finding 2)
National Research Council (NRC) – Computer Science and Telecommunications Board (CSTB.org)
7
Post-classic CMOS Dennard Scaling
Post Dennard CMOS Scaling Rule
Scaling:
  Voltage: V/a → V
  Oxide: tOX/a
Results:
  Power/ckt: 1/a^2 → 1
  Power Density: ~Constant → a^2
TODO: Chips w/ higher power (no), smaller (), dark silicon (), or other (?)
National Research Council (NRC) – Computer Science and Telecommunications Board (CSTB.org)
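The classic and post-Dennard results can be checked with a little arithmetic. The following is a minimal sketch, assuming the standard dynamic-power model P = C·V²·f per circuit, with capacitance shrinking by 1/a, frequency rising by a, and circuit area shrinking by 1/a² per generation. The model and the function name `scale` are illustrative assumptions, not taken from the slides.

```python
from fractions import Fraction

def scale(a, voltage_scales):
    """Return (power per circuit, power density) relative to the previous
    generation for a linear shrink factor a > 1.

    Assumes dynamic power P = C * V^2 * f, with C -> C/a, f -> a*f,
    and circuit area -> area/a^2 (illustrative assumptions).
    """
    a = Fraction(a)
    c = 1 / a                                     # capacitance shrinks with dimensions
    f = a                                         # frequency rises with the shrink
    v = 1 / a if voltage_scales else Fraction(1)  # V/a (classic) or V (post-Dennard)
    power = c * v**2 * f                          # power per circuit
    area = 1 / a**2                               # area per circuit
    return power, power / area                    # (power/ckt, power density)

classic = scale(2, voltage_scales=True)   # power/ckt = 1/a^2, density constant
post = scale(2, voltage_scales=False)     # power/ckt = 1, density = a^2
print(classic, post)
```

With a = 2, the classic rule gives one quarter of the power per circuit at constant power density, while holding voltage fixed leaves power per circuit unchanged and quadruples power density.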
8
Technology’s Challenges 2/2
Late 20th Century | The New Reality
Moore’s Law: 2× transistors/chip | Transistor count still 2× BUT…
Dennard Scaling: ~constant power/chip | Gone. Can’t repeatedly double power/chip
Modest (hidden) transistor unreliability | Increasing transistor unreliability can’t be hidden
Focus on computation over communication | Communication (energy) more expensive than computation
1-time costs amortized via mass market | One-time cost much worse & want specialized platforms
How should architects step up as technology falters?
9
21st Century Comp Architecture
20th Century | 21st Century
Single-chip in generic computer | Architecture as Infrastructure: Spanning sensors to clouds. Performance plus security, privacy, availability, programmability, …
Performance via invisible instr.-level parallelism | Energy First: ● Parallelism ● Specialization ● Cross-layer design
Predictable technologies: CMOS, DRAM, & disks | New technologies (non-volatile memory, near-threshold, 3D, photonics, …). Rethink: memory & storage, reliability, communication
Cross-Cutting: Break current layers with new interfaces
10
What Research Exactly?
• Research areas in white paper (& backup slides)
1. Architecture as Infrastructure: Spanning Sensors to Clouds
2. Energy First
3. Technology Impacts on Architecture
4. Cross-Cutting Issues & Interfaces
• Much more research developed by future PIs!
• E.g.: Efficient Virtual Memory for Big Memory Servers
o Basu, Gandhi, Chang, Hill, & Swift [ISCA 2013]
o Big Memory: graph500, memcached, databases
o Self-manage most memory (e.g., bufferpool)
12
Execution Time Overhead: TLB Misses
[Figure: percentage of execution cycles spent on servicing DTLB misses (lower is better) for graph500, memcached, MySQL, NPB:BT, NPB:CG, and GUPS; legend: 4KB, 2MB, 1GB, Direct Segment; y-axis 0 to 35, with off-scale bars labeled 51.1, 83.1, and 51.3]
1. Significant waste
2. Larger memory?
3. Byte-addr NVM?
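One way to read the page-size comparison in the chart is through TLB reach, the amount of memory a TLB can map at once (entries × page size). The sketch below computes reach for the three page sizes in the legend. The entry count is a hypothetical value chosen for illustration, not a figure from the slides, and real TLBs often have different entry counts per page size.

```python
def tlb_reach(entries, page_size):
    """Bytes of memory a TLB can map at once: entries * page size."""
    return entries * page_size

KB, MB, GB = 1 << 10, 1 << 20, 1 << 30
ENTRIES = 64  # hypothetical TLB entry count, not taken from the slides

reach = {name: tlb_reach(ENTRIES, size)
         for name, size in [("4KB", 4 * KB), ("2MB", 2 * MB), ("1GB", 1 * GB)]}

for name, r in reach.items():
    print(f"{name} pages: reach = {r / MB:g} MB")
```

Under this assumption, 4KB pages cover only 256 KB, far less than the working sets of the big-memory workloads named on the slide.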
13
Hardware: Direct Segment
1. Conventional Paging
2. Direct Segment
[Figure: VA-to-PA translation with BASE, LIMIT, and OFFSET registers]
Why Direct Segment?
• Matches Big Memory Workload needs
• NO Paging => NO TLB Miss
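The translation path on this slide can be sketched in a few lines. This is a minimal model, assuming that a virtual address inside [BASE, LIMIT) is translated as PA = VA + OFFSET and that all other addresses fall back to conventional paging. The function name `translate`, the dict-based page table, and the register values are hypothetical and chosen only for illustration.

```python
PAGE_SIZE = 4096  # assumed 4KB base pages for the paged fallback

def translate(va, base, limit, offset, page_table):
    """Translate a virtual address to a physical address.

    Addresses in [base, limit) take the direct-segment path, PA = VA + OFFSET,
    with no TLB or page-table lookup. Everything else falls back to paging,
    modeled here as a dict from virtual page number to physical frame number.
    """
    if base <= va < limit:
        return va + offset
    vpn, page_off = divmod(va, PAGE_SIZE)
    if vpn not in page_table:
        raise KeyError(f"page fault at {va:#x}")
    return page_table[vpn] * PAGE_SIZE + page_off

# Hypothetical register values and page table, for illustration only.
BASE, LIMIT, OFFSET = 0x1000_0000, 0x5000_0000, 0x2000_0000
page_table = {0x1: 0x80}

print(hex(translate(0x1234_5678, BASE, LIMIT, OFFSET, page_table)))  # direct segment
print(hex(translate(0x1ABC, BASE, LIMIT, OFFSET, page_table)))       # paged fallback
```

The direct-segment path is a bounds check and an addition, which is why addresses it covers cannot take a TLB miss.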
Execution Time Overhead: TLB Misses
[Figure: percentage of execution cycles spent on servicing DTLB misses for graph500, memcached, MySQL, NPB:BT, NPB:CG, and GUPS; legend: 4KB, 2MB, 1GB, Direct Segment; y-axis 0 to 35, with off-scale bars labeled 51.1, 83.1, and 51.3; Direct Segment bars labeled 0.01, ~0, 0.48, ~0, 0.01, and 0.49]
92-100% TLB “misses” to direct segment
Requires: Both small SW + small HW changes
15
Back Up Slides
• Detailed research areas in white paper
1. Architecture as Infrastructure: Spanning Sensors to Clouds
2. Energy First
3. Technology Impacts on Architecture
4. Cross-Cutting Issues & Interfaces
http://cra.org/ccc/docs/init/21stcenturyarchitecturewhitepaper.pdf
• Findings on National Academy “Game Over” Study
• Glimpse at DARPA/ISAT Workshop “Advancing
Computer Systems without Technology Progress”
19
1. Architecture as Infrastructure:
Spanning Sensors to Clouds
• Beyond a chip in a generic computer
• To pillar of 21st century societal infrastructure.
o Computation in context (sensor, mobile, …, data center)
o Systems often large & distributed
o Communication issues can dominate computation
o Goals beyond performance (battery life, form factor)
• Opportunities (not exhaustive)
o Reliable sensors harvesting (intermittent) energy
o Smart phones to Star Trek’s medical “tricorder”
o Cloud infrastructure suitable for both “Big Data” streams
& low-latency quality-of-service with stragglers
o Analysis & design tools that scale
20
2. Energy First
• Beyond single-core performance computer
• To (cost-)performance per watt/joule
• Energy across the layers
o Circuit/technology (near-threshold CMOS, 3D stacking)
o Architecture (reducing unnecessary data movement)
o Software (communication-reducing algorithms)
• Parallelism to save energy
o Vast (fine-grained) homogeneous & heterogeneous
o Improved SW stack
o Applications focus (beyond graphics processing units)
• Specialization for performance & energy efficiency
o Abstractions for specialization (reducing 1-time cost)
o Energy-efficient memory hierarchies
o Reconfigurable logic structures
21
3. Technology Impacts on Architecture
• Beyond CMOS, DRAM, & Disks of last 3+ decades to
• Using replacement circuit technologies
o Sub/near-threshold CMOS, QWFETs, TFETs, and QCAs
• Non-volatile storage
o Beyond flash memory to STT-RAM, PCRAM, & memristor
• 3D die stacking & interposers
o logic, cache, small main memory
• Photonic interconnects
o Inter- & even intra-chip
• Design automation
o from circuit-design w/ new technologies to
o pre-RTL functional, performance, power, area modeling of
heterogeneous chips & systems
22
4. Cross-Cutting Issues & Interfaces
• Beyond performance w/ stable interfaces to
• New design goals (for pillar of societal infrastructure)
o Verifiability (bugs kill)
o Reliability (“dependability” computing base?)
o Security/Privacy (w/ non-volatile memory?)
o Programmability (time to correct-performant solution)
• Better Interfaces
o High-level information (quality of service, provenance)
o Parallelism ((in)dependence, (lack of) side-effects)
o Orchestrating communication ((recursive) locality)
o Security/Reliability (fine-grain protection)
23
Executive summary (Added to National Academy Slides)
• Highlights of National Academy Findings
o (F1) Computer hardware has transitioned to multicore
o (F2) Dennard scaling of CMOS has broken down
o (F3) Parallelism and locality must be exploited by software
o (F4) Chip power will soon limit multicore scaling
• Eight recommendations from algorithms to education
• We know all of this at some level, BUT:
o Are we all acting on this knowledge or hoping for business as usual?
o Thinking beyond next paper to where future value will be created?
– Questions Asked but Not Answered Embedded in NA Talk
– Briefly Close with Kübler-Ross Stages of Grief:
Denial → … → Acceptance
Source: Future of Computing Performance: Game Over or Next Level?,
National Academy Press, 2011
Mark Hill talk (http://www.cs.wisc.edu/~markhill/NRCgameover_wisconsin_2011_05.pptx)
The Graph
[Figure: System Capability (log) by decade, 80s through 50s, with a “Fallow Period” marked]
Source: Advancing Computer Systems without Technology Progress,
ISAT Outbrief (http://www.cs.wisc.edu/~markhill/papers/isat2012_ACSWTP.pdf)
Mark D. Hill and Christos Kozyrakis, DARPA/ISAT Workshop, March 26-27, 2012.
Approved for Public Release, Distribution Unlimited
The views expressed are those of the author and do not reflect the official policy or position of the
Department of Defense or the U.S. Government.
25
Surprise 1 of 2
• Can Harvest in the “Fallow” Period!
• 2 decades of Moore’s Law-like perf./energy gains
• Wring out inefficiencies used to harvest Moore’s Law
HW/SW Specialization/Co-design (3-100x)
Reduce SW Bloat (2-1000x)
Approximate Computing (2-500x)
--------------------------------------------------
~1000x = 2 decades of Moore’s Law!
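The "2 decades" equivalence is a quick back-of-the-envelope calculation. A minimal sketch, assuming a Moore's Law cadence of one doubling every two years (the cadence is an assumption here and is not stated on the slide):

```python
import math

YEARS_PER_DOUBLING = 2   # assumed Moore's Law cadence, not stated on the slide
gain = 1000              # the slide's ~1000x combined estimate

doublings = math.log2(gain)              # about 10 doublings
years = doublings * YEARS_PER_DOUBLING   # about 20 years
print(f"{doublings:.2f} doublings ~ {years:.1f} years")
```

Since 2^10 = 1024, a ~1000x gain corresponds to roughly ten doublings, or about two decades at that cadence.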
26
“Surprise” 2 of 2
• Systems must exploit LOCALITY-AWARE parallelism
• Parallelism Necessary, but not Sufficient
• As communication’s energy costs dominate
• Shouldn’t be a surprise, but many are in denial
• Both surprises hard, requiring “vertical cut” thru SW/HW
27