IBM Research Division
The 50B Transistor Challenge
Mikko Lipasti
Department of Electrical and Computer Engineering
University of Wisconsin - Madison
IBM T.J. Watson Research Center
July 22 and 23, 2008
July 22, 2008
© 2007 IBM Corporation
IBM Research Division
50B Transistors on a Chip?
▪ History
– 1997 IEEE Computer special issue: 1B transistors/chip by 2007
• 3 papers advocate a single fast core: CMU, Michigan, Wisconsin
• IRAM: Berkeley
• RAW: MIT
• SMT: Washington
• Multicore: Stanford
▪ 11 years later, 50x more transistors
– We still need faster cores: computation
• Fundamentally constrained by power
– We will get more than one core: communication
• Need efficient interconnects and coherent caches
– We will get lots of on-chip memory
• Need new algorithms and new approaches to use it
(1) What Will We Do With 50B Transistors?
▪ 50B transistors/chip will dramatically alter data centers
▪ E.g., Nokia is moving aggressively into services
– Google, Yahoo, and MSN each provision ~1M servers
– Services must now be provisioned for a 10x larger installed base (phones vs. PCs)
• Witness the recent problems with iPhone/MobileMe
▪ Applications are impossible to anticipate
– YouTube/Facebook/Flickr/Twitter
– Unstructured real-world data
– Organize, search, extract semantic knowledge, mashups, …
▪ Existing and future server apps all benefit
(2) How Will We Design Chips with 50B Transistors?
▪ Three things that processors need to be good at:
– Computation
– Communication
– Storage/Memory
▪ Focus on the cost and nature of computation
▪ Focus on the cost of communication
▪ Shift emphasis to memory
Cost of Computation
▪ Less than 10% of energy is spent on useful work
– EPI (energy per instruction) overhead has gotten out of hand
– Need to rethink operand delivery [ICCD’07], queues [ISLPED’07], caches, register files, control, …
▪ Exploit program attributes
– Solve hard problems via elimination
• Macro-ops: no single-cycle operations [MICRO’03, HPCA’06]
– Do the hard parts with narrow values [JILP’07]
▪ Eliminate redundancy and excessive pipelining
– Clever clock gating [ISLPED’06, ICCD’07]
– Remove renaming, register file, clocked scheduler, pipelines [submitted]
▪ Goal: reduce EPI by 10x at fixed process technology and MIPS
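EPI is average power divided by instruction throughput, so holding MIPS fixed means a 10x EPI reduction is also a 10x power reduction. A minimal worked sketch follows; the 100 W and 3,000 MIPS baseline figures are hypothetical, chosen only to make the arithmetic concrete. Since the slide puts useful work at under 10% of energy, eliminating overhead alone could in principle cover the 10x target.

```python
def epi_joules(power_watts: float, mips: float) -> float:
    """Energy per instruction: watts divided by instructions per second."""
    return power_watts / (mips * 1e6)


# Hypothetical baseline, for illustration only (not data from the talk).
BASELINE_POWER_W = 100.0
BASELINE_MIPS = 3000.0
REDUCTION = 10.0  # the slide's EPI goal

baseline_epi = epi_joules(BASELINE_POWER_W, BASELINE_MIPS)
target_epi = baseline_epi / REDUCTION

# At fixed MIPS, power is EPI times instruction rate, so it falls by the same factor.
target_power_w = target_epi * BASELINE_MIPS * 1e6

print(f"baseline EPI: {baseline_epi * 1e9:.2f} nJ")  # 33.33 nJ
print(f"target EPI:   {target_epi * 1e9:.2f} nJ")    # 3.33 nJ
print(f"target power: {target_power_w:.1f} W")       # 10.0 W
```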
Cost of Communication
▪ Reduce coherence overhead and speculation
– Region coherence [ISCA’05, ASPLOS’06, HPCA’08]
▪ Exploit locality of communication patterns
– Switched circuits [CALetters’07, NOCS’08]
– On-chip multicasting [ISCA’08]
– Multicast coherence [submitted]
▪ New technologies
– Nanophotonic rings [HP Labs collaboration]
– Massive bandwidth, speed-of-light latency
– Lots of interesting problems to solve
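Region coherence tracks state for coarse-grained regions spanning many cache lines, so a miss to a region that no other processor is caching can avoid a broadcast. The sketch below is a toy, idealized model under loudly stated assumptions: hypothetical 1 KB regions and 64 B lines, no evictions, and global visibility of every core's state, which real region-tracking hardware does not have. It only illustrates which misses could skip the broadcast; it is not the published protocol, and `ToyRegionTracker` is a hypothetical name.

```python
from collections import defaultdict

# Assumed parameters, for illustration only.
REGION_BYTES = 1024  # hypothetical region size (a region spans many lines)
LINE_BYTES = 64      # hypothetical cache-line size


def region_of(addr: int) -> int:
    """Map a byte address to its region number."""
    return addr // REGION_BYTES


class ToyRegionTracker:
    """Idealized model: per core, count cached lines in each region.

    Real hardware cannot peek at other cores' state; it learns a region's
    status with one broadcast and caches it. This toy consults all cores
    directly to show which requests could skip the broadcast.
    """

    def __init__(self, num_cores: int) -> None:
        self.region_lines = [defaultdict(int) for _ in range(num_cores)]
        self.cached_lines = [set() for _ in range(num_cores)]

    def needs_broadcast(self, core: int, addr: int) -> bool:
        """A miss must be broadcast only if another core holds lines in the region."""
        region = region_of(addr)
        return any(
            counts.get(region, 0) > 0
            for other, counts in enumerate(self.region_lines)
            if other != core
        )

    def miss(self, core: int, addr: int) -> bool:
        """Record a line fill for `core`; return True if a broadcast was needed."""
        broadcast = self.needs_broadcast(core, addr)
        line = addr // LINE_BYTES
        if line not in self.cached_lines[core]:
            self.cached_lines[core].add(line)
            self.region_lines[core][region_of(addr)] += 1
        return broadcast


tracker = ToyRegionTracker(num_cores=2)
results = [
    tracker.miss(0, 0x0000),  # region 0, no other holder
    tracker.miss(0, 0x0040),  # region 0 again, still private to core 0
    tracker.miss(1, 0x0080),  # region 0, but core 0 holds lines there
    tracker.miss(1, 0x1000),  # region 4, private to core 1
]
print(results)  # [False, False, True, False]
```

Only the third miss touches a region another core is caching, so in this toy three of the four requests could be handled without a broadcast.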
Emphasis on Memory
▪ In future processes, memory will be easier than logic
– Reliability, variability: well-known solutions (ECC, sparing)
– Interesting new technologies (PCRAM, etc.)
– Not caches: diminishing returns
▪ Return to more regular, “memory-like” devices and logic?
– Gate array, LUT, PLA
▪ Most of the 50B transistors must not be switching
– Remembering is cheaper than computing
• Revisit value locality/reuse/memoization?
– New search algorithms:
• TCAM accelerator [ICCD’08]: logic in memory, but not IRAM!
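A TCAM (ternary content-addressable memory) stores patterns whose bits are 0, 1, or don't-care, and compares a search key against every stored row at once. The ICCD’08 accelerator's design is not described on this slide; the sketch below only emulates ternary-match semantics in software, with a sequential loop standing in for the parallel compare and priority encoder. The helper names `parse_ternary` and `tcam_lookup` are hypothetical.

```python
from typing import List, Optional, Tuple

# A ternary entry: (value, care_mask). Bits where care_mask is 0 are "don't care" (X).
TernaryEntry = Tuple[int, int]


def parse_ternary(pattern: str) -> TernaryEntry:
    """Convert a pattern such as '10X1' into (value, care_mask)."""
    value = 0
    care_mask = 0
    for ch in pattern:
        value <<= 1
        care_mask <<= 1
        if ch == "1":
            value |= 1
            care_mask |= 1
        elif ch == "0":
            care_mask |= 1
        elif ch not in "xX":
            raise ValueError(f"invalid ternary digit: {ch!r}")
    return value, care_mask


def tcam_lookup(entries: List[TernaryEntry], key: int) -> Optional[int]:
    """Return the index of the first matching entry, or None.

    A hardware TCAM compares the key against every row in parallel and a
    priority encoder picks the lowest matching index; this loop emulates that.
    """
    for index, (value, care_mask) in enumerate(entries):
        if (key & care_mask) == (value & care_mask):
            return index
    return None


table = [parse_ternary(p) for p in ("1010", "10XX", "XXXX")]
print(tcam_lookup(table, 0b1010))  # 0 (exact match)
print(tcam_lookup(table, 0b1011))  # 1 (matches 10XX)
print(tcam_lookup(table, 0b0001))  # 2 (catch-all row)
```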
Unstructured Real-World Data
▪ The Internet is exploding with data
– Text
– Semantic knowledge
– Photo, video, audio
▪ It is all in digital form, but all we can do is view and copy it
▪ Algorithms for analysis range from poor to nonexistent
– Machine learning?
▪ Why not learn from nature?
Brains
▪ Human brain ≠ Von Neumann machine
– Face recognition: <500 ms
– Neurons are slow:
• Critical path is a handful of “gates”
– Fundamentally different computational model
▪ Made of shoddy, unreliable parts
“…neurons are noisy, unreliable devices, … the nervous system averages over many cells to compensate for these shoddy components.”
-Christof Koch
▪ We can build it. We have the technology.
Dec. 3, 2007
MICRO-40 Panel: Computing Beyond Von Neumann
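Koch's point is statistical: averaging the outputs of N independent noisy units shrinks the noise of the estimate roughly as 1/√N. A toy Monte Carlo sketch makes the scaling concrete, assuming every unit sees the same signal plus independent Gaussian noise (a simplification; biological noise is neither Gaussian nor fully independent). The function names and parameters are illustrative.

```python
import random
import statistics


def population_estimate(signal: float, n_units: int, noise_sd: float,
                        rng: random.Random) -> float:
    """Average n_units noisy readings of the same signal."""
    return statistics.fmean(signal + rng.gauss(0.0, noise_sd) for _ in range(n_units))


def estimate_spread(n_units: int, trials: int = 2000, signal: float = 1.0,
                    noise_sd: float = 1.0, seed: int = 0) -> float:
    """Standard deviation of the population estimate across many trials."""
    rng = random.Random(seed)
    estimates = [population_estimate(signal, n_units, noise_sd, rng) for _ in range(trials)]
    return statistics.pstdev(estimates)


for n in (1, 4, 16, 64):
    # Theory for independent Gaussian noise: noise_sd / sqrt(n).
    print(f"{n:3d} units: measured spread {estimate_spread(n):.3f}, "
          f"theory {1.0 / n ** 0.5:.3f}")
```

Going from 1 to 64 unreliable units cuts the spread by about 8x, which is the sense in which averaging compensates for shoddy components.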
Brains (2)
▪ Human neocortex:
– ~20B neurons, ~200T synapses
– Structurally homogeneous
– Hypothesis: runs a common algorithm
▪ Apply Architecture 101?
– Abstraction layers
– Hierarchy and replication
– Simulation/analysis/synthesis
– Let’s build brains!
– Massively parallel, fault-tolerant hardware
▪ Best news: no need for parallel programming
– Train vs. program
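“Train vs. program” means behavior comes from examples rather than hand-written logic. As a deliberately tiny illustration (a classic perceptron, not a model of the common neocortical algorithm hypothesized above), the same training code yields an AND gate or an OR gate depending only on the data it sees. Names such as `train_perceptron` are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[int, int], int]


def predict(weights: Sequence[float], bias: float, x: Tuple[int, int]) -> int:
    """Threshold unit: fire (1) if the weighted sum exceeds zero."""
    return 1 if weights[0] * x[0] + weights[1] * x[1] + bias > 0 else 0


def train_perceptron(samples: List[Sample], epochs: int = 20,
                     lr: float = 1.0) -> Tuple[List[float], float]:
    """Classic perceptron learning rule: nudge weights toward each misclassified target."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights[0] += lr * error * x[0]
            weights[1] += lr * error * x[1]
            bias += lr * error
    return weights, bias


AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

for name, data in (("AND", AND_DATA), ("OR", OR_DATA)):
    w, b = train_perceptron(data)
    print(name, [predict(w, b, x) for x, _ in data])
# AND [0, 0, 0, 1]
# OR [0, 1, 1, 1]
```

A single perceptron cannot learn XOR, which is not linearly separable; useful trained systems combine many such units.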
Summary
▪ Computation:
– Reduce cost (EPI) by 10x
– New algorithms
▪ Communication:
– Streamline coherence protocols, interconnects
– Exploit new technologies
▪ Storage/Memory:
– Reliability/variability
– Logic in memory/new algorithms
▪ Brain computing for unstructured real-world data
Questions?
http://www.ece.wisc.edu/~pharm