IBM Presentations: Blue Pearl DeLuxe template
IBM Research Division
The 50B Transistor Challenge
Mikko Lipasti
Department of Electrical and Computer Engineering
University of Wisconsin - Madison
IBM T.J. Watson Research Center
July 22 and 23, 2008
July 22, 2008
© 2007 IBM Corporation
50B Transistors on a Chip?
History
– 1997 IEEE Computer Special Issue, 1B T/chip by 2007
• 3 papers advocate single fast core – CMU, Michigan, Wisconsin
• IRAM – Berkeley
• RAW – MIT
• SMT – Washington
• Multicore – Stanford
11 years later, 50x more transistors
– We still need faster cores: computation
• Fundamentally constrained by power
– Will get more than one core: communication
• Need efficient interconnects and coherent caches
– Will get lots of on-chip memory
• Need to think about new algorithms and new approaches to use it
(1) What Will We Do With 50B Transistors?
50B transistors/chip dramatically alters data centers
E.g. Nokia moving aggressively into services
– Google, Yahoo, MSN each provision ~1M servers
– Now provision for 10x installed base (phone vs. PC)
• Witness recent problems with iPhone/MobileMe
Impossible to anticipate applications
– YouTube/Facebook/Flickr/Twitter
– Unstructured real world data
– Organize, search, extract semantic knowledge, mashups, …
Existing and future server apps all benefit
(2) How Will We Design Chips with 50B Transistors?
Three things that processors need to be good at:
– Computation
– Communication
– Storage/Memory
Focus on cost and nature of computation
Focus on cost of communication
Shift emphasis to memory
Cost of Computation
Less than 10% of energy spent on useful work
– EPI overhead has gotten out of hand
– Need to rethink operand delivery [ICCD’07], queues [ISLPED’07], caches, register files, control, …
Exploit program attributes
– Solve hard problems via elimination
• Macro-ops: no single-cycle operations [MICRO’03, HPCA’06]
– Do the hard parts with narrow values [JILP’07]
Eliminate redundancy, excessive pipelines
– Clever clock gating [ISLPED’06, ICCD’07]
– Remove renaming, register file, clocked scheduler, pipelines [submitted]
Goal: reduce EPI by 10x at fixed process technology and MIPS
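The goal above can be read through the identity power = energy per instruction × instructions per second: at fixed MIPS, a 10x lower EPI means 10x lower power for the same throughput. A minimal sketch of that arithmetic follows; the function name and the baseline figures (10 nJ per instruction, 10,000 MIPS) are hypothetical values chosen for illustration and do not come from the talk.

```python
def power_watts(epi_nanojoules: float, mips: float) -> float:
    """Average power = energy per instruction x instructions per second."""
    joules_per_instruction = epi_nanojoules * 1e-9
    instructions_per_second = mips * 1e6
    return joules_per_instruction * instructions_per_second


# Hypothetical baseline core: 10 nJ per instruction at 10,000 MIPS.
baseline = power_watts(epi_nanojoules=10.0, mips=10_000)

# Same MIPS, EPI reduced by 10x (the stated goal).
target = power_watts(epi_nanojoules=1.0, mips=10_000)

print(f"baseline: {baseline:.1f} W, target: {target:.1f} W")
print(f"power reduction: {baseline / target:.1f}x")
```

Under a fixed chip power budget, the same ratio is roughly the number of additional cores of equal performance that could be kept active.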
Cost of Communication
Reduce coherence overhead and speculation
– Region coherence [ISCA’05, ASPLOS’06, HPCA’08]
Exploit locality of communication patterns
– Switched circuits [CALetters’07, NOCS’08]
– On-chip multicasting [ISCA’08]
– Multicast coherence [submitted]
New technologies
– Nanophotonic rings [HP Labs collaboration]
– Massive bandwidth, speed-of-light latency
– Lots of interesting problems to solve
Emphasis on Memory
In future processes, memory will be easier than logic
– Reliability, variability: well-known solutions (ECC, sparing)
– Interesting new technologies (PCRAM, etc.)
– Not caches: diminishing returns
Return to more regular, “memory-like” devices and logic?
– Gate array, LUT, PLA
Majority of 50B T must not be switching
– Remembering is cheaper than computing
• Revisit value locality/reuse/memoization?
– New search algorithms:
• TCAM accelerator [ICCD’08]: logic in memory, but not IRAM!
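The "remembering is cheaper than computing" point can be illustrated in software with memoization: when the same operands recur (value locality), a stored result replaces a recomputation. The sketch below uses Python's standard `functools.lru_cache`; the `expensive` function and its inputs are hypothetical stand-ins, and this is a software analogy, not the hardware mechanism the talk has in mind.

```python
from functools import lru_cache

call_count = 0  # counts how often the expensive body actually runs


@lru_cache(maxsize=None)
def expensive(x: int) -> int:
    """Stand-in for a costly computation (hypothetical workload)."""
    global call_count
    call_count += 1
    return sum(i * i for i in range(x))


# Repeated operands model value locality: only two distinct inputs occur.
inputs = [1000, 2000, 1000, 1000, 2000]
results = [expensive(x) for x in inputs]

info = expensive.cache_info()
print(f"calls computed: {call_count}, served from memory: {info.hits}")
```

Five requests trigger only two real computations; the other three are answered from the cache, trading storage for switching activity.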
Unstructured Real-World Data
Internet is exploding with data
– Text
– Semantic knowledge
– Photo, video, audio
It is all in digital form but all we can do is view and copy it
Algorithms for analysis range from poor to nonexistent
– Machine learning?
Why not learn from nature?
Brains
Human brain Von Neumann machine
– Face recognition: <500ms
– Neurons are slow
– Critical path is a handful of “gates”
Fundamentally different computational model
Made of shoddy, unreliable parts
“…neurons are noisy, unreliable devices, … the nervous system averages over many cells to compensate for these shoddy components.”
- Christof Koch
We can build it. We have the technology.
Dec. 3, 2007
MICRO-40 Panel: Computing Beyond Von Neumann
Brains (2)
Human neocortex:
– ~20B neurons, ~200T synapses
– Structurally homogeneous
– Hypothesis: runs common algorithm
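For a sense of scale, the neocortex figures above can be set against the 50B-transistor budget from the talk's title. The sketch below only divides the numbers quoted on the slides; the comparison is a rough order-of-magnitude exercise and assumes nothing about how many transistors a neuron or synapse would actually take to implement.

```python
NEURONS = 20 * 10**9        # ~20B neurons (from the slide)
SYNAPSES = 200 * 10**12     # ~200T synapses (from the slide)
TRANSISTORS = 50 * 10**9    # 50B transistors per chip (talk premise)

# Average fan-in per neuron implied by the slide's figures.
synapses_per_neuron = SYNAPSES // NEURONS

# How many synapses would correspond to each transistor on one chip.
synapses_per_transistor = SYNAPSES // TRANSISTORS

print(f"synapses per neuron: {synapses_per_neuron:,}")
print(f"synapses per transistor: {synapses_per_transistor:,}")
```

The transistor count is of the same order as the neuron count, while the synapse count exceeds it by more than three orders of magnitude, so connectivity and storage dominate the scale of the problem.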
Apply architecture 101?
– Abstraction layers
– Hierarchy and replication
– Simulation/analysis/synthesis
– Let’s Build Brains!
– Massively parallel fault-tolerant hardware
Best news: no need for parallel programming
– Train vs. program
Summary
Computation
– Reduce cost (EPI) by 10x
– New algorithms
Communication
– Streamline coherence protocols, interconnects
– Exploit new technologies
Storage/Memory
– Reliability/variability
– Logic in memory/new algorithms
Brain computing for unstructured real-world data
Questions?
http://www.ece.wisc.edu/~pharm