Final slides - NYU Computer Science


NYU
Computer Architecture
Lecture 26
Past and Future
Ralph Grishman
November 2015
IC Scaling for Speed
• Smaller transistors → faster transistors → faster clock
• but also → more heat
• power wall at about 4 GHz, can't run faster (see the power sketch below)
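A rough sense of why higher clocks mean more heat comes from the classic first-order CMOS dynamic-power relation, P ≈ α·C·V²·f. The sketch below is my own illustration with made-up numbers; the only figure taken from the slide is the roughly 4 GHz power wall.

```python
# First-order CMOS dynamic-power relation: P_dyn ~ alpha * C * V^2 * f.
# The activity factor, capacitance, and voltages are invented example values,
# not measurements of any real chip.

def dynamic_power(alpha, c_farads, v_volts, f_hz):
    """Switching power of a CMOS circuit (classic first-order model)."""
    return alpha * c_farads * v_volts ** 2 * f_hz

base = dynamic_power(alpha=0.1, c_farads=1e-9, v_volts=1.0, f_hz=2e9)
# Doubling the clock alone doubles dynamic power; if a higher supply voltage is
# also needed to reach that clock, power grows with V^2 on top of that.
fast = dynamic_power(alpha=0.1, c_farads=1e-9, v_volts=1.2, f_hz=4e9)

print(f"2 GHz @ 1.0 V: {base:.2f} W")   # 0.20 W with these toy numbers
print(f"4 GHz @ 1.2 V: {fast:.2f} W")   # 0.58 W, nearly 3x the heat to dissipate
```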
IC Scaling for Integration
• Current IC production:
• basic dimension = 14 nm
• chips with 5 billion transistors
• Planning for next generation at 7 nm
• will require new transistor geometries and probably
new materials
• hard to predict beyond that
• fabrication becoming very expensive, affordable by only
a few companies
• more cores on chip → network-on-chip issues
Instruction Level Parallelism
• Pipelining yields substantial speed-up
• reduces CPI to close to 1
• Dynamic instruction scheduling produces
modest further gain
• CPI rarely below 0.5 (Text Fig. 4.78)
• limited by difficulty of branch prediction and by cache misses (Text Fig. 4.79, 5.46); a rough CPI model is sketched below
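The limits above can be made concrete with the usual back-of-the-envelope CPI accounting: stall cycles from branch mispredictions and cache misses are added to the ideal CPI. This is a minimal sketch with invented rates and penalties, not the measurements behind the textbook figures cited above.

```python
# Toy model of effective CPI for a superscalar pipeline. All rates and penalties
# below are invented for illustration; the textbook figures cited above give the
# measured numbers for real benchmarks.

def effective_cpi(base_cpi, branch_freq, mispredict_rate, mispredict_penalty,
                  mem_access_per_instr, miss_rate, miss_penalty):
    branch_stalls = branch_freq * mispredict_rate * mispredict_penalty
    memory_stalls = mem_access_per_instr * miss_rate * miss_penalty
    return base_cpi + branch_stalls + memory_stalls

cpi = effective_cpi(
    base_cpi=0.5,              # ideal CPI of a 2-issue dynamically scheduled core
    branch_freq=0.2,           # 20% of instructions are branches
    mispredict_rate=0.05,      # 5% of branches mispredicted
    mispredict_penalty=15,     # cycles lost per misprediction
    mem_access_per_instr=0.4,  # loads/stores per instruction
    miss_rate=0.02,            # cache miss rate
    miss_penalty=100,          # cycles to reach main memory
)
print(f"effective CPI ~ {cpi:.2f}")  # ~1.45 with these toy numbers: stalls dominate
```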
Memory
• Improvements in memory access time don't keep up with improvements in other components
• fast CPU
• slow main memory
• very slow disk
• Problem reduced by multilevel caches
• high hit rates are crucial to performance (see the AMAT sketch below)
• top chips have 4 cache levels
• flash memory as cache for disk
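Why high hit rates are crucial is captured by the standard average-memory-access-time (AMAT) recursion over the cache levels. The latencies and miss rates below are illustrative guesses, not numbers from any particular chip.

```python
# Average memory access time (AMAT) across a multilevel cache hierarchy,
# using the usual recursion: AMAT = hit_time + miss_rate * (AMAT of next level).
# Latencies (in cycles) and miss rates are illustrative guesses only.

def amat(levels, memory_latency):
    """levels: list of (hit_time_cycles, miss_rate) pairs from L1 outward."""
    time = memory_latency
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time
    return time

hierarchy = [(4, 0.05), (12, 0.20), (40, 0.30)]   # L1, L2, L3
print(f"AMAT = {amat(hierarchy, memory_latency=200):.1f} cycles")
# With these numbers: L3 sees 40 + 0.3*200 = 100, L2 sees 12 + 0.2*100 = 32,
# L1 sees 4 + 0.05*32 = 5.6 cycles -- high hit rates keep this near the L1 time.
```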
Communication
• Communication becomes more of a limiting
factor than computation
More exotic ideas:
• quantum computing
• approximate computation
• brain-inspired computation
New Avenues in Computer Architecture
As the clock frequency of silicon chips is leveling off, the computer architecture
community is looking for different solutions to continue application performance
scaling.
1. Specialized logic in the form of accelerators
• Designed to perform special tasks efficiently compared to general-purpose processors (GPPs).
• Specialization leads to better efficiency by trading off flexibility for leaner logic and hardware resources.
2. Exploiting approximate computing
• Today's computers are designed to compute precise results even when it is not necessary.
• Approximate computing trades off accuracy to enable novel optimizations.
http://www.purdue.edu/newsroom/releases/2013/Q4/approximate-computing-improves-efficiency,-savesenergy.html
UNIVERSITY OF WISCONSIN-MADISON
Approximate Computing?
500 / 21 = ?
• Is it greater than 1?
• Is it greater than 30?
(filtering based on precise calculations)
• Is it greater than 23?
(filtering based on approximate calculations)
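One way to read the example above: coarse questions can be answered with a cheap estimate, and the exact division is only paid for when the threshold is close to the true quotient. The sketch below is my own illustration of that filtering idea, not code from the slides.

```python
# Sketch of the filtering idea above (my own illustration, not from the slides).
# Question: is 500 / 21 greater than a threshold t?  For any divisor between
# 16 and 32, dividing by those two powers of two (cheap: just an exponent
# adjustment) brackets the true quotient, so the exact division is only needed
# for borderline thresholds.

def greater_than(n, d, t):
    assert 16 <= d <= 32 and n >= 0
    low, high = n / 32.0, n / 16.0   # bracket: low <= n/d <= high
    if low > t:                       # even the low estimate clears the threshold
        return True
    if high <= t:                     # even the high estimate falls short
        return False
    return n / d > t                  # borderline (e.g. t = 23): do it exactly

for t in (1, 30, 23):
    print(f"500 / 21 > {t}? {greater_than(500, 21, t)}")
```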
Approximate Computing
• Identify cases where error can be tolerated
• video rendering
• image and speech recognition
• web search
• Calculate approximate result
• use fewer bits (a reduced-precision sketch follows this list)
• replace exact calculation with learned model
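As a concrete reading of "use fewer bits", the sketch below quantizes values to 8-bit integers before a dot product, trading a small error for cheaper storage and arithmetic. It is a made-up illustration, not a specific technique from the lecture.

```python
# Illustration of "use fewer bits": quantize values to 8-bit integers before a
# dot product, accepting a small error in exchange for cheaper storage and
# arithmetic. A made-up example, not a specific technique from the lecture.

def quantize(xs, bits=8):
    """Map floats in [-1, 1] to signed integers with the given bit width."""
    scale = 2 ** (bits - 1) - 1
    return [round(x * scale) for x in xs], scale

a = [0.12, -0.53, 0.91, 0.07]
b = [0.44, 0.36, -0.25, 0.81]

exact = sum(x * y for x, y in zip(a, b))

qa, s = quantize(a)
qb, _ = quantize(b)
approx = sum(x * y for x, y in zip(qa, qb)) / (s * s)   # integer multiplies, one rescale

print(f"exact  = {exact:.4f}")
print(f"approx = {approx:.4f}  (error {abs(exact - approx):.4f})")
```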
• "Is 'Good Enough' Computing Good Enough?" By Logan Kugler, Communications of the ACM, Vol. 58 No. 5, Pages 12-14
• DOI: 10.1145/2742482
• http://cacm.acm.org/magazines/2015/5/186012-is-good-enough-computing-good-enough/fulltext
Brain-inspired Computing
• Computation based on artificial neural networks
Multi-layer Perceptron (MLP)
• An Artificial Neural Network (ANN) model.
• Maps sets of input data onto a set of appropriate outputs.
• MLP utilizes a supervised learning technique called backpropagation for training the network.
• Nodes typically apply a non-linear function to a weighted sum of their inputs (a one-node sketch follows).
• The goal of any supervised learning algorithm is to find a function that best maps a set of inputs to its correct output.
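A minimal sketch (my own, not the lecture's code) of what "non-linear weighted sum at each node" means: each node squashes a weighted sum of its inputs through a non-linearity such as the sigmoid.

```python
# One MLP node: a non-linear function applied to a weighted sum of the inputs.
# The weights and inputs here are arbitrary example values.
import math

def node(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias   # weighted sum
    return 1.0 / (1.0 + math.exp(-z))                        # non-linearity (sigmoid)

# One hidden node with three inputs.
print(node([0.5, -0.2, 0.9], weights=[0.4, 0.1, -0.7], bias=0.2))
```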
Multi-layer Perceptron (MLP)
General idea of supervised learning (a training-loop sketch follows these steps):
1. Send the MLP an input pattern, x, from the training set.
2. Get the output from the MLP, y.
3. Compare y with the “right answer”, or target t, to get the
error quantity.
4. Use the error quantity to modify the weights, so next time y
will be closer to t.
5. Repeat with another x from the training set.
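The steps above, written out as a toy training loop for a one-hidden-layer MLP learning XOR with plain gradient descent. The layer sizes, learning rate, and data are my own choices for illustration, not the lecture's example.

```python
# Sketch of steps 1-5 for a tiny one-hidden-layer MLP trained on XOR.
# Sizes, learning rate, and data are toy choices for illustration only.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # input patterns x
T = np.array([[0], [1], [1], [0]], dtype=float)               # targets t (XOR)

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)    # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)    # hidden -> output

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(5000):
    # 1-2. send the inputs through the MLP and collect the outputs y
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # 3. compare y with the target t to get the error quantity
    error = Y - T
    # 4. use the error to modify the weights (backpropagation + gradient step)
    dY = error * Y * (1 - Y)                 # gradient at the output nodes
    dH = (dY @ W2.T) * H * (1 - H)           # error propagated back to hidden nodes
    W2 -= 0.5 * H.T @ dY
    b2 -= 0.5 * dY.sum(axis=0)
    W1 -= 0.5 * X.T @ dH
    b1 -= 0.5 * dH.sum(axis=0)
    # 5. repeat (here: the whole training set again each epoch)

# Outputs should approach [[0], [1], [1], [0]] after training.
print(np.round(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2), 2))
```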
Brain-inspired Computing
• Human brain
• 10 billion neurons
• 100 trillion synapses
• Latest IBM neuromorphic chip
• 4 K neurosynaptic cores
• 1 million programmable neurons
• 256 million adjustable synapses
• 5 billion transistors
• Science, 8 August 2014: Vol. 345 no. 6197, pp. 668-673. DOI: 10.1126/science.1254642
• Report: A million spiking-neuron integrated circuit with a scalable communication network and interface
• http://www.sciencemag.org/content/345/6197/668