Introduction to VHDL - NYU Polytechnic School of Engineering


CS/EE 1012
Computing for the Near and Long Term
Haldun Hadimioglu
Spring 2010
Outline
What has happened ?
Designing chips
Near future directions
Long term directions
Conclusions
Intel Eight-Core Xeon die with 2.3 billion transistors
Cray Jaguar supercomputer, the fastest computer in the world
CS/EE1012 Introduction to Computer Engineering Spring 2010
Page 2
What has Happened ?
Moore’s Law has been holding since the 1960s
It will continue to hold
Perhaps at a slower rate of doubling every three years
www.ieee.org
We will have very small transistors !
Smaller transistors are susceptible to alpha particles !
More transistors will be defective !
Intel’s Past Microprocessor Roadmap
Intel 80-core die: 1.01 TFLOP, 100 million transistors, 62 Watts, each core at 3.16 GHz
Intel Eight-Core Xeon 7500 die with 2.3 billion transistors
Intel eight-core Xeon processor (>26 MB cache), 2010: 2,300,000,000 transistors
Power Density was Increasing Exponentially!
Power was doubling every 4 years
[Figure: power density in Watts/cm² (log scale, 1 to 1000) versus process generation (1.5 µm down to 0.07 µm) for i386, i486, Pentium®, Pentium® Pro, Pentium® II, Pentium® III and Pentium® 4, with hot plate, nuclear reactor and rocket nozzle as reference levels]
Courtesy: “New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies”, Fred Pollack, Intel Corp., Micro32 conference keynote, 1999. Courtesy Avi Mendelson, Intel.
Microprocessor speed
Every two years the speed of microprocessors doubles
The processor speed increases 50% a year !
But, memory speed increases 10 % a year !
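A rough check of these rates, assuming both compound yearly (an assumption; the slide does not say how they were measured):

$$1.5^2 = 2.25 \approx 2, \qquad \left(\frac{1.5}{1.1}\right)^{10} \approx \frac{57.7}{2.59} \approx 22$$

So 50% a year is slightly faster than doubling every two years, and under these rates the processor-to-memory speed gap would widen about 22-fold in ten years.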
Microprocessor speed for an application depends on
Number of operations in the application (lower better)
The quality of the code
Number of parallel operations performed (higher better)
Do more operations in parallel
How fast each operation is performed (higher better)
Because of Moore’s Law : transistors are smaller and wires are shorter
Clock frequency is increased
Until 2005 increasing the clock frequency was the main way to
increase the speed
Power consumption (heat generation) increases with the frequency
The chip has to be cooled by using
A heat sink, a fan, or a liquid
Since 2005, power consumption has changed the way speed is increased
Multi-Core Microprocessors
Since 2005, microprocessor speed increase depends on
Number of operations in the code (the quality of the code)
Number of parallel operations performed
Dual-core microprocessors with reduced frequency consume
less power (generate less heat)
Two/Four/Eight cores perform more operations in parallel
The speed increase continues into the future with more cores on chip
Clock frequency
Number of cores per chip doubles every two years
The memory can become a bottleneck
The memory speed increases 10% a year
More cores increase the demand on the memory
The memory wall problem
Parallel Programming has to be improved dramatically
Parallel programming wall
Designing Chips
We have been using hardware description languages
(HDLs) to design chips
We write an HDL program to design a chip !
Just like we draw a schematic to design a chip
Why an HDL program, why not schematics ?
Real life circuits are too complex to be designed by schematics
There are two popular HDLs today
VHDL
Verilog HDL
Knowing one HDL helps one learn another HDL faster
Why HDLs ?
Software : Statements are executed sequentially
The sequence of statements is significant, since they are
executed in that order
Java, C++, C, Ada, Pascal, Fortran,…
Hardware : Events happen concurrently
A software language cannot be used for describing and
simulating hardware
Concurrent software languages cannot be used either
Because we do not have powerful tools
Programs in C/C++ etc. will be used to design chips in
the future
It is already done for C and C++ programs in limited cases
First they are converted to HDL programs and then to hardware
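A minimal VHDL sketch of the software/hardware contrast above. It is not taken from the slides: the entity name concurrent_demo and its ports are hypothetical. The two assignments in the architecture are concurrent, so swapping their order describes the same circuit, unlike statements in C or Java.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: y = (a and b) or c
entity concurrent_demo is
  port (
    a, b, c : in  std_logic;
    y       : out std_logic
  );
end entity concurrent_demo;

architecture dataflow of concurrent_demo is
  signal t : std_logic;  -- internal wire between the two gates
begin
  -- y reads t although t is assigned on a later line.
  -- Both statements are always active, like two gates wired together.
  y <= t or c;
  t <= a and b;
end architecture dataflow;
```

A simulator re-evaluates each assignment whenever one of its inputs changes, which is why the textual order does not matter here.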
Full Adder VHDL Program
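Data-flow description of the Full Adder circuit:
[Figure: Full Adder block with inputs ki, mi, ci and outputs si, co]
$s_i = \overline{k_i}\,\overline{m_i}\,c_i + \overline{k_i}\,m_i\,\overline{c_i} + k_i\,\overline{m_i}\,\overline{c_i} + k_i\,m_i\,c_i$
$c_o = k_i m_i + k_i c_i + m_i c_i$
[Figure: IBM dual-core BlueGene/L microprocessor die and its chip, © IBM]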
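The VHDL listing for this slide did not survive in the transcript. A minimal data-flow sketch, assuming the standard full adder equations (the sum is 1 for an odd number of 1 inputs, the carry-out is 1 when at least two inputs are 1), could look like this. The entity name full_adder is hypothetical; the port names follow the slide's signal labels.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity name; ports named after the slide's labels.
entity full_adder is
  port (
    ki, mi, ci : in  std_logic;  -- two operand bits and carry-in
    si, co     : out std_logic   -- sum bit and carry-out
  );
end entity full_adder;

architecture dataflow of full_adder is
begin
  -- Sum: the four input combinations with an odd number of 1s.
  si <= (not ki and not mi and ci) or
        (not ki and mi and not ci) or
        (ki and not mi and not ci) or
        (ki and mi and ci);

  -- Carry-out: any two of the three inputs are 1.
  co <= (ki and mi) or (ki and ci) or (mi and ci);
end architecture dataflow;
```

Each signal assignment corresponds to one equation, which is what makes this a data-flow description.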
VHDL Details : 3-to-8 Decoder
3-to-8 Decoder VHDL Program
Entity Part:
[Figure: 3-to-8 decoder (DCD) block with select inputs A0, A1, A2, enable inputs G1, G2A_L, G2B_L and outputs Y_L0 to Y_L7]
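The entity listing is missing from the transcript. A sketch of what such an entity could look like, assuming the _L suffix marks active-low signals. The entity name decoder_3to8 and the grouping of A0..A2 and Y_L0..Y_L7 into vectors are assumptions, not the slide's code.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: only the interface (the pins) is described here.
entity decoder_3to8 is
  port (
    G1    : in  std_logic;                     -- enable, assumed active high
    G2A_L : in  std_logic;                     -- enable, assumed active low
    G2B_L : in  std_logic;                     -- enable, assumed active low
    A     : in  std_logic_vector(2 downto 0);  -- select inputs A2, A1, A0
    Y_L   : out std_logic_vector(0 to 7)       -- outputs Y_L0 .. Y_L7, assumed active low
  );
end entity decoder_3to8;
```

The entity plays the role of the block symbol in a schematic: it names the inputs and outputs and says nothing about the internal circuit.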
3-to-8 Decoder VHDL Program
Architecture Part:
[Figure: 3-to-8 decoder (DCD) block with select inputs A0, A1, A2, enable inputs G1, G2A_L, G2B_L and outputs Y_L0 to Y_L7]
All statements happen concurrently
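The architecture listing is also missing from the transcript. A possible data-flow sketch follows, assuming the decoder is enabled when G1 = 1 and G2A_L = G2B_L = 0, and that exactly one active-low output goes to 0 when enabled. The names decoder_3to8, enabled and y_i are hypothetical, and the entity is included in the block only so that it compiles on its own.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity decoder_3to8 is
  port (
    G1, G2A_L, G2B_L : in  std_logic;
    A                : in  std_logic_vector(2 downto 0);
    Y_L              : out std_logic_vector(0 to 7)
  );
end entity decoder_3to8;

architecture dataflow of decoder_3to8 is
  signal enabled : std_logic;                 -- '1' when all enables are asserted
  signal y_i     : std_logic_vector(0 to 7);  -- decoded value before enable gating
begin
  -- The three statements below are concurrent; their order is arbitrary.

  -- Drive all outputs high (inactive) unless the decoder is enabled.
  Y_L <= y_i when enabled = '1' else "11111111";

  enabled <= G1 and not G2A_L and not G2B_L;

  -- Leftmost character is Y_L(0); the selected output is driven low.
  with A select
    y_i <= "01111111" when "000",
           "10111111" when "001",
           "11011111" when "010",
           "11101111" when "011",
           "11110111" when "100",
           "11111011" when "101",
           "11111101" when "110",
           "11111110" when "111",
           "11111111" when others;
end architecture dataflow;
```

Y_L is assigned before the signals it depends on are assigned, and the design still describes the same decoder.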
Near Future Directions
Double the number of cores every two years
Make sure to handle
errors due to
Alpha particles
Defective transistors
Make sure to handle
Power Wall
Memory Wall
Make sure to improve
Parallel Programming
Near Future Directions
Intel Unveils 48-Core Research Chip
On Wednesday Intel shifted its Tera-scale Computing Research Program into second gear by
demonstrating a 48-core x86 processor. The company is intending to use the new chip as a
research platform for the purpose of lighting a fire under many-core computing.
According to Intel, the new chip boasts 1.3 billion transistors and is built on 45nm CMOS
technology. Its distinction is that it contains the largest number of Intel Architecture (IA)
cores ever assembled on a single microprocessor. As such, it represents the sequel to Intel's
2007 "Polaris" 80-core prototype that was based on simple floating point units. While the
latter chip was said to reach 2 teraflops, the company is not talking about performance for
the 48-core version.
HPC Wire, December 4, 2009
The IBM Power7 chips are implemented in a 45 nanometer copper/SOI
process and have 1.2 billion transistors with eight cores on a single die. The
Power7 core has 32KB of L1 instruction cache and 32KB of L1 data cache.
Each core sports simultaneous multithreading that delivers four virtual
threads per core, and has 256KB of L2 cache tightly coupled to it. The chip
also has 32MB of embedded DRAM that acts as a shared L3 cache, with 4 MB
segments affiliated with each of the eight cores. The Power7 chip has two
dual-channel DDR3 memory controllers implemented on the chip, which deliver
100 GB/sec of sustained bandwidth per chip.
September 1, 2009
http://www.arstechnica.com
http://www.theregister.co.uk, November 27, 2009
Intel & IBM Vision for Next 5-8 Years
Scalable High Performance Main Memory System Using PCM Technology, Moinuddin K. Qureshi et al., ISCA 2009, IBM
From Intel
www.anandtech.com
Intel Technology Journal, November 2005
Intel
Near Future Directions : Next 5-8 Years
Applications
Intel : Recognition, Mining, Synthesis as platform 2015 Workload
Model (on massively parallel core chips)
IBM : Presence information, knowing where people and things are and how to
best match them; people are sensorized
Microsoft : Intention machine, computer predicts user intentions and
delivers useful information
CMU : Computational thinking, computer science based approach to
solving problems, designing systems, understanding human behavior
Traditional computing will continue
A C/C++/Java program for an application becomes Software
A compiler generates the machine language program file
A new type of computing
A C/C++/Java program for an application becomes Hardware
A hardware compiler generates the transistor circuit
The result is a custom chip
Near Future Directions : New Computing Types ?
Any other new possibility ?
A C/C++/Java program for an application becomes Hardware
A CAD tool generates the bit file to reconfigure the FPGA
An FPGA chip is a hardware programmable chip
The chip emulates the circuit designed
The bit file configures the chip
The CS 2204 Digital Logic Lab uses FPGAs !
There can be more opportunities with FPGA chips !
FPGAs are increasingly used in commercial products !
FPGAs are becoming cost competitive with microprocessors
FPGAs are becoming speed competitive with custom chips
FPGAs are used for applications where
Speed and programmability matter
Latest FPGAs also have microprocessor cores
They can run software as well
The application can be divided into software and hardware
Near Future Directions : New Computing Types
A C/C++/Java program becomes
Part software and part hardware
FPGA with cores and reconfigurable areas runs applications
Software is run by processor cores and
Hardware is in the reconfigurable area
When such an FPGA runs an application, some operations are in hardware
and simultaneously some operations in software
Reconfigurable area to do operations in hardware
Processor core to run software
These FPGAs are available now but we need much better tools
Software tools (compilers) and CAD tools must merge
Reconfigurable areas & cores allow recovering from errors due to
Alpha particles
Defective transistors
Near Future Directions : Hybrid Switching Elements
CMOL : Circuitry composed of CMOS and nanodevices
A closer look at FPGA-like reconfigurable logic circuits
Interface between CMOS and nanodevices
A larger view of FPGA-like reconfigurable logic circuits
Two CMOS cells and a nanodevice
Figures from : Konstantin K. Likharev
Near Future Directions : Possible New Structures
Microelectromechanical systems, MEMS, with computing elements
Microembedded systems
Smart Dust at UC Berkeley
Microbiolab on a chip
Sometimes referred to as a biochip !
Other structures that can be used for a number of different
applications with or without computing elements
Microcameras
Microsensors
Micromirrors
Micromotors
Microlenses
An all-optical computing chip with
Micromirrors
Microlenses
Bio MEMS
The Biochip Group at Mesa+,
University of Twente, Holland
Near Future Directions : Year 2020
SEMATECH : consortium of semiconductor
manufacturers from America, Asia and Europe.
SEMATECH predictions for year 2020 (from its 2009
Update of International Technology Roadmap for
Semiconductors, ITRS, study) :
Clock speed : 12 GHz
Number of transistors on a microprocessor chip : 35 billion
32 Gbit DRAM chips
Process length : 14 nm
Make sure to handle errors due to
Alpha particles
Defective transistors
http://www.sematech.org
Long Term Directions : Possible New Structures
Nanotechnology
Programmable materials
NEMS
Bio NEMS
Nano medicine
Drug delivery
Smart diagnosis
Nanocomputing
Quantum computing
Molecular computing
IBM Blue Gene/L molecular dynamics demo
Molecular self assembly
Testing of molecular structures
Adaptive molecular structures
Merger of bio and non-bio structures
Synthetic biology
www.ibm.com
1 Watt supercomputer
Long Term Directions : 2020 and Beyond
Many interconnected varying-size computing elements using
each other’s results autonomously
Ubiquitous computing with little human intervention
Cloud computing to nano computing
Personal agents
Intelligent spaces
Nano medicine
Targeted drug delivery
We need
Self-healing, adaptive, self-managing, trustworthy, dependable hardware and software
New computational models
New programming languages
Hardware and software reliability
Efficient parallel processing
www.uky.edu
Long Term Directions : 2020 and Beyond
Will hardware and software be developed separately like today ?
How will software be developed for nano systems ?
Quantum software ?
Molecular software ?
Biosoftware ?
How will hardware be developed for nano systems ?
VHDL or Verilog HDL or C or C++ or ?
Developing tools is critical
Simulation of protein molecules folding on a supercomputer
Iron atoms on copper with electron movement
Long Term Directions : 2020 and Beyond
By 2019 a $1000 computer will match the processing
power of the human brain
Raymond Kurzweil, KurzweilAI.net, 9/1/1999
He gave a keynote speech at the Supercomputing Conference (SC06) in November 2006
The title of his talk was “The Coming Merger of Biological and Non-Biological Intelligence”
Singularity point ?
Brain downloads possible by 2050
Ian Pearson, Head of British Telecom’s futurology unit,
CNN.com, 5/23/2005
Computers will be used as virtual brain extensions ?
Direct brain - Internet link ?
Long Term Directions
Hans Moravec, 1998
Many ethical issues will be facing you ! Being prepared will help !
Conclusions
Digital Logic evolution will continue :
Faster, cheaper, smaller, lighter, lower-power, more reliable digital products
Due to converging research in various areas :
Mathematics
Computer Science
Computer Engineering
Electrical Engineering
Mechanical Engineering
Physics
Chemistry
Materials Science
Biology ?
There will be many ethical issues
Try to prepare !
Try to be informed !