Computer Architecture & Related Topics


Computer Architecture & Related Topics
Ben Schrooten, Shawn Borchardt, Eddie Willett, Vandana Chopra
Presentation Topics

• Computer Architecture History
• Single CPU Design
• GPU Design (Brief)
• Memory Architecture
• Communications Architecture
• Dual Processor Design
• Parallel & Supercomputing Design
Part 1: History and Single CPU
Ben Schrooten
HISTORY!!!
One of the first computing devices to come about was . . . the ABACUS!
The ENIAC: 1946
• Completed: 1946
• Programmed: plug board and switches
• Speed: 5,000 operations per second
• Input/output: cards, lights, switches, plugs
• Floor space: 1,000 square feet
The EDSAC (1949) and the UNIVAC I (1951)

EDSAC
• First practical stored-program computer
• Speed: 714 operations per second
• Technology: vacuum tubes
• Memory: 1K words in delay lines

UNIVAC I
• Speed: 1,905 operations per second
• Input/output: magnetic tape, unityper, printer
• Memory size: 1,000 12-digit words in delay lines
• Memory type: delay lines, magnetic tape
• Technology: serial vacuum tubes, delay lines, magnetic tape
• Floor space: 943 cubic feet
• Cost: F.O.B. factory $750,000, plus $185,000 for a high-speed printer
The Intel 4004: 1971
Progression of the Architecture
• Vacuum tubes: 1940–1950
• Transistors: 1950–1964
• Integrated circuits: 1964–1971
• Microprocessor chips: 1971–present
Current CPU Architecture
• Basic CPU overview
• Single-bus design: slow performance
• Example of triple-bus architecture
Motherboards / Chipsets / Sockets, OH MY!

• Chipset, in charge of:
  • Memory controller
  • IrDA controller
  • EIDE controller
  • Keyboard
  • PCI bridge
  • Mouse
  • Real-time clock
  • Secondary cache
  • DMA controller
  • Low-power CMOS SRAM
Sockets
•Socket 4 & 5
•Socket 7
•Socket 8
•Slot 1
•Slot A
GPUs
• Allow for real-time rendered graphics on a small PC
• GPUs are true processing units
• The Pentium 4 contains 42 million transistors on a 0.18-micron process
• The GeForce3 contains 57 million transistors on a 0.15-micron manufacturing process
More GPU
Sources

DX4100 picture: Oneironaut
http://oneironaut.tripod.com/dx4100.jpg

Computer architecture overview picture:
http://www.eecs.tulane.edu/courses/cpen201/slides/201Intro.pdf

CPU overview, single-bus, and triple-bus architecture pictures: Roy M. Wnek, Virginia Tech CS5515, Lecture 5
http://www.nvc.cs.vt.edu/~wnek/cs5515/slide/Grad_Arch_5.PDF

Historical data and pictures: The Computer Museum History Center
http://www.computerhistory.org/

Intel motherboard diagram / Pentium 4 picture: Intel Corporation
http://www.intel.com

The abacus: Abacus-Online-Museum
http://www.hh.schule.de/metalltechnikdidaktik/users/luetjens/abakus/china/china.htm

Information also from: Clint Fleri
http://www.geocities.com/cfleri/

Memory functionality: Dana Angluin
http://zoo.cs.yale.edu/classes/cs201/Fall_2001/handouts/lecture-13/node4.html

Benchmark graphics: Digital Life
http://www.digit-life.com/articles/pentium4/index3.html

Chipset and socket information: Motherboards.org
http://www.motherboards.org/articlesd/techplanations/17_2.html

AMD processor pictures: Tom's Hardware
http://www6.tomshardware.com/search/search.html?category=all&words=Athlon

GPU info: 4th Wave Inc.
http://www.wave-report.com/tutorials/gpu.htm

NV20 design pictures: Digital Life
http://www.digit-life.com/articles/nv20/
Main Memory
Memory Hierarchy
DRAM vs. SRAM
• DRAM is short for Dynamic Random Access Memory
• SRAM is short for Static Random Access Memory

DRAM is dynamic in that, unlike SRAM, its storage cells must be refreshed (given a new electronic charge) every few milliseconds. SRAM needs no refreshing because each cell is switched into one of two states by a moving current, rather than being a storage cell that holds a charge in place.
Parity vs. Non-Parity

Parity is a form of error detection developed to notify the user of data errors. A single bit is added to each byte of data; this bit checks the integrity of the other 8 bits while the byte is moved or stored.

Because memory errors are rare, much of today's memory is non-parity.
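As a sketch of the scheme just described, the snippet below derives an even-parity bit for a byte and re-checks it on read; a single flipped bit changes the count of 1s and is detected. (Function names are mine, for illustration only, not taken from any real memory controller.)

```python
def parity_bit(byte: int) -> int:
    """Even parity: the extra bit is chosen so that the 9 bits
    together always contain an even number of 1s."""
    return bin(byte & 0xFF).count("1") % 2

def passes_check(byte: int, stored_parity: int) -> bool:
    """On a read, re-derive the parity and compare with the stored bit."""
    return parity_bit(byte) == stored_parity

b = 0b1011_0010                                # four 1-bits -> parity bit 0
p = parity_bit(b)
assert passes_check(b, p)
assert not passes_check(b ^ 0b0000_1000, p)    # any single-bit flip is caught
```

Note that two bit flips cancel out under parity, which is part of why later memory moved to ECC rather than simple parity.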
SIMM vs. DIMM vs. RIMM?

• SIMM: Single In-line Memory Module
• DIMM: Dual In-line Memory Module
• RIMM: Rambus In-line Memory Module

SIMMs offer a 32-bit data path, while DIMMs offer a 64-bit data path. SIMMs therefore have to be used in pairs on Pentiums and more recent processors, whose memory bus is 64 bits wide.

RIMM is one of the latest designs. Because of the fast data transfer rate of these modules, a heat spreader (an aluminum plate covering) is used on each module.
Evolution of Memory

Year       Memory type    Speed
1970       RAM / DRAM     4.77 MHz
1987       FPM            20 MHz
1995       EDO            20 MHz
1997       PC66 SDRAM     66 MHz
1998       PC100 SDRAM    100 MHz
1999       RDRAM          800 MHz
1999/2000  PC133 SDRAM    133 MHz
2000       DDR SDRAM      266 MHz
2001       EDRAM          450 MHz

• FPM (Fast Page Mode DRAM): traditional DRAM
• EDO (Extended Data Output): increases the read cycle between memory and the CPU
• SDRAM (Synchronous DRAM): synchronizes itself with the CPU bus and runs at higher clock speeds
• RDRAM (Rambus DRAM): DRAM with very high bandwidth (1.6 GB/s)
• EDRAM (Enhanced DRAM): dynamic (power-refreshed) RAM that includes a small amount of static RAM (SRAM) inside a larger amount of DRAM, so that many memory accesses go to the faster SRAM. EDRAM is sometimes used as L1 and L2 memory and, together with Enhanced Synchronous DRAM, is known as cached DRAM.
Read Operation
• On a read, the CPU first tries to find the data in the cache; if it is not there, the cache is updated from main memory and then returns the data to the CPU.
Write Operation
• On a write, the CPU writes the information into both the cache and main memory (a write-through policy).
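The read and write rules above can be sketched as a toy direct-mapped, write-through cache. The class below is an illustrative assumption (one word per cache line, a dict standing in for main memory), not a model of any particular CPU:

```python
class WriteThroughCache:
    """Toy direct-mapped, write-through cache in front of a dict that
    stands in for main memory. One word per line; purely illustrative."""

    def __init__(self, num_lines, memory):
        self.num_lines = num_lines
        self.memory = memory
        self.lines = {}              # index -> (tag, value)
        self.hits = self.misses = 0

    def _split(self, addr):
        return addr % self.num_lines, addr // self.num_lines

    def read(self, addr):
        index, tag = self._split(addr)
        line = self.lines.get(index)
        if line is not None and line[0] == tag:   # found in the cache: hit
            self.hits += 1
            return line[1]
        self.misses += 1                          # miss: refill from main memory,
        value = self.memory[addr]                 # then return the data to the CPU
        self.lines[index] = (tag, value)
        return value

    def write(self, addr, value):
        # Write-through: the CPU's write goes to the cache AND main memory.
        index, tag = self._split(addr)
        self.lines[index] = (tag, value)
        self.memory[addr] = value

mem = {a: 0 for a in range(64)}
cache = WriteThroughCache(8, mem)
cache.write(3, 42)
assert cache.read(3) == 42 and mem[3] == 42   # hit; both copies stay in sync
assert cache.read(5) == 0                     # miss; refilled from memory
assert (cache.hits, cache.misses) == (1, 1)
```

Because every write also updates main memory, the two copies can never disagree; the price is that writes run at memory speed.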
References

http://www-ece.ucsd.edu/~weathers/ece30/downloads/Ch7_memory(4x).pdf
http://home.cfl.rr.com/bjp/eric/ComputerMemory.html
http://aggregate.org/EE380/JEL/ch1.pdf
Defining a Bus

A parallel circuit that connects the major components of a computer, allowing the transfer of electric impulses from one connected component to any other.
VESA - Video Electronics Standards Association

• 32-bit bus, otherwise known as VLB (VESA Local Bus)
• Found mostly on 486 machines
• Relied on the 486 processor to function, which drove people to switch to the PCI bus
ISA - Industry Standard Architecture

• Very old technology
• Bus speed of 8 MHz
• Maximum speed of 42.4 Mb/s
• Very few ISA ports are found in modern machines
MCA - Micro Channel Architecture

• IBM's attempt to compete with the ISA bus
• 32-bit bus
• Automatically configured cards (like Plug and Play)
• Not compatible with ISA
EISA - Extended Industry Standard Architecture

• An attempt to compete with IBM's MCA bus
• Ran at an 8.33 MHz cycle rate
• 32-bit slots
• Backward compatible with ISA
• Went the way of MCA
PCI - Peripheral Component Interconnect

• Speeds up to 960 Mb/s
• Bus speed of 33 MHz
• 32-bit architecture
• Developed by Intel in 1993
• Synchronous or asynchronous
• PCI popularized Plug and Play
• Runs at half of the system bus speed
PCI-X

• Bus speeds up to 133 MHz
• 64-bit bus width
• 1 GB/s throughput
• Backward compatible with all PCI
• Primarily developed for the increased I/O demands of technologies such as Fibre Channel, Gigabit Ethernet, and Ultra3 SCSI
AGP - Accelerated Graphics Port

• Essentially a high-speed PCI port
• Capable of running at 4 times the PCI bus speed (133 MHz)
• Used for high-speed 3D graphics cards
• Considered a port, not a bus:
  • Only two devices involved
  • Not expandable
Bus          Width (bits)   Bus speed (MHz)   Bandwidth (MBytes/sec)
8-bit ISA    8              8.3               7.9
16-bit ISA   16             8.3               15.9
EISA         32             8.3               31.8
VLB          32             33                127.2
PCI          32             33                127.2
AGP          32             66                254.3
AGP (X2)     32             66 x 2            508.6
AGP (X4)     32             66 x 4            1017.3
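The bandwidth column follows directly from the other two: bytes moved per transfer times effective transfers per second. A small sketch, assuming the table counts 1 MByte as 2**20 bytes (that convention is what reproduces figures like 127.2 for PCI):

```python
def bus_bandwidth(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth in MBytes/sec: bytes per transfer times effective
    transfers per second, with 1 MByte taken as 2**20 bytes."""
    bytes_per_sec = (width_bits // 8) * clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_sec / 2**20

# A few rows of the table (8.33... and 33.33... MHz written exactly):
assert round(bus_bandwidth(8, 25 / 3), 1) == 7.9          # 8-bit ISA
assert round(bus_bandwidth(32, 100 / 3), 1) == 127.2      # PCI / VLB
assert round(bus_bandwidth(32, 200 / 3, 4), 1) == 1017.3  # AGP (X4)
```

The `transfers_per_clock` argument captures why AGP X2 and X4 double and quadruple the bandwidth without changing the 66 MHz clock or the 32-bit width.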
IDE - Integrated Drive Electronics

• Goes by many other names: ATA, ATA/ATAPI, EIDE, ATA-2, Fast ATA, ATA-3, Ultra ATA, Ultra DMA
• Good performance at a low cost
• The most widely used interface for hard disks
SCSI - Small Computer System Interface ("skuzzy")

• Capable of handling internal and external peripherals
• Speeds anywhere from 80 to 640 Mb/s
• Many types of SCSI
Type              Bus speed (MBytes/sec, max)   Bus width (bits)   Max device support
SCSI-1            5                             8                  8
Fast SCSI         10                            8                  8
Fast Wide SCSI    20                            16                 16
Ultra SCSI        20                            8                  8
Ultra Wide SCSI   40                            16                 16
Ultra2 SCSI       40                            8                  8
Wide Ultra2 SCSI  80                            16                 16
Ultra3 SCSI       160                           16                 16
Ultra320 SCSI     320                           16                 16
Serial Port

• Uses a DB9 or DB25 connector
• Adheres to the RS-232C spec
• Capable of speeds up to 115 kb/s
USB

1.0
• Hot plug-and-play
• Full-speed USB devices signal at 12 Mb/s
• Low-speed devices use a 1.5 Mb/s subchannel
• Up to 127 devices chained together

2.0
• Data rate of 480 Mb/s

USB On-The-Go
• For portable devices
• Limited host capability to communicate with selected other USB peripherals
• A small USB connector to fit the mobile form factor
FireWire (IEEE 1394, i.LINK)

• High-speed serial port
• 400 Mb/s transfer rate
• 30 times faster than USB 1.0
• Hot plug-and-play
PS/2 Port

• Mini-DIN plug with 6 pins
• Mouse port and keyboard port
• Developed by IBM
Parallel Port, i.e. "Printer Port"

• Old type
• Two "new" types: ECP (Extended Capabilities Port) and EPP (Enhanced Parallel Port)
  • Ten times faster than the old parallel port
  • Capable of bi-directional communication
Game Port

• Uses a DB15 port
• Used to connect a joystick to the computer
Need for High Performance Computing

• There is a need for tremendous computational capabilities in science, engineering, and business
• There are applications that require gigabytes of memory and gigaflops of performance
What is a High Performance Computer?

• Definition: an HPC computer can solve large problems in a reasonable amount of time
• Characteristics: fast computation, large memory, high-speed interconnect, high-speed input/output
How is an HPC Computer Made to Go Fast?

• Make the sequential computation faster
• Do more things in parallel
Applications
1. Weather prediction
2. Aircraft and automobile design
3. Artificial intelligence
4. Entertainment industry
5. Military applications
6. Financial analysis
7. Seismic exploration
8. Automobile crash testing
Who Makes High Performance Computers?

• SGI/Cray: Power Challenge Array, Origin 2000, T3D/T3E
• HP/Convex: SPP-1200, SPP-2000
• IBM: SP2
• Tandem
Trends in Computer Design

The performance of the fastest computers has grown exponentially from 1945 to the present, averaging a factor of 10 every five years. The growth flattened somewhat in the 1980s but accelerated again as massively parallel computers became available.

Real World Sequential Processes
Sequential processes we find in the world.
The passage of time is a classic example of a
sequential process.
Day breaks as the sun rises in the morning.
Daytime has its sunlight and bright sky.
Dusk sees the sun setting on the horizon.
Nighttime descends with its moonlight, dark sky
and stars.
Parallel Processes
Music
An orchestra performance, where every
instrument plays its own part, and playing
together they make beautiful music.
Parallel Features of Computers
Various methods available on computers for
doing work in parallel are :
Computing environment
Operating system
Memory
Disk
Arithmetic
Computing Environment - Parallel
Features

Using a timesharing environment
• The computer's resources are shared among many users who are logged in simultaneously.
• Your process uses the CPU for a time slice, and is then rolled out while another user's process is allowed to compute.
• The opposite of this is dedicated mode, where yours is the only job running.

The computer overlaps computation and I/O
• While one process is writing to disk, the computer lets another process do some computation.
Operating System - Parallel Features
Using the UNIX background processing facility
a.out > results &
man etime
Using the UNIX Cron jobs feature
You submit a job that will run at a later time.
Then you can play tennis while the computer
continues to work.
This overlaps your computer work with your personal
time.
Memory - Parallel Features
Memory Interleaving
Memory is divided into multiple banks, and consecutive
data elements are interleaved among them.
There are multiple ports to memory. When the data
elements that are spread across the banks are needed,
they can be accessed and fetched in parallel.
Memory interleaving increases the memory bandwidth.
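A minimal sketch of the low-order interleaving described above, assuming 4 banks selected by the low bits of the word address (the bank count and function name are illustrative):

```python
NUM_BANKS = 4

def bank_of(address: int) -> int:
    """Low-order interleaving: consecutive word addresses fall in
    consecutive banks, so a sequential scan rotates through all the
    banks and the fetches can proceed in parallel."""
    return address % NUM_BANKS

# Eight consecutive addresses visit the 4 banks in strict rotation:
assert [bank_of(a) for a in range(8)] == [0, 1, 2, 3, 0, 1, 2, 3]
```

With a power-of-two bank count the modulo is just the low address bits, which is why hardware can route a request to its bank with no arithmetic at all.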
Memory - Parallel Features (cont.)

Multiple levels of the memory hierarchy:
• Global memory, which any processor can access
• Memory local to a partition of the processors
• Memory local to a single processor:
  • cache memory
  • memory elements held in registers
Disk - Parallel Features

RAID disk: Redundant Array of Inexpensive Disks

Striped disk
• When a dataset is written to disk, it is broken into pieces which are written simultaneously to different disks in a RAID system.
• When the same dataset is read back in, the pieces are read in parallel, and the original dataset is reassembled in memory.
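The write/read behavior above can be sketched as round-robin striping and reassembly. The chunk size and helper names here are my own illustrative choices; real RAID-0 striping happens at the block-device level, not in application code:

```python
def stripe(data: bytes, num_disks: int, chunk: int = 4):
    """Break a dataset into chunk-sized pieces and deal them
    round-robin across the disks (RAID-0 style striping sketch)."""
    disks = [bytearray() for _ in range(num_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % num_disks] += data[i:i + chunk]
    return disks

def reassemble(disks, total_len: int, chunk: int = 4) -> bytes:
    """Read the pieces back (in parallel, on real hardware) and
    splice the original dataset together in memory."""
    out = bytearray()
    offsets = [0] * len(disks)
    d = 0
    while len(out) < total_len:
        out += disks[d][offsets[d]:offsets[d] + chunk]
        offsets[d] += chunk
        d = (d + 1) % len(disks)
    return bytes(out)

data = b"abcdefghijklmnopqrstuvwx"      # 24 bytes = 6 chunks of 4
disks = stripe(data, 3)
assert reassemble(disks, len(data)) == data   # round-trips exactly
```

With 3 disks, each disk holds a third of the data, so in the ideal case both the write and the read finish in a third of the time.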
Arithmetic - Parallel Features

We will examine the following features that lend themselves to parallel arithmetic:
• Multiple functional units
• Superscalar arithmetic
• Instruction pipelining
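Before looking at each feature, the basic payoff of instruction pipelining can be sketched with the standard cycle count for an ideal k-stage pipeline (no stalls or hazards assumed; the function names are mine):

```python
def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Ideal k-stage pipeline: the first instruction takes k cycles
    to drain through, then one instruction completes per cycle."""
    return n_stages + n_instructions - 1

def sequential_cycles(n_instructions: int, n_stages: int) -> int:
    """Without pipelining, every instruction pays for all k stages."""
    return n_stages * n_instructions

# 100 instructions on a 5-stage pipeline: 104 cycles instead of 500,
# approaching a 5x speedup as the instruction count grows.
assert pipelined_cycles(100, 5) == 104
assert sequential_cycles(100, 5) == 500
```

The speedup approaches the stage count only for long instruction streams; branches and data hazards reduce it in practice.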
Parallel Machine Models
(Architectures)

• von Neumann computer

MultiComputer
• A multicomputer comprises a number of von Neumann computers, or nodes, linked by an interconnection network.
• In an idealized network, the cost of sending a message between two nodes is independent of both node location and other network traffic, but does depend on message length.
• Locality
• Scalability
• Concurrency
Distributed Memory (MIMD)

MIMD means that each processor can execute a separate stream of instructions on its own local data; distributed memory means that memory is distributed among the processors rather than placed in a central location.

Difference between the multicomputer model and MIMD
• In a real distributed-memory machine, the cost of sending a message is not independent of node location and other network traffic, as the idealized multicomputer model assumes.
Examples of MIMD machines
MultiProcessor or
Shared Memory MIMD

All processors share access to a common memory via a bus or a hierarchy of buses
Example for Shared Memory MIMD

Silicon Graphics Challenge
SIMD Machines

• All processors execute the same instruction stream, each on a different piece of data.

Example of a SIMD machine:
• MasPar MP
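The SIMD idea can be mimicked in ordinary code: one instruction, many data lanes. This is purely illustrative; a real SIMD machine applies the operation to all lanes in lockstep hardware rather than in a loop:

```python
def simd_add(vec_a, vec_b):
    """Each 'lane' applies the same instruction (an add) to its own
    element; here the lanes are just positions in a list."""
    return [a + b for a, b in zip(vec_a, vec_b)]

# One instruction, four lanes of data:
assert simd_add([1, 2, 3, 4], [10, 20, 30, 40]) == [11, 22, 33, 44]
```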
Use of Cache

Why is cache used on parallel computers?
• Advances in memory technology aren't keeping up with processor innovations; memory isn't speeding up as fast as the processors.
• One way to alleviate the performance gap between main memory and the processors is to have a local cache.
• Cache memory can be accessed faster than main memory, so it keeps up with the fast processors and keeps them busy with data.
Shared Memory (diagram): processors 1, 2, and 3, each with its own cache memory, connected through a network to a shared main memory.
Cache Coherence

What is cache coherence?
• It keeps a data element found in several caches current with the other copies and with the value in main memory.
• Various cache coherence protocols are used:
  • snoopy protocol
  • directory-based protocol
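A toy write-invalidate snoopy protocol, under heavy simplifying assumptions (write-through caches, a single shared bus, no per-line state machine), might look like the sketch below; real protocols such as MSI/MESI track explicit per-line states:

```python
class Bus:
    """Shared bus that all caches snoop; also holds main memory."""
    def __init__(self):
        self.caches = []
        self.memory = {}

    def broadcast_invalidate(self, addr, owner):
        for cache in self.caches:
            if cache is not owner:
                cache.data.pop(addr, None)   # other copies become invalid

class SnoopyCache:
    """Each cache watches ('snoops') writes broadcast on the bus
    and drops its now-stale copy. Write-through for simplicity."""
    def __init__(self, bus):
        self.data = {}
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.data:                # miss: fetch from memory
            self.data[addr] = self.bus.memory.get(addr, 0)
        return self.data[addr]

    def write(self, addr, value):
        self.bus.broadcast_invalidate(addr, owner=self)
        self.data[addr] = value
        self.bus.memory[addr] = value            # write-through

bus = Bus()
c1, c2 = SnoopyCache(bus), SnoopyCache(bus)
c1.write(0x10, 7)
assert c2.read(0x10) == 7     # c2 misses and fetches the fresh value
c2.write(0x10, 8)             # the broadcast invalidates c1's copy
assert c1.read(0x10) == 8     # c1 re-fetches, so all copies stay coherent
```

Invalidation is what prevents c1 from silently reading its stale 7 after c2's write; a directory-based protocol achieves the same end by tracking sharers centrally instead of broadcasting.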
Various Other Issues

• Data Locality Issue
• Distributed Memory Issue
• Shared Memory Issue