Four Key Questions

CS, CoE, EE 362
Digital Computers II: Architecture
• Prof. Mark Franklin: [email protected]
• Course Assistants:
– Drew Frank: [email protected]
• Required Book: “Heuring & Jordan” 2nd Edition
• Optional Book: “Intro. VHDL” Yalamanchili
• Read: Academic Integrity Statement.
• Course Web Site:
http://www.cse.wustl.edu/~jbf/cse362.d/cse362.html
Mark Franklin, S06
Four Key Questions
• What components must every computer have?
• How can computers be described, specified, and evaluated?
• What constitutes computer architecture (hardware, software, firmware, algorithms, etc.)?
• How does technology affect computer architecture (chip size, feature size, power, pin density, etc.)?
Essential Computer Components
• Processor: interpret/execute instructions.
• Memory: store instructions & data.
• Communication Device(s): communicate
with outside world, I/O.
[Figure: Classic Computer Architecture (SISD: Single Instruction Stream-Single Data Stream). Blocks: Processor (Control Unit, ALU), Memory, Input/Output.]
Architecture Components
• INSTRUCTION SET DESIGN: Programmer-visible instruction set; algorithm, compiler, and OS design; algorithmic complexity.
• HIGH LEVEL COMPONENT ORGANIZATION: Memory system, bus structure, processor design, branch handling, pipelining; execution algorithms; instructions/second, clocks/instruction.
• HARDWARE: Detailed logic design, packaging; VLSI & logic design CAD algorithms; speed, area, power, …
[Figure: (SIMD) Single Instruction Stream – Multiple Data Stream Architecture. Blocks: Program Control Unit, Program Memory, four ALUs, Interconnection Network, Data Memory Unit, Input / Output.]
Performance Expression: Amdahl’s Law
$f_n$ = fraction of operations that must be performed sequentially; $0 \le f_n \le 1$
$p$ = number of processors present
$S$ = maximum achievable speedup:
$$S = \frac{1}{f_n + (1 - f_n)/p}$$
$E$ = Efficiency = $S / p$
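As a quick numerical check of the speedup and efficiency definitions, here is a minimal Python sketch. The function names and the sample values (10% sequential work on 1, 4, 16, and 64 processors) are illustrative assumptions, not part of the slides.

```python
def amdahl_speedup(f_n: float, p: int) -> float:
    """Maximum speedup S = 1 / (f_n + (1 - f_n) / p)."""
    if not 0.0 <= f_n <= 1.0:
        raise ValueError("f_n must lie in [0, 1]")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f_n + (1.0 - f_n) / p)


def efficiency(f_n: float, p: int) -> float:
    """Efficiency E = S / p."""
    return amdahl_speedup(f_n, p) / p


if __name__ == "__main__":
    # Illustrative values only: 10% sequential work.
    for p in (1, 4, 16, 64):
        s = amdahl_speedup(0.1, p)
        print(f"p={p:3d}  S={s:6.2f}  E={efficiency(0.1, p):.3f}")
```

With a fixed sequential fraction, the speedup flattens out as processors are added while efficiency keeps falling.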
Amdahl’s Law
It does no good to have many processors if there is not enough parallelism. What portion of a computation can be sequential if we want the processors to be used at 50 percent efficiency? ($S = p/2$)
$$\frac{p}{2} = \frac{1}{f_n + (1 - f_n)/p}$$
$$\Rightarrow\; p f_n + 1 - f_n = 2$$
$$\Rightarrow\; f_n = \frac{1}{p - 1}$$
To maintain a constant efficiency, the fraction of the computation devoted to sequential processing must be proportional to the inverse of the number of processors.
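The same algebra works for any target efficiency, not just 50 percent: solving $E = 1/(p f_n + 1 - f_n)$ for $f_n$ gives $f_n = (1/E - 1)/(p - 1)$. The sketch below assumes that generalization; the function names are hypothetical.

```python
def sequential_fraction_for_efficiency(target_e: float, p: int) -> float:
    """Largest sequential fraction f_n that still gives efficiency target_e.

    Derived from E = S / p = 1 / (p*f_n + 1 - f_n), which rearranges to
    f_n = (1/E - 1) / (p - 1).
    """
    if p < 2:
        raise ValueError("need at least 2 processors")
    if not 1.0 / p <= target_e <= 1.0:
        raise ValueError("target efficiency must lie in [1/p, 1]")
    return (1.0 / target_e - 1.0) / (p - 1)


def efficiency(f_n: float, p: int) -> float:
    """Efficiency E = S / p with S from Amdahl's Law."""
    speedup = 1.0 / (f_n + (1.0 - f_n) / p)
    return speedup / p


if __name__ == "__main__":
    # At 50% efficiency the result should match f_n = 1 / (p - 1).
    for p in (2, 5, 11, 101):
        f_n = sequential_fraction_for_efficiency(0.5, p)
        print(f"p={p:4d}  f_n={f_n:.4f}  E={efficiency(f_n, p):.3f}")
```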
Generalize Amdahl’s Law
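$$\text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$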
Example: “Suppose a program runs in 100 seconds on a
machine. Multiply operations are responsible for 80 seconds
of this time. How much do we have to improve the speed of
multiplication if we want the program to run 4 times faster?”
What about 5 times faster?
PRINCIPLE: Make the common case fast!
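One way to work the multiplication example is to solve the generalized law for the enhancement factor. This is a sketch with hypothetical function names; it returns infinity when no finite improvement can reach the target.

```python
import math


def overall_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Generalized Amdahl's Law: 1 / ((1 - F) + F / S_enhanced)."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


def required_enhancement(total_time: float, enhanced_time: float,
                         target_speedup: float) -> float:
    """Factor by which the enhanced portion must speed up to hit the target.

    Returns math.inf when the untouched portion alone already uses up
    (or exceeds) the target run time, so no finite factor is enough.
    """
    untouched_time = total_time - enhanced_time
    target_time = total_time / target_speedup
    budget = target_time - untouched_time  # time left for the enhanced part
    if budget <= 0:
        return math.inf
    return enhanced_time / budget


if __name__ == "__main__":
    # Values from the example: 100 s total, 80 s spent in multiplies.
    print(required_enhancement(100, 80, 4))  # 4 times faster overall
    print(required_enhancement(100, 80, 5))  # 5 times faster overall
```

For the 4x target the multiplier must become 16 times faster (20 s + 80 s / 16 = 25 s). For the 5x target the remaining 20 seconds of non-multiply work already equals the whole time budget, so the target is out of reach no matter how fast multiplication becomes.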
Computer Market Partitioning
(costs are for processor, not system)
• Desktop Computing ($100 - $1,000):
– Price-performance
• Servers ($200 - $2,000):
– Availability (reliability + effectiveness)
– Scalability
– Throughput
• Embedded Computers ($0.20 - $1,000):
– Real-time performance
– Power and memory minimization
– Cost minimization
– Interface with special purpose logic; use of processor cores
HLL (e.g., C, C++, Perl) vs Machine/Assembly Language (AL)
• HLL Pros:
– Easier to express algorithms due to higher level constructs (e.g., For, Case, arithmetic expressions, objects, etc.)
– Type checking (hardware for type checking?).
– Some memory allocation checking.
• Assembly Language Pros:
– More control over ISA → more speed, less memory
– More control over I/O
• Combination is often best for embedded systems: HLL calling AL.
Example: HLL → AL Mapping
HLL
• b = c + d*e
AL
• LOAD R1, d
• LOAD R2, e
• LOAD R3, c
• MPY R4, R2, R1
• ADD R5, R4, R3
• STORE R5, b
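To see that the assembly sequence really computes b = c + d*e, it can be stepped through on a toy load/store machine. The interpreter below is a hypothetical sketch: it assumes three-operand semantics (destination register first) for MPY and ADD, which the slide implies but does not define.

```python
def run(program, memory):
    """Execute (opcode, operands...) tuples on a toy load/store machine."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":        # LOAD Rd, var    : Rd <- memory[var]
            rd, var = args
            regs[rd] = memory[var]
        elif op == "STORE":     # STORE Rs, var   : memory[var] <- Rs
            rs, var = args
            memory[var] = regs[rs]
        elif op == "MPY":       # MPY Rd, Ra, Rb  : Rd <- Ra * Rb (assumed)
            rd, ra, rb = args
            regs[rd] = regs[ra] * regs[rb]
        elif op == "ADD":       # ADD Rd, Ra, Rb  : Rd <- Ra + Rb (assumed)
            rd, ra, rb = args
            regs[rd] = regs[ra] + regs[rb]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return memory


# The instruction sequence from the slide.
PROGRAM = [
    ("LOAD", "R1", "d"),
    ("LOAD", "R2", "e"),
    ("LOAD", "R3", "c"),
    ("MPY", "R4", "R2", "R1"),
    ("ADD", "R5", "R4", "R3"),
    ("STORE", "R5", "b"),
]

if __name__ == "__main__":
    # Sample operand values chosen only for illustration.
    result = run(PROGRAM, {"c": 2, "d": 3, "e": 4})
    print(result["b"])
```

One HLL statement expands into six instructions here: three loads, two arithmetic operations, and one store.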
Buses: I
• A set of path(s) (wires) connecting on-chip or off-chip modules.
– Serial bus: transmits one bit at a time
– Parallel bus: transmits many bits simultaneously
• Generally time-shared.
• Generally has separate data & control paths.
• Typically has a separate bus controller or arbiter that decides which modules can use the bus at any given time.
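The arbiter's job can be illustrated with a small sketch. Round-robin is used here purely as an assumed example policy; the slides do not say which arbitration scheme a given bus uses, and the class and module names are hypothetical.

```python
class RoundRobinArbiter:
    """Grants a time-shared bus to one requesting module per cycle."""

    def __init__(self, modules):
        self.modules = list(modules)
        self.next_index = 0  # where the search for the next grant starts

    def grant(self, requests):
        """Return the module granted the bus this cycle, or None if idle."""
        n = len(self.modules)
        for offset in range(n):
            idx = (self.next_index + offset) % n
            candidate = self.modules[idx]
            if candidate in requests:
                # Start the next search just past the winner so that
                # no module can monopolize the bus.
                self.next_index = (idx + 1) % n
                return candidate
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(["CPU", "Memory", "I/O"])
    for cycle in range(4):
        # CPU and I/O both request the bus on every cycle.
        print(cycle, arbiter.grant({"CPU", "I/O"}))
```

Because the search starts just past the previous winner, the two competing modules alternate instead of one starving the other.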
Buses: II
• Some common buses:
– On-chip: AMBA, Wishbone (generally not standard)
– Off-chip: PCI bus family:
  Bus              32-bit transfer   64-bit transfer
  33-MHz PCI       133 MB/sec        266 MB/sec
  66-MHz PCI       266 MB/sec        532 MB/sec
  100-MHz PCI-X    ---               800 MB/sec
  133-MHz PCI-X    ---               1 GB/sec
• PCI-e(xpress) serial, 1 lane: 500 MB/sec
• PCI-e(xpress) serial, 4 lanes: 2 GB/sec
– Off-chip: Other buses - SCSI, IDE, Infiniband
• Common issues: Arbitration, congestion.
• Logical equivalence between buses, multiplexers and switches.
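The parallel PCI figures follow from clock rate times bus width. The sketch below assumes one transfer per clock, 1 MB = 10^6 bytes, and nominal clocks of 33.33 and 66.66 MHz; the function name is hypothetical, and the results match the quoted numbers only to within rounding.

```python
def peak_bandwidth_mb_per_s(clock_mhz: float, width_bits: int) -> float:
    """Peak MB/sec for a parallel bus moving one word per clock cycle."""
    return clock_mhz * width_bits / 8


if __name__ == "__main__":
    # Nominal clock rates are assumptions; the slide rounds them to 33/66 MHz.
    for name, clock_mhz in (("33-MHz PCI", 33.33), ("66-MHz PCI", 66.66),
                            ("100-MHz PCI-X", 100.0), ("133-MHz PCI-X", 133.0)):
        for width in (32, 64):
            mb = peak_bandwidth_mb_per_s(clock_mhz, width)
            print(f"{name:14s} {width}-bit: {mb:7.1f} MB/sec")
```

These are peak rates. Arbitration and congestion reduce the bandwidth a module actually sees.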
[Figure: Bandwidth Requirements]
[Figure: Bandwidth Trend]
Simple Queuing Theory View of Buses
• A bus is a shared resource and can be viewed as a server in a queuing system.
• Modules attached to the bus present inputs (i.e., requests) to the server (the bus) and are queued up if the server is busy.
[Figure: requests from Memory, CPU, and I/O enter a Queue in front of the Server (the BUS).]
Basic Queueing Theory
• Utilization: percentage of time the server is busy.
• Average Queue Length: average number of jobs waiting in the queue.
• Average System Delay (latency): average time from a job's entry into the system to its departure.
• Arrival Process: Poisson arrivals (exponentially distributed interarrival times).
• Service Time Distribution: exponentially distributed service times.
• Queue Characteristics: infinite length; FIFO service discipline.
Basic Queueing Results
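Utilization: $\rho = \dfrac{\lambda}{\mu} = \dfrac{\text{arrival rate}}{\text{service rate}}$
Avg. Queue Length: $L_q = \dfrac{\lambda^2}{\mu(\mu - \lambda)}$
Avg. Num. in System: $L = \dfrac{\lambda}{\mu - \lambda}$
Avg. System Waiting Time: $\dfrac{1}{\mu - \lambda}$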
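The standard M/M/1 results (Poisson arrivals at rate λ, exponential service at rate μ, one server) are easy to tabulate. This is a sketch with hypothetical names; the example rates of 8 and 10 requests per unit time are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class MM1Metrics:
    utilization: float           # rho = lambda / mu
    avg_queue_length: float      # L_q = lambda^2 / (mu * (mu - lambda))
    avg_number_in_system: float  # L = lambda / (mu - lambda)
    avg_system_time: float       # 1 / (mu - lambda)


def mm1_metrics(arrival_rate: float, service_rate: float) -> MM1Metrics:
    """Steady-state M/M/1 metrics; requires arrival_rate < service_rate."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative (service rate positive)")
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    lam, mu = arrival_rate, service_rate
    return MM1Metrics(
        utilization=lam / mu,
        avg_queue_length=lam ** 2 / (mu * (mu - lam)),
        avg_number_in_system=lam / (mu - lam),
        avg_system_time=1.0 / (mu - lam),
    )


if __name__ == "__main__":
    # Hypothetical bus: 8 requests arrive per unit time, 10 can be served.
    print(mm1_metrics(8.0, 10.0))
```

At 80% utilization an average of four requests are in the system. Delay grows without bound as the arrival rate approaches the service rate.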
Basic Queueing Results
[Figure: M/M/1 Waiting Time and M/M/1 Queue Length, each plotted against utilization from 0 to 1; the waiting-time axis is marked at 1/μ.]
Computer Generations
• 1: 1950 - 1959 Vacuum Tubes
• 2: 1960 - 1968 Transistors
• 3: 1969 - 1977 Integrated Circuit
• 4: 1978 - 2005 LSI-Large Scale Integration; VLSI-Very LSI
• 5: 2005 - 20?? ULSI-Ultra LSI; parallel processing
Technology: How we make a chip (roughly)
Integrated Circuit Cost
$$\text{Cost per die} = \frac{\text{Cost per wafer}}{(\text{Dies per wafer}) \times (\text{Yield})}$$
$$\text{Dies per wafer} = \frac{\text{Wafer area}}{\text{Die area}} \quad \text{(approximate)}$$
$$\text{Yield} = \frac{1}{\left(1 + (\text{Defects per area}) \times \dfrac{\text{Die area}}{2}\right)^{2}} \quad \text{(empirical observation)}$$
Typical: Die area = 1.5 cm x 1.5 cm; Wafer diameter = 10 inches; Defects per cm² = 1.7; Yield = 50%
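The three cost formulas chain together naturally. The sketch below uses the die size, wafer diameter, and defect density listed as typical; the $1,000 wafer cost is a made-up placeholder, and the function names are hypothetical. Note that plugging the listed die area and defect density into the empirical formula gives a yield of roughly 12%, so the 50% figure is treated here as a separate typical value rather than something the sketch reproduces.

```python
import math

CM_PER_INCH = 2.54


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> int:
    """Approximate die count: wafer area / die area (ignores edge loss)."""
    wafer_area_cm2 = math.pi * (wafer_diameter_cm / 2) ** 2
    return int(wafer_area_cm2 // die_area_cm2)


def die_yield(defects_per_cm2: float, die_area_cm2: float) -> float:
    """Empirical yield: 1 / (1 + defects_per_area * die_area / 2)^2."""
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2 / 2.0) ** 2


def cost_per_die(cost_per_wafer: float, wafer_diameter_cm: float,
                 die_area_cm2: float, defects_per_cm2: float) -> float:
    """Cost per die = cost per wafer / (dies per wafer * yield)."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2))
    return cost_per_wafer / good_dies


if __name__ == "__main__":
    die_area = 1.5 * 1.5             # cm^2, from the typical values
    diameter = 10 * CM_PER_INCH      # 10-inch wafer in cm
    wafer_cost = 1000.0              # ASSUMED placeholder, not from the slides
    print(dies_per_wafer(diameter, die_area))
    print(round(die_yield(1.7, die_area), 3))
    print(round(cost_per_die(wafer_cost, diameter, die_area, 1.7), 2))
```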
TECHNOLOGY TRENDS
• Semiconductors:
– Transistor Density: +50%/year, quadruple in 4 years.
– Die Size: +10 - 25%/year
• IC Logic Technology:
– Transistors per Chip: +50 - 60%/year
– Device Speed: +30%/year
– Wire/Communications Speed: ~constant (Cu vs Al)
• Magnetic Disk Technology:
– Density: +25 - 60% / year
– Access Time: +35% / 10 years (8 ms).
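Annual growth rates compound, which is worth checking against the rounded multi-year figures. The helper names below are hypothetical. At +50%/year the compounded factor over four years is about 5x, so "quadruple in 4 years" reads as a rounded figure.

```python
import math


def growth_factor(rate_per_year: float, years: float) -> float:
    """Compounded growth: (1 + rate)^years."""
    return (1.0 + rate_per_year) ** years


def years_to_multiply(rate_per_year: float, factor: float) -> float:
    """Years needed to grow by `factor` at a steady annual rate."""
    return math.log(factor) / math.log(1.0 + rate_per_year)


if __name__ == "__main__":
    print(growth_factor(0.50, 4))        # transistor density over 4 years
    print(years_to_multiply(0.50, 4.0))  # years to quadruple at +50%/year
    print(growth_factor(0.30, 10))       # device speed over a decade
```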
[Figure: Feature and Die Size; Wafer Size (12-inch wafer)]
[Figure: Silicon & Magnetic Densities]
[Figure: Processor Performance Gains; axis: Performance (x VAX-10/780)]
[Figure: Processor Cost Trends with Time]