Transcript ENGS116 F04
ENGS 116 Lecture 1
ENGS 116 / COSC 107
Computer Architecture
Introduction
Vincent H. Berk
September 24th, 2008
Reading for Friday: Chapter 1.1 – 1.4, Amdahl article
Reading for Monday: 1.5 – 1.11
Prerequisite Knowledge
• Assembly language programming
• Fundamentals of logic design
Combinational and sequential components (e.g., gates, multiplexers,
decoders, ROMs, flip-flops, registers, RAMs)
• Processor Design
Instruction cycle, pipelining, branch prediction, exceptions
• Memory Hierarchy
Caches (direct-mapped, fully-associative, 2-way set associative), spatial
locality, temporal locality, virtual memory, translation lookaside buffer
(TLB)
• Input and Output
Polling, interrupts
• Multiprocessors
What is Computer Architecture?
Two viewpoints:
• Hardware designer’s viewpoint: CPUs, caches, buses,
pipelines, physical memory, etc.
• Programmer’s viewpoint: instruction set – opcodes, addressing
modes, registers, virtual memory, etc.
Study of architecture covers both instruction-set architectures
and machine implementation organizations.
Computer Architecture Is ...
The attributes of a [computing] system as seen by the
programmer, i.e., the conceptual structure and functional
behavior, as distinct from the organization of the data
flows and controls, the logic design, and the physical
implementation.
Amdahl, Blaauw, and Brooks, 1964
Computer Architecture’s Changing Definition
• 1950s to 1960s:
Computer Architecture Course = Computer Arithmetic.
• 1970s to 1980s:
Computer Architecture Course = Instruction Set Design, especially
ISA appropriate for compilers.
• 1990s to 2000s:
Computer Architecture Course = Design of CPU, memory system, I/O
• 2000 to now:
Computer Architecture Course = ILP, DLP, TLP, storage
5 Generations of Electronic Computers (Hwang)
• First (1945-54)
Technology and architecture: Vacuum tubes and relay memories, CPU driven by PC and accumulator, fixed-point arithmetic.
Software and applications: Machine/assembly languages, single user, no subroutine linkage, programmed I/O using CPU.
Representative systems: ENIAC, Princeton IAS, IBM 701.
• Second (1955-64)
Technology and architecture: Discrete transistors and core memories, floating-point arithmetic, I/O processors, multiplexed memory access.
Software and applications: HLL used with compilers, subroutine libraries, batch processing monitor.
Representative systems: IBM 7030, CDC 1604, Univac LARC.
• Third (1965-74)
Technology and architecture: Integrated circuits (SSI/MSI), microprogramming, pipelining, cache and lookahead.
Software and applications: Multiprogramming and time-sharing OS, multiuser applications.
Representative systems: IBM 360/370, CDC 6600, TI-ASC, PDP-8.
• Fourth (1975-90)
Technology and architecture: LSI/VLSI and semiconductor memory, multiprocessors, vector supercomputers, multicomputers.
Software and applications: Multiprocessor OS, languages, compilers, and environments for parallel processing.
Representative systems: VAX 9000, Cray X-MP, IBM 3090, BBN TC2000.
• Fifth (1991-present)
Technology and architecture: ULSI/VHSIC processors, memory, and switches, high-density packaging, scalable architectures.
Software and applications: Massively parallel processing, grand challenge applications, heterogeneous processing.
Representative systems: IBM/MPP, Cray/MPP, TMC/CM-5, Intel Paragon.
Computer Tasks
• Desktop Computing, Lightweight Servers, Laptops
Price-performance (low cost)
Communication, Graphics
• Server Computing, Mainframe Systems
Specific performance, processing power, storage
Availability, Reliability
• Embedded Computers and DSPs
Power and Memory requirements
Lowest cost for required performance
Real-time or soft-real-time performance
Task of Computer Designer
• Determine which attributes are important for a new
machine.
• Design a machine to meet functional requirements, price,
power and performance goals.
Basic Computer Organization
[Figure: the five components of a computer: Input, Output, Memory, and the Processor, which comprises Control and Datapath]
Computer Architecture Topics
• Input/Output and Storage
Disks, WORM, Tape
RAID
• Memory Hierarchy
SDRAM: Emerging Technologies, Interleaving, Bus protocols
L2/L3 Cache
L1 Cache
Multi-Core Coherence, Bandwidth, Latency
• VLSI
• Instruction Set Architecture
Addressing, Protection, Exception Handling
• Pipelining, Instruction-Level Parallelism, Thread-Level Parallelism
Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation
Computer Architecture Topics
[Figure: processor-memory (P, M) nodes attached through switches (S) to an interconnection network]
• Processor-Memory-Switch
• Multiprocessors, Networks and Interconnections
Shared Memory, Message Passing, Data Parallelism
• Network Interfaces
• Interconnection Network
Topologies, Routing, Bandwidth, Latency, Reliability
Course Focus
Understanding the design techniques, machine structures,
technology factors, and evaluation methods that will
determine the form of computers in the 21st Century
Computer Architecture:
• Instruction Set Design
• Organization
• Hardware
Surrounding areas shown in the figure: Technology, Parallelism, Programming Languages, Applications, Operating Systems, Measurement & Evaluation, Interface Design (ISA), History
Technology Trends
• Integrated circuit logic technology
transistor density (feature size)
transistor count
cycle speed
multiple cores
• Semiconductor DRAM
density
latency and bandwidth
• Magnetic disk technology
density
access time
• Network technology
bandwidth
latency
Scaling in ICs
• Feature size: minimum size of a single
distinguishable/producible item on a chip die
1971 – 10 microns
2001 – 0.18 microns
2003 – 0.06 microns
2006 – 5 nanometers (0.005 microns)
• Complex relationships:
Transistor density increases quadratically with decrease in feature size
Reduction in feature size requires voltage reduction to maintain correct operation
and reasonable reliability
• Scaling IC wiring:
Signal delay increases with product of resistance and capacitance
Shorter wires can be smaller
Smaller features have higher current leakage
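The quadratic relationship between feature size and transistor density can be checked with a little arithmetic. The following is a minimal sketch in Python, assuming ideal scaling in which density is proportional to 1/(feature size)²; the function name `density_gain` is illustrative and does not come from the lecture.

```python
def density_gain(old_feature_um: float, new_feature_um: float) -> float:
    """Ideal transistor-density gain when the feature size shrinks.

    Assumes density is proportional to 1 / feature_size**2, which is an
    idealization; real processes deviate from it.
    """
    if old_feature_um <= 0 or new_feature_um <= 0:
        raise ValueError("feature sizes must be positive")
    return (old_feature_um / new_feature_um) ** 2


# Feature sizes from the slide: 10 microns (1971) and 0.18 microns (2001).
gain = density_gain(10.0, 0.18)
print(f"ideal density gain, 10 um -> 0.18 um: about {gain:.0f}x")
```

Halving the feature size gives a fourfold gain under this model, which is why modest shrinks translate into large jumps in transistor count.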
Power Consumption of ICs
• Power requirements per transistor are proportional to load
capacitance, frequency of switching and the square of the
voltage.
Power = ½ × Capacitance × Voltage² × Frequency switched
• Switching frequency and transistor density increase
faster than capacitance and voltage decrease, leading to
increased power consumption, i.e., generated heat
• The Pentium 4 consumes 135 Watts of power, while the 8086 through i386 did not even feature a heat sink
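The dynamic power relation on this slide (one half times capacitance times voltage squared times switching frequency) is easy to explore numerically. The sketch below is a rough illustration in Python; the function name `dynamic_power` and the operating points (1 nF, 1.2 V, 2 GHz) are hypothetical values chosen only to show how the terms scale, not measurements of any real chip.

```python
def dynamic_power(capacitance_f: float, voltage_v: float, frequency_hz: float) -> float:
    """Dynamic power in watts: 1/2 * C * V^2 * f."""
    return 0.5 * capacitance_f * voltage_v ** 2 * frequency_hz


# Hypothetical operating point: 1 nF switched load, 1.2 V, 2 GHz.
base = dynamic_power(1e-9, 1.2, 2e9)

# Same load and frequency with the voltage lowered by 15%.
lower_v = dynamic_power(1e-9, 1.2 * 0.85, 2e9)

print(f"base power: {base:.2f} W")
print(f"power ratio after a 15% voltage drop: {lower_v / base:.4f}")
```

Because voltage enters squared, a 15% voltage reduction cuts dynamic power by roughly 28% at the same frequency, whereas a 15% frequency reduction alone would cut it by only 15%.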
Cost and Price
• Cost of manufacturing decreases over time: learning curve
• Learning curve is measured as an increase in yield
• Volume doubling leads to 10% reduction in cost
• Commodity products tend to decrease cost:
Volume
Competition
Efficiency
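The rule of thumb that each doubling of volume reduces cost by about 10% can be turned into a small calculation. This is a sketch in Python under the assumption that the 10% reduction compounds per doubling; the function name `cost_after_volume` and the example figures are hypothetical.

```python
import math


def cost_after_volume(initial_cost: float, initial_volume: float,
                      new_volume: float, reduction_per_doubling: float = 0.10) -> float:
    """Unit cost after volume grows, assuming a fixed cut per doubling."""
    if initial_volume <= 0 or new_volume <= 0:
        raise ValueError("volumes must be positive")
    doublings = math.log2(new_volume / initial_volume)
    return initial_cost * (1.0 - reduction_per_doubling) ** doublings


# Hypothetical part: $100 per unit at 1,000 units, scaled to 8,000 units
# (three doublings of volume).
print(f"{cost_after_volume(100.0, 1_000, 8_000):.2f}")
```

Three doublings leave the unit cost at 0.9³, or about 73%, of its starting value under this assumption.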
Difference between Cost and Price
Wafers and Dies
• Chips are produced on round silicon disks
• Dies are the actual chip, cut out from the wafer
• Testing occurs before cutting and after packaging
Yield and Cost
•
However:
Wafers do not just contain chip dies: usually a large area, including
several die sites, is dedicated to test-equipment hook-up
Actual yield in mass-production chip fabs ranges from 98% for
DRAMs to 1% for new processors
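The slide's own formulas did not survive in this transcript. As a point of reference, the widely used textbook model (Hennessy and Patterson) estimates dies per wafer, die yield, and cost per die as sketched below in Python. Whether the lecture used exactly this model is an assumption, and so are the function names, the parameter alpha = 4, and the $5,000 wafer cost. The model ignores the test area mentioned above, so it overestimates the number of usable dies.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Estimated dies on a round wafer: area term minus an edge-loss term."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(defects_per_cm2: float, die_area_cm2: float,
              alpha: float = 4.0, wafer_yield: float = 1.0) -> float:
    """Fraction of good dies; alpha models process complexity (assumed 4)."""
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)


def cost_per_die(wafer_cost: float, wafer_diameter_cm: float,
                 die_area_cm2: float, defects_per_cm2: float) -> float:
    """Wafer cost spread over the good dies only."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2))
    return wafer_cost / good_dies


# Hypothetical example: 30 cm wafer, 1.5 cm x 1.5 cm die, 0.4 defects/cm^2.
print(f"dies per wafer: {dies_per_wafer(30, 2.25):.1f}")
print(f"die yield:      {die_yield(0.4, 2.25):.2f}")
print(f"cost per die:   {cost_per_die(5000, 30, 2.25, 0.4):.2f}")
```

Because yield falls faster than linearly as die area grows, cost per die rises steeply with die size in this model.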
Yield and Cost
• Switch from 200mm to 300mm wafers:
Although 300mm wafers have lower yield than 200mm wafers, the overhead
processing costs per wafer are high enough to make 300mm wafers more cost
effective.
• Redundancy in dies:
Single transistors do fail during production, causing memory cells, pipeline stages,
and control logic sections to fail
Redundancy is built into each die by introducing backup units
After testing, backup units are enabled and failed units can be disabled by laser
This decreases the chances of small flaws failing an entire die
Few companies give insight into their redundant circuitry numbers
Performance
Hwang: “The ideal performance of a computer system demands a
perfect match between machine capability and program behavior.”
Machine capability – enhanced with better hardware technology,
innovative architectural features, efficient resource management.
Program behavior – affected by algorithm design, data structures,
language efficiency, programmer skill, compiler technology.
To improve software performance, need to understand how various
hardware factors affect overall system performance!
Measuring Performance
• Key measure is time.
• Response time (execution time): Time between start and
completion of a task.
• Throughput: total amount of work completed in a given time.
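$$\frac{\text{Seconds}}{\text{Program}} = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Clock cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Clock cycle}}$$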
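The execution-time relation on this slide (seconds per program as the product of instruction count, clock cycles per instruction, and seconds per clock cycle) can be evaluated directly. A minimal sketch in Python follows; the function name `cpu_time` and the workload figures are hypothetical.

```python
def cpu_time(instruction_count: float, cycles_per_instruction: float,
             clock_rate_hz: float) -> float:
    """Execution time in seconds.

    Seconds per clock cycle is 1 / clock_rate_hz, so the product
    instructions * CPI * (seconds per cycle) becomes a division.
    """
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return instruction_count * cycles_per_instruction / clock_rate_hz


# Hypothetical workload: 1e9 instructions, CPI of 1.5, 2 GHz clock.
print(cpu_time(1e9, 1.5, 2e9))  # 0.75
```

The three factors are not independent: a change that lowers the instruction count can raise the cycles per instruction or lengthen the clock cycle, so only the product is a reliable measure.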
Comparing Design Alternatives
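“X is n times faster than Y” means
$$\frac{\text{Execution time}_Y}{\text{Execution time}_X} = \frac{\text{Performance}_X}{\text{Performance}_Y} = n$$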
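By the usual convention, "X is n times faster than Y" compares execution times, with Y's time in the numerator. The sketch below shows that convention in Python; the function name `times_faster` and the timings are hypothetical.

```python
def times_faster(time_x_s: float, time_y_s: float) -> float:
    """Return n such that X is n times faster than Y.

    n = (execution time of Y) / (execution time of X); performance is
    taken as the reciprocal of execution time.
    """
    if time_x_s <= 0 or time_y_s <= 0:
        raise ValueError("execution times must be positive")
    return time_y_s / time_x_s


# Hypothetical measurements: X finishes in 10 s, Y in 15 s.
print(times_faster(10.0, 15.0))  # 1.5
```

A result below 1 means X is actually the slower design.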
Benchmarking
• Real programs; e.g., compilers, photo editing
• Modified or scripted real programs; e.g., compression algorithms
• Kernels – small, key pieces from real programs; e.g., Livermore Loops,
Linpack.
• Toy benchmarks – typically 10 to 100 lines of code, useful primarily for intro
programming assignments; e.g., quicksort, prime numbers, encryption
• Synthetic benchmarks – try to match average frequency of operations and
operands for a set of programs; e.g., Whetstone, Dhrystone.
• Benchmark suites – collections of programs; e.g., SPEC CPU2000