Designing Classes and Programs
Today’s topics
Performance & Computer Architecture
Notes from David A. Patterson and John L. Hennessy, Computer
Organization and Design: The Hardware/Software Interface,
Morgan Kaufmann, 1997.
http://computer.howstuffworks.com/pc.htm
Slides from
Alvy Lebeck, Duke CS
Marti Hearst, UC Berkeley SIMS
David Patterson, UC Berkeley CS
Mounir Hamdi, HKUST CS
Upcoming
Complexity
Compsci 001
4.1
Performance
Performance = 1/Time
The goal for all software and hardware developers is to
increase performance
Metrics for measuring performance (pros/cons?)
Elapsed time
CPU time
• Instruction count (RISC vs. CISC)
• Clock cycles per instruction
• Clock cycle time
MIPS vs. MFLOPS
Throughput (tasks/time)
Other more subjective metrics?
What kind of workload to be used?
Applications, kernels and benchmarks (toy or synthetic)
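The three CPU-time factors listed above multiply together: CPU time = instruction count × clock cycles per instruction × clock cycle time. The sketch below works one example through; the function names and all of the workload numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
def cpu_time(instruction_count, cycles_per_instruction, cycle_time_s):
    """CPU time in seconds: instructions x cycles/instruction x seconds/cycle."""
    return instruction_count * cycles_per_instruction * cycle_time_s


def mips(instruction_count, seconds):
    """Millions of instructions executed per second."""
    return instruction_count / (seconds * 1e6)


# Hypothetical workload and machine (illustrative numbers only).
instructions = 2_000_000_000      # instruction count
cpi = 1.5                         # average clock cycles per instruction
clock_rate_hz = 2e9               # 2 GHz, so cycle time is 1 / 2e9 seconds

seconds = cpu_time(instructions, cpi, 1 / clock_rate_hz)
performance = 1 / seconds         # Performance = 1/Time

print(f"CPU time:    {seconds:.2f} s")
print(f"Performance: {performance:.3f} runs/s")
print(f"MIPS rating: {mips(instructions, seconds):.0f}")
```

Halving any one factor halves CPU time, which is why instruction count, CPI, and cycle time each make sense as a separate metric.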
What is Realtime?
Response time
Panic
• How to tell “I am still computing”
• Progress bar
Flicker
Fusion frequency
Update rate vs. refresh rate
Movie film standards (24 fps projected at 48 fps)
Interactive media
Interactive vs. non-interactive graphics
• computer games vs. movies
• animation tools vs. animation
Interactivity => real-time systems
• system must respond to user inputs without any perceptible delay
(A Primary Challenge in VR)
The Big Picture
Since 1946 all computers have had 5 components
The Von Neumann Machine
Processor (Control + Datapath)
Memory
Input
Output
What is computer architecture?
Computer Architecture = Machine Organization +
Instruction Set Architecture + ...
Fetch, Decode, Execute Cycle
Computer instructions are stored (as bits) in memory
A program’s execution is a loop
Fetch instruction from memory
Decode instruction
Execute instruction
Clock rate (the inverse of cycle time)
Measured in hertz (cycles per second)
A 2 GHz processor can execute this cycle up to 2 billion times a second
Not all cycles are the same though…
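The fetch, decode, execute loop above can be sketched as a toy interpreter. Everything here is an assumption made for illustration: the four-instruction set (LOAD, ADD, JNZ, HALT), the tuple encoding, and the simplification that each instruction counts as one step. A real processor stores instructions as bits, and its instructions can take different numbers of clock cycles.

```python
def run(program):
    """Run a toy accumulator machine; return (accumulator, steps executed)."""
    acc = 0      # single accumulator register
    pc = 0       # program counter: index of the next instruction
    steps = 0
    while True:
        instruction = program[pc]        # fetch
        opcode, operand = instruction    # decode
        pc += 1
        steps += 1
        # execute
        if opcode == "LOAD":
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "JNZ":            # jump to operand if acc is non-zero
            if acc != 0:
                pc = operand
        elif opcode == "HALT":
            return acc, steps
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Count down from 3 to 0, then stop.
countdown = [("LOAD", 3), ("ADD", -1), ("JNZ", 1), ("HALT", 0)]
result, steps = run(countdown)
print(result, steps)
```

The program counter is the only thing that makes this a loop over memory rather than a straight-line list: a jump simply overwrites it before the next fetch.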
Organization
Capabilities & Performance Characteristics of Principal Functional Units (FUs)
(e.g., Registers, ALU, Shifters, Logic Units, ...)
Ways in which these components are interconnected
Information flows between components
Logic and means by which such information flow is controlled
Choreography of FUs to realize the ISA
Logic Designer's View (figure): ISA Level, FUs & Interconnect
Instruction Set Architecture
... the attributes of a [computing] system as seen by the
programmer, i.e. the conceptual structure and functional behavior,
as distinct from the organization of the data flows and controls,
the logic design, and the physical implementation.
– Amdahl, Blaauw, and Brooks, 1964
-- Organization of Programmable Storage
-- Data Types & Data Structures: Encodings & Representations
-- Instruction Set
-- Instruction Formats
-- Modes of Addressing and Accessing Data Items and Instructions
-- Exceptional Conditions
The Instruction Set: a Critical Interface
instruction set
What is an example of an Instruction Set architecture?
Forces on Computer Architecture
(Figure: forces acting on Computer Architecture)
Technology
Programming Languages
Applications
Operating Systems
History
Cleverness
Technology
DRAM chip capacity
Year    Size
1980    64 Kb
1983    256 Kb
1986    1 Mb
1989    4 Mb
1992    16 Mb
1996    64 Mb
1999    256 Mb
2002    1 Gb
2007    2 Gb
2009    4 Gb
Microprocessor Logic Density (figure, by uP name)
In ~1985 the single-chip processor (32-bit) and the single-board computer emerged
=> workstations, personal computers, multiprocessors have been riding this wave since
Now, we have multicore processors
Technology => dramatic change
Processor
logic capacity: about 30% per year
clock rate: about 20% per year
Memory
DRAM capacity: about 60% per year (4x every 3 years)
Memory speed: about 10% per year
Cost per bit: improves about 25% per year
Disk
capacity: about 60% per year
Total use of data: 100% per 9 months!
Network Bandwidth
Bandwidth increasing more than 100% per year!
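Annual percentages like these compound, which is how "about 60% per year" becomes "4x every 3 years". The sketch below checks that arithmetic and derives a doubling time for a few of the rates above; the helper names are hypothetical.

```python
import math


def growth_factor(rate_per_year, years):
    """Total multiplier after compounding an annual growth rate."""
    return (1 + rate_per_year) ** years


def doubling_time(rate_per_year):
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate_per_year)


# DRAM capacity: 60% per year compounds to roughly 4x in 3 years.
print(f"DRAM capacity after 3 years: {growth_factor(0.60, 3):.2f}x")

# Rates taken from the slide; doubling times follow from them.
for name, rate in [("logic capacity", 0.30),
                   ("clock rate", 0.20),
                   ("memory speed", 0.10)]:
    print(f"{name}: doubles every {doubling_time(rate):.1f} years")
```

The spread in doubling times is the point of the slide: capacity compounds far faster than memory speed, so the two drift apart every year.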
Performance Trends
Processor Transistor Count
(from http://en.wikipedia.org/wiki/Transistor_count)
Processor                  Transistor count   Date of introduction   Manufacturer
Intel 4004                 2,300              1971                   Intel
Intel 8008                 2,500              1972                   Intel
Intel 8080                 4,500              1974                   Intel
Intel 8088                 29,000             1978                   Intel
Intel 80286                134,000            1982                   Intel
Intel 80386                275,000            1985                   Intel
Intel 80486                1,200,000          1989                   Intel
Pentium                    3,100,000          1993                   Intel
AMD K5                     4,300,000          1996                   AMD
Pentium II                 7,500,000          1997                   Intel
AMD K6                     8,800,000          1997                   AMD
Pentium III                9,500,000          1999                   Intel
AMD K6-III                 21,300,000         1999                   AMD
AMD K7                     22,000,000         1999                   AMD
Pentium 4                  42,000,000         2000                   Intel
Itanium                    25,000,000         2001                   Intel
Barton                     54,300,000         2003                   AMD
AMD K8                     105,900,000        2003                   AMD
Itanium 2                  220,000,000        2003                   Intel
Itanium 2 with 9MB cache   592,000,000        2004                   Intel
Cell                       241,000,000        2006                   Sony/IBM/Toshiba
Core 2 Duo                 291,000,000        2006                   Intel
Core 2 Quad                582,000,000        2006                   Intel
Dual-Core Itanium 2        1,700,000,000      2006                   Intel
Quad-Core Itanium          2,000,000,000      200                    Intel
Processor-Memory Speed Gap
(Figure: performance on a log scale from 1 to 1000, plotted by year from 1980 to 2000)
CPU (µProc): 50%/yr. (“Moore’s Law”)
DRAM: 9%/yr. (2X/10 yrs)
Processor-Memory Performance Gap: grows 50% / year
Latency vs. Throughput
Memory bottleneck
CPU can execute dozens of instructions in the time it takes to
retrieve one item from memory
Solution: Memory Hierarchy
Use fast memory
Registers
Cache memory
Rule: small memory is fast, large memory is slow
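One common way to quantify what a memory hierarchy buys is average memory access time: hit time + miss rate × miss penalty. The formula is standard, but the latencies and miss rates below are hypothetical round numbers, not measurements of any real machine.

```python
def average_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time: every access pays the hit time,
    and the fraction that misses also pays the miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical latencies: a 1 ns cache in front of 100 ns main memory.
cache_hit_ns = 1.0
memory_ns = 100.0

for miss_rate in (0.01, 0.05, 0.20):
    amat = average_access_time(cache_hit_ns, miss_rate, memory_ns)
    print(f"miss rate {miss_rate:4.0%}: about {amat:.1f} ns per access")
```

With a low miss rate, the average access costs close to the small, fast memory's latency even though most data lives in the large, slow one.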
A great idea in computer science
Temporal locality
Programs tend to access data that has been accessed
recently (i.e. close in time)
Spatial locality
Programs tend to access data at an address near recently
referenced data (i.e. close in space)
Useful in graphics and virtual reality as well
Realistic images require significant computational power
Don’t need to represent distant objects as well
Efficient distributed systems rely on locality
Memory access time increases over a network
Want to access data on local machine
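The effect of temporal and spatial locality on a cache can be seen in a small simulation. This is a minimal sketch of a direct-mapped cache; the cache geometry (64 lines of 8 addresses each) and the three access patterns are assumptions chosen for illustration.

```python
import random


def hit_rate(addresses, num_lines=64, block_size=8):
    """Simulate a direct-mapped cache and return the fraction of hits."""
    lines = [None] * num_lines          # block currently held by each line
    hits = 0
    for addr in addresses:
        block = addr // block_size      # neighbouring addresses share a block
        index = block % num_lines       # each block maps to exactly one line
        if lines[index] == block:
            hits += 1
        else:
            lines[index] = block        # miss: load the block, evicting the old one
    return hits / len(addresses)


n = 10_000
sequential = list(range(n))                     # spatial locality
small_loop = [i % 256 for i in range(n)]        # temporal locality
rng = random.Random(0)                          # fixed seed for repeatability
scattered = [rng.randrange(1_000_000) for _ in range(n)]  # little locality

for name, trace in [("sequential", sequential),
                    ("small loop", small_loop),
                    ("scattered", scattered)]:
    print(f"{name:10s} hit rate: {hit_rate(trace):.3f}")
```

The sequential trace misses once per block and then hits on the neighbouring addresses, the small loop keeps reusing blocks that are already cached, and the scattered trace almost never finds its data in the cache.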
Microprocessor Generations
First generation: 1971-78
Behind the power curve
(16-bit, <50k transistors)
Second Generation: 1979-85
Becoming “real” computers
(32-bit, >50k transistors)
Third Generation: 1985-89
Challenging the “establishment”
(Reduced Instruction Set Computer/RISC, >100k transistors)
Fourth Generation: 1990-
Architectural and performance leadership
(64-bit, >1M transistors, Intel/AMD translate into RISC internally)
In the beginning (8-bit) Intel 4004
First general-purpose, single-chip microprocessor
Shipped in 1971
8-bit architecture, 4-bit implementation
2,300 transistors
Performance < 0.1 MIPS (Million Instructions Per Second)
8008: 8-bit implementation in 1972
3,500 transistors
First microprocessor-based computer (Micral)
• Targeted at laboratory instrumentation
• Mostly sold in Europe
All chip photos in this talk courtesy of Michael W. Davidson and The Florida State University
1st Generation (16-bit) Intel 8086
Introduced in 1978
New 16-bit architecture
Performance < 0.5 MIPS
“Assembly language” compatible with 8080
29,000 transistors
Includes memory protection, support for Floating Point coprocessor
In 1981, IBM introduces PC
Based on 8088, the 8-bit bus version of 8086
2nd Generation (32-bit) Motorola 68000
Major architectural step in microprocessors:
First 32-bit architecture
• initial 16-bit implementation
First flat 32-bit address
• Support for paging
General-purpose register architecture
• Loosely based on PDP-11 minicomputer
First implementation in 1979
68,000 transistors
< 1 MIPS (Million Instructions Per Second)
Used in
Apple Mac
Sun, Silicon Graphics, & Apollo workstations
3rd Generation: MIPS R2000
Several firsts:
First (commercial) RISC microprocessor
First microprocessor to provide integrated support for instruction & data cache
First pipelined microprocessor (sustains 1 instruction/clock)
Implemented in 1985
125,000 transistors
5-8 MIPS (Million Instructions per Second)
4th Generation (64-bit) MIPS R4000
First 64-bit architecture
Integrated caches
• On-chip
• Support for off-chip, secondary cache
Integrated floating point
Implemented in 1991:
• Deep pipeline
• 1.4M transistors
• Initially 100MHz
• > 50 MIPS
Intel translates 80x86/Pentium X instructions into RISC internally
Key Architectural Trends
Increase performance at 1.6x per year (2X/1.5yr)
True from 1985-present
Combination of technology and architectural enhancements
Technology provides faster transistors
(proportional to 1/lithographic feature size) and more of them
Faster transistors lead to higher clock rates
More transistors (“Moore’s Law”):
• Architectural ideas turn transistors into performance
– Responsible for about half the yearly performance growth
Two key architectural directions
Sophisticated memory hierarchies
Exploiting instruction level parallelism
Where have all the transistors gone?
Superscalar (multiple instructions per clock cycle)
• 3 levels of cache
• Branch prediction (predict outcome of decisions)
• Out-of-order execution (executing instructions in different order than programmer wrote them)
Intel Pentium III (10M transistors)
(Die photo labels: Execution, 2 Bus Intf, D cache, TLB, Out-Of-Order, branch, Icache, SS)
Laws?
Define each of the following. What has its effect been on the
advancement of computing technology?
Moore’s Law
Amdahl’s Law
Metcalfe’s Law
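As a starting point for the Amdahl's Law question, its usual statement is: if a fraction p of the work is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The sketch below evaluates that formula; the fractions and factors are hypothetical examples.

```python
def amdahl_speedup(improved_fraction, improvement_factor):
    """Overall speedup when only part of the work gets faster."""
    unchanged = 1 - improved_fraction
    return 1 / (unchanged + improved_fraction / improvement_factor)


# Hypothetical case: 90% of a program can be sped up.
for factor in (2, 10, 1000):
    overall = amdahl_speedup(0.90, factor)
    print(f"speeding up 90% of the work by {factor}x gives {overall:.2f}x overall")
```

However large the improvement factor, the overall speedup in this case stays below 10x, because the untouched 10% of the work still takes its original time.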