Designing Classes and Programs


Today’s topics



Performance & Computer Architecture
 Notes from David A. Patterson and John L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 1997.
 http://computer.howstuffworks.com/pc.htm
Slides from
 Alvy Lebeck, Duke CS
 Marti Hearst, UC Berkeley SIMS
 David Patterson, UC Berkeley CS
 Mounir Hamdi, HKUST CS
Upcoming
 Complexity
Compsci 001
4.1
Performance


Performance = 1/Time
 The goal for all software and hardware developers is to increase performance
Metrics for measuring performance (pros/cons?)


Elapsed time
CPU time
• Instruction count (RISC vs. CISC)
• Clock cycles per instruction
• Clock cycle time
MIPS vs. MFLOPS
 Throughput (tasks/time)
 Other more subjective metrics?
What kind of workload to be used?
 Applications, kernels and benchmarks (toy or synthetic)
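The three CPU time factors listed above multiply together. A minimal sketch, assuming a hypothetical program of 1 billion instructions with an average of 2 clock cycles per instruction on a 2 GHz clock (the function names and numbers are illustrative, not from the slides):

```python
def cpu_time_seconds(instruction_count, cycles_per_instruction, clock_rate_hz):
    """CPU time = instruction count x cycles per instruction x clock cycle time."""
    clock_cycle_time = 1.0 / clock_rate_hz  # seconds per cycle
    return instruction_count * cycles_per_instruction * clock_cycle_time


def mips_rating(instruction_count, cpu_time):
    """Millions of instructions executed per second of CPU time."""
    return instruction_count / (cpu_time * 1e6)


# Assumed workload: 1e9 instructions, CPI of 2, 2 GHz clock.
t = cpu_time_seconds(1_000_000_000, 2.0, 2_000_000_000)
performance = 1.0 / t  # Performance = 1/Time
print(f"CPU time: {t:.3f} s, MIPS: {mips_rating(1_000_000_000, t):.0f}")
```

Halving any one factor (fewer instructions, lower CPI, or a shorter cycle) halves CPU time, which is one way to frame the RISC vs. CISC trade-off.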


What is Realtime?

Response time
 Panic
• How to tell “I am still computing”
• Progress bar



Flicker
 Fusion frequency
Update rate vs. refresh rate
 Movie film standards (24 fps projected at 48 fps)
Interactive media
 Interactive vs. non-interactive graphics
• computer games vs. movies
• animation tools vs. animation

Interactivity => real-time systems
• system must respond to user inputs without any perceptible delay
(A Primary Challenge in VR)
The Big Picture
Since 1946 all computers have had 5 components


The Von Neumann Machine
 Processor
• Control
• Datapath
 Memory
 Input
 Output
What is computer architecture?
Computer Architecture = Machine Organization +
Instruction Set Architecture + ...
Fetch, Decode, Execute Cycle



Computer instructions are stored (as bits) in memory
A program’s execution is a loop
 Fetch instruction from memory
 Decode instruction
 Execute instruction
Cycle time
 Clock rate (1 / cycle time) is measured in hertz (cycles per second)
 A 2 GHz processor can execute this cycle up to 2 billion times a second
 Not all cycles are the same though…
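The fetch, decode, execute loop can be sketched as a toy interpreter. This is only an illustration: real instructions are stored as bits, whereas here each memory cell holds a readable (opcode, operand) pair, and the four opcodes are invented for the example.

```python
def run(memory):
    """Run a toy stored program; return the accumulator when HALT is reached."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # a single accumulator register
    while True:
        instruction = memory[pc]        # fetch instruction from memory
        pc += 1
        opcode, operand = instruction   # decode instruction
        if opcode == "LOAD":            # execute instruction
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "JUMP":
            pc = operand
        elif opcode == "HALT":
            return acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Hypothetical program: load 2, add 3, add 5, stop.
program = [("LOAD", 2), ("ADD", 3), ("ADD", 5), ("HALT", 0)]
result = run(program)
print(result)
```

Each pass through the `while` loop is one trip around the cycle; a JUMP simply changes which instruction is fetched next.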
Organization





Capabilities & Performance Characteristics of Principal Functional Units (FUs)
 (e.g., Registers, ALU, Shifters, Logic Units, ...)
Ways in which these components are interconnected
Information flows between components
Logic and means by which such information flow is controlled
Choreography of FUs to realize the ISA
Logic Designer's View
ISA Level
FUs & Interconnect
Instruction Set Architecture
... the attributes of a [computing] system as seen by the programmer, i.e. the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation.
– Amdahl, Blaauw, and Brooks, 1964
-- Organization of Programmable Storage
-- Data Types & Data Structures: Encodings & Representations
-- Instruction Set
-- Instruction Formats
-- Modes of Addressing and Accessing Data Items and Instructions
-- Exceptional Conditions
The Instruction Set: a Critical Interface
instruction set

What is an example of an Instruction Set architecture?
Forces on Computer Architecture
Computer Architecture is shaped by:
 Technology
 Programming Languages
 Applications
 Operating Systems
 History
 Cleverness
Technology
DRAM chip capacity
Year    Size
1980    64 Kb
1983    256 Kb
1986    1 Mb
1989    4 Mb
1992    16 Mb
1996    64 Mb
1999    256 Mb
2002    1 Gb
2007    2 Gb
2009    4 Gb
[Figure: Microprocessor Logic Density, by uP name]
In ~1985 the single-chip processor (32-bit) and the single-board
computer emerged
 => workstations, personal computers, multiprocessors have
been riding this wave since
Now, we have multicore processors
Technology => dramatic change




Processor
 logic capacity: about 30% per year
 clock rate: about 20% per year
Memory
 DRAM capacity: about 60% per year (4x every 3 years)
 Memory speed: about 10% per year
 Cost per bit: improves about 25% per year
Disk
 capacity: about 60% per year
 Total use of data: 100% per 9 months!
Network Bandwidth
 Bandwidth increasing more than 100% per year!
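Annual percentage rates like these compound. A small sketch of the arithmetic, using the rates quoted above (the helper names are illustrative):

```python
import math


def growth_factor(rate_per_year, years):
    """Total multiplier after compounding an annual growth rate."""
    return (1.0 + rate_per_year) ** years


def doubling_time_years(rate_per_year):
    """Years needed to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)


# DRAM capacity at 60% per year over 3 years: close to the quoted "4x every 3 years".
dram_3yr = growth_factor(0.60, 3)

# Processor logic capacity at 30% per year: how long until it doubles?
logic_doubling = doubling_time_years(0.30)

print(f"DRAM over 3 years: {dram_3yr:.2f}x")
print(f"Logic capacity doubles every {logic_doubling:.1f} years")
```

The same two functions apply to any of the rates on this slide, for example memory speed at 10% per year.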
Performance Trends
Processor Transistor Count
(from http://en.wikipedia.org/wiki/Transistor_count)
Processor                    Transistor count   Date of introduction   Manufacturer
Intel 4004                   2300               1971                   Intel
Intel 8008                   2500               1972                   Intel
Intel 8080                   4500               1974                   Intel
Intel 8088                   29 000             1978                   Intel
Intel 80286                  134 000            1982                   Intel
Intel 80386                  275 000            1985                   Intel
Intel 80486                  1 200 000          1989                   Intel
Pentium                      3 100 000          1993                   Intel
AMD K5                       4 300 000          1996                   AMD
Pentium II                   7 500 000          1997                   Intel
AMD K6                       8 800 000          1997                   AMD
Pentium III                  9 500 000          1999                   Intel
AMD K6-III                   21 300 000         1999                   AMD
AMD K7                       22 000 000         1999                   AMD
Pentium 4                    42 000 000         2000                   Intel
Itanium                      25 000 000         2001                   Intel
Barton                       54 300 000         2003                   AMD
AMD K8                       105 900 000        2003                   AMD
Itanium 2                    220 000 000        2003                   Intel
Itanium 2 with 9MB cache     592 000 000        2004                   Intel
Cell                         241 000 000        2006                   Sony/IBM/Toshiba
Core 2 Duo                   291 000 000        2006                   Intel
Core 2 Quad                  582 000 000        2006                   Intel
Dual-Core Itanium 2          1 700 000 000      2006                   Intel
Quad-Core Itanium            2 000 000 000      200                    Intel
Processor-Memory Speed Gap
[Figure: Performance (log scale, 1 to 1000) vs. year, 1980 to 2000. µProc: 50%/yr. (“Moore’s Law”). DRAM: 9%/yr. (2X/10 yrs). Processor-Memory Performance Gap: grows 50% / year.]
Latency vs. Throughput
Memory bottleneck


CPU can execute dozens of instructions in the time it takes to retrieve one item from memory
Solution: Memory Hierarchy
 Use fast memory
 Registers
 Cache memory
 Rule: small memory is fast, large memory is slow
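One common way to quantify the benefit of a memory hierarchy is the average memory access time: hit time plus miss rate times miss penalty. The formula is a standard textbook one rather than something stated on this slide, and the cycle counts below are assumed purely for illustration:

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate x miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Assumed numbers, in clock cycles: a 1-cycle cache that misses 5% of the time,
# in front of a main memory that takes 100 cycles to answer.
with_cache = average_access_time(hit_time=1, miss_rate=0.05, miss_penalty=100)

# No cache: every access pays the full main-memory latency.
without_cache = average_access_time(hit_time=100, miss_rate=0.0, miss_penalty=0)

print(f"with cache: {with_cache} cycles, without: {without_cache} cycles")
```

Under these assumptions a small fast memory in front of a large slow one makes the average access look much closer to the fast memory than to the slow one.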
A great idea in computer science




Temporal locality
 Programs tend to access data that has been accessed
recently (i.e. close in time)
Spatial locality
 Programs tend to access data at an address near recently
referenced data (i.e. close in space)
Useful in graphics and virtual reality as well
 Realistic images require significant computational power
 Don’t need to represent distant objects as well
Efficient distributed systems rely on locality
 Memory access time increases over a network
 Want to access data on local machine
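The effect of locality on a cache can be seen in a small simulation. This is a sketch under assumed parameters (an 8-line cache with 4 addresses per line and least-recently-used replacement), not a model of any real processor:

```python
import random
from collections import OrderedDict


def hit_rate(addresses, cache_lines=8, line_size=4):
    """Fraction of accesses served by a small LRU cache of fixed-size lines."""
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    hits = 0
    for addr in addresses:
        line = addr // line_size       # neighbouring addresses share a line
        if line in cache:
            hits += 1
            cache.move_to_end(line)    # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(addresses)


sequential = list(range(1024))                 # spatial locality
repeated = [i % 16 for i in range(1024)]       # temporal locality
rng = random.Random(0)
scattered = [rng.randrange(1_000_000) for _ in range(1024)]  # little locality

for name, trace in [("sequential", sequential), ("repeated", repeated), ("scattered", scattered)]:
    print(f"{name}: {hit_rate(trace):.2f}")
```

The sequential trace misses only on the first address of each line, the repeated trace misses only while its small working set is first loaded, and the scattered trace almost never hits.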
Microprocessor Generations




First generation: 1971-78
 Behind the power curve (16-bit, <50k transistors)
Second generation: 1979-85
 Becoming “real” computers (32-bit, >50k transistors)
Third generation: 1985-89
 Challenging the “establishment” (Reduced Instruction Set Computer/RISC, >100k transistors)
Fourth generation: 1990 onward
 Architectural and performance leadership (64-bit, >1M transistors, Intel/AMD translate into RISC internally)
In the beginning (8-bit) Intel 4004






First general-purpose, single-chip microprocessor
Shipped in 1971
8-bit architecture, 4-bit implementation
2,300 transistors
Performance < 0.1 MIPS (Million Instructions Per Sec)
8008: 8-bit implementation in 1972
 3,500 transistors
 First microprocessor-based computer (Micral)
• Targeted at laboratory instrumentation
• Mostly sold in Europe
All chip photos in this talk courtesy of Michael W. Davidson and The Florida State University
1st Generation (16-bit) Intel 8086

Introduced in 1978
New 16-bit architecture
 Performance < 0.5 MIPS
 “Assembly language” compatible with 8080
 29,000 transistors
 Includes memory protection, support for Floating Point coprocessor
In 1981, IBM introduces PC
 Based on 8088, the 8-bit bus version of 8086
2nd Generation (32-bit) Motorola 68000

Major architectural step in microprocessors:
 First 32-bit architecture
• initial 16-bit implementation
 First flat 32-bit address
• Support for paging
 General-purpose register architecture
• Loosely based on PDP-11 minicomputer
First implementation in 1979
 68,000 transistors
 < 1 MIPS (Million Instructions Per Second)
Used in
 Apple Mac
 Sun, Silicon Graphics, & Apollo workstations
3rd Generation: MIPS R2000

Several firsts:
 First (commercial) RISC microprocessor
 First microprocessor to provide integrated support for instruction & data cache
 First pipelined microprocessor (sustains 1 instruction/clock)
Implemented in 1985
 125,000 transistors
 5-8 MIPS (Million Instructions per Second)
4th Generation (64 bit) MIPS R4000


First 64-bit architecture
Integrated caches
 On-chip
 Support for off-chip, secondary cache
Integrated floating point
Implemented in 1991:
 Deep pipeline
 1.4M transistors
 Initially 100MHz
 > 50 MIPS
Intel translates 80x86/Pentium X instructions into RISC internally
Key Architectural Trends


Increase performance at 1.6x per year (2X/1.5yr)
 True from 1985-present
Combination of technology and architectural enhancements
 Technology provides faster transistors (∝ 1/lithographic feature size) and more of them
 Faster transistors lead to higher clock rates
 More transistors (“Moore’s Law”):
• Architectural ideas turn transistors into performance

– Responsible for about half the yearly performance growth
Two key architectural directions
 Sophisticated memory hierarchies
 Exploiting instruction level parallelism
Where have all the transistors gone?

Superscalar (multiple instructions per clock cycle)
• 3 levels of cache
• Branch prediction (predict outcome of decisions)
• Out-of-order execution (executing instructions in different order than programmer wrote them)
[Figure: Intel Pentium III die photo (10M transistors), with labeled regions: Execution, 2 Bus Intf, D cache, TLB, Out-Of-Order, branch, Icache, SS]
Laws?

Define each of the following. What has its effect been on the
advancement of computing technology?

Moore’s Law

Amdahl’s Law

Metcalfe’s Law