
Fundamentals of Computer Design
Outline
• Performance Evolution
• The Task of a Computer Designer
• Technology and Computer Usage Trends
• Cost and Trends in Cost
• Measuring and Reporting Performance
• Quantitative Principles of Computer Design
Computer Architecture Is …
• The attributes of a [computing] system as seen by the programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flows and controls, the logic design, and the physical implementation. (Amdahl, Blaauw, and Brooks, 1964)
Computer Architecture’s
Changing Definition
• 1950s to 1960s: Computer Architecture Course
– Computer Arithmetic
• 1970s to mid 1980s: Computer Architecture Course
– Instruction Set Design, especially ISAs appropriate for compilers
[An ISA includes the instruction set, word size, memory addressing modes, processor registers, and address and data formats]
• 1990s to 2000s: Computer Architecture Course
– Design of the CPU, memory system, I/O system, Multiprocessors
Performance Evolution
• 1970s
– Performance improved 25-30%/yr
– Mostly due to improved architecture + some technology aids
• 1980s
– VLSI + the microprocessor became the foundation
– Technology improves at 35%/yr
– Mostly with UNIX and C in the mid-80s
• Even most system programmers gave up assembly language
• With this came the need for efficient compilers
Performance Evolution (Cont.)
• 1980s (Cont.)
– Compiler focus brought on the great CISC vs. RISC debate
• With the exception of Intel, RISC won the argument
• RISC performance improved by 50%/yr initially
• Of course RISC is not as simple anymore, and the compiler is a key part of the game
– It does not matter how fast your computer is if the compiler wastes most of it through an inability to generate efficient code
– With the exploitation of instruction-level parallelism (pipelining + superscalar execution) and the use of caches, performance is further enhanced
CISC: Complex Instruction Set Computing
RISC: Relegate Important Stuff to the Compiler (Reduced Instruction Set Computing)
Growth in Performance (Figure 1.1)
[Figure: growth in microprocessor performance over time. Early gains were technology driven; later gains are mainly due to advanced architecture ideas. MIPS: Microprocessor without Interlocked Pipeline Stages; IBM: International Business Machines; HP: Hewlett-Packard; DEC: Digital Equipment Corporation.]
The Changing Face of Computing and the Task of the Computer Designer
• Changes in computer use have led to three different computing markets
• Each is characterized by different applications, requirements, and computing technologies:
– Desktop Computing
– Servers
– Embedded Computers
Desktop Computing
• The first, and still the largest market in dollar terms, is desktop computing.
• Desktop systems are often where the newest, highest-performance microprocessors appear.
• More recently, cost-reduced microprocessors and systems have also appeared.
Servers
• For servers, different characteristics are important:
– Availability
• Means the system can reliably and effectively provide service.
– Scalability
• Server systems often grow over their lifetime in response to growing demand for the services they support or an increase in functional requirements.
• The ability to scale up the computing capacity, the memory, the storage, and the I/O bandwidth of a server.
– Efficient Throughput
• That is, the overall performance of the server, in terms of transactions per minute or web pages served per second.
Embedded Computers
• Embedded computers have the widest range of processing power and cost.
• The performance requirement in an embedded application is often a real-time requirement.
– A real-time performance requirement is one where a segment of the application has an absolute maximum execution time that is allowed.
• Two other key characteristics exist in many embedded applications:
– The need to minimize memory
– The need to minimize power
The Task of A Computer Designer
The Task of A Computer Designer
• The task the computer designer faces is a complex one:
– Determine what attributes are important for a new machine.
– Then design a machine to maximize performance.
• This task has many aspects, including
» Instruction set design
» Functional organization
» Logic design and implementation
The Task of A Computer Designer (Cont.)
• Instruction set architecture refers to the actual programmer-visible instruction set.
– The instruction set architecture serves as the boundary between the software and the hardware.
• The term organization includes the high-level aspects of a computer design, such as
– The memory system,
– The bus structure, and
– The design of the internal CPU
(CPU: where arithmetic, logic, branching, and data transfer are implemented)
The Task of A Computer Designer (Cont.)
• Hardware is used to refer to the specifics of a machine, including:
– The detailed logic design
– The packaging technology of the machine
• Computer architects must design a computer to meet functional requirements as well as price, power, and performance goals.
• In addition to performance and cost, the designer must be aware of important trends in both the implementation technology and the use of computers.
Technology Trends
Technology Trends
• To plan for the evolution of a machine, the designer must be especially aware of rapidly occurring changes in implementation technology.
• Four implementation technologies, which change at a dramatic pace, are critical to modern implementations:
– Integrated Circuit Logic Technology
– Semiconductor DRAM
– Magnetic Disk Technology
– Network Technology
Technology Trends (Cont.)
• Integrated Circuits
– Density increases at 35%/yr
– Die size increases 10%-20%/yr
– The combination is a chip complexity growth rate of 55%/yr
– Transistor speed increases are similar, but signal propagation does not track this curve, so clock rates don't go up as fast
• Semiconductor DRAM
– Density quadruples every 3 years (approx. 60%/yr) [4x steps]
– Cycle time decreases slowly: 33% in 10 years
– Interface changes have improved bandwidth
Technology Trends (Cont.)
• Magnetic Disk
– Currently density improves at 100%/yr
– Access time has improved by 33% in 10 years
• Network Technology
– Depends on the performance of both switches and the transmission system
– 1 Gb Ethernet became available about 5 years after 100 Mb Ethernet
– Bandwidth doubles every year
Cost, Price, and Their Trends
Cost
• Clearly a marketplace issue: profit as a function of volume
• Price is what you sell a finished good for; cost is the amount spent to produce it.
• Let's focus on hardware costs
• Factors impacting cost:
– Learning curve: manufacturing costs decrease over time
– Yield: the percentage of manufactured devices that survives the testing procedure
– Volume is also a key factor in determining cost
– Commodities are products that are sold by multiple vendors in large volumes and are essentially identical
Integrated Circuit Costs
Cost of an Integrated Circuit
• The cost of a packaged integrated circuit is

Cost of IC = (Cost of die + Cost of testing die + Cost of packaging and final test) / Final test yield

Cost of die = Cost of wafer / (Dies per wafer × Die yield)

Dies per wafer = [π × (Wafer diameter / 2)²] / Die area − (π × Wafer diameter) / (2 × Die area)^0.5
Cost of an Integrated Circuit
• The fraction or percentage of good dies on a wafer (die yield):

Die yield = Wafer yield × [1 + (Defects per unit area × Die area) / α]^(−α)

where α is a parameter that corresponds roughly to the number of masking levels, a measure of manufacturing complexity, critical to die yield (α = 4.0 is a good estimate).

Die cost goes roughly with (die area)^5
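The formulas above can be combined into a small calculator. This is a minimal sketch of the slide's die-cost model; the wafer cost, wafer diameter, die area, and defect density below are hypothetical values chosen only for illustration.

```python
import math

# Sketch of the slide's IC cost model. All parameter values are
# hypothetical, for illustration only.

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Dies per wafer = pi*(d/2)^2 / A  -  pi*d / sqrt(2*A)."""
    r = wafer_diameter_cm / 2.0
    return (math.pi * r * r) / die_area_cm2 \
        - (math.pi * wafer_diameter_cm) / math.sqrt(2.0 * die_area_cm2)

def die_yield(wafer_yield, defects_per_cm2, die_area_cm2, alpha=4.0):
    """Die yield = Wafer yield * (1 + defects*area/alpha)^(-alpha)."""
    return wafer_yield * (1.0 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

def die_cost(wafer_cost, wafer_diameter_cm, die_area_cm2,
             wafer_yield=1.0, defects_per_cm2=0.8, alpha=4.0):
    """Cost of die = Cost of wafer / (Dies per wafer * Die yield)."""
    n = dies_per_wafer(wafer_diameter_cm, die_area_cm2)
    y = die_yield(wafer_yield, defects_per_cm2, die_area_cm2, alpha)
    return wafer_cost / (n * y)

# Hypothetical example: $3500 wafer, 30 cm diameter, 1 cm^2 die.
print(round(die_cost(3500.0, 30.0, 1.0), 2))
```

Doubling the die area in this sketch cuts dies per wafer roughly in half and cuts die yield sharply as well, which is exactly why die cost grows much faster than die area.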
Cost versus Price
• The relationship between price and volume can increase the impact of changes in cost, especially at the low end of the market.
• Factors that determine price, and typical ranges for these factors:
– Direct costs (add 10% to 30%): costs directly related to making a product
• Labor, purchasing, scrap, warranty
– Gross margin (add 10% to 45%): the company's overhead that cannot be billed directly to one product
• R&D, marketing, sales, equipment maintenance, rental, financing cost, pretax profits, taxes
– Average discount to get list price (add 33% to 66%)
• Volume discounts and/or retailer markup
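These markups compound multiplicatively on top of component cost. Here is a minimal sketch of that buildup; the $500 component cost and the specific percentages are hypothetical, chosen from within the ranges the slide gives.

```python
# Sketch of how the slide's markups compound into list price.
# Component cost and percentages are hypothetical examples.

def list_price(component_cost, direct_pct=0.20,
               gross_margin_pct=0.30, discount_pct=0.50):
    """Each factor adds a percentage on top of the running total."""
    direct = component_cost * (1.0 + direct_pct)   # + direct costs
    asp = direct * (1.0 + gross_margin_pct)        # average selling price
    return asp * (1.0 + discount_pct)              # list price

# $500 in components -> $600 after direct costs -> $780 ASP -> $1170 list
print(round(list_price(500.0), 2))
```

Note that with these mid-range percentages the list price is well over twice the component cost, which is why small component-cost changes matter so much at the low end of the market.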
Cost/Price Illustration
Cost/Price for Different Kinds of Systems
[Bar chart: each system's price broken down (0-100%) into Component Costs, Direct Costs, Gross Margin, and Average Discount, for minicomputers (Mini), workstations (W/S), and PCs.]
Measuring and Reporting
Performance
Performance
• When can we say that one computer is faster than another?
– The user of a desktop machine may say a computer is faster when a program runs in less time.
– The computer center manager running a large server system may say a computer is faster when it completes more jobs in an hour.
• The computer user is interested in reducing response time
[The time between the start and the completion of an event, also referred to as execution time]
• The manager of a large data processing center may be interested in increasing throughput
[The total amount of work done in a given time]
Performance (Cont.)
• Suppose we want to relate the performance of two different machines, X and Y.
• In particular, "X is n times faster than Y" means

n = Execution Time_Y / Execution Time_X

• Execution time is the reciprocal of performance,
i.e., performance = 1 / execution time
• Improved performance means decreasing execution time
Measuring Performance
• Several kinds of time:
– Wall-clock time: response time, or elapsed time
[the latency to complete a task, including disk access, memory access, input/output activities, and operating system overhead]
– CPU time
• CPU time can be further divided into:
– User CPU time: CPU time spent in the program
– System CPU time: CPU time spent in the operating system performing tasks requested by the program
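The distinction above can be seen directly from a program. A minimal sketch using Python's standard `time` module: `time.perf_counter()` measures elapsed (wall-clock) time, while `time.process_time()` measures CPU time (user + system) for the current process; the two tasks below are hypothetical examples.

```python
import time

# Wall-clock time vs. CPU time, per the slide's definitions.

def measure(task):
    """Return (elapsed wall-clock seconds, CPU seconds) for task()."""
    wall0, cpu0 = time.perf_counter(), time.process_time()
    task()
    return time.perf_counter() - wall0, time.process_time() - cpu0

def io_bound():
    time.sleep(0.2)   # waiting elapses on the wall clock but burns no CPU

def cpu_bound():
    sum(i * i for i in range(200_000))   # pure computation burns CPU

wall, cpu = measure(io_bound)
print(f"io-bound:  wall={wall:.2f}s cpu={cpu:.2f}s")   # cpu stays near zero

wall, cpu = measure(cpu_bound)
print(f"cpu-bound: wall={wall:.2f}s cpu={cpu:.2f}s")   # wall and cpu are close
```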
Choosing Programs to Evaluate Performance
• Real applications: clearly the right choice
– Porting and eliminating system-dependent activities
– User burden: knowing which of your programs you really care about
• Modified (or scripted) applications
– Enhance portability or focus on particular aspects of system performance
• Kernels: small, key pieces of real programs
– Best used to isolate the performance of individual features, to explain the reasons for differences in the performance of real programs
– Not real programs, however: no user really runs them
Choosing Programs to Evaluate Performance (Cont.)
• Toy benchmarks: quicksort, puzzle
– Beginning programming assignments
• Synthetic benchmarks
– Try to match the average frequency of operations and operands of a large set of programs
– No user really runs them; they are not even pieces of real programs
– They typically reside in cache and don't test memory performance
– At the very least you must understand what the benchmark code is in order to understand what it might be measuring
– Companies thrive or bust on benchmark performance
• Hence they optimize for the benchmark
Benchmark Suites
• One of the most successful attempts to create standardized benchmark application suites has been SPEC.
• SPEC (Standard Performance Evaluation Corporation)
– http://www.spec.org
• Desktop benchmarks
• Server benchmarks
• Embedded benchmarks
Reporting Performance Results
• The guiding principle of reporting performance measurements should be reproducibility
– List everything another experimenter would need to duplicate the results.
– A SPEC benchmark report requires a fairly complete description of the machine and the compiler flags, as well as the publication of both the baseline and optimized results.
Quantitative Principles of
Computer Design
Make the Common Case Fast
• The most pervasive principle of design is to make the common case fast.
• In making a design trade-off, favor the frequent case over the infrequent case.
• Improving the frequent event, rather than the rare event, will obviously help performance.
• The frequent case is often simpler and can be done faster than the infrequent case.
Example:
– When adding two numbers in the CPU, we can expect overflow only in rare cases.
– Therefore, improve performance by optimizing the more common case of no overflow.
Amdahl’s Law
• Calculates the performance gain that can be obtained by improving some portion of a computer.
• It states that the performance improvement to be gained from using some faster mode of execution is limited by the fraction of the time the faster mode can be used.
• It also defines the speedup that can be gained by using a particular feature.
Amdahl’s Law (Cont.)
• Defines the speedup gained from a particular feature
• Depends on 2 factors:
– The fraction of the original computation time that can take advantage of the enhancement
– The level of improvement gained by the feature
• Amdahl's law:

Speedup_overall = 1 / [(1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced]
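The law is a one-line function. A minimal sketch:

```python
# Amdahl's Law: overall speedup from enhancing a fraction F of
# execution time by a factor S.

def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Speedup_overall = 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction_enhanced)
                  + fraction_enhanced / speedup_enhanced)

# Even an essentially infinite speedup of half the program
# at most doubles overall performance:
print(round(amdahl_speedup(0.5, 1e12), 3))
```

The last line illustrates the law's limiting behavior: as S grows without bound, the overall speedup approaches 1 / (1 − F), here 2.0.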
Simple Example
(Amdahl's Law says nothing about cost)
• Important application:
– FPSQRT accounts for 20% of execution time
– FP instructions account for 50%
– Other accounts for 30%
• Designers say it costs the same to speed up:
– FPSQRT by 40x
– FP by 2x
– Other by 8x
• In which one should you invest?
• Straightforward to plug in the numbers & compare, BUT what's your guess?
And the Winner Is…?
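Plugging the previous slide's numbers into Amdahl's Law (speedup = 1 / ((1 − F) + F/S)) settles it; this self-contained sketch and its labels are ours, not part of the original slides.

```python
# Which same-cost enhancement gives the best overall speedup?

def overall_speedup(fraction, factor):
    """Amdahl's Law: 1 / ((1 - F) + F / S)."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

options = {
    "FPSQRT by 40x (20% of time)": overall_speedup(0.20, 40.0),
    "FP by 2x (50% of time)":      overall_speedup(0.50, 2.0),
    "Other by 8x (30% of time)":   overall_speedup(0.30, 8.0),
}
for name, speedup in options.items():
    print(f"{name}: {speedup:.3f}")
```

Speeding up "Other" wins (about 1.356 vs. 1.333 for FP and 1.242 for FPSQRT): the huge 40x FPSQRT speedup touches only 20% of the time, so its overall payoff is the smallest of the three.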
Calculating CPU Performance
• Essentially all computers are constructed using a clock running at a constant rate.
– These discrete time events are called ticks, clock ticks, clock periods, clocks, cycles, or clock cycles.
• CPU time for a program can be expressed two ways:

CPU time = CPU clock cycles for a program × Clock cycle time

OR

CPU time = CPU clock cycles for a program / Clock rate
Calculating CPU Performance (Cont.)
• In addition to the number of clock cycles needed to execute a program, we can also count the number of instructions executed.
– This is the instruction path length, or instruction count (IC).
• Now we can calculate the average number of clock cycles per instruction (CPI):

CPI = CPU clock cycles for a program / IC

CPU time = IC × CPI × Clock cycle time = (IC × CPI) / Clock rate
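The CPU-time identity above is easy to exercise numerically. A minimal sketch; the instruction count, CPI, and clock rate are hypothetical values.

```python
# CPU time = IC * CPI * clock cycle time = (IC * CPI) / clock rate.

def cpu_time(instruction_count, cpi, clock_rate_hz):
    """Seconds to run a program, per the slide's formula."""
    return instruction_count * cpi / clock_rate_hz

# Hypothetical program: 1 billion instructions, CPI of 2, 1 GHz clock.
# 2e9 cycles at 1e9 cycles/s is 2 seconds.
print(cpu_time(1_000_000_000, 2.0, 1_000_000_000))  # -> 2.0
```

The formula also makes the trade-offs on the next slide concrete: halving CPI or doubling the clock rate each halve CPU time, but only if the change doesn't inflate one of the other two factors.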
Calculating CPU Performance (Cont.)
• As the formula demonstrates, CPU performance focuses on 3 factors: cycle time, CPI, and IC.
– Sadly, they are interdependent, and making one better often makes another worse (but with small or predictable impacts):
• Cycle time depends on hardware technology and organization
• CPI depends on organization (pipelining, caching, ...) and the ISA
• IC depends on the ISA and compiler technology
• Often CPIs are easier to deal with on a per-instruction-class basis:

CPU clock cycles = Σ (i = 1 to n) CPI_i × IC_i

Overall CPI = [Σ (i = 1 to n) CPI_i × IC_i] / Instruction count = Σ (i = 1 to n) CPI_i × (IC_i / Instruction count)
Calculating CPU Performance (Cont.)
• Sometimes it is useful in designing the CPU to calculate the number of total CPU clock cycles as

CPU clock cycles = Σ (i = 1 to n) CPI_i × IC_i

Overall CPI = [Σ (i = 1 to n) CPI_i × IC_i] / Instruction count = Σ (i = 1 to n) CPI_i × (IC_i / Instruction count)
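The per-class formulas above amount to a weighted average. A minimal sketch; the instruction mix and per-class CPIs below are hypothetical.

```python
# Overall CPI as a weighted average over instruction classes:
# total cycles = sum(CPI_i * IC_i); overall CPI = total cycles / total IC.

def overall_cpi(mix):
    """mix: list of (cpi_i, ic_i) pairs, one per instruction class."""
    total_cycles = sum(cpi * ic for cpi, ic in mix)
    total_ic = sum(ic for _, ic in mix)
    return total_cycles / total_ic

# Hypothetical mix: ALU ops (CPI 1), loads/stores (CPI 2), branches (CPI 3).
mix = [(1.0, 50_000), (2.0, 30_000), (3.0, 20_000)]
# (50k + 60k + 60k) cycles / 100k instructions = 1.7
print(overall_cpi(mix))  # -> 1.7
```

Weighting each class's CPI by its share of the instruction count, as the second formula does, gives the same answer and shows directly which classes dominate the cycle count.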