Lect2-performance - ODU Computer Science

Download Report

Transcript Lect2-performance - ODU Computer Science

1
CPS3340
COMPUTER ARCHITECTURE
Fall Semester, 2013
Lecture 2: Computer Performance
Instructor: Ashraf Yaseen
08/29/2013
DEPARTMENT OF MATH & COMPUTER SCIENCE
CENTRAL STATE UNIVERSITY, WILBERFORCE, OH
Review
2

Last Class




This Class






Syllabus
Classes of Computers
Decimal, Binary, Octal, Hexadecimal Representations
Program and Computer
Compiler, Assembler, and Linker
Components of a Computer
Definition of Computer Performance
Measure of Computer Performance
Next Class



Quiz
Assignment 1
Power Wall
Understanding Computer Performance
3

Algorithm


Programming language, compiler, architecture


Determine number of machine instructions executed per
operation
Processor and memory system


Determines number of operations executed
Determine how fast instructions are executed
I/O system (including OS)

Determines how fast I/O operations are executed

Application software


Written in high-level language
System software
Compiler: translates High Level
Language code to machine code
 Operating System: service code
 Handling input/output
 Managing memory and storage
 Scheduling tasks & sharing resources


Hardware

Processor, memory, I/O controllers
§1.2 Below Your Program
Below Your Program
Levels of Program Code
5

High-level language
Level of abstraction closer to
problem domain
 Provides for productivity and
portability


Assembly language


Symbolic representation of
instructions
Hardware representation
Binary digits (bits)
 Encoded instructions and data

Compiler
6

Function of Compiler
 Convert
programs in high-level language to programs
in assembly language
Example: C Compiler
7
C program
Assembly Program
Assembler
8

Assembler
 Translates

assembly language into binary instructions
Assembly Language
 Use
symbols instead of 0’s and 1’s
 More readable
Binary Instructions
9
MIPS binary code for summing 0 to 100 square
Linker
10

Separate Compilation
 Allows
a program to be split into pieces that are stored
in different files
 Each file contains a logically related collection of
subroutines and data structures that form a module
 Can
be compiled separately
 Can be reused

Linker
 Merge
Modules together
Functions of a Linker
11
Tasks of a Linker
12



Search the program libraries to find library routines
used by the program
Determine the memory locations that code from
each module will occupy and relocates its
instructions by adjusting absolute references
Resolves references among modules
 Matching
references
Example: gcc compiler
13


Compile a simple program
gcc –v test.c

Same components for
all kinds of computer


Input/output includes
User-interface devices
 Display, keyboard, mouse
 Storage devices
 Hard disk, CD/DVD, flash
 Network adapters
 For communicating with other
computers

The BIG Picture
Desktop, server,
embedded
§1.3 Under the Covers
Components of a Computer
Anatomy of a Computer
15
Output
device
Network
cable
Input
device
Input
device
Opening the Box
16
Inside the Processor (CPU)
17



Datapath: performs operations on data
Control: sequences datapath, memory, ...
Cache memory
 Small
fast SRAM memory for immediate access to data
Inside the Processor
18

AMD Barcelona: 4 processor cores
Abstractions
The BIG Picture
19

Abstraction helps us deal with complexity
 Hide

Instruction set architecture (ISA)
 The

hardware/software interface
Application binary interface
 The

lower-level detail
ISA plus system software interface
Implementation
 The
details underlying and interface
A Safe Place for Data

Volatile main memory


Loses instructions and data when power off
Non-volatile secondary memory



Magnetic disk
Flash memory
Optical disk (CDROM, DVD)
Networks
21


Communication and resource sharing
Local area network (LAN): Ethernet
 Within


a building
Wide area network (WAN: the Internet)
Wireless network: WiFi, Bluetooth
Technology Trends
22

Electronics technology
continues to evolve
DRAM capacity
Increased capacity and
performance
 Reduced cost

Year
Technology
1951
Vacuum tube
1965
Transistor
1975
Integrated circuit (IC)
1995
Very large scale IC (VLSI)
2005
Ultra large scale IC
Relative performance/cost
1
35
900
2,400,000
6,200,000,000
Computer Performance
23
24

Which airplane has the best performance?
Boeing 777
Boeing 777
Boeing 747
Boeing 747
BAC/Sud
Concorde
BAC/Sud
Concorde
Douglas
DC-8-50
Douglas DC8-50
0
100
200
300
400
0
500
Boeing 777
Boeing 777
Boeing 747
Boeing 747
BAC/Sud
Concorde
BAC/Sud
Concorde
Douglas
DC-8-50
Douglas DC8-50
500
1000
Cruising Speed (mph)
4000
6000
8000 10000
Cruising Range (miles)
Passenger Capacity
0
2000
1500
0
100000 200000 300000 400000
Passengers x mph
§1.4 Performance
An Analogy
Answer
25

That depends on …
 If
performance means
 “the
least time of transferring 1 passenger from one place
to another”

Concorde
 “the
least time of transferring 450 passenger from one
place to another”


Boeing 747
Performance can be defined in different ways
Response Time and Throughput
26

Response time (AKA Execution Time)


Total time required for a computer to complete a task
 Measured by time
Throughput (AKA Bandwidth)

Number of tasks done work done per unit time
 e.g., tasks/transactions/… per hour
Response Time and Throughput
27

Assuming each task in a computer is a serial task.
How are response time and throughput affected by
Replacing with a faster processor?
 Reduce response time
 Increase throughput
 Adding more processors?
 Increase throughput
 Same response time


We’ll focus on response time for now…
Performance and Execution Time
28

Performance
Performanc eX  1 Execution time X
Relative Performance
29

“X is n time faster than Y”
Performanc e X Performanc e Y
 Execution time Y Execution time X  n

Example: time taken to run a program



10s on A, 15s on B
Execution TimeB / Execution TimeA
= 15s / 10s = 1.5
So A is 1.5 times faster than B
Measuring Execution Time
30

Elapsed (Wallclock) time
Total response time, including all aspects
 Processing, I/O, OS overhead, idle time
 Determines system performance


CPU time
Time spent processing a given job
 Discounts I/O time, other jobs’ shares
 Comprises user CPU time and system CPU time
 User CPU time: CPU time spent in a program itself
 System CPU time: CPU time spent in the OS performing task
on behalf of the program
 Different programs are affected differently by CPU and
system performance

CPU Clocking
31

Operation of digital hardware governed by a
constant-rate clock
Clock period
Clock (cycles)
Data transfer
and computation
Update state

Clock period: duration of a clock cycle


e.g., 250ps = 0.25ns = 250×10–12s
Clock frequency (rate): cycles per second

e.g., 4.0GHz = 4000MHz = 4.0×109Hz
CPU Time
32
CPU Time  CPU Clock Cycles  Clock Cycle Time
CPU Clock Cycles

Clock Rate
Performance Improvement
33

Performance improved by either

Increasing clock rate
=> Shorter clock period
 => More but shorter instructions
 => More clock cycles


Reducing number of clock cycles
=> Longer clock period
 => Less but Longer Instructions
 => Reducing clock rate


Hardware designer must often trade off clock rate
against cycle count
CPU Time Example
34


A Program on Computer A: 2GHz clock, 10s CPU time
Designing Computer B



Aim for 6s CPU time
Can do faster clock, but causes 1.2 × clock cycles
How fast must Computer B clock be?
Clock Cycles B 1.2  Clock Cycles A
Clock Rate B 

CPU Time B
6s
Clock Cycles A  CPU Time A  Clock Rate A
 10s  2GHz  20  109
1.2  20  109 24  109
Clock Rate B 

 4GHz
6s
6s
Instruction Set Architecture
35

Instruction Set Architecture (ISA)
 An
abstract interface between the hardware and the
lowest-level software that encompasses all the
information necessary to write a machine language
program that will run correctly
 Repertoire
of instructions
 Registers
 Memory
 I/O
access
Clock Cycles per Instruction (CPI)
36

Clock Cycles per Instruction (CPI)
 Average
program
number of clock cycles per instruction for a
Instruction Count and CPI
37
Clock Cycles  Instructio n Count  Cycles per Instructio n
CPU Time  Instructio n Count  CPI  Clock Cycle Time
Instructio n Count  CPI

Clock Rate

Instruction Count (IC) for a program


Determined by program, ISA and compiler
Average cycles per instruction
Determined by CPU hardware
 If different instructions have different CPI
 Average CPI affected by instruction mix

CPI Example
38




Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster, and by how much?
CPU Time
CPU Time
A
 Instructio n Count  CPI  Cycle Time
A
A
 I  2.0  250ps  I  500ps
A is faster…
B
 Instructio n Count  CPI  Cycle Time
B
B
 I  1.2  500ps  I  600ps
B  I  600ps  1.2
CPU Time
I  500ps
A
CPU Time
…by this much
CPI in More Detail
39

If different instruction classes take different numbers
of cycles
n
Clock Cycles   (CPIi  Instructio n Count i )
i1

Weighted average CPI
n
Clock Cycles
Instructio n Count i 

CPI 
   CPIi 

Instructio n Count i1 
Instructio n Count 
Relative frequency
CPI Example
40


Alternative compiled code sequences using instructions
in classes A, B, C
Class
A
B
C
CPI for class
1
2
3
IC in sequence 1
2
1
2
IC in sequence 2
4
1
1
Sequence 1: IC = 5


Clock Cycles
= 2×1 + 1×2 + 2×3
= 10
Avg. CPI = 10/5 = 2.0

Sequence 2: IC = 6


Clock Cycles
= 4×1 + 1×2 + 1×3
=9
Avg. CPI = 9/6 = 1.5
Summary
41







Performance of a Computer
Compiler
Assembler
Linker
Components of a Computer
Response Time and Throughput
Performance Measure



CPI
IC
Performance Definition
What I want you to do
42


Review Chapter 1 and Class Slides
Prepare for your first Quiz