Lect2-performance - ODU Computer Science
Download
Report
Transcript Lect2-performance - ODU Computer Science
1
CPS3340
COMPUTER ARCHITECTURE
Fall Semester, 2013
Lecture 2: Computer Performance
Instructor: Ashraf Yaseen
08/29/2013
DEPARTMENT OF MATH & COMPUTER SCIENCE
CENTRAL STATE UNIVERSITY, WILBERFORCE, OH
Review
2
Last Class
This Class
Syllabus
Classes of Computers
Decimal, Binary, Octal, Hexadecimal Representations
Program and Computer
Compiler, Assembler, and Linker
Components of a Computer
Definition of Computer Performance
Measure of Computer Performance
Next Class
Quiz
Assignment 1
Power Wall
Understanding Computer Performance
3
Algorithm
Programming language, compiler, architecture
Determine number of machine instructions executed per
operation
Processor and memory system
Determines number of operations executed
Determine how fast instructions are executed
I/O system (including OS)
Determines how fast I/O operations are executed
Application software
Written in high-level language
System software
Compiler: translates High Level
Language code to machine code
Operating System: service code
Handling input/output
Managing memory and storage
Scheduling tasks & sharing resources
Hardware
Processor, memory, I/O controllers
§1.2 Below Your Program
Below Your Program
Levels of Program Code
5
High-level language
Level of abstraction closer to
problem domain
Provides for productivity and
portability
Assembly language
Symbolic representation of
instructions
Hardware representation
Binary digits (bits)
Encoded instructions and data
Compiler
6
Function of Compiler
Convert
programs in high-level language to programs
in assembly language
Example: C Compiler
7
C program
Assembly Program
Assembler
8
Assembler
Translates
assembly language into binary instructions
Assembly Language
Use
symbols instead of 0’s and 1’s
More readable
Binary Instructions
9
MIPS binary code for summing 0 to 100 square
Linker
10
Separate Compilation
Allows
a program to be split into pieces that are stored
in different files
Each file contains a logically related collection of
subroutines and data structures that form a module
Can
be compiled separately
Can be reused
Linker
Merge
Modules together
Functions of a Linker
11
Tasks of a Linker
12
Search the program libraries to find library routines
used by the program
Determine the memory locations that code from
each module will occupy and relocates its
instructions by adjusting absolute references
Resolves references among modules
Matching
references
Example: gcc compiler
13
Compile a simple program
gcc –v test.c
Same components for
all kinds of computer
Input/output includes
User-interface devices
Display, keyboard, mouse
Storage devices
Hard disk, CD/DVD, flash
Network adapters
For communicating with other
computers
The BIG Picture
Desktop, server,
embedded
§1.3 Under the Covers
Components of a Computer
Anatomy of a Computer
15
Output
device
Network
cable
Input
device
Input
device
Opening the Box
16
Inside the Processor (CPU)
17
Datapath: performs operations on data
Control: sequences datapath, memory, ...
Cache memory
Small
fast SRAM memory for immediate access to data
Inside the Processor
18
AMD Barcelona: 4 processor cores
Abstractions
The BIG Picture
19
Abstraction helps us deal with complexity
Hide
Instruction set architecture (ISA)
The
hardware/software interface
Application binary interface
The
lower-level detail
ISA plus system software interface
Implementation
The
details underlying and interface
A Safe Place for Data
Volatile main memory
Loses instructions and data when power off
Non-volatile secondary memory
Magnetic disk
Flash memory
Optical disk (CDROM, DVD)
Networks
21
Communication and resource sharing
Local area network (LAN): Ethernet
Within
a building
Wide area network (WAN: the Internet)
Wireless network: WiFi, Bluetooth
Technology Trends
22
Electronics technology
continues to evolve
DRAM capacity
Increased capacity and
performance
Reduced cost
Year
Technology
1951
Vacuum tube
1965
Transistor
1975
Integrated circuit (IC)
1995
Very large scale IC (VLSI)
2005
Ultra large scale IC
Relative performance/cost
1
35
900
2,400,000
6,200,000,000
Computer Performance
23
24
Which airplane has the best performance?
Boeing 777
Boeing 777
Boeing 747
Boeing 747
BAC/Sud
Concorde
BAC/Sud
Concorde
Douglas
DC-8-50
Douglas DC8-50
0
100
200
300
400
0
500
Boeing 777
Boeing 777
Boeing 747
Boeing 747
BAC/Sud
Concorde
BAC/Sud
Concorde
Douglas
DC-8-50
Douglas DC8-50
500
1000
Cruising Speed (mph)
4000
6000
8000 10000
Cruising Range (miles)
Passenger Capacity
0
2000
1500
0
100000 200000 300000 400000
Passengers x mph
§1.4 Performance
An Analogy
Answer
25
That depends on …
If
performance means
“the
least time of transferring 1 passenger from one place
to another”
Concorde
“the
least time of transferring 450 passenger from one
place to another”
Boeing 747
Performance can be defined in different ways
Response Time and Throughput
26
Response time (AKA Execution Time)
Total time required for a computer to complete a task
Measured by time
Throughput (AKA Bandwidth)
Number of tasks done work done per unit time
e.g., tasks/transactions/… per hour
Response Time and Throughput
27
Assuming each task in a computer is a serial task.
How are response time and throughput affected by
Replacing with a faster processor?
Reduce response time
Increase throughput
Adding more processors?
Increase throughput
Same response time
We’ll focus on response time for now…
Performance and Execution Time
28
Performance
Performanc eX 1 Execution time X
Relative Performance
29
“X is n time faster than Y”
Performanc e X Performanc e Y
Execution time Y Execution time X n
Example: time taken to run a program
10s on A, 15s on B
Execution TimeB / Execution TimeA
= 15s / 10s = 1.5
So A is 1.5 times faster than B
Measuring Execution Time
30
Elapsed (Wallclock) time
Total response time, including all aspects
Processing, I/O, OS overhead, idle time
Determines system performance
CPU time
Time spent processing a given job
Discounts I/O time, other jobs’ shares
Comprises user CPU time and system CPU time
User CPU time: CPU time spent in a program itself
System CPU time: CPU time spent in the OS performing task
on behalf of the program
Different programs are affected differently by CPU and
system performance
CPU Clocking
31
Operation of digital hardware governed by a
constant-rate clock
Clock period
Clock (cycles)
Data transfer
and computation
Update state
Clock period: duration of a clock cycle
e.g., 250ps = 0.25ns = 250×10–12s
Clock frequency (rate): cycles per second
e.g., 4.0GHz = 4000MHz = 4.0×109Hz
CPU Time
32
CPU Time CPU Clock Cycles Clock Cycle Time
CPU Clock Cycles
Clock Rate
Performance Improvement
33
Performance improved by either
Increasing clock rate
=> Shorter clock period
=> More but shorter instructions
=> More clock cycles
Reducing number of clock cycles
=> Longer clock period
=> Less but Longer Instructions
=> Reducing clock rate
Hardware designer must often trade off clock rate
against cycle count
CPU Time Example
34
A Program on Computer A: 2GHz clock, 10s CPU time
Designing Computer B
Aim for 6s CPU time
Can do faster clock, but causes 1.2 × clock cycles
How fast must Computer B clock be?
Clock Cycles B 1.2 Clock Cycles A
Clock Rate B
CPU Time B
6s
Clock Cycles A CPU Time A Clock Rate A
10s 2GHz 20 109
1.2 20 109 24 109
Clock Rate B
4GHz
6s
6s
Instruction Set Architecture
35
Instruction Set Architecture (ISA)
An
abstract interface between the hardware and the
lowest-level software that encompasses all the
information necessary to write a machine language
program that will run correctly
Repertoire
of instructions
Registers
Memory
I/O
access
Clock Cycles per Instruction (CPI)
36
Clock Cycles per Instruction (CPI)
Average
program
number of clock cycles per instruction for a
Instruction Count and CPI
37
Clock Cycles Instructio n Count Cycles per Instructio n
CPU Time Instructio n Count CPI Clock Cycle Time
Instructio n Count CPI
Clock Rate
Instruction Count (IC) for a program
Determined by program, ISA and compiler
Average cycles per instruction
Determined by CPU hardware
If different instructions have different CPI
Average CPI affected by instruction mix
CPI Example
38
Computer A: Cycle Time = 250ps, CPI = 2.0
Computer B: Cycle Time = 500ps, CPI = 1.2
Same ISA
Which is faster, and by how much?
CPU Time
CPU Time
A
Instructio n Count CPI Cycle Time
A
A
I 2.0 250ps I 500ps
A is faster…
B
Instructio n Count CPI Cycle Time
B
B
I 1.2 500ps I 600ps
B I 600ps 1.2
CPU Time
I 500ps
A
CPU Time
…by this much
CPI in More Detail
39
If different instruction classes take different numbers
of cycles
n
Clock Cycles (CPIi Instructio n Count i )
i1
Weighted average CPI
n
Clock Cycles
Instructio n Count i
CPI
CPIi
Instructio n Count i1
Instructio n Count
Relative frequency
CPI Example
40
Alternative compiled code sequences using instructions
in classes A, B, C
Class
A
B
C
CPI for class
1
2
3
IC in sequence 1
2
1
2
IC in sequence 2
4
1
1
Sequence 1: IC = 5
Clock Cycles
= 2×1 + 1×2 + 2×3
= 10
Avg. CPI = 10/5 = 2.0
Sequence 2: IC = 6
Clock Cycles
= 4×1 + 1×2 + 1×3
=9
Avg. CPI = 9/6 = 1.5
Summary
41
Performance of a Computer
Compiler
Assembler
Linker
Components of a Computer
Response Time and Throughput
Performance Measure
CPI
IC
Performance Definition
What I want you to do
42
Review Chapter 1 and Class Slides
Prepare for your first Quiz