Transcript CS 270
CS 270: Computer Organization
Course Overview
Instructor: Professor Stephen P. Carl
Quote of the Day
640 Kbytes [of main memory] ought to be enough for
anybody.
– Bill Gates, 1981
Course Perspective
Most Systems Courses are Builder-Centric; this Course is
Programmer-Centric
How one can become a more effective programmer by knowing more about the underlying system
Enable you to:
Write programs that are more reliable and efficient
Incorporate features that require hooks into OS
– E.g., concurrency, signal handlers (in CS 428)
Not just a course for dedicated hackers
(But - might just bring out the hidden hacker in you)
You won’t see most of this material in any other course
Course Components
Lectures
Higher level concepts
Labs and Assignments
The heart of the course
1 - 3 weeks
Provide in-depth understanding of an aspect of systems
Programming and problem solving
Exams (2 + final)
Test your understanding of concepts & mathematical principles
Timeliness
Grace days
4 for the course
Covers scheduling crunch, out-of-town trips, illnesses, minor setbacks
Save them until late in the term!
Lateness penalties
Once grace days are used up, you are penalized 10% per day
Handins are typically shut off 4 days after the due date
Catastrophic events
Major illness, death in family, …
Work with your professor on plan for getting back on track
Advice
Once you start running late, it’s really hard to catch up
Cheating
What is cheating?
Sharing code: either by copying, retyping, looking at, or supplying a
copy of a file.
Coaching: helping your friend to write a lab, line by line.
Copying code from a previous course or from elsewhere on the Web
Only allowed to use code we supply or code from the CS:APP website
What is NOT cheating?
Explaining how to use systems or tools.
Helping others with high-level design issues.
Penalty for cheating:
Case remanded to Honor Council.
Detection of cheating:
We do check, and our tools for doing this are much better than you might think.
Policies: Attendance and Grading
Presence in lectures: strongly advised
(cut warning after 2 unexcused absences)
Presence in design/help sessions: voluntary
Exams: weighted 15, 15, 15 (final)
Labs: completed/not completed
Guaranteed:
> 90%: A
> 80%: B
> 70%: C
Introduction: Themes and Concepts
A bird’s eye view of computer system organization
Processors and technological progress
Abstraction vs. Reality
Internal data representations - it’s all just numbers
Why knowing Assembly language is a Good Thing
Computer storage is large but not infinite
Asymptotic complexity is only part of the story
Computers compute, but they also communicate
5 main components of any Computing System
[Diagram: a Sun Blade computer, broken into the five classic components]
Computer = Processor + Memory + Devices
Processor: Control (the "brain") and Datapath (the "brawn")
Memory: where programs and data live when running
Devices: Input (keyboard, mouse) and Output (display, printer)
Disk: where programs and data live when not running
5 main components of Computer Systems
Datapath
Performs operations on signals moving through the CPU
Control Circuitry
Routes signals into, through, and out of the CPU
Memory
Volatile (RAM)
Permanent Storage (hard drives, DVD-ROM, etc.)
Input devices
Mouse, keyboard, etc.
Output devices
Monitor, printers, etc.
March of Progress: Moore’s Law
Moore’s Law: the number of transistors on a
single integrated circuit, also called chip density,
doubles every 18 months
Applies to any kind of semiconductor chip:
memory (RAM), microprocessors, GPUs, etc.
Trend first described by Intel co-founder Gordon
Moore in 1965.
The March of Progress: DRAM capacity
Size in bits of single-chip Dynamic Random Access Memory (DRAM):

year   size (Mbit)
1980   0.0625
1983   0.25
1986   1
1989   4
1992   16
1996   64
1998   128
2000   256
2002   512

• Now 1.4X/yr, or 2X every 2 years.
• 8000X since 1980!
The March of Progress: Microprocessor Complexity
Transistors per chip:
Itanium 2: 41 million
Athlon (K7): 22 million
Alpha 21264: 15 million
PowerPC 620: 6.9 million
Pentium Pro: 5.5 million (1995)
Sparc Ultra: 5.2 million
Intel 80486: 1 million (1989)
Intel 8088: < 50,000 (1979)
The March of Progress: Processor Performance
[Chart: processor performance (performance measure, 0-900) by year, 1987-1997, growing at roughly 1.54X per year; labeled points include IBM POWER 100, DEC Alpha 4/266, DEC Alpha 5/300, DEC Alpha 5/500, DEC Alpha 21264/600, and Intel P4 2000 MHz (Fall 2001).]
The March of Progress: Dramatic Changes
Memory
• DRAM capacity: 64x size improvement in the last decade.
Processor
• Speed: 100X performance in the last decade.
Disk
• Capacity doubled every year since 1997: 250X in the last decade.
The March of Progress: Dramatic Changes
What will the state-of-the-art PC be when you graduate?
• Processor clock speed: 5000 MegaHertz (5.0 GigaHertz)
• Memory capacity: 4000 MegaBytes (4.0 GigaBytes)
• Disk capacity: 2000 GigaBytes (2.0 TeraBytes)
• Time to learn some new units! (each step below is a factor of 1,000)
• Mega => Giga
• Giga => Tera
• Tera => Peta
• Peta => Exa
• Exa => Zetta
• Zetta => Yotta (= 10^24)
The Death of Progress? The Power Wall
Multicore Processors
Computing Systems as Abstractions
Most CS courses emphasize abstraction
Abstract data types (CS 157/257)
Asymptotic analysis (CS 320)
Computer Systems are organized according to layers of
abstraction.
In general, abstraction helps engineers of all sorts to
manage complexity
Abstraction helps insulate programmers from differences between
various hardware platforms
Our Theme: Abstraction Is Important But Don’t Forget
Reality
Abstractions have limits
Especially in the presence of bugs
Sometimes you just need to understand the details of the underlying implementation
Useful outcomes
Become a more effective programmer!
Find and eliminate bugs efficiently
Understand and tune for program performance
Preparation for later “systems” classes in CS
Operating Systems, Networking
Reality #1:
Ints are not Integers, Floats are not Reals
Example 1: Is x² ≥ 0?
Floats: Yes!
Ints:
40000 * 40000 --> 1600000000
50000 * 50000 --> ??
Example 2: Is (x + y) + z = x + (y + z)?
Unsigned & Signed Ints: Yes!
Floats:
(1e20 + -1e20) + 3.14 --> 3.14
1e20 + (-1e20 + 3.14) --> ??
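A minimal C sketch of these two effects (hedged: the exact results assume a 32-bit int and IEEE double, which is typical but not guaranteed by the C standard):

#include <stdio.h>

int main(void) {
    int a = 40000 * 40000;   /* 1600000000: still fits in a 32-bit int */
    int b = 50000 * 50000;   /* 2500000000 overflows a 32-bit int (undefined
                                behavior; typically wraps to a negative value) */
    printf("%d %d\n", a, b);

    double x = 1e20, y = -1e20, z = 3.14;
    printf("%g\n", (x + y) + z);   /* 3.14 */
    printf("%g\n", x + (y + z));   /* 0: the 3.14 is absorbed into 1e20 */
    return 0;
}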
Code Security Example
/* Kernel memory region holding user-accessible data */
#define KSIZE 1024
char kbuf[KSIZE];

/* Copy at most maxlen bytes from kernel region to user buffer */
int copy_from_kernel(void *user_dest, int maxlen) {
    /* Byte count len is minimum of buffer size and maxlen */
    int len = KSIZE < maxlen ? KSIZE : maxlen;
    memcpy(user_dest, kbuf, len);
    return len;
}
Similar to code found in FreeBSD’s implementation of
getpeername
There are legions of smart people trying to find
vulnerabilities in programs
Typical Usage
/* Kernel memory region holding user-accessible data */
#define KSIZE 1024
char kbuf[KSIZE];

/* Copy at most maxlen bytes from kernel region to user buffer */
int copy_from_kernel(void *user_dest, int maxlen) {
    /* Byte count len is minimum of buffer size and maxlen */
    int len = KSIZE < maxlen ? KSIZE : maxlen;
    memcpy(user_dest, kbuf, len);
    return len;
}

#define MSIZE 528

void getstuff() {
    char mybuf[MSIZE];
    copy_from_kernel(mybuf, MSIZE);
    printf("%s\n", mybuf);
}
Malicious Usage
/* Kernel memory region holding user-accessible data */
#define KSIZE 1024
char kbuf[KSIZE];

/* Copy at most maxlen bytes from kernel region to user buffer */
int copy_from_kernel(void *user_dest, int maxlen) {
    /* Byte count len is minimum of buffer size and maxlen */
    int len = KSIZE < maxlen ? KSIZE : maxlen;
    memcpy(user_dest, kbuf, len);
    return len;
}

#define MSIZE 528

void getstuff() {
    char mybuf[MSIZE];
    copy_from_kernel(mybuf, -MSIZE);
    . . .
}
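Here the negative maxlen slips past the comparison, and memcpy interprets the negative len as a huge unsigned byte count. One common mitigation, sketched below under a hypothetical name copy_from_kernel_fixed (illustrative only, not claimed to be the actual FreeBSD patch), is to make the length an unsigned size_t so a negative argument can never reach memcpy:

#include <string.h>
#include <stddef.h>

#define KSIZE 1024
char kbuf[KSIZE];

/* Copy at most maxlen bytes from kernel region to user buffer;
   maxlen is unsigned, so the minimum computation below is well defined. */
size_t copy_from_kernel_fixed(void *user_dest, size_t maxlen) {
    size_t len = maxlen < KSIZE ? maxlen : KSIZE;
    memcpy(user_dest, kbuf, len);
    return len;
}

With this signature, passing -MSIZE converts to a very large size_t and the copy is clamped to KSIZE, instead of running far past kbuf.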
Computer Arithmetic
Arithmetic operations have important mathematical properties
Cannot assume all the usual mathematical properties, due to
finiteness of representations
Integer operations satisfy “ring” properties
Commutativity, associativity, distributivity
Floating point operations satisfy “ordering” properties
Monotonicity, values of signs
Observation
Need to understand which abstractions apply in which contexts
Important issues for compiler writers and serious application
programmers
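A small C sketch of the contrast (a minimal illustration, assuming 32-bit unsigned int and IEEE double arithmetic):

#include <stdio.h>

int main(void) {
    /* Unsigned arithmetic is modular (a ring): it wraps around 2^32,
       yet remains commutative and associative, so (x + y) - y == x. */
    unsigned x = 3000000000u, y = 2000000000u;
    printf("%u\n", (x + y) - y);              /* prints 3000000000 */

    /* Floating point is not associative, but it is monotone:
       adding the same value to both sides preserves ordering. */
    double a = 1e20, b = 1.0;
    printf("%d\n", a + 3.14 >= b + 3.14);     /* prints 1 */
    return 0;
}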
Great (Grim?) Reality #2:
You Need to Know Assembly
Chances are, you’ll never write a program in assembly
Compilers are much better & more patient than you are
But understanding assembly is key to understanding the machine-level execution model.
When bugs happen, the high-level language model breaks down
Tuning program performance
What optimizations are done/not done by the compiler?
What are the sources of program inefficiency?
Implementing system software
Compiler has machine code as target
Operating systems must manage process state
Creating / fighting malware
x86 assembly is the language of choice!
Assembly Code Example
Time Stamp Counter
Special 64-bit register in Intel-compatible machines
Incremented every clock cycle
Read with rdtsc instruction
Application
Measure time (in clock cycles) required by a procedure:
double t;
start_counter();
P();                    /* function to be timed */
t = get_counter();
printf("P required %f clock cycles\n", t);
Code to Read Counter
Add a small amount of assembly code using GCC’s asm facility
This inserts assembly code into machine code generated by compiler:
static unsigned cyc_hi = 0;
static unsigned cyc_lo = 0;

/* Set *hi and *lo to the high and low order bits
   of the cycle counter. */
void access_counter(unsigned *hi, unsigned *lo)
{
    asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
        : "=r" (*hi), "=r" (*lo)
        : /* no inputs */
        : "%edx", "%eax");
}
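For completeness, a hedged sketch (closely following the clock.c convention used in CS:APP; details may differ from the actual course code) of how start_counter and get_counter from the previous slide can be built on access_counter, reusing the cyc_hi/cyc_lo statics declared above:

/* Record the current value of the cycle counter. */
void start_counter(void)
{
    access_counter(&cyc_hi, &cyc_lo);
}

/* Return the number of cycles since the last call to start_counter. */
double get_counter(void)
{
    unsigned ncyc_hi, ncyc_lo, hi, lo, borrow;

    access_counter(&ncyc_hi, &ncyc_lo);
    /* 64-bit subtraction done by hand on the two 32-bit halves */
    lo = ncyc_lo - cyc_lo;
    borrow = lo > ncyc_lo;
    hi = ncyc_hi - cyc_hi - borrow;
    return (double) hi * (1 << 30) * 4 + lo;   /* hi * 2^32 + lo */
}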
Grim Reality #3: “Random Access Memory” is a non-physical abstraction
Memory is not unbounded
It must be allocated and managed
Many applications are memory dominated
Bugs due to memory referencing errors are especially pernicious
Effects are distant in both time (error may show up long after erroneous
instruction executes) and space (effect may be outside the bounds of the
executing program)
Memory performance is not uniform
Cache and virtual memory effects can greatly affect program performance
Adapting program to characteristics of memory system can lead to major
speed improvements
Example of a Memory Referencing Bug
double fun(int i)
{
    volatile double d[1] = {3.14};
    volatile long int a[2];
    a[i] = 1073741824;   /* Possibly out of bounds */
    return d[0];
}

fun(0)  ->  3.14
fun(1)  ->  3.14
fun(2)  ->  3.1399998664856
fun(3)  ->  2.00000061035156
fun(4)  ->  3.14, then segmentation fault
Memory Referencing Bug Example
double fun(int i)
{
    volatile double d[1] = {3.14};
    volatile long int a[2];
    a[i] = 1073741824;   /* Possibly out of bounds */
    return d[0];
}

fun(0)  ->  3.14
fun(1)  ->  3.14
fun(2)  ->  3.1399998664856
fun(3)  ->  2.00000061035156
fun(4)  ->  3.14, then segmentation fault

Explanation: location accessed by fun(i)
  i = 0  ->  a[0]
  i = 1  ->  a[1]
  i = 2  ->  LSB of d[0]
  i = 3  ->  MSB of d[0]
  i = 4  ->  Saved state
Memory Referencing Errors
Unlike Java, C and C++ do not protect memory at all
Out of bounds array references
Invalid pointer values
Abuses of malloc/free (e.g., memory leaks)
This usually leads to nasty bugs
Whether or not bug has any effect depends on system and compiler
Action at a distance
Corrupted object logically unrelated to one being accessed
Effect of bug may be first observed long after it is generated
How can I deal with this?
Program in Scheme, Java or Python
Understand what possible interactions may occur
Use tools that detect referencing errors
Memory System Performance Example
void copyij(int src[2048][2048], int dst[2048][2048])
{
    int i, j;
    // access in row-major order
    for (i = 0; i < 2048; i++)
        for (j = 0; j < 2048; j++)
            dst[i][j] = src[i][j];
}

void copyji(int src[2048][2048], int dst[2048][2048])
{
    int i, j;
    // access in column-major order
    for (j = 0; j < 2048; j++)
        for (i = 0; i < 2048; i++)
            dst[i][j] = src[i][j];
}

copyji runs about 21 times slower than copyij on a Pentium 4.
Hierarchical memory organization
Performance depends on access patterns
Including how we step through multi-dimensional arrays
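A hypothetical driver (illustrative only; it assumes the two functions above are compiled in the same file and uses the standard clock() timer rather than the cycle counter) for observing the difference:

#include <stdio.h>
#include <time.h>

void copyij(int src[2048][2048], int dst[2048][2048]);
void copyji(int src[2048][2048], int dst[2048][2048]);

/* ~16 MB each; declared static so they live off the stack */
static int src[2048][2048], dst[2048][2048];

int main(void) {
    clock_t t0 = clock();
    copyij(src, dst);
    clock_t t1 = clock();
    copyji(src, dst);
    clock_t t2 = clock();
    printf("copyij: %.3f s   copyji: %.3f s\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC,
           (double)(t2 - t1) / CLOCKS_PER_SEC);
    return 0;
}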
The Memory Mountain
[Figure: the memory mountain. Read throughput (MB/s, roughly 0-1200) as a function of stride (s1-s15 words) and working set size (2 KB-8 MB), measured on a 550 MHz Pentium III Xeon with 16 KB on-chip L1 d-cache, 16 KB on-chip L1 i-cache, and 512 KB off-chip unified L2 cache; the ridges correspond to the L1, L2, and main-memory (Mem) regimes.]
Reality #4: There’s more to system
performance than asymptotic complexity
Constant factors matter too!
And even exact op count does not predict performance
Easily see 10:1 performance range depending on how code was written
Must optimize at multiple levels: algorithm, data representations,
procedures, and loops
Must understand system to optimize performance
How are programs compiled and executed?
How do we measure program performance and identify bottlenecks?
How do we improve performance without destroying code modularity and generality?
Example: Matrix Multiplication
Matrix-Matrix Multiplication (MMM) on 2 x Core 2 Duo 3 GHz (double precision)
[Chart: Gflop/s (0-50) versus matrix size (0-9,000) for two MMM implementations; the best code (K. Goto) runs at roughly 160x the performance of the simple triple loop.]
Standard desktop computer, vendor compiler, using optimization flags
Both implementations have exactly the same operation count (2n³)
What is going on?
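For reference, a minimal sketch of what the "Triple loop" baseline in the plot looks like (the tuned code by K. Goto is vastly more elaborate; this version is only illustrative):

/* Naive matrix-matrix multiply: C = A * B, n x n, row-major storage.
   Performs 2*n^3 floating-point operations, the same count as the fast code. */
void mmm_triple_loop(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += A[i*n + k] * B[k*n + j];
            C[i*n + j] = sum;
        }
}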
MMM Plot: Analysis
Matrix-Matrix Multiplication (MMM) on 2 x Core 2 Duo 3 GHz
[Chart: the same Gflop/s versus matrix size data, annotated with where the speedup comes from: multiple threads: 4x; vector instructions: 4x; memory hierarchy and other optimizations: 20x.]
Each speedup comes from taking advantage of increasingly complex system resources
Effect: fewer register spills, fewer L1/L2 cache misses, fewer TLB misses
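As one concrete illustration of the "memory hierarchy" line above, here is a hedged sketch of loop blocking (tiling); the block size is illustrative and would be tuned to the cache sizes in practice:

#define BLK 32   /* illustrative block size */

/* Blocked matrix-matrix multiply: same 2*n^3 operations as the triple loop,
   but each BLK x BLK tile is reused while it is resident in cache. */
void mmm_blocked(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n * n; i++)
        C[i] = 0.0;
    for (int ii = 0; ii < n; ii += BLK)
        for (int kk = 0; kk < n; kk += BLK)
            for (int jj = 0; jj < n; jj += BLK)
                for (int i = ii; i < ii + BLK && i < n; i++)
                    for (int k = kk; k < kk + BLK && k < n; k++) {
                        double a = A[i*n + k];
                        for (int j = jj; j < jj + BLK && j < n; j++)
                            C[i*n + j] += a * B[k*n + j];
                    }
}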
Reality #5: Computers do more than just execute
programs
They need to get data in and out
I/O system is critical to program reliability and performance
They communicate with each other over networks
Many system-level issues arise in presence of network
Concurrent operations by autonomous processes
Coping with unreliable media
Cross platform compatibility
Complex performance issues
Slide Acknowledgements
Slides to accompany the textbook are due to Bryant and O’Hallaron
Technology Trends slides are due to Dr. XXX Carle of UC Berkeley
Some graphs are from Hennessy and Patterson: Computer Organization and Design, 4th Ed.