Transcript CS 270

CS 270
CS 270: Computer Organization
Course Overview
Instructor:
Professor Stephen P. Carl
CS 270
Quote of the Day
640 Kbytes [of main memory] ought to be enough for
anybody.
– Bill Gates, 1981
CS 270
Course Perspective

Most Systems Courses are Builder-Centric; this Course is
Programmer-Centric
 How one can become a more effective programming by knowing
more about the underlying system
 Enable you to:
 Write programs that are more reliable and efficient
 Incorporate features that require hooks into OS
– E.g., concurrency, signal handlers (in CS 428)
 Not just a course for dedicated hackers
(But - might just bring out the hidden hacker in you)
 You won’t see most of this material in any other course
CS 270
Course Components

Lectures
 Higher level concepts

Labs and Assignments





The heart of the course
1 - 3 weeks
Provide in-depth understanding of an aspect of systems
Programming and problem solving
Exams (2 + final)
 Test your understanding of concepts & mathematical principles
CS 270
Timeliness

Grace days
 4 for the course
 Covers scheduling crunch, out-of-town trips, illnesses, minor setbacks
 Save them until late in the term!

Lateness penalties
 Once grace days used up, get penalized 10% per day
 Typically shut off all handins 4 days after due date

Catastrophic events
 Major illness, death in family, …
 Work with your professor on plan for getting back on track

Advice
 Once you start running late, it’s really hard to catch up
CS 270
Cheating

What is cheating?
 Sharing code: either by copying, retyping, looking at, or supplying a
copy of a file.
 Coaching: helping your friend to write a lab, line by line.
 Copying code from previous course or from elsewhere on WWW
 Only allowed to use code we supply, or from CS:APP website

What is NOT cheating?
 Explaining how to use systems or tools.
 Helping others with high-level design issues.

Penalty for cheating:
 Case remanded to Honor Council.

Detection of cheating:
 We do check and our tools for doing this are much better than you
might think
CS 270
Policies: Attendance and Grading

Presence in lectures: strongly advised
(cut warning after 2 unexcused absences)

Presence in design/help sessions: voluntary

Exams: weighted 15, 15, 15 (final)

Labs: completed/not completed

Guaranteed:
 > 90%: A
 > 80%: B
 > 70%: C
CS 270
Introduction: Themes and Concepts

A bird’s eye view of computer system organization

Processors and technological progress

Abstraction vs. Reality





Internal data representations - it’s all just numbers
Why knowing Assembly language is a Good Thing
Computer storage is large but not infinite
Asymptotic complexity is only part of the story
Computers compute, but they also communicate
CS 270
5 main components of any Computing System
Sun Blade Computer
Computer
Processor
Control
(“brain”)
Datapath
(“brawn”)
Memory
(where
programs,
data
live when
running)
Devices
Input
Output
Keyboard,
Mouse
Disk
(where
programs,
data
live when
not running)
Display,
Printer
CS 270
5 main components of Computer Systems

Datapath
 Performs operations on signals moving through the CPU

Control Circuitry
 Routes signals into, through, and out of the CPU

Memory
 Volatile (RAM)
 Permanent Storage (hard drives, DVD-ROM, etc.)

Input devices
 Mouse, keyboard, etc.

Output devices
 Monitor, printers, etc.
CS 270
CS 270
March of Progress: Moore’s Law

Moore’s Law: the number of transistors on a
single integrated circuit, also called chip density,
doubles every 18 months

Applies to any kind of semiconductor chip:
memory (RAM), microprocessors, GPUs, etc.

Trend first described by Intel co-founder Gordon
Moore in 1965.
CS 270
The March of Progress: DRAM capacity
Size in bits of single-chip Dynamic Random Access Memory (DRAM )
year
1980
1983
1986
1989
1992
1996
1998
2000
2002
size (Mbit)
0.0625
0.25
1
4
16
64
128
256
512
• Now 1.4X/yr, or 2X every 2 years.
• 8000X since 1980!
CS 270
The March of Progress: Microprocessor
Complexity
Itanium 2: 41 Million
Athlon (K7): 22 Million
Alpha 21264: 15 million
PowerPC 620: 6.9 million
Pentium Pro: 5.5 million
(1995)
Sparc Ultra: 5.2 million
Intel 80486: 1 million (1989)
Intel 8088: < 50,000 (1979)
CS 270
The March of Progress: Processor Performance
Performance measure
900
800
700
600
500
400
300
200
100
0
Intel P4 2000 MHz
(Fall 2001)
DEC Alpha
21264/600
1.54X/yr
DEC Alpha 5/500
DEC Alpha 5/300
DEC Alpha 4/266
IBM POWER 100
87 88 89 90 91 92 93 94 95 96 97
year
CS 270
The March of Progress: Dramatic Changes
°
Memory
• DRAM capacity: 64x size improvement in last decade.
°
Processor
• Speed: 100X performance in last decade.
°
Disk
• Capacity doubled every year since 1997: 250X in last decade.
CS 270
The March of Progress: Dramatic Changes
What will the state-of-the-art PC be when you graduate?
• Processor clock speed:
5000 MegaHertz (5.0 GigaHertz)
• Memory capacity: 4000 MegaBytes (4.0 GigaBytes)
• Disk capacity:
2000 GigaBytes
(2.0 TeraBytes)
• Time to learn some new units!
•
•
•
•
•
•
Mega => Giga
Giga => Tera
Tera => Peta
Peta => Exa
Exa => Zetta
Zetta => Yotta = 1024
CS 270
The Death of Progress? The Power Wall
CS 270
Multicore Processors
CS 270
CS 270
Computing Systems as Abstractions

Most CS courses emphasize abstraction
 Abstract data types (CS 157/257)
 Asymptotic analysis (CS 320)

Computer Systems are organized according to layers of
abstraction.
 In general, abstraction helps engineers of all sorts to
manage complexity
 Abstraction helps insulate programmers from differences between
various hardware platforms
CS 270
Our Theme: Abstraction Is Important But Don’t Forget
Reality

Abstractions have limits
 Especially in the presence of bugs
 Understanding details of the underlying implementations: sometimes,
you just need to

Useful outcomes
 Become a more effective programmer!
Find and eliminate bugs efficiently
 Understand and tune for program performance
 Preparation for later “systems” classes in CS
 Operating Systems, Networking

CS 270
Reality #1:
Ints are not Integers, Floats are not Reals

Example 1: Is x2 ≥ 0?
 Floats: Yes!
 Ints:



40000 * 40000 --> 1600000000
50000 * 50000 --> ??
Example 2: Is (x + y) + z = x + (y + z)?
 Unsigned & Signed Ints: Yes!
 Floats:


(1e20 + -1e20) + 3.14 --> 3.14
1e20 + (-1e20 + 3.14) --> ??
CS 270
Code Security Example
/* Kernel memory region holding user-accessible data */
#define KSIZE 1024
char kbuf[KSIZE];
/* Copy at most maxlen bytes from kernel region to user buffer */
int copy_from_kernel(void *user_dest, int maxlen) {
/* Byte count len is minimum of buffer size and maxlen */
int len = KSIZE < maxlen ? KSIZE : maxlen;
memcpy(user_dest, kbuf, len);
return len;
}


Similar to code found in FreeBSD’s implementation of
getpeername
There are legions of smart people trying to find
vulnerabilities in programs
CS 270
Typical Usage
/* Kernel memory region holding user-accessible data */
#define KSIZE 1024
char kbuf[KSIZE];
/* Copy at most maxlen bytes from kernel region to user buffer */
int copy_from_kernel(void *user_dest, int maxlen) {
/* Byte count len is minimum of buffer size and maxlen */
int len = KSIZE < maxlen ? KSIZE : maxlen;
memcpy(user_dest, kbuf, len);
return len;
}
#define MSIZE 528
void getstuff() {
char mybuf[MSIZE];
copy_from_kernel(mybuf, MSIZE);
printf(“%s\n”, mybuf);
}
CS 270
Malicious Usage
/* Kernel memory region holding user-accessible data */
#define KSIZE 1024
char kbuf[KSIZE];
/* Copy at most maxlen bytes from kernel region to user buffer */
int copy_from_kernel(void *user_dest, int maxlen) {
/* Byte count len is minimum of buffer size and maxlen */
int len = KSIZE < maxlen ? KSIZE : maxlen;
memcpy(user_dest, kbuf, len);
return len;
}
#define MSIZE 528
void getstuff() {
char mybuf[MSIZE];
copy_from_kernel(mybuf, -MSIZE);
. . .
}
CS 270
Computer Arithmetic

Arithmetic operations have important mathematical properties

Cannot assume all the usual mathematical properties, due to
finiteness of representations
 Integer operations satisfy “ring” properties
Commutativity, associativity, distributivity
 Floating point operations satisfy “ordering” properties
 Monotonicity, values of signs


Observation
 Need to understand which abstractions apply in which contexts
 Important issues for compiler writers and serious application
programmers
CS 270
Great (Grim?) Reality #2:
You Need to Know Assembly

Chances are, you’ll never write program in assembly
 Compilers are much better & more patient than you are

But understanding assembly is key to understanding the machinelevel execution model.
 When bugs happen, high-level language model breaks down
 Tuning program performance
 What optimizations are done/not done by the compiler?
 What are the sources of program inefficiency?
 Implementing system software
 Compiler has machine code as target
 Operating systems must manage process state
 Creating / fighting malware
 x86 assembly is the language of choice!
CS 270
Assembly Code Example

Time Stamp Counter
 Special 64-bit register in Intel-compatible machines
 Incremented every clock cycle
 Read with rdtsc instruction

Application
 Measure time (in clock cycles) required by a procedure:
double t;
start_counter();
P();
// function to be timed
t = get_counter();
printf("P required %f clock cycles\n", t);
CS 270
Code to Read Counter


Add a small amount of assembly code using GCC’s asm facility
This inserts assembly code into machine code generated by compiler:
static unsigned cyc_hi = 0;
static unsigned cyc_lo = 0;
/* Set *hi and *lo to the high and low order bits
of the cycle counter. */
void access_counter(unsigned *hi, unsigned *lo)
{
asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
: "=r" (*hi), "=r" (*lo)
:
: "%edx", "%eax");
}
CS 270
Grim Reality #3: “Random Access Memory” is a nonphysical abstraction

Memory is not unbounded
 It must be allocated and managed
 Many applications are memory dominated

Bugs due to memory referencing errors are especially pernicious
 Effects are distant in both time (error may show up long after erroneous
instruction executes) and space (effect may be outside the bounds of the
executing program)

Memory performance is not uniform
 Cache and virtual memory effects can greatly affect program performance
 Adapting program to characteristics of memory system can lead to major
speed improvements
CS 270
Example of a Memory Referencing Bug
double fun(int i)
{
volatile double d[1] = {3.14};
volatile long int a[2];
a[i] = 1073741824; /* Possibly out of bounds */
return d[0];
}
fun(0)
fun(1)
fun(2)
fun(3)
fun(4)
–>
–>
–>
–>
–>
3.14
3.14
3.1399998664856
2.00000061035156
3.14, then segmentation fault
CS 270
Memory Referencing Bug Example
double fun(int i)
{
volatile double d[1] = {3.14};
volatile long int a[2];
a[i] = 1073741824; /* Possibly out of bounds */
return d[0];
}
fun(0)
fun(1)
fun(2)
fun(3)
fun(4)
–>
–>
–>
–>
–>
Explanation:
3.14
3.14
3.1399998664856
2.00000061035156
3.14, then segmentation fault
Saved State
4
MSB of d[0]
3
LSB of d[0]
2
a[1]
1
a[0]
0
Location accessed by
fun(i)
CS 270
Memory Referencing Errors

Unlike Java, C and C++ do not protect memory at all
 Out of bounds array references
 Invalid pointer values
 Abuses of malloc/free (e.g., memory leaks)

This usually leads to nasty bugs
 Whether or not bug has any effect depends on system and compiler
 Action at a distance



Corrupted object logically unrelated to one being accessed
Effect of bug may be first observed long after it is generated
How can I deal with this?
 Program in Scheme, Java or Python
 Understand what possible interactions may occur
 Use tools that detect referencing errors
CS 270
Memory System Performance Example
void copyij(int src[2048][2048],
int dst[2048][2048])
{
int i,j;
// access in row-major order
for (i = 0; i < 2048; i++)
for (j = 0; j < 2048; j++)
dst[i][j] = src[i][j];
}
void copyji(int src[2048][2048],
int dst[2048][2048])
{
int i,j;
// access in column-major order
for (j = 0; j < 2048; j++)
for (i = 0; i < 2048; i++)
dst[i][j] = src[i][j];
}
21 times slower
on Pentium 4


Hierarchical memory organization
Performance depends on access patterns
 Including how we step through multi-dimensional arrays
CS 270
The Memory Mountain
Read throughput (MB/s)
Pentium III Xeon
550 MHz
16 KB on-chip L1 d-cache
16 KB on-chip L1 i-cache
512 KB off-chip unified
L2 cache
1200
1000
L1
800
600
400
L2
8k
32k
128k
512k
2m
8m
s15
s13
s11
Stride (words)
s9
s7
Mem
s5
s3
s1
0
2k
200
Working set size (bytes)
CS 270
Reality #4: There’s more to system
performance than asymptotic complexity

Constant factors matter too!

And even exact op count does not predict performance
 Easily see 10:1 performance range depending on how code was written
 Must optimize at multiple levels: algorithm, data representations,
procedures, and loops

Must understand system to optimize performance
 How are programs compiled and executed
 How do we measure program performance and identify bottlenecks
 How do we improve performance without destroying code modularity and
generality
CS 270
Example: Matrix Multiplication
Matrix-Matrix Multiplication (MMM) on 2 x Core 2 Duo 3 GHz (double precision)
Gflop/s
50
45
40
35
Best code (K. Goto)
30
25
160x
20
15
10
Triple loop
5
0
0
1,000
2,000
3,000
4,000
5,000
6,000
7,000
8,000
matrix size



Standard desktop computer, vendor compiler, using optimization flags
Both implementations have exactly the same operations count (2n3)
What is going on?
9,000
CS 270
MMM Plot: Analysis
Matrix-Matrix Multiplication (MMM) on 2 x Core 2 Duo 3 GHz
Gflop/s
50
45
40
35
30
Multiple threads: 4x
25
20
15
10
Vector instructions: 4x
5
Memory hierarchy and other optimizations: 20x
0
0


1,000
2,000
3,000
4,000
5,000
matrix size
6,000
7,000
8,000
9,000
Each speedup due to taking advantage of increasingly complex system
resources
Effect: less register spills, less L1/L2 cache misses, less TLB misses
CS 270
Reality #5: Computers do more than just execute
programs

They need to get data in and out
 I/O system is critical to program reliability and performance

They communicate with each other over networks
 Many system-level issues arise in presence of network




Concurrent operations by autonomous processes
Coping with unreliable media
Cross platform compatibility
Complex performance issues
CS 270
Slide Acknowledgements

Slides to accompany textbook due to Bryant and
O’Hallaron

Technology Trends slides due to Dr. XXX Carle of
UC/Berkeley

Some graphs from Hennesey and Patterson: Computer
Organization and Design, 4th Ed.