Tour of Computer Systems

Download Report

Transcript Tour of Computer Systems

CS 105
“Tour of the Black Holes of Computing”
Introduction to
Computer Systems
“Mike”
Michael A. Erlinger
Topics:




intro.ppt
Theme
Five great realities of computer systems
How this fits within CS curriculum
Logistics
Course Theme

Abstraction is good, but don’t forget reality!
Many CS Courses emphasize abstraction


Abstract data types
Asymptotic analysis
These abstractions have limits


Especially in the presence of bugs
Need to understand underlying implementations
Useful outcomes

Become more effective programmers
 Able to find and eliminate bugs efficiently
 Able to tune program performance

Prepare for later “systems” classes in CS or CE
 Compilers, Operating Systems, Networks, Computer
Architecture, Robotics, etc.
–2–
CS 105
Textbooks
Randal E. Bryant and David R. O’Hallaron,

“Computer Systems: A Programmer’s Perspective”, Second
Edition, Prentice Hall 2010
Brian Kernighan and Dennis Ritchie,

“The C Programming Language, Second Edition”, Prentice
Hall, 1988
Larry Miller and Alex Quilici

–3–
The Joy of C, Wiley, 1997
CS 105
Syllabus



–4–
Syllabus on Web
Calendar defines due dates: Everything is linked off the
Calendar
Labs – cs grading system used to hold grades, many have
specific grading system (individual grades will be moved
over)
CS 105
Course Components
Lectures

Higher level concepts
Problems – Quizzes & Class

Applied concepts, important tools and skills for labs,
clarification of lectures, exam coverage
Labs






–5–
The heart of the course
1 or 2 weeks
Provide in-depth understanding of an aspect of systems
Programming and measurement
Time to learn, avoid time to optimize
Teams of two, one must know pointers
CS 105
Notes:
Work groups


You must work in pairs on all labs
Honor-code violation to work without your partner!
Handins


Check Calendar.
Electronic submissions only.
Grading Characteristics

Lab scores tend to be high
 Serious handicap if you do not finish a lab

Tests/Quizzes typically have a wider range of scores
 They are primary determinant of your grade…
but not only one!
Do your share of lab work and prepare for quizzes or bomb the
test!!
–6–
CS 105
Cheating
What is cheating?

Sharing code: either by copying (web search, etc), retyping,
looking at, or supplying a copy of a file.
What is NOT cheating?



–7–
Helping others use systems or tools.
Helping others with high-level design issues.
Helping others debug their code.
CS 105
Facilities
Assignments will use Intel computer systems

Not all machines are created alike
 Even Intel Macs aren’t necessarily compatible
 Knuth is a 64-bit server

Wilkes: x86/Linux specifically set up for this class

Login on a Mac, then ssh to Wilkes
 If you want fancy programs, start X11 first
 Directories are cross-mounted, so you can edit on Knuth
or your Mac, and Wilkes will see your files
…or ssh into Wilkes from your dorm
 All programs must run on Wilkes: that’s where we
grade

–8–
CS 105
Overview: Programs and Data
Topics



Bit operations, arithmetic, assembly language programs,
representation of C control and data structures
Includes aspects of computer architecture and compilers
Data objects are distinguished by context
 100011 = 35 or ‘#’
Assignments



–9–
L1: Manipulating bits
L2: Defusing a binary bomb
L3: Hacking a buffer bomb
CS 105
Overview: Performance
Topics



High level processor models, code optimization (control and
data), measuring time on a computer
Includes aspects of architecture, compilers, and OS
Easier and cheaper to make Processors run faster than to
make Memory run faster: Processor-Memory Gap
Assignments

– 10 –
L5: Optimizing Code Performance
CS 105
Overview: The Memory Hierarchy,
Caching, VM, Disks
Topics



Memory technology, memory hierarchy, caches, disks,
locality
Virtual memory, address translation, dynamic storage
allocation.
Exploit caches for performance or ignore and run slowly..
Assignments


– 11 –
L5: Optimizing Code Performance
Malloc Lab – not done
CS 105
Overview: Linking and
Exceptions
Topics - Overview




Object files, static and dynamic linking, libraries, loading
Hardware exceptions, Unix signals, nonlocal jumps
Includes aspects of compilers, OS, and architecture
Programs: source – machine language – executable object
code
Assignments

– 12 –
Shell lab, web lab – usually not done
CS 105
Overview: Processes and
Concurrency
Topics





Process creation, process hierarchy, shared memory between
processes
Process == OS’s abstraction for a running program
Semaphores, critical sections
Threads
Concurrent processes, context switching
Assignments

L6: Ring Buffer
 Parent and Child process management
 Shared memory between processes
 Use of Threads

– 13 –
Fork based Ring Buffer
CS 105
Overview: I/O, Networking
Topics


High level and low-level I/O, network programming, Internet
services, Web servers
Includes aspects of networking, OS, and architecture.
Assignments

– 14 –
L7: Socket lab
CS 105
Lab Rationale
Each lab should have a well-defined goal such as solving a puzzle
or winning a contest.


Defusing a binary bomb.
Winning a performance contest.
Doing a lab should result in new skills and concepts





Data Lab: computer arithmetic, digital logic.
Bomb Labs: assembly language, using a debugger, understanding
the stack
Performance Lab: profiling, measurement, performance
debugging.
Process Lab: Interprocess communication.
Sockets Lab: Intercomputer communication
We try to use competition in a fun and healthy way.



– 15 –
Set a threshhold for full credit.
Post intermediate results (anonymized) on Web page for glory!
Most points for doing it correctly
CS 105
Good Luck!
– 16 –
CS 105
Great Reality #1
Int’s are not Integers, Float’s are not Reals !!
Examples

Is x2 ≥ 0?
 Float’s:
Yes!
 Int’s:
» 40000 * 40000 --> 1600000000
» 50000 * 50000 --> ???

Is (x + y) + z = x + (y + z)?
 Unsigned & Signed Int’s:
Yes!
 Float’s:
» (1e20 + -1e20) + 3.14 --> 3.14
» 1e20 + (-1e20 + 3.14) --> ??
– 17 –
CS 105
Computer Arithmetic
Does not generate random values

Arithmetic operations have important mathematical
properties…BUT
Cannot assume “usual” properties


Due to finiteness of representations--???
Integer operations satisfy “ring” properties
 Commutativity, associativity, distributivity

Floating point operations satisfy “ordering” properties
 Monotonicity, values of signs
Observation


– 18 –
Need to understand which abstractions apply in which
contexts
Important issues for compiler writers and serious application
programmers
CS 105
Great Reality #2
You’ve got to know assembly
Chances are, you’ll never program in assembly…C

Compilers are much better & more patient than you are
Understanding assembly key to machine-level
execution model

Behavior of programs in presence of bugs
 High-level language model breaks down

Tuning program performance
 Understanding sources of program inefficiency

Implementing system software
 Compiler has machine code as target
 Operating systems must manage process state

Creating/Fighting malware
 X86 is the architecture of choice
– 19 –
CS 105
Assembly Code Example
Time Stamp Counter



Special 64-bit register in Intel-compatible machines
Incremented every clock cycle
Read with rdtsc instruction
Application

Measure time required by procedure
 In units of clock cycles…NOT instructions
double t;
start_counter(); ---need to access reg
P();
t = get_counter(); ---need to access reg
printf("P required %f clock cycles\n", t);
– 20 –
CS 105
Code to Read Counter


Write small amount of assembly code using GCC’s asm
facility
Inserts assembly code into machine code generated by
compiler
static unsigned cyc_hi = 0;
static unsigned cyc_lo = 0;
/* Set *hi and *lo to the high and low order bits
of the cycle counter.
*/
void access_counter(unsigned *hi, unsigned *lo)
{
asm("rdtsc; movl %edx,%0; movl %eax,%1"
: "=r" (*hi), "=r" (*lo)
:
: "%edx", "%eax");
}
– 21 –
CS 105
Code to Read Counter
/* Record the current value of the cycle counter. */
void start_counter()
{
access_counter(&cyc_hi, &cyc_lo);
}
/* Number of cycles since the last call to start_counter. */
double get_counter()
{
unsigned ncyc_hi, ncyc_lo;
unsigned hi, lo, borrow;
/* Get cycle counter */
access_counter(&ncyc_hi, &ncyc_lo);
/* Do double precision subtraction */
lo = ncyc_lo - cyc_lo;
borrow = lo > ncyc_lo;
hi = ncyc_hi - cyc_hi - borrow;
return (double) hi * (1 << 30) * 4 + lo;
}
– 22 –
CS 105
Measuring Performance
Trickier than it Might Look

Many sources of variation
Example – cycles per element, rather than time

Sum integers from 1 to n
n
100
1,000
1,000
10,000
10,000
1,000,000
1,000,000
1,000,000,000
– 23 –
Cycles
961
8,407
8,426
82,861
82,876
8,419,907
8,425,181
8,371,2305,591
Cycles/n
9.61
8.41
8.43
8.29
8.29
8.42
8.43
8.37
CS 105
Great Reality #3
Memory Matters
Memory is not unbounded


It must be allocated and managed
Many applications are memory dominated
Memory referencing bugs especially pernicious

Effects are distant in both time and space - SegFault
Memory performance is not uniform


– 24 –
Cache and virtual memory effects can greatly affect program
performance
Adapting program to characteristics of memory system can
lead to major speed improvements
CS 105
Memory Referencing Bug
Example
main ()
{
long int a[2];
double d = 3.14;
a[2] = 1073741824; /* Out of bounds reference */
printf("d = %.15g\n", d);
// float in scientific notation
exit(0);
}
Alpha
MIPS
Linux
-g
5.30498947741318e-315 3.1399998664856
3.14
-O
3.14
3.14
3.14
(Linux version gives correct result, but implementing as
separate function gives segmentation fault.) – storage
allocation issue
– 25 –
CS 105
Memory Referencing Errors
C and C++ do not provide any memory protection



Out of bounds array references
Invalid pointer values
Abuses of malloc/free
Can lead to nasty bugs


Whether or not bug has any effect depends on system and
compiler
Action at a distance
 Corrupted object logically unrelated to one being accessed
 Effect of bug may be first observed long after it is generated
How can I deal with this?


– 26 –

Program in Java, Lisp, or ML
Understand what possible interactions may occur
Use or develop tools to detect referencing errors
CS 105
Memory System Performance
Example
void copyij(int
int
{
int i,j;
for (i = 0; i
for (j = 0;
dst[i][j]
}
src[2048][2048],
dst[2048][2048])
< 2048; i++)
j < 2048; j++)
= src[i][j];
void copyji(int src[2048][2048],
int dst[2048][2048])
{
int i,j;
for (j = 0; j < 2048; j++)
for (i = 0; i < 2048; i++)
dst[i][j] = src[i][j];
}
21 times slower
(Pentium 4)
Hierarchical memory organization
Performance depends on access patterns


– 27 –
Including how step through multi-dimensional array
Order of elements matters ???
CS 105
The Memory Mountain
Read throughput (MB/s)
Pentium III Xeon
550 MHz
16 KB on-chip L1 d-cache
16 KB on-chip L1 i-cache
512 KB off-chip unified
L2 cache
1200
1000
L1
800
Live in L1,
very fast
600
400
L2
200
– 28 –
2k
8k
32k
128k
512k
2m
8m
s15
s13
s11
Stride (words)
s9
s7
Mem
s5
s3
s1
0
Working set size (bytes), ???
CS 105
Great Reality #4
There’s more to performance than asymptotic
complexity !!!
Constant factors matter too!


Easily see 10:1 performance range depending on how code
written
Must optimize at multiple levels: algorithm, data
representations, procedures, and loops
Must understand system to optimize performance



– 29 –
How programs compiled and executed
How to measure program performance and identify
bottlenecks
How to improve performance without destroying code
modularity and generality
CS 105
Example Matrix Multiplication
Matrix-Matrix Multiplication (MMM) on 2 x Core 2 Duo 3 GHz (double precision)
Gflop/s
50
45
40
Best code (K. Goto)
35
30
25
160x
20
15
10
5
Triple loop
0
0
1,000
2,000
3,000
4,000
5,000
6,000
7,000
8,000
9,000
matrix size
Standard desktop computer, vendor compiler, using optimization flags
Both implementations have exactly the same operations count (2n3)
What is going on?
– 30 –
CS 105
MMM Plot: Analysis
Matrix-Matrix Multiplication (MMM) on 2 x Core 2 Duo 3 GHz
Gflop/s
50
45
40
35
30
Multiple threads: 4x
25
20
15
10
Vector instructions: 4x
5
Memory hierarchy and other optimizations: 20x
0
0
1,000
2,000
3,000
4,000
5,000
6,000
7,000
8,000
9,000
matrix size


– 31 –
Reason for 20x: Blocking or tiling, loop unrolling, array scalarization, instruction
scheduling, search to find best choice
Effect: less register spills, less L1/L2 cache misses, less TLB misses
CS 105
Great Reality #5
Computers do more than execute programs
They need to get data in and out

I/O system critical to program reliability and performance
They communicate with each other over networks

Many system-level issues arise in presence of network
 Concurrent operations by autonomous processes
 Coping with unreliable media
 Cross platform compatibility
 Complex performance issues
– 32 –
CS 105
Role within Curriculum
CS 125
Networks
Network
Protocols
CS 124
Operating
Systems
CS 132
Compilers
Processes
Mem. Mgmt
Machine Code
Optimization
Data Structures
Applications
Programming
– 33 –
CS 136
Advanced Arch
Exec. Model
Memory System
CS 105
Systems
CS 70
C++ &
Structures
CS 156
Parallel
CS 60
Principles of CS
Transition from Abstract to
Concrete!


From: high-level language
model
To: underlying
implementation
CS 105
Course Perspective
Most Systems Courses are Builder-Centric

Computer Architecture
 Design pipelined processor in Verilog

Operating Systems
 Implement large portions of operating system

Compilers
 Write compiler for simple language

Networking
 Implement and simulate network protocols
– 34 –
CS 105
Course Perspective (Cont.)
This Course is Programmer-Centric


Purpose is to show how, by knowing more about the
underlying system, one can be more effective as a
programmer
Enable you to
 Write programs that are more reliable and efficient
 Incorporate features that require hooks into OS
» E.g., concurrency, signal handlers

Not just a course for dedicated hackers
 We bring out the hidden hacker in everyone

– 35 –
Cover material that you will not see elsewhere
CS 105
The End
– 36 –
CS 105
Memory Performance Example
Implementations of Matrix Multiplication

Multiple ways to nest loops
/* ijk */
for (i=0; i<n; i++) {
for (j=0; j<n; j++) {
sum = 0.0;
for (k=0; k<n; k++)
sum += a[i][k] * b[k][j];
c[i][j] = sum;
}
}
– 37 –
/* jik */
for (j=0; j<n; j++) {
for (i=0; i<n; i++) {
sum = 0.0;
for (k=0; k<n; k++)
sum += a[i][k] * b[k][j];
c[i][j] = sum
}
}
CS 105
Matmult Performance (Alpha
21164)
Too big for L1 Cache
Too big for L2 Cache
160
140
120
ijk
100
ikj
jik
80
jki
kij
60
kji
40
20
0
matrix size (n)
– 38 –
CS 105
Blocked matmult perf (Alpha 21164)
160
140
120
100
bijk
bikj
80
ijk
ikj
60
40
20
0
50
75
100 125 150 175 200 225 250 275 300 325 350 375 400 425 450 475 500
matrix size (n)
– 39 –
CS 105