Introduction

Download Report

Transcript Introduction

Advanced Computer Architecture
From Fall 2007
March 29, 2017
1
Advanced Computer Architecture
Nvidia Fermi GPU
March 29, 2017
2
Slides from ...


We use course slides from :
— “CS232: Advanced Computer Architecture II”
— http://www.cs.uiuc.edu/class/fa11/cs232/lectures/
©2007-2011 Craig Zilles (adapted from slides by Howard Huang)
March 29, 2017
3
What is computer architecture about?


Computer architecture is about building and analyzing computer systems.
This course is roughly split into three parts.
— The first third discusses instruction set architectures—the bridge
between hardware and software.
— Next, we introduce more advanced processor implementations. The
focus is on pipelining, which is one of the most important ways to
improve performance.
— Finally, we talk about memory systems, I/O, and how to connect it
all together.
March 29, 2017
4
What is computer architecture about?

Computer architecture is about building and analyzing computer systems.
HLL
Compiler
ASM
Processor
Memory
Input/Output

We will take a tour of the whole machine.

Specifically, we’ll…
March 29, 2017
5
Do low-level programming in a high-level language
HLL
Compiler
ASM
Processor
Memory
Input/Output

We’ll look at bit-wise logical and shifting operations in C (Stage HLL:
High Level Language ).
March 29, 2017
6
Study Instruction Set Architectures
HLL
Compiler
ASM
Processor
Memory
Input/Output

The Instruction Set Architecture (ISA) is the bridge between the hardware
and the software.
— We’ll learn the MIPS ISA in detail
— We’ll get a brief introduction to the x86 ISA
— We’ll learn how HLL program constructs are represented to the
machine
— We won’t learn how compilers work, but we’ll learn what they do
March 29, 2017
7
MIPS (in Turkish)






Microprocessor without Interlocked Pipeline Stages, MIPS Technologies, 1985
— http://tr.wikipedia.org/wiki/MIPS_Mimarisi
İndirgenmiş komut kümesi türü ilk mikroişlemci mimarisidir.
— Her komut aynı boyuttadır ve komut bilgisayar donanımı tarafından kolayca
çözülebilir.
Intel x86 ise karmaşık komut kümeli bilgisayar sayılır.
— Komutların boyutları farklıdır ve komutları çözebilmek için bilgisayar
donanımına gömülmüş programlar (microcode) gereklidir.
RISC yapısından ötürü tasarımı çok temiz ve basittir.
— Sistem karmaşık işlemleri destekleyen yapılar yaratmaktansa sık yapılan basit
işlemleri iyileştirme üzerine kuruludur.
— Bu tasarım avantajından dolayı üniversitelerdeki bilgisayar mimarisi
derslerinde genellikle MIPS mimarisi okutulur.
— Yine basit ve sağlam tasarımından ötürü çoğu modern mikroişlemci mimarisi
(IBM/Motorola PowerPC, DEC, ARM) MIPS mimarisinden esinlenerek
geliştirilmiştir.
1990 itibariyle üretilen her üç RISC işlemciden birinin MIPS mimarisinde olduğu
tahmin edilmektedir. İlk MIPS tasarımları 32 bit, daha yeni tasarımlar ise 64
bittir.
MIPS mimarisi SGI bilgisayarlarından gömülü sistemlere kadar geniş bir yelpazede
kullanılmaktadır.
— Örneğin; Nintendo 64, Sony PlayStation ve Sony PSP MIPS mimarisi ile çalışan
işlemcilere sahiptirler.
March 29, 2017
8
Learn
HLL
Compiler
about performance
ASM
Processor
Memory
Input/Output


We’ll learn how to performance tune programs.
We’ll exploit explicit parallelism to make programs run faster
— We’ll optimize a program using SSE instructions
March 29, 2017
9
Learn about Modern Processor Organization
HLL
Compiler
ASM
Processor
Memory
Input/Output

The key technique we’ll focus on is: Pipelining
— Pipelining allows processors to work on multiple instructions at the
same time.
March 29, 2017
10
Learn about Memory and I/O systems
HLL
Compiler
ASM
Processor
Memory
Input/Output



We’ll learn how virtual memory makes programming easy
We’ll learn how caches make memory fast
We’ll learn about buses and disks
March 29, 2017
11
Why should you care?

It is interesting.
— How do you make a processor that runs at 3Ghz?

It will help you be a better programmer.
— Understanding how your program is translated to assembly code lets
you reason about correctness and performance.
— Demystify the seemingly arbitrary (e.g., bus errors, segmentation faults)

Many cool jobs require an understanding of computer architecture.
— The cutting edge is often pushing computers to their limits.
— Supercomputing, games, portable devices, etc.

Computer architecture illustrates many fundamental ideas in computer
science
— Abstraction, caching, and indirection are CS staples
March 29, 2017
12
Instruction set architectures
Software
ISA
Hardware

In this course, we’ll talk about several important issues that we didn’t
see in the simple processor from predecessor courses.
— The instruction set in predecessor courses lacked many features, such
as support for function calls. We’ll work with a larger, more realistic
processor.
— We’ll also see more ways in which the instruction set architecture
affects the hardware design.
March 29, 2017
13
Basic vs. Advanced Architecture


This class expands upon the computer architecture material, and we rely
on many other ideas from predecessor courses.
— Understanding binary, hexadecimal and two’s-complement numbers
is still important.
— Devices like multiplexers, registers and ALUs appear frequently. You
should know what they do, but not necessarily how they work.
— Finite state machines and sequential circuits will appear again.
We do not spend much time with logic design topics like Karnaugh maps,
Boolean algebra, latches and flip-flops.
Y
W
0
0
1
1
0
0
1
1
0
1
0
0
0
1
0
0
X
Z
March 29, 2017
14
MIPS
 In this class, we’ll use the MIPS instruction set architecture (ISA) to
illustrate concepts in assembly language and machine organization
— Of course, the concepts are not MIPS-specific
— MIPS is just convenient because it is real, yet simple (unlike x86)
 The MIPS ISA is still used in many places today. Primarily in embedded
systems, like:
— Various routers from Cisco
— Game machines like the Nintendo 64 and Sony Playstation 2
March 29, 2017
15
What you will need to learn this month

You must become “fluent” in MIPS assembly:
— Translate from C to MIPS and MIPS to C

Example problem from a previous mid-term 1:
Question 3: Write a recursive function (30 points)
Here is a function pow that takes two arguments (n and m, both 32-bit
numbers) and returns nm (i.e., n raised to the mth power).
int
pow(int n, int m) {
if (m == 1)
return n;
return n * pow(n, m-1);
}
Translate this into a MIPS assembly language function.
March 29, 2017
16
MIPS: register-to-register, three address
 MIPS is a register-to-register, or load/store, architecture.
— The destination and sources must all be registers.
— Special instructions, which we’ll see later today, are needed to access
main memory.
 MIPS uses three-address instructions for data manipulation.
— Each ALU instruction contains a destination and two sources.
— For example, an addition instruction (a = b + c) has the form:
operation
add
operands
a,
destination
March 29, 2017
b,
c
sources
17
Register file review
 Here is a block symbol for a general 2k  n register file.
— If Write = 1, then D data is stored into D address.
— You can read from two registers at once, by supplying the A address
and B address inputs. The outputs appear as A data and B data.
 Registers are clocked, sequential devices.
— We can read from the register file at any time.
— Data is written only on the positive edge of the clock.
n
D data
k
Write
D address
2k  n Register File
k
A address
A data
n
March 29, 2017
B address
k
B data
n
18
MIPS register file
 MIPS processors have 32 registers, each of which holds a 32-bit value.
— Register addresses are 5 bits long.
— The data inputs and outputs are 32-bits wide.
 More registers might seem better, but there is a limit to the goodness.
— It’s more expensive, because of both the registers themselves as well
as the decoders and muxes needed to select individual registers.
— Instruction lengths may be affected, as we’ll see in the future.
32
D data
5
Write
D address
32  32 Register File
5
March 29, 2017
A address
B address
A data
B data
32
32
5
19
writenum
Internal organization of Register File
Decoder
Input bus
8
loadx
8
reg7
8
reg6
8
reg5
8
8
reg4
reg3
8
reg2
8
reg1
8
reg0
clk
readnum
Eight input, 8-bit wide multiplexer
8
Output bus
 For 8 registers (3 bit address) and 8 bit data
20
MIPS register names
 MIPS register names begin with a $. There are two naming conventions:
— By number:
$0
$1
$2
…
$31
— By (mostly) two-character names, such as:
$a0-$a3
$s0-$s7
$t0-$t9
$sp
$ra
 Not all of the registers are equivalent:
— E.g., register $0 or $zero always contains the value 0
• (go ahead, try to change it)
 Other registers have special uses, by convention:
— E.g., register $sp is used to hold the “stack pointer”
 You have to be a little careful in picking registers for your programs.
March 29, 2017
21
All registers
March 29, 2017
22
Basic arithmetic and logic operations
 The basic integer arithmetic operations include the following:
add
sub
mul
div
 And here are a few logical operations:
and
or
xor
 Remember that these all require three register operands; for example:
add $t0, $t1, $t2
mul $s1, $s1, $a0
# $t0 = $t1 + $t2
# $s1 = $s1 x $a0
Note: a full MIPS ISA reference can be found in Appendix A
(linked from website)
March 29, 2017
23
Larger expressions
 More complex arithmetic expressions may require multiple operations at
the instruction set level.
t0  (t1  t2)  (t3  t4)
add $t0, $t1, $t2
sub $s0, $t3, $t4
mul $t0, $t0, $s0
# $t0 contains $t1 + $t2
# Temporary value $s0 = $t3 - $t4
# $t0 contains the final product
 Temporary registers may be necessary, since each MIPS instructions can
access only two source registers and one destination.
— In this example, we could re-use $t3 instead of introducing $s0.
— But be careful not to modify registers that are needed again later.
March 29, 2017
24
Immediate operands
 The ALU instructions we’ve seen so far expect register operands. How do
you get data into registers in the first place?
— Some MIPS instructions allow you to specify a signed constant, or
“immediate” value, for the second source instead of a register. For
example, here is the immediate add instruction, addi:
addi $t0, $t1, 4
# $t0 = $t1 + 4
— Immediate operands can be used in conjunction with the $zero
register to write constants into registers:
addi $t0, $0, 4
# $t0 = 4
 MIPS is still considered a load/store architecture, because arithmetic
operands cannot be from arbitrary memory locations. They must either
be registers or constants that are embedded in the instruction.
March 29, 2017
25
A more complete example

What if we wanted to compute the following?
1+2+3+4
March 29, 2017
26
We need more space!
 Registers are fast and convenient, but we have only 32 of them, and each
one is just 32-bits wide.
— That’s not enough to hold data structures like large arrays.
— We also can’t access data elements that are wider than 32 bits.
 We need to add some main memory to the system!
— RAM is cheaper and denser than registers, so we can add lots of it.
— But memory is also significantly slower, so registers should be used
whenever possible.
 In the past, using registers wisely was the programmer’s job.
— For example, C has a keyword “register” that marks commonly-used
variables which should be kept in the register file if possible.
— However, modern compilers do a pretty good job of using registers
intelligently and minimizing RAM accesses.
March 29, 2017
27
Memory review
 Memory sizes are specified much like register files; here is a 2k x n RAM.
2k  n memory
k
n
ADRS
DATA
CS
WR
OUT
n
CS
WR
0
1
1
x
0
1
Operation
None
Read selected address
Write selected address
 A chip select input CS enables or “disables” the RAM.
 ADRS specifies the memory location to access.
 WR selects between reading from or writing to the memory.
— To read from memory, WR should be set to 0. OUT will be the n-bit
value stored at ADRS.
— To write to memory, we set WR = 1. DATA is the n-bit value to store
in memory.
March 29, 2017
28
MIPS memory
232  8 memory
32
8
ADRS
DATA
CS
WR
OUT
8
 MIPS memory is byte-addressable, which means that each memory address
references an 8-bit quantity.
 The MIPS architecture can support up to 32 address lines.
— This results in a 232 x 8 RAM, which would be 4 GB of memory.
— Not all actual MIPS machines will have this much!
Address
0
1
2
3
4
5
6
7
8
9
10 11
8-bit data
March 29, 2017
29
Loading and storing bytes
 The MIPS instruction set includes dedicated load and store instructions for
accessing memory, much like the basic processor.
 The main difference is that MIPS uses indexed addressing.
— The address operand specifies a signed constant and a register.
— These values are added to generate the effective address.
 The MIPS “load byte” instruction lb transfers one byte of data from main
memory to a register.
lb $t0, 20($a0)
# $t0 = Memory[$a0 + 20]
 The “store byte” instruction sb transfers the lowest byte of data from a
register into main memory.
sb $t0, 20($a0)
March 29, 2017
# Memory[$a0 + 20] = $t0
30
Byte loads

Question: if you load a byte (8 bits) into a register (32 bits), what value
do those other 24 bits have?

Consider the code mem.s demonstrated in class (see website)
$t0 is initialized to 0x0000FFFF

if the byte 0x08 is loaded into $t0 ( using lb ), then
$t0 = 0xFFFFFF80

What’s going on there?? Answer: sign-extension !
March 29, 2017
31
Loading and storing words
 You can also load or store 32-bit quantities—a complete word instead of
just a byte—with the lw and sw instructions.
lw $t0, 20($a0)
sw $t0, 20($a0)
# $t0 = Memory[$a0 + 20]
# Memory[$a0 + 20] = $t0
 Most programming languages support several 32-bit data types.
— Integers
— Single-precision floating-point numbers
— Memory addresses, or pointers
 Unless otherwise stated, we’ll assume words are the basic unit of data.
Address
0
1
2
3
4
5
6
7
8
9
10 11
8-bit data
March 29, 2017
32
An array of words
 Remember to be careful with memory addresses when accessing words.
 For instance, assume an array of words begins at address 2000.
— The first array element is at address 2000.
— The second word is at address 2004, not 2001.
 Revisiting the earlier example, if $a0 contains 2000, then
lw $t0, 0($a0)
accesses the first word of the array, but
lw $t0, 8($a0)
would access the third word of the array, at address 2008.
March 29, 2017
33
Computing with memory
 So, to compute with memory-based data, you must:
1. Load the data from memory to the register file.
2. Do the computation, leaving the result in a register.
3. Store that value back to memory if needed.
 For example, let’s say that you wanted to do the same addition, but the
values were in memory. How can we do the following using MIPS assembly
language?
char A[4] = {1, 2, 3, 4};
int result;
result = A[0] + A[1] + A[2] + A[3];
March 29, 2017
34
Memory alignment
 Keep in mind that memory is byte-addressable, so a 32-bit word actually
occupies four contiguous locations (bytes) of main memory.
Address
0
1
2
3
4
5
6
7
8
9
10 11
8-bit data
Word 1
Word 2
Word 3
 The MIPS architecture requires words to be aligned in memory; 32-bit
words must start at an address that is divisible by 4.
— 0, 4, 8 and 12 are valid word addresses.
— 1, 2, 3, 5, 6, 7, 9, 10 and 11 are not valid word addresses.
— Unaligned memory accesses result in a bus error, which you may have
unfortunately seen before.
 This restriction has relatively little effect on high-level languages and
compilers, but it makes things easier and faster for the processor.
March 29, 2017
35
Low-level Programming in “High-level” Languages

Very often it is necessary to store a large number of very small data
items.

Example: A Social Security Number (SSN) registry
— Needs to keep track of how which SSNs have already been allocated.
How much space is required?

March 29, 2017
36
Storing collections of bits as integers

Store N bits in each N-bit integer, only need 109/N integers
— Requires 109/8 bytes = 125MBs of storage (fits on a CD)

Allocate array:
int array_size = 1000000000/_____________
unsigned int SSN_registry[array_size];
01000110011111010111010100101001
11011101011010010001010100101011
10010101111010010101100100100010
10010101010101100101010010110010
10010101000101011001010111111101
00001000010111001100100011110101
01101011101001010001000000101011

Want two operations on this array:
— check_SSN: returns 1 if SSN is used, 0 otherwise
— set_SSN: marks an SSN as used.
March 29, 2017
SSN #7
SSN #68
37
check_SSN
int check_SSN(unsigned int SSN_registry[], int ssn) {
int word_index = ssn / (8*sizeof(int));
int word = SSN_registry[word_index];
10010101000101011001010111111101
March 29, 2017
01000110011111010111010100101001
11011101011010010001010100101011
10010101111010010101100100100010
10010101010101100101010010110010
10010101000101011001010111111101
00001000010111001100100011110101
01101011101001010001000000101011
38
check_SSN
int check_SSN(unsigned int SSN_registry[], int ssn) {
int word_index = ssn / (8*sizeof(int));
int word = SSN_registry[word_index];
int bit_offset = ssn % (8*sizeof(int)) // % is the remainder operation
word = word >> bit_offset; // >> shifts a value “right”
(note: zeros are inserted at the left because it is an unsigned int)
10010101000101011001010111111101
01000110011111010111010100101001
11011101011010010001010100101011
10010101111010010101100100100010
10010101010101100101010010110010
10010101000101011001010111111101
00001000010111001100100011110101
01101011101001010001000000101011
00001001010100010101100101011111
March 29, 2017
39
check_SSN
int check_SSN(unsigned int SSN_registry[], int ssn) {
int word_index = ssn / (8*sizeof(int));
int word = SSN_registry[word_index];
int bit_offset = ssn % (8*sizeof(int))
word = word >> bit_offset;
word = (word & 1); // & is the bit-wise logical AND operator
return word;
(each bit position is considered independently)
}
01000110011111010111010100101001
10010101000101011001010111111101
11011101011010010001010100101011
10010101111010010101100100100010
10010101010101100101010010110010
10010101000101011001010111111101
00001000010111001100100011110101
01101011101001010001000000101011
00001001010100010101100101011111
&
00000000000000000000000000000001
00000000000000000000000000000001
March 29, 2017
40
set_SSN
void set_SSN(unsigned int SSN_registry[], int ssn) {
int bit_offset = ssn % (8*sizeof(int))
int new_bit = (1 << bit_offset) // “left” shift the 1 to the desired spot
(always shifts in 0’s at the right)
000000000000000000000000000000001
01000110011111010111010100101001
11011101011010010001010100101011
000000000000000000000010000000000
10010101111010010101100100100010
10010101010101100101010010110010
10010101000101011001010111111101
00001000010111001100100011110101
01101011101001010001000000101011
March 29, 2017
41
set_SSN
void set_SSN(unsigned int SSN_registry[], int ssn) {
int bit_offset = ssn % (8*sizeof(int))
int new_bit = (1 << bit_offset)
int word_index = ssn / (8*sizeof(int));
int word = SSN_registry[word_index];
000000000000000000000000000000001
01000110011111010111010100101001
11011101011010010001010100101011
000000000000000000000010000000000
100101010001010110010101111111101
10010101111010010101100100100010
10010101010101100101010010110010
10010101000101011001010111111101
00001000010111001100100011110101
01101011101001010001000000101011
March 29, 2017
42
set_SSN
void set_SSN(unsigned int SSN_registry[], int ssn) {
int bit_offset = ssn % (8*sizeof(int))
int new_bit = (1 << bit_offset)
int word_index = ssn / (8*sizeof(int));
int word = SSN_registry[word_index];
word = word | new_bit; // bit-wise logical OR sets the desired bit
000000000000000000000000000000001
01000110011111010111010100101001
11011101011010010001010100101011
|
000000000000000000000010000000000
100101010001010110010101111111101
100101010001010110010111111111101
March 29, 2017
10010101111010010101100100100010
10010101010101100101010010110010
10010101000101011001010111111101
00001000010111001100100011110101
01101011101001010001000000101011
43
set_SSN
void set_SSN(unsigned int SSN_registry[], int ssn) {
int bit_offset = ssn % sizeof(int)
int new_bit = (1 << bit_offset)
int word_index = ssn / sizeof(int);
int word = SSN_registry[word_index];
word = word | new_bit;
SSN_registry[word_index] = word; // write back the word into array
}
000000000000000000000000000000001
01000110011111010111010100101001
11011101011010010001010100101011
|
000000000000000000000010000000000
100101010001010110010101111111101
100101010001010110010111111111101
10010101111010010101100100100010
10010101010101100101010010110010
10010101000101011001011111111101
00001000010111001100100011110101
01101011101001010001000000101011
Shorthand for last 3 lines: SSN_registry[word_index] |= new_bit;
March 29, 2017
44
What you just saw



Storage of a collection of booleans in an integer
Use of bit-wise logical operations to read and write specific bits
Use of shifts to move bits with an integer

This stuff gets used all over the place in real code:
— Bitmap graphics
— Network packet headers
— Operating system tracking free disk blocks
March 29, 2017
45
General hints to reach to nirvana




Remember the big picture.
What are we trying to accomplish, and why?
Read the textbook.
It’s clear, well-organized, and well-written. The diagrams can be
complex, but are worth studying. Work through the examples and try
some exercises on your own. Read the “Real Stuff” and “Historical
Perspective” sections.
Talk to each other.
You can learn a lot from other students, both by asking and answering
questions. Find some good partners for the homeworks (but make sure
you all understand what’s going on).
Help us help you.
Come to lectures, sections and office hours. Send email. Ask lots of
questions! Check out the web page
March 29, 2017
46