Transcript: Page Fault
Computer Architecture: Foundations for Graduate Level Students
Basic Paradigm
[Diagram: the CPU, cache, main memory (MM), and hard disk (HD), with data transfers between them]
• The CPU needs a particular piece of data
• If the required data is found in the cache, the CPU gets it from the cache
• When the required data is not in the cache, it is transferred from MM to the cache; the CPU ultimately gets the data from the cache
• If the data is in neither the cache nor MM, it is transferred from HD to MM to the cache
• If MM is full, or if the cache is full, existing contents must be replaced (swapping between MM and cache)
Access Time
• If every memory reference to the cache required the transfer of one word between MM and the cache, no increase in speed would be achieved. In fact, speed would drop because, apart from the MM access, there is an additional access to the cache
• Suppose a reference is repeated n times, and after the first reference the location is always found in the cache; the average access time then approaches the cache access time as n grows
Cache Hit Ratio
• The probability that a word will be found in the cache
• Depends upon the program and the size and organization of the cache

h = (number of times the required word is found in the cache) / (total number of references)

where h is the hit ratio.
Access Time
ta = tc + (1 - h) tm

where:
ta = average access time
tc = cache access time
(1 - h) = miss ratio
tm = memory access time
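As a quick illustration, here is a minimal Python sketch of the formula above; the function name and the 2 ns / 60 ns timing figures are assumptions for the example, not values from the lecture.

```python
def average_access_time(t_cache, t_mem, hit_ratio):
    """Average access time: ta = tc + (1 - h) * tm."""
    return t_cache + (1.0 - hit_ratio) * t_mem

# Illustrative (assumed) timings: 2 ns cache access, 60 ns main memory access.
t_c, t_m = 2.0, 60.0
for h in (0.0, 0.5, 0.9, 0.99):
    print(f"h = {h:.2f} -> ta = {average_access_time(t_c, t_m, h):.1f} ns")
```

As the hit ratio h approaches 1, the average access time approaches the cache access time tc.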
Fetch Mechanisms
• Demand Fetch
– Fetch a block from memory when it is needed
and is not in the cache
• Prefetch
– Fetch block/s from memory before they are
requested
• Selective Fetch
– Blocks are not always fetched into the cache; depending on some defined criterion, they may be kept in MM rather than the cache
Data in cache should be replaced
with data from MM
• Blocks (a group of memory addresses) are
transferred from MM to cache
• Cache has a limited capacity (page frame)
Replacement Algorithms
• When the word being requested by the CPU is not in the cache, it needs to be transferred from MM (or, similarly, from secondary memory to MM)
• A page fault occurs when a page or block is not in the cache (or not in MM, in the case of secondary memory)
• Replacement algorithms determine which
page/block to remove or overwrite
Characteristics
• Usage based or non-usage based
– Usage based: the choice of page/block to replace depends on how many times each page/block has been referenced
– Non-usage based: uses some other criterion for replacement
Assumptions
• For a given page size, we only need to consider the
page/block number.
• If we have a reference (hit) to a page p, then any immediately succeeding references to p do not cause a page fault
• The size of memory/cache is represented as the number of pages it is capable of holding (the page frame)
Example
Consider the following address sequence calls:
0110 0432 0101 0612 0102 0103 0104
0101 0611 0102 0103 0302
which, at 100 bytes per page, can be reduced to the
following access string:
1 4 1 6 1 6 1 3
This sequence of page requests is called a reference
string.
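The reduction above can be checked with a small Python sketch; the function name is made up for illustration, and the 100-bytes-per-page divisor comes from the example.

```python
def to_reference_string(addresses, page_size=100):
    """Map each address to its page number and drop immediately
    repeated pages, as in the example above."""
    pages = [addr // page_size for addr in addresses]
    reduced = []
    for p in pages:
        if not reduced or reduced[-1] != p:   # immediately repeated page: no new fault
            reduced.append(p)
    return reduced

addrs = [110, 432, 101, 612, 102, 103, 104, 101, 611, 102, 103, 302]
print(to_reference_string(addrs))   # [1, 4, 1, 6, 1, 6, 1, 3]
```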
Replacement Policies
• Random replacement algorithm
• First-in first-out (FIFO) replacement
• Optimal algorithm
• Least recently used (LRU) algorithm
• Least frequently used (LFU)
• Most frequently used (MFU)
Random Replacement
• A page is chosen randomly at page fault
time
• There is no relationship between the
pages or their use.
• Choice is done by a random number
generator.
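A minimal Python sketch of this policy (the function name and the fixed seed are assumptions for illustration): at page-fault time a victim frame is picked by a random number generator, with no regard to how the pages have been used.

```python
import random

def random_replacement_faults(reference_string, num_frames, seed=0):
    """Random replacement: choose the victim frame at random on each fault."""
    random.seed(seed)        # fixed seed only so that runs are repeatable
    frames = []
    faults = 0
    for page in reference_string:
        if page in frames:
            continue                              # hit: nothing to do
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)                   # free frame available
        else:
            frames[random.randrange(num_frames)] = page   # overwrite a random frame
    return faults

print(random_replacement_faults([7, 0, 1, 2, 0, 3, 0, 4, 2], 3))
```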
FIFO
• Memory treated as a queue
– When a page comes in, it is inserted at the tail
– When a page is removed, the entry at the
head of the queue gets deleted
• Easy to understand and program
• Performance is not consistently good;
dependent on reference string
FIFO Example
Consider the following reference string, with a page frame of 3:

Reference:  7   0   1   2   0   3   0   4   2
Fault:      *   *   *   *       *   *   *   *
Frames:     7   0   1   2   2   3   0   4   2
                7   0   1   1   2   3   0   4
                    7   0   0   1   2   3   0

An * indicates a miss (the page requested by the CPU is not in the cache or in MM).
FIFO Example #2
Consider the following reference string, with a page frame of 3:

Reference:  1   2   3   4   1   2   5   1   2   3   4   5
Fault:      *   *   *   *   *   *   *           *   *
Frames:     1   2   3   4   1   2   5   5   5   3   4   4
                1   2   3   4   1   2   2   2   5   3   3
                    1   2   3   4   1   1   1   2   5   5

We have 9 page faults
Try performing this FIFO with a page frame of 4
Belady’s Anomaly
• An increase in the number of page frames does not necessarily mean a decrease in page faults
• More formally, Belady’s anomaly reflects
the fact that, for some page-replacement
algorithms, the page fault rate may
increase as the number of allocated
frames increases
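A small FIFO simulator makes the anomaly easy to reproduce. This is a sketch (the function name is made up); run on the reference string of FIFO Example #2 it gives 9 faults with 3 frames and 10 faults with 4 frames.

```python
from collections import deque

def fifo_faults(reference_string, num_frames):
    """Simulate FIFO replacement and return the number of page faults."""
    frames = set()      # pages currently resident
    queue = deque()     # arrival order; the head is the oldest page
    faults = 0
    for page in reference_string:
        if page in frames:
            continue                         # hit: FIFO order is unchanged
        faults += 1
        if len(frames) == num_frames:
            frames.remove(queue.popleft())   # evict the page at the head
        frames.add(page)
        queue.append(page)                   # newest page joins the tail
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]   # FIFO Example #2
print(fifo_faults(refs, 3))   # 9 page faults
print(fifo_faults(refs, 4))   # 10 page faults: Belady's anomaly
```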
Optimal Algorithm
• The page that will not be used for the
longest period of time is replaced
• Guarantees the lowest page fault rate for a
fixed number of frames
• Difficult to implement because it requires
future knowledge of the reference string
Optimal Algorithm Example
Consider the following reference string, with a page frame of 3:

Reference:  7   0   1   2   0   3   0   4   2
Fault:      *   *   *   *       *       *
Frames:     7   0   1   2   2   3   3   4   4
                7   0   1   1   2   2   3   3
                    7   0   0   0   0   2   2

We look ahead and see that 7 is the page that will not be used again, so we replace 7. Note also that after our first hit we should not replace 0, but rather 1, because 1 will not be referenced any more (2 will still be referenced later).
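The look-ahead rule can be sketched in Python as follows (function name assumed; when several resident pages are never referenced again, this version evicts whichever it finds first, one of several equally optimal tie-breaks):

```python
def opt_faults(reference_string, num_frames):
    """Optimal (Belady) replacement: evict the resident page whose next
    use lies furthest in the future, or that is never used again."""
    frames = []
    faults = 0
    for i, page in enumerate(reference_string):
        if page in frames:
            continue                           # hit
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)                # free frame available
            continue
        future = reference_string[i + 1:]
        def next_use(p):                       # distance to the next reference of p
            return future.index(p) if p in future else float("inf")
        frames.remove(max(frames, key=next_use))
        frames.append(page)
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2]
print(opt_faults(refs, 3))   # 6 page faults for the example above
```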
Least Recently Used
• Approximates the optimal algorithm
• Replaces the page that has not been used
for the longest period of time
• When all page frames have been used up
and every time there is a page hit, the
referenced page is placed at the tail to
indicate it has been recently accessed
LRU Example
Consider the following reference string, with a page frame of 3:

Reference:  7   0   1   2   0   3   0   4   0   3   0   2
Fault:      *   *   *   *       *       *               *
Frames:     7   0   1   2   0   3   0   4   0   3   0   2
                7   0   1   2   0   3   0   4   0   3   0
                    7   0   1   2   2   3   3   4   4   3

We have 7 page faults
Try performing this LRU with a page frame of 4
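A sketch of LRU using an ordered dictionary (the function name is assumed): on a hit the page moves to the tail, and on a fault with full frames the page at the head, the least recently used one, is evicted.

```python
from collections import OrderedDict

def lru_faults(reference_string, num_frames):
    """LRU replacement using insertion order to track recency (oldest first)."""
    frames = OrderedDict()
    faults = 0
    for page in reference_string:
        if page in frames:
            frames.move_to_end(page)       # hit: mark as most recently used
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)     # evict the least recently used page
        frames[page] = True
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 0, 3, 0, 2]
print(lru_faults(refs, 3))   # 7 page faults, matching the example
```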
Least Frequently Used
• Counts the number of references made to each
page; when page is accessed, counter is
incremented by one
• Page with smallest count is replaced
• FIFO is used to resolve a tie
• Rationale: a page with a bigger counter is an actively used page
• Problem
– A page that was heavily referenced early on may never be used again, yet its high count keeps it resident
– Solved by using a decaying counter
LFU Example
Consider the following reference string, with a page frame of 3 (each entry below is shown as page:count, where count is the number of references made to that page so far):

Reference:  7    0    1    2    0    3    0    4    0    3    0    2
Fault:      *    *    *    *         *         *                   *
Frames:     7:1  0:1  1:1  2:1  2:1  3:1  3:1  4:1  4:1  4:1  4:1  2:1
                 7:1  0:1  1:1  1:1  2:1  2:1  3:1  3:1  3:2  3:2  3:2
                      7:1  0:1  0:2  0:2  0:3  0:3  0:4  0:4  0:5  0:5

We have 7 page faults.
Try performing this LFU with a page frame of 4.
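The counting rule with the FIFO tie-break can be sketched as follows (function name assumed); note that a page's counter starts again at 1 if it is brought back after being evicted, as in the example.

```python
def lfu_faults(reference_string, num_frames):
    """LFU replacement: evict the resident page with the smallest reference
    count, breaking ties by FIFO (earliest-loaded page goes first)."""
    counts = {}     # resident page -> reference count
    loaded = []     # resident pages in load (FIFO) order
    faults = 0
    for page in reference_string:
        if page in counts:
            counts[page] += 1                # hit: increment the counter
            continue
        faults += 1
        if len(loaded) == num_frames:
            # min() returns the first smallest-count page in load order (FIFO tie-break).
            victim = min(loaded, key=lambda p: counts[p])
            loaded.remove(victim)
            del counts[victim]
        loaded.append(page)
        counts[page] = 1
    return faults

refs = [7, 0, 1, 2, 0, 3, 0, 4, 0, 3, 0, 2]
print(lfu_faults(refs, 3))   # 7 page faults, matching the example
```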
Most Frequently Used
• Opposite of LFU
• Replace the page with the highest count
• Tie is resolved using FIFO
• Based on the argument that the page with the smallest count has probably just been brought in and is yet to be used
• Both LFU and MFU are uncommon, and their implementation is expensive
The Central Processing Unit
• The operating hub and heart of every
computer system
• Composed of
– Control Unit
– Datapath
• Each component inside the CPU has a
specific role in executing a command
• Communicates with other components of
the system
Control Unit (CU)
• Regulates all activities inside the machine
• Serves as “nerve center” that sends
control signals to other units and senses
their status
• Connected to all components in the CPU
as well as main memory
How The CU Is Connected
[Diagram: the control unit inside the CPU, connected to the registers and the ALU, and to main memory]
Inside the CPU: The Datapath
[Diagram: the datapath, consisting of the registers and the ALU]
Registers
• Components used for data storage (can
be read from or written to)
• High speed memory locations used to
store important information during CPU
operations
• Two types
– Special
– General-purpose
Special Registers
• Registers used for specific purposes
• Used heavily during execution of CPU
instructions
General Purpose Registers
• CPU registers used as “scratch pad”
during execution of machine-level
instructions
• Number varies between processors
Arithmetic Logic Unit (ALU)
• Performs all mathematical and logical
operations within the CPU
• Operands not in the CPU would have to
be retrieved from main memory
CPU-Memory Coordination
• Bus - a group of wires that connect separate components
• Types of bus:
– Control bus (control signals)
– Address bus (address information)
– Data bus (instruction/data)
CPU-Memory Coordination
• The different busses facilitate
communication between the CPU and
main memory
• Actions of the two components are highly synchronized to ensure efficient and timely execution of instructions
CPU Operations
• Instructions do not reside in the CPU, they
have to be fetched from memory
• Each machine level instruction is broken
down into a logical sequence of smaller
steps
CPU Operations
• Instructions are carried out by performing
one or more of the following functions in
some pre-specified sequence
– Retrieving data from main memory
– Putting data to main memory
– Register data transfer
– ALU operation
How An Instruction is Processed
• Instruction is retrieved from memory
• Analyze what the instruction is and how to
execute it
• Operands/parameters (if any) are fetched
from main memory
• Instruction is executed
• Results are stored (CPU or MM)
• Prepare for next instruction
Instruction Processing Example
• Fetch instruction from memory
• Decode it (turns out to be an ADD)
• Get the two numbers to add from MM
• Perform the addition
• Where will it be stored?
• Prepare for next instruction
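The same six steps can be written out as a toy Python sketch. The instruction format (a tuple) and the memory contents below are invented purely for illustration; they are not the encoding of any real CPU.

```python
# Hypothetical one-instruction program: ADD the words at 102 and 103, store at 104.
memory = {100: ("ADD", 104, 102, 103),   # instruction word at address 100
          102: 7, 103: 2, 104: 0}        # operands and result location
pc = 100                                 # program counter

instruction = memory[pc]                 # 1. fetch the instruction from memory
opcode, dst, src1, src2 = instruction    # 2. decode it (it turns out to be an ADD)
a, b = memory[src1], memory[src2]        # 3. get the two numbers to add from MM
result = a + b                           # 4. perform the addition (ALU operation)
memory[dst] = result                     # 5. store the result (here, back to MM)
pc += 1                                  # 6. prepare for the next instruction

print(memory[104])                       # 9
```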
Processing Data in Clusters
• Information is organized into groups of
fixed-size data that can be stored and
retrieved in a single, basic operation
• Each group of n bits is referred to as a
word of information
• Access to each word requires a distinct
name (location/address)
• Can also refer to a characteristic of other components (e.g. the size of a bus)
Word Length
• The size of a word, specified in bits, is known as the word length
• Possible benefits of a large word length:
– Faster processing (more data and/or
instructions can be fetched at a time)
– Greater numeric precision
– More powerful instructions (e.g. instructions
can have more operands)
Machine Language
• Composed of
– Instruction
– Data (instruction parameters)
• Instructions and data are represented by a
stream of 1s and 0s
• Cumbersome to deal with when preparing
programs; programmers use hexadecimal
numbers
• In some computers, both instruction and data
are stored in a single memory location
Assembly: An Improvement on
Machine Language
• Symbols, called mnemonics, are used to
represent instructions
• Sample instruction: add (105),(8)
– add is the instruction (mnemonic); (105),(8) are the instruction parameters
• Advantage:
– Easier recall of instructions
• Disadvantage:
– Need to convert mnemonics and hexadecimal numbers back to binary
How Programs Are Loaded Into
Main Memory
• Programs are loaded as binary numbers
• Assumptions:
– An instruction is represented by a 2-digit hexadecimal number (e.g. add by 1A, mov by A0)
– Word length of instruction parameters: 3 hex digits (12 bits)

Address   Contents (instruction followed by instruction parameters)
100       A01040F2
101       1A105008
102       A00F2200
103       01000000
104       00000009
105       …

Location 100 in binary: 1010 0000 0001 0000 0100 0000 1111 0010
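A short sketch of how such words can be picked apart, assuming each 8-hex-digit word holds a 2-digit opcode followed by two 3-digit parameters (which matches add (105),(8) being stored as 1A105008); the opcode table comes from the assumptions above, and the helper name is made up.

```python
OPCODES = {"1A": "add", "A0": "mov"}   # from the assumptions above

def decode(word_hex):
    """Split an 8-hex-digit word into opcode, two parameters, and its raw bits."""
    opcode, p1, p2 = word_hex[:2], word_hex[2:5], word_hex[5:8]
    mnemonic = OPCODES.get(opcode, "data?")
    binary = format(int(word_hex, 16), "032b")   # the bits actually stored in MM
    return mnemonic, p1, p2, binary

for addr, word in [(100, "A01040F2"), (101, "1A105008")]:
    print(addr, decode(word))
# 100 ('mov', '104', '0F2', '10100000000100000100000011110010')
# 101 ('add', '105', '008', '00011010000100000101000000001000')
```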
Executing Multiple Programs
• Programs share
processor time
• Time slicing
• Supported by
modern CPUs