CO-UNIT-V-Chandrashekar-MA
Computer Organization
The Memory System
Department of CSE, SSE Mukka
www.bookspar.com | Website for students |
VTU NOTES
Chapter Objectives
Basic memory circuits
Organization of the main memory
Cache memory concept – shortens the effective memory access time
Virtual memory mechanism – increases the apparent size of the main memory
Secondary storage
Magnetic disks
Optical disks
Magnetic tapes
Basic Memory Concepts
The maximum size of the Main Memory (MM) that can be
used in any computer is determined by its addressing
scheme.
For eg.,
A 16-bit computer that generates 16-bit addresses is
capable of addressing up to 2^16 = 64K memory locations.
A 32-bit computer with 32-bit addresses can address 2^32 = 4G
memory locations.
A 40-bit computer can address 2^40 = 1T memory
locations.
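The rule above is simply "k address bits give 2^k locations"; a minimal sketch (function name is illustrative, not from the source):

```python
# A k-bit address can select 2**k distinct memory locations.
def max_locations(address_bits):
    return 2 ** address_bits

print(max_locations(16))  # 65536 -> 64K locations
print(max_locations(32))  # 4294967296 -> 4G locations
print(max_locations(40))  # 1099511627776 -> 1T locations
```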
Word addressability and byte addressability
If the smallest addressable unit of information is a
memory word, the machine is called word-addressable.
If individual memory bytes are assigned distinct addresses,
the computer is called byte-addressable.
Most commercial machines are byte-addressable.
For example in a byte-addressable 32-bit computer, each
memory word contains 4 bytes.
A possible word-address assignment would be:

Word Address    Byte Addresses
0               0   1   2   3
4               4   5   6   7
8               8   9   10  11
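The table above can be generated directly: on a byte-addressable machine with 4-byte words, word addresses advance in steps of 4 and each word spans four consecutive byte addresses. A small sketch (names are illustrative):

```python
WORD_SIZE = 4  # bytes per word in a byte-addressable 32-bit machine

def bytes_of_word(word_address):
    # Byte addresses belonging to the word that starts at word_address.
    return list(range(word_address, word_address + WORD_SIZE))

for w in (0, 4, 8):
    print(w, bytes_of_word(w))
```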
Basic Memory Concepts
Word length of a computer is the number of bits
actually stored or retrieved in one memory access.
For eg., in a byte-addressable 32-bit computer, whose
instructions generate 32-bit addresses:
the high-order 30 bits determine which word in memory;
the low-order 2 bits determine which byte in that word.
Suppose we want to fetch only one byte from a word.
In the case of a Read operation, the other bytes are discarded
by the processor.
In the case of a Write operation, care must be taken not to
overwrite the other bytes.
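The address split and the "don't overwrite the other bytes" rule can both be sketched in a few lines. This is an illustrative sketch, assuming byte 0 is the least significant byte of the word (the source does not specify endianness):

```python
def split_address(byte_address):
    # High-order 30 bits select the word; low-order 2 bits select the byte.
    return byte_address >> 2, byte_address & 0b11

def write_byte(word, byte_offset, value):
    # Read-modify-write: replace one byte of a 32-bit word without
    # disturbing the other three (byte 0 = least significant, an assumption).
    shift = byte_offset * 8
    return (word & ~(0xFF << shift) & 0xFFFFFFFF) | ((value & 0xFF) << shift)

print(split_address(13))                     # (3, 1)
print(hex(write_byte(0x11223344, 0, 0xAA)))  # 0x112233aa
```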
Basic Memory concepts
Data transfer between the memory and the processor takes
place through the use of 2 processor registers:
MAR – Memory Address Register
MDR – Memory Data Register
If the MAR is k bits long and the MDR is n bits long:
the memory unit may contain up to 2^k addressable locations;
during a memory cycle, n bits of data are transferred between
the memory and the processor.
The processor therefore has k address lines and n data lines.
There are additional control lines: read/write, MFC, number of
bytes to be transferred, etc.
[Figure 5.1. Connection of the memory to the processor: the processor's MAR drives a k-bit address bus and its MDR an n-bit data bus to a memory of up to 2^k addressable locations with word length n bits; control lines (R/W, MFC, etc.) complete the interface.]
How the processor reads data from memory
It loads the address of the required memory location into the MAR.
It sets the R/W line to 1.
The memory responds by placing the requested data on the
data lines and confirms this action by asserting the MFC signal.
Upon receipt of the MFC signal, the processor loads the data on
the data lines into the MDR register.
How the processor writes data into memory
It loads the address of the location into the MAR.
It loads the data into the MDR.
It indicates a Write operation by setting the R/W line to 0.
Some concepts
Memory Access Time: a useful measure of the speed of the
memory unit. It is the time that elapses between the initiation
of an operation and the completion of that operation (for
example, the time between READ and MFC).
Memory Cycle Time: an important measure of the memory
system. It is the minimum time delay required between the
initiations of two successive memory operations (for
example, the time between two successive READ
operations). The cycle time is usually slightly longer than
the access time.
Random Access Memory (RAM)
A memory unit is called a Random Access Memory if
any location can be accessed for a READ or WRITE operation
in some fixed amount of time that is independent of the
location’s address.
Main memory units are of this type.
This distinguishes them from serial or partly serial access
storage devices such as magnetic tapes and disks which
are used as the secondary storage device.
Cache Memory
The CPU processes instructions and data faster than they can
be fetched from compatibly priced main memory unit.
Memory cycle time becomes the bottleneck in the system.
One way to reduce the memory access time is to use cache
memory.
It is a small, fast memory that is inserted between the larger,
slower main memory and the CPU.
Holds the currently active segments of a program and its data.
Because of the locality of address references,
CPU finds the relevant information mostly in the cache memory itself
(cache hit)
infrequently needs access to the main memory (cache miss)
With suitable size of the cache memory, cache hit rates of
over 90% are possible
Memory Interleaving
This technique divides the memory system into a number
of memory modules
Arranges addressing so that successive words in the
address space are placed in different modules.
When requests for memory access involve consecutive
addresses, the access will be to different modules.
Since parallel access to these modules is possible, the average
rate of fetching words from the Main Memory can be
increased
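The module-assignment scheme described above can be sketched with low-order interleaving (module count of 4 is an assumption for illustration; the source does not fix a number):

```python
NUM_MODULES = 4  # assumed number of memory modules (a power of two)

def module_and_offset(word_address):
    # Low-order interleaving: consecutive addresses fall in different
    # modules, so consecutive fetches can proceed in parallel.
    return word_address % NUM_MODULES, word_address // NUM_MODULES

print([module_and_offset(a)[0] for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```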
Virtual Memory
In a virtual memory system, the addresses generated by the
program may differ from the actual physical addresses.
The required mapping between physical memory and logical
address space is implemented by a special memory control
unit, called the memory management unit.
The mapping function may be changed during program
execution according to system requirements.
The logical (virtual) address space
the address generated by the CPU is referred to as a virtual or
logical address.
can be as large as the addressing capability of the CPU
The physical address space
the actual physical memory can be much smaller.
Virtual memory
Only the active portion of the virtual address space is
mapped onto the physical memory
the rest of the virtual address space is mapped onto a bulk
storage device such as a magnetic disk (hard disk).
If the addressed information is in the Main Memory (MM),
it is accessed and execution proceeds.
Otherwise, an exception is generated, in response to
which
the memory management unit transfers a contiguous
block of words containing the desired word from the bulk
storage unit to the MM,
displacing some block that is currently inactive.
[Figure 5.2. Organization of bit cells in a memory chip: an address decoder driven by address lines A0–A3 selects one of the word lines W0–W15; each word line gates a row of flip-flop (FF) memory cells, and Sense/Write circuits connect the cell columns to the data input/output lines b7 … b1, b0, under control of the R/W and CS inputs.]
An example of memory organization
A memory chip consisting of 16 words of 8 bits each, which is
usually referred to as a 16 x 8 organization.
The data input and the data output of each Sense/Write
circuit are connected to a single bi-directional data line in
order to reduce the number of pins required.
One control line, the R/W (Read/Write) input, is used to specify
the required operation, and
another control line, the CS (Chip Select) input is used to
select a given chip in a multichip memory system.
This circuit requires 14 external connections, and allowing 2
pins for power supply and ground connections, can be
manufactured in the form of a 16-pin chip.
It can store 16 x 8 = 128 bits.
[Figure 5.3. Organization of a 1K × 1 memory chip: the 10-bit address is split into a 5-bit row address, which a 5-bit decoder uses to select one word line W0–W31 of a 32 × 32 memory cell array, and a 5-bit column address, which drives a 32-to-1 output multiplexer and input demultiplexer through the Sense/Write circuitry to the data input/output pin; R/W and CS control the access.]
1K X 1 memory chip
The 10-bit address is divided into two groups of 5 bits
each to form the row and column addresses for the cell
array.
A row address selects a row of 32 cells, all of which are
accessed in parallel.
One of these, selected by the column address, is
connected to the external data lines by the input and
output multiplexers.
This structure can store 1024 bits and can be implemented in
a 16-pin chip.
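The row/column split described above is a straight bit-field extraction; a minimal sketch (function name is illustrative):

```python
def row_col(address):
    # 10-bit address of a 1K x 1 chip: high 5 bits = row, low 5 bits = column.
    assert 0 <= address < 1024
    return address >> 5, address & 0b11111

print(row_col(0))     # (0, 0)
print(row_col(33))    # (1, 1)
print(row_col(1023))  # (31, 31)
```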
Static memories
Memories that consist of circuits capable of retaining
their state as long as power is applied are called static memories.
Static RAMs can be accessed very quickly – in a few nanoseconds.
[Figure 5.4. A static RAM cell: two cross-coupled inverters form a latch whose state points X and Y are connected to the bit lines b and b′ through transistors T1 and T2, gated by the word line.]
When the word line is at ground level, transistors T1 and T2 are
turned off and the latch retains its state.
Read and Write operations in an SRAM
Read
The word line is activated to close switches T1 and T2.
If the cell is in state 1, the signal on bit line b is high and the signal
on bit line b′ is low; the opposite holds if the cell is in state 0.
The Sense/Write circuits at the ends of the bit lines monitor the
states of b and b′ and set the output accordingly.
Write
The state of the cell is set by placing the appropriate values on bit
lines b and b′ and then activating the word line.
This forces the cell into the corresponding state.
The required signals on the bit lines are generated by the Sense/Write
circuit.
Dynamic RAMs
Static RAMs are fast, but their cells require several transistors,
so they come at a higher cost.
Less expensive RAMs can be implemented with simpler cells,
but such cells cannot retain their state indefinitely.
These are called dynamic RAMs (DRAMs).
Information is stored in the form of a charge on a capacitor.
This charge can be maintained only for tens of milliseconds,
so the contents must be periodically refreshed.
A dynamic RAM needs to be refreshed periodically to hold data.
[Figure 5.6. A single-transistor dynamic memory cell: a transistor T connects a storage capacitor C to the bit line under control of the word line.]
A 16-Mbit DRAM chip, configured as 2M × 8
The cells are organized as a 4K × 4K array.
The 4096 cells in each row are divided into 512 groups of 8,
so a row can store 512 bytes of data.
12 address bits are required to select a row.
9 bits are needed to specify a group of 8 bits in the selected row.
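The 12-bit row / 9-bit group split of the 21-bit address can be sketched directly (function name is illustrative):

```python
def dram_row_group(address):
    # 21-bit address of the 2M x 8 chip: 12-bit row, 9-bit byte group.
    assert 0 <= address < 2 ** 21
    return address >> 9, address & 0x1FF

print(dram_row_group(0))           # (0, 0)
print(dram_row_group(2 ** 21 - 1)) # (4095, 511)
```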
Timing is controlled asynchronously: a specialized memory controller
circuit provides the necessary control signals, CAS and RAS, that
govern the timing. Hence this is an asynchronous DRAM.
[Figure 5.7. Internal organization of a 2M × 8 dynamic memory chip: address lines A20-9/A8-0 feed a row address latch (strobed by RAS) and a column address latch (strobed by CAS); a row decoder selects a row of the 4096 × (512 × 8) cell array, and the Sense/Write circuits and column decoder connect the selected byte to data lines D7–D0 under control of CS and R/W.]
Fast Page mode
All bits of a row are sensed, but only 8 bits are placed on the data
lines; this byte is selected by the column address bits.
A simple modification can allow access to the other bytes of the same
row without having to reselect the row:
add a latch to the output of the sense amplifier in each column;
the application of a row address then loads the latches corresponding to
all bits in the selected row.
The most useful arrangement is to transfer bytes in sequential order:
only different column addresses are needed to place the different bytes on the
data lines;
a consecutive sequence of column addresses is applied under the control of
successive CAS signals.
This scheme allows a block of data to be transferred at a much faster rate
than can be achieved for transfers involving random addresses.
This block-transfer capability is called fast page mode.
SYNCHRONOUS DRAMs
Their operation is directly synchronized with a clock signal,
hence they are called SDRAMs.
The cell array is the same as in asynchronous DRAMs.
The address and data connections are buffered by means
of registers.
The output of each sense amplifier is connected to a latch.
A Read operation causes the contents of all cells in the
selected row to be loaded into these latches.
If an access is made for refreshing purposes only, it does not
change the contents of these latches.
Data held in the latches that correspond to the selected
column(s) are transferred into the data output register.
[Figure 5.8. Synchronous DRAM: a refresh counter and the row/column address inputs feed a row address latch and a column address counter; the row and column decoders access the cell array through Read/Write circuits and latches; data pass through data input and data output registers; a mode register and timing control block, driven by the clock, RAS, CAS, R/W, and CS, sequences the operation.]
SYNCHRONOUS DRAMs
SDRAMs have several different modes of operation
Selected by writing control information into a mode register
Can specify burst operations of different lengths
In SDRAMs, it is not necessary to provide externally
generated pulses on the CAS line to select successive
columns
Necessary signals are provided using a column counter and
clock signal
Hence new data can be placed on data lines at each clock cycle
All actions triggered by rising edge of the clock
[Figure 5.9. Burst read of length 4 in an SDRAM: timing diagram of the clock, R/W, RAS, CAS, the address lines (row, then column), and the data lines (D0–D3).]
Burst read of length 4 in an SDRAM.
Row address latched under control of RAS signal
Column address is latched under control of CAS signal
After delay of 1 cycle, first set of data bits placed on data
lines
Memory takes about 2-3 cycles to activate selected row
SDRAM automatically increments column address to access
the next 3 sets of bits in the selected row, placed on data lines
in successive clock cycles
SDRAMs have built-in refresh circuitry that provides the
addresses of the rows selected for refreshing.
Each row must be refreshed at least once every 64 ms.
Latency and Bandwidth
The parameters that indicate the performance of the
memory
Memory latency – amount of time it takes to transfer a
word of data to or from memory
In block transfers, latency denotes the time it takes to
transfer the first word of data.
This is longer than the time needed to transfer each subsequent word
of the block.
In the previous diagram, the access cycle begins with the assertion of
RAS and the first word is transferred five cycles later.
Hence the latency is 5 clock cycles.
Bandwidth
Bandwidth is usually the number of bits or bytes that can be
transferred in one second.
It depends on:
the speed of memory access;
the transfer capability of the links – the speed of the bus;
the number of bits that can be accessed in parallel.
Bandwidth is the product of the rate at which data are
transferred (and accessed) and the width of the data bus.
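The product formula above is easy to check with numbers. A sketch with an assumed example bus (100 MHz, 32 bits wide – illustrative figures, not from the source):

```python
def bandwidth_bytes_per_second(transfer_rate_hz, bus_width_bits):
    # bandwidth = transfer rate x width of the data bus (converted to bytes)
    return transfer_rate_hz * bus_width_bits // 8

# e.g. a bus moving 32 bits per cycle at 100 MHz:
print(bandwidth_bytes_per_second(100_000_000, 32))  # 400000000 bytes/s
```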
Double-Data-Rate SDRAM (DDR SDRAM)
The standard SDRAM performs all actions on the rising edge
of the clock signal
DDR SDRAMs access the cell array in the same way but transfer
data on both edges of the clock.
The latency is the same as in standard SDRAMs,
but since data are transferred on both edges of the clock,
the bandwidth is essentially doubled for long burst transfers.
To make this possible, the cell array is organized into 2 banks
Each bank can be accessed separately
Consecutive words of a given block are stored in different banks
Efficiently used in applications where block transfers are
prevalent
Eg., main memory to and from processor caches
Questions for assignment
1. Explain how the processor reads and writes data from and to
memory.
2. Explain the organization of a 1K × 1 memory chip.
3. Explain a single SRAM cell with a diagram. How are read and write
operations carried out?
4. Explain a DRAM cell with a diagram. How are read and write
operations carried out?
5. Explain the 2M × 8 DRAM chip. How can you modify it for
fast page mode?
6. Explain SDRAMs with the help of a diagram.
7. Explain the terms latency and bandwidth.
8. Explain a burst read of length 4 in an SDRAM with a timing
diagram.
9. Explain DDR SDRAMs.
Structure of larger memories
Memory chips are connected together to form larger memories.
There are 2 types of memory systems
Static memory systems
Dynamic memory systems
Static Memory systems
The following diagram shows the implementation of a 2M × 32 memory using
sixteen 512K × 8 static memory chips.
There are 4 columns, each containing 4 chips, to implement one
byte position.
Only the selected chips (using the chip-select input) place data on the output lines.
21 address bits are needed to select a 32-bit word in this memory:
the high-order 2 bits determine which of the 4 chip-select signals should be
activated;
the remaining 19 bits access specific byte locations inside each chip of the selected row.
The R/W inputs of all chips are tied together to form a single R/W signal.
Dynamic memory systems are organized in much the same manner as
static ones.
Their physical implementation is more conveniently done in the form of memory
modules.
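The address split described above (2 chip-select bits, 19 internal bits) can be sketched as (function name is illustrative):

```python
def chip_row_and_internal(address):
    # 21-bit word address: high-order 2 bits pick one of the 4 chip-select
    # signals; the remaining 19 bits address a location inside each chip.
    assert 0 <= address < 2 ** 21
    return address >> 19, address & (2 ** 19 - 1)

print(chip_row_and_internal(0))            # (0, 0)
print(chip_row_and_internal(2 ** 19))      # (1, 0)
print(chip_row_and_internal(2 ** 21 - 1))  # (3, 524287)
```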
[Figure 5.10. Organization of a 2M × 32 memory module using 512K × 8 static memory chips: the 21-bit address is split into a 2-bit decoder input (A20, A19) generating the chip-select signals and a 19-bit internal chip address (A0–A18); four columns of chips drive the data lines D31-24, D23-16, D15-8, and D7-0, each chip providing 8-bit data input/output.]
Memory System Considerations
The choice of a RAM for a given system depends on several
factors:
cost, speed, power dissipation, and size of the chip.
Static RAMs are used when very fast operation is the primary
requirement; they are used mostly in cache memories.
Dynamic RAMs are the predominant choice for computer main
memories:
the high densities achievable make larger memories economically
feasible.
Memory Controller
To reduce number of pins, dynamic memory chips use
multiplexed address inputs
Address divided into 2 parts
High-order address bits, to select a row in a cell array, are
provided first and latched into memory under control of RAS
Low-order address bits, to select a column, are provided on
the same address pins and latched under CAS signal
The processor issues all bits of an address at the same time;
the required multiplexing of the address bits is performed
by a memory controller circuit.
[Figure 5.11. Use of a memory controller: the processor sends the full address, R/W, a Request signal, and the clock to the memory controller, which forwards the multiplexed row/column address together with RAS, CAS, R/W, CS, and the clock to the memory; the data lines connect the processor and the memory directly.]
Memory controller functions
It is interposed between the processor and the memory.
The processor sends a Request signal along with the complete
address and the R/W signal.
The controller accepts these, forwards the row and column
portions of the address to the memory, and generates the
RAS and CAS signals.
It also sends the R/W and CS signals to the memory.
The data lines are connected directly between the processor and
the memory.
When used with DRAM chips, the memory controller
provides all the information needed to control the refreshing
process:
it contains a refresh counter – to refresh all rows within the time limit
specified for a device.
RAMBUS Memory
To increase the system bandwidth, we need to increase the
system bus width or the system bus speed.
A wide bus is expensive and requires a lot of space on the
motherboard.
Rambus uses a narrow but much faster bus.
Its key feature is the fast signaling method used to transfer
information between chips, based on differential signaling:
instead of swinging between 0 volts and Vsupply (5 volts), signals use
0.3-volt differences from a reference voltage called Vref.
READ-ONLY Memories (ROMs)
Both SRAMs and DRAMs are volatile – they lose their data if
power is turned off.
Many applications need to retain data even when power is off.
E.g., a hard disk is used to store information, including the OS.
When the system is turned on, the OS must be loaded from the
hard disk into memory, which requires executing a program
that boots the OS.
That boot program, being large, is stored on disk;
the processor must execute some instructions that load the boot program
into memory.
So we need a small amount of non-volatile memory that holds
the instructions needed to load the boot program into RAM.
A special type of writing process is used to place information
into a non-volatile memory.
Such a memory is called a ROM – Read-Only Memory.
[Figure 5.12. A ROM cell: a transistor T, gated by the word line, can connect the bit line to ground at point P; the connection is made to store a 0 and left open to store a 1.]
ROM
If the transistor is connected to ground at point P, a 0 is
stored; otherwise a 1 is stored.
The bit line is connected to a power supply through a resistor.
To read, the word line is activated:
if the voltage drops, the cell holds a 0;
if the voltage remains the same, the cell holds a 1.
PROM
Allows data to be loaded by the user.
This is achieved by inserting a fuse at point P in the previous figure.
Before it is programmed, the memory contains all 0s.
The user can insert 1s at the required locations using high-current pulses.
The process is irreversible.
EPROM
Allows the stored data to be erased and new data to be
loaded.
An erasable, reprogrammable ROM – called an EPROM.
Can be used while memory contents are being developed,
so that changes can be accommodated.
The cell structure is similar to that of a ROM:
the connection to ground is always made at point P, but
a special transistor is used that can function either as a normal
transistor or as a disabled transistor that is always turned off.
The transistor can be programmed to behave as a permanently open switch.
The chip is erased by exposing it to UV light, which dissipates the
charges trapped in the transistors of the memory cells.
EEPROM
Disadvantages of EPROMs:
the chip must be physically removed from the circuit for
reprogramming;
the entire contents are erased by the UV light.
EEPROM is another version of erasable PROM that can be
both programmed and erased electrically:
it need not be removed for erasure, and
cell contents can be erased selectively.
Disadvantage:
different voltages are needed for erasing, writing, and reading the
stored data.
Flash Memory
An approach similar to EEPROM.
In an EEPROM, a single cell can be read and written;
in flash memory, the contents of a single cell can be read, but
writing is possible only to a block of cells.
Flash devices have greater density:
a flash cell is based on a single transistor controlled by trapped
charge, giving
higher capacity and
lower cost per bit.
They require a single power supply voltage and
consume less power in operation.
They are used in battery-driven portable equipment – handheld
computers, cell phones, digital cameras, MP3 players.
Flash Cards and Flash Drives
Single flash chips do not provide sufficient storage capacity
Larger memory modules are required – flash cards and flash
drives
Flash cards
Mount flash chips on a small card
A card is simply plugged into a conveniently accessible slot
Variety of sizes
Flash Drives
Larger modules intended to replace hard disk drives.
Designed to fully emulate hard disks – not yet possible, since
the storage capacity is significantly lower.
They have shorter seek and access times, hence faster response.
Lower power consumption.
Insensitive to vibration.
Speed, Size and Cost
An ideal memory would be fast, large, and inexpensive.
Very fast memory can be built with SRAM chips, but these
chips are expensive, so it is impractical to build a large
memory using SRAMs;
SRAMs are used in smaller memories such as cache memories.
DRAM chips are cheaper, but also slower;
DRAMs are used for the main memory.
The solution for space is to provide large secondary storage
devices:
very large disks are available at reasonable prices.
[Figure 5.13. Memory hierarchy: processor registers, primary (L1) cache, secondary (L2) cache, main memory, and magnetic-disk secondary memory – size increases, while speed and cost per bit decrease, moving down the hierarchy.]
Cache Memories
The speed of the main memory is slower than that of modern
processors; the processor cannot afford to wait while accessing
instructions and data in the main memory.
A cache memory, which is much faster, makes the main
memory appear faster to the processor than it really is.
The effectiveness of the cache is based on locality of reference – many
instructions in localized areas of the program are executed
repeatedly during some time period, and the remainder of the
program is accessed relatively infrequently. Locality takes two forms:
Temporal – a recently executed instruction is likely to be executed
again very soon.
Spatial – instructions in close proximity to a recently executed
instruction (with respect to the instruction's address) are likely to be
executed soon.
Operation of a cache
If the active segments of the program can be placed in fast
cache memory – can reduce total execution time significantly
Memory control circuitry designed to take advantage of
locality of reference
Temporal – whenever an item (instruction or data) is first needed,
it is brought into the cache, where it remains until it is needed again.
Spatial – instead of fetching just one item from the main memory into
the cache, several items that reside at adjacent addresses are fetched;
such a set is referred to as a block or cache line.
Replacement algorithm – to decide which block of data to be
moved back from cache to main memory so that a new block
can be accommodated
[Figure 5.14. Use of a cache memory: the cache sits between the processor and the main memory.]
Operation of a cache
Read request from processor
The contents of a block of memory words containing the location
specified are transferred into the cache one word at a time
The cache can store a reasonable number of words, but it is small
compared to the main memory.
When the program references any of the locations in this block, the
desired contents are read directly from the cache.
The correspondence between main memory blocks and those in the
cache is specified by a mapping function
When the cache is full and a memory word that’s not in cache
is referenced, the cache control hardware decides which block
must be removed to create space for newly arrived block
The collection of rules for this operation is called replacement
algorithm
Cache operation
The processor does not need to know explicitly about the existence of
the cache; it issues read/write requests using memory addresses.
The cache control circuitry determines whether the requested word
is currently in the cache.
If it is, the read/write operation is performed on the appropriate cache
location; in this case, a read or write hit is said to have occurred.
For a read hit, the main memory is not involved.
For a write hit, there are 2 options:
Write-through protocol – both the cache and the main memory are updated
simultaneously.
Write-back (or copy-back) protocol – only the cache is updated during the
write operation (marked using a dirty or modified bit); later, when
the block is moved back to the main memory, the main
memory is updated.
Limitations of write-through and write-back protocols
The write-through protocol is simpler but it results in
unnecessary write operations in the main memory when
a given cache word is updated several times during its
cache residency
The write-back protocol may also result in unnecessary
write operations because when a cache block is written
back to the memory all words of the block are written
back, even if only a single word has been changed while
the block was in cache
Read miss
When the addressed word is not present in cache
The block of words that contains the requested word is
copied from the main memory into cache
After that, the requested word is sent to processor
Alternatively, this word may be sent to the processor as
soon as it is read from the memory – called load-through or
early restart.
This reduces the processor's waiting period,
but needs more complex circuitry.
Write miss
Occurs if the addressed word is not in cache
If write-through protocol is used, the information is
written directly into the main memory
If write-back protocol is used,
the block containing the addressed word is first brought into
the cache
The desired word in the cache is overwritten with new info
Mapping functions
The correspondence between main memory blocks and those in
the cache is specified by a mapping function. There are 3 techniques:
Direct mapping
Associative mapping
Set-associative mapping
Consider a cache of
128 blocks of 16 words each,
a total of 2K (2048) words.
Consider a main memory of
64K words, addressed by a 16-bit address,
i.e., 4K blocks of 16 words each.
Consecutive addresses refer to consecutive memory locations.
Direct Mapping
The simplest way to determine the cache location in which to
store a memory block is the direct-mapping technique.
Block j of the main memory maps onto block j modulo 128 of the
cache (refer to the following figure).
Whenever one of the main memory blocks 0, 128, 256, … is loaded
into the cache, it is stored in cache block 0.
Blocks 1, 129, 257, … are stored in cache block 1, and so on.
Since more than one memory block is mapped onto a given
cache block position, contention may arise even when the
cache is not full.
E.g., the instructions of a program may start in block 1 and continue in
block 129 (possibly after a branch).
The contention is resolved by allowing the new block to overwrite the
currently resident block.
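The j mod 128 rule and the 5/7/4-bit address split used by this cache can be sketched as (function names are illustrative):

```python
def cache_block(memory_block):
    # Direct mapping: memory block j goes to cache block j mod 128.
    return memory_block % 128

def address_fields(address):
    # 16-bit address: 5-bit tag | 7-bit block | 4-bit word.
    word = address & 0xF
    block = (address >> 4) & 0x7F
    tag = address >> 11
    return tag, block, word

print(cache_block(129))          # 1
print(address_fields(129 * 16))  # (1, 1, 0)
```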
[Figure 5.15. Direct-mapped cache: main memory blocks 0–4095 map onto the 128 tagged cache blocks; the 16-bit main memory address is divided into a 5-bit tag, a 7-bit block field, and a 4-bit word field.]
Direct mapping contd..
The placement of a block in the cache is determined from the
memory address.
The memory address is divided into 3 fields:
the low-order 4 bits select 1 of the 16 words in a block;
the 7-bit cache block field determines in which cache block the
new block is stored;
the high-order 5 bits are the tag bits associated with the location in
the cache, identifying which of the 32 blocks that are mapped into this
cache position is currently resident in the cache.
If the tags match, the desired word is in the cache.
If there is no match, the block containing the required word must first
be read from the main memory and loaded into the cache.
Direct mapping is easy but not flexible.
Associative mapping
A main memory block can be placed into any cache block
position
12 tag bits to identify a memory block when it is resident in
the cache
The tag bits of an address received from the processor are
compared to the tag bits of each block of the cache to see if
desired block is present
Called as associative - mapping technique
Gives complete freedom in choosing the cache location in
which to place the memory block
New block has to replace an existing block only if the cache is full
Need replacement algorithms to choose which block to replace
The cost of this mapping technique is higher than direct mapping, as we
need to search all 128 tag patterns – called an associative search
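The associative search described above can be sketched as follows (a simplified software model; a real cache compares all tags in parallel in hardware):

```python
def associative_lookup(cache, addr):
    """Fully associative lookup (Figure 5.16): 12-bit tag, 4-bit word.
    `cache` is a list of (valid, tag) pairs, one per cache block."""
    tag = (addr >> 4) & 0xFFF          # high-order 12 bits identify the block
    for i, (valid, block_tag) in enumerate(cache):
        if valid and block_tag == tag:  # associative search over all tags
            return i                    # hit: cache block i holds the word
    return None                         # miss: fetch block from main memory

# Memory block 5 resident in the last cache block position
cache = [(False, 0)] * 127 + [(True, 5)]
print(associative_lookup(cache, (5 << 4) | 3))  # 127 (hit)
print(associative_lookup(cache, (6 << 4)))      # None (miss)
```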
Figure 5.16. Associative-mapped cache. [Figure: a main memory block (0–4095) can be placed in any of the 128 cache blocks; main memory address fields: 12-bit tag, 4-bit word.]
Set-associative mapping
Combination of direct mapping and associative mapping
Blocks of the cache are grouped into sets
Mapping allows a block of the main memory to reside in any block of
a specific set.
So, we have got a few choices where to place the block, the problem
of contention of the direct method is eased
The hardware cost is reduced by decreasing the size of the
associative search.
The following figure is an example – with 2 blocks per set.
Memory blocks 0,64,128,….,4032 map into cache set 0, and they can
occupy either of the two block positions within this set.
Total 64 sets, so we need 6 bits to choose a set
Compare tag field with tags of the cache blocks to check if the
desired block is present
Figure 5.17. Set-associative-mapped cache with two blocks per set. [Figure: 64 sets of 2 blocks each; memory blocks 0, 64, 128, …, 4032 map into set 0, and so on; main memory address fields: 6-bit tag, 6-bit set, 4-bit word.]
Set associative mapping contd…
The number of blocks per set is a parameter that can be selected to
suit the requirements of the computer
Four blocks per set can be accommodated by a 5-bit set field.
Eight blocks per set can be accommodated by 4-bit set field
128 blocks per set? Requires no set bits and is fully associative
technique , with 12 tag bits
Other extreme of one block per set is direct-mapping method
A cache with k blocks per set is called a k-way set-associative cache.
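The relationship between blocks per set and the set/tag field widths can be checked with a short sketch (the 128-block cache and 4096-block main memory are the parameters used in the figures):

```python
CACHE_BLOCKS = 128
BLOCK_ADDRESS_BITS = 12  # 4096 main-memory blocks -> 12 tag+set bits

def field_widths(blocks_per_set):
    """Return (set_bits, tag_bits) for a 128-block set-associative cache."""
    num_sets = CACHE_BLOCKS // blocks_per_set
    set_bits = num_sets.bit_length() - 1          # log2(number of sets)
    tag_bits = BLOCK_ADDRESS_BITS - set_bits
    return set_bits, tag_bits

print(field_widths(2))    # (6, 6): two-way set-associative (Figure 5.17)
print(field_widths(4))    # (5, 7): four blocks per set, 5-bit set field
print(field_widths(128))  # (0, 12): fully associative, no set bits
print(field_widths(1))    # (7, 5): direct-mapped
```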
Valid bit and cache coherence problem
A control bit called the valid bit is provided for each block
Indicates whether the block contains valid data
Is different from the dirty (modified) bit, which is required only in
systems that don't use the write-through method
The valid bit is initially 0 when
power is applied to the system
the main memory is loaded with new programs and data from the disk
Transfers from the disk to the main memory are carried out by a DMA mechanism
Normally DMA transfers bypass the cache (for cost and performance reasons)
Valid bit and cache coherence problem
The valid bit of a block is set to 1 the first time the block is loaded from
main memory
Whenever a main memory block is updated by a source that bypasses the cache
A check is made to determine whether the block being updated is
currently in the cache
If so, its valid bit is cleared to 0
This ensures that stale data does not exist in the cache
Whenever a DMA transfer is made from the main memory to the disk, and the
cache uses the write-back protocol
The data in memory might not reflect the changes made in the cached copy
Solution: flush the cache by forcing the dirty data to be written back to
memory before the DMA transfer takes place
The need to ensure that two different entities (the processor and DMA in
this case) use the same copies of data is referred to as the cache
coherence problem
Replacement algorithms
In direct mapping method, the position of each block is predetermined
No replacement strategy exists
In associative and set-associative strategy, there is some
flexibility
If cache is full when a new block arrives, the cache controller must
decide which of the old blocks to overwrite
This decision is very important and determines system performance
Keep the blocks in the cache that may be referenced in the near future
Some algorithms
LRU block
Oldest block
Random block
Least Recently Used ( LRU )
replacement algorithm
Uses the property of locality of reference
High probability that the blocks that have been referenced
recently will be referenced soon
So when a block needs to be overwritten, overwrite the one
that has gone the longest time without being referenced
This block is called the least recently used (LRU) block
The cache controller must track references to all the blocks
Uses a 2-bit counter for each block in a set of 4 blocks
When a hit occurs –
The referenced block's counter is set to 0
Counters with values lower than the referenced block's original value are
incremented by 1
Higher values are unchanged
When a miss occurs –
Set is not full – new block is loaded and assigned counter value 0
Set is full – block with counter value 3 is removed and new block put
in its place. Other 3 blocks’ counters are incremented by 1
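The counter scheme above can be sketched as follows; this is an illustrative software model of the 2-bit counters, with None marking an empty block position:

```python
def lru_access(counters, hit_index=None):
    """2-bit LRU counters for a 4-block set.
    counters: list of 4 values in 0..3 (None = empty position).
    Returns the index of the block referenced, loaded, or replaced."""
    if hit_index is not None:                       # hit
        ref = counters[hit_index]
        for i in range(4):                          # lower values +1
            if counters[i] is not None and counters[i] < ref:
                counters[i] += 1
        counters[hit_index] = 0                     # referenced block -> 0
        return hit_index
    if None in counters:                            # miss, set not full
        i = counters.index(None)
        for j in range(4):
            if counters[j] is not None:
                counters[j] += 1
        counters[i] = 0                             # new block -> 0
        return i
    victim = counters.index(3)                      # miss, set full: evict 3
    for j in range(4):                              # other counters +1
        counters[j] += 1
    counters[victim] = 0                            # new block -> 0
    return victim

c = [0, 1, 2, 3]
lru_access(c, hit_index=2)
print(c)            # [1, 2, 0, 3]
print(lru_access(c))  # 3 (block with counter 3 is replaced)
print(c)            # [2, 3, 1, 0]
```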
Reading assignment :
Go through the examples of mapping techniques in the
text book
Performance considerations
2 key factors in success of a computer – cost and performance
Performance depends on how fast instructions can be brought
into the processor for execution and how fast they can be
executed
The objective is to achieve best possible performance at the lowest
possible cost
Challenge in design alternative is to improve performance without
increasing cost
Measure of success – price/performance ratio
In this unit, we will focus on the first aspect
In case of memory we need shorter access time and larger
capacity
If we have a slower and a faster unit – it is beneficial if we can transfer
data at the rate of the faster unit – to achieve this we use parallel
access via a technique called interleaving
Interleaving
Main memory of a computer is structured as a collection of
physically separate modules
Each with its own address buffer register ( ABR )and data buffer
register ( DBR )
Memory access operations may proceed in more than one module at
the same time.
Two ways of implementing interleaving
High order k bits name one of n modules and low order m bits name a
particular word in that module
When consecutive locations are accessed only one module is involved
Devices with DMA capability can access info from other memory
modules
Low – order k bits select a module and high order m bits name a location
within that module
Consecutive addresses are located in successive modules
Hence faster access and higher average utilization
This is called memory interleaving – the more effective way
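The two address-splitting schemes can be sketched as follows (illustrative only; k module-select bits and m within-module bits, as in Figure 5.25):

```python
def split_high_order(addr, m):
    """High-order interleaving: the high-order k bits select a module,
    the low-order m bits give the address within that module."""
    return addr >> m, addr & ((1 << m) - 1)   # (module, address in module)

def split_low_order(addr, k):
    """Low-order (memory) interleaving: the low-order k bits select the
    module, so consecutive addresses fall in successive modules."""
    return addr & ((1 << k) - 1), addr >> k   # (module, address in module)

# With k=2 (4 modules), consecutive addresses hit successive modules:
print([split_low_order(a, 2)[0] for a in range(4)])   # [0, 1, 2, 3]
# With high-order selection they all stay in module 0 (m=4 assumed here):
print([split_high_order(a, 4)[0] for a in range(4)])  # [0, 0, 0, 0]
```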
Figure 5.25. Addressing multiple-module memory systems. [Figure: each module has its own ABR and DBR. (a) Consecutive words in a module: the high-order k bits of the memory address select one of the modules and the low-order m bits give the address within that module. (b) Consecutive words in consecutive modules: the low-order k bits select the module, so consecutive addresses fall in successive modules.]
Go through the example in the text for better understanding.
Problem
Consider a cache with 8-word blocks; on a read miss, the block that contains
the desired word must be copied from main memory into the cache.
Assume – it takes one clock cycle to send an address to main memory
Memory is built using DRAM chips – the first word access takes 8 clock
cycles and subsequent words in the same block can be accessed in 4 clock
cycles per word. One clock cycle is needed to send one word to the cache
Using a single memory module – the time needed to load the desired block
into the cache is
1 + 8 + (7 × 4) + 1 = 38 clock cycles
Using memory interleaving – 4 words accessed in 8 CC and
transferred in next 4 CC word by word, during which remaining 4
words are read and stored in DBR. These 4 words are transferred
one word at a time to cache
So the time required to transfer a block is
1 + 8 + 4 + 4 = 17 clock cycles
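The two block-transfer times can be verified with a few lines (the variable names are ours; the values come from the problem statement):

```python
# Timing parameters from the problem statement
ADDR = 1    # cycles to send an address to main memory
FIRST = 8   # cycles to access the first word of a block in DRAM
NEXT = 4    # cycles per subsequent word in the same block
XFER = 1    # cycles to send one word to the cache
WORDS = 8   # words per cache block

# Single module: address, first word, 7 more words, one transfer cycle
single = ADDR + FIRST + (WORDS - 1) * NEXT + XFER
print(single)       # 38

# Four-way interleaved: 4 words ready after 8 cycles, transferred in the
# next 4 cycles while the remaining 4 words are read into the DBRs, then
# 4 more cycles to transfer those words
interleaved = ADDR + FIRST + 4 + 4
print(interleaved)  # 17
```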
Hit rate and miss penalty
The number of hits stated as fraction of all attempted
accesses is called the hit rate
The number of misses stated as a fraction of all
attempted accesses is called as miss rate
Hit rates well over 0.9 are essential for high-performance
computers
Performance is adversely affected by the actions that must
be taken after a miss
The extra time needed to bring the desired information into the cache
is called the miss penalty
Miss penalty is the time needed to bring a block of data from a
slower unit in memory hierarchy to a faster unit
Interleaving can reduce miss penalty substantially
Problem
Let h be the hit rate, M the miss penalty (the time to access information in
main memory), and C the time to access information in the cache. The average
access time is
tave = hC + (1 − h)M
Consider same parameters as previous problem
If computer has no cache, then using a fast processor and a typical DRAM main
memory, it takes 10 clock cycles for each memory read access
Suppose computer has a cache that holds 8-word blocks and an interleaved main
memory.
Suppose 30 percent of the instructions in a program perform read or write
operation – 130 memory accesses for every 100 instructions
Assume hit rates are .95 for instructions and .9 for data
Assume miss penalty is same for both read and write accesses
Then it requires 17 cycles (as discussed before) to load one block into the cache
An estimate of the improvement in performance is –
Time without cache / time with cache =
(130 × 10) / (100(0.95 × 1 + 0.05 × 17) + 30(0.9 × 1 + 0.1 × 17)) = 5.04
So the computer with cache performs about 5 times better (assuming the
processor clock and system bus have the same speed)
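The estimate can be reproduced as follows (all values are from the problem statement):

```python
h_i, h_d = 0.95, 0.90   # hit rates for instruction and data accesses
C, M = 1, 17            # cache access time and miss penalty, in cycles

# Per 100 instructions: 100 instruction fetches + 30 data accesses
time_with_cache = (100 * (h_i * C + (1 - h_i) * M)
                   + 30 * (h_d * C + (1 - h_d) * M))
time_without_cache = 130 * 10   # 130 accesses at 10 cycles each

print(round(time_without_cache / time_with_cache, 2))  # 5.04
```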
Caches on the processor chip
From speed point of view, optimal space for cache is on the
processor chip
Since space on the processor chip is required for many other functions,
this limits the size of the cache that can be accommodated
Either a combined cache(offers greater flexibility in mapping) for
instructions and data or separate caches(increases parallel access of
information but more complex circuitry) for instructions and data.
Normally 2 levels of caches are used
L1 and L2 cache
L1 designed to allow very fast access by the processor; its access time has
a very large effect on the clock rate of the processor
L2 can be slower but it should be much larger to ensure a high hit rate
A workstation computer may include an L1 cache with a capacity of tens of
kilobytes and an L2 cache with a capacity of several megabytes
Including L2 cache further reduces the impact of main
memory speed on the performance of the computer
Cache on processor chip
Average access time experienced by the processor with 2
levels of caches is
tave = h1C1 + (1-h1)h2C2 + (1-h1)(1-h2)M
h1 – hit rate in L1
h2 – hit rate in L2
C1 – time to access info in L1 cache
C2 – time to access info in L2 cache
M – time to access info in main memory
The number of misses in the L2 cache must be very low
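The two-level formula above can be written directly; the numeric values below are illustrative assumptions, not figures from the slides:

```python
def t_ave(h1, h2, C1, C2, M):
    """Average access time with two cache levels, per the formula above:
    tave = h1*C1 + (1-h1)*h2*C2 + (1-h1)*(1-h2)*M."""
    return h1 * C1 + (1 - h1) * h2 * C2 + (1 - h1) * (1 - h2) * M

# Assumed example values: h1=0.95, h2=0.90, C1=1, C2=10, M=100 cycles
print(round(t_ave(0.95, 0.90, 1, 10, 100), 2))  # 1.9
```

Note how the main-memory term is weighted by misses in both levels, which is why the number of misses in L2 must be kept very low.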
Write buffers
Temporary storage area for write requests
Usage when write-through protocol is used
Each write operation results in writing new value to the memory
If processor waits for memory function to be completed, then
processor is slowed down
The processor does not immediately require the results of a write operation
Processor instead of waiting for write operation to complete, places
the write requests into this buffer and continues execution of next
instruction
The write requests are sent to memory whenever read requests are
not serviced by memory
Because read requests must be serviced immediately – else processor
cannot proceed without the data to be read
Write buffers
Write buffer holds a number of write requests
A read request may refer to data that are still in write buffer
So, addresses of data to be read from memory are compared with
addresses of the data in the write buffer
In case of match , data in write buffers are used
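The buffering and read-matching behaviour described above can be sketched as a small model (illustrative only; a real write buffer is a hardware queue with associative address comparison):

```python
from collections import OrderedDict

class WriteBuffer:
    """Sketch of a write-through write buffer: writes are queued instead
    of stalling the processor; reads check the buffer for a match."""
    def __init__(self):
        self.pending = OrderedDict()     # address -> most recent value

    def write(self, addr, value):
        self.pending[addr] = value       # processor continues immediately

    def read(self, addr, memory):
        if addr in self.pending:         # match: use data still in buffer
            return self.pending[addr]
        return memory[addr]              # otherwise service from memory

    def drain_one(self, memory):
        """Send one queued write to memory when no read is waiting."""
        if self.pending:
            addr, value = self.pending.popitem(last=False)  # FIFO order
            memory[addr] = value

mem = {0: 10, 1: 20}
wb = WriteBuffer()
wb.write(0, 42)              # queued, not yet in memory
print(wb.read(0, mem))       # 42 (matched in the write buffer)
wb.drain_one(mem)
print(mem[0])                # 42 (now written back to memory)
```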
Usage of write buffers when write-back protocol is used
Write operations are simply performed on the corresponding word
in the cache
If a new block of data comes into cache as a result of read miss, it
replaces an existing block which has some dirty data ( modified data)
which has to be written into main memory
If write-back operation is performed first, then processor has to wait
longer for new block to be read into the cache
So to read the data first, provide a fast write buffer for temporary
storage of dirty block that is ejected
Afterwards contents of write buffer are written into memory
Prefetching
New data are brought into cache when they are first needed
The processor has to pause until the new data arrive
To avoid this stalling of the processor, it is possible to prefetch the data into
the cache before they are needed
Prefetching done either through software or hardware
In software – include a separate prefetch instruction – which loads the
data into cache by the time data are required in the program
Allows overlapping of accesses to main memory and computation of
the processor
Prefetching instructions inserted either by compiler or by
programmer – compiler insertion is better
In hardware – adding circuitry that attempts to discover a pattern in
memory references and prefetches data according to this pattern
Lock-up free cache
Software prefetching does not work well if it interferes with the normal
execution of instructions
i.e., if the act of prefetching stops other accesses to the cache until
the prefetch is completed
A cache of this type is said to be locked while it services a miss
Allow the processor to access the cache while the miss is
being serviced
A cache that allows multiple outstanding misses is called a lockup-free cache
Since it can have several misses outstanding at a time, it must have
circuitry to keep track of all outstanding misses
By including special registers to hold the pertinent information
VIRTUAL Memories
Refer to slides given separately
SECONDARY Storage
Magnetic hard disks
Organization and accessing of data on a disk
Access time
Typical disks
Data buffer/cache
Disk controller
Floppy disks
RAID Disk arrays
Commodity disk considerations
Optical disks
CD technology
CD-ROM
CD-Recordable
CD-Rewritable
DVD Technology
DVD-RAM
Magnetic tape systems