Book: B.P. Douglass, “Doing Hard Time”. Website for VxWorks


Lecture 3
Memory
Hardware Architecture
Example: MSP-430
Types of memory
• ROM (read-only memory, non-volatile):
– Mask-programmable: programmed by the manufacturer, not by the user; cannot be modified.
– Flash-programmable: can be modified; the whole memory is erased ("flashed") and programmed again.
• RAM (random-access memory, read/write):
– DRAM (dynamic): each bit of data requires a separate capacitor and transistor on the integrated circuit; "dynamic" because it must be refreshed periodically, otherwise the information fades.
– SRAM (static): does not need refresh; each bit requires six transistors.
MSP-430 Memory Map
[Figure: memory map showing bits, bytes and words in memory, with the Flash/ROM and RAM regions.]
See file “lnk430F2274.xcl” for more details about the memory map.
Memories
The MSP430F2274 has the following memory sizes:
• Flash: 32 KB
• RAM: 1 KB
Peripherals
• Peripherals are connected to the CPU through the data, address and control buses and are accessed using the instruction set.
• They consist of:
1. Clock system – used by the CPU and peripherals.
2. Brownout – provides an internal reset signal during power off/on.
3. Four digital 8-bit I/O ports:
• Any combination of input, output and interrupt conditions is possible.
• Reads and writes to the port control registers are supported by all instructions.
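The port register manipulation described above can be sketched in C. This is a host-runnable model, not the real device interface: on actual MSP430 hardware P1DIR and P1OUT are volatile memory-mapped registers (declared in msp430x22x4.h), whereas here they are plain variables so the bit logic can be tested anywhere.

```c
#include <stdint.h>

/* Host-runnable model of an MSP430-style 8-bit I/O port.
 * On real hardware these would be volatile memory-mapped
 * registers (e.g. P1DIR, P1OUT); here they are plain
 * variables so the logic can run on any machine. */
static uint8_t P1DIR; /* direction: 1 = output, 0 = input */
static uint8_t P1OUT; /* output latch */

void port_make_output(uint8_t pin_mask) { P1DIR |= pin_mask; }
void port_make_input(uint8_t pin_mask)  { P1DIR &= (uint8_t)~pin_mask; }
void port_set(uint8_t pin_mask)         { P1OUT |= pin_mask; }
void port_clear(uint8_t pin_mask)       { P1OUT &= (uint8_t)~pin_mask; }
void port_toggle(uint8_t pin_mask)      { P1OUT ^= pin_mask; }
```

Because every operation is a read-modify-write on a byte-wide register, any combination of pins can be driven independently, which is what makes "any combination of input, output and interrupt conditions" possible.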
Watchdog timer
• The watchdog timer is periodically reset by the system.
• If the watchdog is not reset in time, it generates an interrupt that resets the host.
[Diagram: watchdog timer asserting the interrupt/reset lines to the host CPU.]
4. Watchdog timer:
• Its primary function is to perform a system reset after a software problem occurs.
• If the selected time interval expires, a system reset is generated.
• If the watchdog function is not needed in the application, it can perform a secondary function: it can be configured as an interval timer and generate interrupts after a selected time interval.
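The kick-or-reset behavior described above can be modeled in a few lines of C. This is a software sketch of the concept, not the real MSP430 WDT register interface (which is programmed through the WDTCTL register):

```c
#include <stdint.h>
#include <stdbool.h>

/* Software model of a watchdog timer: a clock tick increments
 * the counter; if software fails to kick (reset) the counter
 * before `timeout` ticks elapse, a system reset is signaled. */
typedef struct {
    uint32_t count;
    uint32_t timeout;
    bool reset_fired;
} watchdog_t;

void wdt_init(watchdog_t *w, uint32_t timeout) {
    w->count = 0;
    w->timeout = timeout;
    w->reset_fired = false;
}

void wdt_kick(watchdog_t *w) { w->count = 0; } /* done by healthy software */

void wdt_tick(watchdog_t *w) {                 /* driven by the clock system */
    if (++w->count >= w->timeout)
        w->reset_fired = true;                 /* software hung: reset */
}
```

A hung program stops calling `wdt_kick`, so the counter reaches the timeout and the reset fires; configuring it as an interval timer is the same mechanism with the reset replaced by a periodic interrupt.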
5. Timer_A3, Timer_B3: 16-bit timers/counters with three capture/compare registers. Interrupts may be generated from the counter-overflow condition and from each of the capture/compare registers.
6.–8. Additional peripherals – see the msp430x22x4.h file.
CPU
RISC vs. CISC
• Complex instruction set computer (CISC):
– many addressing modes;
– many operations.
• Reduced instruction set computer (RISC):
– load/store;
– pipelinable instructions.
• For code-size efficiency, CISC is the better choice.
• A RISC processor is designed for speed, not for code-size efficiency.
• CISC is designed for code-size efficiency, since the CPU fetches code from slow devices: in effect, a compressed instruction stream is stored.
CPU block diagram
Buses
• The system is interconnected using the Memory Address Bus (MAB) and the Memory Data Bus (MDB).
Generic bus structure
• Address: m bits
• Data: n bits
• Control: c bits
Fixed-delay memory access
[Diagram: CPU–memory handshake with a fixed delay. The CPU drives the address (adrs = A) and the R/W line; on a read (read = 1) the memory drives data = mem[adrs] and the CPU latches reg = data; on a write the memory stores mem[adrs] = data.]
Variable-delay memory access
[Diagram: as above, but the memory also drives a done signal. The CPU asserts the address and R/W with done = 0, then waits until the memory sets done = 1 before latching reg = data (read) or completing mem[adrs] = data (write).]
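The variable-delay handshake can be sketched as a small simulation in C. The latency counter and field names are illustrative; the point is that the CPU polls `done` rather than assuming a fixed access time:

```c
#include <stdint.h>
#include <stdbool.h>

/* Model of a variable-delay read handshake: the CPU asserts the
 * address and read strobe, then polls `done` until the memory
 * signals completion. Latency is simulated with a countdown. */
typedef struct {
    uint8_t mem[256];
    int latency;      /* bus clocks until the memory responds */
    int countdown;
    uint8_t adrs;
    bool done;
    uint8_t data;
} slow_mem_t;

void mem_start_read(slow_mem_t *m, uint8_t adrs) {
    m->adrs = adrs;
    m->countdown = m->latency;
    m->done = false;
}

/* One bus clock: the memory counts down, then drives data and done. */
void mem_clock(slow_mem_t *m) {
    if (!m->done && --m->countdown <= 0) {
        m->data = m->mem[m->adrs];
        m->done = true;
    }
}

/* CPU side: issue the read, poll done, then latch the data. */
uint8_t cpu_read(slow_mem_t *m, uint8_t adrs) {
    mem_start_read(m, adrs);
    while (!m->done)
        mem_clock(m);
    return m->data;
}
```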
Overheads for Computers as Components
© 2000 Morgan Kaufmann
Memory Management Unit (MMU)
This is a computer hardware component
responsible for handling accesses to memory
requested by the CPU.
MMU functions are:
• Translation of virtual addresses to physical
addresses
• Memory protection
• Cache control
• Bus arbitration
• Bank switching
Translation of virtual addresses
Memory management units
• The memory management unit (MMU) translates addresses:
[Diagram: CPU issues a logical address → memory management unit → physical address → main memory.]
Memory management tasks
• Allows programs to move in physical memory during execution. In the past this was used to compensate for a limited address space; today memory is cheaper, and physical memory can be used without logical memory.
• Allows virtual memory:
– memory images are kept in secondary storage;
– images are returned to main memory on demand during execution.
Address translation
• Requires some sort of register/table to allow arbitrary mappings of logical to physical addresses.
• Two basic schemes:
– segmented: a segment is a large, arbitrarily sized section of memory;
– paged: a page is a small, fixed-size section of memory.
• Segmentation and paging can be combined.
Segments and pages
[Diagram: a memory containing segment 1 and segment 2, with page 1 and page 2 mapped within it.]
Segment address translation
[Diagram: the logical address is added to the segment base address to form the physical address; the result is range-checked against the segment's lower and upper bounds, signaling a range error if it falls outside them.]
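The segment translation step can be written out directly. This is a minimal sketch of the base-plus-bounds scheme; the struct layout and the -1 error return are illustrative choices, not a particular MMU's format:

```c
#include <stdint.h>

/* Segmented translation: physical address = segment base +
 * logical address, range-checked against the segment bounds.
 * Returns -1 on a range error. */
typedef struct {
    uint32_t base;   /* segment base address */
    uint32_t lower;  /* lowest valid physical address */
    uint32_t upper;  /* highest valid physical address */
} segment_t;

int64_t segment_translate(const segment_t *s, uint32_t logical) {
    uint64_t phys = (uint64_t)s->base + logical;
    if (phys < s->lower || phys > s->upper)
        return -1;   /* range error */
    return (int64_t)phys;
}
```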
Page address translation
[Diagram: the logical address is split into a page number and an offset; the page number selects the page i base from the page table, which is concatenated with the unchanged offset to form the physical address.]
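Paged translation is just a shift, a table lookup, and a concatenation. A minimal sketch, assuming a hypothetical machine with 4 KB pages (12 offset bits) and a flat page table of physical page numbers:

```c
#include <stdint.h>

/* Paged address translation with 4 KB (2^12-byte) pages:
 * the virtual page number indexes the page table, and the
 * physical page base is concatenated with the unchanged offset. */
#define PAGE_BITS 12u
#define PAGE_MASK ((1u << PAGE_BITS) - 1u)

uint32_t page_translate(const uint32_t *page_table, uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS; /* virtual page number */
    uint32_t offset = vaddr & PAGE_MASK;  /* bottom bits, unchanged */
    return (page_table[vpn] << PAGE_BITS) | offset;
}
```

Because pages have a power-of-2 size, "concatenate" is just a shift and OR; no adder is needed, which is one reason paging hardware is simpler than segmentation hardware.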
Page table organizations
[Diagram: a page table maps page numbers to page descriptors; the table can be organized flat (a single array) or as a tree (multiple levels).]
MMU address translation
• MMU divides the virtual address space (the
range of addresses used by the processor)
into pages, each having a size which is a
power of 2, usually a few kilobytes, but they
may be much larger.
• The bottom n bits of the address (the offset
within a page) are left unchanged.
• The upper address bits are the (virtual)
page number.
MMU address translation
• The MMU normally translates virtual page
numbers to physical page numbers via an
associative cache called a TLB.
• When the TLB lacks a translation, a slower
mechanism involving hardware-specific data
structures or software assistance is used.
• The data found in such data structures are
typically called page table entries (PTEs), and the
data structure itself is typically called a page
table.
• The physical page number is combined with the
page offset to give the complete physical address.
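The hit/miss flow described above can be sketched as follows. The sizes and the round-robin replacement policy are illustrative only; real TLBs are set-associative hardware structures with entries carrying permission and dirty bits:

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch of translation through a tiny fully-associative TLB
 * with fallback to a flat page table on a miss.
 * Hypothetical sizes: 12-bit pages, 4 TLB entries. */
#define PBITS 12u
#define TLB_SIZE 4

typedef struct { uint32_t vpn, ppn; bool valid; } tlb_entry_t;

static tlb_entry_t tlb[TLB_SIZE];
static int tlb_next;    /* round-robin replacement pointer */
static int tlb_misses;

uint32_t translate(const uint32_t *page_table, uint32_t vaddr) {
    uint32_t vpn = vaddr >> PBITS;
    uint32_t off = vaddr & ((1u << PBITS) - 1u);
    for (int i = 0; i < TLB_SIZE; i++)
        if (tlb[i].valid && tlb[i].vpn == vpn)  /* TLB hit: fast path */
            return (tlb[i].ppn << PBITS) | off;
    /* TLB miss: walk the page table, then cache the translation */
    tlb_misses++;
    uint32_t ppn = page_table[vpn];
    tlb[tlb_next] = (tlb_entry_t){ vpn, ppn, true };
    tlb_next = (tlb_next + 1) % TLB_SIZE;
    return (ppn << PBITS) | off;
}
```

A second access to the same page avoids the page-table walk entirely, which is the point of the TLB: the common case never touches the (slow, main-memory-resident) page table.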
MMU cache
A TLB entry may also include information
about whether the page has been written
to (the dirty bit), when it was last used (the
accessed bit, for a least recently used
page replacement algorithm), what kind of
processes (user mode, supervisor mode)
may read and write it, and whether it
should be cached.
Page fault
• The MMU keeps track of which logical addresses actually reside in main memory and which are kept in secondary storage.
• When the CPU requests an address not in main memory, the MMU generates a page-fault exception.
• The exception handler reads the location from secondary storage into main memory.
• To make room, some other location (usually chosen by an LRU policy) is moved from main memory to secondary storage.
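The LRU eviction step can be sketched with a timestamp per frame. This is a conceptual model (a tiny resident set, with eviction standing in for the write-back to secondary storage), not how a real OS tracks recency:

```c
/* Sketch of LRU page replacement over a small resident set:
 * each reference either hits (refreshing the page's timestamp)
 * or faults (evicting the least recently used frame). */
#define NFRAMES 3

static int  frames[NFRAMES]; /* resident page numbers, -1 = empty */
static long stamp[NFRAMES];  /* last-used time per frame */
static long now;

void frames_init(void) {
    for (int i = 0; i < NFRAMES; i++) { frames[i] = -1; stamp[i] = 0; }
    now = 0;
}

/* Returns 1 on a page fault, 0 on a hit. */
int reference(int page) {
    now++;
    int lru = 0;
    for (int i = 0; i < NFRAMES; i++) {
        if (frames[i] == page) { stamp[i] = now; return 0; } /* hit */
        if (stamp[i] < stamp[lru]) lru = i;  /* track oldest frame */
    }
    frames[lru] = page;  /* fault: evict least recently used */
    stamp[lru] = now;
    return 1;
}
```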
MMU
• Sometimes, a TLB entry or PTE prohibits access to a virtual page,
perhaps because no physical RAM has been allocated to that virtual
page.
• In this case the MMU signals a page fault to the CPU.
• OS tries to find a spare frame of RAM and set up a new PTE to map it
to the requested virtual address. If no RAM is free, it may be
necessary to choose an existing page, using some replacement
algorithm, and save it to disk (this is called "paging"). With some
MMUs, there can also be a shortage of PTEs or TLB entries, in which
case the OS will have to free one for the new mapping.
• In some cases a "page fault" may indicate a software bug. A key
benefit of an MMU is memory protection: an OS can use it to protect
against errant programs, by disallowing access to memory that a
particular program should not have access to. Typically, an OS
assigns each program its own virtual address space.
• An MMU also reduces the problem of fragmentation of memory. After
blocks of memory have been allocated and freed, the free memory
may become fragmented (discontinuous) so that the largest
contiguous block of free memory may be much smaller than the total
amount. With virtual memory, a contiguous range of virtual addresses
can be mapped to several non-contiguous blocks of physical memory.
Caching address translations
• Large translation tables require main-memory accesses.
• TLB (translation lookaside buffer): a cache for address translations.
– Typically small.
Example of memory management
• Memory region types:
– section: 1 Mbyte block;
– large page: 64 kbytes;
– small page: 4 kbytes.
• An address is marked as section-mapped
or page-mapped.
• Two-level translation scheme.
Example of address translation
[Diagram of two-level translation: the translation table base register points to the 1st-level table; the 1st index selects a descriptor, which points to a 2nd-level table; the 2nd index selects a second descriptor, which is concatenated with the offset to form the physical address.]
Zero-copy
• Describes computer operations in which the CPU does not perform
the task of copying data from one memory area to another.
• Zero-copy versions of operating system elements such as device
drivers, file systems, and network protocol stacks greatly increase
the performance of certain application programs and more efficiently
utilize system resources.
• Performance is enhanced by allowing the CPU to move on to other
tasks while data copies proceed in parallel in another part of the
machine.
• Also, zero-copy operations reduce the number of time-consuming
mode switches between user space and kernel space.
• System resources are utilized more efficiently since using a
sophisticated CPU to perform extensive copy operations, which is a
relatively simple task, is wasteful if other simpler system
components can do the copying.
Zero-copy
• Techniques for creating zero-copy software
include the use of DMA-based copying and
memory-mapping through an MMU. These
features require specific hardware support and
usually involve particular memory alignment
requirements.
• Zero-copy protocols have some initial overhead,
so avoiding programmed IO (PIO) makes sense
only for large messages.
Zero-copy
• Zero-copy protocols are especially important for
high-speed networks in which the capacity of a
network link approaches or exceeds the CPU's
processing capacity.
• In such a case the CPU spends nearly all of its
time copying transferred data, and thus
becomes a bottleneck which limits the
communication rate to below the link's capacity.
Direct memory access (DMA)
• DMA provides parallelism on bus by
controlling transfers without CPU.
[Diagram: CPU, DMA controller, memory and an I/O device all attached to the same bus.]
DMA
• A peripheral device controls the CPU's memory bus directly.
• DMA permits the peripheral (e.g. a UART) to transfer data to/from memory without having each byte handled by the CPU.
• DMA advantages:
– enables more efficient use of interrupts;
– increases data throughput;
– reduces hardware costs by eliminating the need for peripheral-specific FIFO buffers.
DMA operation
• On some event (such as an incoming data-available signal from a UART), the CPU notifies a separate device called the DMA controller.
• The DMA controller asserts the DMA-request signal to the CPU, asking its permission to use the bus.
• The CPU completes its current bus activity and returns the DMA-acknowledge signal to the DMA controller.
• The DMA controller reads/writes one or more memory bytes, driving the address, data and control signals as if it were the CPU itself.
• When complete, the DMA controller stops driving the bus and deasserts the DMA-request signal.
• The CPU removes the DMA-acknowledge signal and resumes control of the bus.
DMA operation
• The CPU sets up the DMA transfer:
– start address;
– length;
– transfer block length;
– style of transfer.
• The DMA controller performs the transfer and signals when done.
• DMA is essential for providing a zero-copy implementation.
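The setup-then-transfer protocol above can be modeled in software. The descriptor fields mirror the list above but their names are illustrative, not any real controller's register layout; memcpy stands in for the controller driving the bus one block per grant:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Software model of a DMA transfer: the CPU fills in a
 * descriptor (start addresses, length, block length), then the
 * controller moves the data block by block and signals done. */
typedef struct {
    const uint8_t *src;
    uint8_t *dst;
    size_t length;     /* total bytes to move */
    size_t block_len;  /* bytes moved per bus grant */
    int done;          /* set by the controller when finished */
} dma_desc_t;

void dma_run(dma_desc_t *d) {
    size_t moved = 0;
    while (moved < d->length) {
        size_t n = d->length - moved;
        if (n > d->block_len)
            n = d->block_len;              /* one bus grant's worth */
        memcpy(d->dst + moved, d->src + moved, n);
        moved += n;
    }
    d->done = 1;  /* e.g. raise a completion interrupt */
}
```

Splitting the transfer into blocks models the request/acknowledge handshake: between blocks the real controller releases the bus so the CPU can continue executing, which is where the parallelism comes from.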
Remote DMA
• RDMA is direct memory access from the memory of one computer into that of another without involving either one's operating system. This permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters.
• RDMA supports zero-copy networking by enabling the network
adapter to transfer data directly to or from application memory,
eliminating the need to copy data between application memory and
the data buffers in the operating system.
• Such transfers require no work to be done by CPUs, caches, or
context switches, and transfers continue in parallel with other
system operations.
• When an application performs an RDMA Read or Write request, the
application data is delivered directly to the network, reducing latency
and enabling fast message transfer.