Transcript I/O devices
Lecture 13 Architectural Support for OS
I/O
Peng Liu
[email protected]
1
Last time in Lecture 12
• Virtual Memory
– Large Address Space for Each Executing Program
– Memory Management for Multiple Programs
2
Virtual Memory in a Nutshell Review
• Use hard disk (or Flash) as a large storage for data of all programs
– Main memory (DRAM) is a cache for the disk
– Managed jointly by hardware and the operating system (OS)
• Each running program has its own virtual address space
– Address space as shown in previous figure
– Protected from other programs
• Frequently-used portions of virtual address space copied to DRAM
– DRAM = physical address space
– Hardware + OS translate virtual addresses (VA) used by
program to physical addresses (PA) used by the hardware
– Translation enables relocation (DRAM disk) & protection
3
Terminology for Virtual Memory Review
• Virtual memory used DRAM as a cache for disk
• New terms
– VM block is called a “page”
• The unit of data moving between disk and DRAM
• It is typically larger than a cache block (e.g., 4KB or 16KB)
• Virtual and physical address spaces can be divided up to virtual
pages and physical pages (e.g., contiguous chunks of 4KB)
– VM miss is called a “page fault”
• If the valid bit for a virtual page is off, a page fault occurs.
• Once the operating system gets control, it must find the page in the
next level of the hierarchy and decide where to place the requested
page in main memory.
4
Page Faults (Similar to “Cache Misses”) Review
• What if an object is on disk rather than in memory?
– Page table entry indicates virtual address not in memory
– OS exception handler invoked to move data from disk into memory
• OS has full control over placement
• Full-associativity to minimize future misses
Before fault
Memory
Page Table
Virtual
Addresses
0
1
After fault
Physical
Addresses
Memory
Page Table
Virtual
Addresses
0
1
0
1
Physical
Addresses
0
1
CPU
CPU
P-1
P-1
N-1
N-1
Disk
Disk
5
Using Separate Addresses Spaces Per
Program
Review
• Each program has its own virtual address space and own page table
– Addresses 0x400000 from different programs can map to
different locations or same location as desired
– OS control how virtual pages as assigned to physical
memory
6
Translation Process Explained Review
• Valid page
– Check access rights (R,W,X) against access type
• Generate physical address if allowed
• Generate a protection fault (exception) if illegal access
• Invalid page
– Page is not currently mapped and a page fault is generated
• Faults are handled by the operating system
– Sometimes due to a program error => program terminated
• E.g. accessing out of the bounds of array
– Sometimes due to “caching” => refill & restart
• Desired data or code available on disk
• Space allocated in DRAM, page copied from disk, page table updated
• Replacement may be needed
7
VM: Replacement and Writes Review
• To reduce page fault rate, OS uses least-recently used
(LRU) replacement
– Reference bit (aka use bit) in PTE set to 1 on access to page
– Periodically cleared to 0 by OS
– A page with reference bit = 0 has not been used recently
• Disk writes take millions of cycles
–
–
–
–
Block at once, not individual locations
Write through is impractical
Use write-back
Dirty bit in PTE set when page is written
8
Fast Translation Using a TLB Review
9
Recap: Virtual Memory
Review
• Use hard disk ( or Flash) as large storage for data of all programs
– Main memory (DRAM) is a cache for the disk
– Managed jointly by hardware and the operating system (OS)
• Each running program has its own virtual address space
– Address space as shown in previous figure
– Protected from other programs
• Frequently-used portions of virtual address space copied to DRAM
– DRAM = physical address space
– Hardware + OS translate virtual addresses (VA) used by
program to physical addresses (PA) used by the hardware
– Translation enables relocation & protection
10
Page Table Issues & Solutions
• Page table access latency
– Use TLB to cache page table entries close to processor
– Limit size of L1 cache to overlap TLB & cache accesses
• TLB coverage
– Larger pages
– Multi-level TLBs
• Page table size
– Multi-level page tables
11
Architectural Support Operating Systems (OS)
• Operating system (OS)
– Manages hardware resources
• Processor, main memory (DRAM), I/O devices
– Provides virtualization
• Each program thinks it has exclusive access to resources
– Provides protection, isolation, and sharing
• Between user programs and between programs and OS
• Examples of operating systems
– Windows (XP, Vista, Win7), MacOS, Linux, BSD Unix, Solaris..
– Symbian, Windows Mobile, Android
12
Processes
• Definition: a process is an instance of a running program
• Process provides each program with two key abstractions
– Logical control flow
• Illusion of exclusive use of the processor
• Private set of register values (including PC)
– Private address space
• Illusion of exclusive use of main memory of “infinte” size
• How are these illusion maintained?
– Process execution is interleaved (multitasking)
• On available processors
– Address space is managed by virtual memory system
13
Execution of Processes
• Each process has its own logical flow control
– A & B are concurrent processes, B & C are sequential
– Control flows for concurrent processes are physically
disjoint in time
– However, can think of concurrent processes are running in
parallel
14
Process Address Space
0xFFFF_FFFF
• Each process has its own
private address space
Kernel (OS) memory
(code, data, heap, stack)
0x8000_0000
memory
invisible to
user code
User stack
(created at runtime)
$sp (stack pointer)
• Use processes cannot access
top region of memory
– Used by OS kernel
– The kernel is shared core of
the OS
brk
run-time heap
(managed by malloc)
Read/write segment
(.data,.bss)
0x1000_0000
Read-only segment
(.text)
Loaded from the
executable file
0x0040_0000
0x0000_0000
unused
15
Supporting Concurrent Processes
through Context Switching
• Processes are managed by the OS kernel
– Important: the kernel is not a separate process, but rather
runs as part of some user process
• Control flow passes from one process to another via a
context switch
• Time between context switches 10-20ms
16
HW Support for the OS
• Mechanisms to protect OS from processes
– Modes + virtual memory
• Mechanisms to switch control flow between OS –
processes
– System calls + exceptions
• Mechanisms to protect processes from each other
– Virtual memory
• Mechanisms to interact with I/O devices
– Primarily memory-mapped I/O
17
Hardware Modes
(Protecting OS from Processes)
• 2 modes are needed, but some architectures have more
• User mode: Used to run users processes
– Accessible instructions: user portion of the ISA
• The instructions we have seen so far
– Accessible state: user registers and memory
• But virtual memory translation is always on
• Cannot access EPC, Cause, … registers
– Exceptions and interrupts are always on
• Kernel mode: used to run (most of) the kernel
– Accessible instructions: the whole ISA
• User + privileged instructions
– Accessible state: everything
– Virtual memory, exceptions, and interrupts may be turned off
18
Altering the Control Flow: System Calls
• So far, we have branches and jumps
– They implement control flows within a program
• Expected switch to the OS: system call instruction (syscall)
– A jump (or function call) to OS code
• E.g., in order to perform a disk access
• Also similar to a program-initiated exception
– Switches processor to kernel mode & disables interrupts
– Jump to a pre-defined place in the kernel
• Returning from a syscall: use the eret instruction (privileged)
– Switch to user mode & enable interrupts
– Jump to program counter indicated by EPC
19
Altering Flow Control: Exceptions & Interrupts
• Exceptions & interrupts implement unexpected switch to
OS
– Exceptions: due to a problem in the program
• E.g., divide by zero, illegal instructions, page fault
– Interrupts: due to an external event
• I/O event, hitting clt+c, periodic timer interrupt
• Exceptions & interrupts operate similar to system calls
–
–
–
–
Set EPC & Cause registers
Switch to kernel mode, turn off interrupts
Jump to predefined part of the OS
Return to user program using eret instruction
20
Dealing with Switches to the OS
• A syscall, exception, or interrupt transfers control to the
OS kernel
• Kernel action (stating in the exception handling code)
– Examine issue & correct if possible
– Return to the same process or to another process
21
A Simple Exception Handler
• Can’t survive nested exceptions
– Since does not re-enable interrupts
• Does no use any user registers
– No need to save them
22
Exception Example #1
• Memory Reference
– User writes to memory location
– That virtual page is currently on disk
– OS must load page in memory
• Updated page table & TLB
– Return to faulting instruction
– Successful on second try
23
Exception Examples #2
• Memory Reference
– User writes to memory location
– Address is not valid or wrong access rights
– OS decides to terminate the process
• Sends SIGSEG signal to user process
• User process exits with “segmentation fault”
– OS switches to another active process
24
Using Virtual Memory & Exceptions: Page
Sharing
• Example: Unix fork() system call
– Creates a 2nd process which is a clone of the current one
– How can we avoid copying all the address space
• Expensive in time and space
• Idea: share pages in physical memory
– Each process gets a separate page table (separate virtual
space)
– But both PTs point to the same physical pages
– Saves time and space on fork()
• Same approach to support any type of sharing
– Two processes running same program
– Two processes that share data
25
Page Sharing
26
Input/Output
27
Diversity of Devices
• I/O devices can be characterized by
– Behavior: input, output, storage
– Partner: human or machine
– Data rate: bytes/sec, transfers/sec
28
I/O System Design Goals
• Performance measures
– Latency (response time)
– Throughput (bandwidth)
• Dependability is important
– Resilience in the face of failures
– Particularly for storage devices
• Expandability
• Computer classes
– Desktop: response time and diversity of devices
– Server: throughput, expandability, failure resilience
– Embedded: cost and response time
29
Throughput vs. Response time
• Throughput
– Aggregate measure of amount of data moved per unit time,
averaged over a window
• Measure in bytes/sec or transfers/sec
– Sometimes referred to as bandwidth
• Examples: Memory bandwidth, disk bandwidth
• Response time
– Response time to do a single I/O operation
• Measured in seconds or cycles
– Sometimes referred to as latency
• Example: Write a block of bytes to disk
• Example: Send a data packet over the network
30
Throughput – Response Time Tradeoffs
• Throughput is the number of tasks completed by server
per unit time
• Response time is the elapsed time between tasks
entering queue and tasks completed by the server
• Tradeoff between throughput and response time
– Highest possible throughput is when the server is always
busy and the queue is never empty
– Fastest response time is when the queue is empty and the
server is idle when the task arrives
31
Magnetic Hard Disks
• Characteristics
– Long term, nonvolatile storage
– Large, inexpensive, but slow (mechanical device)
• Usage
– Virtual memory (swap area)
– File system
32
Disk Storage
• Nonvolatile, rotating magnetic storage
33
Hard Disk
• Basic operation
– Rotating platter coated with magnetic surface
– Moving read/write head used to access the disk
• Features of hard disks
–
–
–
–
Platters are rigid (ceramic or metal)
High density since head can be controlled more precisely
High data rate with higher rotational speed
Can include multiple platters
• Incredible improvements
–
–
–
–
Examples of I/O device performance being technology driven
Capacity: 2x every year
Transfer rate: 1.4x every year
Price approaching 1$/GB = 109 bytes
34
Hard Disk Organization
• >=1 platter per disk with disk heads on both sides
• Platters are divided in concentric tracks
• Each track is divided in sectors
– Unit of transfer to/from disk (i.e., disk block)
– Some disks have a constant number of sectors per track
– Others keep constant bit density which places more sectors
on outer track
• A common track across multiple platters is referred to as
cylinder
35
Measuring Disk Access Time
• Each read or write has three major components
– Seek time is the time to position the arm over the proper
track
– Rotational latency is the wait for the desired sector to rotate
under the read/write head
– Transfer time is the time required to transfer a block of bits
(sector) under the read/write head
• Note that these represent only the “raw performance” of
the disk
– Other components: I/O bus, controllers, other caches,
interleaving, etc.
36
Seek Time
• Seek time is the time to position the arm over the proper
track
• Average seek time is the time it takes to move the
read/write head from its current position to a track on the
disk
• Industry definition is that seek time is the time for all
possible seeks divided by the number of possible seeks
– In practice, locality reduces this to 25-33% of this number
– Note that some manufacturers report minimum seek times
rather average seek times
37
Rotational Latency
• Rotational latency is the time spent waiting for the desired sector to
rotate under the read/write head
• Based upon the rotational speed, usually measured in revolutions
per minute (RPM)
• Average rotational latency
– Average rotational latency = 0.5 rotation / RPM
• Example: 7200 RPM
AverageRotationalLatency
0.5rotation
0.5rotation
4.2ms
7200 RPM 7200 RPM /(60sec/ min)
38
Transfer Time
• Transfer time is the time required to transfer a block of
bits
• A factor of the transfer size, rotational speed, and
recording density
– Transfer size is usually a sector
• Most drives today use caches to help buffer the effects of
seek time and rotational latency
39
Typical Hard Drive
• Typical hard disk drive
–
–
–
–
–
–
–
–
Rotation speed:3600,5200,7200, or 15000 RPM
Tracks per surface:500-2000 tracks
Sectors per track:32-128 sectors
Sectors size: 512B-1024KB
Minimum seek time is often approximately 0.1ms
Average seek time is often approximately 5-10ms
Access time is often approximately 9-10ms
Transfer rate is often 50-200MB/s
40
Average Access Example
• Consider the Seagate 36.4GB Ultra SCSI
–
–
–
–
–
Rotation speed:10000RPM
Sector size: 512B
Average seek time : 5.7 ms
Transfer rate: 24.5 MB/s
Controller overhead of 1 ms
• What is the average read time?
41
I/O System Design
• Satisfying latency requirement
– For time-critical operations
– If system is unloaded
• Add up latency of components
• Maximizing throughput at steady state (loaded system)
– Find “weakest link” (lowest-bandwidth component)
– Configure to operate at its maximum bandwidth
– Balance remaining components in the system
46
Buses
• A bus is a shared communication link that connects
multiple devices
• Single set of wires connects multiple “subsystems” as
opposed to a point to point link which only two
components together
• Wires connect in parallel, so 32 bit bus has 32 wires of
data
47
Advantages/Disadvantages
• Advantages
– Broadcast capability of shared communication link
– Versatility
• New device can be added easily
• Peripherals can be moved between computer systems that use the same bus
standard
– Low Cost
• A single set of wires is shared multiple ways
• Disadvantages
– Communication bottleneck
• Bandwidth of bus can limit the maximum I/O throughput
– Limited maximum bus speed
• Length of the bus
• Number of devices on the bus
• Need to support a range of devices with varying latencies and transfer rates
48
Bus Organization
• Bus Components
– Control Lines
• Signal begin and end of transactions
• Indicate the type of information on the data line
– Data Lines
• Carry information between source and destination
• Can include data, addresses, or complex commands
• Processor-memory bus or front-side bus or system bus
– Short, high-speed bus
– Connects memory and processor directly
– Designed to match the memory system and achieve the maximum memoryto-processor bandwidth (cache transfers)
– Designed specifically for a given processor/memory system (proprietary)
• I/O Bus (or peripheral bus)
–
–
–
–
Usually long and slow
Connect devices to the processor-memory bus
Must match a wide range of I/O device performance characteristics
Industry standard
49
Synchronous versus Asynchronous
• Synchronous Bus
– Includes a clock in control lines
– Fixed protocol for communication relative to the clock
– Advantages
• Involves very little logic and can therefore run very fast
– Disadvantages
• Every decision on the bus must run at the same clock rate
• To avoid clock skew, bus cannot be long if it is fast
– Processor-memory bus
• Asynchronous Bus
–
–
–
–
No clock control line
Can easily accommodate a wide range of devices
No clock skew problems, so bus can be quite long
Requires handshaking protocol
50
Increasing Bus Bandwidth
• Several factors account for bus bandwidth
– Wider bus width
• Increasing data bus width => more data per bus cycle
• Cost: More bus lines
– Separate address and data lines
• Address and data can be transmitted in one bus cycle if separate
address and data lines are available
• Costs: More bus lines
51
Increasing Bus Bandwidth
• Several factors account for bus bandwidth
– Block transfers
•
•
•
•
Transfer multiple words in back-to-back bus cycles
Only one address needs to be sent at the start
Bus is not released until the last word is transferred
Costs: Increased complexity and increased response time for
pending requests
52
Increasing Bus Bandwidth
– Spilt transaction “pipelining the bus”
• Free the bus during time between request and data transfer
• Costs: Increased complexity and higher potential latency
Spilt transaction bus with separate address and data wires
53
Accessing the Bus
• How is the bus reserved by a device that wishes to use it?
• Master-slave arrangement
– Only the bus master can control access to the bus
– The bus master initiates and controls all bus requests
– A slave responds to read and write requests
• A simple system
– Processor is the only bus master
– All bus requests must be controlled by the processor
– Major drawback is the processor must be involved in every
transfer
54
Multiple Masters
• With multiple masters, arbitration must be used so that
only one device is granted access to the bus at a given
time
• Arbitration
– The bus master wanting to use the bus asserts a bus request
– The bus master cannot use the bus until the request is
granted
– The bus master must signal the arbiter when finished using
the bus
• Bus arbitration goals
– Bus priority – highest priority device should be serviced first
– Fairness – Lowest priority devices should not starve
55
Centralized Parallel Arbitration
• Advantages
– Centralized control where all devices submit request
– Any fair priority scheme can be implemented (FCFS, roundrobin)
• Disadvantages
– Potential bottleneck at central arbiter
56
Operating System Tasks
• The OS is a resource manager and acts ad the interface
between I/O hardware and programs that request I/O
• Several important characteristics of the I/O system:
– I/O system is shared by multiple programs
– I/O systems often use interrupts to notify CPU
• Interrupts = externally generated “exceptions”
• Typically “Input available” or “Output complete” messages
• OS handles interrupts by transferring control to kernel mode
– Low-level control of an I/O device is complex
• Managing a set of concurrent events
• Requirements for correct device control are very detailed
• I/O device drivers are the most common area for bugs in an OS!
57
OS Communication
• The operating system should prevent user program from
communicating with I/O device directly
– Must protect I/O resources to keep sharing fair
– Protection of shared I/O resources cannot be provided if user
programs could perform I/O directly
• Three types of communication are required
– OS must be able to give commands to I/O devices
– I/O device must be able to notify OS when I/O device has
completed and operation or has encountered an error
– Data must be transferred between memory and an I/O device
58
I/O Commands:
A method for addressing a device
• Memory-mapped I/O:
– Portions of the address space are assigned to each I/O
device
– I/O addresses correspond to device registers
– User programs prevented from issuing I/O operations directly
since I/O address space is protected by the address
translation mechanism
59
I/O Commands
• I/O devices are managed by I/O controller hardware
– Transfers data to/from device
– Synchronizes operation with software
• Command registers
– Cause device to do something
• Status registers
– Indicate what the device is doing and occurrence of errors
• Data registers
– Write: transfer data to a device
– Read: transfer data from a device
60
Communicating with the CPU
• Method #1: Polling
– I/O device places information in a status register
– The OS periodically checks the status register
– Whether polling is used dependent upon whether the device
can initiate I/O independently
• For instance, a mouse works well since it has a fixed I/O rate and
initiates its own data (whenever it is moved)
• For others, such as disk access, I/O only occurs under the control of
the OS, so we poll only when the OS knows it is active
– Advantages
• Simple to implement
• Processor is in control and does the work
– Disadvantage
• Polling overhead and data transfer consume CPU time
61
Polling and Programmed I/O
62
I/O Notification
• Method #2: I/O Interrupt
– When an I/O device needs attention, it interrupts the
processor
– Interrupt must tell OS about the event and which device
• Using “cause” register(s): Kernel “asks” what interrupted
• Using vectored interrupts: A different exception handler for each
– I/O interrupts are asynchronous events, and happen anytime
• Processor waits until current instruction is completed
– Interrupts may have different priorities
– Advantages: Execution is only halted during actual transfer
– Disadvantage: Software overhead of interrupt processing
63
I/O Interrupts
64
Data Transfer
• The third component to I/O communication is the transfer
of data from the I/O device to memory (or vice versa)
• Simple approach: “Programmed” I/O
– Software on the processor moves all data between memory
addresses and I/O addresses
– Simple and flexible, but wastes CPU time
– Also, lots of excess data movement in modern systems
• Mem->Network->CPU->Network->graphics
• So need a solution to allow data transfer to happen
without the processor’s involvement
65
Delegating I/O : DMA
• Direct Memory Access (DMA)
– Transfer blocks of data to or from memory without CPU
intervention
– Communication coordinated by the DMA controller
• DMA controllers are integrated in memory or I/O controller chips
– DMA controller acts as a bus master, in bus-based systems
• DMA steps
– Processor sets up DMA by supplying
• Identify of the device and the operation (read/write)
• The memory address for source/destination
• The number of bytes to transfer
– DMA controller starts the operation by arbitrating for the bus
and then starting the transfer when the data is ready
– Notify the processor when the DMA transfer is complete or
on error
• Usually using an interrupt
66
DMA Problems: Virtual vs. Physical Addresses
• If DMA uses physical addresses
– Memory access across physical page boundaries may not
correspond to contiguous virtual pages (or even the same
application)
• Solution1: <1 page per DMA transfer
• Solution: chain a series of 1-page requests provided by the OS
– Single interrupt at the end of the last DMA request in the
chain
• Solution2: DMA engine uses virtual addresses
– Multi-page DMA requests are now easy
– A TLB is necessary for the DMA engine
• For DMA with physical addresses: pages must be pinned in DRAM
– OS should not page to disks pages involved with pending I/O
67
I/O Summary
• I/O performance has to take into account many variables
– Response time and throughput
– CPU, memory, bus, I/O device
– Workload e.g. transaction processing
• I/O devices also span a wide spectrum
– Disks, graphics, and networks
• Buses
– Bandwidth
– Synchronization
– Transactions
• OS and I/O
– Communication: Polling and Interrupts
– Handling I/O outside CPU: DMA
69