Transcript lec10

Systems Architecture II
(CS 282-001)
Lecture 10: Interfacing I/O Devices to Memory, Processor,
and Operating System *
Jeremy R. Johnson
Monday, August 6, 2001
*This lecture was derived from material in the text (Chap. 8).
All figures from Computer Organization and Design: The
Hardware/Software Approach, Second Edition, by David Patterson and
John Hennessy, are copyrighted material (COPYRIGHT 1998 MORGAN
KAUFMANN PUBLISHERS, INC. ALL RIGHTS RESERVED).
August 6, 2001
Systems Architecture II
1
Introduction
• Objective: To learn how an I/O device communicates with a
user program.
– How is a user I/O request transformed into a device command and
communicated to the device?
– How is data actually transferred to or from a memory location?
– What is the role of the operating system?
• Topics
– Role of the OS
– Giving commands to the I/O system
• I/O commands
• Memory mapped I/O
– Communicating with the processor
• polling
• interrupts
– Transferring data between a device and memory
• Direct Memory Access (DMA)
– Designing an I/O system
Characteristics of I/O
• The responsibilities of the OS arise from three characteristics
of I/O systems:
– The I/O system is shared by multiple programs using the processor.
– I/O systems often use interrupts (externally generated exceptions) to
communicate information about I/O operations. Because interrupts
cause transfer to kernel (or supervisor) mode, they must be handled
by the OS.
– The low-level control of an I/O device is complex because it requires
managing a set of concurrent events and because the requirements
for correct device control are often very detailed.
Functions of the OS
• The OS guarantees that a user’s program accesses only the
portions of an I/O device to which the user has rights.
• The OS provides abstractions for accessing devices by
supplying routines that handle low-level device operations
• The OS handles the interrupts generated by I/O devices
• The OS tries to provide equitable access to the shared I/O
resources, as well as schedule accesses in order to
enhance system throughput
Types of Communication Required
• The OS must be able to give commands to the I/O device
(e.g. read, write, disk seek, etc.)
• The device must be able to notify the OS when the I/O
device has completed an operation or has encountered an
error.
• Data must be transferred between memory and an I/O
device.
Giving Commands to I/O Devices
• Dedicated I/O instructions (e.g. Intel 80x86)
– command and device number specified in the instruction
– processor communicates the device address via a set of wires
included as part of the I/O bus
– illegal to execute while in user mode
• Memory-mapped I/O
– Portions of the address space are assigned to I/O devices
– Commands and data are written to special addresses
– Data and status info are read from special addresses
– Memory system ignores the operation (determined by address)
– I/O controller sees the operation and transmits it to the device
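The address-based routing described above can be illustrated with a small software model. This is only a sketch: the `Bus` and `StatusDevice` classes, the register layout (offset 0 for commands, offset 1 for status), and the base address `0x100` are all hypothetical and do not correspond to any real hardware.

```python
class StatusDevice:
    """Hypothetical device with one command register and one status register."""

    def __init__(self):
        self.last_command = None
        self.status = 0  # 0 = idle, 1 = done (made-up encoding)

    def write(self, offset, value):
        # Offset 0 is the (assumed) command register.
        if offset == 0:
            self.last_command = value
            self.status = 1  # pretend the command completes immediately

    def read(self, offset):
        # Offset 1 is the (assumed) status register.
        return self.status if offset == 1 else 0


class Bus:
    """Routes each access to RAM or to the device, based only on the address."""

    def __init__(self, ram_size, io_base, device, io_size=2):
        self.ram = [0] * ram_size
        self.io_base = io_base
        self.io_size = io_size
        self.device = device

    def _is_io(self, addr):
        return self.io_base <= addr < self.io_base + self.io_size

    def store(self, addr, value):
        if self._is_io(addr):
            # The memory system ignores this access; the controller takes it.
            self.device.write(addr - self.io_base, value)
        else:
            self.ram[addr] = value

    def load(self, addr):
        if self._is_io(addr):
            return self.device.read(addr - self.io_base)
        return self.ram[addr]


dev = StatusDevice()
bus = Bus(ram_size=16, io_base=0x100, device=dev)
bus.store(3, 42)       # ordinary memory write
bus.store(0x100, 7)    # same kind of store, but it reaches the device
print(bus.load(3), dev.last_command, bus.load(0x101))
```

The point of the model is that the program uses the same load and store operations in both cases; only the address decides whether memory or the device responds.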
August 6, 2001
Systems Architecture II
6
Communicating with the Processor
• Polling
– Simplest way for an I/O device to communicate with the processor
– I/O device simply puts information in a status register, and the
processor must come and get the information
– Periodically check status bits to see if it is time for the next I/O
operation
• Interrupt-driven I/O
– The disadvantage of polling is that it wastes a lot of time.
– When a device wants to notify the processor that it has completed
some operation or that it needs attention, it causes the processor to
be interrupted
– An interrupt is similar to an exception, except
• it is asynchronous with respect to instruction execution
• the processor must be notified of the device causing the interrupt
• interrupts must be prioritized according to the devices that caused them
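The contrast between the two notification styles can be sketched in software. The toy model below is an assumption-laden illustration, not real device code: the `Device` class, its `tick` method, and the callback standing in for an interrupt handler are all hypothetical.

```python
class Device:
    """Hypothetical device that finishes its operation after a fixed number of ticks."""

    def __init__(self, ready_after):
        self.ticks_left = ready_after
        self.handler = None  # set only for interrupt-driven use

    def tick(self):
        self.ticks_left -= 1
        if self.ticks_left == 0 and self.handler is not None:
            self.handler()  # "interrupt": the device notifies the processor

    @property
    def ready(self):
        # Status bit that a polling loop reads.
        return self.ticks_left <= 0


def run_polling(ready_after):
    """Return how many status checks the processor performs before the device is ready."""
    dev = Device(ready_after)
    checks = 0
    while True:
        checks += 1
        if dev.ready:
            return checks
        dev.tick()


def run_interrupt(ready_after):
    """Return how many notifications the processor receives (no status checks at all)."""
    dev = Device(ready_after)
    notifications = []
    dev.handler = lambda: notifications.append("done")
    for _ in range(ready_after):
        dev.tick()  # the processor would be free to do other work here
    return len(notifications)


print(run_polling(5), run_interrupt(5))
```

In the polling version the processor's work grows with the device's latency; in the interrupt version it is notified exactly once, whenever the device finishes.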
Overhead of Polling
• Determine impact of polling on three different devices:
– Assume 400 cycles for polling operation and a 500 MHz clock
– Determine fraction of CPU time consumed in the following 3 cases
(assume that you poll often enough so that no data is lost and that the
devices are potentially always busy)
– Mouse must be polled 30 times per second
– Floppy disk transfers data to processor in 16-bit units and has a
transfer rate of 50 KB/sec
– Hard disk drive transfers data in 4 word chunks and can transfer at
4 MB/sec
Overhead of Polling
1 Mouse: 30 accesses per second
– $30 \times 400 = 12{,}000$ cycles per second for polling
– Fraction of processor clock cycles = $(12 \times 10^3)/(500 \times 10^6) \approx 0.002\%$
2 Floppy Drive: $\frac{50\ \text{KB/sec}}{2\ \text{bytes/polling access}} = 25\text{K}$ accesses/sec
– $25\text{K} \times 400$ cycles per second for polling
– Fraction of processor clock cycles = $(10 \times 10^6)/(500 \times 10^6) = 2\%$
3 Hard Drive: $\frac{4\ \text{MB/sec}}{16\ \text{bytes/polling access}} = 250\text{K}$ accesses/sec
– $250\text{K} \times 400$ cycles per second for polling
– Fraction of processor clock cycles = $(100 \times 10^6)/(500 \times 10^6) = 20\%$
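The three polling calculations share one formula, which the following sketch makes explicit. The function name `polling_fraction` is made up for illustration, and the code assumes decimal units (1 KB = 1000 bytes, 1 MB = 10^6 bytes), which matches the rounded figures on the slide.

```python
def polling_fraction(polls_per_sec, cycles_per_poll=400, clock_hz=500e6):
    """Fraction of processor cycles spent polling a device."""
    return polls_per_sec * cycles_per_poll / clock_hz


# Polling rates taken from the slide; decimal KB/MB are an assumption.
mouse = polling_fraction(30)          # 30 polls per second
floppy = polling_fraction(50e3 / 2)   # 50 KB/sec in 2-byte units
disk = polling_fraction(4e6 / 16)     # 4 MB/sec in 16-byte (4-word) units

for name, frac in [("mouse", mouse), ("floppy", floppy), ("hard disk", disk)]:
    print(f"{name}: {frac:.4%}")
```

Because the polling rate is fixed by the device's peak transfer rate, this cost is paid whether or not the device actually has data to transfer.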
Transferring Data between a Device
and Memory
• Using polling
– Initiate transfer and periodically check for completion
– Periodically check for updates from device (e.g. mouse)
• Interrupt-driven
– OS initiates transfer and waits for interrupt to indicate that the transfer
has completed or an error has occurred
– OS still transfers data in small chunks and must communicate through
interrupts many times during the complete I/O operation
• Direct Memory Access (DMA)
– Also interrupt-driven, but in this case the transfer is controlled by the
device without intervention by the OS (interrupt occurs only when
entire transfer is complete or an error occurs)
– Appropriate for high-bandwidth devices with relatively large blocks of
data
Overhead of Interrupt-Driven I/O
• Assume hard disk drive transfers data in 4 word chunks
and can transfer at 4 MB/sec
– 500 MHz clock
– Overhead of transfer including interrupt is 500 cycles
– Hard drive is transferring data only 5% of the time
• Interrupt rate when the disk is busy is the same as polling
– $250\text{K} \times 500 = 125 \times 10^6$ cycles per second for disk
– Fraction of processor clock cycles = $(125 \times 10^6)/(500 \times 10^6) = 25\%$
• Assuming that the disk is only transferring data 5% of the
time
– Fraction of processor clock cycles = $25\% \times 5\% = 1.25\%$
– Compare to polling - the absence of overhead when the disk is not
active is the major advantage of an interrupt-driven interface
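The interrupt-driven calculation can be reproduced with a short sketch. The function name `interrupt_fraction` and its parameter names are hypothetical, and decimal units (1 MB = 10^6 bytes) are assumed.

```python
def interrupt_fraction(bytes_per_sec, bytes_per_transfer,
                       cycles_per_interrupt, clock_hz, busy_fraction):
    """Fraction of processor cycles spent servicing device interrupts."""
    interrupts_per_sec = bytes_per_sec / bytes_per_transfer
    while_busy = interrupts_per_sec * cycles_per_interrupt / clock_hz
    # Interrupts arrive only while the device is actually transferring.
    return while_busy * busy_fraction


always_busy = interrupt_fraction(4e6, 16, 500, 500e6, busy_fraction=1.0)
five_percent = interrupt_fraction(4e6, 16, 500, 500e6, busy_fraction=0.05)
print(f"{always_busy:.2%} while busy, {five_percent:.2%} at 5% utilization")
```

The `busy_fraction` factor is what distinguishes this from polling: a polled disk would cost its full overhead even when idle.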
Overhead of DMA
• Assume hard disk drive transfers data in 4 word chunks
and can transfer at 4 MB/sec
– 500 MHz clock
– Assume transfer with DMA and initial DMA setup takes 1000 cycles
– Overhead of interrupt at completion is 500 cycles
– If the average transfer is 8KB, what fraction of the CPU is consumed if
the disk is active 100% of the time (ignore processor/DMA controller
bus contention)
• Each DMA transfer takes: $\frac{8\ \text{KB}}{4\ \text{MB/sec}} = 2 \times 10^{-3}$ sec
– Cycles/sec for disk = $\frac{(1000 + 500)\ \text{cycles/transfer}}{2 \times 10^{-3}\ \text{sec/transfer}} = 750 \times 10^3$ clock cycles/sec
– Fraction of processor clock cycles = $(750 \times 10^3)/(500 \times 10^6) = 0.15\%$
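A short sketch of the DMA calculation follows. The function name `dma_fraction` and its parameters are hypothetical; decimal units (8 KB = 8000 bytes, 4 MB = 4 x 10^6 bytes) are assumed, and bus contention is ignored as on the slide.

```python
def dma_fraction(transfer_bytes, bytes_per_sec, setup_cycles,
                 interrupt_cycles, clock_hz):
    """Fraction of processor cycles spent on DMA setup and completion interrupts."""
    seconds_per_transfer = transfer_bytes / bytes_per_sec
    # The processor is involved only at the start and end of each transfer.
    cycles_per_sec = (setup_cycles + interrupt_cycles) / seconds_per_transfer
    return cycles_per_sec / clock_hz


dma = dma_fraction(transfer_bytes=8e3, bytes_per_sec=4e6,
                   setup_cycles=1000, interrupt_cycles=500, clock_hz=500e6)
print(f"{dma:.2%}")
```

Larger average transfers lower the overhead further, since the fixed 1500-cycle cost is amortized over more data.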
Issues with DMA
• With DMA there is another path to memory
• This creates difficulties with virtual memory and caches
– Should physical or virtual addresses be used?
– If virtual, the DMA unit must translate to physical addresses
– If physical, must ensure that a transfer doesn’t cross a page boundary
(otherwise the physical addresses would not be contiguous)
– Can break transfer into a sequence of page size transfers
– OS must not remap memory during DMA transfer
– The value of a memory location as seen by the DMA unit and by the
processor may differ: the stale data or coherency problem (value in
cache differs from value in memory). Solved by routing DMA through
the cache or by flushing the cache
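Breaking a transfer into page-sized pieces can be sketched as below. This is an illustration only: the function name `split_at_page_boundaries` is made up, the 4096-byte page size is an assumption, and address translation for each piece is left out.

```python
def split_at_page_boundaries(start, length, page_size=4096):
    """Split a transfer into (address, size) pieces that never cross a page boundary."""
    chunks = []
    addr, remaining = start, length
    while remaining > 0:
        room_in_page = page_size - (addr % page_size)
        size = min(room_in_page, remaining)
        chunks.append((addr, size))
        addr += size
        remaining -= size
    return chunks


# A 5000-byte transfer starting 96 bytes before the end of the first page.
pieces = split_at_page_boundaries(4000, 5000)
print(pieces)
```

Each piece lies within a single page, so it can be translated and transferred on its own even if the underlying physical pages are not adjacent.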
Designing an I/O System
• Design I/O system that ensures that latency is bounded by
a certain amount.
• Design I/O system to meet a set of bandwidth constraints
given a workload
Designing an I/O System
• Consider the following system:
– 300 MHz CPU
– 50,000 instructions in OS per I/O operation
– A memory backplane bus capable of a transfer rate of 100MB/sec
– SCSI-2 controllers with a transfer rate of 20MB/sec and
accommodating up to seven disks
– Disk drives with read/write bandwidth of 5MB/sec and an avg. seek
plus rotational latency of 10ms
• If the workload consists of 64-KB reads and the user
program needs 100,000 instructions per I/O operation, find
the maximum sustainable I/O rate and the number of disks
and SCSI controllers required (ignore disk conflicts).
Designing an I/O System
• To find max I/O rate, find rate for two fixed components to
determine which is the bottleneck
• Max I/O rate of CPU
$\frac{\text{Instruction execution rate}}{\text{Instructions per I/O}} = \frac{300 \times 10^6}{(50 + 100) \times 10^3} = 2000$ I/Os/sec
• Max I/O rate of bus
$\frac{\text{Bus bandwidth}}{\text{Bytes per I/O}} = \frac{100 \times 10^6}{64 \times 10^3} \approx 1562$ I/Os/sec
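• The bus is the bottleneck (1562 < 2000 I/Os per sec), so the disks
must sustain 1562 I/Os per sec. To determine the number of disks,
we need to know the time per I/O operation
– 10ms + 64KB/(5 MB/sec) = 10 ms + 12.8 ms = 22.8 ms
– 1000/22.8 = 43.9 I/Os per sec per disk
– 1562/43.9 ≈ 35.6, so 36 disks are needed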
Designing an I/O System
• To compute the number of SCSI buses, we need to know
the transfer rate
– Transfer size/Transfer time = 64KB/22.8 ms = 2.74 MB/sec
– Assume that disk accesses are not clustered so that we can use the
full bandwidth of the bus
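– $2.74 \times 7 = 19.18$ MB/sec, which is below the 20 MB/sec SCSI-2
limit, so we can use seven disks per SCSI bus
– 36 disks at seven per bus require $\lceil 36/7 \rceil = 6$ SCSI buses and
controllers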
• This calculation required several simplifying assumptions.
In practice, where such assumptions do not hold, simulation is used.
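The whole design calculation can be reproduced in a few lines. This is a sketch under stated assumptions: all variable names are made up, and decimal units are used throughout (64 KB = 64 x 10^3 bytes, 1 MB = 10^6 bytes). With those units the per-disk rate comes out near 2.81 MB/sec instead of the 2.74 MB/sec shown on the slide, which appears to mix unit conventions; the conclusions are the same either way.

```python
import math

# Parameters from the slides (decimal KB/MB are an assumption).
cpu_hz = 300e6
instructions_per_io = 50_000 + 100_000   # OS + user program
bus_bytes_per_sec = 100e6
io_bytes = 64e3
disk_bytes_per_sec = 5e6
disk_latency_sec = 10e-3
scsi_bytes_per_sec = 20e6
max_disks_per_scsi = 7

# The slower of the two fixed components limits the I/O rate.
cpu_rate = cpu_hz / instructions_per_io       # I/Os per second
bus_rate = bus_bytes_per_sec / io_bytes       # I/Os per second
max_rate = min(cpu_rate, bus_rate)

# Each disk spends latency plus transfer time on every I/O.
time_per_io = disk_latency_sec + io_bytes / disk_bytes_per_sec
ios_per_disk = 1 / time_per_io
disks = math.ceil(max_rate / ios_per_disk)

# Average bandwidth one disk places on its SCSI bus.
per_disk_bytes_per_sec = io_bytes / time_per_io
disks_per_bus = min(max_disks_per_scsi,
                    math.floor(scsi_bytes_per_sec / per_disk_bytes_per_sec))
controllers = math.ceil(disks / disks_per_bus)

print(f"bottleneck rate: {max_rate:.1f} I/Os/sec")
print(f"disks: {disks}, disks per bus: {disks_per_bus}, controllers: {controllers}")
```

Changing a single parameter (for example a faster bus) and rerunning shows how the bottleneck shifts between the CPU and the bus.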