EECC722 - Shaaban


Storage System Issues
• Designing an I/O System
• ABCs of UNIX File Systems
• I/O Benchmarks
• Comparing UNIX File System Performance
EECC722 - Shaaban, Lec # 9, Fall 2000, 10-11-2000
Designing an I/O System
• When designing an I/O system, the components that make it up
  should be balanced.
• Six steps for designing an I/O system are:
  – List the types of devices and buses in the system
  – List the physical requirements (e.g., volume, power, connectors, etc.)
  – List the cost of each device, including the controller if needed
  – Record the CPU resource demands of each device:
    • CPU clock cycles directly for I/O (e.g., initiate, interrupts, complete)
    • CPU clock cycles due to stalls waiting for I/O
    • CPU clock cycles to recover from I/O activity (e.g., cache flush)
  – List the memory and I/O bus resource demands
  – Assess the performance of the different ways to organize these devices
Example: Determining the I/O Bottleneck
• Assume the following system components:
  – 500 MIPS CPU
  – 16-byte wide memory system with 100 ns cycle time
  – 200 MB/sec I/O bus
  – 20 20-MB/sec SCSI-2 buses, with 1 ms controller overhead
  – 5 disks per SCSI bus: 8 ms seek, 7,200 RPM, 6 MB/sec transfer rate
• Other assumptions:
  – All devices are used to 100% capacity and always have average values
  – Average I/O size is 16 KB
  – The OS uses 10,000 CPU instructions for a disk I/O
• What is the average IOPS? What is the average bandwidth?
Example: Determining the I/O Bottleneck
• The performance of an I/O system is determined by the component
  with the lowest I/O bandwidth:
  – CPU: (500 MIPS)/(10,000 instr. per I/O) = 50,000 IOPS
  – Main memory: (16 bytes per 100 ns)/(16 KB per I/O) = 10,000 IOPS
  – I/O bus: (200 MB/sec)/(16 KB per I/O) = 12,500 IOPS
  – SCSI-2: (20 buses)/((1 ms + (16 KB)/(20 MB/sec)) per I/O) ≈ 11,111 IOPS
  – Disks: (100 disks)/((8 ms + 0.5/(7,200 RPM) + (16 KB)/(6 MB/sec)) per I/O)
    ≈ 6,700 IOPS
• In this case, the disks limit the I/O performance to 6,700 IOPS
• The average I/O bandwidth is:
  – 6,700 IOPS x (16 KB per I/O) = 107.2 MB/sec
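The arithmetic above can be sketched as a short Python calculation (illustrative only; the slide rounds the SCSI-2 and disk figures, and sizes here use decimal units, so 16 KB = 16,000 bytes):

```python
# Compute the IOPS limit of each component and take the minimum.
AVG_IO_BYTES = 16_000                      # average I/O size, 16 KB (decimal)

cpu_iops  = 500e6 / 10_000                 # 500 MIPS, 10,000 instr. per I/O
mem_iops  = (16 / 100e-9) / AVG_IO_BYTES   # 16 bytes every 100 ns
bus_iops  = 200e6 / AVG_IO_BYTES           # 200 MB/sec I/O bus
scsi_iops = 20 / (1e-3 + AVG_IO_BYTES / 20e6)   # 20 buses, 1 ms overhead each
disk_time = 8e-3 + 0.5 / (7200 / 60) + AVG_IO_BYTES / 6e6  # seek + rot. + xfer
disk_iops = 100 / disk_time                # 100 disks in total

limits = {"CPU": cpu_iops, "Memory": mem_iops, "I/O bus": bus_iops,
          "SCSI-2": scsi_iops, "Disks": disk_iops}
bottleneck = min(limits, key=limits.get)   # component with the lowest IOPS
avg_bandwidth_mb_s = limits[bottleneck] * AVG_IO_BYTES / 1e6
```

Running this confirms the disks are the bottleneck, at roughly 6,700 IOPS and just over 107 MB/sec.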
OS Policies and I/O Performance
• The performance potential is determined by the hardware: CPU, disk,
  bus, and memory system.
• Operating system policies determine how much of that potential is
  achieved.
• OS policies:
  1) How much main memory is allocated for the file cache?
  2) Can that boundary change dynamically?
  3) Write policy for the disk cache:
     • Write Through with Write Buffer
     • Write Back
Network Attached Storage
• Decreasing disk diameters:
  14" » 10" » 8" » 5.25" » 3.5" » 2.5" » 1.8" » 1.3" » . . .
  – High-bandwidth disk systems based on arrays of disks
• Increasing network bandwidth:
  3 Mb/s » 10 Mb/s » 50 Mb/s » 100 Mb/s » 1 Gb/s » 10 Gb/s
  – Networks capable of sustaining high-bandwidth transfers
• The network provides well-defined physical and logical interfaces:
  separate the CPU and the storage system!
• High-performance storage service on a high-speed network
• Network file services: OS structures supporting remote file access
ABCs of UNIX File Systems
• Key Issues:
  – File vs. Raw I/O
  – File cache size policy
  – Write policy
  – Local disk vs. server disk
• File vs. Raw:
  – File system access is the norm: standard policies apply
  – Raw: an alternate I/O path that avoids the file system, used by
    databases
• File Cache Size Policy:
  – Files are cached in main memory, rather than being accessed from
    disk
  – In older UNIX, the % of main memory dedicated to the file cache is
    fixed at system generation (e.g., 10%)
  – In newer UNIX, the % of main memory for the file cache varies
    depending on the amount of file I/O (e.g., up to 80%)
ABCs of UNIX File Systems
• Write Policy
  – File storage should be permanent; either write immediately
    or flush the file cache after a fixed period (e.g., 30 seconds)
  – Write Through with Write Buffer
  – Write Back
  – A Write Buffer is often confused with Write Back:
    • With Write Through with Write Buffer, all writes still go to disk
    • The write buffer makes writes asynchronous, so the processor
      doesn't have to wait for the disk write
    • Write Back combines multiple writes to the same page; hence it
      can be called Write Cancelling
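The distinction can be illustrated with a toy sketch (hypothetical classes, not any real OS code): both policies decouple the CPU from the disk, but only write back cancels repeated writes to the same page before they reach disk.

```python
# Toy models of the two write policies; disk_writes counts blocks sent to disk.

class WriteThroughWithBuffer:
    def __init__(self):
        self.buffer = []            # writes queued asynchronously
        self.disk_writes = 0
    def write(self, page, data):
        self.buffer.append((page, data))   # CPU continues immediately
    def drain(self):
        self.disk_writes += len(self.buffer)   # every write reaches disk
        self.buffer.clear()

class WriteBack:
    def __init__(self):
        self.dirty = {}             # page -> latest data (earlier writes cancelled)
        self.disk_writes = 0
    def write(self, page, data):
        self.dirty[page] = data
    def flush(self):                # e.g., every 30 seconds
        self.disk_writes += len(self.dirty)
        self.dirty.clear()

wt, wb = WriteThroughWithBuffer(), WriteBack()
for i in range(10):                 # 10 writes, all to the same page
    wt.write(0, i)
    wb.write(0, i)
wt.drain()
wb.flush()
```

After 10 writes to one page, the write-through buffer performs 10 disk writes while write back performs only 1.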
ABCs of UNIX File Systems
• Local Disk vs. Server Disk
  – UNIX file systems have historically had different policies (and
    even different file systems) for the local client vs. the remote
    server
  – An NFS client's local disk allows a 30-second delay to flush writes
  – An NFS server's disk writes through to disk on file close
  – There is a cache coherency problem if clients are allowed to have
    file caches in addition to the server file cache
    • NFS just writes through on file close
    • Stateless protocol: clients periodically get new copies of file
      blocks
    • Other file systems use cache coherency with write back to check
      state and selectively invalidate or update
Network File Systems
(Diagram: NFS client/server software structure.)
• Client side: Application Program → UNIX System Call Layer → Virtual
  File System Interface; local accesses go to the UNIX File System and
  Block Device Driver, while remote accesses go to the NFS Client, then
  through RPC/Transmission Protocols and the Network Protocol Stack.
• Server side: the network delivers requests through RPC/Transmission
  Protocols to the NFS File System Server Routines, behind the server's
  own Virtual File System Interface and UNIX System Call Layer.
Typical File Server Architecture
(Diagram: a single-processor file server. NFS requests arrive over
Ethernet to the Ethernet driver; kernel NFS protocol & file processing,
the TCP/IP protocols, and the Unix file system run on the host; primary
memory, the disk manager & driver, and the disk controllers sit on a
backplane bus.)
• Limits to performance: data copying
  – Read data is staged from the device to primary memory
  – Copied again into network packet templates
  – Copied yet again to the network interface
• Normally there is no special hardware for fast processing between
  network and disk.
AUSPEX NS5000 File Server
• Special hardware/software architecture for high-performance NFS I/O
• Functional multiprocessing:
  (Diagram: an enhanced VME backplane connects a UNIX front end of host
  processor, host memory, and a single-board computer with dedicated
  Ethernet processors specialized for protocol processing, file
  processors running dedicated file-system software with an independent
  file system, primary memory holding I/O buffers, and a storage
  processor managing 10 parallel SCSI channels.)
AUSPEX Software Architecture
(Diagram: the host processor runs the Unix system call layer, VFS
interface, NFS client, and LFS client; the Ethernet processor runs the
network interface, protocols, NFS server, and an LFS client; the file
processor runs the LFS server and file system server; the storage
processor manages the disk arrays and primary memory. The primary data
flow bypasses the host processor, which keeps only limited control
interfaces; primary control flow is separate from the data flow.)
Berkeley RAID-II Disk Array File Server
(Diagram: an FDDI network and TMC HiPPI source/destination boards
connect through X-Bus boards to an 8-port interleaved memory (128
MByte), an 8 x 8 x 32-bit crossbar, and XOR hardware; an IOP bus leads
to UltraNet; VME interfaces and a VME control bus connect ATC
controllers, each with 5 SCSI channels, to as many as 120 disk drives;
a file server coordinates the system.)
• Low-latency transfers mixed with high-bandwidth transfers
• Application area: "diskless supercomputers"
I/O Performance Metrics: Throughput
• Throughput is a measure of speed: the rate at which the storage
  system delivers data.
• Throughput is measured in two ways:
  – I/O rate, measured in accesses/second:
    • Generally used for applications where the size of each request is
      small, such as transaction processing.
  – Data rate, measured in bytes/second or megabytes/second (MB/s):
    • Generally used for applications where the size of each request is
      large, such as scientific applications.
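As a small illustration (with a made-up request trace, not taken from the slides), both metrics can be computed from the same set of completed requests:

```python
# Hypothetical trace: sizes of completed I/O requests and the time taken.
request_sizes_kb = [4, 4, 8, 4, 16, 4, 8, 4]   # eight requests, 52 KB total
elapsed_seconds = 0.002                        # time to complete all of them

io_rate = len(request_sizes_kb) / elapsed_seconds            # accesses/second
data_rate_mb_s = sum(request_sizes_kb) / 1000 / elapsed_seconds  # MB/second
```

The same trace yields an I/O rate of 4,000 accesses/second but a data rate of only 26 MB/s, which is why small-request workloads are quoted in IOPS and large-request workloads in MB/s.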
I/O Performance Metrics: Response Time
• Response time measures how long a storage system takes to access
  data.
• This time can be measured in several ways, depending on what you view
  as the storage system. For example, one could measure time from the
  user's perspective, the operating system's perspective, or the disk
  controller's perspective.
I/O Performance Metrics
Capacity:
• How much data can be stored on the storage system.
• Capacity is not normally applied as a metric to non-storage
  components of a computer system, but it is an integral part of
  evaluating an I/O system.
Reliability:
• I/O systems require a reliability level much higher than other parts
  of a computer.
  – If a memory chip develops a parity error, the system will
    (hopefully) crash and be restarted.
  – If a storage device develops a parity error in a database of bank
    accounts, however, banks could unwittingly lose billions of
    dollars. Thus, reliability is a metric of great importance to
    storage.
Cost:
• Applies to all components in computer systems.
• Disk subsystems are often the most expensive component in a large
  computer installation.
I/O Benchmarks
• Processor benchmarks classically aim at response time for a
  fixed-size problem.
• I/O benchmarks typically measure throughput, possibly with an upper
  limit on response times (or on 90% of response times).
• Traditional I/O benchmarks fix the problem size in the benchmark.
• Examples:

  Benchmark    Size of Data    % Time I/O    Year
  I/OStones    1 MB            26%           1990
  Andrew       4.5 MB          4%            1988

  – Not much I/O time in the benchmarks
  – Limited problem size
  – Not measuring disk (or even main memory)
The Ideal I/O Benchmark
• An I/O benchmark should help system designers and users understand
  why the system performs as it does.
• The performance of an I/O benchmark should be limited by the I/O
  devices, to maintain the focus of measuring and understanding I/O
  systems.
• The ideal I/O benchmark should scale gracefully over a wide range of
  current and future machines; otherwise I/O benchmarks quickly become
  obsolete as machines evolve.
• A good I/O benchmark should allow fair comparisons across machines.
• The ideal I/O benchmark would be relevant to a wide range of
  applications.
• In order for results to be meaningful, benchmarks must be tightly
  specified. Results should be reproducible by general users;
  optimizations which are allowed and disallowed must be explicitly
  stated.
I/O Benchmarks Comparison
Self-Scaling I/O Benchmarks
• An alternative to traditional I/O benchmarks: the self-scaling
  benchmark automatically and dynamically increases aspects of the
  workload to match the characteristics of the system being measured
  – Measures a wide range of current & future applications
• Types of self-scaling benchmarks:
  – Transaction Processing (interested in IOPS, not bandwidth)
    • TPC-A, TPC-B, TPC-C
  – NFS: SPEC SFS/LADDIS - average response time and throughput
  – UNIX I/O - performance of file systems
    • Willy
I/O Benchmarks: Transaction Processing
• Transaction Processing (TP) (or On-line TP = OLTP):
  – Changes to a large body of shared information from many terminals,
    with the TP system guaranteeing proper behavior on a failure
  – If a bank's computer fails when a customer withdraws money, the TP
    system would guarantee that the account is debited if the customer
    received the money, and that the account is unchanged if the money
    was not received
  – Airline reservation systems & banks use TP
• Atomic transactions make this work
• Each transaction => 2 to 10 disk I/Os, and 5,000 to 20,000 CPU
  instructions per disk I/O
  – Depends on the efficiency of the TP software and on avoiding disk
    accesses by keeping information in main memory
• The classic metric is Transactions Per Second (TPS)
  – Under what workload? How is the machine configured?
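A back-of-the-envelope sketch of what these figures imply (assumed midpoint values and a hypothetical 500-MIPS CPU, not from any published TPC result):

```python
# Estimate the CPU-limited TPS from the per-transaction costs above.
cpu_mips = 500                  # hypothetical CPU
ios_per_txn = 6                 # midpoint of 2-10 disk I/Os per transaction
instr_per_io = 12_500           # midpoint of 5,000-20,000 instructions per I/O

instr_per_txn = ios_per_txn * instr_per_io
cpu_limited_tps = cpu_mips * 1e6 / instr_per_txn
```

With these midpoints the CPU alone caps throughput at roughly 6,700 TPS; the actual limit also depends on the disks and on how much of the database stays in main memory.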
I/O Benchmarks: Old TPC Benchmarks
• TPC-A: revised version of TP1/DebitCredit
  – Arrivals: random (TPC) vs. uniform (TP1)
  – Terminals: smart vs. dumb (affects instruction path length)
  – ATM scaling: 10 terminals per TPS vs. 100
  – Branch scaling: 1 branch record per TPS vs. 10
  – Response time constraint: 90% ≤ 2 seconds vs. 95% ≤ 1
  – Full disclosure, approved by the TPC
  – Complete TPS vs. response time plots vs. a single point
• TPC-B: same as TPC-A but without terminals: batch processing of
  requests
  – Response time makes no sense: plots TPS vs. residence time
    (the time a transaction resides in the system)
• These have been withdrawn as benchmarks
I/O Benchmarks: TPC-C Complex OLTP
• Models a wholesale supplier managing orders.
• Order-entry conceptual model for the benchmark.
• Workload = 5 transaction types.
• Users and database scale linearly with throughput.
• Defines a full-screen end-user interface.
• Metrics: new-order rate (tpmC) and price/performance ($/tpmC).
• Approved July 1992.
TPC-C Price/Performance $/tpmC

  Rank  Config                             $/tpmC  tpmC       Database
  1     Acer AcerAltos 19000Pro4           $27.25  11,072.07  M/S SQL 6.5
  2     Dell PowerEdge 6100 c/s            $29.55  10,984.07  M/S SQL 6.5
  3     Compaq ProLiant 5500 c/s           $33.37  10,526.90  M/S SQL 6.5
  4     ALR Revolution 6x6 c/s             $35.44  13,089.30  M/S SQL 6.5
  5     HP NetServer LX Pro                $35.82  10,505.97  M/S SQL 6.5
  6     Fujitsu teamserver M796i           $37.62  13,391.13  M/S SQL 6.5
  7     Fujitsu GRANPOWER 5000 Model 670   $37.62  13,391.13  M/S SQL 6.5
  8     Unisys Aquanta HS/6 c/s            $37.96  13,089.30  M/S SQL 6.5
  9     Compaq ProLiant 7000 c/s           $39.25  11,055.70  M/S SQL 6.5
  10    Unisys Aquanta HS/6 c/s            $39.39  12,026.07  M/S SQL 6.5
I/O Benchmarks: TPC-D Complex Decision Support Workload
• OLTP: business operation
• Decision support: business analysis (historical)
• Workload = 17 ad hoc transaction types
• Synthetic generator of data
• Size determined by a Scale Factor (SF):
  100 GB, 300 GB, 1 TB, 3 TB, 10 TB
• Metrics: "queries per gigabyte hour"
  – Power (QppD@Size) = 3600 x SF / geometric mean of query times
  – Throughput (QthD@Size) = 17 x SF / (time/3600)
  – Price/Performance ($/QphD@Size) =
    $ / geo. mean(QppD@Size, QthD@Size)
• Report the time to load the database (indices, stats) too.
• Approved April 1995
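The metric arithmetic can be sketched as follows (made-up query times and system price; the real TPC-D throughput test runs multiple query streams, which this sketch ignores):

```python
# Sketch of the TPC-D Power, Throughput, and Price/Performance metrics.
from math import prod

sf = 300                                  # scale factor: 300 GB database
query_times = [120.0] * 17                # hypothetical: every query takes 120 s
geo_mean_time = prod(query_times) ** (1 / len(query_times))

qppd = 3600 * sf / geo_mean_time          # Power (QppD@Size)
elapsed = sum(query_times)                # elapsed time, seconds
qthd = 17 * sf / (elapsed / 3600)         # Throughput (QthD@Size)

price = 2_000_000                         # hypothetical system price, $
qphd = (qppd * qthd) ** 0.5               # geometric mean of the two metrics
price_perf = price / qphd                 # $/QphD@Size
```

With uniform 120-second queries both metrics come out at 9,000, so a $2M system would score about $222/QphD@300GB.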
TPC-D Performance/Price, 300 GB

Ranked by Power (QppD):
  Rank  Config                         QppD     QthD     $/QphD    Database
  1     NCR WorldMark 5150             9,260.0  3,117.0  2,172.00  Teradata
  2     HP 9000 EPS22 (16 node)        5,801.2  2,829.0  1,982.00  Informix-XPS
  3     DG AViiON AV20000              3,305.8  1,277.7  1,319.00  Oracle8 v8.0.4
  4     Sun Ultra Enterprise 6000      3,270.6  1,477.8  1,553.00  Informix-XPS
  5     Sequent NUMA-Q 2000 (32 way)   3,232.3  1,097.8  3,283.00  Oracle8 v8.0.4

Ranked by Price/Performance ($/QphD):
  Rank  Config                         QppD     QthD     $/QphD    Database
  1     DG AViiON AV20000              3,305.8  1,277.7  1,319.00  Oracle8 v8.0.4
  2     Sun Ultra Enterprise 6000      3,270.6  1,477.8  1,553.00  Informix-XPS
  3     HP 9000 EPS22 (16 node)        5,801.2  2,829.0  1,982.00  Informix-XPS
  4     NCR WorldMark 5150             9,260.0  3,117.0  2,172.00  Teradata
  5     Sequent NUMA-Q 2000 (32 way)   3,232.3  1,097.8  3,283.00  Oracle8 v8.0.4
TPC-D Performance, 1 TB

  Rank  Config                         QppD      QthD     $/QphD    Database
  1     Sun Ultra E6000 (4 x 24-way)   12,931.9  5,850.3  1,353.00  Informix Dyn
  2     NCR WorldMark (32 x 4-way)     12,149.2  3,912.3  2,103.00  Teradata
  3     IBM RS/6000 SP (32 x 8-way)    7,633.0   5,155.4  2,095.00  DB2 UDB, V5

– NOTE: It is inappropriate to compare results from different database
  sizes.
I/O Benchmarks: TPC-W Transactional Web Benchmark
• Represents any business (retail store, software distribution, airline
  reservation, electronic stock trades, etc.) that markets and sells
  over the Internet/intranet.
• Measures systems supporting users browsing, ordering, and conducting
  transaction-oriented business activities.
• Security (including user authentication and data encryption) and
  dynamic page generation are important.
• Before: processing of a customer order by a terminal operator working
  on a LAN connected to a database system.
• Today: the customer accesses the company site over an Internet
  connection, browses both static and dynamically generated Web pages,
  and searches the database for product or customer information.
  Customers also initiate, finalize, and check on product orders and
  deliveries.
• Approved Fall 1998.
SPEC SFS/LADDIS Predecessor: NFSStones
• NFSStones: a synthetic benchmark that generates a series of NFS
  requests from a single client to test the server; reads, writes, &
  commands & file sizes taken from other studies.
  – Problem: 1 client could not always stress the server.
  – Files and block sizes were not realistic.
  – Clients had to run SunOS.
SPEC SFS/LADDIS
• 1993 attempt by NFS companies to agree on a standard benchmark:
  Legato, Auspex, Data General, DEC, Interphase, Sun.
• Like NFSStones, but:
  – Run on multiple clients & networks (to prevent bottlenecks)
  – Same caching policy in all clients
  – Reads: 85% full blocks & 15% partial blocks
  – Writes: 50% full blocks & 50% partial blocks
  – Average response time: 50 ms
  – Scaling: for every 100 NFS ops/sec, increase capacity by 1 GB
  – Results: plot of server load (throughput) vs. response time &
    number of users
• Assumes: 1 user => 10 NFS ops/sec
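The scaling rules can be sketched as a small helper (toy arithmetic only; the official SFS run rules are more detailed):

```python
# Derive the implied user count and data-set capacity from a target load.
def sfs_scaling(nfs_ops_per_sec: float) -> tuple[float, float]:
    users = nfs_ops_per_sec / 10          # 1 user => 10 NFS ops/sec
    capacity_gb = nfs_ops_per_sec / 100   # +1 GB per 100 NFS ops/sec
    return users, capacity_gb

users, capacity_gb = sfs_scaling(2_000)   # a server sustaining 2,000 ops/sec
```

A server sustaining 2,000 NFS ops/sec thus corresponds to 200 simulated users and a 20 GB data set, so the workload grows with the machine rather than staying fixed.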
Unix I/O Benchmarks: Willy
• A UNIX file system benchmark that gives insight into I/O system
  behavior (Chen and Patterson, 1993)
• Self-scaling, to automatically explore system size
• Examines five parameters:
  – Unique bytes touched: data size; locality via LRU
    • Gives the file cache size
  – Percentage of reads: %writes = 1 - %reads; typically 50%
    • 100% reads gives peak throughput
  – Average I/O request size: Bernoulli, C = 1
  – Percentage of sequential requests: typically 50%
  – Number of processes: concurrency of the workload (number of
    processes issuing I/O requests)
• Fixes four parameters while varying one parameter
• Searches the space to find high throughput
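The "fix four, vary one" exploration can be sketched as a parameter sweep (hypothetical parameter names and a stand-in workload function; Willy's real implementation differs):

```python
# Sweep each parameter in turn while holding the other four at defaults.
defaults = {
    "unique_mb": 64,       # unique bytes touched
    "read_pct": 50,        # percentage of reads
    "req_kb": 8,           # average I/O request size
    "seq_pct": 50,         # percentage of sequential requests
    "processes": 1,        # concurrency
}
sweeps = {
    "unique_mb": [1, 4, 16, 64, 256],
    "read_pct": [0, 25, 50, 75, 100],
}

def run_workload(params):
    # Stand-in for actually issuing I/O; returns a fake MB/sec figure.
    return params["read_pct"] * 0.1 + 1.0

results = {}
for name, values in sweeps.items():
    curve = []
    for v in values:
        params = dict(defaults, **{name: v})   # vary one, fix the rest
        curve.append((v, run_workload(params)))
    results[name] = curve
```

Each entry in `results` is one performance curve, which is exactly the kind of plot the study reports: throughput as a function of a single varied parameter.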
UNIX File System Performance Study Using Willy

9 Machines & OS (desktop and mini/mainframe):

  Machine               OS           Year  Price       Memory
  Alpha AXP 3000/400    OSF/1        1993  $30,000     64 MB
  DECstation 5000/200   Sprite LFS   1990  $20,000     32 MB
  DECstation 5000/200   Ultrix 4.2   1990  $20,000     32 MB
  HP 730                HP/UX 8 & 9  1991  $35,000     64 MB
  IBM RS/6000/550       AIX 3.1.5    1991  $30,000     64 MB
  SparcStation 1+       SunOS 4.1    1989  $30,000     28 MB
  SparcStation 10/30    Solaris 2.1  1992  $20,000     128 MB
  Convex C2/240         Convex OS    1988  $750,000    1024 MB
  IBM 3090/600J VF      AIX/ESA      1990  $1,000,000  128 MB
Disk Performance
• I/O is limited by the weakest link in the chain from processor to
  disk
• It could be the disks, disk controller, I/O bus, CPU/memory bus, CPU,
  or OS - not uniform across machines

  Machine               OS           I/O bus             Disk
  Alpha AXP 3000/400    OSF/1        TurboChannel SCSI   RZ26
  DECstation 5000/200   Sprite LFS   SCSI-I              3 CDC Wren
  DECstation 5000/200   Ultrix 4.2   SCSI-I              DEC RZ56
  HP 730                HP/UX 8 & 9  Fast SCSI-II        HP 1350SX
  IBM RS/6000/550       AIX 3.1.5    SCSI-I              IBM 2355
  SparcStation 1+       SunOS 4.1    SCSI-I              CDC Wren IV
  SparcStation 10/30    Solaris 2.1  SCSI-I              Seagate Elite
  Convex C2/240         Convex OS    IPI-2               4 DKD-502
  IBM 3090/600J VF      AIX/ESA      Channel             IBM 3390
Self-Scaling Benchmark Parameters
Disk Performance
(Bar chart: disk throughput in megabytes per second, by machine and
operating system, for 32 KB reads)

  Convex C240, ConvexOS10   4.2  (IPI-2, RAID)
  SS 10, Solaris 2          2.4  (5400 RPM SCSI-II disk)
  AXP/4000, OSF1            2.0
  RS/6000, AIX              1.6
  HP 730, HP/UX 9           1.4
  3090, AIX/ESA             1.1  (IBM Channel, IBM 3390 disk)
  Sparc1+, SunOS 4.1        0.7
  DS5000, Ultrix            0.6
  DS5000, Sprite            0.5

• 32 KB reads
• The SS 10 disk spins at 5400 RPM; 4 IPI disks on the Convex
File Cache Performance
• UNIX file system performance: not how fast the disk is, but whether
  the disk is used at all (the file cache has 3 to 7 x disk
  performance)
• 4X speedup between generations (DEC & Sparc)

(Bar chart: file cache throughput in megabytes per second)

  AXP/4000, OSF1            31.8  (fast memory systems)
  RS/6000, AIX              28.2
  HP 730, HP/UX 9           27.9
  3090, AIX/ESA             27.2
  SS 10, Solaris 2          11.4
  Convex C240, ConvexOS10   9.9
  DS5000, Sprite            8.7
  DS5000, Ultrix            5.0
  Sparc1+, SunOS 4.1        2.8

  (DEC generations: DS5000 vs. AXP/4000; Sun generations: Sparc1+
  vs. SS 10)
File Cache Size
• HP/UX v8 (8%) vs. v9 (81%);
  DS 5000 Ultrix (10%) vs. Sprite (63%)

(Chart: file cache size in MB, on a log scale from 1 to 1000 MB, and
the percentage of main memory used for the file cache, for each machine
and OS in the study; the percentages range from 8% up to 87%.)
File System Write Policies
• Write Through with Write Buffer (asynchronous):
  AIX, Convex, OSF/1 w.t., Solaris, Ultrix

(Chart: throughput in MB/sec vs. percentage of reads, from 0% to 100%,
for the Convex, Solaris, AIX, and OSF/1 systems; fast disks help at low
read percentages, and fast file caches help for reads.)
File System Write Policies
• Write Cancelling (Write Back): HP/UX with no write daemon (vs. the
  30-second policy); must wait for the write to complete when flushed

(Chart: throughput in MB/sec vs. percentage of reads, from 0% to 100%,
for HP/UX with no write daemon, SunOS, and Sprite.)
File Cache Performance vs. Read Percentage
Performance vs. Megabytes Touched
Write Policy Performance For Client/Server Computing
• NFS: write through on close (no buffers)
• HP/UX: the client caches writes; 25X faster @ 80% reads

(Chart: throughput in megabytes per second vs. percentage of reads,
from 0% to 100%; HP 720-730, HP/UX 8, DUX over an FDDI network far
outperforms SS1+, SunOS 4.1, NFS over Ethernet.)
UNIX I/O Performance Study Conclusions
• Study uses Willy, a new I/O benchmark which supports self-scaling
evaluation and predicted performance.
• The hardware determines the potential I/O performance, but the
operating system determines how much of that potential is delivered:
differences of factors of 100.
• File cache performance in workstations is improving rapidly, with
over four-fold improvements in three years for DEC (AXP/3000 vs.
DECStation 5000) and Sun (SPARCStation 10 vs. SPARCStation 1+).
• File cache performance of Unix on mainframes and
minisupercomputers is no better than on workstations.
• Current workstations can take advantage of high performance disks.
• RAID systems can deliver much higher disk performance.
• File caching policy determines performance of most I/O events, and
hence is the place to start when trying to improve I/O performance.