The Active Streams Approach to adaptive distributed systems
Versioning File Systems
Someone has typed: rm -r *
However, they were in the wrong
directory. What can be done?
Typical UNIX and Windows versions
have some tools for restoring deleted
files, provided the file's blocks have not
been reclaimed.
Is this immediate release of storage by
UNIX and Windows essential?
7.1 Advanced Operating Systems
The File System's problem
The key problem with the current approach
is that user actions have an immediate and
irrevocable effect on disk storage.
– Users are not protected against their own
mistakes.
This goes against the file system's objective
of protecting data against failure.
We can do better today.
Disk Capacity
In 1995:
– For $200 you could get a 0.54 GB disk.
– Slackware Linux 2.2 (Basic Applications + X Window) took
0.15 GB, which is 28% of the disk.
In 2000:
– For $200 you could get a 40 GB disk.
– RedHat Linux 7 (Basic Applications + X Window) took
1 GB, which is 2.5% of the disk.
Disk Capacity (Cont.)
In 2004:
– For $200 you could get a 300 GB disk.
– RedHat Linux Advanced Workstation 2.1 (Basic
Applications + X Window) for the Itanium processor
took 4.2 GB, which is 1.4% of the disk.
In 2011:
– For $200 you could get a 2 TB disk.
– RedHat Enterprise Linux 5 (Basic Applications + X
Window) took 8.8 GB, which is 0.4% of the disk.
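The percentages above can be rechecked with a few lines of Python. This is only a sketch of the arithmetic: the figures are copied from the slides, 2 TB is assumed to mean 2000 GB, and the function name install_share is made up for illustration.

```python
# (year, disk capacity in GB, installation size in GB), copied from the slides.
# Assumption: the 2 TB disk is counted as 2000 GB.
DATA = [
    (1995, 0.54, 0.15),
    (2000, 40.0, 1.0),
    (2004, 300.0, 4.2),
    (2011, 2000.0, 8.8),
]


def install_share(capacity_gb, install_gb):
    """Return the installation size as a percentage of the disk capacity."""
    return 100.0 * install_gb / capacity_gb


for year, capacity, install in DATA:
    print(f"{year}: {install_share(capacity, install):.1f}% of the disk")
```

The share of a $200 disk taken by the operating system drops by almost two orders of magnitude over these years, which is the room that versioning can use.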
Old Solutions
UNIX has RCS and CVS for maintaining
versions of files.
– The manual operation is the main disadvantage of
these tools.
In 1985 the Cedar file system was
proposed.
– Cedar automatically retains the last few versions of
a file in a copy-on-write fashion.
– The number of copies is limited; hence when a new
write is done, the oldest version will be deleted.
• The user can explicitly delete a version, so the oldest
version will not be the victim.
VMS provides a similar file-versioning scheme.
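The bounded version list described for Cedar can be sketched as follows. This is a toy model, not Cedar's implementation: it stores whole file contents instead of copy-on-write blocks, and the class and method names are hypothetical.

```python
class BoundedVersions:
    """Toy model of a file that keeps only its last `limit` versions."""

    def __init__(self, limit):
        self.limit = limit
        self.versions = []      # (version_number, data), oldest first
        self.next_number = 1

    def write(self, data):
        """Add a new version; evict the oldest one if the limit is exceeded."""
        self.versions.append((self.next_number, data))
        self.next_number += 1
        if len(self.versions) > self.limit:
            self.versions.pop(0)    # the oldest version is the victim

    def delete(self, number):
        """Explicitly delete one version chosen by the user."""
        self.versions = [v for v in self.versions if v[0] != number]

    def numbers(self):
        return [number for number, _ in self.versions]


f = BoundedVersions(limit=3)
for text in ("a", "b", "c", "d"):
    f.write(text)
print(f.numbers())   # version 1 was evicted by the fourth write
f.delete(3)          # the user frees a slot explicitly
f.write("e")         # no eviction needed, so the oldest version survives
print(f.numbers())
```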
Snapshots
Many systems take regular snapshots that
are kept on the disk itself.
The backup is usually incremental.
Changes made between snapshots
cannot be undone.
– Many users maintain multiple versions of
their critical data.
All files are treated equally.
Not all files are created equal
Read-only files (like application executables)
need no version history.
Derived files (like object files) can be easily
reconstituted.
Cached files require no version history.
Temporary files might benefit from a short-term history but not from a long-term history.
User-modified files would benefit most from
a long-term and a short-term history.
The Elephant File System
Elephant (1999) maintains multiple versions
of user files, but not all versions of all files.
– This requires a retention policy.
Elephant involves the user in the
retention/reclamation decisions. This means:
– Less protection from user mistakes.
– A retention policy that might be better suited to the
users’ needs.
Elephant keeps a complete history of a file
over a short period of time (one hour to one
week), but keeps the landmark versions of
each file forever.
Elephant's Main Concepts
Storage reclamation is separated from file
write and delete.
Files have a variety of retention policies.
Policies are specified by the user, but
implemented by the system.
Undo requires complete history for a limited
period of time, but long-term histories should
not retain all versions.
The file system assists the user in deciding
what versions to retain in the long-term
history.
Landmark Versions
Elephant detects landmark versions by
looking at the timeline of updates to the file.
– Can identify groups of updates separated by long
periods of stability.
– Last versions of each group of updates are
assumed to be landmark versions.
A user's ability to recognize the landmark
versions of a file degrades with time.
– Thus, landmark versions are chosen
automatically by Elephant.
– Even so, the user can manually mark any
version as a landmark version.
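The grouping heuristic can be sketched in a few lines. The slides do not give Elephant's actual thresholds, so the stability_gap parameter and the function name are assumptions made for illustration only.

```python
def landmark_versions(timestamps, stability_gap):
    """Pick the last version of each burst of updates.

    timestamps    -- update times in ascending order (e.g. seconds)
    stability_gap -- assumed threshold: a pause longer than this
                     separates two groups of updates
    """
    landmarks = []
    for current, following in zip(timestamps, timestamps[1:]):
        if following - current > stability_gap:
            landmarks.append(current)    # last update before a quiet period
    if timestamps:
        landmarks.append(timestamps[-1])  # last update of the final group
    return landmarks


# Three bursts of edits separated by pauses longer than one hour.
updates = [0, 10, 20, 5000, 5010, 90000]
print(landmark_versions(updates, stability_gap=3600))
```

With these sample times the versions written at 20, 5010 and 90000 end their groups and would be treated as landmarks.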
Elephant's Versioning
The user can set the limit between the recent
history (save any version) and the old history
(save landmark versions).
File versions are named by combining the file
pathname with a creation date and time.
Directories can be versioned as well.
– Allows recovery of deleted files.
Previous versions of a file or a directory are
read-only.
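Naming a version by pathname plus time suggests a simple lookup: given a file's version creation times, find the version that was current at the requested moment. The sketch below assumes that a requested time selects the newest version created at or before it; the function name is hypothetical and the slides do not specify Elephant's naming syntax.

```python
import bisect


def version_at(creation_times, when):
    """Return the creation time of the version that was current at `when`.

    creation_times -- creation times of all versions, in ascending order
    Returns None if the file did not exist yet at `when`.
    """
    index = bisect.bisect_right(creation_times, when)
    return creation_times[index - 1] if index else None


times = [100, 250, 900]
print(version_at(times, 300))   # the version created at 250 was current
print(version_at(times, 50))    # the file did not exist yet
```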
Retention Policies
Keep One: keeps only the latest version of the file.
– Identical to UFS and FAT.
Keep All: keeps all versions of the file.
– Useful for very important files.
Keep Safe: keeps all versions of the file during a
specific period.
– Can be used for log files.
Keep Landmarks: keeps all versions of the file
during a specific period and only landmark versions
after that.
– Useful for common user's files.
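The four policies can be compared with a small selection function. This is a sketch under assumptions: versions are identified by their creation times, the policy names are spelled as Python strings, and the current version is assumed to survive under every policy.

```python
def retained(versions, policy, now, window=0.0, landmarks=frozenset()):
    """Return the versions that a retention policy keeps.

    versions  -- creation times in ascending order
    window    -- length of the "specific period" of full history
    landmarks -- creation times already marked as landmark versions
    """
    if not versions:
        return []
    latest = versions[-1]
    if policy == "keep_one":
        return [latest]
    if policy == "keep_all":
        return list(versions)
    recent = {v for v in versions if now - v <= window}
    if policy == "keep_safe":
        keep = recent
    elif policy == "keep_landmarks":
        keep = recent | (set(versions) & set(landmarks))
    else:
        raise ValueError(f"unknown policy: {policy}")
    keep.add(latest)    # assumption: the current version is never reclaimed
    return sorted(keep)


history = [10, 20, 30, 90, 95]
for name in ("keep_one", "keep_all", "keep_safe", "keep_landmarks"):
    print(name, retained(history, name, now=100, window=15, landmarks={30}))
```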
I-map
The I-map is a new structure that points to
the I-node of the current version and to the
vector of old versions (the I-node log). In
addition, the I-map contains the file's policy.
By default the policy is "keep one".
Common blocks of some versions can be
pointed to by several I-nodes.
– Changes are detected at the block level.
New system calls have been added to handle
the new file system's features.
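A minimal in-memory model of the I-map may help picture the structure. The field names, the string block identifiers and the write_block helper are illustrative assumptions, not Elephant's on-disk layout or its system calls.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    created: float
    blocks: list            # block identifiers; may be shared between versions


@dataclass
class Imap:
    current: Inode                              # I-node of the current version
    log: list = field(default_factory=list)     # I-node log of old versions
    policy: str = "keep one"                    # default policy from the slide


def write_block(imap, index, new_block_id, when):
    """Create a new version that differs from the current one in one block."""
    blocks = list(imap.current.blocks)   # copies block references, not data
    blocks[index] = new_block_id         # only the changed block is new
    imap.log.append(imap.current)        # the old I-node moves to the log
    imap.current = Inode(created=when, blocks=blocks)


imap = Imap(current=Inode(created=1.0, blocks=["b0", "b1", "b2"]))
write_block(imap, 1, "b3", when=2.0)
shared = set(imap.current.blocks) & set(imap.log[0].blocks)
print(sorted(shared))   # blocks pointed to by both I-nodes
```

Only the block that changed gets a new identifier, so the old and new I-nodes share the two untouched blocks.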
Elephant's Performance
open() of an existing file and close() without
flushing run in almost the same time as in traditional UFS.
– close() with flushing is slower.
creat() in Elephant is slower.
– It must allocate an I-map in addition to the I-node
on the disk.
unlink() in Elephant is faster.
– Old blocks are not released.
Elephant consumes much more disk
space.
The Moraine File System
In 2000 Yamamoto suggested
compressing the versioning data.
In addition, his versioning file system
has software engineering tools:
– Moraine has a version viewer tool that runs
in a separate window.
– Moraine can also tell how many lines
and how many functions any version has.
The Version Viewer of Moraine
Rev is the version ID.
+n means n lines were added, while -n means n lines were deleted.
The line bar indicates the size of the file.
The user can put a remark in TAG.
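The +n and -n columns can be approximated with Python's standard difflib module. This only illustrates the idea of counting added and deleted lines between two versions; how Moraine computes its figures is not stated in the slides, and the function name is hypothetical.

```python
import difflib


def line_delta(old_text, new_text):
    """Count lines added and deleted between two versions of a text file."""
    added = deleted = 0
    diff = difflib.ndiff(old_text.splitlines(), new_text.splitlines())
    for line in diff:
        if line.startswith("+ "):
            added += 1
        elif line.startswith("- "):
            deleted += 1
    return added, deleted


added, deleted = line_delta("a\nb\nc", "a\nc\nd")
print(f"+{added} -{deleted}")
```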
CVFS
In 2003 Soules introduced the Comprehensive
Versioning File System (CVFS).
CVFS keeps the versions of all files in a journal-based style.
– CVFS saves the changes; not the new data.
– To recreate an old version, each change is undone
backward through the journal until the desired version
is reached.
– Rather than saving the blocks that have been changed,
CVFS keeps the bytes that have been changed.
CVFS is very efficient in disk space, but inefficient
in recovery time.
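The journal-based idea can be sketched as an undo log at byte granularity. This is a toy model under assumptions, not CVFS's on-disk format: each journal entry stores the offset, the bytes that were overwritten, and the file length before the write, and the function names are hypothetical.

```python
def apply_write(data, journal, offset, new_bytes):
    """Write new_bytes into data (a bytearray) and journal what was replaced."""
    end = offset + len(new_bytes)
    old_bytes = bytes(data[offset:end])          # bytes about to be overwritten
    journal.append((offset, old_bytes, len(data)))
    if end > len(data):
        data.extend(bytes(end - len(data)))      # grow the file with zero bytes
    data[offset:end] = new_bytes


def recover(current, journal, version):
    """Rebuild the file as it was after `version` writes (0 = original)."""
    data = bytearray(current)
    # Undo the newest changes first, walking backward through the journal.
    for offset, old_bytes, old_length in reversed(journal[version:]):
        del data[old_length:]                     # drop bytes the write appended
        data[offset:offset + len(old_bytes)] = old_bytes
    return bytes(data)


data = bytearray(b"hello world")
journal = []
apply_write(data, journal, 0, b"J")
apply_write(data, journal, 6, b"there!")
print(bytes(data))
print(recover(data, journal, 1))
print(recover(data, journal, 0))
```

The journal holds only the replaced bytes, which keeps it small, but recovering a version requires undoing every later change, so older versions take longer to rebuild.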