Preserving containers
EUAN COCHRANE
DIGITAL PRESERVATION MANAGER
YALE UNIVERSITY LIBRARY
How long do we need to preserve data
and software for reproducibility purposes?
Short answer: Forever
Long(er) answer:
It depends on your philosophy of science and your faith in humanity
“non-reproducible single occurrences are of no significance to science”
Karl Popper, The Logic of Scientific Discovery, Routledge, London, 1992,
p. 66.
“No amount of experimentation can ever prove me right; a single
experiment [at any point in time] can prove me wrong.”
Albert Einstein (allegedly)
Will humanity ever not want to have the option to reproduce
computational science from today?
How long will containers
be usable?
http://stackoverflow.com/questions/17934004/how-does-docker-allow-portable-containers-if-the-kernel-libraries-change
NB: Interesting conversation about ABIs here:
https://plus.google.com/115250422803614415116/posts/hMT5kW8LKJk
http://unix.stackexchange.com/questions/47495/oldest-binary-working-on-linux
Linux-dependent containers can
only be guaranteed to be usable
while the operating system is
Windows/Mac containers will be worse off
• Try running old Windows programs in Windows 10, even with the
compatibility layer
• Which version of Windows? Windows RT? Windows IoT? Windows
32-bit?
• Apple completely dropped support for PowerPC software in Mac OS X 10.7
Lion, which removed the Rosetta translation layer
Q: How long will containers
be usable without
intervention?
A: As long as the operating
systems are
So what about the
operating systems?
Challenges to operating system
compatibility over time
• Loss of backwards compatibility of new hardware with old software has
happened many times in the past
• E.g. Mac OS X Panther (version 10.3) requires a PowerPC processor
• Old operating systems often cannot interface with modern hardware
• Raspberry Pi (ARM) operating systems will not run on x86 hardware – will
Raspberry Pi follow Apple and move to x86 processors?
• ARM builds of Microsoft Windows 10 IoT Core will not run on x86 hardware
• Future advances such as quantum computing or 128-bit processors could
remove backwards compatibility with older operating systems
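The architecture mismatches listed above can be detected before anyone tries to run a preserved binary. The following is a minimal sketch, assuming the binaries inside a container are ELF files (the usual format on Linux). It reads the processor architecture recorded in the ELF header and compares it with the host. The function names and the `NATIVE_SUPPORT` table are illustrative assumptions, not part of any tool mentioned in this talk.

```python
import struct
from typing import Tuple

# A few e_machine values defined by the ELF specification.
ELF_MACHINES = {
    3: "x86",
    20: "PowerPC",
    21: "PowerPC64",
    40: "ARM",
    62: "x86-64",
    183: "AArch64",
}

# Hypothetical, simplified table of what each host can run without emulation.
NATIVE_SUPPORT = {
    "x86-64": {"x86-64", "x86"},
    "x86": {"x86"},
    "AArch64": {"AArch64"},
    "ARM": {"ARM"},
}


def elf_architecture(header: bytes) -> Tuple[str, int]:
    """Return (architecture name, word size in bits) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF binary")
    # Byte 4 (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    bits = {1: 32, 2: 64}.get(header[4])
    # Byte 5 (EI_DATA): 1 = little-endian, 2 = big-endian.
    byte_order = {1: "<", 2: ">"}.get(header[5])
    if bits is None or byte_order is None:
        raise ValueError("malformed ELF identification bytes")
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return ELF_MACHINES.get(machine, "unknown ({})".format(machine)), bits


def needs_emulation(header: bytes, host_arch: str) -> bool:
    """True if the binary's architecture is not natively runnable on the host."""
    arch, _ = elf_architecture(header)
    return arch not in NATIVE_SUPPORT.get(host_arch, set())


def read_header(path: str) -> bytes:
    """Read just enough of a file to inspect its ELF header."""
    with open(path, "rb") as f:
        return f.read(20)
```

For example, `needs_emulation(read_header("/bin/ls"), "x86-64")` would report whether that binary needs an emulator on an x86-64 host. A matching architecture is necessary but not sufficient: the kernel and library interfaces discussed earlier still have to be present.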
Summary:
We can’t just put things in
containers, we need to
preserve the containers
How to preserve
containers
Preserve access to the Operating
Systems
1. Preserve the operating systems
2. Maintain and develop emulators
Preserving operating systems is
achievable
• One preserved instance of an operating system
can support limitless numbers of compatible
containers
• We can use existing technologies and methods
to preserve operating systems
(bwFLA) Emulation as a Service - EaaS
• An emulation simplification tool
• Enables remote access to emulated (or virtualized) machines via a web browser
• Simplifies the use of emulation & virtualization in limitless workflows by providing a generic
API to existing emulators
• Enables citation of complex digital objects
• Reduces preservation costs by sharing underlying (e.g. OS) bit streams among emulated machines (EMs)
• Can run remotely or on local hardware
• Can pass hardware connections from host computer to emulated computers when run locally
• http://eaas.uni-freiburg.de/
• Docker package available for local installation; see: http://bw-fla.uni-freiburg.de/wordpress/?p=817
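The cost saving from sharing bit streams can be illustrated with a toy content-addressed store. This is a minimal sketch of the general idea only; it does not describe how EaaS itself stores images, and `BitstreamStore` and `register_machine` are hypothetical names.

```python
import hashlib


class BitstreamStore:
    """Toy content-addressed store: identical bit streams are kept only once."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        """Store data under its SHA-256 digest and return that digest."""
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        return digest

    def stored_bytes(self) -> int:
        """Total number of bytes actually held by the store."""
        return sum(len(blob) for blob in self._blobs.values())


def register_machine(store: BitstreamStore, os_image: bytes, layer: bytes) -> dict:
    """Describe an emulated machine as references to a base OS image and its own layer."""
    return {"base": store.put(os_image), "layer": store.put(layer)}


store = BitstreamStore()
os_image = b"os" * 1000  # stand-in for a preserved operating system image
machine_a = register_machine(store, os_image, b"container A")
machine_b = register_machine(store, os_image, b"container B")
# Both machines point at the same base image, which is stored a single time.
```

Under this scheme each additional machine costs only the size of its own layer, which is the sense in which one preserved operating system can support many containers.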
How might using emulation for preserving containers
be incorporated into scientific workflows?
• During the research process scientists test their containers to ensure they can
run on Emulated Machines (EMs)
• At the point of publication scientists:
• Install (automatically where possible) published packages on a new EM derivative
instance hosted by a digital archive
• Document and configure external data dependencies either on the same EM or as
an associated data source connectable to the preserved EM
• Receive a unique persistent URL for the EM and its networked/associated
“external” dependencies
• Scientists share the URL for their EM with reviewers and the community
• The digital archive preserves the EM over time and provides appropriate access
to it
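The publication step above could be recorded in a small manifest. The sketch below is one hypothetical shape for such a record: the class names, the fields, the hash-derived identifier, and the `archive.example.org` address are all assumptions made for illustration, not an existing archive API.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DataDependency:
    """An external data source the published containers rely on."""
    name: str
    location: str  # "same-em" or the address of an associated data source


@dataclass
class PublishedEM:
    """Hypothetical record of an emulated machine deposited at publication."""
    base_os: str
    packages: list
    dependencies: list = field(default_factory=list)

    def identifier(self) -> str:
        """Derive a stable identifier from the record's canonical JSON form."""
        canonical = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

    def persistent_url(self, archive_base: str) -> str:
        """Build the URL a scientist would share with reviewers."""
        return "{}/em/{}".format(archive_base.rstrip("/"), self.identifier())


record = PublishedEM(
    base_os="linux-base",
    packages=["analysis-pkg 1.0"],
    dependencies=[DataDependency("survey-data", "same-em")],
)
url = record.persistent_url("https://archive.example.org")
```

Deriving the identifier from the record's content means the same deposit always yields the same URL. A real archive would more likely mint identifiers through an established persistent-identifier service.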
Challenges to achieving sustainable
container preservation using emulation
1. Archives of preserved operating systems need to be funded,
established and maintained
2. Instances of emulation services need to be running and accessible by
scientists
3. Emulators need to be preserved
4. Big data makes this more complicated
5. The scientific community needs to buy into this vision
6. External data sources that are dependencies of the containers need to
be able to be preserved, documented, and usefully associated with the
preserved containers via practical workflows
Software Preservation at Yale University
Library
• CLIR Data/Software curation fellow starting in July 2016
• Short/medium term goals:
• Documenting software and producing standard, citable software documentation and identifiers
• Acquiring and preserving all software needed to maintain access to digital content the Library collects &
supports (including operating system dependencies of containers)
• Configuring and preserving “canonical” emulatable machines with standard versions of operating systems
and software packages installed on them (including operating system dependencies of containers)
• Assigning persistent identifiers to emulatable machines to enable citation and sharing
• Sharing configured machines where the installed software is sharable
• Longer term goals:
• Working with software vendors to enable preservation of commercial software
• Establishing an externally accessible instance of the Emulation Service for sharing interactive access to
virtual machines
• Working to establish a middleware organization for connecting software archives, software licenses and
access systems
Thank you
Euan Cochrane
Digital Preservation Manager
Yale University Library
http://twitter.com/euanc
http://eaas.uni-freiburg.de