Building a Province-Wide Research Cloud for Ontario's Academic Libraries


ONTARIO LIBRARY RESEARCH CLOUD:
BUILDING A PROVINCE-WIDE RESEARCH CLOUD FOR ONTARIO’S ACADEMIC LIBRARIES
Pascal Calarco, University of Waterloo Library
Andrew McAlorum, Information Systems & Technology
watitis.uwaterloo.ca
@watitisconf
#watitis2014
AGENDA
• Problem we’re trying to solve – Pascal
• Funding and project plan – Pascal
• Technology overview – Andrew
• Some likely use cases – Andrew
• Next steps – Pascal
• Q&A
OLRC: Pascal Calarco & Andrew McAlorum
LIBRARIES’ GROWING STORAGE NEEDS
• Digitized physical materials: books, journals, film, audio
  • Reformatting to conserve originals, e.g. acidic paper such as newspapers
  • Reformatting to increase access, e.g. rare materials
  • Format migration to preserve content, e.g. 16mm film
LIBRARIES’ GROWING STORAGE NEEDS
• Born-digital scholarly content for long-term stewardship:
  • E-Theses and supplemental material
  • Scholarship: working papers, pre-prints, open access
  • Research data: numeric, geospatial, image, audio
  • Websites and digital ephemera of academic interest
  • Donated electronic materials for Special Collections
    • John English’s hard drives of personal email correspondence, drafts and other materials
OCUL STORAGE SURVEY (2013)
• 10 of 21 institutions responded: six with more than 10k FTE, four with fewer than 10k
• Preservation & Access Needs:
  80%: digitized print content
  80%: faculty publications
  60%: donated digital content
  50%: research data
  50%: GIS data
  40%: purchased digital resources
  20%: corporate records
  20%: E-Theses
OCUL SURVEY: STORAGE NEEDS
• Current storage requirements: 100GB to 30TB; total across respondents: 58.5TB
• Expected storage needs, next 2-3 years:
  20%: 100TB+
  40%: 10TB-100TB
  20%: >10TB
  250TB total for all 10 institutions
OCUL SURVEY: STORAGE PROVISIONING
• 80% partner with campus IT often/mostly
• 60% provision in-house often/mostly
• 40% provision with other partner libraries often/mostly
• 30% provision with commercial services often/mostly
OCUL STORAGE SURVEY: TOP FEATURES (2013)
• Large storage on demand
• Low cost
• Canadian-based hosting
• Transparent pricing
• Archival quality storage
STORAGE ARCHITECTURES AND COST TIERS
CLOUD OPTIONS
• Amazon S3/Glacier: $500k/year for current 250TB of Scholars Portal (SP) content
  $2000/TB per year, recurring
• DuraCloud: Amazon reseller, adding preservation & management tools
  $1000-$1500/TB per year, recurring
• Private cloud: OpenStack
  $280-$350/TB per year, amortized over three years
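The per-TB prices above can be turned into annual totals for any capacity. The following is a minimal sketch of that arithmetic, using only the figures quoted on the slide; the names `PRICE_PER_TB_YEAR` and `cost_table` are illustrative and not part of the original presentation, and the currency is left unspecified because the slide does not state it.

```python
# Per-TB annual price ranges as quoted on the slide, as (low, high).
# The currency is not stated in the source, so plain numbers are used.
PRICE_PER_TB_YEAR = {
    "Amazon S3/Glacier": (2000, 2000),
    "DuraCloud": (1000, 1500),
    "Private cloud (OpenStack)": (280, 350),
}


def cost_table(capacity_tb):
    """Return {option: (low, high)} annual cost for a capacity given in TB."""
    return {
        name: (capacity_tb * low, capacity_tb * high)
        for name, (low, high) in PRICE_PER_TB_YEAR.items()
    }


if __name__ == "__main__":
    # 250TB is the current Scholars Portal content size given on the slide.
    for name, (low, high) in cost_table(250).items():
        if low == high:
            print(f"{name}: {low:,}/year")
        else:
            print(f"{name}: {low:,} to {high:,}/year")
```

At 250TB this reproduces the $500k/year Amazon figure and gives $70,000 to $87,500 per year for the private cloud option. The private cloud price is already amortized over three years, so it is not directly comparable to a purely recurring fee beyond that period.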
MTCU PROPOSAL AND PIF FUNDING
• 2013/2014: OCUL was awarded $1.2 million in Productivity and Innovation Fund (PIF) funding for OLRC startup
• 50TB per founding partner institution
• Triplestore preservation: content copies at three different co-located nodes for redundancy and error correction
• Text mining portal for stored Scholars Portal content
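The idea of keeping three copies for redundancy and error correction can be illustrated with a toy audit routine: compare checksums of the three copies, treat the majority as correct, and overwrite any copy that disagrees. This is only a sketch of the general principle, not a description of how OLRC or OpenStack implements it; the nodes are modelled as plain Python dictionaries, and `store` and `audit_and_repair` are hypothetical helper names.

```python
import hashlib
from collections import Counter


def sha256(data: bytes) -> str:
    """Hex digest used to compare copies without trusting any single one."""
    return hashlib.sha256(data).hexdigest()


def store(nodes, key, data):
    """Write the same bytes to every node (each node is a dict stand-in)."""
    for node in nodes:
        node[key] = data


def audit_and_repair(nodes, key):
    """Return (good_copy, repaired_count) after fixing minority copies.

    A copy is considered good when its checksum matches the checksum
    held by a majority of nodes. Missing copies count as disagreeing.
    """
    digests = [sha256(node[key]) if key in node else None for node in nodes]
    present = Counter(d for d in digests if d is not None)
    if not present:
        raise KeyError(key)
    majority_digest, votes = present.most_common(1)[0]
    if votes * 2 <= len(nodes):
        raise ValueError("no majority among copies; cannot repair safely")
    good = next(n[key] for n, d in zip(nodes, digests) if d == majority_digest)
    repaired = 0
    for node, digest in zip(nodes, digests):
        if digest != majority_digest:
            node[key] = good  # overwrite the damaged or missing copy
            repaired += 1
    return good, repaired
```

With three nodes, a single corrupted or lost copy can be detected and restored from the other two; if two copies disagree with each other and with the third, the routine refuses to guess.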
OPENSTACK
• An open source cloud computing platform, primarily deployed as an Infrastructure-as-a-Service (IaaS) platform
• Swift: the OpenStack object store; store and retrieve data via API
• Integrate OpenStack/Swift with digital repository architectures
• Develop a Dropbox-like cloud storage web interface
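Swift's API is plain HTTP: an object is written with a PUT to the storage URL followed by the container and object name, read back with a GET on the same URL, and authenticated with an X-Auth-Token header. The sketch below only builds such requests with Python's standard library and does not send them. The host `swift.example.org`, the account, the token, and the container and object names are all placeholders, and `build_put_request` and `build_get_request` are hypothetical helper names.

```python
import urllib.request
from urllib.parse import quote


def object_url(storage_url, container, object_name):
    """Join the account storage URL with a container and object name."""
    # Object names may contain "/" (pseudo-folders); container names may not.
    return f"{storage_url.rstrip('/')}/{quote(container, safe='')}/{quote(object_name)}"


def build_put_request(storage_url, token, container, object_name, data):
    """Prepare (but do not send) a request that would store an object."""
    return urllib.request.Request(
        object_url(storage_url, container, object_name),
        data=data,
        method="PUT",
        headers={
            "X-Auth-Token": token,
            "Content-Type": "application/octet-stream",
        },
    )


def build_get_request(storage_url, token, container, object_name):
    """Prepare (but do not send) a request that would retrieve an object."""
    return urllib.request.Request(
        object_url(storage_url, container, object_name),
        method="GET",
        headers={"X-Auth-Token": token},
    )


if __name__ == "__main__":
    # Placeholder values; a real deployment supplies its own URL and token.
    req = build_put_request(
        "https://swift.example.org/v1/AUTH_demo",
        "example-token",
        "theses",
        "2014/thesis.pdf",
        b"%PDF-1.4 ...",
    )
    print(req.get_method(), req.full_url)
    # Sending would be: urllib.request.urlopen(req)
```

Because the interface is just HTTP, repository software or a web front end can use the object store without needing a shared file system.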
USE CASES
• Audience: Librarians, Faculty
• Digital Preservation
• Institutional and Personal Storage
• Repositories
• Research Data Management
• Text mining large volumes of digital textual content for research purposes
DIGITAL CURATION
FEDORA COMMONS
An open source digital object repository that is the underlying architecture behind Islandora, Hydra, and other digital asset management systems.
DSPACE
An open source turnkey institutional repository software for building open access repositories for scholarly and published digital content.
ARCHIVEMATICA
An open source digital preservation system designed to maintain standards-based, long-term access to collections of digital objects.
DATAVERSE
• An open source web application for publishing, citing, analyzing and preserving research data.
• Research data management focus
TEXT MINING
Potential uses by researchers in Digital Humanities:
• Entity recognition
• Part-of-speech analysis
• Topic modeling
• Network analysis
• Visualization
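As a small illustration of the network analysis item, the sketch below builds a term co-occurrence network from a handful of documents: each pair of terms that appears in the same document becomes an edge, weighted by how many documents contain both. It uses only the Python standard library and a tiny hand-picked stopword list; the function names and sample sentences are invented for illustration, and real research use would rely on a proper NLP toolkit.

```python
import re
from collections import Counter
from itertools import combinations

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cooccurrence_edges(documents):
    """Count, for each pair of terms, the documents in which both occur.

    Pairs are stored as alphabetically ordered tuples so that
    ("cloud", "research") and ("research", "cloud") are the same edge.
    """
    edges = Counter()
    for doc in documents:
        terms = sorted(set(tokenize(doc)))
        edges.update(combinations(terms, 2))
    return edges


if __name__ == "__main__":
    docs = [
        "Research data in the cloud",
        "Cloud storage for research",
    ]
    for (a, b), weight in cooccurrence_edges(docs).most_common(3):
        print(a, b, weight)
```

The resulting weighted edge list is the usual input for graph visualization or centrality measures.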
CURRENT STATUS & MILESTONES
• October 2014: integration with Archivematica
• December 2014: integration with DataVerse
• Q1 2015: Storage Nodes finalized; installation of Waterloo/Guelph/Laurier node
• March 2015: integration with Fedora Commons
• May 2015: Third Hackfest, Text Mining Portal
• June 2015: integration with DSpace
THANKS! QUESTIONS?
• Pascal Calarco, uWaterloo Library
[email protected] x38215
• Andrew McAlorum, IST
[email protected] x31135