Pascal Calarco

Ontario Library Research Cloud: Building A
Province-Wide Research Cloud for Ontario’s
Academic Libraries
Pascal V. Calarco, University of Waterloo
IGeLU 2015 September 3, 2015
Agenda
• OCUL Overview
• Problem we’re trying to solve
• Funding and project plan
• Technology overview
• Some likely use cases
• Next steps
• Q&A
Ontario Council of University Libraries
• 21 member libraries
• 420,000 students
• Collaboration in:
– Shared electronic collections
– Planning & assessment
– Digital library services & infrastructure
Libraries’ Growing Storage Needs
• Digitized physical materials: books, journals, film, audio
– Reformatting to conserve the original, e.g. acidic paper such as newspapers
– Reformatting to increase access, e.g. rare materials
– Format migration to preserve content, e.g. 16mm film
Libraries’ Growing Storage Needs
• Born digital scholarly content for long term stewardship:
– E-Theses and supplemental material
– Scholarship: Working papers, Pre-prints, Open
Access
– Research data: numeric, geospatial, image, audio
– Websites and digital ephemera of academic interest
– Donated electronic materials for Special Collections
• John English’s hard drives of personal email
correspondence, drafts and other materials
OCUL Storage Survey (2013)
• 10 of 21 institutions responded: 6 with more than 10k FTE, 4 with fewer than 10k
• Preservation & Access Needs:
– 80%: digitized print content
– 80%: faculty publications
– 60%: donated digital content
– 50%: research data
– 50%: GIS data
– 40%: purchased digital resources
– 20%: corporate records
– 20%: E-Theses
OCUL Survey: Storage Needs
• Current storage requirements: 100 GB to 30 TB; total of respondents: 58.5 TB
• Expected storage needs, next 2-3 years:
– 20% 100TB+
– 40% 10TB-100TB
– 20% <10TB
– 250TB total for all 10 institutions
OCUL Survey: Storage Provisioning
• 80% partner with campus IT often/mostly
• 60% provision in-house often/mostly
• 40% provision with other partner libraries
often/mostly
• 30% provision with commercial services
often/mostly
OCUL Storage Survey: Top Features
(2013)
• Large storage on demand
• Low cost
• Canadian-based hosting
• Transparent pricing
• Archival quality storage
Storage Architectures and Cost Tiers
Cloud storage options
• Amazon S3/Glacier: $500k/year for the current 250TB of ScholarsPortal (SP) content
– $2000/TB per year, recurring
• DuraCloud: Amazon reseller, adding preservation &
mgmt. tools
– $1000-$1500/TB per year, recurring
• Private Cloud: OpenStack
– $280-$350/TB per year, amortized over three years
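The per-terabyte figures above can be turned into yearly totals with simple arithmetic. A minimal sketch in Python, assuming the 250TB volume quoted for Amazon applies to all three options and taking the low and high ends of each quoted price range; the names `RATES_PER_TB` and `yearly_cost` are illustrative only.

```python
# Yearly cost for 250 TB at the per-TB rates quoted on the slide.
# Rates are (low, high) dollars per TB per year; names are illustrative.
VOLUME_TB = 250

RATES_PER_TB = {
    "Amazon S3/Glacier": (2000, 2000),
    "DuraCloud": (1000, 1500),
    "Private cloud (OpenStack)": (280, 350),
}


def yearly_cost(volume_tb, rate_range):
    """Return the (low, high) yearly cost in dollars for a given volume."""
    low, high = rate_range
    return volume_tb * low, volume_tb * high


if __name__ == "__main__":
    for option, rates in RATES_PER_TB.items():
        low, high = yearly_cost(VOLUME_TB, rates)
        print(f"{option}: ${low:,} to ${high:,} per year")
```

Note that the private cloud figure is amortized over three years, while the two commercial figures recur every year, so the comparison is only approximate.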
MTCU Proposal and PIF funding
• 2013/2014: OCUL was awarded $1.2 million
Productivity and Innovation Fund (PIF) funding
for OLRC startup
• 50TB per founding partner institution
• Triplestore preservation: content copies at three
different co-located nodes for redundancy, error
correction
• Text mining portal for stored ScholarsPortal
content
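Keeping three copies allows a damaged copy to be detected and repaired by comparing checksums. The sketch below illustrates that idea only; it is not OpenStack's or OLRC's actual code, and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def sha256(data: bytes) -> str:
    """Hex digest used to compare copies without trusting any single one."""
    return hashlib.sha256(data).hexdigest()


def repair_copies(copies):
    """Replace any copy that disagrees with the majority version.

    `copies` holds the bytes stored at each node (three in the OLRC design).
    Raises ValueError when no two copies agree, since there is then no
    majority to repair from.
    """
    digests = [sha256(c) for c in copies]
    winner, votes = Counter(digests).most_common(1)[0]
    if votes < 2:
        raise ValueError("no two copies agree; cannot repair automatically")
    good = copies[digests.index(winner)]
    return [c if d == winner else good for c, d in zip(copies, digests)]


if __name__ == "__main__":
    nodes = [b"thesis.pdf bytes", b"thesis.pdf bytEs", b"thesis.pdf bytes"]
    print(repair_copies(nodes))
```

With only two copies, a mismatch shows that something is wrong but not which copy is correct; a third copy provides the tie-breaker.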
Hardware configuration
• Dell selected as hardware vendor.
• Head units: Dell PowerEdge R720xd server populated with
two 2.8GHz Xeon processors, 256GB of RAM, and two
200GB SSD drives which will be used to run the operating
system and the OpenStack software. Each head unit also
contains twelve 4TB SAS drives for an internal storage
capacity of 48TB.
• Storage shelves: Dell PowerVault MD1200 storage
shelves, directly attached to the server, with each shelf
containing twelve 4TB SAS drives, with a total capacity per
shelf of 48TB.
• Total initial capacity: 3.6 PB raw; with triple redundancy, 1.2 PB net
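The capacity figures on this slide follow from a few multiplications. A short sketch, assuming decimal units (1 PB = 1000 TB); the count of 48TB enclosures is derived here for illustration and does not appear in the slides.

```python
# Capacity arithmetic for the hardware described above.
DRIVES_PER_ENCLOSURE = 12   # per head unit or storage shelf
DRIVE_TB = 4                # 4TB SAS drives
RAW_TOTAL_TB = 3600         # 3.6 PB raw, assuming 1 PB = 1000 TB
REPLICAS = 3                # triple redundancy


def enclosure_capacity_tb(drives=DRIVES_PER_ENCLOSURE, drive_tb=DRIVE_TB):
    """Raw capacity of one head unit or one storage shelf."""
    return drives * drive_tb


def net_capacity_tb(raw_tb=RAW_TOTAL_TB, replicas=REPLICAS):
    """Usable capacity when every object is stored `replicas` times."""
    return raw_tb // replicas


def enclosures_for(raw_tb=RAW_TOTAL_TB):
    """Derived figure: how many 48TB enclosures add up to the raw total."""
    return raw_tb // enclosure_capacity_tb()


if __name__ == "__main__":
    print(enclosure_capacity_tb(), "TB per enclosure")
    print(net_capacity_tb(), "TB net")
    print(enclosures_for(), "enclosures (derived, not stated in the slides)")
```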
OpenStack
• An open source cloud computing platform, primarily deployed as an Infrastructure-as-a-Service (IaaS) platform
• Swift – OpenStack object store, store and retrieve data via API
• Integrate OpenStack/Swift to Digital Repository architectures
• Develop Dropbox-like cloud storage web interface
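Storing and retrieving data through Swift's API comes down to HTTP requests against an object URL of the form account/container/object, authenticated with an `X-Auth-Token` header. The sketch below only builds such requests with Python's standard library and does not send them; the endpoint, account, token, and container names are placeholders, not OLRC values.

```python
from urllib.parse import quote
from urllib.request import Request


def object_url(storage_url, container, name):
    """Join the account storage URL, container, and object name."""
    base = storage_url.rstrip("/")
    return f"{base}/{quote(container, safe='')}/{quote(name)}"


def build_put(storage_url, token, container, name, data):
    """Request that would upload `data` as an object."""
    return Request(
        object_url(storage_url, container, name),
        data=data,
        method="PUT",
        headers={"X-Auth-Token": token},
    )


def build_get(storage_url, token, container, name):
    """Request that would download an object."""
    return Request(
        object_url(storage_url, container, name),
        method="GET",
        headers={"X-Auth-Token": token},
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    url = "https://swift.example.org/v1/AUTH_demo"
    req = build_put(url, "example-token", "theses", "2015/thesis 01.pdf", b"%PDF")
    print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would perform the call against a real endpoint; a production integration would more likely rely on an existing Swift client library.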
Use Cases
• Digital Preservation
• Institutional and Personal Storage
• Repositories
• Research Data Management
• Text mining large volumes of digital textual content for research purposes
Digital Curation
Fedora Commons
• Open source digital object repository that provides the underlying
architecture for Islandora, Hydra, and other digital asset
management systems.
DSpace
• An open source turnkey institutional repository software
for building open access repositories for scholarly and
published digital content.
Archivematica & ICA-AtoM
• An open source digital preservation system designed to
maintain standards-based, long term access to
collections of digital objects.
Dataverse
• An open source web application for publishing, citing,
analyzing and preserving research data.
• Research data management focus
• Access, not preservation
Text Mining
• Potential uses by
researchers in Digital
Humanities:
– Entity recognition
– Parts of speech
analysis
– Topic modeling
– Network analysis
– Visualization
Canadian Text Archive Centre
• Phase 2 development
– Leverage OCUL ScholarsPortal text corpus of books
and journals for academic research
– CTAC Advisory Committee being formed
– Tools and service development for students and
researchers to create worksets of documents from
content in the OLRC
– Bring “analysis to the data”
– June 2015 – May 2016
Current Status & Milestones
• October 2014: integration with Archivematica
• December 2014: integration with Dataverse
• Q1 2015: Storage Nodes finalized; installation of
Waterloo/Guelph/Laurier node
• March 2015: integration with Fedora Commons
• May 2015: Third Hackfest, Text Mining Portal
• June 2015: integration with DSpace
• Fall 2015: Canadian Text Archive Centre Advisory
Committee
Thanks! Questions?
• Pascal Calarco, University of Waterloo
Library [email protected]