The NSF Cyberinfrastructure for the 21st Century, CIF21
The NSF Cyberinfrastructure for the
21st Century Program
CIF21
Rob Pennington
Program Director
Office of Cyberinfrastructure
National Science Foundation
1
The Shift Towards a “Sea of Data”
Implications
All science is becoming data-dominated
Experiment, computation, theory
Fundamental questions become focused around data: How to remove boundaries? How to incentivize sharing?
Fourth paradigm
How do we attribute credit for this new publication form? How are data peer reviewed? What is a publication in the modern data-rich world?
Classes of data
Collections, observations, experiments, simulations
Software
Publications
Totally new methodologies
Algorithms, mathematics, culture
Data become the medium for multidisciplinarity, communication, publication…science
2
Scientific Data Challenges
[Figure: scientific data sources charted for 2012 and 2020. Axis labels: Volume (bytes per day, from gigabytes through terabytes and petabytes to exabytes), Distribution, Data Access. Labeled sources: Square Kilometer Array; Climate, Environment; Genomics; TeraGrid, Blue Waters; LHC; LSST; DataNet; many smaller datasets…]
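To give a feel for the volume scale on this chart, the short sketch below converts a daily data volume into the average network rate needed to move it within the same day. It is only back-of-the-envelope arithmetic and not part of the original slides. It assumes decimal (SI) prefixes, and the names `SI_BYTES` and `sustained_gbps` are made up for this illustration.

```python
# Rough conversion from a daily data volume to the sustained network
# throughput needed to move it. Assumption: decimal (SI) prefixes.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Hypothetical lookup table for the prefixes shown on the chart.
SI_BYTES = {
    "Giga": 10**9,
    "Tera": 10**12,
    "Peta": 10**15,
    "Exa": 10**18,
}


def sustained_gbps(bytes_per_day: float) -> float:
    """Return the average rate, in gigabits per second, for a daily volume."""
    bits_per_second = bytes_per_day * 8 / SECONDS_PER_DAY
    return bits_per_second / 1e9


if __name__ == "__main__":
    for prefix, size in SI_BYTES.items():
        rate = sustained_gbps(size)
        print(f"1 {prefix}byte/day needs about {rate:.6g} Gb/s sustained")
```

Under these assumptions, one petabyte per day works out to roughly 93 Gb/s of sustained throughput, which is one way to see why end-to-end network performance matters as volumes grow.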
3
CIF21 and Transforming Research
Grand Challenges: EarthCube, Understanding the Phenome, Clean Energy, Climate prediction, Social networking, Complex networks, Health records, Cybersecurity, Matter-by-design, Disaster recovery, etc.
Science, innovation, discovery, economic competitiveness
CIF21: Networks; Expertise, research; Compute, Modeling; Communities; Sea of Data; Analytic Tools; Software
Multi-disciplinary & multi-scale integration
4
NSF CIF21 Major Areas
Organizations: Universities, schools; Government labs, agencies; Research and Medical Centers; Libraries, Museums; Virtual Organizations; Communities
Expertise: Research and Scholarship; Education; Learning and Workforce Development; Interoperability and operations; Cyberscience
Discovery, Collaboration, Education
Computational Resources: Supercomputers; Clouds, Grids, Clusters; Visualization; Compute services; Data Centers
Advanced Computational Infrastructure
Software: Applications, middleware; Software development and support; Cybersecurity (access, authorization, authentication)
Scientific Instruments: Large Facilities, MREFCs, telescopes; Colliders, shake tables; Sensor Arrays (ocean, environment, weather, buildings, climate, etc.)
Data: Databases, Data repositories; Collections and Libraries; Data Access, storage, navigation, management, mining tools, curation, privacy
Data Infrastructure Program
Networking: Campus, national, international networks; Research and experimental networks; End-to-end throughput; Cybersecurity
Broad Principles to Lead CIF21
Builds national infrastructure for S&E
Leverages common methods, approaches,
and applications – focus on interoperability
Catalyzes other CI investments across NSF
Provides focus and is a vehicle for coordinating
efforts and programs
Based upon a shared governance model
involving all parts of NSF
Managed as a coherent program by OCI
Spiral development methodology
6
Evolution of CIF21 and NSF Data Programs
On-going input
NSB
ACCI Task Force
DataNet Awards
NSF CIF21 Data Programs
Science & Engineering Research + Cyberinfrastructure Community Input
7
Data Related Context
National Science and Technology Council (NSTC)
http://www.whitehouse.gov/blog/2012/01/30/your-comments-access-federally-funded-scientific-research-results
Networking and Information Technology Research
and Development (NITRD)
http://www.nitrd.gov/subcommittee/bigdata.aspx
National Science Board Data Policies Task Force
http://www.nsf.gov/nsb/committees/tskforce_dp.jsp
Advisory Committee for Cyberinfrastructure (ACCI)
www.nsf.gov/od/oci/taskforces/
8
NSTC RFIs for Public Comment Context
Two Requests for Information (RFIs) – Nov 2011
Public Access to Digital Data Resulting from Federally
Funded Scientific Research
• Preservation, Discovery and Access
• Standards for Interoperability, Re-Use and Re-Purposing
RFI for Scholarly Publications
http://www.whitehouse.gov/blog/2011/11/07/request-information-public-access-digital-data-and-scientific-publications
Comment period closed on 12 Jan 2012
Digital Data: 118 responses
Scholarly Publications: 377 responses
Individual and institutional responses
9
NSB Data Policy Task Force - Context
Dec 2011: NSB 11-79 Recommendations
http://www.nsf.gov/nsb/publications/2011/nsb1124.pdf
#1: Provide leadership … in the development and
implementation of digital research data policies ...
#2: … require grantees to make both the data and the methods
and techniques used in the creation and analysis of the data
accessible … Data should be shared using persistent electronic
identifiers …
#3: Continue to expand the support of computational and data-enabled science and engineering …
#4: Convene a panel … to explore and develop a range of viable
long-term business models…
#5: Further the expansion of sustainable data management,
including preservation and curation of pre-existing and newly
generated long-lived data …
10
NSF Advisory Committee for
Cyberinfrastructure (ACCI)
Task Force - Context
Grand Challenges, HPC,
Data/Viz, Software, Campus
Bridging, Cyberlearning
More than 25 workshops and
Birds of a Feather sessions and
more than 1300 people involved
Final reports:
http://www.nsf.gov/od/oci/taskforces/
11
ACCI Data Task Force Recommendations
Recognize data infrastructure and services as
essential research assets fundamental to today’s
science and as long-term investments in national
prosperity
Create new citation models in which data and
software tool providers are credited with their
data contributions
Develop and publish realistic cost models to
underpin institutional/national business plans for
research repositories/data services
Identify and share best-practices for the critical
areas of data management
12
CIF21 and Data Enabled Science
Provide critical tools and services for data
mining, integration, analysis, modeling and
visualization.
Overcome barriers to scaling, synthesis, and
interoperability to promote effective use of
large scale, shared data resources.
Strategic investments that concentrate tools,
resources and expertise in support of
compelling grand challenge science
questions.
13
Data Infrastructure: A Multi-tiered
and Multi-Disciplinary Landscape
Data-enabled Science
Observational Communities
Modeling and Simulation Communities
Population, Climate, Environment Communities
Data Content
Data Storage
DataNet supported
14
CIF21: Data-Enabled Science
Data-intensive Science Program (knowledge)
Intensive disciplinary efforts, multi-disciplinary
discovery and innovation
Data Analysis and Tools Program (information)
Data mining, manipulation, modeling, visualization,
decision-making systems
Data Services Program (data)
Provide reliable digital preservation, access,
integration, and analysis capabilities for science
and/or engineering data over a decades-long timeline
Dumped On by Data: Scientists Say a Deluge Is Drowning Research
15
Data Curation
Sustainable, community-based networks for
management of critical scientific data resources in a
life-cycle context.
Overcome challenges of culture change, policy
development and implementation, sustainable
operations, quality and usability control.
Strategic awards that address heterogeneity in
formats, complexity, semantics of data collections
that are valued by science communities of significant
breadth.
Operate as a network of data services that promote
interoperability, multidisciplinarity, and scalability.
16
Data Storage
National storage infrastructure for scientific data
Accommodate scale and heterogeneity through robust,
open, and broadly accepted standards
Business model implemented with governmental,
academic, nonprofit, and commercial stakeholders
Make strategic investments that:
Leverage existing resources in XSEDE, commercial
clouds, federal data centers
Meet growing capacity needs at optimum cost
Provide coordinating and integrative functions for
integrity, access control, availability, persistence
Catalyze a national data infrastructure
17
Cross Cutting Challenges
Balancing research into next-generation
infrastructure with operation & maintenance of
current capacity
Sustainability through technical design,
development of business models, and integration
with the research cycle
Integration
Vertical – Linking low-level bit storage infrastructure
to data collections, and to applications
Horizontal – Achieving connectivity and interoperability
between activities that vary in scale, disciplinarity, and
funding source
18
Summary
CIF21 is focused on effective ways to
approach and respond to the challenges
Critical concepts and goals
Realistic and innovative
Spiral process with strong, on-going feedback
Structure for longevity
Scalable open inclusive governance
Long term business models
International collaborations and programs
19