The NSF Cyberinfrastructure for the 21st Century, CIF21


The NSF Cyberinfrastructure for the
21st Century Program
CIF21
Rob Pennington
Program Director
Office of Cyberinfrastructure
National Science Foundation
The Shift Towards a “Sea of Data”
Implications

All science is becoming data-dominated
 Experiment, computation, theory
 Fourth paradigm
 Classes of data
 Collections, observations, experiments, simulations
 Software
 Publications

Fundamental questions become focused around data: How to remove boundaries? How to incentivize sharing?

How do we attribute credit for this new publication form? How are data peer reviewed? What is a publication in the modern data-rich world?

Totally new methodologies
 Algorithms, mathematics, culture

Data become the medium for
 Multidisciplinarity, communication, publication…science
Scientific Data Challenges
[Figure: scientific data sources for 2012 and 2020. Axes: Volume (bytes per day, on a scale of gigabytes, terabytes, petabytes, exabytes), Distribution, Data Access. Labels: Square Kilometer Array; Climate, Environment; Genomics; TeraGrid, Blue Waters; LHC; LSST; DataNet; Many smaller datasets…]
CIF21 and Transforming Research
Grand Challenges: EarthCube, Understanding the Phenome, Clean Energy, Climate prediction, Social networking, Complex networks, Health records, Cybersecurity, Matter-by-design, Disaster recovery, etc.
Science, innovation, discovery, economic competitiveness
CIF21: Networks; Expertise, research; Compute, Modeling; Communities; Sea of Data; Analytic Tools; Software
Multi-disciplinary & multi-scale integration
NSF CIF21 Major Areas
Organizations: Universities, schools; Government labs, agencies; Research and Medical Centers; Libraries, Museums; Virtual Organizations; Communities
Expertise: Research and Scholarship; Education; Learning and Workforce Development; Interoperability and operations; Cyberscience
Discovery, Collaboration, Education
Computational Resources: Supercomputers; Clouds, Grids, Clusters; Visualization; Compute services; Data Centers
Advanced Computational Infrastructure
Networking: Campus, national, international networks; Research and experimental networks; End-to-end throughput; Cybersecurity
Software: Applications, middleware; Software development and support; Cybersecurity: access, authorization, authentication
Scientific Instruments: Large Facilities, MREFCs, telescopes; Colliders, shake tables; Sensor Arrays (ocean, environment, weather, buildings, climate, etc.)
Data: Databases, Data repositories; Collections and Libraries; Data Access: storage, navigation, management, mining tools, curation, privacy
Data Infrastructure Program
Broad Principles to Lead CIF21
Builds national infrastructure for S&E
 Leverages common methods, approaches,
and applications – focus on interoperability
 Catalyzes other CI investments across NSF

 Provides focus and is a vehicle for coordinating
efforts and programs

Based upon a shared governance model
involving all parts of NSF
 Managed as a coherent program by OCI

Spiral development methodology
Evolution of CIF21 and NSF Data Programs
[Figure: on-going input feeding the NSF CIF21 Data Programs. Labels: NSB; ACCI Task Force; DataNet Awards; Science & Engineering Research + Cyberinfrastructure Community Input.]
Data Related Context

National Science and Technology Council (NSTC)
 http://www.whitehouse.gov/blog/2012/01/30/yourcomments-access-federally-funded-scientific-researchresults

Networking and Information Technology Research
and Development (NITRD)
 http://www.nitrd.gov/subcommittee/bigdata.aspx

National Science Board Data Policies Task Force
 http://www.nsf.gov/nsb/committees/tskforce_dp.jsp

Advisory Committee for Cyberinfrastructure (ACCI)
 www.nsf.gov/od/oci/taskforces/
NSTC RFIs for Public Comment Context

Two Requests for Information (RFIs) – Nov 2011
 Public Access to Digital Data Resulting from Federally
Funded Scientific Research
• Preservation, Discovery and Access
• Standards for Interoperability, Re-Use and Re-Purposing
 RFI for Scholarly Publications
 http://www.whitehouse.gov/blog/2011/11/07/requestinformation-public-access-digital-data-and-scientificpublications

Comment period closed on 12 Jan 2012
 Digital Data: 118 responses
 Scholarly Publications: 377 responses
 Individual and institutional responses
NSB Data Policy Task Force - Context

Dec 2011: NSB 11-79 Recommendations

http://www.nsf.gov/nsb/publications/2011/nsb1124.pdf

#1: Provide leadership … in the development and
implementation of digital research data policies ...
#2: … require grantees to make both the data and the methods
and techniques used in the creation and analysis of the data
accessible … Data should be shared using persistent electronic
identifiers …
#3: Continue to expand the support of computational and data-enabled science and engineering …
#4: Convene a panel … to explore and develop a range of viable
long-term business models…
#5: Further the expansion of sustainable data management,
including preservation and curation of pre-existing and newly
generated long-lived data …




NSF Advisory Committee for
Cyberinfrastructure (ACCI)
Task Force - Context



Grand Challenges, HPC, Data/Viz, Software, Campus Bridging, Cyberlearning

More than 25 workshops and Birds of a Feather sessions and more than 1300 people involved

Final reports: http://www.nsf.gov/od/oci/taskforces/

[Figure: task force report covers: Data and Viz; Campus Bridging; High Performance Computing; Grand Challenges; Cyberlearning; Software]
ACCI Data Task Force Recommendations




Recognize data infrastructure and services as essential research assets fundamental to today’s science and as long-term investments in national prosperity

Create new citation models in which data and software tool providers are credited with their data contributions

Develop and publish realistic cost models to underpin institutional/national business plans for research repositories/data services

Identify and share best practices for the critical areas of data management
CIF21 and Data Enabled Science
Provide critical tools and services for data
mining, integration, analysis, modeling and
visualization.
 Overcome barriers to scaling, synthesis, and
interoperability to promote effective use of
large scale, shared data resources.
 Strategic investments that concentrate tools,
resources and expertise in support of
compelling grand challenge science
questions.

Data Infrastructure: A Multi-tiered
and Multi-Disciplinary Landscape
[Figure: tiers of the data infrastructure landscape. Labels: Data-enabled Science; Observational Communities; Modeling and Simulation Communities; Population, Climate, Environment Communities; Data Content; Data Storage; DataNet supported.]
CIF21: Data-Enabled Science

Data-intensive Science Program (knowledge)
 Intensive disciplinary efforts, multi-disciplinary
discovery and innovation

Data Analysis and Tools Program (information)
 Data mining, manipulation, modeling, visualization,
decision-making systems

Data Services Program (data)
 Provide reliable digital preservation, access,
integration, and analysis capabilities for science
and/or engineering data over a decades-long timeline
Dumped On by Data: Scientists Say a Deluge Is Drowning Research
Data Curation




Sustainable, community-based networks for management of critical scientific data resources in a life-cycle context.

Overcome challenges of culture change, policy development and implementation, sustainable operations, quality and usability control.

Strategic awards that address heterogeneity in formats, complexity, semantics of data collections that are valued by science communities of significant breadth.

Operate as a network of data services that promote interoperability, multidisciplinarity, and scalability.
Data Storage

National storage infrastructure for scientific data
 Accommodate scale and heterogeneity through robust,
open, and broadly accepted standards
 Business model implemented with governmental,
academic, non profit, and commercial stakeholders

Make strategic investments that:
 Leverage existing resources in XSEDE, commercial
clouds, federal data centers
 Meet growing capacity needs at optimum cost
 Provide coordinating and integrative functions for
integrity, access control, availability, persistence

Catalyze a national data infrastructure
Cross Cutting Challenges
Balancing Research into Next Generation
infrastructure with operation & maintenance of
current capacity
 Sustainability through technical design,
development of business models, and integration
with the research cycle
 Integration

 Vertical – Linking low-level bit storage infrastructure
to data collections, and to applications
 Horizontal– Achieving connectivity and interoperability
between activities that vary in scale, disciplinarity, and
funding source
Summary

CIF21 is focused on effective ways to
approach and respond to the challenges
 Critical concepts and goals
 Realistic and innovative
 Spiral process with strong, on-going feedback

Structure for longevity
 Scalable open inclusive governance
 Long term business models
 International collaborations and programs