Nirav-july - Federation of Earth Science Information Partners


The iPlant Collaborative
A CYBERINFRASTRUCTURE FOR THE
PLANT SCIENCES
Richard Jorgensen, P.I.
UNIVERSITY OF ARIZONA
Nirav Merchant
University of Arizona
2009
ESIP July 2009 : Data Preservation and Stewardship Track
NSF Program Goals
“to create a new type of organization—a cyberinfrastructure collaborative for
plant science—to enable new conceptual advances through integrative,
computational thinking”
“to address an evolving array of grand challenge questions in plant science: the
driving force and organizing principles for the collaborative”
www.iplantcollaborative.org
The iPlant Collaborative
Single national project funded by NSF
• $50M for 5 years (with possibility to renew for an additional 5 years)
• Led by the University of Arizona (BIO5 Institute)
• Partners:
  • Cold Spring Harbor Laboratory
  • Texas Advanced Computing Center
  • University of North Carolina, Wilmington
  • Purdue University
• Community-appointed Board of Directors
  • Chair: Robert Last, Michigan State University
• Grand Challenge driven
iPlant Objectives
Foster computational thinking in biology: via biologists working with computer
scientists, to integrate computational approaches and human creativity
Be by, for, and of the community: Enable the research community to identify the
major problems in plant science, then develop & pursue needed CI solutions
Stay at the frontier of plant biology and computer science: Track and implement
latest technologies, concepts and strategies in relevant fields
iPlant Education and Outreach
Prepare the next generation:
• Train scientists to use the cyberinfrastructure
• Promote innovative, interdisciplinary research
• Provide diverse mechanisms for training and education
• Engage everyone from K-12 students to career scientists
Grand Challenge Proposals
• Associating genotypes with phenotypes (two proposals)
Plant responses to the environment; adaptation to a changing climate
• A Tree of Life for all plants
Up to 500,000 species
• Photosynthetic efficiency and the use of CO2
Evolution of C3 (more ancient) to C4 (more efficient; requires more energy)
• Modeling plant development
Computational morphodynamics
• Integration of taxonomic and ecological information
“What grows where and why”; strong geospatial component
Selected Grand Challenges
• Phylogenetic relationships among species (iPToL)
• Massively large (500,000) species trees
• Ancestral character state reconstruction
• Gene/species tree reconciliation
• Genotype to Phenotype (Gen2Phen)
• Phenological quantitative modeling for Arabidopsis
• The effect of multi-scale, abiotic and biotic stress factors on plant phenotypes
• Effect of climate factors on plant diversity and ecological organization
A Unified Cyberinfrastructure
• A constantly evolving software environment
• A single “core” architecture
• An ever-growing collection of integrated tools and data sets, many of them external
• Transparent leveraging of a national computational resource infrastructure (clouds,
the TeraGrid, etc.)
• Customizable Discovery Environments (DE) for research
• Cross-tool integration
• Utilizing common core
• Several DEs may exist to address a single Grand Challenge
From Genes to Genomes
[Cartoon: Rex Babin, Sacramento Bee]
Reality: A lot more to be done
[Cartoon: Drew Sheneman, The Newark Star Ledger]
Putting it all to work
[Cartoon: Wayne Stayskal, The Tampa Tribune]
BioInformation :: Data Flavors
• Sequences
• Structures
• Images
• Video
• Audio
• Pathways (graphs)
• Traces
• Combinations (e.g., video and traces)
• And much more …
Life scientist :: Data Wrestler
• Volume of data is increasing
• Resolution of data is increasing
• Number of data repositories is increasing
• Analysis options are ever increasing
• Demands to share data and collaborate (team science)
Do you know where your data is?
(And your collaborators' data!)
Got data?
If not... wait for the $1000 whole genome
Nucleic Acids Research 2008 Webservers
94 Webservers
Nucleic Acids Research 2009 Databases
1170 Databases
Get me connected
Get me connected... do no evil
Confluence of omics
• Genomics
• Proteomics
• Modeling
• Pathways
• Systems Biology
• Pharmacogenomics
• Clinical
• Functional Genomics
• Metabolomics
The paradigm shift
• Classic paradigm: You produce data, analyze, interpret (end to end)
• Conventional paradigm: Consortia/centers produce data and you consume it
• New paradigm: Consortia/centers have produced data and are creating “cyberinfrastructure” to tackle the “grand challenge”
Data Preservation
Pressing data issues:
• Typical data management bottlenecks
• Understanding data and information life cycle management for each grand challenge
• Can we create a unified approach?
• Can the solution scale with complexity?
iPlant Efforts
Technical approaches
• Leveraging virtualized infrastructure (snapshot/hibernate)
• Using checkpoint/restart for migration of data and active workflows
• Making the transition to and across HPC resources more transparent
• Begging and borrowing from exemplar open source projects (and contributing back)
• Learning best practices from other communities
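To make the checkpoint/restart idea concrete, here is a minimal sketch at the workflow level. It is not iPlant's implementation, and it does not show process-level checkpointing or VM snapshots. The function `run_with_checkpoints`, the step functions, and the JSON checkpoint layout are all hypothetical names chosen for illustration. The assumption is that a workflow is a list of named steps whose state can be serialized to JSON, so that a run interrupted on one machine can resume on another from the same checkpoint file.

```python
import json
import os


def run_with_checkpoints(steps, state, checkpoint_path):
    """Run named steps in order, saving progress after each one.

    `steps` is a list of (name, function) pairs; each function takes the
    state dict and returns an updated state dict. If a checkpoint file
    already exists, steps recorded there as done are skipped.
    """
    done = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            saved = json.load(f)
        state, done = saved["state"], saved["done"]

    for name, func in steps:
        if name in done:
            continue  # completed before the interruption
        state = func(state)
        done.append(name)
        # Write to a temporary file and then rename it, so an interruption
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"state": state, "done": done}, f)
        os.replace(tmp_path, checkpoint_path)
    return state


# Hypothetical workflow steps, for illustration only.
def load_reads(state):
    return {**state, "reads": 100}


def count_hits(state):
    return {**state, "hits": state["reads"] // 4}
```

Calling `run_with_checkpoints([("load", load_reads), ("analyze", count_hits)], {}, "ckpt.json")` a second time after an interruption skips whatever steps the checkpoint file already lists. Copying that file to another resource is what would let the workflow migrate.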
Acknowledgements
University of Arizona
Steve Goff
Martha Narro
Karla Gendler
Sudha Ram
Greg Andrews
Sue Brown
Vicki Chandler
Nirav Merchant
Steve Rounsley
and others
Cold Spring Harbor Laboratory
Lincoln Stein
Matt Vaughn
Sheldon McKay
Doreen Ware
Dave Micklos
Rob Martienssen
East Main Educational Consulting
Barbara Heath
Arizona St Univ / Texas Adv Comp Ctr
Dan Stanzione
Purdue University
Rebecca Doerge
University of North Carolina-Wilmington
Ann Stapleton
Simple Semantic Architecture and Protocol
Gary Schiltz, Greg May and alumni at NCGR
Chris Town and alumni at JCVI
Lincoln Stein, Doreen Ware, and Shuly Avraham at CSHL
Rex Nelson, David Grant, and alumni at ISU
We gratefully acknowledge NSF funding 0516487 to DDG and USDA ARS funding 3625-21000-038-01 to Greg May