
The Generation Challenge Programme (GCP) Platform for Crop Research
Richard Bruskiewich
and the rest of …
…The GCP SP4 team and Contributors
Theo van Hintum (WUR), GCP Subprogramme 4 Leader
IRRI-CIMMYT Crop Research Informatics Laboratory
CIP:
University of British Columbia:
Alexis Dereeper
Reinhard Simon
Mark Wilkinson
Matthieu Conte
Edwin Rojas
Brigitte Courtois
ICRISAT:
GSC Bioinformatics Graduate Program, BC Cancer Agency:
CIRAD:
Manuel Ruiz
Graham McLaren
Guy Davenport
Bioversity:
Jayashree Balaji
Thomas Metz
Trushar Shah
Mathieu Rouard
ICARDA:
Martin Senger
Kyle Braak
Tom Hazekamp
Akinnola Akintunde
Ramil Mauleon
Sebastian Ritter
Milko Skofic
NCGR:
Mylah Anacleto
Raj Sood
Andrew Farmer
Michael Jonathan Mendoza
NIAS:
Gary Schiltz
Victor Jun Ulat
Yi Zhang
Masaru Takeya
SCRI:
Arllet Portugal
Sergio Gregorio
Koji Doi
Jennifer Lee
Ryan Alamban
Joseph Hermocilla
Kouji Satoh
David Marshall
Lord Hendrix Barboza
Michael Echavez
Jeffrey Detras
Roque Almodiel
Shoshi Kikuchi
Cornell University:
EMBRAPA:
Terry Casstevens
Kevin Manansala
Marcos Costa
Pankaj Jaiswal
Jeffrey Morales
Natalia Martins
Dave Matthews
Georgios Pappas
ACGT:
Barry Peralta
Samart Wanchana
Rowena Valerio
Supat Thongjuea
Nelzo Ereful
Ayton Meintjes
Jane Morris
Benjamin Good
James Wagner
Overview

Generation Challenge Programme crop
informatics research and development

GCP platform architecture:

Domain model & ontology

Application development framework
Challenge Programme
“I challenge the next generation to use new
scientific tools and techniques to address the
problems that plague the world’s poor”
Dr. Norman Borlaug
http://www.generationcp.org
What is it?



An international research programme established in
2003, projected to last 10 years, and hosted by the
CGIAR with global partners from ARI and NARES
Research Themes Directed to Crop Improvement:

Genomics and comparative biology across species

Characterization of genetic diversity for allele mining

Gene transfer technologies
Five research subprogrammes, one of which is crop
information systems development.
Challenge Programme
John Innes Centre, UK
Wageningen University, Netherlands
Agropolis, France
ICARDA, Syrian Arab Rep.
Bioversity, Italy
CAAS, China
Cornell University, USA
NIAS, Japan
IRRI, Philippines
CIMMYT, Mexico
BioTec, Thailand
WARDA, Côte d’Ivoire
ICAR, India
CIAT, Colombia
EMBRAPA, Brazil
ACGT, South Africa
CIP, Peru
IITA, Nigeria
ICRISAT, India
GCP Research: from Genotype to Phenotype
[Diagram: each subprogramme takes Genetic Resources through a Process to a Product]
SP1: Allelic Mining
Genetic Resources: Genebank germplasm
Process: Genotyping & Phenotyping
Product: Beneficial alleles linked to traits
SP2: Functional Assignment
Genetic Resources: NILs, RILs, mapping populations, mutants
Process: Genomic annotation, forward and reverse genetics, gene arrays/gels
Product: Candidate genes
SP3: Trait Synthesis
Genetic Resources: Advanced breeding lines as vehicles
Process: Marker-aided selection/transformation
Product: Value-added varieties
Integration across Diverse Crop Data
[Diagram: Germplasm has a Genotype; Germplasm has a Phenotype; Environment affects Phenotype]
Germplasm
• Inventory
• Identification (passport)
• Genealogy
Genotype
• Genetic Maps
• Physical Maps
• DNA Sequence
• Functional Annotation
• Molecular Variation (Natural or Induced)
Environment
• Location (GIS)
• Climate
• Day Length
• Ecosystem
• Agronomy
• Stresses
Phenotype
• Anatomical
• Developmental
• Field Performance
• Stress Response
• Physiology
Molecular Expression
• Transcriptome
• Proteome
• Metabolome
Crop Information Systems: the Next

Large, globally distributed consortium

Diverse research requiring a diversity of tools

Large data sets with diverse data types

Many legacy informatics systems and tools

Global data integration required…
Key Issue: Interoperability
Some Basic GCP Research Objectives

Compile a list of germplasm meeting specific
passport data criteria

Compile a list of genetic markers of interest
from genetic and QTL maps

Retrieve genotypes of specified markers, for
specified germplasm

Align gene expression data against QTL
positional evidence to identify candidate gene
loci for specified traits
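The third objective above (genotypes of specified markers, for specified germplasm) can be sketched in a few lines. This is only an illustration over an in-memory table: the function name `retrieve_genotypes`, the identifiers and the allele calls are all invented here, and none of it is GCP platform code.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical genotype calls keyed by (germplasm id, marker id).
# Every identifier and allele value below is invented for illustration.
GENOTYPES: Dict[Tuple[str, str], str] = {
    ("G001", "M1"): "A/A",
    ("G001", "M2"): "A/T",
    ("G002", "M1"): "G/G",
    ("G003", "M2"): "T/T",
}


def retrieve_genotypes(
    germplasm_ids: Iterable[str], marker_ids: Iterable[str]
) -> Dict[Tuple[str, str], str]:
    """Return the recorded calls for each requested germplasm/marker pair.

    Pairs with no recorded call are simply left out of the result.
    """
    markers = list(marker_ids)  # materialise once; it is reused per germplasm
    return {
        (g, m): GENOTYPES[(g, m)]
        for g in germplasm_ids
        for m in markers
        if (g, m) in GENOTYPES
    }
```

Calling `retrieve_genotypes(["G001", "G002"], ["M1"])` selects only the two calls recorded for marker M1 in those two accessions.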
A Generalized GCP Crop Research Integration Work Flow
[Work flow diagram: tools and steps linked through the Generation Challenge Programme Domain Model & Middleware]
Comparative Map & Trait Viewer (NCGR/ISYS):
Get/analyse a genetic map
Germplasm Passport/Phenotype/Genotype Querybuilder:
Find germplasm genotyped with mapped markers
Get genotype & phenotype of germplasm
Comparative (Functional) Genomics Tools:
Get candidate genes in map interval
Select “interesting” candidate genes; get alleles
Get functional information about genes
DIVA-GIS:
Analyse source environment of germplasm
Plot germplasm, genotype and phenotype on geographical maps
Outcome:
Select adapted germplasm with favorable phenotype & alleles for further evaluation
Underlying data sources:
Genetic Map Data Source(s)
Germplasm Data Source(s)
Genomics Data Source(s)
GIS Data Source(s)
GCP Information Platform: User Perspective
An environment that provides improved access
to data and analysis tools
integrated databases and tools
applications
GCP Information Platform – Developers’ Perspective
Data Registry
application layer
middleware
Tapir
MOBY, etc.
internet
local database layer
Generation CP Platform
http://pantheon.generationcp.org
GCP Platform - General Architecture

“Model Driven Architecture” based on
“platform independent” GCP scientific domain
models, parameterized with controlled
vocabulary (“ontology”)

GCP domain models mapped onto platform
specific implementations.

Reference (Java) GCP platform application
programming interface (API)
Semantics of the GCP Model Driven Architecture

GCP is trying to model the meaning (“semantics”) of the
crop research world.

Semantics is found in the domain model at three distinct
but interconnected levels:

System architectural level: general scientific semantics in
terms of high-level object concepts (“object types”) and their
global inter-relationships.

Entity level: attributes and behaviors internal to high-level
object types.

Attribute level: attribute values of objects that range over data
types: simple (e.g. identifiers, numbers), complex (other classes
of entities) or ontology (such as Gene Ontology (GO) terms, for
a gene product).
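The three levels can be made concrete with a small sketch. It assumes hypothetical class names (`Entity`, `Gene`, `GeneProduct`, `OntologyTerm`) chosen for illustration; the actual GCP domain model is specified in UML and implemented in Java, so this shows only the shape of the idea.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class OntologyTerm:
    """Attribute level: a value drawn from a controlled vocabulary."""
    ontology: str   # e.g. "GO"
    accession: str  # e.g. "GO:0008150"
    name: str


# Attribute level: a value is simple, an ontology term, or another entity.
AttributeValue = Union[str, int, float, OntologyTerm, "Entity"]


@dataclass
class Entity:
    """Entity level: attributes internal to a high-level object type."""
    identifier: str
    attributes: Dict[str, AttributeValue] = field(default_factory=dict)


@dataclass
class Gene(Entity):
    """A high-level object type (hypothetical name)."""


@dataclass
class GeneProduct(Entity):
    """Architectural level: a relationship between two object types."""
    encoded_by: Optional[Gene] = None


gene = Gene("gene-1")
product = GeneProduct(
    "protein-1",
    {
        "length": 342,  # simple value
        "process": OntologyTerm("GO", "GO:0008150", "biological_process"),
    },
    encoded_by=gene,
)
```

Here the link from `GeneProduct` to `Gene` stands for the architectural level, the `attributes` map for the entity level, and the mix of a plain number and a GO term for the attribute level.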
Layers of Semantics
[Diagram: Object Model of the Scientific Domain, Parameterized with Ontology]
1. Germplasm has a Phenotype Observable
2. The Phenotype Observable has an Attribute with a Value
3. The Value ranges over the Plant Ontology
GCP Domain Model Specification

High-level object types are specified with Unified
Modeling Language (UML) and associated text
narratives.

Major object classes are represented in the object
model. More specialized object types are specified
by subclassing major object types using ontology.

Reference model is coded in the Eclipse Modeling Framework (EMF),
managed under source code version control, and
automatically compiled into other representations.
http://pantheon.generationcp.org/demeter
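As a rough illustration of compiling one platform-independent model into several representations, the sketch below renders a toy model description as a Java interface and as an XML Schema type. The `MODEL` dictionary, the type maps and both generator functions are assumptions made for this example; this is not how the EMF tool chain works internally.

```python
from typing import Dict

# Hypothetical platform-independent description of one object type.
MODEL: Dict[str, Dict[str, str]] = {
    "Germplasm": {"identifier": "string", "name": "string"},
}

# Per-platform type mappings (illustrative only).
JAVA_TYPES = {"string": "String", "integer": "int"}
XSD_TYPES = {"string": "xs:string", "integer": "xs:int"}


def to_java_interface(name: str, attrs: Dict[str, str]) -> str:
    """Render the object type as Java interface source text."""
    lines = [f"public interface {name} {{"]
    for attr, kind in attrs.items():
        getter = "get" + attr[0].upper() + attr[1:]
        lines.append(f"    {JAVA_TYPES[kind]} {getter}();")
    lines.append("}")
    return "\n".join(lines)


def to_xsd_type(name: str, attrs: Dict[str, str]) -> str:
    """Render the same object type as an XML Schema complex type."""
    lines = [f'<xs:complexType name="{name}">', "  <xs:sequence>"]
    for attr, kind in attrs.items():
        lines.append(
            f'    <xs:element name="{attr}" type="{XSD_TYPES[kind]}"/>'
        )
    lines += ["  </xs:sequence>", "</xs:complexType>"]
    return "\n".join(lines)
```

Both outputs come from the same `MODEL` entry, so a change to the model reaches every platform-specific form without hand editing.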
Scope of GCP Domain Model & Ontology

Core models: generic concepts – identification,
entities, features, organization, data management


Models heavily parameterized by ontology (e.g. entity and
feature “type” attributes)
Scientific models: extend the core models into specific
scientific scopes relevant to GCP:

Germplasm data (including genetic resources passport)

Genomics including genotypes, maps, sequences and
functional annotation.

Phenotype data

Environmental data (including geographical location)
GCP Ontology

Every attribute in the GCP domain model with data
type SimpleOntologyTerm or subclass thereof, is an
integration point for an external ontology.

External public ontology (e.g. GO, PO, SO) reused
when available, and new ontology developed within
GCP to fill gaps.

Ontology consolidated into GCP database based on
GMOD Chado CV tables, indexed within platform
using a GCP formatted identifier (that retains the
source’s identifier).
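One way to index consolidated terms while retaining the source identifier is to wrap the source accession in a platform prefix. The sketch below assumes a hypothetical `gcp:` prefix; the real GCP identifier format is not given here and may well differ.

```python
from typing import Dict

# Hypothetical prefix: the actual GCP identifier format is not specified
# in this document.
GCP_PREFIX = "gcp:"


def to_gcp_id(source_id: str) -> str:
    """Wrap a source accession (e.g. "GO:0008150") in a platform identifier."""
    if not source_id or source_id.startswith(GCP_PREFIX):
        raise ValueError(f"not a source identifier: {source_id!r}")
    return GCP_PREFIX + source_id


def to_source_id(gcp_id: str) -> str:
    """Recover the original accession from a platform identifier."""
    if not gcp_id.startswith(GCP_PREFIX):
        raise ValueError(f"not a GCP identifier: {gcp_id!r}")
    return gcp_id[len(GCP_PREFIX):]


def build_index(terms: Dict[str, str]) -> Dict[str, str]:
    """Index term names by platform identifier."""
    return {to_gcp_id(acc): name for acc, name in terms.items()}


# Two public terms: one from the Gene Ontology, one from the Sequence Ontology.
index = build_index({
    "GO:0008150": "biological_process",
    "SO:0000704": "gene",
})
```

Because the source accession is embedded unchanged, a term found in the platform index can always be traced back to its entry in the external ontology.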
GCP Domain Model Mappings onto Platform Specific Implementations
[Diagram: the GCP Domain Model (UML/EMF) is mapped onto:]
GCP Platform Java Middleware & Applications
SOAP Web Services (BioMOBY, SoapLab, GDPC)
XML Schemata: GCP Data Templates, BioCASE/Tapir
GCP Ontology Database
OWL/RDF Ontology: VPIN/SSWAP.info
http://pantheon.generationcp.org/demeter
Reference GCP Platform API

PantheonBase: a relatively simple core Java
Application Programming Interface (API) for
software integration:

DataSource: query data resources, using simple,
ontology-driven SearchFilter specifications

DataTransformer: computational input/output

DataConsumer: communicate data to viewers
http://pantheon.generationcp.org
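The division of labour among the three interfaces can be sketched as follows. The real PantheonBase API is Java and its signatures are not reproduced here: apart from the interface names, `SearchFilter` and `find()`, every name below (`transform`, `consume`, the concrete classes, the record fields) is an assumption for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List

Record = Dict[str, str]


@dataclass
class SearchFilter:
    """Hypothetical filter: an entity type plus attribute constraints."""
    entity_type: str
    criteria: Dict[str, str] = field(default_factory=dict)


class DataSource(ABC):
    @abstractmethod
    def find(self, search_filter: SearchFilter) -> List[Record]:
        """Query a data resource."""


class DataTransformer(ABC):
    @abstractmethod
    def transform(self, records: List[Record]) -> List[Record]:
        """Compute output records from input records."""


class DataConsumer(ABC):
    @abstractmethod
    def consume(self, records: List[Record]) -> None:
        """Hand records to a viewer."""


class InMemoryGermplasmSource(DataSource):
    def __init__(self, records: List[Record]) -> None:
        self._records = list(records)

    def find(self, search_filter: SearchFilter) -> List[Record]:
        if search_filter.entity_type != "germplasm":
            return []
        wanted = search_filter.criteria.items()
        return [r for r in self._records
                if all(r.get(k) == v for k, v in wanted)]


class SortByName(DataTransformer):
    def transform(self, records: List[Record]) -> List[Record]:
        return sorted(records, key=lambda r: r["name"])


class ListConsumer(DataConsumer):
    """Stands in for a viewer by collecting what it receives."""

    def __init__(self) -> None:
        self.received: List[Record] = []

    def consume(self, records: List[Record]) -> None:
        self.received.extend(records)


source = InMemoryGermplasmSource([
    {"name": "Line B", "country": "PH"},
    {"name": "Line A", "country": "PH"},
    {"name": "Line C", "country": "MX"},
])
viewer = ListConsumer()
hits = source.find(SearchFilter("germplasm", {"country": "PH"}))
viewer.consume(SortByName().transform(hits))
```

Because each stage depends only on the abstract interface, the in-memory source could be swapped for a database-backed or web-service-backed one without touching the transformer or the viewer.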
GCP DataSource Interface
[Figure: the DataSource interface]
GCP Data Source Implementations

Direct Integration of relational databases (Spring
HttpInvoker, Hibernate, JPA):



Developed for ICIS, GMOD Chado (beta)
Protocols:

Generalized Java Client to connect to BioMoby web
services; Java support for GCP-compliant BioMoby web
service provider development (beta)

Support for BioCase/Tapir data source integration
(prototyped)

GCP-compliant GDPC data source (prototyped)

SSWAP/VPIN wrapper (under discussion)
Some other direct custom data source wrappers
Some GCP BioMOBY docs…
http://moby.generationcp.org
http://pantheon.generationcp.org/moby
http://cropwiki.irri.org/gcp/index.php/MOBY_Rice_Network
GCP BioMoby Support – a Synopsis
1. MoSES + Dashboard developed (M. Senger).
2. GCP model specific BioMoby datatypes specified.
3. Java libraries partly developed for interconversion of GCP
BioMoby data types to/from GCP domain model Java
objects (Barboza).
4. GCP DataSource Java implementation developed for
client side of BioMoby that maps GCP DataSource
find() use cases onto BioMoby web services using
XML configuration files (no coding).
5. Java design pattern for modular implementation of BioMoby
web services that get their data from any GCP-compliant
DataSource that supports a given find() use case.
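Item 4 describes configuration-driven mapping of find() use cases onto web services. A minimal sketch of that idea follows, assuming an invented XML layout, invented service names and a placeholder `example.org` authority; the actual GCP configuration file format is not shown in this document.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

# Hypothetical configuration: element names, attributes, service names and
# the authority are all invented for illustration.
CONFIG_XML = """
<mappings>
  <mapping entityType="germplasm" authority="example.org"
           service="getGermplasmByName"/>
  <mapping entityType="marker" authority="example.org"
           service="getMarkersByMap"/>
</mappings>
"""


def load_mappings(xml_text: str) -> Dict[str, Tuple[str, str]]:
    """Parse the configuration into {entity type: (authority, service)}."""
    root = ET.fromstring(xml_text)
    return {
        m.get("entityType"): (m.get("authority"), m.get("service"))
        for m in root.findall("mapping")
    }


def resolve_service(
    mappings: Dict[str, Tuple[str, str]], entity_type: str
) -> Tuple[str, str]:
    """Pick the web service configured for a find() use case."""
    try:
        return mappings[entity_type]
    except KeyError:
        raise LookupError(
            f"no service configured for {entity_type!r}"
        ) from None
```

Supporting a new use case then means adding one `<mapping>` element rather than writing client code, which is the "no coding" point made in item 4.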
GCP BioMoby “Sandwich”
(Partial) Inventory of 3rd Party Data Resources targeted for wrapping as GCP Data Sources
Data Type: Description
Microarray Data: MAXD database with microarray datasets from diverse GCP commissioned or competitive projects.
Genetic and QTL Mapping Data: QTL data available in ICIS, TropGenes. Genomic Diversity and Phenotype Connector (GDPC) connecting to Gramene, Panzea, GrainGenes et al.
Genomic Sequence Data and Annotation: NIAS KOME full length cDNA and RAP genome databases (?), connected to GCP web services by NIAS. OryzaSNP and GCP comparative genomic databases. Public sequence databases (via BioJava?)
Functional Genomics: OryGenesDb mutant data (CIRAD); IR64 rice mutant database (IRRI); Tos17 database (NIAS).
Germplasm Sample Characterization Data: Germplasm, passport, genotype and associated field data available in ICIS databases; TropGenes, MGIS, ICRIS.
GCP Platform Implementations

Standalone workbench (“GenoMedium”)


Eclipse Rich Client Platform (RCP)
Web-based workbench (“Koios”)

AJAX, PHP, Java (server side), Java Web Start

NCGR Integrated SYStem (ISYS)

Direct tool integration (e.g. GCP MaxdLoad)
http://moby.generationcp.org
GCP Web-Based Search Engine
[Screenshot annotations:]
Summary of query hits
GCP semantics defined query
List of items matched
View details at 3rd party web site or in locally invoked 3rd party data viewer
http://koios.generationcp.org
(Partial) Inventory of 3rd Party Analysis/Viewer Software being targeted for GCP Integration
Tool: Purpose
SoapLab2: Remote computational services access
Taverna: Bioinformatics work flow management
Apollo: Genome sequence browser
Cytoscape: Visualization of networks
ATV: Phylogenetic tree visualization
JalView: Comparative sequence alignments
TMEV: Microarray data analysis
EASE, Mapman: Gene functional annotation
CMTV: Comparative mapping and QTL
MAXDLoad & MAXDView: Microarray data management
GDPC tools (Browser, Tassel): Genomic diversity analysis
GCP “Pantheon” Project in CropForge
http://cropforge.org/projects/pantheon/
Closing Perspective

The GCP is a global consortium of 22++ crop
research partners who need to share diverse large
data sets and tools, in a globally distributed manner.

Given the scope and duration of the GCP,
developers within the consortium embraced the task
of developing public global informatics standards for
interoperability and integration.

The effort is an open source, global community
building exercise.

We welcome the participation of any and all
interested scientists and developers who might wish
to use and/or contribute to the further evolution and
application of these standards.