Transcript SAB-2010

Gramene
Scientific Advisory Board
December 14, 2010
Gramene SAB 2010
Introduction of SAB Members
• David Marshall (SCRI)
• Paul Flicek (EBI)
• Michael Ashburner (Cambridge)
• Anna M McClung (USDA-ARS)
• Patricia Klein (Texas A&M)
• William Beavis (Iowa State)
• Tim Nelson (Yale)
• Georgia Davis (Missouri)
Introduction of Gramene
• Doreen Ware (CSHL, PI)
• Susan McCouch (Cornell, PI)
• Pankaj Jaiswal (OSU, PI)
• Ed Buckler (Cornell, PI)
• Vindhya Amarasinghe (OSU, Pathways)
• Karthikeyan Athikkattuvalasu (Cornell, Diversity, Phenotypes)
• Terry Casstevens (Cornell, Diversity)
• Charles Chen (Cornell, Diversity)
• Aaron Chuah (CSHL, Diversity)
• Genevieve DeClerck (Cornell, Diversity)
• Palitha Dharmawardhana (OSU, Pathways)
• Marcela Monaco (CSHL, Pathways)
• Will Spooner (CSHL, Genomes)
• Joshua Stein (CSHL, Genomes)
• Jim Thomason (CSHL, Germplasm, Website, Pathways, Genes)
• Sharon Wei (CSHL, Genomes)
• Ken Youens-Clark (CSHL, Project Manager, etc.)
Aim 1: Genomes
Doreen Ware, PI
Sharon Wei, Will Spooner, Ken Youens-Clark,
Jim Thomason, Marcela Monaco, Josh Stein
(Total Full Time Equivalent [FTE] 3.5)
Note: hired Josh at 25% FTE to replace Noel Yap,
who left the project in the Cornell group
1.5 FTE available from Ware, Dvorak NSF
collaborations
Suggestions From Last Year
• Add Brachypodium
– Added in Release 29
• Add a basal plant, e.g. Selaginella
– We chose Physcomitrella patens because it was better
documented at the time (GB record and published)
– Selaginella now has GB record and will be investigated for 2011
• Add a Solanaceae and/or Legume
– We are adding tomato in 2011 and are looking into either
soybean or Medicago
• Display RNAseq data
– We now have the ability to display as DAS track (see
maizesequence.org)
– Need to investigate data sources
Highlights in 2010
• Genomes: 3 new; many updates
• Software: Ensembl 59 provides new visualizations
– SNP view
– SNP Mart
– Multi-species view
– Multi-sequence alignment
• New Analyses
– Gene-centered synteny build
– EPO multi-sequence alignment
– Split-gene detection
• New Development
– GERP Conservation (Sharon)
– GWAS views (Aaron, NSF 2010 collaboration)
– Tandem arrays (Josh, Will)
17 Genomes in Release 32
• Physcomitrella (moss): Basal land plant
• Updated assemblies of grapevine & poplar
• Updated annotations of Indica rice & Arabidopsis
• Updated assemblies & annotations of Oryza chr 3S projects
Status | Species | Assembly | Annotation
New | Physcomitrella patens | v1.1 | v1.1
New | Oryza nivara (AA) 3S | nivara_454_AGP July 2010 | CSHL_v1.1
New | Oryza rufipogon (AA) 3S | rufipogon_454_AGP July 2010 | CSHL_v1.1
Updated | Oryza sativa ssp. indica | BGI-2005 | BGI GLEAN 2008
Updated | Brachypodium distachyon | Brachy1.0 | Brachy1.2
Updated | Vitis vinifera | IGGP_12X | Genoscope 2010
Updated | Populus trichocarpa | JGI 2.0 | JGI 2.0
Updated | Arabidopsis thaliana | TAIR10 | TAIR10
Updated | Oryza brachyantha (FF) 3S | brachyantha_454_AGP July 2010 | CSHL_v1.1
Updated | Oryza glaberrima (AA) 3S | BAC_Sanger_2009, Sep 2009 | CSHL_v2.1
Updated | Oryza officinalis (CC) 3S | Officinalis_3S Sep 2009 | CSHL_v2.1
Updated | Oryza punctata (BB) 3S | Punctata_3S Sep 2009 | CSHL_v2.1
Updated | Oryza minuta (BBCC) 3S | Minuta_CC_3S Sep 2009 | CSHL_v2.1
Updated | Oryza barthii (AA) 3S | BAC_pool_2008 | CSHL_v2.1
Unchanged | Oryza sativa ssp. Japonica | MSU 6.0 | MSU 6.0
Unchanged | Arabidopsis lyrata | Araly1.2 | Araly1.2
Unchanged | Sorghum bicolor | Sbi1 | Sbi1.4
Genome Plans 2011:
Planning:
• Lycopersicon esculentum (tomato)
• Oryza glaberrima (African domesticated rice)
• Oryza brachyantha (wild rice)
• Aegilops tauschii (wheat D, NSF #0701916)
Investigating:
• Selaginella moellendorffii (basal vascular plant)
• Triticum aestivum (hexaploid wheat)
• Malus x domestica (apple)
• Glycine max (soybean) or Medicago
Collaborations Genomes
– NSF PGI #0638820 PI Wing end 2009 (wild rice OMAP)
– USDA ARS Grape end 2009
– NSF PGI PI Buckler end 2009
– NSF 2010 #0723510 PI Nordborg end 2011 (Arabidopsis thaliana, A. lyrata, Capsella)
– NSF #0701916 PGI PI Dvorak end 2011 (wheat)
– NSF PGI PI Wilson end 2010 (maize)
– NSF PGI PI #0723510 Scanlon end 2012 (maize)
– NSF PGI PI Springer to start this year (maize)
– NSF PGI PI Wing end 2011 (wild rice OGE)
– NSF PGI #1032105 PI McCombie end 2012 (wheat)
– EBI BBSRC Paul Kersey (travel for coordination participants)
– NSF PGI PI McCouch end 2014 (rice)
– NSF XXX iPlant Steve Goff
New Maps and Markers
New maps in last year:
• Sorghum genetic (Mace)
• Barley genetic (Close)
• Ae. tauschii genetic (Dvorak)
• Switchgrass genetic (Tobias)
More genomes in CMap
Added two more fully sequenced genomes to CMap with
seq/seq comparisons based on orthology (build 32).
New SNP View
Shows functional consequences of polymorphism
New in Ensembl 56
• Synonymous coding
• Non-synonymous coding
• Stop gain/loss
• Splice site
• UTR
• Intronic
Rice
– 160,000 SNPs x 21 varieties (incl. Nipponbare ref.) from OryzaSNP, MSU6
Maize
– 1.6 million SNPs x 27 NAM founder lines from Panzea, AGPv1
Arabidopsis
– 2010 Project SNP Discovery: 637,522 SNPs x 21 ecotypes (incl. Col-0 ref.), TAIR9
– 2010 Project 250K SNP chip genotypes v3.04, 214,000 SNPs x 1179 ecotypes, TAIR9
– 1001 Genomes/WTCHG SNPs from dbSNP, 2.7 million SNPs, 17 ecotypes, TAIR9
Grape
– 71K SNPs (Myles et al.)
SNP BioMart
Available for rice japonica, rice
indica, Arabidopsis & grape datasets
Configure output fields and
format (XLS, CSV, TSV, or HTML)
If HTML, link to Variation,
Gene, or Browser Pages
Filter on region, phenotype,
strains, id, & consequence (e.g.
introduced STOP codon), and other
attributes
Whole Genome Alignments
BLASTZ-CHAIN-NET between 20 pairs of species
Species aligned:
• Oryza sativa Japonica
• Oryza sativa Indica
• Sorghum bicolor
• Brachypodium distachyon
• Arabidopsis thaliana
• Arabidopsis lyrata
• Vitis vinifera
• Populus trichocarpa
• Oryza glaberrima 3S
• Oryza minuta CC 3S
• Oryza officinalis 3S
• Oryza punctata 3S
• Physcomitrella patens
[Table: Alignment (Release) matrix of the species above against columns O.jap, O.ind, S.bic, B.dis, and A.tha; cells read 31, 32, or "-"]
New & improved alignment viewer (Ensembl 56)
Schwartz S et al., Genome Res., 2003;13(1):103-7
Kent WJ et al., Proc Natl Acad Sci U S A., 2003;100(20):11484-9
Multispecies View
Re-introduced in
Ensembl 56
• Stack any number of
genomes aligned to a
common reference
by BLASTZ
• Browse & zoom along
any genome
independently
Automated Detection of Split Genes
Special class of “paralog” since Ensembl 58
Contiguous split paralog: Non-overlapping, nearby (<1 Mb), same strand
Putative split paralog: Non-overlapping, different regions (e.g. scaffolds)
Genome alignment confirms inconsistent annotation
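The two split-paralog classes above can be sketched as a small classifier. This is only an illustration of the stated criteria, not the Ensembl Compara implementation: the `GeneModel` type, the function names, and the choice to return `None` for pairs that fit neither definition are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional

MAX_CONTIGUOUS_GAP_BP = 1_000_000  # "nearby (<1 Mb)" from the slide


@dataclass(frozen=True)
class GeneModel:
    """Hypothetical minimal gene model used only for this sketch."""
    region: str   # chromosome or scaffold name
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    strand: int   # +1 or -1


def overlaps(a: GeneModel, b: GeneModel) -> bool:
    """True when both models share a region and their coordinates intersect."""
    return a.region == b.region and a.start <= b.end and b.start <= a.end


def classify_split_pair(a: GeneModel, b: GeneModel) -> Optional[str]:
    """Classify a pair of partial paralogs as 'contiguous' or 'putative'.

    Returns None when the pair fits neither definition (an assumption:
    the slide does not say how such pairs are handled).
    """
    if overlaps(a, b):
        return None  # both classes require non-overlapping models
    if a.region != b.region:
        return "putative"  # different regions, e.g. separate scaffolds
    # Number of bases strictly between the two models.
    gap = max(a.start, b.start) - min(a.end, b.end) - 1
    if gap < MAX_CONTIGUOUS_GAP_BP and a.strand == b.strand:
        return "contiguous"
    return None
```

For example, two same-strand models 99 bp apart on one chromosome are classified as `contiguous`, while the same models on two different scaffolds are `putative`.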
Species | Split Genes
Populus trichocarpa | 1181
Sorghum bicolor | 1087
Oryza sativa Japonica | 916
Vitis vinifera | 520
Oryza sativa Indica | 365
Zea mays | 280
Arabidopsis lyrata | 202
Arabidopsis thaliana | 137
Brachypodium distachyon | 101
Gene-Centered Synteny Build
2010: Implemented with automated pipeline runnables
• Release 31: monocots
• Release 32: dicots
Compara Orthologs
Collinear mappings (DAGchainer)
“in-range” mappings near collinear anchors
Map | O.jap | B.dis | S.bic | A.tha | A.lyr | V.vin
Oryza sativa Japonica (O.jap) |
Brachypodium distachyon (B.dis) | YES
Sorghum bicolor (S.bic) | YES | YES
Arabidopsis thaliana (A.tha) | - | - | -
Arabidopsis lyrata (A.lyr) | - | - | - | YES
Vitis vinifera (V.vin) | - | - | - | YES | YES
Populus trichocarpa (P.tri) | - | - | - | YES | YES | YES
Grape Reference Highlights Duplicated
Regions in Arabidopsis and Poplar
• Polyploid and segmental
duplications manifest as cosyntenic regions
• SyntenyView links to
browser: Thus users can
easily navigate between
duplicated regions
EPO Multiple Alignment & Ancestor Reconstruction
• Gramene implementation in 2010
• Release 32: 8-way EPO alignment
– Rice japonica, indica, Brachypodium, sorghum, Arabidopsis,
A. lyrata, grape, poplar
Paten et al (2008) Genome Research 18:1814
Paten et al (2008) Genome Research 18:1829
2010 Genomes Development:
Constrained Elements
• Genomic Evolutionary Rate Profiling (GERP): measures
purifying selection
• Method testing using 4-way and 8-way EPO alignments as
input with varying parameters
• Input tree generated from 1301 ortholog sets
• Planning release in 2011
Cooper et al (2005) Genome Research 15:901
2010 Genomes Development
Tandem Duplicate Detection
Species | Clusters | Genes | Largest | Function
Rice japonica | 2519 | 7054 | 24 | phytosulfokine receptor-like (LRR-kinase receptor)
Sorghum | 2182 | 5927 | 19 | Chalcone-stilbene synthase like
Maize | 1871 | 4564 | 22 | DUF1754 (domain of unknown function)
Arabidopsis | 1738 | 4581 | 28 | ECA1 gametogenesis related family
• Adjacent paralogs with no more than 2 intervening unrelated genes
• Increase gene dosage
• Diversifying selection
• Often species-specific
LRR-Kinase species-specific expansions
LRR-Kinase cluster in rice
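The clustering rule on this slide (adjacent paralogs separated by no more than 2 unrelated genes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the table above: the function name, the `family_of` mapping from gene ID to paralog family, and the input as a single chromosome's gene order are all hypothetical.

```python
from collections import defaultdict

MAX_INTERVENING = 2  # "no more than 2 intervening unrelated genes"


def tandem_clusters(ordered_genes, family_of, max_intervening=MAX_INTERVENING):
    """Group paralogs into tandem clusters along one chromosome.

    ordered_genes: gene IDs in chromosomal order (one chromosome).
    family_of: dict mapping gene ID -> paralog family ID; genes missing
               from the dict are treated as having no paralogs.
    Returns a list of clusters, each a list of two or more gene IDs.
    """
    # Record the rank of every gene, grouped by paralog family.
    positions = defaultdict(list)
    for index, gene in enumerate(ordered_genes):
        family = family_of.get(gene)
        if family is not None:
            positions[family].append(index)

    clusters = []
    for indexes in positions.values():
        current = [indexes[0]]
        for index in indexes[1:]:
            # Genes between two consecutive family members are unrelated
            # to that family by construction.
            if index - current[-1] - 1 <= max_intervening:
                current.append(index)
            else:
                if len(current) > 1:
                    clusters.append([ordered_genes[i] for i in current])
                current = [index]
        if len(current) > 1:
            clusters.append([ordered_genes[i] for i in current])
    return clusters
```

Counting clusters, their member genes, and the largest cluster per species from such output would give figures of the kind shown in the table, assuming the same paralog definitions.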
Collaboration with Ensembl Genomes
• Share conference calls
• Developers meeting (Hinxton, UK, Sept. 2010)
• Co-authored papers/posters
• Two releases
• Ensembl Developer’s Workshop
Website Improvements
• Home facelift:
quick entrypoints
• Migrated to
Apache 2.0 in
Release 31
REST
Interfaces
New RESTful interface
for site gives greater
user control over data
views and format
New Oryza Pages
• Highlights this genus with images, phylogeny,
geographic origin, & traits of interest
• Entry points to browsers, germplasm, markers, &
taxonomy ontology
Web Services
• Distributed Annotation Server (DAS) serving
Ensembl genes as well as Gramene markers,
sequences, and QTL
• Gramene Mart integration with Galaxy
• Public MySQL server
• Diversity data via Tassel and GDPC
• Subversion for code access
Browser Development 2011 Plans
• Communicate/distinguish gene-confidence information
– 28% of MSU6 rice genes are annotated as “TE_related” and 17% are in
poorly-conserved “hypothetical” class
– 20% of sorghum genes are “low-confidence” (TE, pseudogenes, etc.)
– Color-code or display in separate tracks in browser
– Color-code in gene-tree display
• List/Display detailed gene-level synteny information
– Explicitly list syntenic genes from Gene Page
– Indicate that a gene is syntenic to one or more genes of a different species
within the browser (e.g. color-code or synteny track)
• List co-syntenic genes
– 2 genes (in separate blocks) having synteny to a common gene in another
species arose from a large scale duplication event (e.g. polyploidy or
segmental).
• Tandem Array track
– Indicate clusters of paralogous genes within browser
• [Challenges of low-depth or highly fragmented genomes, e.g. wheat &
Physcomitrella]
2010 Ongoing Development Work
• miRNA pipeline runnable
– Refine and automate steps in miRNA
annotation
– Vmatch alignment
– mfold RNA secondary structure prediction
– Filter based on secondary structure
• Gene-Build with RNAseq evidence data
– First pilot experiments performed
Questions for the SAB?
• Nominate genomes
• New data types e.g. RNAseq data
available for current genomes that we
may not be aware of
• Any physical aspects of web site
needing improvement
Aim 2: Pathways
Pankaj Jaiswal, PI
Palitha Dharmawardhana, Jim Thomason,
Vindhya Amarasinghe, Liya Ren,
AS Karthikeyan, Marcela Monaco
Note: Liya left the project this year and has been
replaced by Marcela.
Aim#2 Plan (2009-2010 / Year-3)
• Continue curating Rice and Sorghum Pathways
• Release MaizeCyc and BrachyCyc
• Add all available microarray probesets to MarkerDb and
allow OMICS viewer to validate
• Develop Reactome database (for rice)
• Update the gene database schema to structure the
allele based annotations on function, phenotype and
interactions.
• Maintain and Develop Ontologies
Added BrachyCyc, MaizeCyc
Updated Pathway tools twice to
latest versions.
Updated the individual pathway
databases twice to be consistent
with the Pathway tools version
Rice Pathways curated by
addition of hydroxycinnamic
acid and serotonin biosynthetic
pathways, updates to auxin
biosynthesis, tryptophan
biosynthesis. Addition of 80
transport reactions and 477
transporters
Suggestions from last SAB
Concerns on supporting three technologies: Cyc,
Reactome, WikiPathways.
Suggested moving to Reactome and allow the
Cyc and WikiPathway databases to be
populated by automated exports using BioPax.
Reactome Database Build
• Reactome:
– Rice
• Start with RiceCyc import and build on the existing Ensembl and
Curated Genedb resources
– Arabidopsis
• After consulting with the Reactome project and the Arabidopsis
Reactome group, this will become part of the renewal effort. The work
on it will start with integrating it in the Reactome central database from
its current location in JIC (www.arabidopsisreactome.org), followed by
active curation.
• Active curation will be primarily done in collaboration with Nick
Provart’s group at Univ. of Toronto.
• This is a new International Collaboration
– Plan is to integrate the plant specific Reactome database
instances in the Reactome central database, but provide a
modified user interface for users.
Rice Reactome
• Initial build of the Rice Reactome started by importing the complete
(curated and predicted) RiceCyc data in BioPax level-2 format.
• A test-v2 Rice Reactome is available from this link.
– The Reactome tools with some tweaking successfully imported 375
pathways and the children reactions
– Efforts are now on to integrate the mappings to
• ChEBI, Ligand and PubChem for compounds/metabolites
• KEGG for EC enzymes
• Uniprot
– Drawing the network diagrams requires manual curation.
• Priority is to draw networks for fully curated Rice Pathways by using the Reactome tools
– Integrate predicted models of regulatory pathways for rice based on the
reference pathway projections for cell cycle, transcription, translation
etc.
– Curate test case rice pathways
• Organized a week long workshop attended by curators from Gramene and BAR-Univ. of
Toronto (Nick Provart’s group)
• Mentored by Reactome co-PI Peter D’Eustachio
• A test case of ABA metabolism and signaling was curated, which contained both the
molecular and genetic interaction datasets.
ABA metabolism and signaling pathway
Klingler et al. J. Exp. Bot. (2010)
61 (12): 3199-3210.
Reactome model: A prototype reaction network, ABA-mediated transcriptional regulation, was laid out
using material from Nambara & Marion-Poll (2005 – PMID: 15862093) to supplement the pathways of ABA
synthesis and catabolism available as RiceCyc templates, and the regulatory processes discussed by Xiong
et al. (2002 – PMID: 11779861) (especially Figure 10) and Klingler et al. (2010 – PMID: 20522527)
Automated Cyc and
WikiPathways builds
• Based on the SAB suggestions, progress has been made towards extending the
annotation of the Cyc and WikiPathways versions of the pathway databases in
an automated way.
• To do that, however, we have to streamline the data workflow and
structure the current curated gene database as a central
repository/aggregator of the datasets needed to achieve this goal.
• The Curated Gene database schema was restructured to hold whole-genome-based
annotations on genes and alleles and their associations to function,
phenotype, germplasm, pathways, gene-to-gene interactions, gene products,
and gene models, besides providing cross references to sequencing project
objects (like gene models from IRGSP-RAP, MSU-OSA, and BGI for
rice, O. sativa) and published literature.
• Use aggregated datasets for the automated Cyc build using the standard Pathway
Tools, and provide the BioPax and SBML dumps to the WikiPathways project for
their users.
• Gramene’s focus will be pathway curation and annotation in Reactome and
functional annotation in the gene database.
Outreach
• Curated rice specific pathways and compounds contributed to
PlantCyc and MetaCyc projects on reference pathway databases.
• Organized Workshops
– Community Gene Annotation Workshop at Plant Biology 2010 (July 2010)
• Jointly organized with Plant Ontology (PO) Project.
• Provided meeting support by way of website portal and onsite helping hands
• Tool development (plant configurations of Phenote annotation tool and
Ontologies) and funding provided by PO project.
• Attended by about 35 researchers of which 12 were awarded travel support by
PO.
– Reactome workshop at CSHL, 25-29 October 2010
• Attended by Gramene and BAR curators
• Mentored by Reactome database (Peter D’Eustachio)
• Hands on curation of a test case pathway.
• Analysis of RiceCyc import and current Reactome Annotation tools.
• Development of curation strategy and annotation guidelines.
Plans for 2010-2011
• Release Rice Reactome
• Release curated gene database in new avatar as
aggregator of gene information
• Integrate microarray probeset mappings in OMICS
validator for non-rice pathways
• Conduct the gene and pathway annotation outreach
workshops.
• Develop test cases for upcoming Renewal and strategies
for analyzing large-scale datasets generated by NextGen
technologies on transcriptomics and metabolomics.
• Maintain the current Cyc-based pathway views; upgrade to
Pathway Tools v14.5 and later
Pathway Collaborations
• MetaCyc/BioCyc (Peter Karp)
• Reactome (Lincoln Stein, Peter D’Eustachio)
• Arabidopsis Reactome (Nick Provart, Henning Hermjakob)
• PlantCyc (Sue Rhee)
• SolCyc and Solanaceae Genome Network (Lukas Mueller)
• Phenote curation tool (Nomi Harris, Suzi Lewis)
• Ontologies (GO, PO, OBO)
• BrachyBase (Todd Mockler)
• Sorghum Biofuel and Bioenergy Project (John Mullet)
• MaizeSequence.org
• MaizeGDB
• Maize Pathways (Andrew Hanson)
• C3-C4 project (Tim Nelson, Tom Brutnell, Chris Myer, R. Bruskiewich)
• WikiPathways
• Expression data (Todd Mockler, Tim Nelson, Tom Brutnell)
Questions for SAB?
• Nominate Pathways
• Types of analysis users are interested in
• Potential collaborators (national and
International)
Aim3:
Gramene Diversity Module
Susan McCouch & Edward Buckler, PIs
Terry Casstevens, Genevieve DeClerck,
Charles Chen, AS Karthikeyan,
Jon Zhang, Qi Sun, Ken Youens-Clark.
Suggestions from last year
• Integration with key tools
– We provide a new SNP query tool, Web-launched Tassel, and downloads
that work with Flapjack, in formats like Plink,
HapMap, etc.
• How about genotype storage?
– Implemented BLOBs to store SNPs
New Data Sets
• Arabidopsis
– Atwell et al. Genotype, phenotype, association
data. ~214,000 SNPs, 199 Germplasm, 107
Phenotypes.
• Rice
– Zhao et al., PLoS, May 2010, "1536 Assay": 1311
SNPs x 395 varieties, mapped to MSU6.0
– Gross B, et al., Mol Ecol., Aug 2010, SNP diversity
study from PG
• Maize
– dbSNP IDs and AGPv2 coordinate update for
current dataset (1.6 million SNP x 27 NAM lines)
Web Interface – SNP Query
Downloads
Tassel
GWAS Visualization
Tassel Development
• New data structure significantly improving memory efficiency
• Alignment viewer
• User-friendly “wizards”
• Progress monitoring with ability to cancel tasks
• Import/export Hapmap, Flapjack, Plink data formats
• Auto-loading and analysis execution from web site startup
• GLM and MLM:
– GLM interface simplified.
– Compression and faster P3D implemented for MLM resulting in reduced
runtime.
– Matrix Algebra library wrapper written to make switching to newer, faster
libraries easier.
– EJML Matrix Algebra library interface implemented.
• Tassel 3.0 Pipeline…
– Automates complex loading/analysis pipelines
– Doesn't need Java coding to create
– Has simultaneously executing pipeline segments
– Works from web site launch, command line, and GUI
[Workflow diagram]
Selection of candidate genes
- Experimental evidences (from other species, e.g. Arabidopsis)
- Ontology terms
Compara pipeline
Prior-candidate genes
- Coordinates of the genes
- Functional implication or annotations
Hapmap SNP information
- SNP positions
- Linkage disequilibrium estimates (r2)
Linkage block size calculations
Hapmap SNP information
- SNP positions
GWAS associations
- Associated SNP map positions
- p-values
Enrichment score calculations
Functional implications
Linkage block size for the ith prior candidate is given by:
$$B_i = 95\%\ \text{quantile}\ \{d_{i1}, d_{i2}, d_{i3}, \ldots, d_{ix}\}$$
where di1, di2, ..., dix are the map distances from the SNP loci in the gene
to other loci on the same chromosome that are in perfect LD (r2 = 1.0).
For the ith prior candidate gene, the enrichment score, Ei, is calculated from
the weighted hypergeometric probability of observing gi significant
associations in the linkage block Bi, given the number of SNPs xi located in
the block and the total number Gt of SNP loci on the chromosome.
Functional implication of prior candidate genes is inferred from statistically
significant overrepresentation of association signals.
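The linkage block definition above can be sketched in a few lines. The slide does not say which quantile estimator was used, so this example assumes the nearest-rank method; the function names and the `(position, position, r2)` tuple layout are illustrative only.

```python
def perfect_ld_distances(pairs):
    """Distances (bp) between SNP pairs in perfect LD.

    pairs: iterable of (gene_snp_position, other_snp_position, r2) tuples
           for one prior candidate gene (hypothetical input layout).
    """
    return [abs(a - b) for a, b, r2 in pairs if r2 >= 1.0]


def linkage_block_size(distances_bp, percent=95):
    """Nearest-rank quantile of the perfect-LD distances for one gene.

    Assumption: nearest-rank estimator; other quantile definitions
    interpolate and would give slightly different block sizes.
    """
    if not distances_bp:
        raise ValueError("no perfect-LD distances for this gene")
    ordered = sorted(distances_bp)
    # ceil(percent / 100 * n) using integer arithmetic, 1-based rank.
    rank = (percent * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]
```

With 100 distances of 1,000 to 100,000 bp in steps of 1,000, the 95% nearest-rank quantile is the 95th smallest value, 95,000 bp.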
Example: Days-to-silk flowering time associations of maize chromosome 8
- Maize first generation hapmap 1.6 M SNP of all chromosomes- 136,119 SNPs on chromosome 8
- Flowering time trait, Days-to-Silk, of maize GWAS associations on chromosome 8- 144 associations (p-values < 1e-6)
- Curated Arabidopsis flowering time candidate genes- 274 genes in total
- Compara orthology of maize homologs to Arabidopsis flowering time candidates- 74 prior candidate genes
- Linkage disequilibrium estimates (r2) from 136,119 SNPs, filtered with MAF > 0.05
- Genetic distances calculated from each maize candidate gene to 144 GWAS associations
- Genetic distances of every pair of SNP loci in a perfect LD (r2=1.0)
Linkage block size calculations
[Figure: Empirical cumulative probability distribution of genetic distances estimated by the SNP loci that are in perfect LD. x-axis: genetic distance of SNP loci (0 to 0.8 Mb); y-axis: probability.]
95% quantile: linkage block size = 105,387 bp
Enrichment score calculations
Enrichment score for ith gene:
Suppose GWAS identifies Mt SNPs significantly associated with flowering time variation among Nt total
SNPs on a given chromosome.
The enrichment score (Sei) determines the probability of getting gi significant GWAS
associations, weighted by p-values, within a linkage block.
$$Se_i = \log_{10} \frac{\binom{M_t}{g_i} \times \binom{N_t - M_t}{x_i - g_i}}{\binom{N_t}{x_i}}$$
where
Mt: total number of significant GWAS SNPs on a given chromosome
Nt: total number of SNPs on a given chromosome
gi: significant GWAS SNPs in the defined window
xi: number of SNPs in the defined window
Sei: enrichment score of the ith maize flowering time candidate gene
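The hypergeometric term in the enrichment score can be evaluated as below. This sketch covers only the unweighted probability; the p-value weighting mentioned on the slide is not specified there and is deliberately left out. The function names are illustrative, and log-gamma arithmetic is an implementation choice made here so that chromosome-scale SNP counts do not underflow.

```python
from math import lgamma, log


def log10_binomial(n, k):
    """log10 of the binomial coefficient C(n, k), computed via log-gamma."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(10)


def enrichment_score(g_i, x_i, m_t, n_t):
    """Unweighted Se_i = log10[ C(Mt, gi) * C(Nt - Mt, xi - gi) / C(Nt, xi) ].

    g_i: significant GWAS SNPs in the linkage block
    x_i: all SNPs in the linkage block
    m_t: significant GWAS SNPs on the chromosome
    n_t: all SNPs on the chromosome
    """
    if not (0 <= g_i <= min(x_i, m_t) and x_i - g_i <= n_t - m_t and x_i <= n_t):
        raise ValueError("counts are inconsistent with a hypergeometric draw")
    return (
        log10_binomial(m_t, g_i)
        + log10_binomial(n_t - m_t, x_i - g_i)
        - log10_binomial(n_t, x_i)
    )
```

As a small check, with Mt = 2, Nt = 5, xi = 2, and gi = 1 the probability is 2 × 3 / 10 = 0.6, so the score is log10(0.6).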
[Figure: Log10 of odds of maize flowering time prior candidate genes on chromosomes 2, 3, and 8. x-axis: prior candidate genes (GRMZM2G and AC gene model IDs); y-axis: log10 of odds, 0 to 20. Labeled genes: FT maize homolog, AGL79 maize homolog, GI maize homolog, TOC1 maize homolog, and rap2.7 (AP2 maize homolog). Threshold: LOD = 2*.]
* Probability of null hypothesis is assessed by randomizing the association results with respect to the SNP
positions, without changing the number and strength of association signals.
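The randomization described in the footnote can be illustrated with a simple permutation test. This is a sketch under assumptions, not the procedure used for the figure: it shuffles significance flags across SNP positions (which keeps the number of signals fixed) and uses the count of significant SNPs in the block as the statistic, whereas the slide's statistic also involves p-value weighting. All names are hypothetical.

```python
import random


def permutation_p_value(snp_positions, significant, block_start, block_end,
                        n_permutations=1000, seed=0):
    """Empirical p-value for the number of significant SNPs inside a block.

    snp_positions: SNP coordinates on one chromosome.
    significant: parallel list of booleans (True = significant GWAS hit).
    The flags are shuffled across positions, so the number of signals is
    unchanged in every permutation.
    """
    rng = random.Random(seed)
    in_block = [block_start <= p <= block_end for p in snp_positions]
    observed = sum(1 for inside, sig in zip(in_block, significant) if inside and sig)

    labels = list(significant)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        count = sum(1 for inside, sig in zip(in_block, labels) if inside and sig)
        if count >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

A block that contains every significant SNP among many non-significant ones yields a small p-value, while a block spanning the whole chromosome always yields 1.0.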
Plans - Rice
• Rice Diversity 44K chip: ~39,000
SNPs, 400 rice lines, phenotype data
for 23 traits - Build 33
• Rice SNP Consortium 1M chip data - Build 34
• Curate key large GWAS results
Plans - Maize, Arabidopsis
• Maize Diversity/Panzea, 56 million
SNPs x 104 maize lines (Build 33)
• Phenotypic data for an additional 1020 traits (depending on publication
acceptance rate)
• Additional data from Arabidopsis 2010
Project
• Curate key large GWAS results
Diversity Collaborations
• Rice:
– McCouch (#0606461, #1026555)
– Wing (#1026200)
– Purugganan (#0701382)
– Olsen (#0638820)
• Arabidopsis: Nordborg (#0723510)
• Maize: Buckler (#0820619)
Plans - Software
• Google Web Toolkit for association data viewer
• SNP Query - additional features
• TASSEL
– Flapjack integration. Work with SCRI to create seamless
connectivity between the two applications
– Complete support for heterozygous data
– Greater JUnit testing (regression testing)
– Automated MLM/GLM association analysis
– New graphical displays (i.e., Manhattan plot)
– Improvements to kinship calculations, imputation function
• Functional implications from GWAS associations -- develop web-based interface for statistical method
Plans – Comparative GWAS
• Develop web-based interface for
comparative candidate gene
enrichment system.
Diversity Questions for
the SAB
• What should happen to diversity data
in the renewal?
– Large projects such as SeeD (CIMMYT),
Wheat/Barley CAP, GRIN-Global will likely
go to new standards
• What needs to be done to transition?
Aim 5: Outreach
Everyone
Tutorials
OpenHelix’s Gramene tutorial
went live at the end of March 2010.
It includes a self-run tutorial as well as
PowerPoint slides, handouts, and
exercises. As of Sept. 7, in the five
months it has been available, the landing
page has received 305 views, with
36 viewings of the tutorial.
Five new Gramene-produced
tutorials such as this one on
pathways.
Meetings and Presentations
– Presentations
• PAG
• Rice Technical Working Group
• Maize conference
• International Symposium on Integrative Bioinformatics
• Evolution
• ISMB
• Genome Informatics
• Agronomy, Crop and Soil Sciences Meeting
– ASPB curation workshop with hands-on exercises
– Other:
• Gramene Retreat (CSHL, June 2010)
• Plant Ensembl developers meeting (Hinxton, Sept. 2010)
• Plant Reactome training workshop (CSHL, Oct. 2010)
• Ken and Jim TA’d bioinformatics course (CSHL, Oct. 2010)
Letters of Support
• Wise/Dickerson, NSF-PGRP TRPGR: NextGen PLEXdb (0543441)
• Ana Caicedo (UMass) The evolutionary genomics of invasive weedy
rice (0638820)
• Rod Wing CPGS Oryza Genome Evolution (1026200)
• Dick McCombie CPGS: Gene Discovery in Wheat (1032105)
• Carolyn Lawrence, NSF-PGRP GERP: Functional Structural Diversity
Among Maize Haplotypes (0743804)
• Steven Briggs, TRPGR Discovery, revision, and validation of maize
genes by proteogenomics (0924023)
• Matt Vaughn, Epigenetic Variation in Maize (0922095)
Publications
• “Gramene database in 2010: updates and extensions” (Youens-Clark, et al.)
Nucleic Acids Research, 2010, 1–10 doi:10.1093/nar/gkq1148.
• “Fine Quantitative Trait Loci Mapping of Carbon and Nitrogen Metabolism
Enzyme Activities and Seedling Biomass in the Intermated Maize IBM Mapping
Population.” (Zhang, Chen, Buckler, et al.) Plant Physiology, in press.
• “Gramene database: a hub for comparative plant genomics.” (P Jaiswal).
Methods Mol Biol. 2011;678:247-75. (invited book chapter)
• “Applications and methods utilizing the Simple Semantic Web Architecture and
Protocol (SSWAP) for bioinformatics resource discovery and disparate data
and service integration.” (Nelson et al.) BioData Min. 2010 Jun 4;3(1):3.
Coming Up:
• “Gramene GeneTrees: A comprehensive database of phylogenetic trees in
plants and other model Eukaryotes” (Plant Phys)
• RiceCyc
• Diversity
• Genome sequence analysis
Plant Ensembl Collaboration
• Lead: Will
• EBI Participants: Paul Kersey, Paul
Derwent, Dan Staines, Andy Yates
• Gramene Participants: Will Spooner,
Doreen Ware, Aaron Chuah, Shiran
Pasternak, Sharon Wei
Plant Reactome
Curators Meeting
Pankaj Jaiswal and Marcela Monaco
organized an intensive five-day
meeting (October 25-29) at CSHL
with Peter D'Eustachio of New York
University to learn how to use the
Reactome model and software to
curate plant pathways.
Other participants included Vindhya
Amarasinghe (OSU), Palitha
Dharmawardhana (OSU), and
Hardeep Nahal (Univ. of Toronto).
• Development work on visualizing
annotations from DNA Subway within
Gramene’s Ensembl views
• Contribution of reference genomes for
high-throughput sequencing
Web Usage and Stats
Page Requests by Year per Month
2001 - 2010
Explanation of drop
in web usage
Prior to release 29, Gramene was experiencing
problems from abusive spidering by web searches
on our development site. As a consequence, all
indexing was disabled in our “robots.txt” file.
Through an error in the release process, this file
was copied to the live server, thereby refusing
access to search engines. This explains the severe
drop in usage by casual users finding Gramene
through Internet searches. The problem has been
fixed, and usage appears to be climbing again.
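The effect of the misplaced file can be reproduced with Python's standard `urllib.robotparser` module. The two robots.txt bodies below are generic examples, not copies of Gramene's actual files, and `example.org` is a placeholder URL.

```python
from urllib.robotparser import RobotFileParser

# Generic examples: the first refuses every crawler, the second allows all.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def crawler_allowed(robots_txt, url, user_agent="*"):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.org/some/page"
    print("development-style file:", crawler_allowed(BLOCK_ALL, page))
    print("live-site-style file:  ", crawler_allowed(ALLOW_ALL, page))
```

A check like this in the release process, run against the robots.txt that is about to be deployed, would be one way to catch a development-only file before it reaches the live server.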
3-year Perspective
Top Countries - Visits%
Nov 2009 – Nov 2010
Duration of Visit
Depth of Visit
Visitor Loyalty
Thanks, from Gramene
End