The maize leaf transcriptome

NextGen Pipeline: Enabling the Plant Science
Community
Tom Brutnell (lead), Steve Rounsley (co-lead), Matt Vaughn
(Engagement Lead)
Ed Buckler, Justin Borevitz, Todd Mockler, Pat Schnable, Bob
Schmitz, Matt Hudson, Brad Barbazuk, Damian Gessler
What is NextGen?
• Ultra high-throughput sequence analysis (UHTS)
• Several platforms, including 454, ABI SOLiD and Illumina/Solexa, are
capable of generating from 1 to hundreds of Gb of DNA sequence in a single
run.
• Library preparations are relatively simple and kits are available
• Data analysis is computationally challenging (need to process Tb of
data) and beyond the reach of many experimental biologists.
UHTS-RNA
UHTS-DNA
How will UHTS change plant science?
• Makes phenotyping, not genotyping, rate-limiting
• Genome-wide association studies
• Allele-mining
• Enables a much deeper understanding of “non-model” species
• 1000 genomes project (transcriptome of 1000 plant species)
• Genome sequences now available for B. distachyon and S. italica, and for RILs
of maize and rice
• Provides detailed transcriptional resolution on global scale
• Map 5’ and 3’ UTRs, TSS, transcript isoforms
• Examine smRNA populations
• Map methylation, TF binding sites, etc…
NextGen 1.0 Pipeline
• Develop a computational pipeline to process ultra-high-
throughput sequence datasets
• First iteration of NextGen 1.0 Pipeline will perform simple
variant detection or transcript quantification starting from
DNA and RNA-derived datasets.
• Designed explicitly to support modularity and extensibility
• Import fastq files and export data in SAM/BAM format.
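The modular, fastq-in and SAM-out design described on this slide might be sketched as follows. This is only an illustration, not the NextGen code: `FastqRecord`, `read_fastq`, `drop_short_reads`, `run_pipeline` and `to_unaligned_sam` are hypothetical names, and no real aligner is called. Reads are written as unmapped SAM records (flag 4) to show where an alignment stage would plug in.

```python
import io
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class FastqRecord:
    name: str
    sequence: str
    quality: str


def read_fastq(handle) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with four lines per record."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield FastqRecord(header[1:].split()[0], sequence, quality)


# A stage is any function from a stream of records to a stream of records,
# so new steps can be added without touching the existing ones.
Stage = Callable[[Iterable[FastqRecord]], Iterable[FastqRecord]]


def drop_short_reads(min_length: int) -> Stage:
    """Build a stage that keeps only reads of at least min_length bases."""
    def stage(records: Iterable[FastqRecord]) -> Iterable[FastqRecord]:
        return (r for r in records if len(r.sequence) >= min_length)
    return stage


def run_pipeline(records: Iterable[FastqRecord],
                 stages: List[Stage]) -> Iterable[FastqRecord]:
    """Chain the stages in order over the record stream."""
    for stage in stages:
        records = stage(records)
    return records


def to_unaligned_sam(record: FastqRecord) -> str:
    """Format a read as an unmapped SAM line (11 mandatory fields, flag 4)."""
    fields = [record.name, "4", "*", "0", "0", "*", "*", "0", "0",
              record.sequence, record.quality]
    return "\t".join(fields)


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nAC\n+\nII\n")
    for rec in run_pipeline(read_fastq(demo), [drop_short_reads(4)]):
        print(to_unaligned_sam(rec))
```

Treating each step as a function over a stream of records is one way to get the modularity and extensibility the slide asks for: a variant caller or quantification step would be another stage with the same shape.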
NextGen 2.0 Pipeline
• Subsequent versions will have added functionalities that may
include:
• Ability to process/compare multiple samples
• Support variant detection for non-reference genomes
• Support multiple methods of analysis (BWA, SOAP2, Bowtie)
• Support additional workflows (smRNA annotation, ChIP seq, de novo
assembly)
• Input from working groups is imperative
• What is the decision tree for subsequent iterations?
• What do modeling/stats/viz groups need as NextGen deliverables?
• How can NextGen exploit tools under development for G2P?
Meeting the needs of biological use cases
• Flowering time and photosynthesis
• How can NextGen inform modeling efforts?
• Abiotic Stress
• Should we develop a smRNA pipeline for 2.0?
• Input from working groups is imperative
• What is the decision tree for subsequent iterations?
• What do modeling/stats/viz groups need as NextGen deliverables?
• How can NextGen exploit tools under development for G2P?
Integrating NextGen/Viz Pipeline
Workflow
• A pathway of operations
• Entities:
– Operation
– Data
– Flow
• The flow through the operations is managed by the workflow
software (e.g., VisTrails)
• Candidate software and packages are named
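The three entities on this slide (Operation, Data, Flow) can be illustrated with a small stand-in for a workflow manager. This is a minimal sketch under stated assumptions: the `Operation` and `Workflow` classes and their methods are hypothetical and do not reproduce the API of the workflow software named above. Data are named values, a flow is a (source, target) pair, and an operation runs once all of its incoming data exist.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class Operation:
    name: str
    func: Callable[..., Any]


@dataclass
class Workflow:
    operations: Dict[str, Operation] = field(default_factory=dict)
    # Each flow carries the data named `source` into the operation `target`.
    flows: List[Tuple[str, str]] = field(default_factory=list)

    def add_operation(self, name: str, func: Callable[..., Any]) -> None:
        self.operations[name] = Operation(name, func)

    def connect(self, source: str, target: str) -> None:
        self.flows.append((source, target))

    def _inputs_of(self, name: str) -> List[str]:
        return [src for src, tgt in self.flows if tgt == name]

    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        """Run every operation once, in an order that respects the flows."""
        data = dict(inputs)  # Data: every named value produced so far
        pending = dict(self.operations)
        while pending:
            ready = [op for op in pending.values()
                     if all(src in data for src in self._inputs_of(op.name))]
            if not ready:
                raise ValueError("workflow has a cycle or a missing input")
            for op in ready:
                args = [data[src] for src in self._inputs_of(op.name)]
                data[op.name] = op.func(*args)  # output stored under op name
                del pending[op.name]
        return data


if __name__ == "__main__":
    wf = Workflow()
    wf.add_operation("lengths", lambda reads: [len(r) for r in reads])
    wf.add_operation("total", lambda lengths: sum(lengths))
    wf.connect("reads", "lengths")
    wf.connect("lengths", "total")
    print(wf.run({"reads": ["ACGT", "AC"]})["total"])
```

Keeping flows separate from operations means the same operations can be rewired into a different pathway without being rewritten.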
/ber=Bernice Rogowitz
Integrating NextGen/Viz/Modeling Pipelines
• Literature search
• Candidate maize gene
• Homolog Finder (e.g., CoGE)
• List of homologous Arabidopsis gene IDs
• Co-Expression Analysis (e.g., ATTED2)
• Expression Network of 10 Arabidopsis Genes
• Homolog Finder (e.g., CoGE)
• List of 20 homologous maize gene IDs
• Find expression values for these genes (e.g., NextGen)
• Expression data for 20 maize genes
• Examine clusters that can handle maize data (e.g., eNorthern, MapMan)
(note: very limited data for maize, so may need to go to rice)
• 5 genes of interest
• For each, examine structure of transcripts and expression over time
(e.g., EFP Maize Genome Browser)
• Modeling and Statistical Inference
/ber/tb