RNA-Seq - iPlant Pods - iPlant Collaborative


DNA Subway Green Line
Onramp to HPC in Biology Education
Dave Micklos and
Uwe Hilgert
iPlant Collaborative
DNA Learning Center,
Cold Spring Harbor
Laboratory; Bio5 Institute,
University of Arizona
…ride
an educational Discovery Environment
Green Line:
RNA Sequence (RNA-Seq) Analysis
• First fully graphical interface for RNA-Seq analysis: no
command line or data conversions
• Accesses the XSEDE system through the iPlant Agave API
• Co-localizes up to 100 GB of data in the iPlant Data Store
• Look for differential gene expression in different
tissues, life stages, or treatments
• Generate lists of expressed genes and fold-changes
• Annotate sequenced genomes; add results to Red
Line projects
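Fold-change lists like these come down to a simple ratio between conditions. Below is a minimal Python sketch, assuming normalized expression values (e.g., FPKM) are already available for each gene. The gene names, values, and the pseudocount of 1 are hypothetical choices, not necessarily what the Green Line itself uses.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """Return log2((treated + pseudocount) / (control + pseudocount)).

    The pseudocount (an assumption here, commonly 1) avoids dividing by zero
    or taking log(0) for genes with no reads in one condition.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Hypothetical normalized expression values: (treated, control).
expression = {
    "geneA": (63.0, 15.0),
    "geneB": (7.0, 31.0),
    "geneC": (0.0, 0.0),
}

for gene, (treated, control) in expression.items():
    print(gene, round(log2_fold_change(treated, control), 2))
```

This prints geneA 2.0, geneB -2.0, and geneC 0.0. On the log2 scale, a doubling (+1) and a halving (-1) are symmetric, which makes up- and down-regulated genes easy to compare.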
RNA code
represents “active”
DNA in genome
150 feet
Homo sapiens bitter taste receptor
(TAS2R38) DNA code > RNA code
CCTTTCTGCACTGGGTGGCAACCAGGTCTTTAGATTAGCCAACTAGAGAAGAGAAGTAGAATAGCC
AATTAGAGAAGTGACATCATGTTGACTCTAACTCGCATCCGCACTGTGTCCTATGAAGTCAGGAGT
ACATTTCTGTTCATTTCAGTCCTGGAGTTTGCAGTGGGGTTTCTGACCAATGCCTTCGTTTTCTTG
GTGAATTTTTGGGATGTAGTGAAGAGGCAGGCACTGAGCAACAGTGATTGTGTGCTGCTGTGTCTC
AGCATCAGCCGGCTTTTCCTGCATGGACTGCTGTTCCTGAGTGCTATCCAGCTTACCCACTTCCAG
AAGTTGAGTGAACCACTGAACCACAGCTACCAAGCCATCATCATGCTATGGATGATTGCAAACCAA
GCCAACCTCTGGCTTGCTGCCTGCCTCAGCCTGCTTTACTGCTCCAAGCTCATCCGTTTCTCTCAC
ACCTTCCTGATCTGCTTGGCAAGCTGGGTCTCCAGGAAGATCTCCCAGATGCTCCTGGGTATTATT
CTTTGCTCCTGCATCTGCACTGTCCTCTGTGTTTGGTGCTTTTTTAGCAGACCTCACTTCACAGTC
ACAACTGTGCTATTCATGAATAACAATACAAGGCTCAACTGGCAGATTAAAGATCTCAATTTATTT
TATTCCTTTCTCTTCTGCTATCTGTGGTCTGTGCCTCCTTTCCTATTGTTTCTGGTTTCTTCTGGG
ATGCTGACTGTCTCCCTGGGAAGGCACATGAGGACAATGAAGGTCTATACCAGAAACTCTCGTGAC
CCCAGCCTGGAGGCCCACATTAAAGCCCTCAAGTCTCTTGTCTCCTTTTTCTGCTTCTTTGTGATA
TCATCCTGTGCTGCCTTCATCTCTGTGCCCCTACTGATTCTGTGGCGCGACAAAATAGGGGTGATG
GTTTGTGTTGGGATAATGGCAGCTTGTCCCTCTGGGCATGCAGCCATCCTGATCTCAGGCAATGCC
AAGTTGAGGAGAGCTGTGATGACCATTCTGCTCTGGGCTCAGAGCAGCCTGAAGGTAAGAGCCGAC
CACAAGGCAGATTCCCGGACACTGTGCTGAGAATGGACATGAAATGAGCTCTTCATTAATACGCCT
GTGAGTCTTCATAAATATGCC
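The "DNA code > RNA code" step can be sketched in a few lines of Python. The mRNA has the same sequence as the DNA coding strand, with U in place of T. This is only an illustration: the helper names are hypothetical, and the 30-nucleotide excerpt is copied from the start of the second line of the TAS2R38 sequence above.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding (sense) strand: same sequence, T replaced by U."""
    return coding_strand.upper().replace("T", "U")


def first_start_codon(rna: str) -> int:
    """Return the 0-based index of the first AUG start codon, or -1 if there is none."""
    return rna.find("AUG")


# Excerpt from the TAS2R38 sequence shown above.
dna = "AATTAGAGAAGTGACATCATGTTGACTCTA"
rna = transcribe(dna)
print(rna)                     # AAUUAGAGAAGUGACAUCAUGUUGACUCUA
print(first_start_codon(rna))  # 18
```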
Differential Gene Expression
RNA sequencing (RNA-Seq) gives a “snapshot” of genes
active in different cells at different times
RNA Sequence (RNA-Seq) Analysis
Design RNA-Seq experiment, e.g., differential expression
Isolate total RNA; convert to DNA library
Sequence experiment and control libraries
Analyze sequence data on DNA Subway Green Line
Follow-up experimental validation
Image source: http://www.bgisequence.com
1) Manage Data: Quality assessment
with FastQC; ~100 million 75/150-nucleotide
reads in < 1 hr
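FastQC's per-base quality report summarizes the Phred scores encoded in the FASTQ quality strings at each read position. The sketch below computes only the mean score per position. It assumes Phred+33 encoding and unwrapped four-line FASTQ records, uses hypothetical helper names and reads, and is not FastQC's own code.

```python
from typing import Iterable, List


def mean_quality_per_position(quality_lines: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position (Phred+33 encoding assumed)."""
    totals: List[int] = []
    counts: List[int] = []
    for qual in quality_lines:
        for i, ch in enumerate(qual):
            if i == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


def quality_lines_from_fastq(text: str) -> List[str]:
    """In unwrapped FASTQ, the 4th line of every 4-line record is the quality string."""
    lines = text.strip().splitlines()
    return lines[3::4]


fastq = """@read1
ACGT
+
IIII
@read2
ACGA
+
II#?
"""
print(mean_quality_per_position(quality_lines_from_fastq(fastq)))
# [40.0, 40.0, 21.0, 35.0]
```

In Phred+33, "I" encodes Q40, "?" encodes Q30, and "#" encodes Q2. The dip at the third position is the kind of signal a per-base quality plot makes visible.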
2) FASTX-Toolkit: Quality control with the FASTX-Toolkit;
~100 million 75/150-nucleotide reads in
< 1 hr (some took up to 19 hours…)
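Quality control of this kind typically discards reads that contain too many low-quality bases. The sketch below mirrors the idea behind FASTX-Toolkit's fastq_quality_filter, which keeps a read only if at least p percent of its bases reach quality q. It is re-implemented in Python for illustration, and the thresholds, helper names, and reads are hypothetical.

```python
from typing import List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)


def passes_quality_filter(quality: str, min_quality: int = 20,
                          min_percent: float = 80.0, offset: int = 33) -> bool:
    """True if at least min_percent of bases have a Phred score >= min_quality."""
    if not quality:
        return False
    good = sum(1 for ch in quality if ord(ch) - offset >= min_quality)
    return 100.0 * good / len(quality) >= min_percent


def filter_reads(reads: List[Read], **kwargs) -> List[Read]:
    """Keep only the reads whose quality string passes the filter."""
    return [r for r in reads if passes_quality_filter(r[2], **kwargs)]


reads = [
    ("@read1", "ACGTACGTAC", "IIIIIIIIII"),  # all bases Q40
    ("@read2", "ACGTACGTAC", "IIIII#####"),  # half Q40, half Q2
]
kept = filter_reads(reads, min_quality=20, min_percent=80.0)
print([header for header, _, _ in kept])  # ['@read1']
```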
3) TopHat: Aligns ~100 million 75/150-nucleotide
(paired-end) reads to a reference genome of
100 million to 5 billion bp in 6–19 hr
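TopHat's spliced alignment is far more involved, and it builds on Bowtie's Burrows-Wheeler index. Still, the basic step of locating a read in a reference can be illustrated with a toy k-mer index: look up the read's first k-mer, then verify the full read at each candidate position. This is a teaching sketch only. It handles exact matches only, with no mismatches or splice junctions, and the reference string and helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(reference: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the reference to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def exact_match_positions(read: str, reference: str,
                          index: Dict[str, List[int]], k: int) -> List[int]:
    """Seed with the read's first k-mer, then check the whole read at each hit."""
    if len(read) < k:
        return []
    hits = index.get(read[:k], [])
    return [pos for pos in hits if reference[pos:pos + len(read)] == read]


reference = "GGATCCATGTTGACTCTAACTCGCATCCGCACTG"  # hypothetical toy reference
k = 4
index = build_kmer_index(reference, k)
print(exact_match_positions("ATGTTGACT", reference, index, k))  # [6]
```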
[Figure: TopHat alignments viewed in JBrowse]
4) Cufflinks: Assembles transcripts and
calculates transcript abundance from BAM files
of 1–12 GB in 6–19 hr
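Cufflinks reports abundance as FPKM (fragments per kilobase of transcript per million mapped fragments). Below is a simplified Python version of that formula with hypothetical counts. Unlike Cufflinks, it uses the plain transcript length and skips the length correction Cufflinks applies.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments.

    Simplified: divides by the plain transcript length (in kb) and the library
    size (in millions of mapped fragments), i.e. fragments * 1e9 / (length * total).
    """
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical: 500 fragments on a 2,000 bp transcript in a library of
# 25 million mapped fragments.
print(fpkm(500, 2_000, 25_000_000))  # 10.0
```

Normalizing by both transcript length and library size allows abundances to be compared across genes and across samples sequenced to different depths.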
5) Cuffdiff: Merges assemblies from Cufflinks
and performs differential expression
analysis on 4–9 samples in 6–19 hr
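Testing thousands of genes at once yields many false positives unless the p-values are corrected. Cuffdiff reports q-values (FDR-adjusted p-values) alongside the raw p-values. A common correction is the Benjamini-Hochberg procedure, sketched below in Python with hypothetical p-values. It illustrates the technique and does not reproduce Cuffdiff's code.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        q = p_values[i] * m / rank
        running_min = min(running_min, q)
        adjusted[i] = running_min
    return adjusted


p = [0.01, 0.04, 0.03, 0.20]
print([round(q, 3) for q in benjamini_hochberg(p)])  # [0.04, 0.053, 0.053, 0.2]
```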
Green Line: Queue time vs. run time
• Requesting a long run time leads to longer queue times
• Requesting a short run time may lead to the job being
terminated
• Users don't like to wait too long
• Users want results right away
• Finding the right balance is not easy
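One way to strike this balance is to base each wall-time request on run times observed for similar jobs, padded by a safety margin and capped at the queue limit. The function name, processing rate, 1.5x safety factor, and 48-hour cap below are all hypothetical. Real limits depend on the XSEDE resource and queue.

```python
import math


def requested_walltime_hours(gb_input: float, hours_per_gb: float,
                             safety_factor: float = 1.5,
                             max_hours: float = 48.0) -> float:
    """Estimate a wall-time request from an observed processing rate.

    Pads the estimate by safety_factor so the job is less likely to be killed,
    rounds up to whole hours, and caps it at the queue maximum so the request
    stays valid (hypothetical policy).
    """
    estimate = gb_input * hours_per_gb * safety_factor
    return float(min(max_hours, math.ceil(estimate)))


# Hypothetical: a 6 GB BAM file, processed at ~1.5 h per GB in earlier runs.
print(requested_walltime_hours(6.0, 1.5))  # 14.0
```

A larger safety factor lowers the risk of termination but lengthens the queue wait, which is exactly the trade-off described above.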
Green Line: Dealing with the unexpected
• Systems taken offline
• Maintenance
• Network outages, data transfer issues
• Glitches in the Science API
• Authentication problems
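Transient failures such as network outages or API glitches are often handled by retrying with exponential backoff. Below is a minimal Python sketch of that approach. The function names and the flaky submission stand-in are hypothetical and are not part of the Agave API. Only ConnectionError and TimeoutError are retried, because an authentication failure usually needs new credentials and retrying it blindly would not help.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(operation: Callable[[], T], max_attempts: int = 5,
                       base_delay: float = 1.0, max_delay: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation(); after a transient failure, wait base_delay * 2**attempt
    (capped at max_delay, plus random jitter) and try again."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller report the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("loop always returns or raises")


calls = {"n": 0}


def flaky_submit() -> str:
    """Hypothetical stand-in for a job submission that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("network outage")
    return "job-accepted"


# Passing a no-op sleep keeps the demonstration instant.
print(retry_with_backoff(flaky_submit, sleep=lambda seconds: None))  # job-accepted
```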
Green Line: “Monitoring XSEDE”
DNA Subway
“Power Desktop”
• Intuitive interface to support a seamless genome
“round trip” for a eukaryote of choice
• Access high-performance computing to analyze
whole-genome data (RNA-Seq, initially)
• Scaffold data to sequenced genomes available in the
iPlant Data Store
• Directly upload RNA-Seq reads as biological evidence
for genome annotation using the Red Line
NSF CCLI Project Retreat
June 8–20, 2014, CSHL
• 11 faculty from primarily undergraduate institutions (PUIs)
• Program included lectures/practical sessions
Wet lab: RNA library prep
Green Line analysis & bioinformatics
Pedagogy/teaching resources
Virtual training materials
Faculty Participants
Agnes Ayme-Southgate
College of Charleston, SC
Judy Brusslan
California State University, Long Beach, CA
Raymond Enke
James Madison University, VA
Shaye Lewis
Prairie View A&M University, TX
Irina Makarevitch
Hamline University, MN
Judith Ogilvie
Saint Louis University, MO
Jeremy Seto
New York City College of Technology, CUNY, NY
Carrie Thurber
Abraham Baldwin Agricultural College, GA
George Ude
Bowie State University, MD
Deirdre Vaden
Prairie View A&M University, TX
Scott Woody
University of Wisconsin, WI
Flight muscle development during life-stage transitions in Apis
mellifera (honeybee)
Leaf development and senescence in Arabidopsis thaliana
Retina development in Gallus gallus
Testes development from juvenile to puberty in caprine (goat)
Response to cold stress in maize
Retinal changes of mice with retinitis pigmentosa
Differentiation of rat pheochromocytoma (PC12) cells to a
neuronal-like phenotype
Seed abscission in Sorghum bicolor
Floral inflorescence genes in banana/plantains
Peripheral blood mononuclear cells from hypertensive rats
treated with captopril
Gibberellic acid exposure in gibberellic acid (gad) mutants
of Brassica rapa (Fast Plants)
Flight muscle development during life-stage
transitions in Apis mellifera (honeybee)
Agnes Ayme-Southgate, College of Charleston, SC
All honeybees begin as worker bees, flying short distances.
Some honeybees transition into foragers, flying long distances.
This transition necessitates major changes in flight muscles.
The goal is to identify gene expression changes in flight muscles
during this transition
Courses
• Biol 322: Developmental Biology, 30–38 students
• Genetics, 100 students
• Undergraduate research in lab, 2–3 students
Differential gene expression in Capra hircus (goat)
testes during juvenile development
Shaye Lewis, Prairie View A&M University, TX
Fertility phenotypes show low heritability, and semen analysis
parameters cannot determine fertility status. Molecular
biomarkers can increase the efficiency of artificial insemination
and embryo transfer in goats. The goal is to identify genes
important for normal testes development and function
Courses
• 4533: Animal Breeding & Genetics, 20 students
• Undergraduate research in lab, 4 students
Understanding transcriptional response to cold
stress in maize
Irina Makarevitch, Hamline University, MN
Maize is grown worldwide and is a staple for >1 billion people.
Maize is thermophilic and sensitive to low temperatures, and
understanding how plants respond to cold can improve yields.
The goal is to identify genes that are differentially expressed when
maize is grown under cold stress
Courses
• Biol 201: Principles of Genetics, 80 students
• Biol 301: Genomics & Bioinformatics, 20 students
• Undergraduate research in lab, 4 students
RNA-Seq Datasets Generated and Analyzed
Using the Green Line of DNA Subway
• 8 eukaryotic organisms
• 21 controls paired with 26
experimental conditions
• 402 Gbases sequenced
• 837 jobs submitted to TACC
• 87% of jobs completed
• 695 hours total CPU time
• 16 threads/processors running
concurrently
Intended Implementation 2014-15
[Chart: planned courses and enrollments by course level, 100 to 500 level]
• Intro Genetics, 270
• Genetics, 220
• Molecular & Cell Biology
• Molecular Biology
• Molecular Applications in Crop Improvement
• Cell & Molecular Biology, 75
• Genomics, 40
• Genomics & Bioinformatics, 70
• Animal Breeding & Genetics, 20
• Developmental Biology, 35
• Independent Research, 5
• Undergrad Research
• Cell Structure & Function, 30
• Synthetic Biology, 30
• Anatomy/Physiology, 50
• Advanced Genetic Techniques, 15
Totals shown: 100s, 320, 550, 140
DNA Subway is…
Producers
Uwe Hilgert
David Micklos
Jason Williams
Designers
Eun-Sook Jeong
Susan Lauter
Programmers
Cornel Ghiban
Mohammed Khalfan
Sheldon McKay
Contributors
Matt Vaughn
Rion Dooley
Anthony Biondo
Jim Burnette
Scott Cain
Ed Lee
Zhenyuan Lu
Advisors
Matt Conte
Carson Holt
Bruce Nash
Oscar Pineda-Catalan
HPC in Undergraduate Biology Education
Banbury Center, CSHL, September 3-5, 2014
Contact Dave Micklos
A Great Gatsby-era estate on
Long Island’s “Gold Coast”
Funded by NSF and the Alfred
P. Sloan Foundation