Bringing Genomics into the Classroom


iPlant Genomics in Education Workshop
Genome Exploration in Your Classroom
Working with Big Data
Challenges: the scope and scale of life sciences data continue to grow
• Big Data: data sets whose size and complexity is beyond
the capabilities of commonly used tools to capture,
manage, and process the data within a tolerable time
frame.
• Big Data: constantly moving target currently ranging from
a few dozen terabytes to many petabytes of data in single
data sets, with different types of data sets potentially
deeply intertwined.
- Wikipedia (http://en.wikipedia.org/wiki/Big_data)
Coming into the Genome Age
For the first time in the history of science students can work
with the same data and tools that are used by researchers.
Learning by posing and answering questions.
Students generate new knowledge.
The iPlant Collaborative
Vision
How can we prepare for science we can’t anticipate?
Enable life science researchers and educators to use and extend iPlant's
foundational cyberinfrastructure to understand and ultimately predict
the complexity of biological systems and their dynamic nature under
various environmental conditions.
The iPlant Collaborative
What is Cyberinfrastructure?
Cyberinfrastructure (CI) is data storage, software, high-performance computing, and people – organized into systems
that solve problems of size and scope that would not otherwise
be solvable.
Platforms, tools, datasets
Storage and compute
Training and support
The iPlant Collaborative
What problems can iPlant Solve?
Crops and model plant systems
Agronomic microbes, insects…
Animal and livestock
iPlant is built for Data
The iPlant Collaborative
How was iPlant built?
The limitations of any training workshop
“I had the feeling I have been exposed to
many bioinformatics tools but I would be
unable to use any of them on my own.”
3. Keep asking questions
Don’t hesitate to ask
“Can iPlant do this?”
• If iPlant can, we’ll help show you how…
• If iPlant can’t, we’ll find the path that gets you what you need
Keep asking at ask.iplantcollaborative.org
Bringing Genomics into the Classroom
Visualization of the Pectobacterium atrosepticum genome
http://www.scri.ac.uk/research/pp/plantpathogengenomics/pathogenbioinformatics
Bringing Genomics into the Classroom
“Essentially, all models are wrong, but some are useful” – George E.P. Box
From This…
Bringing Genomics into the Classroom
• 1866 – Mendel publishes work on inheritance
• 1869 – DNA discovered
• 1915 – Hunt Morgan describes linkage and recombination
• 1953 – Structure of DNA described
• 1956 – Human chromosome number determined
• 1968 – First gene mapped to autosome
• 1977 – Dideoxy sequencing
• 1983 – PCR
• 1986 – Human Genome Project proposed
Bringing Genomics into the Classroom
• 1993 – First MicroRNAs described
• 2003 – First ‘Gold Standard’ human genome sequence
• 2005 – First draft of human haplotype map (HapMap)
• 2007 – ENCODE project
Timeline: Wellcome Trust (http://www.wellcome.ac.uk/stellent/groups/corporatesite/@policy_communications/documents/web_document/wtx063807.pdf)
The Egalitarian Gene
Agarose Gel Electrophoresis, 1973
1958: Matt Meselson & Ultracentrifuge, $500,000
1973: Sharp, Sambrook, Sugden; Gel Electrophoresis Chamber, $250
The Egalitarian Genome
Next Generation Sequencing, 2005
Bacterial colonies
Hundreds of millions of…
PCR colonies (clusters, features)
Bringing Genomics into the Classroom
To This…
Educational Challenge
For the first time in the history of biology students
can work with the same data at the same time and
with the same tools as research scientists.
Research
Education
Context of scientific discovery
Walk or…
…ride
an educational Discovery Environment
iPlant Genomics in Education Workshop
Major Workshop Concepts:
• Biology is becoming a “Data Unlimited” science.
• Genomes are dynamic.
• Genomes are more than just protein coding genes.
• DNA sequence is information.
• Gene annotation adds “meaning” to DNA sequence.
• Biological concepts like “genes” and “species” are continually evolving.
• DNA barcoding bridges molecular genetics, evolution, ecology.
The Problem of Big Data in Biology
The abundance of biological data generated by
high-throughput sequencing creates challenges, as
well as opportunities:
• How do scientists share their data and make it publicly available?
• How do scientists extract maximum value from the datasets they
generate?
• How can students and educators (who will need to come to grips
with data-intensive biology) be brought into the fold?
Bringing Genomics into the Classroom
Majority of genome is transcribed
~50% transposons
~25% protein coding genes/1.3% exons
~23,700 protein coding genes
~160,000 transcripts
Average Gene ~ 36,000 bp
7 exons @ ~ 300 bp
6 introns @ ~5,700 bp
7 alternatively spliced products
(95% of genes)
RefSeq: ~34,600 “reference sequence”
genes (includes pseudogenes, known RNA
genes)
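As a quick check on the average-gene figures above, the exon and intron counts can be added up to see how they give the ~36,000 bp estimate. This is a minimal sketch using only the approximate numbers from the slide. The variable and function names are illustrative and do not come from any iPlant tool.

```python
# Approximate figures taken from the slide above (all values are rough averages).
EXONS_PER_GENE = 7
EXON_LENGTH_BP = 300
INTRONS_PER_GENE = 6
INTRON_LENGTH_BP = 5_700


def gene_length_bp(exons, exon_bp, introns, intron_bp):
    """Return total gene length as exon sequence plus intron sequence."""
    return exons * exon_bp + introns * intron_bp


# Total exon sequence in an average gene.
exon_total_bp = EXONS_PER_GENE * EXON_LENGTH_BP

# Total length of an average gene, exons plus introns.
total_bp = gene_length_bp(
    EXONS_PER_GENE, EXON_LENGTH_BP, INTRONS_PER_GENE, INTRON_LENGTH_BP
)

# Fraction of the average gene that is exon sequence.
exon_fraction = exon_total_bp / total_bp

print(f"Exon sequence per gene: {exon_total_bp} bp")
print(f"Average gene length:    {total_bp} bp")
print(f"Exon share of the gene: {exon_fraction:.1%}")
```

The total comes to 36,300 bp, consistent with the slide's rounded ~36,000 bp. Most of an average gene is intron sequence, which is one way to make "genomes are more than just protein coding genes" concrete for students.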
Using Plants to Explore Genomics
There are a large number of
plant genomes available for
analysis.
Using Plants to Explore Genomics
The “weirdness” of plant genomes
on your dinner plate
[Figure: numbered chromosomes of Brachypodium, Sorghum, and Oryza. Triticum aestivum: allohexaploid.]
Using Plants to Explore Genomics
[Figure: tree of dicots (Glycine max (soy), Arabidopsis) and monocots (Oryza (rice), Avena (oats), Brachypodium, Hordeum (barley), Triticum (wheat), Pennisetum (pearl millet), Zea (maize), Setaria (foxtail millet), Sorghum). Axis: Time (million years), ending at Present. Markers: genome duplication events. Genome sizes shown include Arabidopsis 145 Mb and Oryza (rice) 430 Mb; the other sizes shown range from 270 Mb to >20,000 Mb.]
Using DNA Subway to Explore Genomics