New DNA Subway workshop introduction1


iPlant Genomics in Education Workshop
Genome Exploration in Your Classroom
Major Workshop Concepts:
•Biology is becoming a “Data Unlimited” science.
•Genomes are dynamic.
•Genomes are more than just protein coding genes.
•DNA sequence is information.
•Gene annotation adds “meaning” to DNA sequence.
•Biological concepts like “genes” and “species” are continually evolving.
•DNA barcoding bridges molecular genetics, evolution, ecology.
The Problem of Big Data in Biology
The abundance of biological data generated by high-throughput sequencing creates challenges, as well as opportunities:
•How do scientists share their data and make it publicly available?
•How do scientists extract maximum value from the datasets they generate?
•How can students and educators (who will need to come to grips with data-intensive biology) be brought into the fold?
The iPlant Collaborative
5-10 year project to develop a computer infrastructure to apply computational thinking to solve biological problems
•High performance computing
•Data and data analysis
•Virtual organization
•Learning and workforce
Bringing Genomics into the Classroom
Visualization of the Pectobacterium atrosepticum genome
http://www.scri.ac.uk/research/pp/plantpathogengenomics/pathogenbioinformatics
•1866 – Mendel publishes work on inheritance
•1869 – DNA discovered
•1915 – Hunt Morgan describes linkage and recombination
•1953 – Structure of DNA described
•1956 – Human chromosome number determined
•1968 – First gene mapped to autosome
•1977 – Dideoxy sequencing
•1983 – PCR
•1986 – Human Genome Project proposed
•1993 – First MicroRNAs described
•2003 – First ‘Gold Standard’ human genome sequence
•2005 – First draft of human haplotype map (HapMap)
•2007 – ENCODE project
Timeline: Wellcome Trust (http://www.wellcome.ac.uk/stellent/groups/corporatesite/@policy_communications/documents/web_document/wtx063807.pdf)
“Essentially, all models are wrong, but some are useful” – George E.P. Box
From This…
To This…
Majority of genome is transcribed
~50% transposons
~25% protein coding genes / 1.3% exons
~23,700 protein coding genes
~160,000 transcripts
Average gene ~36,000 bp
7 exons @ ~300 bp
6 introns @ ~5,700 bp
7 alternatively spliced products (95% of genes)
RefSeq: ~34,600 “reference sequence” genes (includes pseudogenes, known RNA genes)
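As a quick consistency check for classroom use, the per-gene figures above can be recombined to see how they relate. This is a minimal sketch, not part of the original slides: the variable and function names are illustrative, and the inputs are the rounded values quoted above, so the results are approximate.

```python
# Rounded per-gene figures quoted in the slide text above.
EXONS_PER_GENE = 7
EXON_LENGTH_BP = 300
INTRONS_PER_GENE = 6
INTRON_LENGTH_BP = 5_700
PROTEIN_CODING_GENES = 23_700
TRANSCRIPTS = 160_000


def gene_length_bp(exons, exon_bp, introns, intron_bp):
    """Total gene length as the sum of exon and intron lengths."""
    return exons * exon_bp + introns * intron_bp


gene_bp = gene_length_bp(
    EXONS_PER_GENE, EXON_LENGTH_BP, INTRONS_PER_GENE, INTRON_LENGTH_BP
)
exonic_bp = EXONS_PER_GENE * EXON_LENGTH_BP
exon_share_of_gene = exonic_bp / gene_bp
transcripts_per_gene = TRANSCRIPTS / PROTEIN_CODING_GENES

print(f"Average gene length: {gene_bp:,} bp")
print(f"Exonic portion of a gene: {exon_share_of_gene:.1%}")
print(f"Transcripts per protein coding gene: {transcripts_per_gene:.1f}")
```

The summed length comes out close to the quoted average gene size, and the ratio of transcripts to protein coding genes is close to the quoted number of alternatively spliced products per gene.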
Using Plants to Explore Genomics
A large number of plant genomes are available for analysis.
“Plant genomes range from simple to exceptionally complex”
– Richard Cronn, USDA Forest Service
It’s this diversity within plant genomes that provides a rich platform for examination of the genome as a phenomenon.
Genlisea margaretae: 63 Mb
Paris japonica: 150 Gb
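To make the scale of that range concrete, the two genome sizes quoted above can be converted to base pairs and compared. This is a minimal sketch, assuming Mb means 10^6 bp and Gb means 10^9 bp; the dictionary and function names are illustrative, not from the workshop materials.

```python
# Unit multipliers (assumed: Mb = 10^6 bp, Gb = 10^9 bp).
UNIT_BP = {"Mb": 10**6, "Gb": 10**9}


def to_bp(size, unit):
    """Convert a genome size in the given unit to base pairs."""
    return size * UNIT_BP[unit]


# The two genome sizes quoted in the slide text above.
genome_sizes_bp = {
    "Genlisea margaretae": to_bp(63, "Mb"),
    "Paris japonica": to_bp(150, "Gb"),
}

smallest = min(genome_sizes_bp.values())
largest = max(genome_sizes_bp.values())
fold_difference = largest / smallest

print(f"Fold difference in genome size: about {fold_difference:,.0f}x")
```

With these two values, the larger genome is more than two thousand times the size of the smaller one.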
The “weirdness” of plant genomes on your dinner plate
[Figure: comparison of numbered chromosomes across Brachypodium, Sorghum, and Oryza; Triticum aestivum labeled as allohexaploid.]
[Figure: phylogeny of dicots (Glycine max (soy), Arabidopsis) and monocots (Oryza (rice), Avena (oats), Brachypodium, Hordeum (barley), Triticum (wheat), Pennisetum (pearl millet), Zea (maize), Setaria (foxtail millet), Sorghum). Axis: Time (million years), ending at Present. Genome duplication events are marked, and genome sizes are labeled, from 145 Mb (Arabidopsis) and 430 Mb (Oryza) up to >20,000 Mb.]
Using Plants to Explore Genomics