DNA Subway - iPlant Pods


Manifestations of a Code
Genes, genomes, bioinformatics and
cyberspace – and the promise they
hold for biology education
The iPlant Collaborative
Vision
Enable life science researchers and educators to
use and extend cyberinfrastructure
www.iPlantCollaborative.org
What is a genome?
A GENOME is all of a living thing’s genetic
material.
The genetic material is DNA
(DeoxyriboNucleic Acid)
DNA, a double helical molecule, is made
up of four nucleotide “letters”:
A (adenine), G (guanine), T (thymine), C (cytosine)
Slide: JGI, 2009
What is sequencing?
Just as computer software is rendered in
long strings of 0s and 1s, the GENOME or
“software” of life is represented by a
string of the four nucleotides, A, G, C, and
T.
To understand the software of either - a
computer or a living organism - we must
know the order, or sequence, of these
informative bits.
Slide: JGI, 2009
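The "string of letters" idea can be tried out directly. Below is a minimal sketch, assuming a sequence is held as a plain Python string; the function name base_composition and the example string are illustrative only and not part of the original slides.

```python
from collections import Counter


def base_composition(sequence: str) -> dict[str, int]:
    """Count how often each of the four nucleotide letters occurs."""
    counts = Counter(sequence.upper())
    # Report all four letters, even those that never occur in the input.
    return {base: counts.get(base, 0) for base in "AGCT"}


if __name__ == "__main__":
    # Short illustrative string (an assumption, not real project data).
    example = "GATTACAGATTACA"
    print(base_composition(example))
```

Treating the sequence as an ordinary string is the simplest possible choice; dedicated bioinformatics libraries offer richer sequence types, but the underlying idea is the same.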
Economics of Scale
[Chart: sequencing cost (cents per base) and sequence production (billions of bases/month) by year, 1989 to 2007, with "Human Genome launched" and "Human Genome completed" marked.]
Slide: JGI, 2009
Important Dates in Genomics
• 1986 DOE announces Human Genome Initiative: $5.3 million to develop technology.
• 1990 DOE & NIH present their HGP plan to Congress.
• 1997 Escherichia coli genome published.
• 1997 Yeast genome published.
• 2000 Fruit fly (Drosophila) genome published.
• 2000 Working draft of the human genome announced.
• 2000 Thale cress (Arabidopsis) genome published (2x).
• 2002 Rice genome published (2x).
• 2003 Human genome published.
• 2006 First tree genome published in Science.
• 2007 First metagenomics study published.
Another angle
Slide: Stein, 2010
Coming into the Genome Age
For the first time in the history of science students can work
with the same data and tools that are used by researchers.
Learning by posing and answering questions.
Students generate new knowledge.
Workshop Objectives
• Illustrate the evolving concept of “gene.”
• Conceptualize a “big picture” of complex, dynamic genomes.
• Guide students to address real problems through modern genome science.
• Use educational and research interfaces for bioinformatics.
• Work with “real” genome sequences gathered by students – in the lab or online.
Exciting?
>mouse_ear_cress_1080
GAAATAATCAATGGAATATGTAGAGGTCTCCTGTACCTTCACAGAGATTCTAGGCTGAGAGCAGTGCATATAGATATCTTT
CGTACTCATCTGCTTTTTCTGGTCTCCATCACAAAAGCCAACTAGGTAATCATATCAATCTCTCTTTACCGTTTACTCGAC
CTTTTCCAATCAGGTGCT TCTGGTGTGTCTACTACTATCAGTTTTAGGTCTTTGTATACCTGATCTTATCTGCTACTG
AGGCTTGTAAAAGTGATTAAAACTGTGACATTTACTCTAAGAGAAGTAACCTGTTTGATGCATTTCCCTAATATACCGGTG
TGGAAAAGTGTAGGTATCTGTACTCAGCTGAAATGGTGGACGATTTTGAAGAAGATGAACTCTCATTGACTGAAAGCGGGT
TGAAGAGTGAAGATGGCGTTATTATCGAGATGAATGTCTCCTGGATGCTTTTATTATCATGTTTGGGAATTTACCAAGGGA
GAGGTATCAGAATCTATCTTAGAAGGTTACATTTAGCTCAAGCTTGCATCAACATCTTTACTTAGAGCTCTACGGGTTTTA
GTGTGTTTGAAGTTTCTTAACTCCTAGTATAATTAGAATCTTCTGCAGCAGACTTTAGAGTTTTGGGATGTAGAGCTAACC
AGAGTCGGTTTGTTTAAACTAGAATCTTTTTATGTAGCAGACTTGTTCAGTACCTGAATACCAGTTTTAAATTACCGTCAG
ATGTTGATCTTGTTGGTAATAATGGAGAAACGGAAGAATAATTAGACGAAACAAACTCTTTAAGAACGTATCTTTCAGTTT
TCCATCACAAATTTTCTTACAAGCTACAAAAATCGAACTATATATAACTGAACCGAATTTAAACCGGAGGGAGGGTTTGAC
TTTGGTCAATCACATTTCCAATGATACCGTCGTTTGGTTTGGGGAAGCCTCGTCGTACAAATACGACGTCGTTTAAGGAAA
GCCCTCCTTAACCCCAGTTATAAGCTCAAAGTTGTACTTGACCTTTTTAAAGAAGCACGAAACGAAAAACCCTAAAATTCC
CAAGCAGAGAAAGAGAGACAGAGCAAGTACAGATTTCAACTAGCTCAAGATGATCATCCCTGTTCGTTGCTTTACTTGTGG
AAAGGTTGATATTTTCCCCTTCGCTTTGGTCTTATTTAGGGTTTTACTCCGTCTTTATAGGGTTTTAGTTACTCCAAATTT
GGCTAAGAAGAGATCTTTACTCTCTGTATTTGACACGAATGTTTTTAATCGGTTGGATACATGTTGGGTCGATTAGAGAAA
TAAAGTATTGAGCTTTACTAAGCTTTCACCTTGTGATTGGTTTAGGTGATTGGAAACAAATGGGATCAGTATCTTGATCTT
CTCCAGCTCGACTACACTGAAGGGTAAGCTTACAATGATTCTCACTTCTTGCTGCTCTAATCATCATACTTTGTGTCAAAA
AGAGAGTAATTGCTTTGCGTTTTAGAGAAATTAGCCCAGATTTCGTATTGGGTCTGTGAAGTTTCATATTAGCTAACACAC
TTCTCTAATTGATAACAGAAGCTATAAAATAGATTTGCTGATGAAGGAGTTAGCTTTTTATAATCTTCTGTGTTTGTGTTT
TACTGTCTGTGTCATTGGAAGAGACTATGTCCTGCCTATATAATCTCTATGTGCCTATCTAGATTTTCTATACAATTGATA
TTTGATAGAAGTAGAAAGTAAGACTTAAGGTCTTTTGATTAGACTTGTGCCCATCTACATGATTCTTATTGGACTAATCAT
TCTTTGTGTGAAAATAGAATACTTTGTCTGAACATGAGAGAATGGTTCATAATACGTGTGAAGTATGGGATTAGTTCAACA
ATTTCGCTATTGGAGAAGCAAACCAAGGGTTAATCGTTTATAGGGTTAAGCTAATGCTCTGCTCTTTATATGTTATTGGAA
CAGACTATTGTTGTGCCTATCTTGTTTAGTTGTAGATTCTATCTCGACTGTTATAAGTATGACTGAAGGCTTGATGACTTA
TGATTCTCTTTACACCTGTAGAAGGATTTAAGCTTGGTGTCTAGATATTCAATCTGTGTTGGTTTTGTCTTTCTTTTGGCT
This better?
Annotation workflow
• Generate mathematical evidence
• Find Gene Families
• Browse in context
• Get DNA sequence
• Build gene models
• Gather biological evidence
• Analyze large data amounts
Walk or…
Early concept (2009)
DNA Subway 2014
Molecular biology and bioinformatics concepts
RepeatMasker
• Eukaryotic genomes contain large amounts of repetitive DNA.
• Transposons can be located anywhere.
• Transposons can mutate like any other DNA sequence.
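To make the idea of repeat masking concrete, here is a toy sketch, assuming a single known repeat unit and exact matching. It is not how RepeatMasker itself works: real tools compare the sequence against curated repeat libraries and tolerate mutated copies. The function name soft_mask is hypothetical.

```python
def soft_mask(sequence: str, repeat: str) -> str:
    """Lowercase every exact occurrence of `repeat`; keep the rest uppercase."""
    sequence = sequence.upper()
    repeat = repeat.upper()
    if not repeat:
        return sequence

    masked = []
    i = 0
    while i < len(sequence):
        if sequence.startswith(repeat, i):
            # Mark the repeat copy in lowercase ("soft masking").
            masked.append(repeat.lower())
            i += len(repeat)
        else:
            masked.append(sequence[i])
            i += 1
    return "".join(masked)


if __name__ == "__main__":
    # Illustrative input only: "AT" stands in for a known repeat unit.
    print(soft_mask("GGATATATCC", "AT"))
```

Because transposons mutate like any other DNA, exact matching would miss most real copies; the sketch only shows what "masking" does to the sequence before gene prediction.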
FGenesH Gene Predictor
• Protein-coding information begins with a start codon, continues through a series of codons, and ends with a stop codon.
• Codons in mRNA (AUG, UAA,…) have sequence equivalents in DNA (ATG, TAA,…).
• Most eukaryotic introns have “canonical splice sites,” GT---AG (mRNA: GU---AG).
• Gene prediction programs search for patterns to predict genes and their structure.
• Different gene prediction programs may predict different genes and/or structures.
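The start/stop pattern described above can be sketched as a naive open reading frame (ORF) search. This is an illustration only, assuming the forward strand and no introns; gene predictors such as FGenesH rely on statistical models and are not implemented this way. The names find_orfs, dna_to_mrna, and min_codons are hypothetical.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def dna_to_mrna(dna: str) -> str:
    """Give the mRNA spelling of a DNA coding sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) positions of ORFs in the three forward reading frames.

    `end` is exclusive and includes the stop codon, so dna[start:end]
    runs from ATG through the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i  # first start codon opens the reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next start codon
    return orfs


if __name__ == "__main__":
    example = "CCATGAAATAAGG"  # made-up sequence for illustration
    for start, end in find_orfs(example):
        print(start, end, example[start:end], dna_to_mrna(example[start:end]))
```

A search like this finds many spurious ORFs in real genomes and ignores splice sites entirely, which is one reason different prediction programs can disagree and why biological evidence is needed.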
Multiple Gene Predictors
• The protein-coding sequence of an mRNA is flanked by untranslated regions (UTRs).
• UTRs hold regulatory information.
BLAST Searches
• Gene or protein homologs share similarities due to common ancestry.
• Biological evidence is needed to curate gene models predicted by computers.
• mRNA transcripts and protein sequence data provide “hard” evidence for genes.
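Sequence similarity, the signal that homology searches rely on, can be illustrated with a simple percent-identity calculation. This is a toy measure, assuming two sequences of equal length that are already aligned; it is not the BLAST algorithm, which finds and scores local alignments. The function name percent_identity is hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    print(percent_identity("ATGCATGC", "ATGAATGC"))
```

High identity between a predicted gene and a known mRNA or protein-coding sequence is the kind of evidence used to support or revise a computer-generated gene model.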
What is a gene?
• Can we define a gene?
• Has the definition of a gene changed?
• How can we find genes?
Views
• Genes as “independent hereditary units” (1866), Mendel
• Genes as “beads on strings” (1926), Morgan
• One gene, one enzyme (1941), Beadle & Tatum
• DNA is the molecule of heredity (1944), Avery
• DNA > RNA > Protein (1953), Crick, Watson, Wilkins
More Views
• Transposons (1940s-50s), McClintock
• Reverse transcription (1970), Temin & Baltimore
• Split genes (1977), Roberts & Sharp
• RNA interference (1998), Fire and Mello
Sequence & course material repository
http://gfx.dnalc.org/files/evidence
Don’t open items; save them to your computer!
• Annotation (sequences & evidence)
• Manuals (DNA Subway, Apollo, JalView)
• Presentations (.ppt files)
• Prospecting (sequences)
• Readings (Bioinformatics tools, splicing, etc.)
• Worksheets (Word docs, handouts, etc.)
• BCR-ABL (temporary; not course-related)