Genomics - California Lutheran University


Genomics
Biology 122
Genes and Development
Genomics milestones
First genome of a free-living organism: Haemophilus influenzae, 1995, by Craig Venter and TIGR
Human genome, draft sequences, 2001: two groups (Francis Collins
of the public consortium; Craig Venter and Celera)
Now: thousands of bacterial genomes have been sequenced. Hundreds of human genomes
have been sequenced!
NCBI, Nov. 2010
From Genome.gov Human genome conference 6/7/2010
Restriction analysis
FISH
Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Reciprocal translocation between one chromosome 9 and one chromosome 22
forms an extra-long chromosome 9 (“der 9”) and the Philadelphia chromosome (Ph1)
containing the fused bcr-abl gene. This is a schematic view representing
metaphase chromosomes.
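Fig. 18.2
a. Schematic: normal chromosome 22 (bcr) and normal chromosome 9 (abl); translocation products Ph1 (fused bcr-abl gene) and der 9.
b. Normal interphase nucleus (bcr on normal 22, abl on normal 9) and interphase nucleus of leukemic cell containing the Philadelphia chromosome (Ph1), showing the fused gene.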
b: Reprinted by permission from Macmillan Publishers Ltd: Bone Marrow Transplantation 33, 247-249, “Secondary Philadelphia chromosome
after non-myeloablative peripheral blood stem cell transplantation for a myelodysplastic syndrome in transformation,” T Prebet, A-S Michallet, C
Charrin, S Hayette, J-P Magaud, A Thiébaut, M Michallet, F E Nicolini © 2004
Sequence-tagged sites (STS)
Comparison of genetic and physical maps
Manual sequencing
Automated DNA sequencing
Estimated genes in sequenced genomes
Transposable elements
Alternative splicing
Genome variation: SNPs

a. SNPs (one SNP in each of the three sequence segments)
Chromosome 1   AACACGCCA   TTCGGGGTC   AGTCGACCG
Chromosome 2   AACACGCCA   TTCGAGGTC   AGTCAACCG
Chromosome 3   AACATGCCA   TTCGGGGTC   AGTCAACCG
Chromosome 4   AACACGCCA   TTCGGGGTC   AGTCGACCG

b. Haplotypes
Haplotype 1   C T C A A A G T A C G G T T C A G G C A
Haplotype 2   T T G A T T G C G C A A C A G T A A T A
Haplotype 3   C C C G A T C T G T G A T A C T G G T G
Haplotype 4   T C G A T T C C G C G G T T C A G A C A

c. Diagnostic SNPs (A/G, T/C, C/G)
Haplotype 1   A T C
Haplotype 2   A C G
Haplotype 3   G T C
Haplotype 4   A C C
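The idea in panels a to c can be sketched in a few lines of Python. This is only an illustration, not the method used by any genotyping project: it scans the four aligned chromosome sequences from panel a for columns where the bases differ (SNPs), then uses three diagnostic SNP columns to name a haplotype. The function names are made up for this sketch, positions are counted from 0, and the tag columns (3, 7, 14) are an assumption: the slide does not mark which columns of panel b are the diagnostic ones, so these were chosen because they reproduce the alleles listed in panel c.

```python
# Panel a: the three sequence segments of each chromosome, joined into one string.
chromosomes = {
    "Chromosome 1": "AACACGCCA" + "TTCGGGGTC" + "AGTCGACCG",
    "Chromosome 2": "AACACGCCA" + "TTCGAGGTC" + "AGTCAACCG",
    "Chromosome 3": "AACATGCCA" + "TTCGGGGTC" + "AGTCAACCG",
    "Chromosome 4": "AACACGCCA" + "TTCGGGGTC" + "AGTCGACCG",
}

# Panel b: the four haplotypes with the spaces removed.
haplotypes = {
    "Haplotype 1": "CTCAAAGTACGGTTCAGGCA",
    "Haplotype 2": "TTGATTGCGCAACAGTAATA",
    "Haplotype 3": "CCCGATCTGTGATACTGGTG",
    "Haplotype 4": "TCGATTCCGCGGTTCAGACA",
}

# Assumed 0-based columns of the diagnostic SNPs (not marked on the slide).
ASSUMED_TAG_COLUMNS = (3, 7, 14)


def find_snps(aligned):
    """Map each variable column (0-based) to the sorted bases seen there."""
    snps = {}
    for position, column in enumerate(zip(*aligned.values())):
        alleles = sorted(set(column))
        if len(alleles) > 1:
            snps[position] = alleles
    return snps


def tag_alleles(sequence, columns=ASSUMED_TAG_COLUMNS):
    """Read only the diagnostic columns of one sequence."""
    return "".join(sequence[i] for i in columns)


# Lookup table from diagnostic alleles to haplotype name.
tag_table = {tag_alleles(seq): name for name, seq in haplotypes.items()}


def call_haplotype(sample_tags):
    """Name the haplotype from its diagnostic alleles alone."""
    return tag_table.get(sample_tags, "unknown")


print(find_snps(chromosomes))
print(tag_table)
print(call_haplotype("GTC"))
```

With the sequences as transcribed, the scan reports three variable columns, one per segment, and the three assumed tag columns give ATC, ACG, GTC, and ACC for haplotypes 1 to 4, matching panel c. Typing three positions is enough to tell the four 20-base haplotypes apart.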
Comparison of plant genomes
(Comparative genomics)
Fig. 18.9. Genomic alignment (segment rearrangement): the rice genome compared with chromosome segments of sugarcane, corn, and wheat.
SCIENTIFIC THINKING
Hypothesis: Flowers and leaves will express some of the same genes.
Prediction: When mRNAs isolated from Arabidopsis flowers and from leaves are used as probes on an Arabidopsis
genome microarray, the two different probe sets will hybridize to both common and unique sequences.
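The logic of this prediction can be written as set operations. The sketch below is illustrative only: the spot names are hypothetical placeholders, not real Arabidopsis genes, and a real microarray analysis would work from signal intensities with a threshold rather than from ready-made lists.

```python
# Hypothetical microarray spots that gave a signal with each probe set.
flower_signal = {"spot_A", "spot_B", "spot_C", "spot_D"}
leaf_signal = {"spot_C", "spot_D", "spot_E"}


def compare_probe_sets(flower, leaf):
    """Split the hybridizing spots into shared and tissue-specific groups."""
    return {
        "common": flower & leaf,
        "flower_only": flower - leaf,
        "leaf_only": leaf - flower,
    }


result = compare_probe_sets(flower_signal, leaf_signal)

# The prediction is met only if some spots are shared and some are unique.
prediction_supported = bool(result["common"]) and bool(
    result["flower_only"] or result["leaf_only"]
)

print(result)
print(prediction_supported)
```

If the two tissues shared no spots, or if every spot were shared, the prediction as stated would not be met.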
Genomes deposited at NCBI
Genome sequencing projects statistics

Organism        Complete   Draft assembly   In progress   Total
Prokaryotes          850              585           534    1969
  Archaea             78                5            32     115
  Bacteria           773              580           502    1855
Eukaryotes            39              249           320     608
  Animals              6              110           159     275
    Mammals            3               37            81     121
Birds
3
12
15
Fishes
13
12
25
26
20
48
2
3
5
13
12
26
Insects
2
Flatworms
Roundworms
1
Amphibians
1
1
Reptiles
1
Other animals
16
22
38
1
Organism              Complete   Draft assembly   In progress   Total
  Plants                     7               23            78     108
    Land plants              4               19            73      96
    Green Algae              3                4             4      11
  Fungi                     16               83            39     138
    Ascomycetes             13               63            28     104
    Basidiomycetes           1               12             8      21
    Other fungi              2                8             3      13
  Protists                  10               31            40      81
    Apicomplexans            5               10             4      19
    Kinetoplasts             4                1             3       8
    Other protists           1               19            33      53
Total                      889              834           854    2577
Revised: Nov 18, 2010
GOLD (Genomes Online Database)

              Complete   Incomplete   Targeted
Bacterial         2666         5493        424
Archaeal           149          182          1
Eukaryotic         166         2037         13

Metagenome studies   340
Metagenome samples   1930
[Metagenomes are environmental samples]

Finished                          1960
Permanent draft                   1021
Complete, not published             26
Draft                             1529
In progress                       3426
DNA received                       266
Awaiting DNA                       510
Targeted (funded, not started)     438
Date: 11/23/2011
NCBI, Genomes
Column headings: Species; Reference sequences; In progress
Viroids: 41; 41
Viruses: 2721; 3933
Bacterial: 1681; 5140
Archaeal: 121; 90
Eukaryotes: 1815
Organelles: 2974
Date: 11/23/2011
Human Disease genes
From Genome.gov, 11-2010
Animals
Amphipod Crustacean
Aphid, Pea
Beetle, Red Flour
Bug (Chagas' Vector)
Centipede, Geophilomorph
Chelicerate (Horseshoe Crab)
Drug Resistant Parasitic Nematode
Freshwater Polyp
Fruit Fly
Honey Bee
Louse, Body
Mosquito
Placozoan
Planarian
Roundworm
Sand Fly
Sea Slug
Sea Squirt
Sea Star
Sea Urchin
Snail, Freshwater
Strongylid Nematode
Tardigrade
Wasp, Parasitoid
Worm, Acorn
Worm, Priapulid

Vertebrates
Chicken
Coelacanth
Gar, Spotted
Hagfish
Lamprey, Sea
Lizard, Anole
Pufferfish
Shark, Elephant
Skate
Spotted African Lungfish
Stickleback, Threespine
Turtle, Painted
Zebra finch

Animal genomes in progress, November 2011 (Genome.gov, 11/22/2011)
Mammals
Aardvark
Alpaca
Armadillo, Nine-banded
Baboon
Bat, Little Brown (Microbat)
Bat, Big brown
Bonobo
Bushbaby
Bushbaby/Galago
California leaf-nosed bat
Cape golden mole
Cat
Chimpanzee
Chinchilla
Chinese hamster
Cow
Crested porcupine
Degu
Dog
Dolphin
Eastern grey kangaroo
Elephant, African Savannah
Ferret
Flying Fox (Megabat)
Giant anteater
Gibbon
Golden-mantled howling monkey
Greater horseshoe bat
Guinea Pig
Hedgehog, European
Hippopotamus
Honey Possum (Noolbenger)
Horse
Human
Hyrax
Koala
Lemur, Flying
Lemur, Mouse
Lesser Egyptian jerboa
Lizard, Anole
Llama
Long-haired (Rufous) elephant shrew
Macaque, Cynomolgus
Macaque, Pigtail
Macaque, Rhesus
Macaque, Rhesus (Chinese population)
Malayan tapir
Mangabey, Sooty
Marmoset
Mexican free-tailed bat
Mole
Monkey, Squirrel
Mouse
Mouse, Deer
Mouse, White-Footed
Naked mole rat
North American porcupine
Opossum, Gray Short-Tailed
Opossum, Laboratory
Orangutan
Pangolin
Pika
Platypus, Duck-Billed
Rabbit
Rat
Rat, Kangaroo
Rhesus Macaque
Ring-tailed lemur
Shrew, Elephant
Shrew, European Common
Shrew, Tree
Sloth
Springhare
Squirrel
Star-nosed mole
Stickleback, Threespine
Syrian/Golden Hamster
Tarsier
Tenrec (Lesser Hedgehog)
Vervet
Vole, Prairie
Wallaby, Tammar
Water Chevrotain
Weddell Seal
West Indian manatee
White rhinoceros
Mammal genomes in progress, November 2011 (genome.gov)
Neanderthals
Science Nov 17, 2006
Neanderthals
• 99.5% identical to humans when comparing the
same sequences
Neanderthals
Draft sequence published May 7, 2010.
Neanderthals from four sites (see map)
21 bones from Vindija analyzed for this study
3 bones were selected for detailed sequencing
(from three individuals)
Bones from three other sites were also
sequenced (see map)
Compared Neanderthal to five human genomes
Conclusion:
Non-African humans contain some
Neanderthal derived sequences (1 to 4%)
(gene flow estimated to be Neanderthal to Human,
and occurred > 45,000 years ago)
Notes:
Humans and Neanderthals lived in the same area for > 10,000 years.
Neanderthals perished 30,000 years ago.
Neanderthals
Four models of how the gene
transfer could have occurred
(option 2 is least likely,
option 3 most likely)
Transfer most likely occurred in
Middle-East/Western Asia
PNG = Papua New Guinea
Denisovans
Third type of human genome sequenced
Finger bone found in Denisova Cave in Altai Krai, Russia, in 2008
The Denisova bone had a genome distinct from those of modern humans and
Neanderthals
The bone was dated to 41,000 years ago
Since only bone fragments are known, it is not known what Denisovans looked like
It is thought that they were distributed throughout Asia and Melanesia
Analysis of the genome, and comparison with humans and Neanderthals, suggests
that 1 to 4% of non-African DNA is related to Neanderthals and 4 to 6% of Melanesian
genomes is related to Denisovans. This suggests some interbreeding among
the first modern humans, Neanderthals, and Denisovans.
Analysis of HLA types (immune proteins) suggests that over half of Eurasian HLA
types came from Neanderthals or Denisovans, suggesting that they were selected
for in Eurasians.
Watson’s genome
• Sequenced using shotgun
sequencing
• About 3.5 percent of
Watson’s genome could not
be matched to the reference
genome, probably due to
differences in the cloning step
Venter’s genome compared to the reference
genome
• 32 million reads resulted in 2.8 billion base
pairs of assembled sequence (7.5 fold
coverage)
• 4.1 million differences to the already
published genome (12.3 million bases
different)
• 3,213,401 single nucleotide polymorphisms
(SNPs), 53,823 block substitutions (2-206
bp), 292,102 heterozygous
insertion/deletion events (indels)(1-571 bp),
559,473 homozygous indels (1-82,711 bp),
90 inversions, as well as numerous
segmental duplications and copy number
variation regions.
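The figures on this slide can be cross-checked with simple arithmetic. The sketch below assumes the usual relation coverage = (number of reads × mean read length) / sequence length and solves it for the read length, which the slide does not state. It also assumes that the 7.5-fold coverage is measured against the 2.8 billion assembled bases. The function name is made up for illustration.

```python
def implied_mean_read_length(n_reads, assembled_bp, fold_coverage):
    """Solve coverage = n_reads * read_length / assembled_bp for read_length."""
    return fold_coverage * assembled_bp / n_reads


reads = 32e6          # 32 million reads
assembled_bp = 2.8e9  # 2.8 billion assembled base pairs
coverage = 7.5        # 7.5-fold coverage

read_length = implied_mean_read_length(reads, assembled_bp, coverage)

# Variant counts as listed on the slide.
variant_events = {
    "SNPs": 3_213_401,
    "block substitutions": 53_823,
    "heterozygous indels": 292_102,
    "homozygous indels": 559_473,
    "inversions": 90,
}
total_events = sum(variant_events.values())

# Share of the assembled sequence that differs from the published genome.
fraction_bases_different = 12.3e6 / assembled_bp

print(f"implied mean read length: {read_length:.0f} bp")
print(f"total variant events: {total_events:,}")
print(f"bases different: {fraction_bases_different:.2%}")
```

Under these assumptions the implied mean read length is about 656 bp, the listed events add up to 4,118,889 (the "4.1 million differences"), and the 12.3 million differing bases are about 0.44% of the assembled sequence.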
How different are
individuals?
• 44% of genes were heterozygous
for one or more variants (they could
determine both copies)
• A conservative estimate that a
minimum of 0.5% variation exists
between two haploid genomes (all
heterozygous bases).
How different are individuals?
• The genome sequence of a Yoruba individual from Ibadan, Nigeria, was
completed.
• About 4 million SNPs were found; 74% had
already been found by others.
• About 24% more polymorphism (heterozygosity)
than Caucasian genomes.
• There were 5,704 indels ranging from 50 to over
35,000 bp long. Many were SINEs and LINEs.
Bentley et al., Nature,
November 6, 2008
How different are individuals?
• The genome sequence of a Han Chinese individual was completed.
• About 3 million SNPs were found; 86% had already been
found by others.
• About 24% more polymorphism (heterozygosity) than
Caucasian genomes.
• There were 2,682 structural variations, including
insertions, deletions, and inversions. Many variations in
SINEs and LINEs were found.
Wang et al., Nature,
November 6, 2008
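The percentages for the Yoruba and Han Chinese genomes above translate into rough counts of previously unknown SNPs. This is a back-of-the-envelope sketch using the rounded numbers from the slides ("about 4 million", "about 3 million"), so the results are approximate. The function name is made up for illustration.

```python
def novel_snps(total_snps, percent_already_known):
    """Estimate how many SNPs had not been reported before."""
    return total_snps * (100 - percent_already_known) // 100


yoruba_novel = novel_snps(4_000_000, 74)
han_novel = novel_snps(3_000_000, 86)

print(f"Yoruba genome: about {yoruba_novel:,} novel SNPs")
print(f"Han Chinese genome: about {han_novel:,} novel SNPs")
```

On these rounded inputs, roughly 1.04 million SNPs in the Yoruba genome and 0.42 million in the Han Chinese genome were new at the time.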
How different are cancer cells?
• DNA from skin cells and acute myeloid leukemia cells
from the same Caucasian woman were sequenced.
• About 2.9 million SNPs were found in the skin cells,
and 3.8 million in the leukemia cells.
• Almost all of the differences in SNPs were found to be
common in other sequenced genomes or not in genes.
• Ten genes were found to have acquired mutations in
the leukemia cells. Of these, two were known to be
involved in tumour progression. The functions of the
other eight mutant genes are unknown.
Ley et al., Nature,
November 6, 2008
Metabolomics
• A study of 284 males compared 383 metabolic
indicators with SNPs (genetic variants).
• Up to 12% of the variation in the levels of the metabolic
molecules could be explained by particular
versions (SNPs) of a gene.
• Four genes were known to be in metabolic
pathways related to the metabolic molecule that
was high or low.
Gieger et al., PLoS
Genetics. November, 2008
Woolly mammoth
• Over 4 billion bp in genome
• Mammoths and African elephants differ in
about 1 amino acid per protein
• Mammoths and African elephants are estimated
to have separated 1.5 to 2.0 million years
ago
Nature, November 20, 2008
Woolly mammoth
Recent genome news
Nov 19, 2011
Malaysian Genomics Resource Centre Berhad (MGRC) today announced
that it has successfully completed its 100th human genome from a diverse
mix of Malaysian, European and Australian individuals.
The data generated from these genomes have helped efforts
to identify and compare highly represented patterns of common and
clinically relevant genetic variations within Malaysian and other populations,
and to establish robust bioinformatics protocols for the reference-based
analysis of genomic information.
Recent genome news
Nov 23, 2011
A study of 11,000 children and adults found that very short people (the
lowest 2.5% of the population) are missing more genes or parts of genes
than taller people.
Recent genome news
November, 2011
The mythical "$1,000 genome" is almost upon us (in 2012), said Jonathan
Rothberg, CEO of sequencing technology company Ion Torrent, at MIT's
Emerging Technology conference.
November 2, 2011
Duke University said last week that it will sequence 4,000 individuals as
part of a collaborative, $25 million effort to identify as many genes as
possible implicated in epilepsy.
Maize (corn) genome
Maize has 10 chromosomes and 2.3 billion base pairs.
The sequencing was done using a clone-by-clone method,
with 16,848 BACs sequenced, assembled, and analyzed.
There are estimated to be 32,500 protein-coding genes
and 150 microRNA (miRNA) genes.
Approximately 75% of the genome is repeated DNA.
It has over 400 families of LTR retrotransposons with over
31,000 different sequences.
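A few derived quantities help put the maize numbers in proportion. The sketch below uses only the figures quoted above. The "span per BAC" is a simple average that ignores overlap between neighboring BACs, which is an assumption of this sketch, so it understates the true insert size.

```python
genome_bp = 2.3e9        # 2.3 billion base pairs
bacs = 16_848            # BACs sequenced
protein_genes = 32_500   # estimated protein-coding genes
repeat_fraction = 0.75   # approximate share of repeated DNA

# Average unique genome span per BAC, ignoring overlaps between BACs.
span_per_bac = genome_bp / bacs

# Protein-coding genes per million base pairs.
genes_per_mb = protein_genes / (genome_bp / 1e6)

# Base pairs left once the repeated DNA is set aside.
nonrepeat_bp = genome_bp * (1 - repeat_fraction)

print(f"average span per BAC: {span_per_bac:,.0f} bp")
print(f"gene density: {genes_per_mb:.1f} genes per Mb")
print(f"non-repeated DNA: {nonrepeat_bp:,.0f} bp")
```

This gives roughly 137,000 bp per BAC, about 14 protein-coding genes per megabase, and about 575 million base pairs of non-repeated DNA.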
Fig. 1 The maize B73 reference genome (B73 RefGen_v1): Concentric circles show aspects of the
genome
P. S. Schnable et al., Science 326,
1112-1115 (2009)
1000 Genomes project
The 1000 Genomes Project is an international collaboration to produce an
extensive public catalog of human genetic variation, including SNPs and
structural variants, and their haplotype contexts.
This resource will support genome-wide association studies and other medical
research studies.
The genomes of about 2500 unidentified people from about 27 populations
around the world will be sequenced using next-generation sequencing
technologies.
Highlights
Over 4.9 trillion nucleotides sequenced
Over 800 individuals (179 people had their whole genomes sequenced
and 697 people just the protein-coding regions)
Each child had around 60 mutations in its genome that did not exist in
either parent
Over 15 million SNPs discovered
Each individual carries a significant number of deleterious mutations,
maybe 250 to 300 genes that have defective copies
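The figure of about 60 new mutations per child implies a per-base mutation rate. The sketch below is a rough estimate only. It takes the human genome as about 3 billion letters, the figure the project's own summary uses, and assumes the 60 mutations are spread over both inherited copies, which is about 6 billion bases.

```python
new_mutations_per_child = 60
haploid_genome_bp = 3e9                # about 3 billion DNA letters
diploid_bp = 2 * haploid_genome_bp     # one copy from each parent

# New mutations per base pair per generation.
mutation_rate = new_mutations_per_child / diploid_bp

print(f"about {mutation_rate:.1e} new mutations per base pair per generation")
```

Under these assumptions the rate comes out near 1 in 100 million bases per generation.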
1000 Genomes project
http://www.1000genomes.org/home
3 billion: Number of DNA letters in the human genome (200 volumes the size
of a Manhattan telephone book, which has around 1,000 pages)
20,000-25,000: Number of genes in the genome (though not all scientists agree)
2000: Year the first draft of the human genome was announced to much
fanfare at the Clinton White House
2003: Final draft completed to 99.99% accuracy
2500: Number of people whose genomes the 1000 Genomes Project hopes
to sequence, from 25 populations
15 million: Number of single-letter changes identified in the pilot phase
1 million: Number of small insertions and deletions identified in the pilot phase
4.9 trillion: Number of letters of data sequenced by the 1000 Genomes
Project so far
1094: Genomes completed (for 1094 individuals), 6/23/11
Human microbiome
Adults harbor ten times more microbial cells than they have human cells.
Examining how these microbes affect human health through their
association with the body, for example by influencing metabolism,
disease susceptibility, and drug response, is key to improving human health.
In September 2005, through the Comparative Genome Evolution (CGE) program,
NHGRI approved a limited project,
Sequencing of Cultivable Microbes from Human Gut,
to obtain reference genome sequence data from up to 300 cultured bacteria
and archaea sampled from the human digestive tract and urogenital tract.
The objective is threefold: to start to generate reference data for future
large-scale metagenomics studies; to understand the diversity of bacterial
pangenomes; and to start to address the technical and bioinformatic
challenges that human metagenomics research will encounter.
From Genome.gov, 11-2010
Scientists propose a "genome zoo" of 10,000 vertebrate species
November 03, 2009
By Branwyn Wagman, Guest Writer
Scientists involved in the Genome 10K Project are assembling specimens of thousands of
animals spanning a broad range of evolutionary diversity.
Photos courtesy of San Diego Zoo.
From http://news.ucsc.edu/2009/11/3333.html
10,000 vertebrate genomes
In the most comprehensive study of animal evolution ever attempted, an
international consortium of scientists plans to assemble a genomic zoo--a
collection of DNA sequences for 10,000 vertebrate species, approximately
one for every vertebrate genus.
Known as the Genome 10K Project, it involves gathering specimens of
thousands of animals from zoos, museums, and university collections
throughout the world, and then sequencing the genome of each species to
reveal its complete genetic heritage.
Launched in April 2009 at a three-day meeting at the University of California,
Santa Cruz, the project now involves more than 68 scientists. Calling
themselves the Genome 10K Community of Scientists (G10KCOS), the
group outlined its proposal to create a collection of tissue and DNA specimens
for the project in a paper to be published online November 5 in the
Journal of Heredity.