Human Genome Project

• Seminal achievement.
• Scientific milestone.
• Scientific implications.
• Social implications.
HGP: Background
• International Human Genome Sequencing Consortium:
– Proposed 1985, endorsed in 1988.
– 20 governmental groups.
– “Public project.”
• Craig Venter & Celera Genomics:
– Founded 1998.
– Aimed to sequence the genome in 3 years.
– Technology: automation, computers.
– Had access to the public project’s data.
Race ends in a tie in Feb. 2001: the public consortium publishes in Nature and Celera in Science.
International Human Genome
Sequencing Consortium
• Approach was conservative and methodical.
• Had to wait for technology.
• First produced a clone-based physical map of the genome that
would serve as a scaffold for the later sequence data:
– Broke the genome into chunks of DNA whose positions on the chromosomes were known from maps, and cloned them into bacteria using BACs (bacterial artificial chromosomes).
– Digested each BAC clone's DNA insert into small fragments.
– Sequenced the small fragments.
– Stitched the overlapping fragment sequences together to assemble each BAC clone's sequence.
– Assembled the genome sequence from the BAC clone sequences, using the clone-based physical map.
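The consortium's key idea, that each BAC clone's position is already known from the physical map, can be sketched in Python. This is a simplified illustration, not the consortium's actual software: it assumes each clone's sequence is already assembled and error-free and that its start coordinate on the chromosome is known exactly. `assemble_from_map` is a hypothetical name.

```python
from typing import List, Tuple


def assemble_from_map(clones: List[Tuple[int, str]]) -> str:
    """Place assembled BAC clone sequences at their mapped start coordinates.

    Assumption (illustrative only): each tuple is (start position on the
    chromosome, error-free clone sequence), so overlapping clones agree.
    """
    genome = ""
    for start, seq in sorted(clones):  # walk the clones in map order
        if start > len(genome):
            # The map leaves a gap no clone covers: mark it with unknown bases.
            genome += "N" * (start - len(genome))
        # Bases already placed by earlier clones are kept; only the part of
        # this clone that extends past the current end is appended.
        already_placed = len(genome) - start
        genome += seq[already_placed:]
    return genome


chromosome = "ACGTACGGATCCTTAGGCATCGA"
clones = [(8, chromosome[8:18]), (0, chromosome[0:10]), (15, chromosome[15:23])]
print(assemble_from_map(clones) == chromosome)        # True
print(assemble_from_map([(0, "ACGT"), (6, "TTAA")]))  # ACGTNNTTAA
```

Because positions come from the map, no search for overlapping sequence is needed at this stage; the cost is that the map must be built first.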
Celera
• Uses whole-genome "shotgun sequencing" (no organized map).
• Shreds genome randomly into small
fragments with no idea of where they are
physically located.
• Clones and sequences fragments.
• Uses computer to stitch together genome by
matching overlapping ends of sequenced
fragments.
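Stitching a genome together by matching overlapping ends can be illustrated with a greedy overlap merger in Python. This is a textbook simplification chosen for clarity, not Celera's actual assembler, which also had to cope with sequencing errors and repeated sequence; the function names and the `min_len` threshold are hypothetical.

```python
from itertools import permutations
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap is at least `min_len` bases long.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: List[str], min_len: int = 3) -> List[str]:
    """Merge reads greedily, always joining the pair with the longest overlap.

    Returns the resulting contigs; more than one contig means gaps remain.
    """
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    # Drop reads contained entirely inside another read; they add nothing.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_pair = 0, None
        for i, j in permutations(range(len(contigs)), 2):
            k = overlap(contigs[i], contigs[j], min_len)
            if k > best_k:
                best_k, best_pair = k, (i, j)
        if best_pair is None:
            break  # no remaining overlaps: contigs are separated by gaps
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]
    return contigs


reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```

Greedy merging is easy to follow but can misassemble reads drawn from repeats, which is one reason production assemblers use more careful overlap and layout strategies.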
Timeline
• Genome sequencing was driven by technology:
– 1985: 500 base pairs per day by hand.
– 1985-86: PCR and automated DNA sequencing.
– 1992: BACs.
– 2000: 1000 bases per second.
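The throughput figures above can be put in perspective with a little arithmetic. This Python sketch assumes the ~3.2 Gb genome size given under Current Status and a single error-free pass over each base; real projects read every base several times over, so actual times were much longer.

```python
GENOME_BP = 3.2e9             # ~3.2 Gb human genome (from "Current Status")
MANUAL_BP_PER_DAY = 500       # 1985, sequencing by hand
AUTOMATED_BP_PER_SEC = 1000   # 2000, automated sequencing
SECONDS_PER_DAY = 24 * 60 * 60

automated_bp_per_day = AUTOMATED_BP_PER_SEC * SECONDS_PER_DAY  # 86,400,000 bases/day
speedup = automated_bp_per_day / MANUAL_BP_PER_DAY            # 172,800-fold

# Time for one pass over the genome at each rate (no redundancy, no errors).
manual_years = GENOME_BP / MANUAL_BP_PER_DAY / 365
automated_days = GENOME_BP / automated_bp_per_day

print(f"Speedup: {speedup:,.0f}x")                    # Speedup: 172,800x
print(f"By hand: about {manual_years:,.0f} years")    # By hand: about 17,534 years
print(f"Automated: about {automated_days:.0f} days")  # Automated: about 37 days
```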
Waiting for Technology
• Eyes on the human genome.
• While waiting for technology, other genomes were sequenced.
Current Status
• Human genome ~3.2 Gb.
• “Rough draft” sequence of the human
genome.
• Have sequenced 90% of the 2.5 Gb of gene-rich (euchromatic) DNA.
• What is considered finished?
– Fewer than 1 base in 10,000 is incorrectly
assigned.
– More than 95% of the euchromatic regions are
assigned.
– Each gap is smaller than 150 kb.
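The three finishing criteria amount to a simple pass/fail test. In this Python sketch the `AssemblyStats` fields and the `is_finished` function are hypothetical, and the example numbers are illustrative rather than real assembly statistics; only the thresholds (1 error in 10,000 bases, 95% of euchromatin, 150 kb gaps) come from the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssemblyStats:
    """Hypothetical summary of an assembly, for illustration only."""
    error_rate: float            # fraction of bases incorrectly assigned
    euchromatin_assigned: float  # fraction of euchromatic sequence assigned
    gap_sizes_bp: List[int] = field(default_factory=list)  # each remaining gap, in bp


def is_finished(stats: AssemblyStats) -> bool:
    """Apply the three finishing thresholds listed on the slide."""
    return (
        stats.error_rate < 1 / 10_000
        and stats.euchromatin_assigned > 0.95
        and all(gap < 150_000 for gap in stats.gap_sizes_bp)
    )


# Illustrative numbers, not real assembly statistics.
draft = AssemblyStats(error_rate=1e-5, euchromatin_assigned=0.90, gap_sizes_bp=[50_000])
finished = AssemblyStats(error_rate=1e-5, euchromatin_assigned=0.98, gap_sizes_bp=[20_000, 120_000])
print(is_finished(draft))     # False: only 90% assigned
print(is_finished(finished))  # True
```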
Access to Information
• All public project data on the Internet.
• NCBI Website: www.ncbi.nlm.nih.gov.
– Human genome database.
– Sequence and mapping tools.
Database Search Example
• The genome database has many tools to locate a
gene of interest or search for potential traits of the
gene.
• Example–chromosomal map search result for the
"breast cancer–causing gene" BRCA2:
Early Statistics
• Only 28% is transcribed into RNA.
• Only 1.1%-1.4% of the genome actually encodes protein (roughly 4-5% of the transcribed RNA).
• Surprises:
– More junk DNA than expected.
– Fewer genes than expected.
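The share of transcribed sequence that encodes protein follows from the two percentages above. A quick Python check, assuming both percentages are fractions of the whole ~3.2 Gb genome (the function name is hypothetical):

```python
GENOME_BP = 3.2e9            # ~3.2 Gb (from "Current Status")
TRANSCRIBED_FRACTION = 0.28  # share of the genome transcribed into RNA


def coding_share_of_transcribed(coding_fraction: float) -> float:
    """Fraction of transcribed sequence that encodes protein."""
    return coding_fraction / TRANSCRIBED_FRACTION


for coding_fraction in (0.011, 0.014):
    coding_mb = coding_fraction * GENOME_BP / 1e6
    share = coding_share_of_transcribed(coding_fraction)
    print(f"{coding_fraction:.1%} of genome = {coding_mb:.0f} Mb = {share:.1%} of transcribed sequence")

# 1.1% of genome = 35 Mb = 3.9% of transcribed sequence
# 1.4% of genome = 45 Mb = 5.0% of transcribed sequence
```

The commonly quoted "5%" therefore corresponds to the upper end of the 1.1%-1.4% range.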
Junk DNA
• No apparent direct biological function.
• Long stretches of repeated sequence.
• Hot area of investigation.
• The human genome has far more repeat DNA than any other sequenced organism (over half).
• Parasitic elements: most of this repeat DNA (about 45% of the genome) comes from selfish, parasitic DNA:
– Transposable elements.
– May play role in evolution.
Gene Count
• Many fewer genes than expected (less than half):
– Only 35,000-45,000 genes vs. the previously predicted 100,000.
– Only about twice as many as a nematode or a fruit fly.
– Gene count does not track organismal complexity.
– Alternative splicing: vertebrate genes are more innovative in how their transcripts are assembled.
– Protein domains are mixed more creatively and in larger numbers by vertebrates.
• Genes are elusive: hard to pinpoint in the raw sequence.
Genetic Variation
• The International Single Nucleotide
Polymorphism (SNP) Map.
– Compiled 1.4 million SNPs (single-base pair
differences between individuals).
• Investigate:
– Disease resistance.
– Response to therapeutics.
– Evolution.
– Natural selection.
– Individual traits.
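A SNP, as defined above, is a single-base difference between individuals. Given two sequences already aligned base-for-base, listing those differences is straightforward; this Python sketch assumes no insertions or deletions and ignores sequencing errors, and `find_snps` is a hypothetical name. Real SNP discovery compares many individuals and must separate true variants from read errors.

```python
from typing import List, Tuple


def find_snps(reference: str, sample: str) -> List[Tuple[int, str, str]]:
    """List single-base differences between two aligned sequences.

    Assumes the sequences are aligned base-for-base (same length, no
    insertions or deletions). Returns (0-based position, reference base,
    sample base) for every mismatch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


print(find_snps("ACGTACGT", "ACGAACGG"))  # [(3, 'T', 'A'), (7, 'T', 'G')]
```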
Gene Variation Example
• Mutations in the "breast cancer gene" BRCA2.
• Chromosomal location and beginning sequence with
one of the mapped variations.
Future Directions
• Fill gaps (refinement).
• Bioinformatics.
• Sequence additional genomes:
– For comparison.
– Upcoming: mouse, fish, dogs, kangaroo, chimpanzee (most valuable).
• Proteomics.
• Gene and Protein Chips
(Microarrays).