Untitled slide


DNA sequence
1977
Allan Maxam and Walter Gilbert
(pictured) at Harvard University and
Frederick Sanger at
the U.K. Medical Research Council (MRC)
independently develop methods for
sequencing DNA
(PNAS, February; PNAS, December).
The Nobel Prize in Chemistry 1980
Paul Berg, 1/2 of the prize
"for his fundamental studies of the biochemistry of nucleic acids, with particular regard to recombinant-DNA"
Stanford University, Stanford, CA, USA
USA, b. 1926
Walter Gilbert and Frederick Sanger, 1/4 of the prize each
"for their contributions concerning the determination of base sequences in nucleic acids"
Walter Gilbert: Biological Laboratories, Cambridge, MA, USA; USA, b. 1932
Frederick Sanger: MRC Laboratory of Molecular Biology, Cambridge, United Kingdom; United Kingdom, b. 1918
The Nobel Prize in Chemistry 1958
"for his work on the structure of proteins, especially that of insulin"
Frederick Sanger
United Kingdom
University of Cambridge
Cambridge, United Kingdom
b. 1918
The Maxam and Gilbert method is based on the chemical
degradation of DNA at specific bases;
it is no longer in use for DNA sequencing.
The Sanger method, the only one used now, is also
known as the enzymatic method because it uses
DNA polymerase, an enzyme that synthesizes a
daughter strand(s) of DNA
(under direction from a DNA template).
Parental strands of DNA
are the two
complementary strands
of duplex DNA before
replication.
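Template-directed synthesis can be illustrated with a minimal sketch. It only applies the base-pairing rules to a template string; the function name synthesize_daughter and the example sequence are assumptions made for illustration, and nothing here models the enzyme itself.

```python
# Watson-Crick pairing rules used to copy a template strand.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesize_daughter(template: str) -> str:
    """Return the daughter strand (written 5'->3') for a template written 5'->3'."""
    # The daughter strand is antiparallel to its template, so the paired
    # bases are read in reverse order.
    return "".join(PAIR[base] for base in reversed(template.upper()))


parental_top = "ATGGCC"  # hypothetical example sequence
parental_bottom = synthesize_daughter(parental_top)
print(parental_top, parental_bottom)
```

Copying the daughter strand again returns the original sequence, which is what makes the two parental strands of a duplex complementary.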
1986 (June) Leroy Hood (pictured) and Lloyd
Smith of the California Institute of Technology
(Caltech) and colleagues announce the first
automated DNA sequencing machine (Nature)
1991 (June) NIH biologist J. Craig Venter
announces a strategy to find expressed
genes, using ESTs (Science). A fight
erupts at a congressional hearing 1 month
later, when Venter reveals that NIH is filing
patent applications on thousands of these
partial genes.
1992 (June) Venter leaves NIH to set up The
Institute for Genomic Research (TIGR), a
nonprofit in Rockville, Maryland. William
Haseltine heads its sister company, Human
Genome Sciences, to commercialize TIGR
products.
There are two approaches for sequencing large
repeat-rich genomes:
The first is the whole-genome shotgun
sequencing approach, which has been used for the repeat-poor
genomes of viruses, bacteria and flies; it uses linking
information and computational analysis to attempt to avoid
misassemblies. This is the Celera approach.
The second is the 'hierarchical shotgun sequencing' approach,
also referred to as 'map-based', 'BAC-based' or 'clone-by-clone'. It was performed by a collaboration involving 20 groups from the
United States, the United Kingdom, Japan, France, Germany
and China to produce a draft sequence of the human genome.
This is the HGP approach:
cloned DNA is fluorescently labelled,
and overlapping clones are ordered into contiguous sets (contigs).
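The idea shared by both shotgun approaches, joining overlapping fragments into a contiguous sequence, can be sketched as a greedy overlap merge. This is a toy illustration under simplifying assumptions (error-free reads, one strand, no repeats); the function names and the reads are hypothetical and do not represent the assemblers used by Celera or the HGP.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best[0]:
                        best = (size, i, j)
        size, i, j = best
        if size == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


reads = ["ATGGCGTA", "GCGTACGT", "TACGTTAGC"]  # hypothetical reads
print(greedy_assemble(reads))
```

In a repeat-rich genome two unrelated reads can share a long overlap, so a purely greedy merge like this one would misassemble them; that is the problem that linking information or a clone map is meant to reduce.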
1998 (May) PE Biosystems Inc. introduces the PE
Prism 3700 capillary sequencing machine.
(May) Venter announces a new company named
Celera and declares that it will sequence the human
genome within 3 years for $300 million.
(May) In response, the Wellcome Trust doubles its
support for the HGP to $330 million, taking
on responsibility for one-third of the sequencing.
Craig Venter
1999 (March) NIH again moves up the completion
date for the rough draft, to spring 2000. Large-scale sequencing efforts are concentrated in
centers at Whitehead, Washington University,
Baylor, Sanger, and DOE's Joint Genome Institute.
(September) NIH launches a project to sequence
the mouse genome, devoting $130 million over 3
years.
International Human Genome Sequencing Consortium
ERIC S. LANDER
2000 (March) Celera and academic collaborators
sequence the 180-Mb genome of the fruit
fly Drosophila melanogaster (left), the largest genome
yet sequenced and a validation of Venter's
controversial whole-genome shotgun method
(Science).
(June) At a White House ceremony, HGP and Celera
jointly announce working drafts of the human
genome sequence, declare their feud at an end, and
promise simultaneous publication.
(December) HGP and Celera's plans for joint
publication in Science collapse; HGP sends its paper
to Nature.
Francis Collins
Director HGP
2001 (February) The HGP consortium publishes
its working draft in Nature (15 February), and
Celera publishes its draft in Science (16
February).
ORF analysis to identify the genes in the genome
From DNA sequence to protein model
The function of a gene can be hypothesized from its
similarity to genes of known function
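ORF analysis can be illustrated with a minimal sketch that scans the three forward reading frames for an ATG followed by an in-frame stop codon. The function name find_orfs, the min_codons parameter and the example sequence are assumptions for illustration; real gene finding in a genome with introns needs considerably more than this.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, str]]:
    """Return (start, end, sequence) for ORFs on the given strand.

    An ORF here runs from an ATG to the first in-frame stop codon.
    Coordinates are 0-based and end-exclusive, and include the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                # Keep it only if enough codons precede the stop.
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3, dna[start:pos + 3]))
                start = None
    return orfs


print(find_orfs("CCATGAAATTTGGGTAACC"))  # hypothetical sequence
```

Only the given strand is scanned; a fuller analysis would also scan the complementary strand and then compare each predicted protein with proteins of known function.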
Only 1% of the human genome consists of coding frames.
The exons comprise ~5% of each gene, so genes
(exons plus introns) comprise ~25% of the genome.
The human genome has 30,000-40,000 genes.
~60% of human genes are alternatively spliced, and
~70% of the alternative splices change protein
sequence, so the proteome has ~50,000-60,000 members.
Gene cluster is a group of adjacent genes
that are identical or related.
Gene family consists of a set of genes
whose exons are related; the members
were derived by duplication and
variation from some ancestral gene.
Gene clusters are formed by duplication and
divergence
All globin genes are descended by
duplication and mutation from an ancestral
gene that had three exons.
This gave rise to myoglobin, leghemoglobin,
and α- and β-globins. The α- and β-globin
genes separated in
the period of early vertebrate evolution, after
which duplications generated the individual
clusters of separate
α-like and β-like genes.
Sequence divergence is the basis for the evolutionary clock
Divergence is the percent difference in nucleotide sequence between
two related DNA sequences or in amino acid sequences between two
proteins.
Evolutionary clock is defined by the rate at which mutations
accumulate in a given gene.
Replacement sites in a gene are those at which mutations alter the
amino acid that is coded.
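The definitions above can be made concrete with a short sketch that computes percent divergence between two aligned sequences and sorts the observed codon differences into replacement and silent changes. It assumes gap-free alignments of equal length and the standard genetic code; the function names and the two example sequences are hypothetical, and counting differing codons is a simplification of counting sites.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)
}


def percent_divergence(seq1: str, seq2: str) -> float:
    """Percent of aligned positions that differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return 100.0 * diffs / len(seq1)


def classify_differences(cds1: str, cds2: str) -> dict[str, int]:
    """Count differing codons as replacement (amino acid changes) or silent."""
    counts = {"replacement": 0, "silent": 0}
    for i in range(0, len(cds1) - 2, 3):
        c1, c2 = cds1[i:i + 3], cds2[i:i + 3]
        if c1 != c2:
            same_aa = CODON_TABLE[c1] == CODON_TABLE[c2]
            counts["silent" if same_aa else "replacement"] += 1
    return counts


gene_a = "ATGGCTAAAGGT"  # hypothetical coding sequences
gene_b = "ATGGCCAGAGGT"
print(percent_divergence(gene_a, gene_b))
print(classify_differences(gene_a, gene_b))
```

Dividing a divergence value by the time since two sequences separated gives a rate, which is the sense in which accumulated mutations act as a clock.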
Pseudogenes
are dead ends of evolution.
They are inactive but stable components of the genome derived by mutation of an
ancestral active gene. Usually they are inactive because of mutations that block
transcription or translation or both.
They can be recognized by sequence similarities with existing functional genes.
They arise by the accumulation of mutations in (formerly) functional genes.
Once a gene has been inactivated by mutation, it may accumulate further
mutations and become a pseudogene, which is homologous to the active
gene(s) but has no functional role.
Processed pseudogenes
• lack introns, and their sequences are derived
from a transcript rather than the genome. They arise by reverse transcription
of an RNA followed by insertion of the DNA copy into the genome.
• presumably originate by reverse transcription of mRNA
and insertion of a duplex copy into the genome.
Unequal crossing-over rearranges gene clusters
Thalassemia is a disease of red blood cells resulting from
lack of either α- or β-globin.
Unequal crossing-over describes a recombination event in
which the two recombining sites lie at nonidentical locations
in the two parental DNA molecules.
Unequal crossing-over is caused by mispairing between
nonallelic genes when a genome contains a cluster
of genes with related sequences. It produces a deletion in
one recombinant chromosome and a corresponding
duplication in the other.
Different thalassemias are caused by various deletions that
eliminate α- or β-globin genes. The severity of the disease
depends on the individual deletion.
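How mispairing yields one deletion and one matching duplication can be shown with a gene-level sketch. Chromosomes are modelled as lists of gene names and the crossover as a cut between genes; the names X, g1, g2 and Y and the function unequal_crossover are hypothetical, and a real crossover inside the mispaired genes would also produce hybrid genes, which this sketch ignores.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Recombine two gene lists at possibly nonidentical positions.

    Each recombinant joins the left part of one parent to the right
    part of the other. Equal breakpoints give an ordinary crossover.
    """
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# g1 and g2 are related genes in a cluster, flanked by unrelated genes X and Y.
cluster = ["X", "g1", "g2", "Y"]

# Mispairing: g1 on one chromosome aligns with g2 on the other, so the
# cut falls after g1 in the first parent and after g2 in the second.
deleted, duplicated = unequal_crossover(cluster, cluster, 2, 3)
print(deleted)
print(duplicated)
```

The first recombinant has lost a gene and the second has gained one, so the total number of genes across the two products is unchanged.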
How many genes are essential?
Not all genes are essential. In yeast and fly, deletions of
<50% of the genes have detectable effects.
Some genes are redundant; any one of a group can
provide the necessary function.
We do not fully understand the survival in the genome of
genes that are apparently dispensable.
Variations of the DNA sequence in each individual
Mutations that cause
genetic disease
Polymorphisms (SNPs)
0.1% of the human DNA is responsible for the
polymorphisms in human populations
Minisatellites are useful for genetic mapping
Microsatellite DNAs consist of repetitions of extremely short (typically
<10 bp) units.
Minisatellite DNAs consist of ~10 copies of a short repeating sequence.
The length of the repeating unit is measured in 10s of base pairs. The
number of repeats varies between individual genomes.
The variation between microsatellites or minisatellites in individual
genomes can be used to identify heredity unequivocally by showing
that 50% of the bands in an individual are derived from a particular
parent.
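The band-matching argument above can be sketched as a set comparison. Bands are represented by fragment sizes; the function name band_origin and all band values are made up for illustration and are not data from any real case.

```python
def band_origin(child: set[int], mother: set[int], father: set[int]) -> dict[str, float]:
    """Fraction of the child's bands that also appear in each parent."""
    total = len(child)
    return {
        "maternal": len(child & mother) / total,
        "paternal": len(child & father) / total,
        # Bands found in neither parent argue against the proposed parentage.
        "unexplained": len(child - mother - father) / total,
    }


# Hypothetical band sizes (arbitrary units) from a minisatellite profile.
child = {12, 15, 20, 31}
mother = {12, 20, 25, 40}
father = {15, 18, 31, 44}
print(band_origin(child, mother, father))
```

A band shared by both parents is counted for each of them here, so the maternal and paternal fractions can sum to more than 1 when the parents' profiles overlap.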
Satellite DNA consists of many tandem repeats (identical or related) of a
short basic repeating unit.
Satellite DNA has a simple repeating sequence
and no coding function. It is often the major
constituent of centromeric heterochromatin.
Euchromatin comprises all of the genome in the
interphase nucleus except for the
heterochromatin.
Heterochromatin describes regions of the genome
that are permanently in a highly condensed
condition, are not transcribed, and are late-replicating. It may be constitutive or facultative.
The use of minisatellite variant repeat-polymerase chain reaction
(MVR-PCR) to determine the source of saliva on a used postage stamp.
Hopkins B, Williams NJ, Webb MB, Debenham PG, Jeffreys AJ
J Forensic Sci 1994 Mar;39(2):526-31
Cellmark Diagnostics, Oxfordshire, England.
How many genes are expressed?
mRNAs expressed at low levels overlap extensively when
different cell types are compared.
The abundantly expressed mRNAs are usually specific for
the cell type.
~10,000 expressed genes may be common to most cell
types of a higher eukaryote.
Genes are expressed at widely differing levels
Abundance of an mRNA is the average number of molecules
per cell.
In any given cell, most genes are expressed at a low level.
Only a small number of genes, whose products are
specialized for the cell type, are highly expressed.
"Chip" technology allows a snapshot to be
taken of the expression of the entire genome in
a yeast cell.
~75% (~4500 genes) of the yeast genome is
expressed under normal growth conditions.
Chip technology allows detailed comparisons of
related animal cells to determine (for example)
the differences in expression between a normal
cell and a cancer cell.
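A comparison of two expression profiles can be sketched as a log2 fold change per gene. The gene names, the abundance values (mRNA molecules per cell), the pseudocount and the threshold are all assumptions chosen for illustration; they are not measurements, and this is not the analysis method of any particular chip platform.

```python
import math


def log2_fold_changes(normal, cancer, pseudocount=1.0):
    """log2(cancer / normal) for each gene in the normal profile.

    The pseudocount keeps the ratio defined when an abundance is zero.
    """
    return {
        gene: math.log2(
            (cancer.get(gene, 0.0) + pseudocount) / (normal[gene] + pseudocount)
        )
        for gene in normal
    }


def differentially_expressed(normal, cancer, threshold=1.0):
    """Genes whose abundance changes by at least 2**threshold in either direction."""
    changes = log2_fold_changes(normal, cancer)
    return sorted(g for g, lfc in changes.items() if abs(lfc) >= threshold)


# Hypothetical abundances (average mRNA molecules per cell).
normal_cell = {"geneA": 7, "geneB": 3, "geneC": 15}
cancer_cell = {"geneA": 31, "geneB": 3, "geneC": 3}
print(log2_fold_changes(normal_cell, cancer_cell))
print(differentially_expressed(normal_cell, cancer_cell))
```

Working on a log scale treats a fourfold increase and a fourfold decrease symmetrically, which makes it easier to pick out genes that change in either direction.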