Transcript Genomics

From Mendel to Genomics
• Historically
– Identify or create mutations, follow inheritance
– Determine linkage, create maps
• Now: Genomics
– Not just a gene, but as many genes as may be involved in a process.
www.bastardidentro.com
Genomics: The study of genes and their function.
Genomics aims to understand the structure of the
genome, including mapping genes and sequencing the
DNA. Genomics examines the molecular mechanisms
and the interplay of genetic and environmental factors
in disease.
Genomics:
Focus: entire genome, not individual genes
Uses recombinant DNA methods
Methodology in place for sequencing entire genomes and
looking at the activity of multiple genes simultaneously
Genomics includes:
Functional genomics -- the characterization of genes
and their mRNA and protein products.
Structural genomics -- the dissection of the
architectural features of genes and chromosomes.
Comparative genomics -- the evolutionary relationships
between the genes and proteins of different species.
http://www.medterms.com/script/main/art.asp?articlekey=23242
Bioinformatics
• Sequencing creates a huge amount of information that must be stored and analyzed
• Bioinformatics is the science of methods for storing and analyzing that information
– Melding of computer science and molecular biology
http://www.swbic.org/products/clipart/images/bioinformatics.jpg
Sequencing the Human Genome
• Publicly funded consortium
– Clone-by-clone method
– Create library of clones of entire genome
– Order clones using restriction enzyme maps and various DNA markers
– Then sequence each clone
• Craig Venter and private enterprise (Celera Genomics)
– Shotgun method
– Create library of clones of entire genome
– Sequence all the clones
– Use a supercomputer to determine the order from overlapping sequences
• In both approaches, sequencing is done multiple times to reduce errors.
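The "use supercomputer to determine order" step can be illustrated with a toy greedy overlap merge. This is only a sketch under simplifying assumptions (error-free reads, one strand, no repeats); the function names and the example sequence are made up for illustration and are not the method used by either sequencing project.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads that share the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more: leave separate contigs
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical 24-base "genome" cut into three overlapping reads.
genome = "ATGGCGTACGTTAGCCTAGGATCC"
reads = [genome[0:10], genome[6:16], genome[12:24]]
contigs = greedy_assemble(reads)
print(contigs)
```

Real genomes contain long repeats and sequencing errors, which is why simple greedy merging is not enough in practice and why heavy computing was needed.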
Sequencing the Human Genome
• A huge job
– Human DNA has over 3 billion base pairs (3 x 10^9)
– Much of the technology had to be invented and improved to do this particular job
www.achievement.org/.../achievers/col1-005
Clone-by-clone
www.yourgenome.org/intermediate/all/
Shotgun approach
Is sequencing a genome the answer?
No, only the beginning of the questions.
http://www.insectscience.org/2.10/ref/fig5a.gif
Annotation: making sense of the sequence
• Looking for regulatory regions, RNA genes, repetitive regions, and protein genes.
• Finding protein genes
– Look for ORFs (open reading frames)
• Start codon (ATG), stop codon (TAA, TAG, or TGA).
• Codons must be “in frame”, and the distance from start to stop must be long enough
– Problems: 3 reading frames x 2 strands = 6 possible frames, widely spaced genes, introns.
– Help: new software finds TATA box and other elements; codon bias can help
• Different codons are not used equally in organisms
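The ORF search described above can be sketched as a scan of all six frames (3 frames on each of the 2 strands). This is a minimal illustration, assuming an intron-free sequence and the standard stop codons; the function names, the `min_codons` threshold, and the test sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons):
    """Yield (start, end) for each ATG...stop stretch in one reading frame."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOP_CODONS:
            # Count the codons before the stop; keep only long enough ORFs.
            if (i - start) // 3 >= min_codons:
                yield start, i + 3
            start = None


def find_orfs(seq, min_codons=3):
    """Scan 3 frames on both strands; return (strand, frame, orf_sequence)."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame, min_codons):
                hits.append((strand, frame, s[start:end]))
    return hits


print(find_orfs("CCATGAAACCCGGGTAACC"))
```

In a eukaryotic genome this kind of scan misses genes interrupted by introns, which is one of the problems listed above.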
Where is the reading frame?
Could start in one of 3 different places.
Find the start codon. Do all the codons that follow
spell out a protein sequence seen before?
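As a sketch of this step, the snippet below finds the first ATG and translates codon by codon with the standard genetic code until a stop codon. The function name and example sequence are made up; the resulting amino acid string is the kind of thing that would then be compared against known protein sequences.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_from_start(dna):
    """Translate from the first ATG until a stop codon or the sequence end."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate_from_start("CCATGAAACCCGGGTAACC"))
```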
Functional Genomics
• OK, you have a sequence. What does the gene do? What is the function of the protein?
– Search the databases for similar sequences
– Is the sequence similar to sequences for proteins of known function?
– Use computer to search for functional motifs.
• Various proteins that do the same thing have similar structural elements.
• Example: transcription factors that have leucine zippers bind to DNA
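A motif search can be as simple as a pattern match over the protein sequence. The sketch below assumes the commonly used simplified leucine zipper signature, a leucine at every seventh position four times in a row (L-x(6)-L-x(6)-L-x(6)-L); the function name and test sequence are hypothetical, and a match is only a hint, not proof of DNA binding.

```python
import re

# Lookahead so that overlapping candidate motifs are all reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(protein):
    """Return (position, matched_segment) for each candidate motif."""
    return [(m.start(), m.group(1))
            for m in LEUCINE_ZIPPER.finditer(protein.upper())]


# Made-up sequence with one heptad-spaced run of four leucines.
example = "MK" + "LAAAAAA" * 3 + "L" + "GG"
print(find_leucine_zippers(example))
```

Such a loose pattern matches many unrelated proteins by chance, so hits are normally checked against other evidence such as database similarity.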
About Human Genome
• The average gene: 3000 bases, but sizes vary greatly
– largest known human gene is dystrophin: 2.4 million bases.
• The total number of genes is estimated at 30,000
• Almost all (99.9%) nucleotide bases are exactly the same in all people.
• The functions are unknown for over 50% of discovered genes.
• Less than 2% of the genome codes for proteins.
• Repeated sequences are at least 50% of genome.
http://www.ornl.gov/sci/techresources/Human_Genome/project/info.shtml
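To put these percentages in absolute terms, here is a small back-of-the-envelope calculation. It assumes a genome of roughly 3 x 10^9 base pairs and simply applies the percentages listed above; the variable names are illustrative.

```python
GENOME_BP = 3_000_000_000   # assumed genome size, about 3 x 10^9 base pairs
AVERAGE_GENE_BP = 3_000     # average gene size from the list above
DYSTROPHIN_BP = 2_400_000   # largest known human gene

differing_bp = GENOME_BP // 1000          # 0.1% of bases differ between people
max_coding_bp = GENOME_BP * 2 // 100      # less than 2% codes for proteins
min_repeat_bp = GENOME_BP * 50 // 100     # at least 50% is repeated sequence
dystrophin_ratio = DYSTROPHIN_BP // AVERAGE_GENE_BP

print(f"bases that differ between people: about {differing_bp:,}")
print(f"protein-coding bases: fewer than {max_coding_bp:,}")
print(f"repeated sequence: at least {min_repeat_bp:,}")
print(f"dystrophin is {dystrophin_ratio}x the average gene size")
```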
Fundamental questions
• Questions can be asked using whole genome information that couldn’t be asked before.
– How did genomes evolve?
– What is the minimum number of genes necessary for a free-living organism?
• Much can be learned about the ecology of an organism by genomics and proteomics.
– One of the first bacteria sequenced: Mycoplasma genitalium
– Lives a parasitic existence, evident from its genes.
Protein function                                            # of genes
Amino acid biosynthesis                                     0
Purine, pyrimidine, nucleoside and nucleotide metabolism    19
Fatty acid and phospholipid metabolism                      8
Biosynthesis of co-factors, prosthetic groups and carriers  4
Central intermediary metabolism                             7
Energy metabolism                                           33
Transport and binding proteins                              33
DNA metabolism                                              29
Transcription                                               13
Protein synthesis                                           90
Protein fate                                                21
Regulatory functions                                        5
Cell envelope                                               29
Cellular processes                                          6
Other categories                                            0
Unknown                                                     12
Hypothetical
  Database match                                            168
  No database match                                         6
Total number                                                483
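A quick consistency check on the counts listed above: pairing each category with its number in order, the categories should add up to the stated total of 483. The dictionary layout and variable names are just one illustrative way to hold the data.

```python
GENES_BY_ROLE = {
    "Amino acid biosynthesis": 0,
    "Purine, pyrimidine, nucleoside and nucleotide metabolism": 19,
    "Fatty acid and phospholipid metabolism": 8,
    "Biosynthesis of co-factors, prosthetic groups and carriers": 4,
    "Central intermediary metabolism": 7,
    "Energy metabolism": 33,
    "Transport and binding proteins": 33,
    "DNA metabolism": 29,
    "Transcription": 13,
    "Protein synthesis": 90,
    "Protein fate": 21,
    "Regulatory functions": 5,
    "Cell envelope": 29,
    "Cellular processes": 6,
    "Other categories": 0,
    "Unknown": 12,
    "Hypothetical (database match)": 168,
    "Hypothetical (no database match)": 6,
}

total = sum(GENES_BY_ROLE.values())
protein_synthesis_share = 100 * GENES_BY_ROLE["Protein synthesis"] / total

print(f"total genes: {total}")
print(f"protein synthesis: {protein_synthesis_share:.1f}% of genes")
# Categories with no genes at all hint at what the organism takes from its host.
print([role for role, n in GENES_BY_ROLE.items() if n == 0])
```

The zero in amino acid biosynthesis is the kind of evidence meant by "lives a parasitic existence, evident from genes".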
Advances in understanding genomes
• Prokaryotic: eubacterial
• not all genomes are circular
• not all genomes are in one piece
• when is a plasmid not a plasmid but a chromosome?
• not all genomes are small
• very little wasted space, very few with introns
• A significant fraction of genes is organized into operons
Understanding-2
• Archaeal genomes similar to eubacteria but
• have histones, sequence similarities to eukaryotes, and introns in tRNA genes
• Eukaryotic genomes: wide variations
• low gene density, that is, few genes per amount of DNA
• introns, more in some (humans) than others
• repetitive sequences
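Gene density can be made concrete by dividing gene count by genome size. The sketch below uses the 483 genes listed for Mycoplasma genitalium and the 30,000-gene, 3 x 10^9 bp human estimates from earlier slides; the M. genitalium genome size of about 580,000 bp is an outside figure assumed here, not stated in these slides.

```python
def genes_per_megabase(gene_count, genome_bp):
    """Gene density expressed as genes per million base pairs."""
    return gene_count / (genome_bp / 1e6)


# Assumption: M. genitalium genome of roughly 580,000 bp (not from the slides).
mycoplasma_density = genes_per_megabase(483, 580_000)
human_density = genes_per_megabase(30_000, 3_000_000_000)

print(f"M. genitalium: about {mycoplasma_density:.0f} genes per Mb")
print(f"Human: about {human_density:.0f} genes per Mb")
```

Under these assumptions the bacterial genome packs in genes far more tightly than the human genome, in line with "very little wasted space" versus "low gene density".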
Proteomics
• Proteome: all the proteins an organism makes
• Proteomics: the study of those proteins
– Timing of gene expression
– Regulation of gene expression
– Modifications made to proteins
– Functions of the proteins
– Subcellular location of proteins
http://www.emc.maricopa.edu/faculty/farabee/BIOBK/3_14d.jpg
Proteomics: study of proteins
• Proteomics
– 30,000 genes, 100,000 different proteins
• must be lots of post-translational modifications
– >100 different ways of modifying proteins
– addition of groups, crosslinking, inteins
• many genes code for proteins of unknown function
– methods of study
• 2D gel electrophoresis
• Peptide fragments generated with trypsin, studied by mass spectrometry (MS)
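The trypsin step can be mimicked in software to predict which peptide fragments a protein should give. This is a minimal sketch assuming the usual rule of thumb that trypsin cuts after lysine (K) or arginine (R) unless the next residue is proline (P); that rule, the function name, and the example sequence are not from the slides.

```python
def trypsin_digest(protein):
    """Split a protein after K or R, except when the next residue is P."""
    protein = protein.upper()
    peptides, current = [], []
    for i, aa in enumerate(protein):
        current.append(aa)
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append("".join(current))
            current = []
    if current:
        peptides.append("".join(current))  # trailing peptide with no K/R end
    return peptides


# Made-up sequence: the R followed by P is not cleaved.
print(trypsin_digest("MKTAYIAKQRPLLGR"))
```

Predicted fragments like these are what measured peptide masses from the mass spectrometer would be matched against.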
2D gel electrophoresis of proteins
Blue and green arrows mark proteins of interest.
Proteins of Halobacterium.
Left to right: pH
Vertical: MW
Spots digested w/ trypsin then studied using mass spec.
http://www.biochem.mpg.de/en/research/rd/oesterhelt/web_page_list/Proteome_Hasal_cytosolic/absatz_3_bild.gif