Transcript Slide 1

For Bioinformatics, Start with:
Genomics:
READING genome sequences
carry out dideoxy sequencing
ASSEMBLY of the sequence
connect seqs. to make whole chromosomes
ANNOTATION of the sequence
find the genes!
The Human Genome
E. coli Genome
Reading:
Shotgun DNA Sequencing of whole genome (WGS)
[Figure: DNA target sample → SHEAR → LIGATE & CLONE into vector → SEQUENCE from primer → reads]
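The shearing step can be mimicked in a few lines of Python. This is a toy sketch, not a real sequencing pipeline. It assumes an error-free genome string and samples read start positions uniformly at random as a stand-in for random shearing. It also ignores cloning, vectors, and the reverse strand. The function name `shotgun_reads` is hypothetical.

```python
import random


def shotgun_reads(genome: str, n_reads: int, read_len: int,
                  seed: int = 0) -> list[tuple[int, str]]:
    """Return (start, read) pairs sampled uniformly along the genome.

    Toy model: random start positions stand in for random shearing;
    sequencing errors and the reverse strand are ignored.
    """
    if not 0 < read_len <= len(genome):
        raise ValueError("read_len must be between 1 and len(genome)")
    rng = random.Random(seed)  # seeded so runs are reproducible
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)  # random "shear" point
        reads.append((start, genome[start:start + read_len]))
    return reads
```

Because the start positions are random, some regions are read many times and others not at all. That is why shotgun projects oversample the genome.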
Reading to Assembly:
Assembly: The challenge of eukaryotic genomes
E. coli genome: 4 million bp
The human genome: 3 billion bp
50% of the human genome is repeat sequences!
Assembly of the sequence of each chromosome from end to end
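To make the assembly idea concrete, here is a minimal greedy overlap assembler in Python. It repeatedly merges the two reads with the longest suffix/prefix overlap. This is a textbook simplification assumed for illustration; the helper names `overlap` and `greedy_assemble` are hypothetical. It assumes error-free reads from the same strand. It is also exactly the kind of method that repeat sequences confuse: a repeat longer than the overlap makes distant reads look like neighbors.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the longest overlap."""
    contigs = sorted(set(reads))
    # Reads contained inside another read add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:  # no overlaps left: remaining contigs stay separate
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
```

For example, the reads ATTAGACC, GACCTGCC, TGCCGGAA and CGGAATAC, tiled from ATTAGACCTGCCGGAATAC, reassemble into that single contig.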
END, Jan 14 begin
Annotation:
Genomics:
READING genome sequences
Robotic dideoxy sequencing with dye-based data collection
ASSEMBLY of the sequence
Whole genome shotgun OR Ordered clones
ANNOTATION of the sequence
find the genes!
Annotation:
10/1/5
Genomics:
READING genome sequences
ASSEMBLY of the sequence
ANNOTATION of the sequence
find the genes!
1. ab initio
2. by evidence
Annotation: For bacterial genomes, ab initio is adequate
ab initio: “from the beginning”
“something from nothing” (Hebrew יש מאין, yesh me’ayin)
from first principles…
ORFs make up MOST of a prokaryotic genome
Annotation: ab initio – finding ORFs
-About 85-88% of the nucleotides are associated with coding sequence
in the bacterial genomes that have been completely sequenced.
example: in Escherichia coli there are 4288 genes that
have an average of 950 bp of coding sequence
and are separated by an average of just 118 bp.
So first, to find genes in prokaryotic DNA, search for ORFs!!
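Here is a minimal ORF scan in Python, written under simple assumptions:
- the stop codons are those of the standard genetic code (TAA, TAG, TGA);
- ATG is the only start codon;
- all six reading frames are scanned;
- the default cutoff is the 60-codon threshold that prokaryotic gene finders use, with the stop codon not counted.

ORFs that run off the end of the sequence are skipped. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 60) -> list[tuple[str, int, int, int]]:
    """Find ATG...stop ORFs in all six reading frames.

    Returns (strand, frame, start, end) with 0-based, end-exclusive coordinates
    on the strand that was scanned ("-" coordinates refer to the reverse
    complement). Length is counted in codons, excluding the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG in the current open frame
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs
```

Real genomes contain many short ORFs by chance, which is why a length cutoff and the additional signals below are needed.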
Annotation: ab initio – beyond ORFs
beyond ORFs:
-Prokaryotes have short, simple promoters that are
easy to recognize
-Transcriptional terminators often consist of short inverted
repeats followed by a run of Ts.
-Therefore, programs that find prokaryotic genes search for:
ORFs 60 or more codons long, and their codon usage
promoters at the 5' end
terminators at the 3' end
Homology to known genes from other prokaryotes
Shine-Dalgarno sequences
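Two of the signals listed above can be scanned for with simple string matching. The Python sketch below is illustrative only. Several values in it are assumptions, not parameters of any particular gene finder:
- the stem length, loop range and T-run length of the terminator;
- the Shine-Dalgarno motif (AGGAGG) and the mismatch allowance;
- the upstream window searched for it.

Real tools score these features probabilistically.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_terminator_like(seq: str, stem: int = 6, loop_min: int = 3,
                         loop_max: int = 8, t_run: int = 4) -> list[int]:
    """Start positions of a stem, a loop, the stem's reverse complement
    (an inverted repeat that could fold into a hairpin), then a run of Ts.
    Stem, loop, and T-run lengths are assumed values."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - loop_min - t_run + 1):
        arm = reverse_complement(seq[i:i + stem])
        for loop in range(loop_min, loop_max + 1):
            j = i + stem + loop          # start of the second stem arm
            k = j + stem                 # start of the T run
            if seq[j:k] == arm and seq[k:k + t_run] == "T" * t_run:
                hits.append(i)
                break                    # one hit per start position is enough
    return hits


def has_shine_dalgarno(seq: str, atg_pos: int, motif: str = "AGGAGG",
                       max_mismatch: int = 1, far: int = 15, near: int = 4) -> bool:
    """True if a near-match to `motif` lies between `far` and `near` nt
    upstream of the start codon at atg_pos (window bounds are assumptions)."""
    seq = seq.upper()
    region = seq[max(0, atg_pos - far):max(0, atg_pos - near)]
    for p in range(len(region) - len(motif) + 1):
        mismatches = sum(a != b for a, b in zip(region[p:p + len(motif)], motif))
        if mismatches <= max_mismatch:
            return True
    return False
```

A hit from either function is only supporting evidence. It would be combined with the ORF and codon-usage scores.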
Annotation: ab initio – automated
Prokaryotic gene finder examples
Glimmer: Interpolated Markov Model method
GRAIL II: Neural Network method
(See BioInfo text – Fig 8.8)
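Glimmer's interpolated Markov models blend Markov chains of several orders. The Python sketch below shows only the simpler underlying idea. A single fixed-order Markov chain is trained separately on coding and non-coding examples, and a sequence is then scored by log-odds. This is not Glimmer's algorithm; the order, pseudocount, and function names are assumptions.

```python
import math
from collections import defaultdict
from typing import Callable

Prob = Callable[[str, str], float]


def train_markov(seqs: list[str], order: int = 2, pseudocount: float = 1.0) -> Prob:
    """Estimate P(base | preceding `order` bases) from training sequences,
    with add-pseudocount smoothing over the four DNA bases."""
    counts = defaultdict(lambda: defaultdict(float))
    for s in seqs:
        s = s.upper()
        for i in range(order, len(s)):
            counts[s[i - order:i]][s[i]] += 1

    def prob(context: str, base: str) -> float:
        seen = counts.get(context, {})
        total = sum(seen.values()) + 4 * pseudocount
        return (seen.get(base, 0.0) + pseudocount) / total

    return prob


def log_odds(seq: str, coding: Prob, noncoding: Prob, order: int = 2) -> float:
    """Sum of log(P_coding / P_noncoding) over the sequence; > 0 favors coding."""
    seq = seq.upper()
    score = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        score += math.log(coding(context, base) / noncoding(context, base))
    return score
```

For example, a model trained on `["ATGGCTGCTGCTGCTGCTTAA"]` as coding and `["ATATATATATATATATAT"]` as non-coding gives GCTGCTGCT a positive score and ATATATAT a negative one.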
Annotation: results
Annotation: Multicellular eukaryotes
Done too 10/1/5
Annotation:
2 ways to annotate eukaryotic genomes:
-ab initio gene finders:
Work on basic biological principles:
Open reading frames
Codon usage
Consensus splice site sequences
Met start codons
…
-Genes based on previous knowledge….EVIDENCE
-cDNA sequence of the gene’s message
-cDNA sequence of a closely related gene’s message
-Protein sequence of the known gene
Same gene’s protein from another species
Related gene’s protein…
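Evidence-based annotation can be illustrated by lining up a cDNA against the genomic sequence. The cDNA blocks that match the genome are exons, and the genomic gaps between them are introns. Those introns should begin with GT and end with AG, the consensus splice-site dinucleotides.

The Python sketch below is a deliberately naive, exact-match greedy version. The function names are hypothetical and `min_exon` is an assumed parameter. Real spliced aligners tolerate sequencing errors and resolve ambiguous exon boundaries, which this greedy approach can get wrong.

```python
from typing import Optional


def map_cdna_exons(genome: str, cdna: str,
                   min_exon: int = 5) -> Optional[list[tuple[int, int]]]:
    """Greedy exact-match spliced alignment (a toy, not a real aligner).

    Walk along the cDNA; at each step take the longest remaining cDNA prefix
    (at least `min_exon` bases) found in the genome downstream of the previous
    exon. Returns 0-based, end-exclusive genomic (start, end) pairs, or None.
    """
    genome, cdna = genome.upper(), cdna.upper()
    exons: list[tuple[int, int]] = []
    g_pos = c_pos = 0
    while c_pos < len(cdna):
        for length in range(len(cdna) - c_pos, min_exon - 1, -1):
            hit = genome.find(cdna[c_pos:c_pos + length], g_pos)
            if hit != -1:
                exons.append((hit, hit + length))
                g_pos, c_pos = hit + length, c_pos + length
                break
        else:  # no block of at least min_exon bases matched
            return None
    return exons


def introns_are_gt_ag(genome: str, exons: list[tuple[int, int]]) -> bool:
    """Check that every gap between consecutive exons starts with GT and ends with AG."""
    genome = genome.upper()
    return all(genome[end:end + 2] == "GT" and genome[nxt - 2:nxt] == "AG"
               for (_, end), (nxt, _) in zip(exons, exons[1:]))
```

A failed GT-AG check is a hint that an exon boundary was placed wrongly. Such a boundary would be revisited using the ab initio splice-site signals.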
[Figure: automatically generated annotation display, with tracks for start and stop site predictions, splice site predictions, homology-based exon predictions, computational exon predictions, and the consensus gene structure (both strands), plus unique identifiers and tracking information]
A zebrafish hit shows a gene-model protein encoded by a 6-exon gene.
This gene structure (intron/exon) is seen in other species, as is the protein size.
The proteins, if they correspond to MSP in S. gal., must be heavily glycosylated (which is likely).
At least some have a signal peptide.
The zebrafish hit can be viewed at higher resolution, down to nucleotide resolution.
Genomics:
READING genome sequences
carry out dideoxy sequencing
(each read: 700 bp MAX)
ASSEMBLY of the sequence
connect seqs. to make whole chromosomes
ANNOTATION of the sequence
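The 700 bp read limit makes it easy to estimate the sequencing effort for a shotgun project. The sketch below assumes reads land uniformly at random (the Lander-Waterman model). For N reads of length L over a genome of G bp, coverage is c = N × L / G, and the expected fraction of the genome hit by no read is about e^(-c). The 8× coverage used below is an illustrative assumption, not a figure from these slides.

```python
import math


def reads_needed(genome_bp: int, read_bp: int, coverage: float) -> int:
    """Reads required so that N * L / G reaches the target coverage c."""
    return math.ceil(coverage * genome_bp / read_bp)


def fraction_unsequenced(coverage: float) -> float:
    """Lander-Waterman estimate of the fraction of bases covered by no read: e^(-c)."""
    return math.exp(-coverage)


READ_BP = 700  # maximum dideoxy read length from the slide
for name, size in [("E. coli", 4_000_000), ("human", 3_000_000_000)]:
    n = reads_needed(size, READ_BP, coverage=8)  # 8x coverage is an assumed target
    print(f"{name}: {n:,} reads of {READ_BP} bp for 8x coverage")
print(f"expected fraction unsequenced at 8x: {fraction_unsequenced(8):.5f}")
```

Using the genome sizes from the earlier slide, 8× coverage needs about 45,700 reads for a 4 million bp E. coli genome. A 3 billion bp human genome needs about 34.3 million reads. Either way, roughly e^(-8) ≈ 0.03% of the genome is left unsequenced, before repeats are even considered.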
Annotation:
cDNAs & ESTs: Expressed Sequence Tags
[Figure: RNA target sample → cDNA library → SEQUENCE from a primer at each end → end reads (mates)]
Each cDNA provides sequence from its two ends: two ESTs
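Each cDNA clone yields two ESTs. Here is a minimal Python sketch, assuming a cDNA string written 5'→3' and a fixed, hypothetical single-pass read length. The 5' EST reads the start of the cDNA. The 3' EST is primed from the opposite end, so it reads the reverse complement of the last bases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def end_reads(cdna: str, read_len: int = 500) -> tuple[str, str]:
    """Return the (5' EST, 3' EST) pair for one cDNA clone written 5'->3'.
    read_len is an assumed single-pass read length."""
    cdna = cdna.upper()
    five_prime_est = cdna[:read_len]
    # The 3' read is primed from the opposite end, so it reads the other strand.
    three_prime_est = reverse_complement(cdna[-read_len:])
    return five_prime_est, three_prime_est
```

For example, `end_reads("ATGCCCGGGTAG", 4)` gives the pair ("ATGC", "CTAC"). The two mates come from the same transcript, which is what lets them be linked during annotation.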
Who Gets Sequenced?
Models
Pathogens
Agricultural species
Array analysis: see animation from Griffiths
Protein Structure Database
See Swiss-PdbViewer
RNA for ALL C. elegans genes
RNAi for every C. elegans gene too!
-results on the web
Projects to systematically knock out (or pseudo-knock out) every gene,
in order to establish the phenotype of each gene
-> the function of each gene