
Biological Sequence Analysis
140.638.01
The materials used in this class
are made possible by:





Zhiping Weng, http://zlab.bu.edu
Wenyi Wang
Zhijin Wu
Garland Publishing: Alberts et al., Molecular Biology of the Cell
And the wealth of internet resources
Who are we?



Sining Chen
Carlo Colantuoni
Giovanni Parmigiani
Who are you?
• Field of research
• Stats & computing background
• Register or audit
• Why are you taking this course?
• Specific topics you are interested in
Administrative Details
http://astor.som.jhmi.edu/~sining/BSA/syllabus.htm
The MHS program in Bioinformatics
• Jointly offered by the Departments of Biostatistics and of Molecular Microbiology and Immunology
• An intensive one-year program that emphasizes biology, statistical methods, and computing
Goal of the class
• Learn to look at biological sequences from a probabilistic point of view
• Understand the algorithms behind routine operations, e.g. BLAST
• Be able to build statistical models to solve problems involving sequences
Biological Sequence Analysis:
Basic Biological Concepts
Carlo Colantuoni
Clinical Brain Disorders Branch, NIMH, NIH
Dept. Biostatistics, JHSPH
Molecular Cell Biology: Central Dogma
Replication: DNA → DNA
Transcription: DNA → RNA
Translation: RNA → Protein
Sequence analysis is important at all 3 levels.
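The transcription step can be sketched on strings. This is a minimal sketch assuming both DNA strands are written 5’ → 3’ in uppercase A/C/G/T. The mRNA has the same sequence as the coding strand with U in place of T, and it is the reverse complement of the template strand. `transcribe_from_coding` and `transcribe_from_template` are hypothetical helper names.

```python
# Watson-Crick complements for DNA bases
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe_from_coding(coding: str) -> str:
    """mRNA equals the coding strand (5'->3') with T replaced by U."""
    return coding.upper().replace("T", "U")


def transcribe_from_template(template: str) -> str:
    """mRNA is complementary and antiparallel to the template strand (given 5'->3')."""
    reverse_complement = "".join(COMPLEMENT[base] for base in reversed(template.upper()))
    return reverse_complement.replace("T", "U")


print(transcribe_from_coding("ATGGCC"))    # AUGGCC
print(transcribe_from_template("GGCCAT"))  # AUGGCC
```

Both calls give the same transcript because "GGCCAT" is the template strand for the coding sequence "ATGGCC".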
The Human Genome
Genomic Content:
• 3.3 billion bases
• ~30K genes
• 23 chromosomes (22+X/Y)
• Millions of variants
2 copies in every cell (46 chromosomes): one copy from each parent (DAD, MOM)
Each parent passes on a “mixed copy” to YOU
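The chromosome count and the idea of a “mixed copy” can be made concrete with a toy model. The sketch assumes each transmitted chromosome comes from exactly one crossover between a parent's two homologous copies, at a uniformly random point. Real recombination can involve zero, one, or several crossovers per chromosome. `mixed_copy` and the example sequences are hypothetical.

```python
import random

# 22 autosomes plus one sex chromosome (X or Y) per copy; two copies per cell
CHROMOSOMES_PER_COPY = 22 + 1
CHROMOSOMES_PER_CELL = 2 * CHROMOSOMES_PER_COPY  # 46


def mixed_copy(homolog_a: str, homolog_b: str, rng: random.Random) -> str:
    """Toy model: splice two equal-length homologs together at one random crossover point."""
    if len(homolog_a) != len(homolog_b):
        raise ValueError("homologs must be the same length in this toy model")
    # Randomly choose which homolog contributes the left-hand segment
    if rng.random() < 0.5:
        homolog_a, homolog_b = homolog_b, homolog_a
    crossover = rng.randint(0, len(homolog_a))
    return homolog_a[:crossover] + homolog_b[crossover:]


rng = random.Random(0)
dad_copy = mixed_copy("AAAAAAAAAA", "CCCCCCCCCC", rng)  # from DAD's two homologs
mom_copy = mixed_copy("GGGGGGGGGG", "TTTTTTTTTT", rng)  # from MOM's two homologs
child = (dad_copy, mom_copy)  # YOU: one mixed copy from each parent
print(CHROMOSOMES_PER_CELL, child)
```

Each parent's contribution is a mosaic of that parent's two copies. That is why the child's copy matches neither grandparental chromosome exactly.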
Nucleotides are the chemical building blocks of nucleic acids: DNA and RNA.
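From Genomic DNA to mRNA Transcripts
Genes are made of EXONS, which are kept in the mRNA, and INTRONS, which are spliced out.
Protein-coding genes are not easy to find: gene density is low, and exons are interrupted by introns.
~30K genes give rise to >30K transcripts through alternative promoters, alternative splicing, and alternative polyadenylation.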
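Splicing and alternative splicing can be sketched as joining a chosen subset of exon intervals from the same pre-mRNA. The sketch assumes the exon coordinates are already known, as 0-based half-open intervals. Introns are written in lowercase only for readability. `splice` and `PRE_MRNA` are a hypothetical helper and made-up sequence, not real gene data.

```python
# Made-up pre-mRNA: three exons (uppercase) separated by two introns (lowercase)
PRE_MRNA = "AUGGCC" + "guaaag" + "UUUCCC" + "gucag" + "GGAUAA"


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate exon intervals (0-based, half-open) in genomic order, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


full_isoform = splice(PRE_MRNA, [(0, 6), (12, 18), (23, 29)])  # all three exons
skipped_exon = splice(PRE_MRNA, [(0, 6), (23, 29)])            # exon 2 skipped
print(full_isoform)  # AUGGCCUUUCCCGGAUAA
print(skipped_exon)  # AUGGCCGGAUAA
```

The two isoforms come from the same gene. This is one way the number of transcripts can exceed the number of genes.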
Molecular Cell Biology: Components of the Central Dogma
Genomic DNA (3.3 Gb)
↓ Transcription
mRNA: 5’ UTR | START | protein coding | STOP | 3’ UTR | AAAAA (poly-A tail)
↓ Translation
Protein
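The mRNA layout above (5’ UTR, START, coding region, STOP, 3’ UTR, poly-A tail) can be recovered from a sequence with a simple scan. The sketch assumes the first AUG is the START codon and then reads codons in that frame until the first of the standard stop codons UAA, UAG, or UGA. Real start-site selection depends on more than the first AUG. `split_mrna` is a hypothetical helper.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna: str) -> tuple[str, str, str]:
    """Return (5' UTR, coding region incl. START and STOP, 3' UTR + tail)."""
    start = mrna.find("AUG")  # simplifying assumption: first AUG is the START
    if start == -1:
        raise ValueError("no START codon (AUG) found")
    # Step through codons in the START codon's reading frame
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame STOP codon found")


utr5, cds, utr3 = split_mrna("GGCAUC" + "AUGGCCUUUUAA" + "CCGU" + "AAAAA")
print(utr5, cds, utr3)  # GGCAUC AUGGCCUUUUAA CCGUAAAAA
```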
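Translation - Protein Synthesis: every 3 nucleotides (a codon) are translated into one amino acid
DNA: A, T, G, C
Replication (DNA → DNA) and Transcription (DNA → RNA): 1:1, one nucleotide per nucleotide
RNA: A, U, G, C
Translation (RNA → Protein): 3:1, three nucleotides per amino acid
Protein: 20 amino acids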
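A short count shows why a codon needs three nucleotides. With 4 possible bases, a word of $k$ nucleotides can take $4^k$ values, and the smallest $k$ that can cover 20 amino acids is 3:

```latex
\[
4^1 = 4 < 20, \qquad 4^2 = 16 < 20, \qquad 4^3 = 64 \ge 20
\]
```

In the standard genetic code, the 64 codons are split into 61 that code for amino acids and 3 stop codons. As a result, most amino acids are encoded by more than one codon.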
Translation - Protein Synthesis
RNA → Protein: the mRNA is read 5’ → 3’, and the protein is built from the N-terminus to the C-terminus.
The nucleotide sequence determines the amino acid sequence.
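Reading the mRNA 5’ → 3’ and emitting amino acids N-terminus → C-terminus can be sketched as a codon lookup. The sketch assumes the standard genetic code, written as the 64-letter one-letter-code string with bases ordered U, C, A, G (the same layout NCBI uses for translation table 1). It also assumes the input starts at the START codon. `CODON_TABLE` and `translate` are hypothetical names.

```python
BASES = "UCAG"
# Standard genetic code; '*' marks a stop codon. Index = 16*first + 4*second + third.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(coding_mrna: str) -> str:
    """Translate codons 5'->3' until a stop codon; the result reads N-term -> C-term."""
    protein = []
    for i in range(0, len(coding_mrna) - 2, 3):
        amino_acid = CODON_TABLE[coding_mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCUUUUAA"))  # MAF  (Met-Ala-Phe, then STOP)
```

The table has 64 entries. They map to 20 distinct amino acids plus 3 stop codons, which matches the 3:1 ratio and the 20 amino acids on the earlier slide.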
The Human Genome
Genomic Content: 3.3 billion bases, ~30K genes, 23 chromosomes (22+X/Y)
2 copies in every cell: one copy from each parent (DAD, MOM)
Each parent passes on a “mixed copy” to YOU
On an evolutionary scale, the sequence changes through deletions, insertions, and mutations
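The three kinds of change listed above can be simulated on a toy sequence. The sketch assumes "mutations" means single-base substitutions, that each event type is equally likely, and that positions are chosen uniformly. Real mutation processes are neither uniform nor balanced. `mutate` is a hypothetical helper.

```python
import random


def mutate(seq: str, n_events: int, rng: random.Random, alphabet: str = "ACGT") -> str:
    """Apply n_events random substitutions, insertions, or deletions (toy model)."""
    s = list(seq)
    for _ in range(n_events):
        kind = rng.choice(["substitution", "insertion", "deletion"])
        if kind == "insertion":
            # Insert a random base anywhere, including at either end
            s.insert(rng.randint(0, len(s)), rng.choice(alphabet))
        elif not s:
            continue  # empty sequence: nothing to substitute or delete
        elif kind == "substitution":
            pos = rng.randrange(len(s))
            s[pos] = rng.choice([b for b in alphabet if b != s[pos]])
        else:  # deletion
            del s[rng.randrange(len(s))]
    return "".join(s)


ancestor = "ACGTACGTACGT"
descendant = mutate(ancestor, n_events=5, rng=random.Random(42))
print(ancestor, descendant)
```

Because insertions and deletions change the length, the ancestor and descendant usually need to be aligned before they can be compared position by position.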
Biological Sequence Analysis:
Primary Concepts
• Identity
• Similarity
• Homolog: Ortholog & Paralog
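Identity is often summarized as percent identity over an existing pairwise alignment. The sketch assumes the alignment is given, with "-" as the gap symbol, and that gapped columns are excluded from the denominator. Conventions differ between tools. Similarity is usually scored with a substitution matrix such as BLOSUM62 rather than exact matches, and that is not modeled here. `percent_identity` is a hypothetical helper.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns where the two residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y)
    return 100.0 * identical / len(columns)


print(percent_identity("ACGT-A", "ACCTTA"))  # 80.0  (4 identical of 5 ungapped columns)
```

High identity or similarity is evidence for homology, but homology itself is a claim about shared ancestry, not a score.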