
Statistics for Microarrays
Biological background: Molecular Biology
Class web site:
http://statwww.epfl.ch/davison/teaching/Microarrays/ETHZ/
Two types of organisms*
* Every biological ‘rule’ has exceptions!
Mendelian Genetics
http://www.stg.brown.edu/webs/MendelWeb/MWtoc.html
Human Chromosomes
Human Chromosome Banding Patterns
Chromosomes and DNA
Mitosis and Meiosis Compared
DNA Structure Discovery
Nature (1953), 171:737
“We wish to suggest a structure for the salt of deoxyribose
nucleic acid (D.N.A.). This structure has novel features which
are of considerable biological interest.”
DNA
• A deoxyribonucleic acid or DNA molecule
is a double-stranded linear polymer
composed of four molecular subunits
called nucleotides
• Each nucleotide comprises a phosphate
group, a deoxyribose sugar, and one of
four nitrogen bases: adenine (A), guanine
(G), cytosine (C), or thymine (T)
• The two strands are held together by
weak hydrogen bonds between
complementary bases
• Base-pairing occurs according to the rule:
G pairs with C, and A pairs with T
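The pairing rule above can be written as a lookup table. Below is a minimal Python sketch that writes out the partner of a DNA strand. The function names are our own and do not come from the course material. It also computes the reverse complement: the two strands of the double helix run antiparallel, and sequences are conventionally written 5' to 3'.

```python
# Base-pairing rule from the slide: G pairs with C, A pairs with T
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the complementary base at each position (same left-to-right order)."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read 5'->3'; the two strands run antiparallel."""
    return complement(strand)[::-1]


print(complement("ATGCGT"))          # TACGCA
print(reverse_complement("ATGCGT"))  # ACGCAT
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on the lookup table.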
Polymorphic DNA Tertiary Structures
DNA B-type (7BNA) (Watson-Crick form)
DNA A-type (140D) (low water content)
DNA Z-type (2ZNA) (high salt concentration)
Genes are linearly arranged along chromosomes
DNA Structure (overview)
DNA Structure
The monomeric units of nucleic acids are called
nucleotides. A nucleotide consists of a phosphate,
a sugar, and a purine (A, G) or a pyrimidine (T, C)
base.
Proteins
• Proteins: macromolecules composed of
one or more chains of amino acids
• Amino acids: a class of 20 different organic
compounds containing a basic amino group
(-NH2) and an acidic carboxyl group (-COOH)
• The order of amino acids is determined
by the base sequence of nucleotides in
the gene coding for the protein
• Proteins function as enzymes, antibodies,
structural components, etc.
Amino acid codes
3-letter  1-letter  Name
Ala       A         Alanine
Arg       R         Arginine
Asn       N         Asparagine
Asp       D         Aspartic acid
Cys       C         Cysteine
Gln       Q         Glutamine
Glu       E         Glutamic acid
Gly       G         Glycine
His       H         Histidine
Ile       I         Isoleucine
Leu       L         Leucine
Lys       K         Lysine
Met       M         Methionine
Phe       F         Phenylalanine
Pro       P         Proline
Ser       S         Serine
Thr       T         Threonine
Trp       W         Tryptophan
Tyr       Y         Tyrosine
Val       V         Valine
Asx       B         Asn or Asp
Glx       Z         Gln or Glu
Sec       U         Selenocysteine
Unk       X         Unknown
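Bioinformatics work often means converting between these two code systems. The sketch below converts three-letter codes to the one-letter alphabet. It uses exactly the 24 codes in the table above. Input with hyphen-separated residues is an illustrative assumption, and the function name is hypothetical.

```python
# Three-letter to one-letter amino acid codes, as listed in the table above
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Asx": "B", "Glx": "Z", "Sec": "U", "Unk": "X",
}


def to_one_letter(residues: str, sep: str = "-") -> str:
    """Convert e.g. 'Met-Gly-Ser' to 'MGS'; unrecognised codes become 'X'."""
    # capitalize() normalises 'MET' or 'met' to the table's 'Met' spelling
    return "".join(THREE_TO_ONE.get(r.capitalize(), "X") for r in residues.split(sep))


print(to_one_letter("Met-Val-Leu-Ser"))  # MVLS
```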
Primary Protein Structure
Multiple Levels of
Protein Structure
(Protein folding)
Tertiary Structure of
Sperm whale myoglobin (1MBN)
(RT)
DNA Replication
Nature (1953), 171:737
“It has not escaped our notice that the specific pairing we have
postulated immediately suggests a possible copying
mechanism for the genetic material.”
DNA Replication
• The DNA strand that is copied to form a
new strand is called a template
• In the replication of a double-stranded or
duplex DNA molecule, both original
(parental) DNA strands are copied
• When copying is finished, the two new
duplexes, each consisting of one of the
original strands plus its copy, separate
from each other (semiconservative
replication)
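Semiconservative replication has a simple counting consequence: the two original strands are never destroyed, only diluted among the copies. The toy Python model below tracks which strands are parental over several rounds of copying. The strand labels and function names are hypothetical.

```python
def replicate(duplexes: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """One round of semiconservative replication.

    Each strand of every duplex serves as a template and ends up paired
    with a newly synthesised strand labelled 'new'.
    """
    return [(template, "new") for duplex in duplexes for template in duplex]


def run(rounds: int) -> list[tuple[str, str]]:
    """Start from one fully parental duplex and replicate `rounds` times."""
    generation = [("parental", "parental")]
    for _ in range(rounds):
        generation = replicate(generation)
    return generation


for n in range(1, 4):
    duplexes = run(n)
    hybrids = sum("parental" in d for d in duplexes)
    print(f"round {n}: {len(duplexes)} duplexes, {hybrids} contain a parental strand")
```

After n rounds there are 2^n duplexes. Exactly two of them still contain one of the original parental strands, and every other duplex is entirely new.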
Semiconservative Replication
DNA Replication, ctd
• Synthesis occurs in the chemical direction 5’3’
• Nucleic acid chains are assembled from 5’
triphosphates of deoxyribonucleosides (the
triphosphates supply energy)
• DNA polymerases are enzymes that copy DNA
• DNA polymerases require a short preexisting DNA
or RNA strand (primer) to begin chain growth. With a
primer base-paired to the template strand, a DNA
polymerase adds nucleotides to the free hydroxyl
group at the 3’ end of the primer.
• DNA replication requires assembly of many
proteins (at least 30) at a growing replication fork:
helicases to unwind, primases to prime, ligases to
ligate (join), topoisomerases to remove supercoils,
RNA polymerase, etc.
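A toy model can illustrate the primer requirement and the 5'→3' direction. It ignores all the accessory proteins listed above. In this sketch the template is written 3' to 5', so it lines up position by position with the new strand written 5' to 3'. Bases are only ever appended at the free 3' end. The function and parameter names are hypothetical.

```python
# Base-pairing partners used to choose each incoming nucleotide
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def extend_primer(template_3to5: str, primer_5to3: str) -> str:
    """Grow a primer 5'->3' along a template written 3'->5'.

    The primer must already be base-paired to the first len(primer)
    template positions; each step adds the complement of the next
    template base to the free 3' end of the growing chain.
    """
    if len(primer_5to3) > len(template_3to5):
        raise ValueError("primer is longer than the template")
    for t_base, p_base in zip(template_3to5, primer_5to3):
        if PAIR[t_base] != p_base:
            raise ValueError("primer is not base-paired to the template")
    new_strand = list(primer_5to3)
    for t_base in template_3to5[len(primer_5to3):]:
        new_strand.append(PAIR[t_base])  # addition only ever happens at the 3' end
    return "".join(new_strand)


print(extend_primer("TACGGATC", "ATG"))  # ATGCCTAG
```

Without a correctly paired primer the function refuses to start. This mirrors the point above that polymerases cannot begin a chain on their own.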
DNA Replication Fork
DNA Synthesis
DNA is unwinding 
RNA
• RNA, or ribonucleic acid, is similar to DNA,
but
-- RNA is (usually) single-stranded
-- the sugar is ribose rather than
deoxyribose
-- uracil (U) is used instead of thymine
• RNA is important for protein synthesis and
other cell activities
• There are several classes of RNA
molecules, including messenger RNA
(mRNA), transfer RNA (tRNA), ribosomal
RNA (rRNA), and other small RNAs
The Genetic Code
• DNA: sequence of four different
nucleotides
• Protein: sequence of twenty different
amino acids
• The correspondence between the four-letter
DNA alphabet and the twenty-letter protein
alphabet is specified by the genetic code,
which relates nucleotide triplets, or codons,
to amino acids
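Why triplets? Pairs of nucleotides give only 4^2 = 16 combinations, too few for 20 amino acids. Triplets give 4^3 = 64, so most amino acids are encoded by more than one codon. The sketch below builds the standard code as a Python dictionary from a compact 64-character string. The string lists the one-letter amino acid (with '*' for stop) for each codon in UUU, UUC, UUA, ..., GGG order. This is the same base ordering NCBI uses for its published translation tables. The variable names are our own.

```python
from itertools import product

BASES = "UCAG"  # order in which codons are enumerated (last base varies fastest)
# One-letter amino acid for each codon UUU, UUC, UUA, ..., GGG; '*' marks a stop codon
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

print(len(BASES) ** 3)                           # 64 possible codons
print(len(set(STANDARD_CODE.values()) - {"*"}))  # 20 amino acids
print(STANDARD_CODE["AUG"], STANDARD_CODE["UGG"], STANDARD_CODE["UAA"])  # M W *
```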
Standard Genetic Code
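Variation of genetic codes ('-' = same as the standard code, T1)
Codon T1   T2   T3   T4   T5   T6   T9   T10  T12  T13  T14  T15
CUU   Leu  -    Thr  -    -    -    -    -    -    -    -    -
CUC   Leu  -    Thr  -    -    -    -    -    -    -    -    -
CUA   Leu  -    Thr  -    -    -    -    -    -    -    -    -
CUG   Leu  -    Thr  -    -    -    -    -    Ser  -    -    -
AUU   Ile  -    -    -    -    -    -    -    -    -    -    -
AUC   Ile  -    -    -    -    -    -    -    -    -    -    -
AUA   Ile  Met  Met  -    Met  -    -    -    -    Met  -    -
AUG   Met  -    -    -    -    -    -    -    -    -    -    -
UAU   Tyr  -    -    -    -    -    -    -    -    -    -    -
UAC   Tyr  -    -    -    -    -    -    -    -    -    -    -
UAA   Stop -    -    -    -    Gln  -    -    -    -    Tyr  -
UAG   Stop -    -    -    -    Gln  -    -    -    -    -    Gln
AAU   Asn  -    -    -    -    -    -    -    -    -    -    -
AAC   Asn  -    -    -    -    -    -    -    -    -    -    -
AAA   Lys  -    -    -    -    -    Asn  -    -    -    Asn  -
AAG   Lys  -    -    -    -    -    -    -    -    -    -    -
UGU   Cys  -    -    -    -    -    -    -    -    -    -    -
UGC   Cys  -    -    -    -    -    -    -    -    -    -    -
UGA   Stop Trp  Trp  Trp  Trp  -    Trp  Cys  -    Trp  Trp  -
UGG   Trp  -    -    -    -    -    -    -    -    -    -    -
AGU   Ser  -    -    -    -    -    -    -    -    -    -    -
AGC   Ser  -    -    -    -    -    -    -    -    -    -    -
AGA   Arg  Stop -    -    Ser  -    Ser  -    -    Gly  Ser  -
AGG   Arg  Stop -    -    Ser  -    Ser  -    -    Gly  Ser  -
T1: standard
T2: vertebrate mitochondrial
T3: yeast mitochondrial
T4: other mitochondrial
T5: invertebrate mitochondrial
T6: ciliate etc. nuclear
T9: echinoderm mitochondrial
T10: euplotid nuclear
T12: alternative yeast nuclear
T13: ascidian mitochondrial
T14: flatworm mitochondrial
T15: Blepharisma nuclear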
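Each variant differs from the standard code at only a few codons. A convenient representation is therefore the standard table plus a small set of overrides. The Python sketch below transcribes the differences shown in the table above. It covers only those codons, so it is not a complete implementation of any variant code. The dictionary and function names are illustrative.

```python
from itertools import product

AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO)}

# Departures from the standard code (T1), transcribed from the table above.
# Only the codons shown in that table are covered; '*' means stop.
OVERRIDES = {
    2: {"AUA": "M", "UGA": "W", "AGA": "*", "AGG": "*"},
    3: {"CUU": "T", "CUC": "T", "CUA": "T", "CUG": "T", "AUA": "M", "UGA": "W"},
    4: {"UGA": "W"},
    5: {"AUA": "M", "UGA": "W", "AGA": "S", "AGG": "S"},
    6: {"UAA": "Q", "UAG": "Q"},
    9: {"AAA": "N", "UGA": "W", "AGA": "S", "AGG": "S"},
    10: {"UGA": "C"},
    12: {"CUG": "S"},
    13: {"AUA": "M", "UGA": "W", "AGA": "G", "AGG": "G"},
    14: {"UAA": "Y", "AAA": "N", "UGA": "W", "AGA": "S", "AGG": "S"},
    15: {"UAG": "Q"},
}


def genetic_code(table: int = 1) -> dict[str, str]:
    """Return a codon -> amino acid map: the standard code plus the table's overrides."""
    if table != 1 and table not in OVERRIDES:
        raise ValueError(f"table {table} is not covered by this sketch")
    code = dict(STANDARD)
    code.update(OVERRIDES.get(table, {}))
    return code


def translate(rna: str, table: int = 1) -> str:
    """Translate complete codons from the first base, without stopping at '*'."""
    code = genetic_code(table)
    return "".join(code[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


print(translate("AUGAUAUGAAGA", table=1))  # MI*R
print(translate("AUGAUAUGAAGA", table=2))  # MMW*
```

The same four codons thus read as Met-Ile-Stop-Arg under the standard code but Met-Met-Trp-Stop under the vertebrate mitochondrial code.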
Protein Synthesis
Transcription
• Transcription is a complex process involving
several steps and many proteins (enzymes)
• RNA polymerase synthesizes a single strand of
RNA against the DNA template strand (antisense strand), adding nucleotides to the 3’ end
of the RNA chain
• Initiation is regulated by transcription factors
that bind the promoter, a DNA region that usually
includes an initiator element and a TATA box lying
just upstream (at the 5’ end) of the coding region
• The 3’ end is cleaved a short distance downstream
of the AAUAAA signal, and a poly-A tail is added
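A few lines of Python can sketch how the template relates to the transcript. RNA polymerase reads the template strand 3' to 5' while extending the RNA at its 3' end. The transcript, written 5' to 3', is therefore the reverse complement of the template, with U in place of T. The helper that locates the AAUAAA signal is a hypothetical illustration. It does not model the cleavage site itself or the addition of the poly-A tail.

```python
# RNA partner for each DNA template base (uracil replaces thymine)
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5to3: str) -> str:
    """Return the RNA made from a template strand, both written 5'->3'.

    Because the template is read 3'->5', the transcript is the reverse
    complement of the template, using U instead of T.
    """
    return "".join(DNA_TO_RNA_PAIR[b] for b in reversed(template_5to3))


def polyadenylation_signal(rna: str) -> int:
    """Index of the first AAUAAA signal in the transcript, or -1 if absent."""
    return rna.find("AAUAAA")


template = "TTTATTCAT"  # hypothetical template strand, written 5'->3'
rna = transcribe(template)
print(rna)                          # AUGAAUAAA
print(polyadenylation_signal(rna))  # 3
```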
Exons and Introns
• Most of the genome consists of non-coding
regions
• Some non-coding regions (centromeres and
telomeres) may have specific chromosomal
functions
• Other non-coding regions have regulatory
purposes
• Non-coding DNA with no known function is often
called junk DNA, but it may still have some effect
on biological functions
• The terms exon and intron refer to the coding and
non-coding segments of a gene, respectively
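Once exon boundaries are known, removing introns amounts to concatenating the exon segments. In the sketch below, the transcript and exon coordinates are made up for illustration. The toy intron starts with GU and ends with AG, as most introns do. Positions are 0-based with exclusive ends, following Python's slicing convention.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive) in genomic order, dropping introns."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


pre_mrna = "AUGGCCGUAAGUUUUCAGGAAUAA"  # hypothetical transcript: exon, intron, exon
exons = [(0, 6), (18, 24)]             # hypothetical exon coordinates
print(splice(pre_mrna, exons))         # AUGGCCGAAUAA
```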
Intron Splicing
Translation
• The AUG start codon is recognized by the
initiator tRNA carrying methionine
(methionyl-tRNAi^Met)
• Once the start codon has been identified,
the ribosome incorporates amino acids into
a polypeptide chain
• mRNA is decoded by tRNA (transfer RNA)
molecules, each of which carries a specific
amino acid to the growing chain
• Translation ends when a stop codon (UAA,
UAG, UGA) is reached
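The decoding steps above can be sketched as a scan for the first AUG, followed by codon-by-codon lookup until a stop codon. This sketch assumes the standard code. It treats the first AUG as the start codon, a simplification of how ribosomes actually choose the initiation site. The function name is hypothetical.

```python
from itertools import product

# Standard code: one-letter amino acids for UUU, UUC, ..., GGG ('*' = stop)
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("UCAG", repeat=3), AMINO)}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate_mrna(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        protein.append(CODE[codon])
    return "".join(protein)


print(translate_mrna("GGCAUGGCUUUUAAAUGAGG"))  # MAFK
```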
Translation Illustrated
From Primary Transcript
to Protein
Alternative Splicing (of Exons)
• How is it possible that there are
millions of human antibodies when there
are only about 30,000 genes?
• Alternative splicing refers to the
different ways the exons of a gene may
be combined, producing different forms
(isoforms) of a protein from the same gene
• Alternative pre-mRNA splicing is an
important mechanism for regulating
gene expression in higher eukaryotes
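Enumerating exon-skipping variants shows how combining exons multiplies the number of possible protein forms. The sketch below assumes the first and last exons are always kept and each internal exon is independently included or skipped. Real alternative splicing also uses alternative splice sites, intron retention and mutually exclusive exons, which this model ignores. The exon labels and function name are hypothetical.

```python
from itertools import combinations


def isoforms(exons: list[str]) -> list[str]:
    """Every transcript that keeps the first and last exons plus any subset of
    the internal exons, in their original order (exon skipping only)."""
    if len(exons) < 2:
        return ["-".join(exons)]
    first, *middle, last = exons
    result = []
    for k in range(len(middle) + 1):
        for chosen in combinations(middle, k):  # combinations preserve input order
            result.append("-".join([first, *chosen, last]))
    return result


exons = ["E1", "E2", "E3", "E4"]  # hypothetical exon labels
for transcript in isoforms(exons):
    print(transcript)
print(len(isoforms(exons)))  # 4, i.e. 2 ** 2 for two internal exons
```

With k internal exons this model yields 2^k transcripts, so the number of possible isoforms grows quickly with the number of exons.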
Acknowledgements
• http://www.accessexcellence.org/AB/GG
• http://www.oup.co.uk/best.textbooks/biochemistry/genesvii
• Sandrine Dudoit, UC Berkeley Biostatistics
• Yee Hwa Yang, UC Berkeley Statistics
• Terry Speed, UC Berkeley Statistics and
WEHI, Melbourne, Australia