Alternative Splicing

As an introduction to microarrays
Human Genome
• About 90,000 human proteins; the number of genes was initially
assumed to be close to that (initial estimates:
153,000)
• The 1,000-cell roundworm Caenorhabditis
elegans has 19,500 genes; corn has 40,000
genes
• Current estimates are 25,000 or fewer human genes
• Alternative splicing allows different tissue
types to perform different functions with the same
set of genes
Implications
• 75% of human genes are subject to
alternative splicing
• Faulty gene splicing can lead to cancer and
congenital diseases
• Gene therapy can make use of splicing
Application
• We talked before about apoptosis, which is triggered when
the cell determines it cannot be repaired
• Bcl-x, a regulator of apoptosis, is
alternatively spliced to produce either
Bcl-x(L), which suppresses apoptosis, or
Bcl-x(S), which promotes it
Spliceosome
• Five snRNA molecules (U1, U2, U4,
U5, U6) combine with as many as 150
proteins to form the spliceosome
• It recognizes sites where introns begin
and end
– Cuts introns out of pre-mRNA
– joins exons
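The cut-and-join step above can be sketched in a few lines. This is only a toy illustration: the sequence and exon coordinates are made up, and the GU...AG intron boundaries are a well-known convention that the slide itself does not mention.

```python
def splice(pre_mrna, exons):
    """Join the exon segments of a pre-mRNA, discarding the introns between them.

    `exons` is a list of (start, end) half-open, 0-based intervals in 5'->3' order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


def introns_between(exons):
    """Return the intron intervals that lie between consecutive exons."""
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


# Hypothetical pre-mRNA: exon, intron (starts with GU, ends with AG), exon.
pre_mrna = "AUGGCC" + "GUAAGUCCUUAG" + "GAUUAA"
exons = [(0, 6), (18, 24)]

mature = splice(pre_mrna, exons)      # exons joined, intron removed
introns = introns_between(exons)      # one intron spanning positions 6..18
```

A real spliceosome recognizes the sites from sequence signals; here the exon coordinates are simply given.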
Spliceosome
• The 5’ splice site is at the beginning of
the intron; the 3’ splice site is at the end
• The average human protein-coding
gene is 28,000 nucleotides long, with 8.8
exons separated by 7.8 introns
• Exons are typically about 120 nucleotides long, while
introns range from 100 to 100,000 nucleotides
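A back-of-the-envelope check of these averages shows how little of a typical gene is exon. The sketch below treats everything outside the exons as intron, which is a simplifying assumption, and the variable names are ours.

```python
# Averages quoted on the slide.
gene_length = 28_000      # nucleotides per protein-coding gene
exons_per_gene = 8.8
introns_per_gene = 7.8
exon_length = 120         # typical exon length in nucleotides

exonic = exons_per_gene * exon_length              # nucleotides in exons
intronic = gene_length - exonic                    # assumed: the rest is intron
exon_fraction = exonic / gene_length               # share of the gene that is exon
mean_intron_length = intronic / introns_per_gene   # implied average intron size

print(f"exonic nucleotides: {exonic:.0f}")
print(f"exon fraction of gene: {exon_fraction:.1%}")
print(f"implied mean intron length: {mean_intron_length:.0f}")
```

Under these figures under 4% of an average gene is exon, and the implied average intron is a few thousand nucleotides long.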
Splicing errors
• Familial dysautonomia results from a single-nucleotide
mutation that causes a gene to be
alternatively spliced in nervous system tissue
• The resulting decrease in the IKBKAP protein leads to
abnormal nervous system development (half of patients
die before age 30)
• More than 15% of the gene mutations that cause genetic
diseases and cancers do so through splicing
errors
Why splicing
• Each gene generates 3 alternatively spliced mRNAs on average
• Why so much intron (only 1-2% of the genome is exons)?
• Differences between mouse and human are almost all in splicing
• Half of the human genome is made up of transposable
elements, Alus being the most abundant (1.4 million copies)
– They continue to multiply and insert themselves into the
genome at a rate of one insertion per 100 human births
• Mutations in an Alu can create a 5’ or 3’ splice site within an intron,
causing part of the intron to be treated as an exon
• Such a mutation does not affect existing exons
• It has an effect only when the new exon is alternatively spliced in
Microarrays For Alt. Splicing
• Use short oligonucleotides
• Each probe gives an estimate of the expression level of
the sequence it matches
[Figure: gene with Exon 1 through Exon 5; label: Affymetrix]
Microarrays For Alt. Splicing
[Figure: gene with Exon 1 through Exon 5 and its two isoforms]
Isoform 1: Exon 1, Exon 2, Exon 4, Exon 5
Isoform 2: Exon 1, Exon 3, Exon 5
Probe types: Constitutive, Junction, Exon, Unique (“Cassette”)
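The two isoforms on this slide are enough to work out which exons and junctions each probe type could target. The sketch below assumes that "constitutive" means present in every isoform and "cassette" means present in only some; the function names are ours, not part of any array design tool.

```python
# Exon content of the two isoforms shown on the slide.
isoforms = {
    "Isoform 1": [1, 2, 4, 5],
    "Isoform 2": [1, 3, 5],
}


def constitutive_exons(isoforms):
    """Exons present in every isoform."""
    exon_sets = [set(exons) for exons in isoforms.values()]
    return sorted(set.intersection(*exon_sets))


def cassette_exons(isoforms):
    """Exons present in at least one isoform but not in all of them."""
    all_exons = set().union(*isoforms.values())
    return sorted(all_exons - set(constitutive_exons(isoforms)))


def junctions(isoforms):
    """Exon-exon junctions of each isoform, as pairs of adjacent exons."""
    return {name: list(zip(exons, exons[1:]))
            for name, exons in isoforms.items()}


print("constitutive:", constitutive_exons(isoforms))
print("cassette:", cassette_exons(isoforms))
print("junctions:", junctions(isoforms))
```

Here Exons 1 and 5 come out as constitutive, Exons 2, 3 and 4 as cassette, and no junction is shared between the two isoforms, so every junction probe is specific to one of them.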
Ideal Microarray Readings
[Figure: expression (y-axis) against probe (x-axis, probes a-e) for each isoform]
Isoform 1: Exon 1, Exon 2, Exon 4, Exon 5
Isoform 2: Exon 1, Exon 3, Exon 5
Probe types: Constitutive, Exon, Junction, Unique (“Cassette”)
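One way to think about an ideal reading: a probe's signal is the summed abundance of every isoform that contains its target. The sketch below works under that assumption, with made-up abundances and a hypothetical set of probe targets (the slide's probes a-e are not reproduced here).

```python
isoforms = {
    "Isoform 1": [1, 2, 4, 5],
    "Isoform 2": [1, 3, 5],
}


def contains(exons, target):
    """True if an isoform contains the probe target.

    A target is either an exon number or a junction given as a tuple
    of two exons that must be adjacent in the isoform.
    """
    if isinstance(target, tuple):
        return target in zip(exons, exons[1:])
    return target in exons


def ideal_readings(isoforms, abundance, targets):
    """Noise-free signal per target: total abundance of isoforms containing it."""
    return {
        target: sum(abundance[name]
                    for name, exons in isoforms.items()
                    if contains(exons, target))
        for target in targets
    }


# Hypothetical abundances (arbitrary units) and probe targets.
abundance = {"Isoform 1": 3.0, "Isoform 2": 1.0}
targets = [1, 2, 3, (2, 4), (1, 3), 5]
readings = ideal_readings(isoforms, abundance, targets)
```

Constitutive targets (Exons 1 and 5) read the total of both isoforms, while cassette exon and junction targets read only the isoform that contains them. Real readings add noise and cross-hybridization on top of this.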
Motivation
• Why alternatively splice?
• How does it affect the resulting
proteins?
• Look at domains:
– High level summary of protein
– ~80% of eukaryotic proteins are multidomain
– Domains are big relative to an exon
Some Previous Work
• Signatures of domain shuffling in the
human genome. Kaessmann, 2002.
Intron phase symmetry around domain
boundaries
• The Effects of Alternative Splicing On
Transmembrane Proteins in the Mouse
Genome. Cline, 2004.
Half of the TM proteins studied were affected by alt-splicing.
Method
• Predict Alternative Splicing
• Predict Protein Domains
• Look for effects of Alt-Splicing on
predicted domains
– “Swapping”
– “Knockout”
– “Clipping”
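The slides name the three effects without defining them. As a rough sketch, assume "knockout" means a predicted domain is removed entirely by the alternatively spliced region and "clipping" means it is removed only in part; "swapping" is not modeled here. The function name and coordinates are hypothetical.

```python
def domain_effect(domain, removed):
    """Classify how removing a protein region affects a predicted domain.

    Both arguments are (start, end) half-open residue intervals.
    The category definitions are assumptions, not taken from the slides.
    """
    d_start, d_end = domain
    r_start, r_end = removed
    overlap = min(d_end, r_end) - max(d_start, r_start)
    if overlap <= 0:
        return "unaffected"
    if overlap == d_end - d_start:
        return "knockout"   # whole domain lies inside the removed region
    return "clipping"       # only part of the domain is removed


# Hypothetical domain at residues 50..150 and three alternative events.
domain = (50, 150)
print(domain_effect(domain, (40, 160)))   # removed region covers the domain
print(domain_effect(domain, (100, 200)))  # removed region covers its second half
print(domain_effect(domain, (200, 250)))  # removed region lies outside it
```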
Microarray Design
• Genes based on mRNA and EST data
in mouse
• Mapped to Feb. 2002 mouse genome
freeze
• ~500,000 probes (~66,000 sets)
• ~100,000 transcripts
• ~13,000 gene models
Technical work
[Diagram, in genome space: the provided data (gene models, transcripts, probes) are overlapped to generate the probe-to-transcript mapping]
Generated data (from the overlap):
E@NM_021320          cc-chr10-000017.82.0
G6836022@J911445     cc-chr10-000017.91.1
G6807921@J911524_RC  cc-chr10-000018.4.0
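Mapping probes to transcripts by overlap in genome space comes down to an interval test. The sketch below is a minimal version with hypothetical identifiers and coordinates; it ignores strand and counts any overlap with an exon as a hit, both of which are our simplifications.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def map_probes_to_transcripts(probes, transcripts):
    """For each probe, list the transcripts with an exon overlapping it."""
    return {
        probe_id: [tx_id for tx_id, exons in transcripts.items()
                   if any(overlaps(location, exon) for exon in exons)]
        for probe_id, location in probes.items()
    }


# Hypothetical probe locations and transcript exon lists.
probes = {
    "probe_A": ("chr10", 100, 125),
    "probe_B": ("chr10", 400, 425),
    "probe_C": ("chr10", 900, 925),
}
transcripts = {
    "tx_1": [("chr10", 50, 200), ("chr10", 380, 500)],
    "tx_2": [("chr10", 50, 200), ("chr10", 600, 700)],
}

mapping = map_probes_to_transcripts(probes, transcripts)
```

With ~500,000 probes and ~100,000 transcripts, this all-against-all loop would be slow; sorting the intervals or using an interval tree would be the usual next step.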
Predicting Alternative Splicing
• Using mouse alt-splicing microarrays
• Data from Manny Ares
– 8 tissues
– 3 replicates of each tissue
Predicting Alternative Splicing
• General Approach: Clustering, then
Anti-Clustering
107 Clusters
Detail View
Gene Expression Measurement
• mRNA expression represents dynamic
aspects of the cell
• mRNA expression can be measured with
microarray technology
• mRNA is isolated and labeled with a fluorescent
protein
• The labeled mRNA is hybridized to the target; the level of
hybridization corresponds to the light emission,
which is measured with a laser
Gene Expression Microarrays
The main types of gene expression
microarrays:
• Short oligonucleotide arrays (Affymetrix)
• cDNA or spotted arrays (Brown/Botstein)
• Long oligonucleotide arrays (Agilent Inkjet)
• Fiber-optic arrays
• ...
Affymetrix Microarrays
[Figure: raw image; labels: 1.28 cm, 50 µm]
~10^7 oligonucleotides,
half Perfectly Match mRNA (PM),
half have one Mismatch (MM)
Raw gene expression is the intensity
difference: PM - MM
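The PM - MM difference can be sketched for one probe set. The slide only states the difference; averaging it over the probe pairs of a set is our assumption, and the intensities below are made up.

```python
def raw_expression(pm, mm):
    """Mean PM - MM intensity difference over the probe pairs of one probe set.

    `pm` and `mm` are equal-length lists of perfect-match and mismatch
    intensities. Averaging across pairs is an assumption made here.
    """
    if len(pm) != len(mm):
        raise ValueError("each PM probe needs a matching MM probe")
    if not pm:
        raise ValueError("at least one probe pair is required")
    differences = [p - m for p, m in zip(pm, mm)]
    return sum(differences) / len(differences)


# Hypothetical intensities for four probe pairs.
pm = [1200, 980, 1500, 1100]
mm = [300, 280, 500, 400]
signal = raw_expression(pm, mm)
```

A negative difference for a pair means the mismatch probe was brighter than the perfect match, which usually points to cross-hybridization or noise.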
Microarray Potential Applications
• Biological discovery
– new and better molecular diagnostics
– new molecular targets for therapy
– finding and refining biological pathways
• Recent examples
– molecular diagnosis of leukemia, breast cancer, ...
– appropriate treatment for genetic signature
– potential new drug targets
Microarray Data Analysis Types
• Gene Selection
– find genes for therapeutic targets
– avoid false positives (FDA approval?)
• Classification (Supervised)
– identify disease
– predict outcome / select best treatment
• Clustering (Unsupervised)
– find new biological classes / refine existing ones
– exploration
•…
Microarray Data Mining Challenges
• too few records (samples), usually < 100
• too many columns (genes), usually > 1,000
• too many columns are likely to lead to false
positives
• for exploration, a large set of all relevant
genes is desired
• for diagnostics or identification of therapeutic
targets, the smallest set of genes is needed
• model needs to be explainable to biologists
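The false-positive problem with many columns can be made concrete. The sketch below assumes each gene is tested independently at significance level alpha and that, with no real effect, p-values are uniform on [0, 1]; the Bonferroni correction is one common remedy, not something the slides prescribe.

```python
import random


def expected_false_positives(n_genes, alpha):
    """Expected number of genes passing the test when none is truly different."""
    return n_genes * alpha


def bonferroni_threshold(n_genes, alpha):
    """Per-gene threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_genes


def simulate_null_hits(n_genes, threshold, seed=0):
    """Count genes passing `threshold` when every p-value is drawn under the null."""
    rng = random.Random(seed)
    return sum(rng.random() < threshold for _ in range(n_genes))


n_genes, alpha = 1_000, 0.05
print("expected false positives:", expected_false_positives(n_genes, alpha))
print("simulated hits at alpha:", simulate_null_hits(n_genes, alpha))
print("Bonferroni threshold:", bonferroni_threshold(n_genes, alpha))
```

With 1,000 genes and alpha = 0.05, about 50 genes are expected to look significant by chance alone, which is why gene selection needs some form of multiple-testing control.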