Bioinformatics approaches
to gene expression
Introduction to Bioinformatics
ME:440.714
J. Pevsner
Copyright notice
Many of the images in this PowerPoint presentation
are from Bioinformatics and Functional Genomics
by J Pevsner (ISBN 0-471-21004-8).
Copyright © 2003 by Wiley.
These images and materials may not be used
without permission from the publisher.
Visit http://www.bioinfbook.org
Announcements
Immediately after today’s class, the first exam will be
available on the back table. It is due by Friday, 2pm.
Immediately after Wednesday’s class on microarrays,
Tom Downey (President, Partek Inc.) will give a lecture
here on microarray data analysis (in Mountcastle).
I will e-mail solutions to computer lab problems
(chapters 2-5) to the class.
Outline of upcoming lectures
The first third of the course covered sequence analysis,
including BLAST.
Today we begin the middle third of the course:
functional genomics. We will study how DNA is
transcribed to RNA (i.e. gene expression), and we will
discuss microarrays. Then we will study proteins.
We will perform multiple sequence alignments,
then visualize those alignments in phylogenetic trees.
The last third of the course will cover genomes.
Gene expression is regulated
in several basic ways
• by region (e.g. brain versus kidney)
• in development (e.g. fetal versus adult tissue)
• in dynamic response to environmental signals
(e.g. immediate-early response genes)
• in disease states
• by gene activity
Page 157
Gene expression changes measured...
• in disease
• in cell types
• in development
• in response to stimuli
• in mutant or wildtype cells
• in virus, bacteria, and/or host
Organisms: virus, bacteria, fungi, invertebrates, rodents, human
Page 158
Figure: DNA → RNA → protein → phenotype, with cDNA made from RNA
Page 159
Figure: DNA → RNA → protein; cDNA made from RNA is the starting point for UniGene, SAGE, and microarray analyses
Page 159
Figure: DNA → RNA → protein → phenotype (cDNA is made from RNA). Steps from DNA to mature RNA:
[1] Transcription
[2] RNA processing (splicing)
[3] RNA export
[4] RNA surveillance
Page 160
Figure: A gene with three exons separated by two introns is transcribed (5’ to 3’). RNA splicing removes the introns, polyadenylation adds a poly(A) tail (5’ ... AAAAA 3’), and the mRNA is exported to the cytoplasm.
Page 161
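The splicing and polyadenylation steps in the figure can be mimicked on strings. This toy sketch assumes the exon coordinates are already known (0-based, end-exclusive) and uses a short hypothetical transcript; real splicing is carried out by the spliceosome, and real poly(A) tails are far longer than the 5 A residues used here.

```python
def process_transcript(pre_mrna, exons, tail_length=5):
    """Join the exons of a pre-mRNA and append a poly(A) tail.

    pre_mrna: primary transcript written 5' to 3'.
    exons: (start, end) coordinates of each exon, 0-based and end-exclusive.
    tail_length: number of A residues to add (hypothetical, kept short for display).
    """
    # RNA splicing: keep exons in 5'-to-3' order; everything between them (introns) is dropped
    spliced = "".join(pre_mrna[start:end] for start, end in sorted(exons))
    # Polyadenylation: add adenosines at the 3' end
    return spliced + "A" * tail_length


# Hypothetical gene: exon 1, intron (GU...AG), exon 2, intron (GU...AG), exon 3
pre = "AUGGCC" + "GUAAGUUUAG" + "CCAGGA" + "GUACGCAG" + "UAA"
mature = process_transcript(pre, [(0, 6), (16, 22), (30, 33)])
print(mature)
```

Sorting the coordinates means the exon list can be given in any order without scrambling the mature mRNA.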
Relationship of mRNA to genomic DNA for RBP4
Page 162
Analysis of gene expression in cDNA libraries
A fundamental approach to studying gene expression
is through cDNA libraries.
• Isolate RNA (always from a specific organism, region, and time point)
• Convert RNA to complementary DNA (cDNA)
• Subclone the cDNA inserts into a vector
• Sequence the cDNA inserts. These are expressed sequence tags (ESTs).
Page 162-163
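The "convert RNA to complementary DNA" step follows base-pairing rules. A minimal sketch, assuming sequences are plain strings written 5' to 3': reverse transcriptase makes a first strand that is the reverse complement of the mRNA (with T in place of U), and the second strand then matches the mRNA sense. Function names are illustrative only.

```python
# Base pairing from RNA to the DNA bases of first-strand cDNA
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna):
    """Return first-strand cDNA (5'->3') for an mRNA written 5'->3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(mrna.upper()))


def second_strand_cdna(first_strand):
    """Return the strand complementary to first-strand cDNA (mRNA sense, T for U)."""
    return "".join(DNA_PAIR[base] for base in reversed(first_strand))


mrna = "AUGGCCAAAAA"  # hypothetical mRNA ending in a short poly(A) tail
first = first_strand_cdna(mrna)
print(first)                      # begins with T's complementary to the poly(A) tail
print(second_strand_cdna(first))  # same sequence as the mRNA, with T for U
```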
UniGene: unique genes via ESTs
• Find UniGene at NCBI:
www.ncbi.nlm.nih.gov/UniGene
• UniGene clusters contain many ESTs
• UniGene data come from many cDNA libraries.
Thus, when you look up a gene in UniGene
you get information on its abundance
and its regional distribution.
Page 164
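Because each EST comes from a library of known origin, grouping ESTs by cluster yields both abundance (cluster size) and regional distribution. The sketch below assumes a hypothetical list of (cluster_id, library_tissue) pairs, one per EST; this is not UniGene's actual file format.

```python
from collections import Counter, defaultdict


def summarize_clusters(est_records):
    """Summarize ESTs grouped into clusters.

    est_records: iterable of (cluster_id, library_tissue) pairs, one per EST
    (a hypothetical input format).
    Returns {cluster_id: (cluster_size, Counter of library tissues)}.
    """
    tissues_by_cluster = defaultdict(Counter)
    for cluster_id, tissue in est_records:
        tissues_by_cluster[cluster_id][tissue] += 1
    # Cluster size is the number of ESTs; the Counter gives the regional distribution
    return {cid: (sum(counts.values()), counts) for cid, counts in tissues_by_cluster.items()}


records = [("geneX", "brain"), ("geneX", "brain"), ("geneX", "lung"), ("geneY", "lung")]
for cid, (size, tissues) in summarize_clusters(records).items():
    print(cid, size, dict(tissues))
```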
Cluster sizes in UniGene
This is a gene with
1 EST associated;
the cluster size is 1
Page 164
Cluster sizes in UniGene
This is a gene with
10 ESTs associated;
the cluster size is 10
Page 164
Cluster sizes in UniGene
Cluster size     Number of clusters
1                34,000
2                14,000
3-4              15,000
5-8              10,000
9-16             6,000
17-32            4,000
500-1000         500
2000-4000        50
8000-16,000      3
>16,000          1
Page 164
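The first rows of the table bin clusters by doubling size (1, 2, 3-4, 5-8, 9-16, 17-32). A minimal sketch of that binning, assuming doubling bins throughout; the larger rows in the table use rounder boundaries, so this illustrates the idea rather than reproducing the table exactly.

```python
from collections import Counter


def doubling_bin(size):
    """Label of the bin (2**(k-1), 2**k] holding a cluster size, e.g. '3-4' or '5-8'."""
    if size < 1:
        raise ValueError("cluster size must be at least 1")
    if size <= 2:
        return str(size)  # the first two bins hold single sizes: 1 and 2
    upper = 1
    while upper < size:
        upper *= 2
    return f"{upper // 2 + 1}-{upper}"


def bin_cluster_sizes(sizes):
    """Count how many clusters fall into each doubling bin."""
    return Counter(doubling_bin(s) for s in sizes)


print(bin_cluster_sizes([1, 1, 2, 3, 4, 5, 16, 17]))
```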
Ten largest UniGene clusters (10/02)
Cluster size   Gene (UniGene cluster)
25,232         eukaryotic translation elongation factor (Hs.181165)
14,277         GAPDH (Hs.169476)
14,231         ubiquitin (Ta.9227)
12,749         actin, gamma 1 (Hs.14376)
10,649         eukaryotic translation elongation factor (Mm.196614)
10,596         ribosomal protein S2 (Hs.356360)
10,290         hemoglobin, beta (Mm.30266)
9,987          mRNA, placental villi (Hs.356428)
9,667          actin, beta (Hs.288061)
9,058          40S ribosomal protein S18 (Dr.2984)
Page 165
Digital Differential Display (DDD) in UniGene
• UniGene clusters contain many ESTs
• UniGene data come from many cDNA libraries
• Libraries can be compared electronically
Page 165
Figure: UniGene brain libraries compared with UniGene lung libraries
Page 167
Figure: CaMKII and n-sec1 are up-regulated in brain; surfactant is up-regulated in lung.
Page 167
Fisher’s Exact Test: deriving a p value
               Gene 1           All other genes               Total
Pool A         g1A              NA - g1A                      NA
Pool B         g1B              NB - g1B                      NB
Total          c = g1A + g1B    C = (NA - g1A) + (NB - g1B)
Page 167
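With the margins of the table fixed, the count g1A follows a hypergeometric distribution: P(g1A = k) = C(c, k) C(C, NA - k) / C(c + C, NA), where C(n, r) is a binomial coefficient. The sketch below sums these probabilities using only the Python standard library. It returns a one-sided p value (enrichment in pool A) and a two-sided p value that sums all tables no more probable than the observed one, which is one common two-sided convention; the counts in the usage line are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, c, C, NA):
    """P(exactly k gene-1 tags in pool A) given column totals c, C and row total NA."""
    return comb(c, k) * comb(C, NA - k) / comb(c + C, NA)


def fisher_exact(g1A, NA, g1B, NB):
    """Fisher's exact test on the 2x2 table of gene 1 versus all other genes.

    Returns (one-sided p for enrichment in pool A, two-sided p).
    """
    c = g1A + g1B                    # gene 1 total across both pools
    C = (NA - g1A) + (NB - g1B)      # all other genes across both pools
    lo, hi = max(0, NA - C), min(c, NA)  # feasible values of g1A given the margins
    probs = {k: hypergeom_pmf(k, c, C, NA) for k in range(lo, hi + 1)}
    p_obs = probs[g1A]
    one_sided = sum(p for k, p in probs.items() if k >= g1A)
    # Small tolerance so tables tied with the observed one are not lost to rounding
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    return one_sided, min(1.0, two_sided)


print(fisher_exact(3, 3, 0, 3))  # hypothetical counts
```

For these hypothetical counts (all 3 pool A tags are gene 1, none of the 3 pool B tags are), the one-sided p value is 0.05 and the two-sided p value is 0.1. In practice, NA and NB are the total numbers of ESTs or tags in each pool, so they are usually large.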
Pitfalls in interpreting cDNA library data
• bias in library construction
• variable depth of sequencing
• library normalization
• error rate in sequencing
• contamination (chimeric sequences)
Pages 166-168
Serial analysis of gene expression (SAGE)
• 9 to 11 base “tags” correspond to genes
• measure of gene expression in different
biological samples
• SAGE tags can be compared electronically
Page 169
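A tag is a short stretch of sequence at a defined position in each transcript, so counting tags in a sample measures expression. This sketch assumes the classic protocol: NlaIII (site CATG) is the anchoring enzyme, and the tag is the 10 bases immediately 3' of the 3'-most CATG. The tag length of 10 falls within the 9 to 11 bases quoted above; whether CATG is reported as part of the tag varies, and excluding it here is an assumption.

```python
from collections import Counter

ANCHOR = "CATG"  # NlaIII recognition site (assumed anchoring enzyme)


def sage_tag(cdna, tag_length=10):
    """Return the tag 3' of the 3'-most anchor site, or None if no full tag exists.

    cdna: sense-strand cDNA written 5'->3'.
    """
    pos = cdna.rfind(ANCHOR)
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def count_tags(transcripts, tag_length=10):
    """Count tags across one sample; tag counts stand in for expression levels."""
    tags = (sage_tag(t, tag_length) for t in transcripts)
    return Counter(t for t in tags if t is not None)


# Hypothetical transcripts; the last has no anchor site and yields no tag
print(count_tags(["GGCATGAAACATGTTTCCCGGGAAAAA", "CATGTTTCCCGGGAAAAA", "GGGG"]))
```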
SAGE tags are mapped to UniGene clusters
Figure: SAGE tags (Tag 1, Tag 2, ..., Tag n) are matched to UniGene clusters (Cluster 1, Cluster 2, Cluster 3, ...)
Page 169
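Mapping tags to clusters turns tag counts into per-gene counts. A tag can match no cluster, or more than one cluster, so this sketch keeps those cases separate instead of guessing. The tag-to-cluster lookup table is hypothetical; building it from real transcript sequences is a separate step not shown here.

```python
from collections import Counter


def counts_by_cluster(tag_counts, tag_to_clusters):
    """Sum tag counts into clusters; set aside unmapped and ambiguous tags.

    tag_counts: {tag: count} for one sample.
    tag_to_clusters: {tag: [cluster ids]}, a hypothetical lookup table.
    Returns (cluster_counts, unmapped, ambiguous) as Counters.
    """
    cluster_counts, unmapped, ambiguous = Counter(), Counter(), Counter()
    for tag, n in tag_counts.items():
        clusters = tag_to_clusters.get(tag, [])
        if not clusters:
            unmapped[tag] += n
        elif len(clusters) > 1:
            ambiguous[tag] += n  # one tag, several clusters: assignment is uncertain
        else:
            cluster_counts[clusters[0]] += n
    return cluster_counts, unmapped, ambiguous


tags = {"T1": 5, "T2": 3, "T3": 1, "T4": 2}
lookup = {"T1": ["Cluster 1"], "T2": ["Cluster 2", "Cluster 3"], "T4": ["Cluster 1"]}
print(counts_by_cluster(tags, lookup))
```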
Microarrays: tools for gene expression
A microarray is a solid support (such as a membrane
or glass microscope slide) on which DNA of known
sequence is deposited in a grid-like array.
RNA is isolated from matched samples of interest.
The RNA is typically converted to cDNA, labeled with
fluorescence (or radioactivity), then hybridized to
microarrays in order to measure the expression levels
of thousands of genes.
Page 173
Questions addressed using microarrays
• Wildtype versus mutant
• Cultured cells +/- drug
• Physiological states (hibernation, cell polarity formation)
• Normal versus diseased tissue (cancer, autism)
Page 173
Organisms represented on microarrays
• metazoans: human, mouse, rat, worm, insect
• fungi: yeast
• plants: Arabidopsis
• other: bacteria, viruses
Advantages of microarray experiments
Fast: data on 15,000 genes in 1-4 weeks
Comprehensive: entire yeast genome on a chip
Flexible:
• As more genomes are sequenced, more arrays can be made.
• Custom arrays can be made to represent genes of interest.
Easy: you can submit RNA samples to a core facility for analysis
Cheap?: a chip representing 15,000 genes costs $350; a robotic spotter/scanner costs $100,000
Page 175
Disadvantages of microarray experiments
Cost: many researchers can’t afford to do appropriate controls and replicates
RNA significance: the final product of gene expression is protein (see pages 174-176 for references)
Quality control:
• It is impossible to assess the elements on the array surface
• Artifacts with image analysis
• Artifacts with data analysis
Page 176
Figure: Sample acquisition (RNA: purify, label) → Data acquisition (microarray: hybridize, wash, image) → Data analysis → Data confirmation → Biological insight
Page 176
Stage 1: Experimental design
[1] Biological samples: technical and biological replicates
[2] RNA extraction, conversion, labeling, hybridization
[3] Arrangement of array elements on a surface
Page 177
Figure: experimental designs with Sample 1, Sample 2, and Sample 3. Arrays compare Samples 1,2; Samples 1,3; Samples 2,3; Sample 1 versus a pool and Sample 2 versus a pool; and Samples 2,1 with the dyes switched.
Page 177
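Switching dyes helps because a gene-specific dye bias adds with the same sign to both arrays while the true ratio flips sign, so half the difference of the two log ratios cancels the bias. This sketch assumes the two dyes are Cy5 and Cy3 (common choices, not named on the slide) and that the bias is identical on both arrays.

```python
import math


def dye_swap_log_ratio(a_cy5, a_cy3, b_cy5, b_cy3):
    """Combine a dye-swap pair of two-color arrays into one log2 ratio.

    Array A: sample 1 labeled Cy5, sample 2 labeled Cy3.
    Array B: dyes switched (sample 2 Cy5, sample 1 Cy3).
    Returns the estimated log2(sample 1 / sample 2).
    """
    m_a = math.log2(a_cy5 / a_cy3)  # log2(s1/s2) + dye bias
    m_b = math.log2(b_cy5 / b_cy3)  # log2(s2/s1) + dye bias
    return (m_a - m_b) / 2          # dye bias cancels if it is equal on both arrays


# Hypothetical spot: sample 1 is 4x sample 2, and Cy5 reads 2x too bright on both arrays
print(dye_swap_log_ratio(800, 100, 200, 400))
```

Either array alone would give a biased estimate (log2 ratios of 3 and -1); the pair recovers the true log2 ratio of 2.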
Stage 2: RNA and probe preparation
For Affymetrix chips, you need total RNA (about 10 µg)
Confirm purity by running an agarose gel
Measure A260/A280 to confirm purity and quantity
Page 178
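The A260 reading gives quantity, and the A260/A280 ratio gives a purity check. This sketch uses the standard conversion of about 40 µg/mL of RNA per A260 unit (1 cm path length). The 1.8 to 2.1 acceptance window is an assumption; pure RNA reads near 2.0, protein contamination lowers the ratio, and labs set their own cutoffs.

```python
RNA_UG_PER_ML_PER_A260 = 40.0  # ~40 ug/mL RNA per A260 unit, 1 cm path length


def rna_concentration_ug_per_ml(a260, dilution_factor=1.0):
    """Estimate RNA concentration from absorbance at 260 nm."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def total_rna_ug(a260, volume_ul, dilution_factor=1.0):
    """Total RNA yield in ug for a sample volume given in microliters."""
    return rna_concentration_ug_per_ml(a260, dilution_factor) * volume_ul / 1000.0


def looks_pure(a260, a280, min_ratio=1.8, max_ratio=2.1):
    """Check A260/A280 against an assumed acceptance window."""
    return min_ratio <= a260 / a280 <= max_ratio


# Hypothetical reading: 1:20 dilution, A260 = 0.25, A280 = 0.125, 50 uL of sample
yield_ug = total_rna_ug(0.25, 50, dilution_factor=20)
print(yield_ug, looks_pure(0.25, 0.125), yield_ug >= 10)
```

The last comparison checks the hypothetical yield against the roughly 10 µg of total RNA mentioned for Affymetrix chips.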
Basic sciences Affymetrix core
http://microarray.mbg.jhmi.edu/
Johns Hopkins Oncology Center
Microarray Core
http://www.hopkinsmedicine.org/microarray/
Johns Hopkins University
NIDDK Gene Profiling Center
http://www.hopkinsmedicine.org/nephrology/microarray/
The Hopkins Expressionists
http://astor.som.jhmi.edu/hex/
Gene expression methodology
seminar series
http://astor.som.jhmi.edu/hex/gem.html
Stage 3: hybridization to DNA arrays
The array consists of cDNA or oligonucleotides
Oligonucleotides can be deposited by photolithography
The sample is converted to cRNA or cDNA
Page 178-179
Microarrays: array surface
Page 179
Microarrays: robotic spotters
See Nature Genetics microarray supplement
Stage 4: Image analysis
RNA expression levels are quantitated
Fluorescence intensity is measured with a scanner,
or radioactivity with a phosphorimager
Page 180
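For a two-color fluorescent array, quantitation reduces each spot to a background-corrected intensity per channel and then to a ratio. This sketch assumes the median of spot pixels minus the median of nearby background pixels, one of several common summaries, and floors corrected signals at 1.0 so the logarithm stays defined. Pixel values in the example are hypothetical.

```python
import math
from statistics import median


def spot_log_ratio(red_pixels, green_pixels, red_bg_pixels, green_bg_pixels, floor=1.0):
    """log2(red/green) for one spot after local background subtraction.

    Each argument is a list of pixel intensities; floor keeps the
    background-subtracted signal positive before taking the log.
    """
    red = max(median(red_pixels) - median(red_bg_pixels), floor)
    green = max(median(green_pixels) - median(green_bg_pixels), floor)
    return math.log2(red / green)


print(spot_log_ratio([900, 1000, 1100], [300, 325, 350], [100, 100, 100], [100, 100, 100]))
```

A radioactive array read on a phosphorimager has a single channel, so only the background-subtraction half of this sketch would apply.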
Differential Gene Expression on a cDNA Microarray
Figure: control versus Rett syndrome samples. αB-crystallin is over-expressed in Rett syndrome.
Page 180
Stage 5: Data analysis
This is the subject of Wednesday’s class
• How can arrays be compared?
• Which genes are regulated?
• Are differences authentic?
• What are the criteria for statistical significance?
• Are there meaningful patterns in the data
(such as groups)?
Page 180
Microarray data analysis
• Preprocessing
• Inferential statistics
• Exploratory statistics
Page 180
Microarray data analysis
• Preprocessing: global normalization, local normalization, scatter plots
• Inferential statistics: t-tests
• Exploratory statistics: clustering
Page 180
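Two of these steps can be sketched in a few lines. Global normalization is shown as median-centering one array's log2 ratios, which assumes most genes are not differentially expressed. The t-test is shown as Welch's t statistic for one gene across two groups of arrays. Turning t into a p value needs the t distribution (with Welch-Satterthwaite degrees of freedom) and, across thousands of genes, a multiple-testing correction; neither is shown here.

```python
import math
from statistics import mean, median, variance


def median_center(log_ratios):
    """Global normalization: shift one array's log2 ratios so their median is 0."""
    m = median(log_ratios)
    return [x - m for x in log_ratios]


def welch_t(group_a, group_b):
    """Welch's t statistic for one gene; each group needs at least two values."""
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


print(median_center([1.0, 2.0, 3.0]))
print(welch_t([1, 2, 3], [4, 5, 6]))  # hypothetical values for one gene
```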
Matrix of genes versus samples
Metric (define distance)
• Principal components analysis
• Clustering: hierarchical trees, k-means, self-organizing maps
• Supervised and unsupervised analyses
Page 180
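Clustering starts from the genes-by-samples matrix and a chosen distance metric. This sketch shows two metrics (Euclidean distance and 1 minus Pearson correlation) and Lloyd's k-means on the rows. Starting centroids at the first k genes is a simplification for reproducibility; random restarts are more usual. Using correlation distance with mean centroids is a common heuristic rather than strict k-means.

```python
import math


def euclidean(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for profiles with the same shape, 2 for opposite shapes."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def k_means(matrix, k, distance=euclidean, iterations=100):
    """Assign each row (gene) of a genes-by-samples matrix to one of k clusters."""
    centroids = [list(row) for row in matrix[:k]]  # simple deterministic start
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: distance(row, centroids[j])) for row in matrix]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [row for row, lab in zip(matrix, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical matrix: 4 genes x 3 samples forming two obvious groups
genes = [[1.0, 1.1, 0.9], [5.0, 5.2, 4.8], [1.1, 0.9, 1.0], [5.1, 4.9, 5.0]]
print(k_means(genes, 2))
```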
Stage 6: Biological confirmation
Microarray experiments can be thought of as
“hypothesis-generating” experiments.
The differential up- or down-regulation of specific
genes can be measured using independent assays
such as
-- Northern blots
-- reverse transcription polymerase chain reaction (RT-PCR)
-- in situ hybridization
Page 182
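When RT-PCR is run quantitatively (real-time), a common way to express up- or down-regulation is the 2^(-ΔΔCt) method: the target gene's threshold cycle (Ct) is normalized to a reference gene, such as GAPDH, in each condition, and the two conditions are then compared. This sketch assumes roughly 100% amplification efficiency for both genes, so each cycle doubles the product; the Ct values in the example are hypothetical.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression (treated / control) by the 2**(-ddCt) method."""
    d_ct_treated = ct_target_treated - ct_ref_treated  # normalize to reference gene
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_treated - d_ct_control
    return 2.0 ** (-dd_ct)  # fewer cycles to threshold means more starting transcript


# Hypothetical: target reaches threshold 3 cycles earlier (relative to reference) when treated
print(fold_change_ddct(22, 18, 25, 18))
```

A value above 1 indicates up-regulation in the treated sample, which can then be compared with the direction reported by the microarray.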
Stage 7: Microarray databases
There are two main repositories:
Gene Expression Omnibus (GEO) at NCBI
ArrayExpress at the European Bioinformatics
Institute (EBI)
See the URLs on page 184
Page 182
Gene Expression Omnibus (GEO)
NCBI repository for gene expression data
http://www.ncbi.nlm.nih.gov/geo/
Page 183
Microarrays: web resources
• Many links on Leming Shi’s page:
http://www.gene-chips.com
• Stanford Microarray Database
http://www.dnachip.org
• links at http://pevsnerlab.kennedykrieger.org/
Database Referencing of Array Genes Online
(DRAGON)
Credit:
Christopher Bouton
Carlo Colantuoni
George Henry
Figure: paste accession numbers into DRAGON here. DRAGON relates genes to KEGG pathways.