Introduction to Microarrays - Middle East Technical University
Introduction to Microarrays
Dr. Özlem İLK & İbrahim ERKAN
2011, Ankara
Gene
The fundamental unit of heredity in a living organism.
It holds the information necessary for building
cells and passing genetic traits to offspring.
Nearly every cell in the human body has
the same set of genes.
However, different genes are active (and
therefore "expressed") in different kinds of
cells and tissues.
Gene
The amount of activity (expression) of a
gene indicates whether it is being used
to form an organic structure or not.
What is a microarray?
DNA microarrays are small solid surfaces on
which thousands of gene sequences are contained.
Microarray
The spots of gene sequences are placed in
a fixed order.
Therefore, the researcher can keep track of
the gene sequences and use the location of
each spot in the array to identify a
particular gene sequence.
Microarray
Microarrays make it possible to analyze
thousands of genes quickly and at once.
Analyzing genes means determining whether a
specific gene is active and how much activity
it shows.
This can help find the genes that are active in the
presence of a certain disease or treatment, which
provides guidance for medical researchers.
Interdisciplinary work – statisticians, biologists,
doctors, engineers, …
Methods
Preprocessing – MAS5, dChip, RMA, gcRMA, ...
Probe ID    1        2        3        4        5        6
100_g_at    7.4587   7.321    7.2037   7.3204   7.333    7.486
1000_at     6.2708   6.276    6.0370   6.046    6.052    6.262
1708_at     11.62    11.409   11.398   4.35     4.358    4.534
Fold change
Clustering
ANOVA, t-test
Multiple testing correction
Fold change: (GE1 / GE2)
Probe ID    Case     Control   Fold change
100_g_at    7.4587   7.3204    1.012 (7.4587/7.321)
1000_at     6.2708   6.046     0.999
1708_at     11.62    4.35      2.67
+ Easy
+ Can be used on data sets without any replicates
− Not a statistical method: it does not take the variance
into account, so its sensitivity and reliability are in
doubt
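A minimal sketch of the fold change calculation, assuming GE1 is the case expression and GE2 is the control expression. The function name `fold_change` is made up for illustration; the input values are the 1708_at row of the table above.

```python
def fold_change(case, control):
    """Return the ratio case / control (GE1 / GE2 in the slide's notation)."""
    if control == 0:
        raise ValueError("control expression must be non-zero")
    return case / control


# Case and control values for probe set 1708_at, copied from the table above.
fc_1708 = fold_change(11.62, 4.35)
print(round(fc_1708, 2))  # 2.67
```

Note that the function uses a single value per condition, which is why it works without replicates and also why it cannot say anything about variance.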
Clustering
+ Easy
− Can’t measure statistical significance
and will find clusters in the data even if
there are no reasonable clusters in the
dataset
− Can be affected by the data
transformation or by the measurement
unit (Xu et al., Human Molecular Genetics,
2002)
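The sensitivity to the measurement unit can be illustrated with a small sketch. The three gene profiles below are hypothetical, not taken from any data set, and Euclidean distance is assumed as the similarity measure (many clustering methods use it, but it is only one choice).

```python
import math


def nearest_to_a(points):
    """Name of the gene closest to gene_A by Euclidean distance."""
    target = points["gene_A"]
    others = {name: p for name, p in points.items() if name != "gene_A"}
    return min(others, key=lambda name: math.dist(target, others[name]))


# Hypothetical profiles: two measurements per gene.
genes = {"gene_A": (0.0, 0.0), "gene_B": (1.0, 0.0), "gene_C": (0.0, 3.0)}

before = nearest_to_a(genes)

# Same data, but the first measurement is expressed in a unit 10 times smaller,
# so its numeric values become 10 times larger.
rescaled = {name: (x * 10.0, y) for name, (x, y) in genes.items()}
after = nearest_to_a(rescaled)

print(before, after)  # gene_B gene_C
```

Nothing about the genes changed, yet gene_A's closest neighbour did, so a distance-based clustering could group the genes differently.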
ANOVA
H0: µ1i = µ2i = µ3i, H1: at least one mean is different (i: probe set)
Probe ID   Drug1, Rep1   Drug1, Rep2   Drug2, Rep1   Drug2, Rep2   Drug3, Rep1   Drug3, Rep2   p-value
100_g_at   7.4587        7.321         7.2037        7.3204        7.333         7.486         0.46
1000_at    6.2708        6.276         6.0370        6.046         6.052         6.262         0.0043
1708_at    11.62         11.409        11.398        11.35         4.358         4.534         0.00288
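As a rough sketch of what the test computes for one probe set, the one-way ANOVA F statistic can be written in plain Python. The function name `one_way_anova_f` is an assumption for illustration, and the sketch stops at the F statistic: turning it into a p-value requires the F distribution, which a statistics library (for example `scipy.stats.f_oneway`) can supply. It is not meant to reproduce the p-values in the table.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of replicate groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the replicates around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Probe set 1708_at: two replicates for each of Drug1, Drug2 and Drug3.
groups_1708 = [[11.62, 11.409], [11.398, 11.35], [4.358, 4.534]]
f_stat, df1, df2 = one_way_anova_f(groups_1708)
print(df1, df2)
```

The within-group term is why replicates are required: with one value per group, `df_within` is 0 and the statistic is undefined.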
Interpretation of the results
[Figure: standard normal (z) curve with rejection regions in both tails, below z = -1.96 and above z = 1.96]
If p-value < alpha, then reject H0
Alpha is usually 0.05, 0.01 or 0.1
E.g., the expected number of false positives is
10,000 genes * 0.05 = 500
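The decision rule and the false positive count above can be sketched as follows. The helper names are hypothetical; the p-values are the ANOVA p-values listed earlier in these slides, and the expected count assumes H0 is true for every gene tested.

```python
def reject_h0(p_value, alpha=0.05):
    """Decision rule from the slide: reject H0 when p-value < alpha."""
    return p_value < alpha


def expected_false_positives(n_tests, alpha):
    """Expected number of false positives if H0 is true for every test."""
    return n_tests * alpha


# ANOVA p-values from the earlier table.
anova_p = {"100_g_at": 0.46, "1000_at": 0.0043, "1708_at": 0.00288}
rejected = [probe for probe, p in anova_p.items() if reject_h0(p)]
print(rejected)  # ['1000_at', '1708_at']

print(expected_false_positives(10_000, 0.05))
```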
ANOVA
+ Can test the difference between groups
+ Takes the variance into account
− Can’t be applied to data w/o replicates
− Assumes data come from Normal distribution
− The rejection of H0 does not tell us
which groups are different; we
need pairwise comparisons (t-tests) for this.
t-test
Probe ID    p (1,2)   p (2,3)   p (1,3)
100_g_at    0.292     0.263     0.863
1000_at     0.0005    0.386     0.383
1708_at     0.325     0.00017   0.0004
H0: µ1i=µ2i , H1: µ1i≠µ2i
+ Can measure the differences between two groups
+ Takes the variance into account
− Can’t be applied to data w/o replicates
− Assumes data come from Normal distribution
Warning: paired and unpaired t-tests are different tests; choose according to the study design
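To show why the paired / unpaired choice matters, here is a sketch of both t statistics in plain Python. The function names are made up; the unpaired version assumes equal variances (pooled), and the data are the Drug2 and Drug3 replicates of 1708_at from the ANOVA table. Treating Rep1 with Rep1 and Rep2 with Rep2 as matched pairs is purely an assumption for illustration.

```python
import math
from statistics import mean, variance


def unpaired_t(x, y):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, df


def paired_t(x, y):
    """Paired t statistic computed on the within-pair differences."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / math.sqrt(variance(diffs) / n)
    return t, n - 1


drug2 = [11.398, 11.35]   # 1708_at, Drug2 replicates
drug3 = [4.358, 4.534]    # 1708_at, Drug3 replicates

t_unpaired, df_unpaired = unpaired_t(drug2, drug3)
t_paired, df_paired = paired_t(drug2, drug3)  # assumes replicates are matched
print(df_unpaired, df_paired)  # 2 1
```

The same numbers give different statistics and different degrees of freedom, so the p-values differ too. Both functions use the sample variance, which is why neither can be applied without replicates.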
Multiple testing correction
Assume that we are running two independent tests (!) for two
genes. Let the probability of each being correct be 0.95. Due to
independence, the probability of both tests being correct is
0.95 * 0.95 = 0.9025.
− Bonferroni
New alpha = alpha / number of tests
Too conservative
+ Benjamini-Hochberg FDR
Example: suppose you found 100 expressed genes out of 10,000 at
alpha = 0.0001
Expected number of false positives:
10,000 genes * 0.0001 = 1
False Discovery Rate (FDR) = 1/100 = 0.01
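A short sketch of the two corrections, assuming the standard step-up form of the Benjamini-Hochberg procedure. The function names are hypothetical; the p-values are the nine pairwise t-test p-values listed earlier in these slides, and alpha = 0.05 is chosen only for the example.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 where p < alpha / number of tests."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up BH: reject the k smallest p-values, where k is the largest
    rank with p_(rank) <= rank / m * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Pairwise t-test p-values from the earlier table, row by row.
p_values = [0.292, 0.263, 0.863, 0.0005, 0.386, 0.383, 0.325, 0.00017, 0.0004]

n_bonferroni = sum(bonferroni(p_values))
n_bh = sum(benjamini_hochberg(p_values))
print(n_bonferroni, n_bh)  # 3 3

# Probability that two independent tests are both correct, as on the slide.
both_correct = 0.95 ** 2
```

With only nine tests the two methods agree here; the gap between them usually opens up when thousands of genes are tested, because the Bonferroni threshold shrinks with every added test.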
Recommended references
Pavlidis, P. (2003). Using ANOVA for gene selection from
microarray studies of the nervous system.
Methods, 31, 282–289.
Books under call number QP624.5 in the library
A few courses in the Computer Engineering
Department (Tolga Can)