Introduction to Microarrays - Middle East Technical University

Introduction to Microarrays
Dr. Özlem İLK & İbrahim ERKAN
2011, Ankara
Gene
• The fundamental unit of heredity in a living organism. It holds the information necessary for building cells and passing genetic traits to offspring.
• Nearly every cell in the human body has the same set of genes.
• However, different genes are active (and therefore "expressed") in different kinds of cells and tissues.
Gene
• The amount of activity (expression) of a gene indicates whether or not it is being used to form an organic structure, such as a protein.
What is a microarray?
• DNA microarrays are small solid surfaces on which thousands of gene sequences are placed as spots.
Microarray
• The spots of gene sequences are placed in a fixed order.
• Therefore, the researcher can keep track of the gene sequences and use the location of each spot in the array to identify a particular gene sequence.
Microarray
• Microarrays make it easy and quick to analyze thousands of genes at once.
• Analyzing a gene means determining whether it is active and how active it is.
• This can help find the genes that are active in the presence of a certain disease or treatment, providing guidance for medical researchers.
• Interdisciplinary work – statisticians, biologists, doctors, engineers, …
Methods
Preprocessing – MAS5, dChip, RMA, gcRMA ...

Probe ID    1        2        3        4        5        6
100_g_at    7.4587   7.321    7.2037   7.3204   7.333    7.486
1000_at     6.2708   6.276    6.0370   6.046    6.052    6.262
1708_at     11.62    11.409   11.398   4.35     4.358    4.534

Fold change
Clustering
ANOVA, t-test
Multiple testing correction
Fold change: GE1 / GE2 (here, Case / Control)

Probe ID    Case     Control   Fold change
100_g_at    7.4587   7.3204    1.019 (7.4587 / 7.3204)
1000_at     6.2708   6.046     1.037
1708_at     11.62    4.35      2.67

+ Easy
+ Can be used on data sets without any replicates
− Not a statistical method: it doesn’t take the variance into account, so its sensitivity and reliability are in doubt
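The ratio can be sketched in a few lines of Python. This is a minimal illustration that uses the Case and Control values listed for the three probe sets; the names `expression` and `fold_change` are assumptions made for the example, not part of the original slides.

```python
# Case and Control expression values for the three probe sets on this slide.
expression = {
    "100_g_at": (7.4587, 7.3204),
    "1000_at": (6.2708, 6.046),
    "1708_at": (11.62, 4.35),
}


def fold_change(case, control):
    """Return GE1 / GE2, the ratio of case to control expression."""
    return case / control


fold_changes = {
    probe: fold_change(case, control)
    for probe, (case, control) in expression.items()
}

for probe, fc in fold_changes.items():
    print(f"{probe}: {fc:.3f}")
```

Because the calculation uses one value per condition, it runs without replicates, which is also why it carries no information about variance.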
Clustering
+ Easy
− Can’t measure statistical significance, and would find clusters in the data even if the dataset contains no reasonable clusters
− Can be affected by the data transformation or by the measurement unit (Xu et al., Human Molecular Genetics, 2002)
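The effect of a data transformation can be illustrated with a small Python sketch. Distance-based clustering groups the closest profiles first, so if a gene's nearest neighbour changes, the clusters can change too. The three intensity profiles below are hypothetical values chosen for the illustration, not data from the slides, and Euclidean distance is an assumed choice of metric.

```python
import math

# Hypothetical raw intensities for three genes in two samples (illustration only).
raw = {
    "gene_A": (1.0, 100.0),
    "gene_B": (2.0, 50.0),
    "gene_C": (40.0, 100.0),
}


def log2_profile(profile):
    """Apply a log2 transformation to every value in a profile."""
    return tuple(math.log2(v) for v in profile)


def nearest_neighbour(target, profiles):
    """Return the gene with the smallest Euclidean distance to `target`."""
    others = (name for name in profiles if name != target)
    return min(others, key=lambda name: math.dist(profiles[target], profiles[name]))


logged = {name: log2_profile(profile) for name, profile in raw.items()}

nearest_raw = nearest_neighbour("gene_A", raw)
nearest_log = nearest_neighbour("gene_A", logged)
print("raw scale: ", nearest_raw)
print("log2 scale:", nearest_log)
```

On the raw scale gene_A is closest to gene_C, while after the log2 transformation it is closest to gene_B, so the same data can yield different clusters.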
ANOVA
H0: µ1i = µ2i = µ3i, H1: at least one mean is different (i: probe set)

Probe ID    Drug1, Rep1   Drug1, Rep2   Drug2, Rep1   Drug2, Rep2   Drug3, Rep1   Drug3, Rep2   p-value
100_g_at    7.4587        7.321         7.2037        7.3204        7.333         7.486         0.46
1000_at     6.2708        6.276         6.0370        6.046         6.052         6.262         0.0043
1708_at     11.62         11.409        11.398        11.35         4.358         4.534         0.00288
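A one-way ANOVA for a single probe set can be sketched in plain Python as below. This is an illustration under stated assumptions: it handles exactly three groups, because it uses the closed-form tail probability of the F distribution with 2 numerator degrees of freedom, and the function name `one_way_anova_three_groups` is hypothetical. It is not guaranteed to reproduce the p-values listed in the table, which may have been computed with different software or settings.

```python
from statistics import mean

# Replicate values per drug group (Drug1, Drug2, Drug3), copied from the table.
data = {
    "100_g_at": [(7.4587, 7.321), (7.2037, 7.3204), (7.333, 7.486)],
    "1000_at": [(6.2708, 6.276), (6.0370, 6.046), (6.052, 6.262)],
    "1708_at": [(11.62, 11.409), (11.398, 11.35), (4.358, 4.534)],
}


def one_way_anova_three_groups(groups):
    """Return (F, p) for a one-way ANOVA with exactly three groups."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value below needs exactly three groups")
    values = [v for group in groups for v in group]
    grand_mean = mean(values)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the replicates around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # With 2 numerator degrees of freedom, P(F > f) = (1 + 2 f / d2) ** (-d2 / 2).
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    return f_stat, p_value


anova_p = {probe: one_way_anova_three_groups(groups)[1] for probe, groups in data.items()}
for probe, p in anova_p.items():
    print(f"{probe}: p = {p:.4g}")
```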
Interpretation of the results
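[Figure: standard normal (z) curve with a rejection region in each tail, below -1.96 and above 1.96]
• If p-value < alpha, then reject H0.
• Alpha is usually 0.05, 0.01 or 0.1.
• E.g., with 10,000 genes tested at alpha = 0.05, the expected number of false positives is 10,000 genes * 0.05 = 500 (if no gene is truly differentially expressed).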
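The decision rule and the false-positive arithmetic can be written as a short Python sketch. The function names are illustrative assumptions, and the expected count assumes that H0 is true for every gene tested.

```python
def reject_h0(p_value, alpha=0.05):
    """Decision rule from the slide: reject H0 when the p-value is below alpha."""
    return p_value < alpha


def expected_false_positives(n_tests, alpha):
    """Expected number of false positives when H0 is true for all n_tests."""
    return n_tests * alpha


for alpha in (0.05, 0.01, 0.1):
    count = expected_false_positives(10_000, alpha)
    print(f"alpha = {alpha}: about {count:.0f} false positives expected")
```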
ANOVA
+ Can test the difference between groups
+ Takes the variance into account
− Can’t be applied to data w/o replicates
− Assumes data come from Normal distribution
− The rejection of H0 does not provide the information on which groups are different; we need pairwise comparisons (t-tests) for this.
t-test
Probe ID    p (1,2)   p (2,3)   p (1,3)
100_g_at    0.292     0.263     0.863
1000_at     0.0005    0.386     0.383
1708_at     0.325     0.00017   0.0004

H0: µ1i = µ2i, H1: µ1i ≠ µ2i
+ Can measure the differences between two groups
+ Takes the variance into account
− Can’t be applied to data w/o replicates
− Assumes data come from Normal distribution
• Warning: decide whether a paired or an unpaired t-test fits the study design
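The pairwise comparisons can be sketched in plain Python using the Drug1, Drug2 and Drug3 replicate values from the ANOVA table. This is a minimal illustration that assumes an unpaired, equal-variance (Student) t-test with two replicates per group, so that the two-sided p-value has a closed form for 2 degrees of freedom; the function name is hypothetical. Under these assumptions it should give values close to the table, for example about 0.292 for 100_g_at, groups 1 and 2.

```python
import math
from itertools import combinations
from statistics import mean, variance

# Replicate values per drug group, copied from the ANOVA table.
data = {
    "100_g_at": {1: (7.4587, 7.321), 2: (7.2037, 7.3204), 3: (7.333, 7.486)},
    "1000_at": {1: (6.2708, 6.276), 2: (6.0370, 6.046), 3: (6.052, 6.262)},
    "1708_at": {1: (11.62, 11.409), 2: (11.398, 11.35), 3: (4.358, 4.534)},
}


def unpaired_t_test_two_replicates(a, b):
    """Student's equal-variance t-test for two groups of two replicates each."""
    if len(a) != 2 or len(b) != 2:
        raise ValueError("the closed-form p-value below needs two replicates per group")
    df = len(a) + len(b) - 2
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
    std_err = math.sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    t_stat = (mean(a) - mean(b)) / std_err
    # For 2 degrees of freedom: two-sided p = 1 - |t| / sqrt(t^2 + 2).
    p_value = 1 - abs(t_stat) / math.sqrt(t_stat ** 2 + 2)
    return t_stat, p_value


t_test_p = {}
for probe, groups in data.items():
    for g1, g2 in combinations(sorted(groups), 2):
        _, p = unpaired_t_test_two_replicates(groups[g1], groups[g2])
        t_test_p[(probe, g1, g2)] = p
        print(f"{probe} p({g1},{g2}) = {p:.3g}")
```

A paired design would instead test the within-pair differences, which this sketch does not cover.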
Multiple testing correction
• Assume that we are running two independent tests (!) for two genes. Let the probability of each being correct be 0.95. Due to independence, the probability of both tests being correct is 0.95 * 0.95 = 0.9025.
− Bonferroni
• New alpha = alpha / number of tests
• Too conservative when the number of tests is large
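Both calculations can be sketched in Python: the probability that all independent tests are correct, and the Bonferroni-corrected alpha. The function names are illustrative assumptions, and the 10,000-test case is included only to show how small the corrected alpha becomes.

```python
def prob_all_correct(alpha, n_tests):
    """Probability that none of n_tests independent tests gives a false positive."""
    return (1 - alpha) ** n_tests


def bonferroni_alpha(alpha, n_tests):
    """Bonferroni correction: new alpha = alpha / number of tests."""
    return alpha / n_tests


print(prob_all_correct(0.05, 2))        # two independent tests, as on the slide
print(bonferroni_alpha(0.05, 2))        # corrected alpha for two tests
print(bonferroni_alpha(0.05, 10_000))   # corrected alpha for 10,000 tests
```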
+ Benjamini-Hochberg FDR
• Example: suppose you found 100 expressed genes out of 10,000 at alpha = 0.0001
• Expected number of false positives: 10,000 genes * 0.0001 = 1
• False Discovery Rate (FDR) = 1/100 = 0.01
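The FDR arithmetic of the example, together with a sketch of the Benjamini-Hochberg step-up procedure, might look as follows in Python. The function `benjamini_hochberg` is an illustrative implementation written for this example, the FDR level q = 0.05 is an assumed choice, and the input is the nine pairwise p-values from the t-test table.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per p-value: True where H0 is rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose sorted p-value satisfies p <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject H0 for the k_max smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# FDR arithmetic from the slide example.
expected_false_positives = 10_000 * 0.0001
fdr = expected_false_positives / 100
print("FDR of the slide example:", round(fdr, 4))

# The nine pairwise p-values from the t-test table, row by row.
p_values = [0.292, 0.263, 0.863, 0.0005, 0.386, 0.383, 0.325, 0.00017, 0.0004]
bh_rejected = benjamini_hochberg(p_values, q=0.05)
print("Rejected by Benjamini-Hochberg:", sum(bh_rejected), "of", len(p_values))
```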
Recommended references
• Pavlidis, P. (2003). Using ANOVA for gene selection from microarray studies of the nervous system. Methods, 31, 282–289.
• Books under call number QP624.5 in the library
• A few courses in the Computer Engineering Department (Tolga Can)