Notes 1 - Department of Mathematics and Statistics
Advanced Statistical Methods:
Beyond Linear Regression
John R. Stevens
Utah State University
Notes 1. Case Study Data Sets
Mathematics Educators Workshop
28 March 2009
http://www.stat.usu.edu/~jrstevens/pcmi
Why this workshop?
Me …
Outreach mission of USU
Recruitment – undergraduate & graduate
Too much fun
You …
Outline
Notes 1: Case Study Data sets
1. Challenger Explosion
2. Beetle Fumigation
3. T-cell Cancer
Notes 2: Statistical Methods I
Logistic Regression – incl. Separation of Points
EM Algorithm
Notes 3: Statistical Methods II
Tests for Differential Expression
Multiple hypothesis testing
Visualization
Machine Learning
Notes 4: Computer Implementation
(Notes 5): Bonus Material
Case Study 1: Challenger
The January 28, 1986 explosion prompted the Presidential Commission on the Space Shuttle Challenger Accident
The Commission's 1986 report attributed the explosion to a burn-through of an O-ring seal at a field joint in one of the solid-fuel rocket boosters
After each of the previous 24 launches, the solid rocket boosters were inspected, and the presence or absence of damage to the field joint was noted
Challenger Data
Motivating question: What was so different on the 25th launch?
Obs  Flight   Temp (°F)  Damage
1    STS1     66         NO
2    STS9     70         NO
3    STS51B   75         NO
4    STS2     70         YES
5    STS41B   57         YES
6    STS51G   70         NO
7    STS3     69         NO
8    STS41C   63         YES
9    STS51F   81         NO
10   STS4     80         (no entry)
11   STS41D   70         YES
12   STS51I   76         NO
13   STS5     68         NO
14   STS41G   78         NO
15   STS51J   79         NO
16   STS6     67         NO
17   STS51A   67         NO
18   STS61A   75         YES
19   STS7     72         NO
20   STS51C   53         YES
21   STS61B   76         NO
22   STS8     73         NO
23   STS51D   67         NO
24   STS61C   58         YES
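As a first look at the motivating question, the launch records above can be split by temperature and the share of launches with damage compared. The Python sketch below is an illustration only, not part of the workshop materials. The variable names and the cutoff of 65 are assumptions chosen for the example. The Damage column lists only 23 values for 24 flights, and the sketch assumes the flight without an entry is STS4.

```python
# Launch records from the table above: (flight, temperature, damage).
# Assumption: the flight with no Damage entry is STS4, stored here as None.
records = [
    ("STS1", 66, False), ("STS9", 70, False), ("STS51B", 75, False),
    ("STS2", 70, True), ("STS41B", 57, True), ("STS51G", 70, False),
    ("STS3", 69, False), ("STS41C", 63, True), ("STS51F", 81, False),
    ("STS4", 80, None), ("STS41D", 70, True), ("STS51I", 76, False),
    ("STS5", 68, False), ("STS41G", 78, False), ("STS51J", 79, False),
    ("STS6", 67, False), ("STS51A", 67, False), ("STS61A", 75, True),
    ("STS7", 72, False), ("STS51C", 53, True), ("STS61B", 76, False),
    ("STS8", 73, False), ("STS51D", 67, False), ("STS61C", 58, True),
]

CUTOFF = 65  # hypothetical split point, chosen only for illustration


def damage_rate(rows):
    """Fraction of launches with damage, ignoring launches with no entry."""
    known = [damage for _, _, damage in rows if damage is not None]
    return sum(known) / len(known)


cold = [r for r in records if r[1] < CUTOFF]
warm = [r for r in records if r[1] >= CUTOFF]

print(f"below {CUTOFF}: {len(cold)} launches, damage rate {damage_rate(cold):.2f}")
print(f"{CUTOFF} and above: {len(warm)} launches, damage rate {damage_rate(warm):.2f}")
```

Under these assumptions, all four launches below the cutoff show damage, against 3 of the 19 warmer launches that have a recorded outcome.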
Case Study 2: Beetle Fumigation – Rhyzopertha dominica
(Image courtesy Clemson University – USDA Cooperative Extension Slide Series, www.insectimages.org)
Motivation
Beetle: lesser grain borer
A primary pest of stored grain
A year-round problem in moderate climates
Australian grain industry:
$6–8 billion
Zero tolerance for insect-infested grain
Phosphine fumigant for control
Some beetles have developed resistance levels more than 235
times greater than normal
(UQ News Online, 18 Oct. 1999)
Experimental Background
Two DNA markers linked to resistance
rp6.79: two genotypes: –,+
rp5.11: three genotypes: B,H,A
A mixture of six beetle genotypes was exposed to various concentrations of fumigant (48 hours)
Motivating question: What contributes to the degree of resistance?
Experimental Data
Phosphine   Total
Dosage      Receiving   Total     Total        Survivors Observed at Genotype
(mg/L)      Dosage      Deaths    Survivors    -/B   -/H   -/A   +/B   +/H   +/A
0           98          0         98           31    27    10    6     20    4
0.003       100         16        84           18    26    10    6     20    4
0.004       100         68        32           10    4     3     5     7     4
0.005       100         78        22           1     4     7     2     6     2
0.01        100         77        23           0     1     9     8     5     0
0.05        300         270       30           0     0     0     5     20    5
0.1         400         383       17           0     0     0     0     10    7
0.2         750         740       10           0     0     0     0     0     10
0.3         500         490       10           0     0     0     0     0     10
0.4         500         492       8            0     0     0     0     0     8
1.0         7,850       7,806     44           0     0     0     0     0     44
Total       10,798      10,420    378
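The dose-response pattern in the data above can be made explicit by computing the proportion of beetles killed at each dosage and, for each genotype, the highest dosage at which any survivor was observed. This is a minimal Python sketch of that bookkeeping, not the analysis developed later in the workshop. The names `data`, `mortality`, and `highest_survived` are illustrative assumptions.

```python
genotypes = ["-/B", "-/H", "-/A", "+/B", "+/H", "+/A"]

# dosage (mg/L) -> (total receiving dosage, total deaths, survivors by genotype),
# transcribed from the table above
data = {
    0.0:   (98,   0,    [31, 27, 10, 6, 20, 4]),
    0.003: (100,  16,   [18, 26, 10, 6, 20, 4]),
    0.004: (100,  68,   [10, 4, 3, 5, 7, 4]),
    0.005: (100,  78,   [1, 4, 7, 2, 6, 2]),
    0.01:  (100,  77,   [0, 1, 9, 8, 5, 0]),
    0.05:  (300,  270,  [0, 0, 0, 5, 20, 5]),
    0.1:   (400,  383,  [0, 0, 0, 0, 10, 7]),
    0.2:   (750,  740,  [0, 0, 0, 0, 0, 10]),
    0.3:   (500,  490,  [0, 0, 0, 0, 0, 10]),
    0.4:   (500,  492,  [0, 0, 0, 0, 0, 8]),
    1.0:   (7850, 7806, [0, 0, 0, 0, 0, 44]),
}

# Proportion of beetles killed at each dosage
mortality = {dose: deaths / total for dose, (total, deaths, _) in data.items()}

# Highest dosage at which each genotype still had at least one survivor
highest_survived = {}
for index, genotype in enumerate(genotypes):
    doses = [dose for dose, (_, _, surv) in data.items() if surv[index] > 0]
    highest_survived[genotype] = max(doses)

for dose, proportion in mortality.items():
    print(f"{dose:>6} mg/L: proportion killed {proportion:.3f}")
for genotype, dose in highest_survived.items():
    print(f"{genotype}: survivors observed up to {dose} mg/L")
```

In this table, only +/A beetles survive dosages above 0.1 mg/L, which is the kind of genotype effect the motivating question asks about.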
Practical Considerations in Choosing Dosage
Clearly a high dosage would kill all beetles, regardless of genotype
Time is more important than concentration
Expense: more time with lower dose
Technical limitations: maintain concentration in silos
Safety: spontaneous combustion at high concentration
Case Study 3: T-cell Cancer
Acute lymphoblastic leukemia (ALL)
leukemia – cancer of white blood cells
ALL – excess of lymphoblasts (immature cells that become
white blood cells)
Two types of interest here:
T-cell – manage cell-mediated immune response
(activation of cells, release of cytokines)
B-cell – manage humoral immune response
(secretion of antibodies)
Researchers used gene expression technology
Central Dogma of Molecular Biology
General assumption of microarray technology
Use mRNA transcript abundance level as a measure of the level of “expression” for the corresponding gene
Proportional to degree of gene expression
How to measure mRNA abundance?
Several different approaches with similar themes:
Affymetrix GeneChip (oligonucleotide array)
Nimblegen array (oligonucleotide array)
Two-color cDNA array
and more
Representation of genes on slide:
Small portion of gene
Larger sequence of gene
Affymetrix Probes
25 bp
(Images courtesy Affymetrix, www.affymetrix.com)
Affymetrix Technology – GeneChip
Each spot on array represents a single probe sequence (with millions of copies)
Each gene is represented by a unique set of probe pairs (usually 12-20 probe pairs per probe set)
These probes are fixed to the array
Perfect match
Mismatch
(Image courtesy Affymetrix, www.affymetrix.com)
Affymetrix Technology – Expression
A tissue sample is prepared so that its mRNA has
fluorescent tags; wait for hybridization
(Images courtesy Affymetrix, www.affymetrix.com)
Affymetrix GeneChip
Image courtesy Affymetrix, www.affymetrix.com
Cartoon Representations
Animation 1: GeneChip structure
(1 min.)
Animation 2: Measuring gene expression
(2.5 min)
Data: Spot Intensities
Full Array Image
Close-up of Array Image
Images courtesy Affymetrix, www.affymetrix.com
Basic goal of microarray technology
“Observe” gene expression in different conditions – healthy
vs. diseased, e.g.
Decide which genes’ expression levels are changing
significantly between conditions
Target those genes – to halt disease, e.g.
Study those genes – to better understand differences at the
genetic level
ALL Data
“Preprocessed” gene expression data
12625 genes (hgu95av2 Affymetrix GeneChip)
128 samples (arrays)
a matrix of “expression values” – 128 cols, 12625 rows
phenotypic data on all 128 patients, including:
95 B-cell cancer
33 T-cell cancer
Motivating question: Which genes are changing expression
values systematically between B-cell and T-cell groups?
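To make the layout of the data concrete, the sketch below builds a matrix with the dimensions given above (12625 rows, 128 columns) and a B/T label for each column, then computes a per-gene difference in group means. This is only an illustration in Python: the values are random placeholders and not real expression data, the column order is an assumption, and a raw mean difference stands in for the formal tests of differential expression listed in the outline.

```python
import random

random.seed(0)

N_GENES, N_B, N_T = 12625, 95, 33  # dimensions from the slide

# Assumed column order (all B-cell samples first); the real data need not be sorted.
labels = ["B"] * N_B + ["T"] * N_T

# Placeholder values only: NOT real expression measurements.
expr = [[random.gauss(0, 1) for _ in labels] for _ in range(N_GENES)]


def mean_difference(row):
    """Mean of T-cell samples minus mean of B-cell samples for one gene."""
    b = [x for x, lab in zip(row, labels) if lab == "B"]
    t = [x for x, lab in zip(row, labels) if lab == "T"]
    return sum(t) / len(t) - sum(b) / len(b)


diffs = [mean_difference(row) for row in expr]

# Row indices of the five genes with the largest absolute difference
top = sorted(range(N_GENES), key=lambda g: abs(diffs[g]), reverse=True)[:5]
print(top)
```

With 12625 genes compared at once, some large differences will appear by chance even in pure noise like this, which is why the outline pairs tests for differential expression with multiple hypothesis testing.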
Next …
Analysis for these case studies
Build on known statistical methods
Notice huge potential for additional methods