Notes 1 - Department of Mathematics and Statistics

Advanced Statistical Methods:
Beyond Linear Regression
John R. Stevens
Utah State University
Notes 1. Case Study Data Sets
Mathematics Educators Workshop
28 March 2009
http://www.stat.usu.edu/~jrstevens/pcmi
Why this workshop?
 Me …
 Outreach mission of USU
 Recruitment – undergraduate & graduate
 Too much fun
 You …
Outline
 Notes 1: Case Study Data sets
 1. Challenger Explosion
 2. Beetle Fumigation
 3. T-cell Cancer
 Notes 2: Statistical Methods I
 Logistic Regression – incl. Separation of Points
 EM Algorithm
 Notes 3: Statistical Methods II
 Tests for Differential Expression
 Multiple hypothesis testing
 Visualization
 Machine Learning
 Notes 4: Computer Implementation
 (Notes 5): Bonus Material
Case Study 1: Challenger
 January 28, 1986 explosion prompted the Presidential Commission on the Space Shuttle Challenger Accident
 Commission's 1986 report attributed the explosion to a burn-through of an O-ring seal at a field joint in one of the solid-fuel rocket boosters
 After each of the previous 24 launches, the solid rocket boosters were inspected, and the presence or absence of damage to the field joint was noted
Challenger Data
Motivating question: What was so different on the 25th launch?
Obs  Flight   Temp (°F)  Damage
1    STS1     66         NO
2    STS9     70         NO
3    STS51B   75         NO
4    STS2     70         YES
5    STS41B   57         YES
6    STS51G   70         NO
7    STS3     69         NO
8    STS41C   63         YES
9    STS51F   81         NO
10   STS4     80         (not recorded)
11   STS41D   70         YES
12   STS51I   76         NO
13   STS5     68         NO
14   STS41G   78         NO
15   STS51J   79         NO
16   STS6     67         NO
17   STS51A   67         NO
18   STS61A   75         YES
19   STS7     72         NO
20   STS51C   53         YES
21   STS61B   76         NO
22   STS8     73         NO
23   STS51D   67         NO
24   STS61C   58         YES
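A first look at the motivating question can be sketched in a few lines of Python. This is only an exploratory summary, not the analysis developed in the later notes. It assumes that the one launch with no recorded Damage value is STS4, and the 65 °F cutoff (`CUTOFF_F`) is an arbitrary illustrative choice, not a value from the slides.

```python
# Pre-accident launch data as (flight, temp_F, damage).
# damage is True (YES), False (NO), or None where no value is recorded
# (assumed here to be STS4).
launches = [
    ("STS1", 66, False), ("STS9", 70, False), ("STS51B", 75, False),
    ("STS2", 70, True), ("STS41B", 57, True), ("STS51G", 70, False),
    ("STS3", 69, False), ("STS41C", 63, True), ("STS51F", 81, False),
    ("STS4", 80, None), ("STS41D", 70, True), ("STS51I", 76, False),
    ("STS5", 68, False), ("STS41G", 78, False), ("STS51J", 79, False),
    ("STS6", 67, False), ("STS51A", 67, False), ("STS61A", 75, True),
    ("STS7", 72, False), ("STS51C", 53, True), ("STS61B", 76, False),
    ("STS8", 73, False), ("STS51D", 67, False), ("STS61C", 58, True),
]

CUTOFF_F = 65  # hypothetical cutoff, chosen only for illustration


def damage_count(rows):
    """Return (launches with damage, launches with a recorded value)."""
    recorded = [damage for _, _, damage in rows if damage is not None]
    return sum(recorded), len(recorded)


cold = [row for row in launches if row[1] < CUTOFF_F]
warm = [row for row in launches if row[1] >= CUTOFF_F]

cold_damaged, cold_n = damage_count(cold)
warm_damaged, warm_n = damage_count(warm)

print(f"below {CUTOFF_F} F: {cold_damaged} of {cold_n} launches with damage")
print(f"{CUTOFF_F} F and above: {warm_damaged} of {warm_n} launches with damage")
```

Splitting at a single cutoff throws away information; treating temperature as a continuous predictor of a YES/NO outcome is what the logistic regression in Notes 2 is for.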
Case Study 2: Beetle Fumigation – Rhyzopertha dominica
(Image courtesy Clemson University – USDA Cooperative Extension Slide Series, www.insectimages.org)
Motivation
 Beetle: lesser grain borer
 A primary pest of stored grain
 A year-round problem in moderate climates
 Australian grain industry:
 $6–8 billion
 Zero tolerance for insect-infested grain
 Phosphine fumigant for control
 Some beetles have developed resistance levels more than 235 times greater than normal
(UQ News Online, 18 Oct. 1999)
Experimental Background
 Two DNA markers linked to resistance
 rp6.79: two genotypes: –,+
 rp5.11: three genotypes: B,H,A
 Motivating question: What contributes to the degree of resistance?
 Mixture of six beetle genotypes → exposure to various concentrations of fumigant (48 hours)
Experimental Data
Phosphine  Total                            Survivors Observed at Genotype
Dosage     Receiving  Total    Total
(mg/L)     Dosage     Deaths   Survivors    -/B   -/H   -/A   +/B   +/H   +/A
0              98         0       98         31    27    10     6    20     4
0.003         100        16       84         18    26    10     6    20     4
0.004         100        68       32         10     4     3     5     7     4
0.005         100        78       22          1     4     7     2     6     2
0.01          100        77       23          0     1     9     8     5     0
0.05          300       270       30          0     0     0     5    20     5
0.1           400       383       17          0     0     0     0    10     7
0.2           750       740       10          0     0     0     0     0    10
0.3           500       490       10          0     0     0     0     0    10
0.4           500       492        8          0     0     0     0     0     8
1.0         7,850     7,806       44          0     0     0     0     0    44
Total      10,798    10,420      378
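As a rough first pass at the motivating question, the dose-level totals can be turned into mortality rates, alongside the share of survivors carrying the +/A genotype. The sketch below is illustrative only: the variable names are made up for this example, the numbers are transcribed from the Experimental Data table, and it is not the model fitted in the later notes.

```python
# Dose-level totals transcribed from the Experimental Data table.
dose_mg_per_l = [0, 0.003, 0.004, 0.005, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 1.0]
n_treated = [98, 100, 100, 100, 100, 300, 400, 750, 500, 500, 7850]
n_deaths = [0, 16, 68, 78, 77, 270, 383, 740, 490, 492, 7806]
n_survivors = [98, 84, 32, 22, 23, 30, 17, 10, 10, 8, 44]
# Survivors observed with genotype +/A at each dose.
surv_plus_a = [4, 4, 4, 2, 0, 5, 7, 10, 10, 8, 44]

# Observed proportion killed at each dose.
mortality = [d / n for d, n in zip(n_deaths, n_treated)]
# Fraction of the survivors at each dose that are +/A.
plus_a_share = [a / s for a, s in zip(surv_plus_a, n_survivors)]

for dose, m, share in zip(dose_mg_per_l, mortality, plus_a_share):
    print(f"{dose:>6} mg/L  mortality {m:6.1%}  +/A share of survivors {share:6.1%}")
```

From 0.2 mg/L upward every observed survivor is +/A, which suggests looking at the two markers jointly rather than at overall mortality alone.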
Practical Considerations in Choosing Dosage
 Clearly a high dosage would kill all beetles, regardless of genotype
 Time more important than concentration
 Expense: more time with lower dose
 Technical limitations: maintain concentration in silos
 Safety: spontaneous combustion at high concentrations
Case Study 3: T-cell Cancer
 Acute lymphoblastic leukemia (ALL)
 leukemia – cancer of white blood cells
 ALL – excess of lymphoblasts (immature cells that become white blood cells)
 Two types of interest here:
 T-cell – manage cell-mediated immune response (activation of cells, release of cytokines)
 B-cell – manage humoral immune response (secretion of antibodies)
 Researchers used gene expression technology
Central Dogma of Molecular Biology
General assumption of microarray technology
 Use mRNA transcript abundance level as a measure of the level of “expression” for the corresponding gene
 Proportional to degree of gene expression
How to measure mRNA abundance?
 Several different approaches with similar themes:
 Affymetrix GeneChip
 Nimblegen array
 Two-color cDNA array
 more
 Representation of genes on slide
 Small portion of gene
 Larger sequence of gene
oligonucleotide arrays
Affymetrix Probes
25 bp
(Images courtesy Affymetrix, www.affymetrix.com)
Affymetrix Technology – GeneChip

 Each spot on array represents a single probe sequence (with millions of copies)
 Perfect match
 Mismatch
 Each gene is represented by a unique set of probe pairs (usually 12-20 probe pairs per probe set)
 These probes are fixed to the array
(Image courtesy Affymetrix, www.affymetrix.com)
Affymetrix Technology – Expression
A tissue sample is prepared so that its mRNA has fluorescent tags; wait for hybridization
(Images courtesy Affymetrix, www.affymetrix.com)
Affymetrix GeneChip
Image courtesy Affymetrix, www.affymetrix.com
Cartoon Representations
 Animation 1: GeneChip structure (1 min.)
 Animation 2: Measuring gene expression (2.5 min.)
Data: Spot Intensities
Full Array Image
Close-up of Array Image
Images courtesy Affymetrix, www.affymetrix.com
Basic goal of microarray technology
 “Observe” gene expression in different conditions (e.g., healthy vs. diseased)
 Decide which genes’ expression levels are changing significantly between conditions
 Target those genes (e.g., to halt disease)
 Study those genes to better understand differences at the genetic level
ALL Data
 “Preprocessed” gene expression data
 12625 genes (hgu95av2 Affymetrix GeneChip)
 128 samples (arrays)
 a matrix of “expression values” – 128 cols, 12625 rows
 phenotypic data on all 128 patients, including:
 95 B-cell cancer
 33 T-cell cancer
 Motivating question: Which genes are changing expression values systematically between B-cell and T-cell groups?
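To make the shape of this question concrete, here is a minimal sketch of a gene-by-gene comparison using a Welch two-sample t statistic. This is one simple possibility, not necessarily the test used in the later notes. The expression values are synthetic random numbers, not the ALL data. Only the group sizes (95 and 33) come from the slide; the toy gene count, the baseline level, and the shift planted in gene 0 are assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(1)

N_B, N_T = 95, 33   # group sizes from the ALL phenotype data
N_GENES = 20        # toy size; the real matrix has 12625 rows

# Synthetic expression matrix: one row per gene, one column per sample.
# Columns 0..94 stand in for B-cell samples, columns 95..127 for T-cell samples.
expr = [[random.gauss(8.0, 1.0) for _ in range(N_B + N_T)]
        for _ in range(N_GENES)]

# Plant a systematic shift in gene 0 for the T-cell columns (illustration only).
for j in range(N_B, N_B + N_T):
    expr[0][j] += 3.0


def welch_t(x, y):
    """Welch two-sample t statistic (does not assume equal variances)."""
    standard_error = math.sqrt(statistics.variance(x) / len(x)
                               + statistics.variance(y) / len(y))
    return (statistics.mean(x) - statistics.mean(y)) / standard_error


# One statistic per gene: B-cell columns versus T-cell columns.
t_stats = [welch_t(row[:N_B], row[N_B:]) for row in expr]

# Gene with the largest absolute statistic.
top_gene = max(range(N_GENES), key=lambda g: abs(t_stats[g]))
print(top_gene, round(t_stats[top_gene], 2))
```

With 12625 genes instead of 20, some large statistics will arise by chance alone, which is why the outline lists multiple hypothesis testing next to tests for differential expression.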
Next …
 Analysis for these case studies
 Build on known statistical methods
 Notice huge potential for additional methods