
Life after linear regression
A survey of Penn State
applied statistics graduate courses
The courses
• Stat 500: Applied Statistics
• Stat 501: Regression Methods
• Stat 502: Analysis of Variance & Design of Experiments
• Stat 503: Design of Experiments
• Stat 504: Analysis of Discrete Data
• Stat 505: Applied Multivariate Statistical Analysis
• Stat 506: Sampling Theory and Methods
• Stat 509: Biostatistical Methods
• Stat 510: Applied Time Series Analysis
Stat 500: Applied Statistics
• Topics covered:
– Descriptive statistics
– Hypothesis testing and power
– Estimation and confidence intervals
– Regression
– One- and two-way ANOVA
– Chi-square tests
• Prerequisites
– 2 credits of algebra
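As a small illustration of the "Chi-square tests" topic, here is a minimal stdlib-only sketch of Pearson's chi-square statistic for independence in a two-way table of counts. The function name and the 2 x 2 table are hypothetical, not course material, and the sketch returns only the statistic and degrees of freedom. To test at the 5% level with 1 df, compare the statistic with the critical value 3.841.

```python
def chi_square_independence(table):
    """Return (statistic, df) for Pearson's chi-square test of independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: (row total x column total) / n
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical 2 x 2 table of counts, for illustration only.
stat, df = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 3), df)  # 6.667 1
```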
Stat 501: Regression Methods
• Topics covered:
– Analysis of research data through simple and multiple
regression and correlation
– Polynomial models
– Indicator variables
– Stepwise and piecewise regression
– Logistic regression
• Prerequisites
– 6 credits of statistics or Stat 500; matrix algebra
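A multiple regression with a polynomial term and an indicator variable is still an ordinary least-squares fit on a design matrix. The sketch below assumes Python with NumPy, which is not necessarily the software used in the course. It builds such a matrix from hypothetical, noise-free data, so the fitted coefficients can be checked against the values used to generate the response.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y = X @ beta + error."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Hypothetical data: one quantitative predictor x and a two-level group.
x = np.tile(np.arange(1.0, 7.0), 2)   # x = 1..6, repeated for each group
group = np.repeat([0.0, 1.0], 6)      # indicator variable: 1 = second group
# Design matrix columns: intercept, linear term, quadratic (polynomial) term, indicator.
X = np.column_stack([np.ones_like(x), x, x ** 2, group])
true_beta = np.array([2.0, 0.5, 0.25, 3.0])
y = X @ true_beta                     # noise-free, so the fit can be checked exactly
beta_hat = fit_ols(X, y)
print(np.round(beta_hat, 6))
```

With real data you would add noise, and you would also examine residuals and standard errors rather than just the coefficients.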
Stat 502: Analysis of Variance and
Design of Experiments
• Analysis of data when:
– the response y is continuous
– the predictors (called factors or treatments) are
all qualitative
– the errors have the same assumptions as in regression
• Do the means differ among the groups
defined by the factor combinations?
Stat 502: Analysis of Variance and
Design of Experiments
• Topics covered:
– Analysis of variance and design concepts
– Factorial, nested and unbalanced data
– Analysis of covariance
– Blocked designs
– Latin-square, split-plot, repeated measures designs
– Multiple comparisons
• Prerequisites
– Stat 501 (or undergraduate version Stat 462)
A Stat 502 Example:
Intertidal Seaweed Grazers
• To study the influence of ocean grazers on the
regeneration rate of seaweed in the intertidal zone, a
researcher scraped square rock plots free of
seaweed and observed seaweed regeneration
when certain types of seaweed-grazing animals
were denied access.
• Research questions:
– Which grazer consumes most seaweed?
– Do different grazers influence each other’s impact?
– Are grazing effects similar in all microhabitats?
A Stat 502 Example:
Intertidal Seaweed Grazers
• The grazers were limpets (L), small fishes
(f), and large fishes (F):
– LfF: all three grazers were allowed access
– fF: limpets were excluded using caustic paint
– Lf: large fish were excluded using a coarse net
– f: limpets and large fish were excluded
– L: small and large fish were excluded using a fine net
– C: the control group, all grazers were excluded
A Stat 502 Example:
Intertidal Seaweed Grazers
• The intertidal zone is a highly variable environment.
The researcher applied the treatments in 8 blocks of 12 plots each:
– #1: Just below high tide, exposed to heavy surf
– #2: Just below high tide, protected from surf
– #3: Midtide, exposed
– #4: Midtide, protected
– #5: Just above low tide level, exposed
– #6: Just above low tide level, protected
– #7: On near-vertical rock wall, midtide, protected
– #8: On near-vertical rock wall, above low tide, protected
A Stat 502 Example:
Percent of regenerated seaweed on intertidal
plots with some grazers excluded

Block   Control   L        f        Lf       fF       LfF
1       14, 23    4, 4     11, 24   3, 5     10, 13   1, 2
2       22, 35    7, 8     14, 31   3, 6     10, 15   3, 5
3       67, 82    28, 58   52, 59   9, 31    44, 50   6, 9
4       94, 95    27, 35   83, 89   21, 57   57, 73   7, 22
5       34, 53    11, 33   33, 34   5, 9     26, 42   5, 6
6       58, 75    16, 31   39, 52   26, 43   38, 42   10, 17
7       19, 47    6, 8     43, 53   4, 12    29, 36   5, 14
8       53, 61    15, 17   30, 37   12, 18   11, 40   5, 7
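Each treatment appears on two plots in every block, so the table is a balanced blocks-by-treatments layout, and the analysis of variance sums of squares can be computed directly from means. The sketch below is a hypothetical NumPy implementation on the untransformed percentages. It is not necessarily the analysis taught in Stat 502, which might transform the response first. The block-by-treatment interaction term addresses the question of whether grazing effects are similar in all microhabitats.

```python
import numpy as np

treatments = ["Control", "L", "f", "Lf", "fF", "LfF"]
# regen[block, treatment, replicate]: percent regenerated, copied from the table
regen = np.array([
    [[14, 23], [4, 4], [11, 24], [3, 5], [10, 13], [1, 2]],
    [[22, 35], [7, 8], [14, 31], [3, 6], [10, 15], [3, 5]],
    [[67, 82], [28, 58], [52, 59], [9, 31], [44, 50], [6, 9]],
    [[94, 95], [27, 35], [83, 89], [21, 57], [57, 73], [7, 22]],
    [[34, 53], [11, 33], [33, 34], [5, 9], [26, 42], [5, 6]],
    [[58, 75], [16, 31], [39, 52], [26, 43], [38, 42], [10, 17]],
    [[19, 47], [6, 8], [43, 53], [4, 12], [29, 36], [5, 14]],
    [[53, 61], [15, 17], [30, 37], [12, 18], [11, 40], [5, 7]],
], dtype=float)


def two_way_anova(y):
    """Blocks x treatments ANOVA with interaction for a balanced array y[b, t, r]."""
    b, t, r = y.shape
    grand = y.mean()
    block_means = y.mean(axis=(1, 2))
    trt_means = y.mean(axis=(0, 2))
    cell_means = y.mean(axis=2)
    ss = {
        "block": t * r * np.sum((block_means - grand) ** 2),
        "treatment": b * r * np.sum((trt_means - grand) ** 2),
        "error": np.sum((y - cell_means[:, :, None]) ** 2),  # within-cell variation
    }
    ss_cells = r * np.sum((cell_means - grand) ** 2)
    ss["interaction"] = ss_cells - ss["block"] - ss["treatment"]
    df = {"block": b - 1, "treatment": t - 1,
          "interaction": (b - 1) * (t - 1), "error": b * t * (r - 1)}
    ms_error = ss["error"] / df["error"]
    # Each F statistic compares a mean square with the error mean square.
    f_stats = {k: (ss[k] / df[k]) / ms_error for k in ("block", "treatment", "interaction")}
    return ss, df, f_stats, trt_means


ss, df, f_stats, trt_means = two_way_anova(regen)
for name, m in zip(treatments, trt_means):
    print(f"{name:8s} mean = {m:6.2f}")
for source, f in f_stats.items():
    print(f"{source:12s} F = {f:.2f} on ({df[source]}, {df['error']}) df")
```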
Stat 503: Design of Experiments
• The key word is “experiments”
• When you can control the values of your
predictors (factors), you should ensure you
can answer your research question by:
– Collecting the appropriate measurements
– Setting the values of your factors appropriately
– Reducing extraneous variation by “blocking”
– Having an appropriate sample size
Stat 503: Design of Experiments
• Topics covered:
– Design principles
– Optimality
– Confounding in split-plot designs
– Repeated measures designs, fractional factorial designs,
response surface designs
– Balanced/partially balanced incomplete block designs
• Prerequisites:
– Stat 501 (or undergraduate Stat 462)
– Stat 502
A Stat 503 Example:
The BARGE Study
• The current standard treatment for patients with
mild to moderate asthma is scheduled daily
use of inhaled albuterol.
• It is now hypothesized that such regular use has
a negative effect on lung function in
patients with the B16Arg/Arg genotype, but not
in those with the B16Gly/Gly genotype.
A Stat 503 Example:
The BARGE Study
• The BARGE Study compares the
regular use of inhaled albuterol (A) with
placebo (P) in patients with the B16Arg/Arg
genotype (R) and in patients with the
B16Gly/Gly genotype (G).
• The primary hypothesis concerns whether
(μRA - μRP) - (μGA - μGP) equals 0, that is, whether
the effect of albuterol relative to placebo differs
between the two genotypes.
A Stat 503 Example:
BARGE Study’s Paired Crossover

Order     Genotype R                       Genotype G
          Period 1   Washout   Period 2    Period 1   Washout   Period 2
1 (AP)    Y1jRA      ---       Y1jRP       Y1jGA      ---       Y1jGP
2 (PA)    Y2jRP      ---       Y2jRA       Y2jGP      ---       Y2jGA
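In a 2 x 2 crossover, each subject's period 1 minus period 2 difference contains both the treatment difference and the period difference. Contrasting the AP order with the PA order cancels the period difference. The sketch below applies this within each genotype and takes the difference between genotypes, which estimates (μRA - μRP) - (μGA - μGP). It assumes the washout removes any carryover effect, and the function names and noise-free example data are hypothetical.

```python
from statistics import mean


def crossover_effect(ap_pairs, pa_pairs):
    """Estimate (mean under A) - (mean under P) from a 2 x 2 crossover.

    ap_pairs: (period 1, period 2) responses for subjects in order 1 (AP)
    pa_pairs: (period 1, period 2) responses for subjects in order 2 (PA)
    Assumes no carryover effect survives the washout.
    """
    d_ap = mean(p1 - p2 for p1, p2 in ap_pairs)  # estimates (A - P) + (period 1 - period 2)
    d_pa = mean(p1 - p2 for p1, p2 in pa_pairs)  # estimates (P - A) + (period 1 - period 2)
    return (d_ap - d_pa) / 2                     # the period difference cancels


def genotype_interaction(r_ap, r_pa, g_ap, g_pa):
    """Estimate (muRA - muRP) - (muGA - muGP)."""
    return crossover_effect(r_ap, r_pa) - crossover_effect(g_ap, g_pa)


def make_pairs(baselines, effect, period_shift, order):
    """Hypothetical noise-free responses: baseline + period shift + albuterol effect."""
    if order == "AP":
        return [(s + effect, s + period_shift) for s in baselines]
    return [(s, s + period_shift + effect) for s in baselines]


# Hypothetical effects: albuterol -0.5 in genotype R, +0.25 in genotype G; period 2 shift 0.25.
base = [2.0, 2.5, 3.0]
estimate = genotype_interaction(
    make_pairs(base, -0.5, 0.25, "AP"), make_pairs(base, -0.5, 0.25, "PA"),
    make_pairs(base, 0.25, 0.25, "AP"), make_pairs(base, 0.25, 0.25, "PA"),
)
print(estimate)  # -0.75
```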
Stat 504: Analysis of Discrete Data
• Analysis of data when:
– the response y is binary or discrete
– the predictors are qualitative or quantitative
• Summarized data are frequency counts
• How do the predictors affect the response?
Stat 504: Analysis of Discrete Data
• Topics covered:
– Models for frequency arrays
– Goodness-of-fit tests
– Two-, three- and higher-way tables
– Latent models
– Logistic and Poisson regression models
• Prerequisites
– Stat 502 (or undergraduate Stat 460 or major Stat 512)
– Matrix algebra
A Stat 504 Example:
Survival in the Donner Party
• In 1846, the Donner and Reed families traveled
from Illinois to California by covered wagon.
• The group became stranded in the eastern Sierra
Nevada mountains when hit by heavy snow.
• 40 of the 87 members died from famine and exposure;
45 of the members were adults (over age 15).
• Are females better able to withstand harsh
conditions than are males?
A Stat 504 Example:
Survival in the Donner Party
[Figure: Probability of survival vs Age, with separate curves for Female and Male]
A Stat 504 Example:
Survival in the Donner Party
Link Function: Logit

Response Information
Variable   Value      Count
STATUS     SURVIVED   20  (Event)
           DIED       25
           Total      45

Logistic Regression Table
                                                  Odds    95% CI
Predictor   Coef       SE Coef   Z       P        Ratio   Lower   Upper
Constant    1.633      1.110     1.47    0.141
AGE         -0.07820   0.03729   -2.10   0.036    0.92    0.86    0.99
Gender      1.5973     0.7555    2.11    0.034    4.94    1.12    21.72
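The Z, P, Odds Ratio and 95% CI columns all follow from Coef and SE Coef. Z is Coef / SE, the odds ratio is exp(Coef), and the Wald interval is exp(Coef ± 1.96 SE). The sketch below reproduces these columns from the table and also evaluates the fitted survival probability. The Gender coding (1 = female, 0 = male) is an assumption, inferred from the positive coefficient and from the female curve lying above the male curve in the survival plot.

```python
import math

# (coefficient, standard error) copied from the Logistic Regression Table
COEFS = {"Constant": (1.633, 1.110), "AGE": (-0.07820, 0.03729), "Gender": (1.5973, 0.7555)}
Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution


def wald_summary(coef, se):
    """Return z, two-sided p-value, odds ratio and 95% Wald CI for the odds ratio."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p, math.exp(coef), math.exp(coef - Z_975 * se), math.exp(coef + Z_975 * se)


def survival_probability(age, female):
    """Fitted P(survive); ASSUMES Gender is coded 1 = female, 0 = male."""
    logit = COEFS["Constant"][0] + COEFS["AGE"][0] * age + COEFS["Gender"][0] * female
    return 1.0 / (1.0 + math.exp(-logit))


summaries = {name: wald_summary(c, se) for name, (c, se) in COEFS.items()}
for name, (z, p, odds, lo, hi) in summaries.items():
    print(f"{name:8s} z={z:6.2f} p={p:.3f} OR={odds:.2f} CI=({lo:.2f}, {hi:.2f})")
print(round(survival_probability(25, 1), 2), round(survival_probability(25, 0), 2))
```

The Minitab table reports odds ratios only for AGE and Gender. The row printed for Constant is the fitted odds of survival at age 0 for males, under the assumed coding, and has no useful interpretation here.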
Stat 505: Applied Multivariate
Statistical Analysis
• Analysis of data when you have several
correlated, continuous responses is called
multivariate data analysis.
• A repeated measure is a special kind of
multivariate response obtained by
measuring the same variable on each
subject several times, possibly under
different conditions.
Stat 505: Applied Multivariate
Statistical Analysis
• Topics covered:
– Multivariate data: matrix review, graphical displays, probability
theory, multivariate normal distribution, partial correlations
– Inferences about multivariate means: Hotelling’s T² tests,
multivariate analysis of variance, repeated measures experiments
and growth curves, discriminant analysis
– Data reduction: Principal components, factor analysis, canonical
correlation analysis, cluster analysis
– Structural equation modeling
• Prerequisites:
– 6 credits in statistics
– Matrix algebra
A Stat 505 Example:
Pottery Data
• Pottery samples were collected from four
sites in the British Isles: Llanedyrn,
Caldicot, Isle Thorns, and Ashley Rails.
• Each piece was analyzed for its aluminum, iron,
magnesium, calcium, and sodium content.
• Do the pottery samples from the four sites
differ with respect to their composition?
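Comparing several multivariate means is a one-way MANOVA. One common test statistic is Wilks’ lambda, det(E) / det(E + H). Here E is the within-site sums-of-squares-and-cross-products matrix and H is the corresponding between-site matrix. Values near 0 point to differences among the sites, and values near 1 suggest none. The sketch below uses simulated data, not the actual pottery measurements, and the function name is hypothetical.

```python
import numpy as np


def wilks_lambda(groups):
    """One-way MANOVA: return (Wilks' lambda, E, H) for a list of (n_g x p) arrays."""
    all_obs = np.vstack(groups)
    grand_mean = all_obs.mean(axis=0)
    p = all_obs.shape[1]
    E = np.zeros((p, p))  # within-group (error) SSCP matrix
    H = np.zeros((p, p))  # between-group (hypothesis) SSCP matrix
    for g in groups:
        centered = g - g.mean(axis=0)
        E += centered.T @ centered
        diff = (g.mean(axis=0) - grand_mean).reshape(-1, 1)
        H += g.shape[0] * (diff @ diff.T)
    return np.linalg.det(E) / np.linalg.det(E + H), E, H


# Simulated stand-in: 4 sites, 6 pieces each, 5 chemical measurements per piece.
rng = np.random.default_rng(0)
site_means = rng.uniform(1.0, 10.0, size=(4, 5))
groups = [m + rng.normal(scale=1.0, size=(6, 5)) for m in site_means]
lam, E, H = wilks_lambda(groups)
print(round(float(lam), 4))
```

In practice, lambda is converted to an approximate F or chi-square statistic to obtain a p-value.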
Stat 506: Sampling
Theory and Methods
• Topics covered:
– Basic methods: simple random sampling, selecting sample sizes,
unequal probability sampling, ratio and regression estimation,
stratified sampling, cluster and systematic sampling, multistage
designs, double sampling
– Special topics: sampling hidden human populations, environmental
sampling, sampling to study cause-and-effect relationships,
resampling of data, measurement errors and nonresponse in
surveys, adaptive sampling, network and snowball sampling
• Prerequisites:
– Calculus
– 3 credits in statistics
A Stat 506 Example:
A Water Pollution Survey
• Study region of interest has 320 lakes.
• Take random sample of the lakes by:
– Draw a rectangle of length l and width w around the
study region.
– Generate pairs of uniform (0,1) random numbers. Multiply the first
number by l and the second by w to get random location
coordinates within the rectangle.
– If the location falls in a lake, then that lake is selected.
– Continue until the required number of lakes has been selected.
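The selection steps above can be simulated directly. A random point is more likely to land in a large lake than in a small one, so this procedure selects lakes with probability proportional to their area, which makes it an instance of unequal probability sampling. The sketch below represents lakes as hypothetical non-overlapping circles and keeps drawing points until n distinct lakes are hit. The slide does not say whether a lake hit twice counts twice (sampling with replacement), so ignoring repeat hits is an assumption.

```python
import random


def sample_lakes(lakes, length, width, n, rng):
    """Select n distinct lakes by dropping uniform random points on an l x w rectangle.

    lakes: list of (cx, cy, radius) circles, assumed non-overlapping and inside
    the rectangle (a hypothetical stand-in for real lake outlines).
    """
    if n > len(lakes):
        raise ValueError("cannot select more lakes than exist")
    selected = []
    while len(selected) < n:
        x = rng.random() * length  # first uniform(0,1) number times l
        y = rng.random() * width   # second uniform(0,1) number times w
        for i, (cx, cy, r) in enumerate(lakes):
            if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2:
                if i not in selected:  # assumption: a repeat hit is not added again
                    selected.append(i)
                break
        # points that miss every lake are simply discarded
    return selected


# 320 hypothetical lakes centered on a 20 x 16 grid, with random radii.
rng = random.Random(1)
lakes = [(i + 0.5, j + 0.5, rng.uniform(0.05, 0.45)) for i in range(20) for j in range(16)]
sample = sample_lakes(lakes, 20.0, 16.0, 10, rng)
print(sample)
```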
Stat 509: Biostatistical Methods
• Topics covered:
– An introduction to the design and statistical
analysis of randomized and observational
studies in biomedical research
• Prerequisites:
– Stat 500
Stat 510: Applied Time Series
Analysis
• Topics covered:
– Identification of models for empirical data collected
over time
– Use of models in forecasting
• Prerequisites:
– Stat 501 (or undergraduate Stat 462 or major Stat 511)
A Stat 510 Example:
Measuring Global Warming
• Temperature (in degrees Celsius) averaged for the
northern hemisphere over a full year.
• Temperature series collected from 1880 to 1987.
• All measurements expressed as differences from
their 108-year mean.
• Research questions:
– Is the mean temperature increasing over the 108 years?
– What is the rate of increase in global temperature over
the past century?
A Stat 510 Example:
Measuring Global Warming
[Figure: Scatterplot of TEMP vs YEAR]
A Stat 510 Example:
Measuring Global Warming
[Figure: Residuals Versus the Order of the Data (response is TEMP), residuals plotted against observation order]
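Two quantities behind these slides are easy to compute. One is the least-squares trend slope; multiplied by 100, it gives the rate of change per century. The other is the lag-1 autocorrelation of the residuals taken in time order. Positive autocorrelation shows up as runs of same-sign residuals in an order plot, and it typically makes ordinary regression standard errors too small, which is one reason time series models are needed. The data below are simulated with a hypothetical trend and AR(1) noise; they are not the actual temperature series.

```python
import numpy as np


def linear_trend(years, temps):
    """Least-squares line through (year, temp); returns slope, intercept, residuals."""
    slope, intercept = np.polyfit(years, temps, 1)  # highest-degree coefficient first
    residuals = temps - (intercept + slope * years)
    return slope, intercept, residuals


def lag1_autocorrelation(e):
    """Sample lag-1 autocorrelation of a series (e.g. residuals in time order)."""
    d = np.asarray(e, dtype=float) - np.mean(e)
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d ** 2))


# Simulated stand-in: linear trend plus AR(1) noise, 1880 to 1987 (108 years).
rng = np.random.default_rng(42)
years = np.arange(1880, 1988)
noise = np.zeros(years.size)
for t in range(1, years.size):
    noise[t] = 0.6 * noise[t - 1] + rng.normal(scale=0.1)
temps = -0.3 + 0.004 * (years - 1880) + noise
slope, intercept, resid = linear_trend(years, temps)
print(f"estimated rate: {100 * slope:.2f} degrees C per century")
print(f"lag-1 autocorrelation of residuals: {lag1_autocorrelation(resid):.2f}")
```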