Speed Dating with
Regression Procedures
David J Corliss, PhD
Wayne State University
Physics and Astronomy / Public Outreach
Model Selection Flowchart
[Flowchart: Decision: Continuous or Discrete Outcome; nodes shown include PROC REG, PROC LOGISTIC, NON-LINEAR, LINEAR MIXED and NON-PARAMETRIC]
Simple Linear Regression
• Regression Type: Continuous, linear
• General regression procedure with a number of
options but limited specialized capabilities, for which
other procedures or packages have been developed
• Choice of model variable selection methods (e.g.,
Forward, Backwards, Best Subsets), can be coded for
polynomial regression, multiple model statements
and features interactive capability
• SAS = REG, R = lm function, regress
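As a rough illustration of what a simple linear regression procedure computes, the sketch below fits a straight line by ordinary least squares and reports r². The function name `fit_line` and the data are made up for this example; it is not the implementation used by PROC REG or lm.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope, r_squared) for an ordinary least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # r^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot

# Made-up illustrative data (not from the slides)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope, r2 = fit_line(x, y)
print(intercept, slope, r2)
```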
Simple Linear Regression
Example: Homeless Students by State
[Figure: Actual Percent vs. Model Percent of Student Population, r² = 0.652]
Solid performance of the model across the range from low to high homelessness states
indicates consistency of factors correlated with the number of homeless students
Special Data Needs: Problems with Outliers
Robust Regression
• Regression Type: Continuous, linear
• Robust regression is achieved by identifying outliers,
limiting their influence by assigning weights and then
performing standard regression
• Choice of methods for outlier detection, e.g., M, LTS,
S and MM estimation; robust ANOVA
• SAS = ROBUSTREG, R = robustbase, robust
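The idea of limiting outlier influence through weights can be sketched with M estimation using Huber weights and iteratively reweighted least squares. This is a minimal sketch under stated assumptions: the function names, the tuning constant 1.345, the MAD-based scale and the data are choices made for this example, not a description of how PROC ROBUSTREG or robustbase work internally.

```python
from statistics import median

def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b

def huber_fit(x, y, c=1.345, iterations=50):
    """M estimation with Huber weights via iteratively reweighted least squares."""
    w = [1.0] * len(x)
    for _ in range(iterations):
        a, b = weighted_line(x, y, w)
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale: median absolute residual, rescaled for normal errors
        scale = median(abs(r) for r in resid) / 0.6745
        if scale == 0.0:
            break  # perfect fit, nothing left to reweight
        # Full weight for small residuals, shrinking weight for large ones
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
    return a, b, w

# Made-up data close to y = 1 + 2x, with one gross outlier at x = 4
x = [float(i) for i in range(10)]
y = [1.1, 2.9, 5.0, 7.1, 30.0, 10.9, 13.1, 15.0, 16.9, 19.1]
a_ols, b_ols = weighted_line(x, y, [1.0] * len(x))
a_rob, b_rob, weights = huber_fit(x, y)
print("OLS:   ", a_ols, b_ols)
print("robust:", a_rob, b_rob, "outlier weight:", weights[4])
```

The outlier is kept in the data but ends up with a small weight, so the robust line stays close to the bulk of the points.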
PROC ROBUSTREG
Example: Log-Log Regression With Weighted Outliers
SAS/STAT® 9.2 User’s Guide, support.sas.com
In Robust Regression, the outliers need not be disregarded:
weights can be assigned and incorporated in the regression
Special Data Needs: Ill-Conditioned Data
Regression Using Givens Rotations
• Regression Type: Continuous, linear
• Regression using the Gentleman-Givens procedure
instead of collecting crossproducts
• For ill-conditioned data, where small errors in the
data may cause large errors in the results – more
accurate than simple regression
• SAS = ORTHOREG, R = givens
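The sketch below shows least squares solved by rotating the data matrix to triangular form instead of forming crossproducts. It uses plain Givens rotations on an in-memory matrix, which is a simplification: it is not the Gentleman variant named above, and the function name and data are assumptions for illustration.

```python
import math

def givens_least_squares(X, y):
    """Solve min ||X b - y|| by rotating [X | y] to upper-triangular form."""
    n, p = len(X), len(X[0])
    A = [list(row) + [yi] for row, yi in zip(X, y)]  # augmented matrix
    for j in range(p):
        # Zero out column j below the diagonal, working from the bottom up
        for i in range(n - 1, j, -1):
            a, b = A[i - 1][j], A[i][j]
            r = math.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            for k in range(j, p + 1):
                upper, lower = A[i - 1][k], A[i][k]
                A[i - 1][k] = c * upper + s * lower
                A[i][k] = -s * upper + c * lower
    # Back substitution on the p x p upper triangle
    beta = [0.0] * p
    for j in range(p - 1, -1, -1):
        acc = A[j][p] - sum(A[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = acc / A[j][j]
    return beta

# Made-up data generated exactly from y = 1 + 2t + 3t^2
t_values = [0.0, 1.0, 2.0, 3.0, 4.0]
X = [[1.0, t, t * t] for t in t_values]
y = [1.0 + 2.0 * t + 3.0 * t * t for t in t_values]
beta = givens_least_squares(X, y)
print(beta)
```

Working on X directly avoids squaring its condition number, which is what happens when the crossproducts matrix X'X is formed first.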
Givens Rotation Regression
Example: Fitting a Higher-Order Polynomial
SAS/STAT® 9.2 User’s Guide, support.sas.com
An example of fitting a 9th-degree polynomial, where near
singularities must be distinguished from true ones
Special Data Needs: Transformation
Regression with Data Transformation
• Regression Type: Continuous, linear
• Regression with a number of data transformations,
including smooth, spline, Box-Cox and other nonlinear forms
• Supports fitting splines with a user-specified degree
and number of knots; capable of piece-wise solutions
• SAS = TRANSREG, R = reg, betareg
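As a small illustration of one of the transformations listed, the sketch below applies the Box-Cox power transform to a response that grows exponentially; with λ = 0 (the log case) the relationship becomes linear. The function name `box_cox` and the data are made up for this example.

```python
import math

def box_cox(values, lam):
    """Box-Cox power transform; all values must be positive."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

# Made-up data that grows exponentially: y = 3 * exp(0.8 * x)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [3.0 * math.exp(0.8 * xi) for xi in x]

# With lambda = 0 the transformed response is linear in x, so the
# differences between successive transformed values are constant.
z = box_cox(y, 0)
steps = [z[i + 1] - z[i] for i in range(len(z) - 1)]
print(steps)
```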
Regression with Data Transformation
Example: Spline Regression to a Complex Form
Splines used to fit to a spectrographic line profile
to determine the radial velocity of erupting gas from a star
Special Model Types: General Linear
General Linear Models
• Regression Type: Continuous, linear
• General purpose procedure for continuous least
squares regression using classification predictor
variables as well as continuous
• While capable of many types of models and analysis,
another procedure is often better for a specific task
• SAS = GLM, R = glm function
General Linear Model
Example: Age Group as a Categorical Predictor Variable
[Figure: Distribution of Response by agegroup]
An Overview of ODS Statistical Graphics in SAS® 9.3
Robert N. Rodriguez, SAS Institute Inc., Cary, NC
GLM used with Box and Whisker output
Special Model Types: By Quantile
Quantile Regression
• Regression Type: Continuous, linear
• Quantile regression: while other procedures model
the mean, quantile regression models the median and
other specified quantiles to provide a more complete
picture of the response variable
• Uncertainties for individual quantiles can be
estimated by bootstrapping
• SAS = QUANTREG, R = quantreg
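Quantile regression replaces the squared-error loss with the check (pinball) loss. The sketch below, with made-up data and a hypothetical helper `pinball_loss`, shows the simplest case of a constant model: minimizing the loss at τ = 0.5 picks out the median, and at τ = 0.9 an upper quantile.

```python
def pinball_loss(y, prediction, tau):
    """Average check loss of a constant prediction at quantile level tau."""
    total = 0.0
    for yi in y:
        diff = yi - prediction
        # Under-predictions cost tau per unit, over-predictions cost (1 - tau)
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y)

# Made-up data with one extreme value
y = [1.0, 2.0, 3.0, 4.0, 100.0]
candidates = sorted(y)
best_median = min(candidates, key=lambda c: pinball_loss(y, c, 0.5))
best_q90 = min(candidates, key=lambda c: pinball_loss(y, c, 0.9))
print(best_median, best_q90, sum(y) / len(y))
```

The mean of these data is pulled to 22 by the extreme value, while the τ = 0.5 solution stays at 3; a full quantile regression minimizes the same loss over regression coefficients rather than a single constant.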
Quantile Regression
Example: 5/10/25/50/75/90/95% Quantiles
Predicted birth weight by maternal weight gain
Quantile regression with PROC QUANTREG
Peter L. Flom, Peter Flom Consulting, New York, NY
An example of Quantile Regression demonstrating
greater detail than possible with ordinary regression
Special Model Types: PLS, PCA Regression
Partial Least Squares & Principal Components
• Regression Type: Continuous, linear
• Partial Least Squares and Principal Component
regression: predictor and response variables are
projected into a new coordinate system, possibly
with reduced complexity
• Supports reduced rank regression with cross
validation of the number of components
• SAS = PLS, R = pls
Partial Least Squares / Principal Components
Example: Variable Importance Plot
Quantile regression with PROC QUANTREG
Peter L. Flom, Peter Flom Consulting, New York, NY
Principal Component variables
derived from the original, observed variables
Special Model Types: Survey Data
Survey Regression
• Regression Type: Continuous, linear
• Special capabilities for analysis in the presence of
common survey data features, including
stratification, clustering and weighting
• Supports several methods for sampling and
estimation of sampling error using either Taylor
series or primary sample units
• SAS = SURVEYREG, R = survey
Survey Regression
Example: Regression with Stratified Sampling
Stratum Information
Stratum Index  State     Region  N Obs  Population Total  Sampling Rate
1              Iowa      1       3      100               3.00%
2                        2       5      50                10.0%
3                        3       3      15                20.0%
4              Nebraska  1       6      30                20.0%
5                        2       2      40                5.00%

Tests of Model Effects
Effect     Num DF  F Value  Pr > F
Model      1       21.74    0.0004
Intercept  1       4.93     0.0433
FarmArea   1       21.74    0.0004
Note: The denominator degrees of freedom for the F tests is 14.

Estimated Regression Coefficients
Parameter  Estimate    Standard Error  t Value  Pr > |t|
Intercept  11.8162978  5.31981027      2.22     0.0433
FarmArea   0.2126576   0.04560949      4.66     0.0004

Covariance of Estimated Regression Coefficients
           Intercept     FarmArea
Intercept  28.300381277  -0.146471538
FarmArea   -0.146471538  0.0020802259
PROC SURVEYREG sas.support.com, example 98.4
Example output from application to survey data,
with summary statistics and model parameters
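The numbers in this output are linked by simple arithmetic: each standard error is the square root of the matching diagonal entry of the covariance matrix, each t value is the estimate divided by its standard error, and for these one-degree-of-freedom effects the F value is the square of the t value. The short check below uses only values from the output above; the variable names are made up.

```python
import math

# Values copied from the PROC SURVEYREG example output
estimates = {"Intercept": 11.8162978, "FarmArea": 0.2126576}
variances = {"Intercept": 28.300381277, "FarmArea": 0.0020802259}  # covariance diagonal

std_errors = {}
t_values = {}
for name, estimate in estimates.items():
    std_errors[name] = math.sqrt(variances[name])   # standard error
    t_values[name] = estimate / std_errors[name]    # t = estimate / SE
    f_value = t_values[name] ** 2                   # F = t^2 for a 1-DF effect
    print(name, round(std_errors[name], 8), round(t_values[name], 2), round(f_value, 2))
```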
Special Model Types: PH on Survey Data
Proportional Hazards with Survey Data
• Regression Type: Continuous, linear
• Performs Cox Proportional Hazards modeling on
survey data with truncation, supporting
stratification, clustering and weighting
• Estimates the variance of model parameters by
Taylor series, BRR or Jackknife
• SAS = SURVEYPHREG, R = survey
Proportional Hazards with Survey Data
Example: Stratified Sampling with Truncated Data
Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error  t Value  Pr > |t|  Hazard Ratio
BodyWeight  586  0.011920   0.003155        3.78     0.0002    1.012
Smoke -1    586  -1.174048  0.739450        -1.59    0.1129    0.309
Smoke 1     586  -1.006515  0.578810        -1.74    0.0826    0.365
Smoke 2     586  -0.674183  0.558412        -1.21    0.2278    0.510
Smoke 3     586  0          .               .        .         1.000

Type III Tests of Model Effects
Effect      Num DF  Den DF  F Value  Pr > F
BodyWeight  1       586     14.27    0.0002
Smoke       3       586     1.49     0.2160

Estimate
Label  Estimate  Standard Error  DF   t Value  Pr > |t|  Exponentiated
Row 1  -0.7532   0.3870          586  -1.95    0.0521    0.4709
PROC SURVEYPHREG sas.support.com, example 97.2
Example output for Proportional Hazards regression on survey
data with truncation: summary statistics and model parameters
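In a proportional hazards model each hazard ratio is the exponential of the corresponding coefficient, which can be checked against the output above. The estimates and reported ratios below are copied from that output; the variable names are made up for this check.

```python
import math

# (estimate, reported hazard ratio or exponentiated value) from the example output
rows = {
    "BodyWeight": (0.011920, 1.012),
    "Smoke -1": (-1.174048, 0.309),
    "Smoke 1": (-1.006515, 0.365),
    "Smoke 2": (-0.674183, 0.510),
    "Smoke 3": (0.0, 1.000),       # reference level
    "Row 1": (-0.7532, 0.4709),    # from the Estimate table
}

hazard_ratios = {name: math.exp(estimate) for name, (estimate, _) in rows.items()}
for name, (estimate, reported) in rows.items():
    print(name, round(hazard_ratios[name], 4), "reported:", reported)
```

Read this way, the BodyWeight estimate means each additional unit of body weight multiplies the hazard by about 1.012.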
Special Model Types: Categorical
Regression on Categorical Data
• Regression Type: Continuous, linear
• A generalization of continuous methods to categorical
data; performs linear regression and other analyses
on data that can be expressed in a contingency table
• Supports ordinary and logistic regression, log-linear models and repeated measures
• SAS = CATMOD, R = catdata, vgam
Regression on Categorical Data
Example: Bartlett's Data, No 3-Variable Interaction
Data Summary
Response         Length*Time*Status
Weight Variable  wt
Data Set         BARTLETT
Frequency Missing  0
Response Levels  8
Populations      1
Total Frequency  960
Observations     8

Response Profiles
Response  Length  Time  Status
1         1       1     1
2         1       1     2
3         1       2     1
4         1       2     2
5         2       1     1
6         2       1     2
7         2       2     1
8         2       2     2

Maximum Likelihood Analysis of Variance
Source            DF  Chi-Square  Pr > ChiSq
Length            1   2.64        0.1041
Time              1   5.25        0.0220
Length*Time       1   5.25        0.0220
Status            1   48.94       <.0001
Length*Status     1   48.94       <.0001
Time*Status       1   95.01       <.0001
Likelihood Ratio  1   2.29        0.1299
PROC CATMOD sas.support.com, example 28.4
Example output from regression on categorical data,
with summary statistics and model parameters
Special Model Types: Complex Optimization
Response Surface Regression
• Regression Type: Continuous, linear
• Linear regression for fitting quadratic Response
Surface Models – a type of general linear model that
identifies where optimal response values occur more
efficiently than ordinary regression or GLM
• Output displays the Response Surface and identifies
ridges of optimum response
• SAS = RSREG, R = rsm
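Once a quadratic response surface has been fitted, the candidate optimum is the stationary point where both partial derivatives are zero. The sketch below solves that 2×2 system for a two-factor surface y = b0 + b1·x1 + b2·x2 + b11·x1² + b22·x2² + b12·x1·x2 and classifies the result. The function name, coefficient names and numbers are hypothetical; this is not the PROC RSREG or rsm code.

```python
def stationary_point(b1, b2, b11, b22, b12):
    """Stationary point of a fitted two-factor quadratic surface and its type."""
    # Setting the gradient to zero gives:
    #   2*b11*x1 +   b12*x2 = -b1
    #     b12*x1 + 2*b22*x2 = -b2
    det = 4.0 * b11 * b22 - b12 ** 2
    if det == 0:
        raise ValueError("surface has no unique stationary point")
    x1 = (b12 * b2 - 2.0 * b22 * b1) / det
    x2 = (b12 * b1 - 2.0 * b11 * b2) / det
    if det < 0:
        kind = "saddle"
    elif b11 > 0:
        kind = "minimum"
    else:
        kind = "maximum"
    return (x1, x2), kind

# Hypothetical fitted coefficients: y = x1^2 + x2^2 + x1*x2 - 3*x1 - 3*x2
point, kind = stationary_point(b1=-3.0, b2=-3.0, b11=1.0, b22=1.0, b12=1.0)
print(point, kind)
```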
Response Surface Regression
Example: A Response Surface with Optimal Solution
Quantile regression with PROC QUANTREG
Peter L. Flom, Peter Flom Consulting, New York, NY
An example of a response surface with the optimal solution found
at the minimum; multiple minima and maxima are possible
Special Model Types: Time to Failure
Survival Analysis
• Regression Type: Continuous, linear
• Models time to failure data as a linear combination of
predictors and a random disturbance term, which
can be described by many different distributions
• Supports standard survival analysis data censored on
the right, left, both or neither
• SAS = LIFEREG, R = survival
Survival Analysis
Example: A Cumulative Hazard Model
Quantile regression with PROC QUANTREG
Peter L. Flom, Peter Flom Consulting, New York, NY
This example plots the log-logistic vs. the Kaplan-Meier
Cumulative Hazard
Special Model Types: Time-dependent Risk
Proportional Hazards Model
• Regression Type: Continuous, linear
• Cox Proportional Hazards modeling, where a
unit increase in a predictor multiplies the risk by a
factor determined by the model
• Supports proportional hazards models with data
censored on the right, left, both or neither, variable
selection by multiple methods incl. best subset
• SAS = PHREG, R = coxph
Proportional Hazards Model
Example: Model With Time-Dependent Predictors
Example output from a Proportional Hazards model, with
summary statistics and model parameters
Special Model Types: Simultaneous Outcomes
Structural Equation Models
• Regression Type: Continuous, linear
• In Structural Equation Modeling, a linear
combination of predictors describes a vector equal to
a linear combination of outcome variables
• Supports latent variables, multiple and multivariate
regression, path analysis and canonical correlation
• SAS = CALIS, R = sem
Structural Equation Model
Example: Linear Relations among Factor Loadings
Example output from a Structural Equation
model, with matrices of model parameters
Discrete Outcomes: Simple Logistic
Logistic Regression
• Regression Type: binary & ordinal outcomes, linear
• General procedure for logistic regression with a
number of options; other procedures may offer more
capabilities for specific types of discrete models
• Supports many model variable selection methods and
diagnostic tests
• SAS = LOGISTIC, R = glm function
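For a binary outcome, logistic regression models the log-odds as a linear function of the predictors and fits the coefficients by maximum likelihood. The sketch below does this for one predictor with plain gradient ascent. The data, function names, step size and iteration count are assumptions for illustration; PROC LOGISTIC and glm use faster Newton-type algorithms.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, rate=0.1, steps=20000):
    """Fit P(y = 1) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            err = yi - sigmoid(b0 + b1 * xi)  # observed minus predicted probability
            g0 += err
            g1 += err * xi
        b0 += rate * g0 / n
        b1 += rate * g1 / n
    return b0, b1

# Made-up data: hours of study and pass (1) / fail (0), with some overlap
hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(hours, passed)
print(b0, b1, sigmoid(b0 + b1 * 2.0), sigmoid(b0 + b1 * 7.0))
```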
Discrete Outcomes: Simple Logistic
Logistic Regression
Data: IDRE / UCLA
Example data and output from a Logistic Regression
model, with summary statistics and model parameters
Discrete Outcomes: Generalized
Generalized Linear Models
• Regression Type: discrete outcomes, linear
• Generalized linear models with discrete outcomes,
appropriate where the data are not normally
distributed or the variance is not the same for all
observations
• Supports Poisson Regression and Repeated Measures
• SAS = GENMOD, R = glm function
Discrete Outcomes: Generalized
Generalized Linear Models
Example output from a Generalized Linear Model of a
discrete outcome, with summary statistics and model parameters
Discrete Outcomes: Outcome Probability
PROBIT Models
• Regression Type: discrete outcomes, linear
• Models the probability that an observation will have a
particular outcome
• Supports probit, logit, ordinal logistic, and extreme
value / gompit
• SAS = PROBIT, R = glm, family = binomial(link = "probit")
Discrete Outcomes: Outcome Probability
PROBIT Models
Example data and output from a PROBIT model,
with summary statistics and model parameters
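The difference between the probit and logit links is only the function that maps the linear predictor to a probability: the standard normal CDF for probit, the logistic function for logit. A minimal sketch using the Python standard library follows; the function names are made up for this example.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def probit_probability(eta):
    """Probability under a probit model for linear predictor eta."""
    return STANDARD_NORMAL.cdf(eta)

def logit_probability(eta):
    """Probability under a logit model for linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))

for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(eta, round(probit_probability(eta), 4), round(logit_probability(eta), 4))

# The probit link itself is the inverse normal CDF
print(STANDARD_NORMAL.inv_cdf(0.975))
```

Both curves pass through 0.5 at zero; the probit curve approaches 0 and 1 faster in the tails, so fitted coefficients from the two models are on different scales.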
Non-Linear Models: General
Non-Linear Models
• Regression Type: non-linear
• Performs non-linear regression with the dependent
variable divided into a mean component and a
(random) error component; process is iterative
• Supports steepest-descent, Newton, modified Gauss-Newton and Marquardt methods
• SAS = NLIN, R = nls function, nleqslv
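The iterative process can be sketched with an unmodified Gauss-Newton loop for the two-parameter model y = a·exp(b·x): each pass linearizes the model around the current estimates and solves a small least-squares problem for the update. The model, starting values, data and function name are assumptions for illustration; the modified Gauss-Newton and Marquardt methods listed above add step control that this sketch omits.

```python
import math

def gauss_newton_exp(x, y, a, b, iterations=20):
    """Fit y = a * exp(b * x) by plain Gauss-Newton from starting values (a, b)."""
    for _ in range(iterations):
        jtj00 = jtj01 = jtj11 = jtr0 = jtr1 = 0.0
        for xi, yi in zip(x, y):
            e = math.exp(b * xi)
            r = yi - a * e        # residual at the current estimates
            j0 = e                # derivative of the model with respect to a
            j1 = a * xi * e       # derivative of the model with respect to b
            jtj00 += j0 * j0
            jtj01 += j0 * j1
            jtj11 += j1 * j1
            jtr0 += j0 * r
            jtr1 += j1 * r
        # Solve the 2x2 normal equations (J'J) delta = J'r
        det = jtj00 * jtj11 - jtj01 ** 2
        a += (jtj11 * jtr0 - jtj01 * jtr1) / det
        b += (jtj00 * jtr1 - jtj01 * jtr0) / det
    return a, b

# Made-up noise-free data from y = 2 * exp(0.5 * x)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]
a_hat, b_hat = gauss_newton_exp(x, y, a=1.8, b=0.45)
print(a_hat, b_hat)
```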
Non-Linear Models
Example: Fitting a Model to a Complex Curve
In this example observations are normally distributed
about a non-linear function – in this case, a Morlet wavelet
Non-Linear Models: Mixed Effects
Non-Linear Mixed-Effects Models
• Regression Type: non-linear
• Performs non-linear regression where both the mean
and error components of the dependent variable are
non-linear; process uses a Taylor series expansion
about zero
• Supports normal, binomial and Poisson distributions
and capability for programming a general distribution
• SAS = NLMIXED, R = nlme
Non-Linear Mixed-Effects Models
Example: Plot of Profile of Trees Over Time
In this example, variability in the shape of observed trees
increases over time
Linear Mixed: Fixed and Random Effects
Mixed Models
• Regression Type: linear, fixed and random effects
• Performs linear regression using a linear
combination of fixed effects added to a second linear
combination of random effects
• Supports repeated measures in longitudinal studies;
especially useful for dealing with missing data
• SAS = MIXED, R = lme4, coxme
Linear Mixed-Effects Models
Example: Repeated Measures
Example of a Mixed Effects Model, incorporating both fixed
and random effects to improve the predictive power
Linear Mixed: General
General Mixed Models
• Regression Type: linear mixed
• Generalization of mixed models to permit normally
distributed random effects and non-normal error
terms
• Supports fitting models to correlated data or where
the variability is not constant
• SAS = GLIMMIX, R = lme4
General Mixed Models
Example: Crossed Random Effects
LOESS with crossed random effects analyzes in-breeding in an
isolated population, allowing generalization to all populations
Non-Parametric Models: Localized
Local Regression
• Regression Type: linear, non-parametric
• Develops a model by applying non-parametric regression to
segments of data and calculates confidence limits for
the outcome; computationally intensive
• Supports multiple dependent variables,
multidimensional predictors and interpolation using
kd trees
• SAS = LOESS, R = locfit
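Fitting to segments of the data can be sketched as a weighted straight-line fit around each point of interest, with tricube weights that fall to zero at the edge of the local window. The function name, span value and data are assumptions for this example; it omits the confidence limits and the kd-tree interpolation mentioned above.

```python
import math

def loess_at(x, y, x0, span=0.5):
    """Local linear estimate at x0 using tricube weights on the nearest points."""
    k = max(3, round(span * len(x)))               # number of points in the window
    h = sorted(abs(xi - x0) for xi in x)[k - 1]    # bandwidth: k-th nearest distance
    w = []
    for xi in x:
        u = abs(xi - x0) / h if h > 0 else 0.0
        w.append((1.0 - u ** 3) ** 3 if u < 1.0 else 0.0)  # tricube weight
    # Weighted least-squares line through the local window
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar + slope * (x0 - x_bar)

# Made-up curved data: one cycle of a sine wave sampled at 21 points
x = [i * 0.3 for i in range(21)]
y = [math.sin(xi) for xi in x]
smoothed = [loess_at(x, y, xi, span=0.3) for xi in x]
print(smoothed[5], y[5])
```

Each evaluation repeats a full weighted fit, which is why the method is computationally intensive on large data sets.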
Local Regression
Example: Periodicities in Weather Data
In this example, Local Regression is used to identify
potential periodicities at 12 and 42 months
Non-Parametric Models: Additive
Generalized Additive Models
• Regression Type: linear, non-parametric
• Generalized Additive Models, with multiple
independent non-parametric predictors; univariate
smoothing provides finer details than is possible with
the piece-wise LOESS procedure
• Supports non-parametric and semi-parametric
models, multidimensional predictors
• SAS = GAM, R = gam
Additive Model
Example: Segmented Response Surface
An Additive Model used to fit a complex response surface
without loss of detail due to piece-wise fitting in local regression
Questions