Chapter 19.1

Final Exam Time and Place:
Saturday, Dec 8,
9:00am - 12:00pm
EN 1054
Chapter 19.1
Exploratory Data Analysis
What is Exploratory Data Analysis?
• An approach to analyzing data sets in order to:
– Discover patterns
– Find a better model
• It’s an iterative process
– Refine to uncover patterns
Confirmatory vs. Exploratory
Confirmatory analysis
• What decision can be made?
• How certain can we be?
• What are values of parameters?
• Sample
• ONE use of a sample (data grinding, otherwise)
• Single analysis
• p-value = ?
• Yes/no decision
• Residuals acceptable?
• Experimental design
Exploratory analysis
• What is the appropriate model?
• What is data telling us?
• What is structure of model?
• Batch of data
• Repeated use of a batch
• Iterative search for pattern
• Explained variance = ?
• Best model
• Residuals show pattern?
• Factor analysis
Exploratory
What is the appropriate model?
But remember, pattern ≠ cause
Confirmatory
What decision can be made?
Inference
• Confirmatory
– Narrow form of inference
– Relate one Q to another Q (e.g. β_reg)
• Exploratory
– Broader form of inference
– Trying to discover a pattern worth running through a confirmatory analysis
(P corn, N corn, C corn, …) ~ (P soil, N soil, C soil, …)
Don’t confuse confirmatory and exploratory analyses
• Refining models using p-values ≠ exploratory analysis
• Repeated p-value testing of the same data set is data dredging (aka data grinding, data mining, data fishing, data snooping…)
• Any data set has a degree of randomness, so with enough comparisons a false association is likely to turn up
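The multiple-comparisons point can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: it correlates a pure-noise response with many pure-noise candidate variables and counts how many look "significant" at the 0.05 level. The function names, the sample sizes, and the use of the Fisher z approximation for the p-value are all assumptions chosen for illustration.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r, using the Fisher z approximation (an assumption)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def count_false_associations(n_obs=30, n_candidates=1000, alpha=0.05, seed=1):
    """Count noise variables that pass the p-value cut against a noise response."""
    rng = random.Random(seed)
    response = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_candidates):
        # Each candidate is unrelated to the response by construction.
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        if approx_p_value(pearson_r(candidate, response), n_obs) < alpha:
            hits += 1
    return hits


def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    print("false associations:", count_false_associations())
    print("P(at least one false positive in 20 tests):",
          round(familywise_error_rate(20), 3))
```

Every hit counted here is a false association, because none of the candidates is related to the response. Under these assumptions, roughly alpha times the number of candidates will pass the cut, and the chance of at least one false positive in 20 independent tests is already about 0.64.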
Characteristics of Exploratory Analyses
• Relies strongly on graphical analyses
http://gallery.r-enthusiasts.com/thumbs.php
• Simplify – determine best model for pattern
Execution
1. Define all quantities that are used
– Procedure statement
– Name and Symbol
– Values with Units
2. Identify response and explanatory variables
3. Decide whether to undertake exploratory or confirmatory analysis, stating reasons for choice
4. State screening criterion to distinguish exploratory from confirmatory analysis
– Visual screening
– P-value based (e.g. keep if <0.1)
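A p-value based screening criterion like the one in step 4 can be sketched in a few lines. This is a hypothetical illustration only: the function name `screen_by_p_value`, the variable names, and the p-values are invented for the example and are not results from the course. The 0.1 threshold is the one given in step 4.

```python
def screen_by_p_value(p_values, threshold=0.1):
    """Return the names of candidate variables whose p-value is below threshold."""
    return [name for name, p in p_values.items() if p < threshold]


# Hypothetical p-values, for illustration only.
candidate_p_values = {"light": 0.03, "nitrate": 0.08, "phosphate": 0.40}

kept = screen_by_p_value(candidate_p_values)
print(kept)  # ['light', 'nitrate']
```

Variables that pass this screen are candidates worth carrying into a later confirmatory analysis. The p-values are used here as a filter, not as a yes/no decision.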
Box and Arrow Diagrams: Logic
• Gordon Riley is interested in aquatic productivity of Georges Bank
[Diagram boxes: Light; Nutrients (nitrates, phosphates); Phytoplankton; Zooplankton]