Transcript Chapter 1
CHAPTER 1
EVERYTHING YOU
EVER WANTED TO
KNOW ABOUT
STATISTICS
Building Statistical Models
• Discovering something about a phenomenon
• We explain a phenomenon by collecting data from the real world and then drawing conclusions about what is being studied
• Analogy: Building a Bridge across a river
Figure 1.1
• A model is an accurate representation of the real world.
• Social scientists build models of real-world processes to predict how those processes operate under certain conditions.
Simple Statistical Models: The Mean, Sums of Squares & Standard Deviation
• Mean: represents a summary of the data. Example: take 5 statistics lecturers and measure the number of friends each has, i.e. 1, 2, 3, 3 and 4; the mean is 2.6 (Figure 1.2)
• Total error = sum of deviances (the deviances sum to zero, which is why we square them; see the sketch after this list)
• Sum of Squared Errors (SS): measures the accuracy of the model
• Variance: measures how well the model fits the data; variance = SS / (N − 1)
• Standard deviation: the square root of the variance; measures how well the mean represents the data (compare a large vs. a small standard deviation)
• Outcome = Model + Error
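To make the arithmetic concrete, here is a minimal sketch in plain Python (the code is not part of the original slides) that computes these quantities for the five lecturers' friend counts:

```python
# Quantities from the list above, for the friend counts 1, 2, 3, 3, 4.
scores = [1, 2, 3, 3, 4]
n = len(scores)

mean = sum(scores) / n                      # the model: a single summary value
deviances = [x - mean for x in scores]      # error of the model for each observation
total_error = sum(deviances)                # sums to 0, which is why we square
ss = sum(d ** 2 for d in deviances)         # sum of squared errors (SS)
variance = ss / (n - 1)                     # SS / (N - 1)
sd = variance ** 0.5                        # standard deviation

print(mean, total_error, ss, variance, sd)  # 2.6, 0.0, 5.2, 1.3, ~1.14
```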
Figure 1.2: the red line is the model (the mean); the vertical distances between the data points and the line are the error in the model, where the model over- or under-estimates the observed values.
SS (Sum of Squared Errors), Variance and Standard Deviation
Figure 1.3: one panel shows data for which the mean is a good fit, the other shows data for which the mean is a poor fit.
Frequency Distributions
In an ideal world data are distributed symmetrically around the mean: the normal distribution
Figure 1.4
Properties of frequency distributions
Skewness
Positively skewed (left): scores bunched at the low end, long tail towards high scores
Negatively skewed (right): scores bunched at the high end, long tail towards low scores
Figure 1.5
In a normal distribution the skewness value is zero
Kurtosis: distributions vary in their "pointyness", i.e. how heavy their tails are
Leptokurtic: heavy tails (pointy)
Platykurtic: light tails (flat)
Figure 1.7
In a normal distribution the kurtosis (excess kurtosis) value is zero
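A small sketch of how skewness and kurtosis can be checked numerically, assuming NumPy and SciPy are available (neither the code nor the simulated data come from the slides):

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(42)
normal = rng.normal(size=10_000)              # roughly symmetric, light tails
right_skewed = rng.exponential(size=10_000)   # long tail to the right -> positive skew

# scipy's kurtosis() reports excess kurtosis by default, so a normal
# distribution gives values near zero for both statistics.
print(skew(normal), kurtosis(normal))                # both near 0
print(skew(right_skewed), kurtosis(right_skewed))    # both clearly positive (leptokurtic)
```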
1.5.3 Standard Normal Distribution
Beachy Head example
z-scores
z = (X − X̄) / s
Important z-scores
z = ±1.96: 95% of z-scores lie within this range
z = ±2.58: 99% of z-scores lie within this range
z = ±3.29: 99.9% of z-scores lie within this range
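A hedged sketch, assuming NumPy and SciPy and using invented scores, that standardises data and confirms the coverage of these cut-offs under the standard normal distribution:

```python
import numpy as np
from scipy.stats import norm

x = np.array([32, 41, 29, 38, 45, 36, 40, 33], dtype=float)  # hypothetical scores
z = (x - x.mean()) / x.std(ddof=1)   # z = (X - mean) / s

# Proportion of the standard normal distribution inside each cut-off:
for cut in (1.96, 2.58, 3.29):
    coverage = norm.cdf(cut) - norm.cdf(-cut)
    print(f"±{cut}: {coverage:.3f}")   # ~0.950, ~0.990, ~0.999
```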
Figure 1.8
Is my sample representative of the population?
The sampling distribution tells us how samples drawn from the population behave
Figure 1.9
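A simulation sketch (NumPy assumed; the population values are invented) of what a sampling distribution is: draw many samples, take the mean of each, and look at how those means vary. Their spread is the standard error.

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.normal(loc=100, scale=15, size=100_000)  # hypothetical population

n = 30
sample_means = [rng.choice(population, size=n).mean() for _ in range(5_000)]

# The spread of the sampling distribution is the standard error, which the
# formula s / sqrt(n) estimates from a single sample.
print(np.std(sample_means))                  # empirical standard error of the mean
print(population.std(ddof=1) / np.sqrt(n))   # ~15 / sqrt(30) ≈ 2.74
```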
Linear models
Figure 1.11
Confidence Intervals
• Lower boundary of the 95% confidence interval = X̄ − (1.96 × SE)
• Upper boundary of the 95% confidence interval = X̄ + (1.96 × SE)
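A minimal sketch of this formula, assuming NumPy and using invented data (1.96 is the large-sample value; with small samples a t critical value would be used instead):

```python
import numpy as np

x = np.array([5.2, 4.8, 6.1, 5.5, 4.9, 5.8, 6.0, 5.3, 5.1, 5.6])  # invented sample
mean = x.mean()
se = x.std(ddof=1) / np.sqrt(len(x))   # standard error of the mean

lower = mean - 1.96 * se
upper = mean + 1.96 * se
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```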
How can we tell if our model represents the real world?
• Generate a hypothesis
• Collect useful data
• Fit a statistical model to the data
• Assess the model to see whether it supports the initial predictions (see the code sketch after this list)
• The prediction made by the researcher is the experimental hypothesis (alternative hypothesis)
• The status quo is the null hypothesis
• Example: hamburgers make you fat
• Fisher – statistically significant: only when we are 95% certain that a result is genuine (i.e. not a chance finding) should we accept it as being true. Or: if there is only a 5% probability of something occurring by chance, then we accept that it is a true finding; we say it is a statistically significant finding.
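A sketch of the whole workflow with simulated data, assuming SciPy; the "hamburger" numbers are invented purely for illustration:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

# 1. Hypothesis: hamburger eaters gain more weight (H1); no difference (H0).
# 2. "Collect" data: weight gain in kg for two hypothetical groups.
hamburger = rng.normal(loc=2.0, scale=1.5, size=40)
control   = rng.normal(loc=1.0, scale=1.5, size=40)

# 3. Fit a model / compute a test statistic.
t_stat, p_value = ttest_ind(hamburger, control)

# 4. Assess: is the result unlikely (< 5%) to have occurred by chance under H0?
print(t_stat, p_value, "significant" if p_value < 0.05 else "not significant")
```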
Test Statistics
Two Types of Variance
• Systematic variation
Variation due to some genuine effect, i.e. variation that can be explained by the model we have fitted to the data.
• Unsystematic variation
Variation that is not due to the effect in which we are interested, i.e. variation that cannot be explained by the model we have fitted to the data.
Whether a model is a reasonable representation of what is happening in the population
• Calculate a test statistic (a test statistic is simply a statistic with known properties, i.e. we know how frequently different values of it occur)
• Knowing this, we can calculate the probability of obtaining a particular value
• Common test statistics are t, F and chi-square; they all represent the same thing (illustrated in the sketch below):
test statistic = variance explained by the model / variance not explained by the model
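A sketch of this ratio, assuming NumPy and SciPy with invented data: an F statistic built by hand as variance explained by the group means divided by variance not explained, then checked against scipy.stats.f_oneway:

```python
import numpy as np
from scipy.stats import f_oneway

groups = [np.array([4.0, 5.0, 6.0, 5.5]),
          np.array([7.0, 8.0, 6.5, 7.5]),
          np.array([5.0, 6.0, 5.5, 6.5])]

grand_mean = np.concatenate(groups).mean()
ss_model = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # explained
ss_error = sum(((g - g.mean()) ** 2).sum() for g in groups)            # unexplained

df_model = len(groups) - 1
df_error = sum(len(g) for g in groups) - len(groups)

f_by_hand = (ss_model / df_model) / (ss_error / df_error)
f_scipy, p = f_oneway(*groups)
print(f_by_hand, f_scipy, p)   # the two F values agree
```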
One- and two-tailed tests
Figure 1.13
Type I & Type II Errors
What we decide from the experiment (fail to reject H0, or reject H0) versus the true state of nature (what is the case in reality):

When H0 is true (the null is true):
• Fail to reject H0: correct decision, probability 1 − α
• Reject H0: Type I error, α = P(rejecting the null hypothesis when it is true)

When H0 is false (the null is false):
• Fail to reject H0: Type II error, β = P(accepting the null hypothesis when it is false)
• Reject H0: correct decision, probability 1 − β = power
Effect Sizes
• r = .10: small effect, i.e. explains 1% of the total variance
• r = .30: medium effect, i.e. explains 9% of the total variance
• r = .50: large effect, i.e. explains 25% of the total variance
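A brief sketch (NumPy assumed, data invented) of the link between r and "variance explained": the proportion of variance shared by two variables is r squared.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.3 * x + rng.normal(size=200)   # a modest, invented relationship

r = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient
print(r, r ** 2)              # r lands near .3; r**2 is the variance explained (~9%)
```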
Statistical Power
• The ability of a test to detect an effect of a given size is known as statistical power.
• Power = 1 − β, i.e. one minus the probability of a Type II error.
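A simulation sketch (NumPy/SciPy assumed; effect size, sample size and alpha are illustrative) that estimates power directly as the proportion of simulated experiments in which a real effect is detected:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
n, effect_size, alpha, n_sims = 30, 0.5, 0.05, 2_000

hits = 0
for _ in range(n_sims):
    a = rng.normal(loc=0.0, scale=1.0, size=n)
    b = rng.normal(loc=effect_size, scale=1.0, size=n)  # a true difference exists
    _, p = ttest_ind(a, b)
    hits += p < alpha   # effect detected at the 5% level

print(hits / n_sims)    # estimated power (1 - beta); roughly 0.47 for d = 0.5, n = 30
```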