A variable - The Department of Mathematics & Statistics


Stats 242.3(02)
Statistical Theory and Methodology
Instructor:
W.H.Laverty
Office:
235 McLean Hall
Phone:
966-6096
Lectures:
M W F 2:30pm - 3:20pm
Arts 206
Lab: M 3:30 - 4:20 Arts 206
Evaluation:
Assignments, Labs, Term tests - 40%
Final Examination - 60%
Text:
Dennis D. Wackerly, William Mendenhall
III, Richard L. Scheaffer, Mathematical
Statistics with applications, 6th Edition,
Duxbury Press
Course Outline
Introduction
Chapter 1
Sampling Distributions
Chapter 7
• Sampling distributions related to the
Normal distribution
• The Central Limit theorem
• The Normal approximation to the Binomial
Estimation
Chapter 8
• Properties of estimators
• Interval estimation
• Sample size determination
Properties and Methods of
Estimation
Chapter 9
• The method of moments
• Maximum Likelihood estimation
• Sufficiency (Sufficient Statistics)
Hypothesis testing
Chapter 10
• Elements of a statistical test - type I and
type II errors
• The Z test - one and two samples
• Hypothesis testing for the means of the
normal distribution with small sample sizes
• Power and the Neyman-Pearson Lemma
• Likelihood ratio tests
Linear and Nonlinear Models
Least Squares Estimation
Chapter 11
• Topics covered dependent on available time
The Analysis of Variance
Chapter 13
• Topics covered dependent on available time
Nonparametric Statistical
Methods
Chapter 15
• Topics covered dependent on available time
Introduction
What is Statistics?
It is the major mathematical tool of
scientific inference: methods for
drawing conclusions from data.
The data are to some extent corrupted
by a component of random
variation (random noise).
Phenomena
• Deterministic
• Non-deterministic
Deterministic Phenomena
A mathematical model exists that
allows accurate prediction of
outcomes of the phenomena (or
observations taken from the
phenomena)
Non-deterministic Phenomena
Lack of perfect predictability
Non-deterministic Phenomena
• Haphazard
• Random
Random Phenomena
No mathematical model exists that allows
accurate prediction of outcomes of the
phenomena (or observations)
However, in the long run and on average,
the outcomes (or observations) exhibit
statistical regularity
Example
Tossing of a Coin:
No mathematical model exists that allows
accurate prediction of the outcome of this
phenomenon
However, in the long run and on average,
the coin lands heads approximately 50% of
the time and tails approximately 50% of
the time
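The statistical regularity described above can be illustrated with a small simulation. This is only a sketch: the function name head_frequency and the seed are illustrative choices, and the coin is assumed to be balanced.

```python
import random

def head_frequency(n_tosses, seed=0):
    """Simulate n_tosses of a balanced coin and return the fraction of heads."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# A single toss cannot be predicted, but the relative frequency of heads
# settles near 0.5 as the number of tosses grows.
for n in (10, 100, 10_000, 1_000_000):
    print(n, head_frequency(n))
```

For small n the fraction can be far from 0.5; the regularity shows up only in the long run.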
Haphazard Phenomena
No mathematical model exists that allows
accurate prediction of outcomes of the
phenomena (or observations)
No exhibition of statistical regularity in
the long run.
Do such phenomena exist?
In both Statistics and Probability theory
we are concerned with studying random
phenomena
In probability theory
The model is known and we are interested
in predicting the outcomes and
observations of the phenomena.
model → outcomes and observations
In statistics
The model is unknown, but
the outcomes and observations of the
phenomena have been observed.
We are interested in determining the model
from the observations
outcomes and observations → model
Example - Probability
A coin is tossed n = 100 times
We are interested in the observation, X, the
number of times the coin is a head.
Assuming the coin is balanced (i.e. p = the
probability of a head = ½), then
\[
p(x) = P[X = x] = \binom{100}{x} \left(\tfrac{1}{2}\right)^{x} \left(\tfrac{1}{2}\right)^{100-x} = \binom{100}{x} \left(\tfrac{1}{2}\right)^{100}
\quad \text{for } x = 0, 1, \ldots, 100
\]
Example - Statistics
We are interested in the success rate, p, of a
new surgical procedure.
The procedure is performed n = 100 times.
X, the number of successful times the
procedure is performed is 82.
The success rate p is unknown.
If the success rate p were known, then
\[
p(x) = P[X = x] = \binom{100}{x} p^{x} (1 - p)^{100 - x}
\]
This equation allows us to predict the
value of the observation, X.
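When p is known, the formula can be evaluated directly. The sketch below does this in Python for n = 100 and the balanced-coin value p = ½ from the probability example. The function name binomial_pmf is an illustrative choice, and the trials are assumed to be independent.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P[X = x] for the number of successes X in n independent trials,
    each with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 100, 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(sum(probs))         # total probability (1 up to rounding error)
print(probs[50])          # probability of exactly 50 heads
print(sum(probs[40:61]))  # probability that X lies between 40 and 60
```

Even the most likely single value, x = 50, has a probability of only about 0.08, so the "prediction" of X is best stated as a range of likely values.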
In the case when the success rate p is
unknown, the same equation still holds:
\[
p(x) = P[X = x] = \binom{100}{x} p^{x} (1 - p)^{100 - x}
\]
We will want to use the value of the
observation, X = 82 to make a decision
regarding the value of p.
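One possible way to use X = 82 is to ask which value of p makes the observed result most probable. This is the idea behind maximum likelihood estimation, listed under Chapter 9 in the course outline. The sketch below is an illustration under that assumption, not the only way to reach a decision about p; the function name likelihood and the grid of candidate values are illustrative.

```python
from math import comb

def likelihood(p, x=82, n=100):
    """Probability of observing x successes in n trials, viewed as a function of p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Candidate success rates 0.01, 0.02, ..., 0.99
candidates = [k / 100 for k in range(1, 100)]
best = max(candidates, key=likelihood)

print(best)                               # 0.82, i.e. x / n
print(likelihood(0.5), likelihood(0.82))  # p = 0.5 makes X = 82 far less probable
```

The data do not reveal p exactly, but they make values near 0.82 much more plausible than values such as 0.5.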
Some definitions
important to Statistics
A population:
This is the complete collection of subjects
(objects) that are of interest in the study.
There may be (and frequently is) more
than one population, in which case a major
objective is that of comparison.
A case (elementary sampling
unit):
This is an individual unit (subject) of the
population.
A variable:
a measurement or type of measurement
that is made on each individual case in the
population.
Types of variables
Some variables may be measured on a
numerical scale while others are
measured on a categorical scale.
The nature of the variables has a great
influence on which analysis will be used.
For Variables measured on a numerical scale
the measurements will be numbers.
Ex: Age, Weight, Systolic Blood Pressure
For Variables measured on a categorical scale
the measurements will be categories.
Ex: Sex, Religion, Heart Disease
Note
Sometimes variables can be measured on
both a numerical scale and a categorical
scale.
In fact, variables measured on a numerical
scale can always be converted to
measurements on a categorical scale.
Example
The following variables were evaluated
for a study of individuals receiving head
injuries in Saskatchewan.
1. Cause of the injury (categorical)
• Motor vehicle accident
• Fall
• Violence
• Other
2. Time of year (date) (numerical or
categorical)
• summer
• fall
• winter
• spring
3. Sex of injured individual (categorical)
• male
• female
4. Age (numerical or categorical)
• < 10
• 10 - 19
• 20 - 29
• 30 - 49
• 50 - 65
• 65+
5. Mortality (categorical)
• Died from injury
• Alive
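Converting a numerical measurement to a categorical one can be sketched as follows, using the age groups from the example. The source lists both "50 - 65" and "65+"; this sketch assumes that an age of exactly 65 falls in "65+". The names CUTS, LABELS and age_group, and the list of ages, are illustrative.

```python
from bisect import bisect_right

# Lower bound of each age group after the first (assumption: 65 belongs to "65+")
CUTS = [10, 20, 30, 50, 65]
LABELS = ["< 10", "10 - 19", "20 - 29", "30 - 49", "50 - 65", "65+"]

def age_group(age):
    """Map a numerical age in years to its categorical age group."""
    return LABELS[bisect_right(CUTS, age)]

ages = [4, 17, 23, 35, 50, 65, 81]  # hypothetical ages, not data from the study
print([age_group(a) for a in ages])
# ['< 10', '10 - 19', '20 - 29', '30 - 49', '50 - 65', '65+', '65+']
```

The conversion loses information: the category "30 - 49" cannot be turned back into the exact age, which is why the conversion only goes from numerical to categorical.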
Types of variables
In addition some variables are labeled as
dependent variables and some variables
are labeled as independent variables.
This usually depends on the objectives of
the analysis.
Dependent variables are output or
response variables while the
independent variables are the input
variables or factors.
Usually one is interested in determining
equations that describe how the dependent
variables are affected by the independent
variables
Example
Suppose we are collecting data on
• Blood Pressure
• Height
• Weight
• Age
Suppose we are interested in how
• Blood Pressure
is influenced by the following factors
• Height
• Weight
• Age
Then
• Blood Pressure
is the dependent variable
and
• Height
• Weight
• Age
are the independent variables
Example – Head Injury study
Suppose we are interested in how
• Mortality
is influenced by the following factors
• Cause of head injury
• Time of year
• Sex
• Age
Then
• Mortality
is the dependent variable
and
• Cause of head injury
• Time of year
• Sex
• Age
are the independent variables
dependent variable = response variable
independent variable = predictor variable
A sample:
Is a subset of the population
In statistics:
One draws conclusions about the
population based on data collected
from a sample
Reasons:
Cost
It is less costly to collect data from a
sample than from the entire population
Accuracy
Data from a sample sometimes leads
to more accurate conclusions than data
from the entire population
Costs saved from using a sample can
be directed to obtaining more accurate
observations on each case in the
sample
Types of Samples
Different types of samples are determined
by how the sample is selected.
Convenience Samples
In a convenience sample the subjects that
are most convenient to the researcher are
selected as objects in the sample.
This is not a very good procedure for
inferential Statistical Analysis but is
useful for exploratory preliminary work.
Quota samples
In quota samples subjects are chosen
conveniently until quotas are met for
different subgroups of the population.
This also is useful for exploratory
preliminary work.
Random Samples
Random samples of a given size are
selected in such a way that all possible
samples of that size have the same
probability of being selected.
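The definition can be checked empirically on a small case. The sketch below uses a hypothetical population of five cases labelled A to E and samples of size 2; random.sample from the Python standard library draws without replacement. All names are illustrative.

```python
import random
from collections import Counter
from itertools import combinations

population = ["A", "B", "C", "D", "E"]  # hypothetical population of 5 cases
size = 2

# Every possible sample of the given size: 10 in total
all_samples = list(combinations(population, size))

# Draw many random samples and record how often each possible sample occurs
rng = random.Random(1)
n_draws = 100_000
counts = Counter(
    tuple(sorted(rng.sample(population, size))) for _ in range(n_draws)
)

for s in all_samples:
    print(s, counts[s] / n_draws)  # each relative frequency is close to 1/10
```

A convenience or quota sample would not pass this check, because some of the possible samples would be selected much more often than others.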
Convenience samples and quota samples
are useful for preliminary studies. It is,
however, difficult to assess the accuracy
of estimates based on these types of
sampling schemes.
Sometimes, however, one has to be
satisfied with a convenience sample and
assume that it is equivalent to a random
sample
Some other definitions
A population statistic
(parameter):
Any quantity computed from the values
of variables for the entire population.
A sample statistic:
Any quantity computed from the values
of variables for the cases in the sample.
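The distinction can be illustrated with the mean. In the sketch below the population is simulated, not real data: 10,000 hypothetical systolic blood pressure values. The mean over all of them plays the role of the parameter, and the mean of a random sample of 100 cases is the sample statistic.

```python
import random
from statistics import mean

rng = random.Random(42)

# Hypothetical population: simulated systolic blood pressure for 10,000 cases
population = [rng.gauss(120, 15) for _ in range(10_000)]

# A random sample of 100 cases drawn without replacement
sample = rng.sample(population, 100)

parameter = mean(population)  # computed from the entire population
statistic = mean(sample)      # computed from the cases in the sample only

print(parameter, statistic)
```

In practice the parameter is usually unknown because the whole population is never measured; the sample statistic is what is available, and it varies from one sample to the next.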