Transcript lecture 1
University of Khartoum
Faculty of Mathematical Science
Department of Information Technology
Applied Statistics (Stat 301)
Azza Osman Mohamed
Course components:
Statistical Estimates.
Tests of Hypotheses.
Correlation.
Simple Linear Regression Analysis.
Analysis of Variance.
Non-parametric Tests.
The statistical package SPSS.
Course aim:
The aim of this course is to develop further understanding of
statistical methods.
Outcome: By the end of this course you will be able to:
o Understand inferential statistics.
o Describe common measures of correlation and association, and perform simple regression analysis.
o Understand the workings of the analysis of variance table and its application to one-way and two-way ANOVA situations.
o Understand the workings of non-parametric methods.
o Perform statistical analysis using SPSS.
o Present and interpret the results.
Course evaluation:
o Assignments.
o Labs.
o Mid-term exam.
o Final exam.
Session 1
Learning Objectives
At the end of sessions 1 and 2 you will be able to:
State the estimation process.
Describe the properties of point estimators.
Explain confidence interval estimates.
Compute confidence interval estimates for the population mean (σ known and unknown).
Compute confidence interval estimates for the population proportion.
Introduction to Estimation
Point Estimation
Statistical Methods
Figure: Statistical methods divide into descriptive statistics and inferential statistics; inferential statistics comprises estimation and hypothesis testing.
Statistical Inference…
Statistical inference is the process by which we acquire information and
draw conclusions about populations from samples.
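Figure: Statistics turns data into information. A sample is drawn from the population, a statistic is computed from the sample, and inference runs from the statistic back to the population parameter.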
In order to do inference, we require the skills and knowledge of descriptive
statistics, probability distributions, and sampling distributions.
Inference Process
Figure: A sample is drawn from the population; sample statistics ($\bar X$, $p_s$) are computed and used to form estimates and tests about the population.
Thinking Challenge
Suppose you’re interested in the average amount of money that students in this class (the population) have on them. How would you find out?
Estimation Methods
Figure: Estimation divides into point estimation and interval estimation.
Estimation…
The objective of estimation is to determine the approximate value
of a population parameter on the basis of a sample statistic.
An estimator is a method for producing a best guess about a
population value.
An estimate is a specific value provided by an estimator.
Example: we said that the sample mean is a good estimator of the population mean:
o The sample mean is an estimator
o A particular value of the sample mean is an estimate
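To make the estimator/estimate distinction concrete, here is a minimal Python sketch: the function plays the role of the estimator (a rule that can be applied to any sample), and the number it returns for one particular sample is the estimate. The function name and the sample values are hypothetical, chosen only for illustration.

```python
def sample_mean_estimator(sample):
    """The estimator: a rule that turns any sample into a guess for the population mean."""
    return sum(sample) / len(sample)

# Hypothetical amounts of money (in dollars) carried by five sampled students.
observed = [12.0, 5.5, 20.0, 0.0, 7.5]

# Applying the rule to this particular sample gives one specific value: the estimate.
estimate = sample_mean_estimator(observed)
print(estimate)
```

A different sample would give a different estimate, but the estimator (the rule) stays the same.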
Point Estimator…
Definition:
A point estimator draws inferences about a population by
estimating the value of an unknown parameter using a single value
or point.
It gives no information about how close the value is to the unknown population parameter.
Example: the sample mean ($\bar X$) is employed to estimate the population mean ($\mu$).
Population Parameters Are Estimated with Point Estimators
Population parameter | Estimated with sample statistic
Mean, $\mu$ | $\bar X$
Proportion, $p$ | $p_s$
Variance, $\sigma^2$ | $s^2$
Difference, $\mu_1 - \mu_2$ | $\bar X_1 - \bar X_2$
Point Estimator…
Question: Is there a unique estimator for a population parameter?
For example, is there only one estimator for the population mean?
The answer is that there may be many possible estimators
Those estimators must be ranked in terms of some desirable
properties that they should exhibit
Properties of Point Estimators
The choice of point estimator is based on the following criteria
o Unbiasedness
o Efficiency
o Consistency
Unbiased Estimators (Unbiasedness)
Definition
A point estimator $\hat\theta$ is said to be an unbiased estimator of the population parameter $\theta$ if its expected value (the mean of its sampling distribution) is equal to the population parameter it is trying to estimate:
$E(\hat\theta) = \theta$
We can also define the bias of an estimator as follows:
$\text{Bias}(\hat\theta) = E(\hat\theta) - \theta$
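Unbiasedness can be checked by simulation: approximate $E(\hat\theta)$ by averaging the estimator over many samples and compare it with $\theta$. This minimal sketch uses the sample variance with divisor $n$ versus divisor $n-1$ as the two estimators of $\sigma^2$; the normal population, $\sigma = 2$, $n = 5$ and the random seed are arbitrary assumptions, and the helper names are hypothetical.

```python
import random

def variance_estimate(sample, divisor_offset):
    """Sum of squared deviations divided by n - divisor_offset (0 -> n, 1 -> n - 1)."""
    n = len(sample)
    m = sum(sample) / n
    return sum((x - m) ** 2 for x in sample) / (n - divisor_offset)

def average_estimate(divisor_offset, n=5, reps=50_000, sigma=2.0, seed=1):
    """Approximate E(estimator) by averaging it over many samples drawn from N(0, sigma^2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        total += variance_estimate(sample, divisor_offset)
    return total / reps

true_var = 2.0 ** 2             # sigma^2 = 4
biased = average_estimate(0)    # divisor n: theory gives E = (n - 1)/n * sigma^2 = 3.2
unbiased = average_estimate(1)  # divisor n - 1: theory gives E = sigma^2 = 4.0
print(f"divisor n:     mean estimate {biased:.3f}, bias about {biased - true_var:+.3f}")
print(f"divisor n - 1: mean estimate {unbiased:.3f}, bias about {unbiased - true_var:+.3f}")
```

The divisor-$n$ version settles near 3.2 rather than 4, i.e. it has a negative bias, while the divisor-$(n-1)$ version settles near the true value.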
Properties of Point Estimators
To select the “best unbiased” estimator, we use the criterion of
efficiency
Efficiency:
Definition
An unbiased estimator is efficient if no other unbiased estimator of the particular population parameter has a lower sampling distribution variance.
If $\hat\theta_1$ and $\hat\theta_2$ are two unbiased estimators of the population parameter $\theta$, then $\hat\theta_1$ is more efficient than $\hat\theta_2$ if
$V(\hat\theta_1) < V(\hat\theta_2)$
The unbiased estimator of a population parameter with the lowest
variance out of all unbiased estimators is called the most efficient
or minimum variance unbiased estimator (MVUE).
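For normally distributed data, both the sample mean and the sample median are unbiased estimators of $\mu$ (the median by symmetry), so efficiency decides between them. The sketch below approximates each estimator's sampling variance by simulation; the population ($\mu = 10$, $\sigma = 3$), $n = 25$ and the seed are arbitrary assumptions, and `sampling_variance` is a hypothetical helper.

```python
import random
import statistics

def sampling_variance(estimator, n=25, reps=10_000, mu=10.0, sigma=3.0, seed=2):
    """Approximate V(estimator) from many simulated normal samples of size n."""
    rng = random.Random(seed)
    estimates = [estimator([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(reps)]
    return statistics.pvariance(estimates)

v_mean = sampling_variance(statistics.fmean)
v_median = sampling_variance(statistics.median)
print(f"V(sample mean)   ~ {v_mean:.3f}  (theory: sigma^2 / n = {3.0 ** 2 / 25:.3f})")
print(f"V(sample median) ~ {v_median:.3f}")
print("sample mean is more efficient" if v_mean < v_median else "sample median is more efficient")
```

With the same seed, both estimators see exactly the same simulated samples, so the comparison reflects the estimators rather than different random draws.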
Properties of Point Estimators
Consistency:
Definition:
We say that an estimator is consistent if the probability of obtaining estimates close to the population parameter increases as the sample size increases.
One measure of the expected closeness of an estimator to the population parameter is its mean squared error, $\text{MSE}(\hat\theta) = E[(\hat\theta - \theta)^2] = V(\hat\theta) + [\text{Bias}(\hat\theta)]^2$.
The problem of selecting the most appropriate estimator
for a population parameter is quite complicated
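Consistency can be illustrated by watching the mean squared error shrink as $n$ grows. This minimal sketch estimates $E[(\bar X - \mu)^2]$ by simulation for several sample sizes; the population ($\mu = 50$, $\sigma = 8$), the sizes, the number of repetitions and the seed are arbitrary assumptions.

```python
import random

def mse_of_sample_mean(n, reps=2_000, mu=50.0, sigma=8.0, seed=3):
    """Monte Carlo estimate of MSE = E[(X-bar - mu)^2] for samples of size n from N(mu, sigma^2)."""
    rng = random.Random(seed)
    squared_error = 0.0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        squared_error += (xbar - mu) ** 2
    return squared_error / reps

sizes = (5, 20, 80, 320)
results = {n: mse_of_sample_mean(n) for n in sizes}
for n in sizes:
    # X-bar is unbiased, so its MSE equals its variance, sigma^2 / n
    print(f"n = {n:3d}: simulated MSE {results[n]:.3f}, theory {8.0 ** 2 / n:.3f}")
```

Because the MSE of $\bar X$ falls toward zero as $n$ increases, estimates far from $\mu$ become less and less likely, which is what consistency requires.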
References
McGill, J. J., and Noble, L. (revisions by P. Jurkat). Inferences Based on a Single Sample: Estimation with Confidence Intervals.
Introduction to Estimation, Chapter 10. Brooks/Cole, a division of Thomson Learning, Inc.
Basic Business Statistics: Concepts & Applications, Chapter 8: Confidence Interval Estimation.
Point Estimation Algorithms, Chapter 1. Department of Computer Science, University of Tennessee, USA.
Session 2
Introduction to Estimation
Interval Estimation
Confidence Interval Estimation Process
Figure: The population mean, $\mu$, is unknown. A random sample is drawn and its mean is $\bar X = 50$, which leads to the statement "I am 95% confident that $\mu$ is between 40 and 60."
Interval Estimator…
An interval estimator draws inferences about a population by
estimating the value of an unknown parameter using an interval.
Figure: A confidence interval runs from the lower confidence limit, through the sample statistic (point estimate), to the upper confidence limit.
A confidence interval provides a range of values that we believe, with a given level of confidence, contains the true value.
That is, we say (with some ___% certainty) that the population parameter of interest is between some lower and upper bounds.
Unlike a point estimate, it gives information about closeness to the unknown population parameter.
Point & Interval Estimation…
For example, suppose we want to estimate the mean summer income of a class of IT students. For n = 25 students, $\bar X$ is calculated to be 400 dollars per week.
Point estimate: the mean income is 400 dollars per week.
Interval estimate: an alternative statement is that the mean income is between 380 and 420 dollars per week.
Confidence Interval (CI)
An interval $(\hat\theta_L, \hat\theta_U)$ is called a confidence interval for the parameter $\theta$ when the probability that the unknown population parameter $\theta$ falls within it equals $1-\alpha$:
$P(\hat\theta_L \le \theta \le \hat\theta_U) = 1 - \alpha$
$1-\alpha$ is called the confidence level (confidence coefficient); it is the probability that the interval $(\hat\theta_L, \hat\theta_U)$ contains the parameter $\theta$.
The limits of the interval are called the lower and upper confidence limits.
An actual realization of this interval, $(\hat\theta_L, \hat\theta_U)$, is called a $(1-\alpha)100\%$ confidence interval.
We are $(1-\alpha)100\%$ confident that the unknown parameter lies within the interval.
For example, we are 95% confident that the 95% confidence interval will include the population parameter; $\alpha = 5\%$ is the probability that the interval does not contain the parameter.
Typical confidence levels are 99%, 95%, 90%, …
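The "95% confident" statement describes the procedure: if we repeated the sampling many times, about 95% of the intervals built this way would contain $\mu$. A minimal simulation sketch, assuming a normal population with known $\sigma$; the values $\mu = 370$, $\sigma = 75$, $n = 25$ and the seed are arbitrary choices for illustration.

```python
import math
import random

def coverage(n=25, mu=370.0, sigma=75.0, z=1.96, reps=10_000, seed=4):
    """Fraction of simulated intervals X-bar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps

cov = coverage()
print(f"observed coverage: {cov:.3f}")  # near 0.95, varying slightly with the seed
```

Any single interval either contains $\mu$ or it does not; the 95% refers to the long-run proportion of intervals that do.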
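Interval and Level of Confidence
Figure: Sampling distribution of the mean, with area $\alpha/2$ in each tail beyond $\mu \pm Z_{\alpha/2}\sigma_{\bar X}$ and area $1-\alpha$ in between.
Intervals extend from $\bar X - Z_{\alpha/2}\sigma_{\bar X}$ to $\bar X + Z_{\alpha/2}\sigma_{\bar X}$. $(1-\alpha)100\%$ of the intervals constructed contain $\mu$; $\alpha 100\%$ do not.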
Confidence Intervals
Know the central intervals of the normal distribution, $\bar X = \mu \pm Z\sigma_{\bar X}$:
90% confidence: $\mu \pm 1.65\sigma_{\bar X}$
95% confidence: $\mu \pm 1.96\sigma_{\bar X}$
99% confidence: $\mu \pm 2.58\sigma_{\bar X}$
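The multipliers above are the two-sided critical values $z_{\alpha/2}$ of the standard normal distribution. A minimal sketch computing them with Python's standard-library `statistics.NormalDist` (Python 3.8+); `z_critical` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a confidence level 1 - alpha."""
    alpha = 1.0 - confidence
    # z_{alpha/2} leaves area alpha/2 in the upper tail of N(0, 1)
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

The unrounded values are about 1.645, 1.960 and 2.576; the slide rounds them to 1.65, 1.96 and 2.58.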
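Factors Affecting Interval Width
Intervals extend from $\bar X - Z\sigma_{\bar X}$ to $\bar X + Z\sigma_{\bar X}$.
1. Data dispersion, measured by $\sigma_X$
2. Sample size, since $\sigma_{\bar X} = \sigma_X / \sqrt{n}$
3. Level of confidence, $(1-\alpha)$, which affects $Z$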
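The three factors can be seen directly in the width formula $2 Z_{\alpha/2}\sigma_X/\sqrt{n}$. A minimal sketch, assuming $\sigma_X$ is known; `ci_width` is a hypothetical helper and the input values are arbitrary.

```python
import math
from statistics import NormalDist

def ci_width(sigma, n, confidence):
    """Full width 2 * z_{alpha/2} * sigma / sqrt(n) of a known-sigma interval for the mean."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return 2.0 * z * sigma / math.sqrt(n)

base = ci_width(sigma=10, n=25, confidence=0.95)
print(round(base, 2))                       # baseline
print(round(ci_width(20, 25, 0.95), 2))     # more dispersion -> wider
print(round(ci_width(10, 100, 0.95), 2))    # larger sample -> narrower
print(round(ci_width(10, 25, 0.99), 2))     # higher confidence -> wider
```

Because $n$ enters through $\sqrt{n}$, quadrupling the sample size only halves the width.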
Confidence Interval Estimates
Figure: Confidence intervals are constructed for the mean ($\sigma_X$ known or $\sigma_X$ unknown), for a proportion, and for a variance.
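Estimating μ when σ is known…
$\bar X - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, where:
o $z_{\alpha/2}$ is known, i.e. from the standard normal distribution;
o $\sigma$ is known, i.e. it is assumed we know the population standard deviation;
o $\bar X$ is known, i.e. the sample mean;
o $\mu$ is unknown, i.e. we want to estimate the population mean;
o $n$ is known, i.e. the number of items sampled.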
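Confidence Interval Estimator for μ
Usually represented with a "plus/minus" (±) sign: $\bar X \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
upper confidence limit (UCL): $\bar X + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
lower confidence limit (LCL): $\bar X - z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$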
Four commonly used confidence
levels…
Confidence Level
Example …
A computer company samples demand during lead time over 25 time periods:
235 421 394 261 386
374 361 439 374 316
309 514 348 302 296
499 462 344 466 332
253 369 330 535 334
It is known that the standard deviation of demand over lead time is 75 computers. We want to estimate the mean demand over lead time with 95% confidence in order to set inventory levels…
Example …
“We want to estimate the mean demand over lead time with 95%
confidence in order to set inventory levels…”
Thus, the parameter to be estimated is the population mean μ.
And so our confidence interval estimator will be:
$\bar X \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$
Example …
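In order to use our confidence interval estimator, we need the following pieces of data:
$\bar X = 370.16$ (calculated from the data)
$z_{\alpha/2} = 1.96$, $\sigma = 75$, $n = 25$ (given)
therefore:
$\bar X \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96\frac{75}{\sqrt{25}} = 370.16 \pm 29.40$
The lower and upper confidence limits are 340.76 and 399.56.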
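As a check on the arithmetic, here is a minimal Python sketch of the lead-time example. The 25 demand values, $\sigma = 75$ and $z_{\alpha/2} = 1.96$ are taken from the example; nothing else is assumed.

```python
import math

# Demand during lead time over the 25 sampled periods
demand = [235, 421, 394, 261, 386, 374, 361, 439, 374, 316,
          309, 514, 348, 302, 296, 499, 462, 344, 466, 332,
          253, 369, 330, 535, 334]
sigma = 75   # known population standard deviation
z = 1.96     # z_{alpha/2} for 95% confidence
n = len(demand)

xbar = sum(demand) / n
margin = z * sigma / math.sqrt(n)   # half-width of the interval
lcl, ucl = xbar - margin, xbar + margin
print(f"x-bar = {xbar:.2f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```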
Thinking Challenge
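The mean of a random sample of n = 25 is $\bar X = 50$. Set up a 95% confidence interval estimate for $\mu_X$ if $\sigma_X^2 = 100$.
$\bar X - Z_{\alpha/2}\frac{\sigma_X}{\sqrt{n}} \le \mu_X \le \bar X + Z_{\alpha/2}\frac{\sigma_X}{\sqrt{n}}$
$50 - 1.96\frac{10}{\sqrt{25}} \le \mu_X \le 50 + 1.96\frac{10}{\sqrt{25}}$
$46.08 \le \mu_X \le 53.92$
What is the interval for a sample size of 100?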
Confidence Interval for Mean of a Normal
Distribution with Unknown Variance
If the sample size is large ($n \ge 30$):
The population variance is not known.
The sample standard deviation will be a sufficiently good estimator of the population standard deviation, so
$Z = \frac{\bar X - \mu}{s/\sqrt{n}}$
is approximately standard normal.
Thus, the confidence interval for the population mean is:
$\bar X - Z_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar X + Z_{\alpha/2}\frac{s}{\sqrt{n}}$
Confidence Interval for Mean of a Normal
Distribution with Unknown Variance
If the sample size is small and the population variance is unknown,
we cannot use the standard normal distribution
If we replace the unknown $\sigma$ with the sample standard deviation $s$, the quantity
$t = \frac{\bar X - \mu}{s/\sqrt{n}}$
follows Student’s t distribution with (n – 1) degrees of freedom.
The t-distribution has mean 0 and (n – 1) degrees of freedom.
As degrees of freedom increase, the t-distribution approaches the
standard normal distribution
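Student’s t Distribution
Describes the distribution of the standardized sample mean, $\frac{\bar X - \mu}{s/\sqrt{n}}$, when the population being sampled is normal.
Figure: The standard normal (Z) curve compared with t (df = 13) and t (df = 5), all centred at 0. Each is bell-shaped and symmetric; the t distributions have ‘fatter’ tails, more so for fewer degrees of freedom.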
Confidence Interval for Mean of a Normal
Distribution with Unknown Variance
A $100(1-\alpha)\%$ confidence interval for the population mean when we draw small samples from a normal distribution with an unknown variance $\sigma^2$ is given by
$\bar X \pm t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}$
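Student’s t Table (column headings give the upper-tail area $\alpha/2$; $v$ = degrees of freedom)
v | t.10 | t.05 | t.025
1 | 3.078 | 6.314 | 12.706
2 | 1.886 | 2.920 | 4.303
3 | 1.638 | 2.353 | 3.182
Assume: n = 3, df = n − 1 = 2, $\alpha$ = .10, $\alpha/2$ = .05. The t value is 2.920: the area to its right under the t curve with 2 degrees of freedom is .05.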
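The table values can be reproduced numerically. This pure-Python sketch integrates the t density with Simpson's rule and inverts the resulting CDF by bisection; `t_cdf` and `t_upper` are hypothetical helper names, and in practice a library routine such as `scipy.stats.t.ppf` would normally be used instead.

```python
import math

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0 under Student's t with df degrees of freedom.

    Integrates the t density over [0, x] with Simpson's rule (steps is even)
    and adds 0.5 for the left half, using the symmetry of the distribution.
    """
    # Normalising constant Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), via lgamma for stability
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_upper(tail_area, df, hi=100.0, iters=40):
    """Critical value t with P(T > t) = tail_area, found by bisection on [0, hi]."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > tail_area:
            lo = mid  # too much area to the right: the critical value is larger
        else:
            hi = mid
    return (lo + hi) / 2

# Rebuild the first rows of the table: columns are upper-tail areas .10, .05, .025
for v in (1, 2, 3):
    print(v, *(f"{t_upper(a, v):.3f}" for a in (0.10, 0.05, 0.025)))
```

The search interval [0, 100] is an assumption that covers the tail areas used in this lecture; very small tail areas with 1 degree of freedom would need a larger `hi`.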
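Estimation Example
Mean ($\sigma$ Unknown)
A random sample of n = 25 has $\bar X = 50$ and $s = 8$. Set up a 95% confidence interval estimate for $\mu$.
$\bar X - t_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar X + t_{\alpha/2}\frac{s}{\sqrt{n}}$
$50 - 2.064\frac{8}{\sqrt{25}} \le \mu \le 50 + 2.064\frac{8}{\sqrt{25}}$ (using $t_{24,\,.025} = 2.064$ for n − 1 = 24 degrees of freedom)
$46.70 \le \mu \le 53.30$ with 95% confidence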
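A minimal Python sketch of this t-based calculation. The critical value is passed in as read from a t table ($t_{24,\,.025} = 2.064$, as in the example) rather than computed; `t_interval` is a hypothetical helper name.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Interval X-bar +/- t_{n-1, alpha/2} * s / sqrt(n); t_crit is read from a t table."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

lower, upper = t_interval(xbar=50, s=8, n=25, t_crit=2.064)
print(f"{lower:.2f} <= mu <= {upper:.2f}")
```

The margin is $2.064 \times 8/5 = 3.3024$, and the limits are printed to two decimals.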
Thinking Challenge
For a sample where the sample size = 9, the sample mean = 28 and the sample s.d. = 3, what is the closest 95% confidence interval for the mean?
Select A for [27, 29], B for [26.5, 29.5], C for [26, 30], D for [25.25, 30.75], E for [24.5, 31.5].
Confidence Interval for the Population Proportion
If we want to estimate the population proportion $p$, which is unknown, and the sample size $n$ is large, then:
$\hat p = \frac{x}{n}$ and $Z = \frac{\hat p - p}{\sqrt{p(1-p)/n}}$
where x is the number of successes; Z is approximately standard normal.
Confidence interval estimate:
$\hat p - z_{\alpha/2}\sqrt{\frac{\hat p\hat q}{n}} \le p \le \hat p + z_{\alpha/2}\sqrt{\frac{\hat p\hat q}{n}}$, where $\hat q = 1 - \hat p$.
Example ….
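A random sample of 400 graduates showed 32 went to graduate school. Set up a 95% confidence interval estimate for p.
$\hat p = 32/400 = .08$
$\hat p - Z_{\alpha/2}\sqrt{\frac{\hat p\hat q}{n}} \le p \le \hat p + Z_{\alpha/2}\sqrt{\frac{\hat p\hat q}{n}}$
$.08 - 1.96\sqrt{\frac{.08(.92)}{400}} \le p \le .08 + 1.96\sqrt{\frac{.08(.92)}{400}}$
$.053 \le p \le .107$ with 95% confidence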
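The large-sample proportion interval translates directly into code. A minimal sketch, with $z$ passed in explicitly (1.96 for 95%, as in the example); `proportion_interval` is a hypothetical helper name.

```python
import math

def proportion_interval(successes, n, z):
    """Large-sample interval p-hat +/- z * sqrt(p-hat * q-hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Graduate-school example: 32 of 400 graduates, 95% confidence
grad_lo, grad_hi = proportion_interval(32, 400, z=1.96)
print(f"{grad_lo:.3f} <= p <= {grad_hi:.3f}")
```

The same function applies to any large-sample proportion problem by changing the counts and the critical value.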
Thinking Challenge
You’re a production manager for a newspaper. You want to find the % defective. Of 200 newspapers, 35 had defects. What is the 90% confidence interval estimate of the population proportion defective?
Solution ….
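$\hat p - z_{\alpha/2}\sqrt{\frac{\hat p\hat q}{n}} \le p \le \hat p + z_{\alpha/2}\sqrt{\frac{\hat p\hat q}{n}}$
$\hat p = 35/200 = .175$
$.175 - 1.645\sqrt{\frac{.175(.825)}{200}} \le p \le .175 + 1.645\sqrt{\frac{.175(.825)}{200}}$
$.1308 \le p \le .2192$ with 90% confidence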