Confidence intervals
CONFIDENCE INTERVALS
The Aim
By the end of this lecture, the students
will be aware of confidence intervals
The Goals
• To define the confidence interval
• To calculate the confidence interval for the mean
• To calculate the confidence interval for the proportion
• To define the confidence interval's relation to the
theoretical distribution
• To calculate the standard error of differences
• To calculate relative deviates
*RD = (mean1 – mean2) / standard error of the difference
• To interpret confidence intervals
*When examining the difference between groups, check whether the
confidence interval contains 0
*In risk assessment (odds ratio), check whether the confidence interval contains 1
• To explain degrees of freedom
• To calculate the confidence intervals of the means and of the
difference of the means by using SPSS
Confidence intervals
● Confidence interval for the mean
-Using the Normal distribution
-Using the t-distribution
● Confidence interval for the proportion
● Confidence interval for the differences
-for numerical data
-for proportions
● Confidence intervals for the odds ratio
● Interpretation of confidence intervals
● Degrees of freedom=df
● Applying the relative deviate formula for confidence interval
of the differences
•Once we have taken a sample from our
population, we obtain a point estimate of
the parameter of interest, and calculate its
standard error to indicate the precision of
the estimate.
•However, to most people the standard
error is not, by itself, particularly useful.
•It is more helpful to incorporate this
measure of precision into an interval
estimate for the population parameter.
•We do this by making use of our knowledge of
the theoretical probability distribution of the
sample statistic to calculate a confidence interval
for the parameter.
•Generally the confidence interval extends either
side of the estimate by some multiple of the
standard error; the two values (the confidence
limits) which define the interval are generally
separated by a comma, a dash or the word 'to' and
are contained in brackets.
Confidence interval for the mean
Using the Normal distribution
•In the previous lecture we stated that the sample mean
follows a Normal distribution if the sample size is large.
• Therefore we can make use of the properties of the
Normal distribution when considering the sample mean.
•In particular, 95% of the distribution of sample means lies
within 1.96 standard deviations (SD) of the population
mean.
•We call this SD the standard error of the mean (SEM), and
when we have a single sample, the 95% confidence
interval (CI) for the mean is:
• (Sample mean - (1.96 x SEM) to Sample mean + (1.96 x SEM))
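The formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample figures are hypothetical, and it assumes the SEM is obtained as SD / √n.

```python
import math

def normal_ci_95(sample_mean, sd, n):
    """95% CI for the mean using the Normal multiplier 1.96."""
    sem = sd / math.sqrt(n)          # standard error of the mean (assumed SD / sqrt(n))
    half_width = 1.96 * sem          # distance from the mean to each confidence limit
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical sample: mean 120, SD 15, n = 100
lower, upper = normal_ci_95(120.0, 15.0, 100)
print(f"95% CI: ({lower:.2f} to {upper:.2f})")
```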
Confidence interval for the mean
Using the Normal distribution
•If we were to repeat the experiment many times, the range
of values determined in this way would contain the true
population mean on 95% of occasions.
•This range is known as the 95% confidence interval for the
mean.
•We usually interpret this confidence interval as the range of
values within which we are 95% confident that the true
population mean lies.
•Although not strictly correct (the population mean is a
fixed value and therefore cannot have a probability attached
to it), we will interpret the confidence interval in this way as
it is conceptually easier to understand.
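The "repeat the experiment many times" idea can be checked with a small simulation. The sketch below is an assumption-laden illustration: the population (mean 50, SD 10), the sample size and the number of repeats are all hypothetical choices.

```python
import math
import random
import statistics

def coverage(true_mean=50.0, true_sd=10.0, n=100, repeats=2000, seed=1):
    """Share of 95% intervals that contain the true population mean."""
    rng = random.Random(seed)        # fixed seed so the run is reproducible
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        mean = statistics.fmean(sample)
        sem = statistics.stdev(sample) / math.sqrt(n)
        # Count the interval if it covers the (fixed) population mean
        if mean - 1.96 * sem <= true_mean <= mean + 1.96 * sem:
            hits += 1
    return hits / repeats

share = coverage()
print(f"Intervals containing the true mean: {share:.1%}")
```

The share should come out close to 95%. It is the intervals that vary from sample to sample, not the population mean.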
Confidence interval for the mean
Using the t-distribution
•Strictly, we should only use the Normal
distribution in the calculation if we know the
value of the variance, σ², in the population.
•Furthermore, if the sample size is small, the
sample mean only follows a Normal distribution
if the underlying population data are Normally
distributed.
•Where the data are not Normally distributed,
and/or we do not know the population variance
but estimate it by s², the sample mean follows a
t-distribution.
Confidence interval for the mean
Using the t-distribution
•We calculate the 95% confidence interval
for the mean as
Sample mean ± (t0.05 x SEM)
• where t0.05 is the percentage point
(percentile) of the t-distribution with
(n - 1) degrees of freedom which gives a
two-tailed probability of 0.05
Confidence interval for the mean
Using the t-distribution
• This generally provides a slightly wider
confidence interval than that using the
Normal distribution to allow for the extra
uncertainty that we have introduced by
estimating the population standard
deviation and/or because of the small
sample size.
• When the sample size is large, the
difference between the two distributions is
negligible.
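To see how much wider the t-based interval is, and how the gap shrinks as the sample grows, we can compare multipliers. This is a sketch: the t values are assumed from standard printed t tables, and the helper name is hypothetical.

```python
from statistics import NormalDist

# Two-tailed 5% points of the t-distribution, copied from standard tables
# (assumed values; keys are degrees of freedom).
T_05 = {5: 2.571, 10: 2.228, 24: 2.064, 60: 2.000, 120: 1.980}

z_05 = NormalDist().inv_cdf(0.975)   # Normal multiplier, about 1.96

def widening(df):
    """Ratio of the t-based interval width to the Normal-based width."""
    return T_05[df] / z_05

for df in sorted(T_05):
    extra = 100 * (widening(df) - 1)
    print(f"df = {df:3d}: t = {T_05[df]:.3f}, interval {extra:.1f}% wider")
```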
Confidence interval for the mean
Using the t-distribution
• Therefore, we always use the t-distribution when
calculating a confidence interval for the mean even if the
sample size is large.
•By convention we usually quote 95% confidence intervals.
•We could calculate other confidence intervals, e.g. a 99%
confidence interval for the mean.
•Instead of multiplying the standard error by the tabulated
value of the t-distribution corresponding to a two-tailed
probability of 0.05, we multiply it by that corresponding to
a two-tailed probability of 0.01
•The 99% confidence interval is wider than a 95%
confidence interval, to reflect our increased confidence that
the range includes the true population mean.
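A short sketch of the 95% versus 99% comparison. The sample figures are hypothetical, and the two t values for 24 degrees of freedom are assumed from standard t tables.

```python
import math

# Two-tailed t values for 24 degrees of freedom from standard tables
# (assumed: 2.064 for probability 0.05 and 2.797 for probability 0.01).
T_DF24 = {"95%": 2.064, "99%": 2.797}

def t_interval(mean, sd, n, t_crit):
    """Confidence limits: mean ± t_crit x SEM."""
    sem = sd / math.sqrt(n)
    return mean - t_crit * sem, mean + t_crit * sem

# Hypothetical sample: mean 100, SD 20, n = 25 (so df = 24)
intervals = {level: t_interval(100.0, 20.0, 25, t) for level, t in T_DF24.items()}
for level, (lower, upper) in intervals.items():
    print(f"{level} CI: ({lower:.2f} to {upper:.2f}), width {upper - lower:.2f}")
```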
Calculation of the confidence interval for the
mean by using SPSS
• www.aile.net/agep/istat/diyabet.sav
• Let's calculate the 95% confidence interval of the mean
for the age variable.
• Analyze > Descriptive Statistics > Explore
[Put the "Age" variable in the "Dependent List" box.
Make sure "Statistics" is selected under "Display".
> OK.] We obtain the following output:
Age                                        Statistic   Std. Error
Mean                                       54.44       .603
95% Confidence Interval   Lower Bound      53.26
for Mean                  Upper Bound      55.63
5% Trimmed Mean                            54.33
Median                                     54.00
Variance                                   156.069
Std. Deviation                             12.493
Minimum                                    22
Maximum                                    99
Range                                      77
Interquartile Range                        18
Skewness                                   .157        .118
Kurtosis                                   .048        .235
• The limits of the 95% confidence interval are 53.26 and 55.63.
• We should use the Normal distribution only if we know the
variance of the population (σ²).
• We should remember that, when the sample size is small, the sample
mean follows a Normal distribution only if the data are Normally
distributed in the population.
Let's calculate the confidence intervals for age in the first and second
groups by using the Table 1 data.
                         Group 1                   Group 2
Mean                     51                        43.76
Standard deviation       15.39                     15.07
n                        25                        25
df                       24                        24
Tabulated t (p = 0.05)   2.064                     2.064
SEM                      15.39 / √25 = 3.078       15.07 / √25 = 3.014
95% CI                   [51 ± 2.064 x 3.078]      [43.76 ± 2.064 x 3.014]
                         [44.65 – 57.35]           [37.54 – 49.98]
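The Table 1 calculation can be reproduced as a check. The sketch below uses the means, standard deviations, sample sizes and tabulated t value from the example; the function and variable names are hypothetical.

```python
import math

def ci_from_summary(mean, sd, n, t_crit):
    """Return (SEM, lower limit, upper limit) from summary statistics."""
    sem = sd / math.sqrt(n)
    return sem, mean - t_crit * sem, mean + t_crit * sem

# (mean, SD) for age in each group; n = 25 and t = 2.064 (df = 24) in both
groups = {"Group 1": (51.0, 15.39), "Group 2": (43.76, 15.07)}

results = {}
for name, (mean, sd) in groups.items():
    results[name] = ci_from_summary(mean, sd, 25, 2.064)
    sem, lower, upper = results[name]
    print(f"{name}: SEM = {sem:.3f}, 95% CI = [{lower:.2f} – {upper:.2f}]")
```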
                         Women                     Men
Mean                     5.95                      0.75
Standard deviation       5.19                      1.14
n                        38                        12
df                       37                        11
Tabulated t (p = 0.05)   ~ 2.02                    2.201
SEM                      5.19 / √38 = 0.84         1.14 / √12 = 0.329
95% CI                   [5.95 ± 2.02 x 0.84]      [0.75 ± 2.201 x 0.329]
                         [4.25 – 7.65]             [0.03 – 1.47]
Confidence interval for the proportion
• The sampling distribution of a proportion follows a
Binomial distribution.
• However, if the sample size, n, is reasonably large,
then the sampling distribution of the proportion is
approximately Normal with mean μ .
Confidence interval for the proportion
p = r/ n
μ: Proportion in the population
p: Proportion in the sample
n: Sample size
r: The number of individuals in the sample with the characteristic
• We estimate μ by the proportion in the sample,
p = r/n (where r is the number of individuals in the
sample with the characteristic of interest), and
its standard error is estimated by
SE(p) = √[p(1 - p) / n]
• The 95% confidence interval for the proportion is
p ± 1.96 x SE(p)
Confidence interval for the proportion
• If the sample size is small (usually when np or
n(1 - p) is less than 5) then we have to use the
Binomial distribution to calculate exact
confidence intervals.
• Note that if p is expressed as a percentage, we
replace (1 - p) by (100 - p).
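A minimal sketch of the Normal-approximation interval for a proportion, assuming the usual standard error √[p(1 - p)/n] and the multiplier 1.96. The counts are hypothetical, and the function simply refuses small samples rather than computing exact Binomial limits.

```python
import math

def proportion_ci_95(r, n):
    """95% CI for a proportion p = r/n using the Normal approximation."""
    # np = r and n(1 - p) = n - r; both should be at least 5
    if r < 5 or n - r < 5:
        raise ValueError("sample too small: use exact Binomial limits instead")
    p = r / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se, p - 1.96 * se, p + 1.96 * se

# Hypothetical sample: 40 of 100 individuals have the characteristic
p, se, lower, upper = proportion_ci_95(40, 100)
print(f"p = {p:.2f}, SE = {se:.4f}, 95% CI = ({lower:.3f} to {upper:.3f})")
```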
Confidence intervals for the differences
(Numerical data)
• To calculate the standard error of the difference between two groups for numerical data:
SEM(difference) = √[(s1²/n1) + (s2²/n2)]
• If we apply this to the example in Table 1:
SEM = √[(15.39 x 15.39 / 25) + (15.07 x 15.07 / 25)] ≈ 4.30
95% CI = (Mean 1 – Mean 2) ± t0.05 x SEM
Here the degrees of freedom are calculated as
df = (n1 - 1) + (n2 - 1)
= (24 + 24) = 48
The t value is approximately 2.009 for the 5% level of significance.
95% CI = (51 - 43.76) ± 2.009 x 4.30
95% CI for the difference between the two means: [-1.40 – 15.88]
• Note: When interpreting the CI for the difference between two means, if the CI contains ZERO we
conclude that there is no significant difference between the means (i.e.
the difference could be positive, negative or zero, so we cannot claim that one
mean is bigger than the other).
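The same calculation as a sketch in Python, using the Table 1 figures and the approximate t value quoted above. The function name is hypothetical. Because the code does not round the standard error, its limits can differ slightly in the second decimal from a hand calculation.

```python
import math

def diff_means_ci(m1, s1, n1, m2, s2, n2, t_crit):
    """CI for the difference between two means from summary statistics."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)   # standard error of the difference
    diff = m1 - m2
    return diff, se, diff - t_crit * se, diff + t_crit * se

diff, se, lower, upper = diff_means_ci(51.0, 15.39, 25, 43.76, 15.07, 25, 2.009)
contains_zero = lower <= 0 <= upper   # True means no significant difference

print(f"difference = {diff:.2f}, SE = {se:.3f}")
print(f"95% CI = [{lower:.2f} – {upper:.2f}], contains zero: {contains_zero}")
```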
Calculation of the confidence interval for the
difference between means by using SPSS
• www.aile.net/agep/istat/diyabet.sav
• Let us calculate the 95% confidence interval of the mean
difference in age between men and women.
• Analyze > Compare Means > Independent-Samples T
Test [Put the "Age" variable in the "Test variables" box and
the "sex" variable in the "Grouping variable" box.
Click the "Define Groups" button and enter 1 for
"Group 1" and 2 for "Group 2" > Continue > OK.]
We obtain the following output:
Age
Sex of the patient   N     Mean    Std. Deviation   Std. Error Mean
Male                 235   56.20   12.662           .826
Female               194   52.31   11.975           .860
• The confidence interval for the age difference between men and
women is [1.53 – 6.24].
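The SPSS interval can be approximately reproduced from the group statistics. This sketch assumes the "equal variances assumed" (pooled) form of the standard error and a t value of about 1.966 for 427 degrees of freedom. Both are assumptions, not values shown in the output above, and the function name is hypothetical.

```python
import math

def pooled_diff_ci(m1, s1, n1, m2, s2, n2, t_crit):
    """CI for a difference in means using a pooled variance estimate."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = m1 - m2
    return df, se, diff - t_crit * se, diff + t_crit * se

# Male and female group statistics for age; t_crit is an assumed table value
df, se, lower, upper = pooled_diff_ci(56.20, 12.662, 235, 52.31, 11.975, 194, 1.966)
print(f"df = {df}, SE = {se:.3f}, 95% CI = [{lower:.2f} – {upper:.2f}]")
```

Small differences from the SPSS limits are expected, because SPSS works with unrounded means and an exact t value.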
Confidence intervals for the differences
(Proportions)
-When we deal with categorical data, the standard error of the difference
between two proportions is calculated with the formula
SEM(difference) = √[(p1q1/n1) + (p2q2/n2)], where q = 1 - p.
According to Table 1:
Group 1 contains 20 women
Group 2 contains 18 women
-Let us calculate the CI for the difference in the proportion of women between these groups
SEM(difference) = √[((20/25) x (5/25) / 25) + ((18/25) x (7/25) / 25)] = 0.12
95% CI = (p1 - p2) ± 1.96 x SEM(difference)
95% CI = (0.8 - 0.72) ± 1.96 x 0.12
The CI for the difference between the two proportions: [-0.16 – 0.32]
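The proportion example can be checked the same way. The sketch uses the counts from Table 1 (20 and 18 women out of 25 in each group); the function name is hypothetical.

```python
import math

def diff_props_ci_95(r1, n1, r2, n2):
    """95% CI for the difference between two proportions."""
    p1, p2 = r1 / n1, r2 / n2
    # q = 1 - p in each group
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, diff - 1.96 * se, diff + 1.96 * se

diff, se, lower, upper = diff_props_ci_95(20, 25, 18, 25)
print(f"difference = {diff:.2f}, SE = {se:.2f}")
print(f"95% CI = [{lower:.2f} – {upper:.2f}]")
```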
Confidence interval for the odds ratio
• The odds of an event are defined as
-the ratio of the probability that the event occurs to
-the probability that the event does not occur.
• The odds ratio compares the odds in two groups,
-e.g. the odds of developing cancer among
smokers / the odds of developing cancer among non-smokers.
• This ratio is an important parameter that is used for
assessing risk factors.
• For example, if the odds ratio for lung cancer is 10, we can
say that the odds of developing lung cancer are 10
times higher among smokers than among non-smokers.
• It is better to report confidence intervals together with odds
ratios in research.
• If the confidence interval contains 1, the risk is not
statistically significant.
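The lecture does not give a formula for this interval. One common choice, used here purely as an assumption, is the log-based (Woolf) method: take the natural log of the odds ratio, add and subtract 1.96 standard errors, then transform back. The 2 x 2 counts below are hypothetical.

```python
import math

def odds_ratio_ci_95(a, b, c, d):
    """Odds ratio and 95% CI by the log (Woolf) method.

    a, b: exposed with / without the event
    c, d: unexposed with / without the event
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of ln(odds ratio)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - 1.96 * se_log), math.exp(log_or + 1.96 * se_log)

# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed develop the event
odds_ratio, lower, upper = odds_ratio_ci_95(20, 80, 10, 90)
contains_one = lower <= 1 <= upper    # True means the risk is not significant
print(f"OR = {odds_ratio:.2f}, 95% CI = [{lower:.2f} – {upper:.2f}], contains 1: {contains_one}")
```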
Interpretation of confidence intervals
When interpreting a confidence interval we are interested in
a number of issues.
• How wide is it?
-A wide interval indicates that the estimate is imprecise;
a narrow one indicates a precise estimate.
-The width of the confidence interval depends on the size of
the standard error, which in turn depends on the sample size
and, when considering a numerical variable, the variability
of the data.
-Therefore, small studies on variable data give wider
confidence intervals than larger studies on less variable data.
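The dependence of width on sample size and variability can be sketched as follows. It assumes a Normal-based interval for a mean (width = 2 x 1.96 x SD / √n), with hypothetical SD and n values.

```python
import math

def ci_width(sd, n, multiplier=1.96):
    """Full width of the interval: 2 x multiplier x SD / sqrt(n)."""
    return 2 * multiplier * sd / math.sqrt(n)

# Same variability (SD = 10), increasing sample size
for n in (25, 100, 400):
    print(f"n = {n:3d}: width = {ci_width(10.0, n):.2f}")

# Same sample size, more variable data
print(f"SD = 20, n = 100: width = {ci_width(20.0, 100):.2f}")
```

Quadrupling the sample size halves the width, while doubling the standard deviation doubles it.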
Interpretation of confidence intervals
• What clinical implications can be derived from it?
-The upper and lower limits provide a way of assessing
whether the results are clinically important.
• Does it include any values of particular interest?
-We can check whether a hypothesized value for the
population parameter falls within the confidence interval.
-If so, then our results are consistent with this hypothesized
value.
-If not, then it is unlikely (for a 95% confidence interval, the
chance is at most 5%) that the parameter has this value.
Degrees of freedom
• You will come across the term 'degrees
of freedom' in statistics.
• In general they can be calculated as the
sample size minus the number of
constraints in a particular calculation;
these constraints may be the parameters
that have to be estimated.
• As a simple illustration, consider a set of
three numbers which add up to a particular
total (T).
Degrees of freedom
• Two of the numbers are 'free' to take any value but the
remaining number is fixed by the constraint imposed by T.
• Therefore the numbers have two degrees of freedom.
• Similarly, the degrees of freedom of the sample variance,
s² = Σ(x - x̄)² / (n - 1),
are the sample size minus one, because we have to calculate
the sample mean (x̄), an estimate of the population mean, in
order to evaluate s².
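A small illustration of the constraint, using three hypothetical numbers. Once the mean is fixed, the deviations from it must sum to zero, so the last deviation is determined by the other n - 1.

```python
import statistics

data = [4.0, 7.0, 10.0]               # hypothetical observations
n = len(data)
mean = sum(data) / n
deviations = [x - mean for x in data]

# The constraint: deviations sum to zero, so the last one is not free
last_from_others = -sum(deviations[:-1])

# Sample variance with n - 1 degrees of freedom
s2 = sum(d ** 2 for d in deviations) / (n - 1)

print(f"last deviation = {deviations[-1]}, implied by the others = {last_from_others}")
print(f"s² = {s2}, statistics.variance = {statistics.variance(data)}")
```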
Applying the relative deviate formula for
confidence interval of the differences
Relative Deviate (RD) = (Mean 1 – Mean 2) / standard error of the difference
• For the age difference between Group 1 and Group 2 in Table 1
(51.0 - 43.76) / 4.30 = 1.68
Since the result is less than 1.96, the difference is not statistically significant.
• For the gender difference between Group 1 and Group 2 in Table 1
(0.8 - 0.72) / 0.12 = 0.67
Since the result is less than 1.96, the difference is not statistically significant.
• For the age difference between males and females in the diyabet.sav data set
(56.2 - 52.31) / 1.199 = 3.24
Since the result is higher than 1.96, the difference is statistically significant.
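The three relative deviates above can be recomputed with a short sketch. The estimates and standard errors are taken as given in the examples; the names are hypothetical, and the comparison uses the absolute value so that the order of the groups does not matter.

```python
def relative_deviate(estimate_1, estimate_2, se_difference):
    """RD = (estimate 1 - estimate 2) / standard error of the difference."""
    return (estimate_1 - estimate_2) / se_difference

# (estimate 1, estimate 2, standard error of the difference)
cases = {
    "age, Group 1 vs Group 2": (51.0, 43.76, 4.30),
    "gender, Group 1 vs Group 2": (0.8, 0.72, 0.12),
    "age, males vs females": (56.2, 52.31, 1.199),
}

rds = {}
for label, (e1, e2, se) in cases.items():
    rds[label] = relative_deviate(e1, e2, se)
    verdict = "significant" if abs(rds[label]) > 1.96 else "not significant"
    print(f"{label}: RD = {rds[label]:.2f} ({verdict})")
```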
Summary
Confidence intervals
● Confidence interval for the mean
-Using the Normal distribution
-Using the t-distribution
● Confidence interval for the proportion
● Confidence interval for the differences
-for numerical data
-for proportions
● Confidence intervals for the odds ratio
● Interpretation of confidence intervals
● Degrees of freedom=df
● Applying the relative deviate formula for confidence interval of the
differences