Module 01 PowerPoint I

Ex St 801
Statistical Methods
Introduction
Basic Definitions
STATISTICS: the area of science concerned with extracting information from numerical data and using it to make inferences about a population from data obtained from a sample.
Basic Definitions (cont.)
POPULATION: set representing all measurements of interest to the investigator.
PARAMETER: an unknown population characteristic of interest to the investigator.
Basic Definitions (cont.)
SAMPLE: subset of measurements selected from the population of interest.
STATISTIC: a sample characteristic of interest to the investigator.
Some Frequently Used Statistics and Parameters
                      SAMPLE    POPULATION
MEAN                  ȳ         μ
VARIANCE              s²        σ²
STANDARD DEVIATION    s         σ
PROPORTION            π̂         π
Basic Definitions (cont.)
STATISTICAL INFERENCE: making an "INFORMED GUESS" about a parameter based on a statistic. (This is the main objective of statistics.)
STATISTICAL INFERENCE
[Figure: data are gathered from the POPULATION (PARAMETERS μ, σ², σ, π, etc.) to form a SAMPLE (SAMPLE STATISTICS ȳ, s², s, π̂, etc.), and the sample statistics are used to make inferences about the parameters.]
More Basic Definitions
• A VARIABLE is a characteristic of an individual or object that may vary from one observation to another.
• A QUANTITATIVE VARIABLE measures a characteristic on a numerical scale.
• A QUALITATIVE VARIABLE classifies each observation into one of several categories.
RAISIN BRAN EXAMPLE
• A cereal company claims that the average amount of raisins in its boxes of raisin bran is two scoops.
• A random sample of five boxes was taken off the production line, and an analysis revealed an average of 1.9 scoops per box.
Components of the Problem
• Identify the population
• Identify the sample
• Identify the symbol for the parameter
• Identify the symbol for the statistic
• Is the variable quantitative or qualitative?
ASPIRIN AND HEART ATTACKS
• Twenty thousand doctors participated in a study to determine whether taking an aspirin every other day would reduce the number of heart attacks.
ASPIRIN AND HEART ATTACKS (cont.)
• The physicians were randomly divided into two groups. The first group (called the treatment group) received an aspirin every other day, while the other group (called the control group) received a placebo.
ASPIRIN AND HEART ATTACKS (cont.)
• At the end of the study, there had been 104 heart attacks in the treatment group and 189 heart attacks in the control group.
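The statistic in this study is a sample proportion, π̂ = (number of heart attacks in a group) / (number of doctors in that group). The slides do not give the two group sizes, so the sketch below assumes, purely for illustration, an even split of the 20,000 doctors into groups of 10,000; `heart_attack_rate` is a hypothetical helper name.

```python
def heart_attack_rate(heart_attacks, group_size):
    """Sample proportion pi-hat = heart attacks / number of doctors in the group."""
    return heart_attacks / group_size


# ASSUMPTION: equal group sizes (20,000 / 2). The study description does not say.
GROUP_SIZE = 10_000
treatment = heart_attack_rate(104, GROUP_SIZE)   # aspirin group
control = heart_attack_rate(189, GROUP_SIZE)     # placebo group

print(f"treatment pi-hat = {treatment:.4f}")
print(f"control   pi-hat = {control:.4f}")
print(f"ratio (treatment / control) = {treatment / control:.2f}")
```

Under any equal split the ratio is simply 104/189 ≈ 0.55; the individual proportions depend on the assumed group size.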
Identifying Components of the Problem
• Identify the population
• Identify the sample
• Identify the symbol for the parameter
• Identify the symbol for the statistic
• Is the variable quantitative or qualitative?
Five Steps in a Statistical Study:
1. Stating the problem
2. Gathering the data
3. Summarizing the data
4. Analyzing the data
5. Reporting the results
Stating the Problem
• Specifically identifying the population to be sampled
• Identifying the parameter(s) being studied
Stating the Problem Example
• A researcher wanted to determine whether a vitamin supplement would reduce the rate of certain cancers.
• A large study was conducted in China, and the results indicated that people who took the vitamin supplement had a significantly lower cancer rate.
• Do the results of this study apply to Americans? Why or why not?
Gathering the Data
• SURVEYS
– Random Sampling
– Stratified Sampling
– Cluster Sampling
– Systematic Sampling
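The four survey designs differ only in how units are picked from the sampling frame. A minimal sketch with Python's `random` module, assuming a hypothetical frame of 100 numbered units, 5 hypothetical strata of 20 units, and 10 hypothetical clusters of 10 units (all invented for illustration):

```python
import random

rng = random.Random(801)                  # fixed seed so runs are repeatable
frame = list(range(1, 101))               # hypothetical sampling frame: units 1..100
n = 10                                    # desired sample size

# Simple random sampling: every set of n units is equally likely.
srs = rng.sample(frame, n)

# Stratified sampling: split the frame into strata (5 hypothetical strata of 20
# consecutive units) and draw a simple random sample within each stratum.
strata = [frame[i:i + 20] for i in range(0, 100, 20)]
stratified = [u for stratum in strata for u in rng.sample(stratum, n // len(strata))]

# Cluster sampling: split the frame into clusters (10 hypothetical clusters of 10),
# randomly choose whole clusters, and keep every unit in the chosen clusters.
clusters = [frame[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [u for c in rng.sample(clusters, 1) for u in c]

# Systematic sampling: a random start among the first k units, then every kth unit.
k = len(frame) // n                       # sampling interval
start = rng.randrange(k)
systematic = frame[start::k]

for name, s in [("random", srs), ("stratified", stratified),
                ("cluster", cluster_sample), ("systematic", systematic)]:
    print(f"{name:>10}: {sorted(s)}")
```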
Gathering the Data (cont.)
• EXPERIMENTS
– Completely Randomized Design
– Randomized Block Design
– Factorial Design
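In an experiment the investigator assigns the treatments, and the designs differ in how that random assignment is done. A sketch using hypothetical units, treatments, blocks, and factors (none of these come from the slides):

```python
import itertools
import random

rng = random.Random(801)

# Completely randomized design: shuffle all units, then deal them out so each
# treatment receives the same number of units.
units = [f"unit{i}" for i in range(1, 13)]          # 12 hypothetical units
treatments = ["A", "B", "C"]                        # hypothetical treatments
shuffled = units[:]
rng.shuffle(shuffled)
crd = {t: shuffled[i::len(treatments)] for i, t in enumerate(treatments)}

# Randomized block design: every treatment appears once in each block, and the
# order of treatments is randomized separately within each block.
blocks = ["block1", "block2", "block3", "block4"]   # hypothetical blocks
rbd = {b: rng.sample(treatments, len(treatments)) for b in blocks}

# Factorial design: the treatments are all combinations of the factor levels.
factor_levels = {"dose": ["low", "high"], "timing": ["morning", "evening"]}
factorial = list(itertools.product(*factor_levels.values()))

print("CRD:", crd)
print("RBD:", rbd)
print("Factorial treatments:", factorial)
```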
More Definitions
DESCRIPTIVE STATISTICS: organizing and describing sample information. (Descriptive statistics describe how things are.)
Graphical Displays for Qualitative Data
• PIE CHART
• BAR CHART
[Pie chart: Major Volcanoes in the World, share of major volcanoes by continent (Africa, Antarctica, Asia, Europe, North America, South America); slices labeled 35%, 30%, 13%, 11%, 8%, and 3%.]
[Bar chart: Major Volcanoes in the World, number of major volcanoes for each continent (Africa, Antarctica, Asia, Europe, North America, South America), on a scale from 0 to 50.]
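Both charts start from the same frequency table: a count and a percentage for each category, with a pie slice angle of 360° × proportion. The sketch below uses a short hypothetical list of continent labels, not the actual volcano data, which are not reproduced in these slides.

```python
from collections import Counter

# Hypothetical observations of a qualitative variable (continent of each volcano).
continents = ["Asia", "Asia", "South America", "Africa", "Asia",
              "North America", "South America", "Europe", "Asia", "Africa"]

counts = Counter(continents)
n = len(continents)
print(f"{'Category':<15}{'Count':>6}{'Percent':>9}{'Angle':>8}")
for category, count in counts.most_common():
    proportion = count / n
    # Bar height = count; pie slice angle = 360 degrees x proportion.
    print(f"{category:<15}{count:>6}{100 * proportion:>8.1f}%{360 * proportion:>7.1f}°")
```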
Graphical Displays for Quantitative Data
• HISTOGRAM
• STEM AND LEAF DISPLAY
[Histogram: Major Volcanoes in the World; horizontal axis Elevation (2500 to 20000), vertical axis Frequency (0 to 30).]
Life Expectancies in 33 Developed Nations
Country          Life Expectancy    Country            Life Expectancy
Australia        76.3               Italy              75.5
Austria          75.1               Japan              79.1
Belgium          74.3               Luxembourg         74.1
Britain          75.3               Malta              74.8
Bulgaria         71.5               The Netherlands    76.5
Canada           76.5               New Zealand        74.2
Czechoslovakia   71.0               Norway             76.3
Denmark          74.9               Poland             71.0
East Germany     73.2               Portugal           74.1
West Germany     75.8               Rumania            69.9
Finland          74.8               Soviet Union       69.8
France           75.9               Spain              76.6
Greece           76.5               Sweden             77.1
Hungary          69.7               Switzerland        77.6
Iceland          77.4               United States      75.0
Ireland          73.5               Yugoslavia         71.0
Israel           75.2
[Histogram: Life Expectancies in 33 Developed Nations; horizontal axis Life Expectancy (71.20 to 79.20), vertical axis Frequency (0 to 10).]
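A histogram is drawn from a frequency table of equal-width classes. A sketch using the 33 life expectancies from the table; the class start (69.0) and width (2.0) are arbitrary choices for this example and are not the classes used in the figure. `frequency_table` is a hypothetical helper name.

```python
life_expectancy = [
    76.3, 75.1, 74.3, 75.3, 71.5, 76.5, 71.0, 74.9, 73.2, 75.8, 74.8,
    75.9, 76.5, 69.7, 77.4, 73.5, 75.2, 75.5, 79.1, 74.1, 74.8, 76.5,
    74.2, 76.3, 71.0, 74.1, 69.9, 69.8, 76.6, 77.1, 77.6, 75.0, 71.0,
]


def frequency_table(data, start, width):
    """Count observations in classes [start, start+width), [start+width, ...).

    Assumes every observation is >= start.
    """
    counts = {}
    for y in data:
        k = int((y - start) // width)          # index of the class containing y
        counts[k] = counts.get(k, 0) + 1
    return [(start + k * width, start + (k + 1) * width, counts.get(k, 0))
            for k in range(max(counts) + 1)]


for lower, upper, freq in frequency_table(life_expectancy, start=69.0, width=2.0):
    print(f"{lower:5.1f} to <{upper:5.1f}  {'*' * freq}  ({freq})")
```

A different start or width changes the shape of the histogram, which is one reason the same data can look different in two packages.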
Stem-Leaf Display for Elevation
STEM | LEAF
0 | 001111
0 | 222333
0 | 444444444455555555
0 | 6666667777777
0 | 8888888999999999999
1 | 0000000000000111111
1 | 22222222333333
1 | 44555
1 | 67777
1 | 8889999
KEY: UNIT = 1000; 1 | 2 represents 12000
Construction of a Stem-Leaf Display
• List the stem values, in order, in a vertical column
• Draw a vertical line to the right of the stem values
• For each observation, record the leaf portion of the observation in the row corresponding to the appropriate stem
• Reorder the leaves from lowest to highest within each stem row
Construction of a Stem-Leaf Display (cont.)
• If the number of leaves appearing in each stem is too large, divide each stem into two rows, the first corresponding to leaves 0 through 4 and the second to leaves 5 through 9. (This subdivision can be increased to five rows if necessary.)
• Provide a key to your stem-and-leaf coding so the reader can reconstruct the actual measurements.
Numerical Measures for Summarizing Data
TYPES:
1. Measures of CENTRAL TENDENCY
2. Measures of VARIABILITY
3. Measures of RELATIVE LOCATION
The Arithmetic Mean
The ARITHMETIC MEAN of a set of n measurements (y1, y2, ..., yn) is equal to the sum of the measurements divided by n.
The mathematical notation for the ARITHMETIC MEAN is:
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$
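The formula is a direct sum-then-divide. A short sketch using the six values from this module's example (7, 1, 10, 8, 4, 12); `arithmetic_mean` is a hypothetical helper name, and `statistics.mean` from Python's standard library computes the same quantity.

```python
import statistics


def arithmetic_mean(y):
    """y-bar = (y_1 + y_2 + ... + y_n) / n."""
    if not y:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(y) / len(y)


sample = [7, 1, 10, 8, 4, 12]
print(arithmetic_mean(sample))      # 42 / 6 = 7.0
print(statistics.mean(sample))      # standard-library equivalent
```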
The Median
The MEDIAN of a set of n measurements (y1, y2, ..., yn) is the value that falls in the middle position when the measurements are ordered from the smallest to the largest.
RULE FOR CALCULATING THE MEDIAN
1. Order the measurements from the smallest to the largest.
2. A) If the sample size is odd, the median is the middle measurement.
   B) If the sample size is even, the median is the average of the two middle measurements.
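The two-step rule as a short sketch; `median` is a hypothetical helper name (Python's `statistics.median` follows the same rule). The first call uses the six example values; the second drops y6 to show the odd-sample-size case.

```python
def median(y):
    """Median by the two-step rule: sort, then take the middle value(s)."""
    ordered = sorted(y)                  # step 1: order smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:                       # step 2A: odd n -> the middle measurement
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # step 2B: even n -> average of two middle


print(median([7, 1, 10, 8, 4, 12]))   # even n: (7 + 8) / 2 = 7.5
print(median([7, 1, 10, 8, 4]))       # odd n: middle of 1, 4, 7, 8, 10 is 7
```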
Example
A random sample of six values was taken from a population. These values were:
y1 = 7, y2 = 1, y3 = 10, y4 = 8, y5 = 4, and y6 = 12.
What are the sample mean and sample median for these data?
Sample Mean
$$\bar{y} = \frac{y_1 + y_2 + y_3 + y_4 + y_5 + y_6}{n} = \frac{7 + 1 + 10 + 8 + 4 + 12}{6} = \frac{42}{6} = 7$$
CALCULATIONS FOR THE SAMPLE MEDIAN
(Ordered Sample)
y2 = 1, y5 = 4, y1 = 7, y4 = 8, y3 = 10, y6 = 12
MEDIAN = (7 + 8) / 2 = 7.5
Consider the following sample:
4, 18, 36, 39, 41, 42, 43, 44, 44, 45, 46, 47, 48, 49, 49, 50, 51, 53, 54, 60
Which measure of central tendency best describes the central location of the data: THE SAMPLE MEAN OR SAMPLE MEDIAN?
STEM | LEAF
0 | 4
0 |
1 |
1 | 8
2 |
2 |
3 |
3 | 69
4 | 12344
4 | 567899
5 | 0134
5 |
6 | 0
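One way to check the answer numerically is to compute both measures for these 20 values. A short sketch with Python's `statistics` module; the "trimmed" comparison (dropping 4 and 18) is an illustration added here, not part of the original exercise.

```python
import statistics

sample = [4, 18, 36, 39, 41, 42, 43, 44, 44, 45,
          46, 47, 48, 49, 49, 50, 51, 53, 54, 60]

mean = statistics.mean(sample)       # 863 / 20
median = statistics.median(sample)   # (45 + 46) / 2, the 10th and 11th ordered values
print(f"mean   = {mean}")
print(f"median = {median}")

# Dropping the two low values (4 and 18) moves the mean much more than the median.
trimmed = [y for y in sample if y >= 30]
print(f"without 4 and 18: mean = {statistics.mean(trimmed):.2f}, "
      f"median = {statistics.median(trimmed)}")
```

The mean (43.15) sits below most of the data because of the two small values, while the median (45.5) stays within the bulk of the observations; this is the usual reason to prefer the median for skewed data with outliers.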
MEASURES OF VARIABILITY
• RANGE
• VARIANCE
• STANDARD DEVIATION
Deviation
The DEVIATION of an observation yi from the sample mean is equal to:
$$(y_i - \bar{y})$$
Deviations to the left of the sample mean are negative, and deviations to the right of the sample mean are positive.
Also, notice that the larger the squared deviation, the farther the observation is from the mean.
Formula for the Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} = \frac{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}{n - 1}$$
Obs.    Y    (Y − ȳ)   (Y − ȳ)²    Y²
1       7       0          0       49
2       1      −6         36        1
3      10       3          9      100
4       8       1          1       64
5       4      −3          9       16
6      12       5         25      144
Sum    42       0         80      374

ȳ = 7
Calculation of Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} = \frac{80}{5} = 16$$
$$s^2 = \frac{\sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}}{n - 1} = \frac{374 - \frac{42^2}{6}}{5} = 16$$
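Both variance formulas give the same value; the second needs only Σyᵢ and Σyᵢ², which is convenient by hand. A sketch checking both on the six example values, along with the range (taken here as the usual largest minus smallest, since the slides do not define it) and the standard deviation s = √s². The two function names are hypothetical; `statistics.variance` is the standard-library equivalent and also divides by n − 1.

```python
import math
import statistics


def sample_variance_definitional(y):
    """s^2 = sum of (y_i - y-bar)^2, divided by n - 1."""
    n = len(y)
    ybar = sum(y) / n
    return sum((yi - ybar) ** 2 for yi in y) / (n - 1)


def sample_variance_computational(y):
    """s^2 = (sum of y_i^2 - (sum of y_i)^2 / n), divided by n - 1."""
    n = len(y)
    return (sum(yi ** 2 for yi in y) - sum(y) ** 2 / n) / (n - 1)


y = [7, 1, 10, 8, 4, 12]
ybar = sum(y) / len(y)
deviations = [yi - ybar for yi in y]

print("range:", max(y) - min(y))                          # 12 - 1 = 11
print("deviations:", deviations, "sum:", sum(deviations))  # deviations sum to 0
print("s^2 (definitional):", sample_variance_definitional(y))     # 80 / 5 = 16.0
print("s^2 (computational):", sample_variance_computational(y))   # (374 - 294) / 5 = 16.0
print("s^2 (statistics.variance):", statistics.variance(y))
print("s:", math.sqrt(sample_variance_definitional(y)))           # sqrt(16) = 4.0
```

On a computer the definitional form is usually preferred: the computational form subtracts two large, nearly equal numbers and can lose precision when the values are large relative to their spread.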
THE EMPIRICAL RULE
Given a large set of measurements possessing a mound-shaped histogram:
• the interval ȳ ± s contains approximately 68% of the measurements.
• the interval ȳ ± 2s contains approximately 95% of the measurements.
• the interval ȳ ± 3s contains approximately 99.7% of the measurements.
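The rule can be checked by counting how many observations fall inside ȳ ± ks. A sketch on hypothetical mound-shaped data (10,000 simulated normal draws, not the volcano data), with `statistics.NormalDist` supplying the exact normal-curve percentages that the rule rounds to; `empirical_rule_coverage` is a hypothetical helper name.

```python
import random
import statistics


def empirical_rule_coverage(data, k):
    """Fraction of observations inside y-bar +/- k*s (s = sample standard deviation)."""
    ybar = statistics.mean(data)
    s = statistics.stdev(data)
    inside = sum(1 for y in data if ybar - k * s <= y <= ybar + k * s)
    return inside / len(data)


# Hypothetical mound-shaped data: 10,000 draws from a normal distribution.
rng = random.Random(801)
data = [rng.gauss(50, 10) for _ in range(10_000)]

normal = statistics.NormalDist()        # standard normal, for the theoretical values
for k in (1, 2, 3):
    theory = normal.cdf(k) - normal.cdf(-k)
    print(f"k={k}: observed {empirical_rule_coverage(data, k):.3f}, "
          f"normal theory {theory:.3f}")
```

For data that are skewed or not mound-shaped, the observed fractions can differ noticeably from 68%, 95%, and 99.7%.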
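[Figure: Percent of Observations Included between Certain Values of the Standard Deviation. A mound-shaped curve centered at 0, with the horizontal axis marked from −4s to 4s; about 68% of the observations lie within 1s of the center, 95% within 2s, and 99.7% within 3s.]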
Major Volcanoes in the World: Empirical Rule
Interval            Percentage of Observations    Actual Percentage of Observations
                    Expected to Fall within       Found within the Interval
                    the Interval
4912 to 14058       68%                           66.6%
339 to 18630        95%                           95.7%
−4232 to 23202      99.7%                         100%
TWO MEASURES OF RELATIVE STANDING
• Percentile
• Quartile
The pth PERCENTILE is the value X_p such that p% of the measurements fall below that value and (100 − p)% of the measurements fall above it.
[Figure: a distribution divided at X_p, with p% of the measurements to the left and (100 − p)% to the right.]
Quartiles divide the measurements into four parts such that 25% of the measurements are contained in each part. The first quartile (Lower Quartile) is denoted by Q1, the second by Q2, and the third (Upper Quartile) by Q3.
[Figure: Q1, Q2, and Q3 split the distribution into four parts, each containing 25% of the measurements.]
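Textbooks and software use several slightly different conventions for computing sample percentiles and quartiles, so hand and software answers can differ slightly. A sketch with Python's `statistics.quantiles` on the 33 life expectancies, showing two of its built-in conventions (`"inclusive"` and `"exclusive"`); which one matches this course's hand rule is not stated in the slides.

```python
import statistics

life_expectancy = [
    76.3, 75.1, 74.3, 75.3, 71.5, 76.5, 71.0, 74.9, 73.2, 75.8, 74.8,
    75.9, 76.5, 69.7, 77.4, 73.5, 75.2, 75.5, 79.1, 74.1, 74.8, 76.5,
    74.2, 76.3, 71.0, 74.1, 69.9, 69.8, 76.6, 77.1, 77.6, 75.0, 71.0,
]

# Quartiles: the three cut points that split the data into four 25% parts.
for method in ("inclusive", "exclusive"):
    q1, q2, q3 = statistics.quantiles(life_expectancy, n=4, method=method)
    print(f"{method:>9}: Q1 = {q1:.2f}, Q2 = {q2:.2f}, Q3 = {q3:.2f}")

# Percentiles are the same idea with n=100: cut points for 1%, 2%, ..., 99%.
percentiles = statistics.quantiles(life_expectancy, n=100, method="inclusive")
print(f"90th percentile (inclusive method): {percentiles[89]:.2f}")
```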
Box and Whisker Plot
[Figure: box-and-whisker plot of Life Expectancies in 33 Developed Nations; vertical axis from 68 to 80.]
Calculating Fence Values
IQR (interquartile range) = Q3 − Q1
Lower Inner Fence: Q1 − 1.5(IQR)
Upper Inner Fence: Q3 + 1.5(IQR)
Lower Outer Fence: Q1 − 3(IQR)
Upper Outer Fence: Q3 + 3(IQR)
EXAMPLE: Construct a Box-and-Whisker Plot for the elevations of volcanoes in Africa:
1,650  5,981  7,745  9,281  10,023  11,400  12,198  13,451  19,340
Median =        Lower Inner Fence =
Q1 =            Upper Inner Fence =
Q3 =            Lower Outer Fence =
IQR =           Upper Outer Fence =
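A sketch of the fence calculation for these nine elevations using `statistics.quantiles`; `fences` is a hypothetical helper name. The quartile convention matters here: with the `"inclusive"` method (which for these nine values matches taking the medians of the lower and upper halves, each including the overall median) 19,340 lies beyond the upper inner fence, while with the `"exclusive"` method it does not. The course's own quartile rule is not stated in the slides, so treat both outputs as illustrations rather than an answer key.

```python
import statistics


def fences(data, method="inclusive"):
    """Quartiles, IQR, and inner/outer fences; `method` is passed to statistics.quantiles."""
    q1, q2, q3 = statistics.quantiles(data, n=4, method=method)
    iqr = q3 - q1                                   # interquartile range
    return {
        "Q1": q1, "median": q2, "Q3": q3, "IQR": iqr,
        "lower inner": q1 - 1.5 * iqr, "upper inner": q3 + 1.5 * iqr,
        "lower outer": q1 - 3 * iqr, "upper outer": q3 + 3 * iqr,
    }


africa = [1650, 5981, 7745, 9281, 10023, 11400, 12198, 13451, 19340]

for method in ("inclusive", "exclusive"):
    f = fences(africa, method)
    beyond_inner = [y for y in africa if y < f["lower inner"] or y > f["upper inner"]]
    beyond_outer = [y for y in africa if y < f["lower outer"] or y > f["upper outer"]]
    print(method)
    for name, value in f.items():
        print(f"  {name:>12} = {value:,.2f}")
    print("  beyond inner fences:", beyond_inner, " beyond outer fences:", beyond_outer)
```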
[Figure: BOX AND WHISKER PLOT, MAJOR VOLCANOES IN AFRICA; vertical axis from 0 to 20000.]
The End