Chapter 4 - McGraw Hill Higher Education

McGraw-Hill/Irwin
Copyright © 2013 by The McGraw-Hill Companies, Inc. All rights reserved.
Chapter 4
Descriptive Statistics
Chapter Contents
4.1 Numerical Description
4.2 Measures of Center
4.3 Measures of Variability
4.4 Standardized Data
4.5 Percentiles, Quartiles, and Box Plots
4.6 Correlation and Covariance
4.7 Grouped Data
4.8 Skewness and Kurtosis
Chapter Learning Objectives
LO4-1: Explain the concepts of center, variability, and shape.
LO4-2: Use Excel to obtain descriptive statistics and visual displays.
LO4-3: Calculate and interpret common measures of center.
LO4-4: Calculate and interpret common measures of variability.
LO4-5: Transform a data set into standardized values.
LO4-6: Apply the Empirical Rule and recognize outliers.
LO4-7: Calculate quartiles and other percentiles.
LO4-8: Make and interpret box plots.
LO4-9: Calculate and interpret a correlation coefficient and covariance.
LO4-10: Calculate the mean and standard deviation from grouped data.
LO4-11: Assess skewness and kurtosis in a sample.
4.1 Numerical Description
LO4-1: Explain the concepts of center, variability, and shape.
Three key characteristics of numerical data: center, variability, and shape.
LO4-2: Use Excel to obtain descriptive statistics and visual displays.
EXCEL Displays for Table 4.3
4.2 Measures of Center
LO4-3: Calculate and interpret common measures of center.
LO4-1: Explain the concepts of center, variability, and shape.
Shape
• Compare the mean and median, or look at a histogram, to determine the degree of skewness.
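The mean-versus-median comparison can be sketched in a few lines of Python. This is only a rough heuristic, not a formal skewness statistic, and the function name `skew_direction` and the sample values are hypothetical.

```python
from statistics import mean, median

def skew_direction(data):
    """Rough shape check: compare the mean with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "skewed right (mean > median)"
    if m < med:
        return "skewed left (mean < median)"
    return "roughly symmetric (mean = median)"

# Hypothetical sample: one large value pulls the mean above the median.
sample = [2, 3, 3, 4, 4, 5, 20]
print(mean(sample), median(sample), skew_direction(sample))
```

A histogram remains the better check when the mean and median are close, since the comparison says nothing about how pronounced the skew is.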
4.3 Measures of Variability
LO4-4: Calculate and interpret common measures of variability.
• Variation is the “spread” of data points about the center of the distribution in a sample.
Consider the following measures of variability:
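As a minimal sketch, the snippet below computes several commonly used measures of variability: the range, the sample variance, the sample standard deviation, and the coefficient of variation. The choice of measures and the data values are assumptions made for illustration.

```python
from statistics import mean, stdev, variance

data = [12, 15, 11, 18, 14]  # hypothetical sample

data_range = max(data) - min(data)  # range: largest value minus smallest
s2 = variance(data)                 # sample variance (divides by n - 1)
s = stdev(data)                     # sample standard deviation
cv = 100 * s / mean(data)           # coefficient of variation, in percent

print(f"range={data_range}, variance={s2:.2f}, std dev={s:.2f}, CV={cv:.1f}%")
```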
4.4 Standardized Data
Chebyshev’s Theorem
• For any population with mean μ and standard deviation σ, the percentage of observations that lie within k standard deviations of the mean must be at least 100[1 - 1/k²].
• For k = 2 standard deviations, 100[1 - 1/2²] = 75%. So, at least 75.0% will lie within μ ± 2σ.
• For k = 3 standard deviations, 100[1 - 1/3²] = 88.9%. So, at least 88.9% will lie within μ ± 3σ.
• Although applicable to any data set, these limits tend to be too wide to be useful.
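The arithmetic behind the two Chebyshev bounds can be checked with a short Python sketch. The function name `chebyshev_lower_bound` is hypothetical, and the guard for k ≤ 1 is an added assumption, since the bound gives no information there.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is informative only for k > 1")
    return 100 * (1 - 1 / k**2)

for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1f}%")
# k = 2: at least 75.0%
# k = 3: at least 88.9%
```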
LO4-6: Apply the Empirical Rule and recognize outliers.
The Empirical Rule
• Unusual observations are those that lie beyond μ ± 2σ.
• Outliers are observations that lie beyond μ ± 3σ.
• Note: No upper bound is given. Data outside μ ± 3σ are rare.
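The 2σ and 3σ cutoffs for unusual observations and outliers can be applied as in the sketch below. It treats the data as a population (so it uses `pstdev`). The label "typical", the function name `classify_by_sigma`, and the data are assumptions for illustration.

```python
from statistics import mean, pstdev

def classify_by_sigma(data):
    """Label each value by its distance from the mean in standard deviations."""
    mu, sigma = mean(data), pstdev(data)  # population mean and std deviation
    if sigma == 0:
        return [(x, "typical") for x in data]  # no spread, nothing to flag
    labels = []
    for x in data:
        distance = abs(x - mu) / sigma
        if distance > 3:
            labels.append((x, "outlier"))   # beyond mu +/- 3 sigma
        elif distance > 2:
            labels.append((x, "unusual"))   # beyond mu +/- 2 sigma
        else:
            labels.append((x, "typical"))
    return labels

# Hypothetical data: twenty identical readings and one extreme value.
readings = [10] * 20 + [100]
print(classify_by_sigma(readings)[-1])
```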
LO4-5: Transform a data set into standardized values.
Defining a Standardized Variable
• A standardized variable (Z) redefines each observation in terms of the number of standard deviations from the mean.
• Standardization formula for a population: $z_i = \frac{x_i - \mu}{\sigma}$
• Standardization formula for a sample: $z_i = \frac{x_i - \bar{x}}{s}$
• A negative z value means the observation is below the mean.
• A positive z value means the observation is above the mean.
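A minimal sketch of the sample standardization formula follows. The function name `standardize` and the scores are hypothetical.

```python
from statistics import mean, stdev

def standardize(sample):
    """Return z_i = (x_i - x_bar) / s for each observation in a sample."""
    x_bar, s = mean(sample), stdev(sample)  # stdev divides by n - 1
    return [(x - x_bar) / s for x in sample]

scores = [60, 70, 80, 90, 100]  # hypothetical sample
z = standardize(scores)
print([round(v, 2) for v in z])
```

Values below the mean come out negative and values above it positive. The standardized values have mean 0 and sample standard deviation 1.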
4.5 Percentiles, Quartiles, and Box Plots
LO4-7: Calculate quartiles and other percentiles.
Percentiles
• Percentiles are data that have been divided into 100 groups.
• For example, you score in the 83rd percentile on a standardized test. That means that 83% of the test-takers scored below you.
• Deciles are data that have been divided into 10 groups.
• Quintiles are data that have been divided into 5 groups.
• Quartiles are data that have been divided into 4 groups.
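The cut points that split data into 4, 5, 10, or 100 groups can be sketched with `statistics.quantiles` from the Python standard library (Python 3.8 or later). The data are hypothetical. Software packages use different interpolation rules, so these cut points may differ slightly from those reported by Excel or a textbook method.

```python
from statistics import quantiles

data = list(range(1, 100))  # hypothetical scores 1, 2, ..., 99

quartiles = quantiles(data, n=4)      # 3 cut points -> 4 groups
quintiles = quantiles(data, n=5)      # 4 cut points -> 5 groups
deciles = quantiles(data, n=10)       # 9 cut points -> 10 groups
percentiles = quantiles(data, n=100)  # 99 cut points -> 100 groups

print(quartiles)  # [25.0, 50.0, 75.0]
```

Dividing data into k groups always takes k - 1 cut points, which is why there are three quartiles (Q1, Q2, Q3) rather than four.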
LO4-8: Make and interpret box plots.
• A useful tool of exploratory data analysis (EDA).
• Also called a box-and-whisker plot.
• Based on a five-number summary: Xmin, Q1, Q2, Q3, Xmax
• A box plot shows central tendency, dispersion, and shape.
Fences and Unusual Data Values
• Values outside the inner fences are unusual, while those outside the outer fences are outliers.
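The five-number summary and the fences can be sketched as follows. The fence multipliers are an assumption: the conventional choice places inner fences 1.5 times the interquartile range (Q3 - Q1) beyond the quartiles and outer fences 3 times beyond them. The function names and data are hypothetical, and the quartile method is the default of `statistics.quantiles`, which may differ from other software.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (Xmin, Q1, Q2, Q3, Xmax)."""
    q1, q2, q3 = quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)

def fences(data, inner=1.5, outer=3.0):
    """Inner and outer fences; the 1.5 and 3.0 multipliers are assumed conventions."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1  # interquartile range
    return {
        "inner": (q1 - inner * iqr, q3 + inner * iqr),
        "outer": (q1 - outer * iqr, q3 + outer * iqr),
    }

def flag_value(x, f):
    """Classify one value against the fences returned by fences()."""
    if x < f["outer"][0] or x > f["outer"][1]:
        return "outlier"
    if x < f["inner"][0] or x > f["inner"][1]:
        return "unusual"
    return "within fences"

data = list(range(1, 100))  # hypothetical data 1, 2, ..., 99
f = fences(data)
print(five_number_summary(data), f)
```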
4.6 Correlation and Covariance
LO4-9: Calculate and interpret a correlation coefficient and covariance.
Correlation Coefficient
• The sample correlation coefficient r is a statistic that describes the degree of linearity between paired observations on two quantitative variables X and Y. Note: -1 ≤ r ≤ +1.
• The covariance of two random variables X and Y (denoted σXY) measures the degree to which the values of X and Y change together.
• Population: $\sigma_{XY} = \frac{\sum_{i=1}^{N} (x_i - \mu_X)(y_i - \mu_Y)}{N}$
• Sample: $s_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Covariance
• A correlation coefficient is the covariance divided by the product of the standard deviations of X and Y.
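The relationship between covariance and correlation can be sketched for a sample as below. The sample covariance here uses the usual n - 1 divisor. The function names and the paired data are hypothetical.

```python
from statistics import mean, stdev

def sample_covariance(x, y):
    """Sum of cross-products of deviations, divided by n - 1."""
    if len(x) != len(y):
        raise ValueError("x and y must be paired observations")
    x_bar, y_bar = mean(x), mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)

def sample_correlation(x, y):
    """Covariance divided by the product of the two sample standard deviations."""
    return sample_covariance(x, y) / (stdev(x) * stdev(y))

x = [1, 2, 3, 4, 5]  # hypothetical paired data
y = [2, 4, 5, 4, 5]
print(sample_covariance(x, y), round(sample_correlation(x, y), 4))
```

Dividing by the standard deviations removes the units of X and Y, which is why r always falls between -1 and +1 while the covariance has no fixed range.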
4.7 Grouped Data
LO4-10: Calculate the mean and standard deviation from grouped data.
Group Mean and Standard Deviation
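One common way to estimate the mean and standard deviation from grouped data is to weight each class midpoint by its frequency. The sketch below assumes that approach, with an n - 1 divisor for the sample standard deviation. The function name `grouped_mean_sd`, the midpoints, and the frequencies are hypothetical.

```python
from math import sqrt

def grouped_mean_sd(midpoints, freqs):
    """Approximate mean and sample std deviation from class midpoints and frequencies."""
    n = sum(freqs)
    # Each midpoint stands in for every observation in its class (an approximation).
    mean = sum(f * m for m, f in zip(midpoints, freqs)) / n
    var = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs)) / (n - 1)
    return mean, sqrt(var)

midpoints = [5, 15, 25]  # hypothetical class midpoints
freqs = [2, 5, 3]        # hypothetical class frequencies
g_mean, g_sd = grouped_mean_sd(midpoints, freqs)
print(g_mean, round(g_sd, 3))
```

Because each observation is replaced by its class midpoint, the results are estimates and will generally differ from values computed on the raw data.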
4.8 Skewness and Kurtosis
LO4-11: Assess skewness and kurtosis in a sample.
Skewness
Kurtosis