Slide 1
Statistics Workshop
Tutorial 6
•Measures
of Relative Standing
• Exploratory Data Analysis
Slide 2
Section 2-6
Measures of Relative
Standing
Created by Tom Wegleitner, Centreville, Virginia
Copyright © 2004 Pearson Education, Inc.
Slide 3
Definition
z Score (or standard score)
the number of standard deviations
that a given value x is above or below
the mean.
Slide 4
Measures of Position
z score
Sample: $z = \frac{x - \bar{x}}{s}$
Population: $z = \frac{x - \mu}{\sigma}$
Round to 2 decimal places
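The z score formula can be sketched in a few lines of Python. This is only an illustration: the data list is hypothetical, and the helper name `z_score` is an assumption, not something from the slides.

```python
import statistics

def z_score(x, mean, sd):
    """Standard deviations that x lies above (+) or below (-) the mean,
    rounded to 2 decimal places as the slide recommends."""
    return round((x - mean) / sd, 2)

# Hypothetical data, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

x_bar = statistics.mean(data)     # mean of the data
s = statistics.stdev(data)        # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(data)   # population standard deviation (divides by n)

z_sample = z_score(9, x_bar, s)          # treat the data as a sample
z_population = z_score(9, x_bar, sigma)  # treat the data as a population
print(z_sample, z_population)
```

The only difference between the two cases is which standard deviation goes in the denominator.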
Slide 5
Interpreting Z Scores
FIGURE 2-14
Whenever a value is less than the mean, its
corresponding z score is negative.
Ordinary values:
z score between -2 and 2
Unusual values:
z score < -2 or z score > 2
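The ordinary/unusual rule above could be expressed as a small check. The function name `classify_z` is hypothetical, and the sketch assumes that z scores of exactly -2 or 2 count as ordinary, which is how the inequalities on the slide read.

```python
def classify_z(z):
    """Label a z score using the rule of thumb from the slide:
    ordinary if -2 <= z <= 2, unusual otherwise."""
    return "ordinary" if -2 <= z <= 2 else "unusual"

# Hypothetical z scores for illustration.
for z in (-2.5, -0.4, 2.0, 3.1):
    print(z, classify_z(z))
```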
Slide 6
Definition
Q1 (First Quartile) separates the bottom
25% of sorted values from the top 75%.
Q2 (Second Quartile) same as the median;
separates the bottom 50% of sorted
values from the top 50%.
Q3 (Third Quartile) separates the bottom
75% of sorted values from the top 25%.
Slide 7
Quartiles
Q1, Q2, Q3
divide ranked scores into four equal parts
(minimum) | 25% | Q1 | 25% | Q2 (median) | 25% | Q3 | 25% | (maximum)
Percentiles
Slide 8
Just as there are quartiles separating data
into four parts, there are 99 percentiles
denoted P1, P2, . . . P99, which partition the
data into 100 groups.
Slide 9
Finding the Percentile of a Given Score
Percentile of value x = $\frac{\text{number of values less than } x}{\text{total number of values}} \cdot 100$
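As a rough illustration of the percentile formula, the sketch below counts the values strictly less than x and divides by the total. The data and the name `percentile_of` are assumptions made for the example.

```python
def percentile_of(x, values):
    """Percentile of x: (number of values less than x) / (total number) * 100."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)

# Hypothetical data for illustration.
scores = [12, 15, 17, 20, 22, 25, 28, 30, 31, 35]

# Three of the ten scores are below 20, so 20 sits at the 30th percentile.
print(percentile_of(20, scores))
```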
From Percentile to Data Value
• What score is at the kth percentile?
• (1) Rank the data from lowest to highest
• (2) Find L (locator): $L = \frac{k}{100} \cdot n$, where n is the number of values
•   a) If L is not a whole number, round up and find the score in that position
•   b) If L is a whole number, find the average of the scores in positions L and L+1
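The locator procedure above can be sketched as follows. The function name `percentile_value` and the data are hypothetical, and the sketch assumes k is a whole number from 1 to 99. Positions are counted from 1 as on the slide, so the code subtracts 1 when indexing the Python list.

```python
def percentile_value(k, values):
    """Score at the kth percentile using the locator rule L = (k / 100) * n."""
    ranked = sorted(values)          # step 1: rank lowest to highest
    n = len(ranked)
    if (k * n) % 100 != 0:
        # L is not a whole number: round up and take the score in that position.
        position = (k * n) // 100 + 1
        return ranked[position - 1]
    # L is a whole number: average the scores in positions L and L + 1.
    position = (k * n) // 100
    return (ranked[position - 1] + ranked[position]) / 2

# Hypothetical data for illustration (n = 10).
scores = [12, 15, 17, 20, 22, 25, 28, 30, 31, 35]

print(percentile_value(25, scores))  # L = 2.5, round up to position 3
print(percentile_value(50, scores))  # L = 5, average positions 5 and 6
```

Statistical software often uses other interpolation rules, so results from a library routine may differ slightly from this hand method.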
Slide 11
Some Other Statistics
Interquartile Range (or IQR): $Q_3 - Q_1$
Semi-interquartile Range: $\frac{Q_3 - Q_1}{2}$
Midquartile: $\frac{Q_3 + Q_1}{2}$
10 - 90 Percentile Range: $P_{90} - P_{10}$
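These four statistics are simple arithmetic once the quartiles and percentiles are known. The sketch below assumes Q1, Q3, P10, and P90 have already been found; the input numbers and the name `spread_stats` are hypothetical.

```python
def spread_stats(q1, q3, p10, p90):
    """Statistics from the slide, computed from precomputed quartiles/percentiles."""
    iqr = q3 - q1
    return {
        "iqr": iqr,                     # Q3 - Q1
        "semi_iqr": iqr / 2,            # (Q3 - Q1) / 2
        "midquartile": (q3 + q1) / 2,   # (Q3 + Q1) / 2
        "p10_p90_range": p90 - p10,     # P90 - P10
    }

# Hypothetical inputs for illustration.
stats = spread_stats(q1=17, q3=30, p10=13.5, p90=33.0)
print(stats)
```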
Slide 13
Section 2-7
Exploratory Data Analysis
(EDA)
Definition
Slide 14
Exploratory Data Analysis is the
process of using statistical tools (such
as graphs, measures of center, and
measures of variation) to investigate
data sets in order to understand their
important characteristics
Outliers
• An outlier is a very high or very low value
that stands apart from the rest of the data
• Outliers may come from data collection errors, data
entry errors, or simply valid but unusual data
values.
• Always identify and examine outliers to
determine if they are in error
Important Principles
Slide 16
An outlier can have a dramatic effect on the
mean
An outlier can have a dramatic effect on the
standard deviation
An outlier can have a dramatic effect on the
scale of the histogram so that the true
nature of the distribution is totally
obscured
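A small experiment can make these principles concrete. The data below are invented for illustration, and the helper `summarize` is a hypothetical name; the sketch compares the mean and standard deviation with and without one extreme value, and includes the median for contrast.

```python
import statistics

def summarize(values):
    """Return (mean, sample standard deviation, median) for a list of numbers."""
    return (statistics.mean(values),
            statistics.stdev(values),
            statistics.median(values))

# Hypothetical data: six typical values, then the same data plus one outlier.
typical = [10, 12, 11, 13, 12, 11]
with_outlier = typical + [95]

mean_a, sd_a, median_a = summarize(typical)
mean_b, sd_b, median_b = summarize(with_outlier)

print("without outlier:", mean_a, sd_a, median_a)
print("with outlier:   ", mean_b, sd_b, median_b)
```

In this example the mean and standard deviation move a great deal while the median barely changes, which is why the median is often preferred when outliers are present.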
Definitions
Slide 17
For a set of data, the 5-number summary consists
of the minimum value; the first quartile Q1; the
median (or second quartile Q2); the third quartile,
Q3; and the maximum value
A boxplot (or box-and-whisker diagram) is a
graph of a data set that consists of a line
extending from the minimum value to the
maximum value, and a box with lines drawn at the
first quartile, Q1; the median; and the third
quartile, Q3
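One way to sketch the 5-number summary is to treat the quartiles as the 25th, 50th, and 75th percentiles and find them with the locator rule L = (k / 100) * n. The function names and data are hypothetical, and quartile conventions vary between textbooks and software, so other tools may give slightly different values.

```python
def percentile_value(k, ranked):
    """Score at the kth percentile of already-sorted data (locator rule)."""
    n = len(ranked)
    if (k * n) % 100 != 0:
        return ranked[(k * n) // 100]      # round L up; list index is position - 1
    position = (k * n) // 100
    return (ranked[position - 1] + ranked[position]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ranked = sorted(values)
    return (ranked[0],
            percentile_value(25, ranked),
            percentile_value(50, ranked),
            percentile_value(75, ranked),
            ranked[-1])

# Hypothetical data for illustration.
scores = [22, 35, 12, 30, 17, 25, 15, 31, 20, 28]
print(five_number_summary(scores))
```

These five numbers are exactly what a boxplot draws: the ends of the line, the edges of the box, and the line inside the box.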
Boxplots
Figure 2-16
Slide 18
Outliers
• A data point is considered an outlier if it lies more than 1.5 times
the interquartile range above the 75th percentile (Q3) or more than 1.5
times the interquartile range below the 25th percentile (Q1)
• In other words, outliers are numbers outside the
interval [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
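The 1.5 * IQR rule can be sketched as a simple filter. The function name `iqr_outliers` is hypothetical, the data are invented, and Q1 and Q3 are assumed to have been computed beforehand.

```python
def iqr_outliers(values, q1, q3):
    """Return the values that fall outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

# Hypothetical data with one suspiciously large value.
scores = [12, 15, 17, 20, 22, 25, 28, 30, 31, 35, 80]

# Assumed quartiles for this data: Q1 = 17 and Q3 = 31, so IQR = 14
# and the interval is [-4, 52].
print(iqr_outliers(scores, q1=17, q3=31))
```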
Box Plots and Histograms
• When looking at one variable, it’s a good idea to look
at the box plot and histogram together
• Box plots complement histograms by providing more
specific information about the center, the quartiles,
and outliers
Boxplots
Figure 2-17
Slide 21
Shape, Center and Spread
• What should you report about a quantitative variable?
• Always report the shape, center and spread
• If the distribution is skewed, report the median and
IQR
• If the distribution is symmetric, report the mean and
standard deviation
• If there are any clear outliers and you are reporting
the mean and the standard deviation, report them
both with the outliers and without them
Slide 23
Now we are ready for
Part 21 of Day 1