Fundamentals of Statistics

Basics of Statistics & Normal Distribution
What Is Statistics?
• Collection of Data
– Survey
– Interviews
• Summarization and Presentation of Data
– Frequency Distribution
– Measures of Central Tendency and Dispersion
– Charts, Tables, Graphs
Statistical Methods
• Descriptive Statistics
• Inferential Statistics
Key Terms
• 1. Population (Universe)
– All Items of Interest
• 2. Sample
– Portion of Population
• 3. Parameter
– Summary Measure about Population
• 4. Statistic
– Summary Measure about Sample
• P in Population & Parameter
• S in Sample & Statistic
Statistical Computer Packages
• 1. Typical Software
– SAS
– SPSS
– MINITAB
– Excel
• 2. Need Statistical Understanding
– Assumptions
– Limitations
Standard Notation
Measure        Sample        Population
Mean           $\bar{X}$     $\mu$
Stand. Dev.    $S$           $\sigma$
Variance       $S^2$         $\sigma^2$
Size           $n$           $N$
Measures of Central Tendency for Ungrouped Data (Raw Data)
Mean
• Measure of Central Tendency
• Most Common Measure
• Acts as ‘Balance Point’
• Affected by Extreme Values (‘Outliers’)
• Formula (Sample Mean)
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
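The sample mean formula can be sketched in a few lines of Python. The function name `sample_mean` and the data values are assumptions made up for illustration; they do not come from the slides.

```python
def sample_mean(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    if not values:
        raise ValueError("the mean is undefined for an empty data set")
    return sum(values) / len(values)


# Hypothetical data, chosen only for illustration.
scores = [2, 4, 4, 5, 10]
print(sample_mean(scores))  # 5.0

# Adding one extreme value pulls the mean far above most of the data.
print(sample_mean(scores + [100]))
```

The second call illustrates the point about outliers: five of the six values are 10 or less, yet the mean rises above 20.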
Advantages of the Mean
• Most widely used
• Every item taken into account
• Determined algebraically and amenable to algebraic operations
• Can be calculated on any set of numerical data (interval and ratio scale); always exists
• Unique
• Relatively reliable
Disadvantages of the Mean
• Affected by outliers
• Cannot be used with open-ended classes of a frequency distribution
Median
• Measure of Central Tendency
• Middle Value In Ordered Sequence
– If Odd n, Middle Value of Sequence
– If Even n, Average of 2 Middle Values
• Not Affected by Extreme Values
• Position of Median in Sequence
$$\text{Positioning Point} = \frac{n + 1}{2}$$
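A minimal sketch of the median rules above, assuming 1-based positions as in the positioning-point formula. The function name `median_by_position` and the data are hypothetical.

```python
def median_by_position(values):
    """Median of raw data: middle value of the ordered sequence."""
    ordered = sorted(values)          # the median requires an ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("the median is undefined for an empty data set")
    if n % 2 == 1:
        position = (n + 1) // 2       # 1-based positioning point
        return ordered[position - 1]  # convert to a 0-based index
    # Even n: the positioning point falls between two values, so average them.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_position([7, 1, 3, 9, 5]))    # 5   (odd n: third of five values)
print(median_by_position([7, 1, 3, 9]))       # 5.0 (even n: average of 3 and 7)
print(median_by_position([1, 3, 5, 7, 900]))  # 5   (the extreme value has no effect)
```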
Advantages of the Median
• Unique
• Unaffected by outliers and skewness
• Easily understood
• Can be computed for open-ended classes of a frequency distribution
• Always exists on ungrouped data
• Can be computed on ratio, interval and ordinal scales
Disadvantages of Median
• Requires an ordered array
• No arithmetic properties
Mode
• Measure of Central Tendency
• Value That Occurs Most Often
• Not Affected by Extreme Values
• May Be No Mode or Several Modes
• May Be Used for Numerical & Categorical Data
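The properties of the mode listed above can be sketched as follows. The function name `modes` is hypothetical, and the choice to report no mode when no value repeats is an assumed convention; other conventions exist.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, or [] if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: no repeated value means no mode
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 2, 2, 3, 3, 4]))       # [2, 3]  several modes
print(modes([1, 2, 3]))                # []      no mode
print(modes(["red", "blue", "red"]))   # ['red'] categorical data
```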
Advantages of Mode
• Easily understood
• Not affected by outliers
• Useful with qualitative problems
• May indicate a bimodal distribution
Disadvantages of Mode
• May not exist
• Not unique
• No arithmetic properties
• Least accurate
Relationship among Mean, Median, & Mode
• If a distribution is symmetrical and has a single peak, the mean, median and mode coincide.
• If a distribution is not symmetrical, that is, skewed to the left or to the right, the three measures differ.
[Figure: a positively skewed distribution (“skewed to the right”) and a negatively skewed distribution (“skewed to the left”), each with the positions of the mode, median and mean marked]
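A small numerical illustration of how the three measures separate under skew, using Python's standard `statistics` module. The helper name `center_measures` and all three data sets are made up for this sketch. The ordering shown is the typical pattern for skewed data, not a guarantee for every data set.

```python
import statistics


def center_measures(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (statistics.mean(values),
            statistics.median(values),
            statistics.mode(values))


# Hypothetical data sets, chosen only for illustration.
symmetric = [1, 2, 3, 3, 3, 4, 5]
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]           # long tail of large values
left_skewed = [1, 11, 15, 16, 17, 17, 18, 18, 18, 19]    # long tail of small values

for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    mean, median, mode = center_measures(data)
    print(f"{name}: mean={mean}, median={median}, mode={mode}")
```

In the right-skewed set the mean (5) is pulled toward the long tail and exceeds the median (3) and the mode (2); in the left-skewed set the order is reversed.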
Measures of Dispersion for Ungrouped Data
Range
• Measure of Dispersion
• Difference Between Largest & Smallest Observations
$$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$$
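The range takes one line to compute. The function name `data_range` and the values are hypothetical.

```python
def data_range(values):
    """Range: largest observation minus smallest observation."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


# Hypothetical observations.
print(data_range([12, 7, 3, 15, 8]))  # 12  (15 - 3)
```

Because only the two extreme observations enter the calculation, the range ignores how the remaining values are distributed.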
“VARIATION”
The Root Of All Process EVIL
What is the standard deviation?
• The SD says how far away numbers on a list are from their average.
• Most entries on the list will be somewhere around one SD away from the average. Very few will be more than two or three SDs away.
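The claim about entries lying within a few SDs of the average can be checked on a small list. This is a sketch on made-up data: the function name `share_within` is hypothetical, and the SD is computed over the whole list with `statistics.pstdev`, which is an assumption about which SD the slide means.

```python
import statistics


def share_within(values, k):
    """Fraction of entries lying within k SDs of the average."""
    center = statistics.mean(values)
    sd = statistics.pstdev(values)  # SD of the list treated as a whole
    inside = [v for v in values if abs(v - center) <= k * sd]
    return len(inside) / len(values)


# Hypothetical list with average 5 and SD 2.
entries = [2, 4, 4, 4, 5, 5, 7, 9]
print(share_within(entries, 1))  # 0.75  six of eight entries lie between 3 and 7
print(share_within(entries, 2))  # 1.0   every entry lies between 1 and 9
```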
Variance & Standard Deviation
• Measures of Dispersion
• Most Common Measures
• Consider How Data Are Distributed
• Show Variation About Mean ($\bar{X}$ or $\mu$)
[Figure: two distributions with the same mean but different standard deviations (SD)]
Sample Standard Deviation Formula (Computational Version)
$$s = \sqrt{\frac{\sum X^2 - n\bar{X}^2}{n - 1}}$$
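A sketch of the computational version, checked against `statistics.stdev` from the Python standard library, which also divides by n - 1. The function name `sample_sd_computational` and the data are assumptions for illustration.

```python
import math
import statistics


def sample_sd_computational(values):
    """Sample SD via the shortcut: sqrt((sum of X^2 - n * mean^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample SD needs at least two observations")
    mean = sum(values) / n
    sum_of_squares = sum(x * x for x in values)
    return math.sqrt((sum_of_squares - n * mean ** 2) / (n - 1))


# Hypothetical sample.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
shortcut = sample_sd_computational(sample)
reference = statistics.stdev(sample)
print(math.isclose(shortcut, reference))  # True
```

The shortcut is convenient for hand calculation. In floating-point arithmetic it can lose precision when the values are large relative to their spread, so library routines are usually the safer choice in code.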
Population Mean
$$\mu = \frac{\sum x}{N}$$
Population Standard Deviation
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
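The population formulas divide by N rather than n - 1. A minimal sketch, with hypothetical function names and data, treating the list as the entire population:

```python
import math


def population_mean(values):
    """mu: sum of all population values divided by the population size N."""
    return sum(values) / len(values)


def population_sd(values):
    """sigma: square root of the average squared deviation from mu."""
    mu = population_mean(values)
    squared_deviations = sum((x - mu) ** 2 for x in values)
    return math.sqrt(squared_deviations / len(values))


# Hypothetical population of five values.
population = [10, 12, 14, 16, 18]
print(population_mean(population))  # 14.0
print(population_sd(population))    # square root of 8, about 2.83
```

Applying the sample formula to the same five values would divide the squared deviations (40) by 4 instead of 5 and give a larger result, the square root of 10.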
Coefficient of Variation
• 1. Measure of Relative Dispersion
• 2. Always a %
• 3. Shows Variation Relative to Mean
• 4. Used to Compare 2 or More Groups
• 5. Formula (Sample)
$$CV = \frac{S}{\bar{X}} \cdot 100\%$$
Coefficient of Variation (Population and Sample)
• Formula (Population):
$$CV = \frac{\sigma}{\mu} (100)$$
• Formula (Sample):
$$CV = \frac{s}{\bar{x}} (100)$$
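A sketch of using the sample coefficient of variation to compare two groups. The function name `coefficient_of_variation` and both data sets are hypothetical; the two groups were chosen to have the same standard deviation but different means.

```python
import statistics


def coefficient_of_variation(values):
    """Sample CV: standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("the CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean * 100


# Hypothetical groups: same spread (SD = 2), different means (50 and 100).
group_a = [48, 50, 52]
group_b = [98, 100, 102]

print(coefficient_of_variation(group_a))  # roughly 4 (percent)
print(coefficient_of_variation(group_b))  # roughly 2 (percent)
```

Both groups have the same standard deviation, yet group A varies twice as much relative to its mean, which is the kind of comparison the CV is meant for.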
Summary of Variation Measures
Measure                          Equation                                        Description
Range                            $x_{\text{largest}} - x_{\text{smallest}}$      Total Spread
Interquartile Range              $Q_3 - Q_1$                                     Spread of Middle 50%
Standard Deviation (Sample)      $\sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$     Dispersion about Sample Mean
Standard Deviation (Population)  $\sqrt{\frac{\sum (x - \mu)^2}{N}}$             Dispersion about Population Mean
Variance (Sample)                $\frac{\sum (x - \bar{x})^2}{n - 1}$            Squared Dispersion about Sample Mean
Coeff. of Variation              $\frac{s}{\bar{x}} (100)$                       Relative Variation
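The interquartile range is the one measure in this summary without a worked formula elsewhere in the slides, so here is a hedged sketch. It uses `statistics.quantiles` from the Python standard library (Python 3.8 or later) with its default method. Quartile conventions differ between textbooks and software packages, so the result may not match a hand calculation done with another rule. The function name `interquartile_range` and the data are hypothetical.

```python
import statistics


def interquartile_range(values):
    """Q3 - Q1, the spread of the middle 50% of the data."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 and Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Hypothetical data.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print("Range:", max(data) - min(data))      # total spread
print("IQR:  ", interquartile_range(data))  # spread of the middle 50%
```

Unlike the range, the interquartile range does not change when only the largest or smallest observation becomes more extreme.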
Also known as the Empirical
Rule