Descriptive Statistics –

BUSINESS STATISTICS I
Descriptive Statistics & Data Collection
Descriptive Statistics – Graphic Guidelines
• Pie charts – nominal variables, e.g. ‘religion’; cross-sectional data
• Bar charts – nominal or interval variables, e.g. ‘religion’ or ‘margin debt’; time-series or cross-sectional data
• Line graphs – interval variables, e.g. margin debt; time-series data
• Histograms – interval variables, e.g. golf scores; cross-sectional data – depicts the SHAPE of a frequency distribution
  • Stem and leaf plot – quick and dirty histogram
  • Ogive – depicts a cumulative percentage frequency distribution
• Scatter diagram – two interval variables, e.g. margin debt vs. the market value
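To make the "quick and dirty histogram" and the ogive concrete, here is a minimal sketch in plain Python. The golf scores are made-up numbers for illustration only, and the function names `stem_and_leaf` and `ogive_points` are hypothetical helpers, not part of the course material.

```python
from collections import defaultdict

# Made-up golf scores, purely for illustration (not from the slides).
scores = [72, 75, 81, 68, 90, 77, 84, 79, 73, 88, 95, 70]

def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

def ogive_points(values, upper_limits):
    """Cumulative percentage of observations at or below each upper class limit."""
    n = len(values)
    return [(limit, 100 * sum(v <= limit for v in values) / n)
            for limit in upper_limits]

# Text version of the stem and leaf plot: one row per stem.
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")

# Points that would be joined by line segments to draw the ogive.
for limit, pct in ogive_points(scores, [69, 79, 89, 99]):
    print(f"<= {limit}: {pct:.1f}%")
```

The stem and leaf rows show the same shape a histogram with classes of width 10 would show, while keeping the individual values visible.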
Graphic Deception – some widely used methods
• Graphs without a scale on one axis
• Captions or titles intended to influence
• Reporting only absolute changes in value and not
percentage changes
• Changing the scale of the vertical axis with
breaks or truncations
• Changing the scale of the horizontal axis
• Changing the width as well as the height of bars
or pictogram figures
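One of the methods above, reporting only absolute changes, is easy to illustrate with arithmetic. The sketch below uses hypothetical figures and a hypothetical helper named `describe_change`; it shows how the same absolute change can be a large or a small percentage change depending on the starting value.

```python
def describe_change(old, new):
    """Return (absolute change, percentage change) when moving from old to new."""
    absolute = new - old
    percent = 100 * absolute / old
    return absolute, percent

# Hypothetical figures: the same absolute change on two different bases.
print(describe_change(50, 60))      # (10, 20.0)
print(describe_change(1000, 1010))  # (10, 1.0)
```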
Summary of data types and available graphic techniques
• Interval
  • Cross-sectional data
    • Histograms
    • Percentage histograms
    • Ogives
    • Stem and leaf plots
    • Box plots
  • Time-series data
    • Line charts
    • Bar charts
• Nominal
  • Pie charts
  • Bar charts
  • Complex pie or bar charts
Describing the frequency distribution for interval, cross-sectional data
• Shape
• Center
• Spread
Describing distributions
• SHAPE
  • Graphs
    • Histograms
    • Percentage histograms
    • Ogives
    • Stem and leaf plots
    • Box plots
  • Words
    • Symmetric, skewed, bell shaped, flat, peaked
Descriptive statistics –
• CENTER
• Quantitative measures
• Mean (arithmetic)
• Median
• Mode
• Geometric mean
• Mid-point of the range
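The measures of center listed above can be computed with Python's standard `statistics` module. This is a minimal sketch on a made-up sample; the data values are an assumption chosen only to keep the arithmetic easy to follow.

```python
import statistics

# Made-up sample, for illustration only.
data = [2, 3, 3, 5, 8, 9, 12]

mean = statistics.mean(data)             # arithmetic mean: sum divided by count
median = statistics.median(data)         # middle value of the sorted data
mode = statistics.mode(data)             # most frequent value
gmean = statistics.geometric_mean(data)  # n-th root of the product of the values
midrange = (min(data) + max(data)) / 2   # mid-point of the range

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("geometric mean:", round(gmean, 3))
print("mid-point of range:", midrange)
```

For this right-skewed sample the mean is larger than the median, and the mid-point of the range is pulled furthest toward the largest value.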
Descriptive statistics –
• Numeric Measures – cont’d.
• SPREAD (dispersion)
• Range
• Symmetric distributions
• Standard deviation
• Variance
• Coefficient of variation
• Skewed distributions
• Quartiles
• Min
• Max
• Interquartile range
• Percentiles
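The spread measures above can be sketched the same way. The sample is made up, and the quartile convention is an assumption: `statistics.quantiles` is called with `method="inclusive"`, which is only one of several conventions, so other software may report slightly different quartiles and percentiles.

```python
import statistics

# Made-up sample, for illustration only.
data = [2, 3, 3, 5, 8, 9, 12]

data_range = max(data) - min(data)
variance = statistics.variance(data)        # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)            # sample standard deviation
coef_var = std_dev / statistics.mean(data)  # coefficient of variation

# Quartiles and percentiles, using the "inclusive" interpolation convention.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                               # interquartile range
p90 = statistics.quantiles(data, n=100, method="inclusive")[89]  # 90th percentile

print("range:", data_range)
print("variance:", variance, "standard deviation:", round(std_dev, 3))
print("coefficient of variation:", round(coef_var, 3))
print("min, Q1, median, Q3, max:", min(data), q1, q2, q3, max(data))
print("interquartile range:", iqr, "90th percentile:", round(p90, 2))
```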
Z scores and t scores
• Measure distance from the mean in standard deviations
• E.g. T score for bone density – 1 to 2.5 standard deviations below the norm (mean) for a 23-year-old indicates osteopenia; 2.5 or more indicates osteoporosis
• (X – μ)/σ = z score (μ and σ are the population mean and standard deviation)
• (X – Xbar)/s = t score (Xbar and s are the sample mean and standard deviation)
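The two formulas above differ only in which mean and standard deviation they use. A minimal sketch, with hypothetical function names `z_score` and `t_score` and made-up numbers:

```python
import statistics

def z_score(x, mu, sigma):
    """Distance of x from a known population mean, in population standard deviations."""
    return (x - mu) / sigma

def t_score(x, sample):
    """Distance of x from the sample mean, in sample standard deviations."""
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)
    return (x - x_bar) / s

# Made-up numbers: a population with mean 100 and standard deviation 15.
print(z_score(130, 100, 15))

# Made-up sample: how far is the largest value from the sample mean?
sample = [2, 3, 3, 5, 8, 9, 12]
print(round(t_score(12, sample), 3))
```

A negative result means the observation lies below the mean, as in the bone density example.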
Empirical Rule
• For mound shaped distributions
• About 68% of observations are within one standard deviation of the
mean
• About 95% of observations are within two standard deviations of
the mean
• Almost all (99.7%) observations are within three standard
deviations of the mean
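The Empirical Rule can be checked on simulated mound-shaped data. This is a sketch under stated assumptions: the data are generated with `random.gauss`, and the mean of 100, the standard deviation of 15, the sample size and the seed are all arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Simulated mound-shaped data; mean 100 and standard deviation 15 are arbitrary.
sample = [random.gauss(100, 15) for _ in range(10_000)]

x_bar = statistics.mean(sample)
s = statistics.stdev(sample)

def share_within(values, center, spread, k):
    """Fraction of values lying within k * spread of the center."""
    return sum(abs(v - center) <= k * spread for v in values) / len(values)

shares = {k: share_within(sample, x_bar, s, k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```

The printed shares should land close to 68%, 95% and 99.7%, with small differences due to sampling variation.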
Chebysheff’s Rule
• For all distributions
• Let k be greater than or equal to 1
• At least 1 – (1/k²) of the observations are within k standard deviations of the mean
• Examples
  • k = 1: the bound is zero, so possibly no observations are within one standard deviation of the mean
  • k = 2: at least 3/4 of observations must be within two standard deviations of the mean
  • k = 3: at least 8/9 of observations must be within three standard deviations of the mean
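Because Chebysheff's Rule holds for any distribution, it can be checked against deliberately skewed data. The sketch below uses a made-up data set and a hypothetical helper named `chebysheff_lower_bound`; it treats the data as a whole population, so it uses `statistics.pstdev`.

```python
import statistics

def chebysheff_lower_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("k must be greater than or equal to 1")
    return 1 - 1 / k ** 2

# Made-up, strongly right-skewed data, treated here as a complete population.
skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10, 40]
mean = statistics.mean(skewed)
sd = statistics.pstdev(skewed)  # population standard deviation

results = {}
for k in (1, 2, 3):
    actual = sum(abs(x - mean) <= k * sd for x in skewed) / len(skewed)
    results[k] = (chebysheff_lower_bound(k), actual)
    print(f"k={k}: bound {results[k][0]:.3f}, actual share {actual:.3f}")
```

The actual share is never below the bound, but the bound is usually far from tight, which is why the Empirical Rule is preferred when the distribution is mound shaped.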
Sampling
• ‘Scientific sampling’ is random sampling
  • Simple random samples
  • Systematic random samples
  • Stratified random samples
  • Random cluster samples
• What?
• Why?
• How?
What is random sampling?
• Simple random sample – Every sample with the same number of observations has the same probability of being chosen
• Systematic random sample – Choose the first sample member randomly, then take every k-th member after it
• Stratified random sample – Choose simple random samples from each of the mutually exclusive strata of a population
• Cluster sample – Choose a simple random sample of groups or clusters
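The four sampling designs can be sketched with Python's `random` module. All function names here are hypothetical, the population of 100 numbered members is made up, and two details are assumptions the slides do not spell out: the systematic sample uses a fixed step of `len(population) // n`, and the cluster sample keeps every member of each chosen cluster.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Random starting point, then every k-th member, with k = len(population) // n."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample from each stratum (dict: stratum name -> members)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Simple random sample of whole clusters (dict: cluster name -> members)."""
    chosen = rng.sample(list(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

rng = random.Random(42)           # seeded generator so the run is repeatable
population = list(range(1, 101))  # made-up population: members numbered 1..100
strata = {"low": population[:50], "high": population[50:]}
clusters = {c: population[c * 10:(c + 1) * 10] for c in range(10)}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(strata, 5, rng)
clustered = cluster_sample(clusters, 2, rng)

print("simple:", sorted(srs))
print("systematic:", systematic)
print("stratified:", stratified)
print("cluster:", clustered)
```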
Why sample randomly?
• To make valid statistical inferences to a population
• Conclusions from a non-probability sample can be
questioned
• Conclusions from a self-selected sample are SLOP (a self-selected opinion poll)
How can samples be randomly chosen?
• Random number generators (software)
• Ping pong balls in a hopper
• Other mechanical devices
• Random number tables
• Slips of paper in a ‘hat’
• With or without replacement
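With a software random number generator, the choice between sampling with and without replacement comes down to which function is called. A minimal sketch, assuming ten numbered ping pong balls and an arbitrary seed:

```python
import random

rng = random.Random(7)       # seeded generator so the run is repeatable
balls = list(range(1, 11))   # made-up hopper of ten numbered ping pong balls

# Without replacement: a drawn ball stays out, so no number can repeat.
without_replacement = rng.sample(balls, 5)

# With replacement: each ball goes back before the next draw, so repeats are possible.
with_replacement = rng.choices(balls, k=5)

print("without replacement:", without_replacement)
print("with replacement:", with_replacement)
```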