Transcript Sample
Business Statistics:
A Decision-Making Approach
7th Edition
Chapter 1
The Where, Why, and How of
Data Collection
What is Statistics?
Statistics is the development and application of methods
to collect, analyze and interpret data.
Statistics is a discipline which is concerned with:
designing experiments and other data collection
summarizing information to aid understanding
drawing conclusions from data
estimating the present or predicting the future.
Populations and Samples
A Population is the set of all items or individuals
of interest
Examples:
All likely voters in the next election
All parts produced today
All sales receipts for November
A Sample is a subset of the population
Examples:
1000 voters selected at random for interview
A few parts selected for destructive testing
Every 100th receipt selected for audit
Population vs. Sample
[Figure: the population is shown as the items a through z; the sample is a subset of those items: b, c, g, i, n, o, r, u, y.]
Why Sample?
Less time consuming than a census
Less costly to administer than a census
It is possible to obtain statistical results of a
sufficiently high precision based on samples.
Sampling Techniques
Nonstatistical Sampling
Convenience
Judgment
Not interested in……
Statistical Sampling
(Simple) Random
Systematic
Stratified
Cluster
Statistical Sampling
Items of the sample are chosen based on
known or calculable probabilities
Statistical Sampling
(Probability Sampling)
(Simple) Random
Stratified
Systematic
Cluster
Random Sampling
Every possible sample of a given size has an equal
chance of being selected
Selection may be with replacement or without
replacement
The sample can be obtained using a table of random
numbers or computer random number generator
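The two selection modes above can be sketched with a computer random number generator. This is a minimal illustration, not a prescribed procedure: the function name `simple_random_sample`, the seed, and the frame of 500 receipt numbers are all assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, with_replacement=False, seed=None):
    """Draw n items so that every possible sample of size n is equally likely."""
    rng = random.Random(seed)  # seeded only to make the example repeatable
    if with_replacement:
        # Each draw is independent, so the same item may be picked again.
        return [rng.choice(frame) for _ in range(n)]
    # Without replacement: an item can appear at most once.
    return rng.sample(frame, n)

# Hypothetical sampling frame: receipt numbers 1..500
receipts = list(range(1, 501))

without = simple_random_sample(receipts, 10, seed=1)
with_repl = simple_random_sample(receipts, 10, with_replacement=True, seed=1)
print("without replacement:", without)
print("with replacement:   ", with_repl)
```

Sampling without replacement guarantees ten distinct receipts; sampling with replacement may repeat one.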
Stratified Random Sampling
Watch Video Clip: Samples and Surveys (#14)
Used more often than “Systematic” and “Cluster” sampling
Stratified random sampling divides the population (for example, a
company's employees) into groups with similar characteristics
that might affect preferences, such as marital status or
age. A simple random sample is then taken
from each group.
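A rough sketch of the employee example follows. The employee records, the 10% sampling fraction, and the helper name `stratified_sample` are assumptions made for illustration; taking the same fraction from every stratum is one common choice, not the only one.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Group items into strata, then take a simple random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Same sampling fraction in every stratum, at least one item each.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical employees: 60 married, 30 single
employees = [
    {"id": i, "marital_status": "married" if i % 3 else "single"}
    for i in range(1, 91)
]

sample = stratified_sample(employees, lambda e: e["marital_status"], 0.10, seed=7)
married = sum(1 for e in sample if e["marital_status"] == "married")
single = len(sample) - married
print(married, "married and", single, "single employees sampled")
```

Because each group is sampled separately, both marital-status groups are guaranteed to appear in the sample in proportion to their size.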
Systematic Random Sampling
Decide on sample size: n (sample)
Divide frame of N (population) individuals into groups of
k individuals: k=N/n
Randomly select one individual from the 1st group
Select every kth individual thereafter
Example: N = 64, n = 8, k = N/n = 8 (the first group is the first 8 individuals)
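The steps above can be sketched as follows, using N = 64 and n = 8 from the slide. The function name `systematic_sample` and the seed are assumptions, and the sketch assumes N is an exact multiple of n.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start in the first group, then every k-th item after it."""
    rng = random.Random(seed)
    k = len(frame) // n        # k = N / n (assumes N is a multiple of n)
    start = rng.randrange(k)   # random position within the first group
    return [frame[start + j * k] for j in range(n)]

frame = list(range(1, 65))     # N = 64 individuals, numbered 1..64
sample = systematic_sample(frame, 8, seed=0)
print(sample)
```

Only the starting point is random; once it is chosen, the remaining seven selections are fixed at intervals of k = 8.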
Cluster Sampling
Divide population into several “clusters,” each
representative of the population
Select a simple random sample of clusters
All items in the selected clusters can be used, or items can be
chosen from a cluster using another probability sampling
technique
[Figure: a population divided into 16 clusters, with the randomly selected clusters forming the sample.]
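As a minimal sketch of the procedure, the code below builds 16 clusters as in the figure, selects some of them by simple random sampling, and uses every item in the selected clusters. The cluster contents, the choice of 4 clusters, and the name `cluster_sample` are assumptions made for the example.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly select whole clusters and use every item inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    items = [item for c in chosen for item in clusters[c]]
    return chosen, items

# Hypothetical population: 16 clusters of 5 items each
clusters = {c: [f"cluster{c}-item{i}" for i in range(1, 6)] for c in range(1, 17)}

chosen, items = cluster_sample(clusters, 4, seed=0)
print("selected clusters:", chosen)
print("items in sample:", len(items))
```

To sample within the selected clusters instead of using every item, the list comprehension could be replaced by another probability sampling step applied to each chosen cluster.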
Two Basic Divisions of Statistics
Descriptive statistics
Descriptive statistics are numbers that are used to summarize
and describe data.
Examples:
Average salary of various occupations
Median house price in Bakersfield, CA
Descriptive statistics do not infer the properties of the population
from which the sample was drawn; they do not involve generalization.
Descriptive Statistics
Collect data
e.g., Survey, Observation,
Experiments
Present data
e.g., Charts and graphs
Characterize data
e.g., Sample mean = $\bar{x} = \frac{\sum x_i}{n}$
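As a small worked illustration of characterizing data, the sample mean is the sum of the observations divided by their count. The salary figures below are hypothetical values chosen for the example, not data from the chapter.

```python
from statistics import median

# Hypothetical sample of five annual salaries (in dollars)
salaries = [42_000, 48_500, 51_000, 55_250, 120_000]

n = len(salaries)
x_bar = sum(salaries) / n          # sample mean: sum of x_i divided by n
middle = median(salaries)          # middle value of the sorted data

print("sample mean:", x_bar)       # 63350.0
print("median:", middle)           # 51000
```

The single large salary pulls the mean well above the median, which is one reason both summaries are often reported together.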
Inferential Statistics
You have been hired by the National Election
Commission to examine how the American people feel
about the fairness of the voting procedures in the U.S.
Who will you ask?
Ask every single American
Ask a small, randomly selected group (sample) of Americans
and then draw inferences about the entire country from their
responses.
Inferential Statistics
are used to draw inferences about a population from
a sample (two main methods: estimation and
hypothesis testing).
Sample selection is a “critical” matter:
Not only from a particular state
Not only from a particular party
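Estimation, one of the two methods named above, can be sketched as follows: a random sample of responses is used to estimate the share of the whole population that holds an opinion. Everything specific here is an assumption for illustration: the simulated population in which 60% answer "fair", the sample size of 1000, and the normal-approximation 95% interval (z = 1.96), which is one common way to express the uncertainty.

```python
import math
import random

def estimate_proportion(responses, z=1.96):
    """Estimate a population proportion from 0/1 sample responses."""
    n = len(responses)
    p_hat = sum(responses) / n                        # sample proportion
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)   # normal approximation
    return p_hat, (p_hat - margin, p_hat + margin)

# Hypothetical population: 1 = "procedures are fair", 0 = "not fair"
population = [1] * 60_000 + [0] * 40_000

rng = random.Random(42)
sample = rng.sample(population, 1000)   # simple random sample, not one state or party

p_hat, (low, high) = estimate_proportion(sample)
print(f"estimate: {p_hat:.3f}, approximate 95% interval: ({low:.3f}, {high:.3f})")
```

The interval is only meaningful when the sample was selected at random from the whole population, which is why sample selection matters so much.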
Tools for Collecting Data
Data Collection Methods
Domino's Pizza example from What is Statistics? (#1)
Also Watch Video Clip: Samples and Surveys (#14)
Experiments
Telephone surveys
Written questionnaires
Direct observation and personal interview
Survey Design Steps
Define the issue
What are the purpose and objectives of the survey?
How will the survey be administered?
(e.g. phone, email, face to face)
Define the population of interest
Develop survey questions
Make questions clear and unambiguous
Use universally-accepted definitions
Limit the number of questions
Survey Design Steps
(continued)
Pre-test the survey
Pilot test with a small group of participants
Assess clarity and length
Determine the sample size and sampling method
Select sample and administer the survey
Types of Questions
Closed-end Questions
Select from a short list of defined choices
Example: Major: __business __liberal arts
__science __other
Open-end Questions
Respondents are free to respond with any value, words, or
statement
Example: What did you like best about this course?
Demographic Questions
Questions about the respondents’ personal characteristics
Example: Gender: __Female __ Male
Observations and Interviews
Observations
Data collected is observed and recorded based on
what takes place
Very subjective
Example: Observe reactions of customers to a new
store layout
Interviews
Can be structured – fixed set of questions
Can use a variety of questions
Requires more time from the researcher
Data Collection Pitfalls
Interviewer bias
Nonresponse bias
Selection bias
Observer bias
Measurement error
Internal/External validity
The objective is to collect accurate and reliable data!
Data (variable) Types
Data
Qualitative (Categorical)
Examples: Marital Status, Political Party, Eye Color (Defined categories)
Quantitative (Numerical)
Discrete
Examples: Number of Children, Defects per hour (Counted items)
Continuous
Examples: Weight, Voltage (Measured characteristics)
Qualitative vs. Quantitative Variables (Data)
Qualitative variables (data) take on values that are
names or labels.
Quantitative variables are numerical. They represent a
measurable quantity.
Quantitative variables can be further classified
as discrete or continuous: If a variable can take on any value
between its minimum value and its maximum value, it is called a
continuous variable; otherwise, it is called a discrete variable.
Video clip (easy and simple): Introduction to Variables
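The classification above can be roughly sketched in code. The function below is only a heuristic that uses Python value types as a stand-in for the definitions: text labels are treated as qualitative, whole-number counts as discrete, and other numbers as continuous. The name `classify_variable` and the sample values are assumptions; numeric codes such as ID numbers would be misclassified by this shortcut, since they are labels rather than quantities.

```python
def classify_variable(values):
    """Heuristically label a variable from the Python types of its values."""
    if all(isinstance(v, str) for v in values):
        return "qualitative"                 # names or labels
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "quantitative, discrete"      # counted items
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "quantitative, continuous"    # measured characteristics
    raise TypeError("mixed or unsupported value types")

# Hypothetical observations for three variables
print(classify_variable(["single", "married", "single"]))  # qualitative
print(classify_variable([0, 2, 3, 1]))                     # quantitative, discrete
print(classify_variable([61.5, 70.2, 58.9]))               # quantitative, continuous
```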
Data Measurement Levels
(Please see the Video Clip: Scales of Measurement)
Ratio/Interval Data: Measurements (Highest Level, Complete Analysis)
Ordinal Data: Rankings, Ordered Categories (Higher Level, Mid-level Analysis)
Nominal Data: Categorical Codes, ID Numbers, Category Names (Lowest Level, Basic Analysis)
Data Types
Time Series Data
Ordered data values observed over time
Cross Section Data
Data values observed at a fixed point in time
Data Types
Sales (in $1000’s)
             2003   2004   2005   2006
Atlanta       435    460    475    490
Boston        320    345    375    395
Cleveland     405    390    410    395
Denver        260    270    285    280
Time Series Data: one row (a single city's sales across the years)
Cross Section Data: one column (all cities' sales in a single year)
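The distinction can be illustrated with the sales figures from this slide: reading one city across the years gives time series data, and reading all cities in one year gives cross section data. The variable names and the choice of Boston and 2005 are assumptions made for the example.

```python
# Sales in $1000's, taken from the slide
years = [2003, 2004, 2005, 2006]
sales = {
    "Atlanta":   [435, 460, 475, 490],
    "Boston":    [320, 345, 375, 395],
    "Cleveland": [405, 390, 410, 395],
    "Denver":    [260, 270, 285, 280],
}

# Time series: one city's values ordered over time
boston_series = dict(zip(years, sales["Boston"]))

# Cross section: every city's value at one fixed point in time
col = years.index(2005)
cross_2005 = {city: values[col] for city, values in sales.items()}

print("Boston time series:", boston_series)
print("2005 cross section:", cross_2005)
```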