Session 6 - Duke University's Fuqua School of Business
Sampling, Causal Research
Market Intelligence
Julie Edell Britton
Session 6
September 5, 2009
1
Today’s Agenda
Announcements
Sampling
Sampling Error
Milan Food Case
WSJ/Harris Interactive Survey
Causal Research – Experiments
Pre-experimental Designs
True Experiments
Factorial Designs and Interaction Effects
2
Announcements
Lots to do between now and next class:
Midterm Exam – take online between Sunday
9/6 and Wednesday 9/16 at 8pm
Open book, open notes – 3 hours from
opening exam to submitting it.
WEMBA A case – with your team – due on
Sunday, 9/20 by 8pm
WEMBA B case – with your team – due on
Thursday, 9/24 by 8 pm
Prepare Entitle Direct case – but no slides
3
Announcements
WEMBA A case – What to submit:
Body – 1 page, single-spaced Executive Summary describing the
key decisions to be made and the information needed to make those
decisions
Appendix A: Proposed sampling plan and Survey Mode
Appendix B: Proposed draft Questionnaire
Appendix C: Key dummy tables and how to turn the data into
action
Everyone should come to your team meeting with question ideas & draft items
Heed point distribution on grading to guide your time allocation across
subtasks
Once WEMBA A is submitted, I will send WEMBA B to all
members of the team.
WEMBA B - SUBMIT 5 slides -- 1 slide with your dummy tables &
action standards for each of Dan Nagy’s 5 questions
4
Sampling Process
Define population
Elements, extent, time
Identify a good sampling frame
costly to create for yourself
Determine sample size
budget, accuracy needs
Select sampling procedure
way to select elements from the frame
Physically select sample
5
Probability Samples
Each element in population has known, nonzero chance of
being sampled
Simple random sample: all elements have the same chance of
being sampled (n/N for a sample of n from a population of N; e.g., cold caller)
Systematic sample: start with randomly selected element
and take every nth element (e.g., teams in this class)
Cluster sampling: pick groups of elements (city blocks,
census tracts, schools) then randomly select n elements from
each cluster
Stratified sampling: divide frame into strata according to a
characteristic (e.g., gender), then sample randomly from
each stratum
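The four procedures above can be sketched in a few lines of Python. This is only an illustration: the frame of 100 element IDs, the even/odd strata, and the ten "blocks" are hypothetical stand-ins, and the function names are not from the course materials.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n elements without replacement; every element has chance n/N."""
    return rng.sample(frame, n)

def systematic_sample(frame, k, rng):
    """Random start among the first k positions, then every k-th element."""
    start = rng.randrange(k)
    return frame[start::k]

def stratified_sample(frame, stratum_of, fraction, rng):
    """Group the frame by stratum, then sample the same fraction from each."""
    strata = {}
    for element in frame:
        strata.setdefault(stratum_of(element), []).append(element)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(fraction * len(members))))
    return sample

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Randomly pick clusters, then randomly pick elements within each."""
    chosen = rng.sample(list(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(rng.sample(clusters[name], n_per_cluster))
    return sample

rng = random.Random(0)
frame = list(range(100))                      # hypothetical frame of element IDs
srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
strat = stratified_sample(frame, lambda e: e % 2, 0.1, rng)    # hypothetical strata
clusters = {b: frame[b * 10:(b + 1) * 10] for b in range(10)}  # hypothetical blocks
clus = cluster_sample(clusters, 3, 4, rng)
```

Note that the stratified draw guarantees the even/odd mix matches the frame, while the simple random draw only matches it on average.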
6
Complex Sampling Procedures
Simple random sampling almost never
used in practice
Stratified Sampling -- Lowers error
Cluster Sampling -- Lowers cost of
getting frames and of data collection
7
Stratified Random Sample
Have frames sorted on some stratification
variable believed to influence the variable
you are estimating.
Lower variance within each subgroup than
across population in general
By ensuring that each subgroup is
represented in right mix, extreme overall
means less likely -- i.e., smaller std. error.
8
Steps for Stratified Random Sample
Divide Population into mutually exclusive and
exhaustive categories.
Decide what sampling fraction f = n / N to
use.
Draw an independent simple random sample
of size f * N(stratum) from each stratum.
Compute stratum mean for each
Estimate overall pop mean as weighted
average of stratum means
Estimate SE as weighted combo of SEs in
each
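A minimal sketch of the last three steps, assuming the usual textbook combination rule (stratum weights N_h / N, with squared weighted SEs added across independent strata) and ignoring any finite population correction. The stratum sizes and sample values are hypothetical.

```python
import math
import statistics

def stratified_estimate(strata):
    """strata: list of (stratum population size, sample values).

    Returns (estimated population mean, estimated standard error).
    """
    total = sum(size for size, _ in strata)
    mean = 0.0
    variance = 0.0
    for size, values in strata:
        weight = size / total                    # stratum share of the population
        mean += weight * statistics.mean(values)
        se_h = statistics.stdev(values) / math.sqrt(len(values))
        variance += (weight * se_h) ** 2         # independent strata: variances add
    return mean, math.sqrt(variance)

# Hypothetical example: two strata of 300 and 200 households
strata = [
    (300, [30.0, 32.0, 34.0, 36.0]),
    (200, [50.0, 54.0, 58.0, 62.0]),
]
est_mean, est_se = stratified_estimate(strata)
```

Because each stratum's SE depends only on the variation inside that stratum, the between-stratum gap (33 vs. 56 here) does not inflate the overall SE.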
9
Cluster Sampling
Typically “clusters” are geographic territories.
Start with list of clusters, randomly select
subset, and survey only subset.
Cheaper travel cost, cost per interview
Loss of effective sample size if people in the same
cluster are more alike than people in different clusters
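The loss of effective sample size can be put in rough numbers with the standard design-effect approximation, deff = 1 + (m - 1) * rho, where m is the number of interviews per cluster and rho is the intra-cluster correlation. This formula is not from the slides, and the inputs below are hypothetical.

```python
def effective_sample_size(n, cluster_size, icc):
    """Approximate effective n under cluster sampling.

    icc is the intra-cluster correlation: 0 means people in a cluster are
    no more alike than strangers; larger values mean more similarity.
    """
    design_effect = 1 + (cluster_size - 1) * icc
    return n / design_effect

# Hypothetical survey: 400 interviews, 20 per cluster
no_similarity = effective_sample_size(400, 20, 0.0)
some_similarity = effective_sample_size(400, 20, 0.05)
```

Even a modest correlation of 0.05 cuts the 400 interviews to roughly half that many independent observations.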
10
Non-Probability Samples
Convenience
Judgment
Pick especially informative elements
Quota
Sample matches population on key control
characteristics correlated with behavior under
study.
Match only really matters for control variables
related to thing you are trying to estimate.
11
Sampling Errors vs. Biases
Sampling Error: variation in estimates of a
population parameter (e.g., awareness of X) due
simply to variations among different random
samples chosen by following the same basic
procedure.
Sample Biases: Expected value of estimated
population parameter differs from true value
because of unwitting under-sampling or
oversampling of certain types of sampling
elements
Availability biases (1-900 polls, Web surveys)
Frame errors (Literary Digest)
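A small simulation can make the distinction concrete. It is a sketch under invented assumptions: a hypothetical population where 38% are aware of X, and a hypothetical flawed frame that omits the less-aware group entirely.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 1 = aware of X, 0 = not aware.
group_a = [1] * 3000 + [0] * 3000     # 50% aware
group_b = [1] * 800 + [0] * 3200      # 20% aware
population = group_a + group_b
true_rate = statistics.mean(population)

# Sampling error: repeated random samples scatter around the true value.
random_estimates = [
    statistics.mean(rng.sample(population, 200)) for _ in range(500)
]

# Sample bias: the frame misses group B, so every sample is off the same way.
biased_estimates = [
    statistics.mean(rng.sample(group_a, 200)) for _ in range(500)
]

avg_random = statistics.mean(random_estimates)
avg_biased = statistics.mean(biased_estimates)
```

Individual random samples miss the true rate in both directions, but their average sits close to it. The biased frame's average stays near 50% no matter how many samples are drawn, so a larger n does not fix it.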
12
Milan Foods
Purpose is to illustrate things about sampling
If you had the population data, you would use it rather
than sample from it
13
Distribution from the Population
14
Precision in Simple Random
Statistics review
Distribution of original scores
Mean = Y-bar
Variance -- Average squared deviation from
mean
Standard Deviation -- Square root of variance
Distribution of means of samples of size n
Std Error = SD of Y-bar distribution = (estimated pop. SD) / sqrt(n)
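The review quantities can be computed directly. The five scores are hypothetical, and the variance follows the slide's wording (average squared deviation, dividing by n); when estimating from a sample, dividing by n - 1 is the more common convention.

```python
import math

scores = [20.0, 35.0, 40.0, 45.0, 60.0]     # hypothetical original scores
n = len(scores)

mean = sum(scores) / n                                   # Y-bar
variance = sum((y - mean) ** 2 for y in scores) / n      # average squared deviation
sd = math.sqrt(variance)                                 # standard deviation
std_error = sd / math.sqrt(n)                            # SD of the Y-bar distribution
```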
15
Sampling Distribution of Means
of Samples of Size n < N
Milan Foods (FoodExp$)
Population Mean = $43.30; SD = 20.91
What about distribution of sample means for n <
N? If sample size = 100,
Std Error = SD of means of 100-case samples in
pop. = pop SD/sqrt(100) = $20.91/10 = $2.09
About 95% of all sample means of sample size 100 are
within $43.30 +/- (2*2.09), with 2 approximating 1.96: $39.12 to $47.48.
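The claim can be checked by simulation. The sketch below assumes a normally distributed population with the slide's mean and SD, which is a simplification (the actual FoodExp$ distribution is not reproduced here); the seed and the 2,000 replications are arbitrary choices.

```python
import random
import statistics

rng = random.Random(2009)
POP_MEAN, POP_SD, N_SAMPLE = 43.30, 20.91, 100
theoretical_se = POP_SD / N_SAMPLE ** 0.5

# Draw 2,000 samples of 100 cases each and record every sample mean.
sample_means = [
    statistics.fmean([rng.gauss(POP_MEAN, POP_SD) for _ in range(N_SAMPLE)])
    for _ in range(2000)
]

# The SD of those means should land near pop SD / sqrt(n).
sd_of_means = statistics.stdev(sample_means)

# Share of sample means within 1.96 standard errors of the population mean.
share_within = sum(
    abs(m - POP_MEAN) <= 1.96 * theoretical_se for m in sample_means
) / len(sample_means)
```

The endpoints $39.12 and $47.48 on the slide correspond to a multiplier of 2; with exactly 1.96 the range is slightly narrower, about $39.20 to $47.40.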
16
In Milan Foods
Simple Random,
SE (for n = 25) = 20.91/sqrt(25) = 4.18
Simple Random,
SE (for n = 100) = 20.91/10 = 2.09
(quadrupling n halves the SE)
Stratified on I (Any Kids 6-18),
SE (for n=100) = 19.13/10=1.91
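The three standard errors can be reproduced as below. The last line is an added illustration, not from the slides: it asks how large a simple random sample would need to be to match the stratified SE at n = 100, treating 19.13 as the within-stratum SD.

```python
import math

def standard_error(sd, n):
    """SE of a sample mean: SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)

POP_SD = 20.91            # population SD from the slides
WITHIN_STRATUM_SD = 19.13 # SD used for the stratified case on the slides

se_25 = standard_error(POP_SD, 25)
se_100 = standard_error(POP_SD, 100)
se_strat_100 = standard_error(WITHIN_STRATUM_SD, 100)

# Simple random sample size whose SE equals the stratified SE at n = 100
equivalent_n = 100 * (POP_SD / WITHIN_STRATUM_SD) ** 2
```

Under these figures, stratifying on this variable is worth roughly 20 extra simple random interviews per 100.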
17
Precision of Estimate of Pop.
Mean From Sample Mean
In practice, we don’t know pop. SD so we treat
sample SD as our best guess
n = 100, sample mean = $42.41, SD = $18.34
Est. Std. Error = $18.34 / sqrt(100) = $1.834
95% CI: $42.41 +/- (2*1.834), with 2 approximating 1.96 = ($38.74 to
$46.08)
18
Same Thing, n = 25
n = 25: sample mean = $45.10, SD = 18.13
Treat as our best guesses of pop.
parameters
Est. Std. Error = $18.13/sqrt(25) = $3.626
95% CI: $45.10 +/- (2*3.626), with 2 approximating 1.96 = ($37.85 to
$52.35)
Note the comparison of n=100 to n = 25
n=100 ($38.74 to $46.08)
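The two intervals can be recomputed from the sample statistics. The multiplier is left as a parameter because the endpoints quoted in the slides correspond to 2 rather than exactly 1.96; the function name is only illustrative.

```python
import math

def confidence_interval(sample_mean, sample_sd, n, z=1.96):
    """Approximate 95% CI for the population mean from one sample."""
    se = sample_sd / math.sqrt(n)     # estimated standard error
    return sample_mean - z * se, sample_mean + z * se

ci_100 = confidence_interval(42.41, 18.34, 100, z=2.0)
ci_25 = confidence_interval(45.10, 18.13, 25, z=2.0)

width_100 = ci_100[1] - ci_100[0]
width_25 = ci_25[1] - ci_25[0]
```

The n = 25 interval is about twice as wide as the n = 100 interval, as expected when the sample is a quarter of the size and the sample SDs are similar.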
19
[Table not recoverable from the transcript: per-student sample means (rows labeled by student initials) with Sum, Mean, and SE summary rows, alongside an "Actual n=500" entry and the population Mean = 43.3004, SD = 20.90868.]
20
21
22
23
Wall Street Journal
Sampling Error
sample size per school too small for meaningful
comparisons (n=20 to qualify). No evidence that those
ranked 6th -50th differed significantly from each other in
ratings.
Sample Bias
Sample of recruiters open to manipulation
Let respondents pick which of the many schools where they
recruited they would focus on for ratings. Leads to further
selection bias, like a 1-900 call-in poll.
Sampling method underweights views of large recruiters
who visit many campuses
Response rate not reported, but appears to be 7%.
24
Key Sampling Takeaways
Probability v Non-Probability Samples
For Probability Samples, Standard Error is
the measure of precision
Precision increases with square root of n (sample size)
More precision with Stratified if and only if
stratifier is correlated with thing estimated
Same principle for Quota samples. Quotas
only help if correlated with variable
25
Experiments
Best way to test causal hypotheses
Independent Variable = hypothesized cause
Manipulated by the researcher/manager
Example: Send a color or black and white
brochure
Dependent Variable = effect
Measured (observed) by researcher/manager
Example: New accounts secured
Random assignment of subjects to conditions
Example: receive color or receive b&w brochure
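Random assignment for the brochure example might look like the sketch below. The list of 20 prospects and the condition labels are hypothetical; the point is only that chance, not the manager, decides who gets which brochure.

```python
import random

def randomly_assign(subjects, conditions, rng):
    """Shuffle subjects, then deal them out to conditions in turn."""
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {condition: [] for condition in conditions}
    for i, subject in enumerate(shuffled):
        assignment[conditions[i % len(conditions)]].append(subject)
    return assignment

rng = random.Random(7)
prospects = [f"prospect_{i}" for i in range(20)]     # hypothetical mailing list
groups = randomly_assign(prospects, ["color", "b&w"], rng)
```

Dealing from a shuffled list keeps the groups equal in size, while any prospect characteristic (size, region, prior interest) is balanced across conditions only in expectation.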
26
Pre-experimental Designs
One group, after-only design
One group, before & after design
Unmatched control group design
Matched control group design
All have threats to validity not present in a
true experiment with random assignment
to treatments.
27
Validity
The strength of our conclusions
i.e., Is what we conclude from our experiment correct?
Threats to Validity
History: an event occurring around same time as treatment
that has nothing to do with treatment
Maturation: people change pre to post
Testing: pretest causes change in response
Instrumentation: measures changed meaning
Statistical Regression: Original measure was due to a
random peak (SI Cover Curse) or valley
28
One Group After Only
We propose a change in MBA Core, to move Finance
and Marketing up to Term 2 from their position in Term
3. One major motive for this is that students interview
for internships in Term 3, and if they want jobs in
marketing or finance, they have no background at the
time of the interview. Thus, we perceive that we are at
a competitive disadvantage because those courses
are in Term 3.
EG
X
O (Mean = 50%)
X = Marketing Term 3, O = Did/Did Not Get Desired Internship
Key: Lacks a baseline, so worthless.
29
One Group Pre-Post Design
Breckenridge Brewery wants to assess the efficacy of
TV ad spots for its new amber ale.
Time 1 (O1): Duke undergrads are brought to the lab
and asked to rate their frequency of buying a series of
brands in various categories over the past week. The
list includes Breckenridge Amber Ale. Mean = 0.2
packs per week.
Time 2 (X): Two weeks of ads for Breckenridge Ale.
Time 3 (O2): Same Duke undergrads brought back to
lab to rate frequency of buying same set of brands over
past week. Mean = 1.3 packs per week.
1.3 - 0.2 = 1.1. We attribute an increase of 1.1 packs
per week to the ad.
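The arithmetic of this attribution, and why it is fragile, can be sketched as follows. The control-group figures are invented purely for illustration (say, a comparable group that saw no ads but lived through the same two weeks); nothing like them appears in the case.

```python
def pre_post_change(before, after):
    """One-group estimate: everything that changed is credited to the ad."""
    return after - before

def control_adjusted_change(treat_before, treat_after, ctrl_before, ctrl_after):
    """Subtract the change seen in a group that did not get the treatment."""
    return (treat_after - treat_before) - (ctrl_after - ctrl_before)

naive_effect = pre_post_change(0.2, 1.3)

# Hypothetical control group: purchases rose from 0.2 to 0.9 without any ads,
# for example because of history, maturation, or testing effects.
adjusted_effect = control_adjusted_change(0.2, 1.3, 0.2, 0.9)
```

With a single group there is no way to tell how much of the 1.1 would have happened anyway; under the invented control figures, only 0.4 would remain attributable to the ad.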
30
Takeaways for the Day
Probability v Non-Probability Samples
More precision with Stratified if and only if stratifier is
correlated with thing estimated
Threats to validity in pre-experimental and quasi-experimental designs
31