Modeling Consumer Behavior


Data Analysis:
Review and Practical
Application using SPSS
Data of Interest

National Insurance Company
– 1000 questionnaires sent
– 285 respondents

Questionnaire Presentation
– Copy given in class
Coding

Coding broadly refers to the set of all tasks
associated with transforming edited responses into
a form that is ready for analysis
 Steps
– Transforming responses to each question into a set of
meaningful categories
– Assigning numerical codes to the categories
– Creating a data set suitable for computer analysis
Transforming Responses into
Meaningful Categories

A structured question is pre-categorized
 Responses to a nonstructured or open-ended
question need to be grouped into a meaningful
and manageable set of categories
Q1: In this questionnaire, how many questions are not pre-categorized?
Missing-Value Category

A missing value can stem from
– A respondent's refusal to answer a question
– An interviewer's failure to ask a question or
record an answer or a "don't know" that does
not seem legitimate

Best way to treat missing value responses
– Sound questionnaire design
– Tight control over fieldwork
Assigning Numerical Codes

Assign appropriate numerical codes to
responses that are not already in quantified
form
 When assigning numerical codes, the researcher
should choose codes that facilitate computer
manipulation and analysis of the responses
Multiple Response Question –
Rank Order Question

Please rank the following Insurance companies by
placing a 1 beside the company you think is best
overall, a 2 beside the company you think is
second best, and so on.
__________Progressive
__________All State
__________National

Q2: How would you code the previous question if it were
added to the questionnaire?
This question requires as many variables (and columns) as there are
objects to be ranked: 3 separate variables are needed
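The coding rule above can be sketched in Python. This is only an illustration: the variable names (`rank_progressive` and so on), the helper `code_rank_question`, and the sample respondent are assumptions, not part of the questionnaire.

```python
# The three objects to be ranked, as listed in the question.
COMPANIES = ["Progressive", "All State", "National"]

def code_rank_question(ranks_by_company):
    """Return one coded variable per company; a missing rank is coded as None."""
    return {
        "rank_" + name.lower().replace(" ", "_"): ranks_by_company.get(name)
        for name in COMPANIES
    }

# Hypothetical respondent who ranked National first.
respondent = {"Progressive": 2, "All State": 3, "National": 1}
coded = code_rank_question(respondent)
print(coded)
```

Each respondent therefore fills three columns of the data set, one per company.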
Creating a Data Set


Organized collection of data records
Each sample unit within the data set is called a
Case or Observation
 Structure of a Data Set
– The number of observations is n
– The total number of variables embedded in the
questionnaire is m
– The data set is then an n x m matrix of numbers
Importance of the coding sheet: with a copy of it, anybody can
enter or check the data set.
SPSS Data Set

2 Views: Variable View and Data View.
 Raw Variable (labels and values)
 Transformed Variable (compute and recode)
Preliminary Data Analysis:
Basic Descriptive Statistics

Preliminary data analysis examines the
central tendency and the dispersion of the
data on each variable in the data set
 Measurement level dictates what to do
 Feeling for the data

What can we do? Limitations on next slide.
Run descriptives (Output 1).
Measures of Central Tendency and
Dispersion for Different Types of
Variables
Why Averages May be
Misleading

Researchers tested a new sauce product and
found
– Mean rating of the taste test was close to the
middle of the scale, which had "very mild" and
"very hot" as its bipolar adjectives

Researcher’s conclusion
– Consumers want neither a really hot nor a
really mild sauce
Why Averages May be
Misleading (Cont’d)

Deeper examination revealed
– The existence of a large proportion of
consumers who wanted the sauce to be mild
and an equally large proportion who wanted it
to be hot

Moral of the story:
– A clear understanding of the distribution of
responses can help a researcher avoid erroneous
inferences. Talk about Skewness and Kurtosis.
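A small Python sketch of the sauce story, using made-up ratings on an assumed 1 ("very mild") to 7 ("very hot") scale. The numbers are hypothetical and chosen only to show how a mid-scale mean can hide two opposed groups.

```python
from collections import Counter
from statistics import mean

# Hypothetical ratings: many "mild" fans, many "hot" fans, few in the middle.
ratings = [1] * 20 + [2] * 20 + [4] * 5 + [6] * 20 + [7] * 20

avg = mean(ratings)
freq = Counter(ratings)

print("mean rating:", avg)
for value in sorted(freq):
    # A crude text histogram makes the two peaks visible.
    print(value, "#" * freq[value])
```

The mean falls in the middle of the scale even though only 5 of the 85 hypothetical respondents chose the midpoint, which is why the frequency distribution should be inspected alongside the average.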
Crosstabs: Occurrences under
Specific Conditions

Used most of the time with categorical variables

Examples to run
Cross-Tabulations- Comparing
frequencies: Chi-square
Contingency Test

Technique used for determining whether there is a
statistically significant relationship between two
categorical (nominal or ordinal) variables
Cross-Tabulation Using SPSS for
National Insurance Company

One crucial issue in the customer survey of
National Insurance Company was how a
customer's education was associated with whether
or not she or he would recommend National to a
friend.
Need to Conduct Chi-square Test
to Reach a Conclusion

The hypotheses are:
– H0: There is no association between educational level
and willingness to recommend National to a friend (the
two variables are independent of each other).
– Ha: There is some association between educational level
and willingness to recommend National to a friend (the
two variables are not independent of each other).
– Let’s do it….
Conducting the Test

Test involves comparing the actual, or observed,
cell frequencies in the cross-tabulation with a
corresponding set of expected cell frequencies (E_ij)
Expected Values
$$E_{ij} = \frac{n_i \, n_j}{n}$$
where n_i and n_j are the marginal frequencies, that
is, the total number of sample units in category i
of the row variable and category j of the column
variable, respectively, and n is the total sample size
Chi-square Test Statistic
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
where r and c are the number of rows and columns, respectively,
in the contingency table. The number of degrees of freedom
associated with this chi-square statistic is given by the product
(r - 1)(c - 1).
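The expected frequencies and the chi-square statistic can be computed by hand, as in this Python sketch. The 2 x 2 table of counts is hypothetical (it is not the National Insurance Company cross-tabulation), and the function name `chi_square` is an assumption for illustration.

```python
def chi_square(observed):
    """Return (expected frequencies, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count for cell (i, j): row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, df

# Hypothetical counts: rows = two education levels, columns = recommend yes / no.
observed = [[30, 20], [20, 30]]
expected, stat, df = chi_square(observed)
print(expected, stat, df)
```

The statistic is then compared with the chi-square distribution with `df` degrees of freedom to obtain the p-value, which SPSS reports directly.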
National Insurance Company
Study
[SPSS output: computed chi-square value and p-value]
National Insurance Company
Study --P-Value Significance

The actual significance level (p-value) = 0.019
 The chances of getting a chi-square value as high
as 10.007 when there is no relationship between
education and recommendation are about 19 in
1000.
 The apparent relationship between education and
recommendation revealed by the sample data is
unlikely to have occurred because of chance.
 We can reject the null hypothesis (0.019 is below the conventional 0.05 level).
Precautions in Interpreting Cross
Tabulation Results

Two-way tables cannot show conclusive evidence
of a causal relationship

Watch out for small cell sizes

The risk of drawing erroneous inferences increases
when more than two variables are involved
Overview of Techniques for
Examining Associations


Spearman Correlation Coefficient Technique
The technique is appropriate when
– The degree of association between two sets of ranks (pertaining to
two variables) is to be examined

Illustrative Research Question(s) This Technique Can
Answer:
– Is there a significant relationship between motivation levels of
salespeople and the quality of their performance?
 Assume that the data on motivation and quality of performance
are in the form of ranks, say, 1 through 20, for 20 salespeople
who were evaluated subjectively by their supervisor on each
variable

Overview of Techniques for
Examining Associations
(Cont’d)
Pearson Correlation Coefficient Technique
 This technique is appropriate when
– The degree of association between two metric-scaled
(interval or ratio) variables is to be examined

Illustrative Research Question(s) This Technique
Can Answer:
– Is there a significant relationship between customers'
age (measured in actual years) and their perceptions of
our company's image (measured on a scale of 1 to 7)?
Spearman Correlation
Coefficient
A Spearman correlation coefficient is a measure of
association between two sets of ranks
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
d_i = the difference between the ith sample unit's ranks on the
two variables
n = the total sample size
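A minimal Python sketch of this formula, assuming the ranks contain no ties. The five salespeople and their ranks are hypothetical, and `spearman_rs` is an illustrative name.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman coefficient from two lists of ranks (assumes no tied ranks)."""
    n = len(ranks_x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical supervisor rankings of five salespeople.
motivation = [1, 2, 3, 4, 5]
performance = [2, 1, 4, 3, 5]

rs = spearman_rs(motivation, performance)
print(round(rs, 3))  # sum of d^2 is 4, so rs = 1 - 24/120 = 0.8
```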
Pearson Correlation
Coefficient
The Pearson correlation coefficient measures the degree of association
between two variables that are interval- or ratio-scaled.
For two such variables X and Y, the Pearson correlation coefficient (r_xy) is given by
$$r_{xy} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{(n - 1)\, s_x s_y}$$
n = sample size (total number of data points)
\bar{X} and \bar{Y} = means of X and Y
X_i and Y_i = values for any sample unit i
s_x and s_y = standard deviations of X and Y
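The formula translates directly into Python. The ages and image ratings below are hypothetical, and `pearson_r` is an illustrative name; `statistics.stdev` is the sample standard deviation (divisor n - 1), which matches the (n - 1) in the denominator.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson correlation using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * stdev(xs) * stdev(ys))

# Hypothetical data: customer age in years and image rating on a 1 to 7 scale.
age = [25, 35, 45, 55, 65]
image = [3, 4, 4, 6, 6]

r = pearson_r(age, image)
print(round(r, 4))
```

For these made-up values the coefficient is about 0.94, a strong positive association.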
National Insurance Company– Computing
Pearson Correlation Among Service Quality
Constructs

National Insurance Company was interested in the
correlations between respondents’ overall service-quality
perceptions (on the 10-point scale) and
their average ratings along each of the five
dimensions of Service Quality
National Insurance Company– Computing
Pearson Correlation Among Service Quality
Constructs Using SPSS
Interpreting Pearson
Correlation Coefficients

Each of the five service-quality measures
(reliability, empathy, tangibles, responsiveness,
and assurance) is significantly related to the
overall quality (OQ) at the .001 level of
significance
 Responsiveness has the strongest correlation
(.8625)
 Tangibles have the weakest correlation (.5038)
 All the correlations are strong enough to be
meaningful
Comparing Means

Mainly T-tests and ANOVAs

T-test on OQ and gender.
Independent T-tests

The independent variable has exactly 2 categories.

Equality of variance (cf. output)

p = .88: if there were no true difference, a difference
at least as large as the observed .04 would occur about
88% of the time by chance. Cannot
reject the null hypothesis.
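A rough Python sketch of the statistic behind an independent-samples t-test, assuming equal variances in the two groups. The ratings are hypothetical (not the OQ data from the study), and `pooled_t` is an illustrative name. The p-value itself comes from the t distribution and is not computed here.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) under the equal-variance assumption."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical overall-quality ratings for two groups of respondents.
women = [7, 8, 6, 9, 8]
men = [8, 7, 6, 9, 7]

t_value, df = pooled_t(women, men)
print(round(t_value, 3), df)
```

A t value this close to zero corresponds to a large p-value, so the null hypothesis of equal means would not be rejected for these made-up data either.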
Analysis of Variance

ANOVA is appropriate in situations where
the independent variable is set at certain
specific levels (called treatments in an
ANOVA context) and metric measurements
of the dependent variable are obtained at
each of those levels
Example
24 stores chosen randomly for the study
8 stores randomly chosen for each treatment
– Treatment 1: Store brand sold at the regular price
– Treatment 2: Store brand sold at 50¢ off the regular price
– Treatment 3: Store brand sold at 75¢ off the regular price
Monitor sales of the store brand for a week in each store
Table 15.2 Unit Sales Data Under Three
Pricing Treatments
Unit sales in each store, by treatment

                    Regular Price    50¢ off    75¢ off
                         37            46         46
                         38            43         49
                         40            43         48
                         40            45         48
                         38            45         47
                         38            43         48
                         40            44         49
                         39            44         49
Number of stores          8             8          8
Mean sales              38.75         44.13      48.00
ANOVA – Grocery Store
Hypothesis

Grocery Store Example
– H0: $\mu_1 = \mu_2 = \mu_3$
– Ha: At least one $\mu$ is different from one or more of
the others
Hypotheses for K Treatment groups or samples
– H0: $\mu_1 = \mu_2 = \dots = \mu_k$
– Ha: At least one $\mu$ is different from one or more of
the others
Exhibit 15.1 SPSS Computer
Output for ANOVA Analysis
Between-Subjects Factors

Treatment group    Value Label      N
       1           Regular price    8
       2           50 cents off     8
       3           75 cents off     8
Exhibit 15.1 SPSS Computer
Output for ANOVA Analysis
(Cont’d)
Tests of Between-Subjects Effects
Dependent Variable: SALES

Source             Type III Sum of Squares    df    Mean Square    F            Sig.
Corrected Model    345.250 (a)                2     172.625        137.445      .000
Intercept          45675.375                  1     45675.375      36367.123    .000
TREAT              345.250                    2     172.625        137.445      .000
Error              26.375                     21    1.256
Total              46047.000                  24
Corrected Total    371.625                    23
a. R Squared = .929 (Adjusted R Squared = .922)
There is less than a .001 probability of obtaining an F-value as high as 137.445 if the null hypothesis is true
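The sums of squares and the F-value in Exhibit 15.1 can be reproduced from the unit sales in Table 15.2 with a few lines of Python. This is a hand computation for illustration, not the SPSS procedure; the function name `one_way_anova` is an assumption, and the 24 values are assigned to treatments by reading the table row by row, which reproduces the reported treatment means.

```python
# Unit sales per store from Table 15.2, grouped by pricing treatment.
sales = {
    "Regular price": [37, 38, 40, 40, 38, 38, 40, 39],
    "50 cents off": [46, 43, 43, 45, 45, 43, 44, 44],
    "75 cents off": [46, 49, 48, 48, 47, 48, 49, 49],
}

def one_way_anova(groups):
    """Return (between-groups SS, within-groups SS, F) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_value

ss_between, ss_within, f_value = one_way_anova(list(sales.values()))
print(ss_between, ss_within, round(f_value, 3))
```

The between-groups sum of squares matches the TREAT row (345.250), the within-groups sum matches the Error row (26.375), and their mean-square ratio gives the F-value.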
ANOVA

OQ recommendation and OQ, individual
variable

OQ and EDUC (Graph)..and post hoc

Overview of Techniques for
Examining Associations
(Cont’d)
Simple Regression Analysis Technique
 This technique is appropriate when
– A mathematical function or equation linking
two metric-scaled (interval or ratio) variables is
to be constructed, under the assumption that
values of one of the two variables are dependent
on the values of the other
Overview of Techniques for
Examining Associations–Simple
Regression Analysis (Cont’d)

Illustrative Research Question(s) this
Technique Can Answer:
– Are sales (measured in dollars) significantly
affected by advertising expenditures (measured
in dollars)?
– What proportion of the variation in sales is
accounted for by variation in advertising
expenditures? How sensitive are sales to
changes in advertising expenditures?
Overview of Techniques for
Examining Associations (Cont’d)

Multiple Regression Analysis Technique
 This technique is appropriate when
– The conditions are the same as for simple regression
analysis, except that more than two variables are
involved and one variable is assumed to be
dependent on the others
Overview of Techniques for
Examining Associations (Cont’d)

Illustrative Research Question(s) this
Technique Can Answer:
– Are sales significantly affected by advertising
expenditures and price (where all three
variables are measured in dollars)?
– What proportion of the variation in sales is
accounted for by advertising and price? How
sensitive are sales to changes in advertising and
price?
Simple Regression Analysis

Generates a mathematical relationship
(called the regression equation) between
one variable designated as the dependent
variable (Y) and another designated as the
independent variable (X)
Independent Variable Vs.
Dependent Variable

Independent variable
– Explanatory or predictor variable
– Often presumed to be a cause of the other

Dependent variable
– Criterion Variable
– Influenced by the independent variable
Practical Applications of
Regression Equations

The regression coefficient, or slope, can
indicate how sensitive the dependent
variable is to changes in the independent
variable
 The regression equation is a forecasting tool
for predicting the value of the dependent
variable for a given value of the
independent variable
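Both uses can be sketched with an ordinary least-squares fit in Python. The advertising and sales figures are hypothetical and deliberately lie exactly on a line so the result is easy to check; `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for Y = a + bX."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data (thousands of dollars); sales = 5 + 2 * advertising exactly.
advertising = [10, 20, 30, 40, 50]
sales = [25, 45, 65, 85, 105]

a, b = fit_line(advertising, sales)
forecast = a + b * 35  # prediction inside the observed range of X
print(a, b, forecast)
```

The slope `b` is the sensitivity of sales to advertising, and the fitted equation gives the forecast for a chosen advertising level.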
Precautions In Using
Regression Analysis

Only capable of capturing linear associations
between dependent and independent variables
 A significant R2-value does not necessarily imply a
cause-and-effect association between the
independent and dependent variables
 A regression equation may not yield a trustworthy
prediction of the dependent variable when the
value of the independent variable at which the
prediction is desired is outside the range of values
used in constructing the equation
Precautions In Using
Regression Analysis (Cont’d)

A regression equation based on relatively
few data points cannot be trusted
 The ranges of data on the dependent and
independent variables can affect the
meaningfulness of a regression equation
Multiple Regression Analysis





$$Y_i = a + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki}$$
Y_i is the predicted value of the dependent variable
for some unit i;
X_1i, X_2i, …, X_ki are values on the independent
variables for unit i;
b_1, b_2, …, b_k are the regression coefficients;
a is the Y-intercept representing the prediction for
Y when all independent variables are set to zero
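Once the coefficients are estimated, a prediction is just the intercept plus a weighted sum. The sketch below assumes hypothetical coefficients for two predictors (advertising and price); none of the numbers come from the study, and `predict` is an illustrative name.

```python
def predict(intercept, coefficients, values):
    """Evaluate a + b1*X1 + ... + bk*Xk for one unit."""
    if len(coefficients) != len(values):
        raise ValueError("need one value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, values))

# Hypothetical equation: sales = 50 + 2.0 * advertising - 3.0 * price.
a = 50.0
b = [2.0, -3.0]

y_hat = predict(a, b, [10, 4])
print(y_hat)  # 50 + 20 - 12 = 58.0
```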
National Insurance Company–
Multiple Regression Using
SPSS

Jill and Tom were interested in conducting a
multiple regression analysis wherein overall
service quality perceptions are the dependent
variable and the average ratings along the
five dimensions are the independent variables
Factor Analysis

A data and variable reduction technique that
attempts to partition a given set of variables
into groups of maximally correlated
variables
Factor Analysis Output and Its
Interpretation

Primary output of factor analysis is a factor-loading matrix
Table 15.4 Factor-Loading Matrix Based on Data from
Study of Star Customers

Factor Loadings    Achieved
F1      F2         Communalities    Variable
0.96    0.06       .926             X4: My friends are very impressed with the Star VCR
0.92    0.17       .875             X6: No other brand of VCR even comes close to matching the Star
0.89    0.15       .815             X1: I did not mind paying the high price for my Star VCR
0.18    0.94       .916             X3: I hardly ever worry about anything going wrong with my Star VCR
0.09    0.88       .782             X5: The Star VCR has the latest technology built into it
0.16    0.86       .766             X2: I am pleased with the variety of things that a Star VCR can do
2.626   2.454                       Eigenvalues: standardized variance explained by each factor
0.438   0.409                       Proportion of the total variance explained by each factor

3 variables load high on factor 1 (X4, X6, X1)
3 variables load high on factor 2 (X3, X5, X2)
Reducing Star Data

X1, X4, and X6 can be combined into one
factor
 X2, X3, and X5 can be combined into a second factor
 6 variables can be reduced to two factors
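The summary rows of Table 15.4 follow from the loadings themselves, as this Python sketch shows. It assumes the loadings pair up with the variables as X4 (0.96, 0.06), X6, X1, X3, X5, X2 in table order, a reading of the flattened table that is consistent with the grouping above. Because the published loadings are rounded to two decimals, the recomputed values differ slightly from the printed ones.

```python
# Loadings on (F1, F2) as read from Table 15.4.
loadings = {
    "X4": (0.96, 0.06),
    "X6": (0.92, 0.17),
    "X1": (0.89, 0.15),
    "X3": (0.18, 0.94),
    "X5": (0.09, 0.88),
    "X2": (0.16, 0.86),
}

# Communality: sum of squared loadings across factors, per variable.
communalities = {name: f1 ** 2 + f2 ** 2 for name, (f1, f2) in loadings.items()}

# Eigenvalue: sum of squared loadings across variables, per factor.
eigenvalues = [sum(pair[k] ** 2 for pair in loadings.values()) for k in range(2)]
proportions = [ev / len(loadings) for ev in eigenvalues]

# Assign each variable to the factor on which it loads highest.
factor_1 = sorted(n for n, (f1, f2) in loadings.items() if abs(f1) > abs(f2))
factor_2 = sorted(n for n, (f1, f2) in loadings.items() if abs(f1) <= abs(f2))

print([round(ev, 3) for ev in eigenvalues], [round(p, 3) for p in proportions])
print(factor_1, factor_2)
```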
Potential Applications of
Factor Analysis

Used to
– Develop concise but comprehensive, multiple-item
scales for measuring various marketing
constructs
– Illuminate the nature of distinct dimensions
underlying an existing data set
– Convert a large volume of data into a set of
factor scores on a limited number of
uncorrelated factors
Cluster Analysis

Segment objects into groups so that
members within each group are similar to
one another in a variety of ways
 Useful for segmenting customers, market
areas, and products
Use of Cluster Analysis

A firm offering recreational services wanted to
enter a new region of the country
 They gathered data on more than 100
characteristics including
– Demographics
– Expenditures on recreation
– Leisure time activities
– Interests of household members
 The firm identified one or several household
segments that are likely to be most responsive to
its advertising and to its services
How Does Cluster Analysis
Work?

Cluster analysis measures the similarity
between objects on the basis of their values
on the various characteristics
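One common way to quantify similarity is the Euclidean distance between objects, where a smaller distance means more similar. The slides do not name a specific measure, so this choice is an assumption, as are the four hypothetical households scored on two characteristics.

```python
from math import dist

# Hypothetical households scored on two characteristics (e.g. on 1 to 10 scales).
households = {
    "A": (1.0, 2.0),
    "B": (2.0, 1.0),
    "C": (8.0, 9.0),
    "D": (9.0, 8.0),
}

def most_similar(name):
    """Return the other household closest to `name` in Euclidean distance."""
    others = (other for other in households if other != name)
    return min(others, key=lambda other: dist(households[name], households[other]))

for name in households:
    print(name, "is closest to", most_similar(name))
```

Here A pairs with B and C pairs with D, which is the kind of grouping a clustering procedure would build on.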
Exhibit 15.8 Clusters Formed
by Using Data on Two
Characteristics
[Figure: clusters plotted on two characteristics, each ranging from low to high; the one labeled axis is "Extent of participation in outdoor sporting events"]