Correlation coefficient
Research in business studies
Department of Business Administration
SPRING 2009-10
Quantitative and Qualitative Data Analysis
by
Assoc. Prof. Sami Fethi
Research Methods in Business Studies
© 2009/10, Sami Fethi, EMU, All Right Reserved, Pearson Education, 2005, 3. Ed.
Quantitative data analysis
Examining differences
Relationship between variables
Explaining and predicting relationship between variables
Data reduction, structure and dimension
Additional methods
Characteristic of qualitative research
Qualitative data
Analytical procedure
Interpretation
Strategies for qualitative analysis
Quantify qualitative data
Validity in qualitative research
Examining differences
Hypotheses about one mean
In research we often have to make statements about the mean. When the
population variance is unknown, the standard error of the mean is also
unknown and must be estimated from sample data.
e.g. $SD_{\bar{X}} = SD' / \sqrt{N}$
where
$SD_{\bar{X}}$ = standard error of the mean
$SD'$ = estimated standard deviation
$N$ = sample size
$$SD' = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{X})^2}{N - 1}}$$
$N - 1$ is the degrees of freedom.
Example 1: For a supermarket chain to add a new product, at least 100
units must be sold per week. The new product is tested in ten randomly
selected stores for a limited time.
Apply a one-tailed t test to answer the question: will the new product
sell more than 100 units per week?
a) construct the hypotheses
b) calculate the mean and standard deviation if they are not given
c) calculate the standard error of the mean
d) find the t-value
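a) $H_0: \mu \le 100$
$H_1: \mu > 100$
b) $\bar{X}$ and $SD'$ are given as 109.4 and 14.90 respectively.
c) $SD_{\bar{X}} = 4.55$ as reported. (Applying $SD'/\sqrt{N}$ to the figures in b) gives $14.90/\sqrt{10} \approx 4.71$; the conclusion is the same with either value.)
d) $t = (\bar{X} - \mu)/SD_{\bar{X}} = (109.4 - 100)/4.55 = 2.07$
The critical value from the t-table is 1.83 at the 5% significance level (one-tailed, 9 degrees of freedom).
Since 2.07 > 1.83, we reject the null hypothesis.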
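The steps of Example 1 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `one_sample_t` is hypothetical, and the critical value is simply the 1.83 quoted above rather than something computed. Applying $SD'/\sqrt{N}$ directly to the reported figures gives a standard error near 4.71 and a t-value near 2.0, slightly different from the 4.55 and 2.07 on the slide, but the decision is the same.

```python
import math

def one_sample_t(sample_mean, sample_sd, n, mu0):
    """Return (standard error of the mean, t statistic) for a one-sample t test."""
    se = sample_sd / math.sqrt(n)      # SD_xbar = SD' / sqrt(N)
    t = (sample_mean - mu0) / se       # t = (X_bar - mu) / SD_xbar
    return se, t

# Figures from Example 1: mean 109.4, SD 14.90, ten stores, threshold 100 units.
se, t = one_sample_t(109.4, 14.90, 10, 100)

T_CRITICAL = 1.83  # one-tailed, 5% level, 9 degrees of freedom (quoted in the example)
reject_h0 = t > T_CRITICAL

print(f"SE = {se:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```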
Hypotheses about two means
This is usually associated with a question such as: Are
the tastes in region A different from the tastes in
region B?
e.g.
$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{SD_{\bar{X}_1 - \bar{X}_2}}$$
where
$\bar{X}_1$ = sample mean for the first sample
$\bar{X}_2$ = sample mean for the second sample
$SD_{\bar{X}_1 - \bar{X}_2}$ = the standard error of the difference in means
$\mu_1$ and $\mu_2$ are the unknown population means, and
the general estimate of the standard error is:
$$SD_{\bar{X}_1 - \bar{X}_2} = \sqrt{SD_{\bar{X}_1}^2 + SD_{\bar{X}_2}^2} = \sqrt{\frac{SD_1^2}{N_1} + \frac{SD_2^2}{N_2}}$$
Assuming the two population variances are equal, the
common population variance can be estimated by pooling the
samples. When the variances are unknown and the standard
errors of the means must be estimated, t is an adequate
test statistic, distributed with $v = N_1 + N_2 - 2$ degrees
of freedom.
Example 2: A manufacturer has developed a new product and
wonders whether the label of the package should be red or
blue. The new product is tested with each of the two labels in
ten randomly selected stores. The mean sales obtained are
403.0 for the red package and 390.3 for the blue package. The
standard error of the difference in means is 8.15.
a) construct the hypotheses
b) find the t-value
a) $H_0: (\mu_1 - \mu_2) = 0$
$H_1: (\mu_1 - \mu_2) \ne 0$
or
$H_0: (\mu_1 - \mu_2) \le 0$
$H_1: (\mu_1 - \mu_2) > 0$
b)
$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{SD_{\bar{X}_1 - \bar{X}_2}} = \frac{(403.0 - 390.3) - 0}{8.15} = 1.56$$
$v = 10 + 10 - 2 = 18$ degrees of freedom. At the 5% level with 18
degrees of freedom, the two-tailed critical value from the table is 2.101.
Since 1.56 < 2.101, the null hypothesis $H_0: (\mu_1 - \mu_2) = 0$ cannot be
rejected, so the two unknown population means are assumed to be the same.
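The two-mean test can be sketched the same way. The function names `se_of_difference` and `two_sample_t` are hypothetical helpers introduced for illustration; the example itself supplies the standard error (8.15) directly, so `se_of_difference` is shown only to mirror the general formula and is not needed for the calculation.

```python
import math

def se_of_difference(sd1, n1, sd2, n2):
    """General estimate: sqrt(SD1^2 / N1 + SD2^2 / N2)."""
    return math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

def two_sample_t(mean1, mean2, se_diff, hypothesized_diff=0.0):
    """t = ((X1_bar - X2_bar) - (mu1 - mu2)) / SD of the difference."""
    return ((mean1 - mean2) - hypothesized_diff) / se_diff

# Figures from Example 2: red 403.0, blue 390.3, standard error 8.15.
t = two_sample_t(403.0, 390.3, 8.15)
df = 10 + 10 - 2

T_CRITICAL_TWO_TAILED = 2.101  # 5% level, 18 degrees of freedom (quoted in the example)
reject_h0 = abs(t) > T_CRITICAL_TWO_TAILED

print(f"t = {t:.2f}, df = {df}, reject H0: {reject_h0}")
```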
Useful alternative tests
o In problems involving one or two population means, t-methods are
usually appropriate, but non-parametric methods are often good
alternatives.
e.g. Non-parametric methods have the advantage of requiring fewer
assumptions, but they are less powerful than t-methods (see Siegel
and Castellan, 1988).
e.g. The main difference between them is that t-methods are concerned
with means while non-parametric methods are concerned with
medians.
o ANOVA (analysis of variance) compares more than two groups
simultaneously. The method rests on the ratio of systematic variance
to unsystematic variance.
In ANOVA, the following are computed:
o The total variation, by comparing each observation with the grand mean.
o The between-group variation, by comparing the treatment means with
the grand mean.
o The within-group variation, by comparing each score in a group with
the group mean.
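The three quantities listed above can be computed directly. The following is a minimal sketch, assuming small made-up scores (the data and the name `anova_decomposition` are hypothetical, chosen only so the arithmetic is easy to follow). It also forms the F-ratio as the between-group mean square divided by the within-group mean square.

```python
def anova_decomposition(groups):
    """Split total variation into between-group and within-group parts."""
    all_values = [v for values in groups.values() for v in values]
    n, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n

    # Total variation: each observation against the grand mean.
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Between-group variation: each group mean against the grand mean.
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Within-group variation: each score against its own group mean.
    ss_within = sum(
        (v - sum(values) / len(values)) ** 2
        for values in groups.values()
        for v in values
    )
    f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
    return ss_total, ss_between, ss_within, f_ratio

# Hypothetical scores for three groups (not taken from the slides).
hypothetical_scores = {
    "campaign_1": [4, 5, 6],
    "campaign_2": [6, 7, 8],
    "campaign_3": [8, 9, 10],
}
ss_total, ss_between, ss_within, f_ratio = anova_decomposition(hypothetical_scores)
print(ss_total, ss_between, ss_within, f_ratio)
```

In this sketch the between-group and within-group sums of squares add up to the total, which is the identity the ANOVA table relies on.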
o Recall MANOVA (multivariate analysis of variance): unlike ANOVA, it has
more than one dependent variable.
Comparison of more than two groups
Example 3: Three advertising campaigns are tested in 24 randomly
selected cities comparable in size and demographics. The following
output shows the results of an ANOVA analysis:
Source          Sum of sq.   Degrees of freedom   Mean sq.   F-ratio
Between group   49.0         2                    24.5       5.88
Within group    87.5         21                   4.17
Total           136.5        23
Example 3
a) construct the hypotheses
b) find the F-value and determine whether it is significant
c) comment on the F-value
a) $H_0: G_1 = G_2 = G_3$
$H_1$: at least one group mean differs from the others
Total d.f. = 24 - 1 = 23; between groups 3 - 1 = 2; within groups 23 - 2 = 21.
b) $F_{calculated} = 24.5/4.17 = 5.88$
$F_{critical}$ has $(k - 1, n - k) = (3 - 1, 24 - 3) = (2, 21)$ degrees of freedom.
From the F-distribution at the 5% level, $F_{critical}$ is 3.47.
c) Since 5.88 is greater than 3.47, we reject the null
hypothesis that the group means are equal and
accept the alternative hypothesis that the advertising
campaigns vary in effectiveness.
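The F-ratio in Example 3 follows from the sums of squares and degrees of freedom in the ANOVA table. A minimal sketch, with a hypothetical helper name `f_ratio_from_table` and the critical value 3.47 taken from the example rather than computed: dividing 49.0 by 2 gives a between-group mean square of 24.5, and 87.5 divided by 21 gives about 4.17.

```python
def f_ratio_from_table(ss_between, df_between, ss_within, df_within):
    """Return (mean square between, mean square within, F-ratio)."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Sums of squares and degrees of freedom from the Example 3 table.
ms_between, ms_within, f_value = f_ratio_from_table(49.0, 2, 87.5, 21)

F_CRITICAL = 3.47  # F(2, 21) at the 5% level (quoted in the example)
significant = f_value > F_CRITICAL

print(f"MS between = {ms_between:.2f}, MS within = {ms_within:.2f}, F = {f_value:.2f}")
```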
Relationship between variables
In research, we are often concerned with whether there is a
relationship between two or more variables, that is, whether they covary.
o Correlation coefficient
The Pearson correlation coefficient measures the strength
of the linear relationship between two variables, for example
x and y.
o Theoretically, the correlation coefficient can take
values from -1 to 1. A correlation coefficient of 1 tells us
that the two variables covary perfectly and positively, whereas -1
shows that the two variables are perfectly inversely related.
A value close to 0 indicates that the variables are not linearly related.
The formula for the correlation coefficient is as follows:
$$r_{XY} = \frac{\sum_i (x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum_i (x_i - \bar{X})^2 \sum_i (y_i - \bar{Y})^2}}$$
where $\bar{X}$ and $\bar{Y}$ represent the sample means of X and Y.
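The Pearson formula translates almost term by term into code. The sketch below is illustrative only: the function name `pearson_r` and the sample values are made up, not taken from the slides.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over the root of the
    product of the two sums of squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect positive, a perfect inverse and an imperfect relationship.
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [6, 4, 2]))
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))
```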
A correlation coefficient shows covariation between two variables,
not that the variables are causally related.
The square of the correlation coefficient is the coefficient of
determination.
$R^2$ = Explained variation / Total variation
o Example 4: partial correlation
Using Table 1, calculate the relationship
between advertisement recognition, appeal and sex. In other words,
is the relationship between advertisement recognition and appeal
influenced by controlling for sex?
Example 4
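o This is a partial correlation. With advertisement recognition (Ad.roc),
appeal and sex as variables 1, 2 and 3, the partial correlation coefficient $r_{12.3}$ is:
$$r_{12.3} = \frac{r_{12} - (r_{13})(r_{23})}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$
$$r_{\text{Ad.roc, appeal} \cdot \text{sex}} = \frac{0.24 - (-0.33)(0.09)}{\sqrt{1 - (-0.33)^2}\,\sqrt{1 - (0.09)^2}} = 0.29$$
(The minus sign on 0.33 is the sign needed to reproduce the reported result of 0.29.)
o This shows that, when controlling for sex, the observed relationship
between Ad.roc and appeal remains positive and is strengthened.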
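The partial correlation in Example 4 can be checked numerically. This sketch assumes that the correlation of 0.33 enters with a negative sign, because that is the only way the three reported magnitudes (0.24, 0.33, 0.09) produce the reported 0.29; the slide's signs did not survive extraction, so treat that as an assumption. The function name `partial_correlation` is hypothetical.

```python
import math

def partial_correlation(r12, r13, r23):
    """Correlation between variables 1 and 2, controlling for variable 3."""
    numerator = r12 - r13 * r23
    denominator = math.sqrt(1 - r13 ** 2) * math.sqrt(1 - r23 ** 2)
    return numerator / denominator

# Example 4 magnitudes; the sign of r13 is an assumption (see above).
r_partial = partial_correlation(r12=0.24, r13=-0.33, r23=0.09)
print(f"partial correlation = {r_partial:.2f}")
```

With both correlations positive the same formula would give a value below 0.24, which would contradict the conclusion that the relationship is strengthened.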
Explaining and predicting relationship between variables
o Explaining and predicting relationships between variables are
important tasks in business research. One of the most widely applied and
useful approaches to examining relationships between variables is
regression analysis. In regression analysis, we want to fit the model
that best describes the data, which is done by applying the method of
least squares. More precisely, a straight line is fitted that minimizes
the sum of the squared vertical deviations from that line, as shown in
Figure 1.
o Single Linear Regression
$Y_i = a_0 + a_1 x_i + e_i$
where Y = the outcome variable, x = the predictor variable, $a_1$ = the slope
of the straight line fitted to the data, $a_0$ = the intercept of the line, and
$e_i$ = the difference between the score predicted and the score actually
obtained, called the residual.
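A least-squares fit of the line $Y = a_0 + a_1 x + e$ can be sketched without any libraries. The data below are hypothetical and the function name `fit_line` is an illustration, not something from the slides. The sketch also reports $R^2$ as one minus the ratio of residual variation to total variation.

```python
def fit_line(xs, ys):
    """Least-squares intercept a0, slope a1, residuals and R-squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)

    a1 = sxy / sxx                 # slope
    a0 = mean_y - a1 * mean_x      # intercept
    residuals = [y - (a0 + a1 * x) for x, y in zip(xs, ys)]

    ss_residual = sum(e ** 2 for e in residuals)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_residual / ss_total
    return a0, a1, residuals, r_squared

# Hypothetical data (not the car dealer data matrix).
a0, a1, residuals, r_squared = fit_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"a0 = {a0:.2f}, a1 = {a1:.2f}, R^2 = {r_squared:.2f}")
```

With an intercept in the model the residuals sum to zero, which is a quick sanity check on any fit.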
Figure 1 The linear model
Example 5
o Assume that a car dealer collects data for six months on four
variables: TV advertising, print advertising, competitors'
advertising and sales. Y is sales. The car dealer expects car sales to
be positively correlated with TV-ads and Print-ads and negatively
correlated with competitors' ads.
Table 2 Data matrix
Simple Mean Regression-output
Example 5
o Using the car dealer data described above and the output in Table 3,
comment on the estimated coefficient and t-ratio as well as $R^2$
for TV-ads.
Table 3 Simple mean regression-output
Example 5-Answer
o The estimated constant term of 0.7 shows that if the dealer does not
use TV-ads at all (TV-ads = 0), the estimated expected value of
car sales is 0.7 units, that is, seven cars. The estimated regression coefficient
of sales on TV-ads is 0.9. This coefficient shows that if the variable
TV-ads is increased by 1 unit, the estimated expected value of
car sales increases by 0.9 units, that is, nine cars. The R-square result,
$R^2$ = 85.3 percent, shows that the sample coefficient of determination
is equal to 0.853. Practically speaking, this means that the
variation in the variable TV-ads explains 85.3 percent of the
variation in the dependent variable car sales. The estimated t-value on
TV-ads is 4.81, which is greater than 2 (the rule-of-thumb tabular value
from the t-distribution), so it is significant at the 5% and 1% levels.
This means that we can reject the null hypothesis that the corresponding
population regression coefficient is equal to zero. The conclusion is that
TV-ads and sales are significantly related to each other, that is, TV-ads
has a positive impact on sales.
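The interpretation above can be made concrete with the reported estimates. In this sketch the coefficient values (0.7, 0.9) and the t-ratio (4.81) come from the answer; the function name `predict_sales`, the implied standard error, and the tabulated critical values for 4 degrees of freedom (six observations minus two estimated parameters) are additions for illustration and assume a two-tailed test.

```python
A0 = 0.7        # estimated constant term
A1 = 0.9        # estimated coefficient on TV-ads
T_RATIO = 4.81  # reported t-value on TV-ads

def predict_sales(tv_ads):
    """Estimated expected car sales (in the units of the data) for a TV-ads level."""
    return A0 + A1 * tv_ads

# The t-ratio is the coefficient divided by its standard error,
# so the implied standard error is the coefficient over the t-ratio.
implied_se = A1 / T_RATIO

# Two-tailed critical t-values for 4 degrees of freedom (assumed: n = 6, 2 parameters).
T_CRITICAL_5PCT = 2.776
T_CRITICAL_1PCT = 4.604

print(predict_sales(0), predict_sales(1))
print(f"implied SE = {implied_se:.3f}")
print(T_RATIO > T_CRITICAL_5PCT, T_RATIO > T_CRITICAL_1PCT)
```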
Assumptions in Regression analysis
o The expected value of the error term is zero.
o The variance of the error term is constant for each X.
This is termed homoscedasticity. If the variance of e varies
with X, this is termed heteroscedasticity.
o The errors for the observations are uncorrelated.
o e should be normally distributed for each X.
o The error term should not be correlated with x: corr(e, x) = 0.
o It is also a common assumption that the regression
model should be linear in its parameters.
Correlation Coefficients-output
Example 6
o Using the car dealer data from Example 5 (six months of TV
advertising, print advertising, competitors' advertising and sales,
with sales as Y), use the concept of the correlation coefficient to
explain the relationships between the variables under inspection,
based on the information given in Table 4.
Table 4 Correlation coefficients-output
Example 6 -Answer
o The correlations between car sales (dependent) and TV
advertising, print advertising and competitors' advertising
(explanatory) are expected to be high. The correlations
among the explanatory variables themselves are expected to be low.
A high correlation coefficient between, for example, TV
advertising and print advertising indicates a high degree of
multicollinearity, which distorts the estimation results. To remedy
this situation, the relevant variable can be dropped from
the regression equation. For example, the correlation between sales
and TV-ads is 0.92, which is a high score, whereas the correlation
between sales and Comp-ads is 0.155, which is a very low
score.
Multiple Regression
o In multiple regression, two or more independent (explanatory)
variables are used to explain or predict the dependent
variable. The purpose is to make the model more realistic, control for
other variables, explain more of the variance in the dependent
variable, and reduce the residuals. The following is a typical
example of output from a multiple regression.
Table 5 Multiple regression – output
Dummy Variables
o In a multiple regression, a dummy variable can be used in two ways.
It can be a dependent variable that takes the value 1 or 0, which is also
called dichotomous. It can also be an independent variable that takes
the value 0 or 1. A dummy variable is used in an analysis when a
variable has no numerical values. For example, season in the following
table is a nominally scaled variable that cannot be ranked, so to be
used in a regression analysis the seasons need to be assigned numbers.
Table 6 Coding of dummy variable
Dummy variables
Example 7
o In the following table, there are three new variables A, B and C, and the
four seasons are coded as different combinations of zeros and ones. Assume
that the following regression model for sales of women's clothing, which
also includes the price (P), has been estimated:
Sale = 1000 - 0.5P + 100A - 20B - 50C
a) Calculate the sales in the summer, taking the dummy variables into
account (P = $200).
b) Calculate the sales in the autumn, taking the dummy variables into
account (P = $200).
c) Compare the sales in winter and spring at the same price.
Table 6 Coding of dummy variable
Dummy variables
Example 7-Answer
o Estimated model: Sale = 1000 - 0.5P + 100A - 20B - 50C
a) Sales in the summer (P = $200):
Sale = 1000 - 0.5(200) + 100(1) - 20(0) - 50(0) = $1000
b) Sales in the autumn (P = $200):
Sale = 1000 - 0.5(200) + 100(0) - 20(1) - 50(0) = $880
c) Sales in winter and spring at the same price:
Winter: Sale = 1000 - 0.5(200) + 100(0) - 20(0) - 50(1) = $850
Spring: Sale = 1000 - 0.5(200) + 100(0) - 20(0) - 50(0) = $900
At the same price, winter sales are $50 lower than spring sales.
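The dummy-variable calculation can be sketched as a lookup plus the estimated equation. Table 6 itself is not reproduced in this transcript, so the coding below (summer A = 1, autumn B = 1, winter C = 1, spring all zeros) is inferred from the worked answers and should be read as an assumption; the names `SEASON_DUMMIES` and `predicted_sale` are hypothetical. Evaluated at P = 200, the equation gives 1000, 880, 850 and 900 for summer, autumn, winter and spring.

```python
# Assumed dummy coding (A, B, C) per season, inferred from the worked answers.
SEASON_DUMMIES = {
    "summer": (1, 0, 0),
    "autumn": (0, 1, 0),
    "winter": (0, 0, 1),
    "spring": (0, 0, 0),  # reference season: all dummies are zero
}

def predicted_sale(price, season):
    """Sale = 1000 - 0.5P + 100A - 20B - 50C for the given season."""
    a, b, c = SEASON_DUMMIES[season]
    return 1000 - 0.5 * price + 100 * a - 20 * b - 50 * c

for season in SEASON_DUMMIES:
    print(season, predicted_sale(200, season))
```

Because spring is the reference season, each dummy coefficient is read as the difference in sales between that season and spring at the same price.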