correlations
Statistics
Correlation and regression
Introduction
Some methods involve one variable
is Treatment A as effective in relieving arthritic
pain as Treatment B?
Correlation and regression used to
investigate relationships between variables
most commonly linear relationships
between two variables
is BMD related to dietary calcium level?
2
Contents
Coefficients of correlation
meaning
values
role
significance
Regression
line of best fit
prediction
significance
3
Introduction
Correlation
measures the strength of the linear relationship between two
variables
Regression analysis
determines the nature of the relationship
Is there a relationship between the number of
units of alcohol consumed and the likelihood of
developing cirrhosis of the liver?
4
Pearson’s coefficient of correlation
r
Measures the strength of the linear relationship
between one dependent and one independent
variable
curvilinear relationships need other techniques
Values lie between +1 and -1
perfect positive correlation r = +1
perfect negative correlation r = -1
no linear relationship r = 0
5
Pearson’s coefficient of correlation
[Figure: scatter plots illustrating r = +1, r = -1, r = 0 and r = 0.6]
6
Scatter plot
BMD
dependent variable
make inferences about
Calcium intake
independent variable
make inferences from
controlled in some cases
7
Non-Normal data
8
Normalised
9
Calculating r
The value and significance of r are calculated by
SPSS
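As a rough illustration of what the software computes, here is a minimal Python sketch of Pearson's r and the t statistic commonly used to test it. The calcium and BMD figures are made-up illustrative values, not data from these slides, and the function names are hypothetical. Turning t into a p-value needs the t distribution with n - 2 degrees of freedom (tables or a statistics package), which is not shown here.

```python
import math


def pearson_r(x, y):
    """Pearson's r: co-variation of x and y scaled by their spreads."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r against zero (undefined when |r| = 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical values for illustration only (not real measurements).
calcium_intake = [400, 600, 800, 1000, 1200]
bmd = [0.85, 0.95, 0.90, 1.05, 1.10]

r = pearson_r(calcium_intake, bmd)
print("r =", round(r, 3))
print("t =", round(t_statistic(r, len(bmd)), 3), "on", len(bmd) - 2, "df")
```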
10
SPSS output: scatter plot
11
SPSS output: correlations
12
Interpreting correlation
A large or significant r does not necessarily imply:
strong correlation
the significance of r increases with sample size, so a weak
correlation can be significant in a large sample
cause and effect
strong correlation between the number of
televisions sold and the number of cases of
paranoid schizophrenia
does watching TV cause paranoid schizophrenia?
may be due to an indirect relationship
13
Interpreting correlation
Variation in dependent variable due to:
relationship with independent variable: r²
random factors: 1 - r²
r² is the Coefficient of Determination
e.g. r = 0.661
r² = 0.661² ≈ 0.44
less than half of the variation in the dependent
variable due to independent variable
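The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the calculation on the slide; the variable names are illustrative.

```python
# Coefficient of determination for the slide's example value of r.
r = 0.661
r_squared = r ** 2

explained = round(r_squared, 2)        # share linked to the independent variable
unexplained = round(1 - r_squared, 2)  # share left to random factors

print("r squared:", explained)
print("1 - r squared:", unexplained)
```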
14
15
Agreement
Correlation should never be used to determine
the level of agreement between repeated
measures:
measuring devices
users
techniques
It measures the degree of linear relationship
1, 2, 3 and 2, 4, 6 are perfectly positively correlated
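The point can be made concrete with a short Python sketch using the two series from the slide. The helper name is hypothetical. The correlation is perfect even though the second set of readings is always double the first, so the two sets do not agree at all.

```python
import math


def pearson_r(x, y):
    """Pearson's r computed from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


first = [1, 2, 3]
second = [2, 4, 6]

r = pearson_r(first, second)

# Agreement is about the differences between paired readings.
differences = [b - a for a, b in zip(first, second)]
mean_difference = sum(differences) / len(differences)

print("r =", r)
print("differences:", differences)
print("mean difference:", mean_difference)
```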
16
Assumptions
Errors are differences of predicted values of Y
from actual values
To ascribe significance to r:
distribution of errors is Normal
variance of errors is the same for all values of the independent
variable X
17
Non-parametric correlation
Make no assumptions about the distribution of the data
Carried out on ranks
Spearman’s ρ (rho)
easy to calculate
Kendall’s τ (tau)
has some advantages over ρ
distribution has better statistical properties
easier to identify concordant / discordant pairs
Usually both lead to same conclusions
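To show what "carried out on ranks" means in practice, here is a minimal Python sketch of both rank coefficients. It assumes there are no tied values (ties need average ranks and a corrected formula), the function names are hypothetical, and the data are invented for illustration. Spearman's coefficient is computed as Pearson's r applied to the ranks; Kendall's coefficient counts concordant and discordant pairs.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def pearson_r(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r on the ranks."""
    return pearson_r(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall's coefficient (no-ties form): (concordant - discordant) / pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Invented data: mostly increasing, with two swapped neighbours.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print("Spearman:", spearman_rho(x, y))
print("Kendall:", kendall_tau(x, y))
```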
18
Calculation of value and significance
Computer does it!
19
Role of regression
Shows how one variable changes with another
By determining the line of best fit
linear
curvilinear
20
Line of best fit
Simplest case linear
Line of best fit between:
dependent variable Y
BMD
independent variable X
dietary intake of Calcium
Y = a + bX
a: value of Y when X = 0 (intercept)
b: change in Y when X increases by 1 (slope)
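One common way to find the line of best fit is ordinary least squares, which picks a and b to minimise the sum of squared vertical distances from the points to the line. The slides do not say which method SPSS is asked to use, so treat the sketch below as an assumption. The function name and the data are illustrative only.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope: change in Y per unit increase in X
    a = mean_y - b * mean_x  # intercept: value of Y when X = 0
    return a, b


# Invented points lying exactly on Y = 1 + 2X.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print("intercept a =", a)
print("slope b =", b)
```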
21
Role of regression
Used to predict
the value of the dependent variable
when value of independent variable(s) known
within the range of the known data
extrapolation risky!
relation between age and bone age
Does not imply causality
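A prediction routine can guard against extrapolation by refusing values outside the range of the data used to fit the line. The following Python sketch assumes a least-squares fit; the function names, the `allow_extrapolation` flag and the data are all hypothetical.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for the line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


def predict(x_new, x, y, allow_extrapolation=False):
    """Predict Y at x_new, rejecting values outside the observed X range."""
    if not allow_extrapolation and not (min(x) <= x_new <= max(x)):
        raise ValueError("x_new lies outside the range of the known data")
    a, b = fit_line(x, y)
    return a + b * x_new


# Invented data for illustration only.
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

print(predict(1.5, xs, ys))  # inside the observed range

try:
    predict(10, xs, ys)      # outside the observed range
except ValueError as error:
    print("refused:", error)
```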
22
SPSS output: regression
23
Assumptions
Needed only if statistical inferences are to be made about:
significance of regression
values of slope and intercept
24
Assumptions
If values of independent variable are randomly
chosen then no further assumptions necessary
Otherwise
as in correlation, assumptions based on errors
balance out (mean=0)
variances equal for all values of independent variable
not related to magnitude of independent variable
seek advice / help
25
Multiple regression (often called multivariate regression)
More than one independent variable
BMD dependent on:
age
gender
calorific intake
etc
26
Logistic regression
The dependent variable is binary
yes / no
predict whether a patient with Type 1 diabetes
will undergo limb amputation given history of
prior ulcer, time diabetic etc
result is a probability
Can be extended to more than two
categories
Outcome after treatment
recovered, in remission, died
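For the binary case, a fitted logistic regression model turns a weighted sum of the predictors into a probability with the logistic function. The Python sketch below shows only that final step. The intercept and coefficients are invented for illustration and are not estimates from any real study; the function name is hypothetical.

```python
import math


def logistic_probability(intercept, coefficients, predictors):
    """Probability of the 'yes' outcome: 1 / (1 + exp(-z)), z the linear predictor."""
    z = intercept + sum(b * v for b, v in zip(coefficients, predictors))
    return 1 / (1 + math.exp(-z))


# Hypothetical model: predictors are prior ulcer (1 = yes, 0 = no)
# and years since diagnosis. These numbers are made up.
intercept = -3.0
coefficients = [1.2, 0.05]

p_with_ulcer = logistic_probability(intercept, coefficients, [1, 20])
p_without_ulcer = logistic_probability(intercept, coefficients, [0, 20])

print("with prior ulcer:", round(p_with_ulcer, 3))
print("without prior ulcer:", round(p_without_ulcer, 3))
```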
27
Summary
Correlation
strength of linear relationship between two variables
Pearson’s - parametric
Spearman’s / Kendall’s - non-parametric
Interpret with care!
Regression
line of best fit
prediction
multivariate
logistic
28