Regression and Correlation
Biostatistics
Unit 9
Regression and
Correlation
1
Regression and Correlation
• Regression and correlation analysis
studies the relationships between variables.
• This area of statistics was started in the
1880s by Francis Galton (1822-1911), who
was also Darwin’s cousin.
2
Data for Regression and Correlation
• Data are in the form of (x,y) pairs.
• A scatter plot (x-y plot) is used to
display regression and correlation data.
• The regression line has the form
y = mx + b
• In actual practice, two forms are used:
y = ax + b (where a is the slope) and
y = a + bx (where b is the slope).
3
General Regression Line
y = a + bx + e
a is the y-intercept
b is the slope
e is the error term
4
Calculations
• For each (x,y) point, the vertical
distance from the point to the
regression line is squared.
• Adding these squared distances gives the
sum of squares. The regression line is the
line that makes this sum as small as possible.
• Regression analysis allows the
experimenter to predict one value
based on the value of another.
• A similar procedure is used in
biochemistry with standard curves.
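The sum-of-squares idea above can be sketched in Python. This is a minimal illustration, not the TI-83 procedure used in these slides: the data values and the function names fit_line and sum_of_squares are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared and cross deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # y-intercept
    return a, b


def sum_of_squares(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical (x, y) pairs, not the data set from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
sse = sum_of_squares(xs, ys, a, b)
print(f"y = {a:.2f} + {b:.2f}x, sum of squares = {sse:.2f}")
```

Any other line through the same points, for example one with a slightly different slope, gives a larger sum of squares.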
5
Data
Data are in the form of (x,y) pairs. List L1
contains the x values and List L2 contains the
y values.
6
Calculation of regression equation using TI-83
• The Linear Regression test is used.
• Conclusion: The equation of the
regression line is y = 4.54x – 1.57
7
Using the regression equation
• Interpolation is used to find values of points
between the data points. This is a relatively
safe and accurate process.
• Extrapolation is used to find values of
points outside the range of the data. This
process is riskier, especially as you move
further beyond either end of the data.
Always check that the calculated values
are realistic.
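A short Python sketch of the difference between the two kinds of prediction. The coefficients come from the regression equation in these slides; the range of observed x values (X_MIN, X_MAX) is a hypothetical assumption, since the data themselves are not reproduced here, and the function names are illustrative only.

```python
# Coefficients from the slides: y = 4.54x - 1.57
SLOPE = 4.54
INTERCEPT = -1.57

# Hypothetical range of the observed x values (an assumption).
X_MIN, X_MAX = 5.0, 30.0


def predict(x):
    """Predicted y for a given x using the regression equation."""
    return INTERCEPT + SLOPE * x


def prediction_type(x, x_min, x_max):
    """Label a prediction by whether x lies inside the observed range."""
    if x_min <= x <= x_max:
        return "interpolation"
    return "extrapolation"


for x in (10.0, 40.0):
    label = prediction_type(x, X_MIN, X_MAX)
    print(f"x = {x}: predicted y = {predict(x):.2f} ({label})")
```

Note that at x = 0 the equation gives y = -1.57. Whether a negative value is realistic depends on what y measures, which is the kind of check the slide recommends.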
8
Significance of regression analysis
The linear regression t test gives a
probability (p-value) for the relationship. In this test:
β is the population regression coefficient (slope)
ρ is the population correlation coefficient
The hypotheses are:
H0: β and ρ = 0
HA: β and ρ ≠ 0
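One reason a single test covers both the slope and the correlation coefficient is that the two give the same t statistic. The sketch below checks this with the standard textbook formulas on hypothetical data (not the data from the slides). The function name is illustrative, and the p-value itself is not computed here because it requires a t table or a statistics library.

```python
import math


def regression_t(xs, ys):
    """Return (t from slope, t from r, degrees of freedom)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    b = sxy / sxx                      # sample slope
    sse = syy - b * sxy                # residual sum of squares
    s = math.sqrt(sse / (n - 2))       # residual standard deviation
    t_slope = b / (s / math.sqrt(sxx)) # slope divided by its standard error

    r = sxy / math.sqrt(sxx * syy)     # sample correlation coefficient
    t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return t_slope, t_r, n - 2


# Hypothetical (x, y) pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

t_slope, t_r, df = regression_t(xs, ys)
print(f"t from slope = {t_slope:.4f}, t from r = {t_r:.4f}, df = {df}")
```

The t statistic is then compared with the t distribution on n - 2 degrees of freedom to obtain the p-value.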
9
Calculations and Results
Calculator setup
10
Calculations and Results
Results
Conclusion: p < .001 (p = .000206), so H0 is rejected.
11
Correlation
Correlation is used to give information about
the relationship between x and y. When the
regression equation is calculated, the
correlation results indicate the nature and
strength of the relationship.
12
Correlation Coefficient
The correlation coefficient, r, indicates the
nature and strength of the relationship.
Values of r range from -1 to +1. A correlation
coefficient of 0 means that there is no
linear relationship.
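As a rough illustration of how r behaves, the following Python sketch computes the Pearson correlation coefficient directly from its definition. All data sets and the function name pearson_r are hypothetical examples, not values from the slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data sets chosen to show the range of r.
r_positive = pearson_r([1, 2, 3], [2, 4, 6])         # points on a rising line
r_negative = pearson_r([1, 2, 3], [6, 4, 2])         # points on a falling line
r_none = pearson_r([1, 2, 3, 4], [1, -1, -1, 1])     # no linear trend
r_scatter = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

print(r_positive, r_negative, r_none, round(r_scatter, 4))
```

The third data set follows a clear curve yet has no linear trend, which is why r near 0 rules out only a linear relationship.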
13
Correlation Coefficient
Perfect negative correlation, r = -1.
14
Correlation Coefficient
No correlation, r = 0.
15
Correlation Coefficient
Perfect positive correlation, r = +1.
16
Coefficient of Determination
The coefficient of determination is r². It has
values between 0 and 1. The value of r²
indicates the proportion of the variation in y
that is explained by the linear relationship with x.
17
Graphs
Scatter plot
18
Graphs
Scatter plot with regression line
19
Data for calculations
20
Calculations
Calculate the regression equation
Result: The regression equation is
y = 4.54x – 1.57
22
Calculations
Calculate the correlation coefficient
23
Coefficient of Determination
• The coefficient of determination is r². It
indicates the proportion of the variation in y
that is explained by the linear relationship
with x.
• With r = .974, the coefficient of
determination r² = .948.
• This means that about 95% of the
variation in y is explained by the temperature.
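The coefficient of determination can also be computed as one minus the ratio of the residual sum of squares to the total sum of squares, which makes its meaning as a share of the variation in y easier to see. The sketch below uses hypothetical data (not the temperature data from the slides) and an illustrative function name.

```python
def r_squared(xs, ys):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Total variation of y about its mean.
    sst = sum((y - mean_y) ** 2 for y in ys)
    # Variation left over after fitting the line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / sst


# Hypothetical (x, y) pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r2 = r_squared(xs, ys)
print(f"r squared = {r2:.3f}")
```

For simple linear regression this value equals the square of the correlation coefficient r.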
24
Residuals
• The vertical distance that each point is above or
below the line is called a residual.
• With a good fit, the residuals will be
randomly scattered about zero.
• If the residual plot is not random, then
another factor or effect is involved that
needs attention.
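A minimal Python sketch of how residuals are obtained, assuming a least-squares line of the form y = a + bx. The data and the function name residuals are hypothetical and are not taken from the slides.

```python
def residuals(xs, ys):
    """Observed y minus predicted y for each point of the fitted line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Positive residual: point above the line; negative: below it.
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical (x, y) pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

res = residuals(xs, ys)
for x, e in zip(xs, res):
    print(f"x = {x}: residual = {e:+.2f}")
```

Plotting these residuals against x is the usual way to judge whether they look randomly scattered. For a least-squares line they always sum to zero, so the pattern matters rather than the total.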
25
Calculate the residual variance
Result: The residual variance is 56.1366.
The residual SD, which the TI-83 reports, is
7.4924 (the square root of the residual variance).
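The relationship between the two results can be checked in a few lines of Python. The sketch assumes the usual convention that the residual variance is the residual sum of squares divided by n - 2. The small data set and the function name are hypothetical, and only the final check uses the values reported in the slides.

```python
import math


def residual_variance(xs, ys):
    """Residual sum of squares divided by n - 2 (assumed convention)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sse / (n - 2)


# Hypothetical (x, y) pairs.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

var = residual_variance(xs, ys)
sd = math.sqrt(var)
print(f"residual variance = {var:.4f}, residual SD = {sd:.4f}")

# Check on the values reported in the slides: SD = sqrt(variance).
slide_sd = math.sqrt(56.1366)
print(f"sqrt(56.1366) = {slide_sd:.4f}")
```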
27
Results of linear regression t test
29
fin
30