Bivariate Data
Bivariate Data: Scatter Plots and Correlation Coefficient……
Sections 3.1 and 3.2
2 Quantitative Variables……
We represent 2 quantitative variables by using a scatter plot.
Scatter Plot: a plot of ordered pairs (x, y) of bivariate data on a coordinate axis system. It is a visual or pictorial way to describe the nature of the relationship between 2 variables.
Input and Output Variables……
X:
a. Input variable
b. Independent variable
c. Controlled variable
Y:
a. Output variable
b. Dependent variable
c. Results from the controlled variable
Example……
When dealing with height and weight, which variable would you use as the input variable and why?
Answer:
Height would be used as the input variable because weight is often predicted based on a person’s height.
Constructing a scatter plot……
Do a scatter plot of the following data:
Age (independent variable) | Blood Pressure (dependent variable)
43 | 128
48 | 120
56 | 135
61 | 143
67 | 141
70 | 152
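To check a hand-drawn plot without a calculator, the short Python sketch below places one `*` per ordered pair (x, y) on a character grid. It is only an illustration: the function name `text_scatter`, the grid size, and rounding each point to the nearest cell are assumptions of this sketch; a plotting library such as matplotlib (`pyplot.scatter`) would draw a proper graph.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a rough text scatter plot with one '*' per (x, y) pair.

    x grows to the right and y grows upward, like the usual coordinate axes.
    Points that round to the same cell share one '*'.
    """
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    x_span = (x_hi - x_lo) or 1  # avoid dividing by zero if every x is equal
    y_span = (y_hi - y_lo) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_lo) / x_span * (width - 1))
        row = round((y_hi - y) / y_span * (height - 1))  # row 0 is the top (largest y)
        grid[row][col] = "*"
    lines = ["|" + "".join(cells) for cells in grid]  # the "|" column is the y-axis
    lines.append("+" + "-" * width)                   # the bottom line is the x-axis
    return "\n".join(lines)


age = [43, 48, 56, 61, 67, 70]                   # independent variable (x)
blood_pressure = [128, 120, 135, 143, 141, 152]  # dependent variable (y)
plot = text_scatter(age, blood_pressure)
print(plot)
```

The stars climb from the lower left toward the upper right of the grid.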
What do we look for?......
A. Is it a positive correlation, negative
correlation, or no correlation?
B. Is it a strong or weak correlation?
C. What is the shape of the graph?
Answer……With TI
Notice……
Notice the following:
A. Strong Positive: as x increases, y also increases.
B. Linear: the points lie close to a straight line.
Example 2……By Hand
# of Absences (independent variable) | Final Grade (dependent variable)
6 | 82
2 | 86
15 | 43
9 | 74
12 | 58
5 | 90
8 | 78
Example 2……With TI
Notice……
Notice the following:
A. Strong Negative: as x increases, y decreases.
B. Linear: the points lie close to a straight line.
Example 3……By Hand
Hrs. of Exercise (independent variable) | Amt. of Milk (dependent variable)
3 | 48
0 | 8
2 | 32
5 | 64
8 | 10
5 | 32
10 | 56
2 | 72
1 | 48
Example 3……With TI
Notice……
Notice:
There seems to be no correlation between the hours of exercise a person performs and the amount of milk they drink.
Steps to see on Calculator……
Put x’s in L1 and y’s in L2 (STAT, 1:Edit).
Press 2nd, then Y= to open STAT PLOT.
Choose Plot1, turn it On, select the scatter plot type, and set Xlist to L1 and Ylist to L2.
Press ZOOM 9 (ZoomStat) or set your own window, and then press GRAPH.
Linear Correlation
Section 3.2
Correlation……
Definition: a statistical method used to determine whether a relationship exists between variables.
3 Types of Correlation:
A. Positive
B. Negative
C. No Correlation
Positive Correlation: as x increases, y
increases or as x decreases, y decreases.
Negative Correlation: as x increases, y
decreases.
No Correlation: there is no relationship
between the variables.
Linear Correlation Analysis ……
Primary Purpose: to measure the strength of the linear relationship between the variables.
*This is a test question!!!!
Coefficient of Linear Correlation
The numerical measure of the strength and the direction of the linear relationship between 2 variables.
This number is called the correlation coefficient.
The symbol used to represent the correlation coefficient is “r.”
The range of “r” values……
The range of the correlation coefficient is -1
to +1.
The closer r is to 0, the weaker the linear correlation; the closer r is to -1 or +1, the stronger it is.
Range……
-1 (Strong Negative) <---------- 0 (No Linear Relationship) ----------> +1 (Strong Positive)
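The sketch below turns a value of r into the kind of description used in these examples. The cutoffs 0.8 (strong), 0.5 (moderate), and 0.3 (weak) are illustrative assumptions, not values from this lesson; textbooks draw these lines in different places, and the function name `describe_r` is hypothetical.

```python
def describe_r(r, strong=0.8, moderate=0.5, weak=0.3):
    """Describe a correlation coefficient in words.

    The default cutoffs are assumptions for illustration only.
    """
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # strength depends only on the distance from 0
    if size < weak:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"  # direction comes from the sign
    if size >= strong:
        strength = "strong"
    elif size >= moderate:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"


for value in (0.61, 0.897, -0.944, 0.067):
    print(value, "->", describe_r(value))
```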
Computational Formula using z-scores of x and y……
$$ r = \frac{\sum z_x z_y}{n - 1} \qquad \text{where} \qquad z = \frac{\text{value} - \text{mean}}{\text{st. deviation}} $$
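The formula translates directly into code. This minimal Python sketch (the function names `z_scores` and `correlation_r` are made up for illustration) uses the sample standard deviation, which divides by n - 1, to match the n - 1 in the denominator of r. It assumes neither list is constant, since a standard deviation of 0 would make z undefined.

```python
from statistics import mean, stdev


def z_scores(values):
    """z = (value - mean) / st. deviation, using the sample st. dev. (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation_r(xs, ys):
    """r = (sum of z_x * z_y) / (n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("x and y need the same number of values (at least 2)")
    products = [zx * zy for zx, zy in zip(z_scores(xs), z_scores(ys))]
    return sum(products) / (len(xs) - 1)


r = correlation_r([2, 5, 1, 4, 2], [80, 80, 70, 90, 60])  # Example 1 data
print(round(r, 2))
```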
Example 1……
Find the correlation coefficient (r) of the following example. Use the lists in the calculator.
x | y
2 | 80
5 | 80
1 | 70
4 | 90
2 | 60
Find mean and st. dev first……
Since you will be using a formula that uses z-scores, you will need to know the mean and standard deviation of the x and y values.
Put x’s in L1.
Put y’s in L2.
Run STAT, CALC, 1-Var Stats L1 and write down the mean (x̄) and the sample standard deviation (Sx).
Run STAT, CALC, 1-Var Stats L2 and write down the mean and Sx.
Write down on your paper……You’ll
use them later.
X Values:
Mean = 2.8
St. Dev = 1.643167673
Y Values:
Mean = 76
St. Dev = 11.40175425
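Python's standard `statistics` module can stand in for 1-Var Stats when checking these numbers. In this sketch `stdev` gives the sample standard deviation (the calculator's Sx), which reproduces the values above; `pstdev` gives the population version (σx) and is printed only for contrast.

```python
from statistics import mean, pstdev, stdev

x = [2, 5, 1, 4, 2]
y = [80, 80, 70, 90, 60]

x_mean, x_sd = mean(x), stdev(x)  # stdev divides by n - 1, like the TI's Sx
y_mean, y_sd = mean(y), stdev(y)
print("X values: mean =", x_mean, " st. dev =", x_sd)
print("Y values: mean =", y_mean, " st. dev =", y_sd)

# pstdev divides by n (the TI's σx); the formula for r does NOT use this one
print("population st. dev of x (not used):", pstdev(x))
```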
Calculator Lists……
Set a formula for each new list:
L3 = (L1 - 2.8)/1.643167673
L4 = (L2 - 76)/11.40175425
L5 = L3 × L4
L1 (x) | L2 (y) | L3 (z of x) | L4 (z of y) | L5 (z of x times z of y)
2 | 80 | -0.4869 | 0.35082 | -0.1708
5 | 80 | 1.3389 | 0.35082 | 0.46971
1 | 70 | -1.095 | -0.5262 | 0.57646
4 | 90 | 0.7303 | 1.2279 | 0.89672
2 | 60 | -0.4869 | -1.403 | 0.68321
Sum of L5: 2.455298358
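To see what the calculator does with these list formulas, the sketch below rebuilds L3, L4, and L5 in Python using the same typed-in constants. The list names only mirror the calculator's; nothing here is TI-specific.

```python
L1 = [2, 5, 1, 4, 2]        # x values
L2 = [80, 80, 70, 90, 60]   # y values

# Same list formulas as on the calculator, with the mean and st. dev. written down earlier
L3 = [(v - 2.8) / 1.643167673 for v in L1]   # z-scores of x
L4 = [(v - 76) / 11.40175425 for v in L2]    # z-scores of y
L5 = [a * b for a, b in zip(L3, L4)]         # z of x times z of y

for row in zip(L1, L2, L3, L4, L5):
    print("{:>3} {:>4} {:>9.4f} {:>9.4f} {:>9.4f}".format(*row))
print("sum of L5 =", sum(L5))
```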
Calculate “r”……
From the lists: n = 5 and $\sum z_x z_y = 2.455298358$
$$ r = \frac{\sum z_x z_y}{n - 1} = \frac{2.455298358}{4} \approx 0.61 $$
What does that mean?
Since r = 0.61, the correlation is a moderate positive correlation.
Do we want to make predictions from this? It depends on how precise the answer needs to be.
Example 2……
Find the correlation coefficient (r) for the following data. Do you remember what we found from the scatter plot?
Age | Blood Pressure
43 | 128
48 | 120
56 | 135
61 | 143
67 | 141
70 | 152
Let’s do this one together……
Remember to use your lists in the calculator.
Don’t round numbers until your final answer.
Find the mean and st. dev. for x and y.
Explain what you found.
X Values:
Y Values:
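List values you should have……
n = 6
L1 (Age) | L2 (Blood Pressure) | L3 (z of x) | L4 (z of y) | L5 (z of x times z of y)
43 | 128 | -1.368 | -0.7458 | 1.0205
48 | 120 | -0.8965 | -1.448 | 1.2978
56 | 135 | -0.1415 | -0.1316 | 0.01863
61 | 143 | 0.33028 | 0.57031 | 0.18836
67 | 141 | 0.89647 | 0.39483 | 0.35395
70 | 152 | 1.1796 | 1.36 | 1.6042
Sum of L5: 4.483364073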
Compute “r”……
$$ r = \frac{\sum z_x z_y}{n - 1} = \frac{4.483364073}{5} \approx 0.897 $$
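The z-score formula has an equivalent form that skips the z lists. Write S_xx = Σ(x - x̄)², S_yy = Σ(y - ȳ)², and S_xy = Σ(x - x̄)(y - ȳ). Because s_x = sqrt(S_xx / (n - 1)) and s_y = sqrt(S_yy / (n - 1)), the quotient Σ z_x z_y / (n - 1) simplifies algebraically to r = S_xy / sqrt(S_xx · S_yy). The sketch below (the helper name `deviation_r` is hypothetical) uses that form on the age and blood pressure data as an independent check.

```python
from math import sqrt

age = [43, 48, 56, 61, 67, 70]
bp = [128, 120, 135, 143, 141, 152]


def deviation_r(xs, ys):
    """r = S_xy / sqrt(S_xx * S_yy), algebraically the same as sum(z_x * z_y) / (n - 1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = deviation_r(age, bp)
print(round(r, 3))
```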
Describe it……
Since r = 0.897, there is a strong positive correlation.
Example 3……
Find the correlation coefficient for the following data. Do you remember what we found from the scatter plot?
# of Absences | Final Grade
6 | 82
2 | 86
15 | 43
9 | 74
12 | 58
5 | 90
8 | 78
X Values:
Y Values:
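List Values you should have……
n = 7
L1 (# of Absences) | L2 (Final Grade) | L3 (z of x) | L4 (z of y) | L5 (z of x times z of y)
6 | 82 | -0.4898 | 0.53626 | -0.2626
2 | 86 | -1.404 | 0.7746 | -1.088
15 | 43 | 1.5673 | -1.788 | -2.802
9 | 74 | 0.19591 | 0.05958 | 0.01167
12 | 58 | 0.88158 | -0.8938 | -0.7879
5 | 90 | -0.7183 | 1.0129 | -0.7276
8 | 78 | -0.0327 | 0.29792 | -0.0097
Sum of L5: -5.66529102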
Compute “r”……
$$ r = \frac{\sum z_x z_y}{n - 1} = \frac{-5.66529102}{6} \approx -0.944 $$
Describe it……
Since r = -0.944, there is a strong negative correlation.
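Each entry in L5 is one point's contribution to the sum, so listing them shows which students pull r toward -1. This sketch recomputes the products for Example 3 (the helper name `z_products` is chosen here for illustration) and picks out the one that is largest in size.

```python
from statistics import mean, stdev

absences = [6, 2, 15, 9, 12, 5, 8]
grades = [82, 86, 43, 74, 58, 90, 78]


def z_products(xs, ys):
    """Return z_x * z_y for every (x, y) pair, i.e. the values in list L5."""
    mx, my, sx, sy = mean(xs), mean(ys), stdev(xs), stdev(ys)
    return [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]


products = z_products(absences, grades)
total = sum(products)
r = total / (len(products) - 1)

# A negative product means the point is above average in one variable and below in the other
biggest = max(range(len(products)), key=lambda i: abs(products[i]))
print(f"sum = {total:.4f}, r = {r:.3f}")
print("largest contribution:", absences[biggest], "absences,",
      grades[biggest], "grade,", round(products[biggest], 3))
```

Six of the seven products are negative, and the student with 15 absences and a grade of 43 contributes the most.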
Example 4……
Find the correlation coefficient of the following data. Do you remember what we found from the scatter plot?
Hrs of Exercise | Amt of Milk
3 | 48
0 | 8
2 | 32
5 | 64
8 | 10
5 | 32
10 | 56
2 | 72
1 | 48
X Values:
Y Values:
List Values you should have……
n = 9
L1 (Hrs of Exercise) | L2 (Amt of Milk) | L3 (z of x) | L4 (z of y) | L5 (z of x times z of y)
3 | 48 | -0.3015 | 0.30713 | -0.0926
0 | 8 | -1.206 | -1.476 | 1.7804
2 | 32 | -0.603 | -0.4062 | 0.24495
5 | 64 | 0.30151 | 1.0205 | 0.30768
8 | 10 | 1.206 | -1.387 | -1.673
5 | 32 | 0.30151 | -0.4062 | -0.1225
10 | 56 | 1.8091 | 0.66379 | 1.2008
2 | 72 | -0.603 | 1.3771 | -0.8304
1 | 48 | -0.9045 | 0.30713 | -0.2778
Sum of L5: 0.537689672
Compute “r”……
$$ r = \frac{\sum z_x z_y}{n - 1} = \frac{0.5376896717}{8} \approx 0.067 $$
Describe It……
Since r = 0.067, which is very close to 0, no linear correlation exists.
What is r²?
It is the coefficient of determination.
It is the percentage of the total variation in y that can be explained by the relationship between x and y.
A way to think of it: The value tells you how much
your ability to predict is improved by using the
regression line compared with NOT using the
regression line.
For Example……
If r² = 0.89, it means that 89% of the variation in y can be explained by the relationship between x and y.
It is a good fit.
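The “improved prediction” reading of r² can be checked numerically. Predicting every y by its mean leaves a total squared error S_yy; predicting with the least-squares regression line (slope b = S_xy / S_xx, intercept a = ȳ - b·x̄) leaves a smaller error SSE, and 1 - SSE / S_yy equals r². The sketch below does this for the age and blood pressure data from Example 2; the helper name `r_squared_two_ways` is hypothetical.

```python
from math import sqrt

age = [43, 48, 56, 61, 67, 70]
bp = [128, 120, 135, 143, 141, 152]


def r_squared_two_ways(xs, ys):
    """Return (r squared, 1 - SSE / S_yy); for a least-squares line they are equal."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)  # total variation: squared error when always guessing the mean
    r = sxy / sqrt(sxx * syy)

    slope = sxy / sxx                     # least-squares regression line y = a + b x
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # error left when using the line
    return r ** 2, 1 - sse / syy


r2, improvement = r_squared_two_ways(age, bp)
print(f"r^2 = {r2:.3f}; the line removes {improvement:.1%} of the squared error")
```

Here r² is about 0.804, so about 80% of the variation in blood pressure is explained by the linear relationship with age.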
Assignment……
Worksheet