PowerPoint Transcript

GRS LX 700
Language Acquisition
& Linguistic Theory
Week 6. Experiments,
methodology, the responsible use
of numbers, and paper structure.
Why we do experiments

In this context, we’re generally interested in the
state and developmental course of children’s
linguistic knowledge.



What does the child know?
To what extent does it differ from what an adult
knows?
We have logical, abstract reasons to believe that
a lot of what kids ultimately know about
language is not deduced from their language
input— but what evidence is there?
Universal Grammar

The poverty of the stimulus



Properties of “innateness”?




1 2 3 — what’s next?
1 2 3 5 — what’s next?
Independence from external evidence.
Universality?
Early emergence?
Constraints and the absence of negative evidence.


Candidate A: Who does Arnold wanna make breakfast for?
Who does Arnold wanna make breakfast?
Candidate B: Who does Arnold wanna make breakfast for?
*Who does Arnold wanna make breakfast?
Hypotheses


Where you have two hypotheses that make
different predictions, you use an experiment to
determine which predictions are actually borne
out.
A standard setup in child language studies is
pitting an experimental hypothesis (H1) such as
Children (Like Those We Are Testing) Know
Grammatical Constraint X against the null
hypothesis (H0) They Don’t.
This should be difficult



Experiments are naturally
subject to error. We’re
measuring things in the
real world.
We only want to reject the
null hypothesis if we’re
sure.
So, we want to “stack the
deck against” H1. Exclude
the possibility that kids
give the correct answers
for the wrong reasons.
                    H0 actually true    H0 actually false
Reject H0           Type I error        Correct
Do not reject H0    Correct             Type II error
Production and
comprehension



Two things to look at to
assess kids’ grammatical
knowledge.
Naturalistic production
(e.g., CHILDES
transcripts) is good for
some things, but not so
good for others.
If we’re interested in a
particular construction,
often this needs to be
elicited in an
experimental setting in
order to get enough
examples.


Non-production:
Constraint? Or
preference?
Test comprehension.



Act-out
Grammaticality
judgment
Truth value judgment
Elicited production

Wanna contraction







Who do you want to kiss?
Who do you wanna kiss?
I want to kiss Bill
I wanna kiss Bill
Who do you want to kiss Bill?
*Who do you wanna kiss Bill?
All signs point to the last
one being good. But it isn’t.
Do kids know this?

So: try to get kids to
say Who do you
wanna kiss Bill?

Great. Suppose they
don’t. Have we shown
that they know the
constraint?

Hint: No.
But I don’t wanna contract.


The fact that a kid never
says Who do you wanna
kiss Bill? doesn’t tell us
anything unless the kid
would have otherwise
contracted.
So we need controls in
order to determine
whether the kid
knows/uses wanna
contraction to begin with.


So: Try to get the kid to
say Who do you wanna
kiss? as well as trying to
get them to say Who do
you wanna kiss Bill?
If the kid never says
wanna, we have no
evidence of anything.

Also: Maybe the kid knows
all there is to know about
wanna contraction, but
prefers not to contract.
Here’s how it might be done.



Design the experiment to be
some kind of game, to keep
the kid interested and willing
to continue.
Use a puppet. Kids are more
willing to interact with the
puppet, more willing to
disagree with a puppet.

Though: Gordon (1996) relates a tale of
testing Kadiweu kids, who "had never
encountered puppets before and reacted
with a mixture of curiosity and fear that
often led to tears."
Ratty missed snacktime and
is probably hungry. Various
toy food items are arrayed in
the playspace.







E: The rat looks kind of hungry.
I bet he wants to eat
something. Ask him what.
C: What do you want?
R: Huh?
C: What do you wanna eat?
R: Is that pepperoni pizza over
there? I’ll have some of that.
E: I bet the rat wants someone
to brush his teeth for him. Ask
him who.
C: Who do you want to brush
your teeth?
Analyzing the results

Find the kids who:
Produced both subject and object questions
Produced wanna sometimes.

H1: Kids know that want to cannot contract to
wanna over a (subject) wh-trace.
H0: They don't.

Expectation given H0 would be that kids would
not distinguish subjects and objects; they would
be as likely to contract in one case as in the other.

If kids contract often with objects and never with
subjects, that points to H1. We could reject H0.
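To make the analysis concrete, here is a minimal sketch in Python of the tally this design calls for. The trial records and field names are invented for illustration, not real data. Under H0 the two rates should look alike; contraction with objects but (near-)zero contraction with subjects is the H1 pattern.

```python
from collections import defaultdict

# Invented trial records (child, question_type, contracted) --
# purely illustrative, not real data.
trials = [
    ("kid1", "object", True), ("kid1", "object", True),
    ("kid1", "subject", False), ("kid1", "subject", False),
    ("kid2", "object", True), ("kid2", "object", False),
    ("kid2", "subject", False),
]

# counts[child][qtype] = [number of trials, number contracted]
counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
for child, qtype, contracted in trials:
    counts[child][qtype][0] += 1
    counts[child][qtype][1] += int(contracted)

for child, by_type in counts.items():
    if "subject" not in by_type or "object" not in by_type:
        continue  # need both question types from this kid
    if sum(k for _, k in by_type.values()) == 0:
        continue  # kid never says wanna: no evidence of anything
    for qtype in ("object", "subject"):
        n, k = by_type[qtype]
        print(f"{child}, {qtype} questions: contracted {k}/{n}")
```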
Testing interpretation

Do kids assign the same meanings to a sentence
as adults? (More meanings? Fewer meanings?)
Constraints on meaning (e.g., Binding Theory)
The Truth Value Judgment task is a popular way
to approach this.

Pros:
Fun for the kids.
Minimal extra cognitive demands.
Gets at alternative meanings an act-out task
can't reliably exclude.

Con:
A trial takes a long time, not many data points
collected.
TVJ

The idea: Set up a
context by telling a story.
Provide a test sentence
which is either true or
false of the situation (and
have the puppet say it).
Kid then either agrees
with the puppet and
rewards it, or disagrees
and punishes it. If the
puppet is wrong, the kid
is asked “What really
happened?”

Also: Kids often like
the puppet to be right,
and will more readily
agree with the
puppet.

So: stack the deck
against H1, and have
adult-impossible
readings correspond
to “yes” responses.
Principle C, for example.

Jumping competition (Crain &
Thornton 1998)


This is a story about a jumping
competition. The judge is Robocop.
Last year he won the jumping
competition, so this year he gets to
be judge. This year, these guys,
Cookie Monster, the Troll, and
Grover are in the jumping
competition. They have to try and
jump over this log, the barrels, and
the benches over here.
R: The winner of the competition
gets a great prize: colored pasta!
See, it’s in this barrel right here.
R: Troll, you
jumped very
well. You didn’t
crash into
anything at all.
You could be
the winner. But
let me judge
Grover before I
decide.

Now Troll
winning is a
possibility.

R: Grover, your jumps
were very good, too.
You didn’t knock
anything down, and
you were also very fast.
So, I think you were the
best jumper. You win
the prize, this colored
pasta. Well done,
Grover. Great job!

Robocop will remain by
Grover as a reminder.
Against all odds



T: No, Robocop, you’re wrong! I am
the best jumper. I think I should get
the prize. I’m going to take some
colored pasta for myself.
K: Let me try to say what happened.
That was a story about Robocop,
who was the judge, and Cookie
Monster, and Grover, and there was
the Troll. I know one thing that
happened. He said that the Troll is
the best jumper.
C: No!! Bad Kermit. Eat this rag.

Yet "heᵢ said that Trollᵢ is the best jumper" is
true.
TVJ



Distinguishing meaning1
(disallowed by adult
constraint) and meaning2
(allowed by adult
constraint).
Test sentence should be
true on meaning1, false
on meaning2.
Child judges puppet’s
report to be true (reward)
or false (punishment).




Evidence for meaning1
should be acted out last.
Linguistic antecedent for
meaning1 should be
mentioned (by puppet)
last.
What really happened?
Ensure that the test
sentence is relevant; it
must be clear why it is
true or false (condition of
plausible dissent).
Experimental design

What we’re trying to
determine is the degree
to which variables in the
situation affect one
another.


Does a Principle C configuration preclude a
certain interpretation?
Does a subject wh-extraction preclude wanna
contraction?



Independent variables
are the presumed causal
variables.
Dependent variables are
the presumed caused
variables.
Nuisance or
confounding variables
are other factors that may
introduce systematic
“noise”
Between- and within-subjects

Between-subjects
designs vary
independent variables
with the subjects, so
each subject
represents one of the
values (levels) of the
independent variable.


Within-subjects
designs vary
independent variables
for each subject, so each
subject sees all of the
levels of the independent
variable.

Age, for example
(“Cross-sectional”).

Subject extraction and
object extraction, for
example.
Age, for another
(“Longitudinal”)
Task considerations


Ideally, we want
test items to be
distinguished by
just the factor
we’re looking at.
This is important
because other
things may play a
role and may
confound the
result.

If we find that kids are slower on:

Who did Pat say met Chris?

than on:

Who met Chris?

Can we conclude that it takes
more time to process a longer-distance
extraction?
Well, it could just take longer
because there are more words—
we need to rule that out if we want
to conclude that it has to do with
long-distance extraction.
Things that matter

Performing the task on the test item changes the
subject.
Present the same item to them, they'll remember,
it'll affect how they act.
Seeing a pattern in all of the items they get may
lead to an irrelevant strategy.

Items should include controls.
Ensure that the subjects are performing the task.
Rule out confounding variables.

Fillers should (often) be included.
Irrelevant items to mask the actual goal of the
experiment.

Items should be presented in different orders,
ruling out another confounding variable.
Things that matter

The instructions given
matter a lot. Is the task
clear?



Circle 1 for grammatical, 5
for ungrammatical.
Circle 1 if the sentence
sounds ok, 5 if you would
never use it.
Give the puppet a cookie if
his sentence makes sense,
and give him a rag if his
sentence is silly.

Practice with
feedback


To confirm that the
subjects understand the
task (and are
comfortable that they
do), run a couple of
practice trials.
Practice items should not
be test items. Should be
relatively easy.
Things that matter

Balance the responses



If a “no box” will be
considered to have scored
perfectly, there is a huge
uncontrolled confound.
If you are testing for
obliviousness to a
constraint, but
obliviousness would yield
all “yes” responses (or a
big preponderance),
subjects may start to
“second guess”
themselves.
Fillers/controls are for this.

Balance the items




There should be the same
number of items at each level of
your independent variable(s).
This maximizes the power of
statistical analysis later.
If a subject misses one, not a
huge problem, but design it as a
nice “square” if you can.
Balance the conditions


Eliminate confounds.
Lexical items

(Known? Frequent? Ambiguous?
Long?)
Conditions



At the outset, we need to define what we’re
going to test for.
Suppose we’re going to do a simple test of the
that-trace effect, with some kind of acceptability
judgment task (for adults, say).
The question is: are sentences that violate the
that-trace filter worse than those that don’t?


Who did John say that left?
Which capybara did Madonna meet on Mars?
Confounds


Controlling for confounds is one of the
most important things you have to do.
That-trace filter violations are not the only
things that differentiate these sentences.
Who did John say that left?
Which capybara did Madonna meet on Mars?

Confounds
Who did John say that left?
Which capybara did Madonna meet on Mars?





Differences in lexical frequency can have
a big effect on processing difficulty/time.
Differences in plausibility can have a big
effect on ratings from subjects.
Differences in length can conceivably play
a role.
Differences in structure can have an effect.
Confounds
Who did John say that left?
Which capybara did Madonna meet on Mars?


The point is: If you find that one sentence
is judged worse than the other, we’ve
learned nothing. We have no idea to what
extent the that-trace violation played a role
in the difference.
Confounds


You want to do everything you can to be testing exactly
what you mean to be testing for.
We can’t control frequency, familiarity, plausibility very
reliably—but we can control for them to some extent.



Who did John say that left?
Who did John say left?
Keep everything the same and at least they don’t differ in
structure, frequency, plausibility—only in that-trace.
(Well, and here, length).

However—note that length now works against that-trace, unless
shorter sentences are harder.
Conditions

To start, we might say we want to test two conditions:




Sentences with a that-trace violation
Sentences with no that-trace violation
But we can't build these without a length confound—
holding everything else constant, the version without
that still has one fewer word. How do we solve this?
How can we show that the effect of the extra word that
isn't responsible for the overall effect?
Conditions

The trick we’ll use is to have a second set
of conditions, testing only the exact length
issue. There’s no that-trace problem in
object questions, so we can compare:
Who did John say Mary met?
Who did John say that Mary met?


to see how the difference compares to:
Who did John say met Mary?
Who did John say that met Mary?

Factors

We now have two “factors”—our sentences differ in
terms of:



subject vs. object question
presence vs. absence of that
When we analyze the result, we can determine the
extent of the influence of the second factor by looking at
the object condition and comparing it to the
(disproportionately larger) effect of the presence of that
in the subject condition.
2x2 factorial design

                      without that                    with that
Subject extraction    Who do you think likes John?    Who do you think that likes John?
Object extraction     Who do you think John likes?    Who do you think that John likes?
Often this is drawn in a table, with each
factor on a different dimension.
This is known as a 2x2 factorial design.
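To spell out how such a design gets read, here is a small Python sketch (with invented mean ratings) of the difference-of-differences logic from the previous slide: the that effect in object questions estimates the pure cost of the extra word, and the extra size of the that effect in subject questions is the that-trace effect itself.

```python
# Invented mean ratings (1 = good, 4 = bad) for the four cells.
means = {
    ("subject", "without_that"): 1.4,
    ("subject", "with_that"):    3.5,
    ("object",  "without_that"): 1.3,
    ("object",  "with_that"):    1.6,
}

# Penalty for adding "that" within each extraction type.
that_cost_subject = means[("subject", "with_that")] - means[("subject", "without_that")]
that_cost_object = means[("object", "with_that")] - means[("object", "without_that")]

# The interaction: how much bigger the "that" penalty is under subject
# extraction than the word-count baseline seen under object extraction.
interaction = that_cost_subject - that_cost_object
print(round(that_cost_subject, 2), round(that_cost_object, 2), round(interaction, 2))
# 2.1 0.3 1.8 with these invented numbers
```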
Context




It turns out that the context also seems to have an effect
on people’s ratings of sentences.
What comes before can color your subjects’ opinions.
This too needs to be controlled for.
One aspect of this is that we generally avoid showing a
single subject two versions of the same sentence (more
relevant when they’re more unique than the John and
Mary sentences)—the reaction to the second viewing
may be based a lot on the first one.
Another is that you want to give the sentences in a
different order to different subjects.
Strategy


You also don’t want your subjects to “catch on” to what
you’re testing for—they will often see that they’re getting
a lot of sentences with a particular structure and start
responding to them based on their own theory of
whether the sentence should be good or not, no longer
performing the task.
Nor do you want to include people who seem to simply
have a crazy grammar (or more likely just aren’t
understanding or doing the task).
Fillers




The solution to both problems is traditionally to use
“fillers”, sentences which are not really part of the
experiment.
These can provide a baseline to show that a given
subject is behaving “normally” and can serve to obscure
the real “test items.”
There's no answer to "how many fillers should there be?"
but it shouldn't be fewer than the test items, and
probably a 2:1 (filler:test item) ratio is a good idea.
Fillers can’t be all good! About half should be bad.
Instructions and practice


Another vital aspect of this procedure is to
be sure that the subjects understand the
task that they are supposed to be
performing (and all in the same way).
The wordings of the instructions and the
rating scales are very important, and it’s a
good idea to give subjects a few “practice”
items before the test begins (clear cases
for which the answers are provided).
Instructions

“Is the sentence grammatical?” is not a good instruction.


The closest the naïve subject can come to “grammatical” will
probably be to evaluate based on prescriptive rules learned in
grammar classes—the term does not have the same meaning in
common usage.
“Is this a good sentence?” also has problems.


I’d never say that, I’d say it another way.
That could never happen.
Numerical / category ratings


How do you ask people to judge?
Good/bad


Forces a choice, for anything other than “certainly good” and
“certainly bad” there’s a chance that it doesn’t reflect the
subject’s actual opinion—no differentiation between “great!” and
“well, kind of ok”
Good/neutral/bad

Neutral also tends to get used for “I can’t decide” which is
different from “I’m confident it has an in-between status” (doesn’t
change much if you call it “in-between”)
Numerical / category ratings

Rate the sentence: (good) 1 2 3 4 5 (bad)


Some people will never use the ends of the
scale, likely to confound certainty with
acceptability. Also, for certain applications, “3”
is unusable.
Rate the sentence: (good) 1 2 3 4 (bad)

Can be treated as a categorial judgment, may
be able to factor out some personality
aspects. This is the one I tend to like best.
Online tasks


The nice thing about an online experiment is that it to some
extent takes things "out of their hands." The subject simply
reacts, and we time it.
Nevertheless, it is still important to ensure that the
subject is performing the task, paying attention.

This can often be addressed with questions about the
sentence afterward that subjects must answer.
Feedback can strengthen the motivation.

Descriptive, inferential

Any discussion of statistics anywhere (± a
couple) seems to begin with the following
distinction:

Descriptive statistics


Various measures used to describe/summarize an
existing set of data. Average, spread, …
Inferential statistics

Similar-looking measures, but aiming at drawing
conclusions about a population by examining a
sample.
Central tendency and
dispersion


A good way to summarize a set of
numbers (e.g., reaction times, test scores,
heights) is to ascertain a “usual value”
given the set, as well as some idea of how
far values tend to vary from the usual.
Central tendency:


mean (average), median, mode
Dispersion:

Range, variance (S²), standard deviation (S)
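A quick sketch of these summaries using Python's standard statistics module (the reaction times below are made up):

```python
import statistics

# Made-up reaction times (ms), just to illustrate the summaries.
rts = [540, 620, 575, 540, 700, 610, 540]

# Central tendency
print(statistics.mean(rts))    # mean (average)
print(statistics.median(rts))  # median
print(statistics.mode(rts))    # mode (540 here)

# Dispersion
print(max(rts) - min(rts))       # range
print(statistics.variance(rts))  # sample variance (S^2)
print(statistics.stdev(rts))     # sample standard deviation (S)
```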
Data points relative to the distribution: z-scores

Once we have the summary characteristics of a data set
(mean, standard deviation), we can describe any given
data point in terms of its position relative to the mean
and the distribution using a standardized score (the z-score).
The z-score is defined so that 0 is at the mean, -1 is one
standard deviation below, and 1 is one standard
deviation above:

zᵢ = (xᵢ − M) / S
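The same formula as code, continuing the sketch above (M and S are the sample mean and standard deviation):

```python
import statistics

rts = [540, 620, 575, 540, 700, 610, 540]
M, S = statistics.mean(rts), statistics.stdev(rts)

# z-score: how many standard deviations each point sits from the mean.
z_scores = [(x - M) / S for x in rts]
print([round(z, 2) for z in z_scores])
```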
Type I and Type II errors

As a reminder, as we evaluate data sampled
from the world to draw conclusions, there are
four possibilities for any given hypothesis:

The hypothesis is (in reality) either true or false.
We conclude that the hypothesis is true or false.

          Innocent       Guilty
Convict   Type I error   Correct
Acquit    Correct        Type II error

This leaves two outcomes that are correct, and
two that are errors.
Type I and Type II errors


The risk of making a Type
I error is counterbalanced
by the risk of making Type
II errors; being safer with
respect to one means
being riskier with respect
to the other.
One needs to decide
which is worse, what the
acceptable level of risk is
for a Type I error, and
establish a criterion— a
threshold of evidence that
is needed in order to
decide to convict.
          Innocent       Guilty
Convict   Type I error   Correct
Acquit    Correct        Type II error

You may sometimes encounter
Type I errors referred to as α errors,
and Type II errors as β errors.
Binomial / sign tests

If you have an experiment
in which each trial has two
possible outcomes (coin
flip, rolling a 3 on a die, kid
picking the right animal out
of 6), you can do a
binomial test.

Called a sign test if
success and failure have
equal probabilities (e.g.
coin toss)


Hsu & Hsu’s (1996) example: Kid
asked to pick an animal in
response to stimulus sentence.
Picking the right animal (of 6)
serves as evidence of knowing the
linguistic phenomenon under
investigation.
Random choice would yield a 1 out
of 6 chance of getting it right.
Success: probability .17.
Failure: probability 1 − .17 = .83.
Chances of getting it right 4 times
out of 5 by guessing = .0035.
Chances of getting it right all 5
times is .0001.
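The probabilities in the Hsu & Hsu example can be checked with a few lines of Python (standard library only; .17 is the rounded 1-in-6 chance from the slide):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.17  # chance of picking the right animal (1 of 6) by guessing

print(binom_pmf(4, 5, p))  # right 4 times out of 5: ~.0035
print(binom_pmf(5, 5, p))  # right all 5 times: ~.0001
```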
Hypothesis testing



Independent variable is one
which we control.
Dependent variable is the
one which we measure, and
which we hypothesize may be
affected by the choice of
independent variable.
Summary score: What we’re
measuring about the
dependent variable. Perhaps
number of times a kid picks the
right animal.

H0: The independent variable has
no effect on the dependent
variable.


A grammatically indicated animal
is not more likely to be picked.
H1: The independent variable
does have an effect on the
dependent variable.

A grammatically indicated animal
is more likely to be picked.
Hypothesis testing

H0: The independent variable has no effect on
the dependent variable.
A grammatically indicated animal is not more
likely to be picked.

H1: The independent variable does have an
effect on the dependent variable.
A grammatically indicated animal is more likely
to be picked.

If H0 is true, the kid has a 1/6th chance (0.17) of
getting one right in each trial.
So, given 5 tries, that's a 40% chance
(.40) of getting exactly one right.
But odds of getting 3 are about 3%
(0.03), and odds of getting 4 are
about .4% (0.0035).
So, if the kid gets 3 of 5 right, the
likelihood that this came about by
chance (H0) is slim.
=BINOMDIST(3, 5, 0.17, false)

Yields 0.03. 3 is number of
successes, 5 is number of tries, 0.17
is the probability of success per try.
True instead of false would give the
probability that at most 3 were
successes.
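For reference, a sketch of the same computation outside Excel, using SciPy's binomial distribution (assuming SciPy is available):

```python
from scipy.stats import binom

# Excel's =BINOMDIST(3, 5, 0.17, FALSE): P(exactly 3 of 5 right).
print(binom.pmf(3, 5, 0.17))  # ~0.03

# Excel's =BINOMDIST(3, 5, 0.17, TRUE): P(at most 3 right).
print(binom.cdf(3, 5, 0.17))

# For the test itself, the upper tail P(3 or more right) is arguably
# the number to compare against 0.05:
print(binom.sf(2, 5, 0.17))   # ~0.037, still below 0.05
```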
Criteria



In hypothesis testing, a
criterion is set for rejecting the
null hypothesis.
This is a maximum probability
that, if the null hypothesis were
true, we would have gotten the
observed result.
This has arbitrarily been
(conventionally) set to 0.05.

So, if the probability p
of seeing what we see
if H0 were true is less
than 0.05, we reject
the null hypothesis.

If the kid gets 3
animals right in 5 trials,
p=0.03 — that is,
p<0.05 so we reject the
null hypothesis.
Measuring things

When we go out into the world and measure something
like reaction time for reading a word, we’re trying to
investigate the underlying phenomenon that gives rise to
the reaction time.

When we measure reaction time of reading I vs. they, we
are trying to find out if there is a real, systematic
difference between them (such that I is generally faster).
Measuring things





Does it take longer to read I than they?
Suppose that in principle it takes Pat A ms to read I and
B ms to read they.
Except sometimes his mind wanders, sometimes he’s
sleepy, sometimes he’s hyper-caffeinated.
Does it take longer for people to read I than they?
Some people read/react slower than Pat. Some people
read/react faster than Pat.
Normally…


Many things we measure, with their noise
taken into account, can be described
(at least to a good approximation) by the
"bell-shaped" normal distribution.
Often as we do statistics, we implicitly
assume that this is the case…
Properties of the normal
distribution

A normal distribution can be described in terms of two
parameters.


μ = mean
σ = standard deviation (spread)
Interesting facts about the
standard deviation



About 68% of the observations will be within one
standard deviation of the population mean.
About 95% of the observations will be within two
standard deviations of the population mean.
For example, with mean 80 and standard deviation 5, a score of 75 is at the 15.9th percentile.
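That percentile can be reproduced with SciPy's normal distribution (a sketch, assuming a normal population):

```python
from scipy.stats import norm

# A score of 75 with mean 80 and sd 5 sits at z = (75 - 80) / 5 = -1,
# one standard deviation below the mean.
print(norm.cdf(75, loc=80, scale=5))  # ~0.159, the 15.9th percentile
```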
Inferential statistics



For much of what you’ll use statistics for, the
presumption is that there is a distribution out in the
world, a truth of the matter.
If that distribution is a normal distribution, there will be a
population mean (μ) and standard deviation (σ).
By measuring a sample of the population, we can try to
guess μ and σ from the properties of our sample.
A common goal

Commonly what we’re after is an answer to the
question: are these two things that we’re
measuring actually different?

So, we measure for I and for they. Of the
measurements we’ve gotten, I seems to be
around A, they seems to be around B, and B is a
bit longer than A. The question is: given the
inherent noise of measurement, how likely is it
that we got that difference just by chance?
So, more or less, …


If we knew the actual mean of the variable
we’re measuring and the standard
deviation, we can be 95% sure that any
given measurement we do will land within
two standard deviations of that mean—
and 68% sure that it will be within one.
Of course, we can’t know the actual mean.
But we’d like to.
Estimating

If we take a sample of the population and
compute the sample mean of the measures we
get, that’s the best estimate we’ve got of the
population mean.


=AVERAGE(A2:A10)
To estimate the spread of the population, we use
a number related to the number of samples we
took and the variance of our sample.


=STDEV(A2:A10)
If you want to describe your sample (that is if you
have the entire population sampled), use STDEVP
instead.
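The same estimates in Python, for comparison with the Excel formulas (a sketch; the sample values are invented):

```python
import statistics

sample = [12.1, 9.8, 11.4, 10.2, 13.0]  # invented measurements

# Best estimate of the population mean: the sample mean (=AVERAGE).
print(statistics.mean(sample))

# Estimate of the population spread: the sample standard deviation,
# which divides by n - 1 (=STDEV).
print(statistics.stdev(sample))

# If the sample IS the whole population, divide by n instead (=STDEVP).
print(statistics.pstdev(sample))
```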
t-tests

Take a sample from the population and measure it.
Say you took n measurements.


Your hypotheses determine what you expect your
population mean to be if the null hypothesis is true.


Population estimates:
μM = AVERAGE(sample), σM = SQRT(VAR(sample)/n)
We're actually considering variability in the sample means
here—what is the mean mean you expect to get, and what is
the variance in those means?
You look at the distance of the sample mean from the
estimated population mean (of sample means) and
see if it’s far enough away to be very unlikely (e.g.,
p<0.05) to have arisen by chance.
t-tests


Does caffeine affect heart rate (example from Loftus & Loftus
1988)?
Sample 9 people, measure their heart rate pre- and post-caffeination.
The measure for each subject will be the
difference score (post − pre). This is a within-subjects design.

Estimate the distribution of sample means: μM = AVERAGE(B1:B10) = 4.44
σM = SQRT(VAR(B1:B10)/COUNT(B1:B10)) = 1.37
The t-score (like a z-score) is scaled (here, against the estimated standard
deviation), giving a measure of how "extreme" the sample mean was that
we found.
If the t-score (here 3.24) is higher than the criterion t (2.31,
based on "degrees of freedom" = n − 1 = 8) and desired α-level
(0.05), we can reject the null hypothesis: caffeine affects heart
rate.
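Here is the same within-subjects logic as a Python sketch. The difference scores are invented (not the Loftus & Loftus data), so the resulting t will not match the slide's 3.24; the procedure is the point.

```python
import statistics
from math import sqrt
from scipy.stats import ttest_1samp

# Invented post-minus-pre heart-rate difference scores for 9 subjects.
diffs = [3, 7, 5, 1, 6, 4, 8, 2, 4]
n = len(diffs)

mean_diff = statistics.mean(diffs)      # estimate of the mean of sample means
se = statistics.stdev(diffs) / sqrt(n)  # estimated standard error
t = mean_diff / se                      # how extreme is this sample mean?
print(t, "df =", n - 1)

# Equivalent one-liner: is the mean difference reliably non-zero?
print(ttest_1samp(diffs, 0))
```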
t-tests: 2 sample means

The more normal use of a t-test is to see if two sample
means are different from one another.




H0: μ1 = μ2
H1: μ1 > μ2
This is a directional hypothesis—we are investigating
not just that they are different, but that μ1 is more than
μ2.
For such situations, our criterion t score should be one-tailed.
We're only looking in one direction, and μ1 has to
be sufficiently bigger than μ2 to conclude that H0 is
wrong.
Tails

If we are taking as our alternative hypothesis (H1) that two means
simply differ, then they could differ in either direction, and so we'd
conclude that they differ if the one were far out from the other in
either direction. If H1 is that the mean will increase, then it is a
directional hypothesis, and then a one-tailed criterion is called for.
t-tests in Excel


If you have one set of data in column A,
and another in column B,
=TTEST(A1:A10, B1:B10, 1, type)
Type is 1 if paired (each row in column A
corresponds to a row in column B), 2 if
independently sampled but with equal
variance, 3 if independently sampled but with
unequal variance.
Paired is generally better at keeping variance
under control.
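If you work in Python instead, SciPy's equivalents of the three TTEST types look roughly like this (a sketch with invented scores; note that SciPy reports a two-tailed p by default):

```python
from scipy.stats import ttest_rel, ttest_ind

a = [4, 6, 5, 7, 6]  # invented condition-1 scores
b = [3, 5, 4, 6, 4]  # invented condition-2 scores

print(ttest_rel(a, b))                   # type 1: paired
print(ttest_ind(a, b, equal_var=True))   # type 2: independent, equal variance
print(ttest_ind(a, b, equal_var=False))  # type 3: independent, unequal variance
```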

ANOVA


Analysis of Variance (ANOVA), finding
where the variance comes from.
Suppose we have three conditions and we
want to see if the means differ.

We could do t-tests, condition 1 against
condition 2, condition 1 against condition 3,
condition 2 against condition 3, but this turns
out to be not as good (each additional test adds
another chance of a Type I error).
Finding the variance




The idea of the ANOVA is to divide up the total
variance in the data into parts (to “account for the
variance”):
Within group variance (variance that arises within a
single condition)
Between group variance (variance that arises between
different conditions)
ANOVA:

                 SS   df   MS   F      p       Fc
between groups   …    5    …    2.45   0.045   2.39
within groups    …    54   …
total
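A one-way ANOVA of this sort can be run with SciPy's f_oneway (a sketch; the three conditions' scores are invented):

```python
from scipy.stats import f_oneway

# Invented scores for three conditions.
cond1 = [5.1, 4.8, 5.5, 5.0]
cond2 = [5.9, 6.1, 5.7, 6.3]
cond3 = [5.2, 5.0, 5.4, 4.9]

# F compares between-group variance to within-group variance;
# a small p suggests at least one condition mean differs.
f_stat, p_value = f_oneway(cond1, cond2, cond3)
print(f_stat, p_value)
```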
Confidence intervals



As well as trying to decide if your observed sample is
within what you’d expect your estimated distribution
to provide, you can kind of run this logic in reverse as
well, and come up with a confidence interval:
Given where you see the measurements coming up,
they must be 68% likely to be within one standard
deviation of the mean, and 95% likely to be within
two, so the more measurements you have, the better
guess you can make.
A 95% CI like 209.9 < µ < 523.4 means “we’re 95%
confident that the real population mean is in there”.

=CONFIDENCE(0.05,
STDEV(sample),
COUNT(sample))
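Excel's CONFIDENCE returns the half-width of a normal-approximation interval; here is a sketch of the same arithmetic in Python (sample values invented; for small samples a t-based interval is more usual):

```python
import statistics
from math import sqrt
from scipy.stats import norm

sample = [212.0, 305.5, 410.2, 388.1, 267.3]  # invented measurements
alpha = 0.05

# Half-width, as =CONFIDENCE(0.05, STDEV(sample), COUNT(sample)) computes it.
half = norm.ppf(1 - alpha / 2) * statistics.stdev(sample) / sqrt(len(sample))

m = statistics.mean(sample)
print(f"{m - half:.1f} < mu < {m + half:.1f}")
```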
Correlation and Chi square



Correlation between two
measured variables is often
measured in terms of
(Pearson's) r.
If r is close to 1 or -1, the value
of one variable can predict
quite accurately the value of the
other.
If r is close to 0, predictive
power is low.

Chi-square test is
supposed to help us
decide if two
conditions/factors are
independent of one
another or not. (Does
knowing one help
predict the effect of
the other?)
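Both are one-liners in SciPy (a sketch; all numbers invented):

```python
from scipy.stats import pearsonr, chi2_contingency

# Pearson's r between two measured variables.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
r, p = pearsonr(x, y)
print(r, p)  # r near 1: one variable predicts the other well

# Chi-square test of independence on a 2x2 table of counts
# (say, question type by contraction).
table = [[12, 3],
         [7, 9]]
chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p)
```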