Transcript PowerPoint

GRS LX 865
Topics in Linguistics
Week 6. Statistics etc.
Update on our sentence
processing experiment…

Quick graph of reaction time per region

[Figure: reaction time (ms, 0–2500) by region (1–8), one line per condition: NP, I, He, John]
Update

Seems nice; there’s a difference in region
5 (where the NP, I, they, John were) and
also in region 6. From slowest to fastest,
John, he, NP, I.

Something like what we expected—but
wait…
Update

Two further things that this didn’t account for: different people read at different speeds, and I is a lot shorter than the photographer, so might it go faster?

To take account of people’s reading speeds, I tried average RT per character on the fillers.
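As a sketch of that normalization (with made-up subjects and numbers; the real fillers differ), average RT per character for each subject can be computed like this:

```python
# Hypothetical per-subject filler measurements:
# (reaction time in ms, word length in characters).
filler_rts = {
    "S1": [(420, 5), (610, 8), (350, 4)],
    "S2": [(900, 5), (1100, 8)],
}

def rt_per_char(measurements):
    """Total reading time divided by total characters read."""
    total_rt = sum(rt for rt, _ in measurements)
    total_chars = sum(length for _, length in measurements)
    return total_rt / total_chars

# One reading-speed estimate per subject, for factoring out later.
rates = {subj: rt_per_char(ms) for subj, ms in filler_rts.items()}
```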
Average RT per character was pretty much all over the map, so at least it seemed worth factoring out.

[Figure: per-subject average RT per character, scattered widely between roughly 0.5 and 4]
Items?

It’s also important to look at the items. Were any always incorrect? Those might have been too hard or had something else wrong with them. (Not clear that we actually care whether the answer was right.)

[Figure: proportion correct (0–1) per item, items 1–19]
End result so far?

So, taking that all into account, I ended up
with this… Not what we were going for.
[Figure: adjusted reaction time (ms, −500–2500) by region (1–8), one line per condition: np, I, they, name]
So…



There’s still work to be done. Since I’m not sure exactly what work that is, once again… no lab work to do. Instead, we’ll talk about statistics generally.

Places to go:
http://davidmlane.com/hyperstat/
http://www.stat.sc.edu/webstat/

Measuring things

When we go out into the world and measure
something like reaction time for reading a word,
we’re trying to investigate the underlying
phenomenon that gives rise to the reaction time.

When we measure reaction time of reading I vs. they, we are trying to find out if there is a real, systematic difference between them (such that I is generally faster).
Measuring things

So, suppose for any given person, it takes
A ms to read I and B ms to read they.

If our measurement worked perfectly, we’d
get A whenever we measure for I and B
whenever we measure for they.

But it’s a noisy world.
Measuring things

Measurement never works perfectly.

There is always additional noise of some kind or
another. You’re likely to get a value near A when
you measure I, but you’re not guaranteed to get
A.

Similarly, there are differences between
subjects, differences between items, differences
of still other sorts…
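A quick way to see this is to simulate it; a minimal sketch (the “true” value A and the noise level are made up):

```python
import random

random.seed(0)  # for reproducibility

TRUE_RT = 400.0   # hypothetical "true" reading time A, in ms
NOISE_SD = 50.0   # hypothetical measurement noise (standard deviation)

# Each measurement is the true value plus random noise, so any single
# reading lands *near* A but almost never exactly on it.
measurements = [random.gauss(TRUE_RT, NOISE_SD) for _ in range(1000)]

# Averaging many noisy measurements gets us back close to A.
mean_rt = sum(measurements) / len(measurements)
```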
A common goal

Commonly what we’re after is an answer to the
question: are these two things that we’re
measuring actually different?

So, we measure for I and for they. Of the
measurements we’ve gotten, I seems to be
around A, they seems to be around B, and B is a
bit longer than A. The question is: given the
inherent noise of measurement, how likely is it
that we got that different just by chance?
Some stats talk

There are two major uses for statistics: describing a set of data in some comprehensible way, and drawing inferences from a sample about a population.

The second is the useful one for us; by picking some random, representative sample of the population, we can estimate characteristics of the whole population by measuring things in our sample.
Normally…


Many things we measure, with their noise taken into account, can be described (at least to a good approximation) by the bell-shaped normal distribution. Often as we do statistics, we implicitly assume that this is the case…
First some descriptive stuff

Central tendency: what’s the usual value for this thing we’re measuring? There are various ways to get at it; the most common is the arithmetic mean (“average”).

The average is determined by adding up the measurements and dividing by the number of measurements.
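In code, that’s just a sum and a division; for example, with some made-up reaction times:

```python
def mean(xs):
    """Arithmetic mean: add up the measurements, divide by how many there are."""
    return sum(xs) / len(xs)

rts = [310, 450, 290, 520, 430]  # hypothetical reaction times (ms)
avg = mean(rts)                  # 2000 / 5 = 400.0
```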
Descriptive stats

Spread: how often is the measurement right around the mean? How far out does it get?

The range (maximum − minimum) is a fairly basic measure. Variance and standard deviation are more sophisticated measures of the width of the measurement distribution.

You describe a normal distribution in terms of two parameters: mean and standard deviation.
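These measures of spread are easy to compute; a sketch using Python’s statistics module (with the same kind of made-up reaction times):

```python
import statistics

rts = [310, 450, 290, 520, 430]  # hypothetical reaction times (ms)

rt_range = max(rts) - min(rts)     # range: 520 - 290 = 230
rt_var = statistics.variance(rts)  # sample variance (divides by n - 1)
rt_sd = statistics.stdev(rts)      # standard deviation: sqrt of the variance
```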
Interesting facts about stdev



About 68% of the observations will be within one standard deviation of the mean, and about 95% will be within two standard deviations.

For example, with a mean of 80 and a standard deviation of 5, a score of 75 (one standard deviation below the mean) sits at about the 15.9th percentile.
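That percentile comes from the normal cumulative distribution function, which can be evaluated via math.erf; a sketch:

```python
import math

def normal_percentile(score, mean, sd):
    """Percent of a normal distribution falling below `score`."""
    z = (score - mean) / sd                              # standard score
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))  # normal CDF

# A score of 75 with mean 80 and stdev 5 is one sd below the mean:
p = normal_percentile(75, 80, 5)  # about 15.9
```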
So, more or less, …


If we knew the actual mean of the variable
we’re measuring and the standard
deviation, we can be 95% sure that any
given measurement we do will land within
two standard deviations of that mean—
and 68% sure that it will be within one.
Of course, we can’t know the actual mean.
But we’d like to.
Confidence intervals



It turns out that you can run this logic in reverse as well, coming up with a confidence interval (I won’t tell you precisely how, but here’s the idea): the measurements you see are 68% likely to fall within one standard deviation of the true mean and 95% likely to fall within two, so the more measurements you have, the better a guess you can make about where that mean lies.

A 95% CI like 209.9 < µ < 523.4 means “we’re 95% confident that the real mean is in there.”
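One common recipe for a 95% interval, sketched with made-up data (this is the normal approximation; with this few observations a t-based interval would be a bit wider):

```python
import math
import statistics

measurements = [420, 380, 510, 460, 390, 440, 475, 405]  # hypothetical RTs (ms)

n = len(measurements)
m = statistics.mean(measurements)
se = statistics.stdev(measurements) / math.sqrt(n)  # standard error of the mean

# 95% of a normal distribution lies within 1.96 standard errors.
low, high = m - 1.96 * se, m + 1.96 * se
```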
Hypothesis testing




Testing to see if the means generating two distributions are actually different. The idea is to determine how likely it is that we could get the difference we observe by chance. After all, you could roll 25 sixes in a row; it’s just very unlikely: (1/6)^25. (The null hypothesis is that it’s all just chance.)

Once you estimate the sample means and standard deviations, this is something you basically look up (a t-test, based on the number of observations you made). This is what you see reported as p. “p < 0.05” means there’s only a 5% chance this happened by accident.
Significance


Generally, 0.05 is taken to be the level of “significance”: if the difference you measure has only a 5% chance of having arisen by pure accident, then that difference is significant.

There’s no real magic about 0.05; it’s just a convention. It’s hard to say that 0.055 and 0.045 are seriously qualitatively different.
ANOVA


Analysis of variance: the same idea as the t-test, except for more than two means at once. We’re still trying to discover whether there are differences in the underlying distributions of several means that are unlikely to have arisen just by chance.

I hope to come back to this. Perhaps it can be tacked on to a different lab.
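The F statistic behind a one-way ANOVA compares variation between group means to variation within groups; a self-contained sketch with made-up data for three conditions:

```python
import statistics

# Hypothetical RTs for three conditions (more than two means at once).
groups = {
    "np":   [430, 445, 420],
    "I":    [400, 395, 410],
    "they": [460, 455, 470],
}

def one_way_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    data = [x for g in groups.values() for x in g]
    grand = statistics.mean(data)
    k, n = len(groups), len(data)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2
                     for g in groups.values())
    ss_within = sum((x - statistics.mean(g)) ** 2
                    for g in groups.values() for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

f = one_way_f(groups)  # large F -> group means unlikely to be identical
```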
Statistical power


In general, the more samples you get, the better off you are: the more statistical power your analysis has. Power also depends on the variance (lower is better) and on the significance level you’ve chosen.

Technically, statistical power is how likely it is that you will correctly reject a false null hypothesis.
                    H0 true         H0 false
Reject H0           Type I error    Correct
Do not reject H0    Correct         Type II error
Correlation and Chi square



Correlation between two measured variables is often measured in terms of (Pearson’s) r. If r is close to 1 or -1, the value of one variable can predict the value of the other quite accurately. If r is close to 0, predictive power is low.
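Pearson’s r is straightforward to compute by hand; a sketch with made-up word-length and RT pairs:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

word_length = [1, 4, 7, 10]     # hypothetical word lengths (characters)
rt = [350, 420, 500, 560]       # hypothetical reaction times (ms)
r = pearson_r(word_length, rt)  # close to 1: one variable predicts the other well
```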

The chi-square test is supposed to help us decide whether two conditions/factors are independent of one another or not. (Does knowing one help predict the effect of the other?)
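The chi-square statistic for a table of counts compares the observed cells to the counts expected under independence; a sketch with a made-up 2×2 table:

```python
# Hypothetical counts: correct vs. incorrect answers in two conditions.
table = [[30, 10],
         [20, 20]]

def chi_square(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

chi2 = chi_square(table)
# Compare against the chi-square distribution with
# (rows - 1) * (cols - 1) degrees of freedom to get p.
```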
Much more to it…

Mainly I just wanted you to see some
terminology. I hope to get some workable
data from some experiment or lab we do
that we can put into a stats program,
perhaps just WebStat.
