Descriptive statistics 2012_13


Descriptive
statistics
922
What do we need to run an
experiment?
Hypothesis (Linguistic)
 Participants
 Task (stimuli = questions, responses = answers)
 Results
 Conclusions


Key terms: stimulus design, response measure
Example


Show me the cat that
bit the dog
Show me the cat that
the dog bit
Picture from:
Friedmann & Novogrodsky (2001)
Design
Number of conditions
 Within subject / between subject
 How many items to each participant
 Order of items

Measure Response
Variables
 Scales
 Analysis

 Descriptive
 Inferential
Variables
Any experimental category that has a
value that can vary.
 Anything that is not constant and can
change over time, or be different in
different people is a variable
 Variables can take many forms
 Variables can be manipulated and
observed

Properties of Variables
Continuous variable – along a continuum
with equal intervals (e.g., age, height,
weight, grade in a test)
 Ordinal variables – rating along a
continuum with estimated intervals (e.g.,
evaluation)
 Discrete variables (categorical, nominal) –
divide to categories (e.g., language,
yes/no, correct/incorrect)

Types of Variables
Independent variables –
Characteristics of the subject (Participant
variable)
Conditions chosen by the experimenter
 Dependent variables – what the experiment
measures (e.g., degree of success)
 Intervening variables – variables which are
not measured or manipulated, but could
influence the results (e.g., concentration,
intelligence)

Scales
Nominal – two things with the same number are similar (same name)
 Ordinal – four is more than three (but the step from three to four is not necessarily the same as the step from two to three)
 Interval – four is more than three, and the step from three to four is the same as the step from two to three (but four is not twice two)
 Ratio – four is more than three, the step from three to four is the same as the step from two to three, and four is twice two

Which scale are the following
variables rated on?









Height
Celsius degrees
TV channel number
Grades in an exam (1-100)
Psychological rating (anxiety on a scale of 1-10)
Time (13:00, 14:00)
Time (one hour, two hours, three hours)
Phone number
Rating places in a race
Variables and Scales: summary
Choose an appropriate task
 Measure responses
 Be aware of the variables and their
properties
 Choose the mathematical operations
appropriate for the scale

Factorial design
Tests all possible combinations, e.g., a 2x2
design – one participant variable and one
independent variable with two conditions.
        Subject relatives   Object relatives
TLD
SLI
Practical questions for offline
tasks
How many subjects? At least 25
 How many categories? 2x2
 How many items? More subjects >> fewer
items.

For 25 – 6 items per category
 For 50 – 3 is enough
 For case studies and within subject analysis
at least 10.

SIMPLE NUMERICAL
COMPUTATIONS
Ratio

The relation between
two nominal variables
              N
Nouns         80
Verbs         60
Other words   50
Total         190

V/N ratio: 60/80 = 3/4
N/V ratio: 80/60 = 4/3
Example

Goofy said that the
Troll had to put two
hoops on the pole to
win.
Does the Troll win?

Musolino (2004)

Ratio
                N
Yes             8
No              12
Didn’t answer   10
Total           37

Yes/no ratio: 8/12 = 2/3
Proportion
Relation between a group and its part
(Verb/Word, Pronouns/Subject position).
Ratio out of the total
 Verb/Word proportion: 60/190 ≈ 0.32 (roughly 1/3)

Percentage (%)
Relative proportion out of a hundred
 Verb percentage (out of all words):
100*(60/190) ≈ 32%

Rate
The relative frequency (for a population, out of 1000)
 7% of children have SLI
 >> 0.07 * 1000 = 70
 70 children out of 1000 have SLI
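The four simple computations above (ratio, proportion, percentage, rate) can be sketched in a few lines of Python. This is only an illustration: it reuses the word counts and the 7% figure from the slides, and the variable names are made up for the example.

```python
from fractions import Fraction

# Word counts taken from the slide's table.
counts = {"nouns": 80, "verbs": 60, "other": 50}
total = sum(counts.values())  # all words

# Ratio: one category compared with another category.
verb_noun_ratio = Fraction(counts["verbs"], counts["nouns"])

# Proportion: one category compared with the whole group.
verb_proportion = counts["verbs"] / total

# Percentage: the proportion expressed out of a hundred.
verb_percentage = 100 * verb_proportion

# Rate: the proportion expressed out of a thousand (7% prevalence from the slide).
sli_rate_per_1000 = round(0.07 * 1000)

print("V/N ratio:", verb_noun_ratio)
print("Verb proportion:", round(verb_proportion, 2))
print("Verb percentage:", round(verb_percentage, 1))
print("SLI rate per 1000:", sli_rate_per_1000)
```

Using `Fraction` keeps the ratio in its reduced form, which matches how ratios are usually reported; the proportion and percentage are rounded only when printed.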
Frequency
Count the number
of times a score
occurs.
 How many times a
value of a variable
occurs?

Example


Show 10 pictures,
and count the number
of “correct” responses
Is every bunny eating
a carrot?
Roeper, Strauss and Zurer
Pearson (2004)
Picture   Correct
1         1
2         1
3         0
4         0
5         0
6         0
7         1
8         1
9         1
10        1
Total     6
Frequency

Count the number
of times a score
occurs
Child   Score
1       8
2       8
3       6
4       6
5       6
6       6
7       2
8       2
Frequency
Raw score
Child   Score
1       8
2       8
3       6
4       6
5       6
6       6
7       2
8       2

Frequency
Score   Frequency
2       2
6       4
8       2

Frequency = how many children got this score
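Turning raw scores into a frequency table is a simple counting step. A minimal sketch, using the eight children's scores from the slide and Python's standard `Counter`:

```python
from collections import Counter

# Scores of children 1-8, as listed on the slide.
scores = [8, 8, 6, 6, 6, 6, 2, 2]

# Count how many times each score occurs.
frequency = Counter(scores)

print("Score  Frequency")
for score in sorted(frequency):
    print(f"{score:<6} {frequency[score]}")
```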
Frequency graph


Score on the test is
the horizontal axis
(X-axis)
Frequency is on the
vertical axis (Y-axis)
Percentile
Grade     Frequency   Cumulative frequency   Percentile
100       2           30                     100%
90        5           28                     93%
80        10          23                     77%
70        8           13                     43%
60        4           5                      17%
50        1           1                      3%
Total N   30

The cumulative frequency - how many scores are at or below a particular point in the distribution
Percentile = 100(Cumulative Frequency/Total N)
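The percentile column can be reproduced from the frequencies alone. The sketch below takes the cumulative frequency to be the number of scores at or below each grade, which is what the table's figures correspond to; the variable names are illustrative.

```python
from itertools import accumulate

# Grades in ascending order with their frequencies (from the slide).
grades = [50, 60, 70, 80, 90, 100]
frequencies = [1, 4, 8, 10, 5, 2]
total_n = sum(frequencies)

# Running total: number of scores at or below each grade.
cumulative = list(accumulate(frequencies))

# Percentile = 100 * (cumulative frequency / total N), rounded to whole percent.
percentiles = [round(100 * c / total_n) for c in cumulative]

for grade, cum, pct in zip(grades, cumulative, percentiles):
    print(grade, cum, f"{pct}%")
```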
Frequency polygon (the curve)
[Figure: frequency distribution of the grades. X-axis: Grade (50 to 100); Y-axis: N of students (0 to 12)]
The frequency polygon (the curve) is a picture of the data
Types of distributions (Fig. 4.3 & 4.4, pp. 113-116)
Peak
Tails
A bell-shaped curve - a symmetric, unimodal distribution (one midpoint, one peak), the normal distribution
Pointy distribution (Leptokurtic)
Flat distribution (Platykurtic)
In a skewed distribution the tail extends in one direction:
Positively skewed distribution - most scores are low, and the tail points towards the high (positive) scores
Negatively skewed distribution - most scores are high, and the tail points towards the low (negative) scores
Bimodal distribution - a double-peaked curve
Descriptive Statistics - Some
definitions
Min (the lowest score) and Max (the
highest score)
 Range – the range of observed values.
Range = Max-Min


But the range changes with the extreme
scores (unstable but useful informal
measure).
Mode - most frequently obtained score
 Mean (average) – the sum of a set of
numbers divided by how many numbers there are
 Median – the middle score of a group
(when odd) or the average of the two
middle scores (when even)
In a bell curve (normal) distribution mode,
mean and median will be the same

Mode
Grade   Frequency
50      1
60      4
70      8
80      10
90      5
100     2
total   30

Which grade is most
frequent?
Highest in “frequency”
column
Mean (average)
Grade   Frequency   Grade x Frequency
50      1           50
60      4           240
70      8           560
80      10          800
90      5           450
100     2           200
total   30          2300

Compute the sum of all grades: 2300
Divide by the number of grades: mean = 2300/30 ≈ 76.67
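The mode and the mean can both be read off the frequency table. A short sketch, with the grade table from the slides stored as a dictionary (the names are illustrative):

```python
# Grade -> frequency, as in the slide's table.
grade_frequency = {50: 1, 60: 4, 70: 8, 80: 10, 90: 5, 100: 2}

# Number of grades in total.
n = sum(grade_frequency.values())

# Mode: the grade with the highest frequency.
mode = max(grade_frequency, key=grade_frequency.get)

# Mean: sum of (grade x frequency) divided by the number of grades.
weighted_sum = sum(grade * freq for grade, freq in grade_frequency.items())
mean = weighted_sum / n

print("N:", n)
print("Mode:", mode)
print("Sum of all grades:", weighted_sum)
print("Mean:", round(mean, 2))
```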
Median
Grade   Frequency
50      1
60      4
70      8
80      10
90      5
100     2
total   30

Order all grades in a row according to value. We have a row of 30 grades:
50,60,60,60,60,70…
The grade in “the middle” of the row is the median.
Half of 30 is 15, so we look at the grade in the 15th position.
Slight complication: with an even number of grades there are 15 grades on
both sides of the middle, so no single grade sits exactly in the middle.
Compute the mean of the grades in the 15th and 16th positions.
Here both are 80, so the median is 80.
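The median procedure (order the grades, then take the middle one, or the mean of the two middle ones) can be checked with a short sketch. It expands the slide's frequency table into the full row of 30 grades and compares the manual result with `statistics.median` from the standard library.

```python
from statistics import median

# Grade -> frequency, as in the slide's table.
grade_frequency = {50: 1, 60: 4, 70: 8, 80: 10, 90: 5, 100: 2}

# Expand into an ordered row of individual grades: 50, 60, 60, 60, 60, 70, ...
ordered = [grade
           for grade, freq in sorted(grade_frequency.items())
           for _ in range(freq)]

# With 30 grades the middle falls between the 15th and 16th positions
# (indices 14 and 15, because Python counts from zero).
fifteenth, sixteenth = ordered[14], ordered[15]
manual_median = (fifteenth + sixteenth) / 2

print("15th and 16th grades:", fifteenth, sixteenth)
print("Median (manual):", manual_median)
print("Median (statistics.median):", median(ordered))
```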
Questions:
Are both curves the same? How?
Are they different? How?
We need to measure the accuracy of the mean.
(Figure from Hatch & Farhady 1982, p.56)
Variability
Coming attractions

How to draw valid statistical inferences?
 We have to look at the relation between our
sample and the population

Today we looked at where the ‘center’ of
the data is – what is the big picture
 Look at variance, how the data is distributed
Deviation
The distance between a score and the Mean (see Table 4.2,
p. 125), how much a score deviates from the average
Sum of squared errors (SS)
Variance
Average error in the sample, average error
in the population
 Variance of the sample itself = SS/N
33.7143/7=4.8163
 Variance of the population, estimated from the sample = SS/(N-1)
33.7143/6=5.6191
 Why N-1? Degrees of freedom (read box
4.5, page 129)

Standard deviation (SD)

The average distance between a score and the
Mean (square root of the Variance)
SD= √5.6191 = 2.37

What can SD tell us about the distribution (pointy
distribution vs. flat distribution)?
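The chain from deviations to SS, variance and SD can be followed step by step in code. Note the assumption: the seven scores below are a hypothetical data set, chosen only because it reproduces the figures quoted on the slides (mean 3.57, SS 33.7143). It is not claimed to be the content of Table 4.2.

```python
from math import sqrt

# HYPOTHETICAL scores (assumption): picked to match the slide's mean and SS,
# not copied from Table 4.2.
scores = [1, 2, 2, 3, 4, 5, 8]
n = len(scores)
mean = sum(scores) / n

# Deviation: the distance between each score and the mean.
deviations = [score - mean for score in scores]

# Sum of squared errors (SS): square each deviation and add them up.
ss = sum(d ** 2 for d in deviations)

# Variance of the sample itself (SS/N) and the estimate of the
# population variance (SS/(N-1)).
variance_of_sample = ss / n
variance_estimate = ss / (n - 1)

# Standard deviation: square root of the variance estimate.
sd = sqrt(variance_estimate)

print("Mean:", round(mean, 2))
print("SS:", round(ss, 4))
print("SS/N:", round(variance_of_sample, 4))
print("SS/(N-1):", round(variance_estimate, 4))
print("SD:", round(sd, 2))
```

A pointy distribution has scores bunched close to the mean and therefore a small SD; a flat distribution spreads the scores out and gives a larger SD.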
Standard Error (SE)
How well does the sample represent the
population?
 Different samples of the population might
yield different means. The SE is the
standard deviation of the means of
several samples. Large value - big
difference between sample means, small value - small difference.

SE = SD/√ N
Confidence Interval

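The limits within which the means of 95% or 99% of the
samples are expected to fall
Lower boundary = Mean-2SE
 Upper boundary = Mean+2SE
(2SE is the approximate multiplier for 95%; the 99% limits use about 2.58SE)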
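As a rough sketch of how SE and the confidence limits follow from SD and N, the example below plugs in the mean, SD and N quoted on the slides (3.57, 2.37 and 7). The multiplier 2 is the slides' approximation for the 95% limits.

```python
from math import sqrt

# Figures quoted on the slides.
mean = 3.57
sd = 2.37
n = 7

# Standard error: SD divided by the square root of N.
se = sd / sqrt(n)

# Approximate 95% confidence limits: mean plus or minus 2 SE.
lower = mean - 2 * se
upper = mean + 2 * se

print("SE:", round(se, 2))
print("Lower boundary:", round(lower, 2))
print("Upper boundary:", round(upper, 2))
```

With only seven scores the interval is wide; a larger N shrinks the SE and narrows the limits.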

Inferential
statistics
z-score and T-score

How can we use the standard deviation
(SD) to compare two samples? two
exams? two tests?
We translate the raw scores into distance in SD
from the mean, by subtracting the mean from the
raw score and dividing by the SD.
So for Table 4.2:
$z = \frac{1 - 3.57}{2.37} = -1.08$
$z = \frac{8 - 3.57}{2.37} = 1.87$
These scores are z-scores. Some z-scores are negative and some are
positive. Why?
If you prefer a scale with only positive
numbers, you can use the T-score
T score = 10 * z-score + 50
10 * -1.08 + 50 = 39.2
10 * 1.87 + 50 = 68.7
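The two conversions can be written as small functions. This is a sketch using the mean and SD quoted on the slides (3.57 and 2.37); the function names `z_score` and `t_score` are made up for the example.

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, measured in SDs."""
    return (raw - mean) / sd


def t_score(z):
    """Rescale a z-score to a scale with mean 50 and SD 10."""
    return 10 * z + 50


mean, sd = 3.57, 2.37  # figures quoted on the slides

for raw in (1, 8):
    z = z_score(raw, mean, sd)
    print(raw, round(z, 2), round(t_score(z), 1))
```

A score below the mean gives a negative z-score and a score above it gives a positive one, which is why the T-score shifts everything up by 50.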
A few words on Covariance and
Pearson correlation

Covariance - how much do two variables co-vary?
$\mathrm{Cov} = (X - \bar{X})(Y - \bar{Y})$

But we are interested in sets of scores, so we
need to sum up all the individual covariances and
divide, as always, by N-1.
$\mathrm{COV}_{xy} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{N - 1}$

What do we need covariance for? To measure
correlations (Pearson correlation coefficient is
considered the best way to estimate correlation
between X & Y).

Since the two samples do not have the same
SD, we must adjust the covariance to the
amount of variation
$r = \frac{\mathrm{COV}_{xy}}{SD_x \cdot SD_y}$
What does r mean?
Positive r - positive correlation
 Negative r - negative correlation
 Small r - small correlation
 Big r - big correlation
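The covariance and Pearson r formulas can be traced on a small example. The paired scores below are hypothetical (invented for illustration, not taken from the course data), and the SDs use the N-1 version, as in the covariance formula.

```python
from statistics import mean, stdev

# HYPOTHETICAL paired scores (assumption), e.g. two tests taken by five people.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x, mean_y = mean(x), mean(y)

# Sum the products of the paired deviations, then divide by N-1.
cross_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
cov_xy = cross_products / (len(x) - 1)

# Pearson r: covariance scaled by the two standard deviations
# (statistics.stdev also divides by N-1).
r = cov_xy / (stdev(x) * stdev(y))

print("Covariance:", cov_xy)
print("Pearson r:", round(r, 4))
```

Dividing by the two SDs removes the units of measurement, so r always lies between -1 and 1 regardless of the scales of X and Y.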

inferential statistics.xls
Effect size






We can use correlations to measure
experimental effect size
r² - the coefficient of determination - is the
fraction of the variance that is accounted for by a
linear correlation.
r=0.1 (small effect) - only 1% of the variance is
accounted for by our task (1%=.01=r²)
r=0.3 (medium effect) - 9% of the variance is
accounted for by our task (9%=.09=r²)
r=0.5 (large effect) - 25% of the variance is
accounted for by our task (25%=0.25=r²)
r=1 (a perfect effect) - all of the variance is accounted for
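The effect-size benchmarks are just r squared, expressed as a percentage. A brief sketch (the labels are the ones used on the slide):

```python
# Benchmark correlations and their labels, as on the slide.
benchmarks = {0.1: "small", 0.3: "medium", 0.5: "large", 1.0: "perfect"}

# Coefficient of determination: r squared, as a percentage of variance.
explained = {r: round(100 * r ** 2) for r in benchmarks}

for r, label in benchmarks.items():
    print(f"r={r} ({label} effect): {explained[r]}% of the variance")
```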
Probability




How probable is it to get a certain correlation?
How probable is it to get a certain score?
How probable is it to get a certain mean?
How probable is it that two samples are the
same/different?
 Playing "Heads or tails?"
 Throwing a die.

Probability can be calculated by dividing the
number of desired events by the number of
possible outcomes.
Or by relying on the SD
What is the probability of getting a score above the mean?
What is the probability of getting a score which is up to 1SD
above the mean? up to 1SD from the mean? (For every z-score there is a probability)
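Both routes to a probability can be sketched briefly: counting outcomes for the coin and the die, and using the normal curve for the SD-based questions. The second part assumes the scores are normally distributed and uses `statistics.NormalDist` from the Python standard library (available from Python 3.8).

```python
from fractions import Fraction
from statistics import NormalDist

# Counting route: desired events divided by possible outcomes.
p_heads = Fraction(1, 2)   # one desired side out of two
p_six = Fraction(1, 6)     # one desired face out of six

# SD route (assumes a normal distribution): work with z-scores,
# i.e. a normal curve with mean 0 and SD 1.
standard_normal = NormalDist(mu=0, sigma=1)

p_above_mean = 1 - standard_normal.cdf(0)
p_up_to_1sd_above = standard_normal.cdf(1) - standard_normal.cdf(0)
p_within_1sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print("Heads:", p_heads, " Six:", p_six)
print("Above the mean:", round(p_above_mean, 4))
print("Up to 1 SD above the mean:", round(p_up_to_1sd_above, 4))
print("Within 1 SD of the mean:", round(p_within_1sd, 4))
```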
Confidence Interval

The limits within which the means of 95% of the
samples are expected to fall
Lower boundary = Mean-2SE
 Upper boundary = Mean+2SE

Hypothesis testing
How likely is it (how probable is it) that our
hypothesis is right?
 The probability that some results could
happen by chance is less than 5% (or 1%)
 p<0.05 (or p<0.01) - the level of
significance





Null hypothesis - there is no difference between
our sample and the population
Positive hypothesis - the sample does better
than the population.
Negative hypothesis - the sample does worse
than the population
Alternative hypothesis - the sample is different
but there is no direction.
p<0.05
(Figures from Hatch & Farhady 1982, p.87)
p>0.05




If the data falls in the shaded area of Figure 8.5 - the
null hypothesis is retained (not rejected)
If the data falls in the shaded area of Figure 8.6 - the
null hypothesis is rejected
If the data falls in the shaded higher tail of Figure 8.6 - the scores are higher than the population and
the null hypothesis is rejected
If the data falls in the shaded negative tail of Figure 8.6
- the scores are lower than the population and
the null hypothesis is rejected
Since there is no direction specified by the
null hypothesis, we must consider both
tails - thus we use a two tailed test (with
.025 in each tail).
 If we test a directional hypothesis, the
level of significance applies to one tail
only.
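The difference between one-tailed and two-tailed tests can be illustrated with a z-score. This sketch rests on two assumptions: the test statistic follows a standard normal distribution, and the z value of 1.87 is a hypothetical result chosen for illustration. The function name `tail_probabilities` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1


def tail_probabilities(z):
    """Return (one-tailed p, two-tailed p) for a z-score."""
    one_tailed = 1 - standard_normal.cdf(abs(z))
    return one_tailed, 2 * one_tailed


# Critical z values at the 0.05 level of significance.
critical_two_tailed = standard_normal.inv_cdf(1 - 0.025)  # .025 in each tail
critical_one_tailed = standard_normal.inv_cdf(1 - 0.05)   # .05 in one tail

# HYPOTHETICAL observed z-score (assumption).
z = 1.87
p_one, p_two = tail_probabilities(z)

print("Critical z, two-tailed:", round(critical_two_tailed, 2))
print("Critical z, one-tailed:", round(critical_one_tailed, 2))
print("One-tailed p:", round(p_one, 3), "significant:", p_one < 0.05)
print("Two-tailed p:", round(p_two, 3), "significant:", p_two < 0.05)
```

The same z-score can reach significance under a directional hypothesis and miss it under a non-directional one, which is why the direction has to be fixed before looking at the data.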

(Figures from Hatch & Farhady 1982, p.88)
A score in the shaded area in 8.7 confirms the
_____________ hypothesis
A score in the shaded area in 8.8 confirms the
_____________ hypothesis