Statistics and correlation

Statistics
Why do we need statistics?
To describe data:
A. Average
B. Quartile (ACT tests)
C. Percentile
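A minimal sketch of the three descriptive measures above, using Python's standard statistics module. The scores are invented ACT-style scores, and percentile_rank is a hypothetical helper that uses one common definition (the percentage of scores strictly below a given score); other definitions treat ties differently.

```python
from statistics import mean, quantiles

# Invented ACT-style composite scores (not real data).
scores = [18, 21, 21, 24, 25, 27, 29, 30, 33, 35]

def percentile_rank(data, score):
    """Percentage of scores strictly below `score` (one common definition)."""
    below = sum(1 for s in data if s < score)
    return 100 * below / len(data)

average = mean(scores)                   # arithmetic mean
quartile_cuts = quantiles(scores, n=4)   # three cut points: Q1, median, Q3

print(average)                      # 26.3
print(quartile_cuts)                # [21.0, 26.0, 30.75]
print(percentile_rank(scores, 30))  # 70.0
```

quantiles() uses its "exclusive" method by default; spreadsheets and other packages may report slightly different quartiles for the same data.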
Why do we need statistics?
To find relationships
● Is use of 'like' related to age?
● Do people who learn a language earlier learn it better?
Why do we need statistics?
To test a hypothesis
● Is Shakespeare's vocabulary larger than the King James Bible's?
● Do men interrupt more than women?
Null Hypothesis
● Hypothesis: Women talk more than men
● Null hypothesis: There is no difference between women and men
● Hypothesis: Program X classifies parts of
speech more accurately than program Y
● Null hypothesis: There is no difference
between program X and Y
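One way to see what a null hypothesis does is a permutation test, sketched here in Python. If the null hypothesis ("there is no difference between women and men") is true, the group labels are arbitrary, so reshuffling them should often produce a difference as large as the one observed. The word counts are invented for illustration, and permutation_p is a hypothetical helper, not a library function.

```python
import random

def permutation_p(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Estimates how often a difference at least as large as the observed one
    would appear if the null hypothesis (no difference) were true.
    """
    rng = random.Random(seed)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign the group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # +1 in numerator and denominator counts the observed labeling itself
    return (extreme + 1) / (n_perm + 1)

# Invented words-per-conversation counts (not real data).
women = [520, 610, 480, 700, 650, 590, 630, 560]
men = [410, 450, 390, 500, 430, 470, 420, 460]
print(permutation_p(women, men))  # far below .05, so the null hypothesis is rejected
```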
Statistical Significance
● A difference is significant when the probability of getting it by
chance alone is less than one in twenty (p < .05). A difference that is
not significant could easily be the result of random chance.
● What if a test had only 4 multiple-choice questions, and only one
person took it, rolling dice to choose each answer? If that person took
the test many times this way, how often would they score 80% or better?
The probability is high (over 1/20). If 100 people took the test, the
chance of their average being 80% or better by rolling dice drops sharply
(below 1/20). If the test had 100 questions, the chance would also drop
sharply.
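The dice example can be checked with the binomial distribution. This sketch assumes each question has two answer choices, so a die roll (odd or even) picks the right one half the time; the slide does not say how many choices there are, and with more choices the chances shrink further. prob_at_least is a hypothetical helper. It treats 100 people taking the 4-question test as 400 independent guesses, so an average of 80% means at least 320 correct.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k_min, n, p=Fraction(1, 2)):
    """P(at least k_min correct out of n independent guesses), each correct with prob. p.

    Computed exactly with fractions so large n cannot overflow floats.
    """
    p = Fraction(p)
    total = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))
    return float(total)

one_person_short = prob_at_least(4, 4)       # 80% of 4 items means all 4 right (3/4 is only 75%)
many_people_short = prob_at_least(320, 400)  # 100 people x 4 items, average >= 80%
one_person_long = prob_at_least(80, 100)     # 100-item test, >= 80 right

print(one_person_short)   # 0.0625, i.e. 1/16, which is more than 1/20
print(many_people_short)  # far below 1/20
print(one_person_long)    # far below 1/20
```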
Statistical Significance
● Consider a commercial that claims that four out of five
dentists recommend toothpaste X. If only five dentists were
actually consulted, would you be impressed? Wouldn't you be
more motivated to buy it if 4,000 out of 5,000 dentists
recommended toothpaste X, even though 4/5 and 4,000/5,000
are both 80%? In the same way, statistical formulas take into
account factors such as the number of subjects, responses, and
test items when calculating statistical significance.
● In other words, 80% vs. 85% may not be a significant difference
if there are few test takers and few items, but 80% vs. 81% may be
significant if the test is long and many people take it. Statistics
takes this into account.
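Both points can be sketched in Python. For the dentists, the question is how likely 80% agreement would be if dentists actually had no preference (a 50/50 split, an assumption chosen for illustration). For the test scores, two_proportion_p implements the usual pooled two-proportion z-test (normal approximation); it treats every item response as independent, which is a simplification, since one person's answers are usually related. Both helpers are hypothetical, written for this example.

```python
from fractions import Fraction
from math import comb, erfc, sqrt

def prob_at_least(k_min, n, p=Fraction(1, 2)):
    """Exact binomial tail: P(X >= k_min) for X ~ Binomial(n, p)."""
    p = Fraction(p)
    return float(sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1)))

def two_proportion_p(p1, p2, n):
    """Two-sided p-value for proportions p1 and p2, each based on n responses
    (pooled normal approximation)."""
    pooled = (p1 + p2) / 2
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    z = abs(p1 - p2) / se
    return erfc(z / sqrt(2))  # equals 2 * (1 - Phi(|z|))

# Dentists: the same 80%, very different evidence.
p_5 = prob_at_least(4, 5)           # 0.1875: easily due to chance
p_5000 = prob_at_least(4000, 5000)  # essentially 0

# Scores: two groups, each 2 takers x 10 items = 20 responses
p_small = two_proportion_p(0.80, 0.85, 20)
# Two groups, each 1,000 takers x 100 items = 100,000 responses
p_large = two_proportion_p(0.80, 0.81, 100_000)
print(p_5, p_5000, p_small, p_large)
```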
Types of Data
Categorical
Gender: male or female
Country of origin: Korea, Canada, Brazil, France
Education: high school graduate or not
Ethnicity: Hispanic, Caucasian, Asian, Black, Polynesian
Childhood language background: monolingual, bilingual, multilingual
Pro-drop: subject pronoun used with verb, subject pronoun not used with verb
Language abilities of participant: native, non-native
Teaching method: total physical response, audiolingual, grammar translation
Which word is used for “large sandwich”: hoagie, subway, grinder, po' boy
Types of Data
Ordinal
The order in which children acquire certain morphemes.
The way a test participant orders a series of five recordings of
non-natives from “most fluent” to “least fluent.”
Types of Data
Continuous
● Age
● Number of years of formal schooling
● Months spent living in a foreign country
● Time required to recognize a word during an experiment
● Frequency of a formant
● Duration of consonant closure
● Hours spent sending text messages
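A common rule of thumb (not stated on the slides) is that the type of data limits which summary makes sense: counts and the mode for categorical data, the median for ordinal data, and the mean for continuous data. A short Python sketch with invented values:

```python
from collections import Counter
from statistics import mean, median, mode

# Invented example data, one list per type of data.
country_of_origin = ["Korea", "Canada", "Brazil", "Korea", "France", "Korea"]  # categorical
fluency_rank = [1, 2, 2, 3, 5, 4, 1]  # ordinal: ranks seven listeners gave one recording (1 = most fluent)
months_abroad = [3, 12, 24, 6, 18, 9]  # continuous

# Categorical: categories have no order, so count them and report the most frequent.
print(Counter(country_of_origin).most_common())
print(mode(country_of_origin))  # Korea

# Ordinal: order matters but the distances between ranks are not equal, so use the median.
print(median(fluency_rank))     # 2

# Continuous: differences are meaningful, so the mean is appropriate.
print(mean(months_abroad))      # 12
```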
Variables
Characteristics that change from situation to situation, object to object, or
person to person.
– Biographical variables (What kind are they?)
● age
● number of children
● ethnicity
● state of residence
● birth order among siblings
Dependent and Independent Variables
● What is the effect of X on Y?
– X is independent
– Y is dependent (you measure it)
Dependent and Independent Variables
Idea: People nowadays seem to use 'myself' much more often as the non-reflexive
object of a preposition (e.g. “as for myself”) instead of only as a reflexive.
Quantified question: What is the effect of
time (1950s, 1960s, etc.) on the use of 'myself'
as the non-reflexive object of a preposition?
What are the variables?
Variables: Time is a continuous independent variable and number of uses
of 'myself' as the object of a preposition is a continuous dependent
variable.
Dependent and Independent Variables
Idea: It seems that women always outnumber
men in foreign language classes.
Quantified question: What is the effect of
gender on enrollment in foreign language
classes?
What are the variables?
Variables: Gender is the categorical independent
variable and number of students enrolled is the
continuous dependent variable.
Dependent and Independent Variables
Idea: I wonder if daily consumption of greasy
American-style fast food is likely to shorten
my life?
Yes
Correlation
Question answered: What is the relationship
between two variables?
Type of variables used: Both continuous.
Correlation
Examples:
1. How are second language proficiency and degree of cultural adaptation related?
2. What is the relationship between vowel backness and the perceived size of an
object named by a nonce word with back (or front) vowels?
3. How is word frequency related to the amount of time required to name a word?
4. How does income relate to happiness?
Correlation
Do southerners who move away from the South shift from the
Southern pronunciation [a] toward [aɪ] over time?
Correlation
Speaker   % [a]   Years Away
1         98      1
2         82      1
3         99      2
4         65      3
5         90      3
6         85      5
7         75      5
8         50      5
9         75      6
10        55      6
11        85      7
12        70      8
13        30      8
14        55      9
15        80      9
16        25      10
Correlation
● Line slopes down = negative correlation
Correlation
● What is the effect of education on income?
● Line slopes up = positive correlation
Correlation
● What does this correlation tell you?
● Is it positive or negative?
Correlation coefficient
● Called r
● Ranges from +1 to -1
● Shows direction of correlation (negative or positive)
● Shows strength of correlation
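The usual formula for r (Pearson's product-moment correlation) divides how much two variables vary together by how much each varies on its own. Applied to the Southern-speaker table above, a sketch in Python (pearson_r is a hypothetical helper name):

```python
from math import sqrt

# Values from the Southern-speaker table: years away from the South (x)
# and percentage of [a] pronunciations (y) for speakers 1-16.
years_away = [1, 1, 2, 3, 3, 5, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10]
percent_a = [98, 82, 99, 65, 90, 85, 75, 50, 75, 55, 85, 70, 30, 55, 80, 25]

def pearson_r(x, y):
    """Pearson's r: co-variation of x and y divided by their separate variation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(years_away, percent_a)
print(f"r = {r:.2f}")  # r = -0.65
```

The negative sign matches the downward-sloping line: more years away goes with less [a].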
Correlation coefficient
● r = .79
Correlation coefficient
● What is the effect of water's volume on its
weight?
● What is r?
● r = 1
Regression Line
● The regression line is the line closest to the data points: it
minimizes the sum of the squared vertical distances from the points
to the line (least squares).
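The least-squares slope and intercept have a closed form. The sketch below applies it to the Southern-speaker table above; least_squares_line is a hypothetical helper name.

```python
# Values from the Southern-speaker table (years away, % [a]).
years_away = [1, 1, 2, 3, 3, 5, 5, 5, 6, 6, 7, 8, 8, 9, 9, 10]
percent_a = [98, 82, 99, 65, 90, 85, 75, 50, 75, 55, 85, 70, 30, 55, 80, 25]

def least_squares_line(x, y):
    """Slope and intercept of the line minimizing the sum of squared vertical distances."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
    return slope, intercept

slope, intercept = least_squares_line(years_away, percent_a)
print(f"predicted % [a] = {intercept:.1f} + ({slope:.2f}) * years away")
# predicted % [a] = 97.1 + (-4.93) * years away
```

On this fit, each additional year away goes with roughly 5 fewer percentage points of [a].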
What is p?
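● p is the probability of getting results at least this strong by
chance alone
– r = 1 with two data points (any two points lie on a straight line,
so this could easily happen by chance)
– r = 1 with 1000 data points (practically impossible by chance)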
● 1 in 20 chance or smaller of getting results
by chance is called statistically significant
● 1/20 = .05, so p ≤ .05 is significant
● Smaller p is MORE significant
Number of months in a foreign country and
linguistic abilities in the country's language
(positive or negative?)
● What would this mean? r = 0.56, p < .03
● What would this mean? r = 0.56, p < .07
Number of native dialectal usages and time
spent living outside of native dialect area
(negative or positive?)
● What would this mean? r = -.23, p < .0001
● What would this mean? r = -.67, p < .0001
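One way to read each result is to check three things: the sign of r (direction), r² (the share of the variation in one variable that goes with variation in the other, a common way to judge strength), and whether p is below the .05 cutoff. The helper below is a hypothetical sketch, and it treats a reported bound such as p < .03 as if p were that value.

```python
def describe(r, p, alpha=0.05):
    """Plain-language summary of a correlation result."""
    direction = "positive" if r > 0 else "negative" if r < 0 else "no"
    shared = r * r  # r squared: proportion of variance the two variables share
    verdict = "significant" if p < alpha else "not significant"
    return f"{direction} correlation, r² = {shared:.2f}, {verdict} at alpha = {alpha}"

for r, p in [(0.56, 0.03), (0.56, 0.07), (-0.23, 0.0001), (-0.67, 0.0001)]:
    print(f"r = {r}, p < {p}: {describe(r, p)}")
```

The last two results are both clearly significant, but r = -.67 describes a much stronger relationship (about 45% shared variance) than r = -.23 (about 5%).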
What is the past tense of spling?
What is the past tense of creeze?
Form     Computer   People
splung   35%        22%
croze    12%        6%
Correlation and Causation
● Does wealth cause belief in evolution?
● Does belief in God cause poverty?
Correlation and Causation
● Utah has highest use of antidepressants
● Utah has highest percentage of LDS
● Utah has highest use of thyroid medicine
● Utah has highest autism rate
● Utahns go to doctors more
● Utahns don't self-medicate with alcohol (as much)
Correlation and Causation
● Number of drownings is positively
correlated with ice-cream sales
● Bad oral health is correlated with
Alzheimer's
– What are other reasons for this?