Why study statistics in psychology, variables and frequency


Statistics
• This lecture covers chapters 1 and 2, and sections 3.1-3.2, of Howell
• Why study maths in psychology?
• “Mathematics has the advantage of teaching
you the habit of thinking without passion. You
learn to use your mind primarily upon material
where passion can’t come in, and having
trained it that way you can then use it
passionately upon matters about which you feel
passionately. Then you’re much more likely to
come to true conclusions” - Bertrand Russell
Statistical terminology
• 2 types of statistics:
• Descriptive - describe a sample or population
• Inferential - draw inferences about relationships
between samples and populations
• Samples and populations:
• Population: complete set of events we are
investigating (eg all IQ scores)
• Sample: subset of a population (IQ scores of 10
people)
Terminology 2
• Statistics and parameters:
• Statistic: a number which describes a
sample (abbreviated with a Latin letter, eg. s)
• Parameter: a number which describes a
population (abbreviated with a Greek letter, eg. σ)
• Variable:
• a property of an object/event that is measured
Variables
• Statistics allows us to look at:
• how variables behave
• relationships between variables
• Types of variables:
• Discrete variables: can only take on certain
values, eg:
• 1 2 3 4 5 …. (only whole numbers)
• 1.5 2 2.5 3 3.5 4 4.5…. (whole numbers and
halves)
• Examples: gender, number of children, sexual
preference
Variables (2)
• Continuous variables
• Can take on any value
(there exists a value between any two values)
• eg: 1, 1.1, 1.11, 1.111, 1.1111, 1.11111, …
• Examples: length, age, IQ, dosage of Valium
• For stats, all variables must contain only
numbers
• convert “word” values into numbers
• eg: male/female becomes 100/101
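A minimal sketch of converting word values into numbers, using the 100/101 coding from the slide. The variable names and the list of responses are hypothetical; any distinct numbers would serve, since these codes only act as labels.

```python
# Hypothetical coding scheme taken from the slide: each word maps to a number
codes = {"male": 100, "female": 101}

# Hypothetical raw responses recorded as words
responses = ["female", "male", "female", "female"]

# Replace every word with its numeric code
coded = [codes[r] for r in responses]
print(coded)  # [101, 100, 101, 101]
```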
Scales of measurement
• Not all statistical techniques can be applied
to all types of variable
• which is more - male or female?
• By looking at the property a variable
represents, and how that property was
measured (its scale), we can decide if a
particular technique is appropriate
Nominal scale
• Simply labels items
• eg. 723 = male, 742 = female, 857 = Prince
• Differences between numbers mean nothing
• Order of numbers means nothing
• Often expressed as words rather than
numbers
• Cannot do very much stats with nominal
scales
Ordinal Scale
• Labels items, puts them in order
• eg. by expense: 1 = Woolworths, 2 = Pick n Pay, 3 = Shoprite
• Differences between numbers mean nothing
• eg. 4 is not twice as bad as 2, and the gap from 1
to 2 need not equal the gap from 2 to 3
• Order is important
• eg. 1 is the best, 5 is worse than 1-4 but better
than 6 down, etc.
• Useful in ranking items (highest to lowest)
when specific values are not important
Interval Scale
• Order is important, as is the difference
between points
• eg. degrees Celsius: 10 °C is the same distance
from 0 °C as 40 °C is from 50 °C
• BUT: it has no absolute zero, so we cannot
speak about ratios (multiplication)
• eg. “40 is twice as much as 20” - WRONG!
• Most Likert-type items are treated as this scale
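A small illustration of why ratios fail on an interval scale. It assumes the standard conversion to kelvin (a ratio scale whose zero is absolute zero), which the slide does not mention: differences come out the same on both scales, but the apparent "twice as much" disappears.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvin, a scale whose zero is absolute zero."""
    return c + 273.15

# Differences survive the change of scale: 50 - 40 is 10 degrees either way
diff_c = 50 - 40
diff_k = celsius_to_kelvin(50) - celsius_to_kelvin(40)

# Ratios do not: "40 is twice 20" only looks true because 0 °C is an arbitrary zero
ratio_c = 40 / 20
ratio_k = celsius_to_kelvin(40) / celsius_to_kelvin(20)

print(diff_c, round(diff_k, 2))    # 10 10.0
print(ratio_c, round(ratio_k, 3))  # 2.0 1.068
```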
Ratio Scale
• The most versatile: has differences and
multiplication
• 40 is twice as much as 20, AND 40 - 30 = 110 - 100
• It is like an interval scale, but has an
absolute zero.
• Very few in psychology: reaction time is a
common example (IQ has no absolute zero)
Notes on the scales
• Discrete variables may be on the nominal or
ordinal scales only
• Continuous variables can be on any, mostly
interval & ratio
• Difficult to decide what scale a variable
belongs to
• “Absolute zero” is contentious
• Making a wrong decision can lead to silly stats
- the average family has 2.3 children!!
Frequency
• A descriptive statistic
• Applies to all scales of measurement
• Asks: How often did particular things come
up?
• Mostly a matter of counting!
Expressing frequency
• We work with four varieties of frequency
• Frequency: how often did this observation
occur?
• Eg. How many males in this sample?
• Cumulative frequency: how often has this
score, or scores less than this score, occurred?
• Eg. How many people scored 25 marks or less for
the test?
Expressing frequency (2)
• Percentage frequency: frequency expressed as a
percentage of all observations
• Eg. 52% of all Capetonians are male
• Percentage cumulative frequency: cumulative
frequency expressed as a percentage of all
observations
• Eg. 30% of the class failed the test
Frequency tables
• All 4 types of frequency are summarised in
a frequency table, which has the columns:
• Value, F, Cum. F, %F, % Cum. F
Making a freq table - discrete var
• Given a sample of x, a discrete variable
which ranges from 1-6:
• 3 3 5 2 4 3 3 5 6 2 4
• Start the table by putting in the values:

Value   F   Cum. F   %F   % Cum. F
1
2
3
4
5
6
Working out F
• Add in the F: count how often each value
occurs, and write it in

Value   F   Cum. F   %F   % Cum. F
1       0
2       2
3       4
4       2
5       2
6       1
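Counting F by hand can also be sketched in Python, using the sample of x from the slides and the standard library's `collections.Counter`. Looping over every possible value from 1 to 6 (rather than only the values seen) keeps the row for value 1, whose F is 0.

```python
from collections import Counter

# The sample of the discrete variable x from the slides (possible values 1-6)
x = [3, 3, 5, 2, 4, 3, 3, 5, 6, 2, 4]

counts = Counter(x)
# Counter returns 0 for values that never occur, so value 1 still gets a row
freq = {value: counts[value] for value in range(1, 7)}

for value, f in freq.items():
    print(value, f)
```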
Working out Cum. F
• Add the F for this value to the Cum. F
for the previous value

Value   F   Cum. F   %F   % Cum. F
1       0   0
2       2   2
3       4   6
4       2   8
5       2   10
6       1   11
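The running-total rule above is what `itertools.accumulate` computes by default. A minimal sketch, taking the F column from the table as given:

```python
from itertools import accumulate

freq = [0, 2, 4, 2, 2, 1]          # F for the values 1-6, from the table above
cum_freq = list(accumulate(freq))  # each entry = this F + the previous Cum. F

for value, cf in zip(range(1, 7), cum_freq):
    print(value, cf)
```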
Working out %F
• Count the total number of observations, n
(here n = 11)
• For each value, divide F by n, then multiply by
100

Value   F   Cum. F   %F    % Cum. F
1       0   0        0%
2       2   2        18%
3       4   6        36%
4       2   8        18%
5       2   10       18%
6       1   11       9%
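The %F step as a short Python sketch, with the percentages rounded to whole numbers for display (the rounding is a presentation choice, so the rounded %F column need not add up to exactly 100%):

```python
freq = [0, 2, 4, 2, 2, 1]  # F for the values 1-6
n = sum(freq)              # total number of observations (11)

# %F = F / n * 100 for each value
pct_freq = [f / n * 100 for f in freq]

for value, p in zip(range(1, 7), pct_freq):
    print(f"{value}: {p:.0f}%")
```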
Working out % Cum. F
• Count the total number of observations, n
(here n = 11)
• For each value, divide Cum. F by n, then
multiply by 100

Value   F   Cum. F   %F    % Cum. F
1       0   0        0%    0%
2       2   2        18%   18%
3       4   6        36%   55%
4       2   8        18%   73%
5       2   10       18%   91%
6       1   11       9%    100%
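A sketch of the % Cum. F step: accumulate F into Cum. F, then divide by n and multiply by 100. Rounding each result to the nearest whole percent is an assumption about how the table is presented.

```python
from itertools import accumulate

freq = [0, 2, 4, 2, 2, 1]  # F for the values 1-6
n = sum(freq)              # 11 observations

# % Cum. F = Cum. F / n * 100
pct_cum_freq = [cf / n * 100 for cf in accumulate(freq)]

for value, p in zip(range(1, 7), pct_cum_freq):
    print(f"{value}: {p:.0f}%")  # 0%, 18%, 55%, 73%, 91%, 100%
```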
Things to remember
• The Cum. F for the last value must be the
same as n
• The % Cum. F for the last value must be
100%
• Cum. F and % Cum. F never decrease as
you go down (they stay the same where F is 0)
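These rules make handy sanity checks for a table filled in by hand. The helper `check_table` below is hypothetical, a sketch that compares hand-made Cum. F and % Cum. F columns against the rules and lists whatever looks wrong.

```python
def check_table(freq, cum_freq, pct_cum_freq):
    """Return a list of problems in a hand-made frequency table (empty if none)."""
    n = sum(freq)
    problems = []
    if cum_freq[-1] != n:
        problems.append(f"last Cum. F is {cum_freq[-1]}, but n is {n}")
    if pct_cum_freq[-1] != 100:
        problems.append(f"last % Cum. F is {pct_cum_freq[-1]}%, not 100%")
    if any(later < earlier for earlier, later in zip(cum_freq, cum_freq[1:])):
        problems.append("Cum. F decreases somewhere")
    if any(later < earlier for earlier, later in zip(pct_cum_freq, pct_cum_freq[1:])):
        problems.append("% Cum. F decreases somewhere")
    return problems

# A correct table passes; one that forgot the last F fails two checks
good = check_table([0, 2, 4, 2, 2, 1], [0, 2, 6, 8, 10, 11], [0, 18, 55, 73, 91, 100])
bad = check_table([0, 2, 4, 2, 2, 1], [0, 2, 6, 8, 10, 10], [0, 18, 55, 73, 91, 91])
print(good)
print(bad)
```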
Distribution of a variable
• The frequency table tells us how x is
distributed:
• the proportion of high and low scores; which
scores come up most often; how “wide” or
“narrow” the data are
• A distribution tells us what we can expect
from a variable: which scores are likely
and which are unlikely?
Example: distribution of x
• Look at the freq table:

Value   F   Cum. F   %F    % Cum. F
1       0   0        0%    0%
2       2   2        18%   18%
3       4   6        36%   55%
4       2   8        18%   73%
5       2   10       18%   91%
6       1   11       9%    100%

• Which values are most likely to occur
again? (3 most of all, then 2, 4 and 5)
• The data are widely spread (from 2 all the
way to 6)
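Both observations can be read off with a few lines of Python: the value with the largest F (often called the mode, a term this slide does not use) and the lowest and highest scores that show how spread out the data are. A sketch using the same sample:

```python
from collections import Counter

x = [3, 3, 5, 2, 4, 3, 3, 5, 6, 2, 4]

counts = Counter(x)
# most_common(1) gives a one-item list holding the (value, F) pair with the largest F
most_likely_value, its_f = counts.most_common(1)[0]
lowest, highest = min(x), max(x)

print(most_likely_value, its_f)  # 3 4
print(lowest, highest)           # 2 6
```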
Drawing a picture of x
• We can draw a histogram of x to see things
better:
[Figure: histogram of x, with the values 1-6 on the X axis and frequency (0 to 4.5) on the Y axis]
Shows distribution visually - handy to understand what is happening
Drawing histograms
• Very simple: Use the F column from the
table
• For each value, draw a bar whose height
(drawn to scale) equals its F
• Do this for all values
• Remember: label the X and Y axes (X:
variable name; Y: “Frequency”)
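A rough text-only version of the same recipe, as a sketch: one bar per value, with the bar's length equal to F. It draws the bars sideways so it can print in plain text; a plotting library would draw the usual upright bars. The helper name `text_histogram` is hypothetical.

```python
def text_histogram(freq, symbol="#"):
    """Return one line per value, with a bar of len F made from `symbol`."""
    lines = []
    for value, f in freq.items():
        lines.append(f"{value} | {symbol * f}")
    return "\n".join(lines)

freq = {1: 0, 2: 2, 3: 4, 4: 2, 5: 2, 6: 1}  # F column for x

print("x | Frequency")
print(text_histogram(freq))
```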
Frequency of continuous variables
• Problem: cannot write all the values of a
continuous variable:
• values: 1, 1.1, 1.11, 1.111, 1.1111, …
• Infinitely many!
• This problem can be overcome by using
data buckets
Buckets
• A bucket is a range of values which you
group together, eg. [2-3], [3-4], …
• Here, the first bucket holds all values greater
than or equal to 2 and less than 3, the second all
values greater than or equal to 3 and less than 4,
etc.
• Each value in the dataset is placed into a
bucket
• Once buckets are created, you make a
frequency table and histogram in the normal
way
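The "greater than or equal to the lower edge, less than the upper edge" rule can be sketched for equal-width buckets. The helper `bucket_of` and its `start`/`width` parameters are hypothetical; with widths that are not exact in binary (eg. 0.1), values sitting right on an edge can land in the neighbouring bucket due to floating-point error.

```python
import math

def bucket_of(value, start, width):
    """Return (low, high) for the bucket holding value, where low <= value < high."""
    k = math.floor((value - start) / width)  # how many whole buckets lie below value
    low = start + k * width
    return (low, low + width)

print(bucket_of(2.0, 0, 1))   # (2, 3): the lower edge belongs to the bucket
print(bucket_of(2.99, 0, 1))  # (2, 3)
print(bucket_of(3.0, 0, 1))   # (3, 4): 3 starts the next bucket
```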
Bucket example
• x is a continuous variable, from which a
sample is drawn:
• 2.2, 3.5, 3.75, 2.34, 5.33, 3.2, 3.51
• Use the following buckets:
• [0 - 1.5], [1.5 - 3], [3 - 4.5], [4.5 - 6]
Bucket example: F

Bucket     F
[0-1.5]    0
[1.5-3]    2
[3-4.5]    4
[4.5-6]    1

• Cum. F, %F and % Cum. F are worked out as before.
• A histogram is drawn as before, but with the
buckets labelled on the X axis.
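The bucket example can be checked with the standard library's `bisect` module, given the bucket edges from the example. This sketch applies the same lower-edge-inclusive rule, so the last bucket is treated as 4.5 up to (but not including) 6; a value of exactly 6 or more would need a bucket of its own and is rejected here.

```python
from bisect import bisect_right

edges = [0, 1.5, 3, 4.5, 6]  # boundaries of the four buckets in the example
sample = [2.2, 3.5, 3.75, 2.34, 5.33, 3.2, 3.51]

freq = [0] * (len(edges) - 1)
for v in sample:
    if not edges[0] <= v < edges[-1]:
        raise ValueError(f"{v} falls outside every bucket")
    # bisect_right counts the edges <= v, so v belongs to bucket i
    # where edges[i] <= v < edges[i + 1]
    i = bisect_right(edges, v) - 1
    freq[i] += 1

for low, high, f in zip(edges, edges[1:], freq):
    print(f"[{low}-{high}]  {f}")
```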