Introduction to Statistics

Introduction to statistics I
Sophia King
Rm. P24 HWB
[email protected]
Using statistics in Psychology
 Carrying out psychological research means collecting data. Statistics are a way of making use of these data
 Descriptive Statistics: used to describe characteristics of
our sample
• Statistics describe samples
 Inferential Statistics: used to generalise from our sample
to our population
• Parameters describe populations
 Any samples used should therefore be representative of
the target population
Descriptive Statistics
 Statistical procedures used to summarise, organise, and simplify data. This should be done in a way that reflects the overall findings
 Raw data is made more manageable
 Raw data is presented in a logical form
 Patterns can be seen from organised data
• Frequency tables
• Graphical techniques
• Measures of Central Tendency
• Measures of Spread (variability)
Plotting Data: describing spread of data
 A researcher is investigating short-term memory capacity: the number of symbols remembered is recorded for 20 participants:
4, 6, 3, 7, 5, 7, 8, 4, 5, 10
10, 6, 8, 9, 3, 5, 6, 4, 11, 6
 We can describe our data by using a Frequency
Distribution. This can be presented as a table or a graph.
It always presents:
• The set of categories that make up the original measurement scale
• The frequency of each score/category
 Three important characteristics: shape, central tendency,
and variability
Frequency Distribution Tables
  X    f   fX
 11    1   11
 10    2   20
  9    1    9
  8    2   16
  7    2   14
  6    4   24
  5    3   15
  4    3   12
  3    2    6
 Highest score is placed at the top
 All observed scores are listed
 Gives information about distribution, variability, and centrality
 X = score value
 f = frequency
 fX = total value associated with each frequency
 Σf = N (here 20)
 ΣX = ΣfX (here 127)
Frequency Table Additions
  X    f   fX     p     %
 11    1   11  0.05    5%
 10    2   20  0.10   10%
  9    1    9  0.05    5%
  8    2   16  0.10   10%
  7    2   14  0.10   10%
  6    4   24  0.20   20%
  5    3   15  0.15   15%
  4    3   12  0.15   15%
  3    2    6  0.10   10%
 Frequency tables can display more detailed information about the distribution
 Percentages and proportions
 p = fraction of total group associated with each score (relative frequency)
 p = f/N
 As %: p(100) = 100(f/N)
 What does this tell us about this distribution of scores?
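The proportion and percentage columns follow directly from p = f/N. A short Python sketch, using the same memory scores (the dictionary names are illustrative only):

```python
from collections import Counter

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

freq = Counter(scores)
n = len(scores)

# p = f / N is the relative frequency of each score.
proportions = {x: f / n for x, f in freq.items()}
# The percentage is the same quantity multiplied by 100.
percentages = {x: 100 * f / n for x, f in freq.items()}

for x in sorted(freq, reverse=True):
    print(f"{x:>3} {freq[x]:>3} {proportions[x]:>6.2f} {percentages[x]:>5.0f}%")
```

The proportions sum to 1 and the percentages to 100, which is a quick check that no score has been missed.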
Grouped Frequency Distribution Tables
    X    f
95-99    1
90-94    1
85-89    0
80-84    1
75-79    2
70-74    4
65-69    7
60-64    0
55-59    6
50-54    3
 Sometimes the spread of data is too wide to list every score
 Grouped tables present scores as class intervals
 About 10 intervals
 The interval width should be a simple round number (2, 5, 10, etc.), and the same for every interval
 The bottom score of each interval should be a multiple of the width
 Class intervals represent a continuous variable X:
 E.g. 51 is bounded by real limits of 50.5-51.5
 If X is 8 and f is 3, this does not mean that all three have exactly the same score: they all fell somewhere between 7.5 and 8.5
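The grouping rules above can be sketched in Python. The raw scores behind the grouped table are not given in the slides, so the exam_scores list below is hypothetical and invented purely for illustration, as is the function name grouped_frequencies.

```python
from collections import Counter


def grouped_frequencies(scores, width):
    """Count integer scores in class intervals of the given width.

    Each interval starts at a multiple of the width (e.g. 50-54, 55-59).
    Returns (lower, upper, f) rows, highest interval first.
    """
    counts = Counter((x // width) * width for x in scores)
    lowest, highest = min(counts), max(counts)
    table = []
    # List every interval between the extremes, including empty ones.
    for lower in range(highest, lowest - 1, -width):
        upper = lower + width - 1
        table.append((lower, upper, counts.get(lower, 0)))
    return table


# Hypothetical exam scores, NOT the data behind the table in the slides.
exam_scores = [52, 53, 57, 58, 58, 66, 67, 69, 71, 73, 78, 82, 91, 96]

table = grouped_frequencies(exam_scores, width=5)
for lower, upper, f in table:
    # Real limits extend half a unit beyond the apparent limits.
    print(f"{lower}-{upper}  f={f}  real limits {lower - 0.5}-{upper + 0.5}")
```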
Percentiles and Percentile Ranks
  X    f   cf    c%
 11    1   20  100%
 10    2   19   95%
  9    1   17   85%
  8    2   16   80%
  7    2   14   70%
  6    4   12   60%
  5    3    8   40%
  4    3    5   25%
  3    2    2   10%
 X values = raw scores, without context
 Percentile rank = the percentage of the sample with scores at or below the particular value
 This can be represented by a cumulative frequency (cf) column
 Cumulative percentage obtained by: c% = (cf/N)(100)
 This gives information about relative position in the data distribution
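A brief Python sketch of the cumulative frequency and cumulative percentage columns, accumulating f from the lowest score upwards (names such as cum_freq and percentile_rank are illustrative):

```python
from collections import Counter
from itertools import accumulate

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

freq = Counter(scores)
n = len(scores)

values = sorted(freq)                                   # X in ascending order
cum_freq = list(accumulate(freq[x] for x in values))    # cf
cum_pct = [100 * cf / n for cf in cum_freq]             # c% = (cf / N) * 100

# Percentile rank of each observed score: % of the sample at or below it.
percentile_rank = dict(zip(values, cum_pct))

for x in reversed(values):
    print(f"{x:>3} {freq[x]:>3} {percentile_rank[x]:>5.0f}%")
```

For example, a score of 6 has a percentile rank of 60%: 12 of the 20 participants scored 6 or lower.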
Representing data as graphs
 Frequency Distribution Graph
presents all the info available in
a Frequency Table (can be fitted
to a grouped frequency table)
 Uses Histograms
[Figure: histogram of the memory scores; x-axis: memory score (1-11), y-axis: frequency (0-5)]
 Bar width corresponds to real limits of intervals
 Histograms can be modified to include blocks representing individual scores
[Figure: histogram of grouped scores; x-axis: score (45-99), y-axis: frequency (0-8)]
Frequency Distribution Polygons
 Shows the same information with lines: traces the ‘shape’ of the distribution
 Both histograms and polygons represent continuous data
[Figure: frequency distribution polygon; x-axis ticks 1-11, y-axis ticks 0-5]
 For non-numerical data, the frequency distribution can be represented by bar graphs
 Bar graphs have spaces between adjacent bars to represent distinct categories
[Figure: bar graph with categories phone numbers, historical dates, family dates; y-axis ticks 0-16]
Frequencies of Populations and Samples
 Population
 All the individuals of interest to the study
 Sample
 The particular group of participants you are testing:
selected from the population
 Population distributions can also be graphed but, unlike sample distributions, exact frequencies are not normally obtainable. However, you can
 Display graphs of relative frequencies (categorical data)
 Use smooth curves to indicate relative frequencies
(interval or ratio data)
Frequency Distribution: the Normal Distribution
 Bell-shaped: a specific shape that can be defined by an equation
 Symmetrical around the midpoint, where the greatest frequency of scores occurs
 The tails of the perfect curve are asymptotic: they never quite meet the horizontal axis
 Normal distribution is an assumption of parametric testing
Frequency Distribution: Different Distribution
shapes
Measures of Central Tendency
 A way of summarising the data using a single value that is
in some way representative of the entire data set
 It is not always possible to follow the same procedure in
producing a central representative value: this changes
with the shape of the distribution
 Mode
 Most frequent value
 Does not take into account exact scores
 Unaffected by extreme scores
 Not useful when there are several values that occur
equally often in a set
Measures of Central Tendency
 Median
 The value that falls exactly at the midpoint of a ranked distribution
 Does not take into account exact scores
 Unaffected by extreme scores
 In a small set it can be unrepresentative
 Mean (Arithmetic average)
 Sample mean: M = ΣX / n
 Population mean: μ = ΣX / N
 Takes into account all values
 Easily distorted by extreme values
Measures of Central Tendency
 For our set of memory scores:
4, 6, 3, 7, 5, 7, 8, 4, 5, 10
10, 6, 8, 9, 3, 5, 6, 4, 11, 6
 Mode = 6; Median = 6; Mean = 6.35
 The mean is the preferred measure of central tendency, except when
 There are extreme scores or skewed distributions
 The data are non-interval
 The variable is discrete
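The three measures for the memory scores can be checked with Python's standard statistics module. This is only a sketch; multimode is included because, as noted earlier, the mode is less useful when several values tie for most frequent.

```python
import statistics

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

mode = statistics.mode(scores)         # most frequent value
modes = statistics.multimode(scores)   # all values tied for most frequent
median = statistics.median(scores)     # midpoint of the ranked scores
mean = statistics.mean(scores)         # sum of X divided by n

print(f"Mode = {mode}, Median = {median}, Mean = {mean}")
```

With an even number of scores, the median is taken as the average of the two middle values of the ranked list (here both are 6).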
Central Tendencies and Distribution Shape
Describing Variability
 Describes, as an exact quantitative measure, how spread out or clustered together the scores are
 Variability is usually defined in terms of distance
 How far apart scores are from each other
 How far apart scores are from the mean
 How representative a score is of the data set as a whole
Describing Variability: the Range
 Simplest and most obvious way of describing variability
Range = Highest - Lowest
 The range only takes into account the two extreme scores and ignores any values in between. To counter this, the distribution is divided into quarters (quartiles): Q1 = 25%, Q2 = 50%, Q3 = 75%
• The Interquartile range: the distance between the first and third quartiles (Q3 - Q1)
• The Semi-Interquartile range: one half of the Interquartile range
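A hedged Python sketch of the range and interquartile range for the memory scores. Note that textbooks and software use several slightly different conventions for locating quartiles; the one below is simply the default of statistics.quantiles (Python 3.8+) and may not match the method used in the lecture.

```python
import statistics

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

# Range: uses only the two extreme scores.
score_range = max(scores) - min(scores)

# With n=4, statistics.quantiles returns the three cut points [Q1, Q2, Q3].
# Its default "exclusive" method is one convention among several, so
# other tools may report slightly different quartiles.
q1, q2, q3 = statistics.quantiles(scores, n=4)

iqr = q3 - q1         # interquartile range
semi_iqr = iqr / 2    # semi-interquartile range

print(f"Range = {score_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```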
Describing Variability: Deviation
 A more sophisticated measure of variability is one that
shows how scores cluster around the mean
 Deviation is the distance of a score from the mean
X - μ, e.g. 11 - 6.35 = 4.65, 3 - 6.35 = -3.35
 A measure representative of the variability of all the scores would be the mean of the deviation scores: add all the deviations and divide by n
Σ(X - μ) / n
• However, the deviation scores always add up to zero (as the mean serves as the balance point for the scores)
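The cancelling of positive and negative deviations is easy to confirm numerically. A minimal Python sketch for the memory scores:

```python
import math

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

mean = sum(scores) / len(scores)

# Deviation of each score from the mean (X minus the mean).
deviations = [x - mean for x in scores]

# Positive and negative deviations cancel, so the total is zero
# (up to floating-point rounding), and so is their average.
total_deviation = math.fsum(deviations)
print(f"Sum of deviations = {abs(total_deviation):.2f}")
```

Because the sum is always zero, the mean deviation cannot distinguish a tightly clustered data set from a widely spread one.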
Describing Variability: Variance
  X    X - μ   (X - μ)²
  3    -3.35    11.22
  3    -3.35    11.22
  4    -2.35     5.52
  4    -2.35     5.52
  4    -2.35     5.52
  5    -1.35     1.82
  5    -1.35     1.82
  5    -1.35     1.82
  6    -0.35     0.12
  6    -0.35     0.12
  6    -0.35     0.12
  6    -0.35     0.12
  7     0.65     0.42
  7     0.65     0.42
  8     1.65     2.72
  8     1.65     2.72
  9     2.65     7.02
 10     3.65    13.32
 10     3.65    13.32
 11     4.65    21.62
Sum     0       106.55
 To remove the +/- signs we simply square each deviation before finding the average. This is called the Variance:
Σ(X - μ)² / n = 106.55 / 20 = 5.33
 The numerator is referred to as the Sum of Squares (SS), as it is the sum of the squared deviations around the mean value
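The Sum of Squares and the variance in the table can be reproduced step by step. A small Python sketch (variable names are illustrative):

```python
scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

n = len(scores)
mean = sum(scores) / n

# Squaring each deviation removes the sign before summing.
squared_deviations = [(x - mean) ** 2 for x in scores]

ss = sum(squared_deviations)   # Sum of Squares (SS)
variance = ss / n              # mean of the squared deviations

print(f"SS = {ss:.2f}, variance = {variance:.2f}")
```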
Describing Variability: Population Variance
 Population variance is designated by σ²
σ² = Σ(X - μ)² / N = SS / N
 Sample Variance is designated by s²
 Samples are less variable than populations: they therefore
give biased estimates of population variability
 Degrees of Freedom (df): the number of independent (free
to vary) scores. In a sample, the sample mean must be
known before the variance can be calculated, therefore
the final score is dependent on earlier scores: df = n -1
s² = (X - M)² = SS = 106.55 = 5.61
n-1
n -1
20 -1
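The difference between dividing SS by N and by n - 1 can be seen with the two variance functions in Python's statistics module: pvariance treats the data as a whole population, while variance applies the n - 1 (degrees of freedom) correction for a sample. A minimal sketch:

```python
import statistics

scores = [4, 6, 3, 7, 5, 7, 8, 4, 5, 10,
          10, 6, 8, 9, 3, 5, 6, 4, 11, 6]

df = len(scores) - 1   # degrees of freedom for a sample

# Treating the 20 scores as a complete population: SS / N.
population_variance = statistics.pvariance(scores)

# Treating them as a sample from a larger population: SS / (n - 1).
sample_variance = statistics.variance(scores)

print(f"df = {df}")
print(f"SS / N       = {population_variance:.2f}")
print(f"SS / (n - 1) = {sample_variance:.2f}")
```

Dividing by the smaller number n - 1 makes the sample variance slightly larger, which compensates for samples tending to be less variable than the populations they come from.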
Describing Variability: the Standard Deviation
 Variance is a measure based on squared distances
 In order to get around this, we can take the square root of
the variance, which gives us the standard deviation
 Population (σ) and Sample (s) standard deviation
σ = √(Σ(X - μ)² / N)
s = √(Σ(X - M)² / (n - 1))
So for our memory score example we simply take the square root of the variance:
s = √5.61 = 2.37
Describing Variability
 The standard deviation is the most common measure of
variability, but the others can be used. A good measure of
variability must:
 Be stable and reliable, and not greatly affected by little details in the data
• Extreme scores
• Multiple sampling from the same population
• Open-ended distributions
 Both the variance and SD are related to other statistical
techniques
Descriptive statistics
 A researcher is investigating short-term memory capacity: the number of symbols remembered is recorded for 20 participants:
4, 6, 3, 7, 5, 7, 8, 4, 5, 10
10, 6, 8, 9, 3, 5, 6, 4, 11, 6
 What statistics can we display about this data, and what
do they mean?
 Frequency table: show how often different scores occur
 Frequency graph: information about the shape of the
distribution
 Measures of central tendency and variability
Descriptive statistics
[Figure: frequency table of the memory scores (columns X, f, fX, p, %) shown alongside the histogram of the same scores; x-axis ticks 1-12, y-axis ticks 0-5]
References and Further Reading
 Gravetter & Wallnau
 Chapter 2
 Chapter 3
 Chapter 4