Transcript day2
EDPSY 511-001
Chp. 2: Measurement and
Statistical Notation
Populations vs. Samples
• Population
– The complete set of individuals
• Characteristics are called parameters
• Sample
– A subset of the population
• Characteristics are called statistics.
– In most cases we cannot study all the
members of a population
Descriptive vs. Inferential
• Descriptive statistics
– Summarize/organize a group of numbers from
a research study
• Inferential statistics
– Draw conclusions/make inferences that go
beyond the numbers from a research study
– Determine if a causal relationship exists
between the IV and DV
Common Research Designs
• Correlational
– Do two qualities “go together”?
• Comparing intact groups
– a.k.a. causal-comparative and ex post facto designs.
• Quasi-experiments
– Researcher manipulates IV
• True experiments
– Must have random assignment.
• Why?
– Researcher manipulates IV
Variables
• Variables
– Characteristics that take on different values
• Achievement
• Age
• Condition
– Independent variable (IV)
• Manipulated or Experimental
– Condition
• Subject
– Personality
– Gender
– Dependent variable (DV)
• The outcome of interest
– Achievement
– Drop-out status
Measurement
• Is the assignment of numerals to objects.
• Nominal
– Examples: Gender, party affiliation, and place of birth
• Ordinal
– Examples: SES, Student rank, and Place in race
• Interval
– Examples: Test scores, personality and attitude scales.
• Ratio
– Examples: Weight, length, reaction time, and number of
responses
Categorical, Continuous and
Discontinuous
• Categorical (nominal)
– Gender, party affiliation, etc.
• Discontinuous
– No intermediate values
• Children, deaths, accidents, etc.
• Continuous
– Variable may assume any value
• Age, weight, blood sugar, etc.
Values
• Exhaustive
– Must be able to assign a value to all objects.
• Mutually Exclusive
– Each object can only be assigned one of a set
of values.
• A variable with only one value is not a
variable.
– It is a constant.
Chapter 2: Statistical Notation
• Nouns, Adjectives, Verbs and Adverbs.
– Say what?
• Here’s what you need to know
– X
• Xi = a specific observation
– N
• # of observations
– ∑
• Sigma
• Means to sum
Work from left to right
• Perform operations in parentheses first
• Exponentiation and square roots
• Perform summing operations
• Simplify numerator and divisor
• Multiplication and division
• Addition and subtraction
$$\sum_{i=1}^{N} X_i$$
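A minimal sketch of the notation in Python, using four made-up scores (the values are an assumption chosen only for illustration). It shows why the order of operations matters: summing the squared scores is not the same as squaring the sum.

```python
# Hypothetical scores, used only to illustrate the notation.
X = [5, 4, 6, 3]

N = len(X)                               # N: number of observations
sum_x = sum(X)                           # sum of X_i for i = 1..N
sum_x_squared = sum(x ** 2 for x in X)   # square each score first, then sum
squared_sum_x = sum(X) ** 2              # sum inside the parentheses first, then square

print("N =", N)
print("sum of X =", sum_x)
print("sum of X squared =", sum_x_squared)
print("(sum of X) squared =", squared_sum_x)
```

The last two quantities differ because exponentiation is applied before summing in one case and after it in the other.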
• Pop Quiz (non graded)
– In groups of three or four
• Perform the indicated operations.
• What was that?
$$\frac{N \sum X^2 - \left(\sum X\right)^2}{N(N - 1)}$$
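One way to check the quiz by machine is to evaluate the expression piece by piece, following the order of operations from the previous slide. The sketch below assumes the expression reads N times the sum of squared scores, minus the squared sum of scores, all divided by N(N - 1); the ten spelling scores are borrowed from the frequency table example in these notes, since the quiz slide gives no data of its own.

```python
import math
import statistics

# Spelling test scores borrowed from the frequency table example.
scores = [5, 4, 6, 3, 5, 7, 2, 4, 3, 4]

N = len(scores)
sum_x = sum(scores)                        # summing operation
sum_x_sq = sum(x ** 2 for x in scores)     # exponentiation, then summing

numerator = N * sum_x_sq - sum_x ** 2      # simplify the numerator
denominator = N * (N - 1)                  # simplify the divisor
result = numerator / denominator

print("numerator =", numerator)
print("denominator =", denominator)
print("result =", result)

# The result agrees with the sample variance from the standard library.
print(math.isclose(result, statistics.variance(scores)))
```

The agreement with `statistics.variance` suggests the expression is the computational form of the sample variance.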
Chapter 3
Exploratory Data Analysis
Exploratory Data Analysis
• A set of tools to help us examine data
– Visually representing data makes it easy to
see patterns.
• 49, 10, 8, 26, 16, 18, 47, 41, 45, 36, 12, 42, 46, 6,
4, 23, 2, 43, 35, 32
– Can you see a pattern in the above data?
• Imagine if the data set was larger.
– 100 cases
– 1000 cases
Three goals
• Central tendency
– What is the most common score?
– What number best represents the data?
• Dispersion
– What is the spread of the scores?
• What is the shape of the distribution?
Frequency Tables
• Let’s say a teacher gives her students a
spelling test and wants to understand the
distribution of the resultant scores.
– 5, 4, 6, 3, 5, 7, 2, 4, 3, 4
Value   F   Cumulative F   %     Cum%
7       1   1              10%   10%
6       1   2              10%   20%
5       2   4              20%   40%
4       3   7              30%   70%
3       2   9              20%   90%
2       1   10             10%   100%
N=10
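The same table can be built programmatically. This is a minimal sketch, assuming the table is ordered from the highest value down, as on the slide; the function name `frequency_table` is made up for illustration.

```python
from collections import Counter

scores = [5, 4, 6, 3, 5, 7, 2, 4, 3, 4]

def frequency_table(values):
    """Return rows of (value, F, cumulative F, %, cumulative %), highest value first."""
    counts = Counter(values)
    n = len(values)
    rows = []
    cumulative = 0
    for value in sorted(counts, reverse=True):
        f = counts[value]
        cumulative += f
        rows.append((value, f, cumulative, 100 * f / n, 100 * cumulative / n))
    return rows

table = frequency_table(scores)
print("Value   F   Cum F     %   Cum%")
for value, f, cum_f, pct, cum_pct in table:
    print(f"{value:>5} {f:>3} {cum_f:>7} {pct:>4.0f}% {cum_pct:>5.0f}%")
print(f"N={len(scores)}")
```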
As groups
• Create a frequency table using the
following values.
– 20, 19, 17, 16, 15, 14, 12, 11, 10, 9
Banded Intervals
• A.k.a. Grouped frequency tables
• With the previous data the frequency table
did not help.
– Why?
• Solution: Create intervals
• Try building a table using the following
intervals
<=13, 14 – 18, 19+
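A minimal sketch of a banded (grouped) frequency count for the ten values above, assuming whole-number scores. The helper name `band` and the label strings are made up for illustration. Because every value falls into exactly one band, the intervals are both exhaustive and mutually exclusive.

```python
values = [20, 19, 17, 16, 15, 14, 12, 11, 10, 9]

def band(value):
    """Assign a whole-number score to one of the three intervals."""
    if value <= 13:
        return "<=13"
    if value <= 18:
        return "14-18"
    return "19+"

labels = ["19+", "14-18", "<=13"]          # highest band first
counts = {label: 0 for label in labels}
for v in values:
    counts[band(v)] += 1

for label in labels:
    print(f"{label:>6}  F = {counts[label]}")
```

With ungrouped values every frequency is 1, so the table shows no pattern; grouping gives each band a meaningful count.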
Stem-and-leaf plots
• Babe Ruth
– Hit the following number of Home Runs from 1920 –
1934.
• 54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22
– As a group let’s build a stem and leaf plot
– With two classes’ spelling scores on a 50 item
test.
• Class 1: 49, 46, 42, 38, 34, 33, 32, 30, 29, 25
• Class 2: 39, 38, 38, 36, 36, 31, 29, 29, 28, 19
– As a group let’s build a stem and leaf plot
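For checking a hand-built plot, here is a minimal sketch that assumes two-digit scores, with the tens digit as the stem and the ones digit as the leaf. The function name `stem_and_leaf` is made up for illustration; the data are the Babe Ruth home run totals listed above.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values by tens digit (stem); ones digits are the leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

home_runs = [54, 59, 35, 41, 46, 25, 47, 60, 54, 46, 49, 46, 41, 34, 22]
plot = stem_and_leaf(home_runs)

# Print every stem between the smallest and largest, even if it has no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
```

Calling the same function once per class gives the two plots needed to compare the spelling scores.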
Landmarks in the data
• Quartiles
– We’re often interested in the 25th, 50th and 75th
percentiles.
• 39, 38, 38, 36, 36, 31, 29, 29, 28, 19
– Steps
• First, order the scores from least to greatest.
• Second, add 1 to the sample size.
– Why?
• Third, multiply (sample size + 1) by the percentile, written as a proportion, to find the location.
– Q1 = (10 + 1) * .25
– Q2 = (10 + 1) * .50
– Q3 = (10 + 1) * .75
» If the location obtained is a fraction, take the average of the
two adjacent X values.
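The steps above can be sketched as a short function. It follows the slide's rule (average the two adjacent values whenever the location is fractional), which differs from the linear interpolation that many software packages use, so results may not match other tools exactly. The function name `quartile` is made up for illustration, and the sketch assumes the location falls between 1 and the sample size.

```python
def quartile(values, p):
    """Locate a percentile with the (n + 1) * p rule from the slide."""
    ordered = sorted(values)               # step 1: order the scores
    location = (len(ordered) + 1) * p      # steps 2 and 3: 1-based location
    lower = int(location)
    if location == lower:
        return ordered[lower - 1]          # whole-number location: take that score
    # Fractional location: average the two adjacent scores.
    return (ordered[lower - 1] + ordered[lower]) / 2

class_2 = [39, 38, 38, 36, 36, 31, 29, 29, 28, 19]
q1 = quartile(class_2, 0.25)   # location 2.75
q2 = quartile(class_2, 0.50)   # location 5.5
q3 = quartile(class_2, 0.75)   # location 8.25
print(q1, q2, q3)
```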
Box-and-Whiskers Plots (a.k.a.,
Boxplots)
Shapes of Distributions
• Normal distribution
• Positive Skew
– Or right skewed
• Negative Skew
– Or left skewed
How is this variable distributed?
[Histogram: Frequency by score. Mean = 4.3, Std. Dev. = 1.494, N = 10]
How is this variable distributed?
[Histogram: Frequency by the variable "right". Mean = 2.80, Std. Dev. = 1.75119, N = 10]
How is this variable distributed?
[Histogram: Frequency by the variable "left". Mean = 5.40, Std. Dev. = 1.42984, N = 10]
Descriptive Statistics
Statistics vs. Parameters
• A parameter is a characteristic of a population.
– It is a numerical or graphic way to summarize data
obtained from the population
• A statistic is a characteristic of a sample.
– It is a numerical or graphic way to summarize data
obtained from a sample
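A minimal sketch of the distinction, using a made-up population of 1,000 scores (the population, the seed, and the sample size of 25 are all assumptions for illustration). The population mean plays the role of a parameter; the mean of a random sample is a statistic that estimates it.

```python
import random
import statistics

random.seed(511)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 scores between 0 and 50.
population = [random.randint(0, 50) for _ in range(1000)]

# A sample is a subset of the population.
sample = random.sample(population, 25)

parameter = statistics.mean(population)   # characteristic of the population
statistic = statistics.mean(sample)       # characteristic of the sample

print("population mean (parameter):", parameter)
print("sample mean (statistic):", statistic)
```

In practice the parameter is usually unknown, because the whole population cannot be measured; only the statistic can be computed.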
Types of Numerical Data
• There are two fundamental types of
numerical data:
1) Categorical data: obtained by determining the
frequency of occurrences in each of several
categories
2) Quantitative data: obtained by determining
placement on a scale that indicates amount or
degree
Measures of Central Tendency
Central Tendency
• Average (Mean)
– Sample: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
– Population: $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$
• Median
• Mode
Mean (Arithmetic Mean)
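• Mean (arithmetic mean) of data values
– Sample mean (n = sample size)
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$$
– Population mean (N = population size)
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$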
Mean
• The most common measure of central
tendency
• Affected by extreme values (outliers)
[Dot plot, axis 0 to 10] Mean = 5
[Dot plot, axis 0 to 14] Mean = 6
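A minimal sketch of how a single extreme value pulls the mean. The two data sets are hypothetical, chosen only so that the means come out to 5 and 6 as on the slide; the slide's own data points are not recoverable from these notes.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)

without_outlier = [1, 3, 5, 7, 9]    # hypothetical scores
with_outlier = [1, 3, 5, 7, 14]      # same scores, largest one replaced by 14

mean_without = mean(without_outlier)
mean_with = mean(with_outlier)
print(mean_without, mean_with)
```

Changing one score out of five moves the mean by a full point, which is why the mean is described as sensitive to outliers.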
Median
• Robust measure of central tendency
• Not affected by extreme values
[Dot plot, axis 0 to 10] Median = 5
[Dot plot, axis 0 to 14] Median = 5
• In an ordered array, the median is the
“middle” number
– If n or N is odd, median is the middle number
– If n or N is even, median is the average of the
two middle numbers
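The odd and even rules can be sketched as follows. The two five-score data sets are hypothetical, chosen to show the median staying at 5 when one score becomes extreme; the ten-score example reuses the Class 2 spelling scores from earlier in these notes.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

median_without = median([1, 3, 5, 7, 9])     # hypothetical scores, n is odd
median_with = median([1, 3, 5, 7, 14])       # one extreme score, median unchanged
median_even = median([39, 38, 38, 36, 36, 31, 29, 29, 28, 19])  # n is even

print(median_without, median_with, median_even)
```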
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes
[Dot plot, axis 0 to 14] Mode = 9
[Dot plot, axis 0 to 6] No Mode
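A minimal sketch covering the cases above: one mode, several modes, no mode, and categorical data. It assumes the convention that a data set has no mode when every value occurs exactly once. The function name `modes` and the party labels are made up for illustration; the numerical scores are the spelling test data from the frequency table example.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list if every value occurs once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                      # no value repeats: no mode
    return sorted(v for v, c in counts.items() if c == highest)

one_mode = modes([5, 4, 6, 3, 5, 7, 2, 4, 3, 4])       # spelling scores
no_mode = modes([0, 1, 2, 3, 4, 5, 6])                 # every value occurs once
several = modes([2, 2, 5, 5, 7])                       # two values tie
categorical = modes(["Dem", "Rep", "Ind", "Rep"])      # hypothetical party labels

print(one_mode, no_mode, several, categorical)
```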