Why Statistics


Why Statistics?
Notes about behavior
Ideas and ramblings from Devore & Peck: Statistics
Why Stats? Three Reasons
• To be informed
• To understand issues and make decisions
• To be able to evaluate decisions about your life and the lives of those whom you may teach

Reason One: Being Informed
• Our lives are filled with data, but most of that data comes in the form of sound bites.
• We are not given sufficient details to make our own decisions
• We are expected to “follow”
• The average American is data ignorant
• Not “understanding” data takes away our control of decision making

Reason One: Informed Consumer

To be in control of your decisions you must be able to:
1. Extract information from charts and graphs
2. Follow the logic of numerical arguments
3. Know the basic rules of how data should be gathered, summarized, and analyzed to draw valid “truthful” statistical conclusions.
Reason Two: Understanding and Making Decisions

Be able to decide if information is adequate and sufficient to make a decision:
• Know enough to challenge the data presented, by virtue of “knowing” about data
• Analyze the data that is available
• Assess the assumptions inherent in (built into) the type of data collected
• Draw conclusions and make decisions about the data
• Assess the risk of an incorrect decision
Reason Three: Life Decisions
• Drug screening for work (or the Olympics): false positives and negatives
• Criteria to define financial need
• Scores on state achievement exams
• Data about teen accidents that affects the rate of payment
• Probability of an incorrect medical diagnosis

Types of Data
• Nominal – Naming
• Ordinal – Ordering
• Interval – Equal Intervals
• Ratio – True Zero

Working with Interval and Ratio Data

• Remember: interval and ratio data are the only two types of data that support arithmetic. Interval data can be meaningfully added and subtracted; ratio data, which has a true zero, can also be meaningfully multiplied and divided.
• Note about the use of symbols to indicate operations:
• There are three symbols that mean “multiply”: x, ·, and ( ), thus 2x3, 2 · 3, and 2(3)
• There are four symbols that mean “divide”: / (virgule), ― (vinculum), ÷ (obelus), and )‾ (a closed parenthesis attached to a vinculum, as in long division)
Measures of Similarity

Measures of Central Tendency
1. Mean
2. Median
3. Mode
• Normal Curve – estimating the population from a sample

Mean (Average, or x̄ (x-bar))
• x̄ = the sum of scores divided by the number of scores
• 5+7+3+6+4 = 25 total with 5 scores
• x̄ = 25/5 = 5

Median (middle)

The "middle value" of a list: the smallest number such that at least half the numbers in the list are no greater than it. If the list has an odd number of entries, the median is the middle entry after sorting the list into increasing order. If the list has an even number of entries, the median is the sum of the two middle numbers (after sorting) divided by two.
• 5+7+3+6+4, once sorted, becomes 3+4+5+6+7; the middle value is 5
• Position of the true middle = (5 scores + 1) ÷ 2 = 3, so the median is the 3rd score, 5 (this shortcut works for odd-sized lists ONLY)
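The odd and even cases described above can be sketched in Python as follows. The function name `median` is my own choice here; Python's standard library also offers `statistics.median`, which behaves the same way for these inputs.

```python
def median(values):
    """Return the middle value of a list (mean of the two middle values if even-sized)."""
    ordered = sorted(values)          # the list must be sorted first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd count: take the single middle entry
        return ordered[middle]
    # even count: average the two middle entries
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_median = median([5, 7, 3, 6, 4])       # slide example, five scores
even_median = median([5, 7, 3, 6, 6, 4])   # six scores, so two middle values

print(odd_median, even_median)  # 5 5.5
```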
Mode (most common)

For lists, the mode is the most common (frequent) value. A list can have more than one mode. For histograms, a mode is a relative maximum ("bump").
• In the list 5+7+3+6+4 there is no mode (every value appears once), but in the list 5+7+3+6+6+4 the most common number is 6
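A short Python sketch of finding the mode or modes of a list. Treating a list in which every value appears once as having "no mode" follows the slide; the helper name `modes` is an assumption of this sketch, and it expects a non-empty list.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means there is no mode."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                  # every value appears once: no mode
        return []
    return sorted(value for value, count in counts.items() if count == highest)

no_mode = modes([5, 7, 3, 6, 4])
one_mode = modes([5, 7, 3, 6, 6, 4])
two_modes = modes([1, 1, 2, 2, 3])    # a list can have more than one mode

print(no_mode, one_mode, two_modes)  # [] [6] [1, 2]
```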
[Figure: dot plot of scores, marked with X's above an axis running from 1 to 7]
Population Estimates - Normal Curve

The normal curve (aka bell curve) is an estimate of the population of all possible instances of an object, event, or other entity.
[Figure: normal curve; 68.26% of the entire population falls within 1 standard deviation of the mean, 95.44% within 2, and 99.74% within 3]
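The percentages attached to the normal curve are the shares of a normal population lying within 1, 2, and 3 standard deviations of the mean. A minimal sketch of where they come from, using the standard relationship between the normal curve and the error function (`share_within` is a name made up for this sketch). Printed tables often show slightly different second decimals because they round intermediate values.

```python
import math

def share_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Percent of the population within 1, 2, and 3 standard deviations.
shares = {k: round(100 * share_within(k), 2) for k in (1, 2, 3)}

print(shares)  # {1: 68.27, 2: 95.45, 3: 99.73}
```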
Nature and the role of variability

• If all students at PHCC were invariable, my job would be really easy!
• Sadly, the students at PHCC exhibit high variability in age, education, socioeconomic status, self-assessment, and educational expectations, so I have lots of work in planning and preparing for class, anticipating prior knowledge, anticipating levels of understanding, and predicting the speed of information delivery.
What is variability?

Variability refers to the spread of scores. It is about describing how the data (plural of datum) are organized along the scale of measurement. Data variability describes the ways in which the data are grouped.
Range, Variance, and Deviation
• How do we see and tell about variability: the distance (or spread) of scores across the continuum of scores?
• One way is the range. In the example 3+4+5+6+7 the range is 7-3 = 4
• The mean (x̄) alone is not enough to describe the data. The following examples all have a mean of 5:
• 2+1+6+1+15 = 25; 25/5 = 5
• 2+2+1+6+3+3+3+7+18 = 45; 45/9 = 5

Data has a mean (x̄) of 5

3+4+5+6+7 = 25; 25/5 = 5
Range = 7-3 = 4

1+1+2+6+15 = 25; 25/5 = 5
Range = 15-1 = 14

1+2+2+3+3+3+6+7+18 = 45; 45/9 = 5
Range = 18-1 = 17
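A small Python sketch that recomputes the mean and range for the three lists above, to show that data sets with the same mean can have very different spreads. The variable names are illustrative.

```python
# The three example lists from the slide.
datasets = [
    [3, 4, 5, 6, 7],
    [1, 1, 2, 6, 15],
    [1, 2, 2, 3, 3, 3, 6, 7, 18],
]

# For each list: (mean, range), where range = largest score - smallest score.
summary = [(sum(d) / len(d), max(d) - min(d)) for d in datasets]

for mean, value_range in summary:
    print(f"mean = {mean}, range = {value_range}")
# mean = 5.0, range = 4
# mean = 5.0, range = 14
# mean = 5.0, range = 17
```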
Variability I
Why students drop out
Reason
1. Academic problems
2. Poor advising or teaching
3. Needed a break
4. Economic reasons
5. Family responsibilities
6. To attend another school
7. Personal problems
8. Other
[Figure: bar chart "Why students drop out"; x-axis: Reasons (1-8), y-axis: Frequency]
Variance

• Another way to estimate variability is by calculating the variance. The variance is built from the differences between each score and the mean (x̄). Each difference is squared so that negative differences do not cancel out positive ones.
• First calculate the mean of the scores, then measure the amount that each score deviates from the mean, and then square that deviation (by multiplying it by itself). Numerically, the variance equals the average of the squared deviations from the mean.
[Figure: two distributions, one labeled "High variance" and one labeled "Low variance"]
Calculating Variance
1. First calculate the mean of the scores (here, 5),
2. then measure the amount that each score deviates from the mean,
3. then square that deviation (by multiplying it by itself).
4. Add up all of the squared deviations.
5. Divide by the number of scores (5).

Score – Mean = Deviation     Deviation squared
5 – 5 = 0                    0² = 0
7 – 5 = 2                    2² = 4
3 – 5 = -2                   (-2)² = 4
6 – 5 = 1                    1² = 1
4 – 5 = -1                   (-1)² = 1
Sum of squared deviations = 10
Variance = 10 / 5 = 2
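The five steps can be sketched in Python as below. Note the assumption carried over from the slide: the sum is divided by the number of scores, which gives the population variance. The sample variance divides by one less than the number of scores instead. The function name is illustrative.

```python
def population_variance(scores):
    """Average of the squared deviations from the mean (divides by N, as on the slide)."""
    mean = sum(scores) / len(scores)                          # step 1
    squared_deviations = [(s - mean) ** 2 for s in scores]    # steps 2 and 3
    return sum(squared_deviations) / len(scores)              # steps 4 and 5

variance = population_variance([5, 7, 3, 6, 4])

print(variance)  # 2.0
```

Python's standard library offers `statistics.pvariance` for the same divide-by-N calculation and `statistics.variance` for the divide-by-(N-1) version.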
Another Variance
1. First calculate the mean of the scores (here, 5),
2. then measure the amount that each score deviates from the mean,
3. then square that deviation (by multiplying it by itself).
4. Add up all of the squared deviations.
5. Divide by the number of scores (5).

Score – Mean = Deviation     Deviation squared
2 – 5 = -3                   (-3)² = 9
1 – 5 = -4                   (-4)² = 16
6 – 5 = 1                    1² = 1
1 – 5 = -4                   (-4)² = 16
15 – 5 = 10                  10² = 100
Sum of squared deviations = 142
Variance = 142 / 5 = 28.4
Standard Deviation

• Is simply the square root of the variance
• Problem 1: Variance = 10 / 5 = 2
  S.D. or s = √2 = 1.414214
• Problem 2: Variance = 142 / 5 = 28.4
  S.D. or s = √28.4 = 5.329165
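A minimal Python sketch that computes the standard deviation directly from the scores of both problems, which is a useful check on the hand arithmetic. It assumes the slide's divide-by-N (population) variance; the function name is made up for this sketch.

```python
import math

def population_sd(scores):
    """Square root of the population variance (squared deviations averaged over N)."""
    mean = sum(scores) / len(scores)
    variance = sum((s - mean) ** 2 for s in scores) / len(scores)
    return math.sqrt(variance)

sd_problem_1 = population_sd([5, 7, 3, 6, 4])     # scores from Problem 1
sd_problem_2 = population_sd([2, 1, 6, 1, 15])    # scores from Problem 2

print(round(sd_problem_1, 6))  # 1.414214
print(round(sd_problem_2, 6))
```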
Interquartile Range
• The distance from the 75th percentile to the 25th percentile in a group of scores.
• As the median divides a data set in half, the quartiles divide the data set into fourths. Hence the second quartile, denoted Q2, is the median.
Example: 1+2+2+3+3+3+6+7+18

Position of the true middle = (9 scores + 1) ÷ 2 = 5, so Q2 is the 5th score, 3
Position of the true middle of the lower half (1, 2, 2, 3) = (4 scores + 1) ÷ 2 = 2.5, so Q1 is halfway between the 2nd and 3rd scores: (2 + 2) ÷ 2 = 2
Position of the true middle of the upper half (3, 6, 7, 18) = (4 scores + 1) ÷ 2 = 2.5, so Q3 is halfway between the 2nd and 3rd scores: (6 + 7) ÷ 2 = 6.5
Interquartile Range = Q3 – Q1 = 6.5 – 2 = 4.5
The interquartile range ignores outliers such as the 18. In the above example we are interested only in the data above and below Q2, and do not include Q2 in either half.
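A Python sketch of the quartile method described above: sort the scores, leave Q2 out of both halves when the count is odd, and take the median of each half. Other textbooks and software use slightly different quartile rules, so treat this as one convention rather than the only one. The function name `quartiles` is an assumption of this sketch.

```python
from statistics import median

def quartiles(scores):
    """Return (Q1, Q2, Q3), leaving the median out of both halves for odd-sized lists."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([1, 2, 2, 3, 3, 3, 6, 7, 18])
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 2.0 3 6.5 4.5
```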
Converted Measures

• Scores can be converted to a common scale to provide equated comparisons between groups.
• Z-scores (standard scores), percentiles, and stanine scores are all converted to a common base so that comparisons between groups can be made.

• Percentiles: Raw scores, or the total points a student earns on a test, are converted into percentage values. There are two statistics used for this purpose. The percentile rank is a number between 0 and 100 indicating the percent of cases in a norm group falling at or below that score. The percentile is a point on a scale of scores at or below which a given percent of the cases falls. For example, a child who scores at the 42nd percentile is doing as well as, or better than, 42 percent of the students who took the same test.
• Percentiles are like quartiles, except that they divide the data set into 100 equal parts instead of four equal parts
Percentiles Explained

• The percentile for an observation x is found by dividing the number of observations less than x by the total number of observations and then multiplying this quantity by 100.
• Once you can calculate percentiles, you can also determine deciles and quartiles.
• The First Quartile = the 25th Percentile
• The Second Quartile = the 50th Percentile
• The Third Quartile = the 75th Percentile
• Example: 45 out of 50 students had test scores less than 80. Since 45/50 = 90%, if you had a score of 80, you were in the 90th percentile.
• Example: 1+2+2+3+3+3+6+7+18
  Six of the nine scores are less than 6, so the percentile for a score of 6 = (6 ÷ 9) x 100 = .66667 x 100 = 66.67%
  So a score of 6 is higher than about 67% of the scores in the list
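The "number of observations less than x" rule can be sketched in Python as follows. The function name `percent_below` is made up for this sketch; note that it uses the strict "less than" rule from this slide, not the "at or below" wording used for percentile rank.

```python
def percent_below(scores, x):
    """Percent of the scores that are strictly less than x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

scores = [1, 2, 2, 3, 3, 3, 6, 7, 18]
pct = percent_below(scores, 6)   # six of the nine scores are below 6

print(round(pct, 2))  # 66.67
```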
Stanine Scores

• Stanines: The term stanine is derived from “standard nine”.
• Stanine scores range from 1 to 9 with 5 in the center. Except for 1 and 9, each stanine includes a band of scores one half of a standard deviation wide. Thus stanine scores are standard scores with a mean of 5 and a standard deviation of 2.
• Test scores are commonly expressed using these single-digit scores, which can help students and parents visualize where someone falls on the test scale.
• The National Stanine is a scale score that divides the scores of the norming sample into nine groups, ranging from a high of 9 to a low of 1. Stanines 1-3 are generally considered below average, stanines 4-6 average, and stanines 7-9 above average. Stanine scores have a constant relationship to percentiles; that is, a given percentile always falls into the same stanine. Stanine 5, for example, always includes percentiles 41-59.
Stanine Example
Danville Montessori Third Grade CAT Scores (Total Battery)

YEAR         National Stanine   Scale Score   National Percentile
1998-1999    7.5                740.1         95.0
1999-2000    7.2                725.3         89.0
2000-2001    7.3                730.4         ***
2001-2002    7.7                743.2         98.0
2002-2003    7.4                732.4         ***
2003-2004    8.6                757.0         97.0
2004-2005    9.0                759.0         ***

*** denotes fewer than ten students tested; therefore the National Percentile for the group is not computed.
Stanine Description

The middle stanine is the fifth one; it contains the middle
20% of the scores. Each stanine interval, except the first and
last ones, spans half of a standard deviation.
1,2 or 3 = "below average"
4,5 or 6 = "average"
7,8, or 9 = "above average"
Stanine Calculation
• Stanine is calculated from a z-score
• Stanine = (2 x z-score) + 5, rounded to the nearest whole number and limited to the range 1 to 9
• This gives a mean of 5 and an S.D. of 2
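A minimal Python sketch of the stanine formula. The rounding to a whole number and the clamping to the range 1 to 9 are the usual conventions for stanines and are stated here as assumptions, since the slide gives only the formula. The function name is illustrative.

```python
import math

def stanine(z_score):
    """Convert a z-score to a stanine: (2 x z) + 5, rounded, kept within 1..9."""
    raw = 2 * z_score + 5
    rounded = math.floor(raw + 0.5)      # round halves up (assumed convention)
    return max(1, min(9, rounded))       # stanines never go below 1 or above 9

examples = {z: stanine(z) for z in (-3.0, -1.2, 0.0, 1.0, 3.0)}

print(examples)  # {-3.0: 1, -1.2: 3, 0.0: 5, 1.0: 7, 3.0: 9}
```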

Standard Score

When a set of scores is converted to z-scores, the scores are said to be standardized and are referred to as standard scores. Standard scores have a mean of 0 and a standard deviation of 1.
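A short Python sketch of standardizing a set of scores. It uses the standard definition z = (score – mean) ÷ standard deviation, which the slide does not spell out, and the divide-by-N standard deviation used elsewhere in these notes. The function name is an assumption, and the sketch would fail on a list whose scores are all identical (standard deviation of 0).

```python
import math

def z_scores(scores):
    """Convert raw scores to z-scores using the population standard deviation."""
    mean = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    return [(s - mean) / sd for s in scores]

standardized = z_scores([5, 7, 3, 6, 4])

# After standardizing, the mean is 0 and the standard deviation is 1.
z_mean = sum(standardized) / len(standardized)
z_sd = math.sqrt(sum((z - z_mean) ** 2 for z in standardized) / len(standardized))

print([round(z, 3) for z in standardized])  # [0.0, 1.414, -1.414, 0.707, -0.707]
```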
Stats Interpretation Summary
Variability II
[Figure: bar chart "Estimate of number of each color of M & M’s in a large bag", Estimated Number of M & M's N=660; x-axis: Color, y-axis: Frequency in Large Bag (0 to 250). Bars: Brown 198, Yellow 132, Red 132, Orange 66, Green 66, Blue 66]
M & M’s Variability I


Q: What is the percentage of each color in "M&M's" Chocolate Candies?
A: On average, "M&M's" Plain Chocolate Candies and our new "M&M's" Mint Chocolate Candies contain 30% browns, 20% each of yellows and reds and 10% each of oranges, greens, and blues. For "M&M's" Peanut Chocolate Candies, the ratio is 20% each of browns, yellows, reds, greens and oranges. We use the same ratio for our "M&M's" Peanut Butter and Chocolate
Total   Brown   Yellow   Red   Orange   Green   Blue
660     198     132      132   66       66      66
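The counts in the table follow from applying the company's stated percentages for plain candies to a bag of 660. A minimal Python sketch of that arithmetic, with illustrative variable names:

```python
# Stated percentages for plain "M&M's" and the assumed bag size from the slide.
total = 660
percent_by_color = {
    "brown": 30, "yellow": 20, "red": 20,
    "orange": 10, "green": 10, "blue": 10,
}

# Expected count for each color = total x percentage / 100.
expected = {color: round(total * pct / 100) for color, pct in percent_by_color.items()}

print(expected)
# {'brown': 198, 'yellow': 132, 'red': 132, 'orange': 66, 'green': 66, 'blue': 66}
```

The same dictionary can be compared against your own counts once they are converted to percentages.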
M & M’s Variability II
[Figure: bar chart "Estimate of colors present in large bag", Estimated Number of M & M's N=660; x-axis: Color, y-axis: Frequency in Large Bag (0 to 250). Bars: Brown 198 (30%), Yellow 132 (20%), Red 132 (20%), Orange 66 (10%), Green 66 (10%), Blue 66 (10%)]
Count Your M & M’s
Use Excel to graph your counts – convert to %. Does it match
what the company says?
[Figure: blank bar chart for recording your counts; y-axis: 1 to 12, x-axis: red, blue, yellow, green, brown, orange]
Homework
• Find the mean, median, & mode for your M & M’s & for the total group
• Find the variance and the standard deviation for your M & M’s & for the total group
• Convert your group’s M & M’s to percentiles
• Answer the question: How does your sample vary from the total group sample?

Homework Format
Results
Description of M & M population (the % estimates from Mars Company)
1. The results (central tendency)….
2. The variability (range, variance, and standard deviation)
3. Most (color) fell in the 90th percentile, while (and so forth)
4. * provide charts as necessary
Description of Sample
1. Within the present sample (do the same as above except for using
the sample’s statistics)
2. * Provide charts as necessary
Summary
1. Summarize your results by comparing your sample to the population