Ch 2 Statistics - Youngstown State University
Chapter 2. Sadistics
Juggling Basic Statistical Notions Important For Teachers While Maintaining Sanity
1
Remember . . .
There are three kinds of people in the world.
Those who are good at math, and
Those who aren’t.
2
Raw Scores: without organization or context,
scores by themselves are meaningless
George Carlin’s famous line: “Here’s a partial football
score just in, Green Bay 6.”
Ohio State 35; Michigan 32
Annika Sorenstam 35; Lorena Ochoa 32
John and Charley are taking a trip together. Charley
has lived for 7 years; John has lived for 27. Who is
the older of the two?
Charley
John
Charley is a dog. In human years he is 46, i.e., (7 - 1) x
5 + 16 = 46. John is a man. In human years he is 27.
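The arithmetic behind Charley's 46 can be checked in a few lines. This is only a sketch: the rule it encodes (16 human years for the first dog year, then 5 for each additional year) is inferred from the single computation above, and the function name `dog_to_human_years` is made up for illustration.

```python
def dog_to_human_years(dog_years: int) -> int:
    """Convert a dog's age to 'human years' using the slide's arithmetic.

    Assumption: (7 - 1) x 5 + 16 generalizes to 16 for the first year
    plus 5 for every additional year.
    """
    if dog_years < 1:
        raise ValueError("rule is only defined for dogs at least 1 year old")
    return (dog_years - 1) * 5 + 16


charley = dog_to_human_years(7)  # Charley has lived 7 years
john = 27                        # John is already measured in human years
older = "Charley" if charley > john else "John"
print(charley, john, older)
```

The point of the slide survives the code: the raw numbers 7 and 27 only become comparable once both are placed on the same scale.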
3
Topic List
Descriptive vs. Inferential Statistics
Concepts of Data and Variables
Frequency Tables
Bar Graphs and Histograms
Measures of Central Tendency
Measures of Variability
Shapes of Distributions
z-scores
Correlation
4
Two Main Areas of Statistics:
Descriptive vs. Inferential
Descriptive Statistics is used to organize,
consolidate or summarize data we have in front of
us. Typically in descriptive statistics we describe:
a set of data elements by graphically displaying the information; or
its central tendencies and how it is distributed in relation to this
center; or
the relationship between two data elements.
Inferential Statistics is a leap into the unknown.
We use samples (a selected portion of the data
set) to draw inferences about populations (the
complete set of data elements).
5
Variables
A good place to begin is with the concept of “variables”.
Our students “vary” with regard to many
characteristics related to aptitude and achievement.
We can think of these variable characteristics using
three levels of generality.
6
English Lesson – When Speaking of Data
As a teacher, you will see lots of data.
The dictionary defines “data” as facts or figures.
Notice the word “data” is plural and when speaking of
data you need to use plural verb forms. It is an
unfortunate giveaway that you might NOT know what
you are talking about if you use the phrase “this data
is” instead of “these data are” in professional
conversation.
“Datum” is the singular form for data. No one ever
uses this term.
7
Making and Reading Frequency Tables Part 1:
Frequency Distributions - with special focus on bins (also known as
intervals, categories and class intervals)
Purpose of Creating these Tables – To organize data in ways to
make our inspection of those data much more manageable.
Frequency Distribution
We construct or read a table of counts per score.
BUT, when we have many scores, we create intervals (I like
the term “bins”) and place the individual scores in the bins.
When making bins:
Determine your score range
Determine an appropriate number of bins. Rule of thumb: no fewer than
5 and no more than 20 class intervals work best for a frequency table.
Make sure no overlap exists so that no data fall into more than one bin.
Count each score in its one and only appropriate bin.
Notice that in the resulting table, individual scores are lost.
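The binning steps above can be sketched in Python. The scores are hypothetical, invented for illustration, and the function name `frequency_table`, the starting point of 50, and the bin width of 10 are all assumptions chosen so that the example lands on five bins.

```python
from collections import Counter


def frequency_table(scores, start, bin_width):
    """Count integer scores in non-overlapping bins of equal width.

    Each bin covers start + i*bin_width through start + (i+1)*bin_width - 1,
    so every score falls into exactly one bin.
    """
    if min(scores) < start:
        raise ValueError("start must not exceed the lowest score")
    counts = Counter((s - start) // bin_width for s in scores)
    n_bins = (max(scores) - start) // bin_width + 1
    table = []
    for i in range(n_bins):
        low = start + i * bin_width
        high = low + bin_width - 1
        table.append((low, high, counts.get(i, 0)))
    return table


# Hypothetical test scores (range: 52 to 95).
scores = [52, 55, 61, 64, 67, 68, 70, 71, 73, 74, 75, 78, 80, 82, 85, 88, 91, 95]
table = frequency_table(scores, start=50, bin_width=10)
for low, high, f in table:
    print(f"{low}-{high}: {f}")
```

The table reports only the count in each bin. The individual scores are no longer visible, which is the loss of detail noted above.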
8
Making and Reading Frequency Tables Part 2:
Cumulative Distributions
Cumulative frequency distribution: A distribution that indicates
cumulative frequency counts (cum f) in each bin, and/or
percentage of the total number of cases at and below the upper
limit of the associated bin. Sometimes this is referred to simply
as cumulative distribution or cumulative frequency.
Note: Educators are using the descriptive statistics of
cumulative distributions when they speak of students’ relative
standing.
Percentile: The point on the original measurement scale at
and below which a specified percentage of scores falls. Also
called a percentile point.
Percentile rank: The percentile rank of a score is the point on
the percentile scale that gives the percentage of scores
falling at and below a student’s specified score.
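A minimal sketch of these definitions, using a hypothetical set of ten quiz scores. The function names `cumulative_table` and `percentile_rank` are illustrative, and the "at and below" convention follows the definitions above (other textbooks use slightly different conventions).

```python
def cumulative_table(scores):
    """Return (score, f, cum f, cumulative percent) rows, lowest score first."""
    n = len(scores)
    rows = []
    cum_f = 0
    for value in sorted(set(scores)):
        f = scores.count(value)   # frequency of this score
        cum_f += f                # running count at and below this score
        rows.append((value, f, cum_f, 100 * cum_f / n))
    return rows


def percentile_rank(scores, score):
    """Percentage of scores falling at and below the given score."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


# Hypothetical quiz scores, invented for illustration.
scores = [3, 5, 5, 6, 7, 7, 7, 8, 9, 10]
rows = cumulative_table(scores)
for value, f, cum_f, cum_pct in rows:
    print(f"score={value:2d}  f={f}  cum f={cum_f:2d}  cum %={cum_pct:5.1f}")

rank_of_7 = percentile_rank(scores, 7)
print(rank_of_7)
```

In this data set a score of 7 has a percentile rank of 70, because 7 of the 10 scores fall at or below it. Read the other way around, 7 is the 70th percentile point.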
9
Frequency Distribution Table
10
Tables are Nice, but Pictures are Nicer
Frequency distributions are often converted into graphic form.
Bar Graph – Individual counts. The count bins are
separated on the horizontal axis.
Histogram – Grouped counts. The bins touch each other on
the horizontal axis.
Pie Graph – Either individual or grouped counts. The media
like to display data using these graphs.
Explore the CSERD (Computational Science Education
Reference Desk) Interactive Website. This is a Pathways
project of the National Science Digital Library and funded by
the National Science Foundation.
11
Ideas of Data “Centers”; How Do Data Cluster?
. . . . starting with a concept from Garrison Keillor.
Keillor’s hometown is Lake Wobegon, located
near the geographic center of Minnesota.
Keillor reports that in Lake Wobegon "all the
women are strong, all the men are good
looking, and all the children are above
average."
12
Central Tendency
While graphs and charts are useful for representing data visually,
they are inconvenient: they are difficult to display and cannot be
easily remembered apart from the visual. It is frequently useful
to reduce data to a number (sometimes called an index number)
that is easy to remember, is easy to communicate, yet captures
the essence of the complete data set it represents.
One such kind of index is called Measures of Central Tendency (i.e.,
how the raw data tend to cluster):
Mean – the arithmetic average
Median – the middle score
Mode – the most frequently occurring score
So, these are measures of “center” regarding the data, but we
are also concerned about how the raw data are spread out
around the center.
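The three measures of center can be computed with Python's standard `statistics` module. The scores below are hypothetical and were chosen so that the three measures do not all agree.

```python
import statistics

# Hypothetical quiz scores; the single high score of 14 is deliberate.
scores = [4, 6, 7, 7, 7, 8, 9, 10, 14]

mean = statistics.mean(scores)      # arithmetic average: sum / count
median = statistics.median(scores)  # middle score once the scores are sorted
mode = statistics.mode(scores)      # most frequently occurring score

print(f"mean={mean}, median={median}, mode={mode}")
```

Here the mean is 8 while the median and mode are both 7. The one high score pulls the mean upward but leaves the median and mode where they were, which is one reason to look at more than one measure of center.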
13
Consider the two graphs below. These graphs represent the scores on two quizzes. The
mean score for each quiz is 7.0. Despite the equality of means, you can see that the
distributions are quite different. Specifically, the scores on Quiz 1 (top graph) are more
densely packed while those on Quiz 2 (bottom graph) are more spread out. The differences
among students were much greater on Quiz 2 than on Quiz 1.
14
Variability
Our second kind of index is called Measures of Variability
(i.e., how the raw data tend to spread out or
scatter):
Range – find the lowest and highest scores, then
subtract the lowest from the highest
Standard Deviation (S, SD, σ) – this is an
interesting concept; it is akin to finding the
average distance of the scores from the center
Variance (SD²) – mathematically the standard
deviation squared; we more often use the
standard deviation in educational assessment.
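As a sketch, here are two hypothetical quizzes that share a mean of 7 but differ in spread. The scores are invented, not taken from the graphs in this chapter. The population formulas (`pstdev`, `pvariance`) are used on the assumption that the class is the whole group of interest. The sample versions (`stdev`, `variance`) divide by n - 1 instead of n.

```python
import statistics

# Hypothetical scores, invented so that both quizzes have a mean of 7.
quiz1 = [6, 6, 7, 7, 7, 7, 7, 8, 8]
quiz2 = [3, 4, 5, 7, 7, 7, 9, 10, 11]


def describe_spread(scores):
    """Return (range, variance, standard deviation) for a list of scores."""
    score_range = max(scores) - min(scores)   # highest minus lowest
    variance = statistics.pvariance(scores)   # mean squared distance from the mean
    sd = statistics.pstdev(scores)            # square root of the variance
    return score_range, variance, sd


for name, scores in (("Quiz 1", quiz1), ("Quiz 2", quiz2)):
    score_range, variance, sd = describe_spread(scores)
    print(f"{name}: mean={statistics.mean(scores)}, range={score_range}, "
          f"variance={variance:.2f}, SD={sd:.2f}")
```

Both quizzes report the same mean, yet every measure of variability is larger for the second quiz, so the mean alone would hide the difference between the two classes of scores.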
15
16
Shape of Normal Distributions
The frequency histograms for test score data
often approximate what is called the “normal
distribution” (aka bell curve, normal curve).
The normal curve has three characteristics:
unimodal – one hump
asymptotic – tails never touch the base
symmetrical – mirror image about the center
axis
17
Normal Curve
18
Shape of Other Distributions
Kurtosis –
platykurtic looks more flat
leptokurtic looks more peaked
Skewness –
positive skew means that the tail is to the right
negative skew means that the tail is to the left.
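Skewness and kurtosis can also be put into numbers. The sketch below uses the common moment-based coefficients, which are an assumption here since the chapter gives no formulas: skewness is the third central moment divided by the 1.5 power of the second, and excess kurtosis is the fourth central moment divided by the square of the second, minus 3 (so that a normal curve scores 0). All data sets are hypothetical.

```python
def central_moment(data, k):
    """Average of (x - mean)**k over the data."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Positive when the tail is to the right, negative when it is to the left."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Negative for flatter (platykurtic), positive for more peaked (leptokurtic)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


right_tail = [1, 2, 2, 3, 3, 3, 4, 10]   # one high straggler
left_tail = [10, 9, 9, 8, 8, 8, 7, 1]    # mirror image: one low straggler
flat = [1, 2, 3, 4, 5]                   # scores spread evenly
peaked = [1, 3, 3, 3, 3, 3, 3, 5]        # scores piled up at the center

print(skewness(right_tail), skewness(left_tail))
print(excess_kurtosis(flat), excess_kurtosis(peaked))
```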
-------------------------------------------------------------------
Back to the normal distribution, let’s look at
transforming a data score to a score that will tell us
where that score is in relationship to the mean. This
score is called a “z-score”.
19
z-scores
Formula: z = (X - M) / SD
Definition: A measure of how many standard
deviations a raw score is from the mean.
If the z score is negative, we say the score is
below the mean
If the z score is positive, we say the score is
above the mean
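A short worked example of the formula. The class mean of 70 and standard deviation of 10 are hypothetical values picked to keep the arithmetic easy, and the function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that raw score x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


# Hypothetical test with mean 70 and standard deviation 10.
mean, sd = 70, 10
for raw in (85, 70, 60):
    z = z_score(raw, mean, sd)
    position = "above" if z > 0 else "below" if z < 0 else "at"
    print(f"raw score {raw}: z = {z:+.1f} ({position} the mean)")
```

A raw score of 85 gives z = +1.5 (one and a half standard deviations above the mean), while a raw score of 60 gives z = -1.0 (one standard deviation below).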
20
z-scores in normal curve
This Graph Leads In To Percentile Rank
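The link from z-scores to percentile ranks can be sketched as follows, assuming the scores follow a normal curve exactly (real test scores only approximate it). The area under the normal curve at and below a given z is computed with the standard library's `math.erf`. The function name is illustrative.

```python
import math


def percentile_rank_from_z(z):
    """Percentage of a normal distribution falling at or below z."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))


for z in (-2, -1, 0, 1, 2):
    print(f"z = {z:+d}: percentile rank = {percentile_rank_from_z(z):.1f}")
```

Under this assumption a z-score of 0 sits at the 50th percentile, z = +1 at roughly the 84th, and z = -1 at roughly the 16th.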
21
Comparing Two Variables
So far we have only dealt with one variable (aka
univariate statistics). Sometimes (I would say
many times) we are curious as to the
relationship between two variables (aka
bivariate statistics). We call this curiosity an
interest in co-relationships or correlation.
22
Some History . . .
Francis Galton (1822-1911)
and “Co-relations”
Cousin of Charles Darwin
Interested in the mathematical
treatment of heredity
Used statistical analysis to study
human variation
noted that arranging measures of
a physical trait in a population
(height, e.g.) displays a bell-shaped distribution
Coined term "eugenics"—science of
improving the stock
variations (deviations) viewed as
flaws as well as assets
artificial and natural selection will
shift median of distribution
23
The Eugenics Movement
Scientific “evidence” was used to argue that social ills like
feeble-mindedness, alcoholism, pauperism and criminal behavior are
hereditary traits.
Aim - "to give the more suitable races or strains of blood a better
chance of prevailing speedily over the less suitable"
Can no longer rely on natural selection:
unfit survive to childbearing years due to
advances in medicine
comforts of civilization
social welfare
unfit reproduce at higher rate than fit
Must design society by controlling human reproduction:
encourage fit to have children
prohibit unfit from having children
24
Scattergram – Can you “eyeball” the one line you could
draw through the data points that best describes the graphic display?
25
Correlation Coefficient – the calculated number that
best describes the relationship between two variables
Correlation coefficient – symbol is “r” – linear relationships
Range -1.00 through .00 to +1.00
Sign indicates direction
+ indicates that as one variable increases, the other variable
increases
- indicates that as one variable increases, the other variable
decreases
Number indicates strength
Although the cutoffs below are somewhat arbitrary, the following
table might be useful in interpretation:
-1.0 to -0.7 strong inverse association.
-0.7 to -0.3 weak inverse association.
-0.3 to +0.3 little or no association.
+0.3 to +0.7 weak direct association.
+0.7 to +1.0 strong direct association.
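For readers who want to see where "r" comes from, here is a minimal sketch of the Pearson correlation coefficient computed from deviations about the two means. The data (hours studied, quiz scores, absences) are hypothetical, and `pearson_r` is an illustrative name rather than a library function.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data for five students.
hours_studied = [1, 2, 3, 4, 5]
quiz_scores = [2, 4, 5, 4, 5]
absences = [10, 8, 6, 4, 2]

r_positive = pearson_r(hours_studied, quiz_scores)
r_negative = pearson_r(hours_studied, absences)
print(f"{r_positive:.2f} {r_negative:.2f}")
```

The first pair gives an r of about +0.77: as hours increase, scores tend to increase. The second pair gives exactly -1.00, because absences fall by the same amount for every added hour, which is a perfect straight-line relationship in the negative direction.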
26
Important Notes about “r”:
Not a percentage (decimal makes it look like one)
Linear assumption, not curvilinear
Equal scatter assumption – no bunching
Variability affects “r”
The greater the variability, the greater the “r”
The less the variability, the lower the “r”
“r” does not imply causation
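The note that variability affects "r" can be demonstrated with a small sketch. The data are artificial: y equals x plus or minus 3 in alternation, so the underlying relationship is identical everywhere. The only thing that changes between the two calculations is how much of the x range is kept. The helper `pearson_r` is written out here so the example runs on its own.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Artificial data: y = x + 3 for odd x, y = x - 3 for even x.
xs = list(range(1, 21))
ys = [x + 3 if x % 2 else x - 3 for x in xs]

# Full range of x (1 through 20): plenty of variability.
r_full = pearson_r(xs, ys)

# Restricted range (only x from 9 through 12): very little variability.
kept = [(x, y) for x, y in zip(xs, ys) if 9 <= x <= 12]
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(f"full range: r = {r_full:.2f}")
print(f"restricted range: r = {r_restricted:.2f}")
```

With the full range the correlation is strong (about 0.88). With only the middle four scores it drops to nearly zero, even though nothing about the relationship itself has changed. This is worth remembering when a correlation is computed on a narrow group, such as only the highest-scoring students.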
27
Depth Chart
During your YSU field work, you will be asked to organize data through
the creation of frequency tables and histograms. Thus, we discussed
constructing them as well as understanding them.
Throughout your professional practice, you will be asked to utilize
measures of central tendency and variability. Thus, we emphasized
understanding them, basic computations, and their relationship to z-scores. These concepts are key to understanding standard scores.
In professional publications you will see correlation coefficients. We
discussed (and you were asked to compute) correlation. Correlation is
a key tool in exploring our next topic – reliability (and later, validity).
Hopefully you will see value in computing measures based on your
own classroom data. It is actually fun to learn to do these basic
descriptive stats with a software package. Commonly used packages
include SPSS, SAS, Minitab, and SYSTAT. Any system would be OK.
Start simple.
28
Terms/Concepts to Review and
Study on Your Own (1)
bivariate distribution
measures of central tendency (mean,
median, mode)
construct
correlation coefficient
descriptive statistics
frequency distribution
histogram
inferential statistics
29
Terms/Concepts to Review and
Study on Your Own (2)
kurtosis
normal curve
range
scattergram
skewness
standard deviation
variability
variable
z-score
30