Statistics for teachers
Session 7
Introduction to Research
and Evaluation
Topic 1: Research Questions and
Hypothesis Testing
And
Topic 2: Introduction to Statistics
For Tonight
Today
Review the contents of the proposal
Topics tonight
– Finish the research questions
– Types of data review
– Hypothesis Testing
– Intro to Stats
The phases of a research project
Problem statement
Purpose
Hypothesis development / research
question(s)
Population / Sample type
Results reporting (data)
Statistical testing
Conclusions Recommendations
Parts of the Research Report
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
References
Appendix
Components of Chapter 1
Introduction
Background of the study
Problem statement
Significance of study
Overview of methodology
Delimitations of study
Definitions of key terms
Conclusion (optional)
Characteristics of
Components in Chapter 1
Introduction – 1 paragraph to 3 pages
– Gets attention - gradually
– Brief vs. reflective opening
Background – 2-5 pages
– History of problem, etc.
– Professional vs. practical use
– Be careful of personal intrusions
Characteristics of Components in
Chapter 1
Problem Statement – ½ page
– States problem as clearly as possible
Significance of study – 1 paragraph to 1 page
– Answers: “Why did you bother to conduct the
study?”
– Be careful of promising too much
Ways to Convey
Significance
Problem has intrinsic importance, affecting
organizations or people
Previous studies have produced mixed
results
Your study examines problem in different
setting
Meaningful results can be used by
practitioners
Unique population
Different methods used
Characteristics of
Components
Delimitations – as needed
– Not flaws
– Establishes the boundaries – can study be
generalized?
– Consider:
sample
Setting
time period
methods
Stating the Problem
Developing a hypothesis
– Methods: estimation and hypothesis testing.
In estimation, the sample is used to estimate a
parameter, and a confidence interval about the
estimate is constructed.
– Parameter: numerical quantity measuring some
aspect of a population
– Confidence interval: range of values, computed from the sample, that
contains the parameter in a high proportion of repeated samples
Hypothesis Testing: the most common use
– Hypothesis: an intelligent guess or assumption that guides
the design of the study
– Null hypothesis: there is no difference or there is no effect
– Alternative hypothesis: there is a difference or there is an
effect
– Hypotheses: more than one hypothesis; each is a statement about the
population
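The estimation idea above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the scores are made up, the function name `mean_confidence_interval` is hypothetical, and the interval uses a normal approximation (for a sample this small, a t-based interval would be somewhat wider).

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(scores, confidence=0.95):
    """Return (mean, lower, upper) using a normal approximation."""
    n = len(scores)
    m = statistics.mean(scores)
    s = statistics.stdev(scores)                    # sample standard deviation
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * s / math.sqrt(n)
    return m, m - margin, m + margin

# Hypothetical exam scores for a sample of 10 students
scores = [72, 85, 90, 66, 78, 88, 95, 70, 81, 75]
m, lower, upper = mean_confidence_interval(scores)
print(f"mean = {m:.1f}, 95% CI = ({lower:.1f}, {upper:.1f})")
```

The sample mean is the estimate of the parameter; the interval shows how much that estimate could reasonably vary from sample to sample.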
TYPES OF DATA
Variables
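Two categories:
Independent
– Variables that the researcher manipulates, or uses to group the
participants, in order to see their effect on the outcome.
Some cannot easily be manipulated without changing
the participants: age, gender, year, classroom teacher, any
personal background data, etc.
Others are changed in an experiment: hours of sleep, amount of food, time given to
complete an activity, curriculum, instructional
method, etc.
Dependent
– Variables that are measured as the outcome and are expected to change
in response to the independent variable (for example, exam scores)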
Variables
A variable: any measured characteristic or attribute that differs for
different subjects.
Two types:
– Quantitative: numeric variables,
measured on one of three scales:
– Ordinal: values are ranked, such as first, second, or third choice (most of the children
preferred red popsicles, and grape was the second choice)
– Interval: equal distances between adjacent values, but no true zero
(temperature in degrees)
– Ratio: equal intervals and a true zero, so one value can be
expressed as a multiple of another (examples: the time it
takes a child to respond to a question; the
number of times a black card is pulled)
– Qualitative: sometimes called "categorical variables,"
measured on a nominal scale.
Types of Data
Nominal Data -- Data that describe the presence or absence of some
characteristic or attribute; data that name a characteristic without any
regard to the value of the characteristic; also referred to as categorical
data. Male = 1 Female = 2, blue, green, etc
Ordinal Data -- Measurement based on the rank order of concepts or
variables; differences among ranks need not be equal.
Interval Data -- Measurement based on numerical scores or values in
which the distance between any two adjacent, or contiguous, data
points is equal; scale without a meaningful or true zero
Ratio Data -- Order and magnitude…. Measurement for which
intervals between data points are equal; a true zero exists; if the score
is zero, there is a complete absence of the variable.
– Four levels:
Nominal: assigning items to groups or categories
– Examples: Classroom, color, size
Ordinal: ordered in the sense that higher numbers represent higher
values
– Examples: 1 = freshmen, 2 = sophomore
Interval: one unit on the scale represents the same magnitude on the
trait or characteristic being measured across the whole range of the
scale.
– Interval scales do not have a "true" zero point, so
it is not possible to make statements about how many times higher
one score is than another.
Ratio: like interval, one unit represents the same magnitude on the trait or characteristic
being measured across the whole range of the scale.
– Ratio scales DO have true zero points
Nominal level of
measurement
Assigns a number to represent a
group (gender; geography)
Numbers represent qualitative
differences (good-bad)
No order to numbers
Statistics -- mode, percentages,
chi-square
Ordinal level of
measurement
Things are rank-ordered -- >, <
Numbers are not assigned arbitrarily
Assume a continuum
Examples -- classification (fr, soph,
jr, sr), levels of education, Likert
scales
Statistics -- median (preferred), mode,
percentage, percentile rank, chi-square, rank correlation.
Interval level of
measurement
Equal units of measurement
Arbitrary zero point--does not indicate
absence of the property
Example -- degrees, Likert-type
scales (treatment), numerical grades
Statistics -- frequencies, percentages,
mode, mean, SD, t test, F test,
product moment correlation
Ratio level of measurement
Absolute zero
Interval scale
Examples -- distance, weight
Statistics -- all statistical
determinations
Which are these?
Never married
Lower middle Class
Divorced
Age
Separated
Middle class
Widowed
Weight
Religious Affiliations
Height
Political Affiliations
Distance
freshmen
Which are these?
Never married N
Lower middle Class O
Divorced N
Age I/R
Separated N
Middle class O
Widowed N
Weight I/R
Religious Affiliations N
Height I/R
Political Affiliations N
Distance I/R
freshmen O
Minutes I/R
Key Point
Statistical Significance must be
distinguished from practical significance
– Even a small difference might be statistically
significant if the sample is large
– A p-value of .0001 means that, if there were no real
difference between groups, a difference at least as large as
the one observed would occur by
chance about 1 time in 10,000
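The point about sample size can be illustrated with a short sketch. It assumes a two-group z-test in which both groups share a known standard deviation; the 1-point difference, the standard deviation of 15, and the function name `two_sided_p_value` are all hypothetical choices for illustration.

```python
import math
from statistics import NormalDist

def two_sided_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference between two group means
    (z-test, assuming both groups share a known standard deviation)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# The same 1-point difference, tested with larger and larger samples
for n in (50, 500, 5000, 50000):
    print(n, round(two_sided_p_value(1.0, 15.0, n), 4))
```

The difference between the groups never changes, yet the p-value shrinks as the groups grow. Whether a 1-point difference matters in the classroom is a separate, practical question.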
Example Hypothesis
There will be no significant difference in the
EOC scores for schools that use CAERT
and those that don’t.
The EOC exam scores for schools using
Caert and those that don’t will not be
significantly different.
The EOC exam scores for schools using
Caert and those that don’t will be
significantly different.
Statistics for Teachers
Statistics
“If you can assign a number to it,
you can measure it”
Dr. W. Edwards Deming
Statistics
– refers to calculated quantities regardless of whether or
not they are from a sample
– is defined as a numerical quantity
– is often used loosely to refer to a range of techniques
and procedures for analyzing data, interpreting data,
displaying data, and making decisions based on data,
because those are the basic learning outcomes of a
statistics course.
What are the mean, the median,
and the mode in this
example?
Descriptive statistics
Descriptive statistics
– summarize a collection of data in a clear and understandable way.
Example: Scores of 500 children on all parts of a standardized test.
Methods: numerical and graphical.
– Numerical: more precise; uses numbers as an accurate measure
mean: the arithmetic average, which is calculated by adding
all the scores or totals and then dividing by the number of
scores.
standard deviation: a measure of how much the scores vary around the mean.
Together, these statistics convey information
about the average degree of a trait (shyness, for example) and the degree to
which people differ in it.
– Graphical: better for identifying patterns
stem and leaf display: a graphical method that splits each value into a
stem (leading digits) and a leaf (final digit), showing the shape of the
distribution while keeping the individual values
box plot: a graphical method to show how the data are
spread. The box stretches from the 25th percentile to
the 75th percentile
histograms: bars showing how many values fall in each interval
Since the numerical and graphical approaches complement each
other, it is wise to use both.
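The numerical summaries above can be computed with Python's built-in `statistics` module. This is a small sketch with made-up scores; the quartiles correspond to the edges of the box in a box plot.

```python
import statistics

scores = [70, 75, 75, 80, 85, 90, 95]   # hypothetical test scores

mean = statistics.mean(scores)       # sum of scores divided by their count
median = statistics.median(scores)   # middle score once sorted
mode = statistics.mode(scores)       # most frequent score
sd = statistics.stdev(scores)        # sample standard deviation

# 25th, 50th, and 75th percentiles (the box in a box plot runs from q1 to q3)
q1, q2, q3 = statistics.quantiles(scores, n=4)

print(f"mean={mean:.2f} median={median} mode={mode} sd={sd:.2f}")
print(f"box from {q1} to {q3}")
```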
Inferential statistics
For choosing a statistical test
variables fall into 2 groups
Continuous variables are numeric values that can
be ordered sequentially, and that do not naturally
fall into discrete ranges.
– Examples include: weight, number of seconds it takes
to perform a task, number of words on a user interface
Categorical variable values cannot be sequentially
ordered or differentiated from each other using a
mathematical method.
– Examples include: gender, ethnicity, software user
interfaces
Tools for Measuring
Measurement is the assignment of numbers to objects or
events in a systematic fashion.
Data Analysis
Explaining and interpreting the data:
– Data are plural
You are looking at more than one number or group of numbers;
subject-verb agreement is important when writing.
Central Tendency: measures of the location of the middle or the center
of the whole data set for a variable or group of variables
– Frequency: the number of times a value appears
– Mean: the arithmetic average
– Mode: the value that appears most often
– Median: the value in the middle when the values are arranged in
order
– Skew: A distribution is skewed if one of its tails is longer than the
other. Data may be skewed positively or negatively.
Standard deviation: a measure of how far the values typically fall from
the mean (symbol: sigma, σ); the square root of the variance
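Skew shows up when the mean and the median disagree. In this sketch the salary figures are invented: a single very high value creates a long right tail (positive skew) and pulls the mean well above the median.

```python
import statistics

# Hypothetical salaries in thousands; 150 is an extreme high value
salaries = [30, 32, 35, 36, 38, 40, 150]

mean = statistics.mean(salaries)
median = statistics.median(salaries)

print(f"mean = {mean:.1f}, median = {median}")
if mean > median:
    print("long right tail: positively skewed")
elif mean < median:
    print("long left tail: negatively skewed")
```

With skewed data, the median is usually the better description of a typical value.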
Inferential statistics
Inferential statistics
– Infers or implies something about a population from a
sample.
Population: the total group
Sample: a few from the whole group
Representative sample: a sample whose makeup is
proportionate to that of the population
Random sample: a sample that is chosen strictly by
chance and is not "hand-picked"
– Probability: the chance that an event will
occur, often expressed as a percentage
Parameters vs Statistics
Parametric vs Non-Parametric
Definitions again
– Parameter is the true value in the population of
interest (everyone)
– Statistic is a number you calculate from your
sample data in order to estimate the parameter
Example:
– All the Ag Teachers of the state
– Only 25 teachers selected from the 285 that
exist
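The 25-out-of-285 example can be simulated. Everything about the data here is an assumption made for illustration: the population is randomly generated "years of experience" values, and the seed is fixed only so the sketch is repeatable.

```python
import random
import statistics

rng = random.Random(42)   # fixed seed so the sketch is repeatable

# Hypothetical population: years of experience for all 285 teachers
population = [rng.randint(1, 30) for _ in range(285)]
mu = statistics.mean(population)      # parameter: true value for everyone

sample = rng.sample(population, 25)   # simple random sample of 25 teachers
M = statistics.mean(sample)           # statistic: estimate from the sample

print(f"parameter mu = {mu:.2f}, statistic M = {M:.2f}")
```

Drawing a different sample gives a different M, while mu stays fixed. That gap is the random variation that inferential statistics has to account for.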
What can make the sample different
from the true value/result of the whole?
Students taught by teachers using Caert will
score higher on end of course exams than
those who do not.
– True difference: one group actually has a
higher capacity to learn.
– Random variation: the two populations have
identical means and the observed difference is
a coincidence of sampling.
– Sampling error (bias): poorly selected samples
that do not represent the population.
Parameters or Parametric Data
Parameter: a numerical
quantity measuring some
aspect of a population of
scores.
– Parameters are usually
estimated by statistics
computed in samples
Greek letters are
commonly accepted for
writing formulas
Statistical symbols are
most common in
reporting actual data
analysis in reports or
articles.
Greek letters are used to designate
parameters
Quantity             Parameter   Statistic
Mean                 μ           M
Standard deviation   σ           s
Proportion           π           p
Correlation          ρ           r
Stats tests & types of data each use
1 Sample t-test
– 1 Continuous Dependent Variable with normal distribution
– 0 Independent Variables
1 Sample Median
– 1 Continuous Dependent Variable with non-normal distribution
– 0 Independent Variables
Binomial test
– 1 Bi-level Categorical Dependent Variable
– 0 Independent Variables
Chi-Square Goodness of Fit
– 1 Categorical Dependent Variable
– 0 Independent Variables
2 Independent Sample t-test
– 1 Continuous Dependent Variable with normal distribution
– 1 (2 level) Categorical Independent Variable
Wilcoxon Signed Ranks Test
– 1 Continuous Dependent Variable with non-normal distribution
– 1 (2 level) Categorical Independent Variable
Chi Square Test
– 1 Categorical Dependent Variable
– 1 (2-level) Categorical Independent Variable
Fisher Exact Test
– 1 Categorical Dependent Variable
– 1 (2 level) Categorical Independent Variable
Paired t-test
– 1 Continuous Dependent Variable with normal distribution
– 1 (2 Level) Categorical Independent Variable
One-way repeated measures ANOVA
– 1 Continuous Dependent Variable with normal distribution
– 1 (Multi-Level) Categorical Independent Variable
Friedman Analysis of Variance by Ranks
– 1 Continuous Dependent Variable with non-normal distribution
– 1 (Multi-Level) Categorical Independent Variable
One-way ANOVA
– 1 Continuous Dependent Variable with normal distribution
– 1 (Multi-level) Categorical Independent Variable
Kruskal Wallis
– 1 Continuous Dependent Variable with non-normal distribution
– 1 (Multi-level) Categorical Independent Variable
Factorial ANOVA
– 1 Continuous Dependent Variable with normal distribution
– 2 or more Categorical Independent Variables
Linear Regression
– 1 Continuous Dependent Variable with normal distribution
– 1 Continuous Independent Variable with normal distribution
Multiple Regression
– 1 Continuous Dependent Variable with normal distribution
– 1 or more Continuous Independent Variable with normal distribution
Linear Discriminant Analysis
– 1 Categorical Dependent Variable
– Multiple Continuous Independent Variables with normal distribution
ANCOVA
– 1 Continuous Dependent Variable with normal distribution
– 2 (or more) Categorical or Continuous Independent Variables with normal distribution
Results
Results
At the end of the trial experience, schools
using Caert had EOC exam scores that
were 18% higher than those of
schools that did not use Caert.
– Alpha set at p<.05
– Observed P value of .03
Conclusion
Interpretation: If there were no true
difference between
schools using Caert and those that don't,
the probability of observing a difference of 18% or
more due to chance alone would be .03. Because .03 is less than
the alpha of .05, the difference is statistically significant.
ANOVA
A factorial ANOVA has two or more
categorical independent variables (either
with or without the interactions) and a single
normally distributed interval dependent
variable.
ANOVA
In statistics, ANOVA is short for analysis of
variance. Analysis of variance is a collection of
statistical models, and their associated
procedures, in which the observed variance is
partitioned into components due to different
explanatory variables.
– The initial techniques of the analysis of variance were
developed by the statistician and geneticist R. A. Fisher
in the 1920s and 1930s, and the method is sometimes known as
Fisher's ANOVA or Fisher's analysis of variance, due
to the use of Fisher's F-distribution as part of the test of
statistical significance.
Z test
The Z-test is a statistical test used in inference
which determines if the difference between a
sample mean and the population mean is large
enough to be statistically significant, that is, if it is
unlikely to have occurred by chance.
The Z-test is used primarily with standardized
testing to determine if the test scores of a
particular sample of test takers are within or
outside of the standard performance of test takers.
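A one-sample z-test can be sketched as follows. The numbers are hypothetical (a standardized test with population mean 100 and standard deviation 15, and a class of 36 averaging 104), and the function name `z_test` is an illustrative choice. The population standard deviation is assumed to be known, which is what distinguishes a z-test from a t-test.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, population_mean, population_sd, n):
    """Return (z, two-sided p-value) for a one-sample z-test."""
    standard_error = population_sd / math.sqrt(n)
    z = (sample_mean - population_mean) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

z, p = z_test(sample_mean=104, population_mean=100, population_sd=15, n=36)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Here the standard error is 15 / 6 = 2.5, so z = 4 / 2.5 = 1.6. The p-value is above .05, so a class average of 104 is within the range expected by chance for a group of 36.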
Pearson Correlation
The Pearson Correlation tool calculates
the correlation coefficient between two
measurement variables when measurements on
each variable are observed for each of N subjects.
– (Any missing observation for any subject causes that
subject to be ignored in the analysis.) The Correlation
analysis tool is particularly useful when there are more
than two measurement variables for each subject. It
provides an output table, a correlation matrix, showing
the correlation coefficient for each possible pair of measurement
variables.
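The correlation coefficient r can be computed directly from its definition. The study-hours and score values below are invented, and `pearson_r` is a hypothetical helper written out by hand so that each step is visible.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How the two variables move together around their means
    co_variation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable varies on its own
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return co_variation / math.sqrt(ss_x * ss_y)

hours = [1, 2, 3, 4, 5]          # hypothetical hours of study
scores = [52, 58, 61, 70, 74]    # hypothetical exam scores
r = pearson_r(hours, scores)
print(f"r = {r:.2f}")
```

Values of r near +1 or -1 indicate a strong linear relationship; values near 0 indicate little or none.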
Two-Sample t-Test
The Two-Sample t-Test analysis tools test
for equality of the population means
underlying each sample. The three tools
employ different assumptions: that the
population variances are equal, that the
population variances are not equal, and that
the two samples represent before treatment
and after treatment observations on the
same subjects.
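The three versions differ only in how the standard error is built. The sketch below computes just the t statistic for each, using made-up scores and hypothetical function names. Turning t into a p-value requires the t-distribution, which is not in Python's standard library, so that step is left to a statistics package.

```python
import math
import statistics

def t_equal_variance(a, b):
    """t statistic assuming the two population variances are equal (pooled)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * statistics.variance(a)
              + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se

def t_unequal_variance(a, b):
    """t statistic without assuming equal variances (Welch's version)."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

def t_paired(before, after):
    """t statistic for before/after scores on the same subjects."""
    diffs = [y - x for x, y in zip(before, after)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se

group_a = [78, 85, 90, 72, 88]   # hypothetical scores, two separate groups
group_b = [70, 75, 80, 68, 77]
before = [70, 65, 80, 75, 60]    # hypothetical scores, same five students
after = [74, 70, 83, 80, 66]

print(f"equal variances:   t = {t_equal_variance(group_a, group_b):.2f}")
print(f"unequal variances: t = {t_unequal_variance(group_a, group_b):.2f}")
print(f"paired:            t = {t_paired(before, after):.2f}")
```

When both groups are the same size, the first two t statistics are identical; the versions then differ only in their degrees of freedom.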
Research Techniques
Types of hypothesis testing:
– T-test: compares the means of two groups
– ANOVA: Analysis of Variance – used to compare the
means of several groups
– Correlation: measures the strength of the relationship between two variables
– Chi-square test of independence: tests whether there is a relationship
between two categorical variables.
– Linear regression: the prediction of one variable based
on another variable, when the relationship between the
variables is assumed to be linear.
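Linear regression, the last technique in the list, can be sketched with ordinary least squares. The data are invented, and `fit_line` is a hypothetical helper that finds the slope and intercept of the best-fitting straight line.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return slope, intercept

hours = [1, 2, 3, 4, 5]          # hypothetical hours of study
scores = [52, 58, 61, 70, 74]    # hypothetical exam scores

slope, intercept = fit_line(hours, scores)
predicted = intercept + slope * 6   # predicted score for 6 hours of study
print(f"score = {intercept:.1f} + {slope:.1f} * hours; 6 hours -> {predicted:.1f}")
```

The prediction is only as good as the assumption that the relationship is linear, and predicting beyond the range of the data (6 hours here) deserves extra caution.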
Normal Curve
In practice, one often assumes that data are from an approximately
normally distributed population. If that assumption is justified, then
about 68% of the values are within 1 standard deviation of
the mean, about 95% of the values are within 2 standard deviations,
and about 99.7% lie within 3 standard deviations. This is known as the
"68-95-99.7 rule" or the "Empirical Rule".
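The three percentages can be checked against the standard normal distribution with Python's built-in `NormalDist`. The helper name `share_within` is a hypothetical choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The printed shares round to 68.3%, 95.4%, and 99.7%, which is where the rule gets its name.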
Key points
Comparing groups for Sig Diff
Key Terms
Use for new terms in profession (cognitive
processing skills)
Give preciseness to ambiguous term (learner)
General term used in special way (learning style)
Writing definition
– State term
– Give broad class to which term belongs
– Specify how term is used that differs
Conclusion – not always used
– Summarizes if necessary
– Tells reader what to expect
Survey Construction
Parts:
– Title
– Directions: introduction to survey
– Items (a list of statements or questions)
Usually with a scale of some type
– Demographic info
Scales
– Rating
– Ranking
– Semantic differential
– Likert type scale
Likert type scale
Ice cream is good for breakfast
– Strongly disagree
– Disagree
– Neither agree nor disagree
– Agree
– Strongly agree
Rating
Scale of 1 to 5
or 1 to 7 , etc….
– ? 1 = Best or highest
– ? 5 = Best or highest
Even number of items or odd?
– Forced choice – no fence sitting
– Middle – allows a middle ground response
– Might allow for no opinion (NA or NO)
Semantic differential
“In order to succeed you must know what
you are doing, like what you are doing,
and believe in what you are doing”
Will Rogers
Setting Alpha Level
Set alpha at something like 0.05
Conduct a statistical test
Obtain a p-value
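The last step, comparing the p-value with alpha, is a simple decision rule. The sketch below uses a hypothetical function name, `decide`, and the observed p-value of .03 from the Caert example.

```python
def decide(p_value, alpha=0.05):
    """Compare an observed p-value with the alpha chosen before the test."""
    if p_value < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.03))   # observed p-value from the Caert example
print(decide(0.20))   # a p-value above alpha
```

Alpha is set before the data are analyzed; choosing it after seeing the p-value defeats the purpose of the test.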
Parametric tests
– Pearson Product Correlation Coefficient
– Student t-Test
– The z-Test
– ANOVA
Nonparametric tests
– Chi-Squared
– Spearman Rank Coefficient
– Mann-Whitney U Test
– Kruskal-Wallis Test