Descriptive Statistics

STATS DAY
• First a few review questions
Which of the following correlation
coefficients would a statistician
know, at first glance, is a mistake?
• A. 0.0
• B +1.1
• C +1.0
• D -.7
• E -.2
Which of the following is a measure
of central tendency?
• A Mean
• B Correlation
• C Random Sample
• D Frequency Distribution
• E Histogram
Most psychologists accept a
difference between groups as
“real,” or significant, under which
of the following conditions?
• A p<.5
• B p<.3
• C p<.1
• D p<.05
• E p=0
Descriptive Statistics
Descriptive statistics are used to
organize and summarize data.
They provide an overview of
numerical data.
Descriptive Statistics
• Key descriptive statistics include:
• Measures of central tendency
• Measures of variability
• The coefficient of correlation
Central Tendency
• Mean
• The Mean or average is probably the most
commonly used method of describing central
tendency. To compute the mean all you do is
add up all the values and divide by the number
of values.
15, 20, 21, 20, 36, 15, 25, 15
• The sum of these 8 values is 167, so the
mean is 167/8 = 20.875.
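The arithmetic above can be checked with a minimal Python sketch. The variable names are illustrative only and are not part of the original slides.

```python
# The eight values from the slide.
scores = [15, 20, 21, 20, 36, 15, 25, 15]

# Add up all the values, then divide by how many there are.
total = sum(scores)
mean = total / len(scores)

print(total, mean)  # 167 20.875
```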
Median
• The median is the score found at the exact
middle of the set of values once they are
sorted from lowest to highest.
• For example, if there are 499 scores in the
list, score #250 would be the median.
15,15,15,20,20,21,25,36
• There are 8 scores and score #4 and #5
represent the halfway point. Since both of
these scores are 20, the median is 20. If
the two middle scores had different
values, you would have to take the
average of the two middle scores.
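A short sketch of the median rule described above, assuming plain Python lists. The function name `median` is a hypothetical helper written for this example; Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Return the middle score, averaging the two middle scores when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: one score sits exactly in the middle.
        return ordered[mid]
    # Even count: average the two scores on either side of the halfway point.
    return (ordered[mid - 1] + ordered[mid]) / 2

scores = [15, 20, 21, 20, 36, 15, 25, 15]
print(median(scores))  # 20.0
```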
Mode
• The most frequently occurring value is the
mode.
15,15,15,20,20,21,25,36
So the Mode is…………………
• The value 15 occurs three times and is the
mode.
• In a bimodal distribution, there are two
values that occur most frequently.
• 3, 6, 7, 7, 8, 8, 9, 0
– The modes are 7 and 8.
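Counting how often each value occurs can be sketched as follows. The helper name `modes` is hypothetical; it returns a list so that a bimodal set yields both values.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([15, 15, 15, 20, 20, 21, 25, 36]))  # [15]
print(modes([3, 6, 7, 7, 8, 8, 9, 0]))          # [7, 8]
```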
15,15,15,20,20,21,25,36
• For this set we just used what would the
mean, median, and mode be?
• 20.875, 20, and 15
• If the distribution were truly normal (bell-
shaped curve), then these would all be
equal.
Range
• Range is simply the highest value minus
the lowest value.
• 15,15,15,20,20,21,25,36
• The range in this example is:
36-15=21
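The same subtraction in Python, as a minimal sketch using the built-in `max` and `min` functions (the variable names are illustrative):

```python
scores = [15, 15, 15, 20, 20, 21, 25, 36]

# Range = highest value minus lowest value.
value_range = max(scores) - min(scores)
print(value_range)  # 21
```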
Bar Graphs
• A bar graph uses vertical bars to
represent the data.
• The height of the bars usually
represents the frequencies for the
categories that sit on the X axis.
• Note that, by tradition, the X axis
is the horizontal axis and the Y axis
is the vertical axis.
• Bar graphs are typically used for
categorical variables.
Histograms
• A histogram is a
graphic that shows
the frequencies and
shape that
characterize a
quantitative
variable.
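To make the idea of frequencies and shape concrete, here is a rough text-only sketch that groups the example scores into bins and prints one row of stars per bin. The bin width of 5 is an assumption chosen only for illustration; it does not come from the slides.

```python
from collections import Counter

scores = [15, 20, 21, 20, 36, 15, 25, 15]
bin_width = 5  # assumed bin width, chosen only for illustration

# Map each score to the lower edge of its bin and count scores per bin.
bins = Counter((s // bin_width) * bin_width for s in scores)

# Print every bin from the lowest to the highest, including empty ones.
for start in range(min(bins), max(bins) + bin_width, bin_width):
    label = f"{start}-{start + bin_width - 1}"
    print(f"{label:>5} | {'*' * bins[start]}")
```

Each row of stars plays the role of one bar, and the row lengths show where the scores cluster.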
Variability
• Refers to how much the scores in a data
set vary from each other and from the
mean
Standard Deviation
• Is an index of the amount of variability in
a set of data.
Speed (mph)
• Set A (Perfection Blvd): 35, 34, 33, 37, 38, 40, 36, 33, 34, 30
• Mean = 35
• SD = 2.87
• Set B (Wild Street): 21, 37, 50, 28, 42, 37, 39, 25, 23, 48
• Mean = 35
• SD = 10.31
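The two sets share a mean but differ greatly in spread, which a short sketch can show. This assumes the sample formula (dividing by n - 1), which is an assumption on my part; it reproduces the value reported for Set A. The function name `sample_sd` is hypothetical.

```python
from math import sqrt

def sample_sd(values):
    """Sample standard deviation: square root of the summed squared deviations over n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return sqrt(squared_deviations / (n - 1))

set_a = [35, 34, 33, 37, 38, 40, 36, 33, 34, 30]  # Perfection Blvd
set_b = [21, 37, 50, 28, 42, 37, 39, 25, 23, 48]  # Wild Street

print(sum(set_a) / len(set_a), round(sample_sd(set_a), 2))  # 35.0 2.87
print(sum(set_b) / len(set_b), round(sample_sd(set_b), 2))
```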
Standard Deviation
• To transform a raw score
into z-score units, just use
the following formula:
• Z-score = (Raw score - Mean) / Standard Deviation
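As a quick illustration of the z-score formula, the sketch below uses hypothetical numbers (a raw score of 130 on a scale with mean 100 and standard deviation 15); these values are not from the slides.

```python
def z_score(raw_score, mean, standard_deviation):
    """Express a raw score as a number of standard deviations from the mean."""
    return (raw_score - mean) / standard_deviation

# Hypothetical example values, chosen only for illustration.
print(z_score(130, 100, 15))  # 2.0
```

A z-score of 2.0 means the raw score sits two standard deviations above the mean.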
What if it’s not bell shaped?
• [Figure: three distribution curves labeled Normal, Skewed Right, and Skewed Left]
The mean, median, and mode are
affected by what is called skewness
(i.e., lack of symmetry) in the data.
• Look at the above figure and note that when a variable
is normally distributed, the mean, median, and mode are
the same number.
• If you go to the end of the curve, to where it is pulled
out the most, you will see that the order goes mean,
median, and mode as you “walk up the curve” for
negatively and positively skewed curves.
Skewed up rules!
• You can use the following two rules to
provide some information about skewness
even when you cannot see a line graph of
the data (i.e., all you need is the mean
and the median):
• 1. Rule One. If the mean is less than
the median, the data are skewed to the
left.
• 2. Rule Two. If the mean is greater
than the median, the data are skewed to
the right.
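The two rules can be written as a small function. This is a sketch only; the name `skew_direction` and the returned strings are hypothetical.

```python
def skew_direction(mean, median):
    """Apply the two rules: compare the mean with the median."""
    if mean < median:
        return "skewed left"
    if mean > median:
        return "skewed right"
    return "no skew indicated"

# Mean and median of the example set 15,15,15,20,20,21,25,36.
print(skew_direction(20.875, 20))  # skewed right
```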
Statistical significance (p)
• is a mathematical tool used to
determine whether the outcome
of an experiment is the result of a
relationship between specific
factors or due to chance.
Statistical Significance
• A result is called statistically
significant if the p value falls below 5%,
written p<.05. In other words, if chance
alone were at work, a result at least this
extreme would be expected less than 5% of
the time.
Statistical Significance
• The lower the p value the less likely the
results were due to chance.
Statistical Significance
If the p value was less than 1 in 100
it would look like
p<.01
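One way to see where a p value can come from is an exact permutation test, sketched below. This is not necessarily the analysis the slides have in mind; the technique, the function name `permutation_p_value`, and the two groups of scores are all assumptions made for illustration. The idea: if group labels did not matter, every way of splitting the pooled scores into two groups would be equally likely, so the p value is the share of splits whose difference in means is at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference between group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Try every possible way of assigning n_a of the pooled scores to group A.
    for chosen in combinations(range(len(pooled)), n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen_set]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical scores, chosen only for illustration.
control = [1, 2, 3, 4]
treated = [5, 6, 7, 8]

p = permutation_p_value(control, treated)
print(p < 0.05)  # True
```

Here only 2 of the 70 possible splits are as extreme as the observed one, so p is 2/70, which falls below .05.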
Clarification of Behavioral
Approach
• Subject matter: Effects of environment on
the overt behavior of humans and animals
• Basic Premise: Only observable events
(stimulus-response relations) can be
studied scientifically.