descriptive statistics - People Server at UNCW

DESCRIPTIVE STATISTICS
Summarizing, organizing, simplifying and communicating
the nature of a data set in numerical terms. These
numerical accounts are intended to describe the data set
without inferring causal factors (what caused the data).
The aim is only to describe the data set.
3 primary concerns
• CENTRALITY
• VARIABILITY
• RELATEDNESS
CONSIDERATION:
• SCALES OF DATA
– Nominal scales of data
– Continuous scales of data (interval and ratio)
MEASURES OF CENTRALITY
MEAN
MEDIAN
MODE
MEAN (which scales of data can be represented in this way?)
• Median: the most central value
• Mode: the most frequently occurring data point
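As a quick illustration of the three measures, here is a minimal sketch using Python's standard `statistics` module. The nine scores are the ones used in the standard deviation example later in these notes; the variable names are only illustrative.

```python
import statistics

# Nine scores, borrowed from the standard deviation example in these notes.
scores = [4, 9, 11, 12, 17, 5, 8, 12, 14]

mean = statistics.mean(scores)      # sum of the values divided by their count
median = statistics.median(scores)  # middle value once the scores are sorted
mode = statistics.mode(scores)      # value that occurs most often

print(f"mean   = {mean:.3f}")
print(f"median = {median}")
print(f"mode   = {mode}")
```

All three need numbers that can at least be ordered, and the mean also needs them to be added, which is why the scale of the data matters.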
Centrality
• Can you find the mean hair color in the
class?
• MEDIAN?
• MODE?
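For nominal data such as hair color, only the mode is available. A minimal sketch, assuming a small made-up class (the hair colors below are hypothetical, not data from the course):

```python
from collections import Counter

# Hypothetical hair colors for a small class (illustrative data only).
hair_colors = ["brown", "blond", "brown", "black", "red", "brown", "black"]

counts = Counter(hair_colors)
# most_common(1) returns a list holding the single most frequent (category, count) pair.
modal_color, modal_count = counts.most_common(1)[0]

print(f"mode = {modal_color} ({modal_count} of {len(hair_colors)} students)")
# There is no mean or median here: the categories cannot be added together,
# and they have no natural order from which to pick a middle value.
```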
Centrality for ordinal data makes
little sense
• Can you find the mean?
• Median?
• Mode?
• TRANSFORM YOUR DATA!
Measures of Variability for
Continuous data
• Range
• Variance
• Standard deviation
Range
• Highest score minus lowest score
– How accurately will the “range” describe
dispersion of the data?
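One way to see the limits of the range is to add a single extreme score. The sketch below uses the nine scores from the standard deviation example in these notes; the extra score of 60 is a hypothetical outlier added only for illustration.

```python
scores = [4, 9, 11, 12, 17, 5, 8, 12, 14]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Hypothetical outlier: one extreme value changes the range completely,
# even though the other nine scores are untouched.
with_outlier = scores + [60]
outlier_range = max(with_outlier) - min(with_outlier)

print(f"range without outlier = {score_range}")
print(f"range with outlier    = {outlier_range}")
```

Because the range depends on only two data points, it says nothing about how the remaining scores are spread out.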
Variance
Don’t be confused by different
formulations: this is also a formula for
variance
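The two formulations usually meant here are the definitional form (the average squared deviation from the mean) and the computational form (the mean of the squares minus the square of the mean). That pairing is an assumption, since the formulas themselves are not reproduced in this transcript. A minimal sketch showing that the two agree, using the nine scores from the standard deviation example in these notes:

```python
scores = [4, 9, 11, 12, 17, 5, 8, 12, 14]
n = len(scores)
mean = sum(scores) / n

# Definitional form: average of the squared deviations from the mean.
var_definitional = sum((x - mean) ** 2 for x in scores) / n

# Computational form: mean of the squared scores minus the squared mean.
var_computational = sum(x ** 2 for x in scores) / n - mean ** 2

print(f"definitional  = {var_definitional:.4f}")
print(f"computational = {var_computational:.4f}")
```

Both divide by n here; the same equivalence holds for the n-1 version as long as both forms use the same divisor.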
Standard Deviation
(the square root of the variance)
Why “n” or “n-1”?
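The difference between the two divisors can be seen directly with Python's `statistics` module: `pstdev` divides by n (the data are treated as the whole population), while `stdev` divides by n-1 (the data are treated as a sample used to estimate the population's variability). A small sketch with the nine scores from the example in these notes:

```python
import statistics

scores = [4, 9, 11, 12, 17, 5, 8, 12, 14]

population_sd = statistics.pstdev(scores)  # divides the sum of squares by n
sample_sd = statistics.stdev(scores)       # divides the sum of squares by n - 1

print(f"SD with n   = {population_sd:.2f}")
print(f"SD with n-1 = {sample_sd:.2f}")
```

The n-1 version is always a little larger, and the gap shrinks as the number of scores grows.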
• STD Example
• Find the standard deviation of 4, 9, 11, 12, 17, 5, 8, 12, 14
First work out the mean: 10.222
Now, subtract the mean individually from each of the numbers given and
square the result. This is equivalent to the (x - x̄)² step, where x refers to the
values given in the question and x̄ is their mean.
x      (x - x̄)²
4      38.7
9      1.49
11     0.60
12     3.16
17     45.9
5      27.3
8      4.94
12     3.16
14     14.3
• Now add up these results (this is the 'sigma' in the formula): 139.56
• Divide by n. n is the number of values, so in this case it is 9. This gives us:
15.51
• And finally, take the square root of this: 3.94
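The same steps can be checked with a few lines of Python. This is only a sketch of the hand calculation, dividing by n as the example does; the variable names are illustrative.

```python
import math

scores = [4, 9, 11, 12, 17, 5, 8, 12, 14]
n = len(scores)

mean = sum(scores) / n                            # step 1: the mean
squared_devs = [(x - mean) ** 2 for x in scores]  # step 2: squared deviations
sum_sq = sum(squared_devs)                        # step 3: add them up ("sigma")
variance = sum_sq / n                             # step 4: divide by n
sd = math.sqrt(variance)                          # step 5: take the square root

print(f"mean = {mean:.3f}")
print(f"sum of squared deviations = {sum_sq:.2f}")
print(f"variance = {variance:.2f}")
print(f"standard deviation = {sd:.2f}")
```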
Example
• Mean scores for a class on two different
tests were:
Test 1: 75.5%
Test 2: 75.5%
Did an average student do better, worse or
the same on test 2 vs test 1?
[Figure: bar graph of mean % performance (y-axis, 0 to 80) for test 1 and test 2]
[Figure: Test 1 histogram; x-axis: Exam 1 (65 to 100), y-axis: count (0 to 8)]
[Figure: Test 2 histogram; x-axis: Q1 (40 to 110), y-axis: count (0 to 9)]
NOTE***
• When you present a mean, it should
always be accompanied by a measure of
variability!!
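To see why, consider two sets of scores with the same mean of 75.5% but very different spreads. The scores below are hypothetical, made up to match that mean; they are not the class data behind the histograms.

```python
import statistics

# Hypothetical scores (illustrative only); both lists average exactly 75.5.
test1 = [70, 72, 74, 75, 76, 77, 79, 81]
test2 = [45, 55, 65, 75, 80, 88, 96, 100]

for name, scores in (("test 1", test1), ("test 2", test2)):
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample SD (n - 1), since a class is a sample
    print(f"{name}: mean = {mean:.1f}%, SD = {sd:.1f}")
```

Reported on its own, the mean makes the two tests look identical; the standard deviation shows that students were far more spread out on test 2.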
[Figure: bar graph of mean % performance (y-axis, 0 to 120) for test 1 and test 2]
Relatedness = correlation
– Correlations yield coefficient values
– Between +1.0 and -1.0
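As a rough sketch of where such a coefficient comes from, the function below computes Pearson's r from deviations about the two means. The function name and the study-hours data are hypothetical, chosen only to show one positive and one negative result.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of cross-products of the deviations from each mean.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares.
    # (If either variable is constant this is zero and r is undefined.)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied versus two different outcome scores.
hours = [1, 2, 3, 4, 5]
rising = [52, 55, 61, 70, 74]
falling = [74, 70, 61, 55, 52]

print(f"r (rising)  = {pearson_r(hours, rising):+.2f}")
print(f"r (falling) = {pearson_r(hours, falling):+.2f}")
```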
Correlational outcomes can be
visualized
Criteria For Evaluating Correlation Coefficients
There are no widely accepted criteria for defining a strong, moderate or weak association.
However, there are suggestions for health science studies:
Correlation coefficient, r    Interpretation
0.00 - 0.25                   no or weak relationship
0.25 - 0.50                   fair degree of relationship
0.50 - 0.75                   moderate to good relationship
0.75 - 1.00                   good to strong relationship
Correlation coefficients are not proportional. That is, the difference between 0.5 and 0.6 is not the
same as the difference between 0.8 and 0.9. Also, a 0.4 correlation for one set of data is not the
same as a 0.4 correlation for another set of data. Each correlation coefficient must be interpreted
with respect to its own data.
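The suggested bands can be written as a small lookup. This is only a sketch: the function name is hypothetical, and two choices are assumptions not stated in the source, namely that the bands apply to the magnitude of r (so -0.6 is read like +0.6) and that a boundary value such as 0.25 falls into the higher band.

```python
def describe_r(r):
    """Map a correlation coefficient to the suggested health-science labels."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1.0 and +1.0")
    size = abs(r)  # assumption: sign gives direction, magnitude gives strength
    # Assumption: boundary values (0.25, 0.50, 0.75) belong to the higher band.
    if size < 0.25:
        return "no or weak relationship"
    if size < 0.50:
        return "fair degree of relationship"
    if size < 0.75:
        return "moderate to good relationship"
    return "good to strong relationship"

for r in (0.10, -0.60, 0.90):
    print(f"r = {r:+.2f}: {describe_r(r)}")
```

A label from this lookup is a starting point only, since each coefficient still has to be judged against its own data.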
Relatedness…correlation
• Consider scale of data for each variable
Var X        Var Y        Technique
Nominal      Nominal      Chi-square
Nominal      Continuous   Rpbs (point-biserial r)
Ordinal                   Spearman's rho
Continuous   Continuous   Pearson's r (Pearson's product-moment correlation)
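To illustrate how the choice of technique follows the scale of the data, the sketch below computes Spearman's rho as Pearson's r applied to ranks, with tied values sharing their average rank. The helper names and the example data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks of each variable."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: y always rises with x, but not in a straight line.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(f"Pearson's r    = {pearson_r(x, y):.3f}")
print(f"Spearman's rho = {spearman_rho(x, y):.3f}")
```

Because rho uses only the ordering of the scores, it suits ordinal data and reaches 1.0 for any relationship that consistently increases, whereas Pearson's r reaches 1.0 only when the relationship is a straight line.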
Further considerations when
performing correlation analysis
Correlation
• Third variables
• Directionality
• Correlation coefficients
• Positive, negative and zero correlations
• Graphing and curvilinearity
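The point about graphing and curvilinearity can be made concrete with a small sketch: a perfectly predictable U-shaped relationship can still give a Pearson's r of zero. The data are hypothetical, and the function is a plain implementation of Pearson's r.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical U-shaped data: y is completely determined by x (y = x squared),
# yet the falling left half and the rising right half cancel each other out.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

r = pearson_r(x, y)
print(f"Pearson's r = {abs(r):.2f}")
```

A zero correlation therefore means "no linear relationship", not "no relationship", which is why the data should be graphed before the coefficient is interpreted.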
Graphing results
Bar graphs vs line graphs
Label axes
Use error bars
Provide a figure legend!
Rough methods section for worry survey
• Methods
• Subjects
• The subjects were randomly chosen people with no
preference for race, socioeconomic status, appearance,
etc. Researchers attempted to avoid recruitment of
subjects under the age of eight years old. The subjects
were separated into two different groups, child (8 years to
12 years) and adult. Otherwise, a matched stratified
random sampling procedure was used across age groups.
Adult subjects were classified in one of four subsets: teen
(13 years to 19 years), adult (20 years to 40 years), middle-aged adult (41 years to 60 years), and old persons (61
years plus). The subjects were chosen with the intent of
achieving equal representation among all four subsets within
the adult group. To maintain the objective nature of the
experiment the subjects were recruited in a city in
southeastern North Carolina at a variety of places including,
but not limited to: local restaurants, the local college
campus, the local beach, and local parks.
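The age groupings described above can be sketched as a simple classifier, which also makes it easy to check how evenly the subsets are represented. The function name and the list of ages are hypothetical; only the age boundaries come from the text.

```python
from collections import Counter

def age_group(age):
    """Return the subset for an age in whole years, or None if under 8."""
    if age < 8:
        return None  # recruitment of subjects under eight was avoided
    if age <= 12:
        return "child"
    if age <= 19:
        return "teen"
    if age <= 40:
        return "adult"
    if age <= 60:
        return "middle-aged adult"
    return "old person"

# Hypothetical ages of recruited subjects (illustrative only).
ages = [9, 11, 15, 18, 24, 37, 45, 58, 63, 71]
group_counts = Counter(age_group(a) for a in ages)

for group, count in group_counts.items():
    print(f"{group}: {count}")
```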
• Materials
• A team of twenty-four researchers devised a survey to examine age,
relative anxiety, and risk perception. A survey was created through
a collaboration of the twenty-four members of the research team.
Members submitted a list of items/events that would cause them to
worry. This list was compiled and examined for relevance to adults
and children, redundancy, and effectiveness. Items deemed
unnecessary or irrelevant were excluded from the survey. The team
then decided which of the events/items were inappropriate for
children and were thus discarded.
• There were two different forms of the survey, one for the adult set of
subjects and one for the child group of subjects. The reasoning for
this was there were some items on the adult survey that were not
appropriate for children. Examples of this would include worry
involved with “drunk driving”, “being drugged”, and “getting sexually
assaulted”. The survey required some general demographic
information including gender, age, highest level of education, and race. The survey
then asked the subject to indicate on a scale of 1-7, 1 being not worried at
all and 7 being extremely worried, if he or she was a worrier in general. A
listing of potentially risky events followed. The adult survey contained 54
items and the child survey contained 42 items. Excluding the examples
depicted above, the items on the adult and child surveys were the same.
These items included events/activities that would generally cause one
anxiety such as “skydiving”, “holding a snake”, “being outside during a
lightning storm”, “swimming in the ocean after reports of a shark attack”, etc.
Items were also included to act as controls for biased or inaccurate
responding such as “playing putt-putt golf” or “taking a walk”. The survey
was also designed to control for those not paying attention / not taking the
survey seriously by placing “being lost” on the survey in two separate
places. Subjects were to indicate their level of worry if they were to
participate in or encounter each of these items on the same 1-7 scale of
worry. The survey concluded with a question regarding the interference of
worry with the subject's normal routines, work, school, and/or social
activities on which the subject gave a score of 1-7, 1 being no interference
and 7 being extremely interfering.
• Procedure
• Adult subjects were randomly recruited by walking up to a
person and asking if he or she would like to participate in a quick
(about five minutes) survey. If the person obliged, he or she was
informed that this was a survey inquiring about typical worries. The
subject was also informed that partaking in the survey posed no
mental or physical risk, and was assured of the anonymity of his or
her responses to the items. Following administration of the survey,
the subject was informed of the intent of the survey.
• Potential subjects were excluded if they looked to be in either too
much of a hurry or too busy. These potential subjects were
excluded on the basis of possible contamination of the results.
Subjects traveling in a group were not approached, as group bias
would possibly skew their responses. For the same
reason, when surveys were administered to more than one subject
at once, the subjects were asked not to speak with one another
about the survey until all had completed it.
• Because only two of the members had ready access to
administration of the survey to children, some of the elimination
criteria used for adults could not be applied. There were two main places
children were recruited, at a soccer practice and at a children's
museum. At the soccer field, the children arrived in groups of three
to five. These subjects were instructed to take the survey quietly
without discussion of the survey during administration, but because
they were children this was quite difficult to control. The children
recruited at the children's museum were directed to a table where
the survey was administered on a one by one basis.
For the statistics we will talk about,
these are the general assumptions:
• ASSUMPTIONS
– Linearity
– Normal distributions
• CONSIDERATIONS
– SCALES OF DATA