Summarizing data - Creighton University


Introduction to Statistics
Do I have to??
J.D. Bramble, Ph.D.
Creighton University Medical Center
Med 483 -- Fall 2006
Why we “do it”
"What we really want to get at [in health care
research] is not how many reports have been
done, but how many people's lives are being
bettered by what has been accomplished. In
other words, is it being used, is it being
followed, is it actually being given to patients—
... What effect is it having on people—"
Rep. John Porter (R-IL), retired chairman
House Appropriations Subcommittee on
Labor, Health and Human Services (HHS), and Education
Is Statistics Important?



Statistics is important because we can use it to find
out whether something we observe can be applied
to new and different situations.
Knowing this allows us to plan for the future, and
to make decisions about how to allocate our scarce
resources of money, energy, and ultimately life.
We use the term generalizable: can what we know
help to predict what will happen in new and
different situations?
Why Statistics



Scientific knowledge represents the best
understanding that has been produced by
means of current evidence.
Research design, if used properly,
strengthens the objectivity of the research.
Statistical methods allow us to compare
what is actually observed to what is
logically expected.
Why Statistics (cont’d)

Knowledge of statistics . . .





Useful in conducting investigations
Helpful in preparing and evaluating research proposals
Vital in deciding whether a researcher's claims are valid
Helpful in keeping abreast of current developments
Useful for presenting findings effectively
Evils of Pickle Eating

Pickles are associated with all the major
diseases of the body. Eating them breeds
war and Communism. They can be related
to most airline tragedies. Auto accidents are
caused by pickles. There exists a positive
relationship between crime waves and
consumption of this fruit of the cucurbit
family. For example:
Evils of Pickle Eating (cont’d)






Nearly all sick people have eaten pickles.
99.9% of all people who die from cancer have eaten pickles.
100% of all soldiers have eaten pickles.
96.8% of all Communist sympathizers have eaten pickles.
99.7% of the people involved in air and auto accidents ate pickles within 14 days preceding the accident.
93.1% of juvenile delinquents come from homes where pickles are served frequently.
Evidence points to the long-term effects of pickle eating.
Of the people born in 1839 who later dined on pickles, there has been a 100% mortality.
Evils of Pickle Eating (cont’d)

All pickle eaters born between 1849 and 1859
have wrinkled skin, have lost most of their teeth,
have brittle bones and failing eyesight, if the ills of
pickle eating have not already caused their death.

Even more convincing is the report of a noted
team of medical specialists: rats force-fed with 20
pounds of pickles per day for 30 days developed
bulging abdomens. Their appetites for
WHOLESOME FOOD were destroyed.
Evils of Pickle Eating (cont’d)

In spite of all the evidence, pickle growers and
packers continue to spread their evil. More than
120,000 acres of fertile U.S. soil are devoted to
growing pickles. Our per capita consumption is
nearly four pounds.

Eat orchid petal soup. Practically no one has as
many problems from eating orchid petal soup as
they do with eating pickles.
EVERETT D. EDINGTON
Types of Statistics

Descriptive Statistics




enumerate, organize, summarize, and categorize data
graphical representation of data
these types of statistics describe the data
Examples


means and frequency of outcomes
charts and graphs
Types of Statistics

Inferential Statistics




drawing conclusions from incomplete
information.
they make predictions about a larger population
given a smaller sample
these are what we usually think of as statistical tests
Examples

t-test, chi-square test, ANOVA, regression
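As a rough illustration of what an inferential test computes, the sketch below calculates Welch's two-sample t statistic (a variant of the t-test that does not assume equal variances) using only Python's standard `statistics` module. The function name `welch_t` and the toy samples are hypothetical. Turning t into a p-value requires the t distribution, which a library such as SciPy provides (e.g., `scipy.stats.ttest_ind(a, b, equal_var=False)`).

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n)
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    t = diff / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical toy samples
t, df = welch_t([1, 2, 3], [4, 5, 6])
print(f"t = {t:.3f}, df = {df:.1f}")
```

The statistic measures how far apart the sample means are relative to their sampling error; the test then asks how likely a value that extreme would be if the population means were equal.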
Variables
Types of Data

Qualitative



data fall into separate classes with no numerical
relationship
sex, mortality, correct/incorrect, etc.
Quantitative


numerical data that are continuous
pharmaceutical costs, length of stay (LOS), etc.
Parameters and Statistics

Parameters



characteristics of the population
calculating the exact population parameter is
often impractical or impossible
Statistics


characteristics of the sample
represent summary measures of observed
values
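To make the distinction concrete, the sketch below builds a hypothetical, fully known population of 1,000 lengths of stay, computes its parameter (the population mean), and then estimates that parameter with a statistic from a random sample of 30. The population values are simulated with a fixed seed, not real data, and the seed and sizes are arbitrary choices.

```python
import random
import statistics

rng = random.Random(483)   # fixed seed so the run is reproducible

# Hypothetical population: 1,000 lengths of stay (days), simulated
population = [rng.randint(1, 14) for _ in range(1000)]
mu = statistics.mean(population)            # parameter (usually unknown)

sample = rng.sample(population, k=30)       # what a study actually observes
x_bar = statistics.mean(sample)             # statistic, an estimate of mu

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

In real research only `sample` would be available; the gap between `x_bar` and `mu` is what inferential statistics tries to quantify.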
Types of Variables

Variables are symbols to which numerals or
values are assigned




e.g. X and Y are variables
Dependent (Y’s), that which is predicted
Independent (X’s), that which predicts
Extraneous (Confounding or Control)

statistical models “adjust” for their influence
Independent variables


Independent variables are the presumed
cause of the dependent variable
The variable responsible for the change in
the phenomenon being observed

Nothing is for sure, so avoid the word ‘cause’
and think in terms of independent and
dependent variables
Dependent variables

Also referred to as the outcome variable

The outcome of the changes due to the
independent variables

Example:
y = a + bx
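In y = a + bx, y is the dependent variable, x the independent variable, a the intercept, and b the slope. The slide does not say how a and b are obtained; a common choice, assumed here, is ordinary least squares. A minimal sketch with made-up dose/response data:

```python
import statistics


def fit_line(xs, ys):
    """Ordinary least-squares estimates of intercept a and slope b in y = a + bx."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    # b = sum of cross-products of deviations / sum of squared x deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data: dose (x, independent) and response (y, dependent)
doses = [1, 2, 3, 4, 5]
responses = [5, 8, 11, 14, 17]   # exactly y = 2 + 3x
a, b = fit_line(doses, responses)
print(f"y = {a:g} + {b:g}x")     # y = 2 + 3x
```

On Python 3.10 or later, `statistics.linear_regression(doses, responses)` returns the same slope and intercept.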
Confounding variables

Additional variables that may affect the
changes in the dependent variable attributed
to the independent variables.
These variables are controlled by measuring
them; statistical methods then adjust for their
influence.
Sometimes referred to as control variables
Active vs. attribute variables

Active variables are those variables under the
control of the researcher



controlled experimental studies
e.g., amount of drug administered
Attribute variables cannot be manipulated by the
researcher
quasi-experimental studies
e.g., sex or age of subject; blood pressure; smoker
The Wrong Data Leads to Migraines
Levels of Measurement

Categorical Variables



Nominal Scale
Ordinal Scale
Continuous Variables


Interval Scale
Ratio Scale
Continuous Variables

Continuous variables are measured and can
take on any value along the scale



quantitative variables
measured on an interval or ratio level
Examples

Age, income, number of medications
Categorical Variables

Categorical variables are either dichotomous (two
categories) or polytomous (more than two categories)
qualitative variables
measured on a nominal or ordinal level

Examples


sex; smoking status; ownership
Categorizing continuous variables
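Categorizing a continuous variable means mapping each measured value to a category, which turns it into an ordinal variable and discards some information. A small sketch using the standard `bisect` module; the cut points (40 and 65 years), the labels, and the helper name `age_group` are hypothetical, chosen only for illustration.

```python
import bisect

# Hypothetical cut points: under 40, 40 to 64, 65 and over
CUT_POINTS = [40, 65]
LABELS = ["under 40", "40 to 64", "65 and over"]


def age_group(age):
    """Map a continuous age to an ordinal category."""
    # bisect_right counts how many cut points are <= age
    return LABELS[bisect.bisect_right(CUT_POINTS, age)]


ages = [23, 40, 51.5, 64.9, 65, 80]
print([age_group(a) for a in ages])
```

Note that 64.9 and 40 land in the same group: once categorized, the difference between them is no longer visible.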
Nominal measurement scale




Used for qualitative data
Two or more levels of measurement
The names given to the groups do not matter
Examples



Sex (Male/Female)
Smoker (Yes/No)
Political Party (Rep, Dem, Ind)
Ordinal measurement scale




All the properties of nominal plus . . .
The groups are ordered or ranked
Intervals between groups are not necessarily
equal
Example:



Income (low, med, high)
Disease severity
Likert scales
Interval measurement scale




All properties of nominal and ordinal plus . . .
A scale is used to measure the response of the
study subjects
The interval scale’s units are equal but arbitrary, and there is
no absolute zero (e.g., a relative scale)
Examples:

Temperature on Fahrenheit scale
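Because the zero of the Fahrenheit scale is arbitrary, differences are meaningful but ratios are not: 80 °F is not "twice as hot" as 40 °F. A short check, converting to Kelvin (a temperature scale with an absolute zero, i.e. a ratio scale), illustrates this. The function name is ours, for illustration only.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale, absolute zero)."""
    return (deg_f - 32) * 5 / 9 + 273.15


low, high = 40.0, 80.0
print(high / low)                                              # 2.0, but not meaningful
print(fahrenheit_to_kelvin(high) / fahrenheit_to_kelvin(low))  # about 1.08
```

The same two temperatures give a ratio of 2.0 on one scale and about 1.08 on the other, so the Fahrenheit ratio says nothing about "how many times hotter."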
Ratio measurement scale
All properties of the previous scales plus . . .
An absolute zero point
All arithmetic operations, including ratios, are meaningful
Highest level of measurement
Examples


Income, age, height, weight
Summarizing Data
Measures of Central Tendency and Variation
Mean

Arithmetic mean



the balance point of the distribution
sum all observations
divide the sum by the number of observations
x x x

x

1
n
2
 x3  ...  xn
n
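Summing all observations and dividing by their number translates directly into code. This sketch reuses the twelve values from the "Calculating Medians" slide and cross-checks the hand computation against the standard library's `statistics.fmean`.

```python
import math
import statistics

# The twelve observations used on the "Calculating Medians" slide
values = [41, 28, 34, 36, 26, 44, 39, 32, 40, 35, 36, 33]

# Sum all observations, then divide by the number of observations
mean_by_hand = sum(values) / len(values)
print(round(mean_by_hand, 2))                                # 35.33
print(math.isclose(statistics.fmean(values), mean_by_hand))  # True
```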
Median

Divides the distribution into two equal
parts.

Considered the most “typical” observation

Less sensitive to extreme values than the mean
Calculating Medians

To locate the median, use the location formula q(n + 1) with q = 0.5
41, 28, 34, 36, 26, 44, 39, 32, 40, 35, 36, 33
Put the data in ascending order
26, 28, 32, 33, 34, 35, 36, 36, 39, 40, 41, 44
Apply the median location formula: 0.5(12 + 1) = 6.5
Note: this is ONLY the location of the median
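Since q(n + 1) gives only a location, the value is found by counting to that position in the ordered data; a fractional location such as 6.5 means interpolating between neighbors (here halfway between the 6th and 7th values). A sketch of that rule; the helper name `value_at_location` is hypothetical, and the result is checked against `statistics.median`.

```python
import math
import statistics

values = [41, 28, 34, 36, 26, 44, 39, 32, 40, 35, 36, 33]


def value_at_location(data, q):
    """Value at location q(n + 1) in the ordered data, interpolating if fractional."""
    ordered = sorted(data)
    loc = q * (len(ordered) + 1)          # 1-based location, e.g. 6.5
    lower = math.floor(loc)
    frac = loc - lower
    if lower < 1:                         # location before the first value
        return ordered[0]
    if lower >= len(ordered):             # location at or past the last value
        return ordered[-1]
    # Move frac of the way from the lower-th to the (lower + 1)-th value
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


print(value_at_location(values, 0.5))   # 35.5
print(statistics.median(values))        # 35.5
```

The same helper with q = 0.25 or q = 0.75 gives the quartiles under this location rule.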
Quantiles

Quantiles are the values that divide the
distribution into equal parts so that a given
proportion of the data lies below each quantile.

The median is the middle quantile (the 50th percentile).

Quartiles are also very common (the 25th, 50th, and 75th percentiles).

If we divide the distribution into 100 parts, the cut points are
called percentiles.
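Python's `statistics.quantiles` (3.8+) computes these cut points; its default `method='exclusive'` follows the q(n + 1) location rule used on the median slide. Other software may use a different interpolation rule (Excel, for instance, offers both inclusive and exclusive quartile functions), so quartiles can differ slightly between packages. A sketch on the same twelve values:

```python
import statistics

values = [41, 28, 34, 36, 26, 44, 39, 32, 40, 35, 36, 33]

# Quartiles: three cut points dividing the data into four parts
quartiles = statistics.quantiles(values, n=4)   # default method='exclusive'
print(quartiles)          # [32.25, 35.5, 39.75]

# Percentiles: 99 cut points dividing the data into 100 parts
percentiles = statistics.quantiles(values, n=100)
print(percentiles[49])    # the 50th percentile equals the median, 35.5
```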
Mode




The value that occurs most frequently
Graphically, it is the value at the peak of the
distribution.
A distribution may be bimodal, that is, have two
modes.
If every value occurs with the same frequency, no mode exists
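Python's `statistics.multimode` (3.8+) returns every most-frequent value, which makes bimodal data easy to spot. It returns all values when every value ties, so the hypothetical helper `modes` below adds the convention that there is no mode when every distinct value occurs equally often. The data are made up.

```python
import statistics
from collections import Counter


def modes(data):
    """Most frequent value(s); empty list if every distinct value is equally common."""
    counts = Counter(data)
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []          # e.g. 1, 2, 3, 4: each value occurs once, no mode
    return statistics.multimode(data)


print(modes([2, 3, 3, 3, 4, 5]))           # [3]     unimodal
print(modes([1, 2, 2, 2, 3, 7, 8, 8, 8]))  # [2, 8]  bimodal
print(modes([1, 2, 3, 4]))                 # []      no mode
```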
Single Modal
[Figure: bar chart of a single-peaked distribution; frequency axis 0 to 50, categories yr 1 to yr 5, series "1st Qtr"]
Bimodal Example
[Figure: bar chart with two peaks; frequency axis 0 to 50, categories yr 1 to yr 9, series "1st Qtr"]
Symmetrical: The relationship
between the Mean, Median, & Mode
[Figure: symmetrical distribution; the mean, median, and mode coincide at the center]
Positive Skew: The relationship
between the Mean, Median, & Mode
[Figure: positively skewed distribution; from left to right, mode, median, mean (mode < median < mean)]
Negative Skew: The relationship
between the Mean, Median, & Mode
[Figure: negatively skewed distribution; from left to right, mean, median, mode (mean < median < mode)]
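A quick numerical check of the ordering for a positively (right) skewed set of made-up values: the long right tail pulls the mean above the median, which in turn sits above the mode.

```python
import statistics

# Hypothetical right-skewed data: most values small, a few large
data = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

mode = statistics.mode(data)      # 2
median = statistics.median(data)  # 3.0
mean = statistics.mean(data)      # 5
print(mode, median, mean)
print(mode < median < mean)       # True: positive skew
```

Mirroring the data (a long left tail) reverses the order, giving mean < median < mode as in the negative-skew case.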
Summarizing Data
Frequency distributions
Measures of central tendency
The tendency of data to center around certain numerical and ordinal values
Three common measures: mean, median, & mode
Measures of variation
standard deviation
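A frequency distribution is simply a count of how often each value occurs. A minimal sketch with `collections.Counter`; the "number of medications" data are hypothetical.

```python
from collections import Counter

# Hypothetical data: number of medications reported by 15 patients
meds = [0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 6, 9]

freq = Counter(meds)
n = len(meds)
print("value  count  percent")
for value in sorted(freq):
    count = freq[value]
    print(f"{value:>5}  {count:>5}  {100 * count / n:>6.1f}")
```

The table makes the shape visible at a glance: a peak at 2 and a long right tail out to 9.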
Five Figure Summary

Minimum
Lower quartile (Q1)
Median
Upper quartile (Q3)
Maximum
Can be shown in a box-and-whisker plot
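A sketch of the five-figure (five-number) summary for the twelve values from the median slide. The helper name is hypothetical, and the quartiles come from `statistics.quantiles` with its default q(n + 1) location rule; a plotting library would draw the box and whiskers from numbers like these.

```python
import statistics

values = [41, 28, 34, 36, 26, 44, 39, 32, 40, 35, 36, 33]


def five_number_summary(data):
    """Minimum, lower quartile, median, upper quartile, maximum."""
    q1, q2, q3 = statistics.quantiles(data, n=4)   # (n + 1) location rule
    return min(data), q1, q2, q3, max(data)


summary = five_number_summary(values)
print(summary)                           # (26, 32.25, 35.5, 39.75, 44)
print("IQR =", summary[3] - summary[1])  # 7.5
```

The box spans Q1 to Q3 (the interquartile range), the line inside it marks the median, and the whiskers reach toward the minimum and maximum.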
Which Measure?
Mean: numerical data, symmetric distribution
Median: ordinal data, skewed distribution
Mode: bimodal distribution, most popular category
Variation

We must also report measures of variation

Measures of variability reflect the degree to which
data differ from one another as well as from the mean.

Together, the mean and variability describe the
characteristics of the data and show how
distributions differ from one another.
Example of Variation

Take the following three sets of data:
1) 10, 8, 5, 5, 2;
2) 5, 6, 6, 7, 6;
3) 6, 6, 6, 6, 6

In all three cases the mean is 6, but
there is a lot of variability in set 1,
little in set 2, and none in set 3.

We will discuss three measures of variability: 1) the
range; 2) the standard deviation; and 3) the variance
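The three data sets can be checked directly: they share a mean of 6 but differ sharply in range, variance, and standard deviation. The sketch uses the sample variance (divisor n - 1), which is what `statistics.variance` and `statistics.stdev` compute.

```python
import statistics

data_sets = {
    "set 1": [10, 8, 5, 5, 2],
    "set 2": [5, 6, 6, 7, 6],
    "set 3": [6, 6, 6, 6, 6],
}

for name, xs in data_sets.items():
    value_range = max(xs) - min(xs)
    variance = statistics.variance(xs)   # sample variance, divisor n - 1
    sd = statistics.stdev(xs)            # square root of the variance
    print(f"{name}: mean={statistics.mean(xs)}, range={value_range}, "
          f"variance={variance}, sd={sd:.2f}")
```

Set 1 gives a range of 8, variance 9.5, and sd of about 3.08; set 2 gives 2, 0.5, and about 0.71; set 3 gives 0 for all three.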
Measures of Variation

Range




the difference between the highest and the lowest
observations
Range = x_max - x_min
limited usefulness since it accounts only for the
extreme values
can also report the interquartile range (IQR = Q3 - Q1)
Standard Deviation






most widely used & preferred measure of variation
represented by the symbol s or sd
the square root of the variance (s²)
larger values = more heterogeneous distribution
for any distribution, at least 75% of the observations lie between x̄ - 2s and x̄ + 2s
if the distribution is normal (bell shaped):
about 68% lie within x̄ ± s
about 95% lie within x̄ ± 2s
about 99.7% lie within x̄ ± 3s
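The 75% figure for ±2s is a guarantee that holds whatever the shape of the distribution (Chebyshev's inequality: at least 1 - 1/k² of the data lie within k standard deviations of the mean). The 68/95/99.7 figures hold only for a normal distribution. A sketch comparing the two with the standard library's `NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

for k in (1, 2, 3):
    # Share of a normal distribution within k standard deviations of the mean
    within = std_normal.cdf(k) - std_normal.cdf(-k)
    # Chebyshev's lower bound for ANY distribution (informative only for k > 1)
    chebyshev = 1 - 1 / k ** 2
    print(f"k={k}: normal {within:.1%}, any distribution at least {chebyshev:.0%}")
```

For k = 2 this prints about 95.4% for the normal case against a guaranteed minimum of 75%.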
Variance and Std Deviation
Variance
$$s^2 = \frac{n\sum x^2 - \left(\sum x\right)^2}{n(n-1)}$$
Standard Deviation
$$s = \sqrt{s^2}$$
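This slide's computational formula for the variance needs only n, the sum of the x values, and the sum of their squares, which made it convenient for hand calculation. A sketch checking it against the standard library on set 1 from the variation example (the function name is ours):

```python
import math
import statistics


def variance_computational(xs):
    """s^2 = (n * sum(x^2) - (sum x)^2) / (n * (n - 1))"""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


xs = [10, 8, 5, 5, 2]               # set 1 from the variation example
s2 = variance_computational(xs)     # 9.5
s = math.sqrt(s2)                   # about 3.08
print(s2, round(s, 2))
print(math.isclose(s2, statistics.variance(xs)))   # True
```

With floating-point data of large magnitude this shortcut can lose precision (the two large terms nearly cancel), so the deviation-based definition is safer for computer work.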
Example

Data on the sexual activity of male and
female subjects can be found in Chatterjee,
Handcock, and Simonoff (1995), A casebook for
a first course in statistics, New York: Wiley.
They provide data on the reported number of
sexual partners for 1682 females and 1850
males. The dependent variable is the number of
reported partners.
Descriptive Statistics
Statistic   Male (n=1850)   Female (n=1685)
Mean        10.9            3.4
Median      4               1
Mode        1               1
Using Excel When Syntax Is Known
•Write the function right into a spreadsheet cell (e.g., =AVERAGE(A1:A12))
•Be sure to start with an equal sign
•Use your mouse to highlight the data to analyze
Using Excel When Syntax Is Unknown
•Use the wizard and follow the instructions.
•All wizards work about the same way.
Click the fx button to select the appropriate test
Select a category and
then the desired test
Follow the Wizard
[Screenshot of the function wizard, with callouts: "Either highlight the array or just write it in" and "These icons reduce/enlarge the Wizard box"]