Basic Statistics Update
Matthew Perri, Bs. Pharm., Ph.D., R.Ph.
Professor of Pharmacy
Director, Pharmacy Care Administration
Graduate Program
January 2014
British Prime Minister Benjamin Disraeli
Popularized by Mark Twain
PHRM 4700 Basic Statistical Concepts
H.G. Wells
• Understanding statistics will enable you to draw
your own conclusions and make decisions:
Will you recommend this drug to patients or physicians?
Is the drug likely to work for your patients?
Is it better or safer than existing therapies?
Should the drug be listed on the formulary or PDL?
Should there be dispensing limits, refill limits, prior
authorization, limiting prescribing authority?
You are a recent Pharm. D. graduate from the University of GA. Two of your
professors (Drs. May and Perri) have served over the last two decades on the GA
Department of Community Health Drug Utilization Review Board (DURB). The
DURB is the governing body of physicians, pharmacists and others that study and
select appropriate drug therapy for the lives covered by all GA state funded health
plans (e.g., Medicaid, State Merritt, Board of Regents, Peach Care). Upon departure
from the Board, the Commissioner sought input from Drs. May and Perri about
who might be good to replace them on this body of decision makers. Dr. Perri
made the recommendation to include you on the list of possible candidates and you
were eventually selected by the commissioner. The DURB meets quarterly and
prior to each meeting a binder is sent to all members reviewing the disease states
and recent literature about the drugs to be reviewed at the next meeting.
Statistics let you make general conclusions from limited data.
Statistics is not intuitive. (Not easy to understand or use.)
Statistical conclusions are always presented in terms of probability.
All statistical tests are based on assumptions.
Decisions about how to analyze data should be made in advance.
A confidence interval quantifies precision and is easy to interpret.
A P Value tests a null hypothesis and is hard to understand at first.
Statistically significant does not mean the “effect” or “phenomenon” is large or
scientifically – clinically important.
“Not statistically different” does not mean the effect or phenomenon is absent, small
or scientifically – clinically irrelevant.
Multiple comparisons make it hard to interpret statistical results – which is why we
have statistics to help fix that. (ANOVA, range tests)
Correlation does not mean causation.
Published statistics tend to be optimistic.
What we hope to do here is to teach you the basics
needed to navigate evaluation of research.
Things you might need to know in a Spanish-speaking
country:
• ¿Dónde está el cuarto de baño por favor?
• Déme una cerveza por favor.
• ¿Dónde está la biblioteca?
In statistics:
• Were the data normally distributed?
• What was the mean? Standard deviation?
GENERAL QUESTION:
HOW DO PEOPLE LET YOU
KNOW THEY ARE AT YOUR
DOOR AND WANT TO
COME IN?
ANSWERS
They ring the doorbell.
They knock.
They stand outside, studying
kinetics, until you open the
door for your own reasons.
POSSIBLE RESEARCH QUESTIONS
• How do people knock on someone’s door?
• How many times do they knock?
• Do people speak when they knock?
DATA SOURCES?
• Search literature and review/compile the results of
previous studies on this subject
• Survey people and ask them how they knock
• Observe people as they knock and record data
Questions/Propositions
• People generally approach a residence and knock when they wish to
enter.
• Describe how people knock when at someone’s door.
Method:
• Review available data
• Design survey, experiment, interviews or some combination.
Database:
• Sample: http://www.youtube.com/watch?v=tKV4XYD3xK4
• Descriptive Statistics
Number of events observed (also known as “n” or sample size) was 35.
Sheldon knocked between 0 and 30,000 (self-reported) times when approaching
Penny’s door.
He used 1, 2, 6 and 30,000 knocks each one time. (The “1” was the robot)
He knocked for Leonard, then Penny, 5 times, with one instance where he knocked
for Penny first.
Penny knocked one time on Sheldon’s door, in this case she knocked three times.
In one instance, he knocked, then approached an interior door where he knocked
a second time.
• Parametric Statistics
The average number of knocks was 860.06 (mean)
The most common number of knocks was 3 (mode)
The median number of knocks was 3 (values observed: 1, 2, 3, 6, 30000)
The standard deviation of the number of knocks was 4997.46
• Without any other information, which of the following can we infer:
In this sample, three knocks were used to alert the resident that someone was at
the door.
People in general knock three times.
Knocking three times is always effective in getting someone to answer the door.
Tony Orlando and Dawn ( http://www.youtube.com/watch?v=k7Jvsbcxunc )
were wrong in the 70’s when they concluded that:
You should knock three times on the ceiling…
You should knock twice on the pipe if the answer is no…
In our data, knocks were always associated with the calling out of a name and this
process was repeated.
If someone is at your door and they knock three times, followed by your name
three times, and this is repeated three times, it is likely to be Sheldon.
Sheldon has issues.
People in general knock 3 times.
• How would our results have changed if we had seen only a subset of
the data? (Smaller sample size…) For example what if we missed the
“flash” – how would the results have changed?
The average number of knocks was 3 (mean)
The most common number of knocks was 3 (mode)
The median number of knocks was 3 (values observed: 1, 2, 3, 6)
The standard deviation of the number of knocks was 0.641689
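The knock statistics with and without the “flash” can be checked with a short sketch. The raw observations are not listed on the slides, so the data below are a reconstruction: an assumption that the 31 events not otherwise described were all three knocks. Under that assumption, dividing by n (the population formula, `statistics.pstdev`) appears to reproduce the standard deviations quoted above.

```python
import statistics

# Reconstructed data (assumption): 35 events, with 1, 2, 6 and 30,000 knocks
# occurring once each and the remaining 31 events being three knocks.
knocks = [1, 2, 6, 30000] + [3] * 31


def summarize(values):
    """Return n, mean, median, mode and the population SD (divides by n)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": statistics.pstdev(values),
    }


full = summarize(knocks)
without_flash = summarize([k for k in knocks if k != 30000])

for label, stats in (("all 35 events", full), ("without the flash", without_flash)):
    print(label, {name: round(value, 6) for name, value in stats.items()})
```

One extreme observation moves the mean and standard deviation by orders of magnitude, while the median and mode stay at 3.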
Good research always poses new questions.
Additional research questions for this example:
• Is there a time when two knocks are sufficient?
• Are mechanical/technological means of knocking just
as effective as in person knocking?
• How hard would it be to find a new apartment?
Statistics
• Techniques and procedures regarding the collection,
organization, analysis, interpretation and presentation
of information that can be stated numerically (Kuzma
and Bohnenblust, 2004)
Biostatistics
• Application of statistics to the biomedical sciences
Descriptive Statistics
• Sometimes, formal statistical analyses are not needed
or desired, depending on the research questions.
Descriptive stats tell us something about a
phenomenon or population:
Number of drug overdose fatalities in 2013
Pharmacy student acceptance rate at UGA College of Pharmacy
Demographics of a study population (2)
Numbers of patients experiencing an adverse reaction to a
medication.
Consumer awareness of advertising.
Inferential Statistics
• Observed information is incomplete and uncertain, so we
can’t know for sure – instead we infer.
Drawing conclusions based on observed information.
Generalizing from the specifics (as is done in most clinical research).
• Example:
Once-daily aminoglycoside (ODA) regimens have been studied.
When done in one location, e.g., Athens Regional, what, if anything can or
should we infer, or generalize to other patient groups?
What about a different dose? Would these results still apply?
Variables vs. Data
Survey vs. Experiment
Population vs. Samples
Response Rate
Sampling Techniques
• When making a gentamicin dosing recommendation,
you need to understand the patient’s characteristics,
such as age, weight and height.
In statistics, patient characteristics are referred to as
variables (e.g., Systolic Blood Pressure) because the observed
values change.
The actual values of the characteristics (variables) recorded
are referred to as data (e.g., 115 mmHg)
Surveys
• Observations of events or phenomena over which few, if
any, controls are imposed
• Teaching evaluations, political opinion polls, and satisfaction
studies are all examples of survey research.
Experiments
• Design a research plan that manipulates, for example,
dosage, e.g., 50mg drug A v. 100mg or placebo
• Studying the effects on health outcomes before and after
limiting formulary access to antipsychotic agents in GA
Medicaid.
• Studying two doses of a new drug for toxicity.
Both surveys and experiments are important research
designs
FDA requires all drugs submitted for approval to be
evaluated by experimental research to substantiate
their safety and efficacy
However, survey design is often used in postmarketing surveillance for monitoring safety
A population is a set of persons (or objects) having a
common observable characteristic
A sample is a subset of a population
• The goal is for this subset to be as representative of the
population as possible.
Example:
• The US population was 317,330,434 as of 8:30AM January 8,
2014. (1)
• The CBS News Poll surveyed a sample of 808 adults to
assess preferences for presidential candidates.
(1) http://www.census.gov/main/www/popclock.html
If you wanted to study all insulin-dependent diabetics,
is there any way you could create a list of all
insulin-dependent diabetics from which to draw a sample?
You can create / collect a random sample of patients
who generally represent the population in question,
then draw inferences from this group and generalize
your results to all insulin-dependent diabetics based
on how well your sample mimics the entire
population. (Note: what assumption does this require
you to make?)
2nd Year Rx Students are a sample (but probably not random – which we
will talk about in a minute) of many populations, such as all pharmacy
students at UGA, all pharmacy students in the US, students at UGA, etc.,
or even a sample of the US population. However, they are also the total
population of 2nd year pharmacy students at UGA COP. Answering
questions about a sample requires you to know the perspective you are
taking.
Sampling nomenclature is important to
understanding research design and to
evaluating studies. The goal in evaluation of
sampling methods is to make sure the right
population was sampled for the study – and
the sample was created properly.
We don’t want to accidentally observe the
“Sheldons” of the world.
Sampling frame:
• a complete, non-overlapping list of the persons or
objects in the population.
e.g., to draw a sample of GA pharmacists, we could use the
database of all registered GA pharmacists as a sampling frame.
It is hard to develop a sampling frame for studying patients with asthma,
or any condition for that matter. This makes finding a representative
sample very important.
Random sampling is the primary method of obtaining a sample that
is representative of a larger population and an issue which can have
a huge impact on study results.
Random Sample
Sample units are chosen in an unpredictable way,
e.g., using a random number table or putting all the names in a hat
Types
Simple random sample: all members have equal chance of
selection.
Cluster: units are selected in groups such as geographic
area (Northeast, Southeast, Central, West) then a random
sample is created in each area.
Stratified: choosing sub-groups or “strata” (e.g., race, gender,
age group, education) within a population and sampling from
within these groups.
Same as putting ALL the names of a population in a
hat, mixing them up, and selecting however many names
you want.
• Note, it must be all the names and each has the same
chance of being selected.
• Advantages
Avoids known and unknown biases on average
Helps convince others that the study was conducted properly
It is the basis for statistical theory that underlies hypothesis testing
and confidence intervals
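The “names in a hat” idea can be sketched in a few lines. The patient list here is hypothetical, and the fixed seed is used only so the draw can be repeated; neither comes from the slides.

```python
import random

# Hypothetical sampling frame: every member of the population must be listed.
population = [f"patient_{i:03d}" for i in range(1, 201)]

rng = random.Random(2014)  # fixed seed only so the draw can be repeated
# sample() draws without replacement; each name has the same chance of selection.
sample = rng.sample(population, k=20)

print(sorted(sample))
```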
You may see other techniques used in biomedical research:
• Convenience Sample
e.g., intercepting patients after having a prescription filled at a local
community pharmacy or shopping mall.
• Systematic sampling
e.g., take a phone book and pick a random place to start, then take every 9th
name in the book.
• Stratified sampling
• Cluster sampling
• Others…e.g., snowball sampling (which is kind of cool)
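The phone book example of systematic sampling might look like the sketch below. The list of names is hypothetical, and starting somewhere within the first nine entries is an assumption about what “a random place to start” means; other variants exist.

```python
import random


def systematic_sample(frame, step, rng):
    """Pick a random starting position, then take every `step`-th entry."""
    start = rng.randrange(step)  # assumption: start within the first `step` entries
    return frame[start::step]


phone_book = [f"name_{i}" for i in range(90)]  # hypothetical list of 90 names
picked = systematic_sample(phone_book, step=9, rng=random.Random(1))
print(picked)
```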
Often used when it is virtually impossible to select a
random sample
• Underlying assumption is that the sample will accurately
represent the population
Example: Estimate the average PCAT scores for pharmacy students in the
US, would you:
Use UGA Class of 2014 pharmacy students as a study sample and survey
some number of students? While we might do this we have to ask, how
representative would this actually be?
Use multiple pharmacy schools?
In a clinical trial, we might recruit patients from multiple doctors’ offices
to get a better picture.
Grouping members of the population into
homogeneous groups.
Strata should be mutually exclusive: subjects can be in only
one stratum, and no group should be excluded.
Then, use random or systematic sampling to identify subjects in
each stratum.
Can be proportional or not.
Proportional: If the population consists of 60% in the male stratum
and 40% in the female stratum, then the relative size of the two
samples (three males, two females) should reflect this proportion.
Sometimes this is used in medical research, e.g., where you want to
study patients with certain characteristics: obesity, gender, pregnancy,
past history of disease, etc.
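A proportional stratified sample for the 60% / 40% example can be sketched as follows. The member labels and the helper name `proportional_stratified_sample` are hypothetical, and simple rounding is an assumption that works cleanly for this split but can miss the target size by one for awkward proportions.

```python
import random


def proportional_stratified_sample(strata, total, rng):
    """Draw from each stratum in proportion to its share of the population."""
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding can make the pieces miss `total` by one for awkward splits.
        k = round(total * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample


strata = {
    "male": [f"M{i}" for i in range(60)],    # 60% of a hypothetical population
    "female": [f"F{i}" for i in range(40)],  # 40%
}
drawn = proportional_stratified_sample(strata, total=5, rng=random.Random(7))
print({name: len(members) for name, members in drawn.items()})
```

With a total sample of five, this yields three males and two females, matching the proportions in the population.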
Why is random sampling less prone to bias
than convenience sampling?
• Think about how we selected our “convenience”
sample of events from YouTube.
Does using a random sample guarantee a
representative sample?
Similar meanings clinically and statistically. Clinically it
is how many patients responded in a certain manner.
Consider a random sample of college students in the
US. You sent out a questionnaire to these students to
assess how frequently college students skip classes.
The response rate is how many (usually %) students completed and returned the
questionnaire.
Is a 50% response rate good enough?
Generally, the higher the response rate, the more representative the sample,
but extremely high response rates may not always be required.
Is there any potential for bias in a study like this?
Sampling bias exists if the sample of data you received
is not representative of the population, e.g., you studied
only a certain age group when all age groups were of
concern.
In our previous example, bias may occur if students
who returned the questionnaire are somehow
inherently different from those who did not.
• e.g., one could infer that more diligent students are more
likely to respond than less studious ones….
Clinical trials often employ a non-random
sample – they do however use random
assignment of patients to groups (arms) within
the study.
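Random assignment is a different step from random sampling: the patients may be a convenience sample, but chance decides which arm each one enters. A minimal sketch, with hypothetical patient IDs and arm names:

```python
import random


def randomize(patients, arms, rng):
    """Shuffle the enrolled patients, then deal them out to the arms in turn."""
    shuffled = patients[:]  # copy so the enrollment list is left untouched
    rng.shuffle(shuffled)
    return {arm: shuffled[i::len(arms)] for i, arm in enumerate(arms)}


enrolled = [f"pt_{i:02d}" for i in range(1, 21)]  # hypothetical convenience sample
groups = randomize(enrolled, ["drug A 50mg", "placebo"], random.Random(4700))

for arm, members in groups.items():
    print(arm, sorted(members))
```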
Assess “how” subjects were identified and used in research.
• Researchers often have to make hard choices in their investigations regarding
how to “find” subjects for research. Sampling procedures must be appropriate
for the study population.
• Studies are rarely perfect and most have their own biases: random
sampling/assignment can help.
• We seldom get definitive answers, so we make inferences from the data and
analyses we do have.
• Learning statistics will allow you to understand the assumptions researchers
make so that you can make your best professional judgment.
• Thought question: Is a sample of healthy volunteers ever a good “sample” to
study a drug?
Descriptive statistics are used to describe the main
features of a collection of data in quantitative terms.
Descriptive statistics are distinguished from inferential stats
(we talked about these last time) in that descriptive statistics
quantitatively summarize a data set, rather than being used to
support inferential statements about the population in
question.
Even when a data analysis draws its main conclusions using
statistical analysis, descriptive statistics are generally presented
along with more formal analyses, to give the audience an
overall sense of the data being analyzed.
Pharmacy Manpower Trends: http://www.pharmacymanpower.com/trends.jsp
Research Article: Gabapentin for RLS
HCV Treatment Study
Recall that data = observations, which are the values
of the variables you record.
4 Basic Levels of Measurement Scales: Nominal,
Ordinal, Interval and Ratio
Qualitative scales: (Nominal and Ordinal)
Nominal scale
Eye color: Blue, green, or brown
No rank or order to the categories
Presence or absence of a disease
Gender
• Ordinal scale
All the characteristics of a nominal scale, plus there is a ranking
among the categories:
e.g., Mild, Moderate, Severe;
First place, Second place, Third place
Strongly Agree - - - - Strongly Disagree
Wong-Baker Faces Scale
Quantitative scales
• Interval scale
Designates an equal-interval ordering
No true zero point
The distance between 1 and 2 is the same as the distance between 49
and 50
Fahrenheit temperature scale: 0 degrees F does not mean no temperature
60 degrees F is not twice as warm as 30 degrees
• Ratio scale
All the above plus, a true zero point
Wealth: $0 means no money
$100 is twice as much as $50
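One way to see why ratios are not meaningful on an interval scale is to convert the Fahrenheit readings to a scale with a true zero. The use of Kelvin here is an illustration added for this purpose, not something taken from the slides.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale, true zero)."""
    return (deg_f - 32) * 5 / 9 + 273.15


naive_ratio = 60 / 30  # treats Fahrenheit as if it had a true zero
true_ratio = fahrenheit_to_kelvin(60) / fahrenheit_to_kelvin(30)

print(naive_ratio, round(true_ratio, 3))
```

On the scale with a true zero, 60 degrees F is only about 6% “warmer” than 30 degrees F, not twice as warm.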
Defining levels of measurement facilitates the choice
of appropriate statistical techniques for data analysis
Nominal to Ratio: increasing ability to use higher level statistical analyses
Non-parametric testing is generally performed with
nominal and ordinal level data
Parametric testing with interval and ratio level data
www.statsoft.com/textbook/stnonpar.htm
Interval and Ratio data can further be classified as:
• Discrete data
Data take only whole-number values:
Number of children, number of times you have been married, date of birth, etc.
• Continuous data
Data may (but are not required to) take on fractional values
Temperature (37.5 degrees), age, Body Mass Index (BMI)
The type of data you have dictates the statistics you
will use.
• Generally, nominal & ordinal use non-parametric and
interval and ratio levels use parametric stats.
Incorporating the Web into your communication mix yields
strategic benefits
[Bar chart: Rx info search after DTC, by source (Toll-free number, Print, MD/RPh, Internet); series: Have sought Rx info vs. Have not sought Rx info; axis 0% to 100%; bar values as extracted: 21%, 7%, 7%, 79%, 77%, 42%; n=482]
Searching the web for more information will encourage
consumers to talk to their MDs about advertised Rxs
DTC encourages consumers to look for more
information by going to the Web.
From recent research on DTC ads by Menon, Desphande and Perri
Normal (symmetrical) Distribution (bell shaped)
Nonsymmetrical Distribution
Bimodal Distribution
Descriptive Statistics
• For normally distributed data, measured on interval
and ratio level scales, the appropriate measure of
central tendency is the mean.
• The median is most appropriate for data measured on
ordinal scales (but can still be used for continuous
data)
• Mode is the appropriate measure of central tendency
for nominal data.
Mean is calculated by summing all the
observations and dividing the sum by the
number of observations
Median is the observation that divides the
distribution of data into two equal halves
Mode is the observation that occurs most
frequently
Data:
Monthly income of 10 college students:
$300, $375, $485, $500, $600, $625, $1000, $2000, $3000, $3500
Mean: (300 + 375 + 485 + 500 + 600 + 625 + 1000 + 2000 + 3000 +
3500) / 10 = $1238.50
Median: average of $600 and $625 = $612.50 (half the data above, half below)
Mode: there is no mode
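The income example can be reproduced with Python's standard library. The variable names are illustrative; the rule used for “no mode” (every value occurs exactly once) is an assumption about how the slide reached that answer.

```python
import statistics
from collections import Counter

incomes = [300, 375, 485, 500, 600, 625, 1000, 2000, 3000, 3500]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)  # averages the two middle values

# Report a mode only if some value occurs more often than the rest.
counts = Counter(incomes)
top_count = max(counts.values())
mode_income = None if top_count == 1 else [v for v, c in counts.items() if c == top_count]

print(mean_income, median_income, mode_income)
```

The mean is about twice the median here because the three highest incomes pull it upward.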
• Range
– Largest value – smallest value
– Sometimes see quartiles (75th vs. 25th percentiles, with the median at the 50th percentile)
• Mean Deviation
– Sum of the absolute deviations of each observation from the mean, divided by sample size; it’s
the average deviation of all observations from the mean
• Variance
– Is computed by squaring each deviation from the mean, adding them up and dividing their sum by
one less than “n”
• Standard Deviation
– The square root of the variance
• Note: The closer the data are around the mean, the smaller the standard deviation.
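These measures of spread can be sketched on the same ten monthly incomes used in the central tendency example. The quartile cut points depend on the interpolation method; `statistics.quantiles` with its default settings is used here as one reasonable choice, not the only one.

```python
import statistics

incomes = [300, 375, 485, 500, 600, 625, 1000, 2000, 3000, 3500]
mean_income = statistics.mean(incomes)

value_range = max(incomes) - min(incomes)
q1, q2, q3 = statistics.quantiles(incomes, n=4)  # 25th, 50th, 75th percentiles
mean_abs_dev = sum(abs(x - mean_income) for x in incomes) / len(incomes)
variance = statistics.variance(incomes)  # divides by n - 1
std_dev = statistics.stdev(incomes)      # square root of the variance

print(value_range, (q1, q2, q3), mean_abs_dev, round(variance, 1), round(std_dev, 1))
```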
Coefficient of variation:
• Not as common as mean, s.d., variance, or range.
• Expressed as a percentage, with higher percentages indicating greater
variation:
• Calculated by dividing the s.d. by the mean, then multiplying by 100.
Useful in comparing the amount of variability between data.
e.g., not much point in comparing the standard deviation of HbA1c values
with the standard deviation of blood glucose values because they are
measured on different scales. You could compare coefficient of variation
(percentage) to see which has the greater variability.
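A minimal sketch of the comparison follows. The HbA1c and glucose values are made up purely for illustration and do not come from any study; the sample standard deviation (n - 1) is assumed.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


# Made-up illustrative values, not data from any study.
hba1c_percent = [6.8, 7.2, 7.9, 8.4, 9.1]
glucose_mg_dl = [110, 145, 180, 210, 260]

print(round(coefficient_of_variation(hba1c_percent), 1))
print(round(coefficient_of_variation(glucose_mg_dl), 1))
```

Because both results are percentages of their own means, they can be compared directly even though the two measurements use different units.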
Example of Range:
LIPITOR Benefit #1: Lower Cholesterol
Along with diet and exercise, LIPITOR is proven to help you:
Lower your LDL ("bad" cholesterol) by 39% to 60%. (The average effect depends on
dose)
Lower your triglycerides (a type of fat found in your blood) by 19% to 37%. (The
average effect depends on dose)
Raise your HDL ("good" cholesterol) by up to 9%. (The average effect depends on
dose)
http://www.lipitor.com/learn-about-lipitor/lipitor-benefits.jsp?setShowOn=../learn-aboutlipitor/home.jsp&setShowHighlightOn=../learn-about-lipitor/lipitorbenefits.jsp&source=google&HBX_PK=c_lipitor&HBX_OU=50&o=23127370|166376222|0 accessed 1/8/08
The type of data dictates the measure of central tendency that
most accurately represents the data.
Sometimes data are best described by summarizing in a
descriptive fashion.
Otherwise, data are described by a measure of central
tendency and a measure of variation; mean and standard
deviation.
Sometimes a combination of both are used.
More information about your sample is better when it comes
to informing those who may want to draw conclusions from
your work.