Data-Driven Decision Making
Ed Schumacher, Ph.D.
Statistical Analysis for Health Care Management
CENTENE-TRINITY-OLIN
“In God we trust; all others must bring data.”
W. Edwards Deming
Creative Destruction of Medicine
[Figure: the “super convergence” driving medicine from the old, population-based model to the new, individual-based model: wireless sensors, genomics, imaging, information systems, social networking, consumerism, new financing models, predictive analytics, and computing power + big data.]
Thinking Like a Statistician
• The world is made up of information
– We make judgments and decisions based on the data that reaches us
– If that data is systematically biased, then those judgments are systematically wrong
• Formal statistical analysis provides precise ways to convert data into information
– But we can also use these concepts to help us in our day-to-day thinking
Start with a Story
• A study of the incidence of kidney cancer in the 3,141 counties of the US reveals a remarkable pattern. The counties with the lowest incidence of kidney cancer are:
– Rural, sparsely populated, and located in traditionally Republican states in the midwest, south, and west.
– (The counties with the highest incidence share the same profile: small-population counties produce extreme rates in both directions, simply because their samples are small.)
We are not natural statisticians
• We like stories
– A good story usually trumps good data
• This causes us to make mistakes in evaluating the randomness of truly random events
• Which of these birth sequences is more likely?
– BBBGGG
– GGGGGG
– BGBGBG
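A quick check, as a minimal Python sketch (assuming each birth is an independent 50/50 event): every specific six-birth sequence has exactly the same probability, 1/64, even though GGGGGG “looks” less random.

```python
from fractions import Fraction

# Assume each birth is an independent 50/50 boy/girl event.
# Every *specific* sequence of six births then has probability (1/2)^6.
p_each = Fraction(1, 2)
for seq in ["BBBGGG", "GGGGGG", "BGBGBG"]:
    print(seq, p_each ** len(seq))  # 1/64 for all three
```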
We misunderstand randomness
• We pay more attention to the content of messages than to information about their reliability
– We end up with a view of the world that is simpler and more coherent than the data justify
• Many facts of the world are due to chance
– Causal explanations of chance events are inevitably wrong
• Even faced with data to the contrary, we stick to our story
– For example, emulating “successful” firms
• So we need to train our brains to think statistically
• Huge quantities of data arrive from an increasing variety of sources
• The key is to convert data into information
• A basic understanding of statistical concepts is critical
Selection Bias
• We make a lot of decisions based on incomplete information
– Often we make inferences about a whole population based on a subset of that population
– As long as the subset is representative of the whole, our inferences will be reasonable
– But if our subset is an unrepresentative slice of the population, then we can be way off
Selection Bias
• Suppose you are in charge of monitoring sound quality in a large auditorium that is filled with people
[Figure: the audience divided into three groups: can hear everything, can hear a little, can’t hear anything.]
Selection Bias
• After the lecture you told people to go to a certain web site and fill out a survey on how well they could hear
[Figure: the same three groups: can hear everything, can hear a little, can’t hear anything.]
Selection Bias
• What you get back looks as follows. What do we conclude about the sound?
[Figure: the returned surveys contain only “can hear everything” and “can hear a little” responses; the “can’t hear anything” group is missing entirely.]
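A minimal simulation of this survey, with invented numbers: response rates fall with hearing quality, and the people who heard nothing never heard the survey announcement at all, so the responses paint a rosier picture than the room.

```python
import random

# Hypothetical audience: 500 hear everything, 300 hear a little,
# 200 hear nothing. People who can't hear never learn about the
# survey, so their response probability is zero.
random.seed(3)
audience = ["everything"] * 500 + ["a little"] * 300 + ["nothing"] * 200
respond_prob = {"everything": 0.30, "a little": 0.15, "nothing": 0.0}
responses = [h for h in audience if random.random() < respond_prob[h]]

share_pop = audience.count("everything") / len(audience)
share_resp = responses.count("everything") / len(responses)
print(f"'can hear everything': {share_pop:.0%} of the room, "
      f"{share_resp:.0%} of the survey responses")
# roughly 50% of the room vs ~77% of the responses
```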
Selection Bias
• Where do we see selection bias?
Selection Bias
• The Feedback Effect
– My favorite professor
– Teaching evaluations
– Leaders often have skewed impressions of their organizations
• If there are real or perceived dangers to expressing a negative opinion
• Many instances of “supervisory madness” are not the result of malice or idiocy, but rather selection bias
– Fox News vs MSNBC, NYTimes vs WSJ
– Geographic variations in health care
– Spending at the end of life
– Other instances where we receive “first person” information
The Filter Bubble = Selection Bias
Endogeneity – Correlation Does Not Imply Causation
• Endogeneity
– Not even a word according to Microsoft
– Basic idea:
• We assume X → Y, but in fact Y → X
• Or we assume A → B, but in fact C → A and C → B
Endogeneity
• Omitted Variable Bias
• It is typically assumed that GPA is a good measure of a job or graduate school applicant’s prospects:
$GPA = X \cdot Effort + Y \cdot Ability + error$
– GPA is a function of how hard you work and how able you are
– X converts effort into GPA points
– Y converts ability into GPA points
– error is a term that allows for random variation
Endogeneity
• Problem:
– Students can pick their courses
• Now the error term is not random – it is correlated with an omitted variable: the difficulty of the course
• Taking easier courses results in a higher GPA
– So our X and Y effects get diluted
• We may not write out equations like this very often, but we implicitly do in our heads
– Need to be aware of omitted variables (a simulation sketch follows)
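A minimal simulation of the omitted-variable story (all coefficients invented for illustration): GPA truly depends on effort and on course difficulty, the harder workers here happen to take harder courses, and a regression of GPA on effort alone understates the effort effect.

```python
import random

# Invented data-generating process: GPA = 0.5*effort - 0.4*difficulty + noise,
# and students who work harder also take harder courses.
random.seed(1)
n = 10_000
effort = [random.gauss(0, 1) for _ in range(n)]
difficulty = [0.7 * e + random.gauss(0, 1) for e in effort]
gpa = [0.5 * e - 0.4 * d + random.gauss(0, 0.5)
       for e, d in zip(effort, difficulty)]

# OLS slope of GPA on effort alone: cov(effort, gpa) / var(effort).
# Difficulty is omitted, so it hides in the error term and biases the slope.
m_e = sum(effort) / n
m_g = sum(gpa) / n
cov = sum((e - m_e) * (g - m_g) for e, g in zip(effort, gpa))
var = sum((e - m_e) ** 2 for e in effort)
print(cov / var)  # ~0.22, well below the true effort effect of 0.5
```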
Endogeneity
• Causality Loops – or reverse causation
– Does X cause Y or does Y cause X? Or BOTH?
– Open That Bottle Night
Endogeneity
• Her equation:
$ComputerTime_t = f(Football_{t-1})$
Computer time is a function of football time
• My equation:
$Football_t = f(ComputerTime_{t-1})$
Football time is a function of computer time
• Each variable is endogenous to the other
• The surgeries don’t start on time because the anesthesiologist doesn’t show up on time
• The anesthesiologist doesn’t show up on time because the surgeries don’t start on time
Endogeneity
• A similar problem shows up in much advertising:
– People who switched to Progressive saved $$$$
• Note the big endogeneity problem here
– The decision to switch insurance is endogenous – people who discover they would save money switch; those who discover they won’t tend not to switch
– The company wants you to think switching causes savings:
$Savings = X \cdot Switch + error$
But savings also causes switching:
$Switch = Y \cdot Savings + error$
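A minimal sketch of the ad’s selection problem, with invented numbers: suppose quoted savings are centered on zero, so switching saves the average shopper nothing, yet the people who actually switch (those quoted a positive saving) save a lot.

```python
import random

# Invented quotes: the dollar amount you'd save by switching is
# centered on zero with a $400 standard deviation.
random.seed(2)
quotes = [random.gauss(0, 400) for _ in range(100_000)]

# Only people whose quote shows a saving bother to switch.
switchers = [q for q in quotes if q > 0]

print(sum(quotes) / len(quotes))        # ~ $0: no saving on average
print(sum(switchers) / len(switchers))  # ~ $319: "switchers saved money!"
```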
Endogeneity
• “Mark Zuckerberg dropped out of college and did well for himself – we need to encourage our youth to be more adventurous, risk-taking, entrepreneurial”
$Success = X \cdot Dropout + error$
Taking risk results in a higher chance of success. But…
$Dropout = Y \cdot Success + error$
Being successful makes it more likely you drop out
• Note also that there is a sample selection problem here
Endogeneity
• Challenge of the Big Data movement
– The idea is to move away from causation and towards correlation
• “Correlations are powerful not only because they offer insights, but also because the insights they offer are relatively clear. These insights often get obscured when we bring causality back into the picture” – Mayer-Schönberger and Cukier, Big Data
Endogeneity
• Big Data has made some impressive observations:
– Google search terms and flu outbreaks
– Farecast – predicting whether the price of plane tickets will rise or fall in the future
– Target knows you are pregnant before your family does
– Wal-Mart: hurricanes and strawberry Pop-Tarts
• But there are some limits
Endogeneity
• Correlations are not always all that clear
– A zillion things can correlate with one another
• Ice cream sales and drownings are highly correlated
– You will have to rely on some causal hypothesis
– We are “blind to randomness” – Black Swans
• The passing of time can produce gigantic and unpredictable changes in taste and behavior
– These changes will be poorly anticipated by looking at patterns of data on what just happened
Endogeneity
• Data creates bigger haystacks
– As we acquire more data we have the ability to find many more statistically significant correlations
– Most of these correlations are spurious
– Falsity grows exponentially the more data we collect
• The haystack gets bigger but the needle stays the same
Endogeneity
• Big Data is a tool – a good tool, but just a tool
• It is really good at telling you what to pay attention to, but then you need to get back to the world of causality to benefit
• “Big data is like the offensive coordinator up in the booth at a football game who, with altitude, can see patterns others miss. But the head coach and players still need to be on the field of subjectivity” – David Brooks
Bayes' Theorem
• Student in middle school
– “Given I received this note, what are the chances she actually likes me?”
• This is what is known as conditional probability
– The note is the conditioning event
– The probability I am interested in is “does she like me?”
• How do we update when we receive new information?
Bayes' Theorem
• Bayes' Theorem gives us the answer. We need 3 quantities:
1. The probability of getting a note conditional on her liking me – let’s say this is 75%
2. The probability of getting a note conditional on her NOT liking me – let’s say this is 20%
3. Prior probability – what is the chance of her liking me prior to the note appearing? Let’s say 10%
Bayes' Theorem
• So now we can calculate the probability of her liking me conditional on receiving the note:
$P(like \mid note) = \dfrac{P(note \mid like) \cdot P(like)}{P(note \mid like) \cdot P(like) + P(note \mid not\,like) \cdot P(not\,like)}$
Bayes' Theorem
• Plugging in the numbers (the conditioning event is the note in class):
– Prior probability she likes me: x = 0.10
– Probability of getting the note conditional on her liking me: y = 0.75
– Probability of getting the note conditional on her NOT liking me: z = 0.20
– Posterior probability of her liking me conditional on getting the note: xy/(xy + z(1-x)) = 0.29
Bayes' Theorem
• Note the power of the prior belief. Suppose we put a 50% chance on her liking me prior to the note
– That raises the probability of her liking me conditional on getting the note to almost 80%
– I’ll come back to this in a bit, but our priors matter
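The whole calculation fits in a few lines of Python; the helper name posterior is mine, not from the slides.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' Theorem for a yes/no hypothesis and one piece of evidence."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))

print(posterior(0.10, 0.75, 0.20))  # 0.294 with the 10% prior
print(posterior(0.50, 0.75, 0.20))  # 0.789 with a 50% prior
```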
Bayes' Theorem
• Application – screening for breast cancer among women in their 40s
– What is the probability of having cancer conditional on getting a positive mammogram?
Bayes' Theorem
• The prior probability of a woman in her 40s developing breast cancer is about 1.4%
• If a woman does not have cancer, a mammogram will incorrectly claim she does about 10% of the time
• If a woman does have cancer, a mammogram will detect it about 75% of the time
Bayes' Theorem
• So the probability of having breast cancer conditional on a positive mammogram is (the conditioning event is the positive mammogram):
– Prior probability of having breast cancer: x = 0.014
– Probability of a positive mammogram conditional on having cancer: y = 0.75
– Probability of a positive mammogram conditional on NOT having cancer: z = 0.10
– Posterior probability of having cancer conditional on a positive mammogram: xy/(xy + z(1-x)) = 0.10
Bayes' Theorem
• Even though false positives are not too likely, they dominate since the prior probability is so low
• To see this, assume a population of 1,000 women in their 40s
– 14 of them will have cancer, 986 will not
– With a 10% false-positive rate, 98.6 will have false positives
– With a 75% true-positive rate, 10.5 will have true positives
– Thus 109.1 women will have positive mammograms, and 10.5 of them will have cancer: 10.5/109.1 = 0.096
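The same natural-frequency arithmetic as a quick Python check:

```python
# Natural-frequency version of the mammogram numbers from the slides.
women = 1000
with_cancer = women * 0.014              # 14 women
without_cancer = women - with_cancer     # 986 women
true_positives = with_cancer * 0.75      # 10.5
false_positives = without_cancer * 0.10  # 98.6
print(true_positives / (true_positives + false_positives))  # ~0.096
```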
Bayes' Theorem
• This is known as the Base-Rate Fallacy
– We tend to be very bad at using Bayes' Theorem and tend to focus on the most immediate information
– The value of Bayes' Theorem is that it allows us to account for our prior beliefs and then update them with new information
• Bayes' Theorem helps us understand why two people can look at the same data and come to two opposite conclusions:
Bayes' Theorem
• Two observers, one miracle (the conditioning event is water turned into wine):
– Prior probability God exists (x): Francis 0.900, Fidel 0.001
– Probability of water into wine conditional on God existing (y): 0.990 for both
– Probability of water into wine conditional on God not existing (z): 0.010 for both
– Posterior probability God exists given water into wine: Francis 0.999, Fidel 0.090
Bayes' Theorem
• But the encouraging thing about Bayes' Theorem is that over time our beliefs should converge as we update with new information. The conditioning event at each party is water turned into wine:
– Prior probability God exists (x): Party 1 0.001, Party 2 0.090, Party 3 0.908
– Probability of water into wine conditional on God existing (y): 0.990 at each party
– Probability of water into wine conditional on God not existing (z): 0.010 at each party
– Posterior probability God exists given water into wine: Party 1 0.090, Party 2 0.908, Party 3 0.999
• Each posterior becomes the prior for the next party
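A minimal Python sketch of that convergence, starting from Fidel’s 0.001 prior:

```python
# Observe the same evidence (water into wine) at three parties in a
# row; each posterior becomes the prior for the next update.
p_if_god, p_if_no_god = 0.99, 0.01
belief = 0.001
for party in (1, 2, 3):
    numerator = p_if_god * belief
    belief = numerator / (numerator + p_if_no_god * (1 - belief))
    print(f"after party {party}: {belief:.3f}")  # 0.090, 0.908, 0.999
```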
Descriptive Statistics
• Descriptive Statistics vs. Inferential Statistics
– Descriptive stats are used to summarize and describe a group of data
• The analyst is neutral
– Inferential statistics makes possible the estimation of a characteristic of a population based only on sample results
• The analyst uses judgment
Measures of Central Tendency
• Consider the following data:

Patient:  1  2  3  4   5  6  7
LOS:      5  5  2  10  4  5  3
Mean, Median, Mode
• Three common measures of central tendency: mean, median, and mode
– Arithmetic Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• where $\bar{x}$ is the sample mean
• $n$ is the number of observations in the sample
• $x_i$ is the $i$th observation of the variable $x$
• $\sum_i x_i$ is the sum of all $x_i$ values in the sample
• (5+5+2+10+4+5+3)/7 = 4.86
• The average LOS is 4.86 days for this sample
Geometric Mean
• An alternative to the arithmetic mean – often used when there are positive outliers
• $G = \sqrt[n]{x_1 \cdot x_2 \cdot x_3 \cdots x_n}$
• Or $\ln G = \frac{1}{n}\sum_{i=1}^{n} \ln x_i$
– The log of the geometric mean is equal to the mean of the logs
• $G = \sqrt[7]{2 \cdot 3 \cdot 4 \cdot 5 \cdot 5 \cdot 5 \cdot 10} = 4.36$
• The geometric mean is never greater than the arithmetic mean; the two are equal only when all the values are equal
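Both means are one call each in Python’s standard library; a quick check on the LOS data:

```python
import statistics

los = [5, 5, 2, 10, 4, 5, 3]  # the LOS data from the slides
print(statistics.mean(los))            # 4.857... arithmetic mean
print(statistics.geometric_mean(los))  # 4.36...  (Python 3.8+)
```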
Median
• The median is the middle value in an ordered array of data
• If there are no ties, half the observations will be smaller than the median and half will be larger
• The median is unaffected by any extreme observations in a set of data
• Ordered array: 2 3 4 5 5 5 10 – the middle value is 5, so the median is 5
Mode
• The mode is the value in a set of data that appears most frequently
• So the mode is 5 in our sample
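The same two measures as a quick Python check:

```python
import statistics

los = [5, 5, 2, 10, 4, 5, 3]
print(statistics.median(los))  # 5: the middle of the ordered array
print(statistics.mode(los))    # 5: the most frequent value
```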
Measures of Variation
• Range
– The range is the difference between the largest and smallest observations in the data. In our LOS data the range is 10 - 2 = 8.
• The Interquartile Range
– The interquartile range is obtained by subtracting the first quartile from the third quartile.
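A quick Python sketch of both, assuming the default quartile convention in the standard library (different conventions give slightly different answers on a sample this small):

```python
import statistics

los = [5, 5, 2, 10, 4, 5, 3]
print(max(los) - min(los))  # range: 10 - 2 = 8

# Quartiles via statistics.quantiles (Python 3.8+)
q1, q2, q3 = statistics.quantiles(los, n=4)
print(q3 - q1)  # interquartile range
```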
Variance
• The sample variance is roughly the average of the squared differences between each of the observations in a set of data and the mean:
$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
• So for our data we would get 6.48 (in squared days)
• Why n-1?
Standard Deviation
• If we take the square root of the variance we get the standard deviation:
$S = \sqrt{S^2}$
• So S = 2.54
• On average, each patient’s stay is about 2.5 days away from the average LOS
• Note that in the squaring process, observations that are farther from the mean get more weight than do observations that are closer to the mean
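Both statistics as a quick Python check (the standard library’s variance and stdev divide by n-1, matching the sample formulas above):

```python
import statistics

los = [5, 5, 2, 10, 4, 5, 3]
print(statistics.variance(los))  # 6.476...: sample variance (divides by n-1)
print(statistics.stdev(los))     # 2.545...: its square root
```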
Coefficient of Variation
• The coefficient of variation is a relative measure of variation:
$CV = \frac{s}{\bar{x}} \times 100\%$
• This statistic is most useful when making comparisons across different types of data that might use different scales or different units of measurement
• It makes it easier to compare apples to oranges
Coefficient of Variation
• Wait times for 3 different lab tests:

          test a   test b   test c
          12       54       105
          15       31       95
          20       54       110
          10       60       135
          7        60       187
          9        51       115
mean      12.17    51.67    124.50
st dev    4.71     10.75    33.37
cv        0.39     0.21     0.27
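Reproducing the bottom row of the table in Python:

```python
import statistics

waits = {
    "test a": [12, 15, 20, 10, 7, 9],
    "test b": [54, 31, 54, 60, 60, 51],
    "test c": [105, 95, 110, 135, 187, 115],
}
for name, times in waits.items():
    cv = statistics.stdev(times) / statistics.mean(times)
    print(name, round(cv, 2))  # 0.39, 0.21, 0.27
```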
Correlation Coefficient
• The correlation coefficient gives a value to how two continuous variables relate to each other.

Patient   LOS   cost
3         2     1145
7         3     2108
5         4     3425
1         5     4358
2         5     5689
6         5     6258
4         10    9256
Correlation Coefficient
• The sample correlation coefficient, r, can be calculated as:
$r = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$
• When I do this for the above data I get .95
• Interpret this
• There is a canned command for this in Excel: =CORREL(xrange, yrange)
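The Python equivalent of Excel’s =CORREL, run on the table above:

```python
import statistics

los = [2, 3, 4, 5, 5, 5, 10]
cost = [1145, 2108, 3425, 4358, 5689, 6258, 9256]
# Pearson correlation; statistics.correlation requires Python 3.10+
print(statistics.correlation(los, cost))  # ~0.95
```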