#### Transcript Diving into the Deep: Advanced Concepts

```
Diving into the Deep: Module 9
Violation of Distributional Assumptions
• What distribution do the following charts assume?
– i-Chart: near normal
– Xbar-Range: near normal
– Xbar-Sigma: near normal
– p Chart: binomial
– np Chart: binomial
– u Chart: Poisson
– c Chart: Poisson
Trial and Error Steps
• Plot the original data in a histogram and on a probability plot of some kind.
• If the data are skewed to the right (long upper tail), try a lower power (e.g., take the square root of each value).
• Make a probability plot of the transformed data.
– If it looks better, try even lower powers.
– If it looks worse, try higher powers.
• Continue until the probability plot looks reasonably straight.
Powers (from higher to lower)
• Higher powers: Cubic, Square
• Identity: this is your raw data
• Lower powers: SQRT, Log, 1/SQRT, 1/X, 1/Square, 1/Cubic
• Try these transformations until you get a distribution that looks like it comes from a normal distribution.
[Figure: Average monthly number of preventable hospitalizations due to chronic disease in one IPC site, 2004-2005. Histogram; x-axis: Hospitalizations; y-axis: Frequency of Months.]
[Figure: Histograms by transformation (cubic, square, identity, sqrt, log, 1/sqrt, inverse, 1/square, 1/cubic); y-axis: Density. Histograms of different power transformations.]
[Figure: Quantiles of preventable hospitalizations against quantiles of the normal distribution; x-axis: Inverse Normal; y-axis: Hospitalizations. Grid lines are the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles.]
Dots fall right on the line for normally distributed data.
Quantile-Normal Probability Plots for Different Transformations
[Figure: Quantile-Normal plots by transformation (cubic, square, identity, sqrt, log, 1/sqrt, inverse, 1/square, 1/cubic) of Hospitalizations. Grid lines are the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles.]
Which transformation is the closest to normally distributed?
Probability of the observed distribution under the hypothesis of a normal distribution
In STATA: Simply type in this command!
Results
Transformation       chi2(2)   P(chi2)
cubic                  16.36     0.000
square                 12.38     0.002
raw                     6.17     0.046
square-root             2.01     0.366
log                     0.29     0.864
reciprocal root         8.04     0.018
reciprocal             18.25     0.000
reciprocal square      31.62     0.000
reciprocal cubic       36.56     0.000
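Which one would you use?
• Raw data, P(chi2) = 0.046: if the original data were normal, a departure from normality at least this large would occur only about 5% of the time.
• Log, P(chi2) = 0.864: the log of each data point shows no meaningful evidence of departing from a normal distribution.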
How do you use this transformation?
• Take the log of each value.
• Calculate the centerline and control limits for the logs.
• Exponentiate the centerline and control limits.
• Plot these values on the chart of the original data.
[Figure: i-Chart assuming a normal distribution of the raw data.]
[Figure: i-Chart of the log of the number of hospitalizations.]
[Figure: i-Chart of the number of hospitalizations with control limits derived from the log of hospitalizations.]
Note: UCL for the log of hospitalizations = 4.81, so the UCL for this graph = e^4.81 ≈ 122.74.
Central Limit Theorem?
How do I know if I can use the Poisson distribution (c-chart and u-chart)?
Practical Way
• If there is a large “n” and a very small “p.”
• If the mean ≈ the variance
– Calculate the variance using the formula for estimating a population variance from a sample (sum of squared deviations divided by n-1) and compare it to the mean.
• Then you probably have a Poisson-distributed variable.
• You can get technical if you need to.
– It is theoretically possible to compare the distribution of units of area of opportunity (e.g., bed-days) over the range of possible event counts (0, 1, 2, 3, …, 50) to the distribution predicted by the mathematical formula for the Poisson distribution.
• Save that for another course, please.
When can I assume the binomial distribution?
• If each observation can be classified as yes or no, 0 or 1, etc.
• If the average p is not close to 0 or 1.
• If n*p*(1-p) ≥ 5
– THEN you can probably use the binomial, i.e., the p-chart or np-chart.
• Need 25 subgroups, more or less.
• For p and np charts, n ≥ 4/pBar.
• For u and c charts, cBar ≥ 4.
• See the tables by Benneyan.
Extra-Binomial & Extra-Poisson Dispersion
• The data vary more widely than predicted by the binomial or Poisson distribution.
• Can occur when subgroup sizes are too large.
Are these data over-dispersed?
How do we solve this?
• Use smaller increments of time.
• Check with subject-matter experts to be sure that there are no special causes.
• Then use the p′ method available in CHARTrunner 3.6.
– It adjusts the control limits by combining within-subgroup variation with between-subgroup variation.
Notice the wider control limits with the p′ chart.
This method is available in CHARTrunner 3.6 for:
• p′
• np′
• u′
• c′
Exercise E9
Enjoy
What did we learn from the exercise?
• Time intervals or subgroups can be “roped together,” i.e., not independent.
• This restricts their freedom to take on all possible values at any given time.
• This reduces the real (effective) sample size.
• Control limits are then too narrow.
• Question: Are monthly measures of the percent screened for cancer over the last year, taken on the same 50 patients, roped together? Autocorrelated?
What can we do?
• Separate the time between subgroups, e.g., sample once every three months.
• Change what you are measuring.
autocorrelation (limited).
• Switch to time-series analysis and model the autocorrelation. (You will usually need help here.)
• Don’t worry unless the autocorrelation is really high.
• Use a run chart without run chart rules.
Hazards of Pooling Unlike Streams of Data
• Can we combine
– Males with females?
– Data from the private sector with data from the public sector?
– Hospital A with Hospital B?
• The hazard is confusing the effect of time with the effect of another variable.
• Confusion = Confounding
Example
[Figure: 30-day mortality following CABG; y-axis from 0 to 1; subgroup sizes shown: n=5250, n=10000, n=5250, n=500.]
Example: Pooled Hospitals A and B
[Figure: 30-day mortality following CABG, Hospitals A & B pooled; y-axis from 0 to 1; n=10500 in each subgroup.]
How is this possible?
Pooled Results
• Pooling confused the effect of time with the effect of the changing patient load at Hospital A vs. Hospital B.
Solutions
• Don’t combine unless there is no special cause between subgroups.
• Use indirect standardization.
– See Hart and Hart, 2002, Appendix 2.
[Figure: P-Chart comparing reporting sites on 1 executive order.]
What is wrong with this comparison?
• Risk adjustment is most important:
– When comparing outcomes sensitive to the severity of the patient’s condition, and
– When comparing across sites.
• Risk adjustment is not as important when comparing the same patient population over time.
• Risk adjustment is not as important for process measures that should be done regardless of severity.
Approaches
• Simple: Adjust for age and sex.
• Complex: Age, sex, comorbidity, other.
– Requires many variables and proprietary or government software.
Converted into a system with its own software:
• CMS
• JCAHO
• AHRQ
Problems and Solutions
• We don’t collect enough variables centrally