#### Transcript Diving into the Deep: Advanced Concepts

Diving into the Deep: Advanced Concepts (Module 9)

Violation of Distributional Assumptions
• What distribution do the following charts assume?
  – i-Chart: near normal
  – Xbar-Range: near normal
  – Xbar-Sigma: near normal
  – p Chart: binomial
  – np Chart: binomial
  – u Chart: Poisson
  – c Chart: Poisson

Making Your Distribution Normal: Trial and Error Steps
• Plot the original data in a histogram and on a probability plot of some kind.
• If skewed, try a lower power (e.g., take the square root of each value).
• Make a probability plot of the transformed data.
  – If it looks better, take even lower powers.
  – If it looks worse, take higher powers.
• Continue until the probability plot looks reasonably straight.

Ladder of Powers (from higher powers to lower powers)
• Cubic
• Square
• Identity (this is your raw data)
• SQRT
• Log
• 1/SQRT
• 1/X
• 1/Square
• 1/Cubic
Try these transformations until you get a distribution that looks like it comes from a normal distribution.

[Figure: Histogram. Average monthly number of preventable hospitalizations due to chronic disease in one IPC site, 2004-2005. x-axis: Hospitalizations; y-axis: Frequency of Months.]

[Figure: Histograms of different power transformations of Hospitalizations. Panels: cubic, square, identity, sqrt, log, 1/sqrt, inverse, 1/square, 1/cubic; y-axis: Density.]
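The trial-and-error steps above can be sketched in Python. This is a minimal illustration, not the procedure used to produce the slides: it walks the ladder of powers and scores each transformation by how straight its normal probability plot would be, using the correlation between the sorted values and normal quantiles. The function names, the plotting-position formula, and the sample data are all assumptions made for this example, and the data must be strictly positive for the log and reciprocal rungs.

```python
import math
from statistics import NormalDist

# Ladder of powers, ordered from higher to lower powers.
LADDER = [
    ("cubic", lambda x: x ** 3),
    ("square", lambda x: x ** 2),
    ("identity", lambda x: x),
    ("sqrt", math.sqrt),
    ("log", math.log),
    ("1/sqrt", lambda x: 1 / math.sqrt(x)),
    ("1/x", lambda x: 1 / x),
    ("1/square", lambda x: 1 / x ** 2),
    ("1/cubic", lambda x: 1 / x ** 3),
]


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def straightness(values):
    """Correlation of sorted values with normal quantiles (1.0 = straight plot)."""
    n = len(values)
    ordered = sorted(values)
    # (i - 0.5) / n is one common plotting-position convention (an assumption).
    normal_q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return pearson(ordered, normal_q)


def rank_ladder(data):
    """Score every rung of the ladder; the straightest transformation comes first."""
    scores = [(name, straightness([f(x) for x in data])) for name, f in LADDER]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical right-skewed monthly counts (NOT the hospitalization data from the slides).
monthly_counts = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
                  17, 18, 20, 22, 24, 27, 30, 34, 40, 48, 60, 80]
for name, score in rank_ladder(monthly_counts):
    print(f"{name:<9} {score:.4f}")
```

A score close to 1 means the points would fall close to a straight line on the probability plot. The score is only a screening aid; looking at the plot itself remains the step the slides describe.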
[Figure: Quantiles of preventable hospitalizations against quantiles of the normal distribution (x-axis: Inverse Normal). Dots are right on the line for normally distributed data. Grid lines are the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles.]

Quantile-Normal Probability Plots for Different Transformations
Which transformation is the closest to normally distributed?

[Figure: Quantile-normal plots of Hospitalizations by transformation. Panels: cubic, square, identity, sqrt, log, 1/sqrt, inverse, 1/square, 1/cubic. Grid lines are the 5th, 10th, 25th, 50th, 75th, 90th, and 95th percentiles.]
Probability of the observed distribution under the hypothesis of a normal distribution
Simply type in this command! In STATA: ladder Hospitalizations

Results

| Transformation | chi2(2) | P(chi2) |
|---|---|---|
| cubic | 16.36 | 0.000 |
| square | 12.38 | 0.002 |
| raw | 6.17 | 0.046 |
| square-root | 2.01 | 0.366 |
| log | 0.29 | 0.864 |
| reciprocal root | 8.04 | 0.018 |
| reciprocal | 18.25 | 0.000 |
| reciprocal square | 31.62 | 0.000 |
| reciprocal cubic | 36.56 | 0.000 |

Which one would you use?
• P(chi2) for raw (0.046): the approximate probability of observing data like these if the original values were normally distributed.
• P(chi2) for log (0.864): the approximate probability of observing data like these if the log of each data point were normally distributed.

How do you use this transformation?
• Take the log of each value.
• Calculate the centerline and control limits for the log.
• Exponentiate the centerline and control limits.
• Plot these values.

[Figure: i-Chart assuming normal distribution of raw data.]

[Figure: i-Chart of the log of the number of hospitalizations.]

[Figure: i-Chart of No. of hospitalizations with control limits derived from the log of hospitalizations.]
Note: UCL for log of hospitalizations = 4.81; UCL for this graph = e^4.81 = 122.74.

But what about the Central Limit Theorem?

How do I know if I can use the Poisson distribution? C-Chart and U-Chart?
Practical way:
• If there is a large “n” and a very small “p.”
• If the mean ≈ variance
  – Calculate the variance using the formula for estimating a population variance from a sample (sum of squared deviations divided by n-1) and compare it to the mean.
• Then you probably have a Poisson-distributed variable.

How do I know if I can use the Poisson distribution? C-Chart and U-Chart?
• You can get technical if you need to.
  – It is theoretically possible to compare the distribution of units of area of opportunity (i.e., bed-days) over the range of possible events (0, 1, 2, 3, ... 50) to the distribution predicted by the mathematical formulae for the Poisson distribution.
• Save that for another course, please.

When can I assume the binomial distribution?
• If you have a situation where each observation can be classified as yes or no, 0 or 1, etc.
• If average p is not close to 0 or 1.
• If n*p*(1-p) ≥ 5
  – THEN you can probably use the binomial, i.e., p-chart, np-chart.

What about sample size?
• Need 25 subgroups, more or less.
• For p and np charts, n ≥ 4/pBar.
• For u and c charts, cBar ≥ 4.
• See Tables by Benneyan.

Extra-binomial & Extra-Poisson Dispersion
• Can occur when subgroup sizes are too large.
• Your observed points may be spread out more widely than predicted by the binomial or Poisson distribution.

Is this data over-dispersed?

How do we solve this?
• Use smaller increments of time.
• Check with subject-matter experts to be sure that there are no special causes.
• Then use the p’ method available in CHARTrunner 3.6.
  – Adjusts the control limits by combining within-subgroup variation with between-subgroup variance.

Notice the wider control limits with the p’ chart.

This method is available in CHARTrunner 3.6
• p’
• np’
• u’
• c’

Exercise E9
Enjoy

What did we learn from the exercise?
• Time intervals or subgroups can be “roped together”, i.e., not independent.
• This restricts their freedom to take on all possible values at any given time.
• This reduces the real sample size.
• Control limits are then too narrow.
• Question: Are monthly measures on the same 50 patients for percent screened for cancer over the last year roped together? Autocorrelated?

What can we do?
• Separate the time between subgroups, e.g., sample once every three months.
• Change what you are measuring.
• Use advanced charts to adjust for autocorrelation (limited).
• Switch to time-series analysis and model the autocorrelation. (Usually need help here.)
• Don’t worry unless autocorrelation is really high.
• Use a run chart without run chart rules.

Hazards of Pooling Unlike Streams of Data
• Can we combine
  – Males with females?
  – Data from the private sector with the public?
  – Hospital A with Hospital B?
• Confusing the effect of time with the effect of another variable is the hazard.
• Confusion = Confounding

[Figure: Example. 30-day mortality following CABG. Data labels: N=5250, n=10000, n=5250, n=500.]

[Figure: Example: Pooled Hospitals A and B. 30-day mortality following CABG. Hospitals A & B, n=10500 at each point.]

How is this possible?

Pooled Results
• Confused the effect of time with the effect of changing patient load of hospital A vs. B.

Solutions
• Don’t combine unless there is no special cause between subgroups.
• Use indirect standardization.
  – See Hart and Hart, 2002, Appendix 2.

[Figure: P-Chart comparing reporting sites on 1 executive order.]
What is wrong with this comparison?

Needs Risk Adjustment
• When comparing outcomes sensitive to the severity of the patient’s condition, and
• When comparing across sites.
• Risk adjustment is not as important when comparing the same patient population over time.
• Risk adjustment is not as important for process measures that should be done regardless of severity.

Approaches
• Simple: Adjust for Age and Sex.
• Complex: Age, Sex, Comorbidity, Other.
  – Requires many variables and proprietary or government software.

Read about risk-adjusted control charts.
Converted into a system with its own software.

Other Risk Adjustment Programs
• CMS
• JCAHO
• AHRQ

Problems and Solutions
• We don’t collect enough variables centrally to risk adjust.
• Solutions:
  – Collect more data and apply software.
  – Stratify/restrict your comparisons.
    • Males, aged 65-74, diagnosed with diabetes 5 years ago.
  – Compare over time (“percent improvement”).
  – Investigate case-mix as an explanation for the difference.
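
The stratification and case-mix points above, and the earlier suggestion to use indirect standardization, can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the counts are hypothetical (they are not the CABG figures from the slides), the two risk strata are invented, and the calculation is the textbook observed-to-expected ratio rather than the specific procedure in Hart and Hart. It shows two hospitals whose crude mortality differs only because their case-mix differs.

```python
# Hypothetical (deaths, cases) by risk stratum for two hospitals.
# These numbers are invented for illustration only.
hospitals = {
    "A": {"low_risk": (10, 1000), "high_risk": (20, 200)},
    "B": {"low_risk": (2, 200), "high_risk": (100, 1000)},
}


def crude_rate(strata):
    """Pooled mortality, ignoring case-mix."""
    deaths = sum(d for d, _ in strata.values())
    cases = sum(n for _, n in strata.values())
    return deaths / cases


def stratum_rates(strata):
    """Mortality within each stratum (the stratified comparison)."""
    return {name: d / n for name, (d, n) in strata.items()}


def reference_rates(all_hospitals):
    """Stratum-specific rates pooled over all hospitals, used as the standard."""
    totals = {}
    for strata in all_hospitals.values():
        for name, (d, n) in strata.items():
            total_d, total_n = totals.get(name, (0, 0))
            totals[name] = (total_d + d, total_n + n)
    return {name: d / n for name, (d, n) in totals.items()}


def observed_to_expected(strata, ref):
    """Indirect standardization: observed deaths / deaths expected at reference rates."""
    observed = sum(d for d, _ in strata.values())
    expected = sum(ref[name] * n for name, (_, n) in strata.items())
    return observed / expected


ref = reference_rates(hospitals)
for label, strata in hospitals.items():
    print(label,
          "crude:", round(crude_rate(strata), 3),
          "by stratum:", stratum_rates(strata),
          "O/E:", round(observed_to_expected(strata, ref), 3))
```

In this made-up example the crude rates differ sharply, while the stratum-specific rates and the observed-to-expected ratios are the same for both hospitals, which is the pattern to look for when investigating case-mix as the explanation for a difference.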