Statistical Thinking and Analysis

© 2011 Jones and Bartlett Publishers, LLC
Donald E. Lighter, MD, MBA, FAAP, FACHE
Professor, University of Tennessee
Deming – Theory of Profound Knowledge
• Systems thinking
– System is more than a sum of its parts
– Understanding that the parts interact to produce the end product
– Coordination and collaboration of parts increase productivity of system
– Interconnected subsystems and processes affect each other
• Process variation
– All processes have variation that can either be inherent in the process or due to external influences
– Variation is a major source of nonconforming output leading to reduced quality and higher cost
– Identifying and reducing sources of variation is a major undertaking for performance improvement initiatives
• Theory of knowledge
– Information must be tempered by experience and theory to become knowledge
– Effective managers combine experience and theory to create organizational knowledge
• Psychology
– Understanding the variation in people is as important as understanding the variation in processes
– Successful managers use human psychology to effectively coordinate, collaborate, and motivate workers to optimize system outcomes
Statistical thinking
• Ron Snee (1986):
“… statistical thinking is used to describe the thought processes that acknowledge the ubiquitous nature of variation and that its identification, characterization, quantification, control, and reduction provide a unique opportunity for improvement.”
“… Every enterprise is made up of a collection of interconnected processes whose input, control variable, and output are subject to variation. This leads to the conclusion that statistical thinking must be used routinely at all levels of the organization.”
Snee, R. D. (1986). "In Pursuit of Total Quality." Quality Progress 20(8): 25-31.
Shewhart’s concept of process variation
• Common cause
– Variation is based on the process materials and procedures;
– Variation is predictable using mathematics related to probability and chance;
– Variation is irregular, i.e. shows no particular pattern; and
– High and low values within the measurements are statistically indistinguishable
• Special (attributable) cause
– Variation due to new, unanticipated, emergent, or previously unknown factors within the system;
– Variation that is entirely unpredictable, even using statistical probability techniques;
– Variation that is outside historical trends; and
– Variation that indicates an underlying change in the system or some previously unidentified factor.
Translated into statistical thinking:
• Common Cause variation
– Data points from process fall inside control limits
– Data points are statistically indistinguishable
• Special Cause variation
– Data points from process fall outside control limits (3 standard deviations from the center line)
– A point that far out has < 0.3% (3 chances per 1000) probability of occurring from common cause variation alone
• Tampering
– Deming’s concept of treating common cause variation like a special cause
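The 3 standard deviation rule above can be checked numerically. The sketch below assumes the measurements are roughly normally distributed and that the center line and standard deviation are already known. The function names and the sample values are hypothetical and are not taken from the text.

```python
import math

def tail_probability(k: float = 3.0) -> float:
    """Two-sided probability that a normal value lies more than k SDs from the mean."""
    return math.erfc(k / math.sqrt(2))

def classify(points, center, sigma, k=3.0):
    """Label each point 'common' (inside the limits) or 'special' (outside)."""
    lower, upper = center - k * sigma, center + k * sigma
    return ["common" if lower <= x <= upper else "special" for x in points]

# Hypothetical process: center line 10.0, standard deviation 1.0.
labels = classify([9.8, 10.4, 10.1, 14.2, 9.9], center=10.0, sigma=1.0)
print(round(tail_probability(3.0), 4))  # about 0.0027, i.e. under 0.3%
print(labels)
```

Reacting to any of the points labeled 'common' as if it were 'special' would be tampering in Deming's sense.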
Why is this stuff important?
Let’s Review Some Data Collection Rules…
Data collection principles
• Understand need for information collected
• Collect everything you think might be
needed
• Least invasive methods of collection
• Operational definition of each data
element
• Appropriate format for analysis
• Before starting, review the study to ensure
correctness
Common data sources in health care
• Claims
– Pro: data used to pay claims; analyzed for errors by edits in payer computer system
– Con: data entry errors (insurer, provider); paucity of information (limited clinical info); inconsistent payments for same services; upcoding; capitation effects
• Medicaid
– Pro: consistent coding systems within state; population fairly uniform
– Con: same as for claims; varying types of plans around the US; tendency to upcode more pronounced
• Medicare
– Pro: relatively consistent data set; edits tend to reduce coding errors
– Con: upcoding still a problem; payment schedules vary by region, more than by specialty
• Provider Billing Systems
– Pro: source data from point of care; usually consistent within a practice
– Con: broad variation in coding between practices; coding variation also for same services; variety of formats
• Patient Charts
– Pro: original source data from point of care; complete record of clinical encounter
– Con: expensive to review; variation in recording; handwriting; variety of recording conventions
• Surveys
– Pro: measures customer opinions directly; often can be done simply
– Con: lack of scientific approach, leading to bias; selection bias; validation
Statistical process control (SPC) – a
method to understand variation
• Shows trends in the process mean
over time
• Evaluates process variability at each
point in time
• Provides graphic evidence that
process is in control (or not) at each
point
Two primary types of data
• Attributes
– Counts of individual items
– Examples?
• Continuous (variables)
– Variables along a measurement scale
– Real numbers, no “gaps” between
measures
Types of Control Charts
• Attribute data charts
– p and np charts
– c and u charts
• Continuous data charts
– IX-MR charts
– X-bar and R charts
– X-bar and s charts
Commonly used control charts
Control Charts for Attributes Data
Attribute chart selection
• p-charts
– Proportion of nonconforming units in each subgroup
– Example: C-section rates
• np-charts
– Number of nonconforming units in each subgroup (constant subgroup size)
– Example: maternal deaths
• c-chart
– Nonconformities per inspection unit, constant number of
inspection units
– Examples: housekeeping errors per room; missed
appointments per day
• u-chart
– Nonconformities per inspection unit, like c, BUT…
– Used when the number of inspection units varies
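The selection rules above can be written as a small decision function. This is a simplified sketch of a common decision tree. The function name and the two argument values ("units" and "nonconformities") are hypothetical labels introduced for illustration.

```python
def choose_attribute_chart(counting: str, constant_size: bool) -> str:
    """Pick an attribute control chart.

    counting: "units" when each item is either conforming or not,
              "nonconformities" when one inspection unit can hold several.
    constant_size: True when every subgroup has the same size or the same
                   number of inspection units.
    """
    if counting == "units":
        return "np-chart" if constant_size else "p-chart"
    if counting == "nonconformities":
        return "c-chart" if constant_size else "u-chart"
    raise ValueError("counting must be 'units' or 'nonconformities'")

# C-section rates: deliveries per month vary, each delivery is yes or no.
print(choose_attribute_chart("units", constant_size=False))
# Housekeeping errors per room, same number of rooms inspected each day.
print(choose_attribute_chart("nonconformities", constant_size=True))
```

A p-chart also works when subgroup sizes are constant; the sketch returns the np-chart in that case only because it is the simpler of the two to plot.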
Attributes data limb of decision tree…
Example c-chart
[Figure: c-chart for XYZ Clinic. Y-axis: Number of Nonconformities; x-axis: Day of Study; plotted series: Nonconformities, Mean, UCL, LCL.]
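A c-chart's limits can be sketched in a few lines. The sketch assumes the usual Poisson-based limits (mean count plus or minus 3 times its square root, with the lower limit floored at zero). The daily counts are made up for illustration and are not the XYZ Clinic data.

```python
import math

def c_chart_limits(counts):
    """Return (LCL, center line, UCL) for a c-chart with constant inspection units."""
    c_bar = sum(counts) / len(counts)
    spread = 3 * math.sqrt(c_bar)
    return max(0.0, c_bar - spread), c_bar, c_bar + spread

# Hypothetical nonconformities counted on each of 10 study days.
daily_counts = [4, 6, 3, 5, 7, 4, 6, 5, 4, 6]
lcl, center, ucl = c_chart_limits(daily_counts)
print(lcl, center, round(ucl, 3))
```

Because counts cannot be negative, a computed lower limit below zero is reported as zero.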
u-charts
[Figure: u-chart for St. Elsewhere Food Service. Y-axis: u-value; x-axis: Day of Data Collection; plotted series: u, Mean u, UCL, LCL.]
Note “wavy” control limit line – why?
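One way to explore the question about the control limit lines is to compute u-chart limits by hand. The sketch assumes the standard u-chart formula, in which each day's limits depend on that day's number of inspection units. The error and meal counts are hypothetical.

```python
import math

def u_chart(nonconformities, units):
    """Return the mean rate and, per subgroup, (u, LCL, UCL)."""
    u_bar = sum(nonconformities) / sum(units)
    rows = []
    for c, n in zip(nonconformities, units):
        half_width = 3 * math.sqrt(u_bar / n)  # narrower when n is larger
        rows.append((c / n, max(0.0, u_bar - half_width), u_bar + half_width))
    return u_bar, rows

# Hypothetical food service data: errors found and meals inspected per day.
errors = [4, 6, 2, 8]
meals = [200, 400, 100, 300]
u_bar, rows = u_chart(errors, meals)
for u, lcl, ucl in rows:
    print(round(u, 4), round(lcl, 4), round(ucl, 4))
```

The limits widen on days with few inspection units and tighten on days with many, so they change from point to point instead of forming straight lines.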
Commonly used control charts
Control Charts for Continuous Variables
Continuous (variables) data limb of decision tree…
IX-MR chart creation
• Example: ALOS (average length of stay) for a hospital
– Data obtained from a hospital over 24 months
– Calculate mean of all samples, plot as center line
– Calculate each moving range (MR) and the average moving range (MR-bar)
– Control limits: IX chart = mean ± 2.66 * MR-bar; MR chart UCL = D4 * MR-bar (D4 is the “correction factor”, see Table 5.7, p 191 in the text)
– Plot on graph
Remember: software does this work for you…
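The IX-MR steps above can be sketched as follows. The constants 2.66 and 3.267 are the standard factors for moving ranges of two successive points; they are assumed here, not copied from Table 5.7. The monthly values are hypothetical.

```python
def ix_mr_limits(values):
    """Center lines and control limits for an individuals (IX) and moving range (MR) chart."""
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    x_bar = sum(values) / len(values)
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return {
        "ix_center": x_bar,
        "ix_lcl": x_bar - 2.66 * mr_bar,
        "ix_ucl": x_bar + 2.66 * mr_bar,
        "mr_center": mr_bar,
        "mr_ucl": 3.267 * mr_bar,  # D4 for a moving range of 2 points
    }

# Hypothetical monthly average length of stay, in days.
alos = [5.0, 7.0, 6.0, 9.0, 8.0]
limits = ix_mr_limits(alos)
print(limits)
```

The same arithmetic is consistent with the IX-MR chart in this deck: a mean of 7.3636 and an average moving range of 6.5238 give IX limits of about 24.717 and -9.990 and an MR upper limit of about 21.313.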
IX-MR chart
[Figure: IX Chart. X-axis: Date/Time/Period; center line (Average) 7.3636, upper limit 24.7170, lower limit -9.9897. MR Chart. Y-axis: Range; x-axis: Date/Time/Period/Number; center line 6.5238, upper limit 21.3133.]
Note out of control MR chart point!
IX chart has sample size of 1
Moving range is the difference between successive points and is surrogate for standard deviation (with correction factor)
IX-MR chart
• What’s important?
– MR Chart – what does it mean if the MR is
out of control?
– IX Chart – what does it mean if an IX value
is out of control?
– What other analyses could we do?
Common control charts depend
on…
• “Reasonable” conformity of the data set to a Gaussian
(normal, bell shaped) distribution
• Most analysis programs will provide a histogram of
the data to determine if data are normally distributed
[Figure: IX-MR Histogram. Y-axis: Number; x-axis bins from about -9.99 to 24.71, with -9.99 and 24.71 marked.]
Note bimodal distribution of data, indicating reason for MR chart in previous slide to be out of control; thus, IX-MR may not be appropriate for this data set
What if the MR is out of control?
• Determine special cause using root cause analysis and eliminate
• Re-run the analysis with special cause eliminated
• Track data through more cycles to ensure that attributable cause was correctly identified
• Other options:
– Data transformation, e.g. natural log of each point
– Usually better to identify special cause
Other types of continuous
variable charts
X-bar and Range Chart
• Similar to IX-MR chart, except:
– Subgroup size = 2 – 9
– Measure of variation is range
• Procedure:
– Mean of each subgroup plotted
– Mean of those means is centerline
– Range of each subgroup plotted
– Mean of those ranges is centerline
– D3 and D4 are used to convert the mean range to R chart control limits
– A2 is used with the mean range to create X-bar control limits
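The X-bar and R procedure above can be sketched as follows. The A2, D3, and D4 values are the commonly published constants for subgroup sizes 2 to 5 only; the table in the text should be used for other sizes. The subgroup data are hypothetical.

```python
# Commonly published control chart constants, keyed by subgroup size (assumed, sizes 2-5 only).
A2 = {2: 1.880, 3: 1.023, 4: 0.729, 5: 0.577}
D3 = {2: 0.0, 3: 0.0, 4: 0.0, 5: 0.0}
D4 = {2: 3.267, 3: 2.574, 4: 2.282, 5: 2.114}

def xbar_r_limits(subgroups):
    """Center lines and limits for X-bar and R charts with equal-size subgroups."""
    n = len(subgroups[0])
    if any(len(g) != n for g in subgroups) or n not in A2:
        raise ValueError("subgroups must share one size between 2 and 5")
    means = [sum(g) / n for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)
    r_bar = sum(ranges) / len(ranges)
    return {
        "xbar_center": grand_mean,
        "xbar_lcl": grand_mean - A2[n] * r_bar,
        "xbar_ucl": grand_mean + A2[n] * r_bar,
        "r_center": r_bar,
        "r_lcl": D3[n] * r_bar,
        "r_ucl": D4[n] * r_bar,
    }

# Hypothetical notify-to-draw times (minutes), three draws sampled per day.
limits = xbar_r_limits([[10, 12, 11], [12, 14, 13], [11, 13, 12]])
print(limits)
```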
X-bar-R chart
[Figure: X-bar chart, Phlebotomist Time - Notify to Draw. Y-axis: Average Time (X-bar); x-axis: Day of Study. R Chart - Phlebotomist Time. Y-axis: Subgroup Range; x-axis: Day of Study. Value labels 4.5943 and 0.0000 appear on the charts.]
Note the R chart is in control
The histogram is “reasonably” normally distributed…
[Figure: Histogram. Y-axis: Number; x-axis bins from about -22.82 to 67.78, with -22.82 and 59.54 marked.]
The last commonly used continuous variable chart…
X-bar and s chart
• Similar to others, except
– Subgroup size >9
– Measure of variation = sample standard
deviation
• Procedure
– Mean of each subgroup plotted
– Mean of those means is centerline
– s of each subgroup calculated and plotted
– Mean of those s-values is centerline
– B4 and B3 are used to adjust s to control limits
– A3 is used to create X-bar control limits
Airflow Example
• Airflow measurements on a clinical unit
• Ten measurements a day, spaced
throughout the day
• Subgroup size = 10
• Subgroup time period = 1 day
• Measurements then plotted on x-bar s
chart
The Airflow Example
[Figure: X-bar and s chart for airflow, Days 1 to 25. Xbar chart: CL 33.66, UCL 37.43, LCL 29.89. SD chart: CL 3.86207, UCL 6.62731, LCL 1.09683.]
Note similarity to X-bar-R chart
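The airflow limits can be reproduced from the two center lines. The sketch assumes the commonly published constants for a subgroup size of 10 (A3 = 0.975, B3 = 0.284, B4 = 1.716). The helper names are hypothetical, and the raw airflow measurements are not available here, so only the summary values from the chart are used.

```python
from statistics import mean, stdev

# Commonly published constants for subgroup size n = 10 (assumed).
A3, B3, B4 = 0.975, 0.284, 1.716

def summarize(subgroups):
    """Return (grand mean, mean sample SD) for equal-size subgroups."""
    return mean(mean(g) for g in subgroups), mean(stdev(g) for g in subgroups)

def xbar_s_limits(grand_mean, s_bar):
    """Return (LCL, CL, UCL) for the X-bar chart and for the s chart."""
    return {
        "xbar": (grand_mean - A3 * s_bar, grand_mean, grand_mean + A3 * s_bar),
        "s": (B3 * s_bar, s_bar, B4 * s_bar),
    }

# Center lines read from the airflow chart: CL 33.66 and mean SD 3.86207.
limits = xbar_s_limits(33.66, 3.86207)
print([round(v, 2) for v in limits["xbar"]])
print([round(v, 5) for v in limits["s"]])
```

With raw data, `summarize` would supply the two center lines from 25 daily subgroups of 10 measurements each.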
Data conversion…
• Used when raw data are not normally
distributed
• Used when raw data sample sizes
are not uniform
• Types of conversions
– Lognormal
– Arcsin
– z-score
• How do we calculate z-scores?
Example Z-score plot
[Figure: Z-Score Chart. Y-axis: z-score from -4.000 to 4.000; center line (Mean) at 0.000, UCL at 3.000, LCL at -3.000.]
Z scores are x-values minus the mean, divided by the standard deviation
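A z-score calculation can be sketched as follows. The sketch assumes the mean and the sample standard deviation are taken from the same data set being converted; the values are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Convert raw values to z-scores: (x - mean) / standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]

# Hypothetical raw measurements.
z = z_scores([2.0, 4.0, 6.0])
print(z)  # [-1.0, 0.0, 1.0]
```

On a z-score chart the center line is 0 and the control limits sit at +3 and -3, whatever the original units were.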
In summary…
• Data types - attribute vs. continuous variables
determine type of control chart
• Control charts have center line (average of control chart means) and upper and lower control limits (±3s)
• For attribute charts, data points are nonconformity
values or rates
• For continuous variable charts, data points are
sample values or averages of sample values
• Measures of variation for control charts are
corrected using bias correction tables
Other useful analyses
ANOM
ANOVA
Regression
Rankings are used in health care
• Concept of rankings
– How are they used?
– Are they valid?
– What about control limits?
• Measures falling within control limits are common cause
- statistically indistinguishable
• Can’t be ranked!
– Time factor - most rankings are for specific
period of time
• Physician or provider profiles –
experiences?
Ranking – some approaches to
validation
• 95% Confidence Intervals
– Not time series based, usually single
point in time
– Help establish the level of variation in
the measurement used for the ranking
(higher variation, less predictive ability
from ranks)
– Still difficult to identify outliers
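A 95% confidence interval for a ranked rate can be sketched as follows. The sketch assumes the measure is a simple proportion and uses the normal approximation interval, which is only one of several possible methods. The provider counts are hypothetical.

```python
import math

def proportion_ci(events, cases, z=1.96):
    """Approximate 95% confidence interval for a proportion (normal approximation)."""
    p = events / cases
    half_width = z * math.sqrt(p * (1 - p) / cases)
    return max(0.0, p - half_width), min(1.0, p + half_width)

# Two hypothetical providers with the same 25% rate but different volumes.
small = proportion_ci(5, 20)
large = proportion_ci(250, 1000)
print([round(v, 4) for v in small])
print([round(v, 4) for v in large])
```

The low-volume provider's interval is far wider, so a rank based on that rate carries much less information than the same rank for the high-volume provider.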
Percentiles
• Often used for comparisons
– Examples
• Percent mortality post-op
• Nosocomial infection rates
• Error rate for claims entry
• Others?
• Problems with percentages
– Denominator size may vary, making
comparisons potentially invalid
– Case mix adjustment not often done to
adjust for sampling bias
Now for something a little
different… Analysis of Means!
• Not time series data
• Used for attribute (count) data with unequal
subgroup sizes
– Rate of particular measure of count data
– Examples?
• C-section rates
• Antibiotic utilization rates
• Infection rates post-op
• Others?
• Does provide adjustment for issues like case
mix, if done correctly
ANOM example: C-section rates
[Figure: ANOM Chart - Comparison of Proportion Data. Y-axis: Proportion (0.000 to 0.600); x-axis: Subject Number (1 to 30); plotted series: Proportions, Lower Common Cause Limits, Upper Common Cause Limits.]
C-section rates among providers doing deliveries
UCCL = upper control limits for each provider
LCCL = lower control limits for each provider
Control limits adjusted for opportunities, i.e. cases, that provider treats
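The idea of limits adjusted for each provider's case count can be sketched as follows. This is a 3-sigma binomial approximation, not an exact ANOM calculation: a true ANOM uses a tabulated critical value in place of the multiplier `h`, which is set to 3.0 here as a stated assumption. The provider counts are hypothetical.

```python
import math

def common_cause_limits(events, cases, h=3.0):
    """Per-provider limits around the overall proportion.

    h is a stand-in multiplier (assumed 3.0); exact ANOM uses a tabulated value.
    Returns the overall proportion and, per provider, (rate, lower, upper, outside).
    """
    p_bar = sum(events) / sum(cases)
    results = []
    for e, n in zip(events, cases):
        half_width = h * math.sqrt(p_bar * (1 - p_bar) / n)
        lower, upper = max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)
        rate = e / n
        results.append((rate, lower, upper, not lower <= rate <= upper))
    return p_bar, results

# Hypothetical C-sections and deliveries for four providers.
c_sections = [30, 12, 45, 60]
deliveries = [100, 50, 150, 100]
p_bar, results = common_cause_limits(c_sections, deliveries)
for rate, lower, upper, outside in results:
    print(round(rate, 3), round(lower, 3), round(upper, 3), outside)
```

Providers with fewer deliveries get wider limits, so only rates outside their own limits are treated as different from the group.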
Analysis of variance (ANOVA)
• Test hypotheses about differences
between two or more means
• Used in DOE to determine if changes in
mean in one intervention subgroup
statistically differ from other intervention
subgroups
• See Example 5.6 (p 220)
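The core of a one-way ANOVA is the F ratio of between-group to within-group variation. The sketch below is not Example 5.6 from the text, and the group values are hypothetical. Turning F into a p-value needs the F distribution from a statistics package, which is left out here.

```python
def one_way_anova_f(groups):
    """Return (F statistic, between-group df, within-group df)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual values around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

# Hypothetical outcomes for three intervention subgroups.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_stat, df_b, df_w)  # 3.0 2 6
```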
Regression analysis
• Test hypotheses of relationships between
a response variable (Y) and one or more
predictor variables (X)
• Strength of the relationship is given by the r value; statistical significance is given by the p-value
• Sign of coefficient (b) for predictor variable determines if effect is positive or negative
• R² value provides predictive level of model (i.e. how much of the variation in Y is explained by the selected predictor variables)
Types of regression
• Simple linear regression – relates one x-variable to one dependent y-variable
[Figure: Linear Model scatter plot with fitted line y = 4.007x + 8.663, R² = 0.9793.]
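A least squares fit for simple linear regression can be sketched as follows. The data are hypothetical and lie exactly on a line so that the result is easy to check by hand. They are not the data behind the Linear Model figure.

```python
def simple_linear_regression(xs, ys):
    """Return (slope b, intercept, r, R squared) from ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r, r * r

# Hypothetical data following y = 2x + 1 exactly.
slope, intercept, r, r_squared = simple_linear_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(slope, intercept, r, r_squared)
```

The positive slope shows that y rises with x. With real, noisy data R² falls below 1 and reports the share of the variation in y that the line explains.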
Types of regression
• Multiple regression
– One dependent variable with multiple
predictor variables
– Graphic output is a multidimensional
surface, so usually not provided
– Output includes:
• Coefficients (b) and levels of significance (p-value) for each x-variable
• r value
• R² value
Design of Experiments
A scientific approach to improvement
DOE – when evidence is
needed
• Method for validating processes and
determining which factors are most
important
• Just like in science class – multiple runs,
varying “factors” (predictor variables) at
different “levels”
• Statistically valid approach to identify
“main effects” (primary effect of each
factor) and “interaction effects” (effects
caused by combinations of factors)
• Optimization of experiment is desirable to
ensure identification of salient factors
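Main effects and interaction effects can be illustrated with the smallest designed experiment: two factors, each run at a low (-1) and high (+1) level. The factor names A and B and the responses are hypothetical.

```python
def factorial_effects(runs):
    """Estimate effects in a two-factor, two-level experiment.

    runs: list of (a_level, b_level, response), with levels coded -1 or +1.
    Each effect is the mean response at the +1 setting minus the mean at -1.
    """
    def effect(sign):
        high = [y for a, b, y in runs if sign(a, b) > 0]
        low = [y for a, b, y in runs if sign(a, b) < 0]
        return sum(high) / len(high) - sum(low) / len(low)

    return {
        "A": effect(lambda a, b: a),        # main effect of factor A
        "B": effect(lambda a, b: b),        # main effect of factor B
        "AB": effect(lambda a, b: a * b),   # interaction effect
    }

# Hypothetical responses for the four combinations of A and B.
runs = [(-1, -1, 10.0), (+1, -1, 20.0), (-1, +1, 14.0), (+1, +1, 32.0)]
effects = factorial_effects(runs)
print(effects)  # {'A': 14.0, 'B': 8.0, 'AB': 4.0}
```

Here factor A has the largest main effect, and the nonzero AB value shows that the effect of A depends on the level of B.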