Data Quality Missing Data Beverly Musick

Data Quality
Missing Data
Beverly Musick
This module was recorded at the health informatics-training course – data management series offered by the Regional
East African Centre for Health Informatics (REACH-Informatics) in Eldoret, Kenya. Funding was made possible by NIH’s
Fogarty Center and a grant from the Rockefeller Center. The training was held at the Academic Model Providing Access
to Healthcare (AMPATH), a USAID-funded program, supported by the Regenstrief Institute at Indiana University. The
modules were created in collaboration with the School of Informatics at IUPUI.
Content licensed under Creative Commons Attribution-Share Alike 3.0 Unported
Plan for Missing Data
• Before data collection begins, determine how
missing data will be recorded and entered
• Possibilities for Coding Missing Data
– Not Applicable (N/A) (e.g. adherence to medications for a patient not
taking any meds)
– Not Available (e.g. variable added to questionnaire at a later date,
weight temporarily missing due to broken scale)
– Unknown (e.g. HIV status of partner)
– Refusal to answer (e.g. questions associated with stigma)
– True missing (e.g. question skipped)
• Understand how missing data will be managed in the
analysis to help determine how much information is
to be gathered about missing data
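One way to plan the coding ahead of time is to store the reason for missingness next to the value itself. The following is a minimal sketch, assuming a weight variable; the names MissingReason, Weight, and record_weight and the code strings are hypothetical and not part of the course material.

```python
from enum import Enum
from typing import NamedTuple, Optional


class MissingReason(Enum):
    """Hypothetical reason codes mirroring the categories listed above."""
    NOT_APPLICABLE = "NA"      # question does not apply to this patient
    NOT_AVAILABLE = "NAV"      # e.g. broken scale, variable added later
    UNKNOWN = "UNK"            # respondent does not know the answer
    REFUSED = "REF"            # respondent declined to answer
    TRUE_MISSING = "MISS"      # question skipped, no reason recorded


class Weight(NamedTuple):
    value_kg: Optional[float]
    missing_reason: Optional[MissingReason]


def record_weight(value_kg: Optional[float] = None,
                  missing_reason: Optional[MissingReason] = None) -> Weight:
    """Store either a measured value or the reason it is missing, never both."""
    if value_kg is not None and missing_reason is not None:
        raise ValueError("a measured value cannot also carry a missing reason")
    if value_kg is None and missing_reason is None:
        # No value and no explanation: fall back to "true missing".
        missing_reason = MissingReason.TRUE_MISSING
    return Weight(value_kg, missing_reason)


if __name__ == "__main__":
    print(record_weight(62.5))
    print(record_weight(missing_reason=MissingReason.NOT_AVAILABLE))
    print(record_weight())
```

Keeping the reason in its own field avoids mixing sentinel numbers such as -9 into a numeric column, at the cost of one extra column per variable.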
Types of Missingness
• Missing Completely at Random – probability of
missing data on variable Y is unrelated to the true
value of Y or other variables in the dataset
– Ex. Water damage to paper forms prior to entry
• Missing at Random – probability of missing data on Y
is unrelated to Y only after adjusting for one or more
other variables
– Ex. For really sick patients, clinicians may not draw blood
for routine labs
• Not Missing at Random – probability of missing data
on Y is dependent on value of Y
– Ex. Higher-income patients may be less likely to report
income
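The three mechanisms can be illustrated with a small simulation. This is a sketch under invented assumptions: the variable y, the very_sick flag, and every probability and distribution below are made up for illustration and do not come from real data.

```python
import random
from statistics import mean


def simulate(n=20_000, seed=0):
    """Generate rows with a value y and three independent missingness flags."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        very_sick = rng.random() < 0.3
        # Invented distribution: sicker patients have lower y on average.
        y = rng.gauss(300.0 if very_sick else 500.0, 100.0)
        rows.append({
            "very_sick": very_sick,
            "y": y,
            # MCAR: same probability of missingness for every row.
            "mcar": rng.random() < 0.2,
            # MAR: probability depends only on the observed variable very_sick.
            "mar": rng.random() < (0.5 if very_sick else 0.1),
            # Not missing at random: probability depends on y itself.
            "mnar": rng.random() < (0.5 if y > 500.0 else 0.1),
        })
    return rows


def observed_mean(rows, flag):
    """Mean of y over the rows where the given missingness flag is False."""
    return mean(r["y"] for r in rows if not r[flag])


rows = simulate()
true_mean = mean(r["y"] for r in rows)

if __name__ == "__main__":
    for flag in ("mcar", "mar", "mnar"):
        print(flag, "observed minus true mean:",
              round(observed_mean(rows, flag) - true_mean, 1))
```

In this setup the MCAR difference should be close to zero, while the other two are shifted. For the MAR case the shift disappears once the comparison is made within levels of very_sick, which is what "after adjusting for other variables" means in practice.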
Documenting Missingness
• Embed missing codes and/or variables in dataset
and/or on data collection forms
Pros:
– Permanently associated with variable
– Immediately available for analysis
– Reduces need to re-look for data
Cons:
– Takes up a lot of digital and physical space
– Increases time needed to complete forms
• Provide explanation in separate meta document
Pros:
– Global explanation of missing data
– Minimal digital and physical space
Cons:
– Eliminates ability to code subject-level data
– Tends to get lost/separated from data
Benefits of Documenting Missingness
• Informs Quality Control reporting
– Query data collectors on missingness related to “result not
available” but not on “test not ordered”
• Allows for full disclosure in publication or
presentation of data
• Some statistical analysis methods are
dependent on Missing Completely at Random
or Missing at Random
• Useful for methodological research related to
missing data
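The quality-control point above can be sketched as a filter that raises queries only for missing results that a data collector could plausibly resolve. The record layout, the field names, and the choice to also query undocumented reasons are assumptions made for this illustration.

```python
from typing import Dict, List, Optional, Tuple

# Reason labels taken from the example above; the set names are hypothetical.
FOLLOW_UP = {"result not available"}
NO_FOLLOW_UP = {"test not ordered"}


def select_queries(records: List[Dict[str, Optional[str]]]) -> List[Tuple[str, str]]:
    """Return (patient_id, reason) pairs that should be sent back as queries."""
    queries = []
    for rec in records:
        if rec.get("value") is not None:
            continue  # value present, nothing to query
        reason = rec.get("missing_reason")
        if reason in NO_FOLLOW_UP:
            continue  # documented and expected, no query needed
        if reason in FOLLOW_UP or reason is None:
            # Assumption: an undocumented reason is queried as well.
            queries.append((rec["patient_id"], reason or "undocumented"))
    return queries


if __name__ == "__main__":
    sample = [
        {"patient_id": "P1", "value": "350", "missing_reason": None},
        {"patient_id": "P2", "value": None, "missing_reason": "result not available"},
        {"patient_id": "P3", "value": None, "missing_reason": "test not ordered"},
        {"patient_id": "P4", "value": None, "missing_reason": None},
    ]
    print(select_queries(sample))
```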
Procedures for Minimizing Missing Data
• In the clinic: review the data collection forms,
preferably while the patient is still there.
– This review should be part of clinical staff training and oversight
• Point of data entry: prevent entry of a form that is
missing key variables such as patient ID and visit
date. Alert data entry clerk about missing fields.
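A point-of-entry check along these lines might look like the sketch below. The field names, the EntryCheck type, and the rule that a form is rejected only when a key field is blank are assumptions for illustration, not a description of any particular data entry system.

```python
from typing import Any, Dict, List, NamedTuple, Sequence

# Key variables named above; the identifiers themselves are hypothetical.
KEY_FIELDS = ("patient_id", "visit_date")


class EntryCheck(NamedTuple):
    accepted: bool                    # False blocks entry of the form
    missing_key_fields: List[str]     # reasons for rejection
    missing_other_fields: List[str]   # alerts shown to the data entry clerk


def _is_blank(value: Any) -> bool:
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def check_form(form: Dict[str, Any], expected_fields: Sequence[str]) -> EntryCheck:
    """Reject forms missing key fields; list other blanks as warnings."""
    missing_key = [f for f in KEY_FIELDS if _is_blank(form.get(f))]
    missing_other = [f for f in expected_fields
                     if f not in KEY_FIELDS and _is_blank(form.get(f))]
    return EntryCheck(not missing_key, missing_key, missing_other)


if __name__ == "__main__":
    form = {"patient_id": "A1", "visit_date": " ", "weight_kg": None}
    print(check_form(form, ["weight_kg", "cd4_count"]))
```

Returning the warnings instead of raising an error lets the entry screen block the form on key fields while still prompting the clerk about the remaining blanks.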