SNOMED CT
Data Quality and Data Repair
Dr Jeremy Rogers
IHTSDO Consultant Terminologist
Principal Terminology Specialist, NHS HSCIC
IHTSDO ImpSIG, Amsterdam, October 27th 2014
Outline
Data Quality : An old problem
SNOMED CT : New ways to get it wrong
SNOMED CT : New ways to prevent or fix it
Medicine needs useful formal ontologies, but formal ontologies that are simple
to use are not useful, while useful ontologies appear to be too complex to be
directly useable.
MD Thesis 2004
Old data quality problems…
Interrater Variability
ART & ARCHITECTURE THESAURUS (AAT)
Domain: art, architecture, decorative arts, material culture
Content: 125,000 terms
Structure: 7 facets, 33 polyhierarchies
Associated concepts (beauty, freedom, socialism)
Physical attributes (red, round, waterlogged)
Style/Period (French, impressionist, surrealist)
Agents: (printmaker, architect, jockey)
Activities: (analysing, running, painting)
Materials (iron, clay, emulsifier)
Objects: (gun, house, painting, statue, arm)
Synonyms
Links to ‘associated’ terms
Access: lexical string match; hierarchical view
Data Quality
Untrained, time pressured users
Headcloth
Cloth
Scarf
Model Person
Woman
Adults
Standing
Background
Brown
Blue
Chemise
Dress
Tunics
Clothes
Suitcase
Luggage
Attache case
Brass Instrument
French Horn
Horn
Tuba
[Figure: grid of X marks set against the terms listed above; the row and column layout was lost in extraction]
Data Quality
Types of coding error
Missed coding: no code
e.g. Table?
Miscoding : wrong code
e.g. French Horn, Arms
Undercoding : half the truth
e.g. Brass Instrument
Overcoding : truth + lies
e.g. Woman
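The four error types above can be told apart mechanically once a reference answer and an is-a hierarchy are available. The following is a minimal sketch under stated assumptions: the toy hierarchy, the reference answers and the function names `ancestors` and `classify` are all hypothetical, and "undercoding" and "overcoding" are read here as choosing an ancestor or a descendant of the true term.

```python
from typing import Optional, Set

# Toy is-a hierarchy (child -> parent). Assumed for illustration only;
# it is not taken from the AAT or from SNOMED CT.
PARENT = {
    "French Horn": "Brass Instrument",
    "Tuba": "Brass Instrument",
    "Woman": "Person",
}


def ancestors(term: str) -> Set[str]:
    """Return every ancestor of `term` in the toy hierarchy."""
    found: Set[str] = set()
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def classify(chosen: Optional[str], truth: str) -> str:
    """Label one coding decision against a reference answer."""
    if chosen is None:
        return "missed coding"   # no code at all
    if chosen == truth:
        return "correct"
    if chosen in ancestors(truth):
        return "undercoding"     # true, but less specific than it could be
    if truth in ancestors(chosen):
        return "overcoding"      # asserts more than is known
    return "miscoding"           # simply the wrong code


print(classify("Brass Instrument", "Tuba"))
print(classify("Woman", "Person"))
```

A real evaluation would need an agreed reference coding for each image or record, which is exactly what interrater variability makes hard to obtain.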
Outline
Data Quality : An old problem
SNOMED CT : New ways to get it wrong
SNOMED CT : New ways to fix it
SNOMED CT Miscodes
39 months in a busy UK A&E Department
• Setting: 408,831 coded ED episodes
– One ‘reason’ code per completed visit
– 39 months (Oct 2008 – Dec 2011)
– 12,022 distinct SCT codes selected at least once
• Users: No training, time pressured
• Browser: string match on all of SNOMED, no hierarchy
SNOMED CT: New ways to get it wrong
‘Ontology-driven’ miscodes
11% of all data ‘obviously’ miscoded
SNOMED CT: New ways to get it wrong
‘Obvious’ miscode examples
Count  Term                     SNOMED CT concept
1097   Temperature              246508008|Temperature (attribute)|
145    High temperature         285717004|High temperature (physical force)|
118    FB (Foreign Body)        367409002|Followed by (attribute)|
24     TB (Tuberculosis)        60117003|Transmitted by (attribute)|
33     Spot                     23840004|Leiostomus xanthurus (organism)|
15     MI (Myocardial Infarct)  45169001|Without (attribute)|
17     Drug used                246488008|Drug used (attribute)|
8      Drugs                    228011000000101|Drugs (navigational concept)|
373    ETOH - Alcohol intake    160573003|Alcohol intake (observable entity)|
207    Alcohol                  53041004|Alcohol (substance)|
117    EtOH – Ethanol           419442005|Ethyl alcohol (substance)|
50     Ethanol                  419442005|Ethyl alcohol (substance)|
33     Lymph node               59441001|Structure of lymph node (body structure)|
136    Nasogastric tube         17102003|Nasogastric tube, device (physical object)|
82     Catheter                 19923001|Catheter, device (physical object)|
78     Dressing                 37898001|Dressing, device (physical object)|
396    Psychiatric              27296002|Psychiatric (qualifier value)|
230    Stabbing                 410706007|Stabbing sensation quality (qualifier value)|
SNOMED CT: New ways to get it wrong
Subtle miscode examples
Coding foreign bodies…
Disorder or treatment ?
11% of all data ‘obviously’ miscoded?
SNOMED CT: New ways to get it wrong
No ‘standard’ miscoding error rate
• 23% of 74 abdominal aortic aneurysms miscoded as a Drug Trade Family
(9192101000001100 AAA (product)); AAA is the name of a pharmaceutical
company that makes a spray to soothe sore throats
• 25% of 939 stabbing victims miscoded as a qualifier value (‘stabbing
sensation quality’ = quality of pain experienced during heart attack)
• 33% of 3771 patients with some form of fever miscoded as ‘temperature’
either as an (attribute) or a (physical force)
• 38% of 1101 failed consultations (patient left the department, or did not
attend an appointment) miscoded as either a laterality (left) or as
deoxyribonucleic acid (DNA = Did Not Attend)
• 44% of 575 patients attending with a fish bone stuck in their throat
miscoded as a food allergen (7661006|Fish bone (substance)|)
• 49% of 5,062 alcohol-related attendances miscoded as either the
substance (alcohol, ethyl alcohol) or just the mood disorder of feeling
elated (‘intoxicated’) but not necessarily involving alcohol intake at all
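As a worked check of how such a per-topic rate is derived, the fever figure appears to be recoverable from the counts on the ‘Obvious’ miscode examples slide. This is a minimal sketch, assuming the Temperature (1097) and High temperature (145) counts together form the numerator; the function name `miscoding_rate` is hypothetical.

```python
def miscoding_rate(miscoded: int, total: int) -> float:
    """Percentage of episodes on a topic that were recorded with a wrong code."""
    return 100.0 * miscoded / total


# Assumption: both 'temperature' concepts count as fever miscodes.
fever_miscoded = 1097 + 145   # Temperature (attribute) + High temperature (physical force)
fever_total = 3771            # patients with some form of fever

print(round(miscoding_rate(fever_miscoded, fever_total)))
```

The other bullets cannot be checked the same way from the slide text alone, because their numerators include concepts that are not listed with counts.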
Outline
Data Quality : An old problem
SNOMED CT : New ways to get it wrong
SNOMED CT : New ways to prevent or fix it
Much of the complexity of formal ontologies arises from the consistent application
of semantic patterns and choices. The cognitive load of using a complex formal
ontology can be reduced if these patterns and choices are made explicit as a
metamodel of the ontology, and where the metamodel is subsequently harnessed
to guide user choices pre hoc and transform expressions post hoc to a preferred
semantic form.
MD Thesis 2004
SNOMED CT: New ways to prevent it
Pre-hoc data capture
(User training)
Clinical search-and-browse
Suppress nonsensical chapters
Display concepts in hierarchy
Data entry constraints/validation
Speciality Subsets
But beware: risk of non-interoperable sublanguages
Structured Data Entry
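One way to read "suppress nonsensical chapters" is to filter search hits by the semantic tag at the end of each fully specified name before they are shown to the user. The following is a minimal sketch, assuming a "reason for attendance" field; the set of suppressed tags is an assumption drawn from the miscode examples in this deck, the function names are hypothetical, and the Fever concept is quoted from memory and should be verified against a release.

```python
import re
from typing import List, Optional, Tuple

# Tags assumed to make no sense as a reason for attendance (an assumption).
SUPPRESSED_TAGS = {
    "attribute", "qualifier value", "physical force", "organism",
    "substance", "physical object", "navigational concept",
    "body structure", "product",
}

# The semantic tag is the final parenthesised part of the name.
_TAG = re.compile(r"\(([^()]+)\)\s*$")


def semantic_tag(fsn: str) -> Optional[str]:
    """Extract the trailing semantic tag, e.g. 'attribute'."""
    match = _TAG.search(fsn)
    return match.group(1) if match else None


def allowed_hits(hits: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop search hits whose semantic tag is suppressed in this context."""
    return [(cid, fsn) for cid, fsn in hits
            if semantic_tag(fsn) not in SUPPRESSED_TAGS]


hits = [
    ("246508008", "Temperature (attribute)"),
    ("285717004", "High temperature (physical force)"),
    ("386661006", "Fever (finding)"),  # identifier from memory; verify before use
]
print(allowed_hits(hits))
```

A filter like this is applied per data-entry context rather than globally, since a tag that is nonsensical in one field (for example a body structure as a diagnosis) may be exactly right in another.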
Non-interoperable sublanguages…
with thanks to Malcolm Duncan http://www.mrtablet.demon.co.uk/chocolate_teapot_lite.htm
SPECIALTY ONE
Crockery
---Teapot 5
------Brown teapot 2
------White teapot 1
SPECIALTY TWO
Crockery
---Teapot
------Brown teapot
------White teapot
------Blue teapot
------China teapot
SPECIALTY THREE
Crockery
---Teapot
------Brown teapot
------White teapot
---------White china teapot
------Blue teapot
---------Blue china teapot
------China teapot
---------White china teapot
---------Blue china teapot
------Pink teapot
------Aluminium teapot
------Chocolate teapot
------Wooden teapot
------Paper teapot
[Counts shown against the Specialty Two and Specialty Three nodes, in extraction order: 2 2 1 2 1 1 0 0 0 1 1 0 0 1 1 0 1 1 1; their assignment to individual nodes was lost]
SNOMED CT: New ways to prevent it
Post-hoc data repair
Query Table
‘Semantic redirection’
Manual redirection
‘Semantic redirection’ ?
IF code IN <<442083009|Anatomical or acquired body structure| THEN
    SELECT CASE epr_context
        CASE "diagnosis"
            code := disorder : findingSite = code
        CASE "procedure"
            code := procedure : procedureSite = code
    END SELECT
END IF
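The redirection rule above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the author's implementation: the subsumption test is faked with a small hard-coded set where a real system would ask a terminology server, the names `BODY_STRUCTURE_DESCENDANTS` and `redirect` are hypothetical, and the tokens `disorder`, `findingSite`, `procedure` and `procedureSite` are kept symbolic as on the slide rather than replaced by concept identifiers.

```python
# Root of the constraint on the slide: <<442083009|Anatomical or acquired body structure|
BODY_STRUCTURE_ROOT = "442083009"

# Hypothetical stand-in for a descendant-or-self lookup.
BODY_STRUCTURE_DESCENDANTS = {
    BODY_STRUCTURE_ROOT,
    "59441001",  # Structure of lymph node (body structure), from the miscode examples
}


def redirect(code: str, epr_context: str) -> str:
    """Rewrite a bare body-structure code into a context-appropriate expression."""
    if code not in BODY_STRUCTURE_DESCENDANTS:
        return code  # the rule does not apply; leave the record unchanged
    if epr_context == "diagnosis":
        return f"disorder:findingSite={code}"
    if epr_context == "procedure":
        return f"procedure:procedureSite={code}"
    return code  # unknown context: no safe redirection


print(redirect("59441001", "diagnosis"))
print(redirect("53041004", "diagnosis"))
```

Returning the original code when no rule matches keeps the repair conservative: records are only rewritten when both the concept and the record context support a single interpretation.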
Manual redirection
‘OD’ = overdose
Hypoglycemia
Dyspepsia?
Dysuria?
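Where no general rule applies, manual redirection amounts to a curated lookup from the concept that was recorded to the meaning the user most plausibly intended. The following is a minimal sketch: the three entries reuse pairs from the ‘Obvious’ miscode examples, the targets are plain labels rather than SNOMED CT identifiers because the deck does not give target codes, and the names `MANUAL_REDIRECTS` and `intended_meaning` are hypothetical.

```python
from typing import Dict, Optional

# Recorded concept id -> intended meaning (label only; target codes are not
# given in the source and are deliberately not invented here).
MANUAL_REDIRECTS: Dict[str, str] = {
    "367409002": "Foreign body",        # recorded as Followed by (attribute) via 'FB'
    "60117003": "Tuberculosis",         # recorded as Transmitted by (attribute) via 'TB'
    "45169001": "Myocardial infarct",   # recorded as Without (attribute) via 'MI'
}


def intended_meaning(recorded_code: str) -> Optional[str]:
    """Return the curated intended meaning, or None if a human must review it."""
    return MANUAL_REDIRECTS.get(recorded_code)


for code in ["367409002", "410706007"]:
    print(code, intended_meaning(code))
```

Codes that return `None` stay in a review queue, which suits ambiguous cases where more than one intended meaning is plausible.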
THANK YOU