MS in Health Informatics

Download Report

Transcript MS in Health Informatics

Moving Towards A Data
Repository That Facilitates Data
Analysis
CHOP
November 18, 2009
Relational Database Design
Normalization
• Normalization - process of efficiently organizing data in a
database to reduce redundancies of data
• Goal - consistency of data
– Store data once and one time only!
– security
– disk space
– speed of queries
– efficiency of database updates
– data integrity
 In normalized database no aggregation and no
calculated fields
3
Data Anomolies
4
Unnormalized data set
Patient
ID
Name
Address
DOB
Doc
Appt
Date
Location
DX
111111
Cindy
Marselis
2320 Edge
Hill Road
1/11/64
Armstrong
9/1/09
11:00 AM
Alter 2011
Herniated
Disc
Flu
111111
Cindy
Marselis
9331
Rising Sun
Avenue
1/11/64
Morningstar
9/1/09
11:00 AM
Alter 2011
Herniated
Disc
111111
Cindy
Marselis
2320 Edge
Hill Road
1/11/64
Allen
11/1/09
10:00 AM
Alter 2012
Psoriasis
222222
Kathryn
Marselis
2320 Edge
Hill Road
11/3/04
Dershaw
8/1/09
11:00 AM
Speakman
105
Well baby
check
111111
Cindy
Schwartz
9331
Rising Sun
Avenue
1/11/64
Armstrong
8/11/09
3:00 PM
Alter 105
Psoriasis
Herniated
Disc
5
Normalized db - before
6
Normalized db - after
7
Example of Appointment Entity
Relationship Diagram
8
Structured, free text, unstructured
text
Free text
• Issues with string
searches
– Must match exactly
in case, punctuation,
spelling, etc.
• Use of lookup tables
where possible
10
Unstructured Text
• Gartner: white-collar workers spend from 30 to 40% of time
managing documents
• Merrill Lynch: > 85 % of business information exists as
unstructured data
– e-mails, memos, notes from call centers and support
operations, news, user groups, chats, reports, letters, surveys,
white papers, marketing material, research, presentations and
Web pages.
• In relational db, data that can't be stored in rows and columns.
– stored in a BLOB (binary large object)
– e-mail files, word-processing text documents, PowerPoint
presentations, JPEG and GIF image files, and MPEG video file
• Metadata (data about data can be stored)
http://www.informationmanagement.com/issues/20030201/6287-1.html
11
Approaches to structured and
unstructured data
1. Unique database: consolidates all structured and unstructured
data together
– expensive to buy and maintain
– large volume of data can clog the database making it slow
and inefficient
2. Use two databases: one structured data, and one for
unstructured data.
– Avoids performance issues with structured data
– significant performance limitations for unstructured data
12
Approaches to structured and
unstructured data
3. Unstructured data left on file servers with database to record and
links to unstructured data files.
– Avoids issue with volumes of data
– Fragile as links are broken when files and folders moved
around.
– Must create links every time new document created
4. Complex and expensive connectors used to tap in all databases
and file servers providing unified view of data.
– Expensive and complex requiring purchase and maintenance
of multiple databases and file servers with the added cost of
all required connectors.
5. Patents currently under development.
13
Certification Commission for Health
Information Technology (CCHIT)
EHR Construct
EMAR: Electronic Medication
Administration Record
CPOE: Computerized Physician
Order Entry
PFS: Physician Fee Schedule
OC/RR: Physician Order
Communication/Results Retrieval
CPOE: Computerized Physician
Order Entry
PFS: Physician Fee Schedule
R-ADT: Registration Admission
Discharge Transfer
14
Data Warehousing
Pressures Driving Need for Business
Intelligence and Data Warehousing
• External and internal
forces require tactical
and strategic
decisions
• Search for
competitive
advantage
• Business
environments are
dynamic
• Decision-making
cycle time is reduced
16
Operational vs. Decision Support Data
• Operational data
• DSS
– Relational, normalized
– Snapshot of operational
database
data
– Optimized to support
– Summarized
transactions
– Large amounts of data
– Real time updates
• Data analyst viewpoint
– Timespan
– Granularity
– Dimensionality
17
Creating a Data Warehouse
18
Codd’s Key Data Warehouse Rules
•
•
•
•
•
•
Separated from operational environment
Integrated Data
Historical data over long time horizon
Snapshot data captured at given time
Subject-oriented data
Mainly read-only data with periodic batch updates from
operational source, no online updates
19
Codd’s Key Data Warehouse Rules
contd.
•
Contains different levels of data detail
– Current and old detail
– Lightly and highly summarized
• Metadata (data about the data) critical
components
– Identify and define data elements
– Provide the source, transformation,
integration, storage, usage, relationships, and
history of data elements
20
Decision Support Systems DSS
DSS Components
22
Decomposition of DSS –
Operational Data
o
o
o
o
o
o
o
Tumor registry
A/D/T
Radiology narrative
Pathology narrative
Lab results
Patient Accounting
Charge Master
23
Decomposition of DSS –
External Data
o
o
o
o
Research spider
Treatment guidelines
Reimbursement schedules
NCI/NIH protocols
24
Decomposition of DSS –
ETL
•
•
•
•
Rationalize normal lab values
Transform gender codes and free
text
Narrative dumps
Doctor cleansing
o Similar names
o Which practice gets credit?
25
ETL – Extraction,
Transformation, Load
Extract
data from
source
systems
Transform: cleanse data for consistency
and output exceptions
o Apply business rules
o Selecting certain columns to load
(not null records)
o Translating coded values (1, M, male
=0)
o Derive new calculated value
(sale_amount = qty * unit_price)
o Join data from multiple sources
(lookup, merge)
o Aggregate (rollup/summarize data –
average LOS for each doctor by
DRG)
o Transpose/pivot (turning columns
into rows)
o Data validation.
Load:
data into
repository
26
ETL Best Practice
•
•
•
ETL: 60-80% of development effort
Create multi-departmental team charged with
consensus on Transformation!
Review exceptions carefully
o Indicator of issues with operational db design
o Indicator of changes needed in transformation
27
Decomposition of DSS –
Business Data
•
•
Business data – central
repository
Includes metadata: source,
format, timing of feeds
Characteristic
Factors
Integrated
•
•
Centralized
Holds data retrieved from
entire organization
SubjectOriented
•
•
Optimized to give answers to
diverse questions
Used by all functional areas
Time Variant
•
•
Flow of data through time
Projected data
Non-Volatile
•
•
Data never removed
Always growing
28
Decomposition of DSS –
Business Model Data
•
Comprehensive Cancer Center
definition of a patient
o Must have seen a physician
for suspected or confirmed
benign or malignant condition
o What about patients seen for
screening mammography?
29
Decomposition of DSS –
End user query tool
•
•
Web-based or client-server?
OLAP – Online Analytic Processing
o Microsoft
o Business Objects (bought by
SAP)
o MicroStrategy
o Cognos (bought by IBM)
o Oracle (includes Hyperion)
30
Decomposition of DSS –
End –user tool
o
o
o
o
Drill down functionality
Roll up
Charts – not data level
Export features
http://demos.telerik.com/aspnetajax/chart/examples/functionality/
drilldown/defaultcs.aspx
31
Design – Star Schema
Star Schema
•
•
•
•
Center fact table
o usually contains numeric information for summary
reports.
Dimension table radiate from fact table
Dimension table is hierarchial
• ‘rollup’ allows to compare types of hospitals, disease
categories, or even patient age bands.
Creates logical data cube dimensions identifying a set of
numeric measurements within the cube.
33
Star Schema
• Data-modeling technique
• Maps multidimensional decision support into relational
database
• Yield model for multidimensional data analysis while
preserving relational structure of operational DB
• Four Components:
– Facts
– Dimensions
– Attributes
– Attribute hierarchies
34
Simple Star Schema
35
Star Schema
36
Entity Relationship Diagram
37
Analysis
Online Analytical Processing (OLAP)
• Advanced data analysis environment
• Supports decision making, business modeling, and
operations research activities
• Characteristics of OLAP
– Use multidimensional data analysis techniques
– Provide advanced database support
– Provide easy-to-use end-user interfaces
– Support client/server architecture
39
Healthcare Cube – slice and dice view
Time
Provider
Strategic
Period
Clinic
Year
Specialty
Quarter
Group
Month
Physician
Week
Day
Shift
Hour
40
Dashboard
41
Scorecard including Key Performace
Indicators (KPI)
•
•
•
•
•
Risk-adjusted
mortality index
Risk-adjusted
complications
index
Risk-adjusted
patient safety index
Severity-adjusted
average length of
stay
Expense per
adjusted discharge,
case mix- and
wage-adjusted
42