Faculty of Computer Science
A Data Warehouse Architecture for
Clinical Data Warehousing
Tony R. Sahama and Peter R. Croll
Amit Satsangi
CMPUT 605
February 11, 2008
© 2006
Department of Computing Science
Focus
 Why are Clinical Data Warehouses (CDW) needed?
 Issues in their construction
 Design & design choices in the construction of a
CDW
Why Clinical Data Warehouse?
 Efficient Storage
 Uniformity in storage and querying of data
 Timely analysis
 Quality of decision making and analytics
—Decisions based on larger datasets
—More accurate information
—Better strategies and research methods
Why Clinical Data Warehouse?
 Measurement of the effectiveness of treatment
 Relationships between causality and treatment
protocols
 Safety
 Management
—Breakdown of cost and charge information
—Forecasting demand
—Better strategies and research methods
Some Facts…
 Large volume of data distributed across a number of
small repositories: “islands” of information
 Data can yield great scientific and medical insight
 Great potential for people practicing clinical
medicine
Issues
 Heterogeneity: different clinical practices, e.g.
public vs. private hospitals
 Data Location
 Technical platforms & data formats
 Organizational behaviors in processing the data
 Varying cultures among the data management
population
Past efforts
 Szirbik et al. – Medical Data Warehouse for elderly patients
—Six methodological steps to build medical data warehouses for
research. International Journal of Medical Informatics 75 (9): 683-691
 Used Rational Unified Process (RUP) framework
 Identification of current trends (critical requirements of the future)
 Data Modelling
 Ontology Building
 Quality Management and exception handling
Different DW Architectures (Sen & Sinha 2005)
Design and Planning
 Business Analytics Approach: understand the key
processes of the business
 DW architect + Business Analyst + Expected Users
 Understand Key business processes + the
questions that would be asked of those processes
 Analysis might be conducted on demographics,
diagnosis, severity of illness, and length of stay
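A question of the kind listed above (for example, average length of stay by diagnosis and severity) could be answered from a small star schema. The following is a minimal sketch using Python's built-in sqlite3 module rather than the SAS tooling used in the paper; all table names, column names, and sample rows are hypothetical.

```python
import sqlite3

def build_demo_warehouse():
    """Create an in-memory star schema filled with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_patient (
            patient_key INTEGER PRIMARY KEY, age_group TEXT, sex TEXT);
        CREATE TABLE dim_diagnosis (
            diagnosis_key INTEGER PRIMARY KEY, diagnosis TEXT);
        CREATE TABLE fact_admission (
            patient_key INTEGER REFERENCES dim_patient(patient_key),
            diagnosis_key INTEGER REFERENCES dim_diagnosis(diagnosis_key),
            severity TEXT,
            length_of_stay_days INTEGER);
    """)
    conn.executemany("INSERT INTO dim_patient VALUES (?, ?, ?)",
                     [(1, "65+", "F"), (2, "40-64", "M"), (3, "65+", "M")])
    conn.executemany("INSERT INTO dim_diagnosis VALUES (?, ?)",
                     [(10, "oncology"), (20, "mental health")])
    conn.executemany("INSERT INTO fact_admission VALUES (?, ?, ?, ?)",
                     [(1, 10, "high", 12), (2, 10, "low", 4),
                      (3, 20, "high", 20), (1, 20, "low", 6),
                      (3, 10, "high", 8)])
    return conn

def avg_stay_by_diagnosis_and_severity(conn):
    """Join the fact table to the diagnosis dimension and average the measure."""
    query = """
        SELECT d.diagnosis, f.severity, AVG(f.length_of_stay_days)
        FROM fact_admission AS f
        JOIN dim_diagnosis AS d ON d.diagnosis_key = f.diagnosis_key
        GROUP BY d.diagnosis, f.severity
        ORDER BY d.diagnosis, f.severity
    """
    return conn.execute(query).fetchall()

if __name__ == "__main__":
    for row in avg_stay_by_diagnosis_and_severity(build_demo_warehouse()):
        print(row)
```

Keeping measures (length of stay) in a fact table and descriptive attributes in dimension tables is one common way to make such questions cheap to ask; the paper itself does not prescribe this exact layout.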
Approach
 Integration of data from two Biomedical Knowledge
Repositories (BKRs): Oncology & Mental Health care
 Used SAS Data Warehouse Administrator (SAS 2002)
—Flexibility to integrate external data repositories
—Hassle-free ETL
—Analytics with Data Miner
—Reporting using SAS Enterprise Guide (EG)
 Operational Data Store Architecture & Distributed Data
Warehouse Architecture
 Several data marts to include different
administration and management operations
—Summary reports
—Monitoring of clinical outcomes by management
Oncology Patient Management
Mental Health Patient Management
Data Transformation
 Source systems → CDW (ETL: Extraction-Transformation-Load)
 Data preparation & integration take 90% of the
effort in a given CDW project
 Excel, SAS External File Interface (EFI) & SAS
Enterprise Guide (EG) used to clean the data
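The extract, transform, and load stages described above can be sketched in a few lines. This is an illustrative sketch in Python with the standard csv and sqlite3 modules, not the Excel/SAS workflow used in the paper; the two source layouts, the column names, and the assumption that slash-separated dates are DD/MM/YYYY are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical extracts from two source systems with different layouts.
ONCOLOGY_CSV = "PatientID,DOB,Dx\n A17 ,1950-03-02,Lung\nA18,1948-11-30,breast\n"
MENTAL_HEALTH_CSV = "id;birth_date;diagnosis\nm-04;12/05/1961;Depression\n"

def extract(text, delimiter):
    """Extract: parse delimited text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))

def to_iso_date(value):
    """Convert DD/MM/YYYY (assumed) to YYYY-MM-DD; pass ISO dates through."""
    if "/" in value:
        day, month, year = value.split("/")
        return f"{year}-{month}-{day}"
    return value

def transform(rows, id_col, dob_col, dx_col, source):
    """Transform: map source-specific columns onto one cleaned, shared layout."""
    return [
        (row[id_col].strip().upper(),
         to_iso_date(row[dob_col].strip()),
         row[dx_col].strip().lower(),
         source)
        for row in rows
    ]

def load(conn, records):
    """Load: append the cleaned records to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS patient "
        "(patient_id TEXT, birth_date TEXT, diagnosis TEXT, source TEXT)")
    conn.executemany("INSERT INTO patient VALUES (?, ?, ?, ?)", records)
    conn.commit()

def run_etl():
    """Run both sources through the pipeline and return the loaded rows."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(ONCOLOGY_CSV, ","),
                         "PatientID", "DOB", "Dx", "oncology"))
    load(conn, transform(extract(MENTAL_HEALTH_CSV, ";"),
                         "id", "birth_date", "diagnosis", "mental_health"))
    return conn.execute("SELECT * FROM patient ORDER BY patient_id").fetchall()

if __name__ == "__main__":
    for row in run_etl():
        print(row)
```

Most of the code sits in the transform stage (trimming, case normalization, date conversion), which is consistent with the observation that data preparation dominates the effort.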
Steps in creation of CDW
 Step 1: Data imported in SAS
—Standardization into SAS table format
—Opportunity for data manipulation—create/delete columns
 Step 2: Creation of metadata using Operational Data definition
 Step 3: Creation and loading of Data Tables
—Different tables for predictive and Database analysis
—Creation of multi-dimensional cubes
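Steps 2 and 3 above can be illustrated with a small sketch: a metadata dictionary describes each column, the records are validated against it, and a multi-dimensional cube is built by summing a measure over every combination of dimension values. This is plain Python rather than SAS, and the column names, the "ALL" roll-up marker, and the sample records are assumptions made for the example.

```python
from collections import defaultdict

# Step 2 (sketch): metadata describing each column of the loaded table.
METADATA = {
    "diagnosis": {"type": str, "role": "dimension"},
    "severity": {"type": str, "role": "dimension"},
    "length_of_stay": {"type": int, "role": "measure"},
}

# Hypothetical records standing in for a loaded data table.
RECORDS = [
    {"diagnosis": "oncology", "severity": "high", "length_of_stay": 12},
    {"diagnosis": "oncology", "severity": "low", "length_of_stay": 4},
    {"diagnosis": "mental health", "severity": "high", "length_of_stay": 20},
    {"diagnosis": "oncology", "severity": "high", "length_of_stay": 8},
]

def validate(records, metadata):
    """Check that every record has exactly the described columns and types."""
    for record in records:
        if set(record) != set(metadata):
            raise ValueError(f"unexpected columns: {sorted(record)}")
        for name, spec in metadata.items():
            if not isinstance(record[name], spec["type"]):
                raise TypeError(f"{name!r} should be {spec['type'].__name__}")

def build_cube(records, metadata):
    """Step 3 (sketch): sum the measure for each combination of dimensions.

    A dimension rolled up across all of its values is keyed as "ALL".
    """
    dims = [n for n, s in metadata.items() if s["role"] == "dimension"]
    measure = next(n for n, s in metadata.items() if s["role"] == "measure")
    cube = defaultdict(int)
    for record in records:
        # Each dimension is either kept or rolled up: 2**len(dims) cells.
        for mask in range(2 ** len(dims)):
            key = tuple(record[d] if mask & (1 << i) else "ALL"
                        for i, d in enumerate(dims))
            cube[key] += record[measure]
    return dict(cube)

if __name__ == "__main__":
    validate(RECORDS, METADATA)
    cube = build_cube(RECORDS, METADATA)
    print(cube[("oncology", "ALL")])   # total stay days for oncology
    print(cube[("ALL", "high")])       # total stay days for high severity
```

Precomputing the rolled-up cells trades storage for query speed, which is the usual motivation for building cubes alongside the detailed tables.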
Discussion
 Data acquisition step took very long, leaving very little
time for cleaning and transformation
 Not enough time left to refine the shared
environment (no modifications to their interface
implementation etc.)
 Security issues of federated Data Warehouses:
anonymization of records
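One common building block for the record-anonymization concern above is pseudonymization: dropping direct identifiers and replacing the patient ID with a keyed hash, so that records can still be linked across sites that share the key. The sketch below uses Python's standard hmac and hashlib modules; it is not the method described in the paper, the field names and key are hypothetical, and pseudonymization alone does not amount to full anonymization.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "address"}

def pseudonymize(record, secret_key, id_field="patient_id"):
    """Return a copy with direct identifiers dropped and the ID replaced.

    The ID is replaced by a truncated HMAC-SHA256 digest, so the same ID
    and key always yield the same pseudonym, but the ID cannot be read back.
    """
    digest = hmac.new(secret_key,
                      record[id_field].encode("utf-8"),
                      hashlib.sha256).hexdigest()
    cleaned = {k: v for k, v in record.items()
               if k not in DIRECT_IDENTIFIERS and k != id_field}
    cleaned["pseudonym"] = digest[:16]
    return cleaned

if __name__ == "__main__":
    key = b"demo-key-not-for-production"  # assumption: shared out of band
    record = {"patient_id": "A17", "name": "Jane Doe",
              "address": "1 Example St", "diagnosis": "lung"}
    print(pseudonymize(record, key))
```

Quasi-identifiers such as birth date or postcode can still re-identify patients when combined, so a real federated deployment would need further measures beyond this step.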
Discussion
 SAS EM used to interpret relationships between
seemingly unconnected data
 Newer CDW models coming from case-based, role-based &
evidence-based data structures need to be incorporated
Steps in creation of CDW
 Step 4: Data Mining
—Tools that integrate with or within SAS were used: EM, EG, etc.
Thank You For Your Attention!