PPT - School of Engineering
Topics in Biomedical Informatics
CSE
5095
Informatics for Integrating
Biology and the Bedside
(i2b2)
http://www.i2b2.org
Antonio Cusano
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155
Spring 2011
i2b2-1
Overview
Introduction to i2b2
Modeling the i2b2 Data Model
Overview of the i2b2 Software Tools
Using the i2b2 Software
Overview of the i2b2 Hive Cells
Example Use Case Scenario
Notable Projects & Usage in BMI
Evaluating i2b2
Summary
i2b2-2
Background & Motivation
The rise of Electronic Medical Record Systems
(EMRS) holds great promise for clinical research
Integrating medical record data with clinical
research data is increasingly important
But many challenges exist:
EMRS are typically built with the “single patient”
in mind
It would be difficult to observe trends in data across
combinations of many patients
How do we “clean” EMR data at a global,
enterprise-level without compromising the data?
Removal of some data by person X could be a
devastating loss to person Y
How do we maintain patient privacy?
i2b2-3
Background & Motivation
What do we need?
A system that supports queries that cut across
multiple patients
More dependent on standard descriptors
A system that can process and understand complex
queries and specifications
A system that can integrate medical record data
and clinical research data
Provide a robust data model
A system that protects the privacy of the patients
Solution?
i2b2-4
Introducing i2b2
Informatics for Integrating Biology and the Bedside
One of seven NIH Roadmap National Centers for
Biomedical Computing (http://www.ncbcs.org)
Funded under the NIH Common Fund
Part of the networked national effort to build the
infrastructure for biomedical computing in the nation
Established in 2004
Based at Partners HealthCare in Boston,
Massachusetts
Non-profit, integrated health system founded by
Brigham and Women’s Hospital and Mass. General
Principal Investigator:
Isaac Kohane, M.D., Ph.D., Professor of Pediatrics at
Harvard Medical School
i2b2-5
Mission Statement
Overcome two major obstacles:
The computational challenges of discovery across
large, heterogeneous data sets routinely obtained in
clinical care
The lack of knowledge of genomic-level
physiology and how to study it
Therefore, the goals of i2b2 are:
To provide clinical researchers with the software
tools necessary to collect and integrate medical
record and clinical research data in the genomics
age
By creating a software suite that constructs and
manages the modern clinical research chart
i2b2-6
i2b2 Software Tools
The i2b2 Hive
The Clinical Research Chart
ver. 1.5.2
i2b2-7
i2b2 Software – Design Objectives
Design focused around several goals:
Provide a secure presentation of patient
information for research purposes
Provide a software framework that can be easily
extended
Provide secure communication capabilities for said
software framework
Provide a flexible data model tuned to the needs of
patient-specific information
Requiring timely and scalable query performance
Adaptable to new and unanticipated representations of
health care information
i2b2-8
Identifying the Data Model Requirements
Developers identified these key requirements for
constructing a data model for i2b2
Integration of data from distributed and
differently structured databases
In order to perform comprehensive and integrative
analyses
Separation of data used for research from daily
operational or transactional data
Avoid any performance impact on operational systems
and maintain data integrity
Standardization of a model across systems
Ensure all i2b2 systems possess the same data model to
enable data sharing
Ease of use by end-users
i2b2-9
Dimensional Modeling
Model the database using two concepts:
Facts
The quantitative or factual data being queried
Dimensions
Descriptions of the various facts
i2b2-10
Star Schema
Possesses a central “fact” table where each row
represents a single fact
A fact is an observation of a patient
Diagnoses, Procedures, Genetic Data, Lab Data, Health
History, Demographics Data, etc.
An observation is not the same thing as an event
Observations are recorded by a specific observer within
a specific time range regarding a specific concept
Fact table is surrounded by numerous dimension tables
Four dimension tables
Concept, Provider, Visit, Patient
Contains descriptors that characterize the facts
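The star schema on this slide can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table and column names below are assumptions chosen to mirror the slide's vocabulary (one fact table, four dimension tables), not the actual i2b2 DDL, and the sample rows are made up.

```python
import sqlite3

# Illustrative schema only: names are assumptions, not the actual i2b2 DDL.
SCHEMA = """
CREATE TABLE patient_dimension  (patient_num INTEGER PRIMARY KEY, sex TEXT);
CREATE TABLE visit_dimension    (encounter_num INTEGER PRIMARY KEY, start_date TEXT);
CREATE TABLE provider_dimension (provider_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE concept_dimension  (concept_cd TEXT PRIMARY KEY, name TEXT);
CREATE TABLE observation_fact (
    patient_num   INTEGER REFERENCES patient_dimension(patient_num),
    encounter_num INTEGER REFERENCES visit_dimension(encounter_num),
    provider_id   TEXT    REFERENCES provider_dimension(provider_id),
    concept_cd    TEXT    REFERENCES concept_dimension(concept_cd),
    start_date    TEXT,
    value         TEXT
);
"""


def build_demo_db():
    """Create an in-memory star schema with a few made-up observations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patient_dimension VALUES (?, ?)",
                     [(1, "F"), (2, "M")])
    conn.executemany("INSERT INTO visit_dimension VALUES (?, ?)",
                     [(10, "2011-01-05"), (11, "2011-02-01")])
    conn.execute("INSERT INTO provider_dimension VALUES ('P1', 'Clinic A')")
    conn.executemany("INSERT INTO concept_dimension VALUES (?, ?)",
                     [("DX:ASTHMA", "Asthma"), ("LAB:FEV1", "FEV1")])
    # Each fact row is one observation: who, when, by whom, about what.
    conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 10, "P1", "DX:ASTHMA", "2011-01-05", None),
        (1, 10, "P1", "LAB:FEV1", "2011-01-05", "2.9"),
        (2, 11, "P1", "DX:ASTHMA", "2011-02-01", None),
    ])
    return conn


def count_patients_with(conn, concept_name):
    """Count distinct patients having at least one fact for the named concept."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT f.patient_num) "
        "FROM observation_fact f "
        "JOIN concept_dimension c ON c.concept_cd = f.concept_cd "
        "WHERE c.name = ?",
        (concept_name,),
    ).fetchone()
    return row[0]
```

A cross-patient question ("how many patients have an asthma diagnosis?") becomes a single join between the fact table and one dimension table, which is the kind of query a single-patient EMR layout makes awkward.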
i2b2-11
Star Schema
i2b2-12
Star Schema Performance
Enterprise repositories and project-specific, local
repositories can contain very large amounts of data
The size of the central fact table can grow to be
very large as a result, impacting performance
It is critical to have indexes on that table to maintain
stable performance
Use system-specific enhancements when possible
SQL Server databases can add a clustered index to a
table so that its rows are stored in index-key order
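The effect of indexing the fact table can be seen in a small sketch. SQLite is used here as a stand-in (it has no SQL Server style clustered indexes), and the table, column, and index names are assumptions for illustration. The function returns the query planner's description with and without an index on the concept column.

```python
import sqlite3


def plan_for_concept_lookup(with_index):
    """Return SQLite's query plan text for a lookup by concept code."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE observation_fact "
        "(patient_num INTEGER, concept_cd TEXT, start_date TEXT)"
    )
    if with_index:
        # Hypothetical index name; covers the lookup column and the result column.
        conn.execute(
            "CREATE INDEX idx_fact_concept "
            "ON observation_fact (concept_cd, patient_num)"
        )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT patient_num FROM observation_fact WHERE concept_cd = ?",
        ("DX:ASTHMA",),
    ).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(str(row[-1]) for row in rows)
```

Without the index the planner has to scan every fact row; with it, the plan names the index and searches only the matching entries. On a fact table with hundreds of millions of rows that difference dominates query time.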
i2b2-13
i2b2 Software – Purpose
Serves two primary use cases:
Enable enterprise-wide repurposing and
distribution of medical record data for research
Enable high performance collection of medical record
data for querying and distribution
Enable discovery within data on a wide scale
Enable usage of medical record data in clinical
studies
How do we achieve these use cases?
Use the i2b2 Software Tools!
The i2b2 Hive
The Clinical Research Chart
– A core component of the i2b2 Hive
i2b2-14
What is the i2b2 Hive?
A collection of interoperable services provided by
i2b2 cells
Each cell behaves as a functional service
Cells are loosely coupled (independence)
Cells do not know their relative locality (proximity)
Cells are connected and communicate with each other
using web services
Can be invoked manually by the user
Can be invoked automatically by the system
workflow
What do we notice?
Highly modular architecture
Highly scalable
i2b2-15
What are i2b2 Cells?
The i2b2 cell is the basic building block of the i2b2
environment
An application “wrapped” into a functional unit
Encapsulates business logic as well as access to data
objects behind standard web service interfaces
Supported services include REST, SOAP
Communication using XML messages
[Diagram: an i2b2 cell. Business logic, data access, and data objects sit behind the i2b2 web service interfaces (REST, SOAP), exchanging XML over HTTP.]
i2b2-16
Structure of the XML Message
XML schema that defines:
A header for communication management
A header for the message request/response
A message body that contains the data
For example, can contain patient sets with their:
– Phenotypic (Clinical) and Genotypic Data
– References to other data objects (images, attachments)
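The three-part layout described above (communication header, request/response header, body) can be sketched with Python's standard xml.etree.ElementTree. Apart from message_body, which appears in the File Repository example later in the deck, every element and attribute name below is an assumption made for illustration and is not taken from the i2b2 schema.

```python
import xml.etree.ElementTree as ET


def build_message(sending_app, wait_ms, patient_ids):
    """Build a three-part XML message: two headers and a body."""
    root = ET.Element("request")  # hypothetical root element name

    # Header 1: communication management (who is sending).
    message_header = ET.SubElement(root, "message_header")
    ET.SubElement(message_header, "sending_application").text = sending_app

    # Header 2: request-specific settings (hypothetical field name).
    request_header = ET.SubElement(root, "request_header")
    ET.SubElement(request_header, "result_waittime_ms").text = str(wait_ms)

    # Body: the data itself, here a small patient set.
    body = ET.SubElement(root, "message_body")
    patient_set = ET.SubElement(body, "patient_set")
    for patient_id in patient_ids:
        ET.SubElement(patient_set, "patient", {"id": str(patient_id)})

    return ET.tostring(root, encoding="unicode")
```

Keeping the headers separate from the body lets a receiving cell route or reject a message without having to understand the payload.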
i2b2-17
Example XML Message Header
i2b2-18
Example XML Message Body
i2b2-19
Advantages of Web Services
Because all communication is in XML…
Not limited to any single operating system
Not limited to any single programming language
Cells can be developed in Microsoft .NET, Perl,
Python, Java, etc.
Any language that supports REST or SOAP
capability can be used
Cells can exist on Windows, Linux, and Mac OS and
communicate with each other
e.g. cells residing on a Windows platform can talk
with those on a UNIX platform
No restriction on how simple or complex a cell can be
XML tags the data
REST/SOAP transfers the data
i2b2-20
But Where’s the User Interface?
Web services do not provide a visual user interface
The developer is required to build a client component
Must include a Graphical User Interface (GUI) and
Control Mechanism for user interaction
Some considerations:
Should utilize the web service interfaces for
communication, rather than a home-brew approach
Must ensure cell-to-cell communication is maintained
Reuse the functionality of existing cells
i2b2-21
How are Cells Classified?
The i2b2 Hive is composed of a number of cells with
varying importance and functionality
Core cells are essential for operation of the Hive
Provide basic services
Written in Java to the J2EE specifications
Front-end clients written using the Standard Widget
Toolkit (SWT)
– Provides native OS look-and-feel for the user interfaces
Optional and Plug-in type cells add functionality to
the Hive but are not essential
Special Hive Cells:
The Clinical Research Chart
The i2b2 Web Client
The i2b2 Workbench Application
i2b2-22
The Clinical Research Chart
The Clinical Research Chart is the implementation of
the Star Schema in i2b2
Functions as the integrated data repository for the
i2b2 Hive
Core cell of the i2b2 Hive (Data Repository Cell)
Requires all core cells to gain complete functionality
– In fact, the main purpose of the other Core cells is to support
the activities of the CRC
Fundamentally built to store medical data
Which can be accessed by any cell in the i2b2 Hive
Similarly, any cell can contribute to placing data into
the CRC
i2b2-23
The Clinical Research Chart
Useful for:
Repurposing patient data and integrating it with
genomic data and clinical trial data for clinical
research
Important to note:
Not a mechanism for searching through hospital
clinical systems
Not a transaction system to manage clinical trials
i2b2-24
The i2b2 Web Client
Designed for enterprise related activities
e.g. selecting patients from an enterprise repository
Written entirely in JavaScript, HTML, and CSS
Uses AJAX to eliminate page refreshing
Cross platform and compatible with most browsers
Known compatibility issues with IE5 and lower
Easy to deploy and update
Important to note:
Can create patient sets and retrieve patient counts
Only anonymous patient data is shown
Data is obfuscated by adding or subtracting a small
random number to the available aggregate totals
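The count obfuscation described on this slide can be sketched as follows. The size of the offset (max_offset) and the clamping at zero are assumptions; the slide only says that a small random number is added or subtracted.

```python
import random


def obfuscate_count(true_count, max_offset=3, rng=None):
    """Return the count shifted up or down by a small random amount."""
    if rng is None:
        rng = random.Random()
    # Pick a non-zero magnitude, then a direction (add or subtract).
    magnitude = rng.randint(1, max_offset)
    sign = rng.choice((-1, 1))
    # Assumption: never report a negative number of patients.
    return max(0, true_count + sign * magnitude)
```

Because each displayed total is slightly off, a user cannot isolate a single patient by comparing the counts of two nearly identical queries, while the totals remain accurate enough for feasibility estimates.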
i2b2-25
The i2b2 Workbench Application
Designed for project-based use
e.g. data manipulation, visual analytics
Written in Java using the Eclipse Framework
The client applications are Eclipse plug-ins which
compose the workbench application
Can be extended with other Java/Eclipse plug-ins
More resource intensive than its web companion
Helpful for heavy client-side processing
i2b2-26
How to use the i2b2 Software
First, use the web or desktop client to select/query
patients from the enterprise data repository (EDR)
i2b2-27
Creating the Query
Patient attributes are dragged from the “Terms” panels
into the “Query Tool” panels
Terms in the same panel are logically OR’d
Terms in different panels are logically AND’d
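The panel logic above (OR within a panel, AND across panels) can be expressed with set operations. This is a sketch of the logic only, not of how the Query Tool is implemented; the data structures are assumptions.

```python
def run_query(panels, patient_terms):
    """Return the patients matching every panel.

    panels: list of sets of terms; a patient matches a panel if they
            have at least one of its terms (OR within a panel).
    patient_terms: dict mapping patient id -> set of terms recorded.
    """
    return {
        patient
        for patient, terms in patient_terms.items()
        # AND across panels: every panel must share a term with the patient.
        if all(panel & terms for panel in panels)
    }
```

For example, with one panel holding {"asthma", "COPD"} and a second holding {"female"}, the query selects female patients who have asthma or COPD.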
i2b2-28
How to use this Data?
Querying from an EDR returns limited data
A patient count from the results of the query
Aggregate counts of the demographics of these
patients
Not very useful for research purposes in current form
In order to effectively use this data, patient sets
must be saved into a new, project-specific database
Will be saved in your local i2b2 installation
This process is known as creating a “data mart”
Requires IRB approval
i2b2-29
Creation of a Data Mart
A data mart ensures patient privacy by only storing
information allowed under HIPAA regulations
Protected Health Information (PHI) is not included in
the data mart
Data is saved in the CRC (Star Schema DB Model)
i2b2-30
Working with the Data
Use the i2b2 Workbench Application to view &
manipulate the data from your data mart
i2b2-31
User & Hive Interaction
When using the web or desktop client, you’re not just
accessing the Clinical Research Chart directly
In fact, most interaction incorporates the
functionalities of many i2b2 Cells
At the minimum, all core cells are used in some way
What do these other cells do?
Project Management
Ontology Management
Data Repository (CRC)
Identity Management
File Repository
Workflow Management
i2b2-32
Workflow Framework Cell
This cell is used to process information in steps
through various parts of the Hive
Most processed information will come to reside in
the CRC or be displayed to the user
Specifically:
Facilitates communication between cells
Manages project-specific XML data objects for
users of a given project
These objects typically originate in other cells
These objects are organized in hierarchical structures
that represent relationships between elements
Allows users to organize, label, and annotate data
objects
i2b2-33
Workflow Framework Cell
Use Case Diagram
i2b2-34
Workflow Framework Cell
Operations and Descriptions
i2b2-35
Workflow Framework Cell
We can see the Workflow Management Cell at work
in the i2b2 Web and Desktop Clients
For example, providing hierarchical structure for
concepts and patient sets
i2b2-36
Project Management Cell
This cell is used to provide user authentication and
manage group and role information
User access is determined by a user’s role
Defines what actions they may perform in the Hive
Default role is User
Other roles include Manager, Administrator
Users can have one or more roles
It also keeps track of what cells are part of the Hive
and their location
i2b2-37
Project Management Cell
Can be accessed by either an i2b2 client or by another
i2b2 cell
Client: user trying to login to client
Cell: check which roles exist for user for that cell
Authentication and Authorization
Use Case Diagram:
i2b2-38
File Repository Cell
Fundamentally, this cell holds large files of data
Radiological images, genetic sequences
These files are generally referenced from the
Clinical Research Chart
Manages the sending and receiving of these files
between cells
Other cells will use REST or SOAP service calls to
access files in this cell under most conditions
Users can use this cell to upload files
XML Request format:
<message_body>
<recvfile_request>
<filename>/oasis/ABT001b/brain_324.jpg</filename>
</recvfile_request>
</message_body>
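A receiving cell would need to pull the file name out of a request of this shape. The sketch below uses Python's standard XML parser on the same request, written with matching open and close tags since a mismatched pair is rejected by any XML parser. The function name and error handling are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The request body from the slide, with matching recvfile_request tags.
REQUEST = """
<message_body>
  <recvfile_request>
    <filename>/oasis/ABT001b/brain_324.jpg</filename>
  </recvfile_request>
</message_body>
"""


def requested_filename(xml_text):
    """Extract the requested file path from a recvfile_request body."""
    body = ET.fromstring(xml_text.strip())
    node = body.find("./recvfile_request/filename")
    if node is None or not node.text:
        raise ValueError("request has no <filename> element")
    return node.text.strip()
```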
i2b2-39
Ontology Management Cell
Manages the terminology and knowledge information
typically used in the Hive, especially in the CRC
Provides descriptive terms and other information
for data stored in the observation_fact table
This metadata is stored in a separate table(s) outside of
the Star Schema
These vocabulary terms are organized in
hierarchical structures (Workflow Framework)
This information is either requested by or distributed
to cells during most of the Hive’s transactions
Use Case Diagram:
i2b2-40
Ontology Management Cell
Typical Ontology Table
Hierarchical level
Full path that leads to the term
Descriptive text value
Is field a synonym for another term?
Display icon used in the user interface
Field not used in i2b2
Describes ontological concept
Extra information about the concept in XML
Column name in fact table that holds concept code
Name of look-up table that holds concept code
Name of field that holds concept path
T for text or N for numeric
SQL operator used in WHERE clause for queries
Dimension table path that maps to the concept
Store miscellaneous comments
Tooltip that appears in the user interface
Date the data was updated
Date the data was downloaded
Date the data was imported
Coded value for the originating source system
Coded value indicating term type: DOC or LAB
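Storing the full path with each term is what makes hierarchy lookups cheap: every descendant of a term shares that term's path as a prefix. The sketch below shows the idea with sqlite3. The table name, the three columns, the "/" separator, and the sample terms are all assumptions for illustration, not the actual i2b2 ontology table.

```python
import sqlite3


def build_ontology():
    """Create a tiny in-memory ontology table with level, path, and name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ontology (hlevel INTEGER, full_path TEXT, name TEXT)"
    )
    conn.executemany("INSERT INTO ontology VALUES (?, ?, ?)", [
        (1, "/Diagnoses/", "Diagnoses"),
        (2, "/Diagnoses/Respiratory/", "Respiratory"),
        (3, "/Diagnoses/Respiratory/Asthma/", "Asthma"),
        (1, "/Labs/", "Labs"),
    ])
    return conn


def descendants(conn, path):
    """Return names of all terms strictly below the given path."""
    rows = conn.execute(
        "SELECT name FROM ontology "
        "WHERE substr(full_path, 1, ?) = ? AND full_path <> ? "
        "ORDER BY full_path",
        (len(path), path, path),
    )
    return [name for (name,) in rows]
```

Dragging a folder-level term into a query can then expand to all of its children with one prefix match rather than a recursive walk.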
i2b2-41
Identity Management Cell
Manages a patient's protected health information in a
manner consistent with HIPAA privacy rules
Patient data is available only as a HIPAA defined
“Limited Data Set”
Removal of patient identifiers
Uses a “code book” that maps the real patient
identifiers to arbitrary patient numbers in the CRC
Design and Architecture documents are not publicly
available for this cell
It’s a secret?
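Since the cell's design is not public, the following is purely a hypothetical illustration of the "code book" idea mentioned on this slide: real identifiers map to arbitrary numbers, and only the holder of the book can map them back. The class, its methods, and the number range are all assumptions.

```python
import secrets


class CodeBook:
    """Hypothetical two-way map between real ids and arbitrary numbers."""

    def __init__(self):
        self._to_code = {}
        self._to_real = {}

    def code_for(self, real_id):
        """Return a stable arbitrary number for a real patient identifier."""
        if real_id not in self._to_code:
            code = secrets.randbelow(10**9)
            # Retry on the rare collision so codes stay unique.
            while code in self._to_real:
                code = secrets.randbelow(10**9)
            self._to_code[real_id] = code
            self._to_real[code] = real_id
        return self._to_code[real_id]

    def resolve(self, code):
        """Map an arbitrary number back to the real identifier."""
        return self._to_real[code]
```

Only the arbitrary numbers would travel to the research repository; the book itself stays behind, so re-identification requires access to it.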
i2b2-42
Optional i2b2 Cells
Natural Language Processing Cell
Manipulates text reports to extract specific terms
and knowledge from them
Extract concepts such as diagnoses, smoking status
These concepts are then used to achieve various
representations of the data
Concepts returned divided into three categories:
UMLS concepts
– Mapping parts of the document to concepts in the Unified
Medical Language System (UMLS) database
Regular Expression concepts
– Matching document text to a set of regular expression rules
Smoking Status concepts
– Classification model trained on human-annotated smoking-related sentences
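Of the three categories, the regular-expression concepts are the simplest to illustrate. The sketch below matches note text against a small rule set; the rules and concept codes are invented for the example and are not the NLP cell's actual rules.

```python
import re

# Hypothetical rule set: concept code -> pattern that signals it.
RULES = {
    "DX:ASTHMA": re.compile(r"\basthma\b", re.IGNORECASE),
    "MED:ALBUTEROL": re.compile(r"\balbuterol\b", re.IGNORECASE),
}


def extract_concepts(note):
    """Return the sorted concept codes whose pattern occurs in the note."""
    return sorted(code for code, pattern in RULES.items() if pattern.search(note))
```

A plain match like this ignores negation ("no history of asthma" still matches), which is one reason the cell also offers UMLS mapping and a trained classifier.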
i2b2-43
Natural Language Processing
i2b2-44
Optional i2b2 Cells
Pulmonary Function Test (PFT) Processing Cell
Parses a pulmonary function report and extracts
embedded test values
Report must be in a specific format
Returned values may be stored in the CRC and
used in queries or other types of analyses
Report format not specified in any official i2b2
documentation, but examples have been published
Provides some idea about the required format
i2b2-45
Pulmonary Function Report Format
i2b2-46
Example Use Case Scenario
Clinical Asthma Investigation
Available data includes:
Text notes from asthma clinic
Reports from pulmonary function tests
Questions…
How and when is the data extracted?
How and when is the data encrypted?
How and when is the data collated into something
meaningful and useful?
Answer!
Use the functionality provided by the i2b2 Hive
Core cells and Optional cells
Once data is gathered and processed, add this data
to the Clinical Research Chart
i2b2-47
Workflow Requirements
The Workflow Framework (WF) cell controls
communication between the other cells
Identify cells that will be needed for this workflow
Identity Management, Data Repository, Natural
Language Processing, and PFT Processing
i2b2-48
Workflow Continued…
The available data is uploaded through the Identity
Management (IM) cell
Names, medical record numbers, and other
sensitive information are resolved and retained in
the IM cell
Data is encrypted using the Advanced Encryption
Standard (AES) block cipher
Data is added to the Clinical Research Chart (CRC)
The CRC now contains a HIPAA compliant, limited
data set
[Diagram: text notes and PFT reports pass through an "Encrypt" step]
i2b2-49
Workflow Continued…
With our newly defined data set, we want to extract
concepts from the text notes
e.g. hospital discharge summaries, EMR data
WF cell retrieves notes from the CRC and sends them
to the Natural Language Processing cell (NLP)
The NLP cell manipulates the notes and extracts
specific information from them to form concepts
These concepts are then pushed back to the CRC
i2b2-50
Workflow Continued…
Similarly, we want to extract concepts from the PFT
reports
WF cell retrieves the PFT reports from the CRC and
sends them to the PFT Processing cell
The PFT cell parses the records one by one and
generates concepts from them
The values associated with each test record are placed
back into the CRC
i2b2-51
Workflow Complete
Data has now been fully processed and saved in the
CRC and is available for viewing and manipulation
Using the i2b2 Workbench Application
Allows the investigator to query, analyze, and display
the data
What did we get from this process?
Medication and diagnosis concepts related to
asthma, extracted from the text notes by NLP
Physical findings and physiological test results
extracted from the PFTs
Resulting in a wealth of valuable data for the clinical
investigator to aid in clinical discovery
i2b2-52
Crimson Project
Developed by Dr. Lynn Bry of Partners HealthCare
Project Objectives:
Provide enhanced sample management within i2b2
Support prospective and retrospective sample
collection
Prospective: requests typically routed to an external
information system
Retrospective: requests typically directed towards an
existing repository or registry
Three i2b2 cells
Regulatory cell
Sample Cohort Management cell
Sample Registry cell
https://community.i2b2.org/wiki/display/crimson/Crimson+Home
i2b2-53
Crimson Project – The Cells
Regulatory Cell
Manages the regulatory aspects associated with
sample request and sample data management
within i2b2
De-identification of data
Connection management with external systems
Storing PHI encryption keys
Sample Cohort Management Cell
Focused on translating, broadcasting, and tracking
i2b2 sample requests
Sample Registry Cell
Manage the import process of sample data from
external sources
i2b2-54
Crimson Project – Architecture
i2b2-55
SMArt Project for i2b2
Developed by Nich Wattanasin
Project Objective:
Develop a common API for SMArt applications to
interact with the i2b2 platform
Project in the very early stages of development
First release: September 14, 2010
Only 20 revisions since (as of April 2011)
Current Capabilities:
A handful of functions that return targeted
information from a single patient record
Accomplished via REST calls
Results returned in RDF/XML format
Plug-in for the i2b2 Web Client
https://community.i2b2.org/wiki/display/SMArt/SMART+Home
i2b2-56
SMArt Project – Current Functions
Get Medications
Returns a list of medications for a specific patient
record
Get Demographics
Returns the demographic information for a specific
patient record
Get Problems
Returns a list of problems for a specific patient
record
Get Allergies
Returns a list of allergies for a specific patient
record
GET http://i2b2_server/records/{record id}/{medications | demographics | problems | allergies}/
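The URL pattern above can be turned into a small helper that builds the address for each of the four calls. The helper name and the validation are assumptions; only the path layout and the four section names come from the slide, and i2b2_server is the slide's own placeholder host.

```python
from urllib.parse import quote

# The four per-record resources listed on the slide.
SECTIONS = ("medications", "demographics", "problems", "allergies")


def record_url(server, record_id, section):
    """Build the GET URL for one section of one patient record."""
    if section not in SECTIONS:
        raise ValueError(f"unknown section: {section!r}")
    # Percent-encode the record id so it is safe as a single path segment.
    safe_id = quote(str(record_id), safe="")
    return f"http://{server}/records/{safe_id}/{section}/"
```

A client would then issue a plain HTTP GET against the returned URL and parse the RDF/XML response.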
i2b2-57
SMArt Dashboard Web Client Plug-in
Ability to embed SMArt Apps directly into the i2b2
Web Client
Ability to access i2b2 patient data via the SMArt
connect model/project common API
i2b2-58
i2b2 Research Data Warehouse
A custom i2b2 implementation at Cincinnati
Children’s Hospital Medical Center (https://i2b2.cchmc.org)
Developed by the CCHMC i2b2 team
Project adds several new capabilities to the i2b2
platform:
Ability to view clinical data in a web-based form
(similar to a chart review)
Ability to enter data directly into i2b2 using forms
e.g. data that is not collected from an EMR
Ability to run reports and perform custom
visualizations on the data
CCHMC uses i2b2 to create a “research data
warehouse”
But what is a research data warehouse?
i2b2-59
What is a Research Data Warehouse?
According to CCHMC…
A research data warehouse is a repository that
integrates information on patients from multiple
sources
Electronic health records
Lab results
Genetic and research data
Birth registry data
Government data (Medicaid)
What it is used for:
Cohort identification, hypothesis generation
What it is NOT used for:
Decision support, clinical trials, real-time alerts
i2b2-60
i2b2 Research Data Warehouse
i2b2-61
Evaluating i2b2
Performance
Statistics provided by Partners HealthCare
Query Performance (on their primary i2b2 system)
4.6 million patient records
1.2 billion observations (facts) on these patients
(observation_fact table)
– Queries requesting patient counts on this repository typically
complete within 10 seconds, many within several milliseconds
Data Mart Initialization Performance
2.6 million patient records
550 million observations (facts) on these patients
8 x 3 GHz processor machine with 32 GB RAM
– Completed building in approximately 1 hour and 15 minutes
i2b2-62
Evaluating i2b2
Scalability
Enabled by the modular nature of the i2b2 cell and
ease of integration into the Hive
Encourages development outside of the i2b2 core team
Fosters rapid software development
Usability
Simple installation processes to get started
Intuitive user interfaces
Wealth of documentation publicly available online
Reduced learning curve
Interoperability
Works on a variety of operating systems, web
browsers, and server technologies
Not limited to commercial technologies
i2b2-63
Limitations
Naturally, users can create project-level repositories
(data marts) from an enterprise-level repository
Can we update our project databases with fresh,
updated enterprise data?
Can we upload our project data, regardless of
origin, into the enterprise repository?
Such capabilities are not currently supported in i2b2
Difficult to implement the numerous policies
required for these functions
i2b2-64
Limitations
i2b2 cells communicate through web services, which
are not always flexible
Perhaps we want to execute our own SQL queries?
Not possible: queries are limited to the pre-specified
queries and result sets dictated by the cells
How do we overcome this?
Developers plan to introduce a second SQL
access layer to the CRC
Will allow for greater flexibility with queries
– But will need to comply with security rules and strict ontology
i2b2-65
Summary
Presented i2b2 as a software tool and a data model
aiding in clinical research and discovery
Addresses the inherent challenges of integrating
medical record and clinical research data
Relatively young project, but on the fast track for
growth and development
Roadmap for future releases with a new version
currently in release candidate (RC) status
Adoption and usage in BMI looks promising
Approximately 17 sites outside of Partners
HealthCare are engaged in i2b2 projects
i2b2-66
Thank You!
i2b2-67