PPT - School of Engineering

Topics in Biomedical Informatics
CSE 5095
Informatics for Integrating Biology and the Bedside (i2b2)
http://www.i2b2.org
Antonio Cusano
Computer Science & Engineering Department
The University of Connecticut
371 Fairfield Road, Box U-255
Storrs, CT 06269-2155
Spring 2011
i2b2-1
Overview

Introduction to i2b2
Modeling the i2b2 Data Model

Overview of the i2b2 Software Tools

Using the i2b2 Software

Overview of the i2b2 Hive Cells

Example Use Case Scenario

Notable Projects & Usage in BMI

Evaluating i2b2

Summary
i2b2-2
Background & Motivation

The rise of Electronic Medical Record Systems (EMRS) holds great promise for clinical research
  - Integration between medical record data and clinical research data is increasingly important
But many challenges exist:
  - EMRS are typically built with the “single patient” in mind
    - This makes it difficult to observe trends in data across combinations of many patients
  - How do we “clean” EMR data at a global, enterprise level without compromising the data?
    - Removal of some data by person X could be a devastating loss to person Y
  - How do we maintain patient privacy?
i2b2-3
Background & Motivation

What do we need?
  - A system that supports queries that cut across multiple patients
    - More dependent on standard descriptors
  - A system that can process and understand complex queries and specifications
  - A system that can integrate medical record data and clinical research data
    - Provide a robust data model
  - A system that protects the privacy of the patients
Solution?
i2b2-4
Introducing i2b2

Informatics for Integrating Biology and the Bedside
  - One of seven NIH Roadmap National Centers for Biomedical Computing (http://www.ncbcs.org)
    - Funded under the NIH Common Fund
    - Part of the networked national effort to build the infrastructure for biomedical computing
Established in 2004
Based at Partners HealthCare in Boston, Massachusetts
  - Non-profit, integrated health system founded by Brigham and Women’s Hospital and Massachusetts General Hospital
Principal Investigator:
  - Isaac Kohane, M.D., Ph.D., Professor of Pediatrics at Harvard Medical School
i2b2-5
Mission Statement

Overcome two major obstacles:
  - The computational challenges of discovery across large, heterogeneous data sets routinely obtained in clinical care
  - The lack of knowledge of genomic-level physiology and how to study it
Therefore, the goals of i2b2 are:
  - To provide clinical researchers with the software tools necessary to collect and integrate medical record and clinical research data in the genomics age
    - By creating a software suite that constructs and manages the modern clinical research chart
i2b2-6
i2b2 Software Tools
The i2b2 Hive
The Clinical Research Chart
ver. 1.5.2
i2b2-7
i2b2 Software – Design Objectives

Design focused around several goals:
  - Provide a secure presentation of patient information for research purposes
  - Provide a software framework that can be easily extended
  - Provide secure communication capabilities for that framework
  - Provide a flexible data model tuned to the needs of patient-specific information
    - Must deliver timely and scalable query performance
    - Adaptable to new and unanticipated representations of health care information
i2b2-8
Identifying the Data Model Requirements

Developers identified these key requirements for constructing a data model for i2b2:
  - Integration of data from distributed and differently structured databases
    - In order to perform comprehensive and integrative analyses
  - Separation of data used for research from daily operational or transactional data
    - Eliminates performance impact on the operational systems and maintains data integrity
  - Standardization of a model across systems
    - Ensure all i2b2 systems possess the same data model to enable data sharing
  - Ease of use by end-users
i2b2-9
Dimensional Modeling

Model the database using two concepts:
  - Facts
    - The quantitative or factual data being queried
  - Dimensions
    - Descriptions of the various facts
i2b2-10
Star Schema

Possesses a central “fact” table where each row represents a single fact
  - A fact is an observation of a patient
    - Diagnoses, Procedures, Genetic Data, Lab Data, Health History, Demographics Data, etc.
  - An observation is not the same thing as an event
    - Observations are recorded by a specific observer within a specific time range regarding a specific concept
The fact table is surrounded by dimension tables
  - Four dimension tables
    - Concept, Provider, Visit, Patient
  - They contain descriptors that characterize the facts
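The star layout described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the actual i2b2 schema: only the central fact table and the four dimensions (Concept, Provider, Visit, Patient) come from the slide, while every column name and all sample rows are assumptions made up for the example.

```python
import sqlite3

# Hypothetical, heavily simplified columns; only the table roles come from the slide.
SCHEMA = """
CREATE TABLE patient_dimension  (patient_num   INTEGER PRIMARY KEY, sex TEXT);
CREATE TABLE visit_dimension    (encounter_num INTEGER PRIMARY KEY, start_date TEXT);
CREATE TABLE provider_dimension (provider_id   TEXT PRIMARY KEY, name TEXT);
CREATE TABLE concept_dimension  (concept_cd    TEXT PRIMARY KEY, name TEXT);
CREATE TABLE observation_fact (
    patient_num   INTEGER REFERENCES patient_dimension(patient_num),
    encounter_num INTEGER REFERENCES visit_dimension(encounter_num),
    provider_id   TEXT    REFERENCES provider_dimension(provider_id),
    concept_cd    TEXT    REFERENCES concept_dimension(concept_cd),
    start_date    TEXT
);
"""


def build_demo_db():
    """Create an in-memory star schema with two made-up asthma observations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patient_dimension VALUES (?, ?)", [(1, "F"), (2, "M")])
    conn.executemany("INSERT INTO visit_dimension VALUES (?, ?)",
                     [(10, "2011-01-05"), (11, "2011-02-01")])
    conn.execute("INSERT INTO provider_dimension VALUES ('P1', 'Dr. Example')")
    conn.execute("INSERT INTO concept_dimension VALUES ('DX:ASTHMA', 'Asthma')")
    # Each fact row is one observation: who, when, by whom, about which concept.
    conn.executemany(
        "INSERT INTO observation_fact VALUES (?, ?, ?, ?, ?)",
        [(1, 10, "P1", "DX:ASTHMA", "2011-01-05"),
         (2, 11, "P1", "DX:ASTHMA", "2011-02-01")],
    )
    return conn


def count_patients_with(conn, concept_name):
    """Count distinct patients having at least one fact for the named concept."""
    row = conn.execute(
        """SELECT COUNT(DISTINCT f.patient_num)
           FROM observation_fact f
           JOIN concept_dimension c ON c.concept_cd = f.concept_cd
           WHERE c.name = ?""",
        (concept_name,),
    ).fetchone()
    return row[0]


conn = build_demo_db()
print(count_patients_with(conn, "Asthma"))
```

The point of the design is that a cross-patient question becomes a single join from the fact table to one dimension, rather than a walk through per-patient records.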
i2b2-11
Star Schema
i2b2-12
Star Schema Performance

Enterprise repositories and project-specific, local repositories can contain very large amounts of data
  - The central fact table can grow very large as a result, impacting performance
It is critical to have indexes on that table to maintain stable performance
  - Use system-specific enhancements when possible
    - SQL Server databases, for example, can define a clustered index on a table so that its rows are stored in sorted key order
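The effect of an index on the fact table can be observed with a small experiment. The sketch below uses SQLite (via Python's sqlite3) rather than SQL Server, so it shows an ordinary secondary index, not a clustered one; the column names and the index name `idx_fact_concept` are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT, start_date TEXT)"
)
# Made-up facts: 1000 rows spread over 10 concept codes.
conn.executemany(
    "INSERT INTO observation_fact VALUES (?, ?, ?)",
    [(i, f"C{i % 10}", "2011-01-01") for i in range(1000)],
)


def query_plan(conn, sql, params=()):
    """Return SQLite's textual query plan (the last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


SQL = "SELECT patient_num FROM observation_fact WHERE concept_cd = ?"

before = query_plan(conn, SQL, ("C5",))   # no index yet: full table scan
conn.execute("CREATE INDEX idx_fact_concept ON observation_fact (concept_cd)")
after = query_plan(conn, SQL, ("C5",))    # planner can now search the index

print("before:", before)
print("after: ", after)
```

The exact plan wording differs between SQLite versions, but after the index is created the plan names `idx_fact_concept` instead of scanning the whole table.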
i2b2-13
i2b2 Software – Purpose

Serves two primary use cases:
  - Enable enterprise-wide repurposing and distribution of medical record data for research
    - Enable high performance collection of medical record data for querying and distribution
    - Enable discovery within data on a wide scale
  - Enable usage of medical record data in clinical studies
How do we achieve these use cases?
  - Use the i2b2 Software Tools!
    - The i2b2 Hive
    - The Clinical Research Chart
      - A core component of the i2b2 Hive
i2b2-14
What is the i2b2 Hive?

A collection of interoperable services provided by i2b2 cells
  - Each cell behaves as a functional service
Cells are loosely coupled (independence)
Cells do not know their relative locality (proximity)
Cells are connected and communicate with each other using web services
  - Can be invoked manually by the user
  - Can be invoked automatically by the system workflow
What do we notice?
  - Highly modular architecture
  - Highly scalable
i2b2-15
What are i2b2 Cells?

The i2b2 cell is the basic building block of the i2b2 environment
  - An application “wrapped” into a functional unit
Encapsulates business logic as well as access to data objects behind standard web service interfaces
  - Supported services include REST, SOAP
  - Communication using XML messages
[Figure: an i2b2 cell. Business logic and data access to data objects sit behind the i2b2 web service interfaces (REST, SOAP), which exchange XML over HTTP.]
i2b2-16
Structure of the XML Message

XML schema that defines:
  - A header for communication management
  - A header for the message request/response
  - A message body that contains the data
    - For example, can contain patient sets with their:
      - Phenotypic (Clinical) and Genotypic Data
      - References to other data objects (images, attachments)
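The three-part message layout (communication header, request/response header, body) can be sketched with Python's xml.etree.ElementTree. Only `message_body` appears elsewhere in these slides; the root element `request`, the header element names, and the child fields are assumptions chosen for illustration and should be checked against the real i2b2 message schema.

```python
import xml.etree.ElementTree as ET


def build_message(sender, request_type, body_fields):
    """Build a message with two headers and a body (element names are illustrative)."""
    root = ET.Element("request")

    # Header 1: communication management (who is talking).
    message_header = ET.SubElement(root, "message_header")
    ET.SubElement(message_header, "sending_application").text = sender

    # Header 2: information about the request itself.
    request_header = ET.SubElement(root, "request_header")
    ET.SubElement(request_header, "request_type").text = request_type

    # Body: the actual data payload.
    body = ET.SubElement(root, "message_body")
    for tag, text in body_fields:
        ET.SubElement(body, tag).text = text
    return root


message = build_message("demo-client", "get_patient_set", [("patient_set_id", "42")])
xml_text = ET.tostring(message, encoding="unicode")
parsed = ET.fromstring(xml_text)   # round-trip to show the text is well-formed
print(xml_text)
```

Because the payload is plain XML, any cell or client that can parse XML can read it, regardless of the language it is written in.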
i2b2-17
Example XML Message Header
i2b2-18
Example XML Message Body
i2b2-19
Advantages of Web Services

Because all communication is in XML…
  - Not limited to any single operating system
  - Not limited to any single programming language
Cells can be developed in Microsoft .NET, Perl, Python, Java, etc.
  - Any language that supports REST or SOAP capability can be used
Cells can exist on Windows, Linux, and Mac OS and communicate with each other
  - e.g. cells residing on a Windows platform can talk with those on a UNIX platform
No restriction on how simple or complex a cell can be
  - XML tags the data
  - REST/SOAP transfers the data
i2b2-20
But Where’s the User Interface?

Web services do not provide a visual user interface
The developer is required to build a client component
  - Must include a Graphical User Interface (GUI) and Control Mechanism for user interaction
Some considerations:
  - Should utilize the web service interfaces for communication, rather than a home-brew approach
    - Must ensure cell-to-cell communication is maintained
  - Reuse the functionality of existing cells
i2b2-21
How are Cells Classified?

The i2b2 Hive is composed of a number of cells with varying importance and functionality
  - Core cells are essential for operation of the Hive
    - Provide basic services
    - Written in Java to the J2EE specifications
    - Front-end clients written using the Standard Widget Toolkit (SWT)
      - Provides native OS look-and-feel for the user interfaces
  - Optional and Plug-in type cells add functionality to the Hive but are not essential
Special Hive Cells:
  - The Clinical Research Chart
  - The i2b2 Web Client
  - The i2b2 Workbench Application
i2b2-22
The Clinical Research Chart

The Clinical Research Chart (CRC) is the implementation of the Star Schema in i2b2
  - Functions as the integrated data repository for the i2b2 Hive
    - Core cell of the i2b2 Hive (Data Repository Cell)
    - Requires all core cells to gain complete functionality
      - In fact, the main purpose of the other Core cells is to support the activities of the CRC
Fundamentally built to store medical data
  - Which can be accessed by any cell in the i2b2 Hive
  - Similarly, any cell can contribute data to the CRC
i2b2-23
The Clinical Research Chart

Useful for:
  - Repurposing patient data and integrating it with genomic data and clinical trial data for clinical research
Important to note:
  - Not a mechanism for searching through hospital clinical systems
  - Not a transaction system to manage clinical trials
i2b2-24
The i2b2 Web Client

Designed for enterprise related activities
  - e.g. selecting patients from an enterprise repository
Written entirely in JavaScript, HTML, and CSS
  - Uses AJAX to eliminate page refreshing
Cross platform and compatible with most browsers
  - Known compatibility issues with IE5 and lower
Easy to deploy and update
Important to note:
  - Can create patient sets and retrieve patient counts
  - Only anonymous patient data is shown
    - Aggregate totals are obfuscated by adding or subtracting a small random number
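The count obfuscation mentioned above can be sketched as follows. This is a hedged illustration of the general idea only: the noise range (`max_noise=3`), the uniform distribution, and the clamp at zero are assumptions, not the parameters i2b2 actually uses.

```python
import random


def obfuscate_count(true_count, max_noise=3, rng=random):
    """Return the count shifted by a small random integer in [-max_noise, max_noise].

    max_noise and the uniform distribution are illustrative assumptions.
    The result is clamped so a displayed count is never negative.
    """
    noise = rng.randint(-max_noise, max_noise)
    return max(0, true_count + noise)


rng = random.Random(2011)          # seeded only to make the demo repeatable
shown = [obfuscate_count(100, rng=rng) for _ in range(5)]
print(shown)                       # five values, each within 3 of the true count
```

Reporting a slightly perturbed total makes it harder to single out an individual by comparing the counts of two nearly identical queries.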
i2b2-25
The i2b2 Workbench Application

Designed for project-based use
  - e.g. data manipulation, visual analytics
Written in Java using the Eclipse Framework
  - The client applications are Eclipse plug-ins which compose the workbench application
  - Can be extended with other Java/Eclipse plug-ins
More resource intensive than its web companion
  - Helpful for heavy client-side processing
i2b2-26
How to use the i2b2 Software

First, use the web or desktop client to select/query
patients from the enterprise data repository (EDR)
i2b2-27
Creating the Query

Patient attributes are dragged from the “Terms” panels into the “Query Tool” panels
  - Terms in the same panel are logically OR’d
  - Terms in different panels are logically AND’d
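The panel semantics (OR within a panel, AND across panels) map directly onto set union and intersection. The sketch below assumes a hypothetical lookup table from term to the set of matching patient numbers; the terms and patient numbers are made up.

```python
def run_query(panels, patients_by_term):
    """Evaluate a panel query: union inside each panel, intersection across panels."""
    result = None
    for panel in panels:
        panel_patients = set()
        for term in panel:
            # OR: a patient matching any term in the panel qualifies for the panel.
            panel_patients |= patients_by_term.get(term, set())
        # AND: a patient must qualify for every panel.
        result = panel_patients if result is None else result & panel_patients
    return result if result is not None else set()


# Hypothetical term index: term -> set of patient numbers.
patients_by_term = {
    "asthma": {1, 2, 3},
    "copd": {4},
    "female": {1, 4, 5},
}

# Panel 1: asthma OR copd; Panel 2: female.
matches = run_query([["asthma", "copd"], ["female"]], patients_by_term)
print(sorted(matches))
```

Here panel 1 yields patients 1 to 4, panel 2 yields 1, 4, and 5, and only patients 1 and 4 satisfy both.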
i2b2-28
How to use this Data?

Querying from an EDR returns limited data
  - A patient count from the results of the query
  - Aggregate counts of the demographics of these patients
Not very useful for research purposes in current form
  - In order to effectively use this data, patient sets must be saved into a new, project-specific database
    - Will be saved in your local i2b2 installation
This process is known as creating a “data mart”
  - Requires IRB approval
i2b2-29
Creation of a Data Mart

A data mart ensures patient privacy by only storing information allowed under HIPAA regulations
  - Protected Health Information (PHI) is not included in the data mart
Data is saved in the CRC (Star Schema DB Model)
i2b2-30
Working with the Data

Use the i2b2 Workbench Application to view &
manipulate the data from your data mart
i2b2-31
User & Hive Interaction

When using the web or desktop client, you’re not just accessing the Clinical Research Chart directly
  - In fact, most interaction incorporates the functionalities of many i2b2 Cells
    - At the minimum, all core cells are used in some way
What do these other cells do?
[Figure: core cells of the i2b2 Hive: Project Management, Ontology Management, Data Repository (CRC), Identity Management, File Repository, Workflow Management]
i2b2-32
Workflow Framework Cell

This cell is used to process information in steps through various parts of the Hive
  - Most processed information will come to reside in the CRC or be displayed to the user
Specifically:
  - Facilitates communication between cells
  - Manages project-specific XML data objects for users of a given project
    - These objects typically originate in other cells
    - These objects are organized in hierarchical structures that represent relationships between elements
  - Allows users to organize, label, and annotate data objects
i2b2-33
Workflow Framework Cell

Use Case Diagram
i2b2-34
Workflow Framework Cell

Operations and Descriptions
i2b2-35
Workflow Framework Cell

We can see the Workflow Framework Cell at work in the i2b2 Web and Desktop Clients
  - For example, providing hierarchical structure for concepts and patient sets
i2b2-36
Project Management Cell

This cell is used to provide user authentication and manage group and role information
  - User access is determined by a user’s role
    - Defines what actions they may perform in the Hive
  - Default role is User
    - Other roles include Manager, Administrator
  - Users can have one or more roles
It also keeps track of what cells are part of the Hive and their location
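Role-based access of the kind described for this cell can be sketched as a lookup from role to permitted actions. The role names User, Manager, and Administrator come from the slide; the action names and which role may do what are purely hypothetical.

```python
# Role names are from the slide; the actions attached to each role are assumptions.
ROLE_ACTIONS = {
    "User": {"run_query"},
    "Manager": {"run_query", "create_data_mart"},
    "Administrator": {"run_query", "create_data_mart", "manage_users"},
}

DEFAULT_ROLES = ("User",)


def allowed_actions(roles=DEFAULT_ROLES):
    """Union of the actions granted by each of the user's roles."""
    actions = set()
    for role in roles:
        actions |= ROLE_ACTIONS.get(role, set())
    return actions


def is_authorized(roles, action):
    """True if any of the user's roles permits the action."""
    return action in allowed_actions(roles)


print(is_authorized(["User"], "create_data_mart"))
print(is_authorized(["User", "Manager"], "create_data_mart"))
```

Because a user can hold several roles, the check is made against the union of their permissions rather than against a single role.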


i2b2-37
Project Management Cell

Can be accessed by either an i2b2 client or by another i2b2 cell
  - Client: a user trying to log in to the client
  - Cell: checks which roles the user holds for that cell
    - Authentication and Authorization
Use Case Diagram:
i2b2-38
File Repository Cell

Fundamentally, this cell holds large files of data
  - Radiological images, genetic sequences
  - These files are generally referenced from the Clinical Research Chart
Manages the sending and receiving of these files between cells
  - Other cells will use REST or SOAP service calls to access files in this cell under most conditions
Users can use this cell to upload files
XML Request format:
<message_body>
<recvfile_request>
<filename>/oasis/ABT001b/brain_324.jpg</filename>
</recvfile_request>
</message_body>
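A receiving cell would need to pull the file name out of such a request. The sketch below parses the request body with Python's xml.etree.ElementTree, assuming the closing tag matches the opening `recvfile_request` tag (an XML parser rejects mismatched tags). The helper name `requested_filename` is hypothetical.

```python
import xml.etree.ElementTree as ET

REQUEST = """<message_body>
  <recvfile_request>
    <filename>/oasis/ABT001b/brain_324.jpg</filename>
  </recvfile_request>
</message_body>"""


def requested_filename(xml_text):
    """Extract the requested file name from a recvfile_request message body."""
    body = ET.fromstring(xml_text)          # raises ET.ParseError if not well-formed
    node = body.find("recvfile_request/filename")
    if node is None or not (node.text or "").strip():
        raise ValueError("message body contains no recvfile_request/filename")
    return node.text.strip()


print(requested_filename(REQUEST))
```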
i2b2-39
Ontology Management Cell

Manages the terminology and knowledge information typically used in the Hive, especially in the CRC
  - Provides descriptive terms and other information for data stored in the observation_fact table
    - This metadata is stored in one or more separate tables outside of the Star Schema
  - These vocabulary terms are organized in hierarchical structures (Workflow Framework)
This information is either requested by or distributed to cells during most of the Hive’s transactions
Use Case Diagram:



i2b2-40
Ontology Management Cell

Typical Ontology Table
Hierarchical level
Full path that leads to the term
Descriptive text value
Is field a synonym for another term?
Display icon used in the user interface
Field not used in i2b2
Describes ontological concept
Extra information about the concept in XML
Column name in fact table that holds concept code
Name of look-up table that holds concept code
Name of field that holds concept path
T for text or N for numeric
SQL operator used in WHERE clause for queries
Dimension table path that maps to the concept
Store miscellaneous comments
Tooltip that appears in the user interface
Date the data was updated
Date the data was downloaded
Date the data was imported
Coded value for the originating source system
Coded value indicating term type: DOC or LAB
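Several of the fields listed above (fact-table column, look-up table, path field, SQL operator, dimension path) together describe how an ontology term is turned into a query. The sketch below shows one plausible way to assemble such a query; the attribute names of `OntologyRow`, the sample values, and the exact SQL shape are assumptions, since the slide's column names were part of a figure that is not reproduced here.

```python
from dataclasses import dataclass

ALLOWED_OPERATORS = {"LIKE", "="}


@dataclass
class OntologyRow:
    """Hypothetical names for a subset of the ontology fields described above."""
    fact_column: str    # column in the fact table that holds the concept code
    lookup_table: str   # look-up (dimension) table that holds the concept code
    path_column: str    # field in the look-up table that holds the concept path
    operator: str       # SQL operator used in the WHERE clause
    dim_path: str       # dimension path that maps to the concept


def build_fact_query(row):
    """Return (sql, params) selecting patients whose facts fall under the term."""
    operator = row.operator.upper()
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {row.operator}")
    # With LIKE, match the term and everything beneath it in the hierarchy.
    value = row.dim_path + "%" if operator == "LIKE" else row.dim_path
    # Identifiers come from trusted metadata here; the path value is parameterized.
    sql = (
        f"SELECT DISTINCT patient_num FROM observation_fact "
        f"WHERE {row.fact_column} IN "
        f"(SELECT {row.fact_column} FROM {row.lookup_table} "
        f"WHERE {row.path_column} {operator} ?)"
    )
    return sql, (value,)


asthma = OntologyRow("concept_cd", "concept_dimension", "concept_path",
                     "LIKE", "\\Diagnoses\\Respiratory\\Asthma\\")
sql, params = build_fact_query(asthma)
print(sql)
print(params)
```

Storing the query ingredients as metadata is what lets new vocabularies be added without changing the star schema itself.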
i2b2-41
Identity Management Cell

Manages a patient's protected health information in a manner consistent with HIPAA privacy rules
  - Patient data is available only as a HIPAA defined “Limited Data Set”
    - Removal of patient identifiers
  - Uses a “code book” that maps the real patient identifiers to arbitrary patient numbers in the CRC
Design and Architecture documents are not publicly available for this cell
  - It’s a secret?
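Since the cell's design documents are not public, the following is only a guess at what a "code book" could look like: a private mapping from real identifiers to arbitrary patient numbers, applied while stripping identifying fields. The field names (`mrn`, `name`, `address`) and the sequential numbering are assumptions for illustration.

```python
import itertools

# Hypothetical identifying fields to strip before data leaves the cell.
IDENTIFIER_FIELDS = {"mrn", "name", "address"}


class CodeBook:
    """Maps real patient identifiers to arbitrary, stable patient numbers."""

    def __init__(self):
        self._numbers = {}                 # real identifier -> patient number
        self._next = itertools.count(1)

    def patient_num(self, real_id):
        """Return the number for real_id, assigning a new one on first sight."""
        if real_id not in self._numbers:
            self._numbers[real_id] = next(self._next)
        return self._numbers[real_id]

    def deidentify(self, record):
        """Copy the record without identifier fields, keyed by patient number."""
        cleaned = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
        cleaned["patient_num"] = self.patient_num(record["mrn"])
        return cleaned


book = CodeBook()
note = {"mrn": "MRN-0042", "name": "Jane Doe", "text": "Wheezing improved."}
print(book.deidentify(note))
```

The mapping stays inside the code book, so downstream cells can link a patient's records by number without ever seeing the real identifier.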
i2b2-42
Optional i2b2 Cells

Natural Language Processing Cell
  - Manipulates text reports to extract specific terms and knowledge from them
    - Extract concepts such as diagnoses, smoking status
  - These concepts are then used to achieve various representations of the data
Concepts returned are divided into three categories:
  - UMLS concepts
    - Mapping parts of the document to concepts in the Unified Medical Language System (UMLS) database
  - Regular Expression concepts
    - Matching document text to a set of regular expression rules
  - Smoking Status concepts
    - Classification model trained on human-annotated smoking-related sentences
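Of the three categories, the regular-expression concepts are the easiest to illustrate. The sketch below matches note text against a small rule set; the rules and the concept codes (`DX:ASTHMA`, `MED:ALBUTEROL`) are invented for the example and are not the NLP cell's actual rules.

```python
import re

# Hypothetical rule set: concept code -> pattern that signals the concept.
RULES = {
    "DX:ASTHMA": re.compile(r"\basthma\b", re.IGNORECASE),
    "MED:ALBUTEROL": re.compile(r"\balbuterol\b", re.IGNORECASE),
}


def extract_concepts(text):
    """Return the sorted concept codes whose pattern occurs in the text."""
    return sorted(code for code, pattern in RULES.items() if pattern.search(text))


print(extract_concepts("Pt with Asthma, started albuterol inhaler."))
print(extract_concepts("No respiratory complaints."))
```

Simple matching like this ignores negation ("no asthma"), which is one reason the cell also relies on UMLS mapping and a trained classifier.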
i2b2-43
Natural Language Processing
i2b2-44
Optional i2b2 Cells

Pulmonary Function Test (PFT) Processing Cell
  - Parses a pulmonary function report and extracts embedded test values
    - Report must be in a specific format
  - Returned values may be stored in the CRC and used in queries or other types of analyses
Report format not specified in any official i2b2 documentation, but examples have been published
  - Provides some idea about the required format
i2b2-45
Pulmonary Function Report Format
i2b2-46
Example Use Case Scenario

Clinical Asthma Investigation
  - Available data includes:
    - Text notes from asthma clinic
    - Reports from pulmonary function tests
Questions…
  - How and when is the data extracted?
  - How and when is the data encrypted?
  - How and when is the data collated into something meaningful and useful?
Answer!
  - Use the functionality provided by the i2b2 Hive
    - Core cells and Optional cells
  - Once data is gathered and processed, add this data to the Clinical Research Chart
i2b2-47
Workflow Requirements

The Workflow Framework (WF) cell controls communication between the other cells
Identify cells that will be needed for this workflow
  - Identity Management, Data Repository, Natural Language Processing, and PFT Processing
i2b2-48
Workflow Continued…

The available data is uploaded through the Identity Management (IM) cell
  - Names, medical record numbers, and other sensitive information are resolved and retained in the IM cell
  - Data is encrypted with the Advanced Encryption Standard (AES) block cipher
Data is added to the Clinical Research Chart (CRC)
The CRC now contains a HIPAA compliant, limited data set
[Figure: Text Notes and PFT Reports pass through an Encrypt step]
i2b2-49
Workflow Continued…

With our newly defined data set, we want to extract concepts from the text notes
  - e.g. hospital discharge summaries, EMR data
WF cell retrieves notes from the CRC and sends them to the Natural Language Processing (NLP) cell
The NLP cell manipulates the notes and extracts specific information from them to form concepts
These concepts are then pushed back to the CRC
i2b2-50
Workflow Continued…

Similarly, we want to extract concepts from the PFT reports
WF cell retrieves the PFT reports from the CRC and sends them to the PFT Processing cell
The PFT cell parses the records one by one and generates concepts from them
The values associated with each test record are placed back into the CRC
i2b2-51
Workflow Complete

Data has now been fully processed and saved in the CRC and is available for viewing and manipulation
  - Using the i2b2 Workbench Application
    - Allows the investigator to query, analyze, and display the data
What did we get from this process?
  - Medication and diagnosis concepts related to asthma, extracted from the text notes by the NLP cell
  - Physical findings and physiological test results extracted from the PFTs
Resulting in a wealth of valuable data for the clinical investigator to aid in clinical discovery
i2b2-52
Crimson Project

Developed by Dr. Lynn Bry of Partners HealthCare
Project Objectives:
  - Provide enhanced sample management within i2b2
  - Support prospective and retrospective sample collection
    - Prospective: requests typically routed to an external information system
    - Retrospective: requests typically directed towards an existing repository or registry
Three i2b2 cells
  - Regulatory cell
  - Sample Cohort Management cell
  - Sample Registry cell
https://community.i2b2.org/wiki/display/crimson/Crimson+Home
i2b2-53
Crimson Project – The Cells

Regulatory Cell
  - Manages the regulatory aspects associated with sample request and sample data management within i2b2
    - De-identification of data
    - Connection management with external systems
    - Storing PHI encryption keys
Sample Cohort Management Cell
  - Focused on translating, broadcasting, and tracking i2b2 sample requests
Sample Registry Cell
  - Manages the import process of sample data from external sources
i2b2-54
Crimson Project – Architecture
i2b2-55
SMArt Project for i2b2

Developed by Nich Wattanasin
Project Objective:
  - Develop a common API for SMArt applications to interact with the i2b2 platform
Project in the very early stages of development
  - First release: September 14, 2010
  - Only 20 revisions since (as of April 2011)
Current Capabilities:
  - A handful of functions that return targeted information from a single patient record
    - Accomplished via REST calls
    - Results returned in RDF/XML format
  - Plug-in for the i2b2 Web Client
https://community.i2b2.org/wiki/display/SMArt/SMART+Home
i2b2-56
SMArt Project – Current Functions

Get Medications
  - Returns a list of medications for a specific patient record
Get Demographics
  - Returns the demographic information for a specific patient record
Get Problems
  - Returns a list of problems for a specific patient record
Get Allergies
  - Returns a list of allergies for a specific patient record
GET http://i2b2_server/records/{record id}/{medications | demographics | problems | allergies}/
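The URL pattern above can be assembled programmatically. The sketch below only builds the URL and does not contact a server; `i2b2_server` is the placeholder host from the slide, and the helper name `record_url` is hypothetical.

```python
from urllib.parse import quote

# The four record sections named on the slide.
SECTIONS = ("medications", "demographics", "problems", "allergies")


def record_url(base_url, record_id, section):
    """Build the GET URL for one section of one patient record."""
    if section not in SECTIONS:
        raise ValueError(f"unknown section: {section!r}")
    # Percent-encode the record id so it is safe as a single path segment.
    safe_id = quote(str(record_id), safe="")
    return f"{base_url.rstrip('/')}/records/{safe_id}/{section}/"


print(record_url("http://i2b2_server", "12345", "medications"))
```

A client would issue an HTTP GET against the resulting URL and, per the slide, receive the result as RDF/XML.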
i2b2-57
SMArt Dashboard Web Client Plug-in


Ability to embed SMArt Apps directly into the i2b2
Web Client
Ability to access i2b2 patient data via the SMArt
connect model/project common API
i2b2-58
i2b2 Research Data Warehouse

A custom i2b2 implementation at Cincinnati Children’s Hospital Medical Center (CCHMC) (https://i2b2.cchmc.org)
Developed by the CCHMC i2b2 team
Project adds several new capabilities to the i2b2 platform:
  - Ability to view clinical data in a web-based form (similar to a chart review)
  - Ability to enter data directly into i2b2 using forms
    - e.g. data that is not collected from an EMR
  - Ability to run reports and perform custom visualizations on the data
CCHMC uses i2b2 to create a “research data warehouse”
  - But what is a research data warehouse?


i2b2-59
What is a Research Data Warehouse?

According to CCHMC…
  - A research data warehouse is a repository that integrates information on patients from multiple sources
    - Electronic health records
    - Lab results
    - Genetic and research data
    - Birth registry data
    - Government data (Medicaid)
What it is used for:
  - Cohort identification, hypothesis generation
What it is NOT used for:
  - Decision support, clinical trials, real-time alerts
i2b2-60
i2b2 Research Data Warehouse
i2b2-61
Evaluating i2b2

Performance
  - Statistics provided by Partners HealthCare
  - Query Performance (on their primary i2b2 system)
    - 4.6 million patient records
    - 1.2 billion observations (facts) on these patients (observation_fact table)
      - Queries requesting patient counts on this repository typically complete within 10 seconds, many within several milliseconds
  - Data Mart Initialization Performance
    - 2.6 million patient records
    - 550 million observations (facts) on these patients
    - Machine with 8 × 3 GHz processors and 32 GB RAM
      - Completed building in approximately 1 hour and 15 minutes
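To put the reported figures in perspective, a little arithmetic gives the implied load rate and the average number of facts per patient. The inputs below are the rounded numbers quoted above, so the results are rough estimates only.

```python
def facts_per_second(facts, hours, minutes):
    """Average number of facts processed per second over the given duration."""
    return facts / (hours * 3600 + minutes * 60)


def facts_per_patient(facts, patients):
    """Average number of observation facts per patient record."""
    return facts / patients


# Data mart build: 550 million facts in about 1 hour 15 minutes.
print(round(facts_per_second(550_000_000, 1, 15)))      # about 122 thousand per second

# Facts per patient in the data mart and in the primary system.
print(round(facts_per_patient(550_000_000, 2_600_000)))
print(round(facts_per_patient(1_200_000_000, 4_600_000)))
```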
i2b2-62
Evaluating i2b2

Scalability
  - Enabled by the modular nature of the i2b2 cell and ease of integration into the Hive
    - Encourages development outside of the i2b2 core team
    - Fosters rapid software development
Usability
  - Simple installation processes to get started
  - Intuitive user interfaces
  - Wealth of documentation publicly available online
    - Reduced learning curve
Interoperability
  - Works on a variety of operating systems, web browsers, and server technologies
    - Not limited to commercial technologies
i2b2-63
Limitations

Naturally, users can create project-level repositories (data marts) from an enterprise-level repository
  - Can we update our project databases with fresh, updated enterprise data?
  - Can we upload our project data, regardless of origin, into the enterprise repository?
Such capabilities are not currently supported in i2b2
  - Difficult to implement the numerous policies required for these functions
i2b2-64
Limitations

i2b2 cells communicate through web services, which are not always flexible
  - Perhaps we want to execute our own SQL queries?
    - Not possible: queries are limited to pre-specified queries and result sets, dictated by the cells
How do we overcome this?
  - Developers plan to introduce a second SQL access layer to the CRC
    - Will allow for greater flexibility with queries
      - But will need to comply with security rules and a strict ontology
i2b2-65
Summary

Presented i2b2 as a software tool and a data model aiding in clinical research and discovery
  - Addresses the inherent challenges of integrating medical record and clinical research data
Relatively young project, but on the fast track for growth and development
  - Roadmap for future releases with a new version currently in release candidate (RC) status
Adoption and usage in BMI looks promising
  - Approximately 17 sites outside of Partners HealthCare are engaged in i2b2 projects
i2b2-66
Thank You!
i2b2-67