Tumor Profile Discovery and Tumor Bank Management with DORA
www.polyomx.org
Adrian Driga (1), Russ Greiner (2), Kathryn Graham (1, 4), Sambasivarao Damaraju (1, 4), David Wishart (2, 3), John Mackey (1, 4), Carol Cass (1, 4)
DORA (Database for Online Retrieval and Analysis) is a web-accessible medical and laboratory information management system (LIMS) through which clinical, microarray, SNP, and metabonomic information from PolyomX-consented patients is stored, retrieved, managed, and analyzed. DORA is designed for data warehousing and has a flexible relational database architecture that can readily be scaled up to accommodate clinical data from new cancer types or experimental data from new laboratory assays. PolyomX currently collects clinical and molecular data for four cancer types: breast, lung, ovarian, and gastric.
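The relational layout described above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for DORA's MySQL database. Every table and column name is hypothetical and only loosely modeled on the Patient, Cancer Disease, and Pathology entities of the database schema overview (Figure 1). The sketch shows one-to-many links enforced by foreign keys. It also shows how a clinical section for a new cancer type could be added as a new table that points at an existing one, without altering any existing table.

```python
import sqlite3

# Hypothetical table and column names, loosely modeled on Figure 1; not DORA's actual schema.
CORE_SCHEMA = """
CREATE TABLE patient (
    patient_id   INTEGER PRIMARY KEY,
    study_code   TEXT NOT NULL UNIQUE  -- pseudonymous code instead of direct identifiers
);
CREATE TABLE cancer_disease (
    disease_id   INTEGER PRIMARY KEY,
    patient_id   INTEGER NOT NULL REFERENCES patient(patient_id),  -- 1 patient : M diseases
    cancer_type  TEXT NOT NULL
);
CREATE TABLE pathology (
    pathology_id INTEGER PRIMARY KEY,
    disease_id   INTEGER NOT NULL REFERENCES cancer_disease(disease_id),  -- 1 disease : M reports
    grade        TEXT
);
"""

# Hypothetical clinical section for a new cancer type: a new table that references
# cancer_disease. None of the existing tables has to change.
NEW_SECTION = """
CREATE TABLE brain_cns_clinical (
    record_id    INTEGER PRIMARY KEY,
    disease_id   INTEGER NOT NULL REFERENCES cancer_disease(disease_id),
    who_grade    TEXT
);
"""

def build_database() -> sqlite3.Connection:
    """Create an in-memory database with the core tables plus the new section."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(CORE_SCHEMA)
    conn.executescript(NEW_SECTION)
    return conn

if __name__ == "__main__":
    conn = build_database()
    conn.execute("INSERT INTO patient (patient_id, study_code) VALUES (1, 'PX-0001')")
    conn.execute("INSERT INTO cancer_disease VALUES (10, 1, 'breast')")
    conn.execute("INSERT INTO pathology VALUES (100, 10, 'G2')")
    conn.execute("INSERT INTO pathology VALUES (101, 10, 'G3')")
    count = conn.execute(
        "SELECT COUNT(*) FROM pathology WHERE disease_id = 10").fetchone()[0]
    print(count)  # two pathology records linked to one cancer disease
```

Run as a script, it prints `2`. An insert that references a nonexistent disease is rejected with `sqlite3.IntegrityError`.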
DORA facilitates the generation of a cancer knowledge base. It will help individualize cancer treatment by allowing researchers to identify patient-specific characteristics of a cancer at the molecular level. DORA supports translational research by providing quick electronic access to patient information from the clinical and molecular domains. For example, class prediction analysis is performed using the signal intensities of the microarray spots and the values of a clinical factor that partitions a group of patients into two classes. Database queries retrieve this information and present it to the statistical analysis software in the appropriate format.
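The data-preparation step for class prediction can be sketched as follows. All names and values are hypothetical, and in-memory lists stand in for the results of DORA's database queries. First, the normalized spot intensities are aggregated per gene and tissue. The median is used here as an assumed aggregate; the actual aggregation function is not specified. Next, each tissue's gene profile is paired with a two-class clinical factor. Finally, the result is written as a tab-separated matrix of the kind statistical software can read.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Hypothetical query results: (tissue_id, sequence_id, normalized_intensity),
# one row per spot, pooled over all repeats in a slide group.
SPOTS = [
    ("T1", "GENE_A", 1.0), ("T1", "GENE_A", 2.0), ("T1", "GENE_B", 0.5),
    ("T2", "GENE_A", 0.25), ("T2", "GENE_B", 2.0), ("T2", "GENE_B", 3.0),
]
# Hypothetical clinical factor that partitions the patients into two classes.
CLINICAL_CLASS = {"T1": "responder", "T2": "non_responder"}

def gene_aggregates(rows):
    """Median normalized intensity per (tissue, sequence); the median is an assumption."""
    grouped = defaultdict(list)
    for tissue, seq, value in rows:
        grouped[(tissue, seq)].append(value)
    return {key: median(values) for key, values in grouped.items()}

def to_matrix(aggregates, classes):
    """Return TSV text: one row per tissue, one column per gene, then the class label."""
    genes = sorted({seq for _, seq in aggregates})
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["tissue_id", *genes, "class"])
    for tissue in sorted(classes):
        values = [aggregates.get((tissue, g), "NA") for g in genes]  # NA marks a missing gene
        writer.writerow([tissue, *values, classes[tissue]])
    return out.getvalue()

if __name__ == "__main__":
    print(to_matrix(gene_aggregates(SPOTS), CLINICAL_CLASS), end="")
```

Run as a script, it prints a header row followed by `T1 1.5 0.5 responder` and `T2 0.25 2.5 non_responder` (tab-separated). Keeping aggregation separate from formatting makes it possible to swap in another aggregate, such as the mean, without touching the export.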
Confidentiality of the information collected in DORA is strictly maintained. Access to DORA is password-protected, and confidential information is encoded before being stored in the database. Access to confidential information and modification of data are highly restricted. For security reasons, DORA is currently available only within the Cross Cancer Institute computer network.
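The encoding method is not specified. The sketch below assumes one common approach: replacing a confidential identifier with a keyed hash (HMAC-SHA256) before the row is stored. The environment variable name, the function names, and the row layout are hypothetical. A keyed hash cannot be reversed, so if authorized staff must recover the original value, reversible encryption would be needed instead.

```python
import hashlib
import hmac
import os

# Hypothetical key source. The key should live outside the database, on an
# access-restricted host. The "change-me" fallback exists only for this demo.
SECRET_KEY = os.environ.get("DORA_PSEUDONYM_KEY", "change-me").encode()

def encode_identifier(raw_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym (hex HMAC-SHA256) for a confidential identifier."""
    normalized = raw_id.strip().upper().encode()  # " ab123 " and "AB123" map to the same code
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()

def prepare_patient_row(health_number: str, cancer_type: str) -> dict:
    """Build a row for storage; the raw identifier never reaches the database."""
    return {"patient_code": encode_identifier(health_number), "cancer_type": cancer_type}

if __name__ == "__main__":
    row = prepare_patient_row(" ab123 ", "ovarian")
    print(row["patient_code"][:12], row["cancer_type"])
```

Because the pseudonym is deterministic for a given key, records for the same patient can still be linked across modules without storing the identifier itself.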
Tumor banking information is managed through the Tumor Banking Database, a self-contained module of DORA. Please see the poster on the Tumor Banking Database for details.
PolyomX is supported by the Alberta Cancer Foundation and the Alberta Cancer Board.
(1) Cross Cancer Institute, Alberta Cancer Board; (2) Department of Computing Science; (3) Faculty of Pharmacy and Pharmaceutical Sciences; (4) Department of Oncology, University of Alberta, Edmonton
DORA Modules, Schema, and Forms
Figure 1: Overview of Database Schema [diagram: entities Patient, Cancer Disease, Treatment Protocol, Pathology, Stage/Progression, Treatment, Lab Work (tissue; blood, urine), and Microarray, SNP, and Metabonomics data, linked by one-to-many relationships; M = Many]
Figure 2: Overview of Microarray (MA) Module Schema [diagram: MA Slide Group (tissue ID); MA Slide Type (manufacturer, version); Sequence Info (oligo/cDNA gene of origin IDs); MA Slide Repeat (experiment details and parameters); MA Slide Spot Sequence (slide map, e.g., GAL); MA Slide Spot (position, intensity value), which generates MA Normalized Slide Spot (position, normalized intensity value); MA Gene Aggregate Value (tissue ID, sequence ID, and the aggregate of the normalized intensity values for the sequence across all spots from the repeats in the group); M = Many]
Data Sharing Using DORA
Figure 5: Centralized Database Scenario [diagram: a local user and remote users, the latter over a wide area network, connected to a single DORA server and its data]
Users connect securely to a central instance of DORA and access molecular or clinical data according to their permissions. When a user creates a new patient record in the clinical, microarray, SNP, or metabonomics module, that user is marked as the owner of the record and is notified by e-mail. All users can access new data as soon as it is added to the database. However, when the central server is not accessible (e.g., while it is down for a software upgrade), no data is available. The center hosting the DORA server is responsible for database administration and software development.
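The ownership rule in the centralized scenario can be sketched as below. The record structure, the function names, and the sender address are hypothetical. The notification is built with Python's standard `email.message.EmailMessage` but is not sent, because the delivery mechanism DORA uses is not described.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from email.message import EmailMessage

# Modules in which a new patient record can be created.
MODULES = {"clinical", "microarray", "snp", "metabonomics"}

@dataclass
class PatientRecord:
    record_id: int
    module: str
    owner: str  # the user who created the record
    created: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def create_record(record_id: int, module: str, user: str, user_email: str):
    """Create a record owned by its creator and build (not send) the notification."""
    if module not in MODULES:
        raise ValueError(f"unknown module: {module}")
    record = PatientRecord(record_id, module, owner=user)
    msg = EmailMessage()
    msg["To"] = user_email
    msg["From"] = "dora-noreply@example.org"  # hypothetical sender address
    msg["Subject"] = f"DORA: new {module} record {record_id}"
    msg.set_content(
        f"You created {module} record {record_id} and are recorded as its owner.")
    return record, msg

if __name__ == "__main__":
    rec, note = create_record(42, "microarray", "user1", "user1@example.org")
    print(rec.owner, note["Subject"])
```

Returning the message rather than sending it keeps record creation testable. A real deployment would hand the message to a mail transport such as `smtplib`.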
Main Features of DORA
• Integration of molecular and clinical information. Finding clinically relevant tumor profiles requires the analysis of genetic and clinical information for large cohorts of patients. For every patient, microarray, SNP, and metabonomic technologies can generate massive amounts of data. DORA speeds up the analysis process by seamlessly linking molecular data with the relevant clinical data for every patient. Integrated views of the patient data are analyzed statistically and with machine learning techniques to discover molecular profiles associated with clinical factors. It has been shown that gene expression profiles can be reliable predictors of treatment response, relapse, and disease-free survival, and that certain combinations of SNPs can indicate predisposition to cancer.
• Data sharing and portability. DORA (both the database and its software) can be shared with, or easily replicated at, other research centers. Researchers can access a centralized DORA database remotely or manage their own copy of DORA; in the latter case, data can be shared through import/export software. Data sharing is particularly important for studies of rare tumor groups (e.g., brain, pancreas) because it allows researchers to accumulate a large enough number of patients from across the province or the country. Researchers can exchange both molecular and clinical patient data while retaining ownership of the data they have generated.
• Scalability. DORA is designed so that new modules can be quickly integrated with the existing ones. Currently, modules for microarray, SNP, and metabonomic data and clinical sections for breast, lung, gastric, and ovarian cancer are fully functional. A clinical section for brain/CNS cancer is in the design phase and will be implemented soon.
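For centers that manage their own copy of DORA, data sharing through import/export software can be sketched as follows. The JSON exchange format, the field names, and the site labels are all hypothetical. The point illustrated is that an imported record keeps its original `owner` and gains a `source_site` stamp, so the researchers who generated the data retain ownership after exchange.

```python
import json

def export_records(records, site):
    """Serialize records for exchange, stamping each with its source site (copies, no mutation)."""
    payload = [dict(r, source_site=r.get("source_site", site)) for r in records]
    return json.dumps({"format": "dora-exchange-v0", "records": payload})  # hypothetical format tag

def import_records(blob, local_records):
    """Merge exported records into a local list without changing their owners.

    Records are keyed by (source_site, record_id). Keys already present are skipped,
    so re-importing the same file is harmless. Returns the number of records added.
    """
    data = json.loads(blob)
    seen = {(r["source_site"], r["record_id"]) for r in local_records}
    added = 0
    for rec in data["records"]:
        key = (rec["source_site"], rec["record_id"])
        if key not in seen:
            local_records.append(rec)  # the owner field is copied unchanged
            seen.add(key)
            added += 1
    return added

if __name__ == "__main__":
    site_a = [{"record_id": 1, "owner": "lab_a", "cancer_type": "pancreas"}]
    site_b = [{"record_id": 1, "owner": "lab_b", "cancer_type": "brain", "source_site": "site_B"}]
    blob = export_records(site_a, "site_A")
    print(import_records(blob, site_b), import_records(blob, site_b))
```

Keying on the source site as well as the record ID avoids collisions when two centers happen to use the same local record numbers.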
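Figure 6: Distributed (Federated) Database Scenario [diagram: DORA servers S1, S2, and S3, each with its own data, serving local users and remote users connected over a wide area network]
Figure 4: Lung Cancer Stage/Progression Form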
• DORA is implemented as a MySQL database and is made available to users through an Apache web server. The software that connects the web forms with the database is written in Perl and runs on the DORA server.
• The server on which DORA resides runs Red Hat Linux and is protected by a firewall.
• All the software needed to run a DORA server (i.e., Red Hat Linux, MySQL, and Perl) is freely available for noncommercial purposes.
• PolyomX has designed and implemented the software specific to DORA and can make this software available to Alberta Cancer Board (ACB) researchers.
Figure 3: Ovarian Cancer Pathology Form
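DORA's code connecting web forms to the database is written in Perl against MySQL. Purely to illustrate the general pattern, the sketch below uses Python's standard library instead: `urllib.parse.parse_qs` decodes the form body, and `sqlite3` stands in for MySQL. The table name `pathology_form` and its fields are hypothetical. The two essentials shown are a whitelist of accepted fields and parameterized SQL, so that form input is never spliced into a query string.

```python
import sqlite3
from urllib.parse import parse_qs

# Hypothetical whitelist of fields accepted from a pathology web form.
PATHOLOGY_FIELDS = ("patient_code", "histology", "grade")

def handle_form_post(conn: sqlite3.Connection, body: str) -> int:
    """Validate a URL-encoded form body and insert one row; return the new row id.

    Values travel as query parameters, never as SQL text, which prevents SQL injection.
    Column names come only from the constant whitelist above.
    """
    form = parse_qs(body, strict_parsing=True)  # raises ValueError on a malformed body
    missing = [f for f in PATHOLOGY_FIELDS if f not in form]  # blank values count as missing
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    values = [form[f][0].strip() for f in PATHOLOGY_FIELDS]
    placeholders = ", ".join("?" for _ in PATHOLOGY_FIELDS)
    cur = conn.execute(
        f"INSERT INTO pathology_form ({', '.join(PATHOLOGY_FIELDS)}) VALUES ({placeholders})",
        values,
    )
    conn.commit()
    return cur.lastrowid

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pathology_form "
                 "(id INTEGER PRIMARY KEY, patient_code TEXT, histology TEXT, grade TEXT)")
    print(handle_form_post(conn, "patient_code=PX-0001&histology=serous&grade=3"))
```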
Several DORA servers operate at different sites, and their databases have identical schemas. Users can connect to any of the servers but add new records to their local database. Integrated views of the data from several servers can be obtained through import/export tools for data exchange, or by running the same query against each database. The cost of administration and development is shared among the server hosts. With additional software, the schemas of the DORA databases would not need to be identical.
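The option of running the same query against each database can be sketched as follows. In-memory `sqlite3` databases stand in for the MySQL servers at different sites, since connecting to remote MySQL servers would require a driver that is not shown. The server labels, table, and column are hypothetical. Each result row is tagged with its server. A server that cannot be queried is reported rather than aborting the whole federated query.

```python
import sqlite3
from collections import Counter

def federated_query(servers: dict[str, sqlite3.Connection], sql: str,
                    params: tuple = ()) -> tuple[list[tuple], list[str]]:
    """Run the same query on every server and tag each row with its source server.

    Assumes the databases share one schema. Servers that raise a database error
    are listed in the second return value.
    """
    rows: list[tuple] = []
    unavailable: list[str] = []
    for name, conn in servers.items():
        try:
            result = conn.execute(sql, params).fetchall()
        except sqlite3.Error:  # includes ProgrammingError on a closed connection
            unavailable.append(name)
            continue
        rows.extend((name, *row) for row in result)
    return rows, unavailable

def pooled_counts(rows: list[tuple]) -> dict[str, int]:
    """Combine per-server (server, cancer_type, count) rows into overall totals."""
    totals: Counter = Counter()
    for _server, cancer_type, count in rows:
        totals[cancer_type] += count
    return dict(totals)

if __name__ == "__main__":
    def make_site(types):
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE patient (cancer_type TEXT)")
        conn.executemany("INSERT INTO patient VALUES (?)", [(t,) for t in types])
        return conn

    sites = {"S1": make_site(["brain", "brain", "lung"]),
             "S2": make_site(["brain", "pancreas"]),
             "S3": make_site(["pancreas"])}
    sites["S3"].close()  # simulate a server that is down
    rows, down = federated_query(
        sites, "SELECT cancer_type, COUNT(*) FROM patient GROUP BY cancer_type")
    print(pooled_counts(rows), "unavailable:", down)
```

Pooling counts for rare tumor groups across sites in this way is where the federated setup pays off. Reporting unavailable servers makes clear when a pooled total is incomplete.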
Acknowledgements
The authors thank Drs. Brent Zanke, Tony Reiman, Tim Winton, Bryan Dicken, Michael Sawyer, Helen Steed, Katia Tonkin, and David Omahen for their help in designing the clinical modules of DORA, and Jennifer Listgarten for her help in designing the microarray module.