IBM Life Sciences
Luigi DIPACE
IBM Semea Sud - Director
© Copyright IBM Corporation
Agenda
IBM Life Sciences:
- Life Sciences at a glance
- Life Sciences in detail:
  • Organization
  • IBM Research
  • Discovery Link
  • Partnerships and Alliances
  • High Performance Computing
  • Some IBM case studies
IBM Life Sciences at a glance
Started in August 2000
About 1,000 dedicated Life Sciences industry experts and scientists working on:
- Genomics/Proteomics, Biology/Bioinformatics/Chemistry, Drug Discovery, Clinical Development & Regulatory, Information Based Medicine
Leverages:
- Coverage of the whole Life Sciences ecosystem
- Organization
- IBM Research
- $220M+ investments in partnerships and alliances
- Leadership in High Performance Computing
IDC: "IBM no. 1 in Life Sciences market share in 2002"
All Life Sciences ecosystem coverage
• Diagnostics & Medical Devices
• CRO
• Researcher / Scientist
• Pharmaceutical
• Healthcare
• Patient / Consumer
• Venture Capital
• Bioinformatics
• Agribusiness
• Biotech
• Government
• Academia
Organization
IBM Life Sciences is responsible for the overall strategy in Life Sciences, including:
• Orchestrating the critical relationship between products and services delivery
• Directing investments
• Developing business partnerships
• Overseeing the advancement of IBM research and product development in this area
Diagram labels: IBM Life Sciences, Business Partners, Services, IBM Research, Customer Relationship, Independent Software Vendors
IBM Research: basic research at the intersection of biology and computation since 1992
Research labs: Almaden, Austin, Watson, Zurich, Beijing, Tokyo, Haifa, Delhi
http://www.research.ibm.com/compsci/compbio/index.html
Blue Gene is supported by IBM Research, in particular by
IBM’s Computational Biology Center (CBC)
Diagram: IBM Research life sciences areas (Structural Biology, Blue Gene, Bioinformatics, Protein Dynamics, Pattern Discovery, Data and Text Mining, Electronic Medical Record, Functional Genomics) across the Watson (CBC), Almaden, Zurich and Tokyo labs
Research projects
•Ab-Initio Molecular Dynamics
•Annotations Of Complete Genomes (InsightLink)
•Bioinformatics & Pattern Discovery
•Bio-Dictionaries
•Comparative Molecular Moment Analysis (CoMMA)
•Heterogeneous Databases (Discovery Link)
•Scalable Similarity Searching
•The Functional Genomics Group
•Integrated Medical Records
•Visual Analysis
Bioinformatics algorithms available free of charge for non-profit use
http://www.research.ibm.com/compsci/compbio/index.html
Life Science Standards Bodies and Industry
Consortia Participation
Diagram: standards bodies by area
Areas: Research; Development: Clinical Trial; Clinical; Medical Imaging
Bodies: OMG-LSR, I3C, GGF, CDISC, HL7, HL7 Clinical Genomics, DICOM
$220M investment in partnerships and alliances
FOCUS AREAS
• High Performance Information Infrastructure to support R&D activities
• Data Integration, Data Mining and Knowledge Management to support scientific collaboration and productivity in Drug Discovery and Information Based Medicine
• Clinical development and regulatory solutions to improve efficiency and shorten drug time-to-market
High Performance Information Infrastructure
• IT Services
• Clustering
  - x86 (Intel), Opteron (AMD), Power (IBM)
  - Linux and AIX
  - Rack and Blade architectures
  - Common systems management
  - Investment protection and open customer choice
• Storage
• Packaged solution for R&D customers
• Grid Computing
• IBM Grid Innovation Center in Montpellier
• Globus Project – OGSA
• Graphic Workstations
Data Integration & Data Mining projects to support Medical & Pharma R&D activities
• Solutions that extract data from multiple heterogeneous sources and application subsystems in response to a single query
• Projects that integrate all medical information (medical images, laboratory data, electronic patient records, pedigrees, gene chip array data, protein data) and mine it through an easy query interface
• Projects that combine phenotypic, genetic and genomic data with innovative analysis tools
DiscoveryLink is middleware that extracts data from multiple, heterogeneous data sources in response to a single query.
Some innovative features of the system are:
• Integration of all traditional relational databases
• Transparent access to a large number of standard bioinformatics sources (flat-file databases, Documentum repositories, Excel spreadsheets, etc.)
• Extension of the integration to the user's own types of data source by creating customized wrappers
• Standard wrappers for bioinformatics algorithms such as BLAST
• Easy integration into existing applications through its single SQL interface
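The wrapper idea can be illustrated with a small sketch. This is not DiscoveryLink's API: it uses Python's built-in sqlite3 module as a stand-in SQL engine, and the wrapper functions, table names and sample data are all hypothetical. For simplicity it copies each source into one in-memory database, whereas a federated system would leave the data in place and push work down to the sources.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for two heterogeneous sources.
COMPOUNDS_CSV = "compound_id,name\nC1,alpha\nC2,beta\n"   # a CSV flat file
ASSAY_ROWS = [("C1", 0.8), ("C2", 0.3), ("C1", 0.6)]      # a relational table


def csv_wrapper(conn, table, text):
    """Expose a CSV flat file as a table the SQL engine can query."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE {table} ({cols})")
    conn.executemany(f"INSERT INTO {table} VALUES ({marks})", data)


def relational_wrapper(conn, table, rows):
    """Expose rows from a relational source under a local table name."""
    conn.execute(f"CREATE TABLE {table} (compound_id TEXT, activity REAL)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)


def federated_query(sql):
    """Run one SQL statement over the single virtual view of all sources."""
    conn = sqlite3.connect(":memory:")
    try:
        csv_wrapper(conn, "compounds", COMPOUNDS_CSV)
        relational_wrapper(conn, "assays", ASSAY_ROWS)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# One query joins the flat file and the relational source.
result = federated_query(
    "SELECT c.name, MAX(a.activity) FROM compounds c "
    "JOIN assays a ON a.compound_id = c.compound_id "
    "GROUP BY c.name ORDER BY c.name"
)
```

The client only writes SQL against the names `compounds` and `assays`; which kind of source sits behind each name is hidden by the wrapper.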
Discovery Link Global View
Front end: wide variety of clients (Accelrys, Lion BioSciences, in-house GUI, applications, SQL command line, etc.)
Clients send SQL requests (Rq) to Discovery Link and receive results (rs)
Discovery Link: single virtual database view, optimizer, wrappers
Wrappers send requests (Rq) to the back-end sources and receive results (rs)
Back end: relational databases, spreadsheets, flat files, algorithms, etc., in different locations
Sources and tools shown:
•Excel
•Flat files in CSV format
•Documentum
•BLAST
•DB2
•Oracle
•Oracle Cartridge
•MS SQL Server
•Sybase
•MySQL
•Informix
•XML
•ODBC
•HMMER
•ENTREZ (NCBI portal)
•Extended Search
•BioRS
•SRS
•Accelrys
•WebSphere MQ
•Web Services
•Wrapper Development Toolkit
A Pharma Discovery Link project: Aventis
PRESS RELEASE
"Bridging data islands", by Gary Anthes, Computerworld (US), 10/14/02:
"After just two months, a new software tool enabled Aventis Pharmaceuticals Inc. to discover a promising candidate for a new drug to treat asthma, arthritis or even perhaps cancer; it's a chemical compound that might well have been overlooked using traditional IT tools.
Aventis is using DiscoveryLink, a feature of IBM Corp.'s DB2 database management system that can propel a single SQL query out to multiple, heterogeneous data sources and bring information back to the user in one coherent view."
"Using this integrated framework, scientists were able to pull data from many different sources around the world, visualize it in a new way that they could never do before," says Peter Loupos, vice president for drug innovation and approval information systems.
Brio.One solution
Search, Publish, Customization, Collaboration, Integration
Discovery Link installations at Universities and Research Centers
•Academia Sinica
•Indiana University
•University of Toronto
•Institute for Systems Biology
•Johns Hopkins University
•MCNC
•National Institute of Education
•University of California San Diego
•University of Amsterdam
• .....................................
IBM Clinical Genomics integration projects at Mayo Clinic, City of Kobe, deCODE Genetics, Johns Hopkins, UCSF, etc.
End users
IBM + Partner: visualization applications, front-end user application, data mining, front-end query tool
IBM: IBM Integration Middleware
Data sources: public/private genomic data, clinical data, medical images, genotype data, phenotype data
IBM Digital Mammography Grid projects at Pennsylvania, UK, etc.
Backup
The Interoperable Informatics Infrastructure Consortium (I3C) is working to facilitate and enable data exchange and data and knowledge management across the entire life science community. The I3C is developing a common standards-based platform, with a use case approach, based on XML and services. Specifications include LSID (work in progress).
LSID, the Life Science Identifier, is work in progress in the I3C. It provides a logical naming convention that defines a means for uniquely naming biologically significant data items. An LSID is unique for the life of the entity, which makes it easier to track an item through the pipeline.
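As a rough illustration of such a naming convention, the sketch below parses identifiers of the form urn:lsid:authority:namespace:object[:revision]. The slides do not give the syntax; this layout is assumed from the LSID specification as later published, and the `LSID` type, the `parse_lsid` helper and the example identifier are hypothetical.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Components of a Life Science Identifier (assumed URN layout)."""
    authority: str            # who issued the name, e.g. a DNS name
    namespace: str            # collection the item belongs to
    object_id: str            # the item within that collection
    revision: Optional[str]   # optional version of the item


def parse_lsid(text: str) -> LSID:
    """Split 'urn:lsid:authority:namespace:object[:revision]' into parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty LSID component: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Illustrative identifier only; example.org is not a real LSID authority.
example = parse_lsid("urn:lsid:example.org:genes:ABC123:2")
```

Because the authority and namespace are part of the name itself, two groups can name items independently without colliding.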
The Life Science Research committee of the Object Management Group (OMG-LSR) is focused on improving the quality and utility of interoperability software and standardized interfaces in life science research. Specifications include MAGE.
MAGE, the Microarray and Gene Expression standard, addresses the representation of gene expression data and relevant annotations, as well as the mechanisms for exchanging this data. (From Gene Expression Spec. 2/02)
The Global Grid Forum (GGF) is aimed at the development of a broadly based integrated grid architecture that assists the emerging grid communities, which include the life science grids.
The Clinical Data Interchange Standards Consortium (CDISC) develops the industry standards that support the electronic acquisition, exchange, submission and archiving of clinical trial data.
Health Level Seven (HL7) focuses on the electronic interchange of clinical information among healthcare-oriented computer systems. HL7 also works in conjunction with CDISC on clinical trials data exchange standards and with DICOM on medical images. In addition to messages, HL7 also defines an architecture of clinical documents that could encapsulate clinical and genomic data.
The DICOM Standards Committee exists to create and maintain international standards for communication of biomedical diagnostic and therapeutic information in disciplines that use digital images and associated data. The goals of DICOM are to achieve compatibility and to improve workflow efficiency between imaging systems and other information systems in healthcare environments worldwide.
DICOM is used or will soon be used by virtually every medical profession that utilizes images. These include cardiology, dentistry, endoscopy, mammography, ophthalmology, orthopedics, pathology, pediatrics, radiation therapy, radiology and surgery.
IBM is actively supporting and leading Grid open source
communities in Life Sciences
• North Carolina Genomics & Bioinformatics Consortium: GSK, Biogen, University of North Carolina, Duke University, etc., with diverse goals and expertise in all Life Sciences disciplines
• UK e-Science projects: connecting various universities in the UK
• Netherlands Grid: connecting universities in the Netherlands
• Telethon: grid scientific project in France
• TeraGrid: a consortium of 4 U.S. research centres to build the world's most powerful computing Grid, capable of 13.6 trillion calculations/sec
• University of Pennsylvania: Grid for LS research
• Smallpox cure, with the U.S. Department of Defense and Accelrys
• China Education and Research Grid: the most ambitious grid project by a government to date
CG Case Study: Mayo Clinic Collaboration
Applied Genomics Data Analysis
Genomic data (DNA), GeneChip array data (RNA), protein data
Clinical data: signs, symptoms, laboratory, radiology, etc.
Databases: genome, proteome, disease, tumors, drugs
Phase I
Optimized, individualized healthcare
A major challenge was to enable non-IT Mayo Clinic specialists to ask complex questions of the Clinical DB
• Find all patients with:
  - Coronary artery disease (a form of heart disease)
  - Diabetes mellitus ("diabetes")
  - Nonalcoholic steatohepatitis (a form of liver disease)
  Who had a breast biopsy at Mayo (a procedure)
  In ZIP code 55901, 55902, 55903 or 55904 (local region)
  Between 45 and 65 years of age (certain age)
  Who are female (female gender)
  And are alive (vital status)
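To show why such a question is hard for a non-IT user to write by hand, here is a minimal sketch of the same criteria as SQL, run with Python's built-in sqlite3 module. The schema, table names, coded values and sample patients are all hypothetical; the slides do not describe the actual warehouse layout or query tool.

```python
import sqlite3

# Hypothetical schema; the real Mayo warehouse layout is not in the slides.
SCHEMA = """
CREATE TABLE patient (id INTEGER PRIMARY KEY, sex TEXT, age INTEGER,
                      zip TEXT, alive INTEGER);
CREATE TABLE diagnosis (patient_id INTEGER, name TEXT);
CREATE TABLE procedure_done (patient_id INTEGER, name TEXT, site TEXT);
"""
DIAGNOSES = ("coronary artery disease", "diabetes mellitus",
             "nonalcoholic steatohepatitis")
ZIP_CODES = ("55901", "55902", "55903", "55904")


def load_sample(conn):
    """Create the tables and insert four made-up patients."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO patient VALUES (?, ?, ?, ?, ?)", [
        (1, "F", 52, "55901", 1),   # meets every criterion
        (2, "M", 58, "55902", 1),   # wrong gender
        (3, "F", 60, "55903", 1),   # lacks one diagnosis (see below)
        (4, "F", 70, "55904", 1),   # outside the age range
    ])
    for pid in (1, 2, 3, 4):
        names = DIAGNOSES[:2] if pid == 3 else DIAGNOSES
        conn.executemany("INSERT INTO diagnosis VALUES (?, ?)",
                         [(pid, name) for name in names])
        conn.execute("INSERT INTO procedure_done VALUES (?, ?, ?)",
                     (pid, "breast biopsy", "Mayo"))


def find_cohort(conn):
    """Return ids of patients matching all criteria from the slide."""
    zips = ",".join("?" * len(ZIP_CODES))
    dx = ",".join("?" * len(DIAGNOSES))
    sql = f"""
        SELECT p.id FROM patient p
        WHERE p.sex = 'F' AND p.alive = 1 AND p.age BETWEEN 45 AND 65
          AND p.zip IN ({zips})
          AND EXISTS (SELECT 1 FROM procedure_done pr
                      WHERE pr.patient_id = p.id
                        AND pr.name = 'breast biopsy' AND pr.site = 'Mayo')
          AND (SELECT COUNT(DISTINCT d.name) FROM diagnosis d
               WHERE d.patient_id = p.id AND d.name IN ({dx})) = ?
        ORDER BY p.id
    """
    params = (*ZIP_CODES, *DIAGNOSES, len(DIAGNOSES))
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
load_sample(conn)
cohort = find_cohort(conn)
```

Requiring all three diagnoses at once takes a counting subquery rather than a simple filter, which is the kind of detail a point-and-click query front end would need to hide from clinicians.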
During Phase I, the Mayo Clinic Partnership has produced one of the world's largest Clinical Data Warehouses
• Warehouse contains 4.4M+ patient records
• Virtually unlimited number of unique queries across:
  - 28 demographic elements
  - 523 DRG codes
  - 10,455 ICD-9 codes
  - All structured laboratory test conditions or results (up to 4900+)
  - All microbiology organisms by name; heart rate on ECG
• Mayo researcher benefit: "months to minutes" time savings for select cases tested
Mayo Clinic - Phase II Collaboration
• Part 1 - Storage and Retrieval of Genomic data
  - Genomic data storage and retrieval utilities incorporated into the data warehouse
• Part 2 - Genomic Data Analysis Workflow
  - Genomic data workflow encompassing DNA/RNA test results, analysis of raw data (e.g., microarray), with annotation/comparison to reference databases and inclusion in study list prototype application
• Part 3 - Text Analysis
  - Concept-based inquiry and retrieval of unstructured data in Clinical Notes and Laboratory Reports