DecisionSite and DiscoveryLink
Visualizing data in novel ways
Spotfire User Conference
Boston, MA
October 28, 2003
Doug Del Prete
IBM Life Sciences
[email protected]
What is DiscoveryLink?
• DiscoveryLink (DL) is an IBM technology that lets you view many data sources – even non-relational ones like BLAST and HMMER – as one heterogeneous “virtual relational database”
• In effect, a wide variety of data sources are all made to look like SQL tables/views, which makes them easy to access and lets all of this data be integrated in optimized queries
DecisionSite and DiscoveryLink
1
What is DiscoveryLink?
• DiscoveryLink is based on IBM's Information Integrator middleware technology, now a major IBM software initiative for data federation/integration
• All data sources essentially become “SQL aware”, and queries against them, relational or non-relational, are planned by a single cost-based optimizer
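The optimizer's job can be pictured with a toy cost model. The Python sketch below is not DB2's algorithm: it enumerates every left-deep join order for three nicknames and keeps the one with the smallest estimated work. The nickname names are borrowed from the example query in this deck; the row counts and join selectivities are invented for illustration.

```python
from itertools import permutations

# Hypothetical estimates: rows each nickname returns after its local filters (made-up numbers).
CARD = {"CMPNDDBS": 2_000, "ACTIVITY_DATA": 500_000, "BLASTP": 15}

# Hypothetical join selectivities; a pair with no join predicate is a cross product (1.0).
SEL = {
    frozenset({"CMPNDDBS", "ACTIVITY_DATA"}): 1 / 2_000,
    frozenset({"ACTIVITY_DATA", "BLASTP"}): 1 / 50,
}


def plan_cost(order):
    """Cost of a left-deep join order: rows of the first table plus every intermediate result size."""
    rows = CARD[order[0]]
    cost = float(rows)
    joined = [order[0]]
    for table in order[1:]:
        selectivity = 1.0
        for done in joined:
            selectivity *= SEL.get(frozenset({done, table}), 1.0)
        rows = rows * CARD[table] * selectivity  # estimated size after this join
        cost += rows
        joined.append(table)
    return cost


def best_order():
    """Compare every join order exhaustively and keep the cheapest (fine for a handful of tables)."""
    return min(permutations(CARD), key=plan_cost)


if __name__ == "__main__":
    order = best_order()
    print(order, round(plan_cost(order)))
```

With these made-up estimates the cheapest order starts from the 15 BLAST hits and reaches the 500,000 assay rows last, even though that briefly pairs every hit with every compound. A real cost-based optimizer works from catalog statistics and also weighs which work each remote source can do itself.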
Information Integration: Issues
[Diagram: Without a data integration layer. Applications (for scientists) and web servers (for IT) in the application layer each query the data layer directly: ASCII data files, Excel spreadsheets, Oracle, DB2, SQL Server, and legacy databases.]
Issues: multiple, varied sources; data freshness; multiple queries
Information Integration: Solution
[Diagram: DiscoveryLink™ placed between the application layer (applications for scientists, web servers for IT) and the data layer (ASCII data files, Excel spreadsheets, Oracle, DB2, SQL Server, legacy databases), presenting compound, genomic, proteomic, gene expression, textual, toxicology, legacy, and other data sources through one integration layer.]
Information Integration: Question
Q: Show me all the compounds
similar to ketanserin that have
been tested against members of
the serotonin family and have
characteristics of a good drug.
[Diagram: DiscoveryLink connecting compound, genomic, proteomic, gene expression, textual, toxicology, legacy, and other data sources.]
Information Integration: IT Translation
Q: Show me all the compounds
similar to ketanserin that have
been tested against members of
the serotonin family and have
characteristics of a good drug.
IIM query parameters:
| Term | Operator | Value |
| Compound | SimilarTo | Ketanserin |
| Receptor | Homologous | Serotonin |
| IC50 | <= | 1E-8 |
| Molwt | > | 375 |
| Molwt | < | 425 |
| logP | > | 4 |
| logP | < | 6 |
[Diagram: the IIM query is passed to DiscoveryLink, whose wrappers reach each source: a BLAST wrapper for the BLAST data source, an XML wrapper for an XML document, an Oracle wrapper for the Oracle compound DB in Germany, and an ODBC wrapper for assay results in MySQL. The result set is returned for visualization. Data sources shown: textual, compound, proteomic, genomic, gene expression, toxicology, legacy, and other data.]
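The translation from the parameter grid to SQL can be sketched as a small mapping step. In this Python sketch the term-to-column mapping (using the a/b/c aliases of the deck's SQL), the 0.88 similarity cutoff taken from that SQL, and the function name translate are all assumptions; a real system would also resolve “Ketanserin” to a structure and “Serotonin” to receptor sequences before binding them.

```python
# Hypothetical mapping from IIM "Term" values to columns of the nicknames in the deck's SQL
# (a = CMPNDDBS, b = ACTIVITY_DATA, c = BLASTP).
COLUMNS = {"IC50": "b.ic50", "Molwt": "a.mol_wt", "logP": "a.logP"}
COMPARISONS = {"<", "<=", ">", ">=", "="}
SIMILARITY_CUTOFF = 0.88  # taken from the example query; an assumption, not a product default


def translate(params):
    """Turn (term, operator, value) rows into a WHERE clause with ? placeholders plus bind values."""
    clauses = ["a.compound_id = b.compound_id", "c.protein_id = b.screen_name"]
    binds = []
    for term, op, value in params:
        if term == "Compound" and op == "SimilarTo":
            clauses.append(f"SimilarTo(a.compound_struct, ?) > {SIMILARITY_CUTOFF}")
        elif term == "Receptor" and op == "Homologous":
            clauses.append("c.input_seq = ?")  # BLAST query sequence for the receptor family
        elif term in COLUMNS and op in COMPARISONS:
            clauses.append(f"{COLUMNS[term]} {op} ?")
        else:
            raise ValueError(f"unsupported parameter: {term} {op} {value}")
        binds.append(value)
    return "WHERE " + "\n  AND ".join(clauses), binds


# The parameter grid from the slide, as (term, operator, value) rows.
PARAMS = [
    ("Compound", "SimilarTo", "Ketanserin"),
    ("Receptor", "Homologous", "Serotonin"),
    ("IC50", "<=", 1e-8),
    ("Molwt", ">", 375), ("Molwt", "<", 425),
    ("logP", ">", 4), ("logP", "<", 6),
]

if __name__ == "__main__":
    where, binds = translate(PARAMS)
    print(where)
    print(binds)
```

Keeping values as bind parameters rather than pasting them into the SQL text keeps the generated statement safe to reuse with different inputs.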
Information Integration: IT Translation
Q: Show me all the compounds
similar to ketanserin that have
been tested against members of
the serotonin family and have
characteristics of a good drug.
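SELECT a.compound_id, b.ic50, b.screen_name
FROM CMPNDDBS a, ACTIVITY_DATA b, BLASTP c
WHERE a.compound_id = b.compound_id                        -- compounds joined to their assay results
AND SimilarTo(a.compound_struct, :KETANSERIN_MOL) > 0.88   -- structurally similar to ketanserin
AND c.input_seq = :SEROTONIN                               -- BLAST search with a serotonin receptor sequence
AND c.protein_id = b.screen_name                           -- homologous proteins that were screened
AND b.ic50 <= 1E-8                                         -- IC50 <= 1E-8
AND a.mol_wt > 375 AND a.mol_wt < 425                      -- 375 < Molwt < 425
AND a.logP > 4 AND a.logP < 6                              -- 4 < logP < 6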
[Diagram: the DL query is generated from the IIM query and run by DiscoveryLink, whose BLAST, XML, Oracle, and ODBC wrappers reach the BLAST data source, an XML document, the Oracle compound DB in Germany, and assay results in MySQL; the result set is returned for visualization. Data sources shown: compound, proteomic, genomic, gene expression, textual, legacy, toxicology, and other data.]
Information Integration: Final Result
Q: Show me all the compounds
similar to ketanserin that have
been tested against members of
the serotonin family and have
characteristics of a good drug.
Results
| Compound ID | HTR1A (US1111) | HTR1B (US1234) | HTR1D (US2534) | HTR1E (US1111) | HTR1F (US2534) | HTR2A (US1111) | HTR2B (US4791) | HTR2C (US1111) | HTR4 (US1234) | HTR5A (US1111) | HTR6 (US1111) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| US-345123 | 0.001 | 0.5 |  | 2.1 | 3.8 | 1.1 | 0.001 | 53.5@5 | 0.01 | 0.02 |  |
| UK-567345 | <0.003 | >5.0 |  | <6.0 | 8.8 | 5.5 | 5.9 | 16@5 | 5.6 | 4.3 |  |
| US-234012 | 0.0025 | >5.7 | 23@10 | <6.0 | 8.9 | 5.4 | 7.0 | 15@5 | 4.8 | 19.0 |  |
| US-321543 | 0.05 | 2.0 |  | 8.9 | 0.0 | 6.7 | 10.0 | 48@5 | 3.33 | 2.6 |  |
DiscoveryLink – Overall Architecture
[Diagram: client tools (Spotfire DecisionSite, Synapsia, customer applications, SQL command line, etc.) send requests over JDBC/ODBC to DB2 with Information Integrator. The Information Integrator optimizer builds a plan using the catalog (wrapper instance, server, user mapping, and nickname definitions) and sends requests through the wrappers to the data sources, which return result sets. A data loader and a log are also shown; administration is through the Information Integrator Control Center.]
Wrappers:
•DB2
•Oracle
•Oracle Cartridge
•MS SQL Server
•Sybase
•Informix
•Teradata
•ODBC (MySQL, Postgres…)
•Excel
•Flat Files in CSV format
•Documentum
•BLAST
•XML
•ENTREZ (NCBI portal)
•HMMER
•Extended Search
•BioRS
Wrapper Development Toolkit
Example sources: SwissProt, KEGG, dbEST, Locus Link, UNIGENE, and more …
RDB, spreadsheets, flat files, algorithms, etc. in diverse locations
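The catalog definitions in this architecture stack in a fixed order: a wrapper handles one kind of source, a server definition names one instance of that source, a user mapping supplies the remote identity, and a nickname exposes one remote object as a virtual table. The Python classes below are a hypothetical model of that chain (they are not DiscoveryLink's API), showing how a query on a nickname is routed and how, in this model, a missing user mapping blocks access. All server, user, and object names are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wrapper:
    name: str  # a kind of source, e.g. "ORACLE" or "BLAST" (hypothetical labels)


@dataclass(frozen=True)
class Server:
    name: str
    wrapper: Wrapper
    location: str  # free-text description of where the source lives


@dataclass(frozen=True)
class UserMapping:
    local_user: str
    server: Server
    remote_user: str  # a real mapping would also hold credentials; omitted here


@dataclass(frozen=True)
class Nickname:
    name: str  # the "virtual table" name that clients use in SQL
    server: Server
    remote_object: str  # remote table, file, or search target


class Catalog:
    """Toy catalog of nicknames and user mappings, consulted when a query arrives."""

    def __init__(self):
        self.nicknames = {}
        self.mappings = {}

    def add_nickname(self, nick):
        self.nicknames[nick.name] = nick

    def add_mapping(self, mapping):
        self.mappings[(mapping.local_user, mapping.server.name)] = mapping

    def route(self, local_user, nickname):
        """Return (wrapper, server, remote user, remote object) for a query on a nickname."""
        nick = self.nicknames[nickname]
        mapping = self.mappings.get((local_user, nick.server.name))
        if mapping is None:
            # In this model a user without a mapping cannot reach the source.
            raise PermissionError(f"no user mapping for {local_user} on {nick.server.name}")
        return (nick.server.wrapper.name, nick.server.name,
                mapping.remote_user, nick.remote_object)


if __name__ == "__main__":
    oracle = Server("GERMANY_CMPD", Wrapper("ORACLE"), "Oracle compound DB in Germany")
    catalog = Catalog()
    catalog.add_nickname(Nickname("CMPNDDBS", oracle, "CHEM.COMPOUNDS"))
    catalog.add_mapping(UserMapping("ds_user", oracle, "chem_reader"))
    print(catalog.route("ds_user", "CMPNDDBS"))
```

Because clients only ever name the nickname, the remote location, wrapper, and credentials can change in the catalog without touching the SQL that DecisionSite sends.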
DiscoveryLink: A Robust Solution
Benefits
• Access to multiple, heterogeneous sources
• Complex queries across distributed data sources
• Leverage existing IT infrastructure and use
specialized functions of existing databases
• Integrating analysis tools and business intelligence
• Can put a SQL front-end and user security on data
sources such as BLAST, Pubmed, Genbank,
HMMER, XML
• Can be used for fast and easy ad-hoc extensions to a
data warehouse/mart
DiscoveryLink Value Proposition
A proven scalable data integration solution
that enables efficient and effective queries
across disparate data sources, thereby
improving R&D efficiencies and productivity.
This translates into greater flexibility and
competitive advantage in the marketplace.
DecisionSite and DiscoveryLink
DecisionSite and DiscoveryLink
• DecisionSite can, on its own, access many data sources to drive its powerful set of visualizations: any JDBC-compliant data source can be configured as a data source
• DiscoveryLink, used as the data access engine, extends the reach of DecisionSite both in the types of data sources it can use and in performance/scalability
• Together, the two products provide a robust, flexible “best of breed” approach to analyzing your data:
  - DecisionSite as the user interface/front-end to DL
  - DiscoveryLink as a federated data source access engine for DS
DecisionSite and DiscoveryLink
[Diagram: DecisionSite Server, with Information Interaction Services and Web Services, connects over JDBC to text/CSV files, relational databases, and DB2/DiscoveryLink middleware; the middleware performs optimized relational joins across BLAST/HMMER, XML, Pubmed/Genbank, and other sources.]
Benefits Overview
• Access more data sources – BLAST, HMMER, Genbank, Documentum, BioRS, etc.
• Access existing DecisionSite data sources faster/more easily – XML, Postgres, MS Access, etc.
• Extend/augment an existing visualization with information from any of the above data sources
• Run optimized queries that join across all data sources – relational and non-relational – all through one JDBC connection
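SQLite offers a small local analogue of “one connection, many sources”: ATTACH DATABASE makes a second database visible on the same connection, so a single SQL statement can join across both. This Python sketch uses two in-memory SQLite databases with made-up rows and hypothetical table names; it shows only the single-connection join, not DiscoveryLink's wrappers, remote access, or optimizer.

```python
import sqlite3


def build_demo():
    """Open ONE connection and attach a second database to it (both in memory here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS assay")
    conn.execute("CREATE TABLE main.compounds (compound_id TEXT PRIMARY KEY, mol_wt REAL, logp REAL)")
    conn.execute("CREATE TABLE assay.results (compound_id TEXT, target TEXT, ic50 REAL)")
    # Made-up rows for illustration only.
    conn.executemany("INSERT INTO main.compounds VALUES (?, ?, ?)",
                     [("C1", 390.0, 4.5), ("C2", 510.0, 3.1), ("C3", 410.0, 5.2)])
    conn.executemany("INSERT INTO assay.results VALUES (?, ?, ?)",
                     [("C1", "HTR2A", 5e-9), ("C2", "HTR2A", 1e-9), ("C3", "HTR1A", 2e-7)])
    return conn


def potent_druglike(conn):
    """One SQL statement that joins tables living in two different databases."""
    sql = """
        SELECT c.compound_id, r.target, r.ic50
        FROM main.compounds AS c
        JOIN assay.results AS r ON r.compound_id = c.compound_id
        WHERE r.ic50 <= 1e-8
          AND c.mol_wt > 375 AND c.mol_wt < 425
          AND c.logp > 4 AND c.logp < 6
        ORDER BY c.compound_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(potent_druglike(build_demo()))
```

Only C1 passes all filters: C2 fails the molecular-weight range and C3 fails the IC50 cutoff.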
Visualize a BLAST result under DS/DL!
DecisionSite and DiscoveryLink
• DiscoveryLink fits naturally into DecisionSite’s IIM (Information Interaction Model)
• Under the Information Interaction Designer (IID), you reference “Nicknames”: “virtual tables” that point to tables/views and non-relational objects pre-configured under DiscoveryLink
• These data sources are then naturally available to Information Builder, Information Library/Links, and beyond
DecisionSite and BLAST
See how easy it is to configure a data source like
BLAST to be used under DecisionSite
Indicate BLAST Server/Algorithm
Configure Defline
View complete BLAST Nickname
Configure under Information Designer
Configure under Information Builder
DecisionSite and BLAST
This is all done quickly and easily, even though
there is no JDBC driver for BLAST readily
available!
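One way to picture BLAST results as rows is to load BLAST's 12-column tabular output (legacy blastall -m 8, BLAST+ -outfmt 6) into a table. As the example query in this deck suggests (c.input_seq = :SEROTONIN), DiscoveryLink's BLAST wrapper runs the search when queried; this Python sketch only parses saved output, and the table name blast_hits and the sample hits are made up.

```python
import sqlite3

# The 12 columns of BLAST tabular output (legacy blastall -m 8, BLAST+ -outfmt 6),
# using the BLAST+ short names, with a type conversion for each.
COLUMNS = [("qseqid", str), ("sseqid", str), ("pident", float), ("length", int),
           ("mismatch", int), ("gapopen", int), ("qstart", int), ("qend", int),
           ("sstart", int), ("send", int), ("evalue", float), ("bitscore", float)]

# Made-up hits for illustration; real output would come from a BLAST run.
SAMPLE = (
    "# made-up example\n"
    "query1\tHIT_1\t98.50\t420\t6\t0\t1\t420\t1\t420\t1e-150\t800.0\n"
    "query1\tHIT_2\t45.20\t380\t200\t8\t20\t400\t15\t390\t2e-40\t160.5\n"
)


def parse_tabular(text):
    """Yield one typed tuple per hit, skipping blank and '#' comment lines."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}: {line!r}")
        yield tuple(cast(value) for (_, cast), value in zip(COLUMNS, fields))


def load_hits(conn, text):
    """Create the hypothetical blast_hits table and insert the parsed rows."""
    names = ", ".join(name for name, _ in COLUMNS)
    marks = ", ".join("?" for _ in COLUMNS)
    conn.execute(f"CREATE TABLE blast_hits ({names})")
    conn.executemany(f"INSERT INTO blast_hits VALUES ({marks})", parse_tabular(text))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_hits(conn, SAMPLE)
    print(conn.execute("SELECT sseqid, pident FROM blast_hits WHERE evalue < 1e-50").fetchall())
```

Once hits are rows, ordinary SQL filters such as an E-value cutoff apply to them like any other column.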
DecisionSite and XML
Similarly, see how easy it is to configure XML to be
used under DecisionSite
Provide XML Schema/File Location
Create all the relational Nicknames
Configure Parent/Child Join
DecisionSite and XML
Again, this can all be done quickly and easily,
without having to find a JDBC driver, configure it
manually in a text editor, and then restart
DecisionSite.
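The XML steps above (create the relational nicknames, then configure the parent/child join) amount to flattening nested elements into tables linked by a key. This Python sketch does that with ElementTree and SQLite on a made-up document; the element, attribute, and table names are hypothetical, and DiscoveryLink's XML wrapper does this mapping through nickname definitions rather than by copying the data as the sketch does.

```python
import sqlite3
import xml.etree.ElementTree as ET

# A made-up XML document: compounds (parents) that contain assay results (children).
SAMPLE_XML = """
<compounds>
  <compound id="C1" name="example-1">
    <assay target="HTR2A" ic50="5e-9"/>
    <assay target="HTR1A" ic50="2e-7"/>
  </compound>
  <compound id="C2" name="example-2">
    <assay target="HTR2A" ic50="1e-9"/>
  </compound>
</compounds>
"""


def xml_to_tables(conn, xml_text):
    """Flatten parent/child elements into two tables linked by the parent's key."""
    root = ET.fromstring(xml_text.strip())
    conn.execute("CREATE TABLE compound (compound_id TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE assay (compound_id TEXT REFERENCES compound, target TEXT, ic50 REAL)")
    for comp in root.iter("compound"):
        cid = comp.get("id")
        conn.execute("INSERT INTO compound VALUES (?, ?)", (cid, comp.get("name")))
        for assay in comp.findall("assay"):
            # Each child row carries its parent's key: this is the parent/child join column.
            conn.execute("INSERT INTO assay VALUES (?, ?, ?)",
                         (cid, assay.get("target"), float(assay.get("ic50"))))


# Parent/child join: count the child assay rows under each parent compound.
PARENT_CHILD_JOIN = """
    SELECT c.compound_id, COUNT(a.target)
    FROM compound AS c LEFT JOIN assay AS a ON a.compound_id = c.compound_id
    GROUP BY c.compound_id
    ORDER BY c.compound_id
"""

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    xml_to_tables(conn, SAMPLE_XML)
    print(conn.execute(PARENT_CHILD_JOIN).fetchall())
```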
Merged Queries - Example
• In IID, set up the BLAST and Entrez data models
• Also set up the Nucleotide/BLAST join via Accession #
• Configure a BLAST Information Link
• Configure a Nucleotide Information Link
• Run the BLAST visualization
• From the visualization, run the Entrez Information Link to get all the applicable metadata about each displayed Accession (in Details-on-Demand, Table, etc.)
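The last step of this example is a drill-down: take the accessions shown in the BLAST visualization and fetch their metadata with a second, parameterized query. This Python sketch imitates that pattern in SQLite; the table and column names, the E-value cutoff standing in for the hits on screen, and the rows are all hypothetical, and it does not reproduce how DecisionSite actually passes values to an Information Link.

```python
import sqlite3


def metadata_for(conn, accessions):
    """Fetch metadata only for the given accessions, using one placeholder per value."""
    if not accessions:
        return []
    marks = ", ".join("?" for _ in accessions)
    sql = ("SELECT accession, description, organism FROM nucleotide "
           f"WHERE accession IN ({marks}) ORDER BY accession")
    return conn.execute(sql, list(accessions)).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE blast_hits (accession TEXT, evalue REAL)")
    conn.execute("CREATE TABLE nucleotide (accession TEXT PRIMARY KEY, description TEXT, organism TEXT)")
    # Made-up rows for illustration only.
    conn.executemany("INSERT INTO blast_hits VALUES (?, ?)",
                     [("ACC001", 1e-80), ("ACC002", 3e-20), ("ACC003", 0.5)])
    conn.executemany("INSERT INTO nucleotide VALUES (?, ?, ?)",
                     [("ACC001", "made-up entry 1", "organism A"),
                      ("ACC002", "made-up entry 2", "organism B"),
                      ("ACC003", "made-up entry 3", "organism A")])
    # Step 1: the BLAST view; an E-value cutoff stands in for the hits the user has on screen.
    shown = [acc for (acc,) in conn.execute(
        "SELECT accession FROM blast_hits WHERE evalue < 1e-10 ORDER BY accession")]
    # Step 2: the drill-down query fetches metadata for just those accessions.
    return metadata_for(conn, shown)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Building one placeholder per accession keeps the second query parameterized no matter how many hits are displayed.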
Query Optimization
• Under IID, configure key joins under DB2/DiscoveryLink
• This includes established DecisionSite data sources such as Oracle and MySQL (configured very quickly in the DiscoveryLink Control Center)
• The Information Link makes only one JDBC connection to access all the data sources!
• The underlying SQL query invokes the DB2 optimizer to improve performance even more
Query Optimization
• Especially useful for large relational data sets, such as when joining assay results with a chemical compound database
DecisionSite and DiscoveryLink - Summary
You can extend all the powerful features of
DecisionSite, like guided analytics and posters, to
visualize information from more data sources, in an
easier manner, and do so in an optimized
environment
DEMO
Visualize a BLAST protein similarity search
against SwissProt
Thank you.
Questions?
Doug Del Prete
IBM Life Sciences USA
[email protected]