A novel interactive tool for
multidimensional biological data
analysis
Zhaowen Luo, Xuliang Jiang
Serono Research Institute, Inc.
Outline
1. Introduction
2. Methods
3. Applications and Examples
   H2L decision making
   Multiple-kinase inhibitor analysis
1. Introduction
Data representation in drug discovery
1. Two-dimensional view: Chemistry vs. Biology
   Chemistry: different compound structures
   Biology: different assay data (potency, selectivity profiles, ADME/PK and toxicity)
2. "A heat map is a graphical representation of data where the values taken by a variable in a two-dimensional map are represented as colors." (Wikipedia)
   A heat map meets the data representation needs of drug discovery and could be a good decision support tool.
My first heat map
A 200 X 200 Map
Nice picture
Some interesting patterns
But what is that???
A heat map is not enough
1. Lack of interactivity:
   Difficult to retrieve information
   Unable to display related information, such as structure
2. Static:
   Unable to manipulate data
   Unable to do real-time analysis
Solution: a fully interactive heat map application. Only an application can satisfy all the needs of decision support.
2. Methods
An interactive heat map application
Methods
Input: pre-defined set of compounds, pre-defined set of assays
Get Data: get structures, get result data from database, normalize result data
Make Heat Map: wrap data, assign color to each data point
Heat map operations: data mining, data analysis, re-format, re-arrange, export results
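The "assign color to each data point" step can be sketched as a linear interpolation between two colors. This is only an illustration: the red-to-green spectrum, the gray color for missing results, the 0-to-1 score scale, and the names `value_to_rgb` and `make_heat_map` are all assumptions, not details taken from the tool itself.

```python
def value_to_rgb(score, bad_rgb=(255, 0, 0), good_rgb=(0, 255, 0),
                 missing_rgb=(128, 128, 128)):
    """Map a normalized score to an RGB color.

    Assumes scores run from 0 (bad end) to 1 (good end); None marks a
    missing result. Scores outside the range are clamped.
    """
    if score is None:
        return missing_rgb
    t = min(1.0, max(0.0, score))
    # Interpolate each color channel between the bad and good colors.
    return tuple(round(b + t * (g - b)) for b, g in zip(bad_rgb, good_rgb))


def make_heat_map(matrix):
    """Turn a compounds x assays matrix of scores into a matrix of colors."""
    return [[value_to_rgb(score) for score in row] for row in matrix]
```

Changing the color spectrum then amounts to passing different `bad_rgb` and `good_rgb` values.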
Features
Point to any point in the heat map and a tooltip box shows the structure as well as the assay result.
Double-clicking any point brings the user to the source of the original data.
Drawing a box in the heat map creates a focus heat map for the area of interest.
More details about the point are shown here.
Example of focus map
Assay Name
Compound ID
More operations
Color spectrum can be changed.
Map orientation can be changed.
Data analysis tools
1. Data points can be re-arranged based on analysis results.
2. Analysis results can be exported.
Normalize biological endpoints
Problem: comparing apples with oranges
Solution: use a relative scale (MIN-MAX method)
Define a good end and a bad end for each endpoint.
Normalize each result based on both ends.
For different kinds of assays, we define different methods to normalize results.
Users can customize their own normalization methods.
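A minimal sketch of the MIN-MAX idea, assuming each endpoint is described by one "good" and one "bad" value and that results are mapped linearly onto a 0-to-1 scale. The clamping of out-of-range values, the 0-to-1 range, and the function name `normalize` are assumptions; the slides do not state the exact scale the tool uses.

```python
def normalize(value, good, bad):
    """Rescale a raw result so that the bad end maps to 0 and the good end to 1.

    Works whether "good" is the smaller number (e.g. logIC50 for potency)
    or the larger one (e.g. half-life). Values beyond either end are clamped.
    """
    if good == bad:
        raise ValueError("good and bad ends must differ")
    t = (value - bad) / (good - bad)
    return min(1.0, max(0.0, t))


# Enzyme potency, logIC50: good end -7, bad end -4.5.
potency_score = normalize(-5.75, good=-7, bad=-4.5)

# Rat half-life in hours: good end 2, bad end 0.25.
half_life_score = normalize(1.125, good=2, bad=0.25)
```

Because both results land on the same unitless scale, a potency value and a half-life can share one color spectrum.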
Normalization examples
Potency – Enzyme Assay – logIC50
  Good: -7
  Bad: -4.5
Cytochrome P450 Inhibition – logIC50
  Good: -8
  Bad: -5
Potency – Cell Based Assay – logIC50
  Good: -4.5
  Bad: -7
Rat T1/2
  Good: > 2 hours
  Bad: < 0.25 hour
Normalized results
Raw data (different units):
50 nM    80 %    0.5 hr   ...
100 nM   40 %    1 hr     ...
200 nM   70 %    2 hr     ...
...      ...     ...      ...
Normalized data (no units):
9     8     1     ...
8     4     4     ...
7     7     9     ...
...   ...   ...   ...
The normalized data yields a distance matrix for compounds and a distance matrix for assays.
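The two distance matrices can be sketched from the normalized values on this slide, reading each row as one compound and each column as one assay. Euclidean distance is an assumption here, since the slides do not name the metric, and `distance_matrix` and `transpose` are illustrative helper names.

```python
import math


def distance_matrix(vectors):
    """Pairwise Euclidean distances between equal-length score vectors."""
    return [[math.dist(a, b) for b in vectors] for a in vectors]


def transpose(matrix):
    """Swap rows and columns so assays become the vectors being compared."""
    return [list(column) for column in zip(*matrix)]


# Normalized values from the slide: rows are compounds, columns are assays.
normalized = [
    [9, 8, 1],
    [8, 4, 4],
    [7, 7, 9],
]

compound_distances = distance_matrix(normalized)
assay_distances = distance_matrix(transpose(normalized))
```

The same function serves both views; only the orientation of the input changes.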
Data analysis
Sorting
Clustering
Distance matrix
Similarity analysis
Example matrix:
23    45    21    ...
109   234   55    ...
77    78    97    ...
...   ...   ...   ...
• Analysis can be done for compounds and assays
• Based on biological assay results
• Results can be exported to an Excel file for further analysis
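Sorting compounds by their biological profile can be sketched as ranking on a single overall score. The scoring rule below (the mean of the available normalized results, with missing results skipped) is an assumption, as are the compound IDs and the function names.

```python
def profile_score(scores):
    """Mean of the available normalized results; None marks a missing result."""
    present = [s for s in scores if s is not None]
    if not present:
        return float("-inf")  # compounds with no data sort to the bottom
    return sum(present) / len(present)


def sort_by_profile(compounds):
    """Return compound IDs ordered from best to worst overall profile."""
    return sorted(compounds, key=lambda cid: profile_score(compounds[cid]),
                  reverse=True)


# Hypothetical compound IDs with normalized assay results.
compounds = {
    "CPD_001": [9, 8, 1],
    "CPD_002": [8, 4, 4],
    "CPD_003": [7, 7, 9],
}
ranking = sort_by_profile(compounds)
```

Averaging only the available results keeps compounds with missing data in the ranking, but their scores then rest on fewer assays.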
Structural analysis
1. Clustering and sorting compounds by their structural similarity.
   Uses fingerprints to calculate the similarity between compounds.
2. Provides structure-activity representation and analysis.
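Fingerprint similarity is commonly computed with the Tanimoto coefficient. The slides do not say which measure the tool uses, so the sketch below is an assumption, and it represents a fingerprint simply as the set of positions of its "on" bits rather than using any particular toolkit's fingerprint type.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto coefficient: shared on-bits divided by all on-bits in either."""
    a, b = set(fingerprint_a), set(fingerprint_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


# Hypothetical fingerprints given as sets of on-bit positions.
similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```

A structural distance for clustering can then be taken as one minus the similarity.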
Business considerations
1. Hide information
   Use generic names for compounds and assays. For example, compounds use a prefix and a sequence number.
   Use a generic structure, such as benzene, to hide the real structure.
   Look-up table for symbol replacement
2. Offline (offsite) capability
   Export and import the heat map to a binary file
   Re-import the map offline without connecting to the corporate database.
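The prefix-plus-sequence-number naming and the look-up table can be sketched as follows. The real identifiers, the `CPD_` prefix, the three-digit padding, and the function names are all hypothetical.

```python
def build_alias_table(real_names, prefix):
    """Assign each distinct real name a generic alias: prefix + sequence number."""
    table = {}
    for name in real_names:
        if name not in table:
            table[name] = f"{prefix}{len(table) + 1:03d}"
    return table


def mask(names, table):
    """Replace real names with their generic aliases via the look-up table."""
    return [table[name] for name in names]


# Hypothetical real compound identifiers.
real_ids = ["AS-1234", "AS-9876", "AS-1234"]
aliases = build_alias_table(real_ids, prefix="CPD_")
masked = mask(real_ids, aliases)
```

Keeping the table separate from the exported map lets the owner translate aliases back while recipients see only generic names.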
Application development
1. Java™ JDK 1.5 (from Sun Microsystems)
2. ChimePro™ for Java from MDL
3. CDK
4. JDBC 1.4 from Oracle
Features:
1. Directly extracts structural and assay information from the Accord Enterprise database and the MDL ISIS/Host database.
2. Web deployed (Java Web Start)
3. Applications and Examples
H2L project data analysis and decision making
Heat map details:
1. 214 compounds from a list in Accord Enterprise
2. 18 assays in four assay groups
   1. Potency
   2. CYP450 inhibition
   3. In vitro ADME
   4. In vivo PK
Heat map
Heat map – after sorting by biological profile
Focus on the most active area (top area)
All top compounds are clinical or lead candidates
Summary for H2L data analysis
Brings together structural data and many biological assays for a discovery project.
Multidimensional data analysis
The most active compounds are not the top compounds in overall profile score.
The heat map can pick out the drug candidates
Problems:
Missing data points: many compounds do not have in vivo data.
Clustering analysis is not accurate in this case.
Kinase activity and selectivity analysis
Heat map details:
1. 105 positives in a multiple-kinase screen
2. 12 kinase assays in 4 kinase families: AGC, OTHER, STE, TK
Sorted by Active Profile
Multiple-kinase inhibitors are ranked at the top.
Cluster by structural similarity
1. Compounds are colored by cluster
2. Cluster 1: AGC_2, Other_3, and TK_10 inhibitors
3. Cluster 2: Other_3 inhibitors
4. Pan-inhibitors
Structurally similar compounds (in the same structural cluster) have similar kinase inhibitory behaviors.
Cluster by overall kinase inhibitory profile
Multiple inhibitor cluster
Three singlets exhibit different inhibitory patterns
AGC_1 inhibitor cluster
AGC_1, AGC_2 and TK_10 inhibitors
Cluster assays based on overall compound profile
• The clusters of assays based on compound profiles are not the same as the phylogenetic tree.
• Identify kinases with possible cross-interaction.
Summary of kinase inhibitors heat map
1. Identify pan-inhibitors.
2. Graphical structure-activity relationship
3. Identify kinase inhibitor activity patterns for selectivity analysis.
4. Cluster kinases based on compound profiles and identify possible cross-interaction groups.
Conclusion
1. Provides an interactive graphics tool for decision making in the drug discovery process.
   Gets data directly from the corporate database
   Interactive
   Information-rich: structural and biological assay data in one place
   One-stop shop for information analysis in drug discovery
2. Statistical analysis based on result code provides a powerful tool for decision making
   Based on overall biological profile
   Can pick winners in the H2L process
   Provides useful SAR analysis for compounds
   Provides selectivity profiles for biological targets.
Acknowledgments
Ben Askew
Steve Arkinstall
Brian Healey