No slide title - ACS Division of Chemical Information

Data mining of toxic chemicals &
database-based toxicity prediction
Jiansuo Wang & Luhua Lai
Institute of Physical Chemistry, Peking University
P. R. China
1
Our goal: to introduce risk assessment of
chemicals in the early stage of drug design.
Candidates generated with computer aid → initial screening of chemical toxicity → leads which are a bit “safer”
2
Characteristics and difficulties of the problem that arise from
computer-aided drug design, besides the complexity of toxicity itself:
•The virtually generated molecules are numerous.
•The molecules designed as drugs may be structurally diverse.
•Little or no information is available about the molecules
other than their chemical structure.
3
How to evaluate the bio-activity (toxicity)
of a large number of molecules only from
their structure?
•In terms of structure-activity rules: expert system.
•In terms of statistical models: QSAR
(Qualitative/Quantitative Structure Activity
Relationship).
4
How to extract rules/models of toxic
chemicals from the database of toxic
chemicals to aid toxicity assessment?
•Structural features of toxic chemicals: statistical analysis,
similarity analysis and cluster analysis, applied to the database RTECS
•QSAR models of toxic chemicals: QSAR combined with cluster analysis,
applied to the database RTECS
5
What features characterize toxic chemicals?
•Molecular weight
•Atomic composition of molecules
•Groups of molecules
•Rings of molecules
An initial database analysis shows that there is
no distinct difference between toxic chemicals
and drugs in these basic molecular features.
6
Classification of toxic substances
according to action modes:
1) substances that exhibit extremes of acidity, basicity,
dehydrating ability, or oxidizing power;
2) reactive substances that contain functional groups prone to
react with biomolecules in a damaging way;
3) heavy metals;
4) lipid-soluble compounds;
5) species that bind reversibly or irreversibly to biomolecules
and alter their normal function; and so on.
Manahan, S. E. Toxicological chemistry
7
Structure patterns
Structure patterns take into account the molecule as a whole and the
specificity of action modes among molecules.
A molecular structure pattern is defined as a template
comprising a given framework and some given groups.
It represents the common structural features shared by a series
of molecules that are likely to act in a toxicologically
similar manner.
8
How to get molecular structure patterns?
•Dissect the molecules
•Similarity comparison:
$$R_{AB} = \frac{S_A \cap S_B}{S_A \cup S_B}$$
•Cluster analysis
[Figure: dissection of an example molecule into side-chains and a framework made up of ring-systems and linkers.]
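The similarity comparison step can be sketched as follows. This is a minimal illustration, assuming each molecule has already been dissected into a set of fragment labels and that the similarity index is the ratio of shared fragments to all fragments (a Tanimoto-style index). The function name and the fragment labels are hypothetical and are not taken from the authors' program.

```python
def fragment_similarity(frags_a, frags_b):
    """Ratio of shared fragments to all fragments of two molecules."""
    set_a, set_b = set(frags_a), set(frags_b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty fragment sets: define similarity as 0
    return len(set_a & set_b) / len(union)


# Hypothetical fragment labels for two molecules.
mol_a = {"benzene ring", "ester linker", "MeO", "tertiary amine"}
mol_b = {"benzene ring", "ester linker", "Cl"}
similarity = fragment_similarity(mol_a, mol_b)  # 2 shared out of 5 total
```

Pairwise values of this kind could then feed the cluster analysis step; a threshold such as the 0.6 used elsewhere in the slides would decide which molecules count as related.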
9
Do structure patterns really exist in the
database of toxic chemicals ?
The underlying idea of structure patterns:
Specificity of action modes → structural correlation among
the molecules with similar action modes
The embodiment of structure patterns in the database:
• Structure similarity among the molecules in the database
will converge as the size of the database grows
from small to large (parallel analysis).
• A large enough database will have some predictive power for
new toxic chemicals (cross analysis).
10
[Figure: coverage rate vs. database size, with 0.6 as the similarity limit.]
The figure shows that, for a given prediction accuracy, the predictive ability of
the database converges once the database is large enough.
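One way to read the coverage rate is as the fraction of query molecules that have at least one database molecule at or above the similarity limit. The sketch below assumes that definition, which the slide does not spell out, and uses a set-based similarity as a stand-in; all names and the toy data are hypothetical.

```python
def coverage_rate(queries, database, similarity, limit=0.6):
    """Fraction of queries with at least one database entry whose
    similarity reaches the limit."""
    if not queries:
        return 0.0
    covered = sum(
        1 for q in queries
        if any(similarity(q, d) >= limit for d in database)
    )
    return covered / len(queries)


def jaccard(a, b):
    """Stand-in similarity on fragment sets (an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy data: coverage of the same queries by a growing database.
queries = [{"a", "b", "c"}, {"x", "y"}, {"p", "q", "r"}]
database = [{"a", "b", "c", "d"}, {"x", "y"}, {"p", "z"}]
rates = [coverage_rate(queries, database[:n], jaccard) for n in (1, 2, 3)]
```

Plotting such rates against database size gives a curve of the kind described; a plateau suggests that adding more entries no longer brings in new structural classes.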
11
This indicates that structure patterns may exist in the database.
The findings of a systematic analysis
of the database indicate that
structure patterns are likely to exist,
and that searching for them is both
necessary and feasible.
12
The representative molecules of some structure patterns of
toxic chemicals
Chemical-count    CAS-Number
401               140-41-0
384               55-45-8
383               102585-42-2
352               73972-98-2
323               2828-42-4
[Chemical structure column: one structure drawing per row.]
13
Data mining of toxic chemicals: QSAR
combined with structure patterns
A two-step strategy to explore noncongeneric toxic
chemicals from the database: the screening of structure
patterns and the generation of detailed relationship
between structure and activity.
First, an efficient similarity comparison is proposed
to screen chemical patterns for further QSAR analysis.
Then, a QSAR study of each structure pattern provides
an estimate of the activity as well as the detailed
relationship between activity and structure.
14
An example of the implementation
•Select one structure pattern.
The representative molecule of the structure pattern
(WLN: T6VMVMV FHJ F2Y&1 F2U1; CAS-number: 115-44-6):
[Structure drawing: ring bearing C2H5(CH3)CH- and CH2-CH=CH2 substituents.]
•By computing molecular similarity, we get 189 chemicals
from the database RTECS whose similarity values to the
representative molecule are higher than 0.6.
•According to the species observed and the route of exposure, the
chemicals mainly fall into five major categories.
•Build CoMFA models relating structure to LD50
values for three series of chemicals.
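The screening and grouping steps above can be sketched as below. This is only an illustration: it assumes similarity values to the representative molecule have already been computed, and the record layout, names and numbers are hypothetical rather than RTECS data.

```python
from collections import defaultdict


def screen_and_group(records, limit=0.6):
    """Keep records more similar to the representative molecule than
    the limit, then group them by (species, route of exposure)."""
    groups = defaultdict(list)
    for rec in records:
        if rec["similarity"] > limit:
            groups[(rec["species"], rec["route"])].append(rec)
    return dict(groups)


# Hypothetical records; similarity is precomputed against the
# representative molecule, LD50 values are placeholders in mg/kg.
records = [
    {"name": "chem-1", "similarity": 0.82, "species": "rabbit",
     "route": "intravenous", "ld50": 45.0},
    {"name": "chem-2", "similarity": 0.71, "species": "rabbit",
     "route": "intravenous", "ld50": 60.0},
    {"name": "chem-3", "similarity": 0.65, "species": "mouse",
     "route": "oral", "ld50": 300.0},
    {"name": "chem-4", "similarity": 0.40, "species": "mouse",
     "route": "oral", "ld50": 900.0},
]
series = screen_and_group(records)
```

Each resulting group shares a species and route, so its LD50 values are comparable and can serve as one series for a QSAR model.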
15
Rabbit-intravenous: cross-validated and final fit CoMFA
analysis with five components; 37 chemicals, q2 = 0.608, r2 =
0.981, F = 323.
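The two statistics quoted here can be illustrated with a small sketch: r2 measures the final fit on all data, while q2 is the leave-one-out cross-validated counterpart, q2 = 1 - PRESS / SS_total. CoMFA fits a PLS model with several components over field descriptors; the sketch swaps in single-descriptor least squares purely to show how the two numbers differ, and the data are toy values, not the 37 chemicals of the slide.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(xs, ys):
    """Conventional r2 of the final fit on all data."""
    slope, icpt = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + icpt)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(xs, ys):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_total."""
    my = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without point i, then predict point i.
        slope, icpt = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + icpt)) ** 2
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Toy descriptor/activity values (not data from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
r2, q2 = r_squared(xs, ys), q_squared_loo(xs, ys)
```

Because each left-out point is predicted by a model that never saw it, q2 does not exceed r2 for a least-squares fit, which is why a gap such as 0.608 versus 0.981 is not unusual.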
16
Rabbit-intravenous: contour map of the final CoMFA model; for steric
effects, more bulk near green and less bulk near yellow is favorable to
increase the activity, while for electrostatic effects, more positive near
blue and more negative near red is desirable for molecules to be more
active.
17
The overall performance of the procedure demonstrates:
•Such a stepwise scheme is feasible and
effective for mining a database of toxic
chemicals.
•The scheme takes account of the structural
diversity of toxic chemicals.
•The scheme is a compromise between
speed and accuracy.
18
dbToxPre: database-based toxicity predictor of
chemicals
Inquiry molecule + database of toxic chemicals → ShapeAnal → structure-related set
Structure-related set → field-based similarity analysis → close molecule & similarity-activity
Structure-related set → flexible CoMFA analysis → CoMFA model & activity prediction
19
dbToxPre
The program mainly includes four parts:
1) a fast and efficient clustering selection of molecules based on
molecular shape
2) field-based similarity computation of molecular structure
based on shape cluster
3) flexible CoMFA analysis of molecules based on shape cluster
4) a database of toxic chemicals suitable for such a procedure
The characteristics of the program:
fast; efficient; dynamically combined with the database
20
ShapeAnal: fast & efficient shape analysis
of molecules
Inquiry molecule
→ Marking of atoms in the molecule
→ Structure description: dimension, ring systems,
relative orientation of ring-system atoms
→ Alignment of molecule shapes
→ Structure-related set
21
Molecular Field
• Concept: continuous property fields around the
molecule produced by the molecular atoms.
• Similarity analysis of molecular field (Carbo index):
$$R_{AB} = \frac{\int P_A P_B \, dv}{\left(\int P_A^2 \, dv\right)^{1/2} \left(\int P_B^2 \, dv\right)^{1/2}}$$
• Comparative Molecular Field Analysis, CoMFA
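A minimal numerical sketch of the Carbo index follows, assuming both property fields are sampled on the same regular grid so that sums over grid points stand in for the volume integrals (the constant cell volume cancels in the ratio). The function name and field values are hypothetical.

```python
import math


def carbo_index(field_a, field_b):
    """Carbo similarity of two property fields sampled on the same
    grid; sums over grid points stand in for the volume integrals."""
    if len(field_a) != len(field_b):
        raise ValueError("fields must be sampled on the same grid")
    overlap = sum(a * b for a, b in zip(field_a, field_b))
    norm_a = math.sqrt(sum(a * a for a in field_a))
    norm_b = math.sqrt(sum(b * b for b in field_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero field has no defined shape
    return overlap / (norm_a * norm_b)


# Hypothetical field values at four grid points.
field_a = [0.2, -0.5, 1.0, 0.3]
field_b = [0.4, -1.0, 2.0, 0.6]    # same shape as field_a, twice the size
field_c = [-0.2, 0.5, -1.0, -0.3]  # field_a with the sign flipped
```

The index depends on the shape of the fields rather than their magnitude: `field_a` and `field_b` score 1, while `field_a` and `field_c` score -1.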
22
Evolutionary Algorithm considering flexibility of molecules
•Community/Population: structure-related set
•Species/Chromosome: combination of rotatable
single bonds in the molecules
•Convergence: steady state of sorting
•Procedure: parent generation → congeneric mutation → child generation
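The evolutionary loop can be sketched roughly as below. This is a generic mutate-and-sort scheme under stated assumptions, not the authors' algorithm: each individual is a vector of torsion angles for the rotatable single bonds, the score function is a hypothetical stand-in for a similarity or CoMFA score, and a fixed number of generations replaces the steady-state convergence test.

```python
import math
import random


def evolve_torsions(score, n_bonds, pop_size=20, generations=50,
                    step=30.0, seed=0):
    """Mutate torsion-angle vectors (degrees) and keep the best scorers.
    Parents stay in the sorted pool, so the best score never drops."""
    rng = random.Random(seed)
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_bonds)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        children = []
        for parent in population:
            child = list(parent)
            i = rng.randrange(n_bonds)  # mutate one rotatable bond
            moved = child[i] + rng.gauss(0.0, step)
            child[i] = (moved + 180.0) % 360.0 - 180.0  # wrap the angle
            children.append(child)
        # Sort parents and children together; keep the top pop_size.
        population = sorted(population + children, key=score,
                            reverse=True)[:pop_size]
        history.append(score(population[0]))
    return population[0], history


def toy_score(torsions):
    """Hypothetical stand-in score: highest when every torsion is
    at 60 degrees."""
    return sum(math.cos(math.radians(t - 60.0)) for t in torsions)


best, history = evolve_torsions(toy_score, n_bonds=3)
```

Sorting parents and children into one pool mirrors the "steady state of sorting" idea: once the ranking of the top individuals stops changing between generations, the search can be considered converged.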
23
Fast field-based similarity analysis
Structure-related set
Molecular alignment based on framework shape
EA: conformation mutation & similarity comparison
Similarity analysis & activity prediction
24
Flexible CoMFA
Structure-related set
Molecular alignment based on framework shape
EA: conformation mutation & CoMFA
CoMFA model & activity prediction
• The procedure of CoMFA
• Characteristics: considering conformational
flexibility & hydrophobic field
25
Rebuilding of toxic-chemical database
•Selection of DBMS
Michael Stonebraker’s classification:
simple data & no query -- file system
complex data & no query -- object-oriented DBMS
simple data & query -- relational DBMS
complex data & query -- object-relational DBMS:
PostgreSQL
•Sketch map of the design of Toxdb
[Figure: Toxdb and its components StructInfo, AcuteTox, ...]
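A rough sketch of how such a database might be laid out is given below. The slides name only Toxdb, StructInfo and AcuteTox; every column name here is a hypothetical guess, the inserted row is a placeholder, and Python's built-in SQLite is used as a stand-in for PostgreSQL so that the example runs without a server.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; all column names are
# hypothetical, only the two table names come from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StructInfo (
        cas_number TEXT PRIMARY KEY,
        wln        TEXT
    );
    CREATE TABLE AcuteTox (
        cas_number TEXT REFERENCES StructInfo(cas_number),
        species    TEXT,
        route      TEXT,
        ld50_mg_kg REAL
    );
""")
# Placeholder rows, not real chemicals or measured LD50 values.
conn.execute("INSERT INTO StructInfo VALUES (?, ?)",
             ("000-00-0", "EXAMPLE-WLN"))
conn.execute("INSERT INTO AcuteTox VALUES (?, ?, ?, ?)",
             ("000-00-0", "rabbit", "intravenous", 1.0))
rows = conn.execute("""
    SELECT s.wln, a.species, a.route
    FROM StructInfo AS s JOIN AcuteTox AS a USING (cas_number)
""").fetchall()
```

Keeping structure data and toxicity records in separate tables lets one chemical carry several LD50 entries, one per species and route of exposure.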
26
Database-based toxicity prediction of chemicals
assesses the activity of the inquiry molecule from a
series of related molecules in the database. The
purposes:
•To make the best use of the available
knowledge of related chemicals.
•To offset the uncertainty of a single data point by
mutual correction among a series of
molecules.
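One simple way to combine a series of related molecules into a single estimate is a similarity-weighted mean, sketched below. The slides do not state the combining rule, so this is an assumption made for illustration; the function name and the values are hypothetical.

```python
def predict_activity(neighbors):
    """Similarity-weighted mean of the activities of related molecules.
    `neighbors` is a list of (similarity, activity) pairs."""
    total_weight = sum(sim for sim, _ in neighbors)
    if total_weight <= 0.0:
        raise ValueError("need a neighbor with positive similarity")
    return sum(sim * act for sim, act in neighbors) / total_weight


# Hypothetical (similarity, log LD50) pairs for three related molecules.
neighbors = [(0.9, 2.0), (0.8, 2.4), (0.6, 3.0)]
estimate = predict_activity(neighbors)
```

Averaging over several neighbors means an outlying measurement for one chemical shifts the estimate only in proportion to its weight, which is the "mutual correction" idea in its simplest form.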
27
Conclusion
•Initial analysis of the toxic-chemical database confirms the concept
of structure patterns of toxic chemicals.
•QSAR combined with structure patterns provides an alternative way to
explore noncongeneric toxic chemicals in the database.
•Database-based toxicity prediction dynamically draws on the
database to assist risk assessment of chemicals.
•Data-mining & toxicity prediction: visualization computation
Storage computation: effective computation integrated into
reasonable data storage
References & papers:
1. Data mining of toxic chemicals: structural patterns and QSAR, Jiansuo Wang, Luhua
Lai, Youqi Tang, J. Mol. Modelling, 1999, 252-262.
2. Predictive toxicology of toxic chemicals and database mining, Jiansuo Wang, Luhua
Lai, Youqi Tang, Chinese Science Bulletin, 2000, 45, 12, 1093-1097.
3. Structural features of toxic chemicals for specific toxicity, Jiansuo Wang, Luhua Lai,
Youqi Tang, J. Chem. Inf. Comput. Sci., 1999, 39, 6, 1173-1189.
28
Acknowledgements
Prof. Luhua Lai
Prof. Youqi Tang
Mr. Alan Gelberg
…...
29