Molecular Descriptors

C371 Fall 2004
INTRODUCTION
• Molecular descriptors are numerical
values that characterize properties of
molecules
• Examples:
– Physicochemical properties (empirical)
– Values from algorithms, such as 2D
fingerprints
• Vary in complexity of encoded information
and in compute time
Descriptors for Large Data Sets
• Descriptors representing properties of
complete molecules
– Examples: LogP, Molar Refractivity
• Descriptors calculated from 2D graphs
– Examples: Topological Indexes, 2D
fingerprints
• Descriptors requiring 3D representations
– Example: Pharmacophore descriptors
DESCRIPTORS CALCULATED
FROM 2D STRUCTURES
• Simple counts of features
– Lipinski Rule of Five (H bonds, MW, etc.)
– Number of ring systems
– Number of rotatable bonds
• Not likely to discriminate sufficiently when
used alone
• Combined with other descriptors for best
effect
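A simple-count filter of this kind can be sketched in a few lines. The example below assumes the descriptor values have already been computed elsewhere and uses the commonly cited Rule of Five thresholds (MW ≤ 500, LogP ≤ 5, at most 5 H-bond donors, at most 10 H-bond acceptors). The class name, function name, and the two sets of input values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CountDescriptors:
    """Precomputed simple descriptors for one molecule (supplied by the caller)."""
    mol_weight: float
    logp: float
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(d: CountDescriptors) -> int:
    """Count how many of the four commonly cited Rule of Five limits are exceeded."""
    checks = [
        d.mol_weight > 500,        # molecular weight limit
        d.logp > 5,                # hydrophobicity limit
        d.h_bond_donors > 5,       # hydrogen bond donor limit
        d.h_bond_acceptors > 10,   # hydrogen bond acceptor limit
    ]
    return sum(checks)


# Hypothetical descriptor values, not taken from any real compound.
small_molecule = CountDescriptors(300.0, 2.5, 2, 5)
large_molecule = CountDescriptors(650.0, 6.2, 3, 12)

print(rule_of_five_violations(small_molecule))
print(rule_of_five_violations(large_molecule))
```

Because each check is a coarse yes/no test, many different molecules receive the same result, which is why such counts are usually combined with other descriptors.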
Physicochemical Properties
• Hydrophobicity
– LogP – the logarithm of the partition coefficient
between n-octanol and water
• ClogP (Leo and Hansch) – based on a small set of fragment
values derived from a small set of simple molecules
– BioByte: http://www.biobyte.com/
– Daylight’s MedChem Help page
– http://www.daylight.com/dayhtml/databases/medchem/medchem-help.html
– Isolating carbon: one not doubly or triply bonded to a
heteroatom
ACD Labs Calculated Properties
• http://www.acdlabs.com
• ACD Labs values now incorporated into
the CAS Registry File for millions of
compounds
• I-Lab: http://ilab.acdlabs.com/
– Name generation
– NMR prediction
– Physical property prediction
Molar Refractivity
• $MR = \frac{n^2 - 1}{n^2 + 2} \cdot \frac{MW}{d}$
where n is the refractive index, d is
density, and MW is molecular weight.
• Measures the steric bulk of a molecule.
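As a worked illustration of the molar refractivity formula, the sketch below evaluates it for water using approximate textbook values (n ≈ 1.333, d ≈ 0.997 g/cm³, MW ≈ 18.015 g/mol). The function name is an assumption made for this example.

```python
def molar_refractivity(n: float, density: float, mol_weight: float) -> float:
    """Return (n^2 - 1) / (n^2 + 2) * MW / d.

    With density in g/cm^3 and molecular weight in g/mol the result is in cm^3/mol.
    """
    n_squared = n * n
    return (n_squared - 1.0) / (n_squared + 2.0) * (mol_weight / density)


# Approximate values for water near room temperature.
mr_water = molar_refractivity(n=1.333, density=0.997, mol_weight=18.015)
print(round(mr_water, 2))  # roughly 3.7 cm^3/mol
```

The MW/d term is the molar volume, which is why the value tracks the bulk of the molecule.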
Topological Indexes
• Single-valued descriptors calculated from
the 2D graph of the molecule
• Characterize structures according to size,
degree of branching, and overall shape
• Example: Wiener Index – counts the
number of bonds between pairs of atoms
and sums the distances between all pairs
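The Wiener Index can be sketched directly from that description. The example assumes a hydrogen-suppressed molecular graph stored as an adjacency dictionary (atom index mapped to a list of bonded atom indexes); the function names and this input format are choices made for illustration only.

```python
from collections import deque


def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def wiener_index(adjacency):
    """Sum of shortest bond-path lengths over all unordered pairs of atoms."""
    total = 0
    for atom in adjacency:
        total += sum(bond_distances(adjacency, atom).values())
    return total // 2  # every pair was counted once from each end


# Carbon skeletons only (hydrogens omitted).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The branched isomer gives the smaller value, which shows how the index reflects degree of branching for molecules of the same size.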
Topological Indexes: Others
• Molecular Connectivity Indexes
– Randić (et al.) branching index
• Defines a “degree” of an atom as the number of
adjacent non-hydrogen atoms
• Bond connectivity value is the reciprocal of the
square root of the product of the degree of the two
atoms in the bond.
• Branching index is the sum of the bond
connectivities over all bonds in the molecule.
– Chi indexes – introduce valence values to
encode sigma, pi, and lone pair electrons
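The three steps given for the Randić branching index translate almost line for line into code. This is a minimal sketch assuming a hydrogen-suppressed graph stored as an adjacency dictionary; the function name and input format are illustrative assumptions, and the valence-based chi indexes are not covered.

```python
import math


def randic_index(adjacency):
    """Sum over bonds of 1 / sqrt(degree(a) * degree(b)).

    `adjacency` maps each non-hydrogen atom to its bonded non-hydrogen atoms,
    so the degree of an atom is simply the length of its neighbor list.
    """
    degree = {atom: len(neighbors) for atom, neighbors in adjacency.items()}
    total = 0.0
    for a, neighbors in adjacency.items():
        for b in neighbors:
            if a < b:  # visit each bond once
                total += 1.0 / math.sqrt(degree[a] * degree[b])
    return total


# Carbon skeletons only (hydrogens omitted).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(round(randic_index(n_butane), 4))   # 1.9142
print(round(randic_index(isobutane), 4))  # 1.7321
```

As with the Wiener Index, the more branched skeleton yields the lower value.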
Kappa Shape Indexes
• Characterize aspects of molecular shape
– Compare the molecule with the “extreme
shapes” possible for that number of atoms
• Range from linear molecules to the completely
connected graph
2D Fingerprints
• Two types:
– One based on a fragment dictionary
• Each bit position corresponds to a specific substructure
fragment
• Fragments that occur infrequently may be more useful
– Another based on hashed methods
• Not dependent on a pre-defined dictionary
• Any fragment can be encoded
• Originally designed for substructure searching,
not for molecular descriptors
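The hashed approach can be illustrated with a toy sketch. It assumes the fragments have already been enumerated as strings by some other step; the fragment labels, the 64-bit length, and the use of SHA-1 as the hash are all assumptions for this example and do not reproduce any particular vendor's fingerprint.

```python
import hashlib


def hashed_fingerprint(fragments, n_bits=64):
    """Set one bit per fragment by hashing its string label into [0, n_bits)."""
    bits = [0] * n_bits
    for fragment in fragments:
        digest = hashlib.sha1(fragment.encode("utf-8")).digest()
        position = int.from_bytes(digest[:4], "big") % n_bits
        bits[position] = 1  # different fragments may collide on the same bit
    return bits


# Hypothetical path fragments for an ethanol-like skeleton.
fragments = ["C", "O", "C-C", "C-O", "C-C-O"]
fingerprint = hashed_fingerprint(fragments)

print(sum(fingerprint), "bits set out of", len(fingerprint))
```

No dictionary is needed because any string can be hashed, but the price is that a set bit cannot be traced back to a single fragment when collisions occur.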
Atom-Pair Descriptors
• Encode all pairs of atoms in a molecule
• Include the length of the shortest bond-by-bond path between them
• Each atom is described by its elemental type plus the number of attached non-hydrogen atoms and the number of π-bonding electrons
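A minimal sketch of atom-pair generation follows. It assumes a connected, hydrogen-suppressed graph, with each atom's type given as a tuple (element, attached non-hydrogen atoms, π-bonding electrons) and the separation measured as a number of bonds. Function names and data layout are illustrative assumptions.

```python
from collections import Counter, deque
from itertools import combinations


def shortest_path_lengths(adjacency, start):
    """Breadth-first search giving the bond count from `start` to each atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def atom_pair_counts(atom_types, adjacency):
    """Count (type, distance, type) triples over all pairs of atoms.

    Assumes the graph is connected so that every pair has a path.
    """
    dist_from = {a: shortest_path_lengths(adjacency, a) for a in adjacency}
    counts = Counter()
    for a, b in combinations(sorted(adjacency), 2):
        # Order the two atom types so (X, d, Y) and (Y, d, X) are the same key.
        first, second = sorted([atom_types[a], atom_types[b]])
        counts[(first, dist_from[a][b], second)] += 1
    return counts


# Ethanol heavy atoms: C-C-O, no pi electrons on any atom.
ethanol_types = {0: ("C", 1, 0), 1: ("C", 2, 0), 2: ("O", 1, 0)}
ethanol_bonds = {0: [1], 1: [0, 2], 2: [1]}

pairs = atom_pair_counts(ethanol_types, ethanol_bonds)
for key, count in sorted(pairs.items()):
    print(key, count)
```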
BCUT Descriptors
• Designed to encode atomic properties that
govern intermolecular interactions
• Used in diversity analysis
• Encode atomic charge, atomic
polarizability, and atomic hydrogen
bonding ability
DESCRIPTORS BASED ON 3D
REPRESENTATIONS
• Require the generation of 3D
conformations
– Can be computationally time consuming with
large data sets
– Usually must take into account conformational
flexibility
– 3D fragment screens encode spatial
relationships between atoms, ring centroids,
and planes
Pharmacophore Keys
& Other 3D Descriptors
• Based on atoms or substructures thought
to be relevant for receptor binding
• Typically include hydrogen bond donors
and acceptors, charged centers, aromatic
ring centers and hydrophobic centers
• Others: 3D topographical indexes,
geometric atom pairs, quantum
mechanical calculations for HOMO and
LUMO
DATA VERIFICATION AND
MANIPULATION
• Data spread and distribution
– Coefficient of variation (standard deviation
divided by the mean)
• Scaling (standardization): making sure that
each descriptor has an equal chance of
contributing to the overall analysis
• Correlations
• Reducing the dimensionality of a data set:
Principal Components Analysis
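The spread, scaling, and correlation checks can be sketched with the standard library alone. The example assumes scaling means autoscaling (subtract the mean, divide by the sample standard deviation), which is one common choice rather than the only one, and uses made-up descriptor columns. Principal Components Analysis is not shown.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values)


def autoscale(values):
    """Rescale a descriptor column to zero mean and unit sample variance."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation computed from the autoscaled columns."""
    zx, zy = autoscale(x), autoscale(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


# Hypothetical descriptor columns for four molecules.
mol_weight = [100.0, 200.0, 300.0, 400.0]
heavy_atoms = [7.0, 14.0, 22.0, 28.0]
logp = [3.1, 0.4, 2.2, 1.5]

print(coefficient_of_variation(mol_weight))
print(autoscale(mol_weight))
print(pearson(mol_weight, heavy_atoms))
print(pearson(mol_weight, logp))
```

After autoscaling, a descriptor with large raw values such as molecular weight no longer dominates one with small raw values such as LogP, and a correlation close to 1 or -1 flags a pair of descriptors that carry largely redundant information.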