Computation and computational thinking in chemistry

Computation and computational thinking
in Chemistry
Paul Madden
School of Chemistry
The “plan”
• My interest – atomistic, predictive
calculations of the properties of materials
• Energy minimization – optimization ideas
• Cutting out the computer – application of
optimization strategies in synthesis
numerous technologies benefit from the
capability to model thermodynamic and
transport properties accurately & reliably
Pyroprocessing of Nuclear Waste
More principles:
LiCl/KCl “solvent” – now
fluorides
Are the (continuum) models of transport
adequate representations of reality?
Why simulate?:
interpretation/visualization
provide data not obtainable by experiment
answer problems of principle, test theory
Molecular Dynamics simulation:
Follow trajectory of interacting atoms
Newton’s Laws of Motion
Need a “Law of Force”
– sometimes “pairwise additive”
(like gravitation, F ∝ 1/r^2)
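A minimal sketch of what "follow the trajectory" means in practice: two atoms on a line, moved forward in small time steps under Newton's laws with a pairwise force. The slides do not name an integrator or a force law; the velocity Verlet scheme, the Lennard-Jones force in reduced units, and every function and parameter name below are assumptions made for illustration.

```python
def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force -dU/dr for a Lennard-Jones pair (positive = repulsive)."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy U(r) (assumed force law, reduced units)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def simulate_pair(r0=1.5, steps=2000, dt=0.002, mass=1.0):
    """Velocity Verlet for two atoms on a line, starting at rest.

    Returns a list of (separation, total energy) pairs, one per step.
    """
    x = [0.0, r0]
    v = [0.0, 0.0]
    f = lj_force(x[1] - x[0])
    forces = [-f, f]  # Newton's third law: equal and opposite
    history = []
    for _ in range(steps):
        # Half-step velocity update, then full-step position update.
        for i in range(2):
            v[i] += 0.5 * dt * forces[i] / mass
            x[i] += dt * v[i]
        # Recompute forces at the new positions.
        f = lj_force(x[1] - x[0])
        forces = [-f, f]
        # Second half-step velocity update with the new forces.
        for i in range(2):
            v[i] += 0.5 * dt * forces[i] / mass
        kinetic = 0.5 * mass * (v[0] ** 2 + v[1] ** 2)
        r = x[1] - x[0]
        history.append((r, kinetic + lj_energy(r)))
    return history
```

With the default settings the pair oscillates about the potential minimum, and the total energy stays close to its starting value, which is the usual check that the time step is small enough.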
Electron Densities and the “Force Laws”
Covalent
A stiff spring between bonded atoms
Ionic, Non-bonding
Overlap of two spherical, non-bonding charge densities
Can model the dependence
on interatomic separation
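As a rough illustration of how simple such force laws can be, the sketch below writes each one as a function of the interatomic separation r. The functional forms are assumptions: a harmonic spring for the covalent bond, and a Coulomb term plus an exponential (Born-Mayer type) overlap repulsion for the ionic, non-bonding pair. The parameter values (k, r0, A, rho) are illustrative only and are not fitted to any material.

```python
import math

# e^2 / (4 * pi * eps0) expressed in eV * Angstrom
COULOMB_EV_ANGSTROM = 14.399645


def covalent_spring(r, k=30.0, r0=1.0):
    """Harmonic bond energy in eV: a stiff spring between bonded atoms.

    k (eV / Angstrom^2) and r0 (Angstrom) are illustrative values.
    """
    return 0.5 * k * (r - r0) ** 2


def ionic_pair(r, q1=1.0, q2=-1.0, A=1000.0, rho=0.3):
    """Ionic pair energy in eV: Coulomb term plus exponential overlap repulsion.

    A (eV) and rho (Angstrom) are illustrative values, not fitted parameters.
    """
    coulomb = COULOMB_EV_ANGSTROM * q1 * q2 / r
    overlap = A * math.exp(-r / rho)
    return coulomb + overlap


# Scan the ionic curve to locate its minimum (the preferred separation).
separations = [1.0 + 0.01 * i for i in range(401)]
r_min = min(separations, key=ionic_pair)
```

Scanning the ionic curve shows a single minimum at the separation where the overlap repulsion balances the Coulomb attraction.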
Because of the simplicity of these force laws, can model (atomistically) molecular materials of great complexity
Phospholipid
Cell membrane
Can visualise (qualitatively)
Ion permeation through α-haemolysin
"These movies were made by Dr. Aleksei Aksimentiev using VMD and are owned by the Theoretical and
Computational Biophysics Group, NIH Resource for Macromolecular Modeling and Bioinformatics, at
the Beckman Institute, University of Illinois at Urbana-Champaign."
Electron Densities and the “Force Laws”
Covalent
Now “easily” manipulated by chemistry (200 years)
Ionic, Non-bonding
Overlap of two spherical, non-bonding charge densities
Control the “liaisons” affected by thermal motion
Inhibition of Cyclin Dependent Kinases (CDKs)
CDK2 is involved in DNA replication
CDK2
It is overexpressed in cancer cells,
=> Find inhibitors
ATP binding pocket
Inhibition of Cyclin Dependent Kinases (CDKs)
NU2058
NU6102
9d-NU6027
NU6027
ATP
SU9516
Staurosporine
MD simulation:
Follow trajectory of interacting atoms
Newton’s Laws of Motion
Need a “Law of Force” – sometimes pairwise additive – and this makes large-scale simulation possible
But this only works if the electrons are moving “trivially” with the nuclei
Interatomic interactions mediated by local electron density
generally, this depends on instantaneous coordination environment
Electron density for a self-interstitial in Aluminium
Can obtain the forces directly from an electronic structure calculation (on-the-fly)
“First-Principles”
Such calculations can give accurate binding energies (v.i.)
Additional benefit: obtain the electronic structure
E.g: mechanism of oxidation of a silicon
surface (M. Payne)
The ab initio MD methods are general and
particularly useful when covalent bonds are
broken and formed
But they are very expensive, meaning that
many issues, requiring large
simulations or long runs, are out of reach
Why simulate?:
interpretation/visualization
provide data not obtainable by experiment
answer problems of principle, test theory
i.e. quantitative, realistic modelling
Properties of materials under extreme conditions
Mineralogy of the earth’s interior
Phase diagram of H2O -- or is it??
1 GPa ≈ 10,000 atmospheres!!
Direct coexistence simulation – to obtain melting temperature
Determine T & P at which equilibrated solid and liquid coexist
Size Matters:
Gillan, Alfè
The ab initio MD methods are general and
particularly useful when covalent bonds are
broken and formed
But they are very expensive, meaning that
many issues are out of reach
Maybe we can use simpler representation of
electronic structure in some cases
e.g. in ionic materials simple force laws do
not work quantitatively
Maybe in “ionic” materials:
Electron density
in an AlF3 crystal
Ions are not
spherical – they
are deformed in this
environment
Multiscale modelling
Incorporate such ideas into the interaction potential and parameterize it ab initio (A-I)
Direct coexistence simulation to determine
the melting temperature of MgO
Determine T & P at which equilibration occurs
Melting curve of MgO
ab initio model
Many problems may be
regarded as optimization
e.g. lowest energy structures
of a cluster or a crystal
Finding a global
minimum may be
easy, or hard
Energy Landscape
concept
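A toy picture of why a minimum may be easy or hard to find: on a landscape with more than one valley, plain downhill minimization ends in whichever valley it starts in. The one-dimensional quartic "energy" below, the step size, and the function names are all assumptions chosen for illustration; real cluster or crystal landscapes have many dimensions and many more minima.

```python
def energy(x):
    """Toy double-well landscape: deep minimum near x = -1.30, shallow one near x = 1.13."""
    return x ** 4 - 3.0 * x ** 2 + x


def gradient(x):
    """Analytic derivative of the toy landscape."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0


def descend(x0, step=0.01, iterations=5000):
    """Plain steepest descent: always move downhill from the starting point."""
    x = x0
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


left = descend(-2.0)   # ends in the deep (global) minimum
right = descend(2.0)   # ends in the shallow (local) minimum
```

Starting on the left finds the global minimum; starting on the right gets trapped in the higher one, which is the situation that motivates non-minimization strategies.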
For “hard” problems
non-minimization
strategies, such as
“genetic algorithms”
have been adopted
Structures of virus capsids
Hard for minimization
Genetic algorithm
Parents
11000110100101001110
00110110101101011100
Crossover
1100011010010 1011100
0011011010110 1001110
Offspring
mutation
“fitness”
Start with a population of “parents” and evolve
successive generations, by stochastically selecting
moves, to improve fitness
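The crossover, mutation and selection steps can be sketched on bit strings as follows. This is only an illustration under stated assumptions: the 20-bit parent strings are chosen so that a cut after bit 13 reproduces the two offspring shown on the slide, the fitness function (counting 1s) is a placeholder for whatever property is really being optimized, and the selection rule (keep the fitter half, breed the rest) is one simple choice among many.

```python
import random


def crossover(parent_a, parent_b, point):
    """Single-point crossover: swap the tails of two strings after `point`."""
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[b] if rng.random() < rate else b for b in bits)


def fitness(bits):
    """Placeholder fitness (number of 1s); a real problem supplies its own."""
    return bits.count("1")


def evolve(population, generations, rng, mutation_rate=0.02):
    """Evolve successive generations and return the fittest individual found."""
    population = list(population)  # work on a copy
    for _ in range(generations):
        # Keep the fitter half as parents (so the best is never lost).
        population.sort(key=fitness, reverse=True)
        parents = population[: len(population) // 2]
        children = []
        while len(parents) + len(children) < len(population):
            a, b = rng.sample(parents, 2)          # stochastic choice of parents
            point = rng.randrange(1, len(a))       # stochastic cut position
            child, _ = crossover(a, b, point)
            children.append(mutate(child, mutation_rate, rng))
        population = parents + children
    return max(population, key=fitness)


offspring = crossover("11000110100101001110", "00110110101101011100", 13)
```

Here `offspring` holds the pair 1100011010010 1011100 and 0011011010110 1001110 from the slide, written without the gap at the cut.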
Representation of problems within GA paradigm
Folding a protein, which should be “hard”, must actually
be easy (for nature – simulated annealing works!).
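For readers who have not met simulated annealing, here is a minimal sketch on a toy one-dimensional landscape with one shallow and one deep valley. The walker accepts uphill moves with the Metropolis probability exp(-ΔE/T) while the temperature T is lowered slowly, so it can escape the shallow valley it starts in. The landscape, the geometric cooling schedule, and all names and parameter values are assumptions for illustration; nothing here is a model of a protein.

```python
import math
import random


def energy(x):
    """Toy double-well landscape: deep minimum near x = -1.30, shallow one near x = 1.13."""
    return x ** 4 - 3.0 * x ** 2 + x


def anneal(x0, rng, t_start=2.0, t_end=0.01, steps=20000, max_step=0.5):
    """Metropolis simulated annealing; returns the lowest-energy point visited."""
    x = x0
    best_x = x0
    temperature = t_start
    cooling = (t_end / t_start) ** (1.0 / steps)  # geometric cooling factor
    for _ in range(steps):
        trial = x + rng.uniform(-max_step, max_step)
        delta = energy(trial) - energy(x)
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / T).
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            x = trial
            if energy(x) < energy(best_x):
                best_x = x
        temperature *= cooling
    return best_x


# Start in the shallow valley, where plain downhill minimization would stay.
best = anneal(1.13, random.Random(1))
```

Because the run starts hot, the walker crosses the barrier many times before the temperature drops, and the lowest point it records lies in the deep valley.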
Primary Structure: Sequence
• The primary structure of a protein is the amino acid sequence
• Twenty different amino acids have distinct shapes and properties
• Typical protein will contain ~ 200 links
Secondary Structure: α, β, & loops
• α helices and β sheets are stabilized by hydrogen bonds between backbone oxygen and hydrogen atoms
Tertiary Structure: A Protein Fold
• Proteins only work when properly folded
Levinthal paradox, 1968
• A polypeptide chain of 100 residues (amino acids)
• Each residue has only 2 possible configurations
• 2^100 ~ 10^30 configurations
• 10^-11 second is required to convert one to another
• 10^19 seconds ~ 10^11 years!
• Doubling time for a bacterium is <30 minutes
• Molten globule (microsecond ~ millisecond)
• Native state (millisecond ~ seconds)
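The arithmetic behind the paradox can be checked in a few lines. The numbers are the ones quoted on the slide; the variable names and the 365.25-day year are the only additions.

```python
residues = 100                 # length of the polypeptide chain
states_per_residue = 2         # configurations available to each residue
seconds_per_conversion = 1e-11 # time to convert one configuration to another

# Total number of chain configurations: 2^100.
configurations = states_per_residue ** residues

# Time to visit every configuration once, one after another.
search_seconds = configurations * seconds_per_conversion

seconds_per_year = 365.25 * 24 * 3600
search_years = search_seconds / seconds_per_year
```

The exhaustive search comes out at a few times 10^11 years, the same order of magnitude as the slide's estimate and vastly longer than the millisecond-to-second folding times listed.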
Idea of a folding
“funnel”
“Foldability” must be encoded in the amino acid sequence
Schematic representation of some of the states
accessible to a polypeptide chain following its
biosynthesis
We know the amino acid sequence from the
genome project
The folded structures of some proteins are known from crystallography
A major objective is to be able to predict the fold
from a knowledge of the sequence
Inhibition of Cyclin Dependent Kinases (CDKs)
Can calculate binding energy of molecule at active site
Must go out to large distances to get convergence
Binding
Energy (eV)
Identifying drug molecules by direct calculation of
energetics is far too slow for practical applications
Instead use QSAR
Quantitative Structure Activity Relations
Activity = function(prop1,prop2,prop3,prop4,…)
prop is a readily-determined
property of each potential
drug mol.
Use a training set of drug mols
to “determine” function
(neural net)
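A minimal sketch of the QSAR idea: fit a function from readily determined properties to activity using a training set, then use it to rank untested molecules. The slide mentions a neural net; to keep the example short, a plain linear model fitted by gradient descent is swapped in here. The training data are synthetic, generated from a made-up rule (activity = 2·prop1 − prop2 + 0.5), and the molecule names are hypothetical; none of this is measured data.

```python
def predict(weights, bias, props):
    """Linear stand-in for Activity = function(prop1, prop2, ...)."""
    return sum(w * p for w, p in zip(weights, props)) + bias


def fit_linear_qsar(descriptors, activities, lr=0.1, epochs=5000):
    """Fit weights and bias by batch gradient descent on the squared error."""
    n = len(descriptors)
    n_props = len(descriptors[0])
    weights = [0.0] * n_props
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_props
        grad_b = 0.0
        for props, activity in zip(descriptors, activities):
            error = predict(weights, bias, props) - activity
            for j in range(n_props):
                grad_w[j] += error * props[j]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic training set (NOT real data): two properties per molecule.
train_props = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
               [1.0, 1.0], [0.5, 0.2], [0.2, 0.8]]
train_activity = [2.0 * p1 - 1.0 * p2 + 0.5 for p1, p2 in train_props]

weights, bias = fit_linear_qsar(train_props, train_activity)

# Screen a (hypothetical) database and rank by predicted activity.
database = {"mol_A": [0.9, 0.1], "mol_B": [0.1, 0.9], "mol_C": [0.5, 0.5]}
ranked = sorted(database, key=lambda m: predict(weights, bias, database[m]),
                reverse=True)
```

The point of the design is speed: once the function is fitted, scoring a molecule costs a handful of multiplications, which is what makes screening very large databases feasible.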
Search huge databases of mols > 10^6
=> targets for synthesis and testing
However, the “properties” of relevance are defined on 3-d grids
e.g. of the electrostatic
potential, or the
hydrophobicity
of the molecule
- which should match
that of the binding site
But, the molecule (& grid) must be aligned with pocket
And, property varies with the conformation of the molecule !
Leads to huge search problems – screen-savers
http://www.bellatrix.ox.ac.uk
Cutting out the computer!
Oxide glasses with many components
Step 1: prepare “gene pool” of 54 glasses made up with
randomly chosen compositions
Step 2: measure their luminosities – “fitness”
“Engineering” to produce
arrays of such chemicals
and to screen them for
desirable characteristics
is now well-established
“Combinatorial Chemistry”
e.g. Prof. Mark Bradley
(Huge arrays possible)
Generate a second generation stochastically and “evolve”
Drug Discovery Today !