RuncornMar08 - Redbrick

How to make the most of a QM calculation
Noel O’Boyle
[email protected]
www.ccdc.cam.ac.uk
Background
• ‘Career’:
– (ROI) UCG, DCU, UCD
– (UK) UCC, CCDC
• PhD in Computational Inorganic Chemistry
– Han Vos, Dublin City University (Ru polypyridyls)
• Postdoc in Cheminformatics
– Ciaran Regan, University College Dublin
– John Mitchell, University of Cambridge (MACiE)
• Postdoc in Protein-Ligand Docking
– Cambridge Crystallographic Data Centre (GOLD)
Tools
• GaussSum
– GUI for analysing results of comp chem calculations
• cclib
– Python library for extracting data from comp chem
calculations (now used by GaussSum…and others)
• Pybel
– Python library giving access to OpenBabel
Some general themes
• Interoperability
• Don't reinvent the wheel
– Libraries spread the work, and increase the reach
• Tools can add value
• Cross-platform
• Python where possible
Python is the dominant scripting language in
chemistry
• Cheminformatics
– OpenBabel, RDKit, OEChem, Daylight, Cambios Molecular
Toolkit, Frowns, PyBabel
• Computational chemistry
– OpenBabel, PyQuante, NWChem, Maestro/Jaguar, MMTK
• Visualisation
– CCP1GUI, PyMOL, Zeobuilder
• Scientific programming
– numpy (interface to ATLAS, LAPACK), can interface to C/C++,
FORTRAN, matplotlib, VTK
GaussSum (.sf.net)
• GUI written in Python
• Enables comparisons of calculated properties with
experimental results
– orbitals and molecular structure, partial density of states
• HOMO is 40% Ligand 1, 20% Ligand 2, etc.
– vibrational frequencies and IR spectrum
• scale frequencies individually or generally
– electronic transitions and UV-vis, CD spectra
– electronic transitions and molecular structure
• lowest energy transition involves change in ‘charge density’ on
Ligand 1 from 0% to 80%
• (Electron density difference map removed, but how to make
package independent?)
NM O’Boyle, AL Tenderholt, KM Langner. J. Comp. Chem.
2008, 29, 839. http://gausssum.sf.net
GaussSum
• Simple features that make life easier for
modellers
– ‘grep’ for lines containing particular expressions
• can store up to four expressions
– spectra and extracted data are written to files suitable
for Excel
– plot convergence of geometry or SCF
• early warning of problems (unlike plotting of energy)
• “GaussSum parameter”
– Sum of log(deviation from target value) over all unmet targets
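The "GaussSum parameter" can be sketched in a few lines. This is only an illustration, not GaussSum's own code: it assumes that "deviation" means the ratio of the current value to its target and that the logarithm is base 10, neither of which the slide states. The function name is hypothetical.

```python
import math

def convergence_score(values, targets):
    """Sum log10(|value| / target) over every criterion that is not yet met.

    Assumption: "deviation from target" is taken as the ratio value/target,
    and the log is base 10. A score of 0.0 means every target is met.
    """
    score = 0.0
    for value, target in zip(values, targets):
        if abs(value) > target:  # only unmet targets contribute
            score += math.log10(abs(value) / target)
    return score

# Two made-up criteria: the first is 100x above its target, the second is met.
example_score = convergence_score([1e-3, 1e-7], [1e-5, 1e-6])
```

Plotting such a score per step gives a single curve that falls towards zero as the calculation converges, which is why it can flag problems earlier than the energy alone.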
cclib (.sf.net) - a Python library for package-independent
computational chemistry algorithms
• In Jan 2005, Adam Tenderholt started writing PyMOlyze (now
QMForge)
– some overlap with GaussSum
– we decided to collaborate on a common framework for extracting data from
QM log files
• Karol Langner joined in Jan 2007
• cclib now extracts and standardises data from ADF, GAMESS,
GAMESS-UK, Gaussian, PC GAMESS, Jaguar, Molpro,
ORCA...(someone offered this week to help with ACES, Dalton,
NWChem, and PSI too)
NM O’Boyle, AL Tenderholt, KM Langner. J. Comp. Chem.
2008, 29, 839. http://cclib.sf.net
Why is cclib needed?
• Analysis methods are available only to users of
certain packages
– Morokuma energy decomposition (implemented in
GAMESS)
– Charge Decomposition Analysis (Frenking's code
only reads Gaussian output files)
• Keeps up to date with new versions of packages
• Allows chemists to focus on algorithms
• Makes implementation of algorithms
independent of proprietary software
>>> from cclib.parser import ccopen
>>> myfile = ccopen("basicGAMESS-UK/water_mp3.out")
>>> data = myfile.parse()
>>> dir(data)
['__class__', '__delattr__', '__dict__', '__doc__',
'__getattribute__', '__hash__', '__init__', '__module__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__',
'__setattr__', '__str__', '__weakref__', '_attrlist',
'_attrtypes', '_intarrays', '_listsofarrays', 'aonames',
'arrayify', 'atombasis', 'atomcoords', 'atomnos', 'charge',
'coreelectrons', 'gbasis', 'getattributes', 'homos',
'listify', 'mocoeffs', 'moenergies', 'mosyms', 'mpenergies',
'mult', 'natom', 'nbasis', 'nmo', 'scfenergies',
'scftargets', 'scfvalues', 'setattributes']
>>> print data.nbasis
7
>>> print data.atomcoords
[[[ 0.         0.        -0.2251786]
  [ 0.         1.4941103  0.9007143]
  [ 0.        -1.4941103  0.9007143]]]
>>>
Attribute      Description                                               Units        Datatype
aonames        atomic orbital names                                                   list
aooverlaps     atomic orbital overlap matrix                                          array of rank 2
atomcoords     atom coordinates                                          Å            array of rank 3
atomnos        atomic numbers                                                         array of rank 1
coreelectrons  number of core electrons in an atom's pseudopotential                  array of rank 1
etenergies     energies of electronic transitions                        cm⁻¹         array of rank 1
etoscs         oscillator strengths of electronic transitions                         array of rank 1
etrotats       rotatory strengths of electronic transitions                           array of rank 1
etsecs         singly-excited configurations for electronic transitions               list of lists
etsyms         symmetries of electronic transitions                                   list
fonames        fragment molecular orbital names                                       list
fooverlaps     fragment molecular orbital overlap matrix                              array of rank 2
gbasis         coefficients and exponents of Gaussian basis functions                 PyQuante format
geotargets     criteria target values for geometry convergence                        array of rank 1
geovalues      criteria values for geometry convergence                               array of rank 2
homos          molecular orbital index of the HOMO(s)                                 array of rank 1
mocoeffs       molecular orbital coefficients                                         list of arrays of rank 2
moenergies     molecular orbital energies                                eV           list of arrays of rank 1
mosyms         molecular orbital symmetries                                           list of lists
mpenergies     Møller-Plesset corrected electronic energies              eV           array of rank 2
natom          number of atoms                                                        integer
nbasis         number of basis functions                                              integer
nmo            number of molecular orbitals                                           integer
scfenergies    electronic energy of the molecule                         eV           array of rank 1
scftargets     criteria target values for SCF convergence                             array of rank 2
scfvalues      criteria values for SCF convergence                                    list of arrays of rank 2
vibdisps       Cartesian displacement vectors                            ΔÅ           array of rank 3
vibfreqs       vibrational frequencies                                   cm⁻¹         array of rank 1
vibirs         IR intensities                                            km mol⁻¹     array of rank 1
vibramans      Raman intensities                                         Å⁴ amu⁻¹     array of rank 1
vibsyms        symmetries of vibrations                                               list
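As a small illustration of what these standardised attributes make possible, here is a sketch of a HOMO-LUMO gap calculation that would work the same way for any package. It assumes homos holds zero-based orbital indices and moenergies holds one list of energies (in eV) per spin; the function name and the sample numbers are made up, and plain lists stand in for the arrays a parser would return.

```python
def homo_lumo_gap(moenergies, homos, spin=0):
    """Return the HOMO-LUMO gap in eV for the given spin channel.

    moenergies: one sequence of orbital energies per spin (eV)
    homos:      zero-based index of the HOMO for each spin
    """
    homo = homos[spin]
    energies = moenergies[spin]
    if homo + 1 >= len(energies):
        raise ValueError("no virtual orbitals available")
    return energies[homo + 1] - energies[homo]

# Made-up orbital energies (eV) for a closed-shell molecule with 3 occupied MOs.
example_moenergies = [[-20.0, -10.5, -7.25, 2.75, 5.0]]
example_homos = [2]
gap = homo_lumo_gap(example_moenergies, example_homos)
```

With a real calculation, the two arguments would simply be data.moenergies and data.homos from the parsed object.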
Standardisation of Symmetry Labels
• For the symmetry labelled BU by GAMESS and
Gaussian, ADF uses B.u, GAMESS-UK uses bu
and Jaguar uses Bu
– cclib normalises all of these to Bu
• In other cases all of the programs disagree: A” is
alternatively represented by AAA (ADF), A’’
(GAMESS), a1” (GAMESS-UK), A” (Gaussian)
and App (Jaguar)
• (one of the programs is internally inconsistent in
another case)
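The kind of normalisation described above can be sketched as a lookup for the awkward cases plus a general capitalisation rule. This is not cclib's implementation (cclib handles this separately in each parser); the function name and the table are hypothetical, they cover only the labels quoted on this slide, and straight ASCII quotes stand in for the prime symbols.

```python
# Hypothetical table of labels that no general rule can handle.
SPECIAL_CASES = {
    "AAA": 'A"',   # ADF style
    "A''": 'A"',   # two single quotes, GAMESS style
    "App": 'A"',   # Jaguar style
}

def normalise_sym(label):
    """Map a package-specific symmetry label onto one common spelling."""
    if label in SPECIAL_CASES:
        return SPECIAL_CASES[label]
    cleaned = label.replace(".", "")  # ADF writes B.u
    # Capital letter for the irreducible representation, lower case after it.
    return cleaned[0].upper() + cleaned[1:].lower()

labels = ["BU", "B.u", "bu", "Bu"]
normalised = [normalise_sym(label) for label in labels]
```

Every label in the example list comes out as Bu, so downstream code only ever has to test against one spelling.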
..\data\ADF\ADF2004.01\MoOCl4-sp.adfout.bz2... parsed
..\data\ADF\ADF2004.01\mo_sp.adfout.bz2... parsed
..\data\ADF\ADF2004.01\NH3.adfout.bz2... parsed
..\data\ADF\ADF2005.01\Os3(CO)12-D3h.zip... parsed
..\data\ADF\ADF2005.01\Os3.zip... parsed
..\data\ADF\ADF2006.01\Au2.out... parsed
..\data\ADF\ADF2006.01\Frags_NiCO4_orig.out... parsed
..\data\ADF\ADF2006.01\HgMeBr_zso_orig.out... parsed
..\data\ADF\ADF2006.01\dvb_gopt.adfout.bz2... parsed
Are the GAMESS UK files ccopened and parsed correctly?
..\data\GAMESS-UK\basicGAMESS-UK\dvb_gopt.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_gopt_b.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_gopt_c.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_gopt_d.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_ir.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_raman.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_sp.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_sp_b.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_un_sp.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\dvb_un_sp_b.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\MoOCl4-sp.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\water_mp2.out... parsed
..\data\GAMESS-UK\basicGAMESS-UK\water_mp3.out... parsed
..\data\GAMESS-UK\GAMESS-UK6.0\dscf_4.out.gz... parsed
..\data\GAMESS-UK\GAMESS-UK6.0\duhf_1.out.gz... parsed
..\data\GAMESS-UK\GAMESS-UK7.0\mg10.out.gz... parsed
..\data\GAMESS-UK\GAMESS-UK7.0\pyridine.out.gz... parsed
..\data\GAMESS-UK\GAMESS-UK7.0\pyridine2_21m10r.out.gz... parsed
Are the Jaguar files ccopened and parsed correctly?
..\data\Jaguar\Jaguar4.2\dvb_gopt.out.bz2... parsed
..\data\Jaguar\Jaguar4.2\dvb_gopt_b.out.bz2... parsed
..\data\Jaguar\Jaguar4.2\dvb_ir.out.bz2... parsed
..\data\Jaguar\Jaguar4.2\dvb_sp.out.bz2... parsed
Total: 147
Failed: 0
Errors: 2
**** testGeoOpt: GAMESS-UK geometry optimization unittest. ****
Are the indices in atombasis the right amount and unique? ... ok
Are atomcoords consistent with natom and Angstroms? ... ok
Are the atomnos correct? ... ok
Are the charge and multiplicity correct? ... ok
Are the coreelectrons all 0? ... ok
Are the dimensions of mocoeffs equal to 1 x (homo+5) x nbasis? ... ok
Do the geo targets have the right dimensions? ... ok
Are atomcoords consistent with geovalues? ... ok
Are scfvalues consistent with geovalues? ... ok
Is the index of the HOMO equal to 34? ... ok
Is the number of evalues equal to nmo? ... ok
Is the number of atoms equal to 20? ... ok
Is the number of basis set functions correct? ... ok
Did this subclass overwrite normalisesym? ... ok
Is the SCF energy within 40eV of target? ... ok
Do the scf targets have the right dimensions? ... ok
Are scfvalues and its elements the right type? ... ok
Are all the symmetry labels either Ag/u or Bg/u? ... ok
Is moenergies a list containing one numpy array? ... ok
----------------------------------------------------------------------
Ran 19 tests in 0.016s
********* SUMMARY PER PACKAGE ****************
             Total  Passed  Failed  Errors  Skipped
ADF2007.01      48      46       0       0        2
GAMESS-UK       58      58       0       0        0
GAMESS-US       75      71       2       0        2
Gaussian03      92      88       1       0        3
Jaguar7.0       54      47       0       0        7
Molpro2006      63      59       0       0        4
ORCA2.6         54      44       5       3        2
PCGAMESS        75      74       0       0        1
********* SUMMARY OF EVERYTHING **************
TOTAL: 519
PASSED: 487
FAILED: 8
ERRORS: 3
SKIPPED: 21
But it’s Python! I only code C, FORTRAN, etc.
• Use cclib to convert the log file to JSON
• JSON libraries are available for
– C, C++, Java, Javascript, Perl, PHP, Python, Ruby
• Trivial to write data to some type of FORTRAN format
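A minimal sketch of the JSON route, assuming the parsed object exposes its results as plain attributes (as in the interactive session earlier). The helper names are hypothetical, and a small stand-in class is used instead of a real parsed file so that the example runs without cclib installed.

```python
import json

def to_jsonable(value):
    """Recursively turn arrays (anything with .tolist()) into plain lists."""
    if hasattr(value, "tolist"):
        return value.tolist()
    if isinstance(value, dict):
        return {key: to_jsonable(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_jsonable(item) for item in value]
    return value

def parsed_data_to_json(data, attributes):
    """Serialise the named attributes of a parsed-data object to a JSON string.

    Attributes that were not parsed for this file are silently skipped.
    """
    payload = {name: to_jsonable(getattr(data, name))
               for name in attributes if hasattr(data, name)}
    return json.dumps(payload, sort_keys=True)

class FakeData:
    """Stand-in for the object returned by myfile.parse(); values are made up."""
    natom = 3
    nbasis = 7
    atomnos = [8, 1, 1]

json_text = parsed_data_to_json(FakeData(), ["natom", "nbasis", "atomnos", "vibfreqs"])
```

The resulting string can be written to a file and read back with the JSON library of whichever language the analysis code is written in.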
OpenBabel - “Not just file conversion”
• A C++ library for…
• Cheminformatics
– SMARTS searching, InChI, SMILES, molecular fingerprints,
group-contribution based descriptors, determination of SSSR, bond order
perception, hydrogen addition, Gasteiger charge calculation
• Computational chemistry
– AMBER, DMol3, Gaussian, GAMESS, GROMOS96, HyperChem,
Jaguar, MOPAC, Q-Chem, Turbomole, ZINDO
• varying levels of support
• if you want to change this…
– forcefield minimisation (UFF, MMFF94, Ghemical)
– symmetrisation of almost symmetric molecules (coming soon)
http://openbabel.org
Language bindings…and wrappers
• OpenBabel is a C++ library
• SWIG allows access to OpenBabel from
– Java, Perl, Python, Ruby (and many more if we wish)
• SWIG bindings are direct 1-to-1 translation of C++ API
and objects to a Python API and objects
• Pybel is a Pythonic wrapper around the SWIG bindings
– Makes it easy to carry out common tasks
– Allows idiomatic Python, e.g. using iterators, direct access to
attribute values rather than Get/Set, reduces verbosity
NM O’Boyle, C Morley, GR Hutchison. Chem. Cent. J. 2008,
2, 5. http://openbabel.org/wiki/Python
Let’s read a MOL file and optimise the geometry with
the UFF forcefield
SWIG bindings
import openbabel as ob
obconv = ob.OBConversion()
obconv.SetInFormat("mol")
obmol = ob.OBMol()
obconv.ReadFile(obmol, "caffeine.mol")
obff = ob.OBForceField.FindForceField("UFF")
obff.Setup(obmol)
obff.ConjugateGradients(1000)
obff.UpdateCoordinates(obmol)
Pybel
import pybel
mol = pybel.readfile("mol", "caffeine.mol").next()
mol.optimise("UFF") # Coming soon!
Eliminate duplicate molecules from a multimolecule
SD file
import pybel
inchis = []
output = pybel.Outputfile("sdf", "uniquemols.sdf")
for mol in pybel.readfile("sdf", "inputfile.sdf"):
    inchi = mol.write("inchi")
    if inchi not in inchis:
        output.write(mol)
        inchis.append(inchi)
output.close()
Note to self: should use ‘set’ instead of ‘list’ for O(N) instead of O(N**2)
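The set-based version hinted at in the note might look like the sketch below. To keep it runnable without OpenBabel, it is written as a generic helper (the name unique_by_key is hypothetical) and demonstrated on made-up (name, key) pairs, where the key plays the role of the InChI string.

```python
def unique_by_key(items, key):
    """Yield each item whose key has not been seen before.

    Membership tests on a set take constant time on average, so the whole
    pass is O(N) rather than the O(N**2) of searching a growing list.
    """
    seen = set()
    for item in items:
        identifier = key(item)
        if identifier not in seen:
            seen.add(identifier)
            yield item

# Made-up records: the second element stands in for an InChI string.
records = [("mol1", "key-A"), ("mol2", "key-B"), ("mol3", "key-A")]
unique_records = list(unique_by_key(records, key=lambda record: record[1]))
```

In the Pybel script, the items would be the molecules from pybel.readfile and the key function would be lambda mol: mol.write("inchi").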
Make it work on Windows!
• Most users use Windows, and even Linux users
want the option of jumping between OSs
• You restrict the reach of your software (and
hasten its replacement)
• Case study cclib-0.8 (Nov 07):
– cclib-0.8.tar.gz 63
– cclib-0.8.zip 58
– cclib-0.8-py2.4.exe 26
– cclib-0.8-py2.5.exe 45
• For every Linux user, there are 2 Windows users
Make it easy to install on Windows!
• No dependencies
• Case study: GaussSum 2.1.4 (Nov 2007)
– GaussSum-2.1.4.tar.gz 143 (Linux)
– GaussSum-2.1.4.zip 206 (Windows, requires Python, Numpy
and Python Imaging Library)
– GaussSumexe-2.1.4.zip 396 (Windows, no dependencies)
• Lower the barrier to installation
– A one-click installer > a .zip file >> a .tar.gz file
– Make the installation instructions easy
• Case study: OpenBabel
– OB 2.0.1 Linux:Windows 5:4
– OB 2.1.1 Linux:Windows 5:7.5
Some questions
• Why is it so easy to add value to QM calculations?
– QM developers don’t consider analysis of results?
• Why don’t QM software developers list compatible tools
on their website?
– Good for the QM software, good for the tool
• Why don’t QM software developers make it easier for
tool developers?
– API, documentation describing output, XML, interoperability
• Why not open source?
– Could fix these problems myself
• Why can’t I mix and match calculation methods from
different programs?
Some more questions
• Why do academics restrict usage of their sophisticated
routines to a single proprietary code?
• Why do some visualisation packages use their own
parsing routines instead of adding them to libraries like
OpenBabel or cclib?
• Why don’t QM packages donate code or contract
developers to improve support in libraries like
OpenBabel or cclib?
– ADF is doing this
• How can we coordinate interoperability?
…
• BlueObelisk.org
• I propose [email protected]
Wish list
• Build farm (buildbot)
• Calculation farm
• Electron density export will give a major payoff
– Coarse (STO-3G)
– Medium (6-31G..)
– Fine (..)
• Let’s promote each other (help us help you)
Conclusions
• Interoperability
• Don't reinvent the wheel
– Libraries spread the work, and increase the reach
• Tools can add value
• Cross-platform
• Python where possible
• “Some of the people some of the time” is a
good aim
Thanks!
• The OpenBabel development team and particularly
Geoff Hutchison and Chris Morley
• cclib: Adam Tenderholt and Karol Langner
• SourceForge
Email: [email protected], [email protected]
Blog: http://baoilleach.blogspot.com
Website: http://www.redbrick.dcu.ie/~noel
Check out Linux4Chemistry
Consider subscribing to RSS feed for Blue Obelisk blogs
– http://cb.openmolecules.net/posts.php?category=Blue Obelisk
QM at Cambridge Crystallographic Data
Centre (CCDC)
Noel O’Boyle
To get MOGUL screenshot: