Molecular Modeling Basics: 2015

Tuesday, December 1, 2015

Computational Chemistry Highlights: November issue

The November issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach, Christoph Jacob and Jan Jensen:

Enhancing NMR Prediction for Organic Compounds Using Molecular Dynamics

Tetrabenzo[a,f,j,o]perylene: A Polycyclic Aromatic Hydrocarbon With An Open-Shell Singlet Biradical Ground State

Molecular Dynamics of the Diels–Alder Reactions of Tetrazines with Alkenes and N2 Extrusions from Adducts

Dual-Cavity Basket Promotes Encapsulation in Water in an Allosteric Fashion

Switching between Aromatic and Antiaromatic 1,3-Phenylene-Strapped [26]- and [28]Hexaphyrins upon Passage to the Singlet Excited State

A Local Variant of the Conductor-Like Screening Model for Fragment-Based Electronic-Structure Methods

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily

This work is licensed under a Creative Commons Attribution 4.0

Sunday, November 22, 2015

Quick way to share interactive molecular structures and animations

#compchem http://t.co/tWJweuf1wF look at this dehydration. Right click on white space, press animation mode 'loop', right click and play
— gyges01 (@gyges01) October 16, 2015

Here is a neat trick I learned from @gyges01 on Twitter. Say you want to share an interactive molecular structure or animation with someone not well versed in structural visualization tools. Here's what you do.

1. Put your file on a server like figshare and copy the link. For example: http://files.figshare.com/1860026/ethen_01oIRC.cml
~~Unfortunately, a dropbox link, which would have been even easier, does not seem to work~~. See below

2. Combine this link with "http://chemapps.stolaf.edu/jmol/jmol.php?source=":
http://chemapps.stolaf.edu/jmol/jmol.php?source=http://files.figshare.com/1860026/ethen_01oIRC.cml

3. Email (or Tweet) the link (takes a minute to load) with instructions to start animation: right-click > animation > play

4. You can autostart the animation by uploading a "dummy" xyz file with the commands: http://chemapps.stolaf.edu/jmol/jmol.php?source=http://files.figshare.com/2367420/ethen_test.xyz

ethen_test.xyz looks like this:

1
jmolscript: load http://files.figshare.com/1860026/ethen_01oIRC.cml; animation on; animation mode palindrome;
h 0.0 0.0 0.0

@janhjensen Amazing! And I found Dropbox link is also available! Two keys are: http (not https), and https://t.co/TImsktm2Fx
— Acetaminophen (@aminophen) November 23, 2015

2015.11.23 Update Twitter strikes again: @aminophen found the trick to making this work with Dropbox. Change https://www.dropbox.com/s/xxx/file?dl=0 to http://dl.dropboxusercontent.com/s/xxx/file. Example:
https://www.dropbox.com/s/t42dx22mxsioja6/ethen_test.xyz?dl=0 to http://dl.dropboxusercontent.com/s/t42dx22mxsioja6/ethen_test.xyz and use it like before: http://chemapps.stolaf.edu/jmol/jmol.php?source=http://dl.dropboxusercontent.com/s/t42dx22mxsioja6/ethen_test.xyz

This work is licensed under a Creative Commons Attribution 4.0

Sunday, November 1, 2015

Computational Chemistry Highlights: October issue

The October issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach, David Bowler and Jan Jensen:

Stochastic Multiconfigurational Self-Consistent Field Theory

Unusual H-Bond Topology and Bifurcated H-bonds in the 2-Fluoroethanol Trimer

Benchmarking Continuum Solvent Models for Keto–Enol Tautomerizations

Mechanism of the Intramolecular Hexadehydro-Diels–Alder Reaction

The accuracy of semi-local functionals

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily

This work is licensed under a Creative Commons Attribution 4.0

Thursday, October 1, 2015

Computational Chemistry Highlights: September issue

The September issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach and Jan Jensen:

Experimental Protein Structure Verification by Scoring with a Single, Unassigned NMR Spectrum

Formation of Ground State Triplet Diradicals from Annulated Rosarin Derivatives by Triprotonation

Convex and Concave Encapsulation of Multiple Potassium Ions by Sumanenyl Anions

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily

This work is licensed under a Creative Commons Attribution 4.0

Sunday, September 27, 2015

Automatic data extraction from multiple GAMESS output files: an example

Jimmy ran many PM6-D3H+ geometry optimizations using different starting geometries. Typing "grep EQUILIBRIUM *.log" showed that they had all converged, but how to extract the heat of formation on the optimized geometry? This python program does the trick. It finds the last heat of formation in each file or, more precisely, it finds the second to last "word" in the last line where the first word is HEAT.

Update 2015.10.03.
1. Apologies to +Jimmy Charnley Kromann for the poor formulation. Jimmy ran the calculations, I wrote the script. After the post came online he was immediately teased mercilessly by his so-called friends for writing such crappy code. That's on me.

2. As +Anders Steen Christensen pointed out in the comments this can be done in one line using Bash: "for f in *.log; do tac $f | grep -m1 HEAT; done"

3. If you want to do it in Fortran +Lars Bratholm suggested the following code

This work is licensed under a Creative Commons Attribution 4.0

Tuesday, September 1, 2015

Computational Chemistry Highlights: August issue

The August issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach and Jan Jensen:

AFNMR: automated fragmentation quantum mechanical calculation of NMR chemical shifts for biomolecules

Four Decades of the Chemistry of Planar Hypercoordinate Compounds

Can Density Cumulant Functional Theory Describe Static Correlation Effects?

Domino Tunneling

A Molecular Nanotube with Three-Dimensional π-Conjugation

Benzene-Fused Azacorannulene Bearing an Internal Nitrogen Atom

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily

This work is licensed under a Creative Commons Attribution 4.0

Sunday, August 9, 2015

A very brief introduction to the electron correlation energy

RHF is often not accurate enough for predicting the change in energies due to a chemical reaction, no matter how big a basis set we use. The reason is the error due to the molecular orbital approximation
$$ \Psi ({{\bf{r}}_1},{{\bf{r}}_2}, \ldots {{\bf{r}}_N}) \approx \left| {{\phi _1}(1){{\bar \phi }_1}(2) \ldots {\phi _{N/2}}(N - 1){{\bar \phi }_{N/2}}(N)} \right\rangle \equiv \Phi $$
and the energy difference due to this approximation is known as the correlation energy. Just like we improve the LCAO approximation by including more terms in an expansion, we can improve the orbital approximation by an expansion, in terms of Slater determinants
$$\Psi ({{\bf{r}}_1},{{\bf{r}}_2}, \ldots {{\bf{r}}_N}) \approx \sum\limits_{i = 1}^L {{C_i}{\Phi _i}({{\bf{r}}_1},{{\bf{r}}_2}, \ldots {{\bf{r}}_N})} $$
The “basis set” of Slater determinants $\{\Phi_i \}$ is generated by first computing an RHF wave function $\{\Phi_0 \}$ as usual, which also generates a lot of virtual orbitals, and then generating other determinants with these orbitals. For example, for an atom or molecule with two electrons the RHF wave function is $\left| {{\phi _1}{{\bar \phi }_1}} \right\rangle $ and we have $K-1$ virtual orbitals (${\phi _2}, \ldots ,{\phi _K}$ , where $K$ is the number of basis functions), which can be used to make other Slater determinants like $\Phi _1^2 = \left| {{\phi _1}{{\bar \phi }_2}} \right\rangle $ and $\Phi _{11}^{22} = \left| {{\phi _2}{{\bar \phi }_2}} \right\rangle $ (Figure 1).

Figure 1. Schematic representation of the electronic structure of some of the determinants used in Equation 3

Conceptually (in analogy to spectroscopy), an electron is excited from an occupied to a virtual orbital: $\left| {{\phi _1}{{\bar \phi }_2}} \right\rangle$ represents a single excitation and $\left| {{\phi _2}{{\bar \phi }_2}} \right\rangle $ a double excitation. For systems with more than two electrons higher excitations (like triple and quadruple excitations) are also possible. In general
$$\Psi \approx {C_0}{\Phi _0} + \sum\limits_a {\sum\limits_r {C_a^r\Phi _a^r} } + \sum\limits_a {\sum\limits_b {\sum\limits_r {\sum\limits_s {C_{ab}^{rs}\Phi _{ab}^{rs}} } } } + \ldots $$
The expansion coefficients can be found using the variational principle
$$\frac{{\partial E}}{{\partial {C_i}}} = 0 \ \textrm{for all} \ i$$
and this approach is called configuration interaction (CI). The more excitations we include (i.e. increase L in Eq 2.12.1) the more accurate the expansion and the resulting energy becomes. If the expansion includes all possible excitations (known as a full CI, FCI) then we have a numerically exact wave function for the particular basis set, and if we use a basis set where the HF limit is reached then we have a numerically exact solution to the electronic Schrödinger equation! That’s the good news …

The bad news is that the FCI “basis set of determinants” is much, much larger than the LCAO basis set (i.e. $L >> K$),
$$L = \frac{{K!}}{{N!(K - N)!}}$$
where $N$ is the number of electrons. Thus, an RHF/6-31G(d,p) calculation on water involves 24 basis functions and roughly $\tfrac{1}{8}K^4$ = 42,000 2-electron integrals but a corresponding FCI/6-31G(d) calculation involves nearly 2,000,000 Slater determinants.

Just like finding the LCAO coefficients involves the diagonalization of the Fock matrix, finding the CI coefficients (Ci) and the lowest energy also involves a matrix diagonalization.

$$\bf{E} = {{\bf{C}}^t}{\bf{HC}}$$

where $\bf{E}$ is a diagonal matrix whose smallest value ($E_0$) corresponds to the variational energy minimum. While the Fock matrix is a $K \times K$ matrix, the CI Hamiltonian ($\bf{H}$) is an $L \times L$ matrix. Just holding the 2 million by 2 million matrix for the water molecule using the 6-31G(d,p) basis set requires millions of gigabites! Clever programming and large computers actually makes a FCI/6-31G(d,p) calculation on $\ce{H2O}$ possible, but FCI is clearly not a routine molecular modeling tool.

Using, for example, only single excitations (called CI singles, CIS)

$${\Psi ^{CIS}} = {C_0}{\Phi _0} + \sum\limits_a {\sum\limits_r {C_a^r\Phi _a^r} } $$

is feasible, however is doesn’t result in any improvement. The CIS Hamiltonian has three kinds of contributions

$$\begin{split}
& \langle \Phi _0\left| {\hat H} \right| \Phi_0 \rangle = E_{RHF}\\
\langle\Phi^{CIS}\left| {\hat H} \right| \Phi^{CIS} \rangle \rightarrow & \langle \Phi _0\left| {\hat H} \right| \Phi_a^r \rangle = F_{ar} = 0 \\
& \langle \Phi _a^r\left| {\hat H} \right| \Phi_a^r \rangle
\end{split}$$
which means that when this matrix is diagonalized $E_0=E_{RHF}$. Thus CIS does not give us any correlation energy. However, CIS is not completely useless. The second lowest value of $\bf{E}$, $E_1$, represents the energy of the first excited state, at roughly an RHF quality.

Thus, we need at least single and double excitations (CISD)
$$\Psi \approx {C_0}{\Phi _0} + \sum\limits_a {\sum\limits_r {C_a^r\Phi _a^r} } + \sum\limits_a {\sum\limits_b {\sum\limits_r {\sum\limits_s {C_{ab}^{rs}\Phi _{ab}^{rs}} } } } $$
to get any correlation energy. However, in general including doubles already results in an $\bf{H}$ matrix that is impractically large for a matrix diagonalization. CI, i.e. finding the $C_i$ coefficients using the variational principle, is therefore rarely used to compute the correlation energy.

Perhaps the most popular means of finding the $C_i$’s is by perturbation theory, a standard mathematical technique in physics to compute corrections to a reference state (in this case RHF). Perturbation theory using this reference is called Møller-Plesset pertubation theory, and there are several successively more accurate and more expensive variants: MP2 (which includes some double excitations), MP3 (more double excitations than MP2), and MP4 (single, double, triple, and some quadruple excitations).

Another approach is called coupled cluster which has a similar hierarchy of methods, such as CCSD (singles and doubles) and CCSD(T) (CCSD plus an estimate of the triples contributions). In terms of accuracy vs expense, MP2 is the best choice of a cheap correlation method, followed by CCSD, and CCSD(T). For example, MP4 is not too much cheaper than CCSD(T), but the latter is much more accurate. In fact for many practical purposes it is rarely necessary to go beyond CCSD(T) in terms of accuracy, provided a triple-zeta or higher basis set it used. However, CCSD(T) is usually too computationally demanding for molecules with more than 10 non-hydrogen atoms. In general, the computational expense of these correlated methods scale much worse than RHF with respect to basis set size: MP2 ($K^5$), CCSD ($K^6$), and CCSD(T) ($K^7$). These methods also require a significant amount of computer memory, compared to RHF, which is often the practical limitation of these post-HF methods. Finally, it should be noted that all these calculations also imply an RHF calculation as the first step.

In conclusion we now have ways of systematically improving the wave function, and hence the energy, by increasing the number of basis functions ($K$) and the number of excitations ($L$) as shown in Figure 2.

Figure 2 Schematic representation of the increase in accuracy due to using better correlation methods and larger basis sets.

The most important implication of this is that in principle it is possible to check the accuracy of a given level of theory without comparison to experiment! If going to a better correlation method or a bigger basis set does not change the answer appreciably, then we have a genuine prediction with only the charges and masses of the particles involved as empirical input. These kinds of calculations are therefore known as ab initio or first principle calculations. In practice, different properties will converge at different rates, so it is better to monitor the convergence of the property you are actually interested in, than the total energy. For example, energy differences (e.g. between two conformers) converge earlier than the molecular energies. Furthermore, the molecular structure (bond lengths and angles) tends to converge faster than the energy difference. So it is common to optimize the geometry at a low level of theory [e.g. RHF/6-31G(d)] followed by an energy computation (a single point energy) at a higher level of theory [e.g. MP2/6-311+G(2d,p)]. This level of theory would be denoted MP2/6-311+G(2d,p)//RHF/6-31G(d).

Finally, the correlation energy is not just a fine-tuning of the RHF result but introduces an important intermolecular force called the dispersion energy. The dispersion energy (also known as the induced dipole-induced dipole interaction) is a result of the simultaneous excitation of at least two electrons and is not accounted for in the RHF energy. For example, the stacked orientation of base pairs in DNA is largely a result of dispersion interactions and cannot be predicted using RHF.

Monday, August 3, 2015

Computational Chemistry Highlights: July issue

The July issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach and Jan Jensen:

Simulations of Chemical Reactions with the Frozen Domain Formulation of the Fragment Molecular Orbital Method

On the tunneling instability of a hypercoordinated carbocation

Inverted Carbon Geometries: Challenges to Experiment and theory

Computational and Experimental Investigations of the Formal Dyotropic Rearrangements of Himbert Arene/Allene Cycloadducts

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily

This work is licensed under a Creative Commons Attribution 4.0

Wednesday, July 15, 2015

Computational Chemistry Highlights: June issue

The June issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach, Tobias Schwabe and Jan Jensen:

Accurately Modeling Nanosecond Protein Dynamics Requires at least Microseconds of Simulation

A Practicable Real-Space Measure and Visualization of Static Electron-Correlation Effects

Ring Planarity Problem of 2-Oxazoline Revisited Using Microwave Spectroscopy and Quantum Chemical Calculations

Charge-Enhanced Acidity and Catalyst Activation

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily

This work is licensed under a Creative Commons Attribution 4.0

Monday, June 15, 2015

A brief introduction to basis sets

In order to compute the energy we need to define mathematical functions for the orbitals. In the case of atoms we can simply use the solutions to the Schrödinger equation for the $\ce{H}$ atom as a starting point and find the best exponent for each function using the variational principle.

But what functions should we use for molecular orbitals (MOs)? The wave function of the $\ce{H2+}$ molecule provides a clue (Figure 1)

Figure 1. Schematic representation of the wave function of the $\ce{H2+}$ molecule.

It looks a bit like the sum of two 1$s$ functions centered at each nucleus (A and B),
$${\Psi ^{{\text{H}}_{\text{2}}^ + }} \approx \tfrac{1}{{\sqrt 2 }}\left( {\Psi _{1s}^{{{\text{H}}_{\text{A}}}} + \Psi _{1s}^{{{\text{H}}_{\text{B}}}}} \right)$$
Thus, one way of constructing MOs is as a linear combination of atomic orbitals (the LCAO approximation),
$${\phi _i}(1) = \sum\limits_{\mu = 1}^K {{C_{\mu i}}{\chi _\mu }(1)} $$
an approximation that becomes better and better as $K$ increases. Here $\chi _\mu$ is a mathematical function that looks like an AO, and is called a basis function (a collection of basis functions for various atoms is called a basis set), and $C_{\mu i}$ is a number (sometimes called an MO coefficient) that indicates how much basis function $\chi _\mu$ contributes to MO $i$, and is determined for each system via the variational principle. Note that every MO is expressed in terms of all basis function, and therefore extends over the entire molecule.

If we want to calculate the RHF energy of water, the basis set for the two $\ce{H}$ atoms would simply be the lowest energy solution to the Schrödinger equation for $\ce{H}$ atom
$${\chi _{{{\text{H}}_{\text{A}}}}}(1) = \Psi _{1s}^{\text{H}}({r_{1A}}) = \frac{1}{{\sqrt \pi }}{e^{ - \left| {{{\bf{r}}_1} - {{\bf{R}}_A}} \right|}}$$
For the O atom, the basis set is the AOs obtained from, say, an ROHF calculation on $\ce{O}$, i.e. $1s$, $2s$, $2p_x$, $2p_y$, and $2p_z$ functions from the solutions to the Schrödinger equation for the $\ce{H}$ atom, where the exponents ($\alpha_i$'s) have been variationally optimized for the $\ce{O}$ atom,
$$\Psi _{1s}^{\text{H}},\;\Psi _{2s}^{\text{H}},\;\Psi _{2p}^{\text{H}} \xrightarrow{\frac{\partial E}{\partial \alpha_i}=0} \phi _{1s}^{\text{O}},\;\phi _{2s}^{\text{O}},\;\phi _{2p}^{\text{O}} \equiv \;\left\{ {{\chi _O}} \right\}$$
Notice that this only has to be done once, i.e. we will use this oxygen basis set for all oxygen-containing molecules. We then provide a guess at the water structure and the basis functions are placed at the coordinates of the respective atoms. Then we find the best MO coefficients by variational minimization,
$$\frac{{\partial E}}{{\partial {C_{\mu i}}}} = 0 $$
for all $i$ and $\mu$. Thus, for water we need a total of seven basis functions to describe the five doubly occupied water MOs ($K = 7$ and $i = 1-5$ in Eq 5). This is an example of a minimal basis set, since it is the minimum number of basis functions per atoms that makes chemical sense.

One problem with the LCAO approximation is the number of 2-electron integrals it leads to, and the associated computational cost. Let’s look at the part of the energy that comes from the Coulomb integrals
$$\begin{split}
\sum\limits_{i = 1}^{N/2} {\sum\limits_{j = 1}^{N/2} {2{J_{ij}}} } &= \sum\limits_{i = 1}^{N/2} {\sum\limits_{j = 1}^{N/2} {2\left\langle {\left. {{\phi _i}(1){\phi _j}(2)} \right|\frac{1}{{{r_{12}}}}\left| {{\phi _i}(1){\phi _j}(2)} \right.} \right\rangle } } \\
&= \sum\limits_i^{N/2} {\sum\limits_j^{N/2} {2\left\langle {\left. {{\phi _i}(1){\phi _i}(1)} \right|\frac{1}{{{r_{12}}}}\left| {{\phi _j}(2){\phi _j}(2)} \right.} \right\rangle } } \\
&= \sum\limits_i^{N/2} {\sum\limits_j^{N/2} {2\left\langle {{{\phi _i}{\phi _i}}}
\mathrel{\left | {\vphantom {{{\phi _i}{\phi _i}} {{\phi _j}{\phi _j}}}}
\right. }
{{{\phi _j}{\phi _j}}} \right\rangle } } \\
&= \sum\limits_\mu ^K {\sum\limits_\nu ^K {\sum\limits_\lambda ^K {\sum\limits_\sigma ^K {\sum\limits_i^{N/2} {\sum\limits_j^{N/2} {2{C_{\mu i}}{C_{\nu i}}{C_{\lambda j}}{C_{\lambda j}}\left\langle {{{\chi _\mu }{\chi _\nu }}}
\mathrel{\left | {\vphantom {{{\chi _\mu }{\chi _\nu }} {{\chi _\lambda }{\chi _\sigma }}}}
\right. }
{{{\chi _\lambda }{\chi _\sigma }}} \right\rangle } } } } } } \\
&= \sum\limits_\mu ^K {\sum\limits_\nu ^K {\sum\limits_\lambda ^K {\sum\limits_\sigma ^K {\tfrac{1}{2}{P_{\mu \nu }}{P_{\lambda \sigma }}\left\langle {{{\chi _\mu }{\chi _\nu }}}
\mathrel{\left | {\vphantom {{{\chi _\mu }{\chi _\nu }} {{\chi _\lambda }{\chi _\sigma }}}}
\right. }
{{{\chi _\lambda }{\chi _\sigma }}} \right\rangle } } } }
\end{split}$$
We have roughly $(N/2)^2$ Coulomb integrals involving molecular orbitals but roughly $1/8K^4$ Coulomb integrals involving basis functions (the factor of 1/8 comes from the fact that some the integrals are identical and need only be computed once). Using a minimal basis set $K$ = 80 for a small organic molecule like caffeine $\ce{(C8H10N4O2)}$ and results in ca 5,000,000 2-electron integrals involving basis functions! (That being said, you can perform an RHF energy calculation with 80 basis functions on your desktop computer in a few minutes. The problem with $K^4$-scaling is that a corresponding calculation with 800 basis functions would take a few days on the same machine. So you can forget about optimizing the structure on that machine.) This is one of the reasons why modern computational quantum chemistry requires massive computers. This is also the reason why the basis set size it a key consideration in a quantum chemistry project.

The 2-electron integrals also pose another problem: the basis functions defined so far are exponential functions (also known as Slater type orbitals or STOs). 2-electron integrals involving STOs placed on four different atoms do not have analytic solutions. As a result, most quantum chemistry programs use Gaussian type orbitals (or simply Gaussians) instead of STOs, because the 2-electron integrals involving Gaussians have analytic solutions. Obviously,
$${e^{ - \alpha {r_{1A}}}} \approx {e^{ - \beta r_{1A}^2}}$$
is a poor approximation, so a linear combination of Gaussians are used to model each STO basis function (Figure 2)
$${e^{ - \alpha {r_{1A}}}} \approx \sum\limits_i^X {{a_{i\mu }}{e^{ - {\beta _i}r_{1A}^2}}} = {\chi _\mu }$$

Figure 2. (a) An exponential function is not well represented by one Gaussian, but (b) can be well represented by a linear combination of three Gaussians.

Here the $a_{i\mu}$parameters (or contraction coefficients) as well as the Gaussian exponents are determined just once for a given STO basis function. $\chi_\mu$ is a contracted basis function and the $X$ individual Gaussian functions are called primitives. Generally, three primitives are sufficient to represent an STO, and this basis set is known at the STO-3G basis set. $p$- and $d$-type STOs are expanded in terms of $p$- and $d$-type primitive Gaussians [e.g. $({x_1} - {x_A}){e^{ - \beta r_{1A}^2}}$ and $({x_1} - {x_A})({y_1} - {y_A}){e^{ - \beta r_{1A}^2}}$]. An RHF calculation using the STO-3G basis set is denoted RHF/STO-3G. Unless otherwise noted, this usually also implies that the geometry is computed (i.e. the minimum energy structure is found) at this level of theory.

Minimal basis sets are usually not sufficiently accurate to model reaction energies. This is due to the fact that the atomic basis functions cannot change size to adjust to their bonding environment. However, this can be made possible by using some the contraction coefficients as variational parameters. This will increase the basis set size (and hence the computational cost) so this must be done judiciously. For example, we’ll get most improvement by worrying about the basis functions that describe the valence electrons that participate most in bonding. Thus, for $\ce{O}$ atom we leave the 1$s$ core basis function alone, but “split” the valence 2$s$ basis function into linear combinations of two and one Gaussians respectively,
$$\begin{split}
{\chi _{1s}} &= \sum\limits_i^3 {{a_{i1s}}{e^{ - {\beta _i}r_{1A}^2}}} \\
{\chi _{2{s_a}}} &= \sum\limits_i^2 {{a_{i2s}}{e^{ - {\beta _i}r_{1A}^2}}} \\
{\chi _{2{s_b}}} &= {e^{ - {\beta _{2{s_b}}}r_{1A}^2}} \\
\end{split} $$
and similarly for the 2$p$ basis functions. This is known as the 3-21G basis set (pronounced “three-two-one g” not “three-twenty one g”), which denotes that core basis functions are described by 3 contracted Gaussians, while the valence basis functions are split into two basis functions, described by 2 and 1 Gaussian each. Thus, using the 3-21G basis set to describe water requires 13 basis functions: two basis functions on each $\ce{H}$ atom (1$s$ is the valence basis function of the H atom) and 9 basis functions on the $\ce{O}$ atom (one 1$s$ function and two each of 2$s$, 2$p_x$, 2$p_y$, and 2$p_z$).

The $\chi _{2{s_a}}$ basis function is smaller (i.e., the Gaussians have a larger exponent) than the basis function. Thus, one can make a function of any intermediate size by (variationally) mixing these two functions (Figure 3). The 3-21G is an example of a split valence or double zeta basis set (zeta, ζ, is often used as the symbol for the exponent, but I find it hard to write and don’t use it in my lectures). Similarly, one can make other double zeta basis sets such as 6-31G, or triple zeta basis sets such as 6-311G.

Figure 3. Sketch of two different sized $s$-type basis functions that can be used to make a basis function of intermediate size

As the number of basis functions ($K$ in Eq 2) increase the error associated with the LCAO approximation should decrease and the energy should converge to what is called the Hartree-Fock limit ($E_{\text{HF}}$) that is higher than the exact energy ($E_{\text{exact}}$) (Figure 4). The difference is known as the correlation energy,
$$E_{\text{corr}}=E_{\text{HF}}-E_{\text{exact}}$$
and is the error introduced by the orbital approximation
$$\Psi ({{\bf{r}}_1},{{\bf{r}}_2}, \ldots {{\bf{r}}_N}) \approx \left| {{\phi _1}(1){{\bar \phi }_1}(2) \ldots {\phi _{N/2}}(N - 1){{\bar \phi }_{N/2}}(N)} \right\rangle $$

Figure 4. Plot of the energy as a function the number of basis functions.

However, in the case of a one-electron molecule like $\ce{H2+}$ we would expect the energy to converge to $E_{\text{exact}}$ since there is no orbital approximation. However, if we try this with the basis sets discussed thus far we find that this is not the case (Figure 5)!

Figure 5. Plot of the energy of $\ce{H2+}$ computed using increasingly larger basis sets.

What’s going on? Again we get a clue by comparing the exact wave function to the LCAO-wave function (Figure 6).

Figure 6. Comparison of the exact wave function and one computed using the 6-311G basis set.

We find that compared to the exact result there is not “enough wave function” between the nuclei and too much at either end. As we increase the basis set we only add $s$-type basis functions (of varying size) to the basis set. Since they are spherical they cannot be used to shift electron from one side of the $\ce{H}$ atom to the other. However, $p$-functions are perfect for this (Figure 7).

Figure 7. Sketch of the polarization of an s basis function by a p basis function

So basis set convergence is not a matter of simply increasing the number of basis functions, it is also important to have the right mix of basis function types. Similarly, $d$-functions can be used to “bend” $p$-functions (Figure 8).

Figure 8. Sketch of the polarization of a p basis function by a d basis function

Such functions are known as polarization functions, and are denoted with the following notation. For example, 6-31G(d) denotes d polarization functions on all non-$\ce{H}$ atoms and can also be written as 6-31G*. 6-31G(d,p) is a 6-31G(d) basis set where p-functions have been added on all $\ce{H}$ atoms, and can also be written 6-31G**. A RHF/6-31G(d,p) calculation on water involves 24 basis functions: 13 basis functions for the 6-31G part (just like for 3-21G) plus 3 $p$-type polarization functions on each H atom and 5 $d$-type polarization functions (some programs use 6 Cartesian d-functions instead of the usual 5).

Anions tend to have very diffuse electron distributions and very large basis functions (with very small exponents) are often needed for accurate results. These diffuse functions are denoted with “+” signs: e.g. 6-31+G denotes one s-type and three $p$-type diffuse Gaussians on each non-$\ce{H}$ atom, and 6-31++G denotes the addition of a single diffuse $s$-type Gaussian on each $\ce{H}$-atom. Diffuse functions also tend to improve the accuracy of calculations on van der Waals complexes and other structures where the accurate representation of the outer part of the electron distribution is important.

Of course there are many other basis sets available, but in general they have the same kinds of attributes as described already. For example, aug-cc-pVTZ is a more modern basis set: “aug” stands for “augmented” meaning “augmented with diffuse functions”, “pVTZ” means “polarized valence triple zeta”, i.e. it is of roughly the same quality as 6-311++G(d,p). “cc” stands for “correlation consistent” meaning the parameters were optimized for correlated wave functions (like MP2, see below) rather than HF wave function like Pople basis sets [such as 6-31G(d)] described thus far.

Related blog posts
Complete basis set extrapolation

Monday, June 1, 2015

Computational Chemistry Highlights: May issue

The May issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach and Jan Jensen:

Exploring the Accuracy Limits of Local Pair Natural Orbital Coupled-Cluster Theory

Using dispersion-corrected density functional theory to understand supramolecular binding thermodynamics

Uthrene, a radically new molecule?

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily

This work is licensed under a Creative Commons Attribution 4.0

Sunday, May 24, 2015

Energy Minimization: Finding and Connecting Stationary Points

Let’s start by considering a simple example with one coordinate (for example a distance), $R$, and where the energy is given by

$$ E= \frac{1}{2}(R-R_e)^2$$
The corresponding potential energy surface, known as a quadratic PES, is shown in Figure 1.

Figure 1. Plot of the (quadratic) potential energy surface given by Equation 1.

Here $R_e$ is the value of $R$ at which the energy is lowest (this is known as the equilibrium geometry) and this is what we’d like to find. We start by taking a guess at $R$, $R_g$. We already know how to check whether this is an energy minimum: we need to evaluate the gradient, which is
$$\left(\frac{\partial E}{\partial R}\right)_{R_g}=k(R_g-R_e)$$
It’s clear that the gradient is non-zero for $R_e \ne R_g$. However, by rearranging the equation the gradient also tells us how to change $R$ to get an energy minimum
$$R_e=R_g-\frac{1}{k}\left(\frac{\partial E}{\partial R}\right)_{R_g}=R_g+\frac{1}{k}F_g$$
where $F$ is the force, which is the negative gradient. If we know $k$ we can find $R_e$ in one step starting from $R_g$ (Figure 2).

Figure 2. The equilibrium geometry can be found in one step on a quadratic PES

If we don’t know $k$ then it is safest to take many small steps, i.e. scale the gradient by some small constant ($c$) and repeat until the gradient falls below some threshold (Figure 3)
$$R_{n+1}=R_n-c\left(\frac{\partial E}{\partial R}\right)_{R_n}$$

Figure 3. Energy minimization by steepest descent.

This general approach is known as steepest descent, since the gradient tells you in which direction where the energy is decreasing (descending) the most. Notice that this means that minimizing the energy will find the closest minimum to the starting geometry. When steepest descent is used to find equilibrium geometries it is often combined with a so-called “line search”, which tries to find the lowest energy in the direction of the current gradient, but the general idea is still the same.

Another use of steepest descent is to connect a transitions state with its two closest minima, since this tells you what two minima the transition state connects. Here the guess structure is a transition state structure displaced along the normal mode corresponding to the imaginary frequency (the transition state structure itself cannot be used because its gradient is zero). This path is the minimum energy path (MEP) between reactions and products and the resulting collection of structures is known as the intrinsic reaction coordinate (IRC). An IRC is usually depicted as a plot of the potential energy vs the mass-weighted root-mean-square-displacement of the Cartesian coordinates relative to some reference geometry (usually the transition state), usually denoted $s$ (Figure 4). When we draw a potential energy surface for a reaction we usually mean an IRC.

Figure 4. Depiction of an IRC or MEP: two steepest descent paths starting from the transition state

Clearly, the more steps we take, the higher the computational cost, and in general we want to take as few steps as possible. As I mentioned above, if we knew $k$ we could find the energy minimum in one step. For a quadratic surface $k$ is the second derivative of $E$ with respect to $R$ (or the first derivative of the gradient) so Eq 3 becomes
$$ R_e=R_g-\left(\frac{\partial^2 E}{\partial R^2}\right)^{-1}_{R_g}\left(\frac{\partial E}{\partial R}\right)_{R_g}$$
Such a step is known as a Newton-Raphson step or quadratic step. This works fine if the surface is quadratic, which is a good approximation near the minimum but not far away. For really bad guesses, where the PES tends to be flat, the quadratic step can be too large (Figure 5).

Figure 5. Quadratic steps (a) close to and (b) far from the minimum.

Most algorithms will scale back quadratic steps that are considered unreasonably large, and even using quadratic energy minimizers many steps are needed:
$$R_{n+1}=R_n-\left(\frac{\partial^2 E}{\partial R^2}\right)^{-1}_{R_n}\left(\frac{\partial E}{\partial R}\right)_{R_n}$$
$$ \mathbf{q}_{n+1}=\mathbf{q}_{n}-c\mathbf{H}^{-1}_n\mathbf{g}_n$$
Eq. 7 is the same equation as 6 but in many dimensions: qn are the current coordinates (for example Cartesian coordinates) for step number $n$ ($n = 0$ for the guess coordinates), $\mathbf{q}_{n+1}$ are the new coordinates, $\mathbf{g}_n$ is the gradient, and $\mathbf{H}_n$ is the Hessian both evaluated at the current coordinates. Note that the Hessian should be computed at each step.

Friday, May 1, 2015

Computational Chemistry Highlights: April issue

The April issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Mario Barbatti, Steven Bachrach, and Jan Jensen:

Big Data Meets Quantum Chemistry Approximations: The ∆-Machine Learning Approach

Electron-Driven Proton Transfer Along H2O Wires Enables Photorelaxation of πσ* States in Chromophore−Water Clusters

The Fluorenyl Cation

The Nucleoside Uridine Isolated in the Gas Phase

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily

This work is licensed under a Creative Commons Attribution 4.0

Wednesday, April 1, 2015

Computational Chemistry Highlights: March issue

The March issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Tobias Schwabe, Martin Korth, Steven Bachrach, and Jan Jensen:

Thermodynamic Evaluation of Aromatic CH/π Interactions and Rotational Entropy in a Molecular Rotor

A Case Study of the Mechanism of Alcohol-Mediated Morita Baylis- Hillman Reactions. The Importance of Experimental Observations.

Reinvestigation of the Stereochemistry of the C-Glycosidic Ellagitannins, Vescalagin and Castalagin

Large-scale virtual high-throughput screening for the identification of new battery electrolyte solvents: computing infrastructure and collective properties

Selected publications of Stefan Grimme, Leibniz prize winner 2015

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily

This work is licensed under a Creative Commons Attribution 4.0

Sunday, March 1, 2015

Computational Chemistry Highlights: February issue

The February issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Dean Tantillo, Marcel Swart, Martin Korth, Johannes Hachmann, Steven Bachrach, and Jan Jensen:

Water–Water and Water–Solute Interactions in Microsolvated Organic Complexes

First Principles View on Chemical Compound Space: Gaining Rigorous Atomistic Control of Molecular Properties

An isomeric reaction benchmark set to test if the performance of state-of-the-art density functionals can be regarded as independent of the external potential

Benchmarking density-functional theory calculations of NMR shielding constants and spin–rotation constants using accurate coupled-cluster calculations

Structural Revision of Cephalosporolide J and Bassianolone

Metal oxidation states in the oxygen-evolving computational challenge

Effect of Anharmonicity on Adsorption Thermodynamics

A guide to small-molecule structure assignment through computation of (1H and 13C) NMR chemical shifts

Quantum-Mechanical Evaluation of pi-pi versus Substituent-pi Interactions in pi Stacking: Direct evidence for the Wheeler-Houk Picture

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily

This work is licensed under a Creative Commons Attribution 4.0

Sunday, February 1, 2015

Computational Chemistry Highlights: January issue

The January issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Alán Aspuru-Guzik, Andres Cisneros, Steven Bachrach, David Bowler, and Jan Jensen:

Construction of a Highly Distorted Benzene Ring in a Double Helicene

A computational drug design success story

Orbital-free density functional theory implementation with the projector augmented-wave method

Enhanced Helical Folding of ortho-Phenylenes through the Control of Aromatic Stacking Interactions

Materials Cartography: Representing and Mining Materials Space Using Structural and Electronic Fingerprints

Assessment of theoretical procedures for a diverse set of isomerization reactions involving double-bond migration in conjugated dienes

Computational Design of a Time-Dependent Histone Deacetylase 2 Selective Inhibitor

Nonstatistical Dynamics in the Thermal Garratt−Braverman/[1,5]‑H Shift of One Ene−diallene: An Experimental and Computational Study

Popular Highlights of 2014

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily

This work is licensed under a Creative Commons Attribution 4.0

Saturday, January 24, 2015

The conformationally averaged free energy

The free energy averaged over $N_\mathrm{conf}$ conformations of molecule X is

$G^\circ=-RT \ln \left( \sum_I^{N_\mathrm{conf}} e^{-G^\circ_I/RT} \right)$ (1)

However, the way to compute an average property of the conformations $\left\langle x\right\rangle$ is

$\left\langle x\right\rangle=\sum_I^{N_\mathrm{conf}} p_Ix_I$ (2)

where

$p_I=\frac{e^{-G_I^\circ/RT}}{\sum_J^{N_\mathrm{conf}} e^{-G_J^\circ/RT}}$ (3)

But this equation cannot be used to compute the conformationally averaged free energy because

$G^\circ \ne \left\langle G^\circ \right\rangle$

Why not? First, let's see where the first equation comes from.

Obtaining equation (1)

The first equation comes from the statistical mechanical definition of the Helmholtz free energy found any any p-chem textbook

$A^\circ = -RT \ln \left( \sum_i^{\mathrm{states}} e^{-\varepsilon_i/kT} \right)=-RT \ln \left( q \right)$

The sum over microstates can be split into sums over microstates of each conformation I

$A^\circ = -RT \ln \left( \sum_I^{N\mathrm{conf}} \sum_{i\in I}^{\mathrm{states}} e^{-\varepsilon_i/kT} \right)$

$= -RT \ln \left( \sum_I^{N\mathrm{conf}} q_I \right) = -RT \ln \left( \sum_I^{N_\mathrm{conf}} e^{-A_I^\circ/RT} \right)$

To get the corresponding expression for the Gibbs free energy [Eq (1)]:

$G^\circ = A^\circ +p^\circ V $

$=-RT \ln \left( \sum_I^{N_\mathrm{conf}} e^{-A^\circ_I/RT} \right)-RT \ln \left( e^{-p^\circ V/RT} \right)$

$=-RT \ln \left( \sum_I^{N_\mathrm{conf}} e^{-G_I^\circ/RT} \right)$

The difference between equation (1) and (3) is the conformational entropy

$G^\circ - \left\langle G^\circ \right\rangle=-RT \ln \left( \sum_J^{N_\mathrm{conf}} e^{-G_J^\circ/RT} \right)\sum_I^{N_\mathrm{conf}} p_I$

$+RT\sum_I^{N_\mathrm{conf}} p_I\frac{-G_I}{RT}$

$=RT \sum_I^{N_\mathrm{conf}} p_I \left( \frac{-G_I}{RT}- \ln \left( \sum_J^{N_\mathrm{conf}} e^{-G_J^\circ/RT} \right) \right)$

$=RT \sum_I^{N_\mathrm{conf}} p_I \left( \ln \left( e^{-G_I^\circ/RT} \right) - \ln \left( \sum_J^{N_\mathrm{conf}} e^{-G_J^\circ/RT} \right) \right) $

$=RT \sum_I^{N_\mathrm{conf}} p_I \ln(p_I) =-TS_{\mathrm{conf}} $

Wednesday, January 7, 2015

Predicting binding free energies with electronic structure theory: thermodynamic considerations - Part 3

2015.01.25: I have summarized discussion of this and related issues in this paper.

Predicting absolute binding free energies of biologically relevant molecules (in aqueous solution at pH 7 using electronic structure theory without empirical adjustments to within 1 kcal/mol) is one of the holy grails of computational chemistry. Recent work by Grimme and co-workers (here and here) have come close with mean absolute deviations of about 2 kcal/mol for host-guest complexes. This and other work has shed some light on why it is so difficult to predict binding free energies in aqueous solution, which I will discuss here and in two previous posts (Part 1 and Part 2). In addition I'll talk about further improvements that could potentially increase the accuracy further.

The approach is based on the following equation

$\Delta G^\circ = \Delta E_\mathrm{gas} + \Delta G^\circ_{\mathrm{gas,RRHO}}+\Delta \delta G^\circ_{\mathrm{solv}}$

where $\Delta E_\mathrm{gas}$ is the change in electronic energy and $\Delta G^\circ_{\mathrm{gas,RRHO}}$ the change in the translational, rotational, and vibrational free energy computed using the rigid-rotor and harmonic oscillator approximation, respectively - both evaluated in the gas phase. The vibrational frequencies are computed at a lower level of theory compared to that used to compute $\Delta E_{\mathrm{gas}}$. $\Delta \delta G^\circ_{\mathrm{solv}}$ is the change in solvation free energy computed using a continuum model of the solvent, such as COSMO.

Solvation Thermodynamics

Background. Most continuum models (CMs) of solvation compute the solvation free energy as the difference between the free energy in solution and the gas phase electronic energy

$\delta G^\circ\mathrm{_{solv}(X)}=G\mathrm{^{\circ,CM}_{soln}(X)}-E\mathrm{_{gas}(X)}$

$G\mathrm{^{CM}_{soln}(X)}$ typically contain energy terms describing the electrostatic interaction of the molecule and the continuum as well as the van der Waal inteactions with the solvent and free energy required to create the molecular cavity in the solvent (cavitation). The electrostatic interaction with the solvent alters the molecular wavefunction and is computed self-consistently.

Some software packages automatically compute $G\mathrm{^{\circ,CM}_{soln}(X)}$ and $E\mathrm{_{gas}(X)}$ in one run, while others only compute $G\mathrm{^{\circ,CM}_{soln}(X)}$. Also, some programs just compute the electrostatic component of $G\mathrm{^{\circ,CM}_{soln}(X)}$ by default. However, the van der Waals and, especially, the cavitation component can make sizable contributions to the binding free energy and must be included for accurate results. It is worth noting that any hydrophobic contribution to binding will derive primarily from the change in cavitation energy.

$G\mathrm{^{CM}_{soln}(X)}$ contain parameters (e.g. atomic radii) that are adjusted to reproduce experimentally measured solvation free energies

$\delta G\mathrm{^\circ_{solv}(X)}=G\mathrm{^{\circ,exp}_{soln}(X)}-G\mathrm{^{\circ,exp}_{gas}(X)}$

The standard state for both $G\mathrm{^{\circ,exp}_{soln}(X)}$ and $G\mathrm{^{\circ,exp}_{gas}(X)}$ is generally 1 M. The latter is the reason why a 1 M reference state also must be used when computing $\Delta G^\circ_{\mathrm{RRHO}}$

Atomic radii. The solvation energy is computed using a set of atomic radii that define the solute-solvent boundary surface. These radii are usually obtained by fitting to experimentally measured solvation energies. Accurate solvation energies should not be expected from methods that use iso-electron density surfaces or van der Waals radii without additional empirical fitting. When using fitted radii one should use the same level of theory for the solute as was used in the parameterization.

Ions. For neutral molecules solvation free energies can be measured with an accuracy of roughly 0.2 kcal/mol and reproduced theoretically to within roughly 0.5 kcal/mol. However, the solvation energies of ions cannot be directly measured and must be indirectly inferred relative to a standard (typically the solvation energy of the proton). The experimentally obtained solvation energies are typically accurate to within 3 kcal/mol and can be reproduced computationally with roughly the same accuracy. The solvation energy of ions are therefore an especially likely source of error in binding free energies - especially if the ionic regions of the molecules become significantly desolvated due to binding.

Gas phase vs solution optimization. The fitting of the radii described above is usually done using gas phase optimized structures only, i.e. any change in structure and corresponding rotational and vibrational effects are "included" in the radii via the parameterization. However, for ionic species gas phase optimization can lead to significantly distorted structures or even proton transfer and in these cases solution phase optimizations and, hence, vibrational frequency calculations, tend to be used. However, numerical instability in the continuum models can make it necessary to increase (i.e. make less stringent) the geometry convergence criteria and can lead to more imaginary frequencies than in the gas phase. One option is to compute the vibrational contribution to $\Delta G^\circ_{\mathrm{RRHO}}$ using gas phase optimized structures.

When using solution phase geometries the gas phase single point energies needed to evaluate $\delta G\mathrm{^\circ_{solv}(X)}$ represent added computational expense and it can be tempting to use solution phase free energies to evaluate the binding free energies

$\Delta G^\circ = \Delta G\mathrm{^{\circ,CM}_{soln}}- \Delta G\mathrm{^{\circ,CM,RRHO}_{soln}}$

One problem with this approach is that $\Delta G\mathrm{^{\circ,CM}_{soln}}$, unlike $\Delta E_{\mathrm{gas}}$, is not systematically improveable due to the empirical parameterization.

Cavities. The atomic radii and corresponding cavity generation algorithms are parameterized for small molecules. For more complex molecules such as receptors this can lead to continuum solvation of regions of molecules, e.g. deep in the binding pocket, that are not accessible to the molecular solvent. Furthermore, any solvent molecule inside such pocket is likely to be quite "un-bulk-like" and not well-represented by the bulk solvent or fixed by the underlying parametrization. However, how big an error this may introduce to the binding free energy is not really known, but certain models for the cavitation energy have been shown to give unrealistically large contributions to the binding free energy.

Explicit water molecules. Adding explicit solvent molecules to the receptor and/or ligand can potentially lead to more accurate results. For example, including explicit water molecules around ionic sites reduces the strong dependence of the solvation energy on the corresponding atomic radii. Also, "un-bulk-like" water molecules now are treated more naturally and the risk of solvating non-solvent-acesssible regions is reduced somewhat. However, adding explicit solvent molecules increases the computational cost by increasing the CPU time needed to compute energies, perform conformational searches, and compute vibrational frequencies.

There are several approaches to include the effect of explicit solvent molecules in the binding free energy. Bryantsev and co-workers suggest computing the solvation energy by

$\delta G^\circ_{\mathrm{solv},n}\mathrm{(X)}=\Delta G^\circ_{\mathrm{gas}}\mathrm{(X(H_2O)_n})+\delta G^\circ_{\mathrm{solv}}\mathrm{(X(H_2O)_n)}-$

$\delta G^{\circ,liq}_{\mathrm{solv}}\mathrm{((H_2O)_n)}$

where

$\Delta G^\circ_{\mathrm{gas}}\mathrm{(X(H_2O)_n})=G^\circ_{\mathrm{gas}}\mathrm{(X(H_2O)_n})-G^\circ_{\mathrm{gas}}\mathrm{(X})-G^\circ_{\mathrm{gas}}\mathrm{((H_2O)_n})$

and

$\delta G^{\circ,liq}_{\mathrm{solv}}\mathrm{((H_2O)_n)}=\delta G^\circ_{\mathrm{solv}}\mathrm{((H_2O)_n)}+RT\ln(\mathrm{[H_2O]/n})$

with "$\circ$" and "$liq$" referring to a standard state of 1 M and 55.34, respectively. The term $RT\ln(\mathrm{[H_2O]/n})$ is the free energy required to change the standard state of (H$_2$O)$_n$ from 1 M to 55.34/n M.

Bryantsev et al. have shown that using this water cluster approach leads to a smooth convergence of the solvation free energy with respect to the cluster size $n$. The optimum choice of $n$ is this one where an additional water does changes the solvation energy by less than a certain user defined amount. One can thereby compute the optimum number of water molecules for the receptor ($n$), ligand ($m$) and receptor-ligand complex ($l$) and then compute the binding free energy as

$\Delta \delta G^\circ_{\mathrm{solv},x}=\delta G^\circ_{\mathrm{solv},l}\mathrm{(RL)}-\delta G^\circ_{\mathrm{solv},n}\mathrm{(L)}-\delta G^\circ_{\mathrm{solv},m}\mathrm{(R)}$

and computing $\Delta E_{\mathrm{gas}}$ and $\Delta G^\circ_{\mathrm{RRHO}}$ as before. One can show that this corresponds to the free energy change for this reaction

$\mathrm{L(H_2O)_n(aq)+R(H_2O)_m(aq)+(H_2O)_l(liq) \rightleftharpoons RL(H_2O)_l(aq)}+$
$\mathrm{(H_2O)_n(liq)+L(H_2O)_m(liq)}$ (1)

In principle the free energy change is zero for

$\mathrm{(H_2O)_l(liq) \rightleftharpoons+(H_2O)_n(liq)+L(H_2O)_m(liq)+sgn(d)(H_2O)_{|d|}(liq)}$

where $d=l-m-n$ and sgn(d) returns the sign of d. So the free energy change for Reaction (1) can also be computed as the free energy change for

$\mathrm{L(H_2O)_n(aq)+R(H_2O)_m(aq) \rightleftharpoons RL(H_2O)_l(aq)+sgn(d)(H_2O)_{|d|}}$ (2)

However, this is only approximately true in practice due to errors in the computed gas phase and solvation free energies. Furthermore, Reaction 2 does not really lead to any significant reduction in CPU time because the water cluster free energies only have to be computed once. However, if Reaction (2) is used then one must add an additional term correcting for the indistinguishability of water molecules

$G^\circ_{\mathrm{RRHO}}\mathrm{(X(H_2O)_n)} \rightarrow G^\circ_{\mathrm{RRHO}}\mathrm{(X(H_2O)_n)}-RT\ln (n!)$

and similarly for the water clusters. Using Reaction (1) leads to a cancellation of this term and also maximizes error cancellation in the other energy terms. Similar considerations apply to when using individual water molecules to the balance the reaction instead of water clusters

$\mathrm{(H_2O)_l(liq) \rightleftharpoons (H_2O)_n(liq)+L(H_2O)_m(liq)+sgn(d)|d|H_2O}$

When using many explicit water molecules the error in the continuum solvation energies can be reduced by ensuring that the continuum solvation energy of a single water molecule matches the experimental value of -6.32 kcal/mol at 298.15K as close as possible.

This work is licensed under a Creative Commons Attribution 4.0

Thursday, January 1, 2015

Computational Chemistry Highlights: December issue

The December issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Alán Aspuru-Guzik, Robert Paton, Steven Bachrach, David Bowler, and Jan Jensen:

Discovering chemistry with an ab initio nanoreactor

How Electronic Dynamics with Pauli Exclusion Produces Fermi-Dirac Statistics

Computational Chemistry: 2014 in numbers

Two review articles for the general audience

Benchmarking Ab Initio Binding Energies of Hydrogen-Bonded Molecular Clusters Based on FTIR Spectroscopy

Hypercubane: DFT-based prediction of an Oh-symmetric double-shell hydrocarbon

Benchmarks for Characterization of Minima, Transition States, and Pathways in Atomic, Molecular, and Condensed Matter Systems

Gas-Phase Preparation of Carbonic Acid and Its Monomethyl Ester

Tautomerism in Neutral Histidine

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily

This work is licensed under a Creative Commons Attribution 4.0

Tuesday, December 1, 2015

Sunday, November 22, 2015

Sunday, November 1, 2015

Thursday, October 1, 2015

Sunday, September 27, 2015

Tuesday, September 1, 2015

Sunday, August 9, 2015

Monday, August 3, 2015

Wednesday, July 15, 2015

Monday, June 15, 2015

Monday, June 1, 2015

Sunday, May 24, 2015

Friday, May 1, 2015

Wednesday, April 1, 2015

Sunday, March 1, 2015

Sunday, February 1, 2015

Saturday, January 24, 2015

Wednesday, January 7, 2015

Thursday, January 1, 2015

Pages

#compchem on Twitter

The Book

About Me

Posts by category: click to see posts on a particular subject

Popular Posts

Search This Blog

Blog Archive

Subscribe To MolModBasics

Subscribe via email

My Science Blog List

Links of note

Followers