Sunday, March 1, 2015

Computational Chemistry Highlights: February issue

The February issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Dean Tantillo, Marcel Swart, Martin Korth, Johannes Hachmann, Steven Bachrach, and Jan Jensen:

Water–Water and Water–Solute Interactions in Microsolvated Organic Complexes








Sunday, February 1, 2015

Computational Chemistry Highlights: January issue

The January issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Alán Aspuru-Guzik, Andres Cisneros, Steven Bachrach, David Bowler, and Jan Jensen:

Construction of a Highly Distorted Benzene Ring in a Double Helicene

A computational drug design success story

Orbital-free density functional theory implementation with the projector augmented-wave method

Enhanced Helical Folding of ortho-Phenylenes through the Control of Aromatic Stacking Interactions

Saturday, January 24, 2015

The conformationally averaged free energy


The free energy averaged over $N_\mathrm{conf}$ conformations of molecule X is

$G^\circ=-RT \ln \left( \sum_I^{N_\mathrm{conf}} e^{-G^\circ_I/RT} \right)$  (1)

However, the way to compute an average property of the conformations $\left\langle x\right\rangle$ is

$\left\langle x\right\rangle=\sum_I^{N_\mathrm{conf}} p_Ix_I$  (2)

where

$p_I=\frac{e^{-G_I^\circ/RT}}{\sum_J^{N_\mathrm{conf}} e^{-G_J^\circ/RT}}$ (3)

But this equation cannot be used to compute the conformationally averaged free energy because

$G^\circ \ne \left\langle G^\circ \right\rangle$

Why not? First, let's see where the first equation comes from.

Obtaining equation (1)

The first equation comes from the statistical mechanical definition of the Helmholtz free energy found any any p-chem textbook

$A^\circ = -RT \ln  \left( \sum_i^{\mathrm{states}} e^{-\varepsilon_i/kT} \right)=-RT \ln  \left( q \right)$

The sum over microstates can be split into sums over microstates of each conformation I

$A^\circ = -RT \ln  \left( \sum_I^{N\mathrm{conf}} \sum_{i\in I}^{\mathrm{states}} e^{-\varepsilon_i/kT} \right)$

$= -RT \ln  \left( \sum_I^{N\mathrm{conf}} q_I \right) = -RT \ln \left( \sum_I^{N_\mathrm{conf}} e^{-A_I^\circ/RT} \right)$

To get the corresponding expression for the Gibbs free energy [Eq (1)]:

$G^\circ = A^\circ +p^\circ V $

$=-RT \ln \left( \sum_I^{N_\mathrm{conf}} e^{-A^\circ_I/RT} \right)-RT \ln \left(  e^{-p^\circ V/RT} \right)$

$=-RT \ln \left( \sum_I^{N_\mathrm{conf}} e^{-G_I^\circ/RT} \right)$

The difference between equation (1) and (3) is the conformational entropy
$G^\circ - \left\langle G^\circ \right\rangle=-RT \ln \left( \sum_J^{N_\mathrm{conf}} e^{-G_J^\circ/RT} \right)\sum_I^{N_\mathrm{conf}} p_I$

$+RT\sum_I^{N_\mathrm{conf}} p_I\frac{-G_I}{RT}$

$=RT \sum_I^{N_\mathrm{conf}} p_I \left( \frac{-G_I}{RT}- \ln \left( \sum_J^{N_\mathrm{conf}} e^{-G_J^\circ/RT} \right) \right)$

$=RT \sum_I^{N_\mathrm{conf}} p_I \left( \ln \left(  e^{-G_I^\circ/RT} \right) - \ln \left( \sum_J^{N_\mathrm{conf}} e^{-G_J^\circ/RT} \right) \right)  $

$=RT \sum_I^{N_\mathrm{conf}} p_I \ln(p_I) =-TS_{\mathrm{conf}} $


Wednesday, January 7, 2015

Predicting binding free energies with electronic structure theory: thermodynamic considerations - Part 3


2015.01.25:  I have summarized discussion of this and related issues in this paper.

Predicting absolute binding free energies of biologically relevant molecules (in aqueous solution at pH 7 using electronic structure theory without empirical adjustments to within 1 kcal/mol) is one of the holy grails of computational chemistry. Recent work by Grimme and co-workers (here and here) have come close with mean absolute deviations of about 2 kcal/mol for host-guest complexes. This and other work has shed some light on why it is so difficult to predict binding free energies in aqueous solution, which I will discuss here and in two previous posts (Part 1 and Part 2). In addition I'll talk about further improvements that could potentially increase the accuracy further.

The approach is based on the following equation

$\Delta G^\circ = \Delta E_\mathrm{gas} + \Delta G^\circ_{\mathrm{gas,RRHO}}+\Delta \delta G^\circ_{\mathrm{solv}}$

where $\Delta E_\mathrm{gas}$ is the change in electronic energy and $\Delta G^\circ_{\mathrm{gas,RRHO}}$ the change in the translational, rotational, and vibrational free energy computed using the rigid-rotor and harmonic oscillator approximation, respectively - both evaluated in the gas phase. The vibrational frequencies are computed at a lower level of theory compared to that used to compute $\Delta E_{\mathrm{gas}}$. $\Delta \delta G^\circ_{\mathrm{solv}}$ is the change in solvation free energy computed using a continuum model of the solvent, such as COSMO.

Solvation Thermodynamics
Background. Most continuum models (CMs) of solvation compute the solvation free energy as the difference between the free energy in solution and the gas phase electronic energy

$\delta G^\circ\mathrm{_{solv}(X)}=G\mathrm{^{\circ,CM}_{soln}(X)}-E\mathrm{_{gas}(X)}$

$G\mathrm{^{CM}_{soln}(X)}$ typically contain energy terms describing the electrostatic interaction of the molecule and the continuum as well as the van der Waal inteactions with the solvent and free energy required to create the molecular cavity in the solvent (cavitation). The electrostatic interaction with the solvent alters the molecular wavefunction and is computed self-consistently.

Some software packages automatically compute $G\mathrm{^{\circ,CM}_{soln}(X)}$ and $E\mathrm{_{gas}(X)}$ in one run, while others only compute $G\mathrm{^{\circ,CM}_{soln}(X)}$. Also, some programs just compute the electrostatic component of $G\mathrm{^{\circ,CM}_{soln}(X)}$ by default.  However, the van der Waals and, especially, the cavitation component can make sizable contributions to the binding free energy and must be included for accurate results.  It is worth noting that any hydrophobic contribution to binding will derive primarily from the change in cavitation energy.

$G\mathrm{^{CM}_{soln}(X)}$ contain parameters (e.g. atomic radii) that are adjusted to reproduce experimentally measured solvation free energies

$\delta G\mathrm{^\circ_{solv}(X)}=G\mathrm{^{\circ,exp}_{soln}(X)}-G\mathrm{^{\circ,exp}_{gas}(X)}$

The standard state for both $G\mathrm{^{\circ,exp}_{soln}(X)}$ and $G\mathrm{^{\circ,exp}_{gas}(X)}$ is generally 1 M.  The latter is the reason why a 1 M reference state also must be used when computing $\Delta G^\circ_{\mathrm{RRHO}}$

Atomic radii. The solvation energy is computed using a set of atomic radii that define the solute-solvent boundary surface. These radii are usually obtained by fitting to experimentally measured solvation energies. Accurate solvation energies should not be expected from methods that use iso-electron density surfaces or van der Waals radii without additional empirical fitting. When using fitted radii one should use the same level of theory for the solute as was used in the parameterization.

Ions. For neutral molecules solvation free energies can be measured with an accuracy of roughly 0.2 kcal/mol and reproduced theoretically to within roughly 0.5 kcal/mol.  However, the solvation energies of ions cannot be directly measured and must be indirectly inferred relative to a standard (typically the solvation energy of the proton). The experimentally obtained solvation energies are typically accurate to within 3 kcal/mol and can be reproduced computationally with roughly the same accuracy.  The solvation energy of ions are therefore an especially likely source of error in binding free energies - especially if the ionic regions of the molecules become significantly desolvated due to binding.

Gas phase vs solution optimization.  The fitting of the radii described above is usually done using gas phase optimized structures only, i.e. any change in structure and corresponding rotational and vibrational effects are "included" in the radii via the parameterization.  However, for ionic species gas phase optimization can lead to significantly distorted structures or even proton transfer and in these cases solution phase optimizations and, hence, vibrational frequency calculations, tend to be used. However, numerical instability in the continuum models can make it necessary to increase (i.e. make less stringent) the geometry convergence criteria and can lead to more imaginary frequencies than in the gas phase. One option is to compute the vibrational contribution to $\Delta G^\circ_{\mathrm{RRHO}}$ using gas phase optimized structures.

When using solution phase geometries the gas phase single point energies needed to evaluate $\delta G\mathrm{^\circ_{solv}(X)}$ represent added computational expense and it can be tempting to use solution phase free energies to evaluate the binding free energies

$\Delta G^\circ = \Delta G\mathrm{^{\circ,CM}_{soln}}- \Delta G\mathrm{^{\circ,CM,RRHO}_{soln}}$

One problem with this approach is that $\Delta G\mathrm{^{\circ,CM}_{soln}}$, unlike $\Delta E_{\mathrm{gas}}$, is not systematically improveable due to the empirical parameterization.

Cavities. The atomic radii and corresponding cavity generation algorithms are parameterized for small molecules. For more complex molecules such as receptors this can lead to continuum solvation of regions of molecules, e.g. deep in the binding pocket, that are not accessible to the molecular solvent. Furthermore, any solvent molecule inside such pocket is likely to be quite "un-bulk-like" and not well-represented by the bulk solvent or fixed by the underlying parametrization.  However, how big an error this may introduce to the binding free energy is not really known, but certain models for the cavitation energy have been shown to give unrealistically large contributions to the binding free energy.

Explicit water molecules. Adding explicit solvent molecules to the receptor and/or ligand can potentially lead to more accurate results. For example, including explicit water molecules around ionic sites reduces the strong dependence of the solvation energy on the corresponding atomic radii. Also, "un-bulk-like" water molecules now are treated more naturally and the risk of solvating non-solvent-acesssible regions is reduced somewhat.  However, adding explicit solvent molecules increases the computational cost by increasing the CPU time needed to compute energies, perform conformational searches, and compute vibrational frequencies.

There are several approaches to include the effect of explicit solvent molecules in the binding free energy.  Bryantsev and co-workers suggest computing the solvation energy by

$\delta G^\circ_{\mathrm{solv},n}\mathrm{(X)}=\Delta G^\circ_{\mathrm{gas}}\mathrm{(X(H_2O)_n})+\delta G^\circ_{\mathrm{solv}}\mathrm{(X(H_2O)_n)}-$

$\delta G^{\circ,liq}_{\mathrm{solv}}\mathrm{((H_2O)_n)}$

where

$\Delta G^\circ_{\mathrm{gas}}\mathrm{(X(H_2O)_n})=G^\circ_{\mathrm{gas}}\mathrm{(X(H_2O)_n})-G^\circ_{\mathrm{gas}}\mathrm{(X})-G^\circ_{\mathrm{gas}}\mathrm{((H_2O)_n})$

and

$\delta G^{\circ,liq}_{\mathrm{solv}}\mathrm{((H_2O)_n)}=\delta G^\circ_{\mathrm{solv}}\mathrm{((H_2O)_n)}+RT\ln(\mathrm{[H_2O]/n})$

with "$\circ$" and "$liq$" referring to a standard state of 1 M and 55.34, respectively.  The term $RT\ln(\mathrm{[H_2O]/n})$ is the free energy required to change the standard state of (H$_2$O)$_n$ from 1 M to 55.34/n M.

Bryantsev et al. have shown that using this water cluster approach leads to a smooth convergence of the solvation free energy with respect to the cluster size $n$.  The optimum choice of $n$ is this one where an additional water does changes the solvation energy by less than a certain user defined amount.  One can thereby compute the optimum number of water molecules for the receptor ($n$), ligand ($m$) and receptor-ligand complex ($l$) and then compute the binding free energy as

$\Delta \delta G^\circ_{\mathrm{solv},x}=\delta G^\circ_{\mathrm{solv},l}\mathrm{(RL)}-\delta G^\circ_{\mathrm{solv},n}\mathrm{(L)}-\delta G^\circ_{\mathrm{solv},m}\mathrm{(R)}$

and computing $\Delta E_{\mathrm{gas}}$ and $\Delta G^\circ_{\mathrm{RRHO}}$ as before.  One can show that this corresponds to the free energy change for this reaction

$\mathrm{L(H_2O)_n(aq)+R(H_2O)_m(aq)+(H_2O)_l(liq) \rightleftharpoons RL(H_2O)_l(aq)}+$
$\mathrm{(H_2O)_n(liq)+L(H_2O)_m(liq)}$ (1)

In principle the free energy change is zero for

$\mathrm{(H_2O)_l(liq) \rightleftharpoons+(H_2O)_n(liq)+L(H_2O)_m(liq)+sgn(d)(H_2O)_{|d|}(liq)}$

where $d=l-m-n$ and sgn(d) returns the sign of d. So the free energy change for Reaction (1) can also be computed as the free energy change for

$\mathrm{L(H_2O)_n(aq)+R(H_2O)_m(aq) \rightleftharpoons RL(H_2O)_l(aq)+sgn(d)(H_2O)_{|d|}}$ (2)

However, this is only approximately true in practice due to errors in the computed gas phase and solvation free energies. Furthermore, Reaction 2 does not really lead to any significant reduction in CPU time because the water cluster free energies only have to be computed once. However, if Reaction (2) is used then one must add an additional term correcting for the indistinguishability of water molecules

$G^\circ_{\mathrm{RRHO}}\mathrm{(X(H_2O)_n)} \rightarrow G^\circ_{\mathrm{RRHO}}\mathrm{(X(H_2O)_n)}-RT\ln (n!)$

and similarly for the water clusters.  Using Reaction (1) leads to a cancellation of this term and also maximizes error cancellation in the other energy terms. Similar considerations apply to when using individual water molecules to the balance the reaction instead of water clusters

$\mathrm{(H_2O)_l(liq) \rightleftharpoons (H_2O)_n(liq)+L(H_2O)_m(liq)+sgn(d)|d|H_2O}$

When using many explicit water molecules the error in the continuum solvation energies can be reduced by ensuring that the continuum solvation energy of a single water molecule matches the experimental value of -6.32 kcal/mol at 298.15K as close as possible.


This work is licensed under a Creative Commons Attribution 4.0

Thursday, January 1, 2015

Computational Chemistry Highlights: December issue

The December issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Alán Aspuru-Guzik, Robert Paton, Steven Bachrach, David Bowler, and Jan Jensen:

Discovering chemistry with an ab initio nanoreactor








Tautomerism in Neutral Histidine

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily


This work is licensed under a Creative Commons Attribution 4.0

Monday, December 29, 2014

Predicting binding free energies with electronic structure theory: thermodynamic considerations - Part 2


2015.01.25:  I have summarized discussion of this and related issues in this paper.  I have also made some changes in the post to reflect what I have learned since I wrote it.

Predicting absolute binding free energies of biologically relevant molecules (in aqueous solution at pH 7 using electronic structure theory without empirical adjustments to within 1 kcal/mol) is one of the holy grails of computational chemistry. Recent work by Grimme and co-workers (here and here) have come close with mean absolute deviations of about 2 kcal/mol for host-guest complexes. This and other work has shed some light on why it is so difficult to predict binding free energies in aqueous solution, which I will discuss here and in a previous post. In addition I'll talk about further improvements that could potentially increase the accuracy further.

The general approach is based on the following equation

$\Delta G^\circ = \Delta E + \Delta G^\circ_{\mathrm{RRHO}}+\Delta \delta G_{\mathrm{solv}}$

where $\Delta E$ is the change in electronic energy and $\Delta G^\circ_{\mathrm{RRHO}}$ the change in the translational, rotational, and vibrational free energy computed using the rigid-rotor and harmonic oscillator approximation, respectively. The vibrational frequencies are computed at a lower level of theory compared to that used to compute $\Delta E$. $\Delta \delta G_{\mathrm{solv}}$ is the change in solvation free energy computed using a polarizable continuum model such as COSMO.

pH and molecular charge. Virtually all binding measurements in aqueous solution are performed in buffer with a constant pH and many ligands and or receptors contain one or more ionizable groups. The charge of an ionizable (acid/base) group in aqueous solution depends on its pK$_a$ and the pH:

$q= \frac{1}{1+10^{\mathrm{pH}-\mathrm{p}K_a}}-\delta$

where $\delta$ is 1 for an acid and 0 for a base.  In most cases, selecting the wrong charge for the ligand and/or host will result in a significant error in the computed binding free energy.  The pK$_a$ can be computed using electronic structure theory or empirically using software such Marvin. However, if the pK$_a$ value is perturbed by the binding the situation may be complicated further. Here I illustrate this point for a simple example where the ligand (L) has a basic group that is neutral when deprotonated and the receptor (R) is non-ionizable.

$\mathrm{R + L(H^+) \rightleftharpoons RL(H^+)}$

The (apparent) experimental binding constant is then

$K^\prime=\mathrm{\frac{[RL]+[RLH^+]}{[R]([L]+[LH^+])}}$

and the corresponding binding free energy is

$\Delta G^{\prime\circ}=\Delta G^{\circ}(+)-RT\ln \left( \frac{1+10^{\mathrm{pH}-\mathrm{p}K_a^c}}{1+10^{\mathrm{pH}-\mathrm{p}K_a^f}} \right)$   (1)

where $\Delta G^{\circ}(+)$ is the binding free energy computed using the charged (protonated) form of the ligand and pK$_a^c$ and pK$_a^f$ are the pK$_a$ values the ligand bound to the receptor and the free ligand, respectively.

If the pK$_a$ is unaffected by the binding, the binding free energy is unaffected by pH and the chosen protonation state of the ligand. For the remaining scenarios it is instructive to plug in some numbers. For example, for pH = 7, pK$_a^f$ = 9 and pK$_a^c$ = 11, the ligand is protonated before and after binding and $\Delta G^{\prime\circ}=\Delta G^{\circ}(+)$ is a good approximation. However, if pH = 7, pK$_a^f$ = 5 and pK$_a^c$ = 3, the ligand is neutral before and after binding and assuming it is charged leads to a 2.7 kcal/mol error in the binding free energy (unless corrected by this equation).  Finally, if pH = 7, pK$_a^f$ = 8 and pK$_a^c$ = 6, the ligand is (91%) charged before and (91%) neutral after binding and assuming it remains charged leads to a 1.4 kcal/mol error in the binding free energy.

For many ligands of interest the pK$_a^f$ can be estimated fairly accurately in a matter of second using programs such as Marvin. The effect of binding on pK$_a^f$ can often be estimated by chemical intuition since hydrogen bonds to charged acid and basic groups tend to, respectively, lower or raise the pK$_a$ even further.  For example, if an amine with pK$_a^f$ = 9 binds to the receptor via hydrogen bonding, then pK$_a^c$ is likely higher than 9 and $\Delta G^{\prime\circ}=\Delta G^{\circ}(+)$ is a good approximation.  However, if pK$_a^f$ is close to 7 then pK$_a^c$ should be computed.  Also, it is possible for charged ligands to change to their neutral state if they bind to hydrophobic or similarly charged receptors.

If pK$_a^f$ is known with some degree of confidence then pK$_a^c$ can be estimated by

$\mathrm{p}K_a^c=\mathrm{p}K_a^f-\Delta G^\circ /RT\ln(10)$

where $\Delta A^\circ$ is the free energy change for

$\mathrm{RLH^+ + L \rightleftharpoons  RL + LH^+}$

If there are several ionizable groups then Eq (1) generalizes to

$\Delta G^{\prime\circ}=\Delta G^{\circ}(-/+)-RT\ln \left(\sum_i \frac{1+10^{n_i(\mathrm{pH}-\mathrm{p}K_{a,i}^c)}}{1+10^{n_i(\mathrm{pH}-\mathrm{p}K_{a,i}^f)}} \right)$

where $\Delta G^{\circ}(-/+)$ is the binding free energy when all acids and bases are deprotonated and protonated, respectively, the sum runs over all ionizable groups and $n_i$ is 1 and -1 if $i$ is a base or acid, respectively.

However, this assumes that the ionizable groups titrate independently of one another, i.e. that the pK$_a$ value of one group is independent of the protonation states of all other ionizable groups.  If that is not the case then it is difficult to give a general expression for the pH-dependent free energy correction in terms of pK$_a$ values (though it can easily be derived for a specific case).

Instead a general expression can be written in terms of Legendre transformed free energies as suggested by Alberty (modified here to electronic structure calculations):

$G^{\prime\circ}(\overline{X})=-RT\ln \left( \sum_i \exp{(-G^{\prime\circ}(X_i)/RT)} \right)$

Here the sum runs over all possible protonation states and

$G^{\prime\circ}(X_i)=G^{\circ}(X_i)-n(\mathrm{H^+})[\delta G^\circ (\mathrm{H^+}) -RT\ln(10)\mathrm{pH}]$

where $G^{\circ}(X_i)$ is the usual standard free energy of protonation state $i$, $n(\mathrm{H^+})$ is the number of ionizable proton in pronation state $i$, and $\delta G^\circ (\mathrm{H^+})$ is the solvation free energy of the proton.  So in the case of ligand L considered above, $n(\mathrm{H^+})$ is 0 and 1 for L and LH$^+$, respectively. $\delta G^\circ (\mathrm{H^+})$ is usually taken from the literature where estimates vary between -265.8 and -268.6 kcal/mol.

Thus, Eq (1) can be rewritten as

$\Delta G^{\prime\circ}=G^{\prime\circ}(\overline{RL})-G^{\prime\circ}(\overline{L})-G^{\circ}(R)$

Since the electronic energy contribution to the standard free energy can be very large in magnitude this form is more easily evaluated

$G^{\prime\circ}(\overline{X})=G^{\prime\circ}({X_0})-RT\ln \left( 1+\sum_{i\ne0} \exp{(-G^{\prime\circ}(X_i)/RT)} \right)$

where $X_0$ is some arbitrarily chosen reference protonation state, for example that for which $n(\mathrm{H^+})$ = 0.  The sum can be combined with that over different conformations, discussed in a previous post.

Other ions
The buffers that are commonly used to regulate the pH also contain other ions, such as Na$^+$, Mg$^{2+}$, Cl$^-$.  At high ion concentrations, it is possible that these ions bind at certain sites in the ligand, receptor, or ligand-receptor complex with sufficient probability that they must be included in the thermodynamics. If so the exact same equations and considerations outlined above for H$^+$ also apply to, e.g. Cl$^-$ and pCl$^-$ (computed from the specified buffer concentration) is used instead of pH.



This work is licensed under a Creative Commons Attribution 4.0

Saturday, December 27, 2014

Predicting binding free energies with electronic structure theory: thermodynamic considerations - Part 1


2015.01.25:  I have summarized discussion of this and related issues in this paper.  I have also made some changes in the post to reflect what I have learned since I wrote it.

Predicting absolute binding free energies of biologically relevant molecules (in aqueous solution at pH 7 using electronic structure theory without empirical adjustments to within 1 kcal/mol) is one of the holy grails of computational chemistry. Recent work by Grimme and co-workers (here and here) have come close with mean absolute deviations of about 2 kcal/mol for host-guest complexes. This and other work has shed some light on why it is so difficult to predict binding free energies in aqueous solution, which I will discuss here and in subsequent posts. In addition I'll talk about further improvements that could potentially increase the accuracy further.

The approach is based on the following equation

$\Delta G^\circ = \Delta E + \Delta G^\circ_{\mathrm{RRHO}}+\Delta \delta G_{\mathrm{solv}}$

where $\Delta E$ is the change in electronic energy and $\Delta G^\circ_{\mathrm{RRHO}}$ the change in the translational, rotational, and vibrational free energy computed using the rigid-rotor and harmonic oscillator approximation, respectively. The vibrational frequencies are computed at a lower level of theory compared to that used to compute $\Delta E$. $\Delta \delta G_{\mathrm{solv}}$ is the change in solvation free energy computed using a polarizable continuum model such as COSMO.

Electronic energy
Grimme has shown that dispersion typically makes a very big (>10 kcal/mol) contribution to binding free energies of host-guest complexes. Dispersion corrections are therefore a must if DFT is used to compute the electronic binding energy. Furthermore, Grimme has shown that three-body dispersion makes a non-negligible (2-3 kcal/mol) contribution to the electronic binding energy.  For convergent methods this effect is only included in rather expensive methods that involve triple-excitations such as MP4 and CCSD(T).

Molecular Thermodynamics
The translational, rotational and vibrational thermodynamic contribution to the binding free energy is very large (>10 kcal/mol) and must be included for accurate results.  Some years ago there was a bit of confusion in the literature about whether the RRHO approach was appropriate for condenses phase systems, but Zhou and Gilson have clarified this beautifully.

Standard state.  Most electronic structure codes compute the RRHO energy corrections for an ideal gas, where the standard state is a pressure of 1 bar.  The standard state for solution is 1 M, so the free energy of a molecule X must be corrected accordingly

$G^\circ_{\mathrm{RRHO}}(\mathrm{X}) = G^{\circ,\mathrm{1 bar}}_{\mathrm{RRHO}}(\mathrm{X})-RT\ln (V^{-1})$

where $V$ is the volume of an ideal gas at 1 bar and temperature $T$. At 298K $-RT\ln (V^{-1})$ = 1.90 kcal/mol.  This correction is already included in the solvation energy $\delta G_{\mathrm{solv}}$

Symmetry. Many host molecules and some guest molecules are symmetric, which affects the rotational entropy through the symmetry number ($\sigma$) which is a function of the point group.

$S_{rot}=R\ln\left(\frac{8\pi^2}{\sigma}\left(\frac{2\pi ekT}{h^2}\right)^{3/2}\sqrt{I_1I_2I_3}\right)$

It can be very difficult to build large molecules with the correct point group and most studies use $C_1$ symmetry.  In this case the effect of symmetry must be added manually to the free energy

$G^\circ_{\mathrm{RRHO}}(\mathrm{X}) \rightarrow G^\circ_{\mathrm{RRHO}}(\mathrm{X})+RT \ln(\sigma_X)$

As an example, the popular host molecule corcubit[7] has $D_{7h}$ symmetry and a corresponding $\sigma$ value of 14, in which case the correction contributes 1.56 kcal/mol to the free energy at 298K.

Anharmonicity and low frequency modes. Host-guest complexes can exhibit very low frequency vibrations on the order of 50 cm$^{-1}$ or less, which tend to dominate the vibrational entropy contribution.  Grimme (and many others) have questioned whether the harmonic approximation is valid for such low frequency modes and this is an open research question.  The main problem is that it is very difficult to compute the vibrational entropy exactly.  Most methods for computing anharmonic vibrational are developed to obtain the 2 or 3 lowest energy states, but for very low frequency modes 10-20 states likely significantly populated at room temperature and therefore contribute to the entropy.

In the absence of theoretical benchmarks, comparison to experiment can prove constructive. Kjærgaard and co-workers have recently measured standard binding free energies for small gas phase compounds and compared them to CCSD(T)/aug-cc-pV(T+d) calculations.  For example, in the case of acetronitrile-HCl the measured binding free energy at 295K is between 1.2 and 1.9 kcal/mol, while the predicted value is 2.3 kcal/mol using the harmonic approximation.  Since the errors in $\Delta E$ and the rigid-rotor approximation presumably are quite low, this suggest and error in the vibrational free energy of at most 1.1 kcal/mol, despite the fact that the lowest vibrational frequency is only about 30 cm$^{-1}$.  Furthermore, the error can be reduced by 0.4 kcal/mol by scaling the harmonic frequencies by anharmonic scaling factors suggested by Shields and co-workers.  Similar results were found for dimethylsulfide-HCl.  So there are some indications that the harmonic approximation yields free energy corrections that are reasonable and that can be improved upon by relatively minor corrections.

Grimme has taken a different approach by arguing that low-frequency modes resemble free rotations and using the corresponding entropy term for low frequency modes.  This changes the RRHO free energy correction by 0.5 - 4 kcal/mol, depending on the system.

Imaginary frequencies. Low frequencies are especially susceptible to numerical error and it is not unusual to see 1 or 2 imaginary frequencies of low magnitude in a vibrational analysis of a host-guest complex.  Since imaginary frequencies are excluded from the vibrational free energy this effectively removes 1 or 2 low frequency contributions to the vibrational free energy. For example, a 30 cm$^{-1}$ frequency contributes about 1.7 kcal/mol to the free energy at 298K.

Imaginary frequencies resulting from a flat PES and numerical errors can often be removed by making the convergence criteria for the geometry optimization and electronic energy minimization more stringent and making the grid size finer in the case of DFT calculations. If the Hessian is computed using finite difference it is important to use double-differencing.  If all else fails, it is probably better to pretend that the imaginary frequency is real and add the corresponding vibrational free energy contribution.

Conformations. One of the main problems in computing accurate binding free energies is to identify the structures of the host, guest and (especially) the host-guest complex with the lowest free energy. Because both the RRHO and solvation energy contributions contribute greatly to the binding free energy change, simply finding the structure with the lowest electronic energy and computing the free energy only for that conformation is probably not enough.

The free energy for a molecule (X) with N conformations the standard free energy is

$G^\circ(\mathrm{X})= G^\circ _0(\mathrm{X})-RT\ln\left(\sum^N_{i=1} \left( 1+\exp{\left( -(G^\circ _i- G^\circ _0)/RT\right)} \right)\right)$

where $G^\circ _0(\mathrm{X})$ is the conformation with lowest free energy.  Conformations with free energies higher than 1.36 kcal/mol contribute less than 0.1 to the sum at 298K.  So a significant number of very low free energy structures is needed to make even a 0.5 kcal/mol contribution to the free energy.  Conformations related by symmetry should not be included here as their effects are accounted for in the rotational entropy (see above).  Also, conformationally flexible regions of the ligand or host that are unaffected by binding need not be explored since the effect on the binding free energy will cancel.



This work is licensed under a Creative Commons Attribution 4.0