## Wednesday, January 7, 2015

### Predicting binding free energies with electronic structure theory: thermodynamic considerations - Part 3

2015.01.25:  I have summarized discussion of this and related issues in this paper.

Predicting absolute binding free energies of biologically relevant molecules (in aqueous solution at pH 7 using electronic structure theory without empirical adjustments to within 1 kcal/mol) is one of the holy grails of computational chemistry. Recent work by Grimme and co-workers (here and here) have come close with mean absolute deviations of about 2 kcal/mol for host-guest complexes. This and other work has shed some light on why it is so difficult to predict binding free energies in aqueous solution, which I will discuss here and in two previous posts (Part 1 and Part 2). In addition I'll talk about further improvements that could potentially increase the accuracy further.

The approach is based on the following equation

$\Delta G^\circ = \Delta E_\mathrm{gas} + \Delta G^\circ_{\mathrm{gas,RRHO}}+\Delta \delta G^\circ_{\mathrm{solv}}$

where $\Delta E_\mathrm{gas}$ is the change in electronic energy and $\Delta G^\circ_{\mathrm{gas,RRHO}}$ the change in the translational, rotational, and vibrational free energy computed using the rigid-rotor and harmonic oscillator approximation, respectively - both evaluated in the gas phase. The vibrational frequencies are computed at a lower level of theory compared to that used to compute $\Delta E_{\mathrm{gas}}$. $\Delta \delta G^\circ_{\mathrm{solv}}$ is the change in solvation free energy computed using a continuum model of the solvent, such as COSMO.

Solvation Thermodynamics
Background. Most continuum models (CMs) of solvation compute the solvation free energy as the difference between the free energy in solution and the gas phase electronic energy

$\delta G^\circ\mathrm{_{solv}(X)}=G\mathrm{^{\circ,CM}_{soln}(X)}-E\mathrm{_{gas}(X)}$

$G\mathrm{^{CM}_{soln}(X)}$ typically contain energy terms describing the electrostatic interaction of the molecule and the continuum as well as the van der Waal inteactions with the solvent and free energy required to create the molecular cavity in the solvent (cavitation). The electrostatic interaction with the solvent alters the molecular wavefunction and is computed self-consistently.

Some software packages automatically compute $G\mathrm{^{\circ,CM}_{soln}(X)}$ and $E\mathrm{_{gas}(X)}$ in one run, while others only compute $G\mathrm{^{\circ,CM}_{soln}(X)}$. Also, some programs just compute the electrostatic component of $G\mathrm{^{\circ,CM}_{soln}(X)}$ by default.  However, the van der Waals and, especially, the cavitation component can make sizable contributions to the binding free energy and must be included for accurate results.  It is worth noting that any hydrophobic contribution to binding will derive primarily from the change in cavitation energy.

$G\mathrm{^{CM}_{soln}(X)}$ contain parameters (e.g. atomic radii) that are adjusted to reproduce experimentally measured solvation free energies

$\delta G\mathrm{^\circ_{solv}(X)}=G\mathrm{^{\circ,exp}_{soln}(X)}-G\mathrm{^{\circ,exp}_{gas}(X)}$

The standard state for both $G\mathrm{^{\circ,exp}_{soln}(X)}$ and $G\mathrm{^{\circ,exp}_{gas}(X)}$ is generally 1 M.  The latter is the reason why a 1 M reference state also must be used when computing $\Delta G^\circ_{\mathrm{RRHO}}$

Atomic radii. The solvation energy is computed using a set of atomic radii that define the solute-solvent boundary surface. These radii are usually obtained by fitting to experimentally measured solvation energies. Accurate solvation energies should not be expected from methods that use iso-electron density surfaces or van der Waals radii without additional empirical fitting. When using fitted radii one should use the same level of theory for the solute as was used in the parameterization.

Ions. For neutral molecules solvation free energies can be measured with an accuracy of roughly 0.2 kcal/mol and reproduced theoretically to within roughly 0.5 kcal/mol.  However, the solvation energies of ions cannot be directly measured and must be indirectly inferred relative to a standard (typically the solvation energy of the proton). The experimentally obtained solvation energies are typically accurate to within 3 kcal/mol and can be reproduced computationally with roughly the same accuracy.  The solvation energy of ions are therefore an especially likely source of error in binding free energies - especially if the ionic regions of the molecules become significantly desolvated due to binding.

Gas phase vs solution optimization.  The fitting of the radii described above is usually done using gas phase optimized structures only, i.e. any change in structure and corresponding rotational and vibrational effects are "included" in the radii via the parameterization.  However, for ionic species gas phase optimization can lead to significantly distorted structures or even proton transfer and in these cases solution phase optimizations and, hence, vibrational frequency calculations, tend to be used. However, numerical instability in the continuum models can make it necessary to increase (i.e. make less stringent) the geometry convergence criteria and can lead to more imaginary frequencies than in the gas phase. One option is to compute the vibrational contribution to $\Delta G^\circ_{\mathrm{RRHO}}$ using gas phase optimized structures.

When using solution phase geometries the gas phase single point energies needed to evaluate $\delta G\mathrm{^\circ_{solv}(X)}$ represent added computational expense and it can be tempting to use solution phase free energies to evaluate the binding free energies

$\Delta G^\circ = \Delta G\mathrm{^{\circ,CM}_{soln}}- \Delta G\mathrm{^{\circ,CM,RRHO}_{soln}}$

One problem with this approach is that $\Delta G\mathrm{^{\circ,CM}_{soln}}$, unlike $\Delta E_{\mathrm{gas}}$, is not systematically improveable due to the empirical parameterization.

Cavities. The atomic radii and corresponding cavity generation algorithms are parameterized for small molecules. For more complex molecules such as receptors this can lead to continuum solvation of regions of molecules, e.g. deep in the binding pocket, that are not accessible to the molecular solvent. Furthermore, any solvent molecule inside such pocket is likely to be quite "un-bulk-like" and not well-represented by the bulk solvent or fixed by the underlying parametrization.  However, how big an error this may introduce to the binding free energy is not really known, but certain models for the cavitation energy have been shown to give unrealistically large contributions to the binding free energy.

Explicit water molecules. Adding explicit solvent molecules to the receptor and/or ligand can potentially lead to more accurate results. For example, including explicit water molecules around ionic sites reduces the strong dependence of the solvation energy on the corresponding atomic radii. Also, "un-bulk-like" water molecules now are treated more naturally and the risk of solvating non-solvent-acesssible regions is reduced somewhat.  However, adding explicit solvent molecules increases the computational cost by increasing the CPU time needed to compute energies, perform conformational searches, and compute vibrational frequencies.

There are several approaches to include the effect of explicit solvent molecules in the binding free energy.  Bryantsev and co-workers suggest computing the solvation energy by

$\delta G^\circ_{\mathrm{solv},n}\mathrm{(X)}=\Delta G^\circ_{\mathrm{gas}}\mathrm{(X(H_2O)_n})+\delta G^\circ_{\mathrm{solv}}\mathrm{(X(H_2O)_n)}-$

$\delta G^{\circ,liq}_{\mathrm{solv}}\mathrm{((H_2O)_n)}$

where

$\Delta G^\circ_{\mathrm{gas}}\mathrm{(X(H_2O)_n})=G^\circ_{\mathrm{gas}}\mathrm{(X(H_2O)_n})-G^\circ_{\mathrm{gas}}\mathrm{(X})-G^\circ_{\mathrm{gas}}\mathrm{((H_2O)_n})$

and

$\delta G^{\circ,liq}_{\mathrm{solv}}\mathrm{((H_2O)_n)}=\delta G^\circ_{\mathrm{solv}}\mathrm{((H_2O)_n)}+RT\ln(\mathrm{[H_2O]/n})$

with "$\circ$" and "$liq$" referring to a standard state of 1 M and 55.34, respectively.  The term $RT\ln(\mathrm{[H_2O]/n})$ is the free energy required to change the standard state of (H$_2$O)$_n$ from 1 M to 55.34/n M.

Bryantsev et al. have shown that using this water cluster approach leads to a smooth convergence of the solvation free energy with respect to the cluster size $n$.  The optimum choice of $n$ is this one where an additional water does changes the solvation energy by less than a certain user defined amount.  One can thereby compute the optimum number of water molecules for the receptor ($n$), ligand ($m$) and receptor-ligand complex ($l$) and then compute the binding free energy as

$\Delta \delta G^\circ_{\mathrm{solv},x}=\delta G^\circ_{\mathrm{solv},l}\mathrm{(RL)}-\delta G^\circ_{\mathrm{solv},n}\mathrm{(L)}-\delta G^\circ_{\mathrm{solv},m}\mathrm{(R)}$

and computing $\Delta E_{\mathrm{gas}}$ and $\Delta G^\circ_{\mathrm{RRHO}}$ as before.  One can show that this corresponds to the free energy change for this reaction

$\mathrm{L(H_2O)_n(aq)+R(H_2O)_m(aq)+(H_2O)_l(liq) \rightleftharpoons RL(H_2O)_l(aq)}+$
$\mathrm{(H_2O)_n(liq)+L(H_2O)_m(liq)}$ (1)

In principle the free energy change is zero for

$\mathrm{(H_2O)_l(liq) \rightleftharpoons+(H_2O)_n(liq)+L(H_2O)_m(liq)+sgn(d)(H_2O)_{|d|}(liq)}$

where $d=l-m-n$ and sgn(d) returns the sign of d. So the free energy change for Reaction (1) can also be computed as the free energy change for

$\mathrm{L(H_2O)_n(aq)+R(H_2O)_m(aq) \rightleftharpoons RL(H_2O)_l(aq)+sgn(d)(H_2O)_{|d|}}$ (2)

However, this is only approximately true in practice due to errors in the computed gas phase and solvation free energies. Furthermore, Reaction 2 does not really lead to any significant reduction in CPU time because the water cluster free energies only have to be computed once. However, if Reaction (2) is used then one must add an additional term correcting for the indistinguishability of water molecules

$G^\circ_{\mathrm{RRHO}}\mathrm{(X(H_2O)_n)} \rightarrow G^\circ_{\mathrm{RRHO}}\mathrm{(X(H_2O)_n)}-RT\ln (n!)$

and similarly for the water clusters.  Using Reaction (1) leads to a cancellation of this term and also maximizes error cancellation in the other energy terms. Similar considerations apply to when using individual water molecules to the balance the reaction instead of water clusters

$\mathrm{(H_2O)_l(liq) \rightleftharpoons (H_2O)_n(liq)+L(H_2O)_m(liq)+sgn(d)|d|H_2O}$

When using many explicit water molecules the error in the continuum solvation energies can be reduced by ensuring that the continuum solvation energy of a single water molecule matches the experimental value of -6.32 kcal/mol at 298.15K as close as possible.