2015.01.25: I have summarized discussion of this and related issues in this paper. I have also made some changes in the post to reflect what I have learned since I wrote it.

Predicting absolute binding free energies of biologically relevant molecules (in aqueous solution at pH 7 using electronic structure theory without empirical adjustments to within 1 kcal/mol) is one of the holy grails of computational chemistry. Recent work by Grimme and co-workers (here and here) have come close with mean absolute deviations of about 2 kcal/mol for host-guest complexes. This and other work has shed some light on why it is so difficult to predict binding free energies in aqueous solution, which I will discuss here and in a previous post. In addition I'll talk about further improvements that could potentially increase the accuracy further.

The general approach is based on the following equation

where $\Delta E$ is the change in electronic energy and $\Delta G^\circ_{\mathrm{RRHO}}$ the change in the translational, rotational, and vibrational free energy computed using the rigid-rotor and harmonic oscillator approximation, respectively. The vibrational frequencies are computed at a lower level of theory compared to that used to compute $\Delta E$. $\Delta \delta G_{\mathrm{solv}}$ is the change in solvation free energy computed using a polarizable continuum model such as COSMO.

where $\delta$ is 1 for an acid and 0 for a base. In most cases, selecting the wrong charge for the ligand and/or host will result in a significant error in the computed binding free energy. The p

The (apparent) experimental binding constant is then

and the corresponding binding free energy is

where $\Delta G^{\circ}(+)$ is the binding free energy computed using the charged (protonated) form of the ligand and p

If the p

For many ligands of interest the p

If p

where $\Delta A^\circ$ is the free energy change for

If there are several ionizable groups then Eq (1) generalizes to

where $\Delta G^{\circ}(-/+)$ is the binding free energy when all acids and bases are deprotonated and protonated, respectively, the sum runs over all ionizable groups and $n_i$ is 1 and -1 if $i$ is a base or acid, respectively.

However, this assumes that the ionizable groups titrate independently of one another, i.e. that the p

Instead a general expression can be written in terms of Legendre transformed free energies as suggested by Alberty (modified here to electronic structure calculations):

Here the sum runs over all possible protonation states and

where $G^{\circ}(X_i)$ is the usual standard free energy of protonation state $i$, $n(\mathrm{H^+})$ is the number of ionizable proton in pronation state $i$, and $\delta G^\circ (\mathrm{H^+})$ is the solvation free energy of the proton. So in the case of ligand L considered above, $n(\mathrm{H^+})$ is 0 and 1 for L and LH$^+$, respectively. $\delta G^\circ (\mathrm{H^+})$ is usually taken from the literature where estimates vary between -265.8 and -268.6 kcal/mol.

Thus, Eq (1) can be rewritten as

Since the electronic energy contribution to the standard free energy can be very large in magnitude this form is more easily evaluated

Predicting absolute binding free energies of biologically relevant molecules (in aqueous solution at pH 7 using electronic structure theory without empirical adjustments to within 1 kcal/mol) is one of the holy grails of computational chemistry. Recent work by Grimme and co-workers (here and here) have come close with mean absolute deviations of about 2 kcal/mol for host-guest complexes. This and other work has shed some light on why it is so difficult to predict binding free energies in aqueous solution, which I will discuss here and in a previous post. In addition I'll talk about further improvements that could potentially increase the accuracy further.

The general approach is based on the following equation

$\Delta G^\circ = \Delta E + \Delta G^\circ_{\mathrm{RRHO}}+\Delta \delta G_{\mathrm{solv}}$

where $\Delta E$ is the change in electronic energy and $\Delta G^\circ_{\mathrm{RRHO}}$ the change in the translational, rotational, and vibrational free energy computed using the rigid-rotor and harmonic oscillator approximation, respectively. The vibrational frequencies are computed at a lower level of theory compared to that used to compute $\Delta E$. $\Delta \delta G_{\mathrm{solv}}$ is the change in solvation free energy computed using a polarizable continuum model such as COSMO.

**pH and molecular charge**. Virtually all binding measurements in aqueous solution are performed in buffer with a constant pH and many ligands and or receptors contain one or more ionizable groups. The charge of an ionizable (acid/base) group in aqueous solution depends on its p*K*$_a$ and the pH:
$q= \frac{1}{1+10^{\mathrm{pH}-\mathrm{p}K_a}}-\delta$

where $\delta$ is 1 for an acid and 0 for a base. In most cases, selecting the wrong charge for the ligand and/or host will result in a significant error in the computed binding free energy. The p

*K*$_a$ can be computed using electronic structure theory or empirically using software such Marvin. However, if the p*K*$_a$ value is perturbed by the binding the situation may be complicated further. Here I illustrate this point for a simple example where the ligand (L) has a basic group that is neutral when deprotonated and the receptor (R) is non-ionizable.
$\mathrm{R + L(H^+) \rightleftharpoons RL(H^+)}$

The (apparent) experimental binding constant is then

$K^\prime=\mathrm{\frac{[RL]+[RLH^+]}{[R]([L]+[LH^+])}}$

and the corresponding binding free energy is

$\Delta G^{\prime\circ}=\Delta G^{\circ}(+)-RT\ln \left( \frac{1+10^{\mathrm{pH}-\mathrm{p}K_a^c}}{1+10^{\mathrm{pH}-\mathrm{p}K_a^f}} \right)$

**(1)**where $\Delta G^{\circ}(+)$ is the binding free energy computed using the charged (protonated) form of the ligand and p

*K*$_a^c$ and p*K*$_a^f$ are the p*K*$_a$ values the ligand bound to the receptor and the free ligand, respectively.If the p

*K*$_a$ is unaffected by the binding, the binding free energy is unaffected by pH and the chosen protonation state of the ligand. For the remaining scenarios it is instructive to plug in some numbers. For example, for pH = 7, p*K*$_a^f$ = 9 and p*K*$_a^c$ = 11, the ligand is protonated before and after binding and $\Delta G^{\prime\circ}=\Delta G^{\circ}(+)$ is a good approximation. However, if pH = 7, p*K*$_a^f$ = 5 and p*K*$_a^c$ = 3, the ligand is neutral before and after binding and assuming it is charged leads to a 2.7 kcal/mol error in the binding free energy (unless corrected by this equation). Finally, if pH = 7, p*K*$_a^f$ = 8 and p*K*$_a^c$ = 6, the ligand is (91%) charged before and (91%) neutral after binding and assuming it remains charged leads to a 1.4 kcal/mol error in the binding free energy.For many ligands of interest the p

*K*$_a^f$ can be estimated fairly accurately in a matter of second using programs such as Marvin. The effect of binding on p*K*$_a^f$ can often be estimated by chemical intuition since hydrogen bonds to charged acid and basic groups tend to, respectively, lower or raise the p*K*$_a$ even further. For example, if an amine with p*K*$_a^f$ = 9 binds to the receptor via hydrogen bonding, then p*K*$_a^c$ is likely higher than 9 and $\Delta G^{\prime\circ}=\Delta G^{\circ}(+)$ is a good approximation. However, if p*K*$_a^f$ is close to 7 then p*K*$_a^c$ should be computed. Also, it is possible for charged ligands to change to their neutral state if they bind to hydrophobic or similarly charged receptors.If p

*K*$_a^f$ is known with some degree of confidence then p*K*$_a^c$ can be estimated by
$\mathrm{p}K_a^c=\mathrm{p}K_a^f-\Delta G^\circ /RT\ln(10)$

where $\Delta A^\circ$ is the free energy change for

$\mathrm{RLH^+ + L \rightleftharpoons RL + LH^+}$

If there are several ionizable groups then Eq (1) generalizes to

$\Delta G^{\prime\circ}=\Delta G^{\circ}(-/+)-RT\ln \left(\sum_i \frac{1+10^{n_i(\mathrm{pH}-\mathrm{p}K_{a,i}^c)}}{1+10^{n_i(\mathrm{pH}-\mathrm{p}K_{a,i}^f)}} \right)$

where $\Delta G^{\circ}(-/+)$ is the binding free energy when all acids and bases are deprotonated and protonated, respectively, the sum runs over all ionizable groups and $n_i$ is 1 and -1 if $i$ is a base or acid, respectively.

However, this assumes that the ionizable groups titrate independently of one another, i.e. that the p

*K*$_a$ value of one group is independent of the protonation states of all other ionizable groups. If that is not the case then it is difficult to give a general expression for the pH-dependent free energy correction in terms of p*K*$_a$ values (though it can easily be derived for a specific case).Instead a general expression can be written in terms of Legendre transformed free energies as suggested by Alberty (modified here to electronic structure calculations):

$G^{\prime\circ}(\overline{X})=-RT\ln \left( \sum_i \exp{(-G^{\prime\circ}(X_i)/RT)} \right)$

Here the sum runs over all possible protonation states and

$G^{\prime\circ}(X_i)=G^{\circ}(X_i)-n(\mathrm{H^+})[\delta G^\circ (\mathrm{H^+}) -RT\ln(10)\mathrm{pH}]$

where $G^{\circ}(X_i)$ is the usual standard free energy of protonation state $i$, $n(\mathrm{H^+})$ is the number of ionizable proton in pronation state $i$, and $\delta G^\circ (\mathrm{H^+})$ is the solvation free energy of the proton. So in the case of ligand L considered above, $n(\mathrm{H^+})$ is 0 and 1 for L and LH$^+$, respectively. $\delta G^\circ (\mathrm{H^+})$ is usually taken from the literature where estimates vary between -265.8 and -268.6 kcal/mol.

Thus, Eq (1) can be rewritten as

$\Delta G^{\prime\circ}=G^{\prime\circ}(\overline{RL})-G^{\prime\circ}(\overline{L})-G^{\circ}(R)$

Since the electronic energy contribution to the standard free energy can be very large in magnitude this form is more easily evaluated

$G^{\prime\circ}(\overline{X})=G^{\prime\circ}({X_0})-RT\ln \left( 1+\sum_{i\ne0} \exp{(-G^{\prime\circ}(X_i)/RT)} \right)$

where $X_0$ is some arbitrarily chosen reference protonation state, for example that for which $n(\mathrm{H^+})$ = 0. The sum can be combined with that over different conformations, discussed in a previous post.

**Other ions**

The buffers that are commonly used to regulate the pH also contain other ions, such as Na$^+$, Mg$^{2+}$, Cl$^-$. At high ion concentrations, it is possible that these ions bind at certain sites in the ligand, receptor, or ligand-receptor complex with sufficient probability that they must be included in the thermodynamics. If so the exact same equations and considerations outlined above for H$^+$ also apply to, e.g. Cl$^-$ and pCl$^-$ (computed from the specified buffer concentration) is used instead of pH.

This work is licensed under a Creative Commons Attribution 4.0