## Saturday, December 27, 2014

### Predicting binding free energies with electronic structure theory: thermodynamic considerations - Part 1

2015.01.25:  I have summarized discussion of this and related issues in this paper.  I have also made some changes in the post to reflect what I have learned since I wrote it.

Predicting absolute binding free energies of biologically relevant molecules (in aqueous solution at pH 7 using electronic structure theory without empirical adjustments to within 1 kcal/mol) is one of the holy grails of computational chemistry. Recent work by Grimme and co-workers (here and here) have come close with mean absolute deviations of about 2 kcal/mol for host-guest complexes. This and other work has shed some light on why it is so difficult to predict binding free energies in aqueous solution, which I will discuss here and in subsequent posts. In addition I'll talk about further improvements that could potentially increase the accuracy further.

The approach is based on the following equation

$\Delta G^\circ = \Delta E + \Delta G^\circ_{\mathrm{RRHO}}+\Delta \delta G_{\mathrm{solv}}$

where $\Delta E$ is the change in electronic energy and $\Delta G^\circ_{\mathrm{RRHO}}$ the change in the translational, rotational, and vibrational free energy computed using the rigid-rotor and harmonic oscillator approximation, respectively. The vibrational frequencies are computed at a lower level of theory compared to that used to compute $\Delta E$. $\Delta \delta G_{\mathrm{solv}}$ is the change in solvation free energy computed using a polarizable continuum model such as COSMO.

Electronic energy
Grimme has shown that dispersion typically makes a very big (>10 kcal/mol) contribution to binding free energies of host-guest complexes. Dispersion corrections are therefore a must if DFT is used to compute the electronic binding energy. Furthermore, Grimme has shown that three-body dispersion makes a non-negligible (2-3 kcal/mol) contribution to the electronic binding energy.  For convergent methods this effect is only included in rather expensive methods that involve triple-excitations such as MP4 and CCSD(T).

Molecular Thermodynamics
The translational, rotational and vibrational thermodynamic contribution to the binding free energy is very large (>10 kcal/mol) and must be included for accurate results.  Some years ago there was a bit of confusion in the literature about whether the RRHO approach was appropriate for condenses phase systems, but Zhou and Gilson have clarified this beautifully.

Standard state.  Most electronic structure codes compute the RRHO energy corrections for an ideal gas, where the standard state is a pressure of 1 bar.  The standard state for solution is 1 M, so the free energy of a molecule X must be corrected accordingly

$G^\circ_{\mathrm{RRHO}}(\mathrm{X}) = G^{\circ,\mathrm{1 bar}}_{\mathrm{RRHO}}(\mathrm{X})-RT\ln (V^{-1})$

where $V$ is the volume of an ideal gas at 1 bar and temperature $T$. At 298K $-RT\ln (V^{-1})$ = 1.90 kcal/mol.  This correction is already included in the solvation energy $\delta G_{\mathrm{solv}}$

Symmetry. Many host molecules and some guest molecules are symmetric, which affects the rotational entropy through the symmetry number ($\sigma$) which is a function of the point group.

$S_{rot}=R\ln\left(\frac{8\pi^2}{\sigma}\left(\frac{2\pi ekT}{h^2}\right)^{3/2}\sqrt{I_1I_2I_3}\right)$

It can be very difficult to build large molecules with the correct point group and most studies use $C_1$ symmetry.  In this case the effect of symmetry must be added manually to the free energy

$G^\circ_{\mathrm{RRHO}}(\mathrm{X}) \rightarrow G^\circ_{\mathrm{RRHO}}(\mathrm{X})+RT \ln(\sigma_X)$

As an example, the popular host molecule corcubit[7] has $D_{7h}$ symmetry and a corresponding $\sigma$ value of 14, in which case the correction contributes 1.56 kcal/mol to the free energy at 298K.

Anharmonicity and low frequency modes. Host-guest complexes can exhibit very low frequency vibrations on the order of 50 cm$^{-1}$ or less, which tend to dominate the vibrational entropy contribution.  Grimme (and many others) have questioned whether the harmonic approximation is valid for such low frequency modes and this is an open research question.  The main problem is that it is very difficult to compute the vibrational entropy exactly.  Most methods for computing anharmonic vibrational are developed to obtain the 2 or 3 lowest energy states, but for very low frequency modes 10-20 states likely significantly populated at room temperature and therefore contribute to the entropy.

In the absence of theoretical benchmarks, comparison to experiment can prove constructive. Kjærgaard and co-workers have recently measured standard binding free energies for small gas phase compounds and compared them to CCSD(T)/aug-cc-pV(T+d) calculations.  For example, in the case of acetronitrile-HCl the measured binding free energy at 295K is between 1.2 and 1.9 kcal/mol, while the predicted value is 2.3 kcal/mol using the harmonic approximation.  Since the errors in $\Delta E$ and the rigid-rotor approximation presumably are quite low, this suggest and error in the vibrational free energy of at most 1.1 kcal/mol, despite the fact that the lowest vibrational frequency is only about 30 cm$^{-1}$.  Furthermore, the error can be reduced by 0.4 kcal/mol by scaling the harmonic frequencies by anharmonic scaling factors suggested by Shields and co-workers.  Similar results were found for dimethylsulfide-HCl.  So there are some indications that the harmonic approximation yields free energy corrections that are reasonable and that can be improved upon by relatively minor corrections.

Grimme has taken a different approach by arguing that low-frequency modes resemble free rotations and using the corresponding entropy term for low frequency modes.  This changes the RRHO free energy correction by 0.5 - 4 kcal/mol, depending on the system.

Imaginary frequencies. Low frequencies are especially susceptible to numerical error and it is not unusual to see 1 or 2 imaginary frequencies of low magnitude in a vibrational analysis of a host-guest complex.  Since imaginary frequencies are excluded from the vibrational free energy this effectively removes 1 or 2 low frequency contributions to the vibrational free energy. For example, a 30 cm$^{-1}$ frequency contributes about 1.7 kcal/mol to the free energy at 298K.

Imaginary frequencies resulting from a flat PES and numerical errors can often be removed by making the convergence criteria for the geometry optimization and electronic energy minimization more stringent and making the grid size finer in the case of DFT calculations. If the Hessian is computed using finite difference it is important to use double-differencing.  If all else fails, it is probably better to pretend that the imaginary frequency is real and add the corresponding vibrational free energy contribution.

Conformations. One of the main problems in computing accurate binding free energies is to identify the structures of the host, guest and (especially) the host-guest complex with the lowest free energy. Because both the RRHO and solvation energy contributions contribute greatly to the binding free energy change, simply finding the structure with the lowest electronic energy and computing the free energy only for that conformation is probably not enough.

The free energy for a molecule (X) with N conformations the standard free energy is

$G^\circ(\mathrm{X})= G^\circ _0(\mathrm{X})-RT\ln\left(\sum^N_{i=1} \left( 1+\exp{\left( -(G^\circ _i- G^\circ _0)/RT\right)} \right)\right)$

where $G^\circ _0(\mathrm{X})$ is the conformation with lowest free energy.  Conformations with free energies higher than 1.36 kcal/mol contribute less than 0.1 to the sum at 298K.  So a significant number of very low free energy structures is needed to make even a 0.5 kcal/mol contribution to the free energy.  Conformations related by symmetry should not be included here as their effects are accounted for in the rotational entropy (see above).  Also, conformationally flexible regions of the ligand or host that are unaffected by binding need not be explored since the effect on the binding free energy will cancel.