## Tuesday, December 1, 2015

### Computational Chemistry Highlights: November issue

The November issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach, Christoph Jacob and Jan Jensen:

Enhancing NMR Prediction for Organic Compounds Using Molecular Dynamics

Tetrabenzo[a,f,j,o]perylene: A Polycyclic Aromatic Hydrocarbon With An Open-Shell Singlet Biradical Ground State

Molecular Dynamics of the Diels–Alder Reactions of Tetrazines with Alkenes and N2 Extrusions from Adducts

Dual-Cavity Basket Promotes Encapsulation in Water in an Allosteric Fashion

Switching between Aromatic and Antiaromatic 1,3-Phenylene-Strapped [26]- and [28]Hexaphyrins upon Passage to the Singlet Excited State

A Local Variant of the Conductor-Like Screening Model for Fragment-Based Electronic-Structure Methods

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily

## Sunday, November 22, 2015

### Quick way to share interactive molecular structures and animations

Here is a neat trick I learned from @gyges01 on Twitter.  Say you want to share an interactive molecular structure or animation with someone not well versed in structural visualization tools. Here's what you do.

1. Put your file on a server like figshare and copy the link. For example: http://files.figshare.com/1860026/ethen_01oIRC.cml
Unfortunately, a dropbox link, which would have been even easier, does not seem to work. See below

2. Combine this link with "http://chemapps.stolaf.edu/jmol/jmol.php?source=":
http://chemapps.stolaf.edu/jmol/jmol.php?source=http://files.figshare.com/1860026/ethen_01oIRC.cml

3. Email (or Tweet) the link (takes a minute to load) with instructions to start animation: right-click > animation > play

4. You can autostart the animation by uploading a "dummy" xyz file with the commands: http://chemapps.stolaf.edu/jmol/jmol.php?source=http://files.figshare.com/2367420/ethen_test.xyz

ethen_test.xyz looks like this:

1
jmolscript: load http://files.figshare.com/1860026/ethen_01oIRC.cml; animation on; animation mode palindrome;
h 0.0 0.0 0.0

2015.11.23 Update Twitter strikes again: @aminophen found the trick to making this work with Dropbox.  Change https://www.dropbox.com/s/xxx/file?dl=0 to http://dl.dropboxusercontent.com/s/xxx/file.  Example:
https://www.dropbox.com/s/t42dx22mxsioja6/ethen_test.xyz?dl=0 to http://dl.dropboxusercontent.com/s/t42dx22mxsioja6/ethen_test.xyz and use it like before: http://chemapps.stolaf.edu/jmol/jmol.php?source=http://dl.dropboxusercontent.com/s/t42dx22mxsioja6/ethen_test.xyz

## Sunday, November 1, 2015

### Computational Chemistry Highlights: October issue

The October issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach, David Bowler and Jan Jensen:

Stochastic Multiconfigurational Self-Consistent Field Theory

The accuracy of semi-local functionals

Interested in more? There are many ways to subscribe to CCH updates.

Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily

## Thursday, October 1, 2015

### Computational Chemistry Highlights: September issue

The September issue of Computational Chemistry Highlights is out.

CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here.

Table of content for this issue features contributions from CCH editors Steven Bachrach and Jan Jensen:

Experimental Protein Structure Verification by Scoring with a Single, Unassigned NMR Spectrum

## Sunday, September 27, 2015

### Automatic data extraction from multiple GAMESS output files: an example

Jimmy ran many PM6-D3H+ geometry optimizations using different starting geometries.  Typing "grep EQUILIBRIUM *.log" showed that they had all converged, but how to extract the heat of formation on the optimized geometry?  This python program does the trick.  It finds the last heat of formation in each file or, more precisely, it finds the second to last "word" in the last line where the first word is HEAT.

Update 2015.10.03.
1. Apologies to +Jimmy Charnley Kromann  for the poor formulation.  Jimmy ran the calculations, I wrote the script. After the post came online he was immediately teased mercilessly by his so-called friends for writing such crappy code. That's on me.

2. As +Anders Steen Christensen pointed out in the comments this can be done in one line using Bash: "for f in *.log; do tac $f | grep -m1 HEAT; done" 3. If you want to do it in Fortran +Lars Bratholm suggested the following code This work is licensed under a Creative Commons Attribution 4.0 ## Tuesday, September 1, 2015 ### Computational Chemistry Highlights: August issue The August issue of Computational Chemistry Highlights is out. CCH is an overlay journal that identifies the most important papers in computational and theoretical chemistry published in the last 1-2 years. CCH is not affiliated with any publisher: it is a free resource run by scientists for scientists. You can read more about it here. Table of content for this issue features contributions from CCH editors Steven Bachrach and Jan Jensen: AFNMR: automated fragmentation quantum mechanical calculation of NMR chemical shifts for biomolecules Benzene-Fused Azacorannulene Bearing an Internal Nitrogen Atom Interested in more? There are many ways to subscribe to CCH updates. Also, for your daily computational chemistry fix subscribe to Computational Chemistry Daily This work is licensed under a Creative Commons Attribution 4.0 ## Sunday, August 9, 2015 ### A very brief introduction to the electron correlation energy RHF is often not accurate enough for predicting the change in energies due to a chemical reaction, no matter how big a basis set we use. The reason is the error due to the molecular orbital approximation $$\Psi ({{\bf{r}}_1},{{\bf{r}}_2}, \ldots {{\bf{r}}_N}) \approx \left| {{\phi _1}(1){{\bar \phi }_1}(2) \ldots {\phi _{N/2}}(N - 1){{\bar \phi }_{N/2}}(N)} \right\rangle \equiv \Phi$$ and the energy difference due to this approximation is known as the correlation energy. Just like we improve the LCAO approximation by including more terms in an expansion, we can improve the orbital approximation by an expansion, in terms of Slater determinants $$\Psi ({{\bf{r}}_1},{{\bf{r}}_2}, \ldots {{\bf{r}}_N}) \approx \sum\limits_{i = 1}^L {{C_i}{\Phi _i}({{\bf{r}}_1},{{\bf{r}}_2}, \ldots {{\bf{r}}_N})}$$ The “basis set” of Slater determinants$\{\Phi_i \}$is generated by first computing an RHF wave function$\{\Phi_0 \}$as usual, which also generates a lot of virtual orbitals, and then generating other determinants with these orbitals. For example, for an atom or molecule with two electrons the RHF wave function is$\left| {{\phi _1}{{\bar \phi }_1}} \right\rangle $and we have$K-1$virtual orbitals (${\phi _2}, \ldots ,{\phi _K}$, where$K$is the number of basis functions), which can be used to make other Slater determinants like$\Phi _1^2 = \left| {{\phi _1}{{\bar \phi }_2}} \right\rangle $and$\Phi _{11}^{22} = \left| {{\phi _2}{{\bar \phi }_2}} \right\rangle $(Figure 1). Figure 1. Schematic representation of the electronic structure of some of the determinants used in Equation 3 Conceptually (in analogy to spectroscopy), an electron is excited from an occupied to a virtual orbital:$\left| {{\phi _1}{{\bar \phi }_2}} \right\rangle$represents a single excitation and$\left| {{\phi _2}{{\bar \phi }_2}} \right\rangle $a double excitation. For systems with more than two electrons higher excitations (like triple and quadruple excitations) are also possible. In general $$\Psi \approx {C_0}{\Phi _0} + \sum\limits_a {\sum\limits_r {C_a^r\Phi _a^r} } + \sum\limits_a {\sum\limits_b {\sum\limits_r {\sum\limits_s {C_{ab}^{rs}\Phi _{ab}^{rs}} } } } + \ldots$$ The expansion coefficients can be found using the variational principle $$\frac{{\partial E}}{{\partial {C_i}}} = 0 \ \textrm{for all} \ i$$ and this approach is called configuration interaction (CI). The more excitations we include (i.e. increase L in Eq 2.12.1) the more accurate the expansion and the resulting energy becomes. If the expansion includes all possible excitations (known as a full CI, FCI) then we have a numerically exact wave function for the particular basis set, and if we use a basis set where the HF limit is reached then we have a numerically exact solution to the electronic Schrödinger equation! That’s the good news … The bad news is that the FCI “basis set of determinants” is much, much larger than the LCAO basis set (i.e.$L >> K$), $$L = \frac{{K!}}{{N!(K - N)!}}$$ where$N$is the number of electrons. Thus, an RHF/6-31G(d,p) calculation on water involves 24 basis functions and roughly$\tfrac{1}{8}K^4$= 42,000 2-electron integrals but a corresponding FCI/6-31G(d) calculation involves nearly 2,000,000 Slater determinants. Just like finding the LCAO coefficients involves the diagonalization of the Fock matrix, finding the CI coefficients (Ci) and the lowest energy also involves a matrix diagonalization. $$\bf{E} = {{\bf{C}}^t}{\bf{HC}}$$ where$\bf{E}$is a diagonal matrix whose smallest value ($E_0$) corresponds to the variational energy minimum. While the Fock matrix is a$K \times K$matrix, the CI Hamiltonian ($\bf{H}$) is an$L \times L$matrix. Just holding the 2 million by 2 million matrix for the water molecule using the 6-31G(d,p) basis set requires millions of gigabites! Clever programming and large computers actually makes a FCI/6-31G(d,p) calculation on$\ce{H2O}$possible, but FCI is clearly not a routine molecular modeling tool. Using, for example, only single excitations (called CI singles, CIS) $${\Psi ^{CIS}} = {C_0}{\Phi _0} + \sum\limits_a {\sum\limits_r {C_a^r\Phi _a^r} }$$ is feasible, however is doesn’t result in any improvement. The CIS Hamiltonian has three kinds of contributions $$\begin{split} & \langle \Phi _0\left| {\hat H} \right| \Phi_0 \rangle = E_{RHF}\\ \langle\Phi^{CIS}\left| {\hat H} \right| \Phi^{CIS} \rangle \rightarrow & \langle \Phi _0\left| {\hat H} \right| \Phi_a^r \rangle = F_{ar} = 0 \\ & \langle \Phi _a^r\left| {\hat H} \right| \Phi_a^r \rangle \end{split}$$ which means that when this matrix is diagonalized$E_0=E_{RHF}$. Thus CIS does not give us any correlation energy. However, CIS is not completely useless. The second lowest value of$\bf{E}$,$E_1$, represents the energy of the first excited state, at roughly an RHF quality. Thus, we need at least single and double excitations (CISD) $$\Psi \approx {C_0}{\Phi _0} + \sum\limits_a {\sum\limits_r {C_a^r\Phi _a^r} } + \sum\limits_a {\sum\limits_b {\sum\limits_r {\sum\limits_s {C_{ab}^{rs}\Phi _{ab}^{rs}} } } }$$ to get any correlation energy. However, in general including doubles already results in an$\bf{H}$matrix that is impractically large for a matrix diagonalization. CI, i.e. finding the$C_i$coefficients using the variational principle, is therefore rarely used to compute the correlation energy. Perhaps the most popular means of finding the$C_i$’s is by perturbation theory, a standard mathematical technique in physics to compute corrections to a reference state (in this case RHF). Perturbation theory using this reference is called Møller-Plesset pertubation theory, and there are several successively more accurate and more expensive variants: MP2 (which includes some double excitations), MP3 (more double excitations than MP2), and MP4 (single, double, triple, and some quadruple excitations). Another approach is called coupled cluster which has a similar hierarchy of methods, such as CCSD (singles and doubles) and CCSD(T) (CCSD plus an estimate of the triples contributions). In terms of accuracy vs expense, MP2 is the best choice of a cheap correlation method, followed by CCSD, and CCSD(T). For example, MP4 is not too much cheaper than CCSD(T), but the latter is much more accurate. In fact for many practical purposes it is rarely necessary to go beyond CCSD(T) in terms of accuracy, provided a triple-zeta or higher basis set it used. However, CCSD(T) is usually too computationally demanding for molecules with more than 10 non-hydrogen atoms. In general, the computational expense of these correlated methods scale much worse than RHF with respect to basis set size: MP2 ($K^5$), CCSD ($K^6$), and CCSD(T) ($K^7$). These methods also require a significant amount of computer memory, compared to RHF, which is often the practical limitation of these post-HF methods. Finally, it should be noted that all these calculations also imply an RHF calculation as the first step. In conclusion we now have ways of systematically improving the wave function, and hence the energy, by increasing the number of basis functions ($K$) and the number of excitations ($L\$) as shown in Figure 2.

Figure 2 Schematic representation of the increase in accuracy due to using better correlation methods and larger basis sets.

The most important implication of this is that in principle it is possible to check the accuracy of a given level of theory without comparison to experiment!  If going to a better correlation method or a bigger basis set does not change the answer appreciably, then we have a genuine prediction with only the charges and masses of the particles involved as empirical input.  These kinds of calculations are therefore known as ab initio or first principle calculations.  In practice, different properties will converge at different rates, so it is better to monitor the convergence of the property you are actually interested in, than the total energy.  For example, energy differences (e.g. between two conformers) converge earlier than the molecular energies. Furthermore, the molecular structure (bond lengths and angles) tends to converge faster than the energy difference.  So it is common to optimize the geometry at a low level of theory [e.g. RHF/6-31G(d)] followed by an energy computation (a single point energy) at a higher level of theory [e.g. MP2/6-311+G(2d,p)].  This level of theory would be denoted MP2/6-311+G(2d,p)//RHF/6-31G(d).

Finally, the correlation energy is not just a fine-tuning of the RHF result but introduces an important intermolecular force called the dispersion energy.  The dispersion energy (also known as the induced dipole-induced dipole interaction) is a result of the simultaneous excitation of at least two electrons and is not accounted for in the RHF energy.  For example, the stacked orientation of base pairs in DNA is largely a result of dispersion interactions and cannot be predicted using RHF.