## Friday, August 28, 2009

### Get a half-life: From transition state to rate constant

[source]

In a previous posts I showed how to find the TS for hydrolysis of 3 at the RHF/3-21G level of theory. In this post I show how to compare the computed results with the experimentally observed half-life (t1/2) of 88 hours.

The half-life is connected to the the rate constant for the reaction (k) by (1)

The rate constant can be estimated from the activation free energy (ΔG) by transition state theory: (2)

(here the units are those of a unimolecular reaction). The activation free energy is the free energy of the TS relative to the reactant

ΔG = G(3TS) - G(3) = ΔE+ ΔGTRV

Which can be rewritten as the difference in electronic energy and the free energy correction due to translation, rotation, and vibration. I showed how to extract the RHF/3-21G electronic energy (-413.5161 Hartrees) and free energy correction (81.841 kcal/mol) for the 3TSa in a previous post.

In the screencast at the end of the post I show how to obtain the corresponding values (-413.6014 Hartrees and 82.889 kcal/mol) for 3. So,

ΔG(3TSa) = (-413.5161 - (-413.6014))*627.51 + (81.841 - 82.889)
= 52.5 kcal/mol

You can get the corresponding experimental number from the half life (88 hours) and equations (1) and (2): 25.1 kcal/mol and, yep, the RHF/3-21G activation free energy is way off. The corresponding barrier for 3Tsb (51.0 kcal/mol) is only a little better.

There could be many reasons why the activation free energy is so severely overestimated. Two of the most likely suspects are:

1) Level of theory. TS structures, and therefore the underlying wavefunctions, tend to be more complicated than the minima structures. Thus, larger basis sets and electron correlation methods may be needed to obtain accurate activation free energies. This tends to lower the energy of the TS more than the minima, thus lowering the barrier compared to lower levels, but this is not always the case.

2) A different reaction mechanism. What might seem the most straightforward path from reactants to products is not always the one with the lowest barrier. For example, in the case of amide hydrolysis a stepwise mechanism is also possible in which the hydroxyl hydrogen is transferred to the carbonyl O atom to create a (chiral) intermediate.

TSs corresponding to such a mechanism must also be found to see if they lead to lower activation free energies. If you followed the last few posts of finding TSs you now have all the skills needed to do that.

Here is the screencast that shows how to find the energies for 3:

I paste in the following set of keywords

\$contrl runtyp=optimize \$end
\$system mwords=125 \$end
\$statpt opttol=0.0005 nstep=50 hssend=.t. \$end
\$force nvib=2 \$end
\$contrl nzvar=1 \$end
\$zmat dlc=.t. auto=.t. \$end

In the RHF/3-21G optimization + frequency run I forgot to add

\$scf dirscf=.t. \$end

which would have speeded up the calculation, as I described in a previous post.

## Thursday, August 20, 2009

### GAMESS, memory and parallel

The screencast above is a repeat from the last post, where I computed the frequencies of a molecule at the RHF/3-21G level of theory (second screencast) and discovered that the amount of memory that GAMESS requests (1,000,000 words) was not enough. While GAMESS tells me how much it needs in the output file, it does so after a lot of computation, which is wasted because I have to start over.

In this post I discuss the basics of GAMESS memory requirements. The simplest case is when you are not running in parallel, i.e. the number of processors in GAMESSQ is set to 1, so I discuss this case first.

1. Memory is specified in \$system, and the default is 1 mega-word (i.e. 1,000,000 words):

\$system mwords=1 \$end

mword can only be integers (1, 2, 3, ..). (In the screencast above I used the older "memory" keyword but mwords is much more convenient.)

2. A word is 8 bytes, so the default memory request (1 mega-word) is a very modest 8 MB of RAM.

3. This is the maximum amount of memory GAMESS is allowed to use. Depending on the type of calculation GAMESS might use less. The option is there to avoid filling up the memory on the computer entirely, which will crash or freeze the computer.

4. Here is the simplest way to deal with memory: My current laptop computer has 2 GB of RAM, and I use it for other things while GAMESS is running, so I am willing to give GAMESS a maximum of roughly 1 GB of RAM. This translates to

1 GB ≈ 1,000 MB ≈ 125 mwords

\$system mwords=125 \$end
to all input files will allow me to run most GAMESS jobs that I would want to run on a laptop without ever getting a "not enough memory" error (of course if you have less memory you need to adjust mwords accordingly).

That's basically it. What follows is some more details that most casual users of GAMESS won't have to worry about.

6. Most common GAMESS runs will never get close to using 1 GB of RAM, meaning the memory is free to be used by other programs. The most common types of runs that potentially could use that much memory are frequency calculations using RHF and any kind of MP2 calculation. Slightly less common ones are TDDFT and NMR calculations. In GAMESS, DFT frequency calculations are done numerically, and do not require a lot of memory.

7. If you are in doubt whether you have reqeusted enough memory to perform a calculation it is possible to use GAMESS to check using

\$contrl exetyp=check \$end

This keyword will simulate an actual GAMESS run by skipping the most time consuming steps so it is very fast.

I show two examples (an RHF/3-21G frequency and MP2/6-31G(d) single calculations on 3TSa) in this screencast.

Parallel runs

8. Many desktop computers (and almost all larger computers) have more than one CPU (also known as cores). For example, my current laptop has one processor with 2 cores. Thus, I could make my GAMESS calculations go faster by specifying 2 processors when submitting with GAMESSQ (processor = core in GAMESS-speak). This means that two separate but related GAMESS calculations are running simultaneously, and this affects the memory request:

9. Most types of runs use replicated memory. This means that if mwords=1 but I ask for 2 processors, GAMESS will use a maximum of 2 mega-words. Thus, if I routinely use 2 processors when running on my laptop and want to impose a limit of 1 GB RAM, I should use

\$system mwords=63 \$end

10. The most common exception to this is MP2 calculations, where GAMESS uses both replicated (mwords) and distributed memory (memddi). Distributed memory is memory shared by the cores. If I specify memddi=100 and ask for 2 processors then 100 mega-words of RAM is distributed among the 2 processors (the simplest case is that each core gets 50 mega-words, but GAMESS figures that out for you).

So running on 2 processors with

\$system mwords=15 memddi=100 \$end
requests a total of (15 + 15 + 100 =) 130 mwords. You can use exetyp=check to figure out the optimum values of memory and memddi, as I show in this screencast.

The screencast shows how you can use 1 processor and

\$contrl exetyp=check \$end
\$system memddi=100 parall=.t. \$end

to check the memddi requirements. You need to make memddi large for this to work, but GAMESS will not use this memory during the check-run. Note that parall only "does something" for check runs: true parallel runs are specified by choosing more than 1 processors when you submit GAMESS.

You can get a complete list of runs that require memddi in Chapter 2 of the GAMESS documentation under the entry for \$system. Also, Chapter 5 has a more in-depth description of how memory is handled.

## Friday, August 14, 2009

### Finding a transition state: amide hydrolysis 2

In a previous post I found two TSs, called 3TSa and 3TSb, at the PM3 level of theory.

In this post I show how the PM3 structure and frequency information can be used as a starting point for finding 3TSa at the RHF/3-21G level of theory.

I use RHF/3-21G as an example of an expensive ab initio method where we want to keep the number of frequency calculation to a minimum, since it takes a long time to compute.

So I extract the PM3 optimized geometry from the output file of the TS search, and add to this the corresponding frequency information in the form of the \$HESS group from the .dat file.

The keywords I paste in look like this (many of these keywords have been described in a previous post):

\$scf dirscf=.t. \$end
\$statpt opttol=0.0005 nstep=50 \$end
\$contrl nzvar=1 \$end
\$zmat dlc=.t. auto=.t. \$end

Note the \$basis group is missing, but that I specify it using Avogadro. It's important to pick the basis set before pasting the remaining keywords in because Avogadro deletes some of the lines in the file when you select something in the menu.

Based on the PM3 guess, GAMESS finds the RHF/3-21G TS in 29 steps.

Actually, I can't be 100% sure the structure is a TS because I haven't verified that the structure has an imaginary frequency. Unlike with PM3 I don't automatically compute the frequencies at the end of the TS search (using hssend=.t.).

The reason is that it is a good idea to verify that the structure looks like a TS before committing CPU time to an expensive frequency calculations. As I showed in a previous post it can happen that TS searches end up find the reactants and products.

Luckily the TS structure has one, and only one imaginary frequency as you can see in this screencast

As you saw the default amount of memory that GAMESS requests (1,000,000 words) is insufficient for the RHF/3-21G frequency calculation, but it tells you how much it needs (~1,100,000 words). A "word" is 8 bytes, so we are talking about 10 MB here. I will discuss this issue in more details in a future post.

Finally, in principle it is possible to adjust the memory keyword in Avogadro (in the Advanced setup), but that option is currently not working.

Now we need to repeat this procedure for 3TSb. The procedure are exactly the same, so I haven't made a screencast. The TS search takes 36 steps.

## Monday, August 10, 2009

### GAMESSQ for everyone

When I made the post about GAMESSQ, I had no idea it was only being distributed for the Mac (together with the Mac binary) - I usually don't make posts on software that is not cross-platform.

However, now Brett Bode has made a general release of GAMESSQ. Here you can find a Windows executable and the source code, the later being the only current option for Linux users.

It is not inconceivable that one or two Linux binaries will be available soon. What flavor if Linux would be most useful to MolModBasic readers?

## Sunday, August 9, 2009

### Finding a transition state: amide hydrolysis I

In a previous post I showed how to find a TS for an Sn2 reaction involving a small molecule (it's a good idea to look at this post now if you haven't seen it before, and check out this post as well).

In this and future posts I discuss the steps involved in finding a TS for a reaction involving complicated molecules shown in the figure above (I have shown how to build some of the reactants in a previous post).

Say you want to find the TS for the hydrolysis of 4 and 5 at a relatively high level of theory. My advice is to start by finding the TS at a low level of theory (PM3) for a smaller representative reaction (hydrolysis of 3). This is the subject of today's post.

The basic procedure is the same as in the previous post:

1. I build a TS-like structure by constraining a key distance during an energy minimization.
2. I check whether the optimized geometry has an imaginary frequency that looks like the reaction coordinate
3. If it does, I use it as a starting point for the TS search.

In the previous example, one bond broke as the other one formed, so I simply forced one bond (the C-F bond) to form by constraining it to 2.0 Å and the optimization took care of the rest.

In the hydrolysis of 3, two bonds have to break (the C-N and O-H bonds) as two other bonds form (the C-O and N-H bonds), so what to constrain? In general I have found it a good idea to try to construct an intermediate structure, such as "3TSa" shown below.
Here the O-H bond is already broken and the C-O and N-H bonds are already formed. The question is now reduced to whether I should try to break the C-O or the C-N bond. I opted for the C-O bond which I start breaking by constraining it to 2.0 Å. The reason for choosing the C-O bond is that I assumed that the (potential) energy of 3 is lower than the products (because of ring strain) so the TS should be more product-like according to Hammond's postulate (you can of course check this computationally).

Why do I pick 2.0 Å? Well, you have to start somewhere and this is a nice round number (for bonds involving 3rd row atoms I would use 2.5 Å). As you see in the screencast below, 2.0 Å doesn't work but the TS search tells me how to change it, i.e. increase it to make it more reactant like as the TS search goes to products.

As you saw in the screencast I have pre-typed some text with keywords that I use again and again, and simply paste the text in the input file and alter it when necessary. For the constrained optimization the text looks like this:

\$contrl runtyp=optimize \$end
\$basis gbasis=pm3 \$end
\$statpt opttol=0.0005 nstep=50 hssend=.t. \$end
\$force nvib=2 \$end
\$contrl nzvar=1 \$end
\$zmat dlc=.t. auto=.t. ifzmat(1)=1,x,x fvalue(1)=2.00 \$end

and for the TS search the text looks like this.

\$basis gbasis=pm3 \$end
\$statpt opttol=0.0005 nstep=50 \$end
\$statpt hess=calc ihrep=5 hssend=.t. \$end
\$force nvib=2 \$end
\$contrl nzvar=1 \$end
\$zmat dlc=.t. auto=.t. \$end

You might also notice that I am being a "CPU slob" in the sense that I redo some computations I've already done. For example, I compute the frequencies for the optimized constrained geometry twice: at the end of the optimization and at the beginning of the TS search. That is because I am using PM3, which is such an efficient method that the computer can re-compute the freqeuncies in less time than it takes me to transfer the \$HESS group to the TS search input file (as shown in a previous screencast)! Similarly, when I find that 50 steps is insufficient and just restart with nstep=100 steps, which recomputes the first 50 steps. Good thing sloth is not a deadly sin.

Another advantage of constructing a hypothetical intermediate in this case is that you realize (hopefully) that the TS contains a chiral center, even though the reactants and products are achiral. So you have to look for two TSs, to see which one is lowest in energy.

The next screencast shows how I find 3TSb, which is made easier by the fact that I have a good idea of what the constraint value should be (2.32 Å) from the 3TSa geometry I found previously.

## Sunday, August 2, 2009

### Finding a transition state: an Sn2 reaction

Here is a screencast I made about finding a transition state for the reaction

F- + CH3Cl -> Cl- + CH3F

The general approach is as follows

1. Build the structure (F- + CH3Cl)
2. Optimize the geometry while constraining a key distance (between the F and C atom) to a certain value (2.0 Å)
3. Compute the frequencies for the optimized geometry to check for an imaginary frequency
4. If there is an imaginary frequency, use this geometry and frequency information as a starting point for the TS search

The distance constraint is imposed using

\$zmat ifzmat(1)=1,1,6 fvalue(1)=2.0 \$end

which is used in conjunction with the delocalized internal coordinates I discussed previously. 1,1,6 defines the distance (denoted with the first "1") between atoms 1 and 6. Note that this distance does not have be 2.0 Å in the guess geometry (though the closer it is to 2.0 Å the faster the optimization will converge).

I also show how the frequency information contained in the \$HESS group, found at the end of the .dat file that GAMESS produces along with the output file, can be transferred to the input file for the TS search, where hess=read is used in \$statpt. Any text editor can be used for this. Note that the hssend=.t. keyword has to be used to cause the frequencies to be computed at the end of the optimization. I have showed the use of this keyword in a screencast in a previous post.

The complete \$statpt groups looks like this

\$statpt opttol=0.0005 nstep=50 hess=read ihrep=5 hssend=.t. \$end

ihrep=5 causes the frequencies to be recomputed at every 5th step. This really helps convergence but is computationally expensive.

Note that there are no constraints imposed when searching for the TS.

The key question is of course why I chose the C-F distance and 2.0 Å for the constraint value. This is simply the first thing that popped into my head and it happened to work. If it hadn't, I would have tried some other values and, if none worked, I would have tried to the C-Cl distance. Future posts will show examples of trying difference distances.

Because you may have to try different constraints before succeeding, it is a good idea to find the TS at a cheap level of theory, such as PM3. The C-F distance in the PM3 TS structure (2.10 Å) should provide a good guess for TS searches at higher levels of theory.

I tried it for this reaction using RHF/3-21G, but with no luck. That is because I forgot the very first step in searching for a TS: make sure there is one! In the screencast below I show how a RHF/3-21G geometry optimization (with no constraints) starting from F- + CH3Cl leads to Cl- + CH3F, meaning that the former structure is not a minimum on the gas phase RHF/3-21G potential energy surface. So there is not much point in looking for the corresponding TS.