## Tuesday, June 5, 2012

### Complete basis set limit extrapolation

The complete basis set (CBS) limit is not a basis set, though it is often written as one, e.g. B3LYP/CBS.  Instead, the CBS limit is an extrapolated estimate of the result that would be obtained using an infinitely large (complete) basis set.  In principle this procedure removes any error due to the linear combination of atomic orbitals approximation, and any remaining disagreement with experiment is due to some other approximation such as the treatment of correlation.  For many properties the CCSD(T)/CBS value can be regarded as numerically exact for all practical purposes, i.e. it is unlikely that any higher level of theory will predict significantly better results.

The extrapolation is based on a minimum of three separate calculations with increasingly large basis sets.  CBS limit extrapolation works only with basis sets designed specifically for the task, such as the correlation- or polarization-consistent basis sets, e.g. cc-pVxZ or pc-n.

The procedure is as follows: a given property $Y$ of interest (e.g. a relative energy, a frequency, or a bond length) is computed at a given level of theory (e.g. B3LYP) using at least three basis sets (e.g. cc-pVDZ, cc-pVTZ, and cc-pVQZ).  These data points are then fit to an equation; the two most popular equations are given here:

$Y(x)=Y_{CBS}+Ae^{-Bx}$  (1)

$Y(x)=Y_{CBS}+Ax^{-3}$    (2)

Here, $Y_{CBS}$ is the CBS limit we're after and $x$ is 2 for cc-pVDZ, 3 for cc-pVTZ, and so on. $x$ is also often written as $L_{max}$ (or $l_{max}$), which is the highest angular momentum included in the basis set.  For cc-pVDZ this means $d$ orbitals, which have an angular momentum of 2, so $x$ and $L_{max}$ are really the same.

Equation (1) contains three parameters ($Y_{CBS}$, $A$, and $B$) so a minimum of three different basis sets are needed to determine them.  While Equation (2) only has two parameters, a minimum of three data points are still needed for reliable results.
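As a rough illustration of the fitting step, here is a minimal Python sketch. It assumes the three basis sets have consecutive values of $x$ (e.g. 2, 3, 4), in which case Equation (1) can be solved in closed form, and it fits Equation (2) by ordinary least squares in the variable $x^{-3}$. The function names and the numbers are made up for the example (the data are synthetic, generated from a known limit), not taken from any real calculation.

```python
import math

def cbs_exponential(y_values):
    """Solve Eq. (1), Y(x) = Y_CBS + A*exp(-B*x), from three values
    computed at consecutive x (x, x+1, x+2). Returns (Y_CBS, B)."""
    y1, y2, y3 = y_values
    d1 = y1 - y2          # change from first to second basis set
    d2 = y2 - y3          # change from second to third basis set
    # For exponential convergence the ratio d2/d1 equals exp(-B), so 0 < d2/d1 < 1.
    if d1 == 0 or not (0 < d2 / d1 < 1):
        raise ValueError("data do not show monotonic exponential convergence")
    b = -math.log(d2 / d1)
    y_cbs = y3 - d2 * d2 / (d1 - d2)
    return y_cbs, b

def cbs_inverse_cubic(xs, ys):
    """Least-squares fit of Eq. (2), Y(x) = Y_CBS + A*x**-3.
    Y is linear in t = x**-3; the intercept is Y_CBS. Returns (Y_CBS, A)."""
    ts = [x ** -3 for x in xs]
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    s_tt = sum((t - t_mean) ** 2 for t in ts)
    s_ty = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    a = s_ty / s_tt
    return y_mean - a * t_mean, a

# Synthetic data with a known limit of -76.0 (arbitrary units), x = 2, 3, 4.
xs = [2, 3, 4]
ys_exp = [-76.0 + 0.5 * math.exp(-1.2 * x) for x in xs]
ys_cub = [-76.0 + 0.3 * x ** -3 for x in xs]

limit_exp, b_fit = cbs_exponential(ys_exp)
limit_cub, a_fit = cbs_inverse_cubic(xs, ys_cub)
```

With real data the two equations will generally give different limits, and the spread between them is one crude indication of how well converged the extrapolation is.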

For some properties and correlation methods the use of the double-zeta basis set does not lead to a good fit, so calculations with quintuple-zeta basis sets are necessary.  There is some evidence that the pc-n basis sets provide faster convergence to the CBS limit.  CBS limit extrapolation is computationally very demanding and is typically done on relatively small systems to provide benchmark values to test more efficient methods.

Acknowledgment: I thank Anders Christensen for providing me with key papers and with helpful discussion.

Unknown said...

For magnetic properties there are specialized basis sets, pcS-n for shielding constants and pcJ-n for spin-spin coupling constants, which offer some speedup compared to Dunning's cc-pVxZ type basis sets.

Unknown said...

Anders, when you say "speedup", is that in terms of wall time or in terms of how fast the results converge towards the complete basis set limit?

Geoff Hutchison said...

Casper, if you read the linked paper, you see that the # of steps is identical between cc-pV*Z and pc-n, but pc-n has fewer basis functions (esp. at 4Z and above) so the wall time will definitely be better.

andreew said...

Hi there!

How do you request B3LYP/CBS or CCSD(T)/CBS?

On Gaussian's website, CBS is not used in conjunction with a level of theory; it's a method all by itself like G3 or G4.

Are the pc-n basis sets any good for accurate thermochemistry?

Thanks!

Jan Jensen said...

I don't know if there is a way to request the CBS for a given method in Gaussian. Consider posting your question at ccl.net

There isn't in GAMESS. You have to do the calculations separately and do your own fitting.

I'm afraid I don't know whether pc-n is good for thermochemistry. Perhaps ccl.net again?

If you do get answers to this, a new comment here would be much appreciated.

G said...

Hi.

I am almost sure it is not possible to request a CBS extrapolation for a given method in Gaussian. But if you decide to do the extrapolation yourself, I think it is worth mentioning that the HF energy and the correlation energy converge to the CBS limit in different ways, so for correlated methods the two energies should be separated and extrapolated independently.

You are right, «CBS methods» are methods by themselves which use the type of extrapolation mentioned in the post.

Nice post and good day to everybody.

Jan Jensen said...

Good point about the HF and MP2 energies. I hadn't thought about that.

Glad to hear you liked the post

G said...

Just a note: Professor Jack Simons talks about this in lecture 5 (Lectures in electronic structure theory by Jack Simons, this blog).

By the way, thank you for the lectures. I have been learning a lot, even though I lake some fundamentals.

G said...

«lack» not «lake» rrrrrr

Anonymous said...

I've used the pc-n basis sets quite a lot for DFT calculations. They really do offer excellent convergence and hence accuracy for thermochemistry. I recommend the original papers by F. Jensen, which compare them to other basis sets. In my own small unpublished benchmarks for some small main-group molecules, the convergence is usually much more reliable with pc-2 (a balanced polarized triple-zeta basis set) than with 6-311G (with lots of polarization functions), cc-pVTZ, or even def2-TZVPP. The main problem has been that they have not been available for all atoms, but it seems that Frank Jensen is still developing them; a recent paper makes them available for the third main group row: dx.doi.org/10.1063/1.3690460

Btw, ORCA can do automatic extrapolation to the basis set limit for various basis set families. See chapter 6.1.3.4 in the current version 2.9 manual.

When estimating CCSD(T)/CBS energies, most people indeed do a separate extrapolation for HF with the cc-pVnZ family (although using the pc-n family would make sense as well) or just use a large-basis HF calculation (e.g. HF/5Z) as an estimate for HF/CBS, and then do a 2- or 3-point extrapolation of MP2 correlation energies using e.g. cc-pVTZ and cc-pVQZ. The CCSD(T) part of the energy is then estimated by calculating the difference (E-CCSD(T)/smallbasis - E-MP2/smallbasis) with a small basis set. This approach, made popular by Pavel Hobza and many others, allows one to estimate the CBS correlation energy with a relatively cheap method like MP2 instead of the very expensive CCSD(T) method.
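The bookkeeping of this composite scheme can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the HF/CBS energy is approximated by a single large-basis HF energy, the MP2 correlation energy is extrapolated with the two-point form of Equation (2) from the post using cc-pVTZ ($x=3$) and cc-pVQZ ($x=4$), and all function names and energies below are invented for the example.

```python
def two_point_inverse_cubic(x_small, e_small, x_large, e_large):
    """Two-point solution of Y(x) = Y_CBS + A*x**-3 (Eq. (2) in the post)."""
    w_small, w_large = x_small ** 3, x_large ** 3
    return (w_large * e_large - w_small * e_small) / (w_large - w_small)

def composite_ccsdt_cbs(e_hf_large, mp2_corr_tz, mp2_corr_qz,
                        e_ccsdt_small, e_mp2_small):
    """Estimate of E[CCSD(T)/CBS] as
    HF (large basis) + MP2 correlation (TZ/QZ extrapolated)
    + [E(CCSD(T)) - E(MP2)] in one and the same small basis."""
    mp2_corr_cbs = two_point_inverse_cubic(3, mp2_corr_tz, 4, mp2_corr_qz)
    delta_ccsdt = e_ccsdt_small - e_mp2_small   # higher-order correlation correction
    return e_hf_large + mp2_corr_cbs + delta_ccsdt

# Invented numbers (hartree), chosen so the MP2 correlation limit is exactly -0.30.
mp2_tz = -0.30 + 0.2 / 27
mp2_qz = -0.30 + 0.2 / 64
estimate = composite_ccsdt_cbs(
    e_hf_large=-76.06,
    mp2_corr_tz=mp2_tz,
    mp2_corr_qz=mp2_qz,
    e_ccsdt_small=-76.25,   # total CCSD(T) energy, small basis
    e_mp2_small=-76.23,     # total MP2 energy, same small basis
)
```

Note that the last two inputs are total energies in the same small basis, so the HF contribution cancels in their difference.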

Btw, recent papers have examined different extrapolation techniques for the correlation methods. See e.g. http://link.aip.org/link/doi/10.1063/1.3613639
See also an interesting paper from Frank Neese's group about replacing MP2 with CEPA as the cheap method in CCSD(T)/CBS estimations: http://pubs.acs.org/doi/abs/10.1021/jp302096v

Cheers, Ragnar Bjornsson

Jan Jensen said...

Excellent comment. Thanks very much!

Frank Jensen said...

Let me put a few extra cents into the pot.

If you (only) want the HF or DFT basis set limit results, you should use an exponential like eq. (1) at the top of the blog. It is a three-point formula, but the B-constant is close to 6 for all the cases I have seen so far. Setting it to 6 makes it a two-point extrapolation, and it works reasonably well even with a DZP and TZP basis set.
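With $B$ fixed, eq. (1) has only two unknowns, and eliminating $A$ between two data points gives $Y_{CBS} = Y_2 - (Y_1 - Y_2)/(e^{B(x_2-x_1)} - 1)$. A minimal Python sketch of this two-point formula follows. It takes eq. (1) exactly as written in the post, with $x$ as the basis set index used there; if the exponent is defined differently in the original papers, the appropriate constant would differ, so `b` is left as a parameter. The function name and the test numbers are made up.

```python
import math

def cbs_exponential_fixed_b(x1, y1, x2, y2, b=6.0):
    """Two-point solution of Y(x) = Y_CBS + A*exp(-b*x) with b held fixed.
    (x1, y1) is the smaller basis set, (x2, y2) the larger one."""
    if x2 <= x1:
        raise ValueError("x2 must be larger than x1")
    return y2 - (y1 - y2) / (math.exp(b * (x2 - x1)) - 1.0)

# Synthetic check: data generated from a known limit of -100.0 with b = 1.5.
y_dz = -100.0 + 2.0 * math.exp(-1.5 * 2)
y_tz = -100.0 + 2.0 * math.exp(-1.5 * 3)
recovered = cbs_exponential_fixed_b(2, y_dz, 3, y_tz, b=1.5)
```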

For a correlated calculation, one should in principle extrapolate the HF energy by eq. (1) and the correlation energy (only) by eq. (2), and add the two extrapolated results. However, beyond a TZP basis set, the error in the correlation energy is so dominant that an extrapolation using eq. (2) on the total energy works just as well. Note that results calculated by -F12 methods are likely to be different, but not much work has been done comparing different extrapolation formulas yet.

The cc-pVXZ basis sets are often the best when aiming for very accurate correlated results. I (of course) recommend the pc-n basis sets for assessing the basis set error for DFT and HF methods. We have shown that they provide a faster convergence (and thus smaller basis set error at a given level) than other alternatives, for a basis set size smaller than or comparable to other basis sets of similar quality. The only snag is that they are generally contracted, and thus programs that can only handle general contraction by duplicating primitive functions are somewhat inefficient. Even so, the added computational cost in these programs is usually made up for by the lower basis set error.

The pc-n basis sets are available for atoms up to Kr (3rd row transition metals are being submitted very soon), and elements K-Kr will all be available at the EMSL site in the near future, along with the already available H-Ar. They provide a fast and (usually) monotonic convergence for any energy-related valence property, like geometry, vibrational frequencies, thermochemistry etc.

For the first two rows in the periodic table, the pcS-n and pcJ-n basis sets similarly allow a rigorous assessment of the basis set errors in NMR shielding and spin-spin coupling constants, and again outperform other alternatives at a given level. They will be extended to the third row elements in the not too distant future. For electric properties, like polarizabilities, the aug-pc-n basis sets are recommended, although not much has been done in terms of benchmarking.

These comments of course only deal with the basis set errors. Once they are under control, you need to worry about the errors in your favorite method (MP2, CCSD, CCSD(T) or any of the hundreds of DFT functionals....).

Jan Jensen said...

Tak, Frank!

Grant Hill said...

To add to what Frank wrote above, I've done some extrapolation work with F12 methods. I can't claim to have done this exhaustively, but thought it was worth mentioning.

Firstly, using the cc-pVnZ-F12 basis sets and a CABS singles relaxation removes the need to perform any extrapolation of the HF energy (in practice). This is due to both the enlarged s&p core of the basis sets and CABS.

For the correlation energy, I have taken the approach of using the general two-point formula of Schwenke, E_CBS = (E_large - E_small) * F + E_small, where F is some function (often just a real number). It is possible to show that this formula is the same as the oft-used expression of Helgaker and co-workers, for a particular value of F.
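To make the equivalence concrete: writing the Helgaker two-point expression as E_CBS = (X^3 E_X - Y^3 E_Y)/(X^3 - Y^3), with X the larger and Y the smaller cardinal number, and rearranging gives the Schwenke form with F = X^3/(X^3 - Y^3). The short Python sketch below checks this numerically. The function names and energies are made up for illustration, and the optimized F values for the F12 methods are not reproduced here (see the paper for those).

```python
def schwenke(e_small, e_large, f):
    """General two-point formula: E_CBS = (E_large - E_small) * F + E_small."""
    return (e_large - e_small) * f + e_small

def helgaker_two_point(x_small, e_small, x_large, e_large):
    """Inverse-cubic two-point formula: (X^3 E_X - Y^3 E_Y) / (X^3 - Y^3)."""
    w_small, w_large = x_small ** 3, x_large ** 3
    return (w_large * e_large - w_small * e_small) / (w_large - w_small)

def helgaker_f(x_small, x_large):
    """Value of F for which the Schwenke formula reproduces the inverse-cubic formula."""
    return x_large ** 3 / (x_large ** 3 - x_small ** 3)

# Made-up correlation energies (hartree) for a T,Q pair.
e_tz, e_qz = -0.2750, -0.2890
f_tq = helgaker_f(3, 4)                  # 64/37
via_schwenke = schwenke(e_tz, e_qz, f_tq)
via_helgaker = helgaker_two_point(3, e_tz, 4, e_qz)
```

Both routes give the same number, so optimizing F simply amounts to choosing a different effective convergence rate than the fixed inverse-cubic one.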

My co-workers and I have optimized values of F for MP2-F12 and CCSD-F12b (check the paper for details). It is recommended that a (T) contribution is extrapolated separately to the CCSD-F12b (the (T) in most common implementations is not explicitly correlated and hence converges towards the limit at a different rate). Although this work is limited to the first two rows, we see that a T,Q extrapolation with F12 gives results comparable to a conventional 5,6 extrapolation - close to the limit of what can realistically be expected.

Jan Jensen said...

Thanks Grant! Definitely worth mentioning. Thanks very much.