Absolute hydration free energy of 2-chlorosyringaldehyde

In this tutorial we will calculate the solvation free energy of an organic molecule, 2-chlorosyringaldehyde, in water. 2-chlorosyringaldehyde is a substituted benzene molecule with multiple functional groups:

2-chlorosyringaldehyde (L11)

You could imagine extending this process to other, drug-like small molecules or ligands of interest.

Prerequisites

The l11 molecule should be prepared and ready for use with ProtoMS. In this case we are only interested in evaluating the hydration free energy starting from a single, low-energy conformation. However, in many cases you might wish to include multiple starting conformations, or at least multiple repeats, in your estimates to ensure your simulations sample all relevant conformations. The usual caveats about setting up structures (protonation state etc.) also apply - you can read more here.

Simple setup

Setup
The simplest way to setup the simulation is by typing the following:
python $PROTOMSHOME/protoms.py -s dualtopology -l l11.pdb --absolute

this sets up a dual topology simulation of the molecule in a box of TIP4P water, perturbing between the full interaction of the molecule with its surroundings and the zero interaction of a dummy atom with its surroundings.

The simulation will run 5 m equilibration steps and 40 m production steps for each of the 16 λ-values. Output will be printed to files every 100 k moves and Hamiltonian replica-exchanges between neighbouring λ-values will be attempted every 200 k moves.

Why dual topology?

For these sorts of absolute free energy calculations (i.e. 'perturbing to nothing'), dual topology simulations will generally provide smoother free energy curves and more precise free energy estimates with smaller errors, particularly at the endpoints of the perturbation. They also have the added advantage that no gas-phase simulations are needed to complete the thermodynamic cycle (see the ProtoMS manual for theory details). Although in theory these calculations can also using a single topology approach, by default protoms.py will suggest you use dual topology. Single topology simulations can be set up using the individual scripts in the tools folder if you prefer.

Why --absolute?

The -s dualtopology argument tells the script to set up a dual topology free energy simulation and the --absolute argument tells the script that no second ligand exists and that the ligand specified by -l should be perturbed to a dummy particle (nothing). Without the --absolute, the script will expect you to input a second ligand!

How do I know if the setup has been successful?

Protoms.py will print out a series of information messages if successful, and error messages if there are any problems. For more detailed information there is also a log file (called protoms_py.log by default), which includes debug messages and defines the values of any command line flags you give to protoms.py, or their default values if you don't set any.

However, the safest way to check is to read over the created files and visualise the created system. You can read more about the files that the setup script creates further down on this page, and you can visualise the system that will be simulated with (for instance) VMD. Note the highlighted dummy atom position in green:

vmd -m l11.pdb l11_dummy.pdb l11_box.pdb
L11 set up ready for simulation

Execution
To run the simulation we need to execute:
mpirun -np 16 $PROTOMSHOME/protoms3 run_free.cmd
This is most conveniently done on a computer cluster. The calculations take approximately 7 h to complete using the Iridis4 system at the University of Southampton.

Equilibration analysis
When the simulations are finished we need to analyse the simulation and extract the free energy change associated with perturbing our ligand to nothing. All the results files are conveniently ordered by λ-window in the out_free folder. Let's start by evaluating whether the system is well equilibrated. We should first visualise our simulation to check nothing has gone wrong (open out_free/lam-0.000/all.pdb in VMD for example), but we can also use the calc_series.py script to plot the time-dependence of some obvious properties of interest. Type:
python $PROTOMSHOME/tools/calc_series.py -f out_free/lam-0.000/results
This will bring up a wizard to look at a choice of various time series of results from the 0.000 λ-window (you can also run calc_series.py -h to see how to define these choices from the command line). First of all, let's see how the total energy of the system changes with time. Type total followed by enter twice. This will plot the total energy as a function of simulation snapshot. It should look something like this:
Total energy at lambda=0.00
Note that the total energy decreases continually at the beginning of the simulation before levelling off. The calc_series.py script also gives you an extimate of when the series is 'equilibrated', i.e. statistically when it levels out (denoted by the dashed line on the plot). Why does the total energy take so long to equilibrate? Let's take a closer look at the contributions to the total. Open up calc_series.py -f out_free/lam-0.000/results again and type:
total
inter/solvent-solvent/sum
inter/l11-solvent/sum

And then choose option 5 for displaying the graphs on the same plot, normalised to the energy of the final snapshot. You should see something like this:

Contributions to total energy

Clearly the change in the total energy is dominated by the solvent-solvent interactions, while the ligand-solvent interactions remain fairly constant throughout. But what effect does this have on the forward & backward free energy differences, and hence the gradients for our TI calculations? Let's take a look, but this time we'll evaluate the trends as running averages and use the command line flags for calc_series.py:

python $PROTOMSHOME/tools/calc_series.py -f out_free/lam-0.000/results --average -s forwfe gradient

This time use option 2 for displaying the graphs as sub-plots, as they have very different scales:

Gradient (blue) and forwards free energy (red) for lambda=0.000

So we can see that the free energy gradients seem fairly constant after around 100 snapshots. Hence we'll discard the first 100 snapshots in our free energy calculations. At this stage we're ready to begin the free energy analysis. We'll do this by Thermodynamic Integration (TI), Bennett's Acceptance Ratio (BAR) and multistate-BAR (MBAR), so we can compare whether the results are sensitive to the choice of estimator. Type:

python $PROTOMSHOME/tools/calc_dg.py -d out_free/ -l 0.25

(The -l flag says we should skip the first 25% of the simulation (100 snapshots) as we decided above). You'll see a table of the dV/dλ gradients followed by the total free energies for the perturbation calculated by TI, BAR and MBAR. Remember that the free energies given are for the forwards perturbation to nothing. The hydration free energy is the transfer free energy from vacuum to solvent - to opposite of what we've just simulated. So all we need to do is reverse the sign!

We obtained a value of around -9.3 kcal/mol with TI, -8.4 kcal/mol with BAR and -8.4 kcal/mol with MBAR. The experimental hydration free energy for 2-chlorosyringaldehyde is -7.8 kcal/mol, so although we are fairly close it seems that our computed free energy is slightly dependent on what estimator we choose...but to say for sure we would first have to run more repeats, or try out different simulation conditions.


Exploring more options

Above, we've simply chose to set up our simulation with the default options chosen by protoms.py. If you type:
python $PROTOMSHOME/protoms.py --fullhelp
...you can see that there are many different options you can define manually to explore your simulation further! A few examples are given below.
Running longer simulations
There are two arguments that you can invoke to run a longer simulation by typing for instance
python $PROTOMSHOME/protoms.py -s dualtopology -l l11.pdb --absolute --nequil 10E6 --nprod 50E6
you will run 10 m equilibration steps and 50 m production steps (instead of the 5 m and 40 m that is default)

Running with more λ-values
Earlier we noted our calculated free energies might be dependent on the estimator chosen. How might we decide whether the λ-window spacing is responsible? We can start by tracing the λ-windows visited by each initial replica, e.g.
python $PROTOMSHOME/tools/calc_replicapath.py -f out_free/lam-*/results -p 1.000 0.800 0.533 0.200 0.000

You'll see a plot something like this created as replica_path.png

Path of initial replicas

See how the replicas don't traverse the whole of λ-space? Perhaps we could run for longer, or perhaps we could add more windows, and see if the exchange was improved and whether this improved the agreement of our estimators. The argument that controls the number of λ-values is called --lambdas.

By typing (for instance):

python $PROTOMSHOME/protoms.py -s dualtopology -l l11.pdb --absolute --lambdas 24
you will initiate 24 λ-values rather than default 16. You can also give individual λ-values to the argument. For instance:

python $PROTOMSHOME/protoms.py -s dualtopology -l l11.pdb --absolute --lambdas 0.000 0.033 0.067 0.133 0.200 0.267 0.333 0.400 0.467 0.533 0.600 0.667 0.733 0.800 0.867 0.933 0.967 1.000

will add two new λ-values at 0.033 and 0.967 to the 16 created by default.

Running independent repeats
Usually it is wise to run several independent repeats of your calculation to check for convergence and to obtain a good estimate of the statistical uncertainty. The argument that controls the number of independent repeats is called --repeats or just -r.

by typing for instance:

python $PROTOMSHOME/protoms.py -s dualtopology -l l11.pdb --absolute -r 5
you will create 5 input files that will create 5 output folders. But remember, you also need to execute ProtoMS 5 times with the different input files. The output folders will be named e.g. out1_free,out2_free...

Running with a different water model

By default, protoms.py solvates systems using the TIP4P water model. Simulations can also be performed with the TIP3P model by using the --watmodel flag:

python $PROTOMSHOME/protoms.py -s dualtopology -l l11.pdb --absolute --watmodel tip3p
Alternatively you may wish to solvate the solute using a different pre-equilibrated water box, or create a box of different dimensions. This can be done using the individual solvate script in the tools section.


Files created by the setup script

Files describing the solute
Simulation specific files
You can read more about the ProtoMS input files in the ProtoMS manual. However, some sections of it are worth mentioning here:

dualtopology1 1 2 synctrans syncrot
softcore1 solute 1
softcore2 solute 2
softcoreparams coul 1 delta 0.2 deltacoul 2.0 power 6 soft66

this section sets up the dual topology simulation with appropriate soft-core parameters. It could be worth trying to optimize the soft-core parameters if your simulation is not performing well.

lambdare 100000 0.000 0.067 0.133 0.200 0.267 0.333 0.400 0.467 0.533 0.600 0.667 0.733 0.800 0.867 0.933 1.000

this section sets up the λ-replica exchange every 100,000 steps. You can add more λ-values manually if there are regions where the simulation is not performing well.

Setting up the simulation with individual tools

In this advanced section we will go through the setup of the simulations step-by-step using individual setup scripts rather than protoms.py
Setting up the solute
We will start setting up the solute.

  • First we need to make sure that the very first line of the l11.pdb contains a directive, telling ProtoMS the name of the solute. The line should read HEADER L11 and can be added using whichever editor you feel most comfortable with. For example:

    sed -i "1iHEADER L11" l11.pdb

  • Thereafter, we will create the force field for the l11 molecule using AmberTools, the GAFF force field and AM1-BCC charges. Type:

    python $PROTOMSHOME/tools/ambertools.py -f l11.pdb -n L11

    This will execute the AmberTools programs antechamber and parmchk, creating the files l11.prepi and l11.frcmod, respectively. If the ligand was charged, we could specify that here using the -c flag.

  • These files are in Amber format and in order to use them in ProtoMS we need to reformat them into a ProtoMS template file. This file will also contain a z-matrix that describes how l11 will be sampled during the simulation. To do this, you can type

    python $PROTOMSHOME/tools/build_template.py -p l11.prepi -f l11.frcmod -o l11.tem -n L11

    This will create the files l11.tem (the ProtoMS template file) and l11.zmat (the ProtoMS z-matrix). It is a good idea to check the latter to see if the script has defined the molecule properly.

  • The next thing we will do is to solvate the solute in a box of TIP4P water molecules. Type:

    python $PROTOMSHOME/tools/solvate.py -b $PROTOMSHOME/data/wbox_tip4p.pdb -s l11.pdb -o l11_box.pdb

    As standard this will create a box of mimimum 10 A distance between the solute and the edge of the box. The padding of the box from the solute can be increased using the --padding or -p option. Likewise a different pre-equilibrated box can be used by specifying the location using the -b flag. The output is written to the file l11_box.pdb.

  • Now we have setup l11, but to complete the setup we need the dummy particle that l11 will be perturbed into. This is created by typing:

    python $PROTOMSHOME/tools/make_dummy.py -f l11.pdb -o l11_dummy.pdb

    This creates l11_dummy.pdb that just contains the new particle.

  • Finally, we need to combine the template file of l11 with the template file of the dummy particle. Type

    python $PROTOMSHOME/tools/merge_templates.py -f l11.tem $PROTOMSHOME/data/dummy.tem -o l11-dummy.tem

    creating l11-dummy.tem. The template file of the dummy particle is located in $PROTOMSHOME/data/.

    Now we have all the files except the ProtoMS input file itself. As you may have noticed, this step-by-step procedure create a few files that protoms.py does not generate, but these often contain logs of the exact procedures performed, so they may be useful to you.

    Making ProtoMS input files
    To make the input files for ProtoMS type

    python $PROTOMSHOME/tools/generate_input.py -s dualtopology -l l11.pdb l11_dummy.pdb -t l11-dummy.tem -lw l11_box.pdb --absolute -o run

    creating run_free.cmd.