Relative binding free energy of two ligands to COX-2

In this tutorial, you will set up, run and analyse the calculation of the relative binding free energy of two ligands to the protein T4 Cyclooxygenase-2 (COX2).

To come back to the tutorial index, click here.

Prerequisites

lig1.pdb - the structure of the first ligand molecule in PDB format
lig3.pdb - the structure of the second ligand molecule in PDB format
protein.pdb - the structure of the COX2 protein in PDB format

Both the ligands and protein need to be prepared before they are used with ProtoMS. For instance, you must make sure that they are correctly protonated. You can read more about setting up structures here.

Simple setup — dual-topology simulation

Setup

The simplest way to setup the simulation is by typing the following

python $PROTOMSHOME/protoms.py -s dualtopology -l lig1.pdb lig3.pdb -p protein.pdb

this sets up one simulation for the bound leg and one for the free leg of the relative free energy cycle.

The simulations will run 5 m equilibration steps and 40 m production steps for each of the 16 λ-values. Output will be printed to files every 100 k moves and Hamiltonian replica-exchanges between neighbouring λ-values will be attempted every 100 k moves.

the -s dualtopology argument tells the script to setup a dual topology free energy simulation and the -p and -l arguments specify the protein and the ligands respectively.

You can read more about the files that the setup script creates further down on this page.

You can visualise the systems that will be simulated with (for instance) VMD:

vmd -m lig1.pdb lig3.pdb lig1_box.pdb
vmd -m lig1.pdb lig3.pdb protein_scoop.pdb water.pdb

Execution

To run the bound and free legs of the simulation you need to execute

mpirun -np 16 $PROTOMSHOME/protoms3 run_bnd.cmd
mpirun -np 16 $PROTOMSHOME/protoms3 run_free.cmd

This is most conveniently done on a computer cluster. The calculations will take approximately 48 h to complete.

Analysis

When the simulations are finished you need to analyse the simulation and extract the free energy of the bound and free legs. We will start with looking at some energy series to investigate whether the simulations are converged. Type

python $PROTOMSHOME/tools/calc_series.py -f out_free/lam-0.000/results

to look at time series for the free leg at λ=0.0. In the wizard that appears you can type total followed by enter twice. This will plot the total energy as a function of simulation snapshot. It should look something like this:

the program will also give you some indication of the equilibration of this time series. In this case, you should see equilibration fairly early in the simulation. Thus, we have to discard very little of the simulation when we compute the free energy.

Next, we will analyse how effective the λ replica exchange was. Type

grep "lambda swaps" out_free/lam-0.000/info

and this will give you the overall acceptance ratio of the λ swaps. It should be around 50% for this perturbation, indicating that the overall effectiveness is acceptable. Next, we will look at individual replicas and see how they exchange λ-value throughout the simulation. This can be done with

python $PROTOMSHOME/tools/calc_replicapath.py -f out_free/lam-*/results -p 0.0 0.2 0.4 0.6 0.8 1.0 -o replica_path_free.png

to plot the path of the first few λ-values. It should look something like this.

as is clear, these replicas were not able to traverse the full λ-space within the simulation time. Also the replicas starting with λ=1.0 or 0.0 did not exchange very much at all. Hence, the replica exchange was not fully efficient.

Now, we will estimate the free energy. We will do this both with thermodynamic integration, Bennett's acceptance ratio (BAR) and multistate BAR (MBAR). To do this you can type

python $PROTOMSHOME/tools/calc_dg.py -d out_free/ -l 0.025

A typical result is between 0 and 1 kcal/mol. Notice that only 10 snapshots are removed from the 400 snapshots (2.5 % of the simulation) when calculating the free energy, because the analysis with calc_series.py indicated that the simulation was equilibrated very much from the start.

It is important to study the gradient of the TI calculation. It should be smooth in order for the TI to work properly. For the free leg it should look something like this

Now, we need to repeat the analysis for the bound leg. You will notice that the equilibration of this simulation is much slower. The computed free energy should be between 1 and 2 kcal/mol (depending on TI or BAR and how much equilibration is removed).

This gives a relative binding free energy of about +0.5 kcal/mol.

Single-topology simulation

The calculation above is dual-topology, which means that the two ligands are simulated simultaneously. There is an alternative methodology that is called single-topology where only a single ligand is simulated but the force field of this ligand is perturbed. This methodology has some advantages when the difference between the two ligand is small such as in our case. Therefore, we will also setup a single topology simulation.

Setup

In the single-topology simulation it is important to know which atoms on the first ligand should be perturbed into which atoms on the second. If the ligands differ in the number of atoms, as is typical, some atoms need to be perturbed into dummy atoms. Therefore, it is necessary that the larger ligand is perturbed into the small one. In our case, this means that we will perturb ligand 3 into ligand 1. If we look at the structures of the ligands, we see that the difference is in a methyl group; ligand 3 has a methyl group whereas ligand 1 has only a hydrogen atom. Therefore, we will perturb the methyl carbon to the hydrogen in ligand 1 and the hydrogens in the methyl group will be perturbed to dummy atoms. This is sketched below

              H   H         H   DU
              |  /          |  /
Ligand 3 :  R-C-C--H  ==> R-C-H--DU  : Ligand 1
λ=0.0         |  \          |  \       λ=1.0    
              H    H        H    DU

To setup this simulation type:

python $PROTOMSHOME/protoms.py -s singletopology -l lig3.pdb lig1.pdb -p protein.pdb

during the setup you will be prompted to type in the atoms in ligand 1 that correspond to atoms in ligand 3. For C38 you should type H38 and for the 3 methyl hydrogens H43, H41, H42 you can simlply leave the line blank to indicate that they should be perturbed to dummy atoms.

As with dual-topology, this sets up simulations for the bound leg and the free leg of the relative free energy cycle.

Execution

The setup above will produce several ProtoMS command files. You have a choice to split up the perturbation in an electrostatic and van der Waals perturbation, or do the full perturbation in one step. The former can be advantageous if the perturbation is complicated. In addition to create input files for the bound and free leg, the setup will also setup command files for the perturbation in gas-phase. This calculation is of interest if you want to compute relative hydration free energy, or if you want to debug your perturbation. To run the one-step perturbations you need to execute

mpirun -np 16 $PROTOMSHOME/protoms3 run_comb_bnd.cmd
mpirun -np 16 $PROTOMSHOME/protoms3 run_comb_free.cmd
mpirun -np 16 $PROTOMSHOME/protoms3 run_comb_gas.cmd

This is most conveniently done on a computer cluster. The calculations will take approximately 48 h to complete.

Analysis

When the simulations are finished you need to analyse the simulation and extract the free energy of the bound and free legs. This procedure is the same as for the dual-topology simulation as above. If you decide to split up the perturbation, you need to do the analysis for both the electrostatic and van der Waals perturbation. For the gas-phase simulation, you can usually skip analysis of the equilibration, as this leg equilibrates very quickly.

Exploring more options

By running protoms.py as above, you accept the standard number of λ-values, simulation length etc. The values of these parameters were chosen from experience and should work for most systems. However, there are situations when you may want to do something else. Here, we will go through some of the many available options. To know about other options you have to read the manuals for the tools or the overall MC program. You might also have to set up your system by executing the individual tools separately.

Running longer simulations

There are two arguments that you can invoke to run a longer simulation

--nequil - this controls the number of equilibration steps
--nprod - this controls the number of production steps

by typing for instance

python $PROTOMSHOME/protoms.py -s dualtopology -l lig1.pdb lig3.pdb -p protein.pdb --nequil 10E6 --nprod 50E6

you will run 10 m equiliration steps and 50 m production steps (instead of the 5 m and 40 m that is default)

Running with more λ-values

The argument that controls the number of λ-values is called --lambdas.

by typing for instance

python  $PROTOMSHOME/protoms.py -s dualtopology -l lig1.pdb lig3.pdb -p protein.pdb --lambdas 24

you will initiate 24 λ-values rather than default 16. You can also give individual λ-values to the argument. For instance

python $PROTOMSHOME/protoms.py -s dualtopology -l lig1.pdb lig3.pdb -p protein.pdb --lambdas 0.000 0.033 0.067 0.133 0.200 0.267 0.333 0.400 0.467 0.533 0.600 0.667 0.733 0.800 0.867 0.933 0.967 1.000

will add two new λ-values at 0.033 and 0.967 to the 16 created by default.

Running independent repeats

Usually it is wise to run several independent repeats of your calculation to check for convergence and to obtain a good estimate of the statistical uncertainty. The argument that controls the number of independent repeats is called --repeats or just -r.

by typing for instance

python $PROTOMSHOME/protoms.py -s dualtopology -l lig1.pdb lig3.pdb -p protein.pdb -r 5

you will create 5 input files for the bound leg and 5 input files for the free leg. Therefore, you also need to execute ProtoMS 10 times with the different input files. The output will be in 10 different folders, e.g. out1_bnd and out2_bnd.

Running with temperature replica-exchange

To improve sampling, one could add temperature ladders to the simulation and perform replica-exchange swaps between neighboring values of temperatures. To add eight additional temperatures to λ-values 0 and 1.0, add the following lines to the command file(s).

lambdaladder temperature 25.0
temperaturere 1E5 25.0 27.0 30.0 34.0 39.0 45.0 52.0 60.0 69.0
temperatureladder lambda 0.00 1.00

However, it should be noted that the selection of temperatures is a non-trivial task and it is likely to be system dependent. The above lines should therefore be regarded as a suggestion. ProtoMS then needs to be executed with additional nodes, for instance

mpirun -np 32 $PROTOMSHOME/protoms3 run_bnd.cmd

Files created by the setup script

Files describing the ligands

lig1.prepi and lig3.prepi = the z-matrix and atom types of the ligands in Amber format
lig1.frcmod and lig3.frcmod = additional parameters not in GAFF
lig1.zmat and lig3.zmat = the z-matrix of the ligands used to sample them in the MC simulation
lig1.tem and lig3.tem = the complete template (force field) file for the ligands in ProtoMS format

You can read more about the setup of ligands here.

Files describing the protein

protein_scoop.pdb = the truncated protein structure

You can read more about the setup of proteins here.

Simulation specific files

lig1_box.pdb = the box of water solvating ligand 1 in the free leg simulation
lig3_box.pdb = the box of water solvating ligand 3, this box will not be used
water.pdb = the cap of water solvating the protein-ligand system in the bound leg simulation

For the dual-topology simulations the following files are created:

li1-li3.tem = the combined template file for the two ligands, used only in this simulation
run_bnd.cmd = the ProtoMS command file for the bound-leg simulation
run_free.cmd = the ProtoMS command file for the free-leg simulation

For the single-topology simulations the following files are created:

li3-li1_ele.tem = the template file for electrostatic perturbation, used only in this simulation
li3-li1_vdw.tem = the template file for van der Waals perturbation, used only in this simulation
li3-li1_comb.tem = the template file for one-step (combined) perturbation, used only in this simulation
single_cmap.dat = the atom correspondence map
run_comb_bnd.cmd, run_ele_bnd.cmd and run_vdw_bnd.cmd = the ProtoMS command files for the bound-leg simulation
run_comb_free.cmd, run_ele_free.cmd and run_vdw_free.cmd = the ProtoMS command file for the free-leg simulation
run_comb_gas.cmd, run_ele_gas.cmd and run_vdw_gas.cmd = the ProtoMS command file for the gas-phase simulation

You can read more about the command files in the ProtoMS manual. However, some sections of it are worth mentioning here

lambdare 100000 0.000 0.067 0.133 0.200 0.267 0.333 0.400 0.467 0.533 0.600 0.667 0.733 0.800 0.867 0.933 1.000

this section, which exists in all input file setup the λ-replica exchange. You can add more λ-values manually if there are regions where the simulation is not performing well.

In the command files for dual-topology simulations, the following lines exist

dualtopology1 1 2 synctrans syncrot
softcore1 solute 1
softcore2 solute 2
softcoreparams coul 1 delta 0.2 deltacoul 2.0 power 6 soft66

this section, which exists in both input files, sets up the dual topology simulation with appropriate soft-core parameters. It could be worth trying to optimize the soft-core parameters if your simulation is not performing well.

Setting up the simulation with individual tools

In this advanced section we will go through the setup of the simulations step-by-step using individual setup scripts rather than protoms.py

Setting up the free leg

We will start setting up the free-leg simulation. However, it should be noted that some of the files created in this section will also be used in the bound-leg simulation.

First we need to make sure that the very first line of lig1.pdb contains a directive, telling ProtoMS the name of the solute. The line should read HEADER LI1 and can be added by typing

sed -i "1iHEADER LI1" lig1.pdb

Thereafter, we will create the force field for the ligand using AmberTools. The force field will be GAFF with AM1-BCC charges. Type

python $PROTOMSHOME/tools/ambertools.py -f lig1.pdb -n LI1

and this will execute the AmberTools programs antechamber and parmchck, creating the files lig1.prepi and lig1.frcmod, respectively.

These files are in Amber format and in order to use them in ProtoMS we need to reformat them into a ProtoMS template file. This file will also contain a z-matrix that describes how the ligand will be sampled during the simulation. To do this, you can type

python $PROTOMSHOME/tools/build_template.py -p lig1.prepi -f lig1.frcmod -o lig1.tem -n LI1

this will create the files lig1.tem containing the ProtoMS template file and lig1.zmat. It is a good idea to check these files to see if the script has defined the molecule properly.

The steps above now need to be repeated for ligand 3.

Now we should have all the force field files for the two ligands. The next step is to solvate them. Start by concatenating the two PDB files

cat lig1.pdb lig3.pdb > temp.pdb

and then we will create a water box of TIP4P water molecules around this pdb file. Type

python $PROTOMSHOME/tools/solvate.py -b $PROTOMSHOME/data/wbox_tip4p.pdb -s temp.pdb -o lig1_box.pdb

this will solvate both ligands using standard settings, i.e. it will be 10 A between the solute and the edge of the box. A pre-equilibrated box of TIP4P water molecules located in $PROTOMSHOME/data/ is used. The box is written to the file lig1_box.pdb.

The next step differ if we want to setup a dual-topology or single-topology simulation. We will start with the dual-topology setup.

We need to combine the template files for the two ligands using

python $PROTOMSHOME/tools/merge_templates.py -f lig1.tem lig3.tem -o li1-li3.tem

creating li1-li3.tem.

For single-topology, we need to create template files for the electrostatic and van der Waals perturbation. This is done by

python $PROTOMSHOME/tools/make_single.py -t0 lig3.tem -t1 lig1.tem -p0 lig3.pdb -p1 lig1.pdb -o li3tli1

Just as when using protoms.py you will be prompted to type in the corresponding atoms in the two molecules. This tool will create li3-li1_ele.tem, li3-li1_vdw.tem, li3-li1_comb.tem, and li3-li1_cmap.dat

Now we have all the files to run the free leg of the simulation. The input file for ProtoMS will be created when we have prepared the bound leg.

Setting up the bound leg

First, we will change the name of atom and residue names in the protein PDB file such that they agree with the ProtoMS naming convention. Type

python $PROTOMSHOME/tools/convertatomnames.py -p protein.pdb -o protein_pms.pdb -s amber -c $PROTOMSHOME/data/atomnamesmap.dat

The converted structure will be in protein_pms.pdb. This execution assumes that the Amber naming convention is used in protein.pdb.

We have crystallographic waters in the protein structure and we need to convert them to the water model we will be using in the simulation (TIP4P). This can be done by

python $PROTOMSHOME/tools/convertwater.py -p protein_pms.pdb -o protein_pms_t4p.pdb

creating protein_pms_t4p.pdb.

Next step is to truncate the protein, creating a scoop. This will enable us to solvate the protein in a droplet and thereby reduce the number of simulated particles. The command is

python $PROTOMSHOME/tools/scoop.py -p protein_pms_t4p.pdb -l temp.pdb -o protein_scoop.pdb

The protein scoop is centred on the two ligand molecules and all residues further than 20 A are cut away. The scoop is written to protein_scoop.pdb

As a final step, we will solvate the protein and ligand in a droplet of TIP4P water molecules. To do this, type:

python $PROTOMSHOME/tools/solvate.py -b $PROTOMSHOME/data/wbox_tip4p.pdb -s temp.pdb -pr protein_scoop.pdb -o water.pdb -g droplet

this will create a droplet with 30 A radius centred on the benzene molecule. The droplet is written to water.pdb

The solvate.py script adds the crystallographic waters from the scoop to the droplet. Therefore, we need to remove them from the scoop PDB-file.

sed -i -e "/T4P/d" -e "/TER/d" protein_scoop.pdb

Now we have all the files need to run the simulation. As you noticed, this step-by-step procedure create a few files that protoms.py does not generate.

Making ProtoMS input files

To make the dual-topology input files for ProtoMS type

python $PROTOMSHOME/tools/generate_input.py -s dualtopology -p protein_scoop.pdb -l lig1.pdb lig3.pdb -t li1-li3.tem -pw water.pdb -lw lig1_box.pdb -o run

creating run_bnd.cmd and run_free.cmd

and for single-topology input files type

python $PROTOMSHOME/tools/generate_input.py -s singletopology -p protein_scoop.pdb -l lig3.pdb -t li3tli1_ele.tem -pw water.pdb -lw lig1_box.pdb -o run_ele --outfolder out_ele
python $PROTOMSHOME/tools/generate_input.py -s singletopology -p protein_scoop.pdb -l lig3.pdb -t li3tli1_vdw.tem -pw water.pdb -lw lig1_box.pdb -o run_vdw --outfolder out_vdw
python $PROTOMSHOME/tools/generate_input.py -s singletopology -p protein_scoop.pdb -l lig3.pdb -t li3tli1_comb.tem -pw water.pdb -lw lig1_box.pdb -o run_comb --outfolder out_comb

creating run_ele_bnd.cmd, run_ele_free.cmd, run_ele_gas.cmd, run_vdw_bnd.cmd, run_vdw_free.cmd, run_vdw_gas.cmd, run_comb_bnd.cmd, run_comb_free.cmd and run_comb_gas.cmd