Grand canonical alchemical perturbations (GCAP) of two ligands to Scytalone Dehydratase

In this tutorial, you will set up, run and analyse the calculation of the relative binding free energy of two ligands to Scytalone Dehydratase (SD), using GCAP. The grand canonical ensemble allows for the dynamic sampling of hydration states of a protein-ligand complex, as the ligand changes.

To come back to the tutorial index, click here.


Both the ligands and protein need to be prepared before they are used with ProtoMS. For instance, you must make sure that they are correctly protonated. You can read more about setting up structures here. Here, the protein has been set up in advance, with correct protonation states and scooping.


Defining the grand canonical region
First, we are going to set up the GCMC region of the system. This is the region where grand canonical insertions and deletions are attempted. The region is user defined, and can be chosen depending on the region of interest. Here, we are interested in the displacement of a single active site water molecule on perturbation of the ligand. A cubic box with lengths of 4 Å will be set up over the region using the following command.

python $PROTOMSHOME/tools/ -b 26.1 13.2 34.9 -p 2.

26.1 13.2 34.9 is the centre of the box, with a padding of 2 Å around the center. Other options are possible, setting up the box around a given ligand, or with varying dimensions of the box. To explore the options of the script, use the -h flag. Note the output of this script.

Volume of GCMC box: 64.0
Bequil: -9.71

This is telling us the volume of the GCMC box is 64.0 ų. The Bequil is the Adams' value at which we need to simulate to correspond to equilibrium with bulk water. This is calculated using:

Bequil = βμ' + ln(Vgcmc/Vo)

Where β is thermodynamic beta, β' is the excess chemical potential of the water model. Vgcmc is the volume of the GCMC region (64.0 ų here) and Vo is the volume of a single water molecule in bulk (30.0 ų here). More details of these can be found in the publications at the bottom of the page.

It is a good idea to visualise this, even before all of the system is set up, to get an idea that the GCMC box is where you expect it to be.

vmd -m ONE.pdb THR.pdb protein_scoop.pdb gcmc_box.pdb

Single topology perturbations
Now to set up the alchemical perturbation, we will first set up the single-topology transformation. For single topology simulations we need to correspond all of the atoms in one ligand, either to atoms in the other ligand or to dummy atoms, and will be asked about this during the set up.
python $PROTOMSHOME/ -s gcap_single -sc protein_scoop.pdb -l THR.pdb ONE.pdb --gcmcbox gcmc_box.pdb --adams -9.71

gcap_single informs ProtoMS that we are performing a single-topology GCAP simulation. -sc protein_scoop.pdb is the pre-generated protein file appropriate for simulation. -l THR.pdb ONE.pdb are the two ligands that we are comparing. For single topology, the larger ligand needs to be written first, and ProtoMS will throw an error if the first ligand is smaller. The --adams -9.71 tells ProtoMS what Adams' value to simulate it. This is the value that was output using the script earlier. This will change depending on the volume of the GCMC box. --gcmcbox gcmc_box.pdb is the GCMC box file that we generated previously. If this flag is not used, and there is no gcmc_box.pdb file in the directory, ProtoMS will set up a GCMC box, that will cover the entire ligand, with a padding of 2 Å. ProtoMS will try work out the atom mapping between the two ligands, but for the C-H group that is being perturbed, user input is required. You will see:

These are the distances (A): 
              C8     H29
     N19   0.000   1.078

Enter the corresponding atom for C8:

Type in N19 and press enter. Atoms C8 of ligand THR.pdb, and N19 of ONE.pdb are 0.000 Å apart. Next you will see:

Enter the corresponding atom for H29:

Now there is no atom in ligand ONE.pdb that corresponds to this, so we want to perturb it to a dummy atom. Press enter, and ProtoMS will complete its single-topology set up. This will generate several files, including a water.pdb and a water_clr.pdb file. The water_clr.pdb will be used in the simulation, as it has had any bulk water removed from inside the specified GCMC region. Various .cmd files are generated, which contain the simulation details. These will be for a two-step simulation, where the electrostatics and van der Waal's interactions are perturbed seperately, and comb files, where both are perturbed simultaneously. There are files generated for a gas phase, bulk solvent phase, and protein bound phase of each of these. These can be run using the following command.

mpirun -np 16 $PROTOMSHOME/protoms3 run_comb_free.cmd
mpirun -np 16 $PROTOMSHOME/protoms3 run_comb_gcap.cmd

This is most conveniently done on a computer cluster. The calculations will take approximately 12 h to complete. The gas phase simulations are required to calculate the relative free energy of solvation (see the tutorial on calculating relative free energy of solvation for details) and will not be looked at here. Here, only the comb simulations will be analysed, but feel free to simulate the vdw and ele simulations too.

Dual topology perturbations
In a single-topology simulation, one ligand is perturbed into another across an alchemical pathway. In dual-topology simulations, two ligands are simulated, but one is gradually turned off while the other is turned on. As single-topology requires atom mapping, some pairs of ligands can only reasonably be simulated using dual-topology. It is also easier to set up. Making a new directory, and copying the ligand files, protein_scoop.pdb and gcmc_box.pdb over, run:
python $PROTOMSHOME/ -s gcap_dual -sc protein_scoop.pdb -l THR.pdb ONE.pdb --gcmcbox gcmc_box.pdb --adams -9.71
This time you will not be asked for any user input, and only two files will be generated. run_free.cmd and run_gcap.cmd. Again, run using the following command.
mpirun -np 16 $PROTOMSHOME/protoms3 run_free.cmd
mpirun -np 16 $PROTOMSHOME/protoms3 run_gcap.cmd
As there are two copies of the ligand, these simulations are slower, and will take approximately 24 hours.
Surface GCAP simulations
These simulations have been performed at the equilibrium B value, -9.71, however it is possible to do a GCMC `titration' at each stage of the ligand perturbation. This is known as a surface-GCAP simulation. This can be performed with either the single or dual topology set up, but here single will be used.
python $PROTOMSHOME/ -s gcap_single -sc protein_scoop.pdb -l THR.pdb ONE.pdb --gcmcbox gcmc_box.pdb --adamsrange -17.71 -8.71 10 
This is the same command as before, and will run through the same set up as before, but now instead of using the --adams flag with the equilibrium B value, now we have used --adamsrange.

The simulation requires 16 λ values and 10 B values, which is 160 replicas. This can be run with the following command. These simulations require a lot of compute resources, and should be run on a suitable machine, with available disk space.

mpirun -np 160 $PROTOMSHOME/protoms3 run_comb_gcap.cmd


Once the simulations are finished, the results can be analysed. The free energy difference between the two ligands can be calculated using TI, BAR and MBAR. To calculate the perturbation from ligand THR to ligand ONE (this will depend on which ligand was written first in the set up), the free leg needs to be subtracted from the bound leg.
python $PROTOMSHOME/tools/ -d out_comb_free -l 0.25 --pmf --estimators mbar
python $PROTOMSHOME/tools/ -d out_comb_gcap -l 0.25 --pmf --estimators mbar
Try the above commands with a combination of 'ti', 'bar' and 'mbar' and see how consistent the different methods are.
out_comb_free   -90.18
out_comb_gcap   -94.20
The free energy values are very large, but the bound minus the free leg corresponds to a free energy difference of -4.0 kcal mol-1. This is in very good agreement with the experimental value of -3.8 kcal mol-1.

The PMF of the free energy can be calculated using TI, BAR and MBAR, and are plotted below. All three are shown, but as they are very consistent, only one is visible, with the other two perfectly behind. As the profiles are both smooth and consistent, this indicates that the perturbation is converged.

While this good experimental agreement is promising, it is useful to understand where the favourable energetics that make ligand ONE higher affinity than ligand THR come from, as this can be useful for SAR information and drug design.

python $PROTOMSHOME/tools/ -b out_comb_gcap -d out_comb_free -l 0.25 
The -b flag shows what is the bound leg, and -d flag indicates the free leg of the simulation. This will calculate the free energy difference of ligand THR and ONE by each energetic contribution.

             FDTI:   -4.234 +- 0.000
protein1-thr1_COU:   -2.926 +- 0.000
 protein1-thr1_LJ:   -0.339 +- 0.000
      thr-GCS_COU:   -2.983 +- 0.000
       thr-GCS_LJ:   -0.517 +- 0.000
  thr-solvent_COU:    1.102 +- 0.000
   thr-solvent_LJ:    0.316 +- 0.000
          thr_ANG:    0.077 +- 0.000
          thr_BND:   -0.001 +- 0.000
          thr_DIH:   -0.000 +- 0.000
          thr_NBC:    0.812 +- 0.000
          thr_NBL:    0.224 +- 0.000
     sum of terms:   -4.235 +- 0.000

The top-most and bottom-most numbers show the free energy perturbation for the bound and free leg combined, calculated from the total energy, and the sum of all individual energies respectively. The relative free energy is 0.2 kcal mol-1 different calculated by TI than MBAR. Three of the contributions contribute over 1 kcal mol-1, protein1-thr1_COU, thr-GCS_COU and thr-solvent_COU.

First considering protein1-thr1_COU, this shows that the coulombic interaction between ligand ONE and the protein is 3 kcal mol-1 more favourable than ligand THR, which is a large energetic contribution for the pertubration of a few atoms.

The other contributing energetic contributions are the interactions between the ligand and water. thr-solvent_COU shows an unfavourable 1 kcal mol-1 contribution to the free energy when ligand THR is perturbed to ligand ONE. The other energy, thr-GCS_COU, shows a 3 kcal mol-1 FAVOURABLE contribution for the same perturbation. The thr-GCS_COU energy is the free energy difference between the ligand and the grand canonical solute (the GCMC water molecule), which are the water molecules that are placed within the GCMC region throughout the simulation. This shows that ligand 1 has a much more favourable interaction with water in the 64.0 Å3 GCMC region set up at the beginning. So lets look at what this means.

GCMC attempts to insert and delete water molecules within the defined region, throughout the simulation. This allows the water molecule to be inserted or displaced, according to the equilibrium state of the protein-ligand complex.

Now lets look at the sampling of the GCMC water molecule at different λ values.

python $PROTOMSHOME/tools/ -f out_comb_gcap/lam-0.000/results -s solventson
python $PROTOMSHOME/tools/ -f out_comb_gcap/lam-0.533/results -s solventson
python $PROTOMSHOME/tools/ -f out_comb_gcap/lam-1.000/results -s solventson
This will show the water occupancy of the GCMC box through the simulation for the two ligands; ONE, an intermediate ligand between ONE and THR, and ligand THR.


This shows that the water occupancy with ligand ONE is one, and ligand THR is zero, although this value samples due to the insertion and deletion moves. For the intermediate ligand, the water occupancy exchanges frequently, and samples both occupancies well.

This can be visualised by taking a bit of each simulation and stitching it together to view as a trajectory. This can be done using the command below.

for file in out_comb_gcap/lam-*/all.pdb; do sed "/MODEL       31/q" $file; done > simulation.pdb
For a longer or shorter trajectory, increase or decrease the number following MODEL. PyMOL can handle the trajectory as it is, but VMD needs a constant number of water molecules in each frame to read in the trajectory, so we need to modify the input before visualising:
python $PROTOMSHOME/tools/ -i simulation.pdb -o vmd_simulation.pdb -n 2
vmd vmd_simulation.pdb
Where -n is the maximum number of water molecules expected, which has been determined from the solventson plots above.

This trajectory should show the perturbation from ligand THR to ligand ONE, with the C-H group shrinking in. The carbon atom will be perturbed to a nitrogen, but as it is the atom parameters and not the atom name that is updated, the atom does not appear to change to nitrogen. This trajectory should show that as the group is shrunk, the GCMC water molecule (resname WA1) inserts as the group becomes smaller.

The surface-GCAP simulation should take a similar amount of time as the single or dual topology simulation, but will require more processors to run and output more files.

The free energy of the perturbation from the surface can be calculated. As there are multiple B values, the volume is required.

python $PROTOMSHOME/tools/ -d out_comb_gcap/ -l 0.25 --estimators gcap --volume 64.
This gives a free energy difference of -94.14 kcal mol-1. This can be combined with the free leg of the single topology leg of the simulation from the previous section (-90.18) to give a relative binding free energy of -4.0 kcal mol-1. This is consistent with the result calculated with only a single B value.

The free energy surface and the water occupancy surface can be generated using the following command.

python $PROTOMSHOME/tools/ -d out_comb_gcap/ -l 0.25 -v 64. 
Again, other esimators can be used. These are the two-dimensional form of the free energy surface generated in the previous section. The water occupancy surface can show us the occupancy of the GCMC region at various combinations of λ and B values. Looking at the equilibrium B value (-9.7) we can see again the water occupancy is 1 for ligand ONE, and ~0 for ligand THR.


To calculate the free energy only at a particular B value, this can be done using:

python $PROTOMSHOME/tools/ -d out_comb_gcap/ -l 0.25 --subdir b_-9.710 --estimators mbar
python $PROTOMSHOME/tools/ -d out_comb_gcap/ -l 0.25 --subdir b_-17.710 --estimators mbar
Or any other one-dimensional free energy method. The --subdir points to the directory containing the B value of interest. The first command should give results consistent to the surface-GCAP results, and the results from the previous section, where the protein-ligand complex should be correctly hydrated at each stage. This gives a result of -94.28 kcal mol-1.

The second command is looking at the lowest B value, where looking at the water occupancy surface at B=-17.71, the GCMC region will be empty of waters. This gives a free energy of -91.43, which when combined with the free energy leg gives a relative binding free energy of -1.3 kcal mol-1. This shows that ligand ONE is more tightly bound, even without the water molecule, but the water molecule is able to stabilise its binding relative to ligand THR by 2.7 kcal mol-1.

The binding free energy of the water molecule itself can be calculated using GCI at λ=0 and λ=1.

python $PROTOMSHOME/tools/ -l 0.25 -v 64. -d out_comb_gcap/lam-0.000/
python $PROTOMSHOME/tools/ -l 0.25 -v 64. -d out_comb_gcap/lam-1.000/

This calculates the binding free energy of the water molecule with ligand ONE (λ = 1):

Number of Waters   Insertion Free Energy   Network Binding Free Energy   Water Binding Free Energy   
0                   0.000 +- 0.000          0.000 +- 0.000                0.000 +- 0.000          
1                  -8.883 +- 0.000         -2.683 +- 0.000               -2.683 +- 0.000  

Where the binding free energy of the water molecule is -2.683 kcal mol-1, which is consistent with the 2.7 stabilisation of ligand ONE on inclusion of the water molecule.