prz.pdb - the PDB structure of the small molecule PRZ
protein.pdb - the PDB structure of the MUP-I
Both the protein and ligand have been protonated, and the protein atoms are in the AMBER naming scheme. The ligand is in its crystallographic pose. There is one cysteine bridge between residues 64 and 157, which have been labelled as CYX. You can read more about setting up structures here.
python $PROTOMSHOME/protoms.py -s sampling -p protein.pdb -l prz.pdb
which automatically solvated the system, and using GAFF, generated the forcefield and topolographic information of the ligand. By entering
-s sampling , we are telling
protoms.py that we wish to sample configurations of the system.
The solvated system should look like this:
The command file
run_bnd.cmd was also generated by
protoms.py; it summarises everything ProtoMS needs to know about running a simulation. At the bottom of
dump 100000 results write results dump 100000 pdb all solvent=all file=all.pdb standard dump 100000 restart write restart dump 100000 averages reset chunk equilibrate 5000000 solvent=774 protein=213 solute=14 volume=0 chunk simulate 40000000 solvent=774 protein=213 solute=14 volume=0
The lines starting with
chunk tells ProtoMS what it should be sampling, and whether it should be recording data. The
chunk equilibrate 5000000 means that while 5,000,000 sequential system configurations will be generated, data will not be recorded over this period. Following that,
chunk simulate 40000000 means that 40,000,000 snap-shots will be generated, and that data will be recorded. This is the only difference between the
simulate commands; they both sample the system, but only one can output data. The benefit of using
equilibrate is that while the system is relaxing, the data is discarded without taking up any disk space. As
chunk simulate comes after
chunk equilibrate, the last frame from the equilibration run will be used as the start of the "simulate"
The lines starting with
dump relate to
chunk simulate, as they tell ProtoMS the frequency with which it will output the various files while it is simulating; these are the PDBs of the structures, the restart files, as well as the frequency with which cumlative averages (such as the total energy) will be reset.
As mentioned, there is one cysteine bridge in MUP-I. This needs to be constrained as their sampling is not yet properly coded for in ProtoMS. At the top of the
protein_pms.pdb file, this is achieved via the following lines:
REMARK chunk fixbackbone 1 63, 65, 156 REMARK chunk fixresidues 1 64, 157which will also constrain the backbones of the residues immediately adjacent to the cysteine bridge.
$PROTOMSHOME/protoms3 run_bnd.cmdto simulate the system. This will use one processor and take at least 5 hours to complete.
run_bnd.cmd. This was done by changing
chunk simulate 40000000 to
chunk simulate 80000000 in
To view a movie of the simulation, you can use VMD:
vmd all.pdbAll there is to see is a lot of atoms wiggling and jiggling. What we're really interested in is how the energies of the system change as the simulation progresses. For instance, type
python $PROTOMSHOME/tools/calc_series.py -f out_bnd/results -s total inter/protein1-solvent/sum inter/solvent-solvent/sum inter/prz-solvent/sum -p subwhich, due to the
-s flag, will analyse the total energy (
total), solvent-solvent interaction energy (
inter/solvent-solvent/sum), protein-solvent interaction energy (
inter/protein1-solvent/sum), and the ligand-solvent interaction energy (
-p sub flag indicates we'd like the plots of energies printed as sub-plots, which, for our 80M move simulation look like
dump commands in
run_bnd), 700 snapshots correspond to 70M Monte Carlo moves.
What about equilibration? A necessary, but often insufficient, condition for equilibrium is
"A quantity is said to be at equilibrium if it does not change with time."
Therefore, it appears as if the system composed of water, MUP-I and the ligand PRZ does not reach equilibrium until around 70M moves. Will this energy drift affect free energy calculations? Some insight can be gained by looking at the plots of the other energies we produced, namely the protein-solvent interaction energy (yellow, bottom left). This quantity accounts for much for the decrease in the total energy. The solvent-solvent energy also appears, by eye, to also decrease with time. In contrast, the ligand-solvent energy appears roughly constant over the entirety of the simulation. Before reaching any conclusions about the equilibration of the ligand, let's have a look at the ligand-protein interaction energy. So that you can gain some experience with the interactive mode of
python $PROTOMSHOME/tools/calc_series.py -f resultsand, type
inter/protein1-prz1/sum and hit the enter key. (Note that you can analyse the Lennard-Jones (
lj) and Coulomb (
cou) energies separately.) This will produce the ligand-protein interaction energy. In our simulation, we find
Like the ligand-solvent energy, this quantity appears to vary around a constant value.
We can also have a look at internal energies of the protein and ligand
python $PROTOMSHOME/tools/calc_series.py -f results -s intra/protein1/sum intra/prz/sum -p sub
The keen eyed among you would have noticed that one every plot is a verticle dashed line, and every time
calc_series is called produces output like
Equilibration found at snapshot 543 for prz_sum, value=74.653 This production part is estimated to contain 20 uncorrelated samples (g=12.472) The number of samples is maximized at 485, g=13.850 and the number of uncorrelated samples is 22This occurs because calc_series is automatically trying to find when a quantity reaches equilibrium. The dashed verticle line and first line of the output are done via a hypothesis test on whether the line is "flat" or not. Data to the left of the dashed line is estimated to be "unequilibrated", and data to the right is estimated to be "equilibrated". By trying to maximise the number of uncorrelated samples, a second estimate of the equilibration point is also produced. While these tools were designed to determine the points of equilibria objectively, they sometimes disagree with human intuition. Have another look at the plots above and see if you agree with the algorithm as to where to equilibration point is.
protoms.py created the following files:
water.pdb = the droplet of solvent water that surrounds the protein and ligand
protein_pms.pdb = the PDB structure of the protein in the ProtoMS naming scheme
prz.tem = the forcefield template file of the ligand PRZ, which contains move information
run_bnd.cmd = the command file that tells ProtoMS what to simulate and how
prx_box.pdb = cubic box of water around the ligand
prz.zmat = the z-matrix of the ligand, used to generate the ligand moves in Monte Carlo
prz.prepi = the AMBER prepi file of the ligand created by antechamber
prz.frcmod = the AMBER frcmod file that is also created by antechamber
prz.tem tells ProtoMS the partial charges and Lennard-Jones parameters for each atom in the ligand. It incorporates all the information present in
$PROTOMSHOME/tools. Here, we will set our simulation up from scratch.
python $PROTOMSHOME/tools/solvate.py -pr protein.pdb -g droplet -b $PROTOMSHOME/data/wbox_tip4p.pdb -o water.pdb
to make the sphere of solvent in
water.pdb. The default radius of the sphere is 30 Å, which is fine in this case. A different radius can be specified by the
It is important to consider that
solvate.py, which is also called by
protoms.py, adds water molecules randomly to the system based on a pre-equilibrated box of water. Water molecules that have been added but sterically clash with the protein or ligand are automatically removed. While this works okay in most situations, MUP-I has a very hydrophobic cavity, meaning that one should think very carefully about if and where one should add waters. With MUP-I, it is prudent to remove all the waters that have been added by
solvate.py that are in the binding site. To do so, we will start by creating a box around the ligand that is at least 3 Å away from it
python $PROTOMSHOME/tools/make_gcmcbox.py -s prz.pdb -p 3 -o box.pdbcreating
box.pdb, which can be visualsed in a molecular viewer, such as VMD. Any water molecules in
water.pdb that are inside
box.pdb can be removed
python $PROTOMSHOME/tools/clear_gcmcbox.py -b box.pdb -s water.pdb -o water_clr.pdbcreating
To set-up the ligand from scratch, we use the GAFF forcefield and AMBERTOOLS. These are called via
python $PROTOMSHOME/tools/ambertools.py -f prz.pdb -c 0where
-c 0 tells AMBERTOOLS that PRZ has no net charge. This has created
prz.frcmod. To use these to generate the template file for ProtoMS, type
python $PROTOMSHOME/tools/build_template.py -n PRZ -p prz.prepi -f prz.frcmod -o prz.temwhich creates
Before proceeding, the protein atom names have to be adapted to the ProtoMS convention. This is done with the command
python $PROTOMSHOME/tools/convertatomnames.py -p protein.pdb -c $PROTOMSHOME/data/atomnamesmap.datThe files above need to be reference by the ProtoMS command file, created using
python $PROTOMSHOME/tools/generate_input.py -p protein_pms.pdb -l prz.pdb -t prz.tem -pw water_clr.pdb -s sampling -o run_bnd.cmd --outfolder out_bnd --nprod 80000000 --nequil 5000000creating
run_bnd.cmd. Note that here we specify how long we want our equilibration and simulation time to be. If you recall, the data from the equilibration moves are not recorded. In the above, we specified that we want 5M moves of equilibration, and 80M moves for the production step. Given that it took roughly 70M moves for the total energy to equilibrate, if that were our quantity of interest, a much longer equilibration run would be necessary.
Finally, because there is a cysteine bridge in MUP-I, we need to add
REMARK chunk fixbackbone 1 63, 65, 156 REMARK chunk fixresidues 1 64, 157to the
REMARK section of the