Setting up structures to use with ProtoMS

This short document helps you prepare ligand and protein structure so that they can be used by the ProtoMS setup tools. It is not intended as a complete guide but contains references to tools and programs that can aid in the setup.

Ligand setup

Source of structure
If you are setting up a protein-ligand simulation, you will typically find the ligand or a related ligand bound to the protein. Then you can extract the coordinates of the ligand from the PDB files.

If this is not the case you have to create the structure manually. One option is to draw in a 2D drawing program like Marvin. Such tools can typically convert it to a reasonable 3D structure. If it is a common ligand you can retrieve the SMILE string from ChemSpider and paste the SMILE string into your drawing program. Alternatively, you can paste the SMILE string into a web-service that directly converts it to a 3D structure. One such site is Cactus, available at
Next, you need to protonate the ligand. This can be done with several programs such as Ot is important to consider the environment that surrounds the ligand. If it is surrounded by charged residues or co-factors the ligand can become charged. Therefore, it is important to check this prior to protonation as most of the software package above does not account for this.

It is also important to consider tautomeric states. Marvin Sketch can suggest the appropriate tautomer at a specific pH from a set of rules. However, be careful to check this.

It is recommended to minimize the ligand because typically bonds and only a limited number of angles are sampled during the Monte Carlo simulation. This can be done with program in the AmberTools package. Below, we assume that the files ligand.pdb exists and that the files ligand.prepi and ligand.frcmod has been created previously. The latter can be done with for instance.

First, we will set the name of the ligand to an environment variable so that we can use more general scripts

export name=ligand
then we will create a script for tleap and execute it, thereby creating Amber prmtop and prmcrd files.
cat << EOF >
source leaprc.gaff
loadamberprep ${name}.prepi
loadamberparams ${name}.frcmod
x=loadpdb ${name}.pdb
saveamberparm x prmtop prmcrd

tleap -f >& leap.out
next we will create input to the MD program sander and run a short 100 step minimization.
cat << EOF >

sander -O -i -o min.out -p prmtop -c prmcrd -r mincrd
the minimized coordinates are found in mincrd. To convert this to a PDB file we can use
ambpdb -p prmtop < mincrd > mincrd.pdb 
Finally, if you want to put the minimized coordinates back into your original PDB file you can type the following
header=$(sed -n '1p' ${name}.pdb)
echo $header > ${name}.pdb
sed "/REMARK/d" mincrd.pdb >> ${name}.pdb

Protein setup

Missing residues
It is common for protein structures to be incomplete. This is because certain protein regions can be extremely flexible, causing them to produce a diffuse diffraction pattern, rendering them unresolved. There are techniques available to tackle this problem. One such technique is homology modeling.

There are online tools that enable the addition of missing residues to your protein structure, for example Modloop.
After examining the PDB file and its content, missing hydrogens can be added to the protein structure. There are several concepts that one must consider when performing protonation. These include: There are several software packages that are able to protonate a protein. These are: After adding hydrogens it is always a good idea to use visual inspection to validate whether the protonation is reasonable.
A simple checklist
This is a list that you go through whenever you setup a new protein. You need to use some kind of visualization program to look at your protein residues. It is a good idea to save separate files during the setup, instead of modifying the original pdb-file from the Proteinbank (use appropriate names)
  1. Start with a pdb-file from the Protein Databank
  2. Read the article that describe the structure. Read all the initial rows in the pdb-file that occurs before the actual coordinates. Make notes on missing residues and crystallization conditions (e.g. pH)
  3. If there are ligands or co-factor that needs to be included, parameterize them.
  4. Determine the protonation state of charged residues
    Usually Asp, Glu, Arg, Lys and Cys residues have their normal protonation state at pH 7. Exceptions can be:
    Is the residue buried without any ion-pair partner?
    Is the residue bound to a metal ion?
    Is the residue involved in interactions with ligands or co-factors?
    You need to check this for buried residues.
  5. Determine the protonate state of histidine residues
    His residues are special and have typically one three protonation states: 1) protonated on the ND1 atom, 2) protonated on the NE2 atom, or 3) protonated on both positions.
    This is determined by the environment
    Try to avoid double protonated residues, as this increase the total charge of the protein
    NE2 protonated residues are most common, so use this as a default
    Look at all residues within 3A of the ring and try to located hydrogen-bonding partners
    Look out for ring flips. Sometimes the crystallographer swaps N and C atoms in the ring.
    Try a few prediction tools and see what they suggest
    Rename residues: HID (ND1), HIE (NE2), or HIP (both).
  6. Find residues with alternative conformations.
    You can only keep one in the simulation, so pick one with the highest occupancy (or any if the occupancy is 0.5).
  7. Locate cystein-cystein bridges.
    Change their names to CYX and make a note of their residue numbers.
  8. Remove buffer molecules
  9. Optionally, remove the ligand and co-factors (you can add them back later)
  10. Add hydrogens to the system using leap, e.g.
  cat << EOF >
  source leaprc.ff99SB  
  x=loadpdb protein.pdb
  savepdb x protein_protonated.pdb
  saveamberparm x prmtop prmcrd
  tleap -f  
for further reading, please consult the latest Amber manual at
Sometimes it is a good idea to minimize the protein before starting the Monte Carlo simulation. If you for instance modelled in a large bit of undefined residues. The procedure below assumes that the name of the PDB file is protein.pdb

First, we will set the name of the ligand to an environment variable so that we can use more general scripts

export name=protein
then we will create a script for tleap and execute it, thereby creating Amber prmtop and prmcrd files.
cat << EOF >
source leaprc.ff99SB
x=loadpdb ${name}.pdb
saveamberparm x prmtop prmcrd

tleap -f >& leap.out
The rest of the procedure follows from the minimization of ligands shown above.