Essex Research Group

Computational Simulation of Biomolecular Systems


Water plays an important role in mediating protein-ligand interactions. Water rearrangement upon a ligand binding or modification can be very slow and beyond typical timescales used in molecular dynamics (MD) simulations. Thus, inadequate sampling of slow water motions in MD simulations often impairs the accuracy of the accuracy of ligand binding free energy calculations. Previous studies suggest grand canonical Monte Carlo (GCMC) outperforms normal MD simulations for water sampling, thus GCMC has been applied to help improve the accuracy of ligand binding free energy calculations. However, in prior work we observed protein and/or ligand motions impaired how well GCMC performs at water rehydration, suggesting more work is needed to improve this method to handle water sampling. In this work, we applied GCMC in 21 protein-ligand systems to assess the performance of GCMC for rehydrating buried water sites. While our results show that GCMC can rapidly rehydrate all selected water sites for most systems, it fails in five systems. In most failed systems, we observe protein/ligand motions, which occur in the absence of water, combine to close water sites and block instantaneous GCMC water insertion moves. For these five failed systems, we both extended our GCMC simulations and tested a new technique named grand canonical nonequilibrium candidate Monte Carlo (GCNCMC). GCNCMC combines GCMC with the nonequilibrium candidate Monte Carlo (NCMC) sampling technique to improve the probability of a successful water insertion/deletion. Our results show that GCNCMC and extended GCMC can rehydrate all target water sites for three of the five problematic systems and GCNCMC is more efficient than GCMC in two out of the three systems. In one system, only GCNCMC can rehydrate all target water sites, while GCMC fails. Both GCNCMC and GCMC fail in one system. This work suggests this new GCNCMC method is promising for water rehydration especially when protein/ligand motions may block water insertion/removal.

A novel variant in GATM causes idiopathic renal Fanconi syndrome and predicts progression to end-stage kidney disease

E. G. Seaby, S. Turner, D. J. Bunyan, F. Seyed-Rezai, J. W. Essex, R. D. Gilbert & S. Ennis (2022)

Renal Fanconi syndrome (RFS) is a generalised disorder of the proximal convoluted tubule. Many genes have been associated with RFS including those that cause systemic disorders such as cystinosis, as well as isolated RFS. We discuss the case of a 10-year-old female who presented with leg pain and raised creatinine on a screening blood test. Her mother has RFS and required a kidney transplant in her thirties. Further investigations confirmed RFS in the daughter. Exome sequencing was performed on the affected mother, child, and unaffected father. We identified a novel variant in GATM; c.965G>C p.(Arg322Pro) segregating dominantly in the mother and daughter. We validated our finding with molecular dynamics simulations and demonstrated a dynamic signature that differentiates our variant and two previously identified pathogenic variants in GATM from wildtype. Genetic testing has uncovered a novel pathogenic variant that predicts progression to end stage kidney failure and has important implications for family planning and cascade testing. We recommend that GATM is screened for in children presenting with RFS, in addition to adults, particularly with kidney failure, who may have had previous negative gene testing.

Hinge disulfides in human IgG2 CD40 antibodies modulate receptor signaling by regulation of conformation and flexibility

C. M. Orr, H. Fisher, X. Yu, C. H.-T. Chan, Y. Gao, P. J. Duriez, S. G. Booth, I. Elliott, T. Inzhelevskaya, I. Mockridge, C. A. Penfold, A. Wagner, M. J. Glennie, A. L. White, J. W. Essex, A. R. Pearson, M. S. Cragg & I. Tews (2022)

Monoclonal antibodies (mAbs) are powerful therapeutic agents. The agonistic activity of mAbs is dependent on the epitope recognized as well as the isotype. For human IgG2 mAbs, the structure of the hinge region is important for modulating agonistic activity, but the underlying mechanism is unclear. Orr et al. studied the structure and agonistic activity of a series of hIgG2 anti-CD40 cysteine to serine exchange variants. Agonistic activity varied depending on the pattern of disulfide bonds within the IgG2 hinge region, with more agonistic antibodies featuring a disulfide crossover, which directly affected the flexibility and the conformation of the antibody. mAbs with less flexible hinge regions had fewer conformations, which enabled increased receptor agonism. These findings highlight the importance of hinge variation in modulating antibody activity.

Higher Affinity Antibodies Bind With Lower Hydration and Flexibility in Large Scale Simulations

M. T. Y. Wong, S. Kelm, X. Liu, R. D. Taylor, T. Baker & J. W. Essex (2022)

We have carried out a long-timescale simulation study on crystal structures of nine antibody-antigen pairs, in antigen-bound and antibody-only forms, using molecular dynamics with enhanced sampling and an explicit water model to explore interface conformation and hydration. By combining atomic level simulation and replica exchange to enable full protein flexibility, we find significant numbers of bridging water molecules at the antibody-antigen interface. Additionally, a higher proportion of interactions excluding bulk waters and a lower degree of antigen bound CDR conformational sampling are correlated with higher antibody affinity. The CDR sampling supports enthalpically driven antibody binding, as opposed to entropically driven, in that the difference between antigen bound and unbound conformations do not correlate with affinity. We thus propose that interactions with waters and CDR sampling are aspects of the interface that may moderate antibody-antigen binding, and that explicit hydration and CDR flexibility should be considered to improve antibody affinity prediction and computational design workflows.

Enhancing Ligand and Protein Sampling Using Sequential Monte Carlo

M. Suruzhon, M. S. Bodnarchuk, A. Ciancetta, I. D. Wall & J. W. Essex (2022)

The sampling problem is one of the most widely studied topics in computational chemistry. While various methods exist for sampling along a set of reaction coordinates, many require system-dependent hyperparameters to achieve maximum efficiency. In this work, we present an alchemical variation of adaptive sequential Monte Carlo (SMC), an irreversible importance resampling method that is part of a well-studied class of methods that have been used in various applications but have been underexplored in computational biophysics. Afterward, we apply alchemical SMC on a variety of test cases, including torsional rotations of solvated ligands (butene and a terphenyl derivative), translational and rotational movements of protein-bound ligands, and protein side chain rotation coupled to the ligand degrees of freedom (T4-lysozyme, protein tyrosine phosphatase 1B, and transforming growth factor β). We find that alchemical SMC is an efficient way to explore targeted degrees of freedom and can be applied to a variety of systems using the same hyperparameters to achieve a similar performance. Alchemical SMC is a promising tool for preparatory exploration of systems where long-timescale sampling of the entire system can be traded off against short-timescale sampling of a particular set of degrees of freedom over a population of conformers.

Enhancing Sampling of Water Rehydration on Ligand Binding: A Comparison of Techniques

Y. Ge, D. C. Wych, M. L. Samways, M. E. Wall, J. W. Essex & D. L. Mobley (2022)

Water often plays a key role in protein structure, molecular recognition, and mediating protein–ligand interactions. Thus, free energy calculations must adequately sample water motions, which often proves challenging in typical MD simulation time scales. Thus, the accuracy of methods relying on MD simulations ends up limited by slow water sampling. Particularly, as a ligand is removed or modified, bulk water may not have time to fill or rearrange in the binding site. In this work, we focus on several molecular dynamics (MD) simulation-based methods attempting to help rehydrate buried water sites: BLUES, using nonequilibrium candidate Monte Carlo (NCMC); grand, using grand canonical Monte Carlo (GCMC); and normal MD. We assess the accuracy and efficiency of these methods in rehydrating target water sites. We selected a range of systems with varying numbers of waters in the binding site, as well as those where water occupancy is coupled to the identity or binding mode of the ligand. We analyzed the rehydration of buried water sites in binding pockets using both clustering of trajectories and direct analysis of electron density maps. Our results suggest both BLUES and grand enhance water sampling relative to normal MD and grand is more robust than BLUES, but also that water sampling remains a major challenge for all of the methods tested. The lessons we learned for these methods and systems are discussed.

Water molecules play important roles in all biochemical processes. Therefore, it is of key importance to obtain information of the structure, dynamics, and thermodynamics of water molecules around proteins. Numerous computational methods have been suggested with this aim. In this study, we compare the performance of conventional and grand-canonical Monte Carlo (GCMC) molecular dynamics (MD) simulations to sample the water structure, as well GCMC and grid-based inhomogeneous solvation theory (GIST) to describe the energetics of the water network. They are evaluated on two proteins: the buried ligand-binding site of a ferritin dimer and the solvent-exposed binding site of galectin-3. We show that GCMC/MD simulations significantly speed up the sampling and equilibration of water molecules in the buried binding site, thereby making the results more similar for simulations started from different states. Both GCMC/MD and conventional MD reproduce crystal-water molecules reasonably for the buried binding site. GIST analyses are normally based on restrained MD simulations. This improves the precision of the calculated energies, but the restraints also significantly affect both absolute and relative energies. Solvation free energies for individual water molecules calculated with and without restraints show a good correlation, but with large quantitative differences. Finally, we note that the solvation free energies calculated with GIST are ∼5 times larger than those estimated by GCMC owing to differences in the reference state.

How well does molecular simulation reproduce environment-specific conformations of the intrinsically disordered peptides PLP, TP2 and ONEG

L. M. Reid, I. Guzzetti, T. Svensson, A-C. Carlsson, W. Su, T. Leek, L. von Sydow, W. Czechtizky, M. Miljak, C. Verma, L. De Maria & J. W. Essex (2022)

Understanding the conformational ensembles of intrinsically disordered proteins and peptides (IDPs) in their various biological environments is essential for understanding their mechanisms and functional roles in the proteome, leading to a greater knowledge of, and potential treatments for, a broad range of diseases. To determine whether molecular simulation is able to generate accurate conformational ensembles of IDPs, we explore the structural landscape of the PLP peptide (an intrinsically disordered region of the proteolipid membrane protein) in aqueous and membrane-mimicking solvents, using replica exchange with solute scaling (REST2), and examine the ability of four force fields (ff14SB, ff14IDPSFF, CHARMM36 and CHARMM36m) to reproduce literature circular dichroism (CD) data. Results from variable temperature (VT) 1H and Rotating frame Overhauser Effect SpectroscopY (ROESY) nuclear magnetic resonance (NMR) experiments are also presented and are consistent with the structural observations obtained from the simulations and CD. We also apply the optimum simulation protocol to TP2 and ONEG (a cell-penetrating peptide (CPP) and a negative control peptide, respectively) to gain insight into the structural differences that may account for the observed difference in their membrane-penetrating abilities. Of the tested force fields, we find that CHARMM36 and CHARMM36m are best suited to the study of IDPs, and accurately predict a disordered to helical conformational transition of the PLP peptide accompanying the change from aqueous to membrane-mimicking solvents. We also identify an α-helical structure of TP2 in the membrane-mimicking solvents and provide a discussion of the mechanistic implications of this observation with reference to the previous literature on the peptide. From these results, we recommend the use of CHARMM36m with the REST2 protocol for the study of environment-specific IDP conformations. We believe that the simulation protocol will allow the study of a broad range of IDPs that undergo conformational transitions in different biological environments.

Atomistic models provide a detailed representation of molecular systems, but are sometimes inadequate for simulations of large systems over long timescales. Coarse-grained models enable accelerated simulations by reducing the number of degrees of freedom, at the cost of reduced accuracy. New optimisation processes to parameterise these models could improve their quality and range of applicability. We present an automated approach for the optimisation of coarse-grained force fields, by reproducing free energy data derived from atomistic molecular simulations. To illustrate the approach, we implemented hydration free energy gradients as a new target for force field optimisation in ForceBalance and applied it successfully to optimise the un-charged side-chains and the protein backbone in the SIRAH protein coarse-grain force field. The optimised parameters closely reproduced hydration free energies of atomistic models and gave improved agreement with experiment.

Generation of Quantum Configurational Ensembles Using Approximate Potentials

J. Morado, P. N. Mortenson, J. W. M. Nissink, M. L. Verdonk, R. A. Ward, J. W. Essex & C-K Skylaris (2021)

Conformational analysis is of paramount importance in drug design: it is crucial to determine pharmacological properties, understand molecular recognition processes, and characterize the conformations of ligands when unbound. Molecular Mechanics (MM) simulation methods, such as Monte Carlo (MC) and molecular dynamics (MD), are usually employed to generate ensembles of structures due to their ability to extensively sample the conformational space of molecules. The accuracy of these MM-based schemes strongly depends on the functional form of the force field (FF) and its parametrization, components that often hinder their performance. High-level methods, such as ab initio MD, provide reliable structural information but are still too computationally expensive to allow for extensive sampling. Therefore, to overcome these limitations, we present a multilevel MC method that is capable of generating quantum configurational ensembles while keeping the computational cost at a minimum. We show that FF reparametrization is an efficient route to generate FFs that reproduce QM results more closely, which, in turn, can be used as low-cost models to achieve the gold standard QM accuracy. We demonstrate that the MC acceptance rate is strongly correlated with various phase space overlap measurements and that it constitutes a robust metric to evaluate the similarity between the MM and QM levels of theory. As a more advanced application, we present a self-parametrizing version of the algorithm, which combines sampling and FF parametrization in one scheme, and apply the methodology to generate the QM/MM distribution of a ligand in aqueous solution.

Rimantadine Binds to and Inhibits the Influenza A M2 Proton Channel without Enantiomeric Specificity

J. L. Thomaston, M. L. Samways, A. Konstantinidi, C. Ma, Y. Hu, H. E. Bruce Macdonald, J. Wang, J. W. Essex, W. F. DeGrado & A. Kolocouris (2021)

The influenza A M2 wild-type (WT) proton channel is the target of the anti-influenza drug rimantadine. Rimantadine has two enantiomers, though most investigations into drug binding and inhibition have used a racemic mixture. Solid-state NMR experiments using the full length-M2 WT have shown significant spectral differences that were interpreted to indicate tighter binding for (R)- vs (S)-rimantadine. However, it was unclear if this correlates with a functional difference in drug binding and inhibition. Using X-ray crystallography, we have determined that both (R)- and (S)-rimantadine bind to the M2 WT pore with slight differences in the hydration of each enantiomer. However, this does not result in a difference in potency or binding kinetics, as shown by similar values for kon, koff, and Kd in electrophysiological assays and for EC50 values in cellular assays. We concluded that the slight differences in hydration for the (R)- and (S)-rimantadine enantiomers are not relevant to drug binding or channel inhibition. To further explore the effect of the hydration of the M2 pore on binding affinity, the water structure was evaluated by grand canonical ensemble molecular dynamics simulations as a function of the chemical potential of the water. Initially, the two layers of ordered water molecules between the bound drug and the channel’s gating His37 residues mask the drug’s chirality. As the chemical potential becomes more unfavorable, the drug translocates down to the lower water layer, and the interaction becomes more sensitive to chirality. These studies suggest the feasibility of displacing the upper water layer and specifically recognizing the lower water layers in novel drugs.

Water molecules at protein-drug interfaces: computational prediction and analysis methods

M. L. Samways, R. D. Taylor, H. E. Bruce Macdonald & J. W. Essex (2021)

The fundamental importance of water molecules at drug–protein interfaces is now widely recognised and a significant feature in structure-based drug design. Experimental methods for analysing the role of water in drug binding have many challenges, including the accurate location of bound water molecules in crystal structures, and problems in resolving specific water contributions to binding thermodynamics. Computational analyses of binding site water molecules provide an alternative, and in principle complete, structural and thermodynamic picture, and their use is now commonplace in the pharmaceutical industry. In this review, we describe the computational methodologies that are available and discuss their strengths and weaknesses. Additionally, we provide a critical analysis of the experimental data used to validate the methods, regarding the type and quality of experimental structural data. We also discuss some of the fundamental difficulties of each method and suggest directions for future study.

Computational Methods and Tools in Antimicrobial Peptide Research

P. G. A. Aronica, L. M. Reid, N. Desai, J. Li, S. J. Fox, S. Yadahalli, J. W. Essex & C. S. Verma (2021)

The evolution of antibiotic-resistant bacteria is an ongoing and troubling development that has increased the number of diseases and infections that risk going untreated. There is an urgent need to develop alternative strategies and treatments to address this issue. One class of molecules that is attracting significant interest is that of antimicrobial peptides (AMPs). Their design and development has been aided considerably by the applications of molecular models, and we review these here. These methods include the use of tools to explore the relationships between their structures, dynamics, and functions and the increasing application of machine learning and molecular dynamics simulations. This review compiles resources such as AMP databases, AMP-related web servers, and commonly used techniques, together aimed at aiding researchers in the area toward complementing experimental studies with computational approaches.

Coarse-Grained Molecular Dynamics Simulations of Membrane Proteins: A Practical Guide

W. G. Glass, J. W. Essex, F. Fraternali, J. Gebbie-Rayet, I. Marzuoli, M. L. Samways, P. C. Biggin & S. Khalid (2021)

Current computer architectures, coupled with state-of-the-art molecular dynamics simulation software, facilitate the in-depth study of large biomolecular systems at high levels of detail. However, biological phenomena take place at various time and length scales and as a result a multiscale approach must be adopted. One such approach is coarse-graining, where biochemical accuracy is sacrificed for computational efficiency. Here, we present a practical guide to setting up and carrying out coarse-grained molecular dynamics simulations.

ParaMol: A Package for Automatic Parameterization of Molecular Mechanics Force Fields

J. Morado, P. N. Mortenson, M. L. Verdonk, R. A. Ward, J. W. Essex & C-K Skyarlis (2021)

The ensemble of structures generated by molecular mechanics (MM) simulations is determined by the functional form of the force field employed and its parameterization. For a given functional form, the quality of the parameterization is crucial and will determine how accurately we can compute observable properties from simulations. While accurate force field parameterizations are available for biomolecules, such as proteins or DNA, the parameterization of new molecules, such as drug candidates, is particularly challenging as these may involve functional groups and interactions for which accurate parameters may not be available. Here, in an effort to address this problem, we present ParaMol, a Python package that has a special focus on the parameterization of bonded and nonbonded terms of druglike molecules by fitting to ab initio data. We demonstrate the software by deriving bonded terms’ parameters of three widely known drug molecules, viz. aspirin, caffeine, and a norfloxacin analogue, for which we show that, within the constraints of the functional form, the methodologies implemented in ParaMol are able to derive near-ideal parameters. Additionally, we illustrate the best practices to follow when employing specific parameterization routes. We also determine the sensitivity of different fitting data sets, such as relaxed dihedral scans and configurational ensembles, to the parameterization procedure, and discuss the features of the various weighting methods available to weight configurations. Owing to ParaMol’s capabilities, we propose that this software can be introduced as a routine step in the protocol normally employed to parameterize druglike molecules for MM simulations.

Sensitivity of Binding Free Energy Calculations to Initial Protein Crystal Structure

M. Suruzhon, M. S. Bodnarchuk, A. Ciancetta, R. Viner, I. D. Wall & J. W. Essex (2021)

Binding free energy calculations using alchemical free energy (AFE) methods are widely considered to be the most rigorous tool in the computational drug discovery arsenal. Despite this, the calculations suffer from accuracy, precision, and reproducibility issues. In this publication, we perform a high-throughput study of more than a thousand AFE calculations, utilizing over 220 μs of total sampling time, on three different protein systems to investigate the impact of the initial crystal structure on the resulting binding free energy values. We also consider the influence of equilibration time and discover that the initial crystal structure can have a significant effect on free energy values obtained at short timescales that can manifest itself as a free energy difference of more than 1 kcal/mol. At longer timescales, these differences are largely overtaken by important rare events, such as torsional ligand motions, typically resulting in a much higher uncertainty in the obtained values. This work emphasizes the importance of rare event sampling and long-timescale dynamics in free energy calculations even for routinely performed alchemical perturbations. We conclude that an optimal protocol should not only concentrate computational resources on achieving convergence in the alchemical coupling parameter (λ) space but also on longer simulations and multiple repeats.

grand: A Python Module for Grand Canonical Water Sampling in OpenMM

M. L. Samways, H. E. Bruce Macdonald & J. W. Essex (2020)

Networks of water molecules can play a critical role at the protein-ligand interface, and can directly influence drug-target interactions. Grand canonical methods aid in the sampling of these water molecules, where conventional molecular dynamics equilibration times are often long, by allowing waters to be inserted and deleted from the system, according to the chemical potential. Here, we present our open source Python module, grand (, which allows molecular dynamics simulations to be performed in conjunction with grand canonical Monte Carlo sampling, using the OpenMM simulation engine. We demonstrate the accuracy of this module by reproducing the density of bulk water observed from constant pressure simulations. Application of this code to the bovine pancreatic trypsin inhibitor protein reproduces three buried crystallographic water sites that are poorly sampled using conventional molecular dynamics.

BRIDGE: An Open Platform for Reproducible High-Throughput Free Energy Simulations

T. Senapathi, M. Suruzhon, C. B. Barnett, J. W. Essex & K. J. Naidoo (2020)

Biomolecular Reaction and Interaction Dynamics Global Environment (BRIDGE) is an open-source web platform developed with the aim to provide an environment for the design of reliable methods to conduct reproducible biomolecular simulations. It is built on the well-known Galaxy bioinformatics platform. Through this, BRIDGE hosts computational chemistry tools on public web servers for internet use and provides machine- and operating-system-independent portability using the Docker container platform for local use. This construction improves the accessibility, shareability, and reproducibility of computational methods for molecular simulations. Here we present integrated free energy tools (or apps) to calculate absolute binding free energies (ABFEs) and relative binding free energies (RBFEs), as illustrated through use cases. We present free energy perturbation (FEP) methods contained in various software packages such as GROMACS and YANK that are independent of hardware configuration, software libraries, or operating systems when ported in the Galaxy-BRIDGE Docker container platform. By performing cyclin-dependent kinase 2 (CDK2) FEP calculations on geographically dispersed web servers (in Africa and Europe), we illustrate that large-scale computations can be performed using the exact same codes and methodology by collaborating groups through publicly shared protocols and workflows. The ease of public sharing and independent reproduction of simulations via BRIDGE makes possible an open review of methods and complete simulation protocols. This makes the discovery of inhibitors for drug targets accessible to nonexperts and the computer experiments that are used to arrive at leads verifiable by experts and reviewers. We illustrate this on β-galactoside α-2,3-sialyltransferase I (ST3Gal-I), a breast cancer drug target, where a combination of RBFE and ABFE methods are used to compute the binding free energies of three inhibitors.

Methicillin‐resistant Staphylococcus aureus (MRSA) tolerates β‐lactam antibiotics by carrying out cell wall synthesis with the transpeptidase Penicillin‐binding protein 2a (PBP2a), which cannot be inhibited by β‐lactams. It has been proposed that PBP2a's active site is protected by two loops to reduce the probability of it binding with β‐lactams. Previous crystallographic studies suggested that this protected active site opens for reaction once a native substrate binds at an allosteric domain of PBP2a. This opening was proposed for the new β‐lactam ceftaroline's mechanism in successfully treating MRSA infections, i. e. by it binding to the allosteric site, thereby opening the active site to inhibition. In this work, we investigate the binding of ceftaroline at this proposed allosteric site using molecular dynamics simulations. Unstable binding was observed using the major force fields CHARMM36 and Amber ff14SB, and free energy calculations were unable to confirm a strong allosteric effect. Our study suggests that the allosteric effect induced by ceftaroline is weak at best.

ProtoCaller: robust automation of binding free energy calculations

M. Suruzhon, T. Senapathi, M. S. Bodnarchuk, R. Viner, I. D. Wall, C. B. Barnett, K. J. Naidoo & J. W. Essex (2020)

ProtoCaller is a Python library distributed through Anaconda which automates relative protein–ligand binding free energy calculations in GROMACS. It links a number of popular specialized tools used to perform protein setup and parametrization, such as PDB2PQR, Modeller, and AmberTools. ProtoCaller supports commonly used AMBER force fields with additional cofactor parameters, and AM1-BCC is used to derive ligand charges. ProtoCaller also comes with an extensive PDB parser, an enhanced maximum common substructure algorithm providing improved ligand–ligand mapping, and a light GROMACS wrapper for running multiple molecular dynamics simulations. ProtoCaller is highly relevant to most researchers in the field of biomolecular simulation, allowing a customizable balance between automation and user intervention.

Preorganization of large, directionally oriented, electric fields inside protein active sites has been proposed as a crucial contributor to catalytic mechanism in many enzymes, and it may be efficiently investigated at the atomistic level with molecular dynamics simulations. Here, we evaluate the ability of the AMOEBA polarizable force field, as well as the additive Amber ff14SB and Charmm C36m models, to describe the electric fields present inside the active site of the peptidyl-prolyl isomerase cyclophilin A. We compare the molecular mechanical electric fields to those calculated with a fully first-principles quantum mechanical (QM) representation of the protein, solvent, and ions, and find that AMOEBA consistently shows far greater correlation with the QM electric fields than either of the additive force fields tested. Catalytically relevant fields calculated with AMOEBA were typically smaller than those observed with additive potentials, but were generally consistent with an electrostatically driven mechanism for catalysis. Our results highlight the accuracy and the potential advantages of using polarizable force fields in systems where accurate electrostatics may be crucial for providing mechanistic insights.

Cell-penetrating peptides (CPPs) offer an exciting approach to tackle the pharmacokinetic challenges associated with the delivery of large, polar molecules to intracellular targets. Since the discovery of the first CPPs in the early 1990s, vast amounts of research have been undertaken to characterise their cellular uptake mechanisms. Despite this, the precise mechanisms of cellular internalisation of many CPPs remain elusive owing to inconsistent experimental results. Molecular dynamics (MD) simulations provide an approach to probe specific aspects of the internalisation process and many published CPP studies have incorporated simulation data. This review provides a critical evaluation of the current approaches that are being used to simulate CPPs interacting with artificial lipid bilayers.

Biomolecular simulations: from dynamics and mechanisms to computational assays of biological activity

Huggins, D. J., Biggin, P. C., Dämgen, M. A., Essex, J. W., Harris, S. A., Henchman, R. H., ... van der Kamp, M. W. (2018)

Biomolecular simulation is increasingly central to understanding and designing biological molecules and their interactions. Detailed, physics‐based simulation methods are demonstrating rapidly growing impact in areas as diverse as biocatalysis, drug delivery, biomaterials, biotechnology, and drug design. Simulations offer the potential of uniquely detailed, atomic‐level insight into mechanisms, dynamics, and processes, as well as increasingly accurate predictions of molecular properties. Simulations can now be used as computational assays of biological activity, for example, in predictions of drug resistance. Methodological and algorithmic developments, combined with advances in computational hardware, are transforming the scope and range of calculations. Different types of methods are required for different types of problem. Accurate methods and extensive simulations promise quantitative comparison with experiments across biochemistry. Atomistic simulations can now access experimentally relevant timescales for large systems, leading to a fertile interplay of experiment and theory and offering unprecedented opportunities for validating and developing models. Coarse‐grained methods allow studies on larger length‐ and timescales, and theoretical developments are bringing electronic structure calculations into new regimes. Multiscale methods are another key focus for development, combining different levels of theory to increase accuracy, aiming to connect chemical and molecular changes to macroscopic observables. In this review, we outline biomolecular simulation methods and highlight examples of its application to investigate questions in biology.

Evaluating anti-CD32b F(ab) conformation using molecular dynamics and small-angle X-ray scattering

Sutton, E. J., Bradshaw, R., Orr, C., Frendéus, B., Larsson, G., Teige, I., ... Essex, J. W. (2018)

Complementary strategies of small-angle x-ray scattering (SAXS) and crystallographic analysis are often used to determine atomistic three-dimensional models of macromolecules and their variability in solution. This combination of techniques is particularly valuable when applied to macromolecular complexes to detect changes within the individual binding partners. Here, we determine the x-ray crystallographic structure of a F(ab) fragment in complex with CD32b, the only inhibitory Fc-γ receptor in humans, and compare the structure of the F(ab) from the crystal complex to SAXS data for the F(ab) alone in solution. We investigate changes in F(ab) structure by predicting theoretical scattering profiles for atomistic structures extracted from molecular dynamics (MD) simulations of the F(ab) and assessing the agreement of these structures to our experimental SAXS data. Through principal component analysis, we are able to extract principal motions observed during the MD trajectory and evaluate the influence of these motions on the agreement of structures to the F(ab) SAXS data. Changes in the F(ab) elbow angle were found to be important to reach agreement with the experimental data; however, further discrepancies were apparent between our F(ab) structure from the crystal complex and SAXS data. By analyzing multiple MD structures observed in similar regions of the principal component analysis, we were able to pinpoint these discrepancies to a specific loop region in the F(ab) heavy chain. This method, therefore, not only allows determination of global changes but also allows identification of localized motions important for determining the agreement between atomistic structures and SAXS data. In this particular case, the findings allowed us to discount the hypothesis that structural changes were induced upon complex formation, a significant find informing the drug development process. The methodology described here is generally applicable to deconvolute global and local changes of macromolecular structures and is well suited to other systems.

Large-scale analysis of water stability in bromodomain binding pockets with grand canonical Monte Carlo

Aldeghi, M., Ross, G. A., Bodkin, M. J., Essex, J. W., Knapp, S., & Biggin, P. C. (2018)

Conserved water molecules are of interest in drug design, as displacement of such waters can lead to higher affinity ligands, and in some cases, contribute towards selectivity. Bromodomains, small protein domains involved in the epigenetic regulation of gene transcription, display a network of four conserved water molecules in their binding pockets and have recently been the focus of intense medicinal chemistry efforts. Understanding why certain bromodomains have displaceable water molecules and others do not is extremely challenging, and it remains unclear which water molecules in a given bromodomain can be targeted for displacement. Here we estimate the stability of the conserved water molecules in 35 bromodomains via binding free energy calculations using all-atom grand canonical Monte Carlo simulations. Encouraging quantitative agreement to the available experimental evidence is found. We thus discuss the expected ease of water displacement in different bromodomains and the implications for ligand selectivity.

Computational methods to calculate ligand binding affinities are increasing in popularity, due to improvements in simulation algorithms, computational resources, and easy-to-use software. However, issues can arise in relative ligand binding free energy simulations if the ligands considered have different active site water networks, as simulations are typically performed with a predetermined number of water molecules (fixed N ensembles) in preassigned locations. If an alchemical perturbation is attempted where the change should result in a different active site water network, the water molecules may not be able to adapt appropriately within the time scales of the simulations—particularly if the active site is occluded. By combining the grand canonical ensemble (μVT) to sample active site water molecules, with conventional alchemical free energy methods, the water network is able to dynamically adapt to the changing ligand. We refer to this approach as grand canonical alchemical perturbation (GCAP). In this work we demonstrate GCAP for two systems; Scytalone Dehydratase (SD) and Adenosine A2A receptor. For both systems, GCAP is shown to perform well at reproducing experimental binding affinities. Calculating the relative binding affinities with a naı̈ve, conventional attempt to solvate the active site illustrates how poor results can be if proper consideration of water molecules in occluded pockets is neglected. GCAP results are shown to be consistent with time-consuming double decoupling simulations. In addition, by obtaining the free energy surface for ligand perturbations, as a function of both the free energy coupling parameter and water chemical potential, it is possible to directly deconvolute the binding energetics in terms of protein–ligand direct interactions and protein binding site hydration.

Prediction of the closed conformation and insights into the mechanism of the membrane enzyme LpxR

Saunders, G. M., Bruce Macdonald, H. E., Essex, J. W., & Khalid, S. (2018)

Covalent modification of outer membrane lipids of Gram-negative bacteria can impact the ability of the bacterium to develop resistance to antibiotics as well as modulating the immune response of the host. The enzyme LpxR from Salmonella typhimurium is known to deacylate lipopolysaccharide molecules of the outer membrane; however, the mechanism of action is unknown. Here, we employ molecular dynamics and Monte Carlo simulations to study the conformational dynamics and substrate binding of LpxR in representative outer membrane models as well as detergent micelles. We examine the roles of conserved residues and provide an understanding of how LpxR binds its substrate. Our simulations predict that the catalytic H122 must be Nε-protonated for a single water molecule to occupy the space between it and the scissile bond, with a free binding energy of −8.5 kcal mol−1. Furthermore, simulations of the protein within a micelle enable us to predict the structure of the putative “closed” protein. Our results highlight the need for including dynamics, a representative environment, and the consideration of multiple tautomeric and rotameric states of key residues in mechanistic studies; static structures alone do not tell the full story.

Surface reconstruction amendment to the intrinsic sampling method

Longford, F. G. J., Essex, J. W., Skylaris, C-K., & Frey, J. G. (2018)

The intrinsic sampling method (ISM) is a powerful tool that allows the exploration of interfacial properties from molecular simulations by fitting a function that represents the local boundary between two phases. However, owing to the non-physical nature of an “intrinsic” surface, there remains an ambiguity surrounding the comparison of theoretical properties with the physical world. It is therefore important that the ISM remains internally consistent when reproducing simulated properties which match experiments, such as the surface tension or interfacial density distribution. We show that the current ISM procedure causes an over-fitting of the surface to molecules in the interface region, leading to a biased distribution of curvature at these molecular coordinates. We assert that this biased distribution is a cause of the disparity between predicted interfacial densities upon convolution to a laboratory frame, an artefact which has been known to exist since the development of the ISM. We present an improvement to the fitting procedure of the ISM in an attempt to alleviate the ambiguity surrounding the true nature of an intrinsic surface. Our “surface reconstruction” method is able to amend the shape of the interface so as to reproduce the global curvature distribution at all sampled molecular coordinates. We present the effects that this method has on the ISM predicted structure of a simulated Lennard-Jones fluid air-liquid interface. Additionally, we report an unexpected relationship between surface thermodynamic predictions of our reconstructed ISM surfaces and those of extended capillary wave theory, which is of current interest.

We present an unexpected finite size effect affecting interfacial molecular simulations that is proportional to the width-to-surface-area ratio of the bulk phase Ll/A. This finite size effect has a significant impact on the variance of surface tension values calculated using the virial summation method. A theoretical derivation of the origin of the effect is proposed, giving a new insight into the importance of optimising system dimensions in interfacial simulations. We demonstrate the consequences of this finite size effect via a new way to estimate the surface energetic and entropic properties of simulated air-liquid interfaces. Our method is based on macroscopic thermodynamic theory and involves comparing the internal energies of systems with varying dimensions. We present the testing of these methods using simulations of the TIP4P/2005 water forcefield and a Lennard-Jones fluid model of argon. Finally, we provide suggestions of additional situations, in which this finite size effect is expected to be significant, as well as possible ways to avoid its impact.

CD1b-restricted GEM T cell responses are modulated by Mycobacterium tuberculosis mycolic acid meromycolate chains

Chancellor, A., Tocheva, A., Cave-Ayland, C., Tezera, L. B., White, A., Al Dulayymi, J., ... Mansour, S. (2017)

Tuberculosis (TB), caused by Mycobacterium tuberculosis, remains a major human pandemic. Germline-encoded mycolyl lipid-reactive (GEM) T cells are donor-unrestricted and recognize CD1b-presented mycobacterial mycolates. However, the molecular requirements governing mycolate antigenicity for the GEM T cell receptor (TCR) remain poorly understood. Here, we demonstrate CD1b expression in TB granulomas and reveal a central role for meromycolate chains in influencing GEM-TCR activity. Meromycolate fine structure influences T cell responses in TB-exposed individuals, and meromycolate alterations modulate functional responses by GEM-TCRs. Computational simulations suggest that meromycolate chain dynamics regulate mycolate head group movement, thereby modulating GEM-TCR activity. Our findings have significant implications for the design of future vaccines that target GEM T cells.

Conformation and dynamics of human urotensin II and urotensin related peptide in aqueous solution

Haensele, E., Mele, N., Miljak, M., Read, C. M., Whitley, D. C., Banting, L., ... Clark, T. (2017)

Conformation and dynamics of the vasoconstrictive peptides human urotensin II (UII) and urotensin related peptide (URP) have been investigated by both unrestrained and enhanced-sampling molecular-dynamics (MD) simulations and NMR spectroscopy. These peptides are natural ligands of the G-protein coupled urotensin II receptor (UTR) and have been linked to mammalian pathophysiology. UII and URP cannot be characterized by a single structure but exist as an equilibrium of two main classes of ring conformations, open and folded, with rapidly interchanging subtypes. The open states are characterized by turns of various types centered at K8Y9 or F6W7 predominantly with no or only sparsely populated transannular hydrogen bonds. The folded conformations show multiple turns stabilized by highly populated transannular hydrogen bonds comprising centers F6W7K8 or W7K8Y9. Some of these conformations have not been characterized previously. The equilibrium populations that are experimentally difficult to access were estimated by replica-exchange MD simulations and validated by comparison of experimental NMR data with chemical shifts calculated with density-functional theory. UII exhibits approximately 72% open:28% folded conformations in aqueous solution. URP shows very similar ring conformations as UII but differs in an open:folded equilibrium shifted further toward open conformations (86:14) possibly arising from the absence of folded N-terminal tail-ring interaction. The results suggest that the different biological effects of UII and URP are not caused by differences in ring conformations but rather by different interactions with UTR.

G protein coupled receptors (GPCRs) are located in membranes rich in cholesterol. The membrane spanning surfaces of GPCRs contain exposed backbone carbonyl groups and residue side chains potentially capable of forming hydrogen bonds to cholesterol molecules buried deep within the hydrophobic core of the lipid bilayer. Coarse-grained molecular dynamics (CGMD) simulations allow the observation of GPCRs in cholesterol-containing lipid bilayers for long times (50 μs), sufficient to ensure equilibration of the system. We have detected a number of deep cholesterol binding sites on β2 adrenergic and A2A adenosine receptors, and shown changes in these sites on agonist binding. The requirements for binding are modest, just a potential hydrogen bond partner close to a cleft or hole in the surface. This makes it likely that similar binding sites for cholesterol will exist on other classes of membrane protein.

On the calculation of acyl chain order parameters from lipid simulations

Piggot, T. J., Allison, J. R., Sessions, R. B., & Essex, J. W. (2017)

For molecular dynamics simulations of biological membrane systems to live up to the potential of providing accurate atomic level detail into membrane properties and functions, it is essential that the force fields used to model such systems are as accurate as possible. One membrane property that is often used to assess force field accuracy is the carbon–hydrogen (or carbon–deuterium) order parameters of the lipid tails, which can be accurately measured using experimental NMR techniques. There are a variety of analysis tools available to calculate these order parameters from simulations and it is essential that these computational tools work correctly to ensure the accurate assessment of the simulation force fields. In this work we compare many of these computational tools for calculating the order parameters of POPC membranes. While tools that work on all-atom systems and tools that work on saturated lipid tails in general work extremely well, we demonstrate that the majority of the tested tools that calculate the order parameters for unsaturated united-atom lipid tails do so incorrectly. We identify tools that do perform accurate calculations and include one such program with this work, enabling rapid and accurate calculation of united-atom lipid order parameters. Furthermore, we discuss cases in which it is nontrivial to appropriately predict the unsaturated carbon order parameters in united-atom systems. Finally, we examine order parameter splitting for carbon 2 in sn-2 lipid chains, demonstrating substantial deviations from experimental values in several all-atom and united-atom lipid force fields.

Development of coarse-grained (CG) molecular dynamics models is often a laborious process which commonly relies upon approximations to similar models, rather than systematic parametrization. PyCGTOOL automates much of the construction of CG models via calculation of both equilibrium values and force constants of internal coordinates directly from atomistic molecular dynamics simulation trajectories. The derivation of bespoke parameters from atomistic simulations improves the quality of the CG model compared to the use of generic parameters derived from other molecules, while automation greatly reduces the time required. The ease of configuration of PyCGTOOL enables the rapid investigation of multiple atom-to-bead mappings and topologies. Although we present PyCGTOOL used in combination with the GROMACS molecular dynamics engine its use of standard trajectory input libraries means that it is in principle compatible with other software. The software is available from the URL as the following doi: 10.5281/zenodo.259330.

Replica-Exchange and Standard State Binding Free Energies with Grand Canonical Monte Carlo

Ross, G. A., Bruce Macdonald, H. E., Cave-Ayland, C., Cabedo Martinez, A. I., & Essex, J. W. (2017)

The ability of grand canonical Monte Carlo (GCMC) to create and annihilate molecules in a given region greatly aids the identification of water sites and water binding free energies in protein cavities. However, acceptance rates without the application of biased moves can be low, resulting in large variations in the observed water occupancies. Here, we show that replica-exchange of the chemical potential significantly reduces the variance of the GCMC data. This improvement comes at a negligible increase in computational expense when simulations comprise of runs at different chemical potentials. Replica-exchange GCMC is also found to substantially increase the precision of water binding free energies as calculated with grand canonical integration, which has allowed us to address a missing standard state correction.

Hybrid free energy methods allow estimation of free energy differences at the quantum mechanics (QM) level with high efficiency by performing sampling at the classical mechanics (MM) level. Various approaches to allow the calculation of QM corrections to classical free energies have been proposed. The single step free energy perturbation approach starts with a classically generated ensemble, a subset of structures of which are postprocessed to obtain QM energies for use with the Zwanzig equation. This gives an estimate of the free energy difference associated with the change from an MM to a QM Hamiltonian. Owing to the poor numerical properties of the Zwanzig equation, however, recent developments have produced alternative methods which aim to provide access to the properties of the true QM ensemble. Here we propose an approach based on the resampling of MM structural ensembles and application of a Monte Carlo acceptance test which in principle, can generate the exact QM ensemble or intermediate ensembles between the MM and QM states. We carry out a detailed comparison against the Zwanzig equation and recently proposed non-Boltzmann methods. As a test system we use a set of small molecule hydration free energies for which hybrid free energy calculations are performed at the semiempirical Density Functional Tight Binding level. Equivalent ensembles at this level of theory have also been generated allowing the reverse QM to MM perturbations to be performed along with a detailed analysis of the results. Additionally, a previously published nucleotide base pair data set simulated at the QM level using ab initio molecular dynamics is also considered. We provide a strong rationale for the use of the Monte Carlo Resampling and non-Boltzmann approaches by showing that configuration space overlaps can be estimated which provide useful diagnostic information regarding the accuracy of these hybrid approaches.

Advanced potential energy surfaces for molecular simulation

Albaugh, A., Boateng, H. A., Bradshaw, R., Demerdash, O. N., Dziedzic, J., Mao, Y., ... Head-Gordon, T. (2016)

Advanced potential energy surfaces are defined as theoretical models that explicitly include many-body effects that transcend the standard fixed-charge, pairwise-additive paradigm typically used in molecular simulation. However, several factors relating to their software implementation have precluded their widespread use in condensed-phase simulations: the computational cost of the theoretical models, a paucity of approximate models and algorithmic improvements that can ameliorate their cost, underdeveloped interfaces and limited dissemination in computational code bases that are widely used in the computational chemistry community, and software implementations that have not kept pace with modern high-performance computing (HPC) architectures, such as multicore CPUs and modern graphics processing units (GPUs). In this Feature Article we review recent progress made in these areas, including well-defined polarization approximations and new multipole electrostatic formulations, novel methods for solving the mutual polarization equations and increasing the MD time step, combining linear-scaling electronic structure methods with new QM/MM methods that account for mutual polarization between the two regions, and the greatly improved software deployment of these models and methods onto GPU and CPU hardware platforms. We have now approached an era where multipole-based polarizable force fields can be routinely used to obtain computational results comparable to state-of-the-art density functional theory while reaching sampling statistics that are acceptable when compared to that obtained from simpler fixed partial charge force fields.

We present blind predictions submitted to the SAMPL5 challenge on calculating distribution coefficients. The predictions were based on estimating the solvation free energies in water and cyclohexane of the 53 compounds in the challenge. These free energies were computed using alchemical free energy simulations based on a hybrid all-atom/coarse-grained model. The compounds were treated with the general Amber force field, whereas the solvent molecules were treated with the Elba coarse-grained model. Considering the simplicity of the solvent model and that we approximate the distribution coefficient with the partition coefficient of the neutral species, the predictions are of good accuracy. The correlation coefficient, R is 0.64, 82 % of the predictions have the correct sign and the mean absolute deviation is 1.8 log units. This is on a par with or better than the other simulation-based predictions in the challenge. We present an analysis of the deviations to experiments and compare the predictions to another submission that used all-atom solvent.

Cholesteryl esters stabilize human CD1c conformations for recognition by self-reactive T cells

Mansour, S., Tocheva, A., Cave-Ayland, C., Machelett, M. M., Sander, B., Lissin, N. M., ... Gadola, S. D. (2016)

Cluster of differentiation 1c (CD1c)-dependent self-reactive T cells are abundant in human blood, but self-antigens presented by CD1c to the T-cell receptors of these cells are poorly understood. Here we present a crystal structure of CD1c determined at 2.4 Å revealing an extended ligand binding potential of the antigen groove and a substantially different conformation compared with known CD1c structures. Computational simulations exploring different occupancy states of the groove reenacted these different CD1c conformations and suggested cholesteryl esters (CE) and acylated steryl glycosides (ASG) as new ligand classes for CD1c. Confirming this, we show that binding of CE and ASG to CD1c enables the binding of human CD1c self-reactive T-cell receptors. Hence, human CD1c adopts different conformations dependent on ligand occupancy of its groove, with CE and ASG stabilizing CD1c conformations that provide a footprint for binding of CD1c self-reactive T-cell receptors.

Hydration free energy (HFE) calculations are often used to assess the performance of biomolecular force fields and the quality of assigned parameters. The AMOEBA polarizable force field moves beyond traditional pairwise additive models of electrostatics and may be expected to improve upon predictions of thermodynamic quantities such as HFEs over and above fixed-point-charge models. The recent SAMPL4 challenge evaluated the AMOEBA polarizable force field in this regard but showed substantially worse results than those using the fixed-point-charge GAFF model. Starting with a set of automatically generated AMOEBA parameters for the SAMPL4 data set, we evaluate the cumulative effects of a series of incremental improvements in parametrization protocol, including both solute and solvent model changes. Ultimately, the optimized AMOEBA parameters give a set of results that are not statistically significantly different from those of GAFF in terms of signed and unsigned error metrics. This allows us to propose a number of guidelines for new molecule parameter derivation with AMOEBA, which we expect to have benefits for a range of biomolecular simulation applications such as protein–ligand binding studies.

The effects of electronic polarization in biomolecular interactions will differ depending on the local dielectric constant of the environment, such as in solvent, DNA, proteins, and membranes. Here the performance of the AMOEBA polarizable force field is evaluated under nonaqueous conditions by calculating the solvation free energies of small molecules in four common organic solvents. Results are compared with experimental data and equivalent simulations performed with the GAFF pairwise‐additive force field. Although AMOEBA results give mean errors close to “chemical accuracy,” GAFF performs surprisingly well, with statistically significantly more accurate results than AMOEBA in some solvents. However, for both models, free energies calculated in chloroform show worst agreement to experiment and individual solutes are consistently poor performers, suggesting non‐potential‐specific errors also contribute to inaccuracy. Scope for the improvement of both potentials remains limited by the lack of high quality experimental data across multiple solvents, particularly those of high dielectric constant. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.

Most p53 mutations associated with cancer are located in its DNA binding domain (DBD). Many structures (X‐ray and NMR) of this domain are available in the protein data bank (PDB) and a vast conformational heterogeneity characterizes the various free and complexed states. The major difference between the apo and the holo‐complexed states appears to lie in the L1 loop. In particular, the conformations of this loop appear to depend intimately on the sequence of DNA to which it binds. This conclusion builds upon recent observations that implicate the tetramerization and the C‐terminal domains (respectively TD and Cter) in DNA binding specificity. Detailed PCA analysis of the most recent collection of DBD structures from the PDB have been carried out. In contrast to recommendations that small molecules/drugs stabilize the flexible L1 loop to rescue mutant p53, our study highlights a need to retain the flexibility of the p53 DNA binding surface (DBS). It is the adaptability of this region that enables p53 to engage in the diverse interactions responsible for its functionality. Proteins 2016; 84:1443–1461. © 2016 Wiley Periodicals, Inc.

A computational study of vicinal fluorination in 2,3-difluorobutane: implications for conformational control in alkane chains

Fox, S. J., Gourdain, S., Coulthurst, A., Fox, C., Kuprov, I., Essex, J. W., ... Linclau, B. (2015)

A comprehensive conformational analysis of both 2,3‐difluorobutane diastereomers is presented based on density functional theory calculations in vacuum and in solution, as well as NMR experiments in solution. While for 1,2‐difluoroethane the fluorine gauche effect is clearly the dominant effect determining its conformation, it was found that for 2,3‐difluorobutane there is a complex interplay of several effects, which are of similar magnitude but often of opposite sign. As a result, unexpected deviations in dihedral angles, relative conformational energies and populations are observed which cannot be rationalised only by chemical intuition. Furthermore, it was found that it is important to consider the free energies of the various conformers, as these lead to qualitatively different results both in vacuum and in solvent, when compared to calculations based only on the electronic energies. In contrast to expectations, it was found that vicinal syn‐difluoride introduction in the butane and by extension, longer hydrocarbon chains, is not expected to lead to an effective stabilisation of the linear conformation. Our findings have implications for the use of the vicinal difluoride motif for conformational control.

We present an efficient all-atom/coarse-grained hybrid model and apply it to membrane processes. This model is an extension of the all-atom/ELBA model applied previously to processes in water. Here, we improve the efficiency of the model by implementing a multiple-time step integrator that allows the atoms and the coarse-grained beads to be propagated at different timesteps. Furthermore, we fine-tune the interaction between the atoms and the coarse-grained beads by computing the potential of mean force of amino acid side chain analogs along the membrane normal and comparing to atomistic simulations. The model was independently validated on the calculation of small-molecule partition coefficients. Finally, we apply the model to membrane peptides. We studied the tilt angle of the Walp23 and Kalp23 helices in two different model membranes and the stability of the glycophorin A dimer. The model is efficient, accurate, and straightforward to use, as it does not require any extra interaction particles, layers of atomistic solvent molecules or tabulated potentials, thus offering a novel, simple approach to study membrane processes.

Direct validation of the single step classical to quantum free energy perturbation

Cave-Ayland, C., Skylaris, C-K., & Essex, J. W. (2015)

The use of the Zwanzig equation in the calculation of single-step perturbations to provide first-principles (ab initio) quantum mechanics (QM) correction terms to molecular mechanics (MM) free energy cycles is well established. A rigorous test of the ability to converge such calculations would be very useful in this context. In this work, we perform a direct assessment of the convergence of the MM to QM perturbation, by attempting the reverse QM to MM perturbation. This required the generation of extensive QM molecular dynamics trajectories, using density functional theory (DFT), within the representative biological system of a DNA adenosine–thymidine dimer. Over 100 ps of dynamics with the PBE functional and 6.25 ps with the LDA functional were generated. We demonstrate that calculations with total potential energies are very poorly convergent due to a lack of overlap of phase space distributions between ensembles. While not theoretically rigorous, the use of interaction energies provides far superior convergence, despite the presence of nonclassical charge transfer effects within the DFT trajectories. The source of poor phase space overlap for total energies is diagnosed, the approximate quantification of overlaps suggesting that even for the comparatively simple system considered here convergence of total energy calculations within a reasonable simulation time is unfeasible.

Water sites, networks, and free energies with grand canonical Monte Carlo

Ross, G. A., Bodnarchuk, M. S., & Essex, J. W. (2015)

Water molecules play integral roles in the formation of many protein–ligand complexes, and recent computational efforts have been focused on predicting the thermodynamic properties of individual waters and how they may be exploited in rational drug design. However, when water molecules form highly coupled hydrogen-bonding networks, there is, as yet, no method that can rigorously calculate the free energy to bind the entire network or assess the degree of cooperativity between waters. In this work, we report theoretical and methodological developments to the grand canonical Monte Carlo simulation technique. Central to our results is a rigorous equation that can be used to calculate efficiently the binding free energies of water networks of arbitrary size and complexity. Using a single set of simulations, our methods can locate waters, estimate their binding affinities, capture the cooperativity of the water network, and evaluate the hydration free energy of entire protein binding sites. Our techniques have been applied to multiple test systems and compare favorably to thermodynamic integration simulations and experimental data. The implications of these methods in drug design are discussed.

X-ray crystallographic and EPR spectroscopic analysis of HydG, a maturase in [FeFe]-hydrogenase H-cluster assembly

Dinis, P., Suess, D. L. M., Fox, S. J., Harmer, J. E., Driesener, R. C., De La Paz, L., ... Roach, P. L. (2015)

Hydrogenases use complex metal cofactors to catalyze the reversible formation of hydrogen. In [FeFe]-hydrogenases, the H-cluster cofactor includes a diiron subcluster containing azadithiolate, three CO, and two CN− ligands. During the assembly of the H cluster, the radical S-adenosyl methionine (SAM) enzyme HydG lyses the substrate tyrosine to yield the diatomic ligands. These diatomic products form an enzyme-bound Fe(CO)x(CN)y synthon that serves as a precursor for eventual H-cluster assembly. To further elucidate the mechanism of this complex reaction, we report the crystal structure and EPR analysis of HydG. At one end of the HydG (βα)8 triosephosphate isomerase (TIM) barrel, a canonical [4Fe-4S] cluster binds SAM in close proximity to the proposed tyrosine binding site. At the opposite end of the active-site cavity, the structure reveals the auxiliary Fe-S cluster in two states: one monomer contains a [4Fe-5S] cluster, and the other monomer contains a [5Fe-5S] cluster consisting of a [4Fe-4S] cubane bridged by a μ2-sulfide ion to a mononuclear Fe2+ center. This fifth iron is held in place by a single highly conserved protein-derived ligand: histidine 265. EPR analysis confirms the presence of the [5Fe-5S] cluster, which on incubation with cyanide, undergoes loss of the labile iron to yield a [4Fe-4S] cluster. We hypothesize that the labile iron of the [5Fe-5S] cluster is the site of Fe(CO)x(CN)y synthon formation and that the limited bonding between this iron and HydG may facilitate transfer of the intact synthon to its cognate acceptor for subsequent H-cluster assembly.

Extensive all-atom Monte Carlo sampling and QM/MM corrections in the SAMPL4 hydration free energy challenge

Genheden, S., Cabedo Martinez, A. I., Criddle, M. P., & Essex, J. W. (2014)

We present our predictions for the SAMPL4 hydration free energy challenge. Extensive all-atom Monte Carlo simulations were employed to sample the compounds in explicit solvent. While the focus of our study was to demonstrate well-converged and reproducible free energies, we attempted to address the deficiencies in the general Amber force field force field with a simple QM/MM correction. We show that by using multiple independent simulations, including different starting configurations, and enhanced sampling with parallel tempering, we can obtain well converged hydration free energies. Additional analysis using dihedral angle distributions, torsion-root mean square deviation plots and thermodynamic cycles support this assertion. We obtain a mean absolute deviation of 1.7 kcal mol−1 and a Kendall’s τ of 0.65 compared with experiment.

Strategies to calculate water binding free energies in protein–ligand complexes

Bodnarchuk, M. S., Viner, R., Michel, J., & Essex, J. W. (2014)

Water molecules are commonplace in protein binding pockets, where they can typically form a complex between the protein and a ligand or become displaced upon ligand binding. As a result, it is often of great interest to establish both the binding free energy and location of such molecules. Several approaches to predicting the location and affinity of water molecules to proteins have been proposed and utilized in the literature, although it is often unclear which method should be used under what circumstances. We report here a comparison between three such methodologies, Just Add Water Molecules (JAWS), Grand Canonical Monte Carlo (GCMC), and double-decoupling, in the hope of understanding the advantages and limitations of each method when applied to enclosed binding sites. As a result, we have adapted the JAWS scoring procedure, allowing the binding free energies of strongly bound water molecules to be calculated to a high degree of accuracy, requiring significantly less computational effort than more rigorous approaches. The combination of JAWS and GCMC offers a route to a rapid scheme capable of both locating and scoring water molecules for rational drug design.

Structures of lipoyl synthase reveal a compact active site for controlling sequential sulfur insertion reactions

Harmer, J., Hiscox, M. J., Dinis, P. C., Fox, S. J., Iliopoulos, A., Hussey, J. E., ... Roach, P. L. (2014)

Lipoyl cofactors are essential for living organisms and are produced by the insertion of two sulfur atoms into the relatively unreactive C–H bonds of an octanoyl substrate. This reaction requires lipoyl synthase, a member of the radical S-adenosylmethionine (SAM) enzyme superfamily. In the present study, we solved crystal structures of lipoyl synthase with two [4Fe–4S] clusters bound at opposite ends of the TIM barrel, the usual fold of the radical SAM superfamily. The cluster required for reductive SAM cleavage conserves the features of the radical SAM superfamily, but the auxiliary cluster is bound by a CX4CX5C motif unique to lipoyl synthase. The fourth ligand to the auxiliary cluster is an extremely unusual serine residue. Site-directed mutants show this conserved serine ligand is essential for the sulfur insertion steps. One crystallized lipoyl synthase (LipA) complex contains 5′-methylthioadenosine (MTA), a breakdown product of SAM, bound in the likely SAM-binding site. Modelling has identified an 18 Å (1 Å=0.1 nm) deep channel, well-proportioned to accommodate an octanoyl substrate. These results suggest that the auxiliary cluster is the likely sulfur donor, but access to a sulfide ion for the second sulfur insertion reaction requires the loss of an iron atom from the auxiliary cluster, which the serine ligand may enable.

Free energies of binding from large-scale first-principles quantum mechanical calculations: application to ligand hydration energies

Fox, S. J., Pittock, C., Tautermann, C. S., Fox, T., Christ, C., Malcolm, N. O. J., ... Skylaris, C-K. (2013)

Schemes of increasing sophistication for obtaining free energies of binding have been developed over the years, where configurational sampling is used to include the all-important entropic contributions to the free energies. However, the quality of the results will also depend on the accuracy with which the intermolecular interactions are computed at each molecular configuration. In this context, the energy change associated with the rearrangement of electrons (electronic polarization and charge transfer) upon binding is a very important effect. Classical molecular mechanics force fields do not take this effect into account explicitly, and polarizable force fields and semiempirical quantum or hybrid quantum–classical (QM/MM) calculations are increasingly employed (at higher computational cost) to compute intermolecular interactions in free-energy schemes. In this work, we investigate the use of large-scale quantum mechanical calculations from first-principles as a way of fully taking into account electronic effects in free-energy calculations. We employ a one-step free-energy perturbation (FEP) scheme from a molecular mechanical (MM) potential to a quantum mechanical (QM) potential as a correction to thermodynamic integration calculations within the MM potential. We use this approach to calculate relative free energies of hydration of small aromatic molecules. Our quantum calculations are performed on multiple configurations from classical molecular dynamics simulations. The quantum energy of each configuration is obtained from density functional theory calculations with a near-complete psinc basis set on over 600 atoms using the ONETEP program.

Water network perturbation in ligand binding: Adenosine A2AAntagonists as a case study

Bortolato, A., Tehan, B. G., Bodnarchuk, M. S., Essex, J. W., & Mason, J. S. (2013)

Recent efforts in the computational evaluation of the thermodynamic properties of water molecules have resulted in the development of promising new in silico methods to evaluate the role of water in ligand binding. These methods include WaterMap, SZMAP, GRID/CRY probe, and Grand Canonical Monte Carlo simulations. They allow the prediction of the position and relative free energy of the water molecule in the protein active site and the analysis of the perturbation of an explicit water network (WNP) as a consequence of ligand binding. We have for the first time extended these approaches toward the prediction of kinetics for small molecules and of relative free energy of binding with a focus on the perturbation of the water network and application to large diverse data sets. Our results support a qualitative correlation between the residence time of 12 related triazine adenosine A2A receptor antagonists and the number and position of high energy trapped solvent molecules. From a quantitative viewpoint, we successfully applied these computational techniques as an implicit solvent alternative, in linear combination with a molecular mechanics force field, to predict the relative ligand free energy of binding (WNP-MMSA). The applicability of this linear method, based on the thermodynamics additivity principle, did not extend to 375 diverse A2A receptor antagonists. However, a fast but effective method could be enabled by replacing the linear approach with a machine learning technique using probabilistic classification trees, which classified the binding affinity correctly for 90% of the ligands in the training set and 67% in the test set.

A recently developed coarse-grain model is applied to simulate hydrated membranes containing the lamellar lipid DOPC and the nonlamellar lipid DOPE. In a first series of simulations, DOPC–water and DOPE–water systems are shown to form respectively bilayers and inverse hexagonal phases, in agreement with the well-known behaviour observed experimentally. A second set of calculations is then run to investigate several fundamental physical features of mixed DOPC–DOPE bilayers at different relative compositions. In particular, a quantitative characterisation is obtained of the internal distributions (profiles) of lateral pressure and electrical potential. These two properties, very difficult to measure experimentally, are thought to underpin many key membrane phenomena, including nonspecific lipid-mediated mechanisms of protein regulation. The molecular origin of the distributions, and their dependence on changes in the DOPC : DOPE ratio, are explained through an analysis of separate contributions from individual interaction types and molecular groups.

We present a molecular simulation protocol to compute free energies of binding, which combines a QM/MM correction term with rigorous classical free energy techniques, thereby accounting for electronic polarization effects. Relative free energies of binding are first computed using classical force fields, Monte Carlo sampling, and replica exchange thermodynamic integration. Snapshots of the configurations at the end points of the perturbation are then subjected to DFT-QM/MM single-point calculations using the B3LYP functional and a range of basis sets. The resulting quantum mechanical energies are then processed using the Zwanzig equation to give free energies incorporating electronic polarization. Our approach is conceptually simple and does not require tightly coupled QM and MM software. The method has been validated by calculating the relative free energies of hydration of methane and water and the relative free energy of binding of two inhibitors of cyclooxygenase-2. Closed thermodynamic cycles are obtained across different pathways, demonstrating the correctness of the technique, although significantly more sampling is required for the protein−ligand system. Our method offers a simple and effective way to incorporate quantum mechanical effects into computed free energies of binding.

Pocket-space maps to identify novel binding-site conformations in proteins

Craig, I. R., Pfleger, C., Gohlke, H., Essex, J. W., & Spiegel, K. (2011)

The identification of novel binding-site conformations can greatly assist the progress of structure-based ligand design projects. Diverse pocket shapes drive medicinal chemistry to explore a broader chemical space and thus present additional opportunities to overcome key drug discovery issues such as potency, selectivity, toxicity, and pharmacokinetics. We report a new automated approach to diverse pocket selection, PocketAnalyzerPCA, which applies principal component analysis and clustering to the output of a grid-based pocket detection algorithm. Since the approach works directly with pocket shape descriptors, it is free from some of the problems hampering methods that are based on proxy shape descriptors, e.g. a set of atomic positional coordinates. The approach is technically straightforward and allows simultaneous analysis of mutants, isoforms, and protein structures derived from multiple sources with different residue numbering schemes. The PocketAnalyzerPCA approach is illustrated by the compilation of diverse sets of pocket shapes for aldose reductase and viral neuraminidase. In both cases this allows identification of novel computationally derived binding-site conformations that are yet to be observed crystallographically. Indeed, known inhibitors capable of exploiting these novel binding-site conformations are subsequently identified, thereby demonstrating the utility of PocketAnalyzerPCA for rationalizing and improving the understanding of the molecular basis of protein–ligand interaction and bioactivity. A Python program implementing the PocketAnalyzerPCA approach is available for download under an open-source license ( or

Selective anticancer activity of a hexapeptide with sequence homology to a non-kinase domain of Cyclin Dependent Kinase 4

Warenius, H. M., Kilburn, J. D., Essex, J. W., Maurer, R. I., Blaydes, J. P., Agarwala, U., & Seabra, L. A. (2011)

Cyclin-dependent kinases 2, 4 and 6 (Cdk2, Cdk4, Cdk6) are closely structurally homologous proteins which are classically understood to control the transition from the G1 to the S-phases of the cell cycle by combining with their appropriate cyclin D or cyclin E partners to form kinase-active holoenzymes. Deregulation of Cdk4 is widespread in human cancer, CDK4 gene knockout is highly protective against chemical and oncogene-mediated epithelial carcinogenesis, despite the continued presence of CDK2 and CDK6; and o verexpresssion of Cdk4 promotes skin carcinogenesis. Surprisingly, however, Cdk4 kinase inhibitors have not yet fulfilled their expectation as 'blockbuster' anticancer agents. Resistance to inhibition of Cdk4 kinase in some cases could potentially be due to a non-kinase activity, as recently reported with epidermal growth factor receptor.

The ELBA force field for coarse-grain modeling of lipid membranes

Fraternali, F. (Ed.), Orsi, M., & Essex, J. W. (2011)

A new coarse-grain model for molecular dynamics simulation of lipid membranes is presented. Following a simple and conventional approach, lipid molecules are modeled by spherical sites, each representing a group of several atoms. In contrast to common coarse-grain methods, two original (interdependent) features are here adopted. First, the main electrostatics are modeled explicitly by charges and dipoles, which interact realistically through a relative dielectric constant of unity. Second, water molecules are represented individually through a new parametrization of the simple Stockmayer potential for polar fluids; each water molecule is therefore described by a single spherical site embedded with a point dipole. The force field is shown to accurately reproduce the main physical properties of single-species phospholipid bilayers comprising dioleoylphosphatidylcholine (DOPC) and dioleoylphosphatidylethanolamine (DOPE) in the liquid crystal phase, as well as distearoylphosphatidylcholine (DSPC) in the liquid crystal and gel phases. Insights are presented into fundamental properties and phenomena that can be difficult or impossible to study with alternative computational or experimental methods. For example, we investigate the internal pressure distribution, dipole potential, lipid diffusion, and spontaneous self-assembly. Simulations lasting up to 1.5 microseconds were conducted for systems of different sizes (128, 512 and 1058 lipids); this also allowed us to identify size-dependent artifacts that are expected to affect membrane simulations in general. Future extensions and applications are discussed, particularly in relation to the methodology's inherent multiscale capabilities.

Anisotropic elastic network modeling of entire microtubules

Deriu, M. A., Soncini, M., Orsi, M., Patel, M., Essex, J. W., Montevecchi, F. M., & Redaelli, A. (2010)

Microtubules are supramolecular structures that make up the cytoskeleton and strongly affect the mechanical properties of the cell. Within the cytoskeleton filaments, the microtubule (MT) exhibits by far the highest bending stiffness. Bending stiffness depends on the mechanical properties and intermolecular interactions of the tubulin dimers (the MT building blocks). Computational molecular modeling has the potential for obtaining quantitative insights into this area. However, to our knowledge, standard molecular modeling techniques, such as molecular dynamics (MD) and normal mode analysis (NMA), are not yet able to simulate large molecular structures like the MTs; in fact, their possibilities are normally limited to much smaller protein complexes. In this work, we developed a multiscale approach by merging the modeling contribution from MD and NMA. In particular, MD simulations were used to refine the molecular conformation and arrangement of the tubulin dimers inside the MT lattice. Subsequently, NMA was used to investigate the vibrational properties of MTs modeled as an elastic network. The coarse-grain model here developed can describe systems of hundreds of interacting tubulin monomers (corresponding to up to 1,000,000 atoms). In particular, we were able to simulate coarse-grain models of entire MTs, with lengths up to 350 nm. A quantitative mechanical investigation was performed; from the bending and stretching modes, we estimated MT macroscopic properties such as bending stiffness, Young modulus, and persistence length, thus allowing a direct comparison with experimental data.

Coarse-grain modelling of DMPC and DOPC lipid bilayers

Orsi, M., Michel, J., & Essex, J. W. (2010)

Our recently developed coarse-grain model for dimyristoylphosphatidylcholine (DMPC) has been improved and extended to dioleylphosphatidylcholine (DOPC), a more typical constituent of real biological membranes. Single-component DMPC and DOPC bilayers have been simulated using microsecond-long molecular dynamics. We investigated properties that are difficult or impossible to access experimentally, such as the pressure distribution, the spontaneous curvature and the diffusion pattern of individual lipid molecules. Moreover, we studied the dipole potential, a basic physical feature of paramount biological importance that cannot be currently modelled by other coarse-grain approaches. In fact, a complete representation of the system electrostatics and a realistic description of the water component make our method unique amongst the existing coarse-grain membrane models. The spontaneous permeation of water, a phenomenon out of reach of standard atomistic models, was also observed and quantified; this was possible thanks to the efficiency of our model, which is about two orders of magnitude less computationally expensive than atomic-level counterparts. Results are generally in good agreement with the literature data. Further model extensions and future applications are proposed.

Triclocarban and triclosan, two potent antibacterial molecules present in many consumer products, have been subject to growing debate on a number of issues, particularly in relation to their possible role in causing microbial resistance. In this computational study, we present molecular-level insights into the interaction between these antimicrobial agents and hydrated phospholipid bilayers (taken as a simple model for the cell membrane). Simulations are conducted by a novel ‘dual-resolution’ molecular dynamics approach which combines accuracy with efficiency: the antimicrobials, modelled atomistically, are mixed with simplified (coarse-grain) models of lipids and water. A first set of calculations is run to study the antimicrobials' transfer free energies and orientations as a function of depth inside the membrane. Both molecules are predicted to preferentially accumulate in the lipid headgroup–glycerol region; this finding, which reproduces corresponding experimental data, is also discussed in terms of a general relation between solute partitioning and the intramembrane distribution of pressure. A second set of runs involves membranes incorporated with different molar concentrations of antimicrobial molecules (up to one antimicrobial per two lipids). We study the effects induced on fundamental membrane properties, such as the electron density, lateral pressure and electrical potential profiles. In particular, the analysis of the spontaneous curvature indicates that increasing antimicrobial concentrations promote a ‘destabilizing’ tendency towards non-bilayer phases, as observed experimentally. The antimicrobials' influence on the self-assembly process is also investigated. The significance of our results in the context of current theories of antimicrobial action is discussed.

Docking into multiple receptor conformations (“ensemble docking”) has been proposed, and employed, in the hope that it may account for receptor flexibility in virtual screening and thus provide higher enrichments than docking into single rigid receptor structures. The statistical analyses presented in this paper provide quantitative evidence that in some cases docking into a crystallographically derived conformational ensemble does indeed yield better enrichment than docking into any of the individual members of the ensemble. However, these “successful” ensembles account for only a minority of those examined and it would not have been possible to prospectively predict their identity using only protein structural information. A more frequently observed outcome is that the ensemble enrichment is higher than the mean of the enrichments provided by its individual members. An additional and promising finding is that, if a set of known active compounds is available, an approach based on induced-fit docking appears to be a reliable way to construct ensembles which provide relatively high enrichments.

The unassisted permeation process of β-blocker drugs (alprenolol, atenolol, pindolol) and steroid hormones (progesterone, testosterone) through a lipid membrane is simulated by a novel dual-resolution molecular dynamics approach. The lipid and water molecules are described by simple and efficient coarse-grain models, whereas the drug and hormone permeants are represented by traditional atomistic models. Our hybrid method is about two orders of magnitude faster than standard atomic-level counterparts. For each permeant, we calculate the transfer free energy as a function of depth inside the bilayer; these data indicate the location across the membrane where the solutes preferentially partition. Using the free energy profiles, we develop a simple expression that proves remarkably accurate in predicting experimental permeability rankings; the proposed permeation model highlights and addresses potentially problematic aspects of the standard solubility-diffusion theory. We also calculate the diffusion coefficients of the permeants, and track their lateral motion to study their diffusive patterns. Furthermore, we show the drugs' perturbing effect on the bilayer structure and quantify the steroids' preferred orientations. The results obtained compare favourably with experimental measurements and traditional atomic-level simulation data reported in the literature. Promising potential applications of our methodology to areas such as drug design and membrane-protein modelling are discussed.

Many limitations of current computer-aided drug design arise from the difficulty of reliably predicting the binding affinity of a small molecule to a biological target. There is thus a strong interest in novel computational methodologies that claim predictions of greater accuracy than current scoring functions, and at a throughput compatible with the rapid pace of drug discovery in the pharmaceutical industry. Notably, computational methodologies firmly rooted in statistical thermodynamics have received particular attention in recent years. Yet free energy calculations can be daunting to learn for a novice user because of numerous technical issues and various approaches advocated by experts in the field. The purpose of this article is to provide an overview of the current capabilities of free energy calculations and to discuss the applicability of this technology to drug discovery.

Rigorous free energy calculations in structure-based drug design

Michel, J., Foloppe, N., & Essex, J. W. (2010)

Structure‐based drug design could benefit greatly from computational methodologies that accurately predict the binding affinity of small compounds to target biomolecules. However, the current scoring functions used to rank compounds in virtual screens by docking are not sufficiently accurate to guide reliably the design of tight binding ligands. Thus, there is strong interest in methodologies based on molecular simulations of protein‐ligand complexes which are perceived to be more accurate and, with advances in computing power, amenable to routine use. This report provides an overview of the technical details necessary to understand, execute and analyze binding free energy calculations, using free energy perturbation or thermodynamic integration methods. Examples of possible applications in structure‐based drug design are discussed. Current methodological limitations are highlighted as well as a number of ongoing developments to improve the scope, reliability, and practicalities of free energy calculations. These efforts are paving the way for a more common use of free energy calculations in molecular design.

Assessment of nonequilibrium free energy methods

Cossins, B. P., Foucher, S., Edge, C. M., & Essex, J. W. (2009)

One of the factors preventing the general application of free energy methods in rational drug design remains the lack of sufficient computational resources. Many nonequilibrium (NE) free energy methods, however, are easily made embarrassingly parallel in comparison to equilibrium methods and may be conveniently run on desktop computers using distributed computing software. In recent years, there has been a proliferation of NE methods, but the general applicability of these approaches has not been determined. In this study, a subset including only those NE methods which are easily parallelised were considered for examination, with a view to their application to the prediction of protein−ligand binding affinities. A number of test systems were examined, including harmonic oscillator (HO) systems and the calculation of relative free energies of hydration of water−methane. The latter system uses identical potentials to the protein ligand case and is therefore an appropriate model system on which methods may be tested. As well as investigating existing protocols, a replica exchange NE approach was developed, which was found to offer advantages over conventional methods. It was found that Rosenbluth-based approaches to optimizing the NE work values used in NE free energy estimates were not consistent in the improvements in accuracy achieved and that, given their computational cost, the simple approach of taking each work value in an unbiased way is to be preferred. Of the two free energy estimators examined, Bennett’s acceptance ratio was the most consistent and is, therefore, to be preferred over the Jarzynski estimator. The recommended protocols may be run very efficiently within a distributed computing environment and are of similar accuracy and precision to equilibrium free energy methods.

HIV-1 protease performs a vital step in the propagation of the HIV virus and is therefore an important drug target in the treatment of AIDS. It consists of a homodimer, with access to the active site limited by two protein flaps. NMR studies have identified two time scales of motions that occur in these flaps, and it is thought that the slower of these is responsible for a conformational change that makes the protein ligand-accessible. This motion occurs on a time scale outside that achievable using traditional molecular dynamics simulations. Reversible Digitally Filtered Molecular Dynamics (RDFMD) is a method that amplifies low frequency motions associated with conformational change and has recently been applied to, among others, E. coli dihydrofolate reductase, inducing a conformational change between known crystal structures. In this paper, the conformational motions of HIV-1 protease produced during MD and RDFMD simulations are presented, including movement between the known semiopen and closed conformations, and the opening and closing of the protein flaps.

The transmembrane permeation of eight small (molecular weight <100) organic molecules across a phospholipid bilayer is investigated by multiscale molecular dynamics simulation. The bilayer and hydrating water are represented by simplified, efficient coarse-grain models, whereas the permeating molecules are described by a standard atomic-level force-field. Permeability properties are obtained through a refined version of the z-constraint algorithm. By constraining each permeant at selected depths inside the bilayer, we have sampled free energy differences and diffusion coefficients across the membrane. These data have been combined, according to the inhomogeneous solubility−diffusion model, to yield the permeability coefficients. The results are generally consistent with previous atomic-level calculations and available experimental data. Computationally, our multiscale approach proves 2 orders of magnitude faster than traditional atomic-level methods.

The microscopic flexibility of DNA is a key ingredient for understanding its interaction with proteins and drugs but is still poorly understood and technically challenging to measure. Several experimental methods probe very long DNA samples, but these miss local flexibility details. Others mechanically disturb or modify short molecules and therefore do not obtain flexibility properties of unperturbed and pristine DNA. Here, we show that it is possible to extract very detailed flexibility information about unmodified DNA from melting temperatures with statistical physics models. We were able to retrieve, from published melting temperatures, several established flexibility properties such as the presence of highly flexible TATA regions of genomic DNA and support recent findings that DNA is very flexible at short length scales. New information about the nanoscale Na+ concentration dependence of DNA flexibility was determined and we show the key role of ApT and TpA steps when it comes to ion-dependent flexibility and melting temperatures.

The HIV-1 IN enzyme is one of three crucial virally encoded enzymes (HIV-1 IN, HIV-1 PR, and HIV-1 RT) involved in the life-cycle of the HIV-1 virus, making it an attractive target in the development of drugs against the AIDS virus. The structure and mechanism of the HIV-1 IN enzyme is the least understood of the three enzymes due to the lack of three-dimensional structural information. X-ray cystallographic studies have not yet been able to resolve the full-length structure, and studies have been mainly focused on the catalytic domain. This central domain possesses an important catalytic loop observed to overhang the active site, and experimental studies have shown that its dynamics affects the catalytic activity of mutant HIV-1 IN enzymes. In this study, the enhanced sampling technique, Reversible Digitally Filtered Molecular Dynamics (RDFMD), has been applied to the catalytic domain of the WT and G140A/G149A HIV-1 IN enzymes and has highlighted significant differences between the behavior of the catalytic loop which may explain the decrease of activity observed in experimental studies for this mutant.

Thermal equivalence of DNA duplexes for probe design

Weber, G., Haslam, N., Essex, J. W., & Neylon, C. (2009)

We present the theory of thermal equivalence in the framework of the Peyrard–Bishop model and some of its anharmonic variants. The thermal equivalence gives rise to a melting index τ which maps closely the experimental DNA melting temperatures for short DNA sequences. We show that the efficient calculation of the melting index can be used to analyse the parameters of the Peyrard–Bishop model and propose an improved set of Morse potential parameters. With this new set we are able to calculate some of the experimental melting temperatures to ± 1.2 °C. We review some of the concepts of sequencing probe design and show how to use the melting index to explore the possibilities of gene coverage by tuning the model parameters.

A quantitative coarse-grain model for lipid bilayers

Orsi, M., Haubertin, D. Y., Sanderson, W. E., & Essex, J. W. (2008)

A simplified particle-based computer model for hydrated phospholipid bilayers has been developed and applied to quantitatively predict the major physical features of fluid-phase biomembranes. Compared with available coarse-grain methods, three novel aspects are introduced. First, the main electrostatic features of the system are incorporated explicitly via charges and dipoles. Second, water is accurately (yet efficiently) described, on an individual level, by the soft sticky dipole model. Third, hydrocarbon tails are modeled using the anisotropic Gay−Berne potential. Simulations are conducted by rigid-body molecular dynamics. Our technique proves 2 orders of magnitude less demanding of computational resources than traditional atomic-level methodology. Self-assembled bilayers quantitatively reproduce experimental observables such as electron density, compressibility moduli, dipole potential, lipid diffusion, and water permeability. The lateral pressure profile has been calculated, along with the elastic curvature constants of the Helfrich expression for the membrane bending energy; results are consistent with experimental estimates and atomic-level simulation data. Several of the results presented have been obtained for the first time using a coarse-grain method. Our model is also directly compatible with atomic-level force fields, allowing mixed systems to be simulated in a multiscale fashion.

The identification of lead molecules using computational modeling often relies on approximate, high-throughput approaches, of limited accuracy. We show here that, with a methodology we recently developed, it is possible to predict the relative binding free energies of structurally diverse ligands of the estrogen receptor-α using a rigorous statistical thermodynamics approach. Predictions obtained from the simulations with an explicit solvation model are in good qualitative agreement with experimental data, while simulations with implicit solvent models or rank ordering by empirical scoring functions yield predictions of lower quality. In addition, it is shown that free energy techniques can be used to select the most likely binding mode from a set of possible orientations generated by a docking program. It is suggested that the free energy techniques outlined in this study can be used to rank-order, by potency, structurally diverse compounds identified by virtual screening, de novo design or scaffold hopping programs.

Optimal probe length varies for targets with high sequence variation: implications for probe library design for resequencing highly variable genes

Haslam, N. J., Whiteford, N. E., Weber, G., Prügel-Bennett, A., Essex, J. W., & Neylon, C. (2008)

Sequencing by hybridisation is an effective method for obtaining large amounts of DNA sequence information at low cost. The efficiency of SBH depends on the design of the probe library to provide the maximum information for minimum cost. Long probes provide a higher probability of non-repeated sequences but lead to an increase in the number of probes required whereas short probes may not provide unique sequence information due to repeated sequences. We have investigated the effect of probe length, use of reference sequences, and thermal filtering on the design of probe libraries for several highly variable target DNA sequences.

Pattern recognition based on color-coded quantum mechanical surfaces for molecular alignment

Hudson, B. D., Whitley, D. C., Ford, M. G., Swain, M., & Essex, J. W. (2008)

A pattern recognition algorithm for the alignment of drug-like molecules has been implemented. The method is based on the calculation of quantum mechanical derived local properties defined on a molecular surface. This approach has been shown to be very useful in attempting to derive generalized, non-atom based representations of molecular structure. The visualization of these surfaces is described together with details of the methodology developed for their use in molecular overlay and similarity calculations. In addition, this paper also introduces an additional local property, the local curvature (C L), which can be used together with the quantum mechanical properties to describe the local shape. The method is exemplified using some problems representing common tasks encountered in molecular similarity.

Coarse-grain models are becoming an increasingly important tool in computer simulations of a wide variety of molecular processes. In many instances it is, however, desirable to describe key portions of a molecular system at the atomic level. There is therefore a strong interest in the development of simulation methodologies that allow representations of matter with mixed granularities in a multiscale fashion. We report here a strategy to conduct mixed atomic-level and coarse-grain simulations of molecular systems with a recently developed coarse-grain model. The methodology is validated by computing partition coefficients of small molecules described in atomic detail and solvated by water or octane, both of which are represented by coarse-grain models. Because the present coarse-grain force field retains electrostatic interactions, the simplified solvent particles can interact realistically with the all-atom solutes. The partition coefficients computed by this approach rival the accuracy of fully atomistic simulations and are obtained at a fraction of their computational cost. The present methodology is simple, robust and applicable to a wide variety of molecular systems.

Protein-ligand binding affinity by nonequilibrium free energy methods

Cossins, B. P., Foucher, S., Edge, C. M., & Essex, J. W. (2008)

Nonequilibrium (NE) free energy methods are embarrassingly parallel and may be very conveniently run on desktop computers using distributed computing software. In recent years there has been a proliferation of NE methods, but these approaches have barely, if at all, been used in the context of calculating protein−ligand binding free energies. In a recent study by these authors, different combinations of NE methods with various test systems were compared and protocols identified which yielded results as accurate as replica exchange thermodynamic integration (RETI). The NE approaches, however, lend themselves to extensive parallelization through the use of distributed computing.(1) Here the best performing of those NE protocols, a replica exchange method using Bennett’s acceptance ratio as the free energy estimator (RENE), is applied to two sets of congeneric inhibitors bound to neuraminidase and cyclooxygenase-2. These protein−ligand systems were originally studied with RETI,(2) giving results to which NE and RENE simulations are compared. These NE calculations were carried out on a large, highly distributed group of low-performance desktop computers which are part of a Condor pool.(3) RENE was found to produce results of a predictive quality at least as good as RETI in less than half the wall clock time. However, non-RE NE results were found to be far less predictive. In addition, the RENE method successfully identified a localized region of rapidly changing free energy gradients without the need for prior investigation. These results suggest that the RENE protocol is appropriate for use in the context of predicting protein−ligand binding free energies and that it can offer advantages over conventional, equilibrium approaches.

Classification of water molecules in protein binding sites

Barillari, C., Taylor, J., Viner, R., & Essex, J. W. (2007)

Water molecules play a crucial role in mediating the interaction between a ligand and a macromolecular receptor. An understanding of the nature and role of each water molecule in the active site of a protein could greatly increase the efficiency of rational drug design approaches:  if the propensity of a water molecule for displacement can be determined, then synthetic effort may be most profitably applied to the design of specific ligands with the displacement of this water molecule in mind. In this paper, a thermodynamic analysis of water molecules in the binding sites of six proteins, each complexed with a number of inhibitors, is presented. Two classes of water molecules were identified:  those conserved and not displaced by any of the ligands, and those that are displaced by some ligands. The absolute binding free energies of 54 water molecules were calculated using the double decoupling method, with replica exchange thermodynamic integration in Monte Carlo simulations. It was found that conserved water molecules are on average more tightly bound than displaced water molecules. In addition, Bayesian statistics is used to calculate the probability that a particular water molecule may be displaced by an appropriately designed ligand, given the calculated binding free energy of the water molecule. This approach therefore allows the numerical assessment of whether or not a given water molecule should be targeted for displacement as part of a rational drug design strategy.

e-Malaria: the schools malaria project

Geldhill, R., Kent, S., Milsted, A., Chapman, R., Essex, J. W., & Frey, J. G. (2007)

e‐Malaria ( aims to bring together 16–18 year old school students with university researchers to explain aspects of computational drug design using the example of the hunt for a new anti‐Malaria drug. Malaria kills a child every 30 seconds, and 40% of the world's population lives in countries where the disease is endemic. Resistance to existing drugs is increasing and there is a growing need for new compounds. This challenge is being offered to school students who will use a distributed drug search and selection system via a Web interface to design potential drugs. The project makes use of industrial code for the docking study (‘GOLD’) and as such presents valuable lessons in how to achieve the integration of industrial programs into a ‘free’ outreach environment. The results of the trials are displayed in an accessible manner, giving students an opportunity for discussion and debate both with peers and with university contacts. The initial project has been extended to provide a similar challenge for undergraduate chemists as part of a chemical informatics course at a level relevant to more advances chemical skills.

A methodology for the calculation of the free energy difference between a pair of molecules of arbitrary topology is proposed. The protocol relies on a dual-topology paradigm, a softening of the intermolecular interactions, and a constraint that prevents the perturbed molecules from drifting away from each other at the end states. The equivalence and the performance of the methodology against a single-topology approach are demonstrated on a pair of harmonic oscillators, the calculation of the relative solvation free energy of ethane and methanol, and the relative binding free energy of two congeneric inhibitors of cyclooxygenase 2. The stability of two alternative binding modes of an inhibitor of cyclin-dependent kinase 2 is then investigated. Finally, the relative binding free energy of two structurally different inhibitors of cyclin-dependent kinase 2 is calculated. The proposed methodology allows the study of a range of problems that are beyond the reach of traditional relative free energy calculation protocols and should prove useful in drug design studies.

Three hydrolases and a transferase: comparative analysis of active-site dynamics via the BioSimGrid database

Tai, K., Baaden, M., Murdock, S., Wu, B., Ng, M. H., Johnston, S., ... Sansom, M. S. P. (2007)

Comparative molecular dynamics (MD) simulations enable us to explore the conformational dynamics of the active sites of distantly related enzymes. We have used the BioSimGrid ( database to facilitate such a comparison. Simulations of four enzymes were analyzed. These included three hydrolases and a transferase, namely acetylcholinesterase, outer-membrane phospholipase A, outer-membrane protease T, and PagP (an outer-membrane enzyme which transfers a palmitate chain from a phospholipid to lipid A). A set of 17 simulations were analyzed corresponding to a total of ∼0.1 μs simulation time. A simple metric for active-site integrity was used to demonstrate the existence of clusters of dynamic conformational behaviour of the active sites. Small (i.e. within a cluster) fluctuations appear to be related to the function of an enzymatically active site. Larger fluctuations (i.e. between clusters) correlate with transitions between catalytically active and inactive states. Overall, these results demonstrate the potential of a comparative MD approach to analysis of enzyme function. This approach could be extended to a wider range of enzymes using current high throughput MD simulation and database methods.

A computer-aided drug discovery system for chemistry teaching

Gledhill, R., Kent, S., Hudson, B., Richards, W. G., Essex, J. W., & Frey, J. G. (2006)

The Schools Malaria Project ( brings together school students with university researchers in the hunt for a new antimalaria drug. The design challenge being offered to students is to use a distributed drug search and selection system to design potential antimalaria drugs. The system is accessed via a Web interface. This e-science project displays the results of the trials in an accessible manner, giving students an opportunity for discussion and debate both with peers and with the university contacts. The project has been implemented by using distributed computing techniques, spreading computer load over a network of machines that cross institutional boundaries, forming a grid. This provides access to greater computing power and allows a much more complex and detailed formulation of the drug design problem to be tackled for research, teaching, and learning.

BioSimGrid: grid-enabled biomolecular simulation data storage and analysis

Ng, M. H., Johnston, S., Wu, B., Murdock, S. E., Tai, K., Fangohr, H., ... Jeffreys, P. (2006)

In computational biomolecular research, large amounts of simulation data are generated to capture the motion of proteins. These massive simulation data can be analysed in a number of ways to reveal the biochemical properties of the proteins. However, the legacy way of storing these data (usually in the laboratory where the simulations have been run) often hinders a wider sharing and easier cross-comparison of simulation results. The data is commonly encoded in a way specific to the simulation package that produced the data and can only be analysed with tools developed specifically for that simulation package. The BioSimGrid platform seeks to provide a solution to these challenges by exploiting the potential of the Grid in facilitating data sharing. By using BioSimGrid either in a scripting or web environment, users can deposit their data and reuse it for analysis. BioSimGrid tools manage the multiple storage locations transparently to the users and provide a set of retrieval and analysis tools for processing the data in a convenient and efficient manner. This paper details the usage and implementation of BioSimGrid using a combination of commercial databases, the Storage Resource Broker and Python scripts, gluing the building blocks together. It introduces a case study of how BioSimGrid can be used for better storage, retrieval and analysis of biomolecular simulation data.

Bringing chemical data onto the semantic web

Taylor, K. R., Gledhill, R., Essex, J. W., Frey, J. G., Harris, S. W., & De Roure, D. C. (2006)

Present chemical data storage methodologies place many restrictions on the use of the stored data. The absence of sufficient high-quality metadata prevents intelligent computer access to the data without human intervention. This creates barriers to the automation of data mining in activities such as quantitative structure−activity relationship modelling. The application of Semantic Web technologies to chemical data is shown to reduce these limitations. The use of unique identifiers and relationships (represented as uniform resource identifiers, URIs, and resource description framework, RDF) held in a triplestore provides for greater detail and flexibility in the sharing and storage of molecular structures and properties.

Efficient generalized Born models for Monte Carlo simulations

Michel, J., Taylor, R. D., & Essex, J. W. (2006)

The Generalized Born Surface Area theory (GBSA) has become a popular method to model the solvation of biomolecules. While efficient in the context of molecular dynamics simulations, GBSA calculations do not integrate well with Monte Carlo simulations because of the nonlocal nature of the Generalized Born energy. We present a method by which Monte Carlo Generalized Born simulations can be made seven to eight times faster on a protein−ligand binding free energy calculation with little or no loss of accuracy. The method can be employed in any type of Monte Carlo or Hybrid Monte Carlo-molecular dynamics simulation and should prove useful in numerous applications.

Continuum electrostatics is combined with rigorous free-energy calculations in an effort to deliver a reliable and efficient method for in silico lead optimization. The methodology is tested by calculation of the relative binding free energies of a set of inhibitors of neuraminidase, cyclooxygenase2, and cyclin-dependent kinase 2. The calculated free energies are compared to the results obtained with explicit solvent simulations and empirical scoring functions. For cyclooxygenase2, deficiencies in the continuum electrostatics theory are identified and corrected with a modified simulation protocol. For neuraminidase, it is shown that a continuum representation of the solvent leads to markedly different protein−ligand interactions compared to the explicit solvent simulations, and a reconciliation of the two protocols is problematic. Cyclin-dependent kinase 2 proves more challenging, and none of the methods employed in this study yield high quality predictions. Despite the differences observed, for these systems, the use of an implicit solvent framework to predict the ranking of congeneric inhibitors to a protein is shown to be faster, as accurate or more accurate than the explicit solvent protocol, and superior to empirical scoring schemes.

Quality assurance for biomolecular simulations

Murdock, S. E., Tai, K., Ng, M. H., Johnston, S., Wu, B., Fangohr, H., ... Sansom, M. S. P. (2006)

Contemporary structural biology has an increased emphasis on high-throughput methods. Biomolecular simulations can add value to structural biology via the provision of dynamic information. However, at present there are no agreed measures for the quality of biomolecular simulation data. In this Letter, we suggest suitable measures for the quality assurance of molecular dynamics simulations of biomolecules. These measures are designed to be simple, fast, and general. Reporting of these measures in simulation papers should become an expected practice, analogous to the reporting of comparable quality measures in protein crystallography. We wish to solicit views and suggestions from the simulation community on methods to obtain reliability measures from molecular-dynamics trajectories. In a database which provides access to previously obtained simulationsfor example BioSimGrid ( user needs to be confident that the simulation trajectory is suitable for further investigation. This can be provided by the simulation quality measures which a user would examine prior to more extensive analyses.

The semantic grid and chemistry: experiences with CombeChem

Taylor, K., Essex, J. W., Frey, J. G., Mills, H. R., Hughes, G., & Zaluska, E. J. (2006)

The CombeChem e-Science project has demonstrated the advantages of using Semantic Web technology, in particular RDF and the associated triplestores, to describe and link diverse and complex chemical information, covering the whole process of the generation of chemical knowledge from inception in the synthetic chemistry laboratory, through analysis of the materials made which generates physical measurements, computations based on this data to develop interpretations, and the subsequent dissemination of the knowledge gained. The RDF descriptions employed allow for a uniform description of chemical data in a wide variety of forms including multimedia, and of the chemical processes both in the laboratory and in model building. The project successfully adopted a strategy of capturing semantic annotations ‘at source’ and establishing schema and ontologies based closely on current operational practice in order to facilitate implementation and adoption. We illustrate this in the contexts of the synthetic organic chemistry laboratory with chemists at the bench, computational chemistry for modelling data, and the linking of chemical publications to the underlying results and data to provide the appropriate provenance. The resulting ‘Semantic Data Grid’ comprises tens of millions of RDF triples across multiple stores representing complex chains of derived data with associated provenance.

An analysis of the feasibility of short read sequencing

Whiteford, N., Haslam, N., Weber, G., Prugel-Bennett, A., Essex, J. W., Roach, P. L., ... Neylon, C. (2005)

Several methods for ultra high-throughput DNA sequencing are currently under investigation. Many of these methods yield very short blocks of sequence information (reads). Here we report on an analysis showing the level of genome sequencing possible as a function of read length. It is shown that re-sequencing and de novo sequencing of the majority of a bacterial genome is possible with read lengths of 20–30 nt, and that reads of 50 nt can provide reconstructed contigs (a contiguous fragment of sequence data) of 1000 nt and greater that cover 80% of human chromosome 1.

To reach their biological target, drugs have to cross cell membranes, and understanding passive membrane permeation is therefore crucial for rational drug design. Molecular dynamics simulations offer a powerful way of studying permeation at the single molecule level. Starting from a computer model proven to be able to reproduce the physical properties of a biological membrane, the behaviour of small solutes and large drugs in a lipid bilayer has been studied. Analysis of dihedral angles shows that a few nanosesconds are sufficient for the simulations to converge towards common values for those angles, even if the starting structures belong to different conformations. Results clearly show that, despite their difference in size, small solutes and large drugs tend to lie parallel to the bilayer normal and that, when moving from water solution into biomembranes, permeants lose degrees of freedom. This explains the experimental observation that partitioning and permeation are highly affected by entropic effects and are size-dependent. Tilted orientations, however, occur when they make possible the formation of hydrogen bonds. This helps to understand the reason why hydrogen bonding possibilities are an important parameter in cruder approaches which predict drug absorption after administration. Interestingly, hydration is found to occur even in the membrane core, which is usually considered an almost hydrophobic region. Simulations suggest the possibility for highly polar compounds like acetic acid to cross biological membranes while hydrated. These simulations prove useful for drug design in rationalising experimental observations and predicting solute behaviour in biomembranes.

Grid computing and biomolecular simulation

Woods, C. J., Ng, M. H., Johnston, S., Murdock, S. E., Wu, B., Tai, K., ... Essex, J. W. (2005)

Biomolecular computer simulations are now widely used not only in an academic setting to understand the fundamental role of molecular dynamics on biological function, but also in the industrial context to assist in drug design. In this paper, two applications of Grid computing to this area will be outlined. The first, involving the coupling of distributed computing resources to dedicated Beowulf clusters, is targeted at simulating protein conformational change using the Replica Exchange methodology. In the second, the rationale and design of a database of biomolecular simulation trajectories is described. Both applications illustrate the increasingly important role modern computational methods are playing in the life sciences.

Conventional paper-based scientific publications are limited in the amount of data and interaction they can provide. Simply placing an electronic copy of such a publication on the web makes for easier distribution, but more can be achieved by making use of the technology to navigate through the paper, to show the links between calculated parameters and the data, and provide access to the chain of calculated results all the way back to the raw data and ultimately the laboratory notebook. While the paper version of this publication is in a relatively conventional form, an interactive demonstrator version (available at and as an installable package) illustrates the concepts of publication@source, whereby all the figures and data presented in the paper are linked back to the original raw data together with a description of the processes by which the raw data were analysed. This level of interactivity is achieved using semantic technologies, which have the additional advantage of making the final document subsequently available and navigable by automated techniques. We present the combined information from experimental studies of surface tension and second harmonic generation (SHG) on the behaviour of benzo-15-crown-5 at the solution/air interface, together with a molecular dynamics computer simulation, to demonstrate how the simulation aids the interpretation of the SHG experiment. The adsorption isotherm was determined using SHG and fitted to a Langmuir isotherm giving ΔadsG0=26 kJ mol−1.

Mechanism and structure-activity relationships of norspermidine-based peptidic inhibitors of trypanothione reductase

Dixon, M. J., Maurer, R. I., Biggi, C., Oyarzabal, J., Essex, J. W., & Bradley, M. (2005)

A library of polyamine–peptide conjugates based around some previously identified inhibitors of trypanothione reductase was synthesised by parallel solid-phase chemistry and screened. Kinetic analysis of library members established that subtle structural changes altered their mechanism of action, switching between competitive and non-competitive inhibition. The mode of action of the non-competitive inhibitors was investigated in detail by a variety of techniques including enzyme kinetic analysis (looking at both NADPH and trypanothione disulfide substrates), gel filtration chromatography and analytical ultracentrifugation, leading to the identification of an allosteric mode of inhibition.

Parametrization of reversible digitally filtered molecular dynamics simulations

Wiley, A. P., Swain, M. T., Phillips, S. C., Essex, J. W., & Edge, C. M. (2005)

Reversible Digitally Filtered Molecular Dynamics (RDFMD) is a method of amplifying or suppressing motions in a molecular dynamics simulation, through the application of a digital filter to the simulation velocities. RDFMD and its derivatives have been previously used to promote conformational motions in liquid-phase butane, the Syrian hamster prion protein, alanine dipeptide, and the pentapeptide, YPGDV. The RDFMD method has associated with it a number of parameters that require specification to optimize the desired response. In this paper methods for the systematic analysis of these parameters are presented and applied to YPGDV with the specific emphasis of ensuring a gentle and progressive method that produces maximum conformation change from the energy put into the system. The portability of the new parameter set is then shown with an application to the M20 loop of E-coli dihydrofolate reductase. A conformational change is induced from a closed to an open structure similar to that seen in the DHFR−NADP+ complex.

Prediction of properties from simulations: a re-examination with modern statistical methods

Mansson, R. A., Frey, J. G., Essex, J. W., & Welsh, A. H. (2005)

We discuss models fit to data collected by Duffy and Jorgensen to predict solvation free energies and partition equilibria of drugs, organic molecules, aromatic heterocycles, and other molecules. These data were originally examined using linear regression, but here more recently developed statistical models are applied. The data set is complicated due to the presence of discrepant observations and also curvature in the response. In some cases it is possible to discard a small number of the observations to get good fit to the data, but, in others, discarding an increasing proportion of the observations does not improve the fit. Our general preference is to use robust parameter estimation which downweights to reduce the influence of discrepant observations on the fitted models. Models are selected for four responses using linear or more complicated representations of the explanatory variables, such as cubic polynomials, B-splines, or smoothers via generalized additive models (GAMs). Variables are chosen using the traditional approach of formal tests to assess contribution to the fit of a model, and resampling methods including bootstrap are also considered to assess the prediction error for given models. Results of our analysis indicate that GAMs are an improvement on linear models for describing the data and making predictions. In general robust regression models and GAMs have the smallest conditional expected loss of prediction over the four responses. In addition, robust regression models offer the advantage of identifying molecules that perform poorly in the fit. In general, models were identified that yielded an improvement of approximately 50% in the conditional expected loss of prediction compared with the original parametrization of Duffy and Jorgensen. It was also found that the use of cross-validation to compare models was unreliable, and bootstrapping is preferred.

A web/grid portal implementation of BioSimGrid: a biomolecular simulation database

Wu, B., Dovey, M., Ng, M. H., Tai, K., Murdock, S., Fangohr, H., ... Sansom, M. S. P. (2004)

The overall aim of the BioSimGrid project ( is to exploit the grid infrastructure to enable comparative analysis of the results of biomolecular simulations. In particular, we present the implementation of current BioSimGrid Web portal. The portal has a SOA (service oriented architecture) framework built on the layer of OGSA (open grid service architecture) and OGSA-DAI (open grid service architecture database access and integration) middleware. The PortalLib has been developed to allow RAD (rapid application development) of portal applications. The portal also integrates PKI (public key infrastructure) and supports two levels of distributed SSO (single sign on): grid certificate-based SSO for high security, and user/pass based SSO for maximal flexibility.

BioSimGrid: towards a worldwide repository for biomolecular simulations

Tai, K., Murdock, S., Wu, B., Ng, M. H., Johnston, S., Fangohr, H., ... Sansom, M. S. P. (2004)

BioSimGrid is a database for biomolecular simulations, or, a ‘Protein Data Bank extended in time’ for molecular dynamics trajectories. We describe the implementation details: architecture, data schema, deposition, and analysis modules. We encourage the simulation community to explore BioSimGrid and work towards a common trajectory exchange format.

Cell membrane permeation is required for most drugs to reach their biological target, and understanding this process is therefore crucial for rational drug design. Recent molecular dynamics simulations have studied the permeation of eight small molecules through a phospholipid bilayer. Unlike experiments, atomistic simulations allow the direct calculation of diffusion and partition coefficients of solutes at different depths inside a lipid membrane. Further analyses of the simulations suggest that solute diffusion is less size-dependent and solute partitioning more size-dependent than was commonly thought.

Relative binding free energies of amino acid derivatives to the host macrobicycle 12 in chloroform are calculated to determine the effect of stereochemistry and amide bond conformation. Simulations are performed for three different amino acid derivatives using Monte Carlo simulations, free energy perturbations, and the generalized Born/surface area solvation model. The free energy results support previous experimental findings that the l enantiomers of N-acetyl phenylalanine carboxylate and N-acetyl alanine carboxylate bind preferentially over the d enantiomers and that the amide bond is in the cis conformation for the bound l enantiomers. The enantioselectivity arises because of better shape complementarity, and the cis stabilization is achieved by a selective hydrogen bond to the cavity rim of the host and a remote steric stabilization of the side-chain χ1 dihedral angle. This computational approach was also applied to N-acetyl glycine carboxylate and indicated that, for this amino acid derivative, both the trans and the cis forms of the acetyl amide can be accommodated within the macrobicycle cavity. Experimental evidence to support this finding was obtained by detailed NMR experiments on a 1:1 complex of M12 and N-acetyl glycine. The study demonstrates two important points regarding binding and conformation:  First, the destabilization of the χ1 dihedral angle of N-acetyl phenylalanine away from the most stable conformation appears to stabilize the cis conformation of the preceding amide bond, suggesting that amino acids other than proline might be able to increase the likelihood of cis amide bonds. More generally, internal conformational coupling is an important factor to consider in molecular recognition. Second, the comprehensive sampling protocol undertaken for this relatively small system that allows the host to adjust properly to each guest is shown to be crucial in obtaining converged free energies and different binding modes. Subtle changes in host geometry have profound effects on binding. Conversely, different binding modes arising from small changes in guest conformation, chirality, or side chain serve as a reminder that binding modes are not always transferable between apparently similar molecules.

To reach their biological target, drugs have to cross cell membranes, and understanding passive membrane permeation is therefore crucial for rational drug design. Molecular dynamics simulations offer a powerful way of studying permeation at the single molecule level, yielding detailed dynamic and thermodynamic data. Biological membranes have a very inhomogeneous character and a highly anisotropic behavior. Starting from a computer model proven to reproduce the physical properties of such a complex system, the permeation of small organic molecules across a lipid bilayer model has been studied. Free energy profiles and diffusion coefficients along the bilayer normal have been calculated for small organic molecules by means of all-atom molecular dynamics (MD) simulations constraining the compounds at chosen depths inside the membrane. These data also allow for the calculation of permeability coefficients, the results for which have been compared with experimental data. The calculated permeability coefficients are generally 1 order of magnitude larger than the equivalent experimental data, but the molecules' relative permeability coefficients are reproduced.

Replica-exchange-based free-energy methods

Essex, J. W., Woods, C. J., Foucher, S., & King, M. A. (2004)

The calculation of relative free-energies that involve large reorganizations of the environment is one of the great challenges of condensed-phase simulation. To meet this challenge, we have developed a new free-energy technique that combines the advantages of the Hamiltonian replica-exchange method with thermodynamic integration. We have tested and compared this technique with other free-energy methods on a range of systems including the binding of ligands to p38 kinase. The use of replica-exchange moves leads to dramatic improvements in configurational sampling, and a reduction of simulation errors. Subtle solvation effects are explicitly taken into account in the free-energy results. This is achieved at no extra computational cost, relative to standard free-energy methods.

Generalized Born Surface Area (GBSA) models for water using the Pairwise Descreening Approximation (PDA) have been parameterized by two different methods. The first method, similar to that used in previously reported parameterizations, optimizes all parameters against the experimental free energies of hydration of organic molecules. The second method optimizes the PDA parameters to compensate only for systematic errors of the PDA. The best models are compared to Poisson–Boltzmann calculations and applied to the computation of potentials of mean force (PMFs) for the association of various molecules. PMFs present a more rigorous test of the ability of a solvation model to correctly reproduce the screening of intermolecular interactions by the solvent, than its accuracy at predicting free energies of hydration of small molecules. Models derived with the first method are sometimes shown to fail to compute accurate potentials of mean force because of large errors in the computation of Born radii, while no such difficulties are observed with the second method. Furthermore, accurate computation of the Born radii appears to be more important than good agreement with experimental free energies of solvation. We discuss the source of errors in the potentials of mean force and suggest means to reduce them. Our findings suggest that Generalized Born models that use the Pairwise Descreening Approximation and that are derived solely by unconstrained optimization of parameters against free energies of hydration should be applied to the modeling of intermolecular interactions with caution. © 2004 Wiley Periodicals, Inc. J Comput Chem 25: 1760–1770, 2004

Application of the Hilbert-Huang transform to the analysis of molecular dynamics simulations

Phillips, S. C., Gledhill, R. J., Essex, J. W., & Edge, C. M. (2003)

The Hilbert−Huang transform (HHT) is a new method for the analysis of nonstationary signals that allows a signal's frequency and amplitude to be evaluated with excellent time resolution. In this paper, the HHT method is described, and its performance is compared with the Fourier methods of spectral analysis. The HHT is then applied to the analysis of molecular dynamics simulation trajectories, including enhanced sampling trajectories produced by reversible digitally filtered molecular dynamics. Amplitude-time, amplitude-frequency, and amplitude-frequency-time spectra are all produced with the method and compared to equivalent results obtained using wavelet analysis. The wavelet and HHT analysis yield qualitatively similar results, but the HHT provides a better match to physical intuition than the wavelet transform. Moreover the HHT method is able to show the flow of energy into low-frequency vibrations during conformational change events and is able to identify frequencies appropriate for amplification by digital filters including the observation of a 10 cm-1 shift in target frequency.

Enhanced configurational sampling in binding free-energy calculations

Woods, C. J., Essex, J. W., & King, M. A. (2003)

The newly developed method of replica-exchange thermodynamic integration (RETI) was tested and compared with finite-difference thermodynamic integration (FDTI) on the calculation of the relative binding free energies of halides to a calix[4]pyrrole derivative. The calculation was challenging, because the dimethylsulfoxide solvent was contaminated by small amounts of water. The λ-swap move of RETI enabled more-complete sampling of the solvents and produced relative binding free energies that included the effect of the fluoride's higher affinity for water. In addition, the λ-swap move increased the quality of the configurational sampling of the host, because the system was able to escape from local minima. The results demonstrate that the sampling of RETI is superior to that of FDTI, at no additional computational expense.

The docking of flexible small molecule ligands to large flexible protein targets is addressed in this article using a two‐stage simulation‐based method. The methodology presented is a hybrid approach where the first component is a dock of the ligand to the protein binding site, based on deriving sets of simultaneously satisfied intermolecular hydrogen bonds using graph theory and a recursive distance geometry algorithm. The output structures are reduced in number by cluster analysis based on distance similarities. These structures are submitted to a modified Monte Carlo algorithm using the AMBER‐AA molecular mechanics force field with the Generalized Born/Surface Area (GB/SA) continuum model. This solvent model is not only less expensive than an explicit representation, but also yields increased sampling. Sampling is also increased using a rotamer library to direct some of the protein side‐chain movements along with large dihedral moves. Finally, a softening function for the nonbonded force field terms is used, enabling the potential energy function to be slowly turned on throughout the course of the simulation. The docking procedure is optimized, and the results are presented for a single complex of the arabinose binding protein. It was found that for a rigid receptor model, the X‐ray binding geometry was reproduced and uniquely identified based on the associated potential energy. However, when side‐chain flexibility was included, although the X‐ray structure was identified, it was one of three possible binding geometries that were energetically indistinguishable. These results suggest that on relaxing the constraint on receptor flexibility, the docking energy hypersurface changes from being funnel‐like to rugged. A further 14 complexes were then examined using the optimized protocol. For each complex the docking methodology was tested for a fully flexible ligand, both with and without protein side‐chain flexibility. For the rigid protein docking, 13 out of the 15 test cases were able to find the experimental binding mode; this number was reduced to 11 for the flexible protein docking. However, of these 11, in the majority of cases the experimental binding mode was not uniquely identified, but was present in a cluster of low energy structures that were energetically indistinguishable. These results not only support the presence of a rugged docking energy hypersurface, but also suggest that it may be necessary to consider the possibility of more than one binding conformation during ligand optimization. © 2003 Wiley Periodicals, Inc. J Comput Chem 24: 1637–1656, 2003

Reversible digitally filtered molecular dynamics

Phillips, S. C., Swain, M. T., Wiley, A. P., Essex, J. W., & Edge, C. M. (2003)

It has recently been shown that digital filtering methods may be used to selectively enhance or suppress the vibrational motion in a molecular dynamics computer simulation solely on the basis of frequency (J. Chem. Phys.2000, 112, 2586−2597). The method of digitally filtered molecular dynamics (DFMD) does, however, suffer from a number of disadvantages, the most important of which is the rapid redistribution of energy from the selected frequency range in condensed phase simulations. Here, an extension of the DFMD method that solves this problem, reversible digitally filtered molecular dynamics (RDFMD), is presented. In RDFMD, the digital filter is applied successively to velocities that have been generated from previous applications of the filter, by the simple expedient of running simulations both forward and backward in time to fill the filter buffer after each filter application. In this way, kinetic energy is added slowly to the system, with the result that the conformational transitions observed are more controlled and realistic. The method is applied to a number of systems of increasing complexity including alanine dipeptide in gas and condensed phases. These studies demonstrate the advantage of adding energy gradually and also reveal a change in the characteristic frequency of critical vibrations as the transition state is approached. A protocol for applying RDFMD to protein systems has also been devised and tested on the YPGDV pentapeptide in water.

Solid-phase synthesis of a focused library of trypanothione reductase inhibitors

De Luca, S., Ulhaq, S., Dixon, M. J., Essex, J., & Bradley, M. (2003)

A focused library of inhibitors of the enzyme trypanothione reductase was prepared using solid-phase synthesis. The inhibitors were based on a previously identified, non-competitive, lead compound comprising of two Pmc (2,2,5,7,8-pentamethylchroman-6-sulfonyl) side-chain protected, N-capped arginine residues linked by a spermidine bridge. In total six protecting groups and four capping groups were used to generate a 24-membered library. All compounds bearing the 5-methoxyindole-3-acetic acid capping group were found to have good activity. The most potent inhibitor was observed to contain the Mtr (4-methoxy-2,3,6-trimethylbenzenesulphonyl) protecting group on the arginine residue, terminated with tryptophan as the capping group.

The development of replica-exchange-based free-energy methods

Woods, C. J., Essex, J. W., & King, M. A. (2003)

The calculation of relative free energies that involve large reorganizations of the environment is one of the great challenges of condensed-phase simulation. Such calculations are of particular importance in protein−ligand free-energy calculations. To meet this challenge, we have developed new free-energy techniques that combine the advantages of the replica-exchange method with free-energy perturbation (FEP) and finite-difference thermodynamic integration (FDTI). These new techniques are tested and compared with FEP, FDTI, and the adaptive umbrella weighted histogram analysis method (AdUmWHAM) on the challenging calculation of the relative hydration free energy of methane and water. This calculation involves a large solvent configurational change. Through the use of replica-exchange moves along the λ-coordinate, the configurations sampled along λ are allowed to mix, which leads to dramatic improvements in solvent configurational sampling, an efficient reduction of random sampling error, and a reduction of general simulation error. This is achieved at effectively no extra computational cost, relative to standard FEP or FDTI.

A review of protein-small molecule docking methods

Taylor, R. D., Jewsbury, P. J., & Essex, J. W. (2002)

The binding of small molecule ligands to large protein targets is central to numerous biological processes. The accurate prediction of the binding modes between the ligand and protein, (the docking problem) is of fundamental importance in modern structure-based drug design. An overview of current docking techniques is presented with a description of applications including single docking experiments and the virtual screening of databases.

Fluoride-selective binding in a new deep cavity calix[4]pyrrole: experiment and theory

Woods, C. J., Camiolo, S., Light, M. E., Coles, S. J., Hursthouse, M. B., King, M. A., ... Essex, J. W. (2002)

A new “super-extended cavity” tetraacetylcalix[4]pyrrole derivative was synthesized and characterized, and X-ray crystal structures of complexes bound to fluoride and acetonitrile were obtained. The binding behavior of this receptor was investigated by NMR titration, and the complex was found to exclusively bind fluoride ions in DMSO-d6. This unusual binding behavior was investigated by Monte Carlo free energy perturbation simulations and Poisson calculations, and the ion specificity was seen to result from the favorable electrostatic interactions that the fluoride gains by sitting lower in the phenolic cavity of the receptor. The effect of water present in the DMSO on the calculated free energies of binding was also investigated. Owing to the use of a saturated ion solution, the effect of contaminating water is small in this case; however, it has the potential to be very significant at lower ion concentrations. Finally, the adaptive umbrella WHAM protocol was investigated and optimized for use in binding free energy calculations, and its efficiency was compared to that of the free energy perturbation calculations; adaptive umbrella WHAM was found to be approximately two times more efficient. In addition, structural evidence demonstrates that the protocol explores a wider conformational range than free energy perturbation and should therefore be the method of choice. This paper represents the first complete application of this methodology to “alchemical” changes.

The development of a coarse‐grained reduced‐representation model of the hydrocarbon region of a biological membrane is reported. The potential is based on the popular Gay–Berne model of liquid crystals, and involves the linking of individual Gay–Berne ellipsoids by harmonic springs to form each hydrocarbon chain. Diffusion coefficients and order parameters have been calculated by molecular dynamics computer simulations for a range of parameter sets. The results clearly demonstrate the presence of a phase transition from an ordered low‐temperature solid phase reminiscent of the Lβ′ phase of phospholipids, to a high‐temperature disordered phase reminiscent of the Lα phase. Order parameters calculated for each layer of the model are consistent with the experimental segmental order parameters reported for dipalmitoyl phosphatidylcholine. The application of this model to the study of small molecule diffusion within the membrane core is proposed.

The linear finite difference Poisson-Boltzmann (FDPB) equation is applied to the calculation of the electrostatic binding free energies of a group of inhibitors to the Neuraminidase enzyme. An ensemble of enzyme-inhibitor complex conformations was generated using Monte Carlo simulations and the electrostatic binding free energies of subtly different configurations of the enzyme-inhibitor complexes were calculated. It was seen that the binding free energies calculated using FDPB depend strongly on the configuration of the complex taken from the ensemble. This configurational dependence was investigated in detail in the electrostatic hydration free energies of the inhibitors. Differences in hydration energies of up to 7 kcal mol−1 were obtained for root mean square (RMS) structural deviations of only 0.5 Å. To verify the result, the grid size and parameter dependence of the calculated hydration free energies were systematically investigated. This showed that the absolute hydration free energies calculated using the FDPB equation were very sensitive to the values of key parameters, but that the configurational dependence of the free energies was independent of the parameters chosen. Thus just as molecular mechanics energies are very sensitive to configuration, and single-structure values are not typically used to score binding free energies, single FDPB energies should be treated with the same caution.

A new method for modifying the course of a molecular dynamics computer simulation is presented. Digitally filtered molecular dynamics (DFMD) applies the well-established theory of digital filters to molecular dynamics simulations, enabling atomic motion to be enhanced or suppressed in a selective manner solely on the basis of frequency. The basic theory of digital filters and its application to molecular dynamics simulations is presented, together with the application of DFMD to the simple systems of single molecules of water and butane. The extension of the basic theory to the condensed phase is then described followed by its application to liquid phase butane and the Syrian hamster prion protein. The high degree of selectivity and control offered by DFMD, and its ability to enhance the rate of conformational change in butane and in the prion protein, is demonstrated.

Pristine and naphthalene-adsorbed (001) surfaces of the organic conducting salt (TMTSF)2PF6 were examined by scanning tunneling microscopy (STM) to determine the structure of epitaxial monolayer films of naphthalene on the (001) surface. The observed STM images were interpreted by calculating partial electron density plots of the surface, and for naphthalene-adsorbed surfaces, by carrying out Monte Carlo simulations for the adsorption of a naphthalene molecule on the surface. The STM images recorded for the pristine (001) surface correspond to the cation layer of (TMTSF)2PF6, and each circular bright spot of the molecular-resolution STM images represents the three hydrogen atoms of the most protruding methyl group of a TMTSF molecule on the (001) surface. Naphthalene molecules adsorbed on this surface form a pseudomorphic (1 × 1) overlayer structure with respect to the underlying substrate. The naphthalene overlayer shows mechanical stability against etching by the scanning tip. An identical overlayer structure of naphthalene was obtained from several different preparation methods. On the cation-layer (001) surface naphthalene is adsorbed on each “four-methyl-step” defined by four methyl groups of two adjacent TMTSF molecules within each TMTSF stack.