The search for the “holy grail” in computer-aided drug design

In computer-aided drug design (CADD), the ultimate goal is to be able to predict accurately how strongly a potential drug compound will bind to a protein target (i.e., the compound’s binding affinity). If we were able to do this, drug discovery would be significantly accelerated, since we would only need to synthesize in the lab those compounds with predicted high affinity for the target of interest (and predicted low affinity for proteins we didn’t want to interfere with).

Given that our understanding of the many competing and complex factors that govern the binding of a ligand to a protein (what we call “molecular recognition”) is still imperfect, it is hardly surprising that the goal of predicting binding affinity reliably enough to have a major impact on drug discovery projects has remained elusive. However, there are glimmers of hope that things might be starting to change.

As I highlighted briefly towards the end of 2018, the combination of state-of-the-art GPU computing resources, improved molecular mechanics force fields and molecular dynamics (MD) programs that can take advantage of the massively parallel architectures that GPUs offer has enabled the development of free energy perturbation (FEP) calculations. In some cases, these calculations can rank compounds by predicted binding affinity accurately enough for project teams to make good decisions about which compounds to make.

Retrospective and prospective data

A landmark paper in this regard was published in 2015 showing useful predictions across 10 protein targets in both retrospective and, importantly, prospective studies. This publication sparked considerable debate. Since that time, many more groups have gained access to this kind of technology and something of a consensus is emerging, at least anecdotally.

A CADD group at Janssen has published quite a lot about its experiences with FEP over the last few years. In the first of its papers, the FEP+ methodology developed by Schrödinger was successfully applied to the design of inhibitors of beta-secretase (BACE1), once again in both retrospective and prospective studies. In the latter, the default simulations (of 5 ns duration) gave a mean unsigned error (MUE) compared to experiment of 0.91 kcal/mol, and this error could be reduced to 0.59 kcal/mol with 20 ns simulations. (Note that we generally require predictions to be within 1 kcal/mol to be useful, since 1.4 kcal/mol equates to a 10-fold difference in binding affinity.)
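The 1.4 kcal/mol rule of thumb follows from the standard relationship between binding free energy and the dissociation constant, ΔG = RT ln(Kd), so that a free energy difference ΔΔG corresponds to an affinity ratio of exp(ΔΔG / RT). The snippet below is a minimal sketch of that arithmetic, assuming a temperature of 298.15 K (the papers discussed here may have used a different value); the function names are illustrative and not taken from any FEP package.

```python
import math

# Gas constant in kcal/(mol*K)
R_KCAL = 1.987204e-3


def fold_change(ddg_kcal, temperature=298.15):
    """Affinity ratio implied by a binding free energy difference (kcal/mol).

    Assumes ddG = RT * ln(Kd_a / Kd_b); the 298.15 K default is an assumption.
    """
    return math.exp(abs(ddg_kcal) / (R_KCAL * temperature))


def ddg_for_fold(fold, temperature=298.15):
    """Free energy difference (kcal/mol) implied by a given affinity ratio."""
    return R_KCAL * temperature * math.log(fold)


if __name__ == "__main__":
    print(f"10-fold in affinity  ~ {ddg_for_fold(10):.2f} kcal/mol")
    print(f"1.0 kcal/mol error   ~ {fold_change(1.0):.1f}-fold in affinity")
    print(f"0.59 kcal/mol error  ~ {fold_change(0.59):.1f}-fold in affinity")
```

At room temperature a 10-fold difference works out slightly below 1.4 kcal/mol, and an error of 1 kcal/mol corresponds to roughly a 5-fold uncertainty in affinity, which is why that threshold is a sensible bar for usefulness.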

In closing, the authors noted a couple of pre-requisites for success with FEP: confidence in the underlying binding mode and sufficient computational sampling. Insufficient sampling is often the reason when outliers are observed.

More recently, a more challenging test case involving phosphodiesterase 2 (PDE2) has been reported by the same group. The inhibitors studied fell into two groups, “small” and “large”, but both showed similar potency. Molecules of intermediate size showed variable activity. In their initial experiments, the group found that they were able to predict the small and large compounds’ activities accurately when they treated the groups separately, but when they combined them, the activities of the smaller compounds were under-predicted. As the authors note, this would have made it difficult to apply the method prospectively with confidence. It was only when a new X-ray structure of PDE2 was employed, in conjunction with a modeled dimer of the enzyme, that the results improved. This illustrates that FEP calculations can be sensitive to the protein conformation employed and that the first X-ray structure chosen may not necessarily be the most suitable.

Activity cliffs

In its latest publication, the group has turned its attention to the use of FEP calculations to predict “activity cliffs”. These are instances where a relatively small change in a compound’s structure produces a much larger change in activity than might have been predicted. Clearly, it would be very helpful for medicinal chemists to be able to predict such phenomena during lead optimization. An interesting feature of this work was that the group compared two implementations of FEP: the already-mentioned FEP+ and that within the GROMACS suite of programs. Two test sets were studied and the two methods gave qualitative agreement on both. In quantitative terms, the two methods gave identical errors for the first set (1.43 kcal/mol – equivalent roughly to a 10-fold difference in binding affinity) and for the second, FEP+ performed better (1.17 kcal/mol cf. 1.90 kcal/mol for GROMACS).
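To make this kind of head-to-head comparison concrete, the sketch below computes a mean unsigned error (MUE) for two sets of predictions against the same experimental values. It assumes that the errors quoted above are MUEs, as in the earlier BACE1 paper. All numbers in it are hypothetical and chosen only for illustration; they are not taken from the publications discussed, and `method_a` and `method_b` are placeholder names rather than real FEP+ or GROMACS output.

```python
def mean_unsigned_error(predicted, experimental):
    """Mean of |predicted - experimental| over paired values (kcal/mol)."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - e) for p, e in zip(predicted, experimental))
    return total / len(predicted)


# Hypothetical binding free energies in kcal/mol (illustrative only)
experimental = [-9.1, -8.4, -10.2, -7.6]
method_a = [-8.5, -9.0, -9.1, -8.0]
method_b = [-7.9, -9.6, -8.8, -8.9]

if __name__ == "__main__":
    for name, predictions in (("method_a", method_a), ("method_b", method_b)):
        mue = mean_unsigned_error(predictions, experimental)
        print(f"{name}: MUE = {mue:.2f} kcal/mol")
```

Because the MUE averages absolute deviations, two methods can agree qualitatively on which compounds are more potent while still differing noticeably in this number, as in the second test set described above.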

Of course, there is always the tendency to publish success stories rather than failures. Anecdotally, experience is still somewhat mixed, with groups reporting success in some projects and not in others. It’s also clear that FEP is not a “black box” but needs to be applied by those with a reasonable level of expertise and understanding. Even then, as the PDE2 example shows, some perseverance may be required to achieve success.

Nonetheless, compared to the situation say ten years ago, the application of FEP has advanced dramatically and it is to be hoped that with continuing developments and experience, the success rate when applied to real-world drug discovery projects will continue to increase.

Further Reading:

  1. Williams-Noonan BJ, Yuriev E, Chalmers DK. Free Energy Methods in Drug Design: Prospects of “Alchemical Perturbation” in Medicinal Chemistry. J. Med. Chem. 2018, 61(3), 638-649. DOI: 10.1021/acs.jmedchem.7b00681
  2. Cournia Z, Allen B, Sherman W. Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. J. Chem. Inf. Model. 2017, 57(12), 2911-2937. DOI: 10.1021/acs.jcim.7b00564