How scientists can use machine learning tools to build a better molecule. Second in our AI series.

As anyone in drug discovery knows, finding druggable compounds is a time-consuming process of some hits and many misses. That raises the question: rather than trying to get an existing molecule to work, doesn’t it make more sense to build a brand new one? That is essentially the focus of Dr. Ola Engkvist, PhD, a computational chemist and section head of hit discovery in AstraZeneca’s Discovery Sciences department. Dr. Engkvist’s lab has been harnessing the power of machine learning. Through a collaboration with the University of Muenster, they demonstrated the first application of recurrent neural networks to molecular design. This methodology allows one to design novel drug molecules, using machine learning to navigate the breadth of chemical space and to exploit our vast knowledge base. Eureka recently spoke to Dr. Engkvist as part of a multi-part series on how AI is being used in drug discovery. Below are his slightly edited responses.

Eureka: How is AI transforming how you design drugs at AZ?

OE: AI is helping us augment traditional drug design with sophisticated computational methods to predict what molecules to make next and optimize how to synthesize them.

Eureka: One of your areas of interest within AI pertains to de novo molecular design. What exactly does this term mean and why is it such a hot topic in AI?

OE: De novo molecular design means creating novel compounds using a computer. It has been around for about 25 years in the form of algorithms that can sample a subset of the chemical space through virtual enumeration of combinatorial libraries or genetic algorithms. These methods have a place in the drug design toolkit but they aren’t designed to efficiently search the whole chemical space. More recently, deep learning-based generative methods have for the first time made it possible to search the full chemical space. We have shown that recurrent neural networks (RNN) sample the whole chemical space for fragments (GDB-13). This comparison is possible since the fragment space is enumerable. The probabilistic compression of the chemical space possible with deep learning methods is something unique, and wasn’t available before.
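For readers unfamiliar with the older approach mentioned above, virtual enumeration of a combinatorial library can be sketched in a few lines. This is only an illustration, not AstraZeneca's method: the scaffold, the substituent lists and the function name `enumerate_library` are all hypothetical, and the output strings are not checked for chemical validity.

```python
from itertools import product

# Hypothetical scaffold with two attachment points, written as a SMILES template.
SCAFFOLD = "c1ccc({r1})cc1{r2}"

# Hypothetical substituent lists, chosen only for illustration.
R1_GROUPS = ["C", "OC", "N"]
R2_GROUPS = ["F", "Cl", "C(=O)O"]


def enumerate_library(scaffold, r1_groups, r2_groups):
    """Return every scaffold/substituent combination as a SMILES-like string."""
    return [
        scaffold.format(r1=r1, r2=r2)
        for r1, r2 in product(r1_groups, r2_groups)
    ]


library = enumerate_library(SCAFFOLD, R1_GROUPS, R2_GROUPS)
for smiles in library:
    print(smiles)
```

The library size is the product of the list sizes (here 3 x 3 = 9), which is why this kind of enumeration covers only the subset of chemical space spanned by the chosen scaffold and building blocks.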

Eureka: What architectural method are you using to design these molecules?

OE: Our favorite architecture is the recurrent neural network (RNN), which we believe is especially suitable for scaffold hopping. However, for optimization of a compound series we also use encoder architectures as a complement to the RNN.

Eureka: How well do neural networks understand the chemical structures needed to make a drug-like compound? Are they better than a human?

OE: Recurrent neural networks (RNN) don’t really understand chemical structures; they learn rules about how to generate novel character strings that correspond to molecules within the chemical space. If an RNN is trained on drug-like molecules, it will generate novel drug-like molecules. Since humans decide on which chemical space to train the RNN, humans are essential to the process. What the RNN can do is generate many more drug-like molecules and combine these with information about a drug target to home in on a part of the drug-like chemical space that the human may not have thought of.
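The idea of learning rules for generating character strings can be illustrated with a toy example. The sketch below is not an RNN and is not the model described in the interview: a first-order Markov chain over characters stands in for the network, so that the example stays small and needs only the standard library. The training strings, `fit_transitions` and `sample_string` are hypothetical, and the sampled strings are not guaranteed to be valid molecules; a real workflow would check them with a cheminformatics toolkit.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # markers for the beginning and end of a string

# Tiny hand-picked set of SMILES strings, for illustration only.
TRAINING_SMILES = ["CCO", "CCN", "CCCO", "CC(=O)O", "CCOC"]


def fit_transitions(smiles_list):
    """Count how often each character follows each other character."""
    counts = defaultdict(lambda: defaultdict(int))
    for smiles in smiles_list:
        chars = [START] + list(smiles) + [END]
        for prev, nxt in zip(chars, chars[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_string(counts, rng, max_len=20):
    """Generate one string character by character from the learned counts."""
    out, prev = [], START
    for _ in range(max_len):
        choices = list(counts[prev])
        weights = [counts[prev][c] for c in choices]
        nxt = rng.choices(choices, weights=weights)[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)


counts = fit_transitions(TRAINING_SMILES)
rng = random.Random(0)  # fixed seed so runs are repeatable
samples = [sample_string(counts, rng) for _ in range(5)]
print(samples)
```

As in the interview's description, the sampler can only recombine patterns present in its training set, which is why the choice of training molecules matters. An RNN differs in that it conditions each character on the whole preceding string rather than on the previous character alone.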

Eureka: What do you think are the biggest challenges in using AI to help discover drugs? 

OE: Today we are generating and have access to more data than ever before. Data and analytics have the potential to transform drug discovery but the true value of scientific data can only be realised if it is “FAIR”, or Findable, Accessible, Interoperable and Reusable. AstraZeneca is focused on creating an enterprise data and AI architecture. To do this we are bringing the right people together to ensure we are collecting, organising and using the right data, in the best way.

Eureka: What do you think will be the “Next Big Thing” in the application of AI in drug discovery? 

OE: I think the next 5-10 years will be really exciting. I think one of the next big advances will be a much tighter integration with automation that allows us to move from an augmented drug design paradigm where the design chemist takes all the decisions to an autonomous drug design paradigm, where the system can autonomously decide which compound to make next.

Eureka: Robots are ubiquitous in entertainment. Who is your favorite robot?

OE: R2-D2 since I saw the first Star Wars movie in 1977 as a kid.

Thanks for tuning in. Our next Q&A in this series on AI in Drug Discovery will be with Dr. Gisbert Schneider, PhD, a professor of Computer-Assisted Drug Design at the Institute of Pharmaceutical Sciences in the Department of Chemistry and Applied Sciences, ETH Zurich. His focus has been the development and application of adaptive intelligent systems for molecular de novo design and drug discovery. You can follow our series here.