The devil is in the details. How a software tool for image analysis is helping to accelerate drug discovery. Part four in our series on AI in Drug Discovery.

Anne Carpenter, an Institute Scientist at the Broad Institute of Harvard and MIT, is a pioneer in image-based profiling, the extraction of rich, unbiased information from images for a number of important applications in drug discovery and functional genomics. Her research group develops algorithms and strategies for large-scale experiments involving images. When she was doing her post-doctoral work at MIT’s Whitehead Institute for Biomedical Research, she encountered a bottleneck in the processing of cell images while measuring the size of Drosophila fruit fly cells. She decided to write her own software code to solve the imaging problems, and that code eventually became CellProfiler. Today, the team’s open-source CellProfiler software is used by thousands of biologists worldwide. Eureka connected with Dr. Carpenter as part of its series on AI in Drug Discovery. Here are her emailed responses. 

Eureka: Your lab developed the image analysis software tool CellProfiler in 2005 and, later, CellProfiler Analyst. What is the difference between them?

AC: CellProfiler automatically identifies cells in images and measures their properties. CellProfiler Analyst allows biologists to use machine learning to train a classifier that can recognize cells that have a particular appearance, or phenotype, of interest. For example, a neuroscientist might use CellProfiler to identify and count the number of synapses per cell in an image. A cancer researcher might use CellProfiler Analyst to detect cells that have a metastatic appearance by providing a few dozen example images of cells that are metastatic and non-metastatic. 
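
To make the idea of training a classifier from a few dozen examples concrete, here is a minimal sketch in Python. It is not CellProfiler Analyst's actual algorithm: it uses a simple nearest-centroid rule as a stand-in, and the labels, the two features per cell, and all numbers are hypothetical values made up for illustration.

```python
from math import dist
from statistics import mean


def centroid(examples):
    """Average each feature across a list of per-cell feature vectors."""
    return [mean(column) for column in zip(*examples)]


def train(labeled_examples):
    """Build one centroid per label from a dict of label -> example cells."""
    return {label: centroid(rows) for label, rows in labeled_examples.items()}


def classify(model, cell):
    """Assign the label whose centroid is closest to the cell's features."""
    return min(model, key=lambda label: dist(model[label], cell))


# Hypothetical hand-labeled cells (made-up numbers, two features per cell).
examples = {
    "phenotype_of_interest": [[0.9, 0.8], [0.8, 0.9], [1.0, 0.7]],
    "other": [[0.2, 0.1], [0.1, 0.3], [0.3, 0.2]],
}

model = train(examples)
prediction = classify(model, [0.85, 0.75])
print(prediction)
```

A real classifier would typically use many more features per cell and a more capable learning method; the sketch only shows the workflow of labeling examples, fitting a model, and scoring new cells.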

Eureka: You have also developed a unique way of mining and measuring the myriad features of a cell, called image-based profiling. How does that work?

AC: Images contain far more information about the state of cells than is typically measured by biologists—in fact, more than can even be detected by eye! We aim to capture that information by measuring a huge variety of properties of each cell in each image, producing a quantitative snapshot of the cell’s state. This snapshot is called a profile, and once it has been extracted from a cell, we can compare and contrast that cell with other cells, which may have been treated with different drugs, for example.
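
As a rough illustration of what a profile could look like in code, the Python sketch below standardizes per-cell measurements against untreated control cells, collapses them into one profile per treatment, and compares profiles with cosine similarity. The three features, the treatment names, the aggregation by median, and all numbers are assumptions made up for this example, not the lab's actual pipeline.

```python
from math import sqrt
from statistics import mean, median, stdev


def standardize(cells, reference):
    """Z-score each feature of `cells` against the reference (control) cells."""
    columns = list(zip(*reference))
    centers = [mean(column) for column in columns]
    spreads = [stdev(column) or 1.0 for column in columns]  # guard against 0
    return [
        [(value - m) / s for value, m, s in zip(cell, centers, spreads)]
        for cell in cells
    ]


def build_profile(cells):
    """Collapse per-cell feature rows into one profile (per-feature median)."""
    return [median(column) for column in zip(*cells)]


def cosine_similarity(a, b):
    """Similarity of two profiles: 1 = same direction, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical measurements: [area, mean_intensity, eccentricity] per cell.
control = [[100, 0.50, 0.20], [110, 0.55, 0.25], [90, 0.45, 0.15]]
drug_a = [[150, 0.80, 0.60], [160, 0.85, 0.65], [140, 0.75, 0.55]]
drug_b = [[155, 0.82, 0.58], [165, 0.86, 0.66], [145, 0.78, 0.57]]
drug_c = [[50, 0.25, 0.20], [60, 0.30, 0.22], [40, 0.20, 0.18]]

profile_a = build_profile(standardize(drug_a, control))
profile_b = build_profile(standardize(drug_b, control))
profile_c = build_profile(standardize(drug_c, control))

similarity_ab = cosine_similarity(profile_a, profile_b)
similarity_ac = cosine_similarity(profile_a, profile_c)
print(f"A vs B: {similarity_ab:.3f}  A vs C: {similarity_ac:.3f}")
```

In this toy data, the two treatments that shift cells in the same direction get a similarity close to 1, while the treatment that shifts them the other way scores below zero.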

Eureka: Can you provide an example of how these tools have accelerated drug development?

AC: This ability to quantitatively match cells based on their image-based profiles sounds simple, but it has many applications in drug discovery. For example, we can take cells from patients with and without a disease and compare their morphology. If we find a difference in their profiles, this can serve as a diagnostic tool. Even better, we can then test thousands of drugs to find any that are able to reverse the disease profile and make cells look healthy again.
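
One way to picture the search for drugs that reverse a disease profile is sketched below in Python. The scoring rule is an assumption chosen for illustration, not the lab's published method: each compound is scored by how much closer it moves diseased cells to the healthy profile, where 1 means fully restored and 0 means no change. The compound names and profile values are made up.

```python
from math import dist


def reversal_score(treated_profile, disease_profile, healthy_profile):
    """Fraction of the disease-to-healthy distance that the treatment closes."""
    baseline = dist(disease_profile, healthy_profile)
    if baseline == 0:
        raise ValueError("disease and healthy profiles are identical")
    return 1.0 - dist(treated_profile, healthy_profile) / baseline


# Hypothetical standardized profiles (made-up numbers, three features each).
healthy = [0.0, 0.0, 0.0]
disease = [2.0, -1.5, 1.0]
treated = {
    "compound_A": [1.9, -1.4, 1.1],  # barely changes the disease profile
    "compound_B": [0.3, -0.2, 0.1],  # moves cells close to healthy
    "compound_C": [3.0, -2.0, 1.5],  # pushes cells further from healthy
}

scores = {
    name: reversal_score(profile, disease, healthy)
    for name, profile in treated.items()
}

# Best candidates first: highest score means closest to the healthy profile.
ranked = sorted(scores, key=scores.get, reverse=True)
for name in ranked:
    print(f"{name}: {scores[name]:.2f}")
```

A negative score flags a compound that makes diseased cells look even less like healthy ones, which is useful to know in a screen of thousands of candidates.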

Eureka: How difficult is it for laboratories to use CellProfiler?

AC: CellProfiler was designed by and for biologists. It is point-and-click software, rated as the most usable and flexible in a third-party academic review of bioimage analysis software. Really, anyone can download it and get started quickly with the examples we have online. But it is also quite flexible and powerful; there are lots of options for complex analyses, including a new module that can run a trained deep learning model to detect nuclei. That feature is a bit more complex because you need to separately install software for deep learning. One of our major efforts going forward is to make user-friendly software that lets biologists use deep learning in their imaging research.

Eureka: What do you think are the biggest challenges in using AI to help discover drugs? 

AC: So much of drug discovery, and biology, involves teasing apart mechanisms from the messiness, and a lot of problems can’t be reduced to a simple machine learning classification or prediction task. We also don’t have the right kind of ground truth data to make predictions. For example, if we wanted to predict the toxicity of drugs, it would be nice to have a database of all drugs, each given to thousands of healthy humans, with every aspect of their health monitored over 80 years. But such a thing doesn’t exist, and instead we only have sparse data. So I think the biggest challenge is in framing drug discovery problems in a way that lets AI tools use the data we *do* have to make progress.

Eureka: What do you think will be the “Next Big Thing” in the application of AI in drug discovery?

AC: We are in the midst of a convergence of massive amounts of new imaging data from high-throughput experiments with powerful machine learning algorithms and strategies. In my estimation, using images to detect disease-related phenotypes and to computationally match effective drugs to those diseases will transform drug discovery.

Eureka: Robots are ubiquitous in entertainment. Who is your favorite robot?

AC: I’m a nerd, but not THAT kind of nerd :D.