Microsoft AI researchers develop MoLeR: a deep learning-based generative model that enables efficient drug design

This article is based on the research paper 'LEARNING TO EXTEND MOLECULAR SCAFFOLDS WITH STRUCTURAL MOTIFS' and a Microsoft article. All credit for this research goes to the researchers of this paper.

Health systems constantly need new medicines to address unmet medical needs across therapeutic areas. Pharmaceutical companies bring new drugs to market through complex discovery and development activities. Discovery covers identifying and validating targets, identifying hits, creating and optimizing leads, and finally selecting a candidate for further development. Development, on the other hand, includes optimizing chemical synthesis and formulation, studying toxicity in animals, conducting clinical trials, and finally obtaining regulatory approval. Both stages are slow and expensive.

Expert medicinal chemists currently work from “hit” molecules, compounds that show some potential in early screening but also have unfavorable characteristics. In further rounds of testing, chemists modify the structure of these hit compounds to improve their biological effectiveness and eliminate potential negative effects. To focus costly and time-consuming laboratory work on the most promising compounds, computational modeling approaches have been created to predict how molecules will behave in the laboratory.

To address these issues, the Microsoft Generative Chemistry team, in collaboration with Novartis, has developed a model named MoLeR. Their paper, “LEARNING TO EXTEND MOLECULAR SCAFFOLDS WITH STRUCTURAL MOTIFS,” demonstrates how generative models based on deep learning can help transform the drug discovery process and discover new molecules faster.

The researchers argue that automatically designing compounds that meet a project’s requirements better than existing candidate compounds would speed up drug discovery. They note that only a few promising molecules exist in the vast, largely unexplored chemical space. Finding them requires a level of imagination and intuition that pre-programmed algorithms or established rules cannot capture.

The team had previously developed CGVAE, a generative model of molecules that performed well on simple synthetic tasks. However, two issues limited the applicability of the CGVAE model in actual drug discovery:

  1. It cannot be naturally constrained to explore only molecules containing a specific substructure (called the scaffold).
  2. Due to its low-level, atom-by-atom generative procedure, it has difficulty reproducing key structures, such as complex ring systems.

In the MoLeR model, molecules are represented as graphs: atoms are vertices, and the edges connecting them correspond to bonds. The team trained the model using the auto-encoder paradigm. The encoder is a graph neural network (GNN) that compresses an input molecule into a latent code, and the decoder attempts to reconstruct the original molecule from this code.
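As a rough illustration of the graph view described above, the sketch below stores a molecule as a list of atoms (vertices) and a dictionary of bonds (edges). The `MolGraph` class and its method names are hypothetical and are not taken from the MoLeR code; hydrogens and chemical validity are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Toy molecular graph (hypothetical helper, not MoLeR's data structure)."""
    atoms: list = field(default_factory=list)  # element symbol per vertex id
    bonds: dict = field(default_factory=dict)  # {(i, j): bond order}, i < j

    def add_atom(self, symbol):
        """Add a vertex and return its index."""
        self.atoms.append(symbol)
        return len(self.atoms) - 1

    def add_bond(self, i, j, order=1):
        """Add an undirected edge, stored with the smaller index first."""
        self.bonds[(min(i, j), max(i, j))] = order

    def neighbors(self, i):
        """Return the sorted indices of atoms bonded to atom i."""
        return sorted(b if a == i else a for (a, b) in self.bonds if i in (a, b))


# Heavy atoms of ethanol: C-C-O
ethanol = MolGraph()
c1 = ethanol.add_atom("C")
c2 = ethanol.add_atom("C")
o = ethanol.add_atom("O")
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)
```

An encoder such as a GNN would take a structure like this as input and produce a fixed-size latent vector; that learned part is outside the scope of this sketch.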

The reconstruction procedure is sequential because the decoder must decompress a short, fixed-size encoding into a graph of arbitrary size. At each step, the decoder extends the partially built graph by adding new atoms or bonds.

The model’s decoder does not condition on a history of its earlier predictions. At each step, it predicts the next extension from the current partial graph and the latent code alone.
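The property described above can be sketched as a loop whose step function reads only the partial structure and the latent code. Everything here is a toy stand-in under stated assumptions: `next_atom` replaces the learned decoder with a seeded random choice, the latent code is a small dictionary instead of a vector, and bonds are omitted so the "molecule" is just a list of atoms.

```python
import random

ATOM_TYPES = ["C", "N", "O"]


def next_atom(partial_atoms, latent_code):
    """Hypothetical stand-in for the learned decoder step.

    Reads only the partial structure and the latent code; it keeps no
    record of earlier predictions. Returns None to signal "stop".
    """
    if len(partial_atoms) >= latent_code["size"]:
        return None
    rng = random.Random(latent_code["seed"] * 1000 + len(partial_atoms))
    return rng.choice(ATOM_TYPES)


def decode(latent_code, start=()):
    """Extend a partial structure one atom at a time until the step says stop."""
    atoms = list(start)
    while True:
        atom = next_atom(atoms, latent_code)
        if atom is None:
            return atoms
        atoms.append(atom)


z = {"size": 6, "seed": 42}          # toy latent code (assumption)
full = decode(z)                     # decode from scratch
resumed = decode(z, start=full[:3])  # restart from an intermediate state
```

Because each step depends only on the current state, decoding can be resumed from any partial structure and still arrive at the same result, which is what makes it possible to start from something other than an empty graph.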

The researchers explain that drug compounds are not combinations of random atoms but are usually made up of larger structural motifs, similar to how sentences in natural languages are made up of words rather than random sequences of letters. Unlike CGVAE, MoLeR first discovers these common building blocks from data and then learns to extend a partial molecule using complete motifs rather than single atoms. Because the order in which a molecule is built is arbitrary, MoLeR is also trained to build the same molecule in different orders.
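One simple way to picture "discovering common building blocks from data" is a frequency count over fragments. The sketch below assumes each molecule has already been split into fragments written as SMILES-like strings; the splitting rule, the function name `build_motif_vocabulary`, and the example data are illustrative assumptions, not the procedure or data used in the paper.

```python
from collections import Counter


def build_motif_vocabulary(fragmented_molecules, max_size):
    """Keep the max_size most frequent fragments as the motif vocabulary."""
    counts = Counter(frag for mol in fragmented_molecules for frag in mol)
    return [frag for frag, _ in counts.most_common(max_size)]


# Toy dataset: each inner list holds the fragments of one molecule (made up).
dataset = [
    ["c1ccccc1", "C(=O)O"],
    ["c1ccccc1", "N"],
    ["c1ccccc1", "C(=O)O", "C1CCNCC1"],
]
vocab = build_motif_vocabulary(dataset, max_size=2)
```

With such a vocabulary, a decoder can add a whole ring system in one step, while fragments that did not make the cut would still have to be built atom by atom.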

As a result, MoLeR not only requires fewer steps to make drug-like compounds, but it also does so in steps that are more analogous to how scientists think about building molecules.

A scaffold is a core part of a molecule that has already demonstrated promising properties. Drug development programs typically focus on a narrow subset of chemical space by first identifying a scaffold and then examining only compounds that contain it as a subgraph. MoLeR’s decoder design integrates a scaffold naturally: the scaffold is simply used as the initial state of the decoding loop. Because MoLeR is trained with random generation orders, it learns to complete arbitrary subgraphs, which makes it suitable for targeted scaffold-based exploration.
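The idea of seeding the decoding loop with a scaffold can be sketched as follows. This is a toy under stated assumptions: random choices stand in for the learned decoder, valence and other chemistry rules are ignored, and the function name `extend_from_scaffold` is hypothetical.

```python
import random


def extend_from_scaffold(scaffold_atoms, scaffold_bonds, n_new_atoms, seed=0):
    """Grow a molecule-like graph starting from a scaffold (toy sketch).

    The scaffold is the initial state and is never modified, so every
    output contains it as a subgraph by construction.
    """
    rng = random.Random(seed)
    atoms = list(scaffold_atoms)
    bonds = set(scaffold_bonds)
    for _ in range(n_new_atoms):
        attach_to = rng.randrange(len(atoms))    # pick an existing atom
        atoms.append(rng.choice(["C", "N", "O"]))
        bonds.add((attach_to, len(atoms) - 1))   # bond it to the new atom
    return atoms, bonds


# A six-membered carbon ring as the scaffold (bond orders omitted).
ring_atoms = ["C"] * 6
ring_bonds = {(i, (i + 1) % 6) for i in range(6)}
atoms, bonds = extend_from_scaffold(ring_atoms, ring_bonds, n_new_atoms=3)
```

The guarantee comes from the loop structure rather than from a filter applied afterwards: since steps only add atoms and bonds, the scaffold cannot be lost.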

The researchers mention that although MoLeR has no built-in concept of “molecular optimization”, an off-the-shelf black-box optimization method can be used to optimize in the space of latent codes. They used Molecular Swarm Optimization (MSO) in this work because it produces state-of-the-art results for latent space optimization with other models, and their findings show that it also works well for MoLeR. The team tested MSO with MoLeR on new benchmark tasks that resemble real drug discovery projects involving large scaffolds and found that this combination outperformed existing models.
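To make the idea of black-box optimization over latent codes concrete, here is a minimal, generic particle swarm optimizer. It is not the MSO implementation: the hyperparameters are common textbook defaults, and `toy_score` is a made-up objective standing in for "decode the latent code into a molecule and score its properties".

```python
import random


def particle_swarm_maximize(score, dim, n_particles=20, n_steps=100, seed=0,
                            inertia=0.7, c_personal=1.5, c_global=1.5):
    """Maximize a black-box score over a continuous vector with basic PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]           # best position seen per particle
    best_val = [score(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # best position seen by swarm
    for _ in range(n_steps):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = score(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


def toy_score(z):
    """Made-up objective: higher is better, maximum of 0 at the target."""
    target = [0.5, -0.25, 0.1]
    return -sum((a - b) ** 2 for a, b in zip(z, target))


best_z, best_score = particle_swarm_maximize(toy_score, dim=3)
```

The optimizer only ever calls `score`, so the generative model needs no optimization logic of its own; swapping in a scoring function that decodes and evaluates molecules is all that changes.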




James G. Williams