The power of AI and computation in drug discovery

Diba Dindoust
11 min read · Oct 24, 2021

Have you ever looked at the bottle of Tylenol in your kitchen cupboard and asked yourself how such a useful medicine came about? No matter if I’d had surgery, the flu, cramps, or soreness, Tylenol has been a quick off-the-shelf medication to reduce any type of pain I had been facing.

Sometimes, after taking a pill and waiting for the pain to subside, I look at the bottle of medication in my hands and think about who created it and how they did it. And sometimes I think about the times when we didn’t have such an accessible medication to get rid of physical pain.

Tylenol, a brand name of the drug acetaminophen, is one of those off-the-shelf medications so widely used today that we sometimes forget how much time and money went into discovering acetaminophen in the 19th and 20th centuries. Today, thousands of drugs are being discovered and developed for many diseases; in the next decade or so, only a handful of them will be sold at our local pharmacies.

The phases in the development of a new drug.

Developing new drugs is hard…

Developing new drugs is a lengthy process that can take 10 years from initial discovery to reach the marketplace. The average cost to research and develop each successful drug is estimated to be $2.6 billion. The development of a new drug has the phases outlined in the diagram to the left.

The future of medicine is personalization

Additionally, there is an increasing demand for personalized medicine. Computational drug discovery methods, and artificial intelligence (AI) in particular, can interpret large datasets and help create more personalized medicines.

Table of Contents

1. Applications of AI and computational methods in drug discovery

2. Breakthroughs in R&D

3. Problems to be solved

4. Market growth of computational drug discovery

1. How are AI and computational methods used in drug discovery?

The stages in drug discovery and development. In this section, only drug discovery is discussed.

Finding new drugs isn’t a cheap process. Most pharmaceutical companies will spend around $1 billion to introduce a new drug to the market. The high cost stems largely from expensive classical experimentation methods. On top of being costly, these traditional methods develop drugs for a single purpose or target.

Novel AI and computational methods speed up the whole drug discovery workflow and reduce its cost. These computational models refine the design phase of drug discovery by analyzing large amounts of data from multiple angles to complete pharmacological tasks. Such tasks can be in pharmacodynamics, the study of the effects drugs have on the body, and pharmacokinetics, the study of how drugs move through the body. Tasks that would have taken a human many years and a great deal of money can now be completed with computers in a fraction of the time and cost.

Let’s take a look at the applications of computational methods and AI in the most prominent early stages of drug discovery.

1.1 Identify and validate the target

Components of target validation.

First, the function of a therapeutic target, such as a gene or a protein, and its role in the disease are determined. We can expect more therapeutic benefit and safety in the clinic if we have an adequate understanding of the target-disease relationship.

This process usually takes many years of functional analysis, expression profiling, cell-based modeling, and biomarker identification. Machine learning (ML) or bioinformatics can be used to analyze genes, find genes, and predict the folding of a protein (e.g., AlphaFold).

More on target validation.

1.2 Hit identification

How often each hit identification method is used, by percentage.

Once we’ve identified the target, we want to know which therapeutic compounds can bind to the target and modify its function. High-throughput screening (HTS) runs rapid, automated assays on very large compound libraries, and ML and cheminformatic approaches can screen millions of compounds computationally so that far fewer need to be tested individually. The active compounds identified by screening are called “hits”. Other methods include affinity selection and fragment-based techniques. As a result, less time and money are needed to find hits that can be developed into leads.
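To make the cheminformatic side of hit identification a bit more concrete, here is a minimal sketch of a similarity search, one common form of computational screening. It assumes each compound is already described by a fingerprint (a set of structural features) and ranks library compounds by their Tanimoto similarity to a known active. The fingerprints, compound names, and the 0.5 threshold are made up for illustration; real pipelines typically compute fingerprints with a cheminformatics toolkit.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def screen(
    reference: FrozenSet[int],
    library: Dict[str, FrozenSet[int]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return compounds at least `threshold` similar to the reference, best first."""
    scored = [(name, tanimoto(reference, bits)) for name, bits in library.items()]
    candidates = [(name, score) for name, score in scored if score >= threshold]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: each integer stands for one structural feature.
known_active = frozenset({1, 4, 7, 9, 12})
toy_library = {
    "compound_A": frozenset({1, 4, 7, 9, 13}),  # shares 4 of 6 distinct features
    "compound_B": frozenset({2, 3, 5}),         # shares no features
    "compound_C": frozenset({1, 4, 7, 9, 12}),  # identical to the reference
}

ranked = screen(known_active, toy_library, threshold=0.5)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

Compounds that pass the threshold would be the ones worth sending to an actual assay, which is where the savings in time and money come from.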

More on hit identification.

1.3 Hit to lead optimization

A routine workflow for hit to lead optimization.

This is the process in which the physicochemical properties, potency, and selectivity of identified hits are optimized for further in vitro and in vivo testing and lead development. Normally, analogs of a hit are tested to find structure-activity relationships (SAR), which inform the selection and design of structural analogs with better activity and help verify the core structure of the molecules. Computational docking studies can shorten this process from about 2 years to a year or just a few months. A computational docking study docks the hit compound to the target to understand how the compound interacts with it, which helps in finding analogs of the hit compound.

Pharmacokinetic tests are also started at this stage to understand the pharmacokinetic properties and ADME (absorption, distribution, metabolism, excretion) of the hit compounds.

More on hit to lead optimization.

1.4 Lead optimization

Before preclinical studies, the most promising compounds are optimized to have improved effectiveness, diminished toxicity and increased absorption. Computational pharmacokinetics and toxicity studies like kinase profiling, hepatotoxicity studies and cytochrome P450 inhibition assays can be done to save money and time.
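As a small illustration of what a computational pharmacokinetic estimate can look like, the sketch below uses the textbook one-compartment model with first-order elimination after an intravenous bolus dose: C(t) = (dose / V) * exp(-k * t), with k = CL / V. This is a standard simplified model, not a method described in this article, and the dose, volume, and clearance values are hypothetical.

```python
import math


def elimination_rate(clearance_l_per_h: float, volume_l: float) -> float:
    """First-order elimination rate constant k = CL / V, in 1/hour."""
    return clearance_l_per_h / volume_l


def half_life(k_per_h: float) -> float:
    """Time for the plasma concentration to halve, in hours."""
    return math.log(2) / k_per_h


def concentration(dose_mg: float, volume_l: float, k_per_h: float, t_h: float) -> float:
    """Plasma concentration in mg/L at time t after an IV bolus dose."""
    return (dose_mg / volume_l) * math.exp(-k_per_h * t_h)


# Hypothetical lead compound: 500 mg dose, 50 L volume of distribution, 5 L/h clearance.
k = elimination_rate(clearance_l_per_h=5.0, volume_l=50.0)
t_half = half_life(k)

for hours in (0, 2, 4, 8, 12):
    print(f"t = {hours:>2} h: {concentration(500.0, 50.0, k, hours):.2f} mg/L")
```

Comparing curves like this across candidate compounds is one simple way to see how a change in clearance or distribution would affect exposure before running animal studies.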

More on lead optimization.

2. What is the most interesting R&D happening right now?

The “protein folding problem” solved by AlphaFold

Over the summer, Google’s DeepMind released its AlphaFold system to the world. The news rippled out about this new system that could predict the 3D structure of proteins, a task that has taken many years to achieve. AlphaFold uses the genetic sequence of a protein to predict a highly accurate 3D model.

For decades, scientists have struggled with predicting how proteins fold; this is called the “protein folding problem”. The function of a protein is defined by its shape, so we can predict the function of a protein if we can predict what shape it will take. In this way, we can develop drugs that fit the shape of the protein. For years, this work has been done experimentally, which is very costly in time and resources; AI can accelerate this task.

The goal of AlphaFold was to predict the structure of a protein using just the genetic sequence of the protein, with no previously solved proteins as templates. The DeepMind team took two approaches to this problem, both using neural networks. Both approaches train the model to predict two properties of the protein: (a) the distances between pairs of amino acids and (b) the angles between the chemical bonds that connect those amino acids.

In the first approach, the accuracy of a proposed structure was estimated by a score that combines the predicted distributions of distances between every pair of residues in the protein. A generative neural network (GNN) was used to invent new fragments, which were swapped into the structure to improve that score. The second approach optimized the score by applying gradient descent to a complete protein chain.
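The gradient-descent idea can be illustrated with a toy example. The sketch below is not AlphaFold: it assumes we already know the target distance between every pair of points (standing in for predicted residue distances), starts from random 3D coordinates, and uses plain gradient descent to nudge the whole chain until its pairwise distances match the targets. The four-point square "protein", the learning rate, and all function names are assumptions chosen for illustration.

```python
import math
import random
from typing import List

Point = List[float]


def pairwise_distance(p: Point, q: Point) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def loss(coords: List[Point], target: List[List[float]]) -> float:
    """Sum of squared errors between current and target pairwise distances."""
    n = len(coords)
    return sum(
        (pairwise_distance(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def gradient(coords: List[Point], target: List[List[float]]) -> List[Point]:
    """Analytic gradient of `loss` with respect to every coordinate."""
    n, dim = len(coords), len(coords[0])
    grads = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = pairwise_distance(coords[i], coords[j]) or 1e-9  # avoid dividing by zero
            scale = 2.0 * (r - target[i][j]) / r
            for k in range(dim):
                grads[i][k] += scale * (coords[i][k] - coords[j][k])
    return grads


def descend(coords: List[Point], target: List[List[float]],
            steps: int = 2000, lr: float = 0.05) -> List[Point]:
    """Move all points together along the negative gradient of the loss."""
    coords = [list(p) for p in coords]  # work on a copy
    for _ in range(steps):
        grads = gradient(coords, target)
        for point, g in zip(coords, grads):
            for k in range(len(point)):
                point[k] -= lr * g[k]
    return coords


# Hypothetical "true" shape: four points on a unit square.
true_shape = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
target = [[pairwise_distance(p, q) for q in true_shape] for p in true_shape]

rng = random.Random(0)
start = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in true_shape]
initial_loss = loss(start, target)
folded = descend(start, target)
final_loss = loss(folded, target)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.6f}")
```

Because only distances are constrained, the recovered shape may be a rotated or mirrored copy of the original, which is fine for this illustration.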

The two ways DeepMind visualizes AlphaFold’s prediction accuracy.

DeepMind submitted AlphaFold for CASP13, the 13th community-wide experiment on the Critical Assessment of protein Structure Prediction (CASP). CASP is a worldwide experiment for protein structure prediction that takes place every 2 years. It is like a world championship for research groups developing models for the prediction of the 3D structure of proteins.

Read the paper on AlphaFold.

Read more on CASP.

3. What are some interesting problems to be solved?

Understanding how the drug affects the cell

Imagine an HIV drug is created that binds very well to the HIV protease. The question remains: how well will the drug protect T cells from the HIV viral load? Just because the drug binds well to the HIV protease doesn’t mean that the binding will protect the cell.

Sometimes the drug just doesn’t create a large enough disruption in the cell. Sometimes a drug can even bind to multiple targets instead of just one; cancer drugs sometimes affect cancerous cells in unexpected ways.

A good drug is one that has high target binding affinity and phenotypic potency. Docking is used to predict target binding affinity, and machine learning is used to predict phenotypic activity. Solving this problem could benefit preclinical and clinical studies by avoiding binding to multiple targets and improving the efficacy of drugs.

Optimizing drug properties for preclinical studies

During hit to lead optimization, the goal is to improve the potency and ADME of a hit before it is further optimized as a lead to be tested on animals. Multi-parametric optimization of the hit can save time and money by improving multiple parameters of a hit at the same time. Multi-parametric optimization can be done with ML models. ML models, especially deep learning (DL) models, have the advantage that they can handle a large number of parameters and features at the same time.
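To show what scoring several parameters at once can look like, here is a minimal sketch that uses desirability functions instead of an ML model: each property is mapped onto a 0 to 1 scale and the results are combined with a geometric mean, so a compound that fails badly on any one property scores zero overall. The property names, acceptable ranges, and compound values are hypothetical.

```python
import math
from typing import Dict, Tuple

# property name -> (worst value, best value); worst may be larger than best
Criteria = Dict[str, Tuple[float, float]]


def desirability(value: float, worst: float, best: float) -> float:
    """Linear ramp from 0 at `worst` to 1 at `best`, clipped to [0, 1]."""
    fraction = (value - worst) / (best - worst)
    return min(1.0, max(0.0, fraction))


def mpo_score(properties: Dict[str, float], criteria: Criteria) -> float:
    """Geometric mean of the per-property desirabilities."""
    scores = [desirability(properties[name], worst, best)
              for name, (worst, best) in criteria.items()]
    if any(score == 0.0 for score in scores):
        return 0.0
    return math.exp(sum(math.log(score) for score in scores) / len(scores))


# Hypothetical criteria: higher potency and solubility are better, lower clearance is better.
criteria: Criteria = {
    "pIC50": (5.0, 9.0),
    "log_solubility": (-6.0, -2.0),
    "clearance": (50.0, 5.0),
}

balanced_hit = {"pIC50": 7.0, "log_solubility": -4.0, "clearance": 27.5}
potent_but_insoluble = {"pIC50": 9.0, "log_solubility": -6.0, "clearance": 10.0}

print(f"balanced hit: {mpo_score(balanced_hit, criteria):.2f}")
print(f"potent but insoluble: {mpo_score(potent_but_insoluble, criteria):.2f}")
```

An ML-based approach would replace the hand-written ranges with learned predictions, but the underlying idea of weighing many properties together stays the same.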

4. How much is the market growing?

The computer-aided drug discovery market is steadily growing with AI presenting great opportunities for the future as the demand for novel molecule drugs increases.

There is an opportunity for growth

According to the WHO, the prevalence of chronic diseases like cardiovascular disease has increased, and combined with COVID-19, this has raised the demand for novel molecule drugs. Such demand is a catalyst for growth in the computer-aided drug discovery market.

Integrating AI into the areas of computer-aided drug discovery mentioned in the previous sections could drive the market as demand grows, by getting the job done quicker, cheaper, and with more success. AI also provides the opportunity for start-ups to enter the market. For example, the American early-stage company Mydecine Innovation Group worked with the University of Alberta to create its in-silico drug discovery program for drug screening and development. The involvement of start-ups attracts investors to the field because of the innovation that they produce.

The benefits of AI

According to Globe Newswire, the global market for AI in drug discovery was worth approximately USD 830 million in 2019 and is expected to grow at a CAGR of 39% to reach USD 12,000 million by 2026.

Aside from the current application of AI in the drug discovery process, in the future, quantum machine learning presents an opportunity to find patterns in complex biological and pharmaceutical data. Quantum computing presents a huge growth opportunity for the market.

How the market is segmented

The computer-aided drug discovery market is segmented based on discovery type, therapeutic area, end-user, and region.

Type

Sub-segment: Type

Sub-segments: structure-based drug design, ligand-based drug design, and sequence-based approaches.

Ligand-based drug design is expected to hold the largest share by 2028 at $3,366.4 million, an increase from $1,175.3 million in 2020. This is because ligand-based drug design provides predictive models that are well suited to lead compound optimization, the process of optimizing an identified lead in its affinity, safety, pharmacokinetics, and ADMET (absorption, distribution, metabolism, excretion, toxicity).

Furthermore, the two segments of drug type are small molecules and large molecules, with small molecules making up 60% of the market share. The three segments of technology type are deep learning, machine learning, and other, with machine learning taking up 50% of the market share.

Therapeutic area

Sub-segment: Therapeutic area

Sub-segments: neurology, oncology, cardiovascular disease, respiratory disease, diabetes, and others.

Currently, oncology leads the market with about a 40% share. However, the cardiovascular sub-segment is expected to have the highest growth, generating a revenue of $1,368.6 million by 2028. According to the CDC, cardiovascular diseases are the leading cause of death in the US, causing 1 in 4 deaths, so there is an increasing demand for novel molecule drugs.

End-user

Sub-segments: biotechnology companies, pharmaceutical companies, and research laboratories. The pharmaceutical company sub-segment is expected to have a significant CAGR and generate a revenue of $2,646.9 million due to growing R&D investments. The partnership between AstraZeneca and Schrödinger is an example of such investment to accelerate computational drug discovery. Currently, pharmaceutical companies dominate with around 40% market share in the target market.

Region

At the moment, North America dominates the target market at around 50% market share. The market for Asia-Pacific is expected to have its revenue grow from $714.2 million in 2020 to an estimated $2,271.4 million by 2028. This is due to increasing research and innovation by healthcare firms in the region, as well as the increasing number of disease cases in China and India.
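The revenue figures in this section can be related to a compound annual growth rate (CAGR) with a short calculation. The sketch below computes the annual rate implied by the Asia-Pacific figures quoted above (from $714.2 million in 2020 to $2,271.4 million in 2028). The function names are my own, and the implied rate is derived from those two numbers rather than taken from the market report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that turns start_value into end_value."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: int) -> float:
    """Value after compounding `rate` once per year for `years` years."""
    return start_value * (1.0 + rate) ** years


# Asia-Pacific revenue figures quoted in the text, in millions of USD.
asia_pacific_rate = cagr(714.2, 2271.4, 2028 - 2020)
print(f"Implied Asia-Pacific CAGR: {asia_pacific_rate:.1%}")

# Sanity check: compounding the implied rate should reproduce the 2028 figure.
print(f"Projected 2028 revenue: {project(714.2, asia_pacific_rate, 8):.1f}")
```

The same two functions can be applied to any other start and end figures in this section to compare how fast each segment is expected to grow.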

The challenges…

For the computer-aided drug discovery market to achieve its full growth potential, more skilled workers are needed. We need more people to get involved in or become familiar with computational methods. In addition, the low penetration of computer-aided drug discovery in emerging economies is an impediment to the growth of the field. As with most emerging technologies, computer-aided drug discovery raises the question of how much these revolutionary technologies can really impact the world if only a select few have access to them…

More on computer-aided drug discovery market analysis.

More on the AI in drug discovery market report.

Takeaways

  1. Computational drug discovery and AI have many applications in the drug discovery process that help to save time and money.
  2. Google’s DeepMind has recently made significant progress in solving the “protein folding problem” with AlphaFold.
  3. In the future, more work needs to be done in creating drugs with high target binding affinity and phenotypic activity. More work also needs to be done in multi-parametric optimization during hit-to-lead optimization.
  4. The computer-aided drug discovery market is steadily growing as the demand for new therapeutics grows. The market also presents an opportunity for startups.

Although it is currently unclear how far this will go, drug discovery may become more reliant on computation in the future. Until then, computational drug discovery and AI models are best integrated with more traditional methods.

Do you want to learn more about my projects?

Follow newsletter: Diba Dindoust

Follow LinkedIn: Diba Dindoust

Follow YouTube: Diba Dindoust
