In Silico ADMET Modeling: The 20-Year Machine Learning Roadmap Cutting Drug Attrition and Protecting Pharma’s $2.6B R&D Bets

I. Executive Summary

Drug discovery’s core problem has not changed: roughly 90% of candidates that enter clinical trials fail, and ADMET-related failures (poor pharmacokinetics, organ toxicity, metabolic instability) account for a disproportionate share of that attrition. In 1991, ADME/PK problems accounted for roughly 40% of clinical failures. By 2000, after the industry adopted systematic early ADMET screening, that figure had fallen to about 11%.

Machine learning did not cause that improvement alone. Better assay throughput, improved compound library curation, and the strategic shift from late-stage to early-stage ADMET evaluation all contributed. But ML is what made the strategy scalable. When a company runs 300,000 virtual compounds through an ADMET classifier before committing to synthesis, it is doing something that QSAR alone could not reliably support, and that animal models could never economically sustain.

This report traces the technical evolution of in silico ADMET prediction across four distinct computational eras, from early extended connectivity fingerprint (ECFP)-based classifiers to multitask graph neural networks and generative molecular design. It examines Bayer Pharma’s platform as the most thoroughly documented longitudinal case study in the field, quantifies the IP value embedded in proprietary ADMET datasets and model architectures, and lays out a forward roadmap covering quantum machine learning, explainable AI, and organ-on-a-chip integration through 2032.

For R&D leads, this is a platform decision guide. For IP teams, it identifies which computational assets are defensible via trade secret versus patent. For institutional investors, it maps which companies hold structural advantages in ADMET modeling capability and how those advantages translate to pipeline productivity and valuation premium.

II. What ADMET Actually Governs — and Why It Ends Clinical Programs

ADMET is the shorthand for five pharmacological dimensions that collectively determine whether a compound can function as a drug in a human body. Understanding each dimension at a mechanistic level is necessary before evaluating any computational approach that claims to predict them.

Absorption governs the fraction of an administered dose that reaches systemic circulation. For orally delivered drugs, the main barriers are intestinal epithelial permeability and first-pass hepatic extraction. The Caco-2 permeability assay, which uses a human colorectal adenocarcinoma cell monolayer to simulate small intestinal epithelium, is the standard in vitro proxy. P-glycoprotein (P-gp) efflux is a frequent complication: compounds that are P-gp substrates get pumped back into the intestinal lumen even when passive permeability is favorable, producing low oral bioavailability. Lipinski’s Rule of Five (molecular weight no more than 500 Da, cLogP no more than 5, no more than 5 hydrogen bond donors, no more than 10 hydrogen bond acceptors, with poor absorption more likely when two or more limits are exceeded) remains the most widely cited absorption heuristic, though it fails for natural products, macrocycles, and certain modern targeted therapies.
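As a minimal sketch of how the Rule of Five is applied as a filter, the Python below counts limit violations and raises Lipinski’s two-violation alert. It assumes the four descriptor values have already been calculated by a cheminformatics toolkit; the `Descriptors` container and the example values are hypothetical and only illustrate the logic.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors (hypothetical container; values come from a toolkit)."""
    mol_weight: float  # Da
    clogp: float
    h_donors: int
    h_acceptors: int


def rule_of_five_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's four limits the compound exceeds."""
    return sum([
        d.mol_weight > 500,
        d.clogp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ])


def likely_poor_absorption(d: Descriptors) -> bool:
    """Lipinski's alert: two or more violations make poor oral absorption more likely."""
    return rule_of_five_violations(d) >= 2


# Illustrative values only, not measured data.
compact = Descriptors(mol_weight=350.4, clogp=3.2, h_donors=1, h_acceptors=4)
oversized = Descriptors(mol_weight=812.0, clogp=5.6, h_donors=6, h_acceptors=13)
for name, d in [("compact", compact), ("oversized", oversized)]:
    print(name, rule_of_five_violations(d), likely_poor_absorption(d))
```

Counting violations rather than rejecting on the first one mirrors the heuristic’s original intent: a single breach is tolerated, which is one reason the rule is a screen rather than a hard gate.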

Distribution tracks where a drug goes after it enters circulation. The volume of distribution (Vd) captures the apparent space a drug occupies: compounds with high plasma protein binding tend to have low Vd because most of the drug is sequestered in the bloodstream rather than penetrating tissues. Blood-brain barrier (BBB) penetration is a distribution sub-problem of particular importance in CNS programs. The BBB’s tight junctions and active efflux transporters (P-gp, BCRP) create a second absorption barrier inside the body. Incorrectly predicting CNS penetration has killed multiple CNS programs that showed excellent peripheral exposure with negligible brain concentration.
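The link between protein binding and Vd can be made concrete with a classic simplified physiological relation, Vd = Vp + Vt × (fu / fu,t), where fu and fu,t are the unbound fractions in plasma and tissue. The sketch below assumes rough adult volumes of 3 L plasma and 38 L tissue water; these defaults and the example unbound fractions are illustrative assumptions, not measured values.

```python
def volume_of_distribution(fu_plasma: float, fu_tissue: float,
                           v_plasma_l: float = 3.0, v_tissue_l: float = 38.0) -> float:
    """Simplified physiological Vd in litres: Vp + Vt * (fu_plasma / fu_tissue).

    Default volumes are rough adult values assumed for illustration.
    """
    if not (0.0 < fu_plasma <= 1.0 and 0.0 < fu_tissue <= 1.0):
        raise ValueError("unbound fractions must lie in (0, 1]")
    return v_plasma_l + v_tissue_l * (fu_plasma / fu_tissue)


# Same tissue binding, very different plasma protein binding (hypothetical values).
highly_bound = volume_of_distribution(fu_plasma=0.01, fu_tissue=0.1)  # 99% bound in plasma
weakly_bound = volume_of_distribution(fu_plasma=0.5, fu_tissue=0.1)   # 50% bound in plasma
print(f"highly bound: {highly_bound:.1f} L, weakly bound: {weakly_bound:.1f} L")
```

With these assumptions the highly bound compound has a Vd of about 6.8 L, barely above plasma volume, while the weakly bound one reaches about 193 L, which is the sequestration effect described above expressed as a number.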

Metabolism is where the cytochrome P450 (CYP) enzyme family dominates the pharmacological landscape. CYP3A4, CYP2D6, CYP2C9, CYP2C19, and CYP1A2 collectively metabolize approximately 90% of marketed small-molecule drugs. A compound can fail for two distinct metabolism-related reasons: it clears too quickly (short half-life, requiring inconvenient dosing or IV administration) or it produces reactive metabolites that damage hepatocytes or trigger immune-mediated hypersensitivity. Site-of-Metabolism (SoM) prediction, the identification of which specific atoms a CYP enzyme will oxidize, has become a distinct computational subfield within ADMET modeling.

Excretion covers how drugs and metabolites leave the body. Renal clearance is the dominant pathway for hydrophilic compounds; biliary excretion handles larger, more lipophilic molecules. Compounds with extremely low clearance accumulate over multiple doses, creating toxicity risk even at doses that appear safe on day one. Compounds with extremely high clearance require frequent dosing or sustained-release formulations that complicate chemistry, manufacturing, and controls (CMC) work.
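The clearance argument can be quantified with standard one-compartment relations: t1/2 = ln 2 × Vd / CL, and for repeated dosing every τ hours the steady-state accumulation ratio is R = 1 / (1 - e^(-kτ)) with k = CL / Vd. The sketch assumes linear, one-compartment kinetics; the 50 L volume and the clearance values are hypothetical.

```python
import math


def half_life_h(vd_l: float, cl_l_per_h: float) -> float:
    """One-compartment elimination half-life in hours: ln(2) * Vd / CL."""
    return math.log(2) * vd_l / cl_l_per_h


def accumulation_ratio(vd_l: float, cl_l_per_h: float, tau_h: float) -> float:
    """Steady-state accumulation for a dose every tau hours: 1 / (1 - exp(-k * tau))."""
    k = cl_l_per_h / vd_l  # first-order elimination rate constant (1/h)
    return 1.0 / (1.0 - math.exp(-k * tau_h))


# Hypothetical compounds sharing Vd = 50 L, dosed once daily (tau = 24 h).
for cl in (20.0, 2.0, 0.5):
    print(f"CL = {cl:4.1f} L/h  t1/2 = {half_life_h(50.0, cl):5.1f} h  "
          f"R = {accumulation_ratio(50.0, cl, 24.0):.2f}")
```

Under these assumptions the low-clearance compound (0.5 L/h) accumulates to roughly 4.7 times its single-dose level at steady state, while the high-clearance compound (20 L/h) barely accumulates but has a half-life under 2 hours, the two extremes the paragraph describes.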

Toxicity spans a spectrum from acute cytotoxicity to organ-specific damage, genotoxicity, and cardiotoxicity. hERG (human Ether-a-go-go-Related Gene) channel inhibition is the canonical cardiotoxicity liability: off-target hERG block reduces a repolarizing potassium current in the heart, prolonging the QT interval and increasing the risk of arrhythmia. This liability has terminated multiple late-stage programs, including several that cleared Phase II efficacy endpoints. DILI (drug-induced liver injury) is the leading cause of post-market safety withdrawals, yet it remains among the hardest ADMET endpoints to predict in silico because its mechanisms are heterogeneous.

Key Takeaways — Section II

Understanding each ADMET parameter at the mechanistic level tells you which computational approach is appropriate. Permeability is relatively tractable for ML because the physics of membrane diffusion is learnable from structural features. CYP-mediated clearance is harder because enzyme promiscuity means the same molecular scaffold behaves differently in different metabolic contexts. DILI is the hardest because it involves immune response components that are not encoded in molecular structure at all. Any vendor claiming uniform high accuracy across all five ADMET dimensions should be pressed on which endpoints drive that aggregate number.

III. The Economic Case for Computational ADMET: Numbers Decision-Makers Need

Why ADMET Failures Are Catastrophically Expensive

The most widely cited estimate of the cost of bringing a new molecular entity (NME) to market, from the Tufts Center for the Study of Drug Development, is about $2.6 billion, and the number keeps climbing as late-stage trials get larger and longer. That figure includes the cost of failed candidates and the cost of capital. Most of it is not the experiment itself but the opportunity cost of failed candidates that consumed Phase I and Phase II resources before scientists recognized the compound would not work.

A clinical-stage ADMET failure is qualitatively different from a preclinical one. A compound that fails for hepatotoxicity in Phase II has typically undergone 5 to 8 years of optimization, synthesis, formulation development, IND-enabling toxicology studies, and first-in-human dose escalation. The median cost of a single pivotal trial supporting FDA approval has been estimated at $19 million, and that median obscures the much higher costs of oncology and CNS programs where patient recruitment is expensive and failure rates are highest.

The 10% improvement rule is the clearest financial framing: improving the probability of correctly identifying a clinical failure in preclinical stages by just 10% saves approximately $100 million per drug. That math explains why major pharma companies invest eight-figure sums in computational ADMET infrastructure that, from a P&L perspective, produces no direct revenue. The return is entirely in avoided losses.

Market Scale and Growth Trajectory

The global in silico drug discovery market was valued at $3.61 billion in 2024 and is projected to reach $7.22 billion by 2030 at a 12.2% CAGR. The broader AI in pharma market is growing faster, from $2.92 billion in 2024 to $3.8 billion in 2025 at a 30.1% CAGR, with forecasts placing it at $9.64 billion by 2029. The dedicated ADMET testing market (in vitro and in silico combined) was $6.38 billion in 2024, growing at approximately 10% annually.
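The growth rates quoted above can be checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below simply re-derives the rates from the figures in the text; it adds no new market data.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value: float, rate: float, years: float) -> float:
    """Value after compounding start_value at `rate` for `years` years."""
    return start_value * (1.0 + rate) ** years


# Figures quoted in the text (USD billions).
print(f"In silico 2024 to 2030: {cagr(3.61, 7.22, 6):.1%}")
print(f"AI in pharma 2024 to 2025: {cagr(2.92, 3.80, 1):.1%}")
print(f"AI in pharma 2025 to 2029: {cagr(3.80, 9.64, 4):.1%}")
```

The 2024 to 2030 in silico figures reproduce the quoted 12.2%, and the one-year AI-in-pharma step reproduces 30.1%. The 2025 to 2029 forecast implies roughly 26% a year, so that projection assumes growth slows somewhat after 2025.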

These figures matter for one specific reason: the gap between growth in in silico drug discovery (12.2% CAGR), a category in which ADMET prediction is one of the longest-established applications, and growth in the broader AI-in-pharma market (30.1% CAGR) suggests that ADMET-focused platforms are growing more slowly than AI drug discovery overall. That gap likely reflects the relative maturity of ADMET as a computational discipline compared to target identification and molecular generation, where AI hype is newer and investment is more recent.

The Fail-Early Calculus in Practice

Pharmaceutical companies that adopted systematic early ADMET screening during the 1990s saw ADME/PK-related clinical failures drop from 40% to 11% of total attrition. That 29-percentage-point reduction did not happen because the industry got better at designing safe compounds. It happened because companies stopped advancing compounds with predictable ADMET liabilities. The pipeline quality improved because the filtering happened earlier, not because the biology got simpler.

This distinction matters for anyone evaluating an AI drug discovery platform. The right question is not ‘does this platform generate better molecules?’ but ‘at what stage does this platform identify ADMET liabilities, and what does advancing that identification forward in time save?’ The second question has a quantifiable answer. The first often does not.

Key Takeaways — Section III

The economic argument for in silico ADMET is not speculative. A 10% improvement in preclinical failure prediction saves $100 million per drug. ADME/PK failures fell from 40% to 11% of clinical attrition after early screening became standard. The market is growing at 12.2% CAGR but is mature relative to broader AI-in-pharma, which means competitive advantage now depends on model accuracy and dataset size rather than on being first to deploy a machine learning framework.

Investment Strategy — Section III

Portfolio managers should distinguish between companies that use in silico ADMET tools and companies that own proprietary training datasets large enough to build models that outperform commercial alternatives. The latter have a compounding advantage: more data produces better models, better models attract more R&D partnerships, and more partnerships generate more proprietary data. Evaluate the size, homogeneity, and structural diversity of a company’s internal ADMET dataset before using AI capability as a valuation input. A company with 50,000 homogeneous Caco-2 measurements has a different IP position than one with 2 million structurally diverse in-house measurements.

IV. Phase I (2001–2010): Rule-Based Systems, Early QSAR, and the Birth of Computational ADMET

The Computational Environment in 2001

When Bayer Pharma initiated its in silico ADMET program around 2001, the computational toolkit was defined by three constraints: sparse experimental datasets (sometimes only a few hundred data points per endpoint), limited molecular representation methods, and computing infrastructure that made large-scale molecular dynamics simulations prohibitively slow.

Early computational ADMET relied on two theoretical approaches. Ligand-based methods, particularly 3D-QSAR and pharmacophore modeling, identified structural features correlated with ADMET outcomes by analyzing known active and inactive compounds. Structure-based methods including molecular docking attempted to simulate drug-receptor interactions directly from protein crystal structures. Both approaches struggled in the ADMET context because ADMET targets, particularly CYP enzymes, are promiscuous: the same binding site accommodates hundreds of structurally diverse substrates, making any single receptor-based model inherently unreliable.

The most tractable problems in this era were also the most empirical. Lipinski’s Rule of Five, published in 1997, was not a machine learning model but a statistical observation drawn from roughly 2,200 compounds in the World Drug Index that had reached Phase II trials. It remained the dominant absorption filter well into the 2000s precisely because it was simple, interpretable, and reasonably predictive for the chemical space that 1990s pharma actually explored. Its limitations only became apparent as the field moved toward larger modalities, such as PROTACs and macrocycles, that systematically violate its constraints.

The dominant molecular representations of this era were computed descriptors and fingerprints. Extended Connectivity Fingerprints (ECFPs), specifically the ECFP4 and ECFP6 variants, became the default representation; the suffix gives the maximum diameter, in bonds, of the encoded atom environments (radius 2 and radius 3, respectively). ECFPs iteratively encode the chemical environment around each atom out to that radius, hash each environment to an integer identifier, and fold the identifiers into a fixed-length bit vector that captures local structural motifs. Their computational efficiency and reasonable predictive performance for classification tasks made them the ‘work-horse’ descriptor at Bayer and across the industry throughout this decade.
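To make the ECFP mechanism concrete, the sketch below implements a simplified Morgan-style fingerprint in plain Python: atoms start with identifiers derived from simple invariants, each iteration rehashes an atom’s identifier together with its neighbours’, and all identifiers are folded into a fixed-length bit vector. It is an illustration, not the published ECFP algorithm: the toy graph format, the invariants (element and degree only), and the CRC32 hash are assumptions, and real implementations (for example RDKit’s Morgan fingerprints) use richer atom invariants and remove duplicate substructures.

```python
import zlib
from typing import Dict, List, Set, Tuple

# Toy molecular graph (assumed format): atom symbols plus, for each atom index,
# a list of (neighbour index, bond order). Real pipelines parse SMILES with a toolkit.
Atoms = List[str]
Bonds = Dict[int, List[Tuple[int, int]]]


def _stable_hash(*parts: object) -> int:
    """Deterministic 32-bit hash; Python's built-in hash() is salted per process for str."""
    return zlib.crc32(repr(parts).encode("utf-8"))


def ecfp_like(atoms: Atoms, bonds: Bonds, radius: int = 2, n_bits: int = 1024) -> Set[int]:
    """Return the 'on' bit positions of a folded, ECFP-style fingerprint.

    radius=2 corresponds to the ECFP4 diameter; this is a simplified illustration.
    """
    # Iteration 0: identifiers from simple atom invariants (element, heavy-atom degree).
    ids = [_stable_hash(sym, len(bonds.get(i, []))) for i, sym in enumerate(atoms)]
    features = set(ids)
    for _ in range(radius):
        new_ids = []
        for i in range(len(atoms)):
            # Sorting makes the environment independent of neighbour ordering.
            env = sorted((order, ids[j]) for j, order in bonds.get(i, []))
            new_ids.append(_stable_hash(ids[i], tuple(env)))
        ids = new_ids
        features.update(ids)
    # Fold the unbounded identifier space into a fixed-length bit vector.
    return {f % n_bits for f in features}


# Ethanol (C-C-O) and methanol (C-O) as toy graphs, hydrogens implicit.
ethanol = (["C", "C", "O"], {0: [(1, 1)], 1: [(0, 1), (2, 1)], 2: [(1, 1)]})
methanol = (["C", "O"], {0: [(1, 1)], 1: [(0, 1)]})
fp_eth, fp_meth = ecfp_like(*ethanol), ecfp_like(*methanol)
print(len(fp_eth), len(fp_meth), sorted(fp_eth & fp_meth))
```

The shared bits between the two molecules come from atom environments they have in common (a terminal carbon and a terminal oxygen), which is exactly the local-motif overlap that similarity searches and fingerprint-based classifiers exploit.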

Strategic Impact of Early ADMET Integration

The most consequential change in this era was not a specific algorithm but a strategic shift in when ADMET evaluation occurred in the drug discovery workflow. Before widespread in silico adoption, ADMET was a late-stage filter: synthesize compounds, identify the most potent series, then test the best candidates for pharmacokinetic properties. That sequencing meant that by the time a problematic ADMET property was identified, the medicinal chemistry team had spent years optimizing around a scaffold that could not be salvaged.

When ADMET evaluation moved to the design stage, the question changed from ‘can we fix this compound?’ to ‘should we build in this direction?’ The computational infrastructure of the early 2000s was just good enough to support that earlier question for a subset of endpoints (solubility, permeability, basic CYP inhibition) even if predictions were imprecise. The strategic value of early integration exceeded the technical value of any individual model’s accuracy.

Key Takeaways — Section IV

ECFP-based descriptors and rule-based filters (Lipinski, Veber, etc.) established computational ADMET as a practical discipline even before sophisticated machine learning was available. The strategic value of early integration, meaning evaluating ADMET at the design stage rather than the candidate selection stage, exceeded the technical accuracy of the tools. The industry’s shift to early ADMET screening during the 1990s is the reason ADME/PK-related clinical failures dropped from 40% to 11%, not any single algorithmic advance.

V. Phase II (2010–2018): Random Forest, SVM, and the Data-Volume Inflection

Algorithm Selection and Why It Stabilized

By the early 2010s, the field had converged on a short list of effective algorithms for ADMET prediction: Random Forest (RF), Support Vector Machines (SVMs), and Gradient Boosting (GB). This convergence was not accidental. These algorithms share properties that made them well-suited to the ADMET prediction problem as it existed at the time: they handle high-dimensional sparse feature vectors efficiently, they are relatively robust to the class imbalance that characterizes many toxicity datasets, and their hyperparameters are interpretable enough that experienced computational chemists can reason about why a given model performs well or poorly.

Random Forest often outperformed SVMs on diverse chemical libraries, in part because its ensemble of decorrelated trees provides implicit regularization, reducing overfitting when training sets contain structural bias. SVMs performed comparably on congeneric series where the chemical space was well-defined and the support vectors captured meaningful structural boundaries. Gradient Boosting added sequential error correction that improved performance on endpoints where the relationship between molecular features and biological outcome was non-linear and non-additive.
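SVMs applied to fingerprint data are commonly paired with a Tanimoto kernel, which scores two compounds by the fraction of on-bits they share. The minimal sketch below builds that kernel matrix from fingerprints represented as sets of on-bit indices (an assumed representation, with hypothetical bit values); a matrix like this can be passed to an SVM implementation that accepts precomputed kernels, such as scikit-learn’s `SVC(kernel="precomputed")`.

```python
from typing import List, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def tanimoto_kernel(rows: Sequence[Set[int]], cols: Sequence[Set[int]]) -> List[List[float]]:
    """Kernel matrix K[i][j] = tanimoto(rows[i], cols[j])."""
    return [[tanimoto(r, c) for c in cols] for r in rows]


# Hypothetical fingerprints: two close analogues and one unrelated scaffold.
train = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}]
K = tanimoto_kernel(train, train)
for row in K:
    print([round(x, 2) for x in row])
```

For prediction, `tanimoto_kernel(new_fps, train)` yields one row per new compound, comparing it against every training compound. On a congeneric series most rows are dominated by a few close analogues, which is one intuition for why SVMs do well there.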

Bayer’s internal experience through this period confirmed what many groups found empirically: SVMs and Random Forests were among the most effective algorithms for their applications. The important implication is that algorithm selection stabilized before deep learning arrived. This meant that improvements in model performance between approximately 2010 and 2016 were almost entirely driven by data volume and quality rather than algorithmic innovation.

The Bayer-Schering Merger as a Data Event

The 2007 merger of Bayer and Schering AG was primarily a business transaction: Bayer paid $16.9 billion to acquire a company whose portfolio included the multiple sclerosis therapy Betaferon, oral contraceptives, and the diagnostic imaging agent Magnevist. For the computational chemistry team, it was a data event.

Schering’s pharmaceutical division had been running ADMET assays for decades under different experimental protocols, different compound libraries, and different data management systems. Merging the two datasets required extensive data harmonization, including salt stripping, charge standardization, tautomer normalization, stereochemistry flattening, removal of uncertain measurements, and aggregation of replicate experiments using median values. This is unglamorous work, but it is exactly the kind of work that determines whether a merged dataset trains a better model or simply trains a noisier one.

Once harmonized, the combined Bayer-Schering dataset provided substantially greater structural diversity than either dataset alone. Structural diversity is not just a feature-space concept. It means the model has seen more of the relevant chemical universe and is less likely to extrapolate wildly when presented with a compound from a new scaffold class. Bayer’s pharmacokinetic and physicochemistry assays subsequently generated thousands of new data points annually, compounding the structural diversity advantage year over year.

From Classifiers to Regression Models

A technical milestone that is underappreciated in most ADMET retrospectives is the transition from classification to regression models for the most important endpoints. Classification models predict a binary outcome: a compound either clears a threshold or does not. That is useful for go/no-go decisions but provides no gradient information. If two compounds both fail a permeability classifier, a classification model cannot tell the medicinal chemist which one is closer to the threshold, or what structural change would push it over.

Regression models predict a continuous value, such as the actual permeability coefficient in centimeters per second, rather than a binary pass/fail designation. Transitioning from classifiers to regression models requires more data, because the model must learn how the property varies continuously across chemical space rather than just which side of a threshold a compound falls on. Bayer’s data expansion in the post-merger period made this transition possible for several key endpoints. The practical result was that medicinal chemists received actionable numerical predictions they could use to guide optimization rather than binary flags they could only use to eliminate candidates.
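A small sketch of the information a classifier discards: two hypothetical compounds both fail an assumed Caco-2 cut-off of 1 × 10⁻⁶ cm/s, but their continuous values show that one sits just below the threshold while the other is more than two log units away. The threshold and the compound values are illustrative assumptions.

```python
import math

# Hypothetical cut-off for "adequate" Caco-2 permeability (cm/s); real thresholds vary by assay.
PAPP_THRESHOLD_CM_S = 1e-6


def classify(papp_cm_s: float) -> str:
    """What a binary classifier reports: pass or fail, nothing more."""
    return "pass" if papp_cm_s >= PAPP_THRESHOLD_CM_S else "fail"


def log_units_from_threshold(papp_cm_s: float) -> float:
    """What a regression prediction adds: signed distance to the cut-off in log10 units."""
    return math.log10(papp_cm_s) - math.log10(PAPP_THRESHOLD_CM_S)


for name, papp in [("cmpd_A", 8e-7), ("cmpd_B", 5e-9)]:
    print(name, classify(papp), f"{log_units_from_threshold(papp):+.2f} log units")
```

Both compounds receive the same "fail" label, yet cmpd_A is a small structural tweak away from passing while cmpd_B probably needs a scaffold change, which is the optimization signal regression restores.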

Data Quality as Competitive Moat

The most important insight from Phase II is structural: proprietary training data is a more durable competitive advantage than any specific algorithm. Random Forest was available to everyone. The specific hyperparameter combinations that Bayer identified as optimal were not secret. What Bayer had that external competitors did not was a decade of homogeneous in-house ADMET measurements on diverse chemical scaffolds, collected under consistent protocols, carefully curated to remove noise.

This is why the ‘give-to-get’ consortium model that has emerged more recently in the industry is strategically significant. Companies that contribute proprietary data to pre-competitive consortia gain access to a much larger pooled dataset. Whether that trade is favorable depends on how unique their contributed data is relative to what the consortium returns, and on how much of their competitive advantage is downstream of ADMET prediction (in target biology, clinical execution, or regulatory strategy) versus in ADMET prediction itself.

Key Takeaways — Section V

The 2007 Bayer-Schering merger was as much a data acquisition event as a portfolio acquisition. Model performance gains between 2010 and 2018 were driven by data volume and quality, not algorithm selection. Random Forest and SVMs stabilized as the dominant approaches before deep learning arrived, meaning that the deep learning transition in Phase III required an algorithmic case for superiority that was not obvious from Phase II performance benchmarks alone. Regression models replaced classifiers for high-value endpoints as dataset sizes crossed a practical threshold, giving medicinal chemists gradient information rather than binary flags.

Investment Strategy — Section V

Evaluate pharma companies’ computational ADMET capability by asking how many homogeneous, internally generated data points they have per key endpoint. Companies with fewer than 10,000 curated measurements per endpoint for critical parameters like Caco-2 permeability, CYP3A4 clearance, and hERG inhibition likely rely heavily on commercially licensed or public datasets, which means their models are trained on much the same data as their competitors’ models. Internal dataset size is often disclosed in scientific publications or conference presentations by computational chemistry leads, even when it is not in earnings calls.

VI. Phase III (2018–Present): Deep Learning, Graph Neural Networks, and Generative Design

Why Deep Learning Finally Worked for ADMET

Deep learning had been theoretically applicable to ADMET prediction for years before it became practically dominant. Three changes in the mid-2010s made it work: dataset sizes crossed the scale required to reliably train networks with millions of parameters, GPU hardware became accessible enough to run training workloads on commodity infrastructure, and the molecular representation problem was partially solved by graph-based encodings that did not require hand-crafted descriptors.

Fully Connected Neural Networks (FCNNs) were the first deep architectures deployed for ADMET prediction. They took the same ECFP descriptor vectors used by Random Forest as input, replacing the tree-based decision boundaries with learned nonlinear transformations. Performance improvements over RF were modest on small datasets and became meaningful only once datasets exceeded roughly 10,000 examples per endpoint, which is why the transition was gradual rather than immediate.

Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) extended deep learning to SMILES string representations of molecules, treating chemical structure as a sequence rather than a fixed-length vector. RNNs, particularly Long Short-Term Memory (LSTM) architectures, were applied to generate novel SMILES strings — an early form of molecular generation that predated more sophisticated generative approaches.

Graph Neural Networks (GNNs) are the most significant architectural advance for molecular property prediction. A molecule is naturally a graph: atoms are nodes, bonds are edges, and the graph structure encodes the chemical topology without any information loss from the fingerprinting process. GNNs apply learned aggregation functions that propagate atom-level features across the molecular graph through multiple message-passing steps. After several rounds of aggregation, each atom’s learned representation captures information about its extended chemical neighborhood, producing a molecule-level representation that the network uses for property prediction.
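The message-passing mechanism can be sketched in plain Python. Each step combines an atom’s own state with the sum of its neighbours’ states through two weight matrices and a ReLU, and a sum over atoms produces the molecule-level vector. The weights here are fixed placeholders (in a trained GNN they are learned by gradient descent), and the two-feature one-hot atom encoding is an assumption chosen to keep the example small.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def message_passing_step(h: List[Vector], adj: Dict[int, List[int]],
                         w_self: Matrix, w_neigh: Matrix) -> List[Vector]:
    """One round: new_h_i = ReLU(W_self h_i + W_neigh * sum of neighbour states)."""
    new_h = []
    for i, h_i in enumerate(h):
        agg = [0.0] * len(h_i)
        for j in adj.get(i, []):
            agg = [a + b for a, b in zip(agg, h[j])]
        pre = [s + n for s, n in zip(matvec(w_self, h_i), matvec(w_neigh, agg))]
        new_h.append([max(0.0, x) for x in pre])
    return new_h


def embed_molecule(h0: List[Vector], adj: Dict[int, List[int]],
                   w_self: Matrix, w_neigh: Matrix, steps: int = 2) -> Vector:
    """Run several message-passing steps, then sum-pool atoms into one vector."""
    h = h0
    for _ in range(steps):
        h = message_passing_step(h, adj, w_self, w_neigh)
    return [sum(col) for col in zip(*h)]


# Placeholder weights; a trained GNN would learn these from ADMET data.
W_SELF = [[0.5, 0.1], [0.2, 0.4]]
W_NEIGH = [[0.3, 0.7], [0.6, 0.2]]

# Ethanol with one-hot atom features [is_carbon, is_oxygen]; atoms 0-1-2 = C-C-O.
h0 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adj = {0: [1], 1: [0, 2], 2: [1]}
print(embed_molecule(h0, adj, W_SELF, W_NEIGH))
```

Because both the neighbour aggregation and the readout are sums, the output does not depend on the order in which atoms are numbered, the property that lets GNNs operate directly on molecular graphs. After two steps, each atom’s state reflects atoms up to two bonds away, analogous to the radius of an ECFP but with learned rather than hashed combinations.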

The critical advantage of GNNs over ECFP-based methods is their ability to learn the relevant structural features from data rather than relying on a predefined fingerprinting algorithm. ECFPs bake a fixed set of structural assumptions into the fingerprint radius and bit-vector length. GNNs, trained on large diverse datasets, can discover which structural motifs are predictive for a given endpoint. For endpoints where the chemical space of interest has shifted significantly from the training data (a persistent challenge for all in silico ADMET models), this flexibility can help GNNs extrapolate more reliably than fixed-descriptor methods, although published benchmarks are mixed and well-tuned fingerprint-based models remain competitive on many ADMET endpoints.

Multitask Learning: The Architecture That Changed Prediction Economics

Multitask learning is the practice of training a single neural network to simultaneously predict multiple related properties. For ADMET modeling, this means one network predicts Caco-2 permeability, metabolic stability, hERG inhibition, solubility, and plasma protein binding simultaneously. The network shares its lower-layer feature representations across all endpoints while maintaining endpoint-specific output heads.

The performance improvement from multitask learning is most pronounced for endpoints with limited training data. When the network learns that structural features predictive of metabolic stability are partially correlated with features predictive of CYP inhibition, it can transfer learned representations from the larger, better-characterized CYP dataset to improve the metabolic stability model even when direct metabolic stability training data is sparse. Quantitative performance improvements of 5 to 15% have been reported for data-sparse endpoints in multitask settings compared to single-task neural network baselines.
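In practice, most compounds in an ADMET dataset are measured on only a few endpoints, so multitask training commonly masks missing labels. The sketch below, with hypothetical endpoint names and values, computes a per-endpoint mean squared error over observed labels only and sums the task losses into the single scalar that would be backpropagated through the shared layers; the network itself is omitted.

```python
from typing import Dict, List, Optional

# Hypothetical endpoint names; each column is one task head of a multitask model.
ENDPOINTS = ["caco2_log_papp", "log_solubility", "herg_pic50"]


def masked_mse(preds: List[List[float]],
               labels: List[List[Optional[float]]]) -> Dict[str, float]:
    """Per-endpoint mean squared error, skipping labels that were never measured (None)."""
    losses = {}
    for t, name in enumerate(ENDPOINTS):
        errs = [(p[t] - y[t]) ** 2 for p, y in zip(preds, labels) if y[t] is not None]
        losses[name] = sum(errs) / len(errs) if errs else 0.0
    return losses


def total_loss(per_task: Dict[str, float],
               weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted sum of task losses: the single scalar that updates the shared layers."""
    weights = weights or {name: 1.0 for name in per_task}
    return sum(weights[name] * loss for name, loss in per_task.items())


# Two compounds; each was measured on only some endpoints (illustrative numbers).
preds = [[-5.0, -3.0, 5.5], [-4.5, -2.0, 6.0]]
labels = [[-5.2, None, 5.0], [None, -2.5, None]]
per_task = masked_mse(preds, labels)
print(per_task, total_loss(per_task))
```

Because every observed label contributes to the same total loss, gradients from a data-rich endpoint shape the shared representation that a data-sparse endpoint’s head also reads from, which is the transfer mechanism described above.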

From a resource economics standpoint, a single multitask model also reduces the engineering overhead of maintaining multiple independent models. Bayer and similar large pharma groups maintain ADMET prediction pipelines covering dozens of endpoints; a multitask architecture that shares infrastructure across endpoints reduces both computational cost and model maintenance burden.

Anisotropic Quantum Chemical Descriptors and SoM Prediction

Site-of-Metabolism prediction deserves specific technical attention because it is one area where deep learning integration with quantum chemistry has produced results that exceed what either approach achieves independently.

SoM prediction requires identifying which specific atom in a molecule CYP enzymes will oxidize. Standard ECFP-based methods treat atoms as graph nodes with chemical type labels; they do not encode directional electronic properties. Anisotropic atomic reactivity descriptors, derived from quantum mechanical charge distribution calculations, capture the directional electron density around each atom. This directionality helps determine which atoms are most susceptible to CYP oxidation, based on orbital accessibility.

When anisotropic quantum chemical descriptors are used as input features in a GNN or FCNN, SoM prediction accuracy improves substantially over purely fingerprint-based methods. This is one of the clearer examples in computational ADMET of a hybrid approach, combining physics-based molecular quantum mechanics with data-driven machine learning, outperforming either approach alone.

The practical value is direct: accurate SoM prediction tells medicinal chemists where to place metabolic blocking groups (fluorine substitution, methyl groups at reactive positions) to improve metabolic stability, reducing the number of blocked analogs that must be synthesized and tested.
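SoM models output a score per atom, and their accuracy is commonly reported as top-k accuracy: the fraction of molecules for which at least one experimentally observed site appears among the k highest-ranked atoms (top-2 is a frequent choice). The sketch assumes per-atom scores have already been produced by some model; the atom indices and scores are hypothetical.

```python
from typing import Dict, Sequence, Set


def top_k_hit(atom_scores: Dict[int, float], observed_soms: Set[int], k: int = 2) -> bool:
    """True if any observed site of metabolism is among the k highest-scoring atoms."""
    ranked = sorted(atom_scores, key=atom_scores.get, reverse=True)
    return bool(set(ranked[:k]) & observed_soms)


def top_k_accuracy(predictions: Sequence[Dict[int, float]],
                   observed: Sequence[Set[int]], k: int = 2) -> float:
    """Fraction of molecules with at least one observed SoM in the top k."""
    hits = sum(top_k_hit(scores, soms, k) for scores, soms in zip(predictions, observed))
    return hits / len(predictions)


# Hypothetical per-atom scores (atom index -> model score) for two molecules.
predictions = [{0: 0.10, 3: 0.80, 5: 0.60}, {1: 0.90, 2: 0.20, 4: 0.10}]
observed = [{5}, {4}]
print(top_k_accuracy(predictions, observed, k=2), top_k_accuracy(predictions, observed, k=3))
```

A ranking metric fits the medicinal chemistry use case: a chemist typically considers blocking only the top one or two predicted positions, so what matters is whether the true site lands there.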

Generative Models and the Shift from Prediction to Design

The most strategically significant capability that Phase III introduced is not better prediction of existing compounds but the ability to generate novel compounds with specified ADMET profiles. This shift from prediction to generative design is the difference between a filter and a design tool.

Variational Autoencoders (VAEs) encode molecules into a continuous latent space, and points in that space decode back to molecular structures, although not every point decodes to a valid molecule. The decoder learns a differentiable map from latent vectors to SMILES or graph representations, which means optimization algorithms can move through latent space to find structures with desired properties. When a trained ADMET predictor is connected to the latent space, the generative loop can be optimized to produce molecules that satisfy multiple simultaneous ADMET constraints.

Generative Adversarial Networks (GANs) take a different approach: a generator network produces candidate molecules while a discriminator network distinguishes generated structures from real drug-like compounds. The adversarial training pushes the generator toward chemically realistic molecules (though it does not guarantee that every output is valid or synthesizable), and the generator can additionally be steered toward favorable ADMET property regions.

Reinforcement Learning (RL) is increasingly used to steer both VAE-based and graph-based generative models. The RL agent takes structural modification actions on a candidate molecule, receives a reward signal based on predicted ADMET properties and potency, and learns a policy that generates drug-like molecules with pre-optimized profiles. Insilico Medicine’s work on idiopathic pulmonary fibrosis (IPF) demonstrated the operational consequence: an AI-nominated preclinical candidate moved from target identification to clinical entry in roughly 30 months, at a fraction of the cost of a conventional program.
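The reward signal described above has to fold several predictions into one number. One common choice, sketched here under stated assumptions, maps each prediction onto a 0 to 1 desirability scale with a sigmoid and combines the results with a weighted geometric mean, so that a single very poor property pulls the whole reward down. The midpoints, steepness values, weights, and property names are hypothetical.

```python
import math
from typing import Dict


def desirability(x: float, midpoint: float, steepness: float,
                 higher_is_better: bool = True) -> float:
    """Sigmoid mapping of a raw prediction onto (0, 1); equals 0.5 at the midpoint."""
    z = steepness * (x - midpoint)
    if not higher_is_better:
        z = -z
    return 1.0 / (1.0 + math.exp(-z))


def reward(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted geometric mean of desirabilities; a near-zero score dominates."""
    total_w = sum(weights.values())
    log_sum = sum(w * math.log(max(scores[name], 1e-9)) for name, w in weights.items())
    return math.exp(log_sum / total_w)


# Hypothetical model predictions for one generated molecule.
pred = {"potency_pic50": 7.5, "herg_pic50": 4.5, "log_solubility": -4.0}
scores = {
    "potency": desirability(pred["potency_pic50"], midpoint=7.0, steepness=2.0),
    "herg_safety": desirability(pred["herg_pic50"], midpoint=5.0, steepness=2.0,
                                higher_is_better=False),
    "solubility": desirability(pred["log_solubility"], midpoint=-5.0, steepness=1.5),
}
weights = {"potency": 2.0, "herg_safety": 1.0, "solubility": 1.0}
print(round(reward(scores, weights), 3))
```

A geometric mean is chosen over an arithmetic mean because it stops the agent from trading a severe hERG liability for extra potency: any component near zero drives the reward toward zero regardless of the others.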

Key Takeaways — Section VI

GNNs learn molecular representations directly from graph topology without fixed fingerprinting assumptions, which can make them more robust to novel chemical space than ECFP-based methods, although benchmark results are mixed. Multitask learning transfers learned representations from data-rich endpoints to data-sparse ones, with reported accuracy gains of 5 to 15% for hard-to-characterize ADMET properties. Anisotropic quantum chemical descriptors enable SoM prediction accuracy that outperforms purely data-driven approaches. Generative models combined with RL have moved in silico ADMET from ‘screen and filter’ to ‘design with constraints built in’, a qualitatively different capability that can substantially compress preclinical timelines.

Investment Strategy — Section VI

Companies that have deployed multitask GNN architectures connected to generative design pipelines have a compounding advantage over companies running single-task classifiers. The former group can produce ADMET-optimized novel chemical matter at the design stage; the latter group is still filtering. In deal negotiations, acquirers should distinguish between a company’s ADMET prediction capability (how accurately it scores existing compounds) and its ADMET design capability (whether it can generate new compounds that satisfy ADMET constraints from the outset). The latter is worth substantially more.

VII. Bayer Pharma’s ADMET Platform: A 20-Year Technical Autopsy

Platform Genesis and the 2001 Mandate

Bayer’s in silico ADMET platform originated around 2001 from a specific internal mandate: generate reliable predictive models for pharmacokinetic and physicochemical endpoints early in drug discovery, before synthesis resources were committed. This was not a research project. It was operational infrastructure, expected to produce scores that influence real synthesis decisions.

The platform’s early architecture reflected the computational state of the art in 2001. ECFP descriptors formed the primary molecular representation. Models were trained on whatever internal experimental data existed, which for most endpoints meant datasets of a few hundred to a few thousand measurements. The primary algorithmic workhorses were the methods that performed well under those constraints: SVMs for classification tasks, early RF implementations as data grew, and simpler QSAR approaches for endpoints where limited data precluded more complex models.

The platform was designed from the beginning with a specific philosophy: models exist to focus experimental effort, not to replace it. The goal was not to eliminate in vitro ADMET experiments but to ensure that the compounds that reached the assay queue were those most likely to have favorable profiles. This distinction matters for understanding platform success metrics. Bayer did not evaluate platform success by asking ‘how accurate are our predictions?’ It evaluated success by asking ‘are our synthesized compounds showing better ADMET profiles than they did before the platform existed?’

Post-Merger Architecture Expansion (2007–2015)

The Bayer-Schering merger forced a complete overhaul of data infrastructure before any model improvements could be realized. Schering’s assay data arrived in formats incompatible with Bayer’s existing systems, with different concentration units, different assay protocols for nominally identical endpoints, and in some cases fundamentally different biological read-outs for what were classified as the same property.

The harmonization process established data preparation procedures that Bayer has maintained as internal standards since. Salt stripping removes counterions, which belong to the pharmaceutical salt form rather than to the pharmacologically active entity. Charge standardization ensures that a compound’s ionization state is handled consistently. Tautomer normalization collapses variant structural representations of the same compound to a canonical form. Two further steps reduced the noise that had limited model performance in earlier years: removing uncertain data, specifically measurements where assay artifacts were suspected, and aggregating replicate measurements by their median rather than their mean.
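The last two steps, dropping suspect measurements and collapsing replicates to a median, can be sketched in a few lines. This is a minimal illustration, not Bayer’s pipeline: it assumes each record already carries the identifier of a standardized parent structure (salt-stripped, neutralized, tautomer-canonicalized, for example with a toolkit such as RDKit, not shown) and a hypothetical `artifact_flag` field.

```python
from collections import defaultdict
from statistics import median


def aggregate_replicates(records):
    """Collapse replicate measurements per (compound, endpoint) to their median.

    Each record is a dict with hypothetical keys:
      'compound_id'   - identifier of the already-standardized parent structure
      'endpoint'      - e.g. 'logD' or 'CLint_microsomal'
      'value'         - numeric measurement
      'artifact_flag' - True if an assay artifact is suspected (optional)
    """
    grouped = defaultdict(list)
    for rec in records:
        # Remove uncertain measurements before aggregating.
        if rec.get("artifact_flag", False):
            continue
        grouped[(rec["compound_id"], rec["endpoint"])].append(rec["value"])
    # The median is robust to a single outlying replicate; the mean is not.
    return {key: median(values) for key, values in grouped.items()}


records = [
    {"compound_id": "C1", "endpoint": "logD", "value": 2.1},
    {"compound_id": "C1", "endpoint": "logD", "value": 2.3},
    {"compound_id": "C1", "endpoint": "logD", "value": 6.0},  # outlying replicate
    {"compound_id": "C1", "endpoint": "logD", "value": 2.2, "artifact_flag": True},
]
print(aggregate_replicates(records))  # {('C1', 'logD'): 2.3}
```

With the outlier retained, the mean of the three accepted replicates would be about 3.5, while the median stays at 2.3, close to the two consistent measurements.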

After harmonization, the combined dataset showed substantially greater structural diversity across multiple scaffold classes. Bayer’s annual pharmacokinetic and physicochemistry assay output consistently added thousands of new data points per endpoint, making the dataset larger, more diverse, and more contemporary each year. This compounding data advantage is the structural foundation of the platform’s long-term performance.

Transition to Regression and Exposure Modeling

The transition from binary classifiers to continuous regression models was driven by the dataset size increases that followed the merger. For endpoints like microsomal clearance, plasma protein binding, and Caco-2 permeability, regression models trained on datasets of 10,000 or more measurements produced predictions with dynamic range information that classifiers could not deliver.

Exposure prediction is the hardest regression target in computational ADMET. In vivo exposure, measured as area under the plasma concentration-time curve (AUC) or maximum plasma concentration (Cmax), depends on absorption, distribution, metabolism, and excretion simultaneously. It is an integrated pharmacokinetic outcome rather than a single-endpoint property. Early Bayer platform versions could not reliably predict exposure because the datasets were too small and the endpoints too interrelated. Deep neural network architectures, which naturally handle multitask learning across correlated endpoints, made continuous exposure prediction tractable by the late 2010s when the underlying datasets had grown large enough to support the model complexity.
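One practical reason multitask networks suit this setting is that a compound can contribute to training even when only some of its endpoints were measured. A minimal sketch of a masked multitask mean-squared-error loss, in which unmeasured labels (`None`) are skipped; the endpoint columns and values are hypothetical, and a real network would apply this loss to the outputs of task-specific heads sharing a common learned representation.

```python
def masked_multitask_mse(predictions, labels):
    """Mean squared error over observed (compound, endpoint) pairs only.

    predictions, labels: lists of rows, one row per compound and one column
    per endpoint. A label of None means the endpoint was not measured.
    """
    total, count = 0.0, 0
    for pred_row, label_row in zip(predictions, labels):
        for pred, label in zip(pred_row, label_row):
            if label is None:
                continue  # missing measurement: contributes no error for this task
            total += (pred - label) ** 2
            count += 1
    if count == 0:
        raise ValueError("no observed labels")
    return total / count


# Hypothetical columns: log CLint, fraction unbound, log Papp (Caco-2)
preds = [[1.0, 0.10, -5.0],
         [2.0, 0.30, -6.0]]
labels = [[1.5, None, -5.0],   # compound 1: fraction unbound not measured
          [None, 0.20, None]]  # compound 2: only fraction unbound measured
print(masked_multitask_mse(preds, labels))
```

Because the loss only counts observed entries, sparse endpoints still share the representation learned from densely measured ones, which is the mechanism that makes integrated targets such as exposure more tractable.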

The ANDROMEDA model, designed at Bayer for direct in silico to in vivo prediction without an in vitro intermediate step, represents the most ambitious endpoint in this progression. Traditional ADMET workflows run: in silico prediction to prioritize compounds, then in vitro testing to measure selected properties, then in vivo pharmacokinetic studies in preclinical species to estimate human pharmacokinetics. ANDROMEDA attempts to collapse this sequence into a single in silico to in vivo step, predicting in vivo outcomes directly from molecular structure. The validation data for this approach is not yet fully public, but the approach represents the logical terminus of the model development trajectory Bayer has followed over two decades.

Validation Methodology: Why Time-Dependent CV Matters

Model validation in ADMET is not a detail. Choosing the wrong validation strategy systematically overestimates model performance and leads to deployment decisions based on inflated accuracy estimates.

Standard k-fold cross-validation randomly partitions a dataset into k subsets and cycles each as a test set. This works well when the training and test compounds come from similar structural regions. In pharmaceutical research, it is a poor proxy for real-world performance because the compounds a model will be asked to score in the future are often structurally different from those in the training set. The model encounters chemical novelty every time a new project scaffold emerges.

Time-dependent cross-validation, which Bayer uses as a preferred validation approach, partitions data chronologically. For example, a model trained on data from 2001 to 2015 is tested on data from 2016 to 2020. This setup better approximates actual deployment conditions where models trained on historical data must generalize to future compounds. ‘Leave-cluster-out’ CV is an alternative that withholds entire structural clusters rather than random examples, testing generalization across scaffold classes rather than within them. Both approaches produce more pessimistic but more realistic performance estimates than standard cross-validation.

Bayer reserves 20% of its dataset as a held-out external test set, training on the remaining 80% using its preferred CV strategy. The external test set is not used during hyperparameter optimization or model selection, preserving it as an unbiased final performance estimate.
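The splitting logic can be sketched as follows, using hypothetical records that carry a measurement year and a precomputed structural cluster label (in practice obtained, for example, from fingerprint clustering or Bemis-Murcko scaffolds). The text does not say how Bayer selects its 20% external set; this sketch assumes, for illustration only, that it is the most recent 20% of measurements. All function names are hypothetical.

```python
from collections import defaultdict


def temporal_holdout(records, test_fraction=0.2):
    """Reserve the most recent fraction of records as an external test set.

    Assumption (not from the source): the external set is chosen chronologically.
    """
    ordered = sorted(records, key=lambda r: r["year"])
    n_test = max(1, int(round(len(ordered) * test_fraction)))
    return ordered[:-n_test], ordered[-n_test:]


def time_split(records, cutoff_year):
    """Train on data up to and including cutoff_year, validate on later data."""
    train = [r for r in records if r["year"] <= cutoff_year]
    valid = [r for r in records if r["year"] > cutoff_year]
    return train, valid


def leave_cluster_out_folds(records):
    """Yield (cluster, train, valid), holding out each structural cluster once."""
    by_cluster = defaultdict(list)
    for r in records:
        by_cluster[r["cluster"]].append(r)
    for held_out in sorted(by_cluster):
        train = [r for r in records if r["cluster"] != held_out]
        yield held_out, train, by_cluster[held_out]


# Hypothetical dataset: 10 measurements from 2001-2019 in 3 scaffold clusters.
data = [{"id": i, "year": 2001 + 2 * i, "cluster": "ABC"[i % 3]} for i in range(10)]
dev, external_test = temporal_holdout(data)  # external_test is never used for tuning
train, valid = time_split(dev, cutoff_year=2009)
for cluster, tr, va in leave_cluster_out_folds(dev):
    print(cluster, len(tr), len(va))
```

The key discipline is in the ordering: the external set is carved off first, and every model-selection decision (hyperparameters, algorithm choice) is made using only the time-split or leave-cluster-out folds built from the remaining development data.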

Key Takeaways — Section VII

Bayer’s platform success over 20 years rests on three factors: data infrastructure (harmonized, continuously growing, structurally diverse internal datasets), algorithmic pragmatism (using the most reliable algorithm for each dataset size rather than chasing the newest architecture), and validation rigor (time-dependent and leave-cluster-out CV rather than standard random splits). The platform’s goal was never to eliminate in vitro experiments but to focus them on the most promising compounds, a distinction that determined how platform success was measured and, consequently, what it was optimized to achieve.

VIII. IP Valuation of In Silico ADMET Technology Assets

What Makes Computational ADMET IP Defensible

The intellectual property landscape for in silico ADMET is more complex than for small-molecule drugs or biologics, where patent claims on composition of matter are well-established. Computational methods occupy a contested space where software patents face 35 U.S.C. Section 101 eligibility challenges under Alice/Mayo, model architectures are rapidly reproduced once published, and the most valuable assets are often not the algorithms but the data and the workflows built around them.

Defensible IP in computational ADMET falls into several distinct categories.

Proprietary training datasets are the most durable competitive asset. A company with a decade of homogeneous, internally generated ADMET measurements across chemically diverse compound libraries has something that cannot be reconstructed quickly by a competitor. Patents do not protect datasets; trade secret law does. The key requirement for trade secret protection is that the dataset is maintained as confidential (not published or shared without NDA protection) and that reasonable steps are taken to maintain its confidentiality. This means internal dataset access controls, employee NDAs, and careful review of any publications that might inadvertently disclose training data composition or size.

Model architectures are patentable in principle but difficult to defend in practice. The European Patent Office and USPTO have both granted patents on specific ML architectures applied to drug discovery, but the patent claims must recite a specific technical improvement over prior art rather than an abstract idea implemented on a computer. SoM prediction using anisotropic quantum chemical descriptors is an example of a claim that has a reasonable chance of surviving Section 101 scrutiny because it combines a novel molecular representation (anisotropic atomic reactivity) with a specific technical application (CYP oxidation site prediction) to produce a result (metabolic stability improvement guidance) that is not achievable with prior methods.

Workflows and software tools built on top of ML models can be protected by copyright (the specific code) and by trade secret (the implementation logic), but neither form of protection prevents competitors from independently building alternative implementations of the same underlying approach. This means platform-level advantages require either first-mover speed (building a user base before competitors deploy similar tools), proprietary data (ensuring the tool’s accuracy depends on data competitors cannot replicate), or integration depth (embedding the tool so deeply into internal workflows that switching costs become prohibitive).

Bayer’s IP Position in Computational ADMET

Bayer Pharma holds a unique IP position in in silico ADMET based on three compounding advantages: 20+ years of continuously harmonized internal assay data, published platform descriptions that establish prior art for core methods (limiting competitors’ ability to patent similar approaches while not exposing Bayer’s data), and deep integration of the platform into its drug discovery workflow across multiple therapeutic areas.

The disclosure of the platform’s architecture in peer-reviewed publications, most notably the 2020 paper describing the platform’s machine learning history, was a strategic choice. By disclosing the general methodological framework, Bayer both contributes to the scientific community and establishes prior art that prevents competitors from obtaining valid blocking patents on the same approaches. The specific models, trained on Bayer’s proprietary data, remain trade secrets even when the architecture is public.

From an IP valuation standpoint, the ADMET platform’s value is not separable from the dataset that trained it. Any acquirer evaluating Bayer’s computational assets should assess the dataset size, endpoint coverage, data quality protocols, and annual accretion rate as the primary value drivers, with the model architecture as a secondary consideration that any competent computational chemistry team could approximately replicate given the same data.

Insilico Medicine: IP Strategy in AI-Native Drug Discovery

Insilico Medicine presents a different IP architecture. Its Chemistry42 generative chemistry platform and its Biology42 target discovery platform generate intellectual property at the intersection of computational methods and biological targets. The company’s IPF candidate INS018-055, which entered Phase I roughly 30 months after its AI-identified target was selected and has since advanced to Phase II, creates a specific IP situation: the compound itself is protectable via composition-of-matter patents, the newly identified target may support method-of-treatment claims, and the generative platform that designed the compound is a separable trade secret asset.

For institutional investors, the relevant question is whether Insilico’s platform IP creates durable advantage or whether its value is primarily in the first-generation pipeline it produced. If competitors can deploy equivalent generative chemistry platforms (several are now commercially available or in development at Recursion, Exscientia, and others), Insilico’s platform advantage narrows over time. The composition-of-matter IP on INS018-055 and subsequent candidates remains valuable regardless of platform competition, but it requires clinical success to convert to market exclusivity.

Atomwise: Structure-Based IP and Screening Platform Valuation

Atomwise built its platform around structure-based virtual screening using 3D convolutional neural networks applied to protein binding site geometries. Its 2015 demonstration of Ebola drug candidate identification in under 24 hours established the platform’s speed advantage. From an IP perspective, Atomwise’s primary assets are the AtomNet architecture, its proprietary protein-ligand interaction database, and the screening agreements it holds with pharmaceutical partners.

The relevant IP valuation question for Atomwise-style screening platforms is utilization rate and exclusivity structure. Partnership agreements that grant exclusive screening rights for specific targets to pharma partners generate immediate revenue but may limit the platform’s ability to serve multiple partners in the same therapeutic area. Non-exclusive agreements preserve optionality but reduce per-partner revenue. The platform’s long-term value depends on whether AtomNet’s predictive accuracy remains differentiated as competitors deploy their own structure-based deep learning systems.

Technology Roadmap: IP Protection Through the Generative Era

As ADMET prediction transitions from scoring to design, the IP landscape shifts with it. When a generative model produces a novel chemical structure with predicted ADMET properties, the resulting compound can be protected via composition-of-matter patent. The fact that a compound was designed with an AI system does not by itself affect its patentability: its novelty, non-obviousness, and utility are assessed against prior art in the normal way, provided that the human researchers who directed the design process made a significant contribution to its conception and are named as inventors. Current U.S. law does not recognize an AI system itself as an inventor.

The critical IP practice for companies using generative ADMET platforms is first-mover documentation: recording the date on which a generative model first produced a given structural scaffold, and documenting the ADMET constraints that drove the generation process. Under first-inventor-to-file systems this record does not by itself establish priority, which turns on filing dates, but it supports inventorship determinations and enables the company to trace a continuous chain of inventive activity from computational design to synthesis and pharmacological confirmation.

IX. Key Vendors, Academic Platforms, and Competitive Positioning

Commercial ADMET Prediction Platforms

The commercial ADMET prediction software market divides into two categories: general-purpose platforms that cover multiple ADMET endpoints using publicly available or licensed training data, and specialized platforms that focus on specific endpoints or chemical spaces using proprietary data.

ADMET Predictor (Simulations Plus) is the most widely deployed commercial platform for small-molecule ADMET prediction. Its endpoint coverage is broad, spanning absorption, distribution, metabolism, excretion, and several toxicity sub-models. Its training data derives primarily from public databases (ChEMBL, PubChem) and licensed datasets rather than homogeneous in-house assay data, so its accuracy is generally lower than that of well-resourced internal models for endpoints where a company has large proprietary datasets. Published validation studies have noted that ADMET Predictor’s predictive accuracy decreases substantially when translating from in vitro to in vivo predictions, a known limitation of all commercial platforms that rely on in vitro-derived training data.

Schrödinger’s physics-based ADMET tools occupy a different position: rather than ML-based prediction from fingerprints, they simulate the underlying biophysical processes using free energy perturbation (FEP+) and molecular dynamics. This approach is more computationally expensive but produces more interpretable predictions with better generalization to novel scaffolds in some applications. Schrödinger’s platform is most defensible for endpoints where the biophysics are well-understood and the computational cost is justified by the importance of the decision.

Inductive Bio takes a consortium-based approach, building ADMET models from pooled pre-competitive data contributed by member companies. The model quality benefit of consortium data sharing is real but difficult to quantify without access to each member’s individual contribution. The consortium’s IP framework, which allows data sharing without member companies disclosing proprietary structures, addresses the core tension between data pooling and competitive confidentiality.

Academic Platforms and Open-Source Ecosystems

DeepChem, the open-source deep learning library for the life sciences, provides implementations of GNNs, multitask networks, and classical ML methods for molecular property prediction. Its value to pharma companies is primarily as a benchmark: teams can use DeepChem to evaluate whether their internal proprietary models outperform publicly available architectures trained on public data. If the answer is no, the internal platform adds limited value.

RDKit is the dominant open-source cheminformatics toolkit, providing ECFP generation, SMARTS matching, stereochemistry handling, and basic QSAR utilities. Virtually every commercial and academic ADMET platform uses RDKit for molecular preprocessing; it is infrastructure rather than a competitive differentiator.

OPERA (OPEn (Q)SAR App), developed at the EPA’s National Center for Computational Toxicology, covers environmental fate and human health endpoints under OECD QSAR validation principles. It is most relevant for regulatory submissions requiring QSAR models with documented applicability domain and uncertainty estimates, rather than for internal drug discovery use.

Key Takeaways — Section IX

Commercial ADMET platforms trained on public or licensed data will not outperform well-resourced internal models for endpoints where a company has large proprietary datasets. The relevant competitive question for any pharma company is not ‘should we use a commercial platform or build internally?’ but ‘for which specific endpoints does our internal data size and quality justify internal model development?’ For endpoints where internal data is sparse (rare toxicity endpoints, human-specific pharmacokinetics), commercial or consortium data provides a faster path to reasonable accuracy.

X. Challenges That Still Kill Deployments

The in vitro to in vivo Translation Gap

The most persistent technical failure mode in computational ADMET is the gap between in vitro measurements and in vivo outcomes. A model trained on Caco-2 permeability measurements predicts Caco-2 permeability accurately. But Caco-2 permeability is a proxy for human intestinal absorption, and the proxy relationship breaks down for specific compound classes: P-gp substrates, compounds with pH-dependent solubility that affects dissolution in vivo, and compounds where pre-systemic metabolism in the intestinal wall is significant.

A 2025 bioRxiv preprint from researchers at multiple institutions analyzed the translation accuracy of widely used ADME/PK models and concluded that human pharmacokinetic prediction from in vitro data and in silico models ‘is lost in translation’ for a substantial fraction of chemical space. The practical lesson is that models that perform well in aggregate can hide systematic failure modes for specific structural classes or therapeutic areas.

The commercial implication is that ADMET platform vendors who report aggregate performance statistics are not necessarily misrepresenting their tools; they are reporting a number that masks important heterogeneity. IP teams and R&D leads evaluating platforms should request endpoint-specific performance data stratified by scaffold class, and should ask for head-to-head comparison data between in silico predictions and actual human PK outcomes for matched compound sets.
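The masking effect of aggregate statistics is easy to reproduce. A minimal sketch with invented predictions for two hypothetical scaffold classes, computing mean absolute error (MAE) pooled across all compounds and stratified by class:

```python
from collections import defaultdict


def mae(pairs):
    """Mean absolute error over (predicted, observed) pairs."""
    return sum(abs(p - y) for p, y in pairs) / len(pairs)


def stratified_mae(rows):
    """rows: (scaffold_class, predicted, observed). Returns pooled and per-class MAE."""
    by_class = defaultdict(list)
    for scaffold, pred, obs in rows:
        by_class[scaffold].append((pred, obs))
    pooled = mae([(p, y) for _, p, y in rows])
    return pooled, {k: mae(v) for k, v in sorted(by_class.items())}


# Hypothetical log-unit predictions: 9 well-predicted amides, 1 badly predicted macrocycle.
rows = [("amide", 1.0, 1.1)] * 9 + [("macrocycle", 1.0, 2.5)]
pooled, per_class = stratified_mae(rows)
print(round(pooled, 2), {k: round(v, 2) for k, v in per_class.items()})
```

Here the pooled MAE is about 0.24 log units, which looks acceptable, while the macrocycle class carries an error of 1.5 log units. Only the stratified view exposes it, which is why the same request should be made of any vendor’s validation data.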

Domain Applicability and Chemical Space Shifts

All machine learning models perform worse on chemical structures that differ substantially from their training data. This is the domain applicability problem, and it is particularly acute in drug discovery because therapeutic modalities shift over time. The transition from pure small-molecule drugs to PROTACs, molecular glues, macrocycles, and antibody-drug conjugates (ADCs) has put each of these structural classes outside the applicability domain of models trained primarily on traditional drug-like small molecules.

GNNs are often argued to handle chemical space shift better than ECFP-based models because they learn structural features from data rather than relying on a predefined fingerprint, although benchmark evidence for this advantage is mixed. In either case, GNNs still require training examples in or near the chemical space of interest to perform reliably. A GNN trained exclusively on Rule-of-Five-compliant small molecules will not reliably predict ADMET properties for a PROTAC approaching 1,000 daltons.
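A common and simple way to flag out-of-domain queries is nearest-neighbor similarity to the training set. A minimal sketch, assuming fingerprints are represented as sets of on-bit indices (as a toolkit such as RDKit can produce for Morgan/ECFP-style fingerprints) and using a hypothetical threshold of 0.4; real thresholds are endpoint- and fingerprint-specific and should be calibrated against observed prediction error.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def in_domain(query, training_fps, threshold=0.4):
    """Return (max similarity to the training set, whether it clears the threshold).

    The 0.4 default is a hypothetical placeholder, not a recommended value.
    """
    best = max(tanimoto(query, fp) for fp in training_fps)
    return best, best >= threshold


# Toy fingerprints as sets of on-bit indices.
training = [{1, 2, 3, 4, 5}, {2, 3, 4, 6, 7}, {1, 3, 5, 8}]
query_close = {1, 2, 3, 4, 9}    # shares most bits with the first training compound
query_far = {3, 20, 21, 22, 23}  # mostly unseen bits, e.g. a new modality
print(in_domain(query_close, training))
print(in_domain(query_far, training))
```

The close query reaches a similarity of about 0.67 to its nearest training neighbor and is accepted; the far query peaks at 0.125 and is flagged, which is the behavior a new modality such as a PROTAC would trigger against a Rule-of-Five training set.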

Class Imbalance in Toxicity Endpoints

Toxicity endpoints often have severe class imbalance: in a typical corporate compound library, 5% or fewer compounds may show hERG channel inhibition above a clinically relevant threshold. A classifier trained on such data can achieve 95% accuracy by predicting ‘non-hERG inhibitor’ for every compound, which is operationally useless.

Standard approaches to class imbalance include oversampling the minority class (SMOTE), undersampling the majority class, and using class-weighted loss functions during training. None of these approaches is universally superior, and the optimal strategy depends on the specific endpoint and training dataset. For institutional investors, this is a detail that matters when evaluating AI drug discovery companies’ published model performance data: accuracy on imbalanced toxicity datasets is meaningless without precision and recall reported separately, with the positive class defined explicitly.
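The accuracy trap and one of the standard remedies can be shown with a minimal sketch. The library below is hypothetical (5 hERG inhibitors among 100 compounds), the always-negative "classifier" is the degenerate model described above, and the class weights use the inverse-frequency formula n_samples / (n_classes × n_in_class), the same formula scikit-learn applies for `class_weight="balanced"`.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) with the positive class defined explicitly."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def report(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / len(y_true)
    # Precision is undefined when nothing is predicted positive; report 0.0 here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


def inverse_frequency_weights(y_true):
    """Class weights n_samples / (n_classes * n_in_class) for a weighted loss."""
    classes = sorted(set(y_true))
    n = len(y_true)
    return {c: n / (len(classes) * y_true.count(c)) for c in classes}


# Hypothetical library: 5 hERG inhibitors (label 1) among 100 compounds.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100
print(report(y_true, always_negative))  # high accuracy, zero recall
print(inverse_frequency_weights(y_true))
```

The degenerate model scores 95% accuracy with zero recall on the class that matters. The weights make each misclassified inhibitor count roughly 19 times as much as a misclassified non-inhibitor during training.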

Regulatory Model Documentation Requirements

The FDA’s April 2025 announcement phasing out mandatory animal testing for specific drug categories increases regulatory pressure on in silico ADMET models to serve as primary evidence rather than supplementary evidence. That transition requires model documentation standards that most pharma in silico teams are not currently meeting.

The OECD principles for the validation of QSAR models, on which the OECD QSAR Assessment Framework referenced in the EMA’s 2024 AI/ML review builds, set out five elements for regulatory-grade QSAR models: a defined endpoint, an unambiguous algorithm, a defined applicability domain, appropriate measures of goodness-of-fit and predictivity, and a mechanistic interpretation where possible. Most industry models satisfy the first two requirements but are incomplete on the latter three, particularly applicability domain characterization and mechanistic interpretation.

Explainable AI techniques, specifically SHAP (SHapley Additive exPlanations) and LIME (Local Interpretable Model-agnostic Explanations), address the mechanistic interpretation requirement by attributing prediction outcomes to specific molecular features. These techniques do not solve the applicability domain problem, which still requires a separate check of whether a query compound resembles the training data, but they provide the feature-level explanation regulators need to judge whether a prediction rests on chemically plausible features.
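SHAP approximates Shapley values from cooperative game theory. For a model with only a handful of features, the values can be computed exactly by enumerating feature subsets, which makes the attribution rule concrete. A minimal sketch, assuming a hypothetical three-descriptor toxicity score and a reference (baseline) compound whose values stand in for "absent" features; production SHAP implementations use approximations or model-specific algorithms rather than this exponential enumeration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S without i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toxicity score: two additive terms plus a logP x basic-amine interaction.
def toy_model(z):
    logp, aromatic_rings, basic_amine = z
    return 0.5 * logp + 0.3 * aromatic_rings + 0.4 * logp * basic_amine


x = [3.0, 2.0, 1.0]         # query compound (illustrative descriptor values)
baseline = [1.0, 1.0, 0.0]  # reference compound
phi = exact_shapley(toy_model, x, baseline)
print([round(v, 3) for v in phi])
# Efficiency property: attributions sum to model(x) - model(baseline).
print(round(sum(phi), 6), round(toy_model(x) - toy_model(baseline), 6))
```

The interaction term is split between logP and the basic amine rather than assigned to either alone, and the attributions add up exactly to the difference between the query and reference predictions. That additivity is what makes SHAP output auditable feature by feature.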

XI. Technology Roadmap: 2026–2032

Near-Term (2026–2028): AutoML Maturation and Organ-on-a-Chip Integration

Automated Machine Learning (AutoML) for ADMET modeling has progressed from research prototype to production tool. Published results show that AutoML pipelines searching across algorithm types and hyperparameter spaces can produce models predicting 11 ADMET properties simultaneously with AUC exceeding 0.8 across all endpoints, a performance level that required months of manual development five years ago. The near-term impact is organizational: as model development time compresses, computational chemistry teams can redirect effort from model maintenance to experimental design and biological interpretation.
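At its core, AutoML is a search over model families and hyperparameters, scored by cross-validation on training data only. A deliberately tiny sketch, assuming a single numeric descriptor and three hypothetical candidate families (a mean baseline, k-nearest-neighbor regression with several values of k, and least-squares linear regression); real AutoML systems search far larger spaces and often use Bayesian optimization instead of the exhaustive grid used here.

```python
from statistics import mean


def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m


def make_knn(k):
    def fit(xs, ys):
        pts = list(zip(xs, ys))

        def predict(x):
            nearest = sorted(pts, key=lambda p: abs(p[0] - x))[:k]
            return mean(y for _, y in nearest)
        return predict
    return fit


def fit_linear(xs, ys):
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)


def cv_mse(fit, xs, ys, n_folds=5):
    """Mean squared error over k folds (fold f = points with index % n_folds == f)."""
    errors = []
    for f in range(n_folds):
        tr = [i for i in range(len(xs)) if i % n_folds != f]
        te = [i for i in range(len(xs)) if i % n_folds == f]
        model = fit([xs[i] for i in tr], [ys[i] for i in tr])
        errors += [(model(xs[i]) - ys[i]) ** 2 for i in te]
    return mean(errors)


def auto_select(xs, ys):
    """Score every candidate family/hyperparameter by CV and return the best."""
    candidates = {"mean": fit_mean, "linear": fit_linear}
    candidates.update({f"knn_k={k}": make_knn(k) for k in (1, 3, 5)})
    scores = {name: cv_mse(fit, xs, ys) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical training data with an exactly linear trend.
xs = [float(i) for i in range(20)]
ys = [0.5 * x + 1.0 for x in xs]
best, scores = auto_select(xs, ys)
print(best)
```

On this exactly linear toy data the search selects the linear model. The same loop structure applies when the candidates are random forests, gradient-boosted trees, or GNNs with their own hyperparameter grids; what AutoML automates is the bookkeeping, not the validation discipline.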

Organ-on-a-Chip (OoC) systems are becoming practical ADMET testing platforms. Gut-Liver OoC systems that circulate compounds through intestinal epithelium and hepatocyte compartments in sequence can measure bioavailability and hepatic clearance in a single experiment under more physiologically relevant conditions than standard Caco-2 and microsomal clearance assays. Mathematical modeling of OoC microenvironments is a prerequisite for experimental design: getting cells to the right density, flow rate, and oxygen tension requires computational optimization before any drug is introduced. The integration of OoC experimental data with ML models represents a feedback loop: OoC generates human-relevant ADMET measurements that improve training data quality; better models design better OoC protocols.

Mid-Term (2028–2030): Quantum Machine Learning and Large-Scale Molecular Simulation

Quantum computing applications to drug discovery are moving from theoretical frameworks to early practical implementation. The QCS-ADME (Quantum Circuit Search for ADME) framework, described in an arXiv preprint from early 2025, applies quantum circuit architectures to ADME property prediction. The framework addresses two specific challenges that classical ML handles poorly: imbalanced datasets and regression tasks with non-linear continuous outputs. QCS-ADME introduces weighted matrix approaches for class imbalance and similarity-based continuous target representations for regression, implemented on quantum hardware.

Current quantum computers lack the qubit count and error correction required for full-scale molecular simulation. But quantum-classical hybrid approaches, where quantum circuits handle specific computationally difficult sub-problems while classical computers handle the rest, are viable on near-term hardware. IBM’s roadmap projects 100,000+ physical qubits by 2033; Google and IonQ have demonstrated error-corrected logical qubits at small scale. The transition from quantum advantage on small toy molecules to pharmaceutical-scale ADMET prediction is not immediate, but the trajectory is clear.

More near-term are improvements to classical molecular dynamics and quantum chemistry calculations that inform ADMET model training. Density functional theory (DFT) calculation throughput has increased substantially as GPU-accelerated quantum chemistry codes (TeraChem, QUICK) enable electronic structure calculations on molecules that previously required supercomputer time. The anisotropic quantum chemical descriptors used for SoM prediction are one application of this improved throughput; broader use of electronic structure properties as GNN node features is likely to expand by 2028.

Long-Term (2030–2032): Fully Integrated In Silico to in Vivo Pipelines

The long-term trajectory points toward ADMET prediction pipelines that operate without an in vitro intermediate step for a defined subset of endpoints. ANDROMEDA-class models, designed to predict in vivo pharmacokinetic outcomes directly from molecular structure, represent the approach. Practical realization requires training datasets that pair molecular structure with matched human clinical pharmacokinetic data, which is available from public clinical databases and from proprietary internal sources at major pharma companies.

Physiologically Based Pharmacokinetic (PBPK) models, which simulate drug disposition across compartments representing major organs and tissues, are an alternative path. PBPK models are parameterized by in vitro measurements (plasma protein binding, intrinsic clearance, permeability); connecting in silico ADMET predictions to PBPK model parameters creates a workflow that extends from molecular structure to predicted plasma concentration-time profiles without animal experiments. FDA has accepted PBPK modeling data in regulatory submissions; as in silico ADMET prediction of PBPK input parameters improves, the full computational simulation of human pharmacokinetics becomes more credible.
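A minimal sketch of how predicted parameters can feed a pharmacokinetic calculation. It uses the standard well-stirred liver model, CL_H = Q_H · fu · CLint / (Q_H + fu · CLint), and a one-compartment IV bolus profile, C(t) = (Dose / V) · exp(−(CL / V) · t). This is far simpler than a full PBPK model (no organ compartments, no absorption, and a blood-to-plasma ratio assumed to be 1), and every numeric input below is hypothetical.

```python
import math


def well_stirred_hepatic_clearance(q_h, fu, cl_int):
    """Well-stirred liver model: CL_H = Q_H * fu * CLint / (Q_H + fu * CLint).

    Units must be consistent (e.g. mL/min/kg). Assumes a blood-to-plasma
    ratio of 1, so the plasma fraction unbound is used directly.
    """
    return q_h * fu * cl_int / (q_h + fu * cl_int)


def one_compartment_iv(dose, volume, clearance, times):
    """Plasma concentration after an IV bolus: C(t) = Dose/V * exp(-(CL/V) * t)."""
    k_el = clearance / volume
    return [dose / volume * math.exp(-k_el * t) for t in times]


# Hypothetical predicted inputs for one compound, per kg body weight.
Q_H = 20.0      # hepatic blood flow, mL/min/kg (approximate human value)
fu = 0.1        # predicted fraction unbound in plasma
cl_int = 100.0  # predicted scaled intrinsic clearance, mL/min/kg

cl_h = well_stirred_hepatic_clearance(Q_H, fu, cl_int)  # mL/min/kg
cl_h_per_h = cl_h * 60 / 1000                          # convert to L/h/kg
# dose in mg/kg, volume in L/kg, times in hours -> concentrations in mg/L
conc = one_compartment_iv(dose=1.0, volume=1.0, clearance=cl_h_per_h,
                          times=[0, 1, 2, 4, 8])
print(round(cl_h, 2), [round(c, 3) for c in conc])
```

The structure shows why prediction errors in fu and CLint matter downstream: both enter the clearance term multiplicatively, and clearance sets the exponent of the concentration-time curve.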

Key Takeaways — Section XI

The 2026-2028 period will see AutoML reduce ADMET model development time from months to weeks, releasing computational chemistry resources for higher-value activities. OoC systems will generate more physiologically relevant training data, improving in vitro to in vivo translation accuracy. The 2028-2032 period will see quantum-classical hybrid approaches demonstrate ADMET prediction advantages for specific hard problems (highly polar molecules, large macrocycles), and PBPK-coupled in silico pipelines will reduce routine in vivo pharmacokinetic studies for new chemical entities.

Investment Strategy — Section XI

The most strategically defensible position through 2032 is a proprietary ADMET dataset of sufficient size and diversity to train GNN-based multitask models that outperform commercial alternatives, combined with integration into OoC experimental workflows that continuously generate human-relevant training data. Companies that build this combination will see compounding advantages: better models attract partnerships, partnerships generate data, data improves models. Investors should weight R&D spend on experimental platform development (OoC, high-throughput assays) as an ADMET IP investment, not just a biology cost center, because the data those platforms generate is the long-term model training asset.

XII. Regulatory Landscape: FDA, EMA, and the Animal Testing Inflection

FDA’s April 2025 Phase-Out Announcement

The FDA’s April 2025 announcement phasing out mandatory animal testing requirements for certain drug categories is the most consequential regulatory development for in silico ADMET in a decade. The announcement reflects a combination of scientific acknowledgment (animal models predict human toxicity imperfectly, particularly for DILI and immunotoxicity), ethical pressure (the FDA Modernization Act 2.0, signed into law in December 2022, explicitly encouraged alternatives to animal testing), and practical availability of alternative methods at scale.

The practical implementation is more nuanced than the headline suggests. The phase-out is endpoint-specific and drug category-specific, not universal. Some toxicity assessments require animal data by law or by the international ICH guidelines that govern global drug registration. The ICH S9 guideline for oncology drugs, for example, is unlikely to be amended rapidly. The endpoints most amenable to in silico replacement are those where human-relevant in vitro methods (OoC, primary human hepatocyte cultures, induced pluripotent stem cell-derived organ models) combined with validated computational models can produce data packages that regulators accept.

The regulatory guidance that will define what ‘validated’ means in this context is still being written. FDA’s Center for Drug Evaluation and Research (CDER) has committed to additional guidance on alternative methods, and the EMA’s 2024 AI/ML review identified model interpretability and applicability domain characterization as the two highest-priority requirements for regulatory acceptance. Companies that build their in silico ADMET platforms to meet these requirements now are better positioned to submit animal testing alternative packages as guidance solidifies.

EMA’s 2024 AI/ML Review: What It Actually Requires

The EMA’s 2024 Horizon Scanning report on AI/ML applications in the medicines lifecycle is the most detailed public regulatory position paper on computational ADMET to date. Its key substantive points deserve specific attention rather than paraphrase.

The EMA emphasizes model interpretability as a requirement, not a preference. For ADMET models submitted as part of regulatory packages, black-box predictions without feature-level attribution are unlikely to be accepted. SHAP values or equivalent XAI outputs that identify which structural features drive a prediction in the claimed direction are the minimum interpretability requirement.

The EMA advises caution regarding specific regulatory endpoint predictions, including No Observed Adverse Effect Levels (NOAELs) and Lowest Observed Adverse Effect Levels (LOAELs). The precise definition of these endpoints varies by study design, and that variation affects prediction accuracy in ways that may not be apparent from cross-validation statistics. This is a specific warning to pharmaceutical companies that the EMA will scrutinize NOAEL and LOAEL predictions with particular care.

Probabilistic model outputs, which express prediction uncertainty rather than point estimates, are explicitly encouraged. A model that predicts a CYP3A4-mediated clearance of 15 mL/min/kg with a 90% prediction interval of [8, 28] provides more regulatory information than a point estimate of 15 mL/min/kg because it quantifies the prediction’s reliability. Bayesian neural networks and ensemble methods (for example, the spread of predictions across the trees of a Random Forest) can produce uncertainty estimates, although ensemble spread is not automatically calibrated and should be checked against held-out data.
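A minimal sketch of turning ensemble predictions into an interval and checking whether that interval is calibrated. It assumes each ensemble member (for example, a tree in a Random Forest or an independently trained network) has already produced a prediction; all member predictions and observations below are hypothetical. The interval is taken from the 5th and 95th percentiles of the member predictions, so it reflects disagreement between members only, which is why its empirical coverage should be measured before it is reported as a 90% interval.

```python
from statistics import mean, quantiles


def interval_90(member_predictions):
    """Point estimate plus a central 90% interval from ensemble member spread."""
    # quantiles(..., n=20) returns 19 cut points at 5%, 10%, ..., 95%.
    cuts = quantiles(member_predictions, n=20, method="inclusive")
    return mean(member_predictions), cuts[0], cuts[-1]


def empirical_coverage(intervals, observed):
    """Fraction of held-out observations that fall inside their intervals."""
    hits = sum(1 for (_, lo, hi), y in zip(intervals, observed) if lo <= y <= hi)
    return hits / len(observed)


# Hypothetical clearance predictions (mL/min/kg) from a 10-member ensemble.
members = [9.0, 11.0, 12.0, 14.0, 15.0, 15.0, 16.0, 18.0, 21.0, 27.0]
point, lo, hi = interval_90(members)
print(point, lo, hi)

# Hypothetical calibration check on three held-out compounds.
held_out_members = ([5, 6, 7, 8, 9], [10, 12, 14, 16, 18], [1, 2, 3, 4, 5])
intervals = [interval_90(m) for m in held_out_members]
observed = [7.5, 25.0, 2.0]
print(empirical_coverage(intervals, observed))
```

If coverage on a time-split test set falls well below the nominal 90%, the intervals are too narrow and need recalibration, for example with conformal prediction.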

Key Takeaways — Section XII

The FDA’s 2025 plan to phase out animal testing requirements creates a commercial opportunity for in silico ADMET platforms that can produce regulatory-grade predictions with documented applicability domains and interpretable feature attributions. The EMA’s 2024 positions translate into three concrete technical needs: XAI outputs (SHAP/LIME) for regulatory submissions, explicit applicability domain boundaries, and probabilistic rather than point-estimate predictions. Companies that build these capabilities into their platforms now will be ahead of the regulatory curve as formal guidance documents arrive.

XIII. Investment Strategy for Portfolio Managers and Institutional Analysts

Evaluating AI Drug Discovery Companies: ADMET-Specific Due Diligence

ADMET modeling capability is increasingly central to the valuation of AI drug discovery companies, but it is also one of the hardest capabilities to evaluate from public information. The following due diligence framework focuses on the factors that actually drive ADMET model quality rather than the marketing claims that surround them.

The first question is dataset provenance. Where does the company’s training data come from? Public databases (ChEMBL, PubChem, FDA adverse event databases) are accessible to all competitors. Licensed commercial databases provide some differentiation but are also available to competitors with equivalent licensing budgets. Internally generated, proprietary assay data is the only source of durable competitive advantage in model training. Ask for approximate data volumes per key endpoint, the date range over which data was collected (older data may not reflect current compound classes), and the assay protocols used (homogeneous protocols produce better training data than data collected under varying conditions).

The second question is endpoint specificity. A company claiming ‘best-in-class ADMET prediction’ should be able to specify which endpoints, which chemical classes, and what validation approach supports that claim. Aggregate performance numbers are nearly useless. Ask for performance stratified by endpoint (permeability, hERG, CYP clearance, and DILI each reported separately), and ask for time-held-out or scaffold-held-out validation statistics rather than random k-fold cross-validation, which tends to overstate performance because close analogs of test compounds end up in the training folds.
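
The difference between random and structured hold-outs is easy to show. The sketch below assumes each record already carries a scaffold key (for example a Bemis-Murcko scaffold SMILES, which RDKit can compute) and an assay date; the record fields and function names are hypothetical. The scaffold split keeps every compound sharing a scaffold on the same side of the split, and the time split holds out the most recently measured compounds, so neither lets near-duplicates of test compounds leak into training.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Tuple


@dataclass
class Record:
    compound_id: str
    scaffold: str      # precomputed scaffold key (assumed input)
    assay_date: date
    value: float


def scaffold_split(records: List[Record],
                   test_fraction: float = 0.2) -> Tuple[List[Record], List[Record]]:
    """Split so that no scaffold appears in both train and test."""
    groups: Dict[str, List[Record]] = defaultdict(list)
    for r in records:
        groups[r.scaffold].append(r)
    train_cutoff = (1.0 - test_fraction) * len(records)
    train: List[Record] = []
    test: List[Record] = []
    # Largest scaffold groups fill the training set first; any group that would
    # push training past its cutoff goes to test as a whole.
    for key in sorted(groups, key=lambda k: (-len(groups[k]), k)):
        if len(train) + len(groups[key]) > train_cutoff:
            test.extend(groups[key])
        else:
            train.extend(groups[key])
    return train, test


def time_split(records: List[Record],
               test_fraction: float = 0.2) -> Tuple[List[Record], List[Record]]:
    """Hold out the most recently assayed compounds as the test set."""
    ordered = sorted(records, key=lambda r: r.assay_date)
    n_test = max(1, round(test_fraction * len(records)))
    return ordered[:-n_test], ordered[-n_test:]


scaffolds = ["A"] * 4 + ["B"] * 3 + ["C"] * 2 + ["D"]
records = [Record(f"cmpd-{i}", s, date(2015 + i, 1, 1), float(i))
           for i, s in enumerate(scaffolds)]
train, test = scaffold_split(records)
print(sorted({r.scaffold for r in test}))
old, recent = time_split(records)
print([r.compound_id for r in recent])
```

With this toy data, scaffold C becomes the entire test set, and the time split holds out the two most recently assayed compounds. A scaffold split can only approximate the requested test fraction, because whole groups move together.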

The third question is platform integration depth. A company that uses ADMET predictions to filter a virtual screening library is doing something commercially available tools can replicate. A company that uses ADMET predictions to actively steer a generative design loop — rejecting molecular modifications that worsen ADMET properties in real time — has a qualitatively different and harder-to-replicate capability. Ask whether ADMET models are used upstream (in molecular generation) or only downstream (in screening).
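
To make the upstream/downstream distinction concrete, the toy sketch below contrasts the two patterns. A "molecule" is reduced to a tuple of substituent choices, and the potency and ADMET scoring functions are hypothetical stand-ins for trained models. The downstream pattern optimizes potency freely and filters at the end; the upstream pattern scores ADMET at every proposed modification and rejects any step that worsens it, which is the real-time steering described above.

```python
import random
from typing import List, Tuple

Molecule = Tuple[int, ...]  # toy representation: substituent choice at each position
N_POSITIONS, N_CHOICES = 4, 5


def potency(m: Molecule) -> float:
    """Hypothetical potency model: bigger substituents are more potent."""
    return float(sum(m))


def admet_score(m: Molecule) -> float:
    """Hypothetical ADMET model (higher is better).

    Toy assumption: large substituents at positions 0 and 1 hurt ADMET.
    """
    return 10.0 - 1.5 * (m[0] + m[1])


def mutate(m: Molecule, rng: random.Random) -> Molecule:
    """Propose a single-position substituent change."""
    new = list(m)
    new[rng.randrange(N_POSITIONS)] = rng.randrange(N_CHOICES)
    return tuple(new)


def optimize(start: Molecule, steps: int, rng: random.Random,
             admet_guard: bool) -> List[Molecule]:
    """Greedy hill climbing on potency; optionally reject ADMET-worsening steps."""
    current, visited = start, [start]
    for _ in range(steps):
        candidate = mutate(current, rng)
        if potency(candidate) <= potency(current):
            continue
        if admet_guard and admet_score(candidate) < admet_score(current):
            continue  # upstream: the edit is rejected as soon as it is proposed
        current = candidate
        visited.append(current)
    return visited


def downstream_filter(visited: List[Molecule], admet_floor: float) -> List[Molecule]:
    """Downstream: generate freely, then discard compounds below an ADMET floor."""
    return [m for m in visited if admet_score(m) >= admet_floor]


start = (0, 0, 0, 0)
free = optimize(start, 200, random.Random(1), admet_guard=False)
guarded = optimize(start, 200, random.Random(1), admet_guard=True)
print("unguarded final:", free[-1], admet_score(free[-1]))
print("guarded final:  ", guarded[-1], admet_score(guarded[-1]))
print("filter survivors:", len(downstream_filter(free, admet_floor=10.0)), "of", len(free))
```

In the guarded run the ADMET score can never fall below its starting value, so potency gains come only from positions the toy ADMET model treats as neutral; the unguarded run is free to trade ADMET for potency and relies on the final filter to catch the damage.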

Public Company Positioning

Among publicly traded companies, Schrödinger (SDGR) has the most transparent commercial ADMET platform, tied to its physics-based modeling stack. Its partnership revenues from pharma companies validate demand, but the platform also faces competition from vendors willing to charge less for ML-only approaches that match its accuracy in specific use cases.

Recursion Pharmaceuticals (RXRX) uses phenotypic screening at scale rather than ADMET-driven computational design. Its platform generates ADMET-adjacent information (cellular morphology changes that may predict toxicity) but does not target the molecular property prediction space that traditional ADMET models occupy. Its valuation is driven by target discovery and indication expansion rather than ADMET optimization capability.

Exscientia (acquired by Recursion in 2024) brought generative chemistry and AI-designed clinical candidates. Its most relevant ADMET contribution was automated molecular design that incorporated ADMET constraints during generation rather than as a post-generation filter. Integrating Exscientia’s generative platform with Recursion’s phenotypic data is positioned to create a combined experimental-computational ADMET capability that could prove stronger than either alone.

For private company investments, the IPO pipeline for AI drug discovery companies in 2025 and 2026 is expected to include several companies where ADMET modeling quality is the central value claim. The due diligence framework above applies directly: dataset provenance, endpoint-specific validation statistics, and platform integration depth are the three questions that separate a genuinely differentiated AI drug discovery platform from one whose differentiation rests mainly on its positioning narrative.

IP Lifecycle Considerations

Computational ADMET platforms do not expire the way drug patents expire, but they do depreciate. A model trained on 2015 compound libraries and not updated will perform progressively worse as drug discovery programs move into chemical space the model has not seen. Companies must invest in continuous experimental data generation to keep their models current, which means ADMET platform maintenance is an ongoing operational cost, not a one-time capital investment.

Trade secret protection, the primary IP protection for proprietary ADMET datasets and model training logic, requires ongoing active maintenance. Employment agreements must contain appropriate IP assignment clauses for computational chemistry work. Data access logging helps demonstrate that reasonable steps were taken to protect confidentiality. When key computational chemistry personnel leave a company, trade secret audits should assess whether the departing employee retained any proprietary data or model architectures in violation of their agreements.

Publication strategy creates a specific tension with trade secret protection: peer-reviewed validation of a company’s methods builds scientific credibility, but disclosure can waive trade secret protection for whatever is published. Bayer’s approach, disclosing methodology while protecting data, represents the appropriate balance for companies whose competitive advantage rests primarily on data volume rather than algorithmic novelty.

XIV. Glossary of Technical Terms

ADMET: Absorption, Distribution, Metabolism, Excretion, Toxicity — the five pharmacological parameters that determine a drug candidate’s pharmacokinetic profile and safety.

ANDROMEDA: Bayer’s internal model targeting direct in silico to in vivo pharmacokinetic prediction, bypassing in vitro intermediates.

AUC: Area under the plasma concentration-time curve, a key pharmacokinetic exposure metric.

AutoML: Automated Machine Learning, which searches algorithm types and hyperparameter spaces automatically to identify optimal model configurations.

Caco-2 Permeability Assay: In vitro intestinal permeability test using human colorectal adenocarcinoma cells as a proxy for intestinal epithelium.

CYP (Cytochrome P450): A superfamily of heme-containing enzymes, expressed mainly in the liver, responsible for metabolizing most small-molecule drugs (often cited as approximately 90%).

DILI: Drug-Induced Liver Injury, the leading cause of post-market drug withdrawals.

ECFP (Extended Connectivity Fingerprint): A molecular descriptor that encodes the chemical environment around each atom up to a defined radius, producing a bit-vector representation.

FEP+: Schrödinger’s implementation of Free Energy Perturbation (FEP), a physics-based simulation method for predicting relative binding affinities and other thermodynamic properties.

GAN (Generative Adversarial Network): A generative model architecture in which a generator network and a discriminator network train adversarially; in molecular design, it is used to propose novel structures that resemble the chemistry of its training set.

GNN (Graph Neural Network): A neural network architecture that operates on molecular graphs, propagating atom-level features through multiple message-passing steps.

hERG: Human Ether-a-go-go-Related Gene, encoding a potassium channel whose inhibition by drugs can prolong the QT interval and increase the risk of arrhythmia.

LIME: Local Interpretable Model-agnostic Explanations, an XAI technique that approximates complex model predictions locally with interpretable linear models.

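LOAEL: Lowest Observed Adverse Effect Level, the lowest tested dose at which an adverse effect is observed.

NOAEL: No Observed Adverse Effect Level, the highest tested dose at which no adverse effect is observed.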

OoC (Organ-on-a-Chip): Microfluidic devices that recreate organ-level biochemical environments for more physiologically relevant in vitro ADMET testing.

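PBPK (Physiologically Based Pharmacokinetic Modeling): Compartmental mathematical models that simulate drug disposition across organ systems by combining physiological parameters (organ volumes, blood flows) with compound-specific parameters, often derived from in vitro data.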

P-gp (P-glycoprotein): An efflux transporter that pumps drug molecules back into the intestinal lumen, reducing oral bioavailability for substrate compounds.

PROTAC: Proteolysis-Targeting Chimera, a bifunctional small molecule that recruits an E3 ubiquitin ligase to degrade a target protein.

QSAR (Quantitative Structure-Activity Relationship): Statistical models correlating molecular structural features with biological activity.

Reinforcement Learning (RL): A machine learning paradigm where an agent learns a policy by receiving reward signals for desired outcomes, used in molecular generation to steer generative models toward favorable ADMET profiles.

SHAP (SHapley Additive exPlanations): An XAI technique attributing model predictions to individual input features using game-theoretic Shapley values.

SMILES: Simplified Molecular Input Line Entry System, a text-based molecular structure representation format.

SoM (Site of Metabolism): The atom or bond in a molecule at which a metabolic transformation, most commonly CYP-mediated oxidation, occurs.

OECD QSAR Framework: The Organisation for Economic Co-operation and Development’s five principles for validating QSAR models for regulatory use, which include a defined endpoint and a defined domain of applicability.

VAE (Variational Autoencoder): A generative model that encodes molecules into a continuous latent space and decodes latent vectors back to molecular structures, enabling gradient-based optimization.

vHTS (Virtual High-Throughput Screening): Computational screening of large compound libraries against predictive ADMET models or target binding models before physical synthesis.

XAI (Explainable Artificial Intelligence): A set of techniques that provide interpretable explanations for machine learning model predictions.

This analysis was produced for pharma and biotech IP teams, portfolio managers, and R&D decision-makers. Data sources include published peer-reviewed literature, regulatory agency publications (FDA, EMA), market research from NextMSC, Mordor Intelligence, and Grand View Research, and patent database analysis. All market size figures are cited to primary sources; forward-looking statements reflect analyst consensus.