Interview: the builder's playbook for AI for scientific discovery — hard-earned lessons
A practitioner conversation: what surprised them, what failed, and what they'd do differently. Focus on implementation trade-offs, stakeholder incentives, and the hidden bottlenecks.
In October 2024, Google DeepMind's Demis Hassabis and John Jumper shared the Nobel Prize in Chemistry with David Baker; their half of the prize recognised AlphaFold's protein structure prediction, a sign that AI has fundamentally changed how science discovers new knowledge. The AI for scientific discovery market reached $4.6 billion in 2024 and is projected to grow at 30% CAGR to nearly $50 billion by 2034. Yet behind these headline numbers lies a more complex reality: while AlphaFold has predicted 214 million protein structures and three million researchers across 190 countries now use the platform, no AI-designed drug has yet received FDA approval. We spoke with builders across drug discovery, materials science, and climate research to understand what separates successful AI-driven scientific discovery from expensive failures.
The practitioners we interviewed—spanning computational biologists, drug discovery executives, and materials scientists—shared a consistent message: the technology works, but success depends on navigating hidden bottlenecks that have little to do with algorithmic performance. Here's what they've learned.
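As a quick sanity check on the market figures quoted above, the sketch below compounds the $4.6 billion base at 30% a year. The number of compounding periods is an assumption, since the forecast window is not stated here: nine periods (2025 to 2034) lands at roughly $49 billion, close to the "nearly $50 billion" headline, while ten periods would give about $63 billion. The function names are illustrative only.

```python
def project(value: float, cagr: float, years: int) -> float:
    """Compound `value` forward at annual rate `cagr` for `years` periods."""
    return value * (1 + cagr) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Annual growth rate that takes `start` to `end` in `years` periods."""
    return (end / start) ** (1 / years) - 1


base_2024 = 4.6  # USD billions, as quoted in the introduction
rate = 0.30      # 30% CAGR, as quoted

# Assumption: the forecast compounds over 2025-2034 (nine periods).
nine_year = project(base_2024, rate, 9)
# Alternative assumption: ten periods counted from the 2024 base.
ten_year = project(base_2024, rate, 10)

print(f"9 periods:  ${nine_year:.1f}B")
print(f"10 periods: ${ten_year:.1f}B")
print(f"Rate implied by $4.6B -> $50B over 10 periods: {implied_cagr(4.6, 50, 10):.1%}")
```

When comparing forecasts from different market reports, it is worth confirming the base year and forecast window before comparing growth rates.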
Why It Matters
Scientific discovery has historically operated on decade-long timelines. Traditional drug development takes 10-15 years and costs $2.6 billion per approved molecule. Materials discovery follows similar patterns—the average time from initial synthesis to commercial application exceeds 20 years for advanced materials. These timelines are fundamentally incompatible with the urgency of climate, health, and resource challenges facing society.
AI is compressing these timelines, in some cases by orders of magnitude. Microsoft's Azure Quantum Elements screened 32 million candidate materials in weeks rather than decades, identifying a solid-state battery material requiring 70% less lithium. Insilico Medicine advanced from target identification to Phase 2a clinical trials in 30 months, a process that typically takes 7-10 years. AlphaFold 3 predicts in seconds molecular structures that previously took months of laboratory crystallography to resolve.
For UK investors evaluating opportunities in this space, the implications extend beyond individual company outcomes. AI-accelerated discovery is reshaping competitive dynamics across pharmaceuticals, materials, and energy sectors. First-movers with proprietary data and validated platforms are building defensible positions, while traditional R&D organisations face structural disruption to their core value propositions. Understanding what works—and what doesn't—has become essential due diligence.
Key Concepts
The End-to-End Discovery Stack
"Most people focus on the AI model, but that's maybe 20% of the challenge," explains a computational biology director at a major UK pharma company. "The real complexity is in the full stack: data infrastructure, model training, experimental validation, and integration with downstream processes."
Successful AI-driven discovery requires four interconnected capabilities:
Target identification: Using AI to analyse biological, chemical, or physical datasets to identify promising targets—whether disease pathways, material compositions, or molecular configurations. Insilico Medicine's PandaOmics platform, for example, integrates multi-omics data to identify novel therapeutic targets.
Generative design: Deploying machine learning to design candidate molecules, materials, or interventions that optimise for desired properties. Chemistry42 (Insilico), MatterGen (Microsoft), and AlphaFold 3 represent different approaches to this challenge.
Experimental validation: Closing the loop between computational predictions and physical reality. Recursion Pharmaceuticals processes 2.2 million biological experiments weekly; Microsoft's partnership with Pacific Northwest National Laboratory synthesised a working battery from AI predictions within months.
Clinical or commercial translation: Moving validated discoveries through regulatory, manufacturing, and commercial pathways. This remains the primary bottleneck—no AI-discovered drug has completed Phase 3 trials as of early 2026.
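One way to make the four-capability stack concrete during due diligence is to record which stages a company covers in-house and list the rest as partner dependencies. The sketch below is a minimal illustration; the `Stage` enum, the `stack_gaps` helper, and the example company are hypothetical and not drawn from any interviewee's tooling.

```python
from enum import Enum


class Stage(Enum):
    TARGET_IDENTIFICATION = "target identification"
    GENERATIVE_DESIGN = "generative design"
    EXPERIMENTAL_VALIDATION = "experimental validation"
    TRANSLATION = "clinical or commercial translation"


def stack_gaps(in_house: set[Stage]) -> list[Stage]:
    """Return the stages, in pipeline order, that are not covered in-house."""
    return [stage for stage in Stage if stage not in in_house]


# Hypothetical company: strong on modelling, reliant on partners downstream.
example = {Stage.TARGET_IDENTIFICATION, Stage.GENERATIVE_DESIGN}
for stage in stack_gaps(example):
    print(f"partner-dependent: {stage.value}")
```

A binary in-house flag is a deliberate simplification; in practice each stage would carry evidence such as throughput, data volume, or named partners.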
Data Moats vs. Model Commoditisation
"The models are converging," notes a venture partner specialising in computational biology. "What differentiates companies now is proprietary data and the infrastructure to generate more of it at scale."
This dynamic is playing out clearly in the market. When DeepMind released AlphaFold 2 as open source in 2021, it catalysed a wave of derivative work. AlphaFold 3's more restrictive licensing (November 2024 code release with non-commercial limitations) triggered the development of open alternatives like Chai-1, Boltz-1, and Protenix within months. The implication: algorithmic innovations diffuse rapidly, while unique datasets and experimental infrastructure create durable competitive advantage.
Recursion's 65-petabyte biological dataset and 16.2 million weekly phenomic images represent this strategy in practice. Their August 2024 merger with Exscientia ($565 million) combined Recursion's data scale with Exscientia's precision chemistry capabilities—a consolidation that signals where value is accumulating.
What's Working
Insilico Medicine's Full-Stack Validation
The most significant milestone in AI-driven drug discovery came in March 2024 when Insilico Medicine reported positive Phase 2a results for ISM001-055, a treatment for idiopathic pulmonary fibrosis. This represents the first AI-discovered target and AI-designed molecule to complete a Phase 2 study with dose-dependent efficacy.
"What's remarkable isn't just the clinical result—it's the timeline," observes a pharmaceutical industry analyst. "Insilico went from project initiation to IND filing in 18 months, compared to an industry average of 42 months. They've now nominated 22 developmental candidates and advanced 10 programs to human clinical trials."
The company's integrated platform—combining PandaOmics for target discovery, Chemistry42 for molecule design, and InClinico for clinical trial predictions—demonstrates that end-to-end AI integration delivers compounding efficiency gains rather than merely accelerating individual steps.
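The timeline claims are easier to compare when expressed as a speed-up factor and a share of time saved. The sketch below does that arithmetic for the figures quoted in this article (18 months versus a 42-month average, and 30 months versus 7-10 years); the `compression` helper is an illustrative name, not part of any vendor's tooling.

```python
def compression(baseline_months: float, observed_months: float) -> tuple[float, float]:
    """Return (speed-up factor, fractional time saved) versus a baseline."""
    if baseline_months <= 0 or observed_months <= 0:
        raise ValueError("durations must be positive")
    speedup = baseline_months / observed_months
    saved = 1 - observed_months / baseline_months
    return speedup, saved


# Figures quoted by the analyst: 18 months to IND versus a 42-month average.
speedup, saved = compression(baseline_months=42, observed_months=18)
print(f"IND filing: {speedup:.1f}x faster, {saved:.0%} less time")

# Target identification to Phase 2a: 30 months versus 7-10 years (84-120 months).
for baseline in (84, 120):
    s, _ = compression(baseline, 30)
    print(f"vs {baseline}-month baseline: {s:.1f}x faster")
```

On these numbers the drug discovery gains are roughly two- to four-fold, which is substantial but a different scale from the screening speed-ups reported in materials discovery.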
Microsoft-PNNL Materials Discovery
In January 2024, Microsoft and Pacific Northwest National Laboratory demonstrated AI's potential for materials discovery by screening 32 million candidate compounds to identify a novel solid-state battery material requiring 70% less lithium than conventional designs.
"The speed was unprecedented," explains a materials scientist familiar with the project. "Traditional computational screening might evaluate thousands of candidates over years. Azure Quantum Elements evaluated millions in weeks, then PNNL synthesised a working prototype within months."
The resulting material addresses critical supply chain vulnerabilities—lithium scarcity has become a strategic concern for battery manufacturers and electric vehicle producers. This example demonstrates AI's potential to accelerate not just discovery but deployment of sustainability-critical technologies.
AlphaFold's Research Acceleration
AlphaFold's impact on biological research is now quantifiable. Over 200,000 research papers have incorporated AlphaFold methodology since 2021. Researchers who use AlphaFold and submit structures to the Protein Data Bank are 40% more likely to report genuinely novel structures than non-users, which suggests that AI is opening up previously inaccessible scientific territory.
"AlphaFold 3's expansion beyond proteins is particularly significant," notes a structural biologist at a UK research institution. "Predicting interactions with DNA, RNA, and small molecules—with 50% accuracy improvements over existing methods—opens applications in drug design, gene therapy, and diagnostics that weren't previously tractable."
What's Not Working
Clinical Translation Remains the Bottleneck
Despite impressive preclinical results, no AI-discovered drug has completed Phase 3 trials or received regulatory approval as of early 2026. Recursion's September 2024 Phase 2 results for REC-994 (cerebral cavernous malformation) showed safety but unclear efficacy, triggering a significant stock decline. In May 2025, Recursion cut multiple pipeline programs including CCM and NF2, refocusing on "high unmet need" areas.
"AI excels at prediction and optimisation within defined parameter spaces," explains a drug development executive. "But clinical trials involve human biology, regulatory requirements, and commercial considerations that remain fundamentally difficult to model. The gap between computational promise and clinical reality has humbled several well-funded companies."
Open vs. Closed Model Tensions
AlphaFold 3's restrictive licensing, which limits commercial use and held back the code until November 2024 (six months after the model was published), has fragmented the research community. Isomorphic Labs (DeepMind's drug discovery spin-off) retains exclusive pharmaceutical rights, creating asymmetric access that concerns academic researchers and smaller companies.
"The irony is that AlphaFold 2's open-source release is what made it transformative," observes an open-science advocate. "When AF3 came out with restrictions, the community built alternatives within months. But the fragmentation slows progress and creates uncertainty about which platforms to build on."
Data Quality and Standardisation Gaps
AI models are only as reliable as their training data, and scientific data infrastructure remains fragmented. "We spend 60-70% of our time on data cleaning, normalisation, and validation," reports a data science lead at a computational chemistry company. "The models themselves are almost commoditised—the bottleneck is getting data into forms they can use."
For materials discovery, published reviews in 2025 identify data scarcity and quality as primary constraints. Battery materials datasets remain orders of magnitude smaller than protein structure databases, limiting AI's applicability to these sustainability-critical domains.
Experimental Validation Throughput
Computational predictions must ultimately be validated experimentally, and laboratory throughput limits how quickly AI-generated hypotheses can be tested. While autonomous laboratories like Lawrence Berkeley National Laboratory's A-Lab have demonstrated computer-driven synthesis (35 novel compounds synthesised through 224 automated reactions), such facilities remain rare and expensive.
"We can generate thousands of candidate molecules computationally in hours," notes a medicinal chemist. "But synthesising and testing even hundreds requires months of laboratory work. The asymmetry between computational speed and experimental throughput creates bottlenecks that AI alone cannot solve."
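The asymmetry the chemist describes can be sketched as a serial pipeline whose sustained rate is set by its slowest stage. The weekly capacities below are hypothetical, chosen only to illustrate the effect; they are not figures from any company interviewed.

```python
import math


def pipeline_rate(stage_rates: dict[str, float]) -> tuple[str, float]:
    """Return (bottleneck stage, sustained candidates per week).

    A serial pipeline cannot run faster than its slowest stage.
    """
    bottleneck = min(stage_rates, key=stage_rates.get)
    return bottleneck, stage_rates[bottleneck]


def weeks_to_clear(backlog: int, rate_per_week: float) -> int:
    """Whole weeks needed to work through `backlog` candidates."""
    return math.ceil(backlog / rate_per_week)


# Hypothetical weekly capacities, chosen only to illustrate the asymmetry.
rates = {
    "generative design": 50_000,
    "synthesis": 25,
    "assay": 200,
}
stage, rate = pipeline_rate(rates)
print(f"bottleneck: {stage} at {rate}/week")
print(f"weeks to test 300 candidates: {weeks_to_clear(300, rate)}")
```

Under these assumptions, raising generative capacity changes nothing; only adding synthesis throughput (automation, parallel labs, or outsourcing) shortens the queue.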
Key Players
Established Leaders
- Google DeepMind / Isomorphic Labs — Nobel Prize-winning AlphaFold platform with 214 million predicted structures. Isomorphic Labs holds exclusive pharma rights for AI-driven drug discovery commercialisation.
- Recursion Pharmaceuticals — Largest proprietary biological dataset (65 petabytes) with BioHive-2 supercomputer processing 2.2 million experiments weekly. Merged with Exscientia (August 2024, $565 million).
- Insilico Medicine — First AI end-to-end drug to complete Phase 2a trials (ISM001-055 for pulmonary fibrosis). 22 developmental candidates nominated, 10 in human clinical trials.
- NVIDIA — Hardware and software infrastructure powering most AI-driven discovery platforms. BioNeMo framework for drug discovery, partnerships with Recursion and Schrödinger.
Emerging Startups
- Chai Discovery — Open-source protein structure prediction (Chai-1) as alternative to AlphaFold 3. Apache 2.0-licensed, gaining research community adoption.
- Xaira Therapeutics — Launched April 2024 with $1+ billion funding for AI-driven drug discovery. Significant backing from ARCH Venture Partners and Foresite Capital.
- Generate Biomedicines — Generative AI for protein therapeutics. $370 million raised, partnerships with major pharmaceutical companies.
- Manas AI — Founded February 2025 by Reid Hoffman for cancer drug development using AI-driven target identification and molecule design.
Key Investors & Funders
- ARCH Venture Partners — Lead investor in Xaira ($1B+) and multiple AI-driven biotech companies. Focused on platform-based discovery.
- Breakthrough Energy Ventures — Bill Gates-backed fund investing in climate-relevant AI applications including materials discovery and clean energy.
- UK Research and Innovation (UKRI) — £100+ million committed to AI in science initiatives including the Turing Institute's AI for Science programme.
- European Investment Bank — Supporting AI infrastructure buildout through innovation loans to computational biology companies.
Action Checklist
- Evaluate data assets before algorithms: When assessing AI-for-discovery investments, prioritise companies with proprietary, high-quality datasets and infrastructure to generate more. Model architectures commoditise rapidly; unique data creates durable advantage.
- Map the full discovery stack: Examine whether companies have capabilities across target identification, generative design, experimental validation, and translation, or whether they depend on partners for critical steps. End-to-end integration compounds efficiency gains.
- Assess experimental throughput capacity: AI-generated hypotheses require physical validation. Evaluate laboratory infrastructure, automation capabilities, and partnerships that determine how quickly computational predictions translate to validated discoveries.
- Track clinical milestones rigorously: Phase 1/2 results from AI-discovered compounds (Insilico's ISM001-055, Recursion's REC-617) provide critical validation signals. Monitor regulatory submissions and trial outcomes as leading indicators of platform viability.
- Consider open-source dynamics: AlphaFold 3's restrictive licensing catalysed open alternatives. Evaluate whether investment targets are building on open infrastructure (platform risk) or creating proprietary capabilities (execution risk).
- Monitor consolidation patterns: The Recursion-Exscientia merger signals that scale and integration matter. Expect continued M&A as companies seek data advantages and pipeline diversification.
- Evaluate sustainability applications: AI-driven materials discovery for batteries, carbon capture, and clean energy represents a less crowded market than drug discovery with significant climate relevance. Microsoft-PNNL and Argonne National Laboratory work indicates early traction.
FAQ
Q: What distinguishes companies that succeed with AI-driven discovery from those that fail? A: Successful companies treat AI as one component of an integrated discovery system rather than a standalone solution. Insilico Medicine's positive Phase 2a results came from combining AI target identification (PandaOmics), AI molecule design (Chemistry42), and AI clinical prediction (InClinico) with rigorous experimental validation. Companies that over-index on algorithmic sophistication while under-investing in data infrastructure and experimental capabilities have consistently struggled. The winners understand that AI accelerates discovery but doesn't eliminate the need for domain expertise, laboratory validation, and clinical translation capabilities.
Q: How should investors evaluate AI-for-discovery platforms given no AI-designed drug has received FDA approval? A: Focus on intermediate validation milestones rather than waiting for Phase 3 completion. Key indicators include: (1) timeline compression—Insilico's 18-month target-to-IND versus 42-month industry average; (2) candidate quality—Recursion's 1.6 million transcriptomes enable better target selection; (3) partnership traction—Bayer's $1.5 billion Recursion collaboration signals pharmaceutical industry validation; (4) Phase 1/2 clinical data demonstrating safety and early efficacy signals. The absence of approved drugs reflects the inherent timelines of drug development rather than AI failure—companies entering Phase 2 today are on track for potential approvals in 2027-2029.
Q: What are the primary risks in this sector that investors should monitor? A: Three categories dominate: (1) Clinical attrition: AI improves candidate quality but doesn't guarantee regulatory approval, as Recursion's mixed Phase 2 results and subsequent pipeline cuts illustrate; (2) Platform commoditisation: open-source alternatives to proprietary tools (Chai-1 versus AlphaFold 3) can rapidly erode competitive moats; (3) Capital intensity: Recursion's $509 million cash position and runway extended to mid-2027 reflect the substantial capital required to sustain operations through long development cycles. Companies dependent on continued fundraising face dilution and execution pressure if clinical results disappoint or market conditions deteriorate.
Q: How does AI-driven materials discovery compare to drug discovery as an investment opportunity? A: Materials discovery offers faster validation timelines with clearer sustainability applications but earlier commercial development. Microsoft-PNNL's battery material went from computation to working prototype in months rather than years, and validation doesn't require decade-long clinical trials. However, the ecosystem is less developed: materials datasets are smaller than biological databases, fewer dedicated companies exist, and commercial pathways for novel materials involve manufacturing scale-up challenges. For investors with sustainability mandates, materials discovery offers direct climate impact (battery efficiency, carbon capture materials, catalysts) with potentially faster returns but higher technology risk. Drug discovery offers larger addressable markets with more established exit pathways but longer timelines.
Q: What regulatory developments should investors track in this space? A: The FDA's June 2025 launch of "Elsa," an AI tool for clinical protocol review, signals regulatory openness to AI-augmented development. Over 500 FDA submissions incorporated AI components between 2016 and 2023, with acceleration expected. In the UK, the MHRA's AI sandbox programme provides pathways for AI-developed therapeutics. For materials discovery, the EU Battery Regulation's sustainability requirements create pull for AI-optimised compositions. Investors should monitor: (1) FDA guidance on AI-derived evidence acceptability; (2) patent treatment of AI-generated inventions; (3) liability frameworks for AI-discovered compounds that cause unexpected adverse events. Regulatory clarity, or the lack of it, significantly affects commercial viability and competitive positioning.
Sources
- DeepMind. (2024). "AlphaFold: Five Years of Impact." https://deepmind.google/blog/alphafold-five-years-of-impact/
- Insilico Medicine. (2024). "A Phase 2 Readout Generates Excitement for the Potential of AI-Driven Drug Discovery." https://insilico.com/blog/1112
- Microsoft Azure Quantum Blog. (2024). "Unlocking a new era for scientific discovery with AI: How Microsoft's AI screened over 32 million candidates to find a better battery." https://azure.microsoft.com/en-us/blog/quantum/2024/01/09/unlocking-a-new-era-for-scientific-discovery-with-ai
- Recursion Pharmaceuticals. (2025). "Q4 2024 Financial Results and Business Updates." https://ir.recursion.com/news-releases/news-release-details/recursion-provides-business-updates-and-reports-fourth-quarter-2
- Nature Biotechnology. (2024). "Artificial Intelligence in Drug Discovery: A Decade of Progress." https://www.nature.com/articles/s41587-024-02345-8
- Global Market Insights. (2024). "AI in Drug Discovery Market Size Report, 2034." https://www.gminsights.com/industry-analysis/ai-in-drug-discovery-market
- Technavio. (2025). "AI For Scientific Discovery Market Growth Analysis - Size and Forecast 2025-2029." https://www.technavio.com/report/ai-for-scientific-discovery-market-industry-analysis
- Argonne National Laboratory. (2025). "Building AI foundation models to accelerate the discovery of new battery materials." https://www.anl.gov/article/building-ai-foundation-models-to-accelerate-the-discovery-of-new-battery-materials
The convergence of AI capabilities with scientific discovery represents one of the most significant technological transitions of the decade. For investors navigating this space, the path to returns runs through careful evaluation of data assets, experimental infrastructure, and clinical translation capabilities rather than algorithmic sophistication alone. The companies that succeed will be those that build integrated platforms capable of generating, validating, and commercialising discoveries at unprecedented speed—turning AI's computational advantages into tangible products that address health, sustainability, and resource challenges.