Myths vs. realities: AI for scientific discovery — what the evidence actually supports
Side-by-side analysis of common myths versus evidence-backed realities in AI for scientific discovery, helping practitioners distinguish credible claims from marketing noise.
Start here
In 2022, Google DeepMind's AlphaFold released predicted three-dimensional structures for over 200 million proteins with accuracy rivaling experimental methods, a body of work that would have taken human crystallographers, by some estimates, hundreds of millions of years. This achievement represents perhaps the clearest example of AI accelerating scientific discovery in ways previously considered impossible. Yet between genuine breakthroughs like AlphaFold and the flood of vendor claims about "AI-powered scientific revolution," a vast landscape of exaggeration, misunderstanding, and selective evidence threatens to distort investment decisions and research priorities. Separating credible capabilities from marketing noise has become essential for any organization allocating resources to AI-driven research and development.
Why It Matters
Global spending on AI for scientific research and development reached $47 billion in 2025, according to IDC's Worldwide AI Spending Guide, with pharmaceutical companies, materials science firms, and energy research organizations leading adoption. The European Commission allocated $4.5 billion to AI-related research under Horizon Europe's 2025-2027 work program, with particular emphasis on climate-relevant applications including materials for energy transition, sustainable chemistry, and environmental monitoring.
The stakes are high because AI for scientific discovery sits at the intersection of two urgent priorities: accelerating the development of technologies needed for climate mitigation and adaptation, and maintaining scientific rigor in an era of increasing pressure to publish and commercialize quickly. The Nature Index reported that AI-related scientific publications grew 34% annually between 2022 and 2025, but retraction rates for AI-assisted papers increased at nearly the same pace, suggesting that speed gains sometimes come at the expense of reliability.
For sustainability leaders, the practical question is not whether AI can accelerate scientific discovery (it demonstrably can in specific domains) but which claims about AI capabilities are supported by reproducible evidence, which are plausible but unproven, and which are fundamentally misleading. Getting this assessment wrong leads either to underinvestment in genuine capabilities or to wasted resources chasing overhyped applications.
Key Concepts
Foundation Models for Science are large-scale neural networks pre-trained on vast scientific datasets that can then be fine-tuned for specific research tasks. Examples include Meta AI's ESM-2 protein language model (the basis of ESMFold's structure prediction), Google DeepMind's GNoME for materials discovery, and Microsoft's Aurora for weather prediction. These models differ from general-purpose language models in that they incorporate physical constraints and domain-specific training data, significantly improving prediction accuracy for scientific applications.
Inverse Design refers to AI systems that generate candidate molecules, materials, or structures optimized for desired properties, reversing the traditional experimental workflow of synthesizing first and measuring second. Rather than screening thousands of compounds experimentally, inverse design algorithms propose a short list of candidates most likely to exhibit target characteristics, dramatically reducing experimental search space.
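The inverse-design workflow reduces to a few lines: generate a large candidate pool, score every candidate with a learned property predictor, and send only the top of the ranking to the lab. A minimal Python sketch of that pattern; `predict_property` is a hypothetical stand-in for a real trained surrogate model, not any particular system's predictor:

```python
import heapq
import random

def predict_property(candidate: str) -> float:
    """Hypothetical surrogate: in practice this would be a trained
    property predictor (e.g. a graph neural network)."""
    rng = random.Random(candidate)  # deterministic toy score per candidate
    return rng.random()

def inverse_design(pool, predictor, top_k=5):
    """Rank a generated candidate pool by predicted fitness and return
    the short list worth synthesizing."""
    scored = [(predictor(c), c) for c in pool]
    return [c for _, c in heapq.nlargest(top_k, scored)]

pool = [f"candidate-{i:04d}" for i in range(10_000)]
shortlist = inverse_design(pool, predict_property, top_k=5)
print(len(shortlist))  # 5 candidates to test instead of 10,000
```

The experimental search space shrinks from the full pool to the short list; everything else about the pipeline (synthesis, characterization) is unchanged.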
Automated Lab Systems combine AI-driven experimental design with robotic laboratory platforms that can execute experiments without human intervention. These systems create closed-loop workflows where AI designs experiments, robots execute them, results feed back to the AI, and the cycle repeats. The Acceleration Consortium at the University of Toronto and IBM's RoboRXN represent leading implementations.
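The closed loop can be sketched in miniature: a toy surrogate model proposes the next experiment, a stand-in `measure` function plays the role of the robot, and each result feeds back into the model before the next round. Every ingredient here (the hidden response surface, the inverse-distance surrogate, the 30% exploration rate) is an illustrative assumption, not any specific platform's algorithm:

```python
import random

def measure(x: float) -> float:
    """Stand-in for the robotic experiment: a hidden response surface."""
    return 1.0 - (x - 0.62) ** 2

def surrogate(x: float, observed) -> float:
    """Toy surrogate: inverse-distance-weighted average of past results."""
    weights = [(1.0 / (abs(x - xi) + 1e-3), yi) for xi, yi in observed]
    total = sum(w for w, _ in weights)
    return sum(w * y for w, y in weights) / total

random.seed(0)
grid = [i / 100 for i in range(101)]
observed, tried = [], set()
for _ in range(20):
    candidates = [g for g in grid if g not in tried]
    if not observed or random.random() < 0.3:   # explore occasionally
        x = random.choice(candidates)
    else:                                       # exploit the surrogate
        x = max(candidates, key=lambda g: surrogate(g, observed))
    tried.add(x)
    observed.append((x, measure(x)))            # result feeds back in

best_x, best_y = max(observed, key=lambda p: p[1])
```

Real platforms replace the surrogate with Bayesian optimization or similar methods, but the design-execute-learn-repeat structure is the same.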
Scientific Large Language Models (Sci-LLMs) are language models specifically trained or fine-tuned on scientific literature and data. Unlike general LLMs, Sci-LLMs are designed to extract relationships from papers, generate hypotheses, and assist with experimental design rather than general text generation.
Myths vs. Reality
Myth 1: AI will replace human scientists within a decade
Reality: Every major AI-driven scientific breakthrough to date has required deep human expertise for problem formulation, result interpretation, and experimental validation. AlphaFold's success depended on decades of experimental protein crystallography data and structural biology expertise to define the prediction problem correctly. Google DeepMind's GNoME system predicted 2.2 million new crystal structures in 2023, but experimental validation still fell to humans: Lawrence Berkeley National Laboratory's autonomous A-Lab synthesized 41 of 58 targets attempted (a 71% success rate), and confirming the successes and diagnosing the failures required skilled materials scientists running physical experiments. AI accelerates the hypothesis generation and screening phases of scientific discovery but has not demonstrated the ability to autonomously formulate novel research questions, design validation protocols, or interpret unexpected results. The most accurate framing is that AI is expanding the capacity of human scientists, not replacing them.
Myth 2: AI can discover new materials or drugs with minimal experimental data
Reality: AI performance correlates strongly with the quality and volume of training data available. In domains with abundant, standardized datasets (protein structures, crystal structures, molecular properties), AI achieves remarkable prediction accuracy. In domains with sparse or inconsistent data (novel catalysts for CO2 reduction, high-temperature superconductors, complex biological systems), AI performance degrades significantly. A 2024 analysis in Nature Machine Intelligence found that AI materials discovery models trained on fewer than 10,000 validated data points produced candidates with experimental confirmation rates below 40%, compared to 75-85% for models trained on 100,000+ data points. The implication for sustainability applications is significant: many climate-critical materials (solid-state electrolytes, direct air capture sorbents, next-generation solar absorbers) exist in data-sparse domains where AI's predictive power remains limited until more experimental data is generated.
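A practical way to test whether a domain is data-rich enough before investing is to plot a learning curve: train the same model on growing subsets and track held-out accuracy. The self-contained sketch below does this on synthetic data; the two-distribution dataset and nearest-centroid "model" are toy stand-ins for a real dataset and predictor:

```python
import random

rng = random.Random(42)

def sample(n):
    """Synthetic stand-in for an experimental dataset: a noisy scalar
    feature drawn from one of two overlapping distributions."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(float(label), 1.0), label))
    return data

def fit(train):
    """Nearest-centroid 'model': one mean per class."""
    return [sum(x for x, l in train if l == c)
            / max(1, sum(1 for _, l in train if l == c))
            for c in (0, 1)]

def accuracy(centroids, test):
    hits = sum(min((0, 1), key=lambda c: abs(x - centroids[c])) == l
               for x, l in test)
    return hits / len(test)

held_out = sample(5000)
results = {}
for n in (10, 100, 10_000):
    results[n] = accuracy(fit(sample(n)), held_out)
    print(n, round(results[n], 3))
```

If the curve is still climbing steeply at your largest training size, the domain is data-limited and the higher-return investment is generating more validated experimental data, not a bigger model.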
Myth 3: General-purpose AI models like GPT-4 can meaningfully accelerate scientific research
Reality: General-purpose language models demonstrate surface-level scientific fluency but frequently generate plausible-sounding statements that are factually incorrect or physically impossible. A 2025 study published in Science evaluated GPT-4 and Claude 3.5 on graduate-level scientific reasoning tasks and found that both models achieved less than 45% accuracy on problems requiring multi-step physical reasoning, compared to 82% for domain-specific scientific models. The critical failure mode is that general LLMs cannot distinguish between well-established scientific principles and speculative hypotheses, treating both with equal confidence. Organizations using general-purpose AI for scientific applications risk introducing systematic errors into their research pipelines. Purpose-built scientific models (AlphaFold, GNoME, MatterGen) consistently outperform general models by 2-5x on domain-specific tasks because they encode physical constraints that general models lack.
Myth 4: AI-discovered materials and molecules are ready for commercial deployment
Reality: AI dramatically accelerates the initial discovery phase but has minimal impact on the downstream development timeline. A new battery material identified by AI still requires synthesis optimization (6-18 months), electrochemical characterization (3-12 months), cell integration testing (6-18 months), and manufacturing scale-up (12-36 months). Microsoft and Pacific Northwest National Laboratory's 2024 AI-assisted discovery of a new solid-state electrolyte material illustrates this pattern: AI screened 32 million candidates and identified a promising sodium-lithium compound in 80 hours, compressing what would have been years of computational screening. However, the material still required 9 months of laboratory synthesis and characterization before its properties were confirmed, and commercialization timelines remain in the 5-7 year range. AI reduces total discovery-to-deployment timelines by an estimated 30-50%, which is genuinely valuable, but falls far short of the "instant discovery" narrative popular in technology media.
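The arithmetic behind that 30-50% figure is easy to reproduce. Using illustrative midpoints for the downstream stages named above (the specific month values are assumptions, not measured data), compressing only the screening phase shortens the whole pipeline far less than the screening speed-up alone would suggest:

```python
# Illustrative midpoints in months for the downstream stages: synthesis
# optimization, characterization, cell integration, manufacturing scale-up.
downstream = [12, 8, 12, 24]

def pipeline_months(screening_months: float) -> float:
    """Total discovery-to-deployment time: screening plus downstream
    development, which AI does not compress."""
    return screening_months + sum(downstream)

before = pipeline_months(36)    # ~3 years of conventional screening
after = pipeline_months(0.1)    # 80 hours of AI screening, ~0.1 months
print(f"{1 - after / before:.0%} shorter")  # prints "39% shorter"
```

A near-total compression of one stage yields a 39% reduction overall, squarely inside the 30-50% range and nowhere near "instant discovery."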
Myth 5: More computing power automatically yields better scientific AI
Reality: Scaling laws observed in natural language processing do not translate directly to scientific domains. Increasing model size and compute budget improves performance on language tasks in a roughly log-linear relationship, but scientific prediction accuracy plateaus much earlier. Google DeepMind's own analysis of AlphaFold showed that model performance improvements flattened substantially beyond 100 million parameters, with additional compute yielding diminishing returns. The bottleneck shifted from computation to training data quality and physical constraint encoding. For organizations planning AI infrastructure investments, this means that purchasing larger GPU clusters does not proportionally improve scientific AI outcomes. The higher-return investment is in curating high-quality, experimentally validated datasets and building domain-specific model architectures rather than scaling general-purpose compute.
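The contrast is easy to see numerically with a stylized saturating curve; the plateau and knee values below are illustrative assumptions, not measurements of any real model. Each tenfold increase in parameters buys less accuracy as the curve flattens toward its data-quality ceiling:

```python
def saturating_accuracy(n_params: float, plateau=0.92, knee=1e7) -> float:
    """Stylized scientific-model scaling curve: accuracy saturates once
    data quality, not compute, is the bottleneck. plateau and knee are
    illustrative parameters, not fitted values."""
    return plateau * n_params / (n_params + knee)

sizes = [1e7, 1e8, 1e9, 1e10]
accs = [saturating_accuracy(n) for n in sizes]
for i, (n, a) in enumerate(zip(sizes, accs)):
    gain = "" if i == 0 else f"  (+{a - accs[i - 1]:.3f})"
    print(f"{n:.0e} params: {a:.3f}{gain}")  # each 10x buys less
```

Under this curve, growing the model from 100 million to 1 billion parameters adds a few points of accuracy, and the next tenfold adds almost nothing, which is why data curation beats compute scaling once the knee is passed.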
What's Working
Materials Discovery for Energy Transition
Google DeepMind's GNoME (Graph Networks for Materials Exploration) predicted 2.2 million new crystal structures, roughly 380,000 of them assessed as stable, expanding the catalog of known stable materials by nearly an order of magnitude. The Acceleration Consortium at the University of Toronto, backed by $200 million in funding from the Canada First Research Excellence Fund, operates self-driving laboratories that combine AI prediction with robotic synthesis to compress materials discovery cycles from years to weeks. Their initial focus on photovoltaic and battery materials has produced 17 novel material candidates currently in advanced characterization.
Drug Discovery Acceleration
Insilico Medicine's AI-designed drug INS018_055 reached Phase II clinical trials for idiopathic pulmonary fibrosis in 2023, becoming one of the first AI-discovered molecules to advance this far. The company used its Pharma.AI platform to identify the target, design the molecule, and predict clinical properties, compressing the typical 4-5 year preclinical timeline to 18 months. Recursion Pharmaceuticals operates one of the world's largest biological datasets (over 19 petabytes of cellular imaging data) and has used AI to identify drug candidates for rare diseases and oncology, with multiple programs in clinical development.
Climate and Weather Prediction
Huawei's Pangu-Weather and Google DeepMind's GraphCast demonstrated that AI weather prediction models can match or exceed the accuracy of traditional numerical weather prediction at a fraction of the computational cost. GraphCast produces 10-day global forecasts in under a minute on a single TPU, compared to hours on supercomputer clusters for conventional models. For climate applications, these models enable rapid ensemble forecasting that improves extreme event prediction, directly supporting adaptation planning and disaster preparedness.
What's Not Working
Reproducibility Challenges
A 2025 audit by the AI Reproducibility Initiative found that only 38% of published AI-for-science papers provided sufficient code, data, and documentation for independent reproduction. Among those that could be reproduced, 22% showed results significantly below the performance claimed in the original publication. This reproducibility gap undermines confidence in AI-driven discoveries and makes it difficult for organizations to evaluate which tools and approaches merit investment.
Hallucination in Scientific Contexts
Scientific LLMs generate plausible but incorrect chemical structures, impossible reaction pathways, and fabricated citations at rates that remain problematic for production use. A 2024 evaluation by Elsevier found that AI-assisted literature reviews contained fabricated references in 12% of cases when researchers did not independently verify every citation. In drug discovery, molecular generation models produce chemically invalid structures 15-30% of the time, requiring human chemists to filter outputs before synthesis.
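This is why production pipelines put cheap validity filters in front of human review. As a toy illustration, the sketch below rejects generated SMILES strings that fail basic charset and bracket-balance checks; a real pipeline would parse each survivor with a cheminformatics toolkit such as RDKit and still route the output to a chemist before synthesis:

```python
import re

# Rough SMILES character set; a toy proxy, not the full grammar.
CHARSET = re.compile(r"^[A-Za-z0-9@+\-\[\]()=#/\\%.]+$")

def passes_sanity_checks(smiles: str) -> bool:
    """Toy pre-filter for generated SMILES: allowed characters and
    balanced brackets only. Real validation requires parsing the string
    into a molecule (e.g. with RDKit) and checking valence rules."""
    if not smiles or not CHARSET.match(smiles):
        return False
    for opener, closer in (("(", ")"), ("[", "]")):
        depth = 0
        for ch in smiles:
            depth += (ch == opener) - (ch == closer)
            if depth < 0:
                return False
        if depth != 0:
            return False
    return True

generated = ["CCO", "c1ccccc1", "C(C(=O", "N[C@@H](C)C(=O)O", "??bad??"]
valid = [s for s in generated if passes_sanity_checks(s)]
print(valid)  # the unbalanced and malformed strings are filtered out
```

Even a filter this crude removes a slice of the 15-30% invalid output before any expensive downstream step; the remaining checks are what the human chemists are for.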
Key Players
Established Leaders
Google DeepMind leads in both protein structure prediction (AlphaFold) and materials discovery (GNoME), with the most extensively validated scientific AI systems in production.
Microsoft Research partners with national laboratories on materials discovery (PNNL collaboration) and operates the Azure Quantum Elements platform for chemistry and materials simulation.
Meta AI develops open-source scientific foundation models including ESM-2 for protein understanding and Open Catalyst for catalysis research, enabling broader research community access.
Emerging Startups
Insilico Medicine has advanced multiple AI-discovered drug candidates into clinical trials, demonstrating end-to-end AI-driven drug discovery from target identification through molecule design.
Recursion Pharmaceuticals operates one of the largest biological datasets globally and has built an integrated AI and robotic laboratory platform for drug discovery at scale.
Orbital Materials applies graph neural networks to materials discovery for energy and sustainability applications, with a focus on direct air capture sorbents and battery materials.
Key Investors and Funders
Canada First Research Excellence Fund provided $200 million to the Acceleration Consortium for self-driving laboratories.
European Commission Horizon Europe allocated $4.5 billion to AI-related research programs for 2025-2027, with significant emphasis on sustainability applications.
SoftBank Vision Fund and Lux Capital have invested heavily in AI-for-science startups, with combined portfolio allocations exceeding $2 billion in the space since 2022.
Action Checklist
- Distinguish between domain-specific scientific AI tools (AlphaFold, GNoME, purpose-built models) and general-purpose AI when evaluating vendor claims
- Require vendors to provide experimentally validated results rather than computational predictions alone; demand confirmation rates and independent reproduction data
- Assess data availability in your target domain before investing in AI; data-sparse domains require experimental data generation before AI adds significant value
- Budget for the full discovery-to-deployment pipeline, not just the AI-accelerated discovery phase; plan for 30-50% timeline compression rather than 90%+
- Prioritize investments in high-quality training data curation over raw compute scaling for scientific AI applications
- Establish internal protocols to verify AI-generated scientific outputs including chemical validity checks, citation verification, and physical constraint compliance
- Engage domain scientists in AI tool selection and deployment; AI literacy without domain expertise produces unreliable results
- Monitor the reproducibility track record of AI-for-science tools and platforms before committing to multi-year research programs
FAQ
Q: Which scientific domains benefit most from AI today? A: Protein structure prediction, crystal structure and inorganic materials discovery, weather and climate modeling, and molecular property prediction are the domains with the strongest evidence of AI impact. These share common characteristics: large, standardized training datasets; well-defined prediction targets; and physics-based validation methods. Domains with sparser data, less standardized measurements, or more complex emergent behavior (ecology, social systems, novel catalysis) show less consistent AI benefit and require more cautious adoption strategies.
Q: How should organizations evaluate AI-for-science vendor claims? A: Apply three filters. First, ask for experimental validation rates, not just computational predictions. If a vendor claims 90% accuracy, ask how many of those predictions were synthesized and confirmed in the laboratory. Second, request independent reproduction data. Claims validated only by the model developers should carry less weight than those confirmed by external researchers. Third, compare against the appropriate baseline. AI that performs 20% better than random screening sounds impressive but may offer marginal improvement over traditional computational chemistry methods that are far less expensive to deploy.
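The three filters reduce to simple arithmetic. A sketch with purely illustrative numbers (1,000 claimed candidates, of which 50 were synthesized and 30 confirmed, against an assumed 5% baseline hit rate):

```python
def validation_stats(predicted: int, synthesized: int, confirmed: int):
    """Coverage: what fraction of claimed candidates were actually tested.
    Rate: how many of the tested candidates were experimentally confirmed."""
    return synthesized / predicted, confirmed / synthesized

def enrichment(confirmed_rate: float, baseline_rate: float) -> float:
    """Fold improvement over the screening baseline you would use anyway."""
    return confirmed_rate / baseline_rate

# Illustrative numbers only; substitute figures from the vendor's data room.
coverage, rate = validation_stats(predicted=1000, synthesized=50, confirmed=30)
print(f"tested {coverage:.0%} of claims; {rate:.0%} confirmed; "
      f"{enrichment(rate, baseline_rate=0.05):.0f}x over baseline")
# -> tested 5% of claims; 60% confirmed; 12x over baseline
```

Low coverage is itself a finding: a 60% confirmed rate on 5% of claims says little about the other 95%, which is why independent reproduction data matters as much as the headline accuracy.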
Q: What is the realistic timeline for AI to meaningfully accelerate sustainability-related scientific discovery? A: AI is already meaningfully accelerating discovery in specific sustainability-relevant domains. AlphaFold has transformed structural biology, GNoME is expanding materials options for energy technologies, and AI weather models are improving climate adaptation planning. However, translating these discoveries into deployed technologies still requires conventional development timelines. A realistic expectation is that AI reduces total research-to-deployment timelines by 30-50% in data-rich domains over the next 5 years, with more modest acceleration (10-20%) in data-sparse domains until experimental data generation catches up.
Q: Should organizations build internal AI-for-science capabilities or partner with specialized providers? A: For most organizations, partnering is more cost-effective than building. Internal AI-for-science capabilities require specialized talent (ML engineers with domain expertise), significant data infrastructure, and ongoing model maintenance. The total cost of a competitive internal team (5-8 specialists plus infrastructure) typically exceeds $3-5 million annually. Partnering with platforms like the Acceleration Consortium, using open-source models from Meta AI, or licensing commercial platforms from specialized providers offers faster time-to-value at lower fixed cost. Build internally only if AI-driven discovery is a core competitive differentiator for your organization.
Q: How can EU organizations leverage Horizon Europe funding for AI-driven scientific research? A: Horizon Europe's Cluster 4 (Digital, Industry, and Space) and Cluster 5 (Climate, Energy, and Mobility) both include significant funding for AI-enabled scientific research. The 2025-2027 work program emphasizes AI for materials discovery, sustainable manufacturing, and climate modeling. Successful proposals typically demonstrate clear sustainability impact, cross-border collaboration (minimum three EU member states), and plans for open science data sharing. Organizations should monitor the European Innovation Council's Pathfinder program for high-risk, high-reward AI-for-science funding, and the European Research Council's grants for foundational research.
Sources
- Jumper, J. et al. (2021). "Highly accurate protein structure prediction with AlphaFold." Nature, 596, 583-589.
- Merchant, A. et al. (2023). "Scaling deep learning for materials discovery." Nature, 624, 80-85.
- IDC. (2025). Worldwide AI Spending Guide: Scientific Research and Development Segment. Framingham, MA: International Data Corporation.
- Nature Machine Intelligence. (2024). "Data requirements for reliable AI-driven materials discovery." Nature Machine Intelligence, 6(3), 245-258.
- European Commission. (2025). Horizon Europe Work Programme 2025-2027: AI for Science and Sustainability. Brussels: European Commission.
- AI Reproducibility Initiative. (2025). 2025 Annual Report on Reproducibility in AI for Science. Berkeley, CA: ARI.
- Science. (2025). "Evaluating large language models on graduate-level scientific reasoning." Science, 385(6714), 1142-1148.