AI & Emerging Tech · 12 min read

Deep dive: Digital twins, simulation & synthetic data — the fastest-moving subsegments to watch

An in-depth analysis of the most dynamic subsegments within Digital twins, simulation & synthetic data, tracking where momentum is building, capital is flowing, and breakthroughs are emerging.

The digital twin market grew from $6.5 billion in 2021 to an estimated $35 billion in 2025, but this aggregate figure obscures dramatic variation across subsegments. While building information modeling (BIM) twins have matured into a commodity offering, three subsegments are experiencing exponential growth in both technical capability and capital deployment: physics-informed machine learning twins for industrial decarbonization, city-scale climate resilience simulations, and synthetic data generation for training AI models in data-scarce sustainability applications. For engineers evaluating where to invest technical effort, understanding which subsegments are genuinely accelerating versus simply attracting hype is essential for sound architecture and career decisions.

Why It Matters

Digital twins have transitioned from visualization tools into operational decision-making platforms. The shift is driven by three converging forces: the computational cost of running physics-based simulations has dropped 80% since 2020 due to GPU acceleration and cloud infrastructure improvements; sensor costs have fallen below $5 per connected point for industrial IoT devices; and regulatory mandates including the EU Digital Product Passport (effective 2027), the SEC climate disclosure rules, and CSRD reporting requirements are creating demand for verifiable, model-backed emissions and performance data.

The total addressable market for sustainability-focused digital twins is projected at $12-18 billion by 2028, according to McKinsey and BloombergNEF estimates. However, capital allocation is highly concentrated. Of the $4.2 billion invested in digital twin startups between 2023 and 2025, approximately 65% targeted industrial and infrastructure applications, 20% went to urban and climate simulation platforms, and 15% funded synthetic data companies. This distribution reflects where engineering value creation is most immediate, but the fastest growth rates are in the synthetic data and climate simulation subsegments, each expanding at 40-55% compound annual rates compared to 20-25% for established industrial twin applications.

For emerging markets specifically, digital twins address a critical capacity gap. Many countries in Southeast Asia, Sub-Saharan Africa, and Latin America lack the historical data infrastructure that data-intensive AI models require. Synthetic data generation and simulation-based approaches allow these regions to leapfrog data collection timelines, building predictive models for infrastructure planning, climate adaptation, and industrial optimization without decades of sensor deployment history.

Subsegment 1: Physics-Informed Machine Learning Twins for Industrial Decarbonization

The integration of physics-based models with machine learning represents the most technically significant development in industrial digital twins since 2023. Traditional digital twins relied on either pure physics simulations (computationally expensive but interpretable) or pure data-driven models (fast but fragile when conditions change). Physics-informed neural networks (PINNs) and hybrid architectures combine both, encoding physical laws as constraints within neural network training while allowing the model to learn empirical corrections from operational data.
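To make the hybrid idea concrete, here is a minimal physics-informed fit, assuming a toy Newton's-cooling process (dT/dt = -k(T - T_env)) rather than any real industrial model: a cubic surrogate is fit to sparse noisy readings plus a physics-residual penalty at collocation points. Because both terms happen to be linear in the coefficients here, the combined loss collapses to a single least-squares solve.

```python
import numpy as np

# Toy physics-informed fit: Newton's cooling dT/dt = -k (T - T_env).
# A cubic surrogate T(t) = c @ [1, t, t^2, t^3] is fit to three noisy
# readings plus a physics-residual penalty at collocation points.
rng = np.random.default_rng(0)
k, T_env, T0 = 1.0, 25.0, 90.0
true_T = lambda t: T_env + (T0 - T_env) * np.exp(-k * t)

t_data = np.array([0.0, 0.5, 1.0])                    # sparse sensor readings
y_data = true_T(t_data) + rng.normal(0.0, 0.5, 3)     # measurement noise
t_col = np.linspace(0.0, 1.0, 20)                     # collocation grid

phi = lambda t: np.vander(t, 4, increasing=True)      # basis [1, t, t^2, t^3]
dphi = lambda t: np.stack([np.zeros_like(t), np.ones_like(t),
                           2.0 * t, 3.0 * t ** 2], axis=1)

lam = 1.0   # weight of the physics penalty
# Both the data misfit and the ODE residual (T' + k(T - T_env) = 0) are
# linear in the coefficients, so the combined loss minimizer is one solve.
A = np.vstack([phi(t_data), np.sqrt(lam) * (dphi(t_col) + k * phi(t_col))])
b = np.concatenate([y_data, np.sqrt(lam) * k * T_env * np.ones_like(t_col)])
coef, *_ = np.linalg.lstsq(A, b, rcond=None)

# Held-out check between sensor timestamps.
err = float(abs(phi(np.array([0.75])) @ coef - true_T(0.75))[0])
print(f"held-out error at t=0.75: {err:.3f} degC")
```

In production twins the surrogate is a neural network and the residual comes from the governing PDEs, but the loss structure is the same: data misfit plus a weighted physics-residual term.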

Where momentum is building: Cement, steel, and chemical manufacturing are the primary adoption domains. The cement industry alone accounts for 8% of global CO2 emissions, and physics-informed twins enable process optimization that reduces clinker factor, optimizes kiln thermal efficiency, and manages alternative fuel blending in ways that pure rule-based control systems cannot achieve. LafargeHolcim deployed physics-informed twins across 12 cement plants between 2023 and 2025, documenting 6-9% reductions in specific CO2 emissions per ton of clinker, equivalent to approximately 2.4 million tons of CO2 avoided annually. The twins model thermochemical reactions within the kiln, predict clinker quality from raw material composition, and optimize fuel blending in real time.

Capital flows: Akselos raised $30 million in 2024 for structural digital twins using reduced-basis methods that run physics simulations 1,000 times faster than traditional finite element analysis. Cognite, which provides industrial data operations platforms for digital twins, reached a $1.6 billion valuation in 2025 after signing major contracts with oil and gas operators transitioning to hydrogen production. Sight Machine secured $45 million for manufacturing analytics twins that integrate process physics with production quality data.

Technical frontier: The emergence of foundation models for physics simulation, analogous to large language models but trained on physics data, represents the next inflection point. NVIDIA's Modulus framework and DeepMind's GraphCast weather model demonstrate that neural operators can learn to approximate partial differential equations governing fluid dynamics, heat transfer, and structural mechanics with 100-1,000x speedups over traditional solvers. For engineers, this means digital twins that previously required supercomputer access can run on edge devices, enabling real-time optimization in facilities lacking cloud connectivity.
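As a minimal stand-in for the neural-operator idea (real frameworks like Modulus train nonlinear networks on large simulation corpora), the sketch below recovers the time-step operator of a 1-D heat equation directly from solver-generated data. The step happens to be linear, so least squares learns it exactly; the point is only to show the data-driven-surrogate workflow.

```python
import numpy as np

# Minimal "learned operator" demo: the explicit finite-difference step of
# the 1-D heat equation u_t = alpha * u_xx is a linear map, so least
# squares can recover it from simulation data. Real neural operators
# target nonlinear PDEs and generalize across resolutions.
rng = np.random.default_rng(3)
n, alpha = 32, 0.2   # grid points, diffusion number (stable for alpha <= 0.5)

def heat_step(u):
    """One explicit Euler step with fixed (Dirichlet) boundary values."""
    un = u.copy()
    un[1:-1] = u[1:-1] + alpha * (u[2:] - 2.0 * u[1:-1] + u[:-2])
    return un

# (state, next-state) training pairs from the reference solver.
U = rng.normal(size=(500, n))
V = np.array([heat_step(u) for u in U])

# Fit a linear surrogate G with V ~= U @ G, then step a fresh state with it.
G, *_ = np.linalg.lstsq(U, V, rcond=None)
u0 = np.sin(np.linspace(0.0, np.pi, n))
surrogate_err = float(np.max(np.abs(u0 @ G - heat_step(u0))))
print(f"surrogate vs. solver max error: {surrogate_err:.2e}")
```

Once fitted, the surrogate is a single matrix multiply per step, which is why learned operators can move workloads from supercomputers to edge devices.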

Key benchmark metrics: Production-grade physics-informed twins achieve prediction accuracy within 2-5% of full physics simulations while running 100-500x faster. Industrial deployments targeting decarbonization report 4-12% specific emissions reductions within the first 12 months. Implementation timelines range from 6-18 months depending on existing sensor infrastructure and data historian capabilities.

Subsegment 2: City-Scale Climate Resilience Simulations

Urban digital twins have evolved from 3D visualization platforms into decision-support systems for climate adaptation planning. The fastest-moving subsegment focuses on coupling hydrological, thermal, and atmospheric models with infrastructure networks to simulate compound climate risks across entire metropolitan areas.

Where momentum is building: Flood risk modeling drives the largest share of urban climate twin deployments. The combination of high-resolution terrain data (from lidar surveys at 10-50 cm resolution), real-time rainfall monitoring, and computational fluid dynamics now enables cities to simulate urban flooding at building-level granularity with lead times of 2-6 hours. Copenhagen's Cloudburst Management Plan, backed by a digital twin of the city's stormwater network, has prevented an estimated $1.2 billion in flood damage since its operational deployment in 2022 by enabling dynamic control of retention basins and sewer overflows.
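A heavily simplified "bathtub with connectivity" model illustrates the core of such inundation mapping, assuming an invented 6x6 elevation grid in place of lidar terrain and full hydraulic solvers:

```python
import numpy as np
from collections import deque

# "Bathtub with connectivity" inundation sketch: a cell floods when its
# elevation is below the water level AND it connects to the river source
# through other flooded cells. The 6x6 DEM (metres) is invented; real
# deployments use lidar terrain at 10-50 cm resolution plus hydraulics.
dem = np.array([
    [0.3, 3.0, 2.5, 2.0, 2.5, 3.0],   # (0, 0) is low but disconnected
    [3.0, 1.5, 1.2, 1.0, 1.8, 3.0],
    [2.8, 1.4, 0.8, 0.9, 1.6, 2.9],
    [2.7, 1.3, 0.7, 0.6, 1.4, 2.8],
    [2.9, 2.0, 1.0, 0.5, 1.2, 2.9],
    [3.0, 3.0, 2.0, 0.4, 2.0, 3.0],   # (5, 3) is the river inlet
])

def inundate(dem, sources, water_level):
    """Flood-fill from source cells, spreading to 4-neighbours below level."""
    flooded = np.zeros(dem.shape, dtype=bool)
    queue = deque(s for s in sources if dem[s] < water_level)
    for s in queue:
        flooded[s] = True
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < dem.shape[0] and 0 <= nc < dem.shape[1]
                    and not flooded[nr, nc] and dem[nr, nc] < water_level):
                flooded[nr, nc] = True
                queue.append((nr, nc))
    return flooded

extent = inundate(dem, sources=[(5, 3)], water_level=1.1)
# The depression at (0, 0) stays dry: low, but not hydrologically connected.
print(f"flooded cells at 1.1 m: {int(extent.sum())}")
```

Operational twins replace the static water level with routed flows from rainfall forecasts, but the connectivity constraint shown here is what separates inundation mapping from naive elevation thresholding.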

Singapore's Virtual Singapore platform, the most comprehensive urban digital twin globally, integrates building energy models, pedestrian flow simulations, solar irradiance mapping, and microclimate modeling across the entire city-state. As of 2025, the platform supports decisions ranging from wind corridor preservation in new developments to optimal placement of cooling infrastructure in heat-vulnerable neighborhoods. The platform processes data from over 100,000 IoT sensors and 10,000 building energy management systems.

Emerging market applications: Where data scarcity limits traditional approaches, hybrid models combining satellite imagery, crowdsourced data, and physics simulations are enabling climate resilience planning. Nairobi's flood risk twin, developed through a partnership between UN-Habitat and Deltares, uses satellite-derived elevation models combined with smartphone-reported flood extent data to calibrate hydrological models for informal settlements where no ground-based sensors exist. Similar approaches are operational in Ho Chi Minh City, Dhaka, and Lagos.

Capital flows: Cityzenith raised $24 million in 2024 for its Smart World Pro platform targeting urban sustainability planning. One Concern, which models earthquake, flood, and climate risk at building resolution, reached a $250 million valuation after expanding from its US base into Japan and Southeast Asian markets. Bentley Systems acquired Seequent for $1.05 billion in 2021, adding subsurface modeling capabilities to its infrastructure digital twin portfolio.

Key benchmark metrics: State-of-the-art urban flood twins achieve spatial accuracy of 85-92% for predicted inundation extent when validated against observed events. Heat island models predict neighborhood-level temperature differentials within 1-2 degrees Celsius. Implementation costs range from $2-8 million for citywide deployments, with ongoing operational costs of $300,000-800,000 annually for data integration and model updates.

Subsegment 3: Synthetic Data Generation for Sustainability AI

Synthetic data generation is the fastest-growing subsegment, expanding from a $500 million market in 2023 to a projected $2.5-3 billion by 2026. For sustainability applications, synthetic data addresses a fundamental bottleneck: training AI models for environmental monitoring, infrastructure inspection, and resource optimization requires labeled datasets that are expensive, time-consuming, or impossible to collect from real-world sources.

Where momentum is building: Satellite imagery augmentation represents the highest-value application. Training AI models to detect deforestation, methane plumes, or crop stress from satellite imagery requires hundreds of thousands of labeled examples across diverse geographies and conditions. Collecting and labeling this data from real satellite passes is prohibitively slow. Companies including Rendered.ai and Synthesis AI generate photorealistic synthetic satellite scenes with pixel-level annotations, enabling model training on scenarios that may not yet exist in historical archives, such as novel deforestation patterns or industrial emissions signatures.

Autonomous inspection of energy infrastructure is another rapidly growing application. Training AI to detect defects in solar panels, wind turbine blades, or transmission lines from drone imagery requires examples of every failure mode across all environmental conditions. Synthetic data generation creates training sets covering rare but critical failure types (such as internal blade delamination or specific PV cell hot-spot patterns) that would take years to accumulate from field inspections alone. Heliolytics, a Canadian solar inspection company, documented a 35% improvement in defect detection accuracy after augmenting its training data with synthetic imagery from a generative adversarial network (GAN) trained on physics-based solar panel degradation models.
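A toy procedural generator shows the pattern behind such augmentation, assuming invented hot-spot geometry and temperatures rather than a physics-based degradation model or a trained GAN:

```python
import numpy as np

# Procedural synthetic-data sketch: tiny "thermal" images of a PV panel,
# half with an injected hot-spot defect, each paired with a pixel-level
# mask. Hot-spot geometry and temperatures are invented for illustration;
# production pipelines drive such parameters from degradation physics.
rng = np.random.default_rng(42)
H = W = 32

def synth_sample(defect):
    img = rng.normal(35.0, 1.0, (H, W))        # ambient panel temps (degC)
    mask = np.zeros((H, W), dtype=np.uint8)    # per-pixel defect label
    if defect:
        r, c = rng.integers(4, H - 4, size=2)  # random hot-spot centre
        img[r - 2:r + 2, c - 2:c + 2] += rng.uniform(15.0, 25.0)
        mask[r - 2:r + 2, c - 2:c + 2] = 1     # 4x4 labelled hot spot
    return img, mask

pairs = [synth_sample(defect=(i % 2 == 0)) for i in range(100)]
imgs = np.stack([p[0] for p in pairs])
masks = np.stack([p[1] for p in pairs])
print(f"images: {imgs.shape}, labelled defect pixels: {int(masks.sum())}")
```

The key property is that labels come free: because the generator places each defect, the pixel mask is exact by construction, with no human annotation pass.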

Domain-specific breakthroughs: The convergence of physics simulation and generative AI has produced a new class of synthetic data that embeds physical realism. NVIDIA Omniverse, combined with Isaac Sim, generates synthetic training data for robotics applications in industrial settings, including waste sorting, construction material handling, and agricultural operations. Microsoft's Project AirSim creates synthetic training environments for drone-based environmental monitoring. These platforms go beyond simple image augmentation, simulating sensor physics, lighting conditions, material properties, and dynamic scenarios to produce data that transfers reliably to real-world deployment.

Capital flows: Gretel.ai raised $60 million in 2024 for privacy-preserving synthetic data, with sustainability applications including generating synthetic building energy profiles for portfolio-level optimization modeling. Mostly AI secured $25 million for tabular synthetic data generation applicable to supply chain emissions modeling. Datagen (acquired by Unity for $100 million in 2023) demonstrated the value of synthetic data for computer vision training, catalyzing investor interest across the subsegment.

Key benchmark metrics: Models trained on synthetic data alone achieve 70-85% of the accuracy of models trained on equivalent volumes of real data. Models trained on mixed synthetic-real datasets (typically 80% synthetic, 20% real) achieve 95-102% of pure real-data performance, with the over-100% figure reflecting cases where synthetic data's broader coverage of edge cases improves generalization. Time to generate sufficient training data drops from months (real collection) to days (synthetic generation).

Cross-Cutting Dynamics

Several dynamics span all three subsegments and will shape the landscape through 2028.

Interoperability standards are consolidating. The Digital Twin Consortium's open-source interoperability framework, combined with the Digital Twin Definition Language (DTDL), is reducing integration costs between twin platforms from different vendors. For engineers, this means architectural decisions should prioritize standards-compliant interfaces over vendor-specific integrations.
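As a sketch of what standards-compliant modeling looks like, the snippet below assembles a hypothetical DTDL v2 interface as a plain Python dict. The structural keywords (@context, @id, @type, contents) follow the published DTDL v2 spec; the model id and telemetry fields are invented for this example.

```python
import json

# Illustrative DTDL v2 interface for a hypothetical cement-kiln twin.
# Structural keywords follow the DTDL v2 spec; ids and names are invented.
kiln_interface = {
    "@context": "dtmi:dtdl:context;2",
    "@id": "dtmi:com:example:CementKilnTwin;1",
    "@type": "Interface",
    "displayName": "Cement Kiln Twin",
    "contents": [
        {
            "@type": ["Telemetry", "Temperature"],  # semantic type + telemetry
            "name": "burningZoneTemperature",
            "schema": "double",
            "unit": "degreeCelsius",
        },
        {
            "@type": "Property",
            "name": "clinkerFactor",
            "schema": "double",
            "writable": False,
        },
    ],
}

document = json.dumps(kiln_interface, indent=2)
print(document[:60])
```

Publishing twin contracts in a vendor-neutral form like this is what lets a kiln model built on one platform be consumed by another without bespoke adapters.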

Edge deployment is becoming viable. NVIDIA Jetson Orin and Qualcomm's Cloud AI 100 processors enable physics-informed twin inference at the edge, reducing latency from seconds (cloud round-trip) to milliseconds. This shift is critical for real-time control applications in industrial settings and autonomous inspection systems where connectivity is intermittent.

Regulatory pull is strengthening. The EU Digital Product Passport requires manufacturers to maintain digital representations of products throughout their lifecycle starting in 2027, beginning with batteries, textiles, and electronics. This mandate alone could create $2-4 billion in digital twin demand as manufacturers build compliance infrastructure.

Talent bottlenecks are acute. The intersection of physics expertise, machine learning engineering, and domain knowledge (in energy systems, climate science, or manufacturing processes) defines a profile that fewer than 5,000 professionals worldwide currently match. Compensation for senior digital twin engineers at this intersection exceeds $250,000 in US markets and $150,000-200,000 in European and Asian tech hubs.

Real-World Examples

Siemens Energy Wind Turbine Twins: Siemens Energy operates digital twins for over 30,000 wind turbines globally, using physics-informed models to predict component fatigue, optimize yaw and pitch control, and schedule maintenance. The platform has increased fleet-wide energy capture by 3-5% and reduced unplanned downtime by 20%, translating to approximately $500 million in annual value for operators. The twin's hybrid architecture combines aeroelastic physics models with neural network corrections learned from SCADA data across the fleet.
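A grey-box pattern of this kind can be sketched as a physics power curve plus a correction fitted to residuals against observed output. All turbine parameters and the "SCADA" history below are synthetic placeholders, not Siemens Energy figures.

```python
import numpy as np

# Grey-box hybrid sketch: idealized physics power curve plus a data-driven
# correction fitted to residuals against observed output. All numbers here
# are synthetic stand-ins for real turbine SCADA data.
rng = np.random.default_rng(7)
rho, area, cp, rated = 1.225, np.pi * 60.0 ** 2, 0.42, 3.0e6  # 60 m rotor radius, 3 MW

def physics_power(v):
    """Ideal power curve: cubic in wind speed, capped at rated power (W)."""
    return np.minimum(0.5 * rho * area * cp * v ** 3, rated)

# Simulated history: the "real" turbine underperforms the ideal curve
# (soiling, yaw error), modelled as a smooth speed-dependent factor.
v_hist = rng.uniform(4.0, 12.0, 500)
p_obs = physics_power(v_hist) * (0.93 + 0.005 * v_hist) + rng.normal(0.0, 2.0e4, 500)

# Fit a quadratic-in-v correction to the physics residuals.
X = np.vander(v_hist, 3, increasing=True)             # [1, v, v^2]
beta, *_ = np.linalg.lstsq(X, p_obs - physics_power(v_hist), rcond=None)

def hybrid_power(v):
    v = np.atleast_1d(v)
    return physics_power(v) + np.vander(v, 3, increasing=True) @ beta

# Compare against the noise-free "truth" on held-out wind speeds.
v_test = rng.uniform(4.0, 12.0, 200)
p_true = physics_power(v_test) * (0.93 + 0.005 * v_test)
rmse_phys = float(np.sqrt(np.mean((physics_power(v_test) - p_true) ** 2)))
rmse_hyb = float(np.sqrt(np.mean((hybrid_power(v_test) - p_true) ** 2)))
print(f"RMSE physics-only: {rmse_phys:.0f} W, hybrid: {rmse_hyb:.0f} W")
```

The design choice worth noting: the physics model keeps predictions plausible outside the training distribution, while the fitted correction absorbs systematic plant-specific bias the physics cannot see.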

Google Environmental Insights Explorer: Google's urban environmental platform, operational in over 40,000 cities, uses building-level digital twin models to estimate rooftop solar potential, transportation emissions, and tree canopy coverage. While not a full simulation platform, it demonstrates how city-scale digital twin data can democratize climate planning for municipalities that lack resources for custom deployments.

World Resources Institute's Aqueduct Platform: WRI's Aqueduct Water Risk Atlas uses basin-level hydrological twins to model water stress under multiple climate scenarios across every major watershed globally. The platform serves investors, corporations, and governments assessing water-related financial and operational risk. Its integration with PCR-GLOBWB hydrological models and CMIP6 climate projections represents a production-grade example of simulation-based decision support at global scale.

Action Checklist

  • Evaluate physics-informed twin architectures over pure data-driven approaches for industrial applications where operating conditions may shift beyond training data distributions
  • Assess synthetic data generation as a strategy for overcoming labeled data scarcity in environmental monitoring and infrastructure inspection AI models
  • Prioritize interoperability standards (Digital Twin Definition Language, open APIs) in platform selection to avoid vendor lock-in
  • Investigate edge deployment options for twin inference in facilities with limited or unreliable cloud connectivity
  • Map regulatory requirements (EU DPP, CSRD, SEC climate rules) that may mandate digital twin capabilities for compliance
  • Budget for hybrid teams combining physics domain expertise with ML engineering for twin development
  • Pilot city-scale climate simulation tools for infrastructure resilience planning, starting with flood and heat risk modeling
  • Benchmark synthetic data training against real-data baselines using domain-specific accuracy metrics before full deployment

Sources

  • McKinsey & Company. (2025). Digital Twins: The Next Wave of Industrial Transformation. New York: McKinsey Global Institute.
  • BloombergNEF. (2025). Digital Twin Investment Tracker: Q4 2024 Annual Review. New York: Bloomberg LP.
  • National Academies of Sciences, Engineering, and Medicine. (2024). Foundational Research Gaps and Future Directions for Digital Twins. Washington, DC: National Academies Press.
  • NVIDIA. (2025). Omniverse Industrial Digital Twin Platform: Technical Reference and Case Studies. Santa Clara, CA: NVIDIA Corporation.
  • Gartner. (2025). Hype Cycle for Digital Twins in Sustainability, 2025. Stamford, CT: Gartner Inc.
  • Digital Twin Consortium. (2025). Interoperability Framework for Sustainability Digital Twins. Boston, MA: Object Management Group.
  • Nature Machine Intelligence. (2024). "Physics-informed neural networks for industrial process optimization: A systematic review." Nature Machine Intelligence, 6(3), 245-261.

