AI & Emerging Tech · 16 min read

Deep dive: Digital twins, simulation & synthetic data — what's working, what's not, and what's next

A state-of-play assessment of digital twins, simulation, and synthetic data, evaluating current successes, persistent challenges, and the most promising near-term developments.

The global digital twin market reached $16.8 billion in 2025 and is projected to exceed $110 billion by 2030, yet the gap between vendor marketing and operational reality remains substantial. For every headline about a digital twin delivering transformative results, there are dozens of deployments stalled in proof-of-concept stages, struggling with data integration, model fidelity, or unclear return on investment. This deep dive evaluates what is genuinely working in digital twins, simulation, and synthetic data, where the persistent challenges lie, and where the technology is heading over the next three to five years.

Why It Matters

Digital twins, simulation environments, and synthetic data generation represent three interconnected capabilities that are reshaping how organizations design, operate, and optimize physical systems. A digital twin is a virtual replica of a physical asset, process, or system that is continuously updated with real-world data to mirror its physical counterpart's state and behaviour. Simulation engines enable testing scenarios and interventions on these virtual replicas without risk to physical operations. Synthetic data, generated from simulations or generative models, addresses the chronic shortage of labelled training data for machine learning applications, particularly in domains where real data is expensive, scarce, or privacy-constrained.

The convergence of these three capabilities has created a new category of infrastructure for decision-making. McKinsey estimated in 2024 that digital twins could generate $1.3 trillion in economic value across manufacturing, infrastructure, healthcare, and energy by 2030. But realising this potential requires navigating significant technical, organisational, and economic obstacles that early adopters are only beginning to understand.

For founders and technology leaders, the strategic question is not whether digital twins will matter but which applications are ready for commercial deployment today, which require further maturation, and where the investment thesis supports building versus waiting. The technology landscape is shifting rapidly: advances in physics-informed neural networks, cloud-native simulation platforms, and foundation models for physical systems are compressing development timelines that previously required years of domain-specific engineering.

Three structural forces are accelerating adoption. First, regulatory mandates including the EU's Energy Performance of Buildings Directive (EPBD recast), the SEC's climate disclosure requirements, and industrial safety regulations increasingly require continuous monitoring and scenario analysis that digital twins are uniquely positioned to provide. Second, the cost of the enabling infrastructure, specifically cloud compute, IoT sensors, and connectivity, has declined by 60 to 80% over the past decade, making deployments economically viable for a broader range of applications. Third, the maturation of open standards including the Digital Twin Definition Language (DTDL), the Asset Administration Shell, and the Digital Twin Consortium's reference architecture is reducing vendor lock-in and integration costs.

Key Concepts

Digital Twin Maturity Levels range from static 3D models (Level 1) through data-connected visualisations (Level 2), analytical twins with simulation capabilities (Level 3), predictive twins using machine learning (Level 4), to autonomous twins that execute decisions without human intervention (Level 5). Most production deployments today operate at Levels 2 or 3. The jump from Level 3 to Level 4 requires significantly more data infrastructure and model validation, while Level 5 remains largely aspirational outside of highly controlled environments like semiconductor fabrication.

Physics-Informed Neural Networks (PINNs) embed physical laws (conservation of energy, fluid dynamics equations, structural mechanics) as constraints within neural network architectures. This approach addresses the fundamental challenge that pure data-driven models require enormous training datasets and may produce physically implausible predictions. PINNs can deliver accurate simulations with 10 to 100 times less training data than conventional neural networks by leveraging known physics as an inductive bias.
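
The core mechanism is a composite loss: a data-misfit term plus a physics-residual term evaluated at unlabelled collocation points. The sketch below illustrates this with a toy decay ODE and a two-parameter surrogate rather than a full neural network or PDE; the surrogate form, parameter values, and weighting `lam` are illustrative assumptions, not any specific PINN library's API.

```python
import numpy as np

def physics_informed_loss(params, t_data, u_data, t_colloc, k, lam=1.0):
    """Composite PINN-style loss for the decay ODE du/dt = -k*u.

    The surrogate is u(t) = a * exp(b*t); 'a' and 'b' stand in for
    trainable network weights.  The loss adds a data-misfit term and a
    physics-residual term evaluated at collocation points with no labels.
    """
    a, b = params
    # Data term: mean squared error against observed samples.
    u_pred = a * np.exp(b * t_data)
    data_loss = np.mean((u_pred - u_data) ** 2)
    # Physics term: residual of du/dt + k*u = 0 at collocation points.
    u_c = a * np.exp(b * t_colloc)
    du_dt = a * b * np.exp(b * t_colloc)  # analytic derivative of the surrogate
    physics_loss = np.mean((du_dt + k * u_c) ** 2)
    return data_loss + lam * physics_loss

# Ground truth u(t) = 2*exp(-0.5*t), observed at only three points; the
# physics term supplies the inductive bias the sparse data cannot.
k = 0.5
t_data = np.array([0.0, 1.0, 2.0])
u_data = 2.0 * np.exp(-k * t_data)
t_colloc = np.linspace(0.0, 4.0, 50)  # unlabelled collocation grid

consistent = physics_informed_loss((2.0, -0.5), t_data, u_data, t_colloc, k)
violating = physics_informed_loss((2.0, -0.9), t_data, u_data, t_colloc, k)
print(consistent < violating)  # True: the wrong decay rate is penalised
```

A parameter setting that violates the governing equation is penalised even where no data exists, which is why PINNs can get away with far smaller training sets.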

Synthetic Data Generation creates artificial datasets that preserve the statistical properties and structural relationships of real data without containing actual observations. For digital twin applications, simulation-based synthetic data enables training machine learning models for scenarios that rarely occur in practice (equipment failures, extreme weather, demand spikes) or for which real data collection would be prohibitively expensive. Gartner projected that by 2026, 60% of the data used for AI development would be synthetically generated, up from less than 1% in 2021.
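
A minimal sketch of the "preserve statistics, not observations" idea, using a simple Gaussian fit; production generators typically use copulas, GANs, or physics simulation, and the sensor names and values here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# "Real" sensor readings: temperature and vibration are correlated.
cov_true = np.array([[4.0, 1.5], [1.5, 1.0]])
real = rng.multivariate_normal(mean=[60.0, 2.0], cov=cov_true, size=5000)

# Fit a parametric model to the real data, then sample fresh records
# from the fitted model rather than copying any actual observation.
mu_hat = real.mean(axis=0)
cov_hat = np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mean=mu_hat, cov=cov_hat, size=5000)

# The synthetic set preserves the correlation structure of the original.
corr_real = np.corrcoef(real, rowvar=False)[0, 1]
corr_syn = np.corrcoef(synthetic, rowvar=False)[0, 1]
print(round(corr_real, 2), round(corr_syn, 2))
```

The same pattern scales up: the generator is fit (or a simulator calibrated) once, after which arbitrarily many records can be drawn, including oversampled rare regimes such as failure states.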

Federated Digital Twins connect individual asset-level twins into system-level representations. A federated approach enables modelling interactions between components, such as how a building's HVAC system responds to grid price signals, or how traffic patterns affect urban air quality. The technical challenge lies in reconciling different time scales, spatial resolutions, and modelling frameworks across constituent twins while maintaining computational tractability.
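
The time-scale reconciliation problem can be made concrete with a small sketch: a hypothetical HVAC twin reporting every minute must be joined with a grid-price twin updating every 15 minutes before the federation can compute any joint quantity. The signals and price values are invented for illustration.

```python
import numpy as np

# Two constituent twins report on different clocks.
t_hvac = np.arange(0, 60, 1.0)               # minutes
hvac_load_kw = 50 + 5 * np.sin(t_hvac / 10)  # simulated load signal
t_price = np.arange(0, 61, 15.0)             # 15-minute price updates
price = np.array([0.10, 0.12, 0.30, 0.30, 0.12])  # $/kWh

# Reconcile time scales: hold each price until the next update
# (step interpolation), joining on the finer HVAC clock.
idx = np.searchsorted(t_price, t_hvac, side="right") - 1
price_aligned = price[idx]

# The federated view can now compute a joint quantity, e.g. cost rate.
cost_per_min = hvac_load_kw * price_aligned / 60.0
print(price_aligned[:16])
```

Step-hold interpolation is the right choice for prices (they are constant between updates); a physical signal such as temperature would instead call for linear or spline interpolation, which is exactly the kind of per-domain modelling decision federation forces into the open.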

Digital Twin Deployment KPIs: Benchmark Ranges

| Metric | Below Average | Average | Above Average | Top Quartile |
| --- | --- | --- | --- | --- |
| Model Accuracy vs Physical System | <80% | 80-90% | 90-95% | >95% |
| Data Latency (sensor to twin) | >15 min | 5-15 min | 1-5 min | <1 min |
| Predictive Maintenance Lead Time | <24 hrs | 1-7 days | 1-4 weeks | >4 weeks |
| Implementation Time to Value | >18 months | 12-18 months | 6-12 months | <6 months |
| Unplanned Downtime Reduction | <10% | 10-25% | 25-40% | >40% |
| Energy Optimization Savings | <5% | 5-12% | 12-20% | >20% |
| Synthetic Data Training Improvement | <10% | 10-25% | 25-40% | >40% |

What's Working

Manufacturing Process Optimization

Manufacturing represents the most mature digital twin application domain, with documented returns across discrete and process manufacturing. Siemens' deployment at its Amberg electronics plant, often cited as the reference implementation, uses a comprehensive digital twin spanning product design, production planning, and real-time process control. The plant produces over 17 million Simatic controllers annually with a first-pass yield of 99.99885%, a figure that would be impossible without continuous twin-driven process adjustment.

Beyond showcase facilities, mid-market manufacturers are achieving meaningful results. A 2024 study by the Manufacturing Technology Centre (MTC) across 47 UK manufacturers found that digital twin deployments focused on production scheduling and quality control delivered median ROI of 3.2x within 18 months. The critical success factor was scoping: manufacturers who started with a single production line or critical bottleneck process achieved faster payback than those attempting facility-wide deployments. Typical annual savings ranged from $200,000 to $1.5 million per production line, primarily from reduced scrap, improved throughput, and optimised energy consumption.

BMW's use of NVIDIA Omniverse to create a complete virtual replica of its Regensburg plant before physical construction exemplifies the design-phase value proposition. The digital twin enabled identification and resolution of 2,300 potential manufacturing issues before any physical equipment was installed, reducing commissioning time by an estimated 30%. This "build virtually first" approach is becoming standard practice for greenfield manufacturing facilities and major production line modifications.

Infrastructure and Utility Asset Management

Water utilities, electricity networks, and transportation infrastructure operators have adopted digital twins for asset management with growing sophistication. Thames Water's digital twin of its London distribution network, built on Bentley Systems' iTwin platform, monitors pressure, flow, and water quality across 31,000 km of mains, enabling predictive identification of bursts and leaks. The system has contributed to a 15% reduction in leakage and a significant decrease in the average time to detect and repair network failures.

National Grid's digital twin of the UK electricity transmission network integrates real-time SCADA data with thermal rating models for overhead lines and underground cables. Dynamic line rating, enabled by the twin, has increased effective transmission capacity by 10 to 20% during favourable weather conditions without any physical infrastructure investment, deferring capital expenditure on network reinforcement.

The Port of Rotterdam's digital twin, developed with IBM, processes data from over 40 sensors per berth to optimise vessel scheduling, predict tidal and current conditions, and coordinate autonomous surface vessels. The twin handles approximately 30,000 vessel visits annually and has reduced average port call duration by 1 hour, generating estimated savings of EUR 80 million per year in vessel operating costs and emissions reductions from reduced idling.

Synthetic Data for Autonomous Systems

Waymo, Cruise, and other autonomous vehicle developers generate billions of simulated driving miles annually using synthetic environments. Waymo's simulation platform, SurfelGAN and its successors, creates photorealistic street scenes with procedurally generated traffic scenarios, enabling testing of edge cases (pedestrian crossings in fog, emergency vehicle interactions, construction zones) that would require millions of real-world miles to encounter naturally. Waymo reported that approximately 70% of its validation testing is conducted in simulation rather than on public roads.

This approach has extended beyond autonomous driving. Amazon's use of synthetic data for warehouse robotics training has reduced the time required to teach robots new pick-and-place tasks from weeks to hours. By training in simulation with randomised object geometries, lighting conditions, and gripper dynamics, robots achieve 95%+ real-world pick success rates with zero real training data. The sim-to-real transfer gap, the difference between simulated and real-world performance, has narrowed from 15 to 25% five years ago to 3 to 8% for well-calibrated synthetic environments.
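
The randomisation strategy described above can be sketched as a scene sampler. The parameters and ranges below are illustrative, not Amazon's actual pipeline: the principle is to vary everything the real sensors might plausibly see so the learned policy cannot overfit to one rendering.

```python
import random

def sample_scene(rng):
    """Draw one randomized simulation scene for robot pick training.

    Each draw varies geometry, lighting, dynamics, and sensor noise,
    so a policy trained across many scenes must generalise rather
    than memorise a single simulated world.
    """
    return {
        "object_scale": rng.uniform(0.8, 1.2),          # geometry variation
        "light_intensity_lux": rng.uniform(200, 1500),  # lighting variation
        "light_azimuth_deg": rng.uniform(0, 360),
        "gripper_friction": rng.uniform(0.4, 1.0),      # dynamics variation
        "camera_noise_std": rng.uniform(0.0, 0.03),     # sensor variation
    }

rng = random.Random(7)
scenes = [sample_scene(rng) for _ in range(10_000)]
# Each scene becomes one synthetic training episode; wide coverage of
# the parameter space is what narrows the sim-to-real gap.
print(len(scenes))
```

Widening the sampled ranges beyond what reality can produce is a deliberate choice in domain randomization: a policy robust to exaggerated variation tends to treat the real world as just another sample.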

What's Not Working

Building-Level Digital Twins for Existing Stock

Despite significant vendor investment, digital twins for existing commercial buildings remain largely stuck at Level 2 maturity (data-connected visualisation). The fundamental challenge is the cost of creating accurate as-built models for buildings constructed before BIM adoption. Laser scanning and point cloud processing can generate geometric models, but capturing the MEP (mechanical, electrical, plumbing) systems, control logic, and operational parameters needed for meaningful simulation typically costs $3 to $8 per square foot, a cost that undermines the business case for all but the highest-value properties.

A 2025 survey by the Building Performance Institute Europe (BPIE) found that fewer than 4% of European commercial buildings had operational digital twins with simulation capabilities. Among those that did, 62% reported that maintaining model currency (keeping the twin synchronised with physical modifications, equipment replacements, and control changes) consumed 30 to 50% of ongoing programme costs.

City-Scale and National-Scale Twins

Several high-profile city- and national-scale digital twin initiatives, including Singapore's Virtual Singapore, the UK's National Digital Twin programme, and Helsinki's Kalasatama district twin, have demonstrated impressive visualisation capabilities but struggled to deliver quantifiable operational value proportional to their investment. Virtual Singapore, which has received over SGD 73 million in government funding since 2014, provides a detailed 3D model integrating building, terrain, and infrastructure data, but operational use cases beyond urban planning visualisation remain limited.

The core technical challenge is computational: simulating interactions between millions of assets across different domains (buildings, transport, energy, water) at sufficient temporal and spatial resolution for real-time decision support exceeds current computing capabilities. Federated approaches that connect domain-specific twins through standardised APIs offer a more tractable path, but interoperability standards remain immature and adoption is fragmented.

Synthetic Data Quality Assurance

While synthetic data generation capabilities have advanced rapidly, systematic approaches to validating that synthetic data accurately represents the target distribution remain underdeveloped. The "reality gap" between simulated and real-world conditions creates subtle biases that can degrade model performance in production. For safety-critical applications (autonomous vehicles, medical devices, structural integrity monitoring), the absence of accepted standards for synthetic data validation represents a significant barrier to regulatory approval.

The EU AI Act's requirements for training data quality and documentation will increase pressure on organisations using synthetic data to demonstrate its fidelity and representativeness. Current best practices involve statistical divergence testing (comparing synthetic and real data distributions using metrics like the Fréchet Inception Distance or Maximum Mean Discrepancy), but these metrics capture distributional similarity rather than the causal structures that determine model behaviour in novel situations.
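
Of the two metrics mentioned, MMD is the easier to sketch from first principles (FID additionally requires a pretrained Inception network). The kernel bandwidth and the toy distributions below are illustrative choices.

```python
import numpy as np

def mmd_rbf(x, y, sigma=1.0):
    """Squared Maximum Mean Discrepancy with an RBF kernel.

    A small MMD means the two samples are hard to distinguish at the
    level of kernel mean embeddings; it does not certify that the
    causal structure matches, which is the limitation noted above.
    """
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 2))
good_syn = rng.normal(0.0, 1.0, size=(500, 2))  # matches the target
bad_syn = rng.normal(1.5, 1.0, size=(500, 2))   # systematically biased
print(mmd_rbf(real, good_syn) < mmd_rbf(real, bad_syn))  # True
```

A systematic bias in the generator shows up as a large MMD, but two samples can score near zero while differing in exactly the causal interactions that matter in an unseen regime, which is why distributional tests alone are insufficient for safety cases.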

What's Next

Foundation Models for Physical Systems

The emergence of foundation models trained on diverse physical simulation data represents the most significant near-term development. NVIDIA's Modulus platform enables training neural network surrogates on computational fluid dynamics, structural mechanics, and electromagnetic simulations, reducing simulation times from hours to milliseconds while maintaining engineering accuracy. Microsoft's Bonsai and Siemens' Industrial Copilot are integrating large language model interfaces with simulation backends, enabling non-specialist users to configure, run, and interpret digital twin analyses using natural language.

These approaches address the expertise bottleneck that has constrained digital twin adoption. Building and maintaining physics-based simulation models has historically required scarce specialist engineering skills. Foundation models that learn general physical principles from large simulation corpora and adapt to specific applications with minimal fine-tuning could dramatically reduce the cost and time of twin development.

Autonomous Operation and Closed-Loop Control

The transition from advisory twins (which recommend actions for human approval) to autonomous twins (which execute decisions directly) is advancing in controlled industrial environments. BASF's Ludwigshafen chemical complex operates autonomous digital twins for several process units, where the twin continuously adjusts reactor temperatures, feed rates, and catalyst regeneration cycles based on real-time process data and economic optimisation objectives. Human operators monitor performance dashboards and intervene only when the twin encounters conditions outside its validated operating envelope.

Extending autonomous operation to less controlled environments, such as building energy management, traffic signal control, or water network pressure management, requires robust uncertainty quantification and fail-safe mechanisms. Research on conformal prediction and calibrated uncertainty estimation for digital twin outputs is progressing rapidly and will be essential for regulatory acceptance of autonomous twin decision-making in safety-critical applications.
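
Split conformal prediction, the simplest of the methods mentioned, can be sketched in a few lines: residuals on a held-out calibration set yield an interval width with a finite-sample coverage guarantee. The synthetic "twin predictions" below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend the twin predicts a sensor value with some model error.
n_cal, n_test = 1000, 1000
truth_cal = rng.normal(100, 10, n_cal)
pred_cal = truth_cal + rng.normal(0, 3, n_cal)  # calibration predictions

# Split conformal: the adjusted (1 - alpha) quantile of absolute
# calibration residuals gives a symmetric prediction interval.
alpha = 0.1
scores = np.abs(truth_cal - pred_cal)
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)

# On fresh data, roughly 90% of true values fall within +/- q.
truth_test = rng.normal(100, 10, n_test)
pred_test = truth_test + rng.normal(0, 3, n_test)
covered = np.abs(truth_test - pred_test) <= q
print(abs(covered.mean() - 0.9) < 0.05)  # True
```

For an autonomous twin, the interval width is the actionable quantity: when the twin's prediction interval for a control decision exceeds a validated safety bound, the system escalates to a human operator instead of acting.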

Regulatory-Driven Standardisation

The EU's Data Act and the European Data Strategy include provisions for mandatory data sharing in specific sectors (energy, transport, and public administration) that will create standardised data feeds for digital twin applications. The Construction Products Regulation revision mandates digital product passports for construction materials, providing the data foundation for building-level twins. The ISO 23247 series for manufacturing digital twins, first published in 2021, provides a reference architecture that is being adopted by major automation vendors.

These regulatory and standards developments will reduce the integration costs that currently consume 40 to 60% of digital twin project budgets, shifting the value equation in favour of deployment for a broader range of applications and organisation sizes.

Action Checklist

  • Assess digital twin maturity level for your highest-value assets and identify the specific operational decisions each twin should support
  • Evaluate data infrastructure readiness: sensor coverage, data latency, historical data availability, and integration with existing operational systems
  • Start with a bounded, high-value use case (single production line, critical asset, or well-instrumented building) rather than attempting enterprise-wide deployment
  • Define measurable success criteria before development, including target accuracy, decision lead time improvement, and financial ROI thresholds
  • Investigate synthetic data generation for training ML models where real operational data is scarce, expensive, or privacy-constrained
  • Evaluate physics-informed approaches (PINNs, hybrid models) to reduce data requirements and improve physical plausibility of twin predictions
  • Engage with emerging open standards (DTDL, Asset Administration Shell, ISO 23247) to reduce vendor lock-in and future integration costs
  • Plan for ongoing model maintenance: allocate 20 to 30% of initial development cost annually for model updates, recalibration, and drift monitoring
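
The drift monitoring named in the last item can start very simply: compare recent prediction residuals against the baseline established at commissioning. The z-test below is a minimal sketch; a production monitor would also track variance, distribution shape, and per-sensor health, and the thresholds here are illustrative.

```python
import numpy as np

def drift_alert(baseline_resid, recent_resid, z_threshold=3.0):
    """Flag model drift when the mean of recent prediction residuals
    shifts significantly from the baseline distribution.
    """
    mu = baseline_resid.mean()
    sd = baseline_resid.std(ddof=1)
    n = len(recent_resid)
    z = (recent_resid.mean() - mu) / (sd / np.sqrt(n))
    return bool(abs(z) > z_threshold)

rng = np.random.default_rng(3)
baseline = rng.normal(0.0, 1.0, 5000)  # residuals at calibration time
healthy = rng.normal(0.0, 1.0, 200)    # twin still tracks the asset
drifted = rng.normal(0.8, 1.0, 200)    # asset behaviour has shifted
print(drift_alert(baseline, healthy), drift_alert(baseline, drifted))
```

Persistent alerts are the trigger for the recalibration budget the checklist recommends, rather than a reason to retrain on every excursion.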

FAQ

Q: What is the minimum viable digital twin, and what does it cost? A: A minimum viable digital twin connects real-time operational data from an asset to a computational model capable of answering at least one decision-relevant question (e.g., "when will this component fail?" or "what is the optimal setpoint for current conditions?"). For a single industrial asset (pump, compressor, or heat exchanger), development costs range from $50,000 to $200,000 using commercial platforms, with ongoing costs of $20,000 to $60,000 annually for data infrastructure and model maintenance. For a production line or building, costs scale to $200,000 to $1 million for initial development. Cloud-native platforms such as Azure Digital Twins, AWS IoT TwinMaker, and Bentley iTwin have reduced infrastructure costs by 40 to 60% compared to on-premise approaches.

Q: How does synthetic data compare to real data for training machine learning models? A: When well-calibrated, synthetic data can match or exceed real data performance for specific tasks. Studies across computer vision, natural language processing, and robotics consistently show that models trained on 80% synthetic and 20% real data outperform models trained on 100% real data alone, because synthetic generation enables controlled variation and edge case coverage that real datasets lack. The critical requirement is calibration: the synthetic generation process must accurately represent the domain's physical properties, sensor characteristics, and noise distributions. Poorly calibrated synthetic data introduces systematic biases that degrade performance. Validation against a held-out real dataset is essential.

Q: Which industries are seeing the fastest digital twin ROI? A: Semiconductor manufacturing leads with typical payback periods under 12 months, driven by the extreme cost of yield losses ($100,000+ per wafer batch) and the highly instrumented production environment. Oil and gas upstream operations achieve 12 to 18 month payback through reduced unplanned downtime and optimised production. Electric utilities see 18 to 24 month returns through deferred capital expenditure and improved asset utilisation. Commercial real estate and general manufacturing typically require 24 to 36 months, though this is improving as platform costs decline.

Q: What skills does my organisation need to build and maintain digital twins? A: Core capabilities include domain engineering expertise (understanding the physical system being twinned), data engineering (sensor integration, data pipelines, and quality management), simulation or modelling skills (computational physics, systems dynamics, or machine learning), and software engineering for platform development and integration. Most successful deployments use a hybrid team combining internal domain experts with external platform and modelling specialists. The growing availability of low-code twin development platforms is reducing but not eliminating the need for specialist skills.

Q: How do digital twins interact with generative AI and large language models? A: LLMs are being integrated as natural language interfaces to digital twins, enabling non-specialist users to query twin state, configure simulations, and interpret results through conversational interaction. More substantively, multimodal foundation models are being fine-tuned on physical simulation data to serve as rapid surrogate models, replacing computationally expensive physics simulations with near-instantaneous neural network inference. This integration is nascent but represents a significant opportunity to democratise access to simulation capabilities.

Sources

  • McKinsey Global Institute. (2024). The Economic Potential of Digital Twins: A Cross-Industry Assessment. New York: McKinsey & Company.
  • MarketsandMarkets. (2025). Digital Twin Market: Global Forecast to 2030. Pune: MarketsandMarkets Research.
  • Gartner. (2024). Predicts 2025: Synthetic Data and Simulation Will Reshape AI Development. Stamford, CT: Gartner Inc.
  • National Digital Twin Programme. (2024). Annual Report: Progress, Challenges, and Strategic Priorities. Cambridge: Centre for Digital Built Britain.
  • NVIDIA. (2025). Omniverse and Modulus: Industrial Digital Twin Platform Technical Documentation. Santa Clara, CA: NVIDIA Corporation.
  • Raissi, M., Perdikaris, P., and Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686-707.
  • Building Performance Institute Europe. (2025). Digital Twins for Building Decarbonisation: Adoption, Barriers, and Policy Recommendations. Brussels: BPIE.

