
Interview: the builder's playbook for digital twins, simulation & synthetic data — hard-earned lessons

A practitioner conversation: what surprised them, what failed, and what they'd do differently, with a focus on implementation trade-offs, stakeholder incentives, and the hidden bottlenecks.

The global digital twins market reached $17-25 billion in 2024, yet 67% of manufacturers cite data integration as their primary obstacle to implementation. Meanwhile, 60% of AI and analytics projects now use synthetically generated data—a figure that seemed implausible just three years ago. We spoke with practitioners across aerospace, manufacturing, and energy to understand the hard-earned lessons from deploying digital twins at scale, the hidden bottlenecks that derail projects, and the trade-offs that technology vendors rarely mention.

The gap between vendor promises and operational reality has never been wider. Digital twin implementations that looked straightforward in pilot phases collapse when confronting legacy systems, fragmented data landscapes, and organisational resistance. Synthetic data generation, heralded as the solution to privacy constraints and data scarcity, introduces its own failure modes—including the emerging phenomenon of model collapse. Here's what practitioners wish they had known before their first deployment.

Why It Matters

Digital twins represent a fundamental shift in how engineers model, simulate, and optimise physical systems. The technology enables real-time monitoring, predictive maintenance, and scenario testing without risking physical assets. For UK engineers, the stakes are particularly high: the National Digital Twin Programme (NDTP) is now in its third tranche, with the £37.6 million UK Digital Twin Centre opening in Belfast in May 2025 to accelerate adoption across maritime, aerospace, and defence sectors.

The commercial case is compelling. PepsiCo's deployment of Siemens Digital Twin Composer identified 90% of potential issues before physical modifications, delivered a 20% throughput increase, and reduced capital expenditure by 10-15% through virtual validation. BMW's aerodynamics simulations using Simcenter STAR-CCM+ with NVIDIA Blackwell GPUs compress design iterations from days to hours—achieving the computational equivalent of 10,000+ CPU cores on a single GPU.

Yet the technology readiness level for industrial digital twins sits at just 4.8 out of 9 according to expert panels, indicating significant maturation gaps. For engineers evaluating digital twin investments, understanding where value accrues and where projects fail determines whether implementations generate returns or become expensive technical debt.

Key Concepts

Digital Twin Maturity Levels

Practitioners distinguish between three maturity levels, each with distinct infrastructure requirements and value propositions:

Descriptive twins mirror physical assets through sensor data and basic visualisation. These require IoT connectivity and data pipelines but minimal simulation capability. Most industrial deployments remain at this level.

Predictive twins incorporate physics-based and machine learning models to forecast asset behaviour, maintenance requirements, and failure modes. The computational demands increase substantially, typically requiring cloud or edge computing infrastructure with GPU acceleration.

Prescriptive twins autonomously recommend or implement optimisations based on continuous analysis. Siemens' vision of "adaptive manufacturing" at their Erlangen facility—announced for 2026 deployment—represents this frontier, where factories continuously analyse digital twins, test improvements virtually, and deploy validated changes to the shop floor.
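To make the progression concrete, here is a minimal Python sketch of the three levels, assuming a single stream of sensor readings. The class names, the rolling-trend "model", and the maintenance threshold are all illustrative, not any vendor's API; a production predictive twin would substitute a physics-based or machine learning model for the trend line.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    sensor_id: str
    value: float      # e.g. bearing temperature in degrees C
    timestamp: float  # seconds since epoch


class DescriptiveTwin:
    """Level 1: mirror the asset by ingesting readings and exposing current state."""

    def __init__(self) -> None:
        self.state: dict[str, float] = {}
        self.history: list[SensorReading] = []

    def ingest(self, reading: SensorReading) -> None:
        self.state[reading.sensor_id] = reading.value
        self.history.append(reading)


class PredictiveTwin(DescriptiveTwin):
    """Level 2: forecast on top of the mirrored state (toy trend model)."""

    def forecast(self, sensor_id: str, horizon: int = 10) -> float:
        values = [r.value for r in self.history if r.sensor_id == sensor_id]
        if len(values) < 2:
            return self.state.get(sensor_id, 0.0)
        trend = (values[-1] - values[0]) / (len(values) - 1)
        return values[-1] + trend * horizon  # naive extrapolation


class PrescriptiveTwin(PredictiveTwin):
    """Level 3: turn forecasts into recommended actions against a limit."""

    def recommend(self, sensor_id: str, limit: float) -> str:
        predicted = self.forecast(sensor_id)
        if predicted > limit:
            return f"schedule maintenance: {sensor_id} projected at {predicted:.1f}"
        return "no action required"


twin = PrescriptiveTwin()
for t, temp in enumerate([70.0, 71.5, 73.0, 74.5]):
    twin.ingest(SensorReading("bearing_temp", temp, float(t)))
print(twin.recommend("bearing_temp", limit=85.0))  # projects past 85, so maintenance
```

The point of the hierarchy is that each level reuses the one below it: prescriptive recommendations are only as good as the forecast, which is only as good as the mirrored state.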

Synthetic Data Generation Methods

Synthetic data has become essential for training AI models where real data is scarce, privacy-constrained, or imbalanced. The primary generation approaches include:

Generative Adversarial Networks (GANs) accounted for 38.2% of synthetic data market revenue in 2024, creating realistic samples through competing neural networks. GANs excel at image and tabular data but require careful tuning to avoid mode collapse.

Diffusion models represent the fastest-growing segment at 47.6% CAGR through 2030. These models generate high-fidelity outputs by learning to reverse a gradual noising process, producing superior image quality compared to GANs for many applications.

Physics-based simulation generates synthetic data through computational models of physical processes. This approach dominates autonomous vehicle testing, where companies like Tesla and Waymo generate driving scenarios that would be dangerous or impossible to capture in the real world.
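Of the three approaches, physics-based generation is the easiest to illustrate end to end. The sketch below uses an entirely illustrative "bearing" model; the frequencies, amplitudes, and noise levels are placeholders. It shows the pattern practitioners describe: encode the physics, perturb the parameters, inject the rare fault, and sample at volume.

```python
import numpy as np


def synthetic_vibration(n_samples: int, fault: bool, seed: int = 0) -> np.ndarray:
    """Generate synthetic accelerometer traces from a toy physical model.

    A healthy bearing is modelled as a 50 Hz fundamental plus Gaussian noise;
    a fault injects a harmonic at an (illustrative) defect frequency. Real
    pipelines use far richer physics, but the pattern is the same:
    model, perturb, sample.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, 2048)  # one second at roughly 2 kHz
    traces = []
    for _ in range(n_samples):
        amp = rng.uniform(0.8, 1.2)               # unit-to-unit variation
        signal = amp * np.sin(2 * np.pi * 50 * t)
        if fault:
            signal += 0.3 * np.sin(2 * np.pi * 157 * t)  # defect harmonic
        signal += rng.normal(0.0, 0.1, t.shape)          # sensor noise
        traces.append(signal)
    return np.stack(traces)


# Rare fault cases that would be costly or dangerous to capture on real machines:
faulty = synthetic_vibration(n_samples=500, fault=True)
healthy = synthetic_vibration(n_samples=500, fault=False)
```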

The Model Collapse Problem

A critical finding from 2024 research: training AI models exclusively on synthetic data leads to degradation over successive generations. As synthetic data propagates through training pipelines, statistical properties diverge from real-world distributions. Nature published research demonstrating that models trained on AI-generated data experience "model collapse"—a progressive loss of fidelity that compounds with each generation. Practitioners must maintain real data anchors and implement quality control mechanisms to avoid this failure mode.
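The mechanism is easy to reproduce in miniature. The toy below, which fits a Gaussian to its own samples generation after generation, is not the Nature experiment itself, but it exhibits the same failure: the fitted distribution progressively loses spread, while a modest real-data anchor holds it in place. The sample sizes and the 40% anchor fraction are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
REAL = rng.normal(loc=10.0, scale=2.0, size=200)  # ground-truth distribution

def fit_and_sample(data: np.ndarray, n: int = 200) -> np.ndarray:
    """'Train' a trivial generative model (fit a Gaussian) and sample from it."""
    return rng.normal(data.mean(), data.std(), size=n)

# Synthetic-only pipeline: each generation trains on the previous one's output.
data = REAL
for _ in range(1000):
    data = fit_and_sample(data)
print(f"synthetic-only after 1000 generations: std = {data.std():.2f}")  # typically far below 2.0

# Anchored pipeline: every generation keeps ~40% real data in the training mix.
data = REAL
for _ in range(1000):
    data = np.concatenate([REAL[:80], fit_and_sample(data, n=120)])
print(f"with a 40% real-data anchor: std = {data.std():.2f}")  # stays near 2.0
```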

What's Working

Siemens-NVIDIA Industrial AI Stack

The Siemens-NVIDIA partnership, expanded at CES 2026, demonstrates digital twin deployment at industrial scale. Their Digital Twin Composer unifies 2D/3D digital twin data with real-time physical information using NVIDIA Omniverse libraries, available on the Siemens Xcelerator Marketplace from mid-2026.

The technical stack enables physics-accurate simulation across the product lifecycle. NVIDIA PhysicsNeMo powers autonomous digital twins for real-time engineering design, while Industrial Copilot for Operations—running on NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs—delivers a 30% reduction in reactive maintenance time for shopfloor operators.

HD Hyundai, the world's largest shipbuilder, uses this stack to manage 7+ million discrete parts in real time for next-generation hydrogen and ammonia-powered vessels. Design iteration time has compressed from days to hours through generative AI-assisted workflows.

UK National Digital Twin Programme

The NDTP has progressed to Tranche 3 as of March 2025, focusing on scaling and operationalisation across infrastructure sectors. The programme's Integration Architecture enables cross-sector interoperability through the Information Management Framework (IMF) and Gemini Principles.

A Memorandum of Understanding with the National Energy System Operator, signed in 2024, demonstrates practical application in critical national infrastructure. The Digital Twin Hub community platform provides practitioners with standards, frameworks, and peer learning opportunities.

The UK Digital Twin Centre in Belfast offers physical facilities including a 360-degree immersive space and laboratory spanning six enabling technology areas: intelligence, data services, immersive user experience, cyber-physical systems, integration, and security. Partners include Thales UK, Spirit AeroSystems, and Artemis Technologies.

Commonwealth Fusion Systems SPARC Reactor

Announced in January 2026, Commonwealth Fusion Systems' partnership with Siemens and NVIDIA to build an AI-powered digital twin of the SPARC fusion machine represents digital twin technology applied to frontier engineering challenges. The integration of Siemens NX software with NVIDIA Omniverse libraries aims to "compress years of manual experimentation into weeks of virtual optimisation."

The same digital twin technology extends to CFS's magnet factory operations in Devens, Massachusetts, demonstrating how organisations can leverage digital twins across both R&D and manufacturing operations simultaneously.

What's Not Working

Data Integration Complexity

Sixty-seven percent of manufacturers rank data integration as their primary obstacle. Fragmented, siloed data landscapes across enterprise systems create fundamental barriers before any simulation or AI capability can be deployed.

Legacy equipment—often 20-30 years old—lacks IoT connectivity and requires middleware, protocol translation, and significant custom engineering. A senior integration engineer at a UK aerospace manufacturer describes the challenge: "We spent eighteen months just getting data flowing from legacy PLCs before we could build any digital twin capability. The vendor timeline assumed modern, connected equipment."

Multimodal data fusion—combining spatial, temporal, and operational data streams—remains technically challenging. Real-time synchronisation across geographically distributed assets requires substantial edge computing infrastructure that most organisations underestimate.
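One concrete slice of the fusion problem is temporal alignment: sensor streams arrive at different rates and must be joined onto a common timeline without fabricating stale matches. A minimal pandas sketch, with illustrative stream names, rates, and values:

```python
import pandas as pd

# Two asynchronous streams from the same asset (timestamps are illustrative):
vib = pd.DataFrame({
    "ts": pd.date_range("2025-01-01 00:00:00", periods=10, freq="100ms"),
    "vibration_rms": [0.41, 0.42, 0.44, 0.43, 0.47, 0.52, 0.55, 0.54, 0.58, 0.61],
})
plc = pd.DataFrame({
    "ts": pd.date_range("2025-01-01 00:00:00", periods=2, freq="500ms"),
    "temperature_c": [71.2, 71.9],
})

# Align each vibration sample with the most recent PLC reading, but only
# if it is fresh enough; stale matches become NaN rather than silent lies.
fused = pd.merge_asof(
    vib.sort_values("ts"),
    plc.sort_values("ts"),
    on="ts",
    direction="backward",
    tolerance=pd.Timedelta("600ms"),
)
print(fused.head())
```

The tolerance parameter is the important design choice: without it, an as-of join silently pairs readings that are arbitrarily far apart in time.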

Skills and Talent Shortage

Digital twin implementation demands multidisciplinary expertise spanning IoT, AI/ML, data science, cybersecurity, and domain-specific engineering knowledge. Korn Ferry projects a 4.3 million tech-skilled worker deficit by 2030, with digital twin specialists among the scarcest roles.

The UK Digital Twin Centre's Digital Twin Adoption Accelerator—offering up to £100k Innovate UK grants—explicitly addresses this gap through industry-SME partnerships. But practitioners report that even with funding, finding qualified personnel extends project timelines by 6-12 months.

Synthetic Data Quality Control

While 60% of AI projects now incorporate synthetic data, quality assurance remains immature. The model collapse phenomenon documented in Nature research demonstrates systematic risks when synthetic data dominates training pipelines.

Ensuring statistical fidelity to real-world distributions requires ongoing validation, and most organisations lack the processes to perform it. Bias amplification—where synthetic data generation magnifies existing dataset biases—creates regulatory risk under the EU AI Act's requirements for training data quality.
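A lightweight starting point for such validation is a per-feature two-sample Kolmogorov-Smirnov test between real and synthetic marginals. The sketch below is a coarse screen, not a complete fidelity suite; the feature names and alpha threshold are illustrative.

```python
import numpy as np
from scipy.stats import ks_2samp


def fidelity_report(real: np.ndarray, synthetic: np.ndarray,
                    feature_names: list[str], alpha: float = 0.01) -> list[str]:
    """Flag features whose synthetic marginal diverges from the real one.

    A per-feature KS test catches marginal drift but not broken
    correlations, so treat a pass as necessary, not sufficient.
    """
    flagged = []
    for i, name in enumerate(feature_names):
        stat, p_value = ks_2samp(real[:, i], synthetic[:, i])
        if p_value < alpha:
            flagged.append(f"{name}: KS={stat:.3f}, p={p_value:.2e}")
    return flagged


# Toy example: one faithful feature, one subtly shifted feature.
rng = np.random.default_rng(0)
real = rng.normal(0, 1, size=(5000, 2))
synth = np.column_stack([rng.normal(0, 1, 5000), rng.normal(0.3, 1, 5000)])
print(fidelity_report(real, synth, ["pressure", "flow_rate"]))  # flags flow_rate
```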

NVIDIA's March 2025 acquisition of Gretel signals market recognition that synthetic data generation requires sophisticated tooling. But practitioners report six-figure monthly GPU bills for high-quality synthetic data generation at scale, with ROI justification remaining challenging for applications beyond autonomous vehicle simulation.

Standardisation Gaps

The absence of universal consensus on digital twin definitions, components, and interfaces creates vendor lock-in and interoperability failures. The NDTP's Information Management Framework addresses this at the national level, but cross-vendor compatibility remains limited.

Building code approval for simulation-validated designs lags behind technical capability. A structural engineer working on construction digital twins notes: "We can simulate building performance with high fidelity, but regulatory acceptance still requires physical testing. The simulation saves engineering time, not approval time."

Key Players

Established Leaders

Siemens operates the most comprehensive industrial digital twin portfolio through Xcelerator, integrating Teamcenter PLM, NX Designcenter, and Simcenter simulation tools. The January 2026 Digital Twin Composer launch unifies their stack with NVIDIA Omniverse for physics-accurate simulation.

NVIDIA provides the computational substrate for industrial digital twins through Omniverse libraries, CUDA-X acceleration, and AI frameworks including PhysicsNeMo and NIM microservices. Their Blackwell GPU architecture delivers 25x AI execution acceleration for industrial applications.

Dassault Systèmes offers 3DEXPERIENCE as an integrated platform spanning CAD, simulation, and lifecycle management. Their strength lies in aerospace and automotive applications with deep physics modelling capabilities.

Microsoft Azure Digital Twins provides cloud-native digital twin infrastructure with strong integration into enterprise IT environments. The platform excels at building management and smart infrastructure applications.

PTC ThingWorx focuses on industrial IoT connectivity and augmented reality applications, with particular strength in field service and maintenance use cases.

Emerging Startups

Cosmo Tech delivers decision-intelligence through simulation digital twins, enabling scenario modelling for complex systems including supply chains, energy networks, and urban infrastructure.

Cognite provides industrial data operations through Data Fusion, contextualising operational data for digital twin applications. Strong adoption in energy and manufacturing sectors.

Akselos offers reduced-order modelling for structural digital twins, enabling real-time simulation of large infrastructure assets like offshore wind turbines and bridges.

Synthesis AI generates photorealistic synthetic data for computer vision training, with applications spanning automotive, retail, and security sectors.

Mostly AI leads in tabular synthetic data generation for financial services and healthcare, where privacy regulations constrain real data usage.

Key Investors & Funders

Innovate UK provides funding through programmes including the Digital Twin Adoption Accelerator (up to £100k grants) and Academic R&D Accelerator.

Belfast Region City Deal contributed £37.6 million to establish the UK Digital Twin Centre, demonstrating regional economic development through digital twin infrastructure.

European Investment Bank and EU Innovation Fund support large-scale industrial digital twin projects, with Siemens committing €500 million for industrial AI cloud infrastructure deploying NVIDIA RTX PRO Servers and DGX B200 systems.

Breakthrough Energy Ventures and Lowercarbon Capital back digital twin startups addressing climate and sustainability applications.

Action Checklist

  1. Audit your data landscape before evaluating vendors. Map existing data sources, formats, and connectivity across target assets. Most of the 67% of manufacturers citing integration as their primary obstacle discovered the gap only after vendor selection.

  2. Start with high-ROI use cases demonstrating clear value. Predictive maintenance and bottleneck prediction offer measurable returns. Avoid ambitious prescriptive automation until descriptive and predictive capabilities prove reliable.

  3. Budget for skills development alongside technology. Allocate 20-30% of digital twin investment to training and potentially external expertise. The 4.3 million worker deficit means internal capability building is essential.

  4. Implement synthetic data quality controls from day one. Maintain real data anchors, establish statistical validation processes, and monitor for distribution drift (a minimal drift check appears after this checklist). Model collapse is progressive and difficult to reverse once embedded in training pipelines.

  5. Engage with NDTP frameworks and Digital Twin Hub resources. The Integration Architecture and Information Management Framework provide standardised approaches that reduce vendor lock-in and improve interoperability.

  6. Plan for edge computing requirements. Real-time digital twins require local processing capacity that cloud-only architectures cannot deliver. Assess latency, bandwidth, and reliability requirements for your specific use cases.

  7. Pilot with partners through UK Digital Twin Centre programmes. The Adoption Accelerator and SME Network provide funded opportunities to test digital twin approaches with industry guidance before major capital commitments.

  8. Establish governance for simulation-validated decisions. Define which decisions can rely on digital twin outputs and which require physical validation. Regulatory acceptance lags technical capability in most domains.
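As referenced in item 4, a minimal drift check can be as simple as the Population Stability Index (PSI) between a real-data baseline and each new synthetic batch. The sketch below assumes one-dimensional features; the bin count and the rule-of-thumb thresholds in the comments are common conventions, not standards.

```python
import numpy as np


def population_stability_index(expected: np.ndarray, actual: np.ndarray,
                               bins: int = 10) -> float:
    """PSI between a baseline sample and a current sample.

    Bins come from the baseline's quantiles; a small epsilon avoids division
    by zero in sparse bins. Common rules of thumb: < 0.1 stable,
    0.1-0.25 moderate shift, > 0.25 significant drift.
    """
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values
    eps = 1e-6
    p = np.histogram(expected, bins=edges)[0] / len(expected) + eps
    q = np.histogram(actual, bins=edges)[0] / len(actual) + eps
    return float(np.sum((p - q) * np.log(p / q)))


# Baseline = real data at commissioning; current = this week's synthetic batch.
rng = np.random.default_rng(1)
baseline = rng.normal(0.0, 1.0, 10_000)
current = rng.normal(0.25, 1.1, 10_000)  # a deliberately drifted batch
print(f"PSI = {population_stability_index(baseline, current):.3f}")
```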

FAQ

Q: What's the realistic timeline for digital twin deployment, and how does it compare to vendor estimates?

Practitioners consistently report 2-3x vendor timelines for meaningful deployment. A descriptive digital twin connecting existing IoT infrastructure might deploy in 3-6 months. Predictive twins requiring new sensor deployment, data pipeline construction, and model development typically require 12-24 months. Prescriptive twins with autonomous optimisation remain multi-year programmes. The critical variable is existing data infrastructure maturity—organisations with connected, well-documented asset data proceed significantly faster than those requiring foundational IoT deployment.

Q: How do we justify digital twin ROI to leadership when benefits are difficult to quantify upfront?

Successful business cases focus on specific, measurable outcomes rather than platform capabilities. PepsiCo's 20% throughput increase and 10-15% CapEx reduction through virtual validation provide reference points. For UK manufacturers, the 30% reduction in reactive maintenance time from Siemens Industrial Copilot offers a conservative baseline. Structure pilots to deliver measurable value within 6-9 months, then expand based on demonstrated returns. Avoid multi-year platform investments without intermediate value gates.

Q: When should we use synthetic data versus investing in real data collection?

Synthetic data excels in three scenarios: privacy-constrained domains (healthcare, financial services), rare event augmentation (fraud detection, failure prediction), and simulation-based training (autonomous systems, robotics). For standard industrial applications with accessible operational data, real data remains superior. The 60% of AI projects using synthetic data reflects privacy constraints and rare event challenges more than general preference. Critically, synthetic data should augment rather than replace real data to avoid model collapse risks. Maintain at least 30-40% real data in training pipelines as an anchor.
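A simple way to enforce that floor mechanically is to cap the synthetic rows admitted relative to the real rows available, rather than padding with duplicated real data. A minimal sketch with an illustrative 35% floor:

```python
import numpy as np


def blend_training_set(real: np.ndarray, synthetic: np.ndarray,
                       min_real_fraction: float = 0.35,
                       seed: int = 0) -> np.ndarray:
    """Assemble a training set that never drops below a real-data floor."""
    rng = np.random.default_rng(seed)
    max_synth = int(len(real) * (1 - min_real_fraction) / min_real_fraction)
    if len(synthetic) > max_synth:
        keep = rng.choice(len(synthetic), max_synth, replace=False)
        synthetic = synthetic[keep]  # subsample the synthetic surplus
    blended = np.concatenate([real, synthetic])
    rng.shuffle(blended)  # shuffle rows so batches mix real and synthetic
    return blended


# 1,000 real rows admit at most 1,857 synthetic rows at a 35% floor.
real = np.random.default_rng(2).normal(size=(1000, 4))
synth = np.random.default_rng(3).normal(size=(5000, 4))
print(blend_training_set(real, synth).shape)  # (2857, 4)
```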

Q: How do we address cybersecurity concerns with interconnected digital twins?

Digital twins create attack surfaces that require security architecture from inception rather than retrofit. Siemens' integration of NVIDIA BlueField DPUs demonstrates hardware-level security for operational technology environments. Practical measures include: network segmentation isolating digital twin infrastructure from operational systems, role-based access controls limiting simulation and optimisation permissions, encrypted data channels for all sensor and command traffic, and regular penetration testing by specialists familiar with industrial control systems. The NDTP's security guidance provides UK-specific frameworks aligned with NCSC recommendations.

Q: What's the relationship between digital twins and AI agents—are these complementary or competing approaches?

Digital twins and AI agents are increasingly convergent rather than competing. The Siemens-NVIDIA "AI Brain" concept demonstrates this integration: AI agents continuously analyse digital twin state, test interventions through simulation, and deploy validated changes to physical systems. LLM-based agents provide natural language interfaces to digital twin data—Siemens' Industrial Copilot allows shopfloor operators to query complex operational data conversationally. The emerging architecture uses digital twins as the world model that AI agents act within, enabling safe exploration and validation before physical action. For UK engineers, this convergence suggests evaluating digital twin platforms for AI agent integration capabilities rather than treating them as separate technology decisions.
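The control flow behind that convergence is compact enough to sketch. In the toy below every function and the throughput model are illustrative stand-ins, not any vendor's API; it shows the simulate-before-deploy pattern, where the agent explores candidate actions in the twin and only surfaces the best one that passes validation.

```python
def simulate(state: dict, action: dict) -> dict:
    """Stand-in twin: predict the state an action would produce."""
    speed = state["line_speed"] + action.get("speed_delta", 0.0)
    return {"line_speed": speed, "throughput": speed * 0.95,
            "temp": 60.0 + speed * 0.5}


def is_safe(state: dict) -> bool:
    return state["temp"] < 85.0  # illustrative thermal limit


def choose_action(state: dict, candidates: list[dict]) -> dict | None:
    """Return the best candidate the twin validates as safe, else None."""
    best, best_throughput = None, state["throughput"]
    for action in candidates:
        predicted = simulate(state, action)  # explore inside the twin...
        if is_safe(predicted) and predicted["throughput"] > best_throughput:
            best, best_throughput = action, predicted["throughput"]
    return best                              # ...deploy only what passed


state = {"line_speed": 40.0, "throughput": 38.0, "temp": 80.0}
candidates = [{"speed_delta": 5.0}, {"speed_delta": 15.0}]
print(choose_action(state, candidates))  # the +15 option is rejected as unsafe
```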

The Bottom Line

The convergence of digital twins, simulation, and synthetic data represents a fundamental transformation in how engineers design, build, and operate physical systems. The technology has matured beyond proof-of-concept, with industrial deployments delivering measurable returns. Yet the gap between vendor capabilities and operational reality remains substantial. Engineers who approach digital twin implementation with clear-eyed assessment of their data infrastructure, skills capacity, and governance requirements will capture disproportionate value as the market grows from $25 billion today toward $260 billion by 2032. Those who underestimate integration complexity, skills requirements, and standardisation gaps will contribute to the substantial failure rate that practitioners describe—but rarely vendors acknowledge.
