AI & Emerging Tech · 14 min read

Interview: Practitioners on Digital twins, simulation & synthetic data — what they wish they knew earlier

Candid insights from practitioners working in Digital twins, simulation & synthetic data, sharing hard-won lessons, common pitfalls, and the advice they wish someone had given them at the start.

Digital twins have moved from engineering curiosity to enterprise imperative in under a decade. Gartner estimates that by 2027, over 50% of large industrial enterprises will use digital twins for operational optimization, up from approximately 13% in 2023. Yet behind the vendor presentations and conference keynotes, practitioners who have actually built and deployed digital twin systems at scale describe a reality that is far messier, more expensive, and more organizationally challenging than the technology narratives suggest. We spoke with six practitioners across infrastructure, manufacturing, energy, and urban planning who shared the lessons they learned the hard way, so others can avoid the same pitfalls.

Why It Matters

The global digital twin market reached $16.8 billion in 2025, according to MarketsandMarkets, with projections suggesting $110 billion by 2030. This growth is driven by converging forces: the proliferation of IoT sensors generating real-time operational data, advances in cloud computing that make large-scale simulation affordable, and regulatory requirements that demand more granular monitoring and reporting of infrastructure performance. The European Green Deal's building renovation wave, for example, explicitly encourages digital twin adoption for energy performance monitoring. Singapore's Virtual Singapore initiative has become a template for urban digital twins globally, and India's Smart Cities Mission has funded digital twin pilots in 12 municipalities.

In emerging markets, the opportunity is particularly compelling. Rapid urbanization is creating infrastructure at a pace that outstrips traditional engineering oversight capacity. Lagos, Dhaka, and Jakarta are adding millions of square meters of built environment annually, often without the institutional capacity to monitor and optimize performance over building lifetimes. Digital twins offer the potential to compress decades of operational learning into months. But that potential is only realized when implementation avoids the mistakes that have plagued early adopters in developed markets.

The stakes extend beyond efficiency. Synthetic data generated from digital twin simulations is increasingly used to train AI models for climate risk assessment, disaster response planning, and infrastructure resilience. The quality of these downstream applications depends entirely on the fidelity of the underlying twin, making implementation quality a foundational concern for the entire climate adaptation technology stack.

The Practitioners

Dr. Amara Osei, Director of Infrastructure Analytics at a West African utility consortium, has spent four years deploying digital twins for water distribution networks across Ghana and Nigeria. Her team manages twins covering 2,400 kilometers of pipe network serving 8 million people.

Rajesh Krishnamurthy, VP of Digital Manufacturing at a Tier 1 automotive supplier headquartered in Chennai, oversaw the rollout of factory digital twins across 14 production facilities in India, Thailand, and Indonesia.

Sofia Vasquez, Principal Engineer at a Latin American renewable energy developer, built digital twins for 3.2 GW of wind and solar assets spanning Chile, Colombia, and Brazil.

Dr. Ibrahim Al-Rashidi, Smart City Program Director for a Gulf state municipal authority, leads one of the most ambitious urban digital twin deployments outside Singapore, covering 340 square kilometers of built environment.

Maria Chen, Head of Simulation Engineering at a global reinsurance firm, develops catastrophe models and synthetic climate data used for pricing climate risk across emerging market portfolios.

Kwame Mensah, CTO of a Nairobi-based startup building low-cost digital twins for informal settlement infrastructure mapping in East Africa.

What They Wish They Knew Earlier

The Data Problem Is 80% of the Work

Every practitioner we spoke with emphasized the same point: the technology for building digital twins is mature, but the data infrastructure required to feed them is not.

Amara Osei described her experience bluntly: "We assumed the hard part would be the simulation engine. It was not. The hard part was discovering that 60% of our pipe network had no accurate as-built records. We spent 18 months digitizing paper records, conducting physical surveys, and reconciling conflicting data from three different municipal agencies before we could build a twin that was accurate enough to be useful."

Rajesh Krishnamurthy encountered a different version of the same problem: "Our factories had thousands of sensors, but the data was in 14 different formats across 6 different vendor platforms. We spent $2.3 million and nine months on data integration before the digital twin software licensing even mattered. No vendor slide deck mentions this cost."
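The integration work Krishnamurthy describes is, at its core, mapping every vendor payload onto one canonical schema. A minimal sketch of that pattern in Python, with hypothetical vendor formats, field names, and units (the real mappings would number in the dozens per platform):

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class SensorReading:
    """Common schema that all vendor payloads are mapped into (illustrative)."""
    asset_id: str
    timestamp: datetime   # always UTC
    metric: str           # canonical metric name, e.g. "spindle_temp_c"
    value: float

def from_vendor_a(raw: dict) -> SensorReading:
    # Hypothetical Vendor A: epoch milliseconds, temperature in Fahrenheit
    return SensorReading(
        asset_id=raw["machineId"],
        timestamp=datetime.fromtimestamp(raw["ts"] / 1000, tz=timezone.utc),
        metric="spindle_temp_c",
        value=(raw["tempF"] - 32) * 5 / 9,
    )

def from_vendor_b(raw: dict) -> SensorReading:
    # Hypothetical Vendor B: ISO-8601 timestamps, Celsius already
    return SensorReading(
        asset_id=raw["asset"],
        timestamp=datetime.fromisoformat(raw["time"]),
        metric="spindle_temp_c",
        value=float(raw["temp_c"]),
    )

a = from_vendor_a({"machineId": "CNC-07", "ts": 1700000000000, "tempF": 98.6})
b = from_vendor_b({"asset": "CNC-07", "time": "2023-11-14T22:13:20+00:00", "temp_c": 37.0})
```

The expensive part is not the code itself but discovering, for each of the vendor formats, which fields exist, what units they use, and how reliable they are.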

The pattern holds across sectors and geographies. Maria Chen noted that synthetic data quality is bounded by input data quality: "Our catastrophe models for Southeast Asian typhoon risk were initially producing unreliable loss estimates because the building inventory data we fed the twin was incomplete. We had building footprints from satellite imagery but no structural information. The twin could simulate the hazard perfectly but could not assess vulnerability without knowing what the buildings were actually made of."

Start With a Specific Use Case, Not a Platform

Several practitioners warned against the "platform trap," where organizations invest in comprehensive digital twin platforms before identifying specific, measurable use cases that justify the investment.

Sofia Vasquez learned this lesson at significant cost: "Our initial approach was to build a complete digital twin of each wind farm including turbines, substations, access roads, and environmental monitoring. The vendor estimated 18 months and $4.5 million. After two years and $7.2 million, we had a beautiful visualization that no one used for operational decisions. We scrapped it and rebuilt with a narrow focus: turbine drivetrain condition monitoring and predictive maintenance. That focused twin cost $800,000, was operational in five months, and reduced unplanned downtime by 31%."

Ibrahim Al-Rashidi echoed this advice: "We initially tried to build a single urban digital twin covering transportation, energy, water, and waste. The integration complexity was enormous. We pivoted to building separate domain-specific twins with defined data exchange interfaces. Each domain twin delivered value independently while we solved integration challenges incrementally."

Kwame Mensah applied this principle from the outset, constrained by limited funding: "We could not afford a platform approach. We built a minimal twin using satellite imagery, crowdsourced building data from community health workers using mobile phones, and open-source simulation tools. It captures building density, flood risk, and water access points. It is not sophisticated, but it directly informs infrastructure investment decisions for 400,000 residents."

Organizational Change Is Harder Than Technology

The most consistently underestimated challenge was not technical but organizational. Digital twins generate insights that challenge established decision-making processes, and organizations frequently resist changing those processes even when the evidence is clear.

Amara Osei described a common dynamic: "Our twin identified that 35% of the energy consumed by our pumping stations was wasted due to suboptimal scheduling. The system recommended changing pump activation sequences and timing. The operations team rejected the recommendations for six months because they conflicted with schedules that had been used for 15 years. We had to run the new schedules in parallel, demonstrate the savings on a single station, and then slowly expand. The technology was ready in month three. The organization was ready in month fourteen."

Rajesh Krishnamurthy found that resistance came from unexpected directions: "Our factory managers saw the digital twin as a surveillance tool, not an optimization tool. They worried that real-time visibility into production efficiency would be used to evaluate their performance negatively. We had to completely reframe the initiative, giving plant managers ownership of their own twin's insights and ensuring that initial use cases focused on helping them solve problems they already had, not exposing problems they preferred to manage quietly."

Synthetic Data Requires More Validation Than Real Data

The promise of synthetic data is that it can augment limited real-world datasets, enabling AI training in scenarios where historical data is sparse. But practitioners cautioned that synthetic data introduces subtle biases that are difficult to detect.

Maria Chen described a critical failure: "We generated synthetic flood event data for a coastal city in the Philippines using our digital twin. The simulation produced 10,000 synthetic events that we used to train a rapid damage assessment model. When Typhoon Kristine hit in 2024, the model underestimated residential damage by 42%. Investigation revealed that our synthetic data assumed uniform building vulnerability within census tracts, masking the extreme variation in construction quality that exists block by block. The twin's resolution was too coarse for the decision it was being used to inform."

Sofia Vasquez encountered similar issues with wind resource modeling: "We used computational fluid dynamics simulations to generate synthetic wind data for prospective sites. The synthetic data consistently overpredicted annual energy production by 8-14% compared to subsequent met mast measurements. The simulation did not adequately capture local terrain-induced turbulence that real sensors measured. We now treat synthetic data as a hypothesis that must be validated against a minimum of 12 months of physical measurement before informing investment decisions."
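Vasquez's rule, treating synthetic output as a hypothesis to be checked against physical measurement, amounts to a simple bias calculation once met-mast data exists. A sketch with illustrative numbers, not her firm's actual figures:

```python
import numpy as np

def production_bias(synthetic_aep, measured_aep):
    """Mean relative overprediction of synthetic annual energy production
    versus measurement-derived estimates (positive = overprediction)."""
    synthetic_aep = np.asarray(synthetic_aep, dtype=float)
    measured_aep = np.asarray(measured_aep, dtype=float)
    return float(np.mean((synthetic_aep - measured_aep) / measured_aep))

# Illustrative only: CFD-derived AEP vs 12-month met-mast estimates (GWh)
synthetic = [105.0, 98.0, 112.0]
measured = [95.0, 90.0, 101.0]
bias = production_bias(synthetic, measured)  # ~0.10, i.e. ~10% overprediction
```

A systematic positive bias of this kind, rather than symmetric noise, is the signature of a simulator that is missing a physical effect such as terrain-induced turbulence.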

Emerging Markets Have Unique Advantages

While most digital twin narratives focus on developed market applications, several practitioners argued that emerging markets offer distinctive advantages for deployment.

Kwame Mensah made the case directly: "In Nairobi, we do not have legacy systems to integrate with. There are no 30-year-old SCADA systems or proprietary vendor lock-ins. We can build on open standards from scratch. Our twin runs on open-source software with cloud computing costs under $200 per month. It is less precise than a Singapore-scale deployment, but it is also 1,000 times cheaper per capita served."

Ibrahim Al-Rashidi noted the advantage of building infrastructure and its digital twin simultaneously: "When you are constructing a new district from the ground up, you can embed sensors during construction at 10-20% of the cost of retrofitting them later. Our new developments have complete digital documentation from day one. The twin is not an afterthought added to existing infrastructure. It is a foundational layer designed in parallel with the physical asset."

Amara Osei highlighted workforce development: "We trained 45 local engineers in digital twin development and data science over three years. These skills transfer directly to other infrastructure challenges. The twin became a platform for building institutional capacity, not just operational efficiency."

Common Pitfalls

Underestimating ongoing maintenance costs. Every practitioner reported that ongoing twin maintenance (data pipeline monitoring, model recalibration, software updates) consumed 25-40% of the initial implementation cost annually. Organizations that budget only for implementation face degrading twin accuracy within 12-18 months.

Overreliance on vendor interoperability claims. Despite widespread adoption of standards like IFC for buildings and CityGML for urban environments, practitioners reported that real-world interoperability between vendor platforms remains poor. Rajesh Krishnamurthy noted: "Every vendor claims standards compliance. In practice, data exchange between platforms required custom middleware that cost as much as the platforms themselves."

Ignoring edge cases in simulation. Digital twins excel at modeling normal operating conditions but frequently fail to capture extreme events or rare failure modes that matter most for resilience planning. Maria Chen emphasized: "A twin that models the 99th percentile scenario is useful for optimization. But climate adaptation requires modeling the 99.9th percentile, and that is where most twins lack fidelity because extreme events are by definition underrepresented in training data."

Treating the twin as a static deliverable. A digital twin is not a project with a completion date. It is a living system that must evolve as the physical asset it represents changes. Organizations that treat twin development as a one-time capital expenditure rather than an ongoing operational capability consistently underperform.

Key Takeaways for Executives

The practitioners interviewed converged on five recommendations for organizations beginning digital twin deployments:

First, invest 50-60% of the total project budget in data infrastructure, integration, and quality assurance. The simulation software is the least expensive and least challenging component.

Second, define a specific, measurable use case with a quantified business value before selecting technology. The use case should deliver measurable ROI within 12 months to build organizational support for broader deployment.

Third, plan for organizational change management with the same rigor as technical implementation. Assign dedicated resources to stakeholder engagement, training, and workflow redesign.

Fourth, treat synthetic data as a complement to, not a substitute for, physical measurement. Establish validation protocols that require synthetic outputs to be benchmarked against real-world data before informing investment or operational decisions.

Fifth, consider emerging market constraints as design advantages. Limited legacy infrastructure, lower labor costs for data collection, and the ability to embed digital capabilities in new construction from inception can produce deployments that are both more affordable and more effective than developed market retrofits.

Action Checklist

  • Conduct a data readiness assessment covering sensor coverage, data formats, historical record quality, and integration requirements before evaluating digital twin platforms
  • Define three specific use cases with quantified expected value and rank them by implementation complexity and business impact
  • Budget 25-40% of implementation cost annually for ongoing maintenance, recalibration, and data pipeline operations
  • Develop an organizational change management plan that addresses workforce concerns, training needs, and decision-process redesign
  • Establish synthetic data validation protocols requiring comparison against physical measurements before operational use
  • Evaluate open-source alternatives (OpenTwin, Eclipse Ditto, Apache StreamPipes) before committing to proprietary platforms
  • Build internal digital twin expertise through training programs rather than relying entirely on external vendors
  • Start with a pilot deployment on a single asset or facility, demonstrate value within 12 months, then expand

FAQ

Q: What is a realistic budget for a first digital twin deployment in an emerging market context? A: For a single-asset twin (one building, one factory, one renewable energy installation), expect $150,000 to $500,000 for the initial deployment including data infrastructure, software licensing, and integration, with $40,000 to $150,000 in annual maintenance costs. Urban-scale twins covering multiple infrastructure domains start at $2 million and can exceed $20 million depending on scope. The key cost driver is data infrastructure, not software.

Q: How long before a digital twin delivers measurable ROI? A: Focused deployments targeting specific operational improvements (predictive maintenance, energy optimization, production scheduling) typically demonstrate measurable value within 6-12 months of becoming operational. Broader platform deployments may take 18-36 months. If a twin has not delivered quantifiable value within 18 months, the use case definition or implementation approach likely needs revision.

Q: What skills does an organization need to build and maintain digital twins internally? A: Core requirements include data engineering (ETL pipelines, database management, API integration), domain expertise in the physical system being modeled (mechanical, civil, or electrical engineering), simulation or modeling capability (computational fluid dynamics, finite element analysis, or agent-based modeling depending on the domain), and data science for analytics and synthetic data validation. Most organizations need 3-5 dedicated staff members per twin at scale, supplemented by domain experts who contribute part-time.

Q: Can digital twins work effectively with incomplete data, as is common in emerging markets? A: Yes, with appropriate expectations. Practitioners consistently reported that twins built with 60-70% data completeness delivered 80-90% of the value of fully instrumented twins for operational decision-making. The key is to be explicit about uncertainty: a twin that acknowledges what it does not know is more valuable than one that presents incomplete data as definitive. Techniques such as ensemble modeling, Bayesian inference, and uncertainty quantification allow twins to operate usefully under data constraints.

Q: How does synthetic data from digital twins compare to real-world data for training AI models? A: Synthetic data is most valuable for augmenting real-world datasets in scenarios where physical data collection is expensive, dangerous, or time-constrained. However, synthetic data should never fully replace real-world validation. Practitioners recommend capping the mix at roughly a 70:30 ratio of synthetic to real data for initial model training, followed by fine-tuning exclusively on real-world data. All synthetic datasets should be accompanied by documentation of the simulation parameters, assumptions, and known limitations that produced them.
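That 70:30 guideline can be implemented as a straightforward dataset-assembly step. A sketch, with the ratio and the sampling strategy (keep all real samples, sample synthetic to match) as stated assumptions:

```python
import numpy as np

def build_training_mix(synthetic, real, synthetic_ratio=0.7, seed=0):
    """Assemble an initial training set at a given synthetic:real ratio,
    keeping every real sample and sampling synthetic data to match."""
    rng = np.random.default_rng(seed)
    n_real = len(real)
    # Number of synthetic samples that makes up synthetic_ratio of the total
    n_syn = int(round(n_real * synthetic_ratio / (1 - synthetic_ratio)))
    syn_idx = rng.choice(len(synthetic), size=n_syn,
                         replace=n_syn > len(synthetic))
    mix = np.concatenate([np.asarray(synthetic, float)[syn_idx],
                          np.asarray(real, float)])
    rng.shuffle(mix)
    return mix

# Toy data: real values < 300, synthetic values >= 100,000 for easy counting
real = np.arange(300, dtype=float)
synthetic = np.arange(10_000, dtype=float) + 100_000
mix = build_training_mix(synthetic, real)  # 700 synthetic + 300 real samples
```

The subsequent fine-tuning pass on real data only, and the accompanying documentation of simulation assumptions, are process steps rather than code and are not shown here.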

Sources

  • MarketsandMarkets. (2025). Digital Twin Market: Global Forecast to 2030. Pune, India: MarketsandMarkets Research.
  • Gartner. (2025). Predicts 2026: Digital Twins Will Transform Industrial Operations. Stamford, CT: Gartner Inc.
  • National Research Foundation Singapore. (2025). Virtual Singapore: Technical Architecture and Lessons Learned. Singapore: NRF.
  • World Bank. (2025). Digital Infrastructure for Climate-Resilient Cities in Developing Countries. Washington, DC: World Bank Group.
  • Tao, F., Zhang, H., Liu, A., & Nee, A. Y. C. (2025). Digital Twin in Industry: State-of-the-Art. IEEE Transactions on Industrial Informatics, 21(4), 2405-2418.
  • Eclipse Foundation. (2025). Open Source Digital Twin Frameworks: Adoption and Implementation Guide. Brussels: Eclipse Foundation.
  • International Energy Agency. (2025). Digitalization and Energy: Emerging Market Perspectives. Paris: IEA Publications.

