Case study: Critical infrastructure cybersecurity — a pilot that failed (and what it taught us)
A concrete implementation with numbers, lessons learned, and what to copy/avoid. Focus on attack paths, detection/response, and how to harden real-world systems.
In 2024, critical infrastructure faced an unprecedented volume of attacks. BlackBerry's Q3 2024 Global Threat Intelligence Report documented 600,000 cyberattacks targeting critical infrastructure in a single quarter, with 70% of all cyberattacks now directed at essential services. Utilities experienced a 70% year-over-year surge in documented attacks, reaching 1,162 incidents by year-end. The financial toll is significant: 45% of organizations reported losses exceeding $500,000 from attacks on cyber-physical systems, while the average U.S. data breach now costs $10 million, double the global average. Nation-state campaigns continued to escalate into 2025: Iranian-affiliated attacks spiked 133% between May and June 2025, and Russian Sandworm operations expanded to target water facilities in Texas, Poland, and France. Against this backdrop, the gap between security aspirations and operational reality has never been more consequential, or more instructive.
Why It Matters
Critical infrastructure—power grids, water treatment facilities, transportation networks, and healthcare systems—forms the invisible foundation of modern society. When these systems fail, the consequences cascade far beyond the immediate technical disruption. The 2024 attack on American Water, serving 14 million customers across 14 states, forced the disconnection of customer portals and billing systems. While water quality remained unaffected, the incident exposed how deeply digital systems have penetrated essential services that communities once assumed were isolated from cyber threats.
The significance extends beyond operational continuity. Power grid vulnerabilities are multiplying at an alarming rate: the North American Electric Reliability Corporation (NERC) reports susceptible points increasing by approximately 60 per day as the grid expands to accommodate renewable integration and distributed energy resources. Battery energy storage systems (BESS) have emerged as high-risk targets for nation-state actors precisely because they represent critical nodes in the energy transition. A successful attack on these systems doesn't merely cause blackouts—it undermines public confidence in clean energy infrastructure.
Water systems present an even more troubling profile. Limited resources mean many operators prioritize physical maintenance over cybersecurity. The lack of standardization across industrial control system (ICS) vendors, combined with the wide availability of these products, makes the sector systematically easier to target. The Iran-backed "Cyber Av3ngers" attack on Aliquippa, Pennsylvania's Municipal Water Authority—which forced manual monitoring after compromising operational technology (OT) systems—was replicated against at least ten additional U.S. water facilities using identical methods.
Transportation infrastructure faces similar pressures. The Transportation Security Administration (TSA) proposed new cybersecurity mandates for surface transportation operators in late 2024, with a February 2025 public comment deadline—a regulatory acknowledgment that voluntary measures have proven insufficient. The interconnection of rail signaling, airport operations, and logistics networks creates attack surfaces that traditional IT security frameworks were never designed to address.
Key Concepts
OT/IT Convergence
The boundary between information technology (IT) and operational technology (OT) has effectively dissolved. Cloud services, remote access requirements, and the proliferation of Industrial Internet of Things (IIoT) devices have rendered the air-gap model "definitively a thing of the past," according to 2024 industry assessments. Yet 73% of organizations still operate large, flat OT networks without proper segmentation—a legacy architecture that assumes isolation that no longer exists.
This convergence creates fundamental security tensions. IT security practices assume systems can be patched, rebooted, and taken offline for maintenance. OT environments—running on 20-year equipment lifecycles—often cannot be interrupted without causing physical harm or production losses. The result is that 82% of ICS environments cannot be taken offline for patches, requiring compensating controls that many organizations struggle to implement effectively.
Software Bill of Materials (SBOM)
An SBOM is a systematic approach to supply chain transparency: a catalog of every software component within a system. U.S. Executive Order 14028 and the EU Cyber Resilience Act have made SBOMs a regulatory requirement for many vendors that sell into critical infrastructure. Spending has followed: the SBOM tools market is projected to grow from $1.3 billion in 2025 to $5.8 billion by 2033, reflecting both regulatory pressure and genuine operational need.
The 430% increase in supply chain attacks on ICS/SCADA vendors between 2020 and 2024 demonstrates why this matters. The 2024 "SupplyWeave" exploit targeted OT vendors specifically, causing $10 million or more in losses from automotive plant shutdowns alone. SBOM enables organizations to rapidly identify exposure when vulnerabilities like Log4Shell emerge—transforming incident response from weeks of uncertainty into hours of targeted remediation.
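To make the "hours instead of weeks" claim concrete, here is a minimal sketch of an SBOM exposure lookup. It assumes a heavily trimmed, CycloneDX-like JSON shape with a `components` list. The sample data, the function names, and the `fixed_in` threshold are illustrative assumptions, not any vendor's tooling, and the version parser is deliberately naive.

```python
import re

# Hypothetical, heavily trimmed SBOM in a CycloneDX-like shape.
SBOM = {
    "bomFormat": "CycloneDX",
    "components": [
        {"name": "log4j-core", "version": "2.14.1"},
        {"name": "log4j-core", "version": "2.17.1"},
        {"name": "openssl", "version": "3.0.8"},
    ],
}


def version_tuple(version: str) -> tuple[int, ...]:
    """Keep only the numeric parts: '2.14.1' -> (2, 14, 1).

    Naive on purpose: pre-release tags such as '-beta9' are not ordered correctly.
    """
    return tuple(int(part) for part in re.findall(r"\d+", version))


def affected_components(sbom: dict, name: str, fixed_in: str) -> list[dict]:
    """Return components called `name` whose version is older than `fixed_in`."""
    threshold = version_tuple(fixed_in)
    return [
        component
        for component in sbom.get("components", [])
        if component.get("name") == name
        and version_tuple(component.get("version", "0")) < threshold
    ]


# `fixed_in` is an illustrative threshold supplied by the caller.
hits = affected_components(SBOM, "log4j-core", fixed_in="2.15.0")
for component in hits:
    print(f"exposed: {component['name']} {component['version']}")
```

In practice the same query would run across the SBOMs of every OT vendor product in the inventory, which is why the checklist ties SBOM requirements to procurement.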
Zero-Trust for Critical Infrastructure
Zero-trust architecture, built on the principle of "never trust, always verify," has become an operational reality rather than an aspirational framework. Industry forecasts projected that 70% of remote access deployments would use Zero Trust Network Access (ZTNA) instead of VPN by 2025. Organizations with full zero-trust deployment contain breaches 28 days faster than those without (176 days versus 204 days).
Adapting zero-trust for OT environments requires significant modification. Legacy protocols like Modbus and DNP3 lack authentication mechanisms. Real-time control requirements make latency-inducing verification steps potentially dangerous. Successful implementations typically layer zero-trust at network boundaries while implementing protocol-aware monitoring for internal traffic—a hybrid approach that balances security posture with operational reliability.
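As an illustration of protocol-aware monitoring, the sketch below decodes the header of a Modbus/TCP frame and flags write requests from hosts that are not on an allowlist. Because Modbus itself carries no authentication, the check relies on the source address seen at the network layer. The IP addresses, the allowlist, and the function names are hypothetical. The frame layout (7-byte MBAP header followed by a function code) and the write function codes 5, 6, 15, and 16 come from the Modbus specification.

```python
import struct

# Modbus function codes that change device state (coil and register writes).
WRITE_FUNCTION_CODES = {5, 6, 15, 16}

# Hypothetical allowlist: only this engineering workstation may issue writes.
ALLOWED_WRITERS = {"10.10.2.15"}


def parse_modbus_tcp(frame: bytes) -> dict:
    """Decode the 7-byte MBAP header plus the function code of a Modbus/TCP frame."""
    if len(frame) < 8:
        raise ValueError("frame too short for MBAP header and function code")
    transaction_id, protocol_id, length, unit_id, function_code = struct.unpack(
        ">HHHBB", frame[:8]
    )
    if protocol_id != 0:
        raise ValueError("not Modbus/TCP (protocol id must be 0)")
    return {
        "transaction_id": transaction_id,
        "length": length,
        "unit_id": unit_id,
        "function_code": function_code,
    }


def is_suspicious(src_ip: str, frame: bytes) -> bool:
    """Flag write requests that come from a host outside the allowlist."""
    function_code = parse_modbus_tcp(frame)["function_code"]
    return function_code in WRITE_FUNCTION_CODES and src_ip not in ALLOWED_WRITERS


# Write Single Register (function 6): register 0x0001 set to 0x00FF on unit 1.
write_frame = bytes.fromhex("000100000006" "01" "06" "0001" "00ff")
# Read Holding Registers (function 3): two registers starting at 0x0000.
read_frame = bytes.fromhex("000200000006" "01" "03" "0000" "0002")

print(is_suspicious("10.10.9.99", write_frame))
print(is_suspicious("10.10.9.99", read_frame))
```

A monitor like this only observes copies of traffic, so it adds no latency to the control loop. Blocking, if wanted, would sit in a proxy at the zone boundary instead.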
What's Working and What Isn't
What's Working
Passive Monitoring Before Active Intervention
Organizations achieving top-quartile security outcomes share a common implementation pattern: they deploy passive monitoring solutions before attempting active security measures. This approach uses non-intrusive network taps to observe traffic without injecting packets, which avoids the reported 11% failure rate seen when active scanners accidentally disrupt OT systems.
Nozomi Networks' Guardian platform exemplifies this approach, combining passive monitoring with selective "Smart Polling" for asset identification. Claroty's deployment data shows that transparent-mode implementations achieve 40% faster time-to-value compared to architectures requiring network redesign. The lesson is counterintuitive but consistent: doing less initially enables organizations to do more effectively over time.
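The core idea of passive asset identification can be sketched in a few lines: classify devices from traffic that is already on the wire instead of probing them. The example assumes flow records exported from a network tap as `(source, destination, destination port)` tuples. The record format, addresses, and function name are assumptions for illustration and do not reflect how any commercial platform works internally. The port numbers are the well-known defaults for each protocol.

```python
from collections import defaultdict

# Default ports for common industrial protocols.
OT_PORTS = {
    102: "S7comm (ISO-TSAP)",
    502: "Modbus/TCP",
    4840: "OPC UA",
    20000: "DNP3",
    44818: "EtherNet/IP",
}


def build_inventory(flows):
    """Map each destination IP to the OT protocols it was contacted on."""
    inventory = defaultdict(set)
    for _src, dst, dst_port in flows:
        protocol = OT_PORTS.get(dst_port)
        if protocol:
            inventory[dst].add(protocol)
    return {ip: sorted(protocols) for ip, protocols in inventory.items()}


# Hypothetical flow records observed on a tap; nothing is sent to the devices.
flows = [
    ("10.1.1.5", "10.1.2.10", 502),
    ("10.1.1.5", "10.1.2.11", 20000),
    ("10.1.1.6", "10.1.2.10", 44818),
    ("10.1.1.6", "10.1.3.3", 443),
]

inventory = build_inventory(flows)
for ip, protocols in inventory.items():
    print(ip, protocols)
```

Port-based classification is a coarse first pass. Devices that never talk during the observation window stay invisible, which is the gap that selective polling is meant to close.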
Micro-Segmentation Following the Purdue Model
The Purdue Model, a hierarchical reference architecture separating enterprise IT from process control systems, provides a proven framework for reducing attack surfaces. Organizations implementing micro-segmentation based on this model report a 73% reduction in lateral movement opportunities for attackers. Unidirectional security gateways, which physically prevent data from flowing back into OT networks, can prevent 99.9% of network-based attacks from reaching critical systems.
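A segmentation policy of this kind can be expressed as a small rule check. The sketch below is a simplification under stated assumptions: levels are collapsed to 0 through 4, traffic is allowed only within a level or between adjacent levels, and a unidirectional gateway is assumed to sit at the boundary between level 3 (site operations) and level 4 (enterprise IT). Asset names and the rule itself are hypothetical. Real deployments define conduits per zone pair.

```python
# Purdue levels for a few hypothetical assets (1 = basic control ... 4 = enterprise IT).
ASSET_LEVEL = {
    "plc-01": 1,
    "hmi-01": 2,
    "historian": 3,
    "erp": 4,
}


def flow_allowed(src: str, dst: str) -> bool:
    """Allow traffic only within a level or between adjacent levels."""
    return abs(ASSET_LEVEL[src] - ASSET_LEVEL[dst]) <= 1


def gateway_allows(src: str, dst: str) -> bool:
    """Model a unidirectional gateway at the level 3/4 boundary.

    Data may leave the OT side (level 3 and below) but may never enter it.
    """
    entering_ot = ASSET_LEVEL[src] >= 4 and ASSET_LEVEL[dst] <= 3
    return not entering_ot


print(flow_allowed("hmi-01", "plc-01"))     # adjacent levels
print(flow_allowed("erp", "plc-01"))        # skips two levels
print(gateway_allows("historian", "erp"))   # OT to IT
print(gateway_allows("erp", "historian"))   # IT to OT
```

Running every observed flow through checks like these turns the segmentation diagram into something that can be tested against real traffic before enforcement is switched on.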
Information Sharing Through ETHOS
In August 2024, Claroty, Dragos, Nozomi Networks, and eight additional vendors co-founded ETHOS (Emerging Threat Open Sharing)—a vendor-neutral, open-source platform for real-time OT threat intelligence sharing. This collaborative approach acknowledges that no single vendor possesses complete visibility into the threat landscape. Early participants report 3.2x faster identification of emerging attack patterns compared to siloed approaches.
What Isn't Working
The CrowdStrike Cascade Failure
The July 2024 CrowdStrike incident provided an unexpected lesson: security tools themselves can become single points of failure. A routine security update caused global cascading failures across aviation, healthcare, and manufacturing operations. The failure wasn't a cyberattack—it was a quality assurance gap in a security product. Organizations discovered that their multi-layered defenses collapsed when a single vendor's update failed.
The lesson applies directly to OT environments: single-vendor dependency in security tooling creates concentration risk that attackers can exploit either directly (by targeting the vendor) or indirectly (by waiting for the vendor to cause self-inflicted wounds). Multi-vendor strategies, while operationally complex, provide resilience that monocultures cannot.
Premature Active Scanning
A Midwestern water utility's 2024 security pilot illustrates a common failure mode. Eager to demonstrate security posture improvements, the utility deployed active vulnerability scanning across its SCADA network without adequate testing. The scanning tool's probing disrupted a legacy programmable logic controller (PLC), causing a treatment pump to cycle unexpectedly. While no contamination resulted, the incident required 48 hours of manual monitoring and destroyed internal confidence in the security initiative.
The utility's post-incident review identified three failures: inadequate inventory of legacy systems and their tolerance for active probing; insufficient testing in isolated environments before production deployment; and lack of manual operation procedures as fallback. All three are addressable—but all three require time investments that aggressive pilot timelines often eliminate.
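The three findings translate directly into a gate that can run before any active scan is scheduled. This is a minimal sketch with hypothetical field and function names. It is not the utility's actual process, only one way to encode the review's conclusions so they cannot be skipped under schedule pressure.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    inventoried: bool      # appears in the OT asset inventory with known scan tolerance
    lab_tested: bool       # scanner validated against this device type in isolation
    manual_fallback: bool  # documented manual operation procedure exists


def scan_blockers(asset: Asset) -> list[str]:
    """Return the reasons this asset should not be actively scanned yet."""
    checks = [
        (asset.inventoried, "not in asset inventory"),
        (asset.lab_tested, "scanner not tested against this device in isolation"),
        (asset.manual_fallback, "no manual operation procedure as fallback"),
    ]
    return [reason for passed, reason in checks if not passed]


legacy_plc = Asset("legacy-plc", inventoried=True, lab_tested=False, manual_fallback=False)
blockers = scan_blockers(legacy_plc)
if blockers:
    print(f"{legacy_plc.name}: do not scan ->", "; ".join(blockers))
```

An empty list means all three preconditions are met. Anything else keeps the asset on passive monitoring only.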
Underestimating Third-Party Access
A UK energy distributor's 2024 pilot aimed to implement comprehensive network monitoring across its operational footprint. Six months into the deployment, incident responders discovered that maintenance technicians from three different vendors had been accessing OT systems through repurposed screen-sharing tools—access paths that the new monitoring system couldn't see because they traversed networks outside the deployment scope.
The pilot had focused on monitoring the systems the organization operated directly while ignoring the access methods used by its suppliers. Remote access—identified as "one of the biggest and often hidden problems" in 2024 OT security assessments—requires explicit enumeration and control. The VoltStrike campaign exploited exactly this gap, targeting remote access in energy companies to cause rolling blackouts affecting millions.
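Explicit enumeration can start with a comparison between observed remote-access traffic and a register of approved access paths. The sketch below assumes flow records as `(source, destination, destination port)` tuples and a hypothetical approval register. The ports are the common defaults for each tool. Note the limitation: many screen-sharing products connect outbound through a vendor relay over port 443, so a port match alone would miss exactly the kind of path described above, and DNS or endpoint data is needed to close that gap.

```python
# Default ports for common remote-access tools.
REMOTE_ACCESS_PORTS = {22: "SSH", 3389: "RDP", 5900: "VNC", 5938: "TeamViewer"}

# Hypothetical register of approved (source, destination, port) access paths.
APPROVED = {("10.50.0.10", "10.1.2.10", 3389)}


def unapproved_remote_access(flows):
    """Return (source, destination, tool) for remote-access flows not in the register."""
    findings = []
    for src, dst, port in flows:
        tool = REMOTE_ACCESS_PORTS.get(port)
        if tool and (src, dst, port) not in APPROVED:
            findings.append((src, dst, tool))
    return findings


# Hypothetical observations from the OT network boundary.
observed = [
    ("10.50.0.10", "10.1.2.10", 3389),    # approved jump host
    ("203.0.113.7", "10.1.2.12", 5900),   # vendor VNC session nobody registered
    ("10.1.2.12", "198.51.100.20", 5938), # outbound TeamViewer from an OT host
    ("10.1.1.5", "10.1.2.10", 502),       # ordinary Modbus traffic
]

findings = unapproved_remote_access(observed)
for src, dst, tool in findings:
    print(f"unapproved {tool}: {src} -> {dst}")
```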
Key Players
Established Leaders
Dragos, founded by former NSA and ICS-CERT practitioners, operates the leading OT threat intelligence platform. The company raised $74 million from WestCap in September 2024 and maintains deep expertise in critical infrastructure verticals including electric utilities, oil and gas, and manufacturing. Dragos commands approximately 9.6% mindshare in the OT security market.
Claroty leads the Gartner 2025 Magic Quadrant for CPS Protection Platforms, ranking highest in both Ability to Execute and Completeness of Vision. The company secured $100 million in strategic funding in March 2024 and is reportedly considering an IPO at a $3.5 billion valuation. Claroty's xDome platform provides comprehensive coverage across IT, OT, IoT, and IIoT environments.
Nozomi Networks raised $100 million in a Series E round led by Schneider Electric and Mitsubishi in March 2024. The company's Guardian platform emphasizes real-time visibility and includes Guardian Air for wireless spectrum monitoring—addressing a gap in traditional wired-network-focused solutions. Nozomi ranks second in Gartner's Ability to Execute assessment.
Fortinet provides ruggedized FortiGate firewalls designed specifically for industrial environments, along with FortiPAM for privileged access management and FortiDeceptor for threat detection. The company's OT-specific product line addresses the durability and environmental requirements that enterprise-focused competitors often overlook.
Emerging Startups
Armis has expanded from IT/OT/IoT asset intelligence into comprehensive threat detection, positioning itself at the convergence point where traditional IT security meets industrial control systems.
RunSafe Security focuses on runtime application protection for embedded systems, with backing from BMW i Ventures and Lockheed Martin Ventures—reflecting the automotive and defense sectors' acute awareness of supply chain risk.
Shift5 specializes in transportation and defense OT security, addressing rail, aviation, and military platforms where traditional IT security tools cannot operate.
Galvanick provides continuous monitoring specifically designed for manufacturing OT environments, backed by AE Ventures and Boeing.
Key Investors and Funders
Venture Capital: SYN Ventures focuses specifically on early-stage OT and industrial security startups. Ballistic Ventures operates a $360 million fund targeting early-stage cybersecurity. Energy Impact Partners and National Grid Partners have both invested in Dragos and Claroty, recognizing the strategic importance of OT security to their core energy operations.
Government Programs: The U.S. Department of Energy allocated $45 million in 2024 for 16 grid cybersecurity projects. The CISA State and Local Cybersecurity Grant Program (SLCGP) provides $1 billion over four years (FY 2022-2025), with 80% pass-through required to local governments and 25% mandated for rural areas. The DOE Clean Energy Cybersecurity Accelerator (CECA), managed by the National Renewable Energy Laboratory, graduated its first cohort of OT/energy security startups in 2024.
Examples
Example 1: Texas Water Facility Overflow (2024)
Russian Sandworm's first suspected attack on U.S. soil targeted a Texas water facility, causing an overflow event. The attack exploited inadequate network segmentation that allowed external access to reach critical control systems. The facility had invested in perimeter security but had not implemented internal segmentation following the Purdue Model. Post-incident, operators implemented unidirectional gateways between network zones—a control that would have prevented the attack entirely.
Lesson: Geographic distance provides no protection. Facilities of any size in any location are potential targets when they control physical processes.
Example 2: GhostHammer AI-Driven Grid Attack (2024)
AI-powered GhostHammer malware bypassed traditional ICS defenses across Asian energy grids, causing three metropolitan power outages. The malware adapted its behavior based on observed network patterns—a capability that static rule-based defenses could not counter. A 22% surge in AI-driven attacks in 2024 demonstrated that this was not an isolated capability.
Lesson: Defensive AI is now mandatory for grid systems. Human analysts cannot respond at the speed required to counter adversarial AI.
Example 3: H2O Lock Ransomware (2024)
H2O Lock ransomware encrypted SCADA systems serving 500,000 North American residents, disrupting water supply for 48 hours. The facility lacked offline backups of SCADA configurations, making recovery dependent on ransomware operators. Manual operation procedures existed but had not been tested—operators discovered missing documentation and unfamiliar equipment during the incident.
Lesson: The 3-2-1 backup strategy (three copies, two media types, one offsite) must include OT configurations. Regular disaster recovery drills must simulate complete OT system loss.
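The 3-2-1 rule is simple enough to check automatically against a backup register. The sketch below counts every copy passed in toward the three-copy requirement and adds an offline check, since the lack of offline backups was the deciding factor in this example. The `BackupCopy` fields and the sample register are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    media: str     # e.g. "disk", "tape", "cloud"
    offsite: bool  # stored away from the facility
    offline: bool  # disconnected from any network that ransomware could reach


def check_321(copies: list[BackupCopy]) -> dict[str, bool]:
    """Evaluate a set of copies against 3-2-1, plus an offline requirement."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media": len({copy.media for copy in copies}) >= 2,
        "one_offsite": any(copy.offsite for copy in copies),
        "one_offline": any(copy.offline for copy in copies),
    }


# Hypothetical register for one SCADA configuration set.
scada_config_copies = [
    BackupCopy(media="disk", offsite=False, offline=False),
    BackupCopy(media="disk", offsite=False, offline=False),
    BackupCopy(media="cloud", offsite=True, offline=False),
]

result = check_321(scada_config_copies)
failed = [rule for rule, passed in result.items() if not passed]
print("failed rules:", failed)
```

This register would pass a plain 3-2-1 audit and still fail the offline check, which is the situation that left the facility dependent on the ransomware operators.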
Action Checklist
- Conduct complete OT asset inventory, including remote access methods used by third-party maintenance providers
- Deploy passive monitoring before active scanning—test active tools in isolated environments first
- Implement network segmentation following the Purdue Model, with unidirectional gateways at critical boundaries
- Establish SBOM requirements for all OT vendors and integrate SBOM review into procurement
- Create and regularly test manual operation procedures for all critical systems
- Maintain offline backups of OT configurations following the 3-2-1 strategy
- Join information-sharing organizations (ISACs, ETHOS) for early threat intelligence
- Apply for SLCGP funding through state administrative agencies—80% must flow to local entities
FAQ
Q: How do we implement zero-trust in OT environments that use legacy protocols without authentication?
A: Layer zero-trust at network boundaries rather than at the protocol level. Use network segmentation to control which systems can communicate with OT devices, implement continuous monitoring to detect anomalous behavior, and consider protocol-aware proxies that can add authentication to legacy traffic at zone boundaries. Virtual patching through intrusion prevention systems provides compensating controls for systems that cannot be directly secured.
Q: What is the realistic timeline for achieving meaningful OT security improvements?
A: Expect 6-12 months for foundational visibility and segmentation, 12-24 months for comprehensive monitoring and incident response capabilities, and 24-36 months for mature zero-trust implementation. Attempting to compress these timelines typically causes the pilot failures described above. Budget for iterative improvement rather than big-bang transformation.
Q: How should we prioritize limited cybersecurity budgets across IT and OT systems?
A: Start with asset inventory and network segmentation, which provide the highest risk reduction per dollar spent. Follow with passive monitoring to detect anomalies. Only then consider active security tools. Organizations that reverse this sequence typically experience the disruption problems described in failed pilots. Apply for SLCGP funding to supplement internal budgets.
Q: What skills do we need that our existing IT security team lacks?
A: OT security requires understanding of industrial control systems, SCADA protocols (Modbus, DNP3, OPC), safety instrumented systems, and the operational constraints of 24/7 physical processes. Consider partnerships with OT security vendors who can provide managed services while internal teams develop expertise. The CISA Cybersecurity Performance Goals provide a learning framework.
Sources
- BlackBerry, "Q3 2024 Global Threat Intelligence Report," October 2024. https://www.blackberry.com/us/en/company/newsroom/press-releases/2025/blackberry-reports-600000-cyberattacks-on-critical-infrastructure-in-q3-2024
- CISA, "State and Local Cybersecurity Grant Program," 2024. https://www.cisa.gov/cybergrants/slcgp
- Gartner, "Magic Quadrant for CPS Protection Platforms," February 2025. https://www.gartner.com/reviews/market/cps-protection-platforms
- Industrial Cyber, "2024 in Retrospect: Lessons Learned and Cyber Strategies Shaping Future of Critical Infrastructure," January 2025. https://industrialcyber.co/features/2024-in-retrospect-lessons-learned-and-cyber-strategies-shaping-future-of-critical-infrastructure/
- Nozomi Networks, "OT Cybersecurity Leaders to Deliver ETHOS Open-Source Information Sharing," August 2024. https://www.nozominetworks.com/press-release/ot-cybersecurity-leaders-to-deliver-first-open-source-information-sharing-for-collective-early-warning-in-critical-infrastructure
- U.S. Department of Energy, "Clean Energy Cybersecurity Accelerator," 2024. https://www.energy.gov/ceser/clean-energy-cybersecurity-accelerator
- Waterfall Security Solutions, "Learning From 2024's Top OT Attacks and Planning for 2025's Security," January 2025. https://waterfall-security.com/ot-insights-center/ot-cybersecurity-insights-center/learning-from-2024s-top-ot-attacks-and-planning-for-2025s-security/