The binder was impressive. Sixty-two pages, indexed tabs, version-controlled, signed by the COO. It had survived three audits and a cyber insurance review. It listed roles, escalation paths, external contacts, and a recovery time objective of four hours. The only thing it had never survived was an actual incident. We found that out twenty minutes into a tabletop exercise last spring with a Hudson Valley manufacturer—250 employees, $80 million in annual revenue, three production lines running single-shift. Twenty minutes before the plan stopped making contact with reality.
The scenario was straightforward: ransomware detected on two workstations in the plant floor network at 6:45 a.m. on a Tuesday. The security team in the exercise—IT manager, operations director, HR lead, and the CFO—opened the plan and began working through it. That's when the gaps started surfacing, one after another, with the quiet inevitability of a leak finding every crack in a hull.
The plant manager wasn't in the room. He wasn't supposed to be—the IR plan didn't list him as part of the response team. But his name appeared in three places as a required decision-maker for production shutdowns. Nobody had told him. He learned he was in the communication chain about forty seconds after we pointed it out.
An IR plan that hasn't been tested isn't a plan. It's a hypothesis. And Tuesday is not when you want to find out it was wrong.
The backup vendor contact was next. The IR plan listed a support number and a named account manager. The number went to general support. The account manager had left the company. The managed backup contract itself had lapsed eleven months prior—auto-renewal had failed when the company changed its accounts payable platform and the invoice routing broke. Nobody had noticed, because backups were still running. They just weren't monitored, tested, or under any service agreement.
When we asked the team how long it had been since their last full restore test, the IT manager said six months. When we pulled the actual log records, it was fourteen months. There is always a gap between what people remember and what the records show. That gap is where incidents become disasters.
The Difference Between a Document and a Capability
An IR plan that lives in a document repository and gets updated for audits is not a capability. It is a compliance artifact. The distinction matters enormously, because the two require completely different things to maintain.
A compliance artifact needs to be current, accurate, and accessible. A capability needs to be practiced, tested, and load-bearing under pressure. You can maintain a document through annual review cycles. You cannot maintain a capability that way. Capabilities atrophy. Contacts change. Vendor relationships expire. People leave, and institutional memory walks out with them. The plan you wrote eighteen months ago reflects a company that no longer exists.
Compliance Artifact
Updated annually. Passes audit. Reviewed by security team. Lives in SharePoint. Nobody reads it under pressure.
Operational Capability
Tested quarterly. Roles walked. Contacts verified. Backup restores validated. The team has muscle memory, not just documentation.
The manufacturer's plan was a compliance artifact. It described what the company intended to do. It did not reflect what the company was actually capable of doing on a given Tuesday morning, with the specific people on shift, using the specific tools under contract, within the constraints of a plant that cannot afford more than a few hours of unplanned downtime before the margin math turns ugly.
This is not a failure of intent. It is a structural problem that affects most mid-size organizations without a dedicated security operations function. The plan was built once, audited periodically, and never stress-tested against the friction of real conditions. That's not negligence—it's the default state of IR planning in organizations where security is one of a dozen competing priorities and nobody owns readiness as a continuous function.
Why Manufacturing Is Uniquely Vulnerable
Every industry has IR challenges. Manufacturing has a set of structural conditions that make those challenges significantly harder, and most IR frameworks were not designed with those conditions in mind.
The first is OT/IT convergence. Over the past decade, plant floor systems have become progressively more networked. SCADA systems, programmable logic controllers, human-machine interfaces, and manufacturing execution system (MES) platforms now share network infrastructure with corporate IT environments. The security architecture almost never kept pace with that connectivity. The result is an attack surface that spans both the business network and operational technology environments—two domains with fundamentally different security tolerances, patching cycles, and recovery requirements.
The OT/IT Convergence Problem in Plain Terms
Your IT network can be taken offline, rebuilt, and restored from backup. Your production line cannot. A PLC running decades-old firmware, connected to a network that also connects to your email system, does not have a four-hour recovery window. It has a recovery window measured by how long your customers will wait before going somewhere else.
- Many OT systems cannot be patched without vendor involvement and scheduled maintenance windows
- Ransomware that crosses from IT to OT can brick equipment, not just encrypt files
- Network segmentation between OT and IT is inconsistent or nonexistent in most mid-size plants
- OT vendors may have remote access that bypasses corporate security controls entirely
- IR runbooks written by IT teams routinely fail to account for plant floor dependencies
The second structural condition is production pressure. Manufacturing operates on narrow margins and tight schedules. Downtime is not an abstraction—it is a number per hour with a decimal point and a direct line to customer relationships and cash flow. When an incident occurs, the pressure to restore production is immediate and real. That pressure directly competes with the disciplined, methodical containment process that effective incident response requires. In practice, the pressure usually wins, and the response gets shortened or skipped in ways that create secondary problems.
The third condition is thin security staffing. An $80 million manufacturer with 250 employees does not have a security operations center. It has an IT manager who also handles the help desk, network infrastructure, and vendor management. That person is technically capable but structurally overwhelmed during a crisis. The IR plan assumes capabilities and bandwidth that don't exist in the building.
The Tabletop That Exposed Everything
Tabletop exercises work because they create consequences without creating damage. The scenario is real enough to surface genuine gaps—the confusion, the missing contacts, the process steps nobody has actually done—without burning down the house. What follows is the compressed version of what the exercise surfaced.
Scenario inject, 06:45 Tuesday: ransomware detected on two plant floor workstations; the response team opens the plan.
None of these failures were exotic. Expired vendor contracts, outdated role assignments, untested backups, missing network documentation, absent stakeholders—these are not sophisticated security problems. They are operational maintenance failures. They happen because nobody owns the job of keeping IR readiness current as a continuous function.
The COO in the room was quiet for most of the debrief. At the end, he said something I've heard variations of in a lot of rooms: "I thought we had this covered. We paid for the plan. We passed the audit." That's the document-vs-capability gap in a sentence. Payment and passage don't produce readiness. Practice does.
The four-hour RTO looked reasonable on paper. In the room, with real people and real gaps, we couldn't make a containment decision in under two hours—and production hadn't been touched yet.
What Response Readiness Actually Looks Like
Auditors accept documentation. Incidents test capability. Those two standards are not the same, and organizations that conflate them are making a category error that will cost them when the event is real.
Response readiness has three components that no document alone can satisfy: people who know their role under pressure, processes that have been executed at least in rehearsal, and tools and relationships that have been verified recently enough to be trusted. Strip any one of those three and you have a plan that will fail under stress.
People Readiness
Every person in the communication chain knows they're in it. Decision authority is explicit. Escalation paths are practiced, not just written.
Process Readiness
Playbooks have been walked, not just written. Containment steps are understood by the people who will execute them, not just the person who wrote them.
Technical Readiness
Backups have been tested within 90 days. Vendor contacts are verified active. Network documentation reflects current reality. Restore procedures exist and work.
The distinction auditors miss—because most audit frameworks are not designed to test it—is temporal. A plan that was accurate and practiced twelve months ago may be neither today. People leave. Contracts expire. Networks change. Mergers happen. The audit cadence and the decay rate of IR readiness are not synchronized, and in most mid-size organizations the decay rate wins.
True readiness requires a maintenance cadence that matches the decay rate: quarterly at a minimum for contact verification and backup testing, semi-annually for tabletop exercises, and annually for full plan review incorporating any infrastructure or organizational changes. That is not a heavy lift. It is a scheduled, lightweight discipline that most organizations skip because nothing has gone wrong yet.
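That cadence is easy to track, and even the tracking can be reduced to a few lines of script. The sketch below is illustrative only: the activity names, dates, and thresholds are placeholders for however your team actually records when each readiness task was last performed. The logic is nothing more than a date comparison against the agreed interval.

```python
from datetime import date

# Hypothetical example: the item names, dates, and thresholds below are
# placeholders. Point them at wherever your team records readiness activities.
CADENCE_DAYS = {
    "contact_verification": 90,   # quarterly
    "backup_restore_test": 90,    # quarterly
    "tabletop_exercise": 180,     # semi-annual
    "full_plan_review": 365,      # annual
}

last_done = {
    "contact_verification": date(2024, 1, 15),
    "backup_restore_test": date(2023, 11, 2),
    "tabletop_exercise": date(2023, 9, 20),
    "full_plan_review": date(2023, 6, 1),
}

def overdue_items(today: date) -> list[str]:
    """Return readiness activities whose age exceeds the agreed cadence."""
    findings = []
    for item, max_age in CADENCE_DAYS.items():
        age = (today - last_done[item]).days
        if age > max_age:
            findings.append(f"{item}: {age} days since last run (limit {max_age})")
    return findings

if __name__ == "__main__":
    for finding in overdue_items(date.today()):
        print("OVERDUE:", finding)
```

Run something like this monthly and "nothing has gone wrong yet" stops being a reason the cadence quietly lapses.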
The Cost Argument: Downtime Is Not Abstract
The conversation about IR investment always comes back to cost. Quarterly exercises, backup validation, vendor management, OT/IT network documentation—these are not free. For a mid-size manufacturer operating on 8-12% EBITDA margins, every discretionary dollar is competitive. The argument for IR readiness has to be made in the same language as every other capital decision: what does it cost, and what is the risk of not doing it?
For the manufacturer in this case study, we worked through the numbers explicitly. The calculation is not complicated, but it requires intellectual honesty about what downtime actually costs when you account for all of it.
| Cost Category | Conservative Estimate | Notes |
|---|---|---|
| Direct production downtime | $18,000–$24,000/hr | Based on revenue/operating hours, excluding fixed overhead absorption |
| Incident response retainer / emergency IR | $35,000–$85,000 | Emergency IR engagement without a retainer runs at premium rates; retainers average $20,000–$40,000/yr |
| Regulatory notification and legal | $15,000–$50,000 | Depends on data exposure; NY SHIELD Act notification obligations apply |
| Customer penalty clauses / SLA breach | Contractual | Manufacturing customers often have late-delivery penalties; quantify your exposure specifically |
| Cyber insurance deductible | $25,000–$100,000 | Deductibles have risen significantly; verify your current policy terms |
| Reputational / competitive impact | Unquantified | Single-source customers represent outsized risk; recovery timeline matters as much as the event itself |
A 24-hour production disruption at this manufacturer lands conservatively between $600,000 and $900,000 in direct costs, before legal fees, before the insurance deductible, before any customer penalties. The cost of a quarterly IR readiness program—contact verification, backup testing, one tabletop per year, maintaining an IR retainer—runs roughly $30,000 to $50,000 annually when properly scoped for a company this size.
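The structure of that estimate matters more than the specific figures, which will differ plant to plant. A minimal sketch of the arithmetic, using placeholder numbers rather than this manufacturer's actuals, looks like this:

```python
# Illustrative only: every figure below is a placeholder, not this
# manufacturer's actual exposure. Substitute your own hourly downtime
# estimate and the line items from your contracts and policy.
OUTAGE_HOURS = 24

downtime_per_hour = (18_000, 24_000)          # direct production loss, $/hr (low, high)
one_time_items = {
    "emergency_ir_engagement": (35_000, 85_000),
    "regulatory_and_legal": (15_000, 50_000),
    "insurance_deductible": (25_000, 100_000),
}

# Sum the hourly loss over the outage, then add each one-time line item.
low = downtime_per_hour[0] * OUTAGE_HOURS
high = downtime_per_hour[1] * OUTAGE_HOURS
for name, (lo, hi) in one_time_items.items():
    low += lo
    high += hi

print(f"Estimated exposure for a {OUTAGE_HOURS}-hour outage: "
      f"${low:,.0f} to ${high:,.0f}, before customer penalties "
      f"and reputational impact.")
```

Swap in your own numbers; the point is that the exposure side of the comparison is calculable, not hand-waved.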
That is not a security spend. That is operational insurance with a calculable premium and a calculable exposure. The CFO in the room ran the math in about ninety seconds. The readiness program was approved before the debrief was over.
Your cyber insurance covers the aftermath. IR readiness compresses the duration. Those are not the same protection, and only one of them keeps your customers from calling your competitor.
One more cost consideration that doesn't get enough attention: cyber insurance carriers now scrutinize IR readiness. Carriers that write manufacturing policies are increasingly requiring evidence of tested incident response programs, not just documented plans. Organizations without that evidence are seeing coverage terms tighten, deductibles rise, and, in some cases, coverage denied for incidents where negligent readiness contributed to the extent of the damage. The market is pricing readiness. Your premium already reflects it.
Five Things Every Manufacturer Should Test Quarterly
This is not a comprehensive IR program. It is a minimum viable readiness cadence—the five activities that, done consistently every quarter, keep the gap between your documented plan and your actual capability from becoming dangerous. Each one takes between 30 minutes and two hours. None of them require outside consultants to execute once you've built the habit.
- Verify every contact in your IR plan by attempting to reach them. Call the vendor support lines. Confirm the account manager is still there. Test the after-hours number. Send a test email to your IR retainer contact. If a contact fails, fix it before you need it. This takes under an hour and eliminates one of the most common points of failure in real incidents.
- Execute and document a full restore from backup. Not a backup check. A restore. Stand up a test system and recover actual data from your most recent backup set. Document the time, the method, and any issues encountered. If you cannot restore in under four hours in a test environment, your RTO is a fiction. Do this quarterly, not annually.
- Walk the communication chain with every person named in it. Sit down with every role listed in your IR plan—including operations and plant management—and confirm they know they're in it, what they're expected to decide, and how they'll be reached. The plant manager who doesn't know he's a decision authority is a gap that costs nothing to close and potentially everything to leave open.
- Verify OT/IT network segmentation hasn't drifted. Networks change. A new vendor connection, a system migration, a shortcut taken during a maintenance window—any of these can silently erode the boundary between your IT network and your production environment. Quarterly verification that your network documentation reflects current reality is the difference between a contained IT incident and a production floor shutdown (a minimal reachability check is sketched just after this list).
- Run a 90-minute tabletop exercise twice per year—with the right people in the room. "The right people" means operations leadership, not just IT. The decisions that matter most during a manufacturing incident—production shutdown, customer notification, inventory management, shift scheduling—are not IT decisions. They require the plant manager, the COO, and someone who can speak to customer commitments. Run the scenario with them present, not in a debrief afterward.
A Note on OT-Specific Readiness
If your production environment includes networked SCADA systems, PLCs, or any operational technology connected to your IT network, add a handful of items to your quarterly list that most generic IR frameworks omit entirely.
- Confirm which OT systems have active vendor remote access and verify those connections are properly controlled and logged
- Document the recovery sequence for production systems specifically—the order in which systems must be restored to bring a line back up safely, and who authorizes each step (a minimal sketch of that sequence as structured data follows this list)
- Identify which OT systems cannot be restored from backup and require vendor involvement—that call needs to be in the plan before the incident, with verified contacts
- Test your ability to isolate the OT network from the IT network without taking production down—if you can't do it cleanly, that's a design problem with a cost attached to it
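For the recovery sequence, the format matters far less than the fact that it is written down, reviewed, and readable by both IT and operations. A minimal sketch, with hypothetical system names and authorizers, might capture it as structured data:

```python
# Hypothetical recovery sequence: the system names, order, and authorizers are
# placeholders. The point is that the sequence and its approvals exist and are
# reviewable before an incident, not improvised during one.
RECOVERY_SEQUENCE = [
    {"step": 1, "system": "historian server",       "restore_from": "backup",
     "authorized_by": "IT manager",    "depends_on": []},
    {"step": 2, "system": "MES application server", "restore_from": "backup",
     "authorized_by": "IT manager",    "depends_on": ["historian server"]},
    {"step": 3, "system": "line 1 PLCs",            "restore_from": "vendor",
     "authorized_by": "plant manager", "depends_on": ["MES application server"]},
    {"step": 4, "system": "line 1 HMIs",            "restore_from": "golden image",
     "authorized_by": "plant manager", "depends_on": ["line 1 PLCs"]},
]

def print_runbook(sequence: list[dict]) -> None:
    """Print the restore order with its approvals and prerequisites."""
    for entry in sorted(sequence, key=lambda e: e["step"]):
        deps = ", ".join(entry["depends_on"]) or "none"
        print(f"{entry['step']}. Restore {entry['system']} from {entry['restore_from']} "
              f"(authorized by {entry['authorized_by']}; prerequisites: {deps})")

if __name__ == "__main__":
    print_runbook(RECOVERY_SEQUENCE)
```

Whether it lives in a script, a spreadsheet, or a laminated page by the line, the sequence should name the systems that require vendor involvement and the person who authorizes each step.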
The manufacturer from this case study completed their first full quarterly readiness review three weeks after the tabletop. They found two more expired vendor contracts, updated the communication chain to include the plant manager and shift supervisors, and conducted their first documented backup restore in fourteen months—which partially failed and exposed a configuration problem in their backup agent that had been silently degrading their backup integrity for six weeks.
They found it in a test. Not on a Tuesday.
That is exactly what this work is for. The plan in the binder was not the problem. The absence of a practice that keeps the plan honest was. Fix the practice. The binder takes care of itself.