Incident response plans fail in predictable ways. The plan exists. It's been written, approved, filed somewhere. When an incident actually happens, the team discovers the plan assumes resources that aren't available, decisions that haven't been pre-authorized, and communications protocols nobody remembers. The IR plan becomes a post-incident artifact, something to reference while explaining to leadership why things went the way they did.

The difference between organizations that respond well and organizations that don't is almost entirely in the preparation. Not the plan document. The preparation: the decisions you make in advance, the infrastructure you test before you need it, and the exercises that reveal whether your program functions or just looks like it does.

The decisions that need to be made before an incident

The most expensive moment in an incident is when the response team is in a room waiting for someone with authority to make a decision. Pre-authorizing decisions eliminates that wait.

Isolation authority

Who can authorize taking a production system offline to contain a compromise? At most companies, that decision in practice requires the CISO, the CTO, and sometimes legal, and collecting those approvals takes time. Pre-authorize it with a defined threshold: "Any system with confirmed malicious activity may be isolated from the network by the security team without prior approval. Business systems with revenue impact require [role] notification within 30 minutes of isolation."
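One way to make a threshold like this unambiguous is to write it as a rule the on-call responder can check without debate. The sketch below encodes the example policy in Python. The class and field names (`IsolationRequest`, `confirmed_malicious`, `revenue_impacting`) are hypothetical. The 30-minute window comes from the example policy text, not from any standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical encoding of the example pre-authorization policy.
NOTIFY_WITHIN = timedelta(minutes=30)  # from the example policy text


@dataclass
class IsolationRequest:
    system: str
    confirmed_malicious: bool   # threshold for isolating without prior approval
    revenue_impacting: bool     # triggers the notification requirement
    isolated_at: datetime


@dataclass
class IsolationDecision:
    may_isolate_without_approval: bool
    notify_role_by: Optional[datetime]  # None when no notification is required


def evaluate(req: IsolationRequest) -> IsolationDecision:
    """Apply the pre-authorized threshold to a single system."""
    if not req.confirmed_malicious:
        # Below the threshold: the normal approval path still applies.
        return IsolationDecision(False, None)
    deadline = req.isolated_at + NOTIFY_WITHIN if req.revenue_impacting else None
    return IsolationDecision(True, deadline)


decision = evaluate(IsolationRequest("orders-db", True, True, datetime(2024, 5, 3, 23, 0)))
print(decision.notify_role_by)  # → 2024-05-03 23:30:00
```

The code itself matters less than the exercise of writing it. If the threshold can't be expressed this precisely, it isn't specific enough yet.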

Ransom payment decision-making

If you experience a ransomware event, you'll face a ransom payment decision under time pressure. That decision has legal, financial, regulatory, and strategic dimensions, and it should not be made for the first time during the incident. Involve your general counsel, CFO, and CISO in a pre-incident conversation about your policy, what triggers re-evaluation of it, and who has final authority. Document the outcome.

External communications authority

Who can communicate with customers, partners, or the press about a security incident? What's the threshold for mandatory notification? What's the 24-hour holding statement? Answering these questions in advance saves hours of legal and PR review during the event.

Law enforcement notification

Do you notify law enforcement? When? For which incident types? The FBI and CISA have resources for ransomware victims that most organizations don't know how to access until they're already in the middle of an event.

The infrastructure that needs to exist and be tested

Out-of-band communication

If your email and Slack are compromised, or even if there's only uncertainty about whether they are, your primary communication channels are unavailable for incident coordination. You need a pre-established out-of-band channel: a separate email domain, a Signal group, or a voice bridge that doesn't depend on your production infrastructure. The team needs to know it exists and how to reach it before they need it.

Preserved contact information

During a credential compromise or ransomware event, your team may be locked out of internal systems. Phone numbers for your IR retainer, cyber insurance carrier, legal counsel, and key internal stakeholders need to live somewhere that doesn't require internal system access. A printed list kept in a secure physical location is not a bad answer.

IR retainer

Without a pre-negotiated IR retainer, you'll be in a queue during an active incident. The major IR firms (Mandiant, CrowdStrike, Palo Alto Networks Unit 42, and others) have retainer programs, and the fee is almost always worth it relative to the delay cost during an active compromise. Your cyber insurance carrier may also have preferred IR firms built into your policy. Check that before an event.

Forensic-grade logging

Your ability to understand what happened, contain it, and recover depends on log coverage and retention. At minimum: centralized authentication logs (Active Directory, Entra ID, Okta), endpoint telemetry from your EDR, network flow data, and cloud service logs where applicable. Thirty-day retention isn't enough for most investigations. Ninety days is a reasonable baseline; 180 days is better for elevated risk profiles.
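A simple way to find retention gaps is to compare each log source's configured retention against the baseline. This minimal sketch assumes you already have an inventory that maps source names to retention in days. The source names and values here are made-up examples. The 90- and 180-day figures are the baselines suggested above.

```python
# Baselines from the text: 90 days standard, 180 days for elevated risk profiles.
BASELINE_DAYS = 90
ELEVATED_DAYS = 180


def retention_gaps(sources: dict[str, int], elevated_risk: bool = False) -> dict[str, int]:
    """Map each log source that falls short of the baseline to its shortfall in days."""
    required = ELEVATED_DAYS if elevated_risk else BASELINE_DAYS
    return {name: required - days for name, days in sources.items() if days < required}


# Hypothetical inventory: configured retention in days per log source.
sources = {
    "entra_id_signin": 30,
    "edr_telemetry": 90,
    "vpc_flow_logs": 120,
    "okta_system_log": 90,
}
print(retention_gaps(sources))                      # → {'entra_id_signin': 60}
print(retention_gaps(sources, elevated_risk=True))
```

Retention is only half the question. A source that isn't collected at all won't appear in this inventory, which is why coverage is tracked separately.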

Clean backup verification

The most common ransomware disaster isn't encryption of production systems. It's encryption of production systems plus the discovery that backups can't be restored. Test your restore capability at least quarterly. Don't just verify that the backup job completed: actually restore from backup to a test environment and confirm the data is intact and functional.
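For file-level data, one way to confirm a restore is intact is to capture a SHA-256 manifest at backup time and compare it against the restored tree. This sketch works under that assumption, and the function names are hypothetical. Matching hashes only prove the bytes came back unchanged; they don't prove the application works. You still need the functional check: start the service and run a few real queries.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(expected: dict[str, str], restored_root: Path) -> dict[str, list[str]]:
    """Compare a restored tree against the manifest captured at backup time."""
    actual = build_manifest(restored_root)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "corrupted": sorted(k for k in expected.keys() & actual.keys() if expected[k] != actual[k]),
        "unexpected": sorted(set(actual) - set(expected)),
    }
```

Store the manifest somewhere separate from the backups themselves. A manifest sitting on the same storage an attacker can encrypt doesn't help.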

Roles and responsibilities

An IR plan that says "the incident response team will..." without naming specific people and alternates is not a plan. It's a description of a plan. For each function, assign a primary and at least one backup:

  • Incident commander: owns the response and coordinates all workstreams
  • Technical lead: owns investigation, containment, and remediation
  • Communications lead: owns internal and external communications
  • Legal counsel: owns regulatory notification assessment, law enforcement liaison, and legal holds
  • Executive sponsor: owns decisions requiring C-level or board authorization

Most SMB and mid-market organizations don't have a dedicated IR team. The people in these roles have day jobs. That's fine. What matters is that the roles are assigned, the people understand what's expected of them in an incident, and they've practiced it at least once.
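If the roster lives in a structured file, a check like the one below can flag roles that lack a named primary and at least one backup. The sketch assumes a hypothetical format in which each role maps to a list of names, with the first name as the primary. The role keys mirror the list above. The names are illustrative.

```python
# Role keys mirror the five functions listed above.
REQUIRED_ROLES = [
    "incident_commander",
    "technical_lead",
    "communications_lead",
    "legal_counsel",
    "executive_sponsor",
]


def roster_gaps(roster: dict[str, list[str]]) -> list[str]:
    """Describe every role without a primary plus at least one distinct backup."""
    problems = []
    for role in REQUIRED_ROLES:
        people = roster.get(role, [])
        if not people:
            problems.append(f"{role}: unassigned")
        elif len(set(people)) < 2:
            # The same name listed twice doesn't count as a backup.
            problems.append(f"{role}: no backup")
    return problems


# Hypothetical roster: first name is the primary, the rest are backups.
roster = {
    "incident_commander": ["A. Rivera", "J. Chen"],
    "technical_lead": ["J. Chen"],
    "communications_lead": ["M. Okafor", "S. Patel"],
    "legal_counsel": ["Outside counsel", "Outside counsel"],
}
print(roster_gaps(roster))
```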

What a tabletop exercise actually tests

A tabletop is not a test of your technical response capability. It's a test of your decision-making and communication under pressure, and that's what most teams get wrong when they design one.

A useful tabletop starts with a realistic scenario, not an unrealistically catastrophic one and not an unrealistically clean one. Something like: "It's 11pm on a Friday. Your MDR provider just called. They've detected unusual outbound data transfers from three servers in your production environment over the past six hours. They believe a service account's credentials were compromised. They're seeing lateral movement." From there, work through the actual decisions:

  • Who do you call, and in what order?
  • When do you isolate systems, and who authorizes it?
  • When do you notify your insurance carrier?
  • What do you tell the engineering team trying to push a release tomorrow morning?
  • At what point do you notify the board?
  • When do you involve external legal counsel?
  • If customer data is confirmed exfiltrated, who drafts the notification, and what's the timeline?

The value is not in getting the "right" answers. It's in discovering the gaps: the decisions where nobody had pre-authorization, the contacts nobody had numbers for, the processes that assumed tools that aren't available at 11pm. Document every gap the exercise reveals and assign an owner to close it. Re-run the exercise in six months, same scenario or a variation, to verify the gaps actually closed.
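A gap register doesn't need special tooling, but it helps to make two failure modes visible: gaps nobody owns, and gaps still open when the exercise is re-run. Here is a minimal sketch with a hypothetical `Gap` record and example entries:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Gap:
    description: str
    owner: Optional[str]              # None means nobody has picked it up
    closed_on: Optional[date] = None  # None means still open


def review_gaps(gaps: list[Gap], rerun_date: date) -> dict[str, list[str]]:
    """Flag gaps with no owner and gaps not closed by the re-run date."""
    return {
        "unowned": [g.description for g in gaps if not g.owner],
        "open_at_rerun": [
            g.description for g in gaps
            if g.closed_on is None or g.closed_on > rerun_date
        ],
    }


gaps = [
    Gap("No after-hours number for the insurance carrier", "R. Diaz", date(2024, 2, 1)),
    Gap("No isolation authority for revenue systems", None),
    Gap("Out-of-band channel never tested", "J. Chen", date(2024, 9, 1)),
]
print(review_gaps(gaps, rerun_date=date(2024, 7, 1)))
```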

The metrics that indicate IR readiness

A key risk indicator (KRI) program for incident response readiness looks something like this:

  • Backup restore test completion rate: the share of critical systems with a successful restore test in the past 90 days. Target 100%.
  • Log coverage completeness: the share of systems generating logs that are collected, with retention meeting your standard. Gaps in coverage are gaps in your ability to investigate.
  • IR contact list currency: when the list was last reviewed and validated. Stale lists are a common failure point.
  • Tabletop recency: the date of the last exercise. Annual tabletops are a common baseline; semi-annual is better for elevated risk profiles.
  • Retainer status: an active, confirmed retainer with an IR firm. Yes or no.
  • Out-of-band communication test: when the channel was last tested. Quarterly at minimum.
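Several of these KRIs reduce to date arithmetic over an inventory. The sketch below computes the restore test completion rate over a 90-day window. It also provides an overdue check you can reuse for tabletop recency or out-of-band channel tests. The function names and the example inventory are hypothetical. The 90-day window, the annual tabletop, and the quarterly test interval (approximated here as 90 days) come from the list above.

```python
from datetime import date, timedelta
from typing import Optional


def restore_test_rate(last_restore: dict[str, Optional[date]], today: date,
                      window_days: int = 90) -> float:
    """Share of critical systems with a successful restore test inside the window."""
    if not last_restore:
        return 0.0
    cutoff = today - timedelta(days=window_days)
    passing = sum(1 for d in last_restore.values() if d is not None and d >= cutoff)
    return passing / len(last_restore)


def overdue(last_done: Optional[date], today: date, max_age_days: int) -> bool:
    """True if the activity never happened or is older than max_age_days."""
    return last_done is None or (today - last_done).days > max_age_days


today = date(2024, 6, 30)
# Hypothetical inventory: date of the last successful restore test per critical system.
restores = {
    "erp": date(2024, 6, 1),
    "crm": date(2024, 2, 10),
    "file_server": None,  # never tested
    "payroll": date(2024, 4, 15),
}
print(restore_test_rate(restores, today))     # → 0.5
print(overdue(date(2023, 5, 1), today, 365))  # tabletop → True
print(overdue(date(2024, 5, 15), today, 90))  # out-of-band test → False
```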

The 15 minutes before you call your IR firm

When an incident starts, the first 15 minutes determine a lot about how the next 72 hours go. A few things to do immediately.

Don't start remediating before you understand the scope. The instinct on finding a compromised system is to clean it. A compromised system is evidence, and cleaning it before forensics are complete destroys the data you need to understand the attack path. Contain first, investigate before you remediate.

Preserve everything. Enable enhanced logging immediately if it isn't already running. Snapshot affected systems if your environment allows. Put a legal hold on relevant data if customer data may be involved.

Document the timeline in real time. Start a running log of everything you know: when the alert fired, what it showed, what actions were taken and by whom. This becomes critical for regulatory reporting, insurance claims, and post-incident review.
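The running log can be as simple as an append-only file of timestamped entries. This sketch writes JSON Lines with UTC timestamps; the class name and file format are assumptions. Appending to a local file does not make it tamper-evident. If the timeline may become evidence, keep a copy on write-once or otherwise access-controlled storage outside the affected environment.

```python
import json
from datetime import datetime, timezone
from typing import Optional


class IncidentTimeline:
    """Append-only running log of who did what, and when (UTC)."""

    def __init__(self, path: str):
        self.path = path

    def record(self, actor: str, event: str, at: Optional[datetime] = None) -> dict:
        entry = {
            "at": (at or datetime.now(timezone.utc)).isoformat(),
            "actor": actor,
            "event": event,
        }
        # Append mode: earlier entries are never rewritten by this class.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
        return entry

    def entries(self) -> list[dict]:
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
```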

Start the clock on your notification obligations. Most U.S. state breach notification laws run their timelines from "discovery": not when the breach occurred, but when you learned of it. Your legal counsel needs to be in the loop early enough to assess and manage those timelines.
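Once counsel has confirmed which notification windows apply, tracking them is date arithmetic from the discovery time. In this sketch, the jurisdiction names and day counts are placeholders, not the actual requirements of any law. Many statutes use "without unreasonable delay" language rather than a fixed number of days, which is one reason counsel has to make that call.

```python
from datetime import datetime, timedelta, timezone


def notification_clock(discovered_at: datetime, windows_days: dict[str, int],
                       now: datetime) -> list[tuple[str, datetime, int]]:
    """Return (jurisdiction, deadline, whole days remaining), earliest deadline first.

    Every window is counted from discovery, not from when the breach occurred.
    """
    rows = []
    for jurisdiction, days in windows_days.items():
        deadline = discovered_at + timedelta(days=days)
        rows.append((jurisdiction, deadline, (deadline - now).days))
    return sorted(rows, key=lambda row: row[1])


discovered = datetime(2024, 5, 4, 9, 0, tzinfo=timezone.utc)
# Placeholder windows: replace with the obligations counsel confirms apply.
windows = {"jurisdiction_b": 45, "jurisdiction_a": 30, "jurisdiction_c": 60}
for name, deadline, left in notification_clock(discovered, windows, discovered + timedelta(days=6)):
    print(name, deadline.date(), left)
```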

Readiness you can see before the call.

Draxis tracks your IR readiness KRIs (backup verification rates, log coverage, retainer status) alongside your security posture indicators. When something's missing, it shows up in your risk posture before an incident, not during one.
