Every other security domain feeds the SOC. Identity, endpoint, network, and application controls all generate signals that someone has to detect, triage, and act on. When the SOC is not working, the spend on every one of those controls is partly wasted, because nobody is watching the alerts they produce.
The KRIs below measure three things: how well you detect, how fast you respond, and whether the operational plumbing (SIEM tuning, log coverage, analyst capacity) holds up. A SOC that fires 50,000 alerts a day can still be blind to the attack that matters. A quiet console can be quiet because nothing is watching. These indicators tell you which one you have. For the full catalog with thresholds mapped to CIS Controls, see the KRI reference library, and if you are standing up measurement from scratch, start with how to build a KRI program.
KRI inventory
Eight indicators, each with what to measure, why it matters, where the data comes from, how to calculate it, and green/amber/red thresholds. The thresholds are starting points; tune them to your environment and threat model.
1. Alert-to-investigation ratio (alert fidelity)
What to measure. The share of security alerts that result in actual analyst investigation, expressed as the percentage of alerts not auto-closed, suppressed, or discarded without human review.
Why it matters. Alert fatigue is the leading cause of SOC analyst burnout and of missed detections. A SIEM generating 50,000 alerts a day with 99% auto-closed still leaves 500 for review. If that is too many, analysts start ignoring the queue. This ratio tells you whether your alerting produces useful signal or noise.
Derivation sources:
- SIEM platform (Splunk Enterprise Security, Microsoft Sentinel, IBM QRadar, Elastic SIEM): alert volume and disposition breakdown (`auto-closed`, `investigated`, `escalated`, `false_positive`)
- SOAR platform (Palo Alto XSOAR, Splunk SOAR, Tines): playbook execution logs showing auto-disposition vs. human review
- Ticketing system (ServiceNow, Jira, PagerDuty): tickets created from security alerts vs. total alert volume
- MDR provider reports: escalation rate vs. total alert volume, if you use an MDR
How to calculate. (Alerts resulting in human investigation) ÷ (total alerts generated) × 100. Also track false positive rate: (alerts investigated and closed as false positive) ÷ (total investigated alerts) × 100.
| Status | Criteria |
|---|---|
| Green | >10% of alerts result in meaningful investigation; false positive rate <30% of investigated alerts |
| Amber | 2–10% fidelity rate; or false positive rate 30–60% |
| Red | <2% fidelity rate (noise-dominated); or false positive rate >60% (analysts have stopped trusting alerts) |
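
A minimal Python sketch of the two ratios, assuming each alert has already been normalized to one final disposition label. The labels (`auto_closed`, `suppressed`, `discarded`, `investigated`, `escalated`, `false_positive`) are hypothetical; map your SIEM or SOAR statuses onto them first. Anything not closed by automation is counted as human-reviewed, and the status function applies the table's bands.

```python
from collections import Counter

# Hypothetical normalized disposition labels; map your SIEM/SOAR statuses onto these.
MACHINE_ONLY = {"auto_closed", "suppressed", "discarded"}


def alert_fidelity(dispositions):
    """Return (fidelity %, false positive %) for a list of final alert dispositions."""
    counts = Counter(dispositions)
    total = sum(counts.values())
    # Anything not closed by automation is assumed to have had human review.
    investigated = total - sum(counts[d] for d in MACHINE_ONLY)
    fidelity = 100 * investigated / total if total else 0.0
    # False positives are a subset of the investigated alerts.
    fp_rate = 100 * counts["false_positive"] / investigated if investigated else 0.0
    return fidelity, fp_rate


def fidelity_status(fidelity, fp_rate):
    """Map the two ratios onto the green/amber/red bands in the table."""
    if fidelity < 2 or fp_rate > 60:
        return "red"
    if fidelity <= 10 or fp_rate >= 30:
        return "amber"
    return "green"


day = ["auto_closed"] * 900 + ["investigated"] * 60 + ["false_positive"] * 30 + ["escalated"] * 10
fidelity, fp_rate = alert_fidelity(day)
print(fidelity, fp_rate, fidelity_status(fidelity, fp_rate))  # 10.0 30.0 amber
```

On this sample day the team sits exactly at 10% fidelity with a 30% false positive rate, which lands in amber on both counts.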
2. Detection coverage rate
What to measure. The percentage of MITRE ATT&CK tactics and techniques relevant to your threat model that have at least one active detection rule or analytic in your SIEM or detection platform.
Why it matters. Detection coverage gaps are blind spots. An attacker using a technique you have no detection for can operate indefinitely without alerting. Mapping your detection rules to ATT&CK shows which attack paths are visible and which are not, which is the foundation of any mature detection engineering program.
Derivation sources:
- SIEM rule inventory: export all active detection rules and map to ATT&CK technique IDs (many SIEMs support ATT&CK tagging natively)
- ATT&CK Navigator: visual mapping of covered vs. uncovered techniques
- Detection engineering platforms (Sigma rules, Microsoft Sentinel Analytics, Chronicle YARA-L): rule libraries with ATT&CK mappings
- EDR platform: built-in detections mapped to ATT&CK (CrowdStrike Falcon, SentinelOne)
- Threat intelligence: your threat model, which actors and techniques are most relevant to your industry
How to calculate. (ATT&CK techniques with at least one active, tested detection) ÷ (total ATT&CK techniques in scope for your threat model) × 100.
| Status | Criteria |
|---|---|
| Green | >70% coverage of threat-model-relevant techniques; coverage mapped and reviewed quarterly |
| Amber | 40–69% coverage; or coverage mapped but not validated through purple team testing |
| Red | <40% coverage; or no ATT&CK-mapped detection inventory; or detection rules deployed but never tested |
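
The coverage calculation reduces to a set intersection between the technique IDs tagged on your rules and your in-scope technique list. This sketch assumes a hypothetical rule export where each rule carries `enabled`, `tested`, and `techniques` fields. It returns the percentage plus the uncovered techniques, which is the list detection engineering actually works from. Only the numeric bands are modeled; the table's "mapped but not validated" conditions need their own checks.

```python
def attack_coverage(rules, in_scope):
    """Percent of in-scope ATT&CK technique IDs covered by at least one enabled, tested rule."""
    covered = set()
    for rule in rules:
        # Only rules that are both enabled and validated (e.g. by purple team) count.
        if rule["enabled"] and rule["tested"]:
            covered.update(t for t in rule["techniques"] if t in in_scope)
    pct = 100 * len(covered) / len(in_scope) if in_scope else 0.0
    return pct, sorted(in_scope - covered)


def coverage_status(pct):
    """Numeric bands from the table only."""
    if pct > 70:
        return "green"
    if pct >= 40:
        return "amber"
    return "red"


in_scope = {"T1003", "T1021", "T1059", "T1078", "T1566"}
rules = [
    {"id": "r1", "enabled": True, "tested": True, "techniques": ["T1059", "T1003"]},
    {"id": "r2", "enabled": True, "tested": False, "techniques": ["T1078"]},
    {"id": "r3", "enabled": False, "tested": True, "techniques": ["T1566"]},
    {"id": "r4", "enabled": True, "tested": True, "techniques": ["T1021", "T1190"]},
]
pct, gaps = attack_coverage(rules, in_scope)
print(pct, gaps, coverage_status(pct))  # 60.0 ['T1078', 'T1566'] amber
```

Techniques outside the threat model (T1190 here) are ignored rather than inflating the score, and untested or disabled rules do not count as coverage.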
3. Mean time to detect (MTTD) by severity
What to measure. Average hours from the earliest log evidence of attack behavior to the creation of an alert or incident ticket, segmented by incident severity (critical, high, medium).
Why it matters. MTTD is the most predictive single metric for breach cost. It measures the gap between when an attacker starts doing damage and when your team knows about it. Segmenting by severity shows whether detection is tuned for the right priority threats. To catch this trend sliding before it becomes an incident, see reading the signal on drifting KRIs.
Derivation sources:
- SIEM: first-indicator-of-compromise log timestamp vs. alert creation timestamp, per closed incident
- Incident management platform: incident timeline fields (`event_start`, `detected`, `contained`), which require post-incident review discipline to populate accurately
- EDR (Microsoft Defender for Endpoint): process start time of malicious activity vs. detection or alert timestamp
- SOAR: automated incident timeline reconstruction from log correlation
How to calculate. For each closed incident: (detection timestamp) − (earliest confirmed event timestamp from forensic review). Average by severity tier.
| Status | Criteria |
|---|---|
| Green | Critical <1 hour; high <4 hours; medium <24 hours |
| Amber | Critical 1–4 hours; high 4–24 hours |
| Red | Critical >4 hours; or no MTTD measurement capability (no post-incident log correlation) |
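
A sketch of MTTD by severity, assuming closed incidents are exported as dicts whose `event_start` and `detected` timeline fields hold `datetime` values. Incidents with a missing timestamp are skipped rather than guessed, so a shrinking sample is itself a warning that post-incident review is not populating the timeline. The status mapping follows the table and treats a tier with no incidents as zero hours.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def mean_hours_by_severity(incidents, start_field="event_start", end_field="detected"):
    """Average (end - start) in hours per severity tier, skipping incomplete timelines."""
    buckets = defaultdict(list)
    for inc in incidents:
        start, end = inc.get(start_field), inc.get(end_field)
        if start is None or end is None:
            continue  # a missing timestamp means no forensic review; don't guess
        buckets[inc["severity"]].append((end - start).total_seconds() / 3600)
    return {sev: mean(hours) for sev, hours in buckets.items()}


def mttd_status(mttd):
    """Table bands: red is driven by the critical tier; anything short of green is amber."""
    crit = mttd.get("critical", 0.0)
    high = mttd.get("high", 0.0)
    medium = mttd.get("medium", 0.0)
    if crit > 4:
        return "red"
    if crit < 1 and high < 4 and medium < 24:
        return "green"
    return "amber"


t0 = datetime(2024, 5, 1, 9, 0)
incidents = [
    {"severity": "critical", "event_start": t0, "detected": t0 + timedelta(minutes=30)},
    {"severity": "critical", "event_start": t0, "detected": t0 + timedelta(minutes=90)},
    {"severity": "high", "event_start": t0, "detected": t0 + timedelta(hours=3)},
    {"severity": "high", "event_start": None, "detected": t0},  # skipped: no start time
]
mttd = mean_hours_by_severity(incidents)
print(mttd, mttd_status(mttd))  # {'critical': 1.0, 'high': 3.0} amber
```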
4. SIEM log source coverage and health
What to measure. The percentage of expected log sources actively ingesting into the SIEM within the defined freshness window, plus the rate of log source health failures: sources that stop sending logs without alerting the security team.
Why it matters. A SIEM is only as good as the data feeding it. Log sources fail silently. A server stops sending syslog, a cloud trail stops after a config change, a network device drops connection, and detection goes blind for that source with no alert. Without proactive coverage monitoring, you discover the gap during post-incident forensic reconstruction, not before.
The silent failure that costs you the incident
The most expensive log gap is the one you find while writing the post-incident report. A source that quietly stopped reporting three weeks ago means the detection that depended on it never had a chance to fire. Alert on source disconnection, not just on the detections themselves.
Derivation sources:
- SIEM platform (Splunk Enterprise Security): source inventory config vs. sources with events in the last 24 hours; data freshness dashboard
- Expected source inventory: CMDB critical asset list cross-referenced against the SIEM source list
- Log management platform (Cribl, Logstash, Fluentd): pipeline health metrics, source disconnection events
- Cloud logging services (CloudTrail, Azure Monitor, GCP Cloud Logging): log delivery health checks
How to calculate. Coverage: (active SIEM sources with events in the last 24h) ÷ (expected sources from the critical asset inventory) × 100. Health: (log source disconnection events in the month that generated an alert) ÷ (total disconnection events in the month) × 100.
| Status | Criteria |
|---|---|
| Green | >99% of critical sources actively ingesting; source disconnection alerting active; gaps investigated within 1 hour |
| Amber | 95–98% active; or source loss detection latency >4 hours |
| Red | <95% active; or no monitoring for source loss; or critical sources offline discovered during an incident |
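
Both halves of this KRI can be sketched from two inputs: the expected source list from the CMDB and a map of each source's last event time, exported from whatever freshness search your SIEM supports. The source names, the 24-hour window default, and the `alerted` flag on disconnection records are illustrative assumptions.

```python
from datetime import datetime, timedelta


def source_coverage(expected, last_event, now, window=timedelta(hours=24)):
    """Percent of expected sources with an event inside the freshness window, plus the silent ones."""
    active = {s for s in expected if s in last_event and now - last_event[s] <= window}
    silent = sorted(set(expected) - active)
    pct = 100 * len(active) / len(expected) if expected else 0.0
    return pct, silent


def disconnection_alert_rate(disconnections):
    """Percent of disconnection events in the period that raised an alert (None if there were none)."""
    if not disconnections:
        return None
    return 100 * sum(1 for d in disconnections if d["alerted"]) / len(disconnections)


def coverage_status(pct):
    if pct > 99:
        return "green"
    if pct >= 95:
        return "amber"
    return "red"


now = datetime(2024, 5, 1, 12, 0)
expected = ["dc01", "fw-edge", "cloudtrail-prod", "web01"]  # from the critical asset list
last_event = {
    "dc01": now - timedelta(hours=1),
    "fw-edge": now - timedelta(hours=30),  # stale: outside the 24h window
    "cloudtrail-prod": now - timedelta(hours=2),
}  # web01 has never reported
pct, silent = source_coverage(expected, last_event, now)
print(pct, silent, coverage_status(pct))  # 50.0 ['fw-edge', 'web01'] red
print(disconnection_alert_rate([{"alerted": True}, {"alerted": False}, {"alerted": True}, {"alerted": True}]))  # 75.0
```

The `silent` list is the useful output: it names the sources whose detections cannot fire, before an incident forces the question.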
5. Mean time to contain (MTTC)
What to measure. Average hours from incident detection to confirmed containment: attacker access removed, malware isolated, lateral movement stopped.
Why it matters. Detection without fast containment still produces large losses. The gap between detection and containment is the window during which data exfiltrates, ransomware spreads, and persistence gets established. MTTC is the second half of the damage equation after MTTD.
Derivation sources:
- Incident management platform: containment action timestamps per incident (needs a defined `containment_confirmed` state in the workflow)
- EDR (Microsoft Defender for Endpoint, CrowdStrike Falcon): endpoint isolation timestamps vs. alert creation timestamps
- Identity platform: account disable timestamps for compromised accounts vs. detection
- Network: firewall or NAC block rule creation for isolating compromised segments
- SOAR: automated containment playbook execution timestamps
How to calculate. For each incident: (containment confirmed timestamp) − (detection timestamp). Average by severity tier.
| Status | Criteria |
|---|---|
| Green | Critical <2 hours; high <8 hours |
| Amber | Critical 2–8 hours; high 8–24 hours |
| Red | Critical >8 hours; or containment not tracked as a discrete milestone |
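
Containment is only confirmed once every required action has happened, so one way to derive a containment-confirmed timestamp is the latest first occurrence among those actions. This sketch assumes hypothetical action records pulled from EDR, identity, and network tooling, each with a `kind` label and an `at` timestamp, plus a set of required actions per incident. Which actions are required is your policy decision; the code cannot infer it.

```python
from datetime import datetime, timedelta


def containment_confirmed_at(actions, required):
    """Containment is confirmed once every required action has happened at least once."""
    first_done = {}
    for action in actions:
        kind, at = action["kind"], action["at"]
        if kind in required and (kind not in first_done or at < first_done[kind]):
            first_done[kind] = at
    if not required.issubset(first_done):
        return None  # containment never fully confirmed
    return max(first_done[k] for k in required)


def mttc_hours(incident, required):
    confirmed = containment_confirmed_at(incident["actions"], required)
    if confirmed is None:
        return None
    return (confirmed - incident["detected"]).total_seconds() / 3600


def critical_mttc_status(hours):
    """Critical-tier bands from the table; untracked containment is red."""
    if hours is None or hours > 8:
        return "red"
    return "green" if hours < 2 else "amber"


t0 = datetime(2024, 5, 1, 9, 0)
incident = {
    "detected": t0,
    "actions": [
        {"kind": "host_isolated", "at": t0 + timedelta(hours=1)},     # EDR isolation
        {"kind": "firewall_block", "at": t0 + timedelta(hours=2)},    # not required here
        {"kind": "account_disabled", "at": t0 + timedelta(hours=3)},  # identity platform
        {"kind": "account_disabled", "at": t0 + timedelta(hours=5)},  # later repeat, ignored
    ],
}
required = {"host_isolated", "account_disabled"}
hours = mttc_hours(incident, required)
print(hours, critical_mttc_status(hours))  # 3.0 amber
```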
6. Playbook coverage rate
What to measure. The percentage of common incident types with a documented, tested response playbook covering escalation paths, containment actions, communication templates, and evidence preservation.
Why it matters. Incident response without playbooks degrades to improvisation under pressure. Playbooks encode institutional knowledge, cut response time by removing decisions during triage, and keep actions consistent across shifts and analysts. Coverage rate measures whether your most common incident types have the operational support they need.
Derivation sources:
- SOAR platform: automated playbook inventory and activation rates
- Incident tracking system: incident type taxonomy vs. playbook existence
- SOC runbook documentation (Confluence, internal wiki): manual playbook library with last-tested dates
- Incident history: trailing 12-month incident type frequency. High-frequency types with no playbook are the highest-priority gaps
How to calculate. Weight each incident type by its trailing 12-month incident count: (incidents whose type has a documented and tested playbook) ÷ (total incidents across all types in the taxonomy) × 100.
| Status | Criteria |
|---|---|
| Green | >90% of incident types by frequency covered; playbooks tested at least annually; automated playbooks in SOAR for the top 5 incident types |
| Amber | 70–89% by frequency; or playbooks documented but not tested |
| Red | <70% by frequency; or no playbooks for the top 3 most frequent incident types |
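
Weighting by frequency means a missing phishing playbook costs far more than a missing playbook for a once-a-year incident type. A minimal sketch, assuming you can export the trailing 12 months of incident types as a flat list and maintain the set of types with a tested playbook (both names hypothetical). Only the numeric bands are modeled; the top-3 and "documented but not tested" conditions need separate checks.

```python
from collections import Counter


def playbook_coverage(incident_types, tested_playbooks):
    """Frequency-weighted coverage: share of past incidents whose type has a tested playbook."""
    freq = Counter(incident_types)
    total = sum(freq.values())
    covered = sum(n for t, n in freq.items() if t in tested_playbooks)
    # Uncovered types, most frequent first: these are the highest-priority gaps.
    gaps = [t for t, _ in freq.most_common() if t not in tested_playbooks]
    pct = 100 * covered / total if total else 0.0
    return pct, gaps


def playbook_status(pct):
    """Numeric bands from the table only."""
    if pct > 90:
        return "green"
    if pct >= 70:
        return "amber"
    return "red"


history = ["phishing"] * 50 + ["malware"] * 30 + ["account_compromise"] * 15 + ["data_exfiltration"] * 5
pct, gaps = playbook_coverage(history, tested_playbooks={"phishing", "malware"})
print(pct, gaps, playbook_status(pct))  # 80.0 ['account_compromise', 'data_exfiltration'] amber
```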
7. Analyst capacity utilization rate
What to measure. Analyst productive time on investigation and response as a percentage of total available analyst hours, a measure of whether SOC capacity matches the alert and incident workload.
Why it matters. Overloaded analysts miss alerts, cut corners on investigations, and churn. Underutilized analysts represent misallocated investment. Neither extreme produces good security outcomes. Capacity utilization is a leading indicator of SOC burnout and the operational number that justifies staffing decisions.
Derivation sources:
- SIEM and SOAR: mean time per alert investigation × alert volume = estimated investigation hours
- Ticketing system: ticket open and close rates and time-in-progress per analyst
- Shift schedules: available analyst hours per period
- Time tracking, if used: actual investigation time logs
- Analyst feedback: qualitative signal on workload through regular 1:1s or surveys
How to calculate. Utilization: (analyst hours on investigation and response) ÷ (total available analyst hours) × 100. Target utilization is 60–70% of analyst time on active investigation, leaving 30–40% for tuning, training, and documentation. Alert load per analyst: alerts requiring human review per shift ÷ analyst headcount per shift. Investigation completion rate: percentage of alerts triaged within the defined SLA.
| Status | Criteria |
|---|---|
| Green | 60–75% investigation utilization; <10% of alerts missing triage SLA |
| Amber | >80% utilization (overloaded) or <40% (underutilized, often a sign of alert fatigue or suppression); or >20% of alerts missing SLA |
| Red | Systematic SLA breaches; analyst attrition above industry baseline; or no capacity measurement |
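
A sketch of the three capacity numbers for one reporting period, assuming investigation hours come from ticket time-in-progress or time tracking and available hours from shift schedules. The `Period` fields are hypothetical names for those inputs. Because the table's bands leave gaps (for example, 75–80% utilization), this returns the raw numbers for comparison rather than forcing a status.

```python
from dataclasses import dataclass


@dataclass
class Period:
    investigation_hours: float  # analyst hours on investigation and response
    available_hours: float      # scheduled analyst hours
    alerts_for_review: int      # alerts that needed human review
    shifts: int                 # shifts in the period
    analysts_per_shift: float   # average analyst headcount per shift
    triaged_in_sla: int         # alerts triaged within the SLA


def capacity_metrics(p):
    """Return (utilization %, alerts per analyst per shift, % of alerts missing triage SLA)."""
    utilization = 100 * p.investigation_hours / p.available_hours
    alerts_per_analyst = (p.alerts_for_review / p.shifts) / p.analysts_per_shift
    sla_miss = 100 * (p.alerts_for_review - p.triaged_in_sla) / p.alerts_for_review
    return utilization, alerts_per_analyst, sla_miss


# 70 eight-hour shifts staffed by 3 analysts each: 70 x 3 x 8h = 1,680 available hours.
month = Period(investigation_hours=1176, available_hours=1680, alerts_for_review=4200,
               shifts=70, analysts_per_shift=3, triaged_in_sla=3990)
print(capacity_metrics(month))  # (70.0, 20.0, 5.0)
```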
8. Threat intelligence integration rate
What to measure. The percentage of active threat intelligence feeds whose indicators (IOCs) are operationally integrated into detection tooling (firewalls, SIEM, EDR, email gateway) with automated blocking or alerting on matched indicators.
Why it matters. Threat intelligence that lives in a portal but is not wired into detection tools is intelligence nobody is acting on. Integration rate measures whether your threat intel spend produces operational outcomes or just adds context for analysts who are already overloaded.
Derivation sources:
- Threat intelligence platform (Recorded Future, Intel 471, MISP, OpenCTI): feed inventory and indicator export status
- SIEM: threat intelligence indicator match rules, confirming whether feeds contribute to detection
- Firewall or proxy: IP and domain block lists sourced from threat intelligence
- EDR: threat intelligence integration for IOC matching
- SOAR: automated IOC enrichment in incident investigation playbooks
How to calculate. (TI feeds with automated indicator distribution to at least one detection tool) ÷ (total TI feeds subscribed) × 100. Also track IOC match rate: active detections triggered by threat intelligence IOCs per month.
| Status | Criteria |
|---|---|
| Green | >90% of TI feeds producing integrated IOCs to detection tools; IOC match rate tracked monthly |
| Amber | 60–89% integrated; or TI feeds in platform but analyst-driven rather than automated |
| Red | <60% integrated; or TI subscriptions with no operational integration; or IOC match rate unknown |
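
The integration rate is a count over the feed inventory. This sketch assumes a hypothetical export where each feed name maps to the set of detection tools it pushes indicators to automatically; an empty set means the feed lives only in the portal or depends on analysts to act on it.

```python
def ti_integration(feeds):
    """feeds maps feed name -> set of detection tools it exports indicators to automatically."""
    integrated = {name for name, sinks in feeds.items() if sinks}
    rate = 100 * len(integrated) / len(feeds) if feeds else 0.0
    return rate, sorted(set(feeds) - integrated)


def ti_status(rate):
    if rate > 90:
        return "green"
    if rate >= 60:
        return "amber"
    return "red"


feeds = {
    "feed_a": {"siem", "edr"},
    "feed_b": {"firewall"},
    "feed_c": set(),  # indicators only viewed in the TIP portal
    "feed_d": {"email_gateway"},
    "feed_e": set(),
}
rate, orphaned = ti_integration(feeds)
print(rate, orphaned, ti_status(rate))  # 60.0 ['feed_c', 'feed_e'] amber
```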
Deriving these KRIs by source type
Where each signal comes from, and the query patterns that get you there. The exact syntax depends on your platform, but the shape of the calculation carries across tools.
From SIEM platforms (Splunk, Microsoft Sentinel, Elastic, QRadar)
- Alert volume and disposition: `index=alerts | stats count by status | where status IN ("investigated","auto-closed","false_positive")`, adjusted to your SIEM query language. See the Splunk Enterprise Security setup guide.
- Log source health: built-in data freshness dashboards (Sentinel: `Heartbeat` table; Splunk: `| tstats count where index=* by host` compared to the expected source list)
- MTTD calculation: join alert creation time with earliest correlated event time, which needs accurate incident timeline logging or SOAR enrichment
- ATT&CK coverage: tag detection rules with ATT&CK technique IDs, then compare `| stats dc(technique_id) as covered_techniques` against the total in-scope technique count
From SOAR platforms (Palo Alto XSOAR, Splunk SOAR, Tines, Cortex)
- Playbook activation rates: which playbooks run most often, identifying the highest-volume incident types
- Automated vs. human action ratio: actions taken by playbook vs. actions requiring an analyst decision
- Containment action timestamps: automated isolation and block actions carry precise timestamps for MTTC calculation
- Analyst workload: playbook escalation rate to a human analyst, by playbook type
From EDR and XDR platforms (CrowdStrike Falcon, SentinelOne, Microsoft Defender)
- Detection-to-isolation time: endpoint isolation timestamp vs. detection alert creation, an MTTC component (CrowdStrike Falcon, Microsoft Defender for Endpoint)
- ATT&CK technique coverage: built-in ATT&CK mapping in detection events (Falcon Intelligence, SentinelOne MITRE report)
- Alert volume by technique: high-volume, low-fidelity techniques that need tuning vs. low-volume, high-fidelity detections
- Agent health: reporting agents vs. total managed endpoints, which feeds the coverage KRI
From threat intelligence platforms (Recorded Future, MISP, OpenCTI)
- Feed freshness: indicator last-updated timestamps, since stale TI feeds produce outdated IOCs
- IOC distribution audit: which feeds are configured to export to SIEM, firewall, and EDR, giving integration coverage
- Match rate: cross-reference TI IOCs against SIEM logs, for example `| lookup threat_intel_iocs dest_ip as dest_ip OUTPUT confidence` in Splunk
- Threat actor tracking: coverage of threat actors relevant to your industry vertical, as a gap analysis
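
The `lookup` in the match-rate bullet is a join between log events and an IOC table. The same logic outside the SIEM, as a Python sketch assuming events are dicts with hypothetical `time` (ISO 8601 string) and `dest_ip` fields and the IOC table maps each IP to a confidence score:

```python
from collections import Counter


def match_iocs(events, iocs):
    """Return events whose dest_ip is a known IOC, tagged with the indicator's confidence."""
    matches = []
    for event in events:
        confidence = iocs.get(event.get("dest_ip"))
        if confidence is not None:
            matches.append({**event, "confidence": confidence})
    return matches


def matches_per_month(matches):
    """Monthly IOC match count, keyed by the YYYY-MM prefix of an ISO 8601 timestamp."""
    return Counter(m["time"][:7] for m in matches)


iocs = {"203.0.113.7": 90, "198.51.100.23": 60}  # documentation-range IPs as stand-in IOCs
events = [
    {"time": "2024-05-14T10:02:00Z", "dest_ip": "203.0.113.7"},
    {"time": "2024-05-20T08:15:00Z", "dest_ip": "192.0.2.10"},
    {"time": "2024-06-03T22:41:00Z", "dest_ip": "198.51.100.23"},
]
hits = match_iocs(events, iocs)
print(len(hits), dict(matches_per_month(hits)))  # 2 {'2024-05': 1, '2024-06': 1}
```

Tracked month over month, a match count that stays at zero for a feed is a hint that the feed adds cost without adding detections.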
From incident management (PagerDuty, ServiceNow, Jira)
- Incident timeline fields: populating `event_start`, `detected`, `contained`, and `resolved` timestamps takes discipline; automate via SOAR integration where possible
- SLA compliance: configure SLA timers in ServiceNow or Jira for MTTD and MTTC targets, and track the breach rate
- Incident type taxonomy: standardize incident classification to enable the playbook coverage calculation
- Post-incident review completion rate: percentage of incidents above the severity threshold with a completed PIR, which feeds back into detection improvement
These signals do not live in one place, which is why deriving them by hand means weeks of stitching across the SIEM, SOAR, EDR, and ticketing systems. The point of tracking them as KRIs is to make the stitching continuous instead of a once-a-quarter scramble. Related domains share many of these sources: see the sibling guides on enterprise security KRIs, security architecture KRIs, and identity and access management KRIs.
See your SOC KRIs without the manual stitching
Draxis pulls MTTD, MTTC, detection coverage, alert fidelity, and log source health straight from your SIEM, SOAR, and EDR, and tracks them in real time so you know when detection or response is drifting before it costs you an incident.