KRI Series: Data Security & Data Governance

The KRIs in this domain measure the data management program that determines what you can protect, who can access it, and how long you keep it. They cover classification coverage, access control quality, data discovery completeness, retention enforcement, and the DLP controls that catch data leaving your environment.

If you are standing this up from scratch, start with how to build a KRI program and the consolidated KRI reference library, which maps every domain to one CIS-aligned catalog.

In this guide

KRI inventory
Deriving these KRIs by source type

Framework mapping

CIS Controls v8

The KRIs in this domain implement and measure these CIS Critical Security Controls:

CIS 3, Data Protection. Classification coverage, DLP enforcement, retention, and encryption.
CIS 6, Access Control Management. Sensitive-data access entitlement and least privilege.

KRI inventory

1. Data classification coverage rate

What to measure. Percentage of known data stores (databases, file shares, cloud storage, SaaS application data repositories) with a current, documented sensitivity classification, and the percentage where classification was applied through automated discovery rather than self-declaration.

Why it matters. You cannot apply encryption, access controls, retention policies, or DLP rules to data you haven't classified. Classification is the prerequisite for every other data security control. Self-declared classification by data owners is systematically optimistic: automated discovery consistently finds sensitive data in stores that owners classified as non-sensitive.

Data discovery and classification tools (Microsoft Purview, Varonis, Spirion, BigID, Nightfall): scan results showing data stores with classification status
Cloud storage: Macie (AWS), Purview (Azure), DLP API (GCP), sensitive data classification in cloud storage
Database activity monitoring: database inventory with classification metadata
CMDB: data store inventory cross-referenced against classification records
SaaS data discovery: tools that discover sensitive data in SaaS applications (Varonis SaaS, Netskope, CASB solutions)

How to calculate. (Data stores with a current classification applied within the last 12 months) ÷ (total known data stores) × 100. Also track the automated vs. self-declared classification rate; automated classification is more reliable.

Status	Criteria
Green	>90% of known data stores classified; automated discovery active and covering cloud + SaaS + on-premises; classification reviewed annually
Amber	75–89%; or classification predominantly self-declared; or SaaS data stores not in scope
Red	<75%; or no automated data discovery; or classification program limited to on-premises only
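A minimal Python sketch of the calculation above, assuming a hypothetical inventory record per data store with a `last_classified` date and a `method` field ("automated" or "self-declared"). The band function applies only the percentage thresholds from the table; the qualitative conditions (discovery scope across cloud, SaaS, and on-premises) still need a human check. The table leaves exactly 90% unassigned, so this sketch treats it as Amber.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class DataStore:
    # Hypothetical inventory record; field names are illustrative only.
    name: str
    last_classified: Optional[date]  # None means never classified
    method: Optional[str] = None     # "automated" or "self-declared"


def classification_kris(stores: List[DataStore], today: date) -> Tuple[float, float]:
    """Return (coverage %, share of current classifications done by automation %)."""
    cutoff = today - timedelta(days=365)  # "applied within 12 months"
    current = [s for s in stores
               if s.last_classified is not None and s.last_classified >= cutoff]
    coverage = 100.0 * len(current) / len(stores) if stores else 0.0
    automated = sum(1 for s in current if s.method == "automated")
    automated_rate = 100.0 * automated / len(current) if current else 0.0
    return coverage, automated_rate


def coverage_band(pct: float) -> str:
    # Percentage thresholds only; exactly 90% is treated as Amber here.
    if pct > 90:
        return "Green"
    if pct >= 75:
        return "Amber"
    return "Red"
```

Feeding it an inventory export and `date.today()` gives both values the KRI asks for; a low automated share is a reason to read a high coverage figure skeptically.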

2. Sensitive data access entitlement coverage

What to measure. Of all access entitlements that users and service accounts hold to data stores classified as confidential or restricted, the percentage that have been reviewed and confirmed as legitimate and current, as opposed to access that was granted and never reviewed.

Why it matters. Sensitive data access sprawl is the consequence of years of permission grants without systematic reviews. Every former employee whose access wasn't revoked, every service account with database permissions granted for a now-complete project, every user whose role changed but whose access wasn't updated is an over-privileged identity with access to data they shouldn't have. This is both a breach risk and a compliance exposure.

IGA platform (SailPoint, Saviynt): data access entitlement reviews, reviewed vs. unreviewed access to sensitive data stores
Database access management: database user roles and permissions cross-referenced against HR active roster
Cloud IAM (AWS, Azure, GCP): data store access policies (S3 bucket policies, Azure RBAC, BigQuery IAM), enumerate who has access to sensitive data stores
Active Directory: group memberships granting file share access to sensitive data, membership vs. HR active roster

How to calculate. (Access entitlements to sensitive data stores confirmed appropriate in last 12-month review cycle) ÷ (total access entitlements to sensitive data stores) × 100

Status	Criteria
Green	>90% of sensitive data access entitlements reviewed in last 12 months; automated access review workflow for sensitive data classifications; revocations executed within 48 hours of review decision
Amber	70–89%; or access review cycle >12 months; or revocation lag >5 days
Red	<70%; or no access review process for sensitive data; or service accounts with unreviewed access to production databases
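A sketch of the review-coverage calculation plus two supporting checks from the source list above: revocation lag after a review decision, and identities with sensitive access that no longer appear on the HR active roster. The `Entitlement` record is a hypothetical shape for an IGA or database access-review export, not any vendor's format. Dates give day granularity, so the 48-hour Green criterion corresponds to a lag of 2 days or less; use timestamps if you need hour precision.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Set


@dataclass
class Entitlement:
    # Hypothetical row from an access-review export.
    identity: str
    store: str
    last_reviewed: Optional[date]           # last date the grant was confirmed
    revoke_decided: Optional[date] = None   # review decided to revoke
    revoked_on: Optional[date] = None       # revocation actually executed


def review_coverage(ents: List[Entitlement], today: date) -> float:
    """% of sensitive-store entitlements confirmed within the last 12 months."""
    cutoff = today - timedelta(days=365)
    confirmed = sum(1 for e in ents
                    if e.last_reviewed is not None and e.last_reviewed >= cutoff)
    return 100.0 * confirmed / len(ents) if ents else 0.0


def revocation_lags_days(ents: List[Entitlement], today: date) -> List[int]:
    """Days from revoke decision to execution; open revocations count up to today."""
    return [((e.revoked_on or today) - e.revoke_decided).days
            for e in ents if e.revoke_decided is not None]


def off_roster(ents: List[Entitlement], active_roster: Set[str],
               service_accounts: Set[str]) -> List[str]:
    """Non-service identities with sensitive access that are not on the HR roster."""
    return sorted({e.identity for e in ents
                   if e.identity not in active_roster
                   and e.identity not in service_accounts})
```

Service accounts are excluded from the roster check because they never appear in HR data; they need their own owner-based review instead.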

3. Data retention and destruction compliance rate

What to measure. Percentage of data stores whose documented retention policies are actually enforced (data deleted or archived at the end of its defined retention period), plus the rate of confirmed policy violations where data was retained beyond the defined period without approval.

Why it matters. Retaining data beyond its required period is a regulatory liability, not just an operational inefficiency. GDPR's storage limitation principle and the CCPA's rule against keeping personal information longer than reasonably necessary expose organizations to fines for over-retention. In litigation, data you kept past its retention period, and should have destroyed, can be compelled in discovery. Less data stored is less data at risk.

Data governance platform: retention policy assignments per data category and enforcement status
Storage systems: storage age analysis, data older than retention period with no archival or deletion action
DLP platform: data retention policy compliance reports
Email and collaboration: email archiving and auto-deletion configuration (Exchange retention policies, Google Vault, Slack retention settings)
Cloud storage lifecycle policies: S3 lifecycle rules, Azure Blob lifecycle management, GCP Object Lifecycle Management, automated deletion enforcement

How to calculate. (Data stores with a retention policy defined and enforced through automated deletion or archival) ÷ (total data stores with an applicable retention policy) × 100. Also track, as an absolute count, data identified as beyond its retention period and not yet deleted or archived.

Status	Criteria
Green	>95% of data stores with automated retention enforcement; data identified beyond retention period actioned within 30 days; privacy-sensitive categories with shortest retention periods at 100%
Amber	80–94%; or retention enforcement manual and periodic rather than automated; or known violations under remediation
Red	<80%; or no retention policy enforcement; or personal data retained indefinitely with no policy
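A minimal sketch of both values, assuming a hypothetical per-store record that joins the retention policy (in days) with a storage age analysis (creation date of each item). Stores with no applicable policy are excluded from the denominator, matching the formula above.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class StoreRetention:
    # Hypothetical record joining policy data with storage age analysis.
    name: str
    retention_days: Optional[int]   # None means no applicable retention policy
    automated_enforcement: bool     # lifecycle rule or auto-deletion configured
    item_dates: List[date]          # creation dates of items in the store


def retention_enforcement_rate(stores: List[StoreRetention]) -> float:
    """% of stores with an applicable policy that enforce it automatically."""
    in_scope = [s for s in stores if s.retention_days is not None]
    enforced = sum(1 for s in in_scope if s.automated_enforcement)
    return 100.0 * enforced / len(in_scope) if in_scope else 0.0


def items_beyond_retention(stores: List[StoreRetention], today: date) -> int:
    """Absolute count of items older than their store's retention period."""
    count = 0
    for s in stores:
        if s.retention_days is None:
            continue
        cutoff = today - timedelta(days=s.retention_days)
        count += sum(1 for d in s.item_dates if d < cutoff)
    return count
```

Tracking the absolute count alongside the rate matters because a store can have automated enforcement configured and still hold a backlog that predates the rule.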

4. Data minimization and shadow data rate

What to measure. Rate at which automated discovery identifies sensitive data in locations where it shouldn't exist: personal data in development environments, financial data in unstructured file shares, PII in application logs, and test databases containing real production data.

Why it matters. Shadow data (sensitive data that exists outside governed, protected repositories) is among the most common breach sources. Development databases seeded with production data, log files capturing PII for debugging, and spreadsheet exports of customer records saved in personal folders exist in almost every organization. Data minimization programs actively identify and eliminate them.

Data discovery tools (BigID, Varonis, Spirion, Nightfall): scans of development environments, file shares, log storage, collaboration tools for out-of-policy sensitive data
Code scanning (Semgrep, custom regex): source code repositories for hardcoded PII or sensitive data patterns
Cloud storage scanning (Macie, Purview): unstructured cloud storage with sensitive data classifications
Email/collaboration: DLP policies scanning for PII in Slack, Teams, Google Drive, email attachments

How to calculate. Track as a trend metric:

Monthly count of shadow data instances discovered (sensitive data found outside governed repositories)
Remediation rate: (Shadow data instances remediated within 30 days) ÷ (total discovered) × 100
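The two trend values above can be sketched as follows, assuming a hypothetical finding record exported from a discovery tool with discovery and remediation dates. One design choice beyond the formula: only findings whose 30-day window has already closed are scored, so recent discoveries don't drag the rate down before they've had a chance to be fixed.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ShadowFinding:
    # Hypothetical finding exported from a data discovery tool.
    location: str
    discovered: date
    remediated: Optional[date] = None


def monthly_counts(findings: List[ShadowFinding]) -> Counter:
    """Shadow data instances discovered per calendar month (YYYY-MM)."""
    return Counter(f.discovered.strftime("%Y-%m") for f in findings)


def remediation_rate_30d(findings: List[ShadowFinding], as_of: date) -> float:
    """% of findings remediated within 30 days, among those whose window has closed."""
    eligible = [f for f in findings if (as_of - f.discovered).days >= 30]
    if not eligible:
        return 0.0
    on_time = sum(1 for f in eligible
                  if f.remediated is not None
                  and (f.remediated - f.discovered).days <= 30)
    return 100.0 * on_time / len(eligible)
```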

Status	Criteria
Green	Automated shadow data discovery active; remediation rate >90% within 30 days; development environments confirmed free of production data
Amber	Discovery active but remediation lagging (30-day remediation rate at or below 90%); or development environments not in discovery scope
Red	No shadow data discovery; or confirmed production data in development environments; or sensitive data in application logs without scrubbing

5. DLP enforcement coverage and effectiveness

What to measure. Two values: the percentage of sensitive data transmission channels (email, cloud upload, USB, web upload, collaboration tools, and printing) covered by Data Loss Prevention controls in enforcement (blocking) mode, and the bypass rate (classified data leaving successfully despite DLP controls).

Why it matters. DLP in detection-only mode documents data loss without preventing it. Coverage rate tells you which channels are protected; bypass rate tells you whether that protection is working. Coverage gaps matter most: if DLP covers email but not cloud upload or collaboration tools, the unprotected channels are the ones exfiltration will use.

DLP platform (Microsoft Purview DLP, Forcepoint, Symantec DLP, Zscaler): channel coverage configuration (email, web, endpoint, cloud, print)
Enforcement mode status: policy mode per channel, test/audit/enforce; count channels in enforce vs. audit
DLP event logs: blocked events, bypassed events, user override events, calculate bypass rate
CASB (Netskope, Zscaler, Microsoft Defender for Cloud Apps): cloud application DLP coverage

How to calculate.

Coverage: (Transmission channels with DLP in enforcement mode) ÷ (total sensitive data transmission channels in scope) × 100
Bypass rate: (Successful transmissions of classified data past DLP controls) ÷ (total classified data transmissions detected) × 100

Status	Criteria
Green	>90% channel coverage in enforcement mode; bypass rate <0.5%; CASB covering cloud upload channels
Amber	70–89% coverage; or key channels (cloud upload, collaboration) in detection-only; or bypass rate 0.5–2%
Red	<70% coverage; or DLP universally in detection mode; or bypass rate >2%
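A sketch of both formulas and the numeric part of the bands, assuming you have already mapped each in-scope channel to the policy mode reported by your DLP platform (the mode strings "test", "audit", "enforce" and the channel names are illustrative). The table leaves coverage between 89% and 90% unassigned; this sketch treats anything from 70% up to and including 90% as Amber. The Amber rule about which specific channels are detection-only still needs a manual check.

```python
from typing import Dict


def dlp_coverage(channel_modes: Dict[str, str]) -> float:
    """% of in-scope channels whose DLP policy mode is 'enforce'."""
    if not channel_modes:
        return 0.0
    enforced = sum(1 for mode in channel_modes.values() if mode == "enforce")
    return 100.0 * enforced / len(channel_modes)


def bypass_rate(successful_past_dlp: int, total_detected: int) -> float:
    """% of detected classified-data transmissions that got past DLP."""
    return 100.0 * successful_past_dlp / total_detected if total_detected else 0.0


def dlp_band(coverage: float, bypass: float) -> str:
    # Numeric thresholds only, taken from the status table.
    if coverage < 70 or bypass > 2:
        return "Red"
    if coverage > 90 and bypass < 0.5:
        return "Green"
    return "Amber"
```

Red is checked first so that a single failing value (for example, a 3% bypass rate) outranks otherwise good coverage.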

6. Data breach impact scope readiness

What to measure. Whether the organization can rapidly determine the scope of a data breach, specifically: which data stores were affected, what data categories were contained in those stores, how many records were involved, and who the affected data subjects are. Measured by time required to produce this assessment in a simulated scenario.

Why it matters. Regulatory breach notification deadlines (72 hours to the supervisory authority under GDPR, no later than 60 days after discovery under HIPAA, and deadlines that vary by US state, often 30 to 60 days or "without unreasonable delay") start when the organization becomes aware of or discovers the breach. The ability to rapidly scope what was affected determines whether those deadlines can be met. Organizations with current data maps and automated inventory can scope a breach in hours; organizations without them can spend weeks in forensic investigation before they are able to notify.

Data map / ROPA (Records of Processing Activities): documented data flows per system, per data category, per jurisdiction
CMDB + data classification: cross-reference to identify affected systems from incident forensics → data categories at risk
Identity/access logs: access to affected data stores, who accessed what during the attack window
DBA tooling: database row count and PII column identification for rapid record count estimation

KRI values.

Data map currency: last update date of data flow documentation
Scope assessment exercise: annual tabletop scenario testing time to produce breach scope assessment, measure in hours
ROPA completeness: (Processing activities documented in ROPA) ÷ (total processing activities in scope) × 100
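The three values above reduce to simple arithmetic. A minimal sketch, assuming you record the data map's last update date, the start time of the tabletop exercise, and the time the scope assessment was delivered (all inputs are illustrative, not tied to any tool).

```python
from datetime import date, datetime


def data_map_age_months(last_update: date, today: date) -> int:
    """Whole calendar months since the data map was last updated."""
    months = (today.year - last_update.year) * 12 + (today.month - last_update.month)
    if today.day < last_update.day:
        months -= 1  # the current month has not fully elapsed
    return months


def exercise_hours(started: datetime, scope_delivered: datetime) -> float:
    """Hours taken to produce the breach scope assessment in the exercise."""
    return (scope_delivered - started).total_seconds() / 3600


def ropa_completeness(documented: int, in_scope: int) -> float:
    """% of in-scope processing activities documented in the ROPA."""
    return 100.0 * documented / in_scope if in_scope else 0.0
```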

Status	Criteria
Green	Data map updated within 12 months; breach scope assessment exercise completed in <4 hours; ROPA >90% complete
Amber	Data map 12–24 months old; or scope assessment exercise >8 hours; or ROPA 70–89%
Red	Data map >24 months old; or no breach scope assessment capability; or ROPA <70% or nonexistent

Deriving these KRIs by source type

From Microsoft Purview (formerly AIP/Azure Information Protection)

Classification coverage: GET /informationProtection/sensitivityLabels + SharePoint / Teams data scanning reports; compliance center Data Classification dashboard shows labeled vs. unlabeled content counts
DLP policy status: Compliance Center → Data Loss Prevention → Policies; enforcement mode per policy; simulation results
Activity Explorer: Shows labeling activity, DLP matches, and sensitivity label changes; feeds classification trend and DLP effectiveness metrics
Insider Risk Management: Exfiltration signals aggregated (not individual); feeds shadow data and DLP bypass metrics

From Varonis

Data classification: Varonis DatAdvantage shows sensitive data classification by file share, SharePoint, and cloud; stale and overexposed data reports
Access entitlements: "Who has access to this?" per sensitive data store; overexposed sensitive data, users with access who haven't accessed in 90 days
Shadow data: Sensitive data in unexpected locations; Varonis classifies it and flags out-of-policy locations
Behavior analytics: Unusual data access patterns, bulk access, access outside hours, behavioral signals for sensitive data
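The "has access but hasn't used it in 90 days" signal can also be computed outside the tool once permissions and last-access dates are exported. A sketch assuming a hypothetical export reduced to (user, store) grant pairs and a map of last-access dates; this is not Varonis's report format. Stale grants are natural candidates for the sensitive-data access review in KRI 2.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

Grant = Tuple[str, str]  # (user, data store), hypothetical export shape


def stale_access(grants: List[Grant], last_access: Dict[Grant, date],
                 today: date, days: int = 90) -> List[Grant]:
    """Grants with no recorded access in the last `days` days (or ever)."""
    cutoff = today - timedelta(days=days)
    stale = []
    for grant in grants:
        seen = last_access.get(grant)
        if seen is None or seen < cutoff:
            stale.append(grant)
    return stale
```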

From BigID / Spirion / Nightfall

Data discovery API: Automated scans across cloud storage, databases, SaaS applications; results include store, data category, record count, risk score
Remediation workflow: Flagged data stores → owner notification → remediation tracking → closure confirmation
Retention violation detection: Data older than defined retention period by data category → remediation queue

From Database Activity Monitoring (Imperva, IBM Guardium, Datadog DBM)

Sensitive query monitoring: Queries selecting PII columns from classified tables, volume trend as a baseline and anomaly signal
User access audit: Users accessing sensitive tables; cross-reference against access review records
Data export events: Large SELECT queries or bulk exports from sensitive tables, potential exfiltration signal
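To turn query volume into a baseline and anomaly signal, one simple approach is to flag a user whose rows read from sensitive tables today exceed their historical mean plus k standard deviations. This is a generic technique chosen for illustration, not how Imperva, Guardium, or Datadog score anomalies. It assumes you can export daily per-user row counts; `k` and the absolute `floor` (which suppresses noise from users with tiny baselines and covers users with no history) are hypothetical tuning parameters.

```python
from statistics import mean, pstdev
from typing import Dict, List


def bulk_export_alerts(history: Dict[str, List[int]], today: Dict[str, int],
                       k: float = 3.0, floor: int = 10_000) -> List[str]:
    """Users whose sensitive-table rows read today exceed baseline + k*sigma and the floor."""
    alerts = []
    for user, rows in today.items():
        past = history.get(user, [])
        if len(past) >= 2:
            threshold = mean(past) + k * pstdev(past)
        else:
            threshold = 0.0  # no usable baseline: rely on the absolute floor alone
        if rows > max(threshold, floor):
            alerts.append(user)
    return sorted(alerts)
```

A high-volume ETL service account stays quiet as long as it reads roughly what it always reads, while an analyst who suddenly pulls hundreds of thousands of rows is flagged.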

From Cloud DLP (AWS Macie, GCP Cloud DLP, Azure Purview)

AWS Macie: aws macie2 list-findings --finding-criteria '{"criterion":{"category":{"eq":["CLASSIFICATION"]}}}' returns the IDs of sensitive data findings in S3; pass them to aws macie2 get-findings for severity and bucket details
GCP Cloud DLP: gcloud dlp jobs list --type=INSPECT_JOB, scheduled inspection jobs; findings by infoType
Auto-classification: Macie findings and Purview scan classifications record which S3 buckets and Azure Blob containers hold sensitive data; feeds classification coverage KRI
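The Macie CLI call above returns finding IDs only. A sketch of the same query in Python using boto3's macie2 client: it pages through list_findings with the same criteria, fetches details with get_findings (which accepts at most 50 IDs per request), and counts classification findings by severity. The client is passed in so the function can be tested with a stub; that you run it with credentials for the account that administers Macie is an assumption about your setup.

```python
from collections import Counter

# Same filter as the CLI example: Macie sensitive data (classification) findings.
CLASSIFICATION_CRITERIA = {"criterion": {"category": {"eq": ["CLASSIFICATION"]}}}


def classification_findings_by_severity(macie_client) -> Counter:
    """Count Macie classification findings by severity description."""
    finding_ids = []
    token = None
    while True:
        kwargs = {"findingCriteria": CLASSIFICATION_CRITERIA}
        if token:
            kwargs["nextToken"] = token
        page = macie_client.list_findings(**kwargs)
        finding_ids.extend(page.get("findingIds", []))
        token = page.get("nextToken")
        if not token:
            break

    counts = Counter()
    # get_findings accepts at most 50 finding IDs per request.
    for i in range(0, len(finding_ids), 50):
        resp = macie_client.get_findings(findingIds=finding_ids[i:i + 50])
        for finding in resp.get("findings", []):
            counts[finding["severity"]["description"]] += 1
    return counts
```

Usage, assuming boto3 is installed and credentials are configured: classification_findings_by_severity(boto3.client("macie2")).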

Draxis turns these KRIs into a live signal

Draxis connects to the tools you already run (data classification, DLP, DSPM, and access governance tooling) and computes these data security KRIs automatically, with the green/amber/red bands, trend lines, and drift alerts described above. No spreadsheets, no manual stitching.
