Controls fail. That is a given. What separates a contained endpoint incident from an enterprise-wide encryption event is architecture: whether one compromised account can become domain admin, whether one infected laptop can reach every share, whether one misconfigured bucket exposes millions of records. Security architecture is the domain that sets the blast radius for everything else.

The KRIs below measure whether the structural design of your environment is producing the containment you invested in, in the running environment rather than on the diagram. They cover segmentation, zero trust maturity, cryptographic standards, attack surface, resilience design, infrastructure as code, and the gap between what your documentation claims and what your stack actually does. If you are standing this up from scratch, start with how to build a KRI program and the consolidated KRI reference library.

KRI inventory

Eight indicators cover the security architecture domain. Each has a clear measurement method, the sources you derive it from, and green/amber/red bands you can tune to your own risk appetite. The bands here are starting points.

1. Network segmentation effectiveness score

What to measure. The degree to which network segmentation prevents lateral movement between zones, measured as the proportion of network paths between critical asset classes that are properly restricted, and validated through periodic segmentation testing.

Why it matters. Segmentation is the architectural control that sets ransomware blast radius. A flat network gives an attacker unrestricted lateral movement the moment they have a foothold. Segmentation effectiveness is the difference between a contained endpoint incident and an enterprise-wide encryption event.

  • Network topology documentation: documented segments and the access policies between them.
  • Firewall rule analysis (Algosec, FireMon, Tufin): which source/destination combinations are explicitly allowed versus blocked. Connect Palo Alto NGFW or FortiGate and FortiManager to read policy and allowed flows directly.
  • Network scanning and internal pen test: actual reachability between segment pairs. This is the ground truth, what can connect to what regardless of policy intent.
  • Network detection tools (Darktrace, ExtraHop, Cisco Stealthwatch): east-west traffic between segments that should not be talking.
  • Cloud network security (VPC security groups, Azure NSGs, GCP firewall rules): cloud segmentation rule analysis.

How to calculate. Identify critical segment pairs that should NOT communicate directly (user workstations to domain controllers, internet-facing apps to internal databases, OT to corporate IT). Test reachability for each pair, then compute blocked_pairs ÷ total_critical_pairs_that_should_be_blocked × 100.

Status  Criteria
Green   >95% of critical segment pairs confirmed blocked, validated through testing rather than policy review alone, with segmentation tested annually or after major architecture change.
Amber   80–94% blocked, or segmentation untested (policy only, not validated).
Red     <80% of critical pairs blocked, or a flat network with no segmentation, or a ransomware blast radius that spans the entire network.
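The calculation and bands above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the PairResult record, the segment names, and the choice to treat any score that falls between the stated bands as amber are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairResult:
    """One tested critical segment pair (hypothetical record shape)."""
    source: str
    destination: str
    blocked: bool  # True if the reachability test found no path


def segmentation_score(results):
    """Percentage of critical segment pairs confirmed blocked."""
    if not results:
        raise ValueError("no critical pairs tested")
    blocked = sum(1 for r in results if r.blocked)
    return 100 * blocked / len(results)


def segmentation_band(score, validated_by_testing):
    """Map a score to a band; untested (policy-only) results cap at amber."""
    if score < 80:
        return "red"
    if score > 95 and validated_by_testing:
        return "green"
    return "amber"


results = [
    PairResult("workstations", "domain-controllers", True),
    PairResult("dmz-apps", "internal-databases", True),
    PairResult("ot", "corporate-it", False),
    PairResult("guest-wifi", "corporate-it", True),
]
score = segmentation_score(results)
band = segmentation_band(score, validated_by_testing=True)
print(f"{score:.1f}% of critical pairs blocked -> {band}")
```

Keeping validated_by_testing as an explicit input makes the policy-versus-reality distinction visible in the result: a perfect score built from firewall policy alone still comes back amber.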

2. Zero trust architecture maturity score

What to measure. Progress against a defined zero trust maturity model, assessed across five pillars (identity, devices, networks, applications, data), using the CISA Zero Trust Maturity Model or NIST SP 800-207 as the reference.

Why it matters. Zero trust is an architectural posture, not a product. It assumes breach and enforces verification for every access request, which directly reduces the blast radius of every other failure. Organizations early in maturity carry implicit trust zones that attackers exploit; mature ones force verification at every boundary.

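  • CISA Zero Trust Maturity Model self-assessment: structured scoring against Traditional, Advanced, and Optimal levels per pillar. Version 2.0 of the model adds an Initial stage between Traditional and Advanced; if you assess against it, widen the scoring scale to match.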
  • Identity pillar: MFA enforcement, conditional access, privileged access controls. Pull this from Okta or Entra ID.
  • Device pillar: endpoint management compliance and device health attestation before access.
  • Network pillar: micro-segmentation and encrypted east-west traffic.
  • Application pillar: application-layer access controls and API authentication enforcement.
  • Data pillar: data classification coverage, encryption at rest and in transit, DLP enforcement.

How to calculate. Score each pillar 1 to 3 (Traditional/Advanced/Optimal) and take the overall average. Track the per-pillar scores and the average as a trend over 12-month periods.

Status  Criteria
Green   Average score >2.0 (Advanced or better across pillars), with specific pillar gaps identified and on the roadmap.
Amber   Average 1.5–2.0, or one or more pillars at Traditional with no active improvement plan.
Red     Average <1.5, or no zero trust assessment conducted, or a majority of pillars at Traditional maturity.
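As a rough sketch of the scoring above, the following Python averages the per-pillar levels and applies the bands. The function names, the has_improvement_plan flag, and the sample assessment are illustrative assumptions; the 1-to-3 scale follows the text.

```python
from statistics import mean

# Numeric value per maturity level, following the 1-to-3 scale in the text.
LEVELS = {"traditional": 1, "advanced": 2, "optimal": 3}


def zt_average(pillars):
    """Average maturity; pillars maps pillar name -> level name."""
    return mean(LEVELS[level] for level in pillars.values())


def zt_band(pillars, has_improvement_plan=True):
    """Apply the green/amber/red bands to a completed assessment."""
    average = zt_average(pillars)
    traditional = [p for p, level in pillars.items() if level == "traditional"]
    if average < 1.5 or len(traditional) > len(pillars) / 2:
        return "red"
    if average <= 2.0 or (traditional and not has_improvement_plan):
        return "amber"
    return "green"


assessment = {
    "identity": "advanced",
    "devices": "advanced",
    "networks": "traditional",
    "applications": "advanced",
    "data": "optimal",
}
print(zt_average(assessment), zt_band(assessment))
```

An empty assessment raises an error from mean rather than returning a band, which matches the table's treatment of "no assessment conducted" as a red condition the caller has to handle explicitly.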

3. Cryptographic standards compliance rate

What to measure. The percentage of cryptographic implementations across the environment (TLS configurations, encryption at rest, PKI, signing algorithms, key lengths) that comply with current standards, and the rate of deprecated or weak configurations still in use.

Why it matters. Cryptographic debt is silent. TLS 1.0 and 1.1 are deprecated and exploitable, SHA-1 signatures are collision-vulnerable, and 1024-bit RSA keys are no longer secure. Organizations accumulate this debt through legacy systems, third-party libraries, and years of "we will fix that later." The risk does not surface until someone exploits it.

  • TLS scanning (SSL Labs, Qualys SSL Scanner, testssl.sh): cipher suite and protocol assessment across internet-facing endpoints.
  • Internal TLS scan: the same assessment on internal services and APIs, often worse than external.
  • Certificate management platform: certificate algorithm and key length inventory (SHA-1 versus SHA-256, RSA 2048 versus 4096).
  • PKI audit: root and intermediate CA key lengths, algorithm types, validity periods.
  • Database encryption: algorithm used for data at rest (AES-128 versus AES-256, deprecated DES/3DES).
  • SSH configuration audit: protocol version, accepted key algorithms, and cipher suites on servers.

How to calculate. services_on_current_standards ÷ total_services_assessed × 100, then flag by deprecated category: TLS 1.0/1.1, SSL, SHA-1, RSA <2048, DES/3DES.

Status  Criteria
Green   >99% compliance with current standards, deprecated cryptography eliminated, and a cryptographic inventory maintained and reviewed annually.
Amber   Known deprecated cryptography with a documented remediation timeline, or internal services worse than external.
Red     SHA-1 in active use, TLS 1.0/1.1 on internet-facing services, or no cryptographic inventory.
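One way to compute the compliance rate and the per-category flags is sketched below. It assumes scan results have already been reduced to a mapping from service name to the deprecated categories found on it; that input shape and the hostnames are hypothetical.

```python
from collections import Counter

# Deprecated categories named in the calculation above.
DEPRECATED_CATEGORIES = ("TLS 1.0/1.1", "SSL", "SHA-1", "RSA <2048", "DES/3DES")


def crypto_compliance(findings):
    """findings maps service name -> list of deprecated categories found on it.

    Returns (compliance percentage, Counter of services per category).
    """
    if not findings:
        raise ValueError("no services assessed")
    compliant = sum(1 for issues in findings.values() if not issues)
    by_category = Counter(
        category for issues in findings.values() for category in set(issues)
    )
    return 100 * compliant / len(findings), by_category


findings = {
    "www.example.com": [],
    "api.example.com": [],
    "legacy-erp.internal": ["TLS 1.0/1.1", "SHA-1"],
    "backup-db.internal": ["DES/3DES"],
}
rate, by_category = crypto_compliance(findings)
print(f"{rate:.1f}% of services on current standards")
for category in DEPRECATED_CATEGORIES:
    print(f"  {category}: {by_category[category]} service(s)")
```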

4. Security architecture review coverage rate

What to measure. The percentage of significant new system deployments, major architecture changes, and new third-party integrations that went through a formal security architecture review before deployment.

Why it matters. Architecture debt is built one unreviewed deployment at a time. Each system shipped without review creates assumptions nobody checked: default credentials, missing encryption, inadequate segmentation, unsecured APIs, data access permissions nobody approved. This KRI tells you whether security is in the path of change or being routed around.

  • Change management system (ServiceNow Change Management, Jira): changes tagged as requiring security review, with review completion as a required field before approval.
  • Architecture review board records: projects reviewed versus projects deployed in the same period.
  • Security team records: completed design reviews, threat models, and architecture sign-offs.
  • Infrastructure as code (Terraform, CloudFormation): PR review process and the security team review requirement for infrastructure changes.

How to calculate. significant_changes_with_completed_review ÷ total_significant_changes × 100. Define "significant" in policy: new external-facing systems, new data stores handling sensitive data, major architectural changes, and new third-party integrations with data access.

Status  Criteria
Green   >95% of defined significant changes reviewed before deployment, with review embedded in the change management workflow.
Amber   80–94%, or a review process that is manual and easily bypassed.
Red     <80%, or security architecture review not in the change management workflow, or reviews happening after deployment.
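A small sketch of the coverage calculation follows, assuming change records can be exported with a deployment date and a review completion date. The Change record and its field names are hypothetical, not a ServiceNow or Jira schema. A review that finishes after deployment is counted as not reviewed, since the red band treats post-deployment review as a failure.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Change:
    """A significant change as defined in policy (hypothetical record shape)."""
    change_id: str
    deployed_on: date
    review_completed_on: Optional[date]  # None if no review took place


def review_coverage(changes):
    """Percentage of changes whose review finished on or before deployment."""
    if not changes:
        raise ValueError("no significant changes in the period")
    reviewed_first = sum(
        1
        for c in changes
        if c.review_completed_on is not None
        and c.review_completed_on <= c.deployed_on
    )
    return 100 * reviewed_first / len(changes)


changes = [
    Change("CHG-1", date(2024, 3, 10), date(2024, 3, 1)),
    Change("CHG-2", date(2024, 4, 2), None),
    Change("CHG-3", date(2024, 5, 20), date(2024, 6, 1)),  # reviewed after go-live
    Change("CHG-4", date(2024, 6, 15), date(2024, 6, 14)),
]
coverage = review_coverage(changes)
print(f"{coverage:.1f}% of significant changes reviewed before deployment")
```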

5. Attack surface reduction rate

What to measure. The trend in size and exposure of the external and internal attack surface, measured as change in internet-facing service count, open service count on critical internal assets, and shadow IT/unmanaged asset count over the trailing 12 months.

Why it matters. Attack surface reduction is the architectural principle of removing or hiding what does not need to be reachable. Most attack surfaces grow by default through new services, forgotten assets, and accumulated exceptions. A shrinking or stable surface, with documented justification for everything in it, is a maturity signal. Watch the direction of travel, which is exactly the kind of trend covered in reading the signal on drifting KRIs.

  • External attack surface management (Censys, Shodan, Runzero): internet-facing service inventory tracked quarter over quarter.
  • Network discovery (Runzero, Nmap scheduled scans): internal asset and open port inventory trend.
  • Cloud CSPM (Wiz, Prisma Cloud, AWS Security Hub): public-facing cloud resource count and change over time. Connect AWS Security Hub to feed cloud posture and exposure findings.
  • Shadow IT discovery (Axonius, Netskope): unmanaged assets on the network.
  • Firewall rule count: total approved external exposure entries, where growth signals architectural bloat.

How to calculate. Track as trend metrics over the trailing four quarters: internet-facing services count (should be stable or declining), shadow IT count (should be declining or stable with justification), and internet-facing services with no documented business justification (should be zero).

Status  Criteria
Green   Attack surface stable or shrinking, all internet-facing services documented and justified, and shadow IT on a declining trend.
Amber   Gradual growth with no corresponding business justification program, or shadow IT stable with no reduction effort.
Red     Rapid growth in internet-facing services without review, or growing shadow IT, or large unreviewed external exposure.
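The trend reading can be sketched as a comparison of the newest quarter against the oldest. The 5% tolerance that separates "stable" from real movement is an assumption for illustration, as are the sample counts; tune both to your own environment.

```python
def quarterly_trend(counts, tolerance=0.05):
    """Classify a trailing series (oldest first) as shrinking, stable, or growing.

    tolerance is the relative change treated as noise (an assumed 5% here).
    """
    if len(counts) < 2 or counts[0] <= 0:
        raise ValueError("need at least two quarters and a non-zero baseline")
    change = (counts[-1] - counts[0]) / counts[0]
    if change > tolerance:
        return "growing"
    if change < -tolerance:
        return "shrinking"
    return "stable"


# Hypothetical counts for the trailing four quarters, oldest first.
internet_facing = [212, 208, 215, 240]
shadow_it = [40, 37, 33, 30]

surface_trend = quarterly_trend(internet_facing)
shadow_trend = quarterly_trend(shadow_it)
print(f"internet-facing services: {surface_trend}")
print(f"shadow IT assets: {shadow_trend}")
```

Comparing only the endpoints is deliberately crude. It hides a spike in the middle of the window, so a fuller version would look at every quarter-over-quarter step.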

6. Architectural resilience and recovery design coverage

What to measure. The percentage of critical systems and services with documented architectural resilience design (high availability configuration, failure mode analysis, backup architecture, recovery time verification) confirmed through testing.

Why it matters. Security architecture includes resilience: surviving and recovering from failures, whether caused by attackers or by operational events. Systems designed without resilience fail catastrophically under stress, and recovery objectives that have never been tested are aspirations, not commitments.

  • Architecture documentation: HA configuration docs and failure mode and effects analysis (FMEA) per critical system.
  • Cloud architecture review (AWS Well-Architected, Azure Advisor, GCP Architecture Framework): resilience pillar scoring.
  • DR/BCP documentation: Recovery Time Objectives and Recovery Point Objectives per critical system.
  • Recovery testing records: tabletop and functional tests of recovery procedures.
  • Backup system health: feeds into the architectural resilience picture.

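How to calculate. critical_systems_with_HA_design_and_recovery_time_verified_by_testing ÷ total_critical_systems × 100. Count a system only when it meets both conditions; adding the two counts together would double-count systems and can push the result past 100%.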

Status  Criteria
Green   >90% of critical systems with documented resilience design and tested recovery, with RTOs validated annually.
Amber   70–89%, or RTOs documented but untested, or HA documented but not validated after change.
Red     <70%, or critical systems with no documented failure handling, or RTOs that have never been tested.

The architectural question is whether critical systems are designed to meet their recovery objectives, and whether that design has been exercised.
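A minimal sketch of the coverage figure, counting a system only when it has both a documented design and a tested recovery, which is how the green band reads. The system names and the three input collections are hypothetical.

```python
def resilience_coverage(critical_systems, ha_designed, recovery_tested):
    """Return (percentage covered, sorted list of systems with a gap).

    A system is covered only if it appears in both ha_designed and
    recovery_tested.
    """
    critical = set(critical_systems)
    if not critical:
        raise ValueError("no critical systems defined")
    covered = critical & set(ha_designed) & set(recovery_tested)
    return 100 * len(covered) / len(critical), sorted(critical - covered)


critical_systems = ["payments", "identity", "erp", "data-warehouse", "email"]
ha_designed = ["payments", "identity", "erp", "email"]
recovery_tested = ["payments", "identity", "data-warehouse"]

coverage_pct, gaps = resilience_coverage(critical_systems, ha_designed, recovery_tested)
print(f"{coverage_pct:.1f}% covered; gaps: {', '.join(gaps)}")
```

Using a set intersection keeps the result at or below 100% and produces the list of gap systems as a by-product.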

7. Infrastructure as code security coverage

What to measure. The percentage of cloud and infrastructure deployments provisioned through infrastructure as code (IaC) with security policy scanning integrated, and the rate of policy violations caught in IaC before deployment versus in live environments.

Why it matters. IaC scanning is the cheapest point in the lifecycle to catch infrastructure misconfigurations. A misconfiguration caught in a Terraform plan costs minutes to fix. The same one discovered in production during a CSPM scan may have been live for weeks with exposure accumulating. This KRI measures how many of your infrastructure misconfigurations can be caught at design time rather than in production.

  • IaC platforms (Terraform, CloudFormation, Pulumi, Bicep): configuration files in source repositories.
  • IaC security scanners (Checkov, tfsec, Terrascan, KICS, Snyk IaC): scan results per repository.
  • CI/CD pipeline: presence of an IaC scan step and an enforcement gate.
  • CSPM findings: compare IaC-managed resources (tagged managed_by = terraform) against manually created ones, which typically carry higher misconfiguration rates. AWS Security Hub supplies the runtime posture side of this comparison.

How to calculate. IaC coverage: resources_via_iac ÷ total_cloud_resources × 100. Scanning coverage: iac_repos_with_scanning ÷ total_iac_repos × 100. Shift-left effectiveness: misconfigs_caught_in_iac_scan ÷ (misconfigs_caught_in_iac_scan + cspm_findings_for_iac_resources) × 100.

Status  Criteria
Green   >90% of cloud resources provisioned via IaC, IaC security scanning in all pipelines, and >80% of misconfigurations caught before deployment.
Amber   70–89% IaC coverage, or scanning present but not gated.
Red     <70% IaC coverage (most infrastructure manually provisioned), or no IaC security scanning, or CSPM findings concentrated on manually created resources.
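The three ratios above translate directly into code. The sketch below reuses the variable names from the formulas; the sample counts are invented for illustration.

```python
def pct(numerator, denominator):
    """Percentage helper that refuses an empty denominator."""
    if denominator == 0:
        raise ValueError("denominator is zero; nothing was assessed")
    return 100 * numerator / denominator


def iac_metrics(
    resources_via_iac,
    total_cloud_resources,
    iac_repos_with_scanning,
    total_iac_repos,
    misconfigs_caught_in_iac_scan,
    cspm_findings_for_iac_resources,
):
    """Compute the three IaC percentages defined in the text."""
    return {
        "iac_coverage": pct(resources_via_iac, total_cloud_resources),
        "scanning_coverage": pct(iac_repos_with_scanning, total_iac_repos),
        "shift_left_effectiveness": pct(
            misconfigs_caught_in_iac_scan,
            misconfigs_caught_in_iac_scan + cspm_findings_for_iac_resources,
        ),
    }


metrics = iac_metrics(
    resources_via_iac=850,
    total_cloud_resources=1000,
    iac_repos_with_scanning=18,
    total_iac_repos=20,
    misconfigs_caught_in_iac_scan=120,
    cspm_findings_for_iac_resources=30,
)
for name, value in metrics.items():
    print(f"{name}: {value:.1f}%")
```

With these sample numbers the environment lands in amber: scanning is widespread and shift-left effectiveness sits right at the 80% mark, but IaC coverage is 85%, below the >90% needed for green.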

8. Security architecture documentation currency

What to measure. The percentage of production system architectures with current, accurate documentation, specifically data flow diagrams showing where sensitive data moves, network architecture showing actual segmentation, and trust boundary documentation, reviewed or updated within the last 12 months.

Why it matters. Security decisions require accurate architectural knowledge. Incident response needs data flows, and regulatory assessments need documented processing architectures. Documentation that is three years out of date is a liability: it creates false confidence, misleads responders, and produces inaccurate risk assessments. Auditors, regulators, insurers, and the board all eventually ask to see it.

  • Architecture documentation system (Confluence, Lucidchart, Miro, architectural decision records): document last_modified dates.
  • Data flow diagrams: cross-reference against actual network traffic (from NDR or network monitoring) and database connection logs. Do the DFDs match what is happening?
  • Change management system: architectural changes in the last 12 months versus documentation updates in the same period.
  • Automated architecture mapping tools (CloudMapper, Cartography): compare tool-discovered architecture against documentation.

How to calculate. critical_system_architectures_updated_in_last_12_months ÷ total_critical_system_architectures × 100.

Status  Criteria
Green   >90% of critical systems with current documentation, with automated discovery used to validate accuracy.
Amber   70–89%, or documentation reviewed but not validated against the actual environment.
Red     <70%, or primary architecture documentation >24 months stale, or data flow diagrams that do not reflect known architectural changes.
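A sketch of the currency calculation from last-reviewed dates follows. The input mapping, the system names, and the 365-day window as a stand-in for "12 months" are assumptions; systems with no documentation at all are passed as None and counted as stale.

```python
from datetime import date, timedelta


def documentation_currency(last_reviewed, today, max_age_days=365):
    """last_reviewed maps system -> date of last review/update, or None.

    Returns (percentage current, sorted list of stale or undocumented systems).
    """
    if not last_reviewed:
        raise ValueError("no critical system architectures listed")
    cutoff = today - timedelta(days=max_age_days)
    current = {
        system
        for system, reviewed in last_reviewed.items()
        if reviewed is not None and reviewed >= cutoff
    }
    stale = sorted(set(last_reviewed) - current)
    return 100 * len(current) / len(last_reviewed), stale


last_reviewed = {
    "payments": date(2024, 11, 2),
    "identity": date(2024, 3, 20),
    "erp": date(2022, 6, 1),
    "data-warehouse": None,  # no architecture documentation on record
}
currency_pct, stale = documentation_currency(last_reviewed, today=date(2025, 1, 15))
print(f"{currency_pct:.1f}% current; stale: {', '.join(stale)}")
```

A recent last_modified date only shows that someone touched the document. It says nothing about whether the content matches the environment, which is why the green band also asks for validation through automated discovery.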

The common failure: scoring policy, not reality

Most segmentation and zero trust scores are built from policy intent: the firewall rules that exist, the maturity model that was filled in once. The ground truth is reachability: what can actually connect to what, regardless of what the policy says. Validate through testing, not policy review, or the green band is fiction.

Deriving these KRIs by source type

The same indicators look different depending on which system you pull them from. Below is how to extract each from the source types you already have.

From network security and visibility tools

  • Segmentation validation: export reachability data between segments from a runZero network scan and compare it to approved firewall policy. As part of an authorized pen test, run nmap -p- --open <segment_range> from each segment.
  • East-west traffic baseline: export a traffic flow matrix from your NDR tool (ExtraHop, Darktrace), then flag connections between segments that should be restricted.
  • Firewall rule analysis: FireMon or Algosec rule utilization report identifies unused rules (attack surface bloat) and any/any rules (segmentation failures).
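The east-west check above amounts to filtering observed flows against the list of segment pairs that should never talk. A minimal sketch, assuming the flow matrix has already been exported and mapped to segment names; the tuple layout and the segment names are hypothetical, not an ExtraHop or Darktrace export format.

```python
# Segment pairs that policy says must not communicate directly.
RESTRICTED_PAIRS = {
    ("workstations", "domain-controllers"),
    ("dmz-apps", "internal-databases"),
    ("ot", "corporate-it"),
}


def restricted_flows(flows, restricted_pairs):
    """Return observed flows whose (source, destination) pair is restricted.

    flows is an iterable of (source_segment, destination_segment, port) tuples.
    """
    return sorted({f for f in flows if (f[0], f[1]) in restricted_pairs})


observed = [
    ("workstations", "file-servers", 445),
    ("dmz-apps", "internal-databases", 5432),
    ("ot", "corporate-it", 3389),
    ("dmz-apps", "internal-databases", 5432),  # duplicate observation
]
violations = restricted_flows(observed, RESTRICTED_PAIRS)
for source, destination, port in violations:
    print(f"unexpected flow: {source} -> {destination} on port {port}")
```

Any flow this returns is evidence that a pair counted as blocked on paper is reachable in practice, so it feeds straight back into the segmentation effectiveness score.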

From cloud platform APIs (AWS, Azure, GCP)

  • AWS attack surface: aws ec2 describe-security-groups | jq '.SecurityGroups[] | select(any(.IpPermissions[].IpRanges[]; .CidrIp == "0.0.0.0/0"))' finds security groups with an ingress rule open to the internet over IPv4, listing each group once. Check Ipv6Ranges for ::/0 as well.
  • Azure public exposure: az network public-ip list for the public IP inventory, plus NSG rule analysis (az network nsg list) for inbound allow rules whose source is * or Internet.
  • GCP public resources: Cloud Asset API, gcloud asset search-all-resources --scope=projects/PROJECT_ID --query="state:ACTIVE" with a public access filter.
  • IaC versus manual: compare resources tagged managed_by = terraform against total resources.
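The AWS check can also be done in Python against saved describe-security-groups output, which makes it easy to cover IPv6 and to count results for a trend. This is a sketch only: the sample JSON is invented, though the field names (SecurityGroups, IpPermissions, IpRanges, CidrIp, Ipv6Ranges, CidrIpv6) follow the shape of that command's output.

```python
import json

# CIDR blocks that mean "any address" for IPv4 and IPv6.
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_security_groups(describe_output):
    """Return sorted IDs of groups with an ingress rule open to any address."""
    open_ids = set()
    for group in describe_output.get("SecurityGroups", []):
        for permission in group.get("IpPermissions", []):
            cidrs = {r.get("CidrIp") for r in permission.get("IpRanges", [])}
            cidrs |= {r.get("CidrIpv6") for r in permission.get("Ipv6Ranges", [])}
            if cidrs & OPEN_CIDRS:
                open_ids.add(group["GroupId"])
    return sorted(open_ids)


# Invented sample standing in for saved CLI output.
sample = json.loads("""
{
  "SecurityGroups": [
    {"GroupId": "sg-0aaa", "IpPermissions": [
      {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
       "IpRanges": [{"CidrIp": "0.0.0.0/0"}], "Ipv6Ranges": []}]},
    {"GroupId": "sg-0bbb", "IpPermissions": [
      {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
       "IpRanges": [{"CidrIp": "10.0.0.0/16"}], "Ipv6Ranges": []}]},
    {"GroupId": "sg-0ccc", "IpPermissions": [
      {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
       "IpRanges": [], "Ipv6Ranges": [{"CidrIpv6": "::/0"}]}]}
  ]
}
""")
exposed = open_security_groups(sample)
print(f"{len(exposed)} group(s) open to the internet: {', '.join(exposed)}")
```

An open rule on port 443 may be fully intended, so treat the output as the list of groups that need a documented business justification rather than a list of defects.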

From CSPM platforms (Wiz, Prisma Cloud, AWS Security Hub)

  • Misconfiguration rate by resource type: CSPM findings dashboard filtered by severity, tracking new critical findings per week.
  • IaC versus runtime comparison: Wiz and Prisma can identify which resources match IaC definitions versus which were created manually.
  • Attack path analysis: the Wiz attack path graph shows architectural paths from internet exposure to crown jewel resources. The AWS Security Hub integration normalizes these findings into Draxis.

From certificate and cryptographic management

  • TLS scanning: testssl.sh --wide --json <target> across the full FQDN inventory, parsing the output for deprecated protocol and cipher suite use.
  • Certificate inventory: ACME CA logs (Let's Encrypt), internal PKI logs, or the Cert Spotter API give certificate algorithm and key length per entry.
  • SSH configuration audit: the ssh-audit tool against server inventory for key algorithm and cipher compliance.

From source code management (for IaC)

  • IaC repository discovery: use the GitHub API to list repositories containing .tf, .yaml or .json (CloudFormation), or .bicep files.
  • IaC scan integration: inspect GitHub Actions workflow files for the presence of checkov, tfsec, terrascan, or snyk iac steps.
  • IaC change frequency: commits to infrastructure repositories versus CSPM findings on those resources. A high commit rate with low scan coverage means high misconfiguration risk.
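The workflow inspection step can be approximated with a plain text search, sketched below. It assumes the workflow file contents for each repository have already been fetched and concatenated into one string per repository; the repository names and workflow snippets are invented. Substring matching is crude: it will also match a scanner name that appears only in a comment, and it says nothing about whether the step actually gates the pipeline.

```python
# Scanner names to look for, taken from the list above.
SCANNERS = ("checkov", "tfsec", "terrascan", "snyk iac")


def scanners_in_workflow(workflow_text):
    """Return the scanner names that appear in the workflow text."""
    text = workflow_text.lower()
    return [name for name in SCANNERS if name in text]


def scanning_coverage(repo_workflows):
    """repo_workflows maps repo name -> concatenated workflow file text.

    Returns (percentage of repos with a scanner, sorted unscanned repos).
    """
    if not repo_workflows:
        raise ValueError("no IaC repositories listed")
    scanned = {
        repo for repo, text in repo_workflows.items() if scanners_in_workflow(text)
    }
    unscanned = sorted(set(repo_workflows) - scanned)
    return 100 * len(scanned) / len(repo_workflows), unscanned


repo_workflows = {
    "infra-network": "steps:\n  - run: checkov -d .\n",
    "infra-data": "steps:\n  - run: tfsec .\n",
    "infra-identity": "steps:\n  - run: snyk iac test\n",
    "infra-legacy": "steps:\n  - run: terraform plan\n",
}
scan_pct, unscanned = scanning_coverage(repo_workflows)
print(f"{scan_pct:.1f}% of IaC repos scanned; missing: {', '.join(unscanned)}")
```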

These eight pair naturally with the operational and application views. See the companion guides on KRIs for security operations, KRIs for application security, and KRIs for enterprise security for the controls that sit on top of the architecture.

Draxis turns these KRIs into a live signal

Draxis connects to the tools you already run (firewalls, cloud posture, CSPM, certificate management, source control) and computes these architecture KRIs automatically, with the green/amber/red bands, trend lines, and drift alerts described above. No spreadsheets, no manual reachability stitching.
