A vulnerability scanner finding card. The SEVERITY field is redacted under a diagonal red 'NOT SCHEDULED' rubber stamp. The RISK field below is filled with a burnt-orange 'CRITICAL' badge.

“I’m sorry, Dave. I’m afraid I can’t do that.” NIST said it more politely on April 15.

The NVD change is permanent, not a temporary glitch. CVE volume has outpaced NIST’s analysis capacity. For 25 years, vulnerability-management programs assumed the National Vulnerability Database would grade every CVE with a CVSS score, an affected-products string, and a weakness classification.

Workflows everywhere are keyed on that assumption: scanner integrations pull NVD CVSS as the severity field, ticketing rules auto-create criticals, SLA timers tier off “high or above,” executive dashboards count “open critical findings in production.” All of it inherits a signal that NIST has now formally stopped producing for most CVEs.

The short version of what to do instead: separate severity from risk, own the risk scoring, and let the rest of your program tighten against the scoring you control. Severity becomes context, not a gate.

What April 15 actually changed

NIST established three prioritization tiers for CVE enrichment going forward:

  1. CVEs in CISA’s Known Exploited Vulnerabilities (KEV) catalog. SLA target: one business day.
  2. CVEs affecting federal-government software.
  3. CVEs in “critical software” per Executive Order 14028 (identity / credential / access management, operating systems, hypervisors, container runtimes, and the libraries they depend on).

Everything else is published in the NVD but labeled “Not Scheduled.” About 29,000 backlogged CVEs were moved into Not Scheduled in bulk. NIST also stopped routinely producing its own CVSS score when a CNA (CVE Numbering Authority, typically the vendor) has already supplied one.

Two consequences hit the same day.

Scanner output goes blank or defaults to medium. Some tools that pull CVSS from NVD return null for unenriched CVEs. Some quietly drop the finding. Others assign a hardcoded “medium” that is neither safe nor risk-based.

Severity becomes contested. VulnCheck’s analysis of CVEs with both NVD and CNA scores found they disagreed in more than half the cases, sometimes by enough to move a vulnerability across severity levels. “The NVD CVSS” is no longer a single number. It is an absence, or a CNA score, or both with a gap between them.

The volume backdrop matters too. CVE submissions grew 263% from 2020 to 2025. FIRST forecasts 50,000+ CVEs in 2026. Cisco’s Jerry Gamblin forecasts ~70,000. Both forecasts predate AI-assisted vulnerability discovery operating at scale. Whatever signal NVD CVSS used to provide is degrading further, not stabilizing.

Severity and risk are not synonyms

Most programs treat severity and risk as the same thing. They’re not, and the conflation is now actively expensive.

Severity is the label an external party assigns to a vulnerability in isolation. The CVSS base score asks: how bad is this vulnerability if everything else is equal? CVSS's own documentation says base scores shouldn't be used in isolation. Programs use them in isolation anyway, because the alternative was hard.

Risk is your assessment of likelihood and impact in your environment. Will an attacker actually reach the vulnerable code in our setup? If they do, what gets damaged? Risk is a property of your system, not the CVE.

The conflation worked while severity was a usable proxy for risk. NIST was the quiet quality-assurance layer that made the proxy hold up: when CNAs varied wildly in completeness and scoring, NIST re-scored independently. With that layer gone, severity is now a label without a referee, in a system producing 50,000+ labels a year.

You can’t fix this by chasing better severity data. You fix it by assessing risk yourself.

Risk scoring you own

NIST 800-30 (Guide for Conducting Risk Assessments) defines risk as a function of likelihood and impact. The methodology is standard. What's missing from most vulnerability-management programs is operationalizing risk assessment at scanner volume. Ponemon found two-thirds of programs scan and respond only periodically, and only half rate their prioritization of critical vulnerabilities as effective.

Here’s the operationalization we recommend starting with:

Likelihood = exploit-evidence × reachability.

  • Exploit-evidence: is the CVE in CISA KEV, in VulnCheck KEV, scoring high on EPSS, or covered by an active vendor advisory? KEV and EPSS are not redundant. KEV is binary, evidence-grounded, and defensible to an auditor: it replaces severity-driven escalation for the confirmed. EPSS is a daily-refreshed probability of exploitation in the next 30 days: it replaces CVSS for ranking the unknown long tail. Both belong in your scoring. VulnCheck KEV is a strict superset of CISA KEV (~80% more entries, ~2,500 vs ~1,400, added on average ~27 days earlier than competing catalogs), and its knownRansomwareCampaignUse flag is the most-cited input for ransomware-driven hard-escalation rules. (A minimal lookup sketch for the KEV and EPSS signals follows this list.)
  • Reachability: in your environment, is the vulnerable code reachable from where you’re exposed? Three flavors. Network reachability: is the system internet-exposed? Package reachability: is the vulnerable library actually loaded? Function-level reachability: does your call graph invoke the specific vulnerable function? Endor Labs’ published research finds fewer than 10% of vulnerabilities flagged by traditional SCA tools are actually reachable from application entry points. The other 90%+ is noise.
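Here's that lookup as a minimal sketch. It uses the public FIRST EPSS API and CISA's KEV JSON feed (VulnCheck KEV requires an API key, so the sketch falls back to the CISA feed, which carries the same knownRansomwareCampaignUse field); function names and the returned dict shape are illustrative, and retries and error handling are omitted.

```python
# Minimal exploit-evidence lookup against the public FIRST EPSS API and the
# CISA KEV JSON feed. Function names are illustrative; no error handling.
import requests

EPSS_API = "https://api.first.org/data/v1/epss"
KEV_FEED = ("https://www.cisa.gov/sites/default/files/feeds/"
            "known_exploited_vulnerabilities.json")

def load_kev() -> dict[str, dict]:
    """Fetch the CISA KEV catalog once and index its entries by CVE ID."""
    catalog = requests.get(KEV_FEED, timeout=30).json()
    return {v["cveID"]: v for v in catalog["vulnerabilities"]}

def epss_score(cve_id: str) -> float:
    """EPSS: daily-refreshed probability of exploitation in the next 30 days."""
    resp = requests.get(EPSS_API, params={"cve": cve_id}, timeout=30).json()
    rows = resp.get("data", [])
    return float(rows[0]["epss"]) if rows else 0.0

def exploit_evidence(cve_id: str, kev: dict[str, dict]) -> dict:
    """Combine the binary KEV signal with the probabilistic EPSS signal."""
    entry = kev.get(cve_id)
    return {
        "in_kev": entry is not None,                  # confirmed exploitation
        "ransomware": bool(entry)
            and entry.get("knownRansomwareCampaignUse") == "Known",
        "epss": epss_score(cve_id),                   # ranks the long tail
    }

kev = load_kev()
print(exploit_evidence("CVE-2021-44228", kev))  # Log4Shell: KEV-listed, high EPSS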

Impact = blast radius.

  • What’s the data classification of the affected resource? Production vs. dev. Regulated vs. internal. Customer-facing vs. internal-only.
  • What does compromise of this resource enable? An IAM role with administrative access has a wider blast radius than one scoped to a single S3 prefix.

A finding is critical when it’s exploit-evidenced AND reachable AND on a high-blast-radius resource. High when any two of those three hold. The thresholds for medium and low fall out from the same dimensions. In practice, this base scoring is never the whole story: operationalized programs layer policy floors on top.
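Before the floors, the base rule is small enough to write down. A minimal sketch, with illustrative field names and an assumed medium/low split (one dimension held → medium, none → low), since the thresholds are yours to set:

```python
# Base tier rule over the three dimensions: all three -> critical, any two
# -> high. Field names and the medium/low split are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Finding:
    exploit_evidenced: bool   # KEV-listed, high EPSS, or active advisory
    reachable: bool           # network-, package-, or function-level
    high_blast_radius: bool   # production, regulated data, wide privileges

def base_tier(f: Finding) -> str:
    hits = sum([f.exploit_evidenced, f.reachable, f.high_blast_radius])
    return {3: "critical", 2: "high", 1: "medium"}.get(hits, "low")
```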

Two floor patterns recur in published programs:

  1. KEV-listed CVEs get forced to an emergency tier on systems that run the vulnerable software. Federal civilian programs apply this without an explicit reachability check (CISA BOD 22-01 requires ~14-day remediation once the inventory match is made). Engineering-led programs typically condition on exposure (internet-facing, reachable).
  2. Tier-1 asset classes (customer-facing systems, identity infrastructure, regulated data stores) get the tightest SLA regardless of CVSS.

Floors don’t replace scoring; they pin specific classes above a ceiling so that a low computed score on a high-blast-radius resource can’t quietly pass. They’re also why no single math choice is sufficient on its own. Hard policy constraints like “PCI scope is always at least high” aren’t encodable as a multiplied factor. Operationalized programs use overrides on top of the math, whether vendor-built or in-house.

Underneath, this is a Threat × Exposure × Impact rubric: Threat from exploit evidence, Exposure from reachability, Impact from blast radius. The math is a tactical choice. Named programs in 2026 share the operational pattern (tiered SLAs, asset overlays, KEV / EPSS overrides, policy floors), but their scoring math varies. GitLab and Atlassian start from CVSS. CISA uses an SSVC decision tree. Datadog blends signals into a weighted score. Netflix uses FAIR loss-expectancy. We chose multiplicative T × E × I for auditability and one-rubric coverage of both compliance and risk-based decisions; the floors are what make any of these choices work in practice.
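A minimal sketch of that combination, multiplicative scoring with floors layered on top. Every weight, cut point, and floor rule here is an illustrative assumption, not any named program's published values:

```python
# Multiplicative T x E x I with policy floors on top. All cut points and
# floor rules are illustrative assumptions.
TIER_ORDER = ["low", "medium", "high", "critical"]

def txei(threat: float, exposure: float, impact: float) -> float:
    """Each factor in [0, 1]. The product is the auditability argument: a
    zero anywhere (no evidence, unreachable, no blast radius) zeroes it."""
    return threat * exposure * impact

def tier_from_score(score: float) -> str:
    if score >= 0.6: return "critical"   # illustrative cut points
    if score >= 0.3: return "high"
    if score >= 0.1: return "medium"
    return "low"

def apply_floors(base: str, *, kev_and_running: bool, tier1_asset: bool,
                 pci_scope: bool) -> str:
    """Floors pin classes above a minimum; they never lower a score."""
    floors = [base]
    if kev_and_running:
        floors.append("critical")  # BOD 22-01-style emergency tier
    if tier1_asset or pci_scope:
        floors.append("high")      # hard constraint, not a multiplied factor
    return max(floors, key=TIER_ORDER.index)
```

Note what the floor buys: a reachability factor of zero drives the computed score to zero no matter how loud the exploit evidence is, and only an explicit, named floor can pull the tier back up, which is exactly the audit trail you want.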

That’s NIST 800-30 applied to vulnerability findings, not a separate rubric. It’s worth saying explicitly because the first reaction to a homemade-looking scoring rubric is “is this defensible to an auditor?” The answer: yes, you’re applying a federal risk-assessment methodology to your environment. That’s the job.

What changes operationally

Once you own your risk scoring, four things tighten up.

KPIs. The headline number stops being “open critical (NVD-defined) findings” and starts being “open critical (by our scoring) findings in production.” The first metric is conformance to a degrading external signal. The second tracks your actual exposure. SLA compliance moves from primary to secondary, measured against your scoring.

SLAs. SLA tiers still exist, but they tier off your scoring, not vendor severity. A finding with a critical CVSS that fails the reachability check is no longer automatically a 7-day P1. It’s documented as “scored down for non-reachability” and routed to a longer track. A finding with medium CVSS that is exploit-evidenced and reachable in production gets the 7-day clock.
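As a sketch, the re-keying is a one-line lookup plus a documented reason on the ticket; the day counts and the note convention below are illustrative, not recommendations:

```python
# SLA timers keyed to the internally computed tier, never to vendor CVSS.
SLA_DAYS = {"critical": 7, "high": 30, "medium": 90, "low": 180}

def sla(internal_tier: str, note: str = "scored by T x E x I rubric") -> tuple[int, str]:
    return SLA_DAYS[internal_tier], note

# Critical CVSS but unreachable: our scoring said medium -> 90-day track.
print(sla("medium", "scored down for non-reachability"))
# Medium CVSS, exploit-evidenced and reachable in production -> 7-day clock.
print(sla("critical"))
```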

The operating model. If your operating model puts triage on humans, severity-as-default was doing most of the prioritization work for you (poorly). Risk scoring shifts the load onto something explicit. At scanner volumes, the explicit scoring has to be largely automated, with humans reviewing what passes the gate. The humans don’t disappear. They move from authoring to sign-off, where their judgment compounds instead of getting eaten by the queue.

Dashboards. Stop reporting NVD-defined criticals as a primary metric to executives. That number is now reporting a signal NIST has explicitly de-prioritized maintaining. Lead with the count of risk-scored criticals. Keep severity counts as secondary context.

What doesn’t change

A few things hold.

Severity is still a useful input to your risk scoring. Exploit-evidence is one half of likelihood, and CVSS base is one input to exploit-evidence (alongside KEV, EPSS, and vendor advisories). You’re not throwing severity away. You’re moving it from gate to ingredient.

Compliance frameworks lag. PCI, SOC 2, FedRAMP and similar standards still reference CVSS thresholds. Auditor expectations will lag the technical reality for years. You still need a severity view for those conversations. The good news: a risk-scored ticket trail is closer to what auditors are actually trying to verify (“did you triage this responsibly?”) than a CVSS spreadsheet ever was.

The volume problem doesn’t go away. Risk scoring reduces what you act on, not what you process. You still need infrastructure to ingest, score, and route the thousands or tens of thousands of findings your scanners produce each year. The scoring is what makes the volume tractable.

Make the shift before the gap forces it

Programs that own their risk scoring in 2026 will be defensible in front of auditors and operationally honest with their own teams. Programs that don't will spend the year explaining why their count of open critical (NVD-defined) findings stays green while their actual exposure climbs.

Speed is the other forcing function. The Zero Day Clock tracks mean time-to-exploit falling from 2.3 years in 2018 to 56 days in 2024 to 23 days in 2025.

2026 so far: 10 hours, more than a 50x reduction from 2025.

74% of exploited CVEs in 2026 were weaponized before public disclosure (as of May 2026). The 30-day patch window is gone, and a queue ranked by external severity wasn't producing decisions fast enough even before NIST stopped curating it.

The work is concrete:

  1. Define your critical and high tiers in your team’s terms. Likelihood × impact, with operationalizations for both.
  2. Re-key your KPIs and SLAs to your scoring. Move CVSS severity to context.
  3. Build or buy the automation that applies the scoring at scanner volume, with humans reviewing the gates.

How to build a reachability + exploitability stack to support risk assessment is its own post (that’s in the queue).

Severity is what arrives. Risk is what you assess. The market just made the difference impossible to ignore.