Welcome back to our series on scoring vulnerabilities in medical device designs! We’ve observed how CVSS evolved from generation 2.x to generation 3.x, becoming a rubric more suitable for evaluating MIS/IT systems in live production environments but less suitable for evaluating medical device designs. Now it’s time to tour the other rubrics that are out there and see what they have to offer.
Over the past decade, a wide variety of scoring rubrics has been created. They are usually created to fit a specific niche need or industry, and all are written from the perspective of a released device. In other words, they are designed to assess “threats” rather than “vulnerabilities.” These rubrics include:
- FDA Premarket Guidance (Exploitability over Severity)
- NIST’s Risk Determination (Likelihood over Impact)
- CVSS v2 (includes “Collateral Damage” (i.e. “Severity”))
- CVSS v3 (removed “Collateral Damage” (i.e. “Severity”))
  - CVSS v3.1
- NIST’s CMSS (a variant of CVSS)
- NIST’s CCSS (a variant of CVSS)
- MITRE’s Medical Device Rubric (an elaboration on CVSS)
- IVSS (an industrial variant of CVSS)
- OWASP Risk Rating
- Billy Rios & DHS’s RSS-MD (a medical device variant of CVSS)
- PVSS/EPSS (a probability variant of CVSS)
These rubrics usually fall into one of three approaches:
- Multiple subjective factors/attributes, all equally reflected in the final score
- Multiple subjective factors/attributes, with each attribute being conditioned by a weighting factor before being calculated into a final score
- Either of the two approaches above, with the addition of a “Likelihood” metric
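To make the three approaches concrete, here is a minimal sketch in Python. The attribute names, the 0-10 rating scale, the weights, and the multiplicative use of Likelihood are all assumptions made for illustration; they are not taken from any of the rubrics listed above.

```python
from typing import Dict

# Hypothetical attribute ratings on a 0-10 scale (illustrative values only).
ratings: Dict[str, float] = {
    "severity": 9.0,
    "ease_of_exploitation": 4.0,
    "attacker_skill": 6.0,
}

# Hypothetical weights for the weighted approach; chosen to sum to 1.0.
weights: Dict[str, float] = {
    "severity": 0.6,
    "ease_of_exploitation": 0.3,
    "attacker_skill": 0.1,
}


def equal_score(r: Dict[str, float]) -> float:
    """Approach 1: every attribute counts equally (simple average)."""
    return sum(r.values()) / len(r)


def weighted_score(r: Dict[str, float], w: Dict[str, float]) -> float:
    """Approach 2: each attribute is scaled by its weight before summing."""
    return sum(r[name] * w[name] for name in r)


def with_likelihood(base_score: float, likelihood: float) -> float:
    """Approach 3: scale either score by a likelihood guess between 0 and 1."""
    return base_score * likelihood


print(f"equal:    {equal_score(ratings):.2f}")
print(f"weighted: {weighted_score(ratings, weights):.2f}")
print(f"weighted with likelihood 0.1: "
      f"{with_likelihood(weighted_score(ratings, weights), 0.1):.2f}")
```

Even in this toy version, a single guessed Likelihood value can shrink the final score by an order of magnitude without anything about the design itself changing.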
If you’ve been following our cybersecurity blogs, you may already be aware of our perspective on the usefulness (or, more specifically, uselessness) of “Likelihood” as a metric for assessing vulnerabilities. However, you may not be as familiar with how “Likelihood” and its various spin-off/sidekick attributes are used in cybersecurity scoring rubrics. Let’s dig a little deeper so you can understand why any inclusion of Likelihood distorts vulnerability scoring in medical devices.
The first consideration should be that since we are scoring design vulnerabilities (see our first post in this series), this medical device has not yet been created. Therefore, there is no likelihood of it being exploited. This metric doesn’t pertain to the context in which we are deploying the rubric. A probabilistic rate of attack or exploit occurrence is impossible to ascertain, except as a “best guess” regarding conditions that may exist in the future if the device were to be manufactured and sold according to the early-stage design in front of us. In other words, Likelihood relies on speculation and conjecture, and is not useful for identifying and incorporating appropriate security mitigations into your design.
Some rubrics try to conflate “Likelihood” with other attributes, typically called out as metrics with names like:
- Attacker skill level
- Attacker motive
- Opportunity to exploit
- Size of attack
- Ease of discovery
- Ease of exploitation
- User awareness
- Intrusion detection
Most of these share the same basic issues with Likelihood. Not only are they dependent on speculation and guesswork, but they are also focused on threats, not vulnerabilities.
Only one of these makes a reasonable contribution to medical device design scoring, and that is “Ease of Exploitation.” This is still quite subjective, but it does add some meaning to the final score. For example, if exploiting the vulnerability in question requires the active use of an elaborate equipment setup in the same room as the device being attacked, it may be considered relatively more difficult to exploit because of the challenge of transporting and setting up the equipment in close enough proximity without being detected before the attack can be carried out. On the other hand, if the exploit can be carried out from anywhere with the click of a button, it may be considered relatively less difficult to carry out.
A good, representative example of how Likelihood and related metrics are typically deployed by these rubrics is the Risk Determination rubric from the National Institute of Standards and Technology (NIST), where “Likelihood” and “Severity” are both equally influential on the final score.
Under this rubric, the assessor could determine that the vulnerability’s “Severity” is “Very High” because a patient could die from this vulnerability being exploited. But, in a case where the previous version of the device has been on the market for many years without a known attack against that same design vulnerability, the assessor could rate the “Likelihood” as “Very Low.” The resultant score would be “Low,” and thus, the scoring rubric would allow the designers to ignore it.
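The scenario above can be sketched as a simple matrix lookup. The matrix below is a hypothetical five-level likelihood-by-severity table written for illustration; its cell values are assumptions, not a quotation of any NIST table. Only the outcome for “Very Low” Likelihood and “Very High” Severity comes from the example in this post.

```python
LEVELS = ["Very Low", "Low", "Moderate", "High", "Very High"]

# Hypothetical risk matrix: key = likelihood, list index = severity level
# (in the order given by LEVELS). Cell values are illustrative assumptions.
RISK_MATRIX = {
    "Very Low":  ["Very Low", "Very Low", "Very Low", "Low",      "Low"],
    "Low":       ["Very Low", "Low",      "Low",      "Low",      "Moderate"],
    "Moderate":  ["Very Low", "Low",      "Moderate", "Moderate", "High"],
    "High":      ["Very Low", "Low",      "Moderate", "High",     "Very High"],
    "Very High": ["Very Low", "Low",      "Moderate", "High",     "Very High"],
}


def risk_level(likelihood: str, severity: str) -> str:
    """Look up the combined risk level for a likelihood/severity pair."""
    return RISK_MATRIX[likelihood][LEVELS.index(severity)]


# A potentially fatal vulnerability, rated "unlikely" by the assessor:
print(risk_level("Very Low", "Very High"))   # Low
# The same vulnerability, rated "likely" by a different assessor:
print(risk_level("Very High", "Very High"))  # Very High
```

The Severity input is identical in both calls; only the assessor’s guess about Likelihood moves the result from “Low” to “Very High.”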
Needless to say, this is extremely problematic! Either the flawed rubric fails to flag the vulnerability, the design is submitted for regulatory approval without adequate mitigation, and the device is not approved for the market, or worse, the flawed rubric misleads reviewers, who then approve the device for the market, only to see one or more patients severely harmed later because this “unlikely” vulnerability is exploited when it matters.
Conversely, if you assume that “Likelihood” is always high, then this metric simplistically magnifies “Severity” and distorts the comparative outcome when identifying vulnerabilities in need of mitigation. Either way, the fundamental problem with these types of rubrics is revealed: the outcome is easily gamed, intentionally or unintentionally, by the person or team assessing the vulnerability.
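A small sketch shows how easily the ranking shifts. The vulnerability names, the 1-5 scales, and the likelihood-times-severity formula are hypothetical, chosen only to illustrate the point; no specific rubric is being reproduced here.

```python
from typing import Dict, List

# Hypothetical design vulnerabilities with severity on a 1-5 scale.
severity: Dict[str, int] = {
    "unauthenticated_dosing_command": 5,
    "log_tampering": 3,
    "ui_freeze": 2,
}

# Assessor A guesses the most severe issue is very unlikely to be exploited.
likelihood_a: Dict[str, int] = {
    "unauthenticated_dosing_command": 1,
    "log_tampering": 4,
    "ui_freeze": 5,
}

# Assessor B assumes Likelihood is always at its maximum.
likelihood_b: Dict[str, int] = {name: 5 for name in severity}


def rank(sev: Dict[str, int], lik: Dict[str, int]) -> List[str]:
    """Order vulnerabilities from highest to lowest likelihood x severity."""
    scores = {name: sev[name] * lik[name] for name in sev}
    return sorted(scores, key=lambda name: scores[name], reverse=True)


print(rank(severity, likelihood_a))
print(rank(severity, likelihood_b))
```

With Assessor A’s guesses, the potentially fatal dosing vulnerability drops to the bottom of the list. With Assessor B’s blanket assumption, the Likelihood column adds no information and the ranking is simply Severity scaled up.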
Reinforcing its subjective nature, even NIST doesn’t know how to define Likelihood properly:
“The term likelihood, as discussed in this guideline, is not likelihood in the strict sense of the term; rather, it is a likelihood score. Risk assessors do not define a likelihood function in the statistical sense. Instead, risk assessors assign a score (or likelihood assessment) based on available evidence, experience, and expert judgment. Combinations of factors such as targeting, intent, and capability thus can be used to produce a score representing the likelihood of threat initiation; combinations of factors such as capability and vulnerability severity can be used to produce a score representing the likelihood of adverse impacts, and combinations of these scores can be used to produce an overall likelihood score.”
– NIST SP800-30r1 “Guide for Conducting Risk Assessments”
Because there is no firm definition of likelihood, the results will not be consistent or even congruent between iterations, assessors, reviewers, or projects.
In our next post, we’ll comb through these various rubrics and explain which are, and which are not, suitable for design-phase medical device vulnerability assessment.