CVE - Technical Guidance for Handling the New CVE ID Syntax (Archived)

Introduction
Terminology Used in This Document
Special Note to CVE Users Who Are Not Developers
Considerations for Input Format
Considerations for Output Format
Considerations for Extraction or Parsing
Extraction and Conversion Methods for CVE IDs
Example Conversion Algorithm for Incoming IDs
Test Data for Implementers
Inquiries and More Information
Change Log

Introduction

This page provides technical guidance and test data to developers and consumers for tools, web sites, and other capabilities that use CVE Identifiers (CVE IDs).

Terminology Used in This Document

Year Portion

The portion of the CVE Identifier that is associated with the year of issuance; e.g., in CVE-2014-54321, the year portion is 2014.

Sequence Number

The numeric portion of the CVE Identifier that appears after the year. This is a minimum of 4 digits (with leading "0" digits if necessary), or 5 digits or more (beginning with a non-0 digit). For example, in CVE-2014-54321, the sequence number is 54321.

Non-Conformant Input

An input string that is known (or suspected) to be intended as a CVE Identifier, but does not strictly conform to the new 2014 CVE ID syntax. For example, the non-conformant strings "cve-2014-1234", "CVE: 2014-1234", and "2014-1234" might all be variations of the syntactically-valid identifier "CVE-2014-1234".

Syntactically-Valid ID

A CVE Identifier that conforms to the new 2014 CVE ID syntax.

Truncation

The process of inadvertently reducing the length of a CVE Identifier, typically by "chopping off" the digits that appear at the end of the ID. This is a problem for products that assume the sequence number contains only 4 digits. For example, the valid ID "CVE-2014-12345" could be truncated to only "CVE-2014-1234", which is a completely different, but valid, ID.

Conversion

The process of accepting an input that is intended to be used as a CVE Identifier, and converting it into a syntactically-valid ID. For example, "CVE 2014-1234" is not syntactically valid because there is not a hyphen between the "CVE" prefix and the year portion, but it can be converted to the valid "CVE-2014-1234" ID with a presumably-low risk of error.

Extraction

The process of extracting one or more CVE Identifiers from free text, such as a mailing list post.

Option A

A proposed modification to the CVE ID syntax that used a fixed-length identifier with leading '0' digits, such as "CVE-2014-001234". This option was not accepted by the CVE Editorial Board during the voting period in 2013. (Note that Option A' is also for fixed-length identifiers, but with 8-digit sequence numbers instead of 6-digit numbers used by Option A.)

Option B

The syntax modification that was accepted by the CVE Editorial Board to become the new CVE ID syntax beginning in 2014.

Option C

A proposed modification to the CVE ID syntax that used variable-length IDs followed by the Luhn checksum in order to detect transcription errors. This option was not accepted by the CVE Editorial Board during the voting period in 2013.

Special Note to CVE Users Who Are Not Developers

While this technical guidance is primarily focused on developers, there are several things that CVE end users can do in order to ensure smooth operation with the new CVE syntax:

For any tool or capability that supports searching by CVE ID, try using some of the valid (and invalid) IDs provided in the test data, and see how the tool handles them. Search forms might inadvertently truncate long IDs, or rearrange the digits of a long ID in unexpected ways.
Consider how longer CVE IDs can affect the formatting and display of documents and other products that you produce yourself. For example, a spreadsheet that includes a CVE column might only be 13 characters wide (the size of a classic CVE ID); how would the display be affected if you had to support longer IDs?
Ask your vendors whether they have actively tested their products using the test data provided through this site.

Considerations for Input Format

Truncation

If the product assumes that the CVE ID's sequence number contains only 4 digits, then it may remove or otherwise shift sequence numbers that contain 5 or more digits, producing an incorrect ID. (For example, the "CVE-2014-12345" ID could be inadvertently truncated to "CVE-2014-1234", which is for a completely different issue; MITRE has observed this in real-world implementations.) Truncation can cause significant errors in the exchange of vulnerability information, since it can produce incorrect CVE IDs that are for unrelated vulnerabilities. Note that the risk of truncation was a known limitation of Option B during the voting period; see a later section on the protection block that MITRE has implemented during the 2014 transition year.

Conversion

When an input is expected to be a CVE Identifier, the product may perform conversion on inputs that are not syntactically valid. For example, in a "CVE ID search" utility, the "2014-1234" ID is not syntactically valid, but it can be converted to a "CVE-2014-1234" identifier. This improves usability and minimizes user confusion. However, when the sequence number is either less than or more than 4 digits, or if the conversion does not account for other unexpected properties of the input, then the conversion could produce an incorrect ID. See the later section on ID Conversion Errors.

Considerations for Output Format

Formatting CVE IDs in Output

CVE IDs can be more than 13 characters wide; for example, "CVE-2014-123456" is 15 characters wide. If the capability produces output that assumes only 13 characters, such as a formatted table, then longer IDs might break the layout or be cut off.
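One way to avoid a hard-coded 13-character column is to size it from the data. The following is a minimal sketch, not a prescribed method; the function name format_table and the two-column row layout are assumptions made for illustration.

```python
def format_table(rows):
    """Render (cve_id, summary) rows with a CVE column sized to the longest ID."""
    # Size the column from the data instead of assuming 13 characters.
    width = max((len(cve_id) for cve_id, _ in rows), default=13)
    return "\n".join(f"{cve_id:<{width}}  {summary}" for cve_id, summary in rows)


rows = [
    ("CVE-2014-1234", "example A"),
    ("CVE-2014-123456", "example B"),
]
print(format_table(rows))
```

Both summaries start in the same column, and neither ID is cut off.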

In some cases, if the ID is truncated to only 13 characters and it appears in a report, this could cause the wrong ID to be presented (e.g., results from a vulnerability scan that requires the enterprise to fix a particular issue in order to satisfy compliance requirements). See the Truncation Errors - Dangers of the 4-digit Assumption section.

Sorting

When more than one CVE ID is presented in a list, sometimes it is desirable to sort the IDs. Even though CVE IDs are not allocated sequentially based on the disclosure date (and, therefore, there is no guarantee that an ID with a smaller sequence number has been published before an ID with a larger sequence number), sorting has other uses.

Because the ID length is no longer fixed with leading zeroes in the new syntax, some sorting routines that used to work correctly might not sort IDs from the lowest sequence number to the highest.

Consider the series of CVE-2014-1234, CVE-2014-9999, CVE-2014-10000, and CVE-2014-12345. Many simplistic string-based sorts would order these as:

CVE-2014-10000
CVE-2014-1234
CVE-2014-12345
CVE-2014-9999

However, based on the numeric value of the sequence number alone, the appropriate sort would be:

CVE-2014-1234
CVE-2014-9999
CVE-2014-10000
CVE-2014-12345

While there are no CVE Compatibility Requirements for sort order, some capabilities may have their own requirements to use a sequence-number-based sort order instead of an alphanumeric-based order. For example, since CVE IDs are issued sequentially, a reverse sort based on sequence number will list the most recently-allocated CVE IDs first.
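A sequence-number-based order can be obtained by sorting on the numeric values of the year and sequence number rather than on the raw string. The sketch below is one possible approach; the helper name cve_sort_key is an assumption, and the function expects syntactically valid IDs.

```python
import re

# Year and sequence number captured as separate groups.
_CVE_RE = re.compile(r"CVE-(\d{4})-(\d{4,})", re.ASCII)


def cve_sort_key(cve_id):
    """Return (year, sequence number) as integers for numeric sorting."""
    match = _CVE_RE.fullmatch(cve_id)
    if match is None:
        raise ValueError(f"not a syntactically valid CVE ID: {cve_id!r}")
    return int(match.group(1)), int(match.group(2))


ids = ["CVE-2014-10000", "CVE-2014-1234", "CVE-2014-12345", "CVE-2014-9999"]

# Plain string sort: 10000, 1234, 12345, 9999
print(sorted(ids))
# Numeric sort: 1234, 9999, 10000, 12345
print(sorted(ids, key=cve_sort_key))
```

Passing reverse=True to sorted() with the same key lists the largest sequence numbers first.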

See the Test Data section for testing guidance and examples.

Storage of CVE IDs - Length

Examine how CVE IDs are stored. If they are simply stored as arbitrary-length strings, then it is unlikely to pose a problem.

However, some storage mechanisms might only allocate 13 characters (the length of a "CVE-YYYY-NNNN" string), which could trigger problems because CVE IDs can now contain more than just 13 characters. For example, a database column for CVE IDs might only be defined for 13 bytes.

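Other implementations might use more memory-efficient encodings or structures consisting of only a few bytes, since the numeric values for a 4-digit year and sequence number could be stored in two bytes each. A two-byte unsigned field cannot hold a value larger than 65535, so sequence numbers of 6 or more digits (and some 5-digit ones) would overflow or be rejected.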
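The limit of a compact two-byte encoding can be illustrated with Python's struct module. This is only a sketch of a hypothetical storage layout (the name pack_compact and the big-endian two-field format are assumptions), not a description of any particular product.

```python
import struct


def pack_compact(year, seq):
    """Pack year and sequence number into two unsigned 16-bit fields."""
    # "H" is an unsigned 16-bit integer, so each field tops out at 65535.
    return struct.pack(">HH", year, seq)


# A 5-digit sequence number below 65536 still fits in 4 bytes total.
packed = pack_compact(2014, 54321)
print(len(packed), struct.unpack(">HH", packed))

# A 6-digit sequence number does not fit and is rejected by struct.
try:
    pack_compact(2014, 123456)
except struct.error as exc:
    print("cannot store CVE-2014-123456 in this layout:", exc)
```

In Python the failure is loud; in languages where integer narrowing is silent, the same layout could instead store a wrong sequence number.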

The Test Data section includes example IDs that are syntactically valid but could cause some issues in storage.

Considerations for Extraction or Parsing

The sections below provide additional guidance for how to extract, parse, and/or validate CVE IDs using the new syntax.

Truncation Errors - Dangers of the 4-digit Assumption

One known limitation of the new CVE ID syntax is that an implementation that assumes only 4-digit sequence numbers could inadvertently truncate longer IDs to 4-digit IDs, which themselves are still valid - but for a completely different vulnerability. For example, CVE-2014-12345 is a valid identifier in the 2014 syntax, but an implementation could truncate it to CVE-2014-1234.

Implementers should be very careful that truncation does not occur. Some areas to examine are:

Importing / Parsing CVE IDs, where either 4 digits is assumed, or a 13-character length (the length of a CVE ID with a 4-digit sequence number) is assumed.
Exporting CVE IDs, where either 4 digits is assumed, or a 13-character length is assumed.
Search forms that assume a fixed length. For example, MITRE has observed web pages in which a CVE ID lookup form restricts user input to exactly 13 characters; if a user enters a longer ID, it may be silently truncated and the user might not notice the truncation. Also, the user would not be able to search for any IDs that are too long.
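The 13-character assumption described above can be reproduced in a few lines. The sketch below contrasts a naive fixed-width slice with a check that refuses to truncate; both function names are hypothetical.

```python
CLASSIC_LENGTH = 13  # len("CVE-YYYY-NNNN")


def naive_fixed_width(cve_id):
    """Anti-pattern: keep only the first 13 characters of the ID."""
    return cve_id[:CLASSIC_LENGTH]


def store_checked(cve_id, max_length):
    """Refuse an ID that does not fit instead of silently truncating it."""
    if len(cve_id) > max_length:
        raise ValueError(
            f"{cve_id!r} is {len(cve_id)} characters; field holds {max_length}"
        )
    return cve_id


# The slice silently yields a different, still valid, ID.
print(naive_fixed_width("CVE-2014-12345"))

# The checked version surfaces the problem instead.
try:
    store_checked("CVE-2014-12345", CLASSIC_LENGTH)
except ValueError as exc:
    print("rejected:", exc)
```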

Reducing the Risk of Truncation Errors - 2014 Protection Block

Once the first CVE ID with a 5-digit sequence number is published, it is likely that many implementations will break, despite the amount of time that has been given to developers for changing their implementations. This could happen near the end of 2014.

If implementations break fatally, this will be noticeable to the consumer. However, inadvertent ID truncation can occur silently, without any warning to the user, and it would produce highly inaccurate results because it would generate incorrect CVE IDs. MITRE has implemented a mechanism to make it easier for users to notice when an implementation is incorrectly truncating IDs.

Since CVE IDs are issued sequentially, the first 5-digit CVE ID would be CVE-2014-10000, which could be truncated to CVE-2014-1000 by some CVE implementations that have not yet implemented the new syntax. Similarly, CVE-2014-11000 could be truncated to CVE-2014-1100.

To address such potential truncations, MITRE has decided to implement a protection block of "unusable" CVE IDs, similar to what is documented here:

https://cve.mitre.org/data/board/archives/2013-06/msg00005.html

In summary, CVE-2014-1000 through CVE-2014-1199 have not been issued at all (CVE-2014-1200 is also currently unissued).

If an implementation inadvertently generates any ID between CVE-2014-1000 and CVE-2014-1199, then it is possibly due to a truncation error. Because those IDs do not exist at all, any inadvertent usage will likely generate a noticeable error in a CVE-using implementation, which might alert the user (and vendor) to the possible truncation.
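An implementation could also check for the protection block itself, for example in a self-test or when logging lookups. The following sketch assumes only the range stated above (CVE-2014-1000 through CVE-2014-1199); the function name is hypothetical.

```python
import re

# Matches CVE-2014-1000 through CVE-2014-1199 only.
_PROTECTION_BLOCK = re.compile(r"CVE-2014-1[01]\d{2}", re.ASCII)


def in_2014_protection_block(cve_id):
    """Return True if the ID falls in the unissued 2014 protection block."""
    return _PROTECTION_BLOCK.fullmatch(cve_id) is not None


for candidate in ("CVE-2014-1000", "CVE-2014-1199", "CVE-2014-1200", "CVE-2014-10000"):
    if in_2014_protection_block(candidate):
        print(candidate, "-> possible truncation of a 5-digit ID")
    else:
        print(candidate, "-> outside the protection block")
```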

This protection block implementation also means that these CVE IDs will not show up in any CVE downloads, thus will not appear in any CVE-using databases that populate their own data using these downloads. They will not even show up as "RESERVED." In addition, any lookup to the CVE web site will generate an error to notify the user of the use of a potentially truncated CVE ID, for example:

https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-1000

When a CVE in the protection block is requested, the CVE Web site's lookup will generate a more informative message, notifying users of the potential of a 5-digit truncation.

It is uncertain whether CVE will naturally hit 10,000 IDs for 2014 and trigger use of the protection block. We cannot be sure whether implementations have adjusted to the new CVE ID syntax until we issue a 5-digit CVE, but we must also give implementers as much time as possible to fix their products to adjust to the change. If 10,000 IDs are not achieved by the end of 2014, then one possible approach is to issue some legitimate 5-digit IDs, which would then trigger any truncation/parsing errors that remain unfixed at that time. However, any methods for intentionally issuing 5-digit IDs "before their time" need to be considered as part of the overall awareness strategy. We will discuss awareness efforts with the CVE Editorial Board in the near future.

ID Conversion Errors

When an implementation receives an input that is intended to be a CVE ID, but the ID is not syntactically valid, then the implementation might attempt to convert the input into a valid ID. This makes the product easier to use and allows the product to handle minor errors or syntax violations during human data entry. MITRE has observed many instances of incorrectly-formatted CVE IDs in security advisories by vendors and researchers alike, and these errors can also cascade into products and databases.

MITRE has observed some cases in which IDs can be converted to unexpected values. It is not always clear what programming logic is triggering the error.

For example, in one implementation, a request for "CVE-2014-12345" (i.e., two spaces before the "14") produced "CVE-1234-0005", and a request for "CVE-2014-456132" produced a "CVE-4561-0032" identifier.

In another implementation, the malformed "CVE-14-1234" ID would be incorrectly converted to "CVE-0014-1234" in one function (for lookup), or converted to "CVE- 14-1234" in another function (for extraction). While the year portion of the ID is obviously non-compliant, this shows how different conversion methods can produce different results.

To catch the most egregious conversion errors, it is recommended that after an input is converted, the syntax of the result should be re-validated using strict checks. If the resulting ID is not syntactically valid, then either the conversion was incorrect, or the original input is so badly malformed that a usable ID could not be created from it.
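The re-validation step can be wrapped around any conversion routine. In the sketch below, convert_checked and the deliberately buggy pad_year_with_spaces converter are hypothetical names used only for illustration; the strict pattern follows the one suggested later in this document.

```python
import re

STRICT = re.compile(r"CVE-\d{4}-(?:0\d{3}|[1-9]\d{3,})", re.ASCII)


def convert_checked(raw, convert):
    """Apply a conversion function, then strictly re-validate its result."""
    candidate = convert(raw)
    if STRICT.fullmatch(candidate) is None:
        raise ValueError(f"conversion of {raw!r} produced invalid ID {candidate!r}")
    return candidate


def pad_year_with_spaces(raw):
    """Hypothetical buggy converter: right-aligns the year in 4 characters."""
    prefix, year, seq = raw.split("-")
    return f"{prefix}-{year:>4}-{seq}"


print(convert_checked(" CVE-2014-1234 ", str.strip))

try:
    convert_checked("CVE-14-1234", pad_year_with_spaces)
except ValueError as exc:
    print("caught:", exc)
```

Note that a result such as "CVE-0014-1234" would still pass this check, since it is syntactically valid; re-validation only catches the most egregious errors.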

As observed elsewhere in this document, relying solely on strict syntax validation can cause usability issues in cases where human-entered IDs can be used.

Validation of CVE IDs

Anywhere a CVE ID is provided, there is an opportunity to validate the ID to ensure that it has the correct syntax. This is especially the case for any kind of input, extraction, or parsing.

Suggested Regular Expressions for ID Validation

Since there are many different syntaxes for regular expressions, these might not be appropriate for a particular language.

Strict: /^CVE-\d{4}-(0\d{3}|[1-9]\d{3,})$/

This ensures that the year is 4 digits. It also ensures that a sequence number cannot have a leading zero if it is 5 digits or more, and that every sequence number must have at least 4 digits. Note that this allows years before 1999, which is contrary to CVE's current issuance practices, but implementations are not expected to provide more advanced logic to enforce such a restriction; in addition, since CVE is likely to be used for many years to come, and the information security industry is still changing rapidly, there is a small chance that years before 1999 could be allowed in future issuance strategies.

The "^" and "$" anchors ensure that there are no extraneous characters outside of the CVE ID itself. In many cases, especially involving hand entry by humans, this could be too strict. At the very least, extra whitespace could be trimmed from any input before validating the input. The anchors could be removed from the regular expression, and the expression could be used to match the intended sequence, extract the matched values, and reconstruct the ID.
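As an illustration, the strict expression might be applied in Python roughly as follows. This is a sketch under two assumptions: re.fullmatch is used in place of the "^" and "$" anchors (in Python, "$" also matches just before a trailing newline), and re.ASCII restricts "\d" to the digits 0-9. The function name is hypothetical.

```python
import re

STRICT = re.compile(r"CVE-\d{4}-(?:0\d{3}|[1-9]\d{3,})", re.ASCII)


def is_strictly_valid(text, trim=False):
    """Check text against the strict syntax, optionally trimming whitespace."""
    if trim:
        text = text.strip()
    return STRICT.fullmatch(text) is not None


print(is_strictly_valid("CVE-2014-0001"))     # 4 digits with leading zeros
print(is_strictly_valid("CVE-2014-12345"))    # 5 digits, no leading zero
print(is_strictly_valid("CVE-2014-01234"))    # leading zero with 5 digits
print(is_strictly_valid(" CVE-2014-1234\n"))  # surrounding whitespace
print(is_strictly_valid(" CVE-2014-1234\n", trim=True))
```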

Note that the CVE prefix is assumed to be capitalized; an "implied strict" approach might be to convert the input to uppercase (to effectively capitalize the "CVE" prefix) and, for implementations that use CVE data preceding approximately 2005, another implied-strict approach might be to support the old "CAN-" prefix.

Implied Strict: /^CVE-\d{4}-\d{4,}$/

This is the simplest regular expression that does not mark any valid IDs as invalid. It removes the check for a leading 0 when there are 5 or more digits in the sequence number, so it accepts IDs such as "CVE-2014-01234"; if such an ID were used to look up more CVE data, the lookup would likely fail.
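The difference between the two expressions can be seen by running both against the same inputs. The sketch below is illustrative only; the classify function and its return labels are assumptions, not terms defined by CVE.

```python
import re

STRICT = re.compile(r"CVE-\d{4}-(?:0\d{3}|[1-9]\d{3,})", re.ASCII)
IMPLIED_STRICT = re.compile(r"CVE-\d{4}-\d{4,}", re.ASCII)


def classify(cve_id):
    """Report which of the two expressions accepts the given ID."""
    if STRICT.fullmatch(cve_id):
        return "strict"
    if IMPLIED_STRICT.fullmatch(cve_id):
        # Accepted only because the leading-zero check is absent.
        return "implied strict only"
    return "invalid"


for cve_id in ("CVE-2014-1234", "CVE-2014-01234", "CVE-2014-123"):
    print(cve_id, "->", classify(cve_id))
```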

Loose: /CVE[^\w]*\d{4}[^\w]+\d{4,}/

This ensures that there is a CVE prefix, followed by zero or more non-alphanumeric characters (whether spaces, hyphens, etc.), with a 4-digit year, followed by at least one non-alphanumeric character, and at least 4 digits.

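This would accept IDs such as "CVE: 2014-1234", "CVE 2014 1234", etc. Note that in most regular expression dialects "\w" also matches the underscore, so an ID such as "CVE_2014_1234" is accepted only if the separator class is widened, for example from [^\w] to [\W_].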

Notice that this would not accept IDs of a form such as "CVE-20141234" because there is nothing that separates the year and sequence number. It seems risky to allow such sequences, since a typo in the year would not necessarily be easily detected, but it is the implementer's decision about whether to accept this risk.
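For extraction from free text, the loose expression can be given capture groups so that matched values are rebuilt into the standard form. This is a minimal sketch with a hypothetical function name; the rebuilt IDs would still need a strict check afterwards, since the loose pattern does not reject leading zeros in long sequence numbers.

```python
import re

# The loose expression with capture groups for year and sequence number.
LOOSE = re.compile(r"CVE[^\w]*(\d{4})[^\w]+(\d{4,})")


def extract_loose(text):
    """Find loosely formatted CVE IDs and rebuild them as CVE-YYYY-NNNN."""
    return [f"CVE-{year}-{seq}" for year, seq in LOOSE.findall(text)]


post = "Fixed in CVE: 2014-1234 and CVE 2014 123456; see also CVE-20141234."
print(extract_loose(post))
```

The last mention in the sample text is not extracted, because nothing separates its year from its sequence number.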

See the Test Data section for more examples of IDs to consider for loose extraction, conversion, and validation.

CVE ID Validation Logic in Metadata

Some XML Schema Definitions (XSD), structured representation languages, and input validation frameworks might encode validation logic that ensures that a CVE ID is well-formatted. This logic could be in the form of a regular expression, and it might assume a 4-digit sequence number and therefore reject valid IDs in the new syntax. MITRE has observed real-world cases of this issue. If well-structured data is transferred between systems and the data includes CVE IDs, then a failed validation could prevent all data from being transferred if the failure is treated as a fatal error.

Extraction and Conversion Methods for CVE IDs

Implementations are not required to accept only syntactically valid CVE IDs, although some implementations might have this assumption.

Since CVE IDs are not always specified perfectly in the real world, there are many syntactically invalid inputs that nonetheless could be reasonably interpreted as a particular ID.

Due to the wide variety of use cases and scenarios that cannot all be known to any single party since CVE is often used in non-public tools and processes, MITRE is not specifying any particular requirements for extraction or conversion. However, some test data is available that could be used to evaluate the efficacy of your own implementation and consider whether it is sufficient.

Note that there are at least three methods of extraction and conversion:

Strict - Conforms perfectly to the specified syntax.
Implied strict - Conforms fairly closely to the specified syntax, with a small number of limited exceptions in cases where the use of a CVE ID is clearly implied. (Examples: "cve-2014-1234" with a lowercase prefix; "CVE 2014-1234" with a space instead of a hyphen for an identifier; or simply "2014-1234" when the lookup is known to be based on a CVE ID, instead of other identifiers that may have the same syntax).
Loose - The inputs do not conform closely to the specified syntax, but could possibly be interpreted as CVE IDs.

It is up to the implementation to determine when to use strict, implied strict, or loose extraction and conversion. These terms are not defined precisely, because the use cases and requirements for each tool/capability can vary so widely for each implementation.

See the Test Data section below for test data that can be used to evaluate the effectiveness of extraction and conversion methods.

Example Conversion Algorithm for Incoming IDs

Implementations are not required to perform conversion for non-conformant inputs, but it can be useful or convenient in some cases, especially when human data entry is involved.

1. Obtain the input that may be a CVE ID
2. Validate the ID to ensure that it conforms to the CVE ID syntax
   - if valid: lookup ID; DONE.
   - if invalid: WARN possibly invalid ID; continue to step 3
3. Extract the first two numeric sequences (with at least 4 digits each) that are separated only by punctuation characters or whitespace
   - if two sequences can't be found: ERROR (no ID possible); DONE
   - if found: continue to step 4
4. Re-format the ID as "CVE-[year]-[seqnum]" where [year] is the first numeric sequence from step 3, and [seqnum] is the second numeric sequence from step 3
5. Validate the re-formatted ID from step 4
   - if valid: lookup new ID; DONE
   - if invalid: ERROR (malformed syntax); DONE
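The steps above can be sketched in Python as follows. This is one possible reading of the algorithm, not a reference implementation: the function name, the returned (ID, warnings) pair, the use of ValueError for the ERROR outcomes, and the treatment of the underscore as punctuation are all assumptions. The lookup itself is left to the caller.

```python
import re

STRICT = re.compile(r"CVE-\d{4}-(?:0\d{3}|[1-9]\d{3,})", re.ASCII)
# Two runs of at least 4 digits separated only by punctuation or whitespace.
TWO_RUNS = re.compile(r"(\d{4,})[\W_]+(\d{4,})", re.ASCII)


def convert_incoming(raw):
    """Return (cve_id, warnings); raise ValueError if no ID can be recovered."""
    # Step 2: validate the input as given.
    if STRICT.fullmatch(raw):
        return raw, []
    warnings = [f"possibly invalid ID: {raw!r}"]

    # Step 3: extract the first two qualifying numeric sequences.
    match = TWO_RUNS.search(raw)
    if match is None:
        raise ValueError(f"no ID possible from {raw!r}")
    year, seqnum = match.groups()

    # Step 4: re-format as CVE-[year]-[seqnum].
    reformatted = f"CVE-{year}-{seqnum}"

    # Step 5: validate the re-formatted ID.
    if STRICT.fullmatch(reformatted) is None:
        raise ValueError(f"malformed syntax: {reformatted!r}")
    return reformatted, warnings


for raw in ("CVE-2014-1234", "CVE: 2014-1234", "2014-123456", "CVE-2014-123"):
    try:
        print(repr(raw), "->", convert_incoming(raw))
    except ValueError as exc:
        print(repr(raw), "-> ERROR:", exc)
```

Returning the warnings alongside the ID lets the caller decide whether to show the user that the input was altered before the lookup.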

Test Data for Implementers

CVE Website

The following test data is available for implementers:

CVE ID-Test Data Version 1.1, 20 February 2014 (ZIP, 527 KB)

This ZIP file includes several data files that can be used to devise tests for implementations, including:

A large number of valid IDs, some of which are intentionally designed to trigger known or suspected errors in some ID parsers.
A large number of invalid IDs, some of which are intentionally designed to trigger known or suspected errors in some ID parsers, and many of which have been observed in the real world.
Raw CVE download data, in the same formats as available from the CVE Web site, to help test import/export capabilities.
A short list of IDs along with a recommended method for sorting.
The README-tests.txt file in the ZIP contains more detailed explanations for each data file, along with suggestions for constructing tests.

National Vulnerability Database (NVD)

For CVE consumers who use NIST's NVD data, NIST provides test data in NVD format at: https://nvd.nist.gov/cve-id-syntax-change.

Inquiries and More Information

For any inquiries or suggestions, please contact: cve-id-change@mitre.org.

Change Log

Changes to this document are noted below:

Fixed syntax error in the "Strict" example in the "Suggested Regular Expressions for ID Validation" section. Document updated to Version 1.1.

This is a draft report and does not represent an official position of The MITRE Corporation. © 2014, The MITRE Corporation. All rights reserved. Permission is granted to redistribute this document if this paragraph is not removed. This document is subject to change without notice.
Document version: 1.1	Date: December 12, 2014
