Re: Re: [CVEPRI] Increasing numbers and timeliness of candidates

Scott Lawler said:

>(b) An acceptable level of noise is the low sustained level of
>inaccuracy often dealing with some possible duplicates and level of
>abstraction challenges. How many "errors" a month is too many? I'm
>not sure this level is possible to determine...let alone define,
>track, and meet to everyone's expectations.

I'd estimate that currently, about 3-5 duplicates are "caught" in refinement for every 100 candidates, before a candidate is assigned. And that's with a process that's designed to catch duplicates before the refinement phase (that way, a content team member doesn't waste time refining an issue that someone else is working on).

One of the most common causes of duplicates is alternate spellings. Consider the following submissions that we might receive from 3 different sources.

  Submission A: "MyProduct vulnerability in CoolFeature"
  Submission B: "My Product vulnerability in Cool Feature"
  Submission C: "My Product vulnerability"

Submissions B and C match up, but they don't match A at all, except for the "vulnerability" term, which is rather common. So, one content team member does refinement on just group (A), and a different member gets (B, C).

For this case, B's short description might do a better job of explaining the issue to the team member than A or C does. So, one refiner might have an easier time of understanding the issue well enough to apply content decisions. If all 3 submissions are together, there is less effort.

The alternate spelling of "My Product" and "MyProduct" is key... and a *very* common occurrence across different vulnerability databases. These errors come from the original announcement of the vulnerability. One information provider might catch the misspelling and fix it, while others might use the original spelling. Sometimes, even the vendor uses multiple spellings.

(To facilitate search and lookup of CVE names, we find the correct spelling and put the wrong spelling in the keywords.
While I'm on the topic, I suggest that vulnerability databases consider this practice as well.)

OK, so we've now had team member 1 produce a "pre-candidate" for group (A), and member 2 has produced a "pre-candidate" for group (B,C). This produces a "refinement duplicate."

The references don't always help with matching, either. There may be other keywords in the descriptions that help to match. So, we match A against everything... then B against everything... then C against everything. It's redundant, but matching goes much faster than refinement does (and it requires less technical skill), so we reduce the bottleneck of refinement.

Due to some personnel transition issues, in Fall 2001, we had a situation where some submissions were not matched against other submissions, which effectively mimics what would happen if we didn't match everything against everything else. The result? A sharp increase in refinement duplicates.

I catch refinement duplicates when I am editing everyone's refinements, *before* I assign a candidate number. But, I don't always remember everything I've done (and every team member, including me, has refined an A group and then a B,C group without remembering that they already did it!). So, while I catch a number of dupes during editing, some of them still might slip by. And this is an example of how dupes slip by: Mark Cox just found that CAN-2001-1227 and CAN-2001-1278 are duplicates.

It's also possible that submission A only describes one bug, whereas B describes 1 bug, and C describes 2 bugs. (Actually, this is pretty typical.) So, abstraction errors can creep in, too; I usually catch these during editing and clean them up as well.

Based on content decision statistics, 15% of all candidates are affected by abstraction CDs. That means that up to 15% of all reported issues could be given the wrong level of abstraction, depending on the amount of information available.

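To illustrate the alternate-spelling problem with the Submission A/B/C example above, here is a minimal Python sketch. It is not the matching process the content team actually uses; the helper names (`token_match`, `squashed_match`, `match_all`), the list of common terms, and the prefix-based comparison are all assumptions made for illustration. It matches every submission against every other one, first with a naive shared-word test and then with a spelling-tolerant test that ignores spaces inside product names.

```python
import re
from itertools import combinations

# Hypothetical list of terms too common to count as evidence of a match.
COMMON_TERMS = {"vulnerability", "in"}

def words(text):
    """Lowercased alphanumeric words, in order, minus the common terms."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in COMMON_TERMS]

def token_match(a, b):
    """Naive match: the two descriptions share at least one uncommon word."""
    return bool(set(words(a)) & set(words(b)))

def squashed_match(a, b):
    """Spelling-tolerant match (an assumption, and deliberately crude):
    join the words so 'My Product' equals 'MyProduct', then accept the
    pair if one squashed string is a prefix of the other."""
    sa, sb = "".join(words(a)), "".join(words(b))
    if not sa or not sb:
        return False
    return sa.startswith(sb) or sb.startswith(sa)

def match_all(submissions, matcher):
    """Match every submission against every other one (all pairs)."""
    return {
        (x, y)
        for x, y in combinations(sorted(submissions), 2)
        if matcher(submissions[x], submissions[y])
    }

submissions = {
    "A": "MyProduct vulnerability in CoolFeature",
    "B": "My Product vulnerability in Cool Feature",
    "C": "My Product vulnerability",
}

naive_pairs = match_all(submissions, token_match)
tolerant_pairs = match_all(submissions, squashed_match)

print(naive_pairs)     # only B and C are paired; A is left on its own
print(tolerant_pairs)  # A, B, and C are all paired with each other
```

With the naive test, A ends up in its own group and would go to a different refiner, which is the "refinement duplicate" case described above. A prefix test like this one would over-match on real data, so it should be read only as a way to see why the spelling matters.
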
My editing task also includes things like modifying descriptions to better fit the CVE "style," ensuring that content decisions have been applied correctly, making sure that the analysis section includes the appropriate information (e.g. which line in a change log can be proven to indicate vendor acknowledgement), etc.

There is one last chance to catch refinement duplicates, and that is *after* the numbers are assigned, but *before* the candidates are proposed to the Board (or at least published on the CVE web site). This involves matching all the newly created candidates against each other. Content team members will have placed alternate spellings in the keywords, and duplicate candidates will share many of the same references, and the references will have been CVE-normalized. I usually skip this step due to time factors; otherwise the CAN-2001-1227/CAN-2001-1278 dupe would have been caught easily. The few times I've had the time to do this, I've reliably caught a few more dupes.

Recent Improvements
-------------------

In recent months, we have begun enhancing the submissions by automatically extracting the references (where possible) and normalizing them to the CVE-style format. For example, a URL to a Bugtraq post is followed, and a CVE-style reference is made that includes the date and subject line. Same thing for a URL to a vendor advisory.

The matching algorithm also matches on references. So, these normalized references help avoid duplicates - and they also save refiners time, because they don't have to convert URLs or "loose text" to the CVE-style references.

This approach cuts down on duplicates in the longer run, because no matter how many ways someone can spell a product name, the vendor advisory ID or original Bugtraq post is the same, and the submissions are more likely to be matched together.

If we modify our processes to skip some of these steps, then refinement duplicates will not be caught as often.

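As a rough sketch of why normalized references help, the Python below groups submissions that share at least one normalized reference, regardless of how the product name is spelled. Everything in it is an assumption for illustration: the sample data is made up, the `SOURCE:YYYYMMDD Title` layout is only an approximation of a CVE-style reference, and the fields are assumed to have been extracted already (no URL is followed). It does not describe the actual matching algorithm.

```python
from collections import defaultdict

def normalize_reference(source, date, title):
    """Build a reference string from already-extracted fields.

    Assumed layout: SOURCE:YYYYMMDD Title, with whitespace collapsed.
    """
    return f"{source.upper()}:{date.replace('-', '')} {' '.join(title.split())}"

def group_by_shared_reference(submissions):
    """Group submission ids that share at least one normalized reference."""
    parent = {sid: sid for sid in submissions}

    def find(sid):
        # Follow parent links to the group representative (path halving).
        while parent[sid] != sid:
            parent[sid] = parent[parent[sid]]
            sid = parent[sid]
        return sid

    first_seen = {}  # normalized reference -> first submission that cited it
    for sid, refs in submissions.items():
        for ref in refs:
            key = normalize_reference(*ref)
            if key in first_seen:
                parent[find(sid)] = find(first_seen[key])
            else:
                first_seen[key] = sid

    groups = defaultdict(set)
    for sid in submissions:
        groups[find(sid)].add(sid)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical submissions: A and B cite the same Bugtraq post, written
# with different date formats and spacing; C cites something else.
submissions = {
    "A": [("bugtraq", "2001-11-05", "MyProduct  CoolFeature overflow")],
    "B": [("BUGTRAQ", "20011105", "MyProduct CoolFeature overflow")],
    "C": [("vendor", "2001-11-20", "Advisory 42")],
}

groups = group_by_shared_reference(submissions)
print(groups)  # A and B land in one group, C in another
```

Grouping on references is transitive here: if A shares a reference with B and B shares a different one with C, all three end up together, which is one plausible way to hand a single refiner the whole set.
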
If people are putting candidates into their databases and products sooner, then the "bad" duplicate will likely stay in those databases longer, which prevents users from linking between multiple sources for those vulnerabilities. (I believe that there is good evidence that this would happen.)

Miscellaneous
-------------

Note that I'm skipping all the difficulties in determining vendor acknowledgement (which would be addressed in the long term by responsible vulnerability disclosure on the part of vendors and researchers), or the detailed types of questions that cause even vendors to scratch their heads.

Will The Editorial Board Always Catch Duplicates While Voting?
--------------------------------------------------------------

No. That has already been demonstrated a few times, unfortunately. But nobody's perfect, and it's possible that a Board member sees and votes on one candidate but not the other, so it is to be expected that sometimes a duplicate candidate becomes a duplicate entry.

What Can Be Done About It?
--------------------------

Getting more candidates into more vendor and researcher advisories, sooner... which argues for more vendors and researchers using candidate reservation.

Alternately, getting candidate numbers into CVE's data sources, before CVE uses those data sources, which immediately brings chickens and eggs to mind.

Either approach would be facilitated by increasing the number of CNAs, which requires "training" with respect to content decisions and process changes (some of this training is already going on behind the scenes).

And the final punchline: the best solution may be a very small, closely coordinated group of individuals or organizations across the industry, who are dedicated to producing candidate numbers quickly, who are in the business of producing vulnerability data *very* fast and on a *very* large scale, and who are willing and able to put in some time daily since vulnerabilities never sleep, and who are able to apply CVE's content decisions consistently regardless of how they do things in their own databases, and who are experts in vulnerability analysis across a variety of software or platforms, and who are reasonably good at writing terse descriptions.

- Steve