
RE: Question about robots.txt



On Tue, 8 Dec 2015, Coffin, Chris wrote:

: We made the choice a long time ago to not allow indexing of the
: cve.mitre.org web site. At least part of that decision was simply
: resource constraints - when CVE was in its toddler years, search
: engine indexers were very resource intensive.

That 'decision' was based on crap excuses, even back then. =) I ran two
sites over the time MITRE ran CVE, and intensively watched logs on one of
them (attrition.org, since 1998-10-07). Search engines were NOT resource
intensive back then. Attrition staff talked about that issue and didn't
block any of our content in robots.txt because search engine spam was
present, but not heavy. For those interested in Internet history...

forced ~$ more /home/admin/util/list.filter
72.14.203.104
forced.attrition.org
images.search.yahoo.com
casualgamer.org
myspace.com
stumbleupon.com
f-mai.gif
f-bak.gif
f-att.gif
thefiles.gif
panopta.com
divinelanguage.com
forced ~$ grep -i google /home/admin/util/list.*
/home/admin/util/list.bot:googlebot.com
/home/admin/util/list.bot:Feedfetcher-Google
/home/admin/util/list.filter-old:google.com
/home/admin/util/list.filter-old:google.co.jp/search
/home/admin/util/list.filter-old:google.de
/home/admin/util/list.filter-old:google.fr
/home/admin/util/list.filter-old:google.co.uk
forced ~$

"list.filter-old" is from 2003-08-25. The limited set of Google domains 
should be very telling, given the year and traffic generated.

We actually *stopped* filtering Google at some point, while ignoring Yahoo
early on. Why? Because they were simply not hammering sites or causing any
undue burden, even to a random desktop machine bought at the local computer
store. Those filter entries meant "don't display these hits in our log
parser", not "block them from reaching our web server" via iptables.

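To make the distinction concrete, here is a minimal Python sketch of a
display-only filter of the kind described above. It is not Attrition's
actual log parser: the function names, the case-insensitive substring
matching, and the sample log lines are all assumptions for illustration.

```python
def load_patterns(lines):
    """Return the non-empty, stripped patterns, one per input line."""
    return [ln.strip() for ln in lines if ln.strip()]


def filter_log(log_lines, patterns):
    """Hide log lines containing any pattern (case-insensitive substring).

    This only affects what gets displayed; the requests were still served.
    """
    lowered = [p.lower() for p in patterns]
    return [ln for ln in log_lines
            if not any(p in ln.lower() for p in lowered)]


# Hypothetical patterns and log lines, in the spirit of list.bot/list.filter.
patterns = load_patterns(["googlebot.com", "images.search.yahoo.com", ""])
sample_log = [
    'crawl-1.googlebot.com - - [25/Aug/2003:10:00:00 -0600] "GET / HTTP/1.0" 200 5120',
    'dsl-host.example.net - - [25/Aug/2003:10:00:05 -0600] "GET /mirror/ HTTP/1.0" 200 2048',
]

visible = filter_log(sample_log, patterns)
for line in visible:
    print(line)
```

Nothing here touches the web server or the firewall. The crawler still
gets its page; the hit is simply left out of the report.
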
That was Attrition when it was run on a ~$500 box bought in 1998 and hosted
on a consumer link, compared to MITRE's resources and CVE contract money
from the government at the time. So to be clear, MITRE's answer in 2015 is
based on people forgetting what it was like in 1997 - 1999.

That said, after Kurt's mail in December of 2015... in the last ~30 - 60
days, I noticed that MITRE finally changed that. Google is now indexing and
caching the CVE pages.
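
For readers who want to see what such a change means to a crawler, the
sketch below uses Python's standard urllib.robotparser to compare a
block-everything robots.txt with an allow-everything one. Both rule sets
and the example URL are hypothetical; they are not copies of what
cve.mitre.org served before or after the change.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt, agent, url):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical rule sets, for illustration only.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # no compliant crawler may index
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # empty Disallow permits everything

# Hypothetical URL standing in for a CVE entry page.
URL = "https://example.org/cve/CVE-2015-0001.html"

print(can_crawl(BLOCK_ALL, "Googlebot", URL))
print(can_crawl(ALLOW_ALL, "Googlebot", URL))
```

Note that robots.txt is advisory: it only tells well-behaved crawlers
what to skip, and does nothing to reduce load from clients that ignore it.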

Thank you, as a long-time taxpayer funding MITRE's projects, including CVE,
to the tune of $1,487,334,000 in MITRE income last year. Good to see you
making these small changes to help the industry.

: We are currently re-examining this policy and will keep the Board 
: posted.

Except... you didn't. Just like you didn't ask us about the 3k+ RESERVED
fiasco that got several of us talking this morning, figuring out how we'd
handle it. When NVD spoke up, we all collectively said "hell yeah!"

The fact that NVD called you out, and has since said they will be
'ignoring' those IDs, is also very significant in CVE history. This is the
first *real* break that NVD has had from CVE ever. There have been other
breaks in the last year+, but they were more pedantic and favored NVD over
MITRE/CVE, based on the timing of entries becoming public (e.g. NVD
published before MITRE did).

Brian


Page Last Updated or Reviewed: May 11, 2017