RE: Question about robots.txt
On Tue, 8 Dec 2015, Coffin, Chris wrote:
: We made the choice a long time ago to not allow indexing of the
: cve.mitre.org web site. At least part of that decision was simply
: resource constraints: when CVE was in its toddler years, search engine
: indexers were very resource intensive.
That 'decision' was based on crap excuses, even back then. =) I ran two
sites over the time MITRE ran CVE, and intensively watched the logs on one
of them (attrition.org, since 1998-10-07). Search engines were NOT
resource intensive back then. Attrition staff talked about that issue and
didn't block any of our content in robots.txt: search engine spam was
present, but not heavy. For those interested in Internet history...
forced ~$ more /home/admin/util/list.filter
72.14.203.104
forced.attrition.org
images.search.yahoo.com
casualgamer.org
myspace.com
stumbleupon.com
f-mai.gif
f-bak.gif
f-att.gif
thefiles.gif
panopta.com
divinelanguage.com
forced ~$ grep -i google /home/admin/util/list.*
/home/admin/util/list.bot:googlebot.com
/home/admin/util/list.bot:Feedfetcher-Google
/home/admin/util/list.filter-old:google.com
/home/admin/util/list.filter-old:google.co.jp/search
/home/admin/util/list.filter-old:google.de
/home/admin/util/list.filter-old:google.fr
/home/admin/util/list.filter-old:google.co.uk
forced ~$
"list.filter-old" is from 2003-08-25. The limited set of Google domains
should be very telling, given the year and traffic generated.
We actually *stopped* filtering Google at some point, and ignored Yahoo
early on. Why? Because they were simply not hammering sites or causing any
undue burden, even to a random desktop machine bought at the local
computer store. Those lists meant "don't display those entries in our log
parser", not "block them from reaching our web server via iptables".
That was Attrition when it ran on a ~$500 box bought in 1998 and hosted on
a consumer link, compared to MITRE's resources and the CVE contract money
it got from the government at the time. So to be clear, MITRE's answer in
2015 is based on people forgetting what it was like in 1997-1999.
That said, after Kurt's mail in December of 2015... in the last ~30-60
days, I noticed that MITRE finally changed that. Google is now indexing
and caching the CVE pages.
Thank you, as a long-time taxpayer funding MITRE's projects, including
CVE, to the tune of $1,487,334,000 in MITRE income last year. Good to see
you making these small changes to help the industry.
: We are currently re-examining this policy and will keep the Board
: posted.
Except... you didn't. Just like you didn't ask us about the 3k+ RESERVED
fiasco that several of us were talking about this morning, figuring out
how we'd handle it. When NVD spoke up, we all collectively said "hell yeah!"
The fact that NVD called you out, and has since said they will be
'ignoring' those IDs, is also very significant in CVE history. This is the
first *real* break NVD has ever made from CVE. There have been other
breaks over the last year or more, but they were more pedantic and favored
NVD over MITRE/CVE, based on when entries became public (e.g. NVD
published before MITRE did).
Brian