Seriously, what do we gain from crawlers accessing sources.gentoo.org? I can't
remember ever seeing it in a Google search result...

Possibly we would not even need to deny all requests; denying just
everything related to ancient history might be enough...

> Hello,
>
> For a while sources.gentoo.org has been puttering along and its health
> has slowly declined. We migrated it to some newer, shiny hardware in an
> attempt to mitigate the problem, but that did not pan out. 90% (or
> more) of sources.gentoo.org traffic is crawler bots, not actual
> humans. That being said, if we cannot serve requests to the bots
> within our timeouts, we serve 500s instead, which is never really what
> we want (particularly when we spend 20s of CPU to calculate 80% of the
> response only to see the client time out :/).
>
> The majority of the expensive requests are package.mask and
> use.local.desc queries by crawlers, such as crawling the entire
> 13000-revision history of package.mask (or similar).
>
> While we will likely monkey-patch viewvc to be less wasteful, in
> the meantime I have removed use.local.desc from sources.gentoo.org
> (and also from anoncvs, because they share the same repo). I hope this is a
> short-term (order of weeks) hack.
>
> -A

--
Andreas K. Huettel
Gentoo Linux developer
kde, sci, arm, tex, printing