On Mon, Dec 5, 2011 at 3:48 AM, Andreas K. Huettel <dilfridge@g.o> wrote:
>
> Seriously, what do we gain from crawlers accessing sources.gentoo.org? I can't
> really remember seeing it once in a Google query result...
|
We want the site searchable.

>
> Possibly it would not even be required to deny all requests, but just deny
> everything related to ancient history...
>
>> Hello, |
>>
>> For a while sources.gentoo.org has been puttering along and its health
>> has slowly declined. We migrated it to some newer shiny hardware in an
>> attempt to mitigate the problem but that did not pan out. 90% (or
>> more) of sources.gentoo.org traffic is crawler bots and not actual
>> humans. That being said, if we cannot serve requests to the bots
>> within our timeouts we serve 500s instead, which is never really what
>> we want (particularly when we spent 20s of CPU to calculate 80% of the
>> response only to see the client time out :/).
>> |
>> The majority of the expensive requests are related to package.mask and
>> use.local.desc queries by crawlers, such as crawling the entire 13000-rev
>> history for package.mask (or similar).
>> |
>> While it is likely we will monkey patch viewvc to be less wasteful, in
>> the meantime I have removed use.local.desc from sources.gentoo.org
>> (and also anoncvs, because they share the same repo). I hope this is a
>> short-term (order of weeks) hack.
>> |
32 |
>> -A |
>
> --
> Andreas K. Huettel
> Gentoo Linux developer
> kde, sci, arm, tex, printing
>
> |