2011/12/5 Chí-Thanh Christopher Nguyễn <chithanh@g.o>:
> Alec Warner wrote:
>>> Seriously, what do we gain from crawlers accessing sources.gentoo.org? I can't
>>> really remember seeing it once in a google query result...
>>
>> We want the site searchable.
>
>>>> The majority of the expensive requests are related to package.mask and
>>>> use.local.desc queries by crawlers. Like crawling the entire 13000 rev
>>>> history for package.mask (or similar.)
>
> Would it be feasible to use mod_rewrite to direct the most expensive
> requests to a static copy, which is re-generated every
> ${REASONABLE_TIMEFRAME}?
|
For now, user agents that look like bots get sent to
sources2.gentoo.org (via an HTTP 302, not a permanent redirect), and
humans stay on sources.gentoo.org. Assuming the crawlers and indexing
systems follow the spec, hopefully none of our search results get
rewritten to sources2.gentoo.org (that would surprise me greatly...wait,
no it wouldn't ;p)
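
To make the redirect rule concrete, here is a minimal Python sketch of the
decision. It is not the actual server configuration: the bot pattern, the
http:// scheme, the function name route_request, and the choice to treat a
missing User-Agent as a human are all assumptions for illustration. Only the
two hostnames and the use of a 302 come from the description above.

```python
import re

# Hypothetical pattern; the real matching rules are not shown in this thread.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)

HUMAN_HOST = "sources.gentoo.org"
BOT_HOST = "sources2.gentoo.org"


def route_request(user_agent, path):
    """Return (status, location) for a request arriving at HUMAN_HOST.

    Bot-like user agents get a temporary (302) redirect to BOT_HOST.
    Everything else is served in place, reported here as (200, None).
    """
    if user_agent and BOT_PATTERN.search(user_agent):
        # Assumption: plain http and the same path on the second host.
        return 302, "http://%s%s" % (BOT_HOST, path)
    return 200, None
```

Because a 302 marks the move as temporary, a well-behaved indexer should keep
listing the original sources.gentoo.org URL instead of the redirect target.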
|
Robin added a caching layer for some segments of the application; I am
looking at cProfile dumps and discussing pain points with upstream.
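
For anyone who wants to look at such dumps themselves, this is a minimal
sketch of writing and reading one with Python's standard cProfile and pstats
modules. The function expensive_view is a hypothetical stand-in, not code
from the application, and the helper names are made up for this example.

```python
import cProfile
import io
import pstats


def expensive_view():
    # Hypothetical stand-in for an expensive request handler.
    return sum(i * i for i in range(200000))


def profile_to_file(func, dump_path):
    """Run func under cProfile and write the stats dump to dump_path."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    profiler.dump_stats(dump_path)


def top_cumulative(dump_path, limit=10):
    """Return a text report of the entries with the highest cumulative time."""
    out = io.StringIO()
    stats = pstats.Stats(dump_path, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()
```

Calling profile_to_file(expensive_view, "view.prof") and then printing
top_cumulative("view.prof") lists the calls that account for the most
cumulative time, which is usually the quickest way to spot the pain points.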
|
-A
|
>
>
> Best regards,
> Chí-Thanh Christopher Nguyễn
>