Gentoo Archives: gentoo-dev

From: Alec Warner <antarus@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] sources.gentoo.org instability
Date: Fri, 09 Dec 2011 05:31:49
Message-Id: CAAr7Pr9Q+ZSPL11fgWPK6batyqH4S1dHJcCW9PJQn3CSqLoDqw@mail.gmail.com
In Reply to: Re: [gentoo-dev] sources.gentoo.org instability by "Chí-Thanh Christopher Nguyễn"
2011/12/5 Chí-Thanh Christopher Nguyễn <chithanh@g.o>:
> Alec Warner wrote:
>>> Seriously, what do we gain from crawlers accessing sources.gentoo.org?  I can't
>>> really remember seeing it once in a google query result...
>>
>> We want the site searchable.
>
>>>> The majority of the expensive requests are related to package.mask and
>>>> use.local.desc queries by crawlers. Like crawling the entire 13000 rev
>>>> history for package.mask (or similar.)
>
> Would it be feasible to use mod_rewrite to direct the most expensive
> requests to a static copy, which is re-generated every
> ${REASONABLE_TIMEFRAME}?

For now, user-agents that look like a bot get sent to
sources2.gentoo.org (via HTTP 302, not a permanent redirect) and humans
stay on sources.gentoo.org. Assuming the crawlers and indexing systems
follow the spec, hopefully our search results do not all get rewritten
to sources2.gentoo.org (that would surprise me greatly...wait no it
wouldn't ;p)
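
To make the routing rule concrete, here is a rough Python sketch of the
decision only. It is not the actual server configuration: the bot pattern,
the http:// scheme, and the function name route_request are all assumptions
made for illustration.

```python
import re

# Assumed, illustrative substrings; the real list of bot user-agents
# used on the server is not shown in this thread.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)

HUMAN_HOST = "sources.gentoo.org"
BOT_HOST = "sources2.gentoo.org"


def route_request(user_agent, path):
    """Return (status, location) for a request.

    Crawler-like user agents get a temporary (302) redirect to the
    bot host; everything else is served locally, signalled here as
    (200, None).
    """
    if user_agent and BOT_PATTERN.search(user_agent):
        # 302 rather than 301, so well-behaved indexers should keep
        # the original sources.gentoo.org URL in their results.
        return 302, "http://%s%s" % (BOT_HOST, path)
    return 200, None
```

For example, route_request("Googlebot/2.1", "/some/path") yields a 302
pointing at sources2.gentoo.org, while a typical browser user-agent is
served from sources.gentoo.org.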

Robin added a caching layer for some segments of the application; I am
looking at cProfile dumps and discussing pain points with upstream.
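
For anyone who wants to look at such dumps themselves, a minimal sketch of
writing and reading one with the standard cProfile and pstats modules
follows. The function slow_history_walk is a made-up stand-in for an
expensive request, not code from the application, and the helper names are
hypothetical.

```python
import cProfile
import io
import os
import pstats
import tempfile


def slow_history_walk(revisions):
    """Hypothetical stand-in for an expensive request handler."""
    total = 0
    for rev in range(revisions):
        total += sum(range(rev % 100))
    return total


def profile_to_file(func, *args):
    """Run func under cProfile and write the stats to a temp file."""
    fd, dump_path = tempfile.mkstemp(suffix=".prof")
    os.close(fd)
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    profiler.dump_stats(dump_path)
    return dump_path


def summarize_dump(dump_path, limit=5):
    """Load a cProfile dump and return a report of the top entries
    sorted by cumulative time."""
    out = io.StringIO()
    stats = pstats.Stats(dump_path, stream=out)
    stats.strip_dirs().sort_stats("cumulative").print_stats(limit)
    return out.getvalue()
```

Calling summarize_dump(profile_to_file(slow_history_walk, 2000)) returns a
text report; sorting by cumulative time tends to surface the call paths
worth raising with upstream.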

-A

>
>
> Best regards,
> Chí-Thanh Christopher Nguyễn
>