List Archive: gentoo-dev
2011/12/5 Chí-Thanh Christopher Nguyễn <chithanh@g.o>:
> Alec Warner schrieb:
>>> Seriously, what do we gain from crawlers accessing sources.gentoo.org? I can't
>>> really remember seeing it once in a Google query result...
>>
>> We want the site searchable.
>
>>>> The majority of the expensive requests are related to package.mask and
>>>> use.local.desc queries by crawlers, for example crawling the entire
>>>> 13000-revision history of package.mask (or similar).
>
> Would it be feasible to use mod_rewrite to direct the most expensive
> requests to a static copy, which is re-generated every
> ${REASONABLE_TIMEFRAME}?
For now, user agents that look like bots get sent to
sources2.gentoo.org (via an HTTP 302, not a permanent redirect), and
humans stay on sources.gentoo.org. Assuming the crawlers and indexing
systems follow the spec, our search results should hopefully not get
rewritten to point at sources2.gentoo.org (that would surprise me
greatly... wait, no it wouldn't ;p)
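
To make the routing rule concrete, here is a minimal sketch of the
decision described above. It is not the actual server configuration:
the bot pattern, the function names, and the URL scheme are all
assumptions for illustration; only the two hostnames and the use of a
302 come from this thread.

```python
import re

# Hypothetical heuristic: substrings that commonly appear in crawler
# user-agent strings. The real rule set is not given in this thread.
BOT_PATTERN = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)

# Mirror host named in this thread; the "http" scheme is an assumption.
BOT_MIRROR = "http://sources2.gentoo.org"


def looks_like_bot(user_agent):
    """Return True if the user-agent string matches the bot heuristic."""
    return bool(user_agent and BOT_PATTERN.search(user_agent))


def route(user_agent, path):
    """Return (302, location) for bot-like agents, or None to serve locally.

    A 302 is a temporary redirect, so a well-behaved indexer should keep
    the original sources.gentoo.org URL in its index.
    """
    if looks_like_bot(user_agent):
        return 302, BOT_MIRROR + path
    return None
```

A request whose user agent contains "Googlebot" would get a 302 to the
same path on sources2.gentoo.org, while an ordinary browser user agent
returns None and is served by sources.gentoo.org as usual.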
Robin added a caching layer for some segments of the application; I am
looking at cProfile dumps and discussing pain points with upstream.
-A
>
>
> Best regards,
> Chí-Thanh Christopher Nguyễn
>