Gentoo Archives: gentoo-dev

From: "Robin H. Johnson" <robbat2@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] packages.gentoo.org lives!
Date: Thu, 29 Nov 2007 18:35:53
Message-Id: 20071129183319.GV14557@curie-int.orbis-terrarum.net
In Reply to: Re: [gentoo-dev] packages.gentoo.org lives! by Mike Frysinger
1 On Thu, Nov 29, 2007 at 10:20:11AM -0500, Mike Frysinger wrote:
2 > On Tuesday 13 November 2007, Robin H. Johnson wrote:
3 > > If you had bookmarks to the old style of URL, please consult the FAQ for
4 > > the new form. We are NOT rewriting these URLs:
5 > > '/packages/?category=media-sound;name=mp3unicode'
6 > > (The new form is '/package/media-sound/mp3unicode').
7 > why ? you've just broken every site out there that links to us in the common
8 > form you've quoted here. there's no reason you cant add three lines of code
9 > to check if the "category" GET variable exists and if so, redirect
10 > accordingly.
11 Because:
12 - The ';' used as an argument separator on the old site is not a valid
13 query argument separator, and there are URLs out there that have added
14 further arguments using it, complicating parsing.
15 - See also RFC1738: 'Within the <path> and <searchpart> components, "/",
16 ";", "?" are reserved.'
17 - The old site allowed a LOT of variations, all leading to the same
18 content, but some of which broke badly.
19 /?category=foo&name=bar
20 /?category=foo;name=bar
21 /?name=bar&category=foo
22 /?name=bar;category=foo;this=wasbroken
23 /packages/?(one of the above query strings)
24 (several more prefixes, all of which gave you the same page)
25 - Having a single valid URL for a given resource greatly improves cache
26 hit rates (and we do use caching heavily on the new site, 60% hit rate
27 at the moment, see further down as well).
28 - The old parsing and variable usage code was the source of multiple
29 bugs as well as the security issue that shuttered the site.
30 - I _want_ old sites to change to using the new form, which I do
31 advertise as being permanent resource URLs (as well as being much
32 easier to construct: take any "[CAT/]PN[-PF]" and slap it onto the
33 base URL, and you are done).
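For anyone maintaining a site that still links to the old form, the mapping from the variations listed above to the new form can be sketched roughly as follows. This is only an illustration in Python, not the code running on the site: the function name legacy_to_canonical and the conservative character whitelist are assumptions made for the example.

```python
import re
from urllib.parse import urlsplit, unquote

# Assumption: a conservative whitelist for category and package names.
# Anything outside it is rejected rather than passed through.
_SAFE = re.compile(r"^[A-Za-z0-9+_.-]+$")


def legacy_to_canonical(url):
    """Map an old-style URL to the new '/package/CAT/PN' form.

    Returns None when 'category' or 'name' is missing or looks unsafe.
    (Hypothetical helper, named for this example only.)
    """
    query = urlsplit(url).query
    params = {}
    # Accept both '&' and the old site's ';' as separators.
    for pair in re.split(r"[&;]", query):
        key, sep, value = pair.partition("=")
        # Keep the first occurrence of each key; ignore bare tokens.
        if sep and key not in params:
            params[key] = unquote(value)
    category = params.get("category")
    name = params.get("name")
    if not category or not name:
        return None
    if not (_SAFE.match(category) and _SAFE.match(name)):
        return None
    # Extra parameters (such as 'this=wasbroken') are simply dropped.
    return "/package/{}/{}".format(category, name)


if __name__ == "__main__":
    print(legacy_to_canonical("/packages/?category=media-sound;name=mp3unicode"))
```

The parameter order and the choice of separator do not matter here, so all of the listed variations collapse to one URL per package, which is what the cache wants.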
34
35 That said, if somebody wants to point me to something decent so that
36 Squid can rewrite the URLs WITH the query parameters (the built-in squid
37 stuff seems to ignore them) and hit the cache, and that can add a big
38 warning at the top of the page, I'd be happy to use it for a transition
39 period, just like the RSS URLs (which are redirected until January 2008,
40 but only because they are automated, and not browsed by humans).
41
42 On the subject of Squid, it would be extremely useful if it could ignore
43 some headers and respect others in figuring out if the page is already
44 in the cache, without stripping the headers from the request (it is
45 doable with Apache's mod_cache), so that two requests with only a
46 slightly different User-Agent between them hit the same cache entry,
47 while different Accept* headers are respected, and don't hit the same
48 cache entry?
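To make the desired behaviour concrete, here is a small Python sketch of the cache-key idea. It is not Squid or mod_cache configuration, and the function name cache_key is hypothetical. It only shows which parts of a request would feed the lookup key, while the request itself is left untouched.

```python
import hashlib


def cache_key(url, headers):
    """Build a cache lookup key from the URL and the Accept* headers only.

    All other headers (User-Agent, Cookie, ...) are ignored for the key,
    but they are not removed from the request itself.
    """
    parts = [url]
    # Header names are case-insensitive, so normalise before comparing.
    normalised = sorted((k.lower(), v.strip()) for k, v in headers.items())
    for name, value in normalised:
        if name.startswith("accept"):
            parts.append(name + ":" + value)
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = cache_key("/package/media-sound/mp3unicode",
                  {"User-Agent": "Browser/1.0", "Accept-Encoding": "gzip"})
    b = cache_key("/package/media-sound/mp3unicode",
                  {"User-Agent": "Browser/1.1", "Accept-Encoding": "gzip"})
    print(a == b)
```

Two requests that differ only in User-Agent produce the same key, while a different Accept-Encoding or Accept-Language value produces a different one.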
49
50 --
51 Robin Hugh Johnson
52 Gentoo Linux Developer & Infra Guy
53 E-Mail : robbat2@g.o
54 GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85