On Thu, Nov 29, 2007 at 10:20:11AM -0500, Mike Frysinger wrote:
> On Tuesday 13 November 2007, Robin H. Johnson wrote:
> > If you had bookmarks to the old style of URL, please consult the FAQ for
> > the new form. We are NOT rewriting these URLs:
> > '/packages/?category=media-sound;name=mp3unicode'
> > (The new form is '/package/media-sound/mp3unicode').
> why ? you've just broken every site out there that links to us in the common
> form you've quoted here. there's no reason you cant add three lines of code
> to check if the "category" GET variable exists and if so, redirect
> accordingly.
Because:
- The ';' used as an argument separator on the old site is not a valid
  query argument separator, and there are URLs out there that have added
  further arguments using it, complicating parsing.
- See also RFC1738: 'Within the <path> and <searchpart> components, "/",
  ";", "?" are reserved.'
- The old site allowed a LOT of variations, all leading to the same
  content, but some of which broke badly:
    /?category=foo&name=bar
    /?category=foo;name=bar
    /?name=bar&category=foo
    /?name=bar;category=foo;this=wasbroken
    /packages/?(one of the above query strings)
    (several more prefixes, all of which gave you the same page)
- Having a single valid URL for a given resource greatly improves cache
  hit rates (and we do use caching heavily on the new site, 60% hit rate
  at the moment, see further down as well).
- The old parsing and variable usage code was the source of multiple
  bugs as well as the security issue that shuttered the site.
- I _want_ old sites to change to using the new form, which I do
  advertise as being permanent resource URLs (as well as being much
  easier to construct: take any "[CAT/]PN[-PF]", slap it onto the
  base URL, and you are done).

That said, if somebody wants to point me to something decent so that
Squid can rewrite the URLs WITH the query parameters (the built-in Squid
stuff seems to ignore them) and hit the cache, and that can add a big
warning at the top of the page, I'd be happy to use it for a transition
period, just like the RSS URLs (which are redirected until January 2008,
but only because they are automated, and not browsed by humans).
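
For illustration, here is a minimal sketch in Python of the mapping such a
rewrite would need. It is not the site's actual code: the function name
old_query_to_new_path and the character whitelist are assumptions. It accepts
both '&' and ';' as separators, ignores unknown extra arguments, and returns
the single new-form path, or None when the query cannot be mapped safely.

```python
import re
from urllib.parse import unquote

# Hypothetical helper, not the site's real code.
# Old URLs used either '&' or ';' between arguments, so split on both.
_SEPARATORS = re.compile(r"[&;]")
# Assumed whitelist for category and package names; rejects '/', '..', etc.
_SAFE = re.compile(r"[A-Za-z0-9][A-Za-z0-9+_.-]*")


def old_query_to_new_path(query):
    """Map 'category=foo;name=bar' style queries to '/package/foo/bar'.

    Returns None if either argument is missing or fails the whitelist.
    """
    params = {}
    for pair in _SEPARATORS.split(query):
        key, sep, value = pair.partition("=")
        # Keep the first occurrence of each key; skip bare words without '='.
        if sep and key not in params:
            params[key] = unquote(value)

    category = params.get("category")
    name = params.get("name")
    if not category or not name:
        return None
    if not (_SAFE.fullmatch(category) and _SAFE.fullmatch(name)):
        return None
    return f"/package/{category}/{name}"
```

Whatever sits in front of the site could then answer an old-style request
with a permanent redirect to the returned path, so that every variation ends
up on the one cacheable URL.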

On the subject of Squid, it would be extremely useful if it could ignore
some headers and respect others in figuring out if the page is already
in the cache, without stripping the headers from the request (it is
doable with Apache's mod_cache), so that two requests with only a
slightly different User-Agent between them hit the same cache entry,
while different Accept* headers are respected, and don't hit the same
cache entry.
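
To make the desired behaviour concrete, here is a rough Python sketch of a
cache key that looks only at selected headers. It says nothing about how
Squid or mod_cache work internally; the function name cache_key and the
exact list in VARY_ON are assumptions chosen for the example.

```python
import hashlib

# Assumption: only these request headers change what the server sends back.
VARY_ON = ("accept", "accept-encoding", "accept-language")


def cache_key(method, url, headers):
    """Build a cache key from the method, URL and selected headers only.

    `headers` maps header names to values. Headers not listed in VARY_ON
    (User-Agent, for instance) stay in the request untouched; they are
    simply not part of the key.
    """
    # Header names are case-insensitive; also collapse runs of whitespace.
    lowered = {k.lower(): " ".join(v.split()) for k, v in headers.items()}
    parts = [method.upper(), url]
    for name in VARY_ON:
        parts.append(f"{name}={lowered.get(name, '')}")
    return hashlib.sha1("\n".join(parts).encode("utf-8")).hexdigest()
```

With this key, two requests for the same URL that differ only in User-Agent
share one cache entry, while a request with a different Accept-Language gets
its own.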

--
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail : robbat2@g.o
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85