Gentoo Archives: gentoo-dev

From: "Robin H. Johnson" <robbat2@g.o>
To: gentoo-dev@l.g.o
Subject: Re: [gentoo-dev] packages.gentoo.org lives!
Date: Fri, 30 Nov 2007 20:04:24
Message-Id: 20071130200009.GC14557@curie-int.orbis-terrarum.net
In Reply to: Re: [gentoo-dev] packages.gentoo.org lives! by "Jan Kundrát"
1 On Fri, Nov 30, 2007 at 10:11:31AM +0100, Jan Kundrát wrote:
2 > > - See also RFC1738: 'Within the <path> and <searchpart> components, "/",
3 > > ";", "?" are reserved.'
4 > My copy of RFC1738 says (end of section 2.2):
5 ...
6 > I wasn't able to find your quote in that file.
7 My quote was from the first sentence of RFC1738, sec 3.3 (HTTP), para 4.
8
9 > What is source of your definition of "valid query argument separator"?
10 <searchpart> is also better defined in RFC2396, section 3.4:
11 Within a query component, the characters ";", "/", "?", ":", "@",
12 "&", "=", "+", ",", and "$" are reserved.
13 Reserved because they have special meanings.
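
As an illustration only (this is not code from the packages.gentoo.org site), here is a minimal Python sketch of what "reserved" means in practice: a reserved character that is meant as literal data inside a query value has to be percent-encoded, otherwise it is read as a delimiter. The names `RESERVED_IN_QUERY` and `encode_query_value` are made up for this example; `urllib.parse.quote` is the standard library call.

```python
from urllib.parse import quote

# Characters RFC 2396, section 3.4, lists as reserved within a query component.
RESERVED_IN_QUERY = ';/?:@&=+,$'


def encode_query_value(value):
    """Percent-encode a value so no reserved character survives as-is.

    safe="" tells quote() to leave no reserved character unescaped
    (by default it would leave "/" alone).
    """
    return quote(value, safe="")


if __name__ == "__main__":
    for char in RESERVED_IN_QUERY:
        print(char, "->", encode_query_value(char))
```

With this, a literal "&" inside a value becomes "%26" and can no longer be mistaken for an argument separator.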
14
15 > > - Having a single valid URL for a given resource greatly improves cache
16 > > hit rates (and we do use caching heavily on the new site, 60% hit rate
17 > > at the moment, see further down as well).
18 > Redirecting clients to new URLs would give you perfect caching as well.
19 That's why I say I'm willing to do redirection at the cache level.
20 I do NOT want lots of users with old links to hit the actual web application
21 if it's just going to redirect all of them to a page that is already in the
22 cache.
23
24 > > - The old parsing and variable usage code was the source of multiple
25 > > bugs as well as the security issue that shuttered the site.
26 > Only because it passed the raw, unescaped values directly to shell,
27 > which is of course badly broken.
28 Have a look at the recent discussion about HTML5 issues
29 (http://www.crockford.com/html/), which also applies to web applications:
30 "HTML 5 is strict in the formulation of HTML entities. In the past, some
31 browsers have been too forgiving of malformed entities, exposing users to
32 security exploits. Browsers should not perform heroics to try to make bad
33 content displayable. Such heroics result in security vulnerabilities."
34
35 > > - I _want_ old sites to change to using the new form, which I do
36 > > advertise as being permanent resource URLs (as well as being much
37 > > easier to construct, take any "[CAT/]PN[-PF]" and slap it onto the
38 > > base URL, and you are done).
39 > Which isn't a reason for breaking old links, IMHO.
40 Visitors to the old /ebuilds/ or /packages/ links get a redirect to the
41 frontpage. While that isn't the content they were after, it's fine to help them
42 find it.
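
To make the "take any [CAT/]PN and slap it onto the base URL" construction quoted above concrete, here is a rough Python sketch. The base URL is an assumption (guessed from the /package section in the statistics, not confirmed anywhere in this mail), and `package_url` is a hypothetical helper, not part of the site.

```python
from urllib.parse import quote

# ASSUMPTION: the real base URL may differ; this one is for illustration only.
BASE_URL = "http://packages.gentoo.org/package/"


def package_url(name, category=None, base=BASE_URL):
    """Build a resource URL from an optional category and a package name."""
    atom = f"{category}/{name}" if category else name
    # Keep "/" as the path separator; escape anything else that is reserved,
    # e.g. the "+" in a package name.
    return base + quote(atom, safe="/")


if __name__ == "__main__":
    print(package_url("portage", "sys-apps"))
    print(package_url("portage"))
```

The point of the sketch is that no query parameters are involved: the whole identifier lives in the path, so there is exactly one URL per resource for the cache to key on.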
43
44 > > That said, if somebody wants to point me to something decent so that
45 > > Squid can rewrite the URLs WITH the query parameters (the built-in squid
46 > > stuff seems to ignore them) and hit the cache, and that can add a big
47 > > warning at the top of the page, I'd be happy to use it for a transition
48 > > period, just like the RSS URLs (which are redirected until January 2008,
49 > > but only because they are automated, and not browsed by humans).
50 > Now that's something that sound reasonable. Why limit the period and
51 > don't provide it forever?
52 Time limited to force everybody to move over, and to not have to support
53 the redirections for the old version of the site forever, when they
54 weren't advertised as permanent URLs.
55
56 I did a quick hack up of some statistics, and I see that only 6.7% (5001 out of
57 (69434+5001)) of the overall visitors were arriving at the old locations and
58 not receiving the content they were originally interested in.
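
For anyone who wants to check the arithmetic, the percentage is just the old-URL count divided by the sum of both totals. A tiny Python sketch, using only the two figures from this mail (the variable names are mine):

```python
# Figures taken from the November 29th statistics in this mail.
data_page_loads = 69434   # successful loads of data pages
old_url_requests = 5001   # errors or general redirects for old URLs

share = old_url_requests / (data_page_loads + old_url_requests)
print(f"{share:.1%}")
```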
59
60 Based on these stats, I'd say we are doing well in getting users to
61 update their links for the new site already, since it's been up for 2
62 weeks now.
63
64 Successful page loads (2xx, 304), by section, for November 29th.
65 60 /verbump
66 114 /newpackage
67 167 /faq
68 645 /robots.txt
69 779 /categories
70 1037 /arch
71 2348 /category
72 3329 /favicon.ico
73 9084 /
74 9292 /media
75 20491 /package
76 35354 /feed
77 -----------------------------
78 69434 Total of data pages (no robots, css, images, favicon)
79 13266 Total of robots, images, favicon.
80
81 Failed page loads (4xx, 5xx, 3xx excluding 304), by section and code, for
82 November 29th. The slew of 404s from PHP exploit attempts is excluded; the rest
83 are grouped by how each request was handled:
84 - Specific redirect for usage of an old RSS path:
85 25 /feed 301
86 91 /archs 301
87 - Redirected because requested object not found (invalid package, etc):
88 25 /arch 302
89 30 /category 302
90 44 /feed 406
91 164 /feed 302
92 632 /package 302
93 - Error or general redirect for an old URL:
94 11 /similar 404
95 22 /main 404
96 24 ///x86%20stable 404
97 44 /daily 404
98 222 /search 404
99 347 /images 404 (excluded from total)
100 2096 /ebuilds 302
101 2582 /packages 302
102 -----------------------------
103 5001 Total of errors and general redirects for old URLs (no images)
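
The tables above are a per-section tally of requests, split by status code. This is not the quick hack I actually ran; it is a minimal Python sketch of the same idea, assuming the access log has already been parsed into (path, status) pairs. The function names and the sample requests are invented for illustration.

```python
from collections import Counter


def section_of(path):
    """Return the first path component, e.g. "/package" for "/package/foo"."""
    first = path.split("?", 1)[0].strip("/").split("/", 1)[0]
    return "/" + first


def tally(requests):
    """Split requests into successful (2xx, 304) and failed, per section."""
    ok, failed = Counter(), Counter()
    for path, status in requests:
        if 200 <= status < 300 or status == 304:
            ok[section_of(path)] += 1
        else:
            # Failed loads are keyed by section and status code.
            failed[(section_of(path), status)] += 1
    return ok, failed


if __name__ == "__main__":
    # Made-up sample requests, not real log data.
    sample = [
        ("/", 200),
        ("/package/sys-apps/portage", 200),
        ("/package/sys-apps/portage", 304),
        ("/packages/?category=sys-apps", 302),
        ("/search/?sstring=portage", 404),
    ]
    ok, failed = tally(sample)
    for section, count in sorted(ok.items(), key=lambda item: item[1]):
        print(count, section)
    for (section, status), count in sorted(failed.items()):
        print(count, section, status)
```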
104
105 --
106 Robin Hugh Johnson
107 Gentoo Linux Developer & Infra Guy
108 E-Mail : robbat2@g.o
109 GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
