On Fri, Nov 30, 2007 at 10:11:31AM +0100, Jan Kundrát wrote:
> > - See also RFC1738: 'Within the <path> and <searchpart> components, "/",
> > ";", "?" are reserved.'
> My copy of RFC1738 says (end of section 2.2):
...
> I wasn't able to find your quote in that file.
My quote was from the first sentence of RFC1738, sec 3.3 (HTTP), para 4.
|
> What is source of your definition of "valid query argument separator"?
<searchpart> is also better defined in RFC2396, section 3.4:
  Within a query component, the characters ";", "/", "?", ":", "@",
  "&", "=", "+", ",", and "$" are reserved.
Reserved because they have special meanings.
|
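To illustrate what "reserved" means in practice, here is a minimal Python
sketch. It is not part of the site code, and the sample value is made up. A
parameter value that itself contains reserved characters has to be
percent-encoded, otherwise those characters are read as delimiters.

```python
from urllib.parse import quote, unquote

# Hypothetical parameter value that happens to contain reserved characters.
raw_value = "a;b&c=d"

# safe="" asks quote() to escape every reserved character, including "/".
encoded = quote(raw_value, safe="")
print(encoded)

# Decoding gives the original value back, so nothing is lost by escaping.
decoded = unquote(encoded)
```
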
> > - Having a single valid URL for a given resource greatly improves cache
> > hit rates (and we do use caching heavily on the new site, 60% hit rate
> > at the moment, see further down as well).
> Redirecting clients to new URLs would give you perfect caching as well.
That's why I say I'm willing to do redirection at the cache level.
I do NOT want lots of users with old links to hit the actual web application
if it's just going to redirect all of them to a page that is already in the
cache.
|
> > - The old parsing and variable usage code was the source of multiple
> > bugs as well as the security issue that shuttered the site.
> Only because it passed the raw, unescaped values directly to shell,
> which is of course badly broken.
Have a look at the recent discussion about HTML5 issues
(http://www.crockford.com/html/), which also applies to web applications:
"HTML 5 is strict in the formulation of HTML entities. In the past, some
browsers have been too forgiving of malformed entities, exposing users to
security exploits. Browsers should not perform heroics to try to make bad
content displayable. Such heroics result in security vulnerabilities."
|
> > - I _want_ old sites to change to using the new form, which I do
> > advertise as being permanent resource URLs (as well as being much
> > easier to construct, take any "[CAT/]PN[-PF]" and slap it onto the
> > base URL, and you are done).
> Which isn't a reason for breaking old links, IMHO.
Visitors to the old /ebuilds/ or /packages/ links get a redirect to the
frontpage. While that isn't the content they were after, it's fine to help them
find it.
|
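As a rough sketch of the "slap it onto the base URL" construction quoted
above: the base URL below is a placeholder (the real one is not given in this
mail), and CAT, PN and PF are treated simply as the three strings named in the
pattern.

```python
# Hypothetical base URL; substitute the site's real one.
BASE_URL = "http://packages.example.org/package/"


def package_url(pn, cat=None, pf=None, base=BASE_URL):
    """Build a URL following the "[CAT/]PN[-PF]" pattern.

    cat and pf are optional, matching the square brackets in the pattern.
    """
    atom = pn
    if cat:
        atom = f"{cat}/{atom}"
    if pf:
        atom = f"{atom}-{pf}"
    return base + atom


print(package_url("vim"))
print(package_url("vim", cat="app-editors"))
```
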
> > That said, if somebody wants to point me to something decent so that
> > Squid can rewrite the URLs WITH the query parameters (the built-in squid
> > stuff seems to ignore them) and hit the cache, and that can add a big
> > warning at the top of the page, I'd be happy to use it for a transition
> > period, just like the RSS URLs (which are redirected until January 2008,
> > but only because they are automated, and not browsed by humans).
> Now that's something that sound reasonable. Why limit the period and
> don't provide it forever?
Time limited to force everybody to move over, and to not have to support
the redirections for the old version of the site forever, when they
weren't advertised as permanent URLs.
|
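For anyone who wants to try the cache-level rewrite, here is a minimal sketch
of the mapping logic in Python. Everything specific in it is an assumption:
the old parameter names ("category", "name"), the new base URL, and the
character whitelist are all hypothetical. My understanding is that a Squid
URL-rewrite helper receives the full request URL, query string included, as
the first field of each input line, and that a reply prefixed with "301:" is
turned into a redirect; please check that against the Squid documentation for
your version.

```python
import re
from urllib.parse import urlsplit, unquote

# Hypothetical values: base URL, old path prefixes and parameter names.
NEW_BASE = "http://packages.example.org/package/"
OLD_PREFIXES = ("/ebuilds", "/packages")
SAFE_ATOM = re.compile(r"^[A-Za-z0-9+_.-]+$")


def parse_query(query):
    """Split a query string on ';' or '&' into a dict (first value wins)."""
    params = {}
    for pair in re.split(r"[;&]", query):
        if "=" in pair:
            key, value = pair.split("=", 1)
            params.setdefault(unquote(key), unquote(value))
    return params


def rewrite_old_url(url):
    """Return "301:<new URL>" for a recognised old URL, else None."""
    parts = urlsplit(url)
    if not parts.path.startswith(OLD_PREFIXES):
        return None
    params = parse_query(parts.query)
    category = params.get("category", "")
    name = params.get("name", "")
    # Refuse anything outside a strict whitelist instead of guessing.
    if not name or not SAFE_ATOM.match(name):
        return None
    if category and not SAFE_ATOM.match(category):
        return None
    path = f"{category}/{name}" if category else name
    return "301:" + NEW_BASE + path
```

A real helper would wrap rewrite_old_url() in a loop that reads one request
per line from stdin and answers with either the rewritten URL or an empty
line. Rejecting unexpected characters outright, rather than trying to repair
them, is deliberate.
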
I did a quick hack up of some statistics, and I see that only 6.7% (5001 out of
(69434+5001)) of the overall visitors were arriving at the old locations and
not receiving the content they were originally interested in.
|
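The arithmetic behind the 6.7% figure can be checked with a few lines of
Python. The counts are copied from the tables in this mail (the "Error or
general redirect for an old URL" group, with the /images 404s left out); the
variable names are just for illustration.

```python
# Counts for old URLs, copied from the failed-page-load table.
old_url_hits = {
    "/similar": 11,
    "/main": 22,
    "///x86%20stable": 24,
    "/daily": 44,
    "/search": 222,
    "/ebuilds": 2096,
    "/packages": 2582,
}
# Total of successful data pages, from the successful-page-load table.
successful_data_pages = 69434

missed = sum(old_url_hits.values())
share = missed / (successful_data_pages + missed)
print(f"{missed} of {successful_data_pages + missed} = {share:.1%}")
```
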
Based on these stats, I'd say we are doing well in getting users to
update their links for the new site already, since it's been up for 2
weeks now.
|
Successful page loads (2xx, 304), by section, for November 29th.
   60 /verbump
  114 /newpackage
  167 /faq
  645 /robots.txt
  779 /categories
 1037 /arch
 2348 /category
 3329 /favicon.ico
 9084 /
 9292 /media
20491 /package
35354 /feed
-----------------------------
69434 Total of data pages (no robots, css, images, favicon)
13266 Total of robots, images, favicon.
|
Failed page loads (4xx, 5xx, 3xx excluding 304), by section and code, for
November 29th. Slew of 404 codes for PHP exploits excluded, and grouped by
how it was handled:
- Specific redirect for usage of an old RSS path:
   25 /feed 301
   91 /archs 301
- Redirected because requested object not found (invalid package, etc):
   25 /arch 302
   30 /category 302
   44 /feed 406
  164 /feed 302
  632 /package 302
- Error or general redirect for an old URL:
   11 /similar 404
   22 /main 404
   24 ///x86%20stable 404
   44 /daily 404
  222 /search 404
  347 /images 404 (excluded from total)
 2096 /ebuilds 302
 2582 /packages 302
-----------------------------
 5001 Total for the old-URL group (no images)
|
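For reference, per-section counts like the ones above can be produced with a
short script. This is only a sketch, not the script I actually used: it
assumes Common Log Format style lines (request in double quotes, status code
right after it), and the function names are made up for the example.

```python
from collections import Counter


def section_of(path):
    """Reduce a request path to its first segment, e.g. /package/foo -> /package."""
    path = path.split("?", 1)[0]
    first = path.lstrip("/").split("/", 1)[0]
    return "/" + first


def is_success(status):
    """2xx and 304 count as successful page loads."""
    return 200 <= status < 300 or status == 304


def summarize(lines):
    """Return (successes by section, failures by (section, status))."""
    ok, failed = Counter(), Counter()
    for line in lines:
        try:
            fields = line.split('"')
            path = fields[1].split()[1]
            status = int(fields[2].split()[0])
        except (IndexError, ValueError):
            continue  # skip lines that do not match the assumed format
        if is_success(status):
            ok[section_of(path)] += 1
        else:
            failed[(section_of(path), status)] += 1
    return ok, failed


# Made-up sample lines in the assumed format.
sample = [
    '192.0.2.1 - - [29/Nov/2007:10:00:00 +0000] "GET /package/app-editors/vim HTTP/1.1" 200 5120',
    '192.0.2.2 - - [29/Nov/2007:10:00:01 +0000] "GET /packages/?category=app-editors HTTP/1.1" 302 0',
    '192.0.2.3 - - [29/Nov/2007:10:00:02 +0000] "GET / HTTP/1.1" 304 0',
]
ok, failed = summarize(sample)
```
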
--
Robin Hugh Johnson
Gentoo Linux Developer & Infra Guy
E-Mail : robbat2@g.o
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85