On Feb 8, 2012 10:57 PM, "Michael Mol" <mikemol@×××××.com> wrote:
>
> On Wed, Feb 8, 2012 at 10:46 AM, Paul Hartman
> <paul.hartman+gentoo@×××××.com> wrote:
> > On Wed, Feb 8, 2012 at 2:55 AM, Pandu Poluan <pandu@××××××.info> wrote:
> >>
> >> On Jan 27, 2012 11:18 PM, "Paul Hartman" <paul.hartman+gentoo@×××××.com>
> >> wrote:
> >>>
> >>
> >> ---- >8 snippage
> >>
> >>>
> >>> BTW, the Baidu spider hits my site more than all of the others
> >>> combined...
> >>>
18 |
> >>
> >> Somewhat anecdotal, and definitely veering way off-topic, but Baidu
> >> was the reason why my company decided to change our webhosting
> >> company: its spidering brought our previous webhost to its knees...
> >>
> >> Rgds,
> >
> > I wonder if the Baidu crawler honors the Crawl-delay directive in
> > robots.txt?
> >
> > Or I wonder if the Baidu crawler IPs need to be covered by firewall
> > tarpit rules. ;)
31 |
>
> I don't remember if it respects Crawl-delay, but it respects forbidden
> paths, etc. I've never been DDoS'd by Baidu crawlers, but I did get
> DDoS'd by Yahoo a number of times. It turned out the solution was to
> disallow access to expensive-to-render pages. If you're using
> MediaWiki with prettified URLs, this works great:
>
> User-agent: *
> Allow: /mw/images/
> Allow: /mw/skins/
> Allow: /mw/title.png
> Disallow: /w/
> Disallow: /mw/
> Disallow: /wiki/Special:
>

*slaps forehead*

Now why didn't I think of that before?!

Thanks for reminding me!

Rgds,
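
The Crawl-delay question above can be checked programmatically: Python's standard-library urllib.robotparser parses robots.txt and exposes both the per-agent crawl delay and the allow/disallow rules. A minimal sketch, using a hypothetical robots.txt (the user-agent name and paths are illustrative, not from an actual Baidu robots.txt):

```python
# Minimal sketch: inspect a robots.txt Crawl-delay and path rules with
# Python's stdlib urllib.robotparser. The robots.txt content below is a
# hypothetical example for illustration.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: Baiduspider
Crawl-delay: 10
Disallow: /w/
Disallow: /mw/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Crawl-delay (in seconds) declared for this user agent, or None.
print(parser.crawl_delay("Baiduspider"))                # -> 10
# Disallowed paths are refused for that agent regardless of delay.
print(parser.can_fetch("Baiduspider", "/w/index.php"))  # -> False
```

Whether a given crawler actually honors the parsed value is, of course, up to the crawler; this only shows what the robots.txt declares.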