Gentoo Archives: gentoo-user

From: Pandu Poluan <pandu@××××××.info>
To: gentoo-user@l.g.o
Subject: Re: [gentoo-user] Google privacy changes
Date: Wed, 08 Feb 2012 17:19:05
Message-Id: CAA2qdGXs8h0f7K+MC+YaZ1GQ=HyObWbrdWO6PRX_2Za2PZe9SQ@mail.gmail.com
In Reply to: Re: [gentoo-user] Google privacy changes by Michael Mol
On Feb 8, 2012 10:57 PM, "Michael Mol" <mikemol@×××××.com> wrote:
>
> On Wed, Feb 8, 2012 at 10:46 AM, Paul Hartman
> <paul.hartman+gentoo@×××××.com> wrote:
> > On Wed, Feb 8, 2012 at 2:55 AM, Pandu Poluan <pandu@××××××.info> wrote:
> >>
> >> On Jan 27, 2012 11:18 PM, "Paul Hartman" <paul.hartman+gentoo@×××××.com>
> >> wrote:
> >>>
> >>
> >> ---- >8 snippage
> >>
> >>>
> >>> BTW, the Baidu spider hits my site more than all of the others combined...
> >>>
> >>
18 > >>
19 > >> Somewhat anecdotal, and definitely veering way off-topic, but Baidu
20 was the
21 > >> reason why my company decided to change our webhosting company: Its
22 > >> spidering brought our previous webhosting to its knees...
23 > >>
24 > >> Rgds,
25 > >
> > I wonder whether the Baidu crawler honors the Crawl-delay directive in
> > robots.txt?
> >
> > Or maybe the Baidu crawler's IPs need to be covered by firewall tarpit
> > rules. ;)
>
> I don't remember whether it respects Crawl-delay, but it does respect
> forbidden paths, etc. I've never been DDoS'd by Baidu crawlers, but I did
> get DDoS'd by Yahoo a number of times. It turned out the solution was to
> disallow access to expensive-to-render pages. If you're using
> MediaWiki with prettified URLs, this works great:
>
> User-agent: *
> Allow: /mw/images/
> Allow: /mw/skins/
> Allow: /mw/title.png
> Disallow: /w/
> Disallow: /mw/
> Disallow: /wiki/Special:
>

*slaps forehead*

Now why didn't I think of that before?!

Thanks for reminding me!

Rgds,
53 Rgds,

Replies

Subject Author
Re: [gentoo-user] Google privacy changes Michael Mol <mikemol@×××××.com>