I have some doubts about massive "hosts" files for adblocking. I
downloaded one that listed 13,148 sites. I fed them through a script
that called "host" for each entry, and saved the output to a text file.
The result was 1,059 addresses. Note that some adservers have multiple
IP address entries for the same name. A back-of-the-envelope analysis
is that close to 95% of the entries in the large hosts file are invalid,
and return "not found: 3(NXDOMAIN)".
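
A minimal sketch of that kind of check, not the actual script: the
function names list_names and check_names and the file name
blocklist.hosts are made up for illustration. It assumes the usual
"0.0.0.0 name" or "127.0.0.1 name" line format, and that "host" exits
non-zero when a name does not resolve.

```shell
# Sketch only: list_names, check_names and blocklist.hosts are
# hypothetical names, not taken from the original script.

# Print the unique hostnames (second column) of a hosts-format file,
# skipping comment lines and blank lines.
list_names() {
    awk '$1 !~ /^#/ && NF >= 2 { print $2 }' "$1" | sort -u
}

# Read names on stdin and run "host" on each one. Assumes "host"
# exits non-zero for names that do not resolve (e.g. NXDOMAIN).
check_names() {
    while read -r name; do
        if host "$name" >/dev/null 2>&1; then
            echo "live $name"
        else
            echo "dead $name"
        fi
    done
}
```

Something like the following would then give rough live/dead totals:

list_names blocklist.hosts | check_names | cut -d' ' -f1 | sort | uniq -c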

I'm not here to trash the people compiling the lists; the problem is
that hosts files are the wrong tool for the job. Advertisers know about
hosts files and deliberately generate random subdomain names with short
lifetimes to invalidate them. Most of the names are probably renamed
every week. Further analysis of the 1,059 addresses shows 810 unique
entries, i.e. 249 duplicates. It gets even better: 44 addresses show up
in 52.84.146.xxx, so I should probably block the entire /24 with one
entry. There are multiple similar occurrences, which could be
aggregated into small CIDR blocks. That would greatly reduce the number
of blocking rules.
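
The per-/24 tally can be sketched as below. This is only an
illustration: count_by_24 is a hypothetical helper name, and it assumes
the input is one IPv4 address per line. A high count shows where
addresses cluster; on its own it does not show that the whole /24
belongs to adservers.

```shell
# Sketch only: count_by_24 is a hypothetical helper name.
# Reads one IPv4 address per line on stdin, drops duplicates, and
# prints "count prefix" for each /24, busiest first.
count_by_24() {
    sort -u |
    awk -F. 'NF == 4 { n[$1 "." $2 "." $3 ".0/24"]++ }
             END { for (p in n) print n[p], p }' |
    sort -rn
}
```

Fed the list of resolved addresses on stdin, this prints one line per
/24, with the most populated blocks at the top.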

I'm not a deep networking expert. My question is whether I'm better
off adding iptables reject/drop rules or "reject routes", e.g.

route add -net 10.0.0.0 netmask 255.0.0.0 metric 1024 reject

(an example from the "route" man page). As I understand it, iptables
rules have to be duplicated coming and going to catch inbound and
outbound traffic. A reject route only needs to be entered once. This
exercise is intended to block web adservers, so another question is how
web browsers react to route versus iptables blocking.
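
To make the comparison concrete, here is a rough sketch that prints,
without running anything, one blocking command per CIDR for each
approach. The helper names emit_iptables and emit_routes are
hypothetical. The route variant uses iproute2's "ip route add
unreachable", which I am assuming is roughly equivalent to the
net-tools "reject" route. The iptables variant only adds an OUTPUT
rule, on the assumption that the browser always initiates the
connection. REJECT, as opposed to DROP, should make the connection fail
straight away rather than time out.

```shell
# Sketch only: emit_iptables and emit_routes are hypothetical names.
# Both read one CIDR per line on stdin and print commands; nothing
# is executed, so the output can be reviewed before running it as root.

# One outbound REJECT rule per CIDR.
emit_iptables() {
    while read -r cidr; do
        [ -z "$cidr" ] && continue
        echo "iptables -A OUTPUT -d $cidr -j REJECT"
    done
}

# One unreachable route per CIDR (iproute2 syntax).
emit_routes() {
    while read -r cidr; do
        [ -z "$cidr" ] && continue
        echo "ip route add unreachable $cidr"
    done
}
```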

While I'm at it (I did say I'm not an expert), is there another way to
handle this? E.g. redirect "blocked CIDRs" via iptables or route to a
local pixel image? Would that produce an immediate response in the web
browser, versus timing out with "regular blocking"?
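
One possible shape for the redirect idea, again only printed rather
than applied: a nat-table REDIRECT rule that sends plain HTTP for a
blocked CIDR to a local port. This assumes a small web server is
already listening on that port and serving the pixel. The default port
8080 and the name emit_redirects are made up for illustration. It would
not help with HTTPS, since the local server could not present a valid
certificate for the ad host.

```shell
# Sketch only: emit_redirects is a hypothetical name, and port 8080
# is an assumed default for a local server that serves the pixel.
# Reads one CIDR per line on stdin and prints (does not run) one
# nat-table REDIRECT rule per CIDR for outbound HTTP.
emit_redirects() {
    port=${1:-8080}
    while read -r cidr; do
        [ -z "$cidr" ] && continue
        echo "iptables -t nat -A OUTPUT -p tcp -d $cidr --dport 80 -j REDIRECT --to-ports $port"
    done
}
```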

--
Walter Dnes <waltdnes@××××××××.org>
I don't run "desktop environments"; I run useful applications