On 04/10/2017 07:28, Walter Dnes wrote:
> I have some doubts about massive "hosts" files for adblocking. I
> downloaded one that listed 13,148 sites. I fed them through a script
> that called "host" for each entry, and saved the output to a text file.
> The result was 1,059 addresses. Note that some adservers have multiple
> IP address entries for the same name. A back-of-the-envelope analysis
> is that close to 95% of the entries in the large host file are invalid,
> and return "not found: 3(NXDOMAIN)".
> |
> I'm not here to trash the people compiling the lists; the problem is
> that hosts files are the wrong tool for the job. Advertisers know about
> hosts files and deliberately generate random subdomain names with short
> lifetimes to invalidate the hosts files. Every week the sites are
> probably mostly renamed. Further analysis of the 1,059 addresses shows
> 810 unique entries, i.e. 249 duplicates. It gets even better. 44
> addresses show up in 52.84.146.xxx; I should probably block the entire
> /24 with one entry. There are multiple similar occurrences, which could
> be aggregated into small CIDRs. So the number of blocking rules is
> greatly reduced.
> |
> I'm not a deep networking expert. My question is whether I'm better
> off adding iptables reject/drop rules or "reject routes", e.g...
> |
>     route add -net 10.0.0.0 netmask 255.0.0.0 metric 1024 reject
> |
> (an example from the "route" man page). iptables rules have to be
> duplicated coming and going to catch inbound and outbound traffic. A
> reject route only needs to be entered once. This exercise is intended
> to block web adservers, so another question is how web browsers react to
> route versus iptables blocking.
> |
> While I'm at it (I did say I'm not an expert) is there another way to
> handle this? E.g. redirect "blocked CIDRs" via iptables or route to a
> local pixel image? Will that produce an immediate response by the web
> browser, versus timing out with "regular blocking"?
> |
|
This is a complex problem with no cut-and-dried solution. It's real life
and, as you know, real life is murky.
|
Let's define the real problem you want to solve: there's a bunch
of ad servers out there, and you want them to disappear. Or more
accurately, you want their traffic to disappear from *your* wires.
|
There are really 3 approaches, as you know:
- redefine the hostname to be a blackhole (e.g. 127.0.0.1)
- find the addresses or subnets and drop/reject the packets with iptables
- find the subnets (sometimes the individual hosts) and route them into a
  blackhole
|
Each has its strengths and weaknesses:
- packet filters work best at the TCP/UDP/ICMP layer, where you have an
  address and often a port
- routing works best at the IP layer, where you have whole chunks of
  subnets and tell the router what to do with all traffic for that entire
  subnet
- hosts files work best at the name layer, where you have DNS names
|
Your problem seems to slot in somewhere between a firewall and a routing
solution, which explains why you can't decide. Hosts files suck major big
eggs for this, as you know; people still use them because they seem legit
(but aren't) and because they understand them, whereas they don't
understand the other 2.
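
If you want to re-measure how stale a given hosts file is, here's a
minimal Python sketch of the check you describe. It's my own illustration,
not your script: parse_hosts, resolves and stale_fraction are names I made
up, and it asks the system resolver via socket.getaddrinfo instead of
calling "host". One caveat: run it on a box that does not have the
blocklist installed as /etc/hosts, otherwise every name will "resolve" to
the blackhole address.

```python
import socket


def parse_hosts(lines):
    """Yield hostnames from hosts-file lines, skipping comments and blanks."""
    for line in lines:
        fields = line.split("#", 1)[0].split()
        # fields[0] is the address; everything after it is a name or alias.
        yield from fields[1:]


def resolves(name):
    """Return True if the system resolver finds an address for `name`."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def stale_fraction(names, lookup=resolves):
    """Fraction of names for which `lookup` reports no address."""
    names = list(names)
    if not names:
        return 0.0
    dead = sum(1 for name in names if not lookup(name))
    return dead / len(names)
```

Feeding it the file would look like
stale_fraction(parse_hosts(open("hosts.txt"))), where "hosts.txt" is a
placeholder path. The lookup parameter is only there so the counting can
be tried without touching the network.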
|
Ad providers are well aware of this. I was surprised to see
52.84.146.0/24 show up in your mail, as that is Amazon's AWS range. Yes,
you could null-route that subnet, but it's Amazon and maybe there are
hosts in there that you DO want to use.
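
Before null-routing a whole /24, it might help to see how densely your
resolved addresses actually cluster. A rough Python sketch, assuming IPv4
only and a plain list of address strings; count_by_slash24, dense_networks
and the threshold of 10 are names and numbers I picked for illustration,
not anything from your script:

```python
import ipaddress
from collections import Counter


def count_by_slash24(addresses):
    """Count how many of the given IPv4 addresses fall into each /24."""
    counts = Counter()
    for text in addresses:
        addr = ipaddress.ip_address(text)
        if addr.version != 4:
            continue  # assumption: this sketch only handles IPv4
        # strict=False masks off the host bits, giving the enclosing /24.
        net = ipaddress.ip_network(f"{addr}/24", strict=False)
        counts[net] += 1
    return counts


def dense_networks(addresses, threshold=10):
    """Return the /24s holding at least `threshold` distinct addresses."""
    counts = count_by_slash24(set(addresses))
    return sorted(net for net, n in counts.items() if n >= threshold)
```

A dense /24 only tells you where the list clusters. It says nothing about
whether the other hosts in that range are ones you want to reach, so
treat the output as candidates to look at, not as a blocklist.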
|
I'd suggest you use a packet filter, but not on Linux and certainly not
iptables. That thing is a god-awful mess looking like it was built by
unsupervised schoolkids masquerading as interns. The best tool for this
is the pf packet filter, but it runs on FreeBSD. Get yourself a spare
machine, load pfSense on it (it's an appliance like wrt) and drop the
traffic from all offensive addresses. Drop, not reject.
|
You could in theory do the same thing with iptables, but the ruleset
will quickly drive you nuts. Perhaps the ipset plugin would help; I've
been meaning to check it out for ages and never got around to it.
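
As far as I understand ipset, the idea would be to keep all the CIDRs in
one named set and match that set from a single iptables rule. Here's a
minimal Python sketch that turns a list of CIDRs into input for
"ipset restore". Assumptions on my part: IPv4 only, a hash:net set, and
the set name "adblock", which is arbitrary. I have not run this against a
real firewall.

```python
import ipaddress


def ipset_restore_lines(networks, set_name="adblock"):
    """Build `ipset restore` input from an iterable of IPv4 CIDR strings."""
    nets = [ipaddress.ip_network(n, strict=False) for n in networks]
    # collapse_addresses drops duplicates and merges adjacent or
    # overlapping networks, so the set stays as small as possible.
    merged = ipaddress.collapse_addresses(nets)
    lines = [f"create {set_name} hash:net family inet"]
    lines += [f"add {set_name} {net}" for net in merged]
    return lines
```

Write the lines to a file, say adblock.ipset, then load and use the set
with something like:

    ipset restore < adblock.ipset
    iptables -I OUTPUT -m set --match-set adblock dst -j DROP

That is one rule for the whole list. On a router you would hang it off
FORWARD instead of OUTPUT.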
|
--
Alan McKinnon
alan.mckinnon@×××××.com