adblocking with a hosts file
Mon, 14 Nov 2011 10:49 categories: blog
Naturally, Adblock Plus is a must-have extension for Firefox, but other programs that display websites might not offer such a facility.
To block ads in any application that accesses the internet, a hosts file can redirect requests for certain hostnames to 127.0.0.1 (which is expected to refuse the incoming connection). This provides a universal method to get rid of advertisements.
The question is how to obtain a list of hosts worth blocking.
Searching around revealed three lists that seemed to be well-maintained:
- http://winhelp2002.mvps.org/hosts.htm
- http://pgl.yoyo.org/adservers/
- http://someonewhocares.org/hosts/
The corresponding hosts-file entries can be found under these URLs, respectively:
- http://winhelp2002.mvps.org/hosts.txt
- http://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&mimetype=plaintext
- http://someonewhocares.org/hosts/hosts
I also looked into the Adblock Plus filter rules, but they mostly contain expressions for the path, query and fragment parts of URIs rather than hostnames. This makes sense: with its filter syntax, Adblock Plus can block far more precisely than by blocking whole domains.
Now I wanted a combined list of them without duplicates so I cleaned them up using the following sed expression:
sed 's/\([^#]*\)#.*/\1/;s/[ \t]*$//;s/^[ \t]*//;s/[ \t]\+/ /g'
It removes comments, strips whitespace at the beginning and end of each line, and reduces any remaining run of whitespace (between IP and hostname) to a single space. I would then run the output through sort and uniq and append the result to my /etc/hosts.
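The cleanup step could be wrapped into a small shell function along these lines. This is only a sketch: the name clean_hosts is made up for illustration, the sed script is a variant of the expression above that uses POSIX character classes instead of the GNU-specific \t and \+, and the tr call assumes that some of the lists may arrive with CRLF line endings.

```shell
# clean_hosts (hypothetical name): read hosts-file lines on stdin and
# write normalized, de-duplicated entries to stdout.
clean_hosts() {
    # Assumption: strip carriage returns in case a list uses CRLF endings.
    tr -d '\r' |
    sed -e 's/#.*//' \
        -e 's/[[:blank:]]*$//' \
        -e 's/^[[:blank:]]*//' \
        -e 's/[[:blank:]]\{1,\}/ /g' \
        -e '/^$/d' |
    # sort -u gives the same result as piping sort into uniq.
    sort -u
}
```

For example, `curl -s http://someonewhocares.org/hosts/hosts | clean_hosts` would print the normalized entries of one list. The concatenated output of all three lists can be piped through the same function before the result is appended to /etc/hosts. The extra `/^$/d` step drops the lines that are left empty once comments are removed.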
What remains problematic about this approach is that if no service is bound to 127.0.0.1:80, every application trying to establish a TCP connection to it will wait pointlessly for localhost to respond until the timeout is reached. To avoid this and immediately send a TCP RST when the browser is redirected to 127.0.0.1 while trying to retrieve an advertisement, I use the following iptables rule:
iptables -A INPUT -i lo -p tcp -m tcp --dport 80 -j REJECT --reject-with tcp-reset
Some further hosts that you might want to add to your /etc/hosts, because they exist to track users, are:
127.0.0.1 www.google-analytics.com
127.0.0.1 auto.search.msn.com
127.0.0.1 ad.doubleclick.net
127.0.0.1 google-analytics.com
127.0.0.1 stat.livejournal.com
127.0.0.1 stats.surfaid.ihost.com
127.0.0.1 ads.imeem.com
They are not included by default in the lists above because it might break some websites if they were.
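If such entries are added by hand from time to time, a small helper can avoid appending the same line twice. The following is a rough sketch, not a polished tool: the function name add_block is hypothetical, and the duplicate check assumes that existing entries are already in the normalized "127.0.0.1 hostname" form with a single space.

```shell
# add_block FILE HOST (hypothetical helper): append "127.0.0.1 HOST" to FILE
# unless exactly that line is already present.
add_block() {
    file=$1
    host=$2
    # -x matches whole lines only, -F treats the pattern as a fixed string.
    if ! grep -qxF "127.0.0.1 $host" "$file" 2>/dev/null; then
        printf '127.0.0.1 %s\n' "$host" >> "$file"
    fi
}
```

Run as root, `add_block /etc/hosts www.google-analytics.com` would add the entry once and do nothing on later calls.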
EDIT (2012-05-21)
I forgot to include port 443 for HTTPS in the iptables rule above. Google, for example, uses HTTPS for googleadservices.com, and others might too, so don't forget to also reset connections to port 443 by adding the same rule with --dport 443.
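Since the same rule is now needed for more than one port, it can be generated in a loop. The sketch below only prints the commands so that they can be reviewed before being run as root; the function name reset_rules is made up for this example.

```shell
# reset_rules PORT... (hypothetical name): print one iptables command per
# port, each rejecting loopback TCP connections to that port with a RST.
# Nothing is applied here; the commands are only written to stdout.
reset_rules() {
    for port in "$@"; do
        echo "iptables -A INPUT -i lo -p tcp -m tcp --dport $port -j REJECT --reject-with tcp-reset"
    done
}

reset_rules 80 443
```

Once the printed commands look right, they can be pasted into a root shell or added to whatever script sets up the firewall at boot.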