
hosts's Introduction

Take Note!

With the exception of issues and PRs regarding changes to hosts/data/StevenBlack/hosts, all other issues regarding the content of the produced hosts files should be made with the appropriate data source that contributed the content in question. The contact information for all of the data sources can be found in the hosts/data/ directory.


Unified hosts file with base extensions

This repository consolidates several reputable hosts files, and merges them into a unified hosts file with duplicates removed. A variety of tailored hosts files are provided.

Therefore this repository is a hosts file aggregator.
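
As a rough illustration of what "aggregating" means here, the following Python sketch merges a few hosts-file texts and drops duplicate domains. It is a simplified stand-in, not the project's actual updateHostsFile.py logic; the function names `extract_domains` and `unify` and the sample domains are hypothetical.

```python
def extract_domains(hosts_text):
    """Yield each hostname found on 'IP name [name ...]' lines."""
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        parts = line.split()
        if len(parts) >= 2:
            # A hosts line may list several names after the IP address.
            for name in parts[1:]:
                yield name.lower()


def unify(sources, target_ip="0.0.0.0"):
    """Merge several hosts-file texts into one list of lines, without duplicates."""
    seen = set()
    unified = []
    for text in sources:
        for domain in extract_domains(text):
            if domain not in seen:
                seen.add(domain)
                unified.append(f"{target_ip} {domain}")
    return unified


# Hypothetical sample data standing in for two upstream sources.
source_a = "# list A\n0.0.0.0 ads.example.com\n0.0.0.0 tracker.example.net\n"
source_b = "127.0.0.1 ads.example.com  # duplicate of A\n127.0.0.1 spam.example.org\n"
print("\n".join(unify([source_a, source_b])))
```

The sketch keeps the first occurrence of each domain and preserves source order; a real aggregator would also need to leave entries such as localhost alone.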

List of all hosts file variants

This repository offers 31 hosts file variants, counting the base variant: each combination of extensions is provided with and without the unified hosts included.

The Non GitHub mirror is the link to use for some hosts file managers like Hostsman for Windows that don't work with GitHub download links.

Host file recipe Readme Raw hosts Unique domains Non GitHub mirror
Unified hosts = (adware + malware) Readme link 124,961 link
Unified hosts + fakenews Readme link 127,155 link
fakenews Readme link 2,194 link
Unified hosts + gambling Readme link 133,041 link
gambling Readme link 8,092 link
Unified hosts + porn Readme link 208,096 link
porn Readme link 83,831 link
Unified hosts + social Readme link 128,092 link
social Readme link 3,157 link
Unified hosts + fakenews + gambling Readme link 135,235 link
fakenews + gambling Readme link 10,286 link
Unified hosts + fakenews + porn Readme link 210,290 link
fakenews + porn Readme link 86,025 link
Unified hosts + fakenews + social Readme link 130,286 link
fakenews + social Readme link 5,351 link
Unified hosts + gambling + porn Readme link 216,176 link
gambling + porn Readme link 91,923 link
Unified hosts + gambling + social Readme link 136,172 link
gambling + social Readme link 11,249 link
Unified hosts + porn + social Readme link 211,226 link
porn + social Readme link 86,987 link
Unified hosts + fakenews + gambling + porn Readme link 218,370 link
fakenews + gambling + porn Readme link 94,117 link
Unified hosts + fakenews + gambling + social Readme link 138,366 link
fakenews + gambling + social Readme link 13,443 link
Unified hosts + fakenews + porn + social Readme link 213,420 link
fakenews + porn + social Readme link 89,181 link
Unified hosts + gambling + porn + social Readme link 219,306 link
gambling + porn + social Readme link 95,079 link
Unified hosts + fakenews + gambling + porn + social Readme link 221,500 link
fakenews + gambling + porn + social Readme link 97,273 link
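
The table above has 31 rows, which matches the number of non-empty combinations of five building blocks (the unified hosts plus the four extensions): 2^5 - 1 = 31. The short Python sketch below enumerates them; the `BLOCKS` list and the `variants` helper are illustrative names, not part of the repository.

```python
from itertools import combinations

# The unified base plus the four extension categories named in this README.
BLOCKS = ["Unified hosts", "fakenews", "gambling", "porn", "social"]


def variants(blocks=BLOCKS):
    """Return every non-empty combination of blocks as a recipe string."""
    result = []
    for size in range(1, len(blocks) + 1):
        for combo in combinations(blocks, size):
            result.append(" + ".join(combo))
    return result


all_variants = variants()
print(len(all_variants))
```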

Expectation: These unified hosts files should serve all devices, regardless of OS.

Sources of hosts data unified in this variant

Updated hosts files from the following locations are always unified and included:

Host file source Home page Raw hosts License Issues Description
Steven Black's ad-hoc list link raw MIT issues Additional sketch domains as I come across them.
AdAway link raw CC BY 3.0 issues AdAway is an open source ad blocker for Android using the hosts file.
add.2o7Net link raw MIT issues 2o7Net tracking sites based on hostsfile.org content.
add.Dead link raw MIT issues Dead sites based on hostsfile.org content.
add.Risk link raw MIT issues Risk content sites based on hostsfile.org content.
add.Spam link raw MIT issues Spam sites based on hostsfile.org content.
Mitchell Krog's - Badd Boyz Hosts link raw MIT issues Sketchy domains and Bad Referrers from my Nginx and Apache Bad Bot and Spam Referrer Blockers
hostsVN link raw MIT issues Hosts block ads of Vietnamese
KADhosts link raw CC BY-SA 4.0 issues Fraud/adware/scam websites.
MetaMask eth-phishing-detect link raw DON'T BE A DICK PUBLIC LICENSE issues Phishing domains targeting Ethereum users.
minecraft-hosts link raw CC0-1.0 issues Minecraft related tracker hosts
MVPS hosts file link raw CC BY-NC-SA 4.0 issues The purpose of this site is to provide the user with a high quality custom HOSTS file.
shady-hosts link raw CC0-1.0 issues Analytics, ad, and activity monitoring hosts
Dan Pollock – someonewhocares link raw non-commercial with attribution issues How to make the internet not suck (as much).
Tiuxo hostlist - ads link raw CC BY 4.0 issues Categorized hosts files for DNS based content blocking
UncheckyAds link raw MIT issues Windows installers ads sources sites based on https://unchecky.com/ content.
URLHaus link raw CC0 issues A project from abuse.ch with the goal of sharing malicious URLs.
yoyo.org link raw issues Blocking with ad server and tracking server hostnames.

Extensions

The unified hosts file is optionally extensible. Extensions are used to include domains by category. Currently, we offer the following categories: fakenews, social, gambling, and porn.

Extensions are optional, and can be combined in various ways with the base hosts file. The combined products are stored in the alternates folder.

Data for extensions are stored in the extensions folder. You manage extensions by curating this folder tree, where you will find the fakenews, social, gambling, and porn extension data that we maintain and provide for you.

Generate your own unified hosts file

You have four options to generate your own hosts file. You can use our container image, build your own image, do it in your own environment, or use Google Colab. Option #1 is easiest if you have Linux with Docker installed.

Option 1: Use our container image (Linux only)

This will replace your /etc/hosts.

We assume you have Docker available on your host. Just run the following command. Set extensions to your preference.

docker run --pull always --rm -it -v /etc/hosts:/etc/hosts \
ghcr.io/stevenblack/hosts:latest updateHostsFile.py --auto \
--replace --extensions gambling porn

If you want to add custom hosts or a whitelist, create either or both files as per the instructions and add the following arguments before ghcr.io/stevenblack/hosts:latest depending on which you wish to use.

-v "path/to/myhosts:/hosts/myhosts" \
-v "path/to/whitelist:/hosts/whitelist" \

You can rerun this exact command later to update based on the latest available hosts (for example, add it to a weekly cron job).

Option 2: Generate your own container image

We provide the Dockerfile used by the previous step, which you can use to create a container image with everything you need. The container will contain Python 3 and all its dependency requirements, and a copy of the latest version of this repository.

Build the Docker container from the root of this repo like this:

docker build --no-cache . -t stevenblack-hosts

Then run your command as such:

docker run --rm -it stevenblack-hosts updateHostsFile.py

This creates the hosts file inside the container and discards it when the container is removed, so on its own it is not very useful. Add volumes as in the example in option #1 so that files on your host are replaced.

Option 3: Generate it in your own environment

To generate your own amalgamated hosts files you will need Python 3.6 or later.

First, install the dependencies with:

pip3 install --user -r requirements.txt

Note: we recommend the --user flag, which installs the required dependencies at the user level. More information about it can be found in the pip documentation.

Option 4: Generate it in Google Colab

Spin up a free remote Google Colab environment.

Common steps regardless of your development environment

To run unit tests, in the top-level directory, run:

python3 testUpdateHostsFile.py

The updateHostsFile.py script will generate a unified hosts file based on the sources in the local data/ subfolder. The script will prompt you whether it should fetch updated versions (from locations defined by the update.json text file in each source's folder). Otherwise, it will use the hosts file that's already there.

python3 updateHostsFile.py [--auto] [--replace] [--ip nnn.nnn.nnn.nnn] [--extensions ext1 ext2 ext3]

Command line options

--help, or -h: display help.

--auto, or -a: run the script without prompting. When --auto is invoked,

  • Hosts data sources, including extensions, are updated.
  • No extensions are included by default. Use the --extensions or -e flag to include any you want.
  • Your active hosts file is not replaced unless you include the --replace flag.

--backup, or -b: Make a backup of existing hosts file(s) as you generate over them.

--extensions <ext1> <ext2> <ext3>, or -e <ext1> <ext2> <ext3>: the names of subfolders below the extensions folder containing additional category-specific hosts files to include in the amalgamation. Example: --extensions porn or -e social porn.

--flush-dns-cache, or -f: skip the prompt for flushing the DNS cache. Only active when --replace is also active.

--ip nnn.nnn.nnn.nnn, or -i nnn.nnn.nnn.nnn: the IP address to use as the target. Default is 0.0.0.0.

--keepdomaincomments, or -k: true (default) or false, keep the comments that appear on the same line as domains.

--noupdate, or -n: skip fetching updates from hosts data sources.

--output <subfolder>, or -o <subfolder>: place the generated source file in a subfolder. If the subfolder does not exist, it will be created.

--replace, or -r: trigger replacing your active hosts file.

--skipstatichosts, or -s: false (default) or true, omit the standard section at the top, containing lines like 127.0.0.1 localhost. This is useful for configuring proximate DNS services on the local network.

--nogendata, or -g: false (default) or true, skip the generation of the readmeData.json file used for generating readme.md files. This is useful if you are generating host files with additional whitelists or blacklists and want to keep your local checkout of this repo unmodified.

--nounifiedhosts: false (default) or true, do not include the unified hosts file in the final hosts file. Usually used together with --extensions.

--compress, or -c: false (default) or true, compress the hosts file by ignoring unnecessary lines (empty lines and comments) and putting multiple domains on each line. Reducing the number of lines in the hosts file improves performance on Windows (with the DNS Client service enabled).

--minimise, or -m: false (default) or true, like --compress, but puts each domain on a separate line. This is necessary because many implementations of URL blockers that rely on hosts files do not conform to the standard which allows multiple hosts on a single line.

--blacklist <blacklistfile>, or -x <blacklistfile>: Append the given blacklist file in hosts format to the generated hosts file.

--whitelist <whitelistfile>, or -w <whitelistfile>: Use the given whitelist file to remove hosts from the generated hosts file.
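
To make the difference between --compress and --minimise concrete, here is a minimal Python sketch of the two output layouts. It is not the script's own code: the helper names are hypothetical, and the default of 9 domains per line is an assumption borrowed from the Hosts-BL description further down, not a documented value of updateHostsFile.py.

```python
def compress(domains, target_ip="0.0.0.0", per_line=9):
    """Pack several domains onto each hosts line (per_line is an assumed value)."""
    lines = []
    for start in range(0, len(domains), per_line):
        chunk = domains[start:start + per_line]
        lines.append(f"{target_ip} " + " ".join(chunk))
    return lines


def minimise(domains, target_ip="0.0.0.0"):
    """Emit one domain per line, with no comments or blank lines."""
    return [f"{target_ip} {domain}" for domain in domains]


sample = ["a.example", "b.example", "c.example"]  # hypothetical domains
print(compress(sample, per_line=2))
print(minimise(sample))
```

The packed form yields fewer lines, while the one-per-line form stays readable by blockers that expect a single host on each line.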

How do I control which sources are unified?

Add one or more additional sources, each in a subfolder of the data/ folder, and specify the url key in its update.json file.

Add one or more optional extensions, which originate from subfolders of the extensions/ folder. Again the url in update.json controls where this extension finds its updates.

Create an optional blacklist file. The contents of this file (containing a listing of additional domains in hosts file format) are appended to the unified hosts file during the update process. A sample blacklist is included, and may be modified as you need.

  • NOTE: The blacklist is not tracked by git, so any changes you make won't be overridden when you git pull this repo from origin in the future.
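
For orientation, the Python sketch below collects the url value from every update.json found one level below a folder such as data/ or extensions/. Only the url key is taken from this README; the helper name `source_urls` is hypothetical and any other keys in update.json are ignored here.

```python
import json
from pathlib import Path


def source_urls(root):
    """Map each source folder name to the 'url' value in its update.json."""
    urls = {}
    for config in sorted(Path(root).glob("*/update.json")):
        with config.open(encoding="utf-8") as handle:
            data = json.load(handle)
        if "url" in data:
            # Folders whose update.json lacks a url key are skipped.
            urls[config.parent.name] = data["url"]
    return urls
```

Calling `source_urls("data")` from the top-level directory of a checkout would then list which sources are configured and where each one is fetched from.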

How do I include my own custom domain mappings?

If you have custom hosts records, place them in file myhosts. The contents of this file are prepended to the unified hosts file during the update process.

The myhosts file is not tracked by git, so any changes you make won't be overridden when you git pull this repo from origin in the future.

How do I prevent domains from being included?

The domains you list in the whitelist file are excluded from the final hosts file.

The whitelist uses partial matching. Therefore if you whitelist google-analytics.com, that domain and all its subdomains won't be merged into the final hosts file.

The whitelist is not tracked by git, so any changes you make won't be overridden when you git pull this repo from origin in the future.
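
One way to picture the whitelist behaviour is the Python sketch below, which drops a whitelisted domain together with its subdomains. It assumes suffix matching on label boundaries; the README only says "partial matching", so the script's real rule may be broader (for example, substring matching). The function names are hypothetical.

```python
def is_whitelisted(domain, whitelist):
    """True if domain equals a whitelist entry or is a subdomain of one."""
    domain = domain.lower()
    return any(domain == entry or domain.endswith("." + entry) for entry in whitelist)


def apply_whitelist(domains, whitelist):
    """Return the domains that survive the whitelist filter."""
    entries = {entry.lower() for entry in whitelist}
    return [domain for domain in domains if not is_whitelisted(domain, entries)]


blocked = [
    "google-analytics.com",
    "ssl.google-analytics.com",
    "notgoogle-analytics.com",  # hypothetical look-alike, kept under this rule
]
print(apply_whitelist(blocked, ["google-analytics.com"]))
```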

How can I contribute hosts records?

If you discover sketchy domains you feel should be included here, here are some ways to contribute them.

Option 1: contact one of our hosts sources

The best way to get new domains included is to submit an issue to any of the data providers whose home pages are listed here. This is best because once you submit new domains, they will be curated and updated by the dedicated folks who maintain these sources.

Option 2: Fork this repository, add your domains to Steven Black's personal data file, and submit a pull request

Fork this repo and add your domains to https://github.com/StevenBlack/hosts/blob/master/data/StevenBlack/hosts.

Then, submit a pull request.

WARNING: this is less desirable than Option 1 because the ongoing curation falls on us. So this creates more work for us.

Option 3: create your own hosts list as a repo on GitHub

If you're able to curate your own collection of sketchy domains, then curate your own hosts list. Then signal the existence of your repo as a new issue and we may include your new repo into the collection of sources we pull whenever we create new versions.

What is a hosts file?

A hosts file, named hosts (with no file extension), is a plain-text file used by all operating systems to map hostnames to IP addresses.

In most operating systems, the hosts file takes precedence over DNS. Therefore if a domain name is resolved by the hosts file, the request never leaves your computer.

Having a smart hosts file goes a long way towards blocking malware, adware, and other irritants.

For example, to nullify requests to some doubleclick.net servers, adding these lines to your hosts file will do it:

# block doubleClick's servers
0.0.0.0 ad.ae.doubleclick.net
0.0.0.0 ad.ar.doubleclick.net
0.0.0.0 ad.at.doubleclick.net
0.0.0.0 ad.au.doubleclick.net
0.0.0.0 ad.be.doubleclick.net
# etc...

We recommend using 0.0.0.0 instead of 127.0.0.1

Traditionally most host files use 127.0.0.1, the loopback address, to establish an IP connection to the local machine.

We prefer to use 0.0.0.0, which is defined as a non-routable meta-address used to designate an invalid, unknown, or non-applicable target.

Using 0.0.0.0 is empirically faster, possibly because there's no wait for a timeout resolution. It also does not interfere with a web server that may be running on the local PC.
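
If you have an existing list that still points at 127.0.0.1, a small Python sketch like the one below can rewrite it to 0.0.0.0. This is an illustration under simple assumptions (one address per line, optional trailing comment), not the conversion code used by updateHostsFile.py; `retarget` is a hypothetical helper name.

```python
def retarget(line, target_ip="0.0.0.0"):
    """Rewrite a hosts line that points at 127.0.0.1 to use target_ip instead."""
    body, sep, comment = line.partition("#")
    parts = body.split()
    if len(parts) < 2 or parts[0] != "127.0.0.1":
        return line  # comment, blank line, or a line using another address
    if parts[1] in ("localhost", "localhost.localdomain"):
        return line  # keep the standard loopback entries intact
    rewritten = " ".join([target_ip] + parts[1:])
    return f"{rewritten} {sep}{comment}".rstrip() if sep else rewritten


for sample_line in ["127.0.0.1 ad.ae.doubleclick.net", "127.0.0.1 localhost", "# etc..."]:
    print(retarget(sample_line))
```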

Why not use 0 instead of 0.0.0.0?

We tried that. Using 0 doesn't work universally.

Location of your hosts file

To modify your current hosts file, look for it in the following places and modify it with a text editor.

  • macOS (until 10.14.x macOS Mojave), iOS, Android, Linux: /etc/hosts file.
  • macOS Catalina: /private/etc/hosts file.
  • Windows: %SystemRoot%\system32\drivers\etc\hosts file.
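
The locations listed above can be looked up programmatically. The Python sketch below is one hedged way to do it; the helper name `hosts_path` and the fallback `C:\Windows` (used only when SystemRoot is not set) are assumptions, not something this project ships.

```python
import os
import platform


def hosts_path(system=None, environ=None):
    """Return the usual hosts file location for the given (or current) platform."""
    system = system or platform.system()
    environ = os.environ if environ is None else environ
    if system == "Windows":
        # Fallback value is an assumption for machines without SystemRoot set.
        root = environ.get("SystemRoot", r"C:\Windows")
        return root + r"\system32\drivers\etc\hosts"
    if system == "Darwin":
        # On macOS /etc is a symlink to /private/etc, so both paths name the same file.
        return "/private/etc/hosts"
    return "/etc/hosts"


print(hosts_path())
```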

Gentoo

Gentoo users may find sb-hosts in ::pf4public Gentoo overlay

NixOS

To install hosts file on your machine add the following into your configuration.nix:

{
  networking.extraHosts = let
    hostsPath = "https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts";
    hostsFile = builtins.fetchurl hostsPath;
  in builtins.readFile "${hostsFile}";
}
  • NOTE: Change hostsPath if you need other versions of hosts file.
  • NOTE: The call to fetchurl is impure. Use fetchFromGitHub with the exact commit if you want to always get the same result.

Nix Flake

NixOS installations which are managed through flakes can use the hosts file like this:

{
  inputs.hosts.url = "github:StevenBlack/hosts";
  outputs = { self, nixpkgs, hosts }: {
    nixosConfigurations.my-hostname = {
      system = "<architecture>";
      modules = [
        hosts.nixosModule {
          networking.stevenBlackHosts.enable = true;
        }
      ];
    };
  };
}

The hosts extensions are also available with the following options:

{
  networking.stevenBlackHosts = {
    blockFakenews = true;
    blockGambling = true;
    blockPorn = true;
    blockSocial = true;
  };
}

Updating hosts file on Windows

(NOTE: See also some third-party Hosts managers, listed below.)

On Linux and macOS, run the Python script. On Windows more work is required due to compatibility issues so it's preferable to run the batch file as follows:

updateHostsWindows.bat

This file MUST be run in command prompt with administrator privileges in the repository directory. In addition to updating the hosts file, it can also replace the existing hosts file, and reload the DNS cache. It goes without saying that for this to work, you must be connected to the internet.

To open a command prompt as administrator in the repository's directory, do the following:

  • Windows XP: Start → Run → cmd
  • Windows Vista, 7: Start Button → type cmd → right-click Command Prompt → "Run as Administrator"
  • Windows 8: Start → Swipe Up → All Apps → Windows System → right-click Command Prompt → "Run as Administrator"
  • Windows 10: Start Button → type cmd → right-click Command Prompt → "Run as Administrator"

You can also refer to the "Third-Party Hosts Managers" section for further recommended solutions from third parties.

Warning: Using this hosts file in Windows may require disabling DNS Cache service

Windows has issues with larger hosts files. Recent security changes in Windows 10 deny access to changing services via tools other than registry hacks. Use the disable-dnscache-service-win.cmd file to make proper changes to the Windows registry. You will need to reboot your device once that's done. See the comments within the cmd file for more details.

Disabling the DNS Cache service can cause issues with services and applications such as WSL. As an alternative, you can compress the hosts file, which removes the need to disable the DNS caching service. Try the C++ Windows command line tool at Hosts Compress - Windows (the recommended method) or the PowerShell compression script, and check out the guide located at the Hosts Compression Scripts repository.

Reloading hosts file

Your operating system will cache DNS lookups. You can either reboot or run the following commands to manually flush your DNS cache once the new hosts file is in place.

The Google Chrome browser may require manually clearing its DNS cache on the chrome://net-internals/#dns page before changes to your hosts file take effect. See: https://superuser.com/questions/723703

Windows

Open a command prompt with administrator privileges and run this command:

ipconfig /flushdns

Linux

Open a Terminal and run with root privileges:

  • Debian/Ubuntu: sudo service network-manager restart

  • Linux Mint: sudo /etc/init.d/dns-clean start

  • Linux with systemd: sudo systemctl restart network.service

  • Fedora Linux: sudo systemctl restart NetworkManager.service

  • Arch Linux/Manjaro with Network Manager: sudo systemctl restart NetworkManager.service

  • Arch Linux/Manjaro with Wicd: sudo systemctl restart wicd.service

  • RHEL/Centos: sudo /etc/init.d/network restart

  • FreeBSD: sudo service nscd restart

    To enable the nscd daemon initially, it is recommended that you run the following commands:

    sudo sysrc nscd_enable="YES"
    sudo service nscd start

    Then modify the hosts line in your /etc/nsswitch.conf file to the following:

    hosts: cache files dns
    
  • NixOS: The nscd.service is automatically restarted when the option networking.extraHosts is changed.

  • Others: Consult this Wikipedia article.

macOS

As described in this article, open a Terminal and run:

sudo dscacheutil -flushcache;sudo killall -HUP mDNSResponder
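
If you automate updates, the flush step can be selected by platform. The Python sketch below only maps the Windows and macOS commands quoted above and returns an empty list elsewhere, because the right Linux command depends on the distribution. `FLUSH_COMMANDS` and `flush_commands` are hypothetical names.

```python
import platform

# Commands copied from the Windows and macOS sections above.
FLUSH_COMMANDS = {
    "Windows": [["ipconfig", "/flushdns"]],
    "Darwin": [
        ["sudo", "dscacheutil", "-flushcache"],
        ["sudo", "killall", "-HUP", "mDNSResponder"],
    ],
}


def flush_commands(system=None):
    """Return the DNS flush commands for a platform, or [] if none is mapped."""
    return FLUSH_COMMANDS.get(system or platform.system(), [])


print(flush_commands("Darwin"))
```

Each returned command is an argument list that could be passed to `subprocess.run(command, check=True)`; the sketch itself does not execute anything.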

Release management

This repository uses release-it, an excellent CLI release tool for GitHub repos and npm packages, to automate creating releases. This is why the package.json and .release-it.json files are bundled.

Goals of this unified hosts file

The goals of this repo are to:

  1. automatically combine high-quality lists of hosts,
  2. provide situation-appropriate extensions,
  3. de-dupe the resultant combined list,
  4. and keep the resultant file reasonably sized.

A high-quality source is defined here as one that is actively curated. A hosts source should be frequently updated by its maintainers with both additions and removals. The larger the hosts file, the higher the level of curation is expected.

It is expected that this unified hosts file will serve both desktop and mobile devices under a variety of operating systems.

Third-Party Hosts Managers

  • Unified Hosts AutoUpdate (for Windows): The Unified Hosts AutoUpdate package is purpose-built for this unified hosts project and is in active development by community members. You can install and uninstall any blacklist and keep it automatically up to date, and the package can be placed in a shared network location and deployed across an organization via group policies. Your bug reports, feature requests, and other feedback are most welcome.
  • ViHoMa is a Visual Hosts file Manager, written in Java, by Christian Martínez. Check it out!

Interesting Applications

  • Hosts-BL is a simple tool to handle hosts file black lists. It can remove comments, remove duplicates, compress to 9 domains per line, add IPv6 entries. In addition, it can also convert black lists to multiple other black list formats compatible with other software, such as dnsmasq, DualServer, RPZ, Privoxy, and Unbound, to name a few.
  • Host Minder is a simple GUI that allows you to easily update your /etc/hosts file to one of four consolidated hosts files from StevenBlack/hosts. It is provided as a deb package and comes pre-installed on UbuntuCE.
  • Maza ad blocking is a bash script that automatically updates your hosts file. You can also update a fresh copy. Each time, it generates a dnsmasq-compatible configuration file. Fast installation, compatible with macOS, Linux and BSD.
  • Hostile is a nifty command line utility to easily add or remove domains from your hosts file. If our hosts files are too aggressive for you, you can use hostile to remove domains, or you can use hostile in a bash script to automate a post process each time you download fresh versions of hosts.
  • macOS Scripting for Configuration, Backup and Restore helps customizing, re-installing and using macOS. It also provides a script to install and update the hosts file using this project on macOS. In combination with a launchd it updates the hosts file every x days (default is 4). To install both, download the GitHub repo and run the install script from the directory one level up.
  • Pi-hole is a network-wide DHCP server and ad blocker that runs on Raspberry Pi. Pi-hole uses this repository as one of its sources.
  • Block ads and malware via local BIND9 DNS server (for Debian, Raspbian & Ubuntu): Set up a local DNS server with a /etc/bind/named.conf.blocked file, sourced from here.
  • Block ads, malware, and deploy parental controls via local DualServer DNS/DHCP server (for BSD, Windows & Linux): Set up a blacklist for everyone on your network using the power of the unified hosts reformatted for DualServer. And if you're on Windows, this project also maintains an update script to make updating DualServer's blacklist even easier.
  • Blocking ads and malwares with unbound: Unbound is a validating, recursive, and caching DNS resolver.
  • dnsmasq conversion script: This GitHub gist has a short shell script (bash, will work on any 'nix) and uses wget & awk present in most distros, to fetch a specified hosts file and convert it to the format required by dnsmasq. Supports IPv4 and IPv6. Designed to be used as either a shell script, or can be dropped into /etc/cron.weekly (or wherever suits). The script is short and easily edited, also has a short document attached with notes on dnsmasq setup.
  • BlackHosts - Command Line Installer/Updater: This is a cross-platform command line utility to help install/update hosts files found at this repository.
  • Hosts Compression Scripts: These are various scripts to help compress hosts files (by the author of BlackHosts).
  • Hosts Compress - Windows: This is a C++ Windows command line tool to help compress hosts files (by the author of BlackHosts and Hosts Compression Scripts). This is highly recommended over the scripts as it is exponentially faster.
  • dnscrypt-proxy provides a tool to build block lists from local and remote lists in common formats.
  • Control D offers a public anycast network hosted mirror of the Unified (Adware + Malware) blocklist:
    • Legacy DNS: 76.76.2.35, 76.76.10.35, 2606:1a40::35, 2606:1a40:1::35
    • DNS-over-HTTPS/TLS/DOQ: https://freedns.controld.com/x-stevenblack, x-stevenblack.freedns.controld.com

Contribute

Please read our Contributing Guide. Among other things, this explains how we organize files and folders in this repository.

We are always interested in discovering well-curated sources of hosts. If you find one, please open an issue to draw our attention.

Before you create or respond to any issue, please read our code of conduct.

Logo by @Tobaloidee. Thank you!

hosts's People

Contributors

alexandercecile, ankitpati, anudeepnd, bigdargon, blimmer, brarcher, dennisvandehoef, dependabot[bot], djnym, dnmtx, fademind, francogag, funilrys, gfyoung, indrajitr, l1m5, lateralus138, lightswitch05, matkoniecz, paxperscientiam, perfectslayer, rhtenhove, scafroglia93, scripttiger, stevenblack, tanrax, tobaloidee, tyzbit, xhmikosr, zhong-z

hosts's Issues

Can't update sources

While updating sources I get:

Updating source /Users/ForgottenPlayer/Downloads/hosts-master 2/data/adaway.org from http://adaway.org/hosts.txt
Problem getting file: http://adaway.org/hosts.txt
Traceback (most recent call last):
  File "updateHostsFile.py", line 384, in <module>
    main()
  File "updateHostsFile.py", line 93, in main
    promptForUpdate()
  File "updateHostsFile.py", line 107, in promptForUpdate
    updateAllSources()
  File "updateHostsFile.py", line 176, in updateAllSources
    updatedFile = updatedFile.replace('\r', '') #get rid of carriage-return symbols
AttributeError: 'NoneType' object has no attribute 'replace'

Any ideas?

Duplicate entries found.

There are duplicate entries in current hosts file: lines 200458 and 200459.
May seem not so critical, but may break particular systems.

Also line 200459 violates file format consistency.

Can I use this in AdAway?

So not really an issue, more of a question. Can I add this hosts file to my AdAway sources? If yes, what URL should I use?
Thank you for this in any case!

Questions and request

I have a few questions and one request.

1st question
As far as I understand, in order to add a new hosts file in Windows, the file has to be downloaded and one can simply replace the original hosts file in the \etc folder.
I have read in my hosts file (in Windows) that "each entry should be kept on an individual line."
I opened one hosts file from your list and the domains are added without spaces between domain names and IP addresses. They are also not kept on individual lines. I suppose this doesn't matter - or am I missing something?

2nd question & request
I read on the webpage that your amalgamated hosts file can be updated automatically. But I do not know the step-by-step procedure to set this up.
Could you explain it? This is also a request: you could write a brief "quick guide" on the GitHub Wiki page, in case anyone else will have the same question in the future.

3rd question
I saw that you are combining a few sources in your lists.
I have also found a few more lists to block malware sites. These are:
a) Malware domains: www.malwaredomains.com (in addition to already implemented "Malware domain list")
direct link (file): http://mirror1.malwaredomains.com/files/justdomains
b) Spam404: http://www.spam404.com
direct link (list): http://spam404bl.com/spam404scamlist.txt
c) DisableWinTracking (tool): https://github.com/10se1ucgo/DisableWinTracking
direct link (list used by the tool): https://gist.githubusercontent.com/10se1ucgo/fcb774d781a66ea9d31f/raw/b99089721fcd9d8c5718224e23f275d3b99c06e2/ips+domains.txt

Could you check them out?
I think, there are around 15.000 more websites blocked within these two lists in addition to the current ones. They could be implemented beside current sources - if they fit into your amalgamated hosts file.

Unreachable data sources should not crash updateHostsFile.py

Please add exception handling for offline pages during the update process.

When I ran python2 updateHostsFile.py, the script got stuck trying to download the update file from http://someonewhocares.org/hosts/zero/hosts, and I got:

[tomasz@arch hosts]$ python2 updateHostsFile.py 
Do you want to update all data sources? [Y/n] Y
Updating source yoyo.org from http://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&mimetype=plaintext&useip=0.0.0.0
Updating source adaway.org from http://adaway.org/hosts.txt
Updating source mvps.org from http://winhelp2002.mvps.org/hosts.txt
Updating source StevenBlack from https://raw.github.com/StevenBlack/hosts/master/data/StevenBlack/hosts
Updating source someonewhocares.org from http://someonewhocares.org/hosts/zero/hosts
Traceback (most recent call last):
  File "updateHostsFile.py", line 312, in <module>
    main()
  File "updateHostsFile.py", line 41, in main
    promptForUpdate()
  File "updateHostsFile.py", line 54, in promptForUpdate
    updateAllSources()
  File "updateHostsFile.py", line 120, in updateAllSources
    updatedFile = urllib2.urlopen(updateURL)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 431, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 449, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1227, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1197, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error [Errno 111] Connection refused>
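
One possible way to keep an unreachable source from aborting the whole run is to catch the error per source and skip it. The sketch below is written for Python 3 (urllib.request rather than the urllib2 shown in the traceback); the function name fetch_source and its opener parameter are hypothetical and not part of updateHostsFile.py.

```python
import urllib.error
import urllib.request


def fetch_source(url, opener=urllib.request.urlopen, timeout=30):
    """Return the body of url as bytes, or None if the source is unreachable.

    `opener` is injectable so the function can be exercised without a network.
    """
    try:
        response = opener(url, timeout=timeout)
        try:
            return response.read()
        finally:
            response.close()
    except (urllib.error.URLError, OSError) as err:
        # Report the failing source and let the caller move on to the next one.
        print("Skipping {}: {}".format(url, err))
        return None
```

A caller could loop over all sources and keep the previously downloaded copy whenever fetch_source returns None.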

127.0.0.1 > 0.0.0.0 conversion

On Linux I'm running this.
python updateHosts.py

OK, it gathers stuff. Removes the domains from the lists.
That's OK.

But when it comes to replacing 127.0.0.1 with 0.0.0.0, it fails on some of the entries.
For example, the part below is from the Yoyo list:

  • 0.0.0.0 24log.com
  • 0.0.0.0 24log.de
  • 0.0.0.0 24pm-affiliation.com
  • 0.0.0.0 2mdn.net
  • 127.0.0.1 2o7.net
  • 0.0.0.0 360yield.com
  • 127.0.0.1 4affiliate.net
  • 0.0.0.0 4d5.net
  • 0.0.0.0 50websads.com
    ...

...

...

  • 127.0.0.1 7adpower.com
  • 127.0.0.1 7bpeople.com
  • 127.0.0.1 7search.com

And that's not all.
The duplicate detection doesn't work as intended either.
Right now I'm looking at two 0.0.0.0 entries and one 127.0.0.1 entry for ads.blog.com.

I've tried running your original repository on a different machine, with the same result...
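
To illustrate the behaviour being asked for, here is a minimal sketch that rewrites every rule to one target address and drops duplicates by hostname. It is not the script's actual code; normalize_rules and the RULE pattern are hypothetical names, and the sketch deliberately ignores special cases such as localhost entries.

```python
import re

TARGET_HOST = "0.0.0.0"
# An address followed by a hostname, separated by spaces or tabs.
RULE = re.compile(r"^\s*(\S+)\s+(\S+)")


def normalize_rules(lines, target=TARGET_HOST):
    """Rewrite every rule to `target` and keep only the first rule per hostname."""
    seen = set()
    result = []
    for line in lines:
        body = line.split("#", 1)[0]  # drop trailing comments
        match = RULE.match(body)
        if not match:
            continue  # blank line or pure comment
        hostname = match.group(2).lower()
        if hostname in seen:
            continue  # duplicate, regardless of which address it used
        seen.add(hostname)
        result.append("{} {}".format(target, hostname))
    return result
```

Because duplicates are keyed on the hostname alone, a 127.0.0.1 entry and a 0.0.0.0 entry for ads.blog.com collapse into a single line.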

Readme uses 127.0.0.1 as example but refers to 0.0.0.0 as the better method

The title speaks for itself.
I'm unable to create a pull request for this right now, which is why I'm opening a ticket for it.

Extract from README:


For example, to nullify requests to some doubleclick.net servers, adding these lines to your hosts
file will do it:

# block doubleClick's servers
127.0.0.1 ad.ae.doubleclick.net
127.0.0.1 ad.ar.doubleclick.net
127.0.0.1 ad.at.doubleclick.net
127.0.0.1 ad.au.doubleclick.net
127.0.0.1 ad.be.doubleclick.net
# etc...

Why use 0.0.0.0 instead of 127.0.0.1?

Using 0.0.0.0 is faster because you don't have to wait for a timeout. It also does not interfere
with a web server that may be running on the local PC.


EDIT: I just noticed there's a pull request for this: #66

ping connect to blocked site if 0 is used instead of 0.0.0.0

When TARGET_HOST is set to 0 instead of 0.0.0.0, I can still reach blocked domains via ping:

[tomasz@arch ~]$ ping -c 2 doubleclick.com
PING doubleclick.com (173.194.113.72) 56(84) bytes of data.
64 bytes from fra02s21-in-f8.1e100.net (173.194.113.72): icmp_seq=1 ttl=57 time=53.0 ms
64 bytes from fra02s21-in-f8.1e100.net (173.194.113.72): icmp_seq=2 ttl=57 time=52.3 ms

--- doubleclick.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 52.378/52.705/53.032/0.327 ms

[tomasz@arch ~]$ ping -c 4 creative.ak.fbcdn.net
PING a34.dsw4.akamai.net (2.18.213.218) 56(84) bytes of data.
64 bytes from 2.18.213.218 (2.18.213.218): icmp_seq=1 ttl=59 time=59.8 ms
64 bytes from 2.18.213.218 (2.18.213.218): icmp_seq=2 ttl=59 time=60.5 ms
64 bytes from 2.18.213.218 (2.18.213.218): icmp_seq=3 ttl=59 time=60.1 ms
64 bytes from 2.18.213.218 (2.18.213.218): icmp_seq=4 ttl=59 time=59.8 ms

--- a34.dsw4.akamai.net ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 7133ms
rtt min/avg/max/mdev = 59.828/60.080/60.527/0.333 ms

After reverting TARGET_HOST to 0.0.0.0 I get:

[tomasz@arch ~]$ ping -c 4 creative.ak.fbcdn.net
PING creative.ak.fbcdn.net (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.052 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=0.068 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=0.066 ms
64 bytes from localhost (127.0.0.1): icmp_seq=4 ttl=64 time=0.065 ms

--- creative.ak.fbcdn.net ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3001ms
rtt min/avg/max/mdev = 0.052/0.062/0.068/0.011 ms

and now all is FINE.

Please revert the commit that changed TARGET_HOST from 0.0.0.0 to 0. It just does not work properly.

localhost entries missing on OS X

On OS X, the localhost definition at the top of /etc/hosts is a bit more verbose:

127.0.0.1       localhost
255.255.255.255 broadcasthost
::1             localhost
fe80::1%lo0     localhost

In the generated hosts file, the broadcasthost entry (see this) and the second IPv6 entry are missing:

127.0.0.1 localhost
::1 localhost

0.0.0.0 != localhost == 127.0.0.1

Please replace the line "0.0.0.0 localhost" with "127.0.0.1 localhost".
"0.0.0.0 localhost" is incorrect.

(I don't know why, I just can't make a pull request.)

Trailing spaces/tabs break hosts file reading

On my platform, trailing spaces/tabs cause error messages about missing server information while the file is being read in.

The lines in question are purely comment lines, so I'd like to ask whether those trailing spaces/tabs could be removed.
I could do it with a REXX parser myself, but I would need to do this on every update.

The lines in question are:
17662, 17663, 17666, 17667, 17670, 17672, 17679, 17681, 17705, 17706

Thanks a lot for this great hosts file
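
As a stopgap on the consumer side, trailing spaces and tabs can be removed in one pass after each update. This is only a sketch; strip_trailing_whitespace is a hypothetical helper and not part of the project.

```python
def strip_trailing_whitespace(lines):
    """Return the lines with trailing spaces and tabs removed, newline restored."""
    return [line.rstrip() + "\n" for line in lines]
```

Applied to the lines read from the generated hosts file, this leaves rules untouched and only trims the ends of lines, including comment lines.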

Allow user configuration of target ip?

This is a small thing, but it seems to me an easy feature to implement: why not provide a configuration file for things such as the target IP address?
I use your script together with a dnsmasq server that acts as the LAN DNS server, and an nginx HTTP server that serves blank pages (I had to change the code: TARGET_HOST = '_________').
It works as expected. However, if I set up, say, a cron job to update the script from git every month, I would have to go in and change the code after every update.
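
A minimal sketch of what such a configuration lookup could look like, using Python 3's configparser. The [hosts] section, the target_host option and the load_target_host function are all assumptions made for illustration; the project does not define them.

```python
import configparser

DEFAULT_TARGET_HOST = "0.0.0.0"


def load_target_host(config_text):
    """Read target_host from an INI-style string, falling back to the default."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # `fallback` covers both a missing section and a missing option.
    return parser.get("hosts", "target_host", fallback=DEFAULT_TARGET_HOST)
```

With the setting kept in a separate file that git does not track, a monthly update would no longer overwrite the user's chosen address.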

Inconsistent Merging Process

Hello,
First I have to thank you and all contributors for your great script. So, thank you very much!


I found out that merging the remote lists (adaway, mvps, etc.) often [always?] results in a different new hosts file. I stick to your given lists: no exclusions, no personal list added.

Behavior is the following:

  1. Run the script. Update, no exclusion, don't copy
  2. wc -l hosts 30385 (Lines in the new hosts file)
  3. cat hosts | grep "^0" | wc -l 28090 (That are the "pure" filter rules)

then

  1. Run the script again. NO update, no exclusion, don't copy
  2. wc -l hosts 30417
  3. cat hosts | grep "^0" | wc -l 28120

After each test I checked the local copies of the remote hosts files with wc -l data/*/hosts. Both times the results were the same, which means that each merge was based on the same base files.

I did that a couple of times, with and without update, and even when the local copies of the remote hosts files are the same (same line count) I get a different output (new hosts file).

The output "new shiny host file with X unique entries" IS always the same.
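
To narrow down where two runs diverge, the generated files can be compared by hostname instead of by line count. The sketch below is a diagnostic aid only, with hypothetical function names; it assumes rules start with 0, as in the grep "^0" command above.

```python
from collections import Counter


def rule_hostnames(lines):
    """Collect the hostname of every rule line whose address starts with '0'."""
    hostnames = []
    for line in lines:
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 2 and parts[0].startswith("0"):
            hostnames.append(parts[1])
    return hostnames


def compare_runs(first_lines, second_lines):
    """Report hostnames unique to each run and hostnames repeated within a run."""
    first = Counter(rule_hostnames(first_lines))
    second = Counter(rule_hostnames(second_lines))
    return {
        "only_in_first": sorted(set(first) - set(second)),
        "only_in_second": sorted(set(second) - set(first)),
        "duplicated_in_first": sorted(h for h, n in first.items() if n > 1),
        "duplicated_in_second": sorted(h for h, n in second.items() if n > 1),
    }
```

If the reported unique count stays the same while the line count changes, the duplicated_in_* lists are the first place to look.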

Rules blocking certificate authorities

I noticed that while using this hosts file, some certificate authorities were being blocked, so the certificates for Twitter and certain other sites could not be verified. Would it be possible to locate and remove those specific rules? They pose a security concern to users.

updateHostsFile crashes when run under a path that contains CJK characters

Console output

~/工作空間/第三方專案/Amalgamated hosts file$ python updateHostsFile.py --help
Traceback (most recent call last):
  File "updateHostsFile.py", line 78, in <module>
    DATA_PATH           = os.path.join(BASEDIR_PATH, 'data')
  File "/usr/lib/python2.7/posixpath.py", line 73, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 14: ordinal not in range(128)
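
The traceback is consistent with a byte-string path being implicitly decoded as ASCII when it is joined with a text string. That is an interpretation, not something the report confirms. The sketch below shows the same failure with explicit bytes in Python 3 and one way around it: decoding the path with a named encoding first. The join_path helper and the example directory are hypothetical.

```python
import posixpath


def join_path(base, name, encoding="utf-8"):
    """Join a base directory with a text component, decoding bytes explicitly."""
    if isinstance(base, bytes):
        # Decode with a real encoding instead of relying on an ASCII default.
        base = base.decode(encoding)
    return posixpath.join(base, name)


raw_base = "/home/user/工作空間".encode("utf-8")

try:
    raw_base.decode("ascii")
    ascii_decode_failed = False
except UnicodeDecodeError:
    # The first byte of the first CJK character is 0xe5, which is outside ASCII.
    ascii_decode_failed = True
```

In practice the encoding would come from the platform, for example sys.getfilesystemencoding(), rather than being hard-coded.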

Reporter's Environment

Operating System

Ubuntu 15.10 AMD64

Python

2.7.10

Amalgamated hosts file

commit 231dc43

Locale

LANG=zh_TW.UTF-8
LANGUAGE=zh_TW:zh_HK:zh_CN:en_US:en
LC_CTYPE="zh_TW.UTF-8"
LC_NUMERIC=zh_TW.UTF-8
LC_TIME=zh_TW.UTF-8
LC_COLLATE="zh_TW.UTF-8"
LC_MONETARY=zh_TW.UTF-8
LC_MESSAGES="zh_TW.UTF-8"
LC_PAPER=zh_TW.UTF-8
LC_NAME=zh_TW.UTF-8
LC_ADDRESS=zh_TW.UTF-8
LC_TELEPHONE=zh_TW.UTF-8
LC_MEASUREMENT=zh_TW.UTF-8
LC_IDENTIFICATION=zh_TW.UTF-8
LC_ALL=

DNS Client Must Be Disabled

The size of the file requires the DNS Client to be disabled, otherwise the browser and ipconfig commands will appear to hang. This should be added to the readme.

(summary)
services.msc --> DNS Client (right click) --> properties --> startup "automatic" should be changed to "disabled" --> reboot

EDIT: I've noticed significant delays in webpage loading and some YouTube ads began appearing that I did not experience before. As a result, I've re-enabled the DNS Client and removed hosts-file.net from the folder for my own personal use. If there's an alternate solution to the problem, I'd be glad to hear it.

vimeo

Don't know why it's included, vimeo.com is great!
BTW, did you know it uses html5 player instead of flash?

Blocking AAAA (IPv6) Requests

I'm running OpenWRT with dnsmasq on my personal router where I use the generated hostfile.

The Problem
Host names that are blocked by the hostsfile (0.0.0.0 redirect) can be bypassed by AAAA requests.


Background:
Running nslookup google-analytics.com (which is blocked by the hosts file) on a local machine in my network returns "0.0.0.0" and a remote IPv6 address.

Looking into the log files of the router one could see that the A-Request is blocked by the hostsfile, but the AAAA request is redirected to the remote DNS-Server.


Solution
Shouldn't we start blocking AAAA (IPv6) requests as well, by generating ::1 entries in addition to the existing entries? That would double the size of the hosts file.

0.0.0.0 www.blocked-host-nr-1.com
0.0.0.0 www.blocked-host-nr-2.com

would become

0.0.0.0 www.blocked-host-nr-1.com
0.0.0.0 www.blocked-host-nr-2.com
::1 www.blocked-host-nr-1.com
::1 www.blocked-host-nr-2.com
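
The transformation shown above can be sketched in a few lines. This is not the reporter's script, which is not included in the issue; add_ipv6_entries is a hypothetical name, and the sketch assumes one hostname per rule.

```python
def add_ipv6_entries(lines, ipv4_target="0.0.0.0", ipv6_target="::1"):
    """Return the IPv4 rules followed by a matching IPv6 rule for each hostname."""
    ipv4_rules = []
    hostnames = []
    for line in lines:
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 2 and parts[0] == ipv4_target:
            ipv4_rules.append("{} {}".format(ipv4_target, parts[1]))
            hostnames.append(parts[1])
    ipv6_rules = ["{} {}".format(ipv6_target, name) for name in hostnames]
    return ipv4_rules + ipv6_rules
```

Running it on the two example rules yields the four-line result listed above.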


Testing
As a "real life test" I applied the mentioned change (adding ::1 entry for every 0.0.0.0 entry).
The log files show that within 20 hours, 49 requests matched the ::1 entries and were blocked (which normally they wouldn't have been).

So this really seems to be a thing to think about.


The point is that I'm not 100% sure if that applies to hosts files on local machines too or if that's just a dnsmasq thing.

I myself find it necessary to implement the AAAA Blocking as well. For personal use I wrote a script for that. The question is if you want/need to integrate this in your project.

Any ideas or opinions anyone?

SyFy Video Streaming Blocked

One of the hosts in the host file blocks streaming via SyFy. (i.e. here).

I reset the file to my OS default, refreshed the page and found the video loaded perfectly.

Problem running the script

Hello! I want to install the hosts file on my OS X machine. The problem is that whenever I try to run the script, I get:
"A line in the hostfile is going to cause problems because it is nonstandard

The line reads 0.0.0.0
please check your data files. Maybe you have a comment without a #?"

My actual hosts file has no line with 0.0.0.0, and the provided hosts file uses 0.0.0.0 everywhere, since that is what does the blocking...
Any ideas?
I'm using OSX 10.11.11 (latest version), and the sources have been updated.

Add HOSTS file for AcrylicDNS

AcrylicDNS can use * and regexp.

127.0.0.1 ad.ae.doubleclick.net
127.0.0.1 ad.ar.doubleclick.net
127.0.0.1 ad.at.doubleclick.net
127.0.0.1 ad.au.doubleclick.net
127.0.0.1 ad.be.doubleclick.net

will become

127.0.0.1 ad.*.doubleclick.net

OR

127.0.0.1 ad.ae.doubleclick.net ad.ar.doubleclick.net ad.at.doubleclick.net ad.au.doubleclick.net ad.be.doubleclick.net

Simple, eh?

Website inc.com is blocked

The content on inc.com doesn't load. I suspect the hosts file, because disabling my ad blocker and Firefox Tracking Protection doesn't help.

The problem is that I manually removed the entry inc.com from my hosts file and website still doesn't load.

"/" intended?

Saw those from your personal list, I tried opening those on my browser (Chrome) using your latest hosts file and it loaded.
Is it because of the "/"?

Use nine hostnames per line instead of one

If there are multiple hostnames on a line, the names after the first are treated as aliases for the first, which means that it takes less time to load in the file; also this trims file size by minimizing the number of occurrences of the redirect IP address in the file.

Although even 24 hostnames per line works in Unix-like systems (although too many names per line itself has its problems), Windows ignores any hostnames on a line after the first nine, so nine per line is ideal: http://forum.hosts-file.net/viewtopic.php?p=16438&sid=3e0ec8605c66da5a6a4bdd1bb49b5fbb#p16438
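
A minimal sketch of the proposed packing, assuming a flat list of already deduplicated hostnames. The function name pack_hostnames and its parameters are hypothetical.

```python
def pack_hostnames(hostnames, target="0.0.0.0", per_line=9):
    """Group hostnames into hosts-file lines with at most `per_line` names each."""
    lines = []
    for start in range(0, len(hostnames), per_line):
        chunk = hostnames[start:start + per_line]
        lines.append("{} {}".format(target, " ".join(chunk)))
    return lines
```

With 20 hostnames this produces three lines: two with nine names each and one with the remaining two.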

Should 0.0.0.0 be replaced with an explicitly invalid IP address?

To my knowledge, 0.0.0.0 is not intended to be used as an invalid address.

RFC 3330 from 2002 specifies explicitly invalid IP addresses, which could be used instead:

192.0.2.0/24 Test-Net

This is confirmed in RFC 5737 from 2010:

The blocks 192.0.2.0/24 (TEST-NET-1), 198.51.100.0/24 (TEST-NET-2),
and 203.0.113.0/24 (TEST-NET-3) are provided for use in
documentation.

I don't know if 3330 is the first RFC to mention Test-Net-1, or if it was known/implemented earlier. Then again, you probably would not surf the web with a pre-2002 machine.

localhost entries incorrectly rewritten to 0.0.0.0

It looks like the filter that converts all IPs to the chosen target (e.g. the default 0.0.0.0) is a bit too greedy. The valid localhost entries, along with the broadcasthost entry from the someonewhocares.org hosts file, are incorrectly converted to 0.0.0.0.

Original file snippet:

#<localhost>
127.0.0.1       localhost
127.0.0.1       localhost.localdomain
255.255.255.255 broadcasthost
::1             localhost
127.0.0.1       local
#fe80::1%lo0    localhost
#</localhost>

Resulting hosts file:

#<localhost>
0.0.0.0 localhost.localdomain
0.0.0.0 broadcasthost
0.0.0.0 local
#fe80::1%lo0    localhost
#</localhost>
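
One way to make the conversion less greedy is to leave a small set of well-known local names untouched. The sketch below is an assumption about how that could look, not the project's code; PRESERVED_HOSTNAMES and rewrite_rule are hypothetical names, and the set is taken from the snippet above.

```python
PRESERVED_HOSTNAMES = {"localhost", "localhost.localdomain", "local", "broadcasthost"}


def rewrite_rule(line, target="0.0.0.0"):
    """Point a rule at `target`, leaving comments and local entries unchanged."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line  # blank line or comment
    parts = stripped.split("#", 1)[0].split()
    if len(parts) < 2:
        return line  # not a rule; leave it for the caller to report
    if parts[1].lower() in PRESERVED_HOSTNAMES:
        return line  # keep the original address for local names
    return "{} {}".format(target, parts[1])
```

An alternative design is to drop these entries from the sources entirely and write a fixed localhost header at the top of the generated file.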

Disable DnsCache or not ?

Hi,

I've seen dozens of threads online about disabling the DnsCache service on Windows when using a large hosts file. What do you think?

Some quotes :

I'm a bit lost :)

Request: Backups

updateHostsFile.py should create a backup of the user's original hosts file. This is just good practice, and could prevent the loss of a custom hosts file built up over time.
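
A minimal sketch of such a backup step, assuming it runs just before the existing hosts file is replaced. The function name backup_hosts_file and the hosts-backup-<timestamp> naming scheme are hypothetical.

```python
import os
import shutil
import time


def backup_hosts_file(hosts_path, backup_dir=None):
    """Copy the current hosts file to a timestamped backup and return its path."""
    if not os.path.isfile(hosts_path):
        return None  # nothing to back up
    backup_dir = backup_dir or os.path.dirname(os.path.abspath(hosts_path))
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup_path = os.path.join(backup_dir, "hosts-backup-{}".format(stamp))
    shutil.copy2(hosts_path, backup_path)  # copy2 also preserves timestamps
    return backup_path
```

Writing into a system directory such as /etc usually needs elevated privileges, so a real implementation might default to a directory inside the repository instead.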

.DS_Store files cause crash / request

Hi there,

I have to say right away that I'm no programmer; I've just started learning my first language (Python). These are only my observations, so please do not laugh at my terminology ;).

First off, I'm really happy about how your script works, this is something that I've been looking for, fixes my problem without an app.
I've been using Gas Mask, it's an OS X app which allows you to use different hosts files and combine them into one from different remote sources with auto-updating.
It couldn't handle so many hosts source files combined into one: it crashed when I added one too many, and sometimes died on its own (so I wrote a launchdaemon which kept it alive, but then at some point I would get stuck with two instances running at the same time...)

[Python 2.7.10 / OS X 10.10.4]
Everything worked fine when I ran the script for the first time.
However, when I decided that I was feeling adventurous and added my own hosts to /data/, the script showed an error right from the start: .DS_Store does not have a hosts file or an update file (I can't give you the exact message because I've since fixed the issue).
Then I realized that I had first run the script before Finder added .DS_Store files to the folder.
That means your script reads the hidden system files and treats them as if they were another hosts source.

I don't know what system you are using but (wiki) .DS_Store is the name of a file in the Apple OS X operating system for storing custom attributes of a folder such as the position of icons or the choice of a background image.
Maybe you could put something in your script that ignores those files?

I've installed an app called Asepsis, which keeps all .DS_Store files in one folder and links them to corresponding folders. Then I deleted those files from your script's folder via terminal and the script started working again.
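
Ignoring such files could be as simple as skipping hidden entries and anything that is not a directory when listing the data folder. The sketch below assumes each source lives in its own subdirectory; list_source_dirs is a hypothetical name.

```python
import os


def list_source_dirs(data_path):
    """Return the names of source directories, skipping hidden files like .DS_Store."""
    sources = []
    for name in sorted(os.listdir(data_path)):
        full_path = os.path.join(data_path, name)
        if name.startswith(".") or not os.path.isdir(full_path):
            continue  # hidden entry or stray file, not a hosts source
        sources.append(name)
    return sources
```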

Also, I managed to create a hosts file that your script can read from the malwaredomains.com list, by running a function in the terminal that adds a specified text string to every line in a text file.
Maybe you could do some magic voodoo that would let your script reorganize the contents of a file into a 127.0.0.1 domain.com format?
There are many lists used by different blockers, but they are in such odd line formats that I (currently) have no clue how to reorganize them in bulk by some set rules and make them usable for a hosts file.
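
For simple list formats, a conversion along these lines may be enough. This sketch assumes each useful line contains one domain, possibly preceded by an IP address or followed by extra columns. It will not handle every blocker format, and to_hosts_lines is a hypothetical name.

```python
import ipaddress


def looks_like_ip(token):
    """Return True if the token parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(token)
        return True
    except ValueError:
        return False


def to_hosts_lines(raw_lines, target="127.0.0.1"):
    """Turn lines that contain a domain into '<target> <domain>' hosts rules."""
    result = []
    for line in raw_lines:
        tokens = line.split("#", 1)[0].split()
        # Treat the first dotted token that is not an IP address as the domain.
        domains = [t for t in tokens if "." in t and not looks_like_ip(t)]
        if domains:
            result.append("{} {}".format(target, domains[0].lower()))
    return result
```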

Now, after adding malwaredomains.com list and my private one I'm at 36361 unique entries.

Python 3 Compatibility

I noticed that updateHostsFile.py isn't compatible with Python 3. I think you should either mention this in the README or, preferably, update updateHostsFile.py so that it is compatible with Python 3.
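
One common pattern for supporting both major versions is a guarded import. The sketch below covers only urlopen, which the script is known to use via urllib2; other incompatibilities, such as print statements or raw_input, would need their own handling.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2 fallback
```

After this, the rest of the code can call urlopen(url) without caring which interpreter is running.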

Spotify ads

[Spotify]

127.0.0.1 media-match.com
127.0.0.1 adclick.g.doubleclick.net
127.0.0.1 www.googleadservices.com
127.0.0.1 open.spotify.com
127.0.0.1 pagead2.googlesyndication.com
#127.0.0.1 desktop.spotify.com

127.0.0.1 googleads.g.doubleclick.net
127.0.0.1 pubads.g.doubleclick.net
127.0.0.1 securepubads.g.doubleclick.net
#127.0.0.1 audio2.spotify.com
#127.0.0.1 www.omaze.com
#127.0.0.1 omaze.com
#127.0.0.1 bounceexchange.com

127.0.0.1 core.insightexpressai.com
127.0.0.1 content.bitsontherun.com
127.0.0.1 s0.2mdn.net
127.0.0.1 v.jwpcdn.com
127.0.0.1 d2gi7ultltnc2u.cloudfront.net
127.0.0.1 crashdump.spotify.com
127.0.0.1 adeventtracker.spotify.com
127.0.0.1 log.spotify.com
127.0.0.1 analytics.spotify.com
127.0.0.1 ads-fa.spotify.com

Aflac Website

Something in the Host file is preventing me from logging into Aflac's website. I can visit Aflac's website without any problem but if I use the Host file then I am prevented from logging in. I've captured the traffic and then scanned the file and cannot find a match. This is the captured traffic:
Category Website being accessed.
Business and Economy: Financial Data and Services www.aflac.com
Information Technology: Search Engines and Portals www.google.com
Information Technology vassg142.ocsp.omniroot.com
Information Technology vassg142.ocsp.omniroot.com
Information Technology ocsp.entrust.net
Information Technology ocsp.entrust.net
Information Technology ocsp.entrust.net
Information Technology ocsp.comodoca.com
Information Technology ocsp.entrust.net
Information Technology ocsp.geotrust.com
Information Technology gn.symcd.com
Information Technology gb.symcd.com
Information Technology gn.symcd.com
Information Technology g.symcd.com
Information Technology gn.symcd.com
Information Technology ocsp.geotrust.com
Information Technology gn.symcd.com
Information Technology ocsp.entrust.net
Information Technology ocsp.entrust.net
Information Technology gz.symcd.com
Information Technology gz.symcd.com
Information Technology ocsp.trustwave.com
Information Technology ocsp.trustwave.com
Information Technology ocsp.trustwave.com
Information Technology g.symcd.com
Information Technology ocsp.trustwave.com
Information Technology ocsp.comodoca.com
Information Technology gb.symcd.com
Miscellaneous: Web Infrastructure ocsp.digicert.com
Miscellaneous: Web Infrastructure ocsp.digicert.com
Miscellaneous: Web Infrastructure ocsp.godaddy.com
Miscellaneous: Web Infrastructure ocsp.digicert.com
Miscellaneous: Web Infrastructure ocsp.digicert.com
Miscellaneous: Web Infrastructure ocsp.godaddy.com
Miscellaneous: Web Infrastructure ocsp.godaddy.com
Miscellaneous: Web Infrastructure ocsp.godaddy.com
Miscellaneous: Web Infrastructure ocsp.godaddy.com
Miscellaneous: Web Infrastructure ocsp.godaddy.com
Miscellaneous: Web Infrastructure ocsp.godaddy.com
Miscellaneous: Web Infrastructure ocsp.godaddy.com
Miscellaneous: Web Infrastructure ocsp.digicert.com
Miscellaneous: Web Infrastructure ocsp.digicert.com
Information Technology ss.symcd.com
Information Technology gn.symcd.com
Information Technology gn.symcd.com
Information Technology ss.symcd.com
Information Technology ocsp.entrust.net
Information Technology ocsp.entrust.net
Information Technology api.wd.lenovo.com
Business and Economy: Financial Data and Services www.aflac.com
Miscellaneous: Web Infrastructure ocsp.digicert.com
Miscellaneous: Web Infrastructure ocsp.digicert.com
Business and Economy: Financial Data and Services www.aflac.com
Information Technology dl.javafx.com
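
Scanning by eye is easy to get wrong with a file this size, so a small script can check the captured hostnames against the hosts file. The sketch below looks for exact matches only, because a hosts file does not match subdomains implicitly. The function names are hypothetical, and the sample rules in the usage note are made up for illustration.

```python
def blocked_hostnames(hosts_lines):
    """Collect every hostname that appears in a rule of the hosts file."""
    blocked = set()
    for line in hosts_lines:
        parts = line.split("#", 1)[0].split()
        if len(parts) >= 2:
            # A rule may list several names after the address.
            blocked.update(name.lower() for name in parts[1:])
    return blocked


def find_blocked(captured, hosts_lines):
    """Return the captured hostnames that the hosts file blocks, sorted."""
    blocked = blocked_hostnames(hosts_lines)
    return sorted({name.lower() for name in captured} & blocked)
```

For example, find_blocked(["www.aflac.com", "ocsp.entrust.net"], ["0.0.0.0 ocsp.entrust.net"]) returns ["ocsp.entrust.net"].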

whitespace hosts lines halting script

Hello!

The current version of the hosts files for someonewhocares.org contains a line with a single space, which is breaking the file parser:

$ git clone https://github.com/StevenBlack/hosts.git
Cloning into 'hosts'...
remote: Counting objects: 513, done.
remote: Total 513 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (513/513), 21.22 MiB | 1.46 MiB/s, done.
Resolving deltas: 100% (209/209), done.
Checking connectivity... done.
$ cd hosts/
$ egrep -n '^ $' data/*/hosts
data/someonewhocares.org/hosts:323:
$ python ./updateHostsFile.py
Do you want to update all data sources? [Y/n] y
Updating source mvps.org from http://winhelp2002.mvps.org/hosts.txt
Updating source yoyo.org from http://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&mimetype=plaintext
Updating source StevenBlack from https://raw.github.com/StevenBlack/hosts/master/data/StevenBlack/hosts
Updating source someonewhocares.org from http://someonewhocares.org/hosts/hosts
Updating source malwaredomainlist.com from http://www.malwaredomainlist.com/hostslist/hosts.txt
Do you want to exclude any domains?
For example, hulu.com video streaming must be able to access its tracking and ad servers in order to play video. [Y/n] y
Do you want to exclude the domain hulu.com ? [Y/n] y
Do you want to exclude any other domains? [Y/n] n
==>::1 localhost<==
==>::1 localhost<==
==># mistyped<==
==># log<==
==># URLs<==
==># May<==
==># video<==
==># and<==
==># up<==
==># problems<==
A line in the hostfile is going to cause problems because it is nonstandard
The line reads
 please check your data files. Maybe you have a comment without a #?

The removeDups() function only tests for lines that start with '#' or a newline. So a line consisting of spaces and a newline will make it into the stripRule() function and halt the script. The test can be updated to test for any line consisting of only whitespace:

$ git diff updateHostsFile.py
diff --git a/updateHostsFile.py b/updateHostsFile.py
index 709a525..14dac7a 100644
--- a/updateHostsFile.py
+++ b/updateHostsFile.py
@@ -165,7 +165,7 @@ def removeDups(mergeFile):

        hostnames = set()
        for line in mergeFile.readlines():
-               if line[0].startswith("#") or line[0] == '\n':
+               if line[0].startswith("#") or re.match(r'^\s*$', line[0]):
                        finalFile.write(line) #maintain the comments for readability
                        continue
                strippedRule = stripRule(line) #strip comments
$ python ./updateHostsFile.py
Do you want to update all data sources? [Y/n] y
Updating source mvps.org from http://winhelp2002.mvps.org/hosts.txt
Updating source yoyo.org from http://pgl.yoyo.org/adservers/serverlist.php?hostformat=hosts&mimetype=plaintext
Updating source StevenBlack from https://raw.github.com/StevenBlack/hosts/master/data/StevenBlack/hosts
Updating source someonewhocares.org from http://someonewhocares.org/hosts/hosts
Updating source malwaredomainlist.com from http://www.malwaredomainlist.com/hostslist/hosts.txt
Do you want to exclude any domains?
For example, hulu.com video streaming must be able to access its tracking and ad servers in order to play video. [Y/n] y
Do you want to exclude the domain hulu.com ? [Y/n] y
Do you want to exclude any other domains? [Y/n] n
==>::1 localhost<==
==>::1 localhost<==
Success! Your shiny new hosts file has been prepared.
It contains 25343 unique entries.
Do you want to replace your existing hosts file with the newly generated file? [Y/n] n

--Chris
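
The whitespace-only check described above can also be written against the whole line rather than its first character. This is a sketch in isolation, not a patch to removeDups(); is_comment_or_blank is a hypothetical helper, and unlike the original test it also treats indented comments as comments.

```python
def is_comment_or_blank(line):
    """Return True for comment lines and for lines that contain only whitespace."""
    stripped = line.strip()
    return not stripped or stripped.startswith("#")
```

Checking the stripped line means a rule that merely starts with a space is still passed on for parsing rather than being copied through as if it were a blank line.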

Minimal Python 2 version?

I'd like to contribute to this project, but I have a question: what is the minimum version of Python you are supporting?

Looking at the code, I assume it is 2.6 because of the use of "".format(...), and I think that with small changes it could be lowered to at least 2.4.

BTW, I also would like to thank you not only for this, but also for all your previous work on the VFP community.
