Scrape Dutch name statistics, enrich them with cohort tables, and serve them via a web app.
The goal of this project is to obtain the expected number of people alive with a given name. Birth rates per name are scraped from the Nederlandse Voornamenbank (set up by the Meertens Instituut). The expected number of people alive for a given name and gender is calculated using the yearly life expectancies given by the CBS. This project is inspired by/copied from FiveThirtyEight's excellent article How to Tell Someone’s Age When All You Know Is Her Name.
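
The calculation described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `expected_alive`, the input layout, and the example numbers are all assumptions, and it presumes the CBS figures have already been converted into a per-cohort probability of still being alive in the reference year.

```python
def expected_alive(births_by_year, survival_by_year):
    """Expected number of people with a given name who are still alive.

    births_by_year:   {birth_year: births with this name and gender}
    survival_by_year: {birth_year: probability that someone born in that
                       year is still alive in the reference year}
    Both layouts are assumptions made for this sketch.
    """
    total = 0.0
    for year, births in births_by_year.items():
        # Cohorts without a survival figure contribute nothing.
        total += births * survival_by_year.get(year, 0.0)
    return total


# Made-up numbers, for illustration only.
births = {1950: 100, 1980: 200, 2010: 50}
survival = {1950: 0.5, 1980: 0.75, 2010: 1.0}
print(expected_alive(births, survival))  # 250.0
```

Each birth cohort is weighted by its survival probability, so older cohorts count for less than their original size.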
Scraping is done with Scrapy and consists of two stages. First, all the names on the website are collected with their summary statistics. From those statistics, the subset of names with yearly rates is determined, and those names are then scraped.
The spiders are in [dutch-names/spiders](spiders). Run both stages from that directory:

    cd spiders
    scrapy crawl meertens_list -o list.json
    scrapy crawl meertens_details -o details.json
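
Both commands use Scrapy's `-o` option, which for a `.json` target writes the scraped items as a single JSON array. A minimal sketch of loading the two files and joining them per name might look like this. The field name `name`, the record layout, and the helper names `load_items` and `join_by_name` are assumptions; the actual item fields are defined by the spiders.

```python
import json


def load_items(path):
    """Load a Scrapy JSON feed (a single JSON array of items)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def join_by_name(summaries, details, key="name"):
    """Pair each summary record with the detail record sharing its key.

    The key "name" is a hypothetical field; use whatever the spiders emit.
    Names without a detail record get None.
    """
    details_by_key = {item[key]: item for item in details}
    return [
        {"summary": s, "details": details_by_key.get(s[key])}
        for s in summaries
    ]


# In-memory stand-ins for list.json and details.json (made-up records).
example_summaries = [{"name": "Anna"}, {"name": "Bram"}]
example_details = [{"name": "Anna", "years": {"1980": 12}}]
example_merged = join_by_name(example_summaries, example_details)
```

With real output, the inputs would come from `load_items("list.json")` and `load_items("details.json")`. Note that `-o` appends to an existing file, which produces invalid JSON, so remove old output files before re-running a spider.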
The analysis is in the IPython Notebook How to Tell Someone’s Age When All You Know Is Her Dutch Name.
Start the web app with:

    python app/app.py
The project uses the following tools: