
ukpopulation's Issues

Separate SNPP data for each country

so that the time ranges need not be the same. As of 9/6/18, 2016-based data (to 2041) is available for En/Sc/NI; Wales is still 2014-based and its next delivery may be 2017-based.

Migrate to new SNPP 2018 data.

An update to the SNPP has broken the downstream tests for the new dynamic microsimulation (Daedalus). Will need to migrate to the new data in order to be able to cache new test data (E09000001) into travis-ci.

test harness

work out a small dummy dataset for testing (on travis)

New ONS SNPP release

Investigate whether we can upgrade to this data (bear in mind it will likely only cover England)

JOSS issue

Hola!

Can you give a one/two-liner on the methodology in the paper--- what you are doing ---

"It also provides a simple methodology for extrapolating the shorter-term subnational data using the longer-term national data, whilst preserving the age-gender structure present at the smaller geography. "

would be useful.

Unable to pull Wales snpp data

Upon calling SNPPData.SNPPData() I am receiving an error relating to Wales.

It seems the following statswales query is being made, and it does not work:
http://open.statswales.gov.wales/dataset/popu6010?$select=Area_AltCode1,Year_Code,Data,Gender_Code,Age_Code,Area_Hierarchy,Variant_Code&$filter=Gender_Code

Where Gender_Code throws the error.

The exact traceback is ukpopulation/snppdata.py line 25
self.data[utils.WA] = self.__do_wales()
line 286
data += r_data['value']
KeyError: 'value'
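
A minimal sketch of one way to make this failure easier to diagnose, assuming the decoded response is a dict that normally carries its records under 'value'. The helper name extract_odata_rows, the example URL and the "odata.error" payload are illustrative assumptions, not part of ukpopulation or a confirmed StatsWales response.

```python
def extract_odata_rows(r_data, url):
    """Return the records from a decoded OData-style JSON response.

    Raises a descriptive RuntimeError instead of a bare KeyError when the
    'value' key is absent (for example because an error body came back).
    """
    if "value" not in r_data:
        # Report which keys did come back so the cause is visible in the log
        raise RuntimeError(
            "No 'value' key in response from %s (keys present: %s)"
            % (url, sorted(r_data.keys()))
        )
    return r_data["value"]


# Hand-written stand-ins for a good and a bad response (not real service output)
good = {"value": [{"Area_AltCode1": "W06000001", "Data": 123.0}]}
bad = {"odata.error": {"message": "illustrative error text"}}

rows = extract_odata_rows(good, "http://example.invalid/query")
try:
    extract_odata_rows(bad, "http://example.invalid/query")
except RuntimeError as err:
    message = str(err)
```

With a check like this the traceback would name the failing URL and the keys actually returned, rather than stopping at data += r_data['value'].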


Modify query to handle newly released (2018 based) principal projections from nismod

Since new principal projections have been released, the package downloads an empty tsv file in place of these projections. Query parameters need to be modified to fix this issue. Also a decision needs to be made about whether we continue using 2016 based projections for now. We could move straight on to the 2018 projections, but this could cause downstream errors, e.g. if paths are ever hard-coded.

copyediting

Hey,

  1. "The datasets published online in different places and in different formats, depending on the originating agency. "

  2. "UK statistics agencies (ONS, StatsWales, NRScotland and NISRA) produce mid-year population estimates, national and subnational population estimates and projections, by single year of age, gender. " ---> I don't know what 'single year of age' means

  3. "Population projection variants (e.g. high fertility) are extremely useful for scenario analyses and a number of population projection variants available at national scale. "

  4. "This python package aims unify the retrieval of UK-wide data"

etc.

Authorship

Suggestions: add author and how you would like to have this work quoted.

JOSS submission

  • rename to ukpopulation for clearer scope
  • write abstract
  • fix known bugs #9
  • do enhancements #11
  • better test coverage
  • fix issue with England 2014 SNPP #12
  • satisfy any other JOSS requirements

Custom SNPP projection data

  • Stored as csv of (fractional) counts by age/gender/LAD/year in cache directory
  • Duplicate functionality of SNPPData class, adding a registry (JSON name-file dict) also stored in cache dir, plus a registration function taking and validating data.
  • Constructor arg = name of (already registered) custom projection
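
The registry idea above could be sketched roughly as follows, using only the standard library. Everything here is an assumption for illustration: the function names, the registry file name, the csv naming scheme and the validation rules. The column names are borrowed from the NPP output shown elsewhere in these issues.

```python
import csv
import json
import os
import tempfile

REQUIRED_COLUMNS = ["GEOGRAPHY_CODE", "GENDER", "C_AGE", "PROJECTED_YEAR_NAME", "OBS_VALUE"]
REGISTRY_FILE = "custom_snpp_registry.json"  # hypothetical file name


def register_custom_projection(name, rows, cache_dir):
    """Validate rows (a list of dicts), write them to csv, record name -> file."""
    for i, row in enumerate(rows):
        missing = [c for c in REQUIRED_COLUMNS if c not in row]
        if missing:
            raise ValueError("row %d is missing columns: %s" % (i, missing))
        if float(row["OBS_VALUE"]) < 0:  # fractional counts allowed, negatives not
            raise ValueError("row %d has a negative count" % i)

    os.makedirs(cache_dir, exist_ok=True)
    filename = "custom_snpp_%s.csv" % name
    with open(os.path.join(cache_dir, filename), "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=REQUIRED_COLUMNS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)

    # The registry is a JSON dict mapping projection name to csv file name
    registry_path = os.path.join(cache_dir, REGISTRY_FILE)
    registry = {}
    if os.path.isfile(registry_path):
        with open(registry_path) as f:
            registry = json.load(f)
    registry[name] = filename
    with open(registry_path, "w") as f:
        json.dump(registry, f, indent=2)
    return registry


def load_custom_projection(name, cache_dir):
    """Load an already-registered projection by name (values come back as strings)."""
    with open(os.path.join(cache_dir, REGISTRY_FILE)) as f:
        registry = json.load(f)
    if name not in registry:
        raise KeyError("no custom projection registered as '%s'" % name)
    with open(os.path.join(cache_dir, registry[name]), newline="") as f:
        return list(csv.DictReader(f))


# Usage against a throwaway cache directory
with tempfile.TemporaryDirectory() as tmp:
    demo_rows = [{"GEOGRAPHY_CODE": "E09000001", "GENDER": 1, "C_AGE": 30,
                  "PROJECTED_YEAR_NAME": 2031, "OBS_VALUE": 10.5}]
    registry = register_custom_projection("demo", demo_rows, tmp)
    loaded = load_custom_projection("demo", tmp)
```

A class mirroring SNPPData would then take the registered name as its constructor argument and call something like load_custom_projection internally.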

function(s) for aggregated LAD codes

e.g. allLADs(GB) returns all the LAD codes in GB

allowing easy retrieval of totals for large areas without having to explicitly specify all the LAD codes:

ew_snpp2031 = snpp.aggregate(["GENDER", "C_AGE"], utils.allLADs(utils.EW), 2031)
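
One possible shape for such a helper, as a sketch only: it selects codes by their country-specific prefix from a list of known LAD codes. The name all_lads, the country constants and the prefix table are assumptions made here for illustration; the real utils module may organise this differently.

```python
# Hypothetical country constants, standing in for those in utils
EN, WA, SC, NI = "en", "wa", "sc", "ni"
EW = [EN, WA]
GB = [EN, WA, SC]
UK = [EN, WA, SC, NI]

# Prefixes of local authority district codes (or equivalents) by country
LAD_PREFIXES = {
    EN: ("E06", "E07", "E08", "E09"),
    WA: ("W06",),
    SC: ("S12",),
    NI: ("N09",),
}


def all_lads(countries, known_codes):
    """Return the sorted subset of known_codes that belong to the given countries."""
    if isinstance(countries, str):
        countries = [countries]
    prefixes = tuple(p for c in countries for p in LAD_PREFIXES[c])
    return sorted(code for code in known_codes if code.startswith(prefixes))


# A small sample of codes; in practice this would come from the cached LAD list
sample = ["S12000033", "E09000001", "W06000001", "N09000001", "E07000242"]
ew_codes = all_lads(EW, sample)
gb_codes = all_lads(GB, sample)
```

The result can be passed wherever a list of LAD codes is expected, as in the aggregate call above.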

NPP 90+ age aggregation - check consistency between ppp and variants

import population.nppdata as NPPData
npp=NPPData.NPPData()
npp.variant_ratio("hhh", ["en"], 2016, ages=range(89,91))
                                  OBS_VALUE GEOGRAPHY_CODE
C_AGE GENDER PROJECTED_YEAR_NAME
89    1      2016                  1.000000      E92000001
      2      2016                  1.000000      E92000001
90    1      2016                  1.000672      E92000001
      2      2016                  1.001981      E92000001

The values are not dependent on the variant, so I suspect it's ppp that's wrong.

Invalid/nonexistent LAD codes are silently ignored

More a feature than a bug, but if some of the LAD codes in your request are out of date, e.g. E07000097 for East Herts, which was updated to E07000242 in 2013, then ukpopulation silently returns no data for the old code. I have changed snpp.filter to raise if the result doesn't contain all the LAD codes requested. The error can also trigger if the years are not in the SNPP range, or if the age/gender filters are invalid.
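
The check described here amounts to a set difference between the requested and returned codes. A minimal sketch, with a hypothetical helper name and plain lists standing in for the filtered data:

```python
def check_lads_present(requested, returned):
    """Raise if any requested LAD code is absent from the returned data."""
    missing = sorted(set(requested) - set(returned))
    if missing:
        raise ValueError(
            "No data returned for LAD code(s): %s" % ", ".join(missing)
        )


# E07000097 is the superseded East Herts code mentioned above
requested = ["E07000097", "E09000001"]
returned = ["E09000001", "E09000001"]  # codes present in the filtered rows

try:
    check_lads_present(requested, returned)
except ValueError as err:
    error_text = str(err)
```

Because the check only looks at which codes survive the filter, it also fires when the year, age or gender filters remove every row for a valid code.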

NPP download bug (python3.5 only)

$ python3 doc/example_variant.py
Cache directory:  /home/geoaps/.ukpopulation/cache
using cached LAD codes: /home/geoaps/.ukpopulation/cache/lad_codes.json
Loading NPP principal (ppp) data for England, Wales, Scotland & Northern Ireland
/home/geoaps/.ukpopulation/cache/NM_2009_1_metadata.json found, using cached metadata...
Using cached data: /home/geoaps/.ukpopulation/cache/NM_2009_1_0bcd330bc936cd7902566cf7198d8868.tsv
Cache directory:  /home/geoaps/.ukpopulation/cache
using cached LAD codes: /home/geoaps/.ukpopulation/cache/lad_codes.json
Collating SNPP data for England...
/home/geoaps/.ukpopulation/cache/NM_2006_1_metadata.json found, using cached metadata...
Using cached data: /home/geoaps/.ukpopulation/cache/NM_2006_1_1412780ddd715d804371850734000928.tsv
/home/geoaps/.ukpopulation/cache/NM_2006_1_metadata.json found, using cached metadata...
Using cached data: /home/geoaps/.ukpopulation/cache/NM_2006_1_a5b81a739b05970852420fdf22dd43c9.tsv
Collating SNPP data for Wales...
Collating SNPP data for Scotland...
Collating SNPP data for Northern Ireland...
using /home/geoaps/.ukpopulation/cache/npp_ni.zip
using /home/geoaps/.ukpopulation/cache/npp_wa.zip
using /home/geoaps/.ukpopulation/cache/npp_sc.zip
using /home/geoaps/.ukpopulation/cache/npp_en.zip
Extracting ni_hhh
Traceback (most recent call last):
  File "doc/example_variant.py", line 20, in <module>
    hhh = snpp.create_variant("hhh", npp, lad, range(start_year, end_year + 1))
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/snppdata.py", line 131, in create_variant
    scaling = npp.variant_ratio(variant_name, utils.country(geog_code), year_range).reset_index().sort_values(["C_AGE", "GENDER", "PROJECTED_YEAR_NAME"])
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/nppdata.py", line 142, in variant_ratio
    num = self.detail(variant_numerator, geog, years, ages, genders).set_index(["C_AGE", "GENDER", "PROJECTED_YEAR_NAME"])
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/nppdata.py", line 100, in detail
    self.__load_variant(variant_name)
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/nppdata.py", line 222, in __load_variant
    vdata = np.array(_read_excel_xml(self.cache_dir + "/" + vxml, "Population"))
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/nppdata.py", line 18, in _read_excel_xml
    if sheet["ss:Name"] == sheet_name:
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 958, in __getitem__
    return self.attrs[key]
KeyError: 'ss:Name'
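
This does not diagnose why the attribute is missing under python3.5, but as a possible workaround the sheet lookup could be done with the standard library's ElementTree, which resolves the ss: prefix to its namespace instead of relying on the literal attribute name. The function name and the sample workbook below are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by Excel 2003 XML (SpreadsheetML) documents
SS = "urn:schemas-microsoft-com:office:spreadsheet"


def find_worksheet(xml_text, sheet_name):
    """Return the Worksheet element whose ss:Name matches, or None."""
    root = ET.fromstring(xml_text)
    for sheet in root.iter("{%s}Worksheet" % SS):
        # .get returns None rather than raising if the attribute is absent
        if sheet.get("{%s}Name" % SS) == sheet_name:
            return sheet
    return None


# Hand-written miniature workbook, not real NPP data
SAMPLE = """<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Contents"><Table/></Worksheet>
  <Worksheet ss:Name="Population">
    <Table><Row><Cell><Data ss:Type="Number">42</Data></Cell></Row></Table>
  </Worksheet>
</Workbook>"""

sheet = find_worksheet(SAMPLE, "Population")
values = [d.text for d in sheet.iter("{%s}Data" % SS)]
missing_sheet = find_worksheet(SAMPLE, "NoSuchSheet")
```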

grammar/copy-editing

For convenience, I have posted a version of the ms. below which fixes a few of the issues.


UK statistics agencies (ONS, StatsWales, NRScotland and NISRA) produce mid-year population estimates (MYE), national population projections (NPP), and subnational population projections (SNPP), by age and gender. Mid-year estimates and subnational population projections are at local authority district (or equivalent) geography; national projections are produced for England, Wales, Scotland and Northern Ireland respectively.

At the time of writing (June 2018), mid-year estimates are available from 1991 to 2016. Subnational population projections typically have a 25-year horizon and in most cases cover the period 2016-2041. National projections have a longer horizon, extending from 2016 to 2116.

The datasets are published online in different places and in different formats, depending on the originating agency. The data may require significant preprocessing/reformatting before it can be used programmatically and, crucially, reproducibly.

Population projection variants (e.g. high fertility) are extremely useful for scenario analyses and a number of population projection variants are available at national scale. However, the availability of variants at the subnational scale is patchy at best.

This python package aims to unify the retrieval of UK-wide data, providing consistent interfaces for each of the three (MYE, SNPP, NPP) datasets. For efficiency, all data is cached locally.

The package includes functionality to filter data by geography (e.g. for analysis of a single local authority), and/or by age and by gender, e.g. for analysis of the working-age population.

It also provides a simple methodology for extrapolating the shorter-term SNPP data using the longer-term NPP data. Extrapolating the SNPP independently for each age and gender enables the age-gender structure of the original population to be captured. Aggregation only takes place on the extrapolated age-gender specific values. This means that the trends shown by SNPP geographies with different age-gender structures will differ, even though they both extrapolate using the same NPP data. (If aggregation were performed before the extrapolation, it would not be possible to compute different extrapolated trends for regions with different initial age-gender structures.)

This functionality enables researchers and policymakers to examine population growth trajectories and construct plausible projections, both principal and variant, at sub-national scale, consistently for anywhere in the UK over longer time horizons than the official data permit. By automating the process of obtaining and consistently formatting the data, the package also makes it much easier to create reproducible analyses.
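
As a rough illustration of the extrapolation paragraph above, here is a sketch under the assumption that each age-gender value in the final SNPP year is scaled by the ratio of the national (NPP) value in the target year to the national value in the final SNPP year. That scaling rule, the function name and all the numbers are assumptions made for the example, not a statement of the package's exact implementation.

```python
def extrapolate(snpp_last, npp, last_year, target_year):
    """Scale each (age, gender) count by the national growth ratio.

    snpp_last: {(age, gender): count} for one area in the final SNPP year
    npp:       {(age, gender, year): count} national projection
    """
    result = {}
    for (age, gender), count in snpp_last.items():
        ratio = npp[(age, gender, target_year)] / npp[(age, gender, last_year)]
        result[(age, gender)] = count * ratio
    return result


# Made-up national data: age 30 grows by 25%, age 80 grows by 50%
npp = {
    (30, 1, 2041): 1000.0, (30, 1, 2051): 1250.0,
    (80, 1, 2041): 1000.0, (80, 1, 2051): 1500.0,
}

# Two made-up areas with the same total (1000) but different age structures
young_area = {(30, 1): 900.0, (80, 1): 100.0}
old_area = {(30, 1): 100.0, (80, 1): 900.0}

# Aggregate only after extrapolating each age-gender value
young_total = sum(extrapolate(young_area, npp, 2041, 2051).values())
old_total = sum(extrapolate(old_area, npp, 2041, 2051).values())
```

Both areas start from the same total and use the same national data, yet the older area grows faster (1475 against 1275) because its population is concentrated in the faster-growing age group. Aggregating first would have given both areas the same trend.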
