nismod / ukpopulation
Population and demographics projection module, developed for ITRC/MISTRAL
License: MIT License
so that the time ranges need not be the same. As of 9/6/18, 2016-based (to 2041) data is available for En/Sc/NI; Wales is still 2014-based and their next delivery may be 2017-based.
See pypa/pip#3939 and update the docs. I would recommend, for now, just going with requirements.txt.
An update to the SNPP has broken the downstream tests for the new dynamic microsimulation (Daedalus). Will need to migrate to the new data in order to be able to cache new test data (E09000001) into travis-ci.
When first downloaded the snhp_w dataset causes SIMIM to fail, as reported by @tg137
On subsequent runs there is no problem
work out a small dummy dataset for testing (on travis)
Scotland SNHP download uses a lookup table loaded from a relative path that assumes the microsimulation package is installed and in that location.
keep country-specific categories?
provide lowest common denominator categories?
both?
e.g.
MYE: aggregate(years, geogs, cats)
vs
SNPP: aggregate(cats, geogs, years)
Investigate whether we can upgrade to this data (bear in mind will likely only cover England)
year = 2016 means pre_range passed to ukpopulation is [], which then fails
Hello!
Can you give a one/two-liner on the methodology in the paper--- what you are doing ---
"It also provides a simple methodology for extrapolating the shorter-term subnational data using the longer-term national data, whilst preserving the age-gender structure present at the smaller geography. "
would be useful.
Upon calling SNPPData.SNPPData() I am receiving an error relating to Wales.
It would seem this request to StatsWales is being made, and it does not work:
http://open.statswales.gov.wales/dataset/popu6010?$select=Area_AltCode1,Year_Code,Data,Gender_Code,Age_Code,Area_Hierarchy,Variant_Code&$filter=Gender_Code
Where Gender_Code throws the error.
The exact traceback is ukpopulation/snppdata.py line 25
self.data[utils.WA] = self.__do_wales()
line 286
data += r_data['value']
KeyError: 'value'
Since new principal projections have been released, the package downloads an empty tsv file in place of these projections. Query parameters need to be modified to fix this issue. Also a decision needs to be made about whether we continue using 2016 based projections for now. We could move straight on to the 2018 projections, but this could cause downstream errors, e.g. if paths are ever hard-coded.
Hey,
"The datasets published online in different places and in different formats, depending on the originating agency. "
"UK statistics agencies (ONS, StatsWales, NRScotland and NISRA) produce mid-year population estimates, national and subnational population estimates and projections, by single year of age, gender. " ---> I don't know what 'single year of age' means
"Population projection variants (e.g. high fertility) are extremely useful for scenario analyses and a number of population projection variants available at national scale. "
"This python package aims unify the retrieval of UK-wide data"
etc.
Suggestions: add author and how you would like to have this work quoted.
Current implementation only computes extrapolated population for a single LAD, which is extremely inefficient when GB-wide data is required.
e.g. allLADs(GB)
returns all the LAD codes in GB
allowing easy retrieval of totals for large areas without having to explicitly specify all the LAD codes:
ew_snpp2031 = snpp.aggregate(["GENDER", "C_AGE"], utils.allLADs(utils.EW), 2031)
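A minimal sketch of how such a helper might work, assuming it can rely on the first letter of each LAD code to identify the country (E, W, S, N). The function name `all_lads`, the country constants and the sample codes are hypothetical, chosen for illustration; this is not the package's actual implementation.

```python
# Hypothetical country groupings, keyed by the first letter of the LAD code.
EN, WA, SC, NI = "E", "W", "S", "N"
EW = (EN, WA)
GB = (EN, WA, SC)
UK = (EN, WA, SC, NI)


def all_lads(countries, lad_codes):
    """Return the LAD codes whose first letter matches the given country or countries."""
    if isinstance(countries, str):
        countries = (countries,)
    return [code for code in lad_codes if code[0] in countries]


# Illustrative codes only; a real implementation would use the full LAD lookup.
sample_codes = ["E09000001", "E07000242", "W06000015", "S12000033", "N09000003"]
print(all_lads(GB, sample_codes))
```

The returned list could then be passed wherever an explicit list of LAD codes is currently required, as in the `snpp.aggregate` call above.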
...and how to define them?
import population.nppdata as NPPData
npp=NPPData.NPPData()
npp.variant_ratio("hhh", ["en"], 2016, ages=range(89,91))
                                 OBS_VALUE GEOGRAPHY_CODE
C_AGE GENDER PROJECTED_YEAR_NAME
89    1      2016                 1.000000      E92000001
      2      2016                 1.000000      E92000001
90    1      2016                 1.000672      E92000001
      2      2016                 1.001981      E92000001
The values are not dependent on the variant, so I suspect it's ppp that's wrong.
More a feature than a bug, but if some of the LAD codes in your request are out of date (e.g. E07000097 for East Herts, which was updated to E07000242 in 2013), then ukpopulation silently returns no data for the old code. Have changed snpp.filter to raise if the result doesn't contain all the LAD codes requested. The error can also trigger if the years are not in the SNPP range, or the age/gender filters are invalid.
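The check described above could look something like the following sketch. The helper name `check_all_codes_present` and its arguments are hypothetical; the actual change to snpp.filter may differ.

```python
def check_all_codes_present(requested, returned):
    """Raise ValueError if any requested LAD code is absent from the returned codes."""
    missing = sorted(set(requested) - set(returned))
    if missing:
        raise ValueError("No data returned for LAD code(s): " + ", ".join(missing))


# E07000097 is the superseded East Herts code, so only E07000242 comes back.
try:
    check_all_codes_present(["E07000097", "E07000242"], ["E07000242"])
except ValueError as err:
    print(err)
```

Because the check only compares codes, it cannot tell a stale code apart from an out-of-range year or an invalid age/gender filter, which matches the caveat above.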
Suggest defaulting to
$HOME/.ukpopulation/cache_dir
or equivalent, in a way that works for all platforms
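One possible way to do this portably is with `pathlib`, sketched below. The function name `default_cache_dir` and its `home` and `create` parameters are assumptions made for this example.

```python
from pathlib import Path


def default_cache_dir(home=None, create=True):
    """Return <home>/.ukpopulation/cache, optionally creating it.

    `home` defaults to the current user's home directory, which pathlib
    resolves on Linux, macOS and Windows alike.
    """
    base = Path(home) if home is not None else Path.home()
    cache_dir = base / ".ukpopulation" / "cache"
    if create:
        cache_dir.mkdir(parents=True, exist_ok=True)
    return cache_dir


# Show the location without touching the filesystem.
print(default_cache_dir(create=False))
```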
e.g. SNPP(hhh, LAD) = SNPP(ppp, LAD) * [ NPP(hhh, England) / NPP(ppp, England) ]
(this could be done per age/gender and summed)
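As a rough illustration of the formula above, the scaling can be applied per (age, gender) pair and then summed. The sketch uses plain dictionaries keyed by (age, gender) and made-up numbers; the function name `scale_to_variant` is hypothetical.

```python
def scale_to_variant(snpp_ppp, npp_ppp, npp_variant):
    """Apply SNPP(variant) = SNPP(ppp) * NPP(variant) / NPP(ppp) per (age, gender)."""
    return {
        key: value * npp_variant[key] / npp_ppp[key]
        for key, value in snpp_ppp.items()
    }


# Made-up values for a single LAD and year, keyed by (age, gender).
snpp_ppp = {(89, 1): 100.0, (89, 2): 150.0}
npp_ppp = {(89, 1): 1000.0, (89, 2): 2000.0}
npp_hhh = {(89, 1): 1100.0, (89, 2): 2100.0}

snpp_hhh = scale_to_variant(snpp_ppp, npp_ppp, npp_hhh)
print(snpp_hhh)
print(sum(snpp_hhh.values()))  # LAD total for the variant
```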
(after JOSS review)
no reason to wait
e.g. SNPP(LAD, 2045) = SNPP(LAD, 2039) * NPP(England, 2045) / NPP(England, 2039)
(this could be done per age/gender and summed)
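The sketch below contrasts extrapolating per age/gender and then summing with summing first and scaling the total. All numbers are invented, and both function names are hypothetical; only the formula itself comes from the text above.

```python
def extrapolate_by_category(snpp_last, npp_last, npp_target):
    """Scale each (age, gender) value by its national ratio, then sum."""
    return sum(
        pop * npp_target[key] / npp_last[key] for key, pop in snpp_last.items()
    )


def extrapolate_aggregate_first(snpp_last, npp_last, npp_target):
    """Sum first, then scale the total by the ratio of national totals."""
    ratio = sum(npp_target.values()) / sum(npp_last.values())
    return sum(snpp_last.values()) * ratio


# Invented national figures: the older group grows, the younger one is flat.
npp_2039 = {(30, 1): 1000.0, (80, 1): 1000.0}
npp_2045 = {(30, 1): 1000.0, (80, 1): 1500.0}

# Two invented LADs with the same 2039 total but different age structures.
younger_lad = {(30, 1): 900.0, (80, 1): 100.0}
older_lad = {(30, 1): 100.0, (80, 1): 900.0}

for name, lad in [("younger", younger_lad), ("older", older_lad)]:
    print(
        name,
        extrapolate_by_category(lad, npp_2039, npp_2045),
        extrapolate_aggregate_first(lad, npp_2039, npp_2045),
    )
```

With these inputs the per-category totals differ between the two LADs, while the aggregate-first totals are identical, which is the reason for extrapolating before summing.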
The logic to force the arg into a list needs a more generic check for a scalar type.
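One possible generic check is sketched below: anything that is not iterable, plus strings and bytes (iterable, but meant here as single values), is treated as a scalar. The helper name `as_list` is hypothetical and this is not necessarily how the package resolves the issue.

```python
from collections.abc import Iterable


def as_list(arg):
    """Wrap a scalar in a list; convert any other iterable to a list."""
    # Strings and bytes are iterable but should count as single values here.
    if isinstance(arg, (str, bytes)) or not isinstance(arg, Iterable):
        return [arg]
    return list(arg)


print(as_list(2016))
print(as_list("E09000001"))
print(as_list(range(2016, 2019)))
```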
$ python3 doc/example_variant.py
Cache directory: /home/geoaps/.ukpopulation/cache
using cached LAD codes: /home/geoaps/.ukpopulation/cache/lad_codes.json
Loading NPP principal (ppp) data for England, Wales, Scotland & Northern Ireland
/home/geoaps/.ukpopulation/cache/NM_2009_1_metadata.json found, using cached metadata...
Using cached data: /home/geoaps/.ukpopulation/cache/NM_2009_1_0bcd330bc936cd7902566cf7198d8868.tsv
Cache directory: /home/geoaps/.ukpopulation/cache
using cached LAD codes: /home/geoaps/.ukpopulation/cache/lad_codes.json
Collating SNPP data for England...
/home/geoaps/.ukpopulation/cache/NM_2006_1_metadata.json found, using cached metadata...
Using cached data: /home/geoaps/.ukpopulation/cache/NM_2006_1_1412780ddd715d804371850734000928.tsv
/home/geoaps/.ukpopulation/cache/NM_2006_1_metadata.json found, using cached metadata...
Using cached data: /home/geoaps/.ukpopulation/cache/NM_2006_1_a5b81a739b05970852420fdf22dd43c9.tsv
Collating SNPP data for Wales...
Collating SNPP data for Scotland...
Collating SNPP data for Northern Ireland...
using /home/geoaps/.ukpopulation/cache/npp_ni.zip
using /home/geoaps/.ukpopulation/cache/npp_wa.zip
using /home/geoaps/.ukpopulation/cache/npp_sc.zip
using /home/geoaps/.ukpopulation/cache/npp_en.zip
Extracting ni_hhh
Traceback (most recent call last):
  File "doc/example_variant.py", line 20, in <module>
    hhh = snpp.create_variant("hhh", npp, lad, range(start_year, end_year + 1))
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/snppdata.py", line 131, in create_variant
    scaling = npp.variant_ratio(variant_name, utils.country(geog_code), year_range).reset_index().sort_values(["C_AGE", "GENDER", "PROJECTED_YEAR_NAME"])
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/nppdata.py", line 142, in variant_ratio
    num = self.detail(variant_numerator, geog, years, ages, genders).set_index(["C_AGE", "GENDER", "PROJECTED_YEAR_NAME"])
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/nppdata.py", line 100, in detail
    self.__load_variant(variant_name)
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/nppdata.py", line 222, in __load_variant
    vdata = np.array(_read_excel_xml(self.cache_dir + "/" + vxml, "Population"))
  File "/usr/local/lib/python3.5/dist-packages/ukpopulation-1.0.1-py3.5.egg/ukpopulation/nppdata.py", line 18, in _read_excel_xml
    if sheet["ss:Name"] == sheet_name:
  File "/usr/lib/python3/dist-packages/bs4/element.py", line 958, in __getitem__
    return self.attrs[key]
KeyError: 'ss:Name'
not clear why it isn't there yet.
For convenience, I have posted a version of the ms. below which fixes a few of the issues.
UK statistics agencies (ONS, StatsWales, NRScotland and NISRA) produce mid-year population estimates (MYE), national population projections (NPP), and subnational population projections (SNPP), by age and gender. Mid-year estimates and subnational population projections are at local authority district (or equivalent) geography; national projections are produced separately for England, Wales, Scotland and Northern Ireland.
At the time of writing (June 2018), mid-year estimates are available from 1991 to 2016. Subnational population projections typically have a 25-year horizon and in most cases cover the period 2016-2041. National projections have a longer horizon, extending from 2016 to 2116.
The datasets are published online in different places and in different formats, depending on the originating agency. The data may require significant preprocessing/reformatting before it can be used programmatically and, crucially, reproducibly.
Population projection variants (e.g. high fertility) are extremely useful for scenario analyses and a number of population projection variants are available at national scale. However, the availability of variants at the subnational scale is patchy at best.
This python package aims to unify the retrieval of UK-wide data, providing consistent interfaces for each of the three (MYE, SNPP, NPP) datasets. For efficiency, all data is cached locally.
The package includes functionality to filter data by geography (e.g. for analysis of a single local authority), and/or by age and by gender, e.g. for analysis of the working-age population.
It also provides a simple methodology for extrapolating the shorter-term SNPP data using the longer-term NPP data. Extrapolating the SNPP independently for each age and gender enables the age-gender structure of the original population to be captured. Aggregation only takes place on the extrapolated age-gender specific values. This means that the trends shown by SNPP geographies with different age-gender structures will differ, even though they both extrapolate using the same NPP data. (If aggregation were performed before the extrapolation, it would not be possible to compute different extrapolated trends for regions with different initial age-gender structures.)
This functionality enables researchers and policymakers to examine population growth trajectories and construct plausible projections, both principal and variant, at sub-national scale, consistently for anywhere in the UK over longer time horizons than the official data permit. By automating the process of obtaining and consistently formatting the data, the package also makes it much easier to create reproducible analyses.
by gender by single year of age
from 2011 (or possibly even 2001)
requires that issues surrounding MYE changes and 2016 SNPP availability are resolved