onsbigdata / precon
Functions for price index economics.
License: MIT License
Line 77 in 823c0c1
Add this after:
if adjustments is not None:
    base_prices = base_prices * adjustments
And update docstring.
Line 24 in 4e441a7
The above line in the function means that if you pass in a dataframe with the following format

| date | col1 | col2 |
|---|---|---|
| 2019-01-01 | 101 | NaN |
| ... | ... | ... |
| 2019-05-01 | 104 | NaN |
| 2019-06-01 | 103 | 100 |
| ... | ... | ... |
| 2020-01-01 | 101 | 102 |

(i.e. the col2 time series starts later than col1), then jan_adjustment will drop the entire row for 2019-01-01.
Not sure of the correct behaviour, but anecdotally removing the dropna seems to work well.
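The effect described above can be reproduced with plain pandas. This is only a sketch of the symptom: the frame below is made up to mirror the table, and it does not use jan_adjustment itself, whose internals are not shown here.

```python
import numpy as np
import pandas as pd

# Two series where col2 starts later than col1, mirroring the table above.
df = pd.DataFrame(
    {
        "col1": [101.0, 104.0, 103.0, 101.0],
        "col2": [np.nan, np.nan, 100.0, 102.0],
    },
    index=pd.to_datetime(
        ["2019-01-01", "2019-05-01", "2019-06-01", "2020-01-01"]
    ),
)

# The default dropna() removes every row that contains any NaN, so the
# January 2019 row is lost for col1 as well as col2.
dropped = df.dropna()

# Selecting the January rows without dropna keeps col1's 2019 value.
januaries = df[df.index.month == 1]
```

If some rows do need dropping, `df.dropna(how="all")` only removes rows where every column is missing, which may be closer to the intended behaviour.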
The axis to adjust across.
This might need some generalisation later on, but replace what is there for now. Maybe this function can move too, to index_methods? Move index_calculator there too?
Lines 143 to 152 in ea185fa
Still totally unsure whether this will solve the issue a user is experiencing, but in some adapted code the lines were:
zeros_and_nans = indices.isna() | indices.eq(0)
weights = weights.mask(zeros_and_nans, 0).fillna(0)
Consider implementing it on its own line with a comment explaining why that fill is necessary. Also find out what edge case it solves and write a test for it.
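As a starting point for that test, here is a small sketch of what the two lines do on made-up data, with the fill split onto its own line. The frames are illustrative only and are not taken from the user's case.

```python
import numpy as np
import pandas as pd

indices = pd.DataFrame({"a": [100.0, np.nan], "b": [0.0, 105.0]})
weights = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, np.nan]})

# Flag every cell where the index is missing or zero.
zeros_and_nans = indices.isna() | indices.eq(0)

# Zero the weight wherever the index cannot contribute to an aggregate.
weights = weights.mask(zeros_and_nans, 0)

# Separately, treat weights that are themselves missing as zero.
weights = weights.fillna(0)
```

In this example the only weight that survives is the one paired with a valid, non-zero index value.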
Line 68 in cf0df3a
I want to add some pre-commit hooks for developers.
Remove whitespace
Flake 8 linting
Check commit msg subject len
Change from the following:
elif isinstance(obj, pd.core.frame.DataFrame):
    # Create an empty DataFrame to fill with adjustments
    adjustments = pd.DataFrame().reindex_like(obj)
    for index, row in iter_method(obj):
        # Create a selector based on the axis
        slice_ = axis_slice(index, axis)
        adjustments.loc[slice_] = _get_adjustments(row, decimals)
to this:
elif isinstance(obj, pd.core.frame.DataFrame):
    # args must be a tuple, hence the trailing comma
    adjustments = obj.apply(_get_adjustments, args=(decimals,), axis=axis)
This should also allow for the removal of:
iter_dict = {
    0: pd.DataFrame.iterrows,
    1: pd.DataFrame.iteritems,
}
iter_method = iter_dict.get(axis)
Slimming the function right down.
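A minimal sketch of the apply-based version follows. `round_residual` is a hypothetical stand-in for `_get_adjustments`, whose body is not shown here, and the data is made up.

```python
import pandas as pd


def round_residual(series, decimals):
    """Hypothetical stand-in for _get_adjustments (not the real function)."""
    return series.round(decimals) - series


df = pd.DataFrame({"a": [1.234, 2.345], "b": [3.456, 4.567]})
decimals = 2

# (decimals) is just the int itself; (decimals,) is the one-element tuple
# that DataFrame.apply expects for its args parameter.
args = (decimals,)

# axis=0 passes each column to the function; axis=1 passes each row.
by_column = df.apply(round_residual, args=args, axis=0)
by_row = df.apply(round_residual, args=args, axis=1)
```

One thing worth checking before swapping the loop out: `apply` with `axis=0` hands the function each column, whereas the `iter_dict` mapping pairs axis 0 with `iterrows`, so the axis argument may need flipping to keep the same behaviour.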
While taking care of this, remember to also do the following:
The ternary operator is unnecessary here - a simple conditional will do since it returns True or False anyway.
Line 82 in 46752ea
Similar to Matt's implementation here:
in_year_base = indices.resample('AS').first()
# Align base indices to full time series values
in_year_base = (
    in_year_base
    .reindex_like(indices, method='ffill')
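For illustration, the same idea on a tiny made-up series. This is a sketch under assumptions rather than the referenced implementation: it uses `reindex` on the target index instead of `reindex_like`, and the `'YS'` alias in place of `'AS'`, which newer pandas releases deprecate.

```python
import pandas as pd

indices = pd.Series(
    [100.0, 101.0, 102.0, 104.0, 105.0],
    index=pd.to_datetime(
        ["2018-01-01", "2018-02-01", "2018-03-01", "2019-01-01", "2019-02-01"]
    ),
)

# First value in each calendar year, labelled with the year-start date.
in_year_base = indices.resample("YS").first()

# Align the yearly base values to the full time series by forward filling.
in_year_base = in_year_base.reindex(indices.index, method="ffill")
```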
The chain does not handle missing periods correctly but still produces a result.
import pandas as pd
from pandas import Timestamp
import precon
df_all_periods = pd.DataFrame.from_records([
(Timestamp('2018-01-01'), 100.000000),
(Timestamp('2018-02-01'), 100.527400),
(Timestamp('2018-03-01'), 100.894000),
(Timestamp('2018-04-01'), 100.689100),
(Timestamp('2018-05-01'), 102.670400),
(Timestamp('2018-06-01'), 100.811000),
(Timestamp('2018-07-01'), 102.632500),
(Timestamp('2018-08-01'), 103.133200),
(Timestamp('2018-09-01'), 103.111400),
(Timestamp('2018-10-01'), 103.417700),
(Timestamp('2018-11-01'), 103.155800),
(Timestamp('2018-12-01'), 103.616800),
(Timestamp('2019-01-01'), 104.246480),
(Timestamp('2019-02-01'), 101.093900),
(Timestamp('2019-03-01'), 101.726900),
(Timestamp('2019-04-01'), 100.478600), # April 2019 value present
(Timestamp('2019-05-01'), 100.647800),
(Timestamp('2019-06-01'), 100.439100),
(Timestamp('2019-07-01'), 102.181900),
(Timestamp('2019-08-01'), 100.608800),
(Timestamp('2019-09-01'), 102.067000),
(Timestamp('2019-10-01'), 102.418300),
(Timestamp('2019-11-01'), 102.769600),
(Timestamp('2019-12-01'), 103.120900),
(Timestamp('2020-01-01'), 103.519414),
(Timestamp('2020-02-01'), 100.710500),
],
columns=('period', 'index_value'),
).set_index('period')
df_period_missing = pd.DataFrame.from_records([
(Timestamp('2018-01-01'), 100.000000),
(Timestamp('2018-02-01'), 100.527400),
(Timestamp('2018-03-01'), 100.894000),
(Timestamp('2018-04-01'), 100.689100),
(Timestamp('2018-05-01'), 102.670400),
(Timestamp('2018-06-01'), 100.811000),
(Timestamp('2018-07-01'), 102.632500),
(Timestamp('2018-08-01'), 103.133200),
(Timestamp('2018-09-01'), 103.111400),
(Timestamp('2018-10-01'), 103.417700),
(Timestamp('2018-11-01'), 103.155800),
(Timestamp('2018-12-01'), 103.616800),
(Timestamp('2019-01-01'), 104.246480),
(Timestamp('2019-02-01'), 101.093900),
(Timestamp('2019-03-01'), 101.726900),
(Timestamp('2019-04-01'), None), # April 2019 value missing
(Timestamp('2019-05-01'), 100.647800),
(Timestamp('2019-06-01'), 100.439100),
(Timestamp('2019-07-01'), 102.181900),
(Timestamp('2019-08-01'), 100.608800),
(Timestamp('2019-09-01'), 102.067000),
(Timestamp('2019-10-01'), 102.418300),
(Timestamp('2019-11-01'), 102.769600),
(Timestamp('2019-12-01'), 103.120900),
(Timestamp('2020-01-01'), 103.519414),
(Timestamp('2020-02-01'), 100.710500),
],
columns=('period', 'index_value'),
).set_index('period')
expected = pd.DataFrame.from_records([
(Timestamp('2018-01-01'), 100.000000),
(Timestamp('2018-02-01'), 100.527400),
(Timestamp('2018-03-01'), 100.894000),
(Timestamp('2018-04-01'), 100.689100),
(Timestamp('2018-05-01'), 102.670400),
(Timestamp('2018-06-01'), 100.811000),
(Timestamp('2018-07-01'), 102.632500),
(Timestamp('2018-08-01'), 103.133200),
(Timestamp('2018-09-01'), 103.111400),
(Timestamp('2018-10-01'), 103.417700),
(Timestamp('2018-11-01'), 103.155800),
(Timestamp('2018-12-01'), 103.616800),
(Timestamp('2019-01-01'), 104.246480),
(Timestamp('2019-02-01'), 105.386833),
(Timestamp('2019-03-01'), 106.046713),
(Timestamp('2019-04-01'), 104.745404),
(Timestamp('2019-05-01'), 104.921789),
(Timestamp('2019-06-01'), 104.704227),
(Timestamp('2019-07-01'), 106.521034),
(Timestamp('2019-08-01'), 104.881133),
(Timestamp('2019-09-01'), 106.401255),
(Timestamp('2019-10-01'), 106.767473),
(Timestamp('2019-11-01'), 107.133691),
(Timestamp('2019-12-01'), 107.499909),
(Timestamp('2020-01-01'), 107.915346),
(Timestamp('2020-02-01'), 108.682084),
],
columns=('period', 'index_value'),
).set_index('period')
df_all_periods['chained'] = precon.chain(df_all_periods)
df_period_missing['chained'] = precon.chain(df_period_missing)
pd.concat([df_all_periods, df_period_missing, expected], keys=['all_periods', 'period_missing', 'expected'], axis=1)
In the above example, expected is calculated as if all periods were present, using the equation unlinked index * linked base / 100, so the chained indices after the missing period are not affected. precon.chain doesn't have an issue, as it uses a backfill after shifting the indices by one period to fill in the first month.
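The equation can be sketched as a plain loop. This is not how precon.chain is implemented; `chain_january_linked` is a hypothetical helper, and the January linking rule is an assumption inferred from the expected values above (each January is linked on the previous base and then becomes the base for the following months).

```python
import numpy as np
import pandas as pd


def chain_january_linked(unchained, start_base=100.0):
    """Chain an unlinked index as unlinked index * linked base / 100.

    Hypothetical helper: assumes each January value, once linked,
    becomes the base for the periods that follow it.
    """
    base = start_base
    values = []
    for period, value in unchained.items():
        linked = value * base / 100
        values.append(linked)
        # Only update the base from a January that is actually present.
        if period.month == 1 and not pd.isna(linked):
            base = linked
    return pd.Series(values, index=unchained.index)


unchained = pd.Series(
    [104.246480, 101.093900, 101.726900, np.nan, 100.647800],
    index=pd.date_range("2019-01-01", periods=5, freq="MS"),
)
chained = chain_january_linked(unchained)
```

Because each period depends only on the January base, the missing April stays missing while May is still linked correctly.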
It would be useful to be able to view the docs for this project.
Currently, I think, you have to clone and build them yourself?
A solution would be to use GitHub pages to serve the docs as this works well with sphinx.
Create a generator to create random index data in a reproducible way.
Support the generation of hierarchical structure of indices.
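One possible shape for such a generator, as a rough sketch only. The function name, parameters and the choice of normally distributed monthly movements are all assumptions made for illustration; hierarchy support is not covered.

```python
import numpy as np
import pandas as pd


def random_indices(periods, columns, seed=0, start="2018-01-01"):
    """Hypothetical generator of reproducible monthly index data."""
    # A seeded Generator makes the output repeatable for a given seed.
    rng = np.random.default_rng(seed)
    index = pd.date_range(start, periods=periods, freq="MS")

    # Month-on-month movements centred on 1, with the first period fixed
    # so that every series starts at 100.
    movements = rng.normal(loc=1.0, scale=0.01, size=(periods, len(columns)))
    movements[0] = 1.0

    return pd.DataFrame(
        100 * movements.cumprod(axis=0), index=index, columns=columns
    )


first = random_indices(12, ["a", "b"], seed=42)
second = random_indices(12, ["a", "b"], seed=42)
```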
Add functionality to aggregate for a given MultiIndex level or set of levels, and extend that functionality to enable an aggregation up a hierarchical tree given by a set of MultiIndex levels.
Add tests and ensure docstrings are thorough.
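A sketch of aggregating to a single MultiIndex level, assuming a weighted arithmetic mean is the aggregation wanted. `aggregate_level` and the level names are hypothetical, and the data is made up.

```python
import pandas as pd


def aggregate_level(indices, weights, level):
    """Hypothetical weighted-mean aggregation to one MultiIndex level."""
    weighted_sum = (indices * weights).groupby(level=level).sum()
    weight_sum = weights.groupby(level=level).sum()
    return weighted_sum / weight_sum


index = pd.MultiIndex.from_tuples(
    [("A", "x"), ("A", "y"), ("B", "z")], names=["group", "item"]
)
indices = pd.Series([100.0, 110.0, 120.0], index=index)
weights = pd.Series([1.0, 3.0, 2.0], index=index)

result = aggregate_level(indices, weights, level="group")
```

Aggregating up a tree could then repeat this one level at a time, carrying the summed weights forward at each step.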
Additional contributions functions were developed for the consumer prices faster indicators project. Pull these into precon:
period_on_period_contributions
contributions_level
contributions_up_hierarchy
Review existing contributions code and add some tests and documentation.
This is to abide by the numpy style convention.
There's a bug here, since base_period is a list rather than a single int. Change to the isin() method.
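A small sketch of the isin() approach, assuming base_period holds month numbers and is compared against a DatetimeIndex. The affected line is not shown here, so the variable names are illustrative.

```python
import pandas as pd

index = pd.date_range("2019-01-01", periods=6, freq="MS")
base_period = [1, 4]  # a list of months rather than a single int

# isin() tests membership element by element, so it works for a list of
# any length.
is_base = index.month.isin(base_period)
```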
Line 191 in 4e441a7
I think index_calculator may need to support multiple scenarios:
Consider a sensible way of implementing this - might need some tests first!
Line 80 in cf0df3a
Add a .fillna(base_prices) method to cover the NaNs created by the shift.
Be mindful that this is changing in impute_base_prices too, but it's covered there already with the .fillna(start_prices).
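A sketch of the suggested fill on made-up data. The shift of one period is an assumption, since the surrounding code is not shown here.

```python
import pandas as pd

base_prices = pd.Series([10.0, 11.0, 12.0])

# Shifting leaves a NaN in the first position; filling from the unshifted
# series puts the original first value back.
shifted = base_prices.shift(1).fillna(base_prices)
```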