
gettsim's Introduction




GETTSIM

GETTSIM provides a depiction of the German Taxes and Transfers System that is usable in a wide array of research applications, ranging from complex dynamic programming models to detailed microsimulation studies.

GETTSIM is implemented in Python, thereby achieving both user-friendliness and flexibility. All features are extensively tested.

You can install GETTSIM via conda with

$ conda install -c conda-forge gettsim

The documentation is available at https://gettsim.readthedocs.io. If you want to use GETTSIM or help out in its development, feel free to get in touch! The best ways to do so are opening an issue if you find a bug or something does not work as expected, or joining our Zulip Chat at https://gettsim.zulipchat.com.

Initiated by

IZA DIW ifo Institute ZEW Universität Bonn

Universität Kassel Ludwig-Maximilians-Universität München Universität Mannheim Freie Universität Berlin IAB

gettsim's People

Contributors

amageh, boryana-ilieva, christianzimpelmann, davpahl, effiehan, eric-sommer, hmgaudecker, jakobwegmann, janosg, jhermann99, juergenwiemers, lars-reimann, lauragergeleit, lillyfischer, m-pannier, maxblesch, mimmesberger, mj023, mjbloemer, nafetsk, paulinaschroeder, pre-commit-ci[bot], schra, si-pf, sofyaakimova, tebackh, timmens, tobiasraabe

gettsim's Issues

Actually run CI

Looking back at this I just realised that the Azure pipeline was never run. Are we missing some trigger or so?

Move zve_list to `policy_for_date` and make it year-dependent

Bug description

In #61, the definition of zve_list was not moved into policy_for_date and is still in test_favorability_check. It also must not be in calculate_tax_transfers. Moreover, it needs to be year-sensitive.

Solution

if year >= 2009:
    tb["zve_list"] = ["nokfb", "kfb", "abg_nokfb", "abg_kfb"]
else:
    tb["zve_list"] = ["nokfb", "kfb"]

extend modelling of vorsorgeaufwendungen

Current and desired situation

By means of Vorsorgeaufwendungen, taxpayers can deduct a fair share of their social security contributions. This is a major tax deduction and needs to be properly modelled. In 2004 and 2010, legislation was changed. So far, we only have implemented the ruling since 2010. For earlier years, we don't deduct them yet.

Proposed implementation

Three functions are needed.

  • ruling before 2004
  • ruling between 2004 and 2010
  • a function comparing these two rulings and the existing ruling since 2010 against each other, choosing the most profitable one.
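
The comparison step could look roughly like the following sketch. The ruling functions here are placeholders with made-up formulas (they are not the legal rules), and the names `most_favourable_deduction` and `rulings` are hypothetical.

```python
from typing import Callable, Dict, Tuple

# Hypothetical signature: each ruling maps a person's data to a deductible amount.
Ruling = Callable[[dict], float]


def most_favourable_deduction(
    person: dict, rulings: Dict[str, Ruling]
) -> Tuple[str, float]:
    """Return (name, amount) of the ruling that yields the largest deduction."""
    amounts = {name: ruling(person) for name, ruling in rulings.items()}
    best = max(amounts, key=amounts.get)
    return best, amounts[best]


# Placeholder rulings with MADE-UP formulas, for illustration only.
rulings = {
    "before_2004": lambda p: 0.10 * p["contributions"],
    "2004_to_2010": lambda p: 0.20 * p["contributions"],
    "since_2010": lambda p: 0.15 * p["contributions"],
}

name, amount = most_favourable_deduction({"contributions": 1000.0}, rulings)
```

Keeping the rulings in a dictionary means a further ruling can be added without touching the comparison itself.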

Considered alternatives

  • None really :)

Kinderzuschlag used to be paid only 36 months

Bug description

We assign Kinderzuschlag if the household is eligible. From 2005 to 2008, however, it was paid for at most 36 months.

Possible Solution

Do not pay Kinderzuschlag at all from 2005 to 2008. This is parallel to the treatment of "Unterhaltsvorschuss" prior to 2017. This would require a small change in policies_for_date.
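
A minimal sketch of that rule, assuming the eligibility-based amount has already been computed elsewhere. The function name and arguments are hypothetical and do not reflect the actual layout of policies_for_date.

```python
def kinderzuschlag_paid(year: int, eligible_amount: float) -> float:
    """Return the Kinderzuschlag that is actually assigned in `year`."""
    # 2005-2008: the benefit was limited to 36 months, so the proposal
    # above is to assign nothing at all in these years.
    if 2005 <= year <= 2008:
        return 0.0
    return eligible_amount
```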

Function out- and input variables

Current and desired situation

I added a module in which all the function outputs are saved. We could do the same for all the input variables. Then tax_transfer.py and the tests could both reference these variables, giving us an aligned structure.

Proposed implementation

We could save the variable names in an extra module.

Considered alternatives

  • Save them in the config file.
  • Create some yaml or json file.
  • Save them next to the individual function code.

Convert test data to yaml.

Current and desired situation

  • The tests require ods-files which can only be read with a pip package which blocks making gettsim conda installable. #61
  • We have five data formats (csv, json, yaml, ods, xlsx). json can be replaced with yaml.
  • Consider tackling #63 along the way.

Proposed implementation

Discussing

Considered alternatives

Discussing

Set up documentation

Aim: Get a working version sometime in August, no need to be perfect -- this will change a lot in the long run.

Use pytest.fixture to provide test data.

Current and desired situation

The functions in auxiliary_test_tax.py, and any similar ones elsewhere, should be implemented as pytest fixtures and collected in a conftest.py.

Proposed implementation

Left out, but can be added for clarification.
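
For clarification, a fixture-based setup might look like the sketch below. The helper `make_household` and its keys are illustrative and not taken from auxiliary_test_tax.py.

```python
# conftest.py (sketch)
import pytest


def make_household():
    """Build a small input record; the keys are illustrative only."""
    return {"hid": 1, "age": 40, "m_wage": 2000.0}


@pytest.fixture
def household():
    # pytest passes the return value to every test that has an
    # argument called `household`.
    return make_household()


# test_tax.py (sketch)
def test_wage_is_positive(household):
    assert household["m_wage"] > 0
```

Fixtures defined in a conftest.py are discovered automatically by pytest for all tests in the same directory and below, so the test modules do not need to import them.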

Considered alternatives

  • No alternative.

Develop strategy for endogenizing program takeup

In the future, there will be a state variable (0/1) on whether or not a household applies for benefits. This should be treated accordingly in the benefit functions. Alternatively, you would (not) call the benefit functions at all and assign a zero value instead.
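
The first variant could be sketched as follows, assuming the entitlement has already been computed. The names `applies` and `benefit_with_takeup` are hypothetical.

```python
def benefit_with_takeup(applies: int, entitlement: float) -> float:
    """Pay the entitlement only if the household applies (state variable 0/1)."""
    if applies not in (0, 1):
        raise ValueError(f"Take-up state must be 0 or 1, got {applies!r}")
    return entitlement if applies == 1 else 0.0
```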

Migrate Gitlab issues

  • Given the amount, probably quickest to do by hand rather than using some automatization tool
  • Also some editing seems useful - divide up things like No. 17, English only (No. 14 - income vs. wage tax), no lists as in No 10.

Add the wohnbedarf array as parameters

While we are cleaning params here, do we want to add the wohnbedarf array into param.yaml here? Has also to do with alg2.

Originally posted by @Eric-Sommer in #69 (comment)

This one?

Yes. But in a separate PR.

I think these are the parents share for the housing cost (only used for KiZ, right?).

There are probably multiple approaches to add these as parameters:

At ifo we do not use these numbers from the KiZ Merkblätter. They are functions from other parameters defined by a law and published in the Existenzminimumbericht.

Or is there a specific source for this array?

Originally posted by @mjbloemer in #69 (comment)

Set up development workflow

Things like required reviews etc. over at OpenSourceEconomics / estimagic etc. look very useful to me. Now that we have ported everything from the old Gitlab repository, we can also be a bit more defensive on merging things.

@tobiasraabe, @janosg: Do you have any workflow written up somewhere? We should discuss this eventually and I'll set up the Github Project accordingly.

Calculate Lohnsteuer

We want to implement the option to simulate taxes as Lohnsteuer rather than as Einkommenssteuer, as is currently the case. The following discussion lays the foundation:

Eric Sommer:

As of now, our income tax calculation mimics the situation after the household filed the tax declaration and received the rebate. This is, however, not the same as what the household pays when receiving wages. This is a fundamental choice one has to make.

  1. Simulate Income Tax
    • income from all sources, compare tax due to child benefit payments
  2. Simulate withholding tax (Lohnsteuer) on earnings
    • taxable earnings are wages minus
      • Werbungskostenpauschale (2019: 1000€)
      • Sonderausgabenpauschale (2019: 36€)
      • Vorsorgepauschale (similar, but not identical to "Vorsorgeaufwendungen")
    • child tax credit is irrelevant for Lohnsteuer, but not for Soli!
    • everybody receives child benefit.
    • for the remaining income sources, calculate income taxes as before.
    • There is no income splitting for married couples. Instead, the basic allowances might be distributed among couples ("Steuerklassen")

Currently, (1) is implemented, but there might be good reasons to opt for (2) as it more closely resembles the actual disposable income people are facing. It is questionable whether everybody anticipates the future tax rebate. One problem is that "Steuerklassen", which have a major impact on disposable income, are usually not available and would need to be assumed. A second problem is that withholding taxes are attributed to the individual. In order for them to have an impact on labor supply, one would need to drop the unitary household model and think about how common income is distributed among couples.

On a side note, the withholding tax is the relevant item when calculating benefit amounts for ALG2 and Housing Benefit.
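
The taxable earnings for option (2) can be sketched as below, using the 2019 lump sums quoted in the list. This assumes annual amounts, takes the Vorsorgepauschale as a given input rather than computing it, and floors the result at zero; all three are assumptions of the sketch.

```python
WERBUNGSKOSTENPAUSCHALE_2019 = 1000.0  # Euro, from the list above
SONDERAUSGABENPAUSCHALE_2019 = 36.0  # Euro, from the list above


def lohnsteuer_base(annual_wage: float, vorsorgepauschale: float) -> float:
    """Taxable earnings for withholding tax: wages minus the lump sums."""
    deductions = (
        WERBUNGSKOSTENPAUSCHALE_2019
        + SONDERAUSGABENPAUSCHALE_2019
        + vorsorgepauschale
    )
    # Assumption: taxable earnings cannot become negative.
    return max(annual_wage - deductions, 0.0)
```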

HMG:

Thanks for the heads-up! Tricky thing indeed, but I do not think that the decision should be that fundamental. We should be able to handle both cases in the code eventually, picking whichever one is relevant for a project.

We might even be mixing both -- think of female labour supply with tax classes 3 (husband) / 5 (wife) -- she will look at a huge marginal tax rate when looking at her wage bill, but things might look much nicer when filing the tax return.

As for data on Steuerklassen: We might not get the joint distribution of them with all relevant variables, but we can always get marginals (and some joints) from other data sets. That income tax database for sure, but maybe even Microcensus and EVS have something?

Eric Sommer:

she will look at a huge marginal tax rate when looking at her wage bill, but things might look much nicer when filing the tax return.

Presumably, but only if she gets her share of the tax rebate (=> Intra-household bargaining).

EVS and Microcensus do not have information on tax classes (Steuerklassen). SOEP asked about them in 2016, but there are lots of unrealistic cases (e.g. 3/4 combinations).

inconsistent treatment of `tb_pens`

Bug description

tax_transfer() accepts that no tb_pens is handed over. However, if you do this, the pension function breaks, because the absence of tb_pens is never checked for.

Suggested Solution

As pensions will be crucial for long-term analyses and their calculation does not make much sense without these parameters, I suggest making tb_pens mandatory.
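
One way to enforce this is an explicit check at the start of the calculation. This is a sketch under the assumption that a missing tb_pens arrives as None; the helper name is hypothetical.

```python
def require_tb_pens(tb_pens):
    """Fail early with a clear message if the pension parameters are missing."""
    if tb_pens is None:
        raise ValueError(
            "tb_pens is mandatory: pensions cannot be calculated without it."
        )
    return tb_pens
```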

Docs: How to contribute sections

Before we do anything like API docs, we should outline the workflow, this will be more stable...

Both for contributors (remember we will have many newbies, so maybe even call a section "What to do in case of errors" and then explain in detail how to open an issue) and for developers (always open PR, request review, rebase on current master and squash merge).

Add features for taxable income (zve function)

  • Model all seven income sources separately. This means to split the current gross_e1 to differentiate between income from farming, business, self-employment and freie Berufe.

  • If this is done, check which income sources are relevant for which benefits (Elterngeld, ALG I, Wohngeld, ALG II,...). Edit: This is obsolete as you don't need the income definitions from the income tax, but rather the actual gross incomes (e.g. self-employment and employment income for Elterngeld)

  • Lohnsteuer: Model withholding tax depending on Steuerklasse. (see also #18)

  • Progressionsvorbehalt: There are a couple of income items (e.g. Elterngeld, Unemployment Benefit) which are not taxed, but they increase the individual tax rate. Not yet modelled. (see #126)

  • Taxable Income: Model the most important deductions (fiscal-wise), e.g. Entfernungspauschale, Unterhaltsfreibetrag.

Automated parameter documentation

Current and desired situation

Currently the param.xls gives a relatively good overview of all the parameters over time. With issues of xls files and other aspects raised in #31 and the move to csv files (see #31 (comment)) for the parameters there is scope for a different approach on automated parameter documentation.

Proposed implementation

Add a script that replicates a param.xls file as well as a PDF documentation with nice graphs (example see #31 (comment)) or a nice website/html with all the parameters.

Considered alternatives

  • Ignore this, because the csv file as discussed in #31 is already easy to read

Fill up documentation table on user data columns

Current and desired situation

The tables for required input columns and for output columns are hardly filled in the second column.

It would be useful for picking up work on GEP 1 again and even more for my soon-to-be-drafted GEP 2 on internal data representation to have an overview of the actual names. @MaxBlesch, would be great if you could do that soonish with @Eric-Sommer's input as required.

Proposed implementation

  • Either just fill up the table for now.
  • In the long run, we probably want to have a YAML file that may have additional information structure (e.g., grouping of variables, units of measurement, ...)

I have no strong opinion for the moment what is easier; if it is simple to create a table from the YAML in the docs (is it, @tobiasraabe ?), maybe do it right immediately. Also saves the formatting stuff...

Add ALG II transfer withdrawal 2005-01-01 to 2005-09-30

Correct treatment of 'non-standard' households

Bug description

As soon as a household does not fit into the four classical categories (Single, Single Parent, Couple, Couple with kids), things get complicated when it comes to the benefit system in particular. We haven't properly sorted this issue out yet. Examples include:

  • how to treat the additional need for single parents in case there are more adults? In other words, how should a single parent be defined?
  • Wohngeld and ALG2 apply different household concepts. You may have the core family eligible for ALG2, while there are other persons under the same roof eligible for Wohngeld (see discussion, in particular section 5.4). This affects both the calculation of claims and a proper evaluation of different claims within a household.
  • The treatment of pensioners' eligibility for benefits is related to that. They are eligible for Wohngeld, but not for ALG2, although "Grundsicherung im Alter" basically pays the same amount. Pensioners are also hardly covered in the tests as of now.

This issue can be solved by looking in particular at exemplary cases from the literature and applying them as test cases.

Doublecheck tests

Bug description

In an earlier PR I came across some strange values in the test data. Therefore I included TODOs in

We need to double-check the test data to determine whether this is due to rounding errors, to incorrect formulas used to calculate the data, or to errors in the functions.

Remove some cruft and set up coverage statistics

A look at the coverage statistics reveals that:

unterhaltsvorschuss ignores interaction with other parameters

The current modeling of Unterhaltsvorschuss is not precise, and interactions with child benefits are ignored.

Currently there are fixed values in the param file for the age groups, while in reality the calculation of these values follows a specified function of existing parameters.

E.g. in 2019 this currently results in 212 Euro for a child in the second age group (which is the parameter uhv11).

Actually the calculation is: Kindliches Existenzminimum / 12 - Kindergeld.

E.g. for 2019-01-01 4872/12 - 194 = 212.

Now, if there is a change in Kindergeld in status quo or in a counterfactual, we would want to keep the interaction alive.

Actually from 2019-07-01 there was an increase in Kindergeld by 10 Euro and the Unterhaltsvorschuss automatically decreased by this, so now it is for the child in the middle age group 4872/12 - 204 = 202. See e.g. https://www.berit-sander.de/2019/unterhaltsvorschuss-ab-01-juli-2019/

For the calculation of the amount § 7 Abs. 1 Satz 1 UhVorschG links to § 1612a BGB (Mindestunterhalt). There are factors 0.87 and 1.17 for the first and the third age group which could be considered as parameters.

Kindergeld is already a parameter, and the Kindliches Existenzminimum can be found in the Existenzminimumbericht or calculated from the Kinderfreibetrag if Betreuungs- und Ausbildungsbedarfe can be added as parameters.
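
The formula above can be written as a small function. This is a sketch: it assumes the age-group factor scales the monthly minimum before Kindergeld is subtracted, and it ignores any rounding rules; the function and argument names are hypothetical.

```python
def unterhaltsvorschuss(
    existenzminimum_annual: float,
    kindergeld_monthly: float,
    factor: float = 1.0,
) -> float:
    """Monthly Unterhaltsvorschuss: Existenzminimum / 12 - Kindergeld.

    `factor` is 1.0 for the middle age group; the text mentions 0.87 and
    1.17 for the first and third age groups (assumed to scale the minimum).
    """
    return factor * existenzminimum_annual / 12 - kindergeld_monthly


# The two 2019 cases from the text, for the middle age group:
uhv_2019_jan = unterhaltsvorschuss(4872, 194)  # 212.0
uhv_2019_jul = unterhaltsvorschuss(4872, 204)  # 202.0
```

With this structure, a counterfactual change in Kindergeld automatically feeds through to the Unterhaltsvorschuss.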

wohngeld fails for SOEP data

Bug description

Trying the real SOEP data again led to the following error:

File "/data/homes/iza6354/gettsim/gettsim/benefits/wohngeld.py", line 22, in wg
household["wohngeld_basis"] = apply_wg_formula(household, tb, hhsize)
  File "/data/homes/iza6354/gettsim/gettsim/benefits/wohngeld.py", line 244, in apply_wg_formula
+ (tb[f"wg_c_{hhsize}p"] * household["Y"])
KeyError: 'wg_a_13p'

Reason + Solution

This is a household with 13 members, for which no parameters are defined. Therefore, one needs to pass min(12, hhsize) to apply_wg_formula().
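
A sketch of the capped lookup, using the key pattern visible in the traceback (wg_a_13p, wg_c_{hhsize}p). The helper name and the parameter value in the example are placeholders.

```python
MAX_HHSIZE_WITH_PARAMS = 12  # largest household size with defined parameters


def wg_param(tb: dict, letter: str, hhsize: int) -> float:
    """Look up a Wohngeld parameter, capping the household size at 12."""
    capped = min(MAX_HHSIZE_WITH_PARAMS, hhsize)
    return tb[f"wg_{letter}_{capped}p"]


# Placeholder parameter value, for illustration only.
tb_example = {"wg_a_12p": 0.5}
value = wg_param(tb_example, "a", 13)  # uses the 12-person parameter
```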

Favorability Check with capital income tax

Current and desired situation

Since 2009, capital income is taxed with a flat rate of 25% (plus soli) (Abgeltungssteuer). In principle, tax authorities are supposed to tax capital income with a lower rate if the personal tax rate of all incomes would be lower than 25%. In order to do this, taxpayers have to claim their capital income on a separate sheet. Whether this happens on a regular basis is unclear.

Proposed implementation

If we want to implement this, we need to extend tb["zve_list"] with the taxable income definitions including capital income, i.e. zve_abg_nokfb and zve_abg_kfb. Renaming abg to e.g. cap might also make sense here.

Considered alternatives

leave as is because it's not relevant.

generic hypothetical household data analysis

One of the main applications of the model will presumably be the analysis of 'typical' households, as microdata access is restricted outside of academia. For this reason, we need at some point a framework for preparing such data and making them compatible with the tax_transfer module.

The building blocks can be found in hypo.py (in particular create_hypo_data()) and would need to be made more generic. In particular, the user should be able to decide upon

  1. mandatory
    1. Type of Household = ['single', 'single_parent', 'couple', 'couple with children']
    2. Amount/Range of monthly income
    3. number of kids
  2. optional
    1. Age for each member (default: 40 for adults, 6 for kids)
    2. Type of Dwelling (Default: 'rent')
    3. Monthly Rent / Housing Expenses (Default: average value from BA statistics)
    4. Monthly heating costs (Default: average value from BA statistics)
    5. Employment Type (default: 'Employee')
    6. In case of couples: Single-Earner or fixed income of one partner?
    7. East or West Germany
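
The options above could be collected in a small container like the following sketch. The class and field names are hypothetical and not taken from hypo.py. Rent and heating defaults are left as None because the BA averages are not given here, and West Germany as the default region is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

HOUSEHOLD_TYPES = ("single", "single_parent", "couple", "couple with children")


@dataclass
class HypoHousehold:
    # Mandatory choices
    household_type: str
    monthly_income: float
    n_children: int
    # Optional choices with the defaults named in the list above
    adult_age: int = 40
    child_age: int = 6
    dwelling: str = "rent"
    monthly_rent: Optional[float] = None  # would default to a BA average
    monthly_heating: Optional[float] = None  # would default to a BA average
    employment_type: str = "Employee"
    east: bool = False  # assumption: West Germany unless stated otherwise

    def __post_init__(self) -> None:
        if self.household_type not in HOUSEHOLD_TYPES:
            raise ValueError(f"Unknown household type: {self.household_type}")

    def members(self) -> List[Dict[str, object]]:
        """Expand the household into one record per person."""
        n_adults = 2 if self.household_type.startswith("couple") else 1
        adults = [{"age": self.adult_age, "child": False} for _ in range(n_adults)]
        kids = [{"age": self.child_age, "child": True} for _ in range(self.n_children)]
        return adults + kids
```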

API for children

There are a couple of issues which would make the setup of a custom data set as input to tax_transfer() cumbersome at the moment. Let's collect them here.

  • There are many variables counting the number of children in different age brackets: [child_num, child3_6_num, child7_13_num, ...]. This is due to the scattered system of benefits in Germany. To get rid of this, each function could compute the necessary values itself, based on age of each household member. This would also reduce the amount of data carried from one function to the next.
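
Computing such counts on the fly is short. The sketch below assumes that the bounds in names like child3_6_num are inclusive ages; the helper name is hypothetical.

```python
from typing import Iterable


def count_children(ages: Iterable[int], lower: int, upper: int) -> int:
    """Count household members whose age lies in [lower, upper]."""
    return sum(lower <= age <= upper for age in ages)


ages_in_household = [2, 4, 6, 7, 13, 15]
child3_6_num = count_children(ages_in_household, 3, 6)
child7_13_num = count_children(ages_in_household, 7, 13)
```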

Split up param.yaml by group and rename groups

Current and desired situation

param.yaml is a monster and unreadable. We have already agreed to split it up in smaller files. This makes it concrete and suggests renaming groups to something comprehensible by the non-initiated.

Proposed implementation

Use something close to the current groups (#56 (comment)) with more sensible and more readable names. This table is ordered by proposed name so that it is clear which ones I propose to merge:

Current            Proposal
abgeltung          abg_st
alg                arb_los_geld
alg2               arb_los_geld_2
alh                arb_los_hilfe
kinderfreibetrag   ek_st_abzuege
zve                ek_st_abzuege
tarif              ek_st_tarif
tax_sched          ek_st_tarif
eg                 erziehungsgeld
kindergeld         kindergeld
kiz                kinderzuschlag
ust                mw_st
rente              rente
soc_ins_contrib    soz_vers_beitr
sgbxii             sozialhilfe
sh                 sozialhilfe
uhv                unterh_vorsch
wohngeld           wohngeld

Please check carefully before we go for this (@mjbloemer, @Eric-Sommer, @stichnoth, your opinions would be especially great to have!) - I may have missed something or misunderstood what is actually meant by a parameter.

Does it make sense to include abgeltungssteuer as a parameter of ek_st_tarif? It seems weird to have a group with just one element.

Missing pytest dependency in conda package

Bug description

GETTSIM's conda package is missing the dependency on pytest

To Reproduce

$ conda create -n tmp
$ conda activate tmp
$ conda install -c gettsim gettsim
$ python
Python 3.7.3 | packaged by conda-forge | (default, Jul  1 2019, 21:52:21) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gettsim
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxx/miniconda3/envs/gst/lib/python3.7/site-packages/gettsim/__init__.py", line 1, in <module>
    import pytest
ModuleNotFoundError: No module named 'pytest'

Add unemployment benefit(alg2 func) features

  • Sozialhilfe acc. to SGB XII for disabled persons is not yet modelled.

  • Armutsgewöhnungszuschlag: Until 2010, you got something on top if you went from Unemployment Benefit to ALG2.

  • BuT (Bildungs- und Teilhabepaket): 120€ per child and year for ALG2 recipients to spend for school stuff. params are there.

  • ALG2 asset test: There are additional allowances for monetary assets like life insurances which we do not model yet. params are there however.

API for tax units and household types

Current and desired situation

The input data contain a number of demographic information derived from individual information. Examples include:

  1. The definition of tax_units and their size.
  2. The number of children in a given age (see #30) and the definition of a child itself.
  3. The household type (Single, Single Parent, Couple without kids, Couple with kids, [other]). This also affects dummy variables, such as married and zveranl.

As these are currently dealt with in the data preparation, they are a black box to GETTSIM users. Moreover, it leads to more required input variables and thus more burden on the user.

Proposed implementation

Create functions for each of the items above which return the variables hhsize, hhsize_tu, child_num, hhtype etc. based on the following variables only:

Variable      Explanation
pid           Personal ID
female
age
ineducation
mother_id     pid of mother
father_id     pid of father
child_id      pid of own child
partner_id    pid of partner
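
A rough sketch of such a function follows. It assumes a deliberately simplified definition of a child (anyone whose mother or father lives in the same household), which is itself one of the open questions here, and the returned hhtype labels are hypothetical.

```python
from typing import Dict, List


def household_demographics(members: List[dict]) -> Dict[str, object]:
    """Derive hhsize, child_num and hhtype from pid-based links only."""
    pids = {m["pid"] for m in members}
    # Simplification: a child is anyone with a parent in the same household.
    child_pids = {
        m["pid"]
        for m in members
        if m.get("mother_id") in pids or m.get("father_id") in pids
    }
    adults = [m for m in members if m["pid"] not in child_pids]
    has_couple = any(m.get("partner_id") in pids for m in adults)

    if len(adults) == 1:
        hhtype = "single_parent" if child_pids else "single"
    elif len(adults) == 2 and has_couple:
        hhtype = "couple_with_kids" if child_pids else "couple"
    else:
        hhtype = "other"
    return {"hhsize": len(members), "child_num": len(child_pids), "hhtype": hhtype}


family = [
    {"pid": 1, "partner_id": 2},
    {"pid": 2, "partner_id": 1},
    {"pid": 3, "mother_id": 1, "father_id": 2},
]
result = household_demographics(family)
```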

Considered alternatives

leave as is, which is not feasible in the long run.

Monetary unit of parameters and data set

I think every monetary value in param.xls is currently in Euro; DM values are converted to Euro (which makes it just a little bit harder when comparing to the written law).

Perfect. I have no strong opinion on whether we want to convert those values now, but eventually DM would seem to make more sense to me.

There is one exception, namely the tax tariff parameters before 2002. The others are in Euro; initially because SOEP delivers all incomes in Euro as well. I'd prefer sticking to Euro all the way. If you do a long-term analysis, you'd report in Euro anyway. It's easier to convert incomes rather than parameters, as there are lots of parameters (shares) which must not be converted.

I think we are on the same page in the sense that internally, we will do everything in Euros.

I disagree on parameters being more difficult to convert. That may be the case in Stata if you have to store all these animals in macros, but here?

def convert_dm_to_euro(dm):
    # Fixed official conversion rate: 1 Euro = 1.95583 DM.
    return dm / 1.95583

if unit == "Euro":
    val = raw_val
elif unit == "DM":
    val = convert_dm_to_euro(raw_val)
else:
    raise ValueError(f"Monetary Unit unknown: {unit}")

So this is only about how to store the input parameters and I think the idea of sticking close to the law is good there. As I wrote previously, whether we want to do that right away or do that at some later point (please open an issue in that case once we close this one, @mjbloemer lest we forget) I do not care much.

What we want to support in terms of input data is a wholly different question.

Originally posted by @hmgaudecker in #31 (comment)

Decide on variables

Current and desired situation

We need to decide which variables are mandatory and which can either be assumed or left out because they appear only in a non-relevant calculation (e.g. m_imputedrent in gross_income). Furthermore, some of the current input variables can be computed from basic household data (hhsize, child_num_tu, etc.).

Proposed implementation

Check for missing variables at the beginning and either alert the user or fill in assumed values, while still alerting the user.
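
A minimal sketch of such a check, with the data held as a dictionary of column name to list of values. Which variables are mandatory and which fill values are sensible is exactly what this issue has to decide, so MANDATORY and DEFAULTS below are placeholders.

```python
import warnings

MANDATORY = {"pid", "age"}  # placeholder choice
DEFAULTS = {"m_imputedrent": 0.0}  # placeholder fill value


def prepare_columns(data: dict) -> dict:
    """Fail on missing mandatory columns; fill optional ones with a warning."""
    missing = MANDATORY - set(data)
    if missing:
        raise KeyError(f"Missing mandatory variables: {sorted(missing)}")

    n_rows = len(next(iter(data.values())))
    out = dict(data)
    for name, default in DEFAULTS.items():
        if name not in out:
            warnings.warn(f"'{name}' not supplied; assuming {default}.")
            out[name] = [default] * n_rows
    return out
```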

Considered alternatives

  • Make all variables mandatory

Convert remaining Excel-files to YAML

Current and desired situation

We mostly removed the dependence on Excel, but pensions.xlsx is still around.

Proposed implementation

Same as in #54

  • Convert pensions.xlsx to YAML format
  • Adjust data loading wherever we call pd.read_excel
  • Remove xlrd as a dependency in environment.yml and meta.yaml
  • Remove lines 28 and 29 (starting with # household and # wg.to_excel) in wohngeld.py
