opensourceeconomics / respy

Framework for the simulation and estimation of some finite-horizon discrete choice dynamic programming models.

Home Page: http://respy.readthedocs.io

License: MIT License

Python 99.97% Shell 0.02% Batchfile 0.01%

economics structural-microeconometrics markov-decision-processes

respy's Introduction

respy

Note: respy is no longer under active development and has been only passively maintained since 2021. Check out our GitHub organization to find projects that are currently under development.

respy is an open source framework written in Python for the simulation and estimation of some finite-horizon discrete choice dynamic programming models. The models that respy can currently represent are called Eckstein-Keane-Wolpin models (Aguirregabiria and Mira (2010)).

What makes respy powerful is that it allows researchers to build and solve structural models in weeks or months whose development previously took years. The design of respy allows the researcher to flexibly add the following components to her model.

Any number of discrete choices (e.g., working alternatives, schooling, home production, retirement) where each choice may yield a wage, may allow for experience accumulation and can be constrained by time, a maximum amount of accumulated experience or other characteristics.
Conditioning of individuals' decisions on their previous choices or their labor market history.
A finite mixture with any number of subgroups to account for unobserved heterogeneity among individuals, as developed by Keane and Wolpin (1997).
Any number of time-constant observed state variables (e.g., ability measures (Bhuller et al. (2020)), race (Keane and Wolpin (2000)), demographic variables) found in the data.
Correction of the estimation for measurement error in wages, either using a Kalman filter in maximum likelihood estimation or by adding the measurement error in simulation-based approaches.

You can install respy via conda with

$ conda config --add channels conda-forge
$ conda install -c opensourceeconomics respy

Please visit our online documentation for tutorials and other information.

As respy relies heavily on pandas, you might also want to install its recommended dependencies <https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html#recommended-dependencies> to speed up internal calculations done with pd.eval <https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html#expression-evaluation-via-eval>.

$ conda install -c conda-forge bottleneck numexpr

Citation

respy was completely rewritten in the second release and evolved into a general framework for the estimation of Eckstein-Keane-Wolpin models. Please cite it with

@Unpublished{Gabler2020,
  Title  = {respy - A Framework for the Simulation and Estimation of
            Eckstein-Keane-Wolpin Models.},
  Author = {Janos Gabler and Tobias Raabe},
  Year   = {2020},
  Url    = {https://github.com/OpenSourceEconomics/respy},
}

Before that, respy was developed by Philipp Eisenhauer and provided a package for the simulation and estimation of a prototypical finite-horizon discrete choice dynamic programming model. At the heart of this release is a Fortran implementation with Python bindings which uses MPI and OMP to scale up to HPC clusters. It is accompanied by a pure Python implementation as teaching material. If you use respy up to version 1.2.1, please cite it with

@Software{Eisenhauer2019,
  Title  = {respy - A Package for the Simulation and Estimation of a prototypical
            finite-horizon Discrete Choice Dynamic Programming Model.},
  Author = {Philipp Eisenhauer},
  Year   = {2019},
  DOI    = {10.5281/zenodo.3011343},
  Url    = {https://doi.org/10.5281/zenodo.3011343}
}

We appreciate citations for respy because it helps us to find out how people have been using the package and it motivates further work.

References

Aguirregabiria, V. and Mira, P. (2010). Dynamic Discrete Choice Structural Models: A Survey. Journal of Econometrics, 156(1): 38-67.

Bhuller, M., Eisenhauer, P. and Mendel, M. (2020). The Option Value of Education. Working Paper.

Keane, M. P. and Wolpin, K. I. (1994). The Solution and Estimation of Discrete Choice Dynamic Programming Models by Simulation and Interpolation: Monte Carlo Evidence. The Review of Economics and Statistics, 76(4): 648-672.

Keane, M. P. and Wolpin, K. I. (1997). The Career Decisions of Young Men. Journal of Political Economy, 105(3): 473-522.

Keane, M. P. and Wolpin, K. I. (2000). Eliminating Race Differences in School Attainment and Labor Market Success. Journal of Labor Economics, 18(4): 614-652.

respy's Issues

NAN Return Value

This should not happen!
mode.respy.txt
est.respy.log
est.respy.txt

Responsible Estimation

We need a section in the documentation that illustrates how to actually estimate a model on a simulated dataset using the fit() function. It should emphasize the parts of the initialization file that are relevant here but were not already discussed for the simulation step, for example optimizer options, scaling, and derivatives. Please feel free to reach out to @janosg with questions in the process.

Add information on average time at HOME to simulation output

... just for completeness, it allows for easier creation of some tables in paper and manuscript.

Mean shifts and covariance

How should we think about the fixed covariance matrix in light of mean shifts? Maybe it would be better to fix the correlation structure?

type ordering if equal intercept

The following init file only generates type 3 draws in a simulation, as the intercepts, which determine the ordering, are all identical and equal to zero. full.respy.txt

PRINTING in data.respy.sol

Starting state space creation

... finished

Starting calculation of systematic rewards

... finished

Starting backward induction procedure

... solving period 48 with ***** states

... solving period 47 with ***** states

Gradient Preconditioning

In rare instances, the value of the criterion function might be too large and is thus printed as a string. This occurred in the past when the gradient preconditioning encountered zero-probability observations. We now generate random initialization files with smaller gradient step sizes ... We need to revisit this and handle this case more explicitly by adding a minimum and maximum value. If this happens during the gradient calculation at the start values, then the scaling results in nonsense. We need to add a maximum value in addition to the minimum value, so that a -HUGE log likelihood can still be processed properly.
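One possible way to handle this, sketched here under assumptions: clip the criterion value into a finite range before it is printed or used for scaling. The bounds MIN_CRIT and MAX_CRIT and the helper clip_criterion are hypothetical names for illustration and are not part of respy; suitable bounds would depend on the output format.

```python
import math

# Hypothetical bounds, chosen only for illustration.
MIN_CRIT = -1e10
MAX_CRIT = 1e10


def clip_criterion(value, lower=MIN_CRIT, upper=MAX_CRIT):
    """Map a criterion value into [lower, upper].

    Infinite values are mapped to the nearest bound. NaN is rejected
    because it has no meaningful position relative to the bounds.
    """
    if math.isnan(value):
        raise ValueError("criterion value is NaN")
    return max(lower, min(upper, value))


clipped = clip_criterion(float("-inf"))  # mapped to MIN_CRIT
```

With this in place, a -HUGE value still yields a finite number that fits a fixed-width print format.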

WAGE INTERNAL

The simulated dataset contains a variable WAGE_INTERNAL that writes out the full wage and not just with two-digit precision like WAGE. This is useful for unit testing some relationships in the data. However, I would prefer to write out the WAGE variable at full precision in the simulation case. This breaks the release tests, as the likelihood of identical samples is slightly altered.

merge coefficients ...

Merge coeffs_a and coeffs_b to allow for loops in the code.

Scripts Documentation

Add the use of scripts

test_integration_test_5

The set of admissible actions needs to be extended, very restrictive at the moment.

PEP 257 - Docstring Convention

Revisit the package for compliance with https://www.python.org/dev/peps/pep-0257/

Authorship Information

We need authorship information in the respy.init following this convention.

pytest warnings

In general, I want to turn off warnings if compiled for production.

respy/tests/test_integration.py::TestClass::()::test_15
/home/peisenha/ownCloud/office/workspace/software/repositories/organizations/restudToolbox/package/respy/clsRespy.py:266: ResourceWarning: unclosed file <_io.BufferedWriter name='solution.respy.pkl'>
pkl.dump(self, open(file_name, 'wb'))
/home/peisenha/ownCloud/office/workspace/software/repositories/organizations/restudToolbox/package/respy/clsRespy.py:266: ResourceWarning: unclosed file <_io.BufferedWriter name='solution.respy.pkl'>
pkl.dump(self, open(file_name, 'wb'))
/home/peisenha/ownCloud/office/workspace/software/repositories/organizations/restudToolbox/package/respy/clsRespy.py:266: ResourceWarning: unclosed file <_io.BufferedWriter name='solution.respy.pkl'>
pkl.dump(self, open(file_name, 'wb'))
/home/peisenha/ownCloud/office/workspace/software/repositories/organizations/restudToolbox/package/respy/clsRespy.py:266: ResourceWarning: unclosed file <_io.BufferedWriter name='solution.respy.pkl'>
pkl.dump(self, open(file_name, 'wb'))

-- Docs: http://doc.pytest.org/en/latest/warnings.html
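The ResourceWarning above is triggered by passing a file handle to pkl.dump that is never closed explicitly. A minimal sketch of one way to avoid it, using a context manager; the helper name store and the example payload are hypothetical and only serve the illustration:

```python
import os
import pickle as pkl
import tempfile


def store(obj, file_name):
    """Pickle obj to file_name and close the file handle deterministically."""
    with open(file_name, "wb") as out_file:
        pkl.dump(obj, out_file)


# Round trip in a temporary directory to show the usage.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "solution.respy.pkl")
    store({"num_periods": 3}, path)
    with open(path, "rb") as in_file:
        restored = pkl.load(in_file)
```

The with statement closes the file as soon as the block ends, so no open handle is left for the garbage collector to report.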

Notes from random review

I think the covariate any_exp is calculated incorrectly. Why the constraint that exp > 0?
I need to extend the code to allow the initial condition for the lagged activity to differ from schooling in the last period.
Is the wage equation always properly calculated? I am worried about period -1. The code does not seem to be consistent in iterating over periods starting at 0 or 1.
It seems that there is a low-impact bug regarding the treatment of periods in line 158 of simulate_fortran.f90. There I should pass in periods + 1, right?

debug setup

I would like the debug flag to also affect the types of messages that we show, for example only errors if not in debug mode and warnings as well otherwise.
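A minimal sketch of how such a switch could look with the standard library warnings module. The function name configure_warnings and the is_debug argument are assumptions for illustration, not existing respy names; errors raised as exceptions are unaffected by the filter in either mode.

```python
import warnings


def configure_warnings(is_debug):
    """Show all warnings in debug mode and suppress them otherwise."""
    # Drop any previously installed filters so the setting is unambiguous.
    warnings.resetwarnings()
    warnings.simplefilter("always" if is_debug else "ignore")


# Example: production mode, warnings are silenced.
configure_warnings(is_debug=False)
```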

structRecomputation fix in get_coefficients

This needs to be removed once the manuscript is published.

SIMULATION INFORMATION

Add a count for initial schooling shares; this allows for easier modification of initialization files.

TYPE COUNTS

There seems to be a problem with the type counts, there are too many shifts specified compared to the shares.

INITIAL SCHOOLING

Can I allow passing in shares that do not sum to one? This would make it easier to iteratively add initial schooling levels by copying the initial schooling shares in the simulated dataset.
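If this were allowed, one option would be to rescale the shares on input. The following is a sketch under that assumption; normalize_shares is a hypothetical helper and not part of respy.

```python
def normalize_shares(shares):
    """Rescale non-negative shares so that they sum to one."""
    if any(share < 0 for share in shares):
        raise ValueError("shares must be non-negative")
    total = sum(shares)
    if total <= 0:
        raise ValueError("at least one share must be positive")
    return [share / total for share in shares]


# Counts copied from a simulated dataset could be passed in directly.
rescaled = normalize_shares([2, 1, 1])
```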

respy-update

The order of education initial conditions is maintained. So, when the user specifies them first for 10 years, then for 11, and then for 9, it would be nice if these were sorted.
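A sketch of the requested ordering, assuming the initial conditions are held as two parallel lists of starting years and shares. The function sort_edu_spec and the argument name start are hypothetical; only the key share appears in the tracebacks reported elsewhere on this page.

```python
def sort_edu_spec(start, share):
    """Sort initial schooling levels ascending and keep shares aligned."""
    pairs = sorted(zip(start, share), key=lambda pair: pair[0])
    sorted_start = [level for level, _ in pairs]
    sorted_share = [weight for _, weight in pairs]
    return sorted_start, sorted_share


# The example from the issue: 10 years first, then 11, then 9.
ordered = sort_edu_spec([10, 11, 9], [0.5, 0.3, 0.2])
```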

Ambiguity Performance Measure

Please add a summary to data.respy.info after simulation. Currently, I always need to go back to the very large ambiguity log file.

Add Fortran coverage to coverage.io

space between coeff and sign

The processing of an initialization file fails when there is a space between the value and the sign of a coefficient.
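A minimal sketch of a more tolerant parser for a single value. The function parse_signed_value is hypothetical and does not reflect how respy actually reads initialization files; it only shows one way to accept whitespace between the sign and the digits.

```python
import re


def parse_signed_value(text):
    """Parse a float, tolerating whitespace between the sign and the digits."""
    # Remove whitespace that directly follows a leading sign, e.g. "- 0.5".
    cleaned = re.sub(r"^([+-])\s+", r"\1", text.strip())
    return float(cleaned)


value = parse_signed_value("- 0.5")
```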

Property Testing Results

property.respy.txt

The f2py tests seem to fail for ill-conditioned problems only.
The interpolation test takes very long to run. Also, the interpolation tests should be refactored to include flag_ambiguity=True if called explicitly from test_ambiguity.

Input Data

I should construct all derived information such as lagged activity and experiences simply inside the program ... I can still output it for simulated datasets for debugging and testing purposes.

Identifier as column name and index level

/home/peisenha/.envs/2.0.0.dev16/local/lib/python2.7/site-packages/respy/python/process/process_auxiliary.py:60: FutureWarning: 'Identifier' is both a column name and an index level.
Defaulting to column but this will raise an ambiguity error in a future version

num_rows

In fort_evaluate.f90, num_rows does not seem to be used and can thus be removed.

Testing Infrastructure

It would be nice to have a more formal process, e.g., pickled storage of failed seeds, which can be removed once investigated. Hypothesis package for property-based testing, contribute hour runs as open source contribution. There seems to be a lot of overlap with my needs for the test battery with random parameterizations. Also, I should think about using more pytest features, such as skipif, in my own run/property test setup. At the moment, I remove the test_parallels if IS_Parallel is not available.

Property Testing Failures

There were some issues for further investigation, which were all solved.

Amdahl's Law

Mention Amdahl's Law in the scalability exercises, see https://en.wikipedia.org/wiki/Amdahl%27s_law. There is a nice graph on slide 4 at http://insidehpc.com/2016/10/parallel-programming-efficiency/
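For reference, Amdahl's Law gives the theoretical speedup as 1 / ((1 - p) + p / n), where p is the fraction of the runtime that can be parallelized and n is the number of workers. A short sketch; the function name and the example numbers are illustrative only:

```python
def amdahl_speedup(parallel_fraction, num_workers):
    """Theoretical speedup according to Amdahl's Law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_workers)


# Illustrative scenario: half of the runtime parallelizable, two workers.
speedup = amdahl_speedup(0.5, 2)
```

As the number of workers grows, the speedup approaches 1 / (1 - p), so the serial part of the program bounds the gains from adding more workers.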

Changelog

We need to add a new CHANGELOG file following the convention here. Please add one just with the header information

IS_DETERMINISTIC evaluation of criterion function

This concept only makes sense in the case of a single type; otherwise the likelihood is never one ... This does not make any sense anymore either:

            # If there is no random variation in rewards, then the observed wages need to be
            # identical to their systematic components. The discrepancy between the observed
            # wages and their systematic components might be small due to the reading in of
            # the dataset (FORTRAN only).

Related to #70

SQL Naming Convention

https://launchbylunch.com/posts/2014/Feb/16/sql-naming-conventions/

TRAVIS-CI

It would be nice if we could also test the parallel implementation on TRAVIS-CI. This requires installation of MPICH. Also, we want to add a build matrix.

Simulation at new step

Add the ability to simulate from new points in est.respy.info without updating the model.respy.ini

preconditioning/scaling naming convention

The naming of preconditioning/scaling is not consistent throughout the code. For example, scaling now refers to the performance exercise and preconditioning to the preconditioning of the parameters of the criterion function. However, the function doing the recording is still named record_estimation_scaling.

add likelihood of simulated sample to simulation info

This eases reliability testing.

SLSQP Quality

The SLSQP implementation involves intentional zero divisions, which hinder the use of ffpe-trap=invalid as an option for the debug compilation. Maybe we can disable this for one particular file or find a better FORTRAN implementation.

sum of initial conditions

Traceback (most recent call last):
  File "/home/eisenhauer/.envs/structAmbiguity/bin/respy-modify", line 190, in <module>
    scripts_modify(*args)
  File "/home/eisenhauer/.envs/structAmbiguity/bin/respy-modify", line 84, in scripts_modify
    respy_obj = RespyCls(init_file)
  File "/home/eisenhauer/restudToolbox/package/respy/clsRespy.py", line 147, in __init__
    self.lock()
  File "/home/eisenhauer/restudToolbox/package/respy/clsRespy.py", line 199, in lock
    self._check_integrity_attributes()
  File "/home/eisenhauer/restudToolbox/package/respy/clsRespy.py", line 737, in _check_integrity_attributes
    np.testing.assert_almost_equal(np.sum(edu_spec['share']), 1.0)
  File "/home/eisenhauer/.envs/structAmbiguity/lib/python2.7/site-packages/numpy/testing/utils.py", line 539, in assert_almost_equal
    raise AssertionError(_build_err_msg())
AssertionError:
Arrays are not almost equal to 7 decimals
ACTUAL: 1.0000099999999998
DESIRED: 1.0

The test is too strict; it might fail when reading in valid shares.
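With its default of 7 decimals, assert_almost_equal requires the absolute difference to stay below 1.5e-7, which the value 1.0000099999999998 from the traceback violates. A sketch of a looser check using only the standard library; the helper shares_sum_to_one and the tolerance of 1e-4 are assumptions for illustration, not a recommendation from the respy authors.

```python
import math


def shares_sum_to_one(shares, abs_tol=1e-4):
    """Check that shares sum to one up to an absolute tolerance."""
    return math.isclose(sum(shares), 1.0, rel_tol=0.0, abs_tol=abs_tol)


# Shares that are off by about 1e-5, as in the traceback, are accepted.
is_valid = shares_sum_to_one([0.5, 0.50001])
```

The tolerance should reflect the precision with which shares are written to and read from the initialization file.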

ZeroDivision FORTRAN Compilation

# TODO: When revising the build process, I want to explore ways to
# impose the check for ZeroDivision errors in all modules besides SLSQP.

Informative Error Message when all parameters fixed

If I have an init file with all parameters either fixed or with bounds, then this throws an error. It should not if at least one parameter is free, even with bounds, for estimation.
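The intended behavior could be sketched as follows, assuming the fixed status of each parameter is available as a list of booleans. The function check_free_parameters and its argument are hypothetical names; bounds play no role in the check, so a bounded but free parameter counts as free.

```python
def check_free_parameters(is_fixed):
    """Return the number of free parameters or raise if there is none."""
    num_free = sum(1 for fixed in is_fixed if not fixed)
    if num_free == 0:
        raise ValueError(
            "Estimation requires at least one free parameter, "
            "but all parameters are fixed."
        )
    return num_free


# One fixed and two free parameters: estimation can proceed.
num_free = check_free_parameters([True, False, False])
```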

Reactivate steps

Add the ability to reactivate an evaluation in est.respy.log for est.respy.info to work with from there.

NAMESPACE

I would like to have a stricter sharing of global variables within the program unit. The variables required for the evaluation of the criterion functions should only be accessible there and not everywhere else.

Remove underscore in data column names

Estimation on Subset

We allow running estimations on a number of agents less than or equal to the number of estimation agents specified in the init file, to ease iteratively adding different initial conditions, etc. However, this case is not part of the random init file generation. This needs to be explicitly included.

Calculation of Average Schooling in Simulation output

This is wrong if individuals enroll in school in the very last period. We are only using the information on schooling when they enter the last period ... This is a little more tricky than calculating average occupation experience, for example, as individuals differ in their level of initial schooling.