opensourceeconomics / grmpy

Python package for the simulation and estimation of generalized Roy model

Home Page: http://grmpy.readthedocs.io

License: MIT License

Python 11.91% Shell 0.01% TeX 0.87% Jupyter Notebook 87.22%

econometrics economics generalized-roy-model software-engineering

grmpy's Issues

Comparison File

Please automatically simulate a sample based on the estimation results and compare the basic descriptives between the simulated and the observed sample ... like the counts across treatment status and the descriptives about the distribution of the observed outcomes.
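A minimal sketch of such a comparison, assuming both samples are available as NumPy arrays of outcomes `Y` and treatment indicators `D`. The helper names `describe_sample` and `compare_samples` and the chosen statistics are illustrative and not part of grmpy:

```python
import numpy as np


def describe_sample(Y, D):
    """Return basic descriptives of the outcomes Y by treatment status D (hypothetical helper)."""
    Y, D = np.asarray(Y, dtype=float), np.asarray(D, dtype=int)
    subsets = [
        ("all", np.ones_like(D, dtype=bool)),
        ("treated", D == 1),
        ("untreated", D == 0),
    ]
    stats = {}
    for label, mask in subsets:
        y = Y[mask]
        stats[label] = {
            "count": int(mask.sum()),
            "mean": float(y.mean()) if y.size else float("nan"),
            "std": float(y.std()) if y.size else float("nan"),
            "quartiles": np.percentile(y, [25, 50, 75]).tolist() if y.size else [],
        }
    return stats


def compare_samples(observed, simulated):
    """Print counts and means side by side; both arguments are (Y, D) tuples."""
    obs, sim = describe_sample(*observed), describe_sample(*simulated)
    for label in ("all", "treated", "untreated"):
        print("{:<10} count obs/sim: {:>6d} / {:>6d}   mean obs/sim: {:8.3f} / {:8.3f}".format(
            label, obs[label]["count"], sim[label]["count"],
            obs[label]["mean"], sim[label]["mean"]))
```

The same dictionary could be written to the comparison file instead of being printed.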

Order of information for covariance matrix

This does not have anything to do with the order that you specified in the documentation, right?

    U0_sd, U1_sd, V_sd = init_dict['DIST']['all'][:3]
    vars_ = [U0_sd ** 2, U1_sd ** 2, V_sd ** 2]
    U01, U0_V, U1_V = init_dict['DIST']['all'][3:]
    covar_ = [U01 ** 2, U0_V ** 2, U1_V ** 2]
    Dist_coeffs = init_dict['DIST']['all']

If so, please align the code with the documentation.

TODO List Sphinx

Use TODO List feature in online doc to implement idea of someday/maybe list. This centralizes all notes and is not presented online ...

These are the last notes from Evernote to integrate:
Cloud Infrastructure
MuPY
Hypothesis Package

document workflow as part of contribution

Analytical derivatives

Please add analytical derivatives; this makes BFGS-based estimation much faster. The following function is very helpful during this error-prone process: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.check_grad.html

Please set up a unit test that compares the numerical and analytical derivatives for random requests.
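A sketch of what such a unit test could look like with `scipy.optimize.check_grad`. The functions `criterion` and `criterion_gradient` are placeholders standing in for the grmpy likelihood and its analytical derivative, and the tolerance is an assumption:

```python
import numpy as np
from scipy.optimize import check_grad


def criterion(x):
    """Placeholder criterion; stands in for the grmpy log-likelihood."""
    return np.sum(x ** 2) + np.sum(np.sin(x))


def criterion_gradient(x):
    """Analytical gradient of the placeholder criterion."""
    return 2 * x + np.cos(x)


def test_gradient_random_points(num_tests=10, seed=123):
    """Compare analytical and numerical derivatives at random evaluation points."""
    rng = np.random.RandomState(seed)
    for _ in range(num_tests):
        x0 = rng.uniform(-2, 2, size=rng.randint(1, 6))
        # check_grad returns the 2-norm of the difference between the
        # analytical gradient and a finite-difference approximation.
        err = check_grad(criterion, criterion_gradient, x0)
        assert err < 1e-4, err
```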

Starting Values from OLS and Probit

Please add a feature that allows using the results from an OLS and a Probit regression as the starting values.

README cleanup

Please remove from README.md

The main references are:

Heckman, J. J., and Vytlacil, E. J. (2007a). Econometric evaluation of social programs, Part I: Causal effects, structural models and econometric policy evaluation. In Heckman, J. J., and Leamer, E. E., editors, Handbook of Econometrics, volume 6B, pages 4779–4874. Elsevier Science, Amsterdam, Netherlands.

Heckman, J. J., and Vytlacil, E. J. (2007b). Econometric evaluation of social programs, Part II: Using the marginal treatment effect to organize alternative economic estimators to evaluate social programs and to forecast their effects in new environments. In Heckman, J. J. and Leamer, E. E., editors, Handbook of Econometrics, volume 6B, pages 4875–5144. Elsevier Science, Amsterdam, Netherlands.

SIMULATION after Estimation Flawed

There is no role for randomness?

    def simulate_outcomes_estimation(init_dict, X, Z):
        """The function simulates the outcome Y, the resulting treatment dummy."""
        # Distribute information
        coeffs_untreated = init_dict['UNTREATED']['all']
        coeffs_treated = init_dict['TREATED']['all']
        coeffs_cost = init_dict['COST']['all']

        # Calculate potential outcomes and costs
        Y_1 = np.dot(coeffs_treated, X.T)
        Y_0 = np.dot(coeffs_untreated, X.T)
        C = np.dot(coeffs_cost, Z.T)

        # Calculate expected benefit and the resulting treatment dummy
        D = np.array((Y_1 - Y_0 - C > 0).astype(int))

        # Observed outcomes
        Y = D * Y_1 + (1 - D) * Y_0

        return Y, D, Y_1, Y_0
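For contrast, a sketch of a simulation in which the unobservables enter the potential outcomes and the cost. The function name, the argument list, and the ordering (U_1, U_0, V) of the covariance matrix are assumptions made for illustration, not the grmpy interface:

```python
import numpy as np


def simulate_outcomes(coeffs_treated, coeffs_untreated, coeffs_cost, cov, X, Z, seed=None):
    """Simulate (Y, D, Y_1, Y_0) with unobservables (U_1, U_0, V) drawn from N(0, cov)."""
    rng = np.random.RandomState(seed)
    num_agents = X.shape[0]

    # Draw the unobservables; assumed ordering of cov is (U_1, U_0, V)
    U = rng.multivariate_normal(np.zeros(3), cov, num_agents)
    U_1, U_0, V = U[:, 0], U[:, 1], U[:, 2]

    # Potential outcomes and cost now include a random component
    Y_1 = X.dot(coeffs_treated) + U_1
    Y_0 = X.dot(coeffs_untreated) + U_0
    C = Z.dot(coeffs_cost) + V

    # Agents know their unobservables when choosing treatment
    D = (Y_1 - Y_0 - C > 0).astype(int)

    # Observed outcomes
    Y = D * Y_1 + (1 - D) * Y_0
    return Y, D, Y_1, Y_0
```

Fixing `seed` keeps the simulated sample reproducible across runs.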

Prepare Branch Workflow

I want to switch to a feature branch workflow soon. For this purpose, I want to set up the automatic code review tools that assess the quality of the branch.

adding reference

Please add the missing reference to this paper ....
http://www.journals.uchicago.edu/doi/abs/10.1086/679498 for Eisenhauer.2015 in contributing.rst

Random Initialization file

Please move print_dict() into generate_random_dict()

pei_edits

This issue simply serves to keep track of the major edits to the code in the branch.

improved usability of regression test runner
integrated custom exceptions and started initialization file checks
added missing docstring to check_types()
cleaned up regression_test_2
refactored simulation of unobservables
refactoring MTE unit test
refactoring of simulation modules

Output from Reliability test

This is from the reliability test setup. Why does the message indicate that the optimizer succeeded while a warning is issued as well?

OUT_POWELL_true_values.txt

Can we remove the branch pei_hackathon?

Please confirm that all is merged into erbin and then delete it.

Update Regression Tests

Please include in our regression test a single evaluation of the criterion function at the starting values in addition to the overall statistic on the simulated dataset.

True Values vs. Init Values

Please rename the true values to init values... These are only the true values if the dataset is simulated with the same initialization file. This refers to the user option, but also inside the code if required.

Reminder Codacy

We want to integrate Codacy into the requirements for future pull requests.

Reference on index.rst

The reference to Heckman Vytlacil in the very beginning does not conform with our treatment of references.

Economics.rst

I created a new section that describes the basic economics underlying the generalized Roy model and discusses some selected issues in the econometrics of policy evaluation. At this point this is mainly a simple copy of the material from https://github.com/policyMetrics/miscellaneous/blob/master/Eisenhauer.2012.pdf

Please polish this section by properly formatting everything such as the equations, references. Please make sure that all references also show up in the bibliography.

ESTIMATION block in initialization file

Please add an explicit block in the initialization file that contains parameters for the estimation:

ESTIMATION

agents 1000
file data.respy.dat
maxfun 1000
optimizer FORT-NEWUOA

This is the relevant part from the respy package.

agents is the number of agents to use for the estimation; we might only want to estimate on a subset of the data in the simulation sample.
file is the source for the estimation sample.
maxfun is the maximum number of function evaluations. This is a little tricky to enforce with the scipy optimizers, as the maxiter concept that their options provide is different. I usually use a user-defined error class, see here for an example. As a start, you might also simply write out the number of function evaluations to a file each time the likelihood function is called.
Please check that the special case of maxfun = 0 evaluates the criterion function at the starting values. This is not the same as maxfun = 1 with BFGS, which first calculates the derivatives.
optimizer is the optimizer to use, in our case SCIPY-BFGS as the only option.
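One way to enforce maxfun with a user-defined error class is sketched below. The names `MaxfunError`, `CountingCriterion`, and `estimate` are hypothetical, and `optimizer` stands for any callable that takes a function and starting values, for example a thin wrapper around `scipy.optimize.minimize`:

```python
import copy


class MaxfunError(Exception):
    """Raised once the criterion has been evaluated maxfun times (hypothetical name)."""


class CountingCriterion:
    """Wrap a criterion function, count its evaluations, and remember the best point."""

    def __init__(self, func, maxfun):
        self.func = func
        self.maxfun = maxfun
        self.num_evals = 0
        self.best_x, self.best_value = None, float("inf")

    def __call__(self, x, *args):
        if self.num_evals >= self.maxfun:
            raise MaxfunError
        value = self.func(x, *args)
        self.num_evals += 1
        if value < self.best_value:
            # Copy x because optimizers may modify their arrays in place
            self.best_x, self.best_value = copy.copy(x), value
        return value


def estimate(func, x0, maxfun, optimizer):
    """Run optimizer(func, x0) but stop after maxfun criterion evaluations."""
    # Special case maxfun = 0: only evaluate the criterion at the starting values
    start_value = func(x0)
    if maxfun == 0:
        return x0, start_value
    wrapped = CountingCriterion(func, maxfun)
    try:
        optimizer(wrapped, x0)
    except MaxfunError:
        pass
    return wrapped.best_x, wrapped.best_value
```

In this sketch the evaluation at the starting values is not counted against maxfun; whether it should be is a design choice left open here.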

add suggested citation

Once we have our first release, this needs to be added to the documentation.

Log file for Estimation results

At the end of each estimation, please output a file est.grmpy.info that contains the value of the parameters at the start and at the end. Also, note the number of function evaluations and the optimizer termination status as well as the optimizer message.

Example from respy attached.
est.info.txt

MTE Calculation flawed

MTE is flat, but has wrong level.
init.txt
simulatio_info.txt

Tutorial

Please document the initialization file in the tutorial.rst. See here for an example, but feel free to deviate if there is a good reason. http://respy.readthedocs.io/en/latest/tutorial.html

add parametric assumptions
add explained example script as in http://respy.readthedocs.io/en/latest/tutorial.html

Cholesky Factors

Please change the setup of the optimization so that we are internally optimizing over the Cholesky factor of the covariance matrix. This allows us to avoid any parameter transformations, i.e. ensuring that the values are valid variances and covariances.
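A minimal sketch of the reparametrization, assuming a 3x3 covariance matrix of the unobservables. The helper names are illustrative only:

```python
import numpy as np


def cov_to_params(cov):
    """Flatten the lower-triangular Cholesky factor of cov into a parameter vector."""
    chol = np.linalg.cholesky(cov)
    return chol[np.tril_indices(cov.shape[0])]


def params_to_cov(params, dim=3):
    """Rebuild a covariance matrix from unrestricted Cholesky parameters."""
    chol = np.zeros((dim, dim))
    chol[np.tril_indices(dim)] = params
    # L L' is symmetric and positive semi-definite for any real L, so the
    # optimizer can move freely over params without leaving the valid set.
    return chol.dot(chol.T)
```

The optimizer then works on the six entries returned by `cov_to_params`, and the likelihood calls `params_to_cov` to recover the covariance matrix.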

Order of Unobservables

The unobservables are ordered (U_0, U_1) while the potential outcomes are always (Y_1, Y_0). Please adjust code, simulation output, documentation so that the unobservables are (U_1, U_0 ) as well ...

Regression Tests

We need to integrate a regression test battery in our workflow. See my first draft at https://github.com/grmToolbox/grmpy/blob/master/development/tests/regression/draft.py
We will discuss the ideas behind it and the next steps during our call today.

POWELL

Please add POWELL as an alternative optimizer that the user can request.

SCIPY Optimization

Do we correctly understand that the return values are always the starting values if the success indicator is false?

MTE Calculation

As you suspected, the MTE calculation is only valid for the special case of var(V) = 1. Please generalize the function ...

Docstrings missing

Several functions are missing a docstring.

... documentation skeleton

Please conduct a couple of edits to the documentation:

fix all links, the markdown syntax seems different from GitHub Wiki. See the Handbook references in the beginning for example
since this is just a ripoff from the respy doc at the moment, please check all links that they actually work and point to grmpy material
add Tobias to contributors

Layout of the Information File

I want to improve the layout of the information file. Also, prepare layout for subsequent MTE implementation by Sebastian.

run test suite from terminal

Please add a function that allows us to run the tests from inside the interpreter:

python -c "import grmpy; grmpy.test()"

See https://github.com/restudToolbox/package/blob/master/respy/__init__.py for an example.

BFGS Options

Please add the feature that all BFGS options are specified in the initialization file, see below for the example from respy.

SCIPY-BFGS

gtol 0.000100000000000
maxiter 1

[ ] incorporate in read()
[ ] add tests at beginning of estimate() to see whether specified with valid input values.
[ ] add to random initialization file generator
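The first two items could be sketched as follows. The block layout (a single-token header followed by `key value` lines) is inferred from the example above; the function names and the fallback defaults are assumptions for illustration, not existing grmpy code:

```python
def read_optimizer_block(lines, block="SCIPY-BFGS"):
    """Collect the 'key value' pairs that follow the block header."""
    options, inside = {}, False
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if len(tokens) == 1:
            # A line with a single token is treated as a block header
            inside = tokens[0] == block
            continue
        if inside:
            options[tokens[0]] = tokens[1]
    return options


def check_bfgs_options(options):
    """Convert the raw strings and reject invalid values with a ValueError."""
    # The defaults below are illustrative assumptions
    gtol = float(options.get("gtol", 1e-5))
    maxiter = int(options.get("maxiter", 1000))
    if gtol <= 0:
        raise ValueError("gtol must be positive")
    if maxiter < 0:
        raise ValueError("maxiter must be non-negative")
    return {"gtol": gtol, "maxiter": maxiter}
```

The returned dictionary has the shape expected by the `options` argument of `scipy.optimize.minimize`.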

Squared Covariances

It seems we are working with squared covariances?

Simulation of Binary Covariates

Please implement the following feature: I want to be able to specify in the initialization file that a certain covariate takes on only the values one and zero ....

make sure to document the new feature, as well as the other defaults currently implemented
also make sure that it is part of the random init generator

Definition of Done:

requires an update to the documentation
requires an updated regression test battery
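A sketch of how binary covariates might be simulated. The signature, the intercept in the first column, and the `binary` mapping from column index to the probability of a one are assumptions for illustration; how this is specified in the initialization file is left open:

```python
import numpy as np


def simulate_covariates(num_agents, num_covars, binary=None, seed=None):
    """Simulate covariates; binary maps a column index to the probability of a one."""
    rng = np.random.RandomState(seed)
    X = rng.normal(size=(num_agents, num_covars))
    X[:, 0] = 1.0  # intercept (assumed to be the first column)
    for col, prob in (binary or {}).items():
        # Replace the continuous draws with Bernoulli draws
        X[:, col] = rng.binomial(1, prob, size=num_agents)
    return X
```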

Docstrings

Please fix formatting in conftest.py. No need to assign the issue back to me, just close it right away when you are done.

Failed Regression tests

Installing the package in development mode and running py.test gives an error. Please fix.


(grmToolbox) peisenha@pontos:~/grmToolbox/grmpy$ pip install -e .
Obtaining file:///home/peisenha/ownCloud/office/workspace/software/repositories/organizations/grmToolbox/grmpy
Requirement already satisfied: numpy in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: scipy in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: pytest in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: pandas in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: statsmodels in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: py>=1.4.33 in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from pytest->grmpy==0.0.5.dev0)
Requirement already satisfied: setuptools in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from pytest->grmpy==0.0.5.dev0)
Requirement already satisfied: python-dateutil>=2 in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from pandas->grmpy==0.0.5.dev0)
Requirement already satisfied: pytz>=2011k in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from pandas->grmpy==0.0.5.dev0)
Requirement already satisfied: patsy in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from statsmodels->grmpy==0.0.5.dev0)
Requirement already satisfied: six>=1.5 in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from python-dateutil>=2->pandas->grmpy==0.0.5.dev0)
Installing collected packages: grmpy
  Found existing installation: grmpy 0.0.5.dev0
    Uninstalling grmpy-0.0.5.dev0:
      Successfully uninstalled grmpy-0.0.5.dev0
  Running setup.py develop for grmpy
Successfully installed grmpy
(grmToolbox) peisenha@pontos:~/grmToolbox/grmpy$ py.test
============================= test session starts ==============================
platform linux -- Python 3.5.2, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /home/peisenha/ownCloud/office/workspace/software/repositories/organizations/grmToolbox/grmpy, inifile:
collected 7 items                                                               

grmpy/test/test_integration.py .F
grmpy/test/test_unit.py .....

================================================================================================= FAILURES =================================================================================================
_____________________________________________________________________________________________ TestClass.test2 ______________________________________________________________________________________________

self = <grmpy.test.test_integration.TestClass object at 0x7f6f388a5048>

    def test2(self):
        """The test takes a subsample of 5 random entries from the regression battery test list
            (resources/regression_vault.grmpy.json), simulates the specific output again, sums the
            resulting data frame up and checks if the sum is equal to the regarding entry in the test
            list eement.
            """
        tests = json.load(
>           open('{}'.format(os.getcwd()) + '/test/resources/regression_vault.grmpy.json', 'r'))
E       FileNotFoundError: [Errno 2] No such file or directory: '/home/peisenha/ownCloud/office/workspace/software/repositories/organizations/grmToolbox/grmpy/test/resources/regression_vault.grmpy.json'

grmpy/test/test_integration.py:35: FileNotFoundError
=================================================================================== 1 failed, 6 passed in 11.27 seconds ====================================================================================
(grmToolbox) peisenha@pontos:~/grmToolbox/grmpy$
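The traceback shows the path being built from `os.getcwd()`, so the file is only found when py.test is started from one particular directory. A sketch of building the path relative to the test module instead, assuming the resources directory sits next to the test module (the helper name `vault_path` is hypothetical):

```python
import os


def vault_path(module_file, fname="regression_vault.grmpy.json"):
    """Return the path to fname in the resources directory next to module_file."""
    test_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(test_dir, "resources", fname)


# Inside the test module this would be used as:
#     tests = json.load(open(vault_path(__file__), 'r'))
```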

Discuss Estimation Feature

Require the user to specify the column number where the regressor is found ... Impose the restriction for now that the columns need to be specified identically for treated and untreated.

document that we have strict separation between cost and benefit shifters, we will weaken that restriction in due time.

bumpversion

https://github.com/peritus/bumpversion To ease workflow with pypi releases

Agent's Information Set

The individuals know their values for U_1 and U_0 when making their decision. Please account for this when simulating the choice.

Link to Tutorial

The link to the tutorial init file refers to an old branch and needs to be updated once we are back in master.

Notebook with Simulation Code

We will iterate on this notebook https://github.com/grmToolbox/grmpy/blob/master/simulation.ipynb and develop a baseline simulation code of the generalized Roy model. The first part of this lecture will give you some guidance: https://github.com/grmToolbox/notebook/blob/master/lecture/lecture.ipynb However, tackle the problem your own way and then we will iterate on it from there.

sphinx latexpdf

Creating a pdf from our documentation fails, please see if you can reproduce the problem and fix it.


(grmToolbox) peisenha@pontos:~/grmToolbox/grmpy/docs$ make latexpdf
Running Sphinx v1.6.3
making output directory...
/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/sphinx/util/compat.py:40: RemovedInSphinx17Warning: sphinx.util.compat.Directive is deprecated and will be removed in Sphinx 1.7, please use docutils' instead.
  RemovedInSphinx17Warning)
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [latex]: all documents
updating environment: 0 added, 0 changed, 0 removed
looking for now-outdated files... none found
processing grmpy.tex...index economics installation tutorial reliability software_engineering contributing credits changes bibliography 
resolving references...

Exception occurred:
  File "/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/sphinx/transforms/post_transforms/images.py", line 73, in handle
    basename = sha1(node['uri']).hexdigest()
TypeError: Unicode-objects must be encoded before hashing
The full traceback has been saved in /tmp/sphinx-err-0hbedqya.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!
Makefile:20: recipe for target 'latexpdf' failed
make: *** [latexpdf] Error 1
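The exception itself is a Python 3 string-versus-bytes problem: `hashlib.sha1` accepts bytes only, and the traceback shows it being handed the URI as a `str`. A small reproduction with a placeholder URI follows; whether the fix belongs in the installed Sphinx version or in how the documentation references its images is not clear from the log:

```python
from hashlib import sha1

uri = "https://example.org/figure.png"  # placeholder, not taken from the docs

try:
    sha1(uri)  # str input raises TypeError on Python 3
    failed = False
except TypeError:
    failed = True

# Encoding the string first yields the expected hex digest
basename = sha1(uri.encode("utf-8")).hexdigest()
```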

Some comments on test_unit.py

Please refactor

            for f in glob.glob("*.grmpy.*"):
                os.remove(f)

into a new function cleanup(). This shows up numerous times in the test-related modules.

Please do not use assert np.array_equal; use np.testing.assert_equal instead, and try to avoid using assert ... altogether in the test modules.
This could use a loop:
assert np.array_equal(df.Y[df.D == 1], df.Y1[df.D == 1])
assert np.array_equal(df.Y[df.D == 0], df.Y1[df.D == 0])
Please consider replacing this x_ = [col for col in df if col.startswith('X')] by using http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.filter.html
There is a print statement in test5()
Why do we have test_1 and test_2 separate? They are pretty much identical. It is enough if the is_deterministic flag is tested with probability 0.1.
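The requested cleanup() could look like the sketch below. The `directory` argument and the returned list of removed files are additions for illustration:

```python
import glob
import os


def cleanup(pattern="*.grmpy.*", directory="."):
    """Remove all files matching pattern in directory and return their paths."""
    removed = []
    for fname in glob.glob(os.path.join(directory, pattern)):
        os.remove(fname)
        removed.append(fname)
    return removed
```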

Talk to you next week ...

POWELL Problem

/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
  from pandas.core import datetools
Traceback (most recent call last):
  File "run.py", line 16, in <module>
    estimate('test.grmpy.ini', option, optimizer='POWELL')
  File "/home/peisenha/ownCloud/office/workspace/software/repositories/organizations/grmToolbox/grmpy/grmpy/estimate/estimate.py", line 40, in estimate
    minimizing_interface, x0, args=(dict_, data), method=method, options=opts)
  File "/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/scipy/optimize/_minimize.py", line 440, in minimize
    return _minimize_powell(fun, x0, args, callback, **options)
  File "/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/scipy/optimize/optimize.py", line 2435, in _minimize_powell
    direc1 = direc[i]
IndexError: too many indices for array

Attached initialization file test.txt

NUMPY Criterion

Please implement a version of the likelihood function that relies on NUMPY and avoids the time-consuming loop. As a suggestion, set up a unit test that compares your slow version against the fast version for random initialization files.
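The pattern of such a test is sketched below on a deliberately simple criterion, the negative mean log-likelihood of a linear model with normal errors. This placeholder is not the generalized Roy likelihood; all names are illustrative:

```python
import numpy as np


def loglike_loop(beta, sd, Y, X):
    """Slow reference version: loop over agents."""
    total = 0.0
    for i in range(Y.shape[0]):
        resid = Y[i] - np.dot(X[i, :], beta)
        total += -0.5 * np.log(2 * np.pi * sd ** 2) - resid ** 2 / (2 * sd ** 2)
    return -total / Y.shape[0]


def loglike_vectorized(beta, sd, Y, X):
    """Fast version: the same contributions computed on whole arrays."""
    resid = Y - X.dot(beta)
    contribs = -0.5 * np.log(2 * np.pi * sd ** 2) - resid ** 2 / (2 * sd ** 2)
    return -contribs.mean()


def test_loop_against_vectorized(num_tests=20, seed=456):
    """Both versions must agree for randomly drawn parameters and data."""
    rng = np.random.RandomState(seed)
    for _ in range(num_tests):
        num_agents, num_covars = rng.randint(1, 50), rng.randint(1, 5)
        X = rng.normal(size=(num_agents, num_covars))
        beta = rng.normal(size=num_covars)
        sd = rng.uniform(0.1, 2.0)
        Y = X.dot(beta) + rng.normal(scale=sd, size=num_agents)
        np.testing.assert_allclose(
            loglike_loop(beta, sd, Y, X), loglike_vectorized(beta, sd, Y, X))
```

In grmpy the random draws would come from the random initialization file generator rather than from the generator used here.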

MTE Information

Please add the calculation of the MTE based on https://www.aeaweb.org/articles?id=10.1257/aer.101.6.2754 to our information file.

Unit test for special case that MTE is flat ....

est.grmpy.info

Please adjust the order of the printed coefficients to match the one in the init file. Add the constant variance for V.

index.rst

estimation of generalized Roy Model (Heckman & Vytlacil, 2005) ...
Please add the usual link to the reference in the bibliography.