opensourceeconomics / grmpy
Python package for the simulation and estimation of the generalized Roy model
Home Page: http://grmpy.readthedocs.io
License: MIT License
Please automatically simulate a sample based on the estimation results and compare the basic descriptives between the simulated and the observed sample ... like the counts across treatment status and the descriptives about the distribution of the observed outcomes.
This does not have anything to do with the order that you specified in the documentation, right?
U0_sd, U1_sd, V_sd = init_dict['DIST']['all'][:3]
vars_ = [U0_sd ** 2, U1_sd ** 2, V_sd ** 2]
U01, U0_V, U1_V = init_dict['DIST']['all'][3:]
covar_ = [U01 ** 2, U0_V ** 2, U1_V ** 2]
Dist_coeffs = init_dict['DIST']['all']
If so, please align the code with the documentation.
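A minimal sketch of how the six distribution parameters could be mapped into a covariance matrix, assuming the order used in the snippet above (three standard deviations followed by three covariances). The helper name `build_covariance` is hypothetical and not part of grmpy. Standard deviations are squared to obtain variances, while the covariances enter unchanged.

```python
import numpy as np


def build_covariance(dist_coeffs):
    """Map [U0_sd, U1_sd, V_sd, U01, U0_V, U1_V] into a 3x3 covariance matrix.

    Hypothetical helper; the ordering is an assumption taken from the snippet above.
    """
    u0_sd, u1_sd, v_sd, u01, u0_v, u1_v = dist_coeffs
    cov = np.array([
        [u0_sd ** 2, u01, u0_v],
        [u01, u1_sd ** 2, u1_v],
        [u0_v, u1_v, v_sd ** 2],
    ])
    return cov


# Illustrative values only
cov = build_covariance([0.1, 0.2, 1.0, 0.0, 0.01, -0.05])
```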
Use TODO List feature in online doc to implement idea of someday/maybe list. This centralizes all notes and is not presented online ...
These are the last notes from Evernote to integrate:
Cloud Infrastructure
MuPY
Hypothesis Package
document workflow as part of contribution
Please add analytical derivatives, this makes BFGS-based estimation much faster. The following function is very helpful during this error-prone process: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.check_grad.html
Please set up a unit test that compares the numerical and analytical derivatives for random requests.
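A sketch of what such a unit test could look like. The toy criterion function and its gradient are stand-ins and not grmpy's likelihood; only `scipy.optimize.check_grad` is the real library call.

```python
import numpy as np
from scipy.optimize import check_grad


def criterion(x):
    """Toy criterion function standing in for the likelihood."""
    return np.sum(x ** 2) + np.sum(np.sin(x))


def criterion_gradient(x):
    """Analytical gradient of the toy criterion."""
    return 2 * x + np.cos(x)


def test_gradient(num_requests=10, seed=123):
    """Compare analytical and numerical derivatives at random points."""
    rng = np.random.RandomState(seed)
    for _ in range(num_requests):
        x0 = rng.uniform(-2, 2, size=5)
        # check_grad returns the norm of the difference between both gradients
        err = check_grad(criterion, criterion_gradient, x0)
        np.testing.assert_allclose(err, 0.0, atol=1e-5)


test_gradient()
```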
Please add a feature that allows the results from an OLS and a Probit regression to be used as the starting values.
Please remove from README.md
The main references are:
Heckman, J. J., and Vytlacil, E. J. (2007a). Econometric evaluation of social programs, Part I: Causal effects, structural models and econometric policy evaluation. In Heckman, J. J., and Leamer, E. E., editors, Handbook of Econometrics, volume 6B, pages 4779–4874. Elsevier Science, Amsterdam, Netherlands.
Heckman, J. J., and Vytlacil, E. J. (2007b). Econometric evaluation of social programs, Part II: Using the marginal treatment effect to organize alternative economic estimators to evaluate social programs and to forecast their effects in new environments. In Heckman, J. J., and Leamer, E. E., editors, Handbook of Econometrics, volume 6B, pages 4875–5144. Elsevier Science, Amsterdam, Netherlands.
There is no role for randomness?
```
def simulate_outcomes_estimation(init_dict, X, Z):
    """The function simulates the outcome Y, the resulting treatment dummy."""
    # Distribute information
    coeffs_untreated = init_dict['UNTREATED']['all']
    coeffs_treated = init_dict['TREATED']['all']
    coeffs_cost = init_dict['COST']['all']

    # Calculate potential outcomes and costs
    Y_1 = np.dot(coeffs_treated, X.T)
    Y_0 = np.dot(coeffs_untreated, X.T)
    C = np.dot(coeffs_cost, Z.T)

    # Calculate expected benefit and the resulting treatment dummy
    D = np.array((Y_1 - Y_0 - C > 0).astype(int))

    # Observed outcomes
    Y = D * Y_1 + (1 - D) * Y_0

    return Y, D, Y_1, Y_0
```
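For comparison, a sketch of a simulation in which the unobservables do play a role. The function name, the argument list, and the treatment of the third unobservable as a cost shock `U_C` are assumptions made for illustration; grmpy's own parameterization in terms of V may differ.

```python
import numpy as np


def simulate_outcomes(coeffs_treated, coeffs_untreated, coeffs_cost, cov, X, Z, seed=0):
    """Simulate outcomes and choices with random unobservables.

    Sketch only: `cov` is assumed to be the 3x3 covariance matrix of
    (U_1, U_0, U_C), where U_C is an additive cost shock.
    """
    rng = np.random.RandomState(seed)
    U = rng.multivariate_normal(np.zeros(3), cov, size=X.shape[0])
    U_1, U_0, U_C = U[:, 0], U[:, 1], U[:, 2]

    # Potential outcomes and costs now include the unobservables
    Y_1 = np.dot(X, coeffs_treated) + U_1
    Y_0 = np.dot(X, coeffs_untreated) + U_0
    C = np.dot(Z, coeffs_cost) + U_C

    # Treatment is chosen if the net benefit is positive
    D = (Y_1 - Y_0 - C > 0).astype(int)

    # Observed outcomes
    Y = D * Y_1 + (1 - D) * Y_0
    return Y, D, Y_1, Y_0


# Illustrative call with made-up coefficients
rng = np.random.RandomState(42)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
Z = np.column_stack([np.ones(500), rng.normal(size=500)])
Y, D, Y_1, Y_0 = simulate_outcomes([1.0, 0.5], [0.5, 0.2], [0.3, 0.4], np.eye(3) * 0.1, X, Z)
```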
I want to switch to a feature branch workflow soon. For this purpose, I want to set up the automatic code review tools that assess the quality of the branch.
Please add the missing reference to this paper ....
http://www.journals.uchicago.edu/doi/abs/10.1086/679498 for Eisenhauer.2015 in contributing.rst
Please move print_dict() into generate_random_dict()
This issue simply serves the purpose to keep track of the major edits to the code in the branch.
This is from the reliability test setup. Why does the message indicate success of the optimizer when a warning is issued as well?
Please confirm that all is merged into erbin and then delete it.
Please include in our regression test a single evaluation of the criterion function at the starting values in addition to the overall statistic on the simulated dataset.
Please rename the true values to init values... These are only the true values if the dataset is simulated with the same initialization file. This refers to the user option, but also inside the code if required.
We want to integrate Codacy into future pull request requirements.
The reference to Heckman Vytlacil in the very beginning does not conform with our treatment of references.
I created a new section that describes the basic economics that underlie the generalized Roy model and discusses some selected issues in the econometrics of policy evaluation. At this point this is mainly a simple copy of the material from https://github.com/policyMetrics/miscellaneous/blob/master/Eisenhauer.2012.pdf
Please polish this section by properly formatting everything such as the equations, references. Please make sure that all references also show up in the bibliography.
Please add an explicit block in the initialization file that contains parameters for the estimation:
ESTIMATION
agents 1000
file data.respy.dat
maxfun 1000
optimizer FORT-NEWUOA
This is the relevant part from the respy package.
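A rough sketch of how such a block could be read into a dictionary. The function `read_blocks` and its parsing rules (a single token opens a block, every other line is a `key value` pair) are assumptions for illustration and not the parser used by respy or grmpy.

```python
def _convert(value):
    """Convert a string to int or float if possible, otherwise keep the string."""
    for type_ in (int, float):
        try:
            return type_(value)
        except ValueError:
            pass
    return value


def read_blocks(text):
    """Parse a block-structured initialization file into a nested dictionary."""
    blocks, current = {}, None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if len(tokens) == 1:
            # A single token is interpreted as the name of a new block
            current = tokens[0]
            blocks[current] = {}
        elif current is None:
            raise ValueError('key-value pair found outside of a block')
        else:
            blocks[current][tokens[0]] = _convert(tokens[1])
    return blocks


example = """
ESTIMATION
agents 1000
file data.respy.dat
maxfun 1000
optimizer FORT-NEWUOA
"""
init_dict = read_blocks(example)
```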
Once we have our first release, this needs to be added to the documentation.
At the end of each estimation, please output a file est.grmpy.info that contains the value of the parameters at the start and at the end. Also, note the number of function evaluations and the optimizer termination status as well as the optimizer message.
Example from respy attached.
est.info.txt
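A sketch of how the requested information could be written out, assuming the estimation is run with `scipy.optimize.minimize`. The function `write_info` and the layout of the file are hypothetical; the attributes `x`, `nfev`, `status`, `success`, and `message` are part of SciPy's result object.

```python
import numpy as np
from scipy.optimize import minimize


def write_info(x_start, result, fname='est.grmpy.info'):
    """Write start and end values plus optimizer diagnostics to a text file."""
    with open(fname, 'w') as f:
        f.write('{:>10}{:>15}{:>15}\n'.format('Index', 'Start', 'Finish'))
        for i, (start, stop) in enumerate(zip(x_start, result.x)):
            f.write('{:>10}{:>15.4f}{:>15.4f}\n'.format(i, start, stop))
        f.write('\nFunction evaluations: {}\n'.format(result.nfev))
        f.write('Termination status:   {}\n'.format(result.status))
        f.write('Success:              {}\n'.format(result.success))
        f.write('Message:              {}\n'.format(result.message))


# Toy problem standing in for the actual estimation
x0 = np.array([1.0, -2.0])
rslt = minimize(lambda x: np.sum((x - 0.5) ** 2), x0, method='BFGS')
write_info(x0, rslt)
```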
MTE is flat, but has wrong level.
init.txt
simulatio_info.txt
Please document the initialization file in the tutorial.rst. See here for an example, but feel free to deviate if there is a good reason. http://respy.readthedocs.io/en/latest/tutorial.html
Please change the setup of the optimization so that we are internally optimizing over the Cholesky factor of the covariance matrix. This allows us to avoid any of the parameter transformations that ensure the values are valid variances and covariances.
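A minimal sketch of the idea: the optimizer works on the free elements of the lower-triangular factor, and any real-valued vector maps back into a symmetric, positive semidefinite matrix. The function names are hypothetical.

```python
import numpy as np


def cov_to_params(cov):
    """Flatten the lower-triangular Cholesky factor of a covariance matrix."""
    chol = np.linalg.cholesky(cov)
    return chol[np.tril_indices(cov.shape[0])]


def params_to_cov(params, dim=3):
    """Rebuild a covariance matrix from the flattened Cholesky factor."""
    chol = np.zeros((dim, dim))
    chol[np.tril_indices(dim)] = params
    return np.dot(chol, chol.T)


# Illustrative covariance matrix for three unobservables
cov = np.array([[0.04, 0.01, 0.00], [0.01, 0.09, -0.02], [0.00, -0.02, 1.00]])
params = cov_to_params(cov)
cov_rebuilt = params_to_cov(params)
```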
The unobservables are ordered (U_0, U_1) while the potential outcomes are always (Y_1, Y_0). Please adjust code, simulation output, documentation so that the unobservables are (U_1, U_0 ) as well ...
We need to integrate a regression test battery in our workflow. See my first draft at https://github.com/grmToolbox/grmpy/blob/master/development/tests/regression/draft.py
We will discuss the ideas behind it and the next steps during our call today.
Please add POWELL as an alternative optimizer that the user can request.
Do we correctly understand that the return values are always the starting values if the success indicator is false?
As you suspected, the MTE calculation is only valid for the special case of var(V) = 1. Please generalize the function ...
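One way the generalization could look, as a sketch under stated assumptions: the unobservables are jointly normal with mean zero and treatment is chosen when an index exceeds V. Then E[U_1 - U_0 | V = v] = (cov(U_1, V) - cov(U_0, V)) / var(V) * v, and the quantile u_D corresponds to v = sd(V) * Phi^{-1}(u_D). The function name and signature are hypothetical, and the sign convention for V needs to be checked against the code.

```python
import numpy as np
from scipy.stats import norm


def mte(x, coeffs_treated, coeffs_untreated, cov_u1_v, cov_u0_v, sd_v, quantiles):
    """Marginal treatment effect on a grid of quantiles u_D for any sd(V) > 0."""
    level = np.dot(x, np.asarray(coeffs_treated) - np.asarray(coeffs_untreated))
    # Dividing by sd(V) is the only change relative to the var(V) = 1 case
    slope = (cov_u1_v - cov_u0_v) / sd_v
    return level + slope * norm.ppf(quantiles)


# Illustrative values only
grid = np.array([0.1, 0.5, 0.9])
x = np.array([1.0, 0.3])
mte_unit = mte(x, [1.0, 0.5], [0.5, 0.2], 0.2, -0.1, 1.0, grid)
mte_general = mte(x, [1.0, 0.5], [0.5, 0.2], 0.2, -0.1, 2.0, grid)
```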
Several functions are missing a docstring.
Please conduct a couple of edits to the documentation:
I want to improve the layout of the information file. Also, prepare layout for subsequent MTE implementation by Sebastian.
Please add a function that allows us to run the tests from inside the interpreter:
python -c "import grmpy; grmpy.test()"
See https://github.com/restudToolbox/package/blob/master/respy/__init__.py for an example.
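A sketch of what such a function could look like in `grmpy/__init__.py`. It assumes that pytest is installed and that the tests live in a `test` directory next to the file; the linked respy implementation may differ in its details.

```python
import os


def test():
    """Run the test battery from inside the interpreter."""
    # Imported here so that importing the package does not require pytest
    import pytest

    package_dir = os.path.dirname(os.path.realpath(__file__))
    return pytest.main([os.path.join(package_dir, 'test')])
```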
Please add the feature that all BFGS options are specified in the initialization file, see below for the example from respy.
SCIPY-BFGS
gtol 0.000100000000000
maxiter 1
[ ] incorporate in read()
[ ] add tests at beginning of estimate() to see whether specified with valid input values.
[ ] add to random initialization file generator
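A sketch of the second item on the list together with the hand-over to SciPy. The dictionary `bfgs_opts` stands in for whatever read() returns for the SCIPY-BFGS block, and `check_bfgs_options` is a hypothetical helper; `gtol` and `maxiter` are valid options of SciPy's BFGS implementation.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical result of reading the SCIPY-BFGS block of the initialization file
bfgs_opts = {'gtol': 1e-4, 'maxiter': 100}


def check_bfgs_options(opts):
    """Validate the optimizer options before starting the estimation."""
    if not (isinstance(opts['gtol'], float) and opts['gtol'] > 0):
        raise ValueError('gtol needs to be a positive float')
    if not (isinstance(opts['maxiter'], int) and opts['maxiter'] > 0):
        raise ValueError('maxiter needs to be a positive integer')


check_bfgs_options(bfgs_opts)

# Toy criterion function standing in for the likelihood
rslt = minimize(lambda x: np.sum((x - 1.0) ** 2), np.zeros(3), method='BFGS', options=bfgs_opts)
```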
It seems we are working with squared covariances?
Please implement the following feature: I want to be able to specify in the initialization file that a certain covariate takes on only value one and zero ....
Definition of Done:
Please fix formatting in conftest.py. No need to assign the issue back to me, just close it right away when you are done.
Installing the package in development mode and running py.test gives an error. Please fix.
(grmToolbox) peisenha@pontos:~/grmToolbox/grmpy$ pip install -e .
Obtaining file:///home/peisenha/ownCloud/office/workspace/software/repositories/organizations/grmToolbox/grmpy
Requirement already satisfied: numpy in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: scipy in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: pytest in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: pandas in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: statsmodels in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: py>=1.4.33 in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from pytest->grmpy==0.0.5.dev0)
Requirement already satisfied: setuptools in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from pytest->grmpy==0.0.5.dev0)
Requirement already satisfied: python-dateutil>=2 in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from pandas->grmpy==0.0.5.dev0)
Requirement already satisfied: pytz>=2011k in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from pandas->grmpy==0.0.5.dev0)
Requirement already satisfied: patsy in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from statsmodels->grmpy==0.0.5.dev0)
Requirement already satisfied: six>=1.5 in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from python-dateutil>=2->pandas->grmpy==0.0.5.dev0)
Installing collected packages: grmpy
Found existing installation: grmpy 0.0.5.dev0
Uninstalling grmpy-0.0.5.dev0:
Successfully uninstalled grmpy-0.0.5.dev0
Running setup.py develop for grmpy
Successfully installed grmpy
(grmToolbox) peisenha@pontos:~/grmToolbox/grmpy$ py.test
============================= test session starts ==============================
platform linux -- Python 3.5.2, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /home/peisenha/ownCloud/office/workspace/software/repositories/organizations/grmToolbox/grmpy, inifile:
collected 7 items
grmpy/test/test_integration.py .F
grmpy/test/test_unit.py .....
================================================================================================= FAILURES =================================================================================================
_____________________________________________________________________________________________ TestClass.test2 ______________________________________________________________________________________________
self = <grmpy.test.test_integration.TestClass object at 0x7f6f388a5048>
def test2(self):
"""The test takes a subsample of 5 random entries from the regression battery test list
(resources/regression_vault.grmpy.json), simulates the specific output again, sums the
resulting data frame up and checks if the sum is equal to the regarding entry in the test
list eement.
"""
tests = json.load(
> open('{}'.format(os.getcwd()) + '/test/resources/regression_vault.grmpy.json', 'r'))
E FileNotFoundError: [Errno 2] No such file or directory: '/home/peisenha/ownCloud/office/workspace/software/repositories/organizations/grmToolbox/grmpy/test/resources/regression_vault.grmpy.json'
grmpy/test/test_integration.py:35: FileNotFoundError
=================================================================================== 1 failed, 6 passed in 11.27 seconds ====================================================================================
(grmToolbox) peisenha@pontos:~/grmToolbox/grmpy$
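The traceback shows the path being built from os.getcwd(), so the lookup depends on the directory from which py.test is started. A sketch of one possible fix, assuming the JSON file sits in a `resources` directory next to the test module; `resource_path` is a hypothetical helper that would be called with `__file__` inside the test module.

```python
import os


def resource_path(module_file, *parts):
    """Build a path relative to the directory that contains `module_file`."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, *parts)


# Example with the module path from the log above; in the test module itself
# the first argument would be __file__
fname = resource_path(
    os.path.join('grmpy', 'test', 'test_integration.py'),
    'resources', 'regression_vault.grmpy.json')
```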
Require the user to specify the column number where the regressor is found ... Impose the restriction for now that the columns need to be specified identically for treated and untreated.
https://github.com/peritus/bumpversion To ease workflow with pypi releases
The individuals know their values for U_1 and U_0 when making their decision. Please account for this when simulating the choice.
The link to the tutorial init file refers to an old branch and needs to be updated once we are back in master.
We will iterate on this notebook https://github.com/grmToolbox/grmpy/blob/master/simulation.ipynb and develop a baseline simulation code of the generalized Roy model. The first part of this lecture will give you some guidance: https://github.com/grmToolbox/notebook/blob/master/lecture/lecture.ipynb However, tackle the problem your own way and then we will iterate on it from there.
Creating a pdf from our documentation fails, please see if you can reproduce the problem and fix it.
(grmToolbox) peisenha@pontos:~/grmToolbox/grmpy/docs$ make latexpdf
Running Sphinx v1.6.3
making output directory...
/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/sphinx/util/compat.py:40: RemovedInSphinx17Warning: sphinx.util.compat.Directive is deprecated and will be removed in Sphinx 1.7, please use docutils' instead.
RemovedInSphinx17Warning)
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [latex]: all documents
updating environment: 0 added, 0 changed, 0 removed
looking for now-outdated files... none found
processing grmpy.tex...index economics installation tutorial reliability software_engineering contributing credits changes bibliography
resolving references...
Exception occurred:
File "/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/sphinx/transforms/post_transforms/images.py", line 73, in handle
basename = sha1(node['uri']).hexdigest()
TypeError: Unicode-objects must be encoded before hashing
The full traceback has been saved in /tmp/sphinx-err-0hbedqya.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!
Makefile:20: recipe for target 'latexpdf' failed
make: *** [latexpdf] Error 1
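The TypeError itself can be reproduced outside of Sphinx: under Python 3, hashlib only accepts bytes. The snippet below merely illustrates the cause with a made-up URI; whether the right fix on our side is a different Sphinx version or a change to the image reference in the documentation is left open.

```python
from hashlib import sha1

uri = 'https://example.org/badge.svg'  # hypothetical image URI

# Passing a str reproduces the error from the traceback
try:
    sha1(uri)
    raised = False
except TypeError:
    raised = True

# Encoding the string to bytes first avoids the error
basename = sha1(uri.encode('utf-8')).hexdigest()
```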
Please move
for f in glob.glob("*.grmpy.*"):
    os.remove(f)
into a new function cleanup(). This shows up numerous times in the test-related modules.
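A sketch of such a helper; the optional `pattern` argument is an addition for illustration.

```python
import glob
import os


def cleanup(pattern='*.grmpy.*'):
    """Remove all files in the current directory that match `pattern`."""
    for fname in glob.glob(pattern):
        os.remove(fname)
```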
Please do not use assert np.array_equal, use np.testing.assert_equal instead. Try to avoid using assert ... altogether in the test modules.
This could use a loop:
assert np.array_equal(df.Y[df.D == 1], df.Y1[df.D == 1])
assert np.array_equal(df.Y[df.D == 0], df.Y1[df.D == 0])
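A sketch of the loop on a small made-up dataset. It assumes that the second check is meant to compare against Y0 for the untreated, which differs from the snippet above, and it uses np.testing.assert_array_equal in place of the bare assert.

```python
import numpy as np
import pandas as pd

# Small illustrative dataset; the column names follow the snippet above
df = pd.DataFrame({
    'D': [1, 0, 1, 0],
    'Y1': [2.0, 1.5, 3.0, 0.5],
    'Y0': [1.0, 1.0, 2.0, 0.0],
})
df['Y'] = df.D * df.Y1 + (1 - df.D) * df.Y0

# The observed outcome equals Y1 for the treated and Y0 for the untreated
for status, label in [(1, 'Y1'), (0, 'Y0')]:
    subset = df[df.D == status]
    np.testing.assert_array_equal(subset.Y, subset[label])
```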
Please consider replacing this x_ = [col for col in df if col.startswith('X')]
by using http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.filter.html
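For illustration, both variants side by side on a made-up data frame:

```python
import pandas as pd

df = pd.DataFrame({'X1': [1, 2], 'X2': [3, 4], 'Y': [5, 6], 'D': [0, 1]})

# Current approach with a list comprehension
x_list = [col for col in df if col.startswith('X')]

# Same selection with DataFrame.filter and a regular expression
x_filter = df.filter(regex='^X').columns.tolist()
```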
There is a print statement in test5()
Why do we have test_1 and test_2 as separate tests when they are pretty much identical? It is enough if the is_deterministic flag is tested with probability 0.1.
Talk to you next week ...
/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
from pandas.core import datetools
Traceback (most recent call last):
File "run.py", line 16, in
estimate('test.grmpy.ini', option, optimizer='POWELL')
File "/home/peisenha/ownCloud/office/workspace/software/repositories/organizations/grmToolbox/grmpy/grmpy/estimate/estimate.py", line 40, in estimate
minimizing_interface, x0, args=(dict_, data), method=method, options=opts)
File "/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/scipy/optimize/_minimize.py", line 440, in minimize
return _minimize_powell(fun, x0, args, callback, **options)
File "/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/scipy/optimize/optimize.py", line 2435, in _minimize_powell
direc1 = direc[i]
IndexError: too many indices for array
Attached initialization file test.txt
Please implement a version of the likelihood function that relies on NumPy and avoids the time-consuming loop. As a suggestion, set up a unit test that compares your slow version against the fast version for random initialization files.
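A sketch of the pattern with a simple linear normal model standing in for the actual likelihood of the generalized Roy model. All function names are hypothetical, and random data replaces the random initialization files.

```python
import numpy as np
from scipy.stats import norm


def loglike_slow(y, x, beta, sd):
    """Negative mean log-likelihood, looping over individuals."""
    total = 0.0
    for i in range(len(y)):
        total += norm.logpdf(y[i], loc=np.dot(x[i], beta), scale=sd)
    return -total / len(y)


def loglike_fast(y, x, beta, sd):
    """Same criterion, evaluated for all individuals at once."""
    return -np.mean(norm.logpdf(y, loc=np.dot(x, beta), scale=sd))


def test_fast_vs_slow(num_tests=10, seed=456):
    """Compare both versions on randomly drawn data and parameters."""
    rng = np.random.RandomState(seed)
    for _ in range(num_tests):
        n, k = rng.randint(10, 50), rng.randint(1, 5)
        x, beta = rng.normal(size=(n, k)), rng.normal(size=k)
        sd = rng.uniform(0.1, 2.0)
        y = np.dot(x, beta) + rng.normal(scale=sd, size=n)
        np.testing.assert_allclose(
            loglike_fast(y, x, beta, sd), loglike_slow(y, x, beta, sd))


test_fast_vs_slow()
```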
Please add the calculation of the MTE based on https://www.aeaweb.org/articles?id=10.1257/aer.101.6.2754 to our information file.
Please adjust order of printed coefficients to the one in the init file. Add constant variance for V
estimation of generalized Roy Model (Heckman & Vytlacil, 2005) ...
Please add the usual link to the reference in the bibliography.