
grmpy's People

Contributors

bekauf, maxblesch, pre-commit-ci[bot], sebecker, segsell


grmpy's Issues

Comparison File

Please automatically simulate a sample based on the estimation results and compare the basic descriptives between the simulated and the observed sample ... like the counts across treatment status and the descriptives about the distribution of the observed outcomes.
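A minimal sketch of such a comparison, assuming both samples are available as pandas data frames with an outcome column `Y` and a treatment column `D`. The function name `compare_samples` and the example data are hypothetical and not part of grmpy.

```python
import numpy as np
import pandas as pd


def compare_samples(observed, simulated, outcome="Y", treatment="D"):
    """Return treatment counts and outcome descriptives for both samples side by side."""
    samples = {"observed": observed, "simulated": simulated}
    # Counts across treatment status, one column per sample.
    counts = pd.DataFrame(
        {name: df[treatment].value_counts().sort_index() for name, df in samples.items()}
    ).fillna(0).astype(int)
    # Basic descriptives of the observed outcome, one column per sample.
    descriptives = pd.DataFrame(
        {name: df[outcome].describe() for name, df in samples.items()}
    )
    return counts, descriptives


# Hypothetical data standing in for the estimation sample and the simulated sample.
rng = np.random.default_rng(0)
observed = pd.DataFrame({"Y": rng.normal(size=200), "D": rng.integers(0, 2, size=200)})
simulated = pd.DataFrame({"Y": rng.normal(size=200), "D": rng.integers(0, 2, size=200)})

counts, descriptives = compare_samples(observed, simulated)
print(counts)
print(descriptives)
```

The two tables could then be written to the comparison file in whatever layout we settle on.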

Order of information for covariance matrix

This does not have anything to do with the order that you specified in the documentation, right?

    U0_sd, U1_sd, V_sd = init_dict['DIST']['all'][:3]
    vars_ = [U0_sd ** 2, U1_sd ** 2, V_sd ** 2]
    U01, U0_V, U1_V = init_dict['DIST']['all'][3:]
    covar_ = [U01 ** 2, U0_V ** 2, U1_V ** 2]
    Dist_coeffs = init_dict['DIST']['all']

If so, please align the code with the documentation.

TODO List Sphinx

Use TODO List feature in online doc to implement idea of someday/maybe list. This centralizes all notes and is not presented online ...

These are the last notes from Evernote to integrate:
Cloud Infrastructure
MuPY
Hypothesis Package

document workflow as part of contribution

README cleanup

Please remove from README.md

The main references are:

Heckman, J. J., and Vytlacil, E. J. (2007a). Econometric evaluation of social programs, Part I: Causal effects, structural models and econometric policy evaluation. In Heckman, J. J., and Leamer, E. E., editors, Handbook of Econometrics, volume 6B, pages 4779–4874. Elsevier Science, Amsterdam, Netherlands.

Heckman, J. J., and Vytlacil, E. J. (2007b). Econometric evaluation of social programs, Part II: Using the marginal treatment effect to organize alternative economic estimators to evaluate social programs and to forecast their effects in new environments. In Heckman, J. J., and Leamer, E. E., editors, Handbook of Econometrics, volume 6B, pages 4875–5144. Elsevier Science, Amsterdam, Netherlands.

SIMULATION after Estimation Flawed

There is no role for randomness?

    import numpy as np


    def simulate_outcomes_estimation(init_dict, X, Z):
        """The function simulates the outcome Y and the resulting treatment dummy D."""
        # Distribute information
        coeffs_untreated = init_dict['UNTREATED']['all']
        coeffs_treated = init_dict['TREATED']['all']
        coeffs_cost = init_dict['COST']['all']

        # Calculate potential outcomes and costs (no unobservables enter here)
        Y_1 = np.dot(coeffs_treated, X.T)
        Y_0 = np.dot(coeffs_untreated, X.T)
        C = np.dot(coeffs_cost, Z.T)

        # Calculate expected benefit and the resulting treatment dummy
        D = np.array((Y_1 - Y_0 - C > 0).astype(int))

        # Observed outcomes
        Y = D * Y_1 + (1 - D) * Y_0

        return Y, D, Y_1, Y_0
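For illustration, a minimal sketch of how random unobservables could enter the simulation. It assumes that the estimated covariance matrix of the unobservables is available as a 3x3 array ordered as (U_1, U_0, U_C) and that the cost shock is drawn directly; both the function name and this ordering are assumptions, not the current grmpy interface.

```python
import numpy as np


def simulate_outcomes(coeffs_treated, coeffs_untreated, coeffs_cost, X, Z, cov, seed=0):
    """Simulate outcomes and treatment choices with random unobservables.

    cov is the covariance matrix of (U_1, U_0, U_C); this ordering is an assumption.
    """
    rng = np.random.default_rng(seed)
    U = rng.multivariate_normal(np.zeros(3), cov, size=X.shape[0])
    U_1, U_0, U_C = U[:, 0], U[:, 1], U[:, 2]

    # Potential outcomes and costs now include the draws of the unobservables.
    Y_1 = X @ coeffs_treated + U_1
    Y_0 = X @ coeffs_untreated + U_0
    C = Z @ coeffs_cost + U_C

    # Agents select into treatment if the surplus is positive.
    D = (Y_1 - Y_0 - C > 0).astype(int)
    Y = D * Y_1 + (1 - D) * Y_0
    return Y, D, Y_1, Y_0


# Hypothetical inputs for illustration only.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
Z = np.column_stack([np.ones(500), rng.normal(size=500)])
cov = np.array([[1.0, 0.2, 0.1], [0.2, 1.0, 0.0], [0.1, 0.0, 1.0]])
Y, D, Y_1, Y_0 = simulate_outcomes(
    np.array([1.0, 0.5]), np.array([0.5, 0.3]), np.array([0.2, 0.4]), X, Z, cov
)
print(D.mean())
```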

Prepare Branch Workflow

I want to switch to a feature branch workflow soon. For this purpose, I want to set up the automatic code review tools that assess the quality of the branch.

pei_edits

This issue simply serves the purpose to keep track of the major edits to the code in the branch.

  • improved usability of regression test runner
  • integrated custom exceptions and started initialization file checks
  • added missing docstring to check_types()
  • cleaned up regression_test_2
  • refactored simulation of unobservables
  • refactoring MTE unit test
  • refactoring of simulation modules

Update Regression Tests

Please include in our regression test a single evaluation of the criterion function at the starting values in addition to the overall statistic on the simulated dataset.

True Values vs. Init Values

Please rename the true values to init values... These are only the true values if the dataset is simulated with the same initialization file. This refers to the user option, but also inside the code if required.

Reminder Codacy

We want to integrate Codacy into future pull request requirements.

Reference on index.rst

The reference to Heckman Vytlacil in the very beginning does not conform with our treatment of references.

Economics.rst

I created a new section that describes the basic economics that underlie the generalized Roy model and discuss some selected issues in the econometrics of policy evaluation. At this point this is mainly a simple copy of the material from https://github.com/policyMetrics/miscellaneous/blob/master/Eisenhauer.2012.pdf

Please polish this section by properly formatting everything such as the equations, references. Please make sure that all references also show up in the bibliography.

ESTIMATION block in initialization file

Please add an explicit block in the initialization file that contains parameters for the estimation:

ESTIMATION

agents 1000
file data.respy.dat
maxfun 1000
optimizer FORT-NEWUOA

This is the relevant part from the respy package.

  • Agents describes the number of agents to use for the estimation, we might only want to estimate on a subset of the data in the simulation sample.
  • file is the source for the estimation sample
  • maxfun is the maximum number of function evaluations. This is a little tricky to enforce with the scipy optimizers as the concept of maxiter that the options provide is different. I usually have a user-defined error class, see here for an example. As a start you might also simply write out the number of function evaluations to a file each time the likelihood function is called.
  • Please check that the special case of maxfun = 0 checks the value of the criterion function at the starting value. This is not the same as maxfun = 1 with the BFGS, which first calculates the derivatives.
  • optimizer is the optimizer to use, in our case SCIPY-BFGS as the only option
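The maxfun idea with a user-defined error class could be sketched as follows. The names `MaxfunError`, `CountedCriterion`, and `estimate_with_maxfun` are hypothetical, and the quadratic criterion only stands in for the likelihood function.

```python
import numpy as np
from scipy.optimize import minimize


class MaxfunError(Exception):
    """Raised when the criterion function is requested more often than allowed."""


class CountedCriterion:
    """Wrap a criterion function, count its evaluations, and remember the best point."""

    def __init__(self, func, maxfun):
        self.func = func
        self.maxfun = maxfun
        self.num_evals = 0
        self.best_x = None
        self.best_value = np.inf

    def __call__(self, x, *args):
        if self.num_evals >= self.maxfun:
            raise MaxfunError
        self.num_evals += 1
        value = self.func(x, *args)
        if value < self.best_value:
            self.best_x, self.best_value = np.array(x, copy=True), value
        return value


def estimate_with_maxfun(func, x0, maxfun):
    """Return the best point, its value, and the number of optimizer evaluations."""
    x0 = np.asarray(x0, dtype=float)
    if maxfun == 0:
        # Special case: only evaluate the criterion at the starting values.
        return x0, func(x0), 0
    criterion = CountedCriterion(func, maxfun)
    try:
        minimize(criterion, x0, method="BFGS")
    except MaxfunError:
        pass  # budget exhausted, fall back to the best point seen so far
    return criterion.best_x, criterion.best_value, criterion.num_evals


def toy_criterion(x):
    return float(np.sum((x - 1.0) ** 2))


x_best, value, num_evals = estimate_with_maxfun(toy_criterion, [3.0, 3.0], maxfun=5)
print(x_best, value, num_evals)
```

Because BFGS also spends evaluations on numerical derivatives, the budget counts every call of the criterion, not iterations.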

Log file for Estimation results

At the end of each estimation, please output a file est.grmpy.info that contains the value of the parameters at the start and at the end. Also, note the number of function evaluations and the optimizer termination status as well as the optimizer message.

Example from respy attached.
est.info.txt
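A minimal sketch of writing such a file. The layout and the function name `write_est_info` are assumptions for illustration and do not reproduce the respy format from the attachment.

```python
import os
import tempfile


def write_est_info(path, start_values, end_values, num_evals, success, message):
    """Write start and end parameter values plus optimizer information to a text file."""
    lines = ["{:>10} {:>15} {:>15}".format("Parameter", "Start", "Finish"), ""]
    for i, (start, end) in enumerate(zip(start_values, end_values)):
        lines.append("{:>10d} {:>15.4f} {:>15.4f}".format(i, start, end))
    lines += [
        "",
        "Function evaluations: {}".format(num_evals),
        "Success: {}".format(success),
        "Message: {}".format(message),
    ]
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")


# Hypothetical values, written to a temporary directory for illustration.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "est.grmpy.info")
    write_est_info(
        path, [0.5, 1.0], [0.4821, 1.1035], 57, True, "Optimization terminated successfully."
    )
    with open(path) as handle:
        text = handle.read()
print(text)
```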

Cholesky Factors

Please change the setup of the optimization so that we are internally optimizing over the Cholesky factor of the covariance matrix. This allows us to avoid any of the parameter transformations, i.e. ensuring that the values are valid variances and covariances.
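A minimal sketch of the mapping between the covariance matrix and the parameters the optimizer would see. The function names are hypothetical. Any real-valued parameter vector maps back to a symmetric, positive semidefinite matrix, so no further transformation is needed.

```python
import numpy as np


def cov_to_params(cov):
    """Flatten the lower-triangular Cholesky factor of cov into a parameter vector."""
    chol = np.linalg.cholesky(cov)
    return chol[np.tril_indices(cov.shape[0])]


def params_to_cov(params, dim=3):
    """Rebuild the covariance matrix from the flattened Cholesky factor."""
    chol = np.zeros((dim, dim))
    chol[np.tril_indices(dim)] = params
    return chol @ chol.T


# Hypothetical covariance matrix of the three unobservables.
cov = np.array([[1.0, 0.3, 0.2], [0.3, 2.0, 0.1], [0.2, 0.1, 1.5]])
params = cov_to_params(cov)
np.testing.assert_allclose(params_to_cov(params), cov)

# An arbitrary parameter vector still yields a valid covariance matrix.
arbitrary = params_to_cov(np.array([0.5, -2.0, 0.1, 3.0, -0.7, 0.2]))
print(np.linalg.eigvalsh(arbitrary))
```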

Order of Unobservables

The unobservables are ordered (U_0, U_1) while the potential outcomes are always (Y_1, Y_0). Please adjust code, simulation output, and documentation so that the unobservables are (U_1, U_0) as well ...

POWELL

Please add POWELL as an alternative optimizer to request by the user.

SCIPY Optimization

Do we correctly understand that the return values are always the starting values if the success indicator is false?
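One way to check this empirically is to stop BFGS after a single iteration on a standard test function and compare the returned values with the starting values. This is only a sketch on the Rosenbrock function, not on our criterion function, so the behavior for our setup would still need to be confirmed.

```python
import numpy as np
from scipy.optimize import minimize, rosen

x0 = np.array([-1.2, 1.0])

# Stop after one iteration so that the optimizer cannot report success.
result = minimize(rosen, x0, method="BFGS", options={"maxiter": 1})

print(result.success, result.message)
print(result.x)
```

In this example the success indicator is false, yet `result.x` holds the point reached at termination rather than the starting values.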

MTE Calculation

As you suspected, the MTE calculation is only valid for the special case of var(V) = 1. Please generalize the function ...
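A minimal sketch of the generalization, assuming jointly normal, mean-zero unobservables, so that E[U_1 - U_0 | V = v] = (cov(U_1, V) - cov(U_0, V)) / var(V) * v and the quantile u_D corresponds to v = sd(V) * Phi^{-1}(u_D). The function name and its arguments are hypothetical.

```python
from statistics import NormalDist


def mte(x_beta_diff, cov_u1_v, cov_u0_v, sd_v, quantile):
    """Marginal treatment effect at a quantile of V for a general standard deviation of V.

    x_beta_diff is the observable part X(beta_1 - beta_0).
    """
    # Value of V at the requested quantile.
    v = sd_v * NormalDist().inv_cdf(quantile)
    # Conditional expectation of U_1 - U_0 given V = v under joint normality.
    return x_beta_diff + (cov_u1_v - cov_u0_v) / sd_v ** 2 * v


print(mte(0.5, 0.4, 0.1, 1.0, 0.975))
print(mte(0.5, 0.4, 0.1, 2.0, 0.975))
```

For sd_v = 1 this reduces to the special case that the current calculation covers.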

... documentation skeleton

Please conduct a couple of edits to the documentation:

  • fix all links, the markdown syntax seems different from GitHub Wiki. See the Handbook references in the beginning for example
  • since this is just a ripoff from the respy doc at the moment, please check all links that they actually work and point to grmpy material
  • add Tobias to contributors

Layout of the Information File

I want to improve the layout of the information file. Also, prepare layout for subsequent MTE implementation by Sebastian.

BFGS Options

Please add the feature that all BFGS options are specified in the initialization file, see below for the example from respy.

SCIPY-BFGS

gtol 0.000100000000000
maxiter 1

[ ] incorporate in read()
[ ] add tests at beginning of estimate() to see whether specified with valid input values.
[ ] add to random initialization file generator
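The first two items could be sketched roughly like this. The parser assumes that a line with a single token starts a new block, which is a guess about the initialization file layout. `read_block` and `check_bfgs_options` are hypothetical helpers, not the existing read().

```python
def read_block(lines, block_name, types):
    """Collect the key/value pairs listed below the header line block_name."""
    options, inside = {}, False
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue
        if len(tokens) == 1:
            # Assumption: a line with a single token starts a new block.
            inside = tokens[0] == block_name
        elif inside and tokens[0] in types:
            options[tokens[0]] = types[tokens[0]](tokens[1])
    return options


def check_bfgs_options(options):
    """Raise a ValueError if the BFGS options are missing or invalid."""
    for key in ("gtol", "maxiter"):
        if key not in options:
            raise ValueError("missing option: {}".format(key))
    if options["gtol"] <= 0:
        raise ValueError("gtol must be positive")
    if options["maxiter"] < 0:
        raise ValueError("maxiter must not be negative")


init_text = """
ESTIMATION

agents 1000
maxfun 1000

SCIPY-BFGS

gtol 0.000100000000000
maxiter 1
"""

bfgs_options = read_block(
    init_text.splitlines(), "SCIPY-BFGS", {"gtol": float, "maxiter": int}
)
check_bfgs_options(bfgs_options)
print(bfgs_options)
```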

Simulation of Binary Covariates

Please implement the following feature: I want to be able to specify in the initialization file that a certain covariate takes on only the values one and zero ...

  • make sure to document the new feature, as well as the other defaults currently implemented
  • also make sure that it is part of the random init generator

Definition of Done:

  • requires an update to the documentation
  • requires an updated regression test battery
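On the simulation side, the feature could look roughly like this. How a binary covariate is flagged in the initialization file is left open here. The `binary` argument, which maps a column index to the probability of a one, is a hypothetical stand-in for that specification.

```python
import numpy as np


def simulate_covariates(num_agents, num_covars, binary=None, seed=0):
    """Draw covariates, replacing the requested columns with zero/one draws.

    binary maps a column index to the probability of observing a one.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(num_agents, num_covars))
    X[:, 0] = 1.0  # intercept
    for column, prob in (binary or {}).items():
        X[:, column] = rng.binomial(1, prob, size=num_agents)
    return X


X = simulate_covariates(1000, 4, binary={2: 0.3})
print(X[:5])
print(X[:, 2].mean())
```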

Docstrings

Please fix formatting in conftest.py. No need to assign the issue back to me, just close it right away when you are done.

Failed Regression tests

Installing the package in development mode and running py.test gives an error. Please fix.


(grmToolbox) peisenha@pontos:~/grmToolbox/grmpy$ pip install -e .
Obtaining file:///home/peisenha/ownCloud/office/workspace/software/repositories/organizations/grmToolbox/grmpy
Requirement already satisfied: numpy in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: scipy in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: pytest in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: pandas in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: statsmodels in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from grmpy==0.0.5.dev0)
Requirement already satisfied: py>=1.4.33 in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from pytest->grmpy==0.0.5.dev0)
Requirement already satisfied: setuptools in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from pytest->grmpy==0.0.5.dev0)
Requirement already satisfied: python-dateutil>=2 in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from pandas->grmpy==0.0.5.dev0)
Requirement already satisfied: pytz>=2011k in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from pandas->grmpy==0.0.5.dev0)
Requirement already satisfied: patsy in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from statsmodels->grmpy==0.0.5.dev0)
Requirement already satisfied: six>=1.5 in /home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages (from python-dateutil>=2->pandas->grmpy==0.0.5.dev0)
Installing collected packages: grmpy
  Found existing installation: grmpy 0.0.5.dev0
    Uninstalling grmpy-0.0.5.dev0:
      Successfully uninstalled grmpy-0.0.5.dev0
  Running setup.py develop for grmpy
Successfully installed grmpy
(grmToolbox) peisenha@pontos:~/grmToolbox/grmpy$ py.test
============================= test session starts ==============================
platform linux -- Python 3.5.2, pytest-3.2.1, py-1.4.34, pluggy-0.4.0
rootdir: /home/peisenha/ownCloud/office/workspace/software/repositories/organizations/grmToolbox/grmpy, inifile:
collected 7 items                                                               

grmpy/test/test_integration.py .F
grmpy/test/test_unit.py .....

================================================================================================= FAILURES =================================================================================================
_____________________________________________________________________________________________ TestClass.test2 ______________________________________________________________________________________________

self = <grmpy.test.test_integration.TestClass object at 0x7f6f388a5048>

    def test2(self):
        """The test takes a subsample of 5 random entries from the regression battery test list
            (resources/regression_vault.grmpy.json), simulates the specific output again, sums the
            resulting data frame up and checks if the sum is equal to the regarding entry in the test
            list eement.
            """
        tests = json.load(
>           open('{}'.format(os.getcwd()) + '/test/resources/regression_vault.grmpy.json', 'r'))
E       FileNotFoundError: [Errno 2] No such file or directory: '/home/peisenha/ownCloud/office/workspace/software/repositories/organizations/grmToolbox/grmpy/test/resources/regression_vault.grmpy.json'

grmpy/test/test_integration.py:35: FileNotFoundError
=================================================================================== 1 failed, 6 passed in 11.27 seconds ====================================================================================
(grmToolbox) peisenha@pontos:~/grmToolbox/grmpy$ 
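The traceback suggests that the path to the regression vault depends on the directory py.test is started from. One possible fix is to build the path relative to the test module instead of os.getcwd(). The helper name `resource_path` is hypothetical.

```python
from pathlib import Path


def resource_path(module_file, filename):
    """Return the path of a file in the resources folder next to the given test module."""
    return Path(module_file).parent / "resources" / filename


# Inside test_integration.py this would be called with __file__, for example:
#     with open(resource_path(__file__, "regression_vault.grmpy.json")) as handle:
#         tests = json.load(handle)
example = resource_path("/repo/grmpy/test/test_integration.py", "regression_vault.grmpy.json")
print(example)
```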

Discuss Estimation Feature

Require user to specify the column number where the regressor is found ... Impose restriction for now that the columns need to be specified identically for treated and untreated.

  • document that we have strict separation between cost and benefit shifters, we will weaken that restriction in due time.

Agent's Information Set

The individuals know their values for U_1 and U_0 when making their decision. Please account for this when simulating the choice.

Link to Tutorial

The link to the tutorial init file refers to an old branch and needs to be updated once we are back in master.

sphinx latexpdf

Creating a pdf from our documentation fails, please see if you can reproduce the problem and fix it.


(grmToolbox) peisenha@pontos:~/grmToolbox/grmpy/docs$ make latexpdf
Running Sphinx v1.6.3
making output directory...
/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/sphinx/util/compat.py:40: RemovedInSphinx17Warning: sphinx.util.compat.Directive is deprecated and will be removed in Sphinx 1.7, please use docutils' instead.
  RemovedInSphinx17Warning)
loading pickled environment... done
building [mo]: targets for 0 po files that are out of date
building [latex]: all documents
updating environment: 0 added, 0 changed, 0 removed
looking for now-outdated files... none found
processing grmpy.tex...index economics installation tutorial reliability software_engineering contributing credits changes bibliography 
resolving references...

Exception occurred:
  File "/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/sphinx/transforms/post_transforms/images.py", line 73, in handle
    basename = sha1(node['uri']).hexdigest()
TypeError: Unicode-objects must be encoded before hashing
The full traceback has been saved in /tmp/sphinx-err-0hbedqya.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
A bug report can be filed in the tracker at <https://github.com/sphinx-doc/sphinx/issues>. Thanks!
Makefile:20: recipe for target 'latexpdf' failed
make: *** [latexpdf] Error 1
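The TypeError itself can be reproduced outside of Sphinx: under Python 3, hashing a string requires encoding it to bytes first. Since the failing line lives inside the Sphinx package, the practical fix is probably a different Sphinx version rather than a change on our side. The URI below is a hypothetical placeholder.

```python
import hashlib

uri = "figures/example.png"  # hypothetical image URI

try:
    hashlib.sha1(uri)  # a str is rejected under Python 3
except TypeError as error:
    print(error)

# Encoding the string first works as expected.
basename = hashlib.sha1(uri.encode("utf-8")).hexdigest()
print(basename)
```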

Some comments on test_unit.py

  • Please refactor
            for f in glob.glob("*.grmpy.*"):
                os.remove(f)

into a new function cleanup(). This shows up numerous times in the test-related modules.

  • Please do not use assert np.array_equal; use np.testing.assert_equal instead. Try to avoid using assert ... altogether in the test modules.

  • This could use a loop:
    assert np.array_equal(df.Y[df.D == 1], df.Y1[df.D == 1])
    assert np.array_equal(df.Y[df.D == 0], df.Y1[df.D == 0])

  • Please consider replacing this x_ = [col for col in df if col.startswith('X')] by using http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.filter.html

  • There is a print statement in test5()

  • Why do we have test_1 and test_2 separate, they are pretty much identical? It is enough if the is_deterministic flag is tested with probability 0.1

Talk to you next week ...
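The refactorings suggested above could be sketched as follows. The data frame is a hypothetical stand-in for the simulated sample, and the loop assumes that the untreated comparison is meant to use a Y0 column.

```python
import glob
import os

import numpy as np
import pandas as pd


def cleanup(pattern="*.grmpy.*"):
    """Remove all files in the current directory that match the pattern."""
    for fname in glob.glob(pattern):
        os.remove(fname)


# Hypothetical data frame standing in for the simulated sample.
df = pd.DataFrame(
    {
        "X_0": [1.0, 1.0, 1.0],
        "X_1": [0.3, 0.1, 0.7],
        "Y1": [2.0, 3.0, 4.0],
        "Y0": [1.0, 1.5, 0.5],
        "D": [1, 0, 1],
    }
)
df["Y"] = np.where(df["D"] == 1, df["Y1"], df["Y0"])

# Loop over treatment status instead of repeating the comparison.
for status, label in [(1, "Y1"), (0, "Y0")]:
    subset = df[df["D"] == status]
    np.testing.assert_array_equal(subset["Y"], subset[label])

# Select the covariate columns with DataFrame.filter.
x_ = df.filter(regex=r"^X").columns.tolist()
print(x_)
```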

POWELL Problem

/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/statsmodels/compat/pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
from pandas.core import datetools
Traceback (most recent call last):
File "run.py", line 16, in
estimate('test.grmpy.ini', option, optimizer='POWELL')
File "/home/peisenha/ownCloud/office/workspace/software/repositories/organizations/grmToolbox/grmpy/grmpy/estimate/estimate.py", line 40, in estimate
minimizing_interface, x0, args=(dict_, data), method=method, options=opts)
File "/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/scipy/optimize/_minimize.py", line 440, in minimize
return _minimize_powell(fun, x0, args, callback, **options)
File "/home/peisenha/.envs/grmToolbox/lib/python3.5/site-packages/scipy/optimize/optimize.py", line 2435, in _minimize_powell
direc1 = direc[i]
IndexError: too many indices for array

Attached initialization file test.txt

NUMPY Criterion

Please implement a version of the likelihood function that relies on NUMPY and avoids the time-consuming loop. As a suggestion, set up a unit test that compares your slow version against a fast version for random initialization files.
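A sketch of what such a unit test could look like. The criterion here is the negative mean log-likelihood of a simple normal linear model, used purely as a simplified stand-in for our likelihood function. The function names and the random setups are assumptions.

```python
import math

import numpy as np


def negloglike_slow(beta, sd, X, y):
    """Loop over all observations, one likelihood contribution at a time."""
    total = 0.0
    for i in range(len(y)):
        resid = y[i] - sum(X[i, k] * beta[k] for k in range(len(beta)))
        total += -0.5 * math.log(2 * math.pi * sd ** 2) - resid ** 2 / (2 * sd ** 2)
    return -total / len(y)


def negloglike_fast(beta, sd, X, y):
    """Compute all likelihood contributions at once with NUMPY."""
    resid = y - X @ beta
    contribs = -0.5 * np.log(2 * np.pi * sd ** 2) - resid ** 2 / (2 * sd ** 2)
    return -np.mean(contribs)


# Compare both versions on random setups.
rng = np.random.default_rng(123)
for _ in range(10):
    num_agents, num_covars = rng.integers(5, 50), rng.integers(1, 5)
    X = rng.normal(size=(num_agents, num_covars))
    beta = rng.normal(size=num_covars)
    sd = rng.uniform(0.5, 2.0)
    y = X @ beta + rng.normal(scale=sd, size=num_agents)
    np.testing.assert_allclose(
        negloglike_slow(beta, sd, X, y), negloglike_fast(beta, sd, X, y)
    )
print("slow and fast versions agree")
```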

est.grmpy.info

Please adjust order of printed coefficients to the one in the init file. Add constant variance for V

index.rst

estimation of generalized Roy Model (Heckman & Vytlacil, 2005) ...
Please add the usual link to the reference in the bibliography.
