buddejul / pyvmte
Implementation of Mogstad, Santos, Torgovitsky (2018, Econometrica), "Using Instrumental Variables for Inference About Policy Relevant Treatment Parameters".
Implement type hinting for all non-interface functions.
Also, implement type conversion functions for interfaces, see #9.
I think the weights from the gamma linear map are not estimated correctly, because we currently do not use the estimated propensity score associated with each z value. Somewhere we probably need to pass in a pscore array that holds the propensity score for each observation i in the data.
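The pscore array described above could be built by mapping each observation's instrument value to the estimated propensity score of that value. A sketch, with assumed argument names (`support_z`, `pscore_z` mirror the inputs mentioned elsewhere in these issues):

```python
import numpy as np

def pscore_per_observation(
    z_data: np.ndarray, support_z: np.ndarray, pscore_z: np.ndarray
) -> np.ndarray:
    """Look up the estimated pscore of each observation's z value."""
    lookup = dict(zip(support_z, pscore_z))
    return np.array([lookup[z] for z in z_data])

# Illustrative instrument with support {0, 1, 2} and pscores (0.35, 0.6, 0.7):
pscores_i = pscore_per_observation(
    np.array([0, 2, 1, 2]), np.array([0, 1, 2]), np.array([0.35, 0.6, 0.7])
)
# pscores_i == [0.35, 0.7, 0.6, 0.7]
```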
Think about how to parallelize the Monte Carlo simulations. Currently we use the pytask-parallel plugin, but it is not clear how it handles RNG. Maybe move to joblib, using the example from here: https://albertcthomas.github.io/good-practices-random-number-generators/
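The pattern from the linked post can be sketched as follows: spawn one child `SeedSequence` per Monte Carlo task so the streams are independent and reproducible. The loop below is sequential; with joblib it would become `Parallel(n_jobs=4)(delayed(one_simulation)(rng) for rng in rngs)`.

```python
import numpy as np

def one_simulation(rng: np.random.Generator) -> float:
    # placeholder for a single Monte Carlo replication
    return rng.normal()

seed_seq = np.random.SeedSequence(12345)
child_seqs = seed_seq.spawn(4)  # one independent stream per task
rngs = [np.random.default_rng(s) for s in child_seqs]

results = [one_simulation(rng) for rng in rngs]
```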
This is a bit tricky because, in general, the partition will not coincide with the analytic solution. What I could do: get the analytic solution for every partition, then compare the estimated solution to the computed one.
Currently there is a lot of repetitive code in the tests for the identification part, although we only use the DGP from the paper. Use fixtures / think about how to make function calls smaller (e.g. fix arguments that never change beforehand).
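A sketch of the fixture idea: collect the DGP arguments that never change in one module-scoped pytest fixture, so each test call stays short. All names and values here are illustrative assumptions.

```python
import pytest

@pytest.fixture(scope="module")
def dgp_kwargs():
    # hypothetical fixed DGP arguments shared by many identification tests
    return {
        "support_z": (0, 1, 2),
        "pscore_z": (0.35, 0.6, 0.7),
        "pdf_z": (0.5, 0.4, 0.1),
    }

def test_pdf_sums_to_one(dgp_kwargs):
    assert sum(dgp_kwargs["pdf_z"]) == 1.0
```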
What should the user specify for target_estimand and identified_estimands in the estimation mode? I think this is the same for the OLS slope, IV slope, and cross moments. However, LATE could be different. Two options:
estimand_late_v1 = {
"type": "late",
"u_lo": some_pscore_lo,
"u_hi": some_pscore_hi,
}
estimand_late_v2 = {
"type": "late",
"z_lo": some_z_val_lo,
"z_hi": some_z_val_hi,
}
v1 is weird because it requires the user to specify the propensity scores for which the LATE is identified. This could lead to downstream errors, because we also (re-)estimate these in the code, so we might end up with two numbers for the same pscore that differ by some epsilon.
v2 is better, I think: in our setup, any LATEs are identified from certain values of the instrument Z and their associated propensity scores. Is this true? Need to check the Angrist and Imbens paper.
Currently we do not do any error handling or argument processing. Add error handling (and maybe argument conversion), with a good report in case of errors, at the top of the pyvmte interface.
None.
Implement estimation of the simple sharp non-parametric model. Requires setting up the estimation part of the paper.
None
Becomes important when simulating with low sample sizes. Should make sure this doesn't result in an error when using pytask.
The identification function allows for multiple estimand dicts in a list by looping over that list. However, if we supply a single dict instead of a list containing that dict, the loop breaks.
This makes writing code with single estimands unintuitive, because we need to specify the argument as a list.
Fix: convert the input to a list right after it is provided so the loop works.
Currently most of the functions in the identification part do not have unit tests.
Not a problem yet, but once the Python code works well, most time should be spent solving the LPs. Have a look at this benchmark. scipy.linprog natively supports only very few of the algorithms (e.g. HiGHS), so we would need to rewrite the formulation of the problem.
Currently it's super annoying to keep track of which parameters belong to which estimand when there are multiple parameters of one type. E.g. with multiple identified LATEs we have multiple values of u_lo and u_hi, each specific to that LATE.
Proposal: each estimand (target or identified) gets its own dictionary that contains all the estimand-specific information (e.g. u_lo, u_hi, d, z). One alternative might be implementing an estimand object.
example_cross = {
'estimand_type': 'cross',
'd': 0,
'z': 0
}
example_ols = {
'estimand_type': 'ols_slope',
}
example_iv = {
'estimand_type': 'iv_slope',
}
example_late = {
'estimand_type': 'late',
'u_lo': 0.35,
'u_hi': 0.9
}
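The "estimand object" alternative could look like the following sketch; the field names mirror the dict examples above, but the class itself is an assumption, not existing pyvmte code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Estimand:
    """One estimand with all its type-specific parameters in one place."""
    estimand_type: str
    u_lo: Optional[float] = None  # only for LATE
    u_hi: Optional[float] = None  # only for LATE
    d: Optional[int] = None       # only for cross moments
    z: Optional[int] = None       # only for cross moments

late = Estimand(estimand_type="late", u_lo=0.35, u_hi=0.9)
```

Compared to plain dicts, this gives attribute access, defaults, and a place to attach validation later.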
u_hi_target
The second-step linear program fails for the test_second_step_linear_program_runs inputs.
Solution to linear program.
Currently the gamma_star function does not follow good style. Use a functional approach with helper functions.
None
At the moment we use scipy routines to integrate the Bernstein and constant basis functions. However, these have analytical solutions, which are preferable. Also, scipy throws a warning for every integration, e.g.
IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator
on the subranges. Perhaps a special-purpose integrator should be used.
integrate.quad(func, 0, 1, args=(z,))[0] * pdf_z[i]
Related to issue #7.
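The analytical solutions mentioned above are standard facts about the Bernstein basis B_{k,n}(u) = C(n,k) u^k (1-u)^(n-k): its integral over [0, 1] is 1/(n+1) for every k, and the partial integral over [0, x] is (1/(n+1)) * sum_{j=k+1}^{n+1} B_{j,n+1}(x). A sketch using these identities (pyvmte's actual basis-function code may be organized differently):

```python
from math import comb

def bernstein(k: int, n: int, u: float) -> float:
    """Bernstein basis polynomial B_{k,n} evaluated at u."""
    return comb(n, k) * u**k * (1 - u) ** (n - k)

def bernstein_integral(k: int, n: int, x: float) -> float:
    """Integral of B_{k,n} over [0, x] via the degree-raising identity,
    no numerical quadrature (and hence no IntegrationWarning) needed."""
    return sum(bernstein(j, n + 1, x) for j in range(k + 1, n + 2)) / (n + 1)

# e.g. bernstein_integral(k, n, 1.0) == 1 / (n + 1) for every k
```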
Defining the settings for all simulations in config.py and using it as a dependency triggers a rerun of all simulations, which is not ideal. Instead, use different config files.
Implement identification under shape constraints (see Figures 6 and 7 in the paper). This involves the pyvmte, identification, and estimation functions.
Identification tests for Figures 2, 3, and 5 fail for the upper bound.
Currently most functions take the separate inputs pscore_z, support_z, pdf_z but just pass them on to another function. This makes the code unnecessarily hard to read.
Instead, we should provide a dictionary containing all relevant info about z, and only the final functions should unpack this dict.
Not relevant atm.
config.py has
SETUP = {
"target": "late",
"lower_bound": -0.421,
"upper_bound": 0.500,
}
and the test
@pytest.mark.parametrize("setup,lower_bound", [(SETUP, some_value)])
def test_abc(setup, lower_bound):
target = setup
target["lower_bound"] = lower_bound
This type of assignment has side effects on the parameter settings stored in the dict (which is mutable). Tests running after this test will then fail because they import the changed version of SETUP.
All tests should pass irrespective of order; tests shouldn't have side effects.
Fix: implement SETUP as a NamedTuple.
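The NamedTuple fix could look like the following sketch: SETUP becomes immutable, so a test cannot mutate the shared parameters, and tests that need a variant use `_replace` to get a modified copy. The values are copied from the issue; the class name is an assumption.

```python
from typing import NamedTuple

class Setup(NamedTuple):
    target: str
    lower_bound: float
    upper_bound: float

SETUP = Setup(target="late", lower_bound=-0.421, upper_bound=0.500)

# Tests that need a modified version take a copy, leaving SETUP untouched:
modified = SETUP._replace(lower_bound=-0.3)
```

Any attempt to assign `SETUP.lower_bound = ...` now raises an AttributeError instead of silently changing shared state.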