
pyvmte's People

Contributors

buddejul


pyvmte's Issues

ENH: Implement type hinting

Implement type hinting for all non-interface functions.

Also, implement type conversion functions for interfaces, see #9.
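The following is a minimal sketch of the split between type-hinted internal functions and a converting interface. It assumes numpy arrays are the internal representation. `weighted_mean`, `_weighted_mean` and `_to_float_array` are hypothetical names, not existing pyvmte functions.

```python
import numpy as np
from numpy.typing import ArrayLike


def _to_float_array(x: ArrayLike, name: str) -> np.ndarray:
    """Interface-side conversion: accept lists, tuples or arrays, return a 1-D float array."""
    arr = np.asarray(x, dtype=float)
    if arr.ndim != 1:
        raise TypeError(f"{name} must be one-dimensional, got shape {arr.shape}.")
    return arr


def _weighted_mean(values: np.ndarray, weights: np.ndarray) -> float:
    """Non-interface function: fully type-hinted and assumes already-converted inputs."""
    return float(np.sum(values * weights) / np.sum(weights))


def weighted_mean(values: ArrayLike, weights: ArrayLike) -> float:
    """Interface function: converts the user input, then calls the typed internal function."""
    return _weighted_mean(
        _to_float_array(values, "values"), _to_float_array(weights, "weights")
    )
```

Only the interface function has to deal with loose input types. Everything behind it can rely on its annotations.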

BUG:

Problem description

I think the weights from the gamma linear map are not estimated correctly, because we currently do not use the estimated pscore associated with each z value (i.e. somewhere we need to pass in a pscore array that holds the pscore for each observation i in the data).
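Here is a minimal sketch of building that per-observation pscore array. It assumes Z is discrete and that the estimated pscore is stored once per support point. `pscore_per_observation` is a hypothetical helper, and the resulting array would then be passed to the weight estimation.

```python
import numpy as np


def pscore_per_observation(
    z: np.ndarray, support_z: np.ndarray, pscore_z: np.ndarray
) -> np.ndarray:
    """Map each observation's instrument value z_i to its estimated pscore p(z_i).

    Assumes a discrete instrument: every value in z must appear in support_z.
    """
    lookup = dict(zip(support_z.tolist(), pscore_z.tolist()))
    try:
        return np.array([lookup[zi] for zi in z.tolist()], dtype=float)
    except KeyError as err:
        raise ValueError(f"z value {err.args[0]!r} is not in support_z.") from err
```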

Expected Output

ENH: Test for consistent estimation of LP inputs

This is a bit tricky because, in general, the partition will not coincide with the one underlying the analytic solution.

What I could do: compute the analytic solution for every partition, then compare the estimated solution to the computed one.

ENH: Use pytest fixtures for unit tests identification part

Is your feature request related to a problem?

Currently there is a lot of repetitive code in the tests for the identification part, although we only use the DGP from the paper.

Describe the solution you'd like

Use fixtures and think about how to make function calls shorter (e.g. specify arguments that never change beforehand).
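This sketch pre-binds the arguments that never change using `functools.partial`. The `identification` stub and the `DGP_PAPER` values are hypothetical placeholders, not pyvmte's actual signature or the paper's numbers. In a test module, the same `partial` object could be returned from a `@pytest.fixture`, so that each test only passes the estimands it varies.

```python
from functools import partial


def identification(target, identified_estimands, support_z, pscore_z, pdf_z):
    """Hypothetical stand-in; the real pyvmte function takes different arguments."""
    return {
        "target": target,
        "n_identified": len(identified_estimands),
        "n_support": len(support_z),
    }


# The DGP arguments never change across the identification tests,
# so bind them once (illustrative placeholder values).
DGP_PAPER = {
    "support_z": [0, 1, 2],
    "pscore_z": [0.35, 0.6, 0.7],
    "pdf_z": [0.5, 0.4, 0.1],
}
identification_paper = partial(identification, **DGP_PAPER)

# Each call now only passes what actually varies between tests.
result = identification_paper(
    target={"type": "late", "u_lo": 0.35, "u_hi": 0.9},
    identified_estimands=[{"type": "iv_slope"}, {"type": "ols_slope"}],
)
```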

(Potential) BUG: Specification of estimands for LATE.

What should the user specify for target_estimand and identified_estimands in the estimation mode?

I think this is the same for OLS slope, IV slope, and cross moments.

However, LATE could be different. There are two options:

estimand_late_v1 = {
    "type": "late",
    "u_lo": some_pscore_lo,
    "u_hi": some_pscore_hi,
}

estimand_late_v2 = {
    "type": "late",
    "z_lo": some_z_val_lo,
    "z_hi": some_z_val_hi,
}

v1 is awkward because it requires the user to specify the propensity scores for which the LATE is identified. This could lead to downstream errors: we also (re-)estimate the propensity scores in the code, so we might end up with two numbers for the same pscore that differ by some epsilon.

v2 is better, I think: in our setup, any LATE is identified from certain values of the instrument Z and their associated propensity scores. Is this true? Need to check the Angrist and Imbens paper.
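This sketch shows how the v2 specification could be translated internally into pscore bounds. It assumes a discrete Z and that the user orders the values so that p(z_lo) < p(z_hi). The bounds are looked up from the estimated pscores, so they are exactly the numbers used elsewhere in the code, which avoids the epsilon mismatch of v1. `late_bounds_from_z` is a hypothetical helper and does not settle the open question about Angrist and Imbens.

```python
def late_bounds_from_z(estimand: dict, support_z: list, pscore_z: list) -> dict:
    """Translate a z-based LATE spec (v2) into pscore bounds (v1)."""
    # Look the bounds up in the *estimated* pscores, so they are the exact same
    # floats used elsewhere in the code (no epsilon mismatch).
    pscore_by_z = dict(zip(support_z, pscore_z))
    u_lo = pscore_by_z[estimand["z_lo"]]
    u_hi = pscore_by_z[estimand["z_hi"]]
    if not u_lo < u_hi:
        raise ValueError("Expected p(z_lo) < p(z_hi); check the order of z_lo and z_hi.")
    return {"type": "late", "u_lo": u_lo, "u_hi": u_hi}
```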

ENH: Implement error handling and argument processing for

Is your feature request related to a problem?

Currently we do not do any error handling or argument processing.

Describe the solution you'd like

Error handling (and maybe argument conversion) at the top of the pyvmte interface, with an informative report in case of errors.
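Here is a minimal sketch that collects all input problems and raises a single `ValueError` with a report at the interface. The required keys follow the estimand dicts discussed in these issues (`type`, `u_lo`/`u_hi`, `d`/`z`). Both the key names and `check_inputs` itself are assumptions.

```python
# Assumed estimand types and the extra keys each one needs.
REQUIRED_KEYS = {
    "ols_slope": set(),
    "iv_slope": set(),
    "late": {"u_lo", "u_hi"},
    "cross": {"d", "z"},
}


def _check_estimand(estimand, name):
    """Return a list of problems with one estimand (empty if it is fine)."""
    if not isinstance(estimand, dict):
        return [f"{name} must be a dict, got {type(estimand).__name__}."]
    est_type = estimand.get("type")
    if est_type not in REQUIRED_KEYS:
        return [f"{name}: unknown type {est_type!r}; expected one of {sorted(REQUIRED_KEYS)}."]
    missing = REQUIRED_KEYS[est_type] - set(estimand)
    if missing:
        return [f"{name}: missing keys {sorted(missing)} for type {est_type!r}."]
    return []


def check_inputs(target, identified_estimands):
    """Collect all problems first, then raise once with a full report."""
    errors = _check_estimand(target, "target")
    for i, estimand in enumerate(identified_estimands):
        errors += _check_estimand(estimand, f"identified_estimands[{i}]")
    if errors:
        report = "\n".join(f"  - {e}" for e in errors)
        raise ValueError(f"Invalid pyvmte inputs:\n{report}")
```

Reporting every problem at once spares the user a fix-one-error-rerun cycle.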

API breaking implications

None.

ENH: Estimation for Simple Model

Is your feature request related to a problem?

Implement estimation of the simple sharp non-parametric model.

Describe the solution you'd like

Requires setting up the estimation part of the paper.

API breaking implications

None

BUG: Function calls to identification require list input even if single estimand.

The function identification allows for multiple estimand dicts in a list by looping over that list. However, if we supply a single dict rather than a list containing one dict, the loop breaks.

This makes writing code with a single estimand unintuitive, because we always need to wrap the arguments in a list.

Proposed Solution

Wrap a single dict in a list at the start of the function so the loop works.

ENH: Look for faster solver for LP

Is your feature request related to a problem?

Not a problem yet, but once the Python code works well, most of the runtime should be spent solving the LPs.

Describe the solution you'd like

Have a look at this benchmark

  • Think about the structure of our problem; try out a few free solvers that are available without much rewriting.

API breaking implications

  • scipy.optimize.linprog natively supports only a few of the algorithms in the benchmark (e.g. HiGHS), so using the others would require rewriting the problem formulation.
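This sketch compares the HiGHS variants that `scipy.optimize.linprog` accepts (`"highs"`, `"highs-ds"`, `"highs-ipm"`) on a toy LP. The toy problem is only a placeholder. Meaningful timings require the actual constraint matrices from the identification or estimation step.

```python
import time

from scipy.optimize import linprog

# Toy LP: minimize -x - 2y  s.t.  x + y <= 4,  x + 3y <= 6,  x, y >= 0.
c = [-1.0, -2.0]
A_ub = [[1.0, 1.0], [1.0, 3.0]]
b_ub = [4.0, 6.0]


def time_methods(methods=("highs", "highs-ds", "highs-ipm"), n_reps=20):
    """Return {method: (optimal value, average seconds per solve)}."""
    results = {}
    for method in methods:
        start = time.perf_counter()
        for _ in range(n_reps):
            res = linprog(
                c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)], method=method
            )
        elapsed = (time.perf_counter() - start) / n_reps
        results[method] = (res.fun, elapsed)
    return results
```

Solvers outside SciPy (e.g. commercial or other open-source LP solvers) would need their own interface, which is where the rewriting cost mentioned above comes in.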

ENH: Specify target and identified estimands as dictionaries.

Is your feature request related to a problem?

Currently it is very tedious to keep track of which parameters belong to which estimand when there are multiple estimands of one type. E.g., with multiple identified LATEs we have multiple values of u_lo and u_hi, each specific to one LATE.

Describe the solution you'd like

Each estimand (target or identified) gets its own dictionary containing all the estimand-specific information.

  • OLS and IV: no additional info beyond type
  • LATE: u_lo, u_hi
  • cross moments: values of d, z

Describe alternatives you've considered

One alternative would be to implement each estimand as an object.

example_cross = {
    'estimand_type': 'cross',
    'd': 0,
    'z': 0
}

example_ols = {
    'estimand_type': 'ols_slope', 
}

example_iv = {
    'estimand_type': 'iv_slope', 
}

example_late = {
    'estimand_type': 'late', 
    'u_lo': 0.35,
    'u_hi': 0.9
}
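This is a sketch of the estimand-object alternative. It assumes a frozen dataclass whose fields mirror the dict keys in the examples above, so those dicts can be converted with `Estimand.from_dict`. The class and its validation rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Estimand:
    estimand_type: str
    u_lo: Optional[float] = None  # LATE only
    u_hi: Optional[float] = None  # LATE only
    d: Optional[int] = None  # cross moments only
    z: Optional[float] = None  # cross moments only

    def __post_init__(self) -> None:
        # Validate once, when the object is created.
        if self.estimand_type == "late":
            if self.u_lo is None or self.u_hi is None:
                raise ValueError("LATE requires u_lo and u_hi.")
            if not 0 <= self.u_lo < self.u_hi <= 1:
                raise ValueError("LATE requires 0 <= u_lo < u_hi <= 1.")
        elif self.estimand_type == "cross":
            if self.d is None or self.z is None:
                raise ValueError("Cross moments require d and z.")
        elif self.estimand_type not in ("ols_slope", "iv_slope"):
            raise ValueError(f"Unknown estimand_type {self.estimand_type!r}.")

    @classmethod
    def from_dict(cls, spec: dict) -> "Estimand":
        return cls(**spec)
```

Compared with plain dicts, the object catches missing or misspelled fields at construction time. Because it is frozen, it also cannot be mutated by accident.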

BUG: Second step linear program fails

Problem description

The second-step linear program fails for the inputs used in test_second_step_linear_program_runs.

Expected Output

Solution to linear program.

Hypotheses

  • DGP is incorrectly specified
  • Sample size is too small
    • Probably not the problem, size is already quite large
  • Weights are estimated incorrectly: write tests for the estimation of the weights; they can be computed analytically and checked against the paper.
  • LP formulation is wrong

ENH: Rewrite gamma_star function

Is your feature request related to a problem?

Currently the gamma_star function is written in a hard-to-read style.

Describe the solution you'd like

Use a functional approach with helper functions.

API breaking implications

None

ENH: Use analytical integration wherever possible.

Is your feature request related to a problem?

At the moment we use scipy routines to integrate the Bernstein and constant basis functions. However, these integrals have analytical solutions, which are preferable. Also, scipy throws a warning for every integration, e.g.

IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
    If increasing the limit yields no improvement it is advised to analyze
    the integrand in order to determine the difficulties.  If the position of a
    local difficulty can be determined (singularity, discontinuity) one will
    probably gain from splitting up the interval and calling the integrator
    on the subranges.  Perhaps a special-purpose integrator should be used.
    integrate.quad(func, 0, 1, args=(z,))[0] * pdf_z[i]

Describe the solution you'd like

Related to issue #7.
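This sketch gives closed-form integrals for both basis types. For Bernstein polynomials it uses the standard identity \int_0^x b_{k,n}(t) dt = 1/(n+1) \sum_{j=k+1}^{n+1} b_{j,n+1}(x). For a constant basis function 1{a <= u < b}, the integral over [lo, hi] is just the length of the overlap. The function names are hypothetical, and the weighting by pdf_z would stay as in the current code.

```python
from math import comb


def bernstein(k: int, n: int, u: float) -> float:
    """Bernstein basis polynomial b_{k,n}(u) = C(n, k) u^k (1 - u)^(n - k)."""
    return comb(n, k) * u**k * (1 - u) ** (n - k)


def integrate_bernstein(k: int, n: int, lo: float, hi: float) -> float:
    """Exact integral of b_{k,n} over [lo, hi], with 0 <= lo <= hi <= 1."""

    def antiderivative(x: float) -> float:
        # int_0^x b_{k,n}(t) dt = 1/(n+1) * sum_{j=k+1}^{n+1} b_{j,n+1}(x)
        return sum(bernstein(j, n + 1, x) for j in range(k + 1, n + 2)) / (n + 1)

    return antiderivative(hi) - antiderivative(lo)


def integrate_constant_basis(a: float, b: float, lo: float, hi: float) -> float:
    """Exact integral of the indicator 1{a <= u < b} over [lo, hi]: the overlap length."""
    return max(0.0, min(b, hi) - max(a, lo))
```

In a unit test, both functions can be checked once against `integrate.quad`. After that, the numerical integration (and its warnings) is no longer needed.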

ENH: Identification with Shape Constraints

Is your feature request related to a problem?

Implement identification under shape constraints (see Figures 6 and 7 in the paper).

Describe the solution you'd like

  • How are these constraints implemented in the linear program?
  • API implementation in the pyvmte interface and in the identification and estimation functions.
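This sketch shows one possible encoding of a monotonicity constraint. It assumes the MTRs are parametrized by Bernstein coefficients stacked as theta = (theta_0, theta_1) for d = 0 and d = 1. Nondecreasing coefficients are a sufficient (not necessary) condition for a nondecreasing Bernstein polynomial. They translate into linear rows theta_k - theta_{k+1} <= 0 that can be appended to the `A_ub`/`b_ub` arguments of `scipy.optimize.linprog`. Whether this matches the construction behind Figures 6 and 7 should be checked against the paper, and the helpers are hypothetical.

```python
import numpy as np


def monotone_increasing_rows(n_coefs: int, offset: int, n_params: int) -> np.ndarray:
    """Rows R such that R @ theta <= 0 encodes theta[offset+k] <= theta[offset+k+1]."""
    rows = np.zeros((n_coefs - 1, n_params))
    for k in range(n_coefs - 1):
        rows[k, offset + k] = 1.0
        rows[k, offset + k + 1] = -1.0
    return rows  # negate these rows for a decreasing constraint


def add_monotonicity(A_ub: np.ndarray, b_ub: np.ndarray, n_coefs: int):
    """Append increasing-MTR rows for both m0 and m1 (assumed layout: [m0 coefs, m1 coefs])."""
    n_params = A_ub.shape[1]
    new_rows = np.vstack(
        [
            monotone_increasing_rows(n_coefs, 0, n_params),
            monotone_increasing_rows(n_coefs, n_coefs, n_params),
        ]
    )
    return np.vstack([A_ub, new_rows]), np.concatenate([b_ub, np.zeros(new_rows.shape[0])])
```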

ENH: Put information about z into container

Is your feature request related to a problem?

Currently most functions take separate inputs pscore_z, support_z, and pdf_z but just pass them on to another function.
This makes the code unnecessarily hard to read.

Describe the solution you'd like

Instead, we should pass a single dictionary containing all relevant information about z, and only the final functions (those that actually use the values) should unpack it.
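This is a minimal sketch of such a container, using a `TypedDict` so the keys stay documented and type-checkable while remaining a plain dict at runtime. `ZInfo`, `_share_treated` and `intermediate_step` are hypothetical names.

```python
from typing import TypedDict

import numpy as np


# All information about the instrument Z in one container.
class ZInfo(TypedDict):
    support_z: np.ndarray
    pscore_z: np.ndarray
    pdf_z: np.ndarray


def _share_treated(z_info: ZInfo) -> float:
    """A 'final' function: the only place that unpacks the container (here P(D = 1))."""
    return float(np.sum(z_info["pscore_z"] * z_info["pdf_z"]))


def intermediate_step(z_info: ZInfo) -> float:
    """Intermediate functions just forward one argument instead of three."""
    return _share_treated(z_info)
```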

API breaking implications

Not relevant at the moment.

BUG: Side effects in estimation tests

Code Sample, a copy-pastable example

config.py has

SETUP = {
    "target": "late",
    "lower_bound": -0.421,
    "upper_bound": 0.500,
}

and the test

import pytest

@pytest.mark.parametrize("setup,lower_bound", [(SETUP, some_value)])
def test_abc(setup, lower_bound):
    target = setup  # alias of the shared SETUP dict, not a copy
    target["lower_bound"] = lower_bound  # also mutates SETUP for later tests

Problem description

This assignment has side effects on the parameter settings stored in SETUP: a dict is mutable, and `target = setup` creates an alias rather than a copy. Tests that run after this one then fail because they import the modified SETUP.

Expected Output

All tests should pass irrespective of their order; tests should not have side effects.

Solution

Implement SETUP as a NamedTuple.
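This is a minimal sketch of the NamedTuple version, using the values from config.py above. `_replace` returns a modified copy, so a test can change `lower_bound` without touching the shared SETUP. Attribute assignment on the NamedTuple raises `AttributeError`. The class name `Setup` and the helper `with_lower_bound` are assumptions.

```python
from typing import NamedTuple


class Setup(NamedTuple):
    target: str
    lower_bound: float
    upper_bound: float


SETUP = Setup(target="late", lower_bound=-0.421, upper_bound=0.500)


def with_lower_bound(setup: Setup, lower_bound: float) -> Setup:
    # _replace returns a new tuple; the shared SETUP is never modified.
    return setup._replace(lower_bound=lower_bound)
```

An alternative that keeps SETUP as a dict is to copy it inside the test (`target = dict(setup)`) before assigning. The NamedTuple has the advantage of making accidental mutation impossible rather than merely avoided.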
