buddejul / pyvmte
Implementation of Mogstad, Santos, Torgovitsky (2018, Econometrica), "Using Instrumental Variables for Inference About Policy Relevant Treatment Parameters".
Implement type hinting for all non-interface functions.
Also, implement type conversion functions for interfaces, see #9.
I think the weights from the gamma linear map are not estimated correctly, because we currently do not use the estimated propensity score associated with each z value. Somewhere we probably need to pass in a pscore array that holds the propensity score for each observation i in the data.
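The pscore array described above could be built by mapping each observation's instrument value to the estimated propensity score of that value. A sketch, with assumed argument names (`support_z`, `pscore_z` mirror the inputs mentioned elsewhere in these issues):

```python
import numpy as np

def pscore_per_observation(
    z_data: np.ndarray, support_z: np.ndarray, pscore_z: np.ndarray
) -> np.ndarray:
    """Look up the estimated pscore of each observation's z value."""
    lookup = dict(zip(support_z, pscore_z))
    return np.array([lookup[z] for z in z_data])

# Illustrative instrument with support {0, 1, 2} and pscores (0.35, 0.6, 0.7):
pscores_i = pscore_per_observation(
    np.array([0, 2, 1, 2]), np.array([0, 1, 2]), np.array([0.35, 0.6, 0.7])
)
# pscores_i == [0.35, 0.7, 0.6, 0.7]
```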
Think about how to parallelize the Monte Carlo simulations. Currently we use the pytask-parallel plugin, but it is not clear how it handles RNG. Maybe move to joblib, using the example from here: https://albertcthomas.github.io/good-practices-random-number-generators/
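The pattern from the linked post can be sketched as follows: spawn one child `SeedSequence` per Monte Carlo task so the streams are independent and reproducible. The loop below is sequential; with joblib it would become `Parallel(n_jobs=4)(delayed(one_simulation)(rng) for rng in rngs)`.

```python
import numpy as np

def one_simulation(rng: np.random.Generator) -> float:
    # placeholder for a single Monte Carlo replication
    return rng.normal()

seed_seq = np.random.SeedSequence(12345)
child_seqs = seed_seq.spawn(4)  # one independent stream per task
rngs = [np.random.default_rng(s) for s in child_seqs]

results = [one_simulation(rng) for rng in rngs]
```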
This is a bit tricky because, in general, the partition will not coincide with the analytic solution. What I could do: get the analytic solution for every partition, then compare the estimated solution to the computed one.
Currently there is a lot of repetitive code in the tests for the identification part, although we only use the DGP from the paper. Use fixtures / think about how to make function calls smaller (e.g. fix arguments that never change beforehand).
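A sketch of the fixture idea: collect the DGP arguments that never change in one module-scoped pytest fixture, so each test call stays short. All names and values here are illustrative assumptions.

```python
import pytest

@pytest.fixture(scope="module")
def dgp_kwargs():
    # hypothetical fixed DGP arguments shared by many identification tests
    return {
        "support_z": (0, 1, 2),
        "pscore_z": (0.35, 0.6, 0.7),
        "pdf_z": (0.5, 0.4, 0.1),
    }

def test_pdf_sums_to_one(dgp_kwargs):
    assert sum(dgp_kwargs["pdf_z"]) == 1.0
```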
What should the user specify for target_estimand and identified_estimands in the estimation mode? I think this is the same for the OLS slope, IV slope, and cross moments. However, LATE could be different. Two options:
estimand_late_v1 = {
"type": "late",
"u_lo": some_pscore_lo,
"u_hi": some_pscore_hi,
}
estimand_late_v2 = {
"type": "late",
"z_lo": some_z_val_lo,
"z_hi": some_z_val_hi,
}
v1 is weird because it requires the user to specify the propensity scores for which the LATE is identified. This could lead to downstream errors, because we also (re-)estimate these in the code, so we might end up with two numbers for the same pscore that differ by some epsilon.
v2 is better, I think: in our setup, any LATEs are identified from certain values of the instrument Z and their associated propensity scores. Is this true? Need to check the Angrist and Imbens paper.
Currently we do not do any error handling or argument processing. Add error handling (and maybe argument conversion), with a good report in case of errors, at the top of the pyvmte interface.
None.
Implement estimation of the simple sharp non-parametric model. Requires setting up the estimation part of the paper.
None
Becomes important when simulating with low sample sizes. Should make sure this doesn't result in an error when using pytask.
The identification function allows for multiple estimand dicts in a list by looping over that list. However, if we supply a single dict instead of a list containing that dict, the loop breaks.
This makes writing code with single estimands unintuitive, because we need to specify the argument as a list.
Fix: convert the input to a list right after it is provided so the loop works.
Currently most of the functions in the identification part do not have unit tests.
Not a problem yet, but once the Python code works well, most time should be spent solving the LPs. Have a look at this benchmark. scipy.linprog natively supports only very few of the algorithms (e.g. HiGHS), so we would need to rewrite the formulation of the problem.
Currently it's super annoying to keep track of which parameters belong to which estimand when there are multiple parameters of one type. E.g. with multiple identified LATEs we have multiple values of u_lo and u_hi, each specific to that LATE.
Proposal: each estimand (target or identified) gets its own dictionary that contains all the estimand-specific information (e.g. u_lo, u_hi, d, z). One alternative might be implementing an estimand object.
example_cross = {
'estimand_type': 'cross',
'd': 0,
'z': 0
}
example_ols = {
'estimand_type': 'ols_slope',
}
example_iv = {
'estimand_type': 'iv_slope',
}
example_late = {
'estimand_type': 'late',
'u_lo': 0.35,
'u_hi': 0.9
}
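The "estimand object" alternative could look like the following sketch; the field names mirror the dict examples above, but the class itself is an assumption, not existing pyvmte code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Estimand:
    """One estimand with all its type-specific parameters in one place."""
    estimand_type: str
    u_lo: Optional[float] = None  # only for LATE
    u_hi: Optional[float] = None  # only for LATE
    d: Optional[int] = None       # only for cross moments
    z: Optional[int] = None       # only for cross moments

late = Estimand(estimand_type="late", u_lo=0.35, u_hi=0.9)
```

Compared to plain dicts, this gives attribute access, defaults, and a place to attach validation later.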
u_hi_target
The second-step linear program fails for the test_second_step_linear_program_runs inputs.
Solution to linear program.
Currently the gamma_star function does not follow good style. Use a functional approach with helper functions.
None
At the moment we use scipy routines to integrate the Bernstein and constant basis functions. However, these have analytical solutions, which are preferable. Also, scipy throws a warning for every integration, e.g.
IntegrationWarning: The maximum number of subdivisions (50) has been achieved.
If increasing the limit yields no improvement it is advised to analyze
the integrand in order to determine the difficulties. If the position of a
local difficulty can be determined (singularity, discontinuity) one will
probably gain from splitting up the interval and calling the integrator
on the subranges. Perhaps a special-purpose integrator should be used.
integrate.quad(func, 0, 1, args=(z,))[0] * pdf_z[i]
Related to issue #7.
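The analytical solutions mentioned above are standard facts about the Bernstein basis B_{k,n}(u) = C(n,k) u^k (1-u)^(n-k): its integral over [0, 1] is 1/(n+1) for every k, and the partial integral over [0, x] is (1/(n+1)) * sum_{j=k+1}^{n+1} B_{j,n+1}(x). A sketch using these identities (pyvmte's actual basis-function code may be organized differently):

```python
from math import comb

def bernstein(k: int, n: int, u: float) -> float:
    """Bernstein basis polynomial B_{k,n} evaluated at u."""
    return comb(n, k) * u**k * (1 - u) ** (n - k)

def bernstein_integral(k: int, n: int, x: float) -> float:
    """Integral of B_{k,n} over [0, x] via the degree-raising identity,
    no numerical quadrature (and hence no IntegrationWarning) needed."""
    return sum(bernstein(j, n + 1, x) for j in range(k + 1, n + 2)) / (n + 1)

# e.g. bernstein_integral(k, n, 1.0) == 1 / (n + 1) for every k
```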
Defining the settings for all simulations in config.py and using it as a dependency triggers a rerun of all simulations, which is not ideal. Instead, use different config files.
Implement identification under shape constraints (see Figures 6 and 7 in the paper). This involves the pyvmte, identification, and estimation functions.
Identification tests for Figures 2, 3, and 5 fail for the upper bound.
Currently most functions take the separate inputs pscore_z, support_z, pdf_z but just pass them on to another function. This makes the code unnecessarily hard to read.
Instead, we should provide a dictionary containing all relevant info about z, and only the final functions should unpack this dict.
Not relevant atm.
config.py has
SETUP = {
"target": "late",
"lower_bound": -0.421,
"upper_bound": 0.500,
}
and the test
@pytest.mark.parametrize("setup,lower_bound", [(SETUP, some_value)])
def test_abc(setup, lower_bound):
target = setup
target["lower_bound"] = lower_bound
This type of assignment has side effects on the parameter settings stored in the dict (which is mutable). Tests running after this test will then fail because they import the changed version of SETUP.
All tests should pass irrespective of order; tests shouldn't have side effects.
Fix: implement SETUP as a NamedTuple.
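The NamedTuple fix could look like the following sketch: SETUP becomes immutable, so a test cannot mutate the shared parameters, and tests that need a variant use `_replace` to get a modified copy. The values are copied from the issue; the class name is an assumption.

```python
from typing import NamedTuple

class Setup(NamedTuple):
    target: str
    lower_bound: float
    upper_bound: float

SETUP = Setup(target="late", lower_bound=-0.421, upper_bound=0.500)

# Tests that need a modified version take a copy, leaving SETUP untouched:
modified = SETUP._replace(lower_bound=-0.3)
```

Any attempt to assign `SETUP.lower_bound = ...` now raises an AttributeError instead of silently changing shared state.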