
Contributors: josef-pkt

misc's Issues

SUMM: What's the purpose of estimation?

Why do we estimate? What are we using the results for?

  • parameter estimates: how large is this effect, and is there even an effect?
  • implied summary statistics: what's the average treatment effect for this subgroup? What's the average marginal effect? Risk ratios, ...
  • predicted mean: what's the expected value for this new observation?
  • predicted distribution: what's some distributional property of a new observation? Prediction confidence intervals, tail probabilities, ...
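These four uses can be sketched on one toy model. A hand-rolled Poisson regression (Newton iterations on the log-likelihood; the data and all variable names here are illustrative, not an existing API):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.poisson(np.exp(0.5 + 0.3 * x))
X = np.column_stack([np.ones_like(x), x])

# Newton iterations on the Poisson log-likelihood: score = X'(y - mu)
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)
    beta = beta + np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))

# 1. parameter estimate: how large is the effect of x?
slope = beta[1]
# 2. implied summary statistic: average marginal effect of x on E[y|x]
ame = (np.exp(X @ beta) * beta[1]).mean()
# 3. predicted mean for a new observation with x = 1
mu_new = np.exp(beta @ np.array([1.0, 1.0]))
# 4. predicted distribution: tail probability P(y_new >= 5) at x = 1
tail = stats.poisson.sf(4, mu_new)
```

The same fitted parameters feed all four answers, but the later ones lean on progressively stronger distributional assumptions.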

Questions that come up while pondering model diagnostics and goodness-of-fit (or lack-of-fit) measures:
e.g. Poisson and all LEF models estimate the expected value consistently even if the rest of the distribution is misspecified, but Poisson is not a good model for the distribution of overdispersed data.
When do we care about distributional fit diagnostics, tests and measures?
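A minimal numeric illustration of the LEF point, assuming negative binomial data as the overdispersed example: the Poisson MLE of a constant mean is just the sample mean and stays consistent, while the Poisson-implied variance (equal to the mean) badly understates the true variance:

```python
import numpy as np

rng = np.random.default_rng(1)
# negative binomial with mean 5 and variance 17.5: clearly overdispersed
y = rng.negative_binomial(n=2, p=2 / 7, size=20000)

mu_hat = y.mean()         # Poisson MLE of a constant mean: consistent anyway
var_implied = mu_hat      # Poisson asserts Var(y) == E[y]
var_data = y.var()        # the data disagree by a factor of ~3.5
```

Any inference or prediction interval built from the Poisson variance would be far too narrow here, which is exactly when distributional diagnostics matter.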

Task: Collect examples and recipes

I wrote quite a lot of examples and recipes over time that are "misplaced" somewhere.
Collect them and check which are more widely useful and haven't been included in statsmodels yet.

  • Generalized IRF and FEVD for VAR

Generic Estimator classes

Towards more generic estimator frameworks

  • least squares
  • generalized linear model GLM
  • MLE
  • Quasi-MLE
  • GEE
  • GMM
  • M-estimators
  • minimum distance estimators

All of the above are extremum estimators, although GEE is usually formalized in terms of estimating equations. Besides these, we can also have estimators that directly specify the estimating equations, possibly without any consistent objective function to minimize. Examples occur in robust estimation, where estimating equations for the mean and the scale are specified separately. Feasible GLS might also belong in this category, as an estimation method without the extra MLE interpretation.
Two points: first, estimators like OLS, GLM, GEE and RLM solve the estimating equations directly without going through an objective function; OLS and GLM have a (Q)MLE interpretation, while RLM is an M-estimator. Second, even robust estimators could be cast in an exactly identified GMM framework; however, similar to FGLS and GLM/LEF, they exploit asymptotic independence, or independence in expectation, of the mean and variance terms.
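A sketch of specifying the estimating equation directly, with no objective function in sight: Huber's location M-estimator with a separately fixed MAD scale (the tuning constants are the conventional ones; the setup itself is illustrative):

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(2)
y = np.concatenate([rng.normal(0, 1, 95), np.full(5, 20.0)])  # 5% gross outliers

# scale estimated separately (MAD) and then held fixed, as in the text
scale = np.median(np.abs(y - np.median(y))) / 0.6745

def estimating_eq(m, c=1.345):
    # sum of clipped (Huber psi) residuals; the root defines the estimator
    u = (y - m) / scale
    return np.clip(u, -c, c).sum()

# the equation is monotone in m, so a bracketing root finder solves it directly
m_huber = optimize.brentq(estimating_eq, y.min(), y.max())
```

Nothing is minimized explicitly: the root of the estimating equation *is* the estimate, and the outliers move it far less than they move the sample mean (about 1.0 here).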

What's the hierarchical tree?
What is common? What is specific?

How can we structure the code with class hierarchies, mixins, and functions for the best code reuse and flexibility?
(Finetuning and special casing can always be done on the lowest level, so that's not directly relevant for the overall structure.)

When or how do we combine different estimators in one model? (e.g. MLE and QMLE, MLE and GMM/OLS, MLE and M-estimators)
Example: a robust cov_type turns MLE into QMLE; OLS/WLS mixes the MLE/QMLE and LS/GMM interpretations.
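One way to make the cov_type point concrete is as a covariance mixin: the point estimates come from the same estimating equations, and only the covariance (the interpretation as MLE vs QMLE) changes. All class names below are made up for illustration:

```python
import numpy as np

class LeastSquares:
    """Solves the least-squares estimating equations X'(y - Xb) = 0 directly."""
    def fit(self, X, y):
        self.X, self.y = X, y
        self.params = np.linalg.solve(X.T @ X, X.T @ y)
        self.resid = y - X @ self.params
        return self

class MLECovMixin:
    """Covariance under the full MLE assumption (homoskedastic errors)."""
    def cov_params(self):
        X, e = self.X, self.resid
        sigma2 = e @ e / (len(e) - X.shape[1])
        return sigma2 * np.linalg.inv(X.T @ X)

class QMLECovMixin:
    """HC0 sandwich covariance: the QMLE/GMM view of the same estimator."""
    def cov_params(self):
        X, e = self.X, self.resid
        bread = np.linalg.inv(X.T @ X)
        meat = (X * (e ** 2)[:, None]).T @ X
        return bread @ meat @ bread

class OLS(MLECovMixin, LeastSquares):
    pass

class OLSHC0(QMLECovMixin, LeastSquares):
    pass

rng = np.random.default_rng(3)
x = rng.normal(size=400)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + np.abs(x) * rng.normal(size=400)  # heteroskedastic errors

res = OLS().fit(X, y)
res_hc0 = OLSHC0().fit(X, y)
```

Identical parameter estimates, different standard errors: exactly the structure a mixin captures cleanly, since the covariance choice is orthogonal to the fitting step.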

Components differ whether we want

  • full model with everything
  • prediction and forecasting only (machine learning style)
  • inference only: hypothesis tests (although we are moving towards simplified model for this, e.g. contingency table, and weightstats)

numerical algorithm - speed versus convenience versus robustness

some thoughts on choosing algorithms and algorithm libraries

linear algebra

  • high level, easy to write and read: inv, pinv, solve
  • low level, numerically more efficient, allowing fine-tuning and reuse of factorization results: SVD, QR, ...
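The tradeoff in two lines each, for a least-squares problem (a sketch; the high-level call is one readable expression, the low-level route factors once and can reuse Q and R for covariances, rank checks, or many right-hand sides):

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
y = rng.normal(size=200)

# high level: short, readable, SVD-based under the hood
beta_pinv = np.linalg.pinv(X) @ y

# low level: factor once, then each extra right-hand side is a cheap
# triangular solve
Q, R = np.linalg.qr(X)
beta_qr = linalg.solve_triangular(R, Q.T @ y)
```

Both give the same least-squares solution; the difference shows up in speed and in what intermediate results stay available.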

sparse versus loops versus pandas groupby

What's the best approach for block structures, group handling and strata?
examples:
Kronecker products in systems of equations,
balanced or unbalanced panel or cluster data
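For something as simple as group means over cluster ids, the loop-versus-vectorized tradeoff can be sketched with numpy alone (pandas groupby would be a third option, traded off against the extra dependency):

```python
import numpy as np

rng = np.random.default_rng(5)
groups = rng.integers(0, 100, size=10_000)   # e.g. cluster or panel ids
y = rng.normal(size=10_000)

# loop over groups: easy to read, but one full-array scan per group
means_loop = np.array([y[groups == g].mean() for g in range(100)])

# one vectorized pass: group sums and counts via bincount
counts = np.bincount(groups, minlength=100)
sums = np.bincount(groups, weights=y, minlength=100)
means_binned = sums / counts
```

The bincount route generalizes to group-wise demeaning and within-group variances, which is most of what balanced or unbalanced panel handling needs.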

nearest neighbor, matching

still pretty unused: direct interval calculation for 1-D, kd-trees, ball-point queries

examples:
local regression (lowess, KernelRegression)
matching algorithms: multiple imputation, propensity score matching
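A sketch of nearest-neighbor matching on a one-dimensional score with scipy's kd-tree (the propensity-score framing and the uniform scores are illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(6)
score_treated = rng.uniform(0.2, 0.8, size=(50, 1))   # stand-in propensity scores
score_control = rng.uniform(0.0, 1.0, size=(500, 1))

# build the tree once over the controls, then query all treated units at once
tree = cKDTree(score_control)
dist, idx = tree.query(score_treated, k=1)  # nearest control per treated unit
matched_controls = score_control[idx, 0]
```

For 1-D scores a sorted array with searchsorted would do the same job; the kd-tree pays off once the matching covariates are multivariate.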

SUMM: minimum distance estimation for misspecified models

Suppose we only have "wrong" models: what's the "best" model, and what are our "best" estimates?

And how do we define "best"?

Motivation:
Gamma regression: Greene compares MLE versus GMM, where the GMM version uses overidentified moment conditions. Gamma regression in GLM/LEF uses only mean and variance assumptions.

preliminaries

  • GLM/LEF estimates the mean/expected value consistently as long as the mean is correctly specified, even if the other parts of the model are misspecified. However, what happens if we are not exclusively interested in the mean? What's the "best" Poisson model if our data is overdispersed, the mean function is possibly misspecified, and we are also interested in predicting quantiles?
  • Misspecified MLE minimizes the Kullback-Leibler distance to the true model, and the parameters converge to pseudo-true values.
  • choosing parameters by optimal prediction accuracy is just a special case with a particular distance measure for prediction accuracy; it is mostly used for hyperparameter and model selection (with cross-validation), but usually not for the parameter estimation itself. (special case: time series models that minimize squared one-step-ahead forecast errors for the parameter estimation)
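The parenthetical time-series special case can be sketched directly: estimate an AR(1) coefficient by minimizing squared one-step-ahead forecast errors, which for this model coincides with conditional least squares (the simulation setup is illustrative):

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(7)
phi_true = 0.6
y = np.zeros(2000)
for t in range(1, len(y)):
    y[t] = phi_true * y[t - 1] + rng.normal()

def forecast_ssr(phi):
    # sum of squared one-step-ahead forecast errors y_t - phi * y_{t-1}
    e = y[1:] - phi * y[:-1]
    return e @ e

phi_hat = optimize.minimize_scalar(forecast_ssr, bounds=(-0.99, 0.99),
                                   method="bounded").x
```

Here prediction accuracy *is* the distance measure used for parameter estimation, not just for model selection.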

So, what's the point?

  • provide and use more general minimum distance estimators
  • be clear what the objective function is, i.e. distance measure
  • focus on implied final statistic (e.g. average effect, prediction)

... ???
