Comments (13)

henrishi commented on July 20, 2024

@susanathey I'm tracking the paper 3 dissertation submission here.

Yesterday, you mentioned the revision involves a comparison with a state-of-the-art model. Which model is it?

I hope to get started on this ASAP in case I run into environment/resource constraints on the 17Zuoye machines that I'm using.

from bm_model.

susanathey commented on July 20, 2024

I'd say the first thing is to have a good lit review -- I don't actually know what is considered state of the art for this specific application. It is a matrix completion problem, and I know that matrix factorization is best in some cases while a neural net might be better elsewhere; that depends on the setting, and computational burden and interpretability may drive decision-making in general. However, it would be good to explain what the competitors are, and say whether this method is pretty much the same as the alternatives with only modest differences, or whether there are tradeoffs relative to other models. Then, as we discussed, if the value add is that the model is interpretable, show some interpretations.

To be publishable rather than just an engineering project for a team, you need to decide what the message is. Is it "wow, I can compute this on a big dataset"? Is it "look how well calibrated I am" or "look how well I predict"? For those it is useful to know what the comparison is. If it is "wow, I learn really cool things with this method," then you need to show them. Just saying "cool, this model converged and isn't just noise" is not enough.

henrishi commented on July 20, 2024

This makes a lot of sense. For the dissertation submission, I want to aim for the "wow I can compute this on a big dataset" message.

I have baselines from a few recent papers from CS and political science. The model formulations are not too different from each other, so I'd expect predictive performance to be similar -- our factorization works well, but they have multiple variations too, so it would probably be hard to say definitively that mine is best within the next week.

Nevertheless, the main difference I see in their papers is that they are still using much smaller datasets than mine and take longer to train, probably because they only have access to Kaggle datasets and not data from a big company. So this is a point I can drill down on in the next few days -- improving the lit review and adding comparisons.

While this computation argument may not be the only argument we make in the final publication, I think it would be one of the points we make. After all, a big advantage we have over the other people writing about this is that we are actually applying it in a corporate environment on a large scale.

Does this sound good to you?

susanathey commented on July 20, 2024

Well "wow I can compute this" still needs some kind of benchmark relative to the existing literature--and it isn't by itself a scientific contribution. So I don't think that cuts it by itself.

henrishi commented on July 20, 2024

How about a combination of the computation argument and a predictive comparison against recent models from CS and political science? I probably won't be able to do too many models due to time constraints, but I'll use the 2 or 3 that are most highly cited.

henrishi commented on July 20, 2024

I don't think I can do "look how well calibrated I am," so going with prediction accuracy plus computational superiority is probably the only way for me.

And on the point of computation, it could be considered a good point for publication if framed well. For instance, Kosuke Imai's paper proposing an EM method for this is essentially making the computation argument. He published in APSR, the top polisci journal.

What I want to do is take Imai's model and one other highly cited one, then:

  1. Run both on my data to compare computational time to convergence
  2. Compare overall predictive accuracy on a held-out test set
  3. Compare predictive accuracy by knowledge point on the held-out test set
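
Steps 2 and 3 above could be sketched roughly like this. Everything here is a simulated stand-in -- `y_true`, `knowledge_point`, and the prediction arrays are illustrative, not real model output:

```python
import numpy as np

# Compare models' held-out accuracy overall (step 2) and grouped by
# knowledge point (step 3). Data and model names are hypothetical.
rng = np.random.default_rng(1)
n = 1000
y_true = rng.integers(0, 2, size=n)           # correct/incorrect responses
knowledge_point = rng.integers(0, 5, size=n)  # knowledge-point label per response

# Simulated predicted probabilities, one array per model.
preds = {
    "ours":  np.clip(y_true + rng.normal(0, 0.4, size=n), 0, 1),
    "emIRT": np.clip(y_true + rng.normal(0, 0.6, size=n), 0, 1),
}

overall = {}
for name, p in preds.items():
    overall[name] = np.mean((p > 0.5) == y_true)   # step 2: overall accuracy
    print(f"{name}: overall accuracy {overall[name]:.3f}")
    for kp in np.unique(knowledge_point):          # step 3: by knowledge point
        m = knowledge_point == kp
        acc_kp = np.mean((p[m] > 0.5) == y_true[m])
        print(f"  knowledge point {kp}: accuracy {acc_kp:.3f}")
```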

susanathey commented on July 20, 2024

I guess it is still not clear exactly what the contribution is on the computation side -- you don't have a new computational method, so you are demonstrating that one method works well on one dataset in terms of time. That is still not really a clear contribution. In terms of calibration, that just means showing that there is a tight match between predicted and observed values in the test set. Your three steps sound reasonable.
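
A calibration check in the sense described above is simple to sketch: bin the test-set predictions and compare each bin's mean predicted probability with the observed frequency. The data here is simulated to be perfectly calibrated; any names are illustrative:

```python
import numpy as np

# Binned calibration check: predicted probability vs. observed frequency.
rng = np.random.default_rng(2)
p_hat = rng.uniform(0, 1, 5000)                     # predicted P(correct)
y = (rng.uniform(0, 1, 5000) < p_hat).astype(int)   # calibrated-by-construction outcomes

bins = np.linspace(0, 1, 11)
idx = np.digitize(p_hat, bins[1:-1])  # bin index 0..9 per prediction
for b in range(10):
    m = idx == b
    print(f"bin {b}: predicted {p_hat[m].mean():.2f}, observed {y[m].mean():.2f}")
```

A well-calibrated model shows the two columns tracking each other across bins; plotting them gives the usual reliability diagram.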

henrishi commented on July 20, 2024

@susanathey Here is the updated version of the dissertation with the changes we talked about:
Shi_dissertation_20210601.pdf

Here are the highlights to help you navigate:

  1. The models I compared against are varIRT (the 2020 paper Emma Brunskill mentioned in the defense) and emIRT (Imai 2016 in APSR); they are the most similar and most recent models I could find -- p. 51 of the pdf
  2. Revamped literature review with a focus on state-of-the-art models -- p. 36 of the pdf
  3. varIRT and emIRT choke on the full data, so I had to use a subsample -- p. 51, bottom paragraph
  4. Our models dominate both varIRT and emIRT in predictive and computational performance (by a lot!) -- p. 51, left figure
  5. Our models are also better than both varIRT and emIRT in predictions across knowledge labels -- p. 51, right figure

This took longer than I would have liked, mostly because varIRT and emIRT don't work well on data this big and sparse. Neither was easy to use; I had to write custom classes to make them work on my data. varIRT is also very new (EDM 2020), and at times I had to fix coding errors in their source code.

henrishi commented on July 20, 2024

I'm open all day today to talk about this, please let me know if you have questions.

Also, after the dissertation submission, I think we have multiple paths to publication:

  1. I propose a customized ELBO to make my models run even faster -- make the paper about computational gains with superior predictive performance
  2. Do more interpreting of the latents, like we discussed in the defense -- make the paper about balancing prediction and interpretation
  3. Fancier modeling, e.g., modeling teachers' effects as a transformation matrix on the student latents -- make the paper about innovative modeling
  4. Wait on experimental results from smart-homework and publish together as one paper -- make this paper about ML plus a real-world experiment (maybe something similar to https://academic.oup.com/qje/article-abstract/133/1/237/4095198)

henrishi commented on July 20, 2024

@susanathey

Hi Susan, here is the full dissertation with all the pieces assembled.
Shi_dissertation_20210601_full.pdf

To address the authorship concerns you had, I followed Emma's suggestion and added "The paper is authored by me while its contents belong to a part of a larger project that will be published in co-authorship with Susan Athey." to papers 1 and 3.

The registrar is requesting email signatures these days. Let me know if this looks ok and I'll send the signature signoff email.

susanathey commented on July 20, 2024

Comments:
- The intro doesn't explain what IRT is or what question you are trying to answer -- non sequitur. You need a few sentences before this one to explain what problem you are trying to solve and why: "Traditionally, education measurement tools based on Item response theory (IRT) are designed for standardized tests with dense data."
- You say that the models retrieve true parameters; make it clear you are referring to simulations -- in real data we don't know the true parameters, we only know whether we predict well in test sets.
- Capitalize "section" when it is used as a proper noun: "Section 3".
- This paper is an application of methods to a new problem -- not really new methods -- so the most relevant related work would be other papers that apply methods like IRT to similar problems.
- Why not neural nets?
- "Computational" not "Computation" performance.
- For the future, a good simulation study would vary parameters like missingness and sample size and show how the relative performance of methods changes.
- Increase fonts in figures.
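
The suggested simulation study could be sketched as a grid over sample size and missingness. A deliberately simple column-mean predictor stands in here for the real IRT variants; `run_cell` and all parameters are hypothetical:

```python
import numpy as np

# Sweep sample size and missingness, recording held-out RMSE per cell.
# A real study would run each competing model in place of the baseline.
rng = np.random.default_rng(3)

def run_cell(n_students, frac_missing):
    n_items = 40
    # Simulate 1PL-style binary responses from abilities and difficulties.
    ability = rng.normal(size=(n_students, 1))
    difficulty = rng.normal(size=(1, n_items))
    p = 1 / (1 + np.exp(-(ability - difficulty)))
    y = (rng.random(p.shape) < p).astype(float)
    observed = rng.random(y.shape) >= frac_missing
    # Baseline predictor: per-item mean over observed entries.
    preds = np.nanmean(np.where(observed, y, np.nan), axis=0, keepdims=True)
    # Score on the held-out (unobserved) entries.
    return np.sqrt(np.nanmean((np.where(~observed, y, np.nan) - preds) ** 2))

for n in (100, 1000):
    for miss in (0.5, 0.9):
        print(f"n={n}, missing={miss}: held-out RMSE {run_cell(n, miss):.3f}")
```

Plotting RMSE for each method across the grid would show exactly where the relative performance of the methods changes.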

henrishi commented on July 20, 2024

Thanks, Susan! I'll be making these changes today and tomorrow and they will be in the final submission.

henrishi commented on July 20, 2024

Closing.
