
Comments (5)

bernardoleite commented on August 23, 2024

@seewoo5 I think refactoring with PyTorch Lightning is a good option (I'm becoming a fan of it too). As for the second question, it is much clearer to me now. Thanks for the comprehensive explanation.

Regards,
Bernardo

from kt.

seewoo5 commented on August 23, 2024

You're right, and the other models are the same. The only reason is that the NPA model, which uses a BiLSTM to encode past interactions, cannot be trained to predict every response in a given sequence because of its bidirectional nature. All the other models, however, can be trained by computing the loss for every interaction in a sequence (not only the last one), and this actually makes training much faster. I'm going to fix it later, but you can fix it and send a PR if you want.
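To make the difference concrete, here is a minimal sketch in plain Python (not the repository's code) that compares a loss on the last interaction only with a loss averaged over every non-padded interaction. The function names, the `mask` argument, and the use of binary cross-entropy are assumptions made for illustration. The all-positions variant is only meaningful when the prediction at step t depends on earlier interactions alone, which is what a bidirectional encoder cannot guarantee.

```python
import math


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one predicted probability p and one label y."""
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def last_position_loss(probs, labels):
    """Supervise only the final interaction: one loss term per sequence."""
    return bce(probs[-1], labels[-1])


def all_positions_loss(probs, labels, mask):
    """Average the loss over every non-padded interaction in the sequence.

    mask[t] is 1 for a real interaction and 0 for padding (hypothetical layout).
    """
    terms = [bce(p, y) for p, y, m in zip(probs, labels, mask) if m]
    if not terms:
        raise ValueError("mask selects no positions")
    return sum(terms) / len(terms)


# Hypothetical per-step predictions for one student and the true responses.
probs = [0.8, 0.3, 0.6, 0.9]
labels = [1, 0, 1, 1]
mask = [1, 1, 1, 1]

print(last_position_loss(probs, labels))        # uses 1 supervised step
print(all_positions_loss(probs, labels, mask))  # uses 4 supervised steps
```

With the same forward pass, the second variant yields one training signal per interaction instead of one per sequence, which is the likely source of the speed-up mentioned above.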


Chunpai commented on August 23, 2024

Thank you for your clarification.


bernardoleite commented on August 23, 2024

Hi there!

Has anyone, by chance, written an implementation that predicts a complete sequence instead of one target_id per sequence? If so, I would be very grateful if you could share it.

I would also like to take this opportunity to confirm something: given that the current code only makes predictions for one target_id, can we compare the results obtained with state-of-the-art results (where whole sequences are considered for prediction)? I apologize in advance if I have missed some implementation detail.

Regards.


seewoo5 commented on August 23, 2024

@bernardoleite First of all, I may not have time to do the implementation for now. I'm actually planning to refactor the whole repository using PyTorch Lightning and add some recent KT models, but I don't have enough time to do that. I'm also considering using EduData instead of my own pre-processed datasets.

For your second question, I think that making predictions for only one target_id is the right way to evaluate models, but most of the other results and papers actually divide the whole sequence into several sub-sequences of fixed length and make predictions within each sub-sequence. This would give worse performance than one-by-one prediction. For example, when you want to predict the first question in the second sub-sequence, the input does not include any previous interactions from the first sub-sequence. However, if a model makes predictions for a single target_id at a time, you can feed in as many previous interactions as possible (up to the maximum sequence length the model was trained with), which should give a better prediction result.
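The two evaluation schemes can be compared with a small sketch in plain Python. This is an illustration, not code from the repository or from any paper: the helper names `chunked_contexts` and `sliding_contexts` are hypothetical, and whether the history may hold `max_len` or `max_len - 1` interactions depends on the model.

```python
def chunked_contexts(interactions, max_len):
    """History seen by each target when the sequence is cut into fixed-length chunks.

    A target can only see earlier interactions from its own chunk.
    """
    contexts = []
    for start in range(0, len(interactions), max_len):
        chunk = interactions[start:start + max_len]
        for i in range(len(chunk)):
            contexts.append(chunk[:i])
    return contexts


def sliding_contexts(interactions, max_len):
    """History seen by each target when predicting one target at a time.

    Each target sees up to max_len of the most recent earlier interactions.
    """
    return [interactions[max(0, t - max_len):t] for t in range(len(interactions))]


# Six interactions identified by their index, with a window of three.
interactions = list(range(6))
chunked = chunked_contexts(interactions, max_len=3)
sliding = sliding_contexts(interactions, max_len=3)

# Target 3 is the first question of the second chunk.
print(chunked[3])  # [] : nothing carried over from the first chunk
print(sliding[3])  # [0, 1, 2] : full window of earlier interactions
```

Under chunking, every chunk restarts from an empty history, whereas the one-target-at-a-time scheme always conditions on the most recent interactions available.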

