prediction of models about kt HOT 5 CLOSED

Chunpai commented on August 23, 2024

prediction of models

from kt.

Comments (5)

bernardoleite commented on August 23, 2024 1

@seewoo5 I believe refactoring with PyTorch Lightning is a good option (I'm becoming a fan of it too). Regarding the second question, things are clearer to me now. Thanks for the comprehensive explanation.

Regards,
Bernardo

from kt.

seewoo5 commented on August 23, 2024

You're right, and the other models are the same. The only reason is that the NPA model, which uses a BiLSTM to encode past interactions, can't be trained to predict all the responses in a given sequence because of its bi-directional property. However, all the other models can be trained by computing losses for all interactions in a sequence (not only the last one), and this actually makes training much faster. I'm going to fix it later, but you're welcome to fix it and send a PR if you want.
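To make the difference concrete, here is a rough sketch in plain Python (not the repository's code). It assumes a causal model has already produced `probs[t]`, the predicted probability of a correct response at step `t`, using only interactions before `t`. The function names and numbers are made up for illustration.

```python
import math

def bce(p, y):
    """Binary cross-entropy for one predicted probability p and label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def last_only_loss(probs, labels):
    """Loss on the final interaction only: one training signal per sequence."""
    return bce(probs[-1], labels[-1])

def all_positions_loss(probs, labels):
    """Mean loss over every interaction: len(probs) training signals per sequence."""
    losses = [bce(p, y) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)

# Hypothetical per-step predictions and ground-truth responses.
probs = [0.6, 0.3, 0.8, 0.7]
labels = [1, 0, 1, 1]

print(last_only_loss(probs, labels))
print(all_positions_loss(probs, labels))
```

The all-positions variant is only valid when `probs[t]` never depends on `labels[t]` or anything after it. A bi-directional encoder reads the whole input sequence, so it would see the answers it is asked to predict, which is why it is limited to the last-position loss.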

from kt.

Chunpai commented on August 23, 2024

Thank you for your clarification.

from kt.

bernardoleite commented on August 23, 2024

Hi there!

Has anyone by chance made an implementation that allows predicting a complete sequence instead of one target_id for a single sequence? If so, I would be very grateful if you could share.

Also, I'd like to take this opportunity to confirm: given that the current code only makes predictions for one target_id, can we compare the obtained results with the state of the art (where whole sequences are considered for prediction)? I apologize in advance if I've missed some implementation detail.

Regards.

from kt.

seewoo5 commented on August 23, 2024

@bernardoleite First of all, I may not have time to do the implementation for now. Actually, I'm planning to refactor the whole repository using PyTorch Lightning and add some recent KT models, but I don't have enough time to do that yet. I'm also considering using EduData instead of my own pre-processed datasets.

For your second question, I think that making predictions for only one target_id is the right way to evaluate models, but most other results and papers actually divide the whole sequence into several fixed-length sub-sequences and make predictions within each sub-sequence. This would give worse performance than one-by-one prediction. For example, when you predict the first question in the second sub-sequence, the input does not include any previous interactions from the first sub-sequence. However, if a model makes a prediction for a single target_id at a time, you can feed it as many previous interactions as possible (up to the maximum length the model was trained with), which should give a better prediction result.
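A small sketch of the two evaluation schemes, just to show how much history each target gets to see. This is not the repository's code: `chunked_contexts`, `sliding_contexts`, and the toy data are hypothetical, and it assumes the model can take up to `max_len` previous interactions as input.

```python
def chunked_contexts(interactions, chunk_len):
    """History visible for each target when the sequence is split into
    fixed-length sub-sequences: only earlier items in the same chunk."""
    contexts = []
    for t in range(len(interactions)):
        chunk_start = (t // chunk_len) * chunk_len
        contexts.append(interactions[chunk_start:t])
    return contexts

def sliding_contexts(interactions, max_len):
    """History visible for each target when predicting one target at a time:
    the most recent max_len interactions, regardless of chunk boundaries."""
    contexts = []
    for t in range(len(interactions)):
        start = max(0, t - max_len)
        contexts.append(interactions[start:t])
    return contexts

# Toy (question_id, response) pairs for one student.
interactions = [("q1", 1), ("q2", 0), ("q3", 1), ("q4", 1), ("q5", 0), ("q6", 1)]

chunked = chunked_contexts(interactions, chunk_len=3)
sliding = sliding_contexts(interactions, max_len=3)

# Target index 3 is the first question of the second sub-sequence.
print(len(chunked[3]), len(sliding[3]))  # 0 3
```

With chunking, the first target of every sub-sequence is predicted from an empty history, while the one-target-at-a-time scheme still sees the three preceding interactions.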

from kt.
