Comments (3)

hongzimao commented on August 25, 2024

Which objective metric did you use when you observed the agent skipping bitrates 2 and 4? Note that, per Figure 3(b) and Example 2 in Section 3, this behavior might actually be desirable.

As for carrying a model forward: we didn't see a 2k-iteration model significantly outperform a 20k-iteration one in validation. Usually, the longer you train, the better the performance, unless there is significant overfitting.

karanrak commented on August 25, 2024

QoE linear was the metric used, right? I am referring to the pretrained model that you provided with the code. Just as shown in Figures 3a/3b, the model only chooses among the qualities 4.3 Mbps / 1.8 Mbps / 750 kbps / 300 kbps, skipping qualities 2 and 4 (1.2 Mbps and 2.85 Mbps).
It is indeed performing slightly better than other models that I've tried training (which don't ignore those bitrates), but is there some intuition/methodology behind achieving such behaviour?
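For reference, here is roughly how I understand the per-chunk linear QoE reward from the paper and the released training scripts (bitrate utility minus a rebuffering penalty minus a smoothness penalty); the constant values below are my reading of the code, so verify them against your copy:

```python
# Sketch of the per-chunk QoE_linear reward:
# bitrate utility - rebuffer penalty - smoothness penalty.
VIDEO_BIT_RATE = [300, 750, 1200, 1850, 2850, 4300]  # kbps; 6-level ladder
M_IN_K = 1000.0
REBUF_PENALTY = 4.3   # linear QoE: penalty per second of rebuffering
SMOOTH_PENALTY = 1.0  # penalty per Mbps of quality change between chunks

def qoe_linear(bit_rate, last_bit_rate, rebuf_sec):
    """Reward for one downloaded chunk; bit_rate indexes into VIDEO_BIT_RATE."""
    utility = VIDEO_BIT_RATE[bit_rate] / M_IN_K
    rebuffer = REBUF_PENALTY * rebuf_sec
    smoothness = SMOOTH_PENALTY * abs(
        VIDEO_BIT_RATE[bit_rate] - VIDEO_BIT_RATE[last_bit_rate]) / M_IN_K
    return utility - rebuffer - smoothness
```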

This is the process I am using for training/validation (a rough code sketch follows the list):

  1. Start training with entropy weight X (e.g., 1). Print the test results every 100 epochs (during model saving).
  2. If the test result is >= the current max, store that model separately as the new best model.
  3. Continue for 30k iterations, then use the best model as the base for the next 30k iterations with a lower entropy weight.

I am using the provided training and testing sets for the above. Is there something I am missing? I sometimes get my "best model" as early as 8k iterations out of 30k, and at other times it might be at 28k iterations. That seems unlikely to be overfitting, since the results are oscillating near the max values, which shouldn't occur under overfitting, right?
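In code, the loop I described looks roughly like this (`train_iteration`, `evaluate_on_test_set`, `save_model`, and `load_model` are placeholder names for illustration, not functions from the pensieve codebase, and the annealing schedule is just an example):

```python
# Rough sketch of the train/validate/anneal loop described above.
# All helper functions are placeholders, not pensieve APIs.
ITERS_PER_PHASE = 30_000
TEST_EVERY = 100
ENTROPY_SCHEDULE = [1.0, 0.5, 0.1]  # example weights, one per 30k-iteration phase

best_reward = float("-inf")
for entropy_weight in ENTROPY_SCHEDULE:
    for it in range(ITERS_PER_PHASE):
        train_iteration(entropy_weight)
        if it % TEST_EVERY == 0:
            reward = evaluate_on_test_set()    # e.g., mean QoE over test traces
            if reward >= best_reward:          # step 2: keep the running best
                best_reward = reward
                save_model("best_model.ckpt")
    load_model("best_model.ckpt")  # step 3: next phase starts from the best model
```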

hongzimao commented on August 25, 2024

Thanks for pointing this out; we didn't notice this behavior before. One intuition I can think of: it might reduce the variance of the policy (outputting just a subset of the actions). So as long as performance is improving, the agent has every incentive to reduce its entropy down to a subset of the actions. This might not be preferable in reality, and I think you might want to enlarge the training dataset to get rid of this issue.
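One quick way to check for this kind of collapse is to log the policy's entropy and per-action probabilities on validation states. A small sketch, where `policy_probs` stands for the actor's softmax output for one state (how you extract it depends on your setup):

```python
import numpy as np

def policy_entropy(policy_probs, eps=1e-12):
    """Shannon entropy (in nats) of one action distribution."""
    p = np.asarray(policy_probs, dtype=np.float64)
    return float(-np.sum(p * np.log(p + eps)))

# Example: a policy that has collapsed onto 4 of the 6 bitrates.
p = np.array([0.30, 0.25, 0.0, 0.25, 0.0, 0.20])
print(policy_entropy(p))          # ~1.38, well below log(6) ~ 1.79 for uniform
print(np.flatnonzero(p < 1e-3))   # actions the policy never takes -> [2 4]
```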

As for the overfitting: based on what you described, it doesn't sound like an overfitting issue. But you might want to checkpoint the model and test it on a validation set at each step to make sure.
