The casr from ralphhan

casr's People

Contributors

Stargazers

Watchers

casr's Issues

[Question] iterative refinement prone to overfit

Hi. Your idea of modeling complex sequences via so-called self-boost refinement is very interesting.

I tried to implement this approach to neural machine translation. That is, train an autoregressive model $M^{0}$ to translate a source sequence to a target sequence in the teacher-forcing manner. After training enough, generate the predicted version of the target sequences. Then, take the source sequence and predicted target sequence as input to the second autoregressive model $M^{1}$ and still train it in the teacher-forcing manner. And so on.

However, I found that, since this is not an end-to-end procedure, the validation loss goes higher as the Castep increases. The expectation that the later model corrects the wrong part of the previous prediction according to other parts it depends on fails. This result is not a surprise to me because the model can easily find a shortcut which is just simply copying the inputs and this shortcut conveniently maintains a good likelihood loss. In other words, in my implementation, only the first model $M^{0}$ learns something, and the subsequent model $\lbrace M^{t}\rbrace_{t>0}$ just overfits the input and is too lazy to try to correct the wrong predictions.

So I am wondering how exactly CASR model can achieve refinement.

Recommend Projects

ralphhan / casr Goto Github PK

casr's People

Contributors

Stargazers

Watchers

casr's Issues

[Question] iterative refinement prone to overfit

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent