Comments (6)
Hello, I see that the epoch setting in pretrain_language_model.yaml is 80, and I am using 4 Titan Xp GPUs to pre-train the language model. However, one epoch takes 8 hours, so 80 epochs would need 640 hours (nearly 26 days). Do I really have to train for 80 epochs? How can I judge when the language model training has converged?
In my reproduction, 1-2 epochs were enough, and the final model (vision model + the language model we pretrained + alignment model) could match the performance the authors published in the paper. But I found another issue: the parameter use_sm is set to False in pretrain_language_model.yaml. Is spelling mutation used when pretraining the language model?
from abinet.
@Jack-Lee-NULL did you try to evaluate the LM separately? I'm training it on a smaller dataset (3.1M words, lowercase alphanumeric) with a similar setup (effective batch size = 4096), but word accuracy always saturates below 40%. This is well below the performance of the VM alone (> 85%), which converges to an acceptable state much more quickly.
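For clarity on the numbers being compared: word accuracy here presumably means an exact full-word match between prediction and ground truth, as is standard in scene text recognition. A minimal sketch of that metric (the function name is illustrative, not from the ABINet codebase):

```python
def word_accuracy(preds, targets):
    """Fraction of predicted words that exactly match the target word,
    compared case-insensitively. Illustrative metric, not ABINet's code."""
    assert len(preds) == len(targets)
    correct = sum(p.lower() == t.lower() for p, t in zip(preds, targets))
    return correct / len(preds)

# One correct word out of two -> 0.5
print(word_accuracy(["hello", "west"], ["hello", "test"]))  # 0.5
```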
@FangShancheng I'm using the pretrained weights for the LM and a small test script to probe its outputs given arbitrary inputs. I'm getting weird results. Below are some examples:
Input: hello tensor([[ 8, 5, 12, 12, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), length: tensor([6])
Output: hello tensor([[ 8, 5, 12, 12, 15, 0, 23, 13, 4, 19, 19, 23, 19, 20, 4, 13, 9, 23, 23, 25, 1, 12, 13, 13, 23, 20]])
Input: hello2 tensor([[ 8, 5, 12, 12, 15, 28, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), length: tensor([7])
Output: celicw tensor([[ 3, 5, 12, 9, 3, 23, 0, 30, 30, 12, 19, 28, 20, 14, 28, 29, 3, 25, 19, 20, 4, 28, 3, 16, 16, 20]])
Input: hllo tensor([[ 8, 12, 12, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), length: tensor([5])
Output: aaia tensor([[ 1, 1, 9, 1, 0, 23, 19, 1, 1, 2, 19, 23, 23, 1, 9, 13, 23, 23, 23, 1, 1, 9, 23, 23, 23, 1]])
Input: test tensor([[20, 5, 19, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), length: tensor([5])
Output: west tensor([[23, 5, 19, 20, 0, 19, 19, 9, 1, 19, 19, 19, 7, 2, 5, 5, 5, 19, 19, 9, 14, 5, 5, 19, 19, 19]])
For the first sample, the output is as expected. However, for the next two, the outputs are way off. For the last one, the model erroneously corrected "test" into "west". Is this the expected behavior? I'm getting very low word accuracy when training the LM.
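For readers decoding the tensors above: the indices appear to follow a charset where 0 is the padding/end token, 1-26 map to 'a'-'z', and digits follow (consistent with '2' → 28 in the "hello2" example, which suggests a digit order of '1'..'9','0'). A hedged reconstruction of that encoding (names and exact digit order are assumptions, not taken from the ABINet codebase):

```python
# Assumed charset: 0 = pad/end, 1-26 = 'a'-'z', 27-36 = '1'-'9','0'.
# This matches '2' -> 28 in the examples above; purely illustrative.
CHARSET = "abcdefghijklmnopqrstuvwxyz" + "1234567890"
char_to_idx = {c: i + 1 for i, c in enumerate(CHARSET)}  # 0 reserved

def encode(word, max_len=26):
    """Encode a word as index list padded with 0 to max_len.
    The reported length is len(word) + 1, counting the end token."""
    ids = [char_to_idx[c] for c in word.lower()]
    return ids + [0] * (max_len - len(ids))

print(encode("hello")[:6])   # [8, 5, 12, 12, 15, 0]
print(encode("hello2")[:7])  # [8, 5, 12, 12, 15, 28, 0]
```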
@baudm I did. I tried different training methods on different datasets (MJ+ST lexicon, Wiki103) and reached similar conclusions:
- feeding incorrect (corrupted) words to the language model during training yields performance exceeding what the paper reports;
- feeding correct words to the language model during training yields performance similar to the paper.
@FangShancheng I tested the language model weights you published and compared them with my reproduction. I guess that training method 2 (as mentioned above) is the one used in the paper. What I don't understand is that training method 1 also looks reasonable.
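Training method 1 amounts to spelling mutation: corrupting the input word so the LM must learn to recover the correct spelling rather than copy its input. A hedged sketch of what such corruption might look like (this is an illustration of the idea, not the actual use_sm implementation in ABINet):

```python
import random

def mutate_word(word, prob=0.2, charset="abcdefghijklmnopqrstuvwxyz", rng=None):
    """Randomly corrupt a word with per-character deletion, substitution,
    and insertion. Illustrative only; not ABINet's actual use_sm code."""
    rng = rng or random.Random()
    out = []
    for ch in word:
        r = rng.random()
        if r < prob / 3:
            continue                          # delete this character
        elif r < 2 * prob / 3:
            out.append(rng.choice(charset))   # substitute a random character
        else:
            out.append(ch)                    # keep the character
        if rng.random() < prob / 3:
            out.append(rng.choice(charset))   # insert a random character
    return "".join(out)

print(mutate_word("hello", prob=0.0))  # hello (prob=0 leaves the word intact)
```

During pretraining, the mutated word would be fed as input while the original word serves as the target, which matches the "feed incorrect word" regime described above.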
@Jack-Lee-NULL what's the metric you're using for evaluation? When using the pretrained LM weights, I'm getting unexpected results (see previous comment). Did you try to check the actual individual outputs of the LM? Here's my minimal test script for checking individual inputs.
@FangShancheng @Jack-Lee-NULL I just checked Table 4 of the paper, and the word accuracy of BCN alone is indeed just above 40%. So I guess the results I posted in my previous comments were expected.