I am facing this error Applying BPE to valid and test files... L

Why is the codes file empty? (unsupervisedmt, open)

ykkhan commented on July 18, 2024

Why is the codes file empty?

from unsupervisedmt.

Comments (4)

glample commented on July 18, 2024

This kind of issue with fastBPE typically happens when you request too many BPE codes for too small a vocabulary.
What is your vocabulary size, and how many BPE codes are you trying to compute?
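To illustrate why a small corpus cannot supply an arbitrary number of codes, here is a minimal sketch of a toy BPE learner. It is not fastBPE's implementation (for example, it omits the end-of-word marker), and the names `learn_bpe` and `max_merges` are hypothetical. It only shows that the merge loop runs out of symbol pairs long before a large requested code count is reached.

```python
from collections import Counter


def max_merges(words):
    """Upper bound on learnable merges: each merge removes at least one symbol
    from some word type, and a word type cannot shrink below one symbol."""
    return sum(len(w) - 1 for w in set(words))


def learn_bpe(words, num_codes):
    """Toy BPE learner (illustrative only). Returns the merges actually learned,
    which may be fewer than num_codes."""
    # Each word type is a tuple of symbols, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_codes):
        # Count adjacent symbol pairs over the whole vocabulary.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol: nothing left to merge
        # Pick the most frequent pair; ties are broken deterministically.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Apply the chosen merge to every word type.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


corpus = "low low lower lowest".split()
merges = learn_bpe(corpus, num_codes=1000)
print(len(merges), "merges learned out of 1000 requested; bound:", max_merges(corpus))
```

Running `max_merges` on the whitespace-split training text gives a rough ceiling to compare against the number of codes being requested.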


ykkhan commented on July 18, 2024

OK, I get it.
Can you suggest how many codes I should compute for a dataset that has 610 sentences in each file (train, test, valid)? It also does not extract the correct vocabulary: it extracts only individual characters instead of words, as shown in the image, and it computes a vocabulary size of just 93.

For 610 sentences, I have tried BPE code counts from about 200 to 1500, and every value gives the same issue.
The dataset is very small, but for now my goal is just to run this technique successfully. I will increase the dataset size afterwards.


glample commented on July 18, 2024

610 sentences is very small. I would simply use word level and not BPE in this case.
BPE is useful for reducing the vocabulary size and avoiding a softmax over hundreds of thousands of elements. But in your case the vocabulary will be very small, so you probably don't need BPE.
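As a rough illustration of what a word-level vocabulary looks like, the sketch below counts whitespace-separated tokens instead of learning BPE codes. The function name `build_word_vocab` and the `min_count` parameter are hypothetical, and the output is not claimed to match the vocabulary file format that the unsupervisedmt scripts expect.

```python
from collections import Counter


def build_word_vocab(sentences, min_count=1):
    """Return (word, count) pairs sorted by descending count, then alphabetically."""
    # Tokenize on whitespace; this assumes the text is already tokenized.
    counts = Counter(tok for sentence in sentences for tok in sentence.split())
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, count) for word, count in ordered if count >= min_count]


sentences = ["the cat sat", "the dog sat down"]
for word, count in build_word_vocab(sentences):
    print(word, count)
```

With a few hundred sentences, the resulting word list stays small enough that the softmax cost mentioned above is not a concern.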


ykkhan commented on July 18, 2024

OK, thanks for replying. Could you guide me a little more? Which part of data.enfr.sh should I remove, and what should I insert in the code to do this?
