lyeoni / nlp-tutorial
A list of NLP (Natural Language Processing) tutorials
License: MIT License
Hi
After saving the model in news-category-classification, how do I actually use it to classify new text?
Can you put up an example, please?
Thanks for your code!
I found that during training the GPU is not fully utilized. Is there a way to add batching so that more pairs are trained in one iteration?
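Training on one pair per iteration leaves most of the GPU idle; the usual remedy is to pad sequences to a common length so that several pairs can be stacked into one tensor and processed together. The sketch below only does the padding and grouping in plain Python. The helper name make_batches and the choice of index 0 as the padding index are assumptions, not part of the tutorial; the model itself would also need to accept a batch dimension (and ideally ignore padded positions in the loss).

```python
from typing import List, Sequence, Tuple

PAD_IDX = 0  # assumption: index 0 is reserved for padding in the vocabulary


def make_batches(pairs: Sequence[Tuple[List[int], List[int]]], batch_size: int):
    """Hypothetical helper: group (source, target) index sequences into padded mini-batches."""
    batches = []
    for start in range(0, len(pairs), batch_size):
        chunk = pairs[start:start + batch_size]
        # Pad every sequence in the chunk to the longest one in that chunk.
        src_len = max(len(src) for src, _ in chunk)
        tgt_len = max(len(tgt) for _, tgt in chunk)
        src_batch = [src + [PAD_IDX] * (src_len - len(src)) for src, _ in chunk]
        tgt_batch = [tgt + [PAD_IDX] * (tgt_len - len(tgt)) for _, tgt in chunk]
        batches.append((src_batch, tgt_batch))
    return batches
```

Each (src_batch, tgt_batch) pair is a rectangular list of lists, so it could be turned into a single tensor (for example with torch.tensor) and fed to the model in one step instead of looping over pairs.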
Hi, I am curious about the inference part of the model. Is there an example showing how it works? Many thanks.
Hi Lyeoni,
First of all, thank you very much for making these tutorials; they are really interesting!
I am trying to run the question-answer-matching tutorial and reproduce your evaluation. Unfortunately, I can't download the Posts.xml file from Git LFS, as it looks like your plan no longer allows downloads.
By any chance, do you have that file hosted somewhere else? That would allow me to run the evaluation with your trained model.
Thanks a lot, and have a nice day! :-)
Kindly add a tutorial for NLP with a Transformer setup.
Suppose I want to train an Arabic-to-Urdu machine translation model:
Hi,
In the cleaning function of the script nlp-tutorial/news-category-classifcation/preprocessing.py, line 21 is written as:
text = re.sub(r'[!]{2,}', '?', text) # multiple ?s -> ?
The first argument should contain ? instead of !, so the line should be:
text = re.sub(r'[?]{2,}', '?', text) # multiple ?s -> ?
Am I correct?
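A quick check confirms the reported behavior: with [!] the rule rewrites a run of exclamation marks into a question mark and never touches repeated question marks. The sample string is made up for illustration; only the two patterns come from the report. Whether the script also needs a separate rule for repeated ! depends on the rest of the cleaning function, which isn't shown here.

```python
import re

text = "Really?? Wow!!!"

# Pattern as quoted from line 21: matches runs of '!' but replaces them with '?'
buggy = re.sub(r'[!]{2,}', '?', text)

# Suggested fix: match runs of '?' and collapse them to a single '?'
fixed = re.sub(r'[?]{2,}', '?', text)

print(buggy)  # Really?? Wow?
print(fixed)  # Really? Wow!!!
```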
Hi, @lyeoni!
You have written great tutorials. I really appreciate it :)
The code can be improved a little with one neat line. Please take a look :)
Here, the first key-value items of stoi and itos are filled with the special tokens.
I suggest inserting this line before the loop:
special_tokens = filter(lambda x: x is not None, [self.unk_token, self.bos_token, self.eos_token, self.pad_token])
If self.unk_token is not set but self.bos_token is, the indices in the dictionary become wrong. So the None values need to be filtered out first.
Input
vocab = Vocab(body, bos_token='<bos>'); vocab.build(); vocab.stoi;
Wrong Output
'<bos>': 1 ' ': 1, 'hi': 2, 'bear': 3, ...
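A minimal sketch of the suggested fix, using a hypothetical stand-in for the tutorial's Vocab class. The real class takes the corpus in its constructor and its build() logic isn't shown in the issue; here build() takes a token list and assigns indices in first-seen order rather than by frequency. Only the filter(...) line comes from the suggestion above.

```python
from typing import Dict, List, Optional


class Vocab:
    """Hypothetical, simplified stand-in for the tutorial's Vocab class."""

    def __init__(self, unk_token: Optional[str] = None, bos_token: Optional[str] = None,
                 eos_token: Optional[str] = None, pad_token: Optional[str] = None):
        self.unk_token = unk_token
        self.bos_token = bos_token
        self.eos_token = eos_token
        self.pad_token = pad_token
        self.stoi: Dict[str, int] = {}
        self.itos: Dict[int, str] = {}

    def build(self, tokens: List[str]) -> None:
        # Drop special tokens that were never set, so None never occupies an index.
        special_tokens = filter(lambda x: x is not None,
                                [self.unk_token, self.bos_token, self.eos_token, self.pad_token])
        # Special tokens first, then corpus tokens; duplicates keep their first index.
        for token in list(special_tokens) + tokens:
            if token not in self.stoi:
                index = len(self.stoi)
                self.stoi[token] = index
                self.itos[index] = token
```

With only bos_token set, Vocab(bos_token='<bos>') followed by build(['hi', 'bear', 'hi']) gives stoi == {'<bos>': 0, 'hi': 1, 'bear': 2}: every token gets a distinct, contiguous index and no slot is wasted on an unset special token.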
python train.py --epochs 12 --batch_size 2 --learning_rate .001 --hidden_size 64 --n_layers 1 --dropout_p .1
number of trained word vectors of data/glove.6B.100d.txt: 400000
Traceback (most recent call last):
  File "train.py", line 200, in <module>
    train_loader = DataLoader(dataset=qa_train, batch_size=config.batch_size, shuffle=True, num_workers=4)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 176, in __init__
    sampler = RandomSampler(dataset)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/sampler.py", line 66, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0
Please let me know what the exact issue is.
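The traceback says RandomSampler received a dataset of length 0, so qa_train was empty by the time the DataLoader was built. One plausible cause, given the Git LFS report above, is that the data file was never actually downloaded: when an LFS fetch fails, the checkout contains a small pointer file instead of the real XML, and parsing it yields no samples. That is an inference, not something the traceback proves. A hypothetical guard like the one below (the name require_nonempty is made up) fails earlier with a clearer message.

```python
from collections.abc import Sized


def require_nonempty(dataset: Sized, name: str) -> None:
    """Hypothetical guard: raise a readable error instead of RandomSampler's num_samples=0."""
    if len(dataset) == 0:
        raise ValueError(
            f"{name} is empty: check that the data files exist and were parsed "
            "(e.g. a Git LFS pointer file instead of the real XML yields no samples)"
        )
```

Calling require_nonempty(qa_train, "qa_train") just before the DataLoader(...) line in train.py would surface the empty dataset directly.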
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('D:/nlp-tutorial/neural-machine-translation/nmt/train.py', wdir='D:/nlp-tutorial/neural-machine-translation/nmt')
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "D:/nlp-tutorial/neural-machine-translation/nmt/train.py", line 254, in <module>
    trainiters(pairs, encoder, decoder, n_iters)
  File "D:/nlp-tutorial/neural-machine-translation/nmt/train.py", line 184, in trainiters
    train_pairs += [random.choice(train_pairs) for i in range(n_iters%len(train_pairs))]
ZeroDivisionError: integer division or modulo by zero
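The failing expression is n_iters % len(train_pairs), which raises ZeroDivisionError only when train_pairs is empty, so no training pairs were loaded before trainiters was called. The sketch below is a hypothetical helper (pad_to_iterations is a made-up name) that checks for that case first. The whole-repeat step before the random top-up is an assumption about what surrounds the quoted line in train.py, which isn't shown.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def pad_to_iterations(train_pairs: Sequence[T], n_iters: int) -> List[T]:
    """Hypothetical helper: build a list of exactly n_iters training pairs."""
    if not train_pairs:
        # len(train_pairs) == 0 is exactly what makes `n_iters % len(train_pairs)` raise.
        raise ValueError("train_pairs is empty; check the corpus path and preprocessing")
    # Assumption: repeat the full set as many whole times as fit into n_iters...
    pairs = list(train_pairs) * (n_iters // len(train_pairs))
    # ...then top up the remainder with randomly chosen pairs (the quoted line).
    pairs += [random.choice(train_pairs) for _ in range(n_iters % len(train_pairs))]
    return pairs
```

If the guard fires, the data-loading step (file path, working directory, filtering by sentence length) is where to look, not the training loop.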
Movie Rating Classification: are the datasets missing?
The corresponding package is installed and the dataset is downloaded. When I run vocab.py, the following error occurs: “local variable 'MosesTokenizer' referenced before assignment”
Thanks for your great work! I learned a lot. However, I have a question.
I trained the model for 7 epochs, reaching a training accuracy of 95.2 and a test (validation) accuracy of 85.2.
Is that normal? Could the final test (validation) accuracy be higher after more epochs? Thanks!
I have installed nltk, but an error occurs when I run the code:
ModuleNotFoundError: No module named 'nltk.tokenize.moses'
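The Moses tokenizer was removed from later NLTK releases and is now maintained separately as the sacremoses package, so nltk.tokenize.moses only exists in older NLTK versions. The earlier “local variable 'MosesTokenizer' referenced before assignment” error is consistent with the same root cause if vocab.py imports it inside a function and swallows the ImportError, though that is an inference since the code isn't shown. A minimal sketch of a fallback import follows; the function names are hypothetical, and the whitespace fallback is only there so the sketch runs without either package.

```python
def load_moses_tokenizer():
    """Return a MosesTokenizer from old NLTK or from sacremoses, else None."""
    try:
        # Only present in older NLTK releases.
        from nltk.tokenize.moses import MosesTokenizer
    except ImportError:
        try:
            # Standalone port of the Moses tokenizer: pip install sacremoses
            from sacremoses import MosesTokenizer
        except ImportError:
            return None  # explicit, so the name is never used unbound
    return MosesTokenizer()


def tokenize(text: str):
    tokenizer = load_moses_tokenizer()
    if tokenizer is None:
        # Crude whitespace fallback; not equivalent to Moses tokenization.
        return text.split()
    return tokenizer.tokenize(text)
```

Installing sacremoses is the simplest fix; pinning an older NLTK release that still ships the module is another option.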