lyeoni / nlp-tutorial

A list of NLP (Natural Language Processing) tutorials

License: MIT License

Python 5.67% Shell 0.04% Jupyter Notebook 94.30%

nlp natural-language-processing nlp-tutorial neural-machine-translation text-classification sentiment-classification

nlp-tutorial's Issues

Using the classifier

Hi
After saving the model in news-category-classification, how do you actually use it to classify new text?
Can you put up an example, please?

How can the GPU be fully utilized?

Thanks for your code!

I found that during training the GPU is not fully utilized. Is there a way to add batching so that more pairs are trained per iteration?
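
A common way to get more work into each step is to group sentence pairs into mini-batches. The sketch below shows only the grouping step in plain Python. `iter_batches` is a hypothetical helper, not part of the repository, and an actual GPU run would still need to pad each batch to a common length and stack it into tensors.

```python
def iter_batches(pairs, batch_size):
    """Yield consecutive slices of `pairs`, each with at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(pairs), batch_size):
        yield pairs[start:start + batch_size]


# Toy (source, target) pairs, invented for illustration.
pairs = [("a", "x"), ("b", "y"), ("c", "z"), ("d", "w"), ("e", "v")]
for batch in iter_batches(pairs, batch_size=2):
    print(len(batch), batch)
```

Larger batches generally raise GPU utilization, at the cost of more memory per step.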

Confused about the inference. Any example?

Hi, I am curious about the inference part of the model. Is there an example that shows how it works? Many thanks.

question-answer-matching missing file

Hi Lyeoni,

First of all, thank you very much for making these tutorials, which are really interesting!

I am trying to run the question-answer-matching tutorial and reproduce your evaluation. Unfortunately, I can't download the Posts.xml file from git lfs, as it looks like your subscription no longer allows downloads.
By any chance, do you have that file hosted somewhere else? That would allow me to run the evaluation with your trained model.

Thanks a lot and I wish you a nice day! :-)

Please add a transformer-based tutorial

Kindly add a tutorial for NLP with a transformer setup.

Arabic to Urdu Machine Translation

@lyeoni

Suppose I want to train an Arabic-to-Urdu machine translation model:

Is that attainable using this project?
What options should be set in training?
Do you suggest another GitHub project?

How fast is the model?

typo in preprocessing?

Hi,
In the cleaning function in the script nlp-tutorial/news-category-classifcation/preprocessing.py,
line 21 is written as text = re.sub(r'[!]{2,}', '?', text) # multiple ?s -> ?. The first argument should contain ? rather than !, so the line should be text = re.sub(r'[?]{2,}', '?', text) # multiple ?s -> ?.
Am I correct?
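
The difference between the two patterns can be checked quickly. This is a minimal sketch, assuming the intent of line 21 is what its comment says (collapse runs of question marks). The function names `collapse_question_marks` and `as_written` are hypothetical and not part of the repository.

```python
import re


def collapse_question_marks(text):
    # Proposed fix: replace any run of two or more '?' with a single '?'.
    return re.sub(r'[?]{2,}', '?', text)


def as_written(text):
    # Pattern as quoted in the issue: it matches runs of '!' instead,
    # so "!!" becomes "?" and runs of '?' are left untouched.
    return re.sub(r'[!]{2,}', '?', text)


print(collapse_question_marks("Really???"))  # Really?
print(as_written("Really???"))               # Really???
print(as_written("Wow!!"))                   # Wow?
```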

Small improvement for correct indexes in vocabulary dictionaries

Hi, @lyeoni!
You have written great tutorials, and I really appreciate them.
We can improve things a little with one neat line. Please take a look.
Here, we fill the first key-value items of stoi and itos with the special tokens.
I suggest inserting this line before the loop:
special_tokens = filter(lambda x: x is not None, [self.unk_token, self.bos_token, self.eos_token, self.pad_token])
If self.unk_token is not set but self.bos_token is, the indexes in the dictionary become wrong, so we need to filter out None values first.
Input
vocab = Vocab(body, bos_token='<bos>'); vocab.build(); vocab.stoi;
Wrong Output
'<bos>': 1 ' ': 1, 'hi': 2, 'bear': 3, ...
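
The effect of filtering out unset special tokens can be sketched as follows. The `Vocab` class here is a hypothetical, simplified stand-in written for illustration, not the tutorial's actual class. It assumes special tokens come first and the remaining tokens are ordered by frequency.

```python
from collections import Counter


class Vocab:
    """Hypothetical, simplified stand-in for the tutorial's Vocab class."""

    def __init__(self, tokens, unk_token=None, bos_token=None,
                 eos_token=None, pad_token=None):
        self.tokens = tokens
        self.unk_token = unk_token
        self.bos_token = bos_token
        self.eos_token = eos_token
        self.pad_token = pad_token
        self.stoi, self.itos = {}, {}

    def build(self):
        # Drop unset (None) special tokens first so indexes stay contiguous.
        special_tokens = [t for t in (self.unk_token, self.bos_token,
                                      self.eos_token, self.pad_token)
                          if t is not None]
        # Special tokens first, then corpus tokens by descending frequency.
        corpus_tokens = [tok for tok, _ in Counter(self.tokens).most_common()
                         if tok not in special_tokens]
        ordered = special_tokens + corpus_tokens
        self.stoi = {tok: idx for idx, tok in enumerate(ordered)}
        self.itos = {idx: tok for tok, idx in self.stoi.items()}


vocab = Vocab(["hi", "bear", "hi"], bos_token="<bos>")
vocab.build()
print(vocab.stoi)  # {'<bos>': 0, 'hi': 1, 'bear': 2}
```

With the None values filtered out, the only special token that is set receives index 0 and no two tokens share an index.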

num_samples should be a positive integer value, but got num_samples=0

python train.py --epochs 12 --batch_size 2 --learning_rate .001 --hidden_size 64 --n_layers 1 --dropout_p .1

number of trained word vectors of data/glove.6B.100d.txt: 400000
Traceback (most recent call last):
  File "train.py", line 200, in <module>
    train_loader = DataLoader(dataset=qa_train, batch_size=config.batch_size, shuffle=True, num_workers=4)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 176, in __init__
    sampler = RandomSampler(dataset)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/sampler.py", line 66, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

Please let me know what the exact issue is.
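
The traceback suggests that `qa_train` has length zero by the time the `DataLoader` is built, since the sampler reports `num_samples=0`. A minimal sketch of an early check, assuming the dataset supports `len()`; `require_non_empty` is a hypothetical helper, not part of the repository.

```python
def require_non_empty(dataset, name):
    """Return len(dataset), or raise a readable error if it is empty."""
    n = len(dataset)
    if n == 0:
        # Fail here with a clear message instead of inside the sampler.
        raise ValueError(
            f"{name} is empty; check that the data files exist and were parsed"
        )
    return n


# A plain list stands in for the dataset in this sketch.
print(require_non_empty(["sample 1", "sample 2"], "qa_train"))  # 2
```

Calling such a check right before constructing the loader narrows the problem down to data loading rather than training.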

neural-machine-translation - nmt ZeroDivisionError: integer division or modulo by zero

Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('D:/nlp-tutorial/neural-machine-translation/nmt/train.py', wdir='D:/nlp-tutorial/neural-machine-translation/nmt')
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "D:/nlp-tutorial/neural-machine-translation/nmt/train.py", line 254, in <module>
    trainiters(pairs, encoder, decoder, n_iters)
  File "D:/nlp-tutorial/neural-machine-translation/nmt/train.py", line 184, in trainiters
    train_pairs += [random.choice(train_pairs) for i in range(n_iters%len(train_pairs))]
ZeroDivisionError: integer division or modulo by zero
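
The expression `n_iters%len(train_pairs)` on line 184 can only divide by zero when `train_pairs` is empty, so the likely cause is that no sentence pairs were loaded. A minimal sketch that mirrors only that one line and adds a guard. `pad_pairs` is a hypothetical helper, and the rest of `trainiters` is not reproduced here.

```python
import random


def pad_pairs(train_pairs, n_iters):
    """Append n_iters % len(train_pairs) randomly chosen pairs, as line 184 does."""
    if not train_pairs:
        # Without this guard the modulo below raises ZeroDivisionError.
        raise ValueError("train_pairs is empty; check that the data was loaded")
    extra = [random.choice(train_pairs)
             for _ in range(n_iters % len(train_pairs))]
    return train_pairs + extra


# Toy pairs, invented for illustration: 10 % 3 == 1 extra pair is appended.
pairs = [("a", "x"), ("b", "y"), ("c", "z")]
print(len(pad_pairs(pairs, n_iters=10)))  # 4
```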

Movie Rating Classification no datasets

Are there no datasets for Movie Rating Classification?

local variable 'MosesTokenizer' referenced before assignment

The corresponding package is installed and the dataset is downloaded. Running vocab.py produces the following error: "local variable 'MosesTokenizer' referenced before assignment"

Question about validate acc

Thanks for your great work! I learned a lot. However, I have a question.
I trained the model for 7 epochs, reaching a train accuracy of 95.2 and a test (validation) accuracy of 85.2.
Is that normal? Could the final test (validation) accuracy be higher after more epochs? Thanks!

No module named 'nltk.tokenize.moses'

I have installed nltk, but an error occurs when I run the code:
ModuleNotFoundError: No module named 'nltk.tokenize.moses'
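
Newer NLTK releases no longer include the `nltk.tokenize.moses` module. The Moses tokenizer is now available in the separate `sacremoses` package. A minimal sketch of an import with fallbacks; `load_tokenizer` is a hypothetical helper, and the final whitespace fallback is an assumption made so the sketch runs without either package installed.

```python
def load_tokenizer():
    """Return a callable that maps a string to a list of tokens."""
    try:
        # sacremoses is the standalone package that ships MosesTokenizer.
        from sacremoses import MosesTokenizer
        return MosesTokenizer(lang="en").tokenize
    except ImportError:
        pass
    try:
        # Older NLTK releases still included this module.
        from nltk.tokenize.moses import MosesTokenizer
        return MosesTokenizer().tokenize
    except ImportError:
        # Last resort (assumption for this sketch): plain whitespace splitting.
        return str.split


tokenize = load_tokenizer()
print(tokenize("hello world"))
```

If the tutorial code imports `MosesTokenizer` inside a guarded block, a failed import may also explain a "referenced before assignment" error, though that depends on how the script is written.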