nlp-tutorial's People

Contributors

lyeoni

nlp-tutorial's Issues

Using the classifier

Hi,
After saving the model in news-category-classification, how do you actually use it to classify new text?
Could you post an example, please?

How can I fully utilize the GPU?

Thanks for your code!

I found that during training the GPU is not fully utilized. Is there a way to add batching so that more pairs are trained per iteration?
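One possible direction, sketched under assumptions rather than taken from the repository: group the training pairs into mini-batches and pad each batch to a common length, so that one iteration processes several pairs. The helper names make_batches and pad_batch and the padding index 0 are hypothetical, and the model's forward pass would also have to accept batched input.

```python
import random


def make_batches(pairs, batch_size, shuffle=True, seed=None):
    """Split (source, target) pairs into mini-batches of at most batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    pairs = list(pairs)  # copy so the caller's list is not reordered
    if shuffle:
        random.Random(seed).shuffle(pairs)
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]


def pad_batch(sequences, pad_index=0):
    """Right-pad token-index sequences to the longest length in the batch."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [pad_index] * (max_len - len(seq)) for seq in sequences]


# Toy token-index pairs, invented for illustration only.
toy_pairs = [([1, 2, 3], [4, 5]), ([6], [7, 8, 9]), ([10, 11], [12])]
for batch in make_batches(toy_pairs, batch_size=2, shuffle=False):
    sources = pad_batch([src for src, _ in batch])
    targets = pad_batch([tgt for _, tgt in batch])
    print(sources, targets)
```

Larger batches generally keep the GPU busier per step, but the padded positions would need to be masked in the loss.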

question-answer-matching missing file

Hi Lyeoni,

First of all, thank you very much for making these tutorials, which are really interesting!

I am trying to run the question-answer-matching tutorial and reproduce your evaluation. Unfortunately, I can't download the Posts.xml file from Git LFS, as it looks like your subscription no longer allows downloads.
By any chance, do you have that file hosted somewhere else? That would allow me to run the evaluation with your trained model.

Thanks a lot, and I wish you a nice day! :-)

Arabic to Urdu Machine Translation

@lyeoni

Suppose I want to train an Arabic-to-Urdu machine translation model:

  • Is that achievable with this project?
  • Which options should be set for training?
  • Would you suggest another GitHub project?

typo in preprocessing?

Hi,
In the cleaning function in the script nlp-tutorial/news-category-classifcation/preprocessing.py,
line 21 is written as text = re.sub(r'[!]{2,}', '?', text) # multiple ?s -> ?. The comment describes collapsing repeated question marks, but the pattern matches exclamation marks. I think the first argument should contain ?, so the line should be text = re.sub(r'[?]{2,}', '?', text) # multiple ?s -> ?.
Am I correct?
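A small check of the two patterns quoted above may help. The function names and the sample string are made up for illustration; only the two re.sub calls come from the issue.

```python
import re


def line_21_as_quoted(text):
    # Pattern as quoted from line 21: it matches runs of '!' and replaces them with '?'.
    return re.sub(r'[!]{2,}', '?', text)


def line_21_as_proposed(text):
    # Proposed pattern: two or more consecutive '?' collapse into a single '?'.
    return re.sub(r'[?]{2,}', '?', text)


sample = "Really??? No way!!"
print(line_21_as_quoted(sample))    # Really??? No way?
print(line_21_as_proposed(sample))  # Really? No way!!
```

With the quoted pattern, the repeated question marks are left untouched and the exclamation marks are turned into a question mark, which does not match the comment on that line.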

Small improvement for correct indexes in vocabulary dictionaries

Hi, @lyeoni!
You have written great tutorials, and I really appreciate them.
I think one extra line would make a small improvement. Please take a look.
Here, the first key-value items of stoi and itos are filled with the special tokens.
I suggest inserting this line before the loop:
special_tokens = filter(lambda x: x is not None, [self.unk_token, self.bos_token, self.eos_token, self.pad_token])
If self.unk_token is not set but self.bos_token is, the indexes in the dictionaries become wrong, so the None values need to be filtered out first.
Input
vocab = Vocab(body, bos_token='<bos>'); vocab.build(); vocab.stoi;
Wrong Output
'<bos>': 1 ' ': 1, 'hi': 2, 'bear': 3, ...
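The index collision can be reproduced with a minimal sketch. The Vocab class below is a hypothetical reconstruction written for illustration, not the repository's implementation; in particular, the assumption that ordinary tokens start at len(stoi) is a guess that happens to yield the same kind of collision as the output above.

```python
from collections import Counter


class Vocab:
    """Hypothetical stand-in for the tutorial's Vocab class (illustration only)."""

    def __init__(self, tokens, unk_token=None, bos_token=None,
                 eos_token=None, pad_token=None):
        self.tokens = list(tokens)
        self.unk_token, self.bos_token = unk_token, bos_token
        self.eos_token, self.pad_token = eos_token, pad_token
        self.stoi, self.itos = {}, {}

    def build(self, filter_none=True):
        self.stoi, self.itos = {}, {}
        specials = [self.unk_token, self.bos_token, self.eos_token, self.pad_token]
        if filter_none:
            # Proposed fix: drop unset tokens before enumerating them.
            specials = [tok for tok in specials if tok is not None]
        for index, token in enumerate(specials):
            if token is not None:
                self.stoi[token] = index
                self.itos[index] = token
        # Assumption: ordinary tokens continue from the number of specials stored.
        offset = len(self.stoi)
        for index, (token, _) in enumerate(Counter(self.tokens).most_common(), start=offset):
            self.stoi[token] = index
            self.itos[index] = token


vocab = Vocab(["hi", "hi", "bear"], bos_token="<bos>")
vocab.build(filter_none=False)
print(vocab.stoi)  # {'<bos>': 1, 'hi': 1, 'bear': 2}
vocab.build(filter_none=True)
print(vocab.stoi)  # {'<bos>': 0, 'hi': 1, 'bear': 2}
```

Without the filter, the unset unk_token still consumes index 0 during enumeration, so '<bos>' lands on index 1 and collides with the first ordinary token.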

num_samples should be a positive integer value, but got num_samples=0

python train.py --epochs 12 --batch_size 2 --learning_rate .001 --hidden_size 64 --n_layers 1 --dropout_p .1

number of trained word vectors of data/glove.6B.100d.txt: 400000
Traceback (most recent call last):
  File "train.py", line 200, in <module>
    train_loader = DataLoader(dataset=qa_train, batch_size=config.batch_size, shuffle=True, num_workers=4)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 176, in __init__
    sampler = RandomSampler(dataset)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/sampler.py", line 66, in __init__
    "value, but got num_samples={}".format(self.num_samples))
ValueError: num_samples should be a positive integer value, but got num_samples=0

Please let me know what the exact issue is.
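The traceback shows RandomSampler receiving num_samples=0, which suggests that qa_train has length zero by the time the DataLoader is built. A guard like the one below, placed before the DataLoader call, could surface that earlier with a clearer message. The function name require_non_empty is hypothetical, and this sketch does not diagnose why the dataset is empty (a missing or unparsed data file is one possibility).

```python
def require_non_empty(dataset, name="dataset"):
    """Return len(dataset), raising a descriptive error when it is zero."""
    size = len(dataset)
    if size == 0:
        raise ValueError(
            f"{name} is empty (len == 0); check the data path and the "
            "preprocessing output before building a DataLoader"
        )
    return size


# Stand-in lists; in train.py the argument would be the qa_train dataset object.
print(require_non_empty(["sample-1", "sample-2"], name="qa_train"))  # 2
try:
    require_non_empty([], name="qa_train")
except ValueError as err:
    print(err)
```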

neural-machine-translation - nmt ZeroDivisionError: integer division or modulo by zero

Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('D:/nlp-tutorial/neural-machine-translation/nmt/train.py', wdir='D:/nlp-tutorial/neural-machine-translation/nmt')
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "D:/nlp-tutorial/neural-machine-translation/nmt/train.py", line 254, in <module>
    trainiters(pairs, encoder, decoder, n_iters)
  File "D:/nlp-tutorial/neural-machine-translation/nmt/train.py", line 184, in trainiters
    train_pairs += [random.choice(train_pairs) for i in range(n_iters%len(train_pairs))]
ZeroDivisionError: integer division or modulo by zero
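The failing expression is n_iters%len(train_pairs), which raises this error whenever train_pairs is empty. A guarded version of that line might look like the sketch below. The function name extend_to_n_iters is hypothetical, the padding logic only mirrors the quoted line, and the sketch does not explain why no pairs were loaded.

```python
import random


def extend_to_n_iters(train_pairs, n_iters, rng=random):
    """Append n_iters % len(train_pairs) randomly chosen pairs, as in the quoted line."""
    if not train_pairs:
        raise ValueError(
            "train_pairs is empty; check that the corpus was loaded and that "
            "filtering did not drop every pair"
        )
    extra = n_iters % len(train_pairs)
    return list(train_pairs) + [rng.choice(train_pairs) for _ in range(extra)]


# Toy pairs, invented for illustration only.
pairs = [("a", "b"), ("c", "d"), ("e", "f")]
print(len(extend_to_n_iters(pairs, n_iters=5)))  # 5
```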

Question about validation accuracy

Thanks for your great work! I learned a lot. However, I have a question.
I trained the model for 7 epochs, reaching a training accuracy of 95.2 and a test (validation) accuracy of 85.2.
Is that normal? Could the final test (validation) accuracy be higher after more epochs? Thanks!
