
realworldnlp's Introduction

Real-World Natural Language Processing

This repository contains example code for the book "Real-World Natural Language Processing."

AllenNLP (2.5.0 or above) is required to run the example code in this repository.

Examples included in this repository:

realworldnlp's People

Contributors

mathcass, mhagiwara


realworldnlp's Issues

Error in 2.8.1

Hi,

While trying

predictor = SentenceClassifierPredictor(model, dataset_reader=reader)

in Sec. 2.8.1, I'm getting the error

AttributeError: 'StanfordSentimentTreeBankDatasetReader' object has no attribute '_tokenizer'

I see you have made some changes in this commit.

Help: examples/mt/mt.py

I'm trying to reproduce examples/mt/mt.py, but I get a CPU/CUDA error:

File "/opt/conda/lib/python3.6/site-packages/allennlp/models/encoder_decoders/simple_seq2seq.py", line 212, in forward
    state = self._encode(source_tokens)
File "/opt/conda/lib/python3.6/site-packages/allennlp/models/encoder_decoders/simple_seq2seq.py", line 268, in _encode
    embedded_input = self._source_embedder(source_tokens)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/allennlp/modules/text_field_embedders/basic_text_field_embedder.py", line 123, in forward
    token_vectors = embedder(*tensors)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/allennlp/modules/token_embedders/embedding.py", line 143, in forward
    sparse=self.sparse)
File "/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py", line 1506, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #3 'index'

I'm running this in a Kaggle environment.
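For what it's worth, this error usually means the embedding weights and the token indices ended up on different devices. A minimal generic PyTorch sketch of the fix (not the book's code; just the device-placement pattern):

```python
import torch
import torch.nn as nn

# The RuntimeError above means the embedding weights and the token indices
# live on different devices. The usual fix is to pick one device and move
# the model AND every input batch to it before calling forward.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

embedding = nn.Embedding(num_embeddings=10, embedding_dim=4).to(device)
token_ids = torch.tensor([[1, 2, 3]]).to(device)  # indices follow the weights

vectors = embedding(token_ids)
print(vectors.shape)  # torch.Size([1, 3, 4])
```

In AllenNLP the same idea applies: if the model is on the GPU, the data loader's batches must be moved there too before the forward pass.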

typos and errata (last updated 2021/05/18)

  • chapter 1, should be text generation

Finally, a third class of text classification is unconditional text generation, where natural language text is generated stochastically from a model. You can train models so that they can generate some random academic papers, Linux source code, or even some poems and play scripts. For example, Andrej Karpathy trained an RNN model from all works of Shakespeare and succeeded in generating pieces of text that look exactly like his work (http://realworldnlpbook.com/ch1.html#karpathy15):

  • 4.2.3: typo "swtich" (should be "switch") in the pseudocode
def update_gru(state, word):
    new_state = update_hidden(state, word)
 
    switch = get_switch(state, word)
 
    state = swtich * new_state + (1 – switch) * state
 
    return state
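For reference, here is the corrected update as a runnable sketch. The stand-in implementations of update_hidden and get_switch are made up for illustration (in the book these would be learned neural components); only the last line's structure matters:

```python
import numpy as np

def update_hidden(state, word):
    # toy stand-in for a learned transformation
    return np.tanh(state + word)

def get_switch(state, word):
    # toy stand-in for a learned sigmoid gate, with values in (0, 1)
    return 1.0 / (1.0 + np.exp(-(state + word)))

def update_gru(state, word):
    new_state = update_hidden(state, word)
    switch = get_switch(state, word)
    # interpolate between the old and new state, gated by switch
    # (with the typo fixed: "switch", not "swtich")
    return switch * new_state + (1 - switch) * state

state = np.zeros(3)
for word in [np.array([1.0, -1.0, 0.5])] * 2:
    state = update_gru(state, word)
print(state.shape)  # (3,)
```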
  • chapter 5: micro and macro should be switched.

original text:

If these metrics are computed while ignoring entity types, it’s called a micro average. For example, the micro-averaged precision is the total number of true positives of all types divided by the total number of retrieved named entities regardless of the type. On the other hand, if these metrics are computed per entity type and then get averaged, it’s called a macro average. For example, if the precision for PER and GPE is 80% and 90%, respectively, its macro average is 85%. What AllenNLP computes in the following is the micro average.
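To make the erratum concrete, here is a toy calculation using the corrected terminology. The per-type counts are hypothetical, chosen to reproduce the 80%/90% precisions from the quoted example:

```python
# (true positives, retrieved entities) per type: 8/10 = 80%, 27/30 = 90%
counts = {"PER": (8, 10), "GPE": (27, 30)}

# Macro average: compute precision per type, THEN average those precisions.
macro = sum(tp / ret for tp, ret in counts.values()) / len(counts)

# Micro average: pool the raw counts across all types, THEN compute one precision.
micro = sum(tp for tp, _ in counts.values()) / sum(ret for _, ret in counts.values())

print(round(macro, 4))  # 0.85
print(round(micro, 4))  # 0.875
```

Note the two values differ whenever the types have different numbers of retrieved entities, which is why swapping the terms matters.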

  • 5.6.1

The language detector in a previous chapter already used an RNN with characters as input.

original text:

In the first half of this section, we are going to build an English language model and train it using a generic English corpus. Before we start, we note that the RNN language model we build in this chapter operates on characters, not on words or tokens. All the RNN models we’ve seen so far operate on words, which means the input to the RNN was always sequences of words. On the other hand, the RNN we are going to use in this section takes sequences of characters as the input.

Using GPU

Great Tutorial!
It would be very cool if you could describe how to use a GPU to run it faster.

a lot of code is broken in allennlp 2.0

I'm now reading the book and have noticed a lot of bugs related to allennlp 2.0. Would the author consider upgrading the code to allennlp 2.0, to make it live up to the title "real world NLP"?

It's a pity, because this is, I think, the only book that uses allennlp to tackle a range of general NLP tasks, and I like it very much.

Some examples:

In the sst_classifier.ipynb one can note:

vocab = Vocabulary.from_instances(train_dataset + dev_dataset,
                                  min_count={'tokens': 3})

gives

unsupported operand type(s) for +: 'generator' and 'generator'

(easily fixable using list(reader.read('train.txt')))

The following two lines

train_dataset.index_with(vocab)
dev_dataset.index_with(vocab)

give

'generator' object has no attribute 'index_with'
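Both failures share the same root cause: in AllenNLP 2.x, reader.read returns a generator, and generators support neither + nor list-style methods. A pure-Python illustration (no AllenNLP needed; read below is a stand-in for DatasetReader.read):

```python
def read(path):
    # stand-in for an AllenNLP 2.x DatasetReader.read, which yields lazily
    for i in range(3):
        yield (path, i)

train, dev = read("train.txt"), read("dev.txt")
try:
    combined = train + dev  # reproduces the first error above
except TypeError as err:
    print(err)  # unsupported operand type(s) for +: 'generator' and 'generator'

# Fix: materialize the generators into lists before combining or indexing.
train, dev = list(read("train.txt")), list(read("dev.txt"))
combined = train + dev
print(len(combined))  # 6
```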

And, not specific to allennlp 2.0,

predictor = SentenceClassifierPredictor(model, dataset_reader=reader)

gives

AttributeError: 'StanfordSentimentTreeBankDatasetReader' object has no attribute '_tokenizer'

Positive label for F1 measure is not configured correctly

I reviewed the code: examples/sentiment/sst_classifier.py, and found a bug.

    self.f1_measure = F1Measure(4)

I think this code is intended to measure precision/recall/F1 for the label '4', which is the most positive sentiment. However, the integer 4 here is interpreted as an index into the label array. The label '4' must first be converted to its index using the label mapping stored in vocab.
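A pure-Python sketch of why the hard-coded 4 is wrong. The mapping below is hypothetical; in AllenNLP the lookup would be along the lines of vocab.get_token_index('4', namespace='labels'):

```python
# Labels are the *strings* '0'..'4', but F1Measure takes the label's index in
# the vocabulary, which depends on the order labels were first encountered,
# not on the label's string value.
label_to_index = {'3': 0, '4': 1, '2': 2, '1': 3, '0': 4}  # hypothetical order

wrong = 4                    # hard-coded: selects whatever sits at index 4 ('0' here)
right = label_to_index['4']  # look the label up in the vocabulary instead

print(wrong, right)  # 4 1
```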

Module Not Found Error for Machine Translation

ModuleNotFoundError: No module named 'allennlp.data.dataset_readers.seq2seq'
I'm trying to run the code examples/mt/mt.py with allennlp==1.0.0 and got this error. No code changes; this is a direct clone of the repo.
