lettergram / sentence-classification


235.0 8.0 35.0 107.14 MB

Sentence Classifications with Neural Networks

Home Page: https://austingwalters.com/neural-networks-to-production-from-an-engineer/

License: Other

Python 100.00%

sentence-classification hyperparameter-tuning fasttext cnn rnn neural-network

sentence-classification's Issues

Error establishing a database connection when trying to access austingwalters.com

Hi,
I'm following your guide "Neural Networks to Production, From an Engineer", but I can't access the site. It returns a database error ("Error establishing a database connection").

Hyperparameter tuning code request

Hello,

Thank you for this awesome repo. It gave me a better understanding of the different approaches to text classification.
Would it be possible to get access to the code used for the hyperparameter tuning?

Thank you.

Error : Inference on pre-trained model

Hi,
I was trying to test the pre-trained model (cnn). I successfully loaded your model with the following commands:

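   # Load the JSON architecture and create the model.
   # model_from_json restores the architecture only; weights are loaded separately.
   from keras.models import model_from_json

   with open(model_name + '.json', 'r') as json_file:
       loaded_model_json = json_file.read()
   model = model_from_json(loaded_model_json)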

Then I tried to test it with the following code:


test_comments, test_comments_category = get_custom_test_comments()

x_test, _, y_test, _ = encode_data(test_comments, test_comments_category,
                                   data_split=1.0,
                                   embedding_name=embedding_name,
                                   add_pos_tags_flag=pos_tags_flag)

x_test = sequence.pad_sequences(x_test, maxlen=maxlen)
y_test = keras.utils.to_categorical(y_test, num_classes)

score = model.evaluate(x_test, y_test,
                       batch_size=batch_size, verbose=1)

The last line, the model.evaluate call, resulted in this error:


InvalidArgumentError: indices[13,490] = 22271 is not in [0, 15000)
	 [[Node: embedding_1_9/embedding_lookup = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@dropout_1_9/cond/Switch_1"], _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1_9/embeddings/read, embedding_1_9/Cast, embedding_1_9/embedding_lookup/axis)]]

I think this is caused by the word ids in x_test: max_words is 15000, but the maximum value in x_test is far greater than 15000, so the embedding layer cannot look up words whose id is 15000 or above. As a workaround, I divided all the values of x_test by 100 and converted them to integers, and after that the evaluation ran successfully.

Could you please tell me whether I am doing something wrong, or whether a different word encoding needs to be loaded?
Thanks for the help.
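
The error is consistent with word ids at or beyond the embedding layer's input size. Dividing the ids by 100 makes the lookup succeed, but it also maps each word to an unrelated embedding row. A minimal sketch of one alternative follows, assuming out-of-range ids should fall back to a single reserved id. The helper name clamp_word_ids and the choice of 0 as the fallback id are assumptions for illustration, not part of the repo.

```python
def clamp_word_ids(sequences, max_words, oov_id=0):
    """Replace every word id outside [0, max_words) with oov_id.

    oov_id=0 is an assumed fallback; the repo may reserve a different id.
    """
    return [
        [wid if 0 <= wid < max_words else oov_id for wid in seq]
        for seq in sequences
    ]


# Toy input: two encoded sentences containing ids the embedding cannot hold.
x_test_example = [[12, 22271, 7], [14999, 15000, 3]]
cleaned = clamp_word_ids(x_test_example, max_words=15000)
```

This would be applied to the encoded sequences before padding them. It only hides the symptom if the underlying cause is a word encoding that does not match the one the model was trained with.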

data set with (train, val, test) splits

Hey there, is it possible to get the final dataset with its splits? I intend to train a transformer model for classifying questions vs. statements. I can later create a pull request so that the model can be integrated into this wonderful repo of classifiers that you already have. Thanks!
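
Until the original splits are available, a reproducible split can be made locally. The following is a rough sketch only: the function name split_dataset, the 80/10/10 fractions, and the seed are all assumptions, and nothing here reflects how the repo's own data was divided.

```python
import random


def split_dataset(items, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle items with a fixed seed and cut them into train/val/test."""
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split repeatable
    n_val = int(len(items) * val_frac)
    n_test = int(len(items) * test_frac)
    val = items[:n_val]
    test = items[n_val:n_val + n_test]
    train = items[n_val + n_test:]
    return train, val, test


train, val, test = split_dataset(range(100))
```

Fixing the seed means anyone who runs the same code on the same input list gets the same three subsets, which makes results comparable across models.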

Using pretrained model

Hi, I want to use the pre-trained model to classify my sentences, but I am not that familiar with deep learning.
I have some questions:

Is tensorflow==2.4.0 necessary?
I have some sentences stored as .txt files. Can I use them as inputs? If not, what should the input be when using the pre-trained model?
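
On the second question, text files would first need to be turned into a list of sentence strings. A small sketch, assuming one sentence per line and UTF-8 encoding; read_sentences is a hypothetical helper, not a function from the repo.

```python
from pathlib import Path


def read_sentences(path):
    """Return the non-empty lines of a text file, stripped of whitespace."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

The resulting strings are still raw text. They would have to go through the same word-encoding and padding steps the model was trained with before being passed to it.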

Issue with word_embeddings generated from encode_phrases

When using the default word encoding, a word that is not present in the dictionary keys is given the encoding 0. I looked at your default_word_encoding.json and found that multiple words have the value 0, and so on. If this is intended, how does the model differentiate between new words and seen words that have the same encoded value?
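
One common way to avoid this collision is to reserve id 0 for unknown words and start known-word ids at 1. The sketch below illustrates that idea only. The names build_word_encoding and encode are hypothetical, and this is not a description of how encode_phrases or default_word_encoding.json actually work.

```python
from collections import Counter


def build_word_encoding(sentences, max_words):
    """Map the most frequent words to ids 1..max_words-1, leaving 0 unused."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    most_common = counts.most_common(max_words - 1)
    return {word: idx for idx, (word, _) in enumerate(most_common, start=1)}


def encode(sentence, encoding, unknown_id=0):
    """Encode a sentence; words missing from the encoding get unknown_id."""
    return [encoding.get(word, unknown_id) for word in sentence.lower().split()]


encoding = build_word_encoding(
    ["is this a question", "this is a statement"], max_words=10
)
encoded = encode("this zebra", encoding)
```

With this layout no seen word shares an id with the unknown-word marker, so the model receives a distinct signal for out-of-vocabulary input.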

Counts printed by gen_test_comments include duplicates

The values printed by gen_test_comments are:

-------------------------
command 1672
statement 80993
question 131219
-------------------------

However, the actual values are 1111, 80167, and 131001 for commands, statements, and questions respectively. The values stored in variables such as command_count include duplicate sentences.
While the tagged_comments dict takes care of duplicate sentences, the printed counts still include them.
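
The difference between the two ways of counting can be shown with a small sketch. It assumes the input is a sequence of (sentence, category) pairs; the function names and the toy data are made up for illustration, and the internals of gen_test_comments are not reproduced here.

```python
from collections import Counter


def count_with_duplicates(tagged):
    """Count every (sentence, category) pair, repeats included."""
    return Counter(category for _, category in tagged)


def count_unique(tagged):
    """Count each distinct sentence once, as a dict keyed by sentence would."""
    unique = {}
    for sentence, category in tagged:
        unique[sentence] = category  # a repeated sentence overwrites itself
    return Counter(unique.values())


tagged = [
    ("Close the door.", "command"),
    ("Close the door.", "command"),
    ("Is it open?", "question"),
]
raw_counts = count_with_duplicates(tagged)
unique_counts = count_unique(tagged)
```

Deriving the printed counts from the deduplicated dict, rather than from counters incremented while reading, would make the report match what is actually stored.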
