jarfo / kchar
Character-Aware Neural Language Models. A keras-based implementation
When using this code on large datasets, I get a tcmalloc large-allocation warning and the process is killed. What could be a solution to this? Can we process the data in several batches?
tcmalloc: large alloc 26052075520 bytes == 0x1fbca000 @ 0x7f4a26405f21 0x7f4a23f93ae5 0x7f4a23ff68d3 0x7f4a23ff8816 0x7f4a24090b28 0x4c4b0b 0x54f3c4 0x551ee0 0x54efc1 0x54f24d 0x553aaf 0x54e4c8 0x5582c2 0x459c11 0x45969e 0x4e0b5b 0x4da8d7 0x459893 0x54f117 0x553aaf 0x54e4c8 0x54f4f6 0x553aaf 0x54efc1 0x54ff73 0x42b3c9 0x42b5b5 0x44182b 0x421f64 0x7f4a2524d1c1 0x42201a
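One possible way around a single very large allocation, offered as a sketch and not as a fix tested against this repository, is to keep the array on disk and read it in slices. The example assumes the data has been saved as a plain (non-object) `.npy` array; `iter_batches` and the file name are hypothetical and not part of kchar.

```python
import os
import tempfile

import numpy as np


def iter_batches(npy_path, batch_size):
    """Yield consecutive slices of a .npy array without loading it all into RAM."""
    # mmap_mode='r' maps the file read-only; only the slices touched are read.
    data = np.load(npy_path, mmap_mode='r')
    for start in range(0, len(data), batch_size):
        # np.array(...) copies just this slice into memory.
        yield np.array(data[start:start + batch_size])


# Small demonstration with a temporary file standing in for a large tensor.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'demo_tokens.npy')  # hypothetical file name
np.save(demo_path, np.arange(10, dtype=np.int32))

batches = list(iter_batches(demo_path, batch_size=4))
print([b.tolist() for b in batches])
```

Memory mapping only helps when the array does not contain Python objects, so object arrays would need to be converted to a numeric dtype first.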
For a quick test, I first trained a small model by running:
python train_lstm3.py --rnn_size 50 --highway_layer 1 --feature_maps 5 10 15 20 25 30 --kernels 1 2 3 4 5 6 --checkpoint_dir LSTM4 --savefile char-small --max_epochs 1
It gives the results:
Epoch 1/1
1327/1327 [==============================] - 2175s - loss: 6.6535
Epoch 1/1. Validation loss: 708.149629583
Perplexity on test set: 672.399581347
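For reference, perplexity is conventionally the exponential of the average per-token cross-entropy (in nats). The sketch below shows only that relationship; `perplexity` is an illustrative helper, and how kchar itself aggregates the figures printed above is not shown in this thread.

```python
import math


def perplexity(token_nlls):
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


# Sanity check: a model that is uniform over V words has perplexity V.
vocab_size = 10000
uniform_nll = [math.log(vocab_size)] * 5
print(perplexity(uniform_nll))
```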
With the trained model, I then evaluated on the test set again by running:
python train_lstm3.py --rnn_size 50 --highway_layer 1 --feature_maps 5 10 15 20 25 30 --kernels 1 2 3 4 5 6 --checkpoint_dir LSTM4 --savefile char-small --max_epochs 1 --skip_train
But it gives the error message:
C:\Users\jingweimo\Anaconda2\lib\site-packages\keras\legacy\layers.py:654: UserWarning: The `Highway` layer is deprecated and will be removed after 06/2017.
warnings.warn('The `Highway` layer is deprecated '
Traceback (most recent call last):
File "train.py", line 114, in <module>
main(params)
File "train.py", line 46, in main
model = load_model('{}/{}.json'.format(opt.checkpoint_dir, opt.savefile))
File "D:\LSTM packages\kchar-master\kchar-master\model\LSTMCNN.py", line 190, in load_model
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD)
File "C:\Users\jingweimo\Anaconda2\lib\site-packages\keras\engine\training.py", line 720, in compile
self.optimizer = optimizers.get(optimizer)
File "C:\Users\jingweimo\Anaconda2\lib\site-packages\keras\optimizers.py", line 710, in get
identifier)
ValueError: ('Could not interpret optimizer identifier:', <class 'keras.optimizers.SGD'>)
It seems there is something wrong with load_model in LSTMCNN.py. I think you may need to change
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD)
to
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD())
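The distinction behind this fix is that `SGD` is the class while `SGD()` is an instance, and the traceback shows `optimizers.get` rejecting the class object. A small, framework-free sketch of a guard that accepts either form follows; `ensure_instance` and `DummySGD` are hypothetical names used for illustration, not Keras APIs.

```python
import inspect


def ensure_instance(optimizer):
    """Return an instance: call the class with defaults if a class was passed."""
    if inspect.isclass(optimizer):
        return optimizer()
    return optimizer


class DummySGD(object):
    """Stand-in for an optimizer class; not the Keras implementation."""

    def __init__(self, lr=0.01):
        self.lr = lr


from_class = ensure_instance(DummySGD)             # class -> instance
from_instance = ensure_instance(DummySGD(lr=0.5))  # instance passes through
print(type(from_class).__name__, from_class.lr, from_instance.lr)
```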
I'm running this code with the tensorflow backend, and encounter the following errors:
[lsong10@bh25fen sub.kchar]$ more train.err
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Traceback (most recent call last):
File "train.py", line 99, in <module>
main(params)
File "train.py", line 24, in main
model = LSTMCNN(opt)
File "/gpfs/fs2/scratch/lsong10/exp.full_BoW/sub.kchar/model/LSTMCNN.py", line 95, in LSTMCNN
cnn = CNN(opt.seq_length, opt.max_word_l, opt.char_vec_size, opt.feature_maps, opt.kernels, chars_embedding)
File "/gpfs/fs2/scratch/lsong10/exp.full_BoW/sub.kchar/model/LSTMCNN.py", line 63, in CNN
conv = Convolution2D(feature_map, 1, kernel, activation='tanh', dim_ordering='tf')(x)
File "/software/python/2.7.12/lib/python2.7/site-packages/keras/engine/topology.py", line 572, in __call__
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/software/python/2.7.12/lib/python2.7/site-packages/keras/engine/topology.py", line 635, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/software/python/2.7.12/lib/python2.7/site-packages/keras/engine/topology.py", line 166, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/software/python/2.7.12/lib/python2.7/site-packages/keras/layers/convolutional.py", line 475, in call
filter_shape=self.W_shape)
File "/software/python/2.7.12/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2691, in conv2d
x = tf.nn.conv2d(x, kernel, strides, padding=padding)
File "/software/python/2.7.12/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 396, in conv2d
data_format=data_format, name=name)
File "/software/python/2.7.12/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 585, in apply_op
param_name=input_name)
File "/software/python/2.7.12/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 61, in _SatisfiesTypeConstraint
", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: Value passed to parameter 'input' has DataType int32 not in list of allowed values: float16, float32, float64
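The message says the convolution received integer data. One reading, which is an assumption and not confirmed in this thread, is that the raw character indices reached `conv2d` without being turned into float embedding vectors. The NumPy sketch below only illustrates what an embedding lookup does to shape and dtype; all sizes are made up.

```python
import numpy as np

rng = np.random.RandomState(0)

# Illustrative sizes (assumptions, not values read from the repository).
batch_size, seq_length, max_word_l = 2, 3, 5
char_vocab_size, char_vec_size = 50, 15

# Integer character indices, as a data loader would typically produce them.
chars = rng.randint(0, char_vocab_size,
                    size=(batch_size, seq_length, max_word_l)).astype(np.int32)

# An embedding is a float lookup table indexed by those integers.
embedding_table = rng.randn(char_vocab_size, char_vec_size).astype(np.float32)
embedded = embedding_table[chars]

print(chars.dtype, chars.shape)
print(embedded.dtype, embedded.shape)
```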
I ran into a problem when trying to use a trained model.
What I tried to do:
model = LSTMCNN(opts)
model.load_weights('words-large.h5')
...
input = some_seed_input
for ...:
    x = {'word': np.array([input])}
    prediction = model.predict(x)
    input = input[1:len(input)]
Something like this predicts several words, and then the predictions become something like "the the the be of the".
What am I doing wrong?
Could you please show a basic example of using a trained model to generate text?
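Two things in the loop above may contribute to the repetition, though neither is confirmed here: the window is shortened on each step without the predicted word being appended, and always taking the most likely word tends to cycle through frequent words. The sketch below shows a sliding window with sampling. `predict_next` is a hypothetical wrapper assumed to return one probability vector over the vocabulary, and `toy_predict` is a stand-in for a trained model.

```python
import numpy as np


def sample_index(probs, temperature=1.0, rng=np.random):
    """Draw an index from a probability vector, re-scaled by temperature."""
    logits = np.log(np.asarray(probs, dtype=np.float64) + 1e-12) / temperature
    scaled = np.exp(logits - logits.max())
    scaled /= scaled.sum()
    return int(rng.choice(len(scaled), p=scaled))


def generate(predict_next, seed, n_words, temperature=1.0, rng=np.random):
    """Slide a fixed-length window: drop the oldest index, append the sample."""
    window = list(seed)
    generated = []
    for _ in range(n_words):
        probs = predict_next(np.array([window]))  # hypothetical model wrapper
        nxt = sample_index(probs, temperature, rng)
        generated.append(nxt)
        window = window[1:] + [nxt]
    return generated


def toy_predict(batch):
    """Stand-in for a trained model: favours (last index + 1) mod 5."""
    probs = np.full(5, 0.05)
    probs[(batch[0][-1] + 1) % 5] = 0.8
    return probs


out = generate(toy_predict, seed=[0, 1, 2], n_words=6,
               rng=np.random.RandomState(0))
print(out)
```

Lower temperatures make the output more conservative and higher ones more varied. For the character-aware model, the wrapper would also need to build the character input for each sampled word.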
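from keras.layers import Concatenate, Conv2D, MaxPooling2D, Reshape

def CNN(seq_length, length, feature_maps, kernels, x):
    concat_input = []
    for feature_map, kernel in zip(feature_maps, kernels):
        # Narrow convolution leaves length - kernel + 1 positions per word.
        reduced_l = length - kernel + 1
        # The (1, kernel) size below is the part this question is about.
        conv = Conv2D(feature_map, (1, kernel), activation='tanh', data_format="channels_last")(x)
        # Max over all remaining positions gives one value per feature map.
        maxp = MaxPooling2D((1, reduced_l), data_format="channels_last")(conv)
        concat_input.append(maxp)
    x = Concatenate()(concat_input)
    x = Reshape((seq_length, sum(feature_maps)))(x)
    return x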
From the paper's description, I thought the kernel size should be d*w.
That would mean the output of the embedding layer should have shape (20, 35, 31, 15, 1).
Thank you in advance for your responses.
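On the kernel-size question: with `data_format="channels_last"`, the last axis of the input is treated as channels, and a 2D convolution sums over all input channels. If the character embedding dimension d sits on that axis (an assumption about the shape of `x`, which is not shown in the snippet), a `(1, kernel)` window already covers d × w values per filter. A NumPy sketch of that arithmetic, with made-up sizes:

```python
import numpy as np

rng = np.random.RandomState(0)

# Illustrative sizes: d = char embedding size, w = kernel width.
seq_length, max_word_l, d, w, n_filters = 4, 7, 15, 3, 8

x = rng.randn(seq_length, max_word_l, d)   # one sentence, channels last
weights = rng.randn(w, d, n_filters)       # each filter covers w chars x d dims

reduced_l = max_word_l - w + 1             # positions of a narrow convolution
conv = np.empty((seq_length, reduced_l, n_filters))
for pos in range(reduced_l):
    window = x[:, pos:pos + w, :]          # (seq_length, w, d)
    # Contract over both the w and d axes: a full d x w filter per output value.
    conv[:, pos, :] = np.tensordot(window, weights, axes=([1, 2], [0, 1]))

features = np.tanh(conv).max(axis=1)       # max over time, one vector per word
print(conv.shape, features.shape)
```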
Is this implementation currently working for you, @jarfo? I currently get a large number of errors when trying to run it.
@jarfo I couldn't understand why the shape of the char layer is (opt.batch_size, opt.seq_length, opt.max_word_l), e.g. (20, 35, 21). Why not (batch_size, seq_length), e.g. (20, 35)?
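A likely reason for the third axis is that a character-level input represents every word as a fixed-length vector of character indices, so each of the `seq_length` positions carries `max_word_l` values instead of a single word index. The sketch below is a simplified illustration; `words_to_char_tensor` is hypothetical, and any start/end-of-word markers the repository may add are left out.

```python
import numpy as np


def words_to_char_tensor(sentences, char2idx, max_word_l, pad_idx=0):
    """Encode sentences as (batch, seq_length, max_word_l) character indices."""
    batch = np.full((len(sentences), len(sentences[0]), max_word_l),
                    pad_idx, dtype=np.int32)
    for i, sentence in enumerate(sentences):
        for j, word in enumerate(sentence):
            for k, ch in enumerate(word[:max_word_l]):
                batch[i, j, k] = char2idx[ch]
    return batch


# Toy vocabulary; index 0 is reserved for padding in this sketch.
char2idx = {ch: i + 1 for i, ch in enumerate('abcdefghijklmnopqrstuvwxyz')}
sentences = [['the', 'cat', 'sat'], ['a', 'dog', 'barked']]

chars = words_to_char_tensor(sentences, char2idx, max_word_l=6)
print(chars.shape)
print(chars[1, 2])
```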
I ran train.py for a quick check of the results, but got this error message:
one-time setup: preprocessing input train/valid/test files in dir: data/ptb
Processing text into tensors...
Traceback (most recent call last):
File "", line 58, in
main(params)
File "", line 2, in main
loader = BatchLoaderUnk(opt.tokens, opt.data_dir, opt.batch_size, opt.seq_length, opt.max_word_l, opt.n_words, opt.n_chars)
File "D:\Research\Manuscript\Simplified LSTM\LSTM packages\kchar-master\kchar-master\util\BatchLoaderUnk.py", line 34, in __init__
self.text_to_tensor(tokens, input_files, vocab_file, tensor_file, char_file, max_word_l)
File "D:\Research\Manuscript\Simplified LSTM\LSTM packages\kchar-master\kchar-master\util\BatchLoaderUnk.py", line 148, in text_to_tensor
f = codecs.open(input_files[split], 'r', encoding)
File "C:\Users\Yuzhen Lu\Anaconda2\lib\codecs.py", line 896, in open
file = __builtin__.open(filename, mode, buffering)
IOError: [Errno 2] No such file or directory: 'data/ptb\train.txt'
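The traceback ends with the loader failing to open `train.txt` under `data/ptb`, so checking that the expected files are present before preprocessing gives a clearer message. The helper below is a hypothetical sketch; the three file names follow the train/valid/test wording in the log and are an assumption about what the loader expects.

```python
import os
import tempfile


def missing_splits(data_dir, names=('train.txt', 'valid.txt', 'test.txt')):
    """Return the expected split files that are absent from data_dir."""
    return [n for n in names if not os.path.isfile(os.path.join(data_dir, n))]


# Demonstration in a temporary directory containing only train.txt.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'train.txt'), 'w') as f:
    f.write('hello world\n')

print(missing_splits(demo_dir))
```

A relative `data_dir` is resolved against the current working directory, which in an IDE may differ from the project folder.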
I use Python 2.7 (Spyder IDE) with Keras 2.0.5 and Theano 0.10.
I tried evaluate.py with the character-level model using the following command:
python evaluate.py --model cv/char-large --vocabulary data/ptb/vocab.npz --init init.npy --text data/ptb/test.txt --calc
I got a shape error, which I guess is a problem with the input, but I don't know how to provide the input correctly to fix it.
Traceback (most recent call last):
File "evaluate.py", line 263, in <module>
main(args.model, args.vocabulary, args.init, args.text, args.calc)
File "evaluate.py", line 174, in main
lprob, nwords, output = ev.logprob(line)
File "evaluate.py", line 107, in logprob
out = self.model.predict(np.array([x['word'][wrd]]), batch_size=1)
File "/usr/local/lib/python3.5/dist-packages/Keras-2.0.6-py3.5.egg/keras/engine/training.py", line 1499, in predict
File "/usr/local/lib/python3.5/dist-packages/Keras-2.0.6-py3.5.egg/keras/engine/training.py", line 128, in _standardize_input_data
ValueError: Error when checking : expected chars to have 3 dimensions, but got array with shape (1, 1)
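The error states that the `chars` input expects three dimensions while the array passed has shape (1, 1). Going by the model summaries elsewhere on this page, that input is laid out as (batch, sequence, characters per word), so a single word needs two leading axes added. A minimal NumPy illustration with an invented index vector:

```python
import numpy as np

max_word_l = 6  # illustrative; the real value comes from the data loader

# Character indices for one word, already padded to max_word_l.
word_chars = np.array([20, 8, 5, 0, 0, 0], dtype=np.int32)

# Add a batch axis and a time axis: (max_word_l,) -> (1, 1, max_word_l).
model_input = word_chars[np.newaxis, np.newaxis, :]
print(model_input.shape)
```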
I have trained a system on some sentences (say 10,000) and I want to train it on 200 more sentences using incremental training. How can I achieve this with this model, so that the system is trained on a total of 10,200 sentences?
Thanks in advance.
For the word-level model, training works:
python train.py --savefile word-large --highway_layers 0 --use_chars 0 --use_words 1
But for the char-level model:
python train.py --savefile char-large
the following error occurs:
Traceback (most recent call last):
File "train.py", line 115, in <module>
main(params)
File "train.py", line 37, in main
model = LSTMCNN(opt)
File "/home/prasad/tensorflow/kchar3/model/LSTMCNN.py", line 97, in LSTMCNN
cnn = CNN(opt.seq_length, opt.max_word_l, opt.char_vec_size, opt.feature_maps, opt.kernels, chars_embedding)
File "/home/prasad/tensorflow/kchar3/model/LSTMCNN.py", line 65, in CNN
conv = Convolution2D(feature_map, 1, kernel, activation='tanh', dim_ordering='tf')(x)
File "/usr/lib64/python2.7/site-packages/keras/engine/topology.py", line 572, in __call__
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/usr/lib64/python2.7/site-packages/keras/engine/topology.py", line 635, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/usr/lib64/python2.7/site-packages/keras/engine/topology.py", line 166, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/usr/lib64/python2.7/site-packages/keras/layers/convolutional.py", line 475, in call
filter_shape=self.W_shape)
File "/usr/lib64/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2691, in conv2d
x = tf.nn.conv2d(x, kernel, strides, padding=padding)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 396, in conv2d
data_format=data_format, name=name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 585, in apply_op
param_name=input_name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 61, in _SatisfiesTypeConstraint
", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: Value passed to parameter 'input' has DataType int32 not in list of allowed values: float16, float32, float64
Can anyone please suggest a solution for this issue? Also, what is the difference between the two training modes, and which one gives better predictions?
Hi,
I'm trying to train the model like this
python train.py --save_every 1 --data_dir data/test --n_chars 10000 --batch_size 80
Here's the output
creating an LSTM-CNN with 2 layers
chars (InputLayer)                  (80, 35, 65)        0
timedistributed_1 (TimeDistribute   (80, 35, 65, 15)    88200     chars[0][0]
convolution2d_1 (Convolution2D)     (80, 35, 65, 50)    800       timedistributed_1[0][0]
convolution2d_2 (Convolution2D)     (80, 35, 64, 100)   3100      timedistributed_1[0][0]
convolution2d_3 (Convolution2D)     (80, 35, 63, 150)   6900      timedistributed_1[0][0]
convolution2d_4 (Convolution2D)     (80, 35, 62, 200)   12200     timedistributed_1[0][0]
convolution2d_5 (Convolution2D)     (80, 35, 61, 200)   15200     timedistributed_1[0][0]
convolution2d_6 (Convolution2D)     (80, 35, 60, 200)   18200     timedistributed_1[0][0]
convolution2d_7 (Convolution2D)     (80, 35, 59, 200)   21200     timedistributed_1[0][0]
maxpooling2d_1 (MaxPooling2D)       (80, 35, 1, 50)     0         convolution2d_1[0][0]
maxpooling2d_2 (MaxPooling2D)       (80, 35, 1, 100)    0         convolution2d_2[0][0]
maxpooling2d_3 (MaxPooling2D)       (80, 35, 1, 150)    0         convolution2d_3[0][0]
maxpooling2d_4 (MaxPooling2D)       (80, 35, 1, 200)    0         convolution2d_4[0][0]
maxpooling2d_5 (MaxPooling2D)       (80, 35, 1, 200)    0         convolution2d_5[0][0]
maxpooling2d_6 (MaxPooling2D)       (80, 35, 1, 200)    0         convolution2d_6[0][0]
maxpooling2d_7 (MaxPooling2D)       (80, 35, 1, 200)    0         convolution2d_7[0][0]
merge_1 (Merge)                     (80, 35, 1, 1100)   0         maxpooling2d_1[0][0]
                                                                  maxpooling2d_2[0][0]
                                                                  maxpooling2d_3[0][0]
                                                                  maxpooling2d_4[0][0]
                                                                  maxpooling2d_5[0][0]
                                                                  maxpooling2d_6[0][0]
                                                                  maxpooling2d_7[0][0]
reshape_1 (Reshape)                 (80, 35, 1100)      0         merge_1[0][0]
timedistributed_2 (TimeDistribute   (80, 35, 1100)      2422200   reshape_1[0][0]
timedistributed_3 (TimeDistribute   (80, 35, 1100)      2422200   timedistributed_2[0][0]
lstm_1 (LSTM)                       (80, 35, 650)       4552600   timedistributed_3[0][0]
dropout_1 (Dropout)                 (80, 35, 650)       0         lstm_1[0][0]
lstm_2 (LSTM)                       (80, 35, 650)       3382600   dropout_1[0][0]
dropout_2 (Dropout)                 (80, 35, 650)       0         lstm_2[0][0]
Total params: 32475400
It outputs lsm_char_large_epoch20.h5, which I renamed to char-large.h5 and evaluated with:
python evaluate.py --model cv/char-large --vocabulary data/test/vocab.npz --text data/test/test.txt --calc
Word vocab size: 30000, Char vocab size: 5880
chars (InputLayer)                  (1, 1, 65)          0
timedistributed_1 (TimeDistribute   (1, 1, 65, 15)      88200     chars[0][0]
convolution2d_1 (Convolution2D)     (1, 1, 65, 50)      800       timedistributed_1[0][0]
convolution2d_2 (Convolution2D)     (1, 1, 64, 100)     3100      timedistributed_1[0][0]
convolution2d_3 (Convolution2D)     (1, 1, 63, 150)     6900      timedistributed_1[0][0]
convolution2d_4 (Convolution2D)     (1, 1, 62, 200)     12200     timedistributed_1[0][0]
convolution2d_5 (Convolution2D)     (1, 1, 61, 200)     15200     timedistributed_1[0][0]
convolution2d_6 (Convolution2D)     (1, 1, 60, 200)     18200     timedistributed_1[0][0]
convolution2d_7 (Convolution2D)     (1, 1, 59, 200)     21200     timedistributed_1[0][0]
maxpooling2d_1 (MaxPooling2D)       (1, 1, 1, 50)       0         convolution2d_1[0][0]
maxpooling2d_2 (MaxPooling2D)       (1, 1, 1, 100)      0         convolution2d_2[0][0]
maxpooling2d_3 (MaxPooling2D)       (1, 1, 1, 150)      0         convolution2d_3[0][0]
maxpooling2d_4 (MaxPooling2D)       (1, 1, 1, 200)      0         convolution2d_4[0][0]
maxpooling2d_5 (MaxPooling2D)       (1, 1, 1, 200)      0         convolution2d_5[0][0]
maxpooling2d_6 (MaxPooling2D)       (1, 1, 1, 200)      0         convolution2d_6[0][0]
maxpooling2d_7 (MaxPooling2D)       (1, 1, 1, 200)      0         convolution2d_7[0][0]
merge_1 (Merge)                     (1, 1, 1, 1100)     0         maxpooling2d_1[0][0]
                                                                  maxpooling2d_2[0][0]
                                                                  maxpooling2d_3[0][0]
                                                                  maxpooling2d_4[0][0]
                                                                  maxpooling2d_5[0][0]
                                                                  maxpooling2d_6[0][0]
                                                                  maxpooling2d_7[0][0]
reshape_1 (Reshape)                 (1, 1, 1100)        0         merge_1[0][0]
timedistributed_2 (TimeDistribute   (1, 1, 1100)        2422200   reshape_1[0][0]
timedistributed_3 (TimeDistribute   (1, 1, 1100)        2422200   timedistributed_2[0][0]
lstm_1 (LSTM)                       (1, 1, 650)         4552600   timedistributed_3[0][0]
dropout_1 (Dropout)                 (1, 1, 650)         0         lstm_1[0][0]
lstm_2 (LSTM)                       (1, 1, 650)         3382600   dropout_1[0][0]
dropout_2 (Dropout)                 (1, 1, 650)         0         lstm_2[0][0]
Total params: 32475400
It then raises an error like this:
Error when checking model target: expected timedistributed_4 to have shape (1, 1, 30000) but got array with shape (4, 1, 1)
Is there any problem with the model it outputs?
Hi,
I was running into some issues running your code on keras with a tensorflow backend.
ValueError: The shape for while/Merge_2:0 is not an invariant for the loop. It enters the loop with shape (35, 21), but has shape (20, 21, 15) after one iteration. Provide shape invariants using either the `shape_invariants` argument of tf.while_loop or set_shape() on the loop variables.
Is this something you are aware of?
Thanks so much for your work on this by the way!
When running python train.py --savefile char-large, I encountered the error below.
After first pass of data, max word length is: 21
Token count: train 929589, val 73760, test 82430
saving data/ptb/data_0.npy
saving data/ptb/data_char_0.npy
saving data/ptb/data_1.npy
saving data/ptb/data_char_1.npy
saving data/ptb/data_2.npy
saving data/ptb/data_char_2.npy
saving data/ptb/vocab.npz
loading data files...
Traceback (most recent call last):
File "train.py", line 101, in <module>
main(params)
File "train.py", line 15, in main
loader = BatchLoaderUnk(opt.tokens, opt.data_dir, opt.batch_size, opt.seq_length, opt.max_word_l, opt.n_words, opt.n_chars)
File "/home/jiayi/kchar/util/BatchLoaderUnk.py", line 45, in __init__
self.idx2word, self.word2idx, self.idx2char, self.char2idx = vocab_unpack(vocab_mapping)
File "/home/jiayi/kchar/util/BatchLoaderUnk.py", line 19, in vocab_unpack
return vocab['idx2word'], vocab['word2idx'], vocab['idx2char'], vocab['char2idx']
File "/home/jiayi/.local/lib/python3.6/site-packages/numpy/lib/npyio.py", line 255, in __getitem__
pickle_kwargs=self.pickle_kwargs)
File "/home/jiayi/.local/lib/python3.6/site-packages/numpy/lib/format.py", line 727, in read_array
raise ValueError("Object arrays cannot be loaded when "
ValueError: Object arrays cannot be loaded when allow_pickle=False
I was able to resolve it by passing allow_pickle=True to np.load(). The following is for lines 41-44 of the BatchLoaderUnk.py file. Hope this helps others facing the same error.
for split in range(3):
    all_data.append(np.load("{}_{}.npy".format(tensor_file, split), allow_pickle=True)) # train, valid, test tensors
    all_data_char.append(np.load("{}_{}.npy".format(char_file, split), allow_pickle=True)) # train, valid, test character indices
vocab_mapping = np.load(vocab_file, allow_pickle=True)
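For anyone wanting to see the behaviour in isolation, the following self-contained sketch saves a small object array and loads it back. The file name and the toy vocabulary are invented for the example. Because unpickling can execute arbitrary code, `allow_pickle=True` is best reserved for files you created yourself or otherwise trust.

```python
import os
import tempfile

import numpy as np

# A vocabulary stored as Python objects, similar in spirit to vocab.npz.
path = os.path.join(tempfile.mkdtemp(), 'vocab_demo.npz')  # hypothetical file
word2idx = {'the': 0, 'cat': 1}
np.savez(path, word2idx=np.array(word2idx, dtype=object))

# Recent NumPy releases refuse to unpickle object arrays by default.
try:
    np.load(path)['word2idx']
    default_load_failed = False
except ValueError:
    default_load_failed = True

# Opting in works, but should be reserved for files you created or trust.
with np.load(path, allow_pickle=True) as archive:
    restored = archive['word2idx'].item()

print(default_load_failed, restored)
```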
This may be a naive question, but it's not very clear what the task at hand is.
Looking at the training data, it is a series of sentences. So is the task to predict the (k+1)th word after seeing words 1...k?
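In language modeling, which is what the repository title describes, the target at each position is the next word. A sketch of how such input/target pairs are commonly built from a token stream follows; `make_lm_pairs` is an illustrative helper and not the repository's loader.

```python
import numpy as np


def make_lm_pairs(token_ids, seq_length):
    """Split a token stream into inputs and next-word targets of equal shape."""
    n_seqs = (len(token_ids) - 1) // seq_length
    usable = n_seqs * seq_length
    x = np.array(token_ids[:usable]).reshape(n_seqs, seq_length)
    # The target at each position is simply the following token.
    y = np.array(token_ids[1:usable + 1]).reshape(n_seqs, seq_length)
    return x, y


tokens = list(range(10))  # stand-in for the word indices of a corpus
x, y = make_lm_pairs(tokens, seq_length=3)
print(x.tolist())
print(y.tolist())
```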