jarfo / kchar

Character-Aware Neural Language Models. A keras-based implementation

Python 100.00%

kchar's Issues

tcmalloc alert

When I run this code on large datasets, I get a tcmalloc alert and the process is killed. What could be a solution? Can the data be processed in several batches?

tcmalloc: large alloc 26052075520 bytes == 0x1fbca000 @ 0x7f4a26405f21 0x7f4a23f93ae5 0x7f4a23ff68d3 0x7f4a23ff8816 0x7f4a24090b28 0x4c4b0b 0x54f3c4 0x551ee0 0x54efc1 0x54f24d 0x553aaf 0x54e4c8 0x5582c2 0x459c11 0x45969e 0x4e0b5b 0x4da8d7 0x459893 0x54f117 0x553aaf 0x54e4c8 0x54f4f6 0x553aaf 0x54efc1 0x54ff73 0x42b3c9 0x42b5b5 0x44182b 0x421f64 0x7f4a2524d1c1 0x42201a
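
On the batching question: the log reports a single allocation of 26052075520 bytes, roughly 24 GiB. The sketch below is not code from this repository; it only illustrates the general idea of streaming items in fixed-size batches so the whole corpus never has to sit in memory at once. The helper name `iter_batches` and the toy corpus are assumptions made for illustration.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield successive lists of at most batch_size items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Size of the allocation reported by tcmalloc, converted to GiB.
alloc_gib = 26052075520 / 2**30

# Hypothetical usage: stream a corpus five lines at a time instead of reading it whole.
lines = ("sentence %d" % i for i in range(12))
batch_sizes = [len(batch) for batch in iter_batches(lines, 5)]
```

Whether this helps depends on where the large allocation actually happens (for example, while the full corpus is converted into one tensor), which the log alone does not show.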

'--skip_train' ValueError: ('Could not interpret optimizer identifier:', <class 'keras.optimizers.SGD'>)

For a quick test, I first trained a small model by running:
python train_lstm3.py --rnn_size 50 --highway_layer 1 --feature_maps 5 10 15 20 25 30 --kernels 1 2 3 4 5 6 --checkpoint_dir LSTM4 --savefile char-small --max_epochs 1

It gives the results:
Epoch 1/1
1327/1327 [==============================] - 2175s - loss: 6.6535
Epoch 1/1. Validation loss: 708.149629583
Perplexity on test set: 672.399581347

With the trained model, I then evaluated on the test set again by running:
python train_lstm3.py --rnn_size 50 --highway_layer 1 --feature_maps 5 10 15 20 25 30 --kernels 1 2 3 4 5 6 --checkpoint_dir LSTM4 --savefile char-small --max_epochs 1 --skip_train

But it gives the error message:
C:\Users\jingweimo\Anaconda2\lib\site-packages\keras\legacy\layers.py:654: UserWarning: The `Highway` layer is deprecated and will be removed after 06/2017.
warnings.warn('The `Highway` layer is deprecated '

Traceback (most recent call last):
File "train.py", line 114, in <module>
main(params)
File "train.py", line 46, in main
model = load_model('{}/{}.json'.format(opt.checkpoint_dir, opt.savefile))
File "D:\LSTM packages\kchar-master\kchar-master\model\LSTMCNN.py", line 190, in load_model
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD)
File "C:\Users\jingweimo\Anaconda2\lib\site-packages\keras\engine\training.py", line 720, in compile
self.optimizer = optimizers.get(optimizer)
File "C:\Users\jingweimo\Anaconda2\lib\site-packages\keras\optimizers.py", line 710, in get
identifier)
ValueError: ('Could not interpret optimizer identifier:', <class 'keras.optimizers.SGD'>)

It seems there is something wrong with load_model in LSTMCNN.py. I feel you may need to change
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD)
to
model.compile(loss='sparse_categorical_crossentropy', optimizer=SGD())
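
The traceback shows the class object `keras.optimizers.SGD` itself reaching `optimizers.get`, which is consistent with the proposed fix of passing an instance. The sketch below uses a stand-in `SGD` class so that it runs without Keras; the helper `as_optimizer_instance` is hypothetical and only illustrates the class-versus-instance distinction.

```python
import inspect


class SGD:
    """Stand-in for keras.optimizers.SGD so the sketch runs without Keras."""

    def __init__(self, lr=0.01):
        self.lr = lr


def as_optimizer_instance(optimizer):
    """Return an optimizer instance, instantiating it if a class was passed."""
    if inspect.isclass(optimizer):
        return optimizer()
    return optimizer


from_class = as_optimizer_instance(SGD)               # class -> new instance
from_instance = as_optimizer_instance(SGD(lr=0.5))    # instance -> unchanged
```

With the real library, writing `optimizer=SGD()` directly, as suggested above, is the simpler change.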

Error when running the code

I'm running this code with the TensorFlow backend and encounter the following error:

[lsong10@bh25fen sub.kchar]$ more train.err
Using TensorFlow backend.
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
Traceback (most recent call last):
File "train.py", line 99, in <module>
main(params)
File "train.py", line 24, in main
model = LSTMCNN(opt)
File "/gpfs/fs2/scratch/lsong10/exp.full_BoW/sub.kchar/model/LSTMCNN.py", line 95, in LSTMCNN
cnn = CNN(opt.seq_length, opt.max_word_l, opt.char_vec_size, opt.feature_maps, opt.kernels, chars_embedding)
File "/gpfs/fs2/scratch/lsong10/exp.full_BoW/sub.kchar/model/LSTMCNN.py", line 63, in CNN
conv = Convolution2D(feature_map, 1, kernel, activation='tanh', dim_ordering='tf')(x)
File "/software/python/2.7.12/lib/python2.7/site-packages/keras/engine/topology.py", line 572, in __call__
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/software/python/2.7.12/lib/python2.7/site-packages/keras/engine/topology.py", line 635, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/software/python/2.7.12/lib/python2.7/site-packages/keras/engine/topology.py", line 166, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/software/python/2.7.12/lib/python2.7/site-packages/keras/layers/convolutional.py", line 475, in call
filter_shape=self.W_shape)
File "/software/python/2.7.12/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2691, in conv2d
x = tf.nn.conv2d(x, kernel, strides, padding=padding)
File "/software/python/2.7.12/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 396, in conv2d
data_format=data_format, name=name)
File "/software/python/2.7.12/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 585, in apply_op
param_name=input_name)
File "/software/python/2.7.12/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 61, in _SatisfiesTypeConstraint
", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: Value passed to parameter 'input' has DataType int32 not in list of allowed values: float16, float32, float64

Example of sentences generation

I ran into a problem when trying to use a trained model.

What I tried to do:

model = LSTMCNN(opts)
model.load_weights('words-large.h5')
...
input = some_seed_input
for ...:
    x = {'word': np.array([input])}
    prediction = model.predict(x)
    input = input[1:len(input)]

Something like this predicts several words, then the predictions become something like "the the the be of the".

What am I doing wrong?
Could you please show a basic example of using a trained model to generate text?
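
As far as the snippet shows, the loop drops the oldest word but never appends the predicted word to the context, and repetitive output such as "the the the" is also typical of always choosing the most probable word. The sketch below is not this repository's API: `predict_next` is a hypothetical stand-in for a call that returns a probability distribution over the vocabulary, and the toy model exists only to make the example runnable. It feeds each sampled word back into the context and samples with a temperature instead of taking the argmax.

```python
import math
import random


def sample_index(probs, temperature=1.0, rng=random):
    """Sample an index from a probability list after temperature scaling."""
    logits = [math.log(max(p, 1e-12)) / temperature for p in probs]
    peak = max(logits)
    weights = [math.exp(value - peak) for value in logits]
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]


def generate(predict_next, seed, n_words, window, temperature=1.0, rng=random):
    """Generate n_words indices, feeding each sampled word back into the context."""
    context = list(seed)
    output = []
    for _ in range(n_words):
        probs = predict_next(context[-window:])
        word = sample_index(probs, temperature, rng)
        output.append(word)
        context.append(word)  # the generated word becomes part of the next input
    return output


def toy_predict_next(context):
    """Hypothetical stand-in for the model: favours the index after the last word."""
    nxt = (context[-1] + 1) % 5
    return [0.8 if i == nxt else 0.05 for i in range(5)]


generated = generate(toy_predict_next, seed=[0, 1], n_words=10, window=2,
                     temperature=0.7, rng=random.Random(0))
```

Lower temperatures make the output more conservative and higher ones more varied; mapping indices back to words would use the vocabulary saved during preprocessing.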

the size of kernel

def CNN(seq_length, length, feature_maps, kernels, x):
    concat_input = []
    for feature_map, kernel in zip(feature_maps, kernels):
        reduced_l = length - kernel + 1
        conv = Conv2D(feature_map, (1, kernel), activation='tanh', data_format="channels_last")(x)
        maxp = MaxPooling2D((1, reduced_l), data_format="channels_last")(conv)
        concat_input.append(maxp)

    x = Concatenate()(concat_input)
    x = Reshape((seq_length, sum(feature_maps)))(x)
    return x

From the description in the paper, I thought the kernel size should be d*w.
That would mean the output of the embedding layer should be (20, 35, 31, 15, 1).
Thank you in advance for your responses.
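
One way to read the quoted code, assuming the character-embedding dimension d sits on the channel axis (as `data_format="channels_last"` suggests): a `Conv2D` kernel always spans every input channel, so a `(1, kernel)` window already covers d values for each of `kernel` characters, which amounts to a d*w filter. The shape arithmetic can be sketched in plain Python. The helper name is hypothetical; the sequence length 35 and word length 21 are taken from another issue on this page, and the feature-map sizes from the model summary posted in a later issue.

```python
def cnn_output_shapes(seq_length, max_word_l, feature_maps, kernels):
    """Per-kernel shapes (without the batch axis) after convolution and max pooling."""
    shapes = []
    for feature_map, kernel in zip(feature_maps, kernels):
        reduced_l = max_word_l - kernel + 1                 # valid convolution over characters
        conv_shape = (seq_length, reduced_l, feature_map)
        pooled_shape = (seq_length, 1, feature_map)         # max over the remaining positions
        shapes.append((kernel, conv_shape, pooled_shape))
    # Concatenating the pooled outputs and reshaping gives one vector per word.
    return shapes, (seq_length, sum(feature_maps))


shapes, final_shape = cnn_output_shapes(
    35, 21, [50, 100, 150, 200, 200, 200, 200], [1, 2, 3, 4, 5, 6, 7])
```

Under these assumptions each word ends up as a 1100-dimensional feature vector, matching the `(80, 35, 1100)` reshape output in the summary posted further down.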

Currently working?

Is this implementation currently working for you @jarfo? I currently get a large number of errors when trying to run it.

shape of char input?

@jarfo I don't understand why the shape of the char input layer is (opt.batch_size, opt.seq_length, opt.max_word_l), e.g. (20, 35, 21). Why not (batch_size, seq_length), e.g. (20, 35)?
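
A likely reading, given that the model is character-aware: each word is fed to the network as a sequence of character indices rather than a single word index, so every position in the (batch_size, seq_length) grid holds a vector of length max_word_l. The sketch below illustrates that layout with nested lists. The helper `encode_chars`, the padding index 0 and the toy alphabet are assumptions, and the repository's own loader may add further details such as word-boundary markers.

```python
def encode_chars(sentences, char2idx, max_word_l, pad=0):
    """Map tokenized sentences to nested lists of shape (batch, seq_length, max_word_l)."""
    batch = []
    for sentence in sentences:
        rows = []
        for word in sentence:
            ids = [char2idx[c] for c in word[:max_word_l]]
            rows.append(ids + [pad] * (max_word_l - len(ids)))  # right-pad short words
        batch.append(rows)
    return batch


# Toy alphabet: 'a' -> 1 ... 'z' -> 26, with 0 reserved for padding.
char2idx = {c: i + 1 for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}
batch = encode_chars([["the", "cat"], ["a", "dog"]], char2idx, max_word_l=5)
shape = (len(batch), len(batch[0]), len(batch[0][0]))
```

Here `shape` has three axes for the same reason the real input does: the last axis enumerates the characters of each word.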

Running error with train.py

I ran train.py for a quick check of the results, but got this error message:

one-time setup: preprocessing input train/valid/test files in dir: data/ptb
Processing text into tensors...
Traceback (most recent call last):

File "", line 58, in
main(params)

File "", line 2, in main
loader = BatchLoaderUnk(opt.tokens, opt.data_dir, opt.batch_size, opt.seq_length, opt.max_word_l, opt.n_words, opt.n_chars)

File "D:\Research\Manuscript\Simplified LSTM\LSTM packages\kchar-master\kchar-master\util\BatchLoaderUnk.py", line 34, in __init__
self.text_to_tensor(tokens, input_files, vocab_file, tensor_file, char_file, max_word_l)

File "D:\Research\Manuscript\Simplified LSTM\LSTM packages\kchar-master\kchar-master\util\BatchLoaderUnk.py", line 148, in text_to_tensor
f = codecs.open(input_files[split], 'r', encoding)

File "C:\Users\Yuzhen Lu\Anaconda2\lib\codecs.py", line 896, in open
file = __builtin__.open(filename, mode, buffering)

IOError: [Errno 2] No such file or directory: 'data/ptb\train.txt'

I use Python 2.7 (Spyder IDE) with Keras 2.0.5 and Theano 0.10.
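
The error says that `data/ptb\train.txt` could not be found. Because `data/ptb` is a relative path, it is resolved against the current working directory, which in an IDE may differ from the repository root; that is one possible cause, not a confirmed one. The sketch below checks which split files exist under a data directory. The helper `missing_splits` is hypothetical, and the file name `valid.txt` is an assumption (`train.txt` and `test.txt` appear elsewhere on this page).

```python
import os
import tempfile


def missing_splits(data_dir, names=("train.txt", "valid.txt", "test.txt")):
    """Return the expected split files that do not exist under data_dir."""
    paths = [os.path.join(data_dir, name) for name in names]
    return [path for path in paths if not os.path.isfile(path)]


# Demonstration on a temporary directory that only contains train.txt.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "train.txt"), "w").close()
    missing = [os.path.basename(path) for path in missing_splits(demo_dir)]
```

Running `missing_splits("data/ptb")` from the same console, together with `os.getcwd()`, would show whether the files are absent or the working directory is simply not the one expected.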

Error in evaluate of character level modelling

I tried evaluate.py with a character-level model using the following command:
python evaluate.py --model cv/char-large --vocabulary data/ptb/vocab.npz --init init.npy --text data/ptb/test.txt --calc
I got a shape error, which I guess is a problem with the input, but I don't know how to provide the input correctly and fix it.

Traceback (most recent call last):
File "evaluate.py", line 263, in <module>
main(args.model, args.vocabulary, args.init, args.text, args.calc)
File "evaluate.py", line 174, in main
lprob, nwords, output = ev.logprob(line)
File "evaluate.py", line 107, in logprob
out = self.model.predict(np.array([x['word'][wrd]]), batch_size=1)
File "/usr/local/lib/python3.5/dist-packages/Keras-2.0.6-py3.5.egg/keras/engine/training.py", line 1499, in predict
File "/usr/local/lib/python3.5/dist-packages/Keras-2.0.6-py3.5.egg/keras/engine/training.py", line 128, in _standardize_input_data
ValueError: Error when checking : expected chars to have 3 dimensions, but got array with shape (1, 1)

Tensorflow training incremental

I have trained a system on some sentences (say 10,000) and I want to continue training it incrementally on 200 more sentences. How can I achieve this with this model, so that the system is effectively trained on all 10,200 sentences?

Thanks in advance.

Error when running the following command

The word-level model trains without problems:
python train.py --savefile word-large --highway_layers 0 --use_chars 0 --use_words 1

But for the char-level model
python train.py --savefile char-large
the following error occurs:

Traceback (most recent call last):
File "train.py", line 115, in <module>
main(params)
File "train.py", line 37, in main
model = LSTMCNN(opt)
File "/home/prasad/tensorflow/kchar3/model/LSTMCNN.py", line 97, in LSTMCNN
cnn = CNN(opt.seq_length, opt.max_word_l, opt.char_vec_size, opt.feature_maps, opt.kernels, chars_embedding)
File "/home/prasad/tensorflow/kchar3/model/LSTMCNN.py", line 65, in CNN
conv = Convolution2D(feature_map, 1, kernel, activation='tanh', dim_ordering='tf')(x)
File "/usr/lib64/python2.7/site-packages/keras/engine/topology.py", line 572, in __call__
self.add_inbound_node(inbound_layers, node_indices, tensor_indices)
File "/usr/lib64/python2.7/site-packages/keras/engine/topology.py", line 635, in add_inbound_node
Node.create_node(self, inbound_layers, node_indices, tensor_indices)
File "/usr/lib64/python2.7/site-packages/keras/engine/topology.py", line 166, in create_node
output_tensors = to_list(outbound_layer.call(input_tensors[0], mask=input_masks[0]))
File "/usr/lib64/python2.7/site-packages/keras/layers/convolutional.py", line 475, in call
filter_shape=self.W_shape)
File "/usr/lib64/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2691, in conv2d
x = tf.nn.conv2d(x, kernel, strides, padding=padding)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 396, in conv2d
data_format=data_format, name=name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 585, in apply_op
param_name=input_name)
File "/usr/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 61, in _SatisfiesTypeConstraint
", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: Value passed to parameter 'input' has DataType int32 not in list of allowed values: float16, float32, float64

Can anyone please suggest a solution for this issue? Also, what is the difference between the two models, and which one gives better predictions?

Error when doing evaluate

Hi,

I'm trying to train the model like this:
python train.py --save_every 1 --data_dir data/test --n_chars 10000 --batch_size 80
Here's the output:
creating an LSTM-CNN with 2 layers

Layer (type)                     Output Shape          Param #     Connected to
chars (InputLayer)               (80, 35, 65)          0
timedistributed_1 (TimeDistribute(80, 35, 65, 15)      88200       chars[0][0]
convolution2d_1 (Convolution2D)  (80, 35, 65, 50)      800         timedistributed_1[0][0]
convolution2d_2 (Convolution2D)  (80, 35, 64, 100)     3100        timedistributed_1[0][0]
convolution2d_3 (Convolution2D)  (80, 35, 63, 150)     6900        timedistributed_1[0][0]
convolution2d_4 (Convolution2D)  (80, 35, 62, 200)     12200       timedistributed_1[0][0]
convolution2d_5 (Convolution2D)  (80, 35, 61, 200)     15200       timedistributed_1[0][0]
convolution2d_6 (Convolution2D)  (80, 35, 60, 200)     18200       timedistributed_1[0][0]
convolution2d_7 (Convolution2D)  (80, 35, 59, 200)     21200       timedistributed_1[0][0]
maxpooling2d_1 (MaxPooling2D)    (80, 35, 1, 50)       0           convolution2d_1[0][0]
maxpooling2d_2 (MaxPooling2D)    (80, 35, 1, 100)      0           convolution2d_2[0][0]
maxpooling2d_3 (MaxPooling2D)    (80, 35, 1, 150)      0           convolution2d_3[0][0]
maxpooling2d_4 (MaxPooling2D)    (80, 35, 1, 200)      0           convolution2d_4[0][0]
maxpooling2d_5 (MaxPooling2D)    (80, 35, 1, 200)      0           convolution2d_5[0][0]
maxpooling2d_6 (MaxPooling2D)    (80, 35, 1, 200)      0           convolution2d_6[0][0]
maxpooling2d_7 (MaxPooling2D)    (80, 35, 1, 200)      0           convolution2d_7[0][0]
merge_1 (Merge)                  (80, 35, 1, 1100)     0           maxpooling2d_1[0][0]
                                                                   maxpooling2d_2[0][0]
                                                                   maxpooling2d_3[0][0]
                                                                   maxpooling2d_4[0][0]
                                                                   maxpooling2d_5[0][0]
                                                                   maxpooling2d_6[0][0]
                                                                   maxpooling2d_7[0][0]
reshape_1 (Reshape)              (80, 35, 1100)        0           merge_1[0][0]
timedistributed_2 (TimeDistribute(80, 35, 1100)        2422200     reshape_1[0][0]
timedistributed_3 (TimeDistribute(80, 35, 1100)        2422200     timedistributed_2[0][0]
lstm_1 (LSTM)                    (80, 35, 650)         4552600     timedistributed_3[0][0]
dropout_1 (Dropout)              (80, 35, 650)         0           lstm_1[0][0]
lstm_2 (LSTM)                    (80, 35, 650)         3382600     dropout_1[0][0]
dropout_2 (Dropout)              (80, 35, 650)         0           lstm_2[0][0]
timedistributed_4 (TimeDistribute(80, 35, 30000)       19530000    dropout_2[0][0]

Total params: 32475400
It outputs lsm_char_large_epoch20.h5, which I renamed to char-large.h5 and evaluated with
python evaluate.py --model cv/char-large --vocabulary data/test/vocab.npz --text data/test/test.txt --calc

Word vocab size: 30000, Char vocab size: 5880

Layer (type)                     Output Shape          Param #     Connected to
chars (InputLayer)               (1, 1, 65)            0
timedistributed_1 (TimeDistribute(1, 1, 65, 15)        88200       chars[0][0]
convolution2d_1 (Convolution2D)  (1, 1, 65, 50)        800         timedistributed_1[0][0]
convolution2d_2 (Convolution2D)  (1, 1, 64, 100)       3100        timedistributed_1[0][0]
convolution2d_3 (Convolution2D)  (1, 1, 63, 150)       6900        timedistributed_1[0][0]
convolution2d_4 (Convolution2D)  (1, 1, 62, 200)       12200       timedistributed_1[0][0]
convolution2d_5 (Convolution2D)  (1, 1, 61, 200)       15200       timedistributed_1[0][0]
convolution2d_6 (Convolution2D)  (1, 1, 60, 200)       18200       timedistributed_1[0][0]
convolution2d_7 (Convolution2D)  (1, 1, 59, 200)       21200       timedistributed_1[0][0]
maxpooling2d_1 (MaxPooling2D)    (1, 1, 1, 50)         0           convolution2d_1[0][0]
maxpooling2d_2 (MaxPooling2D)    (1, 1, 1, 100)        0           convolution2d_2[0][0]
maxpooling2d_3 (MaxPooling2D)    (1, 1, 1, 150)        0           convolution2d_3[0][0]
maxpooling2d_4 (MaxPooling2D)    (1, 1, 1, 200)        0           convolution2d_4[0][0]
maxpooling2d_5 (MaxPooling2D)    (1, 1, 1, 200)        0           convolution2d_5[0][0]
maxpooling2d_6 (MaxPooling2D)    (1, 1, 1, 200)        0           convolution2d_6[0][0]
maxpooling2d_7 (MaxPooling2D)    (1, 1, 1, 200)        0           convolution2d_7[0][0]
merge_1 (Merge)                  (1, 1, 1, 1100)       0           maxpooling2d_1[0][0]
                                                                   maxpooling2d_2[0][0]
                                                                   maxpooling2d_3[0][0]
                                                                   maxpooling2d_4[0][0]
                                                                   maxpooling2d_5[0][0]
                                                                   maxpooling2d_6[0][0]
                                                                   maxpooling2d_7[0][0]
reshape_1 (Reshape)              (1, 1, 1100)          0           merge_1[0][0]
timedistributed_2 (TimeDistribute(1, 1, 1100)          2422200     reshape_1[0][0]
timedistributed_3 (TimeDistribute(1, 1, 1100)          2422200     timedistributed_2[0][0]
lstm_1 (LSTM)                    (1, 1, 650)           4552600     timedistributed_3[0][0]
dropout_1 (Dropout)              (1, 1, 650)           0           lstm_1[0][0]
lstm_2 (LSTM)                    (1, 1, 650)           3382600     dropout_1[0][0]
dropout_2 (Dropout)              (1, 1, 650)           0           lstm_2[0][0]
timedistributed_4 (TimeDistribute(1, 1, 30000)         19530000    dropout_2[0][0]

Total params: 32475400

It then raises the following error:
Error when checking model target: expected timedistributed_4 to have shape (1, 1, 30000) but got array with shape (4, 1, 1)
Is there any problem with the model it outputs?

Tensorflow support?

Hi,
I was running into some issues running your code on keras with a tensorflow backend.

ValueError: The shape for while/Merge_2:0 is not an invariant for the loop. It enters the loop with shape (35, 21), but has shape (20, 21, 15) after one iteration. Provide shape invariants using either the `shape_invariants` argument of tf.while_loop or set_shape() on the loop variables.

Is this something you are aware of?

Thanks so much for your work on this by the way!

ValueError: Object arrays cannot be loaded when allow_pickle=False

When running python train.py --savefile char-large, I've encountered the error below.

After first pass of data, max word length is: 21
Token count: train 929589, val 73760, test 82430
saving data/ptb/data_0.npy
saving data/ptb/data_char_0.npy
saving data/ptb/data_1.npy
saving data/ptb/data_char_1.npy
saving data/ptb/data_2.npy
saving data/ptb/data_char_2.npy
saving data/ptb/vocab.npz
loading data files...
Traceback (most recent call last):
  File "train.py", line 101, in <module>
    main(params)
  File "train.py", line 15, in main
    loader = BatchLoaderUnk(opt.tokens, opt.data_dir, opt.batch_size, opt.seq_length, opt.max_word_l, opt.n_words, opt.n_chars)
  File "/home/jiayi/kchar/util/BatchLoaderUnk.py", line 45, in __init__
    self.idx2word, self.word2idx, self.idx2char, self.char2idx = vocab_unpack(vocab_mapping)
  File "/home/jiayi/kchar/util/BatchLoaderUnk.py", line 19, in vocab_unpack
    return vocab['idx2word'], vocab['word2idx'], vocab['idx2char'], vocab['char2idx']
  File "/home/jiayi/.local/lib/python3.6/site-packages/numpy/lib/npyio.py", line 255, in __getitem__
    pickle_kwargs=self.pickle_kwargs)
  File "/home/jiayi/.local/lib/python3.6/site-packages/numpy/lib/format.py", line 727, in read_array
    raise ValueError("Object arrays cannot be loaded when "
ValueError: Object arrays cannot be loaded when allow_pickle=False

I was able to resolve it by passing allow_pickle=True to np.load(). The following replaces lines 41-44 of the BatchLoaderUnk.py file. Hope this helps others facing the same error.

for split in range(3):
    all_data.append(np.load("{}_{}.npy".format(tensor_file, split), allow_pickle=True))  # train, valid, test tensors
    all_data_char.append(np.load("{}_{}.npy".format(char_file, split), allow_pickle=True))  # train, valid, test character indices
vocab_mapping = np.load(vocab_file, allow_pickle=True)
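
For context: since NumPy 1.16.3, `np.load` defaults to `allow_pickle=False`, and Python objects such as dicts are stored as object arrays that can only be read back by unpickling. The minimal, self-contained reproduction below uses a toy vocabulary and a temporary file (both made up for illustration, not taken from the repository) to show the failure and the opt-in.

```python
import os
import tempfile

import numpy as np

# Toy vocabulary; a dict is saved as a 0-d object array, which requires pickling.
word2idx = {"the": 0, "cat": 1}

with tempfile.TemporaryDirectory() as tmp_dir:
    vocab_file = os.path.join(tmp_dir, "vocab.npz")
    np.savez(vocab_file, word2idx=word2idx)

    # Default load: the archive opens, but reading the object array is refused
    # on NumPy versions where allow_pickle defaults to False.
    try:
        with np.load(vocab_file) as vocab:
            vocab["word2idx"]
        default_load_failed = False
    except ValueError:
        default_load_failed = True

    # Explicit opt-in: only do this for files you created yourself or trust.
    with np.load(vocab_file, allow_pickle=True) as vocab:
        restored = vocab["word2idx"].item()
```

Unpickling can execute arbitrary code, so `allow_pickle=True` is reasonable for files generated locally by train.py but not for files from unknown sources.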

What is the task at hand ?

This may be a naive question, but it's not very clear what the task at hand is.
When I look at the training data, it's a series of sentences. So is the task to predict the (k+1)th word after looking at words 1...k?
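
The repository description ("Character-Aware Neural Language Models") suggests that the answer is yes: this is language modeling, where the target at each position is the next word. The sketch below shows how such input/target pairs are commonly built by shifting the token stream by one position. The helper `next_word_pairs` and the example sentence are illustrative assumptions, not the repository's loader.

```python
def next_word_pairs(tokens, seq_length):
    """Split a token list into (input, target) windows, with targets shifted by one."""
    pairs = []
    for start in range(0, len(tokens) - seq_length, seq_length):
        x = tokens[start:start + seq_length]
        y = tokens[start + 1:start + seq_length + 1]  # the next word at every position
        pairs.append((x, y))
    return pairs


tokens = "the cat sat on the mat and purred".split()
pairs = next_word_pairs(tokens, seq_length=3)
```

In the character-aware setting the inputs would be the characters of each word while the targets remain word indices, which fits the 30000-way output layer in the model summaries posted above.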
