rusiaaman / xlnet-gen

XLNet for generating language.

License: MIT License

Python 100.00%

xlnet text language-model language-modeling language-generation gpt2 transformer-xl

xlnet-gen's Issues

Understand top-p vs top-k

I'm trying to understand the top-p and top-k prediction modes from your code.
(I'm not proficient in TensorFlow and I had never heard of top-p before.)

The top-k strategy samples a token from the K tokens with the highest logit scores.

The top-p strategy samples a token from the X tokens whose logit score is greater than p.

Am I right?
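
For reference, top-p (nucleus) sampling is usually defined a little differently from the threshold described above: tokens are sorted by probability and the smallest set whose cumulative probability reaches p is kept, so p bounds a running sum rather than each individual score. The sketch below contrasts the two filters in plain Python. The function names are illustrative and are not taken from language_generation.py, which may implement the filtering differently.

```python
import math
import random


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_candidates(logits, k):
    """Indices of the k highest-scoring tokens, best first."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return order[:k]


def top_p_candidates(logits, p):
    """Smallest set of top tokens whose cumulative probability reaches p."""
    probs = softmax(logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return kept


def sample_from(logits, candidates, rng=random):
    """Sample one token id from the candidates, renormalising over them."""
    weights = softmax([logits[i] for i in candidates])
    return rng.choices(candidates, weights=weights, k=1)[0]


logits = [0.5, 2.0, -1.0, 1.0]
token = sample_from(logits, top_p_candidates(logits, 0.8))
```

With top-k the number of candidates is fixed, while with top-p it grows or shrinks depending on how peaked the distribution is at each step.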

Beam search usefulness?

In most text-generation architectures, beam search improves quality by producing more natural text.

Is it useful to use beam search with XLNet?

As far as I understand, since tokens are generated one by one, beam search is completely useless.
But what about generating tokens two at a time? Would it be useful to add beam search then?

Are you going to try it?

TypeError: gather_nd() got an unexpected keyword argument 'batch_dims'

Any idea about the error below? I get it when I try to execute this.

 Instructions for updating:
 Use keras.layers.dense instead.
 Traceback (most recent call last):
   File "language_generation.py", line 686, in <module>
     main()
   File "language_generation.py", line 591, in main
     predictions, features = prediction_graph()
   File "language_generation.py", line 518, in prediction_graph_no_memory
     inp, inp_mask, seg_id, perm_mask, prev_tokens, prev_conf)
   File "language_generation.py", line 504, in body
     sampled_tokens, confidences = sample_token(logits)
   File "language_generation.py", line 276, in sample_token
     confidences = tf.gather_nd(params=probs, batch_dims=0, indices=samples)
 TypeError: gather_nd() got an unexpected keyword argument 'batch_dims'

I tried removing the batch_dims argument, but then it fails again after the prompt. Any idea about this?

----PROMPT----

WARNING:tensorflow:From /Users/nitin/opt/anaconda3/envs/xlnet/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
    tf.py_function, which takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    
  0%|                                                                                                  | 0/1 [00:00<?, ?it/s]
2019-12-23 16:59:50.810097: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[0] = [4712] does not index into param shape [1,32000]

Looking forward to hearing from you.
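
One reading of the second error message: without batching, gather_nd applies each index from the outermost axis inwards, so a bare token id such as [4712] is used against the batch axis of the [1, 32000] probabilities, which has size 1. The plain-Python sketch below mimics that behaviour on nested lists and shows how pairing each token id with its row gives a valid index. The helper names are hypothetical; in TensorFlow the equivalent index tensor could presumably be built with tf.stack and tf.range, but that is an assumption that has not been checked against this repo.

```python
def gather_nd(params, indices):
    """Pure-Python stand-in for gather_nd (no batch dims) on nested lists.

    Each entry of `indices` is a list of coordinates applied from the
    outermost axis of `params` inwards.
    """
    results = []
    for index in indices:
        value = params
        for coordinate in index:
            value = value[coordinate]  # raises IndexError if out of range
        results.append(value)
    return results


def pair_with_row(samples):
    """Turn one sampled token id per row into full [row, token] indices."""
    return [[row, token] for row, token in enumerate(samples)]


probs = [[0.1, 0.7, 0.2]]  # shape [batch=1, vocab=3]
samples = [1]              # one sampled token id per batch row

# A bare token id indexes axis 0 (size 1) and is out of range, mirroring
# "indices[0] = [4712] does not index into param shape [1,32000]".
try:
    gather_nd(probs, [[token] for token in samples])
    bare_index_failed = False
except IndexError:
    bare_index_failed = True

# Full [row, token] indices select the probability of each sampled token.
confidences = gather_nd(probs, pair_with_row(samples))
```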

Gibberish with tensorflow 2.0.0b0

Thank you for the code and for wading through all of those einsums to get things working!

I'm trying to generate some text using tensorflow 2.0 and have thus far only been able to produce gibberish.

In order to get the code running with the 2.0 API I had to make a few minor modifications: 1) change tf -> tf.compat.v1 in each place the compiler complained, and 2) use tf.keras.layers.LayerNormalization instead of tf.contrib.layers.layer_norm in parts of the attention and ffn code, e.g.:

#output = tf.contrib.layers.layer_norm(output + inp, begin_norm_axis=-1, scope='LayerNorm')
ln = tf.keras.layers.LayerNormalization()
with tf.compat.v1.variable_scope("LayerNorm"):
    output = ln(output + inp)

The documentation for LayerNormalization implies the default behavior is equivalent to using begin_norm_axis=-1, so I don't think this is the issue.

Any ideas to remedy the gibberish?

Update for tensorflow 2

Can you update the code to Tensorflow 2?
Thanks

Colab notebook doesn't work

Hi,
I tried using the Colab notebook you linked to in your README for this repo at:
https://colab.research.google.com/drive/12u-CmB9evMIASNOqJtDW26gmNvSgepBv

However, the last code cell raises an exception:

2021-01-17 19:47:05.890096: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "language_generation.py", line 16, in <module>
    import model_utils
  File "/content/XLNet-gen/model_utils.py", line 292, in <module>
    class AdamWeightDecayOptimizer(tf.train.Optimizer):
AttributeError: module 'tensorflow._api.v2.train' has no attribute 'Optimizer'

Nothing happens when generating with the ---PROMPT---

----PROMPT----
Hello world, this is some sample text that you can use!!   
WARNING:tensorflow:From /home/timisb/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:494: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
WARNING:tensorflow:From language_generation.py:603: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
  0%|                                                                                                                                               | 0/1 [00:00<?, ?it/s]

It has been stuck like this for an hour.

Protobuf error after inputting prompt

I run into the following error after inputting my prompt:

----PROMPT----
Today I plan to
WARNING:tensorflow:From language_generation.py:605: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
  0% 0/25 [00:00<?, ?it/s]2020-04-19 15:44:22.416928: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-19 15:44:22.756233: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
[libprotobuf FATAL /sentencepiece/src/../third_party/protobuf-lite/google/protobuf/repeated_field.h:1506] CHECK failed: (index) < (current_size_): 
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: (index) < (current_size_):

Any suggestions for how to fix/work around this?

Inference time

Thanks for the code and the article!

I've tried running your code, and it works, but the inference time is very, very long: ~45 min for 1 sample...

I understand it is because we need to recompute all the attention of non-target tokens every 2 steps and it's running on CPU, but it is still too long.

Can I run this code on GPU / Colab TPU?
If yes, how?

How to fine tune xlnet large model with new text using XLnet-gen?

Hi:

Thanks for this repo. Text generation is my main interest, and I was wondering how the XLNet large model can be fine-tuned on new text and then used as a model in XLnet-gen via language_generation.py.

I can create a small base model from scratch using your repo, but I don't have the GPU power to create a large one.

Since the GPT-2 fine-tuning repo by nshepperd, which uses the OpenAI 345M model, is very easy to use, is it possible to use a similar process in XLnet-gen?

The fine-tuning examples given in the original XLNet repo don't seem applicable to text generation, or easy to adapt for it.

Any suggestions or new scripts are welcome.

Thanks

OOM error on colab TPU when pretraining XLNet

I am sorry to bother you here with a problem about XLNet pretraining.

I saw your comment on the XLNet issues saying you had the same error on Colab TPU: Error recorded from outfeed: Bad hardware status: 0x1. I am now trying to pretrain XLNet on a Colab TPU and am running into the same problem. I have also tried a minimal batch_size=16, but I still get the error. So I want to ask whether you have solved the problem, and whether you can pretrain XLNet on a Colab TPU now.

Thanks!

Colab notebook results VS samples?

So I've tried using the Colab notebook for conditional generation, but the results are nowhere near as long or as good as the samples listed in the Medium article. They lose coherency very quickly and start repeating phrases or single words. What were the arguments used to create the samples?

Filtering in token sampling process

This is not a problem, but rather a discovery.

I was working on a Japanese version of XLNet recently. Due to a lack of training data (nothing more than JPNWiki) and the complexity of the language itself, the model was never really good in terms of in-sample and held-out perplexity. But I decided to try it on language generation anyway.

The discovery is that my model frequently generates an eod token and then skips to a completely unrelated topic, often making up a non-existent Wikipedia article. But if I purge the activation of the eod token in the predicted logits before sampling, the model can never generate that token and so cannot end the topic it was given.

I also found that purging the activations of bad tokens (those that tend to turn everything after them into catastrophic gibberish) greatly improves the quality of the generated text.

Even though the model was never good in the first place, I still managed to improve the generated text from complete garbage to acceptable text that reads as if it was written by someone drunk.
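
A minimal sketch of the purging step described above, assuming the logits for one position are available as a plain list and that the ids of the unwanted tokens are known. EOD_ID and the other ids here are hypothetical placeholders, not values from any real vocabulary. Setting a logit to negative infinity gives that token zero probability after the softmax, so it can never be sampled.

```python
import math


def suppress_tokens(logits, banned_ids):
    """Return a copy of logits with the banned token ids set to -inf."""
    banned = set(banned_ids)
    return [-math.inf if i in banned else x for i, x in enumerate(logits)]


def softmax(logits):
    """Convert logits to probabilities; -inf entries become exactly 0.0."""
    peak = max(logits)  # assumes at least one token is left unbanned
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


EOD_ID = 2  # hypothetical id of the eod token
logits = [1.0, 0.5, 3.0, -0.5]

filtered = suppress_tokens(logits, [EOD_ID])
probs = softmax(filtered)
```

The same function covers the "bad tokens" case by passing a longer list of ids. The original logits are left unmodified, so the filter can be toggled per step.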
