
xlnet-gen's Introduction

Update 30-01-2021

This repository is archived. Please use https://github.com/huggingface/transformers, which supports XLNet language generation in both PyTorch and TensorFlow.

XLnet-gen

Generate language using XLNet. This is not an official implementation. Samples are included at the end of this README as well as in the samples folder.

Medium article as a summary of this effort: https://medium.com/@amanrusia/xlnet-speaks-comparison-to-gpt-2-ea1a4e9ba39e

Colab notebook where you can give prompts: https://colab.research.google.com/drive/12u-CmB9evMIASNOqJtDW26gmNvSgepBv

Usage

  • Step 1: Download and install requirements (change tensorflow to tensorflow-gpu in requirements.txt if needed)
    git clone https://github.com/rusiaaman/XLnet-gen.git && cd XLnet-gen
    pip install -r requirements.txt
    
  • Step 2: Download and unzip pretrained XLNet model from https://github.com/zihangdai/xlnet/
    wget https://storage.googleapis.com/xlnet/released_models/cased_L-24_H-1024_A-16.zip
    unzip cased_L-24_H-1024_A-16.zip
    
  • Step 3: Either run in interactive mode using --interactive flag or pass an input file using --input_file argument as described later. Use --unconditional for generating text without any conditioned text.
    python language_generation.py\
         --model_config_path=xlnet_cased_L-24_H-1024_A-16/xlnet_config.json\
         --init_checkpoint=xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt\
         --spiece_model_file=xlnet_cased_L-24_H-1024_A-16/spiece.model\
         --interactive\
         --max_mem_length=256\
         --num_toks_pred=256\
         --num_samples=1\
         --top_p=0.9\
         --bidirectional_eachstep
    

Important Notes

Methodology

XLNet is a novel permutation-based language model. In the current implementation of XLNet-gen, we generate text from left to right.

XLNet is trained with num_predict=85, which means that 85 tokens out of the 512 in a single example are predicted at a time. More importantly, the remaining 512 - 85 = 427 tokens can attend to each other in the attention mechanism (bidirectional attention). This creates problems for a conventional causal attention mechanism during language generation. The following problems were encountered:

  • Using a small context leads to gibberish predictions. Currently, a hard-coded random text is included as leading text, followed by <eod>, the end-of-document token, and then the desired context. This helps with small prompts.
  • Due to the nature of pretraining, context tokens attend to each other bidirectionally, and the context is spread throughout the model's input. Because of this, generating tokens left to right in a causal way leads to suboptimal output. Recalculating the hidden states at each step allows bidirectional attention over each newly generated token, which substantially improves the generation. To enable this, use the --bidirectional_eachstep flag.
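The prompt-padding trick described in the first point can be sketched roughly as follows. This is an illustrative sketch, not the repository's actual code: the filler passage, the <eod> marker string, and the function name are all assumptions.

```python
# Sketch of the prompt-padding trick: a long dummy passage plus an
# end-of-document marker gives short prompts a full-sized context.
# Filler text, marker string, and function name are illustrative only.

PADDING_TEXT = (
    "In 1991, the remains of Russian Tsar Nicholas II and his family "
    "were discovered. " * 8  # any sufficiently long filler passage works
)

def build_prompt(user_prompt: str, eod_token: str = "<eod>") -> str:
    """Prepend long filler text and an end-of-document token so that
    even a short prompt is preceded by plenty of context."""
    return PADDING_TEXT + eod_token + " " + user_prompt

padded = build_prompt("The meaning of life is")
```

The model then conditions on the padded string, and only the tokens after the user's prompt are kept as output.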

Explanation of flags (specific to XLNet-gen)

  • --max_mem_length Max sequence length used for prediction. NOTE: the number of tokens to be predicted can be greater than this, but the context then gets truncated at the beginning. For the --autoregressive case, this sets the size of the 'memory'.
  • --num_toks_pred Number of tokens to predict. This can be as large as we want, however the context is truncated if longer than max_mem_length for the default case.
  • --num_samples For each prompt the number of samples to generate.
  • --interactive Command line prompt input.
  • --input_file Path to the file used for conditional prompts. Prompts are separated by an empty line. The output is generated in the same location, in a new file with the same name appended with ".xlnet".
  • --top_p top_p parameter for nucleus sampling. Set this to 0 if you want to use top_k sampling instead.
  • --top_k top_k parameter for top_k sampling. Only top_k most probable tokens are considered for sampling. Set top_p=0 if you want to use this.
  • --unconditional Generates unconditional samples. Ignores --interactive and --input_file flags.
  • --bidirectional_eachstep leads to much better output at the expense of computation. Explanation in methodology.

Sampling schemes

  • top-k sampling: use --top_k flag, ensure --top_p=0
  • Nucleus sampling: use --top_p flag
  • Permutation sampling
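For reference, the first two schemes can be sketched in plain NumPy. This is an illustrative reimplementation of top-k and nucleus sampling, not the repository's TensorFlow code:

```python
import numpy as np

def top_k_sample(logits, k, rng=None):
    """Top-k sampling: sample the next token among the k most probable."""
    rng = rng or np.random.default_rng()
    top = np.argsort(logits)[::-1][:k]           # indices of the k best logits
    p = np.exp(logits[top] - logits[top].max())  # softmax over the top k
    return int(rng.choice(top, p=p / p.sum()))

def top_p_sample(logits, top_p, rng=None):
    """Nucleus (top-p) sampling: sample within the smallest set of tokens
    whose cumulative probability exceeds top_p."""
    rng = rng or np.random.default_rng()
    p = np.exp(logits - logits.max())
    p /= p.sum()
    order = np.argsort(p)[::-1]                  # most probable first
    cutoff = int(np.searchsorted(np.cumsum(p[order]), top_p)) + 1
    nucleus = order[:cutoff]
    return int(rng.choice(nucleus, p=p[nucleus] / p[nucleus].sum()))

logits = np.array([4.0, 3.0, 0.1, -2.0])
token = top_p_sample(logits, top_p=0.9)
```

With a very peaked distribution the nucleus collapses to a single token, which is why low top_p values produce conservative, repetitive text and higher values produce more varied output.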

Notes on quality of the samples

  • There is a vast difference in quality with and without the --bidirectional_eachstep flag, which turns on re-calculation of hidden states with bidirectional attention every time a new token is generated. This is probably due to the way XLNet was pretrained, with sparse masks and bidirectional context. However, I am currently investigating this issue, and it could be an area of improvement for XLNet.
  • Generation of artifacts like empty quotes "", " ", multiple hyphens ---, and combinations of them like ""-" can all be attributed to bad training data. Specifically, there seem to be bugs in https://github.com/attardi/wikiextractor that lead to empty quotes and other such artifacts. This is probably the same library that was used by the authors.
  • Wikipedia has a lot of ellipses in its articles, which is reflected in the generation. The wiki data dump contains them both with and without spaces: both . . . and ....
  • XLNet can only predict end-of-paragraph and end-of-document tokens, not newline characters or tabs, so it does not generate good document structure.
  • The vocabulary is limited to English, and not all Unicode characters are in it. Characters from other languages and emojis cannot be generated and are decoded as unknown symbols.

Samples

We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization tasks in our lab using automated translation/text analysis with an automated computer system, Pro (Pro Text Analysis). From this training we have developed an automated translation tool, Pro Translation. Our system is known as the Pro Translation Suite, and is designed for translation between text, computer documents, and web pages. All of the tools in the Pro Translation Suite provide both text and "real time" translation. The program also features extensive user-friendly interfaces for user-directed development and customization of the software. The Pro Translation Suite features a number of features which offer new and innovative translation tasks. In addition, the Pro Translation Suite offers enhanced support for "realtime" translation systems, such as translation for Web pages, "real time" translation of language models, and machine translation.

We currently have a highly optimized robot in the development stage and support for this robot is currently being increased to include a (possibly) real-time translation engine, "The Trans-To-Trans". The Trans-To-Trans robot has been optimized, optimized and (and perhaps) may become a real-time translation engine, " The Trans-To-Trans". As one of our main goals, we will also be testing this robot against real time translation standards and benchmarks. Additionally, this robot has been made available publicly to evaluate and use, at no cost to the public.

The Trans-To-Trans robot has been built to meet a "real time" translation requirement (which is a requirement of English translation methods), which is the language to which all other robot translation will be converted. It has been designed for trans-lingual translation, such as translation between English and other popular languages. We expect to use this robot to do such translation in the future, and have been working on a translation tool, which we will be releasing near the end of the year. The Trans-To-Trans robot has been optimized to meet a "real time" translation requirement. This is a requirement of English translation methods. We have been working on a translation tool, which will be released near the end of the year. We have been working on a translation tool, which will be released near the end of the year.


Before boarding your rocket to Mars, remember to pack these items. First, you must pack a rocket case, or booster, for your rocket. The launcher is a special product developed by the World Space Program, which is a government agency of the United States. When you get the launcher, the rocket will be built to you. And it will take only 3 days! Another important item you should pack is the rocket engine. The rocket engine is a component of the rocket, that is made from two parts. The engine consists of two "core" chambers. The main chamber is constructed of a ceramic material. The second chamber is made of stainless steel. A solid core of the second chamber, called the "fire pit", is made from carbon fiber. The "fire pit" is sealed with seal-on plastic and then put into a hollow box, or “spar case". The spar case contains all of the other components, such as the engines, and the components inside the spar case are then assembled at the launch site. While waiting for the rocket to be assembled, you can rest and drink your milk or water. At the launch site, you will be given some kind of an instrument, or scope, and guided with the scope. The mission is to launch the rocket. The rocket will leave the launch site. The rocket will travel approximately 5.5 hours.

When the rocket arrives, you will be given a helmet, and then the rocket will launch. As you are lifting off off, keep your eyes open, and try to keep on track. It is important that you stay open and focused on the mission. If you are able to do this, then you will be able to fly away safely. Also, remember to drink water and drink fresh milk. Then, try your best to keep your body from over heating up while on the flight of your rocket.

There are many things that come into use in a restaurant kitchen. A dish is a component of the cook and it serves a food in a particular form or a certain way. Other types of dishes are prepared according to the needs of the customer or the guest. There are also different types of food that comes into use in the food service company

Todo

  • Comparison with GPT-2.
  • Permutation based decoding instead of left-to-right only.

xlnet-gen's People

Contributors

rusiaaman


xlnet-gen's Issues

Beam search usefulness?

In most text-generating architectures, beam search provides a quality improvement by producing more natural text.

Is it useful to use beam search with XLNet?


As far as I understand, since tokens are generated one by one, beam search is completely useless.
But what about generating tokens two by two? Would it be useful to add beam search then?

Are you going to try it?

Protobuf error after inputting prompt

I run into the following error after inputting my prompt:

----PROMPT----
Today I plan to
WARNING:tensorflow:From language_generation.py:605: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
  0% 0/25 [00:00<?, ?it/s]2020-04-19 15:44:22.416928: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-19 15:44:22.756233: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
[libprotobuf FATAL /sentencepiece/src/../third_party/protobuf-lite/google/protobuf/repeated_field.h:1506] CHECK failed: (index) < (current_size_): 
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: (index) < (current_size_): 

Any suggestions for how to fix/work around this?

TypeError: gather_nd() got an unexpected keyword argument 'batch_dims'

Any idea about the error below? I tried to execute this.

 Instructions for updating:
 Use keras.layers.dense instead.
 Traceback (most recent call last):
   File "language_generation.py", line 686, in <module>
     main()
   File "language_generation.py", line 591, in main
     predictions, features = prediction_graph()
   File "language_generation.py", line 518, in prediction_graph_no_memory
     inp, inp_mask, seg_id, perm_mask, prev_tokens, prev_conf)
   File "language_generation.py", line 504, in body
     sampled_tokens, confidences = sample_token(logits)
   File "language_generation.py", line 276, in sample_token
     confidences = tf.gather_nd(params=probs, batch_dims=0, indices=samples)
 TypeError: gather_nd() got an unexpected keyword argument 'batch_dims'

I tried removing the batch_dims argument, but then it fails again after the prompt. Any idea about this?

----PROMPT----

WARNING:tensorflow:From /Users/nitin/opt/anaconda3/envs/xlnet/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
    tf.py_function, which takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    
  0%|                                                                                                  | 0/1 [00:00<?, ?it/s]
2019-12-23 16:59:50.810097: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[0] = [4712] does not index into param shape [1,32000]

Looking forward to hearing from you.

Nothing happens when generating with the ---PROMPT---

----PROMPT----
Hello world, this is some sample text that you can use!!   
WARNING:tensorflow:From /home/timisb/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:494: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
WARNING:tensorflow:From language_generation.py:603: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
  0%|                                                                                                                                               | 0/1 [00:00<?, ?it/s]

Been stuck like this for 1 hour.

Colab notebook results vs. samples?

So I've tried using the Colab notebook for conditional generation, but the results are nowhere near as long or as good quality as the samples listed in the Medium article. They lose coherence very quickly and start repeating phrases or single words. What arguments were used to create the samples?

How to fine tune xlnet large model with new text using XLnet-gen?

Hi:

Thanks for this repo. Text generation is my main interest, and I was wondering how the XLNet large model can be fine-tuned on new text and then used as a model in XLnet-gen via language_generation.py.

I can create a small base model from scratch using your repo, but I don't have the GPU power to train a large one.

Since the GPT-2 fine-tuning repo by nshepperd using the OpenAI 345M model is very easy to use, is it possible to use a similar process in XLnet-gen?

The fine-tuning examples given in the original XLNet repo don't seem applicable or easy to adapt for text generation.

Any suggestions or new scripts are welcome.

Thanks

Colab notebook doesn't work

Hi,
I tried using the Colab notebook you linked to in your README for this repo at:
https://colab.research.google.com/drive/12u-CmB9evMIASNOqJtDW26gmNvSgepBv

However, the last code cell raises an exception:

2021-01-17 19:47:05.890096: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "language_generation.py", line 16, in <module>
    import model_utils
  File "/content/XLNet-gen/model_utils.py", line 292, in <module>
    class AdamWeightDecayOptimizer(tf.train.Optimizer):
AttributeError: module 'tensorflow._api.v2.train' has no attribute 'Optimizer'

Inference time

Thanks for the code and the article !

I've tried running your code, and it works, but the inference time is very, very long: about 45 minutes for one sample...

I understand it is because we need to recompute all the attention of non-target tokens every two steps and it's running on a CPU, but it is still too long.

Can I run this code on a GPU or a Colab TPU?
If yes, how?

Understand top-p vs top-k

I'm trying to understand the top-p and top-k prediction modes from your code.
(I'm not proficient in TensorFlow, and I had never heard of top-p before.)


The top-k strategy samples the next token from the k logits with the best scores.

The top-p strategy samples the next token from the logits whose score is greater than p.

Am I right?

Gibberish with tensorflow 2.0.0b0

Thank you for the code and for wading through all of those einsums to get things working!

I'm trying to generate some text using tensorflow 2.0 and have thus far only been able to produce gibberish.

In order to get the code running with the 2.0 API I had to make a few minor modifications: 1) change tf -> tf.compat.v1 in each place the compiler complained, and 2) use tf.keras.layers.LayerNormalization instead of tf.contrib.layers.layer_norm in parts of the attention and FFN code, e.g.:

#output = tf.contrib.layers.layer_norm(output + inp, begin_norm_axis=-1, scope='LayerNorm')
ln = tf.keras.layers.LayerNormalization()
with tf.compat.v1.variable_scope("LayerNorm"):
    output = ln(output + inp)

The documentation for LayerNormalization implies the default behavior is equivalent to using begin_norm_axis=-1 so I don't think this is the issue.

Any ideas to remedy the gibberish?

Filtering in token sampling process

This is not a problem, but rather a discovery.

I was working on a Japanese version of XLNet recently. Due to lack of training data (nothing more than JPNWiki) and the complexity of the language itself, the model was never really good in terms of in-sample and held-out perplexity. But anyway, I decided to give it a try on language generation.

The discovery is that my model frequently tries to generate an <eod> token and skips to another completely irrelevant topic, often ending up making up a non-existent Wikipedia article. But if I purge the activation of the <eod> token in the predicted logits before the sampling process, it can never generate that token and end the topic it was given.

I also found that purging the activations corresponding to bad tokens (those that tend to make everything after them catastrophic gibberish) also largely helps the quality of the generated text.

Even though the model was never good in the first place, I still managed to improve the generated text from complete garbage to acceptable text that reads as if it was written by someone drunk.
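The filtering described in this comment can be sketched as a simple logit mask applied before sampling. This is an illustrative sketch; the token id used below is hypothetical, and the real <eod> id depends on the SentencePiece model.

```python
import numpy as np

def purge_tokens(logits, banned_ids):
    """Set the logits of banned token ids (e.g. <eod>, known bad tokens)
    to -inf so the sampler can never pick them."""
    out = np.array(logits, dtype=float, copy=True)
    out[list(banned_ids)] = -np.inf
    return out

EOD_ID = 7  # illustrative id, not the real one
vocab_logits = np.zeros(32000)
filtered = purge_tokens(vocab_logits, [EOD_ID])
```

Because -inf becomes probability zero after the softmax, the banned tokens simply never appear, regardless of the sampling scheme used afterwards.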

OOM error on colab TPU when pretraining XLNet

I am sorry to bother you here with a problem about XLNet pretraining.

I saw your comment on the XLNet issues; you had the same error on a Colab TPU: Error recorded from outfeed: Bad hardware status: 0x1. I am now trying to pretrain XLNet on a Colab TPU and am hitting the same problem. I have also tried a minimal batch_size=16 but still get the error. So I want to ask whether you have solved the problem, and whether you can pretrain XLNet on a Colab TPU now.

Thanks!
