
xlnet-gen's Issues

Understand top-p vs top-k

I'm trying to understand the top-p and top-k prediction modes from your code.
(I'm not proficient in TensorFlow, and I had never heard of top-p before.)


The top-k strategy samples the next token from the K tokens with the best logit scores.

The top-p strategy samples the next token from the tokens whose logit score is greater than p.

Am I right?
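
For reference, here is my attempt at both strategies in NumPy, based on descriptions of nucleus sampling elsewhere; note the top-p part below filters on cumulative probability mass rather than a raw logit threshold, which may or may not match your code:

import numpy as np

def sample_next_token(logits, k=None, p=None):
    # Softmax over the vocabulary, shifted for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]           # token ids, most probable first
    if k is not None:
        keep = order[:k]                      # top-k: the k most probable tokens
    else:
        cum = np.cumsum(probs[order])
        cutoff = np.searchsorted(cum, p) + 1  # smallest prefix with mass >= p
        keep = order[:cutoff]
    kept = probs[keep] / probs[keep].sum()    # renormalize over the kept set
    return np.random.choice(keep, p=kept)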

Beam search usefulness?

In most text-generating architectures, beam search provides a quality improvement by producing more natural text.

Is it useful to use beam search with XLNet?


As far as I understand, since tokens are generated one by one, beam search is completely useless.
But what about generating tokens two by two? Would it be useful to add beam search then?

Are you going to try it?

TypeError: gather_nd() got an unexpected keyword argument 'batch_dims'

Any idea about the error below? I hit it when trying to execute the script:

 Instructions for updating:
 Use keras.layers.dense instead.
 Traceback (most recent call last):
   File "language_generation.py", line 686, in <module>
     main()
   File "language_generation.py", line 591, in main
     predictions, features = prediction_graph()
   File "language_generation.py", line 518, in prediction_graph_no_memory
     inp, inp_mask, seg_id, perm_mask, prev_tokens, prev_conf)
   File "language_generation.py", line 504, in body
     sampled_tokens, confidences = sample_token(logits)
   File "language_generation.py", line 276, in sample_token
     confidences = tf.gather_nd(params=probs, batch_dims=0, indices=samples)
 TypeError: gather_nd() got an unexpected keyword argument 'batch_dims'

I tried removing the batch_dims argument, but then it fails again after the prompt. Any idea about this?

----PROMPT----

WARNING:tensorflow:From /Users/nitin/opt/anaconda3/envs/xlnet/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
    tf.py_function, which takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    
  0%|                                                                                                  | 0/1 [00:00<?, ?it/s]
2019-12-23 16:59:50.810097: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[0] = [4712] does not index into param shape [1,32000]
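
Would building the full indices manually be the right workaround on older TF? A sketch, with shapes inferred from the error message (probs as [batch, vocab], samples as [batch, 1] with one sampled token id per row):

# Emulate a batched gather without the batch_dims kwarg: prepend each row's
# batch index so gather_nd picks (row, token) pairs. Shapes are assumptions
# based on the "[1,32000]" param shape in the error above.
batch_idx = tf.range(tf.shape(samples)[0])               # [batch]
batch_idx = tf.cast(batch_idx, samples.dtype)            # match index dtype
batch_idx = tf.expand_dims(batch_idx, axis=-1)           # [batch, 1]
full_indices = tf.concat([batch_idx, samples], axis=-1)  # [batch, 2]
confidences = tf.gather_nd(params=probs, indices=full_indices)  # [batch]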

Looking forward to hearing from you.

Gibberish with tensorflow 2.0.0b0

Thank you for the code, and for wading through all of those einsums to get things working!

I'm trying to generate some text using TensorFlow 2.0 and have so far only been able to produce gibberish.

To get the code running with the 2.0 API I had to make a few minor modifications: 1) change tf -> tf.compat.v1 everywhere the interpreter complained, and 2) use tf.keras.layers.LayerNormalization instead of tf.contrib.layers.layer_norm in parts of the attention and FFN code, e.g.:

# previously: output = tf.contrib.layers.layer_norm(output + inp, begin_norm_axis=-1, scope='LayerNorm')
ln = tf.keras.layers.LayerNormalization()   # defaults normalize over the last axis
with tf.compat.v1.variable_scope("LayerNorm"):
    output = ln(output + inp)               # residual add, then layer norm

The documentation for LayerNormalization implies the default behavior is equivalent to using begin_norm_axis=-1, so I don't think this is the issue.
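
One difference I haven't ruled out: if I remember correctly, contrib's layer_norm used a much smaller variance epsilon (1e-12) than the Keras layer's default (1e-3), so matching it explicitly might be worth a try (the 1e-12 value is from memory, treat it as an assumption):

# Match contrib's epsilon explicitly; 1e-12 is a recollection, not verified.
ln = tf.keras.layers.LayerNormalization(epsilon=1e-12)

It may also be worth checking that the new layer's gamma/beta actually get restored from the pretrained checkpoint, since Keras layers don't create their variables through the v1 variable_scope machinery.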

Any ideas to remedy the gibberish?

Colab notebook doesn't work

Hi,
I tried using the Colab notebook you linked to in your README for this repo at:
https://colab.research.google.com/drive/12u-CmB9evMIASNOqJtDW26gmNvSgepBv

However, the last code cell raises an exception:

2021-01-17 19:47:05.890096: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "language_generation.py", line 16, in <module>
    import model_utils
  File "/content/XLNet-gen/model_utils.py", line 292, in <module>
    class AdamWeightDecayOptimizer(tf.train.Optimizer):
AttributeError: module 'tensorflow._api.v2.train' has no attribute 'Optimizer'
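
From a quick look, tf.train.Optimizer no longer exists under TF 2.x; would pointing the subclass at the compat alias be the right fix? An untested sketch of the change in model_utils.py:

# The v1 optimizer base class is still reachable through the compat module.
class AdamWeightDecayOptimizer(tf.compat.v1.train.Optimizer):
    ...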

Nothing happens when generating with the ----PROMPT----

----PROMPT----
Hello world, this is some sample text that you can use!!   
WARNING:tensorflow:From /home/timisb/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:494: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
WARNING:tensorflow:From language_generation.py:603: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
  0%|                                                                                                                                               | 0/1 [00:00<?, ?it/s]

It has been stuck like this for an hour.

Protobuf error after inputting prompt

I run into the following error after inputting my prompt:

----PROMPT----
Today I plan to
WARNING:tensorflow:From language_generation.py:605: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
  0% 0/25 [00:00<?, ?it/s]2020-04-19 15:44:22.416928: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-19 15:44:22.756233: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
[libprotobuf FATAL /sentencepiece/src/../third_party/protobuf-lite/google/protobuf/repeated_field.h:1506] CHECK failed: (index) < (current_size_): 
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: (index) < (current_size_): 

Any suggestions for how to fix/work around this?
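
My best guess (untested) is that sentencepiece's DecodeIds is being handed a token id outside the vocabulary (e.g. a special id), which trips the repeated_field CHECK. A sketch of a filter that would rule this out; the model path and the ids variable are assumptions:

import sentencepiece as spm

sp = spm.SentencePieceProcessor()
sp.Load("spiece.model")                              # path is an assumption
vocab_size = sp.GetPieceSize()
safe_ids = [i for i in ids if 0 <= i < vocab_size]   # ids = sampled token ids
text = sp.DecodeIds(safe_ids)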

Inference time

Thanks for the code and the article!

I've tried running your code, and it works, but the inference time is very, very long: ~45 minutes for one sample...

I understand this is because all the attention over non-target tokens has to be recomputed every two steps, and it's running on CPU, but it is still too long.

Can I run this code on a GPU / Colab TPU?
If yes, how?
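
For what it's worth, a quick way to check whether this TF build sees a GPU at all (standard TF 1.x API; the exact tensorflow-gpu version below is an assumption):

from tensorflow.python.client import device_lib

# Lists e.g. '/device:CPU:0' and, if available, '/device:GPU:0'.
print([d.name for d in device_lib.list_local_devices()])

# If only the CPU shows up, installing the GPU build, e.g.
#   pip install tensorflow-gpu==1.13.1
# should let the same graph run on a GPU without code changes.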

How to fine-tune the XLNet large model on new text using XLNet-gen?

Hi,

Thanks for this repo. Text generation is my main interest, and I was wondering how the XLNet large model can be fine-tuned on new text and then used as a model in XLNet-gen via language_generation.py.

I can create a small base model from scratch using your repo, but I don't have the GPU power to generate a large one.

Since nshepperd's GPT-2 fine-tuning repo, which uses the OpenAI 345M model, is very easy to use, is a similar process possible with XLNet-gen?

The fine-tuning examples in the original XLNet repo don't seem applicable, or easy to adapt, for text generation.

Any suggestions or new scripts are welcome.

Thanks

OOM error on colab TPU when pretraining XLNet

I'm sorry to bother you here with a problem about XLNet pretraining.

I saw your comment on the XLNet issues saying you had the same error on a Colab TPU: Error recorded from outfeed: Bad hardware status: 0x1. I'm now trying to pretrain XLNet on a Colab TPU and am running into the same problem. I've also tried with a minimal batch_size of 16 but still get the error. Have you solved the problem, and can you pretrain XLNet on a Colab TPU now?

Thanks!

Colab notebook results vs. samples?

So I've tried using the Colab notebook for conditional generation, but the results are nowhere near as long or as high quality as the samples listed in the Medium article. They seem to lose coherence very quickly and start repeating phrases or single words. What were the arguments used to create the samples?

Filtering in token sampling process

This is not a problem, but rather a discovery.

I was working on a Japanese version of XLNet recently. Due to a lack of training data (nothing more than Japanese Wikipedia) and the complexity of the language itself, the model was never really good in terms of in-sample and held-out perplexity. But I decided to give it a try at language generation anyway.

The discovery is that my model would frequently try to generate an eod token and skip to a completely irrelevant topic, often ending up fabricating a non-existent Wikipedia article. But if I purge the activation of the eod token from the predicted logits before the sampling process, it can never generate that token and abandon the topic it was given.

I also found that purging the activations of bad tokens (those that tend to turn everything after them into catastrophic gibberish) likewise greatly improves the quality of the generated text.
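
Concretely, the purge is just a logit mask applied before sampling. A minimal NumPy sketch; eod_id and bad_ids are placeholders for whatever token ids should be suppressed:

import numpy as np

def purge(logits, banned_ids):
    # Set banned logits to -inf so softmax assigns them zero probability.
    logits = np.asarray(logits, dtype=np.float64).copy()
    logits[banned_ids] = -np.inf
    return logits

# e.g. before the sampling step:
# logits = purge(logits, [eod_id] + bad_ids)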

Even though the model was never good in the first place, I still managed to improve the generated text from complete garbage to acceptable text that reads as if it were written by someone drunk.
