rusiaaman / xlnet-gen
XLNet for generating language.
License: MIT License
I'm trying to understand the top-p and top-k prediction modes in your code.
(I'm not proficient in TensorFlow, and I had never heard of top-p before.)
The top-k strategy samples a token from the K best logit scores.
The top-p strategy samples a token from the X logits whose score is greater than p.
Am I right?
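For comparison, here is a minimal sketch of how top-k and top-p (nucleus) sampling are usually defined. It is not taken from this repo's sample_token; the function names are hypothetical and it uses plain Python lists rather than TensorFlow tensors. The point it illustrates is that top-p is normally a threshold on cumulative probability: tokens are sorted by probability and kept until their running sum reaches p, rather than keeping every token whose individual score exceeds p.

```python
import math
import random


def softmax(logits):
    """Convert raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = ranked[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability
    reaches p, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in ranked:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def sample(dist, rng=random):
    """Draw one token id from a {token_id: probability} mapping."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

With probabilities [0.5, 0.3, 0.15, 0.05], top_k_filter(probs, 2) and top_p_filter(probs, 0.75) both keep tokens 0 and 1, while top_p_filter(probs, 0.9) also keeps token 2 because the first two tokens only cover 0.8 of the mass.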
In most text generation architectures, beam search improves quality by producing more natural text.
Is it useful to use beam search with XLNet?
As far as I understand, since tokens are generated one by one, beam search is completely useless.
But what about generating tokens two at a time? Would it be useful to add beam search then?
Are you going to try it?
Any idea about the error below? I get it when I try to execute this.
Instructions for updating:
Use keras.layers.dense instead.
Traceback (most recent call last):
File "language_generation.py", line 686, in <module>
main()
File "language_generation.py", line 591, in main
predictions, features = prediction_graph()
File "language_generation.py", line 518, in prediction_graph_no_memory
inp, inp_mask, seg_id, perm_mask, prev_tokens, prev_conf)
File "language_generation.py", line 504, in body
sampled_tokens, confidences = sample_token(logits)
File "language_generation.py", line 276, in sample_token
confidences = tf.gather_nd(params=probs, batch_dims=0, indices=samples)
TypeError: gather_nd() got an unexpected keyword argument 'batch_dims'
I tried removing the batch_dims argument, and then it fails again after the prompt. Any idea about this?
----PROMPT----
WARNING:tensorflow:From /Users/nitin/opt/anaconda3/envs/xlnet/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:429: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, use
tf.py_function, which takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
0%| | 0/1 [00:00<?, ?it/s]
2019-12-23 16:59:50.810097: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at gather_nd_op.cc:50 : Invalid argument: indices[0] = [4712] does not index into param shape [1,32000]
Looking forward to hearing from you.
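A possible reading of the second failure, sketched in plain Python rather than TensorFlow: without batch dimensions, gather_nd treats each index as a coordinate starting at the first axis of params, so [4712] is looked up along an axis of size 1 and is rejected. The gather_nd function below is a tiny stand-in written for illustration only, and with_batch_rows is a hypothetical helper; neither is code from this repo. Pairing each sampled token id with its batch row is one way around a missing batch_dims argument. In TensorFlow that pairing could presumably be built with tf.range and tf.stack, but that is an assumption that has not been tested against language_generation.py.

```python
def gather_nd(params, indices):
    """Pure-Python stand-in for gather_nd without batch dimensions.

    Each entry of `indices` is a coordinate list that is applied to
    `params` starting from its first axis.
    """
    out = []
    for index in indices:
        value = params
        for axis, i in enumerate(index):
            if not 0 <= i < len(value):
                raise IndexError(
                    f"index {list(index)} does not index into params at axis {axis}"
                )
            value = value[i]
        out.append(value)
    return out


def with_batch_rows(samples):
    """Turn [[token], ...] into [[row, token], ...] (hypothetical helper)."""
    return [[row, token[0]] for row, token in enumerate(samples)]


# Shapes mirror the log above: probs is [1, 32000], samples is [1, 1].
vocab_size = 32000
probs = [[0.0] * vocab_size]
probs[0][4712] = 0.25
samples = [[4712]]

# gather_nd(probs, samples) raises IndexError, like the TensorFlow error,
# because 4712 is applied to the batch axis of size 1.
confidences = gather_nd(probs, with_batch_rows(samples))
```

Here confidences holds one value per batch row, the probability of the token sampled for that row.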
Thank you for the code and for wading through all of those einsums to get things working!
I'm trying to generate some text using TensorFlow 2.0 and have thus far only been able to produce gibberish.
In order to get the code running with the 2.0 API I had to make a few minor modifications: 1) change tf -> tf.compat.v1 in each place the compiler complained, and 2) use tf.keras.layers.LayerNormalization instead of tf.contrib.layers.layer_norm in parts of the attention and ffn code, e.g.:
#output = tf.contrib.layers.layer_norm(output + inp, begin_norm_axis=-1, scope='LayerNorm')
ln = tf.keras.layers.LayerNormalization()
with tf.compat.v1.variable_scope("LayerNorm"):
    output = ln(output + inp)
The documentation for LayerNormalization implies the default behavior is equivalent to using begin_norm_axis=-1 so I don't think this is the issue.
Any ideas to remedy the gibberish?
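For anyone comparing the two layer-norm calls, here is a minimal sketch of what normalizing over the last axis computes, written in plain Python for a single vector. The function name and the epsilon values are illustrative assumptions, not the verified defaults of either API. It may be worth checking whether the two implementations use the same epsilon and whether the new layer's scale and offset variables end up with the names the pretrained checkpoint expects, since a mismatch in either could plausibly change the outputs.

```python
import math


def layer_norm(x, gamma=None, beta=None, epsilon=1e-12):
    """Normalize one vector over its last (only) axis.

    gamma and beta are the learned scale and offset; they default to
    1 and 0, which is what a freshly initialized layer would use.
    """
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n
    return [
        g * (v - mean) / math.sqrt(variance + epsilon) + b
        for v, g, b in zip(x, gamma, beta)
    ]


# Small activations make the effect of epsilon visible.
small = [0.01, 0.02, 0.03]
tight = layer_norm(small, epsilon=1e-12)  # epsilon negligible next to the variance
loose = layer_norm(small, epsilon=1e-3)   # epsilon dominates the variance
```

In this toy case the last element of tight is about 1.22, while the last element of loose is much smaller, so the choice of epsilon alone can shift the activations noticeably when the variance is small.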
Can you update the code to TensorFlow 2?
Thanks
Hi,
I tried using the Colab notebook you linked to in your README for this repo at:
https://colab.research.google.com/drive/12u-CmB9evMIASNOqJtDW26gmNvSgepBv
However, the last code cell raises an exception:
2021-01-17 19:47:05.890096: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
File "language_generation.py", line 16, in <module>
import model_utils
File "/content/XLNet-gen/model_utils.py", line 292, in <module>
class AdamWeightDecayOptimizer(tf.train.Optimizer):
AttributeError: module 'tensorflow._api.v2.train' has no attribute 'Optimizer'
----PROMPT----
Hello world, this is some sample text that you can use!!
WARNING:tensorflow:From /home/timisb/.local/lib/python3.6/site-packages/tensorflow/python/data/ops/dataset_ops.py:494: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.
WARNING:tensorflow:From language_generation.py:603: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
0%| | 0/1 [00:00<?, ?it/s]
Been stuck like this for 1 hour.
I run into the following error after inputting my prompt:
----PROMPT----
Today I plan to
WARNING:tensorflow:From language_generation.py:605: DatasetV1.make_one_shot_iterator (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `for ... in dataset:` to iterate over a dataset. If using `tf.estimator`, return the `Dataset` object directly from your input function. As a last resort, you can use `tf.compat.v1.data.make_one_shot_iterator(dataset)`.
0% 0/25 [00:00<?, ?it/s]2020-04-19 15:44:22.416928: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-04-19 15:44:22.756233: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
[libprotobuf FATAL /sentencepiece/src/../third_party/protobuf-lite/google/protobuf/repeated_field.h:1506] CHECK failed: (index) < (current_size_):
terminate called after throwing an instance of 'google::protobuf::FatalException'
what(): CHECK failed: (index) < (current_size_):
Any suggestions for how to fix/work around this?
Thanks for the code and the article!
I've tried running your code, and it works, but the inference time is very, very long: ~45 min for 1 sample...
I understand this is because we need to recompute all the attention of non-target tokens every 2 steps and it's running on CPU, but it is still too long.
Can I run this code on a GPU / Colab TPU?
If yes, how?
Hi,
Thanks for this repo. Text generation is my main interest, and I was wondering how the XLNet large model can be fine-tuned on new text and then used as a model in XLNet-gen via language_generation.py.
I can create a small base model from scratch using your repo, but I don't have the GPU power to train a large one.
Since the GPT-2 fine-tuning repo by nshepperd using the OpenAI 345M model is very easy to use, is it possible to use a similar process in XLNet-gen?
The fine-tuning examples given in the original XLNet repo don't seem applicable to text generation or easy to adapt for it.
Any suggestions or new scripts are welcome.
Thanks
I am sorry to bother you here with a problem about XLNet pretraining.
I saw your comment on the XLNet issues; you had the same error on a Colab TPU: Error recorded from outfeed: Bad hardware status: 0x1. I am currently trying to pretrain XLNet on a Colab TPU, and I am hitting the same problem. I have also tried a minimal batch_size=16, but I still get the error. So I want to ask whether you have solved the problem, and whether you can pretrain XLNet on a Colab TPU now?
Thanks!
So I've tried using the Colab notebook for conditional generation, but the results are nowhere near as long or as good in quality as the samples listed in the Medium article. They lose coherency very quickly and start repeating phrases or single words. What were the arguments used to create the samples?
This is not a problem, but rather a discovery.
I was working on a Japanese version of XLNet recently. Due to a lack of training data (nothing more than JPNWiki) and the complexity of the language itself, the model was never really good in terms of in-sample and held-out perplexity. But anyway, I decided to give it a try on language generation.
The discovery is that my model frequently tries to generate an eod token and skip to a completely irrelevant topic, often ending up inventing a non-existent Wikipedia article. But if I purge the activation of the eod token in the predicted logits before the sampling step, it can never generate that token and so cannot end the topic it was given.
I also found that purging the activations corresponding to bad tokens (those that tend to turn everything after them into catastrophic gibberish) largely helps the quality of the generated text.
Even though the model was never good in the first place, I still managed to improve the generated text from complete garbage to acceptable text that reads as if it was written by someone drunk.
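The purging step described above can be sketched roughly as follows. This is an illustration in plain Python, not the poster's code: the function names, the token ids and EOD_ID are all hypothetical. The idea is to set the logits of unwanted tokens to negative infinity before the softmax, so that they receive zero probability and can never be sampled.

```python
import math
import random

EOD_ID = 1  # hypothetical id of the eod token; depends on the vocabulary


def suppress_tokens(logits, banned_ids):
    """Return a copy of logits with banned token ids set to -inf."""
    banned = set(banned_ids)
    return [-math.inf if i in banned else x for i, x in enumerate(logits)]


def softmax(logits):
    """Convert logits to probabilities; -inf logits become exactly 0.0.

    Assumes at least one token is not banned (otherwise the max is -inf).
    """
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, banned_ids=(), rng=random):
    """Sample one token id after removing the banned tokens."""
    probs = softmax(suppress_tokens(logits, banned_ids))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Token 1 has the highest logit but is banned, so it is never drawn.
logits = [2.0, 5.0, 1.0]
token = sample_token(logits, banned_ids=[EOD_ID], rng=random.Random(0))
```

The same banned_ids list can carry any other "bad" token ids alongside the eod token.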