
gpt2's People

Contributors

connorjl, tbfly

gpt2's Issues

Docker documentation for CUDA

EDIT: I just realized this is the wrong repository for this issue. deepai-org, who forked your project and published this Docker image, has issue reporting disabled, so feel free to close this if you want.

Please add these CUDA-specific notes to the Docker page, or correct me if I missed them somewhere.

The requirement is to install nvidia-docker.

With the latest toolkit, the CUDA-enabled docker run command is:

docker run --gpus all --rm -it -e MODE=http -p 5000:5000 deepaiorg/gpt2

It might also help others to include the (probably common) error people get when trying to run the image without enabling CUDA:

[root@machine ~]# docker run --rm -it -e MODE=http -p 5000:5000 deepaiorg/gpt2
Unable to find image 'deepaiorg/gpt2:latest' locally
latest: Pulling from deepaiorg/gpt2
Digest: sha256:805592697648bd7e83ea558d071b3db3e486553e32d5622b56c74f6da97cb0a8
Status: Downloaded newer image for deepaiorg/gpt2:latest
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 4, in <module>
    import tensorflow as tf
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.5/imp.py", line 242, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.5/imp.py", line 342, in load_dynamic
    return _load(spec)
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
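
The final ImportError means the dynamic loader inside the container could not find the NVIDIA driver library, which is what the --gpus all flag is meant to expose. As a rough diagnostic only, a check along these lines could be run inside the container before importing TensorFlow. The function name cuda_driver_visible is made up for illustration, and the lookup may miss libraries installed outside the standard search paths.

```python
import ctypes
import ctypes.util


def cuda_driver_visible() -> bool:
    """Return True if the NVIDIA driver library (libcuda) can be loaded."""
    # find_library searches the usual linker locations for "libcuda".
    name = ctypes.util.find_library("cuda")
    # Also try the exact soname from the traceback if the search fails.
    candidates = [name] if name else []
    candidates.append("libcuda.so.1")
    for candidate in candidates:
        try:
            ctypes.CDLL(candidate)
            return True
        except OSError:
            continue
    return False


if __name__ == "__main__":
    if cuda_driver_visible():
        print("libcuda found: the driver is visible in this environment")
    else:
        print("libcuda missing: try starting the container with --gpus all")
```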

DOCKER: Web interface doesn't work

EDIT: I just realized this is the wrong repository for this issue. deepai-org, who forked your project and published this Docker image, has issue reporting disabled, so feel free to close this if you want.

I apologize if there's a better place to report docker image problems.

After I put a value in the textbox and hit the upload/submit button, the result is always the same: the console shows a thrown exception and the web interface has no visible reaction. Every type of input I tried fails this way: raw text, JSON (like the piped input example), and raw text encoded as base64.

The exception follows:

172.17.0.1 - - [13/Jan/2020 02:56:03] "POST / HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 2463, in __call__
    return self.wsgi_app(environ, start_response)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 2449, in wsgi_app
    response = self.handle_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1866, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 2446, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1951, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1820, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.5/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1949, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.5/dist-packages/flask/app.py", line 1935, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.5/dist-packages/ai_integration/modes/http.py", line 41, in hello
    'url': 'data:' + 'text/plain' + ';base64,' + base64.b64encode(inputs_dict[key]).decode("utf-8")
  File "/usr/lib/python3.5/base64.py", line 59, in b64encode
    encoded = binascii.b2a_base64(s)[:-1]
TypeError: a bytes-like object is required, not 'str'
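
The TypeError at the bottom of the traceback says that base64.b64encode received a str; in Python 3 it accepts only bytes-like objects. The sketch below shows one way the data URL from line 41 of http.py could be built so that both str and bytes inputs work. The helper name to_data_url is hypothetical and is not part of ai_integration.

```python
import base64


def to_data_url(value, mime_type="text/plain"):
    """Build a base64 data URL from either str or bytes input."""
    # b64encode only accepts bytes-like objects, so encode str first.
    if isinstance(value, str):
        value = value.encode("utf-8")
    encoded = base64.b64encode(value).decode("utf-8")
    return "data:" + mime_type + ";base64," + encoded


if __name__ == "__main__":
    print(to_data_url("Hello"))  # data:text/plain;base64,SGVsbG8=
```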

Downloading Encoder Model fails

Hi, could someone please provide me with the pre-trained encoder? There seems to be an issue with the GCP account. When I run python3 download_model.py encoder, I get:

<?xml version='1.0' encoding='UTF-8'?><Error><Code>UserProjectAccountProblem</Code><Message>User project billing account not in good standing.</Message><Details>The billing account for project 916430819220 is disabled in state delinquent</Details></Error>$

Question about the metric reported in the paper?

Hello! I am new to NLP, and I am confused about the pipeline (pretrain -> fine-tune -> test) for pre-training large language models.

  1. Which stage of the model used the unlabeled dataset (e.g., WebText), and which used the labeled datasets (e.g., LAMBADA, CoQA, CNN and Daily Mail)?
    Was the GPT-2 model pre-trained on the unlabeled dataset and then fine-tuned on each labeled dataset (e.g., LAMBADA, CoQA, CNN and Daily Mail) before the scores in the paper were reported?
  2. Were other large language models, such as BART, RoBERTa, and MASS, fine-tuned on the labeled datasets (e.g., LAMBADA, CoQA, CNN and Daily Mail) before their scores were reported?

Thank you!

when reading metadata of gs://openwebtext/stuff/encoder/encoder.json

I get an error when executing this command:

$ python3 main.py --model 345M.json --predict_text "Hello World. Hello there! My name"

The output is below:
{'n_head': 16, 'encoder_path': 'gs://openwebtext/stuff/encoder', 'n_vocab': 50257, 'embed_dropout': 0.1, 'lr': 0.00025, 'warmup_steps': 2000, 'weight_decay': 0.01, 'beta1': 0.9, 'beta2': 0.98, 'epsilon': 1e-09, 'opt_name': 'adam', 'train_batch_size': 8, 'attn_dropout': 0.1, 'train_steps': 10000, 'eval_steps': 10, 'max_steps': 500000, 'data_path': 'gs://connors-datasets/openwebtext/', 'res_dropout': 0.1, 'predict_batch_size': 8, 'eval_batch_size': 8, 'iterations': 500, 'n_embd': 1024, 'input': 'openwebtext', 'model': 'GPT2', 'model_path': 'gs://connors-models/GPT2-345M', 'n_ctx': 1024, 'predict_path': 'logs/predictions.txt', 'n_layer': 24, 'scale_by_depth': True, 'scale_by_in': True, 'use_tpu': False, 'precision': 'float32'}
2019-10-21 12:38:38.103626: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.159809 seconds (attempt 1 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:38.272828: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.053047 seconds (attempt 2 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:38.370688: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.050504 seconds (attempt 3 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:38.433094: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.564422 seconds (attempt 4 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:39.022315: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 0.256678 seconds (attempt 5 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:39.300586: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.24113 seconds (attempt 6 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:40.675821: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.13431 seconds (attempt 7 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:41.867547: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.20263 seconds (attempt 8 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:43.087045: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.05564 seconds (attempt 9 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:44.151391: I tensorflow/core/platform/cloud/retrying_utils.cc:73] The operation failed and will be automatically retried in 1.43831 seconds (attempt 10 out of 10), caused by: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'
2019-10-21 12:38:45.596157: W tensorflow/core/platform/cloud/google_auth_provider.cc:157] All attempts to get a Google authentication bearer token failed, returning an empty token. Retrieving token from files failed with "Not found: Could not locate the credentials file.". Retrieving token from GCE failed with "Aborted: All 10 retry attempts failed. The last failure: Unavailable: Error executing an HTTP request: libcurl code 6 meaning 'Couldn't resolve host name', error details: Couldn't resolve host 'metadata'".
Traceback (most recent call last):
  File "main.py", line 118, in <module>
    enc = encoder.get_encoder(params["encoder_path"])
  File "/home/kiran1/KiranResearch/TextSummerization/GPT2/models/gpt2/encoder.py", line 111, in get_encoder
    encoder = json.load(f)
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 128, in read
    length = self.size() - self.tell()
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 104, in size
    return stat(self.__name).length
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 735, in stat
    return stat_v2(filename)
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/lib/io/file_io.py", line 754, in stat_v2
    return file_statistics
  File "/home/kiran1/anaconda3/envs/tf_gpu/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.PermissionDeniedError: Error executing an HTTP request: HTTP response code 401 with body '{
"error": {
"code": 401,
"message": "Anonymous caller does not have storage.objects.get access to openwebtext/stuff/encoder/encoder.json.",
"errors": [
{
"message": "Anonymous caller does not have storage.objects.get access to openwebtext/stuff/encoder/encoder.json.",
"domain": "global",
"reason": "required",
"locationType": "header",
"location": "Authorization"
}
]
}
}
'
when reading metadata of gs://openwebtext/stuff/encoder/encoder.json
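
The 401 response indicates that the bucket does not allow anonymous reads, so with encoder_path set to gs://openwebtext/stuff/encoder the script cannot fetch encoder.json. A possible workaround is to point the configuration at local copies instead. The sketch below rewrites gs:// values in a params dictionary to a local directory. The helper name localize_gs_paths and the directory layout are assumptions, and it presumes the files have already been downloaded to that directory.

```python
import os


def localize_gs_paths(params, local_root, keys=("encoder_path", "model_path")):
    """Return a copy of params with gs:// paths replaced by local ones."""
    updated = dict(params)
    for key in keys:
        value = updated.get(key)
        if isinstance(value, str) and value.startswith("gs://"):
            # Keep only the last path component, e.g. "encoder" or "GPT2-345M".
            leaf = value.rstrip("/").split("/")[-1]
            updated[key] = os.path.join(local_root, leaf)
    return updated


if __name__ == "__main__":
    params = {
        "encoder_path": "gs://openwebtext/stuff/encoder",
        "model_path": "gs://connors-models/GPT2-345M",
        "n_ctx": 1024,
    }
    print(localize_gs_paths(params, "models"))
```

The original dictionary is left untouched, so the same config file can still be used on a machine that does have bucket access.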

Your 1.5B model

Seeing as OpenAI has released theirs, and those other researchers released theirs even earlier, I would like to see yours for research and comparison. Thank you.

How to process raw text files to create similar "PrettyBig" model?

Thanks for the repo. I have sampling working fine with your "PrettyBig" model.

I would like to generate my own dataset from 6 GB of raw, header-free Gutenberg text files, and I was wondering how this can be done using datasets/create_tfrecords.py.

Using tar, I created "RS_2017-04-4_data.xz" from the raw text files and placed it in "openwebtext/RS_2017-04-4_data.xz".

I edited one of your .json files to include the paths in the required "files.json" (# This file should contain paths to all your RS_--_data. files).

I then ran create_tfrecords.py, which created the parse/RS_2017-04 folders.

90 minutes later, the terminal showed:

Parsing chunk 1 took 54.41039276123047 seconds
-- 0.0% of chunk 1's docs yielded text.
Saving chunk 1 took 1.6689300537109375e-06 seconds
Parsing chunk 2 took 49.19901156425476 seconds
-- 0.0% of chunk 2's docs yielded text.
Saving chunk 2 took 1.1920928955078125e-06 second

... parse/RS_2017-04 is still empty

I stopped at this point because I assume something is wrong. Any suggestions on how I can prepare a model similar to "PrettyBig" from standard raw text files?

Cheers,

P.S. Do you plan on releasing the 1.7 model?

Training problem

@ConnorJL Thanks for the great work.

Unfortunately, my training on OpenWebTextCorpus converges too slowly, even for the 117M model. With a batch size of 64, the cross-entropy loss decreases rapidly during the first 10k steps and then stays around 3.0. Is this a known phenomenon, or is it a dataset problem? I also noticed that the loss function in model_fns does not shift the labels relative to the logits. Shouldn't it be loss_batch = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=output["logits"][:, :-1], labels=features[:, 1:])?
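
The proposed fix pairs the logits at position t with the token at position t + 1, which is the usual next-token objective. The sketch below illustrates that pairing with plain Python lists rather than TensorFlow tensors. The helper names are hypothetical, and it does not settle whether the repository's input pipeline already supplies shifted targets.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for one position (logits: list of floats)."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[label]


def shifted_lm_loss(logits, tokens):
    """Mean next-token loss: logits[t] is scored against tokens[t + 1]."""
    # Mirrors logits[:, :-1] versus features[:, 1:] for a single sequence.
    pairs = zip(logits[:-1], tokens[1:])
    losses = [cross_entropy(row, target) for row, target in pairs]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    tokens = [0, 2, 1]
    # One row of vocabulary scores per input position (vocabulary size 3).
    logits = [[0.0, 0.0, 5.0], [0.0, 5.0, 0.0], [1.0, 1.0, 1.0]]
    print(shifted_lm_loss(logits, tokens))
```

In this toy input each row strongly favors the token that follows it, so the shifted loss is small; scoring each row against its own input token instead would give a much larger value.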

Predicting with PrettyBigModel `InvalidArgumentError: indices[0,0] = 1024 is not in [0, 1024)`

Hi, I was interested in testing your PrettyBig model. I've downloaded the model and edited the PrettyBig.json to point to the downloaded encoder and model paths. When running:

python3 main.py --model PrettyBig.eval.json --predict_text "Hello there! My name is"

I get the following error:

{'n_head': 16, 'encoder_path': '/Users/pkmital/freelance/pkm/gpt-2/gpt-1.5b/encoder', 'n_vocab': 50257, 'embed_dropout': 0.0, 'lr': 0.00025, 'warmup_steps': 2000, 'weight_decay': 0.01, 'beta1': 0.9, 'beta2': 0.98, 'epsilon': 1e-09, 'opt_name': 'adam', 'train_batch_size': 256, 'attn_dropout': 0.0, 'train_steps': 10000, 'eval_steps': 10, 'max_steps': 604800, 'data_path': 'gs://connors-datasets/openwebtext/', 'scale': 0.14433756729740646, 'res_dropout': 0.1, 'predict_batch_size': 1, 'eval_batch_size': 256, 'iterations': 100, 'n_embd': 1024, 'input': 'openwebtext_longbiased', 'model': 'GPT2', 'model_path': '/Users/pkmital/freelance/pkm/gpt-2/gpt-1.5b/PrettyBig', 'n_ctx': 1024, 'predict_path': 'logs/predictions_SortaBig.txt', 'n_layer': 25, 'use_tpu': False, 'precision': 'float32'}
Using config: {'_model_dir': '/Users/pkmital/freelance/pkm/gpt-2/gpt-1.5b/PrettyBig', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x13fbf8ef0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Generating predictions...
From /Users/pkmital/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Calling model_fn.
From /Users/pkmital/freelance/pkm/gpt-2/gpt-1.5b/models/gpt2/sample.py:57: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
From /Users/pkmital/freelance/pkm/gpt-2/gpt-1.5b/models/gpt2/sample.py:59: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.random.categorical instead.
Done calling model_fn.
Graph was finalized.
2019-06-08 15:55:47.498527: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
From /Users/pkmital/anaconda3/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Restoring parameters from /Users/pkmital/freelance/pkm/gpt-2/gpt-1.5b/PrettyBig/model.ckpt
Running local_init_op.
Done running local_init_op.
Traceback (most recent call last):
  File "/Users/pkmital/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/Users/pkmital/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/Users/pkmital/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 1024 is not in [0, 1024)
         [[{{node sample_sequence/while/model/GatherV2_1}}]]
python3 --version
Python 3.6.8 :: Anaconda, Inc.
pip3 list | grep tensorflow
mesh-tensorflow                    0.0.5
tensorflow                         1.13.1
tensorflow-datasets                1.0.1
tensorflow-estimator               1.13.0
tensorflow-metadata                0.13.0
tensorflow-probability             0.6.0

Any ideas appreciated. Thanks!
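
The message "indices[0,0] = 1024 is not in [0, 1024)" is what a gather operation reports when an index equals the size of the table it reads from. With n_ctx set to 1024, one plausible reading (an assumption, not something confirmed in this thread) is that sampling asked for position 1024 of a 1024-entry position table, meaning that the prompt plus the generated tokens exceeded the context window. Under that assumption, the generation length could be capped as sketched below; max_new_tokens is a hypothetical helper, not a function from the repository.

```python
def max_new_tokens(prompt_length, n_ctx, requested):
    """Cap the number of generated tokens so positions stay below n_ctx."""
    if prompt_length >= n_ctx:
        raise ValueError("prompt already fills the context window")
    # Valid positions are 0 .. n_ctx - 1, so at most n_ctx tokens in total.
    return min(requested, n_ctx - prompt_length)


if __name__ == "__main__":
    # A 6-token prompt with n_ctx = 1024 leaves room for 1018 new tokens.
    print(max_new_tokens(prompt_length=6, n_ctx=1024, requested=1024))
```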

Input is Chinese, but the prediction is Japanese

Hello Connor Leahy. Thank you very much for your excellent model project. It's very cool, and I'm even happier and more excited that there will be more open source models in the future. However, I'm a Chinese user. When I give the PrettyBig model Chinese input, the predictions come out in Japanese. Is there a way to support Chinese?

Retraining a new model, only GPU 0 is used

My batch size:
"train_batch_size": 4,

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM3... Off | 00000000:34:00.0 Off | 0 |
| N/A 62C P0 349W / 350W | 30630MiB / 32480MiB | 100% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM3... Off | 00000000:36:00.0 Off | 0 |
| N/A 28C P0 70W / 350W | 428MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM3... Off | 00000000:39:00.0 Off | 0 |
| N/A 37C P0 71W / 350W | 428MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM3... Off | 00000000:3B:00.0 Off | 0 |
| N/A 57C P0 75W / 350W | 428MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla V100-SXM3... Off | 00000000:57:00.0 Off | 0 |
| N/A 27C P0 68W / 350W | 428MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla V100-SXM3... Off | 00000000:59:00.0 Off | 0 |
| N/A 36C P0 67W / 350W | 428MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla V100-SXM3... Off | 00000000:5C:00.0 Off | 0 |
| N/A 30C P0 66W / 350W | 428MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla V100-SXM3... Off | 00000000:5E:00.0 Off | 0 |
| N/A 38C P0 69W / 350W | 428MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 8 Tesla V100-SXM3... Off | 00000000:B7:00.0 Off | 0 |
| N/A 30C P0 66W / 350W | 428MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 9 Tesla V100-SXM3... Off | 00000000:B9:00.0 Off | 0 |
| N/A 30C P0 66W / 350W | 428MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 10 Tesla V100-SXM3... Off | 00000000:BC:00.0 Off | 0 |
| N/A 36C P0 68W / 350W | 428MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 11 Tesla V100-SXM3... Off | 00000000:BE:00.0 Off | 0 |
| N/A 38C P0 68W / 350W | 428MiB / 32480MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 12 Tesla V100-SXM3... Off | 00000000:E0:00.0 Off | 0 |
| N/A 30C P0 66W / 350W | 428MiB / 32480MiB | 0% Default |

Unable to predict with bfloat16 model

I can train a bfloat16 model, but prediction on either GPU or CPU fails with a missing bfloat16 kernel for the 'Rsqrt' op. Have you been able to predict using bfloat16 models?

Also, would it be possible to do batch gradient averaging to simulate larger batch size on TPU without requiring more memory?
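
The idea in the second question is usually called gradient accumulation: compute gradients on several small micro-batches, average them, and apply a single update. The toy sketch below uses a one-parameter model y = w * x in plain Python and shows that the average over equally sized micro-batches matches the full-batch gradient. All names are illustrative, and it does not address how this would be wired into the TPU training loop of this repository.

```python
def mse_gradient(w, batch):
    """Gradient of mean squared error for the model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, batch, micro_batch_size):
    """Average the gradients of equally sized micro-batches."""
    chunks = [
        batch[i:i + micro_batch_size]
        for i in range(0, len(batch), micro_batch_size)
    ]
    # Only one micro-batch needs to be processed at a time.
    total = 0.0
    for chunk in chunks:
        total += mse_gradient(w, chunk)
    return total / len(chunks)


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
    full = mse_gradient(0.5, data)
    accumulated = accumulated_gradient(0.5, data, micro_batch_size=2)
    print(full, accumulated)
```

The equality relies on every micro-batch having the same size; with uneven chunks the average would need to be weighted by chunk length.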

Are there some research papers about text-to-set generation?

I know this question is a little off-topic, but an answer would be helpful to me. Thank you.

Text-to-(word)set generation or sequence-to-(token)set generation.

For example, input a text and then output the tags for this text:

'Peter is studying English' --> {'good behavior','person','doing something'}

Thank you!

Training on artificial language data (server logs, medical records, etc.)

Hi and thank you for your amazing work! I would like to train GPT-2 in Colab TPU on non-natural language sequential categorical data like server logs, medical records or weather events. What do I have to change in your code to prepare a dataset with word-level encoding (instead of BPE) and successfully run training?
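
As a starting point for the word-level encoding mentioned above, a minimal encoder could map each whitespace-separated token to an integer id. The class below is a sketch under that assumption; WordLevelEncoder and the "<unk>" token are invented for illustration, and how the repository's dataset scripts would need to change to use it is not covered here.

```python
class WordLevelEncoder:
    """Map whitespace-separated tokens to integer ids and back."""

    def __init__(self, corpus_lines, unknown_token="<unk>"):
        self.unknown_token = unknown_token
        vocabulary = {unknown_token}
        for line in corpus_lines:
            vocabulary.update(line.split())
        # Sort for a deterministic id assignment across runs.
        self.token_to_id = {tok: i for i, tok in enumerate(sorted(vocabulary))}
        self.id_to_token = {i: tok for tok, i in self.token_to_id.items()}

    def encode(self, text):
        unknown_id = self.token_to_id[self.unknown_token]
        return [self.token_to_id.get(tok, unknown_id) for tok in text.split()]

    def decode(self, ids):
        return " ".join(self.id_to_token[i] for i in ids)


if __name__ == "__main__":
    logs = ["GET /index 200", "POST /login 403"]
    encoder = WordLevelEncoder(logs)
    ids = encoder.encode("GET /login 200")
    print(ids, encoder.decode(ids))
    # The vocabulary size is what n_vocab in the model config would be set to.
    print(len(encoder.token_to_id))
```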

P.S. I think it would be very useful for the community to have a quick tutorial section on this in the README.

Thank you!

Training 1.5B?

Hello,

I was wondering if you were able to train the 1.5B model or the large model on TPUs? Afaik it's too large to fit.
I would really like to know if you did succeed. Thanks.

A meaningful performance comparison with OpenAI's models

Hi Connor,

I'd like to see a meaningful comparison with OpenAI's released and, if possible, unreleased pretrained GPT-2 models.

My concern is that if you used different training techniques, the result may be very far from what they got. It is even possible that the 1.5B model is worse than the 345M model they have released.

P.S. Also pinged you on Twitter about this.

quirks that hold the model back

In Addendum: Evaluation of My Model you mention:

Although I used the same amount of hardware (or more), the differences in my training setup and hyperparameters made a significant difference. Which is an unfortunate reality to anyone familiar with reproducing deep learning papers. I don’t think my model in its current state is even as dangerous as 117M in its text generating abilities. But I believe to have found the quirks in my setup that have held the model back, and they are easy to fix.

Are you willing to elaborate on this, and describe or fix the quirks? I think it would be really interesting/informative/useful for students of deep learning as a case study, showing how small non-obvious changes can make a big difference. Please consider doing so :) Thank you.

Error on output

After about 5 hours, I think I am throwing in the towel here. I have run all the commands as noted. I am running this in Google Colab, where I have tested a few other systems. Everything works until I try to run this last command:

!python3 main.py --model 1.5B.json [--top_k Top-K-Truncation] --predict_text "Hello there! My name is"

Error:

2020-10-16 23:42:00.183472: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "main.py", line 12, in <module>
    from model_fns import *
  File "/home/GPT2/model_fns.py", line 6, in <module>
    from optimizers import create_train_op
  File "/home/GPT2/optimizers.py", line 100, in <module>
    class AdafactorOptimizer(tf.train.Optimizer):
AttributeError: module 'tensorflow._api.v2.train' has no attribute 'Optimizer'
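
The AttributeError indicates that TensorFlow 2.x is installed: tf.train.Optimizer exists in TensorFlow 1.x, while in 2.x the old class is available as tf.compat.v1.train.Optimizer. One option is to install a 1.x release; another is to look up the base class in a version-tolerant way, as sketched below. The helper resolve_optimizer_base is hypothetical, the demo uses a stand-in object so that it runs without TensorFlow, and changing the base class alone may not be enough to make the rest of the code run under 2.x.

```python
from types import SimpleNamespace


def resolve_optimizer_base(tf_module):
    """Return the TF1-style Optimizer base class from a tensorflow module."""
    # TensorFlow 1.x exposes the class directly as tf.train.Optimizer.
    base = getattr(tf_module.train, "Optimizer", None)
    if base is None:
        # TensorFlow 2.x keeps the old class under the compat.v1 namespace.
        base = tf_module.compat.v1.train.Optimizer
    return base


if __name__ == "__main__":
    # Stand-in object shaped like TensorFlow 2.x, so the sketch runs without TF.
    class LegacyOptimizer:
        pass

    fake_tf2 = SimpleNamespace(
        train=SimpleNamespace(),
        compat=SimpleNamespace(
            v1=SimpleNamespace(train=SimpleNamespace(Optimizer=LegacyOptimizer))
        ),
    )
    print(resolve_optimizer_base(fake_tf2) is LegacyOptimizer)
```

With a real installation, the call would be resolve_optimizer_base(tf) after import tensorflow as tf.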

Has anyone managed to work it on Windows? Which OS did you use to make it work?

I have Windows 10 x64, a Core i7 2600K CPU, 32 GB of RAM, and a GTX 1050 Ti GPU.

I have installed the latest Python and TensorFlow.

I also ran these commands:

1) pip3 install tensorflow-gpu regex

2) pip3 install requests tqdm

3) cd GPT2 folder (cloned via bash)

4) python download_model.py PrettyBig

I believe everything is ready; however, I am not able to make it work.

Here are my configuration and the errors I am getting:

Main folder

image

PrettyBig folder

image

PrettyBig.json - file paths are correct and working

image

Here is the command line I used:

C:\GPT2>python main.py --model PrettyBig.json --predict_text "Pikachu"

At first it runs for several minutes at around 70% CPU usage and more than 2 GB of RAM.

Here is the full output of the above command:

C:\GPT2>python main.py --model PrettyBig.json --predict_text "Pikachu"
{'n_head': 16, 'encoder_path': 'C:\GPT2\encoder', 'n_vocab': 50257, 'embed_dropout': 0.0, 'lr': 0.00025, 'warmup_steps': 2000, 'weight_decay': 0.01, 'beta1': 0.9, 'beta2': 0.98, 'epsilon': 1e-09, 'opt_name': 'adam', 'train_batch_size': 256, 'attn_dropout': 0.0, 'train_steps': 10000, 'eval_steps': 10, 'max_steps': 604800, 'data_path': 'gs://connors-datasets/openwebtext/', 'scale': 0.14433756729740646, 'res_dropout': 0.1, 'predict_batch_size': 1, 'eval_batch_size': 256, 'iterations': 100, 'n_embd': 1024, 'input': 'openwebtext_longbiased', 'model': 'GPT2', 'model_path': 'C:\GPT2\PrettyBig', 'n_ctx': 1024, 'predict_path': 'logs/predictions_SortaBig.txt', 'n_layer': 25, 'use_tpu': False, 'precision': 'float32'}
Using config: {'_model_dir': 'C:\GPT2\PrettyBig', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x0000016DD33ECEB8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Generating predictions...
From C:\Python37\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Calling model_fn.
From C:\GPT2\models\gpt2\sample.py:57: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
From C:\GPT2\models\gpt2\sample.py:59: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.random.categorical instead.
Done calling model_fn.
Graph was finalized.
From C:\Python37\lib\site-packages\tensorflow\python\training\saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Restoring parameters from C:\GPT2\PrettyBig\model.ckpt
Running local_init_op.
Done running local_init_op.
Traceback (most recent call last):
File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
return fn(*args)
File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 1024 is not in [0, 1024)
[[{{node sample_sequence/while/model/GatherV2_1}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "main.py", line 131, in <module>
predict_fn(network, text, params)
File "C:\GPT2\predict_fns.py", line 18, in gpt2_predict
for i, p in enumerate(predictions):
File "C:\Python37\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 629, in predict
preds_evaluated = mon_sess.run(predictions)
File "C:\Python37\lib\site-packages\tensorflow\python\training\monitored_session.py", line 676, in run
run_metadata=run_metadata)
File "C:\Python37\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1171, in run
run_metadata=run_metadata)
File "C:\Python37\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1270, in run
raise six.reraise(*original_exc_info)
File "C:\Python37\lib\site-packages\six.py", line 693, in reraise
raise value
File "C:\Python37\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1255, in run
return self._sess.run(*args, **kwargs)
File "C:\Python37\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1327, in run
run_metadata=run_metadata)
File "C:\Python37\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1091, in run
return self._sess.run(*args, **kwargs)
File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
run_metadata_ptr)
File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
run_metadata)
File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 1024 is not in [0, 1024)
[[node sample_sequence/while/model/GatherV2_1 (defined at C:\GPT2\models\gpt2\gpt2.py:208) ]]

Caused by op 'sample_sequence/while/model/GatherV2_1', defined at:
File "main.py", line 131, in <module>
predict_fn(network, text, params)
File "C:\GPT2\predict_fns.py", line 18, in gpt2_predict
for i, p in enumerate(predictions):
File "C:\Python37\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 611, in predict
features, None, model_fn_lib.ModeKeys.PREDICT, self.config)
File "C:\Python37\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1112, in _call_model_fn
model_fn_results = self._model_fn(features=features, **kwargs)
File "C:\GPT2\model_fns.py", line 62, in gpt2_model
temperature=1.0, top_k=params["top_k"]
File "C:\GPT2\models\gpt2\sample.py", line 82, in sample_sequence
back_prop=False,
File "C:\Python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3556, in while_loop
return_same_structure)
File "C:\Python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3087, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "C:\Python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3022, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "C:\Python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3525, in <lambda>
body = lambda i, lv: (i + 1, orig_body(*lv))
File "C:\GPT2\models\gpt2\sample.py", line 56, in body
next_outputs = step(params, prev[:, tf.newaxis], past=past)
File "C:\GPT2\models\gpt2\sample.py", line 40, in step
lm_output = lm_output = gpt2.model(params=params, X=tokens, past=past, reuse=tf.AUTO_REUSE)
File "C:\GPT2\models\gpt2\gpt2.py", line 208, in model
h = tf.gather(wte, X) + tf.gather(wpe, positions_for(X, past_length))
File "C:\Python37\lib\site-packages\tensorflow\python\util\dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "C:\Python37\lib\site-packages\tensorflow\python\ops\array_ops.py", line 3273, in gather
return gen_array_ops.gather_v2(params, indices, axis, name=name)
File "C:\Python37\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 4390, in gather_v2
"GatherV2", params=params, indices=indices, axis=axis, name=name)
File "C:\Python37\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "C:\Python37\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "C:\Python37\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op
op_def=op_def)
File "C:\Python37\lib\site-packages\tensorflow\python\framework\ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[0,0] = 1024 is not in [0, 1024)
[[node sample_sequence/while/model/GatherV2_1 (defined at C:\GPT2\models\gpt2\gpt2.py:208) ]]
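The failing op is the `tf.gather(wpe, positions_for(X, past_length))` lookup at gpt2.py:208. The message says position index 1024 was requested from a table whose valid indices are [0, 1024), which matches `'n_ctx': 1024` in the printed parameters. Below is a minimal sketch of that arithmetic in plain Python. It assumes (this is not confirmed from the repository code) that each new token's position is `past_length` plus its offset, and that the sampling loop is allowed to run past `n_ctx` tokens. The helper names are hypothetical stand-ins, not the repository's functions.

```python
# Hypothetical stand-ins for illustration; not the repository's actual code.
N_CTX = 1024  # matches 'n_ctx' in the printed params


def positions_for(num_new_tokens, past_length):
    """Position indices for tokens appended after `past_length` cached tokens."""
    return [past_length + i for i in range(num_new_tokens)]


def check_positions(positions, table_size=N_CTX):
    """Raise, as a gather would, if any index falls outside [0, table_size)."""
    for p in positions:
        if not 0 <= p < table_size:
            raise IndexError(f"index {p} is not in [0, {table_size})")
    return positions


def max_new_tokens(prompt_len, n_ctx=N_CTX):
    """Largest number of tokens that can be generated without leaving the table."""
    return max(0, n_ctx - prompt_len)


# Last valid step: 1023 tokens are cached, so the new token gets position 1023.
check_positions(positions_for(1, past_length=1023))

# One step later the requested position is 1024, which is out of range.
try:
    check_positions(positions_for(1, past_length=1024))
except IndexError as err:
    print(err)
```

If that assumption holds, capping the number of generated tokens at `max_new_tokens(prompt_len)` would keep every position lookup inside the table.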

I used a single text prompt, as the author suggested, but it still fails.

I have also tested the input.txt method.

117M/model.ckpt.index is corrupted?

Kept getting this error -

Create CheckpointSaverHook.
Done calling model_fn.
TPU job name worker
Graph was finalized.
Restoring parameters from gs://kogpt2/models/117M/model.ckpt
Error recorded from training_loop: From /job:worker/replica:0/task:0:
File contents are inconsistent for file: gs://kogpt2/models/117M/model.ckpt.index @ 0.
         [[node save/RestoreV2 (defined at /home/ksjcom0705_gmail_com/GPT2/venv/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Does anyone have a trained 117M model that I could use for pretraining? It looks like the source is damaged somehow (or gsutil is damaging the files).
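One way to narrow down whether the copy step is at fault is to compare the MD5 of a local copy of `model.ckpt.index` with the hash the bucket reports. The sketch below assumes a local copy of the file exists and that the remote hash comes from something like the `Hash (md5)` line of `gsutil ls -L`, which is base64-encoded. The function name `md5_base64` is hypothetical.

```python
import base64
import hashlib


def md5_base64(path, chunk_size=1 << 20):
    """Return the base64-encoded MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large checkpoint files do not need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")
```

If the two values differ, the file changed somewhere between the source and the bucket. If they match, the problem is more likely in how the checkpoint was written in the first place.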

about encoder.json

How can I get encoder.json for my own dataset? I am confused about it. I have a vocab file that I generated with SentencePiece.

error when using create_tfrecords.py

Got 142 files, divided into 1 chunks.
  0% 0/1 [00:00<?, ?it/s]0

Traceback (most recent call last):
  File "./GPT2/datasets/openwebtext/create_tfrecords.py", line 86, in <module>
    good += g
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
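The `TypeError` means `g` was `None` when it was added to the integer `good`, so whatever produced `g` returned nothing for at least one chunk. Below is a minimal sketch of a defensive accumulation. It assumes (not confirmed from `create_tfrecords.py`) that each chunk yields a count of good files, or `None` when the worker fails or has no `return`. The function name `sum_good_counts` is hypothetical.

```python
def sum_good_counts(results):
    """Add up per-chunk counts, skipping chunks that produced no result."""
    good = 0
    skipped = 0
    for g in results:
        if g is None:
            # A worker returned nothing; count it instead of crashing.
            skipped += 1
            continue
        good += g
    return good, skipped


good, skipped = sum_good_counts([3, None, 5])
print(f"good={good}, chunks without a result={skipped}")
```

A guard like this only hides the symptom. The worker that returns `None` is still worth inspecting to see why it produced no count.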

format dataset

Wow, nice repository. I was also looking for a GPT2 repo to train on TPU, because I just got access to Google Cloud TPU through the TensorFlow Research Cloud program. I have a plain-text dataset, but I don't know how to convert it into a trainable format like the one used in your repo. Do you have a formatted dataset that you created for training with this repo?

Thank you very much for your answer and for creating this repo. Awesome!
