openai / image-gpt

License: Other

Python 100.00%

image-gpt's Introduction

Status: Archive (code is provided as-is, no updates expected)

image-gpt

Code and models from the paper "Generative Pretraining from Pixels".

Supported Platforms:

Ubuntu 16.04

Install

You can get miniconda from https://docs.conda.io/en/latest/miniconda.html, or install the dependencies shown below manually.

conda create --name image-gpt python=3.7.3
conda activate image-gpt

conda install numpy=1.16.3
conda install tensorflow-gpu=1.13.1

conda install imageio=2.8.0
conda install requests=2.21.0
conda install tqdm=4.46.0

Usage

This repository is meant to be a starting point for researchers and engineers to experiment with image GPT (iGPT). Our code forks GPT-2 to highlight that it can be easily applied across domains. The diff from gpt-2/src/model.py to image-gpt/src/model.py includes a new activation function, renaming of several variables, and the introduction of a start-of-sequence token, none of which change the model architecture.

Downloading Pre-trained Models

To download a model checkpoint, run download.py. The --model argument should be one of "s", "m", or "l", and the --ckpt argument should be one of "131000", "262000", "524000", or "1000000".

python download.py --model s --ckpt 1000000

This command downloads the iGPT-S checkpoint at 1M training iterations. The default download directory is set to /root/downloads/, and can be changed using the --download_dir argument.
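
The argument constraints above can be checked before the script is invoked. The following is a minimal sketch that assumes only the flags named in this README; `build_download_command` is a hypothetical helper for illustration and is not part of the repository.

```python
import sys

# Values this README lists as valid for download.py.
VALID_MODELS = ("s", "m", "l")
VALID_CKPTS = ("131000", "262000", "524000", "1000000")


def build_download_command(model, ckpt, download_dir=None):
    """Return an argv list for download.py after validating the arguments.

    Hypothetical helper for illustration; not part of the repository.
    """
    ckpt = str(ckpt)
    if model not in VALID_MODELS:
        raise ValueError(f"--model must be one of {VALID_MODELS}, got {model!r}")
    if ckpt not in VALID_CKPTS:
        raise ValueError(f"--ckpt must be one of {VALID_CKPTS}, got {ckpt!r}")
    cmd = [sys.executable, "download.py", "--model", model, "--ckpt", ckpt]
    if download_dir is not None:
        # Override the default download directory (/root/downloads/).
        cmd += ["--download_dir", download_dir]
    return cmd


if __name__ == "__main__":
    # Print the command for the iGPT-S checkpoint at 1M iterations.
    print(" ".join(build_download_command("s", 1000000)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.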

Downloading Datasets

To download datasets, run download.py with the --dataset argument set to "imagenet" or "cifar10".

python download.py --model s --ckpt 1000000 --dataset imagenet

This command additionally downloads 32x32 ImageNet encoded with the 9-bit color palette described in the paper. The datasets we provide are center-cropped images intended for evaluation; random cropped images are required to faithfully replicate training.

Downloading Color Clusters

To download the color cluster file defining our 9-bit color palette, run download.py with the --clusters flag set.

python download.py --model s --ckpt 1000000 --dataset imagenet --clusters

This command additionally downloads the color cluster file. src/run.py:sample shows how to decode from 9-bit color to RGB and src/utils.py:color_quantize shows how to go the other way around.
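
To illustrate the idea behind the two conversions, here is a NumPy sketch of nearest-center color quantization and its inverse lookup. It is not the repository's code (which uses TensorFlow). The random palette, the assumed value range of [-1, 1], and the function name `color_decode` are stand-ins chosen for this example.

```python
import numpy as np


def color_quantize(pixels, clusters):
    """Map RGB pixels of shape (N, 3) to the index of the nearest cluster center."""
    # Squared Euclidean distance between every pixel and every center: shape (N, K).
    d = ((pixels[:, None, :] - clusters[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


def color_decode(indices, clusters):
    """Map palette indices back to RGB by looking up the cluster centers."""
    return clusters[indices]


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    # Stand-in palette: 2**9 = 512 random centers. The real palette comes from
    # the downloaded cluster file; the [-1, 1] value range is an assumption.
    clusters = rng.uniform(-1.0, 1.0, size=(512, 3)).astype(np.float32)
    image = rng.uniform(-1.0, 1.0, size=(32, 32, 3)).astype(np.float32)

    # A 32x32 image becomes a sequence of 32 * 32 = 1024 palette indices.
    tokens = color_quantize(image.reshape(-1, 3), clusters)
    restored = color_decode(tokens, clusters).reshape(32, 32, 3)
    print(tokens.shape, restored.shape)
```

Quantization is lossy: decoding returns the palette color nearest to each original pixel, not the pixel itself.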

Sampling

Once the desired checkpoint and color cluster file are downloaded, we can run the script in sampling mode. The following commands sample from iGPT-S, iGPT-M, and iGPT-L respectively:

python src/run.py --sample --n_embd 512  --n_head 8  --n_layer 24
python src/run.py --sample --n_embd 1024 --n_head 8  --n_layer 36
python src/run.py --sample --n_embd 1536 --n_head 16 --n_layer 48

If your data is not in /root/downloads/, set --ckpt_path and --color_cluster_path manually. To run on fewer than 8 GPUs, use a command of the following form:

CUDA_VISIBLE_DEVICES=0,1 python src/run.py --sample --n_embd 512  --n_head 8  --n_layer 24 --n_gpu 2

Evaluating

Once the desired checkpoint and evaluation dataset are downloaded, we can run the script in evaluation mode. The following commands evaluate iGPT-S, iGPT-M, and iGPT-L on ImageNet respectively:

python src/run.py --eval --n_embd 512  --n_head 8  --n_layer 24
python src/run.py --eval --n_embd 1024 --n_head 8  --n_layer 36
python src/run.py --eval --n_embd 1536 --n_head 16 --n_layer 48

If your data is not in /root/downloads/, set --ckpt_path and --data_path manually. You should see that the test generative losses are 2.0895, 2.0614, and 2.0466, matching Figure 3 in the paper.
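
The sampling and evaluation commands above differ only in the mode flag and the per-model hyperparameters, so they can be generated from a small table. This is a sketch that uses only the flags shown in this README; `build_run_command` and `MODEL_CONFIGS` are hypothetical names and are not part of the repository.

```python
import os
import sys

# Hyperparameters for each model size, copied from the commands in this README.
MODEL_CONFIGS = {
    "s": {"n_embd": 512, "n_head": 8, "n_layer": 24},
    "m": {"n_embd": 1024, "n_head": 8, "n_layer": 36},
    "l": {"n_embd": 1536, "n_head": 16, "n_layer": 48},
}


def build_run_command(model, mode="sample", gpus=None):
    """Return (argv, env) for src/run.py. Hypothetical helper for illustration.

    gpus: optional list of GPU ids, e.g. [0, 1]. When given, it sets
    CUDA_VISIBLE_DEVICES and passes --n_gpu with the number of ids.
    """
    if mode not in ("sample", "eval"):
        raise ValueError(f"mode must be 'sample' or 'eval', got {mode!r}")
    if model not in MODEL_CONFIGS:
        raise ValueError(f"model must be one of {sorted(MODEL_CONFIGS)}, got {model!r}")
    argv = [sys.executable, "src/run.py", f"--{mode}"]
    for name, value in MODEL_CONFIGS[model].items():
        argv += [f"--{name}", str(value)]
    env = dict(os.environ)
    if gpus:
        env["CUDA_VISIBLE_DEVICES"] = ",".join(str(g) for g in gpus)
        argv += ["--n_gpu", str(len(gpus))]
    return argv, env


if __name__ == "__main__":
    argv, env = build_run_command("s", "sample", gpus=[0, 1])
    print(" ".join(argv))
```

The pair can be passed to `subprocess.run(argv, env=env, check=True)` from the repository root.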

Citation

Please use the following bibtex entry:

@article{chen2020generative,
  title={Generative Pretraining from Pixels},
  author={Chen, Mark and Radford, Alec and Child, Rewon and Wu, Jeff and Jun, Heewoo and Dhariwal, Prafulla and Luan, David and Sutskever, Ilya},
  year={2020}
}

License

Modified MIT

image-gpt's Issues

A quick question: does the provided code not include model training?

If so, do you plan to release the training part?

Pre-trained models for BERT objective?

I have tried to use the pre-trained models (s-GPT) with the BERT objective. However, this only generates noise.

Are there extra pre-trained models that were trained on the BERT task?

I could not find anything in download.py; BERT is not mentioned there.

Dockerfile

Thanks for publishing this work. Here is a Dockerfile that builds a compatible runtime environment for executing this model.

FROM nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04
# The fire requirement is quoted so the shell does not treat ">" as a redirect.
RUN apt-get update && \
    apt-get install -y python3-pip && \
    pip3 install \
    numpy==1.16.3 \
    tensorflow-gpu==1.13.1 \
    imageio==2.8.0 \
    "fire>=0.1.3" \
    regex==2017.4.5 \
    requests==2.21.0 \
    tqdm==4.31.1 \
    scipy==1.4.1

Clone this repo, and drop this Dockerfile in the root of the repo. Build the container:

docker build -t image-gpt .

With the root of repo as your current directory, run the container:

docker run -it -v $(pwd):/app -v $HOME/image-gpt-data:/root image-gpt bash

size mismatch for centroids: copying a param with shape torch.Size([512, 3]) from checkpoint, the shape in current model is torch.Size([16, 3]).

After computing centroids with
!python src/compute_centroids.py --dataset cifar10 --num_clusters=16
I tried to train with this command:
!python src/run.py --dataset cifar-10-batches-py train configs/s_clf.yml --pretrained=models/cifar10_gen.ckpt
I also tried to generate samples with this command:
!python src/sample.py models/cifar10_gen.ckpt

but got this error:

Traceback (most recent call last):
  File "src/run.py", line 91, in <module>
    args.func(args)
  File "src/run.py", line 21, in train
    model = ImageGPT.load_from_checkpoint(args.pretrained)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/core/saving.py", line 154, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pytorch_lightning/core/saving.py", line 200, in _load_model_state
    model.load_state_dict(checkpoint['state_dict'], strict=strict)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1045, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ImageGPT:
    size mismatch for centroids: copying a param with shape torch.Size([512, 3]) from checkpoint, the shape in current model is torch.Size([16, 3]).

nft

horse with hat

learning rate for pretraining

Hello, thanks for the great project.
I would like to know the learning rates used in pre-training.

The paper describes learning rates for the BERT or the AR objective.
However, from my reading of the paper,
I understand that pre-training is conducted with a combined BERT+AR objective, not with stand-alone BERT or AR.

[Figure 3] - Linear Probe Accuracy

Hi,

First of all, great work, and thanks for sharing the code!

I would like to know how the linear probe accuracy in Figure 3 was obtained.

I tried to run the following command as described in the README.md file:

python src/run.py --eval --n_embd 512  --n_head 8  --n_layer 24

I got around 10% accuracy (instead of >90% as reported in Figure 3 of the paper).

Thanks!

Model runs faster with Hugging Face Transformers

Hi everyone,

I thought I'd share this colab which ports the Image GPT weights into a Hugging Face transformers model. Runs about 10x faster for me.

How to run on Windows or Colab Pro?

Do I need to install Linux, or can it be run in a Windows environment?
Also, can it be run on 2x GTX 1080?
And what would the average generation speed be for one frame?

I tried to run it on Windows and got this error:
line 1801, in __init__
self._traceback = tf_stack.extract_stack()

DataLossError (see above for traceback): Unable to open table file :\image-gpt\image-gpt-master\download: Unknown: NewRandomAccessFile failed to Create/Open: \image-gpt-master\download : Access is denied.
; Input/output error
[[node save/RestoreV2 (defined at src/run.py:179) ]]
[[node save/RestoreV2 (defined at src/run.py:179) ]]

BERT objective on google colab

Hello,

I would like to share with you this Google Colab I wrote in which the BERT approach can be tested. It's an Editor where you can load images, mask pixels from it and sample them.

The code must be completely finished running (Runtime -> Run all) before the window at the bottom can be used.

To load the model into Hugging Face Transformers, I use slightly modified code from apeguero1.

Some questions

Why use only one of the BERT loss or the auto-encoder loss? Why not use them together?
In Fig. 2, why does the linear probe accuracy first increase and then decrease as the layer index increases? The paper says that this contextualized input is used to solve the conditional next pixel prediction task. I'm sorry, but I find that hard to understand.

Help! ImportError: cannot import name 'function'

When I run this command:

!python src/run.py --sample --n_embd 512 --n_head 8 --n_layer 24

I get this error:
Traceback (most recent call last):
  File "src/run.py", line 12, in <module>
    from tensorflow.python.eager import function
ImportError: cannot import name 'function'

I don't know what's wrong.
Can anyone help me, please?