

Pix2Seq codebase: multi-tasks with generative modeling

This is the official implementation of Pix2Seq in Tensorflow 2 with efficient TPUs/GPUs support. The original Pix2Seq code aims to be a general framework that turns RGB pixels into semantically meaningful sequences. We now extend it into a generic codebase, with a task-centric organization that supports different tasks as well as their combination, using generative modeling (both autoregressive and diffusion models; see below).

[Figure] An illustration of Pix2Seq for object detection (from our Google AI blog post).

(NEW!) FitTransformer (FIT)

We added (official) implementations of FitTransformer (FIT), usable as an encoder, a diffusion decoder, or an autoregressive decoder; see architectures/transformers.py.

(NEW!) Diffusion models

We added (official) implementations of diffusion models (such as Bit Diffusion and RIN; see references below) built on top of the original Pix2Seq codebase; they can be found in tasks/, models/, and architectures/.

Please note that we have not yet added proper documentation on training these models.

Models


Objects365 object detection pretrained checkpoints

| Backbone | Total params (M) | Image size | Google cloud storage location |
|---|---|---|---|
| ResNet-50 | 36.6 | 640x640 | gs://pix2seq/obj365_pretrain/resnet_640x640_b256_s400k |
| ResNet-50 (C4) | 84.7 | 640x640 | gs://pix2seq/obj365_pretrain/resnetc_640x640_b256_s400k |
| ViT-B | 115.2 | 640x640 | gs://pix2seq/obj365_pretrain/vit_b_640x640_b256_s400k |
| ViT-L | 341.2 | 640x640 | gs://pix2seq/obj365_pretrain/vit_l_640x640_b256_s400k |
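As a quick sanity check, the following is a minimal sketch of restoring one of these checkpoints in Python, following the pattern used in the inference colab. The module paths (models/ar_model.py, configs/config_det_finetune.py) and get_config() are assumptions based on this codebase's layout, not a verified recipe.

import tensorflow as tf
from models import ar_model as model_lib    # assumption: AR detection model
from configs import config_det_finetune     # assumption: default det config

config = config_det_finetune.get_config()
model = model_lib.Model(config)
checkpoint = tf.train.Checkpoint(
    model=model, global_step=tf.Variable(0, dtype=tf.int64))
ckpt_dir = 'gs://pix2seq/obj365_pretrain/vit_b_640x640_b256_s400k'
status = checkpoint.restore(tf.train.latest_checkpoint(ckpt_dir))
status.expect_partial()  # optimizer slots are not needed for inference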

COCO object detection fine-tuned checkpoints

| Backbone | Total params (M) | Image size | COCO AP | Google cloud storage location |
|---|---|---|---|---|
| ResNet-50 | 36.6 | 640x640 | 39.1 | gs://pix2seq/coco_det_finetune/resnet_640x640 |
| ResNet-50 | 36.6 | 1024x1024 | 41.7 | gs://pix2seq/coco_det_finetune/resnet_1024x1024 |
| ResNet-50 | 36.6 | 1333x1333 | 42.6 | gs://pix2seq/coco_det_finetune/resnet_1333x1333 |
| ResNet-50 (C4) | 84.7 | 640x640 | 44.7 | gs://pix2seq/coco_det_finetune/resnetc_640x640 |
| ResNet-50 (C4) | 84.7 | 1024x1024 | 46.9 | gs://pix2seq/coco_det_finetune/resnetc_1024x1024 |
| ResNet-50 (C4) | 84.7 | 1333x1333 | 47.3 | gs://pix2seq/coco_det_finetune/resnetc_1333x1333 |
| ViT-B | 115.2 | 640x640 | 44.2 | gs://pix2seq/coco_det_finetune/vit_b_640x640 |
| ViT-B | 115.2 | 1024x1024 | 46.5 | gs://pix2seq/coco_det_finetune/vit_b_1024x1024 |
| ViT-B | 115.2 | 1333x1333 | 47.1 | gs://pix2seq/coco_det_finetune/vit_b_1333x1333 |
| ViT-L | 341.2 | 640x640 | 47.6 | gs://pix2seq/coco_det_finetune/vit_l_640x640 |
| ViT-L | 341.2 | 1024x1024 | 49.2 | gs://pix2seq/coco_det_finetune/vit_l_1024x1024 |
| ViT-L | 341.2 | 1333x1333 | 50.0 | gs://pix2seq/coco_det_finetune/vit_l_1333x1333 |

Multitask checkpoints

Jointly fine-tuned on COCO object detection, instance segmentation, captioning, and keypoint detection.

| Backbone | Total params (M) | Image size | COCO AP | Google cloud storage location |
|---|---|---|---|---|
| ViT-B | 115.2 | 640x640 | 44.2 | gs://pix2seq/multi_task/ckpt/vit_b_640x640 |
| ViT-B | 115.2 | 1024x1024 | 46.5 | gs://pix2seq/multi_task/ckpt/vit_b_1024x1024 |

Usage

Colabs

See colabs for inference and fine-tuning demos. Give it a try!

Basic setup before running the code

The following setup is required before running the code.

git clone https://github.com/google-research/pix2seq.git
pip install -r requirements.txt

Download COCO annotations from gs://pix2seq/multi_task/data/coco/json to /tmp/coco_annotations (the directory can be changed in the configs).

annotations_dir=/tmp/coco_annotations
mkdir -p $annotations_dir
wget -P $annotations_dir https://storage.googleapis.com/pix2seq/multi_task/data/coco/json/captions_train2017_eval_compatible.json
wget -P $annotations_dir https://storage.googleapis.com/pix2seq/multi_task/data/coco/json/captions_val2017_eval_compatible.json
wget -P $annotations_dir https://storage.googleapis.com/pix2seq/multi_task/data/coco/json/instances_train2017.json
wget -P $annotations_dir https://storage.googleapis.com/pix2seq/multi_task/data/coco/json/instances_val2017.json
wget -P $annotations_dir https://storage.googleapis.com/pix2seq/multi_task/data/coco/json/person_keypoints_train2017.json
wget -P $annotations_dir https://storage.googleapis.com/pix2seq/multi_task/data/coco/json/person_keypoints_val2017.json

(Optional) If accessing the pretrained checkpoints in Cloud is slow or blocks the start of training/eval, you can download them manually with gsutil cp -r gs://cloud_folder local_folder, and update pretrained_ckpt in the config file accordingly.
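
If you prefer to stay in Python, below is a minimal sketch of the same download step using tf.io.gfile (which understands gs:// paths); the destination directory is illustrative.

import os
import tensorflow as tf

def copy_gcs_dir(gcs_dir, local_dir):
  # Recursively copy a GCS folder to local disk via tf.io.gfile.
  tf.io.gfile.makedirs(local_dir)
  for name in tf.io.gfile.listdir(gcs_dir):
    src, dst = os.path.join(gcs_dir, name), os.path.join(local_dir, name)
    if tf.io.gfile.isdir(src):
      copy_gcs_dir(src, dst)
    else:
      tf.io.gfile.copy(src, dst, overwrite=True)

copy_gcs_dir('gs://pix2seq/coco_det_finetune/resnet_640x640', '/tmp/resnet_640x640')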

(Optional) If training fails at startup (due to an NcclAllReduce error), try a different cross_device_ops for tf.distribute.MirroredStrategy in the build_strategy function in utils.py.
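
For reference, a short sketch of what that swap looks like, mirroring the commented-out alternatives already present in utils.py:

import tensorflow as tf

cross_device_ops = None  # None resolves to tf.distribute.NcclAllReduce() by default
# If NcclAllReduce fails at startup, try one of the following instead:
# cross_device_ops = tf.distribute.HierarchicalCopyAllReduce()
# cross_device_ops = tf.distribute.ReductionToOneDevice()
strategy = tf.distribute.MirroredStrategy(cross_device_ops=cross_device_ops)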

Instructions for training (fine-tuning) of object detection models.

Below are the instructions for starting a training job; the provided configuration is set up mainly for fine-tuning the Objects365-pretrained models.

Step 1: check config_det_finetune.py and update it if necessary (e.g., encoder_variant, image_size).

Step 2: run python3 run.py --mode=train --model_dir=/tmp/model_dir --config=configs/config_det_finetune.py --config.train.batch_size=32 --config.train.epochs=20 --config.optimization.learning_rate=3e-5.

(Optional) Set up tensorboard for training curves with tensorboard --logdir=/tmp/model_dir. Note: eval on this trial fine-tuning run (with ViT-B 640x640 and 20 epochs) should give ~43.5 AP. The exact configurations used to reproduce the COCO fine-tuning results can be found in gs://pix2seq/coco_det_finetune/...

(Optional) Set --run_eagerly=True for interactive debugging (which will be slower).

Instructions for evaluation of object detection models.

Below are the instructions for starting an evaluation job, which monitors the specified directory and performs (continuous) evaluation of the latest un-evaluated checkpoints. It can be started in parallel with, or after, the training.

Step 1: check config_det_finetune.py and update it if necessary (e.g., encoder_variant, image_size). Set checkpoint_dir if the checkpoints to evaluate are not in model_dir (e.g., when evaluating our provided fine-tuned checkpoints).

Step 2: run python3 run.py --mode=eval --model_dir=/tmp/model_dir --config=configs/config_det_finetune.py --config.dataset.coco_annotations_dir=/path/to/annotations --config.eval.batch_size=40.

(Optional) Set up tensorboard for eval curves and detection visualizations with tensorboard --logdir=/tmp/model_dir.

Instructions for evaluation of multi-task models.

In configs/config_multi_task.py, uncomment the line with checkpoint_dir=get_multi_task_checkpoint_dir(...). To evaluate at image size 1024x1024, update image_size in the config.

Object detection

config=configs/config_multi_task.py:object_detection@coco/2017_object_detection,vit-b
model_dir=/tmp/pix2seq_eval_det
# Path to save the detected boxes for evaluating other tasks.
boxes_json_path=$model_dir/boxes.json
python3 run.py --config=$config --model_dir=$model_dir --mode=eval --config.task.eval_outputs_json_path=$boxes_json_path

(Optional) To use the detected boxes generated in the previous step for evaluating instance segmentation and keypoint detection, they need to be converted to tfrecords using the command below. Alternatively, you can use the pre-processed tfrecords that we have provided.

box_tfrecords=/tmp/boxes
python3 data/scripts/merge_coco_json_tfrecord.py --tfrecord_path=gs://pix2seq/multi_task/data/coco/tfrecord/val* --annotation_path=$boxes_json_path  --output_dir=$box_tfrecords

Instance segmentation

config=configs/config_multi_task.py:instance_segmentation@coco/2017_instance_segmentation,vit-b
val_file_pattern=gs://pix2seq/multi_task/data/coco/det_boxes/vit_b_640x640/*.tfrecord
# val_file_pattern=$box_tfrecords/*.tfrecord
# Number of masks to aggregate. Reduce this for faster but lower quality eval. 
num_samples=8
model_dir=/tmp/pix2seq_eval_ins
python3 run.py --config=$config --model_dir=$model_dir --mode=eval --config.dataset.val_file_pattern=$val_file_pattern --config.task.ensemble_num_samples=$num_samples

Keypoint detection

config="configs/config_multi_task.py:keypoint_detection@coco/2017_keypoint_detection,vit-b"
val_file_pattern=gs://pix2seq/multi_task/data/coco/det_boxes/vit_b_640x640/*.tfrecord
# val_file_pattern=$box_tfrecords/*.tfrecord
model_dir=/tmp/pix2seq_eval_key
python3 run.py --config=$config --model_dir=$model_dir --mode=eval --config.dataset.val_file_pattern=$val_file_pattern

Captioning

config=configs/config_multi_task.py:captioning@coco/2017_captioning,vit-b
model_dir=/tmp/pix2seq_eval_cap
python3 run.py --config=$config --model_dir=$model_dir --mode=eval

For captioning, the generated captions are written to $model_dir/coco_result_{step}_{uuid.uuid4()}.json. Metrics can be computed using the official coco scripts.
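As an illustration (these packages are not part of this repo), the metrics can be computed with pycocotools and pycocoevalcap; a minimal sketch, with the annotation path from the setup above and the result filename placeholder kept as-is:

from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

coco = COCO('/tmp/coco_annotations/captions_val2017_eval_compatible.json')
results = coco.loadRes('/tmp/pix2seq_eval_cap/coco_result_{step}_{uuid}.json')

evaluator = COCOEvalCap(coco, results)
evaluator.params['image_id'] = results.getImgIds()  # score only images with predictions
evaluator.evaluate()
for metric, score in evaluator.eval.items():
  print(f'{metric}: {score:.3f}')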

Note: You can run eval on a subset of images by setting --config.eval.steps.

Cite

Pix2seq paper:

@article{chen2021pix2seq,
  title={Pix2seq: A language modeling framework for object detection},
  author={Chen, Ting and Saxena, Saurabh and Li, Lala and Fleet, David J and Hinton, Geoffrey},
  journal={arXiv preprint arXiv:2109.10852},
  year={2021}
}

Pix2seq multi-task paper:

@article{chen2022unified,
  title={A Unified Sequence Interface for Vision Tasks},
  author={Chen, Ting and Saxena, Saurabh and Li, Lala and Lin, Tsung-Yi and Fleet, David J. and Hinton, Geoffrey},
  journal={arXiv preprint arXiv:2206.07669},
  year={2022}
}

Pix2seq-D paper:

@article{chen2022generalist,
  title={A generalist framework for panoptic segmentation of images and videos},
  author={Chen, Ting and Li, Lala and Saxena, Saurabh and Hinton, Geoffrey and Fleet, David J.},
  journal={arXiv preprint arXiv:2210.06366},
  year={2022}
}

Bit Diffusion paper:

@article{chen2022analog,
  title={Analog bits: Generating discrete data using diffusion models with self-conditioning},
  author={Chen, Ting and Zhang, Ruixiang and Hinton, Geoffrey},
  journal={arXiv preprint arXiv:2208.04202},
  year={2022}
}

RIN Diffusion paper:

@article{jabri2022scalable,
  title={Scalable Adaptive Computation for Iterative Generation},
  author={Jabri, Allan and Fleet, David J. and Chen, Ting},
  journal={arXiv preprint arXiv:2212.11972},
  year={2022}
}

Diffusion noise scheduling paper:

@article{chen2023on,
  title={On the Importance of Noise Scheduling for Diffusion Models},
  author={Chen, Ting},
  journal={arXiv preprint arXiv:2301.10972},
  year={2023}
}

FitTransformer (FIT) paper:

@article{chen2023fit,
  title={FIT: Far-reaching Interleaved Transformers},
  author={Chen, Ting and Li, Lala},
  journal={arXiv preprint arXiv:2305.12689},
  year={2023}
}

Disclaimer

This is not an officially supported Google product.


pix2seq's Issues

Visualization of Attention map

Hello,

Thank you for the wonderful work.
I couldn't seem to find the function for visualizing the decoder cross-attention map as shown in the paper.
Would you be able to provide this?

Thank you,
William Han

How to perform multi-task training?

Hi, thanks for your great work!
Could you please tell me how to perform multi-task training? It does not seem to be documented.
Thank you very much!

Problem with installing packages

I am having problems installing requirements.txt. Are all the packages alright?

Collecting pycocotools
  Using cached pycocotools-2.0.6.tar.gz (24 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Error in sitecustomize; set PYTHONVERBOSE for traceback:
      AssertionError:
      Traceback (most recent call last):
        File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 351, in <module>
          main()
        File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 333, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/local/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/usr/local/lib/python3.10/site-packages/setuptools/build_meta.py", line 338, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/usr/local/lib/python3.10/site-packages/setuptools/build_meta.py", line 320, in _get_build_requires
          self.run_setup()
        File "/usr/local/lib/python3.10/site-packages/setuptools/build_meta.py", line 335, in run_setup
          exec(code, locals())
        File "<string>", line 7, in <module>
      ModuleNotFoundError: No module named 'numpy'
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

TypeError: 'int' object is not subscriptable

I have just run the colab and installed the requirements. How can I resolve this?

TypeError                                 Traceback (most recent call last)
[<ipython-input-16-75c695310352>](https://localhost:8080/#) in <module>
     20 
     21 # Restore checkpoint.
---> 22 model = model_lib.Model(config)
     23 checkpoint = tf.train.Checkpoint(
     24     model=model, global_step=tf.Variable(0, dtype=tf.int64))

[/content/pix2seq/models/ar_model.py](https://localhost:8080/#) in __init__(self, config, **kwargs)
     50     else:
     51       self.encoder = ResNetTransformer(
---> 52           config.image_size[0], config.image_size[1], config.resnet_variant,
     53           config.resnet_depth, config.resnet_width_multiplier,
     54           config.resnet_sk_ratio, config.num_encoder_layers, config.dim_att,

TypeError: 'int' object is not subscriptable
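
For what it's worth, the traceback indexes config.image_size, which suggests the colab's config has image_size as a single int where ResNetTransformer expects a (height, width) pair. A hedged, untested workaround before building the model:

# Untested sketch: widen a scalar image_size into a (height, width) pair.
if isinstance(config.image_size, int):
  with config.ignore_type():  # ml_collections ConfigDict guards type changes
    config.image_size = (config.image_size, config.image_size)
model = model_lib.Model(config)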

coco images downloading

Could you please provide the TFDS COCO image path and the local save path? The dataset cannot be downloaded online in my country. Alternatively, can you provide some instructions on how to use the official COCO images, downloaded from the official website, for training?

Training Hangs forever

Hi,

Thanks for releasing the code. I've been trying to train the default model on coco dataset with 8 V100 GPUs, but the training gets stuck at the very beginning.

Command:
python3 run.py --mode=train --model_dir=/path/to/pix2seq --config=configs/config_det_finetune.py --config.dataset.coco_annotations_dir=/path/to/coco/annotations --run_eagerly=True

Environment:
Official tensorflow docker

Log
2022-05-06 17:37:27.168296: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:27.169421: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-06 17:37:29.303662: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.304294: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.304868: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.305425: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.306014: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.306572: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.307122: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.307680: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.308248: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.308798: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.309342: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.309936: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.310488: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.311030: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.311569: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.312111: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.312652: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.313192: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.313765: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.314309: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.314858: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.315401: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.315955: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:29.316499: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.890560: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.891236: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.891837: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.892434: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.893018: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.893643: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.894231: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.895022: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.895784: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.896362: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.896927: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.897530: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.898093: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.898647: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.899203: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.899757: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.900314: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.900872: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14639 MB memory: -> device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:17.0, compute capability: 7.0
2022-05-06 17:37:32.901671: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.902289: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14639 MB memory: -> device: 1, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:18.0, compute capability: 7.0
2022-05-06 17:37:32.902990: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.903568: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14639 MB memory: -> device: 2, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:19.0, compute capability: 7.0
2022-05-06 17:37:32.904262: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.904859: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14639 MB memory: -> device: 3, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1a.0, compute capability: 7.0
2022-05-06 17:37:32.905525: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.906123: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:4 with 14639 MB memory: -> device: 4, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1b.0, compute capability: 7.0
2022-05-06 17:37:32.906772: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.907365: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:5 with 14639 MB memory: -> device: 5, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1c.0, compute capability: 7.0
2022-05-06 17:37:32.908015: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.908616: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:6 with 14639 MB memory: -> device: 6, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1d.0, compute capability: 7.0
2022-05-06 17:37:32.909246: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-06 17:37:32.909898: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:7 with 14639 MB memory: -> device: 7, name: Tesla V100-SXM2-16GB, pci bus id: 0000:00:1e.0, compute capability: 7.0
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3', '/job:localhost/replica:0/task:0/device:GPU:4', '/job:localhost/replica:0/task:0/device:GPU:5', '/job:localhost/replica:0/task:0/device:GPU:6', '/job:localhost/replica:0/task:0/device:GPU:7')
I0506 17:37:35.433079 140303732664128 mirrored_strategy.py:374] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3', '/job:localhost/replica:0/task:0/device:GPU:4', '/job:localhost/replica:0/task:0/device:GPU:5', '/job:localhost/replica:0/task:0/device:GPU:6', '/job:localhost/replica:0/task:0/device:GPU:7')
I0506 17:37:35.433445 140303732664128 utils.py:240] Running using MirroredStrategy on 8 replicas
I0506 17:37:35.433876 140303732664128 utils.py:301] Config: dataset:
batch_duplicates: 1
cache_dataset: false
coco_annotations_dir: /path/to/coco/annotations
data_dir: /table_efs/data/kshi/tf_datasets
eval_split: validation
label_shift: 0
name: object_detection
tfds_name: coco/2017
train_filename: instances_train2017.json
train_split: train
val_filename: instances_val2017.json
datasets:

  • !!python/object:ml_collections.config_dict.config_dict.ConfigDict
    _convert_dict: true
    _fields:
    batch_duplicates: 1
    cache_dataset: false
    coco_annotations_dir: /path/to/coco/annotations
    data_dir: /path/to/tf_datasets
    eval_split: validation
    label_shift: 0
    name: object_detection
    tfds_name: coco/2017
    train_filename: instances_train2017.json
    train_split: train
    val_filename: instances_val2017.json
    _locked: false
    _type_safe: true
    eval:
    batch_size: 8
    checkpoint_dir: ''
    steps: 0
    tag: eval
    model:
    coord_vocab_shift: 1000
    dec_proj_mode: mlp
    decoder_output_bias: true
    dim_att: 768
    dim_att_dec: 512
    dim_mlp: 3072
    dim_mlp_dec: 2048
    drop_att: 0.0
    drop_path: 0.1
    drop_units: 0.1
    image_size: 640
    max_seq_len: 512
    name: encoder_ar_decoder
    num_decoder_layers: 6
    num_encoder_layers: 12
    num_heads: 12
    num_heads_dec: 16
    patch_size: 16
    pos_encoding: sin_cos
    pos_encoding_dec: learned
    pretrained_ckpt: /vit_b_640x640_b256_s400k/checkpoint
    resnet_variant: c1
    shared_decoder_embedding: true
    text_vocab_shift: 3000
    use_cls_token: false
    vocab_size: 3000
    model_dir: /path/to/pix2seq
    optimization:
    beta1: 0.9
    beta2: 0.95
    end_lr_factor: 0.01
    eps: 1.0e-08
    global_clipnorm: -1
    learning_rate: 3.0e-05
    learning_rate_scaling: none
    learning_rate_schedule: linear
    optimizer: adamw
    warmup_epochs: 2
    warmup_steps: 0
    weight_decay: 0.05
    task:
    class_label_corruption: rand_n_fake_cls
    color_jitter_strength: 0.0
    eos_token_weight: 0.1
    image_size: 640
    jitter_scale_max: 2.0
    jitter_scale_min: 0.3
    max_instances_per_image: 100
    max_instances_per_image_test: 100
    name: object_detection
    noise_bbox_weight: 1.0
    object_order: random
    quantization_bins: 1000
    temperature: 1.0
    top_k: 0
    top_p: 0.4
    vocab_id: 10
    weight: 1.0
    tasks:
  • !!python/object:ml_collections.config_dict.config_dict.ConfigDict
    _convert_dict: true
    _fields:
    class_label_corruption: rand_n_fake_cls
    color_jitter_strength: 0.0
    eos_token_weight: 0.1
    image_size: 640
    jitter_scale_max: 2.0
    jitter_scale_min: 0.3
    max_instances_per_image: 100
    max_instances_per_image_test: 100
    name: object_detection
    noise_bbox_weight: 1.0
    object_order: random
    quantization_bins: 1000
    temperature: 1.0
    top_k: 0
    top_p: 0.4
    vocab_id: 10
    weight: 1.0
    _locked: false
    _type_safe: true
    train:
    batch_size: 8
    checkpoint_epochs: 1
    checkpoint_steps: 0
    epochs: 40
    keep_checkpoint_max: 5
    loss_type: xent
    steps: 0

loading annotations into memory...
Done (t=17.30s)
creating index...
index created!
I0506 17:38:13.049188 140303732664128 dataset_info.py:439] Load dataset info from /path/to/coco/2017/1.1.0
I0506 17:38:13.099052 140303732664128 dataset_builder.py:369] Reusing dataset coco (/path/to//tf_datasets/coco/2017/1.1.0)
I0506 17:38:13.099727 140303732664128 logging_logger.py:44] Constructing tf.data.Dataset coco for split train, from /path/to//tf_datasets/coco/2017/1.1.0
/usr/local/lib/python3.8/dist-packages/tensorflow/python/data/ops/structured_function.py:264: UserWarning: Even though the tf.config.experimental_run_functions_eagerly option is set, this option does not apply to tf.data functions. To force eager execution of tf.data functions, please use tf.data.experimental.enable_debug_mode().
warnings.warn(
I0506 17:38:13.462730 140303732664128 coco.py:180] Loading annotations from /path/to//coco/annotations/instances_train2017.json
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0506 17:38:35.765927 140303732664128 cross_device_ops.py:616] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[... the same 'Reduce to CPU:0 then broadcast' message pair repeats 9 more times ...]
2022-05-06 17:38:35.962002: W tensorflow/core/grappler/optimizers/data/slack.cc:103] Could not find a final prefetch in the input pipeline to which to introduce slack.
WARNING:tensorflow:Using MirroredStrategy eagerly has significant overhead currently. We will be working on improving this in the future, but for now please wrap call_for_each_replica or experimental_run or run inside a tf.function to get the best performance.
W0506 17:38:37.268074 140303732664128 mirrored_run.py:85] Using MirroredStrategy eagerly has significant overhead currently. We will be working on improving this in the future, but for now please wrap call_for_each_replica or experimental_run or run inside a tf.function to get the best performance.
I0506 17:38:37.286441 140138558494464 model.py:94] train_step begins...
2022-05-06 17:38:37.878649: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8100
[... 'train_step begins...' and cuDNN load messages repeat for the remaining 7 replicas ...]
INFO:tensorflow:batch_all_reduce: 369 all-reduces with algorithm = nccl, num_packs = 1
I0506 17:39:17.282191 140303732664128 cross_device_ops.py:897] batch_all_reduce: 369 all-reduces with algorithm = nccl, num_packs = 1

GPU status: [screenshot]

It seems that the code hangs after this line: https://github.com/google-research/pix2seq/blob/main/models/model.py#L108. Have you encountered this in your experiments?

Coco-pretrained model

Hi, thanks for your great work!

Could you release the models that were trained only on the COCO dataset?

Thanks.

Question about inference

I have a question about inference code.

In tasks/task_utils.py, function 'decode_object_seq_to_bbox',
there is no logic that acts as an 'end-of-sequence'.

I think PADDING_TOKEN (which is denoted as 0 in vocab.py) is the token that indicates end-of-sequence,
but at inference time, the code does not use the token.

Am I right?
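
For context, here is an illustrative sketch (my reading, not the repo's actual decode logic) of truncating a decoded sequence at the first padding token before parsing it into boxes:

import tensorflow as tf

PADDING_TOKEN = 0  # as defined in vocab.py

def truncate_at_padding(seq):
  # seq: [seq_len] int tensor of predicted tokens (eager mode).
  positions = tf.where(tf.equal(seq, PADDING_TOKEN))
  if tf.size(positions) == 0:
    return seq  # no padding token predicted; keep the whole sequence
  return seq[:int(positions[0][0])]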

Inconsistency between paper and code

It seems that the labels in the output target are either rand or fake, without any ground truth, which differs from the paper at the end of Section 2.3: "After noise objects are synthesised and discretized, we then append them in the end of the original input sequence."

response_seq_class_m = tf.concat([quantized_bbox, new_label_m], axis=-1)


How much time needed in the VOS task?

DDPM is known to be time-consuming and I am not sure if it is suitable for video segmentation tasks. So I wonder how much time is needed for video object segmentation.

scale factor b

@chentingpc You mentioned scaling the input image x by a factor b when calculating m_bits in Algorithm 1. I am wondering whether we need to scale it back when predicting m_0 from m_t during sampling, because it is not so obvious in your Algorithm 2. Did you actually implement it in the DDIM sampler? I have a similar question regarding your other paper, On the Importance of Noise Scheduling for Diffusion Models.

If such inverse scaling is not necessary, could you please explain the reason behind this?
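
For what it's worth, here is a toy numpy sketch of how I read the analog-bits encode/decode with a scale factor b: decoding thresholds each bit at zero, so any positive scale on the analog bits leaves the recovered integers unchanged, which would make inverse scaling unnecessary. The function names and scale handling here are my assumptions, not the repo's code.

import numpy as np

def int2bit(x, n_bits, b=1.0):
  # Integers -> analog bits in {-b, +b} (least-significant bit first).
  bits = ((x[..., None] >> np.arange(n_bits)) & 1).astype(np.float32)
  return b * (2.0 * bits - 1.0)

def bit2int(analog):
  # Threshold at zero: invariant to any positive scale on the analog bits.
  bits = (analog > 0).astype(np.int64)
  return (bits << np.arange(analog.shape[-1])).sum(-1)

x = np.array([3, 7, 200])
assert np.array_equal(bit2int(int2bit(x, 8, b=0.1)), x)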

About ViT-B

Hi,

have you ever tried to train ViT-B without Obj365 pretraining?

About sequence formulation for instance segmentation

Excuse me,
I am interested in Pix2Seq, and trying to better understand it.
I wonder how instance segmentation targets are formulated for training. To be more specific:

  • How are COCO annotations converted into target sequences (especially those with more than one polygon)?
  • Do the starting point and direction matter?
  • Is there any design handling the varying length of target sequences?

It would be nice if you can provide these details at your convenience.
Thank you!

training gets stuck

Hello, I'm training the model with the command

python run.py --mode=train --model_dir=/tmp/model_dir --config=configs/config_det_finetune.py --config.dataset.coco_annotations_dir=/home/t-liuze/coco --config.train.batch_size=32 --config.train.epochs=20 --config.optimization.learning_rate=3e-5

after the following log

I0328 17:36:25.534775 139824410076992 logging_logger.py:44] Constructing tf.data.Dataset coco for split train, from /home/t-liuze/tensorflow_datasets/coco/2017/1.1.0
I0328 17:36:25.781254 139824410076992 coco.py:174] Loading annotations from /home/t-liuze/coco/instances_train2017.json
I0328 17:36:51.126714 139824410076992 utils.py:189] Restoring from latest checkpoint: gs://pix2seq/obj365_pretrain/vit_b_640x640_b256_s400k/ckpt-400000
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0328 17:36:58.118989 139824410076992 cross_device_ops.py:619] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[... the same 'Reduce to CPU:0 then broadcast' message pair repeats 9 more times ...]
2022-03-28 17:36:58.212910: W tensorflow/core/grappler/optimizers/data/slack.cc:103] Could not find a final `prefetch` in the input pipeline to which to introduce slack.

The program gets stuck

I have tried a single GPU, and also uncommenting

    cross_device_ops = None  # tf.distribute.NcclAllReduce() by default
    # if the default cross_device_ops fails, try either of the following two
    # by uncommenting it.
    # cross_device_ops = tf.distribute.HierarchicalCopyAllReduce()
    # cross_device_ops = tf.distribute.ReductionToOneDevice()

The program still gets stuck.

Here is my tf version:

tf-docker ~ > pip list | grep tensorflow
tensorflow                   2.7.1
tensorflow-addons            0.16.1
tensorflow-datasets          4.5.2
tensorflow-estimator         2.7.0
tensorflow-io-gcs-filesystem 0.23.1
tensorflow-metadata          1.7.0

checksum error for downloaded data

Hi,
I am getting a checksum error when the code automatically downloads data from the Google bucket. Here is the error message:

File "/usr/local/lib/python3.8/dist-packages/tensorflow_datasets/core/download/download_manager.py", line 343, in _handle_download_result
raise NonMatchingChecksumError(resource.url, tmp_path)
tensorflow_datasets.core.download.download_manager.NonMatchingChecksumError: Artifact http://images.cocodataset.org/zips/test2017.zip, downloaded to /root/tensorflow_datasets/downloads/images.cocodataset.org_zips_test2017KDQv8bPgQock_hnrTxvqZbYabAXdY4EN91CL7w8_GTo.zip.tmp.86dbce4c6d70492c9e71a03288cf434e/test2017.zip, has wrong checksum. This might indicate:

  • The website may be down (e.g. returned a 503 status code). Please check the url.
  • For Google Drive URLs, try again later as Drive sometimes rejects downloads when too many people access the same URL. See tensorflow/datasets#1482
  • The original datasets files may have been updated. In this case the TFDS dataset builder should be updated to use the new files and checksums. Sorry about that. Please open an issue or send us a PR with a fix.
  • If you're adding a new dataset, don't forget to register the checksums as explained in: https://www.tensorflow.org/datasets/add_dataset#2_run_download_and_prepare_locally

Could you please update the checksum so that we can use the code.

Regards,
Dipendra

Cannot reproduce BLEU-4 score of 34.3 in Table 1 for image captioning task

Hi there, first of all, thank you very much for sharing the code!

I tried the following command in README.md to evaluate model performance on image captioning task.

config=configs/config_multi_task.py:captioning@coco/2017_captioning,vit-b
model_dir=/tmp/pix2seq_eval_cap
python3 run.py --config=$config --model_dir=$model_dir --mode=eval

The checkpoint used is vit_b_640x640-ckpt-93324, and I used pycocoevalcap to evaluate the results on the COCO Captions 2017 validation dataset.

The BLEU-4 score I got is only 14.1. In Table 1 of the paper, the score for Pix2Seq v2 multi-task (640x640) on the captioning task is 34.3. I am wondering if I did anything wrong. Could you please let me know how to resolve this?

Thank you in advance!


Here are the task configs:

{
  "dataset": {
    "batch_duplicates": 1,
    "cache_dataset": true,
    "coco_annotations_dir_for_metrics": "tmp/coco_annotations",
    "eval_num_examples": 5000,
    "eval_split": "validation",
    "label_shift": 0,
    "name": "coco/2017_captioning",
    "train_file_pattern": "gs://pix2seq/multi_task/data/coco/tfrecord/train*",
    "train_filename_for_metrics": "captions_train2017_eval_compatible.json",
    "train_num_examples": 118287,
    "train_split": "train",
    "val_file_pattern": "gs://pix2seq/multi_task/data/coco/tfrecord/val*",
    "val_filename_for_metrics": "captions_val2017_eval_compatible.json"
  },
  "datasets": [
    {
      "batch_duplicates": 1,
      "cache_dataset": true,
      "coco_annotations_dir_for_metrics": "tmp/coco_annotations",
      "eval_num_examples": 5000,
      "eval_split": "validation",
      "label_shift": 0,
      "name": "coco/2017_captioning",
      "train_file_pattern": "gs://pix2seq/multi_task/data/coco/tfrecord/train*",
      "train_filename_for_metrics": "captions_train2017_eval_compatible.json",
      "train_num_examples": 118287,
      "train_split": "train",
      "val_file_pattern": "gs://pix2seq/multi_task/data/coco/tfrecord/val*",
      "val_filename_for_metrics": "captions_val2017_eval_compatible.json"
    }
  ],
  "eval": {
    "batch_size": 8,
    "checkpoint_dir": "gs://pix2seq/multi_task/ckpt/vit_b_640x640",
    "steps": 0,
    "tag": "eval"
  },
  "model": {
    "coord_vocab_shift": 1000,
    "dec_proj_mode": "mlp",
    "decoder_output_bias": true,
    "dim_att": 768,
    "dim_att_dec": 512,
    "dim_mlp": 3072,
    "dim_mlp_dec": 2048,
    "drop_att": 0.0,
    "drop_path": 0.1,
    "drop_units": 0.1,
    "image_size": [
      640,
      640
    ],
    "max_seq_len": 512,
    "name": "encoder_ar_decoder",
    "num_decoder_layers": 6,
    "num_encoder_layers": 12,
    "num_heads": 12,
    "num_heads_dec": 16,
    "patch_size": 16,
    "pos_encoding": "sin_cos",
    "pos_encoding_dec": "learned",
    "pretrained_ckpt": "gs://pix2seq/obj365_pretrain/vit_b_640x640_b256_s400k",
    "resnet_variant": "c1",
    "shared_decoder_embedding": true,
    "text_vocab_shift": 3000,
    "use_cls_token": false,
    "vocab_size": 35000
  },
  "model_dir": "tmp/pix2seq_eval_cap",
  "optimization": {
    "beta1": 0.9,
    "beta2": 0.95,
    "end_lr_factor": 0.01,
    "eps": 1e-08,
    "global_clipnorm": -1,
    "learning_rate": 0.0001,
    "learning_rate_scaling": "none",
    "learning_rate_schedule": "linear",
    "optimizer": "adamw",
    "warmup_epochs": 10,
    "warmup_steps": 0,
    "weight_decay": 0.05
  },
  "task": {
    "captions_per_image": 5,
    "color_jitter_strength": 0.5,
    "eos_token_weight": 0.1,
    "image_size": [
      640,
      640
    ],
    "input_seq_drop_rate": 0.5,
    "jitter_scale_max": 1.0,
    "jitter_scale_min": 1.0,
    "max_instances_per_image": 5,
    "max_seq_len": 128,
    "metric": {
      "name": "coco_captioning"
    },
    "name": "captioning",
    "temperature": 1.0,
    "top_k": 0,
    "top_p": 1.0,
    "vocab_id": 13,
    "weight": 1.0
  },
  "tasks": [
    {
      "captions_per_image": 5,
      "color_jitter_strength": 0.5,
      "eos_token_weight": 0.1,
      "image_size": [
        640,
        640
      ],
      "input_seq_drop_rate": 0.5,
      "jitter_scale_max": 1.0,
      "jitter_scale_min": 1.0,
      "max_instances_per_image": 5,
      "max_seq_len": 128,
      "metric": {
        "name": "coco_captioning"
      },
      "name": "captioning",
      "temperature": 1.0,
      "top_k": 0,
      "top_p": 1.0,
      "vocab_id": 13,
      "weight": 1.0
    }
  ],
  "tokenizer": {
    "add_bos": false,
    "add_eos": true,
    "sentencepiece_model": "gs://pix2seq/multi_task/data/c4_en_32k_spm.model"
  },
  "train": {
    "batch_size": 128,
    "checkpoint_epochs": 1,
    "checkpoint_steps": 0,
    "epochs": 100,
    "keep_checkpoint_max": 10,
    "loss_type": "xent",
    "steps": 0
  }
}

RIN results on CIFAR

Hi,

In the RIN paper, you mention that you reach 1.81 FID, and I'm trying to reproduce this result without success; the minimum FID score I've reached is 17.7.

Looking at the config files, two training schedules are used, linear and sigmoid. Which one was used to get this result? Is the inference setup the same as for the rest of the results, with 250 DDIM steps and a cosine schedule with tau=1? Were those results obtained in the class-conditional setup or the unconditional one?

Also, if you have it, do you have an IS associated with this model?

Thank you very much for the information!

Distance Measurement

Hi,

What is the reference for measuring the distance to objects with a camera, and how do you estimate it?

Please, help me. Thanks

All 3 colabs do not work out of the box

Most issues are due to image size and type errors regarding ints, tuples, or string mismatches, e.g. TypeError: tuple indices must be integers or slices, not str in pix2seq_inference_multitask.ipynb under Object detection.

missing code for panoptic segmentation

Hello,

thank you for this cool project.
I am trying to get Pix2Seq-D for panoptic segmentation to run. However, quite some code seems to be missing from the repository. Are you planning to add the missing code (and release the model weights)? That would be very helpful!
I am happy to provide more detailed pointers to what is missing. For instance, code for panoptic segmentation seems to be missing in:

  • configs/dataset_configs.py:dataset_configs
  • metrics/coco_metrics.py
  • data/coco.py

ValueError: coco_object_detection not registered!

Hi! I encountered the error in the title when trying to evaluate the released object detection models following the provided command:
python3 run.py --mode=eval --model_dir=</tmp/model_dir> --config=configs/config_det_finetune.py --config.dataset.coco_annotations_dir=</path/to/annotations> --config.eval.batch_size=40
with the released fine-tuned checkpoint corresponding to encoder_variant and image_size in config_det_finetune.py downloaded into </tmp/model_dir/>.

Any pointers would be greatly appreciated! Thanks!

How to prepare coco data in tfrecord format?

Your code does not contain the path for the original coco detection dataset.

And the download of the tfrecord-format COCO data fails: “E tensorflow/tsl/platform/cloud/curl_http_request.cc:610] The transmission of request 0x9690640 (URI: https://www.googleapis.com/storage/v1/b/pix2seq/o?fields=items%2Fname%2CnextPageToken&prefix=multi_task%2Fdata%2Fcoco%2Ftfrecord%2F) has been stuck at 0 of 0 bytes for 61 seconds and will be aborted. CURL timing information: lookup time: 9.4e-05 (No error), connect time: 0 (No error), pre-transfer time: 0 (No error), start-transfer time: 0 (No error)”

How to use your code?

Training from scratch

Thanks for your awesome work! I want to reproduce the results in Table 1. When will you release the command for training the model from scratch? Looking forward to your reply.

Trouble training RIN on CIFAR-10

Hi,

I'm currently trying to train RIN on CIFAR-10 using the code and config that was provided in the repository.

I made some minor changes to the code to get it to work:

  • fix some import statements (Diff)
  • use dummy data for FID since I currently don't have the cifar10_stats_real.npy file (Diff)
  • fix a bug in image_diffusion_model.py where the Model.sample calls a method that is only defined in ModelT (Diff)

The entire diff can be seen here: https://github.com/google-research/pix2seq/compare/main..leon-w:b6609
This was the exact code I used for training and evaluation.

Training Setup

I trained the model using this command:

python run.py \
 --config configs/config_diffusion_cifar10.py \
 --mode train \
 --model_dir results/cifar10 \
 --config.train.checkpoint_epochs 5 \
 --config.train.keep_checkpoint_max 2

train_log.txt

The training finishes after around 2 hours.
These are the training curves logged to Tensorboard:
[Tensorboard screenshot of the training curves]

Eval Setup

I then run the trained model in evaluation mode to create a few samples using:

python run.py \
 --config configs/config_diffusion_cifar10.py \
 --mode eval \
 --model_dir results/cifar10 \
 --config.eval.steps 1

eval_log.txt

Unfortunately, the generated samples don't seem to contain anything meaningful and only look like pure noise:

[grid of generated samples showing only noise]

I also tried training the model 10x longer but still got only noise.

Has anyone successfully trained a RIN model with this codebase, and does anyone have an idea how I can get this to work?
Any help would be highly appreciated!

'ImageFont' object has no attribute 'getsize'

When I run /colab/pix2seq_inference_object_detection.ipynb, I get the error shown in the screenshot below.
[screenshot of the 'ImageFont' object has no attribute 'getsize' error]
The problem lies in lines 233 and 243 of /tasks/visualization/vis_utils.py:

display_str_heights = [font.getsize(ds)[1] for ds in display_str_list] # line 233 
text_width, text_height = font.getsize(display_str) # line 243

As of Pillow 10.0.0, ImageFont objects no longer expose getsize() directly; you have to access the inner font member first and call getsize() on that. So the fix is:

display_str_heights = [font.font.getsize(ds)[1] for ds in display_str_list] # line 233 
text_width, text_height = font.font.getsize(display_str) # line 243

Then it works!
[screenshot of the corrected detection output]
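A note on that fix: font.font is a private attribute, and its getsize() may not return the same (width, height) tuple as the removed public method. On Pillow >= 10 the public replacement is getbbox()/getlength(); a minimal sketch of an equivalent patch to those two lines:

# Sketch using Pillow's public API, avoiding the private font.font attribute;
# font / display_str_list / display_str are the names already used in
# tasks/visualization/vis_utils.py.
def text_size(font, text):
    left, top, right, bottom = font.getbbox(text)
    return right - left, bottom - top

display_str_heights = [text_size(font, ds)[1] for ds in display_str_list]  # line 233
text_width, text_height = text_size(font, display_str)  # line 243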

Versions of libraries in the requirements.txt

Hi, could the library versions be pinned in requirements.txt? I am currently using the latest TF (2.11), which has a breaking change in the optimizer: the base-class initialization in AdamWeightDecay() (https://github.com/google-research/pix2seq/blob/main/models/model_utils.py#L104) passes arguments that no longer match the corresponding ones in https://github.com/keras-team/keras/blob/v2.11.0/keras/optimizers/optimizer_experimental/adam.py#L86. Thanks!
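Until the versions are pinned, one hedged workaround (a sketch, not a maintainer-endorsed change; pinning tensorflow==2.10.* also works) is to base the optimizer on the legacy class that TF 2.11 still ships, which keeps the pre-2.11 constructor signature:

import tensorflow as tf

# TF 2.11 moved the old optimizer API to tf.keras.optimizers.legacy; older
# TF versions expose the same class directly under tf.keras.optimizers.
AdamBase = getattr(tf.keras.optimizers, 'legacy', tf.keras.optimizers).Adam

class AdamWeightDecay(AdamBase):
    """Stub showing the base-class swap; the real body is in models/model_utils.py."""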

typo in readme

The word "object" has a typo in "Instructions for training (fine-tuning) of objedct detection models".

how to use this to generate image captioning

In the paper, you mention using this for image captioning in Table 2. However, I do not see image captioning code in this GitHub repository. Can you tell me how to use it?

thanks,

Hi, I get an error message like this:

2022-10-12 15:43:57.254005: W tensorflow/core/grappler/optimizers/data/slack.cc:103] Could not find a final `prefetch` in the input pipeline to which to introduce slack.
I1012 15:43:57.996680 140468541171456 api.py:459] train_step begins...
I1012 15:44:07.279798 140468532778752 api.py:459] train_step begins...
INFO:tensorflow:batch_all_reduce: 369 all-reduces with algorithm = nccl, num_packs = 1
I1012 15:44:10.852259 140499206152832 cross_device_ops.py:897] batch_all_reduce: 369 all-reduces with algorithm = nccl, num_packs = 1
I1012 15:44:17.169317 140468541171456 api.py:446] Trainable variables:
I1012 15:44:17.426999 140468541171456 api.py:446] vit/stem_conv/kernel:0 (16, 16, 3, 768)
I1012 15:44:17.432081 140468541171456 api.py:446] vit/stem_conv/bias:0 (768,)
I1012 15:44:17.436969 140468541171456 api.py:446] vit/stem_ln/gamma:0 (768,)
....
INFO:tensorflow:batch_all_reduce: 369 all-reduces with algorithm = nccl, num_packs = 1
I1012 15:44:31.484436 140499206152832 cross_device_ops.py:897] batch_all_reduce: 369 all-reduces with algorithm = nccl, num_packs = 1
I1012 15:44:37.695064 140468532778752 api.py:459] train_step ends...
I1012 15:44:38.920633 140468541171456 api.py:459] train_step ends...
2022-10-12 15:45:08.671253: W tensorflow/core/framework/op_kernel.cc:1768] UNKNOWN: KeyError: 351529
Traceback (most recent call last):

File "/root/anaconda3/envs/pix2seq/lib/python3.9/site-packages/tensorflow/python/ops/script_ops.py", line 271, in call
ret = func(*args)

File "/root/anaconda3/envs/pix2seq/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 642, in wrapper
return func(*args, **kwargs)

File "/tmp/autograph_generated_filecefzj46v.py", line 22, in get_area
retval__1 = ag__.converted_call(ag__.ld(np).asarray, ([ag__.ld(id_to_ann)[ag__.ld(i)]['area'] for i in ag__.ld(ids)],), dict(dtype=ag__.ld(np).float32), fscope_1)

File "/tmp/autograph_generated_filecefzj46v.py", line 22, in <listcomp>
retval__1 = ag__.converted_call(ag__.ld(np).asarray, ([ag__.ld(id_to_ann)[ag__.ld(i)]['area'] for i in ag__.ld(ids)],), dict(dtype=ag__.ld(np).float32), fscope_1)

KeyError: 351529

2022-10-12 15:45:08.671413: W tensorflow/core/framework/op_kernel.cc:1768] UNKNOWN: KeyError: 415619
(identical traceback, ending in KeyError: 415619)

My GPU setup is 2x RTX 3070 with 8 GB each.

Question about inference

During inference, the fifth token, which should correspond to the object class, may be decoded as a coordinate token; conversely, tokens that should correspond to coordinates can still be decoded as class tokens. How do you deal with this situation? Thanks a lot.
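I can't speak for the authors' exact inference code, but a common remedy is constrained decoding: mask out invalid vocabulary ranges at each step so coordinate positions can only produce coordinate tokens and every fifth position can only produce class tokens. A sketch (the token ranges are made up for illustration, not the repo's real vocabulary layout):

import tensorflow as tf

COORD_START, COORD_END = 0, 2000       # quantized coordinate bins (illustrative)
CLASS_START, CLASS_END = 2000, 2100    # class labels (illustrative)

def mask_logits(logits, step):
    # logits: (batch, vocab); step: 0-based position in the 5-token tuple.
    vocab = tf.range(tf.shape(logits)[-1])
    if step % 5 == 4:   # class position
        valid = (vocab >= CLASS_START) & (vocab < CLASS_END)
    else:               # coordinate position
        valid = (vocab >= COORD_START) & (vocab < COORD_END)
    return tf.where(valid, logits, tf.fill(tf.shape(logits), logits.dtype.min))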

Input Sequence Box Augmentation

Did you include the source code for the noised ground-truth box and hallucinated box augmentation described in the paper?
I can only find the class augmentation code.
Thanks!
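While waiting for an official answer, here is a sketch of my reading of the paper's sequence augmentation (jittered copies of ground-truth boxes plus fully random boxes, all assigned a dedicated "noise" class); it is a paraphrase, not the released code:

import numpy as np

NOISE_CLASS = -1  # placeholder id for the dedicated noise class

def augment_boxes(gt_boxes, num_noise, jitter=0.05, rng=None):
    # gt_boxes: (n, 4) normalized [ymin, xmin, ymax, xmax].
    rng = rng or np.random.default_rng()
    # Noised copies of real boxes.
    noised = np.clip(gt_boxes + rng.normal(0.0, jitter, gt_boxes.shape), 0.0, 1.0)
    # Hallucinated boxes: random centers and sizes.
    yx = rng.uniform(0.0, 1.0, (num_noise, 2))
    hw = rng.uniform(0.0, 0.5, (num_noise, 2))
    random_boxes = np.clip(
        np.concatenate([yx - hw / 2.0, yx + hw / 2.0], axis=-1), 0.0, 1.0)
    fake_boxes = np.concatenate([noised, random_boxes], axis=0)
    fake_labels = np.full(len(fake_boxes), NOISE_CLASS)
    return fake_boxes, fake_labels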

Convert mask to polygon

Hi, I have a question about how you convert instance masks to polygons in your code. I found this code used to read polygons:

polygons = example['image/object/segmentation'].to_tensor(
        default_value=vocab.PADDING_FLOAT,
        shape=[None, max_points * 2])

but this code does not derive polygons from the instance masks in the given labels. So how do you convert a given instance mask label to a polygon? Thanks!
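I don't know the authors' actual conversion path, but a common way to go from a binary instance mask to COCO-style polygons uses OpenCV contours; a minimal sketch:

import cv2
import numpy as np

def mask_to_polygons(mask):
    # mask: (h, w) binary array for one instance.
    contours, _ = cv2.findContours(
        mask.astype(np.uint8), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    polygons = []
    for c in contours:
        c = c.flatten().astype(float)  # [x0, y0, x1, y1, ...]
        if len(c) >= 6:                # keep contours with at least 3 points
            polygons.append(c.tolist())
    return polygons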

Video and webcam

Hi,

Has your code been tested on video or a webcam? If so, could you please share how?

Thanks.

Input multiple sequences per image

Hello, fantastic work on Pix2seq v1 and v2.

I have a question about handling multiple sequences for one image. According to the following docstring, we should be able to pass multiple sequences using a tensor of shape (bsz, instances, seqlen); the current version uses seq of shape (bsz, seqlen).

seq: `int` sequence in shape of (bsz, seqlen),

or (bsz, instances, seqlen) if there are multiple sequences per image.

(https://github.com/google-research/pix2seq/blob/6d45f77fcbb1905aca3e42678a2a079907ad17d0/models/ar_model.py#L84-#L85)

I tried this but it failed:

ValueError: Exception encountered when calling layer "ar_decoder" (type AutoregressiveDecoder).

    in user code:

        File "/pix2seq/architectures/transformers.py", line 684, in call  *
            _, seqlen = get_shape(tokens)

        ValueError: too many values to unpack (expected 2)


    Call arguments received by layer "ar_decoder" (type AutoregressiveDecoder):
      • tokens=tf.Tensor(shape=(32, 3, 500), dtype=int64)
      • encoded=tf.Tensor(shape=(32, 1600, 512), dtype=float32)
      • training=True


Call arguments received by layer "model" (type Model):
  • images=tf.Tensor(shape=(32, 640, 640, 3), dtype=float32)
  • seq=tf.Tensor(shape=(32, 500), dtype=int64)
  • training=True

Do you have any idea? Have you tried using multiple sequences, and if so, how? (A possible reshaping workaround is sketched after this message.)

Thank you!!!
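Not an authoritative fix, but a sketch of the usual workaround when a decoder only accepts rank-2 token tensors: fold the instance axis into the batch axis before decoding (downstream losses/metrics may need matching reshapes, which I haven't checked):

import tensorflow as tf

bsz, instances, seqlen = 32, 3, 500
seq = tf.zeros([bsz, instances, seqlen], dtype=tf.int64)

# Fold instances into the batch so the decoder sees (bsz * instances, seqlen).
flat_seq = tf.reshape(seq, [bsz * instances, seqlen])

# The encoder output would need matching repetition along the batch axis:
# encoded = tf.repeat(encoded, repeats=instances, axis=0)
# ...run the decoder on flat_seq, then unfold the outputs:
# logits = tf.reshape(logits, [bsz, instances, seqlen, vocab_size])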

Generate different inference results

Hi, thanks for your wonderful work! Can pix2seq generate different inference results by setting a random seed? If it can, how should I modify the code?
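Not the maintainers' answer, but since Pix2Seq decodes by sampling tokens (e.g. nucleus/top-k sampling) rather than pure greedy argmax, varying the global seed between runs should already yield different detections. A sketch of the generic mechanism (the repo's actual flag names for temperature or top-k are not verified here):

import tensorflow as tf

tf.random.set_seed(1234)  # vary this value between runs for different samples

def sample_token(logits, temperature=1.0):
    # logits: (batch, vocab). Higher temperature -> more diverse outputs.
    return tf.random.categorical(logits / temperature, num_samples=1)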

Typo in the README

[screenshot of the Objects365 checkpoint table in the README]

Under the Objects365 pretrained models, the ViT-B and ViT-L entries in the left column should be switched to match the URLs.

Training from scratch on local MSCOCO data on A100

Hi,
I cannot find any resource on how to train this model on a locally available MSCOCO dataset using GPUs. Could you please provide a training guide for that? I know the code is written for TPUs; what changes are required to train on GPUs? (A generic sketch follows this message.)

Regards,
Dipendra
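For what it's worth, a generic TensorFlow sketch of the strategy swap involved (not a verified recipe for this repo's run.py, which may already pick a strategy from the available devices):

import tensorflow as tf

# On TPUs the code would use tf.distribute.TPUStrategy; on local GPUs such as
# an A100, the standard replacement is MirroredStrategy.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])  # placeholder model
    optimizer = tf.keras.optimizers.Adam()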

multitask object detection result is wrong!

When I run pix2seq_inference_multitask.ipynb, the detection output is below; the box coordinates are almost all 640, which looks wrong. Why does this happen?
(<tf.Tensor: shape=(1, 640, 640, 3), dtype=float32, numpy=
array([[[[0.03137255, 0.03529412, 0.04313725],
[0.01960784, 0.02352941, 0.03137255],
[0.00784314, 0.01176471, 0.01960784],
...,
[0.3 , 0.3 , 0.3 ],
[0.3 , 0.3 , 0.3 ],
[0.3 , 0.3 , 0.3 ]],

    [[0.00784314, 0.01176471, 0.01960784],
     [0.00784314, 0.01176471, 0.01960784],
     [0.00784314, 0.01176471, 0.01960784],
     ...,
     [0.3       , 0.3       , 0.3       ],
     [0.3       , 0.3       , 0.3       ],
     [0.3       , 0.3       , 0.3       ]],

    [[0.00784314, 0.01176471, 0.01960784],
     [0.00784314, 0.01176471, 0.01960784],
     [0.01176471, 0.01568627, 0.02352941],
     ...,
     [0.3       , 0.3       , 0.3       ],
     [0.3       , 0.3       , 0.3       ],
     [0.3       , 0.3       , 0.3       ]],

    ...,

    [[0.50980395, 0.5137255 , 0.5294118 ],
     [0.5254902 , 0.5294118 , 0.54509807],
     [0.50980395, 0.50980395, 0.5176471 ],
     ...,
     [0.3       , 0.3       , 0.3       ],
     [0.3       , 0.3       , 0.3       ],
     [0.3       , 0.3       , 0.3       ]],

    [[0.5568628 , 0.5686275 , 0.5882353 ],
     [0.5294118 , 0.53333336, 0.54901963],
     [0.5294118 , 0.5294118 , 0.5372549 ],
     ...,
     [0.3       , 0.3       , 0.3       ],
     [0.3       , 0.3       , 0.3       ],
     [0.3       , 0.3       , 0.3       ]],

    [[0.5803922 , 0.5921569 , 0.6117647 ],
     [0.5254902 , 0.5294118 , 0.54509807],
     [0.5254902 , 0.5294118 , 0.5372549 ],
     ...,
     [0.3       , 0.3       , 0.3       ],
     [0.3       , 0.3       , 0.3       ],
     [0.3       , 0.3       , 0.3       ]]]], dtype=float32)>, <tf.Tensor: shape=(1,), dtype=int32, numpy=array([230983], dtype=int32)>, <tf.Tensor: shape=(1, 10, 4), dtype=float32, numpy=

array([[[1. , 1. , 1. , 1. ],
[1. , 1. , 1. , 1. ],
[1. , 1. , 1. , 1. ],
[1. , 0.5105105, 1. , 1. ],
[1. , 1. , 1. , 1. ],
[1. , 1. , 1. , 1. ],
[0.7327327, 1. , 1. , 1. ],
[1. , 1. , 1. , 1. ],
[1. , 1. , 1. , 1. ],
[1. , 1. , 1. , 1. ]]], dtype=float32)>, <tf.Tensor: shape=(1, 10, 4), dtype=float32, numpy=
array([[[640. , 640. , 640. , 640. ],
[640. , 640. , 640. , 640. ],
[640. , 640. , 640. , 640. ],
[640. , 326.7267 , 640. , 640. ],
[640. , 640. , 640. , 640. ],
[640. , 640. , 640. , 640. ],
[468.94894, 640. , 640. , 640. ],
[640. , 640. , 640. , 640. ],
[640. , 640. , 640. , 640. ],
[640. , 640. , 640. , 640. ]]], dtype=float32)>, <tf.Tensor: shape=(1, 10), dtype=int64, numpy=array([[505, 802, 505, 505, 505, 505, 505, 776, 505, 505]])>, <tf.Tensor: shape=(1, 10), dtype=float32, numpy=
array([[8.74843026e-05, 9.13504991e-05, 8.93203469e-05, 8.99704537e-05,
8.39981003e-05, 9.77213494e-05, 8.80038278e-05, 8.94075274e-05,
1.05320119e-04, 1.03460596e-04]], dtype=float32)>, <tf.Tensor: shape=(1, 10), dtype=int64, numpy=array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])>, <tf.Tensor: shape=(1, 10, 4), dtype=float32, numpy=
array([[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]], dtype=float32)>, <tf.Tensor: shape=(1, 10, 4), dtype=float32, numpy=
array([[[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]]], dtype=float32)>, <tf.Tensor: shape=(1, 10), dtype=float32, numpy=array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]], dtype=float32)>, <tf.Tensor: shape=(1, 10), dtype=bool, numpy=
array([[False, False, False, False, False, False, False, False, False,
False]])>)

RIN training with float16/bfloat16

I've been trying to train RIN with mixed precision, but it fails for some reason.
Have you tried training with mixed precision? If so, do you have any recommendations for stabilizing it?

Thanks!

[Figure: generated samples. Left: bfloat16; right: float32]
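Not an official answer, but a minimal sketch of the standard Keras mixed-precision setup, in case it helps narrow things down (whether RIN's attention/reduction ops in this codebase need to stay in float32 is not something I've verified):

import tensorflow as tf

# bfloat16 needs no loss scaling; float16 generally does.
tf.keras.mixed_precision.set_global_policy('mixed_bfloat16')

# For float16 instead, also wrap the optimizer in a loss-scale optimizer:
# tf.keras.mixed_precision.set_global_policy('mixed_float16')
# optimizer = tf.keras.mixed_precision.LossScaleOptimizer(optimizer)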

How to finetune on wider_face? & VOC fine-tuning colab error

Following the fine-tuning example, I changed the dataset name to wider_face, but the labels were not defined, so training could not proceed. Is there an example of fine-tuning on the wider_face dataset?

Also, running the fine-tuning example from the given colab as-is fails with the following error:

ValueError: in user code:

File "/home/vislab/Human_Pose_Estimation/capstone/pix2seq/data/dataset.py", line 111, in None  *
    lambda x: self.extract(x, training)
File "/home/vislab/Human_Pose_Estimation/capstone/pix2seq/test.py", line 53, in extract  *
    areas = tf.reshape(areas, [tf.shape(label)[0]])

ValueError: slice index 0 of dimension 0 out of bounds. for '{{node strided_slice}} = StridedSlice[Index=DT_INT32, T=DT_INT32, begin_mask=0, ellipsis_mask=0, end_mask=0, new_axis_mask=0, shrink_axis_mask=1](Shape, strided_slice/stack, strided_slice/stack_1, strided_slice/stack_2)' with input shapes: [0], [1], [1], [1] and with computed input tensors: input[1] = <0>, input[2] = <1>, input[3] = <1>.

With this error, the colab does not run.
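A hedged reading of the error: "input shapes: [0]" suggests tf.shape(label) returned an empty vector, i.e. label arrived as a scalar rather than a rank-1 per-object tensor (wider_face's fields differ from COCO's). A minimal sketch of one workaround, with dummy tensors standing in for the real pipeline values:

import tensorflow as tf

# Dummy stand-ins for the values inside extract(); wider_face may hand back
# a scalar label where COCO gives a 1-D tensor of per-object labels.
label = tf.constant(3)
areas = tf.constant([100.0])

label = tf.reshape(label, [-1])  # force rank 1, even when label is a scalar
areas = tf.reshape(areas, [tf.shape(label)[0]])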
