
Text as Neural Operator: Image Manipulation by Text Instruction

This is the code for the paper:

Text as Neural Operator: Image Manipulation by Text Instruction
Tianhao Zhang, Hung-Yu Tseng, Lu Jiang, Weilong Yang, Honglak Lee, Irfan Essa

Please note that this is not an officially supported Google product, and that this is a reproduction, not the original code.

If you find this code useful in your research, please cite:

TODO: Citation

Introduction

In this paper, we study a new task that allows users to edit an input image using language instructions.

Problem Overview

The key idea is to treat language as neural operators to locally modify the image feature. To this end, our model decomposes the generation process into finding where (spatial region) and how (text operators) to apply modifications. We show that the proposed model performs favorably against recent baselines on three datasets.

Method

Installation

Clone this repo and go to the cloned directory.

Please create an environment using Python 3.7 and install the dependencies with

pip install -r requirements.txt

To reproduce the results reported in the paper, you would need a V100 GPU.

Download datasets and pretrained model

The original Clevr dataset we used is from this external website. The original Abstract Scene we used is from this external website.

Pretrained models (Clevr and Abstract Scene) can be downloaded from here. Extract the tar:

tar -xvf checkpoints.tar -C ../

Testing Using Pretrained Model

Once the dataset is preprocessed and the pretrained model is downloaded,

  1. Generate images using the pretrained model.

    bash run_test.sh

    Please switch parameters in the script for different datasets. Testing parameters for Clevr and Abstract Scene datasets are already configured in the script.

  2. The outputs are at ../output/.

Training

New models can be trained with the following commands.

  1. Prepare the dataset. Please contact the author if you need the processed datasets. If you use a new dataset, follow the structure of the provided datasets; that is, you need paired data (input image, input text, output image).

  2. Train.

# Pretraining
bash run_pretrain.sh

# Training
bash run_train.sh

There are many options you can specify. Training parameters for Clevr and Abstract Scene datasets are already configured in the script.
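The paired-data requirement in step 1 can be sketched as follows. The directory layout and file naming below are hypothetical, for illustration only; follow the structure of the provided datasets for the real convention.

```python
# Hypothetical sketch of a paired (input image, instruction text, output
# image) layout. The naming scheme <i>_input.png / <i>_text.txt /
# <i>_output.png is an assumption, not this repo's actual convention.
from pathlib import Path


def load_pairs(root):
    """Collect (input_image_path, instruction_text, output_image_path) triples."""
    root = Path(root)
    triples = []
    for txt in sorted(root.glob("*_text.txt")):
        idx = txt.name.split("_")[0]
        inp = root / f"{idx}_input.png"
        out = root / f"{idx}_output.png"
        # Keep only fully paired samples: both images must exist.
        if inp.exists() and out.exists():
            triples.append((inp, txt.read_text().strip(), out))
    return triples
```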

Tensorboard logs are stored at ../checkpoints_local/TIMGAN/tfboard.

Testing

Testing is similar to testing pretrained models.

bash run_test.sh

Testing parameters for Clevr and Abstract Scene datasets are already configured in the script.

Evaluation

The FID score is computed using the PyTorch implementation here. The image retrieval script will be released soon.
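For intuition, FID is the Fréchet distance between Gaussian statistics of real and generated feature sets. The sketch below is simplified to assume diagonal covariances, so the matrix square root reduces to an elementwise square root; the linked PyTorch implementation computes the full-covariance version over Inception features.

```python
import math


def fid_diagonal(mu1, var1, mu2, var2):
    """Fréchet distance between two Gaussians with diagonal covariances.

    FID = ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^{1/2}); with diagonal
    covariances the trace term reduces to an elementwise expression.
    Simplified sketch only -- the real metric uses full covariance
    matrices of Inception features.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu1, mu2))
    cov_term = sum(v1 + v2 - 2.0 * math.sqrt(v1 * v2)
                   for v1, v2 in zip(var1, var2))
    return mean_term + cov_term
```

Identical statistics give a distance of zero; shifting one mean by a unit vector adds exactly 1 to the score.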

Pretrained Model Performance

                  FID    RS@1       RS@5
  Clevr           33.0   95.9±0.1   97.8±0.1
  Abstract Scene  35.1   35.4±0.2   58.7±0.1
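RS@k in the table is retrieval recall: each generated image's embedding is used as a query against the ground-truth targets, and a sample counts as a hit if its true target ranks in the top k. The cosine-similarity sketch below is illustrative only, assuming query and gallery embeddings are index-aligned; the paper's actual setup obtains embeddings from a pretrained autoencoder.

```python
import math


def recall_at_k(query_vecs, gallery_vecs, k):
    """Fraction of queries whose matching gallery item (same index)
    ranks in the top-k by cosine similarity. Illustrative sketch only.
    """
    def cos(a, b):
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return sum(x * y for x, y in zip(a, b)) / (na * nb)

    hits = 0
    for i, q in enumerate(query_vecs):
        # Rank all gallery items by similarity to this query.
        order = sorted(range(len(gallery_vecs)),
                       key=lambda j: cos(q, gallery_vecs[j]), reverse=True)
        if i in order[:k]:
            hits += 1
    return hits / len(query_vecs)
```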

Code Structure

  • run_pretrain.sh, run_train.sh, run_test.sh: bash scripts for pretraining, training and testing.
  • train_recon.py, train.py, test.py: the entry point for pretraining, training and testing.
  • models/tim_gan.py: creates the networks and computes the losses.
  • models/networks.py: defines the basic modules for the architecture.
  • options/: options.
  • dataset/: defines the class for loading the dataset.

Question

Please contact [email protected] if you need the preprocessed data or the Cityscapes pretrained model.


tim-gan's Issues

Retrieval scripts and Autoencoder for obtaining query vectors

Hi,

It was great to read your paper.

Retrieval script
I was wondering if you are going to release the retrieval script any time soon?

Autoencoder for getting image embeddings for retrieval:
What is the exact architecture of this autoencoder? Are the encoder and decoder the same as the encoder and generator used in TIM-GAN?
Could you please explain the retrieval process? In particular, we have an autoencoder made of an encoder E1 and a decoder D1.
We then pretrain this autoencoder on the dataset. Could you tell me the exact pretraining process, loss functions, etc.? Can we use the run_pretrain.sh script for training the autoencoder?

While calculating recall for your method, this is written in the paper:
[Screenshot of the recall computation from the paper]

But how do we calculate the recall for the other methods, i.e., what encoder is used in that case?

In my understanding, there should be a separate autoencoder trained on the dataset, independent of TIM-GAN and of the other methods. Then, after all the models are trained and able to generate images, we use this pretrained autoencoder to get the representations of the generated images and use them as queries.
Could you tell me whether this is what happens in the paper, and if not, what the exact process is? I want to calculate the metrics for these methods on my side.

Thank you in advance for the help.

Clevr Dataset

Hi,

The CSS dataset in the TIRG repository is not valid for training the TIM-GAN model because they are not re-rendered to solve the misalignment issue of the unchanged objects in the images. Since the attention mechanism is trained based on the differences between the source and the target images, the model could not learn which object to give attention to when the non-target objects are misaligned. Can you please share the dataset generator code or make the re-rendered dataset available?

An example of the misalignment issue is as follows:
[Example image showing object misalignment between source and target]

The locations of the objects are not stable between the input (source) and the ground truth (target) images.
