
advent's Introduction

ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation

Updates

  • 02/2020: Using CycleGAN-translated images, the AdvEnt model achieves 46.3% on GTA5-2-Cityscapes.
  • 09/2019: Check out our new paper DADA: Depth-aware Domain Adaptation in Semantic Segmentation (accepted to ICCV 2019). With a depth-aware UDA framework, we leverage depth as privileged information at training time to boost target performance. PyTorch code and pre-trained models are coming soon.

Paper

ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation
Tuan-Hung Vu, Himalaya Jain, Maxime Bucher, Matthieu Cord, Patrick Pérez
valeo.ai, France
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019 (Oral)

If you find this code useful for your research, please cite our paper:

@inproceedings{vu2018advent,
  title={ADVENT: Adversarial Entropy Minimization for Domain Adaptation in Semantic Segmentation},
  author={Vu, Tuan-Hung and Jain, Himalaya and Bucher, Maxime and Cord, Matthieu and P{\'e}rez, Patrick},
  booktitle={CVPR},
  year={2019}
}

Abstract

Semantic segmentation is a key problem for many computer vision tasks. While approaches based on convolutional neural networks constantly break new records on different benchmarks, generalizing well to diverse testing environments remains a major challenge. In numerous real world applications, there is indeed a large gap between data distributions in train and test domains, which results in severe performance loss at run-time. In this work, we address the task of unsupervised domain adaptation in semantic segmentation with losses based on the entropy of the pixel-wise predictions. To this end, we propose two novel, complementary methods using (i) an entropy loss and (ii) an adversarial loss respectively. We demonstrate state-of-the-art performance in semantic segmentation on two challenging synthetic-2-real set-ups and show that the approach can also be used for detection.
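
Both methods build on the entropy of the pixel-wise softmax prediction. As a minimal PyTorch sketch (shapes, the 19-class setting, and the normalization by log(C) are illustrative, not taken verbatim from the code), the entropy map and the direct minimization loss (MinEnt) can be written as:

import torch
import torch.nn.functional as F

def entropy_map(logits):
    """Pixel-wise entropy of the softmax prediction, normalized to [0, 1].

    logits: (B, C, H, W) raw scores from the segmentation network.
    Returns a (B, H, W) entropy map.
    """
    num_classes = logits.size(1)
    prob = F.softmax(logits, dim=1)
    log_prob = F.log_softmax(logits, dim=1)
    ent = -(prob * log_prob).sum(dim=1)                          # (B, H, W), in nats
    return ent / torch.log(torch.tensor(float(num_classes)))     # normalize by log(C)

# Direct entropy minimization (MinEnt): average the map over all target-domain pixels.
target_logits = torch.randn(1, 19, 65, 129, requires_grad=True)  # dummy network output
loss_ent = entropy_map(target_logits).mean()
loss_ent.backward()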

Demo

Preparation

Pre-requisites

  • Python 3.7
  • Pytorch >= 0.4.1
  • CUDA 9.0 or higher

Installation

  1. Clone the repo:
$ git clone https://github.com/valeoai/ADVENT
$ cd ADVENT
  2. Install OpenCV if you don't already have it:
$ conda install -c menpo opencv
  3. Install this repository and the dependencies using pip:
$ pip install -e <root_dir>

With this, you can edit the ADVENT code on the fly and import ADVENT functions and classes in other projects as well.

  4. Optional. To uninstall this package, run:
$ pip uninstall ADVENT

You can take a look at the Dockerfile if you are uncertain about the steps to install this project.

Datasets

By default, the datasets are put in <root_dir>/data. We use symlinks to hook the ADVENT codebase to the datasets (a minimal sketch of this setup follows the directory listings below). An alternative option is to explicitly specify the parameters DATA_DIRECTORY_SOURCE and DATA_DIRECTORY_TARGET in the YML configuration files.

  • GTA5: Please follow the instructions here to download images and semantic segmentation annotations. The GTA5 dataset directory should have this basic structure:
<root_dir>/data/GTA5/                               % GTA dataset root
<root_dir>/data/GTA5/images/                        % GTA images
<root_dir>/data/GTA5/labels/                        % Semantic segmentation labels
...
  • Cityscapes: Please follow the instructions in Cityscapes to download the images and validation ground truths. The Cityscapes dataset directory should have this basic structure:
<root_dir>/data/Cityscapes/                         % Cityscapes dataset root
<root_dir>/data/Cityscapes/leftImg8bit              % Cityscapes images
<root_dir>/data/Cityscapes/leftImg8bit/val
<root_dir>/data/Cityscapes/gtFine                   % Semantic segmentation labels
<root_dir>/data/Cityscapes/gtFine/val
...
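
As an illustration of the symlink setup mentioned above (a sketch only; the download locations below are placeholders to adapt to your machine):

import os

root_dir = os.path.abspath(".")              # path to the cloned ADVENT repository
downloads = {                                # placeholder paths, adjust as needed
    "GTA5": "/datasets/GTA5",
    "Cityscapes": "/datasets/Cityscapes",
}

os.makedirs(os.path.join(root_dir, "data"), exist_ok=True)
for name, src in downloads.items():
    dst = os.path.join(root_dir, "data", name)
    if not os.path.exists(dst):
        os.symlink(src, dst)                 # <root_dir>/data/<name> -> actual dataset folder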

Pre-trained models

Pre-trained models can be downloaded here and put in <root_dir>/pretrained_models.

Running the code

For evaluation, execute:

$ cd <root_dir>/advent/scripts
$ python test.py --cfg ./configs/advent_pretrained.yml
$ python test.py --cfg ./configs/advent_cyclegan_pretrained.yml 	% trained on cycleGAN translated images
$ python test.py --cfg ./configs/minent_pretrained.yml
$ python test.py --cfg ./configs/advent+minent.yml

Training

For the experiments in the paper, we used PyTorch 0.4.1 and CUDA 9.0. To aid reproduction, the random seed has been fixed in the code; still, you may need to train a few times to reach comparable performance.

By default, logs and snapshots are stored in <root_dir>/experiments with this structure:

<root_dir>/experiments/logs
<root_dir>/experiments/snapshots

To train AdvEnt:

$ cd <root_dir>/advent/scripts
$ python train.py --cfg ./configs/advent.yml
$ python train.py --cfg ./configs/advent.yml --tensorboard         % using tensorboard

To train MinEnt:

$ python train.py --cfg ./configs/minent.yml
$ python train.py --cfg ./configs/minent.yml --tensorboard         % using tensorboard

Testing

To test AdvEnt:

$ cd <root_dir>/advent/scripts
$ python test.py --cfg ./configs/advent.yml

To test MinEnt:

$ python test.py --cfg ./configs/minent.yml

Acknowledgements

This codebase is heavily borrowed from AdaptSegNet and Pytorch-Deeplab.

License

ADVENT is released under the Apache 2.0 license.

advent's People

Contributors

himalayajain, tuanhungvu


advent's Issues

reproducibility of the results

I'm having trouble reproducing the results when I normalize the input image with mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225) and use the pretrained ResNet-101 model from the official PyTorch website. Could you please explain the reason? Thanks.

About implementation of AdvEnt+MinEnt

Hi,

I implemented AdvEnt+MinEnt (45.5 in the paper) by combining AdvEnt and MinEnt, but got 42.48% instead.

For GTA5 -> Cityscapes:
I saw the result for MinEnt + ER in the paper. My question is: do you use the +ER or class-prior techniques in the AdvEnt+MinEnt experiment?

Could you please briefly clarify how you implemented the AdvEnt+MinEnt experiment?

Best,
Chang

How to reduce the requirement of GPU memory

Hi, I am trying to reproduce the state-of-the-art results from your ADVENT paper.
My school's server is down for a system update, so for now I only have a single RTX 2070S with 8 GB for training. Is there any way to reduce the GPU memory required to train your model?
I have noticed that the batch size is already 1 in the config file.

Looking forward to your reply!
Many thanks!

How to perform ensembling?

Very insightful paper. I have one question.
In the paper, the ensemble of MinEnt and AdvEnt achieves better performance.
How did you perform the ensembling?
Did you calculate the average of the probability map (after the softmax) or logit map (before the softmax) of the two models?

Thanks for your reply in advance.
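
For reference only (this is a sketch of the two options raised in the question, not a confirmation of the authors' choice), softmax-level and logit-level fusion differ just in where the averaging happens:

import torch
import torch.nn.functional as F

def ensemble_softmax(logits_a, logits_b):
    # Average the class-probability maps of the two models (fusion after the softmax).
    prob = 0.5 * (F.softmax(logits_a, dim=1) + F.softmax(logits_b, dim=1))
    return prob.argmax(dim=1)                # (B, H, W) predicted label map

def ensemble_logits(logits_a, logits_b):
    # Average the raw logit maps (fusion before the softmax).
    return (0.5 * (logits_a + logits_b)).argmax(dim=1)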

question about the segmentation network

Hello @valeoai, have you tried different segmentation networks in ADVENT? Currently only DeepLab is used as the segmentation network, so I want to ask whether the choice of segmentation network influences the segmentation result. Is it possible to get better results by changing the segmentation network? Thank you in advance.

Class ratio prior loss ?

Hello, thanks for sharing this great work. I couldn't find the class-ratio prior loss and wasn't sure I understood how it works from your paper. Did I miss something?
Thanks!
Mathilde

Training Performance and Stability

Hello,

Thank you for your work on this repo! I have a quick question. When I train a model (either MinEnt or Advent), I find that the validation performance varies widely (sometimes 5-10 mIOU points) from snapshot to snapshot (where snapshots are taken every 2000 iterations with the default learning rate). Do you recall experiencing this type of variation in mIOU across snapshots?

If so, did you just report the score of the best single snapshot on the 500-image val set (i.e. take the best evaluation under the 'best' config for cfg.TEST.MODE)?

It is possible that the performance stabilizes after more steps, but I am currently at iteration 60,000, so that seems unlikely at this point. Thank you!

Note: I have read all the previous issues and I am not concerned with attaining 43.3 vs 43.8 or that sort of thing. I am concerned with performance varying more widely, from say 41 to 37 to 42 within a few thousand iterations.

source_label=0; target_label=1

Hello @tuanhungvu, I find that in ADVENT/advent/domain_adaptation/train_UDA.py, source_label = 0 and target_label = 1, which is the inverse of what is described in the original paper. This confuses me; I hope the authors could give me an answer, thanks.

binary segmentation

Hi, how can I switch the dataset to binary segmentation? I have images and masks and can generate a data list, but the format seems quite different from Cityscapes.

Any guidance would be greatly appreciated!

performance

Hi. I cloned your repo and ran "python train.py --cfg ./configs/advent.yml", then tested with "python test.py --cfg ./configs/advent.yml", but the test mIoU is only 43.25, which is lower than in the paper. Did you fine-tune the model to get an mIoU of 43.8? Thanks.

Direct entropy minimization for object detection (YOLOv3)

Hello @tuanhungvu ,

I am a student at TU Chemnitz, Germany. I am currently working on my master thesis project titled, 'Unsupervised Domain Adaptation for object detection.' I am working on the direct entropy minimization method mentioned in your paper. I am using YOLOv3 as my base architecture for object detection. I just wanted to confirm if I am implementing the method correctly for YOLOv3, as in the paper it is defined for SSD and I could not find any source code of the same for reference.

I am a little confused about the term 'soft-detection map' which is to be used to calculate the entropy for object detection. I read the paper and found some similarities between SSD and YOLOv3 but I am not absolutely sure if I am using the correct feature map during implementation. It would be great if you could help me with this.

  1. Could you specify from which exact layer the 'soft-detection' map is taken for SSD? By any chance, would you know what its equivalent would be in YOLOv3?

  2. In the entropy equation for detection (the equation image is not reproduced here), is it correct that C represents the class probabilities for each anchor box, or does it represent all the offsets obtained for each anchor box after applying the kernel?

  3. In YOLOv3, feature maps are obtained at 3 different scales. So should the feature map be considered as the output from the previous convolutional layer that would be used for detection or just the class probabilities obtained after processing the feature map to apply softmax and calculate the entropy map?

I hope I am able to express my doubt in a clear way. In case, you need some additional information, please let me know.
Thanks in advance.
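
As a purely illustrative sketch (not the authors' implementation; the choice of feature map is an assumption, and YOLOv3 itself uses independent sigmoids for class scores, which would change this detail), direct entropy minimization on a detection head could look like:

import torch
import torch.nn.functional as F

def detection_entropy(class_logits):
    """Normalized entropy over per-anchor class scores at one output scale.

    class_logits: (B, A, C, H, W) class scores for A anchors and C classes;
    which layer these come from is an assumption in this sketch.
    """
    prob = F.softmax(class_logits, dim=2)
    log_prob = F.log_softmax(class_logits, dim=2)
    ent = -(prob * log_prob).sum(dim=2)                          # (B, A, H, W)
    num_classes = class_logits.size(2)
    return ent.mean() / torch.log(torch.tensor(float(num_classes)))

# With a multi-scale head (e.g. the three YOLOv3 scales), the per-scale
# entropy losses could simply be summed or averaged.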

Hyperparameters for training with CycleGAN translated images

Hello, thanks for publishing your code.

I am trying to reproduce your results regarding the use of CycleGAN translated images, but am unable to. Did you use the same hyperparameters for this run, or did you change these? If so, what hyperparameter values did you change?

Thanks in advance :)

Why doesn't the output shape of the discriminator have to be (B,1,1,1)?

From the code, I see that the discriminator is a fully convolutional network (like the discriminator in DCGAN). But when we feed in a self-information map I(x) of arbitrary size, the output shape of the discriminator is not fixed to (B, 1, 1, 1); we may get an output of shape (B, 1, 4, 4), and a ground-truth tensor whose elements are all 1 or 0 (source or target) is then created to compute the BCE loss.
I don't understand why the output shape of the discriminator does not have to be (B, 1, 1, 1) before it is used directly in the BCE loss.
Thank you!
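
For context, a fully convolutional (patch-style) discriminator keeps a spatial grid of real/fake scores, and the BCE loss is simply averaged over that grid against a label map filled with the domain label, so no reduction to (B, 1, 1, 1) is needed first. A minimal sketch of this pattern (shapes and the use of the with-logits variant are illustrative):

import torch
import torch.nn.functional as F

source_label, target_label = 0.0, 1.0                 # convention mentioned in the train_UDA.py issue above

d_out = torch.randn(2, 1, 4, 4, requires_grad=True)   # spatial discriminator output for some input size

# Every spatial location gets the same domain label; BCE averages over all locations.
label_map = torch.full_like(d_out, source_label)
loss_d = F.binary_cross_entropy_with_logits(d_out, label_map)
loss_d.backward()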

Question about the weighting factor of the entropy loss

Hi,
I checked the code and was surprised to find that the weighting factors of the entropy loss and the adversarial loss are both very small (1e-3, compared with the segmentation weighting factor of 1.0), so I wonder whether they really have an effect in practice. Also, for the direct entropy loss, could it happen that the network eventually predicts every pixel as a single class? In that case the network would have a very low entropy loss but poor performance.
Looking forward to hearing your reply.

About the code of ADVENT

Hello,

I am trying to reproduce the result reported in the ADVENT paper, but the result I got is 42.36 (AdvEnt only, best result). I want to know whether there is any mistake in the configuration, or have you ever seen such a result? (Results screenshot not reproduced.)

One hot encoding

Dear Author,

Is one-hot encoding not required when training on the source dataset?

Thank you.
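
For reference, PyTorch's cross-entropy loss consumes integer class-index label maps directly, so one-hot encoding is generally not needed for the source supervision. A minimal sketch (shapes and the 255 ignore index are illustrative):

import torch
import torch.nn.functional as F

logits = torch.randn(2, 19, 64, 128, requires_grad=True)   # (B, C, H, W) segmentation output
labels = torch.randint(0, 19, (2, 64, 128))                 # (B, H, W) integer class indices, no one-hot

# cross_entropy takes class indices directly; ignore_index skips void pixels (e.g. 255).
loss = F.cross_entropy(logits, labels, ignore_index=255)
loss.backward()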

Data normalization before feeding to the Segmentation network

Hi, I went through the dataloader and training code and noticed that you subtract the ImageNet mean from the input RGB images (GTA5 and Cityscapes) but do not normalize the data to [0, 1] (or to [-0.5, 0.5], for that matter).

I just wanted to know whether I missed something or whether you intentionally chose to skip this normalization before feeding the data to the segmentation network.

Appreciate your response.
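
For context, per-channel means of this magnitude (see the IMG_MEAN values quoted in a later issue) follow the Caffe-style convention: pixel values stay in the 0-255 range, channels are reordered to BGR, and only the mean is subtracted, with no division by a standard deviation. A sketch of that convention (whether the repository flips channels exactly this way should be checked against its dataloader):

import numpy as np

IMG_MEAN = np.array((104.00698793, 116.66876762, 122.67891434), dtype=np.float32)  # BGR mean

def preprocess(image_rgb):
    """image_rgb: HxWx3 uint8 RGB array. Returns 3xHxW float32, BGR, mean-subtracted."""
    img = np.asarray(image_rgb, dtype=np.float32)[:, :, ::-1]   # RGB -> BGR, values stay in 0-255
    img = img - IMG_MEAN                                        # subtract the per-channel mean only
    return img.transpose(2, 0, 1)                               # HWC -> CHW for PyTorch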

Is it a typing error? (in the loss function)

Hello. I may have found a difference between the code and the paper.

In the code, the loss function for training the discriminator treats the source domain as 0 and the target domain as 1.

However, the same part in the paper shows the source domain as 1 and the target domain as 0.

Is this the intended notation, or just a typing error?

Question about calculation of the image mean

I found that in your code IMG_MEAN is set to [104.00698793, 116.66876762, 122.67891434]. Is IMG_MEAN calculated on the source domain only, the target domain only, or both?

About implementation of class-ratio priors

Thanks for your great work, but I am a little confused about the class-ratio priors. I can't find their implementation in your project, and I wonder whether the implementation of CP is what I describe below (see the sketch after this issue).
First, compute the class distribution from the source labels to get p_s.
Then, pass the post-softmax feature map of a target-domain image through a global average pooling layer to get the mean class score, p_x.
Finally, subtract p_x from p_s and, over the class channels, sum the differences that are greater than 0.
Besides, I want to ask another question about the loss function L_cp: why must the subtraction result be greater than 0? Maybe taking the absolute value of the result would also help pull the target domain's distribution towards the source domain's.
Thanks!
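
A minimal sketch of the procedure described in this issue (an interpretation only, not verified against the paper's exact L_cp formulation):

import torch
import torch.nn.functional as F

def class_ratio_prior_loss(target_logits, source_class_prior):
    """Sketch of the steps described above; not the paper's exact formulation.

    target_logits: (B, C, H, W) target-domain predictions.
    source_class_prior: (C,) class frequencies p_s estimated on the source labels.
    """
    prob = F.softmax(target_logits, dim=1)
    pred_prior = prob.mean(dim=(2, 3))                  # global average pooling -> (B, C), i.e. p_x
    gap = source_class_prior.unsqueeze(0) - pred_prior  # positive where a class is under-predicted
    return F.relu(gap).sum(dim=1).mean()                # keep only differences > 0, sum over classes

# Example call with a uniform 19-class prior and dummy predictions:
# loss_cp = class_ratio_prior_loss(torch.randn(1, 19, 65, 129), torch.full((19,), 1.0 / 19))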

About the entropy minimization

Hello~
I noticed that the log used in the entropy minimization is log2() rather than log(); have you tried torch.log()?

How to implement from SYNTHIA to Cityscapes?

Hello, I am re-implementing the adaptation from SYNTHIA to Cityscapes. Except for the image size [1280, 760], I used exactly the same setup as for GTA5 -> Cityscapes, including the initial parameters, learning rate, iterations, etc. But according to my training results, I could only get a best mIoU of 39.1% for 16 classes. I would like to ask how I should improve this to reach a performance similar to the one in your paper. (Results screenshot not reproduced.)
