
fda's Introduction

FDA: Fourier Domain Adaptation for Semantic Segmentation.

This is the PyTorch implementation of our FDA paper, published in CVPR 2020.

Domain adaptation via style transfer made easy using the Fourier transform. FDA needs no deep networks for style transfer and involves no adversarial training. Below is a diagram of the proposed Fourier Domain Adaptation method:

Step 1: Apply FFT to source and target images.

Step 2: Replace the low frequency part of the source amplitude with that from the target.

Step 3: Apply inverse FFT to the modified source spectrum.

Image of FDA
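For reference, the three steps can be written out in a few lines of NumPy. This is a condensed sketch of the idea, not the official code (the official implementations live in utils/__init__.py and FDA_demo.py):

import numpy as np

def fda_sketch_np(src, trg, beta=0.01):
    # src, trg: float arrays of shape (C, H, W), raw pixel values in 0-255
    # Step 1: FFT of source and target
    fft_src = np.fft.fft2(src, axes=(-2, -1))
    fft_trg = np.fft.fft2(trg, axes=(-2, -1))
    amp_src, pha_src = np.abs(fft_src), np.angle(fft_src)
    amp_trg = np.abs(fft_trg)

    # Step 2: swap the centered low-frequency window of the amplitude
    amp_src = np.fft.fftshift(amp_src, axes=(-2, -1))
    amp_trg = np.fft.fftshift(amp_trg, axes=(-2, -1))
    _, h, w = src.shape
    b = int(np.floor(min(h, w) * beta))          # window half-size controlled by beta
    ch, cw = h // 2, w // 2
    amp_src[:, ch-b:ch+b+1, cw-b:cw+b+1] = amp_trg[:, ch-b:ch+b+1, cw-b:cw+b+1]
    amp_src = np.fft.ifftshift(amp_src, axes=(-2, -1))

    # Step 3: inverse FFT of the source phase with the mutated amplitude
    src_in_trg = np.fft.ifft2(amp_src * np.exp(1j * pha_src), axes=(-2, -1))
    return np.real(src_in_trg)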

Usage

  1. FDA Demo

    python3 FDA_demo.py

    An example of FDA for domain adaptation. (source: GTA5, target: CityScapes, with beta 0.01)

    Image of Source

  2. Sim2Real Adaptation Using FDA (single beta)

    python3 train.py --snapshot-dir='../checkpoints/FDA' --init-weights='../checkpoints/FDA/init_weight/DeepLab_init.pth' --LB=0.01 --entW=0.005 --ita=2.0 --switch2entropy=0

    Important: apply FDA to the original images first, then do mean subtraction, normalization, etc. Otherwise there will be numerical artifacts (a short sketch of this ordering appears at the end of this step).

    DeepLab initialization can be downloaded through this link.

    LB: beta in the paper, controls the size of the low frequency window to be replaced.

    entW: weight on the entropy term.

    ita: coefficient for the robust norm on entropy.

    switch2entropy: entropy minimization kicks in after this many steps.
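    A minimal sketch of the ordering implied by the note above, with illustrative names (the real logic lives in train.py and the dataset classes):

    from utils import FDA_source_to_target   # repository helper

    def prepare_source_batch(src_img, trg_img, mean_img, LB=0.01):
        # src_img, trg_img: raw 0-255 image tensors of shape (B, 3, H, W)
        # 1) FDA on the raw images ...
        src_in_trg = FDA_source_to_target(src_img, trg_img, L=LB)
        # 2) ... then mean subtraction / normalization
        return src_in_trg.clone() - mean_img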

  3. Evaluation of the Segmentation Networks Adapted with Multi-band Transfer (multiple betas)

    python3 evaluation_multi.py --model='DeepLab' --save='../results' --restore-opt1="../checkpoints/FDA/gta2city_deeplab/gta2city_LB_0_01" --restore-opt2="../checkpoints/FDA/gta2city_deeplab/gta2city_LB_0_05" --restore-opt3="../checkpoints/FDA/gta2city_deeplab/gta2city_LB_0_09"

    Pretrained models on the GTA5 -> CityScapes task using DeepLab backbone can be downloaded here.

    The above command should output: ===> mIoU19: 50.45 ===> mIoU16: 54.23 ===> mIoU13: 59.78
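    Multi-band Transfer averages the predictions of the models trained with different betas. A minimal sketch of the idea, with illustrative names (the exact resizing and evaluation details are in evaluation_multi.py):

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def mbt_predict(models, image):
        # models: the networks restored with --restore-opt1/2/3 (betas 0.01, 0.05, 0.09)
        # image: a preprocessed input tensor of shape (B, 3, H, W)
        probs = [F.softmax(m(image), dim=1) for m in models]   # per-model class probabilities
        mean_prob = torch.stack(probs, dim=0).mean(dim=0)      # average over the bands
        return mean_prob.argmax(dim=1)                         # final label map (B, H, W)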

  4. Get Pseudo Labels for Self-supervised Training

    python3 getSudoLabel_multi.py --model='DeepLab' --data-list-target='./dataset/cityscapes_list/train.txt' --set='train' --restore-opt1="../checkpoints/FDA/gta2city_deeplab/gta2city_LB_0_01" --restore-opt2="../checkpoints/FDA/gta2city_deeplab/gta2city_LB_0_05" --restore-opt3="../checkpoints/FDA/gta2city_deeplab/gta2city_LB_0_09"
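    The pseudo labels come from the same multi-band ensemble; a rough sketch of the thresholding idea (the 0.9 cutoff and ignore value here are illustrative, check getSudoLabel_multi.py for the exact rule):

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def make_pseudo_label(models, image, threshold=0.9, ignore_label=255):
        # average the MBT ensemble's softmax outputs, as in evaluation
        mean_prob = torch.stack(
            [F.softmax(m(image), dim=1) for m in models], dim=0).mean(dim=0)
        conf, label = mean_prob.max(dim=1)       # per-pixel confidence and class
        label[conf < threshold] = ignore_label   # low-confidence pixels are ignored later
        return label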

  5. Self-supervised Training with Pseudo Labels

    python3 SStrain.py --model='DeepLab' --snapshot-dir='../checkpoints/FDA' --init-weights='../checkpoints/FDA/init_weight/DeepLab_init.pth' --label-folder='cs_pseudo_label' --LB=0.01 --entW=0.005 --ita=2.0
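    Schematically, self-supervised training repeats the step-2 loop with the pseudo labels standing in for the missing target annotations. This is a rough sketch with illustrative names; the actual loop, which also adds the entW-weighted robust entropy term, is in SStrain.py:

    import torch.nn.functional as F
    from utils import FDA_source_to_target

    def self_supervised_step(model, src_img, src_lbl, trg_img, pseudo_lbl, mean_img, LB=0.01):
        # FDA on raw images first, mean subtraction afterwards (see step 2)
        src_in_trg = FDA_source_to_target(src_img, trg_img, L=LB)
        src_loss = F.cross_entropy(model(src_in_trg - mean_img), src_lbl, ignore_index=255)
        trg_loss = F.cross_entropy(model(trg_img - mean_img), pseudo_lbl, ignore_index=255)
        return src_loss + trg_loss   # the real code also adds the weighted entropy term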

  6. Other Models

    VGG initializations can be downloaded through this link.

    Pretrained models on the Synthia -> CityScapes task using DeepLab backbone link.

    Pretrained models on the GTA5 -> CityScapes task using VGG backbone link.

    Pretrained models on the Synthia -> CityScapes task using VGG backbone link.

Acknowledgment

Code adapted from BDL.

fda's People

Contributors

yanchaoyang

fda's Issues

How to get best model of single Beta?

When I run train.py with --LB=0.01 as in the command you show, I get checkpoints from
gta5_2500.pth to gta5_100000.pth. I guess you test these files and rename the best one as gta2city_LB_0_01.pth for the multi-beta test. But how can I find the best model for a single beta?

Unable to reproduce the results of Sim2Real Adaptation Using FDA (single beta)

Hi,

I ran train.py to reproduce the "Sim2Real Adaptation Using FDA (single beta)" performance, following the procedure explained in Usage step 2 of the README.
Unfortunately, I failed to reproduce the results and got 42.69, 42.81, and 41.32 for β = 0.01, 0.05, and 0.09 respectively.
Can you please help me to understand the potential reasons behind this performance drop?

Regards,
Yiting Cheng

Unable to reproduce the good results of Synthia to Cityscapes

Unable to reproduce the good results for Synthia to Cityscapes. With LB=0.01, the mIoU is 44.09 before SSL; with LB=0.05, the mIoU is 43.3 before SSL. Can you please provide the hyperparameter settings for the Synthia dataset, or your intermediate weights? Thanks a lot.

Questions about entropy minimization

Hi, thanks for sharing your work. However, I have some questions about your work.

  1. Why do you introduce entropy minimization in your work?
    There is no obvious ablation study for it except Table 1. In Table 1, beta = 0.09 (T=0) gives 45.01 and beta = 0.09 (Ent=0) gives 44.64. The former is with the entropy loss and the latter is without it, right? If there is only a 0.37 improvement, why do you use the entropy loss in your work? It may introduce some misunderstandings.

Or do you just introduce two types of self-training: entropy minimization and pseudo-label retraining?

Thanks! looking forward to your reply!

Visualization after mean subtraction

When I display the image after mean subtraction with the cv2.imshow() function, it becomes colourful, with colours that didn't appear in the original image. Can someone tell me why?
(two screenshots attached)
Thanks!

Doubt regarding hardware requirements

The paper mentions that a GTX 1080 Ti GPU is used, but nothing is said about CPU RAM. This can cause problems because the get_sudo_label command used for pseudo-label generation requires a lot of CPU RAM (almost 36 GB, to be precise). I also feel the get_sudo_label file could be coded better to reduce the CPU RAM usage.

Problems about Label setting

The remapped labels range from 0 to 18, and num_classes is set to 19. Are the background and ignored semantic objects missing? At the inference stage, are backgrounds and ignored objects forcibly assigned to classes 0-18?

Conda Environment specs for running FDA_demo.py + Unable to produce output as the demo

@YanchaoYang

Can you please share the environment you used to run FDA_demo.py? I am unable to reproduce the translation result of FDA_demo.py using the latest library versions, and I am getting artefacts as follows:

source_in_target

The code I used is:

import numpy as np
from PIL import Image

def low_freq_mutate_np( amp_src, amp_trg, L=0.1 ):
    a_src = np.fft.fftshift( amp_src, axes=(-2, -1) )
    a_trg = np.fft.fftshift( amp_trg, axes=(-2, -1) )

    _, h, w = a_src.shape
    b = (  np.floor(np.amin((h,w))*L)  ).astype(int)
    c_h = np.floor(h/2.0).astype(int)
    c_w = np.floor(w/2.0).astype(int)

    h1 = c_h-b
    h2 = c_h+b+1
    w1 = c_w-b
    w2 = c_w+b+1

    a_src[:,h1:h2,w1:w2] = a_trg[:,h1:h2,w1:w2]
    a_src = np.fft.ifftshift( a_src, axes=(-2, -1) )
    return a_src

def FDA_source_to_target_np( src_img, trg_img, L=0.1 ):
    # exchange magnitude
    # input: src_img, trg_img

    src_img_np = src_img #.cpu().numpy()
    trg_img_np = trg_img #.cpu().numpy()

    # get fft of both source and target
    fft_src_np = np.fft.fft2( src_img_np, axes=(-2, -1) )
    fft_trg_np = np.fft.fft2( trg_img_np, axes=(-2, -1) )

    # extract amplitude and phase of both ffts
    amp_src, pha_src = np.abs(fft_src_np), np.angle(fft_src_np)
    amp_trg, pha_trg = np.abs(fft_trg_np), np.angle(fft_trg_np)

    # mutate the amplitude part of source with target
    amp_src_ = low_freq_mutate_np( amp_src, amp_trg, L=L )

    # mutated fft of source
    fft_src_ = amp_src_ * np.exp( 1j * pha_src )

    # get the mutated image
    src_in_trg = np.fft.ifft2( fft_src_, axes=(-2, -1) )
    src_in_trg = np.real(src_in_trg)

    return src_in_trg

im_src = Image.open("source.png").convert('RGB')
im_trg = Image.open("target.png").convert('RGB')

im_src = im_src.resize( (1024,512), Image.BICUBIC )
im_trg = im_trg.resize( (1024,512), Image.BICUBIC )

im_src = np.asarray(im_src, np.float32)
im_trg = np.asarray(im_trg, np.float32)

im_src = im_src.transpose((2, 0, 1))
im_trg = im_trg.transpose((2, 0, 1))

src_in_trg = FDA_source_to_target_np( im_src, im_trg, L=0.01 )

src_in_trg = src_in_trg.transpose((1,2,0))
# scipy.misc.toimage(src_in_trg, cmin=0.0, cmax=255.0).save('demo_images/src_in_tar.png')


src_in_trg_=Image.fromarray(np.uint8(src_in_trg))
src_in_trg_.save('gfg_dummy_pic.png')
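One possible source of the colour artefacts is the final np.uint8(src_in_trg) cast: values outside [0, 255] wrap around, whereas the commented-out scipy.misc.toimage call clamped them via cmin/cmax. A clamped equivalent with NumPy/PIL, as a sketch:

import numpy as np
from PIL import Image

src_in_trg = np.clip(src_in_trg, 0.0, 255.0)        # clamp instead of letting uint8 wrap around
Image.fromarray(src_in_trg.astype(np.uint8)).save('gfg_dummy_pic.png')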

Domain adaptation

Good day! I have read your paper and want to apply it to two other datasets. I have a question about the adaptation: did you pair each source image with a different, randomly chosen target image from the dataset you adapt to, or did you use a single target image for everything?

ent = ent / 2.9444

Hi,
In the training phase, when calculating the loss_ent variable, line 173 reads as follows:
ent = ent / 2.9444 # chanage when classes is not 19

What does the number 2.9444 represent, and how should we change it if we use a different number of classes than 19?
I did not find any reference to this in the paper.

Thanks.
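For reference, 2.9444 matches ln(19) ≈ 2.9444, the maximum entropy of a 19-class softmax, so the division normalizes the per-pixel entropy to [0, 1] before the robust norm is applied. If that reading is correct, the line would generalize to:

import numpy as np
ent = ent / np.log(num_classes)   # np.log(19) ≈ 2.9444 for the 19 Cityscapes classes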

Releasing intermediate weights

Great work! Could you share the model weights before the SST step (those getting around 44.6 mIoU on GTA5 to Cityscapes)? It might be interesting to see if other self-training methods might improve the results.

Model initialization in SSL training rounds

Hi,

During the second and third rounds of SSL training, are the models still trained from scratch, or are they restored from the end of the previous round? According to the instructions, it seems they are initialized from scratch again?

Thanks

Dataset

Can you provide the GTA5 dataset?

Questions about training from scratch

Hello @YanchaoYang, thanks for the great work. I have some questions about the weight initialization (--init-weights='DeepLab_init.pth'). What is it exactly? If I initialize the network with those weights and run inference on some Cityscapes images without any training, we already get some prediction results.
But when I try to train the model from scratch (without any weight initialization), the loss gets stuck. I also found that if we do not initialize the model with some weights, they are initialized with zeros (at least in the VGG model, as in the code below).

def _initialize_weights(self):
    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            m.weight.data.zero_()
            if m.bias is not None:
                m.bias.data.zero_()
        if isinstance(m, nn.ConvTranspose2d):
            assert m.kernel_size[0] == m.kernel_size[1]
            initial_weight = self.get_upsampling_weight(
                m.in_channels, m.out_channels, m.kernel_size[0])
            m.weight.data.copy_(initial_weight)

My question is: what are the steps required to train the network from scratch? For example, on another dataset for which I don't have these initialization weights.
Thank you again!

Deeplabv2_init question

Is this DeepLabv2 pretrained on the source data, or is it the DeepLab-ResNet101 model initialized with ImageNet weights? This is quite important: do I need to train DeepLabv2 on my own source data first, or can I just start from these weights?

GPU memory error

Hi, thanks for your work. Running train.py gives this error:

/media/data/ObjectDetectionExperiments/Projects/3_SemanticSegment/FDA-master/my_env/lib/python3.5/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='elementwise_mean' instead.
warnings.warn(warning.format(ret))
Traceback (most recent call last):
File "/media/data/ObjectDetectionExperiments/Projects/3_SemanticSegment/FDA-master/train.py", line 135, in
main()
File "/media/data/ObjectDetectionExperiments/Projects/3_SemanticSegment/FDA-master/train.py", line 94, in main
trg_seg_score = model(trg_img, lbl=trg_lbl, weight=class_weights, ita=args.ita) # forward pass
File "/media/data/ObjectDetectionExperiments/Projects/3_SemanticSegment/FDA-master/my_env/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in call
result = self.forward(*input, **kwargs)
File "/media/data/ObjectDetectionExperiments/Projects/3_SemanticSegment/FDA-master/model/deeplab.py", line 181, in forward
self.loss_seg = self.CrossEntropy2d(x, lbl, weight=weight)
File "/media/data/ObjectDetectionExperiments/Projects/3_SemanticSegment/FDA-master/model/deeplab.py", line 237, in CrossEntropy2d
predict = predict[target_mask.view(n, h, w, 1).repeat(1, 1, 1, c)].view(-1, c)
RuntimeError: CUDA error: out of memory

If I set the image size to (512, 256) in data/__init__.py, training runs without error and takes ~5 GB of GPU memory. Why does this happen?

Question about SYNTHIA dataset classes num

Hello. As we know, the SYNTHIA dataset shares only 16 classes with Cityscapes. So when training the model on the SYNTHIA dataset, do I need to change the model's classification layer to 16 channels? Does this matter?

Questions about FDA_demo.py

I guess you use the code in FDA_demo.py to transform the original images, but when I run FDA_demo.py, nothing happens and the process finishes in a few seconds. Does FDA_demo.py really work?

Question about the image transformation process

Hi, @YanchaoYang , thanks for the nice work.

I have a question about the image transformation process.

In FDA_demo.py, the pixel values of the source and target images are in the range 0-255. But after transforming the source image to the target style by swapping the amplitude spectrum, the generated image is no longer in this range (for example, the pixel values fall in the range -119 to 181 when setting beta to 0.001).

Would this cause any trouble for model training since the generated images and target images used for training are not in the same pixel value range?

Many thanks.

DeepLab_init.pth file

Hi again,
Does this file hold the weights of DeepLab trained on the GTA5 dataset, or is it some regular initialization (like ImageNet, for example)?

Thanks

Subtract mean 2 times

The README notes that we should subtract the mean after the Fourier conversion.
However, I notice that the mean of the images is already subtracted in the dataset class, in the get_item method.

image -= self.mean

And during the train loop, the mean is subtracted again:

FDA/train.py

Line 81 in b9a0cdf

src_img = src_in_trg.clone() - mean_img # src, src_lbl

Can anyone shed some light on this approach? Thank you for your help in advance.

Can we use FDA for faces?

Hi,
I tried to use FDA to solve the following task

(screenshot attached)

Do you think FDA can be used to solve this?
With L=0.01 or above it gives noise and a red/blue spectrum. With L=0.001 it gives an output, but there does not seem to be any style transfer taking place.

FDA numpy implementation not in line with the paper.

Bug

The border b in the FDA numpy implementation (https://github.com/YanchaoYang/FDA/blob/master/utils/__init__.py#L64) is taken as min(height, width)*beta, which is not in line with the paper.

To quote the paper:

As we can see from Eq. (3), $\beta = 0$ will render $x^{s→t}$ the same as the original source image $x^s$. On the other hand, when $\beta = 1.0$, the amplitude of $x^s$ will be replaced by that of $x^t$.

But when setting beta=1, we get back almost the original source image. Also, setting beta=1 should replace the entire amplitude spectrum, not just a window sized by the smaller axis.

To Reproduce

Steps to reproduce the behavior:

import numpy as np
import matplotlib.pyplot as plt
from utils import FDA_source_to_target_np

# src, tar: float arrays of shape (3, H, W)
betas = [0.001, 0.1, 0.5, 0.8, 0.9, 1.0]

f, axes = plt.subplots(3, 2, figsize=(30, 20))

for beta, ax in zip(betas, axes.ravel()):
    image = FDA_source_to_target_np(src, tar, L=beta)
    image = np.clip(image, 0, 255).astype(np.uint8)
    image = image.transpose(1, 2, 0)
    ax.imshow(image)
    ax.set_title(f"L: {beta}")
    ax.set_axis_off()

f.tight_layout()

Output

FDA bug in orig

Expected behavior

The "augmentation" should be maximum at beta=1 not at beta=0.5.

Target Image pair to Source Image

Hello, may I ask whether the pairing of Source Set and Target Set images is random? That is how it appears to be done in the code. Wouldn't this introduce an inductive bias?

FDA and random image augmentation methods

Hello, thank you for your outstanding work. I would like to know whether you have compared the FDA method with random image augmentation methods, because the visual results show some similarities, and FDA may be a more theoretically grounded image augmentation.

Slow speed of FDA

Hi, thanks for your awesome work; it's a really simple but effective method for domain adaptation.

However, FDA_source_to_target_np() and FDA_source_to_target() are a little slow to run. Is this normal?

how to train custom dataset?

Thank you for this good work. I have a question about datasets. I have two datasets of my own: one is generated from my simulation environment and the other is from the real world, without annotations. The task is airport runway segmentation. If I want to use your FDA model, how should I structure my datasets to train it? Thank you very much in advance.

Error in the FDA implementation

Hi, thank you for sharing your code!

I have noticed two errors in your implementation of FDA in

FDA/utils/__init__.py

Lines 11 to 18 in b9a0cdf

def low_freq_mutate( amp_src, amp_trg, L=0.1 ):
    _, _, h, w = amp_src.size()
    b = ( np.floor(np.amin((h,w))*L) ).astype(int)        # get b
    amp_src[:,:,0:b,0:b]     = amp_trg[:,:,0:b,0:b]       # top left
    amp_src[:,:,0:b,w-b:w]   = amp_trg[:,:,0:b,w-b:w]     # top right
    amp_src[:,:,h-b:h,0:b]   = amp_trg[:,:,h-b:h,0:b]     # bottom left
    amp_src[:,:,h-b:h,w-b:w] = amp_trg[:,:,h-b:h,w-b:w]   # bottom right
    return amp_src

  1. Only the top left and bottom left parts of the image in Fourier space need to be changed.
    This is because you use rfft (and not fft) and, as a result, only half of the image in the Fourier domain is returned (due to the symmetry). Otherwise, you are replacing high frequencies, which is not what is described in the paper.

  2. The indices of the mutated region should be symmetrical around 0. Otherwise the output after the inverse FFT will have a non-zero imaginary part (-b should be -b+1 above).

Also, you may want to multiply b by 0.5 to get the maximum transform at L=1 instead of L=0.5 (see #36 (comment)).

I propose to tackle those issues with this modified version of your PyTorch implementation:

def extract_ampl_phase(fft_im):
    # fft_im: size should be b x 3 x h x w
    fft_amp = torch.abs(fft_im)
    fft_pha = torch.angle(fft_im)
    return fft_amp, fft_pha


def low_freq_mutate(amp_src, amp_trg, L=0.1):
    _, _, h, w = amp_src.size()
    # multiply w by 2 because we have only half the space as rFFT is used
    w *= 2
    # multiply by 0.5 to have the maximum b for L=1 like in the paper
    b = (np.floor(0.5 * np.amin((h, w)) * L)).astype(int)     # get b
    if b > 0:
        # When rFFT is used only half of the space needs to be updated
        # because of the symmetry along the last dimension
        amp_src[:, :, 0:b, 0:b] = amp_trg[:, :, 0:b, 0:b]      # top left
        amp_src[:, :, h-b+1:h, 0:b] = amp_trg[:, :, h-b+1:h, 0:b]    # bottom left
    return amp_src


def FDA_source_to_target(src_img, trg_img, L=0.1):
    # get fft of both source and target
    fft_src = torch.fft.rfft2(src_img.clone(), dim=(-2, -1))
    fft_trg = torch.fft.rfft2(trg_img.clone(), dim=(-2, -1))

    # extract amplitude and phase of both ffts
    amp_src, pha_src = extract_ampl_phase(fft_src.clone())
    amp_trg, pha_trg = extract_ampl_phase(fft_trg.clone())

    # replace the low frequency amplitude part of source with that from target
    amp_src_ = low_freq_mutate(amp_src.clone(), amp_trg.clone(), L=L)

    # recompose fft of source
    real = torch.cos(pha_src.clone()) * amp_src_.clone()
    imag = torch.sin(pha_src.clone()) * amp_src_.clone()
    fft_src_ = torch.complex(real=real, imag=imag)

    # get the recomposed image: source content, target style
    _, _, imgH, imgW = src_img.size()
    src_in_trg = torch.fft.irfft2(fft_src_, dim=(-2, -1), s=[imgH, imgW])

    return src_in_trg

I have tested it with PyTorch v1.11.0.

Best wishes,
Lucas

Train from scratch requirements

Hi,
Thanks for this contribution.
What are the requirements for training the model from scratch? More specifically:

  1. What GPU should be used?
  2. How much GPU memory is needed?
  3. Which PyTorch version?
  4. Which Python version?
  5. Which CUDA version?

Is there anything crucial needed for reproducing the results of the paper? From my experience, minor changes in the training environment can make a big difference.

Thanks =)

Question about loss_seg_src in SStrain.py?

Thanks a lot for your excellent work!
I found that loss_seg_src is used in SStrain.py. But in related work, the model is usually only fine-tuned on pseudo-labeled samples. Is there a big difference between these two approaches?

3D FDA

How to extend FDA to 3D?
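A minimal sketch of one way this could be done, assuming volumetric inputs of shape (C, D, H, W) and swapping a centered low-frequency cube; this is an illustration, not something provided by the repository:

import numpy as np

def fda_3d_np(src, trg, beta=0.01):
    # src, trg: float volumes of shape (C, D, H, W)
    fft_src = np.fft.fftn(src, axes=(-3, -2, -1))
    fft_trg = np.fft.fftn(trg, axes=(-3, -2, -1))
    amp_src, pha_src = np.abs(fft_src), np.angle(fft_src)
    amp_trg = np.abs(fft_trg)

    amp_src = np.fft.fftshift(amp_src, axes=(-3, -2, -1))
    amp_trg = np.fft.fftshift(amp_trg, axes=(-3, -2, -1))
    _, d, h, w = src.shape
    b = int(np.floor(min(d, h, w) * beta))
    cd, ch, cw = d // 2, h // 2, w // 2
    sl = np.s_[:, cd-b:cd+b+1, ch-b:ch+b+1, cw-b:cw+b+1]
    amp_src[sl] = amp_trg[sl]      # swap the low-frequency cube
    amp_src = np.fft.ifftshift(amp_src, axes=(-3, -2, -1))

    return np.real(np.fft.ifftn(amp_src * np.exp(1j * pha_src), axes=(-3, -2, -1)))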

Why is the resolution 1344x576?

Hi,

Is there any reason why the resolution for Cityscapes is 1344x576 instead of 1024x512 at test time?

Thanks a lot in advance. I'm looking forward to your reply.

Question about second SST round

Hello and thank you for your great work :-)
I have a question about the performance boost of around 1.5% after the second SST round.
How do you achieve that? I tried it twice, and both times my model performed worse than after the first SST round.
Do I have to change something in the settings, or should I just retry it a couple more times?

Evaluation checkpoint

Hi,
I trained a T=0, B=0.01 model (without self-learning for now), and the final checkpoint at iteration 100,000 scored ~41% mIoU.
When checking earlier checkpoints I found better scores, around ~43%.

My question is: how did you evaluate the score reported in the paper? For the same experiment you achieved 44.61%, and I was wondering whether you used the last checkpoint or checked all 40 iteration checkpoints until you found the best one.

Thanks.
