liyunsheng13 / bdl
License: MIT License
Hello,
I'm trying to train the CycleGAN with the same configuration you used when training the network, but on a different dataset. After following the readme.md file, I ran the code, but there are several opt parameters inside the cycle_gan_model.py file whose default values I do not know. To name a few: opt.no_lsgan, opt.fineSize, and so on...
Could you please tell me what the default values of these parameters were, so I can retrain the CycleGAN for my dataset? My dataset even has a different number of labels, so I cannot use the weights you provided for the segmentation model...
Hi Yunsheng,
I noticed that in your paper you mentioned:
For SYNTHIA [28], we use the SYNTHIA-RAND-CITYSCAPES set which contains 9,400 images with the resolution 1280×760 and 16 common categories with Cityscapes [5].
which means only 16 classes should be involved in training and evaluation. However, I found the following in the SYNTHIA dataset configuration:
self.id_to_trainid = {3: 0, 4: 1, 2: 2, 21: 3, 5: 4, 7: 5, 15: 6, 9: 7, 6: 8, 16: 9, 1: 10, 10: 11, 17: 12, 8: 13, 18: 14, 19: 15, 20: 16, 12: 17, 11: 18}
so there are 19 classes in total, just like GTA V and Cityscapes. Is this an inconsistency between the code and the paper, or did I miss something?
Thanks,
Kaihong
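import numpy as np
from PIL import Image

# predicted_prob / predicted_label hold the per-pixel max softmax probability and
# argmax label of every target image; image_name, targetloader and args come from SSL.py.
thres = []
for i in range(19):
    # confidences of all pixels predicted as class i
    x = predicted_prob[predicted_label == i]
    if len(x) == 0:
        thres.append(0)
        continue
    x = np.sort(x)
    # per-class threshold: roughly the median confidence (np.int was removed in NumPy 1.24)
    thres.append(x[int(np.round(len(x) * 0.5))])
print(thres)
thres = np.array(thres)
thres[thres > 0.9] = 0.9  # cap every class threshold at 0.9
print(thres)
for index in range(len(targetloader)):
    name = image_name[index]
    label = predicted_label[index]
    prob = predicted_prob[index]
    for i in range(19):
        # low-confidence pixels of class i become the ignore label 255
        label[(prob < thres[i]) & (label == i)] = 255
    output = np.asarray(label, dtype=np.uint8)
    output = Image.fromarray(output)
    name = name.split('/')[-1]
    output.save('%s/%s' % (args.save, name))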
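A self-contained sketch of the same thresholding on toy arrays, to make the per-class median rule easy to inspect. The function names are hypothetical (not from SSL.py), and replacing the inner per-class loop with the vectorized lookup `thres[label]` is our choice; it gives the same result for labels in range.

```python
import numpy as np

def class_thresholds(pred_prob, pred_label, num_classes=19, cap=0.9):
    """Per-class threshold: about the median confidence of pixels predicted as that class, capped."""
    thres = np.zeros(num_classes)
    for c in range(num_classes):
        x = np.sort(pred_prob[pred_label == c])
        if x.size == 0:
            continue  # class never predicted: threshold stays 0, so nothing is dropped
        thres[c] = x[int(np.round(x.size * 0.5))]
    return np.minimum(thres, cap)

def make_pseudo_label(label, prob, thres, ignore=255):
    """Replace pixels whose confidence is below their class threshold with `ignore`."""
    out = label.astype(np.uint8).copy()
    out[prob < thres[label]] = ignore
    return out

prob = np.array([[0.95, 0.60], [0.80, 0.30]])
label = np.array([[0, 0], [1, 1]])
thres = class_thresholds(prob, label, num_classes=2)
print(thres)  # [0.9 0.8]
print(make_pseudo_label(label, prob, thres))  # the 0.60 and 0.30 pixels become 255
```

Unlike the original loop, this version copies the label map instead of editing predicted_label in place.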
What I did to train Cycle-GAN was resize the image to 1024×X or X×1024, then crop a 452×452 patch. You can choose another size based on the GPU you use.
Originally posted by @liyunsheng13 in #11 (comment)
The original size of GTA5 images is 1914x1052. After resizing the width to 1024, the height should be (1052 * 1024 // 1914) = 562. The image you provided is 1024x564, so is there any extra processing, such as padding?
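The arithmetic in the question, checked with integer floor division and with ordinary rounding. The helper name is hypothetical; it only scales the longer side to 1024 while keeping the aspect ratio.

```python
def resized_size(width, height, long_side=1024):
    """Return (floored, rounded) sizes after scaling the longer side to long_side."""
    longest = max(width, height)
    # integer floor division, as in the question's 1052 * 1024 // 1914
    floored = (width * long_side // longest, height * long_side // longest)
    # ordinary rounding of the exact scaled size
    rounded = (round(width * long_side / longest), round(height * long_side / longest))
    return floored, rounded

print(resized_size(1914, 1052))  # ((1024, 562), (1024, 563))
```

Neither floor nor rounding gives 564, so the resize alone does not account for the two extra rows; the thread does not say what produced them.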
Hello,
In this scenario (mentioned in the previous post):
Does "train CycleGAN" mean one batch iteration (A-B-A, B-A-B) or the complete training phase of 20 epochs?
Thank you.
I'm getting results 1.5 to 2 mIoU points worse when I evaluate the downloaded pre-trained weights on PyTorch 1.7 and CUDA 11. Is anyone else experiencing this?
python evaluation.py --restore-from ./snapshots/gta2city/gta_2_city_deeplab --save ./results/gta2city/ \
--data-dir-target /data/cityscapes/ --data-list-target ./dataset/cityscapes_list/val.txt \
--gt_dir /data/cityscapes/gtFine/val/ --devkit_dir ./dataset/cityscapes_list/
===>road: 90.9
===>sidewalk: 45.38
===>building: 83.56
===>wall: 32.19
===>fence: 25.46
===>pole: 27.81
===>light: 35.3
===>sign: 36.23
===>vegetation: 84.17
===>terrain: 39.95
===>sky: 83.95
===>person: 56.5
===>rider: 29.17
===>car: 81.48
===>truck: 31.93
===>bus: 46.8
===>train: 2.93
===>motocycle: 27.75
===>bicycle: 31.79
===> mIoU19: 47.01
===> mIoU16: 51.15
===> mIoU13: 56.38
Expected mIoU19: 48.5
$ python evaluation.py --restore-from ./snapshots/syn2city/syn_2_city_deeplab --save ./results/syn2city/ \
--data-dir-target /data/cityscapes/ --data-list-target ./dataset/cityscapes_list/val.txt \
--gt_dir /data/cityscapes/gtFine/val/ --devkit_dir ./dataset/cityscapes_list/
===>road: 86.27
===>sidewalk: 46.58
===>building: 79.1
===>wall: 6.04
===>fence: 0.55
===>pole: 23.55
===>light: 7.88
===>sign: 10.77
===>vegetation: 78.46
===>terrain: 0.0
===>sky: 81.57
===>person: 52.87
===>rider: 28.51
===>car: 71.43
===>truck: 0.0
===>bus: 36.6
===>train: 0.0
===>motocycle: 26.26
===>bicycle: 37.15
===> mIoU19: 35.45
===> mIoU16: 42.1
===> mIoU13: 49.5
Expected mIoU13: 51.4
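The summary lines appear to be plain averages over class subsets: mIoU16 drops terrain, truck and train (the Cityscapes classes SYNTHIA lacks), and mIoU13 additionally drops wall, fence and pole. This is inferred from the printed numbers, not read from evaluation.py. Recomputing from the GTA5 per-class IoUs above:

```python
# Per-class IoUs copied from the GTA5 -> Cityscapes run above (Cityscapes class order).
CLASSES = ["road", "sidewalk", "building", "wall", "fence", "pole", "light", "sign",
           "vegetation", "terrain", "sky", "person", "rider", "car", "truck", "bus",
           "train", "motorcycle", "bicycle"]
IOU_GTA = [90.9, 45.38, 83.56, 32.19, 25.46, 27.81, 35.3, 36.23, 84.17, 39.95,
           83.95, 56.5, 29.17, 81.48, 31.93, 46.8, 2.93, 27.75, 31.79]

# Assumed subsets, inferred from the log rather than taken from the code.
DROP_16 = {"terrain", "truck", "train"}
DROP_13 = DROP_16 | {"wall", "fence", "pole"}

def subset_miou(ious, drop):
    """Mean IoU over the classes not listed in `drop`."""
    kept = [iou for name, iou in zip(CLASSES, ious) if name not in drop]
    return sum(kept) / len(kept)

print(round(subset_miou(IOU_GTA, set()), 2))    # 47.01
print(round(subset_miou(IOU_GTA, DROP_16), 2))  # 51.15
print(round(subset_miou(IOU_GTA, DROP_13), 2))  # 56.38
```

The same subsets also reproduce the SYNTHIA run's 42.1 and 49.5, which is why the 19-class id_to_trainid mapping can coexist with 16- and 13-class reporting.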
Yunsheng,
Would you please provide the SSL model and hyper-parameters for the VGG-based models? Thanks.
Hi, when I train CycleGAN, G_A easily turns the sky in GTA V images into trees. Do you know what the problem is? Thank you @liyunsheng13
Originally posted by @PJ1yang in #5 (comment)
Hi, thanks for the code.
I'm trying to train GTA2Cityscapes with VGG and DeepLab,
but whether or not I use self-supervised learning, the source segmentation loss becomes NaN after 2 iterations. Can you tell me how to fix it? Thank you!
My environment:
CUDA Version: 11.1, pytorch-gpu: 1.2.0
Here is my log from training the self-supervised VGG:
Hi, I tried to apply your SSL strategy to my own DA framework, but performance dropped significantly (from the initial 44.3% to 41.5%).
My DA framework is similar to AdaptSegNet, and I use your SSL.py to generate the pseudo-labels.
My losses are:
$L = L_{\text{sourceseg}} + \lambda L_{\text{da}}$ (1)
$L = L_{\text{sourceseg}} + \lambda_1 L_{\text{da}} + \lambda_2 L_{\text{ssl}}$ (2)
First, I train my network with (1), and the best mIoU is 44.3%. Then I use the best model to generate pseudo-labels with SSL.py. Finally, I train with Eq. (2) using these pseudo-labels, but the result gets worse.
Could you give me some suggestions on how to use the pseudo-labels?
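A minimal NumPy sketch of Eq. (2), assuming L_ssl is a pixel-wise cross-entropy that skips pixels SSL.py marked as 255 (the behavior PyTorch's nn.CrossEntropyLoss(ignore_index=255) provides). The helper names are hypothetical and this is not taken from BDL.py.

```python
import numpy as np

IGNORE = 255  # value the SSL thresholding writes for low-confidence pixels

def ssl_cross_entropy(logits, pseudo_label, ignore=IGNORE):
    """Mean cross-entropy over pixels that kept a pseudo-label.

    logits: (C, H, W) raw scores; pseudo_label: (H, W) ints in [0, C) or `ignore`.
    """
    z = logits - logits.max(axis=0, keepdims=True)            # numerical stability
    log_p = z - np.log(np.exp(z).sum(axis=0, keepdims=True))  # log-softmax over classes
    rows, cols = np.nonzero(pseudo_label != ignore)
    if rows.size == 0:
        return 0.0  # no confident pixels in this image
    picked = log_p[pseudo_label[rows, cols], rows, cols]
    return float(-picked.mean())

def total_loss(l_sourceseg, l_da, l_ssl, lambda1, lambda2):
    """Eq. (2): L = L_sourceseg + lambda1 * L_da + lambda2 * L_ssl."""
    return l_sourceseg + lambda1 * l_da + lambda2 * l_ssl

# Two classes, one confident pixel and one ignored pixel: loss is log(2).
print(ssl_cross_entropy(np.zeros((2, 1, 2)), np.array([[0, 255]])))  # ≈ 0.693
```

Because ignored pixels contribute nothing, the strength of L_ssl depends on how many pixels survive thresholding, which is one reason the choice of lambda2 and threshold interact.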
Hi, thanks for sharing the code. I noticed that in SSL training you stop early at 120000 iterations and choose the model at that iteration for the next stage. In the paper, 1-stage SSL training improved mIoU from the initial 42.7 to 47.2, and then from 44.3 to 48.5. Did you observe the mIoU at intermediate iterations during training, for example at 60000 iterations?
Hello @liyunsheng13 , thank you very much for the code. I have questions on Algorithm 1 in your paper:
How do you select which M^k_i model (trained with Eqn 3) is used for the next iteration? I imagine that you validate all snapshots of M^k_i (saved during training) and pick the best?
Same question for F^k (trained with Eqn 2) and M^k_0 (trained with Eqn 1).
Thanks
Hi, I'm currently working with BDL. How can I train it on my own dataset?
Hi,
Thanks for a great paper and code.
I was trying to reproduce the result for $M_0^{(1)}[F^{(1)}]$ without SSL. As far as I understand, this requires CycleGAN training and segmentation training (without SSL).
I used the files you released for CycleGAN and merged them into the updated code of the CycleGAN repository.
I got 40.7 mIoU, while I should get 42.7.
Any chance you could also release the files that you didn't change for CycleGAN (including the command line)?
I used a patch size of 400 instead of 452 due to GPU memory limits. Do you think such a difference could explain the gap?
The command lines I used are:
CycleGan - train:
python train.py --dataroot datasets/gta/ --display_id -1 --init_weights deep_lab_checkpoint/cyclegan_sem_model.pth --niter 10 --niter_decay 10 --crop_size 400 --load_size 1024 --lambda_identity 0
CycleGan Test:
python test.py --dataroot datasets/gta/ --name --load_size 1024 --preprocess scale_width --num_test 10000000
BDL:
python BDL.py --snapshot-dir ./snapshots/gta2city --init-weights DeepLab/DeepLab_init.pth --num-steps-stop 80000 --model DeepLab --data-dir <data_dir> --data-list dataset/gta5_list/train.txt --data-list-target dataset/cityscapes_list/train.txt --data-dir-target <data_dir>
First of all, thanks for sharing your code. After reading your code and paper carefully, I still have a question: how is CycleGAN trained with bidirectional learning?
My understanding is as follows:
Is my understanding right? Thanks for your answers!
When I first train the translation model, CycleGAN transforms the sky into vegetation or buildings. So when the segmentation model is trained on the translated source images, many noisy labels disrupt training.
My question is:
Hi Yunsheng, nice work.
I was somewhat confused when reading the paper.
I understood 'bi-directional' as simultaneous forward and backward passes; however, when I checked the code, I found that the training is not simultaneous.
If I am correct, the training step might be:
(1) train CycleGAN to get translated images
(2) train BDL.py to get segmentation model with the translated images and source images,
(3) train SSL.py to get pseudo-labels and refine the segmentation model.
(4) retrain CycleGAN with an additional perceptual loss.
Then what's the next step? Does step (1) then try to get better-translated images using the model from step (4)?
And then repeat (2)-(3). Does that mean this needs to be done multiple times based on the number of steps we want to try?
I am kind of confused about it since I thought 'bi-directional' was for simultaneous training of Im2Im translation and segmentation.
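A sketch of the ordering described in steps (1) to (4), as our reading of the question rather than a confirmed description of the authors' procedure. It assumes K outer rounds and N SSL rounds per outer round; each string stands for one full training run, not one batch, and the stage names are hypothetical.

```python
def bdl_schedule(K, N):
    """List training stages in order for K outer rounds and N SSL rounds each."""
    stages = []
    for k in range(1, K + 1):
        # steps (1)/(4): image translation model F(k), trained with the
        # perceptual loss from the previous segmentation model, if there is one
        stages.append(f"train CycleGAN F({k})")
        # step (2): adversarial segmentation training on translated images
        stages.append(f"train segmentation M({k})_0")
        for i in range(1, N + 1):
            # step (3): pseudo-labels from the latest model, then retrain on them
            stages.append(f"pseudo-labels from M({k})_{i - 1}")
            stages.append(f"train segmentation M({k})_{i}")
    return stages

print(*bdl_schedule(K=1, N=1), sep="\n")
```

Under this reading, steps (2) and (3) are repeated once per outer round, and the segmentation network is trained N + 1 times per round.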
Moreover, the CycleGAN folder is not complete. Some modules are missing:
from util.image_pool import ImagePool
from .base_model import BaseModel
from fcn8s_LSD import FCN8s_LSD #lys
For the last one, is fcn8s_LSD the model from step (4)?
Thanks for your help.
Hi, sorry to disturb you again. I was trying to evaluate the pre-trained models provided by this project, but ran into some difficulties. Can you give me some suggestions? Thanks in advance!
The README provides the pre-trained models GTA5_deeplab, GTA5_VGG, SYNTHIA_deeplab, and SYNTHIA_VGG. My understanding is that running evaluation.py on the test dataset should reproduce the paper's results. However, the results I got are as follows:
python evaluation.py --restore-from gta_2_city_deeplab --model DeepLab --save test
===> mIoU19: 48.52
python evaluation.py --restore-from gta_2_city_vgg --model VGG --save test
===> mIoU19: 41.06
python evaluation.py --restore-from syn_2_city_deeplab --model DeepLab --save test
===> mIoU13: 51.32
python evaluation.py --restore-from syn_2_city_vgg --model VGG --save test
===> mIoU16: 38.81
Thanks for sharing the codes!
Could you please show more results on the SYNTHIA-to-Cityscapes task, such as $M_0^{(2)}[F^{(2)}]$ and $M_1^{(2)}[F^{(2)}]$?
Thanks a lot.
First of all, it's just a silly question.
My understanding of "real" and "fake":
I found that in your training code, the domain labels in the adversarial loss on the output probability appear to be the opposite of the common setting, which would be 1 for "real" data and 0 for "fake" data. In this case, since we have ground truth for the translated synthetic data, we need to push the unlabeled/pseudo-labeled real data to behave like labeled data, which means the translated synthetic data should be "real" and the target data should be "fake".
Common setting of adversarial training:
When training the generator (segmentation network), we push the discriminator's output on the generator's output toward "real" (1). When training the discriminator, the input produced by the generator should be classified as "fake" (0).
Your setting:
You reverse the domain labels for the real data and the generator (segmentation) output. Although it may have little influence on the final result, I still want to check: is this a personal preference or a deliberate design choice?
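Assuming both players use binary cross-entropy on the discriminator's logit (for example PyTorch's BCEWithLogitsLoss), swapping which domain is labeled 1 is equivalent to negating that logit, because BCE(z, y) = BCE(-z, 1 - y). A quick numerical check, with a hypothetical helper:

```python
import numpy as np

def bce_with_logits(z, y):
    """Binary cross-entropy of target y given logit z, in the numerically stable form."""
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))

z = np.linspace(-5.0, 5.0, 11)  # a range of discriminator logits
# Swapping the domain label (1 <-> 0) gives the same loss as negating the logit.
print(np.allclose(bce_with_logits(z, 1.0), bce_with_logits(-z, 0.0)))  # True
print(np.allclose(bce_with_logits(z, 0.0), bce_with_logits(-z, 1.0)))  # True
```

Under this assumption, a discriminator can absorb the flip by learning the negated output, so the convention should not matter as long as the generator and discriminator use the flipped labels consistently.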
It appears that you use Adam with a step LR schedule. Isn't Adam supposed to be an adaptive optimizer, designed to replace a handcrafted LR schedule? I wonder why you don't just use SGD with momentum if you are adjusting the LR yourself.
I ran into a problem running train.py for CycleGAN. After replacing the model files in CycleGAN with the CycleGAN files uploaded by @liyunsheng13, I get a "list index out of range" error, as shown in the following picture.
Does anyone have a solution for that?
Thanks for sharing your code. I want to know which iteration the provided GTA2Cityscapes dataset and the SSL_step1 and SSL_step2 models correspond to: K=1 or K=2? In other words, after training, will we get an mIoU of 47.2 or 48.5?
Would you please release your trained CycleGAN model for Synthia->Cityscapes translation? Thanks.
Hi, I have read the paper and still have some questions.
CycleGAN is trained with a perceptual loss. Does the first image translation use the perceptual loss? If so, which segmentation model parameters are used? Is it the source-only model with 33.6 mIoU in the paper?
With the first translated images in hand, when starting adversarial training of the segmentation model, are the initial model parameters the ImageNet-pretrained parameters or the source-pretrained parameters with 33.6 mIoU?
Thanks for sharing the codes!
I have a question about training Cycle-GAN.
Did you use all 24,966 GTA5 images and 2,975 Cityscapes images to train the Cycle-GAN?
Training it on an RTX 2080 Ti takes a few days.
In the paper you say that an L1 loss is used for l_per, but your code uses an MSE loss
(cycle_gan_model.py, def compute_semantic_loss). Which one is correct?
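For reference, the two candidate definitions of l_per compared on the same differences (hypothetical helpers; what compute_semantic_loss actually feeds into the loss is not shown in this thread). MSE squares each error, so large discrepancies dominate it more than they dominate L1.

```python
import numpy as np

def l1_loss(a, b):
    """Mean absolute difference."""
    return float(np.mean(np.abs(a - b)))

def mse_loss(a, b):
    """Mean squared difference."""
    return float(np.mean((a - b) ** 2))

a = np.zeros(3)
b = np.array([0.1, 0.5, 2.0])  # two small errors and one large one
print(l1_loss(a, b), mse_loss(a, b))  # ≈ 0.867 and ≈ 1.42
```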
Hi, when I try to test the pre-trained VGG models, I get an "out of memory" error. Can you help me fix it? Thanks.
In this file, line numbers 115 to 119,
def forward(self, x):
    out = self.conv2d_list[0](x)
    for i in range(len(self.conv2d_list) - 1):
        out += self.conv2d_list[i + 1](x)
        return out
The return statement is inside the for loop, instead of outside the for loop.
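With the return inside the loop, only the first two branches are ever summed. A sketch of the corrected control flow, with plain callables standing in for the conv layers (the function names and the multipliers 6, 12, 18, 24, borrowed from DeepLab-v2's ASPP dilation rates, are illustrative only):

```python
def aspp_forward_buggy(branches, x):
    out = branches[0](x)
    for i in range(len(branches) - 1):
        out += branches[i + 1](x)
        return out  # inside the loop: exits after adding the second branch

def aspp_forward_fixed(branches, x):
    out = branches[0](x)
    for branch in branches[1:]:
        out = out + branch(x)
    return out  # after the loop: every branch is summed

branches = [lambda x, d=d: d * x for d in (6, 12, 18, 24)]
print(aspp_forward_buggy(branches, 1), aspp_forward_fixed(branches, 1))  # 18 60
```

In the real module, the fix is simply dedenting return out by one level so it sits after the for loop.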
I followed your training steps by using
python BDL.py --snapshot-dir ./snapshots/gta2city \
--init-weights /path/to/inital_weights \
--num-steps-stop 80000 \
--model DeepLab
I used DeepLabV2 as the initial weights. The source data is the provided translated GTA5-as-CityScapes images (DeepLab), and the target data is the Cityscapes training set. I only got 43.6 mIoU, while I should get 44.3.
Could you give more explanation about using self-supervised training?
In your paper, with bidirectional learning, the image translation model and the segmentation adaptation model can be learned alternately. However, I don't think there is a forward-direction process in BDL.py. Could you explain why? Thank you in advance.
Could you provide the complete training procedure of your network for GTA5 to Cityscapes? I am somewhat confused about training CycleGAN and the procedures for using it.
Hi,
Thanks for sharing the codes!
I have a small question about the training of Cycle-GAN.
For Cycle-GAN training, the input images are resized to 256x256. But the original image size is 1024x564 (e.g., GTA), so after training Cycle-GAN, how do you generate the translated GTA images at the original size (1024x564)?
If you directly feed the original GTA image to the model trained at 256x256, does it hurt the translation quality?
Thanks very much and looking forward to your reply!
Hi, I am curious how to decide when to stop training and how to choose the final snapshots. This isn't specified in your paper. I found the "Early Stopping" parameters in your code; how should this hyper-parameter be set?
Hi,
I am a bit confused about the fineSize parameter for the CycleGAN training process.
In the Readme file in the cycle-gan folder, the fineSize mentioned for training is 452, while in a previous post you say cycle-gan is trained with fineSize=1024.
This is the previous post I am referring: #28
Which one is right?
And in general, what is the fineSize policy for CycleGAN training?
It seems reasonable to choose the same input size used in BDL for that dataset, so that CycleGAN is optimized for outputting images of that size, which can later be used in BDL.py training.
Thanks =)
Hi, I find two inconsistencies between paper and code. Can you give me some suggestions? Thanks!
The paper says that "When training the segmentation adaptation model, images are resized with the long side to be 1,024 and the ratio is kept." However, the image size in the dataloader code is set as follows:
Line 9 in 27dc56b
The paper says that "For FCN-8s with VGG16, we use Adam as the optimizer with momentum as 0.9 and 0.99. The initial learning rate is 1 × 10^-5 and decreased with ‘step’ learning rate policy with step size as 5000 and γ = 0.1." However, the code for adjusting the learning rate is as follows. Is the number 5000 correct?
Line 188 in 27dc56b
Line 43 in 27dc56b
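Read as the standard 'step' policy (the rule PyTorch implements as torch.optim.lr_scheduler.StepLR), the paper's VGG schedule can be sketched as below; comparing these values with what the referenced code computes is one way to check the 5000. The function name is hypothetical.

```python
def step_lr(base_lr, iteration, step_size, gamma=0.1):
    """'step' policy: multiply the learning rate by gamma every step_size iterations."""
    return base_lr * gamma ** (iteration // step_size)

for it in (0, 4999, 5000, 10000):
    # approximately 1e-5, 1e-5, 1e-6, 1e-7
    print(it, step_lr(1e-5, it, 5000))
```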
Hi. I have some questions about the training of CycleGAN with perception loss.
When I trained for the second round, the terminal reported the following error:
scheduler.step()
AttributeError: 'NotImplementedError' object has no attribute 'step'
Can you help me? My learning-rate policy is --lr_policy linear.
Thank you.
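The traceback is consistent with a scheduler factory that returns a NotImplementedError object for an unknown policy instead of raising it, and the lr_policy note later in this thread says the given networks.py does not know 'linear'. A minimal reproduction with hypothetical stand-in functions (not the repository's code):

```python
class ConstantScheduler:
    """Hypothetical stand-in for a real scheduler object."""
    def step(self):
        pass

def get_scheduler_returning(policy):
    if policy == "lambda":
        return ConstantScheduler()
    # The error object is returned, so the caller only fails later on .step()
    return NotImplementedError("learning rate policy [%s] is not implemented" % policy)

def get_scheduler_raising(policy):
    if policy == "lambda":
        return ConstantScheduler()
    raise NotImplementedError("learning rate policy [%s] is not implemented" % policy)

scheduler = get_scheduler_returning("linear")
try:
    scheduler.step()
except AttributeError as err:
    print(err)  # 'NotImplementedError' object has no attribute 'step'
```

With the raising variant, an unsupported --lr_policy fails immediately with a clear message; with the given code, using --lr_policy lambda avoids the problem.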
In the inner SSL loop, if N=2, the segmentation network is trained 3 times in a row.
When training the segmentation network, how did you set the discriminator's weights?
Do you reinitialize the discriminator every time, or reuse the previous discriminator's weights?
The same question applies to the outer loop.
Before training the FCN8s-VGG16 model, I tested the initial weights you provide.
I got 30.0 mIoU.
In the paper, you wrote that an ImageNet-pretrained model is used.
But I think such a high mIoU for the initial model means it was trained with supervision on segmentation data.
How did you create the initial weights?
Hi Yunsheng, I can't open the GTA5 as CityScapes URL on Google Drive. Could you please provide another link, for example on Baiduyun? Thanks
Hi
When I train the segmentation network with pseudo-labels using your code,
training on GTA5 is stable and mIoU increases steadily.
But SYNTHIA shows a different pattern:
mIoU rises steeply in early iterations, but decreases as training proceeds.
I think that, because the layout gap between SYNTHIA and Cityscapes is much bigger, overfitting occurs in the SYNTHIA setting.
Did you observe the same pattern?
Hello,
Which files in the official CycleGAN code should be replaced by your code? I tried replacing ./models/ and ./options/, but it doesn't work.
First of all, thanks for this contribution.
As some may have noticed, there are some inconsistencies between the BDL repository and the CycleGAN repository, which has been updated since BDL used it, so some code and params in the BDL repository need updating as well. These fixes are based on the history of the CycleGAN repository.
I hope I did not miss anything. If anyone notices something wrong or anything I missed, please comment on this issue.
Param name updates: (old -> new)
(fix: change the name on the provided script on BDL/cyclegan/readme.md)
Params more complex updates:
lr_policy: the default of this parameter was changed from 'lambda' to 'linear'. Change the default back to 'lambda'.
If you wish, you can copy the linear policy implementation from the CycleGAN repository's networks.py file into the given networks.py.
which_direction was changed to direction: in the given cycle_gan_model.py file, change all opt.which_direction usages to opt.direction.
which_model_netG and which_model_netD were changed to netG and netD: in the given cycle_gan_model.py file, change all which_model_netG and which_model_netD usages to netG and netD.
no_lsgan was changed to gan_mode, and its type changed from bool to str, which indicates the desired GAN mode.
A previously posted issue raised this, and the repository owner commented that no_lsgan should default to False.
Since this parameter is not set in the given run command (BDL/cyclegan/readme.md), LSGAN is intended, and gan_mode already defaults to "lsgan", so we're good.
All that is left is to change two lines in the given cycle_gan_model.py file (line num: old -> new):
62: use_sigmoid = opt.no_lsgan -> use_sigmoid = opt.gan_mode != 'lsgan'
74: self.criterionGAN = networks.GANLoss(use_lsgan=not opt.no_lsgan).to(self.device) -> self.criterionGAN = networks.GANLoss(use_lsgan=opt.gan_mode == 'lsgan').to(self.device)
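Regarding the lr_policy item: CycleGAN's options describe --niter as "# of iter at starting learning rate" and --niter_decay as "# of iter to linearly decay learning rate to zero", both counted in epochs. A sketch of such a multiplier (our own function, not copied from networks.py, whose epoch offsets may differ by one):

```python
def linear_decay_factor(epoch, niter, niter_decay):
    """LR multiplier for 1-based `epoch`: 1.0 for `niter` epochs, then linear down to 0."""
    if epoch <= niter:
        return 1.0
    return max(0.0, 1.0 - (epoch - niter) / niter_decay)

# --niter 10 --niter_decay 10, as in the CycleGAN command quoted earlier in the thread
print([linear_decay_factor(e, 10, 10) for e in (1, 10, 15, 20)])  # [1.0, 1.0, 0.5, 0.0]
```

With PyTorch, a function like this could be passed to torch.optim.lr_scheduler.LambdaLR, which counts epochs from 0, e.g. LambdaLR(optimizer, lr_lambda=lambda e: linear_decay_factor(e + 1, 10, 10)).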
Additional Problems:
import numpy as np

thres = []  # one threshold per class
for i in range(19):
    x = predicted_prob[predicted_label == i]
    if len(x) == 0:
        thres.append(0)
        continue
    x = np.sort(x)
    thres.append(x[int(np.round(len(x) * 0.5))])
print(thres)
thres = np.array(thres)
thres[thres > 0.9] = 0.9
print(thres)
In the code above, taking the element at index len(x) * 0.5 of the sorted confidences seems to give the median confidence of each class.
So in some cases, the corresponding threshold could be 0.4 or 0.6. Is that right?
If my interpretation is correct, are there any other references on assigning pseudo-labels this way?
If not, is it just an empirical trick?
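Whether the picked element is exactly the median depends on the count n: np.round rounds halves to the nearest even number, so int(np.round(n * 0.5)) is either the (lower) median position or one position above it. A quick check with hypothetical helper names:

```python
import numpy as np

def picked_index(n):
    """Index into the sorted confidences used by the code above: int(np.round(n * 0.5))."""
    return int(np.round(n * 0.5))

def median_index(n):
    """Lower-middle index of a sorted array of length n (the median itself when n is odd)."""
    return (n - 1) // 2

for n in (3, 4, 5, 7, 9):
    # the offset alternates between 1 and 0 because halves round to even
    print(n, picked_index(n), median_index(n))
```

So each class threshold is that class's median confidence to within one sorted position, which is negligible over many pixels. It therefore differs per class (0.4 for one class and 0.6 for another is possible) before the 0.9 cap is applied.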
Hi, could you release the M(0) model of DeepLab for GTA5 and SYNTHIA ?