
da_detection's Introduction

Introduction

Follow the faster-rcnn repository to set up the environment. You may encounter some issues when installing pytorch-faster-rcnn; many of them have already been reported there. We used PyTorch 0.4.0 for this project. Other PyTorch versions will cause errors, which have to be handled for each environment.

Data Preparation

  • PASCAL_VOC 07+12: Please follow the instructions in py-faster-rcnn to prepare VOC datasets.
  • Clipart, WaterColor: See the dataset preparation instructions in Cross Domain Detection. Images translated by CycleGAN are available on that website.
  • Sim10k: See the Sim10k website.
  • Cityscape-Translated Sim10k: TBA
  • Cityscape, FoggyCityscape: Download from the Cityscape website; see the dataset preparation code in DA-Faster RCNN.

All code is written to follow the PASCAL_VOC format. For example, the Sim10k dataset is stored as follows.

$ cd Sim10k/VOC2012/
$ ls
Annotations  ImageSets  JPEGImages
$ cat ImageSets/Main/val.txt
3384827.jpg
3384828.jpg
3384829.jpg
.
.
.

If you want to test the code on your own dataset, arrange the dataset in PASCAL VOC format, write a dataset class in lib/datasets/, and register it in lib/datasets/factory.py and lib/datasets/config_dataset.py. Then add the dataset option to lib/model/utils/parser_func.py. A sketch of the registration step is shown below.
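For reference, here is a minimal sketch of the factory registration step, following the py-faster-rcnn pattern this code builds on (the class name my_dataset and its splits are placeholders, not part of this repository):

# lib/datasets/factory.py -- registration sketch; my_dataset is a placeholder
from datasets.my_dataset import my_dataset

for split in ['train', 'val', 'trainval', 'test']:
    name = 'my_dataset_{}'.format(split)
    # __sets maps a dataset name to a constructor returning an imdb instance
    __sets[name] = (lambda split=split: my_dataset(split))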

Data Path

Write your dataset directories' paths in lib/datasets/config_dataset.py.
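For example, the entries look roughly like the following (the variable and key names are a guess at the usual config pattern, not necessarily the exact names in the file; the paths are placeholders):

# lib/datasets/config_dataset.py -- sketch only
__D.PASCAL = "/path/to/VOCdevkit/VOC2007"
__D.CLIPART = "/path/to/clipart"
__D.SIM10K = "/path/to/Sim10k/VOC2012"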

Pretrained Model

We used two models pre-trained on ImageNet in our experiments, VGG16 and ResNet101. You can download them from:

Download them and set the paths in __C.VGG_PATH and __C.RESNET_PATH in lib/model/utils/config.py, for example:
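A minimal example of these two entries (the file names below are placeholders for wherever you saved the downloads):

# lib/model/utils/config.py
__C.VGG_PATH = "data/pretrained_model/vgg16_caffe.pth"
__C.RESNET_PATH = "data/pretrained_model/resnet101_caffe.pth"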

Sample Model

Global-local alignment model for the watercolor dataset.

Train

  • Sample training scripts are in the train_scripts folder.
  • With only local alignment loss,
 CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net_local.py \
                    --dataset source_dataset --dataset_t target_dataset --net vgg16 \
                    --cuda

Add --lc when using context-vector based regularization loss.

  • With only global alignment loss,
 CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net_global.py \
                    --dataset source_dataset --dataset_t target_dataset --net vgg16 \
                    --cuda

Add --gc when using context-vector based regularization loss.

  • With global and local alignment loss,
 CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net_global_local.py \
                    --dataset source_dataset --dataset_t target_dataset --net vgg16 \
                    --cuda

Add --lc and --gc when using context-vector based regularization loss.
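For example, to adapt from PASCAL VOC 07+12 to Clipart with a ResNet-101 backbone and both context-vector regularizers (dataset names as handled in lib/model/utils/parser_func.py):

 CUDA_VISIBLE_DEVICES=$GPU_ID python trainval_net_global_local.py \
                    --dataset pascal_voc_0712 --dataset_t clipart --net res101 \
                    --cuda --lc --gc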

Test

  • Sample test scripts are in the test_scripts folder.
 CUDA_VISIBLE_DEVICES=$GPU_ID python test_net_global_local.py \
                    --dataset target_dataset --net vgg16 \
                    --cuda --lc --gc --load_name path_to_model
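For example, to evaluate a global-local model trained with both context vectors on Clipart (point --load_name at your saved checkpoint):

 CUDA_VISIBLE_DEVICES=$GPU_ID python test_net_global_local.py \
                    --dataset clipart --net res101 \
                    --cuda --lc --gc --load_name path_to_model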

Citation

Please cite the following reference if you utilize this repository for your project.

@article{saito2018strong,
  title={Strong-Weak Distribution Alignment for Adaptive Object Detection},
  author={Saito, Kuniaki and Ushiku, Yoshitaka and Harada, Tatsuya and Saenko, Kate},
  journal={arXiv},
  year={2018}
}


da_detection's Issues

DA-Faster evaluated on Clipart scores higher than in your paper

Hi,
I used this open-source code to evaluate DA-Faster on VOC -> Clipart.
The datasets I used are as follows:
source dataset: VOC07+12 trainval, 16k images in total.
target dataset: clipart1k (train+test)

And I got a higher score (VOC07 metric, IoU=0.5) than your paper:

mAP: 0.3296
aeroplane       : 0.2564
bicycle         : 0.4188
bird            : 0.2848
boat            : 0.2978
bottle          : 0.3248
bus             : 0.3452
car             : 0.3970
cat             : 0.0805
chair           : 0.3393
cow             : 0.4999
diningtable     : 0.1703
dog             : 0.2019
horse           : 0.3035
motorbike       : 0.3662
person          : 0.5565
pottedplant     : 0.4436
sheep           : 0.1996
sofa            : 0.2630
train           : 0.3749
tvmonitor       : 0.4676

The following is the source-only score on clipart1k:

 mAP: 0.2754
aeroplane       : 0.1798
bicycle         : 0.4767
bird            : 0.2161
boat            : 0.1275
bottle          : 0.2338
bus             : 0.6437
car             : 0.3147
cat             : 0.1322
chair           : 0.3185
cow             : 0.1364
diningtable     : 0.1931
dog             : 0.1273
horse           : 0.2536
motorbike       : 0.4130
person          : 0.3196
pottedplant     : 0.3662
sheep           : 0.0909
sofa            : 0.2208
train           : 0.4274
tvmonitor       : 0.3165

I do not know whether something was wrong with my experiment or something else, because my result (mAP50 = 0.3296) is much higher than the one in your paper (mAP50 = 19.8).

Domain labels are different from the paper

Thank you for your great work. I have a question about the loss function: the domain_s label is 1 and the domain_t label is 0 in the paper, but in the code the domain labels are 0 and 1. In addition, the loss function in the paper appears contradictory.

ImportError: No module named cython_bbox

Hello, when I train with trainval_net_global_local.py, I get the following import error. What should I do?

from roi_data_layer.roidb import combined_roidb
...
from model.utils.cython_bbox import bbox_overlaps
ImportError: No module named cython_bbox

Looking forward to your reply, thank you very much!

ModuleNotFoundError: No module named 'lib.model.roi_crop._ext.roi_crop._roi_crop'

Traceback (most recent call last):
File "trainval_net_global_local.py", line 25, in
from lib.model.utils.net_utils import weights_normal_init, save_net, load_net,
File "E:\DA_Detection-master\lib\model\utils\net_utils.py", line 10, in
from lib.model.roi_crop.functions.roi_crop import RoICropFunction
File "E:\DA_Detection-master\lib\model\roi_crop\functions\roi_crop.py", line 4, in
from lib.model.roi_crop.ext import roi_crop
File "E:\DA_Detection-master\lib\model\roi_crop_ext\roi_crop_init
.py", line 3, in
from ._roi_crop import lib as _lib, ffi as _ffi
ModuleNotFoundError: No module named 'lib.model.roi_crop._ext.roi_crop._roi_crop'

Multi Gpu training issue

Hi,
When I add --mGPUs for multi-GPU training, I get the errors shown in the traceback below.
Does it support multi-GPU training?

Traceback (most recent call last):
File "trainval_net_global_local.py", line 201, in <module>
rois_label, out_d_pixel, out_d = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
File "/mnt/lustre/yanghang1/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/mnt/lustre/yanghang1/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 115, in forward
return self.gather(outputs, self.output_device)
File "/mnt/lustre/yanghang1/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 127, in gather
return gather(outputs, output_device, dim=self.dim)
File "/mnt/lustre/yanghang1/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
return gather_map(outputs)
File "/mnt/lustre/yanghang1/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/mnt/lustre/yanghang1/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
return Gather.apply(target_device, dim, *outputs)
File "/mnt/lustre/yanghang1/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
File "/mnt/lustre/yanghang1/anaconda3/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 54, in <lambda>
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
RuntimeError: dimension specified as 0 but tensor has no dimensions

how to implement custom class?

I searched the internet for guidelines on how to implement a class that loads my custom dataset, but I did not find anything. Looking at the code of the classes that are already implemented, it is not clear to me which methods the class needs. Could anyone give me an idea of how to write it?
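For orientation, a bare-bones skeleton of what such a class usually provides, modeled on the pascal_voc class from py-faster-rcnn (all names here are illustrative, not this repository's exact API):

from datasets.imdb import imdb

class my_dataset(imdb):
    def __init__(self, image_set, data_path):
        imdb.__init__(self, 'my_dataset_' + image_set)
        self._image_set = image_set                 # e.g. 'train', 'val', 'trainval'
        self._data_path = data_path                 # root folder laid out like PASCAL VOC
        self._classes = ('__background__', 'car')   # background must come first
        self._image_index = self._load_image_set_index()

    def image_path_at(self, i):
        # return the absolute path of the i-th image (JPEGImages/<id>.jpg)
        ...

    def _load_image_set_index(self):
        # read the image IDs listed in ImageSets/Main/<image_set>.txt
        ...

    def gt_roidb(self):
        # return a list of dicts with 'boxes', 'gt_classes', 'gt_overlaps', 'flipped'
        ...

    def evaluate_detections(self, all_boxes, output_dir):
        # write VOC-style result files and compute per-class AP
        ...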

Warning: NaN or Inf found in input tensor.

I am getting this warning when I train from Cityscapes to Foggy Cityscapes. The RPN regression box loss becomes NaN.

I tried all the changes below, but the loss still becomes NaN.

From jwyang/faster-rcnn.pytorch#136 (comment)

            x1 = max(float(bbox.find('xmin').text) - 1, 0)
            y1 = max(float(bbox.find('ymin').text) - 1, 0)
            x2 = max(float(bbox.find('xmax').text) - 1, 0)
            y2 = max(float(bbox.find('ymax').text) - 1, 0)

From jwyang/faster-rcnn.pytorch#193 (comment)
not_keep = (gt_boxes[:,2] - gt_boxes[:,0]) < 10 and (gt_boxes[:,3] - gt_boxes[:,1]) < 10

It still throws the warning. What might be the possible reason for the issue?
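For reference, a minimal PyTorch sketch of this kind of degenerate-box filtering (my own illustration, not the repository's fix); note that tensor masks need the elementwise & operator rather than Python's and:

import torch

def filter_degenerate_boxes(gt_boxes, min_size=2):
    # gt_boxes: [N, 5] tensor of (x1, y1, x2, y2, class)
    w = gt_boxes[:, 2] - gt_boxes[:, 0]
    h = gt_boxes[:, 3] - gt_boxes[:, 1]
    keep = (w >= min_size) & (h >= min_size)   # elementwise AND, not `and`
    return gt_boxes[keep]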

Why is the whole clipart dataset used when adapting from VOC to clipart?

In the paper, when conducting the experiment of adaptation from VOC to clipart, you said:

All images were used for both training(without labels) and testing.

In the file parser_func.py:

 elif args.dataset == "clipart":
            args.imdb_name = "clipart_trainval"
            args.imdbval_name = "clipart_trainval"
            args.set_cfgs = ['ANCHOR_SCALES', '[8, 16, 32]', 'ANCHOR_RATIOS', '[0.5,1,2]', 'MAX_NUM_GT_BOXES',
                             '20']

You use the same set for training and testing.

So, why is the whole clipart dataset used when adapting from VOC to clipart?

Import error when training

I have an import error when trying to train.
Could you be so kind as to help me?
Thank you in advance.

:~/DA_Detection$ CUDA_VISIBLE_DEVICES=$1 python trainval_net_global_local.py --cuda --net res101 --dataset pascal_voc_water --dataset_t water --gc --lc --save_dir $2
Traceback (most recent call last):
File "trainval_net_global_local.py", line 25, in
from model.utils.net_utils import weights_normal_init, save_net, load_net,
File "/home/DA_Detection/lib/model/utils/net_utils.py", line 10, in
from model.roi_crop.functions.roi_crop import RoICropFunction
File "/home/DA_Detection/lib/model/roi_crop/functions/roi_crop.py", line 4, in
from .._ext import roi_crop
File "/home/DA_Detection/lib/model/roi_crop/_ext/roi_crop/init.py", line 3, in
from ._roi_crop import lib as _lib, ffi as _ffi
ImportError: /home/DA_Detection/lib/model/roi_crop/_ext/roi_crop/_roi_crop.so: undefined symbol: __cudaPopCallConfiguration

cityscape.py did not filter the target domain images

After using prepare_data.m, source domain images and target domain images are recorded in the same trainval.txt.
In foggy_cityscape.py, source domain images are filtered in _load_image_set_index(), but in cityscape.py the target domain images are not filtered. Do you regard all the images in trainval.txt as the source domain?

pretrained model load error

Hi,
Thanks for your open-source code. But I have met the following model loading error:
When I run test_net_global_local.py according to your README.md tutorial, it seems that the model's state_dict does not match:

self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResNet:
	Missing key(s) in state_dict: "conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", "bn1.running_var", "layer1.0.conv1.weight", "layer1.0.bn1.weight", "layer1.0.bn1.bias", "layer1.0.bn1.running_mean", "layer1.0.bn1.running_var", "layer1.0.conv2.weight", "layer1.0.bn2.weight", "layer1.0.bn2.bias", "layer1.0.bn2.running_mean", "layer1.0.bn2.running_var", "layer1.0.conv3.weight", "layer1.0.bn3.weight", "layer1.0.bn3.bias", "layer1.0.bn3.running_mean", "layer1.0.bn3.running_var", "layer1.0.downsample.0.weight", "layer1.0.downsample.1.weight", "layer1.0.downsample.1.bias", "layer1.0.downsample.1.running_mean", "layer1.0.downsample.1.running_var", "layer1.1.conv1.weight", "layer1.1.bn1.weight", "layer1.1.bn1.bias", "layer1.1.bn1.running_mean", "layer1.1.bn1.running_var", "layer1.1.conv2.weight", "layer1.1.bn2.weight", "layer1.1.bn2.bias", "layer1.1.bn2.running_mean", "layer1.1.bn2.running_var", "layer1.1.conv3.weight", "layer1.1.bn3.weight", "layer1.1.bn3.bias", "layer1.1.bn3.running_mean", "layer1.1.bn3.running_var", "layer1.2.conv1.weight", "layer1.2.bn1.weight", "layer1.2.bn1.bias", "layer1.2.bn1.running_mean", "layer1.2.bn1.running_var", "layer1.2.conv2.weight", "layer1.2.bn2.weight", "layer1.2.bn2.bias", "layer1.2.bn2.running_mean", "layer1.2.bn2.running_var", "layer1.2.conv3.weight", "layer1.2.bn3.weight", "layer1.2.bn3.bias", "layer1.2.bn3.running_mean", "layer1.2.bn3.running_var", "layer2.0.conv1.weight", "layer2.0.bn1.weight", "layer2.0.bn1.bias", "layer2.0.bn1.running_mean", "layer2.0.bn1.running_var", "layer2.0.conv2.weight", "layer2.0.bn2.weight", "layer2.0.bn2.bias", "layer2.0.bn2.running_mean", "layer2.0.bn2.running_var", "layer2.0.conv3.weight", "layer2.0.bn3.weight", "layer2.0.bn3.bias", "layer2.0.bn3.running_mean", "layer2.0.bn3.running_var", "layer2.0.downsample.0.weight", "layer2.0.downsample.1.weight", "layer2.0.downsample.1.bias", "layer2.0.downsample.1.running_mean", "layer2.0.downsample.1.running_var", "layer2.1.conv1.weight", "layer2.1.bn1.weight", "layer2.1.bn1.bias", "layer2.1.bn1.running_mean", "layer2.1.bn1.running_var", "layer2.1.conv2.weight", "layer2.1.bn2.weight", "layer2.1.bn2.bias", "layer2.1.bn2.running_mean", "layer2.1.bn2.running_var", "layer2.1.conv3.weight", "layer2.1.bn3.weight", "layer2.1.bn3.bias", "layer2.1.bn3.running_mean", "layer2.1.bn3.running_var", "layer2.2.conv1.weight", "layer2.2.bn1.weight", "layer2.2.bn1.bias", "layer2.2.bn1.running_mean", "layer2.2.bn1.running_var", "layer2.2.conv2.weight", "layer2.2.bn2.weight", "layer2.2.bn2.bias", "layer2.2.bn2.running_mean", "layer2.2.bn2.running_var", "layer2.2.conv3.weight", "layer2.2.bn3.weight", "layer2.2.bn3.bias", "layer2.2.bn3.running_mean", "layer2.2.bn3.running_var", "layer2.3.conv1.weight", "layer2.3.bn1.weight", "layer2.3.bn1.bias", "layer2.3.bn1.running_mean", "layer2.3.bn1.running_var", "layer2.3.conv2.weight", "layer2.3.bn2.weight", "layer2.3.bn2.bias", "layer2.3.bn2.running_mean", "layer2.3.bn2.running_var", "layer2.3.conv3.weight", "layer2.3.bn3.weight", "layer2.3.bn3.bias", "layer2.3.bn3.running_mean", "layer2.3.bn3.running_var", "layer3.0.conv1.weight", "layer3.0.bn1.weight", "layer3.0.bn1.bias", "layer3.0.bn1.running_mean", "layer3.0.bn1.running_var", "layer3.0.conv2.weight", "layer3.0.bn2.weight", "layer3.0.bn2.bias", "layer3.0.bn2.running_mean", "layer3.0.bn2.running_var", "layer3.0.conv3.weight", "layer3.0.bn3.weight", "layer3.0.bn3.bias", "layer3.0.bn3.running_mean", "layer3.0.bn3.running_var", "layer3.0.downsample.0.weight", "layer3.0.downsample.1.weight", 
"layer3.0.downsample.1.bias", "layer3.0.downsample.1.running_mean", "layer3.0.downsample.1.running_var", "layer3.1.conv1.weight", "layer3.1.bn1.weight", "layer3.1.bn1.bias", "layer3.1.bn1.running_mean", "layer3.1.bn1.running_var", "layer3.1.conv2.weight", "layer3.1.bn2.weight", "layer3.1.bn2.bias", "layer3.1.bn2.running_mean", "layer3.1.bn2.running_var", "layer3.1.conv3.weight", "layer3.1.bn3.weight", "layer3.1.bn3.bias", "layer3.1.bn3.running_mean", "layer3.1.bn3.running_var", "layer3.2.conv1.weight", "layer3.2.bn1.weight", "layer3.2.bn1.bias", "layer3.2.bn1.running_mean", "layer3.2.bn1.running_var", "layer3.2.conv2.weight", "layer3.2.bn2.weight", "layer3.2.bn2.bias", "layer3.2.bn2.running_mean", "layer3.2.bn2.running_var", "layer3.2.conv3.weight", "layer3.2.bn3.weight", "layer3.2.bn3.bias", "layer3.2.bn3.running_mean", "layer3.2.bn3.running_var", "layer3.3.conv1.weight", "layer3.3.bn1.weight", "layer3.3.bn1.bias", "layer3.3.bn1.running_mean", "layer3.3.bn1.running_var", "layer3.3.conv2.weight", "layer3.3.bn2.weight", "layer3.3.bn2.bias", "layer3.3.bn2.running_mean", "layer3.3.bn2.running_var", "layer3.3.conv3.weight", "layer3.3.bn3.weight", "layer3.3.bn3.bias", "layer3.3.bn3.running_mean", "layer3.3.bn3.running_var", "layer3.4.conv1.weight", "layer3.4.bn1.weight", "layer3.4.bn1.bias", "layer3.4.bn1.running_mean", "layer3.4.bn1.running_var", "layer3.4.conv2.weight", "layer3.4.bn2.weight", "layer3.4.bn2.bias", "layer3.4.bn2.running_mean", "layer3.4.bn2.running_var", "layer3.4.conv3.weight", "layer3.4.bn3.weight", "layer3.4.bn3.bias", "layer3.4.bn3.running_mean", "layer3.4.bn3.running_var", "layer3.5.conv1.weight", "layer3.5.bn1.weight", "layer3.5.bn1.bias", "layer3.5.bn1.running_mean", "layer3.5.bn1.running_var", "layer3.5.conv2.weight", "layer3.5.bn2.weight", "layer3.5.bn2.bias", "layer3.5.bn2.running_mean", "layer3.5.bn2.running_var", "layer3.5.conv3.weight", "layer3.5.bn3.weight", "layer3.5.bn3.bias", "layer3.5.bn3.running_mean", "layer3.5.bn3.running_var", "layer3.6.conv1.weight", "layer3.6.bn1.weight", "layer3.6.bn1.bias", "layer3.6.bn1.running_mean", "layer3.6.bn1.running_var", "layer3.6.conv2.weight", "layer3.6.bn2.weight", "layer3.6.bn2.bias", "layer3.6.bn2.running_mean", "layer3.6.bn2.running_var", "layer3.6.conv3.weight", "layer3.6.bn3.weight", "layer3.6.bn3.bias", "layer3.6.bn3.running_mean", "layer3.6.bn3.running_var", "layer3.7.conv1.weight", "layer3.7.bn1.weight", "layer3.7.bn1.bias", "layer3.7.bn1.running_mean", "layer3.7.bn1.running_var", "layer3.7.conv2.weight", "layer3.7.bn2.weight", "layer3.7.bn2.bias", "layer3.7.bn2.running_mean", "layer3.7.bn2.running_var", "layer3.7.conv3.weight", "layer3.7.bn3.weight", "layer3.7.bn3.bias", "layer3.7.bn3.running_mean", "layer3.7.bn3.running_var", "layer3.8.conv1.weight", "layer3.8.bn1.weight", "layer3.8.bn1.bias", "layer3.8.bn1.running_mean", "layer3.8.bn1.running_var", "layer3.8.conv2.weight", "layer3.8.bn2.weight", "layer3.8.bn2.bias", "layer3.8.bn2.running_mean", "layer3.8.bn2.running_var", "layer3.8.conv3.weight", "layer3.8.bn3.weight", "layer3.8.bn3.bias", "layer3.8.bn3.running_mean", "layer3.8.bn3.running_var", "layer3.9.conv1.weight", "layer3.9.bn1.weight", "layer3.9.bn1.bias", "layer3.9.bn1.running_mean", "layer3.9.bn1.running_var", "layer3.9.conv2.weight", "layer3.9.bn2.weight", "layer3.9.bn2.bias", "layer3.9.bn2.running_mean", "layer3.9.bn2.running_var", "layer3.9.conv3.weight", "layer3.9.bn3.weight", "layer3.9.bn3.bias", "layer3.9.bn3.running_mean", "layer3.9.bn3.running_var", "layer3.10.conv1.weight", 
"layer3.10.bn1.weight", "layer3.10.bn1.bias", "layer3.10.bn1.running_mean", "layer3.10.bn1.running_var", "layer3.10.conv2.weight", "layer3.10.bn2.weight", "layer3.10.bn2.bias", "layer3.10.bn2.running_mean", "layer3.10.bn2.running_var", "layer3.10.conv3.weight", "layer3.10.bn3.weight", "layer3.10.bn3.bias", "layer3.10.bn3.running_mean", "layer3.10.bn3.running_var", "layer3.11.conv1.weight", "layer3.11.bn1.weight", "layer3.11.bn1.bias", "layer3.11.bn1.running_mean", "layer3.11.bn1.running_var", "layer3.11.conv2.weight", "layer3.11.bn2.weight", "layer3.11.bn2.bias", "layer3.11.bn2.running_mean", "layer3.11.bn2.running_var", "layer3.11.conv3.weight", "layer3.11.bn3.weight", "layer3.11.bn3.bias", "layer3.11.bn3.running_mean", "layer3.11.bn3.running_var", "layer3.12.conv1.weight", "layer3.12.bn1.weight", "layer3.12.bn1.bias", "layer3.12.bn1.running_mean", "layer3.12.bn1.running_var", "layer3.12.conv2.weight", "layer3.12.bn2.weight", "layer3.12.bn2.bias", "layer3.12.bn2.running_mean", "layer3.12.bn2.running_var", "layer3.12.conv3.weight", "layer3.12.bn3.weight", "layer3.12.bn3.bias", "layer3.12.bn3.running_mean", "layer3.12.bn3.running_var", "layer3.13.conv1.weight", "layer3.13.bn1.weight", "layer3.13.bn1.bias", "layer3.13.bn1.running_mean", "layer3.13.bn1.running_var", "layer3.13.conv2.weight", "layer3.13.bn2.weight", "layer3.13.bn2.bias", "layer3.13.bn2.running_mean", "layer3.13.bn2.running_var", "layer3.13.conv3.weight", "layer3.13.bn3.weight", "layer3.13.bn3.bias", "layer3.13.bn3.running_mean", "layer3.13.bn3.running_var", "layer3.14.conv1.weight", "layer3.14.bn1.weight", "layer3.14.bn1.bias", "layer3.14.bn1.running_mean", "layer3.14.bn1.running_var", "layer3.14.conv2.weight", "layer3.14.bn2.weight", "layer3.14.bn2.bias", "layer3.14.bn2.running_mean", "layer3.14.bn2.running_var", "layer3.14.conv3.weight", "layer3.14.bn3.weight", "layer3.14.bn3.bias", "layer3.14.bn3.running_mean", "layer3.14.bn3.running_var", "layer3.15.conv1.weight", "layer3.15.bn1.weight", "layer3.15.bn1.bias", "layer3.15.bn1.running_mean", "layer3.15.bn1.running_var", "layer3.15.conv2.weight", "layer3.15.bn2.weight", "layer3.15.bn2.bias", "layer3.15.bn2.running_mean", "layer3.15.bn2.running_var", "layer3.15.conv3.weight", "layer3.15.bn3.weight", "layer3.15.bn3.bias", "layer3.15.bn3.running_mean", "layer3.15.bn3.running_var", "layer3.16.conv1.weight", "layer3.16.bn1.weight", "layer3.16.bn1.bias", "layer3.16.bn1.running_mean", "layer3.16.bn1.running_var", "layer3.16.conv2.weight", "layer3.16.bn2.weight", "layer3.16.bn2.bias", "layer3.16.bn2.running_mean", "layer3.16.bn2.running_var", "layer3.16.conv3.weight", "layer3.16.bn3.weight", "layer3.16.bn3.bias", "layer3.16.bn3.running_mean", "layer3.16.bn3.running_var", "layer3.17.conv1.weight", "layer3.17.bn1.weight", "layer3.17.bn1.bias", "layer3.17.bn1.running_mean", "layer3.17.bn1.running_var", "layer3.17.conv2.weight", "layer3.17.bn2.weight", "layer3.17.bn2.bias", "layer3.17.bn2.running_mean", "layer3.17.bn2.running_var", "layer3.17.conv3.weight", "layer3.17.bn3.weight", "layer3.17.bn3.bias", "layer3.17.bn3.running_mean", "layer3.17.bn3.running_var", "layer3.18.conv1.weight", "layer3.18.bn1.weight", "layer3.18.bn1.bias", "layer3.18.bn1.running_mean", "layer3.18.bn1.running_var", "layer3.18.conv2.weight", "layer3.18.bn2.weight", "layer3.18.bn2.bias", "layer3.18.bn2.running_mean", "layer3.18.bn2.running_var", "layer3.18.conv3.weight", "layer3.18.bn3.weight", "layer3.18.bn3.bias", "layer3.18.bn3.running_mean", "layer3.18.bn3.running_var", "layer3.19.conv1.weight", 
"layer3.19.bn1.weight", "layer3.19.bn1.bias", "layer3.19.bn1.running_mean", "layer3.19.bn1.running_var", "layer3.19.conv2.weight", "layer3.19.bn2.weight", "layer3.19.bn2.bias", "layer3.19.bn2.running_mean", "layer3.19.bn2.running_var", "layer3.19.conv3.weight", "layer3.19.bn3.weight", "layer3.19.bn3.bias", "layer3.19.bn3.running_mean", "layer3.19.bn3.running_var", "layer3.20.conv1.weight", "layer3.20.bn1.weight", "layer3.20.bn1.bias", "layer3.20.bn1.running_mean", "layer3.20.bn1.running_var", "layer3.20.conv2.weight", "layer3.20.bn2.weight", "layer3.20.bn2.bias", "layer3.20.bn2.running_mean", "layer3.20.bn2.running_var", "layer3.20.conv3.weight", "layer3.20.bn3.weight", "layer3.20.bn3.bias", "layer3.20.bn3.running_mean", "layer3.20.bn3.running_var", "layer3.21.conv1.weight", "layer3.21.bn1.weight", "layer3.21.bn1.bias", "layer3.21.bn1.running_mean", "layer3.21.bn1.running_var", "layer3.21.conv2.weight", "layer3.21.bn2.weight", "layer3.21.bn2.bias", "layer3.21.bn2.running_mean", "layer3.21.bn2.running_var", "layer3.21.conv3.weight", "layer3.21.bn3.weight", "layer3.21.bn3.bias", "layer3.21.bn3.running_mean", "layer3.21.bn3.running_var", "layer3.22.conv1.weight", "layer3.22.bn1.weight", "layer3.22.bn1.bias", "layer3.22.bn1.running_mean", "layer3.22.bn1.running_var", "layer3.22.conv2.weight", "layer3.22.bn2.weight", "layer3.22.bn2.bias", "layer3.22.bn2.running_mean", "layer3.22.bn2.running_var", "layer3.22.conv3.weight", "layer3.22.bn3.weight", "layer3.22.bn3.bias", "layer3.22.bn3.running_mean", "layer3.22.bn3.running_var", "layer4.0.conv1.weight", "layer4.0.bn1.weight", "layer4.0.bn1.bias", "layer4.0.bn1.running_mean", "layer4.0.bn1.running_var", "layer4.0.conv2.weight", "layer4.0.bn2.weight", "layer4.0.bn2.bias", "layer4.0.bn2.running_mean", "layer4.0.bn2.running_var", "layer4.0.conv3.weight", "layer4.0.bn3.weight", "layer4.0.bn3.bias", "layer4.0.bn3.running_mean", "layer4.0.bn3.running_var", "layer4.0.downsample.0.weight", "layer4.0.downsample.1.weight", "layer4.0.downsample.1.bias", "layer4.0.downsample.1.running_mean", "layer4.0.downsample.1.running_var", "layer4.1.conv1.weight", "layer4.1.bn1.weight", "layer4.1.bn1.bias", "layer4.1.bn1.running_mean", "layer4.1.bn1.running_var", "layer4.1.conv2.weight", "layer4.1.bn2.weight", "layer4.1.bn2.bias", "layer4.1.bn2.running_mean", "layer4.1.bn2.running_var", "layer4.1.conv3.weight", "layer4.1.bn3.weight", "layer4.1.bn3.bias", "layer4.1.bn3.running_mean", "layer4.1.bn3.running_var", "layer4.2.conv1.weight", "layer4.2.bn1.weight", "layer4.2.bn1.bias", "layer4.2.bn1.running_mean", "layer4.2.bn1.running_var", "layer4.2.conv2.weight", "layer4.2.bn2.weight", "layer4.2.bn2.bias", "layer4.2.bn2.running_mean", "layer4.2.bn2.running_var", "layer4.2.conv3.weight", "layer4.2.bn3.weight", "layer4.2.bn3.bias", "layer4.2.bn3.running_mean", "layer4.2.bn3.running_var", "fc.weight", "fc.bias". 

Process finished with exit code 1

The .pth file was downloaded from your sample model (res101).

About Backbone

Hi, have you experimented with larger base models such as ResNet-50 or ResNet-100? What were the results?

Providing a pre-trained model for Sim10k -> Cityscapes?

Hi @ksaito-ut,

Thank you for sharing your code. Can you provide the pre-trained model for the sim10k to city scenario?

In addition, using your sample pre-trained model, I got the following result:
There are some differences in the final results compared to the paper (for bike, bird, cat, ...). Is this model exactly the same one used for the paper? If not, can you help me reproduce your results? Do you have some hints for me? Is it because of the eval code? Should I use the MATLAB version?

Saving cached annotations to data/watercolor/annotations_cache/test_annots.pkl  
AP for bicycle = 0.7777  
data/watercolor/results/VOC2007/Main/comp4_det_test_bird.txt  
AP for bird = 0.5225  
data/watercolor/results/VOC2007/Main/comp4_det_test_car.txt  
AP for car = 0.4628  
data/watercolor/results/VOC2007/Main/comp4_det_test_cat.txt  
AP for cat = 0.4604  
data/watercolor/results/VOC2007/Main/comp4_det_test_dog.txt  
AP for dog = 0.3974  
data/watercolor/results/VOC2007/Main/comp4_det_test_person.txt  
AP for person = 0.6705  
Mean AP = 0.5485  
Results:
0.778
0.523
0.463
0.46
0.397
0.671
0.549
~~~~~~~~
--------------------------------------------------------------
Results computed with the **unofficial** Python eval code.
Results should be very close to the official MATLAB eval code.
Recompute with `./tools/reval.py --matlab ...` for your paper.
-- Thanks, The Management

ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.

Hi, thanks for your great work. When I run 'cd lib; sh make.sh' in this repo, I get this error:
ImportError: torch.utils.ffi is deprecated. Please use cpp extensions instead.
I found that this is caused by the PyTorch version, so I have these questions:
Which version of PyTorch did you use? Which branch of the faster-rcnn.pytorch repo should I use, master or pytorch-1.0? Do you support PyTorch 1.0?

size mismatch when loading the trained checkpoint

For the adaptation from VOC to watercolor, VOC has 21 classes (background included) and the watercolor dataset has 7 classes. How do you deal with this inconsistency? It seems that the classifier obtained in the training stage cannot be applied in the test stage. Thanks.

*** RuntimeError: Error(s) in loading state_dict for resnet:
	size mismatch for RCNN_cls_score.weight: copying a param of torch.Size([7, 2304]) from checkpoint, where the shape is torch.Size([21, 2304]) in current model.
	size mismatch for RCNN_cls_score.bias: copying a param of torch.Size([7]) from checkpoint, where the shape is torch.Size([21]) in current model.
	size mismatch for RCNN_bbox_pred.weight: copying a param of torch.Size([28, 2304]) from checkpoint, where the shape is torch.Size([84, 2304]) in current model.
	size mismatch for RCNN_bbox_pred.bias: copying a param of torch.Size([28]) from checkpoint, where the shape is torch.Size([84]) in current model.

About CycleGAN settings

In the supplementary material, the number of training epochs is provided. Does "10 epochs" mean 10 epochs at the initial lr plus 10 epochs of lr decay? Could you share more of the CycleGAN parameters, or did you use the defaults for the others? Lots of thanks. @ksaito-ut

Data preparation

Hi,

I'm trying to use this repo to reproduce everything you did in your paper.
However, the watercolor & clipart data you use come from another repo, 'cross-domain-detection'.

That repo needs CuPy to generate the data (watercolor and clipart). For environment reasons I cannot install CuPy correctly (a conflict between cudatoolkit and CuPy). Would it be possible to share your training data on Dropbox?

Also, is it true that the watercolor annotations are generated using SSD300 in the other repo?

How to get the dataset Cityscapes_car?

I want to do the experiments from Sim10k to Cityscapes car. Is it necessary to prepare a Cityscapes car dataset?

When I remove all annotations except the car class, I find that some images do not contain a car. Should I delete these images, especially from the test set?

Error in Clipart Dataset

When I use train_scripts/clipart_sample.sh to train the network, the terminal displays the following error message:

Appending horizontally-flipped training examples...
wrote gt roidb to ./DA_Detection/data/cache/clipart_trainval_gt_roidb.pkl
error
[ True False False False  True  True]
[[   27   137   157   181]
 [65445    66     4   163]
 [65470   192    78   254]
 [65439   372   138   483]
 [65273   190 65474   415]
 [65278   307 65416   371]]
300
done
Preparing training data...
done
before filtering, there are 2000 images...
after filtering, there are 2000 images...
33102 source roidb entries
2000 target roidb entries

It seems that the error message comes from annotation errors in the clipart dataset. Do I need to remove the problematic images/annotations from the clipart dataset?

RuntimeError: size mismatch occurs when using `--lc --gc` for `trainval_net_global_local.py`

I want to use the local and global context with the arguments --lc --gc for trainval_net_global_local.py.

The command is as follows:
CUDA_VISIBLE_DEVICES=4 python trainval_net_global_local.py --dataset pascal_voc_0712 --dataset_t clipart --net res101 --cuda --bs 2 --lc --gc

And I got the following RuntimeError: size mismatch.

Traceback (most recent call last):
  File "trainval_net_global_local.py", line 201, in <module>
    rois_label, out_d_pixel, out_d = fasterRCNN(im_data, im_info, gt_boxes, num_boxes)
  File "/youedata/dengjinhong/anaconda3/envs/pytorch0.4.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/youedata/dengjinhong/github/DA_Detection/lib/model/faster_rcnn/faster_rcnn_global_local.py", line 114, in forward
    bbox_pred = self.RCNN_bbox_pred(pooled_feat)
  File "/youedata/dengjinhong/anaconda3/envs/pytorch0.4.0/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/youedata/dengjinhong/anaconda3/envs/pytorch0.4.0/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 55, in forward
    return F.linear(input, self.weight, self.bias)
  File "/youedata/dengjinhong/anaconda3/envs/pytorch0.4.0/lib/python3.6/site-packages/torch/nn/functional.py", line 992, in linear
    return torch.addmm(bias, input, weight.t())
RuntimeError: size mismatch, m1: [256 x 2560], m2: [2304 x 84] at /opt/conda/conda-bld/pytorch_1524586445097/work/aten/src/THC/generic/THCTensorMathBlas.cu:249

Ablation study for w/o local context?

Hello, you have done solid work, but in the paper there is no ablation study for local alignment with and without context. Could you tell me the performance in these two situations?

Running Error

Hello,
When I run your code, there is an error. Could you give me some advice?
Thank you

from torchvision import _C
ImportError: /home/aming/anaconda3/envs/dadetection/lib/python3.6/site-packages/torchvision/_C.cpython-36m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail36_typeMetaDataInstance_preallocated_7E

Epochs trained for cityscapes to foggy cityscapes DA

Hi @ksaito-ut

Thank you for the work and code. It's very helpful. Can you please confirm for how many epochs the model was trained for the Cityscapes to Foggy Cityscapes DA? I couldn't see those details in the paper. Also, the paper mentions 70k iterations, but in the code it is 100k iterations per epoch.

Thanks and regards,
Vaishnavi Khindkar

About global alignment loss and local alignment loss

Thanks for your code! I am a little confused about the global alignment loss and local alignment loss: when training the network, should the global loss and local loss keep getting higher? Is that right?

Can this project run with multi-gpus?

Since the data must be fed into the network strictly in pairs, I am confused about whether the model can be trained in a multi-GPU manner or not.

Why is sigmoid calculated twice?

Hello, in vgg16_global_local.py the sigmoid has already been applied once; why is it computed again in net_utils.py in class FocalLoss(nn.Module)?
class netD_pixel(nn.Module):
    def forward(self, x):
        # x = ReverseLayerF.apply(x, self.beta)
        x = F.relu(x)
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        if self.context:
            feat = F.avg_pool2d(x, (x.size(2), x.size(3)))
            # feat = x
            x = torch.sigmoid(self.conv3(x))
            return x.view(-1, 1), feat  # torch.cat((feat1,feat2),1)#F
        else:
            x = torch.sigmoid(self.conv3(x))
            return x.view(-1, 1)  # F.sigmoid(x)

class FocalLoss(nn.Module):
    def forward(self, inputs, targets):
        N = inputs.size(0)
        # print(N)
        C = inputs.size(1)
        if self.sigmoid:
            P = F.sigmoid(inputs)  # F.softmax(inputs)
            if targets == 0:
                probs = 1 - P  # (P * class_mask).sum(1).view(-1, 1)
                log_p = probs.log()
                batch_loss = - (torch.pow((1 - probs), self.gamma)) * log_p
            if targets == 1:
                probs = P  # (P * class_mask).sum(1).view(-1, 1)
                log_p = probs.log()
                batch_loss = - (torch.pow((1 - probs), self.gamma)) * log_p
        else:
            # inputs = F.sigmoid(inputs)
            P = torch.softmax(inputs)

Lower results than the paper

cityscape -> foggy cityscape: 32.48
sim10k -> cityscape: 38.3
I used trainval_net_global_local.py with gc=True, lc=True, bs=1, epoch=20.

How can I see the baseline results?

I have tried using trainval_net_so.py, hoping that it is the file that reproduces the baseline results, but with no luck. I get an error on the import from model.faster_rcnn.resnet_imglevel import resnet because those files do not exist. The other training files seem to work.

Also, could you provide some documentation for the code? Why are there so many files for training? Isn't it enough to change some parameters? If so, which parameters, and what do they do?

Thanks

Few questions for reproducing the results on Watercolor

Hi, I'm trying to reproduce your work for the VOC -> Watercolor dataset but have several issues.
Q1: Watercolor only has 7 classes (including the background), but VOC has 21 classes. Should I use a 7-class VOC dataset as the source domain during training?
Q2: Could I fine-tune from the pretrained model trained on the 21-class VOC dataset when I use the 7-class VOC dataset as the source domain?

Try this on SSD

Has anyone tried this approach on SSD? In the paper, the authors mention that it should work on SSD too.

The dimensions of the parameters of the trained and tested models do not match

Hello,
Since I'm new to this, I have a problem with testing PASCAL VOC -> Clipart and don't know where to start:

    While copying the parameter named "RCNN_cls_score.weight", whose dimensions in the model are torch.Size([21, 4224]) and whose dimensions in the checkpoint are torch.Size([21, 4096]).
    While copying the parameter named "RCNN_bbox_pred.weight", whose dimensions in the model are torch.Size([84, 4224]) and whose dimensions in the checkpoint are torch.Size([84, 4096]).

Looking forward to your reply. Thank you very much.

Once trained with source and target domain data, can the model predict on new images similar to the target domain dataset?

Hi,

I hope you understand my question: I want to test my model on new images that were not used as target domain data during training (trainval_net_global_local.py). Basically, I want to run test_net_global_local.py on a new dataset that was not used as the target domain during training but whose images are similar to the target domain dataset, and run object detection on it.
I want to understand whether this is possible and, if so, how.
Any hints on how to do this would be a great help.

Thanks in advance.
