To get source-only results (training plain FCOS), should I train with tools/ and set FCOS_ON: False in the yaml file? Or is tools/ with FCOS_ON: False enough?

Code for the FasterRCNN implementation of SIGMA++

Hi, I have read your series of work on graph-matching-based domain adaptation. It is wonderful work and very enlightening. I am very interested in the code implementation using FasterRCNN. Could you release it?


CPU: 12-core Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz, RAM: 43GB

RuntimeError: Not compiled with GPU support

Hi~ I ran into a problem when trying to run the demo model. Has anyone run into this before?
I have no clue how to solve it. QQ

I checked my PyTorch install, and it can use CUDA.
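
A possible first diagnostic, as a rough sketch: "Not compiled with GPU support" usually comes from a compiled extension that was built without CUDA, which can happen even when PyTorch itself sees the GPU. The helper below (`cuda_build_hints` is a hypothetical name, not part of this repo) only collects the environment facts a build script would typically look at; it does not rebuild anything.

```python
import os
import shutil


def cuda_build_hints(env=None):
    """Collect facts that commonly decide whether an extension builds with CUDA."""
    env = os.environ if env is None else env
    hints = {
        # Build scripts usually look for the toolkit via CUDA_HOME (or CUDA_PATH).
        "CUDA_HOME": env.get("CUDA_HOME") or env.get("CUDA_PATH"),
        # nvcc must be reachable for the CUDA sources to be compiled.
        "nvcc_on_path": shutil.which("nvcc", path=env.get("PATH")) is not None,
    }
    try:
        import torch  # optional: only queried when importable

        hints["torch_cuda_available"] = torch.cuda.is_available()
        hints["torch_cuda_version"] = torch.version.cuda
    except Exception:
        hints["torch_cuda_available"] = None
        hints["torch_cuda_version"] = None
    return hints


if __name__ == "__main__":
    for key, value in cuda_build_hints().items():
        print(key, "=", value)
```

If `CUDA_HOME` is empty or `nvcc_on_path` is False, rebuilding the extension after fixing the environment may be worth trying.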




As the title says, I tried running fcos_demo from the original maskrcnn repo and got the following error at runtime:
Traceback (most recent call last):
File "demo/", line 128, in
File "demo/", line 110, in main
File "/home/e401/Desktop/wrs/projects/SIGMA/demo/", line 117, in init
_ = checkpointer.load(cfg.MODEL.WEIGHT)
File "/home/e401/Desktop/wrs/projects/SIGMA/fcos_core/utils/", line 318, in load
self._load_model(checkpoint, load_dis)
File "/home/e401/Desktop/wrs/projects/SIGMA/fcos_core/utils/", line 422, in _load_model
load_state_dict(self.model["backbone"], checkpoint.pop("model_backbone"))
TypeError: 'GeneralizedRCNN' object is not subscriptable
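
The last frame of the traceback indexes `self.model["backbone"]`, which suggests the checkpointer expects a dict of sub-models rather than one detector object. The sketch below reproduces that access pattern with stand-in classes; `FakeModule`, `load_split_checkpoint` and the `"model_fcos"` key are assumptions for illustration, only `"model_backbone"` appears in the traceback.

```python
class FakeModule:
    """Stand-in for a torch.nn.Module; it only mimics load_state_dict."""

    def __init__(self):
        self.state = {}

    def load_state_dict(self, state):
        self.state = dict(state)


def load_split_checkpoint(model, checkpoint):
    """Load one state dict per sub-model, keyed as 'model_<name>'."""
    if not isinstance(model, dict):
        # A single detector object (e.g. GeneralizedRCNN) cannot be indexed by name.
        raise TypeError(
            "expected a dict of sub-models, got %s" % type(model).__name__
        )
    for name, module in model.items():
        module.load_state_dict(checkpoint.pop("model_" + name))
    return model


if __name__ == "__main__":
    model = {"backbone": FakeModule(), "fcos": FakeModule()}
    ckpt = {"model_backbone": {"w": 1}, "model_fcos": {"w": 2}}
    load_split_checkpoint(model, ckpt)
    print(model["backbone"].state, model["fcos"].state)
```

If this reading is right, the demo script from the original repo would need to build the model the same way the training script does before calling the checkpointer.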


Sim10k's ImageSets

Could you provide the Sim10k ImageSets? I can't seem to find them on the internet. Thank you so much!!


Traceback (most recent call last):
File "tools/", line 726, in
File "tools/", line 715, in main
MODEL = train(cfg, args.local_rank, args.distributed, args.test_only,args.use_tensorboard)
File "tools/", line 601, in train
File "/home/e401/Desktop/wrs/projects/SIGMA/fcos_core/engine/", line 299, in do_train
model, (images_s, images_t), targets=targets_s, return_maps=True)
File "/home/e401/Desktop/wrs/projects/SIGMA/fcos_core/engine/", line 69, in foward_detector
(features_s, features_t), middle_head_loss = model_middle_head(images, (features_s,features_t), targets=targets, score_maps=score_maps )
File "/home/e401/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/nn/modules/", line 532, in call
result = self.forward(*input, **kwargs)
File "/home/e401/Desktop/wrs/projects/SIGMA/fcos_core/modeling/rpn/fcos/", line 229, in forward
features, feat_loss = self._forward_train(images, features, targets, score_maps)
File "/home/e401/Desktop/wrs/projects/SIGMA/fcos_core/modeling/rpn/fcos/", line 348, in _forward_train
matching_loss_quadratic = self._forward_qu(nodes_1, nodes_2, edges_1.detach(), edges_2.detach(), affinity)
File "/home/e401/Desktop/wrs/projects/SIGMA/fcos_core/modeling/rpn/fcos/", line 624, in _forward_qu
sin2 = torch.sqrt(1.- F.cosine_similarity(triangle_2, triangle_2_tmp).pow(2)).sort()[0]
RuntimeError: The size of tensor a (8) must match the size of tensor b (9) at non-singleton dimension 0


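  1. I am using my own dataset.
  2. The same code runs without problems on another dataset.
  3. Before the error, the loss was still decreasing fairly steadily.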

Questions about experiment results.

I trained for about 50000 iterations on a 2080Ti with a batch size of 2, and found the evaluation results quite unstable. AP50 fluctuated around 41 and reached a maximum of 43.5. How do you judge whether the model has converged, and how do you select which results to report?
Thanks a lot.
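
One common way to reason about noisy evaluation curves is to smooth them before judging convergence. The sketch below is not the authors' selection protocol; `moving_average`, `has_plateaued`, the window and the tolerance are all illustrative assumptions, and the AP values in the usage lines are made up.

```python
def moving_average(values, window):
    """Trailing moving average; early entries average over fewer points."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def has_plateaued(values, window=5, tol=0.3):
    """True when the smoothed metric moved less than tol over the last window."""
    smoothed = moving_average(values, window)
    if len(smoothed) < 2 * window:
        return False  # not enough evaluations to compare two windows
    return abs(smoothed[-1] - smoothed[-1 - window]) < tol


if __name__ == "__main__":
    ap50_history = [38.0, 40.2, 41.5, 40.8, 41.1, 41.9, 40.7, 41.3, 41.0, 41.4]
    print(moving_average(ap50_history, 5)[-1])
    print(has_plateaued(ap50_history))
```

Reporting the smoothed value (or the mean over the last few evaluations) alongside the best single checkpoint makes the fluctuation visible to readers.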

Error with accimage

I am running Python version 3.8.10 with torch version '1.9.0+cu111' and torchvision version 0.2.1. I installed additional packages as instructed in the file.

Despite this, upon testing, I encountered the following error message.

2023-04-12 10:52:45,419 fcos_core.inference INFO: Start evaluation on cityscapes_foggy_val_cocostyle dataset(500 images).
  0%|                                                                                                                                                                            | 0/125 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "tools/", line 112, in <module>
  File "tools/", line 96, in main
  File "/root/autodl-tmp/UDA/SIGMA/fcos_core/engine/", line 87, in inference
    predictions = compute_on_dataset(cfg, model, data_loader, device, inference_timer)
  File "/root/autodl-tmp/UDA/SIGMA/fcos_core/engine/", line 23, in compute_on_dataset
    for _, batch in enumerate(tqdm(data_loader)):
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/", line 1195, in __iter__
    for obj in iterable:
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/", line 521, in __next__
    data = self._next_data()
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/", line 1203, in _next_data
    return self._process_data(data)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/", line 1229, in _process_data
  File "/root/miniconda3/lib/python3.8/site-packages/torch/", line 425, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/autodl-tmp/UDA/SIGMA/fcos_core/data/datasets/", line 94, in __getitem__
    img, target = self.transforms(img, target)
  File "/root/autodl-tmp/UDA/SIGMA/fcos_core/data/transforms/", line 15, in __call__
    image, target = t(image, target)
  File "/root/autodl-tmp/UDA/SIGMA/fcos_core/data/transforms/", line 59, in __call__
    image = F.resize(image, size)
  File "/root/miniconda3/lib/python3.8/site-packages/torchvision/transforms/", line 188, in resize
    if not _is_pil_image(img):
  File "/root/miniconda3/lib/python3.8/site-packages/torchvision/transforms/", line 19, in _is_pil_image
    return isinstance(img, (Image.Image, accimage.Image))
AttributeError: module 'accimage' has no attribute 'Image'

The issue appears to stem from accimage, an optional image backend that torchvision tries to use. This is unexpected, because the installed torchvision version matches the one specified.
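
The traceback ends in torchvision touching `accimage.Image`, so one thing worth checking is whether some importable module named `accimage` exists in the environment without that attribute (for example a stray or partial install). A minimal check, where `module_exposes` is a hypothetical helper name:

```python
import importlib


def module_exposes(module_name, attribute):
    """Return True only if the module imports and has the given attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attribute)


if __name__ == "__main__":
    # False here means torchvision cannot rely on accimage.Image in this env.
    print(module_exposes("accimage", "Image"))
```

If the module imports but the check returns False, uninstalling or reinstalling that package is a plausible next step; this is a guess based on the traceback, not a confirmed fix.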


Dear author:
How can I fix the error AttributeError: module 'torch._six' has no attribute 'PY3' in ''?
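
The error message indicates that the installed PyTorch no longer exposes `PY3` in `torch._six`. A possible workaround is to compute the flag from the interpreter instead; the sketch below uses a hypothetical helper `py3_flag` and would replace the `torch._six.PY3` lookup wherever the failing file uses it.

```python
import sys


def py3_flag(six_module=None):
    """Use six_module.PY3 when it exists, else derive it from the interpreter."""
    return getattr(six_module, "PY3", sys.version_info[0] >= 3)


if __name__ == "__main__":
    # Without a module argument this relies only on sys.version_info.
    print(py3_flag())
```

Downgrading PyTorch to the version the README pins is the other obvious route, at the cost of the GPU compatibility issues discussed in other threads.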



How to tune hyperparameters for custom datasets?

Thanks for your excellent work! I am trying to run SIGMA and SIGMA++ on a new UDA benchmark for adapting Cityscapes to ACDC dataset. I find for different UDA tasks, you adopt different GA_DIS_LAMBDA, GRL_WEIGHT_{P3-P7}, MATCHING_LOSS_WEIGHT and BG_RATIO in configs. Could you share some insights about how to tune these hyperparameters for a new UDA task? Looking forward to your response!



multi gpu training

Hello, thanks for your work.
I noticed there is no command in your README for multi-GPU training. I used the following command to train:
python -m torch.distributed.launch --nproc_per_node 4 tools/ --config-file configs/SIGMA/sigma_vgg16_sim10k_to_cityscapes.yaml
However, I ran into a problem:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. 
You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. 
If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. 
Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).

How can I solve this problem to train on multiple gpus?
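
The error text itself points at `find_unused_parameters=True`. A minimal sketch of passing it when wrapping a model is shown below; `ddp_kwargs` and `wrap_ddp` are hypothetical helper names, and where the wrapping happens in this repo's training script is not shown in the thread.

```python
def ddp_kwargs(local_rank, find_unused_parameters=True):
    """Keyword arguments for torch.nn.parallel.DistributedDataParallel."""
    return {
        "device_ids": [local_rank],
        "output_device": local_rank,
        # Lets DDP tolerate parameters that receive no gradient in an iteration.
        "find_unused_parameters": find_unused_parameters,
    }


def wrap_ddp(model, local_rank):
    """Wrap one module; requires an initialized process group."""
    import torch  # imported lazily so ddp_kwargs stays usable without torch

    return torch.nn.parallel.DistributedDataParallel(
        model, **ddp_kwargs(local_rank)
    )


if __name__ == "__main__":
    print(ddp_kwargs(0))
```

Since the checkpoint-loading traceback in another thread suggests the model is a dict of sub-models, each sub-model would presumably need its own wrapper. The flag adds some per-iteration overhead, so it is worth confirming which parameters are actually unused.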

Random seeds are used for training

In your code, random seeds are used for training. Does this mean that the Faster RCNN based code cannot be reproduced (its results fluctuate), while the FCOS based code can? This is what I observe on my dataset. Thank you.
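
For reference, a typical seeding routine looks like the sketch below. `seed_everything` is a hypothetical helper, not the repo's own function. Even with all seeds fixed, some CUDA kernels are nondeterministic, so small run-to-run differences can remain.

```python
import os
import random


def seed_everything(seed):
    """Seed the RNGs that are available in the current environment."""
    # Only affects hash randomization in subprocesses started after this call.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
        # Trade speed for more repeatable cuDNN convolution algorithms.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass


if __name__ == "__main__":
    seed_everything(0)
    print(random.random())
```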


python tools/ --config-file configs/SIGMA/sigma_vgg16_sim10k_to_cityscapes.yaml


CUDA : 11.3, GCC : 7.5.0, Nvidia driver : 470.86
python : 3.7.9

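Workaround for the error encountered with python build develop (it compiles successfully, but I do not know how it affects actual runs):

Fixes already tried for RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED (none of them worked):
3. pip install tensorboardX==2.1
4. Changed the cudatoolkit version to 11.1, 11.3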


Segmentation fault at runtime


Current thread 0x00007f1a5f1c6700 (most recent call first):
File "<frozen importlib._bootstrap>", line 372 in _init_
File "<frozen importlib._bootstrap_external>", line 606 in spec_from_file_location
File "/gs/home/rswang/proj/SIGMA/fcos_core/utils/", line 12 in import_file
File "/gs/home/rswang/proj/SIGMA/fcos_core/data/", line 221 in make_data_loader_source
File "tools/", line 559 in train
File "tools/", line 717 in main
File "tools/", line 728 in
/var/spool/slurm/job8041367/slurm_script: line 23: 96067 Segmentation fault (core dumped) python tools/ --config-file configs/sigma_plus_plus/mine.yaml

Cuda compilation tools, release 10.2, V10.2.89
CUDA used to build PyTorch: 10.1
CUDA runtime version: 10.2.89


PS: cudatoolkit==10.1, torch==1.4.0
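
The version lines above show nvcc at release 10.2 while PyTorch was built against CUDA 10.1. Whether that mismatch causes this segmentation fault is not established, but it is cheap to rule out. A small sketch for comparing the two, with hypothetical helper names:

```python
import re


def parse_nvcc_release(text):
    """Extract (major, minor) from an 'nvcc --version' style line."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def same_cuda(nvcc_text, torch_cuda):
    """Compare nvcc's release with the CUDA version PyTorch was built with."""
    torch_version = tuple(int(part) for part in torch_cuda.split(".")[:2])
    return parse_nvcc_release(nvcc_text) == torch_version


if __name__ == "__main__":
    line = "Cuda compilation tools, release 10.2, V10.2.89"
    print(parse_nvcc_release(line), same_cuda(line, "10.1"))
```

If the versions differ, rebuilding the extension with a toolkit that matches the PyTorch build is a reasonable experiment.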

Question about pseudo label and category mismatch.

Pseudo Label
From what I gather, the graphs for target images in the paper are constructed solely based on the pseudo label due to the lack of ground truth labels.

However, in cases where the domain gap between source and target is very large, such as going from a sunny day to a heavy rainy night, relying on the pseudo label can be inadequate and may lead to issues in graph construction.

Assuming my understanding is accurate, are there any potential solutions to address this problem and enhance SIGMA's performance?

Category Mismatch
I have an additional concern regarding the effectiveness of the node completion (DNC) strategy used in the paper. The datasets used for domain adaptation, such as Cityscape, FoggyCityscape, Sim10k, and KITTI, have similar categories.

As a result, I am uncertain whether DNC would perform well if the categories were significantly different between the source and target datasets.

Source-only training, low mAP








Hello author, my setup is a 3080 Ti with cuda=11.6. I ran into problems when running python build develop. Details:

  1. With the torch and cudatoolkit versions given in the README, I get a CUDA mismatch error (30-series cards apparently can only use CUDA 11+):
conda install cudatoolkit=10.1 # 10.0, 10.1, 10.2, 11+ all can work!
pip install torch==1.4.0 # later is ok!
pip install --no-deps torchvision==0.2.1 


  2. With a torch version built for CUDA 11, running the command gives the error below. I tried:
  • (latest on the official site) conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 -c pytorch

  • (lowest version for CUDA 11) conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "", line 77, in <module>
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/", line 87, in setup
    return distutils.core.setup(**attrs)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/", line 185, in setup
    return run_commands(dist)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/", line 201, in run_commands
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/", line 968, in run_commands
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/", line 1217, in run_command
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/", line 987, in run_command
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/command/", line 132, in run
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/", line 319, in run_command
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/", line 1217, in run_command
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/", line 987, in run_command
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/command/", line 84, in run
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/command/", line 346, in run
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/utils/", line 653, in build_extensions
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/command/", line 466, in build_extensions
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/command/", line 492, in _build_extensions_serial
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/command/", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/command/", line 554, in build_extension
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/utils/", line 482, in unix_wrap_ninja_compile
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/utils/", line 1238, in _write_ninja_file_and_compile_objects
    error_prefix='Error compiling objects for extension')
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/utils/", line 1538, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension


Question about number of warm up iterations

Hi, I would like to ask you about the number of iters used for warm-up. Haven't run the code myself but I was wondering if the 2000 iters is where the global-align only model converged? Did you find that number out of experimentation or did you follow the original config setting from previous works? Thanks.

Segmentation fault (core dumped)

Hi, amazing work! I followed the step-by-step installation instructions but hit the following error. Do you have any idea what causes it? Did I miss a modification somewhere? Thank you!

2022-05-11 21:08:26,489 fcos_core.trainer INFO: Start training
DA_ON: True
2022-05-11 21:10:25,284 fcos_core.trainer INFO: eta: 6 days, 20:56:49 iter: 20 loss_ds: 6.1977 (6.9134) node_loss: 2.0234 (2.0159) mat_loss_aff: 0.0991 (0.0992) mat_loss_qu: 0.0005 (0.0005) loss_cls: 0.6540 (0.7507) loss_reg: 1.3081 (1.9726) loss_centerness: 0.6631 (0.6738) loss_adv_P7: 0.2785 (0.2792) loss_adv_P6: 0.2755 (0.2747) loss_adv_P5: 0.2736 (0.2740) loss_adv_P4: 0.2750 (0.2750) loss_adv_P3: 0.2762 (0.2762) time: 4.4850 (5.9393) data: 0.0478 (1.7929) dis_loss: 0.0604 (0.0615) lr_backbone: 0.000833 lr_middle_head: 0.001667 lr_fcos: 0.000833 lr_dis: 0.000833 max mem: 10271
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
Segmentation fault (core dumped)
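
The "BLAS : Program is Terminated. Because you tried to allocate too many memory regions." message is printed by OpenBLAS, and a commonly reported workaround is to cap its thread count before NumPy or PyTorch is imported. A minimal sketch (`limit_blas_threads` is a hypothetical helper; the value 1 is a conservative assumption):

```python
import os


def limit_blas_threads(num_threads=1, env=None):
    """Cap BLAS/OpenMP thread counts; call before importing numpy or torch."""
    env = os.environ if env is None else env
    for variable in ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS", "MKL_NUM_THREADS"):
        env[variable] = str(num_threads)
    return env


if __name__ == "__main__":
    limit_blas_threads(1)
    print(os.environ["OPENBLAS_NUM_THREADS"])
```

The same effect can be had by exporting the variables in the shell before launching training. Lowering the number of data loader workers may also reduce the number of BLAS threads alive at once.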


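2. The experiment was interrupted twice. The first run went 15200 iterations and was interrupted for an unknown reason; the best result, 49.2, was auto-saved at iteration 6200. The second run reached iteration 21440 and was interrupted by "OSError: [Errno 28] No space left on device"; the best result, 51.39, was auto-saved at iteration 10400. The third run reached iteration 30000 without interruption; the best result, 54.76, was auto-saved at iteration 23800. The latter two runs resumed from the previous run rather than starting from scratch.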
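
To avoid losing a run to "No space left on device", a free-space check before each checkpoint save is one option. A rough sketch using only the standard library; `has_free_space` and the 2 GiB threshold are illustrative assumptions:

```python
import shutil


def has_free_space(path, needed_bytes):
    """True if the filesystem holding path has at least needed_bytes free."""
    return shutil.disk_usage(path).free >= needed_bytes


if __name__ == "__main__":
    two_gib = 2 * 1024 ** 3  # assumed upper bound for one checkpoint
    if not has_free_space(".", two_gib):
        print("Low disk space: delete old checkpoints before saving.")
```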


2. Is the code provided on GitHub the conference version?








Compare with model EPM?

Thank you for your contribution. If I want to compare with model EPM, I noticed that model EPM needs two steps of training, while yours only needs one step. If possible, may I ask how to change your model to be like EPM? Thank you for your reply.

Some Random Thoughts

I personally get your frustration as I think the cityscapes to foggy cityscapes adaptation is hard to achieve good semantic alignment on due to the amount of noise that distorts the semantic features.

Hello, I have the following questions, looking forward to your answers.

1. Validation was carried out every 100 iterations, and the model with the highest accuracy was selected to ensure repeatability. Would repeatability no longer be guaranteed if validation ran only every 2500 iterations? Have you done similar experiments? Thank you.
2. If early stopping is used instead of validating every 100 iterations, is the effect similar? Thank you.
3. For a custom dataset, is it best to split the target-domain data into three parts (training, validation and testing)? That is, modify the config as follows:
TRAIN_TARGET: ("cityscapes_foggy_train_cocostyle", "cityscapes_foggy_val_cocostyle"), TEST: ("cityscapes_foggy_test_cocostyle", )?
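
For question 2, a patience-based early stopping rule can be sketched as below. This is a generic illustration, not something the repo is known to implement; `EarlyStopper`, `patience` and `min_delta` are assumed names, and the metric values in the usage lines are made up.

```python
class EarlyStopper:
    """Stop when the metric has not improved for `patience` evaluations."""

    def __init__(self, patience, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None
        self.bad_evals = 0

    def update(self, metric):
        """Record one evaluation; return True when training should stop."""
        if self.best is None or metric > self.best + self.min_delta:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


if __name__ == "__main__":
    stopper = EarlyStopper(patience=2)
    for ap50 in [40.0, 41.5, 41.0, 41.2]:
        print(ap50, stopper.update(ap50))
```

Whichever rule is used, selecting the checkpoint on a held-out validation split and reporting on a separate test split keeps the reported number from being tuned on the test data.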

Unknown CUDA arch (8.6) or GPU not supported?

Hello, my GPU is an NVIDIA RTX A6000. During the last installation step I encountered the following problem. Do you know how to solve it? Thank you. My cudatoolkit is 11.3 (conda install cudatoolkit=11.3).

File "/home/hc/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/utils/", line 1027, in _get_cuda_arch_flags
raise ValueError("Unknown CUDA arch ({}) or GPU not supported".format(arch))
ValueError: Unknown CUDA arch (8.6) or GPU not supported

322 | T * data() const {
| ^~~~
gcc -pthread -B /home/hc/anaconda3/envs/SIGMA/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -Ifcos_core/csrc -I/home/hc/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/include -I/home/hc/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/hc/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/include/TH -I/home/hc/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.8/include -I/home/hc/anaconda3/envs/SIGMA/include/python3.7m -c fcos_core/csrc/cpu/nms_cpu.cpp -o build/temp.linux-x86_64-cpython-37/fcos_core/csrc/cpu/nms_cpu.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
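
"Unknown CUDA arch (8.6)" means the installed PyTorch's extension builder does not recognize the GPU's compute capability. Besides upgrading PyTorch, one workaround that is sometimes used is to set the `TORCH_CUDA_ARCH_LIST` environment variable to an architecture the builder does know. The sketch below only picks such a value; `fallback_arch` is a hypothetical helper and the list of known architectures is an assumption that depends on the PyTorch version.

```python
def _as_tuple(arch):
    """'8.6' -> (8, 6) so architectures compare numerically."""
    major, minor = arch.split(".")
    return int(major), int(minor)


def fallback_arch(device_cc, known_archs):
    """Pick the newest known arch not above the device's compute capability."""
    device = _as_tuple(device_cc)
    candidates = [a for a in known_archs if _as_tuple(a) <= device]
    if not candidates:
        return None
    best = max(candidates, key=_as_tuple)
    # Across major versions, embed PTX so the driver can JIT-compile it.
    if _as_tuple(best)[0] != device[0]:
        return best + "+PTX"
    return best


if __name__ == "__main__":
    # Assumed list of architectures that the installed builder accepts.
    print(fallback_arch("8.6", ["7.0", "7.5", "8.0"]))
```

The chosen value would then be exported before the build, for example `TORCH_CUDA_ARCH_LIST="8.0"`. Whether the resulting binary runs correctly on the A6000 still needs to be verified.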

