
sigma's Introduction

SIGMA: Semantic-complete Graph Matching For Domain Adaptive Object Detection (CVPR-22 ORAL)

[arXiv] [Zhihu]

By Wuyang Li

🎉 News! If you feel that your DAOD research has hit a bottleneck, check out our latest work on Adaptive Open-set Object Detection, which extends the target domain to the open set!

Three branches of the project:

  • Main branch (SIGMA): git clone https://github.com/CityU-AIM-Group/SIGMA.git
  • SIGMA++ branch: git clone -b SIGMA++ https://github.com/CityU-AIM-Group/SIGMA.git
  • FRCNN-SIGMA++ branch: git clone -b FRCNN-SIGMA++ https://github.com/CityU-AIM-Group/SIGMA.git

The features of SIGMA++:

  • More datasets and instructions
  • More stable and better results
  • From graph to hypergraph


SIGMA++ has found its final home now, indicating the end of this series of works. The growth of SIGMA++ is full of frustration: 👶 ➡ 🧒.

SCAN → SCAN++ → SIGMA → SIGMA++

The main idea of this series of works: model fine-grained feature points with graphs. We sincerely appreciate all the readers who have shown interest in our works.

Honestly, due to limited personal ability, our works still have many limitations, e.g., sub-optimal and redundant designs. Please forgive me. Nevertheless, we hope our works can inspire lots of good ideas.

Best regards,
Wuyang Li
E-mail: [email protected]

💡 Preparation

Installation

Check INSTALL.md for installation instructions. If you have any problem, feel free to open an issue with a screenshot. Thanks.

Data preparation

More detailed preparation instructions are available here.

Step 1: Prepare benchmark datasets.

We follow EPM to construct the training and testing sets under the following three settings. Annotation files are available at onedrive.

Cityscapes -> Foggy Cityscapes

  • Download the Cityscapes and Foggy Cityscapes datasets from the link. Particularly, we use leftImg8bit_trainvaltest.zip for Cityscapes and leftImg8bit_trainvaltest_foggy.zip for Foggy Cityscapes.
  • Download and extract the converted annotation from the following links: Cityscapes and Foggy Cityscapes (COCO format)
  • Extract the training sets from leftImg8bit_trainvaltest.zip, then move the folder leftImg8bit/train/ to Cityscapes/leftImg8bit/ directory.
  • Extract the training and validation set from leftImg8bit_trainvaltest_foggy.zip, then move the folder leftImg8bit_foggy/train/ and leftImg8bit_foggy/val/ to Cityscapes/leftImg8bit_foggy/ directory.

Sim10k -> Cityscapes (class car only)

  • Download Sim10k dataset and Cityscapes dataset from the following links: Sim10k and Cityscapes. Particularly, we use repro_10k_images.tgz and repro_10k_annotations.tgz for Sim10k and leftImg8bit_trainvaltest.zip for Cityscapes.
  • Download and extract the converted annotation from the following links: Sim10k (VOC format) and Cityscapes (COCO format)
  • Extract the training set from repro_10k_images.tgz and repro_10k_annotations.tgz, then move all images under VOC2012/JPEGImages/ to Sim10k/JPEGImages/ directory and move all annotations under VOC2012/Annotations/ to Sim10k/Annotations/.
  • Extract the training and validation set from leftImg8bit_trainvaltest.zip, then move the folder leftImg8bit/train/ and leftImg8bit/val/ to Cityscapes/leftImg8bit/ directory.

KITTI -> Cityscapes (class car only)

  • Download KITTI dataset and Cityscapes dataset from the following links: KITTI and Cityscapes. Particularly, we use data_object_image_2.zip for KITTI and leftImg8bit_trainvaltest.zip for Cityscapes.
  • Download and extract the converted annotation from the following links: KITTI (VOC format) and Cityscapes (COCO format).
  • Extract the training set from data_object_image_2.zip, then move all images under training/image_2/ to KITTI/JPEGImages/ directory.
  • Extract the training and validation set from leftImg8bit_trainvaltest.zip, then move the folder leftImg8bit/train/ and leftImg8bit/val/ to Cityscapes/leftImg8bit/ directory.

After these steps, the dataset root should be organized as follows:

[DATASET_PATH]
└─ Cityscapes
   └─ cocoAnnotations
   └─ leftImg8bit
      └─ train
      └─ val
   └─ leftImg8bit_foggy
      └─ train
      └─ val
└─ KITTI
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
└─ Sim10k
   └─ Annotations
   └─ ImageSets
   └─ JPEGImages
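
As a quick sanity check, the short script below verifies that the expected folders exist. It is a minimal sketch, not part of the repo; DATASET_PATH is a placeholder for your own root.

import os

# Placeholder: set this to your own [DATASET_PATH].
DATASET_PATH = "/data/datasets"

EXPECTED = [
    "Cityscapes/leftImg8bit/train",
    "Cityscapes/leftImg8bit/val",
    "Cityscapes/leftImg8bit_foggy/train",
    "Cityscapes/leftImg8bit_foggy/val",
    "KITTI/Annotations",
    "KITTI/ImageSets",
    "KITTI/JPEGImages",
    "Sim10k/Annotations",
    "Sim10k/ImageSets",
    "Sim10k/JPEGImages",
]

for rel in EXPECTED:
    path = os.path.join(DATASET_PATH, rel)
    print(("OK      " if os.path.isdir(path) else "MISSING ") + path)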

Step 2: change the data root for your dataset in paths_catalog.py:

DATA_DIR = [$Your dataset root]
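
For reference, a minimal sketch of the corresponding line; the file is assumed to live at fcos_core/config/paths_catalog.py, and the surrounding class layout may differ. Only DATA_DIR needs editing.

class DatasetCatalog(object):
    # Set this to your dataset root, i.e., the [DATASET_PATH] above.
    DATA_DIR = "/data/datasets"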

📦 Well-trained models

The ImageNet-pretrained VGG-16 backbone (w/o BN) is available at this link. You can use it if you cannot download the model through the link in the config file.
The well-trained models are available at this link.

  1. We can get higher results than the reported ones with tailor-tuned hyperparameters.
  2. E2E indicates end-to-end training for better reproducibility. Our config files are set up for end-to-end training.
  3. Two-stage/longer training and tuning the learning rate will make the results more stable and yield higher mAP/AP75.
  4. After correcting a default hyper-parameter (as explained in the config file), Sim10k to City achieves better results than the reported ones.
  5. You can set MODEL.MIDDLE_HEAD.GM.WITH_CLUSTER_UPDATE False to accelerate training greatly with negligible performance drops (see the example command after this list). You should also make this change for bs=2, since we found it friendlier for small-batch training.
  6. Results become stable after the learning-rate decay (in the training schedule).
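
For example, assuming the command-line override mechanism shown in the test command below also applies to training, note 5 can be applied without editing the yaml file:

python tools/train_net_da.py \
        --config-file configs/SIGMA/xxx.yaml \
        MODEL.MIDDLE_HEAD.GM.WITH_CLUSTER_UPDATE False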
| Source | Target | E2E | Metric | Backbone | mAP | AP@50 | AP@75 | file |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| City | Foggy | | COCO | V-16 | 24.0 | 43.6 | 23.8 | city_to_foggy_vgg16_43.58_mAP.pth |
| City | Foggy | | COCO | V-16 | 24.3 | 43.9 | 22.6 | city_to_foggy_vgg16_43.90_mAP.pth |
| City | Foggy | $\checkmark$ | COCO | V-16 | 22.0 | 43.5 | 21.8 | reproduced |
| City | Foggy | | COCO | R-50 | 22.7 | 44.3 | 21.2 | city_to_foggy_res50_44.26_mAP.pth |
| City | BDD100k | | COCO | V-16 | - | 32.7 | - | city_to_bdd100k_vgg16_32.65_mAP.pth |
| Sim10k | City | | COCO | V-16 | 33.4 | 57.1 | 33.8 | sim10k_to_city_vgg16_53.73_mAP.pth |
| Sim10k | City | $\checkmark$ | COCO | V-16 | 32.1 | 55.2 | 32.1 | reproduced |
| KITTI | City | | COCO | V-16 | 22.6 | 46.6 | 20.0 | kitti_to_city_vgg16_46.45_mAP.pth |

🔥 Getting started

NOTE: As clarified in the code comments, there is a small correction about the batch size: IMS_PER_BATCH=4 indicates 4 images per domain.

Train the model from scratch with the default setting (batch size = 4):

python tools/train_net_da.py \
        --config-file configs/SIGMA/xxx.yaml

Test the well-trained model:

python tools/test_net.py \
        --config-file configs/SIGMA/xxx.yaml \
        MODEL.WEIGHT well_trained_models/xxx.pth

For example, to test Cityscapes → Foggy Cityscapes with the VGG-16 backbone:

python tools/test_net.py \
         --config-file configs/SIGMA/sigma_vgg16_cityscapace_to_foggy.yaml \
         MODEL.WEIGHT well_trained_models/city_to_foggy_vgg16_43.58_mAP.pth

✨ Quick Tutorials

  1. See sigma_vgg16_cityscapace_to_foggy.yaml to understand the APIs.
  2. We modified the trainer to meet the requirements of SIGMA.
  3. GM is integrated in this "middle layer": graph_matching_head (see the sketch after this list).
  4. Node sampling is conducted together with the fcos loss: loss.
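
A rough sketch of where this middle layer sits in the forward pass, inferred from the fcos_core/engine/trainer.py calls quoted in the tracebacks further down this page; the exact signatures are assumptions, not verbatim repo code:

def forward_detector(backbone, middle_head, fcos_head, images_s, images_t,
                     targets_s, score_maps=None):
    # Extract FPN features for the source and target batches.
    features_s = backbone(images_s)
    features_t = backbone(images_t)
    # The graph-matching middle head refines both domains' features and
    # returns its own loss (semantic completion + graph matching).
    (features_s, features_t), middle_head_loss = middle_head(
        (images_s, images_t), (features_s, features_t),
        targets=targets_s, score_maps=score_maps)
    # The refined source features go to the FCOS head as usual; node
    # sampling happens inside the fcos loss (see item 4 above).
    detector_losses = fcos_head(features_s, targets=targets_s)
    return detector_losses, middle_head_loss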

📝 Citation

If you think this work is helpful for your project, please give it a star and a citation. We sincerely appreciate your acknowledgment.

@inproceedings{li2022sigma,
  title={SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection},
  author={Li, Wuyang and Liu, Xinyu and Yuan, Yixuan},
  booktitle={CVPR},
  year={2022}
}

Relevant project:

@inproceedings{li2022scan,
  title={SCAN: Cross Domain Object Detection with Semantic Conditioned Adaptation},
  author={Li, Wuyang and Liu, Xinyu and Yao, Xiwen and Yuan, Yixuan},
  booktitle={AAAI},
  year={2022}
}

🤞 Acknowledgements

We sincerely appreciate these good projects and their authors' hard work.

  • This work is based on EPM.
  • The implementation of our anchor-free detector is from FCOS, which highly relies on maskrcnn-benchmark.
  • The style-transferred data is from D_adapt.
  • The faster-rcnn-based implementation is based on DA-FRCNN.

📒 Abstract

Domain Adaptive Object Detection (DAOD) leverages a labeled source domain to learn an object detector that generalizes to a novel target domain free of annotations. Recent advances align class-conditional distributions by narrowing down cross-domain prototypes (class centers). Despite their great success, these works ignore the significant within-class variance and the domain-mismatched semantics within the training batch, leading to sub-optimal adaptation. To overcome these challenges, we propose a novel SemantIc-complete Graph MAtching (SIGMA) framework for DAOD, which completes mismatched semantics and reformulates the adaptation with graph matching. Specifically, we design a Graph-embedded Semantic Completion module (GSC) that completes mismatched semantics by generating hallucination graph nodes in missing categories. Then, we establish cross-image graphs to model class-conditional distributions and learn a graph-guided memory bank for better semantic completion in turn. After representing the source and target data as graphs, we reformulate the adaptation as a graph matching problem, i.e., finding well-matched node pairs across graphs to reduce the domain gap, which is solved with a novel Bipartite Graph Matching adaptor (BGM). In a nutshell, we utilize graph nodes to establish semantic-aware node affinity and leverage graph edges as quadratic constraints in a structure-aware matching loss, achieving fine-grained adaptation with node-to-node graph matching. Extensive experiments demonstrate that our method outperforms existing works significantly.


sigma's People

Contributors

wymancv

sigma's Issues

Visualization

Hello! How can I generate visualizations of the predicted bounding boxes? And how should the bbox.json produced during inference be used?

Sim10k's ImageSets

Could you provide the Sim10k ImageSets? I can't seem to find them on the internet. Thank you so much!!

Error with accimage

I am running Python 3.8.10 with torch 1.9.0+cu111 and torchvision 0.2.1. I installed additional packages as instructed in the INSTALL.md file.

Despite this, upon testing, I encountered the following error message.

2023-04-12 10:52:45,419 fcos_core.inference INFO: Start evaluation on cityscapes_foggy_val_cocostyle dataset(500 images).
  0%|                                                                                                                                                                            | 0/125 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "tools/test_net.py", line 112, in <module>
    main()
  File "tools/test_net.py", line 96, in main
    inference(
  File "/root/autodl-tmp/UDA/SIGMA/fcos_core/engine/inference.py", line 87, in inference
    predictions = compute_on_dataset(cfg, model, data_loader, device, inference_timer)
  File "/root/autodl-tmp/UDA/SIGMA/fcos_core/engine/inference.py", line 23, in compute_on_dataset
    for _, batch in enumerate(tqdm(data_loader)):
  File "/root/miniconda3/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/root/miniconda3/lib/python3.8/site-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/autodl-tmp/UDA/SIGMA/fcos_core/data/datasets/coco.py", line 94, in __getitem__
    img, target = self.transforms(img, target)
  File "/root/autodl-tmp/UDA/SIGMA/fcos_core/data/transforms/transforms.py", line 15, in __call__
    image, target = t(image, target)
  File "/root/autodl-tmp/UDA/SIGMA/fcos_core/data/transforms/transforms.py", line 59, in __call__
    image = F.resize(image, size)
  File "/root/miniconda3/lib/python3.8/site-packages/torchvision/transforms/functional.py", line 188, in resize
    if not _is_pil_image(img):
  File "/root/miniconda3/lib/python3.8/site-packages/torchvision/transforms/functional.py", line 19, in _is_pil_image
    return isinstance(img, (Image.Image, accimage.Image))
AttributeError: module 'accimage' has no attribute 'Image'

The issue appears to stem from accimage, an optional image backend that torchvision picks up automatically when the package is installed. This is unexpected because the torchvision version in use matches the required one.
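
A plausible workaround (my assumption, based on torchvision only using accimage when the package happens to be importable): uninstall the broken accimage build so that torchvision falls back to PIL:

pip uninstall accimage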

Source-only training, low mAP

While using the SIGMA code, I wanted to first try source-only pre-training, so I turned DA_ON off. During source-only training, however, once the model reached about 40 mAP the loss stopped decreasing. I tried adjusting batch_size and learning_rate, but the mAP would not go up. When I train yolov5 on the same source-only dataset, it can reach 96 mAP.
Do you have any thoughts on this problem?

Some questions about the experimental results

Hello, I completed a first round of experiments without changing the original experimental parameters. First, let me describe my experimental process:
1. I used a relatively high CUDA version in my environment.
2. The whole run was interrupted twice. The first run lasted 15,200 iterations (reason for the interruption unknown) and automatically saved its best result, 49.2, at iteration 6,200. The second lasted 21,440 iterations, interrupted by "OSError: [Errno 28] No space left on device", and saved its best result, 51.39, at iteration 10,400. The third ran 30,000 iterations without interruption and saved its best result, 54.76, at iteration 23,800. The second and third runs resumed from checkpoints rather than starting from scratch.

Questions:
1. Why is only the best-performing model saved automatically, rather than saving by iteration count, e.g., one .pth every 10,000 iterations?

2. Is the code on GitHub the conference version?

3. After 60,000 iterations in total across the three runs, the result is 54.76. Is this reasonable? To reach 57.1, is continuing training the only option?

4. If I want to run ablation studies, comparing results only to judge which modules are more effective, how many iterations are appropriate, or how should the comparison be done? The full 100,000 iterations takes too long.

The three training logs are as follows:
First: train111.log

Second: train112.log

Third: train113.log

Final test results: (screenshots omitted)

Are the weights saved on any other cloud drive?

The download speed of the current cloud drive is too slow, and the weight file is corrupted every time the download finishes, so I cannot run the tests. Do you keep the weight files on another cloud drive? If so, could you share it? Thanks.

Codes of FasterRCNN implementation for SIGMA++

Hi, I have read this series of works on graph-matching-based domain adaptation; it is wonderful work and very enlightening. I am very interested in the Faster R-CNN implementation. Can you release it?

Question about pseudo label and category mismatch.

Pseudo Label
From what I gather, the graphs for target images in the paper are constructed solely based on the pseudo label due to the lack of ground truth labels.

However, in cases where the domain gap between source and target is very large, such as going from a sunny day to a heavy rainy night, relying on the pseudo label can be inadequate and may lead to issues in graph construction.

Assuming my understanding is accurate, are there any potential solutions to address this problem and enhance SIGMA's performance?

Category Mismatch
I have an additional concern regarding the effectiveness of the node completion (DNC) strategy used in the paper. The datasets used for domain adaptation, such as Cityscapes, Foggy Cityscapes, Sim10k, and KITTI, have similar categories.

As a result, I am uncertain whether DNC would perform well if the categories were significantly different between the source and target datasets.

Some Random Thoughts

I personally get your frustration: it is hard to achieve good semantic alignment on the Cityscapes to Foggy Cityscapes adaptation because of the amount of noise that distorts the semantic features.

Unknown CUDA arch (8.6) or GPU not supported?

@wymanCV
Hello, my environment is an NVIDIA RTX A6000. During the last step of installation, I encountered the following problem. Do you know how to solve it? Thank you. My cudatoolkit is 11.3 (conda install cudatoolkit=11.3).

File "/home/hc/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1027, in _get_cuda_arch_flags
raise ValueError("Unknown CUDA arch ({}) or GPU not supported".format(arch))
ValueError: Unknown CUDA arch (8.6) or GPU not supported

322 | T * data() const {
| ^~~~
gcc -pthread -B /home/hc/anaconda3/envs/SIGMA/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -Ifcos_core/csrc -I/home/hc/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/include -I/home/hc/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/hc/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/include/TH -I/home/hc/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda-11.8/include -I/home/hc/anaconda3/envs/SIGMA/include/python3.7m -c fcos_core/csrc/cpu/nms_cpu.cpp -o build/temp.linux-x86_64-cpython-37/fcos_core/csrc/cpu/nms_cpu.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11

Training problem

Hello! I got the following error after training:
(error screenshot)
Will this error affect the training and testing results? If so, how should it be solved?
Looking forward to your reply!

Is visualization supported?

As the title says, I tried running fcos_demo from the original maskrcnn-benchmark repo and got the following error at runtime:
Traceback (most recent call last):
File "demo/fcos_demo.py", line 128, in <module>
main()
File "demo/fcos_demo.py", line 110, in main
min_image_size=args.min_image_size
File "/home/e401/Desktop/wrs/projects/SIGMA/demo/predictor.py", line 117, in __init__
_ = checkpointer.load(cfg.MODEL.WEIGHT)
File "/home/e401/Desktop/wrs/projects/SIGMA/fcos_core/utils/checkpoint.py", line 318, in load
self._load_model(checkpoint, load_dis)
File "/home/e401/Desktop/wrs/projects/SIGMA/fcos_core/utils/checkpoint.py", line 422, in _load_model
load_state_dict(self.model["backbone"], checkpoint.pop("model_backbone"))
TypeError: 'GeneralizedRCNN' object is not subscriptable

Also, is there an interface for the point-matching visualization shown in the paper? How do I use it?
Thanks in advance!

Hello, I have the following questions, looking forward to your answers.

1. Validation is carried out every 100 iterations, and the model with the highest accuracy is selected to ensure repeatability. Would repeatability be impossible to guarantee if one only validated every 2,500 iterations? Have you run similar experiments? Thank you.
2. If, instead of validating every 100 iterations, early stopping is used, is the effect similar? Thank you.
3. For one's own dataset, is it best to split the target domain into three parts, training, validation, and testing, and modify as follows:
TRAIN_TARGET: ("cityscapes_foggy_train_cocostyle", "cityscapes_foggy_val_cocostyle"), TEST: ("cityscapes_foggy_test_cocostyle", )?

How to tune hyperparameters for custom datasets?

Thanks for your excellent work! I am trying to run SIGMA and SIGMA++ on a new UDA benchmark, adapting Cityscapes to the ACDC dataset. I find that for different UDA tasks you adopt different GA_DIS_LAMBDA, GRL_WEIGHT_{P3-P7}, MATCHING_LOSS_WEIGHT and BG_RATIO values in the configs. Could you share some insights on how to tune these hyperparameters for a new UDA task? Looking forward to your response!

RuntimeError: Not compiled with GPU support

Hi~ I have a problem when trying to run the demo model. Did anyone run into this before?
I have no clue how to solve it.
(error screenshot)

I checked my PyTorch and it can use CUDA as well:
(screenshot)

Segmentation fault (core dumped)

Hi, amazing work! I followed the step-by-step installation instructions but met the following error. Do you have any ideas about this? Did I forget to modify something? Thank you!

2022-05-11 21:08:26,489 fcos_core.trainer INFO: Start training
DA_ON: True
2022-05-11 21:10:25,284 fcos_core.trainer INFO: eta: 6 days, 20:56:49 iter: 20 loss_ds: 6.1977 (6.9134) node_loss: 2.0234 (2.0159) mat_loss_aff: 0.0991 (0.0992) mat_loss_qu: 0.0005 (0.0005) loss_cls: 0.6540 (0.7507) loss_reg: 1.3081 (1.9726) loss_centerness: 0.6631 (0.6738) loss_adv_P7: 0.2785 (0.2792) loss_adv_P6: 0.2755 (0.2747) loss_adv_P5: 0.2736 (0.2740) loss_adv_P4: 0.2750 (0.2750) loss_adv_P3: 0.2762 (0.2762) time: 4.4850 (5.9393) data: 0.0478 (1.7929) dis_loss: 0.0604 (0.0615) lr_backbone: 0.000833 lr_middle_head: 0.001667 lr_fcos: 0.000833 lr_dis: 0.000833 max mem: 10271
BLAS : Program is Terminated. Because you tried to allocate too many memory regions.
(the line above was printed 11 times)
Segmentation fault (core dumped)
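
For context, the "allocate too many memory regions" message comes from OpenBLAS. A commonly suggested workaround (an assumption, not verified on this repo) is to cap BLAS/OpenMP threading before launching training:

OMP_NUM_THREADS=1 OPENBLAS_NUM_THREADS=1 python tools/train_net_da.py --config-file configs/SIGMA/xxx.yaml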

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Hello, while running the command

python tools/train_net_da.py --config-file configs/SIGMA/sigma_vgg16_sim10k_to_cityscapes.yaml

I encountered the following error. Could you offer some help? Many thanks!

(error screenshot)

Environment:
CUDA : 11.3, GCC : 7.5.0, Nvidia driver : 470.86
python : 3.7.9
conda:
cudatoolkit=10.1
pip:
torch==1.4.0
torchvision==0.2.1
scipy==1.6.0

Workarounds for the errors met during python setup.py build develop (it now compiles successfully, but I do not know how they affect actual runs):
1. Added the '8.6' architecture in miniconda3/envs/SIGMA/lib/python3.7/site-packages/torch/utils/cpp_extension.py
2. In miniconda3/envs/SIGMA1/lib/python3.7/site-packages/torchvision/transforms/functional.py, switched to __version__ to fix an error caused by the Pillow version

Fixes already attempted for the RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED (none worked):
1. Modified sigma_vgg16_sim10k_to_cityscapes.yaml as instructed
2. torch.backends.cudnn.enabled=False
3. pip install tensorboardX==2.1
4. Changed the cudatoolkit version to 11.1 and 11.3

Partial log:
log.txt

Random seeds are used for training

@wymanCV
In your code, random seeds are used for training. Does this mean that the Faster R-CNN-based code cannot be reproduced exactly (the results will fluctuate), while the FCOS-based code can be reproduced? This is what happens on my dataset; thank you.

Training process killed

Hello! I set training to 30,000 iterations, but the process was killed after 22,600 iterations:
(screenshot)
My device is a V100-SXM2-32GB,
CPU: 12-core Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz, RAM: 43GB,
batch size 4.
Could the kill be a memory problem, and how can I solve it?


Installation and configuration problem

Hello, my setup is a 3080 Ti with cuda=11.6. Running python setup.py build develop fails, specifically:

  1. With the torch and cudatoolkit versions given in the README, I get a CUDA mismatch error (30-series cards seem to require CUDA 11+):

conda install cudatoolkit=10.1 # 10.0, 10.1, 10.2, 11+ all can work!
pip install torch==1.4.0 # later is ok!
pip install --no-deps torchvision==0.2.1 

(error screenshot)

  2. With a torch version that supports CUDA 11, setup.py fails as follows:

  • (latest from the official site) conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 -c pytorch
    (error screenshot)

  • (lowest CUDA 11 version) conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "setup.py", line 77, in <module>
    include_package_data=True,
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 185, in setup
    return run_commands(dist)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
    dist.run_commands()
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 968, in run_commands
    self.run_command(cmd)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
    cmd_obj.run()
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/command/build.py", line 132, in run
    self.run_command(cmd_name)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/dist.py", line 1217, in run_command
    super().run_command(command)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 987, in run_command
    cmd_obj.run()
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 84, in run
    _build_ext.run(self)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 346, in run
    self.build_extensions()
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 653, in build_extensions
    build_ext.build_extensions(self)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 466, in build_extensions
    self._build_extensions_serial()
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 492, in _build_extensions_serial
    self.build_extension(ext)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 554, in build_extension
    depends=ext.depends,
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 482, in unix_wrap_ninja_compile
    with_cuda=with_cuda)
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1238, in _write_ninja_file_and_compile_objects
    error_prefix='Error compiling objects for extension')
  File "/home/yjy/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1538, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

I hope the author can find time to advise; many thanks for your answer!

Training source-only

To train a source-only model (plain FCOS), should I train with tools/train_net.py and FCOS_ON: False in the yaml file, or is tools/train_net_da.py with FCOS_ON: False enough?

Segmentation fault at runtime

The code crashes right after loading the pretrained weights. At first there was no traceback; after rerunning with verbose error reporting I got the following information.

Current thread 0x00007f1a5f1c6700 (most recent call first):
File "<frozen importlib._bootstrap>", line 372 in __init__
File "<frozen importlib._bootstrap_external>", line 606 in spec_from_file_location
File "/gs/home/rswang/proj/SIGMA/fcos_core/utils/imports.py", line 12 in import_file
File "/gs/home/rswang/proj/SIGMA/fcos_core/data/build.py", line 221 in make_data_loader_source
File "tools/train_net_da.py", line 559 in train
File "tools/train_net_da.py", line 717 in main
File "tools/train_net_da.py", line 728 in <module>
/var/spool/slurm/job8041367/slurm_script: line 23: 96067 Segmentation fault (core dumped) python tools/train_net_da.py --config-file configs/sigma_plus_plus/mine.yaml

Compilation with gcc 5.3 showed no errors; the code's built-in environment check reports:
Cuda compilation tools, release 10.2, V10.2.89
CUDA used to build PyTorch: 10.1
CUDA runtime version: 10.2.89

I am not sure where the problem is. Could the author give me some pointers?

ps: cudatoolkit==10.1 torch1.4.0

Attribute error

Dear author:
How can I solve the AttributeError: module 'torch._six' has no attribute 'PY3' error in 'imports.py'?
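
Not an official fix, but a commonly used workaround for this incompatibility (newer PyTorch releases dropped PY3 from torch._six): replace the torch._six.PY3 reference in fcos_core/utils/imports.py with a direct version check. A minimal sketch:

import sys

# Drop-in replacement for the removed torch._six.PY3 flag; True on Python 3.
PY3 = sys.version_info[0] == 3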

train

Hello! If I want to train on 4 GPUs, could you share the launch command for multi-GPU training?

multi gpu training

Hello, thanks for your work.
I noticed there is no command in your README for multi-GPU training, so I used the following command to train:

python -m torch.distributed.launch --nproc_per_node 4 tools/train_net_da.py --config-file configs/SIGMA/sigma_vgg16_sim10k_to_cityscapes.yaml

However, I met a problem:

RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. 
You can enable unused parameter detection by (1) passing the keyword argument `find_unused_parameters=True` to `torch.nn.parallel.DistributedDataParallel`; (2) making sure all `forward` function outputs participate in calculating loss. 
If you already have done the above two steps, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's `forward` function. 
Please include the loss function and the structure of the return value of `forward` of your module when reporting this issue (e.g. list, dict, iterable).

How can I solve this problem and train on multiple GPUs?
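
A minimal sketch of the first remedy the error message itself suggests, assuming the DDP wrapping happens somewhere in tools/train_net_da.py (function and variable names here are illustrative):

import torch

def wrap_for_ddp(model, local_rank):
    # find_unused_parameters=True lets DDP tolerate parameters that receive
    # no gradient in a given forward pass (e.g., branches inactive for some
    # batches), which is exactly what the error message recommends trying.
    return torch.nn.parallel.DistributedDataParallel(
        model,
        device_ids=[local_rank],
        output_device=local_rank,
        find_unused_parameters=True,
    )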

Compare with model EPM?

@wymanCV
Thank you for your contribution. I want to compare with EPM, and I noticed that EPM needs two training steps while yours needs only one. If possible, may I ask how to change your model to train like EPM? Thank you for your reply.

Question about number of warm up iterations

Hi, I would like to ask about the number of iterations used for warm-up. I haven't run the code myself, but I was wondering whether the 2000 iterations is where the global-alignment-only model converges. Did you find that number through experimentation, or did you follow the original config setting from previous works? Thanks.

Error at iteration 1720

Sorry to bother the authors again, haha. This time the problem is as follows:
Traceback (most recent call last):
File "tools/train_net_da.py", line 726, in <module>
main()
File "tools/train_net_da.py", line 715, in main
MODEL = train(cfg, args.local_rank, args.distributed, args.test_only,args.use_tensorboard)
File "tools/train_net_da.py", line 601, in train
meters,
File "/home/e401/Desktop/wrs/projects/SIGMA/fcos_core/engine/trainer.py", line 299, in do_train
model, (images_s, images_t), targets=targets_s, return_maps=True)
File "/home/e401/Desktop/wrs/projects/SIGMA/fcos_core/engine/trainer.py", line 69, in foward_detector
(features_s, features_t), middle_head_loss = model_middle_head(images, (features_s,features_t), targets=targets, score_maps=score_maps )
File "/home/e401/anaconda3/envs/SIGMA/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
result = self.forward(*input, **kwargs)
File "/home/e401/Desktop/wrs/projects/SIGMA/fcos_core/modeling/rpn/fcos/graph_matching_head.py", line 229, in forward
features, feat_loss = self._forward_train(images, features, targets, score_maps)
File "/home/e401/Desktop/wrs/projects/SIGMA/fcos_core/modeling/rpn/fcos/graph_matching_head.py", line 348, in _forward_train
matching_loss_quadratic = self._forward_qu(nodes_1, nodes_2, edges_1.detach(), edges_2.detach(), affinity)
File "/home/e401/Desktop/wrs/projects/SIGMA/fcos_core/modeling/rpn/fcos/graph_matching_head.py", line 624, in _forward_qu
sin2 = torch.sqrt(1.- F.cosine_similarity(triangle_2, triangle_2_tmp).pow(2)).sort()[0]
RuntimeError: The size of tensor a (8) must match the size of tensor b (9) at non-singleton dimension 0

Other information:

  1. I am using my own dataset.
  2. The same code runs fine on another dataset.
  3. The loss was still decreasing steadily before the error.

Why would the tensor dimensions go wrong? It feels a bit mysterious.

Questions about experiment results.

I trained for about 50,000 iterations on a 2080Ti with a batch size of 2, and found the evaluation results quite unstable: the AP50 fluctuated around 41 and reached a maximum of 43.5. May I ask how you judge the convergence of the model and select the results to report?
Thanks a lot.
