
os2d's Issues

Segmentation fault

Upon evaluating the trained retrieval model, I'm getting a segmentation fault.

This is the snippet that I use to run the eval:

python main_detector_retrieval.py  --retrieval_multiscale  --maskrcnn_config_file detector/config/e2e_faster_rcnn_R_101_FPN_1x_multiscale_noClasses.yaml --maskrcnn_weight_file detector/output/exp0000-R-101-noCl-grozi/model_best.pth --retrieval_network_path retrieval/output/grozi/grozi-train-retrieval-rndCropPerImage10_resnet101_gem_whiten_contrastive_m0.85_adam_lr1.0e-06_wd1.0e-04_nnum5_qsize2000_psize20000_bsize32_imsize240/model_epoch0.pth.tar  --retrieval_image_size 240 is_cuda True eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" eval.mAP_iou_thresholds "[0.5]"

This is the error that I get:

DeprecatedFeatureWarning: apex.amp is deprecated and will be removed by the end of February 2023. Use [PyTorch AMP](https://pytorch.org/docs/stable/amp.html)
  warnings.warn(msg, DeprecatedFeatureWarning)
Segmentation fault (core dumped)

how about using Efficient Neighbourhood Consensus Networks?

Thanks for your great work. Have you tried using the approach from the paper "Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions" to replace the current correlation map?
I am wondering whether it is worth trying; could you give me some suggestions?

Same class images with different class_ids get different results?

I get different results when I use the same class images with different class_ids. The input image config in demo.py is as below:

class_images = ["c1.jpg", "c1.jpg"]  # the two class images are identical

If I set class_ids=[0, 1] or class_ids=[0, 0], the results are different. I am confused about why these settings lead to different outputs; in my opinion, the result should be the same because the class images are identical.
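For what it's worth, this is the sanity check I would like to run on the raw scores before any NMS (a sketch with hypothetical names and shapes, not the actual demo.py output structure):

import torch

# Stand-ins for the raw per-class score maps; identical class images should
# produce identical raw scores, so any difference in the final detections
# would have to come from a later stage such as per-class NMS.
scores = [torch.randn(1, 100, 100)]
scores.append(scores[0].clone())
print(torch.allclose(scores[0], scores[1]))  # True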

Reduce False Positives

Hey!
I get a lot of false positives during detection with the model you provided. Can you suggest a solution to reduce the false positives?
Thanks.

TransformNet checkpoint load

Hi! First of all, I really appreciate your work and the high quality of your repository. I've been taking a look at it for some days now and I found it very well organized.

I saw that you made the checkpoints for Rocco's TransformNet available; however, I couldn't find where they are loaded when building the OS2D model. Could you point me to the specific place where that checkpoint is loaded?

By the way, I'm currently trying out other backbone architectures for feature extraction, have you experimented with other models besides ResNet?

Thanks!

Annotation of Grozi3.2K

Dear Anton Osokin,
I just downloaded your annotations for the GroZi-3.2K dataset, but I can't visualize them. Could you share the original annotations with me?

few-shot

Thanks for the amazing work. I think one-shot & few-shot detection are going to change the game of object detection.

Problem Statement:

So far, we can add multiple labels of one super class to the one-shot detector (for example, we could add multiple cars and bikes, so that classes 0-10 are cars and 11-20 are bikes).

Questions:

  1. Is there any way to add a few images to one class and somehow average the "Feature matching and alignment" results to get a more robust prediction? (See the sketch after this list.)
  2. Is there any way to combine the weights of multiple features before applying the head classifier?
  3. Or maybe that's not even necessary, since we can group the labels (i.e., 0-10 --> car)?
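To make question 1 concrete, here is a minimal sketch of the averaging I have in mind, assuming we can extract one feature map per class image (extract_class_features is a hypothetical stand-in, not the actual OS2D API):

import torch

def average_class_prototype(class_feature_maps):
    # class_feature_maps: list of [C, H, W] tensors, one per support image of
    # the same class; averaging gives a single, hopefully more robust,
    # template for the matching stage
    return torch.stack(class_feature_maps, dim=0).mean(dim=0)

# usage: prototype = average_class_prototype(
#            [extract_class_features(img) for img in few_shot_images])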

I would appreciate any hint on that and would try to help in the implementation as well if that can be of any help.

Error in visualisation

I'm trying to run an evaluation and generate the detections on the fly. This is the error I'm facing

TypeError: show_detections() got multiple values for argument 'class_ids'

Full trace

Traceback (most recent call last):
  File "main_detector_retrieval.py", line 117, in <module>
    main()
  File "main_detector_retrieval.py", line 113, in main
    logger_prefix=logger_prefix)
  File "/home/user/os2d/baselines/detector_retrieval/evaluate_detector_retrieval.py", line 186, in evaluate
    visualizer.show_detections(boxes_one_image, image_id, dataloader, cfg_visualization, class_ids=None)
TypeError: show_detections() got multiple values for argument 'class_ids'
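If I read the trace correctly, class_ids is being passed both positionally and as a keyword. A minimal reproduction of the mechanism (not the repo's actual signature, just an illustration):

def show_detections(boxes, image_id, dataloader, class_ids=None):
    pass

# The fourth positional argument already fills class_ids, so the extra
# keyword collides and raises:
#     TypeError: show_detections() got multiple values for argument 'class_ids'
show_detections("boxes", "img0", "loader", "cfg_vis", class_ids=None)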

These are the changes in the config.py file:

cfg.visualization.eval.show_detections = True
cfg.visualization.eval.path_to_save_detections = "/home/user/os2d/baselines/detector_retrieval/gens"

This is how I'm running the evaluation

python main_detector_retrieval.py  --retrieval_multiscale  --maskrcnn_config_file detector/config/e2e_faster_rcnn_R_101_FPN_1x_multiscale_noClasses.yaml --maskrcnn_weight_file detector/output/exp0000-R-101-noCl-grozi/model_best.pth --retrieval_network_path retrieval/output/grozi/grozi-train-retrieval-rndCropPerImage10_resnet101_gem_whiten_contrastive_m0.85_adam_lr1.0e-06_wd1.0e-04_nnum5_qsize2000_psize20000_bsize32_imsize240/model_best.pth.tar --retrieval_image_size 240 is_cuda True eval.dataset_names "[\"grozi-val-new-cl\"]" eval.dataset_scales "[1280.0]" eval.mAP_iou_thresholds "[0.5]"

help with changing the architecture of the net

Hi, great work!

I tried your code and the v2 model on my data, and the results are amazing.
But because my data has large variation in image size and in distance to the objects, I get worse results.
To overcome this issue I tried the following:

  1. Working on several pyramid levels. But because some images are large yet taken from a distance, I need to increase the image size by the pyramid multiplier, and as a result of the huge size the GPU runs out of memory.

  2. Working with a sliding window over the image, but again, because of the size of the images, it takes a lot of time to process one image.

Now, after going into the depths of your work, I saw that you work directly on the C4 level of the feature extractor net ("resnet50").

My idea is to create an FPN, use layers C3, C4, and C5 to create P3 to P7, and apply the os2d head on each created layer.

To do that, I need to update the receptive field and the stride of the box generator, as well as the feature map size, for each layer throughout the code.

My question is how to determine these parameters, given that the input size is 512 and the layer sizes are:
P3: ([1, 128, 64, 64])
P4: ([1, 128, 32, 32])
P5: ([1, 128, 16, 16])
P6: ([1, 128, 8, 8])
P7: ([1, 128, 4, 4])

It would be wonderful if you could also explain how you calculated these parameters.
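In the meantime, this is my back-of-the-envelope computation of the strides; my own assumption is that the box-generator stride of a level is the input size divided by its feature map size, and that, as the code seems to do for the C4 backbone (feature_map_receptive_field=FeatureMapSize(h=16, w=16)), the receptive field parameter is simply set to the level's stride:

input_size = 512
fm_sizes = {"P3": 64, "P4": 32, "P5": 16, "P6": 8, "P7": 4}

for level, fm_size in fm_sizes.items():
    stride = input_size // fm_size  # P3 -> 8, P4 -> 16, P5 -> 32, P6 -> 64, P7 -> 128
    print(level, "box generator stride:", stride)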

config for COCO (maskrcnn-benchmark)

Sorry, do you have a training config (like config_training.yml in experiments) to correctly initialize from a COCO dataset model? I was only able to init the feature extractor from it. I would really appreciate it if you could provide a config or tell me which parameters I should tune.

cfg.model.normalization

Hi, aosokin:
If I train OS2D on another dataset, should I recalculate the mean and std of the RGB channels in config.py, lines 28 & 29?
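For reference, this is what I would compute if the answer is yes: a minimal sketch, assuming same-size images arriving as [0, 1] float tensors from a torchvision-style loader (not code from this repo):

import torch

def channel_mean_std(loader):
    count, mean_sum, sq_sum = 0, torch.zeros(3), torch.zeros(3)
    for images, *_ in loader:               # images: [B, 3, H, W] in [0, 1]
        flat = images.flatten(start_dim=2)  # [B, 3, H*W]
        mean_sum += flat.mean(dim=2).sum(dim=0)
        sq_sum += flat.pow(2).mean(dim=2).sum(dim=0)
        count += images.size(0)
    mean = mean_sum / count
    std = (sq_sum / count - mean.pow(2)).sqrt()  # per-channel E[x^2] - E[x]^2
    return mean, std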

making the code faster

Hi, once again, awesome work!

In your head.py, lines 371-376:
default_boxes_xyxy_wrt_fm = self.box_grid_generator_feature_map_level.create_strided_boxes_columnfirst(fm_size=image_fm_size)

default_boxes_xyxy_wrt_fm = default_boxes_xyxy_wrt_fm.view(1, 1, image_fm_size.h, image_fm_size.w, 4)
# 1 (to broadcast to batch_size) x 1 (to broadcast to class batch_size) x  box_grid_height x box_grid_width x 4
default_boxes_xyxy_wrt_fm = default_boxes_xyxy_wrt_fm.to(resampling_grids_local_coord.device)
resampling_grids_fm_coord = convert_box_coordinates_local_to_global(resampling_grids_local_coord, default_boxes_xyxy_wrt_fm)

and lines 405-410:

default_boxes_xyxy_wrt_image = self.box_grid_generator_image_level.create_strided_boxes_columnfirst(fm_size=image_fm_size)
default_boxes_xyxy_wrt_image = default_boxes_xyxy_wrt_image.view(1, 1, image_fm_size.h, image_fm_size.w, 4)
# 1 (to broadcast to batch_size) x 1 (to broadcast to class batch_size) x  box_grid_height x box_grid_width x 4
default_boxes_xyxy_wrt_image = default_boxes_xyxy_wrt_image.to(resampling_grids_local_coord.device)
resampling_grids_image_coord = convert_box_coordinates_local_to_global(resampling_grids_local_coord, default_boxes_xyxy_wrt_image)

These lines do the same calculation, and they don't have trainable parameters; why can't we reuse the first calculation for the second one?
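What I mean by reusing is something like caching the boxes by generator and feature map size, since they are deterministic in those inputs. A sketch, not a drop-in patch for head.py; it assumes fm_size is hashable and the generators are stateless:

from functools import lru_cache

@lru_cache(maxsize=8)
def cached_strided_boxes(generator, fm_size):
    # The default boxes depend only on (generator, fm_size) and carry no
    # trainable parameters, so they could be computed once and reused.
    return generator.create_strided_boxes_columnfirst(fm_size=fm_size)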

How to train the same model for custom data

Hello, thanks for the help.
Now I am trying to train the model on my own images.
I put my images in the data/demo folder and also changed the number of iterations in the cfg.
Can you please suggest what other changes I should make, and also the command to train the model?

Thanks in advance.

multiple gpus for training

Hi aosokin,

Thanks for sharing the great work!

May I ask how to assign the number of GPUs used for training?
For example, I want to train 'V2-train' using 4 GPUs.

Thank you.

Some ideas about training with typical object detection datasets.

Hi, thanks for your great work. I have some ideas about training the model with typical OD datasets like VOC or COCO.
Since the GroZi-3.2k dataset needs careful labeling and it is time-consuming to get new data, I am wondering if we could use the OD annotations directly.
Here are the steps:

  1. Take an input image and its bbox labels.
  2. Randomly select one class from the labels and crop it out as a class image.
  3. Considering that objects of the same class in COCO or VOC may look very different, we could either treat them as one class for semantic-level one-shot OD, or treat them as different classes for instance-level one-shot OD.
  4. Then we have the input image and the class image and can train on the pair.

Have you tried a similar idea? Could you give me some advice?

Where to get the trained model on Instre and imageNet?

Thanks for your great work. The released v1-train, v2-init, and v2-train models are trained on the GroZi-3.2k dataset.
How could we get these models trained on the INSTRE or ImageNet-LOC datasets? Training a model costs too much time.

how to use multiple input images in demo.ipynb

Dear Aosokin,

Thank you for the great job here, especially the detailed descriptions of how to set up the experiments and reproduce the results.

In your demo.ipynb, a single image is used as the input. I am wondering if there is any way to use multiple (whole) images from one directory as the input, to scale up the demo.
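The kind of loop I have in mind is below; run_demo_on_image is a hypothetical stand-in for the notebook's per-image steps, and the directory layout is made up:

import glob

def run_demo_on_image(path, class_images):
    # hypothetical stand-in for the notebook's per-image steps: load the
    # image, run the model against the class images, return the boxes
    return []

class_images = glob.glob("data/demo/classes/*.jpg")  # made-up layout
for path in sorted(glob.glob("data/demo/inputs/*.jpg")):
    detections = run_demo_on_image(path, class_images)
    print(path, "->", len(detections), "detections")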

Thank you.

some new ideas about adapting YOLO to os2d

Hello, I have some ideas about changing YOLO's single input branch into two branches and applying feature matching, then using the standard YOLO head to output the detections. (If the STN is not added, will the result be better or worse?)
Could you give me some suggestions about this idea?

What is 3264 in the src?

Hey @aosokin

How does the number 3264 (a folder is named by this number, and it is termed the average image size in the code) come about? I wanted to test this on a custom dataset but am puzzled as to how to arrive at this figure.
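For context, this is how I would compute such a figure for my own data; whether 3264 is the mean of the larger image side, of sqrt(w*h), or something else entirely is exactly what I am unsure about (my own script, not code from this repo):

import glob
import statistics
from PIL import Image

sizes = [max(Image.open(p).size) for p in glob.glob("my_dataset/*.jpg")]
print(round(statistics.mean(sizes)))  # is this how 3264 was obtained?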

CUDA out of memory

I have a dataset with 228454 images. Every time I try to train OS2D on this dataset, CUDA runs out of memory.

python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False train.objective.loc_weight 0.0 train.model.freeze_bn_transform True model.backbone_arch ResNet50 init.model models/imagenet-caffe-resnet50-features-ac468af-converted.pth init.transform models/weakalign_resnet101_affine_tps.pth.tar train.mining.do_mining False output.path output/os2d_v2-train

I'm already using a small batch size (4) and have set cfg.eval.scales_of_image_pyramid to [0.5, 0.625, 1, 1.6], but the error still persists.

Full trace of the log:

2022-12-17 11:52:14,366 OS2D INFO: Loaded configuration file experiments/config_training.yml
2022-12-17 11:52:14,366 OS2D INFO:
output:
  path: "" # Substitute ""
  save_iter: 0
  best_model:
    do_get_best_model: True
    dataset: "" # use the first validation dataset
    metric: "[email protected]"
    mode: "max"
is_cuda: True
random_seed: 0
init:
  model: "" # Substitute "models/resnet50-19c8e357.pth"
model:
  backbone_arch: "" # Substitute "ResNet50" or "ResNet101"
  use_inverse_geom_model: False # Substitute v1: False v2 : True
  use_simplified_affine_model: True # Substitute v1: True v2 : False
train:
  dataset_name: "grozi-train"
  dataset_scale: 789.0
  objective:
    class_objective: "RLL"
    loc_weight: 0.0 # Substitute v1: 0.2, v2: 0.0
    positive_iou_threshold: 0.5
    negative_iou_threshold: 0.1
    remap_classification_targets: True
    remap_classification_targets_iou_pos: 0.8
    remap_classification_targets_iou_neg: 0.4
  optim:
    anneal_lr:
      type: "MultiStepLR"
      milestones: [100000, 150000]
      gamma: 0.1

  model:
    freeze_bn: True
    freeze_bn_transform: True # Substitute v1: False, v2: True
    train_transform_on_negs: False
eval:
  iter: 1000
  dataset_names: ("grozi-val-new-cl",)
  dataset_scales: (789.0,)
  mAP_iou_thresholds: (0.5,)

2022-12-17 11:52:14,366 OS2D INFO: Running with config:
eval:
  batch_size: 1
  cache_images: False
  class_image_augmentation:
  dataset_names: ['grozi-val-new-cl']
  dataset_scales: [789.0]
  iter: 1000
  mAP_iou_thresholds: [0.5]
  nms_across_classes: False
  nms_iou_threshold: 0.3
  nms_score_threshold: -inf
  scales_of_image_pyramid: [0.5, 0.625, 1, 1.6]
  train_subset_for_eval_size: 0
init:
  model: models/imagenet-caffe-resnet50-features-ac468af-converted.pth
  transform: models/weakalign_resnet101_affine_tps.pth.tar
is_cuda: True
model:
  backbone_arch: ResNet50
  class_image_size: 240
  merge_branch_parameters: True
  normalization_mean: [0.485, 0.456, 0.406]
  normalization_std: [0.229, 0.224, 0.225]
  use_group_norm: False
  use_inverse_geom_model: True
  use_simplified_affine_model: False
output:
  best_model:
    dataset:
    do_get_best_model: True
    metric: mAP@0.5
    mode: max
  path: output/os2d_v2-train
  print_iter: 1
  save_iter: 0
  save_log_to_file: False
random_seed: 0
train:
  augment:
    jitter_aspect_ratio: 0.9
    min_box_coverage: 0.7
    mine_extra_class_images: False
    random_color_distortion: False
    random_crop_class_images: False
    random_flip_batches: False
    scale_jitter: 0.7
    train_patch_height: 600
    train_patch_width: 600
  batch_size: 4
  cache_images: False
  class_batch_size: 15
  dataset_name: grozi-train
  dataset_scale: 789.0
  do_training: True
  mining:
    do_mining: False
    mine_hard_patches_iter: 5000
    nms_iou_threshold_in_mining: 0.5
    num_hard_patches_per_image: 10
    num_random_negative_classes: 200
    num_random_pyramid_scales: 2
  model:
    freeze_bn: True
    freeze_bn_transform: True
    freeze_transform: False
    num_frozen_extractor_blocks: 0
    train_features: True
    train_transform_on_negs: False
  objective:
    class_neg_weight: 1.0
    class_objective: RLL
    loc_weight: 0.0
    neg_margin: 0.5
    neg_to_pos_ratio: 3
    negative_iou_threshold: 0.1
    pos_margin: 0.6
    positive_iou_threshold: 0.5
    remap_classification_targets: True
    remap_classification_targets_iou_neg: 0.4
    remap_classification_targets_iou_pos: 0.8
    rll_neg_weight_ratio: 0.001
  optim:
    anneal_lr:
      cooldown: 10000
      gamma: 0.1
      initial_patience: 0
      milestones: [100000, 150000]
      min_value: 1e-05
      patience: 1000
      quantity_epsilon: 0.01
      quantity_mode: max
      quantity_smoothness: 2000
      quantity_to_monitor: mAP@0.5_grozi-val-new-cl
      reduce_factor: 0.5
      reload_best_model_after_anneal_lr: True
      type: MultiStepLR
    lr: 0.0001
    max_grad_norm: 100.0
    max_iter: 200000
    optim_method: sgd
    sgd_momentum: 0.9
    weight_decay: 0.0001
visualization:
  eval:
    images_for_heatmaps: []
    labels_for_heatmaps: []
    max_detections: 10
    path_to_save_detections:
    score_threshold: -inf
    show_class_heatmaps: False
    show_detections: False
    show_gt_boxes: False
  mining:
    images_for_heatmaps: []
    labels_for_heatmaps: []
    max_detections: 10
    score_threshold: -inf
    show_class_heatmaps: False
    show_gt_boxes: False
    show_mined_patches: False
  train:
    max_detections: 5
    score_threshold: -inf
    show_detections: False
    show_gt_boxes_dataloader: False
    show_target_remapping: False
2022-12-17 11:52:14,366 OS2D INFO: Saving config into: output/os2d_v2-train/config.yml
2022-12-17 11:52:14,435 OS2D INFO: Building the OS2D model
2022-12-17 11:52:17,056 OS2D INFO: Creating model on one GPU
2022-12-17 11:52:17,067 OS2D INFO: Reading model file models/imagenet-caffe-resnet50-features-ac468af-converted.pth
2022-12-17 11:52:17,101 OS2D INFO: Cannot find 'net' in the checkpoint file
2022-12-17 11:52:17,101 OS2D INFO: Failed to load the full model, trying to init feature extractors
2022-12-17 11:52:17,101 OS2D INFO: Trying to init from models/imagenet-caffe-resnet50-features-ac468af-converted.pth
2022-12-17 11:52:17,145 OS2D INFO: FAILED to load as network
2022-12-17 11:52:17,145 OS2D INFO: Trying to init from models/imagenet-caffe-resnet50-features-ac468af-converted.pth as checkpoint
2022-12-17 11:52:17,145 OS2D INFO: FAILED to load as checkpoint
2022-12-17 11:52:17,145 OS2D INFO: Could not init the full feature extractor. Trying to init form a weakalign model
2022-12-17 11:52:17,145 OS2D INFO: Could not init from the weakalign network. Trying to init backbone from models/imagenet-caffe-resnet50-features-ac468af-converted.pth.
2022-12-17 11:52:17,155 OS2D INFO: Successfully initialized backbone.
2022-12-17 11:52:17,160 OS2D INFO: Trying to init affine transform from models/weakalign_resnet101_affine_tps.pth.tar
2022-12-17 11:52:17,351 OS2D INFO: Successfully initialized the affine transform from the provided weakalign model.
2022-12-17 11:52:17,353 OS2D INFO: OS2D has 139 blocks of 10169478 parameters (before freezing)
2022-12-17 11:52:17,353 OS2D INFO: OS2D has 139 blocks of 10169478 trainable parameters
2022-12-17 11:52:17,354 OS2D.dataset INFO: Preparing the GroZi-3.2k dataset: version grozi-train, eval scale 789.0, image caching False
2022-12-17 11:52:17,982 OS2D.dataset INFO: Reading query images
2022-12-17 11:53:01,132 OS2D.dataset INFO: Read 14136 GT images
2022-12-17 11:53:01,149 OS2D.dataset INFO: Reading target images
100%|██████████| 194526/194526 [6:02:58<00:00,  8.93it/s]
2022-12-17 17:56:00,012 OS2D.dataset INFO: Found 194526 data images
2022-12-17 17:56:00,030 OS2D.dataset INFO: Loaded dataset grozi-train with 194526 images, 291076 boxes, 14136 classes
2022-12-17 17:56:00,044 OS2D.eval.dataset INFO: Preparing the GroZi-3.2k dataset: version grozi-val-new-cl, eval scale 789.0, image caching False
2022-12-17 17:56:00,561 OS2D.eval.dataset INFO: Reading query images
2022-12-17 17:56:14,931 OS2D.eval.dataset INFO: Read 12213 GT images
2022-12-17 17:56:14,936 OS2D.eval.dataset INFO: Reading target images
100%|██████████| 63354/63354 [2:05:02<00:00,  8.44it/s]
2022-12-17 20:01:17,803 OS2D.eval.dataset INFO: Found 63354 data images
2022-12-17 20:01:17,812 OS2D.eval.dataset INFO: Loaded dataset grozi-val-new-cl with 63354 images, 72769 boxes, 12213 classes
2022-12-17 20:01:17,955 OS2D.train INFO: Start training
2022-12-17 20:01:17,955 OS2D.evaluate INFO: Starting to eval on grozi-val-new-cl, scale 789.0
2022-12-17 20:01:17,956 OS2D.evaluate INFO: Extracting scores from all images
2022-12-17 20:01:32,900 OS2D.evaluate INFO: Extracting weights from 12213 classes
2022-12-17 20:05:07,670 OS2D.eval.dataloader INFO: Image batch 0 out of 63354
/data1/saswats/baseline/os2d/os2d/engine/evaluate.py:292: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  batch_class_ids = [class_ids[l // num_class_views] for l in batch_labels_local]
/data1/saswats/baseline/os2d/os2d/engine/evaluate.py:293: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  batch_query_img_sizes = [query_img_sizes[l // num_class_views] for l in batch_labels_local]
/data1/saswats/miniconda3/envs/os2d/lib/python3.6/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  /opt/conda/conda-bld/pytorch_1639180593867/work/aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
2022-12-17 20:09:05,610 OS2D.evaluate INFO: Feature time: 4.81s, Label time: 232.43s, Net time: 0h 3m 57s
2022-12-17 20:09:10,840 OS2D.evaluate INFO: loss 0.1083, class_loss_per_element_detached_cpu 0.0000, loc_smoothL1 4.5910, cls_RLL 0.1083, cls_RLL_pos 0.0930, cls_RLL_neg 0.0152,
2022-12-17 20:10:26,108 OS2D.eval.dataloader INFO: Image batch 1 out of 63354
2022-12-17 20:14:18,661 OS2D.evaluate INFO: Feature time: 0.07s, Label time: 231.11s, Net time: 0h 3m 52s
2022-12-17 20:14:23,169 OS2D.evaluate INFO: loss 0.1185, class_loss_per_element_detached_cpu 0.0000, loc_smoothL1 4.6364, cls_RLL 0.1185, cls_RLL_pos 0.1032, cls_RLL_neg 0.0154,
2022-12-17 20:15:39,063 OS2D.eval.dataloader INFO: Image batch 2 out of 63354
2022-12-17 20:19:31,042 OS2D.evaluate INFO: Feature time: 0.06s, Label time: 230.85s, Net time: 0h 3m 51s
Traceback (most recent call last):
  File "main.py", line 98, in <module>
    main()
  File "main.py", line 94, in main
    trainval_loop(dataloader_train, net, cfg, criterion, optimizer, dataloaders_eval=dataloaders_eval)
  File "/data1/saswats/baseline/os2d/os2d/engine/train.py", line 426, in trainval_loop
    meters_eval = evaluate_model(dataloaders_eval, net, cfg, criterion)
  File "/data1/saswats/baseline/os2d/os2d/engine/train.py", line 392, in evaluate_model
    meters_val = evaluate(dataloader, net, cfg, criterion=criterion, print_per_class_results=print_per_class_results)
  File "/data1/saswats/miniconda3/envs/os2d/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/data1/saswats/baseline/os2d/os2d/engine/evaluate.py", line 97, in evaluate
    add_batch_dim(class_targets_pyramid)
  File "/data1/saswats/miniconda3/envs/os2d/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data1/saswats/baseline/os2d/os2d/engine/objective.py", line 257, in forward
    neg_ranking = self._hard_negative_mining(cls_loss.unsqueeze(0), mask_all_negs.unsqueeze(0)).squeeze(0)  # [batch_size, num_labels, num_anchors]
  File "/data1/saswats/baseline/os2d/os2d/engine/objective.py", line 68, in _hard_negative_mining
    _, rank_mined = idx.sort(1)      # [batch_size, *]
RuntimeError: CUDA out of memory. Tried to allocate 1.93 GiB (GPU 0; 47.54 GiB total capacity; 41.29 GiB already allocated; 1.43 GiB free; 44.01 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

training precision

Thanks a lot for your helpful code! When I train the V2 version on the GroZi-3.2k dataset according to your instructions in the README, the predicted bboxes are all square, but your pretrained weights produce correct boxes. So I don't know what's wrong with my setup.

Cannot find 'net' in the checkpoint file

I initiated the training using this line

python main.py --config-file experiments/config_training.yml model.use_inverse_geom_model True model.use_simplified_affine_model False train.objective.loc_weight 0.0 train.model.freeze_bn_transform True model.backbone_arch ResNet50 init.model models/imagenet-caffe-resnet50-features-ac468af-converted.pth init.transform models/weakalign_resnet101_affine_tps.pth.tar train.mining.do_mining True output.path output/os2d_v2-train

But I'm getting this error in the logs:

2022-12-14 00:08:49,167 OS2D INFO: Saving config into: output/os2d_v2-train/config.yml
2022-12-14 00:08:49,305 OS2D INFO: Building the OS2D model
2022-12-14 00:08:52,171 OS2D INFO: Creating model on one GPU
2022-12-14 00:08:52,200 OS2D INFO: Reading model file models/imagenet-caffe-resnet50-features-ac468af-converted.pth
2022-12-14 00:08:52,257 OS2D INFO: Cannot find 'net' in the checkpoint file
2022-12-14 00:08:52,257 OS2D INFO: Failed to load the full model, trying to init feature extractors
2022-12-14 00:08:52,257 OS2D INFO: Trying to init from models/imagenet-caffe-resnet50-features-ac468af-converted.pth
2022-12-14 00:08:52,340 OS2D INFO: FAILED to load as network
2022-12-14 00:08:52,340 OS2D INFO: Trying to init from models/imagenet-caffe-resnet50-features-ac468af-converted.pth as checkpoint
2022-12-14 00:08:52,340 OS2D INFO: FAILED to load as checkpoint
2022-12-14 00:08:52,340 OS2D INFO: Could not init the full feature extractor. Trying to init form a weakalign model
2022-12-14 00:08:52,340 OS2D INFO: Could not init from the weakalign network. Trying to init backbone from models/imagenet-caffe-resnet50-features-ac468af-converted.pth.
2022-12-14 00:08:52,363 OS2D INFO: Successfully initialized backbone.
2022-12-14 00:08:52,363 OS2D INFO: Trying to init affine transform from models/weakalign_resnet101_affine_tps.pth.tar
2022-12-14 00:08:52,664 OS2D INFO: Successfully initialized the affine transform from the provided weakalign model.
2022-12-14 00:08:52,666 OS2D INFO: OS2D has 139 blocks of 10169478 parameters (before freezing)
2022-12-14 00:08:52,667 OS2D INFO: OS2D has 139 blocks of 10169478 trainable parameters
2022-12-14 00:08:52,667 OS2D.dataset INFO: Preparing the GroZi-3.2k dataset: version grozi-train, eval scale 1280.0, image caching True
Preparing the GroZi-3.2k dataset: version grozi-train, eval scale 1280.0, image caching True

sampling function

Hi, can you give a short explanation of how you implemented your fast sampling?

Error on using multi GPU setting

Hey @aosokin,
I tried running the training on the given dataset and am facing this error while testing

TypeError: zip argument #1 must support iteration

Here's how I'm running the training:

python trainval_net.py --mGPUs --cuda --dataset grozi-train --dataset_val grozi-val-new-cl --init_weights /home/user/exp/os2d/baselines/CoAE/experiments/../../../models/resnet101-5d3b4d8f.pth --disp_interval 1 --val_interval 10 --nw 4 --bs 8 --s 1 --epochs 2000 --lr_decay_milestones 1000 1500 --lr 0.01 --lr_decay_gamma 0.1 --lr_reload_best_after_decay True --save_dir /home/user/exp/os2d/baselines/CoAE/output/grozi/coae.0.res101_initPytorch_query192_scale900_ms --net res101 --set DATA_DIR /home/user/exp/os2d/baselines/CoAE/data TRAIN.MAX_SIZE 3000 TEST.MAX_SIZE 3000 TRAIN.query_size 192 TRAIN.SCALES [450,562,720,900,1080,1260,1440] TEST.SCALES [900]

Full trace of error:

Traceback (most recent call last):
  File "trainval_net.py", line 504, in <module>
    mAP = test(args_val, model=fasterRCNN)
  File "/home/user/exp/os2d/baselines/CoAE/test_net.py", line 177, in test
    rois_label, weight = fasterRCNN(im_data, q, im_info, gt_boxes, catgory)
  File "/home/user/miniconda3/envs/os2d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/miniconda3/envs/os2d/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/user/miniconda3/envs/os2d/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/user/miniconda3/envs/os2d/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in parallel_apply
    output.reraise()
  File "/home/user/miniconda3/envs/os2d/lib/python3.7/site-packages/torch/_utils.py", line 394, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in replica 0 on device 0.
Original Traceback (most recent call last):
  File "/home/user/miniconda3/envs/os2d/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "/home/user/miniconda3/envs/os2d/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/user/miniconda3/envs/os2d/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.gather(outputs, self.output_device)
  File "/home/user/miniconda3/envs/os2d/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/user/miniconda3/envs/os2d/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/home/user/miniconda3/envs/os2d/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/user/miniconda3/envs/os2d/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
TypeError: zip argument #1 must support iteration

How do I handle this issue? The error doesn't come up when the --mGPUs flag is off.

Questions about the V2 model approach

Hi, thanks for your great work.
I have a question about the V2 approach in the paper. In section 6.1, the V1 approach (P=4) allows training with full supervision, while the V2 approach (P=6) requires weak supervision for training.

However, I remember that the full affine transformations (P=6) can also be trained with full supervision by generating random affine parameters, as in Rocco's paper "Convolutional neural network architecture for geometric matching".

I am wondering what the difference is.

Security alert from dependabot

Hi @Diogo364,
I've received a message from GitHub that one of the packages your edit depends on has a security vulnerability:

"An issue discovered in Python Charmers Future 0.18.2 and earlier allows remote attackers to cause a denial of service via crafted Set-Cookie header from malicious web server. "

Would it be possible to do something in Docker/requirements.txt to get rid of this vulnerability?

multi process; scale

Hi, nice work! I have two questions about the code:

  1. Could the training be run with multiple GPUs and the dataloader with multiple workers? That would be much faster.
  2. How should I set the dataset scale? What does it mean? I saw that the scale is "defining the expected object size", but I still cannot understand what it is or how I should set it. Should I change it when I apply your trained model to some new images?

Many thanks!

Reproducing the model results on the paste-f dataset shown in the research paper

Hi, thanks a lot for your work!

I tried reproducing the model results shown in figure 3 of the research paper (screenshot of the figure attached).

Running the demo with the respective images yielded the attached result (result_failed) for the attached class image (class_failed).

The class image is 3547.jpg and the test image is 16486.jpg.

Is this related to the class pyramids not being implemented in the demo?

Need for OrderedDict

Hey @aosokin

Large datasets (~1.7M target images) take forever to be read (loaded into memory) via the _read_dataset_images function, and I was looking for a way to parallelize this with multiprocessing. But your implementation uses OrderedDict in many places, which isn't shareable across processes. Is there any particular reason for using this data structure, or can a normal dict (which preserves insertion order since Python 3.7) serve the same purpose here?

Question about receptive field

(Screenshot of the relevant code attached.)
Why is the receptive field of the feature map 16? What does feature_map_receptive_field=FeatureMapSize(h=16, w=16) mean? In your network you concatenate both the TransformNet (kernel sizes [7, 5, 5], so a receptive field of 15) and the backbone, so how is the receptive field of the backbone feature extraction 16?

Multi-GPU training for detector

Can we train the CNN detector using multiple GPUs? The available arguments cause the program to hang after loading the data and model. I'm using the following line to initiate the process.

python -m torch.distributed.launch --nproc_per_node=2  --use_env train_detector.py  --config-file config/e2e_faster_rcnn_R_50_FPN_1x_multiscale_noClasses.yaml DATASETS.TRAIN [\"grozi-train\"] DATASETS.TEST [\"grozi-val-all\"] INPUT.MIN_SIZE_TRAIN [480,600,768,960,1152,1344,1536] INPUT.MAX_SIZE_TRAIN 2048 INPUT.MIN_SIZE_TEST 960 INPUT.MAX_SIZE_TEST 1280 OUTPUT_DIR output/exp0000-R-50-noCl-grozi

New form of evaluation

Reusing this codebase, can we get the predictions and ground truths in two text files in the following format?
<class_name> <left> <top> <right> <bottom> [<difficult>]
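A minimal sketch of the writer I have in mind (my own helper, not from this repo), producing one box per line in the format above:

def dump_boxes(path, rows):
    # rows: iterables of (class_name, left, top, right, bottom[, difficult])
    with open(path, "w") as f:
        for row in rows:
            f.write(" ".join(str(v) for v in row) + "\n")

dump_boxes("groundtruth.txt", [("bottle", 10, 20, 110, 220, 0)])
dump_boxes("predictions.txt", [("bottle", 12, 18, 108, 224)])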

Number of iterations

I am trying to run the code for one-shot object detection, but I am not able to complete 5000 iterations. I want to try fewer iterations; how can I edit the number of iterations?
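Judging from the config dumps in other issues here, the relevant key seems to be train.optim.max_iter. Is overriding it on the command line, like the other options, the right way? For example:

python main.py --config-file experiments/config_training.yml train.optim.max_iter 1000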
Thank you in advance.

A question about the exact model architecture!

Hello. First of all, I'm very appreciative of your great work!

I am now trying to apply your os2d-v2-trained model to my custom data.
However, I cannot improve the evaluation results, since the model cannot be trained: my custom data has no annotations.
So, while tagging my custom data, I am planning to insert other modules into your os2d-v2-trained model, but I don't think I can access the exact architecture of the model; for example, I would like to change the anchors, but I cannot find where they are defined.

I'm looking forward to your reply.
Thank you.

Error Message

Hi Anton

I am stuck with this error message:
assert class_images is not None, "If class_conv_layer is None than class_images cannot be None"

I want to convert your PyTorch model into a TensorFlow model using ONNX. When I try to create a dummy input, I get this error message.
Can you please explain it? Do I always need to specify class_images?

Here is an example of my conversion code:

model = Os2dModel(logger, is_cuda=True)
dummy_input = torch.randn(64, 3, 7, 7, device="cuda")  # torch.autograd.Variable is deprecated
torch.onnx.export(model, dummy_input, "os2d_v2-train.onnx")

false positives

Dear Aosokin,

Thank you for the great job here, especially the detailed descriptions of how to set up the experiments and reproduce the results.

How can I reduce the false positives in this case? The detection result is attached (image 69), and the class image is attached as well (image 70).

Thank you

Did I get enough iterations in my training process?

Dear Anton Osokin:
Thanks a lot for your helpful code. I am trying to apply it to my own datasets but am confused about setting some parameters. Since I changed the training data while keeping most parameter settings unchanged, the mAP is, unsurprisingly, not good: after 30000 iterations (15 epochs with a batch size of 8) I only get 4.7%, and it never increases over the next 50000 iterations. It is difficult to judge from the loss values alone, so I want to know whether my training ran for enough iterations, and which parameters are vital to the model's performance when I change the dataset.
Any response would be appreciated! Thanks for your time!

Model size

Hi all,

I am trying to train the model on an Nvidia Titan 1080 GPU with 11 GB of RAM. I am also doing hard mining, using the option train.cache_images False. However, at the end of the hard-mining process I get an error message that the graphics card has run out of memory. Is there any way to decrease the memory usage?

Thanks in advance,
Andi

make the code faster

Hi again, nice work!

I went over your code and found some lines that look duplicated.
In the head.py file, lines 393-402:

output_recognition = self.resample_of_correlation_map_fast(
    cor_maps_for_recognition,
    resampling_grids_fm_coord_unit,
    self.class_pool_mask)

if output_recognition.requires_grad:
    output_recognition_transform_detached = self.resample_of_correlation_map_fast(
        cor_maps_for_recognition,
        resampling_grids_fm_coord_unit.detach(),
        self.class_pool_mask)
else:
    # Optimization to make eval faster
    output_recognition_transform_detached = output_recognition

Why didn't you just do:

if output_recognition.requires_grad:
    output_recognition_transform_detached = self.resample_of_correlation_map_fast(
        cor_maps_for_recognition,
        resampling_grids_fm_coord_unit.detach(),
        self.class_pool_mask)
else:
    output_recognition_transform_detached = self.resample_of_correlation_map_fast(
        cor_maps_for_recognition,
        resampling_grids_fm_coord_unit,
        self.class_pool_mask)

Because if output_recognition.requires_grad is true, you run the resampling twice.
