
reltr's Introduction

🎉 Good News: our work has been accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI)! 🎉


RelTR: Relation Transformer for Scene Graph Generation

We now provide [Colab] Demo!

PyTorch Implementation of the Paper RelTR: Relation Transformer for Scene Graph Generation

Unlike most existing advanced approaches, which infer dense relationships between all entity proposals, our one-stage method directly generates a sparse scene graph by decoding the visual appearance. If our work is helpful for your research, please cite our publication:

@article{cong2023reltr,
  title={Reltr: Relation transformer for scene graph generation},
  author={Cong, Yuren and Yang, Michael Ying and Rosenhahn, Bodo},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2023},
  publisher={IEEE}
}

0. Checklist

  • Inference Code 🎉
  • Training Code for Visual Genome 🎉
  • Evaluation Code for Visual Genome 🎉
  • Colab Demo 🎉
  • Training Code for OpenImages V6 🎉
  • Evaluation Code for OpenImages V6 🎉
  • Cleaner Evaluation Code 🕘
  • Post Processing 🕘

1. Installation

Clone the RelTR repository:

git clone https://github.com/yrcong/RelTR.git
cd RelTR

For Inference

😄 It is super easy to configure the RelTR environment.

If you only want to run inference on an image, just python=3.6, PyTorch=1.6 and matplotlib are required! You can configure the environment as follows:

# create a conda environment 
conda create -n reltr python=3.6
conda activate reltr

# install packages
conda install pytorch==1.6.0 torchvision==0.7.0 cudatoolkit=10.1 -c pytorch
conda install matplotlib

Training/Evaluation on Visual Genome or Open Images V6

If you want to train/evaluate RelTR on Visual Genome, you need a little more preparation:

a) SciPy (we used 1.5.2) and pycocotools are required.

conda install scipy
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

b) Follow README in the data directory to prepare the datasets.

c) Some widely-used evaluation code (for IoU computation) needs to be compiled. We will replace it with PyTorch code later.

# compile the code computing box intersection
cd lib/fpn
sh make.sh
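
Until that PyTorch replacement lands, the quantity the compiled extension computes (pairwise box IoU) can be sketched in pure PyTorch as below. This is an illustrative example only, not the repository's implementation, and it assumes boxes in (x1, y1, x2, y2) format:

import torch

def pairwise_box_iou(boxes1, boxes2):
    """Pairwise IoU between two sets of boxes in (x1, y1, x2, y2) format.

    boxes1: (N, 4) tensor, boxes2: (M, 4) tensor -> (N, M) IoU matrix.
    """
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])

    # Intersection rectangle, broadcast to an (N, M, 2) grid of corners.
    lt = torch.max(boxes1[:, None, :2], boxes2[None, :, :2])
    rb = torch.min(boxes1[:, None, 2:], boxes2[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]

    union = area1[:, None] + area2[None, :] - inter
    return inter / union.clamp(min=1e-9)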

The directory structure looks like:

RelTR
│
└───data
│   └───vg
│   │   │   rel.json
│   │   │   test.json
│   │   │   train.json
│   │   │   val.json
│   │   │   images
│   └───oi
│       │   rel.json
│       │   test.json
│       │   train.json
│       │   val.json
│       │   images
└───datasets
...

2. Usage

Inference

a) Download our RelTR model pretrained on the Visual Genome dataset and put it under

ckpt/checkpoint0149.pth

b) Infer the relationships in an image with the command:

python inference.py --img_path $IMAGE_PATH --resume $MODEL_PATH

We attach 5 images from the VG dataset and 1 image from the internet in the demo/ folder. You can also test with your own images.

Training

a) Train RelTR on Visual Genome on a single node with 8 GPUs (2 images per GPU):

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --dataset vg --img_folder data/vg/images/ --ann_path data/vg/ --batch_size 2 --output_dir ckpt

b) Train RelTR on Open Images V6 on a single node with 8 GPUs (2 images per GPU):

python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --dataset oi --img_folder data/oi/images/ --ann_path data/oi/ --batch_size 2 --output_dir ckpt

Evaluation

a) Evaluate the pretrained RelTR on Visual Genome with a single GPU (1 image per GPU):

python main.py --dataset vg --img_folder data/vg/images/ --ann_path data/vg/ --eval --batch_size 1 --resume ckpt/checkpoint0149.pth

b) Evaluate the pretrained RelTR on Open Images V6 with a single GPU (1 image per GPU):

python main.py --dataset oi --img_folder data/oi/images/ --ann_path data/oi/ --eval --batch_size 1 --resume ckpt/checkpoint0149_oi.pth

3. Questions

Since the code was cleaned up from a draft, there may still be some errors. If you run into any problem when running our code, please let me know! (It's better to open an issue so that everyone can see it.)

reltr's People

Contributors

yrcong

reltr's Issues

Preprocessing Code

Hi, dear authors of RelTR:

Your work is very solid and we are impressed. Is it possible to release the preprocessing code, so we could reproduce the results on other datasets?

When I was training data, I encountered an error

Hello, I am very interested in your research. I have created my own dataset (with new categories and relationships added), and the format of the dataset is the same as in your code. But when I was training, I encountered an error, and I couldn't find any problem with my dataset format. Is there any aspect of the code that needs to be adjusted? Can you help me?

The error is as follows:

C:/w/b/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: block: [0,0,0], thread: [61,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:/w/b/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: block: [0,0,0], thread: [62,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
C:/w/b/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:84: block: [0,0,0], thread: [63,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
  File "main.py", line 240, in <module>
    main(args)
  File "main.py", line 192, in main
    train_stats = train_one_epoch(model, criterion, data_loader_train, optimizer, device, epoch, args.clip_max_norm)
  File "E:\LXD\RelTR-main\engine.py", line 40, in train_one_epoch
    loss_dict = criterion(outputs, targets)
  File "E:\Anaconda3\envs\reltr\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "E:\LXD\RelTR-main\models\reltr.py", line 299, in forward
    indices = self.matcher(outputs_without_aux, targets)
  File "E:\Anaconda3\envs\reltr\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "E:\Anaconda3\envs\reltr\lib\site-packages\torch\autograd\grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "E:\LXD\RelTR-main\models\matcher.py", line 112, in forward
    cost_sub_giou = -generalized_box_iou(box_cxcywh_to_xyxy(sub_bbox), box_cxcywh_to_xyxy(sub_tgt_bbox))
  File "E:\LXD\RelTR-main\util\box_ops.py", line 50, in generalized_box_iou
    assert (boxes1[:, 2:] >= boxes1[:, :2]).all(), boxes1
RuntimeError: CUDA error: device-side assert triggered
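
For context, this kind of device-side assert in the matcher usually means a label index in the annotations is outside the range the model was built for. A minimal, hedged sanity check along these lines (assuming train.json really uses standard COCO-style keys, as described in the data README; adjust the class count for a custom dataset) can help locate bad labels before training:

import json

NUM_ENTITY_CLASSES = 151  # value used for VG in models/reltr.py; change for a custom dataset

with open('data/vg/train.json') as f:
    coco = json.load(f)

# Every COCO-style object annotation must carry a category_id inside the valid range.
bad = [ann for ann in coco['annotations']
       if not 0 <= ann['category_id'] < NUM_ENTITY_CLASSES]
print(f'{len(bad)} object annotations with an out-of-range category_id')

# Predicate indices stored in rel.json should be range-checked the same way
# against the number of relation classes the model is configured with.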

Model not getting trained on single GPU

When I try to train on a single GPU, the error keeps increasing and I cannot see any good results even by the 38th epoch.

train_class_error starts at 97.88 and from the 19th to the 37th epoch it is consistently 100. Can you help debug this?

Please let me know if you need more information.

Train and Test for custom images

How did you get the annotations of Visual Genome (in COCO format)? Do you have tools to process the original datasets? I want to use RelTR on my custom dataset (only pictures). I have seen the KaihuaTang/Scene-Graph-Benchmark.pytorch code, but have no idea how to process my custom dataset for training and testing. Thanks very much!

annotation files

Hey, can you share how to create annotation files for a custom dataset? Thanks.

Regarding Pretrained Model Weights

Dear authors, thanks for your fabulous code! I have a question regarding model training: did you initialize the model with a pretrained DETR model, or did you train the whole network from scratch?

Some misunderstanding about the heat map used to predict relationships

Hello, thank you very much for providing the code. However, in the paper you say that "The predicate probability p̂_prd is predicted by a multi-layer perceptron concatenating the corresponding subject representation, object representation, and spatial feature vector, which can be formulated as:
p̂_prd = softmax(MLP([Q_s, Q_o, V_spa]))."
But in the source code I don't see this concatenation; I only see that you use the subject heat map for prediction. Could you explain it to me? Thank you very much.

  • In the RelTR model: [code screenshot]
  • In the Transformer: [code screenshot]
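
For reference, the formula quoted in this issue corresponds to a head along these lines; this is a minimal illustrative sketch of the paper's equation, not the repository's exact module, and the dimensions (d_model, d_spatial, hidden) are assumptions:

import torch
import torch.nn as nn

class PredicateHead(nn.Module):
    """Sketch of p_prd = softmax(MLP([Q_s, Q_o, V_spa])) from the paper."""

    def __init__(self, d_model=256, d_spatial=64, num_rel_classes=51, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * d_model + d_spatial, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_rel_classes),
        )

    def forward(self, q_sub, q_obj, v_spatial):
        # Concatenate subject representation, object representation and spatial feature,
        # then classify the predicate.
        logits = self.mlp(torch.cat([q_sub, q_obj, v_spatial], dim=-1))
        return logits.softmax(dim=-1)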

Evaluation

I don't understand why class_error, sub_error, obj_error, and rel_error are so high. Did I misunderstand something, or is the evaluation process wrong?

Training on Visual Genome dataset

Hi. Thanks for sharing your nice work!

Do you have any pre-trained weights on full Visual genome dataset? (not split one)

Thanks.

How to generate a graph from an image?

If a scene graph needs to be generated by RelTR, what should be done?
This command only detects the objects in the image:
python inference.py --img_path ./demo/vgx.jpg --resume ./ckpt/checkpoint0149.pth

So, please kindly tell me how to do it, thanks very much!

Number of classes of relations in VG

May I kindly ask why each sample in outputs_class_rel = self.rel_class_embed(torch.cat((hs_sub, hs_obj, so_masks), dim=-1)) at models/reltr.py:111 has dim 52, instead of 51? The processed VG has 50 relation classes, so I assume the dim should be 51 with an additional no relation ('background') class.

Besides, it can be seen from models/reltr.py:213-214

# Count the number of predictions that are NOT "no-object" (which is the last class)
card_pred = (pred_logits.argmax(-1) != pred_logits.shape[-1] - 1).sum(1)

that the last class represents the "no-object" class.

However, in the colab notebook, you said
REL_CLASSES = ['background', 'above', 'across', 'against', 'along', 'and', 'at', 'attached to', 'behind',
'belonging to', 'between', 'carrying', 'covered in', 'covering', 'eating', 'flying in', 'for',
'from', 'growing on', 'hanging from', 'has', 'holding', 'in', 'in front of', 'laying on',
'looking at', 'lying on', 'made of', 'mounted on', 'near', 'of', 'on', 'on back of', 'over',
'painted on', 'parked on', 'part of', 'playing', 'riding', 'says', 'sitting on', 'standing on',
'to', 'under', 'using', 'walking in', 'walking on', 'watching', 'wearing', 'wears', 'with']

I notice that REL_CLASSES has a length of 51, not 52, and 'background' is at index 0, not at the last index.
Is this REL_CLASSES in the Colab the label ordering you use in your training code (in data/vg/rel.json)? Because I am re-organizing the dataset labels for my own project, I need to know the exact ordering of these label indices. Thanks for your assistance!
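
To make the indexing convention concrete, here is a toy illustration of the quoted cardinality computation; the shapes and values are made up:

import torch

# Toy logits: 2 images, 4 relation queries each, 52 classes,
# where the last index is treated as the "no-object"/background class.
pred_logits = torch.randn(2, 4, 52)

# Count the queries per image whose argmax is NOT the last ("no-object") class,
# exactly as in the line quoted from models/reltr.py above.
card_pred = (pred_logits.argmax(-1) != pred_logits.shape[-1] - 1).sum(1)
print(card_pred)  # e.g. tensor([4, 3])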

ImportError: cannot import name '_new_empty_tensor'

Getting this error when trying to run inference.py -- is this related to version issues with Python/conda? I did my best to triage the issue independently, but all I could find is that this is a known issue with PyTorch that was fixed with Python 3.6; however, that is the version I'm running.

Appreciate you for all your work on this btw!

About evaluate_rel_batch() function

Hello, thank you very much for providing the code. I modified the evaluate_rel_batch() function, but it seems to have encountered an error. Could you please help analyze the possible reasons for this error? Thank you very much!

Unable to download checkpoint from drive

Hey!

We tried to automatically download the RelTR checkpoint from drive but had some issues. We did not have this issue with any of the other assets hosted on drive.

Maybe you could help us out / update the permission.

Thanks!

The problem of training time

Thanks for your inspiring work!
According to your report, the model was trained on 8 RTX2080 (2 images per GPU) for 150 epochs to reach the expected performance, which is a bit too long to modify/re-train based on the code.

Therefore, I would like to know if the model took so long to train because the DETR part was completely retrained?
Is it possible to use the pre-trained DETR encoder part to reduce the training overhead?

about Predcls

I found the code about PredCls in this issue: #20 (comment)

You mentioned the function

def evaluate_batch_predcls(outputs, targets, evaluator, matching_indices, evaluator_list):
.......

Could you give some details about how matching_indices is computed? Thanks!

Some questions about evaluation

Hello,
Firstly, I want to express my gratitude for sharing your code. It has been incredibly helpful to me.

I see such evaluation results in your paper

[screenshot: evaluation results from the paper]

How did you get your SGDet, SGCls, PredCls evaluation results?

After running the evaluation with your code, I can't find a correspondence.

[screenshot: evaluation output from the code]

Could you please share the relevant code or explain your evaluation results?

I'm looking forward to your response.

name 'train_stats' is not defined

When I train with a single NVIDIA GeForce RTX 4090, the error name 'train_stats' is not defined is reported. What is the reason for this? The command I am using is: python main.py --dataset vg --img_folder data/vg/images/ --ann_path data/vg/ --batch_size 2 --output_dir ckpt

Question about the demo

Hello! How can I switch to other images in the demo you provide? The image link I substitute raises an error. What are the exact steps?

evaluating RelTR on PredCLS/SGCLS

Dear authors, thanks for your code. I have a question about evaluating RelTR on PredCls/SGCls: how do you assign the ground-truth information to the matched triplet proposals? Thanks a lot.

What happens when there are no relations in a sample?

Hi and thanks for sharing this interesting work!

I am working on supporting another dataset in VG format. In my dataset I have images with no relations between the objects, and I wonder how to make the network support this.

There is an issue since the images still contain object boxes but targets['rel_annotations'] is empty. Therefore, in the matcher's forward pass, indices has a different length from indices1.

I'd appreciate your input. Thanks in advance!
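
One possible (unofficial) workaround is to drop images without any relation triplets when converting a custom dataset to the VG format. The sketch below is purely illustrative; rel_annotations is a hypothetical mapping mirroring the per-image entries of rel.json:

def filter_images_with_relations(image_ids, rel_annotations):
    """Keep only images that have at least one (subject, object, predicate) triplet.

    rel_annotations: hypothetical dict mapping image id -> list of triplets.
    """
    return [img_id for img_id in image_ids if rel_annotations.get(img_id)]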

rel_logits vs pred_logits

Hello,

Thank you for the wonderful work.

In models/reltr.py RelTR module, you specify outputs_class_rel and outputs_class as rel_logits and pred_logits.
What is the difference between these two?

Thank you,
William Han

About OpenV6

Why does the Open Images V6 dataset I processed have 600 object classes, while the article mentions 289 object classes? Were there any additional processing steps involved?

Inquiry about code behavior in relation to constraints of evaluation metrics

Hello,

Firstly, I want to express my gratitude for sharing your code. It has been incredibly helpful to me. However, I have a question regarding the evaluation metric when considering constraints versus without constraints.

From my understanding, the multiple_preds variable is used to control different modes. However, upon reviewing the code, I couldn't identify any discernible differences in actions taken for each mode.

Could you kindly provide some clarification on how the code behaves differently when multiple_preds is set to different values? I would greatly appreciate any insights you can provide.

Thank you once again for your assistance.

Some details about Open Image v6.

Many thanks for the great work.
Since the code for Open Images V6 is not released yet, may I ask about the data split you use for training?
Do you use the train+val splits or only the train split?

about output

I see that in the paper a scene graph is finally generated, but the output of the open-source code is only the attention heat map and the detection results on the image, right? Is there any way to generate the final integrated scene graph?

Memory Utilization Issue

While training the code on a 4-GPU system, the memory utilization suddenly exploded after 5 epochs, killing the process. I was training on a university HPC system with the following specification:
24 cores
128 GB RAM
4 Nvidia Quadro RTX 8000

The problem of RelTR Demo.ipynb

[screenshot of the error]
I would like to ask you a question. When I run the RelTR Demo.ipynb you published, I encounter the error shown in the figure. Also, many tensors in conv_features are zeros. Could you guide me on how to resolve this?

Error during training in bbox.pyx : ValueError: Buffer dtype mismatch, expected 'DTYPE_t' but got 'double'

Hi,
Thank you for your work !
I am trying to train the model on a customized dataset, but after the first epoch I get the following error at these lines of sg_eval.py:
sub_iou = bbox_overlaps(gt_box[None, :4], boxes[:, :4])[0]
obj_iou = bbox_overlaps(gt_box[None, 4:], boxes[:, 4:])[0]

File "bbox.pyx", line 17, in bbox.bbox_overlaps
ValueError: Buffer dtype mismatch, expected 'DTYPE_t' but got 'double'

Have you ever had this issue?

About training strategy

Hi, I'm now training the RelTR model on the VG dataset and I find the training time is quite long. It takes ~2.5 days to train for 150 epochs on 4x3090 with batch size 4. I'm not sure whether I'm doing something wrong or it really does need that much time to train from scratch.

I also want to ask whether you have tried other training strategies, like multi-stage training. For example, in the first stage train the model only for object detection, then in the second stage train only the triplet decoder and freeze the encoder and entity decoder (or update them with a low learning rate). That sounds more practical and should reduce the training time in theory.

Hi, the training results are lower than reported results.

I cloned the code and started training on 2 A40 GPUs, 8 images per GPU. The hyperparameters remain the same as the original ones. Did I miss something? :(

System:
pytorch 2.0.1 py3.10_cuda11.8_cudnn8.7.0_0 pytorch
pytorch-cuda 11.8 h7e8668a_5 pytorch
torchvision 0.15.2 py310_cu118 pytorch

======================sgdet============================
R@20: 0.203254
R@50: 0.250673
R@100: 0.271782

relationship: above
======================sgdet============================
R@20: 0.028179
R@50: 0.047680
R@100: 0.063172

relationship: across
======================sgdet============================
R@20: 0.015873
R@50: 0.015873
R@100: 0.047619

relationship: against
======================sgdet============================
R@20: 0.008065
R@50: 0.008065
R@100: 0.016129

relationship: along
======================sgdet============================
R@20: 0.000000
R@50: 0.000000
R@100: 0.000000

relationship: and
======================sgdet============================
R@20: 0.005917
R@50: 0.005917
R@100: 0.017751

relationship: at
======================sgdet============================
R@20: 0.111198
R@50: 0.163596
R@100: 0.176330

relationship: attached to
======================sgdet============================
R@20: 0.001179
R@50: 0.001474
R@100: 0.006191

relationship: behind
======================sgdet============================
R@20: 0.111431
R@50: 0.186552
R@100: 0.226916

relationship: belonging to
======================sgdet============================
R@20: 0.000000
R@50: 0.000000
R@100: 0.000000

relationship: between
======================sgdet============================
R@20: 0.000000
R@50: 0.000000
R@100: 0.003472

relationship: carrying
======================sgdet============================
R@20: 0.126705
R@50: 0.159264
R@100: 0.183682

relationship: covered in
======================sgdet============================
R@20: 0.016071
R@50: 0.018452
R@100: 0.025595

relationship: covering
======================sgdet============================
R@20: 0.000000
R@50: 0.000000
R@100: 0.012903

relationship: eating
======================sgdet============================
R@20: 0.059603
R@50: 0.094923
R@100: 0.118102

relationship: flying in
======================sgdet============================
R@20: 0.000000
R@50: 0.060606
R@100: 0.060606

relationship: for
======================sgdet============================
R@20: 0.022814
R@50: 0.035109
R@100: 0.039208

relationship: from
======================sgdet============================
R@20: 0.000000
R@50: 0.000000
R@100: 0.000000

relationship: growing on
======================sgdet============================
R@20: 0.000000
R@50: 0.000000
R@100: 0.000000

relationship: hanging from
======================sgdet============================
R@20: 0.000000
R@50: 0.003984
R@100: 0.003984

relationship: has
======================sgdet============================
R@20: 0.270150
R@50: 0.329975
R@100: 0.352970

relationship: holding
======================sgdet============================
R@20: 0.213111
R@50: 0.251752
R@100: 0.267716

relationship: in
======================sgdet============================
R@20: 0.098441
R@50: 0.143542
R@100: 0.169576

relationship: in front of
======================sgdet============================
R@20: 0.030335
R@50: 0.049338
R@100: 0.064679

relationship: laying on
======================sgdet============================
R@20: 0.081081
R@50: 0.121622
R@100: 0.130631

relationship: looking at
======================sgdet============================
R@20: 0.029451
R@50: 0.041499
R@100: 0.059572

relationship: lying on
======================sgdet============================
R@20: 0.040816
R@50: 0.051020
R@100: 0.051020

relationship: made of
======================sgdet============================
R@20: 0.000000
R@50: 0.000000
R@100: 0.000000

relationship: mounted on
======================sgdet============================
R@20: 0.000000
R@50: 0.000000
R@100: 0.000000

relationship: near
======================sgdet============================
R@20: 0.088555
R@50: 0.148588
R@100: 0.188174

relationship: of
======================sgdet============================
R@20: 0.188986
R@50: 0.256768
R@100: 0.279926

relationship: on
======================sgdet============================
R@20: 0.222539
R@50: 0.277110
R@100: 0.302019

relationship: on back of
======================sgdet============================
R@20: 0.000000
R@50: 0.000000
R@100: 0.000000

relationship: over
======================sgdet============================
R@20: 0.023573
R@50: 0.034739
R@100: 0.042184

relationship: painted on
======================sgdet============================
R@20: 0.000000
R@50: 0.000000
R@100: 0.000000

relationship: parked on
======================sgdet============================
R@20: 0.048555
R@50: 0.084791
R@100: 0.108387

relationship: part of
======================sgdet============================
R@20: 0.000000
R@50: 0.000000
R@100: 0.000000

relationship: playing
======================sgdet============================
R@20: 0.000000
R@50: 0.000000
R@100: 0.090909

relationship: riding
======================sgdet============================
R@20: 0.215077
R@50: 0.267793
R@100: 0.282938

relationship: says
======================sgdet============================
R@20: 0.000000
R@50: 0.083333
R@100: 0.083333

relationship: sitting on
======================sgdet============================
R@20: 0.098548
R@50: 0.150509
R@100: 0.164636

relationship: standing on
======================sgdet============================
R@20: 0.048238
R@50: 0.071938
R@100: 0.090096

relationship: to
======================sgdet============================
R@20: 0.000000
R@50: 0.000000
R@100: 0.000000

relationship: under
======================sgdet============================
R@20: 0.074767
R@50: 0.101710
R@100: 0.121233

relationship: using
======================sgdet============================
R@20: 0.114583
R@50: 0.140625
R@100: 0.170000

relationship: walking in
======================sgdet============================
R@20: 0.000000
R@50: 0.004695
R@100: 0.004695

relationship: walking on
======================sgdet============================
R@20: 0.038874
R@50: 0.089455
R@100: 0.111126

relationship: watching
======================sgdet============================
R@20: 0.008418
R@50: 0.061785
R@100: 0.083550

relationship: wearing
======================sgdet============================
R@20: 0.386910
R@50: 0.412767
R@100: 0.421221

relationship: wears
======================sgdet============================
R@20: 0.004430
R@50: 0.024502
R@100: 0.044608

relationship: with
======================sgdet============================
R@20: 0.029754
R@50: 0.070640
R@100: 0.093761

======================sgdet mean recall with constraint============================
mR@20: 0.05724454045111795
mR@50: 0.08143982200281324
mR@100: 0.09561244052384507
Averaged stats: class_error: 60.00 sub_error: 50.00 obj_error: 0.00 rel_error: 75.00 loss: 16.0778 (18.9691) loss_ce: 0.2776 (0.4303) loss_bbox: 0.8823 (0.9988) loss_giou: 1.0329 (1.0426) loss_rel: 0.4454 (0.6207) loss_ce_0: 0.3065 (0.4637) loss_bbox_0: 1.0056 (1.1189) loss_giou_0: 1.1239 (1.1814) loss_rel_0: 0.4299 (0.5930) loss_ce_1: 0.3023 (0.4488) loss_bbox_1: 0.9873 (1.0519) loss_giou_1: 1.0969 (1.1049) loss_rel_1: 0.3992 (0.5942) loss_ce_2: 0.2839 (0.4382) loss_bbox_2: 0.8772 (1.0255) loss_giou_2: 1.0365 (1.0784) loss_rel_2: 0.4104 (0.5954) loss_ce_3: 0.2801 (0.4330) loss_bbox_3: 0.7734 (1.0091) loss_giou_3: 1.0344 (1.0574) loss_rel_3: 0.4219 (0.5989) loss_ce_4: 0.2719 (0.4310) loss_bbox_4: 0.7968 (0.9987) loss_giou_4: 1.0301 (1.0459) loss_rel_4: 0.4093 (0.6088) loss_ce_unscaled: 0.2776 (0.4303) class_error_unscaled: 28.5714 (34.1161) sub_error_unscaled: 33.3333 (54.1432) obj_error_unscaled: 33.3333 (48.1875) loss_bbox_unscaled: 0.1765 (0.1998) loss_giou_unscaled: 0.5165 (0.5213) cardinality_error_unscaled: 7.0000 (7.0240) loss_rel_unscaled: 0.4454 (0.6207) rel_error_unscaled: 56.2500 (66.5290) loss_ce_0_unscaled: 0.3065 (0.4637) loss_bbox_0_unscaled: 0.2011 (0.2238) loss_giou_0_unscaled: 0.5619 (0.5907) cardinality_error_0_unscaled: 7.0000 (8.3696) loss_rel_0_unscaled: 0.4299 (0.5930) loss_ce_1_unscaled: 0.3023 (0.4488) loss_bbox_1_unscaled: 0.1975 (0.2104) loss_giou_1_unscaled: 0.5484 (0.5524) cardinality_error_1_unscaled: 8.0000 (8.0064) loss_rel_1_unscaled: 0.3992 (0.5942) loss_ce_2_unscaled: 0.2839 (0.4382) loss_bbox_2_unscaled: 0.1754 (0.2051) loss_giou_2_unscaled: 0.5182 (0.5392) cardinality_error_2_unscaled: 5.0000 (7.8084) loss_rel_2_unscaled: 0.4104 (0.5954) loss_ce_3_unscaled: 0.2801 (0.4330) loss_bbox_3_unscaled: 0.1547 (0.2018) loss_giou_3_unscaled: 0.5172 (0.5287) cardinality_error_3_unscaled: 6.0000 (7.5166) loss_rel_3_unscaled: 0.4219 (0.5989) loss_ce_4_unscaled: 0.2719 (0.4310) loss_bbox_4_unscaled: 0.1594 (0.1997) loss_giou_4_unscaled: 0.5150 (0.5229) cardinality_error_4_unscaled: 5.0000 (6.8948) loss_rel_4_unscaled: 0.4093 (0.6088)
Accumulating evaluation results...
DONE (t=127.17s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.132
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.262
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.115
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.032
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.086
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.183
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.210
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.330
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.337
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.136
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.272
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.403

dataset bbox

Hi, Yuren:
I want to know the format of annotation['bbox'] in the train/val/test.json files of Visual Genome (in COCO format): xyxy or xywh?
If it is xyxy, which two points of the bounding box do the two (x, y) pairs represent?
If it is xywh, which point does (x, y) represent, and where is the (0, 0) point of the picture?
Thanks!
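
For reference, a standard COCO-format bbox is [x, y, width, height] with (x, y) the top-left corner and the image origin (0, 0) at the top-left. Assuming this repository's JSON files follow that convention (worth confirming against the data README), the conversion to corner format is:

def xywh_to_xyxy(bbox):
    """Convert a COCO-style [x, y, w, h] box (top-left corner plus size,
    origin at the image's top-left) into [x1, y1, x2, y2] corners."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]

print(xywh_to_xyxy([10, 20, 30, 40]))  # -> [10, 20, 40, 60]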

Evaluation on Colab failing

Hello Authors,

As instructed, we have created a conda environment (python 3.6) to run inference and evaluation of RelTR. However, we are getting some unexpected errors and are not sure how to proceed.

Any feedback would be appreciated!

Error stack trace:

Not using distributed mode
git:
  sha: 4c9557165e8a8d9c90ca263aa9d2be82f70c1ace, status: has uncommited changes, branch: main

Namespace(ann_path='./data/vg/', aux_loss=True, backbone='resnet50', batch_size=1, bbox_loss_coef=5, clip_max_norm=0.1, dataset='vg', dec_layers=6, device='cuda', dilation=False, dim_feedforward=2048, dist_url='env://', distributed=False, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=150, eval=True, frozen_weights=None, giou_loss_coef=2, hidden_dim=256, img_folder='data/vg/images/', lr=0.0001, lr_backbone=1e-05, lr_drop=100, nheads=8, num_entities=100, num_triplets=200, num_workers=2, output_dir='', position_embedding='sine', pre_norm=False, rel_loss_coef=1, resume='ckpt/checkpoint0149.pth', return_interm_layers=False, seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, set_iou_threshold=0.7, start_epoch=0, weight_decay=0.0001, world_size=1)
number of params: 63679528
loading annotations into memory...
Done (t=2.56s)
creating index...
index created!
loading annotations into memory...
Done (t=1.17s)
creating index...
index created!
Traceback (most recent call last):
  File "main.py", line 239, in <module>
    main(args)
  File "main.py", line 171, in main
    checkpoint = torch.load(args.resume, map_location='cpu')
  File "/usr/local/lib/python3.6/site-packages/torch/serialization.py", line 585, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/local/lib/python3.6/site-packages/torch/serialization.py", line 755, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, '<'.


AttributeError: 'Namespace' object has no attribute 'dataset'

Hi, I just followed the README usage and typed this command:
python inference.py --img_path demo/vg2.jpg --resume ckpt/checkpoint0149.pth
and I get this:

(scene_graph_benchmark) bash-4.2$ python inference.py --img_path demo/vg2.jpg --resume ckpt/checkpoint0149.pth
Namespace(aux_loss=True, backbone='resnet50', dec_layers=6, device='cuda', dilation=False, dim_feedforward=2048, dropout=0.1, enc_layers=6, hidden_dim=256, img_path='demo/vg2.jpg', lr_backbone=1e-05, nheads=8, num_entities=100, num_triplets=200, position_embedding='sine', pre_norm=False, resume='ckpt/checkpoint0149.pth', return_interm_layers=False)
yes
Traceback (most recent call last):
  File "inference.py", line 191, in <module>
    main(args)
  File "inference.py", line 104, in main
    model = build_model(args)
  File "/home/user/JL/myhome/juyterNotebook_folder/test/test_for_code/sgg_for_sgbEnv/reltr/RelTR-main/models/__init__.py", line 5, in build_model
    return build(args)
  File "/home/user/JL/myhome/juyterNotebook_folder/test/test_for_code/sgg_for_sgbEnv/reltr/RelTR-main/models/reltr.py", line 377, in build
    num_classes = 151 if args.dataset != 'oi' else None #TODO: openimage v6
AttributeError: 'Namespace' object has no attribute 'dataset'

RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

This happens when I want to train RelTR on Open Images V6 with a single GPU:
python main.py --dataset oi --img_folder /home/ybz/RelTR/data/oi/images/ --ann_path /home/ybz/RelTR/data/ --batch_size 1 --output_dir ckpt1
