
spatially-conditioned-graphs's Introduction

Spatially Conditioned Graphs

NEW! Check out our most recent work on transformer-based HOI detection here.

[Figure: spatially conditioned graph]

[Figure: multi-branch fusion]

This repository contains the official PyTorch implementation for ICCV 2021 paper

Frederic Z. Zhang, Dylan Campbell and Stephen Gould. Spatially Conditioned Graphs for Detecting Human-Object Interactions. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 13319-13327, October 2021.

[paper] [supp] [preprint] [video]

Citation

If you find this repository useful for your research, please kindly cite our paper:

@inproceedings{zhang2021scg,
  author    = {Frederic Z. Zhang and Dylan Campbell and Stephen Gould},
  title     = {Spatially Conditioned Graphs for Detecting Human–Object Interactions},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2021},
  pages     = {13319-13327}
}

Table of Contents

  • Prerequisites
  • Demonstration
  • Data Utilities
  • Testing
  • Training
  • Contact

Prerequisites

  1. Download the repository with git clone https://github.com/fredzzhang/spatially-conditioned-graphs
  2. Install the lightweight deep learning library Pocket
  3. Make sure the environment you created for Pocket is activated. You are good to go!

Demonstration

To generate the qualitative results shown in the paper, please follow the instructions in the diagnosis package at spatially-conditioned-graphs/diagnosis/.

Data Utilities

The HICO-DET and V-COCO repos have been incorporated as submodules for convenience. To download relevant data utilities, run the following commands.

cd /path/to/spatially-conditioned-graphs
git submodule init
git submodule update

HICO-DET

  1. Download the HICO-DET dataset
    1. If you have not downloaded the dataset before, run the following script
    cd /path/to/spatially-conditioned-graphs/hicodet
    bash download.sh
    2. If you have previously downloaded the dataset, simply create a soft link
    cd /path/to/spatially-conditioned-graphs/hicodet
    ln -s /path/to/hico_20160224_det ./hico_20160224_det
  2. Run a Faster R-CNN pre-trained on MS COCO to generate detections
cd /path/to/spatially-conditioned-graphs/hicodet/detections
python preprocessing.py --partition train2015
python preprocessing.py --partition test2015
  3. Download fine-tuned detections
cd /path/to/spatially-conditioned-graphs/download
bash download_finetuned_detections.sh
  4. Generate ground truth detections (optional)
cd /path/to/spatially-conditioned-graphs/hicodet/detections
python generate_gt_detections.py --partition test2015 

V-COCO

  1. Download the train2014 and val2014 partitions of the COCO dataset
    1. If you have not downloaded the dataset before, run the following script
    cd /path/to/spatially-conditioned-graphs/vcoco
    bash download.sh
    2. If you have previously downloaded the dataset, simply create a soft link. Note that the link should be named mscoco2014 and point to the directory containing the train2014 and val2014 partitions
    cd /path/to/spatially-conditioned-graphs/vcoco
    ln -s /path/to/coco ./mscoco2014
  2. Run a Faster R-CNN pre-trained on MS COCO to generate detections
cd /path/to/spatially-conditioned-graphs/vcoco/detections
python preprocessing.py --partition trainval
python preprocessing.py --partition test

Testing

HICO-DET

  1. Download the checkpoint of our trained model
cd /path/to/spatially-conditioned-graphs/download
bash download_checkpoint.sh
  2. Test a model
cd /path/to/spatially-conditioned-graphs
CUDA_VISIBLE_DEVICES=0 python test.py --model-path checkpoints/scg_1e-4_b32h16e7_hicodet_e2e.pt

By default, detections from a pre-trained detector are used. To change the source of detections, use the argument --detection-dir, e.g. --detection-dir hicodet/detections/test2015_gt to select ground-truth detections. Fine-tuned detections (if you downloaded them) are available under hicodet/detections.

  3. Cache detections for Matlab evaluation following HO-RCNN (optional)
cd /path/to/spatially-conditioned-graphs
CUDA_VISIBLE_DEVICES=0 python cache.py --model-path checkpoints/scg_1e-4_b32h16e7_hicodet_e2e.pt

By default, 80 .mat files, one for each object class, will be cached in a directory named matlab. Use the --cache-dir argument to change the cache directory. To change the source of detections, refer to the use of --detection-dir in the previous section.

As a reference, the performance of the provided model is shown in the table below.

Each triplet reports the mAP on the full, rare and non-rare sets, respectively.

| Detections | Default Setting | Known Object Setting |
|:--|:--:|:--:|
| Pre-trained on MS COCO | (21.85, 18.11, 22.97) | (25.53, 21.79, 26.64) |
| *Fine-tuned on HICO-DET (DRG) | (31.33, 24.72, 33.31) | (34.37, 27.18, 36.52) |
| Fine-tuned DETR-R101 (here) | (29.26, 24.61, 30.65) | (32.87, 27.89, 34.35) |
| Ground truth detections | (51.53, 41.02, 54.67) | (51.75, 41.40, 54.84) |

*The detections provided by the DRG repo were produced by a Cascade R-CNN with a ResNeXt-152 backbone, which is not directly comparable to the object detectors commonly used in the literature.

V-COCO

We did not implement evaluation utilities for V-COCO; instead, we use the utilities provided by Gupta. To generate the required pickle file, run the following script, specifying the path to a model with --model-path.

cd /path/to/spatially-conditioned-graphs
CUDA_VISIBLE_DEVICES=0 python cache.py --dataset vcoco --data-root vcoco \
    --detection-dir vcoco/detections/test \
    --cache-dir vcoco_cache --partition test \
    --model-path /path/to/a/model

This will generate a file named vcoco_results.pkl under vcoco_cache in the current directory. Please refer to the v-coco repo (not to be confused with vcoco, the submodule) for further instructions. Note that loading the pickle file requires a particular class CacheTemplate, which is shown below in its entirety.

from collections import defaultdict
class CacheTemplate(defaultdict):
    """A template for VCOCO cached results """
    def __init__(self, **kwargs):
        super().__init__()
        for k, v in kwargs.items():
            self[k] = v
    def __missing__(self, k):
        seg = k.split('_')
        # Assign zero score to missing actions
        if seg[-1] == 'agent':
            return 0.
        # Assign zero score and a tiny box to missing <action,role> pairs
        else:
            return [0., 0., .1, .1, 0.]

You can either add it into the evaluation code or save it as a separate file to import from.

Training

HICO-DET

cd /path/to/spatially-conditioned-graphs
python main.py --world-size 8 --cache-dir checkpoints/hicodet &>log &

Specify the number of GPUs to use with the argument --world-size. The default sub-batch size is 4 (per GPU). The provided model was trained with 8 GPUs, giving an effective batch size of 32. Reducing the effective batch size could result in slightly inferior performance. The default learning rate for a batch size of 32 is 0.0001. As a rule of thumb, scale the learning rate proportionally when changing the batch size, e.g. 0.00005 for a batch size of 16 (see the sketch below).

It is recommended to redirect stdout and stderr to a file to save the training log (as indicated by &>log). To check the progress, run cat log | grep mAP, or go through the log with vim log. Note that the mAP logged during training follows a slightly different protocol and does NOT necessarily correlate with the mAP reported by the community; it only serves as a diagnostic tool. To measure the true performance of the model, run a separate test as shown in the previous section.

By default, checkpoints will be saved under checkpoints in the current directory. For more arguments, run python main.py --help. We follow the early-stopping training strategy and have concluded (using a validation set split from the training set) that the model at epoch 7 should be picked. Training on 8 GeForce GTX TITAN X devices takes about 5 hours.
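
A minimal sketch of the linear scaling rule, assuming the default sub-batch size of 4 per GPU; the reference point (batch size 32 at learning rate 0.0001) comes from the text above, and the function name is illustrative.

def scaled_learning_rate(world_size, sub_batch_size=4,
                         ref_batch_size=32, ref_lr=1e-4):
    """Scale the learning rate proportionally to the effective batch size."""
    effective_batch_size = world_size * sub_batch_size
    return ref_lr * effective_batch_size / ref_batch_size

# 4 GPUs x 4 images per GPU = effective batch size 16 -> learning rate 5e-05
print(scaled_learning_rate(world_size=4))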

V-COCO

cd /path/to/spatially-conditioned-graphs
python main.py --world-size 8 \
    --dataset vcoco --partitions trainval val --data-root vcoco \
    --train-detection-dir vcoco/detections/trainval \
    --val-detection-dir vcoco/detections/trainval \
    --print-interval 20 --cache-dir checkpoints/vcoco &>log &

Contact

If you have any questions regarding our paper or the repo, please post them in discussions. If you run into issues related to the code, feel free to open an issue. Alternatively, you can contact me at [email protected]


spatially-conditioned-graphs's Issues

About label generation

Hello, I am a bit confused about the calculation of the classification loss in your code. If you have time, can you give me an answer?
https://github.com/fredzzhang/spatio-attentive-graphs/blob/main/interaction_head.py#L182
In your postprocess function, you pass in the one-hot predictions for each pair together with the one-hot labels, and then apply a series of operations that convert the one-hot labels into something I don't quite understand.

json file

Hello, this is my first project on HOI detection. Could you explain what instances_train2015.json, instances_val2015.json, coco80tohico80.json and coco91tohico80.json represent?

Remove single-GPU training script

As training on a single GPU can be achieved by setting the --world-size option to 1, the single-GPU training script is no longer maintained and will be removed.

KeyError when indexing dict using Tensor

In interaction_head.py, the mapping from object classes to action/verb classes uses a torch.LongTensor as the index, which works for a list but not for a dict; a minimal reproduction is sketched after the snippet below.

428        # Map object class index to target class index
429        # Object class index to target class index is a one-to-many mapping
430        target_cls_idx = [self.object_class_to_target_class[obj]
431            for obj in object_class[y]]
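
A minimal sketch of the failure mode described above, not the repository's actual code; the mapping and class indices are made up for illustration.

import torch

# Hypothetical one-to-many mapping from object class index to target class indices.
object_class_to_target_class = {0: [3, 7], 1: [2]}
object_class = torch.tensor([0, 1, 0])

# Fails with KeyError: iterating over a tensor yields 0-d tensors, which hash
# by identity rather than by value and therefore never match the integer keys.
# target_cls_idx = [object_class_to_target_class[obj] for obj in object_class]

# Works: convert each element to a Python int first.
target_cls_idx = [object_class_to_target_class[obj.item()] for obj in object_class]
print(target_cls_idx)  # [[3, 7], [2], [3, 7]]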

Training GPU+CPU Utilization Stops

I'm trying the training procedure as laid out by the README file, and ran CUDA_VISIBLE_DEVICES=0 python main.py &>log &.

It seems to run fine up until near the end of the first epoch, where the GPU and CPU utilization completely stops. The utilization never recovers, so the first epoch never actually finishes.

Here is the output from my log:

Namespace(batch_size=4, cache_dir='./checkpoints', data_root='hicodet', human_thresh=0.2,
learning_rate=0.001, lr_decay=0.1, milestones=[10], model_path='', momentum=0.9, 
num_epochs=15, num_iter=2, num_workers=4, object_thresh=0.2, print_interval=2000,
random_seed=1, train_detection_dir='hicodet/detections/train2015', 
val_detection_dir='hicodet/detections/test2015', weight_decay=0.0001)
Epoch [1/15], Iter. [2000/9409], Loss: 1.3726, Time[Data/Iter.]: [3.80s/1123.74s]
Epoch [1/15], Iter. [4000/9409], Loss: 1.1580, Time[Data/Iter.]: [3.53s/1105.00s]
Epoch [1/15], Iter. [6000/9409], Loss: 1.0998, Time[Data/Iter.]: [3.52s/1102.66s]
Epoch [1/15], Iter. [8000/9409], Loss: 1.0792, Time[Data/Iter.]: [3.72s/1140.21s]

My system specs as well:
OS: Pop!_OS 20.04 LTS x86_64
CPU: AMD Ryzen 7 2700X (16) @ 3.700G
GPU: NVIDIA GeForce RTX 2070 SUPER
Memory: 16017MiB
CUDA: 10.2

How to use the 'CacheTemplate' class when testing on V-COCO?

Hi,

Recently, I tried to test a model on the V-COCO dataset. I found that there is a class 'CacheTemplate' provided in README.md; however, I don't know how to use it correctly.

I tried to add that class to the V-COCO eval file directly, but it doesn't work: the script kept loading the detection file for over half an hour and nothing happened. Could you please give more details on using that class? Thanks a lot!

Here is my evaluation code.

from vsrl_eval import VCOCOeval
from collections import defaultdict

class CacheTemplate(defaultdict):
    """A template for VCOCO cached results """
    def __init__(self, **kwargs):
        super().__init__()
        for k, v in kwargs.items():
            self[k] = v
    def __missing__(self, k):
        seg = k.split('_')
        # Assign zero score to missing actions
        if seg[-1] == 'agent':
            return 0.
        # Assign zero score and a tiny box to missing <action,role> pairs
        else:
            return [0., 0., .1, .1, 0.]


if __name__ == "__main__":
    vsrl_annot_file = "data/vcoco/vcoco_test.json"
    coco_file = "data/instances_vcoco_all_2014.json"
    split_file = "data/splits/vcoco_test.ids"
    det_file = "../SCG/vcoco_cache/vcoco_results.pkl"
    vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)
    vcocoeval._do_eval(det_file, ovr_thresh=0.5)

Average out the denominator when computing focal loss

The focal loss used in the model is normalised by the number of positive logits, which tends to have unstable statistics. Therefore, the normaliser should be averaged across all sub-batches (ranks) to better utilise the large batch size; see the sketch below.
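
A minimal sketch of the idea, assuming a sigmoid (binary) focal loss and a standard torch.distributed setup; this illustrates averaging the normaliser across ranks and is not the repository's actual implementation.

import torch
import torch.distributed as dist
import torch.nn.functional as F

def normalised_focal_loss(logits, targets, gamma=2.0):
    """Binary focal loss, summed and divided by a cross-rank-averaged count of positives."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    # Focal modulation (1 - p_t) ** gamma, where p_t is the probability of the true class.
    modulator = ((1 - p) * targets + p * (1 - targets)) ** gamma
    loss = (modulator * ce).sum()

    # Number of positive logits in the local sub-batch.
    n_pos = targets.sum()
    if dist.is_available() and dist.is_initialized():
        # Sum the counts over all ranks and average them, so that every rank
        # divides by the same, more stable normaliser.
        dist.all_reduce(n_pos)
        n_pos = n_pos / dist.get_world_size()
    return loss / n_pos.clamp(min=1)

# Example (single process): two box pairs scored over three action classes.
logits = torch.randn(2, 3)
targets = torch.tensor([[1., 0., 0.], [0., 1., 1.]])
print(normalised_focal_loss(logits, targets))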

Add group batch sampler

When input images are batched, zero-padding is applied to fill in the gaps. This results in large inputs when the images in a batch have very different aspect ratios. Following the torchvision references, images with similar aspect ratios will be batched together as much as possible; see the sketch below.
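
A minimal sketch of the grouping idea, in the spirit of the grouped batch sampler used in the torchvision references; the two-bucket rule and the function name are illustrative, not the actual sampler.

from collections import defaultdict

def group_batches(aspect_ratios, batch_size):
    """Yield batches of dataset indices whose images fall in the same aspect-ratio bucket."""
    buckets = defaultdict(list)
    for idx, ar in enumerate(aspect_ratios):
        # Two coarse buckets: landscape (ar >= 1) and portrait (ar < 1).
        buckets[ar >= 1].append(idx)
    for indices in buckets.values():
        for i in range(0, len(indices), batch_size):
            yield indices[i:i + batch_size]

# Example: five images with the given width/height ratios, batched in pairs.
print(list(group_batches([1.33, 0.75, 1.78, 0.56, 1.0], batch_size=2)))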

About mAP in your paper

The highest mAP reported in this repo uses the DRG detections, but one of your issues says that your highest mAP is 28.54.

TypeError: pic should be PIL Image or ndarray. Got <class 'torch.Tensor'>

The full error is as follows:

Traceback (most recent call last):
  File "test.py", line 91, in <module>
    main(args)
  File "test.py", line 68, in main
    test_ap = test(net, dataloader)
  File "/home/fred/spatio-attentive-graphs/utils.py", line 128, in test
    for batch in tqdm(test_loader):
  File "/home/fred/miniconda3/envs/pocket/lib/python3.7/site-packages/tqdm/std.py", line 1167, in __iter__
    for obj in iterable:
  File "/home/fred/miniconda3/envs/pocket/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/home/fred/miniconda3/envs/pocket/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
  File "/home/fred/miniconda3/envs/pocket/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/fred/miniconda3/envs/pocket/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/fred/spatio-attentive-graphs/utils.py", line 113, in __getitem__
    image = pocket.ops.to_tensor(image, 'pil')
  File "/home/fred/pkgs/pocket/pocket/ops/transforms.py", line 33, in to_tensor
    return torchvision.transforms.functional.to_tensor(x).to(
  File "/home/fred/miniconda3/envs/pocket/lib/python3.7/site-packages/torchvision/transforms/functional.py", line 50, in to_tensor
    raise TypeError('pic should be PIL Image or ndarray. Got {}'.format(type(pic)))
TypeError: pic should be PIL Image or ndarray. Got <class 'torch.Tensor'>

Inference code

@fredzzhang @nikanor97 Hi, thanks for sharing the code base. Great work!

  1. Is there inference code that we can run on custom data / custom videos and visualise the results?
  2. Currently, when I tested the model using test.py on some scenes from the validation data, such as a single person running on a beach with no other objects present, there were no detections/activities in the output. Is there any way to get results like people walking, fighting or waving without depending on the objects present in the scene?

Thanks in advance

A bug in main.py

Dear Professor, excuse me, I have a problem that I hope you can help me solve. When I run main.py, it fails with the error shown in the screenshot below.
[screenshot]

I have followed your guide step by step, but it still does not run correctly. I would be very grateful if you could give me a solution when you have time. Thank you very much!

Exception happened during the evaluation of VCOCO

Hi, I trained your method on the V-COCO dataset and generated vcoco_results.pkl. However, when I ran the utilities provided by Gupta, I got the exception shown in the screenshot below:
[screenshot]

Could you please kindly help me handle this problem? Thanks a lot. I have also attached my evaluation code below:

from vsrl_eval import VCOCOeval
from collections import defaultdict

class CacheTemplate(defaultdict):
    """A template for VCOCO cached results """
    def __init__(self, **kwargs):
        super().__init__()
        for k, v in kwargs.items():
            self[k] = v
    def __missing__(self, k):
        seg = k.split('_')
        # Assign zero score to missing actions
        if seg[-1] == 'agent':
            return 0.
        # Assign zero score and a tiny box to missing <action,role> pairs
        else:
            return [0., 0., .1, .1, 0.]



if __name__ == "__main__":
    vsrl_annot_file = "data/vcoco/vcoco_val.json"
    coco_file = "data/instances_vcoco_all_2014.json"
    split_file = "data/splits/vcoco_val.ids"
    det_file = "spatio-attentive-graphs/vcoco_cache/vcoco_results.pkl"
    vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)
    vcocoeval._do_eval(det_file, ovr_thresh=0.5)

Fix relative paths for the demo code

The demo script demo.py under diagnosis had incorrect paths for relative imports after the relocation. To import the relevant modules, the parent directory should be added to the module search path; see the sketch below.
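
A minimal sketch of the fix, assuming demo.py sits in diagnosis/ and the modules it needs sit in the repository root; the exact imports are not shown.

import os
import sys

# Prepend the repository root (the parent of diagnosis/) to the module search
# path so that modules defined at the top level of the repo can be imported.
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))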

Running HICO Evaluation

Hi @fredzzhang, thank you for your great work!
I am trying to reproduce the reported mAPs of the pre-trained COCO version, but I'm confused about how to get the HICO evaluation running.

I ran the network through test2015 successfully and generated both the JSON and .mat files. I tried evaluating the .mat files with eval_run.m from HO-RCNN but got much lower mAPs:

[screenshots]

Is there something I am missing? I would appreciate any help.

How can I use the pre-trained HICO-DET model for action detection on OKVQA?

Hi,
I want to retrieve action detections on the OKVQA dataset, using only the pre-trained HICO-DET model. Can you please guide me on how to do that?

Also, do I need object detections on OKVQA beforehand in order to use the pre-trained HICO-DET model on it?

I read #63 too, but I didn't understand how to apply it to my problem.

About mAP in your paper

Hi, I found that your paper reports a model with an mAP of 27.18 on HICO-DET. I wonder how this model was trained, as I did not find a corresponding one in this repo.

"Too many open files" when running test.py

The problem originates in the multiprocessing when computing average precisions. Refer to fredzzhang/pocket#6 for details.

Specifying the number of processes has partially fixed the problem: during training, computing the classification mAP no longer results in the same error, but during testing, computing the detection mAP still does. A sketch of capping the pool size is shown below.
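
A minimal sketch of what capping the number of processes looks like, using the standard library multiprocessing pool; the per-class AP computation is a placeholder, not the actual evaluation code.

import multiprocessing as mp

def average_precision(task):
    """Placeholder per-class AP computation; returns a dummy value here."""
    scores, labels = task
    return sum(labels) / max(len(labels), 1)

if __name__ == '__main__':
    # One task per object class (80 for HICO-DET); cap the pool size instead of
    # defaulting to os.cpu_count(), so fewer file handles are open at once.
    tasks = [([0.9, 0.1, 0.3], [1, 0, 1]) for _ in range(80)]
    with mp.Pool(processes=4) as pool:
        aps = pool.map(average_precision, tasks)
    print(sum(aps) / len(aps))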

How to train to get the model with mAP of 28.54

Sorry to bother you, but would you mind giving me more tips on how to achieve the mAP of 28.54? I downloaded fasterrcnn_resnet50_fpn_hicodet_e13.pt but do not know how to use it. Should I train the network with the detections produced by this .pt file, i.e. with python preprocessing.py --partition test2015 ckpt_path='path to .pt'? (Did you forget an h after .pt?)

Thx.

Multi-GPU fine-tuning of Faster R-CNN

When I use your detector fine-tuning script in a multi-GPU setting, the following problem occurs. Do you have any good solutions?

RuntimeError: expected device cuda:1 and dtype Float but got device cuda:0 and dtype Float                                                                 

About fine-tuned detector

Hi, I wonder whether, after fine-tuning the object detector on HICO-DET, you retrain your HOI classification model or just replace the object detector at test time.
