
spatially-conditioned-graphs's Introduction

Spatially Conditioned Graphs

NEW! Check out our most recent work on transformer-based HOI detection here.

[Figure: spatially conditioned graph]

[Figure: multi-branch fusion]

This repository contains the official PyTorch implementation for ICCV 2021 paper

Frederic Z. Zhang, Dylan Campbell and Stephen Gould. Spatially Conditioned Graphs for Detecting Human-Object Interactions. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 13319-13327, October 2021.

[paper] [supp] [preprint] [video]

Citation

If you find this repository useful for your research, please kindly cite our paper:

@inproceedings{zhang2021scg,
  author    = {Frederic Z. Zhang and Dylan Campbell and Stephen Gould},
  title     = {Spatially Conditioned Graphs for Detecting Human–Object Interactions},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2021},
  pages     = {13319-13327}
}

Table of Contents

  • Prerequisites
  • Demonstration
  • Data Utilities
  • Testing
  • Training
  • Contact

Prerequisites

  1. Download the repository with git clone https://github.com/fredzzhang/spatially-conditioned-graphs
  2. Install the lightweight deep learning library Pocket
  3. Make sure the environment you created for Pocket is activated. You are good to go!

Demonstration

To generate the qualitative results shown in the paper, please follow the instructions in the diagnosis package at spatially-conditioned-graphs/diagnosis/.

Data Utilities

The HICO-DET and V-COCO repos have been incorporated as submodules for convenience. To download relevant data utilities, run the following commands.

cd /path/to/spatially-conditioned-graphs
git submodule init
git submodule update

HICO-DET

  1. Download the HICO-DET dataset
    1. If you have not downloaded the dataset before, run the following script
    cd /path/to/spatially-conditioned-graphs/hicodet
    bash download.sh
    2. If you have previously downloaded the dataset, simply create a soft link
    cd /path/to/spatially-conditioned-graphs/hicodet
    ln -s /path/to/hico_20160224_det ./hico_20160224_det
  2. Run a Faster R-CNN pre-trained on MS COCO to generate detections
cd /path/to/spatially-conditioned-graphs/hicodet/detections
python preprocessing.py --partition train2015
python preprocessing.py --partition test2015
  3. Download fine-tuned detections
cd /path/to/spatially-conditioned-graphs/download
bash download_finetuned_detections.sh
  4. Generate ground truth detections (optional)
cd /path/to/spatially-conditioned-graphs/hicodet/detections
python generate_gt_detections.py --partition test2015 

V-COCO

  1. Download the train2014 and val2014 partitions of the COCO dataset
    1. If you have not downloaded the dataset before, run the following script
    cd /path/to/spatially-conditioned-graphs/vcoco
    bash download.sh
    2. If you have previously downloaded the dataset, simply create a soft link. Note that the link should be named mscoco2014 and point to the directory containing the train2014 and val2014 partitions
    cd /path/to/spatially-conditioned-graphs/vcoco
    ln -s /path/to/coco ./mscoco2014
  2. Run a Faster R-CNN pre-trained on MS COCO to generate detections
cd /path/to/spatially-conditioned-graphs/vcoco/detections
python preprocessing.py --partition trainval
python preprocessing.py --partition test

Testing

HICO-DET

  1. Download the checkpoint of our trained model
cd /path/to/spatially-conditioned-graphs/download
bash download_checkpoint.sh
  2. Test a model
cd /path/to/spatially-conditioned-graphs
CUDA_VISIBLE_DEVICES=0 python test.py --model-path checkpoints/scg_1e-4_b32h16e7_hicodet_e2e.pt

By default, detections from a pre-trained detector are used. To change the source of detections, use the argument --detection-dir, e.g. --detection-dir hicodet/detections/test2015_gt to select ground-truth detections. Fine-tuned detections (if you downloaded them) are available under hicodet/detections.

  3. Cache detections for Matlab evaluation following HO-RCNN (optional)
cd /path/to/spatially-conditioned-graphs
CUDA_VISIBLE_DEVICES=0 python cache.py --model-path checkpoints/scg_1e-4_b32h16e7_hicodet_e2e.pt

By default, 80 .mat files, one for each object class, will be cached in a directory named matlab. Use the --cache-dir argument to change the cache directory. To change the source of detections, refer to the use of --detection-dir in the previous section.

As a reference, the performance of the provided model is shown in the table below.

Each triplet reports the mAP on the full, rare and non-rare sets, respectively.

| Detections | Default Setting | Known Object Setting |
|:--|:--:|:--:|
| Pre-trained on MS COCO | (21.85, 18.11, 22.97) | (25.53, 21.79, 26.64) |
| *Fine-tuned on HICO-DET (DRG) | (31.33, 24.72, 33.31) | (34.37, 27.18, 36.52) |
| Fine-tuned DETR-R101 (here) | (29.26, 24.61, 30.65) | (32.87, 27.89, 34.35) |
| Ground truth detections | (51.53, 41.02, 54.67) | (51.75, 41.40, 54.84) |

*The detections provided by the DRG repo were produced by a Cascade R-CNN with a ResNeXt-152 backbone, which is not directly comparable to the object detectors commonly used in the literature.

V-COCO

We did not implement evaluation utilities for V-COCO; instead, we use the utilities provided by Gupta. To generate the required pickle file, run the following script, specifying the path to a model with --model-path.

cd /path/to/spatially-conditioned-graphs
CUDA_VISIBLE_DEVICES=0 python cache.py --dataset vcoco --data-root vcoco \
    --detection-dir vcoco/detections/test \
    --cache-dir vcoco_cache --partition test \
    --model-path /path/to/a/model

This will generate a file named vcoco_results.pkl under vcoco_cache in the current directory. Please refer to the v-coco repo (not to be confused with vcoco, the submodule) for further instructions. Note that loading the pickle file requires a particular class CacheTemplate, which is shown below in its entirety.

from collections import defaultdict
class CacheTemplate(defaultdict):
    """A template for VCOCO cached results """
    def __init__(self, **kwargs):
        super().__init__()
        for k, v in kwargs.items():
            self[k] = v
    def __missing__(self, k):
        seg = k.split('_')
        # Assign zero score to missing actions
        if seg[-1] == 'agent':
            return 0.
        # Assign zero score and a tiny box to missing <action,role> pairs
        else:
            return [0., 0., .1, .1, 0.]

You can either add it into the evaluation code or save it as a separate file to import from.

Training

HICO-DET

cd /path/to/spatially-conditioned-graphs
python main.py --world-size 8 --cache-dir checkpoints/hicodet &>log &

Specify the number of GPUs to use with the argument --world-size. The default sub-batch size is 4 (per GPU). The provided model was trained with 8 GPUs, giving an effective batch size of 32. Reducing the effective batch size could result in slightly inferior performance. The default learning rate for a batch size of 32 is 0.0001. As a rule of thumb, scale the learning rate proportionally when changing the batch size, e.g. 0.00005 for a batch size of 16 (see the sketch below).

It is recommended to redirect stdout and stderr to a file to save the training log (as indicated by &>log). To check the progress, run cat log | grep mAP, or go through the log with vim log. Note that the mAP logged during training follows a slightly different protocol and does NOT necessarily correlate with the mAP reported by the community; it only serves as a diagnostic tool. To measure the true performance of the model, run a separate test as shown in the previous section.

By default, checkpoints will be saved under checkpoints in the current directory. For more arguments, run python main.py --help. We follow the early-stopping training strategy and have concluded (using a validation set split from the training set) that the model at epoch 7 should be picked. Training on 8 GeForce GTX TITAN X devices takes about 5 hours.
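
A minimal sketch of the linear scaling rule, assuming the default sub-batch size of 4 per GPU; the reference point (batch size 32 at learning rate 0.0001) comes from the text above, and the function name is illustrative.

def scaled_learning_rate(world_size, sub_batch_size=4,
                         ref_batch_size=32, ref_lr=1e-4):
    """Scale the learning rate proportionally to the effective batch size."""
    effective_batch_size = world_size * sub_batch_size
    return ref_lr * effective_batch_size / ref_batch_size

# 4 GPUs x 4 images per GPU = effective batch size 16 -> learning rate 5e-05
print(scaled_learning_rate(world_size=4))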

V-COCO

cd /path/to/spatially-conditioned-graphs
python main.py --world-size 8 \
    --dataset vcoco --partitions trainval val --data-root vcoco \
    --train-detection-dir vcoco/detections/trainval \
    --val-detection-dir vcoco/detections/trainval \
    --print-interval 20 --cache-dir checkpoints/vcoco &>log &

Contact

If you have any questions regarding our paper or the repo, please post them in discussions. If you run into issues related to the code, feel free to open an issue. Alternatively, you can contact me at [email protected]


spatially-conditioned-graphs's Issues

About label generation

Hello, I am a bit confused about the calculation of the classification loss in your code. If you have time, can you give me an answer?
https://github.com/fredzzhang/spatio-attentive-graphs/blob/main/interaction_head.py#L182
In your postprocess function, you pass in the one-hot predictions for each pair together with the one-hot labels, and then apply a series of operations that convert the one-hot labels into something I don't quite understand.

json file

Hello, this is my first project on HOI detection. Could you explain what instances_train2015.json, instances_val2015.json, coco80tohico80.json and coco91tohico80.json represent?

Remove single-GPU training script

As training on a single GPU can be achieved by setting the --world-size option to 1, the single-GPU training script is no longer maintained and will be removed.

KeyError when indexing dict using Tensor

In interaction_head.py, the mapping from object classes to action/verb classes uses a torch.LongTensor as the index, which works for a list but not for a dict; a minimal reproduction is sketched after the snippet below.

428        # Map object class index to target class index
429        # Object class index to target class index is a one-to-many mapping
430        target_cls_idx = [self.object_class_to_target_class[obj]
431            for obj in object_class[y]]
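
A minimal sketch of the failure mode described above, not the repository's actual code; the mapping and class indices are made up for illustration.

import torch

# Hypothetical one-to-many mapping from object class index to target class indices.
object_class_to_target_class = {0: [3, 7], 1: [2]}
object_class = torch.tensor([0, 1, 0])

# Fails with KeyError: iterating over a tensor yields 0-d tensors, which hash
# by identity rather than by value and therefore never match the integer keys.
# target_cls_idx = [object_class_to_target_class[obj] for obj in object_class]

# Works: convert each element to a Python int first.
target_cls_idx = [object_class_to_target_class[obj.item()] for obj in object_class]
print(target_cls_idx)  # [[3, 7], [2], [3, 7]]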

Training GPU+CPU Utilization Stops

I'm trying the training procedure as laid out by the README file, and ran CUDA_VISIBLE_DEVICES=0 python main.py &>log &.

It seems to run fine up until near the end of the first epoch, where the GPU and CPU utilization completely stops. The utilization never recovers, so the first epoch never actually finishes.

Here is the output from my log:

Namespace(batch_size=4, cache_dir='./checkpoints', data_root='hicodet', human_thresh=0.2,
learning_rate=0.001, lr_decay=0.1, milestones=[10], model_path='', momentum=0.9, 
num_epochs=15, num_iter=2, num_workers=4, object_thresh=0.2, print_interval=2000,
random_seed=1, train_detection_dir='hicodet/detections/train2015', 
val_detection_dir='hicodet/detections/test2015', weight_decay=0.0001)
Epoch [1/15], Iter. [2000/9409], Loss: 1.3726, Time[Data/Iter.]: [3.80s/1123.74s]
Epoch [1/15], Iter. [4000/9409], Loss: 1.1580, Time[Data/Iter.]: [3.53s/1105.00s]
Epoch [1/15], Iter. [6000/9409], Loss: 1.0998, Time[Data/Iter.]: [3.52s/1102.66s]
Epoch [1/15], Iter. [8000/9409], Loss: 1.0792, Time[Data/Iter.]: [3.72s/1140.21s]

My system specs as well:
OS: Pop!_OS 20.04 LTS x86_64
CPU: AMD Ryzen 7 2700X (16) @ 3.700G
GPU: NVIDIA GeForce RTX 2070 SUPER
Memory: 16017MiB
CUDA: 10.2

How to use the 'CacheTemplate' class when testing on V-COCO?

Hi,

Recently, I tried to test a model on the V-COCO dataset. I found that there is a class 'CacheTemplate' provided in README.md; however, I don't know how to use it correctly.

I tried to add that class to the V-COCO eval file directly, but it doesn't work: the script kept loading the detection file for over half an hour and nothing happened. Could you please give more details on using that class? Thanks a lot!

Here is my evaluation code.

from vsrl_eval import VCOCOeval
from collections import defaultdict

class CacheTemplate(defaultdict):
    """A template for VCOCO cached results """
    def __init__(self, **kwargs):
        super().__init__()
        for k, v in kwargs.items():
            self[k] = v
    def __missing__(self, k):
        seg = k.split('_')
        # Assign zero score to missing actions
        if seg[-1] == 'agent':
            return 0.
        # Assign zero score and a tiny box to missing <action,role> pairs
        else:
            return [0., 0., .1, .1, 0.]


if __name__ == "__main__":
    vsrl_annot_file = "data/vcoco/vcoco_test.json"
    coco_file = "data/instances_vcoco_all_2014.json"
    split_file = "data/splits/vcoco_test.ids"
    det_file = "../SCG/vcoco_cache/vcoco_results.pkl"
    vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)
    vcocoeval._do_eval(det_file, ovr_thresh=0.5)

Average out the denominator when computing focal loss

The focal loss used in the model is normalised by the number of positive logits, which tends to have unstable statistics. Therefore, the normaliser should be averaged across all sub-batches (ranks) to better utilise the large batch size; see the sketch below.
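
A minimal sketch of the idea, assuming a sigmoid (binary) focal loss and a standard torch.distributed setup; this illustrates averaging the normaliser across ranks and is not the repository's actual implementation.

import torch
import torch.distributed as dist
import torch.nn.functional as F

def normalised_focal_loss(logits, targets, gamma=2.0):
    """Binary focal loss, summed and divided by a cross-rank-averaged count of positives."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction='none')
    # Focal modulation (1 - p_t) ** gamma, where p_t is the probability of the true class.
    modulator = ((1 - p) * targets + p * (1 - targets)) ** gamma
    loss = (modulator * ce).sum()

    # Number of positive logits in the local sub-batch.
    n_pos = targets.sum()
    if dist.is_available() and dist.is_initialized():
        # Sum the counts over all ranks and average them, so that every rank
        # divides by the same, more stable normaliser.
        dist.all_reduce(n_pos)
        n_pos = n_pos / dist.get_world_size()
    return loss / n_pos.clamp(min=1)

# Example (single process): two box pairs scored over three action classes.
logits = torch.randn(2, 3)
targets = torch.tensor([[1., 0., 0.], [0., 1., 1.]])
print(normalised_focal_loss(logits, targets))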

Add group batch sampler

When input images are batched, zero-padding is applied to fill in the gaps. This results in large inputs when the images in a batch have very different aspect ratios. Following the torchvision references, images with similar aspect ratios will be batched together as much as possible; see the sketch below.
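
A minimal sketch of the grouping idea, in the spirit of the grouped batch sampler used in the torchvision references; the two-bucket rule and the function name are illustrative, not the actual sampler.

from collections import defaultdict

def group_batches(aspect_ratios, batch_size):
    """Yield batches of dataset indices whose images fall in the same aspect-ratio bucket."""
    buckets = defaultdict(list)
    for idx, ar in enumerate(aspect_ratios):
        # Two coarse buckets: landscape (ar >= 1) and portrait (ar < 1).
        buckets[ar >= 1].append(idx)
    for indices in buckets.values():
        for i in range(0, len(indices), batch_size):
            yield indices[i:i + batch_size]

# Example: five images with the given width/height ratios, batched in pairs.
print(list(group_batches([1.33, 0.75, 1.78, 0.56, 1.0], batch_size=2)))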

About mAP in your paper

The highest mAP reported in this repo uses the DRG detections, but one of your issues says that your highest mAP is 28.54.

TypeError: pic should be PIL Image or ndarray. Got <class 'torch.Tensor'>

The full error is as follows:

Traceback (most recent call last):
  File "test.py", line 91, in <module>
    main(args)
  File "test.py", line 68, in main
    test_ap = test(net, dataloader)
  File "/home/fred/spatio-attentive-graphs/utils.py", line 128, in test
    for batch in tqdm(test_loader):
  File "/home/fred/miniconda3/envs/pocket/lib/python3.7/site-packages/tqdm/std.py", line 1167, in __iter__
    for obj in iterable:
  File "/home/fred/miniconda3/envs/pocket/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 582, in __next__
    return self._process_next_batch(batch)
  File "/home/fred/miniconda3/envs/pocket/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 608, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
TypeError: Traceback (most recent call last):
  File "/home/fred/miniconda3/envs/pocket/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/fred/miniconda3/envs/pocket/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 99, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/fred/spatio-attentive-graphs/utils.py", line 113, in __getitem__
    image = pocket.ops.to_tensor(image, 'pil')
  File "/home/fred/pkgs/pocket/pocket/ops/transforms.py", line 33, in to_tensor
    return torchvision.transforms.functional.to_tensor(x).to(
  File "/home/fred/miniconda3/envs/pocket/lib/python3.7/site-packages/torchvision/transforms/functional.py", line 50, in to_tensor
    raise TypeError('pic should be PIL Image or ndarray. Got {}'.format(type(pic)))
TypeError: pic should be PIL Image or ndarray. Got <class 'torch.Tensor'>

Inference code

@fredzzhang @nikanor97 Hi, thanks for sharing the code base. Great work!

  1. Is there inference code that we can run on custom data / custom videos and visualise the results?
  2. Currently, when I tested the model using test.py on some scenes from the validation data, such as a single person running on a beach with no other objects present, there were no detections/activities in the output. Is there any way to get results like people walking, fighting or waving without depending on the objects present in the scene?

Thanks in advance

A bug in main.py

Dear Professor, excuse me, I have a problem that I hope you can help me solve. When I run main.py, it fails with the error shown in the screenshot below.
[screenshot]

I have followed your guide step by step, but it still does not run correctly. I would be very grateful if you could give me a solution when you have time. Thank you very much!

Exception happened during the evaluation of VCOCO

Hi, I trained your method on the V-COCO dataset and generated vcoco_results.pkl. However, when I ran the utilities provided by Gupta, I got the exception shown in the screenshot below:
[screenshot]

Could you please kindly help me handle this problem? Thanks a lot. I have also attached my evaluation code below:

from vsrl_eval import VCOCOeval
from collections import defaultdict

class CacheTemplate(defaultdict):
    """A template for VCOCO cached results """
    def __init__(self, **kwargs):
        super().__init__()
        for k, v in kwargs.items():
            self[k] = v
    def __missing__(self, k):
        seg = k.split('_')
        # Assign zero score to missing actions
        if seg[-1] == 'agent':
            return 0.
        # Assign zero score and a tiny box to missing <action,role> pairs
        else:
            return [0., 0., .1, .1, 0.]



if __name__ == "__main__":
    vsrl_annot_file = "data/vcoco/vcoco_val.json"
    coco_file = "data/instances_vcoco_all_2014.json"
    split_file = "data/splits/vcoco_val.ids"
    det_file = "spatio-attentive-graphs/vcoco_cache/vcoco_results.pkl"
    vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)
    vcocoeval._do_eval(det_file, ovr_thresh=0.5)

Fix relative paths for the demo code

The demo script demo.py under diagnosis had incorrect paths for relative imports after the relocation. To import the relevant modules, the parent directory should be added to the module search path; see the sketch below.
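
A minimal sketch of the fix, assuming demo.py sits in diagnosis/ and the modules it needs sit in the repository root; the exact imports are not shown.

import os
import sys

# Prepend the repository root (the parent of diagnosis/) to the module search
# path so that modules defined at the top level of the repo can be imported.
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))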

Running HICO Evaluation

Hi @fredzzhang, thank you for your great work!
I am trying to reproduce the reported mAPs of the pre-trained COCO version, but I'm confused about how to get the HICO evaluation running.

I ran the network through test2015 successfully and generated both the JSON and .mat files. I tried evaluating the .mat files with eval_run.m from HO-RCNN but got much lower mAPs:

[screenshots]

Is there something I am missing? I would appreciate any help.

How can I use the pre-trained HICO-DET model for action detection on OKVQA?

Hi,
I want to retrieve action detections on the OKVQA dataset, using only the pre-trained HICO-DET model. Can you please guide me on how to do that?

Also, do I need object detections on OKVQA beforehand in order to use the pre-trained HICO-DET model on it?

I read #63 too, but I didn't understand how to apply it to my problem.

About mAP in your paper

Hi, I found that your paper reports a model with an mAP of 27.18 on HICO-DET. I wonder how this model was trained, as I did not find a corresponding one in this repo.

"Too many open files" when running test.py

The problem originates in the multiprocessing when computing average precisions. Refer to fredzzhang/pocket#6 for details.

Specifying the number of processes has partially fixed the problem: during training, computing the classification mAP no longer results in the same error, but during testing, computing the detection mAP still does. A sketch of capping the pool size is shown below.
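
A minimal sketch of what capping the number of processes looks like, using the standard library multiprocessing pool; the per-class AP computation is a placeholder, not the actual evaluation code.

import multiprocessing as mp

def average_precision(task):
    """Placeholder per-class AP computation; returns a dummy value here."""
    scores, labels = task
    return sum(labels) / max(len(labels), 1)

if __name__ == '__main__':
    # One task per object class (80 for HICO-DET); cap the pool size instead of
    # defaulting to os.cpu_count(), so fewer file handles are open at once.
    tasks = [([0.9, 0.1, 0.3], [1, 0, 1]) for _ in range(80)]
    with mp.Pool(processes=4) as pool:
        aps = pool.map(average_precision, tasks)
    print(sum(aps) / len(aps))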

How to train to get the model with mAP of 28.54

Sorry to bother you, but would you mind giving me more tips on how to achieve the mAP of 28.54? I downloaded fasterrcnn_resnet50_fpn_hicodet_e13.pt but do not know how to use it. Should I train the network with the detections produced by this .pt file, i.e. with python preprocessing.py --partition test2015 ckpt_path='path to .pt'? (Did you forget an h after .pt?)

Thx.

Multi-GPU fine-tuning of Faster R-CNN

When I use your detector fine-tuning script in a multi-GPU setting, the following problem occurs. Do you have any good solutions?

RuntimeError: expected device cuda:1 and dtype Float but got device cuda:0 and dtype Float                                                                 

About fine-tuned detector

Hi, I wonder whether, after fine-tuning the object detector on HICO-DET, you retrain your HOI classification model or just replace the object detector at test time.
