
tripnet's Introduction



Tripnet: Recognition of instrument-tissue interactions in endoscopic videos via action triplets

CI Nwoye, C Gonzalez, T Yu, P Mascagni, D Mutter, J Marescaux, and N Padoy

This repository contains the implementation code, inference code, and evaluation scripts.
ArXiv paper Journal Publication

Abstract

Recognition of surgical activity is an essential component to develop context-aware decision support for the operating room. In this work, we tackle the recognition of fine-grained activities, modeled as action triplets <instrument, verb, target> representing the tool activity.

To this end, we introduce a new laparoscopic dataset, CholecT40, consisting of 40 videos from the public dataset Cholec80 in which all frames have been annotated using 128 triplet classes.

Furthermore, we present an approach to recognize these triplets directly from the video data. It relies on a module called class activation guide, which uses the instrument activation maps to guide the verb and target recognition. To model the recognition of multiple triplets in the same frame, we also propose a trainable 3D interaction space (3Dis), which captures the associations between the triplet components. Finally, we demonstrate the significance of these contributions via several ablation studies and comparisons to baselines on CholecT40.


News and Updates

  • [2023.02.20]: CholecT50 dataset is now public!
  • [2022.05.09]: TensorFlow v2 implementation code released!
  • [2022.05.09]: TensorFlow v1 implementation code released!
  • [2022.05.03]: PyTorch implementation code released!

Model Overview

The Tripnet model is composed of:

  • Feature Extraction layer: extracts high- and low-level features from an input video frame
  • Encoder: encodes the triplet components
    • Weakly-Supervised Localization (WSL) Layer: localizes the instruments
    • Class Activation Guide (CAG): detects the verbs and targets by leveraging the instrument activations
  • Decoder: associates the triplet components, since a frame may contain multiple triplet instances
    • 3D interaction space (3Dis): learns to associate instrument, verb, and target using a learned projection, and performs the final triplet classification
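
To make the role of the 3D interaction space more concrete, here is a minimal sketch. It is not the repository's implementation: it assumes the per-component scores are combined by a fixed outer product (the actual model learns the projection), and the lookup table `TRIPLET_MAP`, the helper names, and the toy class counts are all hypothetical.

```python
def interaction_space(inst, verb, targ):
    """Build a 3D grid whose cell (i, v, t) scores the triplet <i, v, t>.

    Assumption: a fixed outer product stands in for the learned projection.
    """
    return [[[pi * pv * pt for pt in targ] for pv in verb] for pi in inst]


def triplet_scores(grid, triplet_map):
    """Read out one score per valid triplet class from the grid."""
    return [grid[i][v][t] for (i, v, t) in triplet_map]


# Hypothetical toy setup: 2 instruments, 2 verbs, 3 targets.
# Each entry lists the (instrument, verb, target) ids of one valid triplet class.
TRIPLET_MAP = [(0, 0, 1), (0, 1, 2), (1, 0, 0)]

inst = [0.9, 0.1]        # per-instrument scores
verb = [0.2, 0.8]        # per-verb scores
targ = [0.1, 0.3, 0.6]   # per-target scores

grid = interaction_space(inst, verb, targ)
scores = triplet_scores(grid, TRIPLET_MAP)

# Index of the highest-scoring valid triplet class.
best = max(range(len(scores)), key=scores.__getitem__)
print(TRIPLET_MAP[best])
```

Only the cells listed in the lookup table are read out, which is how a 3D grid can yield far fewer triplet classes than the full product of the component class counts.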

We hope this repo will help researchers and engineers develop surgical action recognition systems. For algorithm development, we provide training data, baseline models, and evaluation methods to create a level playing field. For application usage, we also provide a small video demo that takes raw videos as input without any bells and whistles.


Performance

Results Table

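Component AP covers AP_I, AP_V, and AP_T; association AP covers AP_IV, AP_IT, and AP_IVT.

| Dataset   | AP_I | AP_V | AP_T | AP_IV | AP_IT | AP_IVT |
|-----------|------|------|------|-------|-------|--------|
| CholecT40 | 89.7 | 60.7 | 38.3 | 35.5  | 19.9  | 19.0   |
| CholecT45 | 89.9 | 59.9 | 37.4 | 31.8  | 27.1  | 24.4   |
| CholecT50 | 92.1 | 54.5 | 33.2 | 29.7  | 26.4  | 20.0   |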

Installation

Requirements

The model depends on the following libraries:

  1. sklearn
  2. PIL
  3. Python >= 3.5
  4. ivtmetrics
  5. Developer's framework:
    1. For TensorFlow version 1:
      • TF >= 1.10
    2. For TensorFlow version 2:
      • TF >= 2.1
    3. For PyTorch version:
      • PyTorch >= 1.10.1
      • TorchVision >= 0.11

System Requirements:

The code has been tested on Linux. It runs on both CPU and GPU. To run it on Windows or macOS, you will need equivalents of basic OS commands such as unzip, cd, and wget.


Quick Start

  • clone the git repository: git clone https://github.com/CAMMA-public/tripnet.git
  • install all the required libraries for your chosen framework.
  • download the dataset
  • download model's weights
  • train
  • evaluate

Dataset Zoo


Data Preparation

  • All frames are resized to 256 x 448 during training and evaluation.
  • Image data are mean normalized.
  • The dataset variants are tagged in this code as follows:
    • cholect50 = CholecT50 with split used in the original paper.
    • cholect50-challenge = CholecT50 with split used in the CholecTriplet challenge.
    • cholect45-crossval = CholecT45 with official cross-val split (currently publicly released).
    • cholect50-crossval = CholecT50 with official cross-val split.
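
The resizing and normalization steps listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the repository's data loader: it assumes "256 x 448" means height x width, uses nearest-neighbour resizing, and takes the widely used ImageNet channel statistics as the normalization constants. The helper names are hypothetical, and a real pipeline would typically rely on PIL or TorchVision instead.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an image stored as rows of (r, g, b) tuples."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(image, mean, std):
    """Scale pixels to [0, 1], then subtract the channel mean and divide by std."""
    return [
        [tuple((px[k] / 255.0 - mean[k]) / std[k] for k in range(3)) for px in row]
        for row in image
    ]


# Assumed statistics: ImageNet channel means and standard deviations.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)

# Dummy uniform grey frame, 480 rows by 854 columns.
frame = [[(128, 128, 128)] * 854 for _ in range(480)]

resized = resize_nearest(frame, 256, 448)
prepared = normalize(resized, MEAN, STD)
print(len(prepared), len(prepared[0]))
```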

Evaluation Metrics

The ivtmetrics library computes AP for triplet recognition. It also supports evaluating the recognition of the individual triplet components.

pip install ivtmetrics

or

conda install -c nwoye ivtmetrics

Usage guide is found on pypi.org.
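
For readers unfamiliar with the metric, the sketch below shows one standard, non-interpolated way to compute average precision (AP) per class and its mean over classes. It is a plain-Python illustration with hypothetical function names and toy data; it does not reproduce the ivtmetrics implementation, which may differ in details such as per-video averaging and null-class handling.

```python
def average_precision(labels, scores):
    """AP for one class: mean of the precision at each true positive,
    with frames ranked by descending score."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else float("nan")


def mean_ap(label_rows, score_rows):
    """Mean AP over classes, skipping classes that have no positive frame."""
    aps = []
    for c in range(len(label_rows[0])):
        ap = average_precision(
            [row[c] for row in label_rows],
            [row[c] for row in score_rows],
        )
        if ap == ap:  # False only for NaN
            aps.append(ap)
    return sum(aps) / len(aps)


# Toy example: 4 frames, 2 classes (rows are frames, columns are classes).
labels = [[1, 0], [0, 1], [1, 0], [0, 0]]
scores = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.1], [0.1, 0.6]]

ap_class0 = average_precision([r[0] for r in labels], [r[0] for r in scores])
ap_class1 = average_precision([r[1] for r in labels], [r[1] for r in scores])
print(ap_class0, ap_class1, mean_ap(labels, scores))
```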


Running the Model

The code can be run in training mode (-t), testing mode (-e), or both (-t -e) if you want to evaluate at the end of training:


Training on CholecT45/CholecT50 Dataset

Simple training on CholecT50 dataset:

python run.py -t  --data_dir="/path/to/dataset" --dataset_variant=cholect50 --version=1

You can include more details such as epoch, batch size, cross-validation and evaluation fold, weight initialization, learning rates for all subtasks, etc.:

python3 run.py -t -e  --data_dir="/path/to/dataset" --dataset_variant=cholect45-crossval --kfold=1 --epochs=180 --batch=64 --version=2 -l 1e-2 1e-3 1e-4 --pretrain_dir='path/to/imagenet/weights'

All the flags can be seen in the run.py file. The experimental setup of the published model is described in the paper.
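
As a rough guide to how flags like those in the commands above are typically parsed, here is a hypothetical argparse sketch. It mirrors only the flag names that appear in this README; the default values, help strings, and the long option name for -l are assumptions, and run.py remains the authoritative reference.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring some of the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Sketch of a run.py-style CLI")
    parser.add_argument("-t", "--train", action="store_true", help="run training")
    parser.add_argument("-e", "--test", action="store_true", help="run evaluation")
    parser.add_argument("--data_dir", type=str, default="/path/to/dataset")
    parser.add_argument(
        "--dataset_variant",
        type=str,
        default="cholect45-crossval",  # assumed default
        choices=[
            "cholect50",
            "cholect50-challenge",
            "cholect45-crossval",
            "cholect50-crossval",
        ],
    )
    parser.add_argument("--kfold", type=int, default=1)
    parser.add_argument("--epochs", type=int, default=100)  # assumed default
    parser.add_argument("--batch", type=int, default=32)    # assumed default
    parser.add_argument("--version", type=int, default=1)
    # Three learning rates, one per subtask, as in "-l 1e-2 1e-3 1e-4".
    parser.add_argument(
        "-l", "--initial_learning_rates",
        type=float, nargs=3, default=[1e-2, 1e-3, 1e-4],
    )
    return parser


args = build_parser().parse_args([
    "-t", "-e",
    "--dataset_variant=cholect45-crossval",
    "--kfold=1", "--epochs=180", "--batch=64",
    "-l", "1e-2", "1e-3", "1e-4",
])
print(args.train, args.test, args.epochs, args.initial_learning_rates)
```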


Testing

python3 run.py -e --dataset_variant=cholect45-crossval --kfold 3 --batch 32 --version=1 --test_ckpt="/path/to/model-k3/weights" --data_dir="/path/to/dataset"

Training on Custom Dataset

Adding a custom dataset is quite simple. You need to:

  • organize your annotation files in the same format as the CholecT45 dataset.
  • modify the final model layers to suit your task by changing the class sizes (num_tool_classes, num_verb_classes, num_target_classes, num_triplet_classes) in the argparse settings.

Model Zoo

  • N.B. Download links to models' weights will not be provided until after the CholecTriplet2022 challenge.

PyTorch

| Network | Base      | Resolution | Dataset   | Data split  | Link       |
|---------|-----------|------------|-----------|-------------|------------|
| Tripnet | ResNet-18 | Low        | CholecT50 | RDV         | Download   |
| Tripnet | ResNet-18 | High       | CholecT50 | RDV         | [Download] |
| Tripnet | ResNet-18 | Low        | CholecT50 | Challenge   | Download   |
| Tripnet | ResNet-18 | Low        | CholecT50 | crossval k1 | Download   |
| Tripnet | ResNet-18 | Low        | CholecT50 | crossval k2 | Download   |
| Tripnet | ResNet-18 | Low        | CholecT50 | crossval k3 | Download   |
| Tripnet | ResNet-18 | Low        | CholecT50 | crossval k4 | Download   |
| Tripnet | ResNet-18 | Low        | CholecT50 | crossval k5 | Download   |
| Tripnet | ResNet-18 | Low        | CholecT45 | crossval k1 | Download   |
| Tripnet | ResNet-18 | Low        | CholecT45 | crossval k2 | Download   |
| Tripnet | ResNet-18 | Low        | CholecT45 | crossval k3 | Download   |
| Tripnet | ResNet-18 | Low        | CholecT45 | crossval k4 | Download   |
| Tripnet | ResNet-18 | Low        | CholecT45 | crossval k5 | Download   |

TensorFlow v1

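| Network | Base      | Resolution | Dataset   | Data split | Link       |
|---------|-----------|------------|-----------|------------|------------|
| Tripnet | ResNet-18 | High       | CholecT50 | RDV        | [Download] |
| Tripnet | ResNet-18 | High       | CholecT50 | Challenge  | [Download] |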


TensorFlow v2

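| Network | Base      | Resolution | Dataset   | Data split | Link       |
|---------|-----------|------------|-----------|------------|------------|
| Tripnet | ResNet-18 | High       | CholecT50 | RDV        | [Download] |
| Tripnet | ResNet-18 | Low        | CholecT50 | RDV        | [Download] |
| Tripnet | ResNet-18 | High       | CholecT50 | Challenge  | [Download] |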

Baseline and Ablation Models

TensorFlow v1

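| Model                      | AP_I | AP_IV | AP_IT | AP_IVT | Link       |
|----------------------------|------|-------|-------|--------|------------|
| Naive CNN                  | 27.5 | 7.5   | 6.8   | 5.9    | [Download] |
| MTL baseline               | 74.6 | 14.0  | 7.2   | 6.4    | [Download] |
| Tripnet w/o CAG            | 89.5 | 20.6  | 12.1  | 12.1   | [Download] |
| Tripnet w/c untrained 3Dis | 89.7 | 16.7  | 7.6   | 6.3    | [Download] |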

Models are being re-trained and weights are released periodically.



License

The code, models, and datasets are available for non-commercial scientific research purposes under the CC BY-NC-SA 4.0 license, attached as the LICENSE file. By downloading and using this code, you agree to the terms in the LICENSE. Third-party code is subject to its respective licenses.


Related Resources

  • CholecT45 / CholecT50 Datasets Download dataset GitHub
  • Official Dataset Splits Official dataset split
  • Rendezvous Read on ArXiv Journal Publication GitHub
  • Attention Tripnet ArXiv paper GitHub
  • CholecTriplet2021 Challenge Challenge website ArXiv paper GitHub
  • CholecTriplet2022 Challenge Challenge website GitHub

Citation

If you find this repo useful in your project or research, please consider citing the relevant publications:

  • For the Tripnet and Baseline Models or any code from this repo:
@inproceedings{nwoye2020recognition,
   title={Recognition of instrument-tissue interactions in endoscopic videos via action triplets},
   author={Nwoye, Chinedu Innocent and Gonzalez, Cristians and Yu, Tong and Mascagni, Pietro and Mutter, Didier and Marescaux, Jacques and Padoy, Nicolas},
   booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
   pages={364--374},
   year={2020},
   organization={Springer}
}
  • For the CholecT45/CholecT50 Dataset:
@article{nwoye2021rendezvous,
  title={Rendezvous: Attention Mechanisms for the Recognition of Surgical Action Triplets in Endoscopic Videos},
  author={Nwoye, Chinedu Innocent and Yu, Tong and Gonzalez, Cristians and Seeliger, Barbara and Mascagni, Pietro and Mutter, Didier and Marescaux, Jacques and Padoy, Nicolas},
  journal={Medical Image Analysis},
  volume={78},
  pages={102433},
  year={2022}
}
  • For the CholecT45/CholecT50 Official Dataset Splits:
@article{nwoye2022data,
  title={Data Splits and Metrics for Benchmarking Methods on Surgical Action Triplet Datasets},
  author={Nwoye, Chinedu Innocent and Padoy, Nicolas},
  journal={arXiv preprint arXiv:2204.05235},
  year={2022}
}
  • For the Rendezvous or Attention Tripnet Baseline Models or any snippet of code from this repo:
@article{nwoye2021rendezvous,
  title={Rendezvous: Attention Mechanisms for the Recognition of Surgical Action Triplets in Endoscopic Videos},
  author={Nwoye, Chinedu Innocent and Yu, Tong and Gonzalez, Cristians and Seeliger, Barbara and Mascagni, Pietro and Mutter, Didier and Marescaux, Jacques and Padoy, Nicolas},
  journal={Medical Image Analysis},
  volume={78},
  pages={102433},
  year={2022}
}
  • For the models presented @ CholecTriplet2021 Challenge:
@article{nwoye2022cholectriplet2021,
  title={CholecTriplet2021: a benchmark challenge for surgical action triplet recognition},
  author={Nwoye, Chinedu Innocent and Alapatt, Deepak and Vardazaryan, Armine ... Gonzalez, Cristians and Padoy, Nicolas},
  journal={arXiv preprint arXiv:2204.04746},
  year={2022}
}

This repo is maintained by CAMMA. Comments and suggestions on the models are welcome. Check this page for updates.

tripnet's Issues

Potential issue in pytorch > dataloader.py > class CholecT50

Hello!

In "https://github.com/CAMMA-public/tripnet/blob/main/pytorch/dataloader.py > class CholecT50 >(init function)>self.augmentations ":
there are two "contrast" keys in the dictionary. It seems to me that the first one will be overwritten. May I ask whether they were meant to have different key names, or is there a reason for reassigning the value?

I have also checked https://github.com/CAMMA-public/cholect45/blob/main/dataloader_pth.py which has the same "self.augmentations" dictionary (2 "contrast" keys), so would really appreciate your help for clearing up my confusion.

Many thanks in advance!
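
For reference, the overwriting behaviour described above can be reproduced with a minimal sketch. The keys and string values below are placeholders chosen for illustration, not the repository's actual transforms.

```python
# Hypothetical stand-ins for the transforms stored in self.augmentations.
augmentations = {
    "contrast": "first_transform",
    "hflip": "flip_transform",
    "contrast": "second_transform",  # silently replaces the first "contrast" entry
}

# Python keeps a single "contrast" key holding the last value assigned to it.
print(len(augmentations))
print(augmentations["contrast"])
```

Python raises no error for a repeated key in a dict literal, so the first entry is simply unreachable after construction.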

some questions about the parameters for obtaining the model performance

Hello,

I am trying to reproduce the model performance reported in the paper, but I have some questions about the parameters.
The parameters I set are listed below:

--train
--test
--data_dir "path to CholecT45"
--data_variant cholect45-crossval
--kfold 1
--epochs 80
-l 1e-2 1e-3 1e-4

I did not change the other parameters and kept them as they are in the code:

--warmups 9 18 58
--weight_decay 1e-5
--decay_steps 10
--decay_rate 0.99
--momentum 0.95
--power 0.1

I did not change these parameters because I can't find any information about them in the paper.
I also noticed that the paper mentions "The Resnet-18 backbone is pretrained on Imagenet."
So I wonder whether the pretrain_dir parameter is needed when I run the code, and where I can find the pretrained model. (My pretrain_dir parameter is currently null.)
The above parameters are the ones I used to run the code.
The final test results are:

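| Mean AP          | I      | V      | T      | IV     | IT     | IVT    |
|------------------|--------|--------|--------|--------|--------|--------|
| my               | 0.7966 | 0.5315 | 0.3379 | 0.1360 | 0.1064 | 0.0735 |
| resultsTable T45 | 89.9   | 59.9   | 37.4   | 31.8   | 27.1   | 24.4   |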

The table shows the difference between my results and yours. This is the dilemma I'm facing: I don't know how to modify these parameters to get the same model results as your work.
This applies especially to the parameters batch, epochs, kfold, pretrain_dir, and initial_learning_rates. It may be my own oversight, but I didn't find any description of the settings for these parameters in the paper.
For initial_learning_rates, based on the paper, I now think the values are [1e-3,1e-3,1e-5], but I don't know if my understanding is correct.
Looking forward to your corrections and guidance!

Question about mAP.compute_video_AP() function

Hi,

I would like to ask what the difference is between mAP_i = mAP.compute_video_AP('i', ignore_null=set_chlg_eval) and mAP_i = mAPi.compute_video_AP(ignore_null=set_chlg_eval) within the test.

In the final evaluation, I set set_chlg_eval = False to ignore the 6 null triplet classes, but the two expressions above still return different mAPs. So why would the results be different if I were to use mAP.compute_video_AP('i', ignore_null=set_chlg_eval) to calculate mAP_i when set_chlg_eval is false?

Also, mAP_iv and mAP_it can still be obtained by decomposition through mAP.compute_video_AP, regardless of whether set_chlg_eval is true or false.

Thanks for your help.
