
fasterrcnn-pytorch-training-pipeline's Introduction

A Simple Pipeline to Train PyTorch FasterRCNN Model

Train PyTorch FasterRCNN models easily on any custom dataset. Choose between official PyTorch models trained on the COCO dataset, choose any backbone from Torchvision classification models, or even write your own custom backbones.

You can run a Faster RCNN model with Mini Darknet backbone and Mini Detection Head at more than 150 FPS on an RTX 3080.


Latest Update

  • Filter classes to visualize during inference using the --classes command line argument with space-separated class indices from the dataset YAML file. A minimal sketch of such filtering is shown after this list.

    For example, to visualize only persons in the COCO dataset, use: python inference.py --classes 1 <rest of the command>

    To visualize person and car, use: python inference.py --classes 1 3 <rest of the command>

  • Added Deep SORT real-time tracking to inference_video.py and onnx_video_inference.py. Use the --track flag along with the usual inference command. Only MobileNet Re-ID is supported for now.
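
The filtering itself is simple index matching on the model outputs. Below is a minimal sketch of the idea (an illustration, not the repository's exact code); outputs is assumed to be a single Torchvision-style detection dict:

import torch

def filter_detections(outputs, keep_classes):
    # Keep only detections whose label index is in `keep_classes`.
    mask = torch.isin(outputs['labels'], torch.tensor(keep_classes))
    return {
        'boxes': outputs['boxes'][mask],
        'labels': outputs['labels'][mask],
        'scores': outputs['scores'][mask],
    }

# e.g. keep only person (1) and car (3) from COCO-pretrained outputs:
# outputs = filter_detections(outputs, keep_classes=[1, 3])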

Custom Model Naming Conventions

For this repository:

  • Small head refers to a representation size of 512 in the Faster RCNN head and predictor.
  • Tiny head refers to a representation size of 256 in the Faster RCNN head and predictor.
  • Nano head refers to a representation size of 128 in the Faster RCNN head and predictor.
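
As an illustration, this is how a representation size maps onto Torchvision's box head and predictor. A hedged sketch, since the repository's own model builders may differ in detail; the in_channels value assumes a typical FPN backbone with 7x7 RoI pooling:

from torchvision.models.detection.faster_rcnn import (
    TwoMLPHead, FastRCNNPredictor
)

representation_size = 128  # small: 512, tiny: 256, nano: 128

# 256 channels * 7x7 RoI pooling output is typical for an FPN backbone.
box_head = TwoMLPHead(
    in_channels=256 * 7 * 7, representation_size=representation_size
)
box_predictor = FastRCNNPredictor(
    in_channels=representation_size, num_classes=2
)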

Setup on Ubuntu

  1. Clone the repository.

    git clone https://github.com/sovit-123/fastercnn-pytorch-training-pipeline.git
  2. Install requirements.

    1. Method 1: If you have CUDA and cuDNN set up already, do this in your environment of choice.

      pip install -r requirements.txt
    2. Method 2: If you want to install PyTorch with CUDA Toolkit in your environment of choice.

      conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

      OR

      conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

      OR install the version with CUDA support as per your choice from here.

      Then install the remaining requirements.
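
Either way, a quick sanity check that PyTorch can see the GPU after installation:

import torch

print(torch.__version__)
print(torch.cuda.is_available())      # True if CUDA is set up correctly
print(torch.cuda.get_device_name(0))  # e.g. the GPU model name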

Setup on Windows

  1. First you need to install Microsoft Visual Studio from here. Sign In/Sign Up by clicking on this link and download the Visual Studio Community 2017 edition.

    Install with all the default chosen settings. It should be around 6 GB. Mainly, we need the C++ Build Tools.

  2. Then install the proper pycocotools for Windows.

    pip install git+https://github.com/gautamchitnis/cocoapi.git@cocodataset-master#subdirectory=PythonAPI
  3. Clone the repository.

    git clone https://github.com/sovit-123/fastercnn-pytorch-training-pipeline.git
  4. Install PyTorch with CUDA support.

    conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

    OR

    conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch

    OR install the version with CUDA support as per your choice from here.

    Then install the remaining requirements except for pycocotools.

Train on Custom Dataset

Take the example of the smoke dataset from Kaggle. Let's say the dataset is stored in the data/smoke_pascal_voc directory in the following format, and smoke.yaml is in the data_configs directory:

├── data
│   ├── smoke_pascal_voc
│   │   ├── archive
│   │   │   ├── train
│   │   │   └── valid
│   └── README.md
├── data_configs
│   └── smoke.yaml
├── models
│   ├── create_fasterrcnn_model.py
│   ...
│   └── __init__.py
├── outputs
│   ├── inference
│   └── training
│       ...
├── readme_images
│   ...
├── torch_utils
│   ├── coco_eval.py
│   ...
├── utils
│   ├── annotations.py
│   ...
├── datasets.py
├── inference.py
├── inference_video.py
├── __init__.py
├── README.md
├── requirements.txt
└── train.py

The content of the smoke.yaml should be the following:

# Images and labels directory should be relative to train.py
TRAIN_DIR_IMAGES: ../../xml_od_data/smoke_pascal_voc/archive/train/images
TRAIN_DIR_LABELS: ../../xml_od_data/smoke_pascal_voc/archive/train/annotations
# VALID_DIR should be relative to train.py
VALID_DIR_IMAGES: ../../xml_od_data/smoke_pascal_voc/archive/valid/images
VALID_DIR_LABELS: ../../xml_od_data/smoke_pascal_voc/archive/valid/annotations

# Class names.
CLASSES: [
    '__background__',
    'smoke'
]

# Number of classes (object classes + 1 for background class in Faster RCNN).
NC: 2

# Whether to save the predictions of the validation set while training.
SAVE_VALID_PREDICTION_IMAGES: True

Note that the data and annotations can be in the same directory as well. In that case, TRAIN_DIR_IMAGES and TRAIN_DIR_LABELS will have the same path. Similarly for the VALID images and labels. datasets.py will take care of that.
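
For reference, here is a minimal sketch of how such a config can be consumed (the repository's own parsing may differ slightly):

import yaml

with open('data_configs/smoke.yaml') as f:
    data_configs = yaml.safe_load(f)

train_images = data_configs['TRAIN_DIR_IMAGES']
train_labels = data_configs['TRAIN_DIR_LABELS']
classes = data_configs['CLASSES']   # ['__background__', 'smoke']
num_classes = data_configs['NC']    # 2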

Next, to start the training, you can use the following command.

Command format:

python train.py --data <path to the data config YAML file> --epochs 100 --model <model name (defaults to fasterrcnn_resnet50)> --name <folder name inside output/training/> --batch 16

In this case, the exact command would be:

python train.py --data data_configs/smoke.yaml --epochs 100 --model fasterrcnn_resnet50_fpn --name smoke_training --batch 16

The terminal output should be similar to the following:

Number of training samples: 665
Number of validation samples: 72

3,191,405 total parameters.
3,191,405 training parameters.
Epoch     0: adjusting learning rate of group 0 to 1.0000e-03.
Epoch: [0]  [ 0/84]  eta: 0:02:17  lr: 0.000013  loss: 1.6518 (1.6518)  time: 1.6422  data: 0.2176  max mem: 1525
Epoch: [0]  [83/84]  eta: 0:00:00  lr: 0.001000  loss: 1.6540 (1.8020)  time: 0.0769  data: 0.0077  max mem: 1548
Epoch: [0] Total time: 0:00:08 (0.0984 s / it)
creating index...
index created!
Test:  [0/9]  eta: 0:00:02  model_time: 0.0928 (0.0928)  evaluator_time: 0.0245 (0.0245)  time: 0.2972  data: 0.1534  max mem: 1548
Test:  [8/9]  eta: 0:00:00  model_time: 0.0318 (0.0933)  evaluator_time: 0.0237 (0.0238)  time: 0.1652  data: 0.0239  max mem: 1548
Test: Total time: 0:00:01 (0.1691 s / it)
Averaged stats: model_time: 0.0318 (0.0933)  evaluator_time: 0.0237 (0.0238)
Accumulating evaluation results...
DONE (t=0.03s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.001
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.002
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.001
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.009
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.007
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.029
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.074
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.028
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.088
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.167
SAVING PLOTS COMPLETE...
...
Epoch: [4]  [ 0/84]  eta: 0:00:20  lr: 0.001000  loss: 0.9575 (0.9575)  time: 0.2461  data: 0.1662  max mem: 1548
Epoch: [4]  [83/84]  eta: 0:00:00  lr: 0.001000  loss: 1.1325 (1.1624)  time: 0.0762  data: 0.0078  max mem: 1548
Epoch: [4] Total time: 0:00:06 (0.0801 s / it)
creating index...
index created!
Test:  [0/9]  eta: 0:00:02  model_time: 0.0369 (0.0369)  evaluator_time: 0.0237 (0.0237)  time: 0.2494  data: 0.1581  max mem: 1548
Test:  [8/9]  eta: 0:00:00  model_time: 0.0323 (0.0330)  evaluator_time: 0.0226 (0.0227)  time: 0.1076  data: 0.0271  max mem: 1548
Test: Total time: 0:00:01 (0.1116 s / it)
Averaged stats: model_time: 0.0323 (0.0330)  evaluator_time: 0.0226 (0.0227)
Accumulating evaluation results...
DONE (t=0.03s).
IoU metric: bbox
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.137
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.313
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.118
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.029
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.175
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.428
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.204
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.306
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.347
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.140
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.424
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.683
SAVING PLOTS COMPLETE...

Distributed Training

Training on 2 GPUs:

export CUDA_VISIBLE_DEVICES=0,1
python -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --data data_configs/smoke.yaml --epochs 100 --model fasterrcnn_resnet50_fpn --name smoke_training --batch 16
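
Note that torch.distributed.launch is deprecated in recent PyTorch releases. On such versions the equivalent torchrun invocation (which sets the environment variables that --use_env provided) would be:

export CUDA_VISIBLE_DEVICES=0,1
torchrun --nproc_per_node=2 train.py --data data_configs/smoke.yaml --epochs 100 --model fasterrcnn_resnet50_fpn --name smoke_training --batch 16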

Inference

Image Inference on COCO Pretrained Model

By default this uses the Faster RCNN ResNet50 FPN V2 model.

python inference.py

Use model of your choice with an image input.

python inference.py --model fasterrcnn_mobilenetv3_large_fpn --input example_test_data/image_1.jpg

Image Inference with Custom Trained Model

In this case you only need to give the weights file path and the input file path. The config file and the model name are optional. If they are not provided, they will be automatically inferred from the weights file.

python inference.py --input data/inference_data/image_1.jpg --weights outputs/training/smoke_training/last_model_state.pth
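
If you are curious what the checkpoint stores besides the weights, you can inspect it directly. A small sketch; only the model_state_dict key is confirmed (it appears in the eval.py traceback quoted in the issues below), the rest of the contents depend on the training run:

import torch

# map_location='cpu' lets this run on a machine without a GPU.
checkpoint = torch.load(
    'outputs/training/smoke_training/last_model_state.pth',
    map_location='cpu',
)
print(checkpoint.keys())  # e.g. shows 'model_state_dict' among others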

Video Inference on COCO Pretrained Model

python inference_video.py

Video Inference with Custom Trained Model

python inference_video.py --input data/inference_data/video_1.mp4 --weights outputs/training/smoke_training/last_model_state.pth 

Tracking using COCO Pretrained Models

# Track all COCO classes (Faster RCNN ResNet50 FPN V2).
python inference_video.py --track --model fasterrcnn_resnet50_fpn_v2 --show

# Track all COCO classes (Faster RCNN ResNet50 FPN V2) using own video.
python inference_video.py --track --model fasterrcnn_resnet50_fpn_v2 --show --input ../inference_data/video_1.mp4

# Tracking only person class (index 1 in COCO pretrained). Check `COCO_91_CLASSES` attribute in `data_configs/coco.yaml` for more information.
python inference_video.py --track --model fasterrcnn_resnet50_fpn_v2 --show --input ../inference_data/video_4.mp4 --classes 1

# Tracking only person and car classes (indices 1 and 3 in COCO pretrained). Check `COCO_91_CLASSES` attribute in `data_configs/coco.yaml` for more information.
python inference_video.py --track --model fasterrcnn_resnet50_fpn_v2 --show --input ../inference_data/video_4.mp4 --classes 1 3

# Tracking using custom trained weights. Just provide the path to the weights instead of model name.
python inference_video.py --track --weights outputs/training/fish_det/best_model.pth --show --input ../inference_data/video_6.mp4

Evaluation

Replace the required arguments according to your need.

python eval.py --model fasterrcnn_resnet50_fpn_v2 --weights outputs/training/trial/best_model.pth --data data_configs/aquarium.yaml --batch 4

You can use the following command to show a table of class-wise Average Precision (the --verbose flag is additionally needed):

python eval.py --model fasterrcnn_resnet50_fpn_v2 --weights outputs/training/trial/best_model.pth --data data_configs/aquarium.yaml --batch 4 --verbose
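
Class-wise AP of this kind can be computed with torchmetrics' MeanAveragePrecision (which, as an issue below notes, is what eval.py uses). A self-contained sketch with synthetic tensors, not the repository's actual eval loop:

import torch
from torchmetrics.detection.mean_ap import MeanAveragePrecision

metric = MeanAveragePrecision(class_metrics=True)  # enables per-class AP

preds = [{
    'boxes': torch.tensor([[10., 10., 50., 50.]]),
    'scores': torch.tensor([0.9]),
    'labels': torch.tensor([1]),
}]
targets = [{
    'boxes': torch.tensor([[12., 12., 48., 48.]]),
    'labels': torch.tensor([1]),
}]

metric.update(preds, targets)
print(metric.compute())  # includes 'map_per_class' when class_metrics=True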

A List of All Model Flags to Use With the Training Script

The following command expects the COCO dataset to be present one directory up, inside the input folder, in XML format. You can find the dataset here on Kaggle. Check data_configs/coco.yaml for more details. You can change the relative dataset path in the YAML file according to your structure.

# Usage 
python train.py --model fasterrcnn_resnet50_fpn_v2 --data data_configs/coco.yaml

OR USE ANY ONE OF THE FOLLOWING

[
    'fasterrcnn_convnext_small',
    'fasterrcnn_convnext_tiny',
    'fasterrcnn_custom_resnet', 
    'fasterrcnn_darknet',
    'fasterrcnn_efficientnet_b0',
    'fasterrcnn_efficientnet_b4',
    'fasterrcnn_mbv3_small_nano_head',
    'fasterrcnn_mbv3_large',
    'fasterrcnn_mini_darknet_nano_head',
    'fasterrcnn_mini_darknet',
    'fasterrcnn_mini_squeezenet1_1_small_head',
    'fasterrcnn_mini_squeezenet1_1_tiny_head',
    'fasterrcnn_mobilenetv3_large_320_fpn', # Torchvision COCO pretrained
    'fasterrcnn_mobilenetv3_large_fpn', # Torchvision COCO pretrained
    'fasterrcnn_nano',
    'fasterrcnn_resnet18',
    'fasterrcnn_resnet50_fpn_v2', # Torchvision COCO pretrained
    'fasterrcnn_resnet50_fpn',  # Torchvision COCO pretrained
    'fasterrcnn_resnet101',
    'fasterrcnn_resnet152',
    'fasterrcnn_squeezenet1_0',
    'fasterrcnn_squeezenet1_1_small_head',
    'fasterrcnn_squeezenet1_1',
    'fasterrcnn_vitdet',
    'fasterrcnn_vitdet_tiny',
    'fasterrcnn_mobilevit_xxs',
    'fasterrcnn_regnet_y_400mf'
]
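
Programmatically, these names are keys of a builder dictionary in models/create_fasterrcnn_model.py (the KeyError issue further below shows the lookup create_model[args['model']]). A rough sketch of building a model by name; the builder's exact signature is an assumption here:

from models.create_fasterrcnn_model import create_model

build_model = create_model['fasterrcnn_resnet50_fpn']
model = build_model(num_classes=2)  # signature assumed for illustration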

fasterrcnn-pytorch-training-pipeline's People

Contributors

dataubc, falibabaei, jgmv, limm5, sovit-123, tolsicsse


fasterrcnn-pytorch-training-pipeline's Issues

Can we discuss how to break the model into separate components? I want to output the validation loss graph.

Hi sovit, it's been a long time. I am finally done with my finals. Now I would like to continue studying this model so that it can output a validation loss graph (my thesis needs it to check whether the model overfits or underfits when applied to an unseen dataset).

So, as last time, I found a website: How can I determine validation loss for faster RCNN (PyTorch)?

And you said it might require breaking the model into several different modules. Can you provide some guidance on how to break the model up, or how to implement the method used on that website?
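
For anyone landing here: the usual approach from that answer relies on the fact that Torchvision detection models only return the loss dict in train mode, so validation loss can be computed with the model kept in train mode but gradients disabled. A minimal sketch, not the repository's code:

import torch

@torch.no_grad()
def validation_loss(model, data_loader, device):
    model.train()  # the loss dict is only returned in train mode
    # Caveat: BatchNorm running stats still update in train mode.
    total, batches = 0.0, 0
    for images, targets in data_loader:
        images = [img.to(device) for img in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        total += sum(loss for loss in loss_dict.values()).item()
        batches += 1
    return total / max(batches, 1)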

No module named torch_utils.engine error while training

Hello, I have been having some trouble training with the smoke custom dataset and I keep receiving the error below.

Traceback (most recent call last):
  File "train.py", line 10, in <module>
    from torch_utils.engine import train_one_epoch, evaluate
ImportError: No module named torch_utils.engine

The problem seems to arise from a Python version mismatch, but I could not find exactly which Python version this codebase uses. The version in my environment is 3.10.4.

Thank you.

CUDA out of memory: resnet101

When I use the Faster RCNN model with a ResNet101 backbone, I always run into CUDA out of memory in "evaluate" after one epoch. Within evaluate itself, it errors at output = model(images), but only after a few loops, so I think there is a leak of CUDA memory somewhere. Furthermore, my batch size is already only 1 and my images are 600x600 pixels. Any recommendations?
(My CUDA memory is 8 GB.)

Thanks in advance.
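
Not a definitive fix, but two things worth checking here: that the evaluation forward pass runs under torch.no_grad() so activations are not retained, and that the cache is released between the training and evaluation phases. A sketch (model, valid_loader and device assumed from context):

import torch

torch.cuda.empty_cache()  # release cached blocks before evaluation

model.eval()
with torch.no_grad():
    for images, _ in valid_loader:
        images = [img.to(device) for img in images]
        outputs = model(images)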

how can I bypass wandb?

I wanted to train on the smoke dataset and my setup went fine. I tried the command below and got the following prompt.
Is there a way to skip creating a W&B account?

python train.py --config data_configs/smoke.yaml --epochs 100 --model fasterrcnn_resnet50_fpn --project-name smoke_training --batch-size 16
Not using distributed mode
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice:
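
W&B itself can be disabled without an account; either of the following standard wandb mechanisms (independent of this repository) should skip the prompt:

# Option 1: environment variable
export WANDB_MODE=disabled

# Option 2: wandb CLI
wandb disabled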

Why are images without objects removed?

Thanks for this lightweight Faster RCNN implementation!
However, I have been using Detectron2, and it is able to train on images without objects.
I think it would be good to use them as negative examples.

train.py: error: unrecognized arguments: --config data_configs/wheat_2020.yaml --project-name fasterrcnn_mobilenetv3_large_fpn_noaug_40e

usage: train.py [-h] [-m MODEL] [--data DATA] [-d DEVICE] [-e EPOCHS]
                [-j WORKERS] [-b BATCH] [--lr LR] [-ims IMGSZ] [-n NAME]
                [-vt] [-nm] [-uta] [-ca] [-w WEIGHTS] [-r] [-st]
                [--world-size WORLD_SIZE] [--dist-url DIST_URL] [-dw]
                [--seed SEED]
train.py: error: unrecognized arguments: --config data_configs/wheat_2020.yaml --project-name fasterrcnn_mobilenetv3_large_fpn_noaug_40e

Your latest change has spoilt custom training

Traceback (most recent call last):
  File "/content/fastercnn-pytorch-training-pipeline/train.py", line 542, in <module>
    main(args)
  File "/content/fastercnn-pytorch-training-pipeline/train.py", line 494, in main
    wandb_log(
  File "/content/fastercnn-pytorch-training-pipeline/utils/logging.py", line 185, in wandb_log
    if len(val_pred_image) == 1:
TypeError: object of type 'NoneType' has no len()
wandb: Waiting for W&B process to finish... (failed 1).

This is what I keep getting. I ran it yesterday and it worked; today it doesn't.

RuntimeError: CUDA out of memory.

Hi, I am working on Kaggle with a custom dataset. I got a RuntimeError and I have no idea how to solve it. Can you help me?

RuntimeError: CUDA out of memory. Tried to allocate 2.44 GiB (GPU 0; 15.90 GiB total capacity; 9.95 GiB already allocated; 1.70 GiB free; 13.44 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
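
The error message itself suggests one mitigation: set PYTORCH_CUDA_ALLOC_CONF to cap split sizes and reduce fragmentation (the value is workload-dependent; 128 is just an example), alongside the usual remedies of reducing the batch size or image size:

export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128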

RuntimeError: Error(s) in loading state_dict for FasterRCNN:

When I try to run eval.py, I get the following error:

Traceback (most recent call last):
  File "/home/noor/fasterrcnn-pytorch-training-pipeline/eval.py", line 121, in <module>
    model.load_state_dict(checkpoint['model_state_dict'])
  File "/home/noor/anaconda3/envs/tf-gpu/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for FasterRCNN:
Missing key(s) in state_dict: "backbone.fpn.inner_blocks.0.1.weight", "backbone.fpn.inner_blocks.0.1.bias", "backbone.fpn.inner_blocks.0.1.running_mean", "backbone.fpn.inner_blocks.0.1.running_var", "backbone.fpn.inner_blocks.1.1.weight", "backbone.fpn.inner_blocks.1.1.bias", "backbone.fpn.inner_blocks.1.1.running_mean", "backbone.fpn.inner_blocks.1.1.running_var", "backbone.fpn.inner_blocks.2.1.weight", "backbone.fpn.inner_blocks.2.1.bias", "backbone.fpn.inner_blocks.2.1.running_mean", "backbone.fpn.inner_blocks.2.1.running_var", "backbone.fpn.inner_blocks.3.1.weight", "backbone.fpn.inner_blocks.3.1.bias", "backbone.fpn.inner_blocks.3.1.running_mean", "backbone.fpn.inner_blocks.3.1.running_var", "backbone.fpn.layer_blocks.0.1.weight", "backbone.fpn.layer_blocks.0.1.bias", "backbone.fpn.layer_blocks.0.1.running_mean", "backbone.fpn.layer_blocks.0.1.running_var", "backbone.fpn.layer_blocks.1.1.weight", "backbone.fpn.layer_blocks.1.1.bias", "backbone.fpn.layer_blocks.1.1.running_mean", "backbone.fpn.layer_blocks.1.1.running_var", "backbone.fpn.layer_blocks.2.1.weight", "backbone.fpn.layer_blocks.2.1.bias", "backbone.fpn.layer_blocks.2.1.running_mean", "backbone.fpn.layer_blocks.2.1.running_var", "backbone.fpn.layer_blocks.3.1.weight", "backbone.fpn.layer_blocks.3.1.bias", "backbone.fpn.layer_blocks.3.1.running_mean", "backbone.fpn.layer_blocks.3.1.running_var", "rpn.head.conv.1.0.weight", "rpn.head.conv.1.0.bias", "roi_heads.box_head.0.0.weight", "roi_heads.box_head.0.1.weight", "roi_heads.box_head.0.1.bias", "roi_heads.box_head.0.1.running_mean", "roi_heads.box_head.0.1.running_var", "roi_heads.box_head.1.0.weight", "roi_heads.box_head.1.1.weight", "roi_heads.box_head.1.1.bias", "roi_heads.box_head.1.1.running_mean", "roi_heads.box_head.1.1.running_var", "roi_heads.box_head.2.0.weight", "roi_heads.box_head.2.1.weight", "roi_heads.box_head.2.1.bias", "roi_heads.box_head.2.1.running_mean", "roi_heads.box_head.2.1.running_var", "roi_heads.box_head.3.0.weight", "roi_heads.box_head.3.1.weight", "roi_heads.box_head.3.1.bias", "roi_heads.box_head.3.1.running_mean", "roi_heads.box_head.3.1.running_var", "roi_heads.box_head.5.weight", "roi_heads.box_head.5.bias".
Unexpected key(s) in state_dict: "backbone.fpn.inner_blocks.0.0.bias", "backbone.fpn.inner_blocks.1.0.bias", "backbone.fpn.inner_blocks.2.0.bias", "backbone.fpn.inner_blocks.3.0.bias", "backbone.fpn.layer_blocks.0.0.bias", "backbone.fpn.layer_blocks.1.0.bias", "backbone.fpn.layer_blocks.2.0.bias", "backbone.fpn.layer_blocks.3.0.bias", "roi_heads.box_head.fc6.weight", "roi_heads.box_head.fc6.bias", "roi_heads.box_head.fc7.weight", "roi_heads.box_head.fc7.bias".
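
For context: the missing keys (batch-norm parameters in the FPN and a convolutional box head) belong to the FPN V2 architecture, while the unexpected fc6/fc7 keys belong to the V1 two-layer MLP box head. This pattern usually means eval.py built fasterrcnn_resnet50_fpn_v2 while the checkpoint was trained as fasterrcnn_resnet50_fpn; passing the matching model name should resolve it:

python eval.py --model fasterrcnn_resnet50_fpn --weights <path to your checkpoint> --data <your data config> --batch 4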

ValueError: num_samples should be a positive integer value, but got num_samples=0

Traceback (most recent call last):
  File "/home/noor/fasterrcnn-pytorch-training-pipeline/train.py", line 539, in <module>
    main(args)
  File "/home/noor/fasterrcnn-pytorch-training-pipeline/train.py", line 251, in main
    train_sampler = RandomSampler(train_dataset)
  File "/home/noor/anaconda3/envs/tf-gpu/lib/python3.9/site-packages/torch/utils/data/sampler.py", line 107, in __init__
    raise ValueError("num_samples should be a positive integer "

Question about running distributed training on a single node with multiple GPUs

First, thanks for the great work.

When I run without distributed mode, it works fine: it creates the "output/training/" folder and saves the training results there.
However, when I try to run in distributed mode, it does not work in my case:

  1. Using the argument option --dist_url 'tcp://localhost:23456' or 'tcp://127.0.0.1:23456' => In this case, it does not even generate the output directory "output/training/".
  2. Using the default option --dist_url "env://", i.e. doing nothing but using the default config => In this case, it creates the output directory correctly, but training does not happen; there are only some files like "events.out.tfevents.1682197067.pid.0", "opt.yaml" and "train.log". Although "opt.yaml" has information about the training parameters, the size of "train.log" is 0, i.e. it contains nothing.
    One thing I noticed is that in "opt.yaml" the value of gpu is 5, while our workstation has 10 GPUs.
    What could be the problem and how could it be solved?

COCOeval vs MeanAveragePrecision

In eval.py you use MeanAveragePrecision from torchmetrics, while in train.py you evaluate using pycocotools. When I use them both, I obtain different values for AP. Which one do you recommend using, and why?

How to train faster_rcnn with hard-background?

Hello! Does anyone have an idea how I can deliberately feed the model images with potentially tricky objects so that it learns to avoid these biases? I was thinking of adding some hand-picked background images to the training process with NO LABELS.

My question basically is: will they be tagged as background, and if so, will they end up in the loss function?

Can I request 2 functions?

Function 1 is to output a validation loss plot using TensorBoard.
Function 2 is to make something like results.csv that shows the mAP for every epoch, so that we can see at which epoch we get the best mAP.

An example is the results.csv produced by YOLOv5.


Index out of range error during training

I got an error about an index being out of range...

Looking at the error line, I think the reason is a parameter that is too big.

Is that right?

Why do I get an error during training and how can I solve it?

File "train.py", line 200, in <module> colors=COLORS File "/home/ubuntu/anaconda3/envs/pytorch1.7.1_p37/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context return func(*args, **kwargs) File "/home/ubuntu/lecttue-diagonosis/YangDongJae/fastercnn-pytorch-training-pipeline/torch_utils/engine.py", line 138, in evaluate images, outputs, counter, out_dir, classes, colors File "/home/ubuntu/lecttue-diagonosis/YangDongJae/fastercnn-pytorch-training-pipeline/utils/general.py", line 218, in save_validation_results pred_classes = [classes[i] for i in labels.cpu().numpy()] File "/home/ubuntu/lecttue-diagonosis/YangDongJae/fastercnn-pytorch-training-pipeline/utils/general.py", line 218, in <listcomp> pred_classes = [classes[i] for i in labels.cpu().numpy()] IndexError: list index out of range

Training with GPU

What is your goto Hardware setup for training?
Is it Windows desktop with NVidia GPU or you usually run it on AWS/Azure?

Number of training and validation samples gets doubled during training

As the title states, when training with a custom dataset, the numbers of train and validation samples get doubled in the command prompt, which I think affects the training speed and the accuracy of the training process. My dataset consists of 361 train samples and 42 validation samples.

Here is my training command:
python train.py --data data_configs/traffic.yaml --epochs 50 --model fasterrcnn_resnet50_fpn --name trafficSign_detection_no_bg --batch 12

Here is my data config file:
TRAIN_DIR_IMAGES: data/traffic-sign/train/images
TRAIN_DIR_LABELS: data/traffic-sign/train/annotations
VALID_DIR_IMAGES: data/traffic-sign/valid/images
VALID_DIR_LABELS: data/traffic-sign/valid/annotations

CLASSES: [
'cam_di_nguoc_chieu',
'cam_oto',
'cam_oto_re_phai',
'cam_mo_to',
'cam_oto_va_moto',
'cam_ng_di_bo',
'cam_re_trai',
'cam_re_phai',
'cam_quay_dau_trai',
'max_spd_40',
'max_spd_50',
'max_spd_60',
'max_spd_80',
'cam_dung_do',
'cam_do',
'duong_giao_nhau',
'giao_nhau_vs_ko_uu_tien',
'giao_nhau_vs_ko_uu_tien_trai',
'giao_nhau_vs_uu_tien',
'dg_co_ng_di_bo_cat_ngang',
'tre_em_qua_duong',
'cong_truong',
'day_cap',
'slow',
'huong_phai_di',
'danh_cho_ng_di_bo',
'dg_mot_chieu',
'dg_cho_oto',
# 'background',
]
NC: 28
SAVE_VALID_PREDICTION_IMAGES: True

The training process takes a very long time, over 2 hours on an RTX 3060, and the precision (mAP) is very low, around 0.23 for 50 epochs with a batch size of 12.

KeyError: 'fasterrcnn_resnet50'

I am trying to run the code on my custom dataset but I got this error.

Building model from scratch...
Traceback (most recent call last):
  File "train.py", line 491, in <module>
    main(args)
  File "train.py", line 248, in main
    build_model = create_model[args['model']]
KeyError: 'fasterrcnn_resnet50'
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: Synced smoke_training: https://wandb.ai/samahwa/fastercnn-pytorch-training-pipeline/runs/ejy5jyw8
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20221128_113545-ejy5jyw8/logs

How to use onnx_inference_image.py

I tried to run this program, but it fails with an error. Please show me how to use onnx_inference_image.py.

I tried this (model: fasterrcnn_mobilenetv3_large_fpn):

!onnx_inference_image.py --weights outputs/training/fasterrcnn_resnet50_fpn_2/best_model.pth --input ../test/60rpm_sedikit --no-labels

But I get this error:
/bin/bash: onnx_inference_image.py: command not found
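
The bash error is about how the script was invoked rather than the script itself: with a bare ! and no interpreter, the shell tries to execute the file as a command. Running it through Python should get past this particular error:

!python onnx_inference_image.py --weights outputs/training/fasterrcnn_resnet50_fpn_2/best_model.pth --input ../test/60rpm_sedikit --no-labels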

Mention the erroneous file in the error ValueError: Expected x_min for bbox to be in the range [0.0, 1.0]

Is it possible to add the file that causes the error to the error message?

I am trying to train the model on the RICO dataset from Kaggle (you can find it here: https://www.kaggle.com/datasets/onurgunes1993/rico-dataset) using Colab. The dimensions of each image mentioned in the labels (1440 x 2560) do not match the actual dimensions when opening the images using OpenCV (1080 x 1920).
After converting the labels to Pascal VOC format, I tried rescaling the labels, resizing the images, normalizing the dimensions, and deleting the images and labels whose normalized dimensions are not in the range [0, 1], but I am still getting the following error:

Epoch: [0]  [  0/174]  eta: 0:11:26  lr: 0.000007  loss: 3.9243 (3.9243)  loss_classifier: 0.7015 (0.7015)  loss_box_reg: 0.3106 (0.3106)  loss_objectness: 2.6861 (2.6861)  loss_rpn_box_reg: 0.2260 (0.2260)  time: 3.9457  data: 2.4448  max mem: 1315
Traceback (most recent call last):
  File "/content/fasterrcnn-pytorch-training-pipeline/train.py", line 561, in <module>
    main(args)
  File "/content/fasterrcnn-pytorch-training-pipeline/train.py", line 401, in main
    batch_loss_rpn_list = train_one_epoch(
  File "/content/fasterrcnn-pytorch-training-pipeline/torch_utils/engine.py", line 45, in train_one_epoch
    for images, targets in metric_logger.log_every(data_loader, print_freq, header):
  File "/content/fasterrcnn-pytorch-training-pipeline/torch_utils/utils.py", line 173, in log_every
    for obj in iterable:
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 644, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 2.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/fasterrcnn-pytorch-training-pipeline/datasets.py", line 327, in __getitem__
    sample = self.transforms(image=image_resized,
  File "/usr/local/lib/python3.10/dist-packages/albumentations/core/composition.py", line 202, in __call__
    p.preprocess(data)
  File "/usr/local/lib/python3.10/dist-packages/albumentations/core/utils.py", line 83, in preprocess
    data[data_name] = self.check_and_convert(data[data_name], rows, cols, direction="to")
  File "/usr/local/lib/python3.10/dist-packages/albumentations/core/utils.py", line 91, in check_and_convert
    return self.convert_to_albumentations(data, rows, cols)
  File "/usr/local/lib/python3.10/dist-packages/albumentations/core/bbox_utils.py", line 126, in convert_to_albumentations
    return convert_bboxes_to_albumentations(data, self.params.format, rows, cols, check_validity=True)
  File "/usr/local/lib/python3.10/dist-packages/albumentations/core/bbox_utils.py", line 394, in convert_bboxes_to_albumentations
    return [convert_bbox_to_albumentations(bbox, source_format, rows, cols, check_validity) for bbox in bboxes]
  File "/usr/local/lib/python3.10/dist-packages/albumentations/core/bbox_utils.py", line 394, in <listcomp>
    return [convert_bbox_to_albumentations(bbox, source_format, rows, cols, check_validity) for bbox in bboxes]
  File "/usr/local/lib/python3.10/dist-packages/albumentations/core/bbox_utils.py", line 337, in convert_bbox_to_albumentations
    check_bbox(bbox)
  File "/usr/local/lib/python3.10/dist-packages/albumentations/core/bbox_utils.py", line 421, in check_bbox
    raise ValueError(
ValueError: Expected x_min for bbox (tensor(-0.0028), tensor(0.0527), tensor(0.0021), tensor(0.0891), tensor(1)) to be in the range [0.0, 1.0], got -0.0027777778450399637.

Pipeline feature in repository

Dear sovit-123,

First of all, thank you for sharing your source code.
This is not an issue; I am just curious about where the "pipeline" feature shows up in your code.
I mean, what is the difference between this "training pipeline" and "normal training" of Faster RCNN in PyTorch?

Thank you in advance!

Does your Dataset class cope with empty frames?

I often use this repo as a point of reference.

boxes = [] 
boxes = torch.as_tensor(boxes, dtype=torch.float32)

I don't believe the above would create the required shape to be provided to Faster RCNN in the case of an empty frame. See https://stackoverflow.com/questions/66063046/how-to-train-faster-rcnn-on-dataset-including-negative-data-in-pytorch .

Does your dataset / model cope with empty frames in some other way? Just trying to find what's considered sensible practice.

pytorch/vision#1598
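
For reference, Torchvision's Faster RCNN accepts negative samples when the target uses an empty (0, 4) float tensor for boxes and an empty integer tensor for labels; torch.as_tensor([]) only yields shape (0,). A hedged sketch of a valid empty target:

import torch

# An empty frame as a negative sample: zero boxes with the (0, 4) shape
# Torchvision's Faster RCNN expects, and zero labels to match.
target = {
    'boxes': torch.zeros((0, 4), dtype=torch.float32),
    'labels': torch.zeros((0,), dtype=torch.int64),
}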

Error testing video with inference_video.py

I was testing the model using fasterrcnn_resnet50_fpn_2 and I ran into an error like the following:

Building from model name arguments...
Downloading: "https://download.pytorch.org/models/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_resnet50_fpn_coco-258fb6c6.pth
100% 160M/160M [00:03<00:00, 52.4MB/s]
Frame: 1, Forward pass FPS: 0.820, Forward pass time: 1.220 seconds, Forward pass + annotation time: 1.226 seconds
qt.qpa.xcb: could not connect to display
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "/usr/local/lib/python3.10/dist-packages/cv2/qt/plugins" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: xcb.
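
This failure is about display rather than the model: cv2.imshow needs a windowing system, which headless environments such as Colab do not provide. Assuming visualization is optional in this script (the tracking examples above gate it behind --show), running without on-screen display and relying on the saved results in outputs/inference/ is one way around it:

python inference_video.py --input <your video> --weights <your weights>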

RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx

# Connect to Google Drive
from google.colab import drive
drive.mount('/content/drive/')

%cd /content/drive/MyDrive/cangkehdetection

# Download the code
!git clone https://github.com/sovit-123/fasterrcnn-pytorch-training-pipeline.git
%cd fasterrcnn-pytorch-training-pipeline

# Install the requirements library for image detection
!pip install -r requirements.txt
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torch==1.12.0
Collecting torchvision==0.13.0
...
Attempting uninstall: torch
Found existing installation: torch 2.0.0+cu118
Successfully uninstalled torch-2.0.0+cu118
Attempting uninstall: torchvision
Found existing installation: torchvision 0.15.1+cu118
Successfully uninstalled torchvision-0.15.1+cu118
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.15.1 requires torch==2.0.0, but you have torch 1.12.0 which is incompatible.
torchdata 0.6.0 requires torch==2.0.0, but you have torch 1.12.0 which is incompatible.
torchaudio 2.0.1+cu118 requires torch==2.0.0, but you have torch 1.12.0 which is incompatible.
...
Successfully installed GitPython-3.1.31 docker-pycreds-0.4.0 gitdb-4.0.10 jedi-0.18.2 jupyter-1.0.0 pathtools-0.1.2 protobuf-3.20.1 qtconsole-5.4.2 qtpy-2.3.1 sentry-sdk-1.19.1 setproctitle-1.3.2 setuptools-59.5.0 smmap-5.0.0 torch-1.12.0 torchinfo-1.7.2 torchmetrics-0.11.4 torchvision-0.13.0 wandb-0.14.2
!pip install torch==2.0.0
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting torch==2.0.0
Downloading torch-2.0.0-cp39-cp39-manylinux1_x86_64.whl (619.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 619.9/619.9 MB 2.4 MB/s eta 0:00:00
Collecting nvidia-cuda-nvrtc-cu11==11.7.99
Downloading nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.0/21.0 MB 63.4 MB/s eta 0:00:00
Collecting nvidia-cuda-runtime-cu11==11.7.99
Downloading nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 849.3/849.3 KB 56.1 MB/s eta 0:00:00
Requirement already satisfied: filelock in /usr/local/lib/python3.9/dist-packages (from torch==2.0.0) (3.10.7)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.9/dist-packages (from torch==2.0.0) (3.1.2)
Collecting nvidia-cusparse-cu11==11.7.4.91
Downloading nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl (173.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 173.2/173.2 MB 7.3 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.9/dist-packages (from torch==2.0.0) (4.5.0)
Collecting nvidia-nvtx-cu11==11.7.91
Downloading nvidia_nvtx_cu11-11.7.91-py3-none-manylinux1_x86_64.whl (98 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98.6/98.6 KB 13.6 MB/s eta 0:00:00
Collecting nvidia-cufft-cu11==10.9.0.58
Downloading nvidia_cufft_cu11-10.9.0.58-py3-none-manylinux1_x86_64.whl (168.4 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 168.4/168.4 MB 7.4 MB/s eta 0:00:00
Collecting nvidia-cuda-cupti-cu11==11.7.101
Downloading nvidia_cuda_cupti_cu11-11.7.101-py3-none-manylinux1_x86_64.whl (11.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 11.8/11.8 MB 84.9 MB/s eta 0:00:00
Requirement already satisfied: sympy in /usr/local/lib/python3.9/dist-packages (from torch==2.0.0) (1.11.1)
Collecting nvidia-curand-cu11==10.2.10.91
Downloading nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 54.6/54.6 MB 12.4 MB/s eta 0:00:00
Requirement already satisfied: networkx in /usr/local/lib/python3.9/dist-packages (from torch==2.0.0) (3.0)
Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.9/dist-packages (from torch==2.0.0) (2.0.0)
Collecting nvidia-cublas-cu11==11.10.3.66
Downloading nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 317.1/317.1 MB 4.0 MB/s eta 0:00:00
Collecting nvidia-cudnn-cu11==8.5.0.96
Downloading nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 557.1/557.1 MB 2.7 MB/s eta 0:00:00
Collecting nvidia-nccl-cu11==2.14.3
Downloading nvidia_nccl_cu11-2.14.3-py3-none-manylinux1_x86_64.whl (177.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 177.1/177.1 MB 6.2 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu11==11.4.0.1
Downloading nvidia_cusolver_cu11-11.4.0.1-2-py3-none-manylinux1_x86_64.whl (102.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 102.6/102.6 MB 9.9 MB/s eta 0:00:00
Requirement already satisfied: setuptools in /usr/local/lib/python3.9/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch==2.0.0) (59.5.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.9/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch==2.0.0) (0.40.0)
Requirement already satisfied: cmake in /usr/local/lib/python3.9/dist-packages (from triton==2.0.0->torch==2.0.0) (3.25.2)
Requirement already satisfied: lit in /usr/local/lib/python3.9/dist-packages (from triton==2.0.0->torch==2.0.0) (16.0.0)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.9/dist-packages (from jinja2->torch==2.0.0) (2.1.2)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.9/dist-packages (from sympy->torch==2.0.0) (1.3.0)
Installing collected packages: nvidia-nvtx-cu11, nvidia-nccl-cu11, nvidia-cusparse-cu11, nvidia-curand-cu11, nvidia-cufft-cu11, nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-cupti-cu11, nvidia-cublas-cu11, nvidia-cusolver-cu11, nvidia-cudnn-cu11, torch
Attempting uninstall: torch
Found existing installation: torch 1.12.0
Uninstalling torch-1.12.0:
Successfully uninstalled torch-1.12.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.13.0 requires torch==1.12.0, but you have torch 2.0.0 which is incompatible.
Successfully installed nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 torch-2.0.0
!pip install git+https://github.com/gautamchitnis/cocoapi.git@cocodataset-master#subdirectory=PythonAPI
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/gautamchitnis/cocoapi.git@cocodataset-master#subdirectory=PythonAPI
Cloning https://github.com/gautamchitnis/cocoapi.git (to revision cocodataset-master) to /tmp/pip-req-build-ojbw02d9
Running command git clone --filter=blob:none --quiet https://github.com/gautamchitnis/cocoapi.git /tmp/pip-req-build-ojbw02d9
Running command git checkout -b cocodataset-master --track origin/cocodataset-master
Switched to a new branch 'cocodataset-master'
Branch 'cocodataset-master' set up to track remote branch 'cocodataset-master' from 'origin'.
Resolved https://github.com/gautamchitnis/cocoapi.git to commit 20291f19c46a8d11935862bc9e449a1b72ec25ed
Preparing metadata (setup.py) ... done
Building wheels for collected packages: pycocotools
Building wheel for pycocotools (setup.py) ... done
Created wheel for pycocotools: filename=pycocotools-2.0-cp39-cp39-linux_x86_64.whl size=398095 sha256=94723d5ce3fca91072d2832e8579000019a077fbabd5edf0c6c4db10f27932f7
Stored in directory: /tmp/pip-ephem-wheel-cache-n06x8qwe/wheels/a6/5f/ec/1eaf8c69abab5724baee819736e6d30adad774deb60736413b
Successfully built pycocotools
Installing collected packages: pycocotools
Attempting uninstall: pycocotools
Found existing installation: pycocotools 2.0.6
Uninstalling pycocotools-2.0.6:
Successfully uninstalled pycocotools-2.0.6
Successfully installed pycocotools-2.0
Create the YAML file (training data configuration)

%%writefile data_configs/clove_2023.yaml

# Image and label directories, relative to train.py.
TRAIN_DIR_IMAGES: 'custom_data/train'
TRAIN_DIR_LABELS: 'custom_data/train'

VALID_DIR_IMAGES: 'custom_data/valid'
VALID_DIR_LABELS: 'custom_data/valid'

# Class names.
CLASSES: [
'background',
'clove', 'rubbish'
]

# Number of classes (object classes + 1 for the background class in Faster RCNN).
NC: 3

# Whether to save validation-set predictions during training (True or False).
SAVE_VALID_PREDICTION_IMAGES: True
Writing data_configs/clove_2023.yaml
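
For reference, a config like the one above can be read back with PyYAML; a minimal standalone sketch (the actual loading code inside train.py may differ):

    import yaml

    # Hypothetical standalone check that the config parses as expected.
    with open('data_configs/clove_2023.yaml') as f:
        cfg = yaml.safe_load(f)

    print(cfg['CLASSES'])  # ['background', 'clove', 'rubbish']
    print(cfg['NC'])       # 3 (2 object classes + 1 background)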

Install the vision_transformers library

!pip install vision_transformers
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting vision_transformers
Downloading vision_transformers-0.1.1.0.tar.gz (38 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: torch>=1.10 in /usr/local/lib/python3.9/dist-packages (from vision_transformers) (2.0.0)
Requirement already satisfied: torchvision in /usr/local/lib/python3.9/dist-packages (from vision_transformers) (0.13.0)
Requirement already satisfied: nvidia-cuda-runtime-cu11==11.7.99 in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (11.7.99)
Requirement already satisfied: nvidia-cusparse-cu11==11.7.4.91 in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (11.7.4.91)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (4.5.0)
Requirement already satisfied: nvidia-nccl-cu11==2.14.3 in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (2.14.3)
Requirement already satisfied: nvidia-cublas-cu11==11.10.3.66 in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (11.10.3.66)
Requirement already satisfied: nvidia-nvtx-cu11==11.7.91 in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (11.7.91)
Requirement already satisfied: nvidia-curand-cu11==10.2.10.91 in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (10.2.10.91)
Requirement already satisfied: nvidia-cufft-cu11==10.9.0.58 in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (10.9.0.58)
Requirement already satisfied: nvidia-cuda-cupti-cu11==11.7.101 in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (11.7.101)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (3.1.2)
Requirement already satisfied: nvidia-cudnn-cu11==8.5.0.96 in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (8.5.0.96)
Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (2.0.0)
Requirement already satisfied: filelock in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (3.10.7)
Requirement already satisfied: nvidia-cusolver-cu11==11.4.0.1 in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (11.4.0.1)
Requirement already satisfied: networkx in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (3.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (1.11.1)
Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.7.99 in /usr/local/lib/python3.9/dist-packages (from torch>=1.10->vision_transformers) (11.7.99)
Requirement already satisfied: setuptools in /usr/local/lib/python3.9/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.10->vision_transformers) (59.5.0)
Requirement already satisfied: wheel in /usr/local/lib/python3.9/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.10->vision_transformers) (0.40.0)
Requirement already satisfied: lit in /usr/local/lib/python3.9/dist-packages (from triton==2.0.0->torch>=1.10->vision_transformers) (16.0.0)
Requirement already satisfied: cmake in /usr/local/lib/python3.9/dist-packages (from triton==2.0.0->torch>=1.10->vision_transformers) (3.25.2)
Requirement already satisfied: requests in /usr/local/lib/python3.9/dist-packages (from torchvision->vision_transformers) (2.27.1)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /usr/local/lib/python3.9/dist-packages (from torchvision->vision_transformers) (8.4.0)
Collecting torch>=1.10
Using cached torch-1.12.0-cp39-cp39-manylinux1_x86_64.whl (776.3 MB)
Requirement already satisfied: numpy in /usr/local/lib/python3.9/dist-packages (from torchvision->vision_transformers) (1.22.4)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.9/dist-packages (from jinja2->torch>=1.10->vision_transformers) (2.1.2)
Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.9/dist-packages (from requests->torchvision->vision_transformers) (2.0.12)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests->torchvision->vision_transformers) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests->torchvision->vision_transformers) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests->torchvision->vision_transformers) (2022.12.7)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.9/dist-packages (from sympy->torch>=1.10->vision_transformers) (1.3.0)
Building wheels for collected packages: vision_transformers
Building wheel for vision_transformers (setup.py) ... done
Created wheel for vision_transformers: filename=vision_transformers-0.1.1.0-py3-none-any.whl size=48431 sha256=64be1b21506d11b9bba0bf2d026e79cfff9fc4133238bf701a865cc4e191d9bb
Stored in directory: /root/.cache/pip/wheels/1d/86/13/31f5fe3a4f2bc3dd24af79c12d5f732256e5047cb70d7485ef
Successfully built vision_transformers
Installing collected packages: torch, vision_transformers
Attempting uninstall: torch
Found existing installation: torch 2.0.0
Uninstalling torch-2.0.0:
Successfully uninstalled torch-2.0.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchtext 0.15.1 requires torch==2.0.0, but you have torch 1.12.0 which is incompatible.
torchdata 0.6.0 requires torch==2.0.0, but you have torch 1.12.0 which is incompatible.
torchaudio 2.0.1+cu118 requires torch==2.0.0, but you have torch 1.12.0 which is incompatible.
Successfully installed torch-1.12.0 vision_transformers-0.1.1.0

Old training run

!python train.py --model fasterrcnn_mobilenetv3_large_fpn --data data_configs/clove_2023.yaml --epochs 40 --no-mosaic --name training_old --seed 42
Not using distributed mode
wandb: Currently logged in as: auliadarnilasarii97 (mandev). Use wandb login --relogin to force relogin
wandb: Tracking run with wandb version 0.14.2
wandb: Run data is saved locally in /content/drive/MyDrive/cangkehdetection/fasterrcnn-pytorch-training-pipeline/wandb/run-20230409_064846-ws99a2ql
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run training_old
wandb: ⭐️ View project at https://wandb.ai/mandev/fasterrcnn-pytorch-training-pipeline
wandb: 🚀 View run at https://wandb.ai/mandev/fasterrcnn-pytorch-training-pipeline/runs/ws99a2ql
device cuda
2023-04-09 06:48:47.993981: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-04-09 06:48:49.286339: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Creating data loaders
/usr/local/lib/python3.9/dist-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 4 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Number of training samples: 1470
Number of validation samples: 179

Building model from scratch...
/usr/local/lib/python3.9/dist-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
warnings.warn(
/usr/local/lib/python3.9/dist-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing weights=FasterRCNN_MobileNet_V3_Large_FPN_Weights.COCO_V1. You can also use weights=FasterRCNN_MobileNet_V3_Large_FPN_Weights.DEFAULT to get the most up-to-date weights.
warnings.warn(msg)
Downloading: "https://download.pytorch.org/models/fasterrcnn_mobilenet_v3_large_fpn-fb6a3cc7.pth" to /root/.cache/torch/hub/checkpoints/fasterrcnn_mobilenet_v3_large_fpn-fb6a3cc7.pth
100% 74.2M/74.2M [00:00<00:00, 88.0MB/s]
Traceback (most recent call last):
  File "/content/drive/MyDrive/cangkehdetection/fasterrcnn-pytorch-training-pipeline/train.py", line 539, in <module>
    main(args)
  File "/content/drive/MyDrive/cangkehdetection/fasterrcnn-pytorch-training-pipeline/train.py", line 331, in main
    model = model.to(DEVICE)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 927, in to
    return self._apply(convert)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 602, in _apply
    param_applied = fn(param)
  File "/usr/local/lib/python3.9/dist-packages/torch/nn/modules/module.py", line 925, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/usr/local/lib/python3.9/dist-packages/torch/cuda/__init__.py", line 217, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: 🚀 View run training_old at: https://wandb.ai/mandev/fasterrcnn-pytorch-training-pipeline/runs/ws99a2ql
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20230409_064846-ws99a2ql/logs
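
The traceback above is the classic symptom of requesting CUDA on a runtime with no GPU attached (the Colab session here had no accelerator). Independent of train.py's internals, a defensive device selection looks like this sketch:

    import torch

    # Fall back to CPU when no NVIDIA driver/GPU is visible, avoiding the
    # RuntimeError raised by CUDA initialization on CPU-only runtimes.
    DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f'Using device: {DEVICE}')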

More metrics

Hi,

I would like to use your repo to run Faster RCNN on my custom dataset. I would like to check whether detections are TP, FP, or FN and use that to compute the true positive rate, precision, recall, and F1 score. Could you please guide me on how to do this in inference mode?
Do you think I can use the following section of eval.py to compute these metrics?
    for i in range(len(images)):
        true_dict = dict()
        preds_dict = dict()
        true_dict['boxes'] = targets[i]['boxes'].detach().cpu()
        true_dict['labels'] = targets[i]['labels'].detach().cpu()
        preds_dict['boxes'] = outputs[i]['boxes'].detach().cpu()
        preds_dict['scores'] = outputs[i]['scores'].detach().cpu()
        preds_dict['labels'] = outputs[i]['labels'].detach().cpu()
        preds.append(preds_dict)
        target.append(true_dict)
I only see the average precision.
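
One way to get TP/FP/FN from those preds/target lists is greedy IoU matching with torchvision.ops.box_iou. The sketch below is not part of eval.py; the function name and both thresholds are assumptions, and class labels are ignored for brevity:

    from torchvision.ops import box_iou

    def precision_recall_f1(preds, targets, iou_thresh=0.5, score_thresh=0.5):
        # Count TP/FP/FN across the dataset via greedy one-to-one matching.
        # Predictions are assumed score-sorted (torchvision outputs are).
        tp, fp, fn = 0, 0, 0
        for pred, true in zip(preds, targets):
            keep = pred['scores'] >= score_thresh
            boxes = pred['boxes'][keep]
            gt = true['boxes']
            if len(boxes) == 0:
                fn += len(gt)
                continue
            if len(gt) == 0:
                fp += len(boxes)
                continue
            ious = box_iou(boxes, gt)  # shape: (num_preds, num_gt)
            matched = set()
            for i in range(len(boxes)):
                best_iou, j = ious[i].max(0)
                if best_iou >= iou_thresh and j.item() not in matched:
                    tp += 1
                    matched.add(j.item())
                else:
                    fp += 1
            fn += len(gt) - len(matched)
        precision = tp / max(tp + fp, 1)
        recall = tp / max(tp + fn, 1)
        f1 = 2 * precision * recall / max(precision + recall, 1e-8)
        return precision, recall, f1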

ResNet-101 network

  1. I want to use the ResNet-101 model, but I can't find it in the models folder. Could you please help? (See the sketch after this list.)

  2. How can I find the precision and recall for each object?

Thank you
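
On question 1: torchvision (>= 0.13) can assemble a ResNet-101 Faster RCNN directly, so a dedicated file in the repo's models folder is not strictly required. A hedged sketch, with num_classes=3 matching the clove config above:

    from torchvision.models import ResNet101_Weights
    from torchvision.models.detection import FasterRCNN
    from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

    # ImageNet-pretrained ResNet-101 backbone with an FPN on top.
    backbone = resnet_fpn_backbone(
        backbone_name='resnet101',
        weights=ResNet101_Weights.IMAGENET1K_V1,
    )
    model = FasterRCNN(backbone, num_classes=3)  # 2 object classes + background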

How to train with pretrained weights?

The train.py file accepts --model and --weights arguments. When I use the --model argument, training starts from scratch. And when I use --weights and set the path to a .pth file, I get this error:

Loading pretrained weights...
Traceback (most recent call last):
  File "E:\fastercnn-pytorch-training-pipeline\train.py", line 547, in <module>
    main(args)
  File "E:\fastercnn-pytorch-training-pipeline\train.py", line 293, in main
    keys = list(checkpoint['model_state_dict'].keys())
KeyError: 'model_state_dict'
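
The KeyError suggests the .pth file is a plain state_dict rather than a full training checkpoint that wraps the weights under a 'model_state_dict' key. A hedged workaround outside train.py is to detect which format you have before loading; 'weights.pth' and num_classes=3 below are placeholders:

    import torch
    import torchvision

    model = torchvision.models.detection.fasterrcnn_mobilenet_v3_large_fpn(num_classes=3)
    checkpoint = torch.load('weights.pth', map_location='cpu')
    # A training checkpoint wraps weights under 'model_state_dict'; a file
    # saved with torch.save(model.state_dict(), ...) is the state_dict itself.
    state_dict = checkpoint.get('model_state_dict', checkpoint)
    model.load_state_dict(state_dict)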

Training is very slow: prefetching and caching data?

I have been trying to implement my own custom dataset and it works, but it seems like the data is read from the files every epoch. Is there no way of caching the data? I have mostly worked in TensorFlow, where prefetching and caching the data can give a tremendous improvement in speed.
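
No built-in cache appears in this transcript, but a dataset can memoize decoded images itself. A minimal sketch, assuming the images fit in RAM (note that with num_workers > 0 each worker process keeps its own cache):

    from functools import lru_cache
    from PIL import Image
    from torch.utils.data import Dataset

    class CachedImageDataset(Dataset):
        # Hypothetical wrapper: each image file is decoded at most once
        # per process and served from memory on later epochs.
        def __init__(self, image_paths, targets):
            self.image_paths = image_paths
            self.targets = targets

        @lru_cache(maxsize=None)
        def _load(self, path):
            return Image.open(path).convert('RGB')

        def __len__(self):
            return len(self.image_paths)

        def __getitem__(self, idx):
            return self._load(self.image_paths[idx]), self.targets[idx]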

Model metrics / benchmarks for different backbones

Love this repo.

I tend to default to using the resnet50_v2 backbone (the best performer in the torchvision object detection metrics). Curious whether your experiments suggest that this is a good choice or not?

And if you're not sure, I'd be tempted to perform some training runs and write them up.

Distributed execution on single machine?

Is there any instruction to follow for setting up distributed execution, preferably on a single machine with multiple GPUs?
I can see that it should be possible, but I don't understand how to do it.

eval with specific threshold

Is there a way to get the precision/recall for a specific threshold? Or a way to find the best threshold for my model?
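
One option, as a sketch rather than existing repo code: filter predictions by confidence before matching them against the ground truth, then sweep the cutoff. filter_by_score is a hypothetical helper, and the commented sweep reuses the precision_recall_f1 sketch from the "More metrics" question above:

    def filter_by_score(outputs, threshold):
        # Keep only detections whose confidence is at or above `threshold`.
        # `outputs` is the usual list of dicts of 'boxes'/'scores'/'labels' tensors.
        filtered = []
        for out in outputs:
            keep = out['scores'] >= threshold
            filtered.append({key: value[keep] for key, value in out.items()})
        return filtered

    # Sweep thresholds and keep the one with the best F1:
    # best_f1, best_t = max(
    #     (precision_recall_f1(filter_by_score(outputs, t), targets)[2], t)
    #     for t in [x / 100 for x in range(10, 95, 5)]
    # )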
