The holocron from frgfm

[references] Add codecarbon integration in training scripts

Since the reference scripts can be used on very different hardware, it is important to keep track of the footprint of such operations. In order to do so, Codecarbon could be used: https://github.com/mlco2/codecarbon

Inplace bias correction

Holocron/holocron/optim/radam.py

Line 90 in 7195235

exp_avg.div_(bias_correction1)

Here the bias correction is applied in place, so it affects exp_avg in the following steps (the bias correction accumulates). I wonder whether that was intentional?
The original Adam doesn't accumulate bias correction: it is applied at each step to compute the bias-corrected moment estimates (the ones written with a hat), while the stored moving averages are left untouched.
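
To illustrate the concern, here is a minimal scalar sketch in plain Python. It is not the Holocron code: `beta1`, the constant gradient, and both function names are assumptions made for illustration. With a constant gradient of 1.0 the bias-corrected first moment should stay at 1.0, but dividing the stored average in place makes the correction compound from one step to the next.

```python
def first_moment_out_of_place(grads, beta1=0.9):
    """Adam-style: keep the raw moving average, correct a temporary value."""
    exp_avg, corrected = 0.0, 0.0
    for step, grad in enumerate(grads, start=1):
        exp_avg = beta1 * exp_avg + (1 - beta1) * grad
        corrected = exp_avg / (1 - beta1 ** step)  # stored state is untouched
    return corrected


def first_moment_in_place(grads, beta1=0.9):
    """Same update, but the correction overwrites the stored average."""
    exp_avg = 0.0
    for step, grad in enumerate(grads, start=1):
        exp_avg = beta1 * exp_avg + (1 - beta1) * grad
        exp_avg = exp_avg / (1 - beta1 ** step)  # correction leaks into the next step
    return exp_avg


grads = [1.0] * 5
print(first_moment_out_of_place(grads))  # stays close to 1.0
print(first_moment_in_place(grads))      # grows well above 1.0
```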

PolyLoss doesn't support soft targets

Bug description

Hello, @frgfm
When calling the PolyLoss() function, I got the following runtime error, but didn't know how to solve it.
Please help~

Code snippet to reproduce the bug

directly invoke PolyLoss() in my framework

Error traceback

File "/path/to/poly_loss.py", in forward
    return poly_loss(x, target, self.eps, self.weight, self.ignore_index, self.reduction)
File "/path/to/poly_loss.py", in poly_loss
    logpt = logpt.transpose(1, 0).flatten(1).gather(0, target.view(1, -1)).squeeze()
RuntimeError: gather(): Expected dtype int64 for index

Environment

python 3.10
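
The traceback points at `gather`, which only accepts integer (int64) class indices, so a float or soft target cannot be used as an index. As a hedged illustration, the plain-Python sketch below computes a Poly-1 style loss for a single sample from either a class index or a probability vector. `poly1_loss_single`, its arguments, and the `eps` default are hypothetical and not part of Holocron; the formula assumed is cross-entropy plus `eps * (1 - pt)`.

```python
import math


def poly1_loss_single(logits, target, eps=2.0):
    """Poly-1 style loss for one sample.

    `target` is either an int class index or a list of class probabilities.
    """
    # Numerically stable log-softmax
    max_logit = max(logits)
    log_sum = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    log_probs = [x - log_sum for x in logits]

    # Turn a hard label into a one-hot distribution so both cases share one path
    if isinstance(target, int):
        target = [1.0 if idx == target else 0.0 for idx in range(len(logits))]

    cross_entropy = -sum(t * lp for t, lp in zip(target, log_probs))
    # Probability mass the model assigns to the target distribution
    pt = sum(t * math.exp(lp) for t, lp in zip(target, log_probs))
    return cross_entropy + eps * (1.0 - pt)


logits = [1.0303, -2.0048, 1.0727]
print(poly1_loss_single(logits, 2))                # hard label
print(poly1_loss_single(logits, [0.1, 0.0, 0.9]))  # soft label
```

With hard labels only, casting the target to int64 before calling the loss should be enough to avoid the dtype error.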

[references] Add support of CIFAR-10/100 for image classification

Since the CIFAR datasets are popular image datasets, it could be worth integrating them as options in the classification training script.

[models] Implement latest classification SOTA models

The library should include the following recent SOTA models:

ConvNeXt from https://arxiv.org/abs/2201.03545 (#251 #267)
EfficientNetV2 from https://arxiv.org/pdf/2104.00298.pdf
ResNet-RS from https://arxiv.org/pdf/2103.07579.pdf
MobileOne from https://arxiv.org/pdf/2206.04040.pdf (#252)
HorNet from https://arxiv.org/pdf/2207.14284.pdf
RegNet from https://arxiv.org/pdf/2003.13678.pdf

[models] Implement Dynamic UNet by passing any backbone

UNet architectures do not currently leverage pretrained backbones. Their constructor should accept a backbone as an argument (the bare backbone being the one defined in the original paper).

Add annotation typing and mypy CI verification

It would be preferable to add type annotations early, for clearer documentation and interfaces. A CI job should be added to enforce mypy checks wherever possible.

[models] Detection training seems not to be converging properly

[trainer] Add gradient accumulation option

All trainers should have an option to use gradient accumulation so that we can train with a decent effective batch size without a beefy GPU.
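
As a rough illustration of the idea, the sketch below uses plain Python on a one-parameter least-squares model. The function names and data are made up for this example, and this is not the trainer's API. Gradients from several equally sized micro-batches are scaled by `1 / accumulation_steps` and summed before a single update, which reproduces the full-batch gradient.

```python
def mse_grad(weight, inputs, targets):
    """Gradient of mean((weight * x - y) ** 2) with respect to weight."""
    return sum(2 * (weight * x - y) * x for x, y in zip(inputs, targets)) / len(inputs)


def accumulated_grad(weight, inputs, targets, accumulation_steps):
    """Sum micro-batch gradients, each scaled by 1 / accumulation_steps.

    Assumes len(inputs) is divisible by accumulation_steps.
    """
    micro_size = len(inputs) // accumulation_steps
    total = 0.0
    for step in range(accumulation_steps):
        start, stop = step * micro_size, (step + 1) * micro_size
        total += mse_grad(weight, inputs[start:stop], targets[start:stop]) / accumulation_steps
    return total


inputs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
targets = [2.0 * x for x in inputs]
full = mse_grad(0.5, inputs, targets)
accumulated = accumulated_grad(0.5, inputs, targets, accumulation_steps=4)
print(full, accumulated)  # the two values agree up to rounding
```

In a real training loop the same scaling would be applied to the loss of each micro-batch, with the optimizer step and gradient reset happening only once every `accumulation_steps` batches.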

polyloss not working when target contains `ignore_index`

Bug description

poly_loss raises an out-of-bounds index error when the target contains ignore_index.

Code snippet to reproduce the bug

import torch

# poly_loss is assumed to be already imported from Holocron (the report does not show the import)
labels = [-100, 2, 1]
logits = [[-2.0605, -1.0522, 1.0922], [1.0303, -2.0048, 1.0727], [-1.1031, 1.0414, -2.0464]]
labels_pt = torch.tensor(labels, dtype=torch.int64)
logits_pt = torch.tensor(logits, dtype=torch.float32)

loss = poly_loss(logits_pt, labels_pt, reduction='none')

print('loss', loss)

Error traceback

index -100 is out of bounds for dimension 0 with size 3
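
One possible way to handle this is sketched below in plain Python rather than PyTorch. The helper name `poly1_loss_with_ignore` and the `eps` default are assumptions, not Holocron's implementation. Samples whose label equals `ignore_index` are skipped before any indexing happens and receive a zero loss, so an unreduced output still has one value per sample.

```python
import math


def poly1_loss_with_ignore(logits, labels, eps=2.0, ignore_index=-100):
    """Per-sample Poly-1 style loss; ignored samples contribute 0.0."""
    losses = []
    for row, label in zip(logits, labels):
        if label == ignore_index:
            # Never use the ignored label as an index
            losses.append(0.0)
            continue
        max_logit = max(row)
        log_sum = max_logit + math.log(sum(math.exp(x - max_logit) for x in row))
        log_pt = row[label] - log_sum  # log-probability of the target class
        losses.append(-log_pt + eps * (1.0 - math.exp(log_pt)))
    return losses


labels = [-100, 2, 1]
logits = [
    [-2.0605, -1.0522, 1.0922],
    [1.0303, -2.0048, 1.0727],
    [-1.1031, 1.0414, -2.0464],
]
print(poly1_loss_with_ignore(logits, labels))
```

In a tensor implementation the equivalent step would be a boolean mask over the targets, applied before the `gather` call.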

Environment

[trainer] Add an image target size suggestion

Similarly to the LR finder, Holocron should integrate a suggestion for image target sizes. The goal is to limit oversampling / downsampling side effects while respecting aspect ratios.

Here is how this could be done:

A shape_recorder goes through all images to produce the lists of heights and widths.
Then we produce two arrays: aratio (height divided by width) and side (square root of height multiplied by width).
Next, we want a range of values for each of those that fits a high percentage of the distribution.
Finally, we set the aspect ratio and take the resolved side. The target size (height, width) will be: side * sqrt(aratio), side / sqrt(aratio).
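
The steps above can be sketched as follows. This is a minimal illustration, not an existing Holocron function: `suggest_target_size` is a hypothetical name, and the median is used as a stand-in for "a value that fits a high percentage of the distribution", which is an assumption of this example.

```python
import math
import statistics


def suggest_target_size(heights, widths):
    """Suggest a (height, width) target size from recorded image shapes."""
    aratios = [h / w for h, w in zip(heights, widths)]
    sides = [math.sqrt(h * w) for h, w in zip(heights, widths)]
    # Assumption: the median represents the bulk of each distribution
    aratio = statistics.median(aratios)
    side = statistics.median(sides)
    # side * sqrt(aratio) recovers a height, side / sqrt(aratio) a width
    return round(side * math.sqrt(aratio)), round(side / math.sqrt(aratio))


heights = [300, 300, 480, 600]
widths = [400, 400, 640, 800]
print(suggest_target_size(heights, widths))
```

A percentile range instead of the median would follow the description more closely, at the cost of having to pick a single value inside that range.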

Pretrained object detection

Is there a pretrained model available for object detection? I tried using yolov4, but I get an error when loading the pretrained weights because there is no URL in the script.

[nn] Implement Involutions

For benchmarking purposes, it would be a good idea to implement the following paper:
https://arxiv.org/abs/2103.06255

[models] Add ONNX export compatibility for classification models

All classification models should be ONNX exportable:

Error Resnet pretrained

🐛 Bug

Hi, I want to use a pretrained ResNet-50, but I get the error below, which I am having trouble resolving.
It also occurs for ResNet-18 and ResNet-101.
I have not tested the others.

To Reproduce

Steps to reproduce the behavior:

import holocron.models

pretrained = True
backbone = "resnet50"
model = holocron.models.__dict__[backbone](pretrained, num_classes=1)

File "/.../.../Holocron/holocron/models/resnet.py", line 327, in resnet50
    return _resnet('resnet50', pretrained, progress, **kwargs)
File "/.../.../Holocron/holocron/models/resnet.py", line 280, in _resnet
    load_pretrained_params(model, default_cfgs[arch]['url'], progress)
File "/.../.../Holocron/holocron/models/utils.py", line 68, in load_pretrained_params
    model.load_state_dict(state_dict)
File "/.../.../mainvenv/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResNet:

Missing key(s) in state_dict: "features.0.weight", "features.1.weight", "features.1.bias", "features.1.running_mean", "features.1.running_var", "features.4.0.conv.0.weight", "features.4.0.conv.1.weight", "features.4.0.conv.1.bias", "features.4.0.conv.1.running_mean", "features.4.0.conv.1.running_var", "features.4.0.conv.3.weight", "features.4.0.conv.4.weight", "features.4.0.conv.4.bias", "features.4.0.conv.4.running_mean", "features.4.0.conv.4.running_var", "features.4.0.conv.6.weight", "features.4.0.conv.7.weight", "features.4.0.conv.7.bias", "features.4.0.conv.7.running_mean", "features.4.0.conv.7.running_var", "features.4.0.downsample.0.weight", "features.4.0.downsample.1.weight", "features.4.0.downsample.1.bias", "features.4.0.downsample.1.running_mean", "features.4.0.downsample.1.running_var", "features.4.1.conv.0.weight", "features.4.1.conv.1.weight", "features.4.1.conv.1.bias", "features.4.1.conv.1.running_mean", "features.4.1.conv.1.running_var", "features.4.1.conv.3.weight", "features.4.1.conv.4.weight", "features.4.1.conv.4.bias", "features.4.1.conv.4.running_mean", "features.4.1.conv.4.running_var", "features.4.1.conv.6.weight", "features.4.1.conv.7.weight", "features.4.1.conv.7.bias", "features.4.1.conv.7.running_mean", "features.4.1.conv.7.running_var", "features.4.2.conv.0.weight", "features.4.2.conv.1.weight", "features.4.2.conv.1.bias", "features.4.2.conv.1.running_mean", "features.4.2.conv.1.running_var", "features.4.2.conv.3.weight", "features.4.2.conv.4.weight", "features.4.2.conv.4.bias", "features.4.2.conv.4.running_mean", "features.4.2.conv.4.running_var", "features.4.2.conv.6.weight", "features.4.2.conv.7.weight", "features.4.2.conv.7.bias", "features.4.2.conv.7.running_mean", "features.4.2.conv.7.running_var", "features.5.0.conv.0.weight", "features.5.0.conv.1.weight", "features.5.0.conv.1.bias", "features.5.0.conv.1.running_mean", "features.5.0.conv.1.running_var", "features.5.0.conv.3.weight", "features.5.0.conv.4.weight", 
"features.5.0.conv.4.bias", "features.5.0.conv.4.running_mean", "features.5.0.conv.4.running_var", "features.5.0.conv.6.weight", "features.5.0.conv.7.weight", "features.5.0.conv.7.bias", "features.5.0.conv.7.running_mean", "features.5.0.conv.7.running_var", "features.5.0.downsample.0.weight", "features.5.0.downsample.1.weight", "features.5.0.downsample.1.bias", "features.5.0.downsample.1.running_mean", "features.5.0.downsample.1.running_var", "features.5.1.conv.0.weight", "features.5.1.conv.1.weight", "features.5.1.conv.1.bias", "features.5.1.conv.1.running_mean", "features.5.1.conv.1.running_var", "features.5.1.conv.3.weight", "features.5.1.conv.4.weight", "features.5.1.conv.4.bias", "features.5.1.conv.4.running_mean", "features.5.1.conv.4.running_var", "features.5.1.conv.6.weight", "features.5.1.conv.7.weight", "features.5.1.conv.7.bias", "features.5.1.conv.7.running_mean", "features.5.1.conv.7.running_var", "features.5.2.conv.0.weight", "features.5.2.conv.1.weight", "features.5.2.conv.1.bias", "features.5.2.conv.1.running_mean", "features.5.2.conv.1.running_var", "features.5.2.conv.3.weight", "features.5.2.conv.4.weight", "features.5.2.conv.4.bias", "features.5.2.conv.4.running_mean", "features.5.2.conv.4.running_var", "features.5.2.conv.6.weight", "features.5.2.conv.7.weight", "features.5.2.conv.7.bias", "features.5.2.conv.7.running_mean", "features.5.2.conv.7.running_var", "features.5.3.conv.0.weight", "features.5.3.conv.1.weight", "features.5.3.conv.1.bias", "features.5.3.conv.1.running_mean", "features.5.3.conv.1.running_var", "features.5.3.conv.3.weight", "features.5.3.conv.4.weight", "features.5.3.conv.4.bias", "features.5.3.conv.4.running_mean", "features.5.3.conv.4.running_var", "features.5.3.conv.6.weight", "features.5.3.conv.7.weight", "features.5.3.conv.7.bias", "features.5.3.conv.7.running_mean", "features.5.3.conv.7.running_var", "features.6.0.conv.0.weight", "features.6.0.conv.1.weight", "features.6.0.conv.1.bias", 
"features.6.0.conv.1.running_mean", "features.6.0.conv.1.running_var", "features.6.0.conv.3.weight", "features.6.0.conv.4.weight", "features.6.0.conv.4.bias", "features.6.0.conv.4.running_mean", "features.6.0.conv.4.running_var", "features.6.0.conv.6.weight", "features.6.0.conv.7.weight", "features.6.0.conv.7.bias", "features.6.0.conv.7.running_mean", "features.6.0.conv.7.running_var", "features.6.0.downsample.0.weight", "features.6.0.downsample.1.weight", "features.6.0.downsample.1.bias", "features.6.0.downsample.1.running_mean", "features.6.0.downsample.1.running_var", "features.6.1.conv.0.weight", "features.6.1.conv.1.weight", "features.6.1.conv.1.bias", "features.6.1.conv.1.running_mean", "features.6.1.conv.1.running_var", "features.6.1.conv.3.weight", "features.6.1.conv.4.weight", "features.6.1.conv.4.bias", "features.6.1.conv.4.running_mean", "features.6.1.conv.4.running_var", "features.6.1.conv.6.weight", "features.6.1.conv.7.weight", "features.6.1.conv.7.bias", "features.6.1.conv.7.running_mean", "features.6.1.conv.7.running_var", "features.6.2.conv.0.weight", "features.6.2.conv.1.weight", "features.6.2.conv.1.bias", "features.6.2.conv.1.running_mean", "features.6.2.conv.1.running_var", "features.6.2.conv.3.weight", "features.6.2.conv.4.weight", "features.6.2.conv.4.bias", "features.6.2.conv.4.running_mean", "features.6.2.conv.4.running_var", "features.6.2.conv.6.weight", "features.6.2.conv.7.weight", "features.6.2.conv.7.bias", "features.6.2.conv.7.running_mean", "features.6.2.conv.7.running_var", "features.6.3.conv.0.weight", "features.6.3.conv.1.weight", "features.6.3.conv.1.bias", "features.6.3.conv.1.running_mean", "features.6.3.conv.1.running_var", "features.6.3.conv.3.weight", "features.6.3.conv.4.weight", "features.6.3.conv.4.bias", "features.6.3.conv.4.running_mean", "features.6.3.conv.4.running_var", "features.6.3.conv.6.weight", "features.6.3.conv.7.weight", "features.6.3.conv.7.bias", "features.6.3.conv.7.running_mean", 
"features.6.3.conv.7.running_var", "features.6.4.conv.0.weight", "features.6.4.conv.1.weight", "features.6.4.conv.1.bias", "features.6.4.conv.1.running_mean", "features.6.4.conv.1.running_var", "features.6.4.conv.3.weight", "features.6.4.conv.4.weight", "features.6.4.conv.4.bias", "features.6.4.conv.4.running_mean", "features.6.4.conv.4.running_var", "features.6.4.conv.6.weight", "features.6.4.conv.7.weight", "features.6.4.conv.7.bias", "features.6.4.conv.7.running_mean", "features.6.4.conv.7.running_var", "features.6.5.conv.0.weight", "features.6.5.conv.1.weight", "features.6.5.conv.1.bias", "features.6.5.conv.1.running_mean", "features.6.5.conv.1.running_var", "features.6.5.conv.3.weight", "features.6.5.conv.4.weight", "features.6.5.conv.4.bias", "features.6.5.conv.4.running_mean", "features.6.5.conv.4.running_var", "features.6.5.conv.6.weight", "features.6.5.conv.7.weight", "features.6.5.conv.7.bias", "features.6.5.conv.7.running_mean", "features.6.5.conv.7.running_var", "features.7.0.conv.0.weight", "features.7.0.conv.1.weight", "features.7.0.conv.1.bias", "features.7.0.conv.1.running_mean", "features.7.0.conv.1.running_var", "features.7.0.conv.3.weight", "features.7.0.conv.4.weight", "features.7.0.conv.4.bias", "features.7.0.conv.4.running_mean", "features.7.0.conv.4.running_var", "features.7.0.conv.6.weight", "features.7.0.conv.7.weight", "features.7.0.conv.7.bias", "features.7.0.conv.7.running_mean", "features.7.0.conv.7.running_var", "features.7.0.downsample.0.weight", "features.7.0.downsample.1.weight", "features.7.0.downsample.1.bias", "features.7.0.downsample.1.running_mean", "features.7.0.downsample.1.running_var", "features.7.1.conv.0.weight", "features.7.1.conv.1.weight", "features.7.1.conv.1.bias", "features.7.1.conv.1.running_mean", "features.7.1.conv.1.running_var", "features.7.1.conv.3.weight", "features.7.1.conv.4.weight", "features.7.1.conv.4.bias", "features.7.1.conv.4.running_mean", "features.7.1.conv.4.running_var", 
"features.7.1.conv.6.weight", "features.7.1.conv.7.weight", "features.7.1.conv.7.bias", "features.7.1.conv.7.running_mean", "features.7.1.conv.7.running_var", "features.7.2.conv.0.weight", "features.7.2.conv.1.weight", "features.7.2.conv.1.bias", "features.7.2.conv.1.running_mean", "features.7.2.conv.1.running_var", "features.7.2.conv.3.weight", "features.7.2.conv.4.weight", "features.7.2.conv.4.bias", "features.7.2.conv.4.running_mean", "features.7.2.conv.4.running_var", "features.7.2.conv.6.weight", "features.7.2.conv.7.weight", "features.7.2.conv.7.bias", "features.7.2.conv.7.running_mean", "features.7.2.conv.7.running_var", "head.weight", "head.bias". 
        
Unexpected key(s) in state_dict: "conv1.weight", "bn1.running_mean", "bn1.running_var", "bn1.weight", "bn1.bias", "layer1.0.conv1.weight", "layer1.0.bn1.running_mean", "layer1.0.bn1.running_var", "layer1.0.bn1.weight", "layer1.0.bn1.bias", "layer1.0.conv2.weight", "layer1.0.bn2.running_mean", "layer1.0.bn2.running_var", "layer1.0.bn2.weight", "layer1.0.bn2.bias", "layer1.0.conv3.weight", "layer1.0.bn3.running_mean", "layer1.0.bn3.running_var", "layer1.0.bn3.weight", "layer1.0.bn3.bias", "layer1.0.downsample.0.weight", "layer1.0.downsample.1.running_mean", "layer1.0.downsample.1.running_var", "layer1.0.downsample.1.weight", "layer1.0.downsample.1.bias", "layer1.1.conv1.weight", "layer1.1.bn1.running_mean", "layer1.1.bn1.running_var", "layer1.1.bn1.weight", "layer1.1.bn1.bias", "layer1.1.conv2.weight", "layer1.1.bn2.running_mean", "layer1.1.bn2.running_var", "layer1.1.bn2.weight", "layer1.1.bn2.bias", "layer1.1.conv3.weight", "layer1.1.bn3.running_mean", "layer1.1.bn3.running_var", "layer1.1.bn3.weight", "layer1.1.bn3.bias", "layer1.2.conv1.weight", "layer1.2.bn1.running_mean", "layer1.2.bn1.running_var", "layer1.2.bn1.weight", "layer1.2.bn1.bias", "layer1.2.conv2.weight", "layer1.2.bn2.running_mean", "layer1.2.bn2.running_var", "layer1.2.bn2.weight", "layer1.2.bn2.bias", "layer1.2.conv3.weight", "layer1.2.bn3.running_mean", "layer1.2.bn3.running_var", "layer1.2.bn3.weight", "layer1.2.bn3.bias", "layer2.0.conv1.weight", "layer2.0.bn1.running_mean", "layer2.0.bn1.running_var", "layer2.0.bn1.weight", "layer2.0.bn1.bias", "layer2.0.conv2.weight", "layer2.0.bn2.running_mean", "layer2.0.bn2.running_var", "layer2.0.bn2.weight", "layer2.0.bn2.bias", "layer2.0.conv3.weight", "layer2.0.bn3.running_mean", "layer2.0.bn3.running_var", "layer2.0.bn3.weight", "layer2.0.bn3.bias", "layer2.0.downsample.0.weight", "layer2.0.downsample.1.running_mean", "layer2.0.downsample.1.running_var", "layer2.0.downsample.1.weight", "layer2.0.downsample.1.bias", "layer2.1.conv1.weight", 
"layer2.1.bn1.running_mean", "layer2.1.bn1.running_var", "layer2.1.bn1.weight", "layer2.1.bn1.bias", "layer2.1.conv2.weight", "layer2.1.bn2.running_mean", "layer2.1.bn2.running_var", "layer2.1.bn2.weight", "layer2.1.bn2.bias", "layer2.1.conv3.weight", "layer2.1.bn3.running_mean", "layer2.1.bn3.running_var", "layer2.1.bn3.weight", "layer2.1.bn3.bias", "layer2.2.conv1.weight", "layer2.2.bn1.running_mean", "layer2.2.bn1.running_var", "layer2.2.bn1.weight", "layer2.2.bn1.bias", "layer2.2.conv2.weight", "layer2.2.bn2.running_mean", "layer2.2.bn2.running_var", "layer2.2.bn2.weight", "layer2.2.bn2.bias", "layer2.2.conv3.weight", "layer2.2.bn3.running_mean", "layer2.2.bn3.running_var", "layer2.2.bn3.weight", "layer2.2.bn3.bias", "layer2.3.conv1.weight", "layer2.3.bn1.running_mean", "layer2.3.bn1.running_var", "layer2.3.bn1.weight", "layer2.3.bn1.bias", "layer2.3.conv2.weight", "layer2.3.bn2.running_mean", "layer2.3.bn2.running_var", "layer2.3.bn2.weight", "layer2.3.bn2.bias", "layer2.3.conv3.weight", "layer2.3.bn3.running_mean", "layer2.3.bn3.running_var", "layer2.3.bn3.weight", "layer2.3.bn3.bias", "layer3.0.conv1.weight", "layer3.0.bn1.running_mean", "layer3.0.bn1.running_var", "layer3.0.bn1.weight", "layer3.0.bn1.bias", "layer3.0.conv2.weight", "layer3.0.bn2.running_mean", "layer3.0.bn2.running_var", "layer3.0.bn2.weight", "layer3.0.bn2.bias", "layer3.0.conv3.weight", "layer3.0.bn3.running_mean", "layer3.0.bn3.running_var", "layer3.0.bn3.weight", "layer3.0.bn3.bias", "layer3.0.downsample.0.weight", "layer3.0.downsample.1.running_mean", "layer3.0.downsample.1.running_var", "layer3.0.downsample.1.weight", "layer3.0.downsample.1.bias", "layer3.1.conv1.weight", "layer3.1.bn1.running_mean", "layer3.1.bn1.running_var", "layer3.1.bn1.weight", "layer3.1.bn1.bias", "layer3.1.conv2.weight", "layer3.1.bn2.running_mean", "layer3.1.bn2.running_var", "layer3.1.bn2.weight", "layer3.1.bn2.bias", "layer3.1.conv3.weight", "layer3.1.bn3.running_mean", "layer3.1.bn3.running_var", 
"layer3.1.bn3.weight", "layer3.1.bn3.bias", "layer3.2.conv1.weight", "layer3.2.bn1.running_mean", "layer3.2.bn1.running_var", "layer3.2.bn1.weight", "layer3.2.bn1.bias", "layer3.2.conv2.weight", "layer3.2.bn2.running_mean", "layer3.2.bn2.running_var", "layer3.2.bn2.weight", "layer3.2.bn2.bias", "layer3.2.conv3.weight", "layer3.2.bn3.running_mean", "layer3.2.bn3.running_var", "layer3.2.bn3.weight", "layer3.2.bn3.bias", "layer3.3.conv1.weight", "layer3.3.bn1.running_mean", "layer3.3.bn1.running_var", "layer3.3.bn1.weight", "layer3.3.bn1.bias", "layer3.3.conv2.weight", "layer3.3.bn2.running_mean", "layer3.3.bn2.running_var", "layer3.3.bn2.weight", "layer3.3.bn2.bias", "layer3.3.conv3.weight", "layer3.3.bn3.running_mean", "layer3.3.bn3.running_var", "layer3.3.bn3.weight", "layer3.3.bn3.bias", "layer3.4.conv1.weight", "layer3.4.bn1.running_mean", "layer3.4.bn1.running_var", "layer3.4.bn1.weight", "layer3.4.bn1.bias", "layer3.4.conv2.weight", "layer3.4.bn2.running_mean", "layer3.4.bn2.running_var", "layer3.4.bn2.weight", "layer3.4.bn2.bias", "layer3.4.conv3.weight", "layer3.4.bn3.running_mean", "layer3.4.bn3.running_var", "layer3.4.bn3.weight", "layer3.4.bn3.bias", "layer3.5.conv1.weight", "layer3.5.bn1.running_mean", "layer3.5.bn1.running_var", "layer3.5.bn1.weight", "layer3.5.bn1.bias", "layer3.5.conv2.weight", "layer3.5.bn2.running_mean", "layer3.5.bn2.running_var", "layer3.5.bn2.weight", "layer3.5.bn2.bias", "layer3.5.conv3.weight", "layer3.5.bn3.running_mean", "layer3.5.bn3.running_var", "layer3.5.bn3.weight", "layer3.5.bn3.bias", "layer4.0.conv1.weight", "layer4.0.bn1.running_mean", "layer4.0.bn1.running_var", "layer4.0.bn1.weight", "layer4.0.bn1.bias", "layer4.0.conv2.weight", "layer4.0.bn2.running_mean", "layer4.0.bn2.running_var", "layer4.0.bn2.weight", "layer4.0.bn2.bias", "layer4.0.conv3.weight", "layer4.0.bn3.running_mean", "layer4.0.bn3.running_var", "layer4.0.bn3.weight", "layer4.0.bn3.bias", "layer4.0.downsample.0.weight", 
"layer4.0.downsample.1.running_mean", "layer4.0.downsample.1.running_var", "layer4.0.downsample.1.weight", "layer4.0.downsample.1.bias", "layer4.1.conv1.weight", "layer4.1.bn1.running_mean", "layer4.1.bn1.running_var", "layer4.1.bn1.weight", "layer4.1.bn1.bias", "layer4.1.conv2.weight", "layer4.1.bn2.running_mean", "layer4.1.bn2.running_var", "layer4.1.bn2.weight", "layer4.1.bn2.bias", "layer4.1.conv3.weight", "layer4.1.bn3.running_mean", "layer4.1.bn3.running_var", "layer4.1.bn3.weight", "layer4.1.bn3.bias", "layer4.2.conv1.weight", "layer4.2.bn1.running_mean", "layer4.2.bn1.running_var", "layer4.2.bn1.weight", "layer4.2.bn1.bias", "layer4.2.conv2.weight", "layer4.2.bn2.running_mean", "layer4.2.bn2.running_var", "layer4.2.bn2.weight", "layer4.2.bn2.bias", "layer4.2.conv3.weight", "layer4.2.bn3.running_mean", "layer4.2.bn3.running_var", "layer4.2.bn3.weight", "layer4.2.bn3.bias", "fc.weight", "fc.bias".

Expected behavior

Environment

PyTorch Version (e.g., 1.7): torch==1.8.1+cu111 torchvision==0.9.1+cu111
OS (e.g., Linux): Ubuntu 18.04
Python version: 3.7.9
CUDA/cuDNN version: 11
GPU models and configuration: RTX 2080TI
Any other relevant information:

[models] res2net pretrained=True throws HTTP Error

As per this issue,
calling any functions from models.res2net throws:

HTTP Error 403: Forbidden

A temporary solution would be to catch the Error and throw a warning.
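
A minimal sketch of that workaround, using only the standard library, is shown below. `try_load_pretrained` and `failing_download` are hypothetical names, and the download is simulated rather than performed; the actual loading helper in the library may be structured differently.

```python
import warnings
from urllib.error import HTTPError


def try_load_pretrained(load_fn):
    """Call `load_fn`; on an HTTP error, warn and report failure instead of raising."""
    try:
        load_fn()
    except HTTPError as error:
        warnings.warn(f"Could not download pretrained weights ({error}); keeping random initialization.")
        return False
    return True


def failing_download():
    # Stand-in for a checkpoint download that the server rejects
    raise HTTPError("https://example.com/weights.pth", 403, "Forbidden", None, None)


print(try_load_pretrained(failing_download))
```

Returning a boolean lets the caller decide whether to continue with randomly initialized weights or stop.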

LAMB: Differences from the paper author's official implementation

The LAMB implementation you released for PyTorch differs from the official TensorFlow version released by the paper's authors. In the official implementation, some parameters are skipped according to their names when calculating the update, whereas in your implementation all parameters seem to be directly involved in the calculation.
For example: exclude_from_weight_decay=["batch_normalization", "LayerNorm", "layer_norm"]
Their implementation:
https://github.com/tensorflow/addons/blob/master/tensorflow_addons/optimizers/lamb.py
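
For illustration, a name-based filter in the spirit of the `exclude_from_weight_decay` option quoted above could look like the sketch below. The function names and the example parameter names are hypothetical, and this is neither the TensorFlow Addons code nor Holocron's.

```python
import re


def uses_weight_decay(param_name, exclude_patterns):
    """Return False when the parameter name matches any exclusion pattern."""
    return not any(re.search(pattern, param_name) for pattern in exclude_patterns)


def split_param_names(param_names, exclude_patterns):
    """Split names into (decayed, not_decayed) lists."""
    decayed = [n for n in param_names if uses_weight_decay(n, exclude_patterns)]
    not_decayed = [n for n in param_names if not uses_weight_decay(n, exclude_patterns)]
    return decayed, not_decayed


exclude = ["batch_normalization", "LayerNorm", "layer_norm"]
names = ["encoder.dense.weight", "encoder.LayerNorm.weight", "encoder.layer_norm.bias"]
print(split_param_names(names, exclude))
```

In PyTorch, the two lists could back two parameter groups passed to the optimizer, one with `weight_decay` set to zero, so the optimizer itself would not need to know about parameter names.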

cIoU loss calculation question

Thank you for designing this great tool!
I found that the implementation of the CIoU calculation is slightly different from the original paper (Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression).

The ciou_loss() function calculates the final loss with this code:
ciou_loss[_filter].addcdiv_(v[_filter] ** 2, 1 - iou[_filter] + v[_filter])

The alpha in the paper is formulated as: v / ((1 - IoU) + v)
whereas the code looks like it is calculating (v^2) / ((1 - IoU) + v).
I'd like to ask whether this is a better design compared to the original CIoU loss.
Anyway, thank you for the implementation, it really saves me a lot of time!
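
As a quick numeric check of the two expressions, assuming the paper's aspect-ratio penalty is `alpha * v` with `alpha = v / ((1 - IoU) + v)`, the sketch below compares that product with the fraction from the quoted line. The function names and sample values are made up for this example.

```python
def penalty_paper(iou, v):
    """Aspect-ratio penalty written as alpha * v, with alpha = v / ((1 - IoU) + v)."""
    alpha = v / ((1 - iou) + v)
    return alpha * v


def penalty_code(iou, v):
    """Same term written as in the quoted line: v ** 2 / (1 - IoU + v)."""
    return v ** 2 / (1 - iou + v)


for iou, v in [(0.3, 0.05), (0.6, 0.2), (0.9, 0.01)]:
    print(penalty_paper(iou, v), penalty_code(iou, v))
```

Under that assumption the two forms agree up to rounding, since multiplying alpha by v gives v squared over the same denominator. This only checks the algebra of this one term, not the rest of the loss.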

[models] Implement latest detection SOTA models

The library should include the following recent SOTA models:

YOLOv3 from https://arxiv.org/abs/1804.02767
Scaled YOLOv4 from https://arxiv.org/abs/2011.08036
YOLOX from https://arxiv.org/abs/2107.08430

Collaboration

Hello there,

I am the author of glasses. What if we join forces?

May the force be with you!

Francesco

Release tracker - v0.2.1

This issue is to be used to track the roadmap of Holocron for release v0.2.1, and collect feedback from users & contributors.

Trainer
- #227
Utils
- #225
Docs
- #226

Release tracker - v0.3.0

This issue is to be used to track the roadmap of Holocron for release v0.3.0, and collect feedback from users & contributors.

NN
- #209
Models
- #126
- #203
Optimizers
- #244
Transforms
- #237
Trainer
- #236
- #235
- #241
References
- #206
Demo
- #208
- #212

[trainer] Add a loss recorder option

Once a model has been trained, it would be nice to analyze its main failures. For that, one could:

go through the training set computing the loss (unreduced on the batch dimension)
keep the N samples with the highest loss
plot the image, the loss, the prediction and the target

A later option would be to display the CAM to better understand the model behaviour.
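
The selection step could be sketched as below, assuming the unreduced per-sample losses have already been collected into a list. `top_n_losses` is a hypothetical helper, not an existing trainer method.

```python
import heapq


def top_n_losses(per_sample_losses, n):
    """Return (index, loss) pairs for the n samples with the highest loss."""
    return heapq.nlargest(n, enumerate(per_sample_losses), key=lambda pair: pair[1])


losses = [0.12, 2.31, 0.05, 1.07, 0.88]
print(top_n_losses(losses, 2))  # [(1, 2.31), (3, 1.07)]
```

The returned indices can then be used to fetch the matching images, predictions and targets for plotting.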
