fasterrcnn's People

Contributors

trzy

fasterrcnn's Issues

Buggy implementation of smooth L1 loss

The implementation in pytorch/model/detector/regression_loss (line 144) appears to use x (instead of x_abs) in the branch selection.
Simply changing is_negative_branch = (x < (1.0 / sigma_squared)) to is_negative_branch = (x_abs < (1.0 / sigma_squared)) should fix it.
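For reference, here is a minimal scalar sketch of smooth L1 loss with the corrected branch condition (the repo applies the same logic elementwise to tensors; the default sigma value here is an assumption, not taken from the repo):

```python
def smooth_l1(x, sigma=3.0):
    """Scalar sketch of smooth L1 loss; the repo applies this elementwise."""
    sigma_squared = sigma * sigma
    x_abs = abs(x)
    # The fix: select the branch on |x|, not x, so that large *negative*
    # errors do not incorrectly fall into the quadratic branch.
    if x_abs < 1.0 / sigma_squared:
        return 0.5 * sigma_squared * x * x
    return x_abs - 0.5 / sigma_squared
```

With the buggy condition `x < 1.0 / sigma_squared`, an error of x = -2 would be scored quadratically (0.5 * 9 * 4 = 18) instead of linearly (2 - 0.5/9 ≈ 1.94), which inflates the loss for negative errors.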

Thank you for this project!

Step-by-step understanding of the approximate joint training method #192

I don't exactly understand the approximate joint training method.
I know the RPN and detector are merged into one network during training.
The forward pass starts at the pre-trained conv network, passes through the RPN, and finally arrives at the Fast R-CNN layers. The loss is computed as:

RPN classification loss + RPN regression loss + Detection classification loss + Detection bounding-box regression loss.

But what is the backpropagation path? Does it go from the detector through the RPN and finally into the pretrained convnet?
In that case, how is differentiation performed in the decoder section of the RPN? The offsets produced by the 1x1 regression conv layer in the RPN are translated into proposals in the decoder.
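In approximate joint training the decoder is not actually differentiated through: the decoded proposals are treated as constants, so gradients reach the RPN only via its own classification and regression losses. A conceptual illustration (not the repo's actual code; the box transform here is a stand-in):

```python
import torch

def decode_and_detach(anchors, offsets):
    # Stand-in for the real anchor-to-proposal box transform.
    proposals = anchors + offsets
    # Approximate joint training ignores the derivative with respect to
    # the proposal coordinates: the detector sees proposals as fixed
    # inputs, so no gradient flows back through this decoding step.
    return proposals.detach()
```

This is what makes the method "approximate": the derivative of the detector loss with respect to the proposal coordinates is simply dropped.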

What should the structure of the dataset dir be?

The VOC2007 dataset contains the following folders:

Annotations  
ImageSets  
JPEGImages  
SegmentationClass  
SegmentationObject

Are they all necessary, or do only some of them need to be provided?

"Sample larger than population or is negative" when sampling negative_anchor_idxs

I get this error:

Sample larger than population or is negative

in FasterRCNN\models\faster_rcnn.py, in this section:

# Sample, producing indices into the index maps
num_positive_anchors = len(positive_anchors)
num_negative_anchors = len(negative_anchors)
num_positive_samples = min(self._rpn_minibatch_size // 2, num_positive_anchors) # up to half the samples should be positive, if possible
num_negative_samples = self._rpn_minibatch_size - num_positive_samples          # the rest should be negative
positive_anchor_idxs = random.sample(range(num_positive_anchors), num_positive_samples)
# negative_anchor_idxs = random.sample(range(num_negative_anchors), num_negative_samples)  # <-- error is here

# ... fixed by
negative_anchor_idxs = random.sample(range(num_negative_anchors), min(num_negative_samples, num_negative_anchors))

I have a specific dataset with only one class, and there are no negative samples (images that do not contain objects of this class).

Is this the correct fix? Or is it better to add negative examples to the dataset?
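The clamp is a reasonable workaround, since `random.sample` raises `ValueError` whenever the requested sample size exceeds the population. A minimal illustration of the behavior (the helper name is mine, not the repo's):

```python
import random

def sample_up_to(population_size, k):
    # random.sample(range(n), k) raises ValueError when k > n,
    # so clamp k to the population size first.
    return random.sample(range(population_size), min(k, population_size))
```

Note that with this clamp the RPN minibatch simply ends up smaller than `self._rpn_minibatch_size` when there are too few anchors of one kind.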

Validation Losses

Hi there, can somebody please help me calculate the validation losses?

Thank you
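In case it helps, a generic sketch of one approach: reuse the training loss computation over a held-out validation set with gradients disabled and average the result. Here `compute_loss` is a hypothetical callable standing in for whatever produces the scalar training loss in your setup:

```python
import torch

def mean_validation_loss(model, val_batches, compute_loss):
    # compute_loss(model, batch) is assumed to return the same scalar
    # loss used during training; no_grad ensures no weights are updated.
    total, count = 0.0, 0
    with torch.no_grad():
        for batch in val_batches:
            total += float(compute_loss(model, batch))
            count += 1
    return total / max(count, 1)
```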

What happens when I use vgg16 as the backbone?

@trzy
Thank you for your good work.
I've reviewed your PyTorch version of the code. I have some questions.

  1. What happens when I use vgg16 (not models/vgg16_torch.py) as the backbone? In that case, how are the initial weights loaded? I did not provide vgg16_caffe.pth, but I can still train.
  2. If I want to make a model that uses four channels, should I write a new image classification program to train the backbone from scratch? (I will use vgg16, but I can follow your advice if I have to use something else.)

I want to use your programs for my task, so please let me know.
Regards.

Usage versus the Object Detection API

Hi, could you please provide some advice on using this approach versus a Faster R-CNN model with the TensorFlow Object Detection API?

Thanks, you did a great job.

Training times too long

Hello,

first of all, thanks for the implementation. We are running some tests on a computer with two GeForce RTX 3070 GPUs (7982 MiB each), and we have some questions, especially about the duration of each training epoch.

  • We're running with a minimal configuration, disabling data augmentation and enabling image caching:

python -m tf2.FasterRCNN --train --dataset-dir=./own_dataset/ --epochs=1 --learning-rate=1e-3 --save-best-to=fasterrcnn_tf2_tmp.h5 --no-augment --cache-images

each epoch takes almost 2 hours.

  • Launching it with data augmentation and without image caching:

python -m tf2.FasterRCNN --train --dataset-dir=./own_dataset/ --epochs=1 --learning-rate=1e-3 --save-best-to=fasterrcnn_tf2_tmp.h5

we get the same times per epoch.

  • If we include options like:

--debug-dir=/tmp/tf_debugger/

the duration increases to more than 8h per epoch.

Are we misconfiguring something or is it simply due to the dataset used?
Why don't we experience time improvements by removing data augmentation and including image caching?

Thank you very much!

Get predictions as a set of segments

How can I get predictions as a set of coordinates of the detected regions, i.e. not as a segmented PNG image but as, for example, JSON like:

{
  "imageid": "ccc",
  "object": "dog",
  "x1": 33,
  "y1": 44,
  ...
}
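Assuming you can get the final detections as a list of (class name, box, score) tuples after non-maximum suppression, converting them to JSON is straightforward; the field names below are illustrative, not part of the repo:

```python
import json

def detections_to_json(image_id, detections):
    # detections: list of (class_name, x1, y1, x2, y2, score) tuples,
    # e.g. the model's output after non-maximum suppression.
    records = [
        {"imageid": image_id, "object": cls,
         "x1": x1, "y1": y1, "x2": x2, "y2": y2, "score": score}
        for cls, x1, y1, x2, y2, score in detections
    ]
    return json.dumps(records, indent=2)
```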

Add channels from 3 to 4

Thank you for your good work trzy.
I want to use your faster r-cnn programs for my task.
I want to increase the number of channels from 3 (RGB) to 4 (RGB + a new one). Where should I change the program?

Thank you.
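One common approach (a sketch, not the repo's code): widen the backbone's first convolution from 3 to 4 input channels, copying the pretrained RGB filters and initializing the extra channel, for example with the mean of the RGB filters:

```python
import torch
import torch.nn as nn

def expand_first_conv(conv, new_in_channels=4):
    # Hypothetical helper: replace a 3-channel first conv layer with a
    # wider one, copying pretrained RGB weights and initializing the
    # extra channel(s) with the mean of the RGB filters.
    new_conv = nn.Conv2d(new_in_channels, conv.out_channels,
                         kernel_size=conv.kernel_size, stride=conv.stride,
                         padding=conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        new_conv.weight[:, :3] = conv.weight
        new_conv.weight[:, 3:] = conv.weight.mean(dim=1, keepdim=True)
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv
```

The dataset loading and image preprocessing code would also need to produce 4-channel tensors, since the normalization there assumes 3 channels.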

Please support creating a new backbone based on ViT

Dear @trzy,

Thanks for the great repo. I am trying out a new ViT backbone with your source code. I am using vgg16_torch.py as a template and modifying line 67 from:
vgg16 = torchvision.models.vgg16(weights = torchvision.models.VGG16_Weights.IMAGENET1K_V1, dropout = dropout_probability)
to
vit = torchvision.models.vit_b_16(weights = torchvision.models.ViT_B_16_Weights.DEFAULT)

Based on the ViT concept, the features should be computed like:
# Expand the class token to the full batch
batch_class_token = vit.class_token.expand(img.shape[0], -1, -1)
feats = torch.cat([batch_class_token, feats], dim=1)
feats = vit.encoder(feats)
# We're only interested in the representation of the classifier token that we appended at position 0
feats = feats[:, 0]

I am still lost on how to adapt the FeatureExtractor function to this concept. Please assist if possible.
Many thanks!
