Comments (10)

QingqingWang-1 commented on August 17, 2024

Hi @JingChaoLiu, could you share your implementation of SyncBN? I tried to use torch.nn.SyncBatchNorm in PyTorch 1.1, but it crashes in our program.

I don't know what the authors' implementation is, but I implemented it by using torch.nn.BatchNorm2d and torch.nn.BatchNorm1d in the model and adding the following to train_net.py:

    if distributed:
        # Convert the BatchNorm layers to SyncBatchNorm before wrapping the model in DDP.
        sync_bn_model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
        model = torch.nn.parallel.DistributedDataParallel(sync_bn_model, device_ids=[local_rank], output_device=local_rank)

Remember to change this check in torch/nn/modules/batchnorm.py (line 495):

    if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):

to

    if isinstance(module, torch.nn.modules.batchnorm.BatchNorm2d):

Otherwise, the model will crash.

from pmtd.

JingChaoLiu commented on August 17, 2024

I got NaN under the setting batch_size=36, LR=0.04,

In our earliest baseline settings, we used just 8 cards with batch_size=16 and base_learning_rate = 16 * 0.00125 = 0.02, warming up for 2 epochs and training for 40 epochs (a shrunk schedule, as mentioned in #2). Although the F-measure was only 60%+, training seemed to converge smoothly. Maybe you need to check the labels?

even when I use 1*binary_cross_entropy loss

The loss weight of the mask branch stays unchanged until the loss type is changed to l1_loss for the pyramid label. The loss weight of binary_cross_entropy should probably be kept at 1 (we haven't run loss-weight experiments for binary_cross_entropy).

I calculate the cropped text area via cv2.findContours(). Is it OK?

Do you mean calculating the text box from the corresponding predicted mask during the inference stage? Yes, for the baseline, the text box is calculated with cv2.findContours: the contour with the largest area is selected and wrapped by cv2.minAreaRect to output the final text box.


QingqingWang-1 commented on August 17, 2024


JingChaoLiu commented on August 17, 2024

do you think I should change syncBN to group BN?

We didn't run experiments with the group normalization (GN) provided by maskrcnn-benchmark. GN is worth trying.

Will the final performance be affected by the setting of batch size?

In our experiments, 8 cards with base_learning_rate=0.01, 16 cards with base_learning_rate=0.02, and 32 cards with base_learning_rate=0.04 show no significant difference (within 0.1%).
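The learning rates quoted in this thread are consistent with a linear scaling rule of 0.00125 per image (16 * 0.00125 = 0.02). A minimal sketch of that arithmetic is below. The helper name scaled_lr is hypothetical, and reading the 8/16/32-card figures as one image per card is an assumption, not something the thread states.

```python
BASE_LR_PER_IMAGE = 0.00125  # from the thread: 16 * 0.00125 = 0.02


def scaled_lr(total_batch_size, lr_per_image=BASE_LR_PER_IMAGE):
    """Linearly scale the base learning rate with the total batch size."""
    if total_batch_size <= 0:
        raise ValueError("total_batch_size must be positive")
    return total_batch_size * lr_per_image


if __name__ == "__main__":
    # Assumed total batch sizes for the 8-, 16-, and 32-card runs (one image per card).
    for batch_size in (8, 16, 32):
        print(batch_size, scaled_lr(batch_size))
```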

what is your setting for FPN_POST_NMS_TOP_N_TRAIN?

These settings are: MODEL.RPN.FPN_POST_NMS_TOP_N_TRAIN = 2000 and MODEL.RPN.FPN_POST_NMS_PER_BATCH = False

I use cv2.fillPoly() and cv2.findContours() to find the cropped text areas

During the training stage, in what form do the text areas exist before data augmentation: binary masks of bbox_h * bbox_w, or polygons of point_num * {x, y}?

  • For data augmentation in the baseline, the mask is binary and the cropped mask is just the corresponding part of the original mask, so both of the following methods are OK. Method 1:

    • build the original binary mask before cropping,
    • then crop the original mask to get the cropped mask.

    Method 2:

    • crop the text area as a polygon,
    • then use the cropped polygon to build a binary mask.

  • For PMTD, the pyramid label is built from the cropped polygon, so method 1 becomes lengthy:

    • build the binary mask before cropping,
    • crop the original mask to get the cropped mask,
    • find contours in the cropped mask to get the cropped polygon,
    • build the pyramid label from the cropped polygon.

    Method 2 is strongly recommended:

    • crop the text area as a polygon,
    • then use the cropped polygon to build a pyramid mask.


jylins commented on August 17, 2024

Hi @JingChaoLiu, is there an existing API for cropping the text area as a polygon?


JingChaoLiu commented on August 17, 2024

Both pyclipper and Polygon3 can do this. Reimplementing PolygonInstance.crop (link) may be a suitable way to do it.
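For readers who want to see what cropping a polygon to a crop window involves, here is a dependency-free sketch using Sutherland-Hodgman clipping. It is a swapped-in technique for illustration, not the pyclipper or Polygon3 API and not the authors' PolygonInstance.crop. The function name crop_polygon and the (x_min, y_min, x_max, y_max) window convention are assumptions, and the method is best suited to convex text polygons such as quadrilaterals.

```python
def crop_polygon(points, x_min, y_min, x_max, y_max):
    """Clip a polygon, given as a list of (x, y) vertices, to an axis-aligned window.

    Returns the clipped vertex list, or [] if the polygon lies outside the window.
    """

    def clip(poly, inside, intersect):
        # One Sutherland-Hodgman pass against a single window edge.
        out = []
        for i, cur in enumerate(poly):
            prev = poly[i - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))  # entering the window
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))      # leaving the window
        return out

    def cross_x(x):
        # Intersection of segment p-q with the vertical line at x.
        def f(p, q):
            t = (x - p[0]) / (q[0] - p[0])
            return (float(x), p[1] + t * (q[1] - p[1]))
        return f

    def cross_y(y):
        # Intersection of segment p-q with the horizontal line at y.
        def f(p, q):
            t = (y - p[1]) / (q[1] - p[1])
            return (p[0] + t * (q[0] - p[0]), float(y))
        return f

    poly = [(float(x), float(y)) for x, y in points]
    edges = [
        (lambda p: p[0] >= x_min, cross_x(x_min)),
        (lambda p: p[0] <= x_max, cross_x(x_max)),
        (lambda p: p[1] >= y_min, cross_y(y_min)),
        (lambda p: p[1] <= y_max, cross_y(y_max)),
    ]
    for inside, intersect in edges:
        if not poly:
            break
        poly = clip(poly, inside, intersect)
    return poly
```

The cropped vertex list can then be used to build the binary mask or the pyramid label directly, which is the "method 2" route recommended earlier in the thread.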


jylins commented on August 17, 2024

@JingChaoLiu Thanks!


QingqingWang-1 commented on August 17, 2024

@JingChaoLiu Many thanks for your implementation details.


jylins commented on August 17, 2024

Hi @JingChaoLiu, could you share your implementation of SyncBN? I tried to use torch.nn.SyncBatchNorm in PyTorch 1.1, but it crashes in our program.


hityzy1122 commented on August 17, 2024

Hi @JingChaoLiu, could you share your implementation of SyncBN? I tried to use torch.nn.SyncBatchNorm in PyTorch 1.1, but it crashes in our program.

I don't know what the authors' implementation is, but I implemented it by using torch.nn.BatchNorm2d and torch.nn.BatchNorm1d in the model and adding the following to train_net.py:

    if distributed:
        # Convert the BatchNorm layers to SyncBatchNorm before wrapping the model in DDP.
        sync_bn_model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
        model = torch.nn.parallel.DistributedDataParallel(sync_bn_model, device_ids=[local_rank], output_device=local_rank)

Remember to change this check in torch/nn/modules/batchnorm.py (line 495):

    if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):

to

    if isinstance(module, torch.nn.modules.batchnorm.BatchNorm2d):

Otherwise, the model will crash.

Thanks, it's really helpful
