
Comments (11)

JialeCao001 commented on June 25, 2024

@gauravmunjal13 Hi. Thanks for your interest.
(1) The video ids used for evaluation and for the saved results can be different, because the result-saving code is written by me while the evaluation code is provided by the official dataset. I do not think they need to be the same, since we only need to visualize the results in order. For evaluation, you should follow the official dataset.
(2) With our provided model, we do not get an AP of 0. I am not sure what causes it in your case.
(3) Frames with None are skipped. If you want to save results for every image, you can set a lower score threshold in the config file.
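
For reference, a minimal sketch of what lowering that threshold might look like in an mmdetection-style config; the key names (e.g. score_thr) and values here are assumptions, so check them against the SipMask-VIS config you are actually using:

# Hypothetical test_cfg block from an mmdetection-style SipMask-VIS config.
# Key names and values are assumptions; adjust to the real config file.
test_cfg = dict(
    nms_pre=1000,                       # proposals kept before NMS
    score_thr=0.03,                     # lowered score threshold so more low-confidence masks are kept per frame
    nms=dict(type='nms', iou_thr=0.5),  # standard NMS settings
    max_per_img=100)                    # maximum detections kept per image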


gauravmunjal13 commented on June 25, 2024

Thanks, @JialeCao001 for your response!

It's true that frames whose segmentations are None are ignored while saving. However, even among the saved ones (those with segmentations), not all contain a visible object. When I plotted these segmentations, I found they are small. Are they being ignored because of their small size?

Continuing on that, I would like to know your thoughts on using SipMask to detect very small objects.

Could you please point out where you are saving the frames/results? I came across the show_result() function in inference.py under mmdet/apis, but it doesn't save the results.

Meanwhile, I wrote a code snippet to plot and save results using the results.pkl.json output file, so that I can filter results based on score. The segmentations displayed by your code are correct; however, the ones drawn with my snippet are drifted (it looks like a size or scale issue). The same snippet did display them correctly with the results.pkl.json file from MaskTrack-RCNN. Does your output results file differ in some way from the one generated by MaskTrackRCNN?

I am decoding the segmentations and applying them to the image as:

from pycocotools import mask as maskUtils
import numpy as np
mask = maskUtils.decode(segm)
im_pred = apply_mask(im_pred, mask, (0.0, 1.0, 0.0)).astype(np.uint8)

where apply_mask() is:

def apply_mask(image, mask, color, alpha=0.5):
    # Blend `color` into `image` wherever mask == 1, with opacity `alpha`.
    for c in range(3):
        image[:, :, c] = np.where(mask == 1,
                                  image[:, :, c] * (1 - alpha) + alpha * color[c] * 255,
                                  image[:, :, c])
    return image

Lastly, the README suggests that there are two versions of SipMask: a high-accuracy one and a real-time fast one. How do I know which one I am using, and how do I switch to the other?


JialeCao001 commented on June 25, 2024

@gauravmunjal13 Hi. I cannot follow everything you describe, but I will try to answer your questions.
(1) I am not completely sure whether mmdetection filters out small-scale objects. When I have time, I can check this.
(2) From our experiments, SipMask is more suited to large-scale objects.
(3) The README describes how to save the YouTube-VIS results. Please use the following command:
python tools/test_video.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] --eval segm --show --save_path=${SAVE_PATH}
The corresponding show_result function is in base.py:
https://github.com/JialeCao001/SipMask/blob/f7b035232ae3ff8d7171fd98b80efdbac926cbd6/SipMask-VIS/mmdet/models/detectors/base.py#L109
(4) For instance segmentation on images, there are two versions of SipMask. For video instance segmentation, we only provide SipMask-VIS.


gauravmunjal13 commented on June 25, 2024

Many thanks, @JialeCao001 for your comments!

These are really useful, and I really appreciate your help!

I may have figured out why my code for plotting and saving the results is not working properly, which may point to a discrepancy in the results.pkl.json file.

The input to the model is images and segmentations of size (512, 512). The model produces results.pkl.json, in which the segmentations are of size (512, 512), and the masks obtained after decoding them are also (512, 512). However, these segmentations appear drifted towards the top left and smaller in size, while the results saved by your code look correct.

My analysis is that you are saving the images at size (360, 360) while the segmentations are of size (512, 512). Does this mean there is some discrepancy in how the results file is produced, or am I missing something?
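
For reference, a small diagnostic along the lines of this analysis, assuming results.pkl.json follows the YouTube-VIS submission format (a list of dicts whose 'segmentations' field holds per-frame RLEs, possibly None); file paths and indices are placeholders:

import json
import cv2
from pycocotools import mask as maskUtils

# Decode one predicted mask and compare its size with a frame saved by --show.
with open('results.pkl.json') as f:              # placeholder path
    results = json.load(f)

segms = results[0]['segmentations']              # per-frame RLEs of the first track (assumed format)
segm = next(s for s in segms if s is not None)   # skip frames without a detection
mask = maskUtils.decode(segm)
print('decoded mask shape:', mask.shape)         # (512, 512) in this report

saved = cv2.imread('SAVE_PATH/00000.jpg')        # placeholder path to a saved visualization
print('saved image shape:', saved.shape)         # (360, 360, 3) in this report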

Thanks!


JialeCao001 commented on June 25, 2024

@gauravmunjal13 Hi. If you save images with the following command, can you get the right results?
python tools/test_video.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] --eval segm --show --save_path=${SAVE_PATH}


gauravmunjal13 commented on June 25, 2024

> @gauravmunjal13 Hi. If you save images with the following command, can you get the right results?
> python tools/test_video.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] --eval segm --show --save_path=${SAVE_PATH}

Yes! But the resulting images are resized to (360, 360), while the input images were of size (512, 512).

So, if I take the segmentations from the results.pkl.json file and apply them to the input images, they are not correct.


JialeCao001 commented on June 25, 2024

@gauravmunjal13 Okay, I see what you mean. The saved image is the same as the rescaled input image fed to the network, which may be different from the original image. When writing the JSON file, the code rescales the bounding boxes back.
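
Purely as an illustration of that "rescale back" step (a sketch, not the repository's actual code): box coordinates predicted at the network-input scale are divided by the resize factor before being written out, and a mask would need an analogous resize to line up with the original image. The shapes and the interpolation choice below are assumptions based on the sizes reported in this thread:

import cv2
import numpy as np

def rescale_back(bboxes, mask, scale_factor, original_hw):
    # bboxes: (N, 4) [x1, y1, x2, y2] at network-input scale.
    # mask: binary mask at network-input size, e.g. (512, 512) with the
    #       valid content in its top-left (360, 360) region.
    # scale_factor: original -> input resize factor (e.g. 360 / 512).
    # original_hw: (height, width) of the original image, e.g. (512, 512).
    bboxes = np.asarray(bboxes, np.float32) / scale_factor    # boxes back to original coordinates
    h = int(round(original_hw[0] * scale_factor))             # valid (unpadded) region height
    w = int(round(original_hw[1] * scale_factor))             # valid (unpadded) region width
    valid = mask[:h, :w]                                      # drop any padded region
    mask_orig = cv2.resize(valid, (original_hw[1], original_hw[0]),
                           interpolation=cv2.INTER_NEAREST)   # nearest-neighbour keeps the mask binary
    return bboxes, mask_orig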


gauravmunjal13 commented on June 25, 2024

Perhaps the rescaling is not happening correctly.

I followed the steps below to apply segmentations from the results.pkl.json file to input images of size (512, 512); let me know if I am wrong. I used your show_result() method in base.py as the reference. (A sketch of these steps in code follows the list.)
Steps:
1. Resize the input image from (512, 512) to (360, 360).
2. The mask obtained from the results file is of size (512, 512); slice it as mask = mask[:h, :w], where h and w are 360.
3. Applying this mask to the resized image gives the correct visualization results.
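
A sketch of these steps in code, reusing apply_mask and the decoded segm from the snippets earlier in this thread; im is a placeholder for the original (512, 512) frame, and cv2 is only used for resizing:

import cv2
import numpy as np
from pycocotools import mask as maskUtils

h, w = 360, 360                                   # rescaled-input size reported above
im_small = cv2.resize(im, (w, h))                 # step 1: resize the (512, 512) image to (360, 360)
mask = maskUtils.decode(segm)                     # decoded mask is (512, 512)
mask = mask[:h, :w]                               # step 2: keep the valid top-left region
vis = apply_mask(im_small, mask, (0.0, 1.0, 0.0)).astype(np.uint8)  # step 3: overlay
cv2.imwrite('vis_frame.jpg', vis)                 # placeholder output path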

However, I still need to solve the evaluation problem, since the AP is 0. In the ytvos_eval() method in coco_utils.py, the detections are loaded from results.pkl.json as predictions, while the ground-truth annotations are loaded (as ytvos) from the input annotation file, and those are of size (512, 512). Since the predictions (segmentations) may not be rescaled correctly to the original input size, the AP comes out as 0. What do you think?

And many thanks @JialeCao001 for your support!


gauravmunjal13 commented on June 25, 2024

Hi @JialeCao001 ,

Let me know if I can provide more information or if anything in my explanation is unclear.

Thanks!


JialeCao001 commented on June 25, 2024

@gauravmunjal13 I am not sure about your problem. I do not get a mAP of 0 on the YouTube-VIS test set.


gauravmunjal13 commented on June 25, 2024

Hi @JialeCao001 ,
I think we can close this issue. As discussed over email, the following command produces the correct output (results.pkl.json):
python tools/test_video.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] --eval segm

But the following command does not produce a correct output file (results.pkl.json), in the sense that the annotations are not resized back to the original size:
python tools/test_video.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] --eval segm --show --save_path=${SAVE_PATH}

Thanks!

