
RVOS: End-to-End Recurrent Net for Video Object Segmentation

See our project website here.

This code builds on RSIS (Recurrent Semantic Instance Segmentation), which can be found here, modified to suit the video object segmentation task.

One shot visual results

[RVOS one-shot qualitative results]

Zero shot visual results

[RVOS zero-shot qualitative results]

License

This code cannot be used for commercial purposes. Please contact the authors if interested in licensing this software.

Installation

  • Clone the repo:
git clone https://github.com/imatge-upc/rvos.git
  • Install the requirements: pip install -r requirements.txt
  • Install PyTorch 1.0 (choose the whl file according to your setup, e.g. your CUDA version):
pip3 install https://download.pytorch.org/whl/cu100/torch-1.0.1.post2-cp36-cp36m-linux_x86_64.whl
pip3 install torchvision
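
To verify that the installed wheel matches your CUDA setup before going further, you can run a quick check (a minimal sketch; the exact version string depends on the wheel you installed):

import torch
print(torch.__version__)          # e.g. 1.0.1.post2 for the wheel above
print(torch.cuda.is_available())  # should be True if the CUDA build matches your driver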

Data

YouTube-VOS

Download the YouTube-VOS dataset from their website. You will need to register on CodaLab to download the dataset. Create a folder named databases in the parent folder of the root directory of this project and put the database there in a folder named YouTubeVOS. The root directory (the rvos folder) and the databases folder should be in the same directory.

The training of the RVOS model for YouTube-VOS has been implemented using a split of the train set into two subsets: train-train and train-val. The model is trained on the train-train subset and validated on the train-val subset to decide whether the model should be saved. To train the model according to this split, the code requires two json files in the databases/YouTubeVOS/train/ folder, named train-train-meta.json and train-val-meta.json, with the same format as the meta.json included when downloading the dataset. You can also download the partition used in our experiments from the following links:

DAVIS 2017

Download the DAVIS 2017 dataset from their website at 480p resolution. Create a folder named databases in the parent folder of the root directory of this project and put the database there in a folder named DAVIS2017. The root directory (the rvos folder) and the databases folder should be in the same directory.
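
For reference, the layout described in the paragraphs above would look roughly like this (a sketch assuming the default folder names; only the files mentioned in this README are shown):

parent-folder/
├── rvos/                          (this repository)
└── databases/
    ├── YouTubeVOS/
    │   └── train/
    │       ├── train-train-meta.json
    │       └── train-val-meta.json
    └── DAVIS2017/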

LMDB data indexing

To greatly speed up data loading, we recommend generating an LMDB index of the dataset by running:

python dataset_lmdb_generator.py -dataset=youtube

or

python dataset_lmdb_generator.py -dataset=davis2017

depending on the dataset you are using.
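
To sanity-check the generated index, here is a minimal sketch in Python (it assumes the generator writes an lmdb_seq environment under the dataset folder, as seen in the error reports below, and that each entry stores a '|'-separated list of frame files, which is what the data loaders expect):

import lmdb

# Open the generated index read-only (the path is an assumption; adjust it to your setup).
env = lmdb.open('../databases/YouTubeVOS/lmdb_seq', readonly=True, lock=False)
with env.begin() as txn:
    for key, value in txn.cursor():
        # Each entry maps a sequence key to a '|'-separated list of frame files.
        print(key.decode(), len(value.decode().split('|')))
env.close()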

Training

  • Train the model for one-shot video object segmentation with python train_previous_mask.py -model_name model_name. Checkpoints and logs will be saved under ../models/model_name.
  • Train the model for zero-shot video object segmentation with python train.py -model_name model_name. Checkpoints and logs will be saved under ../models/model_name.
  • Other arguments can be passed as well (see the example after this list). For convenience, scripts to train with typical parameters are provided under scripts/.
  • Plot loss curves at any time with python plot_curves.py -model_name model_name.
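
For example, a hypothetical training run with an explicit batch size, clip length and GPU (flag names as they appear in src/args.py and in the evaluation commands reported below):

python train.py -model_name my_zero_shot_model -batch_size 4 -length_clip 5 -gpu_id 0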

Evaluation

We provide bash scripts to evaluate models on the YouTube-VOS and DAVIS 2017 datasets. You can find them under the scripts folder. On the one hand, eval_one_shot_youtube.sh and eval_zero_shot_youtube.sh generate the results for the YouTube-VOS dataset on one-shot and zero-shot video object segmentation respectively. On the other hand, eval_one_shot_davis.sh and eval_zero_shot_davis.sh generate the results for the DAVIS 2017 dataset on one-shot and zero-shot video object segmentation respectively.
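
For reference, a direct call to the one-shot evaluation entry point looks like the following (this exact command is taken from a user report in the issues below; adjust the paths and flags to your setup):

python ../src/eval_previous_mask.py -model_name=one-shot-model-davis -dataset=davis2017 -eval_split=test-dev -batch_size=1 -length_clip=130 -gpu_id=0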

Furthermore, in the src folder, prepare_results_submission.py and prepare_results_submission_davis can be used to convert the results into the appropriate format for the official evaluation servers of YouTube-VOS and DAVIS respectively.

Demo

You can run demo.py to generate the segmentation masks of a video. Just run:

python demo.py -model_name one-shot-model-davis --overlay_masks

and it will generate the resulting masks.

To run the demo for your own videos:

  1. Extract the frames to a folder (make sure their names are in order, e.g. 00000.jpg, 00001.jpg, ...).
  2. Have the initial mask corresponding to the first frame (e.g. 00000.png).
  3. Run python demo.py -model_name one-shot-model-davis -frames_path path-to-your-frames -mask_path path-to-initial-mask --overlay_masks

To do it for zero-shot (i.e. without an initial mask), run python demo.py -model_name zero-shot-model-davis -frames_path path-to-your-frames --zero_shot --overlay_masks

You can also use the argument -results_path to save the results to a folder of your choice.
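
If you need to create the initial mask yourself (see also the "Creating the mask for one-shot model" issue below), note that the DAVIS and YouTube-VOS annotations are palette-indexed PNGs where index 0 is the background and each object has its own index, so a mask in that format is the safest choice. Here is a minimal sketch using NumPy and Pillow (the mask content and palette values are only illustrative):

import numpy as np
from PIL import Image

# Hypothetical binary mask: 1 where the object is, 0 elsewhere.
binary_mask = np.zeros((480, 854), dtype=np.uint8)
binary_mask[100:300, 200:500] = 1

mask = Image.fromarray(binary_mask, mode='P')
# Palette: index 0 -> black background, index 1 -> first object (dark red, as in the DAVIS annotations).
palette = [0, 0, 0, 128, 0, 0] + [0, 0, 0] * 254
mask.putpalette(palette)
mask.save('00000.png')  # name it after the first frame (e.g. 00000.jpg -> 00000.png)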

Pretrained models

Download weights for models trained with:

The same files are also available in this folder in Google Drive.

Extract and place the obtained folder under the models directory. You can then run the evaluation scripts with the downloaded model by setting args.model_name to the name of the folder.

Contact

For questions and suggestions use the issues section or send an e-mail to [email protected]

rvos's People

Contributors

agirbau, carlesventura, kant, miriambellver, xavigiro


rvos's Issues

Would you like to share the evaluation code for YouTube-VOS?

Hi, thank you for sharing your work! The idea is great and the code is elegantly written.
I have evaluated the RVOS results on the DAVIS validation set with the DAVIS 2017 official evaluation code. But the YouTube-VOS dataset structure is quite different from DAVIS; would you like to share the evaluation code? Thank you for your time.

Handling new objects entering the video

Thank you for sharing your code. The code works if masks for all objects are provided for the first frame. If some objects appear halfway through the video, how do I give the mask for them? I know I can give the mask for the first frame with -mask_path; how do I give the mask for a new object entering at, say, the 20th frame?

one-shot results are wrong

I tested the one-shot code and the results for each clip are the same as the ground-truth mask of the first frame. Maybe there is some problem in the code of eval_previous_mask.py.

How do you evaluate the performance for one-shot on DAVIS-2017?

Hi, I am really interested in your work. I notice that in Table 4 of your paper, both J and F of OSMN are much lower than those reported in OSMN (37.7 vs 52.5 for J mean, 44.9 vs 57.1 for F mean). I wonder whether these "J" and "F" in your Table 4 mean "J mean" and "F mean". If yes, why are they so much lower than those reported in the original paper?
The following are the results on DAVIS 2017 from your paper and OSMN.

Your results:

[results table screenshot]

OSMN results:

[results table screenshot]

Problem for zero-shot in DAVIS2017

As you described in your paper: "Analogously, inference, in order to evaluate our results for zero-shot video object segmentation , the masks provided for the first frame in one-shot VOS are used to select which predicted instances are selected for evaluation". It is easy to do so on the val set, but the ground-truth masks for test-dev are not given. How can you get the results for test-dev? Looking forward to your reply.
Thanks.

lmdb.Error: xxx/lmdb_seq: Function not implemented

Hi, when I run 'python dataset_lmdb_generator.py -dataset=youtube', I encountered the following error (when the dataset is DAVIS2017, the same error showed up as well).
How can I fix it?

Traceback (most recent call last):
  File "dataset_lmdb_generator.py", line 38, in <module>
    frame_lmdb_generator_sequences.generate_lmdb_file(cfg.PATH.DATA, cfg.PATH.SEQUENCES_TRAIN)
  File "dataset_lmdb_generator.py", line 16, in generate_lmdb_file
    env = lmdb.open(os.path.join(root_dir, 'lmdb_' + self.gen_type))
lmdb.Error: /userhome/rvos/databases/YouTubeVOS/lmdb_seq: Function not implemented

Hi, I cannot load the YouTube-VOS dataset when I run train_previous_mask.py

File "/home/xmz/RVOS/rvos-master/src/dataloader/base_youtube.py", line 104, in init
split,osp.join(cfg.PATH.SEQUENCES_TRAIN,name),regex, lmdb_env=lmdb_env)
File "/home/xmz/RVOS/rvos-master/src/dataloader/base_youtube.py", line 60, in init
_files_vec = txn.get(key_db.encode()).decode().split('|')
AttributeError: 'NoneType' object has no attribute 'decode'

thank you for your reply

Temporal stability is not available

Hey, thanks for the great work! I ran the zero-shot-davis model and although it worked, the temporal stability of the segmented objects is pretty bad. I assume this is related to the [WARNING][10-01-2020 16:50:16] Temporal stability not available / import error of the module tstab.
Are you planning on integrating this module?

ran the train script but nothing happened

Hello, I cloned your code and tried to train the model. But after I ran the script, nothing happened. My terminal just showed two messages. So what's the problem?
[screenshot of terminal output]

How is starting_frame determined?

Hi! Thank you for sharing the code.
I was wondering if you'd be kind enough to explain, or point me to the code for, how starting_frame is set.

I was trying to train my own data (the format of which is similar to YouTube-VOS; not all video sequences start from 00000.jpg, some start from 00001.jpg). When I try to train it, I sometimes get a FileNotFoundError, potentially due to this index mixup. But I have a hard time understanding how starting_frame is set when the YouTube dataset is built.

Please let me know. Thank you so much for your help.

Figure 2 in the paper does not match the code? Or did I misunderstand?

Please pay attention to the red box in the figure.

[Figure 2 screenshot]

parser.add_argument('-hidden_size', dest='hidden_size', default = 128, type=int)

self.sk3 = nn.Conv2d(skip_dims_in[2],int(self.hidden_size/2),self.kernel_size,padding=self.padding)

self.sk2 = nn.Conv2d(skip_dims_in[3],int(self.hidden_size/4),self.kernel_size,padding=self.padding)

I always get the following error when training. Does anyone know what's going on here?

Traceback (most recent call last):
  File "train_previous_mask.py", line 384, in <module>
    trainIters(args)
  File "train_previous_mask.py", line 252, in trainIters
    loaders = init_dataloaders(args)
  File "train_previous_mask.py", line 50, in init_dataloaders
    use_prev_mask = False) #use_prev_mask is True only for evaluation
  File "C:\rvos\src\dataloader\dataset_utils.py", line 25, in get_dataset
    use_prev_mask = use_prev_mask)
  File "C:\rvos\src\dataloader\davis2017.py", line 124, in __init__
    self.sequences = [Sequence(self._phase, s.name, lmdb_env=lmdb_env_seq) for s in self._db_sequences]
  File "C:\rvos\src\dataloader\davis2017.py", line 124, in <listcomp>
    self.sequences = [Sequence(self._phase, s.name, lmdb_env=lmdb_env_seq) for s in self._db_sequences]
  File "C:\rvos\src\dataloader\base.py", line 98, in __init__
    split, osp.join(cfg.PATH.SEQUENCES, name), regex, lmdb_env=lmdb_env)
  File "C:\rvos\src\dataloader\base.py", line 71, in __init__
    super(BaseLoader, self).__init__(_files, load_func=load_func)
  File "C:\Users\user.conda\envs\rvos\lib\site-packages\skimage\io\collection.py", line 188, in __init__
    raise TypeError('Invalid pattern as input.')
TypeError: Invalid pattern as input.

When will you open source this work?

Your work is amazing, and I am very interested in the details.
Could you please tell me when your team will open source this work?
Thanks a lot!

Retrained zero shot results are inferior to the public scores

I retrained the zero-shot model using train_zero_shot_youtube.sh with the given settings, obtained the inference results with eval_zero_shot_youtube.sh, and then prepared the submission results with prepare_results_submission.py for the official YouTube-VOS challenge website.
However, the test results on YouTube-VOS cannot match the public scores. Are there any other settings or tricks during training and testing? I found that data augmentation is used in training but not in testing, and I followed the public settings exactly. The models were trained for 50 epochs on a single TitanX GPU (batch_size=4, clips=5). The following are the retrained results:

retrain-RVOS-T: 33.87, 18.37, 38.62, 22.23

retrain-RVOS-S: 38.52, 18.72, 41.70, 22.59

retrain-RVOS-ST: 41.56, 21.46, 45.00, 24.52

Besides, I also used the public zero-shot YouTube model for YouTube-VOS testing and got the following scores:
pub-RVOS-ST: 43.39, 21.10, 45.30, 24.32.

It seems the inferior retrained results are not due to the test settings, but I do not know why. Can you help me?

Questions about multi-gpu

Thank you for your amazing work and project! Have you tried to run the project on multiple GPUs? (It seems that the released project doesn't support multi-GPU.)

Version dependencies of third-party packages

Hi,
I am interested in your research and want to run the code. Following the provided README documentation, I encountered a version conflict among the third-party packages from requirements.txt. I used Python 3.7. Does this version of Python meet the requirements? Can anyone help?
Thanks.

expected performance on `train-val` set

Hi,

Thanks so much for releasing the code and the split of the training set. Can I ask if you have tested the performance of eval_previous_mask + the YouTube-VOS one-shot model you released + the validation set in train-val-meta.json? I just want to double-check that my testing is bug free.

Thanks!

How to display visualization results ?

I have run eval_previous_mask.py with the one-shot-model-davis model, but how do I display visualization results, such as masks?
I see that in /rvos/src/args.py there are some args about visualization and logging; how do I use them?
Thanks for your help!

Hi, why does the decoder need to loop T times?

@carlesventura
Excuse me, could you tell me why the decoder needs to loop T times?

for t in range(0, T):
    # prev_hidden_temporal_list is a list with the hidden state for all instances from the previous time instant.
    # If this is the first frame of the sequence, hidden_temporal is initialized to None. Otherwise, it is set with the value from the previous time instant.
    if prev_hidden_temporal_list is not None:
        hidden_temporal = prev_hidden_temporal_list[t]
        if args.only_temporal:
            hidden_spatial = None
    else:
        hidden_temporal = None

    # The decoder receives two hidden state variables: hidden_spatial (a tuple with hidden_state and cell_state), which refers to the
    # hidden state from the previous object instance at the same time instant, and hidden_temporal, which refers to the hidden state of the
    # same object instance at the previous time instant.
    out_mask, hidden = decoder(feats, hidden_spatial, hidden_temporal)
    hidden_tmp = []
    for ss in range(len(hidden)):
        if mode == 'train':
            hidden_tmp.append(hidden[ss][0])
        else:
            hidden_tmp.append(hidden[ss][0].data)
    hidden_spatial = hidden
    hidden_temporal_list.append(hidden_tmp)

    upsample_match = nn.UpsamplingBilinear2d(size=(x.size()[-2], x.size()[-1]))
    out_mask = upsample_match(out_mask)             # batch_size * 1 * height * width
    out_mask = out_mask.view(out_mask.size(0), -1)  # batch_size * (height x width)

    # repeat the predicted mask as many times as there are elements in the ground truth,
    # to compute the IoU against all ground-truth elements at once
    y_pred_i = out_mask.unsqueeze(0)      # batch_size * (height x width) -> 1 * batch_size * (height x width)
    y_pred_i = y_pred_i.permute(1, 0, 2)  # 1 * batch_size * (height x width) -> batch_size * 1 * (height x width)
    y_pred_i = y_pred_i.repeat(1, y_mask.size(1), 1)
    y_pred_i = y_pred_i.view(y_mask.size(0) * y_mask.size(1), y_mask.size(2))  # e.g. torch.Size([10, 10, 114688]) -> torch.Size([100, 114688])
    y_true_p = y_mask.view(y_mask.size(0) * y_mask.size(1), y_mask.size(2))    # torch.Size([100, 114688])

    c = args.iou_weight * softIoU(y_true_p, y_pred_i)
    c = c.view(sw_mask.size(0), -1)
    scores[:, :, t] = c.cpu().data

    # collect predictions in a list to concatenate later
    out_masks.append(out_mask)

=================================
Looking forward to your answer. Thanks!
Best regards,
zwk

Where can I see the metrics after running evaluation?

I followed the guidelines and it seems I successfully managed to run the evaluation:

python ../src/eval_previous_mask.py -model_name=one-shot-model-davis -dataset=davis2017 -eval_split=test-dev -batch_size=1 -length_clip=130 -gpu_id=0
[WARNING][16-04-2019 17:50:13] Temporal stability not available
Eval logs will be saved to: ../models/one-shot-model-davis/eval.log

The eval log displays the following message:
[screenshot of the eval log]

I am not sure where I can see the metrics though.

Error when running my own videos

I use this command to run my own video: python demo.py -model_name one-shot-model-davis -frames_path path-to-your-frames --zero_shot --overlay_masks

However, an error appears:
[error screenshot]

Creating the mask for one-shot model

Thank you for sharing your code. I am trying to get this to work for my data. I can run the demo.py with the pre-trained models for the one-shot scenario using the command python demo.py -model_name one-shot-model-davis --overlay_masks. But when I try to run it for my data with the command python demo.py -model_name one-shot-model-davis -frames_path path-to-your-frames -mask_path path-to-initial-mask --overlay_masks, it gives me the following error.

[WARNING][05-11-2019 18:46:24] Temporal stability not available
Results will be saved to: ../models/../models/one-shot-model-youtubevos/results/JPEGImages
Loading model: ../models/one-shot-model-youtubevos
Namespace(all_classes=False, augment=True, base_model='resnet101', batch_size=4, best_val_loss=1000, cat_id=-1, class_loss_after=20, class_th=0.5, class_weight=0.1, crop=False, curriculum_learning=False, dataset='youtube', display=False, display_route=False, dropout=0.0, dropout_cls=0.0, dropout_stop=0.0, epoch_resume=0, eval_split='test', finetune_after=0, gpu_id=0, gt_maxseqlen=10, hidden_size=128, imsize=480, iou_weight=1.0, kernel_size=3, length_clip=5, log_file='train.log', log_term=False, lr=0.001, lr_cnn=1e-06, mask_th=0.5, max_dets=100, max_epoch=100, maxseqlen=10, min_delta=0.0, min_size=0.001, min_steps=1, model_name='spatiotemporal_youtube_bs_04_lc_05_256p', momentum=0.9, ngpus=1, no_display_text=False, no_run_coco_eval=False, num_classes=21, num_workers=1, only_temporal=False, optim='adam', optim_cnn='adam', pascal_dir='/databases/voc2012/VOCAug/', patience=15, patience_stop=60, port=8097, print_every=10, resize=True, resume=False, rotation=10, seed=123, server='http://localhost', shear=0.1, single_object=False, skip_mode='concat', smooth_curves=False, steps_cl=1, stop_balance_weight=0.5, stop_loss_after=-1, stop_th=0.5, stop_weight=0.5, transfer=True, transfer_from='spatiotemporal_youtube_bs_04_lc_05_256p_prev_mask', translation=0.1, update_encoder=False, use_cats=True, use_class_loss=False, use_gpu=True, use_gt_cats=False, use_gt_masks=False, use_gt_stop=False, use_stop_loss=False, visdom=False, weight_decay=1e-06, weight_decay_cnn=1e-06, year='2017', youtube_dir='/databases/YouTubeVOS/', zoom=0.7)
video mode activated
Traceback (most recent call last):
  File "demo.py", line 288, in <module>
    results.save_result_overlay(x, outs, frame_name)
  File "demo.py", line 225, in save_result_overlay
    mask_pred = (torch.squeeze(net_outs[0, t, :])).cpu().numpy()
IndexError: index 10 is out of bounds for dimension 1 with size 10

To isolate the problem, I replaced my mask with one of the masks in the DAVIS dataset. Although it gives wrong results (as expected), the code runs without issue. So I think the issue is in my mask. Can you please tell me the best way to create the masks? I used the Matlab GraphCut tool to get the .png mask. Since this .png image did not have a palette (it gave None for image.getpalette()), I converted it in another Python script using im.convert("P", palette=Image.WEB, colors=256) before giving it as the mask for your code. Can you please tell me the correct way to create the mask?

Thank you.

How to get the J and F values of segmentation results?

Hi, thank you for sharing the source code.
By running eval.py and eval_previous_mask.py, I have obtained the RGB images of the segmentation results.
I have found evaluation.py in the rvos\src\dataloader folder. I want to know how I can get the J and F values of the results.
Thanks for your help!

Question about sequence

Why didn't you extract features for every 2D input image? There is no for loop.

I saw your code, and it seems you extract features from the 2D image and do 2D segmentation through the encoder-decoder. How does it become video segmentation in 3D? Is it 2D and just stacked? Is the T you set the number of objects?
