dbolya / yolact

4.9K 105.0 1.3K 21.2 MB

A simple, fully convolutional model for real-time instance segmentation.

License: MIT License

Python 90.06% Shell 1.10% CSS 0.91% HTML 0.90% JavaScript 7.03%

realtime real-time instance-segmentation yolact pytorch

yolact's Issues

Problems encountered while training my own dataset

Hi,
To solve the problem of stacked instances of the same object, I trained on my own dataset as required, but some masks cover only part of the object instead of all of it. Do you know what causes this? Do you have any suggestions for fixing it?
Looking forward to your reply, thank you.

Training speed

I am not getting consistent GPU utilization, and the ETA says 18 days on one V100 GPU (p3.2xlarge) with a batch size of 12 and num-workers 8. Does this make sense?

Is there any explanation of the timer column, and is there a TensorBoard equivalent for viewing performance over time?

Thank you very much!
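
A rough sketch of how an ETA like the one in the training log can be derived, assuming the timer column is the wall-clock time per iteration in seconds and the run has a fixed iteration budget (neither is confirmed in this thread; `estimate_eta` is a hypothetical helper, not a function from the repository). With 800,000 iterations at about 1.2 s each, this gives roughly 960,000 s, or about 11 days, which is in line with the ETAs shown in the logs further down this page.

```python
from datetime import timedelta


def estimate_eta(iteration, max_iter, seconds_per_iter):
    """Remaining wall-clock time, assuming every remaining step costs the same.

    All three arguments are assumptions for illustration: the current step,
    the total number of steps, and the averaged time per step in seconds.
    """
    remaining = max(max_iter - iteration, 0)
    return timedelta(seconds=remaining * seconds_per_iter)


# Example: 100 iterations left at 1.5 s each.
print(estimate_eta(iteration=900, max_iter=1000, seconds_per_iter=1.5))
```

If the estimate is dominated by data loading rather than GPU work, the per-iteration time will fluctuate, which would also match inconsistent GPU utilization.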

Comment wrong

https://github.com/dbolya/yolact/blob/master/data/config.py#L30
Is this line wrong? I think it should be BGR.

a very very very strange problem on windows

I think there is something incompatible with Windows in yolact.py.

Running eval.py gives a CUDA unknown error, located at 'torch.set_default_tensor_type('torch.cuda.FloatTensor')'. It looks like CUDA fails to initialize.

I tried moving 'torch.set_default_tensor_type('torch.cuda.FloatTensor')' from the top of the file further down, like this:

#try1
import torch
torch.set_default_tensor_type('torch.cuda.FloatTensor')
from data import COCODetection, get_label_map, MEANS, COLORS
...

#try2
import torch
...
torch.set_default_tensor_type('torch.cuda.FloatTensor')
from yolact import Yolact
...

I found that putting it before 'from yolact import Yolact' works; otherwise it fails.

Now, at the beginning of yolact.py, I write the following:

import torch
torch.set_default_tensor_type('torch.cuda.FloatTensor')
from data import COCODetection, get_label_map, MEANS, COLORS
...

KeyError while trying to retrain on Pascal

Hello

I am facing a small issue.
I am trying to retrain the model on the Pascal VOC 2012 dataset.
I took the COCO-style annotations from this source:
https://github.com/facebookresearch/multipathnet

Then I followed the instructions for the changes to make in config.py.

But when I call: python train.py --config=yolact_base_config

I receive the following error:

KeyError: Traceback (most recent call last):
  File "/home/smile/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/smile/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/hdd1/prog/yolact/data/coco.py", line 88, in __getitem__
    im, gt, masks, h, w, num_crowds = self.pull_item(index)
  File "/hdd1/prog/yolact/data/coco.py", line 145, in pull_item
    target = self.target_transform(target, width, height)
  File "/hdd1/prog/yolact/data/coco.py", line 39, in __call__
    label_idx = self.label_map[obj['category_id']] - 1
KeyError: 12

The error is not very clear to me.

So what I did was create a new dataset:

PASCAL_VOC_CLASSES = ("aeroplane", "bicycle", "bird", "boat", "bottle",
                      "bus", "car", "cat", "chair", "cow", "diningtable",
                      "dog", "horse", "motorbike", "person", "pottedplant",
                      "sheep", "sofa", "train", "tvmonitor")


PASCAL_VOC_LABEL_MAP = { 1:  1,  2:  2,  3:  3,  4:  4,  5:  5,  6:  6,  7:  7,  8:  8,
                   9:  9, 10: 10, 11: 11, 13: 12, 14: 13, 15: 14, 16: 15, 17: 16,
                  18: 17, 19: 18, 20: 19, 21: 20}

pascalvoc2012_dataset = dataset_base.copy({
    'name': 'PASCAL VOC 2012',
    
    'train_images':'/media/smile/45C142AD782A7053/Datasets/PASCAL_VOC/VOC2012/VOCdevkit/VOC2012/JPEGImages/',
    'train_info':'/home/smile/multipathnet/data/annotations/pascal_train2012.json',

    'valid_images':'/media/smile/45C142AD782A7053/Datasets/PASCAL_VOC/VOC2012/VOCdevkit/VOC2012/JPEGImages/',
    'valid_info':'/home/smile/multipathnet/data/annotations/pascal_val2012.json',

    'label_map': PASCAL_VOC_LABEL_MAP
})

I created a new base_config that simply points to the dataset I created above, with the proper number of classes:

pascalvoc_base_config = Config({
    'dataset': pascalvoc2012_dataset,
    'num_classes': 21, # This should include the background class
...

All the other fields are left untouched.

Finally I adapted yolact_base_config:

#yolact_base_config = coco_base_config.copy({
yolact_base_config = pascalvoc_base_config.copy({
    'name': 'yolact_base',

    # Dataset stuff
#    'dataset': coco2017_dataset,
#    'num_classes': len(coco2017_dataset.class_names) + 1,

    'dataset': pascalvoc2012_dataset,
    'num_classes': len(pascalvoc2012_dataset.class_names) + 1,

Here too, all the other fields are left untouched.

EDIT

After applying the modifications discussed here, the dataset configuration for training on Pascal VOC is:

MEANS_PV = (103.17, 111.70, 116.69)
STD_PV = (61.11, 59.89, 61.00)

PASCAL_VOC_CLASSES = ("aeroplane", "bicycle", "bird", "boat", "bottle",
                      "bus", "car", "cat", "chair", "cow", "diningtable",
                      "dog", "horse", "motorbike", "person", "pottedplant",
                      "sheep", "sofa", "train", "tvmonitor")


PASCAL_VOC_LABEL_MAP = { 1:  1,  2:  2,  3:  3,  4:  4,  5:  5,  6:  6,  7:  7,  8:  8,
                   9:  9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16,
                  17: 17, 18: 18, 19: 19, 20: 20}

pascalvoc2012_dataset = dataset_base.copy({
    'name': 'PASCAL VOC 2012',
    
    'train_images':'/media/smile/45C142AD782A7053/Datasets/PASCAL_VOC/VOC2012/VOCdevkit/VOC2012/JPEGImages/',
    'train_info':'/home/smile/multipathnet/data/annotations/pascal_train2012.json',

    'valid_images':'/media/smile/45C142AD782A7053/Datasets/PASCAL_VOC/VOC2012/VOCdevkit/VOC2012/JPEGImages/',
    'valid_info':'/home/smile/multipathnet/data/annotations/pascal_val2012.json',

    'label_map': PASCAL_VOC_LABEL_MAP,
    'class_names': PASCAL_VOC_CLASSES,
})
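
The traceback ends in `KeyError: 12`, and the first PASCAL_VOC_LABEL_MAP above jumps from key 11 to key 13, so category id 12 has no entry. A minimal sketch of how such gaps could be caught before training, assuming COCO-style annotation data with `categories` and `annotations` lists; both helper names and the sample data are hypothetical, not part of the repository.

```python
def build_label_map(categories):
    """Map each (possibly non-contiguous) category id to a contiguous 1-based index."""
    ids = sorted(cat["id"] for cat in categories)
    return {cat_id: index for index, cat_id in enumerate(ids, start=1)}


def missing_category_ids(annotations, label_map):
    """Category ids used by annotations but absent from label_map (these raise KeyError)."""
    return sorted({ann["category_id"] for ann in annotations} - set(label_map))


# Hypothetical miniature dataset for illustration only.
categories = [{"id": 11, "name": "diningtable"},
              {"id": 12, "name": "dog"},
              {"id": 13, "name": "horse"}]
annotations = [{"category_id": 12}, {"category_id": 13}]
hand_written_map = {11: 11, 13: 12}  # same kind of gap as in the first map above

print(missing_category_ids(annotations, hand_written_map))
print(build_label_map(categories))
```

With a real annotation file, the two lists would come from the parsed JSON, for example `json.load(f)["categories"]`.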

How do you produce the 'mask coefficients'?

Hi, thanks a lot for your fantastic work! In your paper, you produce the 'mask coefficients' using fc layers, but in your code I found that you produce them using a conv layer. Can you tell me which kind of layer you use for producing the 'mask coefficients'? Thanks for your reply!

Weighted Web Download Can't Open？？

how to test on a single image and get masks?

How to open camera?

What's the difference in fast NMS between yolact and ssd?

I found a fastnms function in the ssd code, so what's the difference between yolact and ssd?

Inference speed problem on my own environment

First of all, thanks for sharing the amazing work!
Following the instructions, I set up the environment and can run the code successfully. However, when running eval.py, the inference speed is slower than expected.
For the ResNet101-FPN model, when testing on the COCO validation set, the code returns about 9 FPS, and when testing on my own Kinect images (640*480), with plotting and saving disabled, the code returns about 14 FPS.
My environment is: GTX 1080, CUDA 8.0, cudatoolkit 8.0. I am using Anaconda, and GPU support is checked via

torch.cuda.is_available()

I am new to PyTorch, so I am wondering whether I have missed some configuration or dependency.

Thanks!

evaluation model download URL

hi dbolya,

Can you upload your model to Google Drive or another host? The URL provided by ucdavis is not accessible.

Thanks

Custom dataset runtime error

Hello

I am trying to retrain yolact on Pascal Part, a variation of Pascal VOC where each class has many sub-classes.
To simplify everything, I made every sub-class a class in addition to the 20 original ones, which gives me a set of 316 classes.
I generated three JSON files for each case.

When I start training I encounter the following error:
RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity
which happens here:
losses = criterion(out, wrapper, wrapper.make_mask())
in train.py around line 262 (I added some prints to my file, so my line number is different).

Here:
eriklindernoren/PyTorch-YOLOv3#110

I read that it might be a path issue; however, I rechecked that the image paths are correct.
Also, I am able to train on Pascal VOC using the same image paths without issues.

I tried to investigate the forward method of the loss function, looking for an empty tensor, but I did not find any.
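
One sanity check that may be worth running on generated annotation files (not a diagnosis of this issue): look for boxes with non-positive width or height, which can end up as empty targets after augmentation. The sketch assumes COCO-style annotations where `bbox` is `[x, y, width, height]`; `find_degenerate_boxes` is a hypothetical helper.

```python
def find_degenerate_boxes(coco):
    """Return ids of annotations whose box has non-positive width or height.

    `coco` is assumed to be a parsed COCO-style annotation dict.
    """
    suspects = []
    for ann in coco.get("annotations", []):
        _, _, width, height = ann["bbox"]
        if width <= 0 or height <= 0:
            suspects.append(ann["id"])
    return suspects


# Hypothetical data for illustration only.
sample = {"annotations": [
    {"id": 1, "bbox": [10, 10, 50, 40]},
    {"id": 2, "bbox": [30, 30, 0, 12]},   # zero width
]}
print(find_degenerate_boxes(sample))
```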

MemoryError

Memory is 12G, and only 8G is used.

python train.py --config=yolact_base_config --batch_size=5
loading annotations into memory...
Done (t=0.06s)
creating index...
index created!
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
Initializing weights...
Begin training!

[ 0] 0 || B: 8.264 | C: 14.452 | M: 14.870 | S: 3.010 | T: 40.595 || ETA: 0:00:00 || timer: 12.147
[ 0] 10 || B: 9.251 | C: 9.149 | M: 7.010 | S: 2.204 | T: 27.615 || ETA: 0:57:52 || timer: 0.445
[ 0] 20 || B: 8.156 | C: 7.494 | M: 6.613 | S: 1.537 | T: 23.800 || ETA: 1:00:17 || timer: 0.441
[ 0] 30 || B: 8.053 | C: 6.515 | M: 6.317 | S: 1.206 | T: 22.091 || ETA: 1:08:55 || timer: 0.437
[ 0] 40 || B: 7.631 | C: 5.865 | M: 6.203 | S: 0.981 | T: 20.680 || ETA: 1:22:37 || timer: 0.428
[ 0] 50 || B: 7.558 | C: 5.397 | M: 6.149 | S: 0.845 | T: 19.949 || ETA: 1:20:02 || timer: 0.432
Traceback (most recent call last):
  File "train.py", line 374, in <module>
    train()
  File "train.py", line 211, in train
    for datum in data_loader:
  File "/home/chase/anaconda3/envs/maskrcnn_benchmark1/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/home/chase/anaconda3/envs/maskrcnn_benchmark1/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
MemoryError: Traceback (most recent call last):
  File "/home/chase/anaconda3/envs/maskrcnn_benchmark1/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/chase/anaconda3/envs/maskrcnn_benchmark1/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/chase/yolact/data/coco.py", line 88, in __getitem__
    im, gt, masks, h, w, num_crowds = self.pull_item(index)
  File "/home/chase/yolact/data/coco.py", line 151, in pull_item
    {'num_crowds': num_crowds, 'labels': target[:, 4]})
  File "/home/chase/yolact/utils/augmentations.py", line 658, in __call__
    return self.augment(img, masks, boxes, labels)
  File "/home/chase/yolact/utils/augmentations.py", line 54, in __call__
    img, masks, boxes, labels = t(img, masks, boxes, labels)
  File "/home/chase/yolact/utils/augmentations.py", line 380, in __call__
    current_masks = masks[mask, :, :].copy()
MemoryError
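
The failing line copies a stack of per-instance masks inside a data-loader worker, so the memory in question is host RAM rather than GPU memory. A back-of-the-envelope sketch of how large such a stack can get, assuming one full-resolution mask per instance; the function name, image size, instance count, and element size are all hypothetical.

```python
def mask_stack_bytes(num_instances, height, width, bytes_per_element=1):
    """Size of a dense (num_instances, height, width) mask array in bytes."""
    return num_instances * height * width * bytes_per_element


# Hypothetical example: 100 instances in a 4000x3000 image, one byte per pixel.
size = mask_stack_bytes(100, 3000, 4000)
print(f"{size / 1024**3:.2f} GiB per copy")
```

Each worker process holds its own arrays, and `.copy()` needs room for a second array on top of the original, so large images with many instances add up quickly.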

Is YOLACT feasible on mobile devices?

First of all, I would like to thank you for your outstanding contribution. Secondly, I would like to ask how well the algorithm you proposed works on mobile devices with limited computing power and memory. Could you give me some suggestions? Thank you so much!

eval.py does not process all 5k images

When I run:

python eval.py --trained_model=weights/yolact_base_54_800000.pth --dataset=coco2017_dataset

It only evaluates 4952 images. Any ideas on why it doesn't go through all 5000 images in ./data/coco/images/ ?

The image folder has 5000 images and the annotations_val2017.json file has annotations for those images.

What do I need to change so that it evaluates the complete set of images? (5k)
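
One possible explanation, not confirmed in this thread, is that the loader only iterates over images that have at least one annotation, so images with no annotations are skipped. A small sketch for checking how many images in a COCO-style annotation file are actually annotated; `count_annotated_images` is a hypothetical helper and the sample data is made up.

```python
def count_annotated_images(coco):
    """Return (images with at least one annotation, total images)."""
    annotated = {ann["image_id"] for ann in coco["annotations"]}
    image_ids = {img["id"] for img in coco["images"]}
    return len(annotated & image_ids), len(image_ids)


# Hypothetical data for illustration only: image 3 has no annotations.
sample = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [{"image_id": 1}, {"image_id": 1}, {"image_id": 2}],
}
print(count_annotated_images(sample))
```

If the first number matches the count reported by eval.py, the remaining images simply carry no annotations to evaluate against.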

IndexError: list index out of range

Hello!
I trained this model with my own dataset, but it fails in the mAP evaluation phase. Does anyone have the same problem?

(tensorflow) root@gpuserver:/home/gpuserver/models/yolact# python train.py --config=yolact_base_config
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Initializing weights...
Begin training!

/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/torch/nn/parallel/_functions.py:61: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
warnings.warn('Was asked to gather along dimension 0, but all '
[ 0] 0 || B: 5.480 | C: 23.075 | M: 5.976 | S: 67.004 | T: 101.536 || ETA: 0:00:00 || timer: 23.377
[ 0] 10 || B: 4.757 | C: 18.774 | M: 5.625 | S: 47.000 | T: 76.155 || ETA: 11 days, 7:25:02 || timer: 1.176
[ 0] 20 || B: 4.587 | C: 15.804 | M: 5.362 | S: 29.147 | T: 54.900 || ETA: 11 days, 7:50:00 || timer: 1.180
[ 0] 30 || B: 4.582 | C: 13.355 | M: 5.309 | S: 19.954 | T: 43.199 || ETA: 11 days, 6:14:29 || timer: 1.272
[ 0] 40 || B: 4.553 | C: 11.175 | M: 5.306 | S: 15.150 | T: 36.183 || ETA: 11 days, 6:35:18 || timer: 1.266
[ 0] 50 || B: 4.497 | C: 9.617 | M: 5.303 | S: 12.227 | T: 31.645 || ETA: 11 days, 6:12:37 || timer: 1.120
[ 0] 60 || B: 4.433 | C: 8.514 | M: 5.290 | S: 10.265 | T: 28.503 || ETA: 11 days, 4:48:22 || timer: 1.166
[ 1] 70 || B: 4.383 | C: 7.700 | M: 5.304 | S: 8.850 | T: 26.237 || ETA: 11 days, 7:53:19 || timer: 1.236
[ 1] 80 || B: 4.339 | C: 7.073 | M: 5.269 | S: 7.781 | T: 24.464 || ETA: 11 days, 6:55:00 || timer: 1.173
[ 1] 90 || B: 4.294 | C: 6.585 | M: 5.250 | S: 6.945 | T: 23.074 || ETA: 11 days, 6:22:39 || timer: 1.217
[ 1] 100 || B: 4.235 | C: 6.015 | M: 5.230 | S: 5.666 | T: 21.147 || ETA: 11 days, 5:30:35 || timer: 1.259
[ 1] 110 || B: 4.131 | C: 4.426 | M: 5.184 | S: 1.178 | T: 14.920 || ETA: 11 days, 4:57:38 || timer: 1.177
[ 1] 120 || B: 4.045 | C: 3.427 | M: 5.202 | S: 0.242 | T: 12.915 || ETA: 11 days, 4:43:12 || timer: 1.214
[ 2] 130 || B: 3.926 | C: 2.860 | M: 5.195 | S: 0.192 | T: 12.174 || ETA: 11 days, 5:57:53 || timer: 2.714
[ 2] 140 || B: 3.817 | C: 2.654 | M: 5.138 | S: 0.180 | T: 11.789 || ETA: 11 days, 5:43:35 || timer: 1.230
[ 2] 150 || B: 3.694 | C: 2.571 | M: 5.045 | S: 0.170 | T: 11.480 || ETA: 11 days, 5:23:29 || timer: 1.217
[ 2] 160 || B: 3.617 | C: 2.516 | M: 4.966 | S: 0.158 | T: 11.256 || ETA: 11 days, 5:37:45 || timer: 1.277
[ 2] 170 || B: 3.540 | C: 2.467 | M: 4.876 | S: 0.149 | T: 11.031 || ETA: 11 days, 5:16:56 || timer: 1.222
[ 2] 180 || B: 3.440 | C: 2.419 | M: 4.831 | S: 0.141 | T: 10.831 || ETA: 11 days, 4:54:42 || timer: 1.176
[ 2] 190 || B: 3.342 | C: 2.364 | M: 4.716 | S: 0.135 | T: 10.558 || ETA: 11 days, 4:41:58 || timer: 1.187

Computing validation mAP (this may take a while)...

Traceback (most recent call last):
  File "train.py", line 374, in <module>
    train()
  File "train.py", line 303, in train
    compute_validation_map(yolact_net, val_dataset)
  File "train.py", line 367, in compute_validation_map
    eval_script.evaluate(yolact_net, dataset, train_mode=True)
  File "/home/gpuserver/models/yolact/eval.py", line 791, in evaluate
    prep_metrics(ap_data, preds, img, gt, gt_masks, h, w, num_crowd, dataset.ids[image_idx], detections)
  File "/home/gpuserver/models/yolact/eval.py", line 401, in prep_metrics
    ap_obj = ap_data[iou_type][iouIdx][_class]
IndexError: list index out of range
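
The failing lookup indexes a per-class list with `_class`, so one thing worth ruling out is a class index that is larger than the number of configured classes. The sketch below applies the same `label_map[...] - 1` shift that appears in the data/coco.py traceback earlier on this page; the helper name, the fallback for ids missing from the map, and the sample data are assumptions.

```python
def out_of_range_classes(annotations, label_map, num_foreground_classes):
    """Return category ids whose 0-based class index is outside the class list."""
    bad = set()
    for ann in annotations:
        cat_id = ann["category_id"]
        # Assumption: ids missing from the map are used unchanged.
        index = label_map.get(cat_id, cat_id) - 1
        if not 0 <= index < num_foreground_classes:
            bad.add(cat_id)
    return sorted(bad)


# Hypothetical data: three class names configured, but an annotation uses id 5.
annotations = [{"category_id": 1}, {"category_id": 2}, {"category_id": 5}]
label_map = {1: 1, 2: 2, 5: 5}
print(out_of_range_classes(annotations, label_map, num_foreground_classes=3))
```

An empty result would suggest the mismatch lies elsewhere, for example in the class names or num_classes used by the config.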

A problem in traditional NMS.

yolact/layers/functions/detection.py

Line 189 in d8ddaa1

boxes = boxes * cfg.mask_size

I think there is a mistake in the implementation of traditional_nms. When NMS is finished, don't the boxes need to be rescaled back into [0, 1]?

Evaluation with Multiple GPUs [FPS is low while evaluating video]

Hi, I am trying to run eval.py and I am getting an average FPS of approximately 8.54. I want to increase the FPS. Is there any way for eval.py to use multiple GPUs?

Thanks.

An issue with a custom dataset

Hi, thanks for your work. Recently I have been trying to train the net on my custom dataset. I have run into an issue that I find hard to debug by myself. Here is my problem. Thanks a lot for your help again.

[ 2] 2930 || B: 3.808 | C: 2.416 | M: 4.821 | S: 0.049 | T: 11.094 || ETA: 4 days, 14:47:05 || timer: 0.478
[ 2] 2940 || B: 3.795 | C: 2.418 | M: 4.838 | S: 0.049 | T: 11.101 || ETA: 4 days, 14:47:10 || timer: 0.497
[ 2] 2950 || B: 3.787 | C: 2.421 | M: 4.812 | S: 0.049 | T: 11.069 || ETA: 4 days, 14:49:01 || timer: 0.474
[ 2] 2960 || B: 3.778 | C: 2.422 | M: 4.846 | S: 0.049 | T: 11.095 || ETA: 4 days, 14:49:52 || timer: 0.512
[ 2] 2970 || B: 3.748 | C: 2.419 | M: 4.846 | S: 0.048 | T: 11.061 || ETA: 4 days, 14:49:04 || timer: 0.491

Computing validation mAP (this may take a while)...

Traceback (most recent call last):
  File "train.py", line 377, in <module>
    train()
  File "train.py", line 300, in train
    compute_validation_map(yolact_net, val_dataset)
  File "train.py", line 370, in compute_validation_map
    eval_script.evaluate(yolact_net, dataset, train_mode=True)
  File "/data/pancreas/root/yolact-master/eval.py", line 869, in evaluate
    prep_metrics(ap_data, preds, img, gt, gt_masks, h, w, num_crowd, dataset.ids[image_idx], detections)
  File "/data/pancreas/root/yolact-master/eval.py", line 433, in prep_metrics
    ap_obj = ap_data[iou_type][iouIdx][_class]
IndexError: list index out of range

How to see graph structure?

Hi sir.
I want to see the data flow to understand this article. However, I have never used torch. Could you send me a graph logdir from tensorboardX? Thank you in advance.

Fine tuning with existing model

Hi,

I tried to train a model with a custom dataset and the resnet101 backbone. I noticed that while half of the bounding boxes looked accurate, the masks were completely off. I drew the annotations and verified that they are correct.

It could be due to the size of the dataset: 1357 images and 21 classes. I would like to use yolact_im700_54_80000.pth and fine tune it with my custom classes to see if this improves my results. What would be the steps to do this?

how to use your scripts to generate my own anchor sizes and scales?

Dear Sir:
I have some trouble understanding your cluster_bbox_sizes.py, optimize_bboxes.py and bbox_recall.py. I really want to use them to set the parameters scales, aspect_ratios and conv_sizes more reasonably.
Could you please explain a little of what these mean? Thanks a lot!

I use the default parameters as yolact_base.cfg does, and tested the scripts on a dataset:
scales = [ [24],[48],[96],[192],[384] ]
aspect_ratios = [ [[1, 1/sqrt(2), sqrt(2)]] ]*5
conv_sizes = [(69, 69), (35, 35), (18, 18), (9, 9),(5,5)]
here are the results:
from: cluster
`0.062 (18) aspect ratios:
17.71 (8)
5.23 (8)
109.76 (2)

0.146 (70) aspect ratios:
4.39 (34)
2.26 (30)
0.65 (6)

0.241 (125) aspect ratios:
1.12 (103)
0.23 (21)
0.00 (1)
`

from optimize_bbox:

`(Iteration 9) Aspect Ratios: [[[19.03, 0.55, 1.13]], [[13.94, 13.64, 14.24]], [[13.94, 13.64, 14.24]], [[13.94, 13.64, 14.24]], [[13.94, 13.64, 14.24]]]

scales = [[17.53], [60.94], [108.94], [204.94], [396.94]]

aspect_ratios = [[[19.03, 0.55, 1.13]], [[13.94, 13.64, 14.24]], [[13.94, 13.64, 14.24]], [[13.94, 13.64, 14.24]], [[13.94, 13.64, 14.24]]]
`

from bbox_recall:

`Total recall: 33.80

small recall: 0.00
medium recall: 0.00
large recall: 46.75
`

Thanks a lot! It's a bit hard for me >o<

How to use web service?

Hello, could you please show me how to use the scripts in the "root/web" subdirectory?

Training time is long?

Hi, dbolya.

Thanks for your work. I tried to reproduce the performance with the ResNet50 pre-trained model, using the command 'python train.py --config=yolact_resnet50_config'. While training, I found that it needs about 30 days to finish, which is too long. Then I set batch_size = 32 because I have 8 GPUs, but the total training time remained about 30 days.

Did I do anything wrong, or is the training time actually that long? How can I use multiple GPUs to accelerate training?

Thanks!
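
If the schedule is defined as a fixed number of iterations, raising the batch size alone does not shorten it; each iteration just processes more images. A common heuristic (not taken from this thread) is to divide the iteration count and multiply the learning rate by the same batch-size factor. The sketch below uses the batch size 8, 800k iterations and learning rate 1e-3 quoted elsewhere on this page as the baseline; `scale_schedule` is a hypothetical helper.

```python
def scale_schedule(base_max_iter, base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: fewer iterations, proportionally larger learning rate."""
    factor = new_batch_size / base_batch_size
    return round(base_max_iter / factor), base_lr * factor


# Baseline values as quoted on this page; the target batch size is the one from the question.
max_iter, lr = scale_schedule(800_000, 1e-3, base_batch_size=8, new_batch_size=32)
print(max_iter, lr)
```

Whether the scaled schedule reaches the same accuracy has to be checked empirically; learning-rate step points would need the same scaling.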

How to run eval.py without cuda?

Hello, I'm trying to run eval.py, but got an error.
The error message is:

Traceback (most recent call last):
  File "eval.py", line 990, in <module>
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
  File "/home/administrator/anaconda3/lib/python3.7/site-packages/torch/__init__.py", line 158, in set_default_tensor_type
    _C._set_default_tensor_type(t)
  File "/home/administrator/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 161, in _lazy_init
    _check_driver()
  File "/home/administrator/anaconda3/lib/python3.7/site-packages/torch/cuda/__init__.py", line 75, in _check_driver
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

I don't have a GPU on my PC. How can I run eval.py without CUDA? Thanks.
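
A minimal sketch of guarding the failing call so that it falls back to CPU tensors when CUDA is unavailable. The two helper names are hypothetical; `torch.cuda.is_available()` and `torch.set_default_tensor_type` are existing PyTorch calls. This only addresses the line in the traceback; any other hard-coded `.cuda()` calls in the script would need the same treatment.

```python
def select_tensor_type(cuda_available):
    """Pick the default tensor type string based on CUDA availability."""
    return "torch.cuda.FloatTensor" if cuda_available else "torch.FloatTensor"


def configure_default_tensor_type():
    """Apply the choice; torch is imported lazily so the helper above works without it."""
    import torch

    tensor_type = select_tensor_type(torch.cuda.is_available())
    torch.set_default_tensor_type(tensor_type)
    return tensor_type


print(select_tensor_type(False))
```

Expect CPU inference to be far slower than the GPU numbers discussed elsewhere on this page.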

Close objects are missed (while small objects are still detected)

Here is a result:

In a video sequence, the close object is always missed, while the smaller object behind it is segmented... but models trained with yolov3 for detection only can predict very precisely on this video.
What could be the reason?

Training my dataset with multiple GPUs

Hi, thanks for your good work!
I want to train on my dataset using 4 GPUs, but I find it slower than a single GPU (same batch_size). Why?

What will happen if I change the backbone network into MobileNet-v2 with FPN?

Hi,
I want to try changing the backbone network to MobileNet-v2 with FPN. Are there any suggestions? Thanks!

'max_size' or 'mask_size'?

Hi, I think you may want to use cfg.max_size here. Should it be?

yolact/layers/functions/detection.py

Line 188 in d04c948

    
           # Multiplying by max_size is necessary because of how cnms computes its area and intersections

yolact/layers/functions/detection.py

Line 189 in d04c948

boxes = boxes * cfg.mask_size

Bug in eval.py

Is this line a bug? Why is there a data type in the function definition?
https://github.com/dbolya/yolact/blob/dev/eval.py#L244

Not able to get 30+ fps processing speed on Nvidia RTX 2080 GPU

Hello, first off, thank you for sharing this amazing work. Much appreciated.

I wanted to report that I also could not get 30+ fps on an Nvidia RTX 2080 GPU with 8GB RAM. I am getting 8-10 fps with video, and with images I get ~16 fps (0.06 sec/image) with the Resnet-101 model, ~20 fps (0.05 sec/image) with the Resnet-50 model and 17-18 fps (0.055 sec/image) with the Darknet53 model. This is quite impressive, but it's roughly 1/2 of what is reported in the paper. For images, I used the Python timeit module to wrap the evalimage function to get my numbers. Also, it is weird that the difference in speed between the models is not significant (especially between Resnet-101 and Resnet-50), which indicates to me that something is reducing the processing speed by ~1/2 for all the models.

The command I am using is as below (except I change the model name as needed):

python3 eval.py --trained_model=weights/yolact_resnet50_54_800000.pth --score_threshold=0.4 --top_k=100 --images=./test_images:./test_output_images

I also tried using --benchmark but there is no change in the numbers above.

I was wondering if I could get some help to figure this out.

How to tag images for training on custom dataset ?

What is the format of the training data?
What tool can we use to tag images?
What is the command for training?

preserve_aspect_ratio question

I am training on Cityscapes, so I want to preserve the aspect ratio (1024, 2048).
However, after turning on preserve_aspect_ratio, the loss keeps decreasing but the visualized bounding box positions are always wrong.

I found that this line uses max_size for both width and height.
I think it should be b_w, b_h = (int(cfg.max_size / r_w * w), int(cfg.min_size / r_h * h)),
or directly b_w, b_h = w, h.
I don't understand the comment # A hack to scale the bboxes to the right size
I wonder whether this is a bug or some trick?

yolact/layers/output_utils.py

Line 68 in 5dd130d

b_w, b_h = (cfg.max_size / r_w * w, cfg.max_size / r_h * h)

Thanks

Support on Multi-GPU？

Hi, dbolya,

I did not find DataParallel in your yolact.py, which defines the model. So does the code in your repo not support multi-GPU properly?
I tried simply using CUDA_VISIBLE_DEVICES to assign multiple GPUs, but the performance is not right according to the training log.

Thanks!

About training implementation details of yolact

Thanks for sharing your great work!
I compared yolact's training config with that of RetinaNet, since yolact is based on RetinaNet (I think).
I have a few questions about the training config of yolact.
(1) The batch size on one GPU is 8, so how many GPUs did you use when training, 4 or 8? That would mean a total batch size of 32 or 64. RetinaNet's batch size is 16.
(2) The number of iterations is 800k, which is almost 10x larger than RetinaNet's. Why?
(3) The learning rate is 1e-3, which is 10 times smaller than RetinaNet's. Why?

Thanks!

A bug when I am training

[ 0] 3180 || B: 3.273 | C: 6.118 | M: 5.300 | S: 1.431 | T: 16.121 || ETA: 8 days, 0:27:19 || timer: 0.833
[ 0] 3190 || B: 3.251 | C: 6.134 | M: 5.046 | S: 1.343 | T: 15.774 || ETA: 8 days, 0:20:56 || timer: 0.924
[ 0] 3200 || B: 3.220 | C: 6.074 | M: 5.023 | S: 1.346 | T: 15.663 || ETA: 8 days, 0:14:25 || timer: 0.922
[ 0] 3210 || B: 3.249 | C: 6.012 | M: 4.997 | S: 1.397 | T: 15.655 || ETA: 8 days, 0:03:42 || timer: 0.824
[ 0] 3220 || B: 3.167 | C: 5.980 | M: 4.841 | S: 1.368 | T: 15.355 || ETA: 7 days, 23:56:10 || timer: 0.831
/opt/conda/conda-bld/pytorch_1550813258230/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [33,0,0], thread: [192,0,0] Assertion *input >= 0. && *input <= 1. failed.
/opt/conda/conda-bld/pytorch_1550813258230/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [33,0,0], thread: [193,0,0] Assertion *input >= 0. && *input <= 1. failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1550813258230/work/aten/src/THC/generated/../THCReduceAll.cuh line=317 error=59 : device-side assert triggered

can you help me solve it? Thanks
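
The assertion in the log checks that every input to the binary cross-entropy lies in [0, 1]. A NaN fails both comparisons, so a loss that has diverged is one possible trigger (an assumption, not confirmed here). The plain-Python sketch below mirrors that domain check; it is an illustration, not the CUDA kernel's code.

```python
import math


def binary_cross_entropy(prediction, target, eps=1e-7):
    """BCE for one value; raises if the prediction is NaN or outside [0, 1]."""
    if math.isnan(prediction) or not 0.0 <= prediction <= 1.0:
        raise ValueError(f"prediction {prediction!r} is outside [0, 1]")
    # Clamp away from 0 and 1 so the logarithms stay finite.
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


print(binary_cross_entropy(0.5, 1.0))
for bad in (float("nan"), 1.5):
    try:
        binary_cross_entropy(bad, 1.0)
    except ValueError as err:
        print("rejected:", err)
```

Checking the tensors fed to the loss for NaN or out-of-range values just before the failing step is usually the quickest way to narrow this down.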

AssertionError: Torch not compiled with CUDA enabled

I have NVIDIA GeForce RTX 2080 Ti，

How to label the same object in the picture?

For example, if there are multiple people in a picture, should they be labeled as person1, person2, person3?

AHHHH all our anchors were squares this whole time

yolact/yolact.py

Line 275 in cb3857a

h = scale * ar / cfg.max_size

That should be division not multiplication oh nooooooo
How could I have missed that ahhhhhh

Time to assess the damages
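
A small sketch of why multiplying instead of dividing yields square anchors. Only the quoted line `h = scale * ar / cfg.max_size` comes from the source; the square root of the aspect ratio, the matching width formula, and the `max_size` value of 550 are assumptions made for illustration.

```python
from math import sqrt


def anchor_size(scale, aspect_ratio, max_size, buggy=False):
    """Relative (width, height) of one anchor; buggy=True reproduces the quoted line."""
    ar = sqrt(aspect_ratio)          # assumption: ratio is split evenly between w and h
    w = scale * ar / max_size        # assumption: width formula
    if buggy:
        h = scale * ar / max_size    # the quoted line: identical to w, so w == h
    else:
        h = scale / ar / max_size    # the fix described in the issue: divide by ar
    return w, h


for ratio in (1, 0.5, 2):
    print(ratio, anchor_size(24, ratio, 550, buggy=True), anchor_size(24, ratio, 550))
```

With the multiplication, every aspect ratio produces an anchor with equal width and height, only at a different size; with the division, the width-to-height ratio equals the requested aspect ratio.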

Could u please give the training steps on my own datasets?

Dear sir:
I'm really interested in your fantastic work,
Could u please give the training steps on my own datasets?
Thanks a lot!~

Do we really need to shuffle the input channels of the image?

yolact/utils/augmentations.py

Line 481 in d630bf6

self.rand_light_noise = RandomLightingNoise()

That's just gray-scale with noise right?

What inspired the protonet?

I know RetinaNet inspired the basic backbone, SSD inspired the loss, and Mask R-CNN inspired the branch,
but I wonder what inspired the protonet?

Issue while running eval.py scripts

I am running this on a linux 18.04 box with python3 and all the most recent versions of the libraries. Any Idea why I get this error?

$ python3 eval.py --trained_model=weights/yolact_base_54_800000.pth --score_threshold=0.3 --top_k=100 --video=/home/vib/Desktop/AndurilSRC/LPR_DATA/lotsofcars_1.mp4:output_video-det.mp4

Config not specified. Parsed yolact_base_config from the file name.

Loading model... Done.
Traceback (most recent call last):
  File "eval.py", line 935, in <module>
    evaluate(net, dataset)
  File "eval.py", line 722, in evaluate
    savevideo(net, inp, out)
  File "eval.py", line 682, in savevideo
    preds = net(batch)
  File "/home/vib/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/vib/Desktop/Personal/yolact/yolact.py", line 612, in forward
    return self.detect(pred_outs)
  File "/home/vib/Desktop/Personal/yolact/layers/functions/detection.py", line 76, in __call__
    result = self.detect(batch_idx, conf_preds, decoded_boxes, mask_data, inst_data)
  File "/home/vib/Desktop/Personal/yolact/layers/functions/detection.py", line 103, in detect
    boxes, masks, classes, scores = self.fast_nms(boxes, masks, scores, self.nms_thresh, self.top_k)
  File "/home/vib/Desktop/Personal/yolact/layers/functions/detection.py", line 148, in fast_nms
    iou.triu_(diagonal=1)
RuntimeError: invalid argument 1: expected a matrix at /pytorch/aten/src/THC/generic/THCTensorMathPairwise.cu:203
FAIL

Any idea how to transform the eval results (boxes, masks, classes) to COCO style? The key point is how to convert masks to segmentations.
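
A pure-Python sketch of one way to turn a binary mask into a COCO-style uncompressed RLE segmentation, assuming the usual COCO convention: run lengths are counted in column-major order and the first count is the number of leading zeros. The function name is hypothetical. In practice `pycocotools.mask.encode` on a Fortran-ordered uint8 array is the usual route and produces the compressed form.

```python
def mask_to_uncompressed_rle(mask):
    """Encode a 2D 0/1 mask (list of rows) as {'size': [h, w], 'counts': [...]}."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    counts = []
    previous, run = 0, 0  # runs alternate, starting with a run of zeros
    for col in range(width):          # column-major traversal
        for row in range(height):
            value = 1 if mask[row][col] else 0
            if value != previous:
                counts.append(run)
                previous, run = value, 0
            run += 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


# Hypothetical 2x2 mask: only the top-left pixel is background.
print(mask_to_uncompressed_rle([[0, 1],
                                [1, 1]]))
```

Each detection's RLE would then go into the `segmentation` field of a result entry alongside `image_id`, `category_id` and `score`.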

DO NOT use the latest pytorch version(1.1.0)

Please do not use the latest PyTorch version, 1.1.0, which may cause CUDA errors.

The default PyTorch install command is
conda install pytorch torchvision cudatoolkit=10.0 -c pytorch

Use the command below instead:
pip3 install https://download.pytorch.org/whl/cu100/torch-1.0.1-cp36-cp36m-win_amd64.whl

One day wasted debugging this [cry]

UR CODE WAS COPIED FOR SELL NOW

http://manaai.cn/aicodes_detail3.html?id=32
A group of contemptible thieves has cloned ur code, replaced ur license with theirs, and is peddling it on their "WEBSITE".

Compute Validation Loss

Hi, is there a way to get validation loss during training? I want to monitor it for overfitting cases.

I noticed you had it before (which is giving me errors), but the overhaul has removed it.

Thanks.

Computational time with own code

Hi, thank you for the awesome work!
For some reason, I have to rewrite your eval.py myself.
However, when I run the code, it takes 2 seconds just for prediction.
Do you have any idea why that is?

I already checked that the GPU is enabled.


import os
from data import COCODetection, MEANS, COLORS, COCO_CLASSES
from yolact import Yolact
from utils.augmentations import BaseTransform, FastBaseTransform, Resize
from utils.functions import MovingAverage, ProgressBar
from layers.box_utils import jaccard, center_size
from utils import timer
from utils.functions import SavePath
from layers.output_utils import postprocess, undo_image_transformation
import pycocotools

from data import cfg, set_cfg, set_dataset

import numpy as np
import torch
import torch.backends.cudnn as cudnn
from torch.autograd import Variable
import argparse
import time
import random
import cProfile
import pickle
import json
from pathlib import Path
from collections import OrderedDict
from PIL import Image

import matplotlib.pyplot as plt

set_cfg("yolact_resnet50_config")
with torch.no_grad():
    torch.cuda.set_device(1)
    cudnn.benchmark = True
    cudnn.fastest = True
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
    net = Yolact()
    net.load_weights('./weights/yolact_resnet50_54_800000.pth')
    net.eval()
    net = net.cuda()
print('model loaded...')

#run your code
def execute(rgb_image):
    net.detect.cross_class_nms = True
    net.detect.use_fast_nms = True
    cfg.mask_proto_debug = False
    with torch.no_grad():
        frame = torch.Tensor(rgb_image).cuda().float()
        batch = FastBaseTransform()(frame.unsqueeze(0))
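        # time.clock() was removed in Python 3.8; perf_counter() measures wall-clock time.
        # Synchronize so the timer covers the GPU work, not just the kernel launches.
        torch.cuda.synchronize()
        time_start = time.perf_counter()
        preds = net(batch)
        torch.cuda.synchronize()
        time_elapsed = time.perf_counter() - time_start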
        h, w, _ = rgb_image.shape
        t = postprocess(preds, w, h, visualize_lincomb=False, crop_masks=True, score_threshold=0)
        torch.cuda.synchronize()
        
        classes, scores, boxes, masks = [x[:MAX_MASK_SIZE].cpu().numpy() for x in t]

        print(time_elapsed)
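
A single timed call is often misleading: with `cudnn.benchmark = True`, the first forward pass for a given input shape includes autotuning, and GPU calls return before the work has finished unless the device is synchronized. A generic timing sketch with warm-up runs follows; `benchmark` and its parameters are hypothetical, and `sync` is expected to be something like `torch.cuda.synchronize`.

```python
import time


def benchmark(fn, warmup=3, runs=10, sync=None):
    """Average seconds per call of fn(), after `warmup` untimed calls."""
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        sync()  # wait for all timed work to finish
    return (time.perf_counter() - start) / runs


# Stand-in workload for illustration only.
print(benchmark(lambda: sum(range(10_000))))
```

For the snippet above, the call would look like `benchmark(lambda: net(batch), sync=torch.cuda.synchronize)`, run inside `torch.no_grad()`.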

More experiments would be nicer

The paper says that box2pix relies on an extremely lightweight backbone detector.
I think more experiments would be nicer, maybe like this:

          kitti   cityscapes   coco
box2pix
yolact

Also, a yolact-lite might be good, just like yolo-lite, using a lightweight backbone (like Xception).
This is yolact v1, just like yolo v1.
I am wondering whether the encoder-decoder architecture or the atrous convolution adopted by DeepLab v3+ would help.
Looking forward to yolact v2...
