
cutmix-semisup-seg's Issues

How to run the jobs on multiple GPUs.

Great work on the semi-supervised semantic segmentation problem!

After checking the implementation, we could not find any code that supports running jobs on multiple GPUs.

We only found that the job function in job_helper.py exposes a "num_gpus" parameter, as shown below:

def job(job_name, enumerate_job_names=True):
    """
    Decorator to turn a function into a job submitter.

    Usage:

    >>> @job('wait_some_time')
    ... def wait_some_time(submit_config: SubmitConfig, iteration_count):
    ...     # Create a run context (hides low level details, exposes simple API to manage the run)
    ...     with dnnlib.RunContext(submit_config) as ctx:
    ...
    ...         fn = os.path.join(submit_config.run_dir, "output.txt")
    ...         with open(fn, 'w') as f:
    ...             f.write("Works!")
    ...
    ...         print('Training...')
    ...         for i in range(iteration_count):
    ...             if ctx.should_stop():
    ...                 break
    ...
    ...             time.sleep(1.0)
    ...             ctx.update(loss='%2f' % i, cur_epoch=i, max_epoch=iteration_count)

    To submit a job:

    >>> wait_some_time.submit(on='local', job_desc='description_to_identify_specific_job', iteration_count=50)

    :param job_name: The name to be given to the job
    :param module_name: If necessary, name the module in which the job function resides
    :param docker_image: Provide the path to the docker image required for this job
    :param num_gpus: The number of GPUs required
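
As a concrete illustration of what we are hoping for, here is a minimal sketch of our own (not part of the repository) showing how the network built inside a training job could be replicated across the visible GPUs with torch.nn.DataParallel; student_net is just a hypothetical name for the segmentation model:

import torch
import torch.nn as nn

def to_multi_gpu(student_net: nn.Module) -> nn.Module:
    """Move a model to the first GPU and, if several are visible, replicate it with DataParallel."""
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    student_net = student_net.to(device)
    if torch.cuda.device_count() > 1:
        # DataParallel splits each input batch across the GPUs and gathers the outputs on cuda:0.
        student_net = nn.DataParallel(student_net)
    return student_net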

Pretrained models

Hi @Britefury,

Thank you for making the code publicly available.

Can you share the pretrained models for the deeplabv2 architecture (Cutmix, Cutout, VAT and ICT) please?
We are having trouble reproducing the results.

Thanks in advance.

Question about the consistency weight.

Hi. In Mean Teacher, the consistency weight is 100, but in this work all the consistency weights are 1. Isn't this value too small?
Can you share the details of how you set this parameter? I have also seen other work (such as CPS, CVPR 2021) use a consistency weight of around 100 when reproducing the Mean Teacher method.

Looking forward to your help.

best,

Small concerns about the experiments in Table 4

Really nice work and super impressive results in the low-data regime (shown in Table 4, also pasted below)!

[Screenshot: Table 4 from the paper, with the results in question circled in red]

We have some small concerns about the results marked with red circles:

  1. In the 1/100 subset column, the DeepLabv3+/PSPNet baselines reach only 37.95%/36.69%, while your method achieves 59.52%/67.20% respectively. We would really appreciate a detailed explanation of why your method achieves such huge gains! One of our small concerns is that applying the CutMix + Mean Teacher scheme to the supervised baseline without using unlabeled images might be a more reasonable baseline setting. It would be great if you could share some results for such a setting.

  2. In the 1/8 subset column, the baseline performance of DeepLabv3+ is slightly worse than that of PSPNet. In our experience, DeepLabv3+ should perform much better. Could you share an explanation for this?

  3. In the full-set column, we observe that the proposed method performs slightly worse than the baseline. Could you comment on the possible reasons?

  4. According to your code, all the experiments freeze the BN statistics and use a 321x321 crop size on a single GPU. Do you have any plans to train, or have you ever trained, your method in a stronger setting such as a 512x512 crop size + SyncBN + 8x V100 GPUs? MMSegmentation or openseg.pytorch might be good candidate codebases.

Many thanks for your valuable time; we look forward to your explanation!

What is the "mask_arr" in the code?

Hi. Thanks for your interesting work!

I have a question about the following code:

if self.mask_flag:
    if self.pipeline_type == 'pil':
        sample0['mask_pil'] = Image.new('L', size_xy, 255)
    elif self.pipeline_type == 'cv':
        sample0['mask_arr'] = np.full(size_xy[::-1], 255, dtype=np.uint8)
    else:
        raise RuntimeError

What is the "mask_arr" here? All elements of this array are set to 255. Why should we define it here?

In the main experiment file, "mask_arr" is converted to "mask" and used as follows:

batch_um0 = unsup_batch0['mask'].to(torch_device)
batch_um1 = unsup_batch1['mask'].to(torch_device)
batch_um_mixed = batch_um0 * (1 - batch_mix_masks) + batch_um1 * batch_mix_masks
loss_mask = batch_um_mixed

In the above code, a "loss_mask" is generated for the unlabeled loss, i.e., the consistency loss. Are all elements of "loss_mask" equal to 1?
Can you explain what it does?
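
For what it's worth, here is a small standalone check of the arithmetic above (my own sketch, not the repository's code): if both unlabeled masks are entirely 255, i.e. 1.0 after normalization, then the mixed loss mask is also all ones, whatever the CutMix mask is.

import torch

# Hypothetical shapes: batch of 2, one channel, 4x4 masks.
batch_um0 = torch.ones(2, 1, 4, 4)   # mask of the first unlabeled batch (255 -> 1.0 after normalization)
batch_um1 = torch.ones(2, 1, 4, 4)   # mask of the second unlabeled batch
batch_mix_masks = (torch.rand(2, 1, 4, 4) > 0.5).float()   # an arbitrary CutMix mask

batch_um_mixed = batch_um0 * (1 - batch_mix_masks) + batch_um1 * batch_mix_masks
assert torch.all(batch_um_mixed == 1.0)   # mixing two all-ones masks gives all ones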

Thanks in advance.

Confused about the "loss_mask"

Hi, really nice codebase!

We have tried to understand the details of your implementation and find the usage of the loss mask a little bit confusing (as shown below).

batch_ux0 = unsup_batch0['image'].to(torch_device)
batch_um0 = unsup_batch0['mask'].to(torch_device)
batch_ux1 = unsup_batch1['image'].to(torch_device)
batch_um1 = unsup_batch1['mask'].to(torch_device)

First, I have checked the code carefully and found that you define the 'mask' for the unlabeled images in the following code snippet:

if self.mask_flag:
    if self.pipeline_type == 'pil':
        sample0['mask_pil'] = Image.new('L', size_xy, 255)
    elif self.pipeline_type == 'cv':
        sample0['mask_arr'] = np.full(size_xy[::-1], 255, dtype=np.uint8)
    else:
        raise RuntimeError

Therefore, we can see that the initial 'mask' for every image is an array of 255s with the same size as the input image. Then you read the mask value as follows:

image, labels, mask, xf = sample0['image_arr'], sample0.get('labels_arr'), sample0.get('mask_arr'), sample0.get('xf_cv')

According to my understanding, some of the 'mask' values become zero only when we apply the cv2.warpAffine transform (as shown below):

if 'mask_arr' in sample0:
    sample0['mask_arr'] = cv2.warpAffine(sample0['mask_arr'], local_xf[0], self.crop_size[::-1], flags=interpolation, borderValue=0, borderMode=cv2.BORDER_CONSTANT)
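
To check this, I ran a small standalone experiment (my own code, not the repository's) showing that warping an all-255 mask with borderValue=0 zeroes exactly the pixels that fall outside the source image:

import cv2
import numpy as np

mask = np.full((8, 8), 255, dtype=np.uint8)    # all-255 mask, as in the dataset pipeline
shift = np.float32([[1, 0, 3], [0, 1, 0]])     # translate 3 pixels to the right
warped = cv2.warpAffine(mask, shift, (8, 8), flags=cv2.INTER_NEAREST,
                        borderValue=0, borderMode=cv2.BORDER_CONSTANT)

print(warped[0, :3])   # [0 0 0]        -> padding pixels introduced by the warp
print(warped[0, 3:])   # [255 ... 255]  -> pixels that still come from the source mask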

If my understanding is correct, the batch_um0 and batch_um1 might not be necessary if we do not apply the warpAffine transformation.

So my first question is about the influence of affine.cat_nx2x3 (shown below) on the final performance.

# Build affine transformation matrix
local_xf = affine.cat_nx2x3(
    affine.translation_matrices(self.crop_size_arr[None, ::-1] * 0.5),
    affine.rotation_matrices(rot_theta),
    affine.scale_matrices(scale_factor_yx[None, ::-1]),
    affine.translation_matrices(-centre[None, ::-1]),
)

So I guess the purpose of the loss mask is mainly to filter out the influence of the extra pixels introduced when applying the warpAffine augmentation. Is my understanding correct?
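
To make my understanding concrete, this is roughly how I picture the mask entering the consistency loss (a generic sketch of my own, not your exact implementation):

import torch
import torch.nn.functional as F

def masked_consistency_loss(student_logits, teacher_logits, loss_mask):
    """Mean per-pixel squared error between softmax outputs, ignoring masked-out pixels."""
    student_prob = F.softmax(student_logits, dim=1)
    teacher_prob = F.softmax(teacher_logits, dim=1)
    per_pixel = ((student_prob - teacher_prob) ** 2).mean(dim=1, keepdim=True)   # (N, 1, H, W)
    # Pixels where loss_mask == 0 (e.g. padding introduced by warpAffine) contribute nothing.
    return (per_pixel * loss_mask).sum() / loss_mask.sum().clamp(min=1.0)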

It would be great if you could point out any of my misunderstandings!

Questions about converting the Cityscapes dataset

Thanks for your excellent work and for kindly releasing the code. I have a problem when reproducing the Cityscapes experiment.

Following your tutorial, I have reproduced the experimental results: 44.37 vs. 51.73, a 6.79 improvement under the CutMix setting. When delving deeper into the code, I noticed that you implement a downsample_label_img function for downsampling the ground truth in convert_cityscapes.py. However, it is more common to use nearest-neighbor downsampling from cv2 or PIL, so I replaced the downsample_label_img call with the following:

y_img = cv2.resize(y_img, (1024, 512), interpolation=cv2.INTER_NEAREST)

and re-ran the Cityscapes experiment. The results degrade greatly: 43.79 vs. 47.02, an improvement of only 3.23.

I wonder why a different downsampling method affects the results so greatly, and what the motivation was for implementing a custom downsampling method rather than using a common one.
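
For reference, here is the kind of difference I have in mind (a sketch of my own; I am not claiming this is what downsample_label_img actually does): nearest-neighbor downsampling copies a single label per output pixel, whereas a majority-vote downsample over each 2x2 block can choose different labels near object boundaries.

import cv2
import numpy as np

def downsample_nearest(y_img):
    # Nearest-neighbor: each output pixel copies exactly one input label.
    return cv2.resize(y_img, (1024, 512), interpolation=cv2.INTER_NEAREST)

def downsample_majority(y_img, num_classes=34):
    # Hypothetical alternative for comparison: each output pixel takes the most
    # frequent label in the corresponding 2x2 block of the 2048x1024 input label image.
    h, w = y_img.shape
    blocks = y_img.reshape(h // 2, 2, w // 2, 2).transpose(0, 2, 1, 3).reshape(h // 2, w // 2, 4)
    counts = np.stack([(blocks == c).sum(axis=-1) for c in range(num_classes)], axis=-1)
    return counts.argmax(axis=-1).astype(y_img.dtype)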

Improvement due to loss function or data augmentation?

Hi, I went through the paper and had a tough time understanding the gist of it. Is it that:

  1. CutMix for segmentation images is improving the accuracy metrics, or
  2. the consistency loss function is improving your scores?

Put more naively, is the work about CutMix for segmentation (the original is for classification), or is it about the consistency loss?
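
For context, my rough mental model of the method is the following (a sketch of my own, not the repository's code; the names and the area_frac default are just illustrative): sample a rectangular mask for each pair of unlabeled images, paste one image into the other with it, and train the student on the mixed image to match the teacher's predictions mixed with the same mask.

import torch

def random_box_masks(batch_size, height, width, area_frac=0.5):
    """Sample one rectangular CutMix mask per image, covering roughly `area_frac` of its area."""
    masks = torch.zeros(batch_size, 1, height, width)
    box_h = int(round(height * area_frac ** 0.5))
    box_w = int(round(width * area_frac ** 0.5))
    for i in range(batch_size):
        top = torch.randint(0, height - box_h + 1, (1,)).item()
        left = torch.randint(0, width - box_w + 1, (1,)).item()
        masks[i, :, top:top + box_h, left:left + box_w] = 1.0
    return masks

# mixed_image  = img0 * (1 - mask) + img1 * mask
# mixed_target = teacher_pred0 * (1 - mask) + teacher_pred1 * mask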

Requesting pretrained models

Hi, I found your paper really insightful and have been trying to replicate the results. Thanks for the code and the elaborate explanations with training scripts. I was wondering if you could share the pretrained models for the Cityscapes dataset (Cutout, CutMix, and baseline models) for comparison and evaluation.
