
multimae's People

Contributors

amir32002, dmizr, roman-bachmann


multimae's Issues

Making the 'mask_valid' folder for evaluation

Hello!
I just want to set up this nice work quickly, but I ran into a problem with inference.
I used 'run_finetuning_depth.py' to check that the code runs, and followed 'SETUP.md' to download the NYUv2 dataset and structure the folders.
But I got a not-found error for the 'mask_valid' folder.
How can I create the 'mask_valid' folder for evaluation only?
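
A minimal sketch of one way to generate such masks, assuming every pixel with a depth reading is valid and that the loader expects one 8-bit PNG per image with 255 marking valid pixels (the 255 convention is suggested by the transform shown in a later issue on this page; the paths and the class subfolder are assumptions to verify against the repo's data-loading code):

import os
import numpy as np
from PIL import Image

depth_dir = "nyu/train/depth/all"       # hypothetical path; adjust to your layout
mask_dir = "nyu/train/mask_valid/all"   # hypothetical output path
os.makedirs(mask_dir, exist_ok=True)

for fname in os.listdir(depth_dir):
    depth = np.array(Image.open(os.path.join(depth_dir, fname)))
    # Mark a pixel as valid (255) wherever a depth reading exists, 0 elsewhere.
    mask = np.where(depth > 0, 255, 0).astype(np.uint8)
    out_name = os.path.splitext(fname)[0] + ".png"
    Image.fromarray(mask).save(os.path.join(mask_dir, out_name))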

How to download and prepare NYUv2

Thanks for releasing this code.

I'm trying to reproduce fine-tuning on NYUv2, but the downloaded data doesn't match the extensions in IMG_EXTENSIONS.

I need to know what preprocessing I should apply.

"mask_valid" parameter

Hi! Thanks for the great work! 💯
I am a bit confused about this parameter in the yaml file:
use_mask_valid: True # Requires "task" mask_valid to be saved to disk

When I set it to False, I get the following error while fine-tuning a ViT-based model on the NYU dataset:

Original Traceback (most recent call last):
  File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/gscratch/sciencehub/kmarathe/miniconda3/envs/py38/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/mmfs1/gscratch/sciencehub/kmarathe/models/SSL/MAE/mae/evals/MultiMAE/utils/dataset_folder.py", line 307, in __getitem__
    sample_dict = self.transform(sample_dict)
  File "/mmfs1/gscratch/sciencehub/kmarathe/models/SSL/MAE/mae/evals/MultiMAE/utils/dataset_regression.py", line 120, in __call__
    task_dict['mask_valid'] = (task_dict['mask_valid'] == 255)[None]

KeyError: 'mask_valid'


When I set it to True, I get the following. I used the provided script to create the NYU dataset; it saved the train/test data and the depth maps, but there was no folder called mask_valid.


RuntimeError: Found 0 logs in subfolders of: /gscratch/sciencehub/vision_datasets/NYU/train/mask_valid
Supported extensions are: .jpg,.jpeg,.png,.ppm,.bmp,.pgm,.tif,.tiff,.webp,.jpx

I am definitely missing something and any help here is greatly appreciated.

Thanks and regards,
Kalyani

Depth and semantic segmentation finetuning using a frozen encoder

Hello! Thanks again for the great work!
I have a question about the following part of the fine-tuning code --

Why is there this raise NotImplementedError in the following file? https://github.com/EPFL-VILAB/MultiMAE/blob/main/run_finetuning_depth.py

If we want the numbers with a frozen transformer, should we just comment out the error and run?
Is there anything else that needs to be done? I just wanted to make sure.

Thanks for your help,
Kalyani

    # Optionally freeze the encoder
    if args.freeze_transformer:
        raise NotImplementedError
        for param in model.encoder.parameters():
            param.requires_grad = False
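
For reference, a minimal sketch of what removing the raise might look like; whether this matches the protocol behind any reported frozen-encoder numbers is not confirmed here:

    # Optionally freeze the encoder (raise removed for experimentation)
    if args.freeze_transformer:
        for param in model.encoder.parameters():
            param.requires_grad = False
        # Input and output adapters stay trainable; check that this is the intended setup.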

Linear probing results

Hey,
Thank you for providing the code for the paper. The paper is really interesting and the project page is very well done!

I was wondering whether you've tested the performance of linear probing on RGB images when trained with all three modalities.
The linear-probing results in the original MAE paper were not very good; it would be interesting to understand whether the additional supervision creates better representations that translate into better linear-probing scores.

Thanks,
Eliahu

Missing NYU normals data

Hi! It seems like the link to the NYU depth normals data https://cs.nyu.edu/~deigen/dnl/normals_gt.tgz is down. Is there an alternative mirror where we could get this data, or is it possible to run the code without it? I tried getting it from the Wayback Machine, but I get some pre-processing errors in DataAugmentationForRegression when I try to transform the masks, which leads me to believe it's a data issue. Thanks!

cv2.error: OpenCV(4.7.0) :-1: error: (-5:Bad argument) in function 'resize'
> Overload resolution failed:
>  - src data type = 0 is not supported
>  - Expected Ptr<cv::UMat> for argument 'src'
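
One thing worth checking (an assumption, not a confirmed diagnosis): cv2.resize raises exactly this "src data type = 0 is not supported" error when handed a boolean numpy array, which is what a validity mask becomes after a comparison like == 255. Casting around the resize avoids it:

import cv2
import numpy as np

mask = np.zeros((480, 640), dtype=bool)  # stand-in for a loaded validity mask
# cv2.resize rejects boolean arrays, so resize as uint8 and cast back to bool.
resized = cv2.resize(mask.astype(np.uint8), (224, 224), interpolation=cv2.INTER_NEAREST)
mask_resized = resized.astype(bool)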

Example usage of regular MAE Weights

Hey, awesome work! I am trying to figure out how to modify the demo notebook to use the regular MAE instead of MultiMAE. In particular, I comment out all of the depth and semseg info, but the resulting image infilling looks corrupted. Could you by chance share an example of proper usage of the regular MAE weights? Thanks so much for the help!

Reproduction of NYU Depth results gets different numbers?

Hello! I'm trying to reproduce the NYU depth results but I'm getting much higher delta_1's than shown in the paper (and potentially elsewhere). As the attached screenshot shows, I've run it with three different backbones (DeiT from timm, MAE, and MultiMAE pre-trained weights) and gotten a best delta_1 of around 89. I followed the NYU dataset preparation steps, but it seems something may have gone wrong. Any tips would be appreciated. Perhaps the masking is incorrect?

[Screenshot: delta_1 results for the three backbones]
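
For reference, the usual definition of delta_1 is the fraction of valid pixels where max(pred/gt, gt/pred) < 1.25; a minimal sketch is below (how the repo clips, aligns, or masks depth before computing it would still need checking):

import torch

def delta1(pred: torch.Tensor, gt: torch.Tensor, valid: torch.Tensor) -> float:
    # Fraction of valid pixels where max(pred/gt, gt/pred) < 1.25.
    pred, gt = pred[valid], gt[valid]
    ratio = torch.maximum(pred / gt, gt / pred)
    return (ratio < 1.25).float().mean().item()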

Input as RGBD

Hello, first of all thank you for your work; it was a very interesting paper, and releasing the code is much appreciated (it is well coded too).
I was wondering if you experimented at some point with RGB-D input, where you would just concatenate the depth channel?
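
Not part of the MultiMAE API, but as a rough sketch of what "just concatenate the depth channel" could look like at the patch-embedding level (a plain 4-channel ViT-style projection; all names and sizes here are illustrative):

import torch
import torch.nn as nn

# Hypothetical RGB-D patch embedding: 4 input channels instead of 3, ViT-Base width.
patch_embed = nn.Conv2d(in_channels=4, out_channels=768, kernel_size=16, stride=16)

rgb = torch.randn(2, 3, 224, 224)
depth = torch.randn(2, 1, 224, 224)
rgbd = torch.cat([rgb, depth], dim=1)                  # [B, 4, 224, 224]
tokens = patch_embed(rgbd).flatten(2).transpose(1, 2)  # [B, 196, 768]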

ADE20K dataset structure for semantic segmentation

Hi,

First of all, thanks for your amazing work!

We're trying to reproduce the paper results and stumbled over how to set up semantic segmentation finetuning with ADE20K. To our surprise, the data loader seems to expect the same root/task_a/class_x/xxx.ext folder hierarchy known from classification. However, as the images naturally contain more than a single semantic class, we're not sure how the images are supposed to be arranged.

Could you hence give us a hint on how the data should be structured to work with the provided ft_ade_64e_multimae-b_rgb.yaml configuration?

Thank you,
Paul

Facing issues pretraining the code on a custom dataset

Hi,

I am trying to pretrain the code on the Celeb-HQ dataset, and I successfully created the corresponding grayscale depth maps (PNG) and grayscale segmentation maps (PNG) for pretraining.
However, when I try to train with "OMP_NUM_THREADS=1 torchrun --nproc_per_node=8 run_pretraining_multimae.py --config cfgs/pretrain/multimae-b_98_rgb+-depth-semseg_1600e.yaml --data_path /home/gargatik/gargatik/Datasets/copy/multimae/train"

I am facing the issue below:


Start_______________________

../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [92,0,0], thread: [30,0,0] Assertion srcIndex < srcSelectDimSize failed.
../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [92,0,0], thread: [31,0,0] Assertion srcIndex < srcSelectDimSize failed.
Traceback (most recent call last):
  File "run_pretraining_multimae.py", line 585, in <module>
    main(opts)
  File "run_pretraining_multimae.py", line 414, in main
    train_stats = train_one_epoch(
  File "run_pretraining_multimae.py", line 501, in train_one_epoch
    preds, masks = model(
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/train-data-3-ssd/gargatik/inpaint_proj/MultiMAE/multimae/multimae.py", line 312, in forward
    input_task_tokens = {
  File "/mnt/train-data-3-ssd/gargatik/inpaint_proj/MultiMAE/multimae/multimae.py", line 313, in <dictcomp>
    domain: self.input_adapters[domain](tensor)
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/train-data-3-ssd/gargatik/inpaint_proj/MultiMAE/multimae/input_adapters.py", line 232, in forward
    x_patch = rearrange(self.proj(x), 'b d nh nw -> b (nh nw) d')
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from createEvent at ../aten/src/ATen/cuda/CUDAEvent.h:166 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f1c685031ee in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: + 0xf3c2d (0x7f1caad91c2d in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #2: + 0xf6f6e (0x7f1caad94f6e in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: + 0x463418 (0x7f1cba0f6418 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::release_resources() + 0x175 (0x7f1c684ea7a5 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #5: + 0x35f2f5 (0x7f1cb9ff22f5 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x679288 (0x7f1cba30c288 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: THPVariable_subclass_dealloc(_object*) + 0x2d5 (0x7f1cba30c655 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5ccad3]
frame #9: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5d270c]
frame #10: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5ec780]
frame #11: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5441f8]
frame #12: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x54424a]
frame #13: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x54424a]
frame #14: PyDict_SetItemString + 0x536 (0x5d1686 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python)
frame #15: PyImport_Cleanup + 0x79 (0x684619 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python)
frame #16: Py_FinalizeEx + 0x7f (0x67f8af in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python)
frame #17: Py_RunMain + 0x32d (0x6b70fd in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python)
frame #18: Py_BytesMain + 0x2d (0x6b736d in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python)
frame #19: __libc_start_main + 0xf3 (0x7f1cd8fc10b3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #20: _start + 0x2e (0x5fa5ce in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python)

Traceback (most recent call last):
  File "run_pretraining_multimae.py", line 585, in <module>
    main(opts)
  File "run_pretraining_multimae.py", line 414, in main
    train_stats = train_one_epoch(
  File "run_pretraining_multimae.py", line 501, in train_one_epoch
    preds, masks = model(
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1008, in forward
    output = self._run_ddp_forward(*inputs, **kwargs)
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 969, in _run_ddp_forward
    return module_to_run(*inputs[0], **kwargs[0])
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/train-data-3-ssd/gargatik/inpaint_proj/MultiMAE/multimae/multimae.py", line 312, in forward
    input_task_tokens = {
  File "/mnt/train-data-3-ssd/gargatik/inpaint_proj/MultiMAE/multimae/multimae.py", line 313, in <dictcomp>
    domain: self.input_adapters[domain](tensor)
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/train-data-3-ssd/gargatik/inpaint_proj/MultiMAE/multimae/input_adapters.py", line 232, in forward
    x_patch = rearrange(self.proj(x), 'b d nh nw -> b (nh nw) d')
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.

import torch
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.benchmark = True
torch.backends.cudnn.deterministic = False
torch.backends.cudnn.allow_tf32 = True
data = torch.randn([256, 64, 56, 56], dtype=torch.half, device='cuda', requires_grad=True).to(memory_format=torch.channels_last)
net = torch.nn.Conv2d(64, 768, kernel_size=[4, 4], padding=[0, 0], stride=[4, 4], dilation=[1, 1], groups=1)
net = net.cuda().half().to(memory_format=torch.channels_last)
out = net(data)
out.backward(torch.randn_like(out))
torch.cuda.synchronize()

ConvolutionParams
data_type = CUDNN_DATA_HALF
padding = [0, 0, 0]
stride = [4, 4, 0]
dilation = [1, 1, 0]
groups = 1
deterministic = false
allow_tf32 = true
input: TensorDescriptor 0xc853ff10
type = CUDNN_DATA_HALF
nbDims = 4
dimA = 256, 64, 56, 56,
strideA = 200704, 1, 3584, 64,
output: TensorDescriptor 0xc8540270
type = CUDNN_DATA_HALF
nbDims = 4
dimA = 256, 768, 14, 14,
strideA = 150528, 1, 10752, 768,
weight: FilterDescriptor 0x819c34f0
type = CUDNN_DATA_HALF
tensor_format = CUDNN_TENSOR_NHWC
nbDims = 4
dimA = 768, 64, 4, 4,
Pointer addresses:
input: 0x7f12aa000000
output: 0x7f12ca000000
weight: 0x7f13d9200c00

terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from createEvent at ../aten/src/ATen/cuda/CUDAEvent.h:166 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f147c2811ee in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: + 0xf3c2d (0x7f14beb0fc2d in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #2: + 0xf6f6e (0x7f14beb12f6e in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #3: + 0x463418 (0x7f14cde74418 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #4: c10::TensorImpl::release_resources() + 0x175 (0x7f147c2687a5 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #5: + 0x35f2f5 (0x7f14cdd702f5 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: + 0x679288 (0x7f14ce08a288 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #7: THPVariable_subclass_dealloc(_object*) + 0x2d5 (0x7f14ce08a655 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #8: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5ccad3]
frame #9: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5d270c]
frame #10: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5ec780]
frame #11: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x5441f8]
frame #12: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x54424a]
frame #13: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python() [0x54424a]
frame #14: PyDict_SetItemString + 0x536 (0x5d1686 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python)
frame #15: PyImport_Cleanup + 0x79 (0x684619 in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python)
frame #16: Py_FinalizeEx + 0x7f (0x67f8af in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python)
frame #17: Py_RunMain + 0x32d (0x6b70fd in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python)
frame #18: Py_BytesMain + 0x2d (0x6b736d in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python)
frame #19: __libc_start_main + 0xf3 (0x7f14ecd3f0b3 in /lib/x86_64-linux-gnu/libc.so.6)
frame #20: _start + 0x2e (0x5fa5ce in /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python)

WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 182198 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 182199 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 182200 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 182202 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 182203 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 182204 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 182205 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 3 (pid: 182201) of binary: /mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/python
Traceback (most recent call last):
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/mnt/train-data-3-ssd/gargatik/virtual_env/multimae/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

run_pretraining_multimae.py FAILED

Failures:
<NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
time : 2022-07-09_16:11:15
host : Norwalk
rank : 3 (local_rank: 3)
exitcode : -6 (pid: 182201)
error_file: <N/A>
traceback : Signal 6 (SIGABRT) received by PID 182201


END______________

Thanks for the help
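
One possible lead (an assumption, not a confirmed diagnosis): the indexSelectLargeIndex assertion at the top of the log usually means an index, such as a semantic-segmentation class id, is larger than the embedding table it is looked up in. A quick sanity check over the pseudo-label PNGs might look like this (the path and the expected class count are placeholders):

import os
import numpy as np
from PIL import Image

semseg_dir = "train/semseg/all"   # hypothetical path to the pseudo-label PNGs
num_classes = 133                 # placeholder: use whatever class count the semseg adapter expects

max_label = 0
for fname in os.listdir(semseg_dir):
    labels = np.array(Image.open(os.path.join(semseg_dir, fname)))
    max_label = max(max_label, int(labels.max()))
print("max class id found:", max_label)
if max_label >= num_classes:
    print("labels exceed the class-embedding size; this would trigger the device-side assert")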

Running time problem

Sorry to bother you. I'm new to deep learning and I'm trying to run MultiMAE to learn more about model pretraining. I ran run_pretraining_multimae.py on the ImageNet-1k dataset with 8 3090 GPUs, but it takes 2h48min per epoch and I don't know why.

  • I already replaced Pillow with Pillow-SIMD.
    Thank you if you could help me.

Fine-Tuning guide for additional tasks

Hello, thank you for the great work. The paper and the code look very well-thought-out. I wonder if you have any guide or list of changes required for fine-tuning for an additional task. If you have such a thing, it would be a lifesaver. The repository is quite large, and it is very easy to miss things that are required.

Thanks in advance for your help.

Add web demo/model to Hugging Face

Hi, would you be interested in adding MultiMAE to Hugging Face? The Hub offers free hosting, and it would make your work more accessible and visible to the rest of the ML community. Models, datasets, and Spaces (web demos) can be added to a user account or organization, similar to GitHub.

Example from other organizations:
Keras: https://huggingface.co/keras-io
Microsoft: https://huggingface.co/microsoft
Facebook: https://huggingface.co/facebook

Example spaces with repos:
github: https://github.com/salesforce/BLIP
Spaces: https://huggingface.co/spaces/salesforce/BLIP

github: https://github.com/facebookresearch/omnivore
Spaces: https://huggingface.co/spaces/akhaliq/omnivore

and here are guides for adding spaces/models/datasets to your org

How to add a Space: https://huggingface.co/blog/gradio-spaces
how to add models: https://huggingface.co/docs/hub/adding-a-model
uploading a dataset: https://huggingface.co/docs/datasets/upload_dataset.html

Please let us know if you would be interested and if you have any questions, we can also help with the technical implementation.

Evaluation only on 100 images?

We were looking into the depth finetuning pipeline and realized that the eval performance is based only on 100 images from the whole NYU val dataset. I am wondering why this design decision was made. Am I missing something?
Thanks,
Kalyani

Query about data preparation for finetuning for nyuv2-depth

Hi,
I think the following correction holds:
Line 357 in https://github.com/EPFL-VILAB/MultiMAE/blob/main/run_finetuning_depth.py should be
dataset_train = build_regression_dataset(args, data_path=args.train_data_path, transform=train_transform)
instead of
dataset_train = build_regression_dataset(args, data_path=args.data_path, transform=train_transform)
or the argument train_data_path should be changed to data_path.

Apart from that, I am trying to recreate your results on NYUv2 depth, but the dataset preparation instructions in SETUP are not clear.
Regarding the folder structure, where should the ground truth be when fine-tuning for depth and when evaluating? And is mask_valid needed for fine-tuning? I get:
RuntimeError: Found 0 logs in subfolders of: /tmp-network/user/varora/multimae/multimae_data/train/mask_valid

Query about semseg domain in pre-training

Hi, I have successfully made the pseudo labels and trained an 'rgb' in/out-domain MultiMAE model.

But when I trained a model with 'rgb-semseg' in/out-domains, I hit an error at multimae/input_adapters.py line 232:

# Create patches [B, C, H, W] -> [B, (H*W), C]
x_patch = rearrange(self.proj(x), 'b d nh nw -> b (nh nw) d')

The full log is in log.txt.
x.size() is [batch_size, 64, 56, 56] before line 232.
I can't figure out what's wrong.

What's more, I don't know why the pseudo semseg label image is resized to 1/4 (that is, 224*224 -> 56*56) in utils/datasets.py line 105:

# Convert to Tensor
for task in task_dict:
    if task in ['depth']:
        img = torch.Tensor(np.array(task_dict[task]) / 2 ** 16)
        img = img.unsqueeze(0)  # 1 x H x W
    elif task in ['rgb']:
        img = TF.to_tensor(task_dict[task])
        img = TF.normalize(img, mean=self.rgb_mean, std=self.rgb_std)
    elif task in ['semseg', 'semseg_coco']:
        # TODO: add this to a config instead
        # Rescale to 0.25x size (stride 4)
        scale_factor = 0.25
        img = task_dict[task].resize((int(self.input_size * scale_factor), int(self.input_size * scale_factor)))
        # Using pil_to_tensor keeps it in uint8, to_tensor converts it to float (rescaled to [0, 1])
        img = TF.pil_to_tensor(img).to(torch.long).squeeze(0)

and then projected with nn.Conv2d in multimae/input_adapters.py line 198:

if self.interpolate_class_emb:
    self.proj = nn.Sequential(
        nn.Upsample(scale_factor=(1 / self.P_H, 1 / self.P_W),
                    mode='bilinear'),  # Actually a downsample operation
        nn.Conv2d(in_channels=self.dim_class_emb, out_channels=self.dim_tokens,
                    kernel_size=1, stride=1),
    )
else:
    self.proj = nn.Conv2d(
        in_channels=self.dim_class_emb, out_channels=self.dim_tokens,
        kernel_size=(self.P_H, self.P_W), stride=(self.P_H, self.P_W)
    )

Thank you for any help.
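
On the second question, the downscaling and the stride-4 convolution are two halves of the same choice: the semseg map is stored at 1/4 resolution (224 -> 56), and the input adapter then applies a 4x4, stride-4 convolution over the 64-dim class embeddings, so each token still covers a 16x16 region of the original image, matching the RGB patch grid. A quick shape check under those numbers (the tensor sizes come from the logs quoted above; nothing else is taken from the repo):

import torch
import torch.nn as nn
from einops import rearrange

# 64-dim class embeddings on a 56x56 map (224 / 4), projected by a 4x4, stride-4 conv.
proj = nn.Conv2d(in_channels=64, out_channels=768, kernel_size=4, stride=4)
x = torch.randn(2, 64, 56, 56)
x_patch = rearrange(proj(x), 'b d nh nw -> b (nh nw) d')
print(x_patch.shape)  # torch.Size([2, 196, 768]): a 14x14 token grid, same as 16-pixel RGB patches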

Is it normal to see this during finetuning?

_IncompatibleKeys(missing_keys=['output_adapters.cls.norm.weight', 'output_adapters.cls.norm.bias', 'output_adapters.cls.head.weight', 'output_adapters.cls.head.bias'], unexpected_keys=[])

It could be happening because of deleting the output adapter.

@dmizr thank you for replying to previous issues.

Query about the dataset preparation for other dataset

Hi,

First of all, thanks for your amazing work!

I'm trying to use the Potsdam dataset with MultiMAE. My dataset structure is like this:

/path/to/potsdam/
├── train/
│   ├── rgb/
│   │   └── all/
│   │       ├── 1.jpg
│   │       ├── 2.jpg
│   │       └── ...
│   ├── semseg/
│   │   └── all/
│   │       ├── 1.png
│   │       └── 2.png
│   └── depth/
│       └── all/
│           ├── 1.tif
│           └── 2.tif
└── val/
    ├── rgb/
    │   └── all/
    │       ├── 1.jpg
    │       ├── 2.jpg
    │       └── ...
    ├── semseg/
    │   └── all/
    │       ├── 1.png
    │       ├── 2.png
    │       └── ...
    └── depth/
        └── all/
            ├── 1.tif
            └── 2.tif

My yaml file looks like this:

# NYU semseg config

# Finetune from:

finetune: 'D:/graduate study/MultiMAE-main/multimae-b_98_rgb+-depth-semseg_1600e_multivit-afff3f8c.pth' # Change me

# Input tasks

in_domains: rgb-depth
decoder_main_tasks: rgb # Can also be changed to rgb-depth
use_mask_valid: False # Requires "task" mask_valid to be saved to disk

# Architecture

model: multivit_base
patch_size: 16
num_global_tokens: 1
drop_path_encoder: 0.1
output_adapter: convnext
decoder_dim: 6144
decoder_preds_per_patch: 16
decoder_depth: 4

# Train

epochs: 1
opt: adamw
lr: 0.0001 # = 1e-4
warmup_lr: 0.000001 # = 1e-6
min_lr: 0.
warmup_epochs: 1
batch_size: 2
input_size: 512
layer_decay: 0.75

# Augmentation

aug_name: simple

# Data info

data_path: 'D:/datasets/satellite-dataset/potsdam_done/train' # Change me
eval_data_path: 'D:/datasets/satellite-dataset/potsdam_done/val' # Change me
num_classes: 150  
dataset_name: potsdam
dist_eval: True
seg_reduce_zero_label: True
eval_freq: 20

# Misc.

find_unused_params: False

# Wandb and logging

log_wandb: False # Set to True to log to Weights & Biases
wandb_project: 'multimae-finetune-semseg'
wandb_entity: null # Change if needed
wandb_run_name: 'ft_nyu_200e_multimae-b_rgb-depth'
log_images_wandb: True
log_images_freq: 20
output_dir: 'output/finetune/semseg/nyu/ft_nyu_200e_multimae-b_rgb-depth'

But when I run run_finetuning_semseg.py, I get this error:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

I want to know if there is something wrong with my dataset structure or if my yaml file is configured incorrectly.

Best, David

Problem when evaluating the pretrained model

Problem

Hi! I encountered a problem while just trying to evaluate this model with the same config as the Colab demo.

Environment

Ubuntu 22.04
CUDA Kernel 10.1
CUDA Runtime 11.3
Pytorch 1.12.0

Terminal

/opt/conda/conda-bld/pytorch_1656352645774/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [6825,0,0], thread: [0,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
(the same assertion is repeated for threads [1,0,0] through [31,0,0])

Traceback (most recent call last):
  File "/home/jxr/anaconda3/envs/python/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/jxr/anaconda3/envs/python/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/jxr/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/jxr/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/jxr/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/jxr/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/jxr/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/jxr/.vscode-server/extensions/ms-python.python-2022.12.0/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/home/jxr/3D-MultiMAE/MultiMAE/try_model.py", line 118, in <module>
    preds, masks = multimae.forward(
  File "/home/jxr/3D-MultiMAE/MultiMAE/multimae/multimae.py", line 350, in forward
    encoder_tokens = self.encoder(input_tokens)
  File "/home/jxr/anaconda3/envs/python/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jxr/anaconda3/envs/python/lib/python3.10/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/jxr/anaconda3/envs/python/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jxr/3D-MultiMAE/MultiMAE/multimae/multimae_utils.py", line 230, in forward
    x = x + self.drop_path(self.attn(self.norm1(x)))
  File "/home/jxr/anaconda3/envs/python/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/jxr/3D-MultiMAE/MultiMAE/multimae/multimae_utils.py", line 175, in forward
    attn = (q @ k.transpose(-2, -1)) * self.scale
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Colab error

Hello,

Thank you for the code. However, I am not able to run the following in Google Colab:
!wget https://drive.switch.ch/index.php/s/RFfTZwyKROKKx0l/download
Can you please recheck?

Thanks

About run_finetuning_semseg.py

Hi, I find that in MultiMAE/run_finetuning_semseg.py, line 735 is:

seg_pred_argmax = seg_pred[:num_classes].argmax(dim=1) 

I think it should be

seg_pred_argmax = seg_pred[:,:num_classes,:,:].argmax(dim=1) 
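
A tiny illustration of the difference between the two expressions (the shapes are hypothetical; whether the slice is needed at all depends on how many channels the adapter outputs):

import torch

seg_pred = torch.randn(8, 151, 64, 64)   # hypothetical [B, C, H, W] logits
print(seg_pred[:150].shape)              # torch.Size([8, 151, 64, 64]) -- slices the batch dimension
print(seg_pred[:, :150].shape)           # torch.Size([8, 150, 64, 64]) -- slices the class dimension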

Asking for the pretrained model

Hi! I don't have enough GPU resources to train the model. Where can I download your pretrained model? I am looking forward to your reply.

Query regarding the output adapter heads

Hi,
Thank you for the interesting work and the extensive experiments. In the paper, your depth results are based on the DPT head, while in the Colab you use the spatial adapter head for inference. I was wondering whether your fine-tuning results with the spatial adapter head were better or worse than with the DPT head. Was the intention of implementing this spatial head more to test a purely transformer-based head (compared to DPT's convolution-based, RefineNet-like approach)?

Thank you.

Some doubts about pseudo labels

Hi, I am pseudo-labeling ImageNet-1k and encountering some difficulties.

Firstly, what would happen if there are more than 255 semseg classes? How can a single-channel PNG image represent them? (Although the COCO dataset has only 80 classes, ImageNet has more than 255 classes when fine-tuning.)

Secondly, in the Colab notebook example, the rgb2depth DPT model cannot take ImageNet pictures of arbitrary size. How can we save all the pseudo labels before the data augmentation crops them to 224*224? We need to keep the original images aligned with the pseudo-labeled images, don't we?

Thank you for any help.
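
On the first question: an 8-bit single-channel PNG does top out at 255 class ids, but PNG also supports 16-bit grayscale, so one workaround (not something this repo is confirmed to support; its semseg loader may assume uint8) is to store the ids as uint16:

import cv2
import numpy as np

# Hypothetical pseudo-label map with class ids above 255.
labels = np.random.randint(0, 1000, size=(224, 224), dtype=np.uint16)

cv2.imwrite("semseg_label.png", labels)                           # written as a 16-bit grayscale PNG
restored = cv2.imread("semseg_label.png", cv2.IMREAD_UNCHANGED)   # read back without 8-bit conversion
assert restored.dtype == np.uint16 and (restored == labels).all()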
