
focusondepth's Introduction

Focus On Depth - A single DPT encoder for AutoFocus application and Dense Prediction Tasks


Abstract

Depth estimation is a classic computer-vision task of great significance for applications such as augmented reality, target tracking and autonomous driving. We first summarize deep learning models for monocular depth estimation. Second, we implement a recent Vision Transformer based architecture for this task and seek to improve it by adding a segmentation head, performing multi-task learning on a custom-built dataset. Third, we apply our model to in-the-wild images (i.e. without control over the environment, the distance and size of the objects of interest, or their physical properties such as rotation and dynamics) for an auto-focus application on humans, and give a qualitative comparison against other methods.

⚡ New! Web demo

You can check the web demo, hosted on Hugging Face and powered by Gradio, here.

📌 Requirements

Run: pip install -r requirements.txt

🚀 Running the model

You can first download one of the models from the model zoo:

๐Ÿฆ Model zoo

Get the links of the following models:

Put the .p file into the models/ directory. Then update config.json (tutorial here) to match the pre-trained model you have chosen to run the predictions: for example, if you load a depth-only model, set type to depth (see the minimal example below).
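As a minimal, hedged illustration (the exact nesting of keys in config.json may differ in your version; only the type field and its values full and depth are taken from this page):

    {
        "type": "depth"
    }

A checkpoint trained with both the depth and segmentation heads would use "type": "full" instead.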

🎯 Run a prediction

Put your input images (.png or .jpg) into the input/ folder, then run python run.py; the depth maps and segmentation masks will be written to the output/ folder.

🔨 Training

🔧 Build the dataset

Our model is trained on a combination of

📝 Configure config.json

Please refer to our config wiki to understand how to modify the config file to run a training.

🔩 Run the training script

After that, you can simply run the training script: python train.py

📜 Citations

Our work is based on the work of Ranftl et al., so please do not forget to cite their work! :) You can also check our report if you need more details.

@article{DPT,
  author    = {Ren{\'{e}} Ranftl and
               Alexey Bochkovskiy and
               Vladlen Koltun},
  title     = {Vision Transformers for Dense Prediction},
  journal   = {CoRR},
  volume    = {abs/2103.13413},
  year      = {2021},
  url       = {https://arxiv.org/abs/2103.13413},
  eprinttype = {arXiv},
  eprint    = {2103.13413},
  timestamp = {Wed, 07 Apr 2021 15:31:46 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2103-13413.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

focusondepth's People

Contributors

antocad, younesbelkada


focusondepth's Issues

vit_base_patch32_224 instead of vit_base_patch16_384

Hi,

Instead of vit_base_patch16_384, I'm trying to use vit_base_patch32_224 in the segmentation branch.
I changed

    "patch_size": 32

and

    "transforms": {
        "resize": 224,
        "p_flip": 0.5,
        "p_crop": 0.3,
        "p_rot": 0.2
    },

but it's crashing for some reason. Do you know what else needs to be changed in the config?

"compute_scale_and_shift" using median + MAD vs least square

"compute_scale_and_shift" can be found in Loss.py and is based on least square loss.
Given that MiDaS-v2 was trained using the median + MAD and MiDaS-v1 was trained using the least squares-based loss (https://arxiv.org/abs/1907.01341v1)

Have median + MAD been implemented yet?

def compute_scale_and_shift(prediction, target, mask):
    # system matrix: A = [[a_00, a_01], [a_10, a_11]]
    a_00 = torch.sum(mask * prediction * prediction, (1, 2))
    a_01 = torch.sum(mask * prediction, (1, 2))
    a_11 = torch.sum(mask, (1, 2))

    # right hand side: b = [b_0, b_1]
    b_0 = torch.sum(mask * prediction * target, (1, 2))
    b_1 = torch.sum(mask * target, (1, 2))

    # solution: x = A^-1 . b = [[a_11, -a_01], [-a_10, a_00]] / (a_00 * a_11 - a_01 * a_10) . b
    x_0 = torch.zeros_like(b_0)
    x_1 = torch.zeros_like(b_1)

    det = a_00 * a_11 - a_01 * a_01
    valid = det.nonzero()

    x_0[valid] = (a_11[valid] * b_0[valid] - a_01[valid] * b_1[valid]) / det[valid]
    x_1[valid] = (-a_01[valid] * b_0[valid] + a_00[valid] * b_1[valid]) / det[valid]

    return x_0, x_1

https://gist.github.com/dvdhfnr/732c26b61a0e63a0abc8a5d769dbebd0

https://arxiv.org/pdf/1907.01341v3.pdf
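For reference, a minimal sketch of the median + MAD (scale-and-shift-robust) normalisation described in the MiDaS paper (arXiv:1907.01341); this is not taken from the repository's Loss.py, and the function name is illustrative:

    import torch

    def normalize_prediction_robust(depth, mask):
        # depth, mask: (B, H, W); mask marks valid pixels
        valid = mask.bool()
        t = torch.zeros(depth.shape[0], device=depth.device)
        s = torch.ones(depth.shape[0], device=depth.device)
        for i in range(depth.shape[0]):
            d = depth[i][valid[i]]
            if d.numel() == 0:
                continue
            t[i] = torch.median(d)                  # translation t(d) = median
            s[i] = torch.mean(torch.abs(d - t[i]))  # scale s(d) = mean absolute deviation
        s = s.clamp(min=1e-6)
        return (depth - t.view(-1, 1, 1)) / s.view(-1, 1, 1)

Both the prediction and the target would be normalised this way before taking, e.g., an L1 difference over the valid pixels.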

How to specify validation dataset

Hello, thanks for the great code. I have a question about the dataset split.

  1. Does the split code generate the validation dataset by taking random samples in the ratio specified in config.splits?

  2. How can I specify the validation dataset itself instead of splitting the whole dataset? For example (one possible approach is sketched after this example):
    trainImages = /path/train/images/...png
    trainMasks = /path/train/masks/...png
    validationImages = /path/validation/images/...png
    validationMasks = /path/validation/masks/...png
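A minimal sketch of one way to do this; the FolderPairs class and the folder layout are hypothetical, not the repository's dataset code:

    import glob
    from PIL import Image
    from torch.utils.data import Dataset, DataLoader
    from torchvision.transforms.functional import to_tensor

    class FolderPairs(Dataset):
        """Image/mask pairs read from two parallel folders (hypothetical layout)."""
        def __init__(self, image_dir, mask_dir):
            self.images = sorted(glob.glob(f"{image_dir}/*.png"))
            self.masks = sorted(glob.glob(f"{mask_dir}/*.png"))

        def __len__(self):
            return len(self.images)

        def __getitem__(self, idx):
            # images may still need resizing to a common size before batching
            image = to_tensor(Image.open(self.images[idx]).convert("RGB"))
            mask = to_tensor(Image.open(self.masks[idx]))
            return image, mask

    train_set = FolderPairs("/path/train/images", "/path/train/masks")
    val_set = FolderPairs("/path/validation/images", "/path/validation/masks")
    train_loader = DataLoader(train_set, batch_size=8, shuffle=True)
    val_loader = DataLoader(val_set, batch_size=8, shuffle=False)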

Inverse Depth

Are you using inverse depth to do the training? If not, why?

changes to the patch embedding layer

Hi,
I am trying to make use of the DPT model; however, I would like to make some changes to the patch embedding layer. You have created the model using timm.create_model(), but you also have some commented-out code. Did you use that code instead of timm.create_model(), and did it work the same as the original model? Do you have any tips on how I can modify the model code?

No need for depth aspect

Hi,

I'm only interested in using the FOD model for semantic segmentation. How should I approach curating the depth aspect of the dataset? Should I just use the segmentation masks for both the depth and segmentation parts of the dataset? Or is there a way to remove that aspect of the output head?

About the datasets

Hi,
I am trying to download the datasets using the "view on Kaggle" link but it fails. How can I obtain the datasets?
Thanks for your help!

Training using SUNRGBD dataset

Hi,
Thank you very much for the training script. I am currently trying to train on the SUNRGBD dataset for both depth prediction and semantic segmentation. I have added multiple classes to the config file (chairs, floor, tables). While training, the program stopped execution due to NaN values after 3 epochs. I am using 'vit_base', batch size 8, and default values for all other parameters. I have the following doubts.

  1. Should I update/modify any code sections to train for multiple classes segmentation?
  2. I have passed the ground-truth depth values as-is. Should I convert them to meters?

Thanks in advance...

License

Hi, congratulations on the excellent work.

I am doing academic research that is funded by a public company. What is the license to use the code for this work?

Thanks for paying attention...

Questions about Midas Training

Hello,

First of all, I would like to say thank you for your work.
I'm going to retrain the MiDaS model. However, the MiDaS repository does not have training code.
Is it possible to train the MiDaS model using your training code?
Or have you ever trained the MiDaS model yourself?
The dataset is NYU v2.

Thank you

Training loss and log

Hi,

Thanks for the nice repo.
I wonder if you can provide the training log,
and whether you are going to use the median + MAE loss used in MiDaS-v2
(isl-org/MiDaS#28).

Thanks!

How to add IOU score?

Hi
I'm currently using the function below to calculate the IoU score (for binary segmentation), but it doesn't seem to give a consistent increase in IoU as the epochs increase. I was wondering if it is correct.

def iou_pytorch(outputs: torch.Tensor, labels: torch.Tensor):
    return torch.sum(outputs & labels)/torch.sum(outputs | labels)

Thanks
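For comparison, a hedged sketch of a binary IoU that first binarises the predictions and guards against an empty union (not the repository's metric; the threshold is illustrative):

    import torch

    def iou_binary(outputs: torch.Tensor, labels: torch.Tensor,
                   threshold: float = 0.5, eps: float = 1e-6) -> torch.Tensor:
        preds = outputs > threshold          # binarise raw probabilities / logits
        gts = labels > threshold             # make sure labels are boolean too
        intersection = (preds & gts).sum().float()
        union = (preds | gts).sum().float()
        return (intersection + eps) / (union + eps)

Bitwise & and | only behave as set intersection/union on boolean (or 0/1 integer) tensors, so thresholding before the reduction matters.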

Hi. I have a question.

Hi. I would like to run the model.p file I trained with FocusOnDepth and get its outputs.
What should I do?

I tried simply modifying the config file, but an error such as 'Missing key(s) in state_dict:' occurs.

Question about gradient loss

Hi,

I'm trying to modify the loss by adding a confidence weight for each pixel. I don't understand why you multiply the translated masks

    mask_x = torch.mul(mask[:, :, 1:], mask[:, :, :-1])
    grad_x = torch.mul(mask_x, grad_x)

to get the final mask in the gradient-matching term. Why can't we just truncate the mask to get the right shape, like grad_x = torch.mul(mask[:, :, :-1], grad_x)?
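For context, a minimal sketch of a MiDaS-style gradient-matching term along x (not the repository's exact Loss.py), showing what the shifted-mask product encodes:

    import torch

    def gradient_matching_x(prediction, target, mask):
        # prediction, target, mask: (B, H, W)
        diff = prediction - target
        grad_x = torch.abs(diff[:, :, 1:] - diff[:, :, :-1])
        # A horizontal gradient at column c uses pixels c and c+1, so it is only
        # valid when BOTH pixels are valid; multiplying the two shifted masks
        # encodes exactly that. Truncating the mask (mask[:, :, :-1]) would keep
        # gradients that touch an invalid right-hand neighbour.
        mask_x = torch.mul(mask[:, :, 1:], mask[:, :, :-1])
        grad_x = torch.mul(mask_x, grad_x)
        return torch.sum(grad_x) / torch.sum(mask_x).clamp(min=1)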

zipfile.BadZipFile: File is not a zip file

When I run python train.py, I get the following error.

Any advice on how to fix it?

    Traceback (most recent call last):
      File "C:\Users\hello\Desktop\git\FocusOnDepth\train.py", line 31, in <module>
        trainer = Trainer(config)
      File "C:\Users\hello\Desktop\git\FocusOnDepth\FOD\Trainer.py", line 24, in __init__
        self.model = FocusOnDepth(
      File "C:\Users\hello\Desktop\git\FocusOnDepth\FOD\FocusOnDepth.py", line 55, in __init__
        self.transformer_encoders = timm.create_model(model_timm, pretrained=True)
      File "C:\Users\hello\AppData\Local\Programs\Python\Python310\lib\site-packages\timm\models\factory.py", line 71, in create_model
        model = create_fn(pretrained=pretrained, pretrained_cfg=pretrained_cfg, **kwargs)
      File "C:\Users\hello\AppData\Local\Programs\Python\Python310\lib\site-packages\timm\models\vision_transformer.py", line 887, in vit_base_patch16_384
        model = _create_vision_transformer('vit_base_patch16_384', pretrained=pretrained, **model_kwargs)
      File "C:\Users\hello\AppData\Local\Programs\Python\Python310\lib\site-packages\timm\models\vision_transformer.py", line 786, in _create_vision_transformer
        model = build_model_with_cfg(
      File "C:\Users\hello\AppData\Local\Programs\Python\Python310\lib\site-packages\timm\models\helpers.py", line 549, in build_model_with_cfg
        load_custom_pretrained(model, pretrained_cfg=pretrained_cfg)
      File "C:\Users\hello\AppData\Local\Programs\Python\Python310\lib\site-packages\timm\models\helpers.py", line 192, in load_custom_pretrained
        model.load_pretrained(pretrained_loc)
      File "C:\Users\hello\AppData\Local\Programs\Python\Python310\lib\site-packages\timm\models\vision_transformer.py", line 488, in load_pretrained
        _load_weights(self, checkpoint_path, prefix)
      File "C:\Users\hello\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "C:\Users\hello\AppData\Local\Programs\Python\Python310\lib\site-packages\timm\models\vision_transformer.py", line 624, in _load_weights
        w = np.load(checkpoint_path)
      File "C:\Users\hello\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\lib\npyio.py", line 422, in load
        ret = NpzFile(fid, own_fid=own_fid, allow_pickle=allow_pickle,
      File "C:\Users\hello\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\lib\npyio.py", line 178, in __init__
        _zip = zipfile_factory(fid)
      File "C:\Users\hello\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\lib\npyio.py", line 101, in zipfile_factory
        return zipfile.ZipFile(file, *args, **kwargs)
      File "C:\Users\hello\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1267, in __init__
        self._RealGetContents()
      File "C:\Users\hello\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1334, in _RealGetContents
        raise BadZipFile("File is not a zip file")
    zipfile.BadZipFile: File is not a zip file

Error Loading my own trained model.

Hi, thanks for your work. I am trying to train the model and test it on my own datasets. The model trains, and after training completes I try to test it using the run.py script, but it gives me this error:

    File "C:\ProgramData\Anaconda3\envs\d3net\lib\site-packages\timm\models\factory.py", line 67, in create_model
      raise RuntimeError('Unknown model (%s)' % model_name)
    RuntimeError: Unknown model (1)

The model is saved in the models/ directory and I set pretrain=False as well.

about accuracy

Thanks for the training code for DPT.
I am wondering about the accuracy and metrics of the trained model: is its performance similar to that of the original paper's code?

expected depth type ?

Hi,
I adapted your code to train on my own dataset in depth mode as an alternative to DPT, but the loss isn't decreasing, and I found that the output depth map is all zeros, which is weird. I found

output_depth = 1-output_depth

in FOD/Predictor.py, while the statistics of the depth maps in my dataset are as follows (max, min, mean):

77.9375 0.0 1.6585418
82.8125 0.0 10.568505
97.9375 0.0 7.066457
1620.0 0.0 28.10622
203.375 0.0 6.32571
15.890625 0.0 1.8040103
80.625 0.0 11.406916
516.0 0.0 12.47693
5052.0 -3672.0 18.6016
48.75 0.0 8.396343
9136.0 0.0 21.534061
347.5 0.0 8.561709
113.5625 0.0 13.559731
40.875 0.0 9.513472
37.15625 0.0 4.9919477
99.0625 0.0 5.4938717
30.890625 0.0 0.77849835
67.625 0.0 8.707108
42.0 0.0 8.621763
106.1875 0.0 11.275524
74.1875 0.0 7.9050064
92.0 0.0 15.986496

So I suspect that the expected depth type is different from that of my dataset? What should the correct depth type be, e.g. the value range?

thanks
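As general context, MiDaS/DPT-style relative-depth models are usually trained on normalised inverse depth (disparity) rather than raw metric depth. A hedged sketch of such a conversion (not the repository's preprocessing; the function name is illustrative):

    import numpy as np

    def depth_to_normalized_disparity(depth, eps=1e-6):
        depth = np.asarray(depth, dtype=np.float32)
        valid = depth > 0                     # zero / negative values treated as invalid
        disp = np.zeros_like(depth)
        disp[valid] = 1.0 / depth[valid]      # inverse depth (disparity)
        if valid.any():
            d_min, d_max = disp[valid].min(), disp[valid].max()
            disp[valid] = (disp[valid] - d_min) / max(d_max - d_min, eps)
        return disp, valid                    # disparity in [0, 1] plus a validity mask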

Unexpected key(s) in state_dict when type="depth"

I cannot change the type to depth in the config; it does not work and I get this error:

    RuntimeError: Error(s) in loading state_dict for FocusOnDepth:
    Unexpected key(s) in state_dict: "head_segmentation.head.0.weight", "head_segmentation.head.0.bias", "head_segmentation.head.2.weight", "head_segmentation.head.2.bias", "head_segmentation.head.4.weight", "head_segmentation.head.4.bias".

Why does it only work with type full?

A question about depth estimation accuracy.

Could you provide a pre-trained model? Or, if we train the network from scratch, can we finally achieve a depth estimation result comparable to DPT? Looking forward to your reply, thank you!
