
lite-mono's Introduction

Lite-Mono

A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation [paper link]

Ning Zhang*, Francesco Nex, George Vosselman, Norman Kerle

License: MIT

(teaser figure: Lite-Mono-8m 1024x320)

Table of Contents

  • Overview
  • Results
  • Data Preparation
  • Single Image Test
  • Evaluation
  • Training
  • Make Your Own Pre-training Weights On ImageNet
  • Citation

Overview

(overview figure)

Results

KITTI

You can download the trained models using the links below.

| --model | Params | ImageNet Pretrained | Input size | Abs Rel | Sq Rel | RMSE | RMSE log | delta < 1.25 | delta < 1.25^2 | delta < 1.25^3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| lite-mono | 3.1M | yes | 640x192 | 0.107 | 0.765 | 4.561 | 0.183 | 0.886 | 0.963 | 0.983 |
| lite-mono-small | 2.5M | yes | 640x192 | 0.110 | 0.802 | 4.671 | 0.186 | 0.879 | 0.961 | 0.982 |
| lite-mono-tiny | 2.2M | yes | 640x192 | 0.110 | 0.837 | 4.710 | 0.187 | 0.880 | 0.960 | 0.982 |
| lite-mono-8m | 8.7M | yes | 640x192 | 0.101 | 0.729 | 4.454 | 0.178 | 0.897 | 0.965 | 0.983 |
| lite-mono | 3.1M | yes | 1024x320 | 0.102 | 0.746 | 4.444 | 0.179 | 0.896 | 0.965 | 0.983 |
| lite-mono-small | 2.5M | yes | 1024x320 | 0.103 | 0.757 | 4.449 | 0.180 | 0.894 | 0.964 | 0.983 |
| lite-mono-tiny | 2.2M | yes | 1024x320 | 0.104 | 0.764 | 4.487 | 0.180 | 0.892 | 0.964 | 0.983 |
| lite-mono-8m | 8.7M | yes | 1024x320 | 0.097 | 0.710 | 4.309 | 0.174 | 0.905 | 0.967 | 0.984 |

Speed Evaluation

(speed evaluation figure)

Robustness

(robustness figure)

The RoboDepth Challenge Team is evaluating the robustness of different depth estimation algorithms. Lite-Mono has achieved the best robustness to date.

Data Preparation

Please refer to Monodepth2 to prepare your KITTI data.
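
For reference, the Monodepth2 preparation boils down to roughly the following commands (copied from the Monodepth2 workflow as a sketch; the splits file path and the optional png-to-jpg conversion are assumptions carried over from that repo, not Lite-Mono-specific instructions):

wget -i splits/kitti_archives_to_download.txt -P kitti_data/
cd kitti_data
unzip "*.zip"
cd ..
# optional, from Monodepth2: convert png to jpg to save disk space
find kitti_data/ -name '*.png' | parallel 'convert -quality 92 -sampling-factor 2x2,1x1,1x1 {.}.png {.}.jpg && rm {}'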

Single Image Test

preparing trained model

From the KITTI results table above you can download trained models (depth encoder and depth decoder).

Click on the links in the '--model' column to download a trained model.

start testing

python test_simple.py --load_weights_folder path/to/your/weights/folder --image_path path/to/your/test/image

Evaluation

python evaluate_depth.py --load_weights_folder path/to/your/weights/folder --data_path path/to/kitti_data/ --model lite-mono

Training

dependency installation

pip install 'git+https://github.com/saadnaeem-dev/pytorch-linear-warmup-cosine-annealing-warm-restarts-weight-decay'

preparing pre-trained weights

From the KITTI results table above you can also download weights of the backbone (depth encoder) pre-trained on ImageNet.

Click 'yes' in a row to download the corresponding pre-trained weights. The weights are agnostic to the input image resolution.

start training

python train.py --data_path path/to/your/data --model_name mytrain --num_epochs 30 --batch_size 12 --mypretrain path/to/your/pretrained/weights  --lr 0.0001 5e-6 31 0.0001 1e-5 31

tensorboard visualization

tensorboard --logdir ./tmp/mytrain

Make Your Own Pre-training Weights On ImageNet

Since a lot of people are interested in training their own backbone on ImageNet, I have also uploaded my pre-training scripts to this repo.

Citation

@InProceedings{Zhang_2023_CVPR,
author    = {Zhang, Ning and Nex, Francesco and Vosselman, George and Kerle, Norman},
title     = {Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month     = {June},
year      = {2023},
pages     = {18537-18546}
}

lite-mono's People

Contributors

noahzn


lite-mono's Issues

Total Number of Parameters

Thank you for sharing your great work. I want to print the total number of parameters, but it seems to give me the wrong numbers.

I added these two lines of code in trainer.py after "print("Training is using:\n ", self.device)":

print("Total number of parameters to train:", len(self.parameters_to_train))
print("Total number of parameters to train Pose:", len(self.parameters_to_train_pose))

However, here are the results I got: "Total number of parameters to train: 227, Total number of parameters to train Pose: 70".

Could you please let me know how I can print the total number of parameters for the whole training? I'm not using any pre-trained model.

Here is the command for training:

python train.py --data_path /media/armin/DATA/Lightweight/kitti_data --model_name mytrain --num_epochs 30 --num_workers 4 --batch_size 4 --lr 0.0001 5e-6 31 0.0001 1e-5 31
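
Note that parameters_to_train is a list of parameter tensors, so len() counts tensors rather than individual weights. A minimal sketch of counting the actual number of weights (assuming the same self.parameters_to_train / self.parameters_to_train_pose lists as in trainer.py):

print("Total number of parameters to train:",
      sum(p.numel() for p in self.parameters_to_train))
print("Total number of parameters to train Pose:",
      sum(p.numel() for p in self.parameters_to_train_pose))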

The index exceeds the dataset itself

Hi noahzn,

I'm bothering you again! I ran train.py in PyCharm and the program reported an error: FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Mr-Sun\Desktop\lite_mono\Lite-Mono\kitti_data\2011_09_26/2011_09_26_drive_0002_sync\image_03/data\0000000077.png'

but kitti_data\2011_09_26/2011_09_26_drive_0002_sync\image_03/data contains only 76 images.

Sometimes the program reports the error: FileNotFoundError: [Errno 2] No such file or directory: 'C:\Users\Mr-Sun\Desktop\lite_mono\Lite-Mono\kitti_data\2011_09_26/2011_09_26_drive_0001_sync\image_02/data\0000000108.png'

but kitti_data\2011_09_26/2011_09_26_drive_0001_sync\image_02/data contains only 107 images.

The errors are all of this kind, so I think this is a common issue. Have you ever run into this situation? Do you know the reason for the error?
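
This error usually means that a frame listed in the training split (or one of its -1/+1 neighbours used as frame_ids) is missing on disk, for example because a drive was only partially downloaded. A hedged sketch for checking a split file against the extracted data, assuming the Monodepth2-style split format (folder, frame index, side) and the KITTI raw file layout:

import os

data_path = "kitti_data"                          # assumption: your KITTI root
split_file = "splits/eigen_zhou/train_files.txt"  # assumption: the split being trained on

with open(split_file) as f:
    for line in f:
        folder, idx, side = line.split()
        cam = "image_02/data" if side == "l" else "image_03/data"
        for offset in (-1, 0, 1):                 # default frame_ids are [0, -1, 1]
            name = "{:010d}.png".format(int(idx) + offset)
            path = os.path.join(data_path, folder, cam, name)
            if not os.path.isfile(path):
                print("missing:", path)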

test_simple.py

Hi, I had a problem when I ran test_simple.py. The output is as follows:
File "test_simple.py", line 154, in test_simple
outputs, _ = depth_decoder(features)
ValueError: too many values to unpack (expected 2)

When I change the code to "outputs = depth_decoder(features)", the error goes away. But I wonder whether this modification has any effect on the result?
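
If the decoder returns only a dict of outputs, dropping the second variable should not change the predictions. A defensive sketch that handles both return styles (purely a workaround, not the authors' fix):

result = depth_decoder(features)
# some versions return a dict of outputs, others an (outputs, extra) tuple
outputs = result[0] if isinstance(result, tuple) else result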

Timm version

Hi @noahzn,

I am trying to train the model, so I followed the procedure from Monodepth2. I created the conda environment with the right packages and versions, but when I run the training, I get an error saying that timm is missing. I tried to install a few different versions, but it keeps throwing errors. I also tried PyTorch 1.0, since that is also possible according to the Monodepth2 page, but that doesn't work either.

Can you tell me which timm package I need, and whether I should use the same package versions as Monodepth2?

Thanks in advance!

How to get the correct inference speed of the model

I am running the following code on Ubuntu with a TITAN V GPU, and I get an inference time of 13.2 ms, which is quite different from the results in your paper. Is this due to the code or the hardware?

import numpy as np
import torch
from torch.backends import cudnn
import tqdm
cudnn.benchmark = True

device = 'cuda:6'
# encoder and decoder are assumed to be constructed and loaded with trained
# weights before this snippet; put them in eval mode for inference timing
encoder = encoder.to(device).eval()
decoder = decoder.to(device).eval()
repetitions = 300
dummy_input = torch.rand(1, 3,192,640).to(device)

print('warm up ...\n')
with torch.no_grad():
    for _ in range(100):
        _ = decoder(encoder(dummy_input))
torch.cuda.synchronize()

starter, ender = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)

timings = np.zeros((repetitions, 1))

print('testing ...\n')
with torch.no_grad():
    for rep in tqdm.tqdm(range(repetitions)):
        starter.record()
        _ = decoder(encoder(dummy_input))
        ender.record()
        torch.cuda.synchronize()
        curr_time = starter.elapsed_time(ender) 
        timings[rep] = curr_time

avg = timings.sum()/repetitions
print('\navg={}\n'.format(avg))

about train_files.txt

Can you give me the train_files.txt used by your code? I only find test_files.txt in the repo, so I don't know what data you use for training. Thank you very much.

Training code

When will the training code be released? I am very much looking forward to seeing the specific implementation.

The questions about the ImageNet pre-trained model

Hello, I would like to ask: is the ImageNet pre-trained model obtained by training the current network architecture on the ImageNet dataset, and is this pre-trained model then loaded when training on the KITTI dataset?

I get different results on the test set and other questions.

Hello, thanks for the great code. I got the pretrained model and ran train.py to reproduce Lite-Mono, but I got different results.

paper:

| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
| --- | --- | --- | --- | --- | --- | --- |
| 0.107 | 0.765 | 4.561 | 0.183 | 0.886 | 0.963 | 0.983 |

own:

| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
| --- | --- | --- | --- | --- | --- | --- |
| 0.107 | 0.804 | 4.633 | 0.185 | 0.885 | 0.962 | 0.982 |

Among them, the second and third metrics (sq_rel and rmse) are quite different from those in the paper. I did not make changes in trainer.py, options.py, depth_encoder.py or depth_decoder.py. Due to an unknown error on my computer, I replaced kitti_dataset.py and mono_dataset.py with the versions from Monodepth2.

1. I want to know why my result is worse. Is it because of kitti_dataset.py and mono_dataset.py?
2. Why is shuffle set to True in the DataLoader?

self.train_loader = DataLoader(
    train_dataset, self.opt.batch_size, shuffle=True,
    num_workers=self.opt.num_workers, pin_memory=True, drop_last=True)

3. Why is def _init_weights(self, m) used in the encoder and decoder?
4. Apart from the settings in options.py and trainer.py, what details do I need to pay attention to in order to reproduce the results of the Lite-Mono model?

Looking forward to your reply, thank you.

Evaluation of Lite-Mono-8m

Hi @noahzn, thanks for your great work!

I was trying to evaluate the Lite-Mono family. I am able to test lite-mono and lite-mono-small, but I encountered the following error when evaluating lite-mono-8m:

Traceback (most recent call last):
  File "/Lite-Mono/evaluate_depth.py", line 223, in <module>
    evaluate(options.parse())
  File "/Lite-Mono/evaluate_depth.py", line 102, in evaluate
    encoder = networks.LiteMono(model=opt.model,
  File "/Lite-Mono/networks/depth_encoder.py", line 384, in __init__
    stage_blocks.append(DilatedConv(dim=self.dims[i], k=3, dilation=self.dilation[i][j], drop_path=dp_rates[cur + j],
IndexError: list index out of range

Could you have a look at this? Thanks!

ImageNet pretrained weights

I used your source code to reproduce the paper, but the result is worse: delta < 1.25 is only 0.852. Is there something wrong with the source code?

size mismatch for downsample_layers......

Hi,
When I run "python test_simple.py --load_weights_folder ./model/ --image_path ./indoor/", I get an error like "Missing key(s) in state_dict: "stages.2.6.ddwconv.conv.weight", ...... size mismatch for downsample_layers.0.0.conv.weight: ....". I downloaded the "lite-mono-tiny" weights, then tried all of the models and got the same error.
How can I fix this? Thank you so much!

inputs[key] = ipt.to(self.device): AttributeError: 'NoneType' object has no attribute 'to'

Hello, we attempted to run Lite-Mono training. We prepared the KITTI dataset and it loads successfully. However, during training we encountered the following problem:

(dh_robodepth) lyu4@lyu4:~/dh_wp/Lite-Mono-main$ CUDA_VISIBLE_DEVICES=1 python train.py
Training model named:
mymono
Models and tensorboard events files are saved to:
./tmp
Training is using:
cuda
/home/lyu4/anaconda3/envs/dh_robodepth/lib/python3.10/site-packages/torchvision/transforms/transforms.py:332: UserWarning: Argument interpolation should be of type InterpolationMode instead of int. Please, use InterpolationMode enum.
warnings.warn(
Using split:
eigen_zhou
There are 39810 training items and 4424 validation items

Training
Traceback (most recent call last):
  File "/home/lyu4/dh_wp/Lite-Mono-main/train.py", line 12, in <module>
    trainer.train()
  File "/home/lyu4/dh_wp/Lite-Mono-main/trainer.py", line 218, in train
    self.run_epoch()
  File "/home/lyu4/dh_wp/Lite-Mono-main/trainer.py", line 237, in run_epoch
    outputs, losses = self.process_batch(inputs)
  File "/home/lyu4/dh_wp/Lite-Mono-main/trainer.py", line 268, in process_batch
    inputs[key] = ipt.to(self.device)
AttributeError: 'NoneType' object has no attribute 'to'

We have tried many methods but are still stuck here. Can you help me solve this problem? Thanks.

Test

Thank you very much for your open-source project. Do you have single-image test code? We want to verify the effect of the model.

tensorboard file

Hi, can you provide training logs on the KITTI dataset, for example the tensorboard file? I want to compare it with my own training process.

Setup / environment.yml

Hi,
I tried running test_simple.py with lite-mono-tiny, but I get an error. Since there are no setup instructions or an environment.yml file provided, I assume that I am using a wrong version of PyTorch. Can you please provide the setup information?

This is the error:

python test_simple.py --load_weights_folder weights/lite-mono-tiny_640x192 --image_path img/data/WIN_20230427_19_29_04_Pro.jpg
-> Loading model from  weights/lite-mono-tiny_640x192
   Loading pretrained encoder
Traceback (most recent call last):
  File "C:\git\Lite-Mono\test_simple.py", line 192, in <module>
    test_simple(args)
  File "C:\git\Lite-Mono\test_simple.py", line 86, in test_simple
    encoder.load_state_dict({k: v for k, v in encoder_dict.items() if k in model_dict})
  File "C:\Users\jon\AppData\Local\mambaforge\envs\litemono\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LiteMono:
        Missing key(s) in state_dict: "stages.2.6.ddwconv.conv.weight", "stages.2.6.bn1.weight", "stages.2.6.bn1.bias", "stages.2.6.bn1.running_mean", "stages.2.6.bn1.running_var", "stages.2.7.gamma", "stages.2.7.ddwconv.conv.weight", "stages.2.7.bn1.weight", "stages.2.7.bn1.bias", "stages.2.7.bn1.running_mean", "stages.2.7.bn1.running_var", "stages.2.7.norm.weight", "stages.2.7.norm.bias", "stages.2.7.pwconv1.weight", "stages.2.7.pwconv1.bias", "stages.2.7.pwconv2.weight", "stages.2.7.pwconv2.bias", "stages.2.8.gamma", "stages.2.8.ddwconv.conv.weight", "stages.2.8.bn1.weight", "stages.2.8.bn1.bias", "stages.2.8.bn1.running_mean", "stages.2.8.bn1.running_var", "stages.2.8.norm.weight", "stages.2.8.norm.bias", "stages.2.8.pwconv1.weight", "stages.2.8.pwconv1.bias", "stages.2.8.pwconv2.weight", "stages.2.8.pwconv2.bias", "stages.2.9.gamma_xca", "stages.2.9.gamma", "stages.2.9.norm_xca.weight", "stages.2.9.norm_xca.bias", "stages.2.9.xca.temperature", "stages.2.9.xca.qkv.weight", "stages.2.9.xca.qkv.bias", "stages.2.9.xca.proj.weight", "stages.2.9.xca.proj.bias", "stages.2.9.norm.weight", "stages.2.9.norm.bias", "stages.2.9.pwconv1.weight", "stages.2.9.pwconv1.bias", "stages.2.9.pwconv2.weight", "stages.2.9.pwconv2.bias".
        size mismatch for downsample_layers.0.0.conv.weight: copying a param with shape torch.Size([32, 3, 3, 3]) from checkpoint, the shape in current model is torch.Size([48, 3, 3, 3]).
        size mismatch for downsample_layers.0.0.bn_gelu.bn.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for downsample_layers.0.0.bn_gelu.bn.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for downsample_layers.0.0.bn_gelu.bn.running_mean: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for downsample_layers.0.0.bn_gelu.bn.running_var: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for downsample_layers.0.1.conv.weight: copying a param with shape torch.Size([32, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([48, 48, 3, 3]).
        size mismatch for downsample_layers.0.1.bn_gelu.bn.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for downsample_layers.0.1.bn_gelu.bn.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for downsample_layers.0.1.bn_gelu.bn.running_mean: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for downsample_layers.0.1.bn_gelu.bn.running_var: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for downsample_layers.0.2.conv.weight: copying a param with shape torch.Size([32, 32, 3, 3]) from checkpoint, the shape in current model is torch.Size([48, 48, 3, 3]).
        size mismatch for downsample_layers.0.2.bn_gelu.bn.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for downsample_layers.0.2.bn_gelu.bn.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for downsample_layers.0.2.bn_gelu.bn.running_mean: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for downsample_layers.0.2.bn_gelu.bn.running_var: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for downsample_layers.1.0.conv.weight: copying a param with shape torch.Size([64, 67, 3, 3]) from checkpoint, the shape in current model is torch.Size([80, 99, 3, 3]).
        size mismatch for downsample_layers.2.0.conv.weight: copying a param with shape torch.Size([128, 131, 3, 3]) from checkpoint, the shape in current model is torch.Size([128, 163, 3, 3]).
        size mismatch for stem2.0.conv.weight: copying a param with shape torch.Size([32, 35, 3, 3]) from checkpoint, the shape in current model is torch.Size([48, 51, 3, 3]).
        size mismatch for stages.0.0.gamma: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.0.ddwconv.conv.weight: copying a param with shape torch.Size([32, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([48, 1, 3, 3]).
        size mismatch for stages.0.0.bn1.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.0.bn1.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.0.bn1.running_mean: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.0.bn1.running_var: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.0.norm.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.0.norm.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.0.pwconv1.weight: copying a param with shape torch.Size([192, 32]) from checkpoint, the shape in current model is torch.Size([288, 48]).
        size mismatch for stages.0.0.pwconv1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([288]).
        size mismatch for stages.0.0.pwconv2.weight: copying a param with shape torch.Size([32, 192]) from checkpoint, the shape in current model is torch.Size([48, 288]).
        size mismatch for stages.0.0.pwconv2.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.1.gamma: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.1.ddwconv.conv.weight: copying a param with shape torch.Size([32, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([48, 1, 3, 3]).
        size mismatch for stages.0.1.bn1.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.1.bn1.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.1.bn1.running_mean: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.1.bn1.running_var: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.1.norm.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.1.norm.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.1.pwconv1.weight: copying a param with shape torch.Size([192, 32]) from checkpoint, the shape in current model is torch.Size([288, 48]).
        size mismatch for stages.0.1.pwconv1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([288]).
        size mismatch for stages.0.1.pwconv2.weight: copying a param with shape torch.Size([32, 192]) from checkpoint, the shape in current model is torch.Size([48, 288]).
        size mismatch for stages.0.1.pwconv2.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.2.gamma: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.2.ddwconv.conv.weight: copying a param with shape torch.Size([32, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([48, 1, 3, 3]).
        size mismatch for stages.0.2.bn1.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.2.bn1.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.2.bn1.running_mean: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.2.bn1.running_var: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.2.norm.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.2.norm.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.2.pwconv1.weight: copying a param with shape torch.Size([192, 32]) from checkpoint, the shape in current model is torch.Size([288, 48]).
        size mismatch for stages.0.2.pwconv1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([288]).
        size mismatch for stages.0.2.pwconv2.weight: copying a param with shape torch.Size([32, 192]) from checkpoint, the shape in current model is torch.Size([48, 288]).
        size mismatch for stages.0.2.pwconv2.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.3.gamma_xca: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.3.gamma: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.3.pos_embd.token_projection.weight: copying a param with shape torch.Size([32, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([48, 64, 1, 1]).     
        size mismatch for stages.0.3.pos_embd.token_projection.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.3.norm_xca.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.3.norm_xca.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.3.xca.qkv.weight: copying a param with shape torch.Size([96, 32]) from checkpoint, the shape in current model is torch.Size([144, 48]).
        size mismatch for stages.0.3.xca.qkv.bias: copying a param with shape torch.Size([96]) from checkpoint, the shape in current model is torch.Size([144]).
        size mismatch for stages.0.3.xca.proj.weight: copying a param with shape torch.Size([32, 32]) from checkpoint, the shape in current model is torch.Size([48, 48]).
        size mismatch for stages.0.3.xca.proj.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.3.norm.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.3.norm.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.0.3.pwconv1.weight: copying a param with shape torch.Size([192, 32]) from checkpoint, the shape in current model is torch.Size([288, 48]).
        size mismatch for stages.0.3.pwconv1.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([288]).
        size mismatch for stages.0.3.pwconv2.weight: copying a param with shape torch.Size([32, 192]) from checkpoint, the shape in current model is torch.Size([48, 288]).
        size mismatch for stages.0.3.pwconv2.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([48]).
        size mismatch for stages.1.0.gamma: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.0.ddwconv.conv.weight: copying a param with shape torch.Size([64, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([80, 1, 3, 3]).
        size mismatch for stages.1.0.bn1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.0.bn1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.0.bn1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.0.bn1.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.0.norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.0.norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.0.pwconv1.weight: copying a param with shape torch.Size([384, 64]) from checkpoint, the shape in current model is torch.Size([480, 80]).
        size mismatch for stages.1.0.pwconv1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([480]).
        size mismatch for stages.1.0.pwconv2.weight: copying a param with shape torch.Size([64, 384]) from checkpoint, the shape in current model is torch.Size([80, 480]).
        size mismatch for stages.1.0.pwconv2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.1.gamma: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.1.ddwconv.conv.weight: copying a param with shape torch.Size([64, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([80, 1, 3, 3]).
        size mismatch for stages.1.1.bn1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.1.bn1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.1.bn1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.1.bn1.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.1.norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.1.norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.1.pwconv1.weight: copying a param with shape torch.Size([384, 64]) from checkpoint, the shape in current model is torch.Size([480, 80]).
        size mismatch for stages.1.1.pwconv1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([480]).
        size mismatch for stages.1.1.pwconv2.weight: copying a param with shape torch.Size([64, 384]) from checkpoint, the shape in current model is torch.Size([80, 480]).
        size mismatch for stages.1.1.pwconv2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.2.gamma: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.2.ddwconv.conv.weight: copying a param with shape torch.Size([64, 1, 3, 3]) from checkpoint, the shape in current model is torch.Size([80, 1, 3, 3]).
        size mismatch for stages.1.2.bn1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.2.bn1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.2.bn1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.2.bn1.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.2.norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.2.norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.2.pwconv1.weight: copying a param with shape torch.Size([384, 64]) from checkpoint, the shape in current model is torch.Size([480, 80]).
        size mismatch for stages.1.2.pwconv1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([480]).
        size mismatch for stages.1.2.pwconv2.weight: copying a param with shape torch.Size([64, 384]) from checkpoint, the shape in current model is torch.Size([80, 480]).
        size mismatch for stages.1.2.pwconv2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.3.gamma_xca: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.3.gamma: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.3.norm_xca.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.3.norm_xca.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.3.xca.qkv.weight: copying a param with shape torch.Size([192, 64]) from checkpoint, the shape in current model is torch.Size([240, 80]).
        size mismatch for stages.1.3.xca.qkv.bias: copying a param with shape torch.Size([192]) from checkpoint, the shape in current model is torch.Size([240]).
        size mismatch for stages.1.3.xca.proj.weight: copying a param with shape torch.Size([64, 64]) from checkpoint, the shape in current model is torch.Size([80, 80]).
        size mismatch for stages.1.3.xca.proj.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.3.norm.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.3.norm.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).
        size mismatch for stages.1.3.pwconv1.weight: copying a param with shape torch.Size([384, 64]) from checkpoint, the shape in current model is torch.Size([480, 80]).
        size mismatch for stages.1.3.pwconv1.bias: copying a param with shape torch.Size([384]) from checkpoint, the shape in current model is torch.Size([480]).
        size mismatch for stages.1.3.pwconv2.weight: copying a param with shape torch.Size([64, 384]) from checkpoint, the shape in current model is torch.Size([80, 480]).
        size mismatch for stages.1.3.pwconv2.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([80]).

Missing opt.json

Hi! I want to re-implement Lite-Mono on KITTI, but I do not know the right training settings.
I have carefully read your paper but still can't tell the right number of epochs and batch size for each model.
I tried num_epochs=30, but only got 0.84 a1 on KITTI.
Could you provide the opt.json or the training command for each model?

Can you share the code for calculating FLOPs?

Thank you for the wonderful work!

I would like to know how you calculate the FLOPs. Which libraries do you use?
I have tried using fvcore to calculate FLOPs, but that library does not seem to support GELU.
In addition, many libraries do not support models that take a list as input, so they cannot calculate FLOPs for the depth decoder.
Can you share the code for calculating FLOPs? Many thanks!
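
For illustration, here is a minimal sketch using the thop package (an assumption on my part, not the authors' script); note that ops thop does not recognize, such as GELU, are typically counted as zero:

import torch
from thop import profile

# assumption: `encoder` and `decoder` are already-constructed Lite-Mono networks
dummy = torch.rand(1, 3, 192, 640)
macs_enc, params_enc = profile(encoder, inputs=(dummy,))
print("encoder MACs: {:.2f}G, params: {:.2f}M".format(macs_enc / 1e9, params_enc / 1e6))

# the decoder takes the encoder's list of feature maps as its single input
features = encoder(dummy)
macs_dec, params_dec = profile(decoder, inputs=(features,))
print("decoder MACs: {:.2f}G, params: {:.2f}M".format(macs_dec / 1e9, params_dec / 1e6))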

Training the model via Google Colab using cloud storage

Hi! I'd like to train your model (for example, lite-mono with 3.1M params) using Google Colab. I found that uploading is quite slow if I try to upload all the data directly to Colab, so I've decided to try using some cloud storage. Specifically, I have a bucket in Yandex Object Storage with the entire KITTI dataset.
Can you give me advice on how to access this data properly during training? Is it possible at all? In other words, I'd like to access the data as needed during training, so I don't have to store the entire dataset on my Colab machine.

About the ImageNet pre-training model

Hello, thank you for your help before, but I still want to ask how to pre-train on the ImageNet-1K dataset. Is it the same procedure as training on kitti_data, i.e. modify the file paths in the split files and then train with the given command? I have already downloaded the ImageNet-1K dataset. Thank you very much for your help.

ImageNet pretrained model

Hi, thanks for your great work and code. I trained Lite-Mono without pretrained weights and got a performance similar to the paper. But would it be possible for you to upload the ImageNet pretrained model?

A little problem

Hi noahzn,

May I ask how you get the intrinsics 'K' of your camera?

self.K = np.array([[0.58, 0, 0.5, 0],
                   [0, 1.92, 0.5, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 1]], dtype=np.float32)

Best,
Hu Li.
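
For context, this K follows Monodepth2's convention (which Lite-Mono inherits): it is an average of the KITTI camera intrinsics normalized by image width and height, and the dataloader rescales it to pixels for each input resolution. A self-contained sketch of that rescaling (the resolution values are assumed here for illustration):

import numpy as np

K_norm = np.array([[0.58, 0, 0.5, 0],
                   [0, 1.92, 0.5, 0],
                   [0, 0, 1, 0],
                   [0, 0, 0, 1]], dtype=np.float32)

width, height = 640, 192          # assumed training resolution
K = K_norm.copy()
K[0, :] *= width                  # fx and cx in pixels
K[1, :] *= height                 # fy and cy in pixels
inv_K = np.linalg.pinv(K)
print(K)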

evaluation error [ValueError: too many values to unpack (expected 2)]

Dear author,

When I run your eval command in a conda env:

$ python evaluate_depth.py --load_weights_folder xxx/lite-mono/weights/input_size_1024_320/lite-mono_1024x320 --data_path xxx/kitti_depth_data --model lite-mono

the output from the terminal is:

Traceback (most recent call last):
  File "evaluate_depth.py", line 223, in <module>
    evaluate(options.parse())
  File "evaluate_depth.py", line 131, in evaluate
    output, _ = depth_decoder(encoder(input_color))
ValueError: too many values to unpack (expected 2)

How can I solve this problem? Thanks a lot!

RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

When I run python train.py --data_path /mypath --model_name mytrain --num_epochs 30 --batch_size 12 --mypretrain ./weight/lite-mono-8m_1024x320 --lr 0.0001 5e-6 31 0.0001 1e-5 31, it displays:

Traceback (most recent call last):
  File "E:\qitaxiangmu\Lite-Mono\train.py", line 11, in <module>
    trainer = Trainer(opts)
  File "E:\qitaxiangmu\Lite-Mono\trainer.py", line 71, in __init__
    self.models_pose["pose_encoder"] = networks.ResnetEncoder(
  File "E:\qitaxiangmu\Lite-Mono\networks\resnet_encoder.py", line 76, in __init__
    self.encoder = resnet_multiimage_input(num_layers, pretrained, num_input_images)
  File "E:\qitaxiangmu\Lite-Mono\networks\resnet_encoder.py", line 48, in resnet_multiimage_input
    loaded = model_zoo.load_url(models.resnet.model_urls['resnet{}'.format(num_layers)])
  File "E:\qitaxiangmu\Track-Anything\venv\lib\site-packages\torch\hub.py", line 735, in load_state_dict_from_url
    return torch.load(cached_file, map_location=map_location)
  File "E:\qitaxiangmu\Track-Anything\venv\lib\site-packages\torch\serialization.py", line 777, in load
    with _open_zipfile_reader(opened_file) as opened_zipfile:
  File "E:\qitaxiangmu\Track-Anything\venv\lib\site-packages\torch\serialization.py", line 282, in __init__
    super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))

training log

Hi noahzn,

Could you provide training logs? During training, the predicted RGB images looked wrong, as shown below:

(image)

Can you give me some help? Thank you very much!

About the test of the model on Make3D

Hello! Only a few days later, I'm here again.
I saw that in the paper the model trained on the KITTI dataset is tested on the Make3D test set. The same is done in "Digging into Self-Supervised Monocular Depth Prediction", but I did not find the code for testing on the Make3D test set. I would like to know how this part of the test should be done. My email is [email protected].

Looking forward to your reply. : )

Another lightweight depth model RM-Depth (2.97M) in CVPR 2022

Just a reminder that there is another lightweight depth model, RM-Depth (CVPR 2022). It has a model size of 2.97M and achieves AbsRel = 0.107 on the KITTI test set and AbsRel = 0.090 on the Cityscapes test set. Unfortunately, it is not cited.

frame_ids

Hi,

I am confused about the values of frame_ids. Why do they default to [0, -1, 1]?

about pretrain model

Thank you for answering my previous question; your code is great. But I don't have the pretrained model. Could you send it to me? My email is [email protected].

My training results are not as good as those in your paper

Hello, I would like to ask why, when I train your code from scratch, the results differ significantly from yours. I used the default hyperparameters in your code. Did you use Monodepth2's auto-masking loss? Thank you very much for open-sourcing this work; it has been very helpful to me!

About no imagenet pre-train

Thank you again for answering my question in such detail before! I am back again.

I noticed that there are no without-ImageNet-pre-training results in the article, yet both the depth encoder and the pose encoder in the code inherit ImageNet weights. I want to know how the without-pre-training results in the article were obtained: is only the depth network trained without pre-training, or are both the pose network and the depth network trained without pre-training?

Looking forward to your reply.

Evaluation File not Found: gt_depth.npz

According to the evaluation instructions, I want to evaluate on the KITTI Eigen split, but I encountered a missing-file error. Is there any link to download the gt_depth.npz file?
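
If Lite-Mono follows Monodepth2 here (an assumption), the ground-truth depth file is not downloaded but exported from the raw KITTI data using Monodepth2's export script, roughly:

python export_gt_depth.py --data_path path/to/kitti_data --split eigen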

kitti datasets preparation

Hi noahzn,
I have some confusion about how to prepare the dataset. What are 'eigen' and 'eigen_zhou', and do they both need to be included in the splits? Can you provide a prepared data package? Thank you very much!

about distriuted training

Hello author, I have made modifications according to your hints to implement distributed training, but the following very strange phenomenon occurred.

This is my startup command:
(image)

And this is the modified code:
(image)
(image)

The following is the GPU usage:
(image)

It can clearly be seen that card 0 uses far more memory than the others!

I don't know what went wrong. I hope you can help me.
