sunnyhelen / jperceiver Goto Github PK

View Code? Open in Web Editor NEW

72.0 4.0 7.0 36.34 MB

[ECCV 2022]JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes

Python 100.00%

autonomous-driving deep-learning perception bev-perception depth-estimation visual-odometry

jperceiver's People

Contributors

Stargazers

Watchers

Forkers

cv-depth jie311 daydreamer2023 simonsroad samehmohamed88 junyu-z zokooo

jperceiver's Issues

How to run this project.

Can you provide more details about how to run yours project.

Our obtained trajectory from draw_odometry.py is quite different from your paper.

issue

We ran python scripts/draw_odometry.py with the pre-trained model, kitti_raw_road.pth on the "KITTI Odometry" dataset.
But, the resulting trajectory is different from yours.
Both the evaluated values of RMSE, rotation error and the estimated values for the trajectory are different from the results in your paper(https://arxiv.org/pdf/2207.07895.pdf).

・The result in the paper
According to Table4
Average sequence translation RMSE (%): 4.57
Average sequence rotation error (deg/100m): 2.94
trajectry : Fig. 7

・Our result
Average sequence translation RMSE (%): 56.6012
Average sequence rotation error (deg/m): 0.4300

Runtime environment

azureml 1.44.0

Reproduction procedures

At draw_odometry.py:L28, KITTIOdomDataset argument "type" is now "static".
I created test_files_07.txt with 07/road_dense128/000000.png to 07/road_dense128/001100.png and placed it in JPerceiver/mono/datasets/splits/odometry/.
I set "~/<my_directory>/kitti_odometry_road.pth" in the path of model_path.
I set 7 in sequence_id.

our puopose

How to get the results of your paper ?

About config files

Hi,

I am confused that which config file I should choose to obtain the pretrained models. Could you please provide scripts for training and evaluating these models?

Besides, I have some questions about the setups in the configuration file:
(1) what is the gt depth file? Does it correspond to the validation set of kitti odometry?

JPerceiver/config/cfg_kitti_baseline_odometry_boundary_ce_iou_1024_20.py

Line 16 in 22ea4c8

gt_depth_path = '/CV/users/zhaohaimei3/gt_depths.npz',#path to gt data

（2）Although loss_sum is set to 3, loss_weightS is missing in the configuration file.

JPerceiver/config/cfg_kitti_baseline_odometry_boundary_ce_iou_1024_20.py

Line 52 in 22ea4c8

loss_sum = 3,

JPerceiver/mono/model/mono_baseline/net.py

Lines 581 to 582 in 22ea4c8

    
           elif opt.loss_sum == 3: 
        
               output = loss(generated_top_view, true_top_view) * opt.loss_weightS + \

Deprecated agroverse

Hi.
The agroverse 1.0 is deprecated and is now replaced with agroverse 1.1.
Do you have any plans to make modification?

Dear author ,how torun code in the nuscenes dataset?

a tutorial of Nuscenes

Really thanks!

Config file about the pretrained model.

Hello, nice work and thanks for the releasing source code!

As there was same issue before.. i am confused that which config file I should choose to obtain the pretrained models. Could you please provide scripts for training and evaluating these models?

inputs['bothD', 0, 0] missed in training process

After the datasets needed are downloaded, I ran the commanding code
# Training CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port 25629 train.py --config config/cfg_kitti_baseline_odometry_boundary_ce_iou_1024_20.py --work_dir log/odometry/
However, KeyError: ('bothD', 0, 0) happened when computing the loss_dict of odometry dataset..
I noticed that when computing the loss of kitti_object, inputs['bothD', 0, 0] could be provided by getting groundtruth vehicle256, while inputs[''bothS, 0, 0] could be provided by getting groundtruth road dense128 .
Since the topview_lossB requires inputs['bothD', 0, 0] when I run odometry dataset, how can I get this variable?
How about running kitti object dataset, where can I find inputs['bothS', 0, 0] ?
Thanks.

failed to download ground truth of KITTI from monolayout

Hi, thank you for your wonderful work！
I tried to download GT layout of KITTI using the links in monolayout, only to get "404 FILE NOT FOUND".
Can you provide accessible links or other suggestions?

outputs("cam_T_cam",0,-1) is missing

After downloading dataset and pretrained model, I ran [eval_argo_both_video.py] this error:

T_ = outputs[("cam_T_cam", 0, -1)].cpu().numpy()[0]
KeyError: ('cam_T_cam', 0, -1)

IMGS_PER_GPU setting

According to the original config setting, the IMGS_PER_GPU is set to 3. However, the GPUs I use now are GTX 3080 10 GB so I changed the IMGS_PER_GPU to 1. Up to 120 epochs training (kitti odom), the eval results are
abs_rel: 0.2068, sq_rel: 1.5355, rmse: 4.9815, rmse_log: 0.2898, a1: 0.7079, a2: 0.9595, scale_mean: 1.7448, iou_road: 0.7133, mAP_road: 0.8781
These results have been achieved for around 50 epochs and haven't shown any descending trends.
Does the IMGS_PER_GPU set to 1 have a negative effect on the training/eval procedure?

How can we get result picture like these in paper?

Thank you very much for sharing your work results with us. Recently, I am also learning relevant knowledge. Can you tell me how I need to get the results as shown below?

Missing key(s) in state_dict

I try to evaluate the result of layout on kitti-dataset. But I meet this question of Missing key(s) in state_dict:"CycledViewProjectionB.transform_module.fc_transform.0.weight", "CycledViewProjectionB.transform_module.fc_transform.0.bias", "CycledViewProjectionB.transform_module.fc_transform.2.weight", "CycledViewProjectionB.transform_module.fc_transform.2.bias", "CycledViewProjectionB.retransform_module.fc_transform.0.weight", "CycledViewProjectionB.retransform_module.fc_transform.0.bias", "CycledViewProjectionB.retransform_module.fc_transform.2.weight", "CycledViewProjectionB.retransform_module.fc_transform.2.bias", "CrossViewTransformerB.query_conv.weight", "CrossViewTransformerB.query_conv.bias", "CrossViewTransformerB.key_conv.weight", "CrossViewTransformerB.key_conv.bias", "CrossViewTransformerB.value_conv.weight", "CrossViewTransformerB.value_conv.bias", "CrossViewTransformerB.f_conv.weight", "CrossViewTransformerB.f_conv.bias", "CrossViewTransformerB.res_conv.weight", "CrossViewTransformerB.res_conv.bias", "CrossViewTransformerB.query_conv_depth.weight", "CrossViewTransformerB.query_conv_depth.bias", "CrossViewTransformerB.key_conv_depth.weight", "CrossViewTransformerB.key_conv_depth.bias", "CrossViewTransformerB.value_conv_depth.weight", "CrossViewTransformerB.value_conv_depth.bias", "CrossViewTransformerB.conv1.conv.weight", "CrossViewTransformerB.conv1.conv.bias", "CrossViewTransformerB.conv2.conv.weight", "CrossViewTransformerB.conv2.conv.bias", "LayoutDecoderB.decoder.0.weight", "LayoutDecoderB.decoder.0.bias", "LayoutDecoderB.decoder.1.weight", "LayoutDecoderB.decoder.1.bias", "LayoutDecoderB.decoder.1.running_mean", "LayoutDecoderB.decoder.1.running_var", "LayoutDecoderB.decoder.3.weight", "LayoutDecoderB.decoder.3.bias", "LayoutDecoderB.decoder.4.weight", "LayoutDecoderB.decoder.4.bias", "LayoutDecoderB.decoder.4.running_mean", "LayoutDecoderB.decoder.4.running_var", "LayoutDecoderB.decoder.5.weight", "LayoutDecoderB.decoder.5.bias", "LayoutDecoderB.decoder.6.weight", "LayoutDecoderB.decoder.6.bias", "LayoutDecoderB.decoder.6.running_mean", "LayoutDecoderB.decoder.6.running_var", "LayoutDecoderB.decoder.8.weight", "LayoutDecoderB.decoder.8.bias", "LayoutDecoderB.decoder.9.weight", "LayoutDecoderB.decoder.9.bias", "LayoutDecoderB.decoder.9.running_mean", "LayoutDecoderB.decoder.9.running_var", "LayoutDecoderB.decoder.10.weight", "LayoutDecoderB.decoder.10.bias", "LayoutDecoderB.decoder.11.weight", "LayoutDecoderB.decoder.11.bias", "LayoutDecoderB.decoder.11.running_mean", "LayoutDecoderB.decoder.11.running_var", "LayoutDecoderB.decoder.13.weight", "LayoutDecoderB.decoder.13.bias", "LayoutDecoderB.decoder.14.weight", "LayoutDecoderB.decoder.14.bias", "LayoutDecoderB.decoder.14.running_mean", "LayoutDecoderB.decoder.14.running_var", "LayoutDecoderB.decoder.15.weight", "LayoutDecoderB.decoder.15.bias", "LayoutDecoderB.decoder.16.weight", "LayoutDecoderB.decoder.16.bias", "LayoutDecoderB.decoder.16.running_mean", "LayoutDecoderB.decoder.16.running_var", "LayoutDecoderB.decoder.18.weight", "LayoutDecoderB.decoder.18.bias", "LayoutDecoderB.decoder.19.weight", "LayoutDecoderB.decoder.19.bias", "LayoutDecoderB.decoder.19.running_mean", "LayoutDecoderB.decoder.19.running_var", "LayoutDecoderB.decoder.20.weight", "LayoutDecoderB.decoder.20.bias", "LayoutDecoderB.decoder.21.weight", "LayoutDecoderB.decoder.21.bias", "LayoutDecoderB.decoder.21.running_mean", "LayoutDecoderB.decoder.21.running_var", "LayoutDecoderB.decoder.23.weight", "LayoutDecoderB.decoder.23.bias", "LayoutDecoderB.decoder.24.weight", "LayoutDecoderB.decoder.24.bias", "LayoutDecoderB.decoder.24.running_mean", "LayoutDecoderB.decoder.24.running_var", "LayoutDecoderB.decoder.25.conv.weight", "LayoutDecoderB.decoder.25.conv.bias", "LayoutTransformDecoderB.decoder.0.weight", "LayoutTransformDecoderB.decoder.0.bias", "LayoutTransformDecoderB.decoder.1.weight", "LayoutTransformDecoderB.decoder.1.bias", "LayoutTransformDecoderB.decoder.1.running_mean", "LayoutTransformDecoderB.decoder.1.running_var", "LayoutTransformDecoderB.decoder.3.weight", "LayoutTransformDecoderB.decoder.3.bias", "LayoutTransformDecoderB.decoder.4.weight", "LayoutTransformDecoderB.decoder.4.bias", "LayoutTransformDecoderB.decoder.4.running_mean", "LayoutTransformDecoderB.decoder.4.running_var", "LayoutTransformDecoderB.decoder.5.weight", "LayoutTransformDecoderB.decoder.5.bias", "LayoutTransformDecoderB.decoder.6.weight", "LayoutTransformDecoderB.decoder.6.bias", "LayoutTransformDecoderB.decoder.6.running_mean", "LayoutTransformDecoderB.decoder.6.running_var", "LayoutTransformDecoderB.decoder.8.weight", "LayoutTransformDecoderB.decoder.8.bias", "LayoutTransformDecoderB.decoder.9.weight", "LayoutTransformDecoderB.decoder.9.bias", "LayoutTransformDecoderB.decoder.9.running_mean", "LayoutTransformDecoderB.decoder.9.running_var", "LayoutTransformDecoderB.decoder.10.weight", "LayoutTransformDecoderB.decoder.10.bias", "LayoutTransformDecoderB.decoder.11.weight", "LayoutTransformDecoderB.decoder.11.bias", "LayoutTransformDecoderB.decoder.11.running_mean", "LayoutTransformDecoderB.decoder.11.running_var", "LayoutTransformDecoderB.decoder.13.weight", "LayoutTransformDecoderB.decoder.13.bias", "LayoutTransformDecoderB.decoder.14.weight", "LayoutTransformDecoderB.decoder.14.bias", "LayoutTransformDecoderB.decoder.14.running_mean", "LayoutTransformDecoderB.decoder.14.running_var", "LayoutTransformDecoderB.decoder.15.weight", "LayoutTransformDecoderB.decoder.15.bias", "LayoutTransformDecoderB.decoder.16.weight", "LayoutTransformDecoderB.decoder.16.bias", "LayoutTransformDecoderB.decoder.16.running_mean", "LayoutTransformDecoderB.decoder.16.running_var", "LayoutTransformDecoderB.decoder.18.weight", "LayoutTransformDecoderB.decoder.18.bias", "LayoutTransformDecoderB.decoder.19.weight", "LayoutTransformDecoderB.decoder.19.bias", "LayoutTransformDecoderB.decoder.19.running_mean", "LayoutTransformDecoderB.decoder.19.running_var", "LayoutTransformDecoderB.decoder.20.weight", "LayoutTransformDecoderB.decoder.20.bias", "LayoutTransformDecoderB.decoder.21.weight", "LayoutTransformDecoderB.decoder.21.bias", "LayoutTransformDecoderB.decoder.21.running_mean", "LayoutTransformDecoderB.decoder.21.running_var", "LayoutTransformDecoderB.decoder.23.weight", "LayoutTransformDecoderB.decoder.23.bias", "LayoutTransformDecoderB.decoder.24.weight", "LayoutTransformDecoderB.decoder.24.bias", "LayoutTransformDecoderB.decoder.24.running_mean", "LayoutTransformDecoderB.decoder.24.running_var", "LayoutTransformDecoderB.decoder.25.conv.weight", "LayoutTransformDecoderB.decoder.25.conv.bias".

ModuleNotFoundError: No module named 'mono.model.mono_autoencoder'

JPerceiver/mono/model/__init__.py

Lines 2 to 4 in 802e511

    
           from .mono_autoencoder.net import autoencoder 
        
           from .mono_fm.net import mono_fm 
        
           from .mono_fm_joint.net import mono_fm_joint

Hi, the mono_autoencoder folder is missing.

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.

	elif opt.loss_sum == 3:
	output = loss(generated_top_view, true_top_view) * opt.loss_weightS + \

	from .mono_autoencoder.net import autoencoder
	from .mono_fm.net import mono_fm
	from .mono_fm_joint.net import mono_fm_joint