sunnyhelen / jperceiver
[ECCV 2022] JPerceiver: Joint Perception Network for Depth, Pose and Layout Estimation in Driving Scenes
Can you provide more details about how to run your project?
We ran python scripts/draw_odometry.py with the pre-trained model kitti_raw_road.pth on the KITTI Odometry dataset.
However, the resulting trajectory is different from yours.
Both the evaluated translation RMSE and rotation error and the estimated trajectory differ from the results in your paper (https://arxiv.org/pdf/2207.07895.pdf).
- The result in the paper (Table 4):
  Average sequence translation RMSE (%): 4.57
  Average sequence rotation error (deg/100m): 2.94
  Trajectory: Fig. 7
- Our result:
  Average sequence translation RMSE (%): 56.6012
  Average sequence rotation error (deg/m): 0.4300
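For reference, we computed our numbers with the standard KITTI devkit segment-based errors; below is a minimal Python sketch of that definition (this is the devkit convention, not necessarily what scripts/draw_odometry.py does internally; note that deg/m multiplied by 100 gives the paper's deg/100m unit):

import numpy as np

# Minimal sketch of the standard KITTI odometry metric (devkit convention).
# poses_gt / poses_pred are lists of 4x4 camera-to-world matrices for one
# sequence.
SEG_LENGTHS = [100, 200, 300, 400, 500, 600, 700, 800]

def cumulative_distances(poses):
    # Distance travelled along the ground-truth trajectory up to each frame.
    d = [0.0]
    for a, b in zip(poses[:-1], poses[1:]):
        d.append(d[-1] + np.linalg.norm(b[:3, 3] - a[:3, 3]))
    return d

def eval_sequence(poses_gt, poses_pred, step=10):
    dist = cumulative_distances(poses_gt)
    t_errs, r_errs = [], []
    for first in range(0, len(poses_gt), step):
        for length in SEG_LENGTHS:
            # First frame at least `length` metres past the segment start.
            last = next((i for i in range(first, len(dist))
                         if dist[i] > dist[first] + length), None)
            if last is None:
                continue
            # Relative pose over the segment, then the residual between them.
            d_gt = np.linalg.inv(poses_gt[first]) @ poses_gt[last]
            d_pr = np.linalg.inv(poses_pred[first]) @ poses_pred[last]
            err = np.linalg.inv(d_pr) @ d_gt
            t_errs.append(np.linalg.norm(err[:3, 3]) / length)
            angle = np.arccos(np.clip((np.trace(err[:3, :3]) - 1) / 2, -1, 1))
            r_errs.append(angle / length)                  # rad per metre
    t_rmse_pct = 100 * float(np.mean(t_errs))              # translation, %
    r_deg_per_m = float(np.degrees(np.mean(r_errs)))       # rotation, deg/m
    return t_rmse_pct, r_deg_per_m, r_deg_per_m * 100      # ... and deg/100m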
How can we reproduce the results reported in your paper?
Hi,
I am confused about which config file I should choose to obtain the pretrained models. Could you please provide scripts for training and evaluating these models?
I also have some questions about the settings in the configuration file:
(1) What is the gt depth file? Does it correspond to the validation set of KITTI Odometry?
(2) Although loss_sum is set to 3, loss_weightS is missing from the configuration file (see the sketch after the code reference below).
JPerceiver/mono/model/mono_baseline/net.py
Lines 581 to 582 in 22ea4c8
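In case it clarifies what I mean, a hypothetical sketch of how a missing weight list could be defaulted (loss_sum and loss_weightS are the names from the config; everything else here is my assumption, not the repo's actual code):

def weighted_loss(cfg, losses):
    # Hypothetical sketch, not the repo's actual code: fall back to uniform
    # weights when loss_weightS is absent from the config.
    weights = getattr(cfg, 'loss_weightS', None) or [1.0] * cfg.loss_sum
    return sum(w * l for w, l in zip(weights, losses))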
Hi.
Argoverse 1.0 is deprecated and has now been replaced by Argoverse 1.1.
Do you have any plans to update the code accordingly?
A tutorial for nuScenes
Thanks a lot!
Hello, nice work and thanks for releasing the source code!
As raised in a previous issue, I am confused about which config file I should choose to obtain the pretrained models. Could you please provide scripts for training and evaluating these models?
After downloading the required datasets, I ran the following training command:
# Training
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port 25629 train.py --config config/cfg_kitti_baseline_odometry_boundary_ce_iou_1024_20.py --work_dir log/odometry/
However, the following error occurred when computing the loss_dict for the odometry dataset:
KeyError: ('bothD', 0, 0)
I noticed that when computing the loss for kitti_object, inputs[('bothD', 0, 0)] is provided by the ground-truth vehicle256, while inputs[('bothS', 0, 0)] is provided by the ground-truth road dense128.
Since topview_lossB requires inputs[('bothD', 0, 0)], how can I obtain this variable when running the odometry dataset? Conversely, when running the kitti object dataset, where can I find inputs[('bothS', 0, 0)]? (A sketch of a possible guard follows.)
Thanks.
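For illustration, this is the kind of guard I imagine would avoid the KeyError (the key names come from the traceback; topview_loss is a hypothetical helper, not the repo's actual function):

def layout_losses(inputs, outputs, loss_dict):
    # Hypothetical guard sketch: compute each layout loss only when its
    # ground truth is present in the batch. The odometry split appears to
    # provide only road GT ('bothS'), while kitti_object provides vehicle
    # GT ('bothD').
    if ('bothD', 0, 0) in inputs:
        loss_dict['topview_lossB'] = topview_loss(outputs, inputs[('bothD', 0, 0)])
    if ('bothS', 0, 0) in inputs:
        loss_dict['topview_lossS'] = topview_loss(outputs, inputs[('bothS', 0, 0)])
    return loss_dict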
Hi, thank you for your wonderful work!
I tried to download the GT layout of KITTI using the links in monolayout, but only got "404 FILE NOT FOUND".
Can you provide accessible links or other suggestions?
After downloading the dataset and the pretrained model, I ran eval_argo_both_video.py and got this error:
T_ = outputs[("cam_T_cam", 0, -1)].cpu().numpy()[0]
KeyError: ('cam_T_cam', 0, -1)
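For anyone else hitting this, a hedged sketch of a guard around the failing line (my assumption is that outputs[('cam_T_cam', 0, -1)] is the relative pose from frame 0 to frame -1 and is only produced when the config's frame_ids include -1 and the pose branch runs):

# Hedged sketch around the failing line: the relative pose for frame -1 is
# only present when the pose branch was run with frame_ids including -1.
key = ('cam_T_cam', 0, -1)
if key in outputs:
    T_ = outputs[key].cpu().numpy()[0]
else:
    raise KeyError(f'{key} missing: check that frame_ids in the config '
                   'include -1 and that the pose network is enabled')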
According to the original config, IMGS_PER_GPU is set to 3. However, my GPUs are RTX 3080s with 10 GB of memory, so I changed IMGS_PER_GPU to 1. After 120 epochs of training (kitti odom), the eval results are:
abs_rel: 0.2068, sq_rel: 1.5355, rmse: 4.9815, rmse_log: 0.2898, a1: 0.7079, a2: 0.9595, scale_mean: 1.7448, iou_road: 0.7133, mAP_road: 0.8781
These results were reached at around epoch 50 and have not improved since. Does setting IMGS_PER_GPU to 1 have a negative effect on the training/eval procedure?
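One thing I am checking on my side (a hedged sketch; the constants are my assumptions, not the repo's exact config schema): reducing IMGS_PER_GPU from 3 to 1 cuts the effective batch size by 3x, and the linear scaling rule suggests reducing the learning rate by the same factor.

# Hedged sketch (assumed values): apply the linear scaling rule when
# reducing the per-GPU batch size from 3 to 1.
IMGS_PER_GPU = 1
BASE_IMGS_PER_GPU = 3
base_lr = 1e-4                                   # assumed original value
lr = base_lr * IMGS_PER_GPU / BASE_IMGS_PER_GPU  # ~3x smaller LR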
I tried to evaluate the layout results on the KITTI dataset, but loading the checkpoint fails with missing keys in the state_dict. Every missing key belongs to the second ("B") layout branch:
Missing key(s) in state_dict: the weights and biases of CycledViewProjectionB.transform_module.fc_transform.{0,2} and CycledViewProjectionB.retransform_module.fc_transform.{0,2}; the weights and biases of CrossViewTransformerB.{query,key,value,f,res}_conv, CrossViewTransformerB.{query,key,value}_conv_depth, and CrossViewTransformerB.conv{1,2}.conv; and the conv weights/biases plus BatchNorm weights, biases, running_mean and running_var of every parameterised layer (indices 0 through 25, ending in .25.conv) of LayoutDecoderB.decoder and LayoutTransformDecoderB.decoder.
JPerceiver/mono/model/__init__.py
Lines 2 to 4 in 802e511
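Since every missing key sits in a "...B" module, my guess is the checkpoint was saved from a model built without the second layout branch while my config constructs it. A hedged diagnostic sketch (model is assumed to be the network built from the config; the checkpoint path is an example, not a file shipped with the repo):

import torch

# Hedged diagnostic sketch: compare the checkpoint's keys with the model's
# to see which side the mismatch comes from.
ckpt = torch.load('checkpoint.pth', map_location='cpu')
state = ckpt.get('state_dict', ckpt)
missing = sorted(set(model.state_dict()) - set(state))
unexpected = sorted(set(state) - set(model.state_dict()))
print('missing:', missing[:5], '...')
print('unexpected:', unexpected[:5], '...')
# Stopgap only: load non-strictly; the B branch then stays randomly
# initialised, so its outputs will be meaningless.
model.load_state_dict(state, strict=False)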