
Comments (16)

psh117 commented on June 16, 2024

> Also, I note that the time cost is still around 100 ms with 100 samples.

This is due to batched inference, so the inference time for fewer than 100 samples would be relatively consistent.


cadop commented on June 16, 2024

Sorry, it's been a while, so I will have to look into it and rerun on my hardware just to confirm.

Off the top of my head, that time is really large. Can you make sure you are counting the time of inference for one solution and not the total time of multiple samples during trajectory planning?


newuhe commented on June 16, 2024

> Sorry, it's been a while, so I will have to look into it and rerun on my hardware just to confirm.
>
> Off the top of my head, that time is really large. Can you make sure you are counting the time of inference for one solution and not the total time of multiple samples during trajectory planning?

Even though I set num_samples and num_references to 1 in evaluation_panda_urdf.py, the time for nodeik.inverse_kinematics(pose_sets) is still 0.7 s.


cadop commented on June 16, 2024

How are you timing this?

If you are getting 1.1 s on 1000 samples and 0.7 s on 1 sample, it seems like you are including the time it takes to initialize and/or transfer to the GPU, not the time of inference.

Maybe @psh117 can provide more context; otherwise I'll take a look this week (I have different hardware, OS, etc., since we did this paper).


cadop commented on June 16, 2024

Also, did you use the fast inference setting:

atol = 1e-5 # 1e-3 for fast inference, 1e-5 for accurate inference
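
For context, a minimal sketch of how such tolerances are typically passed to torchdiffeq's adaptive solver; dynamics, z0, and t here are placeholders, not nodeik's actual names:

    import torch
    from torchdiffeq import odeint

    def dynamics(t, z):
        return -z  # placeholder right-hand side, not nodeik's flow

    z0 = torch.randn(100, 7)      # e.g., a batch of 7-DoF joint vectors
    t = torch.tensor([0.0, 1.0])

    # Looser tolerances let the adaptive solver take fewer steps:
    z_fast = odeint(dynamics, z0, t, atol=1e-3, rtol=1e-3)      # fast inference
    z_accurate = odeint(dynamics, z0, t, atol=1e-5, rtol=1e-5)  # accurate inference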


newuhe commented on June 16, 2024

> How are you timing this?
>
> If you are getting 1.1 s on 1000 samples and 0.7 s on 1 sample, it seems like you are including the time it takes to initialize and/or transfer to the GPU, not the time of inference.
>
> Maybe @psh117 can provide more context; otherwise I'll take a look this week (I have different hardware, OS, etc., since we did this paper).

I just put time.time() before and after line 59 in model_wrapper.py, ik_q, delta_logp = self.model(x, c, zero, rev=True), so the time counted should be the model inference time. Also, atol is set to 1e-3, but the time cost is only slightly reduced.


cadop commented on June 16, 2024

Sorry for the difficulty in these timings. We will need Suhan to provide a definitive answer on how he benchmarked the GPU versions (nodeik and ikflow).

I think it might have been using wandb. Generally, there are specific ways to benchmark, like
https://pytorch.org/tutorials/recipes/recipes/benchmark.html
https://deci.ai/blog/measure-inference-time-deep-neural-networks/
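
For example, a minimal sketch with torch.utils.benchmark, assuming the nodeik wrapper and pose_sets from the evaluation script; the Timer runs a warm-up and handles CUDA synchronization around the timed region:

    import torch.utils.benchmark as benchmark

    # Only steady-state inference is measured; one-time setup cost is excluded.
    timer = benchmark.Timer(
        stmt='nodeik.inverse_kinematics(pose_sets)',
        globals={'nodeik': nodeik, 'pose_sets': pose_sets},
    )
    print(timer.timeit(100))  # statistics over 100 timed runs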

Thanks for raising this issue, though. Once I get more clarity, I'll add this info to the readme.


psh117 commented on June 16, 2024

Have you executed the model more than twice within a script? I believe the timing includes the model initialization time, as mentioned by @cadop. Performing a dummy inference after creating the model could improve the measured inference time.
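
For instance, a minimal sketch of that pattern, with explicit torch.cuda.synchronize() calls so the clock only counts finished GPU work (assuming the nodeik wrapper and pose_sets from the evaluation script):

    import time
    import torch

    _ = nodeik.inverse_kinematics(pose_sets)  # dummy call: pays one-time CUDA init / compile cost

    torch.cuda.synchronize()                  # drain pending GPU work before starting the clock
    t_start = time.time()
    ik_q, _ = nodeik.inverse_kinematics(pose_sets)
    torch.cuda.synchronize()                  # make sure the kernels actually finished
    print('time:', (time.time() - t_start) * 1000, 'ms')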


newuhe commented on June 16, 2024

> Have you executed the model more than twice within a script? I believe the timing includes the model initialization time, as mentioned by @cadop. Performing a dummy inference after creating the model could improve the measured inference time.

I've tried running it 10 times within a script, and the time is reduced to 0.5 s for 1000 samples. However, that is still larger than the time reported in the paper.


psh117 commented on June 16, 2024

The inference time may vary, but I think 0.5 seconds seems excessively slow. Could you please confirm the version of torchdiffeq you are using? I conducted my testing with torchdiffeq==0.2.3.
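
For reference, a quick way to confirm the installed version:

    import torchdiffeq
    print(torchdiffeq.__version__)  # should print 0.2.3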

I made some modifications to the evaluation script. Can you do the test using these changes?

    nodeik.eval()

    # First call: includes one-time warm-up cost (CUDA init, kernel compilation)
    t_start = time.time()
    ik_q, _ = nodeik.inverse_kinematics(pose_sets)
    t_end = time.time()
    print('time:', (t_end - t_start)*1000, 'ms')

    # Second call: measures steady-state inference time
    t_start = time.time()
    ik_q, _ = nodeik.inverse_kinematics(pose_sets)
    t_end = time.time()
    print('time:', (t_end - t_start)*1000, 'ms')
    fk_sets = nodeik.forward_kinematics(ik_q)

And here is the result on my system (RTX 4080), with atol=1e-3 and rtol=1e-3:

Warp initialized:
   Version: 0.2.0
   Using CUDA device: NVIDIA GeForce RTX 4080
   Using CPU compiler: /usr/bin/g++
0 panda_joint1
1 panda_joint2
2 panda_joint3
3 panda_joint4
4 panda_joint5
5 panda_joint6
6 panda_joint7
link_index {'panda_link0': 0, 'panda_link0_sc': 1, 'panda_link1_sc': 2, 'panda_link1': 3, 'panda_link2_sc': 4, 'panda_link2': 5, 'panda_link3_sc': 6, 'panda_link3': 7, 'panda_link4_sc': 8, 'panda_link4': 9, 'panda_link5_sc': 10, 'panda_link5': 11, 'panda_link6_sc': 12, 'panda_link6': 13, 'panda_link7_sc': 14, 'panda_link7': 15, 'panda_link8': 16, 'panda_hand': 17}
Lightning automatically upgraded your loaded checkpoint from v1.6.0 to v2.0.1. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file ../model/panda_loss-20.ckpt`
Module warp.sim.articulation load took 1.23 ms
torch.Size([1024, 7]) torch.Size([1024, 7]) torch.Size([1024, 1])
time: 615.9012317657471 ms
time: 76.89261436462402 ms
mean position    error: 0.009137176
mean orientation error: 0.016926718011890746


newuhe commented on June 16, 2024

My torchdiffeq is 0.2.3, and I tried the same as yours. Here is the log:

Warp 0.7.2 initialized:
   CUDA Toolkit: 11.5, Driver: 12.0
   Devices:
     "cpu"    | x86_64
     "cuda:0" | Quadro T1000 (sm_75)
   Kernel cache: /home/.cache/warp/0.7.2
0 panda_joint1
1 panda_joint2
2 panda_joint3
3 panda_joint4
4 panda_joint5
5 panda_joint6
6 panda_joint7
link_index {'panda_link0': 0, 'panda_link0_sc': 1, 'panda_link1_sc': 2, 'panda_link1': 3, 'panda_link2_sc': 4, 'panda_link2': 5, 'panda_link3_sc': 6, 'panda_link3': 7, 'panda_link4_sc': 8, 'panda_link4': 9, 'panda_link5_sc': 10, 'panda_link5': 11, 'panda_link6_sc': 12, 'panda_link6': 13, 'panda_link7_sc': 14, 'panda_link7': 15, 'panda_link8': 16, 'panda_hand': 17}
Lightning automatically upgraded your loaded checkpoint from v1.6.0 to v1.9.5. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file model/panda_loss-20.ckpt`
Module warp.sim.articulation load on device 'cpu' took 33.29 ms
torch.Size([1024, 7]) torch.Size([1024, 7]) torch.Size([1024, 1])
time: 1119.408369064331 ms
time: 581.2697410583496 ms
mean position    error: 0.0065971585
mean orientation error: 0.011576415242703343

And here is the log on the server:

Warp 0.7.2 initialized:
   CUDA Toolkit: 11.5, Driver: 12.0
   Devices:
     "cpu"    | x86_64
     "cuda:0" | Quadro RTX 8000 (sm_75)
   Kernel cache: /home/junfeng/.cache/warp/0.7.2
0 panda_joint1
1 panda_joint2
2 panda_joint3
3 panda_joint4
4 panda_joint5
5 panda_joint6
6 panda_joint7
link_index {'panda_link0': 0, 'panda_link0_sc': 1, 'panda_link1_sc': 2, 'panda_link1': 3, 'panda_link2_sc': 4, 'panda_link2': 5, 'panda_link3_sc': 6, 'panda_link3': 7, 'panda_link4_sc': 8, 'panda_link4': 9, 'panda_link5_sc': 10, 'panda_link5': 11, 'panda_link6_sc': 12, 'panda_link6': 13, 'panda_link7_sc': 14, 'panda_link7': 15, 'panda_link8': 16, 'panda_hand': 17}
Lightning automatically upgraded your loaded checkpoint from v1.6.0 to v1.9.5. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file model/panda_loss-20.ckpt`
Module warp.sim.articulation load on device 'cpu' took 31.56 ms
torch.Size([1024, 7]) torch.Size([1024, 7]) torch.Size([1024, 1])
time: 654.2432308197021 ms
time: 169.31962966918945 ms
mean position    error: 0.006597163
mean orientation error: 0.011576218433427071


newuhe commented on June 16, 2024

Also, I note that the time cost is still around 100 ms with 100 samples. It seems the scaling is not linear?


psh117 commented on June 16, 2024

As for your results, I think they are somewhat odd. Please check whether there are sufficient GPU resources available, as other programs might be affecting the performance.


newuhe commented on June 16, 2024

> Also, I note that the time cost is still around 100 ms with 100 samples.
>
> This is due to batched inference, so the inference time for fewer than 100 samples would be relatively consistent.

Could you please tell me the time cost of 100 samples in your environment?


psh117 commented on June 16, 2024

> Also, I note that the time cost is still around 100 ms with 100 samples.
>
> This is due to batched inference, so the inference time for fewer than 100 samples would be relatively consistent.
>
> Could you please tell me the time cost of 100 samples in your environment?

In my environment, it took about 68 ms with 100 samples.

time: 599.5767116546631 ms (first inference)
time: 67.74044036865234 ms (second inference)
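
For anyone reproducing this, a hypothetical sweep over batch sizes to see where batching stops amortizing; sample_poses(n) is a stand-in for however the evaluation script builds n target poses:

    import time
    import torch

    _ = nodeik.inverse_kinematics(sample_poses(1))  # warm-up so the sweep measures steady state
    for n in (1, 10, 100, 1000):
        poses = sample_poses(n)
        torch.cuda.synchronize()
        t_start = time.time()
        nodeik.inverse_kinematics(poses)
        torch.cuda.synchronize()
        print(n, 'samples:', (time.time() - t_start) * 1000, 'ms')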


cadop commented on June 16, 2024

Just a random thought: are you sure it's running on the GPU? The log says

> Module warp.sim.articulation load on device 'cpu' took 31.56 ms

You can see that @psh117's log does not have a statement that the module was loaded on the CPU.
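
A quick way to check this on the PyTorch side; nodeik.model is assumed here to be the underlying flow module, following the self.model attribute in model_wrapper.py:

    import torch

    print(torch.cuda.is_available())               # is CUDA visible to PyTorch at all?
    print(next(nodeik.model.parameters()).device)  # where the flow's weights actually live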

