
Comments (16)

psh117 commented on June 16, 2024

> Also, I note that the time cost is still around 100 ms with 100 samples.

This is due to batched inference, so the inference time for fewer than 100 samples would be relatively consistent.


cadop commented on June 16, 2024

Sorry, it's been a while, so I will have to look into it and rerun on my hardware just to confirm.

Off the top of my head, that time is really large. Can you make sure you are counting the time of inference for one solution and not the total time of multiple samples during trajectory planning?


newuhe commented on June 16, 2024

> Sorry, it's been a while, so I will have to look into it and rerun on my hardware just to confirm.
>
> Off the top of my head, that time is really large. Can you make sure you are counting the time of inference for one solution and not the total time of multiple samples during trajectory planning?

Even though I set num_samples and num_references to 1 in evaluation_panda_urdf.py, the time for nodeik.inverse_kinematics(pose_sets) is still 0.7 s.


cadop commented on June 16, 2024

How are you timing this?

If you are getting 1.1 s on 1000 samples and 0.7 s on 1 sample, it seems like you are including the time it takes to initialize and/or transfer to the GPU, not the time of inference.

Maybe @psh117 can provide more context; otherwise I'll take a look this week (I have different hardware, OS, etc., since we did this paper).


cadop commented on June 16, 2024

Also, did you use the fast inference setting:

atol = 1e-5 # 1e-3 for fast inference, 1e-5 for accurate inference
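
For context, a minimal sketch of how such tolerances are typically passed to torchdiffeq's adaptive solver; dynamics, z0, and t here are placeholders, not nodeik's actual names:

    import torch
    from torchdiffeq import odeint

    def dynamics(t, z):
        return -z  # placeholder right-hand side, not nodeik's flow

    z0 = torch.randn(100, 7)      # e.g., a batch of 7-DoF joint vectors
    t = torch.tensor([0.0, 1.0])

    # Looser tolerances let the adaptive solver take fewer steps:
    z_fast = odeint(dynamics, z0, t, atol=1e-3, rtol=1e-3)      # fast inference
    z_accurate = odeint(dynamics, z0, t, atol=1e-5, rtol=1e-5)  # accurate inference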


newuhe commented on June 16, 2024

> How are you timing this?
>
> If you are getting 1.1 s on 1000 samples and 0.7 s on 1 sample, it seems like you are including the time it takes to initialize and/or transfer to the GPU, not the time of inference.
>
> Maybe @psh117 can provide more context; otherwise I'll take a look this week (I have different hardware, OS, etc., since we did this paper).

I just put time.time() before and after line 59 in model_wrapper.py, ik_q, delta_logp = self.model(x, c, zero, rev=True), so the time counted should be the model inference time. Also, atol is set to 1e-3, but the time cost is only slightly reduced.


cadop commented on June 16, 2024

Sorry for the difficulty in these timings. We will need Suhan to provide a definitive answer on how he benchmarked the GPU versions (nodeik and ikflow).

I think it might have been using wandb. Generally, there are specific ways to benchmark, like
https://pytorch.org/tutorials/recipes/recipes/benchmark.html
https://deci.ai/blog/measure-inference-time-deep-neural-networks/
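
For example, a minimal sketch with torch.utils.benchmark, assuming the nodeik wrapper and pose_sets from the evaluation script; the Timer runs a warm-up and handles CUDA synchronization around the timed region:

    import torch.utils.benchmark as benchmark

    # Only steady-state inference is measured; one-time setup cost is excluded.
    timer = benchmark.Timer(
        stmt='nodeik.inverse_kinematics(pose_sets)',
        globals={'nodeik': nodeik, 'pose_sets': pose_sets},
    )
    print(timer.timeit(100))  # statistics over 100 timed runs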

Thanks for raising this issue, though. Once I get more clarity, I'll add this info to the readme.


psh117 commented on June 16, 2024

Have you executed the model more than twice within a script? I believe the timing includes the model initialization time, as mentioned by @cadop. Performing a dummy inference after creating the model could improve the measured inference time.
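
For instance, a minimal sketch of that pattern, with explicit torch.cuda.synchronize() calls so the clock only counts finished GPU work (assuming the nodeik wrapper and pose_sets from the evaluation script):

    import time
    import torch

    _ = nodeik.inverse_kinematics(pose_sets)  # dummy call: pays one-time CUDA init / compile cost

    torch.cuda.synchronize()                  # drain pending GPU work before starting the clock
    t_start = time.time()
    ik_q, _ = nodeik.inverse_kinematics(pose_sets)
    torch.cuda.synchronize()                  # make sure the kernels actually finished
    print('time:', (time.time() - t_start) * 1000, 'ms')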


newuhe commented on June 16, 2024

> Have you executed the model more than twice within a script? I believe the timing includes the model initialization time, as mentioned by @cadop. Performing a dummy inference after creating the model could improve the measured inference time.

I've tried running it 10 times within a script, and the time is reduced to 0.5 s for 1000 samples. However, that is still larger than the time reported in the paper.


psh117 commented on June 16, 2024

The inference time may vary, but I think 0.5 seconds seems excessively slow. Could you please confirm the version of torchdiffeq you are using? I conducted my testing with torchdiffeq==0.2.3.
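
For reference, a quick way to confirm the installed version:

    import torchdiffeq
    print(torchdiffeq.__version__)  # should print 0.2.3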

I made some modifications to the evaluation script. Can you do the test using these changes?

    nodeik.eval()

    # First call: includes one-time warm-up cost (CUDA init, kernel compilation)
    t_start = time.time()
    ik_q, _ = nodeik.inverse_kinematics(pose_sets)
    t_end = time.time()
    print('time:', (t_end - t_start)*1000, 'ms')

    # Second call: measures steady-state inference time
    t_start = time.time()
    ik_q, _ = nodeik.inverse_kinematics(pose_sets)
    t_end = time.time()
    print('time:', (t_end - t_start)*1000, 'ms')
    fk_sets = nodeik.forward_kinematics(ik_q)

And here is the result on my system (RTX 4080), with atol=1e-3 and rtol=1e-3:

Warp initialized:
   Version: 0.2.0
   Using CUDA device: NVIDIA GeForce RTX 4080
   Using CPU compiler: /usr/bin/g++
0 panda_joint1
1 panda_joint2
2 panda_joint3
3 panda_joint4
4 panda_joint5
5 panda_joint6
6 panda_joint7
link_index {'panda_link0': 0, 'panda_link0_sc': 1, 'panda_link1_sc': 2, 'panda_link1': 3, 'panda_link2_sc': 4, 'panda_link2': 5, 'panda_link3_sc': 6, 'panda_link3': 7, 'panda_link4_sc': 8, 'panda_link4': 9, 'panda_link5_sc': 10, 'panda_link5': 11, 'panda_link6_sc': 12, 'panda_link6': 13, 'panda_link7_sc': 14, 'panda_link7': 15, 'panda_link8': 16, 'panda_hand': 17}
Lightning automatically upgraded your loaded checkpoint from v1.6.0 to v2.0.1. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file ../model/panda_loss-20.ckpt`
Module warp.sim.articulation load took 1.23 ms
torch.Size([1024, 7]) torch.Size([1024, 7]) torch.Size([1024, 1])
time: 615.9012317657471 ms
time: 76.89261436462402 ms
mean position    error: 0.009137176
mean orientation error: 0.016926718011890746


newuhe commented on June 16, 2024

My torchdiffeq is 0.2.3, and I tried the same as yours. Here is the log:

Warp 0.7.2 initialized:
   CUDA Toolkit: 11.5, Driver: 12.0
   Devices:
     "cpu"    | x86_64
     "cuda:0" | Quadro T1000 (sm_75)
   Kernel cache: /home/.cache/warp/0.7.2
0 panda_joint1
1 panda_joint2
2 panda_joint3
3 panda_joint4
4 panda_joint5
5 panda_joint6
6 panda_joint7
link_index {'panda_link0': 0, 'panda_link0_sc': 1, 'panda_link1_sc': 2, 'panda_link1': 3, 'panda_link2_sc': 4, 'panda_link2': 5, 'panda_link3_sc': 6, 'panda_link3': 7, 'panda_link4_sc': 8, 'panda_link4': 9, 'panda_link5_sc': 10, 'panda_link5': 11, 'panda_link6_sc': 12, 'panda_link6': 13, 'panda_link7_sc': 14, 'panda_link7': 15, 'panda_link8': 16, 'panda_hand': 17}
Lightning automatically upgraded your loaded checkpoint from v1.6.0 to v1.9.5. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file model/panda_loss-20.ckpt`
Module warp.sim.articulation load on device 'cpu' took 33.29 ms
torch.Size([1024, 7]) torch.Size([1024, 7]) torch.Size([1024, 1])
time: 1119.408369064331 ms
time: 581.2697410583496 ms
mean position    error: 0.0065971585
mean orientation error: 0.011576415242703343

And here is the log on the server:

Warp 0.7.2 initialized:
   CUDA Toolkit: 11.5, Driver: 12.0
   Devices:
     "cpu"    | x86_64
     "cuda:0" | Quadro RTX 8000 (sm_75)
   Kernel cache: /home/junfeng/.cache/warp/0.7.2
0 panda_joint1
1 panda_joint2
2 panda_joint3
3 panda_joint4
4 panda_joint5
5 panda_joint6
6 panda_joint7
link_index {'panda_link0': 0, 'panda_link0_sc': 1, 'panda_link1_sc': 2, 'panda_link1': 3, 'panda_link2_sc': 4, 'panda_link2': 5, 'panda_link3_sc': 6, 'panda_link3': 7, 'panda_link4_sc': 8, 'panda_link4': 9, 'panda_link5_sc': 10, 'panda_link5': 11, 'panda_link6_sc': 12, 'panda_link6': 13, 'panda_link7_sc': 14, 'panda_link7': 15, 'panda_link8': 16, 'panda_hand': 17}
Lightning automatically upgraded your loaded checkpoint from v1.6.0 to v1.9.5. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint --file model/panda_loss-20.ckpt`
Module warp.sim.articulation load on device 'cpu' took 31.56 ms
torch.Size([1024, 7]) torch.Size([1024, 7]) torch.Size([1024, 1])
time: 654.2432308197021 ms
time: 169.31962966918945 ms
mean position    error: 0.006597163
mean orientation error: 0.011576218433427071


newuhe commented on June 16, 2024

Also, I note that the time cost is still around 100 ms with 100 samples. It seems the scaling is not linear?


psh117 commented on June 16, 2024

As for your results, I think they are somewhat odd. Please check whether there are sufficient GPU resources available, as other programs might be affecting the performance.


newuhe commented on June 16, 2024

> Also, I note that the time cost is still around 100 ms with 100 samples.
>
> This is due to batched inference, so the inference time for fewer than 100 samples would be relatively consistent.

Could you please tell me the time cost of 100 samples in your environment?


psh117 commented on June 16, 2024

> Also, I note that the time cost is still around 100 ms with 100 samples.
>
> This is due to batched inference, so the inference time for fewer than 100 samples would be relatively consistent.
>
> Could you please tell me the time cost of 100 samples in your environment?

In my environment, it took about 68 ms with 100 samples.

time: 599.5767116546631 ms (first inference)
time: 67.74044036865234 ms (second inference)
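
For anyone reproducing this, a hypothetical sweep over batch sizes to see where batching stops amortizing; sample_poses(n) is a stand-in for however the evaluation script builds n target poses:

    import time
    import torch

    _ = nodeik.inverse_kinematics(sample_poses(1))  # warm-up so the sweep measures steady state
    for n in (1, 10, 100, 1000):
        poses = sample_poses(n)
        torch.cuda.synchronize()
        t_start = time.time()
        nodeik.inverse_kinematics(poses)
        torch.cuda.synchronize()
        print(n, 'samples:', (time.time() - t_start) * 1000, 'ms')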


cadop commented on June 16, 2024

Just a random thought: are you sure it's running on the GPU? The log says

> Module warp.sim.articulation load on device 'cpu' took 31.56 ms

You can see that @psh117's log does not have a statement that the module was loaded on the CPU.
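
A quick way to check this on the PyTorch side; nodeik.model is assumed here to be the underlying flow module, following the self.model attribute in model_wrapper.py:

    import torch

    print(torch.cuda.is_available())               # is CUDA visible to PyTorch at all?
    print(next(nodeik.model.parameters()).device)  # where the flow's weights actually live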

