https://github.com/deepmind/t

How is it different from DeepMind's TapNet? about omnimotion HOT 3 OPEN

yhyu13 commented on August 16, 2024

How is it different from DeepMind's TapNet?

from omnimotion.

Comments (3)

qianqianwang68 commented on August 16, 2024 20

Thank you for your question! TAPIR is undoubtedly an amazing work. Our method and TAPIR work in fundamentally different ways, and I believe they are complementary.

TAPIR, like most tracking methods, is a feed-forward method, whereas ours is a test-time optimization method. TAPIR is trained on large amounts of video data, and when given a new video sequence at test time, it can directly compute the raw tracking results for that video. Our method, on the other hand, needs to be optimized on each video separately (substantially slower!). To perform the optimization, our method takes the raw tracking results from existing methods as a noisy supervision signal. So methods like TAPIR provide input motion to our system, and our method can reconcile and complete the possibly noisy and inconsistent motion to get a global motion representation for the video. With better input motion, the results of our method will likely get better as well. And as mentioned on TAPIR's webpage, our method "could potentially be used on top of TAPIR tracks to further improve performance." Note that TAPIR achieves much better tracking accuracy on the TAP-Vid benchmark than OmniMotion optimized with input motion data from RAFT.
Some other differences include: our method produces a compact representation of the motion of the entire video; our optimized motion tends to be more temporally coherent; our method can provide plausible locations for points when they are occluded; our method provides pseudo-3D reconstructions.
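
To make the idea of test-time optimization on noisy tracks concrete, here is a toy sketch. It is not the OmniMotion implementation: it assumes a single 1D point moving linearly, and the names `fit_track` and `mse` are hypothetical. Noisy per-frame positions (standing in for raw tracks from a feed-forward tracker) serve as supervision, and a small global motion model is fitted to all frames of one sequence by gradient descent.

```python
import random


def mse(a, b):
    """Mean squared error between two equal-length position lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def fit_track(observations, steps=500, lr=0.5):
    """Fit x(t) = a + b * t to noisy per-frame positions by gradient descent.

    The linear model is an illustrative assumption; it stands in for a much
    richer global motion representation.
    """
    n = len(observations)
    # Normalize time to [0, 1] so one learning rate suits both parameters.
    times = [i / (n - 1) for i in range(n)]
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = grad_b = 0.0
        # The loss sums over every frame of the sequence, not only past frames.
        for t, x in zip(times, observations):
            residual = (a + b * t) - x
            grad_a += 2.0 * residual / n
            grad_b += 2.0 * residual * t / n
        a -= lr * grad_a
        b -= lr * grad_b
    return [a + b * t for t in times]


random.seed(0)
truth = [10.0 + 0.5 * i for i in range(30)]        # hypothetical true motion
noisy = [x + random.gauss(0.0, 1.0) for x in truth]  # stand-in for raw tracks
refined = fit_track(noisy)
print("error of raw tracks:    ", mse(noisy, truth))
print("error of refined tracks:", mse(refined, truth))
```

Because the fit is run separately for each sequence, its cost is paid per video, which mirrors the "substantially slower" trade-off described above.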

Lastly, in my opinion, we need both generalizable methods like TAPIR, which learn very useful priors from data, and test-time optimization methods like ours, which can take noisy motion data and refine it for a particular video sequence for better quality and coherence.

dongxinyu1030 commented on August 16, 2024

Regarding this step: "To perform the optimization, our method takes the raw tracking results from existing methods as the noisy supervising signal." Does your method need trajectories across all video frames, or just the frames before the current time t?

qianqianwang68 commented on August 16, 2024

It takes the trajectories across all video frames.
