qianqianwang68 / omnimotion


2.1K 127.0 118.0 14 KB

License: Apache License 2.0

Python 100.00%

omnimotion's Issues

Best Practices JPG

This may be a silly question, but I am looking to use this on videos, yet the expected input structure seems to be only .jpg files. I wrote the scripts below to downscale each video and then convert it to images, since that is the structure described in the preprocessing README. Does this make sense? Or should I be training on the videos directly?

```python
import cv2
import os
from moviepy.editor import VideoFileClip


def downscale_video(video_path, output_path):
    cap = cv2.VideoCapture(video_path)
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    out = cv2.VideoWriter(output_path, fourcc, 30.0, (640, 480))

    while True:
        ret, frame = cap.read()
        if not ret:
            break
        resized_frame = cv2.resize(frame, (640, 480))
        out.write(resized_frame)

    cap.release()
    out.release()


def extract_60_frame_clips(video_path, output_dir, video_name):
    with VideoFileClip(video_path) as clip:
        duration = clip.duration
        fps = clip.fps
        n_frames = int(fps * duration)

        for i in range(0, n_frames, 60):
            start = i / fps
            end = min((i + 60) / fps, duration)
            short_clip = clip.subclip(start, end)
            short_clip.write_videofile(f"{output_dir}/{video_name}_{i//60}.mp4")


input_dir = "input"
output_dir = "output_clips"
temp_dir = "temp"

if not os.path.exists(output_dir):
    os.makedirs(output_dir)
if not os.path.exists(temp_dir):
    os.makedirs(temp_dir)

for video_file in os.listdir(input_dir):
    if video_file.endswith(".mov"):
        video_path = os.path.join(input_dir, video_file)
        video_name = os.path.splitext(video_file)[0]
        temp_path = os.path.join(temp_dir, f"{video_name}_temp.mp4")

        downscale_video(video_path, temp_path)
        extract_60_frame_clips(temp_path, output_dir, video_name)


def extract_frames(video_path, output_folder, sequence_name):
    cap = cv2.VideoCapture(video_path)
    frame_count = 0
    saved_frame_count = 0  # Counter for saved frames

    while True:
        ret, frame = cap.read()
        if not ret:
            break

        if frame_count % 10 == 0:  # Save only every 10th frame
            frame_filename = os.path.join(
                output_folder, f"{sequence_name}_{saved_frame_count:05d}.jpg"
            )
            cv2.imwrite(frame_filename, frame)
            saved_frame_count += 1  # Increment the counter for saved frames

        frame_count += 1  # Increment the overall frame counter

    cap.release()


video_folder = "output_clips"  # Your output directory containing MP4 clips
base_folder = "sequence_name"  # Base folder for sequences

color_folder = os.path.join(base_folder, "color")  # Folder to save JPEG frames

os.makedirs(color_folder, exist_ok=True)

for video_file in os.listdir(video_folder):
    if video_file.endswith(".mp4"):
        video_path = os.path.join(video_folder, video_file)
        sequence_name = os.path.splitext(video_file)[0]

        extract_frames(video_path, color_folder, sequence_name)
```

Is this the proper practice? Sorry, this is my first CNN! Thanks for any help.

The provided checkpoint does not match the full model, and it's hard to reproduce the results

The provided checkpoint is missing the parameters for the affine_mlp model in nvp_simplified.py.

ModuleNotFoundError: No module named 'vision_transformer'

I have installed vision_transformer from this site:
https://github.com/google-research/vision_transformer/tree/main

but I still get the error:
ImportError: cannot import name 'RAFT' from 'raft' (/home/piotr/anaconda3/envs/omnimotion/lib/python3.8/site-packages/raft/__init__.py)
Traceback (most recent call last):
  File "extract_dino_features.py", line 27, in <module>
    import vision_transformer as vits
ModuleNotFoundError: No module named 'vision_transformer'

Do you have any idea what I did wrong?
I am trying to preprocess my own sequences.
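One way to narrow this down is to check where Python would actually load each module from. The sketch below is only a diagnostic, and it assumes (this is not confirmed in the thread) that `extract_dino_features.py` expects `vision_transformer` and `utils` to be plain `.py` files sitting next to the script, rather than installed packages. The helper name `check_local_modules` is made up for illustration.

```python
import importlib
import importlib.util
import os
import sys


def check_local_modules(names, search_dir="."):
    """Report, for each module name, the file Python would import it from (or None)."""
    search_dir = os.path.abspath(search_dir)
    # Mimic running a script from search_dir: that directory is searched first.
    sys.path.insert(0, search_dir)
    importlib.invalidate_caches()
    try:
        report = {}
        for name in names:
            spec = importlib.util.find_spec(name)
            report[name] = spec.origin if spec is not None else None
        return report
    finally:
        sys.path.remove(search_dir)


if __name__ == "__main__":
    # Run this from the directory that contains extract_dino_features.py.
    for module, origin in check_local_modules(["vision_transformer", "utils", "raft"]).items():
        print(f"{module}: {origin or 'NOT FOUND'}")
```

If `raft` resolves to a path inside `site-packages`, as in the `ImportError` above, an unrelated pip package of the same name may be shadowing the module the script is looking for.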

Transformation matrix

Thanks for sharing this awesome work. How can we recover the transformation matrix between two points, or is there a way to numerically recover the trajectory of a point across frames?
Thanks in advance.

Specific weight for each sequence

It seems that every image sequence has its own weights. I look forward to improvements to this repo.

Reducing CUDA Memory

I am trying to train on some videos of Mosquitoes and am doing some preprocessing. I am running into

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 12.00 GiB total capacity; 9.91 GiB already allocated; 0 bytes free; 11.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I am on a 3080 Ti. In config.py I reduced the default number of points to 20 and the chunk size to a measly 100, yet I still get memory errors. Any suggestions? I ran nvidia-smi and nothing else is hogging the GPU. I'm trying to squeeze this down because I can't afford an A100!
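The error message itself points at `max_split_size_mb` in `PYTORCH_CUDA_ALLOC_CONF`. A minimal sketch of setting it from Python follows. The value 128 is purely illustrative, the helper `alloc_conf` is hypothetical, and this only addresses allocator fragmentation, so it may not help when the model simply needs more than 12 GiB.

```python
import os


def alloc_conf(max_split_size_mb):
    """Build a PYTORCH_CUDA_ALLOC_CONF value that caps the allocator's split size."""
    if max_split_size_mb <= 0:
        raise ValueError("max_split_size_mb must be positive")
    return f"max_split_size_mb:{int(max_split_size_mb)}"


# The variable must be set before the first CUDA allocation;
# setting it before importing torch is the safest place.
# 128 is an illustrative value, not a recommendation from the omnimotion authors.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = alloc_conf(128)
```

The same setting can be exported in the shell before launching the training script. In the log above, reserved memory (11.27 GiB) is only modestly larger than allocated memory (9.91 GiB), so fragmentation may be just part of the problem.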

A question about train loss

When I trained on my own dataset, I found that the loss did not converge. Have you encountered this problem?

Applications in VFX

This research looks amazing.

Could this be used to export Nuke-like SmartVectors to 32-bit EXRs?

Could it be used for camera matchmove?

Very excited for how this could be applied to visual effects.

some problem in Computing and processing flow

Congratulations, and thank you very much for your amazing work.
When I try to compute and process flow, I run into some problems. Here is the command I ran:

python main_processing.py --data_dir /home/omnimotion/data/01_0 --chain

This is the error message:

Traceback (most recent call last):
  File "exhaustive_raft.py", line 18, in <module>
    from raft import RAFT
ModuleNotFoundError: No module named 'raft'
Traceback (most recent call last):
  File "extract_dino_features.py", line 26, in <module>
    import utils
ModuleNotFoundError: No module named 'utils'
flitering raft optical flow for /home/omnimotion/data/01_0....
  0%|                                                                       | 0/54056 [00:00<?, ?it/s]Traceback (most recent call last):
  File "filter_raft.py", line 125, in <module>
    run_filtering(args)
  File "filter_raft.py", line 50, in run_filtering
    features = [torch.from_numpy(np.load(os.path.join(scene_dir, 'features', feature_name,
  File "filter_raft.py", line 50, in <listcomp>
    features = [torch.from_numpy(np.load(os.path.join(scene_dir, 'features', feature_name,
  File "/home/anaconda3/envs/omnimotion/lib/python3.8/site-packages/numpy/lib/npyio.py", line 405, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/home/omnimotion/data/01_0/features/dino/000.jpg.npy

This is the structure of the folder (/home/omnimotion/data/01_0 )：

Thanks very much
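In the log above, both `exhaustive_raft.py` and `extract_dino_features.py` fail on import before `filter_raft.py` runs, so the missing `.npy` file is most likely a consequence of those earlier failures rather than a separate problem. A small sketch for confirming which feature files are absent is shown below. It assumes frames live in `<scene_dir>/color/` and that features are stored as `<scene_dir>/features/dino/<image name>.npy`, the pattern visible in the `FileNotFoundError`. The helper name is hypothetical.

```python
import os


def missing_feature_files(scene_dir, feature_name="dino"):
    """List the feature files that are expected for each frame but do not exist."""
    color_dir = os.path.join(scene_dir, "color")
    feature_dir = os.path.join(scene_dir, "features", feature_name)
    missing = []
    for img_name in sorted(os.listdir(color_dir)):
        # Path pattern taken from the error message: .../features/dino/000.jpg.npy
        expected = os.path.join(feature_dir, img_name + ".npy")
        if not os.path.isfile(expected):
            missing.append(expected)
    return missing
```

If every frame is reported missing, the feature extraction step never ran successfully, and the two `ModuleNotFoundError`s need to be fixed first.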

Code Coming Soon!

I'm excited for this code to be shared. When do you think that will be?

Does it have to be trained and optimized for every new video?

A question about blending weight.

Why can x_i and x_j share the same blending weights? The depth changes when point k moves from frame i to frame j.

Code License

Amazing work - really enjoyed reading the paper and extremely intrigued by the results.

I noted the citation in the readme. Can we add a license to ensure proper usage of the code?

Thanks for all this work!

How to accelerate training speed?

I have a lot of videos to process, so I would like to ask how to speed up training. Some loss in performance is acceptable. Thank you a lot!

One question about the bijections $\tau_{i}$

How to create demo video

Again, congratulations, and thank you for the great work!

I'm also curious about how you created the demo videos, where the tracking trajectories seem consistent in the 3D canonical frame and move as the camera moves. This indicates that OmniMotion has a good understanding of the camera motion. However, the paper suggests that the camera motion is entangled with object motion, making it difficult to render the demo videos unless I misunderstood something.

Please correct my mistakes. Thank you very much.

how to use args.distributed

Training and evaluating model on TAP-Vid DAVIS produces different results

Hello,
I am trying to reproduce OmniMotion results on TAP-Vid DAVIS. I preprocessed and trained the models using the default configs (except for using num_iters=200_000). However, when evaluating the trained models I am getting d_avg=63.5%, which is lower compared to 67.5% outlined in the paper. (Further elaboration of my training & evaluation process is described below).

Therefore I wanted to ask: do the default hyperparameters and configurations in the repo match the reported model?
Also, do you have any code for evaluating OmniMotion on TAP-Vid? I had to write some code on my own (which I verified and fairly trust), but I think that using your evaluation pipeline would be more reliable. :)

Thank you in advance!
Assaf
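For readers comparing numbers, the position-accuracy metric discussed here can be sketched as follows. This is a simplified illustration, not the authors' or the benchmark's evaluation code. It assumes `d_avg` is the fraction of visible points within a pixel threshold, averaged over the thresholds 1, 2, 4, 8 and 16, and that coordinates have already been rescaled to the evaluation resolution. It also pools all samples together, whereas a benchmark implementation may average per video first.

```python
import math


def delta_avg(pred, gt, visible, thresholds=(1, 2, 4, 8, 16)):
    """Mean, over pixel thresholds, of the fraction of visible points within each threshold.

    pred, gt: sequences of (x, y) positions, one per (track, frame) sample.
    visible:  sequence of booleans marking ground-truth visibility.
    """
    # Only samples that are visible in the ground truth are scored.
    dists = [math.dist(p, g) for p, g, v in zip(pred, gt, visible) if v]
    if not dists:
        raise ValueError("no visible points to evaluate")
    per_threshold = [sum(d < t for d in dists) / len(dists) for t in thresholds]
    return sum(per_threshold) / len(per_threshold)
```

Small differences in these choices (pooling versus per-video averaging, evaluation resolution, strict versus non-strict thresholds) can shift the final number, so they are worth checking when results do not match.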

Set up for 3080ti

Hi there, I see you trained on an A100. I only have access to a 3080 Ti. Should I not even bother, or should I just reduce the number of sampled points num_pts and the chunk size chunk_size?

I have a secondary question: are there any resources you can point to for getting this set up on a Windows machine? I'm not quite sure what I need to do exactly.

Thanks!

Can multiple data be trained?

I found that the training procedure in the introduction trains only a single video sequence. How can I train on a batch of video sequences? Should I put all the data in the same path, e.g., merge bear/color and butterfly/color into bear_butterfly/color?
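Since the representation is optimized per video sequence, one plausible approach is to keep each sequence in its own directory and launch one training run per directory, rather than merging frames. The sketch below only builds the command lines. It reuses the `train.py` invocation quoted elsewhere in these issues and assumes each sequence directory contains a `color/` subfolder. The function name and the `data_root` layout are assumptions.

```python
import os


def build_train_commands(data_root, config="configs/default.txt"):
    """Return one train.py command per sequence directory found under data_root."""
    commands = []
    for name in sorted(os.listdir(data_root)):
        seq_dir = os.path.join(data_root, name)
        # Treat a directory as a sequence only if it has a color/ folder of frames.
        if os.path.isdir(os.path.join(seq_dir, "color")):
            commands.append(
                ["python", "train.py", "--config", config, "--data_dir", seq_dir]
            )
    return commands
```

Each command could then be run in turn with `subprocess.run(cmd, check=True)`, or distributed across GPUs by setting `CUDA_VISIBLE_DEVICES` per process.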

How is each frame's local volume Li generated?

Understanding Eq. 1 and 2

Congratulations on achieving this great work! The demo and results are very impressive, and it has been a big hit! I really like the idea of using a quasi-3D representation and ignoring the ambiguities because they are not important to the problem.

I'm trying to understand Eq. 1 and 2 from the paper and can't understand why we use the same points in the source $x_i^k$ and target frame $x_j^k=\mathcal{T}_j^{-1}\circ\mathcal{T}_i(x_i^k)$ and hope I can get some clarifications.

In my understanding, if the points $x_j^k$ are the same points as $x_i^k$ in the canonical frame, then the occlusion relationship would not change across frames as the camera ray still passes through the same set of points in the same order. Since $\sigma_k$ is stored in $G$ and does not change across frames, I don't understand why OmniMotion can handle occlusions.

So my question is: why are we computing $x_j^k$ as $\mathcal{T}_j^{-1}\circ\mathcal{T}_i(x_i^k)$ instead of sampling from a new ray in the $j$-th frame and mapping that to the same canonical space? Why does the model work so well even though $M_\theta$ cannot change the occlusion relationship?
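To make the question concrete, the compositing step under discussion can be sketched as below. This is my reading of Eq. 2, not the repository's implementation: it assumes standard front-to-back alpha compositing with $\alpha_k = 1 - \exp(-\sigma_k)$, applied to the mapped samples $x_j^k$ in the order they were sampled along the ray in frame $i$. The function name is hypothetical.

```python
import math


def composite(points, sigmas):
    """Alpha-composite mapped samples x_j^k, ordered front to back along the source ray."""
    transmittance = 1.0  # Fraction of the ray not yet absorbed by earlier samples.
    out = [0.0] * len(points[0])
    for point, sigma in zip(points, sigmas):
        alpha = 1.0 - math.exp(-sigma)
        weight = transmittance * alpha
        out = [o + weight * c for o, c in zip(out, point)]
        transmittance *= 1.0 - alpha
    return out
```

Under this reading, the weights depend only on the densities along the ray from frame $i$, which is exactly the property the question is asking about: the first high-density sample dominates the result regardless of where it lands in frame $j$.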

How long does one training task cost?

Thank you for your work! In the paper, you said "We train our representation on each video sequence with the Adam optimizer for 200k iterations". If the video has 100 frames, how long does training take?

The TAPNet loader Module

Hi,
Thanks for your excellent work!
I want to try the TAPNet correspondence method, but it seems that you didn't release the loader code for it.
Do you plan to release that code? Or can you give some suggestions on how to build the TAPNet correspondence loader, such as what data I need and what preprocessing I need to do?
Looking forward to your reply!
Yudi Shi

Use Case: Temporally Coherent Pose Estimation

Frameworks such as Mediapipe or OpenPose are used to extract skeletal keypoints from images.
Unfortunately, the results are inconsistent and somewhat jittery when trying to extract poses from consecutive frames.

I propose a use case supported by omnimotion when released:

Extract poses for an initial frame using mediapipe. Perhaps even for the whole video.
Track the keypoints across frames. Prefer omnimotion's tracking. If omnimotion and mediapipe diverge, fall back to the mediapipe pose and continue tracking from there.

This idea is similar to how the video codecs used in MP4 files work: the mediapipe poses are treated as gold-standard I-frames (keyframes), and the poses propagated by omnimotion act as P-frames for as long as they stay consistent. When the data in a P-frame is no longer consistent, introduce another I-frame.
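The fallback logic in the two steps above could be sketched roughly as follows. Everything here is hypothetical: `track_step` stands in for whatever tracker propagates points from one frame to the next, `detected` stands in for per-frame detector output, and the 10-pixel divergence threshold is arbitrary.

```python
import math


def fuse_poses(detected, track_step, max_dist=10.0):
    """Prefer tracked keypoints; reset to the detector when the two diverge.

    detected:   list of per-frame keypoint lists [(x, y), ...] from a pose detector.
    track_step: callable (points, t) -> points in frame t + 1 (stand-in for a tracker).
    Returns the fused per-frame keypoints and the indices of frames used as resets.
    """
    fused = [list(detected[0])]
    keyframes = [0]  # Frames where the detector output was taken as-is.
    for t in range(1, len(detected)):
        tracked = track_step(fused[-1], t - 1)
        diverged = any(
            math.dist(p, q) > max_dist for p, q in zip(tracked, detected[t])
        )
        if diverged:
            fused.append(list(detected[t]))
            keyframes.append(t)
        else:
            fused.append(list(tracked))
    return fused, keyframes
```

A per-keypoint reset, or a threshold scaled by the person's size in the image, might work better in practice. This version resets the whole pose at once for simplicity.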

test time optimization

Why is it called a test-time optimization method? I cannot find anything that is optimized at the test stage.

Will the model weights and testing code be open-sourced

Hello,

I recently came across your project and I must say, it's an amazing piece of work. I was wondering if there are any plans to release the model weights and testing code as open source. This would be extremely helpful for those of us who would like to test the model's capabilities directly. With the weights and testing code available, we could simply input a video and see the results without having to train the model ourselves.

Thank you for your hard work and I look forward to your response.

Evaluate the trained checkpoints and the provided checkpoints, and the results of the metrics are inconsistent

Hello！
I modified the default configuration following this comment (https://github.com/qianqianwang68/omnimotion/issues/37#issuecomment-1856324840). There is a discrepancy between the metrics of my trained checkpoints and those of the provided checkpoints. Taking the blackswan sequence as an example, this is the metric measured with the provided checkpoint:

This is the metric measured with my trained checkpoint:

In addition, the training loss first trended downward and then upward:

Besides the changes in the above comment, are there any other configurations that need to be modified to reproduce the results in the paper?

preparation of custom data set

Installation problems

Could not find a version that satisfies the requirement torchvision==0.11.0+cu111. Could you please check it? Thanks.

Are we going to see pretrained W&B?

Are there any plans on releasing the pre-trained weights and biases of this model?

Train all frames or sample some?

Hi,

Congratulations on the amazing work!

I wonder whether it is possible to sample a subset of frames for training when I want to track through all frames of a video sequence. Or do I need to train on all frames of a long video, which takes a lot of time?

Error reported during training

When running the command
python train.py --config configs/default.txt --data_dir {sequence_directory}
the following error is reported, even when using the processed data you provided:
No ckpts found, from scratch...
Traceback (most recent call last):
  File "train.py", line 105, in <module>
    train(args)
  File "train.py", line 81, in train
    for batch in data_loader:
  File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/BHX/omnimotion/omnimotion/loaders/raft.py", line 123, in __getitem__
    count_map = imageio.imread(os.path.join(self.seq_dir, 'count_maps', img_name1.replace('.jpg', '.png')))
  File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/imageio/__init__.py", line 97, in imread
    return imread_v2(uri, format=format, **kwargs)
  File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/imageio/v2.py", line 360, in imread
    result = file.read(index=0, **kwargs)
  File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/imageio/plugins/pillow.py", line 231, in read
    image = self._apply_transforms(
  File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/imageio/plugins/pillow.py", line 312, in _apply_transforms
    image.mode = desired_mode
AttributeError: can't set attribute
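The traceback ends inside imageio's Pillow plugin, where it assigns to `image.mode`. One plausible cause, which is an assumption on my part and not confirmed in this thread, is a version mismatch between imageio and Pillow in the environment. A quick first check is to print the installed versions. The helper name below is made up for illustration.

```python
from importlib import metadata


def installed_versions(package_names):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print(installed_versions(["imageio", "Pillow"]))
```

If the two packages were installed at different times, upgrading imageio or pinning Pillow to an older release would be a reasonable thing to try, while keeping the rest of the environment unchanged.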

A question about training

This work is great. Thank you for posting the code.
I have encountered the following error when applying the code you have made public and training on my own dataset. I followed the default parameters you provided except for changing the dataset paths.

What confuses me is that there were no problems during training before the 100,000 steps.

Looking forward to your reply！Thank you.

Amazing results! Looking for the demo

Hi @qianqianwang68 ,

The results are so good! This tight correspondence can enable many interesting applications for video.

I was wondering if you plan to release the demo/inference code in the near future by any chance?

Question about the depth consistency loss

Hi, thanks for this amazing work.
I read the code in trainer.py and I find a depth consistency loss method is defined but not used in training.

    def compute_depth_consistency_loss(self, proj_depths, pred_depths, visibilities, normalize=True):
        '''
        :param proj_depths: [n_imgs, n_pts, 1]
        :param pred_depths: [n_imgs, n_pts, 1]
        :param visibilities: [n_imgs, n_pts, 1]
        :return: depth loss
        '''
        if normalize:
            mse_error = torch.mean((proj_depths - pred_depths) ** 2 * visibilities) / (torch.mean(visibilities) + 1e-6)
        else:
            mse_error = torch.mean((proj_depths - pred_depths) ** 2 * visibilities)
        return mse_error

It seems that you tried to supervise the model with the consistency between the depth value mapped from frame $i$ to frame $j$ and the depth value predicted in frame $j$. I wonder why you eventually abandoned this loss in training.

In addition, do you think it's possible to add some depth supervision terms to produce more reliable depth information instead of using pseudo-depth, and would this enhance the performance?
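For reference, the quoted method with `normalize=True` computes a visibility-weighted mean squared error. Written out, with $N = n_{\text{imgs}} \times n_{\text{pts}}$ entries, $v_n$ the visibility, and $d_n^{\text{proj}}$, $d_n^{\text{pred}}$ the projected and predicted depths (the symbol names are mine, chosen to mirror the argument names):

$$
\mathcal{L}_{\text{depth}} = \frac{\frac{1}{N}\sum_{n=1}^{N} v_n \left(d_n^{\text{proj}} - d_n^{\text{pred}}\right)^2}{\frac{1}{N}\sum_{n=1}^{N} v_n + 10^{-6}}
$$

With `normalize=False`, only the numerator is returned, so the loss shrinks as fewer points are visible.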

Looking forward to your reply!

visualisation of scene flow

Hi guys,

Congrats on the great work!
I am interested in visualising the scene flow. Could you provide some guidelines on how can I achieve that easily?

Cheers.

Dears, when will you release the code and the model?

Failing to reproduce. Is there an ETA on submission of code?

Hi, fantastic paper. I believe it's going to be a huge success. I am reading the paper and having trouble reproducing your results on basic low-speed movements.

Can you maybe help me with the preprocessing steps for the video?
Are there any additional methods used to train the model that are not mentioned in the paper?

I have reached out to you on LinkedIn as well.

Easily integrated with Segment anything?

With Segment Anything, we can cut out any masked object at any given frame. There is even already a project called Track Anything that applies Segment Anything to video.

But it fails at motion tracking tasks with occlusion. Example here: https://github.com/gaomingqi/Track-Anything/blob/master/README.md

It fails at occluded pixels because Segment Anything does not have pseudo-depth embedded in the model, but OmniMotion does. OmniMotion's latent space would be a perfect match as a prior, so that we could even show segmented masks for parts of objects that are partially occluded. This would enable more accurate image inpainting, such as replacing motion capture actors with movie characters. But I don't think there is a way to fine-tune Segment Anything for this, because it essentially lacks a NeRF-like representation.

Final question: do you know of any work on tracking objects/points in NeRF, like OmniMotion for NeRF?

CUDA out of memory when running the train.py

This computer has 10 GPUs and still runs out of memory after enabling distributed training.
I have set args.distributed=1, and
os.environ['RANK'] = '0'
os.environ['WORLD_SIZE'] = '1'
os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '7356'
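One thing worth noting about the settings above: in PyTorch's environment-variable initialization, `WORLD_SIZE` is the total number of processes and `RANK` is the index of each process, so `WORLD_SIZE='1'` describes a single process. The sketch below shows what the per-process environments would look like for several processes on one machine. How `train.py` maps ranks to GPUs is not shown in this thread, so treat the helper (a hypothetical name) as illustrative only.

```python
def single_node_envs(num_procs, master_addr="localhost", master_port=7356):
    """Build one environment-variable dict per process for a single-machine run."""
    envs = []
    for rank in range(num_procs):
        envs.append({
            "RANK": str(rank),                 # Index of this process.
            "WORLD_SIZE": str(num_procs),      # Total number of processes.
            "MASTER_ADDR": master_addr,        # Same for every process.
            "MASTER_PORT": str(master_port),   # Same for every process.
        })
    return envs
```

In practice a launcher such as `torchrun` sets these variables for each process automatically, which avoids hard-coding them in the script.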

Hello! This is a question about how to perform online operations after training is complete.

Congratulations! What you have achieved is truly remarkable. After running the code you provided, I have a question. Could you please guide me on how to run interactive testing locally, similar to the swing demo on your homepage, where points can be marked and tracked directly?

question about formula (2)

Hi, thank you for releasing the code for this work. I have a question about formula (2). When the points sampled along a ray in the camera space of frame i are mapped to frame j, they may no longer lie on a straight line, so why can the pixel coordinates and corresponding color be obtained using NeRF-style accumulation?

Particle Tracking Results

Hi,

I have been running some tests with omnimotion for a microgravity research project. Specifically, we are interested in its ability to track falling particles. I took a sample video of larger balls falling to see how omnimotion performs, and the results are not great. I tried the video both slowed down and at regular speed. Do you know of any configuration changes that could produce better results for this type of video? I used the default configuration file.

https://drive.google.com/file/d/1yx-oDuu_4quwm0bf9t6tfwVE588TjItV/view?usp=sharing
https://drive.google.com/file/d/1WRzMdiCTwDibMqbDTsDdFDc4OHZQ6KHc/view?usp=sharing

Online or Offline?

Dear authors,

Thank you very much for your outstanding contributions in the field of object tracking!

I have a small question. Is the algorithm you proposed based on online object tracking or offline object tracking? Or does it support both?

Looking forward to your reply!

Thanks!

How is it different from DeepMind's TapNet?

https://github.com/deepmind/tapnet#tapir-demos

DeepMind has similar work that uses the same test images as yours. How is your work fundamentally different from DeepMind's?

I am a hobbyist, so I would be grateful if you could spend some time briefly explaining the differences in purpose, approach, and results.

Thanks!

Affine Transformation

Hi,

I see that you are applying a learned affine transformation for all (x,y) in the same frame and depth before feeding (x,y,z) into the deformation MLP.

I didn't find any information about it in the paper. Can you explain the motivation for having this transformation?

installing and running omnimotion with only cpu

Hello! Thank you for sharing this code - omnimotion looks really promising and I would love to test it out with some of my microscope videos. But having tried installing it, it seems that I cannot run CUDA on my computer. Is it possible to run omnimotion with cpu only?

the frame resolution when evaluating on TAP-Vid

Hi, thanks for your excellent open-source repositories.
I'd like to confirm that when testing on TAP-Vid, all the data preprocessing steps, such as extracting optical flow and DINO features, are done on 256x256 video frames. Is that right?

3D Information

Your model can track on z axis as well.
