qianqianwang68 / omnimotion
License: Apache License 2.0
This may be a silly question, but I am looking to use this on videos, yet the expected input structure seems to be only .jpg files. I wrote the scripts below to downscale the videos and then convert them to images, since that is the structure described in the preprocessing README. Does this make sense, or should I be training on the videos directly?
```python
import cv2
import os
from moviepy.editor import VideoFileClip


def downscale_video(video_path, output_path):
    cap = cv2.VideoCapture(video_path)
    fourcc = cv2.VideoWriter_fourcc(*"mp4v")
    out = cv2.VideoWriter(output_path, fourcc, 30.0, (640, 480))
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        resized_frame = cv2.resize(frame, (640, 480))
        out.write(resized_frame)
    cap.release()
    out.release()


def extract_60_frame_clips(video_path, output_dir, video_name):
    with VideoFileClip(video_path) as clip:
        duration = clip.duration
        fps = clip.fps
        n_frames = int(fps * duration)
        for i in range(0, n_frames, 60):
            start = i / fps
            end = min((i + 60) / fps, duration)
            short_clip = clip.subclip(start, end)
            short_clip.write_videofile(f"{output_dir}/{video_name}_{i//60}.mp4")


input_dir = "input"
output_dir = "output_clips"
temp_dir = "temp"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)
if not os.path.exists(temp_dir):
    os.makedirs(temp_dir)
for video_file in os.listdir(input_dir):
    if video_file.endswith(".mov"):
        video_path = os.path.join(input_dir, video_file)
        video_name = os.path.splitext(video_file)[0]
        temp_path = os.path.join(temp_dir, f"{video_name}_temp.mp4")
        downscale_video(video_path, temp_path)
        extract_60_frame_clips(temp_path, output_dir, video_name)


def extract_frames(video_path, output_folder, sequence_name):
    cap = cv2.VideoCapture(video_path)
    frame_count = 0
    saved_frame_count = 0  # Counter for saved frames
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if frame_count % 10 == 0:  # Save only every 10th frame
            frame_filename = os.path.join(
                output_folder, f"{sequence_name}_{saved_frame_count:05d}.jpg"
            )
            cv2.imwrite(frame_filename, frame)
            saved_frame_count += 1  # Increment the counter for saved frames
        frame_count += 1  # Increment the overall frame counter
    cap.release()


video_folder = "output_clips"  # Your output directory containing MP4 clips
base_folder = "sequence_name"  # Base folder for sequences
color_folder = os.path.join(base_folder, "color")  # Folder to save JPEG frames
os.makedirs(color_folder, exist_ok=True)
for video_file in os.listdir(video_folder):
    if video_file.endswith(".mp4"):
        video_path = os.path.join(video_folder, video_file)
        sequence_name = os.path.splitext(video_file)[0]
        extract_frames(video_path, color_folder, sequence_name)
```
Is this proper practice for this? Sorry, this is my first CNN! Thanks for any help.
The provided checkpoint is missing the parameters for the affine_mlp model in nvp_simplified.py.
I have installed vision_transformer from this site:
https://github.com/google-research/vision_transformer/tree/main
but I still get the error:
ImportError: cannot import name 'RAFT' from 'raft' (/home/piotr/anaconda3/envs/omnimotion/lib/python3.8/site-packages/raft/__init__.py)
Traceback (most recent call last):
File "extract_dino_features.py", line 27, in <module>
import vision_transformer as vits
ModuleNotFoundError: No module named 'vision_transformer'
Do you have any idea what I did wrong?
I am trying to preprocess my own sequences.
Thanks for sharing this awesome work. How can we recover the transformation matrix between two points, or is there a way to numerically recover the trajectory of a point across the frames?
Thanks in advance.
It seems that every image sequence has its own weights. I look forward to improvements to this repo.
I am trying to train on some videos of mosquitoes and am doing some preprocessing. I am running into:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 12.00 GiB total capacity; 9.91 GiB already allocated; 0 bytes free; 11.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
I am on a 3080 Ti. In config.py I reduced the default number of points to 20 and the chunk size to a measly 100, yet I still get memory errors. Any suggestions? I ran nvidia-smi and nothing else is hogging the GPU. I am trying to squeeze this down because I can't afford an A100!
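The error message above names one knob worth trying, `max_split_size_mb` in `PYTORCH_CUDA_ALLOC_CONF`. A minimal sketch of setting it from Python follows. The helper name and the value 128 are assumptions for illustration, not a recommendation from the authors, and the variable has to be set before CUDA is initialized (simplest: before `import torch`). It can only reduce fragmentation; it will not help if the run genuinely needs more than the card's 12 GiB.

```python
import os


def configure_cuda_allocator(max_split_size_mb: int = 128) -> str:
    """Set PYTORCH_CUDA_ALLOC_CONF; call this before CUDA is initialized.

    The helper name and default value are hypothetical, chosen for illustration.
    """
    value = f"max_split_size_mb:{max_split_size_mb}"
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = value
    return value


configure_cuda_allocator(128)
# import torch  # import torch only after the variable has been set
```

The same setting can be exported in the shell before launching the training script, which avoids touching the code at all.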
This research looks amazing.
Could this be used to export Nuke-like SmartVectors to 32-bit EXRs?
Could it be used for camera matchmove?
Very excited for how this could be applied to visual effects.
Congratulations, thank you very much for your amazing work.
When I try to compute and process flow, I run into some problems. Here is the command I ran:
python main_processing.py --data_dir /home/omnimotion/data/01_0 --chain
This is the error message:
Traceback (most recent call last):
File "exhaustive_raft.py", line 18, in <module>
from raft import RAFT
ModuleNotFoundError: No module named 'raft'
Traceback (most recent call last):
File "extract_dino_features.py", line 26, in <module>
import utils
ModuleNotFoundError: No module named 'utils'
flitering raft optical flow for /home/omnimotion/data/01_0....
0%| | 0/54056 [00:00<?, ?it/s]Traceback (most recent call last):
File "filter_raft.py", line 125, in <module>
run_filtering(args)
File "filter_raft.py", line 50, in run_filtering
features = [torch.from_numpy(np.load(os.path.join(scene_dir, 'features', feature_name,
File "filter_raft.py", line 50, in <listcomp>
features = [torch.from_numpy(np.load(os.path.join(scene_dir, 'features', feature_name,
File "/home/anaconda3/envs/omnimotion/lib/python3.8/site-packages/numpy/lib/npyio.py", line 405, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/home/omnimotion/data/01_0/features/dino/000.jpg.npy
This is the structure of the folder (/home/omnimotion/data/01_0 ):
Thanks very much
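Reading the log above, the final FileNotFoundError looks like a downstream symptom: the two earlier steps stopped on import errors, so no feature files would have been written before filter_raft.py tried to load them. A small sketch for checking this directly is shown below. It assumes frames live in a `color` sub-folder (as in the preprocessing layout mentioned elsewhere in these threads) and that features are stored as `features/<feature_name>/<frame>.npy`, which is the pattern visible in the traceback; the function name is hypothetical.

```python
import os


def find_missing_features(scene_dir, feature_name="dino", image_dir="color"):
    """Return frame file names that have no '<frame>.npy' under features/<feature_name>.

    The 'color' folder name is an assumption about the sequence layout.
    """
    frames = sorted(
        f for f in os.listdir(os.path.join(scene_dir, image_dir))
        if f.lower().endswith((".jpg", ".png"))
    )
    feature_dir = os.path.join(scene_dir, "features", feature_name)
    return [
        f for f in frames
        if not os.path.isfile(os.path.join(feature_dir, f + ".npy"))
    ]
```

Calling `find_missing_features("/home/omnimotion/data/01_0")` would list every frame whose DINO feature file is absent; if it returns all frames, the feature-extraction step never ran successfully.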
I'm excited for this code to be shared. When do you think that will be?
Amazing work - really enjoyed reading the paper and extremely intrigued by the results.
I noted the citation in the readme. Can we add a license to ensure proper usage of the code?
Thanks for all this work!
I have a lot of videos to process, so I would like to ask how to speed up training. Some loss in performance is acceptable. Thank you!
Again, congratulations, and thank you for the great work!
I'm also curious about how you created the demo videos, where the tracking trajectories seem consistent in the 3D canonical frame and move as the camera moves. This indicates that OmniMotion has a good understanding of the camera motion. However, the paper suggests that the camera motion is entangled with object motion, making it difficult to render the demo videos unless I misunderstood something.
Please correct my mistakes. Thank you very much.
Hello,
I am trying to reproduce OmniMotion results on TAP-Vid DAVIS. I preprocessed and trained the models using the default configs (except for using num_iters=200_000). However, when evaluating the trained models I am getting d_avg=63.5%, which is lower than the 67.5% reported in the paper. (Further elaboration of my training & evaluation process is described below).
Therefore I wanted to ask: do the default hyperparameters and configurations in the repo match the reported model?
Also, I wanted to ask whether you have any code for evaluating OmniMotion on TAP-Vid. I had to write some code on my own (which I verified and fairly trust), but I think using your evaluation pipeline would be more reliable. :)
Thank you in advance!
Assaf
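For readers comparing numbers, the position-accuracy metric referred to as d_avg is commonly computed as the fraction of visible points within 1, 2, 4, 8 and 16 pixels of the ground truth, averaged over those five thresholds. The sketch below is a simplified NumPy version under that assumption. It is not the authors' evaluation code, the function name is hypothetical, and it leaves out details of the official TAP-Vid protocol such as excluding query frames, evaluating at 256x256 resolution, and averaging per video.

```python
import numpy as np


def delta_avg(pred, gt, visible, thresholds=(1, 2, 4, 8, 16)):
    """Mean over thresholds of the fraction of visible points closer than each threshold.

    pred, gt: arrays of shape [n_points, n_frames, 2], in pixels.
    visible:  bool array of shape [n_points, n_frames], True where the
              ground-truth point is visible.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    visible = np.asarray(visible, dtype=bool)
    dist = np.linalg.norm(pred - gt, axis=-1)  # per-point, per-frame error
    n_visible = visible.sum()
    if n_visible == 0:
        raise ValueError("no visible ground-truth points")
    fractions = [((dist < t) & visible).sum() / n_visible for t in thresholds]
    return float(np.mean(fractions))
```

Differences in any of the omitted protocol details (resolution, query-frame handling, per-video averaging) can shift the result by a few points, so they are worth checking before attributing a gap to hyperparameters.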
Hi there, I see you trained on an A100. I only have access to a 3080 Ti. Should I not even bother, or should I just reduce the number of sampled points (num_pts) and the chunk size (chunk_size)?
A secondary question: are there any resources you can point to for getting this set up on a Windows machine? I am not quite sure what I need to do exactly.
Thanks!
I found that the training instructions in the README train only a single video sequence. How do I train on a batch of video sequences? Should I put all the data under the same path, e.g. merge bear/color and butterfly/color into bear_butterfly/color?
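Other posts in this collection indicate that each sequence is optimized separately, so one plausible alternative to merging folders is to launch one training run per sequence directory. The sketch below only builds the command lines. It reuses the `train.py --config ... --data_dir ...` invocation quoted elsewhere in these threads and assumes each sequence folder contains a `color` sub-folder; the helper name is hypothetical.

```python
import os
import sys


def build_train_commands(data_root, config="configs/default.txt"):
    """Build one train.py command per sub-directory of data_root that has a color/ folder."""
    commands = []
    for name in sorted(os.listdir(data_root)):
        seq_dir = os.path.join(data_root, name)
        if os.path.isdir(os.path.join(seq_dir, "color")):
            commands.append(
                [sys.executable, "train.py", "--config", config, "--data_dir", seq_dir]
            )
    return commands
```

From the repository root, each command could then be run in turn with `subprocess.run(cmd, check=True)`, or distributed across GPUs by a job scheduler.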
Congratulations on achieving this great work! The demo and results are very impressive, and it has been a big hit! I really like the idea of using a quasi-3D representation and ignoring the ambiguities because they are not important to the problem.
I'm trying to understand Eq. 1 and 2 from the paper and can't understand why we use the same points in the source
In my understanding, if the points
So my question is, why are we computing
Thank you for your work! In the paper, you say "We train our representation on each video sequence with the Adam optimizer for 200k iterations". If the video has 100 frames, how long does training take?
Hi,
Thanks for your excellent work!
I want to try the TAPNet correspondence method, but it seems that you haven't released the loader code for it.
Do you plan to release that code? If not, could you give some suggestions on how to build a TAPNet correspondence loader, such as what data I need and what preprocessing is required?
Looking forward to your reply!
Yudi Shi
Frameworks such as Mediapipe or OpenPose are used to extract skeletal keypoints from images.
Unfortunately, the results are inconsistent and somewhat jittery when trying to extract poses from consecutive frames.
I propose a use case supported by omnimotion when released:
omnimotion's tracking. If omnimotion and mediapipe diverge, fall back to the mediapipe pose and continue tracking from there. This idea, similarly to how MP4 files work, considers P-frames as gold, mediapipe poses, and I-frames, as long as consistent, from omnipose. When the data stored in the I-frame is no longer consistent, introduce another P-frame.
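The fallback rule described above can be sketched in a few lines. Everything here is hypothetical: `track(points, src, dst)` stands in for whatever interface a tracker would expose for carrying points from one frame to another, `detections` stands in for per-frame pose-detector output, and the 10-pixel threshold is arbitrary.

```python
import numpy as np


def smooth_poses(detections, track, max_dist=10.0):
    """Propagate keypoints from an anchor frame and re-anchor when they diverge.

    detections: sequence of [n_joints, 2] arrays, one per frame (detector output).
    track:      hypothetical callable track(points, src, dst) -> [n_joints, 2].
    Returns (poses, anchors): the per-frame poses and the frames used as anchors.
    """
    anchor = 0
    poses = [np.asarray(detections[0], dtype=float)]
    anchors = [0]
    for t in range(1, len(detections)):
        det = np.asarray(detections[t], dtype=float)
        tracked = np.asarray(track(poses[anchor], anchor, t), dtype=float)
        if np.linalg.norm(tracked - det, axis=-1).mean() > max_dist:
            # Tracker and detector disagree: trust the detector and
            # continue tracking from this frame.
            poses.append(det)
            anchor = t
            anchors.append(t)
        else:
            poses.append(tracked)
    return poses, anchors
```

Between anchors the output follows the tracker, which is what would remove the frame-to-frame jitter; the detector is only consulted to decide when a new anchor is needed.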
Why is it called a test-time optimization method? I cannot find anything that is optimized at the test stage.
Hello,
I recently came across your project and I must say, it's an amazing piece of work. I was wondering if there are any plans to release the model weights and testing code as open source. This would be extremely helpful for those of us who would like to test the model's capabilities directly. With the weights and testing code available, we could simply input a video and see the results without having to train the model ourselves.
Thank you for your hard work and I look forward to your response.
Hello!
I modified the default configuration as described in this comment (https://github.com/qianqianwang68/omnimotion/issues/37#issuecomment-1856324840). There is a discrepancy between the metrics of my trained checkpoints and those of the provided ones. Taking the blackswan sequence as an example, this is the metric measured with the provided checkpoint:
This is the metric measured by the trained checkpoint:
In addition, the loss function of the training process also showed a downward trend and then an upward trend:
I would like to ask whether any other configuration needs to be modified to reproduce the results in the paper, in addition to the changes in the above comment.
Could not find a version that satisfies the requirement torchvision==0.11.0+cu111. Could you please check it? Thanks.
Are there any plans on releasing the pre-trained weights and biases of this model?
Hi,
Congratulations on the amazing work!
Is it possible to train on a sampled subset of frames when I want to track across all frames of a video sequence? Or do I need to train on every frame of a long video, which takes a lot of time?
When running a command
python train.py --config configs/default.txt --data_dir {sequence_directory}
Even when using the processed data you provided, the following error will be reported
No ckpts found, from scratch...
Traceback (most recent call last):
File "train.py", line 105, in <module>
train(args)
File "train.py", line 81, in train
for batch in data_loader:
File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
data = self._next_data()
File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
return self._process_data(data)
File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
data.reraise()
File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/torch/_utils.py", line 434, in reraise
raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/home/BHX/omnimotion/omnimotion/loaders/raft.py", line 123, in __getitem__
count_map = imageio.imread(os.path.join(self.seq_dir, 'count_maps', img_name1.replace('.jpg', '.png')))
File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/imageio/__init__.py", line 97, in imread
return imread_v2(uri, format=format, **kwargs)
File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/imageio/v2.py", line 360, in imread
result = file.read(index=0, **kwargs)
File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/imageio/plugins/pillow.py", line 231, in read
image = self._apply_transforms(
File "/home/BHX/anaconda3/envs/omnimotion/lib/python3.8/site-packages/imageio/plugins/pillow.py", line 312, in _apply_transforms
image.mode = desired_mode
AttributeError: can't set attribute
This work is great. Thank you for posting the code.
I have encountered the following error when applying the code you have made public and training on my own dataset. I followed the default parameters you provided except for changing the dataset paths.
What confuses me is that there were no problems during training before the 100,000 steps.
Looking forward to your reply! Thank you.
Hi @qianqianwang68 ,
The results are so good! This tight correspondence can enable many interesting applications for video.
I was wondering if you plan to release the demo/inference code in the near future by any chance?
Hi, thanks for this amazing work.
I read the code in trainer.py and I find a depth consistency loss method is defined but not used in training.
    def compute_depth_consistency_loss(self, proj_depths, pred_depths, visibilities, normalize=True):
        '''
        :param proj_depths: [n_imgs, n_pts, 1]
        :param pred_depths: [n_imgs, n_pts, 1]
        :param visibilities: [n_imgs, n_pts, 1]
        :return: depth loss
        '''
        if normalize:
            mse_error = torch.mean((proj_depths - pred_depths) ** 2 * visibilities) / (torch.mean(visibilities) + 1e-6)
        else:
            mse_error = torch.mean((proj_depths - pred_depths) ** 2 * visibilities)
        return mse_error
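To see what the `normalize` flag changes, here is a NumPy restatement of the same arithmetic on toy values. It is an illustration only, not the repository's code path: the function name and the numbers are made up, and the shapes follow the docstring above.

```python
import numpy as np


def masked_depth_mse(proj_depths, pred_depths, visibilities, normalize=True, eps=1e-6):
    """Visibility-masked MSE, optionally rescaled by the visible fraction."""
    sq_err = (proj_depths - pred_depths) ** 2 * visibilities
    loss = sq_err.mean()
    if normalize:
        # Dividing by mean(visibilities) turns "mean over all points"
        # into (approximately) "mean over visible points only".
        loss = loss / (visibilities.mean() + eps)
    return float(loss)


proj = np.array([[[1.0], [2.0]]])  # [n_imgs=1, n_pts=2, 1]
pred = np.array([[[1.5], [4.0]]])
vis = np.array([[[1.0], [0.0]]])   # the second point is treated as occluded

print(masked_depth_mse(proj, pred, vis, normalize=False))
print(masked_depth_mse(proj, pred, vis, normalize=True))
```

With these values the unnormalized loss is 0.125 and the normalized loss is approximately 0.25, the squared error of the single visible point. Without normalization, the loss shrinks as more points become occluded even if the visible ones are no better.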
It seems that you tried to supervise the model with the consistency between the depth value mapped from frame
In addition, do you think it's possible to add some depth supervision terms to produce more reliable depth information instead of using pseudo-depth and will this enhance the performance?
Looking forward to your reply!
Hi guys,
Congrats on the great work!
I am interested in visualising the scene flow. Could you provide some guidelines on how can I achieve that easily?
Cheers.
Dears, when will you release the code and the model?
Hi, fantastic paper. I believe it's going to be a huge success. I am reading the paper and having some trouble reproducing your results on basic low-speed movements.
I have reached out to you on LinkedIn as well.
With Segment Anything, we can cut out any masked object at any given frame. There is even already a project called Track Anything that applies Segment Anything to video.
But it fails at motion tracking tasks with occlusion. Example here: https://github.com/gaomingqi/Track-Anything/blob/master/README.md
It fails at occluded pixels because Segment Anything does not have pseudo-depth embedded in the model, but Omnimotion does. Omnimotion's latent space would be a perfect match as a prior, so that we could even show segmented masks for parts of objects that are partially occluded. This would enable more accurate image inpainting, such as replacing motion capture actors with movie characters. But I don't think there is a way to fine-tune Segment Anything, because it essentially lacks a NeRF-like representation.
Final question: do you know of any work on tracking objects/points in NeRF, like Omnimotion but in NeRF?
Hi, thank you for releasing the code for this work. I have a question about formula (2). For the points on the rays sampled in the camera space of frame i, when they are back-projected to frame j, they may no longer lie on a straight line, so why can pixel coordinates and the corresponding color be obtained using NeRF-style accumulation?
Hi,
I have been running some tests with omnimotion for a microgravity research project. Specifically, we are interested in its ability to track falling particles. I took a sample video of larger balls falling to see how omnimotion performs, and the results are not great. I tried the video both slowed down and at regular speed. Do you know of any configuration changes that could produce better results for this type of video? I used the default configuration file.
https://drive.google.com/file/d/1yx-oDuu_4quwm0bf9t6tfwVE588TjItV/view?usp=sharing
https://drive.google.com/file/d/1WRzMdiCTwDibMqbDTsDdFDc4OHZQ6KHc/view?usp=sharing
Dear authors,
Thank you very much for your outstanding contributions in the field of object tracking!
I have a small question. Is the algorithm you proposed based on online object tracking or offline object tracking? Or does it support both?
Looking forward to your reply!
Thanks!
https://github.com/deepmind/tapnet#tapir-demos
DeepMind has a similar work that uses the same test images as your work. How is your work fundamentally different from DeepMind's?
I am a hobbyist, so I would be grateful if you could spend some time briefly explaining the differences in purpose, approach, and results.
Thanks!
Hi,
I see that you are applying a learned affine transformation for all (x,y) in the same frame and depth before feeding (x,y,z) into the deformation MLP.
I didn't find any information about it in the paper. Can you explain the motivation for having this transformation?
Hello! Thank you for sharing this code - omnimotion looks really promising and I would love to test it out with some of my microscope videos. But having tried installing it, it seems that I cannot run CUDA on my computer. Is it possible to run omnimotion with cpu only?
Hi, thanks for your excellent open-source repositories.
I'd like to confirm that when testing on TAP-Vid, all the data preprocessing steps, such as extracting optical flow, DINO features, etc., are done on 256x256 video frames. Is that right?
Your model can track on z axis as well.