nihaomiao / CVPR23_LFDM
The PyTorch implementation of our CVPR 2023 paper "Conditional Image-to-Video Generation with Latent Flow Diffusion Models"
License: BSD 2-Clause "Simplified" License
@nihaomiao
Hello!
When I run python demo/demo_mug.py, the process does not respond.
Can you provide a Baidu Netdisk or Google Drive URL for the dataset (including images) and the model used for training?
Hi haomiao,
I tried to train the DM on the MHAD dataset using python -u DM/train_video_flow_diffusion_mhad.py and the released LFAE_MHAD.pth model. However, after about 40,000 iterations, the outputs in SAMPLE_DIR still look really strange, especially the third and fourth columns (sample_out_img and fake_grid). Could you please help me figure out whether this is normal? By the way, I am not very clear on the significance of comparing both generated["prediction"] (out_vid) and generated["deformed"] (warped_vid). Could you please give me some instructions? Thank you!
Thanks a lot
I used the pre-trained LFAE model you provided to run python -u LFAE/test_flowautoenc_natops.py, which fails with:
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
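A possible diagnostic (my assumption, not a confirmed cause): this error usually indicates a corrupted or partially downloaded checkpoint, since torch.save writes zip archives. A quick check, with the checkpoint filename as a hypothetical placeholder:

import zipfile

# Hypothetical checkpoint path; substitute the actual downloaded .pth file.
CKPT = "LFAE_NATOPS.pth"

# A valid PyTorch checkpoint is a zip archive, so it should pass this test;
# False suggests the download is truncated or corrupted and should be re-fetched.
print(zipfile.is_zipfile(CKPT))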
TOKENIZER = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-cased') fails with the following errors:
File "/usr/lib/python3.8/importlib/metadata.py", line 169, in from_name
raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: regex
File "/root/.cache/torch/hub/huggingface_pytorch-transformers_main/src/transformers/utils/versions.py", line 104, in require_version
raise importlib.metadata.PackageNotFoundError(
importlib.metadata.PackageNotFoundError: The 'regex!=2019.12.17' distribution was not found and is required by this application.
Try: pip install transformers -U or pip install -e '.[dev]' if you're working with git main
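A minimal check (my suggestion; the traceback itself already points at the fix): verify the regex package is importable in the active environment before re-running.

import importlib.util

# The torch.hub entry point for huggingface/pytorch-transformers requires the
# `regex` package; if it is missing, installing it should clear this error.
if importlib.util.find_spec("regex") is None:
    print("regex is missing; try `pip install regex`")
else:
    print("regex is installed")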
I got an error when I tried to run the demo:
python demo/demo_mug.py
Should I change some code?
def get_tokenizer():
    global TOKENIZER
    if not exists(TOKENIZER):
        TOKENIZER = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-cased')
    return TOKENIZER
The error message is:
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/site-packages/torch/hub.py", line 362, in load
repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose)
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/site-packages/torch/hub.py", line 162, in _get_cache_or_reload
_validate_not_a_forked_repo(repo_owner, repo_name, branch)
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/site-packages/torch/hub.py", line 124, in _validate_not_a_forked_repo
with urlopen(url) as r:
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: rate limit exceeded
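One possible workaround (an assumption on my part, not the repo's official fix): the HTTP 403 comes from torch.hub querying the GitHub API, so loading the tokenizer directly through the transformers library avoids that request entirely.

from transformers import AutoTokenizer

TOKENIZER = None

def get_tokenizer():
    # Load from the Hugging Face Hub instead of torch.hub, sidestepping the
    # GitHub API call that triggers the rate-limit error.
    global TOKENIZER
    if TOKENIZER is None:
        TOKENIZER = AutoTokenizer.from_pretrained('bert-base-cased')
    return TOKENIZER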
Hi @nihaomiao, thanks for open-sourcing this awesome work!
I have a few doubts about the model names in the code.
In the script run_natops.py, what do Generator, RegionPredictor, and BGMotionPredictor mean, and what are they used for?
I am a bit confused because you denote the models as encoder, decoder, and flow predictor in your paper.
So, how do they correspond to each other?
1. What are the three losses in the first stage?
2. The generator model in this configuration seems to contain three trainable networks in one stage; is that correct?
3. What is the purpose of adding x0 in the second stage? I did not find it in the paper.
4. I am not very clear on how the test set is split from the dataset.
Can you share the MUG dataset? I sent an email to the official website but received no reply.
Like this demo:
https://twitter.com/pika_labs/status/1678892871670464513
Hello!
When I run demo_mug.py, it displays:
Traceback (most recent call last):
File "demo/demo_mug.py", line 13, in <module>
from misc import Logger, grid2fig, conf2fig
ModuleNotFoundError: No module named 'misc'
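A likely cause (my assumption): misc.py lives at the repo root, so the script only finds it when the repo root is on the import path. Running from the repo root with PYTHONPATH=. set is one option; another is to prepend the root inside the script:

import os
import sys

# Prepend the repo root (the parent of demo/) so the top-level `misc` module
# is importable when the script is launched as demo/demo_mug.py.
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from misc import Logger, grid2fig, conf2fig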
Hi!
In the code:
real_out_img_list.append(generated["prediction"])
real_warped_img_list.append(generated["deformed"])
In the demo, what do the second and the third parts mean, respectively? As far as I know, they are probably "deformed" and "prediction". Could you describe them in detail?
I appreciate your patient answer.
When I was creating the environment, I ran into an error:
`ERROR: Cannot install Pillow==9.2.0 and Pillow==9.5.0 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested Pillow==9.2.0
The user requested Pillow==9.5.0`
and I noticed there are two different versions of Pillow in requirements.txt, so which one is the correct version? Thanks!
Hi @nihaomiao,
I searched for the .pth pretrained model, but I could only find one called "data.pkl" - is the pretrained model saved in "data.pkl"?
update: nvm, saw my mistake..
Thanks for your work!
I used the pretrained models published in this repository and calculated the FVD using this repository.
However, I obtained FVDs much larger than those in the paper.
Could you please release the source code for calculating FVD?
Hi,
How small can the batch size be in the training phase? At the size you set, I found that my machine didn't have enough video memory, so I adjusted it to 30 to train. I'm only using a single 2080 Ti (11 GB).
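If a smaller batch hurts results, gradient accumulation is one common way to emulate a larger effective batch on an 11 GB card. A minimal self-contained sketch (the toy model and data stand in for the actual LFDM training loop):

import torch
import torch.nn as nn

# Toy stand-ins for the real model and data loader.
model = nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(8)]

accum_steps = 4  # effective batch size = 4 (per-step) * 4 (accumulation) = 16
optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y) / accum_steps  # scale so gradients average
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()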
Hello, thanks for your work!
I am a novice; how can I run your code from start to finish?
Hi! I really appreciate your work! When I ran your multi-GPU code, I hit the following problem. It looks like some layers are on different devices. Could you please help me with that?
Traceback (most recent call last):
File "/CVPR23_LFDM/DM/train_video_flow_diffusion_mhad_multiGPU.py", line 465, in <module>
main()
File "/CVPR23_LFDM/DM/train_video_flow_diffusion_mhad_multiGPU.py", line 253, in main
train_output_dict = model.forward(real_vid=real_vids, ref_img=ref_imgs, ref_text=cond)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
outputs = self.parallel_apply(replicas, inputs, module_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
output.reraise()
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/_utils.py", line 694, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
output = module(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/CVPR23_LFDM/DM/modules/video_flow_diffusion_model_multiGPU.py", line 103, in forward
generated = self.generator(ref_img, source_region_params=source_region_params,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/CVPR23_LFDM/LFAE/modules/generator.py", line 100, in forward
motion_params = self.pixelwise_flow_predictor(source_image=source_image,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/CVPR23_LFDM/LFAE/modules/pixelwise_flow_predictor.py", line 111, in forward
heatmap_representation = self.create_heatmap_representations(source_image, driving_region_params,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/CVPR23_LFDM/LFAE/modules/pixelwise_flow_predictor.py", line 54, in create_heatmap_representations
gaussian_driving = region2gaussian(driving_region_params['shift'], covar=covar, spatial_size=spatial_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/CVPR23_LFDM/LFAE/modules/util.py", line 44, in region2gaussian
covar_inverse = torch.inverse(covar).view(*shape)
^^^^^^^^^^^^^^^^^^^^
RuntimeError: lazy wrapper should be called at most once
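A possible workaround (my assumption, based on similar reports against nn.DataParallel): "lazy wrapper should be called at most once" can occur when torch.inverse lazily initializes its CUDA linear-algebra backend inside DataParallel's worker threads. Warming it up once on the main thread before the parallel forward may avoid the race:

import torch

# Call torch.inverse once on the main thread so the lazy CUDA linalg backend
# is initialized before DataParallel spawns its per-replica worker threads.
if torch.cuda.is_available():
    torch.inverse(torch.eye(3, device="cuda"))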
Hi, haomiao. When I run "python -u LFAE/run_mhad.py" to train the LFAE model, I hit this bug:
File "CVPR23_LFDM/LFAE/modules/model.py", line 217, in forward
if self.loss_weights['photometric'] != 0:
KeyError: 'photometric'
Maybe the loss_weights entry for 'photometric' should be provided in config/xxx128.yaml. Looking forward to your reply :)
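A minimal guard, offered as a suggestion rather than the repo's official fix: either add a photometric entry under loss_weights in the YAML config, or default a missing key to 0 so such configs still run. A self-contained sketch with a hypothetical config dict:

# Hypothetical loss_weights dict standing in for the parsed YAML config.
loss_weights = {"perceptual": [10, 10, 10, 10, 10], "equivariance_shift": 10}

# Treat an absent 'photometric' entry as weight 0 instead of raising KeyError.
photometric_weight = loss_weights.get("photometric", 0)
if photometric_weight != 0:
    print("photometric loss enabled with weight", photometric_weight)
else:
    print("photometric loss disabled")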
How can I get the Depth, Inertial, and Skeleton data when training on a new person? (MHAD)
@nihaomiao
Hi, thanks for your great work!
Here are a few questions for you to answer.
Why are the dark areas of the face mainly the eyes, mouth, and edge contours of the face?
Is there the most occlusion there? Does eye occlusion mean the eyes may be closed? Does occlusion of the mouth mean there may be teeth? I don't know if I understand it right. And why are the edge contours shaded so deeply? What needs to be redrawn there?
Hi, thanks for your great work! How can I evaluate the test results? I did not find scripts/code for quantitative evaluation on the MHAD dataset.
Thank you for sharing the code!
Can you also share the hardware environment required to run it? (minimum memory requirements, GPU specs, etc.)
I only found the demo files for MHAD and MUG in the demo folder. Will the NATOPS demo be made publicly available?
When I try to run python -u demo/demo_mug.py, I encounter the following error:
no checkpoint found at '/data/hfn5052/text2motion/videoflowdiff_mug/snapshots-j-sl-random-of-tr-rmm/flowdiff_0005_S111600.pth'
I notice there are two model paths in the code, RESTORE_FROM and AE_RESTORE_FROM. I gave AE_RESTORE_FROM the path to the pre-trained model for the MUG dataset, but what should the RESTORE_FROM value be?
I also want to ask when an updated version of the code will be released. Many variables in this version are hard-coded, which makes modifying and running it quite inconvenient; if they could be consolidated into a YAML configuration file, I think it would be a great benefit for the rest of the community following your work.
Finally, thank you for this work; it gave us a lot of inspiring ideas, and I hope this project gets more attention.
@nihaomiao
Hello!
When I run demo_mug.py, it doesn't run successfully; it hangs at this line:
model.sample_one_video(cond_scale=cond_scale)
Hi,
Is it possible to generate a single character from a pose sequence for about 5 seconds?
I have a pose video (OpenPose + hands + face), and I was wondering if it is possible to generate an output video about 5 seconds long with a consistent character/avatar that dances, etc., driven by the controlled (pose) input.
I have an OpenPose+hands+face video and want to generate a human-like animation (it doesn't matter what it looks like, just a consistent character/avatar).
Sample Video
Thanks
Best regards
Is there a requirements file to install dependencies before running the demo?
I want to train the diffusion model with multiple GPUs, but there is no relevant code.
Thank you for your excellent work! I have the problem below.
When I try to train the DM model by running "python -u DM/train_video_flow_diffusion_mhad.py", I get this error:
File "CVPR23_LFDM/DM/modules/video_flow_diffusion_model.py", line 144, in forward
self.ref_img_fea = generated["bottle_neck_feat"].clone().detach()
KeyError: 'bottle_neck_feat'
I cannot find the key 'bottle_neck_feat' in generator.py; please help me solve it. Thank you very much!
Nice work on the latent diffusion model! Can you provide an arXiv preprint of your paper? Thanks! ^ ^
Thanks for your great work.
I read your paper, and I want to ask what the difference is between cFVD, sFVD, and normal FVD.
And where can I find the cFVD and sFVD metric code?
Thanks.
Hi, thank you for sharing the demo scenarios.
If I want to apply this LFDM demo code to a custom image, what do I have to be aware of?
For instance, do I have to align the human pose or facial feature positions in advance?
Any other tips would be welcome.
Also, is there code for fine-tuning the decoder on a custom image?
Finally, what is the difference between the 2nd and the 3rd image in the output GIF? (Occlusion-aware vs. occlusion-agnostic?)
Thank you.
Hi dear @nihaomiao,
Congratulations on your very interesting work! I'm curious about the role of data jittering. If it's meant to prevent overfitting, why was it implemented only in the DM and not in the LFAE? And have you tested running the DM without jittering the data?
Thank you in advance 😁
When I follow your steps to train the model, it says I need a pre-trained weight file. This happens in the training code for all three datasets. Where do I get this weights file? Hoping for your reply.
parser.add_argument("--checkpoint",
                    # use the pretrained Taichi model provided by Snap
                    default="/data/hfn5052/text2motion/RegionMM/taichi256.pth",