nihaomiao / CVPR23_LFDM
The PyTorch implementation of our CVPR 2023 paper "Conditional Image-to-Video Generation with Latent Flow Diffusion Models"
License: BSD 2-Clause "Simplified" License
@nihaomiao
Hello!
When I run python demo/demo_mug.py, the process does not respond.
Can you provide a Baidu Netdisk or Google Drive URL for the dataset (including images) and the model used for training?
Hi haomiao,
I tried to train the DM on the MHAD dataset using python -u DM/train_video_flow_diffusion_mhad.py and the released LFAE_MHAD.pth model. However, after about 40,000 iterations, the outputs in SAMPLE_DIR still look really strange, especially the third and fourth columns (sample_out_img and fake_grid). Could you please help me figure out whether this is normal? By the way, I am not very clear on the significance of comparing both generated["prediction"] (out_vid) and generated["deformed"] (warped_vid). Could you please give me some instructions? Thank you!
Thanks a lot
I used the pre-trained LFAE model you provided to run python -u LFAE/test_flowautoenc_natops.py, which fails with:
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
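A possible diagnostic (my assumption, not a confirmed cause): this error usually indicates a corrupted or partially downloaded checkpoint, since torch.save writes zip archives. A quick check, with the checkpoint filename as a hypothetical placeholder:

import zipfile

# Hypothetical checkpoint path; substitute the actual downloaded .pth file.
CKPT = "LFAE_NATOPS.pth"

# A valid PyTorch checkpoint is a zip archive, so it should pass this test;
# False suggests the download is truncated or corrupted and should be re-fetched.
print(zipfile.is_zipfile(CKPT))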
TOKENIZER = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-cased') fails with the following errors:
File "/usr/lib/python3.8/importlib/metadata.py", line 169, in from_name
raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: regex
File "/root/.cache/torch/hub/huggingface_pytorch-transformers_main/src/transformers/utils/versions.py", line 104, in require_version
raise importlib.metadata.PackageNotFoundError(
importlib.metadata.PackageNotFoundError: The 'regex!=2019.12.17' distribution was not found and is required by this application.
Try: pip install transformers -U or pip install -e '.[dev]' if you're working with git main
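A minimal check (my suggestion; the traceback itself already points at the fix): verify the regex package is importable in the active environment before re-running.

import importlib.util

# The torch.hub entry point for huggingface/pytorch-transformers requires the
# `regex` package; if it is missing, installing it should clear this error.
if importlib.util.find_spec("regex") is None:
    print("regex is missing; try `pip install regex`")
else:
    print("regex is installed")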
I got an error when I tried to run the demo:
python demo/demo_mug.py
Should I change some code?
def get_tokenizer():
    global TOKENIZER
    if not exists(TOKENIZER):
        TOKENIZER = torch.hub.load('huggingface/pytorch-transformers', 'tokenizer', 'bert-base-cased')
    return TOKENIZER
The error message is:
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/site-packages/torch/hub.py", line 362, in load
repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose)
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/site-packages/torch/hub.py", line 162, in _get_cache_or_reload
_validate_not_a_forked_repo(repo_owner, repo_name, branch)
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/site-packages/torch/hub.py", line 124, in _validate_not_a_forked_repo
with urlopen(url) as r:
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/home/aaa/anaconda3/envs/py38_lfdm/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: rate limit exceeded
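One possible workaround (an assumption on my part, not the repo's official fix): the HTTP 403 comes from torch.hub querying the GitHub API, so loading the tokenizer directly through the transformers library avoids that request entirely.

from transformers import AutoTokenizer

TOKENIZER = None

def get_tokenizer():
    # Load from the Hugging Face Hub instead of torch.hub, sidestepping the
    # GitHub API call that triggers the rate-limit error.
    global TOKENIZER
    if TOKENIZER is None:
        TOKENIZER = AutoTokenizer.from_pretrained('bert-base-cased')
    return TOKENIZER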
Hi @nihaomiao, thanks for open-sourcing this awesome work!
I have a few doubts about the model names in the code.
In the script run_natops.py, what do Generator, RegionPredictor, and BGMotionPredictor mean, and what are they used for?
I am a bit confused because you denote the models as encoder, decoder, and flow predictor in your paper.
So, how do they correspond to each other?
1. What are the three losses in the first stage?
2. The generator model in this configuration seems to contain three trainable networks in one stage; is that correct?
3. What is the purpose of adding x0 in the second stage? I did not find it in the paper.
4. I am not very clear on how the test set is split from the dataset.
Can you share the MUG dataset? I sent an email to the official website but received no reply.
Like this demo:
https://twitter.com/pika_labs/status/1678892871670464513
Hello!
When I run demo_mug.py, it displays:
Traceback (most recent call last):
File "demo/demo_mug.py", line 13, in <module>
from misc import Logger, grid2fig, conf2fig
ModuleNotFoundError: No module named 'misc'
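A likely cause (my assumption): misc.py lives at the repo root, so the script only finds it when the repo root is on the import path. Running from the repo root with PYTHONPATH=. set is one option; another is to prepend the root inside the script:

import os
import sys

# Prepend the repo root (the parent of demo/) so the top-level `misc` module
# is importable when the script is launched as demo/demo_mug.py.
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from misc import Logger, grid2fig, conf2fig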
Hi!
In the code:
real_out_img_list.append(generated["prediction"])
real_warped_img_list.append(generated["deformed"])
In the demo, what do the second and the third parts mean, respectively? As far as I know, they are probably "deformed" and "prediction". Could you describe them in detail?
I appreciate your patient answer.
When I was creating the environment, I ran into an error:
`ERROR: Cannot install Pillow==9.2.0 and Pillow==9.5.0 because these package versions have conflicting dependencies.
The conflict is caused by:
The user requested Pillow==9.2.0
The user requested Pillow==9.5.0`
and I noticed there are two different versions of Pillow in requirements.txt, so which one is the correct version? Thanks!
Hi @nihaomiao,
I searched for the .pth pretrained model, but I could only find one called "data.pkl" - is the pretrained model saved in "data.pkl"?
update: nvm, saw my mistake..
Thanks for your work!
I used the pretrained models published in this repository and calculated the FVD using this repository.
However, I obtained FVDs much larger than those in the paper.
Could you please release the source code for calculating FVD?
Hi,
How small can the batch size be in the training phase? At the size you set, I found that my machine didn't have enough video memory, so I adjusted it to 30 to train. I'm only using a single 2080 Ti (11 GB).
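If a smaller batch hurts results, gradient accumulation is one common way to emulate a larger effective batch on an 11 GB card. A minimal self-contained sketch (the toy model and data stand in for the actual LFDM training loop):

import torch
import torch.nn as nn

# Toy stand-ins for the real model and data loader.
model = nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loader = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(8)]

accum_steps = 4  # effective batch size = 4 (per-step) * 4 (accumulation) = 16
optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y) / accum_steps  # scale so gradients average
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()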
Hello, thanks for your work!
I am a novice; how can I run your code from start to finish?
Hi! I really appreciate your work! When I ran your multi-GPU code, I hit the following problem. It looks like some layers are on different devices. Could you please help me with that?
Traceback (most recent call last):
File "/CVPR23_LFDM/DM/train_video_flow_diffusion_mhad_multiGPU.py", line 465, in <module>
main()
File "/CVPR23_LFDM/DM/train_video_flow_diffusion_mhad_multiGPU.py", line 253, in main
train_output_dict = model.forward(real_vid=real_vids, ref_img=ref_imgs, ref_text=cond)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 185, in forward
outputs = self.parallel_apply(replicas, inputs, module_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/parallel/data_parallel.py", line 200, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/parallel/parallel_apply.py", line 110, in parallel_apply
output.reraise()
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/_utils.py", line 694, in reraise
raise exception
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
output = module(*input, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/CVPR23_LFDM/DM/modules/video_flow_diffusion_model_multiGPU.py", line 103, in forward
generated = self.generator(ref_img, source_region_params=source_region_params,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/CVPR23_LFDM/LFAE/modules/generator.py", line 100, in forward
motion_params = self.pixelwise_flow_predictor(source_image=source_image,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/anaconda3/envs/pytorch/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/CVPR23_LFDM/LFAE/modules/pixelwise_flow_predictor.py", line 111, in forward
heatmap_representation = self.create_heatmap_representations(source_image, driving_region_params,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/CVPR23_LFDM/LFAE/modules/pixelwise_flow_predictor.py", line 54, in create_heatmap_representations
gaussian_driving = region2gaussian(driving_region_params['shift'], covar=covar, spatial_size=spatial_size)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/CVPR23_LFDM/LFAE/modules/util.py", line 44, in region2gaussian
covar_inverse = torch.inverse(covar).view(*shape)
^^^^^^^^^^^^^^^^^^^^
RuntimeError: lazy wrapper should be called at most once
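A possible workaround (my assumption, based on similar reports against nn.DataParallel): "lazy wrapper should be called at most once" can occur when torch.inverse lazily initializes its CUDA linear-algebra backend inside DataParallel's worker threads. Warming it up once on the main thread before the parallel forward may avoid the race:

import torch

# Call torch.inverse once on the main thread so the lazy CUDA linalg backend
# is initialized before DataParallel spawns its per-replica worker threads.
if torch.cuda.is_available():
    torch.inverse(torch.eye(3, device="cuda"))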
Hi, haomiao. When I run "python -u LFAE/run_mhad.py" to train the LFAE model, I hit this bug:
File "CVPR23_LFDM/LFAE/modules/model.py", line 217, in forward
if self.loss_weights['photometric'] != 0:
KeyError: 'photometric'
Maybe the loss_weights entry for 'photometric' should be provided in config/xxx128.yaml. Looking forward to your reply :)
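A minimal guard, offered as a suggestion rather than the repo's official fix: either add a photometric entry under loss_weights in the YAML config, or default a missing key to 0 so such configs still run. A self-contained sketch with a hypothetical config dict:

# Hypothetical loss_weights dict standing in for the parsed YAML config.
loss_weights = {"perceptual": [10, 10, 10, 10, 10], "equivariance_shift": 10}

# Treat an absent 'photometric' entry as weight 0 instead of raising KeyError.
photometric_weight = loss_weights.get("photometric", 0)
if photometric_weight != 0:
    print("photometric loss enabled with weight", photometric_weight)
else:
    print("photometric loss disabled")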
How can I get the Depth, Inertial, and Skeleton data when training on a new person? (MHAD)
@nihaomiao
Hi, thanks for your great work!
Here are a few questions for you to answer.
Why are the dark areas of the face mainly the eyes, mouth, and edge contours of the face?
Is there the most occlusion there? Does eye occlusion mean the eyes may be closed? Does occlusion of the mouth mean there may be teeth? I don't know if I understand it right. And why are the edge contours shaded so deeply? What needs to be redrawn there?
Hi, thanks for your great work! How can I evaluate the test results? I did not find scripts/code for quantitative evaluation on the MHAD dataset.
Thank you for sharing the code!
Can you also share the hardware environment required to run it? (minimum memory requirements, GPU specs, etc.)
I only found the demo files for MHAD and MUG in the demo folder. Will the NATOPS demo be made publicly available?
When I try to run python -u demo/demo_mug.py, I encounter the following error:
no checkpoint found at '/data/hfn5052/text2motion/videoflowdiff_mug/snapshots-j-sl-random-of-tr-rmm/flowdiff_0005_S111600.pth'
I notice there are two model paths in the code, RESTORE_FROM and AE_RESTORE_FROM. I gave AE_RESTORE_FROM the path to the pre-trained model for the MUG dataset, but what should the RESTORE_FROM value be?
I also want to ask when an updated version of the code will be released. Many variables in this version are hard-coded, which makes modifying and running it quite inconvenient; if they could be consolidated into a YAML configuration file, I think it would be a great benefit for the rest of the community following your work.
Finally, thank you for this work; it gave us a lot of inspiring ideas, and I hope this project gets more attention.
@nihaomiao
Hello!
When I run demo_mug.py, it doesn't run successfully; it hangs at this line:
model.sample_one_video(cond_scale=cond_scale)
Hi,
Is it possible to generate a single character from a pose sequence for about 5 seconds?
I have a pose video (OpenPose + hands + face), and I was wondering if it is possible to generate an output video about 5 seconds long with a consistent character/avatar that dances, etc., driven by the controlled (pose) input.
I have an OpenPose+hands+face video and want to generate a human-like animation (it doesn't matter what it looks like, just a consistent character/avatar).
Sample Video
Thanks
Best regards
Is there a requirements file to install dependencies before running the demo?
I want to train the diffusion model with multiple GPUs, but there is no relevant code.
Thank you for your excellent work! I have the problem below.
When I try to train the DM model by running "python -u DM/train_video_flow_diffusion_mhad.py", I get this error:
File "CVPR23_LFDM/DM/modules/video_flow_diffusion_model.py", line 144, in forward
self.ref_img_fea = generated["bottle_neck_feat"].clone().detach()
KeyError: 'bottle_neck_feat'
I cannot find the key 'bottle_neck_feat' in generator.py; please help me solve it. Thank you very much!
Nice work on the latent diffusion model! Can you provide an arXiv preprint of your paper? Thanks! ^ ^
Thanks for your great work.
I read your paper, and I want to ask what the difference is between cFVD, sFVD, and normal FVD.
And where can I find the cFVD and sFVD metric code?
Thanks.
Hi, thank you for sharing the demo scenarios.
If I want to apply this LFDM demo code to a custom image, what do I have to be aware of?
For instance, do I have to align the human pose or facial feature positions in advance?
Any other tips would be welcome.
Also, is there code for fine-tuning the decoder on a custom image?
Finally, what is the difference between the 2nd and the 3rd image in the output GIF? (Occlusion-aware vs. occlusion-agnostic?)
Thank you.
Hi dear @nihaomiao,
Congratulations on your very interesting work! I'm curious about the role of data jittering. If it's meant to prevent overfitting, why was it implemented only in the DM and not in the LFAE? And have you tested running the DM without jittering the data?
Thank you in advance 😁
When I follow your steps to train the model, it says I need a pre-trained weight file. This happens in the training code for all three datasets. Where do I get this weights file? Hoping for your reply.
parser.add_argument("--checkpoint",
                    # use the pretrained Taichi model provided by Snap
                    default="/data/hfn5052/text2motion/RegionMM/taichi256.pth",