animateanyone-unofficial's Introduction

Unofficial Implementation of Animate Anyone

If you find this repository helpful, please consider giving us a star⭐!

We train only on small-scale datasets (such as TikTok and UBC), so it is difficult to match the official results given the limited data scale and quality. Due to time and cost constraints, we do not plan to collect and filter large amounts of high-quality data. If someone has a robust model trained on a large amount of high-quality data and is willing to share it, please open a pull request.

Overview

This repository contains a simple, unofficial implementation of Animate Anyone, built upon magic-animate and AnimateDiff. The implementation was first developed by Qin Guo, with later contributions from Zhenzhi Wang.

Training Guidance

Although we cannot use large-scale data to train the model, we can provide several training suggestions:

  1. In our experiments, the PoseGuider from the original Animate Anyone paper struggles to control pose no matter which activation function we use (e.g., ReLU, SiLU). Enlarging its output to 320 channels and adding it right after conv_in (see model.hack_poseguider; a sketch follows this list) is very effective, and compared with ControlNet this solution is much more lightweight (<1M vs. ~400M parameters). We still think ControlNet is a good choice, though: our PoseGuider depends on a UNet that is fine-tuned at the same time and cannot be used out of the box, whereas ControlNet is plug and play.
  2. On small-scale datasets (fewer than 2,000 videos), stage 1 works very well (including generalization), but stage 2 is data-hungry: with too little data, artifacts and flicker appear easily. Because we retrain the UNet in the first stage, the original AnimateDiff checkpoint no longer applies, so a large amount of high-quality data is needed to retrain the AnimateDiff motion module in this stage.
  3. Freezing the UNet is not a good choice, as it loses the texture information of the reference image.
  4. This is a data-hungry task. We believe that scaling up data quantity and quality is usually more valuable than tweaking small parts of the model architecture. Data quantity and quality are very important!
  5. High-resolution training is very important, as it affects how well details are learned and reconstructed. The training resolution should not be greater than the inference resolution.
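The pose-guider variant mentioned in point 1 can be sketched roughly as follows. This is a minimal illustration, not the repository's exact model.hack_poseguider code: a small convolutional encoder downsamples the pose image to the latent resolution and outputs 320 channels, which are added to the denoising UNet's feature right after conv_in. The layer sizes and the zero-initialized final projection are assumptions.

import torch
import torch.nn as nn

class HackPoseGuider(nn.Module):
    """Minimal sketch: pose image -> 320-channel feature at latent resolution."""
    def __init__(self, in_channels: int = 3, out_channels: int = 320):
        super().__init__()
        # Three stride-2 convs downsample a 512x512 pose image to the 64x64 latent grid.
        self.conv_layers = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(128, out_channels, kernel_size=3, stride=2, padding=1),
        )
        # Zero-initialized projection so training starts from the unmodified UNet.
        self.proj = nn.Conv2d(out_channels, out_channels, kernel_size=1)
        nn.init.zeros_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, pose_image: torch.Tensor) -> torch.Tensor:
        return self.proj(self.conv_layers(pose_image))

# Usage (illustrative): add the pose feature right after the UNet's conv_in, e.g.
#   hidden = unet.conv_in(noisy_latents) + pose_guider(pose_image)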

Sample of Result on UBC-fashion dataset

Stage 1

The face still shows some artifacts in the current version. This model is trained on the UBC dataset rather than a large-scale dataset.

Stage 2

Stage 2 training is challenging due to artifacts in the background. We show one of our best results here and are still working on it. An important point is to keep the training and inference resolutions consistent.

ToDo

  • Release Training Code.
  • Release Inference Code.
  • Release Unofficial Pre-trained Weights.
  • Release Gradio Demo.

Requirements

bash fast_env.sh

🎬Gradio Demo

python3 -m demo.gradio_animate

For a 13-second pose video, processing at 256 resolution requires 11 GB of VRAM; at 512 resolution, it requires 23.5 GB.

Training

Original AnimateAnyone Architecture (It is difficult to control pose when training on a small dataset.)

First Stage

torchrun --nnodes=8 --nproc_per_node=8 train.py --config configs/training/train_stage_1.yaml

Second Stage

torchrun --nnodes=8 --nproc_per_node=8 train.py --config configs/training/train_stage_2.yaml

Our Method (a denser pose-control scheme whose parameter count remains small; highly recommended)

First Stage

torchrun --nnodes=8 --nproc_per_node=8 train_hack.py --config configs/training/train_stage_1.yaml

Second Stage

torchrun --nnodes=8 --nproc_per_node=8 train_hack.py --config configs/training/train_stage_2.yaml

Acknowledgements

Special thanks to the original authors of the Animate Anyone project and the contributors to the magic-animate and AnimateDiff repositories for their open research and foundational work, which inspired this unofficial implementation.

Email

For academic or business cooperation only: [email protected]

animateanyone-unofficial's People

Contributors

guoqincode, eltociear, zhenzhiwang, dongxuyue


animateanyone-unofficial's Issues

No trainable param for unet in stage 1

unet = DDP(unet, device_ids=[local_rank], output_device=local_rank)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 551, in init
self._log_and_throw(
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 686, in _log_and_throw
raise err_type(err_msg)
RuntimeError: DistributedDataParallel is not needed when a module doesn't have any parameter that requires a gradient.
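A common cause of this error is wrapping a fully frozen module in DDP. A generic guard (not the repository's code, just a hedged workaround) is to wrap only modules that actually have trainable parameters:

import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def maybe_ddp(module: nn.Module, local_rank: int) -> nn.Module:
    # Wrap in DDP only if at least one parameter requires a gradient;
    # fully frozen modules can be used directly without DDP.
    if any(p.requires_grad for p in module.parameters()):
        return DDP(module, device_ids=[local_rank], output_device=local_rank)
    return module

# unet = maybe_ddp(unet, local_rank)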

where is spatial attention?

Great work! I have a question about the attention modules (spatial attention, cross-attention, and temporal attention): is the spatial attention that relates the ReferenceNet latent feature to the denoising-UNet latent feature missing? (Quoting the paper: "we replace the self-attention layer with spatial-attention layer. Given a feature map x1 ∈ R^{t×h×w×c} from denoising UNet and x2 ∈ R^{h×w×c} from ReferenceNet, we first copy x2 by t times and concatenate it with x1 along w dimension.")
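For reference, the mechanism quoted above can be written as a rough sketch. The tensor layout and the plain self-attention call are illustrative assumptions, not the repository's implementation:

import torch
import torch.nn.functional as F

def reference_spatial_attention(x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
    # x1: (t, h, w, c) feature from the denoising UNet
    # x2: (h, w, c) feature from ReferenceNet
    t, h, w, c = x1.shape
    x2 = x2.unsqueeze(0).expand(t, -1, -1, -1)               # copy x2 t times -> (t, h, w, c)
    x = torch.cat([x1, x2], dim=2)                           # concatenate along width -> (t, h, 2w, c)
    tokens = x.reshape(t, h * 2 * w, c)                      # flatten spatial positions into tokens
    out = F.scaled_dot_product_attention(tokens, tokens, tokens)  # self-attention over both halves
    return out.reshape(t, h, 2 * w, c)[:, :, :w, :]          # keep only the denoising-UNet half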

SparseCausalAttention2D

While reading the code I saw that the standard BasicTransformerBlock from diffusers has been replaced with a modified version that utilizes a new class called SparseCausalAttention2D for the attn1 layer. Could you specify where this class is defined? Or maybe, were you able to successfully train the model without using this class (replacing it with a different one)?

Running Inference step 2

I have training results for both stages 1 and 2. Stage 1 inference works but produces a one-second video of the same repeated frame; stage 2 inference is not working. I ran python -m pipelines.animation_stage_2 --config configs/prompts/animation_stage_2.yaml with the config values set correctly. It first threw an import error, which I fixed; now I get this error:

  from diffusers.pipeline_utils import DiffusionPipeline
loaded temporal unet's pretrained weights from outputs/train_stage_2-2023-12-22T08-59-53
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/AnimateAnyone-unofficial/pipelines/animation_stage_2.py", line 244, in <module>
    run(args)
  File "/workspace/AnimateAnyone-unofficial/pipelines/animation_stage_2.py", line 233, in run
    main(args)
  File "/workspace/AnimateAnyone-unofficial/pipelines/animation_stage_2.py", line 70, in main
    unet = UNet3DConditionModel.from_pretrained_2d(config.pretrained_motion_unet_path, subfolder=None, unet_additional_kwargs=OmegaConf.to_container(inference_config.unet_additional_kwargs), specific_model=config.specific_motion_unet_model)
  File "/workspace/AnimateAnyone-unofficial/models/unet.py", line 457, in from_pretrained_2d
    raise RuntimeError(f"{config_file} does not exist")
RuntimeError: outputs/train_stage_2-2023-12-22T08-59-53/config.json does not exist

Which clip encoder is this?

Magicanimate doesn't seem to have it in their pretrained directory. Is it the same as "laion/CLIP-ViT-B-32-laion2B-s34B-b79K" ?

Have you noticed any issue during training related to the denoising timesteps?

Per the title I've been a little perplexed to see that what was denoised well at 30 inference timesteps @ 60k training steps, requires 70 steps @ 100k training steps.

My implementation is slightly different than yours so there could be quite a few things going on. Just curious if you noticed any similar behaviors since you're in the middle of training these days.

Thank you

what is the poseguider_checkpoint_path value?

Hello, first of all thanks for your work.

I have some questions about the second stage of training. In the train_stage_2.yaml file there are:
poseguider_checkpoint_path: ""
referencenet_checkpoint_path: ""
What should these two values be? Should referencenet_checkpoint_path point to the model trained in the first stage, or something else? I hope to get your reply.

about the result of the first stage

my config:
train_data:
csv_path: ../TikTok_info.csv
video_folder: ../TikTok_dataset/TikTok_dataset
sample_size: 512
sample_stride: 4
sample_n_frames: 16
clip_model_path: openai/clip-vit-base-patch32

gradient_accumulation_steps: 128
batch_size: 1
use 1 V100, optimizer = torch.optim.SGD(trainable_params, lr=learning_rate / gradient_accumulation_steps, momentum=0.9)
result: show the result of 20000 steps
image

Could it be that my 20,000 steps here are actually only equivalent to a bit more than 300 steps at a batch size of 64, or is there another reason?
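For what it's worth, the arithmetic above holds under the assumption that each of the 20,000 logged steps processes a single sample (the training script's counter may count optimizer updates instead, in which case this does not apply):

# 20,000 steps at batch size 1, compared against an effective batch size of 64
micro_steps = 20_000
samples_seen = micro_steps * 1
equivalent_steps_at_bs64 = samples_seen / 64
print(equivalent_steps_at_bs64)  # 312.5, i.e. "a bit more than 300 steps"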

No module named 'models.hack_poseguider'

I tried to run demo.gradio_animate, but the following error was reported. I could not find hack_poseguider under the models folder.

Traceback (most recent call last):
File "/home/work/diffuser-env/python/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/work/diffuser-env/python/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/work/AnimateAnyone-unofficial/demo/gradio_animate.py", line 8, in
from demo.animate import AnimateAnyone
File "/home/work/AnimateAnyone-unofficial/demo/animate.py", line 21, in
from models.hack_poseguider import Hack_PoseGuider as PoseGuider
ModuleNotFoundError: No module named 'models.hack_poseguider'

RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x1024 and 768x320)

I modified the paths in the configuration file to point to my local directories (UBC Fashion Video dataset) and started the training process. However, an error occurred during the process.

/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
  warn(f"Failed to load image Python extension: {e}")
### Train Info: train stage 1: image pretrain ###
Some weights of the model checkpoint were not used when initializing ReferenceNet: 
 ['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight, up_blocks.3.attentions.2.proj_out.bias, up_blocks.3.attentions.2.proj_out.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.bias, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.weight, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.bias, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm3.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm3.weight']
12/20/2023 01:40:44 - INFO - root - ***** Running training *****
12/20/2023 01:40:44 - INFO - root -   Num examples = 500
12/20/2023 01:40:44 - INFO - root -   Num Epochs = 480
12/20/2023 01:40:44 - INFO - root -   Instantaneous batch size per device = 4
12/20/2023 01:40:44 - INFO - root -   Total train batch size (w. parallel, distributed & accumulation) = 4
12/20/2023 01:40:44 - INFO - root -   Gradient Accumulation steps = 1
12/20/2023 01:40:44 - INFO - root -   Total optimization steps = 60000

  0%|          | 0/60000 [00:00<?, ?it/s]
Steps:   0%|          | 0/60000 [00:00<?, ?it/s]Traceback (most recent call last):
  File "train.py", line 629, in <module>
    main(name=name, launcher=args.launcher, use_wandb=args.wandb, **config)
  File "train.py", line 492, in main
    referencenet(latents_ref_img, ref_timesteps, encoder_hidden_states)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1519, in forward
    else self._run_ddp_forward(*inputs, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1355, in _run_ddp_forward
    return self.module(*inputs, **kwargs)  # type: ignore[index]
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/AnimateAnyone-unofficial/models/ReferenceNet.py", line 1005, in forward
    sample, res_samples = downsample_block(
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/unet_2d_blocks.py", line 1086, in forward
    hidden_states = attn(
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/transformer_2d.py", line 315, in forward
    hidden_states = block(
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/AnimateAnyone-unofficial/models/ReferenceNet_attention.py", line 199, in hacked_basic_transformer_inner_forward
    attn_output = self.attn2(
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 417, in forward
    return self.processor(
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/attention_processor.py", line 1023, in __call__
    key = attn.to_k(encoder_hidden_states, scale=scale)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/diffusers/models/lora.py", line 224, in forward
    out = super().forward(hidden_states)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x1024 and 768x320)

Steps:   0%|          | 0/60000 [00:05<?, ?it/s]
[2023-12-20 01:40:55,416] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 412807) of binary: /home/user/miniconda3/envs/animateanyone-unofficial/bin/python
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/animateanyone-unofficial/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/run.py", line 806, in main
    run(args)
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/run.py", line 797, in run
    elastic_launch(
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/user/miniconda3/envs/animateanyone-unofficial/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-12-20_01:40:55
  host      : gpuserver
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 412807)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

About time embedding in ReferenceNet

In the official paper, the authors say

While ReferenceNet introduces a comparable number of parameters to the denoising UNet, in diffusion-based video generation, all video frames undergo denoising multiple times, whereas ReferenceNet only needs to extract features once throughout the entire process

But in your inference implementation, the forward pass of ReferenceNet is performed multiple times.

Would you consider fixing the timestep of the ReferenceNet?
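One way to realize this suggestion, sketched under the assumption that ReferenceNet writes its attention features into a bank that the denoising UNet later reads (names are illustrative, not the repository's actual API), is to run ReferenceNet once with a fixed timestep before the denoising loop and reuse the cached features:

import torch

@torch.no_grad()
def precompute_reference_features(referencenet, latents_ref, encoder_hidden_states):
    # A single forward pass with a fixed timestep (e.g. 0); the attention features
    # it produces can then be reused for every denoising step instead of
    # re-running ReferenceNet at each step.
    fixed_t = torch.zeros(latents_ref.shape[0], dtype=torch.long, device=latents_ref.device)
    referencenet(latents_ref, fixed_t, encoder_hidden_states)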

stage2 training error

Thank you for your work.

During the second stage of training, I keep getting out-of-memory errors. I have 80 GB of GPU memory, and the same error appears whether I use a single card or multiple cards, even with --train_batch_size set to 1. What went wrong?

error message:
Traceback (most recent call last):
File "/home/work/animate-anyone/train_2nd_stage.py", line 919, in
main(args)
File "/home/work/animate-anyone/train_2nd_stage.py", line 823, in main
model_pred = unet(
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 632, in forward
return model_forward(*args, **kwargs)
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/accelerate/utils/operations.py", line 620, in call
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/home/work/animate-anyone/animate_anyone/models/unet_3d_condition.py", line 1011, in forward
sample = upsample_block(
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/work/animate-anyone/animate_anyone/models/unet_3d_blocks.py", line 901, in forward
hidden_states = resnet(hidden_states, temb, scale=lora_scale)
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/work/animate-anyone/animate_anyone/models/resnet.py", line 340, in forward
hidden_states = self.norm1(hidden_states)
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/torch/nn/modules/normalization.py", line 273, in forward
return F.group_norm(
File "/home/work/AnimateAnyone-unofficial/animateanyone_env/lib/python3.10/site-packages/torch/nn/functional.py", line 2530, in group_norm
return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 810.00 MiB (GPU 0; 79.35 GiB total capacity; 76.87 GiB already allocated; 64.19 MiB free; 77.66 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
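A few generic mitigations worth trying for this kind of OOM (illustrative suggestions, not verified against this repository's stage 2 script): enable gradient checkpointing on the UNet, use mixed precision, and set the allocator hint that the error message itself suggests.

import os

# Reduce allocator fragmentation (must be set before CUDA is initialized).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# In the training script, diffusers-style models expose gradient checkpointing:
#   unet.enable_gradient_checkpointing()
# Mixed precision (fp16/bf16) via accelerate or torch.cuda.amp also reduces memory.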

model saving

Hi, it seems that you train the 2D UNet, ReferenceNet, and PoseGuider during the first stage,
but you don't save the parameters of the 2D UNet.

About multi-GPU training

Thank you for your contributions! There are two questions below:

  1. I have observed that the training duration with two RTX 6000 Ada GPUs exceeds the time it takes with a single GPU. Is this expected?
  2. I encountered gradient explosion during the training process (a generic mitigation is sketched below).
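On point 2, a standard mitigation (not specific to this repository) is to clip the global gradient norm before each optimizer step. The stand-in model and learning rate below are purely illustrative:

import torch
import torch.nn as nn

model = nn.Linear(320, 320)  # stand-in for the trainable modules
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

loss = model(torch.randn(4, 320)).pow(2).mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # cap the global gradient norm
optimizer.step()
optimizer.zero_grad()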

Results

Hi, @guoqincode, thanks for your effort in reimplementing this! Could you show some video results as demonstration?

about training memory optimization

In the README, you mentioned that you would optimize the training code using DeepSpeed and Accelerate. However, as far as I know, the DeepSpeed functionality integrated into the Accelerate library does not support multi-model training. Do you have any suggestions?

About masks?

In the TikTok dataset there is a masks file. Maybe the foreground could be trained separately; have you taken this into account?

about training memory optimization

In the README, you mentioned that you would optimize the training code using DeepSpeed and Accelerate. However, as far as I know, the DeepSpeed functionality integrated into the Accelerate library does not support multi-model training. Do you have any suggestions on using DeepSpeed to optimize memory?

about loss

image
Why is my loss curve so strange?
Today I tried the new code, and my loss became NaN:
image

about training optimization

I tried the 8-bit Adam optimizer and can train stage 1 on a 40 GB A100. I think it helps reduce VRAM usage, but I don't know whether it will degrade model performance. What do you think? Did you try 8-bit Adam?
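For reference, a minimal way to swap in 8-bit Adam with bitsandbytes (assuming bitsandbytes is installed; the stand-in model and hyperparameters are illustrative, not this repository's settings):

import torch.nn as nn
import bitsandbytes as bnb

model = nn.Linear(320, 320)  # stand-in for the UNet / ReferenceNet / PoseGuider parameters
# AdamW8bit stores optimizer state in 8 bits, cutting optimizer VRAM roughly 4x.
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-5, weight_decay=1e-2)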

one of the variables needed for gradient computation has been modified by an inplace operation [torch.cuda.FloatTensor [128]] is at version 3; expected version 2 instead

File "train_th.py", line 637, in
main(name=name, launcher=args.launcher, use_wandb=args.wandb, **config)
File "train_th.py", line 460, in main
latents_pose = poseguider(mask_image)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1040, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1000, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0])
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/cto_labs/hongfating/workspace/src/AnimateAnyone-unofficial/models/PoseGuider.py", line 78, in forward
x = self.conv_layers(x)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/container.py", line 204, in forward
input = module(input)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 171, in forward
return F.batch_norm(
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/functional.py", line 2450, in batch_norm
return torch.batch_norm(
File "/home/hongfating/miniconda3/envs/animate/lib/python3.8/site-packages/torch/fx/traceback.py", line 57, in format_stack
return traceback.format_stack()

It seems to come from here: File "/cto_labs/hongfating/workspace/src/AnimateAnyone-unofficial/models/PoseGuider.py", line 78, in forward
x = self.conv_layers(x)

I have no idea what causes this.
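A generic first debugging step for this class of error (illustrative, not a fix) is to enable autograd anomaly detection, so the backward pass reports which forward operation produced the tensor that was later modified in place:

import torch

# Slows training noticeably, so only enable it while debugging.
torch.autograd.set_detect_anomaly(True)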

Tensor size mismatch in using clip-vit-large-patch14

Hi,

Thanks for sharing your implementation; it really helps the community reproduce Animate Anyone. When I try to train the network with your code, I find that in referencenet_attention the hidden-state size of the Stable Diffusion UNet is 768, while the CLIP image feature extracted with clip-vit-large-patch14 is 1024, which causes a size mismatch in the forward pass (the hidden size of clip-vit-base-patch32, by contrast, is 768). Since your config YAML originally used clip-vit-base-patch32 and was recently changed to clip-vit-large-patch14, and you mentioned in another issue that you use clip-vit-large-patch14, could you elaborate on how your code works with clip-vit-large-patch14? I get errors when I run your training code directly with it.

Looking forward to your reply! Thanks again for your effort.
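The mismatch can be verified directly from the model configs (requires transformers and network access to the Hugging Face hub; the model names are taken from the discussion above):

from transformers import CLIPVisionConfig

for name in ["openai/clip-vit-base-patch32", "openai/clip-vit-large-patch14"]:
    cfg = CLIPVisionConfig.from_pretrained(name)
    # hidden_size is the width of the per-patch hidden states fed to cross-attention.
    print(name, cfg.hidden_size)  # 768 for ViT-B/32, 1024 for ViT-L/14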

Any results?

I saw you added an inference cmd to the readme.
Do you have any preliminary results?

Loss not decreasing in stage 1

After training stage 1 for 30,000 steps on the TikTok dataset, I'm getting the following loss curve and validation_pipeline images. Is this correct?

image

referencenet initializing warning ?

Some weights of the model checkpoint were not used when initializing ReferenceNet:
['conv_norm_out.bias, conv_norm_out.weight, conv_out.bias, conv_out.weight, up_blocks.3.attentions.2.proj_out.bias, up_blocks.3.attentions.2.proj_out.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.bias, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.weight, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.bias, up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight, up_blocks.3.attentions.2.transformer_blocks.0.norm3.bias, up_blocks.3.attentions.2.transformer_blocks.0.norm3.weight']

Is this correct? The training loss is not decreasing. Result:
grid

The pose condition seems to have no effect.

batch size in training stage1.

I am training the first stage with 8×A800 80 GB GPUs. However, the maximum batch size on each GPU can only be set to 1. Is that normal?

About "beta_schedule"

I noticed that you changed beta_schedule from linear to scaled_linear. Is it because the training results are better when using the latter?
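For context, the two options differ only in how the betas are spaced; "scaled_linear" spaces them linearly in square-root space and is the schedule Stable Diffusion checkpoints were trained with. A minimal diffusers example (the beta range shown is the common SD setting, used here as an assumption about this repository's config):

from diffusers import DDPMScheduler

scheduler = DDPMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",  # betas = linspace(sqrt(start), sqrt(end)) ** 2
)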
