
comfyui_aniportrait's Introduction

Updates:

① Implemented frame interpolation to speed up generation.

② Modified the code to support chaining with the VHS nodes. I found that ComfyUI's IMAGE type requires torch float32 data, while AniPortrait heavily uses uint8 numpy images, so I dropped my own image/video upload and generation nodes in favor of the prevalent SOTA VHS image/video upload and Video Combine nodes; they are WYSIWYG, interact well, and render results instantly.
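The datatype difference behind update ② can be sketched with a minimal round-trip helper (hypothetical names, shown with numpy only; in ComfyUI the float32 array would additionally be wrapped in a torch tensor):

```python
import numpy as np

def uint8_to_comfy(frames: np.ndarray) -> np.ndarray:
    """Convert HxWxC (or NxHxWxC) uint8 images in [0, 255] to float32 in
    [0, 1], the value range ComfyUI's IMAGE type expects."""
    return frames.astype(np.float32) / 255.0

def comfy_to_uint8(frames: np.ndarray) -> np.ndarray:
    """Convert float32 images in [0, 1] back to the uint8 range that
    AniPortrait's numpy-based code uses."""
    return (np.clip(frames, 0.0, 1.0) * 255.0).round().astype(np.uint8)
```

In the nodes themselves the float32 array would typically be wrapped with `torch.from_numpy(...)` before being passed downstream; that step is omitted here to keep the sketch dependency-free.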

  • ✅ [2024/04/09] raw video to pose video with reference image (aka self-driven)
  • ✅ [2024/04/09] audio driven
  • ✅ [2024/04/09] face reenactment
  • ✅ [2024/04/22] implemented the audio2pose model and pre-trained weights for audio2video; the face reenactment and audio2video workflows have been modified. Inference up to a maximum length of 10 seconds is currently supported; you can experiment with the length hyperparameter.

You can contact me via Twitter or WeChat: GalaticKing

audio driven combined with reference image and reference video

audio2video audio2video workflow

Aniportrait_00002-audio.mp4

raw video to pose video with reference image

pose2video

Aniportrait_00004-audio.mp4

face reenactment

face_reenacment video2video workflow

AnimateDiff_00001-audio.mp4

This is an unofficial implementation of AniPortrait as a ComfyUI custom node. Since I have a routine job, I will update this project when I have time.

Aniportrait_pose2video.json

Audio driven

face reenactment

You should run

git clone https://github.com/frankchieng/ComfyUI_Aniportrait.git

(typically inside ComfyUI's custom_nodes directory), then run

pip install -r requirements.txt

download the pretrained models

StableDiffusion V1.5

sd-vae-ft-mse

image_encoder

wav2vec2-base-960h

download the weights:

denoising_unet.pth reference_unet.pth pose_guider.pth motion_module.pth audio2mesh.pt audio2pose.pt film_net_fp16.pt

./pretrained_model/
|-- image_encoder
|   |-- config.json
|   `-- pytorch_model.bin
|-- sd-vae-ft-mse
|   |-- config.json
|   |-- diffusion_pytorch_model.bin
|   `-- diffusion_pytorch_model.safetensors
|-- stable-diffusion-v1-5
|   |-- feature_extractor
|   |   `-- preprocessor_config.json
|   |-- model_index.json
|   |-- unet
|   |   |-- config.json
|   |   `-- diffusion_pytorch_model.bin
|   `-- v1-inference.yaml
|-- wav2vec2-base-960h
|   |-- config.json
|   |-- feature_extractor_config.json
|   |-- preprocessor_config.json
|   |-- pytorch_model.bin
|   |-- README.md
|   |-- special_tokens_map.json
|   |-- tokenizer_config.json
|   `-- vocab.json
|-- audio2mesh.pt
|-- audio2pose.pt
|-- denoising_unet.pth
|-- motion_module.pth
|-- pose_guider.pth
|-- reference_unet.pth
|-- film_net_fp16.pt
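Misplaced or misnamed files under pretrained_model are a common source of load errors, so it may help to verify the layout before launching ComfyUI. A hypothetical helper that checks a few of the paths from the tree above (extend the list as needed):

```python
import os

# A sample of the expected files from the directory tree above.
EXPECTED = [
    "image_encoder/config.json",
    "sd-vae-ft-mse/config.json",
    "stable-diffusion-v1-5/model_index.json",
    "stable-diffusion-v1-5/unet/config.json",
    "wav2vec2-base-960h/config.json",
    "denoising_unet.pth",
    "motion_module.pth",
    "film_net_fp16.pt",
]

def check_pretrained_layout(root: str) -> list:
    """Return the expected files that are missing under root."""
    return [p for p in EXPECTED if not os.path.exists(os.path.join(root, p))]
```

Running `check_pretrained_layout("./pretrained_model")` makes folder-name mistakes fail fast with a readable list instead of mid-workflow with a traceback.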

Tips: an intermediate audio file will be generated and then deleted. The raw-video-to-pose video (with audio) and the pose2video mp4 files will be located in ComfyUI's output directory. The originally uploaded mp4 video must be square, e.g. 512x512; otherwise the result will look weird.
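Given the square-input requirement, frames can be squared up front. A minimal numpy-only sketch (hypothetical helper; nearest-neighbour resize for illustration — a real pipeline would use a proper resampler such as Pillow's or OpenCV's):

```python
import numpy as np

def to_square(frame: np.ndarray, size: int = 512) -> np.ndarray:
    """Center-crop an HxWxC frame to a square, then nearest-neighbour
    resize it to size x size (e.g. 512x512)."""
    h, w = frame.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    square = frame[top:top + side, left:left + side]
    # Nearest-neighbour index map from the cropped side to the target size.
    idx = np.linspace(0, side - 1, size).round().astype(int)
    return square[idx][:, idx]
```

Applying this to every frame before upload avoids the distorted results mentioned above.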

I've updated diffusers from 0.24.x to 0.26.2. In diffusers/models/embeddings.py the class PositionNet was renamed to GLIGENTextBoundingboxProjection and CaptionProjection to PixArtAlphaTextProjection, so pay attention to this and modify the corresponding Python files, such as src/models/transformer_2d.py, if you have a lower version of diffusers installed.
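One way to stay compatible across both diffusers versions is to resolve the class by whichever name exists. A small hypothetical helper, demonstrated below against a dummy namespace rather than diffusers itself:

```python
def resolve_renamed(module, *names):
    """Return the first attribute of `module` found among `names`.

    Example (not executed here):
        resolve_renamed(diffusers.models.embeddings,
                        "GLIGENTextBoundingboxProjection",  # diffusers >= 0.26
                        "PositionNet")                      # older diffusers
    """
    for name in names:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"None of {names} found in {module!r}")
```

The same pattern covers the CaptionProjection / PixArtAlphaTextProjection rename.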

comfyui_aniportrait's People

Contributors

frankchieng


comfyui_aniportrait's Issues

Thanks for your node!

I tested this node and the results are better than I expected. I'm looking forward to support for resolutions other than 512*512.

CLIPVisionModelWithProjection Shape Size Error

Unfortunately, running any of the example workflows I get the following error:

Error occurred when executing AniPortrait_Pose_Gen_Video:

Error(s) in loading state_dict for CLIPVisionModelWithProjection:
size mismatch for vision_model.embeddings.class_embedding: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for vision_model.embeddings.patch_embedding.weight: copying a param with shape torch.Size([1024, 3, 14, 14]) from checkpoint, the shape in current model is torch.Size([768, 3, 32, 32]).
size mismatch for vision_model.embeddings.position_embedding.weight: copying a param with shape torch.Size([257, 1024]) from checkpoint, the shape in current model is torch.Size([50, 768]).
size mismatch for vision_model.pre_layrnorm.weight: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
size mismatch for vision_model.pre_layrnorm.bias: copying a param with shape torch.Size([1024]) from checkpoint, the shape in current model is torch.Size([768]).
[...]
You may consider adding `ignore_mismatched_sizes=True` in the model `from_pretrained` method.

File "E:\COMFY\ComfyUI-robe\execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\COMFY\ComfyUI-robe\execution.py", line 81, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\COMFY\ComfyUI-robe\execution.py", line 74, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "E:\COMFY\ComfyUI-robe\custom_nodes\ComfyUI_Aniportrait\nodes.py", line 169, in pose_generate_video
image_enc = CLIPVisionModelWithProjection.from_pretrained(image_encoder_path).to(dtype=weight_dtype, device=device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxx\.conda\envs\comfy\Lib\site-packages\transformers\modeling_utils.py", line 3677, in from_pretrained
) = cls._load_pretrained_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\xxx\.conda\envs\comfy\Lib\site-packages\transformers\modeling_utils.py", line 4155, in _load_pretrained_model
raise RuntimeError(f"Error(s) in loading state_dict for {model.__class__.__name__}:\n\t{error_msg}")

Is this related to my torch or transformers version?
I'm running transformers 4.40.2 and torch 2.3.0+cu118

Can you please help me to fix it?
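The mismatched shapes in the traceback (1024 in the checkpoint vs. 768 in the instantiated model) come straight from the image encoder's config.json, which suggests the wrong image_encoder was downloaded rather than a torch/transformers version issue. A hypothetical check of the local config:

```python
import json
import os

def image_encoder_hidden_size(encoder_dir: str):
    """Read the vision hidden size from the image encoder's config.json.

    The checkpoint in the traceback above expects 1024; reading 768 here
    would mean a different CLIP vision encoder was downloaded.
    """
    with open(os.path.join(encoder_dir, "config.json")) as f:
        cfg = json.load(f)
    # Depending on how the config was saved, the key may be top-level
    # or nested under a vision_config section.
    return cfg.get("hidden_size") or cfg.get("vision_config", {}).get("hidden_size")
```

This is only a diagnostic sketch; the fix would be to re-download the image_encoder listed in the installation section above.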

Error occurred when executing AniPortraitLoader

Error occurred when executing AniPortraitLoader:

Error no file named config.json found in directory E:\AI\ComfyUI-aki-v1.1\custom_nodes\ComfyUI-AniPortrait\pretrained_model\StableDiffusion V1.5.

File "E:\AI\ComfyUI-aki-v1.1\execution.py", line 151, in recursive_execute
output_data, output_ui = get_output_data(obj, input_data_all)
File "E:\AI\ComfyUI-aki-v1.1\execution.py", line 81, in get_output_data
return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
File "E:\AI\ComfyUI-aki-v1.1\execution.py", line 74, in map_node_over_list
results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
File "E:\AI\ComfyUI-aki-v1.1\custom_nodes\ComfyUI-AniPortrait\nodes.py", line 120, in run
reference_unet = UNet2DConditionModel.from_pretrained(
File "E:\AI\ComfyUI-aki-v1.1\python\lib\site-packages\huggingface_hub\utils_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "E:\AI\ComfyUI-aki-v1.1\python\lib\site-packages\diffusers\models\modeling_utils.py", line 567, in from_pretrained
config, unused_kwargs, commit_hash = cls.load_config(
File "E:\AI\ComfyUI-aki-v1.1\python\lib\site-packages\huggingface_hub\utils_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "E:\AI\ComfyUI-aki-v1.1\python\lib\site-packages\diffusers\configuration_utils.py", line 374, in load_config
raise EnvironmentError(

How can I save only the generated video?

I ran the workflow successfully, but the saved video contains the original image and the face mesh alongside the result. How can I save only the generated video?
