
makeittalk's People

Contributors

dingzeyuli, yzhou359


makeittalk's Issues

Question about the Sound Encoder

Thanks for your great work! The results are amazing!

I have a question about the choice of the sound encoder. As stated in the last paragraph of "Related Work" in your paper, you use Resemblyzer to extract the identity embedding and AutoVC to extract the content embedding.

Here is my question: since AutoVC itself can decompose content and identity, why bother using Resemblyzer? Does it make a big difference?

Thank you in advance!

Training a Chinese-language model

Hello, I am training my own model with makeittalk. Are the files STD_FACE_LANDMARKS.txt and MEAN_STD_AUTOVC_RETRAIN_AU.txt under src/dataset/utils/ generic, or dataset-specific? If they are not generic, which script generates them? I could not find the corresponding code in the project.

Is there any scope for making a lightweight model?

Hey! This is awesome work on animating faces. I wanted to know: if the image (just a sketch) is fixed and only the incoming audio varies, can we build a custom lightweight model, like MobileNet, that generates the input data for inference and runs the audio-to-landmarks prediction in real time using only the browser's WebGL?

AutoVC training

Hello, the AutoVC authors have open-sourced their training code. Since you made some modifications to the AutoVC model, could you also contribute the training code for your modified AutoVC? Looking forward to your reply.

Train

Hi, good job! How do I train the model?

When Training Content Branch

Hello! I ran into a problem while training the content branch:
I could not find a file named 'autovc_retrain_mel_train_au.pickle' in the dataset you uploaded to Google Drive,
so the code stops at src/dataset/audio2landmark/audio2landmark_dataset.py, line 33.
Could you please tell me how to solve this?
By the way, I wonder what 'align' and 'mel' mean in the filenames of your dataset. 🤔

3DMM

Hi,
thanks again for this amazing work. In your opinion, can this approach replace approaches based on 3D Morphable Models? Setting aside the simplicity and the lower number of degrees of freedom, I mean purely in terms of quality.

Thanks

Could find no file with path '%06d.tga' and index in the range 0-4 %06d.tga: No such file or directory

The cartoon demo encountered the following errors:
ffmpeg version 4.2.2 Copyright (c) 2000-2019 the FFmpeg developers
built with gcc 9.2.1 (GCC) 20200122
configuration: --disable-static --enable-shared --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
libavutil 56. 31.100 / 56. 31.100
libavcodec 58. 54.100 / 58. 54.100
libavformat 58. 29.100 / 58. 29.100
libavdevice 58. 8.100 / 58. 8.100
libavfilter 7. 57.100 / 7. 57.100
libswscale 5. 5.100 / 5. 5.100
libswresample 3. 5.100 / 3. 5.100
libpostproc 55. 5.100 / 55. 5.100
[image2 @ 000002659ffd11c0] Could find no file with path '%06d.tga' and index in the range 0-4
%06d.tga: No such file or directory

I am using Ubuntu 18.04

Thanks

Training data

Hello! Thanks for your great results!
I have a question about the data you trained on:
I can't find the exact resolution of the face crop in the "underlined" part of the data (see the attached figure, MakeItTalk-Speaker-Aware-Talking-Head-Animation), nor the total amount of data.

Can you mention it, please? ^^

P.S. Did you use the full VoxCeleb2 dataset for Img-to-Img training?

Questions about training the content branch

The training set is currently not provided on the Google Drive. Could you describe in words:

  1. What input format the content branch expects, and how to construct these inputs from raw audio and video, using the AutoVC code or whatever additional processing is required;
  2. How to retrain the AutoVC model on top of the currently provided code.

I am confused about the required format when building training data from other datasets; any guidance would be appreciated.

How to get the triangulation.txt from a new cartoon image

Thanks for sharing the work. I wonder if I can use the code to drive a new cartoon image. I can annotate the landmarks by hand, but what does the file triangulation.txt mean? How can I generate it for a new cartoon image?

Thanks a lot.
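For context, triangulation.txt is most plausibly a list of Delaunay triangles (triples of landmark indices) used to warp the cartoon texture. A minimal sketch of how such a file could be produced from hand-labeled landmarks with scipy; the landmark file name and the exact output format expected by the repo are assumptions:

```python
import numpy as np
from scipy.spatial import Delaunay

# Hand-labeled 2D landmarks for the new cartoon image, shape (N, 2)
# ('cartoon_landmarks.txt' is a hypothetical file name).
landmarks = np.loadtxt('cartoon_landmarks.txt')

# Delaunay triangulation over the landmark points; each row of
# tri.simplices is a triple of landmark indices forming one triangle.
tri = Delaunay(landmarks)

# Write one triangle (three landmark indices) per line.
np.savetxt('triangulation.txt', tri.simplices, fmt='%d')
```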

out.mp4 with audio embedded?

Hi,
I am looking for a way to get the good-quality "out.mp4" with the audio embedded in it (instead of the test_audio_embed.mp4, whose quality is worse).
If you have a clue what to change in the code, thank you!
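One generic way to get the high-quality out.mp4 with sound, without touching the code, is to mux the driving audio into it with ffmpeg, copying the video stream unchanged. A sketch (the file names are placeholders, not the repo's fixed paths):

```python
import os

# Copy the video stream as-is (-c:v copy) and encode the audio to AAC;
# -shortest trims the output to the shorter of the two streams.
os.system('ffmpeg -y -i examples/out.mp4 -i examples/input_audio.wav '
          '-c:v copy -c:a aac -shortest examples/out_with_audio.mp4')
```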

tmp.wav file missing?

When I run the two jupyter notebooks, or even main_end2end.py, I get the following error:


---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-7-fc3b009acf6e> in <module>
     76 for ain in ains:
     77     os.system('ffmpeg -y -loglevel error -i examples/{} -ar 16000 examples/tmp.wav'.format(ain))
---> 78     shutil.copyfile('examples/tmp.wav', 'examples/{}'.format(ain))
     79 
     80     # au embedding

c:\users\admin\appdata\local\programs\python\python37\lib\shutil.py in copyfile(src, dst, follow_symlinks)
    118         os.symlink(os.readlink(src), dst)
    119     else:
--> 120         with open(src, 'rb') as fsrc:
    121             with open(dst, 'wb') as fdst:
    122                 copyfileobj(fsrc, fdst)

FileNotFoundError: [Errno 2] No such file or directory: 'examples/tmp.wav'

Where can I find this tmp.wav file?
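tmp.wav is not shipped with the repo: the line just above the failing one creates it by converting the input audio with ffmpeg via os.system, which fails silently when ffmpeg is not installed. A hedged sketch of the same conversion step with explicit error checking (ain is the input audio file name from the loop in the traceback):

```python
import subprocess

# Resample the input audio to 16 kHz; unlike os.system, check=True
# raises immediately if ffmpeg is missing or errors out.
subprocess.run(
    ['ffmpeg', '-y', '-loglevel', 'error',
     '-i', 'examples/{}'.format(ain), '-ar', '16000', 'examples/tmp.wav'],
    check=True)
```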

'NoneType' object has no attribute 'ndim'

Hi,
thanks a lot for bringing this amazing concept to GitHub.
I got an error when I replaced the prebuilt image with my own image;
below is a screenshot of the error log.
(error log screenshot attached)

Please help me, as this is a personal project I need for a job.

Once again, thanks a lot.
God bless you.

when training Content Branch

The preprocessed dataset for training the content branch only includes 3 files, i.e. autovc_retrain_mel_test_au.pickle, autovc_retrain_mel_test_fl.pickle, and emb.pickle. When running the command 'python main_train_content.py --train', I get: FileNotFoundError: [Errno 2] No such file or directory: autovc_retrain_mel_train_au.pickle.

'NoneType' object has no attribute 'ndim'

Hi, after struggling to get a working configuration of conda + pytorch + torchvision + cuda, etc. (first time),
I managed to run the script, but I'm facing another problem:

(makeittalk_env) vincent@denaes:~/Desktop/development/MakeItTalk-main$ python main_end2end.py --jpg examples/327-3275260_leonardo-dicaprio-png-famous-actor.png
Downloading: "https://www.adrianbulat.com/downloads/python-fan/3DFAN4-4a694010b9.zip" to /home/vincent/.cache/torch/hub/checkpoints/3DFAN4-4a694010b9.zip
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 91.9M/91.9M [03:41<00:00, 434kB/s]
Downloading: "https://www.adrianbulat.com/downloads/python-fan/depth-6c4283c0e0.zip" to /home/vincent/.cache/torch/hub/checkpoints/depth-6c4283c0e0.zip
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 224M/224M [09:08<00:00, 429kB/s]
Traceback (most recent call last):
  File "main_end2end.py", line 72, in <module>
    shapes = predictor.get_landmarks(img)
  File "/home/vincent/anaconda3/envs/makeittalk_env/lib/python3.6/site-packages/face_alignment/api.py", line 110, in get_landmarks
    return self.get_landmarks_from_image(image_or_path, detected_faces, return_bboxes, return_landmark_score)
  File "/home/vincent/anaconda3/envs/makeittalk_env/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "/home/vincent/anaconda3/envs/makeittalk_env/lib/python3.6/site-packages/face_alignment/api.py", line 138, in get_landmarks_from_image
    image = get_image(image_or_path)
  File "/home/vincent/anaconda3/envs/makeittalk_env/lib/python3.6/site-packages/face_alignment/utils.py", line 342, in get_image
    if image.ndim == 2:
AttributeError: 'NoneType' object has no attribute 'ndim'

If you have a clue, thank you!
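This traceback usually means cv2.imread returned None (bad path or unreadable file), so face_alignment receives None instead of an image array. A minimal guard one could add before calling the predictor:

```python
import cv2

img = cv2.imread('examples/' + opt_parser.jpg)
if img is None:
    # cv2.imread returns None instead of raising when the file is
    # missing or cannot be decoded; fail early with a clear message.
    raise FileNotFoundError('Could not read image: examples/' + opt_parser.jpg)
```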

Can't find where the audio input goes

Where do I feed in the audio? When I run the quick demo, it keeps failing with a missing tmp.wav file. What is this tmp audio, and where does it come from?

Output dimension of image-image translation

It looks like in the paper the image-to-image translation network is trained with batch size 16.
When using the code to run inference with batch size 16, the input tensor has shape (16, 6, 256, 256), but the output of the image-to-image translation model is still (1, 6, 256, 256). Any idea what the issue is?

RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory

```python
img = cv2.imread('examples/' + opt_parser.jpg)
predictor = face_alignment.FaceAlignment(face_alignment.LandmarksType._3D, device='cpu', flip_input=True)
shapes = predictor.get_landmarks(img)
if (not shapes or len(shapes) != 1):
    print('Cannot detect face landmarks. Exit.')
    exit(-1)
shape_3d = shapes[0]

if(opt_parser.close_input_face_mouth):
    util.close_input_face_mouth(shape_3d)
```

Can anyone help??


RuntimeError                              Traceback (most recent call last)
<ipython-input> in <module>
1 img =cv2.imread('examples/' + opt_parser.jpg)
----> 2 predictor = face_alignment.FaceAlignment(face_alignment.LandmarksType._3D, device='cpu', flip_input=True)
3 shapes = predictor.get_landmarks(img)
4 if (not shapes or len(shapes) != 1):
5 print('Cannot detect face landmarks. Exit.')

~\Anaconda3\lib\site-packages\face_alignment\api.py in __init__(self, landmarks_type, network_size, device, flip_input, face_detector, face_detector_kwargs, verbose)
83 network_name = '3DFAN-' + str(network_size)
84 self.face_alignment_net = torch.jit.load(
---> 85 load_file_from_url(models_urls.get(pytorch_version, default_model_urls)[network_name]))
86
87 self.face_alignment_net.to(device)

~\Anaconda3\lib\site-packages\torch\jit\_serialization.py in load(f, map_location, _extra_files)
159 cu = torch._C.CompilationUnit()
160 if isinstance(f, str) or isinstance(f, pathlib.Path):
--> 161 cpp_module = torch._C.import_ir_module(cu, str(f), map_location, _extra_files)
162 else:
163 cpp_module = torch._C.import_ir_module_from_buffer(

RuntimeError: [enforce fail at ..\caffe2\serialize\inline_container.cc:145] . PytorchStreamReader failed reading zip archive: failed finding central directory
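This error typically means the downloaded checkpoint zip is truncated or corrupted. A common remedy is to delete the cached file so it is re-downloaded on the next run; a sketch assuming the default torch hub cache location (it may differ on your machine):

```python
import os

# Default torch hub checkpoint cache; remove the suspect zip so the
# next face_alignment.FaceAlignment() call downloads a fresh copy.
ckpt = os.path.expanduser('~/.cache/torch/hub/checkpoints/3DFAN4-4a694010b9.zip')
if os.path.exists(ckpt):
    os.remove(ckpt)
```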

my own dataset

Thank you for sharing your work.

I want to train your model on my own dataset, and I wonder how to preprocess the data.
Could you tell me how you did it?

basic information about image background

Thanks for your sharing; this is a good project. I have a question: the portrait image (256x256 .jpg) is cropped from a larger image (1280x720 .jpg) and used as input, and a 256x256 .mp4 video is generated. Is there a way to composite your 256x256 result back onto the 1280x720 background, so the output video keeps the full background? Thanks.
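For reference, pasting each generated 256x256 frame back into the 1280x720 original is straightforward once the crop box used at input time is known. A per-frame sketch with OpenCV; the crop coordinates and file names are hypothetical values you would record when cropping:

```python
import cv2

# Top-left corner and size of the 256x256 crop inside the original
# 1280x720 image (hypothetical values recorded at crop time).
x, y, size = 512, 100, 256

background = cv2.imread('original_1280x720.jpg')
frame = cv2.imread('generated_frame.jpg')  # one 256x256 output frame

# Resize defensively, then paste the generated face back into place.
frame = cv2.resize(frame, (size, size))
background[y:y + size, x:x + size] = frame
cv2.imwrite('composited_frame.jpg', background)
```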

custom cartoon

Hello,
thanks for this amazing work and for sharing it. Could you explain how to obtain the Delaunay triangulation for a custom cartoon?

Thanks in advance and will really appreciate your response

Some questions about the model and training from a CV beginner

1. What is "register"?
When training the content model, the paper says: "We also register the facial landmarks to a front-facing standard facial template using a best-estimated affine transformation." Does this correspond to the following code in src/approaches/train_content.py?

```python
''' register face '''
if (self.opt_parser.use_reg_as_std):
    landmarks = input_face_id.detach().cpu().numpy().reshape(68, 3)
    frame_t_shape = landmarks[self.t_shape_idx, :]
    T, distance, itr = icp(frame_t_shape, self.anchor_t_shape)
    landmarks = np.hstack((landmarks, np.ones((68, 1))))
    registered_landmarks = np.dot(T, landmarks.T).T
    input_face_id = torch.tensor(registered_landmarks[:, 0:3].reshape(1, 204), requires_grad=False,
                                 dtype=torch.float).to(device)
```

2. When preparing the training data for the content branch, i.e. the target facial-landmark coordinates, do they need to be registered as well?

3. Is src/approaches/train_speaker_aware.py the code used to train the speaker-aware model?
1) Why is the "Discriminator D_T" training part commented out? Does D need to be trained?
2) Does training this model need anything registration-related? Section 4.2 of the paper says: "we do not register the landmarks to a front-facing template since here we are interested in learning the overall head motion." But src/approaches/train_speaker_aware.py still loads inputs_reg_fl; I don't understand what this means.
3) Computing the loss:

```python
fl_dis_pred = fl_dis_pred + face_id[0:1].detach()
loss_reg_fls = torch.nn.functional.l1_loss(fl_dis_pred, reg_fls_gt)
V = (fl_dis_pred + face_id[0:1]).view(-1, 68, 3)  # should this line not add face_id again?
```

4. rot_trans and rot_quats are related to pose; can the related code be commented out? Is reg* related to pose as well?
5. When applying norm_input_face to faces, is each frame processed independently? Would it be more appropriate to use the same scale and shift values for a whole video?

How to preprocess landmarks for speech content training (and the speaker-aware module)

I have three questions:

  1. How are the facial landmarks preprocessed for speech content training? I use face-alignment for landmark detection, which gives landmark coordinates in the hundreds, but when I open the file autovc_retrain_mel_test_au.pickle, the facial landmarks are scaled to (-1, 1). (A normalization sketch follows this list.)
  2. From section 4.1 of your paper, I understand that for training the speech content module all facial landmark frames must be aligned so that the landmark points (27, 28, 29, 30, 33, 36, 39, 42, 45) are fixed, and only the displacement of the other points (mouth and jaw movements) is considered, factoring out head movement. But in src\approaches\train_content.py, line 176, you only register the chosen closed-lip landmarks to the standard landmarks in src\dataset\utils\STD_FACE_LANDMARKS.txt; the other facial landmark frames are not aligned. Does this mean speech content training still contains head movement?
  3. What are the differences between the landmark preprocessing used to prepare the dataset for training the speech content animation module versus the speaker-aware animation module?
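On question 1: detector output in pixel units is commonly brought into (-1, 1) by normalizing with the face's center and scale. A generic sketch of one such normalization, not necessarily the exact preprocessing behind the pickle files:

```python
import numpy as np

def normalize_landmarks(fl):
    """Map (68, 2) or (68, 3) pixel-space landmarks into (-1, 1).

    Centers on the bounding-box midpoint and divides by half the largest
    extent, so every coordinate lands in [-1, 1].
    """
    lo, hi = fl.min(axis=0), fl.max(axis=0)
    center = (lo + hi) / 2.0
    scale = (hi - lo).max() / 2.0
    return (fl - center) / scale
```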

The content embedding of the Vox dataset

Hi, when you extract the content embedding of the Vox dataset, do you use the same target identity embedding (autovc/retrain_version/obama_emb.txt) as for the Obama dataset?

Align facial landmarks and AutoVC output

Thanks for your great project. As we know, the fps of a video is 25 or 29.97, while the rate of the AutoVC output is 62.5 Hz. I wonder how you align the facial landmarks and the AutoVC output in time?
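For reference, one common way to align a 62.5 Hz feature stream with 25 (or 29.97) fps video is to index the feature sequence at each frame's timestamp. A minimal nearest-neighbor sketch (generic resampling, not necessarily the repo's exact scheme):

```python
import numpy as np

def features_for_frames(au_feats, fps=25.0, au_rate=62.5):
    """Pick the nearest 62.5 Hz feature row for each video frame."""
    n_frames = int(len(au_feats) * fps / au_rate)
    # Frame i occurs at time i / fps; the matching feature index is
    # that time multiplied by the feature rate.
    idx = np.round(np.arange(n_frames) * au_rate / fps).astype(int)
    idx = np.clip(idx, 0, len(au_feats) - 1)
    return au_feats[idx]
```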

Some problems encountered during training

Hello, when I train the content branch, some very bad frames always appear in the resulting videos, as shown in the attached figure.

The figure shows the consecutive frames 428, 429, 430, and 431 from the validation set after 245 epochs of training. I have checked that my training set contains no corrupted video data. Have you encountered this problem? Where do you think it might come from?

Broken face when using example cartoons

Hi, I ran python main_end2end_cartoon.py --jpg "wilk.png" --jpg_bg "wilk_bg.png" for both wilk and bluehead, and the output videos have broken faces, as shown in the attached screenshots.

Please help, thanks!

Quick Demo Colab File not found error

No such file or directory: 'examples/pred_fls_examples/M6_04_16k_audio_embed.txt'. There is a similar closed issue on Windows but I am experiencing this on Colab.

输入中文音频嘴型对不上以及性能如何优化

作者您好,我对你MakeItTalk很感兴趣,但是我被两个问题所困扰,第一个问题是我输入的中文音频和嘴型对不上,这个问题从何入手去定位?第二个问题是我输入一段24s的音频,生成的out.mp4所需要的时间在170-190s左右,时间有点太长了,这个性能能否优化到50%,请问如何优化呢?

Where are the raw images on Colab Demo?

I tried out the demo and received a video clip with 3 faces at the end. I want to post-process the images. How can I download only the rendered images?
I couldn't find them; I guess they are deleted each time. How can I stop that?

Evaluation Metrics

Thanks for the release of this great repo!

I am interested in your proposed quantitative evaluation metrics (i.e., D-VL, D-A, D-LL, D-Rot/Pos). A standard set of evaluation metrics would make follow-up comparisons fairer. Is there any code for this part? Or can you provide more details (e.g., the definition of mouth shape for the D-A metric) for computing these metrics?

Thanks for any reply!

No module named 'thirdparty.autovc' - colab error

Just following the Colab example step by step, I get an error:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-6-a3006e2fd807> in <module>()
      9 import pickle
     10 import face_alignment
---> 11 from thirdparty.autovc.AutoVC_mel_Convertor_retrain_version import AutoVC_mel_Convertor
     12 import shutil
     13 import time

ModuleNotFoundError: No module named 'thirdparty.autovc'

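This import error usually means the notebook's working directory is not the repository root, so the thirdparty package is not on the import path. A hedged sketch of the usual fix in a Colab cell, assuming the repo was cloned to /content/MakeItTalk:

```python
import os
import sys

# Run imports from the repo root so 'thirdparty' is importable.
os.chdir('/content/MakeItTalk')
sys.path.insert(0, os.getcwd())
```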

MakeItTalk Quick Demo - 3/3 - SameFileError

MakeItTalk Quick Demo (natural human face animation)
Step 3/3: One-click to Run (just wait in seconds).

Error

Loaded Image...
---------------------------------------------------------------------------
SameFileError                             Traceback (most recent call last)
<ipython-input-10-fc3b009acf6e> in <module>()
     76 for ain in ains:
     77     os.system('ffmpeg -y -loglevel error -i examples/{} -ar 16000 examples/tmp.wav'.format(ain))
---> 78     shutil.copyfile('examples/tmp.wav', 'examples/{}'.format(ain))
     79 
     80     # au embedding

/home/ifarkas/anaconda3/envs/makeittalk_env/lib/python3.6/shutil.py in copyfile(src, dst, follow_symlinks)
    102     """
    103     if _samefile(src, dst):
--> 104         raise SameFileError("{!r} and {!r} are the same file".format(src, dst))
    105 
    106     for fn in [src, dst]:

SameFileError: 'examples/tmp.wav' and 'examples/tmp.wav' are the same file
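The copy fails because the input audio picked up by the loop is examples/tmp.wav itself (e.g. left over from a previous run), so source and destination coincide. A hedged sketch of a guard around the line from the traceback (ain is the loop variable from the notebook cell):

```python
import shutil

# Skip the copy when the converted output and the input are the same
# file, which happens if 'tmp.wav' was picked up as an input.
if ain != 'tmp.wav':
    shutil.copyfile('examples/tmp.wav', 'examples/{}'.format(ain))
```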

image2image

(attachment: random_0000.mp4)

Hello, the video I generated with your released img2img pretrained model looks poor. What could be causing this?
