
voca's Issues

Training data corrupted

Thanks a lot for this wonderful work! I am trying to train VOCA with the provided dataset, but I ran into a problem with the training data. After unzipping, I got only data_vert.npy. Is this the complete training data?
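
For anyone checking their download, a minimal sketch to inspect the unzipped array; the expected (frames, 5023, 3) layout is an assumption based on FLAME's 5023-vertex topology, not confirmed upstream:

    import numpy as np

    # Memory-map the file since it is large; shape and dtype tell you
    # whether the download is plausible before training.
    verts = np.load('data_vert.npy', mmap_mode='r')
    print(verts.shape, verts.dtype)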

I get an OpenGL error when I run visualize_sequence.py

When I run visualize_sequence.py, the console immediately prints the following error. Does this affect the program?

OpenGL test failed:
stdout: failure

stderr: Attempt to call an undefined function glutInit, check for bool(glutInit) before calling

Rendering texture in output video

Hello, I would like to know whether it is possible to render a textured video instead of a plain mesh. I found that the FLAME project can also export a mesh with a texture. Should I load the texture map every time pyrender renders a mesh in the VOCA code? Thank you!
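
As a starting point, a hedged sketch of loading a textured mesh with trimesh and handing it to pyrender; the file name is a placeholder, and whether this slots into VOCA's rendering helper unchanged is untested:

    import trimesh
    import pyrender

    # Load an OBJ whose material references a texture image, e.g. one
    # produced by fitting FLAME with TF_FLAME; process=False preserves
    # the vertex order and UVs.
    tm = trimesh.load('textured_head.obj', process=False)
    mesh = pyrender.Mesh.from_trimesh(tm)  # keeps the texture if UVs exist
    scene = pyrender.Scene()
    scene.add(mesh)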

Textured mesh mapping of lips is lost

Hello
Great work!
I have used this repo and other FLAME repos to create a textured mesh animation with sound.
I would appreciate your help in making the texture mapping look more realistic.

[Screenshot from 2020-08-20 10-20-09: the textured mesh]

As you can see, the first image is the textured mesh; when it is animated, the result is the next picture.

[Screenshot from 2020-08-20 10-20-40: the animated result]

As you can see, the lip region is problematic: a region that should be textured like the skin instead picks up texture from elsewhere on the face.
I used the TF_FLAME repo to map textures onto all meshes generated by VOCA.

Best Regards

Audio/mesh alignment

Hello, thank you for making this code available, this is some really awesome work.
I am looking at the data provided on the project page now.
The mesh sequence for each sentence and the audio are not strictly aligned
(i.e., the FPS calculated as number of meshes / (sample size / sample rate) comes out at roughly 59-61).
Could you please elaborate on how the audio was cropped or aligned with the video in the original capture/training process?
Thank you.
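
For concreteness, the poster's FPS estimate in code; all counts are illustrative, not taken from the dataset:

    # FPS = number of meshes / audio duration in seconds
    num_meshes = 239           # illustrative
    num_samples = 63712        # illustrative
    sample_rate = 16000        # illustrative; check the actual wav header
    fps = num_meshes / (num_samples / sample_rate)
    print(round(fps, 2))       # ~60 here; per the poster it drifts 59-61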

Issue about decoder

The decoder of VOCA is a fully connected layer with a linear activation function, outputting the 5023 × 3 dimensional array. How does the 50-dimensional code map to 5023 × 3 through that last fully connected layer? I hope the architecture here can be explained in more detail. Thank you.
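
For reference, a minimal TF1-style sketch of the layer the question describes (variable names are ours): a linear fully connected layer maps 50 inputs to 5023 × 3 = 15069 outputs via a 50 × 15069 weight matrix, followed by a reshape:

    import tensorflow as tf

    # 50-d code -> 15069 values -> reshape to per-vertex offsets.
    code = tf.placeholder(tf.float32, [None, 50])
    flat = tf.layers.dense(code, 5023 * 3, activation=None)  # linear
    offsets = tf.reshape(flat, [-1, 5023, 3])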

Fitting a TurboSquid Head mesh to a FLAME topology

Hi,
thank you for this great library.

From my understanding, the library only allows 3D models with FLAME topology, but in the demo video a model of Churchill from TurboSquid was used.

I cannot find clear instructions on how to use a hand-made 3D model with this library. I tried using Blender to convert an .obj head mesh to .ply, but the topology is of course different, so it failed.

I see I could maybe use RingNet to generate a mesh from an image, but what if I have a 3D model already?

TL;DR
Could you please elaborate on the procedure for taking a regular .obj and using it with this library?

Thank you very much for your time and any information you can share.

The result contains static face information from the training data when I use a model trained by myself. Could you please tell me what I have missed?

Hello, and thanks for sharing. When I train the model according to your paper, I find that the results contain static face information from the training data (downloaded from MPI-IS/VOCASET). For example, the .obj output in the 'animation_output' folder contains static facial features from the training data: the static face shape changes and is not the same as 'template/FLAME_sample.ply' when I set '--condition_idx 3'.
Could you please tell me the reason for this?
Also, when running run_voca.py I found the template used as input, as in "input_template: np.repeat(template.v[np.newaxis, :, :, np.newaxis], num_frames, axis=0)". Is it used to initialize the weights, or does it have another purpose? Could you please explain the role of the template in training?
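
For what it's worth, the quoted line just tiles the static template across frames; as far as the paper describes, the network predicts per-frame offsets that are added to this template, so it is not a weight initializer. A toy check of the shapes:

    import numpy as np

    num_frames = 4
    template_v = np.zeros((5023, 3))               # one neutral mesh
    batch = np.repeat(template_v[np.newaxis, :, :, np.newaxis],
                      num_frames, axis=0)
    print(batch.shape)                             # (4, 5023, 3, 1)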

No module named 'audio_handler'

Traceback (most recent call last):
  File "run_voca.py", line 22, in <module>
    from utils.inference import inference
  File "/content/voca/utils/inference.py", line 26, in <module>
    from audio_handler import AudioHandler
ModuleNotFoundError: No module named 'audio_handler'
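
A hedged workaround: audio_handler.py lives in utils/, and utils/inference.py imports it as a top-level module, so that directory has to be on sys.path before the import runs:

    import sys
    # Make utils/ importable before `from utils.inference import inference`
    # triggers `from audio_handler import AudioHandler`.
    sys.path.append('./utils')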

MPI-IS/mesh doesn't support Python 2.7 anymore

Hi, the new update to the MPI-IS/mesh repository breaks on Python 2.7.
I can confirm that reverting to commit 1761d544686b3735991954947a8befa759891eb4 fixes the problem on my side. Please consider adding this to the README, or adding the mesh repo as a submodule.

Thanks!

Audio and the mouth are not synced

My sincere condolences regarding Daniel Cudeiro, and amazing work!

I have a problem with the generated video: the audio and mouth seem out of sync, e.g., the speech has finished but the mouth is still moving.
I followed the instructions to install the model, and no errors were prompted during installation and inference.
I'm wondering whether this is due to some setting error in the code, or the model itself?

Workarounds I have tried:

  1. switching to different input audio.
  2. changing the frame rate in visualize_sequence.py line 62 (... '-framerate', '60', ...) to 30, 40, and 50.

Bug when I want to edit the head pose

When I want to edit the head pose, I get a bug:

~/PycharmProjects/voca-master$ python edit_sequences.py --source_path './animation_output/meshes' --out_path './FLAME_variation_pose' --flame_model_path './flame/generic_model.pkl' --mode pose --index 3 --max_variation 0.52
Traceback (most recent call last):
  File "edit_sequences.py", line 215, in <module>
    alter_sequence_head_pose(source_path, out_path, flame_model_fname, pose_idx=pose_idx, rot_angle=rot_angle)
  File "edit_sequences.py", line 197, in alter_sequence_head_pose
    np.interp(xsteps4, x4, y4)))
ValueError: could not broadcast input array from shape (590) into shape (589)
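
Without having traced edit_sequences.py's exact splitting logic, the mismatch looks like a classic off-by-one when a sequence is cut into interpolation segments with integer division; a sketch of deriving the segment lengths from the actual frame count so the pieces always concatenate back to num_frames values:

    import numpy as np

    num_frames = 590
    first = num_frames // 2
    second = num_frames - first   # not num_frames // 2: never drops a frame
    y1 = np.interp(np.linspace(0, 1, first), [0, 1], [0.0, 0.52])
    y2 = np.interp(np.linspace(0, 1, second), [0, 1], [0.52, 0.0])
    assert np.concatenate((y1, y2)).shape[0] == num_frames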

cuDNN launch failure : input shape ([306,1,16,29])

Hi,

thanks for sharing! Sorry to bother you again. T_T

I am currently trying to run your code on my machine (Python 2.7 & TensorFlow 1.12.0).
When I run the command "python run_voca.py", there are some problems. The following is the output; is there something wrong with my settings?

Thank you so much!!

python run_voca.py --tf_model_fname './model/gstep_52280.model' --ds_fname './ds_graph/output_graph.pb' --audio_fname './audio/test_sentence.wav' --template_fname './template/FLAME_sample.ply' --condition_idx 3 --out_path './animation_output'
2019-07-23 06:49:29.981799: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-07-23 06:49:34.971252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:18:00.0
totalMemory: 10.76GiB freeMemory: 1.87GiB
2019-07-23 06:49:35.165961: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:3b:00.0
totalMemory: 10.76GiB freeMemory: 1.77GiB
2019-07-23 06:49:35.287032: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 2 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:86:00.0
totalMemory: 10.76GiB freeMemory: 10.60GiB
2019-07-23 06:49:35.287328: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2
2019-07-23 06:50:29.496448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-23 06:50:29.496516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2
2019-07-23 06:50:29.496526: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N N N
2019-07-23 06:50:29.496549: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: N N N
2019-07-23 06:50:29.496557: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2: N N N
2019-07-23 06:50:29.496791: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1607 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:18:00.0, compute capability: 7.5)
2019-07-23 06:50:31.759153: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 1503 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:3b:00.0, compute capability: 7.5)
2019-07-23 06:50:31.759597: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10232 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:86:00.0, compute capability: 7.5)
process subj - seq
2019-07-23 06:50:41.148658: W tensorflow/core/framework/allocator.cc:122] Allocation of 201326592 exceeds 10% of system memory.
2019-07-23 06:50:41.449976: W tensorflow/core/framework/allocator.cc:122] Allocation of 201326592 exceeds 10% of system memory.
2019-07-23 06:50:42.074583: W tensorflow/core/framework/allocator.cc:122] Allocation of 201326592 exceeds 10% of system memory.
2019-07-23 06:50:42.386122: W tensorflow/core/framework/allocator.cc:122] Allocation of 201326592 exceeds 10% of system memory.
2019-07-23 06:50:42.732171: W tensorflow/core/framework/allocator.cc:122] Allocation of 201326592 exceeds 10% of system memory.
2019-07-23 06:51:52.237460: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1, 2
2019-07-23 06:51:52.237883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-23 06:51:52.237898: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0 1 2
2019-07-23 06:51:52.237909: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N N N
2019-07-23 06:51:52.237916: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 1: N N N
2019-07-23 06:51:52.237924: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 2: N N N
2019-07-23 06:51:52.238128: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1607 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:18:00.0, compute capability: 7.5)
2019-07-23 06:51:52.238453: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 1503 MB memory) -> physical GPU (device: 1, name: GeForce RTX 2080 Ti, pci bus id: 0000:3b:00.0, compute capability: 7.5)
2019-07-23 06:51:52.238653: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 10232 MB memory) -> physical GPU (device: 2, name: GeForce RTX 2080 Ti, pci bus id: 0000:86:00.0, compute capability: 7.5)
2019-07-23 06:52:12.202266: E tensorflow/stream_executor/cuda/cuda_dnn.cc:373] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-07-23 06:52:12.202348: W ./tensorflow/stream_executor/stream.h:2093] attempting to perform DNN operation using StreamExecutor without DNN support
Traceback (most recent call last):
File "run_voca.py", line 44, in
inference(tf_model_fname, ds_fname, audio_fname, template_fname, condition_idx, out_path)
File "/home/wangqianyun/voca/utils/inference.py", line 83, in inference
predicted_vertices = np.squeeze(session.run(output_decoder, feed_dict))
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cuDNN launch failure : input shape ([306,1,16,29])
[[node VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1 (defined at /home/wangqianyun/voca/utils/inference.py:65) = FusedBatchNorm[T=DT_FLOAT, data_format="NCHW", epsilon=1.001e-05, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1-0-TransposeNHWCToNCHW-LayoutOptimizer, VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1/Switch_1, VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1/Switch_2, VOCA/SpeechEncoder/batch_norm_1/cond_1/AssignMovingAvg/sub/Switch, VOCA/SpeechEncoder/batch_norm_1/cond_1/AssignMovingAvg_1/sub/Switch)]]

Caused by op u'VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1', defined at:
File "run_voca.py", line 44, in
inference(tf_model_fname, ds_fname, audio_fname, template_fname, condition_idx, out_path)
File "/home/wangqianyun/voca/utils/inference.py", line 65, in inference
saver = tf.train.import_meta_graph(tf_model_fname + '.meta')
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1674, in import_meta_graph
meta_graph_or_file, clear_devices, import_scope, **kwargs)[0]
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/training/saver.py", line 1696, in _import_meta_graph_with_return_elements
**kwargs))
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
_ProcessNewOps(graph)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False): # pylint: disable=protected-access
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3440, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3299, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in init
self._traceback = tf_stack.extract_stack()

InternalError (see above for traceback): cuDNN launch failure : input shape ([306,1,16,29])
[[node VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1 (defined at /home/wangqianyun/voca/utils/inference.py:65) = FusedBatchNorm[T=DT_FLOAT, data_format="NCHW", epsilon=1.001e-05, is_training=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"](VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1-0-TransposeNHWCToNCHW-LayoutOptimizer, VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1/Switch_1, VOCA/SpeechEncoder/batch_norm_1/cond/FusedBatchNorm_1/Switch_2, VOCA/SpeechEncoder/batch_norm_1/cond_1/AssignMovingAvg/sub/Switch, VOCA/SpeechEncoder/batch_norm_1/cond_1/AssignMovingAvg_1/sub/Switch)]]
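
A hedged observation: the log shows GPU 0 with only ~1.9 GiB free, and cuDNN often fails with CUDNN_STATUS_INTERNAL_ERROR when it cannot allocate its workspace. Two things worth trying are pinning the job to the free GPU (CUDA_VISIBLE_DEVICES=2) and letting TensorFlow grow its allocation instead of grabbing memory up front:

    import tensorflow as tf

    # TF1-style session config: allocate GPU memory on demand.
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    session = tf.Session(config=config)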

Edit Sequence Problem: could not broadcast input array from shape (296) into shape (295)

Thank you for sharing VOCA, a very excellent work!
I have run into an edit-sequence problem: editing the eyes succeeds, but shape and pose fail.

Shape Error
np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
File "edit_sequences.py", line 211, in
alter_sequence_shape(source_path, out_path, flame_model_fname, pc_idx=pc_idx, pc_range=pc_range)
File "edit_sequences.py", line 144, in alter_sequence_shape
model_parms[:, pc_idx] = np.hstack((np.interp(xsteps1, x1, y1), np.interp(xsteps2, x2, y2)))
ValueError: could not broadcast input array from shape (296) into shape (295)

Pose Error
np_resource = np.dtype([("resource", np.ubyte, 1)])
Traceback (most recent call last):
File "edit_sequences.py", line 215, in
alter_sequence_head_pose(source_path, out_path, flame_model_fname, pose_idx=pose_idx, rot_angle=rot_angle)
File "edit_sequences.py", line 197, in alter_sequence_head_pose
np.interp(xsteps4, x4, y4)))
ValueError: could not broadcast input array from shape (296) into shape (295)

Vocaset to training data

Hello, I would like to know how to convert VOCASET into the training data. Is there some code you could share? It would be very helpful. Thank you!

python version mismatch

This repo requires python2.7, but DeepSpeech (a dependency) requires python 3. Any thoughts on how to reconcile the two?

I tried installing voca & mesh with python 3, but mesh only works with python 2.7.

thank you

Render Error for Pyglet

Render training sequences
cv2 >= 3
Traceback (most recent call last):
  File "/home/mesh/.virtualenvs/voca/lib/python3.6/site-packages/pyrender/platforms/pyglet.py", line 32, in init_context
    width=1, height=1)
  File "/home/mesh/.virtualenvs/voca/lib/python3.6/site-packages/pyglet/window/xlib/__init__.py", line 170, in __init__
    super(XlibWindow, self).__init__(*args, **kwargs)
  File "/home/mesh/.virtualenvs/voca/lib/python3.6/site-packages/pyglet/window/__init__.py", line 573, in __init__
    display = pyglet.canvas.get_display()
  File "/home/mesh/.virtualenvs/voca/lib/python3.6/site-packages/pyglet/canvas/__init__.py", line 95, in get_display
    return Display()
  File "/home/mesh/.virtualenvs/voca/lib/python3.6/site-packages/pyglet/canvas/xlib.py", line 119, in __init__
    raise NoSuchDisplayException('Cannot connect to "%s"' % name)
pyglet.canvas.xlib.NoSuchDisplayException: Cannot connect to "None"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_training.py", line 92, in <module>
    main()
  File "run_training.py", line 89, in main
    model.train()
  File "/home/mesh/voca/utils/voca_model.py", line 189, in train
    , data_specifier='training')
  File "/home/mesh/voca/utils/voca_model.py", line 249, in _render_sequences
    self._render_helper(out_folder, data_specifier)
  File "/home/mesh/voca/utils/voca_model.py", line 274, in _render_helper
    self._render_sequences_helper(video_fname, raw_audio[i_seq], processed_audio[i_seq], templates[i_seq], vertices[i_seq], condition_idx)
  File "/home/mesh/voca/utils/voca_model.py", line 306, in _render_sequences_helper
    gt_img = render_mesh_helper(Mesh(seq_verts[i_frame], self.template_mesh.f), center)
  File "/home/mesh/voca/utils/rendering.py", line 107, in render_mesh_helper
    r = pyrender.OffscreenRenderer(viewport_width=frustum['width'], viewport_height=frustum['height'])
  File "/home/mesh/.virtualenvs/voca/lib/python3.6/site-packages/pyrender/offscreen.py", line 31, in __init__
    self._create()
  File "/home/mesh/.virtualenvs/voca/lib/python3.6/site-packages/pyrender/offscreen.py", line 134, in _create
    self._platform.init_context()
  File "/home/mesh/.virtualenvs/voca/lib/python3.6/site-packages/pyrender/platforms/pyglet.py", line 38, in init_context
    'internal error message was "{}"'.format(e)
ValueError: Failed to initialize Pyglet window with an OpenGL >= 3+ context. If you're logged in via SSH, ensure that you're running your script with vglrun (i.e. VirtualGL). The internal error message was "Cannot connect to "None""

I didn't access it through SSH; I connected to the standard desktop environment through TeamViewer, and it still reported an error.
This seems to be a dependency problem. How can I solve it?
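
A commonly used workaround (hedged, since it depends on your driver setup): point pyrender at a headless EGL or OSMesa context before it is imported, so it never asks pyglet for an X display:

    import os

    # Must be set before the first `import pyrender` anywhere in the process.
    os.environ['PYOPENGL_PLATFORM'] = 'egl'   # or 'osmesa'
    import pyrender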

pyrender: Failed rendering frame

I followed the README.md, but at the 'Render sequence' step:

python visualize_sequence.py --sequence_path './FLAME_eye_blink/meshes' --audio_fname './audio/test_sentence.wav' --out_path './FLAME_eye_blink'

The screen prints:
pyrender: Failed rendering frame
pyrender: Failed rendering frame
pyrender: Failed rendering frame
pyrender: Failed rendering frame

basic information about ply file

I created a template with 3ds Max and saved it as an .obj file without texture information. Finally, I used a tool to convert the .obj into a .ply file. Does the generated .ply file meet the input requirements of VOCA? Thanks.
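
Format conversion alone is not sufficient: VOCA expects a mesh in FLAME topology, i.e. 5023 vertices with FLAME's triangulation. A quick sanity check on the exported file (trimesh is our choice here, not part of VOCA):

    import trimesh

    m = trimesh.load('template.ply', process=False)
    # A FLAME-topology mesh has exactly 5023 vertices; the face list must
    # match too, so this check is necessary but not sufficient.
    print(len(m.vertices))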

How to eliminate head jitter in data preprocessing?

Thank you very much for your work.
I downloaded Registered Data, Unposed Data, and Unposed Cleaned Data.
What are the operations in the data preprocessing stage?
How to eliminate the jitter of the head and finally get Unposed Cleaned Data?

ValueError running README example

Hello,

I set up the project and tried running the example command:

python run_voca.py --tf_model_fname './model/gstep_52280.model' --ds_fname './ds_graph/output_graph.pb' --audio_fname './audio/test_sentence.wav' --template_fname './template/FLAME_sample.ply' --condition_idx 3 --out_path './animation_output'

The script runs for a while and terminates with the following error:
ValueError: Cannot feed value of shape (1, 254, 494) for Tensor 'deepspeech/input_node:0', which has shape '(1, 16, 19, 26)'

I am not sure if I am doing something wrong or if this is an issue, but I would appreciate any advice.

Thank you,
Anton

How to get the DeepSpeech windows

I am trying to implement your work.
Could you help me with the procedure for getting the feature vectors from the DeepSpeech model?
From what I know, the DeepSpeech model's output is a text transcript.
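
A sketch of the general approach, under assumptions: VOCA taps an intermediate activation of the frozen DeepSpeech graph rather than the decoded transcript, so the trick is to import the graph and fetch an internal tensor. No node name is assumed below; list the graph's operations to find the layer actually used in utils/audio_handler.py:

    import tensorflow as tf

    graph_def = tf.GraphDef()
    with open('./ds_graph/output_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())

    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(graph_def, name='deepspeech')

    # Inspect op names to locate the feature layer (e.g. a logits tensor).
    for op in graph.get_operations()[:25]:
        print(op.name)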

tensorflow-gpu 1.14.0 install error

When I try to install tensorflow-gpu 1.14.0 via pip install -r requirements.txt, I get the following error. How do I deal with this? Can I just use pip install tensorflow-gpu==1.14.0 instead, and will that work?

(voca) josephdanielchang@ubuntu:~/voca$ pip install -r requirements.txt
Collecting numpy (from -r requirements.txt (line 1))
Downloading https://files.pythonhosted.org/packages/d2/ab/43e678759326f728de861edbef34b8e2ad1b1490505f20e0d1f0716c3bf4/numpy-1.17.4-cp36-cp36m-manylinux1_x86_64.whl (20.0MB)
100% |████████████████████████████████| 20.0MB 102kB/s
Collecting scipy (from -r requirements.txt (line 2))
Downloading https://files.pythonhosted.org/packages/54/18/d7c101d5e93b6c78dc206fcdf7bd04c1f8138a7b1a93578158fa3b132b08/scipy-1.3.3-cp36-cp36m-manylinux1_x86_64.whl (25.2MB)
100% |████████████████████████████████| 25.2MB 78kB/s
Collecting chumpy (from -r requirements.txt (line 3))
Downloading https://files.pythonhosted.org/packages/87/81/d7a94ad0ff556b9dd1bf27b84c0b255d9ba22ad5952f20088dafacd9292e/chumpy-0.69.tar.gz (50kB)
100% |████████████████████████████████| 51kB 2.5MB/s
Collecting opencv-python (from -r requirements.txt (line 4))
Downloading https://files.pythonhosted.org/packages/c0/a9/9828dfaf93f40e190ebfb292141df6b7ea1a2d57b46263e757f52be8589f/opencv_python-4.1.2.30-cp36-cp36m-manylinux1_x86_64.whl (28.3MB)
100% |████████████████████████████████| 28.3MB 65kB/s
Collecting resampy (from -r requirements.txt (line 5))
Downloading https://files.pythonhosted.org/packages/79/75/e22272b9c2185fc8f3af6ce37229708b45e8b855fd4bc38b4d6b040fff65/resampy-0.2.2.tar.gz (323kB)
100% |████████████████████████████████| 327kB 3.2MB/s
Collecting python-speech-features (from -r requirements.txt (line 6))
Downloading https://files.pythonhosted.org/packages/ff/d1/94c59e20a2631985fbd2124c45177abaa9e0a4eee8ba8a305aa26fc02a8e/python_speech_features-0.6.tar.gz
Collecting tensorflow-gpu==1.14.0 (from -r requirements.txt (line 7))
Downloading https://files.pythonhosted.org/packages/76/04/43153bfdfcf6c9a4c38ecdb971ca9a75b9a791bb69a764d652c359aca504/tensorflow_gpu-1.14.0-cp36-cp36m-manylinux1_x86_64.whl (377.0MB)
99% |████████████████████████████████| 377.0MB 14.6MB/s eta 0:00:01Exception:
Traceback (most recent call last):
File "/home/josephdanielchang/.virtualenvs/voca/lib/python3.6/site-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/home/josephdanielchang/.virtualenvs/voca/lib/python3.6/site-packages/pip/commands/install.py", line 353, in run
wb.build(autobuilding=True)
File "/home/josephdanielchang/.virtualenvs/voca/lib/python3.6/site-packages/pip/wheel.py", line 749, in build
self.requirement_set.prepare_files(self.finder)
File "/home/josephdanielchang/.virtualenvs/voca/lib/python3.6/site-packages/pip/req/req_set.py", line 380, in prepare_files
ignore_dependencies=self.ignore_dependencies))
File "/home/josephdanielchang/.virtualenvs/voca/lib/python3.6/site-packages/pip/req/req_set.py", line 620, in _prepare_file
session=self.session, hashes=hashes)
File "/home/josephdanielchang/.virtualenvs/voca/lib/python3.6/site-packages/pip/download.py", line 821, in unpack_url
hashes=hashes
File "/home/josephdanielchang/.virtualenvs/voca/lib/python3.6/site-packages/pip/download.py", line 659, in unpack_http_url
hashes)
File "/home/josephdanielchang/.virtualenvs/voca/lib/python3.6/site-packages/pip/download.py", line 882, in _download_http_url
_download_url(resp, link, content_file, hashes)
File "/home/josephdanielchang/.virtualenvs/voca/lib/python3.6/site-packages/pip/download.py", line 603, in _download_url
hashes.check_against_chunks(downloaded_chunks)
File "/home/josephdanielchang/.virtualenvs/voca/lib/python3.6/site-packages/pip/utils/hashes.py", line 46, in check_against_chunks
for chunk in chunks:
File "/home/josephdanielchang/.virtualenvs/voca/lib/python3.6/site-packages/pip/download.py", line 571, in written_chunks
for chunk in chunks:
File "/home/josephdanielchang/.virtualenvs/voca/lib/python3.6/site-packages/pip/utils/ui.py", line 139, in iter
for x in it:
File "/home/josephdanielchang/.virtualenvs/voca/lib/python3.6/site-packages/pip/download.py", line 560, in resp_read
decode_content=False):
File "/home/josephdanielchang/.virtualenvs/voca/share/python-wheels/urllib3-1.22-py2.py3-none-any.whl/urllib3/response.py", line 436, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "/home/josephdanielchang/.virtualenvs/voca/share/python-wheels/urllib3-1.22-py2.py3-none-any.whl/urllib3/response.py", line 384, in read
data = self._fp.read(amt)
File "/home/josephdanielchang/.virtualenvs/voca/share/python-wheels/CacheControl-0.11.7-py2.py3-none-any.whl/cachecontrol/filewrapper.py", line 63, in read
self._close()
File "/home/josephdanielchang/.virtualenvs/voca/share/python-wheels/CacheControl-0.11.7-py2.py3-none-any.whl/cachecontrol/filewrapper.py", line 50, in _close
self.__callback(self.__buf.getvalue())
File "/home/josephdanielchang/.virtualenvs/voca/share/python-wheels/CacheControl-0.11.7-py2.py3-none-any.whl/cachecontrol/controller.py", line 275, in cache_response
self.serializer.dumps(request, response, body=body),
File "/home/josephdanielchang/.virtualenvs/voca/share/python-wheels/CacheControl-0.11.7-py2.py3-none-any.whl/cachecontrol/serialize.py", line 87, in dumps
).encode("utf8"),
MemoryError

python: malloc.c:2401: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.

After generating the .obj files with run_voca.py, I got this error while visualizing those objects and creating the video with visualize_sequence.py; it looks like a C run-time memory-allocation problem.

python: malloc.c:2401: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.

Thanks in Advance. ;)

I get the “X Error of failed request” when I run visualize_sequence.py

When I run visualize_sequence.py, I got this error:
X Error of failed request: BadRequest (invalid request code or no such operation)
Major opcode of failed request: 146 (GLX)
Minor opcode of failed request: 187 ()
Serial number of failed request: 135
Current serial number in output stream: 135
It's the first time I've had this problem, and I don't know how it started. Can someone help me?

ValueError: NodeDef mentions attr 'feature_win_len' not in Op

Hi,

thanks for sharing!

I am currently trying to run your code on my machine (Python 2.7 & TensorFlow 1.12.0).
When I run the command "python run_voca.py", there are some problems. The following is the output; is there something wrong with my settings?

Thank you so much!!!

$ python run_voca.py --tf_model_fname './model/gstep_52280.model' --ds_fname './ds_graph/output_graph.pb' --audio_fname './audio/test_sentence.wav' --template_fname './template/FLAME_sample.ply' --condition_idx 3 --out_path './animation_output'
Traceback (most recent call last):
File "run_voca.py", line 44, in
inference(tf_model_fname, ds_fname, audio_fname, template_fname, condition_idx, out_path)
File "/home/wangqianyun/voca/utils/inference.py", line 62, in inference
processed_audio = process_audio(ds_fname, audio, sample_rate)
File "/home/wangqianyun/voca/utils/inference.py", line 41, in process_audio
return audio_handler.process(tmp_audio)['subj']['seq']['audio']
File "/home/wangqianyun/voca/utils/audio_handler.py", line 53, in process
return self.convert_to_deepspeech(audio)
File "/home/wangqianyun/voca/utils/audio_handler.py", line 101, in convert_to_deepspeech
tf.import_graph_def(graph_def, name="deepspeech")
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/home/wangqianyun/voca/voca/local/lib/python2.7/site-packages/tensorflow/python/framework/importer.py", line 422, in import_graph_def
raise ValueError(str(e))
ValueError: NodeDef mentions attr 'feature_win_len' not in Op<name=NoOp; signature= -> >; NodeDef: {{node deepspeech/model_metadata}} = NoOpfeature_win_len=32, feature_win_step=20, sample_rate=16000. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

Any reason that VOCA does not output FLAME parameters?

Hi Timo,

First thanks for sharing the code for VOCA, very impressive work. I am wondering why VOCA does not output FLAME parameters (at least expression parameters). Can the current VOCA network be modified such that the model will output expression parameters?

"VOCA outputs meshes in FLAME mesh topology. However, it does not directly output FLAME parameters but you can easily compute this using this code TF_FLAME. The demo to fit the 3D model to registered 3D meshes should do the job"

Originally posted by @TimoBolkart in #21 (comment)

Question about making windows in audio_handler.py

Hello,
Thank you for this great work.

I noticed that in audio_handler.py, half of the window_size is padded before the sequence and half after. Therefore, each window includes window_size/2 frames before the current frame and window_size/2 frames after it.

zero_pad = np.zeros((int(self.audio_window_size / 2), network_output.shape[1]))
network_output = np.concatenate((zero_pad, network_output, zero_pad), axis=0)
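
A toy illustration of what that symmetric padding buys (shapes are ours): each window ends up roughly centered on its frame, so the model sees audio context from both sides of the current frame:

    import numpy as np

    w = 4                                    # stand-in for audio_window_size
    feats = np.arange(10).reshape(10, 1)     # 10 frames, 1 feature each
    pad = np.zeros((w // 2, 1))
    padded = np.concatenate((pad, feats, pad), axis=0)
    windows = np.stack([padded[t:t + w] for t in range(10)])
    print(windows.shape)                     # (10, 4, 1); window t spans
                                             # frames t-2 .. t+1, i.e. centered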

I'm trying to re-train the model without gathering the future data with the intended application in real-time animation. Therefore, I modified the padding zero section as follows:

zero_pad = np.zeros((int(self.audio_window_size), network_output.shape[1]))
network_output = np.concatenate((zero_pad, network_output), axis=0)

However, the result became much worse.

May I know if there is a reason the audio window has to be padded that way?
Is there any specific preprocessing of the target vertices that depends on the window?

Thanks a lot!

add eyeblink to texture data in animation

hi,
First of all, thank you for this work.
I was able to make the eyes blink in the generated 3D mesh, but I don't know how to do the same in the .obj data together with the texture content. I was able to create the animation with the texture data, but I also need the eye blinking to work in that animation; please help.
Thanks

FLAME parameters from VOCA model

I've been looking through the code, trying to figure out how to get the FLAME parameters (e.g., shape, pose, camera, and expression) out of VOCA. It appears that all we get out of it is the mesh object. Is there a way to get the FLAME parameters as well?

a Gaussian filtering (across the sequence) to mitigate capture noise

To get the Unposed Cleaned Data, we fix, for each sequence (within the Unposed Data), the neck-boundary vertices and the ear vertices to a fixed location, and apply Gaussian filtering (across the sequence) to the region around the eyes to mitigate capture noise.

In issue #28.
Could you reveal the code for this Gaussian-filtering step? Thank you.
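
The code has not been released in this thread, but the step as described is a 1-D Gaussian filter along the time axis restricted to the eye region; a hedged sketch in which the vertex indices, sigma, and data are placeholders:

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    seq = np.random.randn(100, 5023, 3)      # (frames, vertices, xyz)
    eye_idx = np.arange(3800, 4000)          # placeholder indices, not FLAME's
    # Smooth only the eye-region vertices across frames (axis 0 = time).
    seq[:, eye_idx, :] = gaussian_filter1d(seq[:, eye_idx, :],
                                           sigma=2.0, axis=0)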

visualize_sequence render failure: Could not open file : ./animation_visualization/img/*.png

OpenGL test failed:
stdout: failure

stderr: Attempt to call an undefined function glutInit, check for bool(glutInit) before calling

ffmpeg version 2.8.15-0ubuntu0.16.04.1 Copyright (c) 2000-2018 the FFmpeg developers
built with gcc 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.10) 20160609
configuration: --prefix=/usr --extra-version=0ubuntu0.16.04.1 --build-suffix=-ffmpeg --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --cc=cc --cxx=g++ --enable-gpl --enable-shared --disable-stripping --disable-decoder=libopenjpeg --disable-decoder=libschroedinger --enable-avresample --enable-avisynth --enable-gnutls --enable-ladspa --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libmodplug --enable-libmp3lame --enable-libopenjpeg --enable-libopus --enable-libpulse --enable-librtmp --enable-libschroedinger --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx265 --enable-libxvid --enable-libzvbi --enable-openal --enable-opengl --enable-x11grab --enable-libdc1394 --enable-libiec61883 --enable-libzmq --enable-frei0r --enable-libx264 --enable-libopencv
libavutil 54. 31.100 / 54. 31.100
libavcodec 56. 60.100 / 56. 60.100
libavformat 56. 40.101 / 56. 40.101
libavdevice 56. 4.100 / 56. 4.100
libavfilter 5. 40.101 / 5. 40.101
libavresample 2. 1. 0 / 2. 1. 0
libswscale 3. 1.101 / 3. 1.101
libswresample 1. 2.101 / 1. 2.101
libpostproc 53. 3.100 / 53. 3.100
[image2 @ 0x1e9c0e0] Could not open file : ./animation_visualization/img/*.png
[image2 @ 0x1e9c0e0] Could not find codec parameters for stream 0 (Video: png, none(pc)): unspecified size
Consider increasing the value for the 'analyzeduration' and 'probesize' options
Could not open file : ./animation_visualization/img/*.png
./animation_visualization/img/*.png: could not find codec parameters
Input #0, image2, from './animation_visualization/img/*.png':
Duration: 00:00:00.02, start: 0.000000, bitrate: N/A
Stream #0:0: Video: png, none(pc), 60 tbr, 60 tbn, 60 tbc
Guessed Channel Layout for Input Stream #1.0 : mono

Why did mv.save not save any *.png files in the img folder? Thank you.

Implementation mistake in _validation_step function

Hey,

Congratulations on this excellent work and thanks for sharing your code and results.

As far as I understand, the _validation_step function in the voca_model class is intended to compute the validation error conditioned on each training subject and return the average over all of them. To that end, the elements of each of the validation batch variables (processed_audio, vertices, templates, conditions) are repeated according to the 'num_training_subjects' parameter using the np.repeat method. However, given the way you create conditions (see below), np.repeat will repeat each single element along axis 0 (as you specify in your code) 'repeats' times.

conditions = np.reshape(np.repeat(np.arange(num_training_subjects)[:, np.newaxis],
                                  repeats=self.config['num_consecutive_frames']*self.config['batch_size'], axis=-1), [-1,])

Instead, we would like to repeat the elements along axis 0 in blocks to match 'conditions', so that each data frame is conditioned on all training subjects. Currently, for example, the first 'num_training_subjects' elements of 'conditions' are all zeros, while the first 'num_training_subjects' elements along axis 0 of, e.g., self.speech_features, after repeating them the way you currently do, are all exactly the same. This means the same data frame will be conditioned 'num_training_subjects' times on subject number 0. It also produces wrong results when computing the velocity loss, as consecutive data frames are identical after this kind of repetition.

Alternatively, np.tile could be used as a solution to this. For example, the new way of creating self.speech_features could be as follows:

self.speech_features = np.expand_dims(np.tile(processed_audio, (num_training_subjects, 1, 1)), -1)

Do you agree?
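
To make the difference concrete, a tiny numpy comparison (toy sizes: a batch of 3 frames, 2 training subjects). The block-wise conditions array described above lines up with np.tile, not np.repeat:

    import numpy as np

    batch = np.array([[0.], [1.], [2.]])        # 3 distinct data frames
    # np.repeat duplicates each frame in place, which would pair with
    # interleaved conditions [0, 1, 0, 1, 0, 1]:
    print(np.repeat(batch, 2, axis=0).ravel())  # [0. 0. 1. 1. 2. 2.]
    # np.tile repeats the whole batch in blocks, which pairs with the
    # block-wise conditions [0, 0, 0, 1, 1, 1] the quoted code produces:
    print(np.tile(batch, (2, 1)).ravel())       # [0. 1. 2. 0. 1. 2.]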

Saving the video has a bug: call(cmd)

Thank you for your work. Could you help me solve this problem?
Traceback (most recent call last):
  File "visualize_sequence.py", line 66, in <module>
    call(cmd)
  File "/usr/lib/python2.7/subprocess.py", line 523, in call
    return Popen(*popenargs, **kwargs).wait()
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

basic information about texture

I ran VOCA and visualized the meshes with a pre-defined texture (obtained by fitting FLAME to an image using TF_FLAME), and the video was generated. The texture_mesh.png does not seem to contain texture information for the teeth. If I need to add a tooth texture to an existing video, what should I do? Thanks.
video.zip

dataset

Hi, thanks for the amazing work, but I have not found the dataset link. Have you made it publicly available yet?

Training code?

Hi, first of all, amazing work, and my condolences regarding the first author.

It seems that the TensorFlow training code for the model is not present in the repo. Could you please point me to it, or make it available so I can try out the training?

How to create a textured 3D model from RingNet for use in VOCA

I was trying some experiments with VOCA. When I tried the RingNet repository, many of the meshes looked similar to one another; as you know, texture is really important for identifying a person. How can I add texture to the meshes so they can be used in VOCA? I have tried TF_FLAME to create a texture for a mesh from an image file, but it was not usable in VOCA. Can you suggest how I can do this?
I would be really grateful for any help with this. Thank you
