ERROR: Boolean value of Tensor with more than one value is ambiguous (audio2photoreal issue, closed)

facebookresearch commented on July 21, 2024

ERROR: Boolean value of Tensor with more than one value is ambiguous

from audio2photoreal.

Comments (11)

evonneng commented on July 21, 2024

Ah I see! I believe the issue is that the max function is returning more than a scalar value (e.g. if your audio recording is 2xT for binaural audio). Currently I only support single-channel audio, but I can push a fix later to combine your audio into a single channel and will ping this thread after!
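For readers who want to see the failure mode described above in isolation, here is a minimal sketch. It is not the fix that was pushed to the repository; `to_mono` and `peak_normalize` are hypothetical helper names, the channel axis is guessed to be the shorter one, and dividing by the absolute peak (rather than by `max(y)`) is a choice made for this example. Python's built-in `max` iterates over the rows of a 2-D tensor and compares them, which is what forces the ambiguous boolean conversion.

```python
import torch


def to_mono(y: torch.Tensor) -> torch.Tensor:
    """Average a (channels, T) or (T, channels) tensor down to shape (T,)."""
    if y.dim() == 1:
        return y
    if y.dim() != 2:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {tuple(y.shape)}")
    # Assumption: the channel axis is the shorter one (e.g. 2 vs. T samples).
    channel_dim = 0 if y.shape[0] < y.shape[1] else 1
    return y.mean(dim=channel_dim)


def peak_normalize(y: torch.Tensor) -> torch.Tensor:
    """Scale so the largest absolute sample is 1; leave all-zero input unchanged."""
    peak = y.abs().max()  # tensor reduction, always a 0-dim tensor
    return y / peak if peak > 0 else y


stereo = torch.tensor([[0.5, -1.0, 0.25], [0.5, -3.0, 0.75]])  # shape (2, T)

try:
    max(stereo)  # built-in max compares whole rows, so bool(tensor) is ambiguous
except RuntimeError as err:
    print("built-in max failed:", err)

mono = peak_normalize(to_mono(stereo))
print(mono.shape, mono)
```

With a single-channel input `to_mono` returns the tensor unchanged, so the same two calls could be applied to mono and stereo recordings alike.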

evonneng commented on July 21, 2024

Hi! Thanks for reporting this issue. Could you please provide me with more context into the issue? (E.g. stack trace, inputs, screenshots etc)

chrisbward commented on July 21, 2024

hi @evonneng sure thing!

➜  audio2photoreal git:(main) ✗ source ./.venv/bin/activate
(.venv) ➜  audio2photoreal git:(main) ✗ python -m demo.demo
running on... cuda:0
 adding lip conditioning ./assets/iter-0200000.pt
Loading checkpoints from [checkpoints/diffusion/c1_face/model000155000.pt]...
running on... cuda:0
 using keyframes: torch.Size([1, 20, 256])
loading checkpoint from checkpoints/vq/c1_pose/net_iter300000.pth
 loading TRANSFORMER checkpoint from checkpoints/guide/c1_pose/checkpoints/iter-0100000.pt
Loading checkpoints from [checkpoints/diffusion/c1_pose/model000340000.pt]...
/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
WARNING:visualize.ca_body.nn.color_cal:Requested color-calibration identity camera not present, defaulting to 400883.
loading... ./checkpoints/ca_body/data/PXB184/body_dec.ckpt
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/gradio/queueing.py", line 489, in call_prediction
    output = await route_utils.call_process_api(
  File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/gradio/blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/gradio/blocks.py", line 1179, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
    return await future
  File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 851, in run
    result = context.run(func, *args)
  File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/.venv/lib/python3.9/site-packages/gradio/utils.py", line 678, in wrapper
    response = f(*args, **kwargs)
  File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/demo/demo.py", line 216, in audio_to_avatar
    face_results, pose_results, audio = generate_results(audio, num_repetitions, top_p)
  File "/home/user/Projects/11_PLAYMKRAI/audio2photoreal/demo/demo.py", line 176, in generate_results
    dual_audio[:, :, 0] = y / max(y)
RuntimeError: Boolean value of Tensor with more than one value is ambiguous

This happens after recording audio in the gradio app and starting the generation, thanks!

chrisbward commented on July 21, 2024

Nice one! I found this, in case it helps to just change it client-side:
https://blog.mozilla.org/webrtc/channelcount-microphone-constraint/

chrisbward commented on July 21, 2024

And here is something I found for a possible solution in Python:
https://stackoverflow.com/questions/30401042/stereo-to-mono-wav-in-python

chrisbward commented on July 21, 2024

Okay, as a quick workaround, I updated line 241;

gr.Audio(sources=["microphone", "upload"] ),

then recorded a mono audio track in Audacity, exported it to mp3 and uploaded it. It seems to be running now.

chrisbward commented on July 21, 2024

Okay, I got a generation, but the audio is VERY quiet - unsure what happened here, source seems fine.

I'm just tuning the ffmpeg step to see if I can speed things up here

chrisbward commented on July 21, 2024

Adding -hwaccel cuda to the ffmpeg header made this step almost instant

MustaphaU commented on July 21, 2024

Hi @chrisbward, did the issue eventually resolve? In my case, I originally got the 'Boolean value of Tensor with more than one value is ambiguous' error, but it went away after I used mono audio.

However, a new error ensues:

  File "C:\Users\musta\audio2photoreal\model\diffusion.py", line 388, in forward
    cond_tokens = torch.where(
RuntimeError: The size of tensor a (11598) must match the size of tensor b (1998) at non-singleton dimension 1

I wonder if anyone has an idea how to fix this. As the message suggests, it has to do with mismatched tensor sizes in the torch.where(..) call in the diffusion.py file.
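As a diagnostic aid only, the following sketch shows how such a mismatch can be reproduced and checked before calling `torch.where`. The root cause in `diffusion.py` is not established in this thread; the tensor names, the trailing dimensions and the `broadcastable` helper are all hypothetical, and only the two lengths (11598 and 1998) are taken from the error message.

```python
import torch


def broadcastable(a, b) -> bool:
    """Return True if two shapes are compatible under broadcasting rules."""
    return all(x == y or x == 1 or y == 1 for x, y in zip(reversed(a), reversed(b)))


# Hypothetical stand-ins: only the lengths along dimension 1 come from the error.
cond = torch.zeros(1, 11598, 4)
mask = torch.zeros(1, 1998, 1, dtype=torch.bool)

print("shapes:", tuple(mask.shape), tuple(cond.shape))
print("broadcastable:", broadcastable(mask.shape, cond.shape))

try:
    torch.where(mask, cond, torch.zeros_like(cond))
except RuntimeError as err:
    print("torch.where failed:", err)

# Once both tensors share the same length along dimension 1, the call succeeds.
aligned_mask = torch.zeros(1, cond.shape[1], 1, dtype=torch.bool)
out = torch.where(aligned_mask, cond, torch.zeros_like(cond))
print("aligned output shape:", tuple(out.shape))
```

Printing the two shapes just before the failing call would show which of the mask and the conditioning tensor has the unexpected length, which narrows down where the audio length diverges.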

evonneng commented on July 21, 2024

Thank you all for such active help on these issues!

Hi @MustaphaU, it seems that this results from the auto-generated mask size not matching that of the audio conditioning tensor. I am not too sure why that might be the case (since it would require more downstream information), but could you please try the above fix in the PR to see if it solves it? I wonder if it is because the audio is somehow getting corrupted downstream...

evonneng commented on July 21, 2024

Closing this for now due to inactivity. But please feel free to reopen if there are more related issues. Thanks!
