Comments (20)
Hi @roschler, I'm happy to look into this. Can you provide some steps for how to reproduce this issue? If you had a transcript of host speech that would cause this to occur that would be really helpful. Also would be good to get some information about your browser and version and which rendering engine you're using. Have you tried using a different browser like Firefox to see if the issue still occurs?
For the audio implementation, we are using Web Audio. For the three.js build we create an Audio object and either a THREE.PositionalAudio or THREE.Audio object, then connect the Audio object to the three.js Audio object using Audio.setMediaElementSource. In Babylon.js the rendering engine handles the creation of the web audio object, we just pass the url to the constructor of BABYLON.Sound.
If you wanted to circumvent the host audio I can think of two possibilities.
The first option is a little hacky but quicker to implement. You could call the setVolume method of the TextToSpeechFeature to set it to 0. Then you could listen for the TextToSpeechFeature's play event, which will supply a Speech object as an argument to your listener function. When you catch the event you could immediately pause speech and use the Speech object's audio property to get a handle to the Web Audio object, which will point you to the url the audio was loaded from. Use that to create your Howler.js audio and then play your resulting audio once it's ready at the same time as resuming speech on the host.
The second option would be to pull the repository and create your own custom build that overloads the speech implementation. TextToSpeechFeature._synthesizeAudio is where you'd need to create your custom audio. You may also need to overload play/pause/resume/stop of the Speech class depending on how Howler.js audio works. I can provide more details on the second option if you do want to try that.
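The first option could be sketched roughly as follows. This is only an outline under stated assumptions, not the library's API: `tts` is a hypothetical adapter object you would write around the TextToSpeechFeature (its `setVolume`, `onPlay`, `pause` and `resume` methods are placeholder names), `speech.audio.src` assumes the Speech object's audio property exposes the source url that way, and `createPlayer` is a hypothetical factory for your own audio backend.

```javascript
// Sketch: mute the host audio and hand playback to an external player.
// `tts` and `createPlayer` are hypothetical adapters, not part of the hosts API.
function redirectSpeechAudio(tts, createPlayer) {
  // Remember which Speech objects were already handed off, in case
  // resuming speech fires the play event a second time.
  const handled = new WeakSet();

  // Silence the host's own audio so only the external player is heard.
  tts.setVolume(0);

  tts.onPlay((speech) => {
    if (handled.has(speech)) return;
    handled.add(speech);

    // Pause right away so visemes and external audio can restart together.
    tts.pause();

    // Assumption: the audio handle exposes the url it was loaded from as `src`.
    const url = speech.audio.src;

    // Once the external player reports it is ready, start both together.
    createPlayer(url, (player) => {
      player.play();
      tts.resume();
    });
  });
}

// Possible Howler.js factory (assumes Howler is loaded and the audio is MP3):
// redirectSpeechAudio(ttsAdapter, (url, onReady) => {
//   const howl = new Howl({ src: [url], format: ['mp3'], onload: () => onReady(howl) });
// });
```

The guard against handling the same Speech object twice is a precaution; whether resuming speech re-emits the play event is something to verify against the library.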
from amazon-sumerian-hosts.
@c-morten The text doesn't matter. I've run a lot of tests. To test for yourself, just grab 5 to 10 minutes worth of text off the web, anywhere, and just keep generating TTS with the host TextToSpeechFeature facility with it. It's a quantity thing.
A big thanks for the audio internals details. Hopefully it doesn't come to that (at least for now). Eventually I'll want that to replace the audio anyway, but hopefully not for a long time. I'd like to apply volume and sound effects to the voices eventually and I don't think there's a way to do that with the current library. Note, it would be nice if there was an easy way to swap out control of the audio, so that at the point where the audio needs to be started in sync with the viseme stream for lip-synced TTS, the audio side of things could be handed off to a consumer-provided callback.
For now, I'm going to try the same test on my other stations. Hopefully it's an Ubuntu 14.04 audio driver issue and nothing else. That's an old Linux build.
from amazon-sumerian-hosts.
@c-morten Does Sumerian Hosts use WAV or MP3 generated audio when creating TTS through Polly via the TextToSpeechFeature._synthesizeAudio call? I found this Stack Overflow post that mentions crackling audio when using WAV formatted audio and suggests switching to MP3:
https://stackoverflow.com/questions/6955957/html5-audio-crackle-in-chrome
from amazon-sumerian-hosts.
The audio format is specified in the options you pass in when adding the TextToSpeechFeature, or when you play speech. If you don't define it we default to MP3, so you most likely were not getting WAV audio.
from amazon-sumerian-hosts.
Ok, thanks. I was hoping it was WAV. Looks like I'm going to have to fork and dig deeper. It happens on all stations.
from amazon-sumerian-hosts.
I have not yet been able to reproduce this, I have run test audio for over 30 minutes straight on all 3 builds with no issues yet. Can I get more information on your test scenario:
- Which build are you using?
- Is the browser and tab that's playing the audio active for the entire time leading up to when the issue occurs?
- When it starts happening, do you notice any memory spikes in the console?
- Do you encounter the same issue when using Firefox instead of Chrome?
from amazon-sumerian-hosts.
@c-morten Just tested on Firefox. It happens with Firefox too, with the same pattern.
Where do I look to give you the correct answer to "what build are you using?"?
Regarding memory spikes, do you mean in the main system monitor or in the Chrome Task Browser (i.e. - Chrome's internal system monitor)?
Here's a note, not related to the audio crackling. Just a general comment about Sumerian Hosts audio on Firefox compared to Chrome. On Chrome, before the crackling occurs, the audio is smooth. On Firefox, the audio seems to get "clipped" at the start and the end of the waveform. If you have ever worked with music gear, it feels like a noise gate with the volume threshold set too high, so when the audio starts there's an abrupt jump from no sound to some sound instead of the gentle, smooth easing in that sound normally has.
from amazon-sumerian-hosts.
@c-morten I watched memory/CPU/GPU in both the main System Monitor (Windows 8) and the Chrome Task Manager. Memory did not jump around much, but I did see something strange. When the audio was smooth, the CPU% was around 46% and then dropped to about 5% when the host animation/audio playing stopped. However, when the audio started crackling, especially heavily, the CPU was around 79% or worse. Also, after the scene stopped, the CPU stayed at that same high consumption level instead of dropping precipitously like it usually does after a scene stops. It's as if something in the browser is stuck doing something and won't stop.
This is wild speculation, but if for some reason some audio rendering process got stuck, then further attempts to play audio could easily cause crackling since the audio buffers would not be delivered properly with gaps between their delivery. This would get worse with each attempt if each attempt added another stuck audio process on the "stack".
from amazon-sumerian-hosts.
@c-morten I found a tutorial on debugging web audio problems using Chrome DevTools, especially in regards to crackling:
https://web.dev/profiling-web-audio-apps-in-chrome/
Here are some screenshots showing the performance metrics before and after crackling has begun. I have drawn boxes around the stats that are most notable (to me):
VIEW: tracing
SECTOR: AudioOutputDevice
PHASE: Before Crackling Has Begun
PHASE: During Crackling
NOTE: For the wasapi_render_thread, I didn't see any glaring differences, but when I look at the average durations the load appears to be about 25% greater during crackling compared to before crackling.
VIEW: tracing
SECTOR: wasapi_render_thread
PHASE: Before Crackling Has Begun
PHASE: During Crackling
VIEW: WebAudio Tools
NOTE: Look at the status line at the bottom of the screen for each of the following screenshots.
PHASE: IDLE (i.e. - baseline, **before** any audio rendering has begun)
NOTE: All values in the status line are zero.
PHASE: ACTIVE (i.e. - actively rendering scene and audio, but **before** crackling has begun)
PHASE: DURING CRACKLING (i.e. - the scene is rendering and crackling has made the audio unlistenable)
PHASE: IDLE, AFTER CRACKLING HAS BEGUN (i.e. - the scene is no longer rendering, after crackling has made the audio unlistenable)
As you can see, the audio rendering system is completely damaged. I tried the trash can icon to execute an explicit garbage collection operation, and it did not help at all, no change. Note, the tutorial I linked to above also has tips on how to restructure audio rendering code to try and correct problems that might be causing the audio rendering difficulties. Let me know if you need anything else.
from amazon-sumerian-hosts.
Thanks for the link, I'll try debugging this way. In regards to figuring out which build you are on, are you using host.three.js or host.babylon.js? These would either be referenced in a script tag in your html file or you would have installed amazon-sumerian-hosts via npm and imported one of those.
from amazon-sumerian-hosts.
Here's the package.json reference for amazon-sumerian-hosts:
"devDependencies": {
  "amazon-sumerian-hosts": "^1.3.1"
}
I am using host.three.js.
from amazon-sumerian-hosts.
Any updates? I still have this problem and it happens consistently.
from amazon-sumerian-hosts.
I have not had much luck reproducing this yet, it's not happening for me within even 30 minutes so it's difficult to know how long I need to let things run before calling it quits. Since it is happening consistently for you, there are a few things I would want to test that you might give a try, it would be good to know your results:
- Can you reproduce this using three.js traditional audio rather than positional audio? To do this, do not define the attachTo property of the options object you pass when creating the TextToSpeechFeature. If this option is not defined it will default to creating a three.js Audio object rather than a PositionalAudio object.
- A little more involved, but can you reproduce this using the host.babylon.js build rather than host.three.js? Trying to determine if this is specific to the rendering engine audio system since hosts hook into the audio system of the rendering engine being used.
- Last resort, I would try generating audio files for the dialog you are passing to the host system using the AWS Polly console. Then create an application that uses three.js without the host package and play that audio in sequence using the three.js audio system. Does this reproduce the issue?
from amazon-sumerian-hosts.
"Can you reproduce this using three.js traditional audio rather than positional audio? To do this, do not define the attachTo property of the options object you pass when creating the TextToSpeechFeature. If this option is not defined it will default to creating a three.js Audio object rather than a PositionalAudio object."
Thanks. I'll give that a try. I don't have time to do the host.babylon.js test right now because that would be a massive refactor. But I'll try disabling positional audio as you suggest.
BTW, I found this interesting post that describes problems with :
https://bugs.chromium.org/p/chromium/issues/detail?id=175363
I'm not sure if this is relevant, but this and other posts I found describe problems with the use of scriptProcessorNode that can cause crackling audio.
from amazon-sumerian-hosts.
I'm taking a wild guess here, but I'm thinking there may just be too much audio stored if you are continually playing dialog for long periods of time. We don't have any system in place for managing the storage of audio you are creating, but maybe you could set up a test to confirm whether or not this is actually the case. You will need to access internal host variables to get to the place where the host audio is stored. Assuming you have a HostObject variable named host, we store the speech audio that gets generated in the following location: host._features.TextToSpeechFeature._speechCache. Try setting up a keyboard event to set this variable to {}, then execute that keyboard event once you hear the audio crackling. Monitor the memory to try to determine when the next garbage collection happens after executing that event. Does the next piece of audio that plays after garbage collection happens play back normally?
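A minimal sketch of that test, assuming the HostObject is reachable as a global named `host` (the global and the choice of the "c" key are illustrative assumptions; the cache path is the internal one quoted above and may change between versions):

```javascript
// Replace the internal speech cache with an empty object so the
// previously generated audio becomes eligible for garbage collection.
function clearSpeechCache(host) {
  const tts = host && host._features && host._features.TextToSpeechFeature;
  if (!tts) {
    return false; // feature not installed on this host
  }
  tts._speechCache = {};
  return true;
}

// Browser-only wiring: press "c" once the crackling starts.
// Assumption: the HostObject was stored on window.host by your own setup code.
if (typeof window !== 'undefined') {
  window.addEventListener('keydown', (event) => {
    if (event.key === 'c' && window.host) {
      const cleared = clearSpeechCache(window.host);
      console.log(cleared ? 'Speech cache cleared' : 'No TextToSpeechFeature found');
    }
  });
}
```

With several hosts in the scene, the same function would need to be called once per host.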
from amazon-sumerian-hosts.
I was just scanning through the three.js audio documentation and I noticed there’s a mistake in our three.html example file, I’m wondering if it may be causing your issue. How closely are you following the example code? In our createHost method we’re creating a separate THREE.AudioListener instance for each host. However the three.js docs state that there’s only meant to be one listener per scene. If you are also using multiple listeners, try using just one instead.
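One way to guarantee a single listener is to create it lazily and reuse it for every host. The helper below is a sketch with hypothetical names (`makeListenerProvider`, `getListener`); the factory is injected so the helper itself does not depend on three.js.

```javascript
// Returns a function that creates the audio listener on first use,
// attaches it to the camera once, and hands back the same instance afterwards.
function makeListenerProvider(createListener) {
  let listener = null;
  return function getListener(camera) {
    if (listener === null) {
      listener = createListener();
      camera.add(listener); // attach only once, at first use
    }
    return listener;
  };
}

// Possible use with three.js (assumes THREE and a camera are already set up):
// const getListener = makeListenerProvider(() => new THREE.AudioListener());
// const listenerForHost1 = getListener(camera);
// const listenerForHost2 = getListener(camera); // same instance as listenerForHost1
```

Each call to createHost could then ask the provider for the listener instead of constructing its own.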
from amazon-sumerian-hosts.
To set up the hosts I'm using the code from the examples. I just checked my code and indeed three audioListener objects are being added to the camera object (odd place to add a listener object, don't you think?). I'm going to move that code out of the per-host setup code to the scene initialization stage and do that operation only once. I'll tell you how it goes tomorrow.
from amazon-sumerian-hosts.
@c-morten The audioListener idea was helpful but I don't think it solved the original problem. I say this because now that I only create one audioListener object instead of 3, the glitchy audio still occurs, it just takes 3 times longer to start degrading. This is a big help but I would still like to get rid of the problem completely. When I get the chance I'll try your cache clean-up idea.
Side note. How can I get a list of the emotes? I looked at the emote.glb file but that's in a format that is not readable by a standard editor. When I try to open it I see non-ASCII characters. I see the animations in the gestures.json file that exists for each character, but not the emotes. Does the "Alien" character only have the one "angry" emote?
from amazon-sumerian-hosts.
Hi @roschler. The .glb format is viewable in DCC applications like Blender. You can also import .glb files into glTF viewers like https://gltf-viewer.donmccurdy.com/ and https://sandbox.babylonjs.com/ to preview the names of the animations they contain. Currently the "Alien" character only has the "angry" emote. That character has a more limited animation set because it was used as a test to prove that we could use the PointOfInterestFeature on characters whose rigs have varying proportions and joint orientations/names.
from amazon-sumerian-hosts.