
Comments (20)

c-morten avatar c-morten commented on June 24, 2024

Hi @roschler, I'm happy to look into this. Can you provide some steps for how to reproduce this issue? If you had a transcript of host speech that would cause this to occur that would be really helpful. Also would be good to get some information about your browser and version and which rendering engine you're using. Have you tried using a different browser like Firefox to see if the issue still occurs?

For the audio implementation, we are using Web Audio. For the three.js build we create an Audio object and either a THREE.PositionalAudio or THREE.Audio object, then connect the Audio object to the three.js Audio object using Audio.setMediaElementSource. In Babylon.js the rendering engine handles the creation of the web audio object, we just pass the url to the constructor of BABYLON.Sound.

If you wanted to circumvent the host audio I can think of two possibilities. The first option is a little hacky but quicker to implement. You could call the setVolume method of the TextToSpeechFeature to set it to 0. Then you could listen for the TextToSpeechFeature's play event, which will supply a Speech object as an argument to your listener function. When you catch the event you could immediately pause speech and use the Speech object's audio property to get a handle to the Web Audio object, which will point you to the url the audio was loaded from. Use that to create your Howler.js audio and then play your resulting audio once it's ready at the same time as resuming speech on the host.

The second option would be to pull the repository and create your own custom build that overloads the speech implementation. TextToSpeechFeature._synthesizeAudio is where you'd need to create your custom audio. You may also need to overload play/pause/resume/stop of the Speech class depending on how Howler.js audio works. I can provide more details on the second option if you do want to try that.
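
A rough sketch of the first option described above, with the host-specific pieces passed in so the order of operations is visible. Only setVolume, the play event, the Speech object's audio property, and pause/resume come from the description; subscribeToPlay, createSound, and reading the url from audio.src are assumptions made for illustration, not confirmed API.

```javascript
// Sketch: mute the host's own audio and play the same clip through an
// external player (for example Howler.js) alongside the host speech.
// Hypothetical parameters:
//   tts             - the host's TextToSpeechFeature (setVolume/pause/resume)
//   subscribeToPlay - registers a listener for the feature's play event
//   createSound     - (url, onReady) => void; calls onReady(sound) once the
//                     replacement audio is loaded; sound has a play() method
function routeSpeechThroughExternalPlayer(tts, subscribeToPlay, createSound) {
  // Silence the built-in audio without stopping speech.
  tts.setVolume(0);

  subscribeToPlay((speech) => {
    // Hold the host until the replacement audio has loaded.
    tts.pause();

    // Assumption: the audio handle exposes the source url as `src`.
    const url = speech.audio.src;

    createSound(url, (sound) => {
      // Start both together so audio and lip sync stay aligned.
      sound.play();
      tts.resume();
    });
  });
}
```

With Howler.js, createSound might construct a Howl from the url and invoke onReady from its onload callback.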

from amazon-sumerian-hosts.

roschler avatar roschler commented on June 24, 2024

@c-morten The text doesn't matter. I've run a lot of tests. To test for yourself, just grab 5 to 10 minutes worth of text off the web, anywhere, and just keep generating TTS with the host TextToSpeechFeature facility with it. It's a quantity thing.

A big thanks for the audio internals details. Hopefully it doesn't come to that (at least for now). Eventually I'll want that to replace the audio anyway, but hopefully not for a long time. I'd like to apply volume and sound effects to the voices eventually and I don't think there's a way to do that with the current library. Note, it would be nice if there was an easy way to swap out control of the audio, so that once the audio needs to be started in sync with the viseme stream to effect lip sync TTS, the audio side of things could be handed off to a consumer-provided callback.

For now, I'm going to try the same test on my other stations. Hopefully it's an Ubuntu 14.04 audio driver issue and nothing else. That's an old Linux build.


roschler avatar roschler commented on June 24, 2024

@c-morten Does Sumerian Hosts use WAV or MP3 generated audio when creating TTS through Polly via the TextToSpeechFeature._synthesizeAudio call? I found this Stack Overflow post that mentions crackling audio when using WAV formatted audio and suggests switching to MP3:

https://stackoverflow.com/questions/6955957/html5-audio-crackle-in-chrome


c-morten avatar c-morten commented on June 24, 2024

The audio format is specified in the options you pass in when adding the TextToSpeechFeature, or when you play speech. If you don't define it we default to MP3, so you most likely were not getting WAV audio.


roschler avatar roschler commented on June 24, 2024

Ok, thanks. I was hoping it was WAV. Looks like I'm going to have to fork and dig deeper. It happens on all stations.


c-morten avatar c-morten commented on June 24, 2024

I have not yet been able to reproduce this, I have run test audio for over 30 minutes straight on all 3 builds with no issues yet. Can I get more information on your test scenario:

  • Which build are you using?
  • Is the browser and tab that's playing the audio active for the entire time leading up to when the issue occurs?
  • When it starts happening, do you notice any memory spikes in the console?
  • Do you encounter the same issue when using Firefox instead of Chrome?



roschler avatar roschler commented on June 24, 2024

@c-morten Just tested on Firefox. It happens there too, with the same pattern.

Where do I look to give you the correct answer to "what build are you using?"?

Regarding memory spikes, do you mean in the main system monitor or in the Chrome Task Browser (i.e. - Chrome's internal system monitor)?

Here's a note, not related to the audio crackling. Just a general comment about Sumerian Hosts audio on Firefox compared to Chrome. On Chrome, before the crackling occurs, the audio is smooth. On Firefox, the audio seems to get "clipped" at the start and the end of the waveform. If you have ever worked with music gear, it feels like a noise gate with the volume threshold set too high, so when the audio starts there's an abrupt jump from no sound to some sound instead of a gentle, smooth easing in like sound normally does.


roschler avatar roschler commented on June 24, 2024

@c-morten I watched memory/CPU/GPU in both the main System Monitor (Windows 8) and the Chrome Task Manager. Memory did not jump around much, but I did see something strange. When the audio was smooth, the CPU% was around 46% and then dropped to about 5% when the host animation/audio playing stopped. However, when the audio started crackling, especially heavily, the CPU was around 79% or worse. Also, after the scene stopped, the CPU stayed at that same high consumption level instead of dropping precipitously like it usually does after a scene stops. It's as if something in the browser is stuck doing something and won't stop.

This is wild speculation, but if for some reason some audio rendering process got stuck, then further attempts to play audio could easily cause crackling since the audio buffers would not be delivered properly with gaps between their delivery. This would get worse with each attempt if each attempt added another stuck audio process on the "stack".


roschler avatar roschler commented on June 24, 2024

@c-morten I found a tutorial on debugging web audio problems using Chrome DevTools, especially in regards to crackling:

https://web.dev/profiling-web-audio-apps-in-chrome/

Here are some screenshots showing the performance metrics before and after crackling has begun. I have drawn boxes around the stats that are most notable (to me):

VIEW: tracing

SECTOR: AudioOutputDevice

 PHASE: Before Crackling Has Begun

image

 PHASE: During Crackling

image

NOTE: For the wasapi_render_thread, I didn't see any glaring differences, but when I look at the average durations the load appears to be about 25% greater during crackling compared to before crackling.

VIEW: tracing
SECTOR: wasapi_render_thread

 PHASE: Before Crackling Has Begun

image

 PHASE: During Crackling

image

VIEW: WebAudio Tools

NOTE: Look at the status line at the bottom of the screen for each of the following screenshots.

 PHASE: IDLE (i.e. - baseline, **before** any audio rendering has begun)
 NOTE: All values in the status line are zero.

image

 PHASE: ACTIVE (i.e. - actively rendering scene and audio, but **before** crackling has begun)

image

 PHASE: DURING CRACKLING (i.e. - the scene is rendering and crackling has made the audio unlistenable)

image

 PHASE: IDLE, AFTER CRACKLING HAS BEGUN (i.e. - the scene is no longer rendering, after crackling has made the audio unlistenable)

image

As you can see, the audio rendering system is completely damaged. I tried the trash can icon to execute an explicit garbage collection operation, and it did not help at all, no change. Note, the tutorial I linked to above also has tips on how to restructure audio rendering code to try and correct problems that might be causing the audio rendering difficulties. Let me know if you need anything else.


c-morten avatar c-morten commented on June 24, 2024

Thanks for the link, I'll try debugging this way. In regards to figuring out which build you are on, are you using host.three.js or host.babylon.js? These would either be referenced in a script tag in your html file or you would have installed amazon-sumerian-hosts via npm and imported one of those.


roschler avatar roschler commented on June 24, 2024

@c-morten

Here's the package.json reference for amazon-sumerian-hosts:

  "devDependencies": {
    "amazon-sumerian-hosts": "^1.3.1"
  }

I am using host.three.js.


roschler avatar roschler commented on June 24, 2024

Any updates? I still have this problem and it happens consistently.


c-morten avatar c-morten commented on June 24, 2024

I have not had much luck reproducing this yet, it's not happening for me within even 30 minutes so it's difficult to know how long I need to let things run before calling it quits. Since it is happening consistently for you, there are a few things I would want to test that you might give a try, it would be good to know your results:

  • Can you reproduce this using three.js traditional audio rather than positional audio? To do this, do not define the attachTo property of the options object you pass when creating the TextToSpeechFeature. If this option is not defined it will default to creating a three.js Audio object rather than a PositionalAudio object.

  • A little more involved, but can you reproduce this using the host.babylon.js build rather than host.three.js? Trying to determine if this is specific to the rendering engine audio system since hosts hook into the audio system of the rendering engine being used.

  • Last resort, I would try generating audio files for the dialog you are passing to the host system using the AWS Polly console. Then create an application that uses three.js without the host package and play that audio in sequence using the three.js audio system. Does this reproduce the issue?
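
The first test in the list above could be set up roughly like this. Only attachTo is named in this thread; the helper and the idea of toggling it with a flag are assumptions for illustration, and any other option names you pass through are whatever your own setup code already uses.

```javascript
// Builds the options object for the TextToSpeechFeature.
// Leaving `attachTo` undefined is what selects a plain THREE.Audio
// instead of a THREE.PositionalAudio, per the first bullet above.
function buildSpeechOptions(base, usePositionalAudio) {
  // Copy everything except attachTo so the original object is not mutated.
  const { attachTo, ...rest } = base;
  return usePositionalAudio && attachTo !== undefined
    ? { ...rest, attachTo }
    : rest;
}
```

The result would be passed wherever the options object currently goes when the TextToSpeechFeature is created.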


roschler avatar roschler commented on June 24, 2024

"Can you reproduce this using three.js traditional audio rather than positional audio? To do this, do not define the attachTo property of the options object you pass when creating the TextToSpeechFeature. If this option is not defined it will default to creating a three.js Audio object rather than a PositionalAudio object."

Thanks. I'll give that a try. I don't have time to do the host.babylon.js test right now because that would be a massive refactor. But I'll try disabling positional audio as you suggest.

BTW, I found this interesting post that describes problems with :

https://bugs.chromium.org/p/chromium/issues/detail?id=175363

I'm not sure if this is relevant, but this and other posts I found describe problems with the use of scriptProcessorNode that can cause crackling audio.


c-morten avatar c-morten commented on June 24, 2024

I'm taking a wild guess here, but I'm thinking there may just be too much audio stored if you are continually playing dialog for long periods of time. We don't have any system in place for managing the storage of audio you are creating, but maybe you could set up a test to confirm whether or not this is actually the case. You will need to access internal host variables to get to the place where the host audio is stored. Assuming you have a HostObject variable named host, we store the speech audio that gets generated in the following location: host._features.TextToSpeechFeature._speechCache. Try setting up a keyboard event to set this variable to {}, then execute that keyboard event once you hear the audio crackling. Monitor the memory to try to determine when the next garbage collection happens after executing that event. Does the next piece of audio that plays after garbage collection happens play back normally?
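
A minimal sketch of that test hook. The cache path host._features.TextToSpeechFeature._speechCache is the internal location named above; the function name, the target parameter, and the default key are illustrative assumptions.

```javascript
// Clears the host's cached speech audio when a chosen key is pressed.
// `target` is whatever receives keyboard events (window in a browser).
function installSpeechCacheReset(host, target, key = 'c') {
  const onKeyDown = (event) => {
    if (event.key !== key) return;
    // Drop the cache's references so the audio becomes eligible for
    // garbage collection, provided nothing else still holds it.
    host._features.TextToSpeechFeature._speechCache = {};
    // Timestamp the reset so it can be lined up with the memory timeline.
    console.log('Speech cache cleared at', new Date().toISOString());
  };
  target.addEventListener('keydown', onKeyDown);
  // Return a cleanup function so the test hook can be removed again.
  return () => target.removeEventListener('keydown', onKeyDown);
}
```

In a browser this could be installed with installSpeechCacheReset(host, window) and triggered once the crackling starts.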


c-morten avatar c-morten commented on June 24, 2024

I was just scanning through the three.js audio documentation and I noticed there’s a mistake in our three.html example file, I’m wondering if it may be causing your issue. How closely are you following the example code? In our createHost method we’re creating a separate THREE.AudioListener instance for each host. However the three.js docs state that there’s only meant to be one listener per scene. If you are also using multiple listeners, try using just one instead.
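
One way to guarantee a single listener is to create it lazily and reuse it for every host. This is a sketch only: getSharedAudioListener and createListener are hypothetical names, and in a three.js scene the factory would be () => new THREE.AudioListener().

```javascript
// Returns one shared audio listener per camera, creating it on first use.
const listenersByCamera = new WeakMap();

function getSharedAudioListener(camera, createListener) {
  let listener = listenersByCamera.get(camera);
  if (!listener) {
    listener = createListener();
    camera.add(listener); // attach once, not once per host
    listenersByCamera.set(camera, listener);
  }
  return listener;
}
```

Each call to createHost would then ask for the shared listener instead of constructing its own.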


roschler avatar roschler commented on June 24, 2024

To set up the hosts I'm using the code from the examples. I just checked my code and indeed three audioListener objects are being added to the camera object (odd place to add a listener object, don't you think?). I'm going to move that code out of the per-host setup code to the scene initialization stage and do that operation only once. I'll tell you how it goes tomorrow.


roschler avatar roschler commented on June 24, 2024

@c-morten The audioListener idea was helpful but I don't think it solved the original problem. I say this because now that I only create one audioListener object instead of 3, the glitchy audio still occurs, it just takes 3 times longer to start degrading. This is a big help but I would still like to get rid of the problem completely. When I get the chance I'll try your cache clean-up idea.

Side note. How can I get a list of the emotes? I looked at the emote.glb file but that's in a format that is not readable by a standard editor. When I try and open it I see non-ASCII characters. I see the animations in the gestures.json file that exists for each character, but not the emotes? Does the "Alien" character only have the one "angry" emote?


c-morten avatar c-morten commented on June 24, 2024

Hi @roschler. The .glb format is viewable in DCC applications like blender. You can also import them into glTF viewers like https://gltf-viewer.donmccurdy.com/ and https://sandbox.babylonjs.com/ to be able to preview the names of animations contained within. Currently the "Alien" character only has the "angry" emote, that character has a more limited animation set because it was used as a test to prove out that we could use the PointOfInterestFeature on characters whose rigs have varying proportions and joint orientations/names.
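
For a quick look without a viewer, the animation names can also be read straight from the file's JSON chunk. A minimal sketch, assuming a standard glTF 2.0 binary container whose first chunk is JSON and that the emotes are stored as named animations; the function name is made up for illustration.

```javascript
// Reads the JSON chunk of a binary glTF (.glb) file and returns the
// names of the animations it contains.
// `bytes` is a Uint8Array (a Node.js Buffer also works).
function listGlbAnimationNames(bytes) {
  const GLTF_MAGIC = 0x46546c67; // ASCII "glTF", little-endian
  const JSON_CHUNK = 0x4e4f534a; // ASCII "JSON", little-endian
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // 12-byte header (magic, version, total length), then the first chunk
  // header (chunk length, chunk type).
  if (view.byteLength < 20 || view.getUint32(0, true) !== GLTF_MAGIC) {
    throw new Error('Not a binary glTF (.glb) file');
  }
  const chunkLength = view.getUint32(12, true);
  const chunkType = view.getUint32(16, true);
  if (chunkType !== JSON_CHUNK) {
    throw new Error('First chunk is not JSON');
  }

  // Decode and parse the JSON chunk; trailing padding spaces are valid JSON whitespace.
  const jsonBytes = bytes.subarray(20, 20 + chunkLength);
  const gltf = JSON.parse(new TextDecoder('utf-8').decode(jsonBytes));

  return (gltf.animations || []).map(
    (animation, index) => animation.name || `(unnamed animation ${index})`
  );
}
```

In Node.js this could be called on the result of fs.readFileSync for the emote.glb file.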

