Comments (7)
UPDATE: I have discovered the listenTo() Sumerian Hosts function. I believe this is what I want. I'm going to try using it with the "TextToSpeechFeature.onSsmlEvent" message and see if I can intercept my own custom marks.
More importantly, I don't want to take your time unnecessarily. Is there a document I could read that covers features of the repo like the listenTo() function? If not, I'll continue to learn by discovery, but that's a bit time-consuming, of course.
from amazon-sumerian-hosts.
I have custom listeners working. For those interested, here is the proper syntax:
hostObject.listenTo(
  hdsLuke.host.TextToSpeechFeature.EVENTS.ssml,
  onHostCustomSsmlTag
);
Note: hostObject is an arbitrary name I use for my Sumerian host objects. onHostCustomSsmlTag is an arbitrarily named callback function, which is called when an SSML tag is received. Note that the SSML tag must be the "<mark>" tag.
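For anyone wondering what such a callback might look like, here is a minimal sketch. It assumes the listener receives an object with a mark property whose value field holds the tag's name attribute, which is the shape the library's own _onSsml handler destructures in SSMLSpeechMarkInterface.js. The customAction key is purely hypothetical and stands in for whatever your own marks carry.

```javascript
// Sketch of an SSML-mark callback (assumed payload shape: {mark: {value: string}}).
function onHostCustomSsmlTag({mark}) {
  let payload;
  try {
    payload = JSON.parse(mark.value);
  } catch (e) {
    // The mark's name is not JSON (e.g. a plain-named mark): nothing to do here.
    return null;
  }
  // `customAction` is a hypothetical key; ignore marks meant for other handlers.
  if (!payload || typeof payload !== 'object' || !('customAction' in payload)) {
    return null;
  }
  console.log('custom mark received:', payload.customAction);
  return payload;
}
```

Returning null for anything the handler does not recognize keeps it from interfering with marks intended for other listeners.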
For the Design Patterns fans out there, the Sumerian Host listener pattern is the classic Chain of Responsibility pattern:
https://en.wikipedia.org/wiki/Chain-of-responsibility_pattern
There can be multiple handlers for the "mark" tag. It is up to each handler (i.e., each callback function assigned to the SSML event via a listenTo() call) to inspect the "mark" tag object and pull out what it needs. It is up to you to deal with any overlapping functionality that causes conflicts between handlers.
@c-morten If I have any of this wrong please correct me. If you have any other tips or guidance that might be useful please let me know. As I said in my message above, if there are docs that discuss these features in depth, I'd like to know about them.
I think there's a nasty problem with trying to effectively use custom listeners. I'm referring to listeners that process the embedded "<mark>" tags in a text-to-speech string when using the Sumerian Hosts library.
If you use a command that the library can't fully parse, then it doesn't remove the "mark" text from the TTS stream or properly condition it (I'm not really sure which), and this causes the preliminary call to Polly to fail with a 400 Bad Request error, crashing the whole play-speech call. And by "can't fully parse" I mean anything that isn't exact: even a wrong number of arguments in the "args" part of the JSON packet is enough, with everything else being correct for a method the repo does understand.
For example, as a test of my assertion, I crafted a PointOfInterestFeature command using the setTargetByName method and passed two strings in the args array instead of 1. This crashed Polly and therefore of course never hit the event listener I added for OnSsmlText events using the listenTo() method:
<mark name='{"feature":"PointOfInterestFeature","method":"setTargetByName","args":["Luke, Cristine"]}' />
I didn't do this to crash the system. I did it to prove the point that any variation of a call signature that the repo doesn't understand will break the TTS operation. Originally I crafted my own completely different call architecture for my custom event listener. But after many hours of trying many different combinations, I discovered nothing worked.
I think someone tried to make the custom listener feature viable because you do see code like this in SSMLSpeechMarkInterface.js:
static Mixin(BaseClass) {
  const ParentClass = TextToSpeechFeatureDependentInterface.Mixin(BaseClass);
  const SSMLSpeechMarkMixin = class extends ParentClass {
    _onSsml({mark}) {
      try {
        const {feature, method, args} = JSON.parse(mark.value);
        if (this.constructor.name === feature) {
          const callback = this[method];
          if (callback && typeof callback === 'function') {
            callback.apply(this, args)
          } else {
            console.warn(`Function ${method} does not exist within feature ${feature}`)
          }
        }
      } catch(e) {}
    }
  };
So that code does check to see if the repo supports the feature and warns if it doesn't. However, that code is never reached if the call payload doesn't match something the rest of the code fully understands. I'm guessing this is because the library fixes up the SSML text stream for commands it knows before passing them to Polly? If true, and this operation doesn't happen for custom "mark" tags created by devs like myself, then the reality is you really can't create your own custom listeners because you can't craft a "mark" string that won't crash Polly. There is some operation the repo needs to do for custom payloads that isn't happening, and that breaks the whole TTS pipeline because the call to Polly fails.
Please correct me if I'm wrong on this. I've spent a lot of time trying to get this to work. For now, I'm going to try something really awkward: stuffing what I need into the single string that goes in the args array for setTargetByName. I'm going to try this because that call doesn't do anything harmful if it can't find the target. Then I will modify my event listener to pull what it needs out of that string. Pretty ugly stuff, and as you can imagine, at this point I would really like to see some docs on certain key subjects of this repo. It's an amazing piece of work, but trying to implement certain key, admittedly exotic, custom extensions of it is painful at the moment.
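A rough sketch of that workaround, for illustration only. The "custom:" prefix, both function names, and the choice of URI-encoding are assumptions made for this example and are not part of the library. Whether Polly accepts the resulting mark name has not been verified here, so try the tag in the Polly console first.

```javascript
// Hypothetical prefix that lets the listener tell packed payloads from real target names.
const PREFIX = 'custom:';

// Pack custom data into the single string argument of setTargetByName.
function buildKludgeMark(customData) {
  // encodeURIComponent leaves apostrophes alone, so escape them by hand:
  // the attribute below is delimited by single quotes.
  const packed = encodeURIComponent(JSON.stringify(customData)).replace(/'/g, '%27');
  const name = JSON.stringify({
    feature: 'PointOfInterestFeature',
    method: 'setTargetByName',
    args: [PREFIX + packed],
  });
  // No whitespace before the closing "/>".
  return `<mark name='${name}'/>`;
}

// Recover the custom data inside the listener; returns null for ordinary marks.
function unpackKludgeMark(markValue) {
  const {args} = JSON.parse(markValue);
  const first = Array.isArray(args) ? args[0] : undefined;
  if (typeof first !== 'string' || !first.startsWith(PREFIX)) return null;
  return JSON.parse(decodeURIComponent(first.slice(PREFIX.length)));
}
```

The listener would call unpackKludgeMark(mark.value) and ignore the mark when the result is null, so genuine setTargetByName marks keep working as before.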
I got it working by kludging my call into the PointOfInterestFeature.setTargetByName message as I described in my above post. It works but it's really awkward. If someone can tell me how to make custom messages that don't crash the Polly code in the repo, that would be great so I could go back to an embedded JSON message approach.
Hi @roschler. You are correct in your use of the listenTo method. We use this method in the example html files for reference. You can also find a full list of EVENTS in the constructor of each Feature object in the API Documentation. (Note: some of these may be listed on the abstract version of the feature if the feature inherits from an abstract class.)
As for the errors you are encountering, unfortunately that is a limitation of the Polly API, which is what we use for the TextToSpeechFeature. If you pass it a string it can't handle, it will throw an error. The _onSsml method in the code you quoted above gets called after a result is received from Polly, when the mark is encountered during playback via the TextToSpeechFeature, so if Polly rejected the original speech text it will never make it to that method. A good way to make sure your speech text is valid is to try it out in the AWS Polly console. I believe in your case it may be due to the extra spacing/tab characters at the end of your mark tag. Polly can be pretty picky about spacing.
"As for errors you are encountering, unfortunately that is a limitation of the Polly API, which is what we use for the TextToSpeechFeature."
@c-morten Something doesn't add up. Unless the Polly API has intimate knowledge of all the variations of the "<mark>" command the Sumerian Hosts library currently parses and uses, it should not crash just because I pass a variation of one of your commands that only varies slightly.
As I said in my long post above:
"For example, as a test of my assertion, I crafted a PointOfInterestFeature command using the setTargetByName method and passed two strings in the args array instead of 1. This crashed Polly and therefore of course never hit the event listener I added for OnSsmlText events using the listenTo() method:
I didn't do this to crash the system. I did it to prove the point that any variation of a call signature that the repo doesn't understand will break the TTS operation."
Just having two string arguments in the args array contained in the PointOfInterestFeature.setTargetByName call shouldn't cause Polly to crash unless Polly knows the exact details of every embedded JSON method payload the Sumerian Hosts library supports. Except for that minor variation in the args list, it's still valid SSML. There has to be something else going on, don't you think?
In the tag you included in your example, there was a large space between the end of the contents of the tag and the closing /> characters. Try removing that empty space. I remember that when I was originally testing different strings to send to Polly, it would throw errors if there were leading or trailing spaces in the SSML markup.
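If you want to rule out stray whitespace before sending text to Polly, a small clean-up pass along these lines might help. This is only a sketch: tidyMarkTags is a hypothetical helper, not something the library provides, and it has not been confirmed that whitespace is the actual cause of the 400 error in this thread.

```javascript
// Trim the SSML string and remove whitespace that sits directly before
// the closing "/>" of a self-closing mark tag.
function tidyMarkTags(ssml) {
  return ssml
    .trim()
    .replace(/(<mark\b[^>]*?)\s+\/>/g, '$1/>');
}
```

Whitespace inside the attribute value (for example inside the JSON payload) is left alone; only the gap before the closing characters is removed.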