
Comments (7)

roschler commented on September 26, 2024

UPDATE: I have discovered the listenTo() Sumerian Hosts function. I believe this is what I want. I'm going to try using it with the "TextToSpeechFeature.onSsmlEvent" message and see if I can intercept my own custom marks.

More importantly, I don't want to take your time unnecessarily. Is there a document I could read that covers these features of the repo like the listenTo() function? If not, I'll continue to learn by discovery, but that's a bit time consuming of course.

from amazon-sumerian-hosts.

roschler commented on September 26, 2024

I have custom listeners working. For those interested, here is the proper syntax:

	hostObject.listenTo(
		hdsLuke.host.TextToSpeechFeature.EVENTS.ssml,
		onHostCustomSsmlTag
	);

Note: hostObject is an arbitrary name I use for my Sumerian host objects, and onHostCustomSsmlTag is an arbitrarily named callback function that is called when an SSML tag is received. The SSML tag must be the "<mark>" tag.

For the Design Patterns fans out there, the Sumerian Host listener pattern is the classic Chain of Responsibility pattern:

https://en.wikipedia.org/wiki/Chain-of-responsibility_pattern

There can be multiple handlers for the "mark" tag. It is up to each handler (i.e. - each callback function assigned to the SSML event tag via a listenTo() call) to inspect the "mark" tag object and pull out what you need. It is up to you to deal with any overlapping functionality that causes conflicts between handlers.
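A minimal sketch of such a handler follows. It assumes the callback receives the same `{mark}` object, with the JSON text in `mark.value`, that the library's own `_onSsml` method destructures. The feature name `MyCustomFeature` and the `handled` array are hypothetical and exist only for illustration.

```javascript
// Parse the JSON carried by a mark event; return null if it is not usable.
function parseMarkPayload(event) {
  try {
    const parsed = JSON.parse(event.mark.value);
    return parsed !== null && typeof parsed === 'object' ? parsed : null;
  } catch (e) {
    return null; // missing mark or non-JSON text: not a command we understand
  }
}

// Hypothetical feature name; only marks addressed to it are handled here.
const MY_FEATURE = 'MyCustomFeature';

// Collected commands, standing in for whatever the real handler would do.
const handled = [];

// Callback intended to be registered with listenTo(). Returns true when the
// mark was consumed and false when it belongs to some other handler.
function onHostCustomSsmlTag(event) {
  const payload = parseMarkPayload(event);
  if (!payload || payload.feature !== MY_FEATURE) {
    return false;
  }
  handled.push({ method: payload.method, args: payload.args || [] });
  return true;
}
```

Because every registered handler sees every mark, filtering on the `feature` field is one way to keep handlers from stepping on each other.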

@c-morten If I have any of this wrong please correct me. If you have any other tips or guidance that might be useful please let me know. As I said in my message above, if there are docs that discuss these features in depth, I'd like to know about them.

roschler commented on September 26, 2024

I think there's a nasty problem with trying to use custom listeners effectively. I'm referring to listeners that process the embedded ("<mark>") tags in a text-to-speech string when using the Sumerian Hosts library.

If you use a command that the library can't fully parse, it doesn't remove the "mark" text from the TTS stream or otherwise condition it properly (I'm not sure which), and this causes the preliminary call to Polly to fail with a 400 Bad Request error, which aborts the whole play-speech call. By "can't fully parse" I mean anything that isn't an exact match: even a wrong number of entries in the "args" part of the JSON packet triggers the failure, with everything else being correct for a method the repo does understand.

For example, as a test of my assertion, I crafted a PointOfInterestFeature command using the setTargetByName method and passed two strings in the args array instead of 1. This crashed Polly and therefore of course never hit the event listener I added for OnSsmlText events using the listenTo() method:

<mark name='{"feature":"PointOfInterestFeature","method":"setTargetByName","args":["Luke, Cristine"]}'  />

I didn't do this to crash the system. I did it to prove the point that any variation of a call signature that the repo doesn't understand will break the TTS operation. Originally I crafted my own completely different call architecture for my custom event listener. But after many hours of trying many different combinations, I discovered nothing worked.

I think someone tried to make the custom listener feature viable because you do see code like this in SSMLSpeechMarkInterface.js:

  static Mixin(BaseClass) {
    const ParentClass = TextToSpeechFeatureDependentInterface.Mixin(BaseClass);
    const SSMLSpeechMarkMixin = class extends ParentClass {
      _onSsml({mark}) {
        try {
          const {feature, method, args} = JSON.parse(mark.value);
          if (this.constructor.name === feature) {
            const callback = this[method];
            if (callback && typeof callback === 'function') {
              callback.apply(this, args)
            } else {
              console.warn(`Function ${method} does not exist within feature ${feature}`)
            }
          }
        } catch(e) {}
      }
    };

So that code does check to see if the repo supports the feature and warns if it doesn't. However, that code is never reached if the call payload doesn't match something the rest of the code fully understands. I'm guessing this is because the library fixes up the SSML text stream for commands it knows before passing them to Polly? If that's true, and this operation doesn't happen for custom "mark" tags created by devs like myself, then the reality is you can't create your own custom listeners, because you can't craft a "mark" string that won't crash Polly. There is some operation the repo needs to do for custom payloads that isn't happening, and that breaks the whole TTS pipeline because the call to Polly fails.

Please correct me if I'm wrong on this. I've spent a lot of time trying to get this to work. For now, I'm going to try something really awkward: stuffing what I need into the single string that goes in the args array for setTargetByName. I'm going to try this because that call doesn't do anything harmful if it can't find the target. Then I will modify my event listener to pull what it needs out of that string. Pretty ugly stuff and, as you can imagine, at this point I would really like to see some docs on certain key subjects of this repo. It's an amazing piece of work, but trying to implement certain key, admittedly exotic, custom extensions of it is painful at the moment.

roschler commented on September 26, 2024

I got it working by kludging my call into the PointOfInterestFeature.setTargetByName message as I described in my above post. It works but it's really awkward. If someone can tell me how to make custom messages that don't crash the Polly code in the repo, that would be great so I could go back to an embedded JSON message approach.
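The exact encoding used for this workaround isn't shown in the thread, so the following is only a sketch of one way it could look. The `custom:` prefix, the URI-encoding step, and all function names are assumptions made for illustration, and whether Polly accepts the resulting tag has not been verified here.

```javascript
// Hypothetical prefix that tells our listener the "target name" is really a
// smuggled custom command rather than a real point-of-interest name.
const CUSTOM_PREFIX = 'custom:';

// Serialize a command object into a string with no quotes, angle brackets,
// or spaces, so it can sit inside the single-quoted mark attribute.
function encodeCustomCommand(command) {
  const encoded = encodeURIComponent(JSON.stringify(command)).replace(/'/g, '%27');
  return CUSTOM_PREFIX + encoded;
}

// Build a mark tag that looks like an ordinary setTargetByName call with
// exactly one string argument.
function buildMarkTag(command) {
  const payload = {
    feature: 'PointOfInterestFeature',
    method: 'setTargetByName',
    args: [encodeCustomCommand(command)],
  };
  return "<mark name='" + JSON.stringify(payload) + "'/>";
}

// In the listener: recover the custom command from the mark's JSON text, or
// return null if this mark is a genuine setTargetByName call.
function decodeCustomCommand(markValue) {
  try {
    const { args } = JSON.parse(markValue);
    const first = Array.isArray(args) ? args[0] : undefined;
    if (typeof first !== 'string' || !first.startsWith(CUSTOM_PREFIX)) {
      return null;
    }
    return JSON.parse(decodeURIComponent(first.slice(CUSTOM_PREFIX.length)));
  } catch (e) {
    return null;
  }
}
```

The library still receives a well-formed setTargetByName call, which is harmless when no target with that name exists, while the custom listener reads the real command out of the argument.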

c-morten commented on September 26, 2024

Hi @roschler. You are correct in your use of the listenTo method. We use this method in the example HTML files for reference. You can also find a full list of EVENTS in the constructor of each Feature object in the API Documentation. (Note: some of these may be listed on the abstract version of the feature if the feature inherits from an abstract class.)

As for errors you are encountering, unfortunately that is a limitation of the Polly API, which is what we use for the TextToSpeechFeature. If you pass it a string it can't handle, it will throw an error. The _onSsml code you quoted above is called only after a result has been received from Polly, when the mark is encountered during playback via the TextToSpeechFeature, so if Polly rejected the original speech text, execution never reaches that method. A good way to check that your speech text is valid is to try it out in the AWS Polly console. I believe in your case the failure may be due to the extra spacing/tab characters at the end of your mark tag; Polly can be pretty picky about spacing.

roschler commented on September 26, 2024

"As for errors you are encountering, unfortunately that is a limitation of the Polly API, which is what we use for the TextToSpeechFeature."

@c-morten Something doesn't add up. Unless the Polly API has intimate knowledge of all the variations of the "<mark>" command the Sumerian Hosts library currently parses and uses, it should not crash just because I pass a slight variation of one of your commands.

As I said in my long post above:

#25 (comment)

"For example, as a test of my assertion, I crafted a PointOfInterestFeature command using the setTargetByName method and passed two strings in the args array instead of 1. This crashed Polly and therefore of course never hit the event listener I added for OnSsmlText events using the listenTo() method:

I didn't do this to crash the system. I did it to prove the point that any variation of a call signature that the repo doesn't understand will break the TTS operation."

Just having two string arguments in the args array contained in the PointOfInterestFeature.setTargetByName shouldn't cause Polly to crash unless Polly knows the exact details of every embedded JSON method payload the Sumerian Hosts library supports. Except for that minor variation in the args list, it's still valid SSML. There has to be something else going on, don't you think?

c-morten commented on September 26, 2024

In the tag you included in your example, there was a large space between the end of the contents of the tag and the closing /> characters. Try removing that empty space. I remember that when I was originally testing different strings to send to Polly, it would throw errors if there were leading or trailing spaces in the SSML markup.
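If stray whitespace is indeed the culprit, the speech text could be tidied before it is handed to the library. The helper below is a hypothetical sketch, not part of the Sumerian Hosts API, and it rests on the recollection above rather than on documented Polly behavior.

```javascript
// Remove leading/trailing whitespace from the whole string and any
// whitespace that sits directly before a self-closing "/>".
// Caveat: a literal " />" inside an attribute value would also be tightened.
function tightenSelfClosingTags(ssml) {
  return ssml.trim().replace(/\s+\/>/g, '/>');
}

// Example using the tag from earlier in this thread (note the double space
// before "/>" in the input).
const original =
  "<mark name='{\"feature\":\"PointOfInterestFeature\",\"method\":\"setTargetByName\",\"args\":[\"Luke, Cristine\"]}'  />";
const cleaned = tightenSelfClosingTags(original);
```

Pasting the cleaned string into the AWS Polly console, as suggested earlier, remains the more reliable check.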
