Comments (15)

joshuashort commented on August 29, 2024

Bien sûr. Any help would be appreciated; English isn't even very well supported yet (#18), and there are many languages to add :)

To start on something like French support, you can put whatever you want as the command phrase, et voilà. The current design definitely has customization gaps though..

So, writing some French words on a command is easy enough, but it's also necessary to give pocketsphinx what it needs to 'listen' for French sounds. It'd have to be set up with the right phonemes, glyphs, and whatnot.

For anyone interested, pocketsphinx is what handles speech recognition in OA, and it's its own complex thing. To natively support French (let alone a multilingual agent), it'd require doing things that I only vaguely understand. Fortunately, it's an accessible project with great documentation and tools.

OA generates a corpus of terms that it finds attached to commands, and that'll work fine enough, but it uses that corpus to generate a language model and a phonetic dictionary. The language model is pretty trivial, and it will probably work as-is. The phonetic dictionary is where it gets tricky..
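As a rough illustration of the corpus step described above, here is a minimal sketch that collects the unique words from a list of command phrases. This is not OA's actual code: `build_corpus` and `command_phrases` are hypothetical names, and uppercasing the words is an assumption based on the dictionary entries quoted later in this thread.

```python
def build_corpus(phrases):
    """Collect the unique words used across command phrases.

    Hypothetical helper, not part of OA. Words are uppercased to match
    the style of the dictionary entries quoted in this thread.
    """
    seen = set()
    for phrase in phrases:
        seen.update(phrase.upper().split())
    return sorted(seen)


# Example phrases taken from this thread.
command_phrases = ["ça va", "donne moi les nouvelles"]
corpus_words = build_corpus(command_phrases)
print("\n".join(corpus_words))
```

Every word in this list then needs a phonetic dictionary entry, which is the part that gets tricky for French.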

Anyway, yes, we should support all kinds of languages and vocalizations.. there are many improvements left. Thanks for the interest -- out of curiosity, are you able to run OA?

from oa-core.

joshuashort commented on August 29, 2024

Other bugs aside, here's what I just tried:

Added a command to minds/boot.py

@command("ça va")
def mhm():
  say("Très bien!")

As expected, no problems with the corpus or language model (well, no new problems).

The phonetic dictionary wasn't too far off:

VA	V AA
ÇA	AH

I changed it (and restarted OA):

VA	V AA
ÇA	S AH

So, boot mind can say "très bien" like an English-speaking computer on my system when it hears this approximation of "ça va". Getting the speech right is a whole other thing..
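The manual dictionary edit above could be scripted so it is easy to reapply after OA regenerates the file. The following is only a sketch under assumptions: it treats the dic file as one `WORD<tab>PHONEMES` entry per line, as in the excerpt above, and the function names (`parse_dic`, `format_dic`, `apply_overrides`) are hypothetical, not part of OA or pocketsphinx.

```python
def parse_dic(text):
    """Parse 'WORD<whitespace>PH1 PH2 ...' lines into {word: [phonemes]}."""
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        entries[parts[0]] = parts[1:]
    return entries


def format_dic(entries):
    """Serialize entries back to tab-separated lines, sorted by word."""
    return "".join(
        f"{word}\t{' '.join(phones)}\n" for word, phones in sorted(entries.items())
    )


def apply_overrides(generated_text, override_text):
    """Replace or add entries in the generated dictionary with manual ones."""
    entries = parse_dic(generated_text)
    entries.update(parse_dic(override_text))
    return format_dic(entries)


# The generated entries and the manual fix from the comment above.
generated = "VA\tV AA\nÇA\tAH\n"
patched = apply_overrides(generated, "ÇA\tS AH\n")
print(patched, end="")
```

Keeping the overrides in a separate file would mean the fix survives if the generated dic gets overwritten.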

Paullux commented on August 29, 2024

I found the French dictionary on the CMUSphinx SourceForge page (https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French/fr.dict/download).

I used it in oa/cache/boot and in oa/cache/root. It seems to work, but not completely.

To change the Boot config:
- change oa-core-master/oa/modules/mind/minds/boot.py
- change oa-core-master/oa/cache/boot/dic
- change oa-core-master/oa/cache/boot/lm
- change oa-core-master/oa/cache/boot/sentences.corpus
To change the Root config:
- change oa-core-master/oa/modules/mind/minds/root.py
- change oa-core-master/oa/cache/root/dic
- change oa-core-master/oa/cache/root/lm
- change oa-core-master/oa/cache/root/sentences.corpus

It seems to work when I change all of these files...

How can I change the synthesis voice to pico? In French, espeak is horrible...

Paullux commented on August 29, 2024

I can add commands, but the commands aren't completely understood. How can I reuse CMUSphinx with another dictionary from the CMUSphinx project?

joshuashort commented on August 29, 2024

Well, you're definitely on the right track. You shouldn't have to do anything to lm or sentences.corpus -- those will be overwritten/generated as OA loads. dic is the same way, but it'll need to be changed.

One change I've been wanting to make is disable automatic generation/find a way around using an online service.

For now, you'll need to let the files get generated, stop OA, manually add entries to dic, then re-run OA.. and hope they don't get overwritten.

I don't have much experience with running a fuller language model.. I'm guessing it'll add plenty of false positives and a harder time matching, but I'm not sure.

joshuashort commented on August 29, 2024

What do you mean they aren't completely understood? You've added commands, but OA doesn't seem to recognize the words and trigger the commands? Or it's not accurate?

Paullux commented on August 29, 2024

OpenAssistant mixes up the keywords. It does not understand everything at once. I have to repeat myself, and the results are often wrong.

Paullux commented on August 29, 2024

OA writes words in the terminal that I added in some commands, but not the ones that match what I said, and not in the right order. For example: when I said "donne moi les nouvelles", I saw "LANCE LANCE NOUS" written in the terminal...

Paullux commented on August 29, 2024

To show you: https://youtu.be/HYsGUvyb7Eg

joshuashort commented on August 29, 2024

Ok, yeah, that makes sense.

I’ve been looking into the ear module, and it’s one place that needs some improvement. I added some logging to provide some visibility on what it does.. and depending on the settings, it can differ pretty wildly on how well it can recognize phrases/silence. I’ll get an MR worked up soonish that uses these changes.. but I also don’t want to flood the normal log.

And some user-facing feedback would be great. There are states that ear gets into where it’s basically ignoring speech or what’s being spoken is falling between recognition phases. And settings tend to vary between headset/internal mic and what kind of ambient noise there is (e.g. fans).

Even if speech gets recognized, it’s expecting a full phrase match.. that’s a bigger change, but it should move to intent-based interpretation that can span multiple phrases.

joshuashort commented on August 29, 2024

I made that last comment right before your link. Thanks for sharing that video! That’s a very cool setup you have!

Impressive results, but the recognition is very frustrating :/

It’s likely that it’s mostly an audio/ear module problem vs. a language/recognizer problem, so that’s nice.

joshuashort commented on August 29, 2024

One thing to verify is the dic file you’re using.. I’d avoid a full-language one while debugging, so are you using one that only has the few words that are actually used in commands? Also, check on the phonetic mapping of the words to make sure they seem like how you’d expect.
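One way to get a small dic out of a full-language dictionary is to keep only the entries for words that appear in the command corpus. The sketch below assumes the full dictionary has one `word phonemes...` entry per line, with alternate pronunciations written as `word(2)`, and that case can be ignored when matching. `filter_dictionary` is a hypothetical helper, and the phoneme symbols in the sample data are made up for illustration, not taken from the real fr.dict.

```python
def filter_dictionary(full_dic_text, corpus_text):
    """Keep only dictionary lines whose word appears in the corpus.

    Returns (kept_lines, missing_words) so that words with no entry
    can be added by hand.
    """
    needed = set(corpus_text.lower().split())
    kept, found = [], set()
    for line in full_dic_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        # Strip an alternate-pronunciation suffix such as "(2)".
        base = parts[0].split("(")[0].lower()
        if base in needed:
            kept.append(line)
            found.add(base)
    return kept, sorted(needed - found)


# Illustrative data only: these phoneme strings are invented.
full_dic = (
    "bonjour b on j ou r\n"
    "les l e\n"
    "les(2) l ei\n"
    "nouvelles n ou v ai l\n"
)
kept, missing = filter_dictionary(full_dic, "DONNE MOI LES NOUVELLES")
print("\n".join(kept))
print("missing:", missing)
```

The `missing` list shows which command words still need a hand-written phonetic entry.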

To explore a little deeper, check out modules/ear/__init__.py.. there are some configuration settings in there that might make a difference. Specifically ones related to energy threshold or timeouts. I’m on mobile right now, but I’ll try to get a link.

joshuashort commented on August 29, 2024

Without some logging, it’s hard to know what values to use and what the detected levels are.. but maybe try 1000 or 2000 for the energy threshold:

"energy_threshold": 4000,

Too low will give false recognitions from background noise.
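To get a feel for what the threshold means, here is a minimal sketch that computes the RMS energy of a frame of 16-bit little-endian PCM audio and compares it against a threshold. It assumes the ear module makes a comparison along these lines, which this thread does not confirm. `rms` and `is_speech` are hypothetical names, and the sample frames are synthetic.

```python
import math
import struct


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of 16-bit little-endian PCM samples."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame: bytes, energy_threshold: float) -> bool:
    """Treat a frame as speech when its energy exceeds the threshold."""
    return rms(frame) > energy_threshold


# Synthetic frames: a quiet one (background hum) and a louder one (speech).
quiet = struct.pack("<4h", 100, -100, 100, -100)
loud = struct.pack("<4h", 3000, -3000, 3000, -3000)

for threshold in (1000, 2000, 4000):
    print(threshold, is_speech(quiet, threshold), is_speech(loud, threshold))
```

Logging the measured energy per frame for a few seconds of silence and a few seconds of speech would show where a sensible threshold sits for a given microphone.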

Paullux commented on August 29, 2024

To help, I found this site: http://www.speech.cs.cmu.edu/tools/lmtool-new.html. Can I use it, or should I never change the files in the 'cache' folder?

joshuashort commented on August 29, 2024

I think that’s the service the speech recognizer used to update:

def update_language(_):

But running it outside of OA might be easier for now.
