Hello,
This is not an issue, but rather a post on how this idea could be taken to the next level.
Many projects cover speech recognition and provide a "Jarvis" feel, and a couple of them, like this one, are written specifically for the Raspberry Pi.
I heard about this project about a year ago and was totally fascinated, but I quickly realized that it could not cover all of my needs.
I started my own project and got about halfway. It is nothing more than a clone of voicecommand written in Perl (https://github.com/computaholic/pivoice). However, I am not an expert in all this, and sometimes it worked and sometimes it didn't. But I implemented some cool features, which I will summarize below.
Another great project is Jasper (http://jasperproject.github.io/), which does essentially the same thing but does not depend on Google (which is good and/or not so good).
IDEA = JASPER + VOICECOMMAND and some pivoice ;)
So my idea is: why not combine the best of all these projects and create something that is superior to everything that has appeared so far? I'd rather have one good solution than ten proofs of concept.
PROJECT SUMMARIES
My project is called pivoice and is heavily inspired by this project. However, I tried to do things a little differently:
- Main Purpose
The main purpose is to capture voice, convert it to text (speech-to-text), and then match the text against a dictionary. That's why I chose to use dictionaries.
- Dictionary
A file that stores all the phrases that should be matched. Each command has several entries, which allow various settings per command, such as the matching type. You can also provide your own commands, and you have full regex support. Here is an example; I will not go into detail.
Sample dict:
[playmusic]
ListenFor = play music
Action = echo ""
[playssong]
ListenFor =~ play $song from @band
Action = echo "vlc /path/to/music/@band/$song.mp3"
[playvideo]
ListenFor = ^play\b(.*?)\bseason (.*?) episode (.*?)
# default: simple
MatchStyle = regex
Action = echo "mplayer /videos/$1/season_$2/episode_$3"
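
To make the three entries above more concrete, here is a minimal Python sketch of how such a dictionary could be matched. It is not pivoice's actual Perl code. The in-memory COMMANDS list, the "placeholder" style name for the =~ entry, and the rule that $name and @name each capture one or more words are all assumptions made for illustration. The regex entry also gets a trailing $ here so that the last lazy group captures something.

```python
import re

# Hypothetical in-memory form of the three sample entries. pivoice's real
# parser and data structures are not shown in the post.
COMMANDS = [
    {"name": "playmusic", "style": "simple",
     "listen_for": "play music", "action": 'echo ""'},
    {"name": "playssong", "style": "placeholder",
     "listen_for": "play $song from @band",
     "action": 'echo "vlc /path/to/music/@band/$song.mp3"'},
    {"name": "playvideo", "style": "regex",
     # Trailing $ added (assumption) so the last lazy group is not empty.
     "listen_for": r"^play\b(.*?)\bseason (.*?) episode (.*?)$",
     "action": 'echo "mplayer /videos/$1/season_$2/episode_$3"'},
]

# Matches $song, @band and also $1, $2, ... inside patterns and actions.
PLACEHOLDER = re.compile(r"[$@](\w+)")


def placeholder_to_regex(pattern):
    """Turn 'play $song from @band' into a regex with named groups."""
    parts, pos = [], 0
    for m in PLACEHOLDER.finditer(pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        parts.append("(?P<%s>.+?)" % m.group(1))
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("^" + "".join(parts) + "$")


def match(utterance):
    """Return (command name, filled-in action) or None if nothing matches."""
    text = utterance.strip().lower()
    for cmd in COMMANDS:
        if cmd["style"] == "simple":
            values = {} if text == cmd["listen_for"] else None
        elif cmd["style"] == "placeholder":
            m = placeholder_to_regex(cmd["listen_for"]).match(text)
            values = m.groupdict() if m else None
        else:  # regex: numbered groups become "1", "2", ...
            m = re.search(cmd["listen_for"], text)
            values = ({str(i): g.strip() for i, g in enumerate(m.groups(), 1)}
                      if m else None)
        if values is not None:
            action = PLACEHOLDER.sub(
                lambda mo: values.get(mo.group(1), mo.group(0)), cmd["action"])
            return cmd["name"], action
    return None


if __name__ == "__main__":
    print(match("play yesterday from the beatles"))
```

The commands are tried in file order, so the most specific entries should come first in this sketch.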
- Scenarios
What happens after a command has been recognized? My implementation uses scenarios. That means that once a command has been accepted, a new scenario can be loaded, which allows a change of language, microphone, etc. The idea came when I tried to allow multiple keywords: depending on which keyword you say, a different language is loaded. So you can google in your native language and then in German, for example.
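
A rough Python sketch of the scenario idea, again not the actual pivoice code: the keywords "jarvis" and "computer", the language codes, and the microphone names are made up for illustration, as is the rule that the keyword is the first word of the utterance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    language: str     # language the recognizer should use
    microphone: str   # input device to listen on


# Hypothetical keywords and settings, one scenario per keyword.
SCENARIOS_BY_KEYWORD = {
    "jarvis": Scenario("english", "en-US", "default"),
    "computer": Scenario("german", "de-DE", "usb-mic"),
}


class Listener:
    """Keeps track of the active scenario and switches it on a keyword."""

    def __init__(self, initial):
        self.scenario = initial

    def hear(self, utterance):
        """Return (active scenario, remaining command text)."""
        words = utterance.lower().split()
        if words and words[0] in SCENARIOS_BY_KEYWORD:
            # A known keyword loads its scenario and is stripped from the text.
            self.scenario = SCENARIOS_BY_KEYWORD[words[0]]
            words = words[1:]
        return self.scenario, " ".join(words)


if __name__ == "__main__":
    listener = Listener(SCENARIOS_BY_KEYWORD["jarvis"])
    print(listener.hear("computer suche wetter"))
```

The scenario stays active until another keyword is heard, so follow-up commands are handled with the same language and microphone.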
- Perl
I chose Perl because I think it is the best choice when it comes to parsing text, and that's basically what this is all about.
Basically, this uses local speech recognition. You have to define every word that needs to be recognized. This does not allow any words that are not in the dictionary, but it is ultra fast and very accurate (if the dictionary is well written). I first thought of writing a Google speech plugin for this, but then I thought: why not try to combine the best of all worlds?
So, what do you think? Is there any interest in bringing these together? I guess this is mostly addressed to Steven.
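
The closed-vocabulary limitation can be illustrated with a small Python sketch. The phrase list is hypothetical, and deriving the vocabulary by splitting the ListenFor phrases into words is an assumption, not a description of how pivoice builds its word list.

```python
# Hypothetical ListenFor phrases taken from a dictionary file.
PHRASES = ["play music", "stop music", "what time is it"]


def build_vocabulary(phrases):
    """Collect every distinct word the local recognizer has to know."""
    return {word for phrase in phrases for word in phrase.lower().split()}


def unknown_words(utterance, vocabulary):
    """List the words of an utterance that the dictionary does not define."""
    return [w for w in utterance.lower().split() if w not in vocabulary]


if __name__ == "__main__":
    vocab = build_vocabulary(PHRASES)
    print(sorted(vocab))
    print(unknown_words("play jazz music", vocab))
```

Any utterance with unknown words is exactly the case that a purely local setup cannot handle, which is where an online recognizer could complement it.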
computaholic