jamezq / palaver

Linux Speech Recognition

License: GNU General Public License v3.0

Shell 15.73% Python 50.84% C 32.94% Makefile 0.49%

palaver's Issues

main.dic not being used?

I was wondering why none of my commands were being accepted then I found out where main.dic was :)

Still, commands were not being recognized. If I copied the commands to my personal.dic, everything works fine. Does this mean I can only ever have one .dic file?

press button

Hi
Sorry! I didn't find any better subject.

I want to create a plugin that works while the user is typing in a text box (in any application): on a phrase such as "Correct Layout", it cuts all the text in the text box, runs a specific app, and finally pastes the clipboard contents back into the text box.

That app runs some methods which edit the clipboard text.

I just don't know how to cut and paste.

settings plugin

What files do I need to add to install the settings plugin? I already merged the new stuff into main.dic and replaced plugins.db.
What else do I need to add?

HUD Plugin

I have a plugin that can control the Unity HUD (Pressing ALT) but the speech recognition can't recognize HUD very well, any other suggestions for what the command could be?

User Interface

How do I add an interactive user interface layer in place of the default one at the top right? Kindly guide me.

dictating emails

Is there an add-on you could make that would be able to dictate emails? It would go something like this:
"email"
say email address?
"address"
say subject?
"subject"
say body?
"body blah blah blah blah"
say are you sure you want to send this?
"yes" "no" "edit"
say email sent

Oh, and have a "cancel" command to cancel the email dictation. This would hopefully be done without a window, maybe as an extension of the Gmail add-on?

German Language Problem and something more

The dictation mode can't display German special characters like "ä, ö, ü, ß", and I think the same goes for some characters in other languages. I have fixed the problem for myself in the following way:

I replaced the content of the file "/Recognition/bin/type" with the following code. This fix needs the package "xclip" to be installed.

#!/bin/bash
# Requires xclip and xvkbd.
# Copy the text to the clipboard, then paste it with Ctrl+V.
echo -e "$1" | xclip -in -selection clipboard
xvkbd -text "\Cv"

I don't know why it works, but this way it works.
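For anyone who prefers Python, here is a minimal sketch of the same workaround. It assumes `xclip` and `xvkbd` are installed; the function name `type_via_clipboard` and the injectable `run` parameter are hypothetical and not part of Palaver. A likely reason the trick works (a guess, not confirmed) is that pasting sends the text as clipboard data, so the special characters never have to be translated into individual keystrokes.

```python
import subprocess


def type_via_clipboard(text, run=subprocess.run):
    """Copy text to the X clipboard, then send Ctrl+V to the focused window."""
    # xclip reads the text from stdin and stores it in the clipboard selection.
    run(["xclip", "-in", "-selection", "clipboard"],
        input=text.encode("utf-8"), check=True)
    # "\Cv" is xvkbd's notation for Ctrl+V, as used in the script above.
    run(["xvkbd", "-text", r"\Cv"], check=True)
```

Calling `type_via_clipboard("Grüße")` should paste the word into whichever window has focus, provided that window accepts Ctrl+V as paste.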

Also, how about a function to get the day's weather?

Sorry for my bad English.

Update Palaver in the background.

How would people feel about making Palaver automatically update itself from GitHub? I have started a small program that can do this, but I am wondering what people think.

Update URL in send_speech

Please update the URL on line 8 in send_speech to:

URL="https://www.google.com/speech-api/v1/recognize?lang=en&client=chromium"

This uses secure HTTP to send the audio file and receive the result.

nano is used during setup instead of $EDITOR

During setup, nano is executed instead of respecting the user's $EDITOR.
This breaks if:

Nano is not installed
User has no idea how to use nano (though killall nano worked fine)
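One possible way to pick the editor, sketched in Python. The function name `pick_editor` is hypothetical, checking `$VISUAL` before `$EDITOR` is a common convention rather than something this issue asks for, and the `nano`/`vi` fallbacks are assumptions.

```python
import os
import shutil


def pick_editor(env=None, which=shutil.which):
    """Return the first usable editor command, or None if nothing is installed."""
    env = os.environ if env is None else env
    # Prefer the user's own choice; fall back to nano, then vi.
    for candidate in (env.get("VISUAL"), env.get("EDITOR"), "nano", "vi"):
        # The variable may carry arguments ("code -w"), so test only the program name.
        if candidate and which(candidate.split()[0]):
            return candidate
    return None
```

Returning `None` lets setup print a clear message instead of failing when no editor is available at all.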

No OSD on Kubuntu & List of Commands

Is there some reason why no OSD when (de)activating Palaver on Kubuntu?
Also, where is the list of commands again?
Thanks so much for taking this on. As a person with a disability, I am very grateful.

Can't open ANY applications

The program responds to my voice, shows me exactly what I said, even uses the personal dictionary for some of the example responses I followed but it will not open any application.

If I say "open firefox" the program responds with "I just opened the firefox" as per followed example:
(WORD action) [the ](WORD object)
say "I just $action$ed the $object$."

The directions for setting this up properly are missing.
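To illustrate what the example above seems to do, here is a small Python sketch of the `$name$` substitution, inferred only from the output quoted in this issue. It is not Palaver's actual implementation, and `fill_template` is a hypothetical name. The point it makes is that a `say` line only builds a spoken response; nothing in it launches a program.

```python
import re


def fill_template(template, captures):
    """Replace each $name$ placeholder with the word captured under that name."""
    return re.sub(
        r"\$(\w+)\$",
        # Leave unknown placeholders untouched rather than failing.
        lambda m: captures.get(m.group(1), m.group(0)),
        template,
    )


reply = fill_template("I just $action$ed the $object$.",
                      {"action": "open", "object": "firefox"})
print(reply)  # I just opened the firefox.
```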

Ubuntu PPA / license

Hello,

Do you plan on making this available in a PPA soon? I've wanted to do it myself but I can't find the license anywhere... what's the license?

Issues with apostrophes

I was testing the type command and said "I'm testing". Unfortunately it failed to execute the command, presumably due to the presence of an apostrophe in the spoken text (which marks the end of the string prematurely).
I have attached a screenshot of the error.
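If the cause really is shell quoting (an assumption; the screenshot is not reproduced here), the spoken text has to be escaped before it is placed on a command line. A minimal Python sketch follows; the script path `./type` and the function name `build_type_command` are placeholders.

```python
import shlex


def build_type_command(text, script="./type"):
    """Build a shell command line that passes text as exactly one argument."""
    # shlex.quote escapes apostrophes and other shell metacharacters.
    return f"{script} {shlex.quote(text)}"


command = build_type_command("I'm testing")
# Splitting the command the way a POSIX shell would gives the text back intact.
print(shlex.split(command))  # ['./type', "I'm testing"]
```

An alternative is to skip the shell entirely and pass the arguments as a list, for example `subprocess.run(["./type", text])`, which needs no quoting at all.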

"Text bubble" not showing...sometimes....persistent

When I start my machine (Linux Mint 15, kernel 3.8.0-33-generic), Palaver works fine. At some point that changes (I'm guessing an audio conflict with other audio programs, maybe Banshee and/or Mumble?): I hit the hotkey and the 'listening' bubble does not show, but the program still responds.

If I 'Restart Servers' then the bubbles return.

new tab

I was wondering if there is a way to make a voice command open a new tab. It could be browser specific, work in any browser that is open, or work in the default browser.

Installation instruction

Please add detailed instruction for installation in README.md.

Initial cap and carriage return

In experimenting with Palaver (thank you for this!) I note that the initial character of the result of type or dictation mode is lower case.

More importantly, when I ask it to do a carriage return, I get an error. I don't understand the code yet, but which bit actually matches commands?

Error
'{"status":0,"id":"9e6a6a801618a545686c492d834fc625-1","hypotheses":[{"utterance":"type this is a test.\n","confidence":0.96828884}]}'
is not a recognized command
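The error above suggests that the raw JSON response, rather than the utterance inside it, reached the command matcher, and that the utterance carries a trailing newline. Here is a Python sketch of pulling out the best hypothesis and trimming it. The field names come from the response quoted above, the `id` value is shortened, and `extract_utterance` is a hypothetical helper, not Palaver's code.

```python
import json


def extract_utterance(raw):
    """Return the highest-confidence utterance, stripped, or None if there is none."""
    data = json.loads(raw)
    hypotheses = data.get("hypotheses") or []
    if not hypotheses:
        return None
    best = max(hypotheses, key=lambda h: h.get("confidence", 0.0))
    # strip() removes the trailing "\n" seen in the response above.
    return best["utterance"].strip()


raw = r'{"status":0,"id":"x-1","hypotheses":[{"utterance":"type this is a test.\n","confidence":0.96828884}]}'
print(extract_utterance(raw))  # type this is a test.
```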

How to setup & test

Hi,

I have installed all the required dependencies, but I am not able to test. How do I add a hotkey?
Can you please give me an example of what I have to do if I want to open the Firefox web browser? Using this example, I'll do the rest of the stuff myself. I am using Ubuntu 12.04 32-bit.

Thank you.

Support Multiple Languages

Hi, I just discovered Palaver. I got it running, although I have not been able (yet) to interact with it. Great stuff, thank you for sharing!

One thing I noticed: so far it seems to be designed for one user = one language. If this is the assumption, I submit to your attention that it is invalid in many real world cases. Would you kindly accommodate polyglots like me?

e.g. rename files like *.dic into *.LANG.dic

I have not read myself through the whole specification / plan, but I thought to bring this in as early as possible because later changes are more difficult than earlier ones.

Keep up the good work!
Yuv

The dictionary inside plugins should not be called "actions.dic"

It should refer to the dictionary that the plugin will be installed to. So for example, main.dic or dictation.dic. That way packages can choose what dictionary they want to extend.

Recording/Listening device not working

I have installed the software as per the instructions given and have arrived at the final terminal screen, which says "Configure ... Hotkey".

I have configured the hotkey. The name is: Voice

The command is /home/Documents/Palaver-master/hotkey

But when I press the combination, which is the same as shown in the tutorials ("Ctrl + L"), nothing happens: no green icon, no recorder icon. I have tried speaking with the keys pressed, but there is no response.

The output devices and input mics are working fine; I have checked them with Audacity and Skype.

Please help.

Regards,
Sajan

Can't open many applications

'Open Google Chrome' works but Palaver says 'Open Firefox' is not a recognized command.

Suggestion to change dependency

May I suggest you change the dependency on notify-osd to notification-daemon?

notification-daemon is the generic tool that is completely interchangeable with notify-osd and xfce4-notifyd for example.

I know the software is targeting Ubuntu, but this change will make Palaver distro and desktop agnostic.

I currently have Palaver running (and have packaged in the AUR) on Arch Linux using an Xfce desktop.

For a pre-release beta, it is in a very useable state. Looking forward to watching the features increase as you approach release (and hopefully I may be able to contribute some plugins!)

Cheers.

Palaver app name already exists

Found this while browsing around --> http://palaverapp.com/ <---

{status:5,id:,hypotheses:[]}

User dictionary location

May I suggest adherence to the XDG base directory specification.

The user dictionary should be in /home/USER/.config/palaver

This can be determined in code by using the environment variable $XDG_CONFIG_HOME. If this variable is not set, use the environment variable $USER.

Something along the lines of (in the file recognize):

# Use $XDG_CONFIG_HOME when it is set and non-empty; quote it so paths with spaces work.
if [ -n "$XDG_CONFIG_HOME" ]; then
    _CONFIG_DIR="$XDG_CONFIG_HOME/palaver"
else
    _CONFIG_DIR="/home/$USER/.config/palaver"
fi

Cheers.
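The same lookup sketched in Python, for the parts of Palaver written in that language. The function name `palaver_config_dir` is hypothetical. One deliberate difference from the shell snippet: the XDG base directory specification defines the fallback as `$HOME/.config`, so this sketch uses `$HOME` rather than building the path from `$USER`.

```python
import os


def palaver_config_dir(env=None):
    """Return the per-user Palaver config directory following the XDG convention."""
    env = os.environ if env is None else env
    base = env.get("XDG_CONFIG_HOME")
    if not base:
        # Unset or empty: fall back to $HOME/.config.
        home = env.get("HOME") or os.path.expanduser("~")
        base = os.path.join(home, ".config")
    return os.path.join(base, "palaver")
```

With `HOME=/home/dan` and no `XDG_CONFIG_HOME`, this returns `/home/dan/.config/palaver`.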

Change recording to listening

From this youtube comment:
"Also, one thing regarding the OSD/Notifications UI: Instead of "Recording," it may be better to use "Listening". This may make it clearer that the primary function of the application is to transcribe input and process commands, and not function as an "audio recorder" (although, I have no doubt that a plugin/script could be written to do just that)... Semantics can be important for Privacy advocates."

Just making a note to myself, anyone else can do this too.

References to project inconsistent

Project name is Palaver, but it is referenced as pavaler and palver

/Doc/Dictionaries/guide.txt: 9: located in ~/.palver.d/personal.dic
/Doc/Plugins/Example Plugins/UserInformation (Uses the new SDK)/plugin.info:9:whats my name, Display the users name (as set in .pavaler.d/UserInfo)
/Microphone/osd_server.py:17: with open(home+"/.pavaler.d/UserInfo") as f:
/Recognition/bin/edit_details:3:gedit "$HOME/.pavaler.d/UserInfo"
/Recognition/config/Gmail/settings.conf:2:#This program uses the same email under ~/.pavaler.d/UserInfo

I had issues getting the personal dictionary working until I fixed these and recompiled.
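A small Python sketch for finding and fixing these two misspellings. The pattern covers only the variants listed in this issue, and the helper names are hypothetical.

```python
import re

# Only the two misspellings reported above; "palaver" itself does not match.
MISSPELLED = re.compile(r"pavaler|palver", re.IGNORECASE)


def find_misspellings(lines):
    """Return (line_number, line) pairs for lines containing a misspelled name."""
    return [(n, line) for n, line in enumerate(lines, 1) if MISSPELLED.search(line)]


def fix_line(line):
    """Replace every misspelling with the lowercase project name."""
    return MISSPELLED.sub("palaver", line)
```

`fix_line` always writes lowercase `palaver`, which suits the paths listed above; prose that needs a capitalised "Palaver" would have to be checked by hand.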

Don't overuse the HDD

I'm not sure it is safe for the drive to use the same area over and over. Some devices have a limit (for example, an SSD may have issues if you record 10,000 files on the same physical blocks). That is why some distros designed to run from a pen drive (e.g. Puppy Linux) use a file system that avoids updating a file in the same physical space; they alternate between different places.

I'm not sure whether the same can occur with reads on an HDD, but in any case the hard drive is one of the slowest parts of the computer, and the weakest link sets the overall speed.

As far as I can tell, the program currently works by being launched again every time you press the shortcut key.

There are two solutions for this:
Open once, then listen for the shortcut key on the keyboard
or

Create a virtual file system in RAM and copy the program there at first launch.
The keyboard shortcut should point to the virtual file system instead of the real one (this can be optional).

Using an offline speech recognizer.

This is a feature request

These are the best features of using an offline speech recognizer:

Non-reliance on the internet service.
There will be no necessity to press the "boss" key twice.
The recognition can be a system service
- voice based login
- allow to perform super user level functionalities (shutdown, restart, install, ...)

volume up/down

With the volume up/down command, I find that the number of volume clicks (I guess you would call them clicks) is too high. Is there a way to change how far the volume is turned up/down with that command?

Can't use on Ubuntu 13.10!

When I run it on Ubuntu 13.10, it says

How can I fix it?

is not a recognized command

I installed it just fine via git clone, then mapped the hotkey. But whatever command I try, I get "is not a recognized command", e.g. "open Firefox is not a...

It's as if the dic files were not copied over. Where are they supposed to reside? My /home/dan/.palaver.d/ only contains the file UserInfo.

Upper and lowercase

Hello, I have modified the source code of dictionary in order to be case insensitive when recognizing.
How can I push my modifications? If I can, of course.

open

Performing recognition loop problem

I can't get the latest version to work. When I press the hotkey I can see the tray icon, but nothing happens (I talk and get nothing). Sometimes I can see both the tray icon and the green HUD icon, so I talk and press the hotkey again; it says "Performing recognition" but stays stuck there forever, and I don't know why.
I have an old version (from before the update file was added) and it works well.
I don't know how to fix it :(

Continuous mode

Any update on this issue?

I did some digging around and found the below command. It will start the recording when sound is detected and end it after 0.5 seconds silence. You'll need to change the percentage to fit your mic.

rec example.flac silence 1 0.5 10% 1 0.5 20%

We could issue the command when the continuous mode is enabled. After the command the audio file is created which is sent to Google. Then the file can be deleted and another process can be opened using the same command. Continuing like that until the continuous mode is disabled.
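A rough Python sketch of that loop. It assumes SoX's `rec` is installed; `handle_clip` stands in for whatever sends the file to the recognizer and deletes it afterwards, and all function names and the `clip.flac` path are hypothetical.

```python
import subprocess


def rec_command(path, start_level="10%", stop_level="20%", duration="0.5"):
    """Build the rec command quoted above for the given output file."""
    # Start once sound exceeds start_level; stop after `duration` seconds below stop_level.
    return ["rec", path, "silence", "1", duration, start_level, "1", duration, stop_level]


def continuous_loop(handle_clip, keep_going, run=subprocess.run, path="clip.flac"):
    """Record one utterance at a time until keep_going() returns False."""
    while keep_going():
        run(rec_command(path), check=True)  # blocks until silence ends the clip
        handle_clip(path)                   # e.g. send for recognition, then delete
```

`keep_going` could simply check a flag file or a variable that the "disable continuous mode" command flips.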

adding speech to chatbot

I believe that, rather than showing its responses in the notification area (which only shows the first 2 words of each response), the chatbot should speak the response.

New Google Streaming API

While browsing the Chrome source archive I found an interesting file: https://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/google_streaming_remote_engine.cc

Looks like google is switching to a streaming version (instead of the POST method on https://www.google.com/speech-api/v1/recognize). Basically, you open two HTTP connections - one to https://www.google.com/speech-api/full-duplex/v1/up and one to https://www.google.com/speech-api/full-duplex/v1/down - in order to use duplex capabilities. Audio chunks are streamed to the 'up' connection and the corresponding JSON results are sent back down through the 'down' connection (if I understand it correctly). Also there is some sort of protocol involved (check the comments at the top: https://src.chromium.org/viewvc/chrome/trunk/src/content/browser/speech/google_streaming_remote_engine.h?view=markup). This eliminates the 15 (or so) seconds cap for flac files. Only problem is the requirement of an API key and some sort of request keys.

Just wanted to let you know ..

Add .pyc and .pyo to ".gitignore"

Appropriate Location for User-Submitted Plugins

What is the appropriate location to upload user-submitted plugins? I've seen that a few plugins have been submitted to GitHub and eventually merged into the main code. However, what about those user-submitted plugins that do not get merged? Do you envision some sort of separate user-plugin-repository where users can browse & download plugins? (I'm thinking of Arch Linux's "AUR" as an example)

gnome do and/or synapse

Is there a way you could hook the voice recognition system up to GNOME Do and/or Synapse?

multiplexing multiple sound files for continuous speech

You must have thought of this, but just in case you haven't:
Record sound for up to 15 seconds and stop if quiet for more than 2 seconds (early stop).
As you are recording:
Send second 1,
Send seconds 1-2,
Send seconds 1-3,
...
Send seconds 1-15, or up to the early stop.
While all this is going on, receive results from the recognizer (Google or whatever).
Tally the results and keep the words that occur most often; that is the final output.
Repeat.

It's brute force and uses a lot of bandwidth, but for those times when you have to have it, it might work.
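The tallying step could look something like this Python sketch. It uses a simple position-by-position majority vote, which is one possible reading of "words that are the same the most"; the name `vote` is hypothetical, and real transcripts would probably need a smarter alignment than word position.

```python
from collections import Counter
from itertools import zip_longest


def vote(transcripts):
    """Pick the most common word at each position across all transcripts."""
    columns = zip_longest(*(t.split() for t in transcripts))
    result = []
    for column in columns:
        # Shorter transcripts contribute None for positions they do not reach.
        words = [w for w in column if w is not None]
        result.append(Counter(words).most_common(1)[0][0])
    return " ".join(result)


print(vote(["open", "open fire", "open firefox now", "open firefox now"]))
# open firefox now
```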

Multiple Listening Areas

I can't wait for this program to become more active. It seems like a great step toward a smart house. I have started working with Arduino to switch some lights, adjust climate controls, and control a few appliances. Adding voice commands to the mix makes a total home automation solution. The only part missing is a good text-to-speech engine. However, instead of my having to say which lights I want on, could the speech recognition listen to several inputs (one from each room) and resolve fuzzy commands based on the input? So if I say "lights" in my bedroom, the bedroom lights come on. If I ask for the weather while I am in the kitchen, it reads the weather on an output in the kitchen. Is something like that possible? Also, what about using several mics to increase accuracy?

Dealing with Conflicting Plugins

Should we look at ways to mark certain plugins as being in conflict with one another and force the user installing the plugin to choose which one they would like installed?

As an example, say a user has a plugin installed for Amarok music player. If the user then decided to install a plugin for MPD, obviously many (if not all) of the commands would be in conflict.
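One way the installer could detect such a conflict is to compare the new plugin's commands with those already installed. A Python sketch follows, assuming each plugin can be reduced to a list of command phrases; that data layout and the name `find_conflicts` are hypothetical.

```python
def find_conflicts(installed, candidate_commands):
    """Map each installed plugin to the commands it shares with the candidate."""
    wanted = {c.lower() for c in candidate_commands}
    conflicts = {}
    for name, commands in installed.items():
        # Compare case-insensitively, since spoken commands carry no case.
        overlap = wanted & {c.lower() for c in commands}
        if overlap:
            conflicts[name] = sorted(overlap)
    return conflicts


installed = {"amarok": ["play music", "next track"], "gmail": ["check email"]}
print(find_conflicts(installed, ["Play Music", "next track", "stop"]))
# {'amarok': ['next track', 'play music']}
```

An empty result means the plugin can be installed directly; otherwise the user can be asked which plugin should own the shared commands.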

Missing Folder - .palaver.d

I just installed Palaver and cannot make it work beyond "listening" and "done." The problem seems to be that the recorded voice file is not linking with a dictionary. I saw Troubleshooting in the Wiki, which indicated that a ".palaver.d" folder is needed. There is no such folder in the download, and setup does not create one. "send_speech" and "send_speech.py" seem to use this folder. Where should this folder be placed, and what should be in it?

I should mention that during setup, the error "mkdir: cannot create directory `/root/.palaver.d/config/browser/" appeared in the terminal window, and an error message saying that "Google Chrome can not be run as root" popped up. Could these be part of the problem?

Dictation mode

Dear JamesQ: I read that not all the features previewed in the video demonstration are available in the current version.

Is the dictation mode already working?

I ask because I would like to try the software and be able to provide feedback about its behaviour.

Cheers, and congratulations for starting a wonderful tool!!!

Remove Plugins

I want to remove all the plugins that have worked their way into the master code, using the installDefault I created. I want the system to come preinstalled with 0 plugins. When you run ./setup, it will give you a list of "official, default plugins"; the user can select the ones they want and install them.

I am hoping to get more people using the Repository so I can improve it faster and support more official plugins.