stypox / dicio-android

Dicio assistant app for Android

License: GNU General Public License v3.0

Java 4.97% Ruby 0.47% Shell 0.27% Kotlin 94.29%

assistant assistive-technology personal-assistant personal-assistant-framework voice-assistant dicio-assistant dicio android vosk

dicio-android's Introduction

Dicio assistant

Dicio is a free and open source voice assistant running on Android. It supports many different skills and input/output methods, and it provides both speech and graphical feedback to a question. It uses Vosk for speech to text. It has multilanguage support, and is currently available in these languages: English, French, German, Greek, Italian, Russian, Slovenian and Spanish. Open to contributions :-D

Skills

Currently Dicio answers questions about:

search: looks up information on DuckDuckGo (and in the future more engines) - Search for Dicio
weather: collects weather information from OpenWeatherMap - What's the weather like?
lyrics: shows Genius lyrics for songs - What's the song that goes we will we will rock you?
open: opens an app on your device - Open NewPipe
calculator: evaluates basic calculations - What is four thousand and two times three minus a million divided by three hundred?
telephone: view and call contacts - Call Tom
timer: set, query and cancel timers - Set a timer for five minutes
current time: query current time - What time is it?
navigation: opens the navigation app at the requested position - Take me to New York, fifteenth avenue

Speech to text

Dicio uses Vosk as its speech to text (STT) engine. To be able to run on every phone, it employs small models weighing about 50 MB. The model download starts automatically whenever needed, so the app language can be changed seamlessly.

Contributing

Dicio's code is not only here! The repository with the compiler for sentences language files is at dicio-sentences-compiler, the code taking care of input matching and skill interfaces is at dicio-skill and the number parser and formatter is at dicio-numbers.

When contributing keep in mind that other people may have needs and views different than yours, so please respect them. For any question feel free to contact the project team at @Stypox.

IRC/Matrix room for communication

The #dicio channel on Libera Chat (ircs://irc.libera.chat:6697/dicio) is available to get in touch with the developers. Click here for webchat!
You can also use a Matrix account to join the Dicio channel at #dicio:libera.chat. Some convenient clients, available both for phone and desktop, are listed at that link.

Translating

If you want to translate Dicio to a new language you have to follow these steps:

1. Translate the strings used inside the app via Weblate. If your language isn't already there, add it with tool -> start new translation.
2. Translate the sentences used by Dicio to identify a user's request and to feed it to the correct skill. To do this open the repository root and navigate to app/src/main/sentences/. Copy the en folder (i.e. the one containing the English sentences) and name the new folder with the 2- or 3-letter code of your language (in particular, any ISO-639-compliant language ID is supported). Then open the newly created folder: it should contain some files with the .dslf extension, still in English. Open each of them and translate the English content. Feel free to add or remove sentences if their translation does not fit your language, and remember that the sentences need to identify what the user said as well as possible. Do NOT edit the names of the copied files or the first line in them (i.e. the ID: SPECIFICITY line, like weather: high): they should remain in English. To learn about the Dicio sentences language syntax, please refer to the documentation and the example in dicio-sentences-compiler. Hopefully in the future a custom translation system will be used for sentences.
3. Once both the Weblate and the sentences translations are ready, add the new language to the app's language selector. You can do so by editing this file:
   1. Add the language code to the language code array pref_language_entry_values, respecting alphabetical order. You can find the language code with Weblate: click on a language to translate, and the language code is the last part of the URL. For example, English is https://hosted.weblate.org/projects/dicio-android/strings/en, so the English language code is en.
   2. Add the language name to the language name array pref_language_entries. It must be placed at the same index as the language code. For instance, if en is the 3rd item in the language code array, then its name must be the 3rd item in the language name array, too.
4. Then update the app descriptions so that people know that the language you are adding is supported. The files you should edit are README.md (i.e. the file you are currently viewing) and fastlane/metadata/android/en-US/full_description.txt (the English description for F-Droid).
5. Open a pull request containing the translated sentences files, the language selector addition and the app description updates. You may want to take a look at the pull request that added German, #19, and if you need help don't hesitate to ask :-)
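The two constraints on the language selector arrays (alphabetical order of the codes, and names stored at the same index as their code) can be checked mechanically. The following is only a minimal Java sketch: the class LanguageArraysCheck and its methods are hypothetical and not part of Dicio, and the arrays are hand-copied illustrative values rather than the real contents of pref_language_entry_values and pref_language_entries.

```java
import java.util.Arrays;
import java.util.List;

final class LanguageArraysCheck {
    /** Returns true if every code sorts strictly after the previous one. */
    static boolean isAlphabetical(List<String> codes) {
        for (int i = 1; i < codes.size(); i++) {
            if (codes.get(i - 1).compareTo(codes.get(i)) >= 0) {
                return false;
            }
        }
        return true;
    }

    /** Returns the name stored at the same index as the code, or null if there is none. */
    static String nameFor(String code, List<String> codes, List<String> names) {
        int index = codes.indexOf(code);
        return (index >= 0 && index < names.size()) ? names.get(index) : null;
    }

    public static void main(String[] args) {
        // Illustrative values only (assumption): the real arrays live in the app resources.
        List<String> codes = Arrays.asList("de", "el", "en", "es", "fr", "it", "ru", "sl");
        List<String> names = Arrays.asList("German", "Greek", "English", "Spanish",
                "French", "Italian", "Russian", "Slovenian");

        System.out.println("codes sorted: " + isAlphabetical(codes));
        System.out.println("name for en: " + nameFor("en", codes, names));
    }
}
```

With these values en is the 3rd code and English the 3rd name, which mirrors the example given in step 3.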

Adding skills

A skill is a component that enables the assistant to understand some specific queries and act accordingly. While reading the instructions, keep in mind the skill structure description on the dicio-skill repo, the javadocs of the methods being implemented and the code of the already implemented skills. In order to add a skill to Dicio you have to follow the steps below, where SKILL_ID is the computer readable name of the skill (e.g. weather).

1. Sentences

Create a file named SKILL_ID.dslf (e.g. weather.dslf) under app/src/main/sentences/en/: it will contain the sentences the skill should recognize.

1. Add a section to the file by putting SKILL_ID: SPECIFICITY (e.g. weather: high) on the first line, where SPECIFICITY can be high, medium or low. Choose the specificity wisely: for example, a section that matches queries about phone calls is very specific, while one that matches every question about famous people has a lower specificity.
2. Fill the rest of the file with sentences according to the dicio-sentences-language's syntax.
3. [Optional] If you need to, you can add other sections by adding another SECTION_NAME: SPECIFICITY to the same file (check out the calculator skill for why that could be useful). For style reasons, always prefix the section name with SKILL_ID_ (e.g. calculator_operators).
4. [Optional] Note that you may choose not to use the standard recognizer; in that case create a class in the skill package overriding InputRecognizer. If you do so, replace any reference to StandardRecognizer with your recognizer and any reference to StandardResult with the result type of your recognizer, while reading the steps below.
5. Try to build the app: if it succeeds you did everything right, otherwise you will get errors pointing to syntax errors in the .dslf file.

2. Subpackage

Create a subpackage that will contain all of the classes you are about to add: org.stypox.dicio.skills.SKILL_ID (e.g. org.stypox.dicio.skills.weather).

3. Output generator

Create a class named SKILL_IDOutput (e.g. WeatherOutput): it will contain the code that talks, displays information or does actions. It will not contain code that fetches data from the internet or does calculations.

1. Create a nested class named Data and add to it some public fields representing the input to the output generator, i.e. all of the data needed to provide an output.
2. Have the class implement OutputGenerator<Data> (e.g. WeatherOutput implements OutputGenerator<WeatherOutput.Data>).
3. Override the generate() method and implement the output behaviour of the skill. In particular, use SpeechOutputDevice for speech output and GraphicalOutputDevice for graphical output.
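The shape of an output generator can be sketched as follows. This is a minimal, self-contained Java illustration, not the real dicio-skill API: the OutputGenerator, SpeechOutputDevice and GraphicalOutputDevice interfaces below are simplified stand-ins written for this example (the real generate() signature and device methods may differ), and the fields of Data are hypothetical.

```java
import java.util.Locale;

final class WeatherOutputSketch {
    // Simplified stand-ins for the dicio-skill types. Their shapes are
    // assumptions made for this sketch, not the real API.
    interface SpeechOutputDevice { void speak(String text); }
    interface GraphicalOutputDevice { void display(String text); }
    interface OutputGenerator<T> {
        void generate(T data, SpeechOutputDevice speech, GraphicalOutputDevice graphics);
    }

    static class WeatherOutput implements OutputGenerator<WeatherOutput.Data> {
        // Everything the generator needs in order to produce an output.
        static class Data {
            public String city;
            public String description;
            public double temperatureCelsius;
        }

        @Override
        public void generate(Data data, SpeechOutputDevice speech, GraphicalOutputDevice graphics) {
            // Only talks and displays: no network requests or calculations here.
            String message = String.format(Locale.ROOT,
                    "Currently in %s: %s, %.1f degrees Celsius",
                    data.city, data.description, data.temperatureCelsius);
            speech.speak(message);
            graphics.display(message);
        }
    }

    public static void main(String[] args) {
        WeatherOutput.Data data = new WeatherOutput.Data();
        data.city = "Rome";
        data.description = "clear sky";
        data.temperatureCelsius = 21.5;

        // Console-backed devices, just to make the sketch runnable.
        new WeatherOutput().generate(data,
                text -> System.out.println("[speech] " + text),
                text -> System.out.println("[screen] " + text));
    }
}
```

Keeping Data as a plain holder of public fields means the generator never needs to know where the values came from, which is what lets the fetching code live elsewhere.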

4. Intermediate processor

Create a class named PROCESSOR_NAMEProcessor (e.g. OpenWeatherMapProcessor): it will contain the code needed to turn the recognized data into data ready to be outputted. Note that the name of the class is not based on the skill id but on what is actually being done.

1. Have the class implement IntermediateProcessor<StandardResult, SKILL_IDOutput.Data> (e.g. OpenWeatherMapProcessor implements IntermediateProcessor<StandardResult, WeatherOutput.Data>). StandardResult is the input data for the processor, generated by StandardRecognizer after having understood a user's sentence; SKILL_IDOutput.Data, from 3.1, is the output data from the processor to feed to the OutputGenerator.
2. Override the process() method and put there any code making network requests or calculations, then return data ready to be outputted. For example, the weather skill gets the weather information for the city you asked for.
3. [Optional] There could be more than one processor for the same skill: you can chain them or use different ones based on some conditions (see 5.3.3). The search skill, for example, allows the user to choose the search engine, and has a different processor for each engine.
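A processor along these lines could look like the sketch below. It is a minimal Java illustration under stated assumptions: StandardResult, IntermediateProcessor and WeatherData (standing in for WeatherOutput.Data) are simplified stand-ins defined here, the capturing group name "where" is hypothetical, and the network request is replaced by a hard-coded table so that the example runs offline.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class ProcessorSketch {
    // Stand-in for the recognizer result (assumed shape, not the real class).
    static class StandardResult {
        private final Map<String, String> capturingGroups = new HashMap<>();

        StandardResult withGroup(String name, String value) {
            capturingGroups.put(name, value);
            return this;
        }

        String getCapturingGroup(String name) {
            return capturingGroups.get(name);
        }
    }

    // Stand-in for WeatherOutput.Data.
    static class WeatherData {
        public String city;
        public String description;
    }

    // Stand-in for the dicio-skill interface (assumed shape).
    interface IntermediateProcessor<FromType, ResultType> {
        ResultType process(FromType data);
    }

    static class OpenWeatherMapProcessor
            implements IntermediateProcessor<StandardResult, WeatherData> {
        // Fake data replacing the real network request (assumption for this sketch).
        private static final Map<String, String> FAKE_FORECASTS = new HashMap<>();
        static {
            FAKE_FORECASTS.put("rome", "clear sky");
        }

        @Override
        public WeatherData process(StandardResult data) {
            WeatherData result = new WeatherData();
            result.city = data.getCapturingGroup("where");
            String key = result.city == null ? "" : result.city.toLowerCase(Locale.ROOT);
            // A real processor would query the weather service here.
            result.description = FAKE_FORECASTS.getOrDefault(key, "unknown");
            return result;
        }
    }

    public static void main(String[] args) {
        StandardResult recognized = new StandardResult().withGroup("where", "Rome");
        WeatherData data = new OpenWeatherMapProcessor().process(recognized);
        System.out.println(data.city + ": " + data.description);
    }
}
```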

5. Skill info

Create a class named SKILL_IDInfo (e.g. WeatherInfo) extending SkillInfo: it will contain all of the information needed to manage your skill.

1. Create a constructor taking no arguments and initialize super with the skill id (e.g. "weather"), a human readable name, a description, an icon (add Android resources for these last three) and finally whether the skill will have some tunable settings (more on this at point 5.4).
2. Override the isAvailable() method and return whether the skill can be used under the circumstances the user is in (e.g. check whether the recognizer sentences are translated into the user language with isSectionAvailable(SECTION_NAME) (see 1.1) or check whether context.getNumberParserFormatter() != null, if your skill uses number parsing and formatting).
3. Override the build() method. This is the core method of SkillInfo, as it actually builds a skill. You shall use ChainSkill.Builder() to achieve that: it will create a skill that recognizes input, then passes the recognized input to the intermediate processor(s) which in turn provide the output generator with something to output.
   1. Add .recognize(new StandardRecognizer(getSection(SectionsGenerated.SECTION_NAME))) as the first function. SECTION_NAME is SKILL_ID, if you followed the naming scheme from 1.1, e.g. SectionsGenerated.weather.
   2. Add .process(new PROCESSOR_NAMEProcessor()): add the processor you built at step 4, e.g. new OpenWeatherMapProcessor().
   3. [Optional] Implement here any condition on processors: for example, query settings to choose the service the user wants, etc. If you wish, you can chain multiple processors together; just make sure the output/input types of consecutive processors match. For an example of this check out the search skill, which uses the search engine chosen by the user.
   4. At the end add `.output
4. [Optional] If your skill wants to present some preferences to the user, it has to do so by overriding getPreferenceFragment() (return null otherwise). Create a nested class of SKILL_IDInfo named Preferences extending PreferenceFragmentCompat (Android requires you not to use anonymous classes) and override onCreatePreferences() as you would normally do. getPreferenceFragment() should then return new Preferences(). Make sure the hasPreferences parameter you use in the constructor (see 5.1) reflects whether there are preferences or not.
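The recognize, process and output chain described in step 5.3 can be pictured with a small type-safe builder. The sketch below is not the real ChainSkill.Builder: the Builder and Skill types are simplified stand-ins written for this example, the recognizer is a plain function on the input string, and the argument of the final output step is an assumption. Its purpose is only to show why the output type of each stage has to match the input type of the next one.

```java
import java.util.function.Consumer;
import java.util.function.Function;

final class ChainSketch {
    /** The finished skill: takes the raw user input and produces output. */
    interface Skill {
        void handle(String input);
    }

    /** Stand-in builder; T is the type produced by the stages added so far. */
    static final class Builder<T> {
        private final Function<String, T> stages;

        private Builder(Function<String, T> stages) {
            this.stages = stages;
        }

        /** First stage: turns the user input into recognized data. */
        static <R> Builder<R> recognize(Function<String, R> recognizer) {
            return new Builder<>(recognizer);
        }

        /** Appends a processor; it must accept what the previous stage produced. */
        <R> Builder<R> process(Function<T, R> processor) {
            return new Builder<>(stages.andThen(processor));
        }

        /** Last stage: hands the processed data to the output generator. */
        Skill output(Consumer<T> generator) {
            return input -> generator.accept(stages.apply(input));
        }
    }

    public static void main(String[] args) {
        Skill skill = Builder
                .recognize((String input) -> input.replace("weather in ", ""))
                .process(city -> "Forecast for " + city)
                .output(text -> System.out.println(text));

        skill.handle("weather in rome");
    }
}
```

Because each call to process() returns a builder parameterized by the new result type, chaining a processor whose input type does not match is rejected by the compiler instead of failing at runtime.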

6. List skill for SkillHandler

Under org.stypox.dicio.Skills.SkillHandler, update the SKILL_INFO_LIST by adding add(new SKILL_IDInfo()); this will make your skill visible to Dicio.

Notes

skillContext is provided in many places and can be used to access resources and services, similarly to Android's context.
If your input recognizer, processor or output generator use some resources that need to be cleaned up in order not to create memory leaks, make sure to override the cleanup() method.
If the skill doesn't do any processing (e.g. it may just answer with random quotes from famous people after a request for quotes by the user) you may skip step 4 above. Also skip 3.1 in that case, and have SKILL_IDOutput implement OutputGenerator<StandardResult>.
The names used for things (files, classes, packages, sections, etc.) are not mandatory, but they help avoid confusion, so try to stick to them.
When committing changes about a skill, prefix the commit message with "[SKILL_ID]", e.g. "[Weather] Fix crash".
Add your skill with a short description and an example in the README under Skills and in the fastlane's long description.
If you have any question, don't hesitate to ask. 😃


dicio-android's Issues

Search results can't be opened

I only get a "View in..." Menu with "No personal apps can open this content".

The code seems to do something complicated in ShareUtils.java. I have LineageOS based on Android 11, no Google Apps, and NewPipe doesn't have issues opening in a browser.

Skill marketplace

Dicio should be able to download skills from a marketplace where users can upload their own skills, whose development would be separated from Dicio, such as Mycroft Marketplace. A skill would be packaged as a compiled java file (i.e. a jar file) and then loaded at runtime by Dicio, to ensure the best performance. An alternative to this would be to have users install an app for each skill, and then sending Android intents around to communicate with skills: this is the approach taken by Athena, but I don't think it would work out well for Dicio since skills would then be unable to natively show graphical output to the user.
I will not focus on creating a skill marketplace right now, since Dicio is at a pretty early stage and still requires a lot of work. When (if) it becomes more popular and more people start creating skills, a skill marketplace will become important. I opened this issue to illustrate my plans for the future ;-)

Calculator skill unavailable in Spanish despite available translation

At least from what I can see, there's a Spanish translation for the calculator skill available, but despite that the app shows it as unavailable in the settings. I could help to translate it if needed, but it seems to be slightly different from the English file, so I don't know exactly what to do.

Not able to download VOSK model

When clicking the microphone icon, I get a toast saying that the Vosk model is being downloaded, but then the progress indicator keeps going round and round and nothing ever happens. I tried it multiple times.
Help, please.

Add usage instructions somewhere in the app

How to set the app as the system assistant (i.e. that triggers when long pressing the circle button)
How to setup a TTS if the on-device one does not work, with a button that leads to the relevant system settings screen
& more

Implement error activity

The NewPipe ErrorActivity and related files should be integrated into Dicio so that errors can be reported easily.

Also, the current default way to display errors generated by a skill being evaluated is to add a new view with the full stack trace to the output screen. This should be replaced with just an error message and a button allowing the user to open the error activity.

null object

Steps to reproduce
1. Say 'search for' to Dicio
2. You get an error message
Expected behavior

'I did not understand, could you repeat?' as the answer or, even better:
ask what you're searching for

Weather tomorrow

I typed this command but it searched for a city with the name tomorrow.

It should use the current city and, if possible, detect a time/date before looking for a city name.

feature request: start the Dicio listening service in background via intent

From a comment in #64, I'd like to be able to start the listening service in the background via an intent. Personally I'd send it via Termux's implementation of am and/or a KeyMapper action.

Timer and clock skills should use default timer instead of Dicio internal timer

Currently
When setting a timer, Dicio internal timer is used.

Expected
Use default timer instead. That will allow for the best integration and UX allowing to then use notifications and system integration like button presses to stop the timer. Currently Dicio alarm cannot be stopped (at least in my limited testing or only by closing Dicio, which is rather inconvenient).

Some ideas for skills

I have some ideas for skills:

You can add a hello skill (i.e. I say: Hello Dicio, Dicio says: Hi, may I help you?)
You can add tell me a joke skill.
You can add a type a message skill. You need this skill when you are in a car.
~~You can add wake up phrase.~~
~~You can add alarm skill.~~
~~You can add translator skill.~~
You can add what is my name skill.
You can add shopping list skill.
~~And many many other skills.~~

These are my ideas, you can add also your ideas.

Android voice input

On my Lineage OS 18.1 I can set a default assistant app. Once that's set, I would like to set Dicio as voice input, so when I trigger the default assistant by an external source (like my bluetooth headset by pressing a button), it opens and starts listening. Just like I do by mapping a long swipe (it opens the Dicio app already listening).

Thanks in advance!

[Bug] Unable to find difference between "to" and "two"

I don't know if this is an issue you can really fix, but when trying to use the calculator, it is unable to tell the difference between "to" and "two" by voice in English.

Maybe (not sure though) for now all calculations that need numbers could be required to start with the phrase "calculate". After that, numbers and operation phrases like "divided" or "by" would be accepted. This way "two" would be prioritised when needed.

Love the project, hope it continues!

[Weather] fahrenheit

For us cavemen plzkthxbai

Language translation skill

Lingva.ml is an open source front-end for Google Translate that contains no Google trackers or trackers of any other kind.
It is already deployed and is continuously updated.
It would be great if you integrated it into Dicio, as a web view or in any other way.

Ask me anything.

I type translate "something" to <language_name>.
And it shows the output produced by lingva.ml.

Also, the app opener is really annoying and buggy. It really pisses me off that whenever I ask it to open an app, it opens something else.
Ex: I activate Dicio through an edge swipe action, as I cannot start it with my voice. I ask it to 'open dialer' and it opens 'camera'! Yes, the dialer app's name is 'Phone'. And I find it hard to say 'open Phone' as I'm already using the phone.

It would be really helpful if you made it recognise all the keywords.
Or tell me what needs to be learnt: I'll learn it, try to implement that, and send a pull request. :D

I translated the .dslf files into Spanish

Hello everyone, I have translated lyrics, search, open, calculator and weather into Spanish. My English is basic, but with my basic knowledge and a translator I was able to translate them. I wanted to upload the translation but it won't let me. How could I contribute it?

Add checkstyle

Similarly to NewPipe, some Java style should be enforced so that PRs follow a consistent style. NewPipe uses checkstyle for this purpose, and it seems to work well.

Feature request: Tasker plugin

It would be great to have a Tasker plugin that can recognise text and pass it to a Tasker variable.
With such a plugin, a lot of people would be able to write Tasker scripts with offline voice recognition without needing to recompile Dicio.

Tasker: https://tasker.joaoapps.com
Yes, it's not opensource or even free, but it's quite popular utility for android automation.

Calculator Functions

I can see that there is some basic mathematics comprehension integrated, but what are the limits?

For example, could you throw algebraic formulae and expect an answer?

So far, I tried throwing intermediate multiplication (powers, worked fine) and division (square root, was ignored or misunderstood), however I would like to know what the limits are, so as to be able to push them farther.

Also, whenever I speak, "2" and "4" are recognized as "to" and "for", respectively.

Problems with translation.

I've tried to translate but it won't let me, it says something about the repositories.
https://hosted.weblate.org/projects/dicio-android/strings/#translations

Add "retry" button to error messages

Currently when there is an error and a skill fails to do its job, there is no way to retry the same action. The only workaround is to tap on the last input, add a character somewhere, and then press Enter so that it is sent again.

This should be changed by keeping track of the last input (which could even be a list of inputs, if the user is in a conversation with multiple prompts, e.g. the telephone). Then the user should be able to retry the last action either by saying "retry" or by tapping retry on the error message.

Feature Request: Search music apps such as Spotify

It would be nice to be able to say something such as "Play Hello by Adele" and have it search Spotify for a track with that name.

application does not start

the application does not start and has an error

Downloading vosk model failed

I downloaded this app recently, and when I press the middle download button I get the following message:
Failed downloading vosk model

device language: english
App version: 0.5

New Skills

Some other skills which would be easy to implement are:

Unit conversions (e.g. feet to cm)
Time Zone conversions
Currency conversions
Setting Alarms

[Feature Request]: Take dictation

It would be good if Dicio could take dictation, composing and modifying text.

Publish on F-Droid

~~Publishing on F-Droid requires alphacep/vosk-api#558 to be solved, since the repository maven { url 'https://alphacephei.com/maven/' } is not allowed in F-Droid.~~ alphacep/vosk-api#558 was solved, waiting for approval by F-Droid: https://gitlab.com/fdroid/fdroiddata/-/merge_requests/9657

Spanish sentences

Hi, I downloaded Dicio and noticed that some of the skills are not available in Spanish. I can see that some .dslf files are not present in the Spanish directory and I would like to contribute those translations.

Is there a specific requirement that the sentences themselves should meet (to ensure that the engine will properly recognize them) or is it enough to just write what I consider the proper translation for a given skill? Sorry if this question seems a bit dumb, I haven't contributed to any STT-related project before and I want to do the best I can.

The project looks amazing so far btw, gratz!

[Add VOSK Model] Please add Indian English Voice Model

The Indian English voice model from Vosk detects my speech with a 90% accuracy rate, while the normal English model detects it with 30-40% accuracy.

Adding the Indian-English speech model should be relatively straightforward.

Thanks :)

Use Dicio as system STT / voice recognition service

It is not an urgent thing,
but I think it would be very nice to be able to set the system STT in the Dicio's input mode as well.
There are many PoC projects to create FOSS STTs, some based on vosk, some on Mozilla Deepspeech or other.
At present none are really functional, but when they will be ready, I find it useless to download and save in two separate places the same vosk models for example.
I repeat that there is no hurry, but I think it should be done sooner or later.

Question/Feature request dump

Hello,

Here's a list of things I thought of while testing the app. If there was a gitter/discord/other platform for this project I would've reached out there before writing this big post. I can create separate issues for feature requests that prove viable:

Which vosk model is dicio actually downloading from https://alphacephei.com/vosk/models ? I'm just curious because there are dozens of models listed under different licenses and I wanted to know which dicio is using
Feature request: Passive listening with a wake-up word. How would you feel about Dicio always using the microphone but only reacting after a keyphrase such as "Hey Dicio"?
What's the long-term plan for Dicio's skills if Dicio becomes massively popular and receives many contributions to dicio-skills? I'm not sure how much space skills use currently, but not everyone will want every skill. This sort of leads into my next feature request:
Feature request: 3rd party app integration skills. There's a few automation related apps I think dicio could integrate with nicely. They could reduce work needed on some dicio skills because they implement different things related to device management. Here's a list of apps and their docs on what could be used by dicio:

Termux - arbitrary commands in a terminal emulator for android. Docs on its RUN_COMMAND intent

KeyMapper - trigger keymaps from other apps

Easer - I can't find good docs for this particular one, but it has a Receive Broadcast event condition in which users can specify received actions and categories. It can then be used to trigger what Easer calls "profiles", which are sets of actions such as toggling WiFi/Bluetooth, starting services of other apps, and sending other broadcasts.

Sorry again for the long post.

Custom translation system for sentences files

Problems

Currently sentences files have to be translated by copying the *.dslf files from the en folder to the new language folder, and then translating them. This is cumbersome since:

GitHub pull requests are not the best tool to handle translations, since even a small update requires a branch, a dev approval, a merge, ... (that's why Weblate is used for app strings)
The sentence and capturing group ids and the section specificity should not be translated, but there is nothing making sure that this does not happen (not even when building the app!)
Syntax errors are only found out about when building the app, but we can't expect translators to be able to build the app by themselves (this would be partially solved by adding a github action to pull requests that reports build errors, though the feedback would still not be instant)
Translating a dslf file is more difficult than just translating a string, since you have to think about all possible ways to say something. It is easy to miss a sentence and only notice it when testing out the app (and again, we can't expect translators to build the app by themselves)

Custom translation system

Adding the dslf files to Weblate is doable, though that would only solve point 1, since it would not ensure that there are no syntax/semantic errors. I don't think Weblate has any way to add custom plugins that would do that.

Therefore I think the way to go would be to create a custom translation system with these features (italic means "difficult to implement"/"maybe later"):

git integration like Weblate (but more basic)
a view that shows the file currently being translated, with syntax highlighting
a view that shows syntax errors, highlighting them in the code
a view that shows other errors, such as mismatched sentence ids, capturing group ids or section specificity
a view with suggestions on how to improve a sentence
a text field that allows inserting a user input and then shows whether it matches or not, the matched capturing groups and which sentence actually matched
once a skill marketplace is set up, third-party skill developers should be able to get their skill sentences hosted on the same service

Crash when timer isn't visible (Dicio 0.6)

Hey there,

Thanks for Dicio 0.6! I'm glad to have a timer feature, which is the main feature I use in voice assistants :)

Sadly, the timer feature is quite easy to break. If you set a timer, click settings and then click back the app's state resets and a crash will happen on the next TTS update:

FATAL EXCEPTION: main
Process: org.dicio.dicio_android, PID: 24891
java.lang.NullPointerException: Attempt to invoke virtual method 'int android.speech.tts.TextToSpeech.speak(java.lang.CharSequence, int, android.os.Bundle, java.lang.String)' on a null object reference
 at org.dicio.dicio_android.output.speech.AndroidTtsSpeechDevice.speak(AndroidTtsSpeechDevice.java:71)
 at org.dicio.dicio_android.skills.timer.TimerOutput.lambda$setTimer$0$TimerOutput(TimerOutput.java:243)
 at org.dicio.dicio_android.skills.timer.-$$Lambda$TimerOutput$JVQKXD3e0_QZmKL-N9F6BPYovhc.accept(Unknown Source:14)
 at org.dicio.dicio_android.skills.timer.TimerOutput$SetTimer$1.onTick(TimerOutput.java:74)
 at android.os.CountDownTimer$1.handleMessage(CountDownTimer.java:130)
 at android.os.Handler.dispatchMessage(Handler.java:106)
 at android.os.Looper.loop(Looper.java:223)
 at android.app.ActivityThread.main(ActivityThread.java:7664)
 at java.lang.reflect.Method.invoke(Native Method)
 at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:592)
 at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:947)

95a4dd62-7402-4bb0-af23-a493d43a1048.mp4

(Please ignore the Dicio notification, this comes from Scoop which I personally find the easiest way to collect stack traces, it doesn't come from your app)

Request: timer skill

I would love to see a timer skill 🙂
As I am not into Java I can sadly not contribute by myself.

Feature request: Custom Synonyms for App names and Places

Many non-English models have problems with opening apps that have an English name, because the name does not appear in the Vosk dictionary. Also, a lot of places and cities are not recognized correctly by the models. A relatively easy workaround would be a feature that makes it possible to define synonyms for app names. This could also be useful for English speakers who have problems remembering some app names, or for unknown names that are not part of the Vosk dictionary.

Some examples:

Semantic synonyms: WhatsApp -> Text Messenger
Adding wrong transcriptions that often happen: WhatsApp -> what's app, Wort Sepp (German),...
Switching the Alphabet: WhatsApp -> WхатсАпп

I don't think that a predefined list of Synonyms should be part of the App, just a list that every user can define for themselves.

What do you think?
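A per-user synonym list like the one proposed could look roughly like this. It is only a minimal Java sketch of the idea: the class AppSynonyms and its methods are hypothetical and not part of Dicio, and the lookup is a plain case-insensitive exact match on the recognized text.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class AppSynonyms {
    // Maps a normalized user-defined synonym to the real app name.
    private final Map<String, String> synonymToApp = new HashMap<>();

    /** Registers a synonym (or a common wrong transcription) for an app. */
    void addSynonym(String synonym, String appName) {
        synonymToApp.put(normalize(synonym), appName);
    }

    /** Returns the app name for a known synonym, or the spoken name unchanged. */
    String resolve(String spokenName) {
        return synonymToApp.getOrDefault(normalize(spokenName), spokenName);
    }

    private static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        AppSynonyms synonyms = new AppSynonyms();
        synonyms.addSynonym("Text Messenger", "WhatsApp");
        synonyms.addSynonym("what's app", "WhatsApp");
        synonyms.addSynonym("Wort Sepp", "WhatsApp");

        System.out.println(synonyms.resolve("wort sepp"));
        System.out.println(synonyms.resolve("NewPipe"));
    }
}
```

The resolved name would then be handed to whatever code already looks up installed apps, so names without a synonym behave exactly as before.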

Possibility to train city names

It would be great if there would be possibility to train the recognition of cities.

I'm unable to get the model to recognize Baar or Zug. 🙄
And I'm sure this will be the case for other places as well.

Or is there any trick?

Launching Apps

To test this feature (and joke around a bit), I tried telling Dicio to open itself (by saying "Open Dicio").

It took three tries, which I would probably attribute to my accent, however this issue is about the hilarious results of the first two tries (the last try brought up a toast about opening Dicio, though nothing else happened, good job on that!).

First try: Dicio interpreted my command as "Open CEO", and for some odd reason launched ProtonMail.

Second try: Dicio interpreted my command as "Open D C O if", and oddly enough launched Aegis.

I can understand my accent not going through, but not operations that are completely unrelated to the interpreted results.
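
How Dicio picks the app to launch is not described in this issue. Assuming, purely for illustration, that it picks the installed app whose label is closest to the recognized text, one way to avoid unrelated launches would be to reject matches that are too far away. The sketch below uses Levenshtein distance; `bestAppMatch` and the 0.4 cutoff are hypothetical choices, not Dicio's actual behavior.

```kotlin
// Classic dynamic-programming Levenshtein (edit) distance between two strings.
fun levenshtein(a: String, b: String): Int {
    var previous = IntArray(b.length + 1) { it }
    for (i in 1..a.length) {
        val current = IntArray(b.length + 1)
        current[0] = i
        for (j in 1..b.length) {
            val substitution = previous[j - 1] + if (a[i - 1] == b[j - 1]) 0 else 1
            current[j] = minOf(previous[j] + 1, current[j - 1] + 1, substitution)
        }
        previous = current
    }
    return previous[b.length]
}

// Hypothetical helper: returns the closest app label, or null when even the
// closest label differs too much from what was heard (assumed cutoff: 40%).
fun bestAppMatch(
    spoken: String,
    appLabels: List<String>,
    maxRelativeDistance: Double = 0.4
): String? {
    val query = spoken.lowercase()
    val best = appLabels.minByOrNull { levenshtein(query, it.lowercase()) } ?: return null
    val distance = levenshtein(query, best.lowercase())
    val longest = maxOf(query.length, best.length)
    return if (longest > 0 && distance.toDouble() / longest <= maxRelativeDistance) best else null
}
```

With the labels `ProtonMail`, `Aegis` and `Dicio`, the input "CEO" is rejected (`null`) under this cutoff instead of launching an unrelated app.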

Dynamically define trigger sentences for skills

Hello,

thanks for the latest update, which introduces the telephone skill! :)

I want to implement a hands-free workflow to call a person using only my Bluetooth headset. It is almost complete, but unfortunately the microphone recording quality over the headset is bad. It can't recognize the word "call" when I speak it (most of the time I get "oh" instead).

Could you make it possible to additionally define synonyms for the trigger words?

Thanks for your efforts.

Not an issue, but can't figure out how to contact otherwise

I've been working on a really similar project, but I already have STT, TTS, and NLU working on-device (without network access). I've also created a plug-in system that doesn't require a user to fork or add skills to the existing project to run them. Would you like to work together to merge the projects, so we can bring a FOSS app to the community?

The primary project is https://github.com/Tadashi-Hikari/Sapphire-Assistant-Framework, but the most active version can be found at https://github.com/Tadashi-Hikari/Athena

Implement About screen

Something like this but in a fragment of its own

Ends of words

Does the Dicio sentence syntax allow different endings for a word? I've found two issues with it; I can work around the first of them, but not the second.

Example from the 'open' skill in Russian. It's possible to say "запусти" (run), "запустите" (polite form of "run") or "запустить" (to run). I found that I can use a line like
запусти|запустите|запустить .what.
But it would be more convenient to use something like
'запусти(те|ть)?'
Is that possible?
The second issue is harder to avoid: word endings in capture groups. Example with the 'weather' skill. I try to ask for the weather in Moscow (Москва):
какая погода в Москве?
OpenWeatherMap does not know a city 'Москве'; it only knows 'Москва'.
So, is it possible to set up word forms in capture groups?

I suppose the second issue will cause even worse trouble in a unit converter skill, because all units have different singular and plural forms.
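
To make the two requests concrete, here is a small sketch in plain Kotlin. It uses an ordinary regular expression, which is not the Dicio sentence syntax, and a hand-written lookup table; `isRunCommand`, `cityBaseForms` and `toBaseForm` are hypothetical names used only for illustration.

```kotlin
// First issue: one pattern with an optional ending instead of listing every form.
// This is a plain Kotlin regex, NOT Dicio sentence syntax.
val runVerb = Regex("запусти(те|ть)?")

// True for "запусти", "запустите" and "запустить" (the whole word must match).
fun isRunCommand(word: String): Boolean = runVerb.matches(word.lowercase())

// Second issue: map an inflected captured word back to its base form.
// Hypothetical, hand-written table; a real solution would need a proper
// morphological dictionary or stemmer for each language.
val cityBaseForms = mapOf("москве" to "Москва")

// Returns the base form if known, otherwise the captured text unchanged.
fun toBaseForm(captured: String): String =
    cityBaseForms[captured.lowercase()] ?: captured
```

Here `toBaseForm("Москве")` yields `"Москва"`, which is the form the weather service expects, while unknown words pass through untouched.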

Other languages. Where to start?

Hello.
I would like to translate the current skills into Russian. How can I do it? I see that Vosk has a Russian voice model, so how do I attach it to Dicio?

Create a matrix room for the project

Skill translation

Hello, I would like to participate in the translation of skills. I have already understood more or less how it works, but I don't know if there is any need to compile anything.
A little help would be nice

small issue with the calculator function.

As you can see, "what's" is not seen as the same as "what is"

Also, it generally struggled with my accent, but that is not really something you guys can fix.

Other than that, really cool! Can't wait to see more.

skill request: broadcast user defined intent at runtime

I'd like to integrate Dicio into my personal automation system which primarily uses KeyMapper and Termux.

KeyMapper can receive intents that contain a UUID, and I want Dicio to send them. As the UUIDs are randomly generated at runtime, I'd need a text box in Dicio to copy the UUIDs into, rather than hard-coding them. Additionally, #62 would probably be a prerequisite for this skill, as it would require having one or more pronounceable names for the intents.

Running commands in Termux would require requesting a permission and a few other steps.

Additionally, both KeyMapper and Termux (through am) can send service, activity, and broadcast intents. Is it possible to start Dicio's "listening" in the background (i.e. without starting the main Dicio activity)?

Thank you

Add [Feature request]: Floating mic button/Improving car usage

Hello,
First of all, congratulations on your great work! I have spent a huge amount of time looking for a free and open source voice assistant. Awesome project, thank you!!

I know it's at an early stage, but I'd like to suggest (no ETA needed) implementing, in the future, something that could help when using the phone in the car, like a floating mic button, or quick opening of navigation or music apps. The open function already works well, and I see that it is possible to implement some skills by building the app.

Thanks and congratulations again

Icon Dicio

I think Dicio's icon is way too simplistic (no offense to whoever made it). Dicio is a voice assistant, so its icon should inspire usability.
I tried to make an icon that would better fit the project. I started with the idea that when people see the icon, they should know right away that it's a voice assistant, or at least that it has something to do with voice.
Please tell me what you think about it and if it fits, you can use it as you want.

Wake up word / wakeword recognition

All assistants have a wake up word (e.g. "Hey Google"), so Dicio should have it, too. This should be doable with a service running in the background with Vosk that keeps listening. Athena already has this feature (video), we might take inspiration from it. The wake word recognizer should obviously be easy to enable/disable in settings, and should probably be implemented with a foreground service, so that newer Android versions do not force close it after a while.
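
As a rough illustration of the recognition step only, the sketch below checks a transcript for a wake phrase and returns the command that follows it. The background service, audio capture and Vosk integration are deliberately left out, and both the function name and the default phrase "hey dicio" are assumptions made up for this example.

```kotlin
// Hypothetical helper, not part of Dicio: looks for the wake phrase in a
// transcript and returns the words that follow it.
// Returns null when the wake phrase is absent, and an empty string when
// nothing was said after it.
fun commandAfterWakeWord(transcript: String, wakePhrase: String = "hey dicio"): String? {
    val whitespace = Regex("\\s+")
    val words = transcript.lowercase().split(whitespace).filter { it.isNotEmpty() }
    val wake = wakePhrase.lowercase().split(whitespace).filter { it.isNotEmpty() }
    if (wake.isEmpty() || words.size < wake.size) return null

    // Slide over the transcript and compare word by word with the wake phrase.
    for (start in 0..words.size - wake.size) {
        if (words.subList(start, start + wake.size) == wake) {
            return words.drop(start + wake.size).joinToString(" ")
        }
    }
    return null
}
```

A listening service could feed each partial transcript to such a function and hand the returned command to the normal skill matching only when the result is not `null`.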

Popup instead of fullscreen when triggered from system buttons

It would be nice if, when the application is opened through the shortcuts, only a pop-up appeared (with the microphone, for example) instead of the whole application. I think it would give an impression of speed and of a more discreet assistant that activates in a corner and waits for instructions.