hear's People

Contributors

adisidev, mryakobo, sveinbjornt

hear's Issues

iTerm 2 zsh: abort

Using iTerm 2 with zsh, hear immediately aborts:

hear -d -i example.m4a > output.txt
zsh: abort      hear -d -i example.m4a > output.txt

iTerm 2, Build 3.4.19
MacBook Pro, M1 Pro
Ventura 13.3.1

Using Terminal Version 2.13, this issue does not occur.

Usage of the -l flag

I cannot for the life of me get the -l flag to work.

Running an Intel build on macOS 12.4, any call to a language other than the default results in:

No file at path en-GB

or en-IE, sk-SK, etc. I assume that either it can't find the correct path to a language pack, or the OS somehow needs to be told to download that specific language pack?

Dictation in other applications (e.g. TextEdit) works as expected in the language selected in System Preferences.

Is there something I’m missing about how to use other language profiles?
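
A quick check worth trying (a sketch; the exact cause here is unconfirmed): the locale passed to -l must exactly match one of the identifiers printed by hear -s, and macOS may need the corresponding dictation language enabled in System Preferences before its assets exist on disk.

# Verify the identifier is listed verbatim (exact line match):
hear -s | grep -x 'en-GB'

# Then pass it with -l (sample.m4a is a hypothetical input file):
hear -l en-GB -i sample.m4a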

Translate from audio output

Hi, I'm interested in using hear for speech recognition when I'm on a call via e.g. FaceTime, Microsoft Teams, or Google Meet. Most of these apps already have a recognition feature, but mostly it's for English only.

Do you know if it's possible to use the audio output, instead of the audio input, as the source for hear?
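
hear itself reads from the microphone or from a file, so capturing another app's output likely needs a virtual audio device. A minimal sketch, assuming the third-party BlackHole device is installed and set as the call app's output, and that ffmpeg is available:

# Record 60 seconds of system audio routed through BlackHole...
ffmpeg -f avfoundation -i ":BlackHole 2ch" -t 60 call.wav

# ...then transcribe the recording on-device.
hear -d -i call.wav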

"Abort trap: 6" - Permission request gets lost

bash-5.1$ uname -v

Darwin Kernel Version 21.6.0: Mon Aug 22 20:19:52 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T6000

bash-5.1$ /usr/libexec/PlistBuddy -c "Print:ProductName" -c "Print:ProductVersion" -c "Print:ProductBuildVersion" /System/Library/CoreServices/SystemVersion.plist

macOS
12.6
21G115

bash-5.1$ git clone https://github.com/sveinbjornt/hear
bash-5.1$ cd hear
bash-5.1$ make

** BUILD SUCCEEDED **

bash-5.1$ cd products
bash-5.1$ ./hear
Abort trap: 6

Error: On-device recognition is not supported for en-US

I have a Mac mini (M1) running macOS Monterey 12.5.1 (21G83).
After attempting to launch hear, I see the following:

./hear -s
ar-SA
ca-ES
cs-CZ
da-DK
de-AT
de-CH
de-DE
el-GR
en-AE
en-AU
en-CA
en-GB
en-ID
en-IE
en-IN
en-NZ
en-PH
en-SA
en-SG
en-US
en-ZA
es-419
es-CL
es-CO
es-ES
es-MX
es-US
fi-FI
fr-BE
fr-CA
fr-CH
fr-FR
he-IL
hi-IN
hi-IN-translit
hi-Latn
hr-HR
hu-HU
id-ID
it-CH
it-IT
ja-JP
ko-KR
ms-MY
nb-NO
nl-BE
nl-NL
pl-PL
pt-BR
pt-PT
ro-RO
ru-RU
sk-SK
sv-SE
th-TH
tr-TR
uk-UA
vi-VN
wuu-CN
yue-CN
zh-CN
zh-HK
zh-TW

Nevertheless, I cannot transcribe any audio; I get the following error:

./hear  -d -i ./test.m4a        
2023-03-09 19:20:01.767 hear[21203:2358806] Required assets are not available for Locale:en-US
Error: On-device recognition is not supported for en-US

Here are my dictation settings (see the screenshot).

What am I missing?
[Screenshot: Dictation settings]
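
Two things that may be worth trying (assumptions, not a confirmed fix): drop -d so recognition is not forced on-device, or enable on-device dictation for en-US in System Preferences so macOS downloads the missing assets.

# Without -d, recognition is not forced on-device, which may avoid the
# missing-assets error (at the cost of sending audio to Apple's servers):
./hear -i ./test.m4a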

Dealing With ANSI Escape Sequences

Hello, I'm hoping to use this tool as a way to have speech-to-text in the terminal and to then send the query as a JSON payload to the ChatGPT API.

The little Bash script looks like this:

[Screenshot: Bash script]

The problem that I'm encountering is that the JSON is malformed, and ends up looking like this:

{ "model": "text-davinci-003", "prompt": "\u001b[2K\rTell\u001b[2K\rTell me a\u001b[2K\rTell me a joke\u001b[2K\rTell me a joke send", "max_tokens": "100", "temperature": "0.5" }

Do you know why this is the case, and how I might resolve it so that only the full final output of hear is used? From what I can tell by cat-ing the /tmp/voice.txt file, it does contain just the prompt. I'm confused about why the payload includes these extra ANSI escape sequences.
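
Those sequences (ESC[2K erases the line, \r returns the cursor) are hear's live progress updates: each \r-separated segment is a longer draft of the transcript, and the final segment is the full result. A minimal sketch for extracting just that final segment before building the payload (assumes jq is installed; model and parameters mirror the example above):

# Split on carriage returns, strip ANSI escape sequences, keep the last line:
prompt=$(tr '\r' '\n' < /tmp/voice.txt | sed $'s/\x1b\\[[0-9;]*[A-Za-z]//g' | tail -n 1)

# Build the JSON payload safely with jq:
payload=$(jq -n --arg p "$prompt" \
  '{model: "text-davinci-003", prompt: $p, max_tokens: 100, temperature: 0.5}')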

Continuous dictation only works on Apple Silicon M1

Submitted for the record.
I've just had a frustrating few hours getting hear going on Monterey 12.5.1. It's a neat piece of work.
If you have the same issue, it is not sveinbjornt's problem; it is caused by Apple not supporting continuous dictation on older Intel-based Macs.

See: https://discussions.apple.com/thread/253318311

From that thread: "Please see macOS Monterey - New Features. The availability of this feature is limited to Macs with the M1 chip. Does your Mac mini have the M1?"

Perhaps a comment could be added to the installation notes to avoid more frustrated users?

The limitation could be worked around by automatically breaking the input audio file into chunks of less than 30 (or is it 60?) seconds, as in the sketch below. A task for someone cleverer than me.
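
A rough sketch of that idea, assuming ffmpeg is installed (chunk boundaries can split words mid-utterance, so results will be imperfect):

# Split the input into 30-second chunks without re-encoding:
ffmpeg -i input.m4a -f segment -segment_time 30 -c copy chunk_%03d.m4a

# Transcribe each chunk in order and concatenate the results:
for f in chunk_*.m4a; do
    hear -d -i "$f"
done > transcript.txt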

Crash - privacy usage string missing from info.plist

On Monterey 12.6, arm64, this error occurs when hear is run from a fresh build or a bash install:
Process: hear [32653]
Path: /usr/local/bin/hear
Identifier: hear
Version: ???
Code Type: ARM-64 (Native)
Parent Process: Exited process [32651]
Responsible: Electron [11201]
User ID: 0

Date/Time: 2022-09-16 19:05:40.5243 -0400
OS Version: macOS 12.6 (21G115)
Report Version: 12
Anonymous UUID: 041C5E4F-1501-2D80-5C6B-36160B336907

Time Awake Since Boot: 23000 seconds

System Integrity Protection: enabled

Crashed Thread: 2 Dispatch queue: com.apple.root.default-qos

Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY

Termination Reason: Namespace TCC, Code 0
This app has crashed because it attempted to access privacy-sensitive data without a usage description. The app's Info.plist must contain an NSSpeechRecognitionUsageDescription key with a string value explaining to the user how the app uses this data.
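
Since the usage description is read from the hosting app's Info.plist, one possible workaround (an assumption based on the error text, not a confirmed fix) is to add the key to the terminal app that launches hear. Note that editing a bundle invalidates its code signature; see the re-signing sketch in the Alacritty issue below.

# Example only; adjust the path for whichever app hosts your shell:
sudo /usr/libexec/PlistBuddy \
    -c 'Add :NSSpeechRecognitionUsageDescription string "Tools run in this terminal use speech recognition."' \
    /Applications/Alacritty.app/Contents/Info.plist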

Option to stop listening after a while

I wish hear could stop automatically, instead of staying active. Maybe when there is no more input for 3 seconds, or when reaching a user-defined text length?

In my simple game, the user has to say the word back.

	say --voice="Samantha" -- "$word [[slnc 400]]"
	response=$(hear --mode)

	shopt -s nocasematch
	if [[ "${response}" == "${word}" ]]; then
		echo "$word"
	else
		echo "🛑 wrong. ➡  $word"
		echo " - press enter for next word -"; read
	fi

However, pressing CTRL-C to stop the listening stops the script instead.

Also, for scripting, the escape/erase-line control characters get in the way. Plain text output when the program completes would be preferred. The --mode output looks like this (^[ is the escape character):

^[[2K
Good^[[2K
^[[2K
Good morning

Workarounds:

  • stop after 4 seconds: response=$(timeout 4 hear --mode)
  • clean string: response=$(echo "$response" | sed 's/.*\[2K//g' | tr -d '\r\n')
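
Combined into one line (a sketch; timeout is from GNU coreutils rather than stock macOS, and --mode is as used in the script above):

response=$(timeout 4 hear --mode | tr '\r' '\n' | sed $'s/\x1b\\[[0-9;]*[A-Za-z]//g' | tail -n 1)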

Doesn't work in Visual Studio Code

Thanks for the awesome software!
It works in the terminal, but when I run the same script from Visual Studio Code it outputs nothing (and no errors either, neither in the code nor in result.stderr).

I run it like: result = subprocess.run(["./hear", "-l", "en-US", "-d", "-p", "-i", "output.wav"], capture_output=True, text=True)
Visual Studio Code has access to the microphone, and that works (I checked it).
Thx!

execution denied

Hello, I tried hear but execution was denied.
It said "hear" can't be opened because Apple cannot check it for malicious software.
Any suggestions for a workaround?

Thanks in advance.

[Screenshot: Gatekeeper warning dialog]
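
That is the standard Gatekeeper warning for an unnotarized download. If you trust the binary, the usual workarounds are right-clicking it in Finder and choosing Open, or removing the quarantine attribute (the path assumes the install location mentioned elsewhere in these issues):

xattr -d com.apple.quarantine /usr/local/bin/hear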

Flag -d

Hi, thanks for the work. Do I understand correctly that with the -d flag, only language packs that are installed and loaded for Siri will work? I got it working with -d only for 'ru-RU' and 'en-US', but I need 'uk-UA'. Siri does not speak Ukrainian, so I can't get a transcription with the -d flag. I ran an experiment: I installed Siri in French, the language pack loaded, and the 'fr-FR' locale worked with the -d flag.

Supported language not finding file

First things first: I love this tool!

It's working great for the English language. Maybe I'm using it wrong, but I would expect this to return German text (if German was spoken):
hear -l de-DE
Instead I'm getting: No file at path de-DE

I confirmed that this language is supported by executing hear -s (the list contains de-DE).

Where am I supposed to save which file, exactly? Or how should I call hear -l?

Choose language

It would be nice if one could choose one of the available languages. Is that planned? :-)

FR: log when the sentence is considered finished

Hi,
thanks for this great work. I'm not exactly writing an issue, rather a feature request. hear adds a carriage return after a certain timespan; consequently, it starts to log the speech on a new line. It would be useful to signal on the output whenever the sentence is considered finished and the next word will be logged on a new line. This could be achieved by simply emitting the carriage return at the proper deadline, without waiting for a new word to be logged.
cheers
michele

hear works on Terminal, but not on Alacritty: NSSpeechRecognitionUsageDescription key missing.

When I run 'hear' from Alacritty, it does not ask for access to speech recognition and returns 'zsh: abort'.

Console.app:

ERROR: This app has crashed because it attempted to access privacy-sensitive data without a usage description. The app's Info.plist must contain an NSSpeechRecognitionUsageDescription key with a string value explaining to the user how the app uses this data.

On the other hand, when I tried to run 'hear' using Terminal.app, it worked perfectly. Terminal asked for access to speech recognition and started outputting the transcribed text.

Apple documentation says that "You must include the NSSpeechRecognitionUsageDescription key in your app’s Info.plist file. If this key is not present, your app will crash when it attempts to request authorization or use the APIs of the Speech framework."(1) Because "This key is required if your app uses APIs that send user data to Apple’s speech recognition servers."(2)

I added to /Applications/Alacritty.app/Contents/Info.plist:

<key>NSSpeechRecognitionUsageDescription</key>
<string>An application in Alacritty would like to access speech recognition.</string>

Now 'hear' exits with:

Error: Speech recognition authorization not determined

hear/src/Hear.m, line 114 (commit e324756):

[self die:@"Speech recognition authorization not determined"];

Does anyone know how to solve this issue, or have any suggestions?
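
One possibility, untested: editing the bundle's Info.plist invalidates Alacritty's code signature, which can leave the privacy prompt in this undetermined state. Re-signing ad hoc and resetting the stale TCC decision might help:

# Ad-hoc re-sign the modified bundle, then reset the privacy decision:
sudo codesign --force --deep --sign - /Applications/Alacritty.app
tccutil reset SpeechRecognition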

Thanks!

Getting "zsh: abort hear" on iMac M1

First try on a MacBook Pro M1 works fine, but the same is not working on my desktop iMac...

It looks like some security issue. I first allowed apps from the App Store and identified developers, but I am still getting the abort.

hear -h works fine.

Checking why with:
xattr -l /usr/local/bin/hear

getting: com.apple.quarantine: 0081;665dd081;Chrome;

I also tried to remove it from quarantine with xattr -d com.apple.quarantine /path/to/file, but it didn't help.
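
Given that hear -h works but anything touching speech recognition aborts, this may be the TCC usage-description crash described in other issues here rather than quarantine (a guess). The most recent crash report should say which:

# List the newest crash reports; look for a 'hear' entry mentioning TCC:
ls -t ~/Library/Logs/DiagnosticReports | head -n 5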

hear "goes to sleep" after some inactivity

I run hear with:
hear -l it-IT -d

It works perfectly.
Then I leave it running for some time without speaking.
When I speak again, the speech is no longer logged.
After some seconds of speech, hear starts logging the speech again.

It sounds as if there's an automatic idle state after some time without speech.

I checked in Activity Monitor, and neither the hear nor the localspeechrecognition process shows App Nap as active.

Any idea?

kAFAssistantErrorDomain error 203

hear -d -l 'en-GB' -i someM4A.m4a > aTranscription.txt

results in

Error: The operation couldn’t be completed. (kAFAssistantErrorDomain error 203.)

Running 12.4 (21F79), with en-GB installed for on-device recognition.

Integrate with ffmpeg?

Hi, great utility! Is there any way to integrate this with ffmpeg to auto-subtitle videos? Maybe there's a piping command that could make this work today, but I'm not able to figure it out. A sketch follows.
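
A partial sketch, assuming ffmpeg is installed: extract the audio track and transcribe it. Note that hear emits plain text, so producing real subtitles would still require timing information from somewhere.

# Extract the audio track from the video (no video stream in the output):
ffmpeg -i video.mp4 -vn -c:a aac audio.m4a

# Transcribe it on-device:
hear -d -i audio.m4a > transcript.txt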
