
magic-voice-switch's Introduction

Magic Voice Switch

Overview

Magic Voice Switch is a project inspired by a popular Instagram video where magic words like "開damn~~" and "關damn~~" are used to control lights. Although the video was proven to involve manual control, this project aims to bring the idea to life by using voice commands to control lights.

The project supports two modes:

  1. Machine Learning Mode: Uses a model trained with Teachable Machine to recognize specific magic words.
  2. Speech-to-Text (STT) Mode: Transcribes the audio and matches similar-sounding words to classify them into the categories below (a small matching sketch follows the category list).

Categories

  • 0: Background Noise
  • 1: 開damn (magic word for "turn on")
  • 2: 開燈 ("turn on the light")
  • 3: 關damn (magic word for "turn off")
  • 4: 關燈 ("turn off the light")
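
In STT mode the transcript is matched against these categories by keyword lookup (the project's actual logic lives in classify_from_text in stt_utils.py, used in the issue code below). A minimal sketch of the idea, with a hypothetical classify_text helper and made-up homophone lists:

# Hypothetical keyword lists; the homophones here are only illustrative.
KEYWORDS = {
    1: ["開damn"],
    2: ["開燈", "開等"],   # 開等 sounds like 開燈
    3: ["關damn"],
    4: ["關燈", "關等"],   # 關等 sounds like 關燈
}

def classify_text(text: str) -> int:
    """Return the category ID whose keywords appear in the transcript, else 0."""
    for label_id, words in KEYWORDS.items():
        if any(word in text for word in words):
            return label_id
    return 0  # background noise / no command

print(classify_text("幫我開燈"))  # -> 2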

Future Plans

  • Integrate with a Raspberry Pi to control physical LED lights (a minimal GPIO sketch follows this list).
  • Develop a more visually appealing web interface for cloud deployment.
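
The Raspberry Pi integration does not exist yet; the following is only a sketch of what it might look like, assuming the gpiozero library and an LED wired to GPIO pin 17 (both are assumptions, not part of this repo):

from gpiozero import LED

led = LED(17)  # hypothetical pin number

def apply_label(label_id: int) -> None:
    """Act on a recognized category ID (see the category list above)."""
    if label_id in (1, 2):    # 開damn / 開燈
        led.on()
    elif label_id in (3, 4):  # 關damn / 關燈
        led.off()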

Dependencies

Audio Processing

  • librosa
  • numpy
  • PyAudio

Speech Recognition

  • SpeechRecognition
  • openai

Machine Learning and AI

  • tensorflow

Environment Management

  • python-dotenv
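
Taken together, a requirements.txt covering the packages above would look roughly like the following (this README does not pin versions, so none are shown here):

librosa
numpy
PyAudio
SpeechRecognition
openai
tensorflow
python-dotenv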

Setup Instructions

Build venv for macOS

On macOS, install PortAudio first (brew install portaudio) so that PyAudio can be installed.

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ deactivate
$ rm -rf venv     # remove the venv

Build venv for Windows

$ pip install virtualenv
$ virtualenv venv
$ venv\Scripts\activate
$ pip install -r requirements.txt
$ deactivate
$ rmdir /s venv     # remove the venv

Running the Project

Run the following command to start the voice recognition loop:

python main.py

You will be prompted to choose the mode:

  1. If you choose Model, the system will use the trained model for recognition.
  2. If you choose STT, you will be prompted to choose between Google or OpenAI for speech-to-text processing.

Demo

model.final.mp4
stt.final.mp4

magic-voice-switch's People

Contributors

juntinglin, 1chooo


magic-voice-switch's Issues

[Enhancement] update the output of CLI version function

The current main.py acts on the audio input directly, so this logic has to be rewritten before #6 can be completed.

The current functions do not return anything, yet switching the light on and off is decided by return 1 / return 2, so main.py needs to be split into two parts:

  1. Use the model to decide what to return (so it can be wired into an API).
  2. Use the returned value to decide the next action.

import os
import time
import threading

import numpy as np
import tensorflow as tf

from audio_utils import get_audio, read_audio
from classify_utils import load_labels, classify_audio
from stt_utils import stt_audio, classify_from_text

# Paths to the model and label files
MODEL_DIR = 'models'
MODEL_FILE = 'soundclassifier_with_metadata.tflite'
LABELS_FILE = 'labels.txt'
MODEL_PATH = os.path.join(MODEL_DIR, MODEL_FILE)
LABELS_PATH = os.path.join(MODEL_DIR, LABELS_FILE)


def classify_and_print_results(interpreter, labels, audio_data):
    audio_data = np.fromfile(open('output.wav'), np.int16)[22:]
    results = classify_audio(interpreter, audio_data)
    label_id, prob = results[0]
    print(f"Detected: {labels[label_id]} with probability {prob:.4f}")


def stt_function(labels, stt_mode):
    # Transcribe the recording with stt_audio
    text = stt_audio('output.wav', mode=stt_mode)
    print(f"STT Result: {text}")
    # Classify the transcript
    label_id, label, raw_text = classify_from_text(text)
    print(f"Detected: {labels[label_id]} with label ID: {label_id}")
    print(f"Raw Text: {raw_text}")


def main():
    mode = input("Select a mode (1: use the model, 2: use STT): ").strip()
    if mode not in ['1', '2']:
        print("Invalid choice; please enter 1 or 2")
        return
    if mode == '2':
        stt_mode = input("Select an STT mode (google/openai): ").strip()
        if stt_mode not in ['google', 'openai']:
            print("Invalid choice; please enter google or openai")
            return
    labels = load_labels(LABELS_PATH)
    if mode == '1':
        interpreter = tf.lite.Interpreter(MODEL_PATH)
        interpreter.allocate_tensors()
        print("Interpreter initialized. Ready to classify audio commands.")
        duration = 1  # record 1 second per clip in model mode
    else:
        print(f"STT mode ({stt_mode}) selected. Ready to transcribe audio.")
        duration = 3  # record 3 seconds per clip in STT mode
    while True:
        # Record audio on a separate thread
        audio_thread = threading.Thread(target=get_audio, args=("output.wav", duration))
        audio_thread.start()
        audio_thread.join()
        if mode == '1':
            # Run inference on the recording
            classify_thread = threading.Thread(target=classify_and_print_results, args=(interpreter, labels, None))
            classify_thread.start()
            classify_thread.join()
        else:
            # Run speech-to-text on the recording
            stt_thread = threading.Thread(target=stt_function, args=(labels, stt_mode))
            stt_thread.start()
            stt_thread.join()
        time.sleep(0.5)


if __name__ == "__main__":
    main()

Video demo

Screen.Recording.2024-06-05.at.11.46.05.AM.mov

Requirements

  • Rewrite the functions so that they return a status (a sketch of this split follows below).
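
A rough sketch of the requested split, reusing the imports and helpers from the code above; detect_command and act_on_command are illustrative names, and the return 1 / return 2 convention follows the description in this issue:

import numpy as np
from classify_utils import classify_audio

def detect_command(interpreter, labels, wav_path="output.wav"):
    """Classify one recording and return (label_id, probability) instead of printing."""
    audio_data = np.fromfile(open(wav_path, 'rb'), np.int16)[22:]
    results = classify_audio(interpreter, audio_data)
    label_id, prob = results[0]
    return label_id, prob

def act_on_command(label_id):
    """Turn the returned label ID into the next action."""
    if label_id in (1, 2):
        return 1   # turn the light on
    if label_id in (3, 4):
        return 2   # turn the light off
    return 0       # background noise: do nothing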

Update README in pr-branch

@1chooo

Please update the README file in the pr-branch to include the new setup instructions. Ensure that the following points are covered:

  • Usage instructions
  • Any additional notes or prerequisites

[Feat] Combine CMD Version with Web UI and Implement Voice Recognition Features

@1chooo

Please combine the CMD version of Magic Voice Switch with Web UI. The CMD version details and usage instructions can be found here or in the demo video (see attachment).

2024-06-03.15-46-27.mp4

Feature Requirements

  • Web Interface:
    • Create a user-friendly web interface.
  • Voice Recognition Mode Selection:
    • Allow users to select the voice recognition mode on the web page:
      • Mode 1: Model-based recognition
      • Mode 2: Speech-to-Text (STT) using Google Speech Recognition
  • Microphone Permission:
    • Implement functionality to request and obtain microphone permissions from the browser.
  • Recording Button:
    • Create a recording button in the web interface that allows continuous recording when pressed.
    • Implement functionality to split the recording into 1-second chunks and save them to the project root directory (a minimal Gradio sketch follows this issue).
  • Update README:
    • Clearly document the setup and usage instructions for the web interface in the dev branch's README.

Thank you!
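
The issue does not prescribe a web framework; as one possible starting point, here is a minimal sketch assuming Gradio (suggested for visualization in the dashboard issue below) as the web layer. classify_wav is a hypothetical wrapper around the existing model / STT code, and the component arguments follow Gradio 4.x:

import gradio as gr

def classify_wav(mode, wav_path):
    # Placeholder: call the model-based or STT-based pipeline here and
    # return the detected magic word / category.
    return f"mode={mode}, file={wav_path}"

demo = gr.Interface(
    fn=classify_wav,
    inputs=[
        gr.Radio(["Model", "STT (Google)"], label="Recognition mode"),
        gr.Audio(sources=["microphone"], type="filepath", label="Say the magic word"),
    ],
    outputs="text",
)

if __name__ == "__main__":
    demo.launch()  # the Audio component asks the browser for microphone permission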

[Add] the dashboard to track the light

Our topic concerns global sustainability, so we can track the carbon cost of the light, visualize it with Gradio, and use FastAPI as the backend.

  • Create a dashboard
  • Look up data on the cost of the light
  • Record how long the light has been turned on and off (a rough FastAPI sketch follows this list)
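
A rough sketch of such a tracking backend, assuming FastAPI as the issue suggests; the endpoint paths, the bulb wattage, and the carbon factor are illustrative assumptions:

import time
from fastapi import FastAPI

app = FastAPI()
state = {"on": False, "since": time.time(), "on_seconds": 0.0}

WATTS = 10.0           # assumed bulb power
KG_CO2_PER_KWH = 0.5   # assumed grid carbon factor

@app.post("/light/{action}")
def switch_light(action: str):
    """Record a switch event ('on' or 'off') and accumulate on-time."""
    now = time.time()
    if state["on"]:
        state["on_seconds"] += now - state["since"]
    state["on"] = (action == "on")
    state["since"] = now
    return {"on": state["on"]}

@app.get("/stats")
def stats():
    """Report how long the light has been on and a rough carbon estimate."""
    on_seconds = state["on_seconds"]
    if state["on"]:
        on_seconds += time.time() - state["since"]
    kwh = WATTS * on_seconds / 3600 / 1000
    return {"on_seconds": on_seconds, "kg_co2": kwh * KG_CO2_PER_KWH}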
