
magic-voice-switch's Introduction

Magic Voice Switch

Overview

Magic Voice Switch is a project inspired by a popular Instagram video where magic words like "開damn~~" and "關damn~~" are used to control lights. Although the video was proven to involve manual control, this project aims to bring the idea to life by using voice commands to control lights.

The project supports two modes:

  1. Machine Learning Mode: Uses a model trained with Teachable Machine to recognize specific magic words.
  2. Speech-to-Text (STT) Mode: Transcribes the audio and matches similar-sounding words to classify them into the categories below (a small matching sketch follows the category list).

Categories

  • 0: Background Noise
  • 1: 開damn (magic word for "turn on")
  • 2: 開燈 ("turn on the light")
  • 3: 關damn (magic word for "turn off")
  • 4: 關燈 ("turn off the light")
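
In STT mode the transcript is matched against these categories by keyword lookup (the project's actual logic lives in classify_from_text in stt_utils.py, used in the issue code below). A minimal sketch of the idea, with a hypothetical classify_text helper and made-up homophone lists:

# Hypothetical keyword lists; the homophones here are only illustrative.
KEYWORDS = {
    1: ["開damn"],
    2: ["開燈", "開等"],   # 開等 sounds like 開燈
    3: ["關damn"],
    4: ["關燈", "關等"],   # 關等 sounds like 關燈
}

def classify_text(text: str) -> int:
    """Return the category ID whose keywords appear in the transcript, else 0."""
    for label_id, words in KEYWORDS.items():
        if any(word in text for word in words):
            return label_id
    return 0  # background noise / no command

print(classify_text("幫我開燈"))  # -> 2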

Future Plans

  • Integrate with a Raspberry Pi to control physical LED lights (a minimal GPIO sketch follows this list).
  • Develop a more visually appealing web interface for cloud deployment.
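
The Raspberry Pi integration does not exist yet; the following is only a sketch of what it might look like, assuming the gpiozero library and an LED wired to GPIO pin 17 (both are assumptions, not part of this repo):

from gpiozero import LED

led = LED(17)  # hypothetical pin number

def apply_label(label_id: int) -> None:
    """Act on a recognized category ID (see the category list above)."""
    if label_id in (1, 2):    # 開damn / 開燈
        led.on()
    elif label_id in (3, 4):  # 關damn / 關燈
        led.off()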

Dependencies

Audio Processing

  • librosa
  • numpy
  • PyAudio

Speech Recognition

  • SpeechRecognition
  • openai

Machine Learning and AI

  • tensorflow

Environment Management

  • python-dotenv
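
Taken together, a requirements.txt covering the packages above would look roughly like the following (this README does not pin versions, so none are shown here):

librosa
numpy
PyAudio
SpeechRecognition
openai
tensorflow
python-dotenv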

Setup Instructions

Build venv for macOS

On macOS, install PortAudio first (brew install portaudio) so that PyAudio can be installed.

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install -r requirements.txt
$ deactivate
$ rm -rf venv     # remove the venv

Build venv for Windows

$ pip install virtualenv
$ virtualenv venv
$ venv\Scripts\activate
$ pip install -r requirements.txt
$ deactivate
$ rmdir /s venv     # remove the venv

Running the Project

Run the following command to start the voice recognition loop:

python main.py

You will be prompted to choose the mode:

  1. If you choose Model, the system will use the trained model for recognition.
  2. If you choose STT, you will be prompted to choose between Google or OpenAI for speech-to-text processing.

Demo

model.final.mp4
stt.final.mp4

magic-voice-switch's People

Contributors

juntinglin, 1chooo


magic-voice-switch's Issues

[Enhancement] update the output of CLI version function

The current main.py acts on the audio input directly, so this logic has to be rewritten before #6 can be completed.

The current functions do not return anything, yet switching the light on and off is decided by return 1 / return 2, so main.py needs to be split into two parts:

  1. Use the model to decide what to return (so it can be wired into an API).
  2. Use the returned value to decide the next action.

import os
import time
import threading

import numpy as np
import tensorflow as tf

from audio_utils import get_audio, read_audio
from classify_utils import load_labels, classify_audio
from stt_utils import stt_audio, classify_from_text

# Paths to the model and label files
MODEL_DIR = 'models'
MODEL_FILE = 'soundclassifier_with_metadata.tflite'
LABELS_FILE = 'labels.txt'
MODEL_PATH = os.path.join(MODEL_DIR, MODEL_FILE)
LABELS_PATH = os.path.join(MODEL_DIR, LABELS_FILE)


def classify_and_print_results(interpreter, labels, audio_data):
    audio_data = np.fromfile(open('output.wav'), np.int16)[22:]
    results = classify_audio(interpreter, audio_data)
    label_id, prob = results[0]
    print(f"Detected: {labels[label_id]} with probability {prob:.4f}")


def stt_function(labels, stt_mode):
    # Transcribe the recording with stt_audio
    text = stt_audio('output.wav', mode=stt_mode)
    print(f"STT Result: {text}")
    # Classify the transcript
    label_id, label, raw_text = classify_from_text(text)
    print(f"Detected: {labels[label_id]} with label ID: {label_id}")
    print(f"Raw Text: {raw_text}")


def main():
    mode = input("Select a mode (1: use the model, 2: use STT): ").strip()
    if mode not in ['1', '2']:
        print("Invalid choice; please enter 1 or 2")
        return
    if mode == '2':
        stt_mode = input("Select an STT mode (google/openai): ").strip()
        if stt_mode not in ['google', 'openai']:
            print("Invalid choice; please enter google or openai")
            return
    labels = load_labels(LABELS_PATH)
    if mode == '1':
        interpreter = tf.lite.Interpreter(MODEL_PATH)
        interpreter.allocate_tensors()
        print("Interpreter initialized. Ready to classify audio commands.")
        duration = 1  # record 1 second per clip in model mode
    else:
        print(f"STT mode ({stt_mode}) selected. Ready to transcribe audio.")
        duration = 3  # record 3 seconds per clip in STT mode
    while True:
        # Record audio on a separate thread
        audio_thread = threading.Thread(target=get_audio, args=("output.wav", duration))
        audio_thread.start()
        audio_thread.join()
        if mode == '1':
            # Run inference on the recording
            classify_thread = threading.Thread(target=classify_and_print_results, args=(interpreter, labels, None))
            classify_thread.start()
            classify_thread.join()
        else:
            # Run speech-to-text on the recording
            stt_thread = threading.Thread(target=stt_function, args=(labels, stt_mode))
            stt_thread.start()
            stt_thread.join()
        time.sleep(0.5)


if __name__ == "__main__":
    main()

Video demo

Screen.Recording.2024-06-05.at.11.46.05.AM.mov

Requirements

  • Rewrite the functions so that they return a status (a sketch of this split follows below).
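
A rough sketch of the requested split, reusing the imports and helpers from the code above; detect_command and act_on_command are illustrative names, and the return 1 / return 2 convention follows the description in this issue:

import numpy as np
from classify_utils import classify_audio

def detect_command(interpreter, labels, wav_path="output.wav"):
    """Classify one recording and return (label_id, probability) instead of printing."""
    audio_data = np.fromfile(open(wav_path, 'rb'), np.int16)[22:]
    results = classify_audio(interpreter, audio_data)
    label_id, prob = results[0]
    return label_id, prob

def act_on_command(label_id):
    """Turn the returned label ID into the next action."""
    if label_id in (1, 2):
        return 1   # turn the light on
    if label_id in (3, 4):
        return 2   # turn the light off
    return 0       # background noise: do nothing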

Update README in pr-branch

@1chooo

Please update the README file in the pr-branch to include the new setup instructions. Ensure that the following points are covered:

  • Usage instructions
  • Any additional notes or prerequisites

[Feat] Combine CMD Version with Web UI and Implement Voice Recognition Features

@1chooo

Please combine the CMD version of Magic Voice Switch with Web UI. The CMD version details and usage instructions can be found here or in the demo video (see attachment).

2024-06-03.15-46-27.mp4

Feature Requirements

  • Web Interface:
    • Create a user-friendly web interface.
  • Voice Recognition Mode Selection:
    • Allow users to select the voice recognition mode on the web page:
      • Mode 1: Model-based recognition
      • Mode 2: Speech-to-Text (STT) using Google Speech Recognition
  • Microphone Permission:
    • Implement functionality to request and obtain microphone permissions from the browser.
  • Recording Button:
    • Create a recording button in the web interface that allows continuous recording when pressed.
    • Implement functionality to split the recording into 1-second chunks and save them to the project root directory (a minimal Gradio sketch follows this issue).
  • Update README:
    • Clearly document the setup and usage instructions for the web interface in the dev branch's README.

Thank you!
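
The issue does not prescribe a web framework; as one possible starting point, here is a minimal sketch assuming Gradio (suggested for visualization in the dashboard issue below) as the web layer. classify_wav is a hypothetical wrapper around the existing model / STT code, and the component arguments follow Gradio 4.x:

import gradio as gr

def classify_wav(mode, wav_path):
    # Placeholder: call the model-based or STT-based pipeline here and
    # return the detected magic word / category.
    return f"mode={mode}, file={wav_path}"

demo = gr.Interface(
    fn=classify_wav,
    inputs=[
        gr.Radio(["Model", "STT (Google)"], label="Recognition mode"),
        gr.Audio(sources=["microphone"], type="filepath", label="Say the magic word"),
    ],
    outputs="text",
)

if __name__ == "__main__":
    demo.launch()  # the Audio component asks the browser for microphone permission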

[Add] the dashboard to track the light

Our topic concerns global sustainability, so we can track the carbon cost of the light, visualize it with Gradio, and use FastAPI as the backend.

  • Create a dashboard
  • Look up data on the cost of the light
  • Record how long the light has been turned on and off (a rough FastAPI sketch follows this list)
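
A rough sketch of such a tracking backend, assuming FastAPI as the issue suggests; the endpoint paths, the bulb wattage, and the carbon factor are illustrative assumptions:

import time
from fastapi import FastAPI

app = FastAPI()
state = {"on": False, "since": time.time(), "on_seconds": 0.0}

WATTS = 10.0           # assumed bulb power
KG_CO2_PER_KWH = 0.5   # assumed grid carbon factor

@app.post("/light/{action}")
def switch_light(action: str):
    """Record a switch event ('on' or 'off') and accumulate on-time."""
    now = time.time()
    if state["on"]:
        state["on_seconds"] += now - state["since"]
    state["on"] = (action == "on")
    state["since"] = now
    return {"on": state["on"]}

@app.get("/stats")
def stats():
    """Report how long the light has been on and a rough carbon estimate."""
    on_seconds = state["on_seconds"]
    if state["on"]:
        on_seconds += time.time() - state["since"]
    kwh = WATTS * on_seconds / 3600 / 1000
    return {"on_seconds": on_seconds, "kg_co2": kwh * KG_CO2_PER_KWH}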
