OneReality

Bridging the real and virtual worlds

Demo video with 3D printed hologram box · Report Bug · Request Feature

Table of Contents
  1. About The Project
  2. Getting Started
  3. Roadmap
  4. License
  5. Acknowledgments

About The Project

Click on the image for the demo video.

A virtual waifu / assistant that you can speak to through your mic, and she'll speak back to you! She has many features, such as:

  • You can speak to her with a mic
  • She can speak back to you
  • Has short-term memory and long-term memory
  • Can open apps
  • Smarter than you
  • Fluent in English, Japanese, Korean, and Chinese
  • Can control your smart home like Alexa if you set up Tuya (more info in Prerequisites)

More features I'm planning to add soon are listed in the Roadmap. Also, here's a summary of how it works for those of you who want to know:

  1. The Python package SpeechRecognition recognizes what you say into your mic.
  2. That speech is written to an audio (.wav) file, which is sent to OpenAI's Whisper speech-to-text transcription AI.
  3. The transcribed result is printed in the terminal and written to conversation.jsonl.
  4. The vector database hyperdb uses cosine similarity to find the 2 closest matches to what you said in conversation.jsonl and appends them to the prompt to give Megumin context.
  5. The response is then passed through multiple NLE RTE and other checks to see if you want to open an app or do something with your smart home.
  6. The prompt is sent to llama.cpp, and the response from Megumin is printed to the terminal and appended to conversation.jsonl.
  7. Finally, the response is spoken by VITS TTS.
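The memory step in that summary (log each message to conversation.jsonl, then pull the 2 closest past messages into the prompt) can be sketched roughly as follows. This is not the project's code and it does not use hyperdb: `embed` is a toy bag-of-words stand-in for a real embedding model, and the function names, the `role`/`content` record layout, and the prompt template are all assumptions made for illustration.

```python
import json
import math
import re
from collections import Counter
from pathlib import Path


def embed(text):
    """Toy bag-of-words vector (assumption: the real project uses hyperdb embeddings)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def append_message(path, role, content):
    """Append one message as a JSON line (hypothetical record layout)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"role": role, "content": content}) + "\n")


def load_messages(path):
    """Read every non-empty JSON line from the conversation log."""
    p = Path(path)
    if not p.exists():
        return []
    with p.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def closest_matches(query, messages, k=2):
    """Return the k logged messages most similar to the query."""
    q = embed(query)
    ranked = sorted(
        messages,
        key=lambda m: cosine_similarity(q, embed(m["content"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, messages):
    """Prepend the closest matches as context (hypothetical prompt template)."""
    context = "\n".join(
        f'{m["role"]}: {m["content"]}' for m in closest_matches(query, messages)
    )
    return f"Context:\n{context}\n\nUser: {query}\nMegumin:"
```

With a real embedding model, only `embed` and `cosine_similarity` would need to change; the logging and top-2 lookup stay the same.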

Getting Started

Video tutorial

Here's how you can set it up on Windows (the steps are probably similar on Mac and Linux, but I haven't tested them).

Prerequisites

  1. Install Python 3.10.11 and add it to your PATH environment variable
  2. Install Git
  3. Install CUDA 11.7 if you have an Nvidia GPU
  4. Install Visual Studio Community 2022 and select Desktop Development with C++ in the install options
  5. Install VTube Studio on Steam
  6. Download Megumin's VTube Studio Model
  7. Extract the downloaded zip so it's only one folder deep (you should be able to open the folder and have all the files there, not one folder containing everything)
  8. Open VTube Studio > Settings icon > Open Data Folder and move the folder there > Person icon > c001_f_costume_kouma
  9. Install VB Cable Audio Driver, but don't set it as your audio device just yet
  10. Open Control Panel > Hardware and Sound > Sound > Recording > find CABLE Output > right-click > Properties > Listen > Check Listen to this device > For Playback through this device, select your headphones or speakers
  11. (Optional) Create a Tuya cloud project if you want to control your smart devices with the AI. For example, you can say 'Hey Megumin, can you turn on my LEDs'. It's a bit complicated, and I'll probably make a video on it later because it's hard to explain through text, but here's a guide that should help you out: https://developer.tuya.com/en/docs/iot/device-control-practice?id=Kat1jdeul4uf8
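Before moving on, a quick script can confirm that the command-line prerequisites are reachable from PATH. This is a minimal sketch and not part of the project: the function name `check_prerequisites` is hypothetical, it only compares the Python major.minor version, and it cannot verify CUDA, Visual Studio, VTube Studio, or VB Cable.

```python
import shutil
import sys


def check_prerequisites(expected_python=(3, 10), tools=("git",)):
    """Report whether the running Python matches and each tool is on PATH."""
    report = {"python": tuple(sys.version_info[:2]) == tuple(expected_python)}
    for tool in tools:
        # shutil.which returns None when the executable is not found on PATH
        report[tool] = shutil.which(tool) is not None
    return report
```

For example, `print(check_prerequisites())` shows `True` or `False` per entry. A `False` for `python` means the interpreter running the script is not 3.10.x.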

Automatic Installation

  1. Open cmd in whatever folder you want the project to be in, and run git clone --recurse-submodules https://github.com/DogeLord081/OneReality.git
  2. Open the folder and run python setup.py and follow the instructions
  3. Edit the variables in .env if you must
  4. Run OneReality.bat and while it's running, open the start menu and type Sound Mixer Options and open it. You might have to wait and make Megumin say something first, but you should see Python in the App Volume list
  5. Change the output to CABLE Input (VB-Audio Virtual Cable)
  6. Open VTube Studio > Settings icon > Scroll to Microphone Settings > Select Microphone > CABLE Output (VB-Audio Virtual Cable) > Person with settings icon > Scroll to Mouth Smile > Copy these settings > Scroll to Mouth Open > Copy these settings
  7. Open Sound Mixer Options again and change the input for VTube Studio to CABLE Output (VB-Audio Virtual Cable)
  8. You may need to restart your computer if lip sync doesn't work
  9. You're good to go! If you run into any issues, let me know on Discord and I can help you. Once again, it's https://discord.gg/PN48PZEXJS
  10. When you want to stop, say goodbye, bye, or see you somewhere in your sentence, because that automatically ends the program. Otherwise, you can just press Ctrl + C or close the window
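The stop-phrase behaviour in the last step could look something like the sketch below. The names `EXIT_PHRASES` and `wants_to_exit` are hypothetical, and matching on whole words is an assumption. The project may simply check for substrings.

```python
import re

# Phrases taken from the step above; the matching rule is an assumption
EXIT_PHRASES = ("goodbye", "bye", "see you")

_EXIT_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(p) for p in EXIT_PHRASES) + r")\b",
    re.IGNORECASE,
)


def wants_to_exit(sentence):
    """Return True if the sentence contains one of the exit phrases as whole words."""
    return _EXIT_PATTERN.search(sentence) is not None
```

Whole-word matching keeps words like "bystander" from ending the session by accident.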

Roadmap

  • Long-term memory
  • Time and date awareness
  • Virtual reality / augmented reality / mixed reality integration
  • Gatebox-style hologram
  • Animatronic body
  • Alexa-like smart home control
  • More languages for the AI's voice
    • Japanese
    • English
    • Korean
    • Chinese
    • Spanish
    • Indonesian
  • Mobile version
  • Easier setup
  • Compiling into one exe
  • Localized
  • VTube Studio lip sync without a driver, like in this project (though I don't really understand the VTube Studio API used there)

License

Distributed under the GNU General Public License v3.0. See LICENSE.txt for more information.

Contact and Socials

E-mail: [email protected]

YouTube: https://www.youtube.com/@OneReality-tb4ut

Discord: https://discord.gg/PN48PZEXJS

Project Link: https://github.com/DogeLord081/OneReality

Acknowledgments and Major Contributors
