endimionzf,github

comfyui

The most powerful and modular stable diffusion GUI, api and backend with a graph/nodes interface.

deepfacelive

Real-time face swap for PC streaming or video calls

deepvideoanalytics

Analyze videos, perform detections, index frames & detected objects, search by examples

demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation

ffmprovisr

Repository of useful FFmpeg commands for archivists!

flowframes

Flowframes Windows GUI for video interpolation using DAIN (NCNN) or RIFE (CUDA/NCNN)

ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

machinevideoeditor

This repository does not contain code; its purpose is issue tracking and the wiki.

A modern, web-based photo management server. Run it on your home server and it will let you find the right photo from your collection on any device. Smart filtering is made possible by object recognition, face recognition, location awareness, color analysis and other ML algorithms.

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

pyannote-core

Advanced data structures for handling temporal segments with attached labels.

pyscenedetect

🎥 Python and OpenCV-based scene cut/transition detection program & library.

retrieval-based-voice-conversion-webui

Voice data <= 10 mins can also be used to train a good VC model!

rvc-studio

The best looking and most functional webui for RVC related tasks. See website for UI demo:

scenecut-extractor

Extract scenecuts from video files using ffmpeg

segment-anything

The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.

stash-box

Stash App's own OpenSource video indexing and Perceptual Hashing MetaData API

storytoolkitai

An editing tool that uses AI to transcribe, understand content and search for anything in your footage, integrated with ChatGPT and other AI models

text-generation-webui

A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

tinydiarize

Minimal extension of OpenAI's Whisper adding speaker diarization with special tokens

tts

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

waas

Whisper as a Service (GUI and API with queuing for OpenAI Whisper)

whisper-at

Code and Pretrained Models for Interspeech 2023 Paper "Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong Audio Event Taggers"

whisper-timestamped

Multilingual Automatic Speech Recognition with word-level timestamps and confidence

whisper.cpp

Port of OpenAI's Whisper model in C/C++

whisperx

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

willow-inference-server

Open source, local, and self-hosted highly optimized language inference server supporting ASR/STT, TTS, and LLM across WebRTC, REST, and WS
