kedreamix / linly-talker

Digital Avatar Conversational System - Linly-Talker. 😄✨ Linly-Talker is an intelligent AI system that combines large language models (LLMs) with visual models to create a novel human-AI interaction method. 🤝🤖 It integrates various technologies like Whisper, Linly, Microsoft Speech Services, and SadTalker talking head generation system. 🌟🔬

Home Page: https://kedreamix.github.io/

License: MIT License

Python 67.86% Shell 0.25% Jupyter Notebook 27.05% C++ 0.10% Cuda 4.46% C 0.28%

linly-talker's Introduction

Digital Human Intelligent Dialogue System - Linly-Talker — 'Interactive Dialogue with Your Virtual Self'

Linly-Talker WebUI


English | 中文简体

2023.12 Update 📆

Users can upload any images for the conversation

2024.01 Update 📆📆

  • Exciting news! I've now incorporated both the powerful GeminiPro and Qwen large models into our conversational scene. Users can now upload images during the conversation, adding a whole new dimension to the interactions.
  • The deployment invocation method for FastAPI has been updated.
  • The advanced settings options for Microsoft TTS have been updated, increasing the variety of voice types. Additionally, video subtitles have been introduced to enhance visualization.
  • Updated the GPT multi-turn conversation system to establish contextual connections in dialogue, enhancing the interactivity and realism of the digital persona.

2024.02 Update 📆

  • Updated Gradio to the latest version 4.16.0, providing the interface with additional functionalities such as capturing images from the camera to create digital personas, among others.
  • ASR and THG have been updated. FunASR from Alibaba has been integrated into ASR, enhancing its speed significantly. Additionally, the THG section now incorporates the Wav2Lip model, while ER-NeRF is currently in preparation (Coming Soon).
  • I have incorporated the GPT-SoVITS model, which is a voice cloning method. By fine-tuning it with just one minute of a person's speech data, it can effectively clone their voice. The results are quite impressive and worth recommending.
  • I have integrated a web user interface (WebUI) that allows for better execution of Linly-Talker.

2024.04 Update 📆

  • Added an offline TTS option based on PaddleTTS (Edge TTS still requires an internet connection).
  • Updated ER-NeRF as one of the choices for Avatar generation.
  • Updated app_talk.py to allow for the free upload of voice and images/videos for generation without being based on a dialogue scenario.

2024.05 Update 📆

  • Updated the beginner-friendly AutoDL deployment tutorial, and also updated the codewithgpu image, allowing for one-click experience and learning.
  • Updated WebUI.py: Linly-Talker WebUI now supports multiple modules, multiple models, and multiple options

2024.06 Update 📆

  • Integrated MuseTalk into Linly-Talker and updated the WebUI, enabling basic real-time conversation capabilities.
  • The refined WebUI defaults to not loading the LLM model to reduce GPU memory usage. It directly responds with text to complete voiceovers. The enhanced WebUI features three main functions: personalized character generation, multi-turn intelligent dialogue with digital humans, and real-time MuseTalk conversations. These improvements reduce previous GPU memory redundancies and add more prompts to assist users effectively.

2024.08 Update 📆

  • Updated CosyVoice to offer high-quality text-to-speech (TTS) functionality and voice cloning capabilities; also upgraded to Wav2Lipv2 to enhance overall performance.

Content

Introduction

Linly-Talker is an innovative digital human conversation system that integrates the latest artificial intelligence technologies, including Large Language Models (LLM) 🤖, Automatic Speech Recognition (ASR) 🎙️, Text-to-Speech (TTS) 🗣️, and voice cloning technology 🎤. This system offers an interactive web interface through the Gradio platform 🌐, allowing users to upload images 📷 and engage in personalized dialogues with AI 💬.

The core features of the system include:

  1. Multi-Model Integration: Linly-Talker combines large language models such as Linly, GeminiPro, and Qwen with speech and visual models such as Whisper and SadTalker to achieve high-quality dialogue and visual generation.
  2. Multi-Turn Conversational Ability: Through the multi-turn dialogue system powered by GPT models, Linly-Talker can understand and maintain contextually relevant and coherent conversations, significantly enhancing the authenticity of the interaction.
  3. Voice Cloning: Utilizing technologies like GPT-SoVITS, users can upload a one-minute voice sample for fine-tuning, and the system will clone the user's voice, enabling the digital human to converse in the user's voice.
  4. Real-Time Interaction: The system supports real-time speech recognition and video captioning, allowing users to communicate naturally with the digital human via voice.
  5. Visual Enhancement: With digital human generation technologies, Linly-Talker can create realistic digital human avatars, providing a more immersive experience.

The design philosophy of Linly-Talker is to create a new form of human-computer interaction that goes beyond simple Q&A. By integrating advanced technologies, it offers an intelligent digital human capable of understanding, responding to, and simulating human communication.

The system architecture of multimodal human–computer interaction.

You can watch the demo video here.

I have recorded a series of videos on Bilibili, which also represent every step of my updates and methods of use. For detailed information, please refer to Digital Human Dialogue System - Linly-Talker Collection.

TO DO LIST

  • Completed the basic conversation system flow, capable of voice interactions.
  • Integrated the LLM large model, including the usage of Linly, Qwen, and GeminiPro.
  • Enabled the ability to upload any digital person's photo for conversation.
  • Integrated FastAPI invocation for Linly.
  • Utilized Microsoft TTS with advanced options, allowing customization of voice and tone parameters to enhance audio diversity.
  • Added subtitles to video generation for improved visualization.
  • GPT Multi-turn Dialogue System (Enhance the interactivity and realism of digital entities, bolstering their intelligence)
  • Optimized the Gradio interface by incorporating additional models such as Wav2Lip, FunASR, and others.
  • Voice Cloning Technology (Synthesize one's own voice using voice cloning to enhance the realism and interactive experience of digital entities)
  • Integrate offline TTS (Text-to-Speech) along with NeRF-based methods and models.
  • Linly-Talker WebUI supports multiple modules, multiple models, and multiple options
  • Added MuseTalk functionality to Linly-Talker, achieving near real-time speed with very fast communication.
  • Integrated MuseTalk into the Linly-Talker WebUI.
  • Added CosyVoice, which provides high-quality text-to-speech (TTS) functionality and voice cloning capabilities. Additionally, updated to Wav2Lipv2 to enhance image quality effects.
  • Real-time Speech Recognition (Enable conversation and communication between humans and digital entities using voice)

🔆 The Linly-Talker project is ongoing - pull requests are welcome! If you have any suggestions regarding new model approaches, research, techniques, or if you discover any runtime errors, please feel free to edit and submit a pull request. You can also open an issue or contact me directly via email. 📩⭐ If you find this repository useful, please give it a star! 🤩

If you encounter any issues during deployment, please consult the Common Issues Summary section, where I have compiled a list of all potential problems. Additionally, a discussion group is available here, and I will provide regular updates. Thank you for your attention and use of Linly-Talker!

Example

Text/voice prompt and the corresponding digital human answer video:

  • 应对压力最有效的方法是什么? (What is the most effective way to cope with stress?) → example_answer1.mp4
  • 如何进行时间管理? (How do I manage my time?) → example_answer2.mp4
  • 撰写一篇交响乐音乐会评论,讨论乐团的表演和观众的整体体验。 (Write a review of a symphony concert, discussing the orchestra's performance and the overall audience experience.) → example_answer3.mp4
  • 翻译成中文:Luck is a dividend of sweat. The more you sweat, the luckier you get. (Translate into Chinese.) → example_answer4.mp4

Setup Environment

AutoDL has released an image, which can be used directly at https://www.codewithgpu.com/i/Kedreamix/Linly-Talker/Kedreamix-Linly-Talker. You can also create an environment directly using Docker. I will continue to update the image.

docker pull registry.cn-beijing.aliyuncs.com/codewithgpu2/kedreamix-linly-talker:cMDvNE4RYl

For Windows, an all-in-one Python package is provided: run its steps in sequence to install the necessary dependencies and download the corresponding models to get it running. Otherwise, follow the conda instructions and start installing PyTorch from step 02. If you encounter any issues, please feel free to contact me.

Windows All-in-One Package

Download the code:

git clone https://github.com/Kedreamix/Linly-Talker.git --depth 1

cd Linly-Talker
git submodule update --init --recursive

The environment can be set up directly with Anaconda, which covers almost all of the dependencies required by the models. The specific steps are as follows:

conda create -n linly python=3.8
conda activate linly

# PyTorch installation method 1: Install via conda
# CUDA 11.7
# conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
# CUDA 11.8
# conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.8 -c pytorch -c nvidia

# PyTorch installation method 2: Install via pip
# CUDA 11.7
# pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
# CUDA 11.8
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

conda install -q ffmpeg==4.2.2 # ffmpeg==4.2.2

# Upgrade pip
python -m pip install --upgrade pip
# Change the PyPI source to speed up the installation of packages
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

pip install tb-nightly -i https://mirrors.aliyun.com/pypi/simple
pip install -r requirements_webui.txt

# Install dependencies related to musetalk
pip install --no-cache-dir -U openmim
mim install mmengine 
mim install "mmcv==2.1.0" 
mim install "mmdet>=3.1.0" 
mim install "mmpose>=1.1.0" 

# ⚠️ Note: You must first download CosyVoice-ttsfrd. Complete the model download before proceeding with these steps.
mkdir -p CosyVoice/pretrained_models # Create directory CosyVoice/pretrained_models
mv checkpoints/CosyVoice_ckpt/CosyVoice-ttsfrd CosyVoice/pretrained_models # Move directory
unzip CosyVoice/pretrained_models/CosyVoice-ttsfrd/resource.zip # Unzip
# This .whl library is only compatible with Python 3.8
pip install CosyVoice/pretrained_models/CosyVoice-ttsfrd/ttsfrd-0.3.6-cp38-cp38-linux_x86_64.whl

# Install NeRF-based dependencies, which might have several issues and can be skipped initially
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
# If you encounter problems installing PyTorch3D, you can use the following command to install it:
# python scripts/install_pytorch3d.py
pip install -r TFG/requirements_nerf.txt

# If you encounter issues with pyaudio
sudo apt-get update
sudo apt-get install libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0

# Note the following modules. If installation fails, you can enter the directory and use pip install . or python setup.py install to compile and install:
# NeRF/freqencoder
# NeRF/gridencoder
# NeRF/raymarching
# NeRF/shencoder

# If you encounter sox compatibility issues
# ubuntu
sudo apt-get install sox libsox-dev
# centos
sudo yum install sox sox-devel

Below are some older installation methods. They may cause dependency conflicts, although they generally don't produce many bugs. For an easier and more reliable installation, use the updated instructions above; the versions below are kept only for reference if you run into issues.

To install the environment using Anaconda and PyTorch, follow the steps below:

conda create -n linly python=3.10
conda activate linly

# PyTorch Installation Method 1: Conda Installation (Recommended)
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch

# PyTorch Installation Method 2: Pip Installation
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install -q ffmpeg # ffmpeg==4.2.2

pip install -r requirements_app.txt

If you want to use features such as voice cloning, you may need a newer version of PyTorch and a CUDA 11.8 driver, but more functionality becomes available in exchange. Choose the version that suits your setup.

conda create -n linly python=3.10  
conda activate linly

pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

conda install -q ffmpeg # ffmpeg==4.2.2

pip install -r requirements_app.txt

# Install dependencies for voice cloning
pip install -r VITS/requirements_gptsovits.txt

If you wish to use NeRF-based models, you may need to set up the corresponding environment:

# Install dependencies for NeRF
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install -r TFG/requirements_nerf.txt

# If there are issues with PyAudio, you can install the corresponding dependencies
# sudo apt-get install libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0

# Note the following modules. If installation is unsuccessful, you can navigate to the path and use pip install . or python setup.py install to compile and install.
# NeRF/freqencoder
# NeRF/gridencoder
# NeRF/raymarching
# NeRF/shencoder

If you are using PaddleTTS, you can set up the corresponding environment with:

pip install -r TTS/requirements_paddle.txt

If you are using the FunASR speech recognition model, you can install the environment with:

pip install -r ASR/requirements_funasr.txt

If using the MuseTalk model, you can set up the environment with the following commands:

pip install --no-cache-dir -U openmim 
mim install mmengine 
mim install "mmcv>=2.0.1" 
mim install "mmdet>=3.1.0" 
mim install "mmpose>=1.1.0" 
pip install -r TFG/requirements_musetalk.txt 

Next, you need to install the corresponding models. You can download them using the following methods. Once downloaded, place the files in the specified folder structure (explained at the end of this document).

We recommend downloading from Quark Netdisk for the latest updates.

I made a script that can download all the models mentioned below without requiring much input from the user. This method is suitable for stable network conditions, especially for Linux users. For Windows users, Git can also be used to download the models. If the network connection is unstable, users can choose to manually download the models or try running a Shell script to complete the download. The script has the following features:

  1. Choose Download Method: Users can choose to download models from three different sources: ModelScope, Huggingface, or Huggingface mirror site.
  2. Download Models: Based on the user's selection, the script executes the corresponding download command.
  3. Move Model Files: After downloading, the script moves the model files to the specified directory.
  4. Error Handling: Error checks are included in each step of the operation. If any step fails, the script will output an error message and stop execution.
sh scripts/download_models.sh

HuggingFace Download

If the download speed is too slow, consider using a mirror site. For more information, refer to Efficiently Obtain Hugging Face Models Using Mirror Sites.

# Download pre-trained models from HuggingFace
git lfs install
git clone https://huggingface.co/Kedreamix/Linly-Talker --depth 1
# git lfs clone https://huggingface.co/Kedreamix/Linly-Talker --depth 1

# pip install -U huggingface_hub
# export HF_ENDPOINT=https://hf-mirror.com # Use a mirror site
huggingface-cli download --resume-download --local-dir-use-symlinks False Kedreamix/Linly-Talker --local-dir Linly-Talker

ModelScope Download

# Download pre-trained models from Modelscope
# 1. Using git
git lfs install
git clone https://www.modelscope.cn/Kedreamix/Linly-Talker.git --depth 1
# git lfs clone https://www.modelscope.cn/Kedreamix/Linly-Talker.git

# 2. Download using Python code
pip install modelscope
from modelscope import snapshot_download
model_dir = snapshot_download('Kedreamix/Linly-Talker')

Move All Models to the Current Directory

If you downloaded from Baidu Netdisk, you can refer to the directory structure at the end of the document to move the models.

# Move all models to the current directory
# Checkpoints contain SadTalker and Wav2Lip
mv Linly-Talker/checkpoints/* ./checkpoints

# Enhanced GFPGAN for SadTalker
# pip install gfpgan
# mv Linly-Talker/gfpan ./

# Voice cloning models
mv Linly-Talker/GPT_SoVITS/pretrained_models/* ./GPT_SoVITS/pretrained_models/

# Qwen large language model
mv Linly-Talker/Qwen ./

# MuseTalk model
mkdir -p ./Musetalk/models
mv Linly-Talker/MuseTalk/* ./Musetalk/models

For the convenience of deployment and usage, a configs.py file is provided. You can modify some hyperparameters in this file for customization:

# Device Running Port
port = 7870

# API Running Port and IP
# Localhost port is 127.0.0.1; for global port forwarding, use "0.0.0.0"
ip = '127.0.0.1'
api_port = 7871

# Linly Model Path
mode = 'api'  # For 'api', Linly-api-fast.py must be run first
mode = 'offline'
model_path = 'Linly-AI/Chinese-LLaMA-2-7B-hf'

# SSL Certificate (required for microphone interaction)
# Preferably an absolute path
ssl_certfile = "./https_cert/cert.pem"
ssl_keyfile = "./https_cert/key.pem"

This file allows you to adjust parameters such as the device running port, API running port, Linly model path, and SSL certificate paths for ease of deployment and configuration.

ASR - Speech Recognition

For detailed information about the usage and code implementation of Automatic Speech Recognition (ASR), please refer to ASR - Bridging the Gap with Digital Humans.

Whisper

To implement ASR (Automatic Speech Recognition) using OpenAI's Whisper, you can refer to the specific usage methods provided in the GitHub repository: https://github.com/openai/whisper
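As a minimal illustration of the openai-whisper API (the file audio.wav is a placeholder, and this is a generic sketch rather than the wrapper used in the ASR folder):

# Minimal openai-whisper sketch (pip install -U openai-whisper)
import whisper

model = whisper.load_model("base")                      # tiny / base / small / medium / large
result = model.transcribe("audio.wav", language="zh")   # "audio.wav" is a placeholder path
print(result["text"])                                   # recognized text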

FunASR

The speech recognition performance of Alibaba's FunASR is quite impressive, and for Chinese it is noticeably better than Whisper. FunASR can also run in real time, making it a great choice. You can try it via the FunASR file in the ASR folder. Please refer to https://github.com/alibaba-damo-academy/FunASR for more information.
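For reference, a small sketch of FunASR's AutoModel interface; the model identifiers below follow the upstream FunASR examples and are assumptions here, while Linly-Talker's own wrapper loads the locally downloaded checkpoints:

# FunASR sketch (pip install funasr); identifiers follow the upstream examples
from funasr import AutoModel

model = AutoModel(model="paraformer-zh",   # ASR model
                  vad_model="fsmn-vad",    # voice activity detection
                  punc_model="ct-punc")    # punctuation restoration
res = model.generate(input="audio.wav")    # "audio.wav" is a placeholder
print(res[0]["text"])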

Coming Soon

Welcome everyone to provide suggestions, motivating me to continuously update the models and enrich the functionality of Linly-Talker.

TTS - Text To Speech

For detailed information about the usage and code implementation of Text-to-Speech (TTS), please refer to TTS - Empowering Digital Humans with Natural Speech Interaction.

Edge TTS

To use Microsoft Edge's online text-to-speech service from Python without needing Microsoft Edge or Windows or an API key, you can refer to the GitHub repository at https://github.com/rany2/edge-tts. It provides a Python module called "edge-tts" that allows you to utilize the service. You can find detailed installation instructions and usage examples in the repository's README file.
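A short sketch of the edge-tts Python module described above; the voice name is just one of the voices reported by edge-tts --list-voices, and the output file name is a placeholder:

# edge-tts sketch (pip install edge-tts); requires an internet connection
import asyncio
import edge_tts

async def main():
    communicate = edge_tts.Communicate("你好,欢迎使用 Linly-Talker。",
                                       voice="zh-CN-XiaoxiaoNeural")
    await communicate.save("answer.mp3")   # write the synthesized speech

asyncio.run(main())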

PaddleTTS

In practical use, there may be scenarios that require offline operation. Since Edge TTS requires an online environment to generate speech, we have chosen PaddleSpeech, another open-source alternative, for Text-to-Speech (TTS). Although there might be some differences in the quality, PaddleSpeech supports offline operations. For more information, you can refer to the GitHub page of PaddleSpeech: https://github.com/PaddlePaddle/PaddleSpeech.
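A minimal sketch of offline synthesis with PaddleSpeech's CLI executor, assuming the paddlespeech package from TTS/requirements_paddle.txt is installed; the project's own PaddleTTS wrapper may expose different options:

# PaddleSpeech TTS sketch; runs offline once the default models are cached
from paddlespeech.cli.tts.infer import TTSExecutor

tts = TTSExecutor()
tts(text="欢迎使用 Linly-Talker 数字人对话系统。", output="paddle_tts.wav")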

Coming Soon

Welcome everyone to provide suggestions, motivating me to continuously update the models and enrich the functionality of Linly-Talker.

Voice Clone

For detailed information about the usage and code implementation of Voice Clone, please refer to Voice Clone - Stealing Your Voice Quietly During Conversations.

GPT-SoVITS (Recommended)

Thanks to the authors for open-sourcing GPT-SoVITS; its voice cloning results are quite impressive. You can find the project at https://github.com/RVC-Boss/GPT-SoVITS.

XTTS

Coqui XTTS is a leading deep learning toolkit for Text-to-Speech (TTS) tasks, allowing for voice cloning and voice transfer to different languages using a 5-second or longer audio clip.

🐸 TTS is a library for advanced text-to-speech generation.

🚀 Over 1100 pre-trained models for various languages.

🛠️ Tools for training new models and fine-tuning existing models in any language.

📚 Utility programs for dataset analysis and management.
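As a hedged sketch of voice cloning with Coqui TTS's XTTS v2 model (speaker_sample.wav is a placeholder for the 5-second-or-longer reference clip, not a file shipped with this repository):

# Coqui TTS / XTTS v2 sketch (pip install TTS)
from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(text="Hello, this is a cloned voice.",
                speaker_wav="speaker_sample.wav",   # >= 5 s reference audio
                language="en",
                file_path="xtts_clone.wav")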

CosyVoice

CosyVoice is an open-source multilingual speech understanding model developed by Alibaba’s Tongyi Lab, focusing on high-quality speech synthesis. The model has been trained on over 150,000 hours of data and supports speech synthesis in multiple languages, including Chinese, English, Japanese, Cantonese, and Korean. CosyVoice excels in multilingual speech generation, zero-shot voice generation, cross-lingual voice synthesis, and command execution capabilities.

CosyVoice supports one-shot voice cloning technology, enabling the generation of realistic and natural-sounding voices with details such as prosody and emotion using only 3 to 10 seconds of original audio.

GitHub project link: CosyVoice GitHub
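For orientation, a sketch adapted from the upstream CosyVoice README for the pre-trained-voice (SFT) mode; the exact API and returned structure differ between CosyVoice releases, so treat this as an assumption rather than Linly-Talker's own code:

# CosyVoice SFT-mode sketch, adapted from the upstream README
import torchaudio
from cosyvoice.cli.cosyvoice import CosyVoice

cosyvoice = CosyVoice('pretrained_models/CosyVoice-300M-SFT')
output = cosyvoice.inference_sft('你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?', '中文女')
torchaudio.save('sft.wav', output['tts_speech'], 22050)   # sample rate per upstream docs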

CosyVoice includes several pre-trained speech synthesis models, mainly:

  1. CosyVoice-300M: Supports zero-shot and cross-lingual speech synthesis in Chinese, English, Japanese, Cantonese, Korean, and other languages.
  2. CosyVoice-300M-SFT: A model focused on supervised fine-tuning (SFT) inference.
  3. CosyVoice-300M-Instruct: A model that supports command-based inference, capable of generating speech with specific tones, emotions, and other elements.

Key Features

  1. Multilingual Support: Capable of handling various languages including Chinese, English, Japanese, Cantonese, and Korean.
  2. Multi-style Speech Synthesis: Allows control over the tone and emotion of the generated speech through commands.
  3. Streaming Inference Support: Future updates will include streaming inference modes, such as KV caching and SDPA, for real-time optimization.

Currently, Linly-Talker integrates three features from CosyVoice: pre-trained voice cloning, 3s rapid cloning, and cross-lingual cloning. Stay tuned for more exciting updates on Linly-Talker. Below are some examples of CosyVoice's capabilities:

Examples (prompt text, prompt speech, target text, result):

  • Pre-trained Voice: voice 中文女 (available voices: '中文女', '中文男', '日语男', '粤语女', '英文女', '英文男', '韩语女'); target text 你好,我是通义生成式语音大模型,请问有什么可以帮您的吗?; result: sft.webm
  • 3s Language Cloning: prompt text 希望你以后能够做的比我还好呦。; prompt speech zero_shot_prompt.webm; target text 收到好友从远方寄来的生日礼物,那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐,笑容如花儿般绽放。; result: zero_shot.webm
  • Cross-lingual Cloning: prompt text 在那之后,完全收购那家公司,因此保持管理层的一致性,利益与即将加入家族的资产保持一致。这就是我们有时不买下全部的原因。; prompt speech cross_lingual_prompt.webm; target text <|en|>And then later on, fully acquiring that company. So keeping management in line, interest in line with the asset that's coming into the family is a reason why sometimes we don't buy the whole thing.; result: cross_lingual.webm

Coming Soon

Welcome everyone to provide suggestions, motivating me to continuously update the models and enrich the functionality of Linly-Talker.

THG - Avatar

Detailed information about the usage and code implementation of digital human generation can be found in THG - Building Intelligent Digital Humans.

SadTalker

Digital persona generation can utilize SadTalker (CVPR 2023). For detailed information, please visit https://sadtalker.github.io.

Before usage, download the SadTalker model:

bash scripts/sadtalker_download_models.sh  

Baidu (百度云盘) (Password: linl)

Quark(夸克网盘)

If downloading from Baidu Cloud, remember to place it in the checkpoints folder. The model downloaded from Baidu Cloud is named sadtalker by default, but it should be renamed to checkpoints.

Wav2Lip

Digital persona generation can also utilize Wav2Lip (ACM 2020). For detailed information, refer to https://github.com/Rudrabha/Wav2Lip.

Before usage, download the Wav2Lip model:

Models and download links:

  • Wav2Lip: highly accurate lip-sync (Link)
  • Wav2Lip + GAN: slightly inferior lip-sync, but better visual quality (Link)
  • Expert Discriminator: weights of the expert discriminator (Link)
  • Visual Quality Discriminator: weights of the visual quality discriminator trained in a GAN setup (Link)

Wav2Lipv2

Inspired by the repository https://github.com/primepake/wav2lip_288x288, Wav2Lipv2 uses a newly trained 288 model to achieve higher quality results.

Additionally, by employing YOLO for facial detection, the overall effect is improved. You can compare and test the results in Linly-Talker. The model has been updated, and the comparison is as follows:

Comparison videos: Wav2Lip (Wav2Lip.mp4) vs. Wav2Lipv2 (Wav2Lipv2.mp4)

ER-NeRF

ER-NeRF (ICCV 2023) is a digital human built using the latest NeRF technology. It allows for the customization of digital characters and can reconstruct them using just a five-minute video of a person. For more details, please refer to https://github.com/Fictionarry/ER-NeRF.

Updated: Following the Obama example, consider cloning and customizing the digital persona's voice for better results.

MuseTalk

MuseTalk is a real-time, high-quality audio-driven lip synchronization model capable of running at over 30 frames per second on an NVIDIA Tesla V100 GPU. This model can be integrated with input videos generated by MuseV, forming a part of a comprehensive virtual human solution. For more details, please refer to https://github.com/TMElyralab/MuseTalk.

MuseTalk is trained to operate within the latent space of ft-mse-vae and offers the following features:

  • Unseen Face Synchronization: It can modify unseen faces based on input audio, with a face region size of 256 x 256.
  • Multi-language Support: Supports audio inputs in various languages, including Chinese, English, and Japanese.
  • High-performance Real-time Inference: Achieves real-time inference at over 30 frames per second on an NVIDIA Tesla V100.
  • Facial Center Point Adjustment: Allows the adjustment of the facial region's center point, significantly impacting the generated results.
  • HDTF Dataset Training: Provides model checkpoints trained on the HDTF dataset.
  • Upcoming Training Code Release: Training code will be released soon, facilitating further development and research.

MuseTalk offers an efficient and versatile tool for precise audio synchronization with facial expressions in virtual humans, marking a significant step towards fully interactive virtual personas.

In Linly-Talker, MuseTalk has been integrated to perform inference on videos based on MuseV, achieving an ideal speed for conversations with near real-time performance. This approach works very well and supports streaming-based inference.

Coming Soon

Welcome everyone to provide suggestions, motivating me to continuously update the models and enrich the functionality of Linly-Talker.

LLM - Conversation

For detailed information about the usage and code implementation of Large Language Models (LLM), please refer to LLM - Empowering Digital Humans with Powerful Language Models.

Linly-AI

Linly-AI is a large language model developed by CVI at Shenzhen University. You can find more information about Linly-AI on their GitHub repository: https://github.com/CVI-SZU/Linly

Download Linly models: https://huggingface.co/Linly-AI/Chinese-LLaMA-2-7B-hf

You can use git to download:

git lfs install
git clone https://huggingface.co/Linly-AI/Chinese-LLaMA-2-7B-hf

Alternatively, you can use the huggingface download tool huggingface-cli:

pip install -U huggingface_hub

# Set up mirror acceleration
# Linux
export HF_ENDPOINT="https://hf-mirror.com"
# Windows PowerShell
$env:HF_ENDPOINT="https://hf-mirror.com"

huggingface-cli download --resume-download Linly-AI/Chinese-LLaMA-2-7B-hf --local-dir Linly-AI/Chinese-LLaMA-2-7B-hf
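Once downloaded, the model can be loaded with Hugging Face Transformers. A minimal generation sketch follows; the prompt formatting is simplified and is not necessarily what Linly-Talker's LLM wrapper uses:

# Minimal Transformers sketch for Chinese-LLaMA-2-7B-hf
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Linly-AI/Chinese-LLaMA-2-7B-hf"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             torch_dtype=torch.float16,
                                             device_map="auto")

prompt = "应对压力最有效的方法是什么?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))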

Qwen

Qwen is an AI model developed by Alibaba Cloud. You can check out the GitHub repository for Qwen here: https://github.com/QwenLM/Qwen

If you want to quickly use Qwen, you can choose the 1.8B model, which has fewer parameters and can run smoothly even with limited GPU memory. Of course, this part can be replaced with other options.

You can download the Qwen 1.8B model from this link: https://huggingface.co/Qwen/Qwen-1_8B-Chat

You can use git to download:

git lfs install
git clone https://huggingface.co/Qwen/Qwen-1_8B-Chat

Alternatively, you can use the huggingface download tool huggingface-cli:

pip install -U huggingface_hub

# Set up mirror acceleration
# Linux
export HF_ENDPOINT="https://hf-mirror.com"
# Windows PowerShell
$env:HF_ENDPOINT="https://hf-mirror.com"

huggingface-cli download --resume-download Qwen/Qwen-1_8B-Chat --local-dir Qwen/Qwen-1_8B-Chat
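Qwen-1.8B-Chat exposes a chat helper on the model object. The sketch below follows the upstream Qwen README; trust_remote_code is required because the modeling code ships inside the checkpoint repository:

# Qwen-1_8B-Chat sketch following the upstream Qwen README
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "Qwen/Qwen-1_8B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto",
                                             trust_remote_code=True).eval()

# model.chat keeps multi-turn context: pass the returned history back in
response, history = model.chat(tokenizer, "如何进行时间管理?", history=None)
print(response)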

Gemini-Pro

Gemini-Pro is an AI model developed by Google. To learn more about Gemini-Pro, you can visit their website: https://deepmind.google/technologies/gemini/

If you want to request an API key for Gemini-Pro, you can visit this link: https://makersuite.google.com/
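With an API key, the google-generativeai package can be used roughly as follows; the model name gemini-pro matches Google's documentation at the time of writing, and the key value is a placeholder:

# google-generativeai sketch (pip install google-generativeai)
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")          # key from makersuite.google.com
model = genai.GenerativeModel("gemini-pro")
response = model.generate_content("应对压力最有效的方法是什么?")
print(response.text)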

ChatGPT

Provided by OpenAI; an API key is required. For more information, please visit https://platform.openai.com/docs/introduction.
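A minimal sketch with the official openai Python package (v1-style client); the model name is only an example and the key is a placeholder:

# OpenAI API sketch (pip install openai, 1.x client)
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")
completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "如何进行时间管理?"}],
)
print(completion.choices[0].message.content)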

ChatGLM

From Tsinghua University, for more information please visit https://github.com/THUDM/ChatGLM3.

GPT4Free

For free access to GPT-4 and other models, you can refer to https://github.com/xtekky/gpt4free. This resource provides methods to utilize these models without cost.
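The g4f package's interface changes frequently, so the following is only a rough sketch of its documented usage and may need adjusting to the installed version:

# gpt4free (g4f) sketch; adjust to the version you have installed
import g4f

response = g4f.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, who are you?"}],
)
print(response)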

LLM Multiple Model Selection

In the webui.py file, easily select the model you need. ⚠️ For the first run, make sure to download the model first. Refer to Qwen1.8B.

Coming Soon

Welcome everyone to provide suggestions, motivating me to continuously update the models and enrich the functionality of Linly-Talker.

Optimizations

Some optimizations:

  • Use fixed input face images, extract features beforehand to avoid reading each time
  • Remove unnecessary libraries to reduce total time
  • Only save final video output, don't save intermediate results to improve performance
  • Use OpenCV to generate final video instead of mimwrite for faster runtime

Gradio

Gradio is a Python library that provides an easy way to deploy machine learning models as interactive web apps.

For Linly-Talker, Gradio serves two main purposes:

  1. Visualization & Demo: Gradio provides a simple web GUI for the model, allowing users to see the results intuitively by uploading an image and entering text. This is an effective way to showcase the capabilities of the system.

  2. User Interaction: The Gradio GUI can serve as a frontend to allow end users to interact with Linly-Talker. Users can upload their own images and ask arbitrary questions or have conversations to get real-time responses. This provides a more natural speech interaction method.

Specifically, we create a Gradio Interface in app.py that takes image and text inputs, calls our function to generate the response video, and displays it in the GUI. This enables browser interaction without needing to build a complex frontend.
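As an illustrative sketch only (the function and component labels below are placeholders, not the actual signatures in app.py), such an Interface looks roughly like this:

# Hypothetical Gradio sketch mirroring the description above; the real app.py
# wires the ASR/LLM/TTS/talking-head pipeline into the callback instead.
import gradio as gr

def generate_answer_video(image_path, question):
    # Placeholder: run the LLM, synthesize speech, render the talking head,
    # and return the path of the generated answer video.
    return "results/answer.mp4"

demo = gr.Interface(
    fn=generate_answer_video,
    inputs=[gr.Image(type="filepath", label="Avatar image"),
            gr.Textbox(label="Question")],
    outputs=gr.Video(label="Digital human answer"),
)
demo.launch()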

In summary, Gradio provides visualization and user interaction interfaces for Linly-Talker, serving as effective means for showcasing system capabilities and enabling end users.

If considering real-time conversation, it may be necessary to switch to a different framework or customize Gradio. Looking forward to working together with everyone.

Start WebUI

Previously, I had separated many versions, but it became cumbersome to run multiple versions. Therefore, I have added a WebUI feature to provide a single interface for a seamless experience. I will continue to update it in the future.

WebUI

The current features available in the WebUI are as follows:

  • Text/Voice-based dialogue with virtual characters (fixed characters with male and female roles)

  • Dialogue with virtual characters using any image (you can upload any character image)

  • Multi-turn GPT dialogue (incorporating historical dialogue data to maintain context)

  • Voice cloning dialogue (voice cloning based on GPT-SoVITS, including a built-in smoky voice; the cloned voice is used for the digital human's replies)

  • Digital Persona Text/Voice Playback (based on input text/voice)

  • Multiple modules➕Multiple models➕Multiple choices

    • Multiple role selections: Female/Male/Custom (each part can automatically upload images) Coming Soon
    • Multiple TTS model selections: EdgeTTS / PaddleTTS / GPT-SoVITS / CosyVoice / Coming Soon
    • Multiple LLM model selections: Linly / Qwen / ChatGLM / GeminiPro / ChatGPT / Coming Soon
    • Multiple Talker model selections: Wav2Lip / Wav2Lipv2 / SadTalker / ERNeRF / MuseTalk/ Coming Soon
    • Multiple ASR model selections: Whisper / FunASR / Coming Soon

You can directly run the web UI to obtain results. The page you will see is as follows:

# WebUI
python webui.py

This time, we've updated the interface. We can freely select the fine-tuned model of GPT-SoVITS to achieve voice cloning. Simply upload a reference audio file to clone the voice.

Old Version

Several startup modes are available; choose the one that fits your scenario.

The first mode involves fixed Q&A with a predefined character, eliminating preprocessing time.

python app.py

The first mode has recently been updated to include the Wav2Lip model for dialogue.

python appv2.py

The second mode allows for conversing with any uploaded image.

python app_img.py

The third mode builds upon the first one by incorporating a large language model for multi-turn GPT conversations.

python app_multi.py

Now, the part of voice cloning has been added, allowing for freely switching between cloned voice models and corresponding person images. Here, I have chosen a deep, smoky voice and an image of a male.

python app_vits.py

A fourth mode has been added that is not tied to a dialogue scenario: you can directly input voice, or generate voice, to create a digital human. It incorporates methods such as SadTalker, Wav2Lip, and ER-NeRF.

ER-NeRF is trained on videos of a single individual, so a specific model needs to be replaced to render and obtain the correct results. It comes with pre-installed weights for Obama, which can be used directly with the following command:

python app_talk.py

MuseTalk has been integrated into Linly-Talker, enabling efficient preprocessing of MuseV-generated videos. Once preprocessed, these videos facilitate conversations at speeds that meet near real-time requirements, providing very fast performance. MuseTalk is now available within the WebUI.

To run the application, use the following command:

python app_musetalk.py

WebUI Screenshot

Folder structure

The folder structure of the weight files is as follows:

  • Baidu (百度云盘): You can download the weights from here (Password: linl).
  • huggingface: You can access the weights at this link.
  • modelscope: The weights will be available soon at this link.
  • Quark (夸克网盘): You can download the weights from here.
Linly-Talker/ 
├── checkpoints
│   ├── audio_visual_encoder.pth
│   ├── hub
│   │   └── checkpoints
│   │       └── s3fd-619a316812.pth
│   ├── lipsync_expert.pth
│   ├── mapping_00109-model.pth.tar
│   ├── mapping_00229-model.pth.tar
│   ├── May.json
│   ├── May.pth
│   ├── Obama_ave.pth
│   ├── Obama.json
│   ├── Obama.pth
│   ├── ref_eo.npy
│   ├── ref.npy
│   ├── ref.wav
│   ├── SadTalker_V0.0.2_256.safetensors
│   ├── visual_quality_disc.pth
│   ├── wav2lip_gan.pth
│   └── wav2lip.pth
├── gfpgan
│   └── weights
│       ├── alignment_WFLW_4HG.pth
│       └── detection_Resnet50_Final.pth
├── GPT_SoVITS
│   └── pretrained_models
│       ├── chinese-hubert-base
│       │   ├── config.json
│       │   ├── preprocessor_config.json
│       │   └── pytorch_model.bin
│       ├── chinese-roberta-wwm-ext-large
│       │   ├── config.json
│       │   ├── pytorch_model.bin
│       │   └── tokenizer.json
│       ├── README.md
│       ├── s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt
│       ├── s2D488k.pth
│       ├── s2G488k.pth
│       └── speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
├── MuseTalk
│   ├── models
│   │   ├── dwpose
│   │   │   └── dw-ll_ucoco_384.pth
│   │   ├── face-parse-bisent
│   │   │   ├── 79999_iter.pth
│   │   │   └── resnet18-5c106cde.pth
│   │   ├── musetalk
│   │   │   ├── musetalk.json
│   │   │   └── pytorch_model.bin
│   │   ├── README.md
│   │   ├── sd-vae-ft-mse
│   │   │   ├── config.json
│   │   │   └── diffusion_pytorch_model.bin
│   │   └── whisper
│   │       └── tiny.pt
├── Qwen
│   └── Qwen-1_8B-Chat
│       ├── assets
│       │   ├── logo.jpg
│       │   ├── qwen_tokenizer.png
│       │   ├── react_showcase_001.png
│       │   ├── react_showcase_002.png
│       │   └── wechat.png
│       ├── cache_autogptq_cuda_256.cpp
│       ├── cache_autogptq_cuda_kernel_256.cu
│       ├── config.json
│       ├── configuration_qwen.py
│       ├── cpp_kernels.py
│       ├── examples
│       │   └── react_prompt.md
│       ├── generation_config.json
│       ├── LICENSE
│       ├── model-00001-of-00002.safetensors
│       ├── model-00002-of-00002.safetensors
│       ├── modeling_qwen.py
│       ├── model.safetensors.index.json
│       ├── NOTICE
│       ├── qwen_generation_utils.py
│       ├── qwen.tiktoken
│       ├── README.md
│       ├── tokenization_qwen.py
│       └── tokenizer_config.json
├── Whisper
│   ├── base.pt
│   └── tiny.pt
├── FunASR
│   ├── punc_ct-transformer_zh-cn-common-vocab272727-pytorch
│   │   ├── configuration.json
│   │   ├── config.yaml
│   │   ├── example
│   │   │   └── punc_example.txt
│   │   ├── fig
│   │   │   └── struct.png
│   │   ├── model.pt
│   │   ├── README.md
│   │   └── tokens.json
│   ├── speech_fsmn_vad_zh-cn-16k-common-pytorch
│   │   ├── am.mvn
│   │   ├── configuration.json
│   │   ├── config.yaml
│   │   ├── example
│   │   │   └── vad_example.wav
│   │   ├── fig
│   │   │   └── struct.png
│   │   ├── model.pt
│   │   └── README.md
│   └── speech_seaco_paraformer_large_asr_nat-zh-cn-16k-common-vocab8404-pytorch
│       ├── am.mvn
│       ├── asr_example_hotword.wav
│       ├── configuration.json
│       ├── config.yaml
│       ├── example
│       │   ├── asr_example.wav
│       │   └── hotword.txt
│       ├── fig
│       │   ├── res.png
│       │   └── seaco.png
│       ├── model.pt
│       ├── README.md
│       ├── seg_dict
│       └── tokens.json
└── README.md

Reference

ASR

TTS

LLM

THG

Voice Clone

Star History

Star History Chart

linly-talker's People

Contributors

kaixindelele, kedreamix, yarkable


linly-talker's Issues

run app_img.py error!

config.py unchanged.
import gradio as gr
ValueError: Unknown scheme for proxy URL URL('socks://127.0.0.1:7890/')
Looking forward to your reply on how to resolve this issue.

Potential Code Injection Vulnerability

Code injection could happen via an environment variable.
In the code here, the value from an environment variable is passed directly to eval. A malicious local actor could set something like export is_half='os.system("touch rickroll")' to execute arbitrary commands. It would be better to use ast.literal_eval here.
For ref, this issue is similar to CVE-2022-2054.
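For illustration, a sketch of the safer pattern suggested in this issue; the is_half variable name comes from the example above, and the surrounding code is hypothetical:

# Safe parsing of an environment variable instead of eval()
import ast
import os

# ast.literal_eval only accepts Python literals (True/False, numbers, strings,
# lists, ...), so an injected expression raises ValueError instead of running.
is_half = ast.literal_eval(os.environ.get("is_half", "True"))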

Error of “python app.py”

(linlytalker) PS D:\Python\Linly-Talker>python app_multi.py
如果想使用FunASR,请先安装funasr,若使用Whisper,请忽略此条信息
PaddleTTS Error: No module named 'paddlespeech'
如果使用PaddleTTS,请先安装PaddleTTS环境
pip install -r requirements_paddle.txt
ERNeRF导入失败,原因: No module named 'trimesh'
使用ERNeRF前需要安装对应的环境
The model is automatically converting to bf16 for faster inference. If you want to disable the automatic precision, please manually add bf16/fp16/fp32=True to "AutoModelForCausalLM.from_pretrained".
Try importing flash-attention for faster inference...
Warning: import flash_attn rotary fail, please install FlashAttention rotary to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/rotary
Warning: import flash_attn rms_norm fail, please install FlashAttention layer_norm to get higher efficiency https://github.com/Dao-AILab/flash-attention/tree/main/csrc/layer_norm
Warning: import flash_attn fail, please install FlashAttention to get higher efficiency https://github.com/Dao-AILab/flash-attention
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:02<00:00, 1.23s/it]
using safetensor as default
Running on local URL: https://127.0.0.1:7860
Traceback (most recent call last):
File "D:\Python\envs\linlytalker\lib\site-packages\httpx_transports\default.py", line 69, in map_httpcore_exceptions
yield
File "D:\Python\envs\linlytalker\lib\site-packages\httpx_transports\default.py", line 233, in handle_request
resp = self._pool.handle_request(req)
File "D:\Python\envs\linlytalker\lib\site-packages\httpcore_sync\connection_pool.py", line 216, in handle_request
raise exc from None
File "D:\Python\envs\linlytalker\lib\site-packages\httpcore_sync\connection_pool.py", line 196, in handle_request
response = connection.handle_request(
File "D:\Python\envs\linlytalker\lib\site-packages\httpcore_sync\http_proxy.py", line 317, in handle_request
stream = stream.start_tls(**kwargs)
File "D:\Python\envs\linlytalker\lib\site-packages\httpcore_sync\http11.py", line 383, in start_tls
return self._stream.start_tls(ssl_context, server_hostname, timeout)
File "D:\Python\envs\linlytalker\lib\site-packages\httpcore_backends\sync.py", line 152, in start_tls
with map_exceptions(exc_map):
File "D:\Python\envs\linlytalker\lib\contextlib.py", line 153, in exit
self.gen.throw(typ, value, traceback)
File "D:\Python\envs\linlytalker\lib\site-packages\httpcore_exceptions.py", line 14, in map_exceptions
raise to_exc(exc) from exc
httpcore.ConnectError: [SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "D:\Python\MetaHuman\Linly-Talker\app_multi.py", line 258, in
demo.launch(server_name=ip, # 本地端口localhost:127.0.0.1 全局端口转发:"0.0.0.0"
File "D:\Python\envs\linlytalker\lib\site-packages\gradio\blocks.py", line 1946, in launch
httpx.get(f"{self.local_url}startup-events", verify=ssl_verify)
File "D:\Python\envs\linlytalker\lib\site-packages\httpx_api.py", line 198, in get
return request(
File "D:\Python\envs\linlytalker\lib\site-packages\httpx_api.py", line 106, in request
return client.request(
File "D:\Python\envs\linlytalker\lib\site-packages\httpx_client.py", line 827, in request
return self.send(request, auth=auth, follow_redirects=follow_redirects)
File "D:\Python\envs\linlytalker\lib\site-packages\httpx_client.py", line 914, in send
response = self._send_handling_auth(
File "D:\Python\envs\linlytalker\lib\site-packages\httpx_client.py", line 942, in _send_handling_auth
response = self._send_handling_redirects(
File "D:\Python\envs\linlytalker\lib\site-packages\httpx_client.py", line 979, in _send_handling_redirects
response = self._send_single_request(request)
File "D:\Python\envs\linlytalker\lib\site-packages\httpx_client.py", line 1015, in _send_single_request
response = transport.handle_request(request)
File "D:\Python\envs\linlytalker\lib\site-packages\httpx_transports\default.py", line 232, in handle_request
with map_httpcore_exceptions():
File "D:\Python\envs\linlytalker\lib\contextlib.py", line 153, in exit
self.gen.throw(typ, value, traceback)
File "D:\Python\envs\linlytalker\lib\site-packages\httpx_transports\default.py", line 86, in map_httpcore_exceptions
raise mapped_exc(message) from exc
httpx.ConnectError: [SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1007)
(linlytalker) PS D:\Python\Linly-Talker>

What could be the cause? I downloaded all the model files and manually placed them in the specified locations. All of the app_*.py scripts fail to start with this error; only webui.py starts.
My environment: CUDA 11.7, Python 3.10, PyTorch 2.0.1

Error at the LLM dialogue step: "对不起,你的请求出错了,请再次尝试。" (Sorry, your request encountered an error, please try again.)

Hello, while using the WebUI I uploaded speech for a dialogue; after recognition completed and I submitted it for video generation, the following problem occurred.
The GPU used is an RTX 4090.

The error portion is as follows:
extern "C"
launch_bounds(512, 4)
global void reduction_prod_kernel(ReduceJitOp r){
r.run();
}
nvrtc: error: invalid value for --gpu-architecture (-arch)

对不起,你的请求出错了,请再次尝试。
Sorry, your request has encountered an error. Please try again.

函数 predict 运行时间: 3.0960586071014404 秒
函数 LLM_response 运行时间: 3.160871982574463 秒
audio2exp:: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 212.45it/s]Face Renderer:: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 92/92 [00:18<00:00, 5.00it/s]fps: 20 183
./results/temp_girl_answer.mp4
函数 Talker_response 运行时间: 22.409300565719604 秒

My Qwen folder structure is as follows:
[screenshot]

Hoping you can help me figure this out, thanks!

add easy-wav2lip to project

Wav2Lip spends too much time on lip syncing; I found a faster alternative named Easy-Wav2Lip.
Could the author add it to the project? Thanks!

In digital human text/voice playback, when using a video character with the Wav2Lip model, the error "Image too big to run face detection on GPU. Please use the resize_factor argument" appears

[screenshot]

Traceback (most recent call last):
File "/root/Linly-Talker/TFG/Wav2Lip.py", line 194, in face_detect
predictions.extend(detector.get_detections_for_batch(np.array(images[i:i + batch_size])))
File "/root/Linly-Talker/face_detection/api.py", line 66, in get_detections_for_batch
detected_faces = self.face_detector.detect_from_batch(images.copy())
File "/root/Linly-Talker/face_detection/detection/sfd/sfd_detector.py", line 42, in detect_from_batch
bboxlists = batch_detect(self.face_detector, images, device=self.device)
File "/root/Linly-Talker/face_detection/detection/sfd/detect.py", line 59, in batch_detect
imgs = imgs - np.array([104, 117, 123])
TypeError: unsupported operand type(s) for -: 'NoneType' and 'int'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/miniconda3/lib/python3.10/site-packages/gradio/queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "/root/miniconda3/lib/python3.10/site-packages/gradio/route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "/root/miniconda3/lib/python3.10/site-packages/gradio/blocks.py", line 1561, in process_api
result = await self.call_function(
File "/root/miniconda3/lib/python3.10/site-packages/gradio/blocks.py", line 1179, in call_function
prediction = await anyio.to_thread.run_sync(
File "/root/miniconda3/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/root/miniconda3/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
return await future
File "/root/miniconda3/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
File "/root/miniconda3/lib/python3.10/site-packages/gradio/utils.py", line 695, in wrapper
response = f(*args, **kwargs

gr.Error("无克隆环境或者无克隆模型权重,无法克隆声音", e)

I trained GPT-SoVITS beforehand in a separate workspace, placed the trained weights in GPT_weights and SoVITS_weights, and then got the following error when running voice cloning:

/Linly-Talker/webui.py", line 114, in LLM_response
gr.Error("无克隆环境或者无克隆模型权重,无法克隆声音", e)
TypeError: Error.init

Does the voice cloning module in webui.py perhaps still need code changes?

Does the author have any suggestions for an exhibition-hall digital human?

Hello, I have a question: I'd like to build a 3D Q&A virtual digital human based on this project, the kind that can be placed in an exhibition hall for real-time conversation, simply calling an online GPT API. Is that feasible?
My understanding is that this project already covers everything from Q&A to driving the digital human's answer, so I only need to focus on what device to use to host and display the digital human, right?

After testing, I found a few problems; please advise on how to solve them.

First of all, I think this project is great, which is why I deployed it locally to test it!

Next, my system configuration:

  1. Lenovo P52 laptop, 64 GB RAM, P3200 6 GB + external P40 24 GB (dual GPUs)
  2. Windows 11 x64, Python 3.10.13, CUDA 11.8, Torch 2.0.1
  3. Linly-AI-7B as the dialogue model
    First, based on the dependencies listed in requirements_app.txt, I installed the ones missing from my environment:
    gradio==3.38.0
    edge-tts>=6.1.9
    openai-whisper
    zhconv
    google-generativeai
    transformers==4.32.0
    Everything else already present in the environment was handled via pip install -r requirements_app.txt.

1. Running python app.py directly succeeds, but there is a warning:
Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "C:\Python\Python310\lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "C:\Python\Python310\lib\asyncio\proactor_events.py", line 165, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] 远程主机强迫关闭了一个现有的连接。
Searching online, this network-related issue is very common and does not affect usage. The cause appears to be that asyncio.set_event_loop_policy() is called without checking whether the platform is Windows, Linux, or something else. The fix is to add a platform check:
import platform, asyncio
if platform.system() == 'Windows':
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
After that, the error no longer appears.

2. While loading the model at runtime, there is a message:
bin C:\Python\Python310\lib\site-packages\bitsandbytes\libbitsandbytes_cuda118_nocublaslt.dll
[2024-01-25 17:08:13,225] [INFO] [real_accelerator.py:161:get_accelerator] Setting ds_accelerator to cuda (auto detect)
NOTE: Redirects are currently not supported in Windows or MacOs.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 2/2 [00:37<00:00, 18.90s/it]
using safetensor as default
However, I actually installed the Windows build of bitsandbytes; this is probably related to a call into some model-acceleration library and does not affect usage.

3. Testing app_img.py, the final stage of video synthesis errors out as follows:
{'checkpoint': 'checkpoints\SadTalker_V0.0.2_256.safetensors', 'dir_of_BFM_fitting': 'src/config', 'audio2pose_yaml_path': 'src/config\auido2pose.yaml', 'audio2exp_yaml_path': 'src/config\auido2exp.yaml', 'pirender_yaml_path': 'src/config\facerender_pirender.yaml', 'pirender_checkpoint': 'checkpoints\epoch_00190_iteration_000400000_checkpoint.pt', 'use_safetensor': True, 'mappingnet_checkpoint': 'checkpoints\mapping_00229-model.pth.tar', 'facerender_yaml': 'src/config\facerender.yaml'}
temp\1822631dac470091cee138bad413911fac97da9e\image.png
landmark Det:: 100%|███████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.03it/s]
3DMM Extraction In Video:: 100%|███████████████████████████████████████████████████████| 1/1 [00:00<00:00, 14.77it/s]
audio2exp:: 100%|███████████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 110.95it/s]
Face Renderer:: 100%|██████████████████████████████████████████████████████████████| 123/123 [00:34<00:00, 3.54it/s]
fps: 25 123
ffmpeg error
Traceback (most recent call last):
File "C:\Python\Python310\lib\site-packages\gradio\routes.py", line 442, in run_predict
output = await app.get_blocks().process_api(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1389, in process_api
result = await self.call_function(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1094, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Python\Python310\lib\site-packages\gradio\utils.py", line 703, in wrapper
response = f(*args, **kwargs)
File "D:\AITest\LinlyTalker\my_app_img.py", line 84, in text_response
video = sad_talker.test2(source_image,
File "D:\AITest\LinlyTalker\src\SadTalker.py", line 279, in test2
return_path = self.animate_from_coeff.generate(data, save_dir, pic_path, crop_info, enhancer='gfpgan' if use_enhancer else None, preprocess=preprocess, img_size=size)
File "D:\AITest\LinlyTalker\src\facerender\animate.py", line 272, in generate
os.remove(path)
FileNotFoundError: [WinError 3] 系统找不到指定的路径。: './results/85200a0a-e6c9-4143-980f-a82b4a8dd3b5\temp_85200a0a-e6c9-4143-980f-a82b4a8dd3b5\first_frame_dir\image_85200a0a-e6c9-4143-980f-a82b4a8dd3b5\input\answer.mp4'

[screenshot]

This is probably related to how the path variable is passed on this system, but I couldn't find a fix; please help analyze and resolve it.

4. When using app.py and app_multi.py, I wanted to change the default avatar example.png to a different image, but modifying the image path in the script had no effect. I finally deleted the entire first_frame_dir directory under inputs, and execution produced the following error:
Traceback (most recent call last):
File "C:\Python\Python310\lib\site-packages\scipy\io\matlab_mio.py", line 39, in _open_file
return open(file_like, mode), True
FileNotFoundError: [Errno 2] No such file or directory: './inputs/first_frame_dir/example.mat'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Python\Python310\lib\site-packages\gradio\routes.py", line 442, in run_predict
output = await app.get_blocks().process_api(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1389, in process_api
result = await self.call_function(
File "C:\Python\Python310\lib\site-packages\gradio\blocks.py", line 1094, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Python\Python310\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 937, in run_sync_in_worker_thread
return await future
File "C:\Python\Python310\lib\site-packages\anyio_backends_asyncio.py", line 867, in run
result = context.run(func, *args)
File "C:\Python\Python310\lib\site-packages\gradio\utils.py", line 703, in wrapper
response = f(*args, **kwargs)
File "D:\AITest\LinlyTalker\my_app_multi.py", line 148, in human_respone
video_path = sad_talker.test(source_image,
File "D:\AITest\LinlyTalker\src\SadTalker.py", line 153, in test
batch = get_data(first_coeff_path, audio_path, self.device, ref_eyeblink_coeff_path=ref_eyeblink_coeff_path, still=still_mode,
File "D:\AITest\LinlyTalker\src\generate_batch.py", line 82, in get_data
source_semantics_dict = scio.loadmat(source_semantics_path)
File "C:\Python\Python310\lib\site-packages\scipy\io\matlab_mio.py", line 225, in loadmat
with _open_file_context(file_name, appendmat) as f:
File "C:\Python\Python310\lib\contextlib.py", line 135, in enter
return next(self.gen)
File "C:\Python\Python310\lib\site-packages\scipy\io\matlab_mio.py", line 17, in _open_file_context
f, opened = _open_file(file_like, appendmat, mode)
File "C:\Python\Python310\lib\site-packages\scipy\io\matlab_mio.py", line 45, in _open_file
return open(file_like, mode), True
FileNotFoundError: [Errno 2] No such file or directory: './inputs/first_frame_dir/example.mat'

It feels like something is hard-coded somewhere in the script. Please point out what to change so that a different default avatar can be used, thanks!

API

When will an API be made available? (Waiting eagerly...)

Running python app_musetalk.py errors out. I have already downloaded all the MuseTalk-related models; why does running MuseTalk require the ER-NeRF environment? I don't understand.

No CUDA runtime is found, using CUDA_HOME='C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4'
ERNeRF导入失败,原因: list index out of range
使用ERNeRF前需要安装对应的环境
MuseTalk导入失败,原因: DLL load failed while importing ext: 找不到指定的模块。
使用MuseTalk前需要安装对应的环境
MuseTalk Error: cannot import name 'MuseTalk_RealTime' from 'TFG' (D:\ai3\Linly-Talker\TFG\__init__.py)
如果使用MuseTalk,请先下载MuseTalk相关模型
Traceback (most recent call last):
File "D:\ai3\Linly-Talker\app_musetalk.py", line 195, in
demo = main()
File "D:\ai3\Linly-Talker\app_musetalk.py", line 78, in main
source_video.change(fn=musetalker.prepare_material, inputs=[source_video, bbox_shift], outputs=[source_video, bbox_shift_scale])
NameError: name 'musetalker' is not defined

The README is too brief about GPT-SoVITS and XTTS

The configuration instructions for GPT-SoVITS and XTTS are too brief.
GPT-SoVITS still has a pile of packages to download, and nltk also needs to be downloaded and configured.
XTTS likewise reports errors such as missing examples/female.wav and tts_models--multilingual--multi-dataset--xtts_v2/config.json.

Could the README be written in more detail, or, as for SadTalker, list the required models and where to place them?

Startup problem: running python app.py produces the following error

Traceback (most recent call last):
File "D:\ai3\Linly-Talker\app.py", line 180, in
talker = SadTalker(lazy_load=True)
File "D:\ai3\Linly-Talker\TFG\SadTalker.py", line 38, in init
self.animate_from_coeff = AnimateFromCoeff(self.sadtalker_paths, self.device)
File "D:\ai3\Linly-Talker\src\facerender\animate.py", line 82, in init
self.load_cpk_mapping(sadtalker_path['mappingnet_checkpoint'], mapping=mapping)
File "D:\ai3\Linly-Talker\src\facerender\animate.py", line 157, in load_cpk_mapping
checkpoint = torch.load(checkpoint_path, map_location=torch.device(device))
File "D:\ai3\Linly-Talker\venv\lib\site-packages\torch\serialization.py", line 1028, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "D:\ai3\Linly-Talker\venv\lib\site-packages\torch\serialization.py", line 1246, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.
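
The error `invalid load key, 'v'` from torch.load usually means the checkpoint file is not a real pickle at all, for example an incomplete download or a Git LFS pointer file that begins with the text `version https://git-lfs...`. A quick diagnostic sketch (the checkpoint path below is an assumption; use the mappingnet checkpoint your config actually points to):

```python
# Inspect the first bytes of the suspect checkpoint.  A readable text header such as
# b'version https://git-lfs.github.com/spec/v1' means only an LFS pointer was downloaded
# and the real weights need to be re-fetched.  The path below is illustrative.
checkpoint_path = 'checkpoints/mapping_00109-model.pth.tar'  # assumed mappingnet checkpoint path

with open(checkpoint_path, 'rb') as f:
    head = f.read(64)
print(len(head), head[:48])
```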

Startup problem

The app errors out on startup and I just can't fix it. What is the cause, and how can it be resolved?

(linly) D:\Linly-Talker>python app.py
Traceback (most recent call last):
File "D:\Linly-Talker\app.py", line 5, in
from LLM import LLM
File "D:\Linly-Talker\LLM_init_.py", line 1, in
from .Linly import Linly
File "D:\Linly-Talker\LLM\Linly.py", line 2, in
import torch
File "C:\ProgramData\Anaconda3\envs\linly\lib\site-packages\torch_init_.py", line 130, in
raise err
OSError: [WinError 127] 找不到指定的程序。 Error loading "C:\ProgramData\Anaconda3\envs\linly\lib\site-packages\torch\lib\c10_cuda.dll" or one of its dependencies.
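
This usually points at a PyTorch build whose CUDA runtime does not match the installed driver/toolkit (or a corrupted install), so importing torch itself fails. After reinstalling a torch build that matches your CUDA version, a quick sanity check such as the sketch below confirms what is actually installed (this is a diagnostic, not a fix):

```python
# Diagnostic sketch: report the installed torch build and whether CUDA is visible.
# Run this after reinstalling torch; if `import torch` itself still fails, the
# build/driver mismatch (or corrupted install) has not been resolved yet.
import torch

print("torch version :", torch.__version__)        # e.g. 2.1.0+cu118 vs 2.1.0+cpu
print("built for CUDA:", torch.version.cuda)       # None for CPU-only builds
print("CUDA available:", torch.cuda.is_available())
```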

Uploaded video format problem

Hello, when using MuseTalk, does the uploaded video have to be in a specific format? Why does mine always fail with an error?
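
MuseTalk is generally happiest with a plain H.264 MP4. One common workaround (a sketch that assumes ffmpeg is on PATH; the file names are illustrative) is to re-encode the upload before feeding it in:

```python
# Re-encode an uploaded clip to a plain H.264/yuv420p MP4 before handing it to MuseTalk.
# Requires ffmpeg on PATH; input/output file names are illustrative.
import subprocess

subprocess.run([
    'ffmpeg', '-y', '-i', 'my_upload.mov',
    '-c:v', 'libx264', '-pix_fmt', 'yuv420p', '-r', '25',
    'my_upload_h264.mp4',
], check=True)
```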

Error when using full mode

Using full mode reports: RuntimeError: Given groups=1, weight of size [1024, 70, 7], expected input[1, 73, 27] to have 70 channels, but got 73 channels instead.

Wrong path when saving the video

After Face Renderer reaches 100%, a path error is reported. Is this a configuration problem?

./results/b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\temp_b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\first_frame_dir\image_b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\input\answer.mp4: No such file or directory
Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\linly\lib\shutil.py", line 791, in move
os.rename(src, real_dst)
FileNotFoundError: [WinError 2] The system cannot find the file specified: '89cf9dcd-0120-4368-8106-ef56ecd5ed86.mp4' -> './results/b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\first_frame_dir\image_b71f4ace-a29e-47fe-a1ad-edeb3ba99e28\input\answer.mp4'
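
A minimal sketch of the usual remedy for this kind of mixed `/` and `\` destination on Windows: build the target path with os.path.join and make sure the directory exists before shutil.move (variable names below are illustrative, not the repo's exact code):

```python
# Build the destination path portably and create it before moving the rendered video.
# Names below are illustrative; the real ones live in the SadTalker result-saving code.
import os
import shutil

result_dir = os.path.join('results', 'b71f4ace-a29e-47fe-a1ad-edeb3ba99e28', 'input')
os.makedirs(result_dir, exist_ok=True)                 # avoids FileNotFoundError on rename/move
temp_video = '89cf9dcd-0120-4368-8106-ef56ecd5ed86.mp4'
shutil.move(temp_video, os.path.join(result_dir, 'answer.mp4'))
```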

pip install -r VITS/requirements_gptsovits.txt errors out

Following your instructions, installing this ran into a dependency problem

Installing build dependencies ... error
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 2
╰─> [63 lines of output]
Looking in indexes: https://pypi.mirrors.ustc.edu.cn/simple/
Ignoring oldest-supported-numpy: markers 'python_version < "3.9"' don't match your environment
ERROR: Exception:
Traceback (most recent call last):

AttributeError: 'MuseTalk_RealTime' object has no attribute 'vae' — when running the MuseTalk interface and clicking an example, it runs for a while and then raises the error below

Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\gradio\queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\gradio\blocks.py", line 1561, in process_api
result = await self.call_function(
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\gradio\blocks.py", line 1179, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\anyio_backends_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\anyio_backends_asyncio.py", line 859, in run
result = context.run(func, *args)
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\gradio\utils.py", line 695, in wrapper
response = f(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\gradio\utils.py", line 695, in wrapper
response = f(*args, **kwargs)
File "D:\ai3\Linly-Talker\TFG\MuseTalk.py", line 236, in prepare_material
latents = self.vae.get_latents_for_unet(resized_crop_frame)
AttributeError: 'MuseTalk_RealTime' object has no attribute 'vae'
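
The missing vae attribute typically means the VAE weights were never loaded when MuseTalk_RealTime was constructed, i.e. the model files are absent or in the wrong place. A hedged pre-flight check follows; the sd-vae-ft-mse layout below follows the upstream MuseTalk convention, and the exact directory Linly-Talker expects may differ, so treat these paths as assumptions:

```python
# Check for the VAE weights MuseTalk needs before launching the UI.
# Paths follow the upstream MuseTalk "sd-vae-ft-mse" convention and are assumptions here.
import os

candidates = [
    'checkpoints/sd-vae-ft-mse/config.json',
    'checkpoints/sd-vae-ft-mse/diffusion_pytorch_model.bin',
]
for path in candidates:
    status = 'found  ' if os.path.exists(path) else 'missing'
    print(f'{status} {path}')
```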

pip install -r VITS/requirements_gptsovits.txt installation error

Following your instructions, installing this ran into a dependency problem
pip install -r VITS/requirements_gptsovits.txt

Installing build dependencies ... error
error: subprocess-exited-with-error

× pip subprocess to install build dependencies did not run successfully.
│ exit code: 2
╰─> [63 lines of output]
Looking in indexes: https://pypi.mirrors.ustc.edu.cn/simple/
Ignoring oldest-supported-numpy: markers 'python_version < "3.9"' don't match your environment
ERROR: Exception:
Traceback (most recent call last):

AttributeError: 'MuseTalk_RealTime' object has no attribute 'vae' — after startup, clicking Run still shows this error

Traceback (most recent call last):
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\gradio\queueing.py", line 495, in call_prediction
output = await route_utils.call_process_api(
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\gradio\route_utils.py", line 232, in call_process_api
output = await app.get_blocks().process_api(
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\gradio\blocks.py", line 1561, in process_api
result = await self.call_function(
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\gradio\blocks.py", line 1179, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\anyio_backends_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\anyio_backends_asyncio.py", line 859, in run
result = context.run(func, *args)
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\gradio\utils.py", line 695, in wrapper
response = f(*args, **kwargs)
File "C:\ProgramData\anaconda3\envs\linly\lib\site-packages\gradio\utils.py", line 695, in wrapper
response = f(*args, **kwargs)
File "D:\ai3\Linly-Talker\TFG\MuseTalk.py", line 236, in prepare_material
latents = self.vae.get_latents_for_unet(resized_crop_frame)
AttributeError: 'MuseTalk_RealTime' object has no attribute 'vae'

Prebuilt image

Could you provide a more convenient prebuilt image?

error: metadata-generation-failed

(LinlyTalkerLaoGao) E:\GaoLLM\Test\LinlyTalkerLaoGao\Linly-Talker>pip install -r VITS/requirements_xtts.txt
Collecting TTS@ git+https://github.com/coqui-ai/[email protected] (from -r VITS/requirements_xtts.txt (line 2))
Cloning https://github.com/coqui-ai/TTS (to revision v0.21.1) to c:\users\administrator\appdata\local\temp\pip-install-y6ubg7yq\tts_3702a0838f9d47cabc5e0cc2d313aefa
Running command git clone --filter=blob:none --quiet https://github.com/coqui-ai/TTS 'C:\Users\Administrator\AppData\Local\Temp\pip-install-y6ubg7yq\tts_3702a0838f9d47cabc5e0cc2d313aefa'
Running command git checkout -q 00a870c26abdc06429ffef3e2814b1a1d5b40fff
Resolved https://github.com/coqui-ai/TTS to commit 00a870c26abdc06429ffef3e2814b1a1d5b40fff
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting pydantic==1.10.13 (from -r VITS/requirements_xtts.txt (line 3))
Downloading pydantic-1.10.13-cp310-cp310-win_amd64.whl.metadata (150 kB)
---------------------------------------- 150.9/150.9 kB 1.8 MB/s eta 0:00:00
Collecting python-multipart==0.0.6 (from -r VITS/requirements_xtts.txt (line 4))
Downloading python_multipart-0.0.6-py3-none-any.whl.metadata (2.5 kB)
Requirement already satisfied: typing-extensions>=4.8.0 in e:\anaconda3\envs\linlytalkerlaogao\lib\site-packages (from -r VITS/requirements_xtts.txt (line 5)) (4.11.0)
Collecting cutlet (from -r VITS/requirements_xtts.txt (line 6))
Downloading cutlet-0.4.0.tar.gz (412 kB)
---------------------------------------- 412.3/412.3 kB 12.6 MB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting mecab-python3==1.0.6 (from -r VITS/requirements_xtts.txt (line 7))
Downloading mecab_python3-1.0.6-cp310-cp310-win_amd64.whl.metadata (6.3 kB)
Collecting unidic-lite==1.0.8 (from -r VITS/requirements_xtts.txt (line 8))
Downloading unidic-lite-1.0.8.tar.gz (47.4 MB)
---------------------------------------- 47.4/47.4 MB 17.7 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Collecting unidic==1.1.0 (from -r VITS/requirements_xtts.txt (line 9))
Downloading unidic-1.1.0.tar.gz (7.7 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: langid in e:\anaconda3\envs\linlytalkerlaogao\lib\site-packages (from -r VITS/requirements_xtts.txt (line 10)) (1.1.6)
Collecting deepspeed (from -r VITS/requirements_xtts.txt (line 11))
Downloading deepspeed-0.14.2.tar.gz (1.3 MB)
---------------------------------------- 1.3/1.3 MB 14.0 MB/s eta 0:00:00
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [36 lines of output]
test.c
LINK : fatal error LNK1181: cannot open input file 'aio.lib'
The system cannot find the file specified.
The system cannot find the file specified.
The system cannot find the file specified.
The system cannot find the file specified.
The system cannot find the file specified.
The system cannot find the file specified.
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "C:\Users\Administrator\AppData\Local\Temp\pip-install-y6ubg7yq\deepspeed_72617cae880f4f1d8e6c1569fd949e57\setup.py", line 190, in
ext_modules.append(builder.builder())
File "C:\Users\Administrator\AppData\Local\Temp\pip-install-y6ubg7yq\deepspeed_72617cae880f4f1d8e6c1569fd949e57\op_builder\builder.py", line 659, in builder
extra_link_args=self.strip_empty_entries(self.extra_ldflags()))
File "C:\Users\Administrator\AppData\Local\Temp\pip-install-y6ubg7yq\deepspeed_72617cae880f4f1d8e6c1569fd949e57\op_builder\inference_cutlass_builder.py", line 71, in extra_ldflags
import dskernels
ModuleNotFoundError: No module named 'dskernels'
DS_BUILD_OPS=1
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
[WARNING] cpu_adam requires the 'lscpu' command, but it does not exist!
[WARNING] cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
[WARNING] cpu_adam requires the 'lscpu' command, but it does not exist!
[WARNING] cpu_adam attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
[WARNING] cpu_adagrad requires the 'lscpu' command, but it does not exist!
[WARNING] cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
[WARNING] cpu_adagrad requires the 'lscpu' command, but it does not exist!
[WARNING] cpu_adagrad attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
[WARNING] cpu_lion requires the 'lscpu' command, but it does not exist!
[WARNING] cpu_lion attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
[WARNING] cpu_lion requires the 'lscpu' command, but it does not exist!
[WARNING] cpu_lion attempted to query 'lscpu' after failing to use py-cpuinfo to detect the CPU architecture. 'lscpu' does not appear to exist on your system, will fall back to use -march=native and non-vectorized execution.
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
[WARNING] Filtered compute capabilities ['6.0', '6.1', '7.0']
[WARNING] Filtered compute capabilities ['6.0', '6.1', '7.0']
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

(LinlyTalkerLaoGao) E:\GaoLLM\Test\LinlyTalkerLaoGao\Linly-Talker>conda install -c conda-forge

CondaValueError: too few arguments, must supply command line package specs or --file
