artrajz / vits-simple-api Goto Github PK

View Code? Open in Web Editor NEW

720.0 8.0 113.0 15.32 MB

A simple VITS HTTP API, developed by extending Moegoe with additional features.

License: GNU Affero General Public License v3.0

Python 91.94% Dockerfile 0.07% Shell 1.47% HTML 3.46% CSS 0.65% JavaScript 2.41% PowerShell 0.01%

bert-vits2 tts vits moegoe gpt-sovits tts-api

vits-simple-api's Introduction

vits-simple-api

Simply call the vits api

English|中文文档

Feature

Online Demo

Thanks to Hugging Face!

Colab Notebook

Please note that different IDs may support different languages.speakers

https://artrajz-vits-simple-api.hf.space/voice/vits?text=你好,こんにちは&id=164
https://artrajz-vits-simple-api.hf.space/voice/vits?text=Difficult the first time, easy the second.&id=4
excited:https://artrajz-vits-simple-api.hf.space/voice/w2v2-vits?text=こんにちは&id=3&emotion=111
whispered:https://artrajz-vits-simple-api.hf.space/w2v2-vits?text=こんにちは&id=3&emotion=2077

ssml.mov

Deployment

There are two deployment options to choose from. Regardless of the option you select, you'll need to import the model after deployment to use the application.

Docker Deployment (Recommended for Linux)

Step 1: Pull the Docker Image

Run the following command to pull the Docker image. Follow the prompts in the script to choose the necessary files to download and pull the image:

bash -c "$(wget -O- https://raw.githubusercontent.com/Artrajz/vits-simple-api/main/vits-simple-api-installer-latest.sh)"

The default paths for project configuration files and model folders are /usr/local/vits-simple-api/.

Step 2: Start

Run the following command to start the container:

docker-compose up -d

Image Update

To update the image, run the following commands:

docker-compose pull

Then, restart the container:

docker-compose up -d

Virtual Environment Deployment

Step 1: Clone the Project

Clone the project repository using the following command:

git clone https://github.com/Artrajz/vits-simple-api.git

Step 2: Install Python Dependencies

It is recommended to use a virtual environment with Python version 3.10 for this project. Run the following command to install the Python dependencies required for the project:

If you encounter issues installing certain dependencies, please refer to the common problems outlined below.

pip install -r requirements.txt

Step 3: Start

Run the following command to start the program:

python app.py

Windows Quick Deployment Package

Step 1: Download and Extract the Deployment Package

Go to the releases page and download the latest deployment package. Extract the downloaded files.

Step 2: Start

Run start.bat to launch the program.

Model Loading

Step 1: Download VITS Models

Download the VITS model files and place them in the data/models folder.

Step 2: Loading Models

Automatic Model Loading

Starting from version 0.6.6, it is default behavior to automatically load all models in the data/models folder, making it easier for beginners to use.

Manual Model Loading

After the initial startup, a config.yaml configuration file will be generated. You need to change tts_config.auto_load to false in order to enable manual loading mode.

You can modify the tts_config.models in the config.yaml or make modifications in the admin panel in the browser.

Note: After version 0.6.6, the model loading path has been modified. Please follow the steps below to configure the model path again!

The path can be an absolute path or a relative path. If it's a relative path, it starts from the data/models folder in the project root directory.

For example, if the data/models folder has the following files:

├─model1
│  │─G_1000.pth
│  └─config.json
└─model2
   │─G_1000.pth
   └─config.json

Fill in the configuration like this in the YAML file:

tts_config:
  auto_load: false
  models:
  - config_path: model1/config.json
    model_path: model1/G_1000.pth
  - config_path: model2/config.json
    model_path: model2/G_1000.pth
	# GPT-SoVITS
  - sovits_path: gpt_sovits1/model1_e8_s11536.pth
    gpt_path: gpt_sovits1/model1-e15.ckpt
  - sovits_path: gpt_sovits2/model2_e8_s11536.pth
    gpt_path: gpt_sovits2/model2-e15.ckpt

Loading models through the admin panel is convenient, but if you want to load models outside the data/models folder, you can only do so by modifying the config.yaml configuration file. The method is to directly provide the absolute path.

Absolute path example:

tts_config:
  auto_load: false
  models:
  - config_path: D://model3/config.json
    model_path: D://model3/G_1000.pth

models_path: This is the models folder relative to the data directory, with the default value being "models". When auto_load is set to true, all models in the models_path directory will be loaded.

Other Models

After downloading the BERT model and emotion model, place them in the data/bert and data/emotional folders respectively. Find the corresponding names and insert them accordingly.

GPU accelerated

Windows

Install CUDA

Check the highest version of CUDA supported by your graphics card:

nvidia-smi

Taking CUDA 11.7 as an example, download it from the official website

Install GPU version of PyTorch

https://pytorch.org/

pip install torch --index-url https://download.pytorch.org/whl/cu118

Linux

The installation process is similar, but I don't have the environment to test it.

WebUI

Inference Frontend

http://127.0.0.1:23456

*Port is modifiable under the default setting of port 23456.

Admin Backend

The default address is http://127.0.0.1:23456/admin.

The initial username and password can be found by searching for 'admin' in the config.yaml file after the first startup.

Function Options Explanation

Disable the Admin Backend

The admin backend allows loading and unloading models, and while it has login authentication, for added security, you can disable the admin backend in the config.yaml:

'IS_ADMIN_ENABLED': !!bool 'false'

This extra measure helps ensure absolute security when making the admin backend inaccessible to the public network.

Bert-VITS2 Configuration and Language/Bert Model Usage

Starting from Bert-VITS2 v2.0, a model requires loading three different language Bert models. If you only need to use one or two languages, you can add the lang parameter in the config.json file of the model's data section. The value ["zh"] indicates that the model only uses Chinese and will load Chinese Bert models. The value ["zh", "ja"] indicates the usage of both Chinese and Japanese bilingual models, and only Chinese and Japanese Bert models will be loaded. Similarly, this pattern continues for other language combinations.

Example:

"data": {
  "lang": ["zh", "ja"],
  "training_files": "filelists/train.list",
  "validation_files": "filelists/val.list",
  "max_wav_value": 32768.0,
  ...

Custom Chinese Polyphonic Dictionary

If you encounter issues with incorrect pronunciation of polyphonic characters, you can try resolving it using the following method.

Create and open phrases_dict.txt in the data directory to add polyphonic words.

{
"一骑当千": [["yí"], ["jì"], ["dāng"], ["qiān"]],
}

GPT-SoVITS Reference Audio Presets

Find the configuration for GPT-SoVITS in the config.yaml file. Add presets under the presets section. Multiple presets can be added, with keys serving as preset names. Below are two default presets, default and default2:

gpt_sovits_config:
  hz: 50
  is_half: false
  id: 0
  lang: auto
  format: wav
  segment_size: 50
  presets:
    default:
      refer_wav_path: null
      prompt_text: null
      prompt_lang: auto
    default2:
      refer_wav_path: null
      prompt_text: null
      prompt_lang: auto

Reading API

Tested in legado

Multiple models can be used for reading, including VITS, Bert-VITS2, GPT-SoVITS. Parameters starting with in configure the speaker of the text in quotes, while parameters starting with nr configure the narrator.

To use GPT-SoVITS, it is necessary to configure the reference audio in the presets section of the config.yaml file in advance and modify the preset in the URL below.

The IP in the URL can be found after the API is started, generally using a local area network IP starting with 192.168.

After modification, select the reading engine, add the reading engine, paste the source, and enable the reading engine.

{
  "concurrentRate": "1",
  "contentType": "audio/wav",
  "enabledCookieJar": false,
  "header": "",
  "id": 1709643305070,
  "lastUpdateTime": 1709821070082,
  "loginCheckJs": "",
  "loginUi": "",
  "loginUrl": "",
  "name": "vits-simple-api",
  "url": "http://192.168.xxx.xxx:23456/voice/reading?text={{java.encodeURI(speakText)}}&in_model_type=GPT-SOVITS&in_id=0&in_preset=default&nr_model_type=BERT-VITS2&nr_id=0&nr_preset=default&format=wav&lang=zh"
}

Frequently Asked Questions

Installation Issues with fastText Dependency

Fasttext may not be installed on windows, you can install it with the following command,or download wheels here

# For Python 3.10 on win_amd64
pip install https://github.com/Artrajz/archived/raw/main/fasttext/fasttext-0.9.2-cp310-cp310-win_amd64.whl

pip install fasttext -i https://pypi.artrajz.cn/simple

Installation Issues with pyopenjtalk Dependency

Since pypi.org does not provide a wheel file for pyopenjtalk, you often need to install it from the source code. This process might be cumbersome for some users, so you can also install it using a pre-built wheel as follows:

pip install pyopenjtalk -i https://pypi.artrajz.cn/simple

Bert-VITS2 Version Compatibility

To ensure compatibility with the Bert-VITS2 model, modify the config.json file by adding a version parameter "version": "x.x.x". For instance, if the model version is 1.0.1, the configuration file should be written as:

{
  "version": "1.0.1",
  "train": {
    "log_interval": 10,
    "eval_interval": 100,
    "seed": 52,
    ...

Please note that for the Chinese extra version, the version should be changed to extra or zh-clap, and for the extra fix version, the version should be 2.4 or extra-fix.

API

GET

speakers list

GET http://127.0.0.1:23456/voice/speakers

Returns the mapping table of role IDs to speaker names.

voice vits

GET http://127.0.0.1:23456/voice/vits?text=text

Default values are used when other parameters are not specified.
GET http://127.0.0.1:23456/voice/vits?text=[ZH]text[ZH][JA]text[JA]&lang=mix

When lang=mix, the text needs to be annotated.
GET http://127.0.0.1:23456/voice/vits?text=text&id=142&format=wav&lang=zh&length=1.4

The text is "text", the role ID is 142, the audio format is wav, the text language is zh, the speech length is 1.4, and the other parameters are default.

check

GET http://127.0.0.1:23456/voice/check?id=0&model=vits

POST

See api_test.py

API KEY

Set api_key_enabled: true in config.yaml to enable API key authentication. The API key is api_key: api-key. After enabling it, you need to add the api_key parameter in GET requests and add the X-API-KEY parameter in the header for POST requests.

Parameter

VITS

Name	Parameter	Is must	Default	Type	Instruction
Synthesized text	text	true		str	Text needed for voice synthesis.
Speaker ID	id	false	From `config.yaml`	int	The speaker ID.
Audio format	format	false	From `config.yaml`	str	Support for wav,ogg,silk,mp3,flac
Text language	lang	false	From `config.yaml`	str	The language of the text to be synthesized. Available options include auto, zh, ja, and mix. When lang=mix, the text should be wrapped in [ZH] or [JA].The default mode is auto, which automatically detects the language of the text
Audio length	length	false	From `config.yaml`	float	Adjusts the length of the synthesized speech, which is equivalent to adjusting the speed of the speech. The larger the value, the slower the speed.
Noise	noise	false	From `config.yaml`	float	Sample noise, controlling the randomness of the synthesis.
SDP noise	noisew	false	From `config.yaml`	float	Stochastic Duration Predictor noise, controlling the length of phoneme pronunciation.
Segment Size	segment_size	false	From `config.yaml`	int	Divide the text into paragraphs based on punctuation marks, and combine them into one paragraph when the length exceeds segment_size. If segment_size<=0, the text will not be divided into paragraphs.
Streaming response	streaming	false	false	bool	Streamed synthesized speech with faster initial response.

VITS voice conversion

Name	Parameter	Is must	Type	Instruction
Uploaded Audio	upload	true	file	The audio file to be uploaded. It should be in wav or ogg
Source Role ID	original_id	true	int	The ID of the role used to upload the audio file.
Target Role ID	target_id	true	int	The ID of the target role to convert the audio to.

HuBert-VITS

Name	Parameter	Is must	Type	Instruction
Uploaded Audio	upload	true	file	The audio file to be uploaded. It should be in wav or ogg format.
Target speaker ID	id	true	int	The target speaker ID.
Audio format	format	true	str	wav,ogg,silk
Audio length	length	true	float	Adjusts the length of the synthesized speech, which is equivalent to adjusting the speed of the speech. The larger the value, the slower the speed.
Noise	noise	true	float	Sample noise, controlling the randomness of the synthesis.
sdp noise	noisew	true	float	Stochastic Duration Predictor noise, controlling the length of phoneme pronunciation.

W2V2-VITS

Name	Parameter	Is must	Default	Type	Instruction
Synthesized text	text	true		str	Text needed for voice synthesis.
Speaker ID	id	false	From `config.yaml`	int	The speaker ID.
Audio format	format	false	From `config.yaml`	str	Support for wav,ogg,silk,mp3,flac
Text language	lang	false	From `config.yaml`	str	The language of the text to be synthesized. Available options include auto, zh, ja, and mix. When lang=mix, the text should be wrapped in [ZH] or [JA].The default mode is auto, which automatically detects the language of the text
Audio length	length	false	From `config.yaml`	float	Adjusts the length of the synthesized speech, which is equivalent to adjusting the speed of the speech. The larger the value, the slower the speed.
Noise	noise	false	From `config.yaml`	float	Sample noise, controlling the randomness of the synthesis.
SDP noise	noisew	false	From `config.yaml`	float	Stochastic Duration Predictor noise, controlling the length of phoneme pronunciation.
Segment Size	segment_size	false	From `config.yaml`	int	Divide the text into paragraphs based on punctuation marks, and combine them into one paragraph when the length exceeds segment_size. If segment_size<=0, the text will not be divided into paragraphs.
Dimensional emotion	emotion	false	0	int	The range depends on the emotion reference file in npy format, such as the range of the innnky's model all_emotions.npy, which is 0-5457.

Dimensional emotion

Name	Parameter	Is must	Default	Type	Instruction
Uploaded Audio	upload	true		file	Return the npy file that stores the dimensional emotion vectors.

Bert-VITS2

Name	Parameter	Is must	Default	Type	Instruction
Synthesized text	text	true		str	Text needed for voice synthesis.
Speaker ID	id	false	From `config.yaml`	int	The speaker ID.
Audio format	format	false	From `config.yaml`	str	Support for wav,ogg,silk,mp3,flac
Text language	lang	false	From `config.yaml`	str	"Auto" is a mode for automatic language detection and is also the default mode. However, it currently only supports detecting the language of an entire text passage and cannot distinguish languages on a per-sentence basis. The other available language options are "zh" and "ja".
Audio length	length	false	From `config.yaml`	float	Adjusts the length of the synthesized speech, which is equivalent to adjusting the speed of the speech. The larger the value, the slower the speed.
Noise	noise	false	From `config.yaml`	float	Sample noise, controlling the randomness of the synthesis.
SDP noise	noisew	false	From `config.yaml`	float	Stochastic Duration Predictor noise, controlling the length of phoneme pronunciation.
Segment Size	segment_size	false	From `config.yaml`	int	Divide the text into paragraphs based on punctuation marks, and combine them into one paragraph when the length exceeds segment_size. If segment_size<=0, the text will not be divided into paragraphs.
SDP/DP mix ratio	sdp_ratio	false	From `config.yaml`	int	The theoretical proportion of SDP during synthesis, the higher the ratio, the larger the variance in synthesized voice tone.
Emotion	emotion	false	From `config.yaml`	int	Available for Bert-VITS2 v2.1, ranging from 0 to 9
Emotion reference Audio	reference_audio	false	None		Bert-VITS2 v2.1 uses reference audio to control the synthesized audio's emotion
Text Prompt	text_prompt	false	From `config.yaml`	str	Bert-VITS2 v2.2 text prompt used for emotion control
Style Text	style_text	false	From `config.yaml`	str	Bert-VITS2 v2.3 text prompt used for emotion control
Style Text Weight	style_weight	false	From `config.yaml`	float	Bert-VITS2 v2.3 text prompt weight used for prompt weighting
Streaming response	streaming	false	false	bool	Streamed synthesized speech with faster initial response.

GPT-SoVITS Speech Synthesis

Name	Parameter	Is must	Default	Type	Instruction
Synthesized text	text	true		str	Text needed for voice synthesis.
Speaker ID	id	false	From `config.yaml`	int	Speaker ID. In GPT-SoVITS, each model serves as a Speaker ID, and the voice is switched by reference audio presets.
Audio format	format	false	From `config.yaml`	str	Support for wav, ogg, silk, mp3, flac
Text language	lang	false	From `config.yaml`	str	"auto" is the automatic language detection mode, which is also the default mode. However, it currently only supports recognizing the language of the entire text passage, and cannot distinguish each sentence.
Reference Audio	reference_audio	false	None		reference_audio is required, but it can be replaced by preset.
Reference Audio Text	prompt_text	false	From `config.yaml`	float	Need to be consistent with the actual text of the reference audio.
Reference Audio Language	prompt_lang	false	From `config.yaml`	str	Defaults to auto for automatic text language recognition. If recognition fails, manually fill in, zh for Chinese, ja for Japanese, en for English.
Reference Audio Preset	preset	false	default	str	Replace the reference audio with pre-set presets, multiple presets can be set.

SSML (Speech Synthesis Markup Language)

Supported Elements and Attributes

speak Element

Attribute	Instruction	Is must
id	Default value is retrieved From `config.yaml`	false
lang	Default value is retrieved From `config.yaml`	false
length	Default value is retrieved From `config.yaml`	false
noise	Default value is retrieved From `config.yaml`	false
noisew	Default value is retrieved From `config.yaml`	false
segment_size	Splits text into segments based on punctuation marks. When the sum of segment lengths exceeds `segment_size`, it is treated as one segment. `segment_size<=0` means no segmentation. The default value is 0.	false
model_type	Default is VITS. Options: W2V2-VITS, BERT-VITS2	false
emotion	Only effective when using W2V2-VITS . The range depends on the npy emotion reference file.	false
sdp_ratio	Only effective when using BERT-VITS2 .	false

voice Element

Higher priority than speak.

Attribute	Instruction	Is must
id	Default value is retrieved From `config.yaml`	false
lang	Default value is retrieved From `config.yaml`	false
length	Default value is retrieved From `config.yaml`	false
noise	Default value is retrieved From `config.yaml`	false
noisew	Default value is retrieved From `config.yaml`	false
segment_size	Splits text into segments based on punctuation marks. When the sum of segment lengths exceeds `segment_size`, it is treated as one segment. `segment_size<=0` means no segmentation. The default value is 0.	false
model_type	Default is VITS. Options: W2V2-VITS, BERT-VITS2	false
emotion	Only effective when using W2V2-VITS . The range depends on the npy emotion reference file.	false
sdp_ratio	Only effective when using BERT-VITS2 .	false

break Element

Attribute	Instruction	Is must
strength	x-weak, weak, medium (default), strong, x-strong	false
time	The absolute duration of a pause in seconds (such as `2s`) or milliseconds (such as `500ms`). Valid values range from 0 to 5000 milliseconds. If you set a value greater than the supported maximum, the service will use `5000ms`. If the `time` attribute is set, the `strength` attribute is ignored.	false

Strength	Relative Duration
x-weak	250 ms
weak	500 ms
medium	750 ms
strong	1000 ms
x-strong	1250 ms

Reading

Name	Parameter	Is must	Default	Type	Instruction
Synthesis Text	text	true		str	The text to be synthesized into speech.
Interlocutor Model Type	in_model_type	false	Obtained from `config.yaml`	str
Interlocutor ID	in_id	false	Obtained from `config.yaml`	int
Interlocutor Reference Audio Preset	preset	false	default	str	Replace the reference audio with preset settings, which can be set to multiple presets in advance.
Narrator Model Type	nr_model_type	false	Obtained from `config.yaml`	str
Narrator ID	nr_id	false	Obtained from `config.yaml`	int
Narrator Reference Audio Preset	preset	false	default	str	Replace the reference audio with preset settings, which can be set to multiple presets in advance.
Audio Format	format	false	Obtained from `config.yaml`	str	Supports wav, ogg, silk, mp3, flac
Text Language	lang	false	Obtained from `config.yaml`	str	'auto' for automatic language detection mode, which is also the default mode. However, currently, it only supports recognizing the language of the entire text and cannot distinguish each sentence.
Reference Audio Preset	preset	false	default	str	Replace the reference audio with preset settings, which can be set to multiple presets in advance.

The other parameters of the model will use the default parameters of the corresponding model in the config.yaml file.

Example

See api_test.py

Communication

Learning and communication,now there is only Chinese QQ group

Acknowledgements

vits:https://github.com/jaywalnut310/vits
MoeGoe:https://github.com/CjangCjengh/MoeGoe
emotional-vits:https://github.com/innnky/emotional-vits
vits-uma-genshin-honkai:https://huggingface.co/spaces/zomehwh/vits-uma-genshin-honkai
vits_chinese:https://github.com/PlayVoice/vits_chinese
Bert_VITS2:https://github.com/fishaudio/Bert-VITS2
GPT-SoVITS:https://github.com/RVC-Boss/GPT-SoVITS

Thank You to All Contributors

vits-simple-api's People

Contributors

Stargazers

Watchers

Forkers

lss233 haibersut k2lin-daniel ylql8023 nene07210721 mistletoe0628 t4wefan linuslieu 2dipw zwssunny elijah-0616 onexuan 40740 yongyug ryaochengfeng charliedreemur cgnannan liroda fulldb byesoft 574610qianyuqianxun zyqq yanchenbai d1654 zyx-a gemsford xiaolingaki paddy0914 ikaros-521 lcylym aik168 jimmyleesnow zcloud2014 chandou-code wqtogo gachaun sagiri996 dogelord081 fznf1010 yaoyunchou af-74413592 liandanshi01 assassindesign hushuoyouli mushan0x0 xwbsz max-liu qwsxmkoi sligter prahs benson80 haoduoyu1203 luckytain dfmjndm lingyue-sojoy xiao-john remilibra harveytvt ealvn xunfeng1980 uni345 chenyangy lemon22333 yuanmeng1120 xming673 haoyaogang zhangtianrong haomingxr kirayuukiasuna durumei spikeparaffa asuh9133 anhlbt funying ayybsyya cycham0 cat-undertale xianyue0125 tosaka07 lx1309244704 matchingagent run2ai-m vincent-the-gamer quanquan-dane liyangbing hcoder1949 sdyby2006 jjtools blogbin ilaiyu shaohongy shijianwanwu xyrlsz wangmiemie fengyuan-liang digitdance sakuramaiii aurxs winpkay luckmc

vits-simple-api's Issues

如何对接lss大佬的bot项目

使用docker启动


moegoe-config.json是个文件夹形式,该如何正常启动呢直接部署启动也会返回500的报错

老大加一个流式处理？

老大加一个流式处理？是否能实现流式响应？

用的是哪个版本的模型？

v1模型可以使用吗

docker 一键部署问题

我无法通过那个脚本部署docker镜像，我看了那个脚本里的内容 docker compose pull 后面什么都没有，是还没开发吗

老大哦，还是建议加上音频来调节情感吧，在MoeGoe来回跳转着实麻烦哈哈哈

有时候需要新增一个情绪的npy ，需要先到MoeGoe去生成npy，还是觉得建议加上音频来调节情感吧，来回跳转很麻烦哈哈哈，而且
也还是要依靠MoeGoe，如vits-simple-api有生成npy的功能，那就完全独立了，就可以删掉MoeGoe了哈哈哈。

在服务器上部署后生成的语音没有声音或者都是0秒

部署方式：docker-compose

下面是我的config.py文件内容

import os
import sys

JSON_AS_ASCII = False
MAX_CONTENT_LENGTH = 5242880

# port
PORT = 23456
# absolute path
ABS_PATH = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])))
# upload path
UPLOAD_FOLDER = ABS_PATH + "/upload"
# cahce path
CACHE_PATH = ABS_PATH + "/cache"
# zh ja ko en ...
LANGUAGE_AUTOMATIC_DETECT = ["zh","ja"]
#set to True to enable API Key authentication
API_KEY_ENABLED = False
# API_KEY is required for authentication
API_KEY = "api-key"

'''
For each model, the filling method is as follows 模型列表中每个模型的填写方法如下
example 示例:
MODEL_LIST = [
    #VITS
    [ABS_PATH+"/Model/Nene_Nanami_Rong_Tang/1374_epochs.pth", ABS_PATH+"/Model/Nene_Nanami_Rong_Tang/config.json"],
    [ABS_PATH+"/Model/Zero_no_tsukaima/1158_epochs.pth", ABS_PATH+"/Model/Zero_no_tsukaima/config.json"],
    [ABS_PATH+"/Model/g/G_953000.pth", ABS_PATH+"/Model/g/config.json"],
    #HuBert-VITS
    [ABS_PATH+"/Model/louise/360_epochs.pth", ABS_PATH+"/Model/louise/config.json", ABS_PATH+"/Model/louise/hubert-soft-0d54a1f4.pt"],
]
'''
# load mutiple models
MODEL_LIST = [
    [ABS_PATH+"/Model/Nene_Meguru_Yoshino_Mako_Myrasame_hoharu_Nanami/365_epochs.pth", ABS_PATH+"/Model/Nene_Meguru_Yoshino_Mako_Myrasame_hoharu_Nanami/config.json"],
    [ABS_PATH+"/Model/to_love/1113_epochs.pth", ABS_PATH+"/Model/to_love/config.json"],
]

"""
default params
以下选项是修改VITS GET方法 [不指定参数]时的默认值
"""

# GET 默认音色id
ID = 0
# GET 默认音频格式 可选wav,ogg,silk
FORMAT = "wav"
# GET 默认语言
LANG = "AUTO"
# GET 默认语音长度，相当于调节语速，该数值越大语速越慢
LENGTH = 1
# GET 默认噪声
NOISE = 0.667
# GET 默认噪声偏差
NOISEW = 0.8
#长文本分段阈值，max<=0表示不分段,text will not be divided if max<=0
MAX = 50

请求其他接口，比如http://127.0.0.1:23456/voice/speakers，都是正常返回的，但是请求 http://127.0.0.1:23456/voice?text=你好呀 生成的音频却只有喘息声

Model的下载地址在哪里？

如何实现gpu推理

我发现这个默认只能cpu推理，即便我有48核但是还是慢而且吃不满，但是我这还有3090，我想用gpu可以大幅度加速推理

挂载完成后服务未能找到speaker

挂载是成功的，但是无法查询到speaker

getattr()报错

运行app.py后,getattr返回一个报错.提示 attribute name must be string.

配置文件路径设置如下:
MODEL_LIST = [
# VITS
[ABS_PATH + "/Model/Xin/G_32000.pth", ABS_PATH + "/Model/Xin/config.json"],
#[ABS_PATH + "/Model/Zero_no_tsukaima/1158_epochs.pth", ABS_PATH + "/Model/Zero_no_tsukaima/config.json"],
#[ABS_PATH + "/Model/g/G_953000.pth", ABS_PATH + "/Model/g/config.json"],
# HuBert-VITS (Need to configure HUBERT_SOFT_MODEL)
#[ABS_PATH + "/Model/louise/360_epochs.pth", ABS_PATH + "/Model/louise/config.json"],
# W2V2-VITS (Need to configure DIMENSIONAL_EMOTION_NPY)
#[ABS_PATH + "/Model/w2v2-vits/1026_epochs.pth", ABS_PATH + "/Model/w2v2-vits/config.json"],
]

sovits 4.0的模型无法使用

好像是配置文件不兼容导致的，请问有啥解决办法么？

我将配置文件里的"n_speakers"挪到了data里可以正常运行，但是生成wav过程中报错了，生成的都是1kb无法播放的文件。

VITS without [JA]

Hi. Thanks for your contribution to the project. I had an issue using your project with a trained model from vits-finetuning. It speaks that [JA] in the generated audio file. Is there a way to prevent that? Thanks.

Docker如何启用GPU加速

我尝试了如下的配置

version: '3.4'
services:
  vits:
    image: vits-simple-api:latest
    restart: always
    ports:
      - 23456:23456
    environment:
      LANG: 'C.UTF-8'
      TZ: Asia/Shanghai #timezone
    volumes:
      - ./Model:/app/Model # 挂载模型文件夹
      - ./config.py:/app/config.py # 挂载配置文件
      - /opt/cuda:/opt/cuda
      - ./cuda_test.py:/app/cuda_test.py
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu,utility ]

但是torch.cuda.is_available()是False。

nvidia-smi是可用的。

我在Dockerfile中删去了RUN pip install torch --index-url https://download.pytorch.org/whl/cpu改为了RUN pip install torch.

之后我在容器中将cuda添加到PATH中，使得nvcc -v也可用，但是还是没有解决问题。

ubanru22.04 docker环境下报错

作者又要麻烦你了
请问这是什么情况啊

对接lss233的qqbot出现问题

大佬，好像是因为/voice/speakers id的返回值格式变了，qqbot的切换语音找不到对应的音色了，这个我直接改代码然后重新打包镜像就可以了吗

是我的问题吗？看log

DEBUG:vits-simple-api:[GD]君不见，黄河之水天上来，奔流到海不复回。君不见，高堂明镜悲白发，朝如青丝暮成雪[GD]
ERROR:app:Exception on /voice/w2v2-vits [POST]
Traceback (most recent call last):
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\flask\app.py", line 2528, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\flask\app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "G:\AI\vits接口\VITS\app.py", line 44, in check_api_key
return func(args, **kwargs)
File "G:\AI\vits接口\VITS\app.py", line 239, in voice_w2v2_api
output = tts.w2v2_vits_infer({"text": text,
File "G:\AI\vits接口\VITS\voice.py", line 454, in w2v2_vits_infer
audio = voice_obj.get_audio(voice, auto_break=True)
File "G:\AI\vits接口\VITS\voice.py", line 216, in get_audio
self.get_infer_param(text=sentence, speaker_id=speaker_id, length=length, noise=noise,
File "G:\AI\vits接口\VITS\voice.py", line 119, in get_infer_param
stn_tst = self.get_cleaned_text(text, self.hps_ms, cleaned=cleaned)
File "G:\AI\vits接口\VITS\voice.py", line 59, in get_cleaned_text
text_norm = text_to_sequence(text, hps.symbols, hps.data.text_cleaners)
File "G:\AI\vits接口\VITS\text_init_.py", line 17, in text_to_sequence
clean_text = clean_text(text, cleaner_names)
File "G:\AI\vits接口\VITS\text_init.py", line 31, in _clean_text
text = cleaner(text)
File "G:\AI\vits接口\VITS\text\cleaners.py", line 241, in chinese_dialect_cleaners
text = re.sub(r'[GD](.?)[GD]',
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\re.py", line 209, in sub
return compile(pattern, flags).sub(repl, string, count)
File "G:\AI\vits接口\VITS\text\cleaners.py", line 242, in
lambda x: cantonese_to_ipa(x.group(1)) + ' ', text)
File "G:\AI\vits接口\VITS\text\cantonese.py", line 62, in cantonese_to_ipa
text = converter.convert(text).replace('-', '').replace('$', ' ')
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\opencc_init.py", line 87, in convert
retv_i = libopencc.opencc_convert_utf8(self._od, text, len(text))
OSError: exception: access violation reading 0xFFFFFFFFFFFFFFFF
INFO:werkzeug:127.0.0.1 - - [31/May/2023 10:51:14] "POST /voice/w2v2-vits HTTP/1.1" 500 -
DEBUG:urllib3.connectionpool:http://127.0.0.1:23456 "POST /voice/w2v2-vits HTTP/1.1" 500 265

单纯使用转化功能不训练，需要的服务器性能是什么

我服务器是centOS 8，尝试启动项目：

python3 app.py 
torch:2.0.1+cu117 GPU_available:False
device:cpu device.type:cpu

到这里CPU占用和硬盘IO飙升，很快导致io_util达到峰值，服务器无法响应了。关掉终端重新登录也无法响应，只能重启服务器。死机的时候看了一眼硬盘读取是285508kb/s，这是读取模型吗？总共我就导入了三个pth模型。
有什么好的办法吗？（已经多次尝试，必死机）

希望能加入mp3格式的输出

mp3作为一种常用格式能被许多应用内部打开，而且是压缩过的。用ogg一些应用内部打不开，还要转到一些外部播放器里打开有点麻烦，谢谢！

'list' object has no attribute 'get'报错

服务器部署的应该没有问题的，不清楚什么原因报的错

大佬请教一下，我用wechat机器人请求vits-simple-api，反馈语音合成失败，该如何解决。

大佬您好，最近在测试微信机器人的语音合成方案，在调用咱们家的vist-simple-api时遇到了点问题，想请教您。

前置确认
1.我确认我运行的是最新版本的代码，并且安装了所需的依赖
搜索issues中是否已存在类似问题

2.我已经搜索过issues，没有跟我遇到的问题相关的issue

操作系统类型?
Windows 10 专业版(21H2)

运行的python版本是?
python 3.10.8

复现步骤 🕹
启动终端——>激活环境变量(fort)——>python app.py——>给微信机器人发语音——>请求失败

问题描述 😯
在微信机器人项目(https://github.com/zhayujie/chatgpt-on-wechat) 中调用仓库的语音合成api,现实语音合成失败。

终端日志 📒
127.0.0.1 - - [14/May/2023 13:12:40] "POST /voice HTTP/1.1" 400 -

关于SSML语言停顿处理

你这个项目非常赞，我想请教一下，这个SSML语言相关的停顿在项目中哪里有处理？我没找到。我要是想在源vits项目自定义停顿应该如何做？麻烦你说一下思路，谢谢了。

你好关于请求报错列表正常输出但是语音无法合成合成出的语音仅1kb 0s 无法播放

报错代码如下

[E 230613 19:09:03 fastlid:170] fastlid.set_languages is not a list
[JA]こんにちは[JA]
ERROR:app:Exception on /voice [POST]
Traceback (most recent call last):
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\py310\lib\site-packages\flask\app.py", line 2528, in wsgi_app
response = self.full_dispatch_request()
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\py310\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
rv = self.handle_user_exception(e)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\py310\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\py310\lib\site-packages\flask\app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\app.py", line 37, in check_api_key
return func(*args, **kwargs)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\app.py", line 136, in voice_api
output = real_obj.create_infer_task(text=text,
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\voice.py", line 211, in create_infer_task
self.get_infer_param(text=sentence, speaker_id=speaker_id, length=length, noise=noise,
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\voice.py", line 147, in get_infer_param
stn_tst = self.get_cleaned_text(text, self.hps_ms, cleaned=cleaned)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\voice.py", line 71, in get_cleaned_text
text_norm = text_to_sequence(text, hps.symbols, hps.data.text_cleaners)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\text_init_.py", line 17, in text_to_sequence
clean_text = clean_text(text, cleaner_names)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\text_init.py", line 28, in _clean_text
cleaner = getattr(cleaners, name)
AttributeError: module 'text.cleaners' has no attribute 'custom_cleaners'
INFO:werkzeug:127.0.0.1 - - [13/Jun/2023 19:09:03] "POST /voice HTTP/1.1" 500 -

部署成功，打不开HTTP请求

在windows的linux子系统上拉取失败

$GZO_V`1@O0FSAZ2W }P4{4I$

大佬，可以吧你的模型共享出来吗？

如题

[BUG] 生成语音失败speaker_id = int(request.args.get("id", app.config["ID"]))

这是日志

moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:20:24] "GET /voice?text=%5BLENGTH=1.4%5D你好！有什么我可以为您做的吗？请注意，我只能通过文本输入与您交流，无法识别语音指令。&lang=zh&id=1&format=silk HTTP/1.1" 500 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:20:30] "POST /voice/speakers HTTP/1.1" 200 -
moegoe_1 | ERROR:app:Exception on /voice [GET]
moegoe_1 | Traceback (most recent call last):
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 2528, in wsgi_app
moegoe_1 | response = self.full_dispatch_request()
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1825, in full_dispatch_request
moegoe_1 | rv = self.handle_user_exception(e)
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
moegoe_1 | rv = self.dispatch_request()
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
moegoe_1 | return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
moegoe_1 | File "/app/app.py", line 51, in voice_api
moegoe_1 | speaker_id = int(request.args.get("id", app.config["ID"]))
moegoe_1 | KeyError: 'ID'
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:20:30] "GET /voice?text=%5BLENGTH=1.4%5D&lang=zh&id=1&format=silk HTTP/1.1" 500 -
moegoe_1 | ERROR:app:Exception on /voice [GET]
moegoe_1 | Traceback (most recent call last):
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 2528, in wsgi_app
moegoe_1 | response = self.full_dispatch_request()
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1825, in full_dispatch_request
moegoe_1 | rv = self.handle_user_exception(e)
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
moegoe_1 | rv = self.dispatch_request()
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
moegoe_1 | return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
moegoe_1 | File "/app/app.py", line 51, in voice_api
moegoe_1 | speaker_id = int(request.args.get("id", app.config["ID"]))
moegoe_1 | KeyError: 'ID'
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:20:35] "GET /voice?text=%5BLENGTH=1.4%5D你好！有什么我可以帮助您的吗？&lang=zh&id=1&format=silk HTTP/1.1" 500 -
moegoe_1 | * Serving Flask app 'app'
moegoe_1 | * Debug mode: off
moegoe_1 | INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
moegoe_1 | * Running on all addresses (0.0.0.0)
moegoe_1 | * Running on http://127.0.0.1:23457
moegoe_1 | * Running on http://172.21.0.2:23457
moegoe_1 | INFO:werkzeug:Press CTRL+C to quit
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:06] "POST /voice//speakers HTTP/1.1" 308 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:06] "POST /voice/speakers HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:07] "POST /voice//speakers HTTP/1.1" 308 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:07] "POST /voice/speakers HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:07] "GET /voice/?text=%5BLENGTH=1.4%5D&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:12] "GET /voice/?text=%5BLENGTH=1.4%5D这句话太长了，抱歉&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:26:46] "GET /voice/?text=%5BLENGTH=1.4%5D消息已收到！当前我还有条消息要回复，请您稍等。&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:27:12] "GET /voice/?text=%5BLENGTH=1.4%5D消息已收到！当前我还有条消息要回复，请您稍等。&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:27:27] "GET /voice/?text=%5BLENGTH=1.4%5D消息已收到！当前我还有条消息要回复，请您稍等。&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | * Serving Flask app 'app'
moegoe_1 | * Debug mode: off
moegoe_1 | INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
moegoe_1 | * Running on all addresses (0.0.0.0)
moegoe_1 | * Running on http://127.0.0.1:23457
moegoe_1 | * Running on http://172.21.0.2:23457
moegoe_1 | INFO:werkzeug:Press CTRL+C to quit
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:34:12] "POST /voice//speakers HTTP/1.1" 308 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:34:12] "POST /voice/speakers HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:34:12] "GET /voice/?text=%5BLENGTH=1.4%5D你好！有什么我可以帮助你的吗？&lang=zh&id=1&format=silk HTTP/1.1" 200 -

请问大佬，怎么能将本项目部署在另一个主机上呢

就是两个服务器，因为我部署chat-bot的服务器配置太低了，而且硬盘容量也不够，这个项目如果部署在了另一个服务器上改怎么去访问呢，有没有什么具体的思路呀，或者我该百度些什么内容呢，问题有点低级，麻烦你了

在docker上部署时出错

在docker上部署时出现以下错误：

INFO:moegoe-simple-api:角色id：0
INFO:moegoe-simple-api:合成文本：[ZH]您好！有什么我可以帮助您的吗？[ZH]
ERROR:app:Exception on /voice [GET]
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 2528, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1825, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/app/app.py", line 80, in voice_api
output, file_type, fname = real_obj.generate(text=text,
File "/app/voice.py", line 100, in generate
stn_tst = self.get_text(text, self.hps_ms, cleaned=cleaned)
File "/app/voice.py", line 56, in get_text
text_norm = text_to_sequence(text, hps.symbols, hps.data.text_cleaners)
File "/app/text/init.py", line 17, in text_to_sequence
clean_text = _clean_text(text, cleaner_names)
File "/app/text/init.py", line 31, in _clean_text
text = cleaner(text)
File "/app/text/cleaners.py", line 118, in shanghainese_cleaners
from text.shanghainese import shanghainese_to_ipa
File "/app/text/shanghainese.py", line 6, in
converter = opencc.OpenCC('zaonhe')
File "/usr/local/lib/python3.9/site-packages/opencc/init.py", line 43, in init
super(OpenCC, self).init(config)
RuntimeError: /usr/local/lib/python3.9/site-packages/opencc/clib/share/opencc/zaonhe.json not found or not accessible.
INFO:werkzeug:172.30.0.1 - - [10/Apr/2023 13:28:30] "GET /voice?text=您好！有什么我可以帮助您的吗？&lang=zh&id=0&format=silk&length=1.4 HTTP/1.1" 500 -

请问是什么原因

在docker执行python app.py时出现报错

文件结构已经检查：
$}L6@(MIT L_O{ 4K{I6)H}B$
配置文件已经检查:

报错截图如下：
$HG7AU$%PI23F`2ED(@_MR{Q$

speaker id 0 does not exist

http://127.0.0.1:23456/voice/speakers 返回 {"HUBERT-VITS":[],"VITS":[],"W2V2-VITS":[]} 是不是表示失败？

http://127.0.0.1:23456/voice?text=我喜欢看**记录片
{"message":"id 0 does not exist","status":"error"}

使用say指令后报错IncompleteRead(432040 bytes read, 505973 more expected)

对机器人发送
say 你好
后台报错
INFO:vits-simple-api:[VITS] len:2 text：你好
[E 230609 13:09:49 fastlid:48] IncompleteRead(451317 bytes read, 486696 more expected)
请问是什么问题

【功能请求】希望能添加防止公网服务器被滥用的鉴权功能

作者您好：
目前vits-simple-api没有鉴权功能，这意味着部署在公网的vits-simple-api会响应所有请求，一旦服务器URL暴露，可能有被滥用的风险，建议考虑加入类似于api key的鉴权功能，谢谢！

报错TypeError: 'type' object is not subscriptable

按步骤部署最后python app.py时报错
Traceback (most recent call last):
File "app.py", line 10, in
from utils import clean_folder, merge_model
File "C:\Users\Administrator\Desktop\MoeGoe-Simple-API\utils.py", line 120, in
def to_pcm(in_path: str) -> tuple[str, int]:
TypeError: 'type' object is not subscriptable

在移动平台重新搭建时出现故障

平台 win11 WSL2.0 ubuntu
docker能够运行，但是无法输出内容，已经检查模型文件与路径并重新尝试拉取模型三次部署，皆无法解决问题。
输出内容：

配置文件写法:
$LF1H)XD_ZF)1$9FK8G(PU{1$

linux版本的speakers请求和windows版本的返回json不一样？

wget http://127.0.0.1:23456/voice/speakers 只会得到id和name两个属性，而且没有属性名，更像map？
{"HuBert-VITS":[],"VITS":[{"0":"爱丽丝"},{"1":"日奈"},{"2":"星野"},{"3":"优香"}],"W2V2-VITS":[]}

使用日文进行对话的时候，如果中间夹杂有汉字或者平假名，会跳过汉字或者平假名

在huggingface上部署如何部署项目

在服务器上运行VITS模型需要满足较高的运行环境要求，而本地电脑不可能一直运行不关机。相比之下，Hugging Face提供的免费配置能够轻松地运行一个VITS项目。

关于SSML语言停顿问题

[Feature Request] mix 模式支持语言检测

请求 /voice?lang=mix 时提供的 text 可以自动标记语言。

例如： /voice?lang=mix&text=你好用日语说是こんにちは

服务端可以自动标记成 [ZH]你好用日语说是[ZH][JA]こんにちは[JA]

如果这个功能可以在服务端完成，客户端可以少写很多代码，这个项目就可以作为一个即插即用的 API 直接使用了。

Mac M1 pip install error when installing openjtalk>=0.3.0.dev2

it seems something go wrong when installing packages:

(py30) zhonghaoli@MacBook-Pro-14-inch-2021 vits-simple-api % pip install -r requirements.txt
Defaulting to user installation because normal site-packages is not writeable
Collecting numba (from -r requirements.txt (line 1))
Using cached numba-0.57.0-cp310-cp310-macosx_11_0_arm64.whl (2.5 MB)
Collecting librosa (from -r requirements.txt (line 2))
Using cached librosa-0.10.0.post2-py3-none-any.whl (253 kB)
Collecting numpy==1.23.3 (from -r requirements.txt (line 3))
Using cached numpy-1.23.3-cp310-cp310-macosx_11_0_arm64.whl (13.3 MB)
Collecting scipy (from -r requirements.txt (line 4))
Using cached scipy-1.10.1-cp310-cp310-macosx_12_0_arm64.whl (28.8 MB)
Collecting torch (from -r requirements.txt (line 5))
Using cached torch-2.0.1-cp310-none-macosx_11_0_arm64.whl (55.8 MB)
Collecting unidecode (from -r requirements.txt (line 6))
Using cached Unidecode-1.3.6-py3-none-any.whl (235 kB)
Collecting openjtalk==0.3.0.dev2 (from -r requirements.txt (line 7))
Using cached openjtalk-0.3.0.dev2.tar.gz (24.9 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
Traceback (most recent call last):
File "/Users/zhonghaoli/miniforge3/envs/py30/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in

加入amr格式音频输出

部分平台（如qq）需要使用amr格式才能发送音频，使用其他格式需要用ffmpeg转码成amr，不如在服务端自动转码，省掉前端流程

这个使用gpu加速的话一般短句的十几个字的这种。一句话多久合成好呢？

size mismatch for emb_g.weight: copying a param with shape torch.Size([5, 256]) from checkpoint, the shape in current model is torch.Size([7, 256]).

config.py

import os
import sys

JSON_AS_ASCII = False
MAX_CONTENT_LENGTH = 5242880

端口

PORT = 23457

项目的绝对路径

ABS_PATH = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])))

上传文件的临时路径，非必要不要动

UPLOAD_FOLDER = ABS_PATH + "/upload"

音频转换的临时缓存路径，非必要不要动

CACHE_PATH = ABS_PATH + "/cache"

'''
vits模型路径填写方法，MODEL_LIST中的每一行是
[ABS_PATH+"/Model/{模型文件夹}/{.pth模型}", ABS_PATH+"/Model/{模型文件夹}/config.json"],
也可以写相对路径或绝对路径，由于windows和linux路径写法不同，用上面的写法或绝对路径最稳妥
示例：
MODEL_LIST = [
#VITS
[ABS_PATH+"/Model/Nene_Nanami_Rong_Tang/1374_epochs.pth", ABS_PATH+"/Model/Nene_Nanami_Rong_Tang/config.json"],
[ABS_PATH+"/Model/Zero_no_tsukaima/1158_epochs.pth", ABS_PATH+"/Model/Zero_no_tsukaima/config.json"],
[ABS_PATH+"/Model/g/G_953000.pth", ABS_PATH+"/Model/g/config.json"],
#HuBert-VITS
[ABS_PATH+"/Model/louise/360_epochs.pth", ABS_PATH+"/Model/louise/config.json", ABS_PATH+"/Model/louise/hubert-soft-0d54a1f4.pt"],
]
'''

模型加载列表

MODEL_LIST = [
[ABS_PATH+"/Model/g/1374_epochs.pth", ABS_PATH+"/Model/g/config.json"],
]

docker-compose.yaml

version: '3.4'
services:
moegoe:
image: artrajz/moegoe-simple-api:latest
restart: always
ports:
- 23457:23457
environment:
LANG: 'C.UTF-8'
volumes:
- ./Model:/app/Model # 挂载模型文件夹
- ./config.py:/app/config.py # 挂载配置文件

这是模型存放的位置

是哪出问题了呢？

发现三个bug？

我测试一下了，很棒！感谢大佬！
不过好像有三个bug（或许是我的方式不对？）
1.方言无法失败会自动转为普通话，比如我用粤语；[GD]XXXXXXXX[GD] , 无效，会读成普通话；
2.用老大的post，发现ssml 会出现这样的bug：

Traceback (most recent call last):
  File "1.py", line 5, in <module>
    voice_ssml(smm);
  File "D:\ai\vits\bmss_fy\vits-simple-api-windows\post.py", line 151, in voice_ssml
    fname = re.findall("filename=(.+)", res.headers["Content-Disposition"])[0]
  File "C:\Users\lin85\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-disposition'

3.（不是bug哈哈哈）原emotion可以使用单个npy文件使用情绪，是否后续追加，也支持单个npy？

最后感谢付出。

请问为什么链接打开之后什么内容都没有呢

cmd里会显示 "GET /favicon.ico HTTP/1.1" 404 -

bug:The logging does not work if no model is loaded

未加载任何模型时logging不工作#36 (comment)

请问模型的推理一般对硬件要求有多高

报错 EOFError: Ran out of input

在使用python app.py时出现了报错，日志如下：
root@ecsekei:~/vits# python3 app.py
torch:2.0.0+cu117 GPU_available:False
device:cpu device.type:cpu
Traceback (most recent call last):
File "/root/vits/app.py", line 25, in
voice_obj, voice_speakers = merge_model(app.config["MODEL_LIST"])
File "/root/vits/utils/merge.py", line 53, in merge_model
obj = vits(model=i[0], config=i[1])
File "/root/vits/voice.py", line 54, in init
self.load_model(model, model_)
File "/root/vits/voice.py", line 57, in load_model
utils.load_checkpoint(model, self.net_g_ms)
File "/root/vits/utils/utils.py", line 43, in load_checkpoint
checkpoint_dict = load(checkpoint_path, map_location='cpu')
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 815, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1033, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input

不是很懂python求助路径问题

clone项目到服务器上后，下一步，模型库以及config.py都要放到服务器根路径的/path/to/....？但是在app.py里读取config.py的时候却没有指定ABS_PATH，应该是直接在同级目录下加载，这样能找到吗？感觉有一点反直觉，想确认一下

宝子，好像有一个bug

当我测试这个的时候，它自动把一些词识别为日语。
并且就算我在加上[ZH]包裹也还是复现。

log如下：

DEBUG:vits-simple-api:[[EN]ZH][EN][ZH] 君不见，[ZH][JA]黄河之水天上来，[JA][ZH]奔流到海不复回。君不见，高堂明镜悲白发，朝如青丝暮成雪[[ZH][EN]ZH][EN]

artrajz / vits-simple-api Goto Github PK

vits-simple-api's Introduction

vits-simple-api

Feature

Online Demo

Deployment

Docker Deployment (Recommended for Linux)

Step 1: Pull the Docker Image

Step 2: Start

Image Update

Virtual Environment Deployment

Step 1: Clone the Project

Step 2: Install Python Dependencies

Step 3: Start

Windows Quick Deployment Package

Step 1: Download and Extract the Deployment Package

Step 2: Start

Model Loading

Step 1: Download VITS Models

Step 2: Loading Models

Automatic Model Loading

Manual Model Loading

Other Models

GPU accelerated

Windows

Install CUDA

Install GPU version of PyTorch

Linux

WebUI

Inference Frontend

Admin Backend

Function Options Explanation

Disable the Admin Backend

Bert-VITS2 Configuration and Language/Bert Model Usage

Custom Chinese Polyphonic Dictionary

GPT-SoVITS Reference Audio Presets

Reading API

Frequently Asked Questions

Installation Issues with fastText Dependency

Installation Issues with pyopenjtalk Dependency

Bert-VITS2 Version Compatibility

API

GET

speakers list

voice vits

check

POST

API KEY

Parameter

VITS

VITS voice conversion

HuBert-VITS

W2V2-VITS

Dimensional emotion

Bert-VITS2

GPT-SoVITS Speech Synthesis

SSML (Speech Synthesis Markup Language)

Reading

Example

Communication

Acknowledgements

Thank You to All Contributors

vits-simple-api's People

Contributors

Stargazers

Watchers

Forkers

vits-simple-api's Issues

端口

项目的绝对路径

上传文件的临时路径，非必要不要动

音频转换的临时缓存路径，非必要不要动

模型加载列表

Recommend Projects

Recommend Topics

Recommend Org