
vits-simple-api's Issues

Failure when rebuilding the deployment on a new platform

Platform: Windows 11, WSL 2.0, Ubuntu
Docker runs, but nothing is output. I have checked the model files and paths and re-pulled the models to redeploy three times; none of it solved the problem.
Output:
(screenshot: https://user-images.githubusercontent.com/95335008/229728096-bb3db031-d13d-4f8b-93ef-c065e3ec48ad.png)
Docker console output:
(screenshot)
Configuration file:
(screenshot)

[BUG] Speech generation fails at speaker_id = int(request.args.get("id", app.config["ID"]))

(screenshot)

Here is the log:

moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:20:24] "GET /voice?text=%5BLENGTH=1.4%5D你好!有什么我可以为您做的吗?请注意,我只能通过文本输入与您交流,无法识别语音指令。&lang=zh&id=1&format=silk HTTP/1.1" 500 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:20:30] "POST /voice/speakers HTTP/1.1" 200 -
moegoe_1 | ERROR:app:Exception on /voice [GET]
moegoe_1 | Traceback (most recent call last):
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 2528, in wsgi_app
moegoe_1 | response = self.full_dispatch_request()
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1825, in full_dispatch_request
moegoe_1 | rv = self.handle_user_exception(e)
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
moegoe_1 | rv = self.dispatch_request()
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
moegoe_1 | return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
moegoe_1 | File "/app/app.py", line 51, in voice_api
moegoe_1 | speaker_id = int(request.args.get("id", app.config["ID"]))
moegoe_1 | KeyError: 'ID'
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:20:30] "GET /voice?text=%5BLENGTH=1.4%5D&lang=zh&id=1&format=silk HTTP/1.1" 500 -
moegoe_1 | ERROR:app:Exception on /voice [GET]
moegoe_1 | Traceback (most recent call last):
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 2528, in wsgi_app
moegoe_1 | response = self.full_dispatch_request()
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1825, in full_dispatch_request
moegoe_1 | rv = self.handle_user_exception(e)
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
moegoe_1 | rv = self.dispatch_request()
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
moegoe_1 | return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
moegoe_1 | File "/app/app.py", line 51, in voice_api
moegoe_1 | speaker_id = int(request.args.get("id", app.config["ID"]))
moegoe_1 | KeyError: 'ID'
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:20:35] "GET /voice?text=%5BLENGTH=1.4%5D你好!有什么我可以帮助您的吗?&lang=zh&id=1&format=silk HTTP/1.1" 500 -
moegoe_1 | * Serving Flask app 'app'
moegoe_1 | * Debug mode: off
moegoe_1 | INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
moegoe_1 | * Running on all addresses (0.0.0.0)
moegoe_1 | * Running on http://127.0.0.1:23457
moegoe_1 | * Running on http://172.21.0.2:23457
moegoe_1 | INFO:werkzeug:Press CTRL+C to quit
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:06] "POST /voice//speakers HTTP/1.1" 308 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:06] "POST /voice/speakers HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:07] "POST /voice//speakers HTTP/1.1" 308 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:07] "POST /voice/speakers HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:07] "GET /voice/?text=%5BLENGTH=1.4%5D&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:12] "GET /voice/?text=%5BLENGTH=1.4%5D这句话太长了,抱歉&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:26:46] "GET /voice/?text=%5BLENGTH=1.4%5D消息已收到!当前我还有条消息要回复,请您稍等。&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:27:12] "GET /voice/?text=%5BLENGTH=1.4%5D消息已收到!当前我还有条消息要回复,请您稍等。&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:27:27] "GET /voice/?text=%5BLENGTH=1.4%5D消息已收到!当前我还有条消息要回复,请您稍等。&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | * Serving Flask app 'app'
moegoe_1 | * Debug mode: off
moegoe_1 | INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
moegoe_1 | * Running on all addresses (0.0.0.0)
moegoe_1 | * Running on http://127.0.0.1:23457
moegoe_1 | * Running on http://172.21.0.2:23457
moegoe_1 | INFO:werkzeug:Press CTRL+C to quit
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:34:12] "POST /voice//speakers HTTP/1.1" 308 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:34:12] "POST /voice/speakers HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:34:12] "GET /voice/?text=%5BLENGTH=1.4%5D你好!有什么我可以帮助你的吗?&lang=zh&id=1&format=silk HTTP/1.1" 200 -
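The KeyError above means app.config has no "ID" entry, i.e. the mounted config.py does not define a default speaker id. A minimal sketch of the kind of default block shown in the config examples elsewhere in this thread (values are illustrative; the exact set of options depends on your version):

# config.py -- default GET parameters (sketch; values are illustrative)
ID = 0          # default speaker id, read by app.config["ID"]
FORMAT = "wav"  # default audio format
LANG = "zh"     # default language
LENGTH = 1      # default speech length / speed factor
NOISE = 0.667   # default noise
NOISEW = 0.8    # default noise deviation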

[Feature Request] Language detection for mix mode

When requesting /voice?lang=mix, the provided text should be tagged with languages automatically.

For example: /voice?lang=mix&text=你好用日语说是こんにちは

The server could automatically tag it as [ZH]你好用日语说是[ZH][JA]こんにちは[JA]

If this could be done on the server side, clients would need much less code, and the project could be used directly as a plug-and-play API.
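For illustration only, a rough sketch of the requested behaviour using a character-range heuristic (kana runs become [JA], everything else [ZH]); a real implementation would presumably use proper language detection and handle more languages:

import re

def auto_tag(text: str) -> str:
    # Wrap hiragana/katakana runs in [JA]...[JA] and the rest in [ZH]...[ZH].
    # Kanji is ambiguous and is treated as Chinese in this sketch.
    ja_run = re.compile(r'[\u3040-\u30ff]+')
    out, pos = [], 0
    for m in ja_run.finditer(text):
        if m.start() > pos:
            out.append(f'[ZH]{text[pos:m.start()]}[ZH]')
        out.append(f'[JA]{m.group(0)}[JA]')
        pos = m.end()
    if pos < len(text):
        out.append(f'[ZH]{text[pos:]}[ZH]')
    return ''.join(out)

print(auto_tag('你好用日语说是こんにちは'))  # [ZH]你好用日语说是[ZH][JA]こんにちは[JA]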

Error when deploying with Docker

The following error appears when deploying with Docker:

INFO:moegoe-simple-api:角色id:0
INFO:moegoe-simple-api:合成文本:[ZH]您好!有什么我可以帮助您的吗?[ZH]
ERROR:app:Exception on /voice [GET]
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 2528, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1825, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/app/app.py", line 80, in voice_api
output, file_type, fname = real_obj.generate(text=text,
File "/app/voice.py", line 100, in generate
stn_tst = self.get_text(text, self.hps_ms, cleaned=cleaned)
File "/app/voice.py", line 56, in get_text
text_norm = text_to_sequence(text, hps.symbols, hps.data.text_cleaners)
File "/app/text/init.py", line 17, in text_to_sequence
clean_text = _clean_text(text, cleaner_names)
File "/app/text/init.py", line 31, in _clean_text
text = cleaner(text)
File "/app/text/cleaners.py", line 118, in shanghainese_cleaners
from text.shanghainese import shanghainese_to_ipa
File "/app/text/shanghainese.py", line 6, in
converter = opencc.OpenCC('zaonhe')
File "/usr/local/lib/python3.9/site-packages/opencc/init.py", line 43, in init
super(OpenCC, self).init(config)
RuntimeError: /usr/local/lib/python3.9/site-packages/opencc/clib/share/opencc/zaonhe.json not found or not accessible.
INFO:werkzeug:172.30.0.1 - - [10/Apr/2023 13:28:30] "GET /voice?text=您好!有什么我可以帮助您的吗?&lang=zh&id=0&format=silk&length=1.4 HTTP/1.1" 500 -

What could be the cause?

Is this something on my end? See the log

DEBUG:vits-simple-api:[GD]君不见,黄河之水天上来,奔流到海不复回。君不见,高堂明镜悲白发,朝如青丝暮成雪[GD]
ERROR:app:Exception on /voice/w2v2-vits [POST]
Traceback (most recent call last):
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\flask\app.py", line 2528, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\flask\app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "G:\AI\vits接口\VITS\app.py", line 44, in check_api_key
return func(args, **kwargs)
File "G:\AI\vits接口\VITS\app.py", line 239, in voice_w2v2_api
output = tts.w2v2_vits_infer({"text": text,
File "G:\AI\vits接口\VITS\voice.py", line 454, in w2v2_vits_infer
audio = voice_obj.get_audio(voice, auto_break=True)
File "G:\AI\vits接口\VITS\voice.py", line 216, in get_audio
self.get_infer_param(text=sentence, speaker_id=speaker_id, length=length, noise=noise,
File "G:\AI\vits接口\VITS\voice.py", line 119, in get_infer_param
stn_tst = self.get_cleaned_text(text, self.hps_ms, cleaned=cleaned)
File "G:\AI\vits接口\VITS\voice.py", line 59, in get_cleaned_text
text_norm = text_to_sequence(text, hps.symbols, hps.data.text_cleaners)
File "G:\AI\vits接口\VITS\text_init_.py", line 17, in text_to_sequence
clean_text = clean_text(text, cleaner_names)
File "G:\AI\vits接口\VITS\text_init
.py", line 31, in _clean_text
text = cleaner(text)
File "G:\AI\vits接口\VITS\text\cleaners.py", line 241, in chinese_dialect_cleaners
text = re.sub(r'[GD](.
?)[GD]',
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\re.py", line 209, in sub
return compile(pattern, flags).sub(repl, string, count)
File "G:\AI\vits接口\VITS\text\cleaners.py", line 242, in
lambda x: cantonese_to_ipa(x.group(1)) + ' ', text)
File "G:\AI\vits接口\VITS\text\cantonese.py", line 62, in cantonese_to_ipa
text = converter.convert(text).replace('-', '').replace('$', ' ')
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\opencc_init
.py", line 87, in convert
retv_i = libopencc.opencc_convert_utf8(self._od, text, len(text))
OSError: exception: access violation reading 0xFFFFFFFFFFFFFFFF
INFO:werkzeug:127.0.0.1 - - [31/May/2023 10:51:14] "POST /voice/w2v2-vits HTTP/1.1" 500 -
DEBUG:urllib3.connectionpool:http://127.0.0.1:23456 "POST /voice/w2v2-vits HTTP/1.1" 500 265

Please add MP3 output

MP3 is a common format that many applications can open natively, and it is compressed. Some applications cannot open OGG internally, so you have to switch to an external player, which is a hassle. Thanks!
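For what it's worth, a minimal sketch of server-side conversion with ffmpeg (assumes an ffmpeg binary on PATH; the file names are illustrative, not the project's actual cache paths):

import subprocess

def wav_to_mp3(wav_path: str, mp3_path: str) -> str:
    # Transcode the generated WAV to MP3 with ffmpeg.
    subprocess.run(["ffmpeg", "-y", "-i", wav_path, "-b:a", "192k", mp3_path], check=True)
    return mp3_path

# wav_to_mp3("cache/output.wav", "cache/output.mp3")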

How to deploy this project on Hugging Face

Running a VITS model on a server has fairly high environment requirements, and a local PC cannot stay on all the time. By comparison, the free tier that Hugging Face provides can comfortably run a VITS project.

No sound, or 0-second clips, in audio generated after deploying on a server

Deployment method: docker-compose

Below is the content of my config.py:

import os
import sys

JSON_AS_ASCII = False
MAX_CONTENT_LENGTH = 5242880

# port
PORT = 23456
# absolute path
ABS_PATH = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])))
# upload path
UPLOAD_FOLDER = ABS_PATH + "/upload"
# cache path
CACHE_PATH = ABS_PATH + "/cache"
# zh ja ko en ...
LANGUAGE_AUTOMATIC_DETECT = ["zh","ja"]
#set to True to enable API Key authentication
API_KEY_ENABLED = False
# API_KEY is required for authentication
API_KEY = "api-key"

'''
For each model, the entry in the model list is filled in as follows
example:
MODEL_LIST = [
    #VITS
    [ABS_PATH+"/Model/Nene_Nanami_Rong_Tang/1374_epochs.pth", ABS_PATH+"/Model/Nene_Nanami_Rong_Tang/config.json"],
    [ABS_PATH+"/Model/Zero_no_tsukaima/1158_epochs.pth", ABS_PATH+"/Model/Zero_no_tsukaima/config.json"],
    [ABS_PATH+"/Model/g/G_953000.pth", ABS_PATH+"/Model/g/config.json"],
    #HuBert-VITS
    [ABS_PATH+"/Model/louise/360_epochs.pth", ABS_PATH+"/Model/louise/config.json", ABS_PATH+"/Model/louise/hubert-soft-0d54a1f4.pt"],
]
'''
# load multiple models
MODEL_LIST = [
    [ABS_PATH+"/Model/Nene_Meguru_Yoshino_Mako_Myrasame_hoharu_Nanami/365_epochs.pth", ABS_PATH+"/Model/Nene_Meguru_Yoshino_Mako_Myrasame_hoharu_Nanami/config.json"],
    [ABS_PATH+"/Model/to_love/1113_epochs.pth", ABS_PATH+"/Model/to_love/config.json"],
]

"""
default params
The following are the default values used by the VITS GET method when a parameter is not specified
"""

# GET default speaker id
ID = 0
# GET default audio format: wav, ogg or silk
FORMAT = "wav"
# GET default language
LANG = "AUTO"
# GET default speech length (controls speed; a larger value means slower speech)
LENGTH = 1
# GET default noise
NOISE = 0.667
# GET default noise deviation
NOISEW = 0.8
# Segmentation threshold for long text; text will not be divided if MAX <= 0
MAX = 50

Other endpoints, such as http://127.0.0.1:23456/voice/speakers, all return normally, but the audio generated by http://127.0.0.1:23456/voice?text=你好呀 contains only breathing sounds.

About SSML pause handling

This project is great. May I ask where the SSML-related pauses are handled in the project? I couldn't find it. If I want to add custom pauses in the original vits project, how should I go about it? Could you outline the approach? Thanks.

Not very familiar with Python; a question about paths

After cloning the project onto the server, do the models and config.py then have to go under /path/to/.... at the server's root? But when app.py reads config.py it does not use ABS_PATH, so it apparently loads it from the same directory. Can it be found that way? It feels a bit counter-intuitive, so I'd like to confirm.
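For reference, a quick way to see what ABS_PATH resolves to (this mirrors the expression in config.py): it is the directory containing the launched script, so MODEL_LIST entries built from it resolve regardless of the working directory, while config.py itself is simply imported from the project directory next to app.py.

import os
import sys

# Same expression as in config.py: the directory of the script that was run.
ABS_PATH = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])))
print(ABS_PATH)  # e.g. the cloned vits-simple-api directory, not the filesystem root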

Found three bugs?

I gave it a test and it's great, thanks!
There seem to be three bugs, though (or maybe I'm doing something wrong?)
1. Dialects fail silently and fall back to Mandarin. For example with Cantonese, [GD]XXXXXXXX[GD] has no effect and is read as Mandarin.
2. Using your post.py, SSML produces this error:

Traceback (most recent call last):
  File "1.py", line 5, in <module>
    voice_ssml(smm);
  File "D:\ai\vits\bmss_fy\vits-simple-api-windows\post.py", line 151, in voice_ssml
    fname = re.findall("filename=(.+)", res.headers["Content-Disposition"])[0]
  File "C:\Users\lin85\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-disposition'

3. (Not a bug, haha) The original emotion feature could use a single npy file for the emotion. Will a single npy also be supported later?

Thanks again for your work.

Error: EOFError: Ran out of input

The error occurs when running python app.py; the log is below:
root@ecsekei:~/vits# python3 app.py
torch:2.0.0+cu117 GPU_available:False
device:cpu device.type:cpu
Traceback (most recent call last):
File "/root/vits/app.py", line 25, in
voice_obj, voice_speakers = merge_model(app.config["MODEL_LIST"])
File "/root/vits/utils/merge.py", line 53, in merge_model
obj = vits(model=i[0], config=i[1])
File "/root/vits/voice.py", line 54, in init
self.load_model(model, model_)
File "/root/vits/voice.py", line 57, in load_model
utils.load_checkpoint(model, self.net_g_ms)
File "/root/vits/utils/utils.py", line 43, in load_checkpoint
checkpoint_dict = load(checkpoint_path, map_location='cpu')
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 815, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1033, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input
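A possible quick check: this EOFError is raised by torch.load while reading the checkpoint, which usually means the .pth file is empty or truncated (e.g. an interrupted download). The path below is illustrative:

import os
import torch

ckpt = "/root/vits/Model/your_model/G_xxx.pth"  # illustrative path
print(os.path.getsize(ckpt))                    # 0 or a tiny size => re-download the model
state = torch.load(ckpt, map_location="cpu")    # should succeed on an intact checkpoint
print(list(state.keys())[:5])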

Mac M1 pip install error when installing openjtalk>=0.3.0.dev2

It seems something goes wrong when installing the packages:

(py30) zhonghaoli@MacBook-Pro-14-inch-2021 vits-simple-api % pip install -r requirements.txt
Defaulting to user installation because normal site-packages is not writeable
Collecting numba (from -r requirements.txt (line 1))
Using cached numba-0.57.0-cp310-cp310-macosx_11_0_arm64.whl (2.5 MB)
Collecting librosa (from -r requirements.txt (line 2))
Using cached librosa-0.10.0.post2-py3-none-any.whl (253 kB)
Collecting numpy==1.23.3 (from -r requirements.txt (line 3))
Using cached numpy-1.23.3-cp310-cp310-macosx_11_0_arm64.whl (13.3 MB)
Collecting scipy (from -r requirements.txt (line 4))
Using cached scipy-1.10.1-cp310-cp310-macosx_12_0_arm64.whl (28.8 MB)
Collecting torch (from -r requirements.txt (line 5))
Using cached torch-2.0.1-cp310-none-macosx_11_0_arm64.whl (55.8 MB)
Collecting unidecode (from -r requirements.txt (line 6))
Using cached Unidecode-1.3.6-py3-none-any.whl (235 kB)
Collecting openjtalk==0.3.0.dev2 (from -r requirements.txt (line 7))
Using cached openjtalk-0.3.0.dev2.tar.gz (24.9 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
Traceback (most recent call last):
File "/Users/zhonghaoli/miniforge3/envs/py30/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in

I'm calling vits-simple-api from a WeChat bot and it reports that speech synthesis failed; how can I fix this?

Hello, I've recently been testing a speech-synthesis setup for a WeChat bot and ran into a problem when calling vits-simple-api; I'd like to ask for your advice.

Preliminary checks
1. I confirm that I am running the latest version of the code and have installed the required dependencies.

2. I have searched the existing issues and found none related to the problem I am encountering.

Operating system?
Windows 10 Pro (21H2)

Python version?
Python 3.10.8

Steps to reproduce 🕹
Open a terminal -> activate the environment (fort) -> python app.py -> send a voice message to the WeChat bot -> the request fails

Problem description 😯
When calling this repository's speech-synthesis API from the WeChat bot project (https://github.com/zhayujie/chatgpt-on-wechat), it reports that speech synthesis failed.
(screenshot)

Terminal log 📒
127.0.0.1 - - [14/May/2023 13:12:40] "POST /voice HTTP/1.1" 400 -

(screenshot)

What server specs are needed for inference only (no training)?

My server runs CentOS 8. When I try to start the project:

python3 app.py 
torch:2.0.1+cu117 GPU_available:False
device:cpu device.type:cpu

At this point CPU usage and disk I/O spike, io_util quickly hits its ceiling, and the server stops responding. Closing the terminal and logging back in does not help either; the only fix is rebooting the server. When it froze, the disk read rate was 285508 kB/s. Is that the models being loaded? I only loaded three .pth models in total.
Is there a good way around this? (I have tried repeatedly; it freezes every time.)

size mismatch for emb_g.weight: copying a param with shape torch.Size([5, 256]) from checkpoint, the shape in current model is torch.Size([7, 256]).

(screenshot)

config.py

import os
import sys

JSON_AS_ASCII = False
MAX_CONTENT_LENGTH = 5242880

# Port
PORT = 23457

# Absolute path of the project
ABS_PATH = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])))

# Temporary path for uploaded files; do not change unless necessary
UPLOAD_FOLDER = ABS_PATH + "/upload"

# Temporary cache path for audio conversion; do not change unless necessary
CACHE_PATH = ABS_PATH + "/cache"

'''
How to fill in vits model paths: each row of MODEL_LIST is
[ABS_PATH+"/Model/{model folder}/{model .pth}", ABS_PATH+"/Model/{model folder}/config.json"],
Relative or absolute paths also work, but since Windows and Linux paths are written differently,
the form above or an absolute path is the safest.
Example:
MODEL_LIST = [
#VITS
[ABS_PATH+"/Model/Nene_Nanami_Rong_Tang/1374_epochs.pth", ABS_PATH+"/Model/Nene_Nanami_Rong_Tang/config.json"],
[ABS_PATH+"/Model/Zero_no_tsukaima/1158_epochs.pth", ABS_PATH+"/Model/Zero_no_tsukaima/config.json"],
[ABS_PATH+"/Model/g/G_953000.pth", ABS_PATH+"/Model/g/config.json"],
#HuBert-VITS
[ABS_PATH+"/Model/louise/360_epochs.pth", ABS_PATH+"/Model/louise/config.json", ABS_PATH+"/Model/louise/hubert-soft-0d54a1f4.pt"],
]
'''

# Model loading list
MODEL_LIST = [
[ABS_PATH+"/Model/g/1374_epochs.pth", ABS_PATH+"/Model/g/config.json"],
]

docker-compose.yaml

version: '3.4'
services:
  moegoe:
    image: artrajz/moegoe-simple-api:latest
    restart: always
    ports:
      - 23457:23457
    environment:
      LANG: 'C.UTF-8'
    volumes:
      - ./Model:/app/Model # mount the model folder
      - ./config.py:/app/config.py # mount the config file

This is where the models are stored:
(screenshot)
Where did it go wrong?

Error: TypeError: 'type' object is not subscriptable

Following the deployment steps, running python app.py at the end fails:
Traceback (most recent call last):
File "app.py", line 10, in
from utils import clean_folder, merge_model
File "C:\Users\Administrator\Desktop\MoeGoe-Simple-API\utils.py", line 120, in
def to_pcm(in_path: str) -> tuple[str, int]:
TypeError: 'type' object is not subscriptable
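For context, built-in generics such as tuple[str, int] in annotations require Python 3.9+; on older interpreters the typing module form avoids this TypeError. A minimal sketch (the function body is illustrative, not the project's actual code):

from typing import Tuple

def to_pcm(in_path: str) -> Tuple[str, int]:
    # typing.Tuple works on Python 3.8 and earlier, where tuple[str, int]
    # raises "TypeError: 'type' object is not subscriptable".
    out_path = in_path + ".pcm"  # illustrative body
    return out_path, 16000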

Question about SSML pauses

This project is great. May I ask where the SSML-related pauses are handled in the project? I couldn't find it. If I want to add custom pauses in the original vits project, how should I go about it? Could you outline the approach? Thanks.

so-vits 4.0 models cannot be used

It seems to be caused by an incompatible config file; is there a workaround?

After moving "n_speakers" into the data section of the config it runs, but an error occurs while generating the wav, and all the output files are 1 KB and unplayable.

How can I deploy this project on a separate host?

I have two servers; the one running my chat-bot is too low-spec and also short on disk space. If this project is deployed on another server, how should I access it from the bot? Is there a concrete approach, or what should I search for? Sorry for the basic question, and thanks.
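As a rough sketch of the idea: run the API on the second server, open its port, and have the chat-bot machine call it over HTTP. The address below is illustrative; the endpoints match the ones in the logs above:

import requests

SERVER = "http://192.168.1.10:23456"  # illustrative; use the API server's reachable IP or domain

speakers = requests.post(f"{SERVER}/voice/speakers").json()
print(speakers)

resp = requests.get(f"{SERVER}/voice", params={"text": "你好", "id": 0, "lang": "zh"})
with open("out.wav", "wb") as f:
    f.write(resp.content)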

How to enable GPU inference

I found that by default only CPU inference is used; even with 48 cores it is slow and the CPU is not fully utilized. I also have a 3090, and using the GPU should speed up inference considerably.

Problem integrating with lss233's QQ bot

(screenshot)
It looks like the return format of the id in /voice/speakers has changed, so the QQ bot's voice switching can no longer find the corresponding speaker. Can I just modify the code and rebuild the image myself?

Request error: the speaker list is returned normally, but speech cannot be synthesized; the generated audio is only 1 KB / 0 s and cannot be played

The error output is as follows:

[E 230613 19:09:03 fastlid:170] fastlid.set_languages is not a list
[JA]こんにちは[JA]
ERROR:app:Exception on /voice [POST]
Traceback (most recent call last):
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\py310\lib\site-packages\flask\app.py", line 2528, in wsgi_app
response = self.full_dispatch_request()
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\py310\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
rv = self.handle_user_exception(e)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\py310\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\py310\lib\site-packages\flask\app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\app.py", line 37, in check_api_key
return func(*args, **kwargs)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\app.py", line 136, in voice_api
output = real_obj.create_infer_task(text=text,
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\voice.py", line 211, in create_infer_task
self.get_infer_param(text=sentence, speaker_id=speaker_id, length=length, noise=noise,
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\voice.py", line 147, in get_infer_param
stn_tst = self.get_cleaned_text(text, self.hps_ms, cleaned=cleaned)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\voice.py", line 71, in get_cleaned_text
text_norm = text_to_sequence(text, hps.symbols, hps.data.text_cleaners)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\text_init_.py", line 17, in text_to_sequence
clean_text = clean_text(text, cleaner_names)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\text_init
.py", line 28, in _clean_text
cleaner = getattr(cleaners, name)
AttributeError: module 'text.cleaners' has no attribute 'custom_cleaners'
INFO:werkzeug:127.0.0.1 - - [13/Jun/2023 19:09:03] "POST /voice HTTP/1.1" 500 -


Add AMR audio output

Some platforms (such as QQ) require the AMR format to send audio; with other formats you have to transcode to AMR with ffmpeg yourself. It would be better to transcode automatically on the server and save the client that step.

Docker one-click deployment problem

I cannot deploy the Docker image with that script. Looking at its contents, there is nothing after docker compose pull. Is it not implemented yet?

How to enable GPU acceleration in Docker

I tried the following configuration:

version: '3.4'
services:
  vits:
    image: vits-simple-api:latest
    restart: always
    ports:
      - 23456:23456
    environment:
      LANG: 'C.UTF-8'
      TZ: Asia/Shanghai #timezone
    volumes:
      - ./Model:/app/Model # 挂载模型文件夹
      - ./config.py:/app/config.py # 挂载配置文件
      - /opt/cuda:/opt/cuda
      - ./cuda_test.py:/app/cuda_test.py
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu,utility ]

But torch.cuda.is_available() is False.

nvidia-smi works.

In the Dockerfile I removed RUN pip install torch --index-url https://download.pytorch.org/whl/cpu and replaced it with RUN pip install torch.

Afterwards I added CUDA to PATH inside the container so that nvcc works as well, but that still did not solve the problem.
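A quick check inside the container can tell whether the remaining problem is the torch build or GPU visibility; this is only a diagnostic sketch, assuming python is available in the container:

import torch

print(torch.__version__)          # a "+cpu" suffix means a CPU-only wheel is installed
print(torch.version.cuda)         # None on a CPU-only build
print(torch.cuda.is_available())  # False if the build lacks CUDA or the GPU is not visible

If torch.version.cuda is set but is_available() is still False, the container usually cannot see the GPU (e.g. the NVIDIA container runtime is not being used).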

getattr() error

After running app.py, getattr raises an error: attribute name must be string.
(screenshot)
The model paths in the config file are set as follows:
MODEL_LIST = [
# VITS
[ABS_PATH + "/Model/Xin/G_32000.pth", ABS_PATH + "/Model/Xin/config.json"],
#[ABS_PATH + "/Model/Zero_no_tsukaima/1158_epochs.pth", ABS_PATH + "/Model/Zero_no_tsukaima/config.json"],
#[ABS_PATH + "/Model/g/G_953000.pth", ABS_PATH + "/Model/g/config.json"],
# HuBert-VITS (Need to configure HUBERT_SOFT_MODEL)
#[ABS_PATH + "/Model/louise/360_epochs.pth", ABS_PATH + "/Model/louise/config.json"],
# W2V2-VITS (Need to configure DIMENSIONAL_EMOTION_NPY)
#[ABS_PATH + "/Model/w2v2-vits/1026_epochs.pth", ABS_PATH + "/Model/w2v2-vits/config.json"],
]

Hey, there seems to be a bug

When I test it, some words are automatically detected as Japanese,
and it still happens even if I additionally wrap the text in [ZH].

The log:

DEBUG:vits-simple-api:[[EN]ZH][EN][ZH] 君不见,[ZH][JA]黄河之水天上来,[JA][ZH]奔流到海不复回。君不见,高堂明镜悲白发, 朝如青丝暮成雪[[ZH][EN]ZH][EN]

VITS without [JA]

Hi. Thanks for your contribution to the project. I had an issue using your project with a model trained with vits-finetuning: it speaks the [JA] tag aloud in the generated audio file. Is there a way to prevent that? Thanks.
