
vits-simple-api's Issues

Failure when rebuilding the deployment on a new platform

Platform: Windows 11, WSL 2.0, Ubuntu
Docker runs, but nothing is output. I have checked the model files and paths and re-pulled the models to redeploy three times; none of it solved the problem.
Output:
(screenshot: https://user-images.githubusercontent.com/95335008/229728096-bb3db031-d13d-4f8b-93ef-c065e3ec48ad.png)
Docker console output:
(screenshot)
Configuration file:
(screenshot)

[BUG] Speech generation fails at speaker_id = int(request.args.get("id", app.config["ID"]))

(screenshot)

Here is the log:

moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:20:24] "GET /voice?text=%5BLENGTH=1.4%5D你好!有什么我可以为您做的吗?请注意,我只能通过文本输入与您交流,无法识别语音指令。&lang=zh&id=1&format=silk HTTP/1.1" 500 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:20:30] "POST /voice/speakers HTTP/1.1" 200 -
moegoe_1 | ERROR:app:Exception on /voice [GET]
moegoe_1 | Traceback (most recent call last):
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 2528, in wsgi_app
moegoe_1 | response = self.full_dispatch_request()
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1825, in full_dispatch_request
moegoe_1 | rv = self.handle_user_exception(e)
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
moegoe_1 | rv = self.dispatch_request()
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
moegoe_1 | return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
moegoe_1 | File "/app/app.py", line 51, in voice_api
moegoe_1 | speaker_id = int(request.args.get("id", app.config["ID"]))
moegoe_1 | KeyError: 'ID'
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:20:30] "GET /voice?text=%5BLENGTH=1.4%5D&lang=zh&id=1&format=silk HTTP/1.1" 500 -
moegoe_1 | ERROR:app:Exception on /voice [GET]
moegoe_1 | Traceback (most recent call last):
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 2528, in wsgi_app
moegoe_1 | response = self.full_dispatch_request()
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1825, in full_dispatch_request
moegoe_1 | rv = self.handle_user_exception(e)
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
moegoe_1 | rv = self.dispatch_request()
moegoe_1 | File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
moegoe_1 | return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
moegoe_1 | File "/app/app.py", line 51, in voice_api
moegoe_1 | speaker_id = int(request.args.get("id", app.config["ID"]))
moegoe_1 | KeyError: 'ID'
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:20:35] "GET /voice?text=%5BLENGTH=1.4%5D你好!有什么我可以帮助您的吗?&lang=zh&id=1&format=silk HTTP/1.1" 500 -
moegoe_1 | * Serving Flask app 'app'
moegoe_1 | * Debug mode: off
moegoe_1 | INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
moegoe_1 | * Running on all addresses (0.0.0.0)
moegoe_1 | * Running on http://127.0.0.1:23457
moegoe_1 | * Running on http://172.21.0.2:23457
moegoe_1 | INFO:werkzeug:Press CTRL+C to quit
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:06] "POST /voice//speakers HTTP/1.1" 308 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:06] "POST /voice/speakers HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:07] "POST /voice//speakers HTTP/1.1" 308 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:07] "POST /voice/speakers HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:07] "GET /voice/?text=%5BLENGTH=1.4%5D&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:23:12] "GET /voice/?text=%5BLENGTH=1.4%5D这句话太长了,抱歉&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:26:46] "GET /voice/?text=%5BLENGTH=1.4%5D消息已收到!当前我还有条消息要回复,请您稍等。&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:27:12] "GET /voice/?text=%5BLENGTH=1.4%5D消息已收到!当前我还有条消息要回复,请您稍等。&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:27:27] "GET /voice/?text=%5BLENGTH=1.4%5D消息已收到!当前我还有条消息要回复,请您稍等。&lang=zh&id=1&format=silk HTTP/1.1" 200 -
moegoe_1 | * Serving Flask app 'app'
moegoe_1 | * Debug mode: off
moegoe_1 | INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
moegoe_1 | * Running on all addresses (0.0.0.0)
moegoe_1 | * Running on http://127.0.0.1:23457
moegoe_1 | * Running on http://172.21.0.2:23457
moegoe_1 | INFO:werkzeug:Press CTRL+C to quit
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:34:12] "POST /voice//speakers HTTP/1.1" 308 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:34:12] "POST /voice/speakers HTTP/1.1" 200 -
moegoe_1 | INFO:werkzeug:101.33.231.208 - - [07/Apr/2023 17:34:12] "GET /voice/?text=%5BLENGTH=1.4%5D你好!有什么我可以帮助你的吗?&lang=zh&id=1&format=silk HTTP/1.1" 200 -
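The KeyError above means app.config has no "ID" entry, i.e. the mounted config.py does not define a default speaker id. A minimal sketch of the kind of default block shown in the config examples elsewhere in this thread (values are illustrative; the exact set of options depends on your version):

# config.py -- default GET parameters (sketch; values are illustrative)
ID = 0          # default speaker id, read by app.config["ID"]
FORMAT = "wav"  # default audio format
LANG = "zh"     # default language
LENGTH = 1      # default speech length / speed factor
NOISE = 0.667   # default noise
NOISEW = 0.8    # default noise deviation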

[Feature Request] Language detection for mix mode

When requesting /voice?lang=mix, the provided text should be tagged with languages automatically.

For example: /voice?lang=mix&text=你好用日语说是こんにちは

The server could automatically tag it as [ZH]你好用日语说是[ZH][JA]こんにちは[JA]

If this could be done on the server side, clients would need much less code, and the project could be used directly as a plug-and-play API.
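For illustration only, a rough sketch of the requested behaviour using a character-range heuristic (kana runs become [JA], everything else [ZH]); a real implementation would presumably use proper language detection and handle more languages:

import re

def auto_tag(text: str) -> str:
    # Wrap hiragana/katakana runs in [JA]...[JA] and the rest in [ZH]...[ZH].
    # Kanji is ambiguous and is treated as Chinese in this sketch.
    ja_run = re.compile(r'[\u3040-\u30ff]+')
    out, pos = [], 0
    for m in ja_run.finditer(text):
        if m.start() > pos:
            out.append(f'[ZH]{text[pos:m.start()]}[ZH]')
        out.append(f'[JA]{m.group(0)}[JA]')
        pos = m.end()
    if pos < len(text):
        out.append(f'[ZH]{text[pos:]}[ZH]')
    return ''.join(out)

print(auto_tag('你好用日语说是こんにちは'))  # [ZH]你好用日语说是[ZH][JA]こんにちは[JA]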

Error when deploying with Docker

The following error appears when deploying with Docker:

INFO:moegoe-simple-api:角色id:0
INFO:moegoe-simple-api:合成文本:[ZH]您好!有什么我可以帮助您的吗?[ZH]
ERROR:app:Exception on /voice [GET]
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 2528, in wsgi_app
response = self.full_dispatch_request()
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1825, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "/usr/local/lib/python3.9/site-packages/flask/app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "/app/app.py", line 80, in voice_api
output, file_type, fname = real_obj.generate(text=text,
File "/app/voice.py", line 100, in generate
stn_tst = self.get_text(text, self.hps_ms, cleaned=cleaned)
File "/app/voice.py", line 56, in get_text
text_norm = text_to_sequence(text, hps.symbols, hps.data.text_cleaners)
File "/app/text/init.py", line 17, in text_to_sequence
clean_text = _clean_text(text, cleaner_names)
File "/app/text/init.py", line 31, in _clean_text
text = cleaner(text)
File "/app/text/cleaners.py", line 118, in shanghainese_cleaners
from text.shanghainese import shanghainese_to_ipa
File "/app/text/shanghainese.py", line 6, in
converter = opencc.OpenCC('zaonhe')
File "/usr/local/lib/python3.9/site-packages/opencc/init.py", line 43, in init
super(OpenCC, self).init(config)
RuntimeError: /usr/local/lib/python3.9/site-packages/opencc/clib/share/opencc/zaonhe.json not found or not accessible.
INFO:werkzeug:172.30.0.1 - - [10/Apr/2023 13:28:30] "GET /voice?text=您好!有什么我可以帮助您的吗?&lang=zh&id=0&format=silk&length=1.4 HTTP/1.1" 500 -

What could be the cause?

Is this something on my end? See the log

DEBUG:vits-simple-api:[GD]君不见,黄河之水天上来,奔流到海不复回。君不见,高堂明镜悲白发,朝如青丝暮成雪[GD]
ERROR:app:Exception on /voice/w2v2-vits [POST]
Traceback (most recent call last):
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\flask\app.py", line 2528, in wsgi_app
response = self.full_dispatch_request()
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
rv = self.handle_user_exception(e)
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\flask\app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "G:\AI\vits接口\VITS\app.py", line 44, in check_api_key
return func(args, **kwargs)
File "G:\AI\vits接口\VITS\app.py", line 239, in voice_w2v2_api
output = tts.w2v2_vits_infer({"text": text,
File "G:\AI\vits接口\VITS\voice.py", line 454, in w2v2_vits_infer
audio = voice_obj.get_audio(voice, auto_break=True)
File "G:\AI\vits接口\VITS\voice.py", line 216, in get_audio
self.get_infer_param(text=sentence, speaker_id=speaker_id, length=length, noise=noise,
File "G:\AI\vits接口\VITS\voice.py", line 119, in get_infer_param
stn_tst = self.get_cleaned_text(text, self.hps_ms, cleaned=cleaned)
File "G:\AI\vits接口\VITS\voice.py", line 59, in get_cleaned_text
text_norm = text_to_sequence(text, hps.symbols, hps.data.text_cleaners)
File "G:\AI\vits接口\VITS\text_init_.py", line 17, in text_to_sequence
clean_text = clean_text(text, cleaner_names)
File "G:\AI\vits接口\VITS\text_init
.py", line 31, in _clean_text
text = cleaner(text)
File "G:\AI\vits接口\VITS\text\cleaners.py", line 241, in chinese_dialect_cleaners
text = re.sub(r'[GD](.
?)[GD]',
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\re.py", line 209, in sub
return compile(pattern, flags).sub(repl, string, count)
File "G:\AI\vits接口\VITS\text\cleaners.py", line 242, in
lambda x: cantonese_to_ipa(x.group(1)) + ' ', text)
File "G:\AI\vits接口\VITS\text\cantonese.py", line 62, in cantonese_to_ipa
text = converter.convert(text).replace('-', '').replace('$', ' ')
File "C:\Users\lin85\AppData\Local\Programs\Python\Python310\lib\site-packages\opencc_init
.py", line 87, in convert
retv_i = libopencc.opencc_convert_utf8(self._od, text, len(text))
OSError: exception: access violation reading 0xFFFFFFFFFFFFFFFF
INFO:werkzeug:127.0.0.1 - - [31/May/2023 10:51:14] "POST /voice/w2v2-vits HTTP/1.1" 500 -
DEBUG:urllib3.connectionpool:http://127.0.0.1:23456 "POST /voice/w2v2-vits HTTP/1.1" 500 265

Please add MP3 output

MP3 is a common format that many applications can open natively, and it is compressed. Some applications cannot open OGG internally, so you have to switch to an external player, which is a hassle. Thanks!
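For what it's worth, a minimal sketch of server-side conversion with ffmpeg (assumes an ffmpeg binary on PATH; the file names are illustrative, not the project's actual cache paths):

import subprocess

def wav_to_mp3(wav_path: str, mp3_path: str) -> str:
    # Transcode the generated WAV to MP3 with ffmpeg.
    subprocess.run(["ffmpeg", "-y", "-i", wav_path, "-b:a", "192k", mp3_path], check=True)
    return mp3_path

# wav_to_mp3("cache/output.wav", "cache/output.mp3")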

How to deploy this project on Hugging Face

Running a VITS model on a server has fairly high environment requirements, and a local PC cannot stay on all the time. By comparison, the free tier that Hugging Face provides can comfortably run a VITS project.

No sound, or 0-second clips, in audio generated after deploying on a server

Deployment method: docker-compose

Below is the content of my config.py:

import os
import sys

JSON_AS_ASCII = False
MAX_CONTENT_LENGTH = 5242880

# port
PORT = 23456
# absolute path
ABS_PATH = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])))
# upload path
UPLOAD_FOLDER = ABS_PATH + "/upload"
# cache path
CACHE_PATH = ABS_PATH + "/cache"
# zh ja ko en ...
LANGUAGE_AUTOMATIC_DETECT = ["zh","ja"]
#set to True to enable API Key authentication
API_KEY_ENABLED = False
# API_KEY is required for authentication
API_KEY = "api-key"

'''
For each model, the entry in the model list is filled in as follows
example:
MODEL_LIST = [
    #VITS
    [ABS_PATH+"/Model/Nene_Nanami_Rong_Tang/1374_epochs.pth", ABS_PATH+"/Model/Nene_Nanami_Rong_Tang/config.json"],
    [ABS_PATH+"/Model/Zero_no_tsukaima/1158_epochs.pth", ABS_PATH+"/Model/Zero_no_tsukaima/config.json"],
    [ABS_PATH+"/Model/g/G_953000.pth", ABS_PATH+"/Model/g/config.json"],
    #HuBert-VITS
    [ABS_PATH+"/Model/louise/360_epochs.pth", ABS_PATH+"/Model/louise/config.json", ABS_PATH+"/Model/louise/hubert-soft-0d54a1f4.pt"],
]
'''
# load multiple models
MODEL_LIST = [
    [ABS_PATH+"/Model/Nene_Meguru_Yoshino_Mako_Myrasame_hoharu_Nanami/365_epochs.pth", ABS_PATH+"/Model/Nene_Meguru_Yoshino_Mako_Myrasame_hoharu_Nanami/config.json"],
    [ABS_PATH+"/Model/to_love/1113_epochs.pth", ABS_PATH+"/Model/to_love/config.json"],
]

"""
default params
The following are the default values used by the VITS GET method when a parameter is not specified
"""

# GET default speaker id
ID = 0
# GET default audio format: wav, ogg or silk
FORMAT = "wav"
# GET default language
LANG = "AUTO"
# GET default speech length (controls speed; a larger value means slower speech)
LENGTH = 1
# GET default noise
NOISE = 0.667
# GET default noise deviation
NOISEW = 0.8
# Segmentation threshold for long text; text will not be divided if MAX <= 0
MAX = 50

Other endpoints, such as http://127.0.0.1:23456/voice/speakers, all return normally, but the audio generated by http://127.0.0.1:23456/voice?text=你好呀 contains only breathing sounds.

About SSML pause handling

This project is great. May I ask where the SSML-related pauses are handled in the project? I couldn't find it. If I want to add custom pauses in the original vits project, how should I go about it? Could you outline the approach? Thanks.

Not very familiar with Python; a question about paths

After cloning the project onto the server, do the models and config.py then have to go under /path/to/.... at the server's root? But when app.py reads config.py it does not use ABS_PATH, so it apparently loads it from the same directory. Can it be found that way? It feels a bit counter-intuitive, so I'd like to confirm.
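For reference, a quick way to see what ABS_PATH resolves to (this mirrors the expression in config.py): it is the directory containing the launched script, so MODEL_LIST entries built from it resolve regardless of the working directory, while config.py itself is simply imported from the project directory next to app.py.

import os
import sys

# Same expression as in config.py: the directory of the script that was run.
ABS_PATH = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])))
print(ABS_PATH)  # e.g. the cloned vits-simple-api directory, not the filesystem root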

Found three bugs?

I gave it a test and it's great, thanks!
There seem to be three bugs, though (or maybe I'm doing something wrong?)
1. Dialects fail silently and fall back to Mandarin. For example with Cantonese, [GD]XXXXXXXX[GD] has no effect and is read as Mandarin.
2. Using your post.py, SSML produces this error:

Traceback (most recent call last):
  File "1.py", line 5, in <module>
    voice_ssml(smm);
  File "D:\ai\vits\bmss_fy\vits-simple-api-windows\post.py", line 151, in voice_ssml
    fname = re.findall("filename=(.+)", res.headers["Content-Disposition"])[0]
  File "C:\Users\lin85\AppData\Local\Programs\Python\Python38\lib\site-packages\requests\structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-disposition'

3. (Not a bug, haha) The original emotion feature could use a single npy file for the emotion. Will a single npy also be supported later?

Thanks again for your work.

Error: EOFError: Ran out of input

The error occurs when running python app.py; the log is below:
root@ecsekei:~/vits# python3 app.py
torch:2.0.0+cu117 GPU_available:False
device:cpu device.type:cpu
Traceback (most recent call last):
File "/root/vits/app.py", line 25, in
voice_obj, voice_speakers = merge_model(app.config["MODEL_LIST"])
File "/root/vits/utils/merge.py", line 53, in merge_model
obj = vits(model=i[0], config=i[1])
File "/root/vits/voice.py", line 54, in init
self.load_model(model, model_)
File "/root/vits/voice.py", line 57, in load_model
utils.load_checkpoint(model, self.net_g_ms)
File "/root/vits/utils/utils.py", line 43, in load_checkpoint
checkpoint_dict = load(checkpoint_path, map_location='cpu')
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 815, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 1033, in _legacy_load
magic_number = pickle_module.load(f, **pickle_load_args)
EOFError: Ran out of input
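A possible quick check: this EOFError is raised by torch.load while reading the checkpoint, which usually means the .pth file is empty or truncated (e.g. an interrupted download). The path below is illustrative:

import os
import torch

ckpt = "/root/vits/Model/your_model/G_xxx.pth"  # illustrative path
print(os.path.getsize(ckpt))                    # 0 or a tiny size => re-download the model
state = torch.load(ckpt, map_location="cpu")    # should succeed on an intact checkpoint
print(list(state.keys())[:5])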

Mac M1 pip install error when installing openjtalk>=0.3.0.dev2

It seems something goes wrong when installing the packages:

(py30) zhonghaoli@MacBook-Pro-14-inch-2021 vits-simple-api % pip install -r requirements.txt
Defaulting to user installation because normal site-packages is not writeable
Collecting numba (from -r requirements.txt (line 1))
Using cached numba-0.57.0-cp310-cp310-macosx_11_0_arm64.whl (2.5 MB)
Collecting librosa (from -r requirements.txt (line 2))
Using cached librosa-0.10.0.post2-py3-none-any.whl (253 kB)
Collecting numpy==1.23.3 (from -r requirements.txt (line 3))
Using cached numpy-1.23.3-cp310-cp310-macosx_11_0_arm64.whl (13.3 MB)
Collecting scipy (from -r requirements.txt (line 4))
Using cached scipy-1.10.1-cp310-cp310-macosx_12_0_arm64.whl (28.8 MB)
Collecting torch (from -r requirements.txt (line 5))
Using cached torch-2.0.1-cp310-none-macosx_11_0_arm64.whl (55.8 MB)
Collecting unidecode (from -r requirements.txt (line 6))
Using cached Unidecode-1.3.6-py3-none-any.whl (235 kB)
Collecting openjtalk==0.3.0.dev2 (from -r requirements.txt (line 7))
Using cached openjtalk-0.3.0.dev2.tar.gz (24.9 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
Traceback (most recent call last):
File "/Users/zhonghaoli/miniforge3/envs/py30/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in

I'm calling vits-simple-api from a WeChat bot and it reports that speech synthesis failed; how can I fix this?

Hello, I've recently been testing a speech-synthesis setup for a WeChat bot and ran into a problem when calling vits-simple-api; I'd like to ask for your advice.

Preliminary checks
1. I confirm that I am running the latest version of the code and have installed the required dependencies.

2. I have searched the existing issues and found none related to the problem I am encountering.

Operating system?
Windows 10 Pro (21H2)

Python version?
Python 3.10.8

Steps to reproduce 🕹
Open a terminal -> activate the environment (fort) -> python app.py -> send a voice message to the WeChat bot -> the request fails

Problem description 😯
When calling this repository's speech-synthesis API from the WeChat bot project (https://github.com/zhayujie/chatgpt-on-wechat), it reports that speech synthesis failed.
(screenshot)

Terminal log 📒
127.0.0.1 - - [14/May/2023 13:12:40] "POST /voice HTTP/1.1" 400 -

(screenshot)

What server specs are needed for inference only (no training)?

My server runs CentOS 8. When I try to start the project:

python3 app.py 
torch:2.0.1+cu117 GPU_available:False
device:cpu device.type:cpu

At this point CPU usage and disk I/O spike, io_util quickly hits its ceiling, and the server stops responding. Closing the terminal and logging back in does not help either; the only fix is rebooting the server. When it froze, the disk read rate was 285508 kB/s. Is that the models being loaded? I only loaded three .pth models in total.
Is there a good way around this? (I have tried repeatedly; it freezes every time.)

size mismatch for emb_g.weight: copying a param with shape torch.Size([5, 256]) from checkpoint, the shape in current model is torch.Size([7, 256]).

(screenshot)

config.py

import os
import sys

JSON_AS_ASCII = False
MAX_CONTENT_LENGTH = 5242880

# Port
PORT = 23457

# Absolute path of the project
ABS_PATH = os.path.join(os.path.dirname(os.path.realpath(sys.argv[0])))

# Temporary path for uploaded files; do not change unless necessary
UPLOAD_FOLDER = ABS_PATH + "/upload"

# Temporary cache path for audio conversion; do not change unless necessary
CACHE_PATH = ABS_PATH + "/cache"

'''
How to fill in vits model paths: each row of MODEL_LIST is
[ABS_PATH+"/Model/{model folder}/{model .pth}", ABS_PATH+"/Model/{model folder}/config.json"],
Relative or absolute paths also work, but since Windows and Linux paths are written differently,
the form above or an absolute path is the safest.
Example:
MODEL_LIST = [
#VITS
[ABS_PATH+"/Model/Nene_Nanami_Rong_Tang/1374_epochs.pth", ABS_PATH+"/Model/Nene_Nanami_Rong_Tang/config.json"],
[ABS_PATH+"/Model/Zero_no_tsukaima/1158_epochs.pth", ABS_PATH+"/Model/Zero_no_tsukaima/config.json"],
[ABS_PATH+"/Model/g/G_953000.pth", ABS_PATH+"/Model/g/config.json"],
#HuBert-VITS
[ABS_PATH+"/Model/louise/360_epochs.pth", ABS_PATH+"/Model/louise/config.json", ABS_PATH+"/Model/louise/hubert-soft-0d54a1f4.pt"],
]
'''

# Model loading list
MODEL_LIST = [
[ABS_PATH+"/Model/g/1374_epochs.pth", ABS_PATH+"/Model/g/config.json"],
]

docker-compose.yaml

version: '3.4'
services:
  moegoe:
    image: artrajz/moegoe-simple-api:latest
    restart: always
    ports:
      - 23457:23457
    environment:
      LANG: 'C.UTF-8'
    volumes:
      - ./Model:/app/Model # mount the model folder
      - ./config.py:/app/config.py # mount the config file

This is where the models are stored:
(screenshot)
Where did it go wrong?

Error: TypeError: 'type' object is not subscriptable

Following the deployment steps, running python app.py at the end fails:
Traceback (most recent call last):
File "app.py", line 10, in
from utils import clean_folder, merge_model
File "C:\Users\Administrator\Desktop\MoeGoe-Simple-API\utils.py", line 120, in
def to_pcm(in_path: str) -> tuple[str, int]:
TypeError: 'type' object is not subscriptable
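For context, built-in generics such as tuple[str, int] in annotations require Python 3.9+; on older interpreters the typing module form avoids this TypeError. A minimal sketch (the function body is illustrative, not the project's actual code):

from typing import Tuple

def to_pcm(in_path: str) -> Tuple[str, int]:
    # typing.Tuple works on Python 3.8 and earlier, where tuple[str, int]
    # raises "TypeError: 'type' object is not subscriptable".
    out_path = in_path + ".pcm"  # illustrative body
    return out_path, 16000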

Question about SSML pauses

This project is great. May I ask where the SSML-related pauses are handled in the project? I couldn't find it. If I want to add custom pauses in the original vits project, how should I go about it? Could you outline the approach? Thanks.

so-vits 4.0 models cannot be used

It seems to be caused by an incompatible config file; is there a workaround?

After moving "n_speakers" into the data section of the config it runs, but an error occurs while generating the wav, and all the output files are 1 KB and unplayable.

How can I deploy this project on a separate host?

I have two servers; the one running my chat-bot is too low-spec and also short on disk space. If this project is deployed on another server, how should I access it from the bot? Is there a concrete approach, or what should I search for? Sorry for the basic question, and thanks.
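As a rough sketch of the idea: run the API on the second server, open its port, and have the chat-bot machine call it over HTTP. The address below is illustrative; the endpoints match the ones in the logs above:

import requests

SERVER = "http://192.168.1.10:23456"  # illustrative; use the API server's reachable IP or domain

speakers = requests.post(f"{SERVER}/voice/speakers").json()
print(speakers)

resp = requests.get(f"{SERVER}/voice", params={"text": "你好", "id": 0, "lang": "zh"})
with open("out.wav", "wb") as f:
    f.write(resp.content)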

How to enable GPU inference

I found that by default only CPU inference is used; even with 48 cores it is slow and the CPU is not fully utilized. I also have a 3090, and using the GPU should speed up inference considerably.

Problem integrating with lss233's QQ bot

(screenshot)
It looks like the return format of the id in /voice/speakers has changed, so the QQ bot's voice switching can no longer find the corresponding speaker. Can I just modify the code and rebuild the image myself?

Request error: the speaker list is returned normally, but speech cannot be synthesized; the generated audio is only 1 KB / 0 s and cannot be played

The error output is as follows:

[E 230613 19:09:03 fastlid:170] fastlid.set_languages is not a list
[JA]こんにちは[JA]
ERROR:app:Exception on /voice [POST]
Traceback (most recent call last):
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\py310\lib\site-packages\flask\app.py", line 2528, in wsgi_app
response = self.full_dispatch_request()
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\py310\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
rv = self.handle_user_exception(e)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\py310\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
rv = self.dispatch_request()
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\py310\lib\site-packages\flask\app.py", line 1799, in dispatch_request
return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\app.py", line 37, in check_api_key
return func(*args, **kwargs)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\app.py", line 136, in voice_api
output = real_obj.create_infer_task(text=text,
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\voice.py", line 211, in create_infer_task
self.get_infer_param(text=sentence, speaker_id=speaker_id, length=length, noise=noise,
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\voice.py", line 147, in get_infer_param
stn_tst = self.get_cleaned_text(text, self.hps_ms, cleaned=cleaned)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\voice.py", line 71, in get_cleaned_text
text_norm = text_to_sequence(text, hps.symbols, hps.data.text_cleaners)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\text_init_.py", line 17, in text_to_sequence
clean_text = clean_text(text, cleaner_names)
File "E:\aila\API\vits-simple-api-windows_3\vits-simple-api\text_init
.py", line 28, in _clean_text
cleaner = getattr(cleaners, name)
AttributeError: module 'text.cleaners' has no attribute 'custom_cleaners'
INFO:werkzeug:127.0.0.1 - - [13/Jun/2023 19:09:03] "POST /voice HTTP/1.1" 500 -


Add AMR audio output

Some platforms (such as QQ) require the AMR format to send audio; with other formats you have to transcode to AMR with ffmpeg yourself. It would be better to transcode automatically on the server and save the client that step.

Docker one-click deployment problem

I cannot deploy the Docker image with that script. Looking at its contents, there is nothing after docker compose pull. Is it not implemented yet?

How to enable GPU acceleration in Docker

I tried the following configuration:

version: '3.4'
services:
  vits:
    image: vits-simple-api:latest
    restart: always
    ports:
      - 23456:23456
    environment:
      LANG: 'C.UTF-8'
      TZ: Asia/Shanghai #timezone
    volumes:
      - ./Model:/app/Model # 挂载模型文件夹
      - ./config.py:/app/config.py # 挂载配置文件
      - /opt/cuda:/opt/cuda
      - ./cuda_test.py:/app/cuda_test.py
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [ gpu,utility ]

But torch.cuda.is_available() is False.

nvidia-smi works.

In the Dockerfile I removed RUN pip install torch --index-url https://download.pytorch.org/whl/cpu and replaced it with RUN pip install torch.

Afterwards I added CUDA to PATH inside the container so that nvcc works as well, but that still did not solve the problem.
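A quick check inside the container can tell whether the remaining problem is the torch build or GPU visibility; this is only a diagnostic sketch, assuming python is available in the container:

import torch

print(torch.__version__)          # a "+cpu" suffix means a CPU-only wheel is installed
print(torch.version.cuda)         # None on a CPU-only build
print(torch.cuda.is_available())  # False if the build lacks CUDA or the GPU is not visible

If torch.version.cuda is set but is_available() is still False, the container usually cannot see the GPU (e.g. the NVIDIA container runtime is not being used).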

getattr() error

After running app.py, getattr raises an error: attribute name must be string.
(screenshot)
The model paths in the config file are set as follows:
MODEL_LIST = [
# VITS
[ABS_PATH + "/Model/Xin/G_32000.pth", ABS_PATH + "/Model/Xin/config.json"],
#[ABS_PATH + "/Model/Zero_no_tsukaima/1158_epochs.pth", ABS_PATH + "/Model/Zero_no_tsukaima/config.json"],
#[ABS_PATH + "/Model/g/G_953000.pth", ABS_PATH + "/Model/g/config.json"],
# HuBert-VITS (Need to configure HUBERT_SOFT_MODEL)
#[ABS_PATH + "/Model/louise/360_epochs.pth", ABS_PATH + "/Model/louise/config.json"],
# W2V2-VITS (Need to configure DIMENSIONAL_EMOTION_NPY)
#[ABS_PATH + "/Model/w2v2-vits/1026_epochs.pth", ABS_PATH + "/Model/w2v2-vits/config.json"],
]

Hey, there seems to be a bug

When I test it, some words are automatically detected as Japanese,
and it still happens even if I additionally wrap the text in [ZH].

The log:

DEBUG:vits-simple-api:[[EN]ZH][EN][ZH] 君不见,[ZH][JA]黄河之水天上来,[JA][ZH]奔流到海不复回。君不见,高堂明镜悲白发, 朝如青丝暮成雪[[ZH][EN]ZH][EN]

VITS without [JA]

Hi. Thanks for your contribution to the project. I had an issue using your project with a model trained with vits-finetuning: it speaks the [JA] tag aloud in the generated audio file. Is there a way to prevent that? Thanks.
