Comments (25)

Artrajz commented on July 21, 2024

The models were made by other people. You can find the ones I use here:
https://github.com/CjangCjengh/TTSModels
https://huggingface.co/spaces/zomehwh/vits-uma-genshin-honkai/tree/main/model

from vits-simple-api.

jwister commented on July 21, 2024

Thanks, I've got it working. My question now is: where do I turn on GPU acceleration? Is there a config option for it?

Artrajz commented on July 21, 2024

You need to install CUDA and the GPU build of PyTorch. Once both are installed, the GPU is used automatically.

jwister commented on July 21, 2024

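I've installed them and rebooted, but when I run it again it still doesn't show GPU acceleration as enabled. I then disabled the integrated graphics, which didn't help either. My GPU is an MX250. Is there a setting I need to change?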

Artrajz commented on July 21, 2024

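First verify that CUDA installed successfully; for the MX250 you need to find and install a compatible CUDA version. Then check PyTorch: vits prints the PyTorch version at startup, and only a build whose version looks like x.x.x+cu1xx (each x being a digit) can use CUDA.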
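The checks described above can be scripted. The following is a minimal Python sketch, not part of vits-simple-api; `is_cuda_build` and `report_torch` are hypothetical helper names. The `+cuXXX` rule mirrors the advice in this thread, but some Linux wheels report a bare version such as `2.0.1` and still include CUDA support, so `torch.cuda.is_available()` is the more reliable runtime test.

```python
import re


def is_cuda_build(version: str) -> bool:
    """Return True if a PyTorch version string carries a '+cuXXX' local tag."""
    # e.g. '2.0.1+cu118' -> True, '2.0.1+cpu' -> False
    return re.search(r"\+cu\d+$", version) is not None


def report_torch() -> None:
    """Print the PyTorch version, its CUDA build info, and whether a GPU is usable."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    print("torch:", torch.__version__)
    print("CUDA build tag:", is_cuda_build(torch.__version__))
    # torch.version.cuda is None for CPU-only builds
    print("torch.version.cuda:", torch.version.cuda)
    print("cuda_available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("device:", torch.cuda.get_device_name(0))


if __name__ == "__main__":
    report_torch()
```

If the script reports a `+cpu` build or `cuda_available: False`, PyTorch needs to be reinstalled from a CUDA-enabled wheel that matches the installed CUDA version.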

cgnannan commented on July 21, 2024

Is there an English model for vits at the moment? I spent ages searching online today and couldn't find one.

Artrajz commented on July 21, 2024

https://github.com/jaywalnut310/vits
The original repository has English models, but using them requires a small code change plus a separate espeak installation.

Artrajz commented on July 21, 2024

Commit 305cae8: to use these models, add the following two lines to the corresponding JSON file:

"speakers": ["vctk"],
"symbols":  ["_", ";", ":", ",", ".", "!", "?", "¡", "¿", "—", "…", "\"", "«", "»", "“", "”", " ", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "ɑ", "ɐ", "ɒ", "æ", "ɓ", "ʙ", "β", "ɔ", "ɕ", "ç", "ɗ", "ɖ", "ð", "ʤ", "ə", "ɘ", "ɚ", "ɛ", "ɜ", "ɝ", "ɞ", "ɟ", "ʄ", "ɡ", "ɠ", "ɢ", "ʛ", "ɦ", "ɧ", "ħ", "ɥ", "ʜ", "ɨ", "ɪ", "ʝ", "ɭ", "ɬ", "ɫ", "ɮ", "ʟ", "ɱ", "ɯ", "ɰ", "ŋ", "ɳ", "ɲ", "ɴ", "ø", "ɵ", "ɸ", "θ", "œ", "ɶ", "ʘ", "ɹ", "ɺ", "ɾ", "ɻ", "ʀ", "ʁ", "ɽ", "ʂ", "ʃ", "ʈ", "ʧ", "ʉ", "ʊ", "ʋ", "ⱱ", "ʌ", "ɣ", "ɤ", "ʍ", "χ", "ʎ", "ʏ", "ʑ", "ʐ", "ʒ", "ʔ", "ʡ", "ʕ", "ʢ", "ǀ", "ǁ", "ǂ", "ǃ", "ˈ", "ˌ", "ː", "ˑ", "ʼ", "ʴ", "ʰ", "ʱ", "ʲ", "ʷ", "ˠ", "ˤ", "˞", "↓", "↑", "→", "↗", "↘", "'", "̩", "'", "ᵻ"]

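Editing by hand works, but the same change can be scripted. A minimal Python sketch, assuming the config is plain JSON; `add_speakers_and_symbols` and `patch_config_file` are hypothetical names, and both keys are written at the top level of the file, next to "train", "data" and "model". Writing with `ensure_ascii=False` keeps the IPA symbols readable instead of turning them into `\uXXXX` escapes.

```python
import json
from pathlib import Path


def add_speakers_and_symbols(config: dict, speakers: list, symbols: list) -> dict:
    """Return a copy of config with 'speakers' and 'symbols' set at the top level."""
    patched = dict(config)  # shallow copy; the caller's dict is left unchanged
    patched["speakers"] = list(speakers)
    patched["symbols"] = list(symbols)
    return patched


def patch_config_file(path: str, speakers: list, symbols: list) -> None:
    """Rewrite the JSON file at path in place with the two top-level keys added."""
    file = Path(path)
    config = json.loads(file.read_text(encoding="utf-8"))
    patched = add_speakers_and_symbols(config, speakers, symbols)
    # ensure_ascii=False writes IPA characters as-is instead of \uXXXX escapes
    file.write_text(json.dumps(patched, ensure_ascii=False, indent=2), encoding="utf-8")
```

For example, `patch_config_file("Model/vctk/vctk_base.json", ["vctk"], symbols)`, where the path is hypothetical and `symbols` is the full list shown above.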
cgnannan commented on July 21, 2024

Thanks, I'll give it a try.

cgnannan commented on July 21, 2024

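I read the original repository's README. Do I have to train on a dataset to get an English voice model? I couldn't find any model files in the repository folder.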

"3.Download datasets
Download and extract the LJ Speech dataset, then rename or create a link to the dataset folder: ln -s /path/to/LJSpeech-1.1/wavs DUMMY1
For mult-speaker setting, download and extract the VCTK dataset, and downsample wav files to 22050 Hz. Then rename or create a link to the dataset folder: ln -s /path/to/VCTK-Corpus/downsampled_wavs DUMMY2

4.Build Monotonic Alignment Search and run preprocessing if you use your own datasets."

Artrajz commented on July 21, 2024

No. The README links to pretrained models that you can download and use directly, or continue training from. The JSON config files are in the repository's configs folder.

cgnannan commented on July 21, 2024

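I ran git pull to get your latest code, loaded the pretrained model, and added your two lines to the model's JSON file. Running python app.py then produced the following error: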

(fort) E:\Fort\WechatBot\vits-simple-api>python app.py
INFO:root:Loaded checkpoint 'E:\Fort\WechatBot\vits-simple-api/Model/Nene_Nanami_Rong_Tang/1374_epochs.pth' (iteration None)
INFO:root:Loaded checkpoint 'E:\Fort\WechatBot\vits-simple-api/Model/Zero_no_tsukaima/1158_epochs.pth' (iteration None)
INFO:root:Loaded checkpoint 'E:\Fort\WechatBot\vits-simple-api/Model/g/G_953000.pth' (iteration 630)
INFO:root:Loaded checkpoint 'E:\Fort\WechatBot\vits-simple-api/Model/Voistock/547_epochs.pth' (iteration None)
INFO:root:Loaded checkpoint 'E:\Fort\WechatBot\vits-simple-api/Model/ljs/ljs.pth' (iteration 0)
Traceback (most recent call last):
  File "E:\Fort\WechatBot\vits-simple-api\fort\lib\site-packages\torch\serialization.py", line 354, in _check_seekable
    f.seek(f.tell())
AttributeError: 'NoneType' object has no attribute 'seek'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\Fort\WechatBot\vits-simple-api\app.py", line 28, in <module>
    tts = merge_model(app.config["MODEL_LIST"])
  File "E:\Fort\WechatBot\vits-simple-api\utils\merge.py", line 55, in merge_model
    obj = vits(model=i[0], config=i[1])
  File "E:\Fort\WechatBot\vits-simple-api\voice.py", line 54, in __init__
    self.load_model(model, model_)
  File "E:\Fort\WechatBot\vits-simple-api\voice.py", line 61, in load_model
    self.hubert = hubert_soft(model_)
  File "E:\Fort\WechatBot\vits-simple-api\hubert_model.py", line 217, in hubert_soft
    checkpoint = torch.load(path)
  File "E:\Fort\WechatBot\vits-simple-api\fort\lib\site-packages\torch\serialization.py", line 791, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "E:\Fort\WechatBot\vits-simple-api\fort\lib\site-packages\torch\serialization.py", line 276, in _open_file_like
    return _open_buffer_reader(name_or_buffer)
  File "E:\Fort\WechatBot\vits-simple-api\fort\lib\site-packages\torch\serialization.py", line 261, in __init__
    _check_seekable(buffer)
  File "E:\Fort\WechatBot\vits-simple-api\fort\lib\site-packages\torch\serialization.py", line 357, in _check_seekable
    raise_err_msg(["seek", "tell"], e)
  File "E:\Fort\WechatBot\vits-simple-api\fort\lib\site-packages\torch\serialization.py", line 350, in raise_err_msg
    raise type(e)(msg)
AttributeError: 'NoneType' object has no attribute 'seek'. You can only torch.load from a file that is seekable. Please pre-load the data into a buffer like io.BytesIO and try to load from it instead.

I put these two lines under "train". Did I put them in the wrong place?

Also, the repository's configs folder has two JSON files for the LJS model; I used ljs_base.json.

I also updated MODEL_LIST accordingly.

Artrajz commented on July 21, 2024

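The two keys go at the top level, alongside "train", "data" and "model"; you can use the other config.json files for reference. Here is a complete one anyway, modified from vctk_base.json: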

{
  "train": {
    "log_interval": 200,
    "eval_interval": 1000,
    "seed": 1234,
    "epochs": 10000,
    "learning_rate": 2e-4,
    "betas": [0.8, 0.99],
    "eps": 1e-9,
    "batch_size": 64,
    "fp16_run": true,
    "lr_decay": 0.999875,
    "segment_size": 8192,
    "init_lr_ratio": 1,
    "warmup_epochs": 0,
    "c_mel": 45,
    "c_kl": 1.0
  },
  "data": {
    "training_files":"filelists/vctk_audio_sid_text_train_filelist.txt.cleaned",
    "validation_files":"filelists/vctk_audio_sid_text_val_filelist.txt.cleaned",
    "text_cleaners":["english_cleaners2"],
    "max_wav_value": 32768.0,
    "sampling_rate": 22050,
    "filter_length": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "n_mel_channels": 80,
    "mel_fmin": 0.0,
    "mel_fmax": null,
    "add_blank": true,
    "n_speakers": 109,
    "cleaned_text": true
  },
  "model": {
    "inter_channels": 192,
    "hidden_channels": 192,
    "filter_channels": 768,
    "n_heads": 2,
    "n_layers": 6,
    "kernel_size": 3,
    "p_dropout": 0.1,
    "resblock": "1",
    "resblock_kernel_sizes": [3,7,11],
    "resblock_dilation_sizes": [[1,3,5], [1,3,5], [1,3,5]],
    "upsample_rates": [8,8,2,2],
    "upsample_initial_channel": 512,
    "upsample_kernel_sizes": [16,16,4,4],
    "n_layers_q": 3,
    "use_spectral_norm": false,
    "gin_channels": 256
  },
  "speakers": ["vctk"],
  "symbols":  ["_", ";", ":", ",", ".", "!", "?", "¡", "¿", "—", "…", "\"", "«", "»", "“", "”", " ", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "ɑ", "ɐ", "ɒ", "æ", "ɓ", "ʙ", "β", "ɔ", "ɕ", "ç", "ɗ", "ɖ", "ð", "ʤ", "ə", "ɘ", "ɚ", "ɛ", "ɜ", "ɝ", "ɞ", "ɟ", "ʄ", "ɡ", "ɠ", "ɢ", "ʛ", "ɦ", "ɧ", "ħ", "ɥ", "ʜ", "ɨ", "ɪ", "ʝ", "ɭ", "ɬ", "ɫ", "ɮ", "ʟ", "ɱ", "ɯ", "ɰ", "ŋ", "ɳ", "ɲ", "ɴ", "ø", "ɵ", "ɸ", "θ", "œ", "ɶ", "ʘ", "ɹ", "ɺ", "ɾ", "ɻ", "ʀ", "ʁ", "ɽ", "ʂ", "ʃ", "ʈ", "ʧ", "ʉ", "ʊ", "ʋ", "ⱱ", "ʌ", "ɣ", "ɤ", "ʍ", "χ", "ʎ", "ʏ", "ʑ", "ʐ", "ʒ", "ʔ", "ʡ", "ʕ", "ʢ", "ǀ", "ǁ", "ǂ", "ǃ", "ˈ", "ˌ", "ː", "ˑ", "ʼ", "ʴ", "ʰ", "ʱ", "ʲ", "ʷ", "ˠ", "ˤ", "˞", "↓", "↑", "→", "↗", "↘", "'", "̩", "'", "ᵻ"]
}
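Placing the two keys under "train", as in the earlier attempt, is easy to catch mechanically. This hypothetical Python helper (not part of vits-simple-api) moves "speakers" and "symbols" out of the train/data/model sections to the top level, and reports when fewer speaker names are listed than "n_speakers" declares (109 in the config above).

```python
import json

TOP_LEVEL_KEYS = ("speakers", "symbols")
SECTIONS = ("train", "data", "model")


def fix_key_placement(config: dict) -> dict:
    """Move 'speakers'/'symbols' from train/data/model up to the top level."""
    fixed = json.loads(json.dumps(config))  # deep copy via a JSON round-trip
    for section in SECTIONS:
        block = fixed.get(section, {})
        for key in TOP_LEVEL_KEYS:
            if key in block:
                value = block.pop(key)
                # Keep an existing top-level value rather than overwriting it
                fixed.setdefault(key, value)
    return fixed


def check_config(config: dict) -> list:
    """Return human-readable problems found in a VITS-style config dict."""
    problems = []
    for key in TOP_LEVEL_KEYS:
        if key not in config:
            problems.append(f"missing top-level key {key!r}")
    n_speakers = config.get("data", {}).get("n_speakers", 0)
    speakers = config.get("speakers", [])
    if n_speakers and len(speakers) < n_speakers:
        problems.append(f"only {len(speakers)} of {n_speakers} speaker names listed")
    return problems
```

The speaker-count message is informational: as the rest of this thread shows, the model still loads with a single name, but only the listed speakers can be selected.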

cgnannan commented on July 21, 2024

The service starts successfully now, but when I sent a voice request it failed with an error saying espeak is not installed:

(fort) E:\Fort\WechatBot\vits-simple-api>python app.py
INFO:root:Loaded checkpoint 'E:\Fort\WechatBot\vits-simple-api/Model/ljs/ljs.pth' (iteration 0)
INFO:root:Loaded checkpoint 'E:\Fort\WechatBot\vits-simple-api/Model/vctk/vctk.pth' (iteration 0)
INFO:vits-simple-api:torch:2.0.1+cpu cuda_available:False
INFO:vits-simple-api:device:cpu device.type:cpu
INFO:vits-simple-api:Loaded 2 speakers
INFO:apscheduler.scheduler:Added job "clean_task" to job store "default"
DEBUG:apscheduler.scheduler:Looking for jobs to run
DEBUG:apscheduler.scheduler:Next wakeup is due at 2023-05-16 01:16:36.449880+08:00 (in 3599.999002 seconds)

 * Serving Flask app 'app'
 * Debug mode: off
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:23456
 * Running on http://192.168.1.52:23456
INFO:werkzeug:Press CTRL+C to quit
INFO:werkzeug:127.0.0.1 - - [16/May/2023 00:16:47] "GET /voice/speakers HTTP/1.1" 200 -
INFO:vits-simple-api:[VITS] id:0 format:wav lang:auto length:1.0 noise:0.667 noisew:0.8
INFO:vits-simple-api:[VITS] len:41 text:Good evening! How can I assist you today?
DEBUG:vits-simple-api:[EN]Good evening! How can I assist you today?[EN]
ERROR:app:Exception on /voice [POST]
Traceback (most recent call last):
  File "E:\Fort\WechatBot\vits-simple-api\fort\lib\site-packages\flask\app.py", line 2528, in wsgi_app
    response = self.full_dispatch_request()
  File "E:\Fort\WechatBot\vits-simple-api\fort\lib\site-packages\flask\app.py", line 1825, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "E:\Fort\WechatBot\vits-simple-api\fort\lib\site-packages\flask\app.py", line 1823, in full_dispatch_request
    rv = self.dispatch_request()
  File "E:\Fort\WechatBot\vits-simple-api\fort\lib\site-packages\flask\app.py", line 1799, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "E:\Fort\WechatBot\vits-simple-api\app.py", line 38, in check_api_key
    return func(*args, **kwargs)
  File "E:\Fort\WechatBot\vits-simple-api\app.py", line 113, in voice_vits_api
    output = tts.vits_infer({"text": text,
  File "E:\Fort\WechatBot\vits-simple-api\voice.py", line 435, in vits_infer
    audio = voice_obj.get_audio(voice, auto_break=True)
  File "E:\Fort\WechatBot\vits-simple-api\voice.py", line 205, in get_audio
    self.get_infer_param(text=sentence, speaker_id=speaker_id, length=length, noise=noise,
  File "E:\Fort\WechatBot\vits-simple-api\voice.py", line 131, in get_infer_param
    stn_tst = self.get_cleaned_text(text, self.hps_ms, cleaned=cleaned)
  File "E:\Fort\WechatBot\vits-simple-api\voice.py", line 71, in get_cleaned_text
    text_norm = text_to_sequence(text, hps.symbols, hps.data.text_cleaners)
  File "E:\Fort\WechatBot\vits-simple-api\text\__init__.py", line 17, in text_to_sequence
    clean_text = _clean_text(text, cleaner_names)
  File "E:\Fort\WechatBot\vits-simple-api\text\__init__.py", line 31, in _clean_text
    text = cleaner(text)
  File "E:\Fort\WechatBot\vits-simple-api\text\cleaners.py", line 62, in english_cleaners2
    phonemes = phonemize(text, language='en-us', backend='espeak', strip=True, preserve_punctuation=True,
  File "E:\Fort\WechatBot\vits-simple-api\fort\lib\site-packages\phonemizer\phonemize.py", line 206, in phonemize
    phonemizer = BACKENDS[backend](
  File "E:\Fort\WechatBot\vits-simple-api\fort\lib\site-packages\phonemizer\backend\espeak\espeak.py", line 45, in __init__
    super().__init__(
  File "E:\Fort\WechatBot\vits-simple-api\fort\lib\site-packages\phonemizer\backend\espeak\base.py", line 39, in __init__
    super().__init__(
  File "E:\Fort\WechatBot\vits-simple-api\fort\lib\site-packages\phonemizer\backend\base.py", line 77, in __init__
    raise RuntimeError(  # pragma: nocover
RuntimeError: espeak not installed on your system

But I have already installed espeak and launched it.

Artrajz commented on July 21, 2024

Setting the path to the espeak DLL in config.py fixes this. On Windows, for example, the path is C:\Program Files\eSpeak NG\libespeak-ng.dll
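As an alternative to editing config.py, the phonemizer package should also pick up the library from the `PHONEMIZER_ESPEAK_LIBRARY` environment variable, which phonemizer 3.x checks when it loads the espeak backend. A minimal sketch, assuming phonemizer 3.x; `configure_espeak` is a hypothetical helper, and the default path is the Windows location given above. It would need to run before the first synthesis request.

```python
import os
from pathlib import Path
from typing import Optional

# Default install location on Windows, as given in this thread
DEFAULT_WINDOWS_DLL = r"C:\Program Files\eSpeak NG\libespeak-ng.dll"
ENV_VAR = "PHONEMIZER_ESPEAK_LIBRARY"


def configure_espeak(dll_path: Optional[str] = None) -> Optional[str]:
    """Point phonemizer at the eSpeak NG shared library, if one can be found.

    Tries, in order: the explicit argument, an already-set environment
    variable, then the default Windows install path. Returns the path used,
    or None if no candidate exists on disk.
    """
    candidates = [dll_path, os.environ.get(ENV_VAR), DEFAULT_WINDOWS_DLL]
    for candidate in candidates:
        if candidate and Path(candidate).is_file():
            # Assumed: phonemizer 3.x reads this variable when loading espeak
            os.environ[ENV_VAR] = candidate
            return candidate
    return None
```

A `None` return means no DLL was found, which matches the situation described next, where libespeak-ng.dll was missing from the install directory.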

cgnannan commented on July 21, 2024

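It looks like I downloaded the wrong version of espeak: there is no libespeak-ng.dll in that directory. I searched online but couldn't find an eSpeak NG .exe installer for Windows 10.

All I found was the GitHub repository of the Python dependency.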

Artrajz commented on July 21, 2024

This installer works on Windows 10:
https://github.com/espeak-ng/espeak-ng/releases/download/1.51/espeak-ng-X64.msi

cgnannan commented on July 21, 2024

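It works! Requests are received and audio comes back. I tried id 0 and id 1 and both are female voices. Are the ljs and vctk models both female? Where should I look for an English male voice?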

Artrajz commented on July 21, 2024

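Looking at the vctk JSON, the model has 109 speakers. You can fill in speaker names in the JSON however you like, e.g. "speakers": ["vctk1","vctk2","vctk3"], and then pick among them. I tried it, and id 1 in this model is a male voice.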
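Typing 109 names by hand is tedious, so they can be generated. A minimal Python sketch with hypothetical helper names; it assumes, as this thread suggests, that the position of a name in "speakers" selects the matching speaker inside the model, so the names themselves are arbitrary labels. Names start at `vctk1` to match the example above.

```python
def make_speaker_names(n_speakers: int, prefix: str = "vctk", start: int = 1) -> list:
    """Generate placeholder names like vctk1, vctk2, ... for every speaker slot."""
    return [f"{prefix}{i}" for i in range(start, start + n_speakers)]


def fill_speakers(config: dict, prefix: str = "vctk") -> dict:
    """Return a copy of config whose 'speakers' list covers data.n_speakers entries."""
    n_speakers = config["data"]["n_speakers"]
    filled = dict(config)  # shallow copy; the input dict is not modified
    filled["speakers"] = make_speaker_names(n_speakers, prefix)
    return filled
```

The result can be written back with `json.dumps(filled, ensure_ascii=False, indent=2)` so that the symbol list stays readable.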

cgnannan commented on July 21, 2024

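Following your suggestion, I added vctk1, vctk2 and vctk3 to config.json. After starting the service, the page at http://127.0.0.1:23456/voice/speakers shows the different ids.

Does this mean the three vctk voices loaded successfully, and I just need to change the id in this repository's config.py to pick the male voice?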

Artrajz commented on July 21, 2024

Yes. Actually, you can specify the id in each request, so you don't need to edit config.py to switch speakers.
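A request that selects the speaker by id could look like the following Python sketch, using only the standard library. The field names (`text`, `id`, `format`, `lang`, `length`, `noise`, `noisew`) are taken from the server log line `[VITS] id:0 format:wav lang:auto length:1.0 noise:0.667 noisew:0.8` and the `/voice [POST]` entry earlier in this thread; that the endpoint accepts them as form fields under exactly these names is an assumption, so check the project README. The traceback also shows a `check_api_key` wrapper, so if an API key is enabled it would have to be sent as well. `build_voice_request` and `synthesize` are hypothetical names.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:23456"  # address printed by the server at startup


def build_voice_request(text: str, speaker_id: int, fmt: str = "wav", lang: str = "auto",
                        length: float = 1.0, noise: float = 0.667,
                        noisew: float = 0.8) -> urllib.request.Request:
    """Build a form-encoded POST to /voice; field names mirror the server log."""
    fields = {"text": text, "id": speaker_id, "format": fmt, "lang": lang,
              "length": length, "noise": noise, "noisew": noisew}
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(f"{BASE_URL}/voice", data=body, method="POST")


def synthesize(text: str, speaker_id: int, out_path: str) -> None:
    """Send the request and save the returned audio bytes (needs a running server)."""
    with urllib.request.urlopen(build_voice_request(text, speaker_id)) as resp:
        with open(out_path, "wb") as f:
            f.write(resp.read())
```

For example, `synthesize("Good evening!", 1, "out.wav")` would request speaker id 1; the valid ids are the ones listed at /voice/speakers.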

cgnannan commented on July 21, 2024

Got it, thanks!

gzmasterpulse commented on July 21, 2024

Regarding setting the espeak DLL path in config.py: for a Docker deployment on Linux, how do I install and configure espeak?

Artrajz commented on July 21, 2024

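I believe the Docker setup already includes a command that installs espeak-ng; you can confirm it is installed by running espeak-ng --version in a terminal inside the container. On Linux, installing espeak sets up the environment automatically, so you don't need to configure the DLL path by hand; it works as is.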
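The same check can be run from Python inside the container, for example as part of a startup health check. A minimal standard-library sketch; `check_espeak` is a hypothetical helper that looks for `espeak-ng` on the PATH and returns the first line of its `--version` output, or `None` when it is missing.

```python
import shutil
import subprocess
from typing import Optional


def check_espeak(binary: str = "espeak-ng") -> Optional[str]:
    """Return the first line of `<binary> --version`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=False)
    output = result.stdout.strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    version = check_espeak()
    print(version if version else "espeak-ng not found on PATH")
```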
