neuralspeech's Introduction

NeuralSpeech

NeuralSpeech is a research project at Microsoft Research Asia, which focuses on neural network based speech processing, including automatic speech recognition (ASR), text-to-speech synthesis (TTS), spatial audio synthesis, video dubbing, etc.

Currently, this repo covers several research works:

For more research from the NeuralSpeech project, see https://speechresearch.github.io/. We will release more research work in the future.

For our research on AI music, you can refer to our Muzic project: https://github.com/microsoft/muzic.

We are hiring!

We are hiring researchers in speech (speech synthesis, speech recognition, voice conversion, audio processing), natural language processing, and machine learning. Please contact Xu Tan ([email protected]) if you are interested.

Reference

If you find the NeuralSpeech project useful in your work, you can cite the following papers:

  • [1] FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition, Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin and Tie-Yan Liu, NeurIPS 2021.
  • [2] FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition, Yichong Leng, Xu Tan, Rui Wang, Linchen Zhu, Jin Xu, Wenjie Liu, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu, Findings of EMNLP 2021.
  • [3] SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition, Yichong Leng, Xu Tan, Wenjie Liu, Kaitao Song, Rui Wang, Xiang-Yang Li, Tao Qin, Edward Lin, Tie-Yan Liu, AAAI 2023.
  • [4] [MaskCorrect] Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction, Kai Shen, Yichong Leng, Xu Tan, Siliang Tang, Yuan Zhang, Wenjie Liu, Edward Lin, EMNLP 2022.
  • [5] [CMatch] Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching, Wenxin Hou, Jindong Wang, Xu Tan, Tao Qin, Takahiro Shinozaki, INTERSPEECH 2021.
  • [6] [Adapter] Exploiting Adapters for Cross-lingual Low-resource Speech Recognition, Wenxin Hou, Han Zhu, Yidong Wang, Jindong Wang, Tao Qin, Renjun Xu, Takahiro Shinozaki, IEEE/ACM TASLP 2022.
  • [7] LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search, Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen and Tie-Yan Liu, ICASSP 2021.
  • [8] PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior, Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon, Tie-Yan Liu, ICLR 2022.
  • [9] BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis, Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li, Tao Qin, Sheng Zhao and Tie-Yan Liu, NeurIPS 2022.
  • [10] VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing, Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian, AAAI 2022.
  • [11] PromptTTS 2: Describing and Generating Voices with Text Prompt, Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian, ICLR 2024.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

neuralspeech's People

Contributors

houwenxin, jindongwang, l0sg, microsoft-github-policy-service[bot], microsoftopensource, renqianluo, tan-xu, tilakraj0308, wyh2000, xel-maker, yichongleng

neuralspeech's Issues

The difference between chinese_char_sim.txt and sim_prun_char.txt?

What is the difference between chinese_char_sim.txt and sim_prun_char.txt in add_noise.py?
Do they represent different similarity measures, such as "the negative of ratio of edit distance of two phoneme sequences" and "average length of two phoneme sequences"?
chinese_char_sim.txt seems to relate characters that share a phoneme but differ in tone, but how is sim_prun_char.txt created?
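
One possible reading of the quoted phrase is a single similarity score: the negative of the edit distance between two phoneme sequences, divided by their average length. The sketch below computes that score under this reading. The function names and the toned-pinyin example sequences are hypothetical, and nothing here is taken from add_noise.py.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution (free if the items match)
            ))
        prev = curr
    return prev[-1]


def phoneme_similarity(p1, p2):
    """Negative edit distance normalised by the average sequence length."""
    avg_len = (len(p1) + len(p2)) / 2
    if avg_len == 0:
        return 0.0  # two empty sequences are treated as identical
    return -edit_distance(p1, p2) / avg_len


# Hypothetical phoneme sequences: same initial and final, different tone.
print(phoneme_similarity(["zh", "ong1"], ["zh", "ong4"]))  # -0.5
```

With this definition, identical sequences score 0 and the score decreases as the sequences diverge.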

FastCorrect2: Preparing the pseudo paired dataset is too slow

I tried to pretrain a FastCorrect2 model with 200M unpaired data. I used align_cal_werdur_v2.py to align the noised text pairs. It seems to align only about 100 pairs/s even though I created 40 processes. Could you give me some ideas for speeding it up, or release a pretrained model?

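FastCorrect2: Is it feasible to use a different number of erroneous sentences for training and for inference?

Our speech recognition engine outputs only the single best hypothesis for an input audio clip, not multiple recognition results. However, passing the same audio to the engine several times yields different outputs. So during training and fine-tuning we would like to use sentence pairs built from 4 erroneous sentences as the model input, and at inference time use a single erroneous sentence as the model input. Is this feasible?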

FastCorrect: finetuned paired data overlapped with ASR model training data?

Hi,

Thanks for sharing the code. Based on the paper, it looks like you train the ASR models on aishell-1 and an internal dataset, and then use the trained models to transcribe the training data again to get fine-tuning data for FastCorrect. I think this could introduce a mismatch, because the real hypotheses we want to correct are never seen during ASR model training.

Any thoughts?

Thanks

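Model has no effect on the results

Hello, thank you for open-sourcing the FastCorrect project.
We trained a model with data/werdur_data_aishell and also retrained with the aishell data, but the final character accuracy did not change at all. What could be the reason? Both runs give a CER of 4.6 on the test set.
We also found that after running bash runs/test_ft.sh, the eval_origin_dict["utts"][k]["output"][0]["rec_token"] values in the final data.json file are unchanged.
Are we doing something wrong?

Looking forward to your reply.
Shylock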

"E2E" object has no attribute error

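Hello,
Regarding the previous issue: I have added from espnet.nets.pytorch_backend.nets_utils import *, but after adding it I get new errors, for example:
"E2E" object has no attribute "logzero"
"E2E" object has no attribute "_get_last_yesq", etc.
I tried looking for these attributes in the parent class of E2E but could not find them.
Is this still because some definitions are missing?
Looking forward to your reply.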

FastCorrect: What's the meaning of the character ▁ in dict.CN_char.txt?

I don't understand the meaning of the character ▁ in dict.CN_char.txt under the folder data\werdur_data_aishell. It occurs before some English or Chinese words, while other words don't have the ▁ prefix. What's the difference? And which command creates the dictionary file?
To solve the code-switching problem, I need to correct some Chinese to English, or vice versa. Should I extend dict.CN_char.txt with the special English words in my corpus when fine-tuning the model? Do I need to pretrain from scratch with the extended dict.CN_char.txt, or is it enough to reuse the provided pre-trained model and fine-tune it with the extended dict.CN_char.txt?
Thanks!

LightSpeech fine-tuning

Could you tell me if it is possible to fine-tune the model? If yes, what steps do I need to take to do this?

Predicted output is identical to the original sentence

I not only use 4 GPUs, but also set update-freq=4. So if you fine-tune FastCorrect on just one card, you need to set update-freq=16.
You can find all the hyper-parameters in my log and check them against yours.

Originally posted by @YichongLeng in #14 (comment)

I have fine-tuned the model with the parameters you specified, but my predictions are identical to the original sentences. Do you know what might cause this?
My prediction output is translated = [transf_gec.decode(hypos[0]['tokens']) for hypos in batched_hypos][0]
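
The update-freq arithmetic in the quoted reply can be checked with a short sketch. It assumes the per-GPU batch size stays the same, so that the effective batch scales with the number of GPUs times update-freq. The function name is hypothetical.

```python
def required_update_freq(ref_gpus, ref_update_freq, my_gpus):
    """update-freq needed on my_gpus to match the reference effective batch size.

    Assumes every GPU processes the same per-step batch in both setups.
    """
    total_accumulation = ref_gpus * ref_update_freq
    if total_accumulation % my_gpus != 0:
        raise ValueError("reference setup is not evenly divisible by my_gpus")
    return total_accumulation // my_gpus


# 4 GPUs with update-freq=4 corresponds to update-freq=16 on one GPU.
print(required_update_freq(ref_gpus=4, ref_update_freq=4, my_gpus=1))  # 16
```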

can't find gram2.txt/gram3.txt file

./scripts/align_cal_werdur_v2.py
Traceback (most recent call last):
  File "./scripts/align_cal_werdur_v2.py", line 257, in <module>
    print("Loading gram2:", gram2_path)
NameError: name 'gram2_path' is not defined

How can I create these files? Or could you provide them?

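FastCorrect2: train_pretrain.sh: script error: overflow detected, setting loss scale to xxx

I built 25500000 erroneous sentence pairs from the wiki dataset (each pair contains 4 erroneous sentences). When running the train_pretrain.sh script, the following error appears at epoch 010. How can I resolve it?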
2022-07-12 12:16:36 | INFO | train_inner | epoch 010: 1801 / 10548 loss=8.169, nll_loss=0.65, wer_dur_loss=0.011, word_ins=2.032, word_ins1=2.032, word_ins2=2.032, word_ins3=2.032, to_be_edited_loss=0.036, closest_loss=0.033, ppl=287.87, wps=12655.6, ups=0.32, wpb=39554.3, bsz=2362.3, num_updates=1800, lr=0.000225, gnorm=1.888, loss_scale=64, train_wall=312, wall=0
2022-07-12 12:21:48 | INFO | train_inner | epoch 010: 1901 / 10548 loss=8.249, nll_loss=1.071, wer_dur_loss=0.013, word_ins=2.053, word_ins1=2.053, word_ins2=2.054, word_ins3=2.054, to_be_edited_loss=0.039, closest_loss=0.019, ppl=304.25, wps=12681.9, ups=0.32, wpb=39470.2, bsz=2485.7, num_updates=1900, lr=0.0002375, gnorm=1.525, loss_scale=64, train_wall=311, wall=0
2022-07-12 12:23:50 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 32.0
2022-07-12 12:26:57 | INFO | train_inner | epoch 010: 2002 / 10548 loss=10.537, nll_loss=0.916, wer_dur_loss=0.019, word_ins=2.547, word_ins1=2.547, word_ins2=2.548, word_ins3=2.549, to_be_edited_loss=0.078, closest_loss=0.596, ppl=1486.15, wps=12743.2, ups=0.32, wpb=39484, bsz=2445.9, num_updates=2000, lr=0.00025, gnorm=6.127, loss_scale=32, train_wall=309, wall=0
2022-07-12 12:32:08 | INFO | train_inner | epoch 010: 2102 / 10548 loss=11.682, nll_loss=0.126, wer_dur_loss=0.012, word_ins=2.912, word_ins1=2.911, word_ins2=2.912, word_ins3=2.912, to_be_edited_loss=0.039, closest_loss=0.018, ppl=3286.11, wps=12729.3, ups=0.32, wpb=39531, bsz=2508.8, num_updates=2100, lr=0.0002625, gnorm=0.378, loss_scale=32, train_wall=310, wall=0
2022-07-12 12:35:43 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 16.0
2022-07-12 12:35:49 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 8.0
2022-07-12 12:35:52 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 4.0
2022-07-12 12:36:01 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 2.0
2022-07-12 12:36:10 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 1.0
2022-07-12 12:36:17 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.5
2022-07-12 12:36:27 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.25
2022-07-12 12:36:35 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.125
2022-07-12 12:36:39 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0625
2022-07-12 12:36:42 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.03125
2022-07-12 12:36:45 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.015625
2022-07-12 12:36:54 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2022-07-12 12:37:06 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.00390625
2022-07-12 12:37:10 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.001953125
2022-07-12 12:37:15 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0009765625
2022-07-12 12:37:22 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.00048828125
2022-07-12 12:37:31 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.000244140625
2022-07-12 12:37:34 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0001220703125
2022-07-12 12:37:38 | INFO | fairseq.nan_detector | Detected nan/inf grad norm, dumping norms...
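
The log above shows the loss scale being halved on every overflow, from 64 down to 0.0001220703125, before training aborts. The following is a minimal sketch of that halving rule, not fairseq's actual implementation. The function name and the floor of 1e-4 are assumptions chosen for illustration.

```python
def halve_on_overflow(scale, num_overflows, min_scale=1e-4):
    """Return the loss scale after each of num_overflows consecutive overflows.

    Each overflow halves the scale; falling below min_scale (an assumed floor)
    is treated as unrecoverable, which usually means the loss is diverging.
    """
    scales = []
    for _ in range(num_overflows):
        scale /= 2.0
        if scale < min_scale:
            raise FloatingPointError(
                f"loss scale {scale} fell below the minimum {min_scale}"
            )
        scales.append(scale)
    return scales


# 19 overflows starting from 64 reproduce the last value in the log.
print(halve_on_overflow(64.0, 19)[-1])  # 0.0001220703125
```

A long run of overflows like this one, preceded by a jump in loss and gnorm, generally points to diverging training rather than to a problem with the scaler itself.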

why so stupid mistake?

LightSpeech/utils/pwg_decode_from_mel.py", line 19, in load_pwg_model
    with open(config_path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'wavegan_pretrained/config.yaml'

FastCorrect result evaluation

Hello, following the FastCorrect README, running the script in the final evaluation step reports "sclite: Error, Reference file 'path/to/ref.trn' does not exist". What is the problem?

FastCorrect: pseudo data pre-processing

Hi,

Thanks for the repo!
I followed the FastCorrect Step 1 in README and extracted wiki text for pseudo dataset generation. From your instructions:

Also, we can perform some further pre-processing such as 1) removing non-Chinese letter, 2) splitting sentence, 3) changing number to its corresponding Chinese letter.

Are these further pre-processing steps necessary before we use scripts/align_cal_werdur_v2.py to align the sentences? Or are these steps included in any scripts in this repository?

Thank you!
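
The three pre-processing steps quoted from the README could look roughly like the sketch below. It is an illustration under simplifying assumptions, not a script from this repository: digits are mapped one by one (so "1990" becomes "一九九零", not a spoken number), sentences are split on a small set of end punctuation, and "Chinese letter" is approximated by the CJK Unified Ideographs range U+4E00 to U+9FFF. All names are hypothetical.

```python
import re

# Digit-by-digit mapping; proper number verbalisation would need more logic.
DIGITS = dict(zip("0123456789", "零一二三四五六七八九"))
SENTENCE_END = re.compile(r"[。！？!?；;]")
NON_CHINESE = re.compile(r"[^\u4e00-\u9fff]")


def preprocess(text):
    """Convert digits, split into sentences, and drop non-Chinese characters."""
    # 3) change each digit to its Chinese character (done first so the
    #    digits survive the non-Chinese filter below)
    text = "".join(DIGITS.get(ch, ch) for ch in text)
    # 2) split into sentences on end-of-sentence punctuation
    sentences = SENTENCE_END.split(text)
    # 1) remove every character outside the assumed Chinese range
    cleaned = (NON_CHINESE.sub("", s) for s in sentences)
    return [s for s in cleaned if s]


print(preprocess("他生于1990年。Hello,世界!"))  # ['他生于一九九零年', '世界']
```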

FastCorrect2: add noise may delete all tokens

For example,
in the beam=4 setting,

for an unpaired text
"eight twitter"

add_noise.py may generate a nbest sample like
"twitter # 皮 twitter # eight twitter # "

after the tokenization and alignment steps, we get
"<void> ▁tw it ter <void> 1 ||| 皮 ▁tw it ter <void> 1 ||| ▁eight ▁tw it ter # 1 |||| 0 -2 1 1 0 ||| -1 1 1 1 0 ||| 1 1 1 1 0",
so only 3 candidates are left, which leads to an assertion error when binarizing the dataset.

I think the cause is the code below in align_cal_werdur_fast.py. It strips the trailing space of the n-best sample line, so the empty candidate cannot be split off by " # ".

print("Loading: ", hypo_file)
with open(hypo_file, 'r', encoding='utf-8') as infile:
    for line in infile.readlines():
        all_hypo_line.append([i.strip().split() for i in line.strip().split(' # ')])

If the empty candidate is in the middle of the n-best line, the data preprocessing steps work fine, but it will still cause an error during training.
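
The behaviour described above can be reproduced in isolation. The sketch below compares the parsing expression from the snippet with a hypothetical variant that removes only the line terminator, so that a trailing empty candidate survives the split. Both function names are illustrative, and downstream code would still need to handle an empty candidate.

```python
def split_candidates_strip(line):
    """Same parsing as the quoted snippet: strip the whole line, then split."""
    return [c.strip().split() for c in line.strip().split(' # ')]


def split_candidates_keep_empty(line):
    """Hypothetical variant: drop only the newline so ' # ' still matches at the end."""
    return [c.split() for c in line.rstrip('\r\n').split(' # ')]


line = "twitter # 皮 twitter # eight twitter # \n"

# Stripping turns the final " # " into " #", so the last separator is missed.
print(split_candidates_strip(line))
# [['twitter'], ['皮', 'twitter'], ['eight', 'twitter', '#']]

# Keeping the trailing space yields four candidates, the last one empty.
print(split_candidates_keep_empty(line))
# [['twitter'], ['皮', 'twitter'], ['eight', 'twitter'], []]
```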

LightSpeech: Code for inference?

Greetings and first of all, thanks for your repository.

I have a question regarding inference with LightSpeech. Since the aim of a TTS system is to take a text input and synthesise audio, I find it rather difficult to achieve this with the current code.

Is there any code for synthesising any text given as input? As an example, in this repository of FastSpeech2, they provide a script for synthesis:

python .\inference.py -c .\configs\default.yaml -p .\checkpoints\first_1\ts_version2_fastspeech_fe9a2c7_7k_steps.pyt --out output --text "ModuleList can be indexed like a regular Python list but modules it contains are properly registered."

Thanks in advance.

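FastCorrect: results not reproduced

Hello, I could not reproduce the results in the paper. I loaded the pre-trained model provided in your repository and fine-tuned it on the aishell data. The loss ends up at about 2.34, but to_be_edited_loss does not decrease normally; it stays at 0.33-0.32 throughout.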

alignment performance issue

align_cal_werdur_v2.py in FastCorrect is slow when the alignment data is large.
Can I use "align_cal_werdur_fast.py" from FastCorrect2 for FastCorrect alignment?
The outputs of "align_cal_werdur_v2" and "align_cal_werdur_fast" are different.

FastCorrect2: Poor performance in pre-training phase

I pretrained a FastCorrect2 model with the settings in "runs/train_pretrain.sh". The pseudo paired dataset was generated using the corpus downloaded from here. I think the dataset is large enough for pretraining, but the result looks poor: the loss stopped dropping at the 7th epoch, and the model also performs very badly after fine-tuning on the Aishell-1 dataset. Here is my log at the validation step.

2022-03-28 08:15:55 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 7.448 | nll_loss 0.596 | wer_dur_loss 0.014 | word_ins 1.855 | word_ins1 1.855 | word_ins2 1.855 | word_ins3 1.855 | to_be_edited_loss 0.037 | closest_loss 0.008 | ppl 174.59 | wps 312994 | wpb 39214.3 | bsz 1042.1 | num_updates 11156
2022-03-28 19:06:00 | INFO | valid | epoch 002 | valid on 'valid' subset | loss 7.418 | nll_loss 0.602 | wer_dur_loss 0.01 | word_ins 1.844 | word_ins1 1.844 | word_ins2 1.844 | word_ins3 1.844 | to_be_edited_loss 0.028 | closest_loss 0.045 | ppl 171.01 | wps 308015 | wpb 39214.3 | bsz 1042.1 | num_updates 22305 | best_loss 7.418
2022-03-29 05:49:45 | INFO | valid | epoch 003 | valid on 'valid' subset | loss 7.406 | nll_loss 0.598 | wer_dur_loss 0.009 | word_ins 1.841 | word_ins1 1.841 | word_ins2 1.841 | word_ins3 1.841 | to_be_edited_loss 0.026 | closest_loss 0.049 | ppl 169.61 | wps 315539 | wpb 39214.3 | bsz 1042.1 | num_updates 33455 | best_loss 7.406
2022-03-29 16:33:39 | INFO | valid | epoch 004 | valid on 'valid' subset | loss 7.391 | nll_loss 0.6 | wer_dur_loss 0.009 | word_ins 1.839 | word_ins1 1.839 | word_ins2 1.839 | word_ins3 1.839 | to_be_edited_loss 0.025 | closest_loss 0.037 | ppl 167.9 | wps 311628 | wpb 39214.3 | bsz 1042.1 | num_updates 44603 | best_loss 7.391
2022-03-30 03:08:25 | INFO | valid | epoch 005 | valid on 'valid' subset | loss 7.39 | nll_loss 0.609 | wer_dur_loss 0.009 | word_ins 1.838 | word_ins1 1.838 | word_ins2 1.838 | word_ins3 1.839 | to_be_edited_loss 0.025 | closest_loss 0.04 | ppl 167.72 | wps 321428 | wpb 39214.3 | bsz 1042.1 | num_updates 55753 | best_loss 7.39
2022-03-30 13:30:34 | INFO | valid | epoch 006 | valid on 'valid' subset | loss 7.378 | nll_loss 0.611 | wer_dur_loss 0.009 | word_ins 1.837 | word_ins1 1.837 | word_ins2 1.837 | word_ins3 1.837 | to_be_edited_loss 0.025 | closest_loss 0.026 | ppl 166.39 | wps 320994 | wpb 39214.3 | bsz 1042.1 | num_updates 66904 | best_loss 7.378
2022-03-31 00:00:03 | INFO | valid | epoch 007 | valid on 'valid' subset | loss 7.375 | nll_loss 0.614 | wer_dur_loss 0.008 | word_ins 1.837 | word_ins1 1.837 | word_ins2 1.837 | word_ins3 1.837 | to_be_edited_loss 0.023 | closest_loss 0.027 | ppl 165.98 | wps 320013 | wpb 39214.3 | bsz 1042.1 | num_updates 78051 | best_loss 7.375
2022-03-31 10:21:02 | INFO | valid | epoch 008 | valid on 'valid' subset | loss 7.378 | nll_loss 0.609 | wer_dur_loss 0.009 | word_ins 1.837 | word_ins1 1.837 | word_ins2 1.837 | word_ins3 1.837 | to_be_edited_loss 0.025 | closest_loss 0.031 | ppl 166.37 | wps 321458 | wpb 39214.3 | bsz 1042.1 | num_updates 89201 | best_loss 7.375
2022-03-31 20:32:51 | INFO | valid | epoch 009 | valid on 'valid' subset | loss 7.384 | nll_loss 0.611 | wer_dur_loss 0.009 | word_ins 1.836 | word_ins1 1.836 | word_ins2 1.836 | word_ins3 1.836 | to_be_edited_loss 0.024 | closest_loss 0.046 | ppl 167.01 | wps 325090 | wpb 39214.3 | bsz 1042.1 | num_updates 100350 | best_loss 7.375

I pretrained the model on 8 GPUs with the settings in "runs/train_pretrain.sh". The batch size is 8000+; maybe it's too large? Could you please give me some advice?

Can not load FastCorrect pretrain model

Hi,
I load your FastCorrect pretrain model by

from fastcorrect_model import FastCorrectModel
model = FastCorrectModel.from_pretrained('FastCorrect', checkpoint_file='fastcorrect_pretrain.pt')

then I get

RuntimeError: Error(s) in loading state_dict for NATransformerModel:
	Missing key(s) in state_dict: "decoder.embed_length.weight".

How can I load the FastCorrect model correctly?
Thanks.

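adapterASR: some functions are undefined

First of all, thank you for open-sourcing this.

However, I ran into some problems:
In e2e_asr_adaptertransformer.py, functions such as to_device, to_torch_tensor, and pad_list are used without being defined. Could you provide a complete, error-free version?

Looking forward to your reply.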

Question about Fastcorrect2

Hello, I am running FastCorrect2 and have a question about the test set in the eval_data folder: in data.json, each uttid has 4 candidate paths in 'rec_text'. The WER of the test set reported in the paper is 4.31%; is it related to 'rec_text'? Thanks a lot!

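eval_data/test no correction WER=4.31%?

Hello, I noticed that the paper reports no-correction performance of 4.31% and 4.03% on the aishell-1 test set and dev set, respectively.
However, computing it from the data.json files in the eval_data folder you provide gives 4.83% and 4.46% on test/dev.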

so many bugs

LightSpeech/tasks/lightspeech.py", line 568, in save_result
audio.save_wav(wav_out, f'{gen_dir}/wavs/{base_fn}.wav', hparams['audio_sample_rate'],
KeyError: 'audio_sample_rate'
"""

Which fairseq version did you use in FastCorrect?

Error:
ModuleNotFoundError: No module named 'fairseq.dataclass.data_class'

code:
from fairseq.dataclass.data_class import (
    CheckpointParams,
    CommonEvalParams,
    CommonParams,
    DatasetParams,
    DistributedTrainingParams,
    EvalLMParams,
    OptimizationParams,
)

Why do you pin a package that is not released yet in requirements??

I just want to save other users' time; your code has too many bugs. I can't tolerate it.

image

pytorch_lightning 1.6.0 has NOT been released yet. The latest version is 1.5.0.

ALSO, even if you install from the GitHub source code, the APIs do not match your code:

# from pytorch_lightning.callbacks.pt_callbacks import ModelCheckpoint
from pytorch_lightning.callbacks.model_checkpoint import ModelCheckpoint

There is no pytorch_lightning.callbacks.pt_callbacks in the latest pytorch_lightning code.

Again, please fix your code bugs before release!!!
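
When a class has moved between modules across library versions, as reported above for ModelCheckpoint, one defensive pattern is to try each known module path in turn. The helper below is a generic sketch of that pattern using only the standard library. The name import_first is hypothetical, and the two pytorch_lightning paths in the commented usage are the ones mentioned in this issue.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first importable module that defines it."""
    for path in module_paths:
        try:
            module = importlib.import_module(path)
        except ImportError:  # also covers ModuleNotFoundError
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"{name} not found in any of: {', '.join(module_paths)}")


# Usage with the two paths from this issue (requires pytorch_lightning):
# ModelCheckpoint = import_first("ModelCheckpoint", [
#     "pytorch_lightning.callbacks.model_checkpoint",
#     "pytorch_lightning.callbacks.pt_callbacks",
# ])

# Self-contained demonstration with the standard library:
OrderedDict = import_first("OrderedDict", ["no_such_pkg.mod", "collections"])
print(OrderedDict.__name__)  # OrderedDict
```

This only papers over a renamed import path; pinning a released version of the dependency remains the more robust fix.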

Where is the sentencepiece_model?

I can't find the sentencepiece_model. Could you upload it if convenient? Also, in eval_aishell.py I need to pass the data_name_or_path parameter, and I am not sure whether I should use the train, dev, or test data.

FastCorrect1: how to calculate the 'P', 'R'?

Sorry to bother you, and thanks for this awesome paper and code.

I've done all the steps of FastCorrect1, but found no code for P_{edit}, R_{edit}, and P_{right}, which are mentioned in the paper. I am not sure how to calculate them.

Again, thanks so much for your time and all the wonderful work!

Confusion about some parameters

image
1) What does "32768" represent?

2) assert len(tgt_item) == len(for_wer_gather): I got an error at this line on the AISHELL-1 dataset, but it still works when I comment out this line. Could you explain the variable "for_wer_gather" in more detail?
