microsoft / neuralspeech

License: MIT License

Python 97.82% Shell 0.82% Cython 0.27% C++ 0.39% Cuda 0.70%

neuralspeech's Introduction

NeuralSpeech

NeuralSpeech is a research project at Microsoft Research Asia that focuses on neural-network-based speech processing, including automatic speech recognition (ASR), text-to-speech synthesis (TTS), spatial audio synthesis, and video dubbing.

Currently this repo covers several research works:

Automatic Speech Recognition
Text-to-Speech Synthesis
Spatial Audio Synthesis
- BinauralGrad, NeurIPS 2022
Video Dubbing
- VideoDubber, AAAI 2023

For more research from the NeuralSpeech project, see this page: https://speechresearch.github.io/. We will release more research work in the future.

For our research on AI music, you can refer to our Muzic project: https://github.com/microsoft/muzic.

We are hiring!

We are hiring researchers in speech (speech synthesis, speech recognition, voice conversion, audio processing), natural language processing, and machine learning. Please contact Xu Tan ([email protected]) if you are interested.

Reference

If you find the NeuralSpeech project useful in your work, you can cite the following papers:

[1] FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition, Yichong Leng, Xu Tan, Linchen Zhu, Jin Xu, Renqian Luo, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin and Tie-Yan Liu, NeurIPS 2021.
[2] FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition, Yichong Leng, Xu Tan, Rui Wang, Linchen Zhu, Jin Xu, Wenjie Liu, Linquan Liu, Tao Qin, Xiang-Yang Li, Ed Lin, Tie-Yan Liu, Findings of EMNLP 2021.
[3] SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition, Yichong Leng, Xu Tan, Wenjie Liu, Kaitao Song, Rui Wang, Xiang-Yang Li, Tao Qin, Edward Lin, Tie-Yan Liu, AAAI 2023.
[4] [MaskCorrect] Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction, Kai Shen, Yichong Leng, Xu Tan, Siliang Tang, Yuan Zhang, Wenjie Liu, Edward Lin, EMNLP 2022.
[5] [CMatch] Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching, Wenxin Hou, Jindong Wang, Xu Tan, Tao Qin, Takahiro Shinozaki, INTERSPEECH 2021.
[6] [Adapter] Exploiting Adapters for Cross-lingual Low-resource Speech Recognition, Wenxin Hou, Han Zhu, Yidong Wang, Jindong Wang, Tao Qin, Renjun Xu, Takahiro Shinozaki. IEEE/ACM TASLP 2022.
[7] LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search, Renqian Luo, Xu Tan, Rui Wang, Tao Qin, Jinzhu Li, Sheng Zhao, Enhong Chen and Tie-Yan Liu, ICASSP 2021.
[8] PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior, Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon, Tie-Yan Liu, ICLR 2022.
[9] BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis, Yichong Leng, Zehua Chen, Junliang Guo, Haohe Liu, Jiawei Chen, Xu Tan, Danilo Mandic, Lei He, Xiang-Yang Li, Tao Qin, Sheng Zhao and Tie-Yan Liu, NeurIPS 2022.
[10] VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing, Yihan Wu, Junliang Guo, Xu Tan, Chen Zhang, Bohan Li, Ruihua Song, Lei He, Sheng Zhao, Arul Menezes, Jiang Bian, AAAI 2023.
[11] PromptTTS 2: Describing and Generating Voices with Text Prompt, Yichong Leng, Zhifang Guo, Kai Shen, Xu Tan, Zeqian Ju, Yanqing Liu, Yufei Liu, Dongchao Yang, Leying Zhang, Kaitao Song, Lei He, Xiang-Yang Li, Sheng Zhao, Tao Qin, Jiang Bian, ICLR 2024.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

neuralspeech's Issues

The difference between chinese_char_sim.txt and sim_prun_char.txt?

What is the difference between chinese_char_sim.txt and sim_prun_char.txt in add_noise.py?
Do they represent different similarity measures, such as "the negative of the ratio of the edit distance of two phoneme sequences" and "the average length of two phoneme sequences"?
chinese_char_sim.txt seems to relate characters that share a phoneme but differ in tone, but how is sim_prun_char.txt created?

fastcorrect2 pre-trained model

Could you please upload the pre-trained model at your convenience?

FastCorrect: What is the loss function of FastCorrect?

What is the loss function of FastCorrect? Should I rewrite the fc_loss.py file if I want to change the loss function? Thank you!

FastCorrect2: Prepare pseudo paired dataset is too slow

I tried to pretrain a FastCorrect2 model with 200M unpaired sentences. I used align_cal_werdur_v2.py to align the noised text pairs. It looks like it can only align about 100 pairs/s even though I created 40 processes. Could you give me some ideas for speeding it up, or simply release a pretrained model?

FastCorrect: some question of data preparation

Excuse me. Is werdur_data_aishell/dict.CN_char.txt extracted from the wiki data?

Are the token frequencies used anywhere?

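How should I use the FastCorrect pre-trained model to correct errors in my speech recognition results?

FastCorrect2: Is it feasible to use a different number of erroneous sentences for training and for prediction?

Our speech recognition engine outputs only the single best hypothesis for an input audio, not multiple recognition results, and the same audio sent to the engine several times yields different outputs. So we would like to use sentence pairs made of 4 erroneous sentences as model input when training and fine-tuning, and a single erroneous sentence as model input at prediction time. Is this feasible?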

FastCorrect: finetuned paired data overlapped with ASR model training data?

Hi,

Thanks for sharing the code. Based on the paper, it looks like you train the ASR models on aishell-1 and internal dataset, and then use the trained models to transcribe the training data again to get finetuning data for FastCorrect. I think this could bring mismatch because the real hypothesis we want to correct is never seen during ASR model training.

Any thoughts?

Thanks

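Retraining has no effect on the results

Hello, thank you for open-sourcing the FastCorrect project.
We trained the model with data/werdur_data_aishell and retrained it with AISHELL data, but the character accuracy measured at the end did not change at all. What could be the reason? Both runs give a CER of 4.6 on the test set.
We also found that after running bash runs/test_ft.sh, the eval_origin_dict["utts"][k]["output"][0]["rec_token"] entries in the final data.json file are unchanged.
Did we do something wrong?

Looking forward to your reply,
Shylock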

"E2E" object has no attribute error

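Hello,
Regarding the previous issue, I have already added from espnet.nets.pytorch_backend.nets_utils import *, but after adding it I get new errors, for example:
"E2E" object has no attribute "logzero"
"E2E" object has no attribute "_get_last_yseq"
I tried to find these attributes in the parent class of E2E but could not.
Are some related definitions still missing?
Looking forward to your reply.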

FastCorrect: What's the meaning of the character ▁ in dict.CN_char.txt?

I don't understand the meaning of the character ▁ in dict.CN_char.txt under the folder data\werdur_data_aishell. It occurs before some English or Chinese words, while other words do not have the ▁ prefix. What is the difference? And which command creates the dictionary file?
To solve the code-switching problem, I need to correct some Chinese to English, or vice versa. Should I extend dict.CN_char.txt with the special English words from my corpus when fine-tuning the model? Do I need to pretrain from scratch with the extended dict.CN_char.txt, or is it enough to reuse the provided pre-trained model and fine-tune with the extended dict.CN_char.txt?
Thanks!
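
On the ▁ question: U+2581 is the marker that SentencePiece-style tokenizers put in front of a piece that starts a new word, and the sentencepiece_model asked about elsewhere in these issues suggests the dictionary was produced that way. A minimal sketch, assuming that convention applies to dict.CN_char.txt (the helper name detokenize is hypothetical and not part of the repository):

```python
WORD_START = "\u2581"  # the "▁" character


def detokenize(pieces):
    """Join subword pieces and turn each word-start marker into a space."""
    return "".join(pieces).replace(WORD_START, " ").strip()


# Pieces without the marker continue the previous word.
pieces = ["▁eight", "▁tw", "it", "ter"]
print(detokenize(pieces))  # eight twitter
```

Under this assumption, an entry with the ▁ prefix is a piece that begins a word, and an entry without it is a piece that continues one.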

LightSpeech fine-tuning

Could you tell me if it is possible to fine-tune the model? If yes, what steps do I need to take to do this?

This repo is missing important files

There are important files that Microsoft projects should all have that are not present in this repository. A pull request has been opened to add the missing file(s). When the pr is merged this issue will be closed automatically.

Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

Merge this pull request

Predicted result is identical to the original sentence

I not only used 4 GPUs but also set update-freq=4. So if you fine-tune FastCorrect on a single card, you need to set update-freq=16.
My log lists all the hyper-parameters, so you can check them against yours.

Originally posted by @YichongLeng in #14 (comment)
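
The arithmetic behind that advice can be sketched as follows, assuming the per-GPU batch size is kept the same: the number of batches accumulated per optimizer step is the GPU count times update-freq, so fewer GPUs need a proportionally larger update-freq. The function name is illustrative only.

```python
def matching_update_freq(ref_gpus: int, ref_update_freq: int, my_gpus: int) -> int:
    """Return the update-freq that keeps GPUs x update-freq constant."""
    total = ref_gpus * ref_update_freq  # batches accumulated per optimizer step
    if total % my_gpus != 0:
        raise ValueError("GPU count does not divide the reference accumulation evenly")
    return total // my_gpus


print(matching_update_freq(4, 4, 1))  # 16, the value quoted above
```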

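I have fine-tuned the model with the parameters you set, but my predicted result is identical to the original sentence. Do you know what the reason might be?
My predicted output is translated = [transf_gec.decode(hypos[0]['tokens']) for hypos in batched_hypos][0]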

can't find gram2.txt/gram3.txt file

./scripts/align_cal_werdur_v2.py
Traceback (most recent call last):
File "./scripts/align_cal_werdur_v2.py", line 257, in <module>
print("Loading gram2:", gram2_path)
NameError: name 'gram2_path' is not defined

How can I create these files? Or could you provide them?

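FastCorrect2: train_pretrain.sh reports "overflow detected, setting loss scale to xxx"

I built 25500000 erroneous sentence pairs from the wiki dataset (each pair has 4 erroneous sentences). When running the train_pretrain.sh script, the following error appears at epoch 010. How should I fix it?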
2022-07-12 12:16:36 | INFO | train_inner | epoch 010: 1801 / 10548 loss=8.169, nll_loss=0.65, wer_dur_loss=0.011, word_ins=2.032, word_ins1=2.032, word_ins2=2.032, word_ins3=2.032, to_be_edited_loss=0.036, closest_loss=0.033, ppl=287.87, wps=12655.6, ups=0.32, wpb=39554.3, bsz=2362.3, num_updates=1800, lr=0.000225, gnorm=1.888, loss_scale=64, train_wall=312, wall=0
2022-07-12 12:21:48 | INFO | train_inner | epoch 010: 1901 / 10548 loss=8.249, nll_loss=1.071, wer_dur_loss=0.013, word_ins=2.053, word_ins1=2.053, word_ins2=2.054, word_ins3=2.054, to_be_edited_loss=0.039, closest_loss=0.019, ppl=304.25, wps=12681.9, ups=0.32, wpb=39470.2, bsz=2485.7, num_updates=1900, lr=0.0002375, gnorm=1.525, loss_scale=64, train_wall=311, wall=0
2022-07-12 12:23:50 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 32.0
2022-07-12 12:26:57 | INFO | train_inner | epoch 010: 2002 / 10548 loss=10.537, nll_loss=0.916, wer_dur_loss=0.019, word_ins=2.547, word_ins1=2.547, word_ins2=2.548, word_ins3=2.549, to_be_edited_loss=0.078, closest_loss=0.596, ppl=1486.15, wps=12743.2, ups=0.32, wpb=39484, bsz=2445.9, num_updates=2000, lr=0.00025, gnorm=6.127, loss_scale=32, train_wall=309, wall=0
2022-07-12 12:32:08 | INFO | train_inner | epoch 010: 2102 / 10548 loss=11.682, nll_loss=0.126, wer_dur_loss=0.012, word_ins=2.912, word_ins1=2.911, word_ins2=2.912, word_ins3=2.912, to_be_edited_loss=0.039, closest_loss=0.018, ppl=3286.11, wps=12729.3, ups=0.32, wpb=39531, bsz=2508.8, num_updates=2100, lr=0.0002625, gnorm=0.378, loss_scale=32, train_wall=310, wall=0
2022-07-12 12:35:43 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 16.0
2022-07-12 12:35:49 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 8.0
2022-07-12 12:35:52 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 4.0
2022-07-12 12:36:01 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 2.0
2022-07-12 12:36:10 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 1.0
2022-07-12 12:36:17 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.5
2022-07-12 12:36:27 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.25
2022-07-12 12:36:35 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.125
2022-07-12 12:36:39 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0625
2022-07-12 12:36:42 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.03125
2022-07-12 12:36:45 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.015625
2022-07-12 12:36:54 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0078125
2022-07-12 12:37:06 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.00390625
2022-07-12 12:37:10 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.001953125
2022-07-12 12:37:15 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0009765625
2022-07-12 12:37:22 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.00048828125
2022-07-12 12:37:31 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.000244140625
2022-07-12 12:37:34 | INFO | fairseq.trainer | NOTE: overflow detected, setting loss scale to: 0.0001220703125
2022-07-12 12:37:38 | INFO | fairseq.nan_detector | Detected nan/inf grad norm, dumping norms...
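
The log shows the loss scale being halved on every overflow, from 64 down to 0.0001220703125. A minimal sketch of that halving, assuming a minimum loss scale of 1e-4 (which I understand to be fairseq's default for --min-loss-scale; check your installed version), illustrates why training cannot continue past this point: one more halving would fall below the floor.

```python
def halvings_until_floor(start_scale: float, min_scale: float = 1e-4):
    """Halve the scale until the next halving would fall below min_scale."""
    scale, steps = start_scale, 0
    while scale / 2 >= min_scale:
        scale /= 2
        steps += 1
    return steps, scale


steps, last = halvings_until_floor(64.0)
print(steps, last)  # 19 0.0001220703125
```

Nineteen consecutive halvings match the nineteen overflow lines in the log. Overflows that persist all the way to the floor usually point to diverging gradients rather than to the scaling itself, which would be consistent with the loss rising from 8.249 to 11.682 around the first overflow.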

No module named 'fastcorrect_generator'

FastCorrect/fastcorrect_model.py
from fastcorrect_generator import DecoderOut

How or where to install fastcorrect_generator?

why so stupid mistake?

LightSpeech/utils/pwg_decode_from_mel.py", line 19, in load_pwg_model
    with open(config_path) as f:
FileNotFoundError: [Errno 2] No such file or directory: 'wavegan_pretrained/config.yaml'

FastCorrect result evaluation

Hello, following the FastCorrect README, the evaluation script in the last step fails with "sclite: Error, Reference file 'path/to/ref.trn' does not exist". What is the problem?

FastCorrect: pseudo data pre-processing

Hi,

Thanks for the repo!
I followed the FastCorrect Step 1 in README and extracted wiki text for pseudo dataset generation. From your instructions:

Also, we can perform some further pre-processing such as 1) removing non-Chinese letter, 2) splitting sentence, 3) changing number to its corresponding Chinese letter.

Are these further pre-processing steps necessary before we use scripts/align_cal_werdur_v2.py to align the sentences? Or are these steps included in any scripts in this repository?

Thank you!

FastCorrect2: add noise may delete all tokens

For example, in the beam=4 setting,

for an unpaired text
"eight twitter"

add_noise.py may generate a nbest sample like
"twitter # 皮 twitter # eight twitter # "

After the tokenization and alignment steps, we get
"<void> ▁tw it ter <void> 1 ||| 皮 ▁tw it ter <void> 1 ||| ▁eight ▁tw it ter # 1 |||| 0 -2 1 1 0 ||| -1 1 1 1 0 ||| 1 1 1 1 0",
Only 3 candidates are left, which leads to an assertion error when binarizing the dataset.

I think the cause is the code below in align_cal_werdur_fast.py. It strips the trailing space of the n-best sample line, so the empty final candidate can no longer be split off by " # ".

print("Loading: ", hypo_file)
with open(hypo_file, 'r', encoding='utf-8') as infile:
    for line in infile.readlines():
        all_hypo_line.append([i.strip().split() for i in line.strip().split(' # ')])

If the empty candidate is in the middle of the n-best line, the data preprocessing steps work fine, but it still causes an error during training.
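
The behaviour described above can be reproduced in isolation. The sketch below mirrors the quoted loop body and contrasts it with one possible fix that removes only the line terminator; both function names are hypothetical, and the fix is a suggestion rather than the maintainers' patch.

```python
def split_candidates_buggy(line: str):
    # Mirrors the quoted loop body: strip() also removes the trailing space,
    # so the final " # " separator no longer matches.
    return [c.strip().split() for c in line.strip().split(" # ")]


def split_candidates_fixed(line: str):
    # Drop only the newline so a trailing " # " still yields an empty candidate.
    return [c.strip().split() for c in line.rstrip("\r\n").split(" # ")]


line = "twitter # 皮 twitter # eight twitter # \n"
print(len(split_candidates_buggy(line)))  # 3
print(len(split_candidates_fixed(line)))  # 4
```

With the buggy version the last candidate keeps a stray "#" token, which matches the third candidate in the aligned output shown above. The fix only restores the candidate count; it does not address the training-time error reported for empty candidates.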

LightSpeech: Code for inference?

Greetings and first of all, thanks for your repository.

I have a question regarding the inference of LightSpeech. Since the aim of a TTS system is to provide a system that can take a text input and synthesise an audio, I find it rather difficult to achieve this purpose with this code.

Is there any code for synthesising any text given as input? As an example, in this repository of FastSpeech2, they provide a script for synthesis:

python .\inference.py -c .\configs\default.yaml -p .\checkpoints\first_1\ts_version2_fastspeech_fe9a2c7_7k_steps.pyt --out output --text "ModuleList can be indexed like a regular Python list but modules it contains are properly registered."

Thanks in advance.

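FastCorrect results not reproduced

Hello, I could not reproduce the results in the paper. I loaded the pre-trained model provided in your repository and fine-tuned it on AISHELL data. The loss ends up around 2.34, but to_be_edited_loss does not decrease normally; it stays at 0.33-0.32 throughout.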

alignment performance issue

align_cal_werdur_v2.py is slow when the alignment data for FastCorrect is large.
Can I use align_cal_werdur_fast.py from FastCorrect2 for FastCorrect alignment?
The outputs of align_cal_werdur_v2 and align_cal_werdur_fast are different.

What is demo "TBD"

Hi. My name is Thong Vo.
Tell me what is 'TBD' in demo.
Thank you.

FastCorrect2: Poor performance in pre-training phase

I pretrained a FastCorrect2 model with the settings in "runs/train_pretrain.sh". The pseudo paired dataset was generated using the corpus downloaded from here. I think the dataset size is large enough for pretraining, but the result looks compromised: the loss stopped dropping at the 7th epoch, and the model also performs very badly after fine-tuning on the Aishell-1 dataset. Here is my log at the validation step.

2022-03-28 08:15:55 | INFO | valid | epoch 001 | valid on 'valid' subset | loss 7.448 | nll_loss 0.596 | wer_dur_loss 0.014 | word_ins 1.855 | word_ins1 1.855 | word_ins2 1.855 | word_ins3 1.855 | to_be_edited_loss 0.037 | closest_loss 0.008 | ppl 174.59 | wps 312994 | wpb 39214.3 | bsz 1042.1 | num_updates 11156
2022-03-28 19:06:00 | INFO | valid | epoch 002 | valid on 'valid' subset | loss 7.418 | nll_loss 0.602 | wer_dur_loss 0.01 | word_ins 1.844 | word_ins1 1.844 | word_ins2 1.844 | word_ins3 1.844 | to_be_edited_loss 0.028 | closest_loss 0.045 | ppl 171.01 | wps 308015 | wpb 39214.3 | bsz 1042.1 | num_updates 22305 | best_loss 7.418
2022-03-29 05:49:45 | INFO | valid | epoch 003 | valid on 'valid' subset | loss 7.406 | nll_loss 0.598 | wer_dur_loss 0.009 | word_ins 1.841 | word_ins1 1.841 | word_ins2 1.841 | word_ins3 1.841 | to_be_edited_loss 0.026 | closest_loss 0.049 | ppl 169.61 | wps 315539 | wpb 39214.3 | bsz 1042.1 | num_updates 33455 | best_loss 7.406
2022-03-29 16:33:39 | INFO | valid | epoch 004 | valid on 'valid' subset | loss 7.391 | nll_loss 0.6 | wer_dur_loss 0.009 | word_ins 1.839 | word_ins1 1.839 | word_ins2 1.839 | word_ins3 1.839 | to_be_edited_loss 0.025 | closest_loss 0.037 | ppl 167.9 | wps 311628 | wpb 39214.3 | bsz 1042.1 | num_updates 44603 | best_loss 7.391
2022-03-30 03:08:25 | INFO | valid | epoch 005 | valid on 'valid' subset | loss 7.39 | nll_loss 0.609 | wer_dur_loss 0.009 | word_ins 1.838 | word_ins1 1.838 | word_ins2 1.838 | word_ins3 1.839 | to_be_edited_loss 0.025 | closest_loss 0.04 | ppl 167.72 | wps 321428 | wpb 39214.3 | bsz 1042.1 | num_updates 55753 | best_loss 7.39
2022-03-30 13:30:34 | INFO | valid | epoch 006 | valid on 'valid' subset | loss 7.378 | nll_loss 0.611 | wer_dur_loss 0.009 | word_ins 1.837 | word_ins1 1.837 | word_ins2 1.837 | word_ins3 1.837 | to_be_edited_loss 0.025 | closest_loss 0.026 | ppl 166.39 | wps 320994 | wpb 39214.3 | bsz 1042.1 | num_updates 66904 | best_loss 7.378
2022-03-31 00:00:03 | INFO | valid | epoch 007 | valid on 'valid' subset | loss 7.375 | nll_loss 0.614 | wer_dur_loss 0.008 | word_ins 1.837 | word_ins1 1.837 | word_ins2 1.837 | word_ins3 1.837 | to_be_edited_loss 0.023 | closest_loss 0.027 | ppl 165.98 | wps 320013 | wpb 39214.3 | bsz 1042.1 | num_updates 78051 | best_loss 7.375
2022-03-31 10:21:02 | INFO | valid | epoch 008 | valid on 'valid' subset | loss 7.378 | nll_loss 0.609 | wer_dur_loss 0.009 | word_ins 1.837 | word_ins1 1.837 | word_ins2 1.837 | word_ins3 1.837 | to_be_edited_loss 0.025 | closest_loss 0.031 | ppl 166.37 | wps 321458 | wpb 39214.3 | bsz 1042.1 | num_updates 89201 | best_loss 7.375
2022-03-31 20:32:51 | INFO | valid | epoch 009 | valid on 'valid' subset | loss 7.384 | nll_loss 0.611 | wer_dur_loss 0.009 | word_ins 1.836 | word_ins1 1.836 | word_ins2 1.836 | word_ins3 1.836 | to_be_edited_loss 0.024 | closest_loss 0.046 | ppl 167.01 | wps 325090 | wpb 39214.3 | bsz 1042.1 | num_updates 100350 | best_loss 7.375

I pretrained the model on 8 GPUs with the settings in "runs/train_pretrain.sh". The batch size is 8000+; maybe it is too large? Could you please give me some advice?

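About chinese_char_sim.txt and sim_prun_char.txt

Thank you very much for open-sourcing the code. I would like to ask how the homophone dictionary used to construct the pseudo correction text was built.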

Cannot load the FastCorrect pre-trained model

Hi,
I load your FastCorrect pre-trained model with

from fastcorrect_model import FastCorrectModel
model = FastCorrectModel.from_pretrained('FastCorrect', checkpoint_file='fastcorrect_pretrain.pt')

then I get

RuntimeError: Error(s) in loading state_dict for NATransformerModel:
	Missing key(s) in state_dict: "decoder.embed_length.weight".

How can I load the FastCorrect model correctly?
Thanks.

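Some functions are undefined in adapterASR

First of all, thank you for open-sourcing this.

However, I ran into some problems:
In e2e_asr_adaptertransformer.py, functions such as to_device, to_torch_tensor, and pad_list are used without being defined. Could you provide a complete, error-free version?

Looking forward to your reply.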

How can I get the pre-trained model of FastCorrect2?

As mentioned in #14, is the pre-trained model of FastCorrect2 open source? How can I get it? Thank you.

Question about Fastcorrect2

Hello, I am running FastCorrect2, but I have some questions about the test set in the eval_data folder. In data.json, each uttid has 4 candidate paths in 'rec_text'. The WER of the test set mentioned in the paper is 4.31%; does it have any relationship with 'rec_text'? Thanks a lot!

Can Fastcorrect be used for Error Correction in English?

If I want to do English error correction, can I use this model? Is it possible to pre-train and fine-tune with English corpus?

eval_data/test no correction WER=4.31% ?

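Hello, I found that the paper reports no-correction performance of 4.31% and 4.03% on the AISHELL-1 test set and dev set, respectively.
However, computing it from the data.json files in the eval_data folder you provide gives 4.83% and 4.46% for test/dev.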

FastCorrect2: What was the format of the data before running align_cal_werdur_v2.py?

What was the format of the data before running align_cal_werdur_v2.py?

so many bugs

LightSpeech/tasks/lightspeech.py", line 568, in save_result
audio.save_wav(wav_out, f'{gen_dir}/wavs/{base_fn}.wav', hparams['audio_sample_rate'],
KeyError: 'audio_sample_rate'
"""

fairseq-train: error: the following arguments are required: data

When running train_ft.sh I get the error: fairseq-train: error: the following arguments are required: data
I have already set DATA_PATH= '/XXX/FastCorrect/data/werdur_data_aishell' in train_ft.sh.
How can I resolve this error?
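
One possible cause, assuming the DATA_PATH line in train_ft.sh is written exactly as quoted with a space after the equals sign: a POSIX shell reads that as "run the path as a command with an empty DATA_PATH", so the variable stays unset afterwards and fairseq-train receives no data argument. The sketch below demonstrates the shell behaviour with a placeholder path; the helper name is hypothetical and nothing here is taken from the repository's scripts.

```python
import os
import subprocess


def data_path_seen_by_script(assignment: str) -> str:
    """Run the assignment in sh, then print what $DATA_PATH expands to."""
    script = assignment + "\nprintf '%s' \"$DATA_PATH\""
    env = {"PATH": os.environ.get("PATH", "")}  # start without any DATA_PATH
    result = subprocess.run(["sh", "-c", script], env=env,
                            capture_output=True, text=True)
    return result.stdout


with_space = data_path_seen_by_script("DATA_PATH= '/nonexistent/werdur_data_aishell'")
without_space = data_path_seen_by_script("DATA_PATH='/nonexistent/werdur_data_aishell'")
print(repr(with_space), repr(without_space))
```

If this is what happens in the script, removing the space after the equals sign should let the path reach fairseq-train.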

Which fairseq version did you use in FastCorrect?

Error:
ModuleNotFoundError: No module named 'fairseq.dataclass.data_class'

code:
from fairseq.dataclass.data_class import (
CheckpointParams,
CommonEvalParams,
CommonParams,
DatasetParams,
DistributedTrainingParams,
EvalLMParams,
OptimizationParams,
)

Fastcorrect2: Could you upload your pre-processed wiki pre-trained data and your pre-trained log?

I want to reproduce the results of the FastCorrect2 paper. Could you please upload your pre-processed wiki pre-training data and your pre-training log?

NeuralSpeech/LightSpeech/tasks/lightspeech.py", line 572, in save_result norm=hparams['out_wav_norm']) KeyError: 'out_wav_norm' """

NeuralSpeech/LightSpeech/tasks/lightspeech.py", line 572, in save_result
norm=hparams['out_wav_norm'])
KeyError: 'out_wav_norm'
"""

Please make sure these parameters are documented in the README before open-sourcing.

It makes users very confused, since the code cannot be run.

Why do you pin a package that has not been released yet in requirements?

I just want to save other users' time. Your code has too many bugs, and I can't tolerate it.

pytorch_lightning 1.6.0 has NOT been released yet. The latest version is 1.5.0.

Also, even if you install from the GitHub source code, the APIs do not match your code:

# from pytorch_lightning.callbacks.pt_callbacks import ModelCheckpoint
from pytorch_lightning.callbacks.model_checkpoint import ModelCheckpoint

There is no pytorch_lightning.callbacks.pt_callbacks module in the latest pytorch_lightning code.

Again, please fix the bugs in your code before releasing it!

"E2E" object has no attribute "_get_last_yseq"

Where is the sentencepiece_model?

I can't find the sentencepiece_model. Could you upload this model if it is convenient for you? Also, in eval_aishell.py I need to pass the parameter data_name_or_path, and I am not sure whether I should use the train, dev, or test data.

What is the format of the FastCorrect2 training data?

What is the format of the training data for FastCorrect2?

Fastcorrect2: How to calculate P_{edit}, R_{edit}, P_{right}

I completed the experiment with fastcorrect2, it performs well. But I'm not sure how to calculate P_{edit}, R_{edit}, P_{right} mentioned in the paper. Could you upload the code?
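
A minimal sketch of how these three metrics could be computed, based on one reading of the definitions in the paper: P_edit and R_edit are the precision and recall of error detection (whether the tokens the model edits are really erroneous), and P_right is the fraction of correctly detected error tokens that are changed to the right token. It assumes the source, corrected output, and reference are already aligned token by token (substitutions only); insertions and deletions would first need an edit-distance alignment. This is not the authors' evaluation code, and edit_metrics is a hypothetical name.

```python
from typing import Sequence, Tuple


def edit_metrics(src: Sequence[str], hyp: Sequence[str],
                 ref: Sequence[str]) -> Tuple[float, float, float]:
    """Return (P_edit, R_edit, P_right) for token-aligned sequences."""
    if not (len(src) == len(hyp) == len(ref)):
        raise ValueError("sequences must be aligned to the same length")
    edited = [s != h for s, h in zip(src, hyp)]  # model changed the token
    wrong = [s != r for s, r in zip(src, ref)]   # token was really an error
    n_edited = sum(edited)
    n_wrong = sum(wrong)
    n_hit = sum(e and w for e, w in zip(edited, wrong))
    n_right = sum(e and w and h == r
                  for e, w, h, r in zip(edited, wrong, hyp, ref))
    p_edit = n_hit / n_edited if n_edited else 0.0
    r_edit = n_hit / n_wrong if n_wrong else 0.0
    p_right = n_right / n_hit if n_hit else 0.0
    return p_edit, r_edit, p_right


# Toy example: the model edits 3 tokens, 2 of which were real errors,
# and fixes 1 of those 2 correctly.
print(edit_metrics(list("abcde"), list("aZXdQ"), list("abXdY")))
```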

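How is eval_data in FastCorrect generated?

How is eval_data in FastCorrect generated? How should I construct my own validation data?

FastCorrect: Can data_name_or_path in eval_aishell.py be set to the DATA_PATH directory produced by running data-gen.sh on data/werdur_data_aishell?

Can data_name_or_path in eval_aishell.py be set to the DATA_PATH directory obtained after data-gen.sh processes data/werdur_data_aishell? Also, eval_aishell.py only reads the dictionary files under the data_name_or_path directory and does not use the other binarized training data, is that right?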

FastCorrect1: how to calculate the 'P', 'R'?

Sorry to bother you, and thanks for this awesome paper and code.

I've done all the steps of FastCorrect1, but found no code for P_{edit}, R_{edit}, and P_{right}, which are mentioned in the paper. I am really not sure how to calculate them.

Again, thanks so much for your time and all the wonderful work!