yeyupiaoling / paddlepaddle-deepspeech

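Speech recognition implemented with PaddlePaddle, targeting Chinese speech. The project is complete and recognition quality is good. Training and inference are supported on Windows and Linux, and inference is supported on Nvidia Jetson development boards.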

Home Page: https://yeyupiaoling.blog.csdn.net/article/details/102904306

License: Apache License 2.0

Python 93.21% JavaScript 3.60% HTML 2.50% CSS 0.68%

paddlepaddle deepspeech chinese asr deepspeech2 docker nvidia-docker speech-recognition speech-to-text deep-learning

paddlepaddle-deepspeech's Introduction

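DeepSpeech2 Speech Recognition

This project is developed from PaddlePaddle's DeepSpeech project, with substantial modifications that make it easier to train on custom Chinese datasets and easier to test and use. DeepSpeech2 is an end-to-end automatic speech recognition (ASR) engine implemented on PaddlePaddle; the underlying paper is "Baidu's Deep Speech 2 paper". The project also supports a range of data augmentation methods to suit different usage scenarios. Training and inference are supported on Windows and Linux, and inference is supported on development boards such as Nvidia Jetson. This branch is the new version; to use the old version, see the release/1.0 branch.

The dynamic-graph version is simpler to use and supports the Deepspeech2, Conformer, and Squeezeformer models: PPASR

Environment used by this project: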

Python 3.7
PaddlePaddle 2.2.0
Windows or Ubuntu

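Changelog

2021.11.26: Fixed a bug in beam search decoding.
2021.11.09: Added a script for preparing the WenetSpeech dataset.
2021.09.05: Added GUI-based recognition deployment.
2021.09.04: Released pretrained models for three public datasets.
2021.08.30: Added conversion of Chinese numerals to Arabic numerals; see the inference documentation for details.
2021.08.29: Completed the training and inference code and improved the related documentation.
2021.08.07: Added export of an inference model and inference with the exported model. Added long-audio recognition using the webrtcvad tool.
2021.08.06: Migrated most of the project's code to the new API introduced in PaddlePaddle 2.0.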
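
The 2021.08.30 entry refers to converting Chinese numerals in the recognized text to Arabic numerals. The sketch below illustrates the general idea for plain integers only. It is not the project's implementation; the function name chinese_to_int and the lookup tables are hypothetical, and nested big units such as 万亿 are deliberately not handled.

```python
# Illustrative only: converts simple Chinese integer numerals to int.
# Not the project's code; names and tables here are assumptions.
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL_UNITS = {"十": 10, "百": 100, "千": 1000}
BIG_UNITS = {"万": 10_000, "亿": 100_000_000}


def chinese_to_int(text: str) -> int:
    total = 0    # value of the completed 万/亿 sections
    section = 0  # value accumulated inside the current section (< 10,000)
    digit = 0    # most recent digit, still waiting for its unit
    for ch in text:
        if ch in DIGITS:
            digit = DIGITS[ch]
        elif ch in SMALL_UNITS:
            unit = SMALL_UNITS[ch]
            if digit == 0 and unit == 10:
                digit = 1  # a bare "十" means 10, as in "十五" = 15
            section += digit * unit
            digit = 0
        elif ch in BIG_UNITS:
            total += (section + digit) * BIG_UNITS[ch]
            section = 0
            digit = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + section + digit


print(chinese_to_int("一百二十三"))
```

A production converter also needs to handle decimals, dates, and numerals embedded in longer sentences, which this sketch leaves out.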

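Model Downloads

Dataset	Conv layers	RNN layers	RNN size	Test-set character error rate	Download
aishell (179 hours)	2	3	1024	0.084532	Click to download
free_st_chinese_mandarin_corpus (109 hours)	2	3	1024	0.170260	Click to download
thchs_30 (34 hours)	2	3	1024	0.026838	Click to download

Note: these downloads are training parameters. To use them for inference, you must first export the model. The decoding method used was beam search.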
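
The character error rate in the table is commonly defined as the character-level edit distance between the reference transcript and the recognized text, divided by the reference length. The sketch below follows that common definition; whether the project's evaluation script normalizes text in exactly this way is an assumption, and the function names are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def char_error_rate(ref: str, hyp: str) -> float:
    """Edit distance divided by the number of reference characters."""
    if not ref:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# One substituted character out of six reference characters.
print(char_error_rate("今天天气很好", "今天天汽很好"))
```

Under this definition, a value such as 0.084532 corresponds to roughly 8.5 character errors per 100 reference characters.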

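If you run into problems, feel free to open an issue.

Documentation and Tutorials

Quick Inference

Download a model provided by the author or train your own, then export the model. Use infer_path.py to run recognition on an audio file, passing the path of the audio to recognize with the --wav_path argument. See the model deployment documentation for details.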

python infer_path.py --wav_path=./dataset/test.wav

Output:

-----------  Configuration Arguments -----------
alpha: 1.2
beam_size: 10
beta: 0.35
cutoff_prob: 1.0
cutoff_top_n: 40
decoding_method: ctc_greedy
enable_mkldnn: False
is_long_audio: False
lang_model_path: ./lm/zh_giga.no_cna_cmn.prune01244.klm
mean_std_path: ./dataset/mean_std.npz
model_dir: ./models/infer/
to_an: True
use_gpu: True
use_tensorrt: False
vocab_path: ./dataset/zh_vocab.txt
wav_path: ./dataset/test.wav
------------------------------------------------
消耗时间：132, 识别结果: 近几年不但我用书给女儿儿压岁也劝说亲朋不要给女儿压岁钱而改送压岁书, 得分: 94
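
The configuration printed above sets decoding_method to ctc_greedy. As a rough illustration of what greedy CTC decoding does, the sketch below takes the most likely label per frame, collapses consecutive repeats, and drops the blank label. It is a minimal sketch, not the project's decoder; the function name, the toy vocabulary, and the position of the blank index are all assumptions.

```python
from itertools import groupby


def ctc_greedy_decode(frame_probs, vocab, blank_index):
    """Greedy CTC decoding over a list of per-frame probability lists."""
    # 1. Pick the most probable label index for every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_probs]
    # 2. Collapse runs of the same label into a single label.
    collapsed = [index for index, _ in groupby(best_path)]
    # 3. Remove blanks and map the remaining indices to characters.
    return "".join(vocab[index] for index in collapsed if index != blank_index)


# Toy example: two characters plus a blank label at index 2 (assumed layout).
vocab = ["你", "好"]
probs = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
]
print(ctc_greedy_decode(probs, vocab, blank_index=2))
```

Beam search decoding, which the pretrained-model results use, keeps several candidate paths and can combine them with a language model score instead of committing to the single best label per frame.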

Long-Audio Inference

python infer_path.py --wav_path=./dataset/test_vad.wav --is_long_audio=True

Web Deployment

GUI Deployment

Support the Author

Donate one yuan to support the author


paddlepaddle-deepspeech's Issues

Compiling ctc_decoders separately

Does compiling ctc_decoders separately mean running the setup.sh file in the decoder folder?

infer_path

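Hello, I would like to run ASR inference on my own audio directly with the model you provide (https://deepspeech.bj.bcebos.com/zh_lm/zh_giga.no_cna_cmn.prune01244.klm).
Which dataset's mean_std.npz and zh_vocab.txt should I use?
Do you have these two files already prepared? If so, I would appreciate it if you could share them.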

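Multi-GPU speed problem

Hello, during training I found that one batch takes longer on two GPUs than on one. A single GPU takes 3 minutes per batch, while two GPUs take 8 minutes per batch.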

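Setting up the environment offline

Our company's offline server cannot run setup.sh in the decoder folder directly. How should I install ctc_decoders?
Running the main program currently fails with:
modulenotfounderror: no module named'ctc_decoders'
Thank you very much!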

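How do I resume training from a previously trained model?

Audio longer than 15 s

All the audio in my dataset is longer than 15 s. With batch_size set to 16 I run out of GPU memory, so I set batch_size=8 and then ran into loss=nan. Is there a way to split long audio? Right now training is nearly impossible.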
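
One simple way to shorten clips is to cut each WAV file into fixed-length pieces with Python's standard wave module. The sketch below is only an illustration under stated assumptions: the function name split_wav and the 10-second default are hypothetical, and fixed-length cuts can land in the middle of a word, so the transcripts have to be re-aligned to the pieces before they can be used for training. Splitting at silences (for example with webrtcvad, which the project uses for long-audio recognition) avoids cutting through speech.

```python
import wave
from pathlib import Path


def split_wav(src_path, out_dir, chunk_seconds=10.0):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds."""
    src_path = Path(src_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        if frames_per_chunk <= 0:
            raise ValueError("chunk_seconds is too small")
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            chunk_path = out_dir / f"{src_path.stem}_{index:04d}.wav"
            with wave.open(str(chunk_path), "wb") as dst:
                # Same channels, sample width and rate as the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

For example, split_wav("long.wav", "chunks", chunk_seconds=10.0) would write chunks/long_0000.wav, chunks/long_0001.wav, and so on, where long.wav is a placeholder file name.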

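Recognition seems much worse when the test audio contains noise such as background music. Is there a way to fix this?

Does the code include any preprocessing such as vocal extraction, or can you recommend a similar method? Thanks.

Can this project be deployed on Windows?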

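Question about the inference interface

Hello,
1. Does all input audio need to be converted to wav before being fed to the interface? Which audio formats does the interface currently support?
2. Does the interface include any logic for converting the sampling rate, or must the sampling rate be adjusted before the audio is passed to the interface?
Thanks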
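
Whatever formats the interface accepts, it can help to inspect a WAV file's channel count and sampling rate before sending it. The sketch below uses only Python's standard wave module. The 16000 Hz mono target is an assumption chosen for illustration, not a documented requirement of this project, and the function names are hypothetical.

```python
import wave


def describe_wav(path):
    """Return basic format information for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_rate": sample_rate,
            "sample_width_bytes": wav_file.getsampwidth(),
            "duration_seconds": wav_file.getnframes() / sample_rate,
        }


def matches_expected(info, sample_rate=16000, channels=1):
    """Check a describe_wav() result against an assumed target format."""
    return info["sample_rate"] == sample_rate and info["channels"] == channels
```

A file that fails the check would need to be resampled or downmixed with an external tool before recognition, unless the interface turns out to handle that itself.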

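Environment problem

Hello, I followed the instructions in the readme, but running the test produces an error.

Is the online demo site not enabled?

I uploaded an audio file to see how well it works, and it returned {'error': 1001, 'msg': '这是测试！'} (the message means "This is a test!").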

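Inquiry about Chinese speech recognition, voiceprint recognition, and acoustic scene classification

Hello, and sorry to bother you. I want to build a project with three functions: Chinese speech recognition, speaker recognition, and acoustic scene classification. I have found some open-source libraries, but my background is in general software development and I have not worked with speech before. I would like to ask a few questions:
1. I used the official paddlepaddle-DeepSpeech open-source code to build an online Chinese speech recognition system with the baidu1.2k model, but the recognition quality is not very good. How well does your project perform with the baidu1.2k model and with self-trained models? Alternatively, could you recommend a Chinese speech recognition project with good results whose code and models are both open source?
2. I also found an open-source library for voiceprint recognition and got it running, but I have not yet tested it on a dataset. How well does your PaddlePaddle-based voiceprint recognition project perform?
3. I looked at your other two acoustic scene classification projects, built on PaddlePaddle and TensorFlow with urbansound8k. How well do they perform?

Sorry again for the trouble. I would be very grateful if you could spare some time to help. Thank you.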

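Model loading problem

As shown in the screenshot, it is stuck at this step.

Inference error

Hello, when I run infer_path it prints Warning: The pretrained params do not exist. and then the program exits.
The parameters are as follows:

How can I fix this?

Chinese encoding problem in python2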

UnicodeEncodeError: 'ascii' codec can't encode characters in position 24-31: ordinal not in range(128)

Duration

What is the maximum duration per audio clip that this model supports for training and inference?

Docker image is very large; are there other ways to pull the container?

The docker pull download is very large. Is there another way to download the Docker container?

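Comparison of the AISHELL model and the 1300-hour model

Hello, I downloaded both the AISHELL model and the 1300-hour model. On a Mandarin clip that I downloaded, the AISHELL model works better; on data from the data_thchs30 dataset, the 1300-hour model is somewhat better. Did you see the same in your tests?

I only want to run tests with the author's trained model. Can I do that without installing docker, and which dependencies do I need to install?

I want to use it as a tool that takes audio and outputs text.

Cannot open the microphone. Exception info: 18

When I visit localhost:5002 and click the record button, I get the error "Cannot open the microphone. Exception info: 18". What is the cause?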

Error when running sudo sh setup.sh

Running setup.py install for llvmlite ... error

RuntimeError: llvm-config failed executing, please point LLVM_CONFIG to the path for llvm-config
Could someone advise how to resolve this error?
Do I need to install the LLVM tool?

FileNotFoundError: [Errno 2] No such file or directory: './dataset/mean_std.npz'

Running python3 eval.py --model_path=./models/step_final/ fails with: FileNotFoundError: [Errno 2] No such file or directory: './dataset/mean_std.npz'
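
Errors like this one mean that a file the script expects is not present at the given relative path. A small pre-flight check can list everything that is missing before a long run starts. The sketch below is illustrative: the function name missing_files is hypothetical, and the path list is taken only from paths that appear in this page's configuration output and error messages, so it may not be complete for every script.

```python
from pathlib import Path

# Paths that appear in the configuration output and error reports on this page.
REQUIRED_FILES = [
    "./dataset/mean_std.npz",
    "./dataset/zh_vocab.txt",
    "./dataset/manifest.test",
]


def missing_files(paths, base_dir="."):
    """Return the subset of paths that do not exist as files under base_dir."""
    base = Path(base_dir)
    return [path for path in paths if not (base / path).is_file()]


if __name__ == "__main__":
    for path in missing_files(REQUIRED_FILES):
        print(f"missing: {path}")
```

If files are reported missing, the project's data preparation documentation is the place to check how they are generated.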

Meaning of export FLAGS_sync_nccl_allreduce=0

export FLAGS_sync_nccl_allreduce=0
Should this line be run on the command line before training? What does it do?
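
In a shell, export sets an environment variable that is inherited by every process started from that shell afterwards, so it has to run before the training command. The sketch below shows the same mechanism from Python: it builds an environment with the variable set, which could then be passed to a child process. The helper name env_with_flag is hypothetical, and the commented launch line assumes a train.py in the current directory. What the flag changes inside PaddlePaddle is not covered here.

```python
import os


def env_with_flag(name, value, base_env=None):
    """Return a copy of the environment with one extra variable set."""
    env = dict(os.environ if base_env is None else base_env)
    env[name] = str(value)
    return env


train_env = env_with_flag("FLAGS_sync_nccl_allreduce", 0)

# To launch training with this environment (assumes train.py exists):
# import subprocess, sys
# subprocess.run([sys.executable, "train.py"], env=train_env, check=True)
```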

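Question about deploying on Windows 10

Could you explain how to use this on Windows 10? Thank you very much!

Channels, sample rate, and models

1. Do the number of channels and the sample rate have a large effect on recognition results? What values give the best recognition?
2. Do you have models trained on larger datasets? The currently released models still do not perform well enough on real conversational recordings.
Thank you very much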

About the dataset

Hello, is the 1300-hour dataset a public dataset, and what does it include?

Can I train on Windows?

Ubuntu can be installed on Windows, but nvidia-docker cannot be used.

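The very large pretrained model does not match the official vocab.

Hello, the official vocab and mean_std do not match this pretrained model. Could you provide a matching vocab and mean_std?

Model question

Which works better, the officially provided model or a self-trained one? (Downloads from Weiyun are very slow...)

docker environment problem

Hello, I followed your tutorial for configuring docker, but ran into: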

E1210 12:48:08.426810 4204 pybind.cc:1261] Cannot use GPU because there is no GPU detected on your machine.

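Training works on a 3070 GPU, but inference fails with the following problem

Since the 30 series only supports CUDA 11.0 and above, I replaced CUDA-10.0 and cudnn7.3 with cuda11.3 and cudnn8.2,
and also switched paddlepaddle to version 2.1.0.
In models.py
I added
import paddle
paddle.enable_static()

python3 train.py trains normally and saves the model normally.

But after adding
import paddle
paddle.enable_static()
to infer_path.py, the following error occurs: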

python3 infer_path.py --model_path=./models/epoch_0/ --wav_path=./dataset/test.wav

grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
/usr/local/lib/python3.7/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
W0520 03:02:36.909706 16902 device_context.cc:404] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.3, Runtime API Version: 11.2
W0520 03:02:36.911597 16902 device_context.cc:422] device: 0, cuDNN Version: 8.2
> 成功加载了预训练模型：./models/epoch_0/
[INFO 2021-05-20 03:02:40,163 model.py:524] begin to initialize the external scorer for decoding
terminate called after throwing an instance of 'lm::FormatLoadException'
  what():  kenlm/lm/vocab.cc:43 in void lm::ngram::{anonymous}::ReadWords(int, lm::EnumerateVocab*, lm::WordIndex, uint64_t) threw FormatLoadException because `memcmp(check_unk, "<unk>", 6)'.
Vocabulary words are in the wrong place.  This could be because the binary file was built with stale gcc and old kenlm.  Stale gcc, including the gcc distributed with RedHat and OS X, has a bug that ignores pragma pack for template-dependent types.  New kenlm works around this, so you'll save memory but have to rebuild any binary files using the probing data structure.

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   Scorer::Scorer(double, double, std::string const&, std::vector<std::string, std::allocator<std::string > > const&)
1   Scorer::setup(std::string const&, std::vector<std::string, std::allocator<std::string > > const&)
2   Scorer::load_lm(std::string const&)
3   lm::ngram::LoadVirtual(char const*, lm::ngram::Config const&, lm::ngram::ModelType)
4   lm::ngram::detail::GenericModel<lm::ngram::detail::HashedSearch<lm::ngram::BackoffValue>, lm::ngram::ProbingVocabulary>::GenericModel(char const*, lm::ngram::Config const&)
5   lm::ngram::ProbingVocabulary::LoadedBinary(bool, int, lm::EnumerateVocab*, unsigned long)
6   paddle::framework::SignalHandle(char const*, int)
7   paddle::platform::GetCurrentTraceBackString[abi:cxx11]()

----------------------
Error Message Summary:
----------------------
FatalError: `Process abort signal` is detected by the operating system.
  [TimeInfo: *** Aborted at 1621479760 (unix time) try "date -d @1621479760" if you are using GNU date ***]
  [SignalInfo: *** SIGABRT (@0x4206) received by PID 16902 (TID 0x7f254b481700) from PID 16902 ***]

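Hyperparameter settings

@yeyupiaoling Hello, I am running your code with the "MAGICDATA Mandarin Chinese Read Speech Corpus" dataset; the training set has about 570,000 utterances. My machine has a single RTX2080Ti and my batch_size is 16. How should I set the corresponding learning rate? Training is too slow for me to tune it step by step myself. Thanks.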

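Error with GPU inference

Version and environment information:
1) PaddlePaddle version: Paddlepaddle2.0.2-gpu (installed directly from the officially built whl)
2) System environment: NVIDIA Jetson AGX Xavier JetPack4.3 Ubuntu18.04, Python3.6.9, cuda10.0, cudnn7.6
Hello, when I set 'use_gpu' to True, the following error occurs:

Changing it to False runs normally, but it is much slower.
Have you encountered this problem?

Error when running setup.sh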

Could not find a version that satisfies the requirement numba==0.52.0 (from versions: 0.1, 0.2, 0.3, 0.5.0, 0.6.0, 0.7.0, 0.7.1, 0.7.2, 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.12.1, 0.12.2, 0.13.0, 0.13.2, 0.13.3, 0.13.4, 0.14.0, 0.15.1, 0.16.0, 0.17.0, 0.18.1, 0.18.2, 0.19.1, 0.19.2, 0.20.0, 0.21.0, 0.22.0, 0.22.1, 0.23.0, 0.23.1, 0.24.0, 0.25.0, 0.26.0, 0.27.0, 0.28.1, 0.29.0, 0.30.0, 0.30.1, 0.31.0, 0.32.0, 0.33.0, 0.34.0, 0.35.0, 0.36.1, 0.36.2, 0.37.0, 0.38.0, 0.38.1, 0.39.0, 0.40.0, 0.40.1, 0.41.0, 0.42.0, 0.42.1, 0.43.0, 0.43.1, 0.44.0, 0.44.1, 0.45.0, 0.45.1, 0.46.0, 0.47.0)
No matching distribution found for numba==0.52.0

Do other numba versions affect running the project?

ValueError: Failed to parse the augmentation config json: [Errno 2] No such file or directory: 'dataset/manifest.noise'

ValueError: Failed to parse the augmentation config json: [Errno 2] No such file or directory: 'dataset/manifest.noise'. How do I fix this?

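How to use the English model provided by Baidu in this project for English recognition

I downloaded the English model provided by Baidu, but simply replacing the relevant paths in your project does not seem to work. I would appreciate some guidance.

Complexity problem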

Error Message Summary:

InvalidArgumentError: Cannot parse tensor desc
[Hint: Expected desc.ParseFromArray(buf.get(), size) == true, but received desc.ParseFromArray(buf.get(), size):0 != true:1.] at (/paddle/paddle/fluid/framework/tensor_util.cu:527)
[operator < load_combine > error]

FileNotFoundError: [Errno 2] No such file or directory: './dataset/manifest.test'

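How do I fix this?

What should I do about an error when loading my own previously pretrained model?

Hello, developer

Hello, I work in operations for Baidu PaddlePaddle. I looked at your project and think it is excellent, and I would like to get in touch. Could you add me on WeChat (paddlehelp) with the note "飞桨开发者" (PaddlePaddle developer)?
Looking forward to your reply.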

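Pretrained models

Does the author (夜雨飘零) have a pretrained model? I tried Paddle's official pretrained model and the results were not ideal.

Hello, what did you change to fix the OOM problem?

GPU memory keeps climbing until it eventually fails.

Error when using /models/step_final/ as the pretrained model.

Can the dataloader load data with multiple processes?

Can a num_worker option be added for multi-process data loading, as in PPASR?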

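Multi-GPU environment

Hello, after pulling the image and finishing the installation following your tutorial, I ran
import paddle
paddle.fluid.install_check.run_check()
and got an error. The log is as follows: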
///////////////////////////////////////////////////////
Running Verify Fluid Program ...
W0318 01:57:14.200362 29 device_context.cc:252] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 11.0, Runtime API Version: 10.0
W0318 01:57:14.200688 29 device_context.cc:260] device: 0, cuDNN Version: 7.6.
Your Paddle Fluid works well on SINGLE GPU or CPU.
/usr/local/python3.5.1/lib/python3.5/site-packages/paddle/fluid/executor.py:1070: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
WARNING:root:Your Paddle Fluid has some problem with multiple GPU. This may be caused by:

There is only 1 or 0 GPU visible on your Device;
No.1 or No.2 GPU or both of them are occupied now
Wrong installation of NVIDIA-NCCL2, please follow instruction on https://github.com/NVIDIA/nccl-tests
to test your NCCL, or reinstall it following https://docs.nvidia.com/deeplearning/sdk/nccl-install-guide/index.html
Original Error is:

C++ Call Stacks (More useful to developers):
0 std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2 paddle::platform::NCCLContextMap::NCCLContextMap(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, ncclUniqueId*, unsigned long, unsigned long)
3 paddle::framework::ParallelExecutor::ParallelExecutor(std::vector<paddle::platform::Place, std::allocator<paddle::platform::Place> > const&, std::vector<std::string, std::allocator<std::string> > const&, std::string const&, paddle::framework::Scope*, std::vector<paddle::framework::Scope*, std::allocator<paddle::framework::Scope*> > const&, paddle::framework::details::ExecutionStrategy const&, paddle::framework::details::BuildStrategy const&, paddle::framework::ir::Graph*)

Error Message Summary:
ExternalError: Nccl error, unhandled system error at (/paddle/paddle/fluid/platform/nccl_helper.h:114)

Your Paddle Fluid is installed successfully ONLY for SINGLE GPU or CPU!
Let's start deep Learning with Paddle Fluid now
////////////////////////////////////////////////////////////////////
I plan to train on multiple GPUs but can currently only use one. Did you use multiple GPUs when you trained?