yeyupiaoling / voiceprintrecognition-paddlepaddle


187.0 5.0 42.0 2.06 MB

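This project uses several advanced voiceprint recognition models, including EcapaTdnn, ResNetSE, ERes2Net and CAM++, and supports multiple data preprocessing methods such as MelSpectrogram, Spectrogram, MFCC and Fbank.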

License: Apache License 2.0

Python 100.00%

paddlepaddle voice-recognition arcface speaker-recognition ecapa-tdnn


voiceprintrecognition-paddlepaddle's Issues

ValueError: The ``path`` (models/ecapa_tdnn/model.pdparams) to load model not exists.

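When I run python3 infer_contrast.py --audio_path1=audio/a_1.wav --audio_path2=audio/b_2.wav, I get ValueError: The path (models/ecapa_tdnn/model.pdparams) to load model not exists. I also have not installed the PaddlePaddle Inference library, because the installation keeps reporting that it is not supported on the current platform, although everything else in the environment is installed. Is that the cause? Also, is the PaddlePaddle Inference library only available for Nvidia Jetson?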

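Removing silence from captured audio

I would like to improve recognition accuracy by removing silence from the captured audio. Are there any good methods for this?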
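
One common approach to the question above is energy-based trimming: split the signal into short frames, compute each frame's RMS level, and drop frames that fall below a threshold. The following is a minimal sketch with NumPy, assuming mono float samples in the range [-1, 1]. The function name remove_silence, the 20 ms frame length and the -40 dB threshold are illustrative assumptions, not part of this project, and the threshold would likely need tuning per recording setup.

```python
import numpy as np


def remove_silence(samples, sample_rate, frame_ms=20, threshold_db=-40.0):
    """Drop frames whose RMS level is below threshold_db (relative to full scale)."""
    samples = np.asarray(samples, dtype=np.float32)
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        # Input is shorter than one frame: return it unchanged.
        return samples
    # Trailing samples that do not fill a whole frame are discarded.
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Clamp to avoid log10(0) on digitally silent frames.
    level_db = 20.0 * np.log10(np.maximum(rms, 1e-10))
    return frames[level_db > threshold_db].reshape(-1)
```

Dropping frames this way removes pauses inside the utterance as well as leading and trailing silence, which may or may not be desirable for a given model.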

Hello, can this library be used on CPU in a CentOS environment?

As the title says. Thanks!

Slow data reading

After downloading the Chinese speech corpus dataset, I ran create_data.py and found that data reading is very slow.

Training with CPU

Which configs do I need to modify?

Is there a pretrained model available?

Is there a demo web page?

Crash when there is only one entry in audio_db

Hi,

Great job!

I encountered a problem when using it. When audio_db contains only one entry (only one wav file registered), an error occurs. Do you have any suggestions?

Select a function, 0 to register audio to the voiceprint database, 1 to run voiceprint recognition: 1
Traceback (most recent call last):
File "/usr/local/src/VoiceprintRecognition-PaddlePaddle/infer_recognition.py", line 51, in <module>
name = predictor.recognition(audio_data)
File "/usr/local/src/VoiceprintRecognition-PaddlePaddle/ppvector/predict.py", line 300, in recognition
name = self.__retrieval(np_feature=[feature])[0]
File "/usr/local/src/VoiceprintRecognition-PaddlePaddle/ppvector/predict.py", line 160, in __retrieval
similarity = cosine_similarity(self.audio_feature, feature.reshape(1, -1)).squeeze()
File "/data1/bigdata/miniconda3/envs/paddlespeech/lib/python3.9/site-packages/sklearn/metrics/pairwise.py", line 1393, in cosine_similarity
X, Y = check_pairwise_arrays(X, Y)
File "/data1/bigdata/miniconda3/envs/paddlespeech/lib/python3.9/site-packages/sklearn/metrics/pairwise.py", line 155, in check_pairwise_arrays
X = check_array(
File "/data1/bigdata/miniconda3/envs/paddlespeech/lib/python3.9/site-packages/sklearn/utils/validation.py", line 902, in check_array
raise ValueError(
ValueError: Expected 2D array, got 1D array instead:
array=[ -9.528404 20.831924 -7.2488213 -10.465841 3.648481
-27.02867 -21.46711 -7.2084546 -2.5436175 16.762877
-23.119123 -19.69215 19.306414 -29.709349 10.007728
9.711845 15.237292 -31.4283 13.326845 24.98609
-0.93558085 -13.328272 -1.4318435 -2.5589817 -14.899953
-10.004118 8.370364 -28.427952 -16.635942 17.125128
-21.187462 -13.563347 22.93637 0.2699321 -42.41188
-32.501728 -0.88023186 -17.82453 22.414608 -4.979337
-8.525855 23.49937 2.4326832 45.067253 -23.60708
-30.05538 1.6135166 -40.467884 6.419506 -22.83227
14.336002 -6.9231305 -2.9081142 -3.9221501 22.34546
15.799733 -31.135666 -11.1763735 -36.390778 20.186132
-1.3171989 3.5721273 -6.8223796 -0.87155807 0.6096292
-0.7906767 24.010586 -30.601904 42.77444 4.056578
8.387045 -32.088486 1.4989619 -16.874323 -14.909355
10.0754 26.727545 3.1605248 -7.187451 17.319191
-29.09326 -10.794649 16.416176 -39.16383 8.402718
-18.068346 -24.327047 -10.149664 32.352417 29.281029
23.015427 39.01204 -0.17142385 18.62544 -43.314125
-3.381555 -24.813742 -9.385822 -14.046436 -30.988276
-31.660748 -11.767179 12.45567 -14.2988205 -18.520817
-4.230826 7.557493 19.553474 30.747616 -22.761354
9.038652 30.985561 -43.875 -6.442013 -13.843025
-22.249443 -9.539998 -8.984698 -6.8686695 -10.849445
-3.2795053 -10.321569 20.782562 -33.36194 29.603746
5.8829403 -16.764643 33.195232 26.174906 -10.609048
-0.93014306 21.179016 3.711417 -30.199566 -3.0830624
-15.195986 3.977836 -22.489988 -32.214226 -38.9151
11.913775 5.5171957 -16.352848 -14.75191 -25.292871
12.144376 9.523496 31.895811 12.43035 25.481085
19.512949 2.1031296 28.158194 0.76135576 6.5883613
2.3754144 -9.671471 26.664793 15.48469 -17.29369
6.898872 11.445733 9.200221 -12.630247 -10.417504
0.6524699 -0.5872514 -9.257348 -18.316444 -13.710389
7.0560064 -28.609156 8.623157 -5.470796 -3.8638942
16.686884 -11.8199415 -30.546083 16.687752 9.732939
-27.76715 -24.231022 -3.1489227 -10.5073 -0.15581396
7.884112 -11.454353 -6.92933 20.55611 -7.311349
25.531921 -8.121537 ].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
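
The traceback shows cosine_similarity receiving a 1D array as its first argument, which suggests the stored features collapse to a single vector of shape (D,) when only one audio is registered. Below is a minimal sketch of one possible guard, using NumPy only: np.atleast_2d turns a lone vector into a (1, D) matrix before the comparison. The helper name cosine_scores is an assumption for illustration; this is not the project's actual code or fix.

```python
import numpy as np


def cosine_scores(db_features, query):
    """Cosine similarity between every stored feature and one query vector."""
    # Force shape (N, D), even when only one feature is stored (N == 1).
    db = np.atleast_2d(np.asarray(db_features, dtype=np.float64))
    # Flatten the query to shape (D,).
    q = np.asarray(query, dtype=np.float64).reshape(-1)
    norms = np.linalg.norm(db, axis=1) * np.linalg.norm(q)
    # Result always has shape (N,); the clamp avoids division by zero.
    return db @ q / np.maximum(norms, 1e-12)
```

Returning an array of shape (N,) in every case also avoids the 0-d result that .squeeze() produces when N is 1, so an argmax over the scores keeps working with a single registered speaker.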

Error when converting to ONNX

Have you tried converting your newly modified code to ONNX? It does not seem to work.
File "/data1/sda/lingck/VoiceRecogntion/VoiceprintRecognition-PaddlePaddle/modules/ecapa_tdnn.py", line 186, in forward
y_i = self.blocks[i - 1](x_i)
else:
y_i = self.blocks[i - 1](x_i + y_i)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
y.append(y_i)
y = paddle.concat(y, axis=1)

UnboundLocalError: local variable 'y_i' referenced before assignment
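
The error says y_i is read on a path where it has not yet been assigned. One way to rule that out is to assign y_i before the loop, so that no branch ever reads it uninitialized. The following is a pure-Python sketch of that restructuring, with plain callables standing in for the layers in self.blocks. It assumes the usual Res2Net pattern in which the first chunk passes through unchanged; that first branch is not visible in the excerpt above, and the function name res2net_forward is hypothetical. Whether this resolves the ONNX export in this project is untested.

```python
def res2net_forward(chunks, blocks):
    """Apply blocks[i - 1] to chunk i, feeding the previous output forward."""
    # Assumption: the first chunk passes through unchanged.
    y_i = chunks[0]
    outputs = [y_i]
    for i in range(1, len(chunks)):
        # From the third chunk on, add the previous output to the input.
        block_input = chunks[i] if i == 1 else chunks[i] + y_i
        y_i = blocks[i - 1](block_input)
        outputs.append(y_i)
    return outputs
```

With four chunks and three blocks this produces the same sequence as the if / elif / else form, but y_i is always bound before it is read.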

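Beginner question

I have a few hundred utterances labeled by speaker to use as a training set, and I plan to use the trained model to label the speakers in more audio later.

Should I train on and use only these few hundred utterances, or first train on the zhvoice data and then continue training on my own few hundred?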
