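Hello, I see that the demo trains on audio shorter than 3 seconds. If I want to train on 1-second audio and end up with a classification model for 1-second clips, what else do I need to change besides the place where audio shorter than 3 s is filtered?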

How to train on data shorter than 3 seconds? about audioclassification-tensorflow HOT 28 CLOSED

yeyupiaoling commented on August 26, 2024

How to train on data shorter than 3 seconds?

from audioclassification-tensorflow.

Comments (28)

yeyupiaoling commented on August 26, 2024

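@Eleven-is-cool I have updated the code so that other audio lengths can be used as input more easily. Take a look at the following places; I have added comments at each of them: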
https://github.com/yeyupiaoling/AudioClassification_Tensorflow/blob/b719ad472b6b4fbca2455071516cdae0231ba361/create_data.py#L46

https://github.com/yeyupiaoling/AudioClassification_Tensorflow/blob/b719ad472b6b4fbca2455071516cdae0231ba361/create_data.py#L61

https://github.com/yeyupiaoling/AudioClassification_Tensorflow/blob/b719ad472b6b4fbca2455071516cdae0231ba361/create_data.py#L83

https://github.com/yeyupiaoling/AudioClassification_Tensorflow/blob/b719ad472b6b4fbca2455071516cdae0231ba361/reader.py#L7

https://github.com/yeyupiaoling/AudioClassification_Tensorflow/blob/b719ad472b6b4fbca2455071516cdae0231ba361/train.py#L27

https://github.com/yeyupiaoling/AudioClassification_Tensorflow/blob/b719ad472b6b4fbca2455071516cdae0231ba361/train.py#L51

Eleven-is-cool commented on August 26, 2024

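Great, thank you so much! I have another question: what is the stopping condition for training? I trained on the UrbanSound8K dataset for about two hours. The accuracy was very low and did not change at all during those two hours, and training never stopped. What could be the cause?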

yeyupiaoling commented on August 26, 2024

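@Eleven-is-cool What accuracy are you getting? The test accuracy should be above 80%, and the training accuracy even reaches over 98%.
The number of training epochs is set here: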
https://github.com/yeyupiaoling/AudioClassification_Tensorflow/blob/e66d79f93e2053d188b24abebef0e2463952a4d9/train.py#L6

Eleven-is-cool commented on August 26, 2024

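Thank you again for answering my question! It turns out I had changed the number of classes earlier and never changed it back; after rerunning, it works fine. I now want to try binary classification, with the goal of detecting whether someone is speaking. I only changed the number of classes in your code. Is there anything else I need to do? With only that change, the results are not good.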

yeyupiaoling commented on August 26, 2024

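The goal is to detect whether someone is speaking

What exactly do you need: to distinguish speech from other sounds, or to distinguish speech from silence? If it is the latter, you can refer to: https://blog.doiduoyi.com/articles/2020/04/16/1587006578892.html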

Eleven-is-cool commented on August 26, 2024

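Thank you for the reference! I want to train a binary classification model based on your demo. My dataset currently consists of human voice and other, non-voice sounds. Is it the case that simply changing the number of classes in your demo cannot give good results? Or what else should I do?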

yeyupiaoling commented on August 26, 2024

@Eleven-is-cool That should work. Could it be that your dataset is too small, or that each audio clip is too short?

Eleven-is-cool commented on August 26, 2024

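Hello, the dataset is 13 GB in total. Since my final prediction is made on 1-second audio, does it not matter if the clips in the dataset are not 1 second long?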

yeyupiaoling commented on August 26, 2024

@Eleven-is-cool One second is too short. Try 2 or 3 seconds.

yeyupiaoling commented on August 26, 2024

@Eleven-is-cool Are you using the latest code?

Eleven-is-cool commented on August 26, 2024

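Yes, but all the clips in my dataset are 1 second long. If the training clips are longer than 1 s, does that not affect the results when predicting on 1-second audio?
My thinking is that since every prediction is made on 1 s of audio, my training data should also be 1 s long. Is this reasoning wrong?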

yeyupiaoling commented on August 26, 2024

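@Eleven-is-cool If all your clips are 1 second long, then you can only set it to 1 second. If you are using the latest code, silent segments are trimmed when the dataset is created, so there should not be much noise. Have you listened to the data yourself to check whether anything is wrong with it?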

Eleven-is-cool commented on August 26, 2024

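Hello, the dataset sounds fine. I only trimmed silence for the voice data; the non-voice data was not trimmed. I found that my dataset contains a mix of mono and stereo WAV files, while everything I predict on is stereo WAV. Do the mono and stereo files all need to be converted to the same number of channels?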

yeyupiaoling commented on August 26, 2024

@Eleven-is-cool That does not matter much. Beyond that, I cannot find what the problem is. One second is also just too short.

Eleven-is-cool commented on August 26, 2024

https://github.com/yeyupiaoling/AudioClassification_Tensorflow/blob/b719ad472b6b4fbca2455071516cdae0231ba361/create_data.py#L61

https://github.com/yeyupiaoling/AudioClassification_Tensorflow/blob/b719ad472b6b4fbca2455071516cdae0231ba361/reader.py#L7

https://github.com/yeyupiaoling/AudioClassification_Tensorflow/blob/b719ad472b6b4fbca2455071516cdae0231ba361/train.py#L27

https://github.com/yeyupiaoling/AudioClassification_Tensorflow/blob/b719ad472b6b4fbca2455071516cdae0231ba361/train.py#L51

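Hello, thank you very much for taking the time to answer my questions. One more question: if all my audio is 1 s long, how exactly should the parameters above be changed?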

yeyupiaoling commented on August 26, 2024

@Eleven-is-cool Look at that code; it is all commented. Make sure to look at the latest code; this is the old version.

Eleven-is-cool commented on August 26, 2024

OK. What I want to ask is how the shape relates to the audio duration. For example, if my audio is 1 s long, what should the shape be?

yeyupiaoling commented on August 26, 2024

@Eleven-is-cool You have to print it to find out.
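For readers with the same question about shape and duration, the number of time frames in a spectrogram can be estimated from the clip length before printing anything. The sketch below assumes a librosa-style centered STFT, for which the frame count is `1 + n_samples // hop_length`. The sample rate, hop length, and `n_mels` defaults are illustrative assumptions, not values confirmed in this thread, and `spectrogram_shape` is a hypothetical helper, so printing the real feature shape as suggested above remains the authoritative check.

```python
def spectrogram_shape(duration_s, sample_rate=16000, hop_length=512, n_mels=128):
    """Estimate (n_mels, n_frames) for a clip of the given duration.

    Assumes a centered STFT (librosa's default), where the number of
    frames is 1 + n_samples // hop_length. All defaults are assumptions.
    """
    n_samples = int(round(duration_s * sample_rate))
    n_frames = 1 + n_samples // hop_length
    return (n_mels, n_frames)


# Compare the estimated shapes for a few clip lengths.
for seconds in (1, 2, 3):
    print(seconds, spectrogram_shape(seconds))
```

Only the time axis grows with duration; the frequency axis is fixed by the feature settings, so a change of clip length has to be matched by the same change in the input shape used by the reader and the model.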

Eleven-is-cool commented on August 26, 2024

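OK, I did not read carefully at first. Following your pointers I changed those parameters: the shape for 1 s is 123, 63, and I also set class_dim = 2. But the accuracy still does not change, and it is very low.

I built the simplest sequential model with Keras and it reaches about 98% accuracy, so that should show the dataset is fine, right?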

yeyupiaoling commented on August 26, 2024

@Eleven-is-cool Then try modifying the network model; mine is not necessarily the best.

Eleven-is-cool commented on August 26, 2024

OK, thanks again!

StearArre commented on August 26, 2024

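@Eleven-is-cool Are you using the latest code?

Yes, but all the clips in my dataset are 1 second long. If the training clips are longer than 1 s, does that not affect the results when predicting on 1-second audio?
My thinking is that since every prediction is made on 1 s of audio, my training data should also be 1 s long. Is this reasoning wrong?

"If the data to be predicted is all 1 s long but the training data is all longer than 1 s, does that affect the results of predicting on 1 s audio?" Hello, may I ask what the conclusion was? How does the duration of the training data relate to the duration of the data being predicted?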

yeyupiaoling commented on August 26, 2024

@StearArre Longer audio naturally carries somewhat more useful features, so there will be some effect.

StearArre commented on August 26, 2024

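Got it, thanks for the reply.
I am training as described in the article. Three hours have passed and batch_id has only reached 5260. What value does it have to reach before training stops? Is it number of samples / 32 * 100?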

yeyupiaoling commented on August 26, 2024

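@StearArre Training runs for 100 epochs. How many batches there are per epoch depends on the size of your dataset. You can stop at any time, because the model is saved after every epoch.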

AudioClassification-Tensorflow/train.py

Line 6 in 3641796

EPOCHS = 100
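The relationship between epochs, batch size, and the batch counter can be worked out directly. This is a minimal sketch, assuming the batch counter keeps increasing across epochs, that the last partial batch of each epoch is kept, and a batch size of 32 (the figure used in the question above). The sample count is a made-up placeholder, and `total_batches` is a hypothetical helper; substitute the number of lines in your own train_list.txt.

```python
import math

EPOCHS = 100  # value quoted from train.py above


def total_batches(num_samples, batch_size=32, epochs=EPOCHS):
    """Total number of batches over the whole run.

    Assumes the final, possibly smaller, batch of each epoch is kept.
    """
    batches_per_epoch = math.ceil(num_samples / batch_size)
    return batches_per_epoch * epochs


num_train_samples = 8000  # hypothetical placeholder, not from this thread
print(total_batches(num_train_samples))
```

So the guess "samples / 32 * 100" is right up to rounding, under the assumptions stated.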

StearArre commented on August 26, 2024

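Thanks again~
I have also run into the situation where, after training for a while, the loss and accuracy never change: the loss stays at 14.439128 and the accuracy stays at 0.104167. Could you tell me what I am doing wrong?
My steps were:
1. I placed the audio dataset in a separate directory beforehand, with one folder per class, e.g. /dataset/UrbanSound8K_byclass/car_horn
2. I ran get_data_list('dataset/UrbanSound8K_byclass', 'dataset') from create_data.py
--->create_data_tfrecord('dataset/train_list.txt', 'dataset/train.tfrecord')
--->create_data_tfrecord('dataset/test_list.txt', 'dataset/test.tfrecord')
3. I ran train.py; an excerpt of the training output follows: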
Batch 5420, Loss 16.118095, Accuracy 0.000000
Batch 5440, Loss 16.118095, Accuracy 0.000000
Batch 5460, Loss 16.118095, Accuracy 0.000000
Batch 5480, Loss 16.118095, Accuracy 0.000000
Batch 5500, Loss 4.533215, Accuracy 0.718750
Batch 5520, Loss 12.088572, Accuracy 0.250000
Batch 5540, Loss 15.614405, Accuracy 0.031250
Batch 5560, Loss 15.110714, Accuracy 0.062500
Batch 5580, Loss 14.607023, Accuracy 0.093750
Batch 5600, Loss 15.110714, Accuracy 0.062500
Test, Loss 14.439128, Accuracy 0.104167
==================save ok=====================
Batch 5620, Loss 15.614405, Accuracy 0.031250
Batch 5640, Loss 15.614405, Accuracy 0.031250
Batch 5660, Loss 16.118095, Accuracy 0.000000
Batch 5680, Loss 15.614405, Accuracy 0.031250
Batch 5700, Loss 16.118095, Accuracy 0.000000
Batch 5720, Loss 16.118095, Accuracy 0.000000
Batch 5740, Loss 8.059048, Accuracy 0.500000
Batch 5760, Loss 10.577500, Accuracy 0.343750
Batch 5780, Loss 13.599644, Accuracy 0.156250
Batch 5800, Loss 14.103333, Accuracy 0.125000
Test, Loss 14.439128, Accuracy 0.104167
==================save ok=====================
Batch 5820, Loss 15.614405, Accuracy 0.031250
Batch 5840, Loss 15.614405, Accuracy 0.031250
Batch 5860, Loss 16.118095, Accuracy 0.000000
Batch 5880, Loss 16.118095, Accuracy 0.000000
Batch 5900, Loss 16.118095, Accuracy 0.000000
Batch 5920, Loss 16.118095, Accuracy 0.000000
Batch 5940, Loss 16.118095, Accuracy 0.000000
Batch 5960, Loss 16.118095, Accuracy 0.000000
Batch 5980, Loss 7.555357, Accuracy 0.531250
Batch 6000, Loss 11.584881, Accuracy 0.281250
Test, Loss 14.439128, Accuracy 0.104167
==================save ok=====================
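One way to read the numbers in this log: 16.118095 is almost exactly -ln(1e-7), which is what a cross-entropy loss reports when the probability assigned to the true class has been clipped to 1e-7. Under that assumption (the clipping constant is inferred from the log, not confirmed from train.py), each sample contributes either about 0 or about 16.118 to the loss, so the reported loss is roughly (1 - accuracy) * 16.118. That pattern would suggest the network is emitting saturated, near one-hot predictions rather than learning. The sketch below checks the arithmetic against values taken from the log; `loss_if_saturated` is a hypothetical helper.

```python
import math

EPS = 1e-7  # assumed clipping constant, inferred from the logged loss
SATURATED_LOSS = -math.log(EPS)  # loss of one confidently wrong sample


def loss_if_saturated(accuracy):
    """Mean loss if every wrong sample costs SATURATED_LOSS and every
    correct sample costs approximately zero."""
    return (1.0 - accuracy) * SATURATED_LOSS


# (accuracy, logged loss) pairs copied from the training output above.
logged = [
    (0.000000, 16.118095),
    (0.031250, 15.614405),
    (0.500000, 8.059048),
    (0.718750, 4.533215),
    (0.104167, 14.439128),  # the repeated test result
]
for accuracy, loss in logged:
    print(accuracy, loss, round(loss_if_saturated(accuracy), 6))
```

If the predicted and logged values agree, the loss carries no information beyond the accuracy, which is consistent with a model whose outputs have collapsed; the log alone does not show the cause.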

yeyupiaoling commented on August 26, 2024

@StearArre How many classes does your data have? The accuracy should not be this low.

StearArre commented on August 26, 2024

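The data is just the 10 classes of the UrbanSound8K dataset; I am replicating your whole workflow.
I am puzzled too.
My Python version is 3.6.7.
Could errors in test_list.txt and train_list.txt be producing bad tfrecord files?
An excerpt of test_list.txt: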

dataset/UrbanSound8K_byclass/engine_idling/176787-5-0-23.wav 3
dataset/UrbanSound8K_byclass/engine_idling/176638-5-0-1.wav 3
dataset/UrbanSound8K_byclass/engine_idling/146186-5-0-6.wav 3
dataset/UrbanSound8K_byclass/engine_idling/195451-5-0-9.wav 3
dataset/UrbanSound8K_byclass/car_horn/180156-1-0-0.wav 4
dataset/UrbanSound8K_byclass/car_horn/54187-1-0-2.wav 4
dataset/UrbanSound8K_byclass/street_music/132162-9-1-58.wav 5
dataset/UrbanSound8K_byclass/street_music/66324-9-0-53.wav 5
dataset/UrbanSound8K_byclass/street_music/21684-9-0-39.wav 5
dataset/UrbanSound8K_byclass/street_music/105425-9-0-10.wav 5
dataset/UrbanSound8K_byclass/street_music/42955-9-0-14.wav 5
dataset/UrbanSound8K_byclass/street_music/113601-9-0-34.wav 5
dataset/UrbanSound8K_byclass/street_music/155044-9-0-15.wav 5
dataset/UrbanSound8K_byclass/street_music/77901-9-0-4.wav 5
dataset/UrbanSound8K_byclass/street_music/108638-9-0-0.wav 5
dataset/UrbanSound8K_byclass/street_music/39967-9-0-99.wav 5
dataset/UrbanSound8K_byclass/dog_bark/34872-3-0-1.wav 6
dataset/UrbanSound8K_byclass/dog_bark/72724-3-2-8.wav 6
dataset/UrbanSound8K_byclass/dog_bark/63292-3-0-0.wav 6
dataset/UrbanSound8K_byclass/dog_bark/76566-3-0-3.wav 6
dataset/UrbanSound8K_byclass/dog_bark/184575-3-0-1.wav 6
dataset/UrbanSound8K_byclass/dog_bark/118101-3-0-0.wav 6
dataset/UrbanSound8K_byclass/dog_bark/179386-3-0-1.wav 6
dataset/UrbanSound8K_byclass/dog_bark/19496-3-1-0.wav 6
dataset/UrbanSound8K_byclass/siren/107357-8-1-14.wav 7
dataset/UrbanSound8K_byclass/siren/24347-8-0-93.wav 7
dataset/UrbanSound8K_byclass/siren/159755-8-0-0.wav 7
dataset/UrbanSound8K_byclass/siren/71177-8-1-3.wav 7
dataset/UrbanSound8K_byclass/siren/159738-8-0-4.wav 7
dataset/UrbanSound8K_byclass/siren/102871-8-0-7.wav 7
dataset/UrbanSound8K_byclass/siren/164782-8-0-7.wav 7
dataset/UrbanSound8K_byclass/siren/22601-8-0-44.wav 7
dataset/UrbanSound8K_byclass/siren/106905-8-0-3.wav 7
dataset/UrbanSound8K_byclass/gun_shot/135527-6-14-5.wav 8
dataset/UrbanSound8K_byclass/drilling/169466-4-0-12.wav 9
dataset/UrbanSound8K_byclass/drilling/66622-4-0-1.wav 9
dataset/UrbanSound8K_byclass/drilling/168037-4-6-0.wav 9
dataset/UrbanSound8K_byclass/drilling/166931-4-2-13.wav 9
dataset/UrbanSound8K_byclass/drilling/161129-4-0-16.wav 9
dataset/UrbanSound8K_byclass/drilling/151005-4-1-2.wav 9
dataset/UrbanSound8K_byclass/drilling/99192-4-0-26.wav 9
dataset/UrbanSound8K_byclass/drilling/50416-4-0-2.wav 9
dataset/UrbanSound8K_byclass/drilling/180134-4-2-0.wav 9
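To check whether a list file is a plausible cause, one option is to count the labels it contains and confirm they all fall within the range the model expects. This is a minimal sketch, assuming each line has the form `<path> <label>` separated by whitespace as in the excerpt above. `label_counts` and `labels_in_range` are hypothetical helper names, not part of create_data.py.

```python
from collections import Counter


def label_counts(lines):
    """Count how many entries each integer label has in a list file."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        _path, label = line.rsplit(None, 1)  # label is the last field
        counts[int(label)] += 1
    return counts


def labels_in_range(counts, class_dim):
    """True if every label is a valid class index in [0, class_dim)."""
    return all(0 <= label < class_dim for label in counts)


# A few lines copied from the excerpt above.
sample = [
    "dataset/UrbanSound8K_byclass/car_horn/180156-1-0-0.wav 4",
    "dataset/UrbanSound8K_byclass/car_horn/54187-1-0-2.wav 4",
    "dataset/UrbanSound8K_byclass/gun_shot/135527-6-14-5.wav 8",
]
counts = label_counts(sample)
print(sorted(counts.items()))
print(labels_in_range(counts, class_dim=10))
```

For a real file, pass the open file object instead of the sample list, for example `label_counts(open('dataset/test_list.txt'))`. A label outside the expected range, or a class with no entries at all, would be worth investigating before retraining.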
