yuangongnd / psla
Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".
License: BSD 3-Clause "New" or "Revised" License
The file 'class_labels_indices.csv' is missing from the expected path 'egs/fsd50k/class_labels_indices.csv'.
To reproduce:
python psla/egs/fsd50k/prep_fsd.py >> No such file or directory: './class_labels_indices.csv'
python psla/src/label_enhancement/fix_type1.py >> No such file or directory: '../../egs/fsd50k/class_labels_indices.csv'
I'm following the step-by-step implementation of PSLA here on GitHub, but when I run python3 prep_fsd.py, it creates the folders FSD50K.dev_audio_16k and FSD50K.eval_audio_16k at the specified dataset path. However, it doesn't generate the converted audio files inside those folders. Any idea what might be happening?
P.S.: The terminal indicates that the samples were created, but they were not.
The data path is defined as fsd_path = './dataset/', and this is the folder structure:
dataset
|
|--FSD50K.dev_audio
|--FSD50K.doc
|--FSD50K.eval_audio
|--FSD50K.ground_truth
|--FSD50K.metadata
Hi,
I used your pretrained 5th percentile, but it seems it doesn't have a considerable effect, as you can see from the label histogram (Music and Musical instrument are still dominating). I wonder what would make FSD50K more balanced?
Does the provided JSON also include the balancing process described in the article?
Thanks
Classic labels (we split the 200 categories into 8 histograms for visibility):
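For context: in weighted-sampling pipelines, the label histogram of the file list itself usually stays imbalanced; balancing happens at sampling time via per-sample weights. A common recipe, and roughly what I understand the repo's weight-generation step to do (the exact formula is an assumption, check the code), is to weight each sample inversely to the frequency of its labels:

```python
from collections import Counter

def sample_weights(labels_per_sample):
    """labels_per_sample: one list of label strings per sample.
    Each sample's weight is the sum of 1/count over its labels, so
    samples carrying rare labels get larger weights than samples
    dominated by frequent classes like Music.
    NOTE: an illustrative formula, not necessarily the repo's exact one."""
    counts = Counter(l for labels in labels_per_sample for l in labels)
    return [sum(1.0 / counts[l] for l in labels) for labels in labels_per_sample]
```

With weights like these, the frequent classes stop dominating each training epoch even though the raw histogram of the dataset is unchanged.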
Hi,
I don't understand what you do with the weights in the CSV file that gen_weight_file creates.
How do you use them afterward?
Thanks ;)
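For what it's worth, per-sample weights like these are typically handed to a sampler that draws training examples with probability proportional to weight (in PyTorch, torch.utils.data.WeightedRandomSampler). The mechanism can be shown with the stdlib alone; this is a sketch of the idea, not the repo's actual code:

```python
import random

def draw_epoch(indices, weights, epoch_size, seed=0):
    """Sample indices with replacement, proportional to weight --
    the same idea as torch.utils.data.WeightedRandomSampler."""
    rng = random.Random(seed)
    return rng.choices(indices, weights=weights, k=epoch_size)
```

With weights [2.0, 1.0], index 0 is drawn about twice as often as index 1 over a long epoch, which is how rare-class samples get oversampled during training.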
psla/src/label_enhancement/fix_type2.py
Line 18 in 16be239
Great work!
Are you able to provide a quick-and-dirty inference script, like the one in the AST code base, for failsafe inference on AudioSet and ESC? That would be a great help.
Thanks
As per README.md, the link under "(Optional) Step 2. Enhance the label of the balanced AudioSet training set" doesn't exist:
[pretrained enhanced label set](https://github.com/YuanGongND/psla/blob/main/here)
Can anyone supply this file?
Hello!
I have a small doubt regarding the model parameters of EfficientNet-B2 with 4 attention heads. In the paper, 13.64M parameters are reported. However, in practice, after removing the final classification layers from EfficientNet and adding the multi-head attention module, I get 7.71M instead of 13.64M. As you can see in the following screenshot, EfficientNet-B2's parameter count drops to 7.7M immediately after removing the classification layer. On top of that, the multi-head module only has around 11,000 parameters, resulting in 7.71M.
Am I missing something? I am reporting the number of parameters of this model for my project, but I am a bit confused about it. Could you clarify this for me? :)
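For what it's worth, the 7.7M figure is consistent with simply dropping the stock 1000-way ImageNet classifier: EfficientNet-B2 is commonly cited at roughly 9.1M parameters, and a Linear(1408, 1000) head alone accounts for about 1.41M of them. Why the paper reports 13.64M is a separate question (it may count a wider classification branch; that part is speculation). The arithmetic:

```python
def linear_params(in_features, out_features):
    """Parameters of a fully connected layer: weight matrix plus bias."""
    return in_features * out_features + out_features

# EfficientNet-B2's final feature dimension is 1408; the stock
# ImageNet head maps it to 1000 classes.
head = linear_params(1408, 1000)       # 1,409,000 parameters
# ~9.11M total is the commonly cited figure for EfficientNet-B2
# (an approximation, not taken from this repo).
backbone = 9_110_000 - head            # ~7.70M left without the head
```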
Hello Yuan,
great work and thank you for making it available for other researchers.
I am currently testing deep learning models on my audio dataset to see which model performs better.
I saw you made available the pretrained EfficientNet B2 models with 4-headed attention. I was wondering if it would be possible to download other pretrained models too, e.g. EfficientNet B2 with Mean Pooling.
Thank you in advance,
Annalisa
Hi Yuan,
Thanks for open-sourcing this repo. I have a quick question about the MHA EfficientNet model you proposed. When I tried EfficientNet-B2 with the multi-head attention model, I found that some values in the out variable were larger than one, instead of between 0 and 1. Is that intentionally designed?
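One plausible explanation (my reading, worth confirming against the code) lies in how the per-frame sigmoid scores are aggregated: a weighted sum of values in [0, 1] stays in [0, 1] only if the attention weights are normalized to sum to 1; with unnormalized weights the output can exceed 1. A small numeric sketch of the two cases:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def aggregate(scores, attn, normalize=True):
    """Attention-weighted aggregation of per-frame sigmoid scores.
    With normalize=True the weights form a convex combination, so the
    result stays in [0, 1]; without it, the sum can exceed 1."""
    probs = [sigmoid(s) for s in scores]
    if normalize:
        total = sum(attn)
        attn = [a / total for a in attn]
    return sum(p * a for p, a in zip(probs, attn))
```

For example, `aggregate([2.0, 3.0], [0.9, 0.8], normalize=False)` exceeds 1 because the weights sum to 1.7, while the normalized version cannot.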
Many Thanks
Hi,
I couldn't find, in any of your recent publications on AudioSet, how you split the unbalanced (or even balanced) train segments into train and validation sets for hyperparameter tuning. I would like to replicate your results. Also, the Dropbox link for the PSLA experiments you have listed is down.
On another note, regarding FSD50K: could you elaborate on what those "forbidden" classes are and why they are excluded? Could you also explain the purpose of this comment in prep_fsd50k.py when generating the JSON files?
"# only apply to the vocal sound data"
Thanks
Why did you choose to use fbank features instead of a plain spectrogram? Thank you!
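Not the author, but a likely reason: log-mel filterbank (fbank) features warp the frequency axis onto a perceptual mel scale, devoting more resolution to low frequencies where most acoustic-event energy lives, and Kaldi-style fbank (e.g. torchaudio.compliance.kaldi.fbank) is the standard input for AudioSet CNN pipelines. The mel warping itself is just:

```python
import math

def hz_to_mel(f_hz):
    """Standard HTK mel-scale mapping used by Kaldi-style fbank features."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# The scale is nearly linear below ~1 kHz and logarithmic above,
# so equal mel steps allocate more filterbank bins to low frequencies.
```

So compared with a linear-frequency spectrogram of the same size, fbank features spend their bins where they discriminate best.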