yuangongnd / psla
Code for the TASLP paper "PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation".
License: BSD 3-Clause "New" or "Revised" License
The file 'class_labels_indices.csv' is missing from the expected path 'egs/fsd50k/class_labels_indices.csv'.
To reproduce:
python psla/egs/fsd50k/prep_fsd.py >> No such file or directory: './class_labels_indices.csv'
python psla/src/label_enhancement/fix_type1.py >> No such file or directory: '../../egs/fsd50k/class_labels_indices.csv'
I'm following the step-by-step implementation of PSLA here on GitHub, but when I run python3 prep_fsd.py, it creates the folders FSD50K.dev_audio_16k and FSD50K.eval_audio_16k at the specified dataset path. However, it doesn't generate the converted audio files inside those folders. Any idea what might be happening?
P.S.: The terminal indicates that the samples were created, but they were not.
The data path is defined as fsd_path = './dataset/', and this is the folder structure:
dataset
|
|--FSD50K.dev_audio
|--FSD50K.doc
|--FSD50K.eval_audio
|--FSD50K.ground_truth
|--FSD50K.metadata
Hi,
I used your pretrained 5th percentile, but it seems it doesn't have a considerable effect, as you can see from the label histogram (Music and Musical instrument are still dominating). I wonder what would make FSD50K more balanced?
Does the provided JSON also include the balancing process described in the article?
Thanks
Classic labels (we split the 200 categories into 8 histograms for visibility):
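For context: in weighted-sampling pipelines, the label histogram of the file list itself usually stays imbalanced; balancing happens at sampling time via per-sample weights. A common recipe, and roughly what I understand the repo's weight-generation step to do (the exact formula is an assumption, check the code), is to weight each sample inversely to the frequency of its labels:

```python
from collections import Counter

def sample_weights(labels_per_sample):
    """labels_per_sample: one list of label strings per sample.
    Each sample's weight is the sum of 1/count over its labels, so
    samples carrying rare labels get larger weights than samples
    dominated by frequent classes like Music.
    NOTE: an illustrative formula, not necessarily the repo's exact one."""
    counts = Counter(l for labels in labels_per_sample for l in labels)
    return [sum(1.0 / counts[l] for l in labels) for labels in labels_per_sample]
```

With weights like these, the frequent classes stop dominating each training epoch even though the raw histogram of the dataset is unchanged.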
Hi,
I don't understand what you do with the weights in the CSV file that gen_weight_file creates.
How do you use them afterward?
Thanks ;)
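For what it's worth, per-sample weights like these are typically handed to a sampler that draws training examples with probability proportional to weight (in PyTorch, torch.utils.data.WeightedRandomSampler). The mechanism can be shown with the stdlib alone; this is a sketch of the idea, not the repo's actual code:

```python
import random

def draw_epoch(indices, weights, epoch_size, seed=0):
    """Sample indices with replacement, proportional to weight --
    the same idea as torch.utils.data.WeightedRandomSampler."""
    rng = random.Random(seed)
    return rng.choices(indices, weights=weights, k=epoch_size)
```

With weights [2.0, 1.0], index 0 is drawn about twice as often as index 1 over a long epoch, which is how rare-class samples get oversampled during training.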
psla/src/label_enhancement/fix_type2.py
Line 18 in 16be239
Great work!
Are you able to provide a quick-and-dirty inference script, like the one in the AST code base, for failsafe inference on AudioSet and ESC? That would be a great help.
Thanks
As per README.md, the link under "(Optional) Step 2. Enhance the label of the balanced AudioSet training set" doesn't exist:
[pretrained enhanced label set](https://github.com/YuanGongND/psla/blob/main/here)
Can anyone supply this file?
Hello!
I have a small doubt regarding the model parameters of EfficientNet-B2 with 4 attention heads. In the paper, 13.64M parameters are reported. However, in practice, after removing the final classification layers from EfficientNet and adding the multi-head attention module, I get 7.71M instead of 13.64M. As you can see in the following screenshot, EfficientNet-B2's parameter count drops to 7.7M immediately after removing the classification layer. On top of that, the multi-head module only has around 11,000 parameters, resulting in 7.71M.
Am I missing something? I am reporting the number of parameters of this model for my project, but I am a bit confused about it. Could you clarify this for me? :)
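For what it's worth, the 7.7M figure is consistent with simply dropping the stock 1000-way ImageNet classifier: EfficientNet-B2 is commonly cited at roughly 9.1M parameters, and a Linear(1408, 1000) head alone accounts for about 1.41M of them. Why the paper reports 13.64M is a separate question (it may count a wider classification branch; that part is speculation). The arithmetic:

```python
def linear_params(in_features, out_features):
    """Parameters of a fully connected layer: weight matrix plus bias."""
    return in_features * out_features + out_features

# EfficientNet-B2's final feature dimension is 1408; the stock
# ImageNet head maps it to 1000 classes.
head = linear_params(1408, 1000)       # 1,409,000 parameters
# ~9.11M total is the commonly cited figure for EfficientNet-B2
# (an approximation, not taken from this repo).
backbone = 9_110_000 - head            # ~7.70M left without the head
```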
Hello Yuan,
great work and thank you for making it available for other researchers.
I am currently testing deep learning models on my audio dataset to see which model performs better.
I saw you made available the pretrained EfficientNet B2 models with 4-headed attention. I was wondering if it would be possible to download other pretrained models too, e.g. EfficientNet B2 with Mean Pooling.
Thank you in advance,
Annalisa
Hi Yuan,
Thanks for open-sourcing this repo. I have a quick question about the MHA EfficientNet model you proposed. When I tried EfficientNet-B2 with the multi-head attention model, I found that some values in the out variable were larger than one, instead of between 0 and 1. Is that intentionally designed?
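One plausible explanation (my reading, worth confirming against the code) lies in how the per-frame sigmoid scores are aggregated: a weighted sum of values in [0, 1] stays in [0, 1] only if the attention weights are normalized to sum to 1; with unnormalized weights the output can exceed 1. A small numeric sketch of the two cases:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def aggregate(scores, attn, normalize=True):
    """Attention-weighted aggregation of per-frame sigmoid scores.
    With normalize=True the weights form a convex combination, so the
    result stays in [0, 1]; without it, the sum can exceed 1."""
    probs = [sigmoid(s) for s in scores]
    if normalize:
        total = sum(attn)
        attn = [a / total for a in attn]
    return sum(p * a for p, a in zip(probs, attn))
```

For example, `aggregate([2.0, 3.0], [0.9, 0.8], normalize=False)` exceeds 1 because the weights sum to 1.7, while the normalized version cannot.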
Many Thanks
Hi,
I couldn't find, in any of your recent publications on AudioSet, how you split the unbalanced (or even balanced) train segments into train and validation sets for hyperparameter tuning. I would like to replicate your results. Also, the Dropbox link for the PSLA experiments you have listed is down.
On another note, regarding FSD50K: could you elaborate on what those "forbidden" classes are and why they are excluded? Could you also explain the purpose of this comment in prep_fsd50k.py when generating the JSON files?
"# only apply to the vocal sound data"
Thanks
Why did you choose to use fbank features instead of a plain spectrogram? Thank you!
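Not the author, but a likely reason: log-mel filterbank (fbank) features warp the frequency axis onto a perceptual mel scale, devoting more resolution to low frequencies where most acoustic-event energy lives, and Kaldi-style fbank (e.g. torchaudio.compliance.kaldi.fbank) is the standard input for AudioSet CNN pipelines. The mel warping itself is just:

```python
import math

def hz_to_mel(f_hz):
    """Standard HTK mel-scale mapping used by Kaldi-style fbank features."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

# The scale is nearly linear below ~1 kHz and logarithmic above,
# so equal mel steps allocate more filterbank bins to low frequencies.
```

So compared with a linear-frequency spectrogram of the same size, fbank features spend their bins where they discriminate best.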