
pfann's Introduction

pfann

This is an unofficial reproduction of the paper "Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning."

I have since written a thesis that makes a modest ("trivial") improvement on the above paper: "Improvement of Neural Network- and Landmark-based Audio Fingerprinting" (in Traditional Chinese). Link here

Note: I am now employed, and my company does not allow GitHub login during work hours, so I have less time to work on this side project or maintain my thesis code. For a while I also had no access to a high-performance GPU, so I could not solve compatibility issues or problems related to training. I finally bought a gaming computer in 2023, so I can now help you solve training issues.

Install

conda install python=3.9 # python 3.10 doesn't work with faiss...
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia # I forget which version of PyTorch I used, but the latest PyTorch seems to work
conda install -c pytorch faiss-gpu # use faiss-cpu instead if you don't need GPU-accelerated search
pip install tqdm
pip install tensorboardX
pip install torch_optimizer
pip install scipy
pip install julius
pip install matplotlib # for visualization purposes, not needed on the server
pip install seaborn # for visualization purposes, not needed on the server
pip install scikit-learn
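
After installing, a quick sanity check like the following sketch (not part of this repo) can confirm that faiss imports correctly and that PyTorch sees your GPU:

# Environment check: just a sketch to verify the installation above.
import torch
import faiss

print("torch", torch.__version__, "CUDA available:", torch.cuda.is_available())

# Build a tiny throwaway faiss index to make sure the library loads and works.
index = faiss.IndexFlatL2(128)
print("faiss index created, dimension =", index.d)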

Prepare dataset

FMA dataset

Download fma_medium from https://github.com/mdeff/fma and unzip it to ../pfann_dataset/fma_medium .

python tools/listaudio.py --folder ../pfann_dataset/fma_medium --out lists/fma_medium.csv
python tools/filterduration.py --csv lists/fma_medium.csv --min-len 29.9 --out lists/fma_medium_30s.csv
python tools/traintestsplit.py --csv lists/fma_medium_30s.csv --train lists/fma_medium_train.csv --train-size 10000 --test lists/fma_medium_valtest.csv --test-size 1000
python tools/traintestsplit.py --csv lists/fma_medium_valtest.csv --train lists/fma_medium_val.csv --train-size 500 --test lists/fma_medium_test.csv --test-size 500
python tools/traintestsplit.py --csv lists/fma_medium_train.csv --train-size 2000 --train lists/fma_inside_test.csv
rm test.csv
python tools/listaudio.py --folder ../pfann_dataset/fma_large --out lists/fma_large.csv

AudioSet

Download the 3 CSV files unbalanced_train_segments.csv, balanced_train_segments.csv, and eval_segments.csv, as well as ontology.json, from https://research.google.com/audioset/download.html . Then run these commands to list all the videos needed:

python tools/audioset.py /path/to/unbalanced_train_segments.csv lists/audioset1.csv --ontology /path/to/ontology.json
python tools/audioset.py /path/to/balanced_train_segments.csv lists/audioset2.csv --ontology /path/to/ontology.json
python tools/audioset.py /path/to/eval_segments.csv lists/audioset3.csv --ontology /path/to/ontology.json

Use these commands to download the needed videos from YouTube and convert them to WAV:

python tools/audioset2.py lists/audioset1.csv ../pfann_dataset/audioset
python tools/audioset2.py lists/audioset2.csv ../pfann_dataset/audioset
python tools/audioset2.py lists/audioset3.csv ../pfann_dataset/audioset

After downloading, run this command to list all successfully downloaded files:

python tools/listaudio.py --folder ../pfann_dataset/audioset --out lists/noise.csv

This command may print errors because some videos are unavailable; this is expected.

Finally, run these commands:

python tools/filterduration.py --csv lists/noise.csv --min-len 9.9 --out lists/noise_10s.csv
python tools/traintestsplit.py --csv lists/noise_10s.csv --train lists/noise_train.csv --train-size 8 --test lists/noise_val.csv --test-size 2 -p

Microphone impulse response dataset

Go to http://micirp.blogspot.com/ and download the files to ../pfann_dataset/micirp. Then run these commands:

python tools/listaudio.py --folder ../pfann_dataset/micirp --out lists/micirp.csv
python tools/traintestsplit.py --csv lists/micirp.csv --train lists/micirp_train.csv --train-size 8 --test lists/micirp_val.csv --test-size 2 -p

Aachen Impulse Response Database

Download the zip file from https://www.iks.rwth-aachen.de/en/research/tools-downloads/databases/aachen-impulse-response-database/ and unzip it to ../pfann_dataset/AIR_1_4.

python -m datautil.ir ../pfann_dataset/AIR_1_4 lists/air.csv
python tools/traintestsplit.py --csv lists/air.csv --train lists/air_train.csv --train-size 8 --test lists/air_val.csv --test-size 2 -p

Train

python train.py --param configs/default.json -w4

Generate query

Inside-database test (no longer used in my thesis):

python genquery.py --params configs/gentest.json --len 10 --num 2000 --mode train --out out/queries/inside

Assuming you have prepared all the datasets above, just run this to generate all queries:

./genall.sh

This will output to the folders out/queries/out2_snr$snr, where $snr is one of -6, -4, -2, 0, 2, 4, 6, 8. The query list (used by matcher.py) is out/queries/out2_snr$snr/list.txt, and the ground truth is out/queries/out2_snr$snr/expected.csv.
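
To run the matcher on every generated query set, a loop like the following sketch can be used (not part of the repo; the database path and result file locations are assumptions, so adjust them to your setup):

import subprocess

db_path = "/path/to/db"  # assumption: a database built by builder.py
for snr in [-6, -4, -2, 0, 2, 4, 6, 8]:
    query_dir = f"out/queries/out2_snr{snr}"
    subprocess.run([
        "python", "matcher.py",
        f"{query_dir}/list.txt",             # query list produced by genall.sh
        db_path,
        f"out/results/result_snr{snr}.txt",  # assumed output location (create the folder first)
    ], check=True)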

Build a fingerprint database

Inside-database test (no longer used in my thesis):

python tools/csv2txt.py --dir ../pfann_dataset/fma_medium lists/fma_medium_train.csv --out lists/fma_medium_train.txt
python builder.py lists/fma_medium_train.txt /path/to/db configs/default.json

Usage of builder.py:

python builder.py <music list file> <output database location> <model config>

The music list file is a plain text file with one music file path per line. The file must be UTF-8 without a BOM (a sketch for generating such a file follows the example). For example:

/path/to/fma_medium/000/000002.mp3
/path/to/fma_medium/000/000005.mp3
/path/to/your/music/aaa.wav
/path/to/your/music/bbb.wav
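
If you want to generate such a list automatically, here is a minimal sketch (not part of the repo; the folder and output paths are assumptions). Python's default "utf-8" encoding writes no BOM, which matches the requirement above:

from pathlib import Path

music_dir = Path("/path/to/fma_medium")  # assumption: your music folder
paths = sorted(str(p) for p in music_dir.rglob("*")
               if p.is_file() and p.suffix.lower() in (".mp3", ".wav"))
# "utf-8" (as opposed to "utf-8-sig") writes no byte order mark
Path("lists/my_music.txt").write_text("\n".join(paths) + "\n", encoding="utf-8")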

The model config is a JSON file like the ones in the configs/ folder; it is used to load a trained model. If omitted, the model config defaults to configs/default.json.

This program supports both MP3 and WAV audio formats. Relative paths are supported but not recommended.

Recognize music

Usage of matcher.py:

python matcher.py <query list> <database location> <output result file>

The query list is a file containing a list of query file paths, one per line. For example:

/path/to/queries/out2_snr2/000002.wav
/path/to/queries/out2_snr2/000005.wav
/path/to/song_recorded_on_street1.wav
/path/to/song_recorded_on_street2.wav

The database location is the folder where builder.py saved the database.

The result file will be a TSV file with 2 fields, query file path and matched music path, without a header. It may look like this:

/path/to/queries/out2_snr2/000002.wav	/path/to/fma_medium/000/000002.mp3
/path/to/queries/out2_snr2/000005.wav	/path/to/fma_medium/000/000005.mp3
/path/to/song_recorded_on_street1.wav	/path/to/your/music/aaa.wav
/path/to/song_recorded_on_street2.wav	/path/to/your/music/aaa.wav

The matcher will also generate a _detail.csv file and a .bin file. The CSV file contains more information about the matches. It has 5 columns: query, answer, score, time, and part_scores (a sketch for reading this file follows the list below).

  • query: Query file path
  • answer: Matched music path
  • score: Matching score, used in my thesis
  • time: The time when the query clip starts in the matched music, in seconds
  • part_scores: Mainly used for debugging, currently empty
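
Here is the sketch mentioned above for reading the _detail.csv file (not part of the repo). It assumes comma-separated values, the column order listed above, and that the file may or may not start with a header row:

import csv

with open("result_detail.csv", newline="", encoding="utf-8") as f:  # assumed file name
    rows = list(csv.reader(f))
if rows and rows[0][0] == "query":  # skip a header row if one is present
    rows = rows[1:]
for query, answer, score, time, part_scores in rows:
    if float(score) < 0.1:          # arbitrary threshold, for illustration only
        print("low-confidence match:", query, "->", answer, "score =", score)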

The BIN file contains the matching score of every database track for each query. It is used in my ensemble experiments. The file format is a flattened 2D array of the following structure, without a header:

struct match_t {
  float score; // Matching score
  float offset; // The time when the query clip starts in the matched music, in seconds
};

The matching score of the j-th database track for the i-th query is at index i * database size + j.
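
Based on that layout, the .bin file can be loaded with a sketch like this (not part of the repo; the file name and the database size are assumptions, and native float32 byte order is assumed):

import numpy as np

db_size = 10000                                    # assumption: number of tracks in the database
raw = np.fromfile("result.bin", dtype=np.float32)  # assumed file name
matches = raw.reshape(-1, db_size, 2)              # [query, database track, (score, offset)]

i, j = 0, 42                                       # i-th query, j-th database track
score, offset = matches[i, j]
print("score:", score, "offset (seconds):", offset)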

Evaluation

python tools/accuracy.py /path/to/query6s/expected.csv /path/to/result_detail.csv

Ensemble experiment

python ensemble/svmheatmap2.py out/lm_ out/shift_4_ out/svm lin_acc.csv

More info TODO

pfann's People

Contributors

stdio2016

pfann's Issues

matcher.py input .txt not explicit

The expected format of /path/to/query6s/list.txt, the input to matcher.py, is not explicit or documented anywhere.
Thanks for all your efforts, the library is great and inspiring.

[REQ] add a (GH-compliant) license file

Hi there, first of all thanks for this awesome work!

Since we've 'doxed' it in our HyMPS project (under AUDIO section \ AIbased page \ Fingerprinting), can you please add a GH-compliant license file for it?

As you know, making licensing terms explicit is extremely important so that anyone can understand, better and faster, how to reuse/adapt/modify the sources (and not only those) in other open projects, and vice versa.

Although it may sound like a minor aspect, omitting the license file also causes inconsistent generation of the corresponding badge:


(badge-generator URL: https://badgen.net/github/license/stdio2016/pfann)

You can easily set a standardized one through GitHub's license wizard tool.

Last but not least, let us know how, in your opinion, we could improve our categorization and links to resources in order to foster collaboration between developers (and therefore the evolution) of the listed projects.

Hope that helps/inspires !

Finding Timestamps in a Long Audio File

Hello, I find your code very interesting.
Instead of matching among multiple audio files, I am looking to modify the code to match a fragment of audio (3-10 seconds) within a single audio file (about 2 hours) and obtain its offset (start time). Is this possible?
If so, which part of the code should I refer to?

Translate my thesis

Currently my thesis is in Traditional Chinese. I want more people to see this research, so I am going to translate my thesis to English.

why download fail?

python tools/audioset2.py lists/audioset1.csv ../pfann_dataset/audioset
python tools/audioset2.py lists/audioset2.csv ../pfann_dataset/audioset
python tools/audioset2.py lists/audioset3.csv ../pfann_dataset/audioset

download -4N3GjgnIQ0 from 260 to 270
failed to download ;-(
stop for a moment~~~
download -5lI0Wt-pNE from 250 to 260
failed to download ;-(
stop for a moment~~~
download -ADTZNx531s from 340 to 350
failed to download ;-(
stop for a moment~~~
download -D1uGn8bvIA from 40 to 50
failed to download ;-(
stop for a moment~~~
download -DZN4RxMD3s from 140 to 150
failed to download ;-(
stop for a moment~~~
download -DrChjJRNIk from 80 to 90
failed to download ;-(
stop for a moment~~~
download -EveIKsp3nE from 270 to 280

Providing a pretrained model

Hello,

I know this might be too late to ask, but could you provide some of your pretrained fingerprint models? I saw that you have done experiments with many training parameter values.
And by the way, congratulations on your work! It is really helpful and inspiring to me.

Do you use a different loss function?

Is the loss function different from the one in the original paper? What about the method for calculating the validation score? I can't find any details about them in your thesis.
