neural_srp's Introduction

Neural-SRP

This repository contains the code for the Neural-SRP paper, published in the Open Journal of Signal Processing (OJSP). Neural-SRP is a neural network-based multi-source tracking algorithm that incorporates architectural features from SRP-PHAT, an established model-based algorithm for sound source localization.

The repository includes the code used to train the Neural-SRP method under different scenarios, as well as the code used to evaluate the trained models on the LOCATA and TAU-NIGENS datasets.

Configuration

Parameters are controlled in the params.json file. Note that some parameters should be changed depending on the script being run, as detailed below. The most important parameters are:

  • model: the model to be trained. It can be neural_srp, doanet, cross3d or srp. doanet only works with the TAU-NIGENS dataset (train_multisource.py).
  • win_size: the window size in samples. Set to 640 when using the TAU-NIGENS dataset, and 4096 when using the LOCATA dataset.
  • hop_size: the hop size, given as a fraction rather than a number of samples. Set to 0.5 when using the TAU-NIGENS dataset, and 0.75 when using the LOCATA dataset.
  • model_checkpoint_path: the path to the pretrained model. Must be compatible with the model parameter.
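
As a rough illustration of how these parameters depend on the dataset, the sketch below builds a parameter dictionary from the values listed above. The helper make_params, the dataset keys, the flat key layout and the checkpoint file name are assumptions made for this example; check them against the actual params.json before relying on them.

```python
import json

# Values taken from the parameter descriptions above.
# The dataset keys ("tau_nigens", "locata") are hypothetical labels for this sketch.
DATASET_SETTINGS = {
    "tau_nigens": {"win_size": 640, "hop_size": 0.5},
    "locata": {"win_size": 4096, "hop_size": 0.75},
}
VALID_MODELS = ("neural_srp", "doanet", "cross3d", "srp")


def make_params(model, dataset, model_checkpoint_path):
    """Return a dict with the four parameters described above (hypothetical helper)."""
    if model not in VALID_MODELS:
        raise ValueError(f"unknown model: {model}")
    if dataset not in DATASET_SETTINGS:
        raise ValueError(f"unknown dataset: {dataset}")
    if model == "doanet" and dataset != "tau_nigens":
        raise ValueError("doanet only works with the TAU-NIGENS dataset")
    return {
        "model": model,
        **DATASET_SETTINGS[dataset],
        "model_checkpoint_path": model_checkpoint_path,
    }


# "checkpoints/EXAMPLE.bin" is a placeholder, not a real file name from the repository.
print(json.dumps(make_params("neural_srp", "locata", "checkpoints/EXAMPLE.bin"), indent=4))
```

The real params.json may contain more keys or nest them differently; the sketch only covers the four parameters described in this section.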

Main scripts

After setting the correct parameters in params.json, run any of the following scripts with python script_name.py:

  • visualize_locata.py: visualizes single-source tracking trajectories on the LOCATA dataset. Generates the results of Tables 3 and 4 in the paper.
  • visualize_tau.py: visualizes multi-source tracking trajectories on the TAU-NIGENS dataset. Generates the results of Table 5 in the paper.
  • train_singlesource.py: trains the single-source models using simulated data.
  • train_multisource.py: trains the multi-source models using the TAU-NIGENS dataset.
  • analyze_complexity.py: analyzes the complexity of the different models. Generates the results of Table 6 in the paper.

Pretrained models

You can find the pretrained models in the checkpoints folder.

Datasets

  • LOCATA: the LOCATA challenge dataset.
  • TAU-NIGENS: the TAU-NIGENS dataset. After downloading, run the preprocessing script with python -m datasets.preprocess_tau_nigens_dataset to generate the dataset in the correct format. Then set the variable path_tau_nigens_preprocessed in params.json to the path of the preprocessed output.
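
The last step can also be done programmatically. The following is a minimal sketch, assuming path_tau_nigens_preprocessed is a top-level key in params.json; the helper set_param and the example path are hypothetical and not part of the repository. The demonstration writes to a temporary file so that it does not touch a real configuration.

```python
import json
import tempfile
from pathlib import Path


def set_param(params_file, key, value):
    """Update one top-level key in a JSON parameter file (hypothetical helper)."""
    path = Path(params_file)
    params = json.loads(path.read_text())
    params[key] = value
    path.write_text(json.dumps(params, indent=4))
    return params


# Demonstration on a throwaway file; in practice, point this at the real params.json.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "params.json"
    demo_file.write_text(json.dumps({"model": "neural_srp"}))
    # "/data/tau_nigens_preprocessed" is a made-up example path.
    updated = set_param(demo_file, "path_tau_nigens_preprocessed", "/data/tau_nigens_preprocessed")
    print(updated)
```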

neural_srp's People

Contributors

egrinstein, yezhangyinge

neural_srp's Issues

Problems encountered in reproducing the model

Hi, I've tried to reproduce Neural-SRP and encountered some problems:

  1. When I preprocess the TAU-NIGENS dataset, the following error occurs:

    tnb_classes[frame_ind, active_event] = 1
    IndexError: index 2 is out of bounds for axis 1 with size 2

     When I change the value of self._nb_unique_classes from 2 to 3, the error goes away. Does it mean that the samples in the dataset never have more than 2 active sound sources?

  2. But when I tried to visualize the TAU-NIGENS dataset using neural-srp-multi.bin, it reported:

    target_doas = target_doas.view( target_doas.shape[0], target_doas.shape[1], 3, max_nb_doas ).transpose(-1, -2)
    RuntimeError: shape '[1, 50, 3, 2]' is invalid for input of size 500

     Is it because I changed the value of self._nb_unique_classes?

  3. When I tried to load doanet.bin to visualize the TAU-NIGENS dataset, this error occurred:
     (screenshot attachment c884ce4b050eff7fd256d6c8227d20f, not reproduced here)
     How can I solve it?

Thank you in advance, looking forward to your reply!
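
The RuntimeError in the second problem can be checked with plain arithmetic: a view is only valid when the product of the requested dimensions equals the number of elements in the tensor. The sketch below uses only the numbers from the error message; the helper view_is_valid is hypothetical, and the reading of the first two dimensions as batch size and frame count is an assumption.

```python
from math import prod


def view_is_valid(input_size, shape):
    """A reshape or view needs the same number of elements as the input."""
    return prod(shape) == input_size


requested_shape = (1, 50, 3, 2)  # from the error message
input_size = 500                 # from the error message

needed = prod(requested_shape)  # elements the requested shape would hold
# Assuming the first two dimensions are batch size and frame count:
values_per_frame = input_size // (requested_shape[0] * requested_shape[1])

print(needed, values_per_frame, view_is_valid(input_size, requested_shape))
```

The requested shape holds 300 elements while the tensor has 500, which corresponds to 10 values per frame under the assumption above. Since 10 is not a multiple of 3, no value of max_nb_doas would make a (1, 50, 3, max_nb_doas) view fit this tensor, which suggests that the target layout itself changed rather than only the number of sources.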

Questions about the feature shape transform during normalization

Hello, after reading your code, I have some questions:

In GccExtractor, the feature matrix shape is

gcc_feat = np.zeros((nb_frames, self._nb_bins, n_output_channels))
gcc_feat = gcc_feat.transpose((0, 2, 1))

and, in Preprocessor::extract_all_feature, the feature is transformed with

        feat = self._gcc_extractor(audio_in)
        nb_frames = feat.shape[1]
        feat = feat.transpose((0, 2, 1)).reshape((nb_frames, -1))

which does not make sense to me: nb_frames seems to refer to different things in the two places.

I can't figure it out. Could you help me with it? Is there something wrong in this version of the code? Thanks!
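
The shapes in question can be traced with a small NumPy sketch. It assumes that the extractor returns the transposed array unchanged and that nothing else modifies it in between; the sizes are arbitrary values chosen for illustration and are not taken from the repository.

```python
import numpy as np

# Arbitrary illustrative sizes (assumptions, not values from the repository).
nb_frames, nb_bins, n_output_channels = 50, 64, 6

# Shape built inside the extractor, as quoted above.
gcc_feat = np.zeros((nb_frames, nb_bins, n_output_channels))
gcc_feat = gcc_feat.transpose((0, 2, 1))
extractor_shape = gcc_feat.shape  # (nb_frames, n_output_channels, nb_bins)

# Steps quoted from the preprocessor, assuming it receives the array above as-is.
feat = gcc_feat
frames_as_read = feat.shape[1]  # axis 1 holds the channels here, not the frames
flat = feat.transpose((0, 2, 1)).reshape((frames_as_read, -1))
flat_shape = flat.shape  # (n_output_channels, nb_frames * nb_bins)

print(extractor_shape, frames_as_read, flat_shape)
```

Under these assumptions, the value read as nb_frames in the preprocessor equals n_output_channels, so the flattened array has one row per channel pair rather than one row per frame. If the extractor does something else before returning, the trace would differ.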
