

field-recording-segmentation

Python and Matlab code for segmentation of field recordings. Please cite the following paper if you use the code in your research:

Tensorflow

This folder contains Python code for training and applying deep learning models that label field recordings with a set of classes (e.g. speech, singing, instrumental).

  • generate_features.py processes audio files: it splits them into frames, computes the FFT of each frame and stores the results in TensorFlow .tfr files, which are read during training. Its input is a list of folders; each folder XXX should include a "sample labels XXX.txt" text file listing the audio files in that folder and their class labels. Both the FFT data and the labels are stored in the .tfr files.
  • train_and_test.py trains and tests a deep residual model. It reads the .tfr files, converts the spectra into mel spectra, and trains/tests models with cross-validation. Several parameters can be set in the default.ini file.
  • label_file.py takes an already trained model and labels an arbitrary audio file with the model's classes. The sample rate used for labelling is read from defaults.ini.
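train_and_test.py converts the stored FFT magnitudes into mel spectra before training. The repository's actual mel settings are not documented in this README, so the following is only an illustrative sketch under assumptions: it builds an HTK-style triangular mel filterbank in NumPy for a 1024-point FFT (513 bins) at 22050 Hz, the analysis settings of the exported model described further down, and uses a hypothetical choice of 80 mel bands with log compression. The function names are hypothetical and are not taken from the repository.

```python
import numpy as np

def hz_to_mel(f):
    # HTK mel scale (assumption: the repository may use a different mel formula)
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)

def mel_filterbank(sr=22050, n_fft=1024, n_mels=80):
    """Triangular filters of shape (n_mels, n_fft // 2 + 1); n_mels=80 is a hypothetical choice."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)  # centre frequency of each FFT bin
    # n_mels + 2 equally spaced points on the mel scale give the filter edges and centres
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        lo, center, hi = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def to_log_mel(mag, fb):
    """Map a (n_bins, n_frames) magnitude spectrogram to log mel power (log is an assumption)."""
    return np.log(fb @ (mag ** 2) + 1e-10)
```

For example, `to_log_mel(mag, mel_filterbank())` turns a 513x140 magnitude block into an 80x140 log-mel block.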

Matlab

This folder contains Matlab code for probabilistic segmentation of field recordings, based on signal energy and on classification into a set of classes (e.g. speech, singing, instrumental).

  • segmentRecordingDeep.m is the main function. It takes an audio file (a field recording) and the class probabilities for that file (as returned e.g. by the TensorFlow model) and returns the segment boundaries and segment labels.
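segmentRecordingDeep.m combines energy with the class probabilities in a probabilistic model that is not reproduced here. For comparison only, the Python sketch below shows a much simpler, non-probabilistic baseline swapped in for illustration: take the most likely class of each block and merge consecutive blocks with the same class into one segment. It assumes contiguous, non-overlapping 2-second blocks (the fragment length of the exported TensorFlow model); the function name and the class order are hypothetical.

```python
import numpy as np

def argmax_segments(probs, class_names, block_sec=2.0):
    """Naive baseline (not segmentRecordingDeep.m): merge runs of equal argmax classes.

    probs: array of shape (N, C), one row of class probabilities per block.
    Returns a list of (start_sec, end_sec, label) tuples.
    """
    labels = np.argmax(np.asarray(probs), axis=1)
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current segment at the end of the input or when the class changes.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start * block_sec, i * block_sec, class_names[labels[start]]))
            start = i
    return segments
```

For instance, with probabilities `[[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]` and the placeholder names `["speech", "singing"]`, this yields `[(0.0, 4.0, "speech"), (4.0, 6.0, "singing")]`. Unlike the Matlab code, it ignores energy and makes no attempt to smooth isolated misclassified blocks.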

MIREX 2015, 2018

These folders contain our submissions to the MIREX 2015 Music/Speech Classification and Detection task and to the MIREX 2018 Music and/or Speech Detection task. See the enclosed READMEs for usage.

Tensorflow model

This folder contains an exported, trained TensorFlow model that labels 2-second audio fragments as solo singing, choir singing, instrumental or speech. The model's input consists of 513x140 blocks of magnitude spectrogram values (as calculated e.g. by librosa stft) with a window size of 1024 and a step size of 315 samples at a 22050 Hz sampling rate. Its output is the probabilities of the four target classes for each block. The model can be used in code as follows:

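  import tensorflow as tf  # TensorFlow 1.x API (under TensorFlow 2.x use tf.compat.v1)

  Df = ...  # N x 513 x 140 x 1 array of magnitude FFT blocks
  exportpath = "PATH_TO_EXPORT_FOLDER"  # folder containing the exported model
  with tf.Session() as sess:
      tf.saved_model.loader.load(sess, ["scoring-tag"], exportpath)
      predictions = tf.get_default_graph().get_tensor_by_name("predictions:0")
      # Class probabilities for each of the N input blocks
      probabilities = sess.run(predictions, feed_dict={"xinput:0": Df})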
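The snippet above expects Df to already hold the spectrogram blocks. A step of 315 samples at 22050 Hz gives 70 frames per second, so 140 frames span 2 seconds. The sketch below is one way to build such blocks with NumPy alone. It assumes mono audio already at 22050 Hz, a periodic Hann window and zero padding of n_fft/2 samples at both ends (similar to librosa.stft with center=True), and it drops trailing frames that do not fill a whole block. The function name is hypothetical, and whether this matches the model's training preprocessing exactly has not been verified.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

def magnitude_blocks(y, n_fft=1024, hop=315, block_frames=140):
    """Split mono audio y (assumed 22050 Hz) into N x 513 x 140 x 1 magnitude blocks."""
    y = np.asarray(y, dtype=np.float64)
    # Zero-pad half a window on each side so frames are centred (as with center=True).
    y = np.pad(y, n_fft // 2)
    # Periodic Hann window of length n_fft.
    window = np.hanning(n_fft + 1)[:-1]
    # One row per frame, frames starting every `hop` samples: shape (n_frames, n_fft).
    frames = sliding_window_view(y, n_fft)[::hop]
    # Magnitude spectrum per frame, transposed to (n_fft // 2 + 1, n_frames) = (513, n_frames).
    mag = np.abs(np.fft.rfft(frames * window, axis=1)).T
    # Keep only whole blocks of block_frames frames; trailing frames are dropped.
    n_blocks = mag.shape[1] // block_frames
    mag = mag[:, : n_blocks * block_frames]
    blocks = mag.reshape(mag.shape[0], n_blocks, block_frames).transpose(1, 0, 2)
    # Add the trailing channel axis expected by the model input.
    return blocks[..., np.newaxis].astype(np.float32)
```

With audio loaded e.g. via `librosa.load(path, sr=22050, mono=True)`, `Df = magnitude_blocks(y)` gives one non-overlapping block per 2 seconds of audio; a 5-second recording yields 2 blocks.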
