wav-transcription2-midi

Newer PyTorch Implementation of Onsets and Frames, Adapted from Jongwook's onsets-and-frames

This is a PyTorch implementation of Google's Onsets and Frames model, using the Maestro dataset for training and the Disklavier portion of the MAPS database for testing. It is further adapted from Jongwook's onsets-and-frames to accommodate more recent PyTorch releases, specifically torch-2.2.0+cu12.3.

Instructions

This project is quite resource-intensive: at least 32 GB of system memory and 8 GB of GPU memory are recommended.

Downloading Dataset

To download the Maestro dataset, use the link provided. Bear in mind that the dataset is about 100 GB.
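Because the download is large, it can help to check free disk space before starting. The sketch below uses only the Python standard library; the 100 GB threshold comes from the note above, and the helper name has_room_for_maestro is hypothetical, not part of this repository.

```python
import shutil

# Approximate download size from the note above (~100 GB, decimal gigabytes).
# Extracted or resampled copies of the audio may need more, so treat this as a lower bound.
MAESTRO_APPROX_BYTES = 100 * 10**9


def has_room_for_maestro(target_dir: str, required_bytes: int = MAESTRO_APPROX_BYTES) -> bool:
    """Return True if the filesystem holding target_dir has at least required_bytes free."""
    free = shutil.disk_usage(target_dir).free
    return free >= required_bytes


# Usage (hypothetical target directory):
#   has_room_for_maestro("data/")
```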

Transcribing

To use the pretrained model provided by Jongwook's onsets-and-frames, you need to modify one PyTorch source file: the module torch.nn.modules.rnn (the file rnn.py inside the installed torch package) must be replaced with the rnn.py file at the root of this repository. Download the model and place it at the root of the repository in your local directory. Then run:

python transcribe.py -i 'folder-with-wavs'
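Overwriting a file inside an installed package is easy to get wrong, so keeping a backup is worthwhile. The following is a minimal sketch, assuming you run it from the repository root in the same Python environment that runs transcribe.py; the helper names (installed_rnn_path, replace_with_backup, restore_backup) and the .bak suffix are illustrative, not part of this repository. Writing into site-packages may require elevated permissions.

```python
import importlib.util
import shutil
from pathlib import Path


def installed_rnn_path() -> Path:
    """Return the path of rnn.py inside the installed torch package.

    Resolving a dotted module name imports its parent packages (and therefore
    torch); ModuleNotFoundError is raised if torch is not installed."""
    spec = importlib.util.find_spec("torch.nn.modules.rnn")
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError("torch.nn.modules.rnn could not be located")
    return Path(spec.origin)


def replace_with_backup(replacement: Path, target: Path) -> Path:
    """Overwrite target with replacement, keeping the original as '<name>.bak'.

    An existing backup is never overwritten, so running this twice still keeps
    the pristine original. copyfile (not copy2) gives the patched file a fresh
    modification time, so Python recompiles it instead of reusing stale bytecode."""
    backup = target.with_name(target.name + ".bak")
    if not backup.exists():
        shutil.copy2(target, backup)
    shutil.copyfile(replacement, target)
    return backup


def restore_backup(target: Path) -> None:
    """Undo the patch by copying '<name>.bak' back over target."""
    backup = target.with_name(target.name + ".bak")
    shutil.copyfile(backup, target)


# Usage (from the repository root):
#   replace_with_backup(Path("rnn.py"), installed_rnn_path())
```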

Training

All package requirements are contained in requirements.txt. To train the model, run:

pip install -r requirements.txt
python train.py

train.py is written using Sacred and accepts configuration options such as:

python train.py with logdir=runs/model iterations=1000000

Trained models will be saved in the specified logdir, or otherwise in a timestamped directory under runs/.
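When logdir is not given, a timestamped name is generated, and checkpoints inside it follow the model-<iteration>.pt pattern used in the evaluation commands. The sketch below shows one way to build such a default name and to find the newest checkpoint; the "runs/transcriber" prefix and the %y%m%d-%H%M%S format are assumptions for illustration (check the config in train.py for the real scheme), and both helper names are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed checkpoint naming, taken from examples such as runs/model/model-100000.pt.
CHECKPOINT_RE = re.compile(r"^model-(\d+)\.pt$")


def default_logdir(prefix: str = "runs/transcriber", now: Optional[datetime] = None) -> str:
    """Build a timestamped log directory name (assumed format)."""
    now = now or datetime.now()
    return f"{prefix}-{now.strftime('%y%m%d-%H%M%S')}"


def latest_checkpoint(logdir: str) -> Optional[Path]:
    """Return the model-<N>.pt file with the largest N, or None if there is none."""
    directory = Path(logdir)
    if not directory.is_dir():
        return None
    best, best_iter = None, -1
    for path in directory.glob("model-*.pt"):
        match = CHECKPOINT_RE.match(path.name)
        # Compare iterations numerically: as strings, "model-99000" > "model-100000".
        if match and int(match.group(1)) > best_iter:
            best, best_iter = path, int(match.group(1))
    return best
```

Sacred's built-in print_config command (python train.py print_config, optionally followed by the same with ... overrides) prints the resolved configuration, including logdir, without starting training.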

Testing

To evaluate the trained model using the MAPS database, run the following command to calculate the note and frame metrics:

python evaluate.py runs/model/model-100000.pt
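To make the reported numbers concrete, here is a standard-library sketch of frame-level and onset-only note-level precision, recall and F1. It is an illustration, not the code evaluate.py runs: established implementations such as mir_eval's transcription module use optimal bipartite matching with a default 50 ms onset tolerance, whereas this sketch uses a simpler greedy match, so its results can differ slightly. All function names are hypothetical.

```python
from typing import List, Set, Tuple


def _prf(tp: int, n_ref: int, n_est: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from true-positive and total counts."""
    precision = tp / n_est if n_est else 0.0
    recall = tp / n_ref if n_ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def frame_prf(reference: Set[Tuple[int, int]], estimate: Set[Tuple[int, int]]):
    """Frame-level metrics; each set holds the active (frame_index, midi_pitch) cells."""
    return _prf(len(reference & estimate), len(reference), len(estimate))


def note_onset_prf(reference: List[Tuple[float, int]],
                   estimate: List[Tuple[float, int]],
                   onset_tolerance: float = 0.05):
    """Onset-only note metrics with greedy one-to-one matching.

    Notes are (onset_seconds, midi_pitch). Each estimated note matches at most one
    unused reference note of the same pitch within onset_tolerance, nearest first."""
    used = set()
    tp = 0
    for onset, pitch in sorted(estimate):
        best_j, best_dist = None, None
        for j, (ref_onset, ref_pitch) in enumerate(reference):
            if j in used or ref_pitch != pitch:
                continue
            dist = abs(ref_onset - onset)
            if dist <= onset_tolerance and (best_dist is None or dist < best_dist):
                best_j, best_dist = j, dist
        if best_j is not None:
            used.add(best_j)
            tp += 1
    return _prf(tp, len(reference), len(estimate))
```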

Specifying --save-path will output the transcribed MIDI file along with the piano roll images:

python evaluate.py runs/model/model-100000.pt --save-path output/
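Before a MIDI file can be written, a binary piano roll has to be turned into note events. The sketch below does a simplified, frame-only conversion in which each run of consecutive active frames on one key becomes a note; the actual Onsets and Frames decoding also uses the onset predictions to decide where notes start, so treat this only as an illustration. The frame rate (16 kHz audio, hop of 512 samples) and the 88-key layout starting at MIDI 21 are assumptions, and roll_to_notes is a hypothetical name.

```python
from typing import List, Sequence, Tuple

# Assumed framing: 16 kHz audio with a 512-sample hop, i.e. 0.032 s per frame.
SAMPLE_RATE = 16000
HOP_LENGTH = 512
MIN_MIDI = 21  # assumed: column 0 of the roll is A0 (MIDI note 21)

Note = Tuple[int, float, float]  # (midi_pitch, onset_seconds, offset_seconds)


def roll_to_notes(roll: Sequence[Sequence[int]],
                  sample_rate: int = SAMPLE_RATE,
                  hop_length: int = HOP_LENGTH,
                  min_midi: int = MIN_MIDI) -> List[Note]:
    """roll[t][k] is truthy when key k is active in frame t."""
    seconds_per_frame = hop_length / sample_rate
    n_frames = len(roll)
    n_keys = len(roll[0]) if n_frames else 0
    notes: List[Note] = []
    for k in range(n_keys):
        start = None
        # Iterate one step past the end so a note still sounding at the last frame is closed.
        for t in range(n_frames + 1):
            active = t < n_frames and bool(roll[t][k])
            if active and start is None:
                start = t
            elif not active and start is not None:
                notes.append((min_midi + k, start * seconds_per_frame, t * seconds_per_frame))
                start = None
    notes.sort(key=lambda note: (note[1], note[0]))  # order by onset, then pitch
    return notes
```

The resulting (pitch, onset, offset) tuples can then be written out with any MIDI library.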

In order to test on the Maestro dataset's test split instead of the MAPS database, run:

python evaluate.py runs/model/model-100000.pt Maestro test
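The Maestro release ships a metadata CSV whose split column assigns each recording to train, validation or test. A minimal sketch of selecting one split with the standard csv module follows; the metadata file name depends on the Maestro version you download, and maestro_split is a hypothetical helper, not something evaluate.py exposes.

```python
import csv
from typing import Dict, List, TextIO

VALID_SPLITS = {"train", "validation", "test"}


def maestro_split(metadata: TextIO, split: str = "test") -> List[Dict[str, str]]:
    """Return the rows of a Maestro metadata CSV whose 'split' column equals split."""
    if split not in VALID_SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(VALID_SPLITS)}")
    return [row for row in csv.DictReader(metadata) if row["split"] == split]


# Usage (file name depends on the downloaded Maestro version):
#   with open("maestro-v2.0.0.csv", newline="") as f:
#       test_audio = [row["audio_filename"] for row in maestro_split(f, "test")]
```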

Implementation Details

This implementation contains a few of the additional improvements to the model that were reported in the Maestro paper, including:

  • Offset head
  • Increased model capacity, making it 26M parameters by default
  • Gradient stopping of inter-stack connections
  • L2 gradient clipping of each parameter at 3
  • Using the HTK mel frequencies

Meanwhile, this implementation does not include the following features:

  • Variable-length input sequences that slice at silence or zero crossings
  • Harmonically decaying weights on the frame loss

Despite these omissions, this implementation achieves performance comparable to that reported in the Maestro paper for the model trained without data augmentation.

Contributors

cliffordkleinsr
