
pyin-rs's Introduction

pYIN algorithm written in Rust


This crate provides a pitch estimate for each frame of the audio signal and a probability that the frame is a voiced region.

The implementation is based on librosa. To make the translation from Python + NumPy to Rust easier, it is written on top of the ndarray crate.

Build & Run

You can use this both as an executable binary and as a library (C shared library and Rust library).

As an executable binary

cargo run --release <input_file> <output_npy_file> <fmin> <fmax> --frame_ms <frame length in milliseconds>

or

cargo build --release
./target/release/pyin <input_file> <output_npy_file> <fmin> <fmax> --frame_ms <frame length in milliseconds>

Note

  • Supported audio files: the same formats as the Creak crate.
    • Multi-channel audio files are supported.
  • Output file: an npy file containing the output ndarray with
    • shape: (4, no. of channels in input audio, no. of frames)
    • [0, :, :]: timestamp [sec]
    • [1, :, :]: f0 array [Hz]
    • [2, :, :]: voiced flag array (1.0 for voiced, 0.0 for unvoiced)
    • [3, :, :]: voiced probability array
  • If "-" is used as the output filename, the app sends the output data to stdout.
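The layout described in the note can be indexed with plain arithmetic once the array data has been loaded. The following is a minimal sketch, not part of the crate: it assumes the data has already been read into a flat, row-major (C-order) buffer, and it assumes f64 elements (the actual element type of the npy file is not stated here). The function and constant names are hypothetical.

```rust
// Indices along the first axis, following the list in the note above.
const TIMESTAMP: usize = 0;
const F0: usize = 1;
const VOICED_FLAG: usize = 2;
const VOICED_PROB: usize = 3;

/// Borrow all frames of one field for one channel from a flat, row-major
/// buffer of shape (4, n_channels, n_frames).
/// Returns None if the buffer length does not match the shape or if an
/// index is out of range.
fn output_row(
    data: &[f64],
    n_channels: usize,
    n_frames: usize,
    field: usize,
    channel: usize,
) -> Option<&[f64]> {
    if data.len() != 4 * n_channels * n_frames || field >= 4 || channel >= n_channels {
        return None;
    }
    // Row-major offset of element [field, channel, 0].
    let start = (field * n_channels + channel) * n_frames;
    Some(&data[start..start + n_frames])
}

/// Collect (timestamp, f0) pairs for the frames of one channel whose
/// voiced flag is 1.0.
fn voiced_f0(
    data: &[f64],
    n_channels: usize,
    n_frames: usize,
    channel: usize,
) -> Option<Vec<(f64, f64)>> {
    let t = output_row(data, n_channels, n_frames, TIMESTAMP, channel)?;
    let f0 = output_row(data, n_channels, n_frames, F0, channel)?;
    let flag = output_row(data, n_channels, n_frames, VOICED_FLAG, channel)?;
    Some(
        (0..n_frames)
            .filter(|&i| flag[i] == 1.0)
            .map(|i| (t[i], f0[i]))
            .collect(),
    )
}
```

Reading the npy header and payload is left out on purpose; any npy reader that yields the shape and a contiguous buffer would fit in front of these helpers.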

Example using pYIN as a C shared library

The example is in test/test.c. To build and run it with GCC:

./compile_test.sh
LD_LIBRARY_PATH=target/release ./test_pyin

Using pYIN as a Rust library

Add the following to your Cargo.toml:

[dependencies]
pyin = "1.0"

TODO

  • Input from stdio
  • More options supported by command-line arguments

pyin-rs's People

Contributors

sytronik


pyin-rs's Issues

more idiomatic api

Hello,

This is really cool you ported this algorithm to rust, thank you for all your work! Ideally a (rust) PSOLA algorithm with this would allow very simple but interesting audio manipulations of spoken voice, for example.

So I'm wondering if you have any plans to make the API more "idiomatic Rust". Specifically, here are a couple of issues I encountered that I think could be improved:

  1. The wav input being a CowArray forces users to depend on ndarray explicitly; a &[T], &[f32], or &[i16] would be more idiomatic, depending on how this gets done. I understand this gets more complicated when there are multiple channels, but it would be nice to have a "simple" API that accepts something like (&[f32], usize), where the usize is the number of channels, so there is less burden on the user to convert to the appropriate types.
     For example, this is a lot of type manipulation just to get that CowArray (I had to look at your binary to even see how to do this, since I don't use ndarray at all):
       let shape = (1, wav.len());
       let wav = Array2::from_shape_vec(shape.strides((1, shape.0)), wav).unwrap();
       let (f0, voiced_flag, voiced_prob) =
           pyin_exec.pyin(wav.remove_axis(Axis(0)).into(), fill_unvoiced, pad_mode);
     I'd prefer something like:
       let wav: Vec<f32> = ...;
       let (f0, voiced_flag, voiced_prob): (Vec<f32>, Vec<bool>, Vec<f32>) = pyin_exec.pyin_with(&wav, nchannels, fill_unvoiced, pad_mode);
     Note that the returned objects are regular Vecs, so the user doesn't need to know about ndarray or CowArray at all.
  2. Reuse of storage. The API can stay backwards compatible by exposing an in-out version, where the caller passes in the storage (e.g., the f0, voiced_flag, and voiced_prob Vecs via &mut); the current API would call it internally with local Vecs as the in-out arguments and return those. (This would likely depend on the point above, i.e., using Vecs as return types.)
  3. Not really API related, but it would be a plus if the number of dependencies could be trimmed down. The majority probably come from ndarray, though I'm not sure. Similarly, there are some older crates in the dependency tree (nalgebra 0.27, pulled in by statrs, which rustc flags as containing code that will be rejected in a future release) that could most likely use a version bump.

Regardless of whether you make it more idiomatic or not, thank you so much for writing this in Rust, really great stuff! :)
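The slice-based and in-out ideas from items 1 and 2 of this issue could look roughly like the sketch below. Everything here is hypothetical: PitchTrack, pyin_into, and pyin_alloc are illustrative names, not the crate's API, and the per-frame estimator is passed in as a closure standing in for the real algorithm. For brevity the sketch is mono only and uses non-overlapping frames.

```rust
/// Hypothetical output buffers that a caller can keep and reuse.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PitchTrack {
    pub f0: Vec<f32>,
    pub voiced_flag: Vec<bool>,
    pub voiced_prob: Vec<f32>,
}

/// In-out variant: clears `out` and refills it, so the Vecs' allocations
/// are reused across calls. `estimate_frame` is a stand-in for the real
/// per-frame pitch estimator and returns (f0, voiced, probability).
pub fn pyin_into<F>(wav: &[f32], frame_len: usize, out: &mut PitchTrack, mut estimate_frame: F)
where
    F: FnMut(&[f32]) -> (f32, bool, f32),
{
    assert!(frame_len > 0, "frame_len must be non-zero");
    out.f0.clear();
    out.voiced_flag.clear();
    out.voiced_prob.clear();
    // Non-overlapping frames keep the sketch short; a real
    // implementation would step by a hop length instead.
    for frame in wav.chunks(frame_len) {
        let (f0, voiced, prob) = estimate_frame(frame);
        out.f0.push(f0);
        out.voiced_flag.push(voiced);
        out.voiced_prob.push(prob);
    }
}

/// Allocating variant: a thin wrapper that creates fresh buffers and
/// delegates to `pyin_into`, which is how an existing return-by-value
/// API could stay backwards compatible.
pub fn pyin_alloc<F>(wav: &[f32], frame_len: usize, estimate_frame: F) -> PitchTrack
where
    F: FnMut(&[f32]) -> (f32, bool, f32),
{
    let mut out = PitchTrack::default();
    pyin_into(wav, frame_len, &mut out, estimate_frame);
    out
}
```

A caller processing many buffers would create one PitchTrack up front and pass it to pyin_into repeatedly, while one-off callers could use pyin_alloc and never see the buffers' lifecycle.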

timestamp missing

Hello !
Very nice piece of code. The only difficulty was working out the timestamp of each predicted pitch value. Maybe you could add the timestamps to the output file, as the CREPE algorithm does.
Regards,
Robin
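For reference, frame timestamps can be derived from the frame index when the hop length and sample rate are known. The sketch below follows librosa's convention for centered framing, where frame i is centered on sample i * hop_length; whether this crate uses the same convention and which hop length it uses are assumptions here, and frame_times is a hypothetical helper name.

```rust
/// Timestamp in seconds of each frame, assuming frame `i` is centered on
/// sample `i * hop_length` (librosa's convention with centered framing).
fn frame_times(n_frames: usize, hop_length: usize, sample_rate: u32) -> Vec<f64> {
    (0..n_frames)
        .map(|i| (i * hop_length) as f64 / f64::from(sample_rate))
        .collect()
}
```

With the output format documented in the README, where row 0 holds the timestamps, this calculation should only be needed when working from raw f0 values.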
