sevagh / demucs.cpp

C++17 port of Demucs v3 (hybrid) and v4 (hybrid transformer) models with ggml and Eigen3

Home Page: https://freemusicdemixer.com/

License: MIT License

CMake 1.10% C++ 92.04% Python 6.47% Shell 0.40%

blas demixing demucs eigen3 ggml guitar music-source-separation piano pytorch

demucs.cpp's Introduction

sevagy.xyz: blog and personal site

freemusicdemixer.com: Demucs-powered stem separation in your browser and Android phone

metalgroove.xyz: metal-related projects in groove, rhythm, beat, tempo, etc.

demucs.cpp's Issues

Demucs weights

Hi,

I've successfully compiled demucs.cpp but have problems setting up the python environment.
Is there another way (a direct link?) to download the Demucs weights/models so that I can then convert them to ggml?

Thanks!

Feature request - optional logging

Making this a separate issue, although it could be combined with #5.

As demucs.cpp processes the various stages of the algorithm, it prints the current operation to std::cout, which is perhaps undesirable in a library. Could this logging be made optional, or be routed through a callback?
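One way the callback idea could look is sketched below. This is only an illustration, not the library's API: the `sketch` namespace, `LogCallback`, `Logger`, and `set_callback` are all hypothetical names. The default sink keeps today's behavior (printing to std::cout), while an application can redirect or silence the messages.

```cpp
#include <functional>
#include <iostream>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical callback type: receives one log message per call.
using LogCallback = std::function<void(const std::string &)>;

class Logger {
public:
    // By default, messages go to std::cout, matching the current behavior.
    Logger()
        : sink_([](const std::string &msg) { std::cout << msg << std::endl; }) {}

    // Install a custom sink; an empty std::function silences logging.
    void set_callback(LogCallback cb) { sink_ = std::move(cb); }

    // Library code would call this instead of writing to std::cout directly.
    void log(const std::string &msg) const {
        if (sink_) {
            sink_(msg);
        }
    }

private:
    LogCallback sink_;
};

} // namespace sketch
```

A host application could then call `logger.set_callback(nullptr)` to silence the output, or pass a lambda that forwards each message to its own logging framework.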

unknown target CPU 'apple-m1'

I am getting this build error since updating to the recent commits:

error: unknown target CPU 'apple-m1'
note: valid target CPU values are: nocona, core2, penryn, bonnell, atom, silvermont, slm, goldmont, goldmont-plus, tremont, nehalem, corei7, westmere, sandybridge, corei7-avx, ivybridge, core-avx-i, haswell, core-avx2, broadwell, skylake, skylake-avx512, skx, cascadelake, cooperlake, cannonlake, icelake-client, rocketlake, icelake-server, tigerlake, sapphirerapids, alderlake, knl, knm, k8, athlon64, athlon-fx, opteron, k8-sse3, athlon64-sse3, opteron-sse3, amdfam10, barcelona, btver1, btver2, bdver1, bdver2, bdver3, bdver4, znver1, znver2, znver3, x86-64, x86-64-v2, x86-64-v3, x86-64-v4

demucs_mt.cpp.main hard wired for 4-source

The multithreaded cli seems to be hard wired for 4-source models as you can see in this for loop:

demucs.cpp/cli-apps/threaded_inference.hpp

Line 141 in 6f0eba0

for (int t = 0; t < 4; ++t)
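A possible direction, sketched under assumptions: take the loop bound from the loaded model rather than from the literal 4. `ModelInfo` and `for_each_source` below are hypothetical stand-ins and do not correspond to actual types in demucs.cpp.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-in for whatever describes the loaded model.
struct ModelInfo {
    std::vector<std::string> source_names; // e.g. 4 or 6 entries
};

// Visit every source of the model instead of assuming there are exactly 4.
template <typename Fn>
void for_each_source(const ModelInfo &model, Fn &&fn) {
    for (std::size_t t = 0; t < model.source_names.size(); ++t) {
        fn(t, model.source_names[t]);
    }
}

} // namespace sketch
```

With this shape, a 6-source model (for example one that adds guitar and piano) would be handled by the same loop without further changes.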

Support Demucs v3 (hdemucs_mmi)

Architecture is not too far from v4, except this codebase is missing an LSTM (which I have in umx.cpp: https://github.com/sevagh/umx.cpp/blob/main/src/lstm.cpp)

Still, it will take some effort to trace through every step of v3 inference and implement it all correctly.

How to apply it in WebAssembly?

Hello, I would like to use this library through WebAssembly technology in the browser, but after looking at this library and the project https://github.com/sevagh/freemusicdemixer.com, I really don't understand how it is applied. Could you please provide some documentation or tutorials to guide me? Thank you very much.

Amount of time to demux an audio file

Hi, do you know roughly how long it takes you to demix an audio file? I tried a clip that is 1m30s long and it took 10m43s, so roughly 7 times the clip's duration. Is this expected, or have I done something silly?

$ cat /proc/cpuinfo | grep "model name" | head -n1
model name : Intel(R) Core(TM) i3-6100 CPU @ 3.70GHz

(I used the 4-source model.)

Thanks!

Question about GPU

How difficult is it to build this using cuBLAS? I'm looking into creating a native project around demucs but need GPU acceleration. Python is way too bloated for my liking.

Feature request - better progress reporting and logging

Firstly, thanks for this fantastic project and contribution to the world of open source stem separation. I may make a few more PRs to make it build well on macOS and Windows, and, for example, to add performance measurements with BLAS from Apple's Accelerate.

I would like to suggest an improvement. It's great to have the progress callback, but each pass of the demucs processing is so resource intensive that it can be a while before you get any feedback. In the case where you want to run this on a background thread it would be good to have a more granular progress callback - so that you can check if the user cancelled the processing, which can take a very long time.
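The request could be sketched roughly as follows, assuming the processing can be split into small steps. Everything here is hypothetical (`ProgressCallback`, `run_steps`, the step granularity) and is not taken from demucs.cpp; the point is only that progress is reported after each step and a cancel flag is checked between steps.

```cpp
#include <atomic>
#include <functional>
#include <string>

namespace sketch {

// Hypothetical callback: progress in [0, 1] plus a label for the finished step.
using ProgressCallback = std::function<void(float, const std::string &)>;

// Runs `num_steps` units of work, reporting after each one and stopping
// early once `cancel` is set. Returns true only if every step completed.
inline bool run_steps(int num_steps,
                      const std::function<void(int)> &do_step,
                      const ProgressCallback &on_progress,
                      const std::atomic<bool> &cancel) {
    for (int i = 0; i < num_steps; ++i) {
        if (cancel.load()) {
            return false; // cancellation is only observed between steps
        }
        do_step(i);
        if (on_progress) {
            on_progress(static_cast<float>(i + 1) / static_cast<float>(num_steps),
                        "step " + std::to_string(i));
        }
    }
    return true;
}

} // namespace sketch
```

The finer the steps (for example one per encoder or decoder layer rather than one per full pass), the sooner a UI thread that sets `cancel` would see the background work stop.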

CMakeLists.txt demucs.cpp.test target - missing dependency gtest

demucs.cpp.test depends on googletest, which is neither vendored nor mentioned in README.md.
I can't build that target on macOS. It might work if I install googletest via Homebrew, but I think it would be better to include it as a submodule.

Memory access error with MT on mac

The crash seems to be in layer_norm.

./build/demucs_ft_mt.cpp.main ./ggml-demucs/

DRUMS [THREAD 1] (9.615%) Time encoder 2
DRUMS [THREAD 3] (11.538%) Freq encoder 2
DRUMS [THREAD 3] (13.462%) Time encoder 3
DRUMS [THREAD 3] (15.385%) Freq encoder 3
DRUMS [THREAD 0] (11.538%) Freq encoder 2
DRUMS [THREAD 3] (15.385%) Applying crosstransformer
zsh: segmentation fault ./build/demucs_ft_mt.cpp.main ./ggml-demucs/

Running it under lldb:

DRUMS [THREAD 3] (9.615%) Time encoder 2
DRUMS [THREAD 1] (15.385%) Freq encoder 3
DRUMS [THREAD 1] (15.385%) Applying crosstransformer
Process 85078 stopped

thread #22, stop reason = EXC_BAD_ACCESS (code=1, address=0x8)
frame #0: 0x0000000111c361b0 libdemucs_ft_mt_python.dylib`demucscpp::layer_norm(Eigen::Tensor<float, 3, 0, long> const&, Eigen::Tensor<float, 1, 0, long> const&, Eigen::Tensor<float, 1, 0, long> const&, float) + 832
libdemucs_ft_mt_python.dylib`demucscpp::layer_norm:
-> 0x111c361b0 <+832>: ldr x8, [x8]
0x111c361b4 <+836>: ldr x9, [x9]
0x111c361b8 <+840>: ldp x13, x14, [x19]
0x111c361bc <+844>: ldr x15, [x19, #0x10]
Target 0: (Python) stopped.

Two stem model

Is there a way to use a two-stem model, like the --two-stems vocals option in Demucs?
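In the absence of a dedicated two-stem model, one workaround is to post-process the multi-stem output: keep the target stem and sum all remaining stems into a single accompaniment track. The sketch below assumes each stem is a plain buffer of float samples; `Stem`, `TwoStems`, and `to_two_stems` are hypothetical names, not part of demucs.cpp.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

namespace sketch {

using Stem = std::vector<float>; // samples of one separated source

struct TwoStems {
    Stem target; // the requested stem, e.g. vocals
    Stem rest;   // sample-wise sum of every other stem
};

// Collapse N separated stems into "target" and "everything else".
inline TwoStems to_two_stems(const std::vector<Stem> &stems,
                             std::size_t target_index) {
    if (target_index >= stems.size()) {
        throw std::out_of_range("target_index");
    }
    TwoStems out;
    out.target = stems[target_index];
    out.rest.assign(out.target.size(), 0.0f);
    for (std::size_t s = 0; s < stems.size(); ++s) {
        if (s == target_index) {
            continue;
        }
        if (stems[s].size() != out.target.size()) {
            throw std::invalid_argument("stem length mismatch");
        }
        for (std::size_t i = 0; i < out.rest.size(); ++i) {
            out.rest[i] += stems[s][i];
        }
    }
    return out;
}

} // namespace sketch
```

This costs the same inference time as the full separation, since all stems are still computed; it only changes what is written out at the end.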