MT3: Multi-Task Multitrack Music Transcription

License: Apache License 2.0

Python 58.22% Jupyter Notebook 41.78%

mt3's Introduction

MT3 is a multi-instrument automatic music transcription model that uses the T5X framework.

This is not an officially supported Google product.

Transcribe your own audio

Use our colab notebook to transcribe audio files of your choosing. You can use a pretrained checkpoint from either a) the piano transcription model described in our ISMIR 2021 paper or b) the multi-instrument transcription model described in our ICLR 2022 paper.

Train a model

For now, we do not (easily) support training. If you like, you can try to follow the T5X training instructions and use one of the tasks defined in tasks.py.

mt3's Issues

Upload fails with `AudioIOReadError: <ufunc 'positive'>`

/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
[/usr/local/lib/python3.7/dist-packages/note_seq/audio_io.py](https://localhost:8080/#) in load_audio(audio_filename, sample_rate)
    279   try:
--> 280     y, unused_sr = librosa.load(audio_filename, sr=sample_rate, mono=True)
    281   except Exception as e:  # pylint: disable=broad-except

18 frames
[/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py](https://localhost:8080/#) in load(path, sr, mono, offset, duration, dtype, res_type)
    174     if sr is not None:
--> 175         y = resample(y, sr_native, sr, res_type=res_type)
    176 

[/usr/local/lib/python3.7/dist-packages/librosa/core/audio.py](https://localhost:8080/#) in resample(y, orig_sr, target_sr, res_type, fix, scale, **kwargs)
    603     else:
--> 604         y_hat = resampy.resample(y, orig_sr, target_sr, filter=res_type, axis=-1)
    605 

[/usr/local/lib/python3.7/dist-packages/resampy/core.py](https://localhost:8080/#) in resample(x, sr_orig, sr_new, axis, filter, **kwargs)
    119     y_2d = y.swapaxes(0, axis).reshape((y.shape[axis], -1))
--> 120     resample_f(x_2d, y_2d, sample_ratio, interp_win, interp_delta, precision)
    121 

[/usr/local/lib/python3.7/dist-packages/numba/core/dispatcher.py](https://localhost:8080/#) in _compile_for_args(self, *args, **kws)
    433         llvm : dict[signature, str] or str
--> 434             Either the LLVM IR string for the specified signature, or, if no
    435             signature was given, a dictionary mapping signatures to LLVM IR

[/usr/local/lib/python3.7/dist-packages/numba/core/dispatcher.py](https://localhost:8080/#) in _compile_for_args(self, *args, **kws)
    366                      "Please see nested and suppressed exceptions.")
--> 367                 info = ', '.join('Arg #{} is {}'.format(i, args[i])
    368                                  for i in  sorted(already_lit_pos))

[/usr/local/lib/python3.7/dist-packages/numba/core/compiler_lock.py](https://localhost:8080/#) in _acquire_compile_lock(*args, **kwargs)
     31             with self:
---> 32                 return func(*args, **kwargs)
     33         return _acquire_compile_lock

[/usr/local/lib/python3.7/dist-packages/numba/core/dispatcher.py](https://localhost:8080/#) in compile(self, sig)
    818         # Ensure the old overloads are disposed of, including compiled functions.
--> 819         self._make_finalizer()()
    820         self._reset_overloads()

[/usr/local/lib/python3.7/dist-packages/numba/core/dispatcher.py](https://localhost:8080/#) in compile(self, args, return_type)
     77         status, retval = self._compile_cached(args, return_type)
---> 78         if status:
     79             return retval

[/usr/local/lib/python3.7/dist-packages/numba/core/dispatcher.py](https://localhost:8080/#) in _compile_cached(self, args, return_type)
     91             retval = self._compile_core(args, return_type)
---> 92         except errors.TypingError as e:
     93             self._failed_cache[key] = e

[/usr/local/lib/python3.7/dist-packages/numba/core/dispatcher.py](https://localhost:8080/#) in _compile_core(self, args, return_type)
    109                                       pipeline_class=self.pipeline_class)
--> 110         # Check typing error if object mode is used
    111         if cres.typing_error is not None and not flags.enable_pyobject:

[/usr/local/lib/python3.7/dist-packages/numba/core/compiler.py](https://localhost:8080/#) in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals, library, pipeline_class)
    625             cres = norw_cres
--> 626         return cres
    627 

[/usr/local/lib/python3.7/dist-packages/numba/core/compiler.py](https://localhost:8080/#) in __init__(self, typingctx, targetctx, library, args, return_type, flags, locals)
    312         self.state.reload_init = []
--> 313         # hold this for e.g. with_lifting, null out on exit
    314         self.state.pipeline = self

[/usr/local/lib/python3.7/dist-packages/numba/core/base.py](https://localhost:8080/#) in refresh(self)
    282         self.install_registry(builtin_registry)
--> 283         self.load_additional_registries()
    284         # Also refresh typing context, since @overload declarations can

[/usr/local/lib/python3.7/dist-packages/numba/core/cpu.py](https://localhost:8080/#) in load_additional_registries(self)
     65         # Add target specific implementations
---> 66         from numba.np import npyimpl
     67         from numba.cpython import cmathimpl, mathimpl, printimpl, randomimpl

[/usr/local/lib/python3.7/dist-packages/numba/np/npyimpl.py](https://localhost:8080/#) in <module>()
    545         ufunc = getattr(np, ufunc_name)
--> 546         kernel = _kernels[ufunc]
    547         if ufunc.nin == 1:

KeyError: <ufunc 'positive'>

During handling of the above exception, another exception occurred:

AudioIOReadError                          Traceback (most recent call last)
[<ipython-input-19-cb6a80f4894b>](https://localhost:8080/#) in <module>()
      4 
      5 log_event('uploadAudioStart', {})
----> 6 audio = upload_audio(sample_rate=SAMPLE_RATE)
      7 log_event('uploadAudioComplete', {'value': round(len(audio) / SAMPLE_RATE)})
      8 

[<ipython-input-16-152851d66c54>](https://localhost:8080/#) in upload_audio(sample_rate)
     37     print('Multiple files uploaded; using only one.')
     38   return note_seq.audio_io.wav_data_to_samples_librosa(
---> 39     data[0], sample_rate=sample_rate)
     40 
     41 

[/usr/local/lib/python3.7/dist-packages/note_seq/audio_io.py](https://localhost:8080/#) in wav_data_to_samples_librosa(audio_file, sample_rate)
    169     # And back the file position to top (not need for Copy but for certainty)
    170     wav_input_file.seek(0)
--> 171     return load_audio(wav_input_file.name, sample_rate)
    172 
    173 

[/usr/local/lib/python3.7/dist-packages/note_seq/audio_io.py](https://localhost:8080/#) in load_audio(audio_filename, sample_rate)
    280     y, unused_sr = librosa.load(audio_filename, sr=sample_rate, mono=True)
    281   except Exception as e:  # pylint: disable=broad-except
--> 282     raise AudioIOReadError(e)
    283   return y
    284 

AudioIOReadError: <ufunc 'positive'>

When will usage instructions or a tutorial come out?

I tried to figure out how to run inference, which led me to figuring out how to use t5x, which is based on another confusing library called gin. I am thoroughly confused by all the new Google tooling. Please release usage instructions soon, thanks!

How to make my own dataset

Hi, thanks for sharing. If I have the source audio and the target labels (MIDI files), how do I generate a dataset (tf.data.Dataset)? Thanks.

module 'functools' has no attribute 'cached_property'

Second cell in Colab Notebook results in the following error.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-2-152851d66c54>](https://localhost:8080/#) in <module>()
     14 import seqio
     15 import t5
---> 16 import t5x
     17 
     18 from mt3 import metrics_utils

4 frames
[/content/t5x/__init__.py](https://localhost:8080/#) in <module>()
     15 """Import API modules."""
     16 
---> 17 import t5x.adafactor
     18 import t5x.checkpoints
     19 import t5x.decoding

[/content/t5x/adafactor.py](https://localhost:8080/#) in <module>()
     61 import jax.numpy as jnp
     62 import numpy as np
---> 63 from t5x import utils
     64 from t5x.optimizers import OptimizerDef
     65 from t5x.optimizers import OptimizerState

[/content/t5x/utils.py](https://localhost:8080/#) in <module>()
     41 import numpy as np
     42 import seqio
---> 43 from t5x import checkpoints
     44 from t5x import optimizers
     45 from t5x import partitioning

[/content/t5x/checkpoints.py](https://localhost:8080/#) in <module>()
     50 from t5x import checkpoint_utils
     51 from t5x import optimizers
---> 52 from t5x import partitioning
     53 from t5x import state_utils
     54 from t5x import train_state as train_state_lib

[/content/t5x/partitioning.py](https://localhost:8080/#) in <module>()
     45   cached_property = property  # pylint: disable=invalid-name
     46 else:
---> 47   cached_property = functools.cached_property  # pylint: disable=invalid-name
     48 
     49 

AttributeError: module 'functools' has no attribute 'cached_property'
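
For context, `functools.cached_property` was added in Python 3.8, so it is missing on the Python 3.7 runtime shown in the traceback. A minimal sketch of a version-tolerant fallback follows; the `Mesh` class is hypothetical and exists only to exercise the shim, and this is not the fix adopted by t5x itself.

```python
import functools
import sys

# functools.cached_property exists only on Python 3.8 and newer.
# On older interpreters, fall back to a plain (uncached) property.
cached_property = getattr(functools, "cached_property", property)


class Mesh:
    """Hypothetical class used only to exercise the shim."""

    def __init__(self):
        self.calls = 0

    @cached_property
    def size(self):
        # Count evaluations so the caching behaviour is observable.
        self.calls += 1
        return 8


mesh = Mesh()
print(sys.version_info[:2], mesh.size, mesh.size)
```

With the fallback, the attribute is simply recomputed on every access on Python 3.7, which is slower but behaves the same for pure computations.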

Colab has an import issue with jaxlib?

Hey all, I'm getting an import issue for jaxlib when trying to run the colab.
ValueError: jaxlib is version 0.1.71, but this version of jax requires version 0.1.74.

Anything obvious I'm missing? I'm on a macbook pro, 2019 I think.
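
The error message is a plain version comparison: the installed jaxlib (0.1.71) is older than the minimum this jax release accepts (0.1.74). A small sketch of that check, using a hypothetical helper and assuming simple dotted numeric versions:

```python
def parse_version(text):
    """Turn a release string such as '0.1.71' into a tuple of ints.

    Any local suffix after '+' (for example '+cuda11') is ignored.
    """
    release = text.split("+")[0]
    return tuple(int(part) for part in release.split("."))


def jaxlib_is_new_enough(installed, required):
    """Return True if the installed version meets the required minimum."""
    return parse_version(installed) >= parse_version(required)


print(jaxlib_is_new_enough("0.1.71", "0.1.74"))  # False
print(jaxlib_is_new_enough("0.1.74", "0.1.74"))  # True
```

Comparing tuples of integers avoids the string-comparison trap where "0.1.9" would sort after "0.1.74".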

Attribute error in imports and definitions

How to get the pretrained checkpoint

How can I get access to the pretrained checkpoint? I cannot see any download link in this repo.

Question about midi result bpm

In fact, I don't know if I should mention this question, but I still hope to add a bpm detection function. The default bpm of the current result is 120. But this is not accurate, so it would be better if the bpm result was also more accurate. Of course, it would be a pity if it was technically impossible. But it is not unacceptable. Please also consider my advice! Thanks!
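
To illustrate why the tempo matters for the notated result, here is a sketch that assumes the transcribed notes carry absolute onset times in seconds and that the tempo only determines where those times fall on the beat grid. The helper name and the onset values are made up for illustration.

```python
def seconds_to_beats(time_s, qpm):
    """Convert an absolute time in seconds to a beat position at the given tempo."""
    return time_s * qpm / 60.0


onsets_s = [0.0, 0.5, 1.0, 1.5]

# At the default 120 quarter notes per minute the notes land on whole beats.
print([seconds_to_beats(t, 120) for t in onsets_s])  # [0.0, 1.0, 2.0, 3.0]

# If the piece was actually played at 90 qpm, the same onsets fall elsewhere.
print([seconds_to_beats(t, 90) for t in onsets_s])   # [0.0, 0.75, 1.5, 2.25]
```

Under this assumption, playback timing is unaffected by the default tempo; only the alignment to bars and beats in a score or DAW view changes.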

Google Colab notebook running out of disk space installing tf-nightly from clu

Issue

This notebook runs out of disk space installing tf-nightly when running the Setup Environment cell.

Details

I poked around a bit, and it looks like as part of the t5x installation, clu also gets installed. During the clu installation, it looks like tf-nightly attempts to get installed twice. The first time around it grabs the most recent version, tf_nightly-2.10.0.dev20220521-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl, as seen here

The second time around, it tries to grab every version (including back to 2.9.0) from https://pypi.org/project/tf-nightly/#history, as seen here , until it eventually hits ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device

I'm not quite sure what's causing the second installation to trigger. The rest of the installation seems to go okay, but running Imports and Definitions fails out when it can't find clu

Notes

These errors came from running !pip install git+https://github.com/google/CommonLoopUtils#egg=clu , which I got from the t5x repo . If it makes more sense to open an issue there, I can do that -- I opened one here as it happened when I was running the demo notebook
This has run for me as recently as April 26, 2022. I'm not sure if this issue was still happening, but I was just not running out of disk space on Colab because there were fewer tf-nightly versions (they're each ~500MB)

Training

Hi, I'm trying to train a model for a similar task, and I'm looking for the dataset used for the project, but I don't see it in any of the files. Is it loaded through a package, or do I need to download it separately?

Question about popup warning in tensorflow in google colab

The following prompt appears when running Transcribe Audio: WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow/python/util/nest.py:914: RandomDataset.__init__ (from tensorflow.python.data.ops.dataset_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.random(...).
This warning appears to make Google Colab run very slowly. Is there a solution? Thanks!

Model conversion to tensorflow lite

Hi, this project is impressive.
I would like to try it out on mobile ios / android.
Can the model be converted easily to tensorflow lite ? What would be the required redesign ? Thanks a lot.

Upload Audio Upload widget is only available when the cell has been executed in the current browser session. Please rerun this cell to enable.

Hi all, the model works great for the guitar samples, I would like to export this model, can you please advise on this?
Thanks

Error when running to Transcribe Audio

ImportError Traceback (most recent call last)
in ()
22
23 note_seq.play_sequence(est_ns, synth=note_seq.fluidsynth,
---> 24 sample_rate=SAMPLE_RATE, sf2_path=SF2_PATH)
25 note_seq.plot_sequence(est_ns)

4 frames
/usr/local/lib/python3.7/dist-packages/pretty_midi/instrument.py in fluidsynth(self, fs, sf2_path)
454
455 if not _HAS_FLUIDSYNTH:
--> 456 raise ImportError("fluidsynth() was called but pyfluidsynth "
457 "is not installed.")
458

ImportError: fluidsynth() was called but pyfluidsynth is not installed.

NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.

How to run it on my own PC?

OSError: Unable to open file: /content/mt3/gin/model.gin. Searched config paths: [''].

Training with Vocals

Hi, this is by far the most accurate transcription!

I know the project is for piano transcription and, as you mention in the caveats, it's not trained on singing vocals.
Would it be possible to transcribe vocals as well as piano if I train this with vocals?
What should I consider if I want to use this for vocal transcriptions?

No module named 't5x'

ModuleNotFoundError Traceback (most recent call last)
in ()
14 import seqio
15 import t5
---> 16 import t5x
17
18 from mt3 import metrics_utils

ModuleNotFoundError: No module named 't5x'

Getting a ValueError when loading model

When loading the mt3 model, I'm getting a ValueError: "Configurable 'models.ContinuousInputsEncoderDecoderModel.loss_fn' doesn't have a parameter named 'z_loss'.
In file "/content/mt3/gin/model.gin", line 16
z_loss = %Z_LOSS
^"
Not sure what this means since I know nothing about code but feel like it would be helpful to address.

Question on Slakh2100 instruments

Hi !

Thanks a lot for the research, the project is neat :)
I'm trying to reproduce your results from scratch. My first trainings on GuitarSet and Maestro were successful, but when I include Slakh it seems the training loss bounces several times after some time and I wondered if I had issues with the dataset itself.

You mention in the paper in Table 7 a "Mapping of Slakh2100 “classes” to MT3 Instrument Token numbers". Do you actually restrain yourself to using only those particular MIDI programs, although tracks exhibit more instruments, and discard the rest? If not, I don't really understand the mapping in the first place, as Slakh seems to have correct MIDI programs (and do not exhibit any instrument names).

I hope my explanation of the problem is clear enough, thanks in advance for your time!

Error when I run Imports and Definitions

When I run Imports and Definitions I get the following error: ModuleNotFoundError Traceback (most recent call last)
in ()
13 import note_seq
14 import seqio
---> 15 import t5
16 import t5x
17

4 frames
/usr/local/lib/python3.7/dist-packages/t5/evaluation/metrics.py in ()
35 from t5.evaluation import qa_utils
36
---> 37 from rouge_score import rouge_scorer
38 from rouge_score import scoring
39

ModuleNotFoundError: No module named 'rouge_score'
I installed rouge_score with pip as suggested, but the import still fails.

Error importing jaxlib

Export the model to TFLite

Hi all, the model works great for the guitar samples, I would like to export this model, can you please advise on this?
Is the checkpoint you provide only for training and inference, or also for deployment?

Thanks

Transcribing mt3 step error

When running the transcribing step on a .wav file (subwoofer lullaby from minecraft volume alpha if that helps) I'm now getting an error stating "ImportError: fluidsynth() was called but pyfluidsynth is not installed."
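
The error is raised when the `fluidsynth` Python module (provided by the pyfluidsynth package) cannot be imported. A minimal sketch for checking this before attempting playback, using a hypothetical `has_module` helper:

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


if has_module("fluidsynth"):
    print("pyfluidsynth is importable; synthesized playback should be possible.")
else:
    print("pyfluidsynth is missing; install it or skip the playback step.")
```

Checking up front separates a missing optional dependency from a failure in the transcription itself, since only the playback call needs the synthesizer.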

this is a bug

This is the best music transcription model I have seen so far, but I run into bugs when running:

python mt3/task.py

How to transcribe music with mt3?

Hello! I want to transcribe multi-instrument music with this repository. I cloned it and installed the requirements. However, I am unsure which file I need to run in order to automatically transcribe music.

Track IDs of the split of musicnet dataset

Hi, how can I get track IDs of the split of musicnet dataset? Are they in the released dataset folder?

Can I get an interpretation of what the dense layer output with shape 1536 means?

I know the MIDI instrument programs 0-127 are included in these 1536 outputs, but where exactly is that region located? I'm asking because I want to apply a mask to the raw output to constrain the predicted instrument types before the softmax layer. I would appreciate any help with this!
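
The masking idea described above can be sketched independently of the model. The example below sets disallowed logits to negative infinity before the softmax so they receive zero probability. The logits and allowed indices are toy values; where the instrument tokens sit inside the 1536-wide output is not stated here, so that mapping is left out.

```python
import math


def masked_softmax(logits, allowed):
    """Softmax over logits, forcing every index outside `allowed` to probability 0."""
    if not allowed:
        raise ValueError("at least one index must be allowed")
    # Disallowed positions get -inf, so exp() maps them to exactly 0.0.
    masked = [x if i in allowed else -math.inf for i, x in enumerate(logits)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in masked]
    total = sum(exps)
    return [e / total for e in exps]


probs = masked_softmax([2.0, 1.0, 0.5, 3.0], allowed={0, 2})
print(probs)
```

Masking before the softmax, rather than zeroing probabilities afterwards, keeps the remaining probabilities normalized without a second pass.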

Read the weights in _weird_ `0.0` file

How to read the weights present in 0.0 file? I want to put them in .pkl or .pt format.

Music Transcription with Transformers colab notebook get error AttributeError: module 'orbax.checkpoint' has no attribute 'Checkpointer'

Detail

AttributeError Traceback (most recent call last)
in ()
14 import seqio
15 import t5
---> 16 import t5x
17
18 from mt3 import metrics_utils

2 frames
/content/t5x/__init__.py in ()
15 """Import API modules."""
16
---> 17 import t5x.adafactor
18 import t5x.checkpoints
19 import t5x.decoding

/content/t5x/adafactor.py in ()
61 import jax.numpy as jnp
62 import numpy as np
---> 63 from t5x import utils
64 from t5x.optimizers import OptimizerDef
65 from t5x.optimizers import OptimizerState

/content/t5x/utils.py in ()
152
153
--> 154 class LegacyCheckpointer(orbax.checkpoint.Checkpointer):
155 """Implementation of Checkpointer interface for T5X.
156

AttributeError: module 'orbax.checkpoint' has no attribute 'Checkpointer'

ImportError: cannot import name 'Protocol' from 'typing'

ImportError Traceback (most recent call last)
in ()
14 import seqio
15 import t5
---> 16 import t5x
17
18 from mt3 import metrics_utils

2 frames
/content/t5x/utils.py in ()
26 import time
27 import typing
---> 28 from typing import Any, Callable, Iterable, Mapping, Optional, Sequence, Tuple, Type, Union, Protocol
29 import warnings
30

ImportError: cannot import name 'Protocol' from 'typing' (/usr/lib/python3.7/typing.py)

NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.

When I run Imports and Definitions, it always shows this error.

How can I solve it?
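
For context, `typing.Protocol` was added in Python 3.8, and the traceback shows a Python 3.7 runtime. A common pattern, sketched below, falls back to the separately installed `typing_extensions` backport; this assumes that package is available on older interpreters and is not necessarily what t5x does. The `HasName` and `Track` classes are hypothetical.

```python
try:
    from typing import Protocol  # available on Python 3.8 and newer
except ImportError:
    from typing_extensions import Protocol  # backport, needed on Python 3.7


class HasName(Protocol):
    """Structural type: anything with a string `name` attribute."""

    name: str


class Track:
    def __init__(self, name):
        self.name = name


def describe(item: HasName) -> str:
    return "track: " + item.name


print(describe(Track("piano")))  # track: piano
```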

Export the Midi-like tokenized file

Hi all, I would like to have access to the data in a text format like.
How is it possible to get the tokenised file?

Strange tracks and names in the transcribed MIDI

I have been using the scripts for a while, and they are amazing. However, I get lost among the track names when I try to recover the guitars and Arabic instruments from the MIDI. Sometimes it splits them so much that it is hard to tell apart small and large parts of the same instrument, and I worry about overlapping the tunes. Only the drums can be found easily, as they are on channel 10. Note: I am using FL Studio's embedded FLEX system to import the MIDI file. Is there any workaround for this?

Training Facility and Time?

Thank you for open-sourcing your wonderful work. I am curious how many TPUs were used and how long it took to train the model.

Tutorial on how to install on Windows or Ubuntu

Hello, I was able to test Music Transcription in Colab, and I thought it was simply magnificent, however, I would like to be able to test this tool locally,

As I have Windows OS, it is to be expected that some lib is not compatible,

So I would like to know if it is possible to refactor your install and import code directly for Linux or Windows?

Because we know that Google colab has many hardware limitations, and making this update for Windows compatibility, it would be very interesting and still more practical,

Thanks in advance for providing this unique tool,

Regards,
Lucas Rodrigues.

Error when opening the '.zarray' file in mt3 checkpoints

I'm trying to run the inference part on a Slurm HPC cluster (system: CentOS 7.3) and have run into an issue.
The issue occurs while the checkpoint is loading:
inference_model = InferenceModel(checkpoint_path, MODEL)
I searched for solutions on Google but found few. I would appreciate any advice.

The following is some of my packages which may be related to this issue:

python 3.7
cuda 11.2.2
cudnn 8.1a11
tensorflow 2.8.0
flax 0.4.0
jax 0.3.4
jaxlib 0.3.2+cuda11.cudnn82
xarray 0.20.2
zarr 2.11.1

The error information:

Traceback (most recent call last):
File "mt3_inference.py", line 267, in
inference_model = InferenceModel('./checkpoints/mt3/', 'mt3')

File "mt3_inference.py", line 114, in __init__
self.restore_from_checkpoint(checkpoint_path)

File "mt3_inference.py", line 161, in restore_from_checkpoint
[restore_checkpoint_cfg], init_rng=jax.random.PRNGKey(0))

File "/gpfsnyu/scratch/kf2395/new_trail/t5x/utils.py", line 522, in from_checkpoint_or_scratch
return (self.from_checkpoint(ckpt_cfgs, ds_iter=ds_iter, init_rng=init_rng)

File "/gpfsnyu/scratch/kf2395/new_trail/t5x/utils.py", line 508, in from_checkpoint
self.from_checkpoints(ckpt_cfgs, ds_iter=ds_iter, init_rng=init_rng))

File "/gpfsnyu/scratch/kf2395/new_trail/t5x/utils.py", line 466, in from_checkpoints
yield _restore_path(path, restore_cfg)

File "/gpfsnyu/scratch/kf2395/new_trail/t5x/utils.py", line 458, in _restore_path
fallback_state=fallback_state)

File "/gpfsnyu/scratch/kf2395/new_trail/t5x/checkpoints.py", line 861, in restore
lazy_parameters=lazy_parameters)

File "/gpfsnyu/scratch/kf2395/new_trail/t5x/checkpoints.py", line 910, in _read_state_from_tensorstore
state_dict = _run_future_tree(future_state_dict)

File "/gpfsnyu/scratch/kf2395/new_trail/t5x/checkpoints.py", line 161, in _run_future_tree
leaves = loop.run_until_complete(asyncio.gather(*future_leaves))

File "/gpfsnyu/scratch/kf2395/.cache/env/tf2-gpu-py3.7/lib/python3.7/site-packages/nest_asyncio.py", line 81, in run_until_complete
return f.result()

File "/gpfsnyu/scratch/kf2395/.cache/env/tf2-gpu-py3.7/lib/python3.7/asyncio/tasks.py", line 251, in __step
result = coro.throw(exc)

File "/gpfsnyu/scratch/kf2395/new_trail/t5x/checkpoint_importer.py", line 114, in _get_and_cast
arr = await self._get_fn() # pytype: disable=bad-return-type

File "/gpfsnyu/scratch/kf2395/new_trail/t5x/checkpoints.py", line 1241, in _read_ts
t = await ts.open(tmp_ts_spec_dict, open=True)

File "/gpfsnyu/scratch/kf2395/.cache/env/tf2-gpu-py3.7/lib/python3.7/asyncio/futures.py", line 263, in __await__
yield self # This tells Task to wait for completion.

File "/gpfsnyu/scratch/kf2395/.cache/env/tf2-gpu-py3.7/lib/python3.7/asyncio/tasks.py", line 318, in __wakeup
future.result()

File "/gpfsnyu/scratch/kf2395/.cache/env/tf2-gpu-py3.7/lib/python3.7/asyncio/futures.py", line 181, in result
raise self._exception

ValueError: Error opening "zarr" driver: Error reading local file "./checkpoints/mt3/target.decoder.layers_0.encoder_decoder_attention.key.kernel/.zarray": Invalid key: "./checkpoints/mt3/target.decoder.layers_0.encoder_decoder_attention.key.kernel/.zarray"

AttributeError: module 'jax' has no attribute 'tree_multimap'

The jax package doesn't seem to have the correct version installed.
This version of the jax package does not seem to have a function called 'tree_multimap'

I'm having trouble knowing where to start

I was trying to train the model with some more noise added to the data, and I'm confused. I downloaded the mt3 data, but it's the raw data. Has anybody gotten it into a train, Val, test format and started training the mt3 model for multi-instrument transcription? If so, how did you end up doing that? I've been trying for a few days without success; any help would be greatly appreciated

(note I attach this to every message I send after I haven't slept in a while just to know I haven't slept in a few days and could come off sounding weird)

Track/instrument naming

I see there are multiple references to instrument names, particularly in preprocessors.py. However, transcribed midi file is saved with all tracks unnamed. I tried to look through the code to see if names would be easy to add (e.g. as per _SLAKH_CLASS_PROGRAMS), but unfortunately the code is largely sorcery to me without proper diving.

Any of the following solutions will solve my problem at hand:
a) Only drum tracks are named drum tracks.
b) All tracks are named according to instrument.
c) Some other way to output information in the process which track numbers are drum tracks.

Anybody who's more familiar with the code, any tips on passing the instrument names to the resulting midi file? Is this doable?
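
One possible approach to options (a) and (b), sketched under the assumption that each transcribed track exposes a MIDI program number and a drum flag (pretty_midi's `Instrument`, for example, has `program`, `is_drum` and `name` attributes). The helper and the small name table are illustrative; the table lists only a few General MIDI programs, zero-indexed.

```python
# A few General MIDI program names (zero-indexed); extend as needed.
GM_NAMES = {
    0: "Acoustic Grand Piano",
    24: "Acoustic Guitar (nylon)",
    32: "Acoustic Bass",
    40: "Violin",
}


def track_name(program, is_drum):
    """Pick a display name for a track from its program number and drum flag."""
    if is_drum:
        return "Drums"
    return GM_NAMES.get(program, "Program {}".format(program))


# Hypothetical (program, is_drum) pairs standing in for transcribed tracks.
tracks = [(0, False), (32, False), (0, True), (99, False)]
print([track_name(program, is_drum) for program, is_drum in tracks])
```

The resulting string could then be assigned to each track's name before the MIDI file is written, so that drum tracks are identifiable without relying on the channel number.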

Error when trying to load the model

To run the code locally, I copied the code from the Colab notebook and set up the environment.
However, running the notebook code produced the following error, with this console output:

/home/lwd/.conda/envs/lwd_mt3/lib/python3.7/site-packages/flax/optim/base.py:52: DeprecationWarning: Use optax instead of flax.optim. Refer to the update guide https://flax.readthedocs.io/en/latest/howtos/optax_update_guide.html for detailed instructions.
'for detailed instructions.', DeprecationWarning)
/home/lwd/.conda/envs/lwd_mt3/lib/python3.7/site-packages/jax/experimental/pjit.py:183: UserWarning: pjit is an experimental feature and probably has bugs!
warn("pjit is an experimental feature and probably has bugs!")
/home/lwd/.conda/envs/lwd_mt3/lib/python3.7/site-packages/jax/_src/lib/xla_bridge.py:430: UserWarning: jax.host_count has been renamed to jax.process_count. This alias will eventually be removed; please update your code.
"jax.host_count has been renamed to jax.process_count. This alias "
Traceback (most recent call last):
File "/mnt/fast/lwd/aisheet/test.py", line 252, in
inference_model = InferenceModel(checkpoint_path, MODEL)
File "/mnt/fast/lwd/aisheet/test.py", line 88, in __init__
self.restore_from_checkpoint(checkpoint_path)
File "/mnt/fast/lwd/aisheet/test.py", line 134, in restore_from_checkpoint
[restore_checkpoint_cfg], init_rng=jax.random.PRNGKey(0))
File "/mnt/fast/lwd/aisheet/t5x/utils.py", line 522, in from_checkpoint_or_scratch
return (self.from_checkpoint(ckpt_cfgs, ds_iter=ds_iter, init_rng=init_rng)
File "/mnt/fast/lwd/aisheet/t5x/utils.py", line 508, in from_checkpoint
self.from_checkpoints(ckpt_cfgs, ds_iter=ds_iter, init_rng=init_rng))
File "/mnt/fast/lwd/aisheet/t5x/utils.py", line 466, in from_checkpoints
yield _restore_path(path, restore_cfg)
File "/mnt/fast/lwd/aisheet/t5x/utils.py", line 458, in _restore_path
fallback_state=fallback_state)
File "/mnt/fast/lwd/aisheet/t5x/checkpoints.py", line 880, in restore
return self._restore_train_state(state_dict)
File "/mnt/fast/lwd/aisheet/t5x/checkpoints.py", line 891, in _restore_train_state
train_state, train_state_axes)
File "/mnt/fast/lwd/aisheet/t5x/partitioning.py", line 639, in move_params_to_devices
train_state, _ = p_id_fn(train_state, jnp.ones((), dtype=jnp.uint32))
File "/mnt/fast/lwd/aisheet/t5x/partitioning.py", line 729, in __call__
return self._pjitted_fn(*args)
File "/home/lwd/.conda/envs/lwd_mt3/lib/python3.7/site-packages/jax/experimental/pjit.py", line 266, in wrapped
args_flat, params, _, out_tree, _ = infer_params(*args, **kwargs)
File "/home/lwd/.conda/envs/lwd_mt3/lib/python3.7/site-packages/jax/experimental/pjit.py", line 250, in infer_params
tuple(isinstance(a, GDA) for a in args_flat))
File "/home/lwd/.conda/envs/lwd_mt3/lib/python3.7/site-packages/jax/linear_util.py", line 272, in memoized_fun
ans = call(fun, *args)
File "/home/lwd/.conda/envs/lwd_mt3/lib/python3.7/site-packages/jax/experimental/pjit.py", line 385, in _pjit_jaxpr
allow_uneven_sharding=False)
File "/home/lwd/.conda/envs/lwd_mt3/lib/python3.7/site-packages/jax/experimental/pjit.py", line 581, in _check_shapes_against_resources
raise ValueError(f"One of {what} was given the resource assignment "
ValueError: One of pjit arguments was given the resource assignment of PartitionSpec(None, 'model'), which implies that the size of its dimension 1 should be divisible by 3, but it is equal to 1024

This occurs when executing the line around 252: inference_model = InferenceModel(checkpoint_path, MODEL)
I have no idea why this happened. I hope you can help me work this out, thanks!
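
The final error is an arithmetic constraint: a dimension of size 1024 is split along the 'model' axis, and the message says it must be divisible by 3. That suggests, as an assumption, that the model-parallel axis spanned 3 devices on this machine. A small sketch of the check, with a hypothetical helper that lists partition counts that would divide the dimension evenly:

```python
def valid_partition_counts(dim_size, max_partitions):
    """List the partition counts up to max_partitions that divide dim_size evenly."""
    return [n for n in range(1, max_partitions + 1) if dim_size % n == 0]


print(1024 % 3)                         # 1, so three-way sharding is uneven
print(valid_partition_counts(1024, 8))  # [1, 2, 4, 8]
```

Since 1024 is a power of two, only power-of-two partition counts divide it, which is consistent with the error appearing on a three-device setup.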

pip takes too much time to install the dependencies

pip takes over an hour and it's still installing...

Missing cached tasks

The gs://mt3/data/cache_tasks folder only contains cached tasks for the Maestrov3 task, which means that the 'mega_notes_ties_vb1_train' dataset mixture can't be used for training. Is there a way to generate cached tasks locally? There are gin flags like 'USE_CACHED_TASKS' but they seem to only read the data, not write it.

Training with uncached tasks using this command:
python ~/venvs/mt3env/manual_installs/t5x/train.py --gin_file="mt3/gin/model.gin" --gin_file="mt3/gin/train.gin" --gin_file="mt3/gin/mt3.gin" --gin.MODEL_DIR="'/Mypath/to/models'" --gin.CHECKPOINT_PATH="'gs://mt3/checkpoints/mt3/checkpoint'" --gin.TASK_PREFIX="'mega_notes_ties'" --gin.USE_CACHED_TASKS=False --gin.utils.DatasetConfig.batch_size=1
produces the following error:

Traceback (most recent call last):
  File "/home/user/venvs/mt3env/manual_installs/t5x/t5x/train.py", line 617, in <module>
    gin_utils.run(main)
  File "/home/user/venvs/mt3env/manual_installs/t5x/t5x/gin_utils.py", line 103, in run
    app.run(
  File "/home/user/venvs/mt3env/lib/python3.9/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/user/venvs/mt3env/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/user/venvs/mt3env/manual_installs/t5x/t5x/train.py", line 596, in main
    _main(argv)
  File "/home/user/venvs/mt3env/manual_installs/t5x/t5x/train.py", line 614, in _main
    train_using_gin()
  File "/home/user/venvs/mt3env/lib/python3.9/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/user/venvs/mt3env/lib/python3.9/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/user/venvs/mt3env/lib/python3.9/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/user/venvs/mt3env/manual_installs/t5x/t5x/train.py", line 245, in train
    train_ds = get_dataset_fn(train_dataset_cfg, ds_shard_id, num_ds_shards,
  File "/home/user/venvs/mt3env/manual_installs/t5x/t5x/utils.py", line 891, in get_dataset
    return get_dataset_inner(cfg, shard_info, feature_converter_cls, seed,
  File "/home/user/venvs/mt3env/manual_installs/t5x/t5x/utils.py", line 906, in get_dataset_inner
    ds = seqio.get_dataset(
  File "/home/user/venvs/mt3env/lib/python3.9/site-packages/seqio/dataset_providers.py", line 1593, in get_dataset
    ds = mixture_or_task.get_dataset(
  File "/home/user/venvs/mt3env/lib/python3.9/site-packages/seqio/dataset_providers.py", line 1396, in get_dataset
    rates = [self.get_rate(task) for task in tasks]
  File "/home/user/venvs/mt3env/lib/python3.9/site-packages/seqio/dataset_providers.py", line 1396, in <listcomp>
    rates = [self.get_rate(task) for task in tasks]
  File "/home/user/venvs/mt3env/lib/python3.9/site-packages/seqio/dataset_providers.py", line 1290, in get_rate
    value += float(rate(task) if callable(rate) else rate)
  File "/home/user/venvs/mt3env/lib/python3.9/site-packages/seqio/utils.py", line 672, in mixing_rate_num_examples
    ret *= scale
TypeError: unsupported operand type(s) for *=: 'NoneType' and 'float'
  In call to configurable 'train' (<function train at 0x7f7cb8aec940>)
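
The last frames show `ret *= scale` failing because `ret` is None, which suggests the example count used to compute the mixing rate was not available (an assumption; the thread does not confirm the cause). The sketch below reproduces that failure mode in plain Python and shows one possible guard. It does not call seqio, and `scaled_rate` and its `fallback` parameter are hypothetical names.

```python
from typing import Optional


def scaled_rate(num_examples: Optional[float], scale: float = 1.0,
                fallback: Optional[float] = None) -> float:
    """Scale an example count into a mixing rate, guarding against None."""
    if num_examples is None:
        if fallback is None:
            raise ValueError("example count is unknown and no fallback rate was given")
        return fallback
    return num_examples * scale


# Reproduce the failure mode from the traceback: None multiplied by a float.
try:
    ret = None
    ret *= 1.0
except TypeError as err:
    print(err)

# With a guard, an unknown count falls back to an explicit rate instead of crashing.
print(scaled_rate(None, fallback=1.0))
print(scaled_rate(200.0, scale=0.5))
```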

Unable to run the script offline

Is there any way to clone this project and run it on my pc offline?

Add Usage tutorial.

Question about missing melody instrument phrases

Thank you for creating such an interesting tool, but I am puzzled why the melody instruments in the intro or interlude are incomplete or missing in the MIDI results. For example, when the source file contains melody instruments such as flutes, their phrases in the MIDI result are often incomplete or lost entirely. Is there a solution? Thanks!

colab notebook throws error in "Imports and Definitions" section

It was working fine until a few days ago, but now the "Imports and Definitions" section fails with:

/usr/local/lib/python3.7/dist-packages/jax/_src/lib/__init__.py in check_jaxlib_version(jax_version, jaxlib_version, minimum_jaxlib_version)
     80            f'incompatible with jax version {jax_version}. Please '
     81            'update your jax and/or jaxlib packages.')
---> 82     raise RuntimeError(msg)
     83 
     84   return _jaxlib_version

RuntimeError: jaxlib version 0.3.20 is newer than and incompatible with jax version 0.3.15. Please update your jax and/or jaxlib packages.
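
The error reports that the installed jaxlib (0.3.20) is newer than the installed jax (0.3.15). The sketch below only illustrates that kind of comparison. It is not the actual check inside jax, and `parse_version` and `jaxlib_is_newer` are hypothetical helper names.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '0.3.15' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def jaxlib_is_newer(jax_version: str, jaxlib_version: str) -> bool:
    """Return True when jaxlib is strictly newer than jax."""
    return parse_version(jaxlib_version) > parse_version(jax_version)


# Versions taken from the error message above.
print(jaxlib_is_newer("0.3.15", "0.3.20"))
```

Comparing integer tuples rather than raw strings matters here, since a string comparison would rank "0.3.9" above "0.3.15". Installing a matching pair of jax and jaxlib versions is one possible way to clear this error.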

Is there an exe version?

I need one.

Fail to install

note-seq requires protobuf>=4.21.2, but tensorflow requires protobuf==3.20.x.
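
The two requirements quoted above cannot both hold for any single protobuf version. The sketch below checks that with plain tuple comparisons. The candidate list is illustrative only, and the helper names are hypothetical; this is not how pip resolves dependencies.

```python
def meets_minimum(version: tuple, minimum: tuple) -> bool:
    """True when version >= minimum, e.g. the constraint protobuf>=4.21.2."""
    return version >= minimum


def in_series(version: tuple, series: tuple) -> bool:
    """True when version starts with series, e.g. the constraint protobuf==3.20.x."""
    return version[:len(series)] == series


# Illustrative candidate versions (an assumption, not an exhaustive list).
candidates = [(3, 20, 3), (4, 21, 2), (4, 21, 12)]

compatible = [
    v for v in candidates
    if meets_minimum(v, (4, 21, 2)) and in_series(v, (3, 20))
]
print(compatible)  # empty: no version satisfies both constraints
```

Since no version can satisfy both constraints, the conflict has to be resolved by changing which versions of note-seq or tensorflow are installed.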

Weird midi output file

Playing the first track from the beginning does not sound the same as playing it after pausing and resuming.
I ran it on Colab using the mt3 model.

transcribed-120s.-.Cakewalk.2022-12-08.10-11-32.mp4

The following link contains the input audio and the mt3 output midi.
https://drive.google.com/drive/folders/1gO5zZl4pVAijgqCUOEE8qvUZeTL3xncp?usp=sharing

I can't get to the part where I upload my file.

Clear output executed at 7:53 PM (0 minutes ago) executed in 13.212s

/usr/local/lib/python3.7/dist-packages/flax/optim/base.py:52: DeprecationWarning: Use optax instead of flax.optim. Refer to the update guide https://flax.readthedocs.io/en/latest/howtos/optax_update_guide.html for detailed instructions.
'for detailed instructions.', DeprecationWarning)
/usr/local/lib/python3.7/dist-packages/jax/_src/lib/xla_bridge.py:435: UserWarning: jax.host_count has been renamed to jax.process_count. This alias will eventually be removed; please update your code.
"jax.host_count has been renamed to jax.process_count. This alias "

Additionally:

No files selected. Upload widget is only available when the cell has been executed in the current browser session. Please rerun this cell to enable.

MessageError Traceback (most recent call last)
in ()
4
5 log_event('uploadAudioStart', {})
----> 6 audio = upload_audio(sample_rate=SAMPLE_RATE)
7 log_event('uploadAudioComplete', {'value': round(len(audio) / SAMPLE_RATE)})
8

4 frames

/usr/local/lib/python3.7/dist-packages/google/colab/_message.py in read_reply_from_input(message_id, timeout_sec)
104 reply.get('colab_msg_id') == message_id):
105 if 'error' in reply:
--> 106 raise MessageError(reply['error'])
107 return reply.get('data', None)
108

MessageError: TypeError: google.colab._files is undefined
No idea what is going on.
