
grandaddyshmax / audiocraft_plus


This project forked from facebookresearch/audiocraft

527 stars · 527 watchers · 62 forks · 1.42 MB

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

License: MIT License

Python 98.82% Makefile 0.15% Dockerfile 0.10% CSS 0.15% HTML 0.76%

audiocraft_plus's People

Contributors

adefossez, adiyoss, ashleykleynhans, bocytko, carankt, felixkreuk, frinkleko, grandaddyshmax, jadecopet, jamierpond, jonathanfly, mimbres, patrickvonplaten, radames, sanchit-gandhi, srezasm, sungeuns, syhw


audiocraft_plus's Issues

--listen flag returns error

Hello, I'm trying to use the --listen flag so that I can access the UI from another computer on my LAN, and I get the following error:
app.py: error: argument --listen: expected one argument

I've tried putting the port number and/or IP (x.x.x.x:xxxx) in various ways, and each time I get a 'False ERROR: [Errno -2] Name or service not known' or 'False ERROR: [Errno 99] error while attempting to bind on address ('0.0.30.180', 7860): cannot assign requested address' or a similar error.

How do we use --listen properly with this code?

EDIT: OK, I figured it out. If you are having the same issue, try passing --listen 0.0.0.0
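
For reference, a minimal sketch of how such a flag is typically wired up to Gradio; the argument name matches the issue, but the default value and the surrounding code are assumptions, not the app's exact implementation:

import argparse

import gradio as gr

parser = argparse.ArgumentParser()
# argparse raises "expected one argument" when --listen is passed without a value
parser.add_argument("--listen", type=str, default="127.0.0.1",
                    help="address to bind the UI to; 0.0.0.0 exposes it on all interfaces")
args = parser.parse_args()

with gr.Blocks() as demo:
    gr.Markdown("placeholder UI")  # stands in for the real interface definition

demo.launch(server_name=args.listen, server_port=7860)

Launching with python app.py --listen 0.0.0.0 then serves the UI on all network interfaces.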

Problem loading new stereo music gen models (mac-os-fix)

commit 882222d branch mac-os-fix

I'm having a problem loading a custom model, one of the new stereo ones. I'm largely new to Torch and Gradio, so I think I'm doing it wrong.

I downloaded state_dict.bin from https://huggingface.co/facebook/musicgen-stereo-melody/tree/main and saved it as musicgen-stereo-melody.pt in models/.
The Gradio app finds the file, and I set the following configuration for the model in the UI:

[Screenshot 2023-11-09 at 15 48 40]

But when generating, Gradio raises an error; something seems to be in the wrong format:

Loading model GrandaddyShmax/musicgen-custom
RuntimeError: Error(s) in loading state_dict for LMModel:
/Users/user/.local/share/virtualenvs/audiocraft_plus-v8WMZNJZ/lib/python3.11/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
Traceback (most recent call last):
  File "/Users/user/.local/share/virtualenvs/audiocraft_plus-v8WMZNJZ/lib/python3.11/site-packages/gradio/queueing.py", line 407, in call_prediction
    output = await route_utils.call_process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/.local/share/virtualenvs/audiocraft_plus-v8WMZNJZ/lib/python3.11/site-packages/gradio/route_utils.py", line 226, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/.local/share/virtualenvs/audiocraft_plus-v8WMZNJZ/lib/python3.11/site-packages/gradio/blocks.py", line 1550, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/.local/share/virtualenvs/audiocraft_plus-v8WMZNJZ/lib/python3.11/site-packages/gradio/blocks.py", line 1185, in call_function
    prediction = await anyio.to_thread.run_sync(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/.local/share/virtualenvs/audiocraft_plus-v8WMZNJZ/lib/python3.11/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/.local/share/virtualenvs/audiocraft_plus-v8WMZNJZ/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/user/.local/share/virtualenvs/audiocraft_plus-v8WMZNJZ/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/.local/share/virtualenvs/audiocraft_plus-v8WMZNJZ/lib/python3.11/site-packages/gradio/utils.py", line 661, in wrapper
    response = f(*args, **kwargs)
               ^^^^^^^^^^^^^^^^^^
  File "/Users/user/dev/audio-generation/audiocraft_plus/app.py", line 738, in predict_full
    load_model(model, custom_model, base_model, gen_type)
  File "/Users/user/dev/audio-generation/audiocraft_plus/app.py", line 154, in load_model
    MODEL.lm.load_state_dict(torch.load(file_path))
  File "/Users/user/.local/share/virtualenvs/audiocraft_plus-v8WMZNJZ/lib/python3.11/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LMModel:
	Missing key(s) in state_dict: "condition_provider.conditioners.self_wav.output_proj.weight", "condition_provider.conditioners.self_wav.output_proj.bias", "condition_provider.conditioners.self_wav.chroma.spec.window", "condition_provider.conditioners.description.output_proj.weight", "condition_provider.conditioners.description.output_proj.bias", "emb.0.weight", "emb.1.weight", "emb.2.weight", "emb.3.weight", "transformer.layers.0.self_attn.in_proj_weight", "transformer.layers.0.self_attn.out_proj.weight", "transformer.layers.0.linear1.weight", "transformer.layers.0.linear2.weight"

...

nsformer.layers.46.norm2.weight", "transformer.layers.46.norm2.bias", "transformer.layers.47.self_attn.in_proj_weight", "transformer.layers.47.self_attn.out_proj.weight", "transformer.layers.47.linear1.weight", "transformer.layers.47.linear2.weight", "transformer.layers.47.norm1.weight", "transformer.layers.47.norm1.bias", "transformer.layers.47.norm2.weight", "transformer.layers.47.norm2.bias", "out_norm.weight", "out_norm.bias", "linears.0.weight", "linears.1.weight", "linears.2.weight", "linears.3.weight".
	Unexpected key(s) in state_dict: "best_state", "xp.cfg", "version", "exported".

What would be the correct steps to get this working?

I assume I messed up because I don't have the compression_state_dict.bin, but I couldn't figure out how to put that into the models folder with the .pt file extension that the Gradio app looks for.

Apple M1 Pro, 16GB model.
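
The unexpected keys in the traceback ("best_state", "xp.cfg", "version", "exported") suggest the downloaded state_dict.bin is an exported checkpoint package that wraps the actual LM weights rather than a raw state dict. A minimal sketch of unwrapping it before dropping it into models/, assuming the "best_state" entry holds the weights (and noting the stereo models may still need a matching architecture on the app side):

import torch

# the exported package stores metadata alongside the weights;
# the LM state dict itself appears to live under "best_state"
pkg = torch.load("state_dict.bin", map_location="cpu")
torch.save(pkg["best_state"], "models/musicgen-stereo-melody.pt")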

Melody - Connection errored out

First, AudioCraft Plus is great. Thank you very much.

In MusicGen, with the Audio tab's Input Audio Mode set to Melody and the Settings tab's model set to Melody, the only way I can make it work is to generate the audio inside the MusicGen Generation tab and use the "Send to Input Audio" option first.

I can't make it work when I drag the file to the Audio tab myself; no matter the audio extension or length, nothing seems to work, not even using the same audio file that was sent with the "Send to Input Audio" option (I download it to my PC and drag it there).

Is this normal?

Thank you

Thank you very much for your UI! I have a few suggestions, if you don't mind.

  1. Could the default location for output files be something like ./output/<date>/? When generating a lot of files, it's difficult to navigate.
  2. Could we store generation information (prompt, settings, seed) inside the sound file? Or, if that's impossible, maybe generate a small .txt file with the same name as the sound file. It would be very useful.
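
Regarding suggestion 2, a minimal sketch of what embedding the generation info could look like with pytaglib (which this project already uses, per the packaging issue further down this page); the tag name and the serialized format here are assumptions:

import taglib

song = taglib.File("output/my_track.wav")
# store prompt, settings and seed in a plain-text comment tag
song.tags["COMMENT"] = ["prompt: driving rock | duration: 30 | seed: 42"]
song.save()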

Error caught was: No module named 'triton' on Windows 11

venv config

home = C:\Program Files\Python310
implementation = CPython
version_info = 3.10.0.candidate.2
virtualenv = 20.16.7
include-system-site-packages = false
base-prefix = C:\Program Files\Python310
base-exec-prefix = C:\Program Files\Python310
base-executable = C:\Program Files\Python310\python.exe

Error output when running app.py:

E:\AIProject\audiocraft_plus>set TRANSFORMERS_CACHE=.\cache

E:\AIProject\audiocraft_plus>.\venv\Scripts\python.exe .\app.py
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton'
Traceback (most recent call last):
  File "E:\AIProject\audiocraft_plus\app.py", line 33, in <module>
    from audiocraft.data.audio_utils import convert_audio
  File "E:\AIProject\audiocraft_plus\audiocraft\__init__.py", line 24, in <module>
    from . import data, modules, models
  File "E:\AIProject\audiocraft_plus\audiocraft\data\__init__.py", line 10, in <module>
    from . import audio, audio_dataset, info_audio_dataset, music_dataset, sound_dataset
  File "E:\AIProject\audiocraft_plus\audiocraft\data\info_audio_dataset.py", line 19, in <module>
    from ..modules.conditioners import SegmentWithAttributes, ConditioningAttributes
  File "E:\AIProject\audiocraft_plus\audiocraft\modules\conditioners.py", line 21, in <module>
    import spacy
  File "E:\AIProject\audiocraft_plus\venv\lib\site-packages\spacy\__init__.py", line 14, in <module>
    from . import pipeline  # noqa: F401
  File "E:\AIProject\audiocraft_plus\venv\lib\site-packages\spacy\pipeline\__init__.py", line 1, in <module>
    from .attributeruler import AttributeRuler
  File "E:\AIProject\audiocraft_plus\venv\lib\site-packages\spacy\pipeline\attributeruler.py", line 6, in <module>
    from .pipe import Pipe
  File "spacy\pipeline\pipe.pyx", line 1, in init spacy.pipeline.pipe
  File "spacy\vocab.pyx", line 1, in init spacy.vocab
  File "E:\AIProject\audiocraft_plus\venv\lib\site-packages\spacy\tokens\__init__.py", line 1, in <module>
    from .doc import Doc
  File "spacy\tokens\doc.pyx", line 36, in init spacy.tokens.doc
  File "E:\AIProject\audiocraft_plus\venv\lib\site-packages\spacy\schemas.py", line 6, in <module>
    from pydantic import StrictStr, StrictInt, StrictFloat, StrictBool, ConstrainedStr
  File "E:\AIProject\audiocraft_plus\venv\lib\site-packages\pydantic\__init__.py", line 363, in __getattr__
    return _getattr_migration(attr_name)
  File "E:\AIProject\audiocraft_plus\venv\lib\site-packages\pydantic\_migration.py", line 302, in wrapper
    raise PydanticImportError(f'`{import_path}` has been removed in V2.')
pydantic.errors.PydanticImportError: `pydantic:ConstrainedStr` has been removed in V2.

For further information visit https://errors.pydantic.dev/2.5/u/import-error

E:\AIProject\audiocraft_plus>pause

I don't know what to do. Please help me.
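
For what it's worth, the Triton warning itself is benign on Windows; the actual failure is spacy importing ConstrainedStr, which pydantic removed in v2 (as the error itself says). A likely fix, assuming the installed spacy release predates pydantic v2 support, is to pin pydantic back (or upgrade spacy):

pip install "pydantic<2"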

Error when generating music

I tried to run it a few times, but it keeps saying error. I did install PyTorch with CUDA.
[image]

OS Name Microsoft Windows 11 Pro
Version 10.0.22621 Build 22621
Processor AMD Ryzen 9 5900X 12-Core Processor, 4301 Mhz, 12 Core(s), 24 Logical Processor(s)
Installed Physical Memory (RAM) 64.0 GB
Name NVIDIA GeForce RTX 3090 Ti
Adapter RAM (1,048,576) bytes
Driver Version 31.0.15.3742

'TypeError: can only concatenate str (not "NoneType") to str' problem and solution here!

When I click on the Generate button, I get this error:
I:\audiocraft_plus\venv\lib\site-packages\gradio\components\textbox.py:163: UserWarning: Using the update method is deprecated. Simply return a new object instead, e.g. return gr.Textbox(...) instead of return gr.Textbox.update(...).
warnings.warn(
Traceback (most recent call last):
File "I:\audiocraft_plus\venv\lib\site-packages\gradio\queueing.py", line 406, in call_prediction
output = await route_utils.call_process_api(
File "I:\audiocraft_plus\venv\lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "I:\audiocraft_plus\venv\lib\site-packages\gradio\blocks.py", line 1554, in process_api
result = await self.call_function(
File "I:\audiocraft_plus\venv\lib\site-packages\gradio\blocks.py", line 1192, in call_function
prediction = await anyio.to_thread.run_sync(
File "I:\audiocraft_plus\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "I:\audiocraft_plus\venv\lib\site-packages\anyio_backends_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "I:\audiocraft_plus\venv\lib\site-packages\anyio_backends_asyncio.py", line 807, in run
result = context.run(func, *args)
File "I:\audiocraft_plus\venv\lib\site-packages\gradio\utils.py", line 659, in wrapper
response = f(*args, **kwargs)
File "I:\audiocraft_plus\app.py", line 826, in predict_full
custom_model = "models/" + custom_model
TypeError: can only concatenate str (not "NoneType") to str

Here is the solution:
run pip install gradio_client==0.6.0 in your venv, and that's it!
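
Beyond pinning gradio_client, a hedged sketch of guarding the failing line in app.py so a missing dropdown selection produces a readable message instead of a traceback (the variable name is taken from the traceback above; the exact surrounding code is an assumption):

import gradio as gr

def resolve_custom_model(custom_model):
    # predict_full crashes here when the dropdown returns None
    if custom_model is None:
        raise gr.Error("Select a custom model in the Settings tab before generating.")
    return "models/" + custom_model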

Converting this repo to a package

Hi, this repository was originally made as a fork; however, given the implementation style, it was convertible to a package. This allows using it with whichever audiocraft version (until something breaks) and separates the custom code from the internals.

I have made a repository that does just that: it turns this project into a one-script Python package that can be installed.

https://github.com/rsxdalv/extension_audiocraft_plus

This is the installation command:

pip install git+https://github.com/rsxdalv/extension_audiocraft_plus@legacy#egg=extension_audiocraft_plus

Through my testing, the only extra library needed was pytaglib, so the requirements are simply:

audiocraft>=1.2
pytaglib

Secondly, this repository deals with the issue of installing the UI as a finished product. For that, my personal answer is to create it as an extension for my project, which makes the torch-plus-everything installation a separate concern.

I think this repository has quite a novel approach to audiocraft, so I hope it can survive without becoming obsolete.

UnboundLocalError: cannot access local variable 'sentencepiece_model_pb2' where it is not associated with a value

I keep getting this error when trying to generate:

Loading model GrandaddyShmax/musicgen-large
Traceback (most recent call last):
File "Z:\Progams\Anaconda3\Lib\site-packages\gradio\routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Progams\Anaconda3\Lib\site-packages\gradio\blocks.py", line 1431, in process_api
result = await self.call_function(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Progams\Anaconda3\Lib\site-packages\gradio\blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Progams\Anaconda3\Lib\site-packages\anyio\to_thread.py", line 28, in run_sync
return await get_asynclib().run_sync_in_worker_thread(func, *args, cancellable=cancellable,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Progams\Anaconda3\Lib\site-packages\anyio_backends_asyncio.py", line 818, in run_sync_in_worker_thread
return await future
^^^^^^^^^^^^
File "Z:\Progams\Anaconda3\Lib\site-packages\anyio_backends_asyncio.py", line 754, in run
result = context.run(func, *args)
^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Progams\Anaconda3\Lib\site-packages\gradio\utils.py", line 706, in wrapper
response = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "Z:\Progams\AI\audiocraft_plus\app.py", line 857, in predict_full
load_model(model, custom_model, gen_type)
File "Z:\Progams\AI\audiocraft_plus\app.py", line 154, in load_model
MODEL = MusicGen.get_pretrained(version)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Progams\AI\audiocraft_plus\audiocraft\models\musicgen.py", line 111, in get_pretrained
lm = load_lm_model(name, device=device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Progams\AI\audiocraft_plus\audiocraft\models\loaders.py", line 111, in load_lm_model
model = builders.get_lm_model(cfg)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Progams\AI\audiocraft_plus\audiocraft\models\builders.py", line 97, in get_lm_model
condition_provider = get_conditioner_provider(kwargs["dim"], cfg).to(cfg.device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Progams\AI\audiocraft_plus\audiocraft\models\builders.py", line 137, in get_conditioner_provider
conditioners[str(cond)] = T5Conditioner(output_dim=output_dim, device=device, **model_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Progams\AI\audiocraft_plus\audiocraft\modules\conditioners.py", line 415, in init
self.t5_tokenizer = T5Tokenizer.from_pretrained(name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Progams\Anaconda3\Lib\site-packages\transformers\tokenization_utils_base.py", line 1854, in from_pretrained
return cls._from_pretrained(
^^^^^^^^^^^^^^^^^^^^^
File "Z:\Progams\Anaconda3\Lib\site-packages\transformers\tokenization_utils_base.py", line 2017, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Progams\Anaconda3\Lib\site-packages\transformers\models\t5\tokenization_t5.py", line 194, in init
self.sp_model = self.get_spm_processor()
^^^^^^^^^^^^^^^^^^^^^^^^
File "Z:\Progams\Anaconda3\Lib\site-packages\transformers\models\t5\tokenization_t5.py", line 200, in get_spm_processor
model_pb2 = import_protobuf()
^^^^^^^^^^^^^^^^^
File "Z:\Progams\Anaconda3\Lib\site-packages\transformers\convert_slow_tokenizer.py", line 40, in import_protobuf
return sentencepiece_model_pb2
^^^^^^^^^^^^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'sentencepiece_model_pb2' where it is not associated with a value
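
In transformers, import_protobuf only binds sentencepiece_model_pb2 when the protobuf package imports successfully, so this UnboundLocalError usually means protobuf is missing or incompatible in the environment. A likely fix (an assumption based on that code path, not verified against this exact setup):

pip install protobuf sentencepiece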

Question: what is the application doing (takes a long time) after GPU usage goes down to 0%?

Hi,

I am trying out the latest git version and I consistently see the following behavior:

When clicking "Generate", it obviously starts generating the audio and GPU usage goes up; then the following gets printed in the console where the application was started:

Make a video took 3.4332385063171387
video: PMfTuclUS4V9.mp4
batch finished 1 204.6192593574524
Tempfiles currently stored:  3

At this point I would have assumed that everything is "done", but the progress bars keep counting while GPU usage is at 0% and CPU is at about 1%. My question is: is this normal, or is this some sort of bug? What exactly is happening during this phase?

Here is what I mean:
[image]

I think the GPU stops computing at about 15%, if not earlier; it takes only a few minutes. Then, however, it takes over half an hour to reach 100%, and nvtop looks like this during that time:
[image]

I am using the "melody" model. Generating 30 secs via a simple console script does not show such delays, i.e. it takes the few minutes where the GPU is active, then my script saves the generated audio and exits. So something extra must be happening in AudioCraft Plus, and I am wondering what it is.

Possible update with stereo models

Hi Bro,

What's up?
Is there a possibility you can update to include the stereo models of MusicGen? Then maybe there would be no need for the stereo effect option.

Is this a program that I can use?

I am on a MacBook M1. After git cloning the repo and trying to run app.py, I get the following error:

Traceback (most recent call last):
File "/Volumes/Files/AI/audiocraft_plus/app.py", line 26, in
import taglib
ImportError: dlopen(/opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/taglib.cpython-310-darwin.so, 0x0002): symbol not found in flat namespace '__ZN6TagLib10StringList6appendERKNS_6StringE'

not downloading models

Make a video took 1.1086127758026123
video: NsYgICqINIfJ.mp4
batch finished 1 3.945178985595703
Tempfiles currently stored: 33
Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "G:\miniconda\envs\audiocraft\lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "G:\miniconda\envs\audiocraft\lib\asyncio\proactor_events.py", line 162, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
File "G:\miniconda\envs\audiocraft\lib\asyncio\events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "G:\miniconda\envs\audiocraft\lib\asyncio\proactor_events.py", line 162, in _call_connection_lost
self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
loading MBD
'(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: 5348ba3b-e321-44cf-8107-28a72b982755)')' thrown while requesting HEAD https://huggingface.co/facebook/musicgen-small/resolve/main/compression_state_dict.bin
Traceback (most recent call last):
File "G:\miniconda\envs\audiocraft\lib\site-packages\gradio\routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "G:\miniconda\envs\audiocraft\lib\site-packages\gradio\blocks.py", line 1431, in process_api
result = await self.call_function(
File "G:\miniconda\envs\audiocraft\lib\site-packages\gradio\blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "G:\miniconda\envs\audiocraft\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "G:\miniconda\envs\audiocraft\lib\site-packages\anyio_backends_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "G:\miniconda\envs\audiocraft\lib\site-packages\anyio_backends_asyncio.py", line 807, in run
result = context.run(func, *args)
File "G:\miniconda\envs\audiocraft\lib\site-packages\gradio\utils.py", line 706, in wrapper
response = f(*args, **kwargs)
File "G:\audiocraft\app.py", line 849, in predict_full
load_diffusion()
File "G:\audiocraft\app.py", line 471, in load_diffusion
MBD = MultiBandDiffusion.get_mbd_musicgen()
File "G:\audiocraft\audiocraft\models\multibanddiffusion.py", line 73, in get_mbd_musicgen
codec_model = load_compression_model(name, device=device)
File "G:\audiocraft\audiocraft\models\loaders.py", line 72, in load_compression_model
pkg = load_compression_model_ckpt(file_or_url_or_id, cache_dir=cache_dir)
File "G:\audiocraft\audiocraft\models\loaders.py", line 68, in load_compression_model_ckpt
return _get_state_dict(file_or_url_or_id, filename="compression_state_dict.bin", cache_dir=cache_dir)
File "G:\audiocraft\audiocraft\models\loaders.py", line 63, in _get_state_dict
file = hf_hub_download(repo_id=file_or_url_or_id, filename=filename, cache_dir=cache_dir)
File "G:\miniconda\envs\audiocraft\lib\site-packages\huggingface_hub\utils_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "G:\miniconda\envs\audiocraft\lib\site-packages\huggingface_hub\file_download.py", line 1291, in hf_hub_download
raise LocalEntryNotFoundError(
huggingface_hub.utils._errors.LocalEntryNotFoundError: Connection error, and we cannot find the requested files in the disk cache. Please try again or make sure your Internet connection is on.

Question: Generate multiple files at once?

Using the API and a simple Python script calling model.generate, I can create multiple files in one run, simply by passing several prompts at once.
Is this possible with your app?
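
For context, the underlying audiocraft API batches prompts natively, so one call can produce several files; a minimal sketch (model name, duration, and prompts are just example values):

from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=10)

# one generate() call with several prompts returns one waveform per prompt
wavs = model.generate(["driving rock", "lofi piano", "ambient drone"])
for i, one_wav in enumerate(wavs):
    audio_write(f"batch_{i}", one_wav.cpu(), model.sample_rate, strategy="loudness")

Whether the Plus UI exposes this batching is a separate question; the sketch only shows the library side.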

Web UI not displaying the generated audio wav and mp4

The web UI sometimes hangs and displays an error in the audio and mp4 sections. The console, however, indicates that the files were generated.

Is there a way to find and download the files from the file folders? I searched all around; I couldn't figure out where the files are temporarily located in the file hierarchy.

Multiband diffusion not working.

After a git pull to the latest version, multiband diffusion does not work.
I launch AudioCraft Plus, write TEST as the prompt, and hit Generate. It takes quite some time to generate the 10-sec test. Then I go to Settings, change the decoder to MultiBand_Diffusion, hit Generate, and get a long list of errors that ends with this line:
TypeError: issubclass() arg 1 must be a class
Files that give errors:
\Python\Python310\lib\site-packages\omegaconf\dictconfig.py
\Python\Python310\lib\site-packages\omegaconf\_utils.py
\Python\Python310\lib\site-packages\omegaconf\base.py
\Python\Python310\lib\site-packages\omegaconf\dictconfig.py

Also, the latest version seems quite slow. How can I debug that? It is using the GPU, as GPU usage goes to 100% while generating. I have an RTX 3090 with 24 GB VRAM.

Installation instruction

Hello there

Can you please provide additional instructions on how to install this awesome web UI if I already have the latest version of audiocraft installed with all dependencies, etc.?
For example, I've got a D:\audiocraft folder. Can I install Plus in the same one, or is it better to use D:\PLUS? And what are the next steps?

Thank you for your help.

Memory leak :(

Every generation increases memory usage :(((
1st generation - 8 GB
2nd generation - 12 GB
3rd generation - 14 GB
4th generation - 17 GB
And then it crashes :(

Issue with loading the models error

Hi Bro,

I know you've been working hard on improvements.
I encountered an error this morning. Please see below.

Loading model melody
Downloading (…)ssion_state_dict.bin: 100% 236M/236M [00:01<00:00, 208MB/s]
Downloading state_dict.bin: 100% 2.77G/2.77G [00:26<00:00, 103MB/s]
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /root/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
100% 80.2M/80.2M [00:00<00:00, 96.4MB/s]
Downloading (…)ve/main/spiece.model: 100% 792k/792k [00:00<00:00, 8.28MB/s]
Downloading (…)lve/main/config.json: 100% 1.21k/1.21k [00:00<00:00, 8.98MB/s]
Downloading model.safetensors: 100% 892M/892M [00:10<00:00, 83.4MB/s]
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/gradio/routes.py", line 442, in run_predict
output = await app.get_blocks().process_api(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1389, in process_api
result = await self.call_function(
File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1094, in call_function
prediction = await anyio.to_thread.run_sync(
File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/usr/local/lib/python3.10/dist-packages/gradio/utils.py", line 703, in wrapper
response = f(*args, **kwargs)
File "/content/audiocraft_plus/app.py", line 634, in predict_full
load_model(model, custom_model, base_model)
File "/content/audiocraft_plus/app.py", line 152, in load_model
MODEL.lm.load_state_dict(torch.load("models/" + str(version) + ".pt"))
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 791, in load
with _open_file_like(f, 'rb') as opened_file:
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 271, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/usr/local/lib/python3.10/dist-packages/torch/serialization.py", line 252, in init
super().init(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'models/melody.pt'

UnboundLocalError: local variable 'sentencepiece_model_pb2' referenced before assignment

python3 app.py --listen 192.168.2.91
False
Running on local URL: http://192.168.2.91:7860

To create a public link, set share=True in launch().
Loading model GrandaddyShmax/audiogen-medium
Downloading (…)ssion_state_dict.bin: 100%|██████████| 236M/236M [00:02<00:00, 115MB/s]
Downloading state_dict.bin: 100%|██████████| 3.68G/3.68G [00:33<00:00, 111MB/s]
Downloading (…)ve/main/spiece.model: 100%|██████████| 792k/792k [00:00<00:00, 1.71MB/s]
Downloading (…)lve/main/config.json: 100%|██████████| 1.21k/1.21k [00:00<00:00, 12.4MB/s]
Traceback (most recent call last):
File "/home/koen/.local/lib/python3.10/site-packages/gradio/routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "/home/koen/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1431, in process_api
result = await self.call_function(
File "/home/koen/.local/lib/python3.10/site-packages/gradio/blocks.py", line 1109, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/koen/.local/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "/home/koen/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "/home/koen/.local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
result = context.run(func, *args)
File "/home/koen/.local/lib/python3.10/site-packages/gradio/utils.py", line 706, in wrapper
response = f(*args, **kwargs)
File "/home/koen/audiocraft_plus/app.py", line 857, in predict_full
load_model(model, custom_model, gen_type)
File "/home/koen/audiocraft_plus/app.py", line 156, in load_model
MODEL = AudioGen.get_pretrained(version)
File "/home/koen/audiocraft_plus/audiocraft/models/audiogen.py", line 92, in get_pretrained
lm = load_lm_model(name, device=device)
File "/home/koen/audiocraft_plus/audiocraft/models/loaders.py", line 111, in load_lm_model
model = builders.get_lm_model(cfg)
File "/home/koen/audiocraft_plus/audiocraft/models/builders.py", line 97, in get_lm_model
condition_provider = get_conditioner_provider(kwargs["dim"], cfg).to(cfg.device)
File "/home/koen/audiocraft_plus/audiocraft/models/builders.py", line 137, in get_conditioner_provider
conditioners[str(cond)] = T5Conditioner(output_dim=output_dim, device=device, **model_args)
File "/home/koen/audiocraft_plus/audiocraft/modules/conditioners.py", line 415, in init
self.t5_tokenizer = T5Tokenizer.from_pretrained(name)
File "/home/koen/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1854, in from_pretrained
return cls._from_pretrained(
File "/home/koen/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2017, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/home/koen/.local/lib/python3.10/site-packages/transformers/models/t5/tokenization_t5.py", line 194, in init
self.sp_model = self.get_spm_processor()
File "/home/koen/.local/lib/python3.10/site-packages/transformers/models/t5/tokenization_t5.py", line 200, in get_spm_processor
model_pb2 = import_protobuf()
File "/home/koen/.local/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py", line 40, in import_protobuf
return sentencepiece_model_pb2
UnboundLocalError: local variable 'sentencepiece_model_pb2' referenced before assignment

after fresh install Ubuntu server 22.04.2
nvidia rtx 3070
ffmpeg + build-essential + python3-pip installed
git clone https://github.com/GrandaddyShmax/audiocraft_plus
cd audiocraft_plus
pip install 'torch>=2.0'
pip install -e .

linux> python3 app.py --listen 192.168.2.91

MacOS Build and Performance

Hey @GrandaddyShmax,

After you already helped me solve a compatibility installation problem for macOS (by pointing me to the additional branch called "mac-os-fix"), I have a general question regarding performance.

First, I noticed that the macOS version ("MusicGen+ V1.2.8c Mac OS Version") is still different from the current release ("AudioCraft Plus - v2.0.0a"), but this is probably irrelevant to the way the (audio) data is generated, I guess :). I had tried to generate a 4-second file (mono, 32 kHz, small model), but after 10 minutes of waiting I had the feeling that maybe something is not working properly or that my hardware is not optimal.

My question: the data generation is done locally using my own processing power, isn't it? If yes: does it make use of Apple's M1/M2 chip with the Neural Engine, as happens for example with the tool "TextToSample" (which is also based on Meta's Audiocraft)?

Thanks (again)

My Specs:

  • Apple M2 Max (12-core CPU, 30-core GPU, 16-core Neural Engine)
  • 64 GB RAM
  • MacOS Ventura 13.5 (current release)

No models found or set allow_custom_value=True.

When I launch the application, I get this error message:


False
I:\audiocraft_plus\venv\lib\site-packages\gradio\components\dropdown.py:231: UserWarning: The value passed into gr.Dropdown() is not in the list of choices. Please update the list of choices to include: No models found or set allow_custom_value=True.
  warnings.warn(
Running on local URL:  http://127.0.0.1:7862

What is the problem, and how can it be fixed? This is how I start the program:
python app.py --inbrowser --cache --unload_to_cpu

Load models stored in the user's .cache folder

I already have MusicGen models downloaded at C:\Users\<username>\.cache\huggingface\hub\.
I want to use them in AudioCraft Plus to save disk space. I have looked at the code of app.py and have a vague idea of how it works: it basically uses the models folder in the AudioCraft Plus directory for custom models. Can anyone help me out with the modification needed so that I can use the already downloaded models? I am not very familiar with Python and don't know how to modify os.listdir() or anything else in the get_available_folders() method.
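
A hedged sketch of the kind of change involved, using the function named in the issue (the extra path handling, and whether app.py can load these entries unmodified, are assumptions):

import os

HF_HUB_CACHE = os.path.expanduser(os.path.join("~", ".cache", "huggingface", "hub"))

def get_available_folders():
    # original behavior: list custom models placed in ./models
    models = sorted(os.listdir("models")) if os.path.isdir("models") else []
    # additionally surface snapshots already downloaded into the HF hub cache
    if os.path.isdir(HF_HUB_CACHE):
        models += sorted(d for d in os.listdir(HF_HUB_CACHE) if d.startswith("models--"))
    return models

Note that the stock model choices (facebook/musicgen-*) already load through the Hugging Face cache via get_pretrained, so for those no modification should be needed at all; this only affects the custom-model dropdown.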

mac-os-fix for update

I got the updated version fixed for macOS on my own now.

But Multiband_Diffusion is not working.
Judging by the errors, I assume it's because it relies heavily on torch.
Am I right, or is there a way to use it on CPU, without torch/CUDA enabled?

CUDA: Out of memory

I have an RTX 2060, NVIDIA driver 531, and everything is idle, yet I'm not able to generate anything (I have no problem generating images with Stable Diffusion). Any idea what could be wrong? Those 0 bytes seem suspicious.

CUDA out of memory. Tried to allocate 24.00 MiB. GPU 0 has a total capacty of 6.00 GiB of which 0 bytes is free

Error on AudioCraft Plus Huggingface Space and Google Colab?

Hello there, everybody!

There's an error after 4-5 minutes of generation on both MusicGen and AudioGen via AudioCraft Plus on Hugging Face Space and Google Colab. Can you fix this problem, please, GrandaddyShmax?

Thank you for your help.

Documentation.

Could we please get a wiki or some documentation for the functionality of audiocraft_plus? This has just been integrated into @rundiffusion, and they have some documentation for your UI in a tab, but I would like a reference on your GitHub page for the options and tools in the app.

MusicGen Tab

[Generate (button)]:
Generates the music with the given settings and prompts.
[Interrupt (button)]:
Stops the music generation as soon as it can, providing an incomplete output.
Generation Tab:
Structure Prompts:
This feature helps reduce repetitive prompts by allowing you to set global prompts
that will be used for all prompt segments.

[Structure Prompts (checkbox)]:
Enable/Disable the structure prompts feature.
[BPM (number)]:
Beats per minute of the generated music.
[Key (dropdown)]:
The key of the generated music.
[Scale (dropdown)]:
The scale of the generated music.
[Global Prompt (text)]:
Here write the prompt that you wish to be used for all prompt segments.
Multi-Prompt:
This feature allows you to control the music, adding variation to different time segments.
You have up to 10 prompt segments. The first prompt will always be 30s long;
the other prompts will be [30s - overlap].
For example, if the overlap is 10s, each prompt segment will be 20s.

[Prompt Segments (number)]:
Number of unique prompts to generate throughout the music generation.
[Prompt/Input Text (prompt)]:
Here describe the music you wish the model to generate.
[Repeat (number)]:
Write how many times this prompt will repeat (instead of wasting another prompt segment on the same prompt).
[Time (text)]:
The time of the prompt segment.
[Calculate Timings (button)]:
Calculates the timings of the prompt segments.
[Duration (number)]:
How long you want the generated music to be (in seconds).
[Overlap (number)]:
How much each new segment will reference the previous segment (in seconds).
For example, if you choose 20s: each new segment after the first one will reference the previous segment for 20s
and will generate only 10s of new music. The model can only process 30s of music.
[Seed (number)]:
Your generated music ID. If you wish to generate the exact same music,
use the exact seed with the exact prompts
(this way you can also extend a specific song that came out short).
[Random Seed (button)]:
Gives "-1" as a seed, which counts as a random seed.
[Copy Previous Seed (button)]:
Copies the seed from the output seed (if you don't feel like doing it manually).
Audio Tab:
[Input Type (selection)]:
File mode allows you to upload an audio file to use as input
Mic mode allows you to use your microphone as input
[Input Audio Mode (selection)]:
Melody mode only works with the melody model: it conditions the music generation to reference the melody
Sample mode works with any model: it gives a music sample to the model to generate its continuation.
[Trim Start and Trim End (numbers)]:
Trim Start set how much you'd like to trim the input audio from the start
Trim End same as the above but from the end
[Input Audio (audio file)]:
Input here the audio you wish to use with "melody" or "sample" mode.
Customization Tab:
[Background Color (color)]:
Works only if you don't upload an image. Color of the background of the waveform.
[Bar Color Start (color)]:
First color of the waveform bars.
[Bar Color End (color)]:
Second color of the waveform bars.
[Background Image (image)]:
Background image that you wish to be attached to the generated video along with the waveform.
[Height and Width (numbers)]:
Output video resolution, only works with image.
(minimum height and width is 256).
Settings Tab:
[Output Audio Channels (selection)]:
With this you can select the amount of channels that you wish for your output audio.
mono is straightforward single-channel audio.
stereo is dual-channel audio, but it will sound more or less like mono.
stereo effect is also dual-channel, but uses tricks to simulate stereo audio.
[Output Audio Sample Rate (dropdown)]:
The output audio sample rate; the model default is 32000.
[Model (selection)]:
Here you can choose which model you wish to use:
melody model is based on the medium model, with a unique feature that lets you use melody conditioning.
small model has 300M parameters.
medium model has 1.5B parameters.
large model has 3.3B parameters.
custom model runs the custom model that you provided.
[Custom Model (selection)]:
This dropdown will show you models that are placed in the models folder
you must select custom in the model options in order to use it.
[Refresh (button)]:
Refreshes the dropdown list for custom model.
[Decoder (selection)]:
Choose here the decoder that you wish to use:
Default is the default decoder
MultiBand_Diffusion is a decoder that uses diffusion to generate the audio.
[Top-k (number)]:
is a parameter used in text generation models, including music generation models. It determines the number of most likely next tokens to consider at each step of the generation process. The model ranks all possible tokens based on their predicted probabilities, and then selects the top-k tokens from the ranked list. The model then samples from this reduced set of tokens to determine the next token in the generated sequence. A smaller value of k results in a more focused and deterministic output, while a larger value of k allows for more diversity in the generated music.
[Top-p (number)]:
also known as nucleus sampling or probabilistic sampling, is another method used for token selection during text generation. Instead of specifying a fixed number like top-k, top-p considers the cumulative probability distribution of the ranked tokens. It selects the smallest possible set of tokens whose cumulative probability exceeds a certain threshold (usually denoted as p). The model then samples from this set to choose the next token. This approach ensures that the generated output maintains a balance between diversity and coherence, as it allows for a varying number of tokens to be considered based on their probabilities.
[Temperature (number)]:
is a parameter that controls the randomness of the generated output. It is applied during the sampling process, where a higher temperature value results in more random and diverse outputs, while a lower temperature value leads to more deterministic and focused outputs. In the context of music generation, a higher temperature can introduce more variability and creativity into the generated music, but it may also lead to less coherent or structured compositions. On the other hand, a lower temperature can produce more repetitive and predictable music.
[Classifier Free Guidance (number)]:
controls how strongly generation is steered toward the text condition. Despite the similar name, no separate classifier network is involved (that would be classifier guidance, a different technique): during sampling the model computes both a conditioned and an unconditioned prediction and extrapolates away from the unconditioned one by this coefficient. Higher values follow the prompt more closely at the cost of diversity; lower values give the model more freedom.
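
For reference, these sampling settings map directly onto audiocraft's generation parameters; a minimal sketch of setting them through the library API (the values are just examples):

from audiocraft.models import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-medium")
model.set_generation_params(
    duration=30,      # length of the generated clip in seconds
    top_k=250,        # sample only from the 250 most likely tokens
    top_p=0.0,        # 0 disables nucleus sampling, so top_k is used
    temperature=1.0,  # above 1.0 more random, below 1.0 more deterministic
    cfg_coef=3.0,     # classifier-free guidance strength
)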

AudioGen Tab

[Generate (button)]:
Generates the audio with the given settings and prompts.
[Interrupt (button)]:
Stops the audio generation as soon as it can, providing an incomplete output.
Generation Tab:
Structure Prompts:
This feature helps reduce repetitive prompts by allowing you to set global prompts
that will be used for all prompt segments.

[Structure Prompts (checkbox)]:
Enable/Disable the structure prompts feature.
[Global Prompt (text)]:
Here write the prompt that you wish to be used for all prompt segments.
Multi-Prompt:
This feature allows you to control the audio, adding variation to different time segments.
You have up to 10 prompt segments. The first prompt will always be 10s long;
the other prompts will be [10s - overlap].
For example, if the overlap is 2s, each prompt segment will be 8s.

[Prompt Segments (number)]:
Number of unique prompts to generate throughout the audio generation.
[Prompt/Input Text (prompt)]:
Here describe the audio you wish the model to generate.
[Repeat (number)]:
Write how many times this prompt will repeat (instead of wasting another prompt segment on the same prompt).
[Time (text)]:
The time of the prompt segment.
[Calculate Timings (button)]:
Calculates the timings of the prompt segments.
[Duration (number)]:
How long you want the generated audio to be (in seconds).
[Overlap (number)]:
How much each new segment will reference the previous segment (in seconds).
For example, if you choose 2s: each new segment after the first one will reference the previous segment for 2s
and will generate only 8s of new audio. The model can only process 10s of audio.
[Seed (number)]:
Your generated audio ID. If you wish to generate the exact same audio,
use the exact seed with the exact prompts
(this way you can also extend a specific clip that came out short).
[Random Seed (button)]:
Gives "-1" as a seed, which counts as a random seed.
[Copy Previous Seed (button)]:
Copies the seed from the output seed (if you don't feel like doing it manually).
Audio Tab:
[Input Type (selection)]:
File mode allows you to upload an audio file to use as input
Mic mode allows you to use your microphone as input
[Trim Start and Trim End (numbers)]:
Trim Start set how much you'd like to trim the input audio from the start
Trim End same as the above but from the end
[Input Audio (audio file)]:
Input here the audio you wish to use.
Customization Tab:
[Background Color (color)]:
Works only if you don't upload an image. Color of the background of the waveform.
[Bar Color Start (color)]:
First color of the waveform bars.
[Bar Color End (color)]:
Second color of the waveform bars.
[Background Image (image)]:
Background image that you wish to be attached to the generated video along with the waveform.
[Height and Width (numbers)]:
Output video resolution, only works with image.
(minimum height and width is 256).
Settings Tab:
[Output Audio Channels (selection)]:
With this you can select the amount of channels that you wish for your output audio.
mono is straightforward single-channel audio.
stereo is dual-channel audio, but it will sound more or less like mono.
stereo effect is also dual-channel, but uses tricks to simulate stereo audio.
[Output Audio Sample Rate (dropdown)]:
The output audio sample rate; the model default is 32000.
[Top-k (number)]:
is a parameter used in text generation models, including music generation models. It determines the number of most likely next tokens to consider at each step of the generation process. The model ranks all possible tokens based on their predicted probabilities, and then selects the top-k tokens from the ranked list. The model then samples from this reduced set of tokens to determine the next token in the generated sequence. A smaller value of k results in a more focused and deterministic output, while a larger value of k allows for more diversity in the generated music.
[Top-p (number)]:
also known as nucleus sampling or probabilistic sampling, is another method used for token selection during text generation. Instead of specifying a fixed number like top-k, top-p considers the cumulative probability distribution of the ranked tokens. It selects the smallest possible set of tokens whose cumulative probability exceeds a certain threshold (usually denoted as p). The model then samples from this set to choose the next token. This approach ensures that the generated output maintains a balance between diversity and coherence, as it allows for a varying number of tokens to be considered based on their probabilities.
[Temperature (number)]:
is a parameter that controls the randomness of the generated output. It is applied during the sampling process, where a higher temperature value results in more random and diverse outputs, while a lower temperature value leads to more deterministic and focused outputs. In the context of music generation, a higher temperature can introduce more variability and creativity into the generated music, but it may also lead to less coherent or structured compositions. On the other hand, a lower temperature can produce more repetitive and predictable music.
[Classifier Free Guidance (number)]:
controls how strongly generation is steered toward the text condition. Despite the similar name, no separate classifier network is involved (that would be classifier guidance, a different technique): during sampling the model computes both a conditioned and an unconditioned prediction and extrapolates away from the unconditioned one by this coefficient. Higher values follow the prompt more closely at the cost of diversity; lower values give the model more freedom.

Can't use custom models - error

Traceback (most recent call last):
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\venv\lib\site-packages\gradio\queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\venv\lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\venv\lib\site-packages\gradio\blocks.py", line 1550, in process_api
result = await self.call_function(
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\venv\lib\site-packages\gradio\blocks.py", line 1185, in call_function
prediction = await anyio.to_thread.run_sync(
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\venv\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\venv\lib\site-packages\anyio_backends_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\venv\lib\site-packages\anyio_backends_asyncio.py", line 851, in run
result = context.run(func, *args)
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\venv\lib\site-packages\gradio\utils.py", line 661, in wrapper
response = f(*args, **kwargs)
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\app.py", line 914, in predict_full
outs, outs_audio, outs_backup, input_length = _do_predictions(
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\app.py", line 584, in _do_predictions
outputs = MODEL.generate(texts, progress=progress, return_tokens=USE_DIFFUSION)
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\audiocraft\models\musicgen.py", line 181, in generate
return self.generate_audio(tokens)
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\audiocraft\models\musicgen.py", line 403, in generate_audio
gen_audio = self.compression_model.decode(gen_tokens, None)
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\audiocraft\models\encodec.py", line 356, in decode
res = self.model.decode(codes[None], scales)
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\venv\lib\site-packages\transformers\models\encodec\modeling_encodec.py", line 738, in decode
audio_values = self._decode_frame(audio_codes[0], audio_scales[0])
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\venv\lib\site-packages\transformers\models\encodec\modeling_encodec.py", line 702, in _decode_frame
embeddings = self.quantizer.decode(codes)
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\venv\lib\site-packages\transformers\models\encodec\modeling_encodec.py", line 435, in decode
layer = self.layers[i]
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\venv\lib\site-packages\torch\nn\modules\container.py", line 293, in getitem
return self._modules[self._get_abs_string_index(idx)]
File "C:\Users\dalli\Desktop\audiocraft plus installers\audiocraft_plus\venv\lib\site-packages\torch\nn\modules\container.py", line 283, in _get_abs_string_index
raise IndexError(f'index {idx} is out of range')
IndexError: index 4 is out of range

Bug in temperature settings

Isn't the temperature supposed to be a float between 0 and 1? The UI permits higher values, and the number goes up by one with every click, instead of by 0.1 or something like that.

If I may suggest: the way GPT's temperature is defined is a slider from 0 until it reaches 1, the same JavaScript element used to define the seconds of the song in the audiocraft UI.

Apart from that, the UI is very easy and a joy to use. Thanks!
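
For reference, a sketch of the suggested control in Gradio (the range here is illustrative; note the model does accept temperatures above 1, so capping at exactly 1 may be too strict):

import gradio as gr

temperature = gr.Slider(
    minimum=0.0,
    maximum=1.5,
    step=0.1,
    value=1.0,
    label="Temperature",
)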

Is melody conditioning supported?

Hi, I am trying to run the Colab notebook and I got a bit confused by the web UI. Is melody conditioning not supported? Like melody + text?

Also, is there a guide or preferred way of writing good prompts (like a prompt guide)? Perhaps it is specific to text2audio or melody conditioning? (Different prompts for different tasks.)

Info on this would be helpful.

Colab support discontinued?

Hello, the Google Colab opens the AudioCraft+ 2.0.0a Gradio UI, but upon running it, it throws an error. Observing that the README doesn't mention any Colab version anymore, did version 2 drop Colab support?

VRAM staying full and OOM?

Hi, so I got it installed (using Pinokio) and I'm able to generate, but say I generate a clip of music and then try again: I get an OOM error. I can see that the VRAM isn't being cleared after the first generation, and I have to close and restart the app to clear it. This is a bug, right? I use Stable Diffusion locally and have come across similar issues, but perhaps it is just a misconfiguration somewhere.

I'm on 8 GB of VRAM and noticed I can only really generate up to about 5 seconds, and more so with the audio generator as opposed to the music generator; I'm not sure why. I know the recommended VRAM is 12 GB, but surely it's all related to the length?

Any suggestions?

Thanks.
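
A hedged sketch of the kind of cleanup that resolves this pattern in other PyTorch apps (MODEL is the global that appears in tracebacks elsewhere on this page; whether the app offers a place to hook this in is an assumption):

import gc
import torch

MODEL = None  # stands in for app.py's global model handle

def unload_model():
    global MODEL
    MODEL = None  # drop the reference so the weights become collectable
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # hand cached blocks back to the driver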

Can't Generate Anything

When I installed this using Pinokio, I got this:

Loading model GrandaddyShmax/musicgen-melody

G:\pinokio\api\audiocraft_plus.git\app\env\lib\site-packages\torch\nn\utils\weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
new batch 1 [['driving rock']] [None] [None]
Traceback (most recent call last):
File "G:\pinokio\api\audiocraft_plus.git\app\env\lib\site-packages\gradio\queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
File "G:\pinokio\api\audiocraft_plus.git\app\env\lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "G:\pinokio\api\audiocraft_plus.git\app\env\lib\site-packages\gradio\blocks.py", line 1550, in process_api
result = await self.call_function(
File "G:\pinokio\api\audiocraft_plus.git\app\env\lib\site-packages\gradio\blocks.py", line 1185, in call_function
prediction = await anyio.to_thread.run_sync(
File "G:\pinokio\api\audiocraft_plus.git\app\env\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "G:\pinokio\api\audiocraft_plus.git\app\env\lib\site-packages\anyio_backends_asyncio.py", line 2177, in run_sync_in_worker_thread
return await future
File "G:\pinokio\api\audiocraft_plus.git\app\env\lib\site-packages\anyio_backends_asyncio.py", line 859, in run
result = context.run(func, *args)
File "G:\pinokio\api\audiocraft_plus.git\app\env\lib\site-packages\gradio\utils.py", line 661, in wrapper
response = f(*args, **kwargs)
File "G:\pinokio\api\audiocraft_plus.git\app\app.py", line 914, in predict_full
outs, outs_audio, outs_backup, input_length = _do_predictions(
File "G:\pinokio\api\audiocraft_plus.git\app\app.py", line 584, in _do_predictions
outputs = MODEL.generate(texts, progress=progress, return_tokens=USE_DIFFUSION)
File "G:\pinokio\api\audiocraft_plus.git\app\audiocraft\models\musicgen.py", line 186, in generate
tokens = self._generate_tokens(attributes, prompt_tokens, progress)
File "G:\pinokio\api\audiocraft_plus.git\app\audiocraft\models\musicgen.py", line 349, in _generate_tokens
gen_tokens = self.lm.generate(
File "G:\pinokio\api\audiocraft_plus.git\app\env\lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "G:\pinokio\api\audiocraft_plus.git\app\audiocraft\models\lm.py", line 452, in generate
tokenized = self.condition_provider.tokenize(conditions)
File "G:\pinokio\api\audiocraft_plus.git\app\audiocraft\modules\conditioners.py", line 1176, in tokenize
output[attribute] = self.conditioners[attribute].tokenize(batch)
File "G:\pinokio\api\audiocraft_plus.git\app\audiocraft\modules\conditioners.py", line 444, in tokenize
inputs = self.t5_tokenizer(entries, return_tensors='pt', padding=True).to(self.device)
File "G:\pinokio\api\audiocraft_plus.git\app\env\lib\site-packages\transformers\tokenization_utils_base.py", line 3055, in call
encodings = self._call_one(text=text, text_pair=text_pair, **all_kwargs)
File "G:\pinokio\api\audiocraft_plus.git\app\env\lib\site-packages\transformers\tokenization_utils_base.py", line 3142, in _call_one
return self.batch_encode_plus(
File "G:\pinokio\api\audiocraft_plus.git\app\env\lib\site-packages\transformers\tokenization_utils_base.py", line 3338, in batch_encode_plus
return self._batch_encode_plus(
File "G:\pinokio\api\audiocraft_plus.git\app\env\lib\site-packages\transformers\tokenization_utils.py", line 880, in _batch_encode_plus
ids, pair_ids = ids_or_pair_ids
ValueError: not enough values to unpack (expected 2, got 1)

Is this Pinokio's fault?

Error on installing, macOS

I cannot install on macOS due to this problem:

ImportError: dlopen(/Users/XXXXX/opt/anaconda3/lib/python3.9/site-packages/taglib.cpython-39-darwin.so, 0x0002): symbol not found in flat namespace (__ZN6TagLib10StringList6appendERKNS_6StringE)

Any idea on how to fix it? Thanks for your support!

mac os 12.6.5
M1

Difference Between GrandaddyShmax Models and Facebook

Are there any differences between the GrandaddyShmax and Facebook models? I downloaded the models once from Facebook's Hugging Face and they were fine; the models run on AudioCraft+ as well. But is there any difference, for example with censorship?

AttributeError: 'NoneType' object has no attribute 'set_custom_progress_callback'

When I try to use the Gradio Client with this program, it just responds:
ValueError: None

The console of the program:

Loading model small
Traceback (most recent call last):
File "E:\ProgramData\Anaconda3\envs\audiocraftplus\lib\site-packages\gradio\queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
File "E:\ProgramData\Anaconda3\envs\audiocraftplus\lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "E:\ProgramData\Anaconda3\envs\audiocraftplus\lib\site-packages\gradio\blocks.py", line 1550, in process_api
result = await self.call_function(
File "E:\ProgramData\Anaconda3\envs\audiocraftplus\lib\site-packages\gradio\blocks.py", line 1185, in call_function
prediction = await anyio.to_thread.run_sync(
File "E:\ProgramData\Anaconda3\envs\audiocraftplus\lib\site-packages\anyio\to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "E:\ProgramData\Anaconda3\envs\audiocraftplus\lib\site-packages\anyio_backends_asyncio.py", line 2134, in run_sync_in_worker_thread
return await future
File "E:\ProgramData\Anaconda3\envs\audiocraftplus\lib\site-packages\anyio_backends_asyncio.py", line 851, in run
result = context.run(func, *args)
File "E:\ProgramData\Anaconda3\envs\audiocraftplus\lib\site-packages\gradio\utils.py", line 661, in wrapper
response = f(*args, **kwargs)
File "E:\audiocraft_plus\app.py", line 870, in predict_full
MODEL.set_custom_progress_callback(_progress)
AttributeError: 'NoneType' object has no attribute 'set_custom_progress_callback'

What I've tried:
installing gradio_client version 0.6.0

How can I fix it? Thanks.

Longer durations cause error

Up to 60 sec this works fine, but 120 sec causes an error. The original audiocraft generates 120-sec files just fine, so it's not hardware-related. The GPU is an RTX 3090 with 24 GB VRAM.

This error appears after it gets to 1500 steps: TypeError: MusicGen.generate_continuation() got an unexpected keyword argument 'melody_wavs'

No module named 'taglib'

When running app.py I get the error:
Traceback (most recent call last):
File "D:\MusicGen\audiocraft_plus\app.py", line 26, in
import taglib
ModuleNotFoundError: No module named 'taglib'

How can this be fixed?

I fixed this
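
(For reference: per the packaging issue earlier on this page, the taglib module comes from the pytaglib package, so pip install pytaglib in the app's environment is the likely fix here and in the next report.)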

Updated via git pull. Now unable to start AudioCraft.

Traceback (most recent call last):
  File "C:\Users\mattb\audiocraft_plus\app.py", line 26, in <module>
    import taglib
ModuleNotFoundError: No module named 'taglib'
Press any key to continue . . .

I press a key and the cmd window just closes.

How exactly do you update an existing installation of Audiocraft to the latest version anyway?
