I have tried to create an npz, although I think I have done something wrong. I have go

"no description" when bark run about bark-voice-cloning-hubert-quantizer HOT 11 CLOSED

gitmylo commented on June 14, 2024

"no description" when bark run

from bark-voice-cloning-hubert-quantizer.

Comments (11)

gitmylo commented on June 14, 2024

You should probably wrap your code in code blocks (``` around your text) in the future.

I ran that code, and it created the file just fine. Can you send me the wav you're using? I think your input wav is a bit broken, and encodec can't load it.

Again, this issue is not really related to my repository here. But it's probably your wav file.


gitmylo commented on June 14, 2024

You don't need to shorten the audio, but it's recommended to shorten it to 15 or 20 seconds. Going beyond 15 seconds will result in less audio for it to clone from.

Make sure you take the audio from the end, not the start.
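
As a rough illustration of taking the audio from the end rather than the start, here is a minimal sketch using only Python's standard `wave` module. The helper name `keep_last_seconds` and the 15-second default are illustrative choices, not part of this repository, and the sketch assumes a plain PCM WAV file.

```python
import wave


def keep_last_seconds(in_path, out_path, seconds=15.0):
    """Copy only the final `seconds` of a PCM WAV file to `out_path`.

    Hypothetical helper for illustration; returns the number of frames kept.
    """
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        total_frames = src.getnframes()
        keep_frames = min(total_frames, int(seconds * src.getframerate()))
        # Skip everything before the final `keep_frames` frames.
        src.setpos(total_frames - keep_frames)
        frames = src.readframes(keep_frames)

    with wave.open(out_path, "wb") as dst:
        dst.setparams(params)
        # The header's frame count is corrected when the file is closed.
        dst.writeframes(frames)
    return keep_frames
```

Calling `keep_last_seconds("0520.wav", "0520_tail.wav")` would write the last 15 seconds (or the whole file, if it is shorter) to a new file; the output file name here is only an example.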


gitmylo commented on June 14, 2024

This doesn't seem to be an issue with my repository. This repository exclusively extracts semantics.

Also, I was not able to reproduce the issue; your code worked fine on my side.

Do you have the right path for your models?
Do you have the right version of encodec installed? (pip install -U encodec)
Maybe your wav is invalid? (try re-encoding it with ffmpeg: ffmpeg -i 0520.wav audio.wav)
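
For the last check, a quick sanity test of the WAV header can be sketched with Python's standard `wave` module. The function name `describe_wav` is hypothetical, and this only covers plain PCM WAV files: a failure here hints that the file is unusual, but does not prove that torchaudio or encodec would reject it.

```python
import wave


def describe_wav(path):
    """Return basic header info for a PCM WAV file, or raise if unreadable."""
    # wave.open raises wave.Error for files it cannot parse as PCM WAV;
    # truncated files may raise EOFError instead.
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }
```

If `describe_wav("0520.wav")` raises, or reports a duration of zero, re-encoding the file with ffmpeg as suggested above is a reasonable next step.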


NickAnastasoff commented on June 14, 2024

Thank you so much for your reply! Sadly, it still didn't work for me. How did you generate the npz? This is what I wrote, so it's probably the issue:

```
from encodec import EncodecModel
from encodec.utils import convert_audio

import numpy
import torch
import torchaudio

# Instantiate a pretrained EnCodec model
model = EncodecModel.encodec_model_24khz()
# The number of codebooks used will be determined by the bandwidth selected.
# E.g. for a bandwidth of 6 kbps, n_q = 8 codebooks are used.
# Supported bandwidths are 1.5 kbps (n_q = 2), 3 kbps (n_q = 4), 6 kbps (n_q = 8), 12 kbps (n_q = 16) and 24 kbps (n_q = 32).
# For the 48 kHz model, only 3, 6, 12, and 24 kbps are supported. The number
# of codebooks for each is half that of the 24 kHz model as the frame rate is twice as much.
model.set_target_bandwidth(6.0)

# Load and pre-process the audio waveform
wav, sr = torchaudio.load("0520.wav")
wav = convert_audio(wav, sr, model.sample_rate, model.channels)
wav = wav.unsqueeze(0)

# Extract discrete codes from EnCodec
with torch.no_grad():
    encoded_frames = model.encode(wav)
codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1)  # [B, n_q, T]

fine_prompt = codes  # <- is this the issue?

coarse = fine_prompt[:2, :]

# semantic_tokens is assumed to come from the semantic extraction step (not shown here)
numpy.savez(semantic_prompt=semantic_tokens, fine_prompt=fine_prompt, coarse_prompt=coarse, file="pleasework.npz")
```
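
One way to debug a file like this is to load it back and look at the array shapes before handing it to Bark. The sketch below is an assumption-laden illustration: the function name `prompt_shape_problems` is hypothetical, and the expected layout (a 1-D `semantic_prompt`, a `coarse_prompt` with 2 rows, a `fine_prompt` with 8 rows) is assumed from how Bark consumes voice prompts rather than stated anywhere in this thread.

```python
import numpy as np


def prompt_shape_problems(path):
    """List shape problems in a saved prompt file (empty list means none found)."""
    with np.load(path) as data:
        semantic = data["semantic_prompt"]
        coarse = data["coarse_prompt"]
        fine = data["fine_prompt"]

    problems = []
    # Assumed layout: semantic is 1-D, coarse is [2, T], fine is [8, T].
    if semantic.ndim != 1:
        problems.append(f"semantic_prompt should be 1-D, got {semantic.shape}")
    if coarse.ndim != 2 or coarse.shape[0] != 2:
        problems.append(f"coarse_prompt should be [2, T], got {coarse.shape}")
    if fine.ndim != 2 or fine.shape[0] != 8:
        problems.append(f"fine_prompt should be [8, T], got {fine.shape}")
    return problems
```

Running `prompt_shape_problems("pleasework.npz")` and printing the result gives a quick list of anything that looks off.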


NickAnastasoff commented on June 14, 2024

Sorry for the wait!

I put my code into a Jupyter notebook, and I still got the same problem! I'll link that, and my audio.wav is in it.

Thanks so much for your time!

VoiceCloning Google Colab


gitmylo commented on June 14, 2024

You can't upload an audio file like that to Google Colab, since its storage is not persistent.

Check if you can clone the file in here


NickAnastasoff commented on June 14, 2024

I found the problem! You were right! I shortened my wav to under 10 seconds, and it's working, thank you so much! By the way, it might be helpful for others if you put the Google Colab I linked above in the README:
https://colab.research.google.com/drive/1IA3c_R859nANerMARazCSrjc2UD3ws8A?usp=sharing


gitmylo commented on June 14, 2024

Oh, actually, I noticed this today.

codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1) # [B, n_q, T]

should be

codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1).squeeze() # [n_q, T]
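
To see why the missing `.squeeze()` matters for the later `[:2, :]` slice, here is a small shape-only sketch. It uses NumPy zeros as a stand-in for the real EnCodec codes (an assumption made so the example runs without a model); slicing and `squeeze()` behave the same way on torch tensors for this case. The frame count of 75 is arbitrary.

```python
import numpy as np

# Stand-in for the concatenated EnCodec codes: batch of 1, 8 codebooks, 75 frames.
codes = np.zeros((1, 8, 75), dtype=np.int64)

# Without squeeze, [:2, :] slices the batch axis (size 1), so nothing is trimmed.
coarse_unsqueezed = codes[:2, :]
print(coarse_unsqueezed.shape)  # (1, 8, 75)

# With squeeze, the batch axis is gone and [:2, :] keeps the first 2 codebooks.
fine_prompt = codes.squeeze()
coarse_prompt = fine_prompt[:2, :]
print(fine_prompt.shape)  # (8, 75)
print(coarse_prompt.shape)  # (2, 75)
```

Since a bare `squeeze()` drops every size-1 axis, `squeeze(0)` is a slightly more defensive variant that removes only the batch axis.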


NickAnastasoff commented on June 14, 2024

That makes sense! The code I write is usually the problem 🤣

Thanks so much!


gitmylo commented on June 14, 2024

> That makes sense! The code I write is usually the problem 🤣
>
> Thanks so much!

That was actually something I was missing in the old version, plus the encodec example doesn't have it. So that's on me.


NickAnastasoff commented on June 14, 2024

For anyone trying to find an answer:

codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1) # [B, n_q, T]

should be

codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1).squeeze() # [n_q, T]

~~And shorten audio file to under 10 seconds~~
