Comments (11)
You should probably wrap your code in code blocks (``` around your text) in the future.
I ran that code, and it created the file just fine. Can you send me the wav you're using? I think your input wav is a bit broken, and encodec can't load it.
Again, this issue is not really related to my repository here. But it's probably your wav file.
from bark-voice-cloning-hubert-quantizer.
You don't need to shorten the audio, but it's recommended to shorten it to 15 or 20 seconds; going beyond that leaves less usable audio for it to clone from.
Make sure you take the audio from the end, not the start.
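A minimal stdlib-only sketch of that trimming step (the file names and the 15-second cutoff are just for the demo); it keeps the *last* 15 seconds of a PCM wav, per the advice above:

```python
import wave

def trim_tail(src, dst, seconds=15):
    # Keep only the final `seconds` of audio, taking it from the end of the clip.
    with wave.open(src, "rb") as r:
        params = r.getparams()
        total = r.getnframes()
        keep = min(total, int(seconds * r.getframerate()))
        r.setpos(total - keep)          # seek so we read from the end
        frames = r.readframes(keep)
    with wave.open(dst, "wb") as w:
        w.setparams(params)             # nframes is corrected on close
        w.writeframes(frames)

# Demo: write a 30-second silent mono wav, then trim it to its last 15 seconds.
with wave.open("long.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(24000)
    w.writeframes(b"\x00\x00" * 24000 * 30)

trim_tail("long.wav", "short.wav")
with wave.open("short.wav", "rb") as r:
    print(r.getnframes() // r.getframerate())  # 15
```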

This doesn't seem to be an issue with my repository. This repository exclusively extracts semantics.
Also, I was not able to reproduce the issue; your code worked fine on my side.
- Do you have the right path for your models?
- Do you have the right version of encodec installed? (`pip install -U encodec`)
- Maybe your wav is invalid? (try re-encoding it with ffmpeg: `ffmpeg -i 0520.wav audio.wav`)
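If you want a quick way to check the wav before feeding it to encodec, here is a stdlib-only sketch (the `wav_info` helper name is mine; note Python's `wave` module only reads PCM wavs, so a failure here doesn't prove the file is broken, but a success confirms a readable header):

```python
import wave

def wav_info(path):
    # Returns basic header info; raises wave.Error on a malformed PCM wav.
    with wave.open(path, "rb") as r:
        return {
            "channels": r.getnchannels(),
            "sample_rate": r.getframerate(),
            "seconds": r.getnframes() / r.getframerate(),
        }

# Demo on a generated one-second mono file.
with wave.open("check.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(24000)
    w.writeframes(b"\x00\x00" * 24000)

print(wav_info("check.wav"))  # {'channels': 1, 'sample_rate': 24000, 'seconds': 1.0}
```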
Thank you so much for your reply! Sadly it still didn't work for me. How did you generate the npz? This is what I wrote, so it's probably the issue:
```
from encodec import EncodecModel
from encodec.utils import convert_audio
import torchaudio
import torch
import numpy

# Instantiate a pretrained EnCodec model
model = EncodecModel.encodec_model_24khz()
# The number of codebooks used will be determined by the bandwidth selected.
# E.g. for a bandwidth of 6 kbps, n_q = 8 codebooks are used.
# Supported bandwidths are 1.5 kbps (n_q = 2), 3 kbps (n_q = 4),
# 6 kbps (n_q = 8), 12 kbps (n_q = 16) and 24 kbps (n_q = 32).
# For the 48 kHz model, only 3, 6, 12 and 24 kbps are supported. The number
# of codebooks for each is half that of the 24 kHz model, as the frame rate
# is twice as high.
model.set_target_bandwidth(6.0)

# Load and pre-process the audio waveform
wav, sr = torchaudio.load("0520.wav")
wav = convert_audio(wav, sr, model.sample_rate, model.channels)
wav = wav.unsqueeze(0)

# Extract discrete codes from EnCodec
with torch.no_grad():
    encoded_frames = model.encode(wav)
codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1)  # [B, n_q, T]

fine_prompt = codes  # <- is this the issue?
coarse = fine_prompt[:2, :]

numpy.savez("pleasework.npz", semantic_prompt=semantic_tokens, fine_prompt=fine_prompt, coarse_prompt=coarse)
```
Sorry for the wait!
I put my code into a Jupyter notebook, and I still got the same problem! I'll link it, and my audio.wav is in it.
Thanks so much for your time!
You can't upload an audio file like that to Google Colab, since its storage is not persistent.
Check if you can clone the file here.
I found the problem! You were right! I shortened my wav to under 10 seconds, and it's working, thank you so much! By the way, it might be helpful for others if you put the Google Colab notebook I linked above in the README:
https://colab.research.google.com/drive/1IA3c_R859nANerMARazCSrjc2UD3ws8A?usp=sharing
Oh, actually, I noticed this today.
```
codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1)  # [B, n_q, T]
```
should be
```
codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1).squeeze()  # [n_q, T]
```
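To see why the `.squeeze()` matters, here is a minimal shape demonstration (with numpy standing in for torch, since the slicing behaves the same way): without it, the leftover batch dimension means `fine_prompt[:2, :]` slices the batch axis instead of the codebook axis.

```python
import numpy as np

codes = np.zeros((1, 8, 100))        # [B, n_q, T], as returned for a single clip
print(codes[:2, :].shape)            # (1, 8, 100) -- sliced the batch axis, still all 8 codebooks
print(codes.squeeze()[:2, :].shape)  # (2, 100) -- the two coarse codebooks we actually want
```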
That makes sense! The code I write is usually the problem 🤣
Thanks so much!
That was actually something I was missing in the old version, plus the encodec example doesn't have it. So that's on me.
For anyone looking for the answer:
```
codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1)  # [B, n_q, T]
```
should be
```
codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1).squeeze()  # [n_q, T]
```
Also, shorten your audio file to under 10 seconds.