
Comments (11)

gitmylo commented on June 14, 2024

You should probably wrap your code in code blocks (``` around your text) in the future.

I ran that code, and it created the file just fine. Can you send me the wav you're using? I think your input wav is a bit broken, and encodec can't load it.

Again, this issue is not really related to my repository here. But it's probably your wav file.

from bark-voice-cloning-hubert-quantizer.

gitmylo commented on June 14, 2024

You don't need to shorten the audio, but it's recommended to shorten it to 15 or 20 seconds. Going beyond 15 seconds will result in less audio for it to clone from.

Make sure you take the audio from the end, not the start.
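For readers who want to trim a clip from the end, here is a minimal sketch using only Python's standard `wave` module. The function name `keep_last_seconds` and the file paths are hypothetical, and the sketch assumes an uncompressed PCM WAV file; it is one possible way to do the trimming, not the method used by the repository.

```python
import wave


def keep_last_seconds(src_path, dst_path, seconds):
    """Copy only the final `seconds` of a PCM WAV file to dst_path.

    Returns the number of frames written.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        total_frames = src.getnframes()
        wanted = int(seconds * src.getframerate())
        # If the clip is shorter than requested, keep all of it.
        start = max(0, total_frames - wanted)
        src.setpos(start)
        frames = src.readframes(total_frames - start)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # The frame count in the header is corrected when the file is closed.
        dst.writeframes(frames)

    return total_frames - start
```

Calling `keep_last_seconds("0520.wav", "audio.wav", 15)` would keep the last 15 seconds, taking the audio from the end rather than the start.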

gitmylo commented on June 14, 2024

This doesn't seem to be an issue with my repository. This repository exclusively extracts semantics.

Also, I was not able to reproduce the issue; your code worked fine on my side.

  • Do you have the right path for your models?
  • Do you have the right version of encodec installed? (pip install -f encodec)
  • Maybe your wav is invalid? (try re-encoding it with ffmpeg: ffmpeg -i 0520.wav audio.wav)
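A quick way to look at a WAV file's header before re-encoding is sketched below with Python's standard `wave` module. The helper name `describe_wav` is hypothetical. Note the assumption: `wave` only parses uncompressed PCM files, so an error here suggests the file deserves a closer look but does not prove that torchaudio or EnCodec would reject it.

```python
import wave


def describe_wav(path):
    """Return basic header facts for a PCM WAV file, or raise if it cannot be parsed."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }
```

For example, `print(describe_wav("0520.wav"))` shows the channel count, sample rate, and duration in one place.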

NickAnastasoff commented on June 14, 2024

Thank you so much for your reply! Sadly, it still didn't work for me. How did you generate the npz? This is what I wrote, so it's probably the issue:
```
from encodec import EncodecModel
from encodec.utils import convert_audio

import torchaudio
import torch

# Instantiate a pretrained EnCodec model
model = EncodecModel.encodec_model_24khz()
# The number of codebooks used will be determined by the bandwidth selected.
# E.g. for a bandwidth of 6 kbps, n_q = 8 codebooks are used.
# Supported bandwidths are 1.5 kbps (n_q = 2), 3 kbps (n_q = 4), 6 kbps (n_q = 8),
# 12 kbps (n_q = 16) and 24 kbps (n_q = 32).
# For the 48 kHz model, only 3, 6, 12, and 24 kbps are supported. The number
# of codebooks for each is half that of the 24 kHz model as the frame rate is twice as much.
model.set_target_bandwidth(6.0)

# Load and pre-process the audio waveform
wav, sr = torchaudio.load("0520.wav")
wav = convert_audio(wav, sr, model.sample_rate, model.channels)
wav = wav.unsqueeze(0)

# Extract discrete codes from EnCodec
with torch.no_grad():
    encoded_frames = model.encode(wav)
codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1)  # [B, n_q, T]

fine_prompt = codes  # <- is this the issue?

coarse = fine_prompt[:2, :]

import numpy

# semantic_tokens is produced elsewhere and is not defined in this snippet
numpy.savez(semantic_prompt=semantic_tokens, fine_prompt=fine_prompt, coarse_prompt=coarse, file="pleasework.npz")
```
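One way to sanity-check a saved prompt file is to list the shape of every array in it. The sketch below uses only `numpy.load`; the helper name `npz_shapes` is hypothetical, and which shapes count as correct is not stated in this snippet, so the output is for inspection only.

```python
import numpy as np


def npz_shapes(path):
    """Map each array name stored in an .npz file to its shape."""
    with np.load(path) as data:
        return {name: data[name].shape for name in data.files}
```

For instance, `print(npz_shapes("pleasework.npz"))` would show whether `fine_prompt` and `coarse_prompt` still carry a leading axis of size 1.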

NickAnastasoff commented on June 14, 2024

Sorry for the wait!

I put my code into a Jupyter notebook, and I still got the same problem! I'll link that, and my audio.wav is in it.

Thanks so much for your time!

VoiceCloning Google Colab

gitmylo commented on June 14, 2024

You can't upload an audio file like that to Google Colab, since its storage is not persistent.

Check if you can clone the file in here

NickAnastasoff commented on June 14, 2024

I found the problem! You were right! I shortened my wav to under 10 seconds, and it's working, thank you so much! By the way, it might be helpful for others if you put the Google Colab I linked above in the readme:
https://colab.research.google.com/drive/1IA3c_R859nANerMARazCSrjc2UD3ws8A?usp=sharing

gitmylo commented on June 14, 2024

Oh, actually, I noticed this today.

codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1) # [B, n_q, T]

should be

codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1).squeeze() # [n_q, T]
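To see why the missing `.squeeze()` matters for the later `fine_prompt[:2, :]` slice, here is a small shape-only illustration. It uses NumPy arrays as a stand-in for the torch tensors (an assumption made so the example runs without EnCodec), and the frame count of 75 is arbitrary.

```python
import numpy as np

# Stand-in for the concatenated EnCodec codes: batch of 1, 8 codebooks, 75 frames.
codes = np.zeros((1, 8, 75), dtype=np.int64)

# Without squeeze, ":2" is applied to the batch axis (size 1) and ":" to the
# codebook axis, so all 8 codebooks survive.
coarse_wrong = codes[:2, :]

# After squeeze the batch axis is gone, so ":2" keeps the first 2 codebooks.
fine_prompt = codes.squeeze()
coarse = fine_prompt[:2, :]

print(coarse_wrong.shape)  # (1, 8, 75)
print(fine_prompt.shape)   # (8, 75)
print(coarse.shape)        # (2, 75)
```

A bare `squeeze()` drops every axis of size 1; `squeeze(0)` would remove only the batch axis, which may be the safer choice if another axis could ever have length 1.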

NickAnastasoff commented on June 14, 2024

That makes sense! The code I write is usually the problem 🤣

Thanks so much!

gitmylo commented on June 14, 2024

> That makes sense! The code I write is usually the problem 🤣
>
> Thanks so much!

That was actually something I was missing in the old version, plus the encodec example doesn't have it. So that's on me.

NickAnastasoff commented on June 14, 2024

For anyone trying to find an answer:

codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1) # [B, n_q, T]

should be

codes = torch.cat([encoded[0] for encoded in encoded_frames], dim=-1).squeeze() # [n_q, T]

And shorten the audio file to under 10 seconds.
