
bigvgan-mirror

A mirror of BigVGAN for access via PyTorch Hub, with no dependencies other than PyTorch. I cleaned up the original code from NVIDIA. The weights here are intended for inference only; to train BigVGAN, you also need the discriminator from the original repository and have to add weight_norm back to the generator, as sketched below.
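
If you do want to train or fine-tune, one possible way to restore weight normalization on this inference-only generator is sketched below. It assumes the generator exposes ordinary Conv1d/ConvTranspose1d modules, as in the upstream NVIDIA code; the discriminators themselves are not part of this mirror.

import torch
import torch.nn as nn

model = torch.hub.load("lars76/bigvgan-mirror", "bigvgan_base_22khz_80band",
                       trust_repo=True, pretrained=True)

# Re-apply weight_norm to every convolution. This only reparametrizes the existing
# pretrained weights (weight -> weight_g, weight_v) and does not change the output.
# Newer PyTorch versions offer torch.nn.utils.parametrizations.weight_norm instead.
for module in model.modules():
    if isinstance(module, (nn.Conv1d, nn.ConvTranspose1d)):
        torch.nn.utils.weight_norm(module)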

Example Usage

Below is an example demonstrating how to generate a mel spectrogram from an audio file and use BigVGAN to synthesize audio from it.

import torch
import librosa
import numpy as np
from scipy.io import wavfile

def mel_spectrogram(y, n_fft, num_mels, sampling_rate, hop_size, win_size, fmin, fmax):
    # Create mel filterbank
    mel_basis = librosa.filters.mel(
        sr=sampling_rate, n_fft=n_fft, n_mels=num_mels, fmin=fmin, fmax=fmax
    )

    # Pad the signal
    pad_length = int((n_fft - hop_size) / 2)
    y = np.pad(y, (pad_length, pad_length), mode="reflect")

    # Compute STFT
    D = librosa.stft(
        y,
        n_fft=n_fft,
        hop_length=hop_size,
        win_length=win_size,
        window="hann",
        center=False,
        pad_mode="reflect",
    )

    # Convert to magnitude spectrogram and add small epsilon
    S = np.sqrt(np.abs(D) ** 2 + 1e-9)

    # Apply mel filterbank
    S = np.dot(mel_basis, S)

    # Convert to log scale
    S = np.log(np.maximum(S, 1e-5))

    return S

model = torch.hub.load("lars76/bigvgan-mirror", "bigvgan_base_22khz_80band",
                       trust_repo=True, pretrained=True)

wav, sr = librosa.load('/path/to/your/audio.wav', sr=model.sampling_rate, mono=True)
mel = torch.FloatTensor(
    mel_spectrogram(wav, model.n_fft, model.num_mels, model.sampling_rate,
                    model.hop_size, model.win_size, model.fmin, model.fmax)
).unsqueeze(0)

with torch.inference_mode():
    predicted_wav = model(mel) # 1 x T tensor (32-bit float)

predicted_wav = np.int16(predicted_wav.squeeze(0).cpu().numpy() * 32767)
wavfile.write("output.wav", model.sampling_rate, predicted_wav)
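
The example above runs on the CPU. If a GPU is available, the same pipeline works after moving the model and the mel spectrogram to it; the snippet below is only a sketch that continues the example above.

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device).eval()

with torch.inference_mode():
    predicted_wav = model(mel.to(device))  # 1 x T tensor on `device`

# Bring the waveform back to the CPU before converting and writing it
predicted_wav = np.int16(predicted_wav.squeeze(0).cpu().numpy() * 32767)
wavfile.write("output.wav", model.sampling_rate, predicted_wav)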

Model descriptions

TTS models

| Model Name | Mels | n_fft | Hop Size | Win Size | Sampling Rate (Hz) | fmin (Hz) | fmax (Hz) | Params | Dataset |
|---|---|---|---|---|---|---|---|---|---|
| bigvgan_v2_22khz_80band_fmax8k_256x | 80 | 1024 | 256 | 1024 | 22050 | 0 | 8000 | 112M | Large-scale Compilation |
| bigvgan_base_22khz_80band | 80 | 1024 | 256 | 1024 | 22050 | 0 | 8000 | 14M | LibriTTS + VCTK + LJSpeech |
| bigvgan_22khz_80band | 80 | 1024 | 256 | 1024 | 22050 | 0 | 8000 | 112M | LibriTTS + VCTK + LJSpeech |

Because bigvgan_v2_22khz_80band_fmax8k_256x was also trained on non-speech data, I found bigvgan_base_22khz_80band and bigvgan_22khz_80band to be much better suited for text-to-speech systems such as FastSpeech2. Of those two, bigvgan_base_22khz_80band also seems to perform better than bigvgan_22khz_80band.
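
Every checkpoint listed on this page is loaded the same way as in the example above, just with a different name. Assuming each table entry corresponds to a hub entrypoint, the available names can also be listed programmatically:

import torch

# Enumerate the entrypoints exposed by the mirror's hubconf.py
print(torch.hub.list("lars76/bigvgan-mirror", trust_repo=True))

# Load one of the TTS-oriented checkpoints by name
model = torch.hub.load("lars76/bigvgan-mirror", "bigvgan_22khz_80band",
                       trust_repo=True, pretrained=True)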

Other models

| Model Name | Mels | n_fft | Hop Size | Win Size | Sampling Rate (Hz) | fmin (Hz) | fmax (Hz) | Params | Dataset |
|---|---|---|---|---|---|---|---|---|---|
| bigvgan_v2_44khz_128band_512x | 128 | 2048 | 512 | 2048 | 44100 | 0 | 22050 | 122M | Large-scale Compilation |
| bigvgan_v2_44khz_128band_256x | 128 | 1024 | 256 | 1024 | 44100 | 0 | 22050 | 112M | Large-scale Compilation |
| bigvgan_v2_24khz_100band_256x | 100 | 1024 | 256 | 1024 | 24000 | 0 | 12000 | 112M | Large-scale Compilation |
| bigvgan_v2_22khz_80band_256x | 80 | 1024 | 256 | 1024 | 22050 | 0 | 11025 | 112M | Large-scale Compilation |
| bigvgan_base_24khz_100band | 100 | 1024 | 256 | 1024 | 24000 | 0 | 12000 | 14M | LibriTTS |
| bigvgan_24khz_100band | 100 | 1024 | 256 | 1024 | 24000 | 0 | 12000 | 112M | LibriTTS |

Benchmark

You can run python benchmark.py to compare the performance of this cleaned-up code against the original implementation.

The table below presents PESQ (Perceptual Evaluation of Speech Quality) scores on 20 WAV files from the AISHELL-3 dataset, comparing the cleaned-up models with the original models.

| Model Name | PESQ ± StdDev |
|---|---|
| bigvgan_base_22khz_80band | 3.5709 ± 0.3007 |
| bigvgan_22khz_80band | 3.9903 ± 0.2270 |
| bigvgan_base_24khz_100band | 3.6300 ± 0.3398 |
| bigvgan_24khz_100band | 4.0123 ± 0.2957 |
| bigvgan_v2_44khz_128band_512x | 3.7670 ± 0.3044 |
| bigvgan_v2_44khz_128band_256x | 3.8614 ± 0.4240 |
| bigvgan_v2_24khz_100band_256x | 4.1535 ± 0.3204 |
| bigvgan_v2_22khz_80band_256x | 3.9525 ± 0.2709 |
| bigvgan_v2_22khz_80band_fmax8k_256x | 4.0165 ± 0.2682 |

The cleaned-up models produce the same results as the original models.
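
For reference, a PESQ score like the ones above can be computed with the pesq package. The sketch below is only an illustration of the metric and is not necessarily what benchmark.py does; PESQ is defined only at 8 kHz and 16 kHz, so both signals are loaded at 16 kHz here.

import librosa
from pesq import pesq

def pesq_score(ref_path, deg_path, sr=16000):
    # Load both files at 16 kHz (wide-band PESQ) and trim to the common length
    ref, _ = librosa.load(ref_path, sr=sr, mono=True)
    deg, _ = librosa.load(deg_path, sr=sr, mono=True)
    n = min(len(ref), len(deg))
    return pesq(sr, ref[:n], deg[:n], "wb")

# Compare the original recording with the vocoder output from the example above
print(pesq_score("/path/to/your/audio.wav", "output.wav"))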
