
aero's Introduction

AERO

Audio Super Resolution in the Spectral Domain

This is the official PyTorch implementation of AERO: Audio Super Resolution in the Spectral Domain (paper, project page).

Checkpoint files are available! Details below.

Requirements

Install the requirements specified in requirements.txt:
pip install -r requirements.txt

We ran our code with CUDA 11.3, so we installed pytorch/torchvision/torchaudio with the following:

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

Our code uses hydra to set parameters for the different experiments.

ViSQOL

If you want to run the code without ViSQOL, set visqol: False in conf/main_config.yaml.

In order to evaluate model output with the ViSQOL metric, one first needs to install Bazel and then ViSQOL.
In our code, we use ViSQOL via its command line API by using a Python subprocess.

Install Bazel and build ViSQOL following the directions here.

Add the absolute path of the ViSQOL root directory (where the WORKSPACE file is) to the visqol_path parameter in main_config.yaml.
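
For reference, here is a minimal sketch of such a subprocess call. The binary location and flag names follow the public ViSQOL command-line interface; the repo's actual invocation may differ:

import subprocess
from pathlib import Path

VISQOL_ROOT = "/absolute/path/to/visqol"  # same value as visqol_path in main_config.yaml

def visqol_score(reference_wav, degraded_wav, speech_mode=True):
    # Call the ViSQOL binary built by Bazel and parse the MOS-LQO score from stdout.
    cmd = [str(Path(VISQOL_ROOT) / "bazel-bin" / "visqol"),
           "--reference_file", reference_wav,
           "--degraded_file", degraded_wav]
    if speech_mode:
        cmd.append("--use_speech_mode")
    result = subprocess.run(cmd, cwd=VISQOL_ROOT, capture_output=True, text=True, check=True)
    # stdout typically ends with a line like "MOS-LQO:  3.85"
    return float(result.stdout.strip().split()[-1])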

Data

Download data

For speech we use the VCTK Corpus.
For music we use the mixture tracks of MUSDB18-HQ dataset. Make sure to download the uncompressed WAV version.

Resample data

The data are a collection of high/low resolution pairs. The corresponding high- and low-resolution signals should be placed in separate folders.

To create these folders, run resample_data.py once per target resolution (five resolutions for speech plus one for music), covering all source/target pairs in both the speech and music settings.

For speech, we use 4 lr-hr settings: 8-16 kHz, 8-24 kHz, 4-16 kHz, and 12-48 kHz. This requires resampling to 5 different resolutions (not including the original 48 kHz): 4, 8, 12, 16, and 24 kHz.

For music, we downsample once from the original 44.1 kHz to a target of 11.025 kHz.

E.g. for 4 and 16 kHz (note that --target_sr is given in Hz, not kHz):
python data_prep/resample_data.py --data_dir <path for 48 kHz data> --out_dir <path for 4 kHz data> --target_sr 4000
python data_prep/resample_data.py --data_dir <path for 48 kHz data> --out_dir <path for 16 kHz data> --target_sr 16000
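
For reference, the core of the resampling step can be sketched with torchaudio as below. This is only an illustration; the repo's resample_data.py may rely on sox and additional options:

import torchaudio

def resample_file(in_path, out_path, target_sr):
    # Load a high-resolution file and write a downsampled copy at target_sr (in Hz).
    wav, sr = torchaudio.load(in_path)
    wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=target_sr)
    torchaudio.save(out_path, wav, target_sr)

# e.g. resample_file("in_48k.wav", "out_4k.wav", 4000)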

Create egs files

For each low/high resolution pair, one should run create_meta_files.py twice: once for the low and once for the high resolution.
create_meta_files.py creates a pair of train and val "egs files", each under its respective folder. Each "egs file" contains meta information about the signals: their paths and signal lengths.
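
As an illustration, meta information of this kind could be collected as (path, length-in-samples) pairs along the lines below; this is only a sketch, and the exact on-disk format is whatever create_meta_files.py actually writes:

import json
import os
import torchaudio

def write_egs(wav_dir, out_path):
    # Record each wav file's path and its length in samples, i.e. the kind of meta
    # information the egs files contain (the real format may differ from this sketch).
    meta = []
    for name in sorted(os.listdir(wav_dir)):
        if name.endswith(".wav"):
            path = os.path.join(wav_dir, name)
            meta.append((path, torchaudio.info(path).num_frames))
    with open(out_path, "w") as f:
        json.dump(meta, f, indent=2)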

e.g. to create egs files for the various speech settings:

python data_prep/create_meta_files.py <path for 4 kHz data> egs/vctk/4-16 lr
python data_prep/create_meta_files.py <path for 16 kHz data> egs/vctk/4-16 hr

python data_prep/create_meta_files.py <path for 8 kHz data> egs/vctk/8-16 lr
python data_prep/create_meta_files.py <path for 16 kHz data> egs/vctk/8-16 hr

python data_prep/create_meta_files.py <path for 8 kHz data> egs/vctk/8-24 lr
python data_prep/create_meta_files.py <path for 24 kHz data> egs/vctk/8-24 hr

python data_prep/create_meta_files.py <path for 12 kHz data> egs/vctk/12-48 lr
python data_prep/create_meta_files.py <path for 48 kHz data> egs/vctk/12-48 hr

Creating dummy egs files (for debugging code)

If you want to debug the code on a small number of samples, you can create dummy egs files with the --n_samples_limit flag. (This might be a little buggy; make sure that the same files appear in both the high- and low-resolution meta (egs) files.)

python data_prep/create_meta_files.py <path for 4 kHz data> egs/vctk/4-16 lr --n_samples_limit=32
python data_prep/create_meta_files.py <path for 16 kHz data> egs/vctk/4-16 hr --n_samples_limit=32

Train

Run train.py with dset and experiment parameters
(make sure that the lr_sr and hr_sr parameters in the experiment match the sample rates of the dataset).

e.g. for upsampling from 4kHz to 16kHz, with n_fft=512 and hop_length=64:

python train.py dset=4-16 experiment=aero_4-16_512_64

To train with multiple GPUs, run with the parameter ddp=true, e.g.

python train.py dset=4-16 experiment=aero_4-16_512_64 ddp=true

Test (on whole dataset)

  • Make sure to create the appropriate egs files for the specific LR-to-HR setting
    • e.g. for 4-16:
      python data_prep/create_meta_files.py <path for 4 kHz data> egs/vctk/4-16 lr
      python data_prep/create_meta_files.py <path for 16 kHz data> egs/vctk/4-16 hr
  • Create a directory whose name follows the experiment format aero-nfft=<NFFT>-hl=<HOP_LENGTH> (e.g. aero-nfft=512-hl=64)
  • Copy/download the appropriate checkpoint.th file into this directory (make sure its nfft and hop_length match those of the experiment file)
  • Run python test.py dset=<LR>-<HR> experiment=aero_<LR>-<HR>_<NFFT>_<HOP_LENGTH>

e.g. for upsampling from 4kHz to 16kHz, with n_fft=512 and hop_length=64:

python test.py \
  dset=4-16 \
  experiment=aero_4-16_512_64

Predict (on single sample)

  • Copy/download the appropriate checkpoint.th file into the experiment's directory (make sure its nfft and hop_length match those of the experiment file)
  • Run predict.py, appending filename and output parameters via the hydra framework; these correspond to the input file and the output directory, respectively.

e.g. for upsampling from 4kHz to 16kHz, with n_fft=512 and hop_length=64:

python predict.py \
  dset=4-16 \
  experiment=aero_4-16_512_64 \
  +filename=<absolute path to input file> \
  +output=<absolute path to output directory>

Checkpoints

To use pre-trained models, one can download checkpoints from here.

To point to a checkpoint when testing or predicting, override/set the path under checkpoint_file: <path> in conf/main_config.yaml.
e.g.

python test.py \
  dset=4-16 \
  experiment=aero_4-16_512_64 \
  +checkpoint_file=<path to appropriate checkpoint.th file>

Alternatively, make sure that the checkpoint file is in its corresponding output folder:
For each low-to-high resolution setting, hydra creates a folder under outputs/ named lr-hr (e.g. outputs/4-16). Under each such folder, hydra creates a folder named after the experiment and its n_fft and hop_length hyper-parameters (e.g. aero-nfft=512-hl=256). Make sure that each checkpoint exists beforehand in the appropriate output folder. If you download the outputs folder and place it under the root directory (which contains train.py and /src), it should retain the appropriate structure and no renaming should be necessary (make sure that restart: false in conf/main_config.yaml).
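
For example, for the 4-16 setting with n_fft=512 and hop_length=64, the expected layout (following the naming conventions described above) would look roughly like:

outputs/
  4-16/
    aero-nfft=512-hl=64/
      checkpoint.th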


aero's Issues

Optimal configuration for 12khz - 48khz

Hi Authors,

Are there any specific alterations to the configuration for 12->48 kHz upsampling?
I seem to be generating a spectrogram with a distinct cut, such as the one seen in Fig. 3 (before spectral upsampling), on longer voice clips.

Thanks, appreciate the work


Output not good when using predict and provided checkpoints

Hey there!

Awesome project, thank you for sharing.

Not sure if I need to train the models, but I was expecting the provided checkpoints to work out of the box for normal speech data.

Here is a notebook to predict
https://colab.research.google.com/drive/1s8nk1Iadwajd3cFoTqis8nZf_F2Tqgaw?usp=sharing

Input: https://drive.google.com/file/d/1c5sM6CoOQfD8OCy7GRr4qJ9RxchllA3B/view?usp=drive_link

Output: https://drive.google.com/file/d/1FCJCXVwKXN3WXKXw7PdBlk7Up_6Ay0jv/view?usp=drive_link

Am I doing something wrong, or do the provided checkpoints just not generalize?

real-time

Have you made any progress applying the algorithm to real-time streaming audio?

Applying the model for bandwidth extension (not super-resolution)

I am trying to train your model for the task of bandwidth extension, where the input observations are lowpassed with a certain lowpass filter at a given cutoff frequency. This differs from the super-resolution or upsampling scenario, where the lowpass filter is an antialiasing filter whose cutoff frequency is around the Nyquist frequency of the low-resolution signal.

I understand that the spectral upsampling you propose serves as a strong regularization strategy to prevent filter overfitting. I am just wondering if your method could be considered as a baseline in my case, given that the spectral upsampling method you propose is not applicable. You mention in the paper that you noticed some artifacts when this trick was not used, but does it still work despite that? Or does it affect GAN training stability somehow?

I'm now training a couple of models to bandwidth-extend some piano music (MAESTRO) from a lowpassed version at 1 kHz and 3 kHz. I'm training with a single lowpass filter, expecting the model to overfit to it. So far it kind of works, but the quality is still not great. I have only trained for half a day, so I'd better wait.

A Question about vocals

Hello, author,

I have a couple of questions that I hope get answered.

I want to train a 24 kHz to 48 kHz model for raw singing vocals that have no background music or reverberation at all, just pure raw studio vocals, but I have some questions.

1. These vocals are in different languages: English has approximately 50 hours of vocals, Chinese 90 hours, Korean 2.5 hours, Japanese 6.6 hours, and Italian 1 hour. Would having different languages confuse the algorithm and make the output bad or hallucinated, would it have a beneficial impact, or should I only train on the English ones?

2. I should keep my almost 150-hour dataset alone and not mix it with the MusDB dataset, which has musical mixtures and full songs with instrumentals, right? Just treat this 150-hour dataset like VCTK?

3. Some of these raw vocals aren't transcribed; is this okay? Does aero train on and accept non-transcribed data?

4. I don't really understand the FFT, hop length, and window size settings. Which of these will produce the best quality for my case as stated above: (64/512), (256/1024), (256/512), or (128/512)?

Also, how do I know the configuration (FFT, hop length, and window size) of my dataset, and how do I convert my dataset to the specific configuration that I want out of the four, or the one that you recommend?

I don't mind the training time; I want the most optimal and highest quality for upscaling, so if there is another configuration that isn't mentioned above, please tell me about it.

Again, sorry if I'm asking basic questions, and thanks for your time.

Training logs

Hi, thanks for the great work.

Could you share the training logs of your trained model?
Since I am trying to make a 24k -> 48k aero model, it would be a great help if you could share some training logs.

Artifacts at the boundary between existing and extended frequency bands

I trained the model myself and found a line of artifacts at the boundary between the existing and extended frequency bands. I read the paper related to this repo, and the three reasons it gives for this phenomenon did not apply to my training process, so I don't know why.
I guess maybe it's because of the process I used to convert .flac to .wav for the VCTK dataset. Could you please tell me how you converted .flac to .wav?
Thank you!

Predict on CPU

As I understand from the code, there is hardcoded CUDA support. So I changed the device to cpu and replaced model.cuda() with model.cpu(), but when I run predict I get a strange error:

RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension, but got: padding (256, 256) at dimension 2 of input [1, 2, 160]

I don't know if it is a problem with the CPU or something else.

Many related small questions

Hiya, thanks for your great work, it really does work well.

I was trying to train my own model and hit a few small hitches that I hope to solve for future users of the code:

  1. When going over the instructions to resample the VCTK files, it took me a while to debug that you need to pass the full sample rate (e.g. 16000) and not the sample rate in kHz, as was given in the example command lines (16). A small update to the readme would be nice for future users (e.g. --target_sr 4 -> --target_sr 4000).
  2. Converting the audio from .flac (VCTK default) to .wav is also a necessary step for the scripts (though note that sox has no issue reading .flac files)
  3. Creating configuration & dataset .yaml files: I copied the provided 4-16 files and modified appropriately.
  4. One of the files, p271/p271_069_mic1.wav has exactly 96001 samples, so for the 12-48KHz task, its length is rounded up to 2 sections for the high-res dataset, but after downsampling it's rounded down to 24000 samples, so it's only one section. This breaks the training code unfortunately. I fixed it by manually trimming the file by one sample (sox p271_069_mic1.wav p271_trimmed.wav trim 0 -1s) and regenerating the .egs files.

I was left with a few questions:

  1. The paper says the setting for 12-48 was nfft=1024 and hop=256, batch size 8. The models you provide in the drive say 512/256 and 512/128. What should I believe?
  2. Does the code support resuming a training run?

Thanks again for the paper and the code - the results are good and the code is as well.

How to successfully install sox in a conda virtual environment?

I created a virtual environment and installed everything in requirements.txt into it. However, although I added the installed SoX to the PATH variable, when I call resample_data.py I get an error that SoX is not found. My OS is Win11, btw.

Musdb model

Hello,

I'm trying to use predict to improve some old music I have, as was done here in your project:

Section Ⅴ: Examples for samples upsampled from 11.025kHz to 44.1kHz.
The model is trained on the train set of the MusDB-HQ dataset.

but I think I need a musdb experiment yaml file. I was able to download the checkpoint.th and tried to use the output naming convention to predict, but I believe there is no matching experiment yaml file. The dset training hydra config would be nice too, if possible.

Thanks much, and cool project.

Model Comparison

I am very interested in your paper, thank you for sharing.

May I ask how the different models are compared, e.g. how many epochs are appropriate for training nuwave2? Is the code of the comparison model (such as nuwave2) merged into the AERO code to produce the comparison results?

BEHMGAN Training parameters

Hi,
As you've done in your paper, I'm also trying to train BEHMGAN with 44.1 kHz recordings, but I'm having issues: the frequencies generated above 4 kHz are shifted by some milliseconds in time.

Did you have this kind of issue at some point in training? If not, could you share the parameters of the conf file used for this setting?

Thanks

Wallace

"assert len(self.hr_set) == len(self.lr_set)" error

Hi Authors,
When I run train.py, it prompts an error as below.

Traceback (most recent call last):
  File "/workspace/aero-main/train.py", line 135, in main
    _main(args)
  File "/workspace/aero-main/train.py", line 127, in _main
    run(args)
  File "/workspace/aero-main/train.py", line 54, in run
    tr_dataset = LrHrSet(args.dset.train, args.experiment.lr_sr, args.experiment.hr_sr,
  File "/home/wsd/workspace/aero-main/src/data/datasets.py", line 136, in __init__
    assert len(self.hr_set) == len(self.lr_set)
AssertionError
ERROR conda.cli.main_run:execute(49): `conda run python /workspace/aero-main/train.py dset=12-48 experiment=aero_12-48_512_256` failed.

The error is raised at: assert len(self.hr_set) == len(self.lr_set). I then printed the length of both:

len(self.lr_set): 64349
len(self.hr_set): 64353
I read the code that computes the segment lengths; the logic is to segment the audio based on the different sampling rates.

class Audioset:
    def __init__(self, files=None, length=None, stride=None,
                 pad=True, with_path=False, sample_rate=None,
                 channels=None):
        """
        files should be a list [(file, length)]
        """
        self.files = files
        self.num_examples = []
        self.length = length
        self.stride = stride or length
        self.with_path = with_path
        self.sample_rate = sample_rate
        self.channels = channels

        for file, file_length in self.files:
            if length is None:
                examples = 1
            elif file_length < length:
                examples = 1 if pad else 0
            elif pad:
                examples = int(math.ceil((file_length - self.length) / self.stride) + 1)
            else:
                examples = (file_length - self.length) // self.stride + 1
            self.num_examples.append(examples)
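
For illustration, here is a worked example of how the example counts can diverge between the low- and high-resolution sets, assuming segment=2 and stride=2 seconds with pad=True (so length and stride equal 2 seconds at each sample rate). It reproduces the rounding case described in the "Many related small questions" issue above: a 48 kHz file with 96001 samples yields two segments, while its 12 kHz counterpart with exactly 24000 samples yields one.

import math

def n_examples(file_length, length, stride):
    # Same formula as in Audioset above, with pad=True.
    if file_length < length:
        return 1
    return int(math.ceil((file_length - length) / stride) + 1)

print(n_examples(96001, 2 * 48000, 2 * 48000))  # 2 segments in the 48 kHz (hr) set
print(n_examples(24000, 2 * 12000, 2 * 12000))  # 1 segment in the 12 kHz (lr) set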

I am using the same dataset as the author, which is VCTK (excluding p315 and s5).
Can you tell me how this situation should be handled?

Restore old tape recordings

Hello,
I wondered if you might have a comment on how to improve the clarity of recordings such as these:
https://fsi-languages.yojik.eu/languages/FSI/fsi-french-basic.html

They are from old (reel-to-reel) tapes, with some kind of NR/noise gate, and low-bitrate compression.

Maybe I could make a dataset of original and (simulated) degraded recordings by running clean recordings through this cassette-tape simulator VST plugin?

https://www.wavesfactory.com/audio-plugins/cassette/
https://github.com/teragonaudio/MrsWatson

Thanks for your time.

Training metrics

Hi Authors,

I can't find any papers that beat your implementation - what a milestone!
I am considering training the model for 16khz - 48khz and 22khz - 48khz.

Could I ask what hardware you used and how many steps the provided checkpoints were trained for? Any additional information regarding the training parameters would be extremely helpful; I can provide the checkpoints to this repo for open-source use once they're trained.

Again, fantastic stuff here! Thank you 👍

Suggestion: Allow bfloat16 use to improve speed/memory usage

Hi, as you may have noticed, I've been developing a fork of this in an attempt to repurpose it to train models for upscaling AM and FM radio recordings. In any event, while researching possible ways to improve speed and memory consumption, I started looking at 16-bit floating-point formats. The standard float16 doesn't appear to have enough range to be useful for this project (I ended up with nan values relatively quickly), but bfloat16 (which uses the same number of exponent bits as float32) seems to work quite well: it speeds up training a bit and has a significant impact on memory usage. You can see an example of its implementation in a recent commit I made. Note that I did have to restructure the code so that the loss calculation is done using standard 32-bit float values (as recommended by pytorch).
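
As a rough illustration of this approach (not the fork's actual code), the forward pass can run under bfloat16 autocast while the loss is computed in float32, along these lines:

import torch

def train_step(model, loss_fn, optimizer, lr_batch, hr_batch):
    optimizer.zero_grad()
    # bfloat16 keeps the float32 exponent range, avoiding the overflows (nan values)
    # that plain float16 can hit here, while still saving memory and some compute time.
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        pr_batch = model(lr_batch)
    # Compute the loss in full float32, as recommended by PyTorch for mixed precision.
    loss = loss_fn(pr_batch.float(), hr_batch.float())
    loss.backward()  # no GradScaler is needed with bfloat16
    optimizer.step()
    return loss.item()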

AssertionError in training

Hello, I ran into this problem when training the aero model:
assert len(self.hr_set) == len(self.lr_set) AssertionError

Predict.py output is silence

Hi,
I have used the pre-trained checkpoint for 16-48. My original audio file is 16 kHz, and I want to perform super-resolution to enhance the quality of the speech. I have replaced the checkpoint path in main_config.yaml, but there is no audio in the output I'm getting; it's just silence. Any idea why that might be the case?

Thanks in advance !

Regards,
Harsh

RuntimeError: Given groups=1, weight of size [48, 2, 1, 1], expected input[1, 4, 256, 7501] to have 2 channels, but got 4 channels instead

I am trying to run 12-48 / aero-nfft=512-hl=256, although I have no idea what 512 and 256 are.

I installed all the requirements, ran my command like this, and got an error:

python predict.py dset=4-16 experiment=aero_4-16_512_256 +filename="D:\86 se courses youtube kanali\aero\5dk.mp3" +output="D:\86 se courses youtube kanali\aero\5_v2dk.mp3" checkpoint_file="D:\86 se courses youtube kanali\aero\checkpoint.th"

I want to improve the quality of this 5-minute audio: https://sndup.net/stjs/

(env) D:\86 se courses youtube kanali\aero>python predict.py dset=4-16 experiment=aero_4-16_512_256 +filename="D:\86 se courses youtube kanali\aero\5dk.mp3" +output="D:\86 se courses youtube kanali\aero\5_v2dk.mp3" checkpoint_file="D:\86 se courses youtube kanali\aero\checkpoint.th"
D:\86 se courses youtube kanali\aero\env\lib\site-packages\hydra\_internal\defaults_list.py:251: UserWarning: In 'main_config': Defaults list is missing `_self_`. See https://hydra.cc/docs/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
{'experiment': {'name': 'aero-nfft=${experiment.nfft}-hl=${experiment.hop_length}', 'lr_sr': 4000, 'hr_sr': 16000, 'segment': 2, 'stride': 2, 'pad': True, 'upsample': False, 'batch_size': 16, 'nfft': 512, 'hop_length': 256, 'model': 'aero', 'aero': {'in_channels': 1, 'out_channels': 1, 'channels': 48, 'growth': 2, 'nfft': '${experiment.nfft}', 'hop_length': '${experiment.hop_length}', 'end_iters': 0, 'cac': True, 'rewrite': True, 'hybrid': False, 'hybrid_old': False, 'freq_emb': 0.2, 'emb_scale': 10, 'emb_smooth': True, 'kernel_size': 8, 'strides': [4, 4, 2, 2], 'context': 1, 'context_enc': 0, 'freq_ends': 4, 'enc_freq_attn': 0, 'norm_starts': 2, 'norm_groups': 4, 'dconv_mode': 1, 'dconv_depth': 2, 'dconv_comp': 4, 'dconv_time_attn': 2, 'dconv_lstm': 2, 'dconv_init': 0.001, 'rescale': 0.1, 'lr_sr': '${experiment.lr_sr}', 'hr_sr': '${experiment.hr_sr}', 'spec_upsample': True, 'act_func': 'snake', 'debug': False}, 'adversarial': True, 'features_loss_lambda': 100, 'only_features_loss': False, 'only_adversarial_loss': False, 'discriminator_models': ['msd_melgan'], 'melgan_discriminator': {'n_layers': 4, 'num_D': 3, 'downsampling_factor': 4, 'ndf': 16}}, 'dset': {'name': '4-16', 'train': 'egs/vctk/4-16/tr', 'valid': None, 'test': 'egs/vctk/4-16/val'}, 'num_prints': 5, 'device': 'cuda', 'num_workers': 2, 'verbose': 0, 'show': 0, 'log_results': True, 'checkpoint': True, 'continue_from': '', 'continue_best': False, 'restart': False, 'checkpoint_file': 'D:\\86 se courses youtube kanali\\aero\\checkpoint.th', 'best_file': 'best.th', 'history_file': 'history.json', 'test_results_file': 'test_results.json', 'samples_dir': 'samples', 'keep_history': True, 'seed': 2036, 'dummy': '', 'visqol': True, 'visqol_path': None, 'eval_every': 25, 'enhance_samples_limit': -1, 'valid_equals_test': None, 'cross_valid': False, 'cross_valid_every': 5, 'joint_evaluate_and_enhance': True, 'evaluate_on_best': False, 'wandb': {'project_name': 'Spectral Bandwidth Extension', 'entity': None, 'mode': 'online', 'log': 'all', 'log_freq': 5, 'n_files_to_log': 10, 'n_files_to_log_to_table': 10, 'tags': [], 'resume': False}, 'optim': 'adam', 'lr': 0.0003, 'beta1': 0.8, 'beta2': 0.999, 'losses': ['stft'], 'stft_sc_factor': 0.5, 'stft_mag_factor': 0.5, 'epochs': 125, 'ddp': False, 'ddp_backend': 'nccl', 'rendezvous_file': './rendezvous', 'rank': None, 'world_size': None, 'filename': 'D:\\86 se courses youtube kanali\\aero\\5dk.mp3', 'output': 'D:\\86 se courses youtube kanali\\aero\\5_v2dk.mp3'}
[2023-02-09 14:36:36,703][__main__][INFO] - Loading model aero from last state.
[2023-02-09 14:36:38,679][__main__][INFO] - lr wav shape: torch.Size([2, 14400000])
[2023-02-09 14:36:38,680][__main__][INFO] - number of chunks: 30
Error executing job with overrides: ['dset=4-16', 'experiment=aero_4-16_512_256', '+filename=D:\\86 se courses youtube kanali\\aero\\5dk.mp3', '+output=D:\\86 se courses youtube kanali\\aero\\5_v2dk.mp3', 'checkpoint_file=D:\\86 se courses youtube kanali\\aero\\checkpoint.th']
Traceback (most recent call last):
  File "D:\86 se courses youtube kanali\aero\predict.py", line 77, in main
    pr_chunk = model(lr_chunk.unsqueeze(0).to(device)).squeeze(0)
  File "D:\86 se courses youtube kanali\aero\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\86 se courses youtube kanali\aero\src\models\aero.py", line 472, in forward
    x = encode(x, inject)
  File "D:\86 se courses youtube kanali\aero\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\86 se courses youtube kanali\aero\src\models\aero.py", line 120, in forward
    x = self.pre_conv(x)
  File "D:\86 se courses youtube kanali\aero\env\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\86 se courses youtube kanali\aero\env\lib\site-packages\torch\nn\modules\conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\86 se courses youtube kanali\aero\env\lib\site-packages\torch\nn\modules\conv.py", line 453, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [48, 2, 1, 1], expected input[1, 4, 256, 7501] to have 2 channels, but got 4 channels instead

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
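
A hedged reading of this traceback: the loaded audio is stereo (the log above shows an lr wav shape of [2, 14400000]) while this experiment expects a mono signal (in_channels: 1 in the printed config), and with the complex-as-channels STFT representation two audio channels become four input channels where the first convolution expects two. If that is indeed the cause, downmixing the input to mono before running predict.py should avoid the mismatch, for example:

import torchaudio

# Illustrative only: average the two stereo channels into one mono channel
# before prediction (any audio tool, e.g. ffmpeg or sox, works equally well).
wav, sr = torchaudio.load("5dk.mp3")
mono = wav.mean(dim=0, keepdim=True)
torchaudio.save("5dk_mono.wav", mono, sr)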
