fgnt / padertorch

A collection of common functionality to simplify the design, training and evaluation of machine learning models based on pytorch with an emphasis on speech processing.

License: MIT License

Shell 0.15% Python 89.42% Makefile 0.46% C++ 1.06% Cuda 8.67% C 0.23%

audio pytorch speech

padertorch's People

tcord thequilo janekebb entn-at boeddeker frederikrautenberg michael-kuhlmann alexanderwerning yisiying jensheit sibange fragrantrookie tobvogel runngezhang baekms hiyoung-asr

padertorch's Issues

Drop Python 3.6?

Python 3.6 reached end of life last year (23 Dec 2021).
Should we drop the tests for it?

Why:
The tests in #138 fail because Python 3.6 uses a different representation for annotations.
Finding a workaround for the doctest is annoying, and since Python 3.6 is EOL,
I thought we could drop it.

Support dataclass default_factory

default_factory for dataclasses is not supported. Finding the default args for the class fails.

import padertorch as pt

from dataclasses import dataclass, field
@dataclass
class A(pt.Configurable):
    a: dict = field(default_factory=dict)

A.get_config()

gives

Traceback (most recent call last):
...
TypeError: Object of type _HAS_DEFAULT_FACTORY_CLASS is not JSON serializable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
...
ValueError: Invalid config.
See above exception msg from json.dumps and below the sub config:
{'factory': 'A', 'a': <factory>}
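One possible direction, sketched with the standard library only: dataclasses.fields exposes default and default_factory separately, so the defaults can be collected without touching the sentinel that shows up as <factory> in the signature. The helper name dataclass_defaults is hypothetical and not part of padertorch, and the plain dataclass below stands in for a pt.Configurable subclass.

```python
from dataclasses import MISSING, dataclass, field, fields


@dataclass
class A:
    a: dict = field(default_factory=dict)
    b: int = 1


def dataclass_defaults(cls):
    """Hypothetical helper: collect the default values of a dataclass.

    Fields with a plain default use it directly, fields with a
    default_factory get a freshly created value, and fields without
    any default are skipped.
    """
    defaults = {}
    for f in fields(cls):
        if f.default is not MISSING:
            defaults[f.name] = f.default
        elif f.default_factory is not MISSING:
            defaults[f.name] = f.default_factory()
    return defaults


defaults = dataclass_defaults(A)
```

Calling the factory yields a JSON-serializable value here (an empty dict). Whether that is the desired config entry for arbitrary factories is an open design question.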

Segmentation Fault when Training Model

I ran into an issue while training a model with padertorch ([Ham radio](https://github.com/fgnt/ham_radio/issues/1)). I get a segmentation fault in the training loop. When I place a pdb breakpoint in train_step, I get the loss, summary, etc., but the segmentation fault occurs on return. Location:

padertorch > train > trainer.py > Trainer > step, in the if block with len(device) == 1

I am using a single RTX 4090 on Python 3.10 with PyTorch 2.1.0.

Thank you.

Show example json to run ORPIT

Hi,
First, thanks for your work on the ORPIT example, which is the only one I can find on GitHub.
The code is complex and I am in a hurry to run it. Would it be possible to show a few lines of the dataset JSON files? And which command do I run to include both the 2-speaker and 3-speaker conditions?

restructure and update the contrib examples

Split example directory into toy examples and task specific examples.
Update the example training scripts so that the experiments use similar directory structures.
Add evaluation scripts if missing and update the evaluation directory structure, so that the evaluation files are written in a new directory inside the training directory.

Move backwards step into model

With multiple chained models, for example source separation + speech recognition, it might be necessary to do intermediate backward steps to reduce the GPU memory required during training.
Moving the backward step and the train_step into the model would enable the user to run multiple backward steps.

However, we have to consider the implications for the Hook post_step, which is at the moment called after train_step but before the backwards step.

Another open question is how to handle the timer information.

Error in tbx_utils.py

/padertorch/padertorch/summary/tbx_utils.py, line 145, in audio
signal *= 0.95
ValueError: output array is read-only
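I think it happens when the signal is all zeros: the division is then skipped, so signal is still the original (read-only) array when the in-place multiplication runs.

The current code looks like this: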

    if normalize:
        denominator = np.max(np.abs(signal))
        if denominator > 0:
            signal = signal / denominator
        signal *= 0.95

I think it should be:

    if normalize:
        denominator = np.max(np.abs(signal))
        if denominator > 0:
            signal = signal / denominator
            signal *= 0.95

@boeddeker could you check it?
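A minimal sketch of the failure and of the proposed fix, assuming the input is a read-only NumPy array containing only zeros. The function normalize_audio is a hypothetical stand-in for the relevant lines of tbx_utils.audio, not the actual implementation.

```python
import numpy as np


def normalize_audio(signal, scale=0.95):
    """Hypothetical stand-in: scale the peak amplitude to `scale`.

    The out-of-place division creates a new (writable) array, so the
    read-only input is never modified. An all-zero signal is returned
    unchanged.
    """
    signal = np.asarray(signal)
    denominator = np.max(np.abs(signal))
    if denominator > 0:
        signal = signal / denominator * scale
    return signal


# A read-only, all-zero signal reproduces the reported error
# when it is modified in place.
silent = np.zeros(4)
silent.setflags(write=False)

try:
    silent *= 0.95
    in_place_failed = False
except ValueError:
    in_place_failed = True

normalized_silent = normalize_audio(silent)
normalized_loud = normalize_audio(np.array([0.5, -2.0]))
```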

ToDo: Support data parallel in validate

This has to be done for complete multi-GPU support.

Add review visualization utilities

Issue:

I trained a model and want to visualize its review in a Jupyter notebook.
At the moment, my workflow is to execute the model and manually call some plotting functions,
because the review is designed for tensorboard (for the images in particular, it is not obvious how to display them).

Note: Manually calling the plotting functions gives better results than the tensorboard visualization,
because tensorboard doesn't know about axis labels, ticks, or properly formatted titles.
Simply visualizing the review is faster, though, because the code is already written.

Suggestion:

Add some utilities to visualize entries of the review.
e.g.

for k, (data, sample_rate) in review.get('audios', {}).items():
    pb.io.play(data, sample_rate=sample_rate, normalize=False, name=k)

for audios and something like

with pb.visualization.axes_context(columns=4) as axes:
    for k, image in review['images'].items():
        axes.new
        image = np.einsum('chw->hwc', image)[::-1]
        plt.imshow(image, origin='lower')
        plt.title(k)
        plt.grid(False)

for images.

Two proposals for high level functions:

class VisualizeReview:
    def __init__(self, review, trainer=None):
        if trainer is not None:
            # Ensure that the loss is in the review and add the loss to the scalars
            _, review = trainer._review_to_loss_and_summary(review)
        else:
            review.setdefault('scalars', {})['loss'] = review['loss']
        # Store the review last, so that the summary returned by the trainer is used
        self.review = review
    
    def __call__(self):
        self.scalars()
        self.audios()
        self.images()
    
    def scalars(self):
        display(pd.Series({
            k: pt.utils.to_numpy(v, detach=True)
            for k, v in self.review['scalars'].items()
        }))
    
    def audios(self):
        for k, (data, sample_rate) in self.review.get('audios', {}).items():
            play(data, sample_rate=sample_rate, normalize=False, name=k)
    
    def images(self, columns=4):
        with pb.visualization.axes_context(columns=columns) as axes:
            for k, image in self.review['images'].items():
                axes.new
                image = np.einsum('chw->hwc', image)
                plt.imshow(
                    image,
                    origin='lower',
                )
                plt.title(k)
                plt.grid(False)

VisualizeReview(model_review)()

def visualize_review(
        review,
        trainer=None,
        axes_context_kwargs=dict(columns=4)
):
    from IPython.display import display

    display(pd.Series({
        k: pt.utils.to_numpy(v, detach=True)
        for k, v in review['scalars'].items()
    }))
    
    for k, (data, sample_rate) in review.get('audios', {}).items():
        play(data, sample_rate=sample_rate, normalize=False, name=k)
        
    with pb.visualization.axes_context(**axes_context_kwargs) as axes:
        for k, image in review['images'].items():
            axes.new
            image = np.einsum('chw->hwc', image)
            image = image[::-1]
            plt.imshow(
                image,
                origin='lower',
            )
            plt.title(k)
            plt.grid(False)

visualize_review(model_review)

Configurable: Support positional only arguments

At the moment, any callable is a configurable if it supports keyword arguments.
While this is explicit and covers most factories and classes, there are a few that we don't support.

Maybe the most relevant example for padertorch is torch.nn.Sequential(*args).
To use this class, we currently have to write a wrapper around it.
Native support would be nice.

We discussed this offline in a small group, but haven't found a solution that we want to implement.

The first priority is that the implementation is not a breaking change, so none of the examples that follow break current configs.

Here are some examples of how it could be realized:

First, let's assume we have these factories:

def foo(*numbers):
    ...
def bar(a, b, /):  # positional only, like operator.add
    ...

and we want to call them as

foo(1, 2)
bar(1, 2)

Here are some ideas for what the config could look like:

1. Reserved keyword 'args', 'factory_args' or '*'

{'factory': 'foo', 'args': [1, 2]}
{'factory': 'bar', 'args': [1, 2]}

{'factory': 'foo', '*': [1, 2]}
{'factory': 'bar', '*': [1, 2]}
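A minimal sketch of how option 1 could be resolved. The helper instantiate and the registry dict are hypothetical and only illustrate the idea; padertorch resolves factories differently. The reserved key is popped from the config and unpacked as positional arguments.

```python
import operator


def foo(*numbers):
    return sum(numbers)


def instantiate(config, registry, args_key='*'):
    """Hypothetical helper: call the factory named in `config`.

    The entry stored under `args_key` is passed as positional
    arguments, all remaining entries as keyword arguments.
    """
    kwargs = dict(config)  # copy, so the config itself is not modified
    factory = registry[kwargs.pop('factory')]
    args = kwargs.pop(args_key, ())
    return factory(*args, **kwargs)


# operator.add plays the role of the positional-only `bar`.
registry = {'foo': foo, 'bar': operator.add}

result_foo = instantiate({'factory': 'foo', '*': [1, 2]}, registry)
result_bar = instantiate({'factory': 'bar', '*': [1, 2]}, registry)
```

Configs without the reserved key behave as before, which keeps the change backwards compatible.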

2. Signature check with "assignment"

In the config, ignore that an argument is positional only and simply write it in assignment style.
The implementation then maps each entry to the corresponding positional argument.

{'factory': 'foo', 'numbers': [1, 2]}
{'factory': 'bar', 'a': 1, 'b': 2}

{'factory': 'foo', '*numbers': [1, 2]}
{'factory': 'bar', 'a': 1, 'b': 2}
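The mapping for option 2 could be sketched with inspect.signature. The helper call_by_name is hypothetical, and the sketch requires Python 3.8 or newer for the / syntax. It does not handle positional-or-keyword parameters that precede *args, nor gaps between positional-only parameters with defaults.

```python
import inspect


def foo(*numbers):
    return numbers


def bar(a, b, /):  # positional only, like operator.add
    return a - b


def call_by_name(func, **config):
    """Hypothetical helper: pass config entries by parameter name.

    Entries that belong to positional-only parameters or to *args
    are moved from the keyword arguments to the positional ones.
    """
    args = []
    kwargs = dict(config)
    for name, parameter in inspect.signature(func).parameters.items():
        if name not in kwargs:
            continue
        if parameter.kind is inspect.Parameter.POSITIONAL_ONLY:
            args.append(kwargs.pop(name))
        elif parameter.kind is inspect.Parameter.VAR_POSITIONAL:
            args.extend(kwargs.pop(name))
    return func(*args, **kwargs)


result_foo = call_by_name(foo, numbers=[1, 2])
result_bar = call_by_name(bar, a=1, b=2)
```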

3. Lisp style

The factory can be a list, the first argument is then the function, while the others are the arguments:

{'factory': ['foo', 1, 2]}
{'factory': ['bar', 1, 2]}
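A sketch of how the Lisp-style variant could be parsed, again with a hypothetical helper and registry rather than padertorch code. A plain string keeps its current meaning, so existing configs are unaffected.

```python
import operator


def instantiate_lisp(config, registry):
    """Hypothetical helper: accept 'factory' as a name or as a list.

    For a list, the first element is the factory name and the
    remaining elements are the positional arguments.
    """
    factory = config['factory']
    kwargs = {k: v for k, v in config.items() if k != 'factory'}
    if isinstance(factory, (list, tuple)):
        name, *args = factory
    else:
        name, args = factory, []
    return registry[name](*args, **kwargs)


registry = {'foo': lambda *numbers: numbers, 'bar': operator.add}

result_foo = instantiate_lisp({'factory': ['foo', 1, 2]}, registry)
result_bar = instantiate_lisp({'factory': ['bar', 1, 2]}, registry)
```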

My opinion

Option 2 is nice for positional-only arguments like operator.add.
But these are rare in practice, because up to and including Python 3.7 positional-only parameters were only available for functions implemented in C or C++ (PEP 570 added the syntax in Python 3.8).
For *args I don't like it. It looks strange and is not robust against renaming.

For option 1, my favorite key would be args, but someone might use args as a normal keyword argument:

threading.Thread(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)
scipy.optimize.minimize_scalar(fun, bracket=None, bounds=None, args=(), method='brent', tol=None, options=None)

We didn't find a relevant example, but it would be better to prevent the conflict.

At the moment, I am unsure whether I like {'factory': ['foo', 1, 2]} or {'factory': 'foo', '*': [1, 2]} more.

STFT inverse, stacked representation

Hi,
using complex_representation='stacked' for inverse STFT leads to an error:

import torch
from padertorch.ops import STFT

stft_signal = torch.rand((2, 4, 10, 257, 2))
torch_stft = STFT(512, 20, window_length=40,
                  complex_representation='stacked')
torch_signal = torch_stft.inverse(stft_signal)

Traceback (most recent call last):
  File "bug.py", line 7, in <module>
    torch_signal = torch_stft.inverse(stft_signal)
  File "/mnt/matylda6/izmolikova/JSALT2020/sse/tools/padertorch/padertorch/ops/_stft.py", line 215, in inverse
    stride=self.shift)
RuntimeError: Expected 3-dimensional input for 3-dimensional weight [512, 1, 40], but got 5-dimensional input of size [2, 6, 10, 257, 1] instead

The problem starts already at

padertorch/padertorch/ops/_stft.py

Line 201 in 8eec9aa

signal_real, signal_imag = torch.chunk(stft_signal, 2, dim=-1)

which leads to a different shape of signal_real and signal_imag than in the concat case.

A quick fix is to unify the representation in the beginning

if self.complex_representation == 'stacked':
    stft_signal = torch.cat((stft_signal[..., 0], stft_signal[..., 1]),
                            dim=-1)

and then treat both representations as concat.
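The shape mismatch and the effect of the quick fix can be checked with plain torch, independent of padertorch. The sketch assumes that the concat representation places the real and imaginary parts one after the other along the last (frequency) axis, as the fix above suggests.

```python
import torch

# Stacked representation: the last axis holds (real, imag).
stacked = torch.rand(2, 4, 10, 257, 2)

# Chunking the stacked tensor keeps a trailing axis of size 1,
# which the following transposed convolution does not expect.
real_chunk, imag_chunk = torch.chunk(stacked, 2, dim=-1)
chunk_shape = tuple(real_chunk.shape)

# Quick fix: convert to the concat layout first ...
concat = torch.cat((stacked[..., 0], stacked[..., 1]), dim=-1)
concat_shape = tuple(concat.shape)

# ... so that the same chunk call yields tensors without the extra axis.
real, imag = torch.chunk(concat, 2, dim=-1)
real_shape = tuple(real.shape)
```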