rising's Introduction

What is rising?

rising is a high-performance data loading and augmentation library for 2D and 3D data, written entirely in PyTorch. Our goal is to provide a seamless integration into the PyTorch ecosystem without sacrificing usability or features. Multiple examples for different use cases can be found in our tutorial docs, e.g. 2D Classification on MedNIST, 3D Segmentation of Hippocampus (Medical Decathlon), Example Transformation Output, and Integration of External Frameworks.

Why another framework?

|            | rising  | TorchIO      | Batchgenerators | Kornia  | DALI | Vanilla PyTorch | Albumentations |
|------------|---------|--------------|-----------------|---------|------|-----------------|----------------|
| Volumetric | ✅      | ✅           | ✅              | ❌      | ❌   | ❌              | ❌             |
| Gradients  | ✅      | ❌           | ❌              | ✅      | ❌   | ❌              | ❌             |
| GPU        | ✅      | ❌           | ❌              | ✅      | ✅   | ❌              | ❌             |
| Backend    | PyTorch | PyTorch/SITK | NumPy           | PyTorch | C++  | PyTorch         | NumPy          |

Docs

Documentation for the latest master branch is available online.

Installation

PyPI Installation

pip install rising

Editable Installation for development

git clone git@github.com:PhoenixDL/rising.git
cd rising
pip install -e .

Running the tests from the repository's top-level directory (not the package directory):

python -m unittest

Check out our contributing guide for more information or additional help.

What can I do with rising?

rising currently consists of two main modules:

rising.loading

The DataLoader of rising will be your new best friend because it handles all your transformations and applies them efficiently to the data, either on the CPU or the GPU. On the CPU you can easily switch between transformations which can only be performed per sample and transformations which can be applied per batch. In contrast to native PyTorch datasets, you do not need to integrate your augmentation into your dataset. Hence, the only purpose of the dataset is to provide an interface to access individual data samples. Our DataLoader is a direct subclass of PyTorch's DataLoader; it handles batch assembly and applies the augmentations/transformations to the data.

rising.transforms

This module implements many transformations which can be used during training for preprocessing and augmentation. All of them are implemented directly in PyTorch, such that gradients can be propagated through the transformations and they can optionally be applied on the GPU. Finally, all transforms are implemented for both 2D (natural images) and 3D (volumetric) data.
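
As a quick illustration of the gradient support, here is a minimal sketch (not taken from the official docs; it reuses a transform from the MNIST example below):

import torch
import rising.transforms as rtr

# a small random batch in the (N, C, H, W) layout rising expects
batch = {"data": torch.rand(2, 1, 16, 16, requires_grad=True)}

# Rot90 (as used in the MNIST example below) is built from differentiable
# PyTorch ops, so gradients flow back to the input tensor
out = rtr.Rot90((0, 1), keys=["data"], p=0.5)(**batch)
(out["data"] ** 2).sum().backward()

print(batch["data"].grad.shape)  # torch.Size([2, 1, 16, 16])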

In the future, support for keypoints and other geometric primitives which can be assembled by connected points will be added.

rising MNIST Example with CPU and GPU augmentation

rising uses the same Dataset structure as PyTorch and thus we can just reuse the MNIST dataset from torchvision.

import torchvision
from torchvision.transforms import ToTensor

# define the dataset and use the ToTensor transform to convert PIL images to tensors
dataset = torchvision.datasets.MNIST('./', train=True, download=True,
                                     transform=ToTensor())

In the next step, the transformations/augmentations need to be defined. The first transform converts the sequence output of the torchvision dataset into a dict for the subsequent rising transforms, which work on dicts. At the end, the transforms are composed into a single callable transform which can be passed to the DataLoader.

import rising.transforms as rtr
from rising.loading import DataLoader, default_transform_call
from rising.random import DiscreteParameter, UniformParameter

# define transformations
transforms = [
    rtr.SeqToMap("data", "label"),  # most rising transforms work on dicts
    rtr.NormZeroMeanUnitStd(keys=["data"]),
    rtr.Rot90((0, 1), keys=["data"], p=0.5),
    rtr.Mirror(dims=DiscreteParameter([0, 1]), keys=["data"]),
    rtr.Rotate(UniformParameter(0, 180), degree=True),
]

# by default rising assumes dicts but torchvision outputs tuples
# so we need to modify `transform_call` to support sequences and dicts
composed = rtr.Compose(transforms, transform_call=default_transform_call)

The DataLoader from rising automatically applies the specified transformations to the batches inside the CPU multiprocessing workers.

dataloader = DataLoader(
    dataset, batch_size=8, num_workers=8, batch_transforms=composed)

Alternatively, the augmentations can easily be applied on the GPU as well.

dataloader = DataLoader(
    dataset, batch_size=8, num_workers=8, gpu_transforms=composed)

If either the GPU or the CPU is the bottleneck of the pipeline, the DataLoader can be used to balance the augmentation load between them.

transforms_cpu = rtr.Compose(transforms[:2])
transforms_gpu = rtr.Compose(transforms[2:])

dataloader = DataLoader(
    dataset, batch_size=8, num_workers=8,
    batch_transforms=transforms_cpu,
    gpu_transforms=transforms_gpu,
)

More details about how and where the augmentations are applied can be found below. You can also check out our example Notebooks for 2D Classification, 3D Segmentation and Transformation Examples.

Dataloading with rising

In general, you do not need to be familiar with the whole augmentation process which runs in the background, but if you are curious about the detailed pipeline, this section gives a short introduction into the backend of the DataLoader. The flow charts below highlight the differences between a conventional augmentation pipeline and the pipeline used in rising. CPU operations are visualized in blue, GPU operations in green.

The flow chart below visualizes the default augmentation pipeline of many other frameworks. The transformations are applied to individual samples which are loaded and augmented inside multiple background workers on the CPU. This approach is already efficient and might only be slightly slower than batched execution of the transformations (if applied on the CPU). GPU augmentations can be used to perform many operations in parallel and profit heavily from vectorization. [Flow chart: default augmentation pipeline]

rising lets the user decide, from case to case, where augmentations should be applied in this pipeline. This can heavily depend on the specific task and the underlying hardware. Running augmentations on the GPU is only efficient if they can be executed in a batched fashion to maximize the parallelization GPUs can provide. As a consequence, rising implements all its transformations in a batched fashion and the DataLoader can execute them efficiently on the CPU and the GPU. Optionally, the DataLoader can still be used to apply transformations per sample, e.g. when transforms from other frameworks should be integrated. [Flow chart: rising augmentation pipeline]

Because the rising augmentation pipeline is a superset of the commonly used approaches, transforms from external frameworks can be integrated into rising.
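
As a hedged sketch of that per-sample hook: the DataLoader arguments sample_transforms and pseudo_batch_dim used below are assumptions about the current API and should be double-checked against the documentation of the installed rising version.

import torch
import rising.transforms as rtr
from rising.loading import DataLoader

# a toy dataset of dicts, so no sequence-to-dict conversion is needed
data = [{"data": torch.rand(1, 16, 16)} for _ in range(8)]

# run a (normally batched) rising transform per sample instead of per batch;
# this is also the place where sample-wise transforms from external
# frameworks could be wrapped
loader = DataLoader(
    data, batch_size=4,
    sample_transforms=rtr.NormZeroMeanUnitStd(keys=["data"]),
    pseudo_batch_dim=True,  # temporarily adds a batch dim of 1 for the sample transform
)

print(next(iter(loader))["data"].shape)  # torch.Size([4, 1, 16, 16])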

Project Organization

Issues: If you find any bugs, want additional features, or just have a question, don't hesitate to open an issue :)

General Project Future: Most of the features and the milestone organisation can be found in the projects tab. Features which are planned for the next release/milestone are listed under "TODO Next Release", while features which are not scheduled yet are under "Todo".

Slack: Join our Slack for the most up-to-date news or just to have a chat with us :)

rising's People

Contributors

borda, firasgit, haarburger, justusschock, mibaumgartner, ndalton12, nkpmedia, pre-commit-ci[bot], weningerleon

rising's Issues

[Bug] Compose doesn't forward pytorch functions to internal transforms

Description
Compose uses a plain list to store the transformations, which leads to problems when certain torch.nn.Module methods should also be applied to the children (e.g. the to() method).

Quick fix: change the list to a torch.nn.ModuleList, which limits our transformations to rising transforms that are subclasses of torch.nn.Module.

Other solutions: look at which functions we really need and overwrite them appropriately (this should at least fix the problems with to()).

Status: still looking for a better solution because the quick fix is not really satisfying...
Any ideas @justusschock @haarburger ?
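
For reference, a minimal sketch of the quick fix; the class _Compose below is a hypothetical, simplified stand-in for rising's Compose, not the actual implementation:

from torch import nn

class _Compose(nn.Module):
    def __init__(self, *transforms: nn.Module):
        super().__init__()
        # nn.ModuleList registers the transforms as submodules, so nn.Module
        # methods such as .to(), .cuda() or .state_dict() recurse into them
        self.transforms = nn.ModuleList(transforms)

    def forward(self, **data) -> dict:
        for trafo in self.transforms:
            data = trafo(**data)
        return data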

Lightning segmentation missing visualisation

Hi, and thanks for the great module! I went through your segmentation example and it is very useful for my project. Unfortunately, you don't show how one can visualise the predictions of the network (i.e. show examples of the network's predictions on the test/validation dataset together with the ground truth and the images).

Let me know if I just missed that! :)

[Question] Multi-gpu support?

Description
Hi, I am interested in doing GPU transforms on batched data, but I am wondering if there is support for multi-gpu? Right now I am using pytorch lightning and its data modules. In distributed training, each gpu gets its own process to run a data module -- so I am thinking that by virtue of torch.cuda.current_device(), gpu transforms will just run correctly on the right GPU. I will test this theory tomorrow, but advice is appreciated. Thanks!
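
As a hedged sketch of the explicit alternative: the DataLoader's device argument is the one later passed to its internal ToDevice transform (see the loader snippet in the issue below); local_rank is a placeholder, and dataset and composed are reused from the MNIST example above.

import torch
from rising.loading import DataLoader

# in distributed training each process can pass its own device explicitly
# instead of relying on torch.cuda.current_device()
local_rank = 0  # e.g. provided by the launcher or by PyTorch Lightning
dataloader = DataLoader(
    dataset, batch_size=8, num_workers=4,
    gpu_transforms=composed,
    device=torch.device("cuda", local_rank),
)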

[Bug] Docstring for random_crop claims it returns crops corner but it doesn't

Description
When calling rising.transforms.functional.crop.random_crop, the docstring says it returns the crop corner. It doesn't.

Environment

  • OS: Windows 10
  • Python version: Python 3.8.5
  • rising version: 0.2.0post0
  • How did you install rising? pip

Reproduction

import torch
from rising.transforms.functional import random_crop

x = torch.zeros(1, 1, 10, 10)
print(random_crop(x, (3, 3)))  # Should have returned both crop and corner

[FeatureRequest] Generalize affine and grid transforms to support keys with different spatial size

Description
The grid is only created for the first element in the batch. This behaviour should be generalised to support keys with different spatial sizes without introducing computational overhead (e.g. computing and augmenting a grid multiple times even though the keys have the same spatial size).

It should be safe to ignore the number of channels and only focus on the spatial size of the grid, because PyTorch does not use the channels anyway (even though affine_grid wants the number of channels): https://github.com/pytorch/pytorch/blob/74b65c32be68b15dc7c9e8bb62459efbfbde33d8/aten/src/ATen/native/AffineGridGenerator.cpp#L34-L62
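
A rough sketch of the proposed behaviour (a hypothetical helper, not rising code): build one sampling grid per unique output size and reuse it for all keys sharing that size.

import torch
from torch.nn import functional as F

def grids_for_sizes(affine: torch.Tensor, sizes):
    """Create one affine grid per unique output size (e.g. (N, C, H, W) tuples)."""
    grids = {}
    for size in sizes:
        if size not in grids:  # keys with identical spatial size share a grid
            grids[size] = F.affine_grid(affine, size, align_corners=False)
    return grids

# example: two keys with different spatial sizes, identity transform
# grids_for_sizes(torch.eye(2, 3).unsqueeze(0), [(1, 1, 64, 64), (1, 1, 32, 32)])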

[FeatureRequest] FollowUp on docs

  • On my local build the collapse menu on the right does not work as in the PyTorch docs (no clue why)
  • Integrate the notebooks properly
  • For each file: write an introductory section describing what the file does (they will be included automatically)

[Bug] Progressive resizing only works with one process

Description
Internal state of progressive resizing is not updated correctly when used with multiple processes.

Environment

  • OS: MacOS, Ubuntu
  • Python version: 3.7
  • rising version: 0.0.a

Reproduction
An additional integration test for this transform needs to be created.

[Bug] Stacking affines/grid transforms with different keys

Description
When stacking affine/grid transforms with different keys per transform, the whole stacked transformation is applied to all keys.

Proposal
Temporarily, check the keys when stacking transforms and raise an error if the keys differ. Additionally, open a feature request issue to support this special case (I do not think we should add support for this right away because it would make things fairly complicated).
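
A minimal sketch of the temporary check (assuming the stacked transforms expose their keys via a keys attribute, as rising's BaseTransform does):

def _assert_same_keys(transforms):
    # temporary guard: refuse to stack affine/grid transforms whose keys differ,
    # since the stacked matrix would silently be applied to all keys
    reference = tuple(transforms[0].keys)
    for trafo in transforms[1:]:
        if tuple(trafo.keys) != reference:
            raise ValueError(
                f"Cannot stack transforms with different keys: {reference} vs {tuple(trafo.keys)}"
            )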

[FeatureRequest] Queue for GPU transforms

Description & Proposal
Introduce a queue for GPU transforms to enable optional asynchronous augmentation when training and augmentation are performed on different GPUs.

[FeatureRequest] ApplyMask transform

Description
Add a transform that applies a binary mask to an image

Proposal
Given a mask key, apply the mask to the image to set all background voxels to a predefined value

Are you able/willing to implement the feature yourself (with some guidance from us)?
yes
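
A possible functional sketch (name and signature are hypothetical, not an existing rising API):

import torch

def apply_mask(data: torch.Tensor, mask: torch.Tensor, background_value: float = 0.0) -> torch.Tensor:
    """Set every voxel outside the binary mask to `background_value`."""
    return torch.where(mask.bool(), data, torch.full_like(data, background_value))

# example: zero out everything outside a random foreground mask
# apply_mask(torch.rand(1, 1, 8, 8, 8), torch.rand(1, 1, 8, 8, 8) > 0.5)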

[FeatureRequest] Improve Pull Request Template

Description
This issue is intended to provide a forum to improve our PR template

Proposals

Additional todos:

  • update codeowners.md
  • update changelog

Furthermore, I would like to propose to structure the todos a little bit more. Something like:
Developer (the person who implements the PR), RisingMember (someone who has the rights to add labels and projects and to modify the codeowners, changelog, ...)

We could also introduce a reviewer section where we add some points which every reviewer should check (probably with a link to our contribution guideline).

Do you have additional points or do not like any of the above? @haarburger @justusschock

[FeatureRequest] Interface for random number

Benefits:

  • much easier testing
  • complete control over the transformation parameters for the user

For example, Mirror(dims=choose(0, 1, 2)) would sample the mirror axis, while Mirror(dims=(0,)) would always mirror the 0th dim.

Implementation:

  • based on classes
  • enable sampling of multiple values (probably as a tensor) in one iteration
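
For reference, the interface that ended up in rising.random works along these lines (cf. the MNIST example above); the exact sampling semantics should be checked against the docs:

import rising.transforms as rtr
from rising.random import DiscreteParameter, UniformParameter

# sample the mirror dimension anew on every call
rtr.Mirror(dims=DiscreteParameter([0, 1, 2]), keys=["data"])

# fixed dims: always mirror the 0th spatial dimension
rtr.Mirror(dims=(0,), keys=["data"])

# continuous parameters work the same way, e.g. a rotation angle in degrees
rtr.Rotate(UniformParameter(0, 180), degree=True)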

[FeatureRequest] `Per Sample` option for transforms

Description
Introduce a Per Sample option to transforms.

Proposal
How could the feature be implemented?
Spatial Transforms:

  1. Transforms which use PyTorch functions -> use a loop inside the functional (see the sketch after this list).
  2. If possible, introduce an affine equivalent which can be stacked with other affine transforms and supports per-sample augmentation without any loop.

Cropping: probably only possible with an internal loop

Affine Transforms: already support this option

Intensity/Channel Transforms: tbd
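
A rough sketch of option 1 from the list above (a hypothetical helper, not rising code):

import torch

def per_sample_apply(batch: torch.Tensor, fn, apply_to) -> torch.Tensor:
    """Apply `fn` only to the samples of `batch` selected by the boolean mask `apply_to`."""
    out = [fn(sample) if use else sample for sample, use in zip(batch, apply_to)]
    return torch.stack(out)

# example: mirror only the first sample of a batch along its last dimension
# per_sample_apply(torch.rand(2, 1, 8, 8), lambda s: torch.flip(s, dims=(-1,)), [True, False])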

[FeatureRequest] GPU transforms

Description
Currently, the data loader cannot execute transformations on the GPU due to multiprocessing and some pickling issues. A workaround is to manually apply the transforms during training, before the network gets the data.

Proposal
How could the feature be implemented?
Native support of GPU transformations in the data loader.
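
A minimal sketch of the workaround mentioned in the description, reusing dataset and transforms from the MNIST example above (a plain PyTorch loader assembles the batches; the rising transforms are applied manually on the GPU):

import rising.transforms as rtr
from torch.utils.data import DataLoader as TorchDataLoader

loader = TorchDataLoader(dataset, batch_size=8, num_workers=8)
# skip SeqToMap, we build the dict ourselves from the (images, labels) batch
gpu_trafo = rtr.Compose(transforms[1:])

for images, labels in loader:
    batch = {"data": images.to("cuda:0"), "label": labels.to("cuda:0")}
    batch = gpu_trafo(**batch)  # rising transforms accept and return dicts
    # ... forward/backward pass with batch["data"] and batch["label"] ...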

Update docs of Resize

The docs of Resize wrongly state that the new size must include the batch size and channels; this applies to both the functional interface and the module interface.

[Bug] Wrong point/image trafo

Description
@mibaumgartner
The point transformation has to be the inverse of the image transformation.

Environment

  • OS:
  • Python version:
  • rising version 0.2.0.post0
  • How did you install rising? pip

Reproduction

imgT = parametrize_matrix(rotation=30, scale=1, translation=0, image_transform=True, batchsize=1, ndim=2)
pointT = parametrize_matrix(rotation=30, scale=1, translation=0, image_transform=False, batchsize=1, ndim=2)
print("img Trafo A:")
print(to_homogeneus_matrix(imgT))
print("point Trafo B:")
print(to_homogeneus_matrix(pointT))
print("C should be equal to B:")
print(to_homogeneus_matrix(imgT).inverse())

The code works if the points are not in xy but in yx order, because matrix_revert_coordinate_order(pointT) produces matrix C.

Moreover, the order of the sub-transformations is wrong, because it changes with the inverse operation.

[Bug] GPU transforms are not fed GPU data for keys other than 'data'

Description
If I supply GPU transforms that operate on multiple keys to a DataLoader, only the data for the data key is transferred to the GPU before it is fed to the transforms. For example, if I'm doing spatial transforms (such as flipping), I want to flip both the data and the labels, and I want that to happen on the GPU for speed.

The error seems to happen at line 187 of loading/loader.py:

if gpu_transforms is not None:
    if device is None:
        device = torch.cuda.current_device()
    to_gpu_trafo = ToDevice(device=device, non_blocking=pin_memory)
    gpu_transforms = Compose(to_gpu_trafo, gpu_transforms)
    gpu_transforms = gpu_transforms.to(device)

No keys argument is given to ToDevice, so it uses its default, which is keys=('data',), cf. transforms/tensor.py:52.

Environment

  • OS: Windows 10
  • Python version: Python 3.8.5
  • rising version: 0.2.0post0

Reproduction

from rising.loading import DataLoader
from rising.transforms.abstract import BaseTransform


def check_on_gpu(x):
    assert(x.is_cuda)
    return x


class GpuChecker(BaseTransform):
    def __init__(self, keys=('data',)):
        super().__init__(augment_fn=check_on_gpu, keys=keys)


if __name__ == '__main__':
    # Data is definitely on CPU...
    data = [
        { 'data': 1, 'label': 1 },
        { 'data': 2, 'label': 2 },
        { 'data': 3, 'label': 3 }
    ]

    # This will work
    print('Only data')
    loader = DataLoader(data, gpu_transforms=GpuChecker())
    for x in loader:
        print(x)

    # This will crash
    print('Both data and labels')
    loader = DataLoader(data, gpu_transforms=GpuChecker(('data', 'label')))
    for x in loader:
        print(x)
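
Until this is fixed, a possible workaround (a hedged sketch, reusing GpuChecker, data and DataLoader from the reproduction above) is to prepend an explicit ToDevice transform that covers all keys:

import torch
from rising.transforms import Compose, ToDevice

# move every key to the GPU ourselves as the first GPU transform, so the
# loader's internal ToDevice (which defaults to keys=('data',)) is not the
# only device transfer
device = torch.device('cuda', torch.cuda.current_device())
gpu_trafos = Compose(
    ToDevice(device=device, keys=('data', 'label')),
    GpuChecker(('data', 'label')),
)
loader = DataLoader(data, gpu_transforms=gpu_trafos)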

[Bug] Mirror transformation does not accept prob keyword parameter

Description
When rising.transforms.spatial.Mirror is called with the prob keyword argument, it is stored in **kwargs and forwarded in Mirror.__init__() to the parent BaseTransform.__init__(), which in turn forwards it in BaseTransform.forward() to the functional mirror() function, which does not accept prob as a keyword argument. It seems like the prob argument is not handled at all. The documentation for rising.transforms.spatial.Mirror is probably just wrong and should drop the prob argument.
Environment

  • OS: linux/ubuntu
  • Python version: 3.7.7
  • rising version: master
  • How did you install rising?
git clone git@github.com:PhoenixDL/rising.git
cd rising
pip install -e .

Reproduction
rtr.Mirror(dims=DiscreteParameter([0, 1]), keys=["data"], prob=0.5)

[FeatureRequest] Determine transform call inside compose function instead of Batchtransformer

Description
Currently, the Batchtransformer decides how to call the transformations:

if self._transforms is not None:
    if isinstance(batch, Mapping):
        batch = self._transforms(**batch)
    elif isinstance(batch, Sequence):
        batch = self._transforms(*batch)
    else:
        batch = self._transforms(batch)

Proposal
I would propose to call the transforms with a simple positional argument inside the Batchtransformer and to add a keyword argument to the respective compose functions. By default, this keyword argument does the same thing the Batchtransformer does now and tries to identify the optimal call by checking the type, but the user has the option to influence this.

Additional context
Instead of automatically unpacking the batch when passing it to the transforms, the user can control this behaviour (if needed).

Thoughts @justusschock ?

[Bug] Cachedataset needs a picklable load function if num_workers>0

Description
Pickle error when num_workers>0 and:

  • mode == "extend"

  • the load function cannot be pickled

  • tqdm can also be used with multiprocessing; this should also be addressed

Environment

  • OS: MacOS, Ubuntu
  • Python version: 3.7
  • rising version 0.0.a

Reproduction
Just change up the test case

Solution
We could try something like 'pathos' or 'dill': https://stackoverflow.com/questions/8804830/python-multiprocessing-picklingerror-cant-pickle-type-function

[Bug] Scale(...,adjust_size=True) does not result in image with all the content of original image

Description
Using the Scale transformation rt.Scale(scale=1.25, adjust_size=True) with input size (D, H, W) = (32, 192, 192) does not result in a scaled version of the input image with all of its content present and resolution (D, H, W) * 1.25. Instead, it results in an image with resolution (D, H, W) / 1.25 and the same image content as the result of rt.Scale(scale=1.25, adjust_size=False), which is, as expected, effectively a center crop of 3/4 of the input image size.

[Screenshots: the input, the Scale(adjust_size=False) result and the Scale(adjust_size=True) result; the adjust_size=True output shows the same content as adjust_size=False but at resolution (D, H, W) / 1.25]

Maybe this stems from parametrize_matrix, create_scale and _check_new_img_size in transforms.functional.affine.py: create_scale is called with the default value image_transform=True inside parametrize_matrix, which appears to invert the scale, so that the scale parameter of rt.Scale() effectively refers to the image scale rather than the GridSampler scale. This does not seem to be handled correctly in _check_new_img_size.

Environment

  • OS: Ubuntu/Mint
  • Python version: 3.7
  • rising version: 0.2.0.post0+3.g2a580e9
  • How did you install rising? cloned master

Reproduction
Use any volumetric input image and the above mentioned transformations.
