A parallel iterator for large machine learning datasets that don't fit into memory, inspired by PyTorch's `DataLoader` class.

License: MIT License

Julia 100.00%

DataLoaders

Documentation (latest)

A threaded data iterator for machine learning on out-of-memory datasets. Inspired by PyTorch's DataLoader.

It loads data in parallel on worker threads while keeping the primary thread free. It can also load data in place to avoid allocations.
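The idea of preparing batches in the background can be sketched with Julia's standard library alone. This is a minimal illustration, not how DataLoaders is implemented: `load_observation` and `batches_in_background` are hypothetical names, and whether the producer actually runs off the primary thread depends on how many threads Julia was started with.

```julia
# Hypothetical stand-in for an expensive per-observation load (e.g. reading a file).
load_observation(i) = fill(Float64(i), 4)

# Produce batches on a spawned task and hand them over through a buffered Channel,
# so the consuming loop does not have to wait while the next batch is prepared.
function batches_in_background(nobs::Int, batchsize::Int; buffer::Int = 2)
    Channel{Matrix{Float64}}(buffer; spawn = true) do ch
        for start in 1:batchsize:nobs
            idxs = start:min(start + batchsize - 1, nobs)
            # Stack observations as columns, matching the layout used in the usage example.
            put!(ch, reduce(hcat, [load_observation(i) for i in idxs]))
        end
    end
end

# The Channel is iterable; collecting it drains all batches in order.
batches = collect(batches_in_background(10, 4))
```

With 10 observations and a batch size of 4, the last batch holds only the remaining 2 observations. The buffer size bounds how many finished batches may wait in the Channel at once.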

Many data containers work out of the box and it is easy to extend with your own.

DataLoaders is built on top of and fully compatible with MLDataPattern.jl's Data Access Pattern, a functional interface for machine learning datasets.
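The data access pattern revolves around two functions: one that reports how many observations a container holds and one that retrieves observations by index. The sketch below defines local stand-ins named `nobs` and `getobs` so that it runs without any packages; `SquaresDataset` is a hypothetical container, and in real use the methods would presumably be added to the package's own functions rather than defined as new ones.

```julia
# Hypothetical container: observations are computed on demand instead of stored.
struct SquaresDataset
    n::Int
end

# Local stand-ins for the two functions of the data access pattern.
nobs(ds::SquaresDataset) = ds.n

# A single observation is a 2-element vector [i, i^2].
getobs(ds::SquaresDataset, i::Int) = Float64[i, i^2]

# Several observations are stacked as the columns of a matrix.
getobs(ds::SquaresDataset, idxs::AbstractVector{<:Integer}) =
    reduce(hcat, [getobs(ds, i) for i in idxs])

ds = SquaresDataset(100)
batch = getobs(ds, 1:3)
```

Because each observation is produced only when requested, the full dataset never has to be held in memory at once.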

Usage

using DataLoaders

x = rand(128, 10000)  # 10000 observations of size 128
y = rand(1, 10000)

dataloader = DataLoader((x, y), 16)

for (xs, ys) in dataloader
    @assert size(xs) == (128, 16)
    @assert size(ys) == (1, 16)
end

Of course, in the above example, we can keep the dataset in memory and don't need parallel workers. See Custom data containers for a more realistic example.

Getting Started

If you already know the concept from PyTorch, see Quickstart for PyTorch users.

Otherwise, read on here.

Available methods are documented here.

Acknowledgements
