
betterloader's Introduction

Making it harder to do easy things, but easier to do harder things with the PyTorch DataLoader




About BetterLoader

BetterLoader is a highly customizable extension of the default PyTorch dataloader class that supports custom pre-load transformations and image subset definitions. Custom index files let you maintain a single copy of a dataset with a fixed, flat file structure, while BetterLoader does all the heavy lifting.

Installation

pip install betterloader

Usage

BetterLoader allows you to dynamically assign images to labels, load subsets of images conditionally, perform custom pretransforms before loading an image, and much more.

Basic Usage

A few points worth noting:

  • BetterLoader does not expect a nested folder structure. In its current iteration, all files are expected to be present in the root directory.
  • Every instance of BetterLoader requires an index file to function. Sample index files may be found here.

from betterloader import BetterLoader

index_json = './examples/sample_index.json'
basepath = "./examples/sample_dataset/"
batch_size = 2

loader = BetterLoader(basepath=basepath, index_json_path=index_json)
dataloaders, sizes = loader.fetch_segmented_dataloaders(batch_size=batch_size, transform=None)

print("Dataloader sizes: {}".format(str(sizes)))

For more information and more detailed examples, please check out the BetterLoader docs!

Development

We use a Makefile to make our lives a little easier :)

Install Dependencies

make install

Run Sample

make sample

Run Unit Tests

make test

Meta

Distributed under the MIT license. See LICENSE for more information.


betterloader's People

Contributors

devopsbinit, ishaanchandratreya, jamesbollas, raghavmecheri


betterloader's Issues

Baseline unit test suites

Feature Request

Description of Problem:

We currently have no unit tests, only a few very basic integration tests that got us off the ground.

Potential Solutions:

We need to write a baseline test suite that unit tests the individual custom classes and their helper methods, plus perhaps a few overall integration tests.

Review the README.md

Feature Request

Description of Problem:

The README.md for this project could probably be more succinct, and documentation about things like the Makefile could probably be removed.

Potential Solutions:

Maybe we could use something like PyTorch's README as a baseline (ours should be less elaborate, of course). The idea would be to make the README more concise and refined while keeping it to the point.

Unsupervised Learning support

Feature Request

Description of Problem:

BetterLoader should support unsupervised learning tasks too.

Potential Solutions:

The first step is definitely to review the data loading process for models like autoencoders and chart out a game plan based on that. Updates to this ticket are coming soon.

Fix BetterLoader landing page

Feature Request

We really need to give the landing page a facelift

Description of Problem:

  1. There are too many Docusaurus defaults on the landing page that we never changed

Potential Solutions:

  1. Customise the icons on the landing page
  2. Pick a consistent color scheme

Usage Documentation

Feature Request

Description of Problem:

Again, our usage documentation is extremely minimal, to the point where the project is probably unusable until we write usage docs.

Potential Solutions:

We just need to take a few hours and document what we've built so far :)

Baseline Integration Test Suite

Feature Request

Description of Problem:

We currently have 2 integration tests, which exist purely for the sake of having them. This needs to be fixed.

Potential Solutions:

Given BetterLoader's modular nature, a comprehensive integration test suite will be key moving forward. Testing various types of index files, as well as the functions that handle them, is an integral part of ensuring that we don't break what we've already got :)

Landing page refresh

Feature Request

Description of Problem:

We can probably make the website a little clearer overall.

Potential Solutions:

  1. Make the actual documentation more informative
  2. Get rid of all the extra detail, and move that to an API documentation page

Add support for SubsetRandomSampler

Feature Request

Adding a sampler param option for the BetterLoader

Description of Problem:

As we begin moving towards supporting unsupervised learning, one of the first steps will be allowing a user to pass a SubsetRandomSampler object into the BetterLoader, which would then be used to load data arbitrarily.

Potential Solutions:

Let's use #19 to do this.

Rectify all the default variables being passed around

Feature Request

Description of Problem:

We've taken a fairly risky/bad approach by just resorting to setting function params to arbitrary defaults when they aren't passed in. While this works sometimes, I think we've overused this approach and should really trim parts of it down.

Potential Solutions:

Eliminate optional parameters in any non-public function, and actually throw exceptions when things aren't right, rather than passing None values around.
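A minimal sketch of what this could look like, assuming the index is a dict keyed by split name. The helper name resolve_split and the index layout are hypothetical and only illustrate the pattern; they are not taken from the BetterLoader codebase.

```python
def resolve_split(index, split):
    """Hypothetical internal helper: both arguments are required.

    Instead of defaulting to None and letting it propagate, the function
    fails loudly as soon as an input is wrong.
    """
    if not isinstance(index, dict):
        raise TypeError(f"index must be a dict, got {type(index).__name__}")
    if split not in index:
        raise KeyError(
            f"split {split!r} not found in index; available: {sorted(index)}"
        )
    return index[split]
```

Because the error is raised at the point where the bad value first appears, the traceback points at the real cause rather than at a later attribute access on None.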

Implement transforms as dictionaries

Feature Request

Description of Problem:

Right now, we just pass a single transform object in. This is inconvenient if we want different transforms for train, test, and val, as mentioned in #25.

Potential Solutions:

Rename the transform parameter to transforms and treat it as a dictionary instead. We can then split it within fetch_segmented_dataloaders. We would also want to update our tests to reflect this change and cover the edge case where the transforms parameter is either None or {}.
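One way the split could work is sketched below. The function name split_transforms and the fixed split names are assumptions made for illustration; the actual change inside fetch_segmented_dataloaders may look different.

```python
SPLITS = ("train", "test", "val")


def split_transforms(transforms=None):
    """Return one transform (or None) per split.

    `transforms` is expected to be a dict keyed by split name.
    Both None and {} mean "no transform for any split".
    """
    if not transforms:  # covers the None and {} edge cases
        return {split: None for split in SPLITS}
    unknown = set(transforms) - set(SPLITS)
    if unknown:
        raise KeyError(f"unknown transform keys: {sorted(unknown)}")
    # Splits missing from the dict simply get no transform.
    return {split: transforms.get(split) for split in SPLITS}
```

Rejecting unknown keys catches typos such as 'valid' early, at the cost of being stricter than silently ignoring them.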

data_transforms

Hi!
Is it possible to use a data transform dictionary?
Like this:

data_transforms = {   
    'train': transforms.Compose([
        transforms.Resize([224,224]),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize([224,224]),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize([224,224]),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

And do you have an example for the dataset_metadata?

Add support for DataLoaderParams Metadata key

Feature Request

As we require more and more custom params to be set at the underlying dataloader level, it would be useful to specify all of them in a single dict stored under one key of the metadata object.

Description of Problem:

We're going to end up adding more and more constructor args to mimic DataLoader args. This deals with that whole problem entirely.

Potential Solutions:

Add a dataloader_params key to the dataset_metadata parameter passed into the BetterLoader. This would contain a dict of key-value pairs that we would want to set at the DataLoader level.
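A rough sketch of how the merge might work, assuming dataset_metadata is a plain dict. The helper name build_dataloader_kwargs is hypothetical; the resulting dict is meant to be unpacked into torch.utils.data.DataLoader(dataset, **kwargs).

```python
def build_dataloader_kwargs(dataset_metadata, batch_size):
    """Merge the optional `dataloader_params` dict into the DataLoader kwargs.

    Hypothetical helper: values from `dataloader_params` override the
    explicit batch_size if both are given.
    """
    kwargs = {"batch_size": batch_size}
    params = (dataset_metadata or {}).get("dataloader_params", {})
    if not isinstance(params, dict):
        raise TypeError("dataloader_params must be a dict")
    kwargs.update(params)
    return kwargs


# Example metadata using standard DataLoader arguments.
metadata = {"dataloader_params": {"shuffle": True, "num_workers": 2}}
kwargs = build_dataloader_kwargs(metadata, batch_size=4)
```

With this approach, new DataLoader options (shuffle, num_workers, pin_memory, and so on) need no new constructor arguments on the BetterLoader side.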

Explain dataset metadata better

Feature Request

Description of Problem:

The Dataset Metadata section of the Getting Started docs page definitely needs some details for the callable function parameters passed as key-value pairs.

The current description is confusing and difficult to understand without looking under the hood.

Potential Solutions:

We should probably add a short function example, along with a docstring, for every callable parameter listed.

Shuffle

Hi!
Is it possible to shuffle the data?
I could not find it in the documentation.

Change index and subset file inputs to object inputs instead

Feature Request

Description of Problem:

You don't always have access to subset and index files saved - sometimes you want to generate them dynamically. Being able to pass in objects/arrays is more useful than passing in just filenames, especially since those filenames can just be opened and then accessed anyway.

Potential Solutions:

We may have to tweak the metadata functions that dynamically read these files, but aside from that, I think it would be a useful value-add.

@JamesBollas thoughts?
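As a starting point for discussion, the proposal above could be sketched as a loader that accepts either form. The function name load_index is hypothetical, and the sketch assumes index files are JSON, as in the sample_index.json used in the Basic Usage example.

```python
import json
import os


def load_index(index):
    """Accept an already-built index object or a path to a JSON index file."""
    if isinstance(index, (str, os.PathLike)):
        # A path was given: read and parse the file.
        with open(index, "r", encoding="utf-8") as handle:
            return json.load(handle)
    if isinstance(index, (dict, list)):
        # An in-memory object was given: use it as-is.
        return index
    raise TypeError(
        f"index must be a path, dict, or list, got {type(index).__name__}"
    )
```

Existing callers that pass filenames keep working, while dynamically generated indexes can be passed in directly.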
