
Comments (11)

fmassa commented on May 27, 2024 4

Also, the csv might contain several columns, and you might only be interested in a subset of those. While it is possible to write a somewhat generic dataset, the interface might get clumsy, and one might be tempted to extend it to handle specific use cases, turning something that was supposed to be easy into something complicated.

To close this issue, I'll post a snippet showing how one can go about writing their own dataset for csv-like files:

import pandas as pd

class PandasDataset(object):
    def __init__(self, path_to_csv_file, input_name, target_name):
        self.dataset = pd.read_csv(path_to_csv_file)
        self.input_name = input_name
        self.target_name = target_name
        # add transforms as well

    def __getitem__(self, idx):
        item = self.dataset.iloc[idx]
        # add transforms
        return item[self.input_name], item[self.target_name]

    def __len__(self):
        return len(self.dataset)
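
The "add transforms" placeholders in the snippet above could be filled in along the following lines. This is only a sketch under stated assumptions: the class name CsvDataset and the transform / target_transform parameters are hypothetical, and it uses the standard-library csv module instead of pandas so that it runs without extra dependencies. It assumes the file has a header row.

```python
import csv


class CsvDataset:
    """Map-style dataset over a CSV file with a header row (hypothetical name)."""

    def __init__(self, path_to_csv_file, input_name, target_name,
                 transform=None, target_transform=None):
        with open(path_to_csv_file, newline="") as f:
            # Keep only the two columns of interest; other columns are dropped.
            self.rows = [(row[input_name], row[target_name])
                         for row in csv.DictReader(f)]
        # Optional callables applied to the input and the target on access.
        self.transform = transform
        self.target_transform = target_transform

    def __getitem__(self, idx):
        x, y = self.rows[idx]
        if self.transform is not None:
            x = self.transform(x)
        if self.target_transform is not None:
            y = self.target_transform(y)
        return x, y

    def __len__(self):
        return len(self.rows)
```

Values come back as strings unless a transform converts them, for example target_transform=int for integer class labels.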

from vision.

hyojinie commented on May 27, 2024 3

I am using this, and the data loading speed is often very slow and inconsistent: some images take 0.001 seconds while others take 10 seconds. When the number of workers is N, every N-th batch takes 10 or more seconds while the other batches take less time. Any ideas?


fmassa commented on May 27, 2024 2

I agree with @yannadani: if you have a dataset text file, it's very easy to write a dataset class to parse it. For example, one might want to use pandas to parse arbitrary csv files (which could use a space as the separator) with many input and target labels per example.

Do you think there would be value in adding a generic dataset for csv files that tries to handle an arbitrary number of fields of different types? That seems like overkill, given how easy it is to write your own dataset.

Let me know what you think.
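
As a rough illustration of how little code such parsing takes, here is a sketch that reads a space-separated list file with one input and any number of numeric targets per line. The function name read_list_file and the assumed line layout ("<input> <target> [<target> ...]", single spaces, no header) are hypothetical and not taken from the thread; the standard-library csv module stands in for pandas.

```python
import csv


def read_list_file(path, delimiter=" "):
    """Parse lines of the form '<input> <target> [<target> ...]' (assumed layout)."""
    samples = []
    with open(path, newline="") as f:
        for fields in csv.reader(f, delimiter=delimiter):
            if not fields:
                continue  # skip blank lines
            # First field is the input (e.g. an image path); the rest are targets.
            samples.append((fields[0], [float(v) for v in fields[1:]]))
    return samples
```

The returned list can back a dataset class directly: __getitem__ indexes into it and __len__ returns its length.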


hyojinie commented on May 27, 2024 2


dlmacedo commented on May 27, 2024 1

Make it a pull request.


yannadani commented on May 27, 2024 1

I believe that with Python's rich libraries, one can leverage the iterator of the dataset class to do most things with ease. Passing a text file and reading from it again seems a bit roundabout to me. It is fine for Caffe because the API is in C++, and the data loaders are not exposed as they are in PyTorch.


Jiaming-Liu commented on May 27, 2024

Agree with this, but the title is misleading. It would be better to call it "load image dataset from list files".

BTW, I think it would be helpful if you make it a pull request.


yannadani commented on May 27, 2024

@fmassa I believe the question is how generic it can be. In this case, the dataset would be limited to csv files, and there might be use cases where the data or path-to-data is not in a csv but, for example, in a mat file, or in an XML file in the case of annotations. I believe that unless more people use csv, it might just be overkill.


stites commented on May 27, 2024

I'm working with datasets (like in the face poses tutorial) where the labels exist in a file alongside the images and it would be useful to have a simple ImageFolder-like abstraction which just says "treat these columns as our labels."

I'd imagine that if one column is given, the data is using a simple regression or classification label and if multiple columns are given, the output is a numpy array / torch tensor which needs to be reshaped or post-processed.

It looks like this thread is working towards that, but the issue is closed -- is this abstraction too trivial or too uncommon to go into torchvision?
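
The "treat these columns as our labels" behaviour described above might look roughly like the following. It is a minimal sketch, not an existing torchvision API: select_target is a hypothetical helper, row is assumed to be a dict mapping column names to string values (as csv.DictReader produces), and the conversion of the multi-column result to a numpy array or torch tensor is left to the caller.

```python
def select_target(row, label_columns):
    """Return one float for a single label column, else a list of floats."""
    values = [float(row[name]) for name in label_columns]
    # One column: plain regression/classification label.
    # Several columns: a flat list the caller can reshape or post-process.
    return values[0] if len(values) == 1 else values
```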


PantherYan commented on May 27, 2024

> I am using this, and the data loading speed is often very slow and inconsistent: some images take 0.001 seconds while others take 10 seconds. When the number of workers is N, every N-th batch takes 10 or more seconds while the other batches take less time. Any ideas?

Yes, I am also facing this problem. Have you found a way to solve it?
If you solved it, please share it with us. Many thanks.


fmassa commented on May 27, 2024

@PantherYan this happens because of the way data loading is done.
Your pre-processing / loading is very slow, so I see two possibilities:

  • make it faster by identifying the bottleneck in loading / processing
  • increase the number of loader workers (num_workers in the DataLoader)
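
For the first option, one way to look for the bottleneck is to time individual items in the main process, without any workers. The helper below is a hypothetical sketch (slowest_items is not part of any library); it only assumes that the dataset supports indexing. A plausible reading of the every-N-th-batch pattern, which the thread does not confirm, is that the N workers each prepare a batch in parallel, so the training loop consumes N batches quickly and then waits for the next round.

```python
import time


def slowest_items(dataset, indices, top_k=5):
    """Time dataset[i] for each index; return the top_k slowest as (seconds, index)."""
    timings = []
    for i in indices:
        start = time.perf_counter()
        dataset[i]  # triggers loading and any transforms
        timings.append((time.perf_counter() - start, i))
    # Sort by elapsed time, slowest first.
    return sorted(timings, reverse=True)[:top_k]
```

If a few indices dominate, the cost is probably tied to specific files (very large images, slow storage); if every item is uniformly slow, the transforms are the more likely suspect.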
