I suggest to add a IO for read images from a list like this to support custom image da

load image dataset from list files about vision HOT 11 CLOSED

pytorch commented on May 27, 2024 19

load image dataset from list files

from vision.

Comments (11)

fmassa commented on May 27, 2024 4

Also, the csv might contain several columns, and you might only be interested in a subset of those. While it is possible to write a somewhat generic dataset, the interface might get clumsy, and one might be tempted to extend it to handle specific use cases, turning something that was supposed to be easy into something complicated.

To close this issue, I'll post a snippet showing how one can go about writing their own dataset for csv-like files:

import pandas as pd

class PandasDataset(object):
    """Map-style dataset backed by a csv file loaded with pandas."""

    def __init__(self, path_to_csv_file, input_name, target_name):
        # Each row of the csv file is one example.
        self.dataset = pd.read_csv(path_to_csv_file)
        # Names of the columns holding the input and the target.
        self.input_name = input_name
        self.target_name = target_name
        # add transforms as well

    def __getitem__(self, idx):
        # Select the row by position.
        item = self.dataset.iloc[idx]
        # add transforms
        return item[self.input_name], item[self.target_name]

    def __len__(self):
        return len(self.dataset)


hyojinie commented on May 27, 2024 3

I am using this, and the data loading speed is often very slow and inconsistent: some images take 0.001 seconds while others take 10 seconds. When the number of workers is N, every N-th batch takes 10 seconds or more, while the other batches take less time. Any ideas?


fmassa commented on May 27, 2024 2

I agree with @yannadani: if you have a dataset text file, it's very easy to write a dataset class to parse it. For example, one might want to use pandas to parse arbitrary csv files (which could use a space as the separator) with many input and target labels per example.

Do you think there would be value in adding a generic dataset for csv files that tries to handle an arbitrary number of fields of different types? That seems like overkill, given how easy it is to write your own dataset.

Let me know what you think.
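
For the plain list-file case from the issue title, a minimal sketch could look like the following. It assumes each line of the list file holds a relative image path followed by an integer label, separated by whitespace; that layout, the class name `ListFileDataset`, and the `root`, `loader`, and `transform` parameters are illustrative choices, not an existing torchvision API.

```python
import os


class ListFileDataset:
    """Map-style dataset built from a text file of "path label" lines.

    Assumed line format (hypothetical): "<relative/path> <int label>".
    """

    def __init__(self, list_file, root="", loader=None, transform=None):
        self.root = root
        self.loader = loader          # callable: full path -> sample
        self.transform = transform    # callable: sample -> sample
        self.samples = []
        with open(list_file) as f:
            for line in f:
                line = line.strip()
                # Skip blank lines and comment lines.
                if not line or line.startswith("#"):
                    continue
                # Split on the last whitespace run so paths may contain spaces.
                path, label = line.rsplit(None, 1)
                self.samples.append((path, int(label)))

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        full_path = os.path.join(self.root, path)
        # Without a loader, the full path itself is returned as the sample.
        sample = self.loader(full_path) if self.loader else full_path
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, label

    def __len__(self):
        return len(self.samples)
```

To load actual images, one could pass something like `loader=lambda p: Image.open(p).convert("RGB")` with Pillow. Since the class defines `__getitem__` and `__len__`, it should be usable with `torch.utils.data.DataLoader` in the same way as the pandas snippet posted in this thread.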


hyojinie commented on May 27, 2024 2

Unfortunately, I have not...


dlmacedo commented on May 27, 2024 1

Make it a pull request.


yannadani commented on May 27, 2024 1

I believe that, with Python's rich libraries, one can leverage the iterator of the dataset class to do most things with ease. Passing a text file and reading from it again seems a bit roundabout to me. It is fine for Caffe because its API is in C++ and the dataloaders are not exposed as they are in PyTorch.


Jiaming-Liu commented on May 27, 2024

I agree with this, but the title is misleading. It would be better to call it "load image dataset from list files".

BTW, I think it would be helpful if you make it a pull request.


yannadani commented on May 27, 2024

@fmassa I believe the question is how generic it can be. In this case, the dataset would be limited to csv files, and there might be use cases where some data or paths to data are not present in a csv but, for example, in a mat file, or in an xml file in the case of annotations. I believe that unless more people use csv, it might just be overkill.


stites commented on May 27, 2024

I'm working with datasets (like in the face poses tutorial) where the labels exist in a file alongside the images and it would be useful to have a simple ImageFolder-like abstraction which just says "treat these columns as our labels."

I'd imagine that if one column is given, the data is using a simple regression or classification label and if multiple columns are given, the output is a numpy array / torch tensor which needs to be reshaped or post-processed.

It looks like this thread is working towards that, but the issue is closed -- is this abstraction too trivial or too uncommon to go into torchvision?
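
As a rough illustration of the "treat these columns as our labels" idea, here is a sketch using only the standard `csv` module. The class name `CsvLabelDataset`, its parameters, and the choice to parse every label as a float are assumptions made for this example; it returns the image path rather than a decoded image, and a plain list rather than a numpy array or torch tensor.

```python
import csv


class CsvLabelDataset:
    """Map-style dataset: one csv column names the image, others are labels."""

    def __init__(self, csv_file, path_column, label_columns):
        self.path_column = path_column
        self.label_columns = list(label_columns)
        # Read all rows up front; each row becomes a dict keyed by header.
        with open(csv_file, newline="") as f:
            self.rows = list(csv.DictReader(f))

    def __getitem__(self, idx):
        row = self.rows[idx]
        # Assumption: every selected label column holds a number.
        values = [float(row[c]) for c in self.label_columns]
        # One column gives a scalar label, several give a list of floats.
        target = values[0] if len(values) == 1 else values
        return row[self.path_column], target

    def __len__(self):
        return len(self.rows)
```

With a single label column this yields a scalar suitable for simple classification or regression; with several, the list can be converted and reshaped by the caller, for example into landmark coordinates.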


PantherYan commented on May 27, 2024

I am using this, and the data loading speed is often very slow and inconsistent: some images take 0.001 seconds while others take 10 seconds. When the number of workers is N, every N-th batch takes 10 seconds or more, while the other batches take less time. Any ideas?

Yes, I am also facing this problem. Do you have any idea how to solve it?
If you have solved it, please share with us. Many thanks.


fmassa commented on May 27, 2024

@PantherYan this happens because of the way data loading is done.
Your pre-processing / loading is very slow, so I see two possibilities:

- make it faster by identifying the bottleneck in loading / processing
- increase the number of loader workers (the num_workers argument of the DataLoader)
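
One possible way to look for the bottleneck is to time individual samples outside of the DataLoader, so that slow items are not hidden behind worker parallelism. The helper below is a sketch; `time_samples` is a hypothetical name, and it only assumes that the dataset supports indexing.

```python
import time


def time_samples(dataset, indices):
    """Return (index, seconds) pairs for the given indices, slowest first."""
    timings = []
    for idx in indices:
        start = time.perf_counter()
        dataset[idx]  # triggers loading and any transforms for this sample
        timings.append((idx, time.perf_counter() - start))
    # Sort by elapsed time, largest first, so slow samples are easy to spot.
    return sorted(timings, key=lambda t: t[1], reverse=True)
```

Calling `time_samples(dataset, range(100))[:5]` would list the five slowest of the first hundred samples. If a few items dominate, the cause is likely in those files or their pre-processing; if all items are uniformly slow, raising `num_workers` on the DataLoader may help more.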
