Comments (11)
Also, the csv might contain several columns, and you might only be interested in a subset of those. While it is possible to write a somewhat generic dataset, the interface might get clumsy, and one might be tempted to extend it to handle specific use cases, making something that was supposed to be easy complicated.
To close this issue, I'll post a snippet showing how one can go about writing their own dataset for csv-like files:
import pandas as pd

class PandasDataset(object):
    def __init__(self, path_to_csv_file, input_name, target_name):
        self.dataset = pd.read_csv(path_to_csv_file)
        self.input_name = input_name
        self.target_name = target_name
        # add transforms as well

    def __getitem__(self, idx):
        item = self.dataset.iloc[idx]
        # add transforms
        return item[self.input_name], item[self.target_name]

    def __len__(self):
        return len(self.dataset)
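For the case mentioned above, where the csv has several columns and only a subset is of interest, here is a minimal sketch of a variant that asks pandas to parse just the two needed columns. The class name `CsvColumnsDataset`, the `sep` parameter, and the toy data are made up for illustration; `usecols` and `sep` are standard `pd.read_csv` arguments.

```python
import io

import pandas as pd


class CsvColumnsDataset:
    """Hypothetical variant of PandasDataset that loads only two columns."""

    def __init__(self, csv_source, input_name, target_name, sep=","):
        # usecols makes pandas parse only the columns we actually need.
        self.dataset = pd.read_csv(
            csv_source, sep=sep, usecols=[input_name, target_name]
        )
        self.input_name = input_name
        self.target_name = target_name

    def __getitem__(self, idx):
        item = self.dataset.iloc[idx]
        return item[self.input_name], item[self.target_name]

    def __len__(self):
        return len(self.dataset)


# Toy csv (made-up data) with an extra column that is never loaded.
toy_csv = io.StringIO(
    "path,label,comment\n"
    "img_0.png,3,foo\n"
    "img_1.png,7,bar\n"
)
ds = CsvColumnsDataset(toy_csv, input_name="path", target_name="label")
first_input, first_target = ds[0]
```

A file path works in place of the `io.StringIO` object, and passing `sep=" "` would cover space-separated list files.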
from vision.
I am using this, and oftentimes data loading is very slow, though inconsistently: some images take 0.001 seconds while others take 10 seconds. With N workers, every N-th batch takes 10 seconds or more while the other batches take less time. Any ideas?
from vision.
I agree with @yannadani: if you have a dataset text file, it's very easy to write a dataset class to parse it. For example, one might want to use pandas to parse arbitrary csv files (which could use a space as the separator), with many input and target labels per example.
Do you think there would be value in adding a generic dataset for csv files that tries to handle an arbitrary number of fields of different types? That seems like overkill, given how easy it is to write your own dataset.
Let me know what you think.
from vision.
Make it a pull request.
from vision.
I believe that with Python's rich libraries, one can leverage the iterator of the dataset class to do most things with ease. Passing a text file and reading from it again seems a bit roundabout to me. It is fine for Caffe because its API is in C++, and the data loaders are not exposed as they are in PyTorch.
from vision.
I agree with this, but the title is misleading. It would be better to call it "load image dataset from list files".
BTW, I think it would be helpful if you made it a pull request.
from vision.
@fmassa I believe the question would be how generic it can be. In this case, the dataset will be limited to csv files, and there might be some use cases where the data or the path to the data is not in a csv but, for example, in a mat file or, for annotations, an xml file. I believe that unless more people use csv, it might just be overkill.
from vision.
I'm working with datasets (like in the face poses tutorial) where the labels exist in a file alongside the images, and it would be useful to have a simple ImageFolder-like abstraction which just says "treat these columns as our labels."
I'd imagine that if one column is given, the data is using a simple regression or classification label and if multiple columns are given, the output is a numpy array / torch tensor which needs to be reshaped or post-processed.
It looks like this thread is working towards that, but the issue is closed -- is this abstraction too trivial or too uncommon to go into torchvision?
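As a rough illustration of the "treat these columns as our labels" idea, here is a small sketch, not an existing torchvision API. The helper name `labels_from_columns` and the toy column names are assumptions: a single column name yields a 1-D label array, and a list of names yields a 2-D array that can be reshaped or post-processed afterwards.

```python
import io

import pandas as pd


def labels_from_columns(frame, label_columns):
    """Return a 1-D array for one label column, else a 2-D (rows x columns) array."""
    if isinstance(label_columns, str):
        return frame[label_columns].to_numpy()
    return frame[list(label_columns)].to_numpy()


# Toy annotation file (made-up data): one image path plus three label columns.
toy = pd.read_csv(
    io.StringIO(
        "image,x0,y0,cls\n"
        "a.png,1.0,2.0,0\n"
        "b.png,3.0,4.0,1\n"
    )
)
single = labels_from_columns(toy, "cls")         # classification-style label
multi = labels_from_columns(toy, ["x0", "y0"])   # e.g. landmark coordinates
```

The arrays can be turned into tensors with `torch.from_numpy` inside a dataset's `__getitem__` if needed.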
from vision.
I am using this, and oftentimes data loading is very slow, though inconsistently: some images take 0.001 seconds while others take 10 seconds. With N workers, every N-th batch takes 10 seconds or more while the other batches take less time. Any ideas?
Yes, I am also facing this problem. Do you have any idea how to solve it?
If you have solved it, please share with us. Many thanks.
from vision.
@PantherYan this happens because of the way data loading is done.
Your pre-processing / loading is very slow, so I see two possibilities:
- make it faster by identifying the bottleneck in loading / processing
- increase the number of loader threads
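For the first option, one simple way to look for the bottleneck is to time `__getitem__` directly, outside any data loader. The sketch below uses only the standard library. `time_items` and the stand-in `SlowEveryThird` dataset are hypothetical names, and the sleep merely imitates a slow read.

```python
import time


def time_items(dataset, indices):
    """Return (index, seconds) pairs for loading each item, slowest first."""
    timings = []
    for idx in indices:
        start = time.perf_counter()
        dataset[idx]  # load the item; the result itself is discarded
        timings.append((idx, time.perf_counter() - start))
    return sorted(timings, key=lambda pair: pair[1], reverse=True)


class SlowEveryThird:
    """Stand-in dataset (made up): every third item sleeps to mimic a slow read."""

    def __len__(self):
        return 6

    def __getitem__(self, idx):
        if idx % 3 == 0:
            time.sleep(0.05)
        return idx


report = time_items(SlowEveryThird(), range(6))
slowest_indices = sorted(idx for idx, _ in report[:2])
```

If a few items dominate the report, their files or transforms are the place to look. The second option corresponds to the `num_workers` argument of `torch.utils.data.DataLoader`.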
from vision.