Comments (6)
Hey, so you are interested in the pairs of source and destination. Something like (x.jpg, test/x.jpg)? What is your use case for the paths? When do you need the file paths instead of moving/copying the files?
from split-folders.
Exactly!
There are two reasons:
- I often have a lot of files, sometimes more than 500 GB, and copying them takes up too much space.
- I want to keep the original pile of annotations in its original structure so that I can keep track of my data version and of what happens to the data after it was added (via Weights and Biases).
I therefore often just need a list of the split file pairs and can add the files by filename.
From time to time I still want to physically split or copy files and folders, so I thought it would make sense to be able to get the lists of filenames in the different splits as outputs.
Maybe what I actually need is just the list of source files that would be in each split. In my current scenario I am working on semantic segmentation, so the folder structure is:
- images
- masks
It would then be nice to be able to get all the source and destination paths for images and masks in the different splits: train, val and test.
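To make the request concrete, here is a minimal sketch of what such an output could look like. It is not part of split-folders: the function name `split_pairs`, its parameters, and the rule that an image and its mask share the same file stem are all assumptions made for illustration.

```python
import random
from pathlib import Path


def split_pairs(image_dir, mask_dir, ratio=(0.8, 0.1, 0.1), seed=1337):
    """Return a dict mapping 'train', 'val', 'test' to lists of (image, mask) paths.

    Hypothetical helper: nothing is copied or moved, only file lists are built.
    """
    image_dir, mask_dir = Path(image_dir), Path(mask_dir)
    # Assumption: an image and its mask share the same file stem (x.jpg <-> x.png).
    masks = {p.stem: p for p in mask_dir.iterdir() if p.is_file()}
    pairs = sorted(
        (img, masks[img.stem])
        for img in image_dir.iterdir()
        if img.is_file() and img.stem in masks
    )
    # Shuffle with a fixed seed so the same split can be reproduced later.
    random.Random(seed).shuffle(pairs)

    n_train = int(len(pairs) * ratio[0])
    n_val = int(len(pairs) * ratio[1])
    return {
        "train": pairs[:n_train],
        "val": pairs[n_train:n_train + n_val],
        # Whatever is left after train and val goes to test.
        "test": pairs[n_train + n_val:],
    }
```

Calling `split_pairs("images", "masks")` would then give three lists of path pairs that can be logged or versioned by filename, while the original folders stay untouched.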
Thanks for the explanations. I will look into the issue.
I'm not sure this package is right for you. It does not support this kind of folder structure. I think scikit-learn has you covered: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
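As a rough sketch of that suggestion: `train_test_split` produces two parts per call, so a train/val/test split can be obtained by calling it twice. The helper name `three_way_split` and its default fractions are assumptions for illustration, not something scikit-learn or split-folders provides.

```python
from sklearn.model_selection import train_test_split


def three_way_split(items, val_size=0.1, test_size=0.1, seed=1337):
    """Split a list into train/val/test lists (hypothetical helper)."""
    # Convert the fractions to absolute counts so that both calls
    # refer to the size of the full dataset.
    n_test = round(len(items) * test_size)
    n_val = round(len(items) * val_size)
    # First carve off the test set, then take the validation set from the rest.
    rest, test = train_test_split(items, test_size=n_test, random_state=seed)
    train, val = train_test_split(rest, test_size=n_val, random_state=seed)
    return train, val, test
```

If `items` is a list of (image, mask) filename pairs, each pair stays together in whichever split it lands in.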
I also find that several repositories require you to organise your dataset in a specific data/ directory under their main codebase, which in turn requires train, val and test splits. Different codebases might have different requirements and structures. So when working with multiple codebases at once, it is much easier and more space-efficient to create symlinks (ln -s) than to copy or move files into different directories. See issue #31. I have created pull request #48 for this and tested it.
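For readers who want to try the idea without the package, a minimal sketch using only the standard library could look like this. It does not reproduce the code in the pull request; `link_split` and its parameters are hypothetical names.

```python
import os
from pathlib import Path


def link_split(files, output_dir, split_name):
    """Create symlinks to `files` in output_dir/split_name instead of copying.

    Hypothetical helper; returns the list of link paths.
    """
    target_dir = Path(output_dir) / split_name
    target_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for src in map(Path, files):
        link = target_dir / src.name
        # lexists also detects broken links, so an existing link is never overwritten.
        if not os.path.lexists(link):
            # Point at the absolute source path so the link does not
            # depend on the current working directory.
            os.symlink(src.resolve(), link)
        links.append(link)
    return links
```

Each link takes up almost no disk space, so the same source files can appear in several codebase-specific layouts. On Windows, creating symlinks may require elevated privileges or developer mode.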