Comments (5)
Added the optional argument group_prefix in 0.4.0. group_prefix needs the length of the group, so set it to group_prefix=2 if you have an image and a text file for each item.
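To make the "length of the group" idea concrete, here is a minimal sketch of pairing files that share a stem and checking that every group has the expected size. It is only an illustration under assumptions, not the package's implementation: group_by_stem and the file names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def group_by_stem(filenames, group_len):
    """Group files sharing a stem; require exactly group_len files per group.

    Hypothetical helper for illustration, not part of split-folders.
    """
    groups = defaultdict(list)
    for name in filenames:
        # "cat_001.jpg" and "cat_001.txt" both have the stem "cat_001".
        groups[Path(name).stem].append(name)

    bad = {stem: files for stem, files in groups.items() if len(files) != group_len}
    if bad:
        raise ValueError(f"groups without exactly {group_len} files: {sorted(bad)}")

    # Return the groups in a stable order so a later shuffle is reproducible.
    return [tuple(sorted(files)) for _, files in sorted(groups.items())]


# One image plus one text file per item, so the group length is 2.
files = ["cat_001.jpg", "cat_001.txt", "cat_002.jpg", "cat_002.txt"]
pairs = group_by_stem(files, group_len=2)
```

Each tuple in pairs would then be assigned to train, val, or test as a unit, which is why the group length has to be the same for every item.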
from split-folders.
group_prefix seems to take the whole filename into account and to separate files by their extension. I was thinking of something that could use a prefix in the filename instead. I will give an example for the sake of clarity. Imagine I have a set of images like:
485092_Soft_DXm.1.2.840.113619.2.401.101117117236165.6548190722132000.31.jpeg
485092_Soft_DXm.1.2.840.113619.3.401.10111711723611.101117117236165.jpeg
1037264_Normal_DXm.1.2.840.113619.2.401.101117117236165.29256180320170934.3.jpeg
1522377_Normal_DXm.1.2.840.113619.2.401.101117117236165.13665191212160814.3.jpeg
1338551_Normal_DXm.1.2.840.113619.2.401.101117117236165.14135180423173036.7.jpeg
1094100_Hard_DXm.1.2.840.113619.2.401.101117117236165.14398190521104701.11.jpeg
1094100_Normal_DXm.1.2.840.113619.2.401.1011171172361636165.141351804231.jpeg
I would like to use the leading number before the first _ to group images and keep them in the same folder. In this case, the first and second files (485092_) and the last two (1094100_) would be grouped together.
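The behaviour requested here can be sketched in plain Python. This is a hedged illustration only, not something split-folders provides: split_by_id_prefix, val_fraction, and the shortened file names are hypothetical stand-ins for the real ones listed above.

```python
import random
from collections import defaultdict


def split_by_id_prefix(filenames, val_fraction=0.5, seed=1337):
    """Split files into train/val so that all files sharing the ID before
    the first underscore land in the same subset.

    Hypothetical helper for illustration, not part of split-folders.
    """
    groups = defaultdict(list)
    for name in filenames:
        patient_id = name.split("_", 1)[0]  # text before the first "_"
        groups[patient_id].append(name)

    # Shuffle the IDs, not the files, so a group is never torn apart.
    ids = sorted(groups)
    random.Random(seed).shuffle(ids)
    n_val = int(len(ids) * val_fraction)
    val_ids = set(ids[:n_val])

    split = {"train": [], "val": []}
    for pid in ids:
        split["val" if pid in val_ids else "train"].extend(groups[pid])
    return split


# Shortened stand-ins for the file names in the example above.
files = [
    "485092_Soft_a.jpeg",
    "485092_Soft_b.jpeg",
    "1037264_Normal_a.jpeg",
    "1094100_Hard_a.jpeg",
    "1094100_Normal_b.jpeg",
]
split = split_by_id_prefix(files)
```

Because the split is made over IDs, groups may contain different numbers of files, which is the case a fixed group length cannot handle.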
No, the prefix is derived dynamically from the number of files that belong to one group. It only works if the number of files in each group is the same. In your example the number of files per group differs. Is your example a real-world scenario?
Yes, this is a real-world scenario.
Samples with the same ID belong to the same patient, but the images were taken differently. I could arrange for the prefix to have the same number of characters, if that would help.
Another example would be the PANDA Kaggle challenge. In this dataset, you have huge images with sparse information. One strategy would be to tile each image into small parts. You could then generate tiles that all bear the same sample name but differ in the coordinates from which they were extracted.
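As a rough sketch of the tiling case, the snippet below only generates tile file names for one sample, without reading any image. The naming scheme, tile_names, and the sizes are assumptions made for illustration; they are not taken from the PANDA dataset or from split-folders.

```python
def tile_names(sample_name, width, height, tile_size):
    """Return one file name per tile, encoding the tile's top-left corner.

    Hypothetical naming scheme: <sample>_x<left>_y<top>.png
    """
    names = []
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            names.append(f"{sample_name}_x{x}_y{y}.png")
    return names


# A 512x256 image cut into 256x256 tiles yields two tiles.
tiles = tile_names("sampleA", width=512, height=256, tile_size=256)
```

All tiles share the prefix before the first underscore, so they would need to stay in the same subset to avoid leaking one sample across train and validation.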
I understand that this is a real-world scenario, but for me it is outside the scope of this package. If anybody wants to implement it, please open a new issue.