Comments (8)
Hi @sidijju ,
We'll have to make sure the v2 version is still compatible with v1. However, there should be an easy workaround for the issue you're seeing - e.g. perhaps setting check_v1_compatibility=(size is not None)
? Since this is a new feature that v1 doesn't have, we shouldn't need to check for v1 compat for these parameter values anyway.
I'm not 100% sure what the best solution is right now though, do you mind opening a PR so I can take a better look? It's totally OK if the PR isn't completely ready just yet (you can e.g. set it as "draft")
from vision.
@NicolasHug Just made a PR!
from vision.
Closed by #8459, thank you @sidijju !
Thanks for the feedback @MyNameIsFu , this will be available soon in the next version (0.19)
from vision.
Thanks for the feature request @sidijju .
In order to get [500, 250]
and [250, 500]
from these specific input images, you could set size=499, max_size=500
. But of course, this isn't a great UX, and it might not be possible to find a size
value that would satisfy all input images.
There's been discussion of adding an edge
parameter in the past but the parameters of resize are already fairly confusing. It seems that if we were to allow size=None, max_size=500
we could implement the behaviour you are looking for and this should cover all of the potential use cases:
- size=tuple -> resize to fixed size
- size=int -> resize shorter edge to
size
while preserving aspect ratio - size=int, max_size=int -> try to resize shorter edge to
size
while preserving aspect ratio but if resulting longer edge exceeds max_size, then scale down. This corresponds to the resizing strategy of some detection models. - size=None, max_size -> resize longer edge to max_size while preserving aspect ratio.
The first 3 are already implemented, the last one isn't. Any thoughts @pmeier @vfdev-5 ?
from vision.
Thanks for the reply @NicolasHug! I agree that setting size=499, max_size=500
would work for this set of input images, but I'm not sure about the effect this would have on a more varied dataset. I also agree that it isn't the best UX since it's not very intuitive.
I think the proposal for a size=None
option is a good stopgap for now. If others agree and I have some guidance from more experienced contributors, I can attempt to implement this feature.
from vision.
Thanks for your feedback @sidijju . I'm happy to review a PR from you if you'd like to try to submit one. Our contributing guide is here: https://github.com/pytorch/vision/blob/main/CONTRIBUTING.md
For this specific change you'd only need to update torchvision.transforms.v2
(transform class, PIL functional and tensor functional). No need to change the "v1" transforms, i.e. the stuff in torchvision.transforms
from vision.
Hey @NicolasHug! Quick question for you regarding the development of this feature - would it be acceptable to make the new v2 Resize no longer compatible with the v1 Resize (by removing the _v1_transform_cls attribute from v2.Resize)?
Some of the current unit tests check for v1 compatibility, and this is running into issues since the v1 Resize doesn't accept size=None
as an argument. In this scenario, we could implement a way to extract parameters for the v1 version of the transform, as other v2 transforms have done, but the problem with this is that there isn't a clean way to provide a value for the v1 size
that will closely approximate the result of the v2 size = None, max_size = x
.
Right now, I've implemented a workaround that sets size
to max_size - 1
in these scenarios (which passes all the unit tests) but I'd appreciate your opinion on this before I make a PR.
from vision.
Using the current implementation (if using the workaround with size = max_size-1
) on different aspect ratios results in different behaviours for shapes (n, m)
n < m --> (new_n, max_size)
n == m --> (max_size - 1, max_size -1)
which crashes my pipeline for mixed datasets.
Your PR would fix this issue so thanks and i'm looking forward for the merge!
from vision.
Related Issues (20)
- ImportError: cannot import name 'datapoints' from 'torchvision.prototype' HOT 1
- crossvit vs vision transformer HOT 2
- GPU accelerated video loading with optimizations for reading at specific timestamps or time intervals HOT 1
- add typing to torchvision.models.detection.faster_rcnn HOT 2
- ConvertImageDtype not converting properly from uint8 HOT 3
- loss_box_reg increasing while training mask rcnn HOT 1
- Torch Load Warning causing test suite to fail. HOT 5
- Prebuilt .whl and .conda of torchvision for aarch64 + cuda
- RuntimeError: Argument #4: Padding size should be less than the corresponding input dimension for v2 transforms HOT 4
- ImageReadMode should support strings
- Better decoder docs
- Suitable augmentation of Keras in Pytorch HOT 1
- Detection lr_scheduler.step() called every step instead of every epoch HOT 1
- Use COCO Mask Parsing from pycocotools HOT 1
- Simplify transfer learning by modifying get_model() HOT 1
- Grayscale transform does not work correctly for 3 channels to 1. HOT 2
- Deepcopy of torchvision.models.feature_extraction.DualGraphModule breaks eval HOT 1
- error: invalid argument '-std=c++17' not allowed with 'C' HOT 1
- Missing f-string at the error messages in torchvision.utils._parse_colors HOT 1
- `torchvision.datasets.Imagenette()` might raise RuntimeError when setting `download=True`
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from vision.