Hi! I think the transforms could be improved by adding a multicrop t

Multicrop - missing feature about vision HOT 13 CLOSED

pytorch commented on May 27, 2024

Multicrop - missing feature

from vision.

Comments (13)

soumith commented on May 27, 2024

do you mean, like 5-crop and 10-crop?

edouardoyallon commented on May 27, 2024

Hi, yes. It can be implemented by using a custom transform for the input and the class. I'm trying to write one. Hopefully it will be nice enough to submit as a PR.

catalystfrank commented on May 27, 2024

@edouardoyallon have you done an implementation of multi-crop? If so, where can I find it? Many thanks.

penguinshin commented on May 27, 2024

Has anyone done the multi-crop implementation yet?

alykhantejani commented on May 27, 2024

I can implement this but I have a few comments/questions:

1 - I think we should only implement 5-crop, since the 10-crop version is just the 5 crops plus their flipped copies, and after #240 is merged this will be trivial to do with just the 5-crop version.
2 - Should we call it Oversample, following the Caffe convention? Or do you have a better name?
3 - This transform won't be useful for training (at least not easily): if your transform emits multiple images/tensors, you would need a custom collate function to put them all into a batch, as far as I can tell. I think the same problem occurs if you want to batch these 5/10 crops for fast evaluation using the PyTorch DataLoader.
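To make point 1 concrete, here is a minimal sketch of how a 10-crop can be built from a 5-crop plus a horizontal flip. It works on plain `(C, H, W)` tensors rather than PIL images, and `five_crop_tensor` / `ten_crop_tensor` are hypothetical helper names for illustration only, not the functions proposed for torchvision.

```python
import torch


def five_crop_tensor(img: torch.Tensor, size: int) -> list:
    """Return the four corner crops and the center crop of a (C, H, W) tensor."""
    _, h, w = img.shape
    if size > h or size > w:
        raise ValueError("crop size must not exceed the image size")
    top, left = (h - size) // 2, (w - size) // 2
    return [
        img[:, :size, :size],                      # top-left
        img[:, :size, w - size:],                  # top-right
        img[:, h - size:, :size],                  # bottom-left
        img[:, h - size:, w - size:],              # bottom-right
        img[:, top:top + size, left:left + size],  # center
    ]


def ten_crop_tensor(img: torch.Tensor, size: int) -> list:
    """Five crops of the image followed by five crops of its horizontal mirror."""
    flipped = torch.flip(img, dims=[-1])  # mirror along the width axis
    return five_crop_tensor(img, size) + five_crop_tensor(flipped, size)


# Small synthetic image so the crops are easy to inspect.
img = torch.arange(3 * 6 * 8, dtype=torch.float32).reshape(3, 6, 8)
crops = ten_crop_tensor(img, 4)
```

With this construction, the top-left crop of the mirrored image is the mirror of the original top-right crop, which is why the flipped half adds no new corner logic.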

fmassa commented on May 27, 2024

About your points:

1 - While I agree that getting the 10-crop from the 5-crop would be a simple extension, I think 10-crop is common enough that it would be the most used case, so I'd rather keep it.
2 - Oversample is not a representative name to me. In fb-resnet-torch they called it TenCrop; what do you think? Or maybe MultiCrop?
3 - I believe this transform is only used for testing, to improve results during evaluation, so I think this shouldn't be a concern. We could eventually just concatenate over the first dimension, for example, but that might indeed need a different collate function.
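As an illustration of the "concatenate over the first dimension" idea, here is a rough sketch of what such a collate function might look like. `RandomCrops` and `concat_crops_collate` are hypothetical names, and the dataset is random stand-in data; the only assumption is that each sample is a `(ncrops, C, H, W)` tensor plus a label.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class RandomCrops(Dataset):
    """Hypothetical stand-in dataset: each item is (ncrops, C, H, W) crops plus a label."""

    def __init__(self, n_items=6, ncrops=10):
        self.data = torch.randn(n_items, ncrops, 3, 8, 8)
        self.labels = torch.arange(n_items)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]


def concat_crops_collate(samples):
    """Concatenate all crops along dim 0 and repeat each label once per crop."""
    ncrops = samples[0][0].size(0)
    crops = torch.cat([c for c, _ in samples], dim=0)  # (bs * ncrops, C, H, W)
    labels = torch.stack([t for _, t in samples])      # (bs,)
    labels = labels.repeat_interleave(ncrops)          # (bs * ncrops,)
    return crops, labels


loader = DataLoader(RandomCrops(), batch_size=2, collate_fn=concat_crops_collate)
inputs, targets = next(iter(loader))
```

If every sample is already a single tensor of the same shape, the default collate function would instead stack them into a 5D batch, so a custom function like this is only needed when a flat 4D batch is wanted straight from the loader.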

alykhantejani commented on May 27, 2024

1 - makes sense
2 - I think FiveCrop and TenCrop are pretty good, MultiCrop wouldn't accept any other number so might be misleading?
3 - yeah I think even for testing you will need a special collate function (unless you want to test one image at a time (10-cropped)).

Ok great, I think that's enough to go on. Will send a PR soon.

alykhantejani commented on May 27, 2024

This has now been merged into master

wlnirvana commented on May 27, 2024

Could anyone elaborate on how FiveCrop and TenCrop are supposed to be used? It seems they return a plain Python list, which fails in ToTensor because it is neither an ndarray nor a PIL.Image.

fmassa commented on May 27, 2024

@wlnirvana for the moment, chaining these directly with ToTensor doesn't work, and you need to manually call ToTensor on each of the outputs, following an approach similar to the one described in #230. We will discuss a better way of doing this, because it doesn't fit nicely with the dataloader either.

wlnirvana commented on May 27, 2024

@fmassa I see. Thanks for the quick response.

fmassa commented on May 27, 2024

cc @alykhantejani so that we can discuss this this week

fmassa commented on May 27, 2024

@wlnirvana so, I discussed this with @alykhantejani and we think the best approach in this case is to use the functional interface. We reached the conclusion that you need to do some post-processing of the results anyway, because you will want to average/max the results over the crops.
Here is a snippet that illustrates how you can do it:

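import torch
from torchvision import transforms
from torchvision.transforms.functional import ten_crop, to_tensor

# example values only: ImageNet statistics and a 224-pixel crop size
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

def my_transform(img):
  # do any transforms you want here
  imgs = ten_crop(img, 224)  # this is a sequence of 10 PIL Images
  return torch.stack([normalize(to_tensor(x)) for x in imgs], 0) # returns a 4D tensor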

Now in your test loop, you obtain a batch from your data loader, and you'd do something like

input, target = batch
# input is a 5d tensor
bs, ncrops, c, h, w = input.size()
result = model(input.view(-1, c, h, w)) # fuse batch size and ncrops
# say result is a 2D tensor
result_averaged = result.view(bs, ncrops, -1).mean(1) # avg over crops
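For anyone who wants to check the reshaping logic without a dataset, here is a small self-contained sketch of the same fuse/unfuse pattern. The random input and the tiny linear `model` are placeholders chosen purely for illustration; any module mapping `(N, C, H, W)` to `(N, num_classes)` should behave the same way. It also shows the max-over-crops variant mentioned above.

```python
import torch
from torch import nn

torch.manual_seed(0)

bs, ncrops, c, h, w, num_classes = 4, 10, 3, 8, 8, 5
inputs = torch.randn(bs, ncrops, c, h, w)  # stand-in for one batch of ten-cropped images

# Placeholder classifier, not a real network from the thread.
model = nn.Sequential(nn.Flatten(), nn.Linear(c * h * w, num_classes)).eval()

with torch.no_grad():
    logits = model(inputs.view(-1, c, h, w))  # fuse batch and crops: (bs * ncrops, num_classes)
    per_crop = logits.view(bs, ncrops, -1)    # unfuse: (bs, ncrops, num_classes)
    averaged = per_crop.mean(dim=1)           # average over crops
    maxed = per_crop.max(dim=1).values        # alternative: max over crops

predictions = averaged.argmax(dim=1)          # one class index per image
```

Whether averaging or taking the max works better is likely model and dataset dependent; averaging raw outputs versus averaging softmax probabilities is another choice this sketch leaves open.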
