wow that is some spaghetti code..
Good point! You can definitely make transforms another layer by subclassing the nn.Module class or the Function class and implementing the transform in the forward() function (see the part about creating extensions in the tutorials: https://github.com/pytorch/tutorials). Note that these functions don't have to be differentiable or have gradients, so it's completely possible to do what you're saying.
That said, transforms at the sampler level get the benefit of multi-processing and queuing, so you can run these transforms in parallel with model training, which is probably more efficient. It is definitely trivial to apply the same transforms to input/labels with sampling; it's just not implemented right now.
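As a rough sketch of the layer approach (a hypothetical AddGaussianNoise module, not an existing torchvision class): since transforms need no gradients, forward() can do any non-differentiable op, and the module's train/eval mode can gate augmentation.

```python
import torch
import torch.nn as nn

class AddGaussianNoise(nn.Module):
    """Hypothetical transform-as-layer: perturbs its input in forward().

    The op need not be differentiable; augmentation requires no gradients.
    """
    def __init__(self, std=0.1):
        super().__init__()
        self.std = std

    def forward(self, x):
        if self.training:  # only augment in train mode
            return x + torch.randn_like(x) * self.std
        return x

layer = AddGaussianNoise(std=0.1)
layer.train()
out = layer(torch.zeros(2, 3))
```

In eval mode the layer becomes a no-op, which is exactly the train-time-only behavior you'd want from an augmentation layer.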
from vision.
I think that some of these transforms are going to be very handy indeed!
I think that at the moment we are trying to figure out a better way of handling random transforms applied to several images at once (as discussed in #9). Once that is sorted out, we will be adding more transforms, and any help is welcome!
One note, though: I wonder whether RandomZoom is not redundant with RandomCrop + Scale. Is that the case?
Cool, sure, you are right about the zoom; it was just an example :). I use transforms in my own code which apply to both input and target images (segmentation), so I definitely understand that issue. This would require changes to the sampling code and/or the Compose function. The way I do it is to first build up the affine matrix as a combination of sub-transforms, then take the target image as an optional argument and apply the transform to it as necessary.
e.g.
def random_transform(x, y=None):
    # build up sub-transforms
    if random_shift:
        random_shift_matrix = ..affine transform matrix..
    if random_rotation:
        ...
    # combine sub-transforms into a single transform for one interpolation
    final_transform = np.eye(3)  # start from the identity matrix
    for sub_transform in sub_transforms:
        final_transform = np.dot(final_transform, sub_transform)
    # apply transform to x
    x = apply_transform(x, final_transform)
    # apply to y if necessary
    if y is not None:  # truthiness is ambiguous for arrays
        y = apply_transform(y, final_transform)
    return x, y
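A small runnable version of that composition idea with 3x3 homogeneous matrices (helper names are mine, not from any library). Note the accumulator starts at np.eye(3), the identity, since that is the neutral element for matrix composition; the payoff is that the composed matrix gives the same result as applying each sub-transform in turn, so the image only needs one interpolation pass.

```python
import numpy as np

def shift_matrix(tx, ty):
    """Homogeneous 2D translation matrix."""
    return np.array([[1.0, 0.0, tx],
                     [0.0, 1.0, ty],
                     [0.0, 0.0, 1.0]])

def rotation_matrix(theta):
    """Homogeneous 2D rotation matrix about the origin."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# compose sub-transforms into one matrix, starting from the identity
final_transform = np.eye(3)
for sub_transform in [shift_matrix(2, 3), rotation_matrix(np.pi / 2)]:
    final_transform = np.dot(final_transform, sub_transform)

# sanity check: composed matrix == applying each sub-transform in turn
p = np.array([1.0, 0.0, 1.0])  # homogeneous point (x=1, y=0)
stepwise = shift_matrix(2, 3) @ (rotation_matrix(np.pi / 2) @ p)
```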
You could also build up this transform matrix and then store it, if you still want to keep Compose working on a single image.
I attached a full function taken from my code (largely adapted from keras/preprocessing/image.py) which shows basically how this works.
It would be easy to pick these transforms out and combine them in the Compose function, perhaps by adding self.requires_interpolation = True to the relevant transform classes.
Yes, we definitely would like to add more transforms! Thanks!
Could you please upload the function as a gist?
Oh, good call. The main takeaway from this function is just to present one possible way to combine multiple transforms.
https://gist.github.com/ncullen93/66c1803f9a3dccd1d63b041c90ecf784
At least in my own private repo, I need to add the data augmentation pieces used by the 3-D U-Net work: "Besides rotation, scaling and gray value augmentation, we apply a smooth dense deformation field on both data and ground truth labels. For this, we sample random vectors from a normal distribution with standard deviation of 4 in a grid with a spacing of 32 voxels in each direction and then apply a B-spline interpolation."
They have a patch for Caffe, but I decided I'd rather go through the pain of reimplementing it and have the relative sanity of the PyTorch API than use Caffe. My question is: is augmentation that is (mostly) specific to medical imaging too exotic for general consumption?
http://lmb.informatik.uni-freiburg.de/resources/opensource/unet.en.html
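For reference, a rough numpy/scipy sketch of that scheme (function and variable names are mine, not from the U-Net patch): sample normal displacement vectors on a coarse control grid (std 4, spacing 32 voxels per the paper), upsample them to a dense field with cubic B-spline interpolation, then warp data and labels with the identical field, using nearest-neighbour lookup for the labels so class ids stay discrete.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def random_elastic_deform(vol, labels, std=4.0, spacing=32, seed=None):
    """Sketch of a smooth dense deformation applied to data and labels.

    Works for any dimensionality (2D shown in tests, 3D for volumes).
    """
    rng = np.random.default_rng(seed)
    shape = vol.shape
    # coarse control grid, one node every `spacing` voxels
    coarse = tuple(int(np.ceil(s / spacing)) + 1 for s in shape)
    coords = np.meshgrid(*[np.arange(s, dtype=float) for s in shape],
                         indexing="ij")
    # position of every voxel in coarse-grid coordinates
    query = [c / spacing for c in coords]
    warped = []
    for axis in range(vol.ndim):
        disp = rng.normal(0.0, std, size=coarse)
        # order=3 -> cubic B-spline interpolation of the coarse field
        dense = map_coordinates(disp, query, order=3, mode="nearest")
        warped.append(coords[axis] + dense)
    data_out = map_coordinates(vol, warped, order=1, mode="nearest")
    label_out = map_coordinates(labels, warped, order=0, mode="nearest")
    return data_out, label_out
```

The key property is that both volumes are sampled at the same warped coordinates, so data and ground truth deform together.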
Hey, I work with structural brain images as well! The U-Net sampling is very similar to what my lab uses. To answer your question: yes, I think something like a "smooth dense deformation field" is quite exotic, but there is definitely a growing need for good sampling/transforms for 3D images (especially taking 3D sub-volumes or 2D slices). If you're asking whether that type of thing will ever be included in pytorch, I can't answer that, but I hope so! Good, comprehensive sampling is a second-class citizen in most of the big DL packages.
It is very straightforward to add your own transforms and dataset pre-processing steps in pytorch, so you should go for it and make at least that part publicly available! People will find it useful and may contribute!
To make a transform, just create a callable class:
class SmoothDeformation(object):
    def __init__(self, params):
        self.params = params

    def __call__(self, input):
        .... apply smooth deformation ...
You can string multiple transforms together with the Compose class.
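The chaining itself is just sequential calls; here is a minimal sketch of that pattern with a hypothetical toy transform (Multiply is a stand-in for a real image op, but torchvision's actual Compose works the same way):

```python
class Compose(object):
    """Minimal sketch of the Compose pattern: call each transform in order."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x

class Multiply(object):
    """Hypothetical toy transform, standing in for a real image op."""
    def __init__(self, factor):
        self.factor = factor

    def __call__(self, x):
        return x * self.factor

pipeline = Compose([Multiply(2), Multiply(3)])
result = pipeline(1)  # 1 * 2 * 3
```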
Unfortunately, there is no supported way in pytorch to perform a transform on both the input and target images at the same time, but hopefully that will be supported soon.
@ncullen93 Although the Caffe syntax (see below) is itself horrific, I think the notion of transforms being just another layer at train time is very appealing: it maps cleanly to running the transformations on the GPU concurrently with training, and it makes it logically trivial to apply the same transformation to both input and labels. Since the current setup doesn't support either and hasn't planned for it, I'm inclined to make it an add-on module. I'll redo it "the pytorch way" down the line, once it can satisfy those two requirements.
http://lmb.informatik.uni-freiburg.de/resources/opensource/3dUnet_miccai2016_no_BN.prototxt
Thanks for the pointers; I will definitely take a stab at that. I really appreciate the prompt (and enthusiastic) response. At least for a first pass, I'm more comfortable just implementing it as a simple forward function.
You may be right, but how would you apply the transforms to the labels too? It needs to be stateful to apply the same transformation to the labeled data, and it doesn't seem like the API is currently set up for that. Also, in terms of using SMP: my CPU is just a 6-core Broadwell (I'm told an Intel core peaks at ~75 GFLOPS), while my GPU is a Pascal Titan X, which has a peak of 10.8 TFLOPS and an achievable ~7.4 TFLOPS. IMO, if one has a reasonable GPU, the CPU's job is just to keep the GPU fed with data.
Sorry, accidentally closed the issue. I can think of one way: adding a co_transform argument to a Dataset subclass. You can pretty much do whatever you want to the input/target in the transform's __call__ method.
Adapted from the current ImageFolder class:
class ImageFolder(data.Dataset):
    def __init__(self, root, transform=None, target_transform=None,
                 co_transform=None, loader=default_loader):
        classes, class_to_idx = find_classes(root)
        imgs = make_dataset(root, class_to_idx)
        if len(imgs) == 0:
            raise(RuntimeError("Found 0 images in subfolders of: " + root + "\n"
                               "Supported image extensions are: " + ",".join(IMG_EXTENSIONS)))
        self.root = root
        self.imgs = imgs
        self.classes = classes
        self.class_to_idx = class_to_idx
        self.transform = transform
        self.target_transform = target_transform
        self.co_transform = co_transform  # ADDED THIS
        self.loader = loader

    def __getitem__(self, index):
        path, target = self.imgs[index]
        img = self.loader(os.path.join(self.root, path))
        if self.transform is not None:
            img = self.transform(img)
        if self.target_transform is not None:
            target = self.target_transform(target)
        if self.co_transform is not None:
            img, target = self.co_transform(img, target)  # ADDED THIS
        return img, target

    def __len__(self):
        return len(self.imgs)
Easy enough :). Now just create a transform that takes in both input and target:
class MyCoTransform(object):
    def __init__(self):
        pass

    def __call__(self, input, target):
        # do something to both images
        return input, target
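For example, a hypothetical co-transform (RandomCoFlip is my own name, not a torchvision class) that flips a segmentation pair together, drawing the random decision once so image and mask stay aligned:

```python
import random
import numpy as np

class RandomCoFlip(object):
    """Hypothetical co_transform: one coin flip decides for both images."""
    def __init__(self, p=0.5):
        self.p = p

    def __call__(self, input, target):
        if random.random() < self.p:
            input = np.fliplr(input).copy()
            target = np.fliplr(target).copy()
        return input, target
```

An instance would then be passed to the dataset as co_transform=RandomCoFlip().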
There's a good post about this on the pytorch forum if you search for it.