Comments (5)
(@tibuch @alex-krull please correct me if I am wrong.)
I always understood the core idea of n2v
as based on the mental model of additive stochastic noise on top of a non-stochastic signal in an image.
Leaving issues with hyperparameters aside for a moment, I always thought that the blind-spot architecture trained with SGD forces the network to estimate the intensity of the blinded pixel from a neighborhood of i.i.d. intensities. At the heart of this lies the assumption that mini-batched SGD with an MSE loss inherently produces a representation of the mean of the distribution of values (here pixel intensities) in the training data. If you now return to the model from above, the mean values must originate from the signal, as the noise is i.i.d. around the signal intensities.
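As a toy illustration of that claim (a hypothetical NumPy sketch, not n2v code): fitting a single constant to many observations corrupted by zero-mean noise, using an MSE loss and gradient descent, recovers their mean and hence the clean signal.

```python
import numpy as np

# Toy sketch (not n2v code): minimizing an MSE loss against many noisy
# observations of the same underlying value pulls the estimate toward
# their mean -- the clean signal, when the noise is zero-mean.
rng = np.random.default_rng(0)
signal = 5.0
noisy = signal + rng.normal(0.0, 1.0, size=100_000)  # zero-mean noise

c = 0.0          # a single learnable "prediction"
lr = 0.1         # learning rate
for _ in range(200):
    grad = 2.0 * (c - noisy.mean())  # d/dc of mean((c - noisy)**2)
    c -= lr * grad

print(abs(c - signal) < 0.05)  # True: c converged to (almost) the clean signal
```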
This concept is also described in the noise2noise paper.
from n2v.
Thanks Peter, I agree.
I hope the following explanation can help:
It helps to imagine a situation where an infinite amount of training data is available.
During training, the network will see the same noisy blind spot input patch with different conflicting target values, depending on the instantiation of the noise.
Since we are using the MSE loss, the network will learn to predict the expected value when it is presented with conflicting target values.
Note that it cannot predict the original noisy value, say 42, because this value will be different for each instantiation of the noise, and there is no way to infer it from the (masked) input.
We assume that the noise is zero centred, which means that the conflicting target values for that pixel are centred around the true signal at the pixel, i.e. the expected value of the noisy values is equal to the expected value of the clean signal at the pixel.
This is indeed the same general approach as in noise2noise.
We can use a noisy target, because we know that the expected value of the noisy target is the same as the expected value of the true signal we are interested in.
You can view noise2void as noise2noise, with the central pixel removed from the input.
Removing the pixel allows us to use it as our target. This is just as good as taking a pixel from a second noisy image, as is done in noise2noise.
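A minimal sketch of that replacement step (a hypothetical helper, not the library's API): the chosen pixel's noisy value becomes the training target, and in the network input it is overwritten with a randomly drawn neighbour value.

```python
import numpy as np

# Hypothetical sketch of N2V-style masking (not the library's API):
# keep the pixel's noisy value as the target and overwrite it in the
# network input with a randomly chosen neighbouring pixel value.
def mask_pixel(img, i, j, rng, radius=2):
    inp = img.copy()
    target = img[i, j]            # the removed pixel serves as the label
    while True:                   # draw a neighbour offset, excluding (0, 0)
        di, dj = rng.integers(-radius, radius + 1, size=2)
        if (di, dj) != (0, 0):
            break
    inp[i, j] = img[i + di, j + dj]
    return inp, target

rng = np.random.default_rng(1)
img = rng.normal(size=(8, 8))
inp, target = mask_pixel(img, 4, 4, rng)
print(target == img[4, 4])        # True: original noisy value is the label
```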
Note that the noise does not have to be identically distributed for each pixel; Poisson noise is an example that is zero-centred but distributed differently in every pixel, depending on the underlying signal.
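A quick numerical check of that Poisson example (a hypothetical sketch): the distribution changes with the signal, but its expected value always equals the clean signal.

```python
import numpy as np

# Hypothetical check: Poisson noise is not identically distributed across
# pixels (its variance equals the signal), yet it is zero-centred in the
# relevant sense: E[Poisson(s)] = s for every clean intensity s.
rng = np.random.default_rng(4)
for s in (0.5, 3.0, 40.0):                        # different clean intensities
    x = rng.poisson(s, 1_000_000).astype(float)   # noisy observations
    print(s, abs(x.mean() - s) < 0.05)            # True: mean recovers the signal
```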
Thanks for the explanations and the discussion in yesterday's meeting! I am probably making this more complicated by trying to throw formulas in, but it would be great to be able to rigorously derive the n2v behaviour. Here is what I understood (and sorry for doing the LaTeX via images, but GitHub seems not to support TeX directly):
You define the output of the CNN for a given pixel i as

    f_θ(RF_i),  with  RF_i = (..., x_{i-1}, x_i, x_{i+1}, ...)    (Eq. 1)

where RF_i denotes the receptive field around pixel i.
For training n2v, you use as label the noisy observation x_i and as input the modified receptive field of x_i, where only the pixel value of x_i is replaced by some neighborhood pixel. I denote that as

    m(RF_i) := RF_i with x_i replaced by x_j, for some pixel j in the neighborhood of i.    (Eq. 2)

Then, (in the infinite-sample case) the following objective is minimized:

    min_θ E[ (f_θ(m(RF_i)) - x_i)² ]    (Eq. 3)
(Here, I am already not sure whether the expectation runs over all pixels or only over the ones from the receptive field.)
From my statistics classes I recall that the mean squared error between two random variables X and Y (where Y is the observed target and X is the variable we condition on),

    E[ (h(X) - Y)² ],    (Eq. 4)

is minimized by the conditional expectation (also a random variable)

    h*(X) = E[Y | X].    (Eq. 5)

However, I struggle to apply that fact to the n2v objective to see that (assuming the network optimization is able to find parameters achieving the minimum)

    f_θ*(m(RF_i)) = E[x_i | m(RF_i)].    (Eq. 6)
Once again, sorry for this complicated formulation of the question but I hope I am missing some theoretical arguments which you could point me to.
Hi Sebastian,
I am sorry for my late reply.
I hope the explanation below helps.
If not, please let me know where I lose you.
Let me go through your equations one by one and see where we are on the same page and where not:
- Eq. 1: I completely agree. I am not 100% sure about the i_+1, i_-1, ... notation, but I think we agree that the receptive field should include all pixels in some kind of square around pixel i.
- Eq. 2: You can view it like this, but I would rather see the modified receptive field as simply excluding the pixel i from Eq. 1. Note that there really are network architectures by now [1,2] that have this type of receptive field. For the general principle it does not matter whether we use such an architecture or whether we achieve this indirectly via masking as we do in N2V. It is easiest to assume we have a particular architecture that has this property.
- Eq. 3: This equation is correct. The expected value is over all the pixels (together with their respective blind spot receptive fields) in your training data distribution. You can think of this (ignoring the fact that your training images have borders) as simply randomly selecting pixels from your training set and cropping their surrounding receptive fields. If you did this for a large number of pixels from your training set and averaged the result, you would have an approximation of the expected value.
- Eqs. 4, 5, 6: I totally agree with these equations, and they are the key to why this works. Imagine now that your training set is of truly infinite size. If you go through this training set, always looking at pairs of x_i and m(RF_i), you will, because the set is infinite, observe the same blind spot receptive field m(RF_i) together with different values x_i in the center pixel. So it now makes sense to think of a probability distribution p(x_i|m(RF_i)) over pixel values x_i given a fixed blind spot receptive field m(RF_i).
The Y in your Eq. 5 corresponds to x_i, the X corresponds to the blind spot receptive field m(RF_i).
As you say in your Eq. 5, the solution to minimizing the squared error is simply the expected value of x_i with respect to this conditional distribution p(x_i|m(RF_i)).
Now, since the network is trained with a squared loss against x_i, it will learn a mapping to this expected value.
It can be shown (assuming the noise is zero centered) that this expected value is the clean signal at the pixel s_i.
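To make this concrete, here is a toy numerical experiment (hypothetical, with a linear least-squares model standing in for the CNN): train with the noisy centre pixel as the target and only the noisy neighbours as input; the learned predictor ends up closer to the clean signal than the noisy pixel itself.

```python
import numpy as np

# Toy stand-in for the blind-spot network (hypothetical, linear model):
# target = noisy centre pixel, input = noisy neighbours only. The
# MSE-optimal fit approximates E[x_i | neighbours], i.e. the clean signal.
rng = np.random.default_rng(3)
n = 200_000
s = rng.uniform(0.0, 10.0, n)                  # clean signal per pixel
neigh = s[:, None] + rng.normal(0, 1, (n, 8))  # 8 noisy neighbour values
x = s + rng.normal(0, 1, n)                    # noisy centre pixel (target)

# least-squares fit predicting the centre from the neighbour mean
A = np.stack([neigh.mean(axis=1), np.ones(n)], axis=1)
w, b = np.linalg.lstsq(A, x, rcond=None)[0]
pred = w * neigh.mean(axis=1) + b

# the prediction denoises: closer to s than the noisy observation x is
print(np.mean((pred - s) ** 2) < np.mean((x - s) ** 2))  # True
```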
I will close this discussion for now. Please feel free to reopen it. Another excellent place for this discussion would be forum.image.sc/.