
Comments (5)

psteinb commented on July 20, 2024

(@tibuch @alex-krull please correct me if I am wrong.)

I always understood the core idea of n2v as resting on the mental model of additive stochastic noise on top of a non-stochastic signal in an image.

Leaving hyperparameter issues aside for a moment, I always thought that the blind-spot architecture trained with SGD forces the network to estimate the intensity of the blinded pixel from a neighborhood of i.i.d. intensities. At the heart of this lies the assumption that mini-batched SGD inherently tries to produce a representation of the mean distribution of values (here pixel intensities) in the training data. Returning to the model above, the mean values must then originate from the signal, since the noise is i.i.d. around the signal intensities.

This concept is also described in the noise2noise paper.
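The blind-spot masking idea described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the actual n2v implementation: the function name is made up, and border handling is kept deliberately naive.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_pixels(img, n_masked=64, radius=2):
    """Blind-spot masking in the spirit of n2v (simplified sketch):
    replace each selected pixel with a randomly chosen neighbour so the
    network cannot simply copy the input value at that location."""
    masked = img.copy()
    h, w = img.shape
    ys = rng.integers(0, h, n_masked)
    xs = rng.integers(0, w, n_masked)
    for y, x in zip(ys, xs):
        dy = dx = 0
        while dy == 0 and dx == 0:  # never pick the pixel itself
            dy = int(rng.integers(-radius, radius + 1))
            dx = int(rng.integers(-radius, radius + 1))
        # border handling kept deliberately simple here
        ny = int(np.clip(y + dy, 0, h - 1))
        nx = int(np.clip(x + dx, 0, w - 1))
        masked[y, x] = img[ny, nx]
    return masked, ys, xs

noisy = rng.normal(0.5, 0.1, (32, 32))
masked, ys, xs = mask_pixels(noisy)
# during training, the loss would be computed only at (ys, xs),
# comparing the network output against the original values noisy[ys, xs]
```

Only the masked coordinates contribute to the loss; everywhere else the network sees the unmodified noisy input.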

from n2v.

alex-krull commented on July 20, 2024

Thanks Peter, I agree.
I hope the following explanation can help:

It helps to imagine a situation where an infinite amount of training data is available.
During training, the network will see the same noisy blind-spot input patch with different, conflicting target values, depending on the instantiation of the noise.
Since we are using the MSE loss, the network will learn to predict the expected value when it is presented with conflicting target values.
Note that it cannot predict the original noisy value 42, because this value will be different for each instantiation of the noise and there is no possibility to infer the value from the (masked) input.
We assume that the noise is zero centred, which means that the conflicting target values for that pixel are centred around the true signal at the pixel, i.e. the expected value of the noisy values is equal to the expected value of the clean signal at the pixel.

This is indeed the same general approach as in noise2noise.
We can use a noisy target, because we know that the expected value of the noisy target is the same as the expected value of the true signal we are interested in.
You can view noise2void as noise2noise, with the central pixel removed from the input.
Removing the pixel allows us to use it as our target. This is just as good as taking the pixel from a second noisy image, as is done in noise2noise.

Note that the noise does not have to be i.i.d. for each pixel, Poisson noise would be an example that is zero centred but differently distributed in every pixel depending on the underlying signal.
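The point that "zero centred" does not require i.i.d. noise can be checked numerically. The following sketch (not from the original thread; the signal value is arbitrary) shows that both additive Gaussian noise and signal-dependent Poisson noise have an expected noisy value equal to the clean signal:

```python
import numpy as np

rng = np.random.default_rng(1)
signal = 42.0  # hypothetical clean intensity at one pixel
n = 1_000_000

# additive Gaussian noise: zero-mean by construction
gauss = signal + rng.normal(0.0, 5.0, n)

# Poisson noise: its spread depends on the signal, so it is not identically
# distributed across pixels -- yet it is still "zero centred" in the sense
# that E[x] equals the clean signal
poisson = rng.poisson(signal, n).astype(float)

print(gauss.mean(), poisson.mean())  # both close to 42
```

This is exactly the property the n2v target relies on: however the noise is distributed per pixel, its expectation must coincide with the signal.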


codingS3b commented on July 20, 2024

Thanks for the explanations and the discussion in yesterday's meeting! I am probably making this more complicated by throwing formulas in, but it would be great to be able to rigorously derive the n2v behaviour. Here is what I understood (and sorry for doing the LaTeX via images, but GitHub does not seem to support TeX directly):
You define the output of the CNN for a given pixel i as

$$\hat{s}_i = f_\theta(\mathrm{RF}_i), \qquad \mathrm{RF}_i = (\ldots, x_{i-1}, x_i, x_{i+1}, \ldots)$$

For training n2v, you use the noisy observation x_i as the label, and as input the modified receptive field of x_i, in which only the value at pixel i is replaced by some neighborhood pixel.
I denote that as

$$m(\mathrm{RF}_i) = (\ldots, x_{i-1}, x_j, x_{i+1}, \ldots), \qquad j \neq i \text{ a randomly chosen neighbour of } i$$

Then, (in the infinite-sample case) the following objective is minimized:

$$\min_\theta \; \mathbb{E}\Big[\big(f_\theta(m(\mathrm{RF}_i)) - x_i\big)^2\Big]$$

(Here, I am already not sure whether the expectation runs over all pixels or only those in the receptive field.)

From my statistics classes I recall that the mean squared error between two random variables X and Y (where X is observed and Y is the variable we want to infer),

$$\mathcal{L}(g) = \mathbb{E}\Big[\big(Y - g(X)\big)^2\Big],$$

is minimized by the conditional expectation (also a random variable)

$$g^*(X) = \mathbb{E}[Y \mid X].$$

However, I struggle to apply that fact to the n2v objective to see that (assuming the network optimization is able to find parameters achieving the minimum)

$$f_{\theta^*}(m(\mathrm{RF}_i)) = \mathbb{E}\big[x_i \mid m(\mathrm{RF}_i)\big].$$

Once again, sorry for the complicated formulation of the question, but I hope I am just missing some theoretical arguments which you could point me to.


alex-krull commented on July 20, 2024

Hi Sebastian,
I am sorry for my late reply.
I hope the explanation below helps.
If not, please let me know where I lose you.

Let me go through your equations one by one and see where we are on the same page and where not:

  • Eq. 1: I completely agree. I am not 100% sure about the $x_{i+1}, x_{i-1}, \ldots$ notation, but I think we agree that the receptive field should include all pixels in some kind of square around pixel i.
  • Eq. 2: You can view it like this, but I would rather see the modified receptive field as simply excluding the pixel i from Eq. 1. Note that there really are network architectures by now [1,2] that have this type of receptive field. For the general principle it does not matter whether we use such an architecture or whether we achieve this indirectly via masking as we do in N2V. It is easiest to assume we have a particular architecture that has this property.
  • Eq. 3: This equation is correct. The expected value is over all the pixels (together with their respective blind spot receptive field) in your training data distribution. You can think of this (ignoring the fact that your training images have borders) by simply randomly selecting pixels from your training set and cropping their surrounding receptive field. If you did this for a large number of pixels from your training set and average the result, you have an approximation of the expected value.
  • Eqs. 4, 5, 6: I totally agree with the equations, and they are the key to why this works. Imagine now that your training set is of truly infinite size. If you go through this training set, always looking at pairs of x_i and m(RF_i), you will, because the set is infinite, observe the same blind-spot receptive field m(RF_i) together with different values x_i in the center pixel. So it now makes sense to think of a probability distribution p(x_i|m(RF_i)) over pixel values x_i given a fixed blind-spot receptive field m(RF_i).
    The Y in your Eq. 5 corresponds to x_i, the X corresponds to the blind spot receptive field m(RF_i).
    Like you say in your Eq. 5. the solution to minimizing the squared error is really just the expected value of x_i with respect to this conditional distribution p(x_i|m(RF_i)).
    Now, since the network is trained with a squared loss against x_i, it will learn to map to this expected value.
    It can be shown (assuming the noise is zero centered) that this expected value is the clean signal at the pixel s_i.
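The last two bullet points can be checked with a toy numeric sketch (not from the thread; the signal value is arbitrary). With the blind-spot input m(RF_i) held fixed, the network can only output one constant c, so the argument reduces to: which constant minimizes the squared error against the conflicting noisy targets?

```python
import numpy as np

rng = np.random.default_rng(2)
s_i = 3.0  # clean signal at the pixel (chosen for the toy example)
# conflicting noisy targets seen for one and the same blind-spot input m(RF_i)
x_i = s_i + rng.normal(0.0, 1.0, 100_000)

# with the input fixed, the prediction is a single constant c;
# scan c and find the value minimizing the mean squared error
cs = np.linspace(2.0, 4.0, 401)
losses = [np.mean((x_i - c) ** 2) for c in cs]
c_star = cs[int(np.argmin(losses))]

print(c_star, x_i.mean())  # the MSE-optimal constant is the sample mean, close to s_i
```

The MSE-optimal constant coincides with the sample mean of the targets, which (for zero-centred noise) approaches the clean signal s_i as the number of samples grows.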


tibuch commented on July 20, 2024

I will close this discussion for now. Please feel free to reopen it. Another excellent place for this discussion would be forum.image.sc/.

