Comments (8)

sweverett commented on August 11, 2024

@jjobel @paigemkelly now that we are moving on to the plotting functions, this is something to keep in mind. I haven't double checked yet, but I assume that this has to do with an issue with the pivot. For example, here's what plotting the simple output of the scaled data looks like:
[image: plot of the scaled data]
These lambdas don't make sense because it's really plotting ln(lambda) - pivot(x) = ln(lambda) - median(ln(lambda)). For the final plots I reverse this scaling in plotlib.py, but there may be a bug that does this incorrectly. Something to keep an eye out for.

from clustr.

sweverett commented on August 11, 2024

Actually, just looking at that, it seems wrong. Isn't the pivot supposed to be the median of lambda, not the median of the scaled lambda, @tejelt?

sweverett commented on August 11, 2024

Ok now I'm unsure again. In my mind, the idea of the pivot is to choose the "center" of the data that you are fitting to minimize correlation between your fitted slope & intercept. In that case, it would seem that the choice of median of the scaled lambda is correct.

tejelt commented on August 11, 2024

What do you mean by scaled lambda? You want ln(lambda/pivot), where pivot is the median lambda. ln(median(lambda)) should be (roughly) the same as median(ln(lambda)). The plot axes above are definitely weird. I'm also not sure about the y-axis; it must be scaled by something.
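
The "roughly the same" claim can be checked numerically. The following is a minimal sketch on synthetic data, not the clustr catalog: the lognormal sample and the variable names are assumptions made for illustration. Because ln is monotonic, the two quantities pick out the same middle element when the sample size is odd; for an even sample size, np.median averages the two middle values, so a small difference can appear.

```python
import numpy as np

# Synthetic, strictly positive "lambda" values (illustrative only).
rng = np.random.default_rng(0)
lam_odd = rng.lognormal(mean=3.5, sigma=0.6, size=1001)
lam_even = lam_odd[:-1]  # same sample with one value dropped (even count)

def pivot_gap(lam):
    """Return ln(median(lambda)) - median(ln(lambda))."""
    return np.log(np.median(lam)) - np.median(np.log(lam))

gap_odd = pivot_gap(lam_odd)    # odd count: same middle element on both sides
gap_even = pivot_gap(lam_even)  # even count: small but generally nonzero

print("odd-count gap: ", gap_odd)
print("even-count gap:", gap_even)
```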

sweverett commented on August 11, 2024

Ignore the y, this was just a test plot the rewrite branch made to make sure it was running. Don't take the numbers seriously.

I went through the math and think I've discovered my confusion. In my mind, the whole point of a pivot is to shift the data distribution to the center to minimize the correlation on slope & intercept. So I visualized it like this:
y = m * (x' - x_0) + b
where x_0 is the pivot. With this definition, x_0 = pivot = med(x) = med(ln(lambda)), where x' is the lambda in the scaled space (what I meant by "scaled lambda"). However, I could not get this to work consistently with ln(lambda/pivot). It looks like what people instead do is the following:
x_0 = med(ln(lambda)) = ln(lambda_p) = ln(pivot)
where lambda_p is the lambda corresponding to the median in the scaled ln space, x'. So I think this all came down to me thinking that x_0 was the pivot (which is normally the convention when you're just dealing with a linear fit outside of any transformations), whereas here it is lambda_p.
Does any of that make sense?

sweverett commented on August 11, 2024

Here's a shorter version of my argument, starting from the usual definition:

L = a * (lambda / pivot) ^ b
ln(L) = ln(a) + b*[ln(lambda) - ln(pivot)]
y = intercept + slope*(x' - x_0)
y = intercept + slope*x

Thus x_0 is not the pivot referenced by the usual equation, and so the pivot that clustr.py computes:

# Log-x before pivot
xlog = np.log(data.x)

# Set pivot
if piv_type == 'median':
    piv = np.median(xlog)

# Scale log_x by pivot
log_x = xlog - piv

is inconsistent with the usual equation.
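
To make the difference in vocabulary concrete, here is a small sketch on synthetic data (the sample and the name lambda_p are assumptions for illustration; only the xlog, piv, and log_x lines mirror the excerpt above). It suggests that subtracting the log-space pivot gives the same scaled x as dividing lambda by lambda_p = exp(piv), so the two conventions differ in which number is called "the pivot", not in the scaled data.

```python
import numpy as np

# Synthetic, strictly positive "lambda" values (illustrative only).
rng = np.random.default_rng(1)
lam = rng.lognormal(mean=3.5, sigma=0.6, size=501)

# Convention in the excerpt: the pivot lives in log space (x_0).
xlog = np.log(lam)
piv = np.median(xlog)
log_x = xlog - piv

# Convention in L = a * (lambda / pivot)^b: the pivot lives in lambda space.
lambda_p = np.exp(piv)  # hypothetical name for the lambda-space pivot
log_x_alt = np.log(lam / lambda_p)

print("same scaled x:", np.allclose(log_x, log_x_alt))
print("log-space pivot:", piv, " lambda-space pivot:", lambda_p)
```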

sweverett commented on August 11, 2024

Now this difference in definition may not actually matter. Here is the unscale() function in the plotting code, which takes the data in the fitted (x,y) space to the original (lambda, L) space:

def unscale(x, y, x_err, y_err, x_piv):
    ''' Recover original data from fit-scaled data '''
    return (np.exp(x + x_piv), np.exp(y), x_err * x, y_err * y)

This transformation is completely consistent with my definition of x_piv from above, as:

ln(L) = ln(a) + b*[ln(lambda) - ln(lambda_p)]
-> y = intercept + slope * [ln(lambda) - pivot]
-> y = intercept + slope * x
-> lambda = e ^ (x + pivot)

So I don't see how the pivot would cause an incorrect lambda in the plots. But we can easily double check this with some tests.
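
One possible shape for such a test is a round trip through the scaling. This is a sketch under stated assumptions, not the clustr test suite: scale_xy and unscale_xy are hypothetical helpers, the power law is arbitrary, and unscale_xy copies only the value part of the unscale() transform quoted above (the error terms are left out and are not checked here).

```python
import numpy as np

def scale_xy(lam, L):
    """Hypothetical helper: map (lambda, L) into the fitted (x, y) space."""
    xlog = np.log(lam)
    x_piv = np.median(xlog)  # log-space pivot, as in the clustr.py excerpt
    return xlog - x_piv, np.log(L), x_piv

def unscale_xy(x, y, x_piv):
    """Value part of the quoted unscale(): back to (lambda, L)."""
    return np.exp(x + x_piv), np.exp(y)

# Synthetic catalog (illustrative values only).
rng = np.random.default_rng(2)
lam = rng.uniform(20.0, 200.0, size=100)
L = 1.5 * (lam / 60.0) ** 1.2  # arbitrary power law for the round trip

x, y, x_piv = scale_xy(lam, L)
lam_back, L_back = unscale_xy(x, y, x_piv)

print("lambda recovered:", np.allclose(lam_back, lam))
print("L recovered:     ", np.allclose(L_back, L))
```

If the round trip holds on real catalog data as well, an incorrect lambda on a plot would have to come from somewhere other than the pivot.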

sweverett commented on August 11, 2024

The plot_scatter function takes the data from the loaded catalog and plots it directly with errorbar(); it doesn't interact with any of the scaling or unscaling functions. So any bug in the displayed lambda values would have to come from the catalog reading itself, which I find much less likely.

The place I was worried about was whether I was displaying lambda / lambda_piv incorrectly on the plots by using x_piv. However, it looks like I was lazy and just had it print out x / x_piv:
[image: plot labeled with x / x_piv]
Thus, so far I don't see any bugs or inconsistencies other than vocabulary.
