Comments (8)
@jjobel @paigemkelly now that we are moving on to the plotting functions, this is something to keep in mind. I haven't double checked yet, but I assume that this has to do with an issue with the pivot. For example, here's what plotting the simple output of the scaled data looks like:
These lambdas don't make sense because it's really plotting ln(lambda) - pivot(x) = ln(lambda) - median(ln(lambda)). For the final plots I reverse this scaling in plotlib.py, but there may be a bug that does this incorrectly. Something to keep an eye out for.
from clustr.
Actually, just looking at that, it seems wrong. Isn't the pivot supposed to be the median of lambda, not the median of the scaled lambda, @tejelt?
Ok now I'm unsure again. In my mind, the idea of the pivot is to choose the "center" of the data that you are fitting to minimize correlation between your fitted slope & intercept. In that case, it would seem that the choice of median of the scaled lambda is correct.
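To make the "minimize correlation" point concrete, here is a minimal sketch, assuming an ordinary least-squares line fit with equal errors (not necessarily the fitter clustr uses) and made-up lognormal lambda values. The helper name slope_intercept_corr is hypothetical.

```python
import numpy as np

def slope_intercept_corr(x):
    """Correlation between the OLS intercept and slope for design points x.

    Assumes equal, independent errors on y, so the parameter covariance
    is proportional to inv(A.T @ A).
    """
    A = np.column_stack([np.ones_like(x), x])  # columns: intercept, slope
    cov = np.linalg.inv(A.T @ A)
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

rng = np.random.default_rng(0)
lam = rng.lognormal(mean=4.0, sigma=0.5, size=501)  # made-up richness values
x = np.log(lam)

corr_raw = slope_intercept_corr(x)                 # no pivot
corr_piv = slope_intercept_corr(x - np.median(x))  # pivot at median of ln(lambda)

print(f"no pivot:     {corr_raw:+.3f}")
print(f"median pivot: {corr_piv:+.3f}")
```

In this setup the correlation is -mean(x) / sqrt(mean(x^2)), so it only vanishes exactly when x is centered on its mean; centering on the median gets close for a roughly symmetric distribution in ln(lambda).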
What do you mean by scaled lambda? You want ln(lambda/pivot), where pivot is the median lambda. ln(median(lambda)) should be (roughly) the same as median(ln(lambda)). The plot axes are definitely weird above. Also not sure of the y-axis. This must be scaled by something.
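The "(roughly) the same" statement can be checked numerically. This is a small sketch with made-up lognormal lambda values: because ln is monotonic, the two agree for an odd number of points, while for an even number np.median averages the two middle values and the results differ slightly.

```python
import numpy as np

rng = np.random.default_rng(1)
lam_odd = rng.lognormal(mean=4.0, sigma=0.5, size=101)  # made-up lambda values
lam_even = lam_odd[:-1]                                 # same data, 100 points

def median_gap(lam):
    """ln(median(lambda)) minus median(ln(lambda))."""
    return np.log(np.median(lam)) - np.median(np.log(lam))

gap_odd = median_gap(lam_odd)    # middle element is the same in both spaces
gap_even = median_gap(lam_even)  # arithmetic vs geometric mean of the middle pair

print(f"odd n:  {gap_odd:.3e}")
print(f"even n: {gap_even:.3e}")
```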
Ignore the y; this was just a test plot the rewrite branch made to make sure it was running. Don't take the numbers seriously.
I went through the math and think I've discovered my confusion. In my mind, the whole point of a pivot is to shift the data distribution to the center to minimize the correlation on slope & intercept. So I visualized it like this:
y = m * (x' - x_0) + b
where x_0 is the pivot. With this definition, x_0 = pivot = med(x') = med(ln(lambda)), where x' is the lambda in the scaled space (what I meant by "scaled lambda"). However, I could not get this to work consistently with ln(lambda/pivot). It looks like what people instead do is the following:
x_0 = med(ln(lambda)) = ln(lambda_p)=ln(pivot)
where lambda_p is the lambda corresponding to the median in the scaled ln space, x'. So I think this all came down to me thinking that x_0 was the pivot (which is normally the convention when you're just dealing with a linear fit outside of any transformations), whereas here it is lambda_p.
Does any of that make sense?
Here's a shorter version of my argument, starting from the usual definition:
L = a * (lambda / pivot) ^ b
ln(L) = ln(a) + b*[ln(lambda) - ln(pivot)]
y = intercept + slope*(x' - x_0)
y = intercept + slope*x
Thus x_0 is not the pivot referenced by the usual equation, and so the pivot that clustr.py computes:
    # Log-x before pivot
    xlog = np.log(data.x)
    # Set pivot
    if piv_type == 'median':
        piv = np.median(xlog)
    # Scale log_x by pivot
    log_x = xlog - piv
is inconsistent with the usual equation.
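A quick sketch of how the two quantities relate, using made-up lambda values: x0 below plays the role of piv in the snippet, and lambda_p = exp(x0) is the pivot in the sense of L = a * (lambda / pivot) ^ b. The variable names are illustrative only.

```python
import numpy as np

lam = np.array([12.0, 25.0, 40.0, 63.0, 110.0])  # made-up lambda values

xlog = np.log(lam)
x0 = np.median(xlog)   # what the snippet stores as piv (log space)
lambda_p = np.exp(x0)  # pivot in the sense of the usual equation (lambda space)

log_x = xlog - x0                     # scaling as done in the snippet
log_ratio = np.log(lam / lambda_p)    # scaling as written in the usual equation

# The two scalings are the same numbers; only the name "pivot" refers
# to different quantities (x0 versus lambda_p).
assert np.allclose(log_x, log_ratio)
print(x0, lambda_p)
```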
Now this difference in definition may not actually matter. Here is the unscale() function in the plotting code, which takes the data in the fitted (x, y) space to the original (lambda, L) space:
    def unscale(x, y, x_err, y_err, x_piv):
        ''' Recover original data from fit-scaled data '''
        return (np.exp(x + x_piv), np.exp(y), x_err * x, y_err * y)
This transformation is completely consistent with my definition of x_piv from above, as:
ln(L) = ln(a) + b*[ln(lambda) - ln(lambda_p)]
-> y = intercept + slope * [ln(lambda) - pivot]
-> y = intercept + slope * x
-> lambda = e ^ (x + pivot)
So I don't see how the pivot would cause an incorrect lambda in the plots. But we can easily double check this with some tests.
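One possible shape for such a test is a round trip: scale made-up (lambda, L) values the way the clustr.py snippet does, then invert with the value part of unscale(). The scale() and unscale_values() helpers here are hypothetical stand-ins written for this check, not the functions in the repo.

```python
import numpy as np

def scale(lam, lum):
    """Forward step, mirroring the median-pivot snippet quoted earlier."""
    xlog = np.log(lam)
    x_piv = np.median(xlog)
    return xlog - x_piv, np.log(lum), x_piv

def unscale_values(x, y, x_piv):
    """Value part of the quoted unscale(); error terms are left out here."""
    return np.exp(x + x_piv), np.exp(y)

lam = np.array([12.0, 25.0, 40.0, 63.0, 110.0])  # made-up lambda values
lum = np.array([0.3, 1.1, 2.4, 5.0, 13.0])       # made-up L values

x, y, x_piv = scale(lam, lum)
lam_back, lum_back = unscale_values(x, y, x_piv)

# The round trip should recover the original values.
assert np.allclose(lam_back, lam)
assert np.allclose(lum_back, lum)
```

The error terms are skipped because the forward error scaling is not shown in this thread. If it uses first-order propagation (x_err = lambda_err / lambda, which is an assumption), the inverse would multiply x_err by the unscaled lambda rather than by x, so that part may deserve its own test.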
The plot_scatter function takes the data from the loaded catalog and plots it directly with errorbar(); it doesn't even interact with any of the scaling or unscaling functions. So any bug in displayed lambda values would come from the catalog reading itself, which I find to be much less likely.
The place I was worried about this was if I was displaying lambda / lambda_piv incorrectly on the plots by using x_piv. However, it looks like I was lazy and just had it print out x / x_piv:
Thus, so far I don't see any bugs or inconsistencies other than vocabulary.