Comments (14)

DStrelak avatar DStrelak commented on August 19, 2024

Hi @MohamadHarastani ,
Thanks for reporting this issue.
I have no experience with this program, but I guess I can at least debug the crash. Can you provide me with the big input, or tell me how to generate it?

from xmipp.

DStrelak avatar DStrelak commented on August 19, 2024

@cossorzano , can you please have a look at the output, or assign somebody who knows this program?

MohamadHarastani avatar MohamadHarastani commented on August 19, 2024

Hi @DStrelak ,
Thanks for looking into this. Here is the bigger input that I used as a test (I have not tested any other input):
X.txt
This command reproduces the error:
xmipp_matrix_dimred -i X.txt -o Y.txt -m PCA --din 35832 --samples 100 --dout 2 --saveMapping M.txt
It fails for the following methods, with the message shown in parentheses:
PCA (KILLED)
LLTSA (KILLED)
LPP (KILLED)
pPCA (KILLED)
SPE (Segmentation fault (core dumped))
NPE (XMIPP_ERROR 30: Incorrect matrix dimensions)

It works for the remaining methods (LTSA, DM, kPCA, LE, HLLE).

cossorzano avatar cossorzano commented on August 19, 2024

I will have a look into this to see if I can find the problem

DStrelak avatar DStrelak commented on August 19, 2024

Hi @MohamadHarastani ,
so, I had a look at the issue. It seems that the big input is indeed too big. On my machine, it crashes due to insufficient memory. Your example uses around 19GB of memory, and then fails at another allocation (of ~9.5GB). In theory, we can release 9.5GB during the process. Still, you will need at least 20GB of RAM.
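
For a rough sanity check on these numbers, here is a back-of-the-envelope estimate. It assumes (this is not confirmed anywhere in the thread) that the large allocation is a dense din x din matrix of 8-byte doubles, such as a covariance matrix. The helper name is made up for illustration.

```python
def dense_matrix_gib(rows: int, cols: int, bytes_per_element: int = 8) -> float:
    """Size in GiB of a dense rows x cols matrix (assumes 8-byte doubles)."""
    return rows * cols * bytes_per_element / 2**30


din, samples = 35832, 100  # values taken from the failing command

print(f"din x din matrix:     {dense_matrix_gib(din, din):.2f} GiB")
print(f"samples x din matrix: {dense_matrix_gib(samples, din):.3f} GiB")
```

Under that assumption, one din x din matrix takes about 9.57 GiB, which is in line with the ~9.5GB allocation mentioned above, and two of them with the ~19GB total. The input itself is under 0.03 GiB.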
I have prepared a hotfix for the devel version of xmipp (xmippCore, ds_issue315_dimredCrashing), which fixes PCA.
We will discuss whether to include this change in the release, and I will have a look at the other cases in the meantime.

cossorzano avatar cossorzano commented on August 19, 2024

I have gone over the PCA problem, and it is indeed a memory problem. The possible solution by David is not really that useful, because it frees the memory in Xmipp in the function firstEigs, which is certainly not pleasant, as in that function we do not know what that matrix is used for.

I am checking now for LLTSA

DStrelak avatar DStrelak commented on August 19, 2024

which is certainly not pleasant, as in that function we do not know what that matrix is used for.

Indeed.

cossorzano avatar cossorzano commented on August 19, 2024

Hi @MohamadHarastani , I have gone over LLTSA and it is also a memory problem. I guess it is the same for all the methods that get killed. Apart from the interesting stress problem, in which context in CryoEM do you have such a high input dimensionality?

I will check now about the mapping problem you reported first.

cossorzano avatar cossorzano commented on August 19, 2024

Hi @MohamadHarastani ,

The reason for the difference between the PCA projection from Xmipp and the one computed in Python is that, in PCA, you first have to subtract the column means:

from numpy import loadtxt, matmul, sum, mean, outer, ones
X = loadtxt('X.txt')
M_PCA = loadtxt('M_PCA.txt')
M_pPCA = loadtxt('M_pPCA.txt')
Xp = X-outer(ones(X.shape[0]),mean(X, axis=0))
print('PCA mapping verification (should be zero) ', sum(matmul(Xp,M_PCA) - loadtxt('Y_PCA.txt')))
print('pPCA mapping verification (should be zero) ', sum(matmul(Xp,M_pPCA) - loadtxt('Y_pPCA.txt')))

The result is
('PCA mapping verification (should be zero) ', 9.5580000628398e-05)
('pPCA mapping verification (should be zero) ', -1.150100000465483e-06)
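
As a self-contained illustration of why the centering matters, the sketch below builds a mapping from synthetic data with plain NumPy (an SVD-based PCA used as a stand-in, not the Xmipp implementation) and compares the projection with and without mean subtraction. It reports the maximum absolute error rather than a plain sum, because signed differences can cancel each other in a sum. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6)) + 5.0   # synthetic data with a non-zero mean

# Reference PCA via SVD of the centered data (stand-in for the tool's output).
mean = X.mean(axis=0)
Xc = X - mean                         # broadcasting replaces outer(ones, mean)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
M = Vt[:2].T                          # mapping, shape (6, 2)
Y = Xc @ M                            # projected data, shape (100, 2)

# Maximum absolute deviation: cannot be hidden by cancelling signs.
err_centered = np.max(np.abs(Xc @ M - Y))
err_raw = np.max(np.abs(X @ M - Y))
print("max |error| with centering:   ", err_centered)
print("max |error| without centering:", err_raw)
```

Without centering, every row of the projection is shifted by the same offset (the mean projected through M), so the mismatch is a constant per output column.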

Cheers and thank you for such detailed issues, they are very easy to reproduce.
Carlos Oscar

MohamadHarastani avatar MohamadHarastani commented on August 19, 2024

Hi @cossorzano and @DStrelak ,
Thanks for looking into this.

Apart from the interesting stress problem, in which context in CryoEM do you have such a high input dimensionality?

Each line of this data is an atomic structure (atom coordinates reshaped into a line).
I was able to run PCA from scikit-learn on the big X.txt, specifying nothing other than the number of output dimensions, with this code:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import decomposition
X = np.loadtxt('X.txt')
pca = decomposition.PCA(n_components=3)
X = pca.fit_transform(X)
plt.figure()
plt.scatter(X[:, 0], X[:, 1])
plt.show()

I looked at how PCA is handled there, and they seem to use the method of (Halko et al. 2009) for inputs larger than 500*500.
You can search on this page for (svd_solver: str {‘auto’, ‘full’, ‘arpack’, ‘randomized’}). We can consider the issue with the big matrix a secondary problem.
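
To illustrate why a problem with few samples and many dimensions does not have to be expensive, here is a minimal NumPy sketch that computes PCA from a thin SVD of the centered data, so no din x din matrix is ever formed. This is an exact thin SVD, not the randomized method of Halko et al., and it is not claimed to be what Xmipp or scikit-learn do internally. The function name and the smaller stand-in matrix are assumptions made for the example.

```python
import numpy as np


def pca_thin_svd(X: np.ndarray, n_components: int):
    """PCA via thin SVD of the centered data; never forms a din x din matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # With full_matrices=False: U is (n, k), S is (k,), Vt is (k, din),
    # where k = min(n, din).
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    M = Vt[:n_components].T                      # mapping, (din, n_components)
    Y = U[:, :n_components] * S[:n_components]   # same as Xc @ M
    return Y, M, mean


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))   # small stand-in for the 100 x 35832 input
Y, M, mean = pca_thin_svd(X, n_components=2)
print(Y.shape, M.shape)
```

The largest arrays here have the size of the input itself (samples x din), which for the data in this issue would be a few tens of megabytes rather than gigabytes.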

Thanks for your efforts

MohamadHarastani avatar MohamadHarastani commented on August 19, 2024

Thanks a lot @cossorzano for the solution.
Honestly, it is not intuitive that the mean has to be subtracted for these two methods in particular. I was relying on an implementation of inverse PCA in our HEMNMA plugin (used to generate animations) that has been wrong for years.
Now I can fix it all.
But please mention somewhere in the xmipp_matrix_dimred description that these two methods should be treated specially.
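
For reference, here is a minimal sketch of an inverse PCA that accounts for the mean. It uses synthetic data and an SVD-based PCA in NumPy, not the HEMNMA plugin code, and it assumes the mapping M has orthonormal columns (true for the PCA built here, not verified for the files written by xmipp_matrix_dimred). All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data lying exactly on a 2-D plane, shifted away from the origin.
latent = rng.normal(size=(50, 2))
basis = np.linalg.qr(rng.normal(size=(8, 2)))[0]   # (8, 2), orthonormal columns
X = latent @ basis.T + 3.0

# Forward PCA: center, then project.
mean = X.mean(axis=0)
Xc = X - mean
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
M = Vt[:2].T
Y = Xc @ M

# Inverse PCA: map back with M.T and add the mean again.
X_rec = Y @ M.T + mean
X_rec_wrong = Y @ M.T              # forgetting the mean
err_ok = np.max(np.abs(X_rec - X))
err_wrong = np.max(np.abs(X_rec_wrong - X))
print("max |error| with mean added back:", err_ok)
print("max |error| without the mean:    ", err_wrong)

# Animation-style frames: move along the first component around the mean.
frames = mean + np.linspace(-1.0, 1.0, 5)[:, None] * M[:, 0]
print(frames.shape)
```

Dropping the mean shifts every reconstructed structure by the same vector, which is easy to miss when only relative motion is inspected.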

Thanks again,
Cheers,
Mohamad

DStrelak avatar DStrelak commented on August 19, 2024

Each line of this data is an atomic structure (atom coordinates reshaped into a line).

I strongly recommend NOT using xmipp_matrix_dimred for this. I tried it with your dataset. My patience ran out after 80 minutes :-D

MohamadHarastani avatar MohamadHarastani commented on August 19, 2024

Oh, 80 minutes! I never got that far. The sklearn PCA code that I posted earlier runs in seconds on the same data.
Thanks a lot @DStrelak for your patience,
Cheers

MohamadHarastani avatar MohamadHarastani commented on August 19, 2024

I want to close this issue, so I did another check.
Actually, all the methods require subtracting the mean from the original matrix, not only the two that I reported (PCA, pPCA).
To verify, I repeated the test following the instructions of @cossorzano:

from numpy import loadtxt, matmul, sum, mean, outer, ones
X = loadtxt('X.txt')
M_PCA = loadtxt('M_PCA.txt')
M_LLTSA = loadtxt('M_LLTSA.txt')
M_LPP = loadtxt('M_LPP.txt')
M_pPCA = loadtxt('M_pPCA.txt')
M_NPE = loadtxt('M_NPE.txt')

Xp = X-outer(ones(X.shape[0]),mean(X, axis=0))

print('PCA mapping verification (should be zero) ', sum(matmul(Xp,M_PCA) - loadtxt('Y_PCA.txt')))
print('LLTSA mapping verification (should be zero) ', sum(matmul(Xp,M_LLTSA) - loadtxt('Y_LLTSA.txt')))
print('LPP mapping verification (should be zero) ', sum(matmul(Xp,M_LPP) - loadtxt('Y_LPP.txt')))
print('pPCA mapping verification (should be zero) ', sum(matmul(Xp,M_pPCA) - loadtxt('Y_pPCA.txt')))
print('NPE mapping verification (should be zero) ', sum(matmul(Xp,M_NPE) - loadtxt('Y_NPE.txt')))

And here is the result:

PCA mapping verification (should be zero)  9.5580000628398e-05
LLTSA mapping verification (should be zero)  1.2249999997534415e-05
LPP mapping verification (should be zero)  0.5852513499999976
pPCA mapping verification (should be zero)  -1.150100000465483e-06
NPE mapping verification (should be zero)  -0.6755314400000025

So the only change needed is for this part of the "xmipp_matrix_dimred" description to explain what X, Y and M are, which would resolve the confusion:

   [--saveMapping <fn=>]
           Save mapping if available (PCA, LLTSA, LPP, pPCA, NPE) so that it can be reused later (Y=X*M) 

I will add a little description to the file and open a PR.

Regards,
Mohamad
