pasalab / marlin
A Distributed Matrix Operations Library Built on Top of Spark
Does it support Spark 1.6.0?
I'd like to compute the inverse of a large sparse 100000 x 100000 matrix. Can I use marlin to solve this problem?
Hello,
I am trying to invert a 5000 x 5000 matrix on Google Dataproc (code below); the code already works for a 1000 x 1000 matrix on my local PC.
However, something goes wrong when the inverse method is called: the job fails and I get this in the log:
Any ideas ?
LOG:
fourth
fifth
17/09/14 14:32:15 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
sixth
septh
[Stage 1:> (0 + 2) / 2]
[Stage 1:=============================> (1 + 1) / 2]
17/09/14 14:32:28 INFO com.github.fommil.jni.JniLoader: successfully loaded /tmp/jniloader3386225062470282445netlib-native_system-linux-x86_64.so
17/09/14 14:32:29 INFO com.github.fommil.jni.JniLoader: already loaded netlib-native_system-linux-x86_64.so
CODE
import org.apache.spark.{SparkConf, SparkContext}
import breeze.linalg.{DenseVector => BDV}
// marlin import path assumed from the project layout; verify against your version
import edu.nju.pasalab.marlin.matrix.DenseVecMatrix

def main(args: Array[String]) {
  System.out.println("first")
  val conf = new SparkConf()
  System.out.println("second")
  conf.set("spark.default.parallelism", "8")
  System.out.println("third")
  val sc = new SparkContext(conf)
  System.out.println("fourth")
  val SIZE = 5000
  System.out.println("fifth")
  // one matrix row per CSV line; zipWithIndex supplies the row index
  val ma = sc.textFile("gs://sparkfilesjsaray/matr_5000.csv")
    .map(line => line.split(",").map(_.toDouble))
    .zipWithIndex()
    .map(line => (line._2, BDV(line._1)))
  System.out.println("sixth")
  val matrix = new DenseVecMatrix(ma, SIZE, SIZE)
  System.out.println("septh")
  val inverse = matrix.inverse()
  System.out.println("eight")
  inverse.saveToFileSystem("gs://sparkfilesjsaray/output5000.csv")
  System.out.println("nine")
  System.out.println("Done")
  System.out.println("first")
}
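To rule out the CSV-parsing step as the culprit, it can be checked locally without Spark, with a plain List standing in for the RDD and Vector standing in for breeze's DenseVector (names in this sketch are hypothetical):

```scala
object ParseCheck {
  // Mirrors the textFile(...).map(...).zipWithIndex().map(...) pipeline
  // on plain collections, so it can be run without a cluster.
  def parse(lines: Seq[String]): Seq[(Long, Vector[Double])] =
    lines.map(_.split(",").map(_.toDouble)).zipWithIndex
      .map { case (values, idx) => (idx.toLong, values.toVector) }

  def main(args: Array[String]): Unit =
    println(parse(Seq("1.0,2.0", "3.0,4.0")))
}
```

If this produces the expected (rowIndex, values) pairs on a few sample lines, the failure is more likely inside the inverse computation than in the input pipeline.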
MatrixSuite.scala:306: type mismatch;
found : Int (2)
required: (Int, Int, Int)
[ERROR] var result = ma.multiply(denVecMat, 2)
^
I think it should be something like:
val result = ma.multiply(denVecMat, (2, 2, 2))
canal
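If the three-element tuple indeed gives the number of splits along the m, k and n dimensions of the multiply (an assumption based only on the required type (Int, Int, Int), not on the marlin docs), its effect on block sizes can be sketched in plain Scala (all names here are hypothetical):

```scala
object BlockSplitSketch {
  // Sizes of the chunks when a dimension of length `dim` is cut into `parts`
  // pieces; earlier chunks absorb the remainder, a common convention.
  def pieceSizes(dim: Int, parts: Int): Seq[Int] = {
    val base = dim / parts
    val rem = dim % parts
    (0 until parts).map(i => base + (if (i < rem) 1 else 0))
  }

  def main(args: Array[String]): Unit = {
    // A (2, 2, 2) split of a 5000 x 5000 times 5000 x 5000 multiply:
    println(pieceSizes(5000, 2)) // row blocks of the left matrix
    println(pieceSizes(5000, 2)) // the shared (inner) dimension
    println(pieceSizes(5000, 2)) // column blocks of the right matrix
  }
}
```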
Hello PasaLab,
Thanks for your amazing work.
Can you please update your code so that it works with Spark 2.0, or at least 1.6.2?
Regards
After some trial and error, I finally got Spark and saury working with MKL on my cluster without su or sudo (I don't have the root password). Here is the procedure:
Example environment: MKL, spark-1.0.2, saury
Package needed and the download path:
blas: > wget http://www.netlib.org/blas/blas.tgz
cblas: > wget http://www.netlib.org/blas/blast-forum/cblas.tgz
netlib-java: > git clone https://github.com/fommil/netlib-java.git
0. prepare lib and include directories in your home directory
mkdir ~/lib
cd ~/lib
ln -s /opt/intel/mkl/lib/intel64/libmkl_rt.so libblas.so.3
ln -s /opt/intel/mkl/lib/intel64/libmkl_rt.so liblapack.so.3
(symbolic link libblas.so.3 and liblapack.so.3 to libmkl_rt.so)
mkdir ~/include
export LD_LIBRARY_PATH=/home/***/lib
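The whole of step 0 can be rehearsed as a self-contained script (a dummy file stands in for libmkl_rt.so so it can run anywhere; substitute your real MKL install path):

```shell
#!/bin/sh
# Rehearsal of step 0 against a dummy library in a temp directory.
tmp=$(mktemp -d)
touch "$tmp/libmkl_rt.so"   # stand-in for /opt/intel/mkl/lib/intel64/libmkl_rt.so
mkdir -p "$tmp/lib" "$tmp/include"
# point the standard BLAS/LAPACK soname symlinks at MKL's single runtime library
ln -sf "$tmp/libmkl_rt.so" "$tmp/lib/libblas.so.3"
ln -sf "$tmp/libmkl_rt.so" "$tmp/lib/liblapack.so.3"
export LD_LIBRARY_PATH="$tmp/lib"
readlink "$tmp/lib/libblas.so.3"
```

With the real MKL path, the directories are ~/lib and ~/include as in the steps above.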
1. build netlib BLAS
tar zxvf blas.tgz
cd BLAS/
make all
cp ./blas_LINUX.a ~/lib/blas.a
2. build netlib CBLAS
tar zxvf cblas.tgz
cd CBLAS/
ln -s Makefile.LINUX Makefile.in (this step is required by CBLAS/README, but failed in my installation)
modify Makefile.in:
modify BLLIB, CBLIB, CBDIR (see CBLAS/README for detail)
make all
cp CBLAS/lib/cblas.a ~/lib/
cd CBLAS/include/
cp * ~/include/ (copied cblas_f77.h cblas.h to ~/include/)
3. build netlib-java to get netlib-native_system-linux-x86_64-natives-1.1.jar, jniloader.jar and native_system-java.jar
cd netlib-java/
sed -i "s/1.2-SNAPSHOT/1.1/g" $(grep -rl 1.2-SNAPSHOT .)
mvn package (build may fail, ignore it)
cd native_system/
mvn package (build may fail, ignore it)
cd xbuilds/
mvn package (build may fail, ignore it)
cd linux-x86_64/
mvn package (build may fail, ignore it)
vi target/netlib-native/com_github_fommil_netlib_NativeSystemBLAS.c
line 36
-- #include <cblas.h>
++ #include "/home/***/include/cblas.h"
cd ../../../netlib/JNI/
vi netlib-jni.c
line 2
-- #include <cblas.h>
++ #include "/home/***/include/cblas.h"
cd - (return to linux-x86_64/)
vi pom.xml
line 78, line 79
-- -lblas
-- -llapack
++ -lmkl_rt
line 54 to line 68
delete the 15-line generator plugin block (the elements naming com.github.fommil.netlib, generator, blas, lapack and arpack)
mvn package (this build should succeed)
cd target/
ls
you should see netlib-native_system-linux-x86_64-natives.jar
cd lib/
ls
you should see jniloader.jar and native_system-java.jar
4. build spark-1.0.2 with the jars we just got
reference: http://apache-spark-user-list.1001560.n3.nabble.com/Native-library-can-not-be-loaded-when-using-Mllib-PCA-td7042.html
(1). build the spark assembly once
./make-distribution.sh -Pnetlib-lgpl
(2). copy jniloader.jar, native_system-java-1.1.jar and netlib-native_system-linux-x86_64-1.1-natives.jar to $SPARK_HOME/lib_managed/natives .
(3). copy netlib-native_system-linux-x86_64-1.1-natives.jar to ~/.ivy2/cache/com.github.fommil.netlib/netlib-native_system-linux-x86_64/jars to replace the existing one. make sure the name is consistent with the original one.
(4). modify $SPARK_HOME/assembly/pom.xml: add a plugin under build/plugins

<plugin>
  <groupId>com.googlecode.addjars-maven-plugin</groupId>
  <artifactId>addjars-maven-plugin</artifactId>
  <version>1.0.5</version>
  <executions>
    <execution>
      <goals>
        <goal>add-jars</goal>
      </goals>
      <configuration>
        <resources>
          <resource>
            <directory>${basedir}/../lib_managed/natives</directory>
          </resource>
        </resources>
      </configuration>
    </execution>
  </executions>
</plugin>

(5). rebuild spark
Now you should have spark-1.0.2 calling MKL as its BLAS.
Enjoy it!
As for matrix multiplication, how big a matrix can you support at most with your current configuration (one 32 GB master plus sixteen 24 GB workers)? Looking forward to your reply!
Hi,
I am looking into the code to check if it is feasible to support Complex Double data type for matrix inverse and multiplication.
I see that you use a couple of external packages:
BLAS: you use dspr, so I presume I can replace it with zspr?
ARPACK: you use dsaupd and dseupd; I cannot find an equivalent routine.
Breeze: it supports the Complex data type, so it should be fine, I guess?
What's your assessment/advice for supporting a Complex Double data type?
many thanks
canal
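For what it's worth, the complex analogue of dgemm is zgemm, and at the scalar level the only change in the multiply kernels is complex arithmetic. A pure-Scala illustration (a stand-in type for the sketch, not Breeze's breeze.math.Complex and not marlin code):

```scala
// Minimal complex number, illustrating the per-element arithmetic a complex
// matrix multiply performs where the real routines use plain Doubles.
case class Cplx(re: Double, im: Double) {
  def +(o: Cplx): Cplx = Cplx(re + o.re, im + o.im)
  def *(o: Cplx): Cplx = Cplx(re * o.re - im * o.im, re * o.im + im * o.re)
}

object CplxDot {
  // Dot product of two complex vectors: the inner loop of a matrix multiply.
  def dot(a: Seq[Cplx], b: Seq[Cplx]): Cplx =
    a.zip(b).map { case (x, y) => x * y }.foldLeft(Cplx(0, 0))(_ + _)

  def main(args: Array[String]): Unit = {
    val i = Cplx(0, 1)
    // i*i + 1*2 = -1 + 2 = 1
    println(dot(Seq(i, Cplx(1, 0)), Seq(i, Cplx(2, 0))))
  }
}
```

The harder part is likely ARPACK, where the symmetric double routines (dsaupd/dseupd) have no direct complex symmetric counterpart.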
Do you have any plan to update the library to be based on Spark 1.6?
(As the title says.)
Hello,
Looking at the source code, there is a comment in DenseVecMatrix for the inverse method (line 570: "get the inverse of the triangular matrix"). But since LU decomposition is supported, we can invert a non-triangular square matrix too, right?
Also, are the matrix inverse and multiplication 'out-of-core', i.e. not limited by the available physical memory? I have a fairly large matrix (1 million x 1 million, double precision).
thank you for sharing the code,
canal
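On the first question: an LU factorization does invert any invertible square matrix, not only triangular ones. A single-machine sketch of the idea (no pivoting, purely illustrative; marlin's distributed block algorithm is more involved):

```scala
object LuInverseSketch {
  // Invert A by LU-factorizing once, then solving A * x = e_j for each unit
  // vector e_j; the solutions are the columns of A^(-1).
  def inverse(a: Array[Array[Double]]): Array[Array[Double]] = {
    val n = a.length
    // Doolittle LU decomposition in place: A = L * U, L unit lower triangular.
    val lu = a.map(_.clone())
    for (k <- 0 until n; i <- k + 1 until n) {
      lu(i)(k) /= lu(k)(k)
      for (j <- k + 1 until n) lu(i)(j) -= lu(i)(k) * lu(k)(j)
    }
    val inv = Array.ofDim[Double](n, n)
    for (j <- 0 until n) {
      // forward substitution: L * y = e_j
      val y = new Array[Double](n)
      for (i <- 0 until n)
        y(i) = (if (i == j) 1.0 else 0.0) - (0 until i).map(k => lu(i)(k) * y(k)).sum
      // back substitution: U * x = y
      val x = new Array[Double](n)
      for (i <- n - 1 to 0 by -1)
        x(i) = (y(i) - (i + 1 until n).map(k => lu(i)(k) * x(k)).sum) / lu(i)(i)
      for (i <- 0 until n) inv(i)(j) = x(i)
    }
    inv
  }

  def main(args: Array[String]): Unit = {
    val inv = inverse(Array(Array(4.0, 3.0), Array(6.0, 3.0)))
    println(inv.map(_.mkString(", ")).mkString("\n"))
  }
}
```

A production implementation needs pivoting for numerical stability; this only shows why LU is not restricted to triangular inputs.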
From what I understand, this library has created a completely new BlockMatrix. How does this compare to the BlockMatrix provided by Spark itself [1]? I am trying to understand what more I can gain by using this library for BlockMatrix multiplication.
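For context on what any distributed block multiply has to do: join blocks on the shared inner index and sum the partial products per output block. A collections-only sketch of that pattern (not either library's code; Maps stand in for RDDs):

```scala
object BlockMultiplySketch {
  // Blocks keyed by (rowBlock, colBlock); each block is a tiny dense matrix.
  type Block = Vector[Vector[Double]]

  def mult(a: Block, b: Block): Block =
    a.map(row => b.transpose.map(col => row.zip(col).map { case (x, y) => x * y }.sum))

  def add(a: Block, b: Block): Block =
    a.zip(b).map { case (r1, r2) => r1.zip(r2).map { case (x, y) => x + y } }

  // A((i,k)) * B((k,j)) contributes to C((i,j)): join on k, group by (i,j), sum.
  def blockMultiply(a: Map[(Int, Int), Block],
                    b: Map[(Int, Int), Block]): Map[(Int, Int), Block] =
    (for (((i, k), ab) <- a.toSeq; ((k2, j), bb) <- b.toSeq if k == k2)
      yield ((i, j), mult(ab, bb)))
      .groupBy(_._1)
      .map { case (ij, parts) => (ij, parts.map(_._2).reduce(add)) }
}
```

The libraries differ mainly in how they partition and shuffle these keyed blocks, not in this basic structure.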