marlin's Introduction

Marlin

A distributed matrix operations library built on top of Spark. The master branch is currently at version 0.2-SNAPSHOT.

## Prerequisites

Because Marlin is built on top of Spark, you need to install Spark first. If you are not sure how to set up Spark, please refer to the guidelines here. Marlin is currently developed against the Spark 1.0.x APIs.

## Compile Marlin

We provide a default build.sbt file. Make sure sbt is installed, then run sbt package to build a package jar, or sbt assembly to build an assembly jar.

Note: In the build.sbt file, the default Spark version is 1.0.1 and the default Hadoop version is 2.3.0. You can modify build.sbt to fit your environment.

Note: The version of Breeze used by Spark 1.1.0 is 0.9.
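As a rough illustration of where the Spark and Hadoop versions noted above would typically be set, here is a minimal build.sbt sketch. It is not the project's actual build file: the `sparkVersion` and `hadoopVersion` value names, the Scala patch version, and the exact dependency list are assumptions, so check the shipped build.sbt before editing.

```scala
// build.sbt (illustrative fragment only, not Marlin's actual build file)
name := "marlin"

version := "0.2-SNAPSHOT"

// Assumption: a Scala 2.10.x release, matching the marlin_2.10 jar name.
scalaVersion := "2.10.4"

// Hypothetical value names; change these to match your cluster.
val sparkVersion = "1.0.1"

val hadoopVersion = "2.3.0"

libraryDependencies ++= Seq(
  // "provided" because the cluster supplies these jars at runtime.
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided"
)
```

Keeping the versions in one place makes it easier to rebuild against a different Spark or Hadoop installation.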

## Run Marlin

We provide some examples in edu.nju.pasalab.marlin.examples that show how to use the project's APIs. For example, to multiply two large matrices, use spark-submit with the following command:

$ ./bin/spark-submit \
 --class edu.nju.pasalab.marlin.examples.MatrixMultiply \
 --master <master-url> \
 --executor-memory <memory> \
 marlin_2.10-0.2-SNAPSHOT.jar \
 <matrix A rows> <matrix A columns> \
 <matrix B columns> <cores cross the cluster> <output path>

Note: The pre-built Spark assembly jar does not include the netlib-java native components, so you cannot use a native linear algebra library (e.g. BLAS) to accelerate the computation; each worker falls back to pure Java for the small block matrix multiplications. Our experiments show a significant performance difference between native BLAS and the pure Java implementation. You can find more information about the performance comparison and how to load the native library here.
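To make the "small block matrix multiply" in the note above concrete, here is a minimal pure-Scala sketch of multiplying two matrices that have been split into blocks. It is only an illustration of the general technique under simplifying assumptions (dense blocks held in local arrays, no Spark, no BLAS); the object and method names are hypothetical and are not part of Marlin's API, and Marlin's actual algorithm is described in the page linked from the algorithms section.

```scala
// Hypothetical names throughout; this is not Marlin's API.
object BlockMultiplySketch {
  type Matrix = Array[Array[Double]]

  /** Plain triple-loop multiply of two dense blocks (pure JVM, no native BLAS). */
  def multiplyBlock(a: Matrix, b: Matrix): Matrix = {
    val n = a.length     // rows of a
    val k = b.length     // rows of b, must equal columns of a
    val m = b(0).length  // columns of b
    require(a.forall(_.length == k), "inner dimensions must match")
    val c = Array.ofDim[Double](n, m)
    for (i <- 0 until n; p <- 0 until k) {
      val aip = a(i)(p)
      for (j <- 0 until m) c(i)(j) += aip * b(p)(j)
    }
    c
  }

  /** Element-wise sum of two equally sized blocks. */
  def addBlock(x: Matrix, y: Matrix): Matrix =
    x.zip(y).map { case (rx, ry) => rx.zip(ry).map { case (u, v) => u + v } }

  /** Blocked multiply: result block (i)(j) is the sum over p of a(i)(p) * b(p)(j). */
  def multiplyBlocked(a: Array[Array[Matrix]],
                      b: Array[Array[Matrix]]): Array[Array[Matrix]] =
    Array.tabulate(a.length, b(0).length) { (i, j) =>
      (0 until b.length)
        .map(p => multiplyBlock(a(i)(p), b(p)(j)))
        .reduce(addBlock)
    }
}
```

In a distributed setting each `multiplyBlock` call is the kind of unit of work a worker would perform, which is why swapping the triple loop for a native BLAS routine can matter so much for overall run time.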

Note: This example uses MTUtils.randomDenVecMatrix to generate a distributed random matrix in memory, without reading data from files.

Note: <cores cross the cluster> is the number of cores across the cluster that you want to use.

Note: <output path> is the path where the result matrix will be stored. The matrix is stored as a DenseVecMatrix.

## Matrix Operations API in Marlin

Some of the APIs are already finished; you can find the documentation on this page.

## Algorithms and Performance Evaluation

The details of the matrix multiplication algorithm are here.

### Performance Evaluation

We have run some performance evaluations of Marlin. The results can be seen here.

## Contact

gurongwalker at gmail dot com

myasuka at live dot com

marlin's People

Contributors

myasuka, ronggu, xingkungao, wangzk
