Exploration of Data Frame Libraries for Scala

Our main purpose in using a data frame library is data manipulation (selecting, slicing, and joining tabular data) rather than numerical math, much as one would use pandas rather than NumPy in Python.

[Breeze](https://github.com/scalanlp/breeze/)

Breeze aims to be the NumPy for Scala. Selecting columns and rows, transposing matrices, and joining and slicing matrices and vectors all seem simple. The documentation is great: their [cheat sheet](https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet) lists equivalent Breeze, MATLAB, NumPy, and R commands. Breeze supports CSV I/O.
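As a rough illustration of the operations mentioned above, here is a minimal sketch using `breeze.linalg`. It assumes Breeze is on the classpath; the object name `BreezeSketch`, the sample values, and the temporary file are made up for this example.

```scala
import java.io.File
import breeze.linalg._

object BreezeSketch {
  // A 2 x 3 matrix, built row by row (illustrative values).
  val m: DenseMatrix[Double] = DenseMatrix((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))

  // Selecting a column yields a DenseVector.
  val firstColumn: DenseVector[Double] = m(::, 0)

  // Selecting a row yields a transposed vector; .t turns it back into a DenseVector.
  val firstRow: DenseVector[Double] = m(0, ::).t

  // Transpose: 3 x 2.
  val transposed: DenseMatrix[Double] = m.t

  // Slicing a sub-block: rows 0..1, columns 1..2.
  val block: DenseMatrix[Double] = m(0 to 1, 1 to 2)

  // Joining matrices vertically (4 x 3) and horizontally (2 x 6).
  val stacked: DenseMatrix[Double] = DenseMatrix.vertcat(m, m)
  val sideBySide: DenseMatrix[Double] = DenseMatrix.horzcat(m, m)

  // CSV round trip through a temporary file.
  val csvFile: File = File.createTempFile("breeze-sketch", ".csv")
  csvwrite(csvFile, m)
  val roundTrip: DenseMatrix[Double] = csvread(csvFile)
}
```

Note that `csvread` returns a `DenseMatrix[Double]`, so this kind of CSV support suits purely numeric data rather than mixed-type tables.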

[Spark](http://spark.apache.org/docs/latest/sql-programming-guide.html)

Spark has updated its DataFrame API. While its default data source format is Parquet, it also supports JSON I/O. It supports slicing and joining data frames, but does not support basic linear algebra functions such as transpose or inverse. Another benefit is that data frames integrate easily with Spark's distributed computing and machine learning.
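A minimal sketch of slicing, joining, and JSON I/O with Spark data frames follows. It assumes Spark SQL 2.0 or later (where `SparkSession` is the entry point) running in local mode; the object name, column names, sample rows, and output path are all invented for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object SparkDataFrameSketch {
  // Returns (number of people older than 30, sorted cities read back from JSON).
  def demo(spark: SparkSession): (Long, Seq[String]) = {
    import spark.implicits._

    // Two small data frames with made-up rows.
    val people = Seq((1, "Ann", 34), (2, "Bo", 28), (3, "Cy", 45)).toDF("id", "name", "age")
    val cities = Seq((1, "Oslo"), (2, "Lima")).toDF("id", "city")

    // Slicing: filter rows, then keep a subset of columns.
    val olderThan30 = people.filter($"age" > 30).select("id", "name")

    // Joining: inner join on the shared "id" column.
    val joined = people.join(cities, "id")

    // JSON I/O: write the joined frame out and read it back.
    val path = Files.createTempDirectory("spark-df-sketch").resolve("joined").toString
    joined.write.mode("overwrite").json(path)
    val reloaded = spark.read.json(path)

    (olderThan30.count(), reloaded.select("city").as[String].collect().sorted.toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("df-sketch").master("local[*]").getOrCreate()
    try println(demo(spark)) finally spark.stop()
  }
}
```

Swapping `.json(path)` for `.parquet(path)` on both the writer and the reader would use the Parquet format instead.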

[Saddle](https://saddle.github.io/)

Saddle is strongly influenced by pandas. It takes extra effort for a column to hold more than one type. It supports basic linear algebra functions such as transposing, as well as joining and slicing frames. Its documentation is not well written.

[Scala-Datatable](https://github.com/martincooper/scala-datatable)

Scala-Datatable is inspired by immutable data structures. It is a lightweight data frame library compared to the others. It requires the user to specify a type for each column/row. It does not support CSV I/O.

[Distributed DataFrame for Java](http://ddf.io/)

DDF supports Java, Python, and R. Its syntax is very similar to R, and it claims it can do most things R does. Its main goal is to provide a simple API for big data, and it offers easy integration with Spark or Hadoop MapReduce. It isn't clear from the documentation whether DDF offers linear algebra functions or easy data selection.

Nice overviews

[data-frames](https://darrenjw.wordpress.com/2015/08/21/data-frames-and-tables-in-scala/)
[number-crunching](https://www.chrisstucchio.com/pubs/slides/thoughtworks_scientific_2014/slides.html#1)
