scala-dataframe-libraries's Introduction

Exploration of Data Frame Libraries for Scala

Our main purpose in using a data frame library is matrix manipulation (selecting, joining, and slicing data) rather than numerical math, analogous to choosing pandas over NumPy in Python.

Breeze aims to be the NumPy for Scala. Selecting columns and rows, transposing matrices, and joining and slicing matrices and vectors all seem simple. The documentation is great: the [cheat sheet](https://github.com/scalanlp/breeze/wiki/Linear-Algebra-Cheat-Sheet) lists equivalent Breeze, MATLAB, NumPy, and R commands. Breeze supports CSV I/O.
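As a rough illustration of the operations mentioned above, here is a minimal sketch that assumes Breeze is on the classpath. The object name `BreezeSketch`, the sample values, and the temporary file are made up for this example; the calls themselves (`m(::, 0)`, `.t`, `DenseMatrix.horzcat`, `DenseMatrix.vertcat`, `csvwrite`, `csvread`) come from `breeze.linalg`.

```scala
import java.io.File
import breeze.linalg.{DenseMatrix, DenseVector, csvread, csvwrite}

object BreezeSketch {
  // A 2x3 matrix built from row tuples (sample values are arbitrary).
  val m: DenseMatrix[Double] = DenseMatrix((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))

  // Select a single column; the result is a DenseVector.
  val firstColumn: DenseVector[Double] = m(::, 0)

  // Slice a sub-matrix: all rows, columns 1 and 2.
  val block: DenseMatrix[Double] = m(::, 1 to 2)

  // Transpose: 2x3 becomes 3x2.
  val transposed: DenseMatrix[Double] = m.t

  // Join matrices side by side (2x6) and on top of each other (4x3).
  val wide = DenseMatrix.horzcat(m, m)
  val tall = DenseMatrix.vertcat(m, m)

  def main(args: Array[String]): Unit = {
    println(s"transposed is ${transposed.rows}x${transposed.cols}")
    println(s"wide is ${wide.rows}x${wide.cols}, tall is ${tall.rows}x${tall.cols}")

    // CSV round trip through a temporary file (hypothetical file name prefix).
    val file = File.createTempFile("breeze-sketch", ".csv")
    csvwrite(file, m)
    val reloaded = csvread(file)
    println(s"reloaded is ${reloaded.rows}x${reloaded.cols}")
    file.delete()
  }
}
```

Slices such as `m(::, 1 to 2)` are views onto the original matrix, so call `.copy` on a slice before mutating it if the original must stay unchanged.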

Spark has updated its data frames API. Its default data format is Parquet, and it also supports JSON I/O. It supports slicing and joining data frames, but not basic linear algebra functions such as transpose or inverse. Another benefit is that it integrates easily with Spark's distributed computing and machine learning.
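The following sketch shows what selecting, filtering, joining, and JSON I/O might look like. It assumes Spark 2.x or later (where `SparkSession` is the entry point) running in local mode; the object name `SparkFrameSketch`, the column names, the sample rows, and the temporary output path are all invented for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object SparkFrameSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; a real job would configure a cluster.
    val spark = SparkSession.builder()
      .appName("frame-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Two small data frames with made-up contents.
    val people = Seq((1, "Ann", 34), (2, "Bob", 45)).toDF("id", "name", "age")
    val cities = Seq((1, "Oslo"), (2, "Lima")).toDF("id", "city")

    // Slice by column and by row condition.
    val namesAndAges = people.select("name", "age")
    val older = people.filter($"age" > 40)

    // Join the two frames on their shared "id" column.
    val joined = people.join(cities, "id")

    namesAndAges.show()
    older.show()
    joined.show()

    // JSON round trip through a temporary directory.
    val path = Files.createTempDirectory("frame-sketch").resolve("joined.json").toString
    joined.write.mode("overwrite").json(path)
    val reloaded = spark.read.json(path)
    println(s"reloaded ${reloaded.count()} rows")

    spark.stop()
  }
}
```

Swapping `.json(path)` for `.parquet(path)` on both the writer and the reader would use Parquet instead.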

Saddle is strongly influenced by pandas. Holding more than one type in a column requires extra effort. It supports basic linear algebra functions such as transposing, as well as joining and slicing frames. Its documentation is not well written.

This library is inspired by immutable data structures. It is lightweight compared to the other data frame libraries. It requires the user to specify a type for each column/row. It does not support CSV I/O.

[Distributed DataFrame for Java](http://ddf.io/)

DDF supports Java, Python, and R. Its syntax is very similar to R's, and it claims to do most things R does. Its main goal is to provide a simple API for big data, and it offers easy integration with Spark or Hadoop MapReduce. It isn't clear from the documentation whether DDF offers linear algebra functions or easy data selection.

Nice overviews
  1. [data-frames](https://darrenjw.wordpress.com/2015/08/21/data-frames-and-tables-in-scala/)
  2. [number-crunching](https://www.chrisstucchio.com/pubs/slides/thoughtworks_scientific_2014/slides.html#1)

scala-dataframe-libraries's People

Contributors

jeenalee
