DDF - Distributed DataFrame

DDF aims to make Big Data easy yet powerful, by bringing together the best ideas from R Data Science, RDBMS/SQL, and Big Data distributed processing.

It exposes high-level abstractions such as RDBMS tables, SQL queries, data cleansing and transformations, machine-learning algorithms, and even collaboration and authentication, while hiding the complexities of parallel distributed processing and data handling.

DDF is a general abstraction that can be implemented on multiple execution and data engines. We provide a native implementation on Apache Spark, which we consider today's most expressive engine in its DAG parallelization and the most powerful in its in-memory distributed dataset abstraction (RDD). With this release, DDF provides native Spark support for R, Python, Java, and Scala.

One aim of the DDF project is to focus Big Data conversations on top-down, user-focused simplicity and power, where "users" include business analysts, data scientists, and high-level Big Data engineers.


Directory Structure

| Directory | Description |
|-----------|-------------|
| bin       | Useful helper scripts |
| exe       | DDF execution/launch scripts and executables |
| conf      | DDF configuration files |
| clients   | DDF client code, e.g., R, Python, etc. |
| contrib   | Contributed DDF code that is not in, or does not fit into, the core API |
| core      | DDF core API |
| spark     | DDF Spark implementation |
| examples  | DDF example API-user code |
| project   | Scala build config files |

Getting Started

First clone or fork a copy of DDF, e.g.:

$ git clone http://git.adatao.com/DDF 

Now prepare the build. This step sets up the libraries, creates pom.xml in the various sub-project directories, and generates the Eclipse .project and .classpath files.

$ cd DDF
$ bin/run-once.sh

If you ever need to regenerate the pom.xml files:

$ bin/make-poms.sh
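
To decide whether the pom.xml files need regenerating, a small check along these lines may help. It is only a sketch: `missing_poms` is a hypothetical helper, not part of DDF, and it assumes you run it from the DDF root and pass the sub-project directories you care about (core and spark are the two built with Maven below).

```shell
# Hypothetical helper (not shipped with DDF): print each given
# sub-project directory that does not yet contain a pom.xml.
missing_poms() {
  for dir in "$@"; do
    [ -f "$dir/pom.xml" ] || echo "$dir"
  done
}
```

Running `missing_poms core spark` prints nothing when both files exist. Any directory it prints is a hint to run bin/make-poms.sh again.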

The following regenerates Eclipse .project and .classpath files:

$ bin/make-eclipse-projects.sh

Building DDF_core or DDF_spark

$ (cd core && mvn clean package)
$ (cd spark && mvn clean package)
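
If you build both modules often, the two commands can be wrapped in a loop that stops at the first failure. The following is a minimal sketch, not a script that ships with DDF: the `build_modules` name and the `MVN` override variable are assumptions introduced here for illustration.

```shell
# Sketch: build each listed sub-project in turn, stopping at the first failure.
# MVN defaults to "mvn"; making it overridable is an assumption of this sketch
# (handy for dry runs), not a DDF convention.
MVN="${MVN:-mvn}"

build_modules() {
  for module in "$@"; do
    echo "Building $module"
    # Run in a subshell so the caller's working directory is unchanged.
    (cd "$module" && $MVN clean package) || {
      echo "Build failed in $module" >&2
      return 1
    }
  done
}
```

From the DDF root, `build_modules core spark` then performs the same two builds as above and returns a non-zero status as soon as one of them fails.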

Running tests

$ bin/sbt test

or

$ (cd core && mvn test)
$ (cd spark && mvn test)

Contributors

huandao0812, khangich, binhmop, nhanitvn, ctn, ckbui, cmpitg, adatao-git
