Big Data Stack

Big data stack running in pseudo-distributed mode with the following components:

  • Hadoop 2.8.5
  • Minio RELEASE.2019-10-12T01-39-57Z
  • Hive 2.3.6
  • Presto 326
  • Superset 0.35.1
  • Hue 4.5.0

For more details see the following post.

Quick start

Clone the repository and create a .env file based on sample.env, making sure DATADIR points to a suitable directory (it is used as persistent storage for all containers). Bring up the base stack:

docker-compose up -d
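The .env file needs to exist before the command above is run. One way to prepare it is sketched below. The helper name make_env and its arguments are hypothetical, and the sketch assumes sample.env sets DATADIR on a line of the form DATADIR=..., which this README does not confirm.

```shell
# Hypothetical helper (not part of the repository): create an env file from a
# sample file and point DATADIR at the given directory.
# Assumption: the sample file uses plain KEY=value lines, including DATADIR=...
make_env() {
  sample="$1"   # e.g. sample.env
  target="$2"   # e.g. .env
  datadir="$3"  # directory used as persistent storage for the containers

  [ -f "$sample" ] || { echo "missing $sample" >&2; return 1; }
  mkdir -p "$datadir" || return 1

  # Copy every line except an existing DATADIR entry, then append the new one.
  grep -v '^DATADIR=' "$sample" > "$target" || true
  printf 'DATADIR=%s\n' "$datadir" >> "$target"
}
```

From the repository root this could be called as `make_env sample.env .env /path/to/data`, where the data path is a placeholder for whatever directory suits your machine.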

If you also want to start Superset and Hue, then run:

docker-compose -f superset/docker-compose.yml up -d
docker-compose -f hue/docker-compose.yml up -d

and initialize:

./scripts/init-hue.sh
./scripts/init-superset.sh
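The steps above can be collected into a single function, as a minimal sketch. The function name start_stack and the DRY_RUN switch are hypothetical and not part of the repository. The sketch also assumes the init scripts can run immediately after the containers start, whereas in practice a short wait may be needed.

```shell
# Hypothetical wrapper (not part of the repository): run the Quick start
# commands in order, stopping at the first failure.
# With DRY_RUN=1 the commands are printed instead of executed.
start_stack() {
  for cmd in \
    "docker-compose up -d" \
    "docker-compose -f superset/docker-compose.yml up -d" \
    "docker-compose -f hue/docker-compose.yml up -d" \
    "./scripts/init-hue.sh" \
    "./scripts/init-superset.sh"
  do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "$cmd"
    else
      # Unquoted on purpose so the string is split into command and arguments.
      $cmd || return 1
    fi
  done
}
```

Running `DRY_RUN=1 start_stack` from the repository root lists the commands without touching Docker, which is a cheap way to check the order before starting anything.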

The stack should now be up and running and the following services available:

Contents

The stack uses updated/modified Docker images from Big Data Europe, shawnzhu, and Cloudera. See the Dockerfiles for details.

All needed images are on Docker Hub, but if you want to build the updated/modified images yourself, just run build-local.sh in the different sub-directories.

Changes compared to original images:

  • Hadoop updated to version 2.8.5
  • Hive updated to version 2.3.6
  • S3 support added
  • Presto updated to version 326
  • Presto JDBC driver added to Hue

The scripts directory contains some helper scripts:

  • beeline.sh: Launch Beeline (Hive CLI) in Hive container
  • hadoop-client.sh: Start container with Hadoop utilities (host filesystem mounted as /host). Useful for moving files to HDFS.
  • init-hue.sh: Create the admin home folder in HDFS to avoid an error in the Hue File Browser.
  • init-superset.sh: Initialize Superset database and add Presto as data source
  • presto-cli.sh: Launch Presto CLI (downloads jar if needed)
