Code Monkey home page Code Monkey logo

big-data-processing-with-apache-spark-elearning's Introduction

GitHub issues GitHub forks GitHub stars PRs Welcome

Big Data Processing with Apache Spark eLearning

Processing big data in real-time is challenging due to scalability, information consistency, and fault tolerance. This course shows you how you can use Spark to make your overall analysis workflow faster and more efficient. You'll learn all about the core concepts and tools within the Spark ecosystem, like Spark Streaming, the Spark Streaming API, machine learning extension, and structured streaming.

What you will learn

  • Write your own Python programs that can interact with Spark
  • Implement data stream consumption using Apache Spark
  • Recognize common operations in Spark to process known data streams
  • Integrate Spark streaming with Amazon Web Services
  • Create a collaborative filtering model with Python and the movielens dataset
  • Apply processed data streams to Spark machine learning APIs

Hardware requirements

For an optimal student experience, we recommend the following hardware configuration:

  • Processor: Intel Core i5 or equivalent
  • Memory: 4 GB RAM
  • Hard disk: 40 GB available space
  • An Internet connection

Software requirements

You’ll also need the following software installed in advance:

  • Operating System: Windows 7 SP1 64-bit, Windows 8.1 64-bit or Windows 10 64-bit
  • Browser: Google Chrome, Latest Version
  • PostgreSQL 9.0 or above
  • Spark 2.3.0
  • Amazon Web Services (AWS) account

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.