PlayLogSpark

Structured log analysis

IMPORTANT ASSUMPTION:

For demonstration purposes, the application is configured to run in standalone (single-node) mode, so all reads and writes go to and from the local file system. It can, however, easily be configured to use a distributed file system such as HDFS.
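
As a rough illustration of that switch, the Python sketch below keeps every path relative to a single base URI, so moving from the local file system to HDFS only changes the scheme and root. Python is used here purely for illustration. The base locations, including the /opt/spark install path and the namenode:8020 address, are hypothetical placeholders, not values taken from this project.

```python
from urllib.parse import urlparse

# Hypothetical base locations: the local "test" directory used in standalone
# mode, and an HDFS equivalent (host and port are placeholders).
LOCAL_BASE = "file:///opt/spark/bin/test"
HDFS_BASE = "hdfs://namenode:8020/test"


def data_path(base: str, *parts: str) -> str:
    """Join a base URI and relative path segments with single slashes."""
    return "/".join([base.rstrip("/")] + [p.strip("/") for p in parts])


def file_system(base: str) -> str:
    """Return the URI scheme, i.e. which file system the paths point at."""
    return urlparse(base).scheme


if __name__ == "__main__":
    for base in (LOCAL_BASE, HDFS_BASE):
        print(file_system(base), data_path(base, "input", "raw_pop.json"))
```

Only the base changes between environments; the relative layout (input, result) stays the same.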

Directory structure:

  1. In the Spark bin directory, create a subdirectory named "test". Inside "test", create three subdirectories: "app", "input" and "result".
       • app: holds the application jar. Put pop-test.jar in this directory.
       • input: holds the input files. Put raw_pop.json and campaign.csv in this directory.
       • result: holds the result files. enrich_pop.json and aggregate_pop.json are generated here.
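
The layout above can be prepared with a short script. This is a minimal sketch, assuming Python 3 and that you pass the path of your Spark bin directory; the helper names are hypothetical, and the jar name follows the spark-submit command used to run the program.

```python
import os
import tempfile

SUBDIRS = ("app", "input", "result")

# Files that must be copied in by hand before running the job.
EXPECTED = {
    "app": ["pop-test.jar"],
    "input": ["raw_pop.json", "campaign.csv"],
}


def create_layout(spark_bin: str) -> str:
    """Create <spark_bin>/test/{app,input,result} and return the test dir."""
    test_dir = os.path.join(spark_bin, "test")
    for name in SUBDIRS:
        os.makedirs(os.path.join(test_dir, name), exist_ok=True)
    return test_dir


def missing_files(test_dir: str) -> list:
    """Return 'subdir/file' names of expected files not yet in place."""
    missing = []
    for sub, files in EXPECTED.items():
        for f in files:
            if not os.path.isfile(os.path.join(test_dir, sub, f)):
                missing.append(f"{sub}/{f}")
    return missing


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a real Spark install.
    with tempfile.TemporaryDirectory() as spark_bin:
        print(missing_files(create_layout(spark_bin)))
```

The result directory is created empty because the job writes its output files there.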

Running the program:

Please launch the program using spark-submit:

spark-submit --class test.PlayLogTest test/app/pop-test.jar YYYY-MM-DD

Example: spark-submit --class test.PlayLogTest test/app/pop-test.jar 2017-03-29

The command above processes only the logs dated on or after 2017-03-29 and generates enrich_pop.json and aggregate_pop.json in the result directory.
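
The date argument acts as an inclusive lower bound. The sketch below shows that rule in plain Python. It assumes each log record carries an ISO-8601 timestamp under a hypothetical "timestamp" key, since the actual raw_pop.json schema is not documented here.

```python
from datetime import date, datetime


def parse_cutoff(arg: str) -> date:
    """Parse the YYYY-MM-DD argument; raises ValueError if it is malformed."""
    return datetime.strptime(arg, "%Y-%m-%d").date()


def on_or_after(records, cutoff: date, field: str = "timestamp"):
    """Keep records whose timestamp (hypothetical field) is on or after cutoff."""
    return [r for r in records
            if datetime.fromisoformat(r[field]).date() >= cutoff]


if __name__ == "__main__":
    logs = [
        {"timestamp": "2017-03-28T23:59:59"},
        {"timestamp": "2017-03-29T00:00:00"},
        {"timestamp": "2017-03-30T12:00:00"},
    ]
    print(on_or_after(logs, parse_cutoff("2017-03-29")))
```

Comparing calendar dates rather than full timestamps is what makes the cutoff day itself inclusive.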

Additional information:

  1. In a clustered setup, there is further scope for optimization, for example in the use of partitionedMaps, worker threads and JSON serialization/deserialization.
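
One common pattern behind such optimizations is doing per-record work partition by partition. The sketch below assumes "partitionedMaps" refers to something like Spark's mapPartitions. It runs without Spark: a list of lists stands in for an RDD's partitions, and a JSONDecoder stands in for any setup that is worth building once per partition rather than once per record.

```python
import json
from typing import Dict, Iterable, Iterator, List


def parse_partition(lines: Iterable[str]) -> Iterator[Dict]:
    """Decode one partition of JSON-lines text, building the decoder once.

    Under Spark, a function of this shape could be passed to
    rdd.mapPartitions(...), so setup is paid once per partition.
    """
    decoder = json.JSONDecoder()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            yield decoder.decode(line)


def run_partitions(partitions: List[List[str]]) -> List[Dict]:
    """Local stand-in for Spark: apply parse_partition to each partition."""
    out = []
    for part in partitions:
        out.extend(parse_partition(part))
    return out


if __name__ == "__main__":
    parts = [['{"id": 1}', ''], ['{"id": 2}', '{"id": 3}']]
    print(run_partitions(parts))
```

Whether this pays off depends on how costly the per-partition setup actually is in the real job.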

Contributors

chinmaymohanta