
Log_Analyser

Using Big Data technologies to analyse application logs as required

When the Maximo application is deployed in a clustered environment and runs on multiple Java Virtual Machines, it can generate many log files every day. Depending on the logging level, these files can range from 5 MB to 800 MB in size. When debugging certain issues, we may need to go through these log files to detect anomalies, but combing through even a 100 MB log file, even for a limited time window, can be draining, because many services write to the log files simultaneously. To make extraction easier and save manual hours, I wrote a Spark application to perform this task.

Objective: In the attached Spark job, I extract the SQL statements logged by the application that take more than 3 seconds (a customizable limit in Maximo) to execute. Outcome: Using this data, we can export the SQL statements to an Excel sheet, sorted by execution time in descending order. This ordering helps identify the most resource-intensive SQL statements, which in turn points us toward the queries to optimise, improving the performance of the system.
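For orientation, here is a minimal PySpark sketch of this kind of job, not the exact code in this repository. The log-line pattern (SLOW_SQL_RE), the input path, the output file name and the column names are illustrative assumptions; adjust the regular expression to whatever format your Maximo logger actually writes.

```python
import re
from pyspark.sql import SparkSession

# Hypothetical pattern: a log line reporting elapsed time in ms followed by the SQL text.
# Replace this with the pattern your Maximo log appender actually produces.
SLOW_SQL_RE = re.compile(r"execution took (\d+) ms[:\s]+(select .*)", re.IGNORECASE)
THRESHOLD_MS = 3000  # Maximo's customizable slow-SQL limit (3 seconds)

def parse_line(line):
    """Return (elapsed_ms, sql_text) for matching lines, else None."""
    m = SLOW_SQL_RE.search(line)
    return (int(m.group(1)), m.group(2).strip()) if m else None

spark = SparkSession.builder.appName("maximo-slow-sql").getOrCreate()

lines = spark.sparkContext.textFile("logs/*.log")               # read all log files
slow_sql = (lines.map(parse_line)                               # transformation: parse each line
                 .filter(lambda rec: rec is not None)           # transformation: keep matches
                 .filter(lambda rec: rec[0] > THRESHOLD_MS)     # transformation: keep slow SQL only
                 .sortBy(lambda rec: rec[0], ascending=False))  # slowest queries first

# Action: nothing above executes until this point (lazy evaluation).
# Writing the Excel sheet requires pandas and openpyxl on the driver.
df = spark.createDataFrame(slow_sql, ["elapsed_ms", "sql"])
df.toPandas().to_excel("slow_sql.xlsx", index=False)
```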

About Spark:

Apache Spark is an open source framework for real-time data analytics in a distributed computing environment. It performs in-memory computations to increase the speed of data processing, which makes it faster for processing large-scale data, although it also demands high processing power. The Resilient Distributed Dataset (RDD) is the fundamental data structure of Spark: an immutable, distributed collection of objects. Each dataset in an RDD is divided into logical partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes. Spark's components make it fast and reliable.
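As a small illustration (the data and application name here are arbitrary, not taken from the project), the snippet below creates an RDD from a local collection and splits it into logical partitions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
sc = spark.sparkContext

# Distribute a local Python collection into 4 logical partitions; each partition
# can be processed on a different node (or executor core) of the cluster.
rdd = sc.parallelize(range(1, 11), numSlices=4)

print(rdd.getNumPartitions())  # 4
print(rdd.glom().collect())    # elements grouped by partition, e.g. [[1, 2], [3, 4, 5], [6, 7], [8, 9, 10]]
```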

Spark performs its operations on RDDs. There are two types of operations: transformations and actions. Transformations are functions such as map, filter, and flatMap (similar to the functions used in Python's functional programming style); actions are functions such as count and collect, which return results to the driver. Whenever a transformation is applied to an RDD, Spark does not execute it right away. Instead, it records the transformation, and the ones that follow, in a DAG (Directed Acyclic Graph). The RDD is evaluated only when an action is called on it. This is known as lazy evaluation, and it lets Spark optimise the computation once it can look at the whole chain of transformations.
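A minimal sketch of this behaviour, again with arbitrary data: the map and filter calls below only add nodes to the DAG, and the work is actually performed when count or collect is called.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lazy-demo").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(range(1, 11))

# Transformations: returned immediately, nothing is computed yet.
squared = rdd.map(lambda x: x * x)
evens = squared.filter(lambda x: x % 2 == 0)

# Actions: these trigger execution of the whole DAG built above.
print(evens.count())    # 5
print(evens.collect())  # [4, 16, 36, 64, 100]
```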

More info: https://spark.apache.org/

About Maximo:

IBM Maximo® optimizes performance, extends asset life cycles, and reduces operational downtime and costs. OT and IT leaders can use Maximo to operate high-value physical assets with visibility and control across the enterprise. Streamline your global operations from procurement to contract management, and manage costs through a subscription-based model. Maintain all asset types no matter where they reside, set up new assets quickly and upgrade enterprise asset management (EAM) software automatically for nonstop uptime, reduced costs and minimized risk. Source: https://www.ibm.com/in-en/products/maximo
