Network Topology Analysis

This repo contains the code used to collect network topology data (using a traceroute script), the Apache Spark code used for real-time analysis, and the Zeppelin notebook used for data visualization.

Background:

A network topology consists of many nodes (or hosts) and edges (connections) that link each of the nodes. In communication systems, there are typically many routes that we can take to get from Point A to Point B.

For example, if you are on your home wifi (Point A) and you request a webpage from Google.com (Point B), then your request will be relayed through many hosts along the route. Each time you make the request, a slightly different path may be used based on how the network is optimized, timeouts, failed nodes, etc.
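The idea of multiple routes between two points can be sketched in a few lines of plain Python. This is only an illustration, not code from this repo: the host names and the `TOPOLOGY` dictionary are hypothetical, and `find_routes` is a helper invented for this example. It lists every loop-free route from Point A to Point B and can skip hosts that are marked as failed.

```python
from typing import Dict, FrozenSet, List

# Hypothetical topology: each key is a host, each value lists its direct neighbors.
TOPOLOGY: Dict[str, List[str]] = {
    "home_wifi": ["isp_router"],
    "isp_router": ["core_1", "core_2"],
    "core_1": ["edge_google"],
    "core_2": ["edge_google"],
    "edge_google": ["google.com"],
    "google.com": [],
}


def find_routes(
    graph: Dict[str, List[str]],
    src: str,
    dst: str,
    failed: FrozenSet[str] = frozenset(),
) -> List[List[str]]:
    """Return every loop-free route from src to dst that avoids failed hosts."""
    routes: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node in failed:
            return  # a failed host cannot relay the request
        path = path + [node]
        if node == dst:
            routes.append(path)
            return
        for neighbor in graph.get(node, []):
            if neighbor not in path:  # never revisit a host on the same route
                walk(neighbor, path)

    walk(src, [])
    return routes


all_routes = find_routes(TOPOLOGY, "home_wifi", "google.com")
routes_without_core_1 = find_routes(
    TOPOLOGY, "home_wifi", "google.com", failed=frozenset({"core_1"})
)
```

In this toy topology there are two routes when every host is healthy, and only the route through `core_2` remains once `core_1` is marked as failed.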

For communication providers, these failed nodes create a problem. Isolating and resolving the issue is critical, since a failed node reduces performance, may cause downtime, and costs money if a truck or person needs to be sent out to manually troubleshoot the node.

This example focuses on a network topology for telecom, but the process and technology can be extended to any use case that involves a topology or hierarchy of information that needs to be analyzed in real-time.

Technology Stack:

Apache Kafka: Streams in the real-time health status of each device in the network topology.
Apache Spark: Spark Streaming was used to process and analyze the device health status in real-time. I also maintained the device state (the current health status) of all devices using mapWithState, so that recomputation and roll-ups / aggregations could be performed quickly on the real-time stream.
Apache HBase: The NoSQL distributed database, where all health status values are stored. This enables real-time read/write access to large database tables.
Apache Phoenix: Phoenix is the SQL interface to HBase, allowing SQL syntax to be used on top of the NoSQL DB.
Apache Zeppelin: Browser-based code editor and visualization tool (see screenshots below). Zeppelin has many interpreters, or code plugins, that enable a variety of languages/protocols to be used. These include Python, Spark, HBase, JDBC, Hive, Angular, etc. For this example, Angular was used to render the Google Maps view, by leveraging Zeppelin's front-end Angular API and some tricks (thanks Randerzander) used to bind several backend JS variables to a globally accessible object.
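The role of the per-device state can be illustrated without Spark. The sketch below is plain Python, not the Spark API and not the code in this repo: it only mimics the idea of keeping the latest status per key across micro-batches and then rolling the state up. The function names, device IPs, and the status values "healthy" and "failed" are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (device_id, status) pairs; the status strings here are hypothetical.
Event = Tuple[str, str]


def update_state(state: Dict[str, str], batch: Iterable[Event]) -> Dict[str, str]:
    """Apply one micro-batch of events, keeping only the latest status per device."""
    for device_id, status in batch:
        state[device_id] = status
    return state


def rollup(state: Dict[str, str]) -> Counter:
    """Aggregate the current state into a count of devices per status."""
    return Counter(state.values())


state: Dict[str, str] = {}
update_state(state, [("10.0.0.1", "healthy"), ("10.0.0.2", "healthy")])
update_state(state, [("10.0.0.2", "failed")])  # a later batch overrides the old status
summary = rollup(state)
```

Because the state already holds one current value per device, the roll-up only touches the state itself and does not need to replay the full event history.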

To run this project:

1. Clone this repo
2. Navigate to the docker directory
3. Execute ./run.sh (You'll need to have Docker installed on your machine)
4. Enter the Zeppelin container bash (docker exec -it zeppelin bash)
5. cd SparkNetworkAnalysis
6. Build the Spark streaming project (/apache-maven-3.3.9/bin/mvn clean package)
7. Start the Spark streaming project (/spark/bin/spark-submit --master local[*] --class "SparkNetworkAnalysis" --jars /phoenix-spark-4.8.1-HBase-1.1.jar target/SparkStreaming-0.0.1.jar phoenix.dev:2181 mytestgroup dztopic1 1 kafka.dev:9092)
8. Start the Kafka stream, which will simulate the health status for each device (docker exec kafka python stream_kafka.py)
9. View the results as a Google Map within Zeppelin (you can also run interactive queries on the data stored in HBase, via Phoenix)
   • Open your browser and go to http://localhost:8079/
   • Select the "Dashboard" notebook
   • Run the notebook, and enter new IP addresses (points of interest, or POI) as desired.
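To give a rough feel for what a simulated health stream (step 8) might look like, here is a minimal sketch in plain Python. It is not the contents of stream_kafka.py: the message fields ("ip", "status"), the status values, and the `simulate_health_events` function are all assumptions, and the sketch only builds JSON strings rather than publishing them to Kafka.

```python
import json
import random
from typing import Iterator, List, Optional


def simulate_health_events(
    device_ips: List[str],
    count: int,
    failure_rate: float = 0.1,
    seed: Optional[int] = None,
) -> Iterator[str]:
    """Yield `count` JSON-encoded health events for randomly chosen devices."""
    rng = random.Random(seed)  # a seeded generator makes runs repeatable
    for _ in range(count):
        event = {
            "ip": rng.choice(device_ips),
            # Hypothetical status values; the real stream may use a different schema.
            "status": "failed" if rng.random() < failure_rate else "healthy",
        }
        yield json.dumps(event)


events = list(simulate_health_events(["10.0.0.1", "10.0.0.2"], count=5, seed=42))
```

Each string in `events` could then be handed to a Kafka producer as the message value; that step is left out here so the sketch runs without a broker.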

Screenshot #1: Zeppelin notebook screenshot showing the user input, where IP addresses (or points of interest) can be entered within Zeppelin. This input is fed into a Spark job that fetches the data from HBase, performs data processing, then feeds the results to Angular, where they are rendered within Google Maps.




Screenshot #2: Zeppelin notebook screenshot showing the IP traceroute from my home wifi in Raleigh to Google.com servers (in Mountain View, CA).



References:
Apache Zeppelin - Angular (front-end API)
Apache Zeppelin - Angular (back-end API)
Randerzander's Data Apps in Zeppelin
Apache Spark - mapWithState
