
digivigi's Introduction

DigiVigi

Project Description: DigiVigi is a 'DNIF Open Source' project that demonstrates, start to finish, a "How To?" process for analyzing real-time data inside DNIF.


Tool

DNIF - Open Big Data Analytics Platform (Free Forever Version)


Other Support Tools/ Software

- Virtual Box
- JetBrains: PyCharm Community Edition
- Ubuntu 16.04 or above
- Docker
- Postman
- AlwaysUp (Trial Version)

Project Sketch

The project is executed in two process phases, following the procedure shown in the diagram from issue #1.

PROCESS 1: Refer Issue #1

Stage 1:

  • Select & understand data-set from a domain of interest.

Stage 2:

  • Understand DNIF platform and its capabilities limited to project scope.
  • Here's a link to DNIF's complete documentation: https://dnif.it/docs/

Stage 3:

  • Get the static data-set inside DNIF platform by following the guidelines mentioned on the website.
  • One can use a ready-made dataset (JSON, CSV, or Excel) or create one by scripting, e.g. by writing a web scraper.
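As a starting point, here is a minimal sketch of "creating a dataset by scripting": it writes a small static CSV file that could then be loaded into DNIF. The field names and rows are purely illustrative, not a real feed.

```python
import csv

# Illustrative-only rows: build a small static dataset as a CSV file.
rows = [
    {"source_ip": "203.0.113.10", "event": "login_failed", "count": 3},
    {"source_ip": "198.51.100.7", "event": "port_scan", "count": 12},
]

with open("mydata.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["source_ip", "event", "count"])
    writer.writeheader()
    writer.writerows(rows)
```

The same structure applies whether the rows come from a scraper, an API, or a hand-made list.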

Stage 4:


NOTE: Process 1 ensures you get a good grasp of executing the project at a fundamental level before moving on to the advanced level.


PROCESS 2: Refer Issue #1

Stage 1:

  • Select & understand a data-set from a domain of interest. This time the dataset has to be dynamic, meaning it updates over a period of minutes, hours, days, weeks, and so on. Choose carefully: at a later stage this affects your machine's processing capacity (CPU, RAM, disk space). Check out the prerequisites at https://dnif.it/docs/guides/getting-started/prerequisites.html

Stage 2:

  • Understand DNIF platform and its capabilities limited to project scope.
  • Here's a link to DNIF's complete documentation: https://dnif.it/docs/

Stage 3:

  • Fetch the data continuously from the source selected in Stage 1 and store/update it in a file (e.g. mydata.csv).
  • For this, one can write new code or use an existing script from the repo.
  • This file will be updated continuously, depending on your scheduler/cron job frequency or any other mechanism of your choice.

Stage 4:

  • The data captured needs to be fed to DNIF for it to be analyzed and played with.
  • Use the DNIF API, which does this job for you, via the existing unified script in the repo, under the Process 2 folder: a file called "SourceToDnif.py".
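The feed-to-DNIF step can be sketched as below. This is not the repo's SourceToDnif.py: the endpoint URL and payload shape here are assumptions, so check that script and the DNIF docs for the real API details.

```python
import csv
import json
from urllib import request

# Hypothetical endpoint -- replace with your DNIF adapter's real URL.
DNIF_ENDPOINT = "http://127.0.0.1:8086/json/receive"

def csv_to_events(path):
    """Turn each CSV row into one JSON-serializable event dict."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def post_events(events, endpoint=DNIF_ENDPOINT):
    """POST the events as a JSON array (network call; needs DNIF listening)."""
    data = json.dumps(events).encode("utf-8")
    req = request.Request(endpoint, data=data,
                          headers={"Content-Type": "application/json"})
    return request.urlopen(req)
```

Usage would be `post_events(csv_to_events("mydata.csv"))`, typically run on the same schedule as the fetch script.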

Stage 5:


Diagrammatic Representation

Process 1:

[Diagram: process_1_dnif]

Process 2:

[Diagram: process_2_dnif]

Additional Credits to: SOC18-Genesis


Data Set Used In Tutorial Guides:

Webiron Feeds: Webiron provides a comprehensive managed security service that keeps web servers safe from harm. Its intelligent technology is designed to immediately detect, block, and prevent automated bot and malware attacks.

The Key Metrics Abuse e-mail feed contains a log of Webiron's abuse reports and the status of each reported issue. The feed is filterable by e-mail address, IP address, or ASN number. It is the master feed for the Twitter "bad abuse" feed and is pulled from live data.

Field Descriptions:

| Field | Description |
| --- | --- |
| Log Entry Type | The action taken: report sent, report opened, or a resolved statement received from the host. |
| Log Time | Time the action was done. |
| Attacker IP | The IP reported for issues. The "IP" link filters the feed by that IP, while the "lookup" link provides more detailed information on the IP. |
| Logged E-Mails | Either the list of e-mail addresses the attacker IP was reported to, or the address that responded to a resolved or opened event. Clicking an e-mail filters the feed by that address. |
| Log Message | The list of issues reported, or an action message. |
| Deliverable | Whether the e-mail was accepted by the host. |
| Days Unresolved | The number of days since the issue was reported to the host. |
| Incidents Reported | The number of incidents reported. Some bots use thousands of nodes rather than heavier concentrations from fewer hosts; the damage is the same, however. |

Hey! Do you want to stop coming back to the repository and get all the project files on your system?

There's only one thing you'll need to do: click the "Clone or download" button and grab the ZIP file.

Here's a Video Link to get going with Installation Part https://youtu.be/ddpfh5sHMtA

THANKS FOR VISITING

digivigi's People

Contributors: aakratisahu, prashant-sawant, sharbanibasu23, shomiron


digivigi's Issues

installing DNIF on EC2 instance

I am trying to install DNIF on an EC2 instance of AWS for real-time streaming of our server logs, but I'm running into a problem. Could it be due to insufficient space on the instance?

While configuring SMTP I am getting the error 'SMTP Configuration failed due to string indices must be integers, not str'.

Hi Everyone,
I am facing an issue while configuring SMTP in the DNIF container. Since @PRASHANT-SAWANT has already configured this, I was following the steps he described in #6. As mentioned there, in the smtp.yaml file I commented out the internal settings and uncommented the external settings, changing them as follows:
domain: smtp.gmail.com
username: email id1
password: email id1 password
port: 587
tls: 1
from: email id1
to: email id2

After editing, I saved the changes and ran the command $ python configsmtp.py. But I am getting the error 'SMTP Configuration failed due to string indices must be integers, not str'. I also opened the configsmtp.py file, but I cannot work out which string object triggers this error. If any of you have faced a similar issue or have a solution, please comment here.
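For what it's worth, that TypeError usually means code expected a dict (e.g. a parsed YAML mapping) but got a plain string instead, which often points to indentation or structure problems in the edited smtp.yaml. A small illustrative reproduction, not DNIF's actual configsmtp.py code:

```python
# A string where the script presumably expects a parsed YAML mapping:
config = "smtp.gmail.com"

try:
    host = config["domain"]  # indexing a str with a str key raises TypeError
except TypeError as e:
    print(e)                 # "string indices must be integers..."

# With a proper mapping, the same lookup works:
config = {"domain": "smtp.gmail.com", "port": 587}
host = config["domain"]
```

So it may be worth re-checking that the uncommented external settings still parse as a YAML mapping (consistent indentation, a space after each colon).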

Diagrammatic Overview of Project

Can anyone post a diagrammatic view of our project? It would help to have a clear picture in mind and make sure we are on the same page. I am working on one and will post it as soon as possible.

Need a web scrapper to capture tabular data

This is with reference to PROCESS 1 - Stage 3 given in README file.

Does anyone have a web scraper that can scrape data presented in tabular format on HTML pages and save it to a CSV file?
Also, the header of this CSV file needs to be custom-edited so that each header name starts with a '$' prefix (e.g. '$headername'); this is a mandatory requirement.

Also, please add comments to your code so it's easier to understand. Thanks. 👍
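One way to approach this, sketched with only the standard library (the repo's actual scripts use pandas instead; fetching the page itself, e.g. with urllib, is left out):

```python
import csv
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the cell text of every <tr> found in an HTML document."""
    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = []      # cells of the row being parsed
        self._cell = None   # text of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._cell = ""

    def handle_data(self, data):
        if self._cell is not None:
            self._cell += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append(self._cell.strip())
            self._cell = None
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []

def table_to_csv(html, out_path):
    """Write the first row as a '$'-prefixed header, the rest as data."""
    parser = TableScraper()
    parser.feed(html)
    header = ["$" + name for name in parser.rows[0]]  # mandatory '$' prefix
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(parser.rows[1:])
```

This assumes the first table row holds the column names; pages with nested tables or pagination would need more handling.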

Dataset Hunting

Was just going through a set of data sources, mostly dynamic in nature and contextually inclined toward cyber space.
If there are more sources, please add them to this discussion. I'll soon create a document featuring the list of probable sources we could work on in the future.

Anyone is free to add to that doc by the way. Peace!

DNIF installation on MAC

I have installed Docker on a Mac machine. Then I executed the "sudo docker-compose up" command. It shows me the DNIF ASCII art, which is a sign that the DNIF installation completed. But after that, when I try to connect with the source IP on [https://go.dnif.it/], it does not connect. On an Ubuntu machine it works fine.

Is there any Mac-specific issue here?

AWS log forwarding

Hello Team,

I have hosted a site on AWS, built with Node.js and MongoDB. I want to forward the site's logs to DNIF.

Does anyone have any idea regarding it?

Redundant code in repository

@Sharbanibasu23 @shreyaskulkarni412 @aakratisahu

I'm removing the redundant code from the repository.
I had originally kept PS_ConvertCsvToJson, PS_DailyDataFetch & PS_PostToAPI for the coding principle known as separation of concerns.

But since the code is already merged and split into sections inside the script named - SourceToDnif, one could easily copy-paste it from there as required.

This removal was done to avoid further confusion and keep it simple.
P.S.: Please pull the newer repository locally.

Scheduling Repository Scripts

Hey Team Mates,

Does anyone know the right and easy way to schedule Python scripts in Ubuntu on a daily basis?

Need a quick one here.
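The usual answer on Ubuntu is cron. A minimal crontab sketch (the script path, user, and time are placeholders, assuming the scripts live in the repo checkout):

```
# Edit the current user's crontab with:  crontab -e
# Then add one line; this runs the fetch script every day at 09:00
# and appends stdout/stderr to a log file:
0 9 * * * /usr/bin/python3 /home/<user>/digivigi/SourceToDnif.py >> /home/<user>/digivigi/cron.log 2>&1
```

You can verify the entry with `crontab -l`; cron runs with a minimal environment, so absolute paths matter.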

Scrapping only updated data from source website

Hey,
This is with regards to Process 2 - Stage 3.
After scraping data from a source website, the next challenge was to fetch only the data that arrives anew each day, while keeping the old data.

I'm thinking of comparing date fields from the source dataset with the system date, to append only new records to the existing CSV. Hope it works out well.
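That date-comparison idea could be sketched like this (the field name "log_time", the ISO date format, and matching column order between the CSV and the fetched rows are all assumptions):

```python
import csv
from datetime import date

def append_new_rows(existing_path, fetched_rows, date_field="log_time"):
    """Append only today's rows that are not already in the CSV.

    Assumes fetched_rows are dicts whose keys match the CSV's columns
    and whose date_field holds an ISO date string (YYYY-MM-DD).
    """
    # Remember every row already stored, to avoid duplicates.
    with open(existing_path, newline="") as f:
        seen = {tuple(r.values()) for r in csv.DictReader(f)}
    today = date.today().isoformat()
    new = [r for r in fetched_rows
           if r[date_field] == today and tuple(r.values()) not in seen]
    if new:
        with open(existing_path, "a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=new[0].keys())
            writer.writerows(new)
    return new
```

Comparing against what is already stored (not just the date) also makes the script safe to re-run within the same day.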

Scraping through paged html tables

This was a good second issue, but it was never posted here; the solution, however, is available.
@Sharbanibasu23 Thanks, and really great job. I ran the script SB_WebScrapper_Using_Pandas_And_PageCrawling.py.

It's an excellent solution as well. We can further extend the code to send this data via a POST request to the DNIF Web API, and fetch it using the "event" data model.
This could be used for analysis of present and historical data, and for making predictions as well.

META ISSUE: Additional Series of Issues Faced

Hello All,

@aakratisahu @shreyaskulkarni412 @Sharbanibasu23
This here is just a heads-up message which describes what - "META ISSUE: Additional Series of Issues Faced" is.

So, each of us has faced some minor, individual issues which we did not think important at the time but which could be a hindrance for anyone encountering them. Since our major purpose is to make this project as simple as possible, documenting them serves that purpose.

This is what we will be doing:

  1. Every issue will start with a naming convention, as follows:
    Issue#1: [Short description of the issue]
  2. These issues could include some that were raised on dnifHQ as well, but we'll have to frame them properly and provide links.

I have created an example issue which is a valid one; please check Issue#1: Log-In Problem onto DNIF Web Console.

Post Script: This is just a description of what we will be doing. Do Not Close this Issue - It will be the last to be closed when the project ends.

Ubuntu 16.04 - Network Device Not Managed

Hi,
Please help me out with this issue. Stuck with this for a very long time now. It is stopping me from working any further.

Environment:
Windows -> VBox -> Ubuntu -> A Docker Container

Issue arose when:
I did a requests.post(...) from my Ubuntu IP to itself [IP(A) -> IP(A)].
What happens is, my network manager says the device is not managed.
I cannot connect to the internet and cannot ping myself.

One-time fix:
Changed managed=false to true in NetworkManager.conf.
But it did not help much.

It seems my networking utilities are confused about which DNS to use, or about which network config to follow. (Although I did only one config, the network manager did some overwriting when I made that POST request.)

Any solutions?
Anyone stumbled across this?
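For reference, the one-time fix described above usually corresponds to this edit (the file path and section are the Ubuntu 16.04 defaults; verify on your system):

```
# /etc/NetworkManager/NetworkManager.conf
[ifupdown]
managed=true
```

After editing, restart the service (e.g. `sudo systemctl restart network-manager`). NetworkManager marks devices listed in /etc/network/interfaces as "not managed" unless this flag is true, so a conflicting entry there is another thing worth checking.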

Configure SMTP to Send Alerts

Hey,
Could somebody configure SMTP inside the DNIF container to send alerts from the DNIF Web Console to our email IDs, and then document the exact detailed steps, from configuring to "How To Send Alerts", since we're working on similar environments?

@aakratisahu @Sharbanibasu23 Could you check this out? Since you're ahead in querying datasets comprehensively.
This could speed things up for us in the long run when we're nearing project completion.

Capturing "How To?" of Process 1

Hi Guys,
Let us speed things up.
Currently @Sharbanibasu23 and I are looking into the SMTP setup.

So maybe one of you could volunteer to document the "How To?" of Process 1 of our project.
This will serve for future references.

We could have screenshots, brief informational steps, and a table of contents (to structure this short process). We could also reference our other repository documents in this "How To?" document, such as the installation docx, the analysis docx, or the DNIF website docs.

@shreyaskulkarni412 @aakratisahu ???
