
group-anomaly-detection's Introduction

GRAND

GRAND: Group-based Anomaly Detection for Self-Monitoring of Complex Systems

This is an implementation of a group-based anomaly detection method. It autonomously monitors a system (unit) that generates data over time, by comparing it against its own past history or against a group of other similar systems (units). It detects anomalies/deviations in a streaming fashion while accounting for concept drift caused by external factors that affect the data.

Install

You can install the package locally (for use on your system), with:

$ pip install .

You can also install the package with a symlink (editable mode), so that changes to the source files are immediately available to other users of the package on your system:

$ pip install -e .

Examples

For intuitive examples and explanations, please check the Jupyter notebook at ./examples/notebooks/examples.ipynb

Usage

The library can perform individual anomaly detection (where a system is compared against its own past history), or group-based anomaly detection (where the system's behaviour is modeled and compared against the behaviour of other systems). Please check the examples in the ./examples directory. Here is a brief explanation of group-based anomaly detection:

First, you need to import GroupAnomaly and create an instance:

from grand import GroupAnomaly

# Create an instance of GroupAnomaly
gdev = GroupAnomaly(nb_units=19,               # Number of units (vehicles)
                    ids_target_units=[0, 1],   # Ids of the (target) units to diagnose
                    w_ref_group="7days",       # Time window for the reference group
                    w_martingale=15,           # Window size for computing the deviation level
                    non_conformity="median",   # Non-conformity (strangeness) measure: "median" or "knn"
                    k=50,                      # Used if non_conformity is "knn"
                    dev_threshold=.6)          # Threshold on the deviation level
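
To give some intuition for how w_martingale and dev_threshold interact, here is a minimal sketch of a windowed deviation level computed from a sequence of p-values. It does not use the library, and the formula is an assumption chosen for illustration (small p-values push the level up, large ones pull it down, and the result is scaled to [0, 1]); the ToyDeviation class is hypothetical and may differ from what GRAND actually computes.

```python
from collections import deque


class ToyDeviation:
    """Hypothetical windowed deviation level; not the library's implementation."""

    def __init__(self, w_martingale=15, dev_threshold=0.6):
        self.window = deque(maxlen=w_martingale)  # keeps only the last w bets
        self.w = w_martingale
        self.dev_threshold = dev_threshold

    def update(self, pvalue):
        # Assumed betting rule: p-values below 0.5 count as evidence of deviation.
        self.window.append(0.5 - pvalue)
        # Clip at 0 and scale by the largest possible sum (0.5 per step).
        level = max(0.0, sum(self.window)) / (0.5 * self.w)
        return level, level > self.dev_threshold


toy = ToyDeviation(w_martingale=4, dev_threshold=0.6)
levels = [toy.update(p) for p in [0.5, 0.0, 0.0, 0.0, 0.0]]
for level, is_deviating in levels:
    print(level, is_deviating)
```

In this toy version, a larger window makes the level react more slowly to a run of small p-values, and a higher threshold delays the point at which the unit is flagged.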

Example data is provided. It contains data from 19 units (vehicles). Each CSV file contains data from one unit. To use this example data, import the load_vehicles function from grand.datasets as follows:

from grand.datasets import load_vehicles

# Streams data from several units (vehicles) over time
dataset = load_vehicles()
nb_units = dataset.get_nb_units() # nb_units = 19 vehicles in this example

# dataset.stream() can then be used as a generator
# to simulate a stream (see example below)

# dataset.stream_unit(uid) can also be used to generate 
# a stream from one unit identified by index (e.g. uid=0)
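
To make the shape of such a stream concrete, here is a minimal sketch of a generator that yields (dt, x_units) pairs, one per time step. It does not use the library: toy_stream and the sample readings are hypothetical, and each reading is a single number for simplicity, whereas real data-points may be feature vectors.

```python
from datetime import datetime, timedelta


def toy_stream(series_per_unit, start, step):
    """Yield (dt, x_units) pairs, one per time step.

    series_per_unit holds one equally long list of readings per unit,
    so x_units[i] is the reading of the i-th unit at time dt.
    """
    nb_steps = len(series_per_unit[0])
    for t in range(nb_steps):
        dt = start + t * step
        x_units = [series[t] for series in series_per_unit]
        yield dt, x_units


readings = [
    [1.0, 1.1, 0.9],  # unit 0
    [1.2, 1.0, 1.1],  # unit 1
    [5.0, 5.2, 4.9],  # unit 2
]
stream = list(toy_stream(readings, datetime(2020, 1, 1), timedelta(days=1)))
for dt, x_units in stream:
    print(dt.date(), x_units)
```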

A streaming setting is considered, where data (x_units in the example below) is received from the units at each time step dt. The method GroupAnomaly.predict(dt, x_units) is then called at each time step to diagnose the target units given by ids_target_units (the data-point received from unit uid at time dt is x_units[uid]). The predict method returns a list of DeviationContext objects, one per target unit. Each DeviationContext object contains the following information:

  1. a strangeness score: the non-conformity of the test unit to the other units.
  2. a p-value (in [0, 1]): the proportion of data from the other units that is stranger than the test unit's data.
  3. an updated deviation level (in [0, 1]) for the test unit.
  4. a boolean is_deviating indicating whether the test unit is significantly deviating from the group.

# At each time dt, x_units contains data from all units.
# Each data-point x_units[i] comes from the i'th unit.

for dt, x_units in dataset.stream():
    
    # diagnose the selected target units (0 and 1)
    devContextList = gdev.predict(dt, x_units)
    
    for uid, devCon in enumerate(devContextList):
        print("Unit:{}, Time: {} ==> strangeness: {}, p-value: {}, deviation: {} ({})".format(uid, dt, devCon.strangeness, 
        devCon.pvalue, devCon.deviation, "high" if devCon.is_deviating else "low"))

# Plot p-values and deviation levels over time
gdev.plot_deviations()
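
The strangeness score and p-value described above can be illustrated without the library. The sketch below assumes scalar data, takes the "median" strangeness to be the distance to the group median, and counts ties as "at least as strange"; median_strangeness and p_value are hypothetical helpers, and these details may differ from what GRAND does internally.

```python
from statistics import median


def median_strangeness(x, reference):
    """Distance of x to the median of the reference values (scalar toy case)."""
    return abs(x - median(reference))


def p_value(test_score, reference_scores):
    """Fraction of reference scores that are at least as strange as the test score."""
    stranger = sum(1 for s in reference_scores if s >= test_score)
    return stranger / len(reference_scores)


group = [1.0, 1.2, 0.9, 1.1, 1.3]  # readings from the reference group
ref_scores = [median_strangeness(x, group) for x in group]

p_typical = p_value(median_strangeness(1.05, group), ref_scores)
p_outlier = p_value(median_strangeness(5.0, group), ref_scores)
print(p_typical, p_outlier)
```

A reading close to the group median gets a large p-value, while a reading far from it gets a p-value near 0, which is what drives the deviation level up over time.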

group-anomaly-detection's People

Contributors

caisr-hh, mohamed-rafik-bouguelia


group-anomaly-detection's Issues

What is the expected input of IndividualAnomalyTransductive()?

I successfully installed your package and managed to run through the example. However, when trying it with simulated data, I get an error message.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime

from grand import IndividualAnomalyInductive, IndividualAnomalyTransductive, GroupAnomaly

df = pd.read_csv("https://raw.githubusercontent.com/Ferrologic/simulated-data/master/simulated_data.csv", parse_dates=True, header = 0)

df["timestamp"] = pd.to_datetime(df['timestamp'])

df.columns = ["timestamp", "value"]

df.plot(x = "timestamp", y = "value")

model = IndividualAnomalyTransductive(ref_group = ["day-of-week"], w_martingale = 100)

for t, x in zip(df.index, df.values):
    info = model.predict(t, x)
    print("Time: {} ==> strangeness: {}, deviation: {}".format(t, info.strangeness, info.deviation), end="\r")

And the error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
 in 
      2 
      3 for t, x in zip(df.index, df.values):
----> 4     info = model.predict(t, x)
      5     print("Time: {} ==> strangeness: {}, deviation: {}".format(t, info.strangeness, info.deviation), end="\r")

/anaconda3/envs/myenv/lib/python3.7/site-packages/grand/individual_anomaly/individual_anomaly_transductive.py in predict(self, dtime, x, external)
     87 
     88         self.T.append(dtime)
---> 89         self._fit(dtime, x, external)
     90 
     91         strangeness, diff, representative = self.strg.predict(x)

/anaconda3/envs/myenv/lib/python3.7/site-packages/grand/individual_anomaly/individual_anomaly_transductive.py in _fit(self, dtime, x, external)
    146             df_sub = self.df.append(self.df_init)
    147             for criterion in self.ref_group:
--> 148                 current = dt2num(dtime, criterion)
    149                 historical = np.array([dt2num(dt, criterion) for dt in df_sub.index])
    150                 df_sub = df_sub.loc[(current == historical)]

/anaconda3/envs/myenv/lib/python3.7/site-packages/grand/utils.py in dt2num(dt, criterion)
     53         elif criterion == "season-of-year":
     54             season = {12: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3, 9: 4, 10: 4, 11: 4}
---> 55             return season[dt.month]
     56         else:
     57             raise InputValidationError("Unknown criterion {} in ref_group.".format(criterion))

AttributeError: 'int' object has no attribute 'month'

What is the expected input in IndividualAnomalyTransductive() and is there any specific documentation besides the example Notebook?
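
Going only by the traceback, the time argument passed to predict appears to need to be datetime-like, because dt2num reads dt.month. In the snippet above, df.index is the default integer index, so t is an int. The sketch below reproduces that failure without the library; season_of is a hypothetical helper that mirrors the "season-of-year" branch shown in the traceback.

```python
from datetime import datetime

# Month-to-season mapping copied from the traceback.
SEASON = {12: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3, 9: 4, 10: 4, 11: 4}


def season_of(dt):
    """Look up the season from dt.month, as the traceback's dt2num does."""
    return SEASON[dt.month]


ok = season_of(datetime(2020, 7, 15))  # a datetime has a .month attribute

try:
    season_of(0)  # an integer row label, like the values in a default df.index
    failed = False
except AttributeError:
    failed = True

print(ok, failed)
```

A possible fix, untested here, is to iterate over the timestamp column (or set it as the index) so that t is a timestamp, and to pass only the value column as x.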
