group-anomaly-detection's Introduction

GRAND

GRAND: Group-based Anomaly Detection for Self-Monitoring of Complex Systems

This is an implementation of a group-based anomaly detection method. It can autonomously monitor a system (unit) that generates data over time by comparing it against its own past history or against a group of other similar systems (units). It detects anomalies/deviations in a streaming fashion while accounting for concept drift caused by external factors that can affect the data.
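The following toy sketch suggests why comparing against a group helps with concept drift. An external factor shifts every unit at once. Compared with its own past, a unit then looks strange. Compared with its peers at the same moment, it looks normal. The scalar `median_strangeness` function and the data are hypothetical illustrations, not the library's implementation.

```python
from statistics import median

def median_strangeness(x, reference):
    """Hypothetical scalar 'median' non-conformity: distance from x to the
    median of the reference values."""
    return abs(x - median(reference))

# Toy data: past readings of three units, all around 1.0.
history = {0: [1.0, 1.1, 0.9], 1: [1.0, 1.2, 1.1], 2: [0.9, 1.0, 1.0]}

# An external factor (e.g. weather) shifts every unit at the same time.
x_units = [3.0, 3.1, 2.9]
uid = 0

# Individual mode: compare unit 0 with its own past, so it looks very strange.
s_indiv = median_strangeness(x_units[uid], history[uid])

# Group mode: compare unit 0 with the other units right now, so it looks normal.
others = [x for i, x in enumerate(x_units) if i != uid]
s_group = median_strangeness(x_units[uid], others)

print(s_indiv, s_group)
```

In this sketch the individual score is large and the group score is close to zero, because the shared shift cancels out within the group.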

Install

You can install the package locally (for use on your system), with:

$ pip install .

You can also install the package in editable mode (via a symlink), so that changes to the source files are immediately available to other users of the package on your system:

$ pip install -e .

Examples

For intuitive examples and explanations, please check the Jupyter notebook at ./examples/notebooks/examples.ipynb.

Usage

The library can perform individual anomaly detection (where a system is compared against its own past history) or group-based anomaly detection (where the system's behaviour is modeled and compared against the behaviour of other systems). Please check the examples in the ./examples directory. Here is a brief explanation of group-based anomaly detection:

First, you need to import GroupAnomaly and create an instance:

from grand import GroupAnomaly

# Create an instance of GroupAnomaly
gdev = GroupAnomaly(nb_units=19,                # Number of units (vehicles)
                    ids_target_units=[0, 1],    # Ids of the (target) units to diagnose
                    w_ref_group="7days",        # Time window for the reference group
                    w_martingale=15,            # Window size for computing the deviation level
                    non_conformity="median",    # Non-conformity (strangeness) measure: "median" or "knn"
                    k=50,                       # Used if non_conformity is "knn"
                    dev_threshold=.6)           # Threshold on the deviation level
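To show what the two `non_conformity` options could mean, here is a minimal sketch. It assumes Euclidean distance, a component-wise median for "median", and the mean distance to the k nearest reference points for "knn". These definitions are assumptions for illustration only. The library's actual measures may differ.

```python
import math
from statistics import median

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def median_strangeness(x, reference):
    """'median' (sketch): distance from x to the component-wise median
    of the reference vectors."""
    med = [median(r[d] for r in reference) for d in range(len(x))]
    return euclidean(x, med)

def knn_strangeness(x, reference, k):
    """'knn' (sketch): mean distance from x to its k nearest reference vectors.
    k is capped at the reference size, since a small group may hold fewer than k points."""
    dists = sorted(euclidean(x, r) for r in reference)
    k = min(k, len(dists))
    return sum(dists[:k]) / k

reference = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [3.5, 4.5]
s_med = median_strangeness(x, reference)
s_knn = knn_strangeness(x, reference, k=1)
```

The cap on k is a design choice in this sketch. With k=50 and a reference set smaller than 50 points, it falls back to averaging over all available points.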

An example dataset is provided. It contains data from 19 units (vehicles), and each CSV file contains the data from one unit (vehicle). To use this example data, import the load_vehicles function from grand.datasets as follows:

from grand.datasets import load_vehicles

# Streams data from several units (vehicles) over time
dataset = load_vehicles()
nb_units = dataset.get_nb_units() # nb_units = 19 vehicles in this example

# dataset.stream() can then be used as a generator
# to simulate a stream (see example below)

# dataset.stream_unit(uid) can also be used to generate 
# a stream from one unit identified by index (e.g. uid=0)
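For readers without the example dataset, the hypothetical generator below mimics the shape that `dataset.stream()` appears to have in the usage loop further down. At each time step it yields a `(dt, x_units)` pair, where `x_units[i]` is the data point of unit i. The function name, the daily time step and the toy values are all assumptions.

```python
from datetime import datetime, timedelta

def toy_stream(per_unit_series, start, step):
    """Hypothetical stand-in for dataset.stream(): yields (dt, x_units), where
    x_units[i] is the data point received from unit i at time dt."""
    n_steps = min(len(series) for series in per_unit_series)
    for t in range(n_steps):
        dt = start + t * step
        x_units = [series[t] for series in per_unit_series]
        yield dt, x_units

# Two units, two daily readings each, two features per reading.
per_unit_series = [
    [[1.0, 2.0], [1.1, 2.1]],   # unit 0
    [[0.9, 1.9], [1.0, 2.0]],   # unit 1
]

steps = list(toy_stream(per_unit_series, datetime(2019, 1, 1), timedelta(days=1)))
for dt, x_units in steps:
    print(dt, x_units)
```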

A streaming setting is considered, where data (denoted x_units in the example below) is received from all units at each time step dt. The method GroupAnomaly.predict(dt, x_units) is then called at each time step to diagnose the target units given by ids_target_units (the data point received from unit uid at time dt is x_units[uid]). The predict method returns a list of DeviationContext objects, one per target unit. Each DeviationContext object contains the following information:

a strangeness score: the non-conformity of the test unit to the other units.
a p-value (in [0, 1]): the fraction of data from the other units that is stranger than the test unit's data.
an updated deviation level (in [0, 1]) for the test unit.
a boolean is_deviating indicating whether the test unit is significantly deviating from the group.

# At each time dt, x_units contains data from all units.
# Each data-point x_units[i] comes from the i'th unit.

for dt, x_units in dataset.stream():
    
    # Diagnose the selected target units (0 and 1)
    devContextList = gdev.predict(dt, x_units)
    
    for uid, devCon in enumerate(devContextList):
        print("Unit:{}, Time: {} ==> strangeness: {}, p-value: {}, deviation: {} ({})".format(uid, dt, devCon.strangeness, 
        devCon.pvalue, devCon.deviation, "high" if devCon.is_deviating else "low"))

# Plot p-values and deviation levels over time
gdev.plot_deviations()
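The sketch below shows one way a p-value and a deviation level could be derived from strangeness scores. The p-value is the fraction of reference scores at least as strange as the test score, with ties counted. The deviation level uses a window of the last `w_martingale` p-values. Under normal behaviour p-values are roughly uniform with mean 0.5, so the average of (0.5 - p) stays near 0. The mapping of that average to [0, 1] and the `DeviationTracker` class are hypothetical and are not the library's martingale computation.

```python
from collections import deque

def p_value(s_test, reference_scores):
    """Fraction of reference strangeness scores at least as large as s_test."""
    if not reference_scores:
        return 1.0
    return sum(1 for s in reference_scores if s >= s_test) / len(reference_scores)

class DeviationTracker:
    """Hypothetical sliding-window deviation level in [0, 1]."""

    def __init__(self, w_martingale=15, dev_threshold=0.6):
        self.pvalues = deque(maxlen=w_martingale)  # keeps only the last w p-values
        self.dev_threshold = dev_threshold

    def update(self, pvalue):
        self.pvalues.append(pvalue)
        # Mean of (0.5 - p): near 0 for uniform p-values, up to 0.5 if all p are 0.
        score = sum(0.5 - p for p in self.pvalues) / len(self.pvalues)
        deviation = min(1.0, max(0.0, 2.0 * score))
        return deviation, deviation > self.dev_threshold

pv = p_value(0.35, [0.1, 0.2, 0.3, 0.4, 2.5])

tracker = DeviationTracker(w_martingale=3, dev_threshold=0.6)
history = [tracker.update(p) for p in [0.5, 0.05, 0.02, 0.01]]
```

In this sketch, one small p-value is not enough to cross the threshold. A run of consistently small p-values is.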

group-anomaly-detection's Issues

What is the expected input of IndividualAnomalyTransductive()?

I successfully installed your package and managed to run through the example. However, when trying it with simulated data I get an error message.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime

from grand import IndividualAnomalyInductive, IndividualAnomalyTransductive, GroupAnomaly

df = pd.read_csv("https://raw.githubusercontent.com/Ferrologic/simulated-data/master/simulated_data.csv", parse_dates=True, header = 0)

df["timestamp"] = pd.to_datetime(df['timestamp'])

df.columns = ["timestamp", "value"]

df.plot(x = "timestamp", y = "value")

model = IndividualAnomalyTransductive(ref_group = ["day-of-week"], w_martingale = 100)

for t, x in zip(df.index, df.values):
    info = model.predict(t, x)
    print("Time: {} ==> strangeness: {}, deviation: {}".format(t, info.strangeness, info.deviation), end="\r")

And the error message:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
 in 
      2 
      3 for t, x in zip(df.index, df.values):
----> 4     info = model.predict(t, x)
      5     print("Time: {} ==> strangeness: {}, deviation: {}".format(t, info.strangeness, info.deviation), end="\r")

/anaconda3/envs/myenv/lib/python3.7/site-packages/grand/individual_anomaly/individual_anomaly_transductive.py in predict(self, dtime, x, external)
     87 
     88         self.T.append(dtime)
---> 89         self._fit(dtime, x, external)
     90 
     91         strangeness, diff, representative = self.strg.predict(x)

/anaconda3/envs/myenv/lib/python3.7/site-packages/grand/individual_anomaly/individual_anomaly_transductive.py in _fit(self, dtime, x, external)
    146             df_sub = self.df.append(self.df_init)
    147             for criterion in self.ref_group:
--> 148                 current = dt2num(dtime, criterion)
    149                 historical = np.array([dt2num(dt, criterion) for dt in df_sub.index])
    150                 df_sub = df_sub.loc[(current == historical)]

/anaconda3/envs/myenv/lib/python3.7/site-packages/grand/utils.py in dt2num(dt, criterion)
     53         elif criterion == "season-of-year":
     54             season = {12: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 3, 7: 3, 8: 3, 9: 4, 10: 4, 11: 4}
---> 55             return season[dt.month]
     56         else:
     57             raise InputValidationError("Unknown criterion {} in ref_group.".format(criterion))

AttributeError: 'int' object has no attribute 'month'

What is the expected input in IndividualAnomalyTransductive() and is there any specific documentation besides the example Notebook?
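The traceback shows that dt2num reads datetime attributes (dt.month) from the first argument of predict. It is therefore receiving the integer row index from df.index. The stdlib sketch below reproduces the same kind of failure and shows the likely input shape. The time argument is a datetime, and x holds only the numeric features, not the timestamp column. `day_of_week_key` is a hypothetical stand-in for dt2num, and the rows are made-up data.

```python
from datetime import datetime

def day_of_week_key(dt):
    # Hypothetical stand-in for a date-based reference-group key such as
    # "day-of-week": like dt2num in the traceback, it reads datetime attributes.
    return dt.weekday()

# Hypothetical rows standing in for the CSV: (timestamp string, value).
rows = [("2019-01-07 00:00:00", 1.2), ("2019-01-08 00:00:00", 0.9)]

# Passing an integer row index reproduces the kind of error in the traceback.
error_message = None
try:
    day_of_week_key(0)
except AttributeError as err:
    error_message = str(err)

# Instead, pass a datetime as the time argument and only numeric features as x.
prepared = []
for ts, value in rows:
    t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    x = [value]
    prepared.append((t, x, day_of_week_key(t)))
```

With the DataFrame from the question, the analogous (untested) change would be to iterate over df["timestamp"] together with df[["value"]].values instead of df.index and df.values.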

caisr-hh / group-anomaly-detection