
guansongpang / deviation-network


Source code of the KDD19 paper "Deep anomaly detection with deviation networks", weakly/partially supervised anomaly detection, few-shot anomaly detection, semi-supervised anomaly detection

License: GNU General Public License v3.0

Python 100.00%
anomaly-detection deep-learning weakly-supervised-learning semi-supervised-learning outlier-detection few-shot-learning

deviation-network's People

Contributors: guansongpang

deviation-network's Issues

Interpretability of anomaly scores

In terms of interpretability of the anomaly scores, can we also obtain the contribution (importance) of each feature/column of the dataset from the Z-score-based deviation loss?
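The repository does not appear to ship per-feature attributions, but a model-agnostic occlusion sketch can approximate them: re-score the instance with each feature replaced by a baseline value and record the drop in the anomaly score. `score_fn` below is a hypothetical stand-in for the trained network's scoring function, not part of this codebase:

```python
import numpy as np

def occlusion_importance(score_fn, x, baseline):
    """Per-feature contribution to an anomaly score via occlusion.

    score_fn : maps a 1-D feature vector to a scalar anomaly score
               (e.g. the trained network's output; hypothetical here).
    x        : the instance to explain.
    baseline : reference values per feature, e.g. training-set means.
    """
    base_score = score_fn(x)
    contributions = np.empty_like(x, dtype=float)
    for j in range(len(x)):
        x_masked = x.copy()
        x_masked[j] = baseline[j]          # "remove" feature j
        contributions[j] = base_score - score_fn(x_masked)
    return contributions
```

For a linear scorer this recovers each feature's exact additive contribution; for the network it is only a local approximation.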

Reproduce for a different dataset

Hello,
I have a 1-D time series dataset in the format (x, y, z), where x = number of samples, y = number of timesteps and z = 1 (dimensions).
How can I run your code on my dataset? Where do I need to make changes?
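For what it's worth, the tabular pipeline in this repository expects a 2-D matrix of shape (n_samples, n_features). Assuming z = 1, one minimal option is to squeeze the singleton axis and treat each timestep as a feature column (a sketch, not the repo's own preprocessing):

```python
import numpy as np

# Hypothetical example: 1-D time series shaped (x, y, z) with z == 1.
X = np.random.randn(100, 64, 1)    # 100 samples, 64 timesteps, 1 dimension

# The tabular DevNet pipeline consumes a 2-D matrix (n_samples, n_features),
# so drop the singleton last axis and treat each timestep as a feature.
X_2d = X.squeeze(axis=-1)          # -> shape (100, 64)
```

This ignores temporal ordering; a sequence encoder would be needed to exploit it.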

Infinite loop when generating the batches

I noticed that there is an infinite loop in the function batch_generator_sup(), and I don't fully understand the logic for generating the batches and injecting the noise. Why not just use a normal dataloader with negative sampling?
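For context, Keras-style training loops pull `steps_per_epoch` batches from the generator each epoch, so the generator is expected to yield forever; the `while True` is deliberate rather than a bug. A minimal sketch of such a class-balanced generator (not the repository's actual code) might look like:

```python
import numpy as np

def balanced_batch_generator(x, outlier_idx, inlier_idx, batch_size, rng):
    """Yield class-balanced (batch, labels) pairs forever.

    Keras-style training pulls a fixed number of steps per epoch from the
    generator, so the generator itself must never terminate - hence the
    deliberate infinite loop.
    """
    half = batch_size // 2
    while True:  # intentional: the training loop decides when to stop
        # Half the batch from labeled anomalies, half from the (mostly
        # normal) unlabeled pool, sampled with replacement since the
        # labeled anomaly set is tiny.
        out = rng.choice(outlier_idx, half, replace=True)
        inl = rng.choice(inlier_idx, half, replace=True)
        idx = np.concatenate([out, inl])
        labels = np.concatenate([np.ones(half), np.zeros(half)])
        yield x[idx], labels
```

A standard dataloader with negative sampling would also work; the generator form mainly keeps the anomaly/normal ratio fixed per batch despite extreme class imbalance.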

Calculating metrics

Hey,

I have a question concerning the way you calculate metrics:

Predictions of the trained model can vary from close to 0 to over 60 in some experiments I performed using the Kaggle Credit Card Fraud dataset.

My question is: why do you use the predictions as if they were probabilities (even though they are not within the [0, 1] range) when calculating AUC-ROC and AUC-PR?

I know that both sklearn functions support this type of mixed input (binary and continuous vectors), but doesn't it give a false result?
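For reference, a quick check shows that sklearn's ranking metrics depend only on the ordering of the scores, so unbounded anomaly scores are valid input and no probability calibration is required:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_true = np.array([0, 0, 1, 0, 1])
scores = np.array([0.2, 3.0, 60.0, 1.5, 12.0])   # unbounded anomaly scores

auc_roc = roc_auc_score(y_true, scores)
auc_pr = average_precision_score(y_true, scores)

# Any strictly increasing transform of the scores leaves both metrics
# unchanged, because only the ranking of the scores matters.
assert auc_roc == roc_auc_score(y_true, np.log1p(scores))
assert auc_pr == average_precision_score(y_true, scores * 10 + 5)
```

So AUC-ROC/AUC-PR are well-defined here; the separate question of a calibrated decision threshold is another matter.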

[Screenshot from 2020-04-23 13-16-29]

This is a bit similar to what is happening in your implementation. Also, I think that the confidence threshold you mentioned in the paper should depend on the value of the margin used in the deviation loss, not only on the probit and the normal distribution parameters.

Thanks for the reproducible paper :)
Kuba
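For readers following along, the Z-score deviation loss discussed here can be sketched in NumPy as below (an illustrative reconstruction under the paper's N(0, 1) reference-score prior, not the repository's exact code); the `margin` argument is the term the comment above argues should feed into the confidence threshold:

```python
import numpy as np
from scipy.stats import norm

def deviation_loss(scores, labels, margin=5.0, rng=None):
    """Z-score-based deviation loss (NumPy sketch, not the repo's code).

    scores: anomaly scores from the network; labels: 1 = anomaly, 0 = normal.
    A fresh reference sample is drawn from the N(0, 1) prior each call.
    """
    rng = np.random.default_rng() if rng is None else rng
    ref = rng.standard_normal(5000)
    dev = (scores - ref.mean()) / ref.std()        # Z-score deviation
    inlier_loss = np.abs(dev)                      # pull normals toward 0
    outlier_loss = np.maximum(margin - dev, 0.0)   # push anomalies past margin
    return np.mean((1 - labels) * inlier_loss + labels * outlier_loss)

# Under the N(0, 1) prior a one-sided confidence threshold on the deviation
# is e.g. norm.ppf(0.95) ~= 1.645; trained anomalies are pushed out toward
# the margin (5 here), far beyond it, which is the interaction the comment
# above points at.
threshold = norm.ppf(0.95)
```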

Anomaly Contamination level

Hi @GuansongPang, I have another question regarding the anomaly contamination level setting on the unlabeled training dataset.

Since the model has this setting during the training process (assuming that there is anomalous data in the unlabeled training set), does that mean the model can also see which unlabeled data instances tend to be "anomalous" (have a high anomaly score) in the test set?

Thank you.

TF2 implementation

Hi Mr Pang.
I have read your paper and I found it very interesting to replicate. Congratulations!

I am currently trying to explore your ideas and implement them in a fraud prevention context, but I ran into some trouble adapting your code to TF2 and our dataset. I am experiencing problems with the code, particularly inside the custom cost function.

The first one occurs with the ref variable. The framework throws the following error: tf.function-decorated function tried to create variables on non-first call. One solution would be to declare the variable outside the function, but then the random sample from which the mean and variance are obtained would always be the same.
On the other hand, for debugging purposes only, we trained the model in eager mode, and the error we encountered is: assertion failed: [predictions must be >= 0] [Condition x >= y did not hold element-wise:] ….

Have you or your team ever tried to migrate your code to TF2? Your experience could be very helpful to us.

Thank you.
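One hedged workaround for the first error, assuming the `ref` sample only feeds the mean and standard deviation of the N(0, 1) prior: draw it with `tf.random.normal` inside the loss. That op creates no `tf.Variable`, so `tf.function` retracing stays legal, and the sample still changes on every call. A sketch, not an official TF2 port:

```python
import tensorflow as tf

@tf.function
def deviation_loss(y_true, y_pred, margin=5.0):
    # Reference scores drawn from the N(0, 1) prior. tf.random.normal is a
    # plain op rather than a tf.Variable, so nothing is "created" on
    # non-first calls of the tf.function.
    ref = tf.random.normal([5000])
    dev = (y_pred - tf.reduce_mean(ref)) / tf.math.reduce_std(ref)
    inlier_loss = tf.abs(dev)                       # label 0: pull toward 0
    outlier_loss = tf.maximum(margin - dev, 0.0)    # label 1: push past margin
    return tf.reduce_mean((1.0 - y_true) * inlier_loss + y_true * outlier_loss)
```

The eager-mode "predictions must be >= 0" assertion looks unrelated; it usually comes from a metric (e.g. AUC) that expects probabilities, so it may need raw scores routed around it rather than a loss change.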

Code for modified DSVDD

Hello,

I would like to first thank you for providing the code for deviation network.
In the paper, it was mentioned that you used a modified version of DSVDD. Would it be possible to provide the code for that?

Model performance

Hi,
I ran the code but don't get the performance reported in the paper on the credit card default dataset:
AUC-ROC: 0.6005, AUC-PR: 0.1028 (last run)
average AUC-ROC: 0.6429, average AUC-PR: 0.0941
Did you do feature engineering or change the default arguments to get the best performance on this dataset? (I trained the model with the default arguments.)

Using deviation networks on custom datasets

Hey again!

Do you have any advice on how to approach using deviation networks on real-world datasets?

Main concerns when using real world data:

  • due to feature engineering, many columns are dependent (so the central limit theorem doesn't quite apply; the variables are not independent)
  • there are many missing values (imputation with mean & mode helps, but it may skew the distribution even more)

Having that in mind, do you have any advice about how to approach new datasets?

Can you think of any method that would let me check whether my dataset's anomaly scores fit a normal distribution and, if not, guide me towards another distribution type?
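On the normality question, one quick screen (a sketch, not a definitive protocol) is a Kolmogorov-Smirnov test against a normal fitted to the scores. Strictly speaking, estimating the parameters from the same data calls for a Lilliefors-style correction, so treat the p-value as indicative only:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores = rng.standard_normal(500)   # stand-in for your model's anomaly scores

# KS test against a normal with mean/std estimated from the scores.
# A small p-value suggests the normal assumption does not hold and that
# another reference distribution may be worth trying.
mu, sigma = scores.mean(), scores.std(ddof=1)
stat, p_value = stats.kstest(scores, "norm", args=(mu, sigma))
```

A Q-Q plot against the candidate distribution is a useful visual companion to the test.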

Have you considered using other distributions or other distribution parameters during your research?

I've seen that you co-authored another paper that solves a similar problem. Do you plan to release its implementation?

Thanks again and cheers,
Kuba

Model performance on backdoor datasets

Hi Mr Pang,
I trained the model with the default arguments on the backdoor dataset and got the following performance:
AUC-ROC: 0.7793, AUC-PR: 0.2663
average AUC-ROC: 0.7834, average AUC-PR: 0.2730
I saw a similar question in a previous issue, and your answer was to perform some standard data preprocessing steps. Could you provide the specific steps and the data-processing code on GitHub?
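In case it helps while waiting for an answer: "standard preprocessing" for tabular intrusion-detection data often means one-hot encoding the categorical columns and scaling the numeric ones. The column indices below are placeholders, not the backdoor dataset's real schema:

```python
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, MinMaxScaler

# Hypothetical column layout: which columns are categorical vs. numeric
# depends on the actual backdoor dataset; these index lists are placeholders.
categorical_cols = [1, 2, 3]
numeric_cols = [0, 4, 5]

preprocess = ColumnTransformer([
    # Unseen categories at test time are encoded as all-zeros.
    ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    # Rescale numeric features to [0, 1].
    ("scale", MinMaxScaler(), numeric_cols),
])
```

Fit it on the training split only (`preprocess.fit_transform(X_train)`), then apply `preprocess.transform` to the test split to avoid leakage.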
