feedbackisolationforest's Introduction

Build the C++ code with the 'make' command.

(Tested on gcc version 5.1.0. Any version above 4.7 should work)

$ make
gcc --std=c99 -D_GNU_SOURCE -Wall -Werror -g -c C/common.c -o C/common.o
gcc --std=c99 -D_GNU_SOURCE -Wall -Werror -g -c C/object.c -o C/object.o
gcc --std=c99 -D_GNU_SOURCE -Wall -Werror -g -c C/strfun.c -o C/strfun.o
gcc --std=c99 -D_GNU_SOURCE -Wall -Werror -g -c C/readwrite.c -o C/readwrite.o
gcc --std=c99 -D_GNU_SOURCE -Wall -Werror -g -c C/argparse.c -o C/argparse.o
gcc --std=c99 -D_GNU_SOURCE -Wall -Werror -g -c C/argparse_iforest.c -o C/argparse_iforest.o
gcc --std=c99 -D_GNU_SOURCE -Wall -Werror -g -c C/frames.c -o C/frames.o
ld -r C/common.o C/object.o C/strfun.o C/readwrite.o C/argparse.o C/argparse_iforest.o C/frames.o -o cincl.o
g++ --std=c++11 -Wall -Werror -g   -c Forest.cpp -o Forest.o
g++ --std=c++11 -Wall -Werror -g   -c IsolationForest.cpp -o IsolationForest.o
g++ --std=c++11 -Wall -Werror -g   -c Tree.cpp -o Tree.o
g++ --std=c++11 -Wall -Werror -g   -c utility.cpp -o utility.o
g++ --std=c++11 -Wall -Werror -g   -c OnlineIF.cpp -o OnlineIF.o
g++ --std=c++11 -Wall -Werror -g   -c main.cpp -o main.o
g++ --std=c++11 -Wall -Werror -g   -o iforest.exe cincl.o Forest.o IsolationForest.o Tree.o utility.o OnlineIF.o main.o

This creates the executable 'iforest.exe'. Running it without any options prints the usage message:

$ ../iforest.exe
Usage: D:\Codes\githubFeedbackIF\feedbackisolationforest\iforest.exe [options]
Options:
        -i FILE, --infile=FILE
                Specify path to input data file. (Required).
        -o FILE, --outfile=FILE
                Specify path to output results file. (Required).
        -m COLS, --metacol=COLS
                Specify columns to preserve as meta-data. (Separated by ',' Use '-' to specify ranges).
        -t N, --ntrees=N
                Specify number of trees to build.
                Default value is 100.
        -s S, --sampsize=S
                Specify subsampling rate for each tree. (Value of 0 indicates to use entire data set).
                Default value is 2048.
        -d MAX, --maxdepth=MAX
                Specify maximum depth of trees. (Value of 0 indicates no maximum).
                Default value is 0.
        -H, --header
                Toggle whether or not to expect a header input.
                Default value is true.
        -v, --verbose
                Toggle verbose ouput.
                Default value is false.
        -w W, --RegType=W
                Type of regularizer: 1 indicates l1 and 2 indicates l2
                Default value is 1.
        -c N, --columns=N
                specify number of columns to use.
                Default value is 0.
        -r R, --regularizer=R
                specify regularization constant.
                Default value is 0.
        -u U, --updatetype=U
                specify type of update to perform on weights.
                Default value is 0.
        -f F, --numfeedback=F
                specify number of feedback iteration to perform.
                Default value is 100.
        -x X, --numiter=X
                specify number of times experiments to rerun.
                Default value is 1.
        -l L, --losstype=L
                specify type of loss to use.
                Default value is 0.
        -g G, --numgradupd=G
                specify number of gradient update to run.
                Default value is 1.
        -a A, --learningrate=A
                specify learning rate for gradient update. Set 0 for variable learning rate 1/sq(t).
                Default value is 1.
        -p P, --posweight=P
                specify whether weights to be restricted to be positive only.
                Default value is 0.
        -z Z, --reInitWeights=Z
                specify whether weights reset to 1 after each feedback.(for stochastic and batch update)
                Default value is 0.
        -h, --help
                Print this help message and exit.
Note: The first column of the input file should contain the ground-truth label, either "anomaly" or "nominal", so that feedback can be incorporated properly. See any of the input files inside the "datasets" directory for the expected CSV format.
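
Before a run it can help to confirm that the first column really holds only the two expected labels. The following is a minimal sketch, not part of the project's code: the names `LabelCounts` and `countLabels` and the `hasHeader` parameter are hypothetical, and it assumes plain comma-separated fields without quoting and a single header row (the default for --header).

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Tally of ground-truth labels found in the first CSV column.
struct LabelCounts {
    int anomaly = 0;
    int nominal = 0;
    int other = 0;  // rows whose first field is neither label
};

// Reads CSV text and counts the labels in the first column.
// Assumption: unquoted comma-separated fields, optional header row.
LabelCounts countLabels(std::istream& in, bool hasHeader = true) {
    LabelCounts counts;
    std::string line;
    if (hasHeader) std::getline(in, line);  // skip the header row
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        // Everything before the first comma (or the whole line if none).
        const std::string label = line.substr(0, line.find(','));
        if (label == "anomaly") ++counts.anomaly;
        else if (label == "nominal") ++counts.nominal;
        else ++counts.other;
    }
    return counts;
}
```

Passing an open `std::ifstream` for the input file and checking that `other` is zero is one way to catch mislabeled rows early.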

Test Run

$ cd test
$ ../iforest.exe -i datasets/anomaly/ann_thyroid_1v3/fullsamples/ann_thyroid_1v3_1.csv -o outtest/ann_thyroid_1v3_1 -t 100 -s 256 -m 1 -x 5 -f 10 -w 2 -l 2 -a 1
# Trees          = 100
# Samples        = 256
MaxHeight        = 0
Orig Dimension   = 3251,21
# Iterations     = 5
# Feedbacks      = 10
Loss   type      = logistic
Update type      = online
Num Grad Upd     = 1
Reg. Constant    = 0
Learning Rate    = 1
Variable LRate   = 0
Positive W only  = 0
ReInitWgts       = 0
Regularizer type = L2
iter 0, # Anomaly: Baseline -> 2 Feedback -> 8
iter 1, # Anomaly: Baseline -> 1 Feedback -> 1
iter 2, # Anomaly: Baseline -> 2 Feedback -> 8
iter 3, # Anomaly: Baseline -> 2 Feedback -> 5
iter 4, # Anomaly: Baseline -> 2 Feedback -> 7
Avg: Baseline -> 1.8 Feedback -> 5.8
Time elapsed: 2 seconds
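
The "Avg" line is the mean of the per-iteration counts above it: (2+1+2+2+2)/5 = 1.8 for the baseline and (8+1+8+5+7)/5 = 5.8 with feedback. As a rough sketch of how that could be checked from captured console output, the code below parses the "iter" lines and averages them. The names `IterResult`, `parseIterLines` and `meanCount` are hypothetical, and the line format is assumed from the sample output shown here.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// Anomaly counts reported for one iteration of the console log.
struct IterResult {
    int iter;
    int baseline;
    int feedback;
};

// Extracts every "iter N, # Anomaly: Baseline -> B Feedback -> F" line
// from the given log text; all other lines are ignored.
std::vector<IterResult> parseIterLines(const std::string& log) {
    std::vector<IterResult> results;
    std::istringstream in(log);
    std::string line;
    while (std::getline(in, line)) {
        IterResult r{};
        if (std::sscanf(line.c_str(),
                        "iter %d, # Anomaly: Baseline -> %d Feedback -> %d",
                        &r.iter, &r.baseline, &r.feedback) == 3) {
            results.push_back(r);
        }
    }
    return results;
}

// Mean of the baseline counts (useFeedback == false) or feedback counts.
// Returns 0.0 for an empty input.
double meanCount(const std::vector<IterResult>& results, bool useFeedback) {
    if (results.empty()) return 0.0;
    double sum = 0.0;
    for (const IterResult& r : results) {
        sum += useFeedback ? r.feedback : r.baseline;
    }
    return sum / results.size();
}
```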

This creates two output files, "ann_thyroid_1v3_1_summary_feed_0_losstype_logistic_updatetype_online_ngrad_1_reg_0_lrate_1_pwgt_0_inwgt_0_rtype_L2.csv" and "ann_thyroid_1v3_1_summary_feed_10_losstype_logistic_updatetype_online_ngrad_1_reg_0_lrate_1_pwgt_0_inwgt_0_rtype_L2.csv". Each contains the number of anomalies discovered after each feedback round, with one row per iteration.

iter,feed1,feed2,feed3,feed4,feed5,feed6,feed7,feed8,feed9,feed10
1,0,1,1,1,1,1,1,1,2,2
2,0,0,0,0,0,0,0,0,0,1
3,1,1,1,1,1,2,2,2,2,2
4,0,0,0,0,0,1,1,1,1,2
5,0,0,1,1,1,1,1,2,2,2

iter,feed1,feed2,feed3,feed4,feed5,feed6,feed7,feed8,feed9,feed10
1,0,1,2,3,4,5,6,7,8,8
2,0,0,0,0,0,0,0,0,0,1
3,1,2,2,2,3,4,5,6,7,8
4,0,0,0,0,0,1,2,3,4,5
5,0,0,1,1,2,3,4,5,6,7

Note that the first output file holds the results of the baseline isolation forest, which ignores all feedback.
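
The last column of each summary file (feed10) matches the per-iteration counts printed on the console: 2,1,2,2,2 for the baseline and 8,1,8,5,7 with feedback. The sketch below shows one possible way to pull out that column and compare the two files. The helper names `finalCounts` and `gains` are hypothetical, and the code assumes one header row followed by integer fields, as in the two listings above.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns the last field of each data row in a summary CSV, read here as
// the number of anomalies found after the final feedback round.
// Assumption: a single header row ("iter,feed1,...") and integer fields.
std::vector<int> finalCounts(const std::string& csv) {
    std::vector<int> counts;
    std::istringstream in(csv);
    std::string line;
    std::getline(in, line);  // skip the header row
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        const std::string::size_type pos = line.rfind(',');
        const std::string last =
            (pos == std::string::npos) ? line : line.substr(pos + 1);
        counts.push_back(std::stoi(last));
    }
    return counts;
}

// Per-iteration gain: feedback count minus baseline count.
std::vector<int> gains(const std::vector<int>& baseline,
                       const std::vector<int>& feedback) {
    assert(baseline.size() == feedback.size());
    std::vector<int> diff;
    for (std::size_t i = 0; i < baseline.size(); ++i) {
        diff.push_back(feedback[i] - baseline[i]);
    }
    return diff;
}
```

For the two files above the gains are 6, 0, 6, 3 and 5, so feedback helped in every iteration of this run except the second.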
