[CCFD-RF] Credit Card Fraudulent Detection with Random Forest

This project performs credit card fraud detection with a Random Forest, using Spark Structured Streaming.

In the code:

CCFD-RF can be run in one of three ways:

  1. Option 1: Run the job locally, reading from a file and writing to the console
  2. Option 2: Run the job locally, reading from a Kafka source and writing to a Kafka sink
  3. Option 3: Run the job on the SoftNet cluster, reading from HDFS and writing to HDFS

Notes:
We recommend running the project with Option 2 because it is the easiest to test.
The attached code is configured for Option 2.

Configure SparkSession

Option 1 & 2 Run locally:

In lines 25-30 [StructuredRandomForest]: Configure the SparkSession variable
    val spark = SparkSession.builder()
      .appName("SparkStructuredStreamingExample")
      .master("local[*]")
      .config("spark.sql.streaming.checkpointLocation", "checkpoint_saves/")
      .getOrCreate()

Option 3 Run on the cluster:

In lines 25-30 [StructuredRandomForest]: Configure the SparkSession variable
    val spark = SparkSession.builder()
      .appName("SparkStructuredRandomForest")
      .config("spark.sql.streaming.checkpointLocation", "/user/vvittis")
      .getOrCreate()
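
The two SparkSession variants above differ only in the master setting and the checkpoint path. A minimal sketch of building either one from the same code follows. `SessionFactory`, `local` and `checkpointDir` are hypothetical names introduced for illustration and are not part of CCFD-RF.

```scala
import org.apache.spark.sql.SparkSession

object SessionFactory {
  // `local` and `checkpointDir` are hypothetical parameters, not part of CCFD-RF.
  def build(local: Boolean, checkpointDir: String): SparkSession = {
    val builder = SparkSession.builder()
      .appName("SparkStructuredRandomForest")
      .config("spark.sql.streaming.checkpointLocation", checkpointDir)
    // Force a master only for local runs; on the cluster, spark-submit supplies it.
    val configured = if (local) builder.master("local[*]") else builder
    configured.getOrCreate()
  }
}
```

For example, `SessionFactory.build(local = true, "checkpoint_saves/")` would correspond to Options 1 and 2, and `SessionFactory.build(local = false, "/user/vvittis")` to Option 3.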

Read

Option 1 Read from file:

In lines 35-43 [StructuredRandomForest]: Read from source
    val rawData = spark.readStream.text("dataset_source/")

Option 2 Read from Kafka:

In lines 35-43 [StructuredRandomForest]: Read from source
    val rawData = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "testSource")
      .option("startingOffsets", "earliest")
      .load()
      .selectExpr("CAST(value AS STRING)")

Note: ZooKeeper and the Kafka broker must be running before the job starts.

Open 2 command line windows and cd to “C:\kafka_2.12-2.3.0”
1st window
bin\windows\zookeeper-server-start.bat config\zookeeper.properties
2nd window
bin\windows\kafka-server-start.bat config\server.properties
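
Once the broker is up, something has to publish records to the testSource topic before the job has anything to read. The following is a minimal sketch of a test producer, not part of CCFD-RF. `SendTestRecords` and its file-path argument are hypothetical, the record format expected by the job is not specified here, so each line of the input file is sent unchanged as the record value, and the kafka-clients library is assumed to be on the classpath.

```scala
import java.util.Properties
import scala.io.Source
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object SendTestRecords {
  // Producer settings for plain string keys and values.
  def producerProps(servers: String): Properties = {
    val props = new Properties()
    props.put("bootstrap.servers", servers)
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props
  }

  // args(0): path to a local text file (hypothetical), one record per line.
  def main(args: Array[String]): Unit = {
    val producer = new KafkaProducer[String, String](producerProps("localhost:9092"))
    val input = Source.fromFile(args(0))
    try {
      input.getLines().foreach { line =>
        producer.send(new ProducerRecord[String, String]("testSource", line))
      }
    } finally {
      input.close()
      producer.close() // close() waits for pending sends to complete
    }
  }
}
```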

Option 3 Read from an HDFS file:

In lines 35-43 [StructuredRandomForest]: Read from source
    val rawData = spark.readStream.text("/user/vvittis/numbers")

Note: /user/vvittis/numbers is a path to an HDFS folder
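
All three read variants produce a streaming DataFrame with a single string column named `value`: the text source names its column `value`, and the Kafka variant casts the record value to a string. The rest of the job can therefore stay the same whichever source is used. A minimal sketch of wrapping the reads behind two helpers follows. The `Sources` object and its method names are hypothetical and are not part of CCFD-RF.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object Sources {
  // Options 1 and 3: a local or HDFS directory of text files, one record per line.
  def fromFiles(spark: SparkSession, dir: String): DataFrame =
    spark.readStream.text(dir)

  // Option 2: a Kafka topic, keeping only the record value as a string.
  def fromKafka(spark: SparkSession, servers: String, topic: String): DataFrame =
    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", servers)
      .option("subscribe", topic)
      .option("startingOffsets", "earliest")
      .load()
      .selectExpr("CAST(value AS STRING)")
}
```

With these helpers, `Sources.fromFiles(spark, "dataset_source/")` and `Sources.fromKafka(spark, "localhost:9092", "testSource")` would be drop-in replacements for the `rawData` definitions above.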

Write

Option 1 Write to console:

In line 212 [StructuredRandomForest]: Write to console
    val query = kafkaResult
      .writeStream
      .outputMode("update")
      .option("truncate", "false")
      .format("console")
      .queryName("TestStatefulOperator")
      .start()

Option 2 Write to Kafka:

In lines 215-230 [StructuredRandomForest]: Write to Kafka sink
    val query = kafkaResult
      .selectExpr("CAST(value AS STRING)")
      .writeStream
      .outputMode("update")
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "testSink")
      .queryName("RandomForest")
      .start()

Option 3 Write to HDFS file:

In lines 224-230 [StructuredRandomForest]: Write to HDFS sink
    val query = kafkaResult
      .writeStream
      .outputMode("append")
      .format("csv")
      .option("path", "/user/vvittis/results/")
      .queryName("RandomForest")
      .start()

Note: /user/vvittis/results is a path to an HDFS folder
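
Each write snippet ends with `.start()`, which returns immediately. The driver then has to block on the returned query, otherwise the program exits before any batch is processed. The following minimal end-to-end sketch of Option 1 shows this. It is a pass-through that only echoes the input lines and does not include the Random Forest logic. `PassThroughExample` is a hypothetical name, and the sketch sets no checkpoint location.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery

object PassThroughExample {
  // Reads text files from `inputDir` and prints each new line to the console.
  def start(spark: SparkSession, inputDir: String): StreamingQuery =
    spark.readStream.text(inputDir)
      .writeStream
      .outputMode("update")
      .option("truncate", "false")
      .format("console")
      .queryName("PassThroughExample")
      .start()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PassThroughExample")
      .master("local[*]")
      .getOrCreate()
    val query = start(spark, "dataset_source/")
    // Block the driver until the query is stopped or fails.
    query.awaitTermination()
  }
}
```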

Run the project

In IntelliJ

Step 1: Clone CCFD-RF via File > New > Project from Version Control...
Step 2: In the URL field, paste https://github.com/vvittis/CCFD-RF.git
        In the Directory field, enter your preferred directory
Step 3: Click the build button or Build > Build Project
Step 4: Go to src > main > scala > StructuredRandomForest.scala and click Run
  • [Screenshot: a typical console showing the state]

  • [Screenshot: a typical console showing the output]

On the cluster

You will find the sbt folder

Step 1: Run sbt assembly to create a .jar file
Step 2: Run
        ./bin/spark-submit \
          --class StructuredRandomForest \
          --master yarn \
          --deploy-mode client \
          --num-executors 10 \
          --driver-memory 512m \
          --executor-memory 512m \
          --executor-cores 1 \
          /home/vvittis/StructuredRandomForest-assembly-0.1.jar
  • [Screenshot: a typical cluster run in which each executor takes one Hoeffding Tree (HT) of the Random Forest]
  • This test was executed with 10 executors and 10 HTs.

Licensed under the MIT Licence.
