C3-Sex

Project Objectives

This project aims to design and build a bot with interactive sentimental intelligence that can uncover and profile cybercriminals in massive online chats. It does this by producing relevant replies that take both linguistic and sentiment content into account.

The specific objectives, which describe the expected results of the project, are:

Design a chatbot solution that integrates two models from the artificial intelligence domain: a generative model and a retrieval-based model.
Implement a chatbot capable of generating responses that empathize with the suspect and manipulate them into revealing useful information.
Profile suspects (through sentiment analysis and emotion classification) using the data obtained from their interactions with the bot.

How to Install & Set Up

System Requirements

	Minimum	Recommended
CPU:	2 cores; 1.70 GHz; 3 MB cache	4 cores; 2.80 GHz; 6 MB cache
Memory:	8 GB RAM	12 GB RAM
Free Disk Space:	2 GB	2 GB
Operating System:	Windows 7/8/10	Windows 7/8/10
GPU:	None	Nvidia GeForce GTX 950
Software:	Python 3.6.2; pip; Google Chrome	Python 3.6.2; pip; CUDA Toolkit 10.1; Google Chrome
Network Access:	Yes	Yes

Required Packages

Before downloading or installing the project, make sure the following packages are installed:

Package Name	Description and Functionality
selenium	Provides the Selenium WebDriver API, which lets the project control a browser natively.
tensorflow	Machine learning library used to build the recurrent neural network (RNN) behind the generative conversational agent.
pandas	Data analysis library used to read and manage data structures such as CSV files and DataFrames.
sklearn	Machine learning library (the pip package is named scikit-learn) that complements TensorFlow with classic ML algorithms, such as the Bayesian classifier in the opinion classification model.
nltk	Natural Language Toolkit, used for word tokenization in the different models.
bs4	Beautiful Soup (the pip package is named beautifulsoup4) extracts data from HTML files; the project uses it to extract the contents of a GitHub repository that maps English slang to formal language.
python-aiml	Inference engine that reads AIML (XML-based) files, which hold the entire knowledge base of the retrieval-based conversational module.

The recommended way to install any of these packages is pip; replace package-name with the name of the package:

pip install package-name
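After installing, a quick way to confirm that Python can find every package is the minimal sketch below. It is not part of the project, and it assumes the usual import names, which differ from some pip names (scikit-learn is imported as `sklearn`, beautifulsoup4 as `bs4`, python-aiml as `aiml`). `importlib.util.find_spec` only locates a module; it does not import it.

```python
import importlib.util
from typing import Iterable, List

# Assumed import names of the required packages (pip names differ for some:
# scikit-learn -> sklearn, beautifulsoup4 -> bs4, python-aiml -> aiml).
REQUIRED_MODULES = ["selenium", "tensorflow", "pandas", "sklearn", "nltk", "bs4", "aiml"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```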

Before using the project

To make sure every feature works properly, follow the instructions below before using the project.

The ACA module contains all of the conversational agent's logic. To use it, the model must be trained, which means the trained model files must be present in the ./ACA/Data/Result folder (the Model Training step below lists the files to look for).

To generate them, either train the generative model (which can take a long time, depending on the training data you use) or use the already trained data that can be found here.

Software Modules Usage

The following sections explain how to use each module individually and get the most out of it.

Artificial Conversational Agent/Entity (ACA)

This conversational module can be used in two ways, depending on what you want to do:

Model Training: During this stage, the model is trained on the provided dataset and vocabulary. To train it, run the following command from the root of the project:

py ACA/chatbot/bottrainer.py

Keep in mind that training can take a long time, depending on the parameters you train with. These parameters can be modified in ACA/Data/Corpus/hparams.json. Likewise, you can change the training dataset by adding files in Question-Answer format to ACA/Data/Corpus/Argument0, following the cleaned data we used. Once training has begun, make sure that the following files are created in ACA/Data/Result; they are used for testing and prediction:

basic.data-00000-of-00001
basic.index
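To check whether training has already produced the two files listed above, a small helper like the one below can be run from the project root. It is a hypothetical sketch, not part of the repository: it only checks that these two file names exist and does not validate their contents.

```python
from pathlib import Path
from typing import List

# File names from the list above; only their presence is checked.
EXPECTED_RESULT_FILES = ["basic.data-00000-of-00001", "basic.index"]


def missing_result_files(result_dir: str = "ACA/Data/Result") -> List[str]:
    """Return the expected checkpoint files that are not present in result_dir."""
    folder = Path(result_dir)
    return [name for name in EXPECTED_RESULT_FILES if not (folder / name).is_file()]


if __name__ == "__main__":
    missing = missing_result_files()
    if missing:
        print("Model not trained yet; missing:", ", ".join(missing))
    else:
        print("Trained model files found.")
```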

Test / Inference: This stage provides a simple interface and a web-based one. Note that the files ACA/Data/Corpus/vocab.txt and ACA/Data/Rules/hot-startup.aiml are required for inference. To evaluate the model's performance, run the following command and wait until the ">" prompt appears:

py ACA/chatbot/botui.py
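The retrieval-based side of inference relies on AIML files such as hot-startup.aiml. AIML organizes knowledge into categories, each pairing an input pattern with a response template, and python-aiml loads these files. The dependency-free sketch below only illustrates that idea with exact patterns and a catch-all `*` category. It is not python-aiml's matching algorithm, which also handles wildcards inside patterns, `<srai>` redirects and more, and the sample AIML is invented for illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Invented sample; the real knowledge base lives in files such as
# ACA/Data/Rules/hot-startup.aiml.
SAMPLE_AIML = """<aiml version="1.0.1">
  <category><pattern>HELLO</pattern><template>Hi! How are you?</template></category>
  <category><pattern>HOW ARE YOU</pattern><template>Fine, thanks. And you?</template></category>
  <category><pattern>*</pattern><template>Tell me more.</template></category>
</aiml>"""


def load_categories(aiml_text: str) -> Dict[str, str]:
    """Map each category's pattern to its template text (plain-text templates only)."""
    root = ET.fromstring(aiml_text)
    categories = {}
    for category in root.iter("category"):
        pattern = category.findtext("pattern", default="").strip().upper()
        template = category.findtext("template", default="").strip()
        categories[pattern] = template
    return categories


def normalize(sentence: str) -> str:
    """Uppercase the input and drop punctuation, the way AIML patterns are written."""
    return " ".join(re.findall(r"[A-Za-z0-9']+", sentence)).upper()


def respond(categories: Dict[str, str], sentence: str) -> Optional[str]:
    """Return the exact-match template, else the '*' fallback, else None."""
    key = normalize(sentence)
    if key in categories:
        return categories[key]
    return categories.get("*")


if __name__ == "__main__":
    kb = load_categories(SAMPLE_AIML)
    print(respond(kb, "Hello!"))        # Hi! How are you?
    print(respond(kb, "I like music"))  # Tell me more.
```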

Sentiment Analysis Module

The sentiment analysis module provides two classifiers: an opinion classifier, developed in Python, and an emotion classifier, developed in R.

Opinion Classifier: From the root of the project, you can use the following functions in SA_Module/sentimentAnalysis.py, taking their expected inputs into account:

predict(data)              # data: a single response (sentence); returns its predicted opinion
predict_proba(data)        # same input as predict(), but returns the class probabilities instead
sa_measure(replies_file)   # replies_file: a text file of sentences/responses, each classified with probabilities
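To make the difference between predict and predict_proba concrete: the opinion classifier is described as a Bayesian model built with scikit-learn. The stdlib-only multinomial naive Bayes below is a swapped-in toy that mimics that interface and is trained on invented sentences. It is not the project's trained model, and its labels and features are illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (the project itself uses nltk for tokenization)."""
    return re.findall(r"[a-z']+", text.lower())


class ToyNaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing, exposing predict/predict_proba."""

    def fit(self, sentences: List[str], labels: List[str]) -> "ToyNaiveBayes":
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_scores(self, sentence: str) -> Dict[str, float]:
        """Log prior plus log likelihood of the known words, per label."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            words = self.word_counts[label]
            denom = sum(words.values()) + len(self.vocab)
            score = math.log(count / total)
            for w in tokenize(sentence):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((words[w] + 1) / denom)
            scores[label] = score
        return scores

    def predict_proba(self, sentence: str) -> Dict[str, float]:
        """Posterior probability of each label (normalized exp of the log scores)."""
        scores = self._log_scores(sentence)
        top = max(scores.values())
        exp = {label: math.exp(s - top) for label, s in scores.items()}
        norm = sum(exp.values())
        return {label: v / norm for label, v in exp.items()}

    def predict(self, sentence: str) -> str:
        """Most probable label."""
        proba = self.predict_proba(sentence)
        return max(proba, key=proba.get)


if __name__ == "__main__":
    model = ToyNaiveBayes().fit(
        ["i love this", "you are so nice", "i hate you", "this is awful"],
        ["positive", "positive", "negative", "negative"],
    )
    print(model.predict("you are nice"))        # positive
    print(model.predict_proba("you are nice"))  # probabilities summing to 1
```

predict returns a single label, while predict_proba returns the full distribution, which is what a measure over many replies needs in order to report probabilities.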

Emotion Classifier: The emotion module is implemented in R. You can use its trained model through EC_Module/emotional_classifier.py, which executes an R script to generate predictions for the sentences to be classified:

ec_measure(replies_file)   # replies_file: a text file of sentences; returns a list of labels, 1 = positive emotion, 0 = negative
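EC_Module/emotional_classifier.py is described as running an R script to produce its predictions. One common way to do that from Python is to call Rscript through subprocess, as in the sketch below. The script path, its single command-line argument, and its output format (0/1 labels on standard output) are hypothetical; the actual module may invoke R differently. The code avoids `capture_output` so that it also runs on Python 3.6.

```python
import subprocess
from typing import List

# Hypothetical script location and interface: the R script is assumed to take the
# replies file as its only argument and print one label (1 or 0) per sentence.
R_SCRIPT = "EC_Module/classify_emotions.R"


def build_command(replies_file: str, script: str = R_SCRIPT) -> List[str]:
    """Command line that runs the R script with Rscript (which must be on PATH)."""
    return ["Rscript", script, replies_file]


def parse_labels(stdout: str) -> List[int]:
    """Keep the '1'/'0' tokens as ints (1 = positive emotion, 0 = negative); ignore other output."""
    return [int(token) for token in stdout.split() if token in ("0", "1")]


def classify_with_r(replies_file: str) -> List[int]:
    """Run the R classifier and return its labels; raises CalledProcessError on failure."""
    result = subprocess.run(
        build_command(replies_file),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout as text
        check=True,
    )
    return parse_labels(result.stdout)
```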

Note that, in addition to returning the values above, both modules write a file with the opinion or emotion of each sentence. These files have the same name as the input file, with "_SA" or "_EC", respectively, added at the end.
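The behaviour of writing a companion output file next to the input can be sketched with the hypothetical helper below, where `classify` stands in for either classifier. Two details are assumptions: the suffix is placed before the file extension (replies_SA.txt for replies.txt), and each output row is "sentence<TAB>label".

```python
import os
from typing import Callable, List


def output_path(replies_file: str, suffix: str) -> str:
    """Insert the suffix before the extension, e.g. replies.txt -> replies_SA.txt (assumed)."""
    base, ext = os.path.splitext(replies_file)
    return base + suffix + ext


def measure(replies_file: str, classify: Callable[[str], str], suffix: str) -> List[str]:
    """Classify each non-empty line and write 'sentence<TAB>label' rows next to the input."""
    with open(replies_file, encoding="utf-8") as f:
        sentences = [line.strip() for line in f if line.strip()]
    labels = [classify(s) for s in sentences]
    with open(output_path(replies_file, suffix), "w", encoding="utf-8") as out:
        for sentence, label in zip(sentences, labels):
            out.write("{}\t{}\n".format(sentence, label))
    return labels
```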

Slang Preprocessing

This module provides the slang terms stored in a CSV file through the following function in Slangs/slangExtractor.py:

getSlangs()   # returns a map from each slang term to its formal-language representation
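Once a slang-to-formal map is available, it can be applied to chat text before classification. The sketch below assumes the map is a plain dict keyed by lowercase slang terms. The CSV loader also assumes a layout (two columns, slang then formal form, no header), and the sample entries are illustrative. Neither function is the project's own code.

```python
import csv
import re
from typing import Dict


def load_slang_csv(path: str) -> Dict[str, str]:
    """Read 'slang,formal' rows into a dict (assumed two-column layout, no header)."""
    with open(path, newline="", encoding="utf-8") as f:
        return {row[0].strip().lower(): row[1].strip()
                for row in csv.reader(f) if len(row) >= 2}


def expand_slang(text: str, slang_map: Dict[str, str]) -> str:
    """Replace each whole-word slang token with its formal form; keep everything else."""
    def replace(match):
        word = match.group(0)
        return slang_map.get(word.lower(), word)
    return re.sub(r"[A-Za-z0-9']+", replace, text)


if __name__ == "__main__":
    slangs = {"u": "you", "r": "are", "brb": "be right back"}
    print(expand_slang("brb, u r cool", slangs))  # be right back, you are cool
```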

Software General Usage

To use the software as a whole (integrating all of the modules above), run the following command from the root of the project.

Before doing so, however, make sure of the following:

In Google Chrome, keep your Telegram account logged in (this is where suspects will talk to you) and complete the anti-bot captcha on Omegle. Then close all Google Chrome processes from Task Manager.
In the file ./ACA/Data/Rules/illegalcontent/hotmaterial.aiml, replace every field containing the Telegram username (@alanaJe by default) with the username you are going to use.

py main.py
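The Telegram username change in hotmaterial.aiml can be scripted with the hypothetical helper below, which does a plain text replacement of the default @alanaJe handle. It assumes the file is UTF-8 and that every occurrence of the handle should be replaced, so review the file afterwards.

```python
from pathlib import Path

AIML_FILE = Path("ACA/Data/Rules/illegalcontent/hotmaterial.aiml")
DEFAULT_USERNAME = "@alanaJe"


def replace_username(new_username: str, path: Path = AIML_FILE,
                     old_username: str = DEFAULT_USERNAME) -> int:
    """Replace every occurrence of old_username in the file; return how many were replaced."""
    text = path.read_text(encoding="utf-8")
    count = text.count(old_username)
    if count:
        path.write_text(text.replace(old_username, new_username), encoding="utf-8")
    return count


if __name__ == "__main__":
    # Hypothetical handle: put your own Telegram username here.
    print(replace_username("@your_username"))
```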

References and Credits

C3-Sex: a Chatbot to Chase Cyber perverts (paper link coming soon)
ACA Generative Module: ChatLearner by Bo Shao
ACA Retrieval Module Intro: AIML basics - ALICE

jemurcia / c3-sex