
mil_urlembedding's Introduction

URL Embedding Using MIL Algorithm

Paper: Bag-of-Characters: A Multiple Instance Learning Framework for URL Embedding in Web Security

Project Overview

This project implements a Multiple Instance Learning (MIL) approach to URL embedding. Each URL is treated as a bag of character-level instances and transformed into a fixed-length vector that captures both its semantic and structural features. The goal is to improve the detection of malicious web activity.

Modules

The project is divided into three main modules:

  1. data_preprocessing.py - Handles the loading and initial processing of URL data from CSV files.
  2. feature_extraction.py - Manages the transformation of URLs to vector representations using position encoding and normalizes these vectors using a MIL-based strategy.
  3. main.py - Orchestrates the training process, applies KMeans clustering, computes miVLAD vectors, and saves the results.
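The flow across the three modules can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual code: the function names `url_to_bag`, `fit_codebook`, and `mivlad`, the two-dimensional per-character encoding, and the small NumPy k-means (a stand-in for the KMeans step) are all hypothetical, and the example URLs are made up.

```python
import numpy as np


def url_to_bag(url: str, max_len: int = 200) -> np.ndarray:
    """Turn a URL into a bag of instances, one per character.

    Each instance is [scaled character code, relative position].
    This encoding is an assumption made for illustration only.
    """
    chars = url[:max_len]
    n = len(chars)
    codes = np.array([min(ord(c), 127) / 127.0 for c in chars])
    positions = np.arange(n) / max(n - 1, 1)
    return np.column_stack([codes, positions])


def fit_codebook(instances: np.ndarray, k: int, n_iter: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means over all instances; returns k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = instances[rng.choice(len(instances), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(instances[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = instances[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return centers


def mivlad(bag: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """VLAD-style bag embedding: per-center sum of residuals, then normalization."""
    k, d = centers.shape
    vec = np.zeros((k, d))
    if len(bag):
        dists = np.linalg.norm(bag[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = bag[labels == j]
            if len(members):
                vec[j] = (members - centers[j]).sum(axis=0)
    flat = vec.ravel()
    flat = np.sign(flat) * np.sqrt(np.abs(flat))  # signed square root
    norm = np.linalg.norm(flat)
    return flat / norm if norm > 0 else flat  # L2 normalization


# Made-up example URLs, used only to exercise the sketch.
urls = [
    "http://example.com/index.html",
    "http://example.com/login.php?id=1",
    "https://sub.example.org/a/b/c",
]
bags = [url_to_bag(u) for u in urls]
centers = fit_codebook(np.vstack(bags), k=4)
embeddings = np.array([mivlad(b, centers) for b in bags])
print(embeddings.shape)  # (3, 8): one vector of length k * d per URL
```

Every URL maps to a vector of the same length (number of clusters times instance dimension) regardless of how long the URL is, which is what lets a standard classifier consume the embeddings downstream.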

Getting Started

To get started with this project, clone the repository and install the required dependencies:

git clone https://github.com/chiachen-chang/mil_urlembedding
cd mil_urlembedding
pip install -r requirements.txt

Usage

Run main.py to start the pipeline:

python main.py

Contributing

We welcome contributions from the community, including feature requests, improvements, and bug fixes. Please fork the repository and submit a pull request for review.

Discussion and Learning

We encourage everyone to take part in discussion and learning around this project. If you have questions, suggestions, or insights, please feel free to open an issue.

Let's collaborate to make URL embedding even more effective and secure!
