SPEC5G

This repository contains the code and data of the paper titled "SPEC5G: A Dataset for 5G Cellular Network Protocol Analysis" which is under review.

SPEC5G is a dataset for analyzing the natural-language specifications of 5G cellular network protocols. It contains 3,547,587 sentences (134M words) drawn from 13,094 cellular network specifications and 13 online websites. Leveraging large-scale pre-trained language models, which have achieved state-of-the-art results on ML-based natural language processing (NLP) tasks, we have used this dataset for two tasks: security-related text classification and summarization. Security-related text classification can be used to extract relevant security-related properties for protocol testing. Summarization, in turn, can help developers and practitioners understand the protocol at a high level, which is itself a daunting task.

SPEC5G is the first-ever public 5G dataset for NLP research on network security.

Updates

  • The pretrained models are now available for download.

Datasets

Download the dataset from here. This includes:

  • Our original 134M-word training corpus (Gold_5G_v4.0.zip)
  • 5GSum - Summarization Dataset (simplification_dataset.csv)
  • 5GSC - Classification Dataset (5GSC.csv)
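Once the files above are downloaded, a quick way to get oriented is to list what is inside the corpus archive and inspect the CSV headers before writing any training code. The snippet below is a minimal sketch using only the Python standard library. The column layout of 5GSC.csv and simplification_dataset.csv is not documented here, so the `label_column` argument is a hypothetical placeholder: check `csv_header()` first and pass the real column name. The helper function names are likewise illustrative, not part of this repository.

```python
import csv
import zipfile
from collections import Counter
from pathlib import Path


def list_corpus_files(zip_path):
    """Return the member names inside a zip archive such as Gold_5G_v4.0.zip."""
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()


def csv_header(csv_path):
    """Return the header row of a CSV file such as 5GSC.csv.

    utf-8-sig also strips a leading byte-order mark if one is present.
    """
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return next(csv.reader(f))


def label_counts(csv_path, label_column):
    """Count rows per distinct value of `label_column`.

    `label_column` is an assumption: the actual column name in 5GSC.csv
    is not documented here, so read it from csv_header() first.
    """
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        return Counter(row[label_column] for row in csv.DictReader(f))
```

For example, `csv_header(Path("5GSC.csv"))` shows the available columns, and `label_counts(Path("5GSC.csv"), "<label column>")` then gives the class balance of the classification dataset.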

Models

The pretrained model checkpoints can be found below:

Dependencies

Training & Evaluation

Citation

If you use this dataset or code modules, please cite the following paper:

@misc{https://doi.org/10.48550/arxiv.2301.09201,
  doi = {10.48550/ARXIV.2301.09201},
  url = {https://arxiv.org/abs/2301.09201},
  author = {Karim, Imtiaz and Mubasshir, Kazi Samin and Rahman, Mirza Masfiqur and Bertino, Elisa},
  keywords = {Information Retrieval (cs.IR), Cryptography and Security (cs.CR), Machine Learning (cs.LG), FOS: Computer and information sciences},
  title = {SPEC5G: A Dataset for 5G Cellular Network Protocol Analysis},
  publisher = {arXiv},
  year = {2023},
  copyright = {Creative Commons Attribution 4.0 International}
}