This repository contains source code, notebooks, and models for the "A Machine Learning-based Malicious Payload Detection and Classification Framework for New Web Attacks" paper.
- Arguably the most comprehensive web attack (payload) dataset of the time, which should benefit every security researcher and company working on web application attacks.
- A "minimalist" customized framework to do your payload machine learning projects
- A starting place for mixing machine learning and cybersecurity
- Experimental code
- Only traditional ML models
- Not battle-tested
Here is how our simple framework directories are structured
- sources: sources we used to generate the final dataset
- final: crafted final datasets from sources
- dataset.py: dataset generator
- server.py: bare minimal async HTTP webserver to collect and detect attacks
- cli.py: contains model loading and prediction examples. please note it uses the base model not the optimized one by default.
contains saved models from our experiments. Use with caution as you may need to rebuild your models.
- EDA: exploratory data analysis
- Experiments: the payload detection logic and experiments.You can find some thoughts and ideas in commented codes as well.
- the academic paper behind this work
We suggest using Google Colab or Anaconda to manage dependencies; otherwise, installing the required packages might become overwhelming.
This framework can also act as a boilerplate for similar research. You can now use the same structure to easily manage your research.
example of payloads and base (not optimized) RandomForest model.
test_payloads = ["system(ls -la)", "xss", "crlf", "xxe", "sqli", "injection", "../../../../wow", "passwd", "etc", "onmouseover",
"<<<>>>>",
"onload", "192.168.1.100:9887", "127.0.0.1", "10.255.255.255", "host", "localhost"
"10.0.0.0",
"172.31.255.255", "192.168.255.255", "192.168.0.0", "172.16.0.0", "wait", "count",
"select", "google", "google.com", "www.google.com", "alert", "alert(1)"
"bin", "bash",
"curl", "where", "char", "exec", "cgi", "extractvalue", "1", "2", "3"
"tftp",
"192.168", "192.", "127.", "'", "<>",
"<a>example.com</a>", "cmd", "<>@!@#$%^&*()_+", "<b>example.com<<>>@",
"Mozilla/5.0 (X11; OpenBSD i386) AppleWebKit/537.36 (KHTML, like Gecko) "
"Chrome/36.0.1985.125 Safari/537.36" , "../../etc/passwd" , "and 1=1--" , "<svg/onload=alert(0)"]
S. Ramezany, R. Setthawong and T. Tanprasert, "A Machine Learning-based Malicious Payload Detection and Classification Framework for New Web Attacks," 2022 19th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON), 2022, pp. 1-4, doi: 10.1109/ECTI-CON54298.2022.9795455.
- payloadidentifier
- OWASP coreruleset
- PayloadAllTheThings
- Burp XSS cheat sheet
- ML-based WAF
- WAF Dataset
** For academic refrences, please check the paper.