attack_data's Introduction

Attack Data Repository 🧱

A Repository of curated datasets from various attacks to:

  • Easily develop detections without having to build an environment from scratch or simulate an attack.
  • Test detections, specifically Splunk's Security Content.
  • Replay or inject datasets into streaming pipelines to validate your detections in your production SIEM.

Anatomy of a Dataset 🧬

Datasets

Datasets are defined by a common yml structure with the following fields:

field         description
author        name of the author
date          last modified date
dataset       array of URLs where the hosted version of the dataset is located
description   describes the dataset in as much detail as possible
environment   markdown filename of the environment description (see Environments below)
technique     array of MITRE ATT&CK techniques associated with the dataset
references    array of URLs that reference the dataset
sourcetypes   array of sourcetypes contained in the dataset

For example:

author: Patrick Bareiss
date: '2020-10-08'
description: 'Atomic Test Results: Successful Execution of test T1003.002-1 Registry
  dump of SAM, creds, and secrets Return value unclear for test T1003.002-2 Registry
  parse with pypykatz Successful Execution of test T1003.002-3 esentutl.exe SAM copy '
environment: attack_range
technique:
- T1003.002
dataset:
- https://attack-range-attack-data.s3-us-west-2.amazonaws.com/T1003.002/attack-range-windows-domain-controller.json
- https://attack-range-attack-data.s3-us-west-2.amazonaws.com/T1003.002/windows-powershell.log
- https://attack-range-attack-data.s3-us-west-2.amazonaws.com/T1003.002/windows-security.log
- https://attack-range-attack-data.s3-us-west-2.amazonaws.com/T1003.002/windows-security_ssa.log
- https://attack-range-attack-data.s3-us-west-2.amazonaws.com/T1003.002/windows-sysmon.log
- https://attack-range-attack-data.s3-us-west-2.amazonaws.com/T1003.002/windows-system.log
references: 
  - https://attack.mitre.org/techniques/T1003/002/
  - https://github.com/redcanaryco/atomic-red-team/blob/master/atomics/T1003.002/T1003.002.md
  - https://github.com/splunk/security-content/blob/develop/tests/T1003_002.yml
sourcetypes: 
  - XmlWinEventLog:Microsoft-Windows-Sysmon/Operational
  - WinEventLog:Microsoft-Windows-PowerShell/Operational
  - WinEventLog:System
  - WinEventLog:Security

Typically, datasets generated by the attack_range dump feature are stored in a folder named after the MITRE ATT&CK technique and use the default filename dataset.yml. Otherwise, a custom folder and file name can be used.
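The field list and the example above can be turned into a quick sanity check before a dataset description is shared. The following is a minimal sketch, not part of the repository: it assumes the yml file has already been parsed into a Python dict (for instance with PyYAML's yaml.safe_load), and it assumes every field from the example is required. The names validate_dataset, REQUIRED_FIELDS and LIST_FIELDS are hypothetical.

```python
import re
from urllib.parse import urlparse

# Field names come from the example above. Treating all of them as
# required is an assumption made for this sketch.
REQUIRED_FIELDS = ("author", "date", "description", "environment",
                   "technique", "dataset", "references", "sourcetypes")
LIST_FIELDS = ("technique", "dataset", "references", "sourcetypes")
# MITRE ATT&CK technique IDs look like T1003 or T1003.002.
TECHNIQUE_ID = re.compile(r"^T\d{4}(\.\d{3})?$")


def as_list(value):
    """Return value if it is a list, otherwise an empty list."""
    return value if isinstance(value, list) else []


def validate_dataset(meta):
    """Return a list of problems found in a parsed dataset description."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in meta:
            problems.append(f"missing field: {field}")
    for field in LIST_FIELDS:
        if field in meta and not isinstance(meta[field], list):
            problems.append(f"{field} should be a list")
    for technique in as_list(meta.get("technique")):
        if not TECHNIQUE_ID.match(str(technique)):
            problems.append(f"unexpected technique id: {technique}")
    for field in ("dataset", "references"):
        for url in as_list(meta.get(field)):
            if urlparse(str(url)).scheme not in ("http", "https"):
                problems.append(f"{field} entry is not an http(s) URL: {url}")
    return problems


# Shortened version of the example above, already parsed into a dict.
example = {
    "author": "Patrick Bareiss",
    "date": "2020-10-08",
    "description": "Atomic Test Results for T1003.002",
    "environment": "attack_range",
    "technique": ["T1003.002"],
    "dataset": [
        "https://attack-range-attack-data.s3-us-west-2.amazonaws.com/T1003.002/windows-sysmon.log",
    ],
    "references": ["https://attack.mitre.org/techniques/T1003/002/"],
    "sourcetypes": ["XmlWinEventLog:Microsoft-Windows-Sysmon/Operational"],
}

for problem in validate_dataset(example):
    print(problem)
```

An empty result means the description has every expected field in the expected shape. The check says nothing about whether the hosted files actually exist.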

Environments

Environments describe where the dataset was collected. At the moment there are no specific restrictions, although we do have a simple template a user can start with here. The most common environment for datasets is the attack_range, since this is the tool that is used to generate attack datasets automatically.

Ingest Datasets 🍽

Most generated datasets are in JSON format. There are two simple ways to ingest them.

Into Splunk

  1. Download the dataset
  2. In Splunk Enterprise, go to Add Data -> Files & Directories and select the dataset
  3. Set the sourcetype to JSON
  4. Set SHOULD_LINEMERGE to false
  5. Explore your data

See a quick demo 📺 of this process here.
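The download step can be scripted, and a quick check before upload helps catch files that are not valid JSON. This is a minimal sketch using only the Python standard library. It assumes the dataset stores one JSON event per line, which this document does not state explicitly, and the helper names local_name, download and count_json_events are hypothetical.

```python
import json
import os
import urllib.request
from urllib.parse import urlparse


def local_name(url):
    """File name to save a dataset URL under (last path component)."""
    return os.path.basename(urlparse(url).path)


def download(url, dest_dir="."):
    """Download one dataset file and return the local path."""
    path = os.path.join(dest_dir, local_name(url))
    urllib.request.urlretrieve(url, path)
    return path


def count_json_events(path):
    """Return (parsable, unparsable) counts of non-empty lines.

    Assumes one JSON event per line; blank lines are skipped.
    """
    good = bad = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            try:
                json.loads(line)
                good += 1
            except json.JSONDecodeError:
                bad += 1
    return good, bad
```

Calling download() with one of the URLs listed under dataset in a .yml file, and then count_json_events() on the returned path, gives a rough idea of whether the file will ingest cleanly with the JSON sourcetype. A high unparsable count suggests the file is a plain log rather than JSON.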

Into DSP

The simplest way to send datasets into DSP is to use the scloud command-line tool, which is a requirement for the steps below.

  1. Download the dataset
  2. Ingest the dataset into DSP with the scloud command: cat attack_data.json | scloud ingest post-events --format json
  3. Build a pipeline that reads from the firehose, and you should see the events.
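When several dataset files need to be sent, the shell pipeline in step 2 can be wrapped in a small script. The sketch below feeds a file to a command on standard input, which is what cat file | command does. The scloud arguments are copied from step 2. The function name pipe_file is hypothetical, and the sketch assumes scloud is installed and already configured.

```python
import subprocess

# Command taken from step 2 above.
SCLOUD_INGEST = ("scloud", "ingest", "post-events", "--format", "json")


def pipe_file(path, command=SCLOUD_INGEST):
    """Feed the file at path to command on stdin and return its exit code."""
    with open(path, "rb") as handle:
        result = subprocess.run(list(command), stdin=handle)
    return result.returncode
```

A call such as pipe_file("attack_data.json") returns 0 when the command reports success. Passing a different command makes it possible to try the plumbing without a DSP tenant.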

Contribute Datasets 🥰

  1. Generate a dataset
  2. Upload the dataset to an S3 bucket or a similar hosting service
  3. Make a PR with the dataset .yml file under the corresponding MITRE ATT&CK technique folder

Note: the simplest way to generate a dataset to contribute is to launch your simulations in the attack_range, or attack the machines manually, and then dump the data using the dump function when you are done.

See a quick demo 📺 of the process to dump a dataset here.

To contribute a dataset, simply create a PR on this repository. For general instructions on creating a PR, see this guide.

Automatically generated Datasets ⚙️

This project takes advantage of automation to generate datasets using the attack_range. You can find details about this service in the attack_data_service sub-project folder.

License

Copyright 2020 Splunk Inc.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
