A collection of sources of documentation, and field best practices, to build/run a SOC.
These are my views, based on my own experience as a SOC/CERT analyst and team manager, as well as on well-known papers. The focus is more on SOC than on CERT.
NB: Generally speaking, SOC here refers to detection activity, and CERT/CSIRT to incident response activity.
- Must read
- Fundamental concepts
- Mission-critical means (tools/sensors)
- SOAR
- IT/security Watch (recommended sources)
- Detection engineering management
- Management
- HR and training
- IT architecture
- To go further (next steps)
- Appendix
- LetsDefend SOC analyst interview questions
- NIST, Cybersecurity framework
- FIRST, Building a SOC
- MITRE, 11 strategies for a world-class SOC, part 0 (Fundamentals)
- FIRST, CERT-in-a-box
- ENISA, Good practice for incident management
- NIST, SP800-86, integrating forensic techniques into IR
- CIS, 8 critical security controls
- NIST, SP800-61 rev2, incident handling guide
- MITRE, ATT&CK: Getting started
- ThreatConnect, SIRP / SOA / TIP benefits
- Gartner, Market Guide for Security Orchestration, Automation and Response Solutions
- Orange Cyberdefense, Feedback regarding experience with SOAR in 2020 (in French)
- Playbook for ransomware incident response (in French)
- FIRST, CVSS v3.1 specs
- OASIS Open, STIX
- FIRST, TLP (intelligence sharing and confidentiality)
See: SOC/CSIRT Basic and fundamental concepts.
As per CYRAIL's paper, here is an example of a detection architecture (SIEM, SIRP, TIP):
- SIEM:
- See Gartner magic quadrant
- My recommendations: Splunk, Elastic
- SIRP:
- e.g.: IBM Resilient, TheHive, SwimLane
- SOA:
- My recommendations: IBM Resilient, SwimLane, TheHive, PAN Cortex XSOAR
- TIP:
- My recommendations: MISP, OpenCTI, Sekoia.io, ThreatQuotient
- don't forget the needed feeds (community / paid ones)
- My recommendations for paid ones: ESET, Sekoia.io, Mandiant, RecordedFuture...
- My recommendations for community ones: MISP default feeds list, ISAC, OTX, the Covert.io list.
- Antimalware:
- See Gartner magic quadrant
- My recommendations: Microsoft Defender, BitDefender, ESET Nod32.
- Endpoint Detection and Response:
- See Gartner magic quadrant
- My recommendations: SentinelOne, Microsoft Defender for Endpoint, Harfanglab.
- Secure Email Gateway (SEG):
- See Gartner reviews and ratings
- My recommendations: Microsoft Defender for Office365, ProofPoint, Mimecast
- Secure Web Gateway (SWG) / Security Service Edge:
- see Gartner magic quadrant
- My recommendations: BlueCoat, CISCO, Zscaler, Netskope.
- AD security (audit logs, or specific security monitoring solutions):
- My recommendations: Semperis or PingCastle
- ASM: Asset Security Monitoring / Attack Surface Management:
- My recommendations: Intrinsec (in French), Mandiant
- CASB: Cloud Access Security Broker, if company's IT environment uses a lot of external services like SaaS/IaaS:
- See Gartner magic quadrant
- My recommendations: Microsoft MCAS, Zscaler, Netskope.
- Deceptive technology:
- My recommendation: implement AD decoy accounts
- On-demand volatile data collection tool:
- My recommendations: VARC, DFIR-ORC, FireEye Redline
- On-demand sandbox:
- My recommendations: Joe's sandbox, Hybrid Analysis, etc.
- Forensics and reverse-engineering tools suite:
- My recommendations: SIFT Workstation, or Tsurugi
- My recommendation for reverse engineering: FireEye Flare-VM
- Incident tracker:
- My recommendation: Timesketch
- Scanners:
- IOC scanners:
- Offline antimalware scanners:
- My recommendation: Windows Defender Offline
- Ticketing system:
- My recommendation: GitLab
- Knowledge sharing and management tool:
- My recommendations: Microsoft SharePoint, Wiki (choose the one you prefer, or use GitLab as a Wiki).
As per Gartner definition:
Hence 3 critical tools (see below): SIRP, TIP, SOA, on top of SIEM.
And in my view, SOAR is more an approach, a vision, based on technology and processes, than a technology or tool per se.
- Online automated hash checker:
- Online automated sample analyzer:
- My recommendation: malwoverview
- (Pure) Windows task automation:
- My recommendation: AutoIT
- SaaS-based (and partly free, for basic stuff) SOA:
Try to implement at least the following automations, leveraging the SOA/SIRP/TIP/SIEM capabilities:
- Make sure all the context from any alert is automatically transferred to the SIRP ticket, with a link to the SIEM alert(s) where applicable.
- Leverage APIs (through the SOA) where needed to retrieve context that the built-in integrations do not provide.
- Automatically query the TIP for any artefact or IOC associated with a SIRP ticket.
- Automatically retrieve the history of antimalware detections for a user and/or endpoint associated with a SIRP ticket.
- Automatically retrieve the history of SIEM detections for a user and/or endpoint associated with a SIRP ticket.
- Automatically retrieve the history of SIRP tickets for a user and/or endpoint associated with a new SIRP ticket.
- Automatically query AD or the asset management solution for artefact enrichment (user, endpoint, IP, application, etc.).
- Block an IP on all firewalls (including VPN), and SWG.
- Block a URL on SWG.
- Block an email address (sender) on SEG.
- Block an exe file (by hash) on endpoints (leveraging EDR, Sysmon, or AppLocker).
- Reset an AD account password.
- Disable an AD account (both user and computer, since disabling the computer account blocks authentication with any AD account on the endpoint, thus preventing lateral movement or privilege escalation).
- Report a (undetected) sample to security vendors, via email.
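As a rough illustration of the enrichment automations listed above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Ticket` fields, the `query_tip` and `ticket_history` helpers, and the in-memory `KNOWN_IOCS` / `PAST_TICKETS` data stand in for real TIP and SIRP connectors, which would normally be called through the SOA. It only shows the pattern: each new ticket goes through a set of independent enrichers, and a failing connector does not block the others.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Ticket:
    """Hypothetical SIRP ticket; fields are illustrative, not a real SIRP schema."""
    ticket_id: str
    user: str
    endpoint: str
    artefacts: List[str]
    siem_alert_url: str
    enrichment: Dict[str, object] = field(default_factory=dict)


# In-memory stand-ins for real connectors (TIP, SIRP).
KNOWN_IOCS = {"203.0.113.10": "C2 server (example feed)"}
PAST_TICKETS = [
    {"id": "T-1", "user": "alice", "endpoint": "PC-042"},
    {"id": "T-2", "user": "bob", "endpoint": "PC-007"},
]


def query_tip(ticket: Ticket) -> Dict[str, str]:
    """Return the ticket artefacts that the (stub) TIP knows as IOC."""
    return {a: KNOWN_IOCS[a] for a in ticket.artefacts if a in KNOWN_IOCS}


def ticket_history(ticket: Ticket) -> List[str]:
    """Return IDs of earlier tickets for the same user or endpoint."""
    return [
        t["id"] for t in PAST_TICKETS
        if t["user"] == ticket.user or t["endpoint"] == ticket.endpoint
    ]


# Hypothetical registry: enrichment name -> function producing it.
ENRICHERS: Dict[str, Callable[[Ticket], object]] = {
    "tip_matches": query_tip,
    "ticket_history": ticket_history,
}


def enrich(ticket: Ticket) -> Ticket:
    """Run every enricher; one failing connector must not block the others."""
    for name, enricher in ENRICHERS.items():
        try:
            ticket.enrichment[name] = enricher(ticket)
        except Exception as exc:  # keep the error visible to the analyst
            ticket.enrichment[name] = f"enrichment failed: {exc}"
    return ticket


ticket = enrich(Ticket(
    ticket_id="T-3",
    user="alice",
    endpoint="PC-099",
    artefacts=["203.0.113.10", "example.org"],
    siem_alert_url="https://siem.example/alerts/123",
))
print(ticket.enrichment)
```

Containment actions (blocking an IP, disabling an account) could be registered the same way, but would typically require an analyst's approval step before running.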
- SIEM rules publications:
- Threat intel sources:
- Known exploited vulnerabilities:
- LinkedIn / Twitter:
- RSS reader/portal:
- e.g.: Netvibes
- Government CERT, industry sector related CERT...
- Other interesting websites:
- e.g.: ISC, ENISA, ThreatPost ...
- No real need for tiering (L1/L2/L3)
- this is an old model for service providers, not necessarily for a SOC!
- as per MITRE paper (p65):
In this book, the constructs of “tier 1” and “tier 2+” are sometimes used to describe analysts who are primarily responsible for front-line alert triage and in-depth investigation/analysis/ response, respectively. However, not all SOCs are arranged in this manner. In fact, some readers of this book are probably very turned off by the idea of tiering at all [38]. Some industry experts have outright called tier 1 as “dead” [39]. Once again, every SOC is different, and practitioners can sometimes be divided on the best way to structure operations. SOCs which do not organize in tiers may opt for an organizational structure more based on function. Many SOCs that have more than a dozen analysts find it necessary and appropriate to tier analysis in response to these goals and operational demands. Others do not and yet still succeed, both in terms of tradecraft maturity and repeatability in operations. Either arrangement can succeed if by observing the following tips that foreshadow a longer conversation about finding and nurturing staff in “Strategy 4: Hire AND Grow Quality Staff.”
Highly effective SOCs enable their staff to reach outside their assigned duties on a routine basis, regardless of whether they use “tier” to describe their structure.
- Three different teams are needed:
- security monitoring team (which does the actual "job" of detecting security incidents, and is fully autonomous)
- security monitoring engineering team (which fixes/improves security monitoring, such as SIEM rules and SOA playbooks, and generates reports)
- build / project management team (which does tools integration, SIEM data ingestion, specific DevOps tasks, project management).
- Designate among team analysts:
- triage officer;
- incident handler;
- incident manager;
- deputy CERT manager.
- Generally speaking, follow best practices as described in ENISA's paper ("Good practice for incident management", see "Must read")
- Use MITRE ATT&CK
- Document all detections (SIEM Rules, etc.) using MITRE ATT&CK ID, whenever possible.
- Implement an information model, like the Splunk CIM one:
- do not hesitate to extend it, depending on your needs
- make sure this datamodel is being implemented in the SIEM, SIRP, SOA and even TIP.
- Document an audit policy that is tailored to the detection needs/expectations of the SOC:
- The document aims to answer a generic question: what to audit/log, on which equipment/OSes/services/apps?
- Take the [Yamato Security work](https://github.com/Yamato-Security/EnableWindowsLogSettings#smbclient-security-log-2-sigma-rules) as an example of an audit policy required for the Sigma community rules.
- Document a detection strategy, tailored to the needs and expectations regarding the SOC capabilities.
- The document will aim to list the detection rules (SIEM searches, for instance), with key examples of results, and an overview of handling procedures.
- Run purple teaming sessions on a regular basis!
- e.g.: Intrinsec, FireEye
- To do it on your own, recommended tool: Atomic Red Team
- Picture the currently confirmed detection capabilities thanks to purpleteaming, with tools based on ATT&CK:
- e.g.: Vectr
- Use Security Stack Mappings to picture detection capabilities for a given security solution/environment (like AWS, Azure, NDR, etc.):
- Generate ATT&CK heatmaps, to picture the SOC detection capabilities
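To make the heatmap idea more concrete, here is a small Python sketch that counts detection rules per ATT&CK technique and writes the result as a layer-like JSON document. The `DETECTIONS` catalogue and the `build_layer` helper are hypothetical. The output loosely follows the ATT&CK Navigator layer format (`techniques` entries with `techniqueID` and `score`); treat the exact field set as an assumption and check it against the Navigator documentation for your version before importing.

```python
import json
from collections import Counter

# Hypothetical detection catalogue: rule name -> ATT&CK technique IDs it covers.
DETECTIONS = {
    "Suspicious PowerShell download": ["T1059.001", "T1105"],
    "Encoded PowerShell command": ["T1059.001"],
    "New local admin account": ["T1136.001"],
}


def build_layer(detections, name="SOC detection coverage"):
    """Count rules per technique and return a Navigator-style layer dict."""
    counts = Counter(t for techniques in detections.values() for t in techniques)
    return {
        "name": name,
        "domain": "enterprise-attack",
        "techniques": [
            {"techniqueID": tid, "score": n, "comment": f"{n} rule(s)"}
            for tid, n in sorted(counts.items())
        ],
        # Colour scale from "no rule" (white) to "most rules" (blue).
        "gradient": {
            "colors": ["#ffffff", "#66b1ff"],
            "minValue": 0,
            "maxValue": max(counts.values(), default=1),
        },
    }


layer = build_layer(DETECTIONS)
print(json.dumps(layer, indent=2))
```

Feeding the function only with rules confirmed during purple teaming sessions would give a heatmap of validated coverage rather than declared coverage.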
- Read the SOC Cyber maturity model from CMM
- Run the SOC-CMM self-assessment tool
- Read the OpenCSIRT cybersecurity maturity framework from ENISA
- Run the OpenCSIRT, SIM3 self-assessment
- Read the SOC-CMM 4CERT from CMM
- Generate metrics, leveraging the SIRP capabilities to do so:
- top security incident types
- top applications associated to alerts (detections)
- top detection rules triggering most false positives
- top detection rules taking the longest to be handled
- number of false positives
- top 10 SIEM searches (detection rules) triggering false positives
- number of new detection use-cases being put in production.
- number of detection rules whose detection capability and handling process have been confirmed in a purple teaming session so far
- most frequently seen TTPs in detections
- most common incident types
- mean time to triage (assign) the alerts
- mean time to handle (verify and be ready for incident response) the alerts
- top 10 longest tickets before closure
- percentage of SIEM data that is not associated to SIEM searches (detection rules)
Recommended timeframes to compute these KPIs: 1 week, 1 month, and 6 months.
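As an illustration, two of these metrics (mean time to triage, and top rules triggering false positives) could be computed from a SIRP ticket export along the following lines. This is only a sketch: the `TICKETS` records, their field names, the `END` reference date and the helper functions are all assumptions made for the example, and a real SIRP export will look different.

```python
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical SIRP export; field names are illustrative.
TICKETS = [
    {"rule": "A", "created": "2024-01-10T09:00", "assigned": "2024-01-10T09:10", "false_positive": True},
    {"rule": "A", "created": "2024-01-12T14:00", "assigned": "2024-01-12T14:30", "false_positive": True},
    {"rule": "B", "created": "2024-01-13T08:00", "assigned": "2024-01-13T08:20", "false_positive": False},
    {"rule": "B", "created": "2023-12-20T10:00", "assigned": "2023-12-20T11:00", "false_positive": True},
]
END = datetime(2024, 1, 15)  # assumed reporting date


def window(tickets, end, days):
    """Keep tickets created during the `days` days before `end`."""
    start = end - timedelta(days=days)
    return [t for t in tickets
            if start <= datetime.fromisoformat(t["created"]) < end]


def mean_time_to_triage(tickets):
    """Mean delay, in minutes, between ticket creation and assignment."""
    delays = [
        (datetime.fromisoformat(t["assigned"])
         - datetime.fromisoformat(t["created"])).total_seconds() / 60
        for t in tickets
    ]
    return mean(delays) if delays else None


def top_false_positive_rules(tickets, n=10):
    """Rules ranked by number of tickets closed as false positive."""
    return Counter(t["rule"] for t in tickets if t["false_positive"]).most_common(n)


# 7 and 30 days approximate the 1-week and 1-month timeframes.
for days in (7, 30):
    selection = window(TICKETS, END, days)
    print(days, mean_time_to_triage(selection), top_false_positive_rules(selection))
```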
- BlueTeamLabs (level 1 & 2)
- SANS 555: SIEM with tactical analytics
- SANS SEC450: Blue Team Fundamentals: Security Operations and Analysis
- OSDA SOC-200
- SOC & SIEM Security program: L1, L2, L3
- Splunk Core User
- Microsoft Cybersecurity Architect
- AWS Security Fundamentals
- CEH
- SANS FOR572: Advanced Network Forensics: Threat Hunting, Analysis, and Incident Response
- Splunk Core User
- GCIH
- SANS FOR508: Advanced Incident Response, Threat Hunting, and Digital Forensics
- SANS 555: SIEM with tactical analytics
- MITRE, 11 strategies for a world-class SOC (remainder of the PDF)
- ANSSI (FR), EBIOS RM methodology
- Microsoft, SOC/IR hierarchy of needs
- CISA, Cyber Defense Incident Responder role
- Betaalvereniging, TaHiTI (threat hunting methodology)
- GMU, Improving Social Maturity of Cybersecurity Incident Response Teams
- FireEye, Purple Team Assessment
- FireEye, OpenIOC format
- Kaspersky, AV / EP / EPP / EDR / XDR
- NIST, SP800-53 rev5 (Security and Privacy Controls for Information Systems and Organizations)
- Amazon, AWS Security Fundamentals
- Microsoft, PAW Microsoft
- CIS, Business Impact Assessment
- Abdessabour Boukari, RACI template (in French)
- Trellix, XDR Gartner market guide
- Elastic, BEATS agents
- V1D1AN's Drawing: architecture of detection,
- RFC2350 (CERT description)
- Awesome Security Resources
- Incident Response & Computer Forensics, 3rd ed
- GDPR cybersecurity implications (in French)
- SANS SOC survey 2022
- Honeypot:
- My recommendation: Canary.tools
- NDR:
- My recommendation: Gatewatcher
- MDM:
- My recommendation: Microsoft Intune
- DLP:
- Network TAP:
- My recommendation: Gigamon
- Define SOC priorities, with feared events and offensive scenarios (TTP) to be monitored, as per risk analysis results.
- My recommendation: leverage EBIOS RM methodology (see above).
- Leverage machine learning wherever it yields a good ratio of true positives to false positives.
- My recommendation: be careful, try not to saturate SOC consoles with FP.
- Make sure to follow the 11 strategies for a (world class) SOC, as per MITRE paper (see Must Read).
- Publish your RFC2350, declaring what your CERT is (see "Nice to read" above)
- Implement hardening measures on SOC workstations, servers, and IT services that are used (if possible).
- Put the SOC assets in a separate AD forest (the forest being the AD security boundary), so that they stay isolated if the enterprise's IT is globally compromised.
- Create/provide a disaster recovery plan for the SOC assets and resources.
Yann F., Wojtek S., Nicolas R., Clément G., Alexandre C., Jean B., Frédérique B., Pierre d'H., Julien C., Hamdi C., Fabien L., Michel de C., Gilles B., Olivier R., Jean-François L., Fabrice M., Pascal R., Florian S., Maxime P., Pascal L., Jérémy d'A., Olivier C. x2, David G., Guillaume D., Patrick C., Lesley K., Gérald G., Jean-Baptiste V., Antoine C. ...