sealingtech / edcop Goto Github PK

Expandable Defensive Cyber Operations Platform

License: Apache License 2.0

Shell 3.76% Makefile 0.08% CSS 0.05% Ruby 0.16% Python 95.25% C 0.65% Dockerfile 0.04%

edcop's Introduction

EDCOP

The Expandable Defensive Cyber Operations Platform

NOTE: This is still in the prototype phase. While the tools work, there are some growing pains as well as known and possibly unknown defects.

EDCOP is a bootable ISO based on Centos 7. EDCOP will install all the necessary components required for deploying EDCOP-Tools from a repository. Current Tools that are supported natively (linked to associated GitHub repos):

Checkout this quick feature demo of EDCOP:

Overview

The EDCOP is a scalable cluster to deploy virtual network defense tools on. It is designed to install a kubernetes cluster that is purpose-built to deploy and manage tools for Defensive Cyber Operations (DCO), although it could be used for any NFVi or standard application.

EDCOP is designed to be a platform to deploy any CND tools. Once deployed you will have Bro, Suricata, ELK stack and other tools made available to you. Each tool has a seperate Github repository viewable here: https://github.com/sealingtech/

EDCOP is designed to work in a number of deployment scenarios including a single physical system as well as large cluster with a traffic load balancer.

Installation takes place by building a "master" node which is then used to build more "minion" nodes. Once this process is completed then it is possible to deploy tools in a scalable fashion.

Install Guide: https://github.com/sealingtech/EDCOP/blob/master/docs/installation_guide.rst

edcop's People

Contributors

Stargazers

Watchers

Forkers

jjung1 miked235 giraldo925 dlohin awesome-security cjunevicus mbaker97 jonesry cyb3rchi3f

edcop's Issues

Integrate Documentation into the platform

Documentation should be easily accessible within the platform itself. Best way accomplish this is to create a container that provides the docs as a URI location in NGINX ("/docs"), and utilize GitHub as the version control system for the docs. This way the docs correspond to a specific reason.

ReadTheDocs is the latest open-source standard for this process. They currently support a Docker container at the following location.

https://hub.docker.com/r/readthedocs/build/

TODO:

Research how ReadTheDocs applicatio pulls from VCS repos. Will the solution work properly in a non-internet connected system? (if not, update this ticket with another alternative that will)
Create Kubernetes deployment file for the container.
Update NGINX proxy with proper settings to support HTTPS & HTTP (redirection to ssl site) at /docs
Integrate container into offline build repo.

Improve the RPM build process

Spin up a Vagrant Centos image which them builds all of the RPM files to make this process more automated. All the RPM files will then be copied to the local user's directory.

Interface assignment is currently not consistent

The eth adapters are seemingly random. We must have a consistent naming scheme on the minions. My suggestion is to look for the Intel XL710 adapter and then use the port numbers to assign out the names. My suggested names are:

Port0 - net0
Port1 - inl0
Port2 - inl1
Port3 - pas0

This gives an idea of what each if for.

Below is some code to make this happen. I am not sure when this is best to execute. Possibly post kickstart or when it first comes up? My fear is that other services are relying on the old network names.

Distributed storage with PV provisioning

Currently, we are using NFS for the PV provisioning. This has a few disavdantages:

It only stores the data on a single host (no redundancy/single point of failure)
Doesn't take advantage of the bulk storage across the cluster.
No elasticity

Long-term we need to look into moving to a distributed storage system for persistent data. Ceph and GlusterFS are the current leading stable technologies within this space and both have support for Kubernetes. The solution should meet a few requirements:

Be fully containerized (nothing required of the host system)
Be able easily integrated into the node installation system so that the available storage seamlessly expands with the addition of new nodes.
Support auto-provisioning within kubernetes.

Please reply to this issue with potential solutions.

Set auto_expand_replicas option

In helm there should be an option to set the auto_expand_replicas. This will allow the replicas to scale as nodes are added. This will solve the issue of the index being yellow when only one node exists (because there is no where to place the replicas).

Place this option in the elasticsearchconfig section as this with a default of 0-2

elasticsearchConfig:
auto_expand_replicas: 0-2

For more details see:
https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html

Update generated Certificates for Chrome

Chrome doesn't accept the certificates that are generated at install-time. Suspected culprit is that there is x509v3 certificates without any Subject Alternative Names. More troubleshooting required to figure out other reasons.

Kubeapps not installing Custom Resource Definitions

When I install the networks (from configure sensors) using Kubeapps, the networks do not install. This works fine in regular helm. If you install the network from helm the kubeapps installs them fine. It is like it is having an issue installing CRDs it doesn't know about? not sure. I tried it with the 1.0.0alpha4 version and it did the same thing.

from kubeapps:

from helm:
==> v1/Network
NAME AGE
inline-1 0s
inline-2 0s
passive 0s

Move GlusterFS into a container

Currently, GlusterFS is being installed as part of the overall platform. It makes more sense for it to be installed as a DaemonSet-Pod so that it more easily scales when the cluster is started.

For more info, see:
https://blog.lwolf.org/post/how-i-deployed-glusterfs-cluster-to-kubernetes/

Create blank docker registry on Master

We have three options:

Create a blank registry in an RPM and deploy as part of the initial build. Once this is done, we would need to build the necessary Tools and push them to the blank registry.
BENEFITS: Easier to manage the nginx proxy forwarding and baseline configuration
DISADVANTAGES: Tools would need to be built and pushed to the registry every time that you rebuild.
Create a master RPM that deploys a registry with the baseline set of tools already
BENEFITS: Fully automated deployment of all tools.
DISADVANTAGES: Very large RPM means very large deployment image
Manage an external registry.
BENEFITS: Better ability to collaborate quickly on the tools.
DISADVANTAGES: Builds would have to have access to the registry to complete.

Enable KeyCloak

Enable Keycloak to the following Dashboards:

Kubernetes Dashboards
Kubeapps
Rook
Traefik

Update firstboot to install kubernetes v1.11

The version of Kubernetes is determined by the kubeadm command in the firstboot.sh script.

This should be update to use v1.11.

Update Docker to 17.03.2

EDCOP currently uses docker v17.09.0. Kubernetes recommends using Docker v17.03.x. kubeadm currently throws a warning about using an unsupported version of docker. Additionally there are no known selinux packages/rules available for docker >17.03.x.

Update EDCOP repo to pull and install docker v17.03.x.

The necessary RPMs can be found at:
Docker RPM
Docker SELINUX RPM

Nice to have listvf.sh

Add listvf program to /sbin

#!/bin/bash

NIC_DIR="/sys/class/net"
for i in $( ls $NIC_DIR) ;
do
if [ -d "${NIC_DIR}/$i/device" -a ! -L "${NIC_DIR}/$i/device/physfn" ]; then
declare -a VF_PCI_BDF
declare -a VF_INTERFACE
k=0
for j in $( ls "${NIC_DIR}/$i/device" ) ;
do
if [[ "$j" == "virtfn"* ]]; then
VF_PCI=$( readlink "${NIC_DIR}/$i/device/$j" | cut -d '/' -f2 )
VF_PCI_BDF[$k]=$VF_PCI
#get the interface name for the VF at this PCI Address
for iface in $( ls $NIC_DIR );
do
link_dir=$( readlink ${NIC_DIR}/$iface )
if [[ "$link_dir" == "$VF_PCI" ]]; then
VF_INTERFACE[$k]=$iface
fi
done
((k++))
fi
done
NUM_VFs=${#VF_PCI_BDF[@]}
if [[ $NUM_VFs -gt 0 ]]; then
#get the PF Device Description
PF_PCI=$( readlink "${NIC_DIR}/$i/device" | cut -d '/' -f4 )
PF_VENDOR=$( lspci -vmmks $PF_PCI | grep ^Vendor | cut -f2)
PF_NAME=$( lspci -vmmks $PF_PCI | grep ^Device | cut -f2).
echo "Virtual Functions on $PF_VENDOR $PF_NAME ($i):"
echo -e "PCI BDF\t\tInterface"
echo -e "=======\t\t========="
for (( l = 0; l < $NUM_VFs; l++ )) ;
do
echo -e "${VF_PCI_BDF[$l]}\t${VF_INTERFACE[$l]}"
done
unset VF_PCI_BDF
unset VF_INTERFACE
echo " "
fi
fi
done

Performance enhancements

performance optimizations.txt

Needed to come on at first boot.

Label nodes upon startup

Kubernetes controls which containers goes on which nodes by looking at the labels we put on each host and we're currently using the labels nodetype=master and nodetype=worker to distinguish between the two.
The syntax for labeling a node is as follows:
kubectl label nodes $nodename nodetype=master|worker --overwrite

In the future, we'd like to add more labels to further distinguish hosts from storage, sensors, master, worker, etc.

Integrate Multus and Calico v3.1 (latest release)

Relies on #44 for specifications

EDCOP currently integrates Multus v1.2 with Calico 3.0.4. After implementation of Multus v3.0, we should upgrade to the latest version of Calico to v3.1 (v3.1.3 as of this writing).

Additionally, integration of calico should be changed to utilize the Kubernetes API as the datastore. In the current version of EDCOP, we implement Calico with a separate etcd cluster dedicated to Calico. This was due to lack of network policies when using the kubernetes API store. This has been resolved in the latest version of Calico. An initial version of Calico with the Kubernetes API store was tested in a file called calico-multus-etcdless.yaml

According to the Calico Release Notes will provide a number of enhancements that may be beneficial to EDCOP:

IPVS instead of iptables implementation of kube-proxy
Global Network policies when using Kubernetes API as the datastore
A number of bug-fixes

Implement Multus v3.0 as DaemonSet

Relies on #44 for design specifics

EDCOP is designed to scale out horizontally as new nodes are added to the cluster. DaemonSets are generally used for this purpose for sensors, event stores, and datastores. Multus v3.0 release notes states that it can be implemented as a DaemonSet. It is assumed that this will be how we want it deployed within EDCOP to conform to our current implementation.

Allow Suricata to run as non-root

This will likely require some changes to Suricata C itself. We need it so that there is a configurable option to not setuid from root to the suricata user. We will define capabilities that will be needed for that application to run.

Enable SELinux in EDCOP

I believe that this is doable now in 1.14. Setting the Selinux Boolean:
container_manage_cgroup

is a start. We will probably need to do some exceptions for things like host paths as well.

Come up with a plan for making performance enhancements more generic

As we have seen, there is some work to get performance enhancements specific to the hardware, this includes various changes to the underlying OS as well as changes to the containers. It would be ideal if we could abstract out these settings. Even if we standardize on hardware, eventually we will need to "upgrade" We may also at some point have a situation where we have two classes of hardware working simultaneously.

vm.max_map_count=262144 must be set

Need to set this on all hosts running Elasticsearch:

sysctl -w vm.max_map_count=262144

make sure it comes up with boot.

Finish Makefiles in build

Makefiles don't currently work. They should support the following options:

make offline-config
make online-config
make iso
make all

Enable kubectl autocomplete on the master

Just a QOL update. This should be enabled on the master and stored in the root user's .bashrc:

yum install bash-completion -y
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc

Design Proposal for implementing rkt instead of docker

There have been some strides made in implementing CRI-O compliant runtime engines. CoreOS (now owned by Red Hat) created rkt as a container runtime that improves on the security and standardization over docker.

Since EDCOP generally standardizes on CentOS/RHEL, it may make sense to implement rkt as the container runtime engine for EDCOP. A design proposal would be necessary to outline the advantages/disadvantages of implementing rkt over docker, and how to do it.

Some differences are outline in this CoreOS Blog Post, although these are likely biased toward rkt.

Add elasticsearch user + directory permissions

For now, we're using a hostpath for elasticsearch and need to have permissions to write to that directory from the container. We run as user elasticsearch with uid 2000, so here's a simple script to create the user and give the dir to elasticsearch:

useradd -r -u 2000 elasticsearch
mkdir /EDCOP/bulk/esdata
chown elasticsearch:elasticsearch /EDCOP/bulk/esdata

This script should be run on all nodes (including the master).
There might be more users to create in the future until we find a better way to handle data.

FQDN not properly set in firstboot.sh script

Currently, the firstboot.sh script tries to determine the FQDN by using the following commands:

TESTHOSTNAME=(`hostname -A`)
if [[ ${TESTHOSTNAME[0]} = "localhost.localdomain" ]]
then
HOSTNAME="edcop-master.local"
else
HOSTNAME=${TESTHOSTNAME[0]}
fi

Using hostname -A provides unreliable results. A more realistic approach is simply to ask for the FQDN at installation time and modify the firstboot.sh appropriately.

Rook OSDs don't deploy to new hosts

Currently Rook only deploys to the master node. This prevents us from having replication. Need to figure out why this is happening.

Make index.number of shards an option

For clusters larger then 10 nodes we want to increase the number of shards for better performance.

Make a number_of_shards option configureable to the end user.

index.number_of_shards

Deploy helm after cluster initialization

We're starting to look at helm as a deployment solution for the pods we have. It's a simple service to setup and we should deploy it after the cluster is running.
Here's a basic script to install it on an existing cluster:

install helm.txt

Multus 3.0 Design Proposal

Multus v3.0 has implemented a lot of changes to conform with the Kubernetes SIG standards. v3.0 is not compatible with previous versions of Multus. EDCOP multi-networking will need to be overhauled to implement v3.0.

Research new version implementation details
Proposal: CNI binary implementation of Multus
Proposal: Modifications required to implement Calico with Multus v3.0

Research needed:

According to Multus v3.0 release notes, Multus can be "launched as a DaemonSet". Currently, we implement the binary as an RPM and deploy it on each node. Does the new version place the binary on each node via DaemonSet?
How is CNI-chaining implemented in Multus v3.0?
Previous iterations required us to heavily modify the default Calico yaml files and adapt for Multus v1.2. How is the new process defined for implementing a network under Multus?

Traefik UI not accessible after install

After install, traefiks ingress isn't accessible. This was after the sub domain change.

Upgrade EDCOP Platform to Kubernetes 1.14 and other related packages

Need to move to Kubernetes 1.14 and the following:

Multus v3.2
Rook 0.9.3
Kubeapps 1.3.1
Calico 3.6.1
cockpit-173-7.el7.centos.x86_64.rpm
Kubernetes Dashboard 1.10.1
Traefik 1.7.10

Disable swap

Per Kubernetes best practices, swap should be disabled host hosts.

Implement Local Storage Provisioner

In place of Host paths, local storage provisioner adds a number of new features.

https://kubernetes.io/docs/concepts/storage/volumes/#local

Add HugePages and configure OVS DPDK access

OpenVSwitch has been built with DPDK, but is not currently being used with DPDK.

Enabled hugepages in build scripts
Modify firstboot files to initialize DPDK with OVS

Consider performance enhancements

We need to consider these and if there are any impacts to the other tools. This was mostly from the septun enhancements. There was a few I skipped (like disable SR-IOV).

This requires a script from Intel's driver d/l, but I am not sure if we need all of the driver, or just the script. I d/l the driver in the script.

performance optimizations.txt

Add intel_iommu=pt and SR-IOV to physical build

SR-IOV should be enabled by default. Grub must be configured for intel_iommu=pt or intel_iommu=on.

This may be able to be done in the master and minion kickstart.