- The golden rule: everything must be managed as code.
- Measure, measure and measure everything. Use an "Observability" pattern solution. It is all about control.
- Use MOB (mob programming) as the development process. This brings knowledge sharing and quality into the teams.
In continuous delivery (CD), any infrastructure or application component can be released at any time, independently of the others. Deployment happens automatically via Spinnaker/Jenkins once the code is pushed to Git. This means that uncontrolled breaking changes are strictly forbidden.
Complying with the agreed contract between producers and consumers is the soul of building software landscapes.
All integration points should have a contract. The contracts should be managed in an API management system, e.g. an API gateway, Apigee or MuleSoft.
This will enable a governance layer over all integrations.
Examples of integrations:
- Application Services (REST, SOAP,...)
- Database integrations such as stored procedures
- Batches
- Data flows
- File transfers
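
A contract only protects consumers if it is checked automatically. The sketch below is a minimal illustration, not a prescribed tool. It validates a producer payload against a hypothetical `ORDER_CONTRACT_V1`, which maps field names to Python types. Added fields are treated as non-breaking, while missing or retyped fields are treated as breaking. In a real landscape the contract would live in the API management system (e.g. as an OpenAPI spec), and the check would run as a quality gate in the pipeline.

```python
from typing import Any

# Hypothetical contract: field name -> expected Python type.
ORDER_CONTRACT_V1: dict[str, type] = {
    "order_id": str,
    "amount": float,
    "currency": str,
}


def contract_violations(payload: dict[str, Any], contract: dict[str, type]) -> list[str]:
    """Return human-readable violations; an empty list means the payload complies."""
    violations = []
    for field, expected_type in contract.items():
        if field not in payload:
            violations.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            actual = type(payload[field]).__name__
            violations.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return violations


# Extra fields are allowed (additive change); removed or retyped fields are breaking.
ok = {"order_id": "A1", "amount": 9.5, "currency": "EUR", "note": "new optional field"}
broken = {"order_id": "A1", "amount": "9.5"}
print(contract_violations(ok, ORDER_CONTRACT_V1))  # []
print(contract_violations(broken, ORDER_CONTRACT_V1))
```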
The matrix below describes what an environment strategy normally looks like. Copy the useful parts, such as the quality gates, into the BLUE/GREEN deployment pipeline.
Environment | Purpose | Quality Gates | Comment |
---|---|---|---|
DEV | Sandbox for development, component tests (UT) | Code review from MR, static code analysis OK | Only a subset environment, likely a local dev PC |
QA | UIT | Smoke tests, automated tests, regression tests, monitoring tests | Functional tests |
ACC/PT | Complete flows | Performance tests, security tests, look-and-feel, release note approval | Downscaled (X) prod environment |
PROD | Environment for customers | | |
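
The gate column of the matrix can be encoded so that the pipeline, not a person, decides whether a change may move on. This is a hypothetical sketch. The gate names (`code_review`, `smoke_tests`, ...) and the `results` dictionary are assumptions standing in for whatever CI jobs actually produce those results.

```python
# Hypothetical gate names derived from the environment matrix.
QUALITY_GATES = {
    "DEV": ["code_review", "static_analysis"],
    "QA": ["smoke_tests", "automated_tests", "regression_tests", "monitoring_tests"],
    "ACC/PT": ["performance_tests", "security_tests", "look_and_feel", "release_note_approval"],
}

PROMOTION_ORDER = ["DEV", "QA", "ACC/PT", "PROD"]


def can_promote(from_env: str, results: dict[str, bool]) -> tuple[bool, list[str]]:
    """Check whether a change may leave `from_env` for the next environment.

    Returns (allowed, failed_or_missing_gates). A gate without a result counts as failed.
    """
    if from_env not in QUALITY_GATES:
        raise ValueError(f"no promotion defined from {from_env}")
    failed = [gate for gate in QUALITY_GATES[from_env] if not results.get(gate, False)]
    return (not failed, failed)


def next_env(env: str) -> str:
    """Return the environment a change is promoted to after `env`."""
    return PROMOTION_ORDER[PROMOTION_ORDER.index(env) + 1]


print(can_promote("QA", {"smoke_tests": True, "automated_tests": True, "regression_tests": False}))
```

Treating a missing result as a failure keeps the gate safe by default: forgetting to run a test blocks promotion instead of silently passing it.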
The HLD (High Level Design) below describes one solution for the problem statement. Something like 10+ RabbitMQ nodes are required to meet the requirement that the system "caters for millions of users". RabbitMQ is an excellent component for building distributed systems. However, it is also a large and complex component; see the detailed information at RabbitMQ.
The HLD is very high level and assumes that something like the "AWS Well-Architected Framework" is used as the foundation for the implementation.
The framework is based on five pillars:
- Operational excellence (automating changes, responding to events, and defining standards to manage daily operations)
- Security (confidentiality and integrity of data, identifying and managing who can do what with privilege management, protecting systems, and establishing controls to detect security events)
- Reliability (distributed system design, recovery planning, and how to handle change)
- Performance efficiency (selecting the right resource types and sizes based on workload requirements, monitoring performance)
- Cost optimization (avoiding unnecessary costs)
Component | Infrastructure | Application | Comment |
---|---|---|---|
Terraform | X | | Terraform scripts in GIT |
CI (GIT, Jenkins) | X | | Artifact saved as VM image or Docker image |
CD (Spinnaker, k8s) | X | X | Spinnaker pipeline for Application: DEV->TEST->PROD. CD for Infrastructure needs runner executor node (DEV cli or VM). |
Code changes always flow DEV->TEST->PROD. Each environment has quality gates that make the change safer to deploy into PROD.
Always fix forward: issues and incidents in PROD are fixed by pushing a new change through the pipeline to PROD. Rollback should not be performed.
Developers push code to Git via a merge request (MR). This triggers a build through a webhook: Compile->Test->Build (artifact/Docker image)->Push artifact.
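
The build chain is strictly sequential. A failed stage must stop every stage after it, so a change that fails its tests is never built or pushed. The following is a minimal sketch of that ordering. It assumes each stage is a hypothetical callable that returns `True` on success; in Jenkins each would be a pipeline stage.

```python
from typing import Callable

# A stage is a name plus a callable returning True on success (hypothetical shape).
Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> list[str]:
    """Run stages in order, stop at the first failure, return the stages that passed."""
    completed: list[str] = []
    for name, step in stages:
        if not step():
            print(f"stage '{name}' failed, stopping pipeline")
            break
        completed.append(name)
    return completed


stages: list[Stage] = [
    ("compile", lambda: True),
    ("test", lambda: False),  # a failing unit test...
    ("build", lambda: True),  # ...means no artifact or Docker image is built
    ("push", lambda: True),   # ...and nothing reaches the registry
]
print(run_pipeline(stages))  # ['compile']
```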
Infrastructure components should be coded in Terraform or another (cloud-agnostic) IaC language. Merging an MR into master triggers a Terraform execution that applies the delta.
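
One way to make "execute the delta" concrete is to save the plan and then apply exactly that plan file. The sketch below makes several assumptions. The Terraform CLI must be on the `PATH`, and `terraform init` must already have run in the hypothetical `workdir`. The `runner` parameter exists only so the logic can be tested without Terraform.

```python
import subprocess
from typing import Callable

Runner = Callable[[list[str]], int]


def run(cmd: list[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def apply_delta(workdir: str, runner: Runner = run) -> str:
    """Plan, and only if the plan contains changes, apply exactly that saved plan."""
    # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
    plan_code = runner(["terraform", f"-chdir={workdir}", "plan", "-input=false",
                        "-detailed-exitcode", "-out=tfplan"])
    if plan_code == 0:
        return "no changes"
    if plan_code != 2:
        raise RuntimeError(f"terraform plan failed with exit code {plan_code}")
    # Applying the saved plan file applies the planned delta and nothing else.
    apply_code = runner(["terraform", f"-chdir={workdir}", "apply", "-input=false", "tfplan"])
    if apply_code != 0:
        raise RuntimeError(f"terraform apply failed with exit code {apply_code}")
    return "applied"
```

Using `-detailed-exitcode` lets the MR job skip the apply step entirely when the plan shows no diff.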
Spinnaker watches the container registry for new artifacts.
When a new artifact version appears, Spinnaker triggers a new deployment to the TEST stage. The Spinnaker pipeline then uses a BLUE/GREEN deployment.
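
Spinnaker's registry trigger performs this detection itself; the sketch below only illustrates the decision it makes. It assumes that releases are tagged `MAJOR.MINOR.PATCH`, optionally with a `v` prefix. Versions are compared numerically, so `1.10.0` counts as newer than `1.9.3`.

```python
import re
from typing import Optional

# Only plain MAJOR.MINOR.PATCH tags (optionally prefixed with "v") count as releases.
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def version_key(tag: str) -> Optional[tuple[int, int, int]]:
    """Parse a release tag into a comparable tuple, or None for tags like 'latest'."""
    match = SEMVER.match(tag)
    if match is None:
        return None
    major, minor, patch = (int(group) for group in match.groups())
    return (major, minor, patch)


def tag_to_deploy(registry_tags: list[str], deployed: Optional[str]) -> Optional[str]:
    """Return the newest release tag if it is newer than the deployed one, else None."""
    releases = [tag for tag in registry_tags if version_key(tag) is not None]
    if not releases:
        return None
    newest = max(releases, key=version_key)
    deployed_key = version_key(deployed) if deployed else None
    if deployed_key is None or version_key(newest) > deployed_key:
        return newest
    return None


print(tag_to_deploy(["1.9.3", "latest", "1.10.0"], deployed="1.9.3"))  # 1.10.0
```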
Spinnaker takes care of two key areas:
- Application management: features to view and manage your cloud resources. Applications, clusters, and server groups are the key concepts Spinnaker uses to describe your services.
- Application deployment: managing the CD pipeline.
In BLUE/GREEN, one environment runs the "PROD" version and the other runs the release candidate with the delta for the new release. When promoting the release candidate, switch DNS (the load balancer). After some time, when everything runs smoothly, update the previous PROD environment. Spinnaker takes care of this.
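
The BLUE/GREEN mechanics can be modelled as two slots, one of which is live behind the load balancer. This is a hypothetical in-memory model: the `BlueGreen` class is illustrative, not a Spinnaker API. In practice Spinnaker performs these steps on server groups.

```python
from dataclasses import dataclass, field


@dataclass
class BlueGreen:
    """Two deployment slots; `live` is the one the load balancer (DNS) points at."""

    versions: dict[str, str] = field(default_factory=lambda: {"blue": "", "green": ""})
    live: str = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.live == "blue" else "blue"

    def deploy_candidate(self, version: str) -> None:
        # The release candidate goes to the idle slot; PROD traffic is untouched.
        self.versions[self.idle] = version

    def promote(self) -> None:
        # Equivalent of switching DNS / the load balancer to the other slot.
        if not self.versions[self.idle]:
            raise RuntimeError("no release candidate deployed")
        self.live = self.idle

    def update_previous(self) -> None:
        # Once the new release runs smoothly, bring the old PROD slot up to date too.
        self.versions[self.idle] = self.versions[self.live]


bg = BlueGreen(versions={"blue": "1.0.0", "green": ""})
bg.deploy_candidate("1.1.0")  # green gets 1.1.0, blue still serves PROD
bg.promote()                  # traffic now goes to green
bg.update_previous()          # blue is updated to 1.1.0 as well
print(bg.live, bg.versions)
```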
NOTE: Take extra care when the release involves the database layer!
- All application and infrastructure code is in Git and the container registry.
- Database tiers need a backup policy that takes the following points into account:
  - Validity (TTL)
  - Sensitive data
  - Runtime/persistence data
  - Size of data
At a minimum, schedule a daily backup (recovery period of 30 days) at the time when the system has the lowest load.
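
The following sketch of the retention rule assumes that backups are identified by their timestamps. Anything older than the 30-day recovery period can be pruned. A missing backup within the last 24 hours should raise an alert. The function names are hypothetical.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # recovery period from the backup policy


def backups_to_prune(backup_times: list[datetime], now: datetime,
                     retention: timedelta = RETENTION) -> list[datetime]:
    """Return the backups that fall outside the retention window, oldest first."""
    cutoff = now - retention
    return sorted(t for t in backup_times if t < cutoff)


def daily_backup_missing(backup_times: list[datetime], now: datetime) -> bool:
    """True if no backup was taken in the last 24 hours (worth an alert)."""
    return not any(timedelta(0) <= now - t <= timedelta(days=1) for t in backup_times)


now = datetime(2024, 3, 31, 3, 0)  # e.g. a 03:00 low-traffic window
history = [now - timedelta(days=d) for d in range(40)]
print(len(backups_to_prune(history, now)))  # 9 (backups 31..39 days old)
```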
A RabbitMQ node's data directory contains two types of data: definitions (metadata, schema/topology) and message store data.
At a minimum, schedule a daily backup of the definitions.
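
RabbitMQ definitions can be exported with `rabbitmqctl export_definitions` or through the management plugin's HTTP API (`GET /api/definitions`, default port 15672). Below is a minimal sketch of a daily export job that uses only the Python standard library. The host, user and target directory are assumptions. Production use should go over HTTPS, with credentials taken from a secret store.

```python
import base64
import json
import os
import urllib.request
from datetime import datetime, timezone


def definitions_request(host: str, user: str, password: str,
                        port: int = 15672) -> urllib.request.Request:
    """Build a GET request for the management API definitions export."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"http://{host}:{port}/api/definitions",
        headers={"Authorization": f"Basic {token}"},
    )


def backup_definitions(host: str, user: str, password: str, target_dir: str = ".") -> str:
    """Download definitions (vhosts, users, queues, exchanges, bindings, policies) to a dated file."""
    with urllib.request.urlopen(definitions_request(host, user, password), timeout=30) as resp:
        definitions = json.load(resp)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d")
    path = os.path.join(target_dir, f"rabbitmq-definitions-{stamp}.json")
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(definitions, fh, indent=2)
    return path
```

The message store data is not covered by this export. Only the topology is, and the topology is the minimum the backup policy above asks for.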
The Runtime phase comprises three critical areas: compute, access, and storage.
The strategy is to give as little authority as possible to consumers.
Pre-defined policies should be used; if none exists, create one and use it. For further detail, see Pillar: Security.
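
When no pre-defined policy fits and a new one has to be created, keep it as narrow as possible. The sketch below generates an AWS IAM policy document that grants read-only access to a single S3 prefix. The bucket and prefix names are hypothetical.

```python
import json


def read_only_prefix_policy(bucket: str, prefix: str) -> dict:
    """Least-privilege policy: list and read objects under one prefix of one bucket only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
            {
                # Listing is allowed on the bucket, but only for keys under the prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


print(json.dumps(read_only_prefix_policy("orders-export", "reports"), indent=2))
```

Generating the document from code keeps it in Git and under review, in line with the "everything as code" rule.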
Use the Elasticsearch (ELK) stack, which fulfills the Observability pattern.
This includes:
- Store, search and aggregate data
- Metrics (ITIL, application metrics)
- Logs
- APM (Application Performance Monitoring - detect and manage flows, transactions, bottlenecks, root cause investigations in the system)
- Alerts (alarms)
- Visualize (dashboards, reports, cool apps)
- Security (OWASP)
- Anomaly detection (machine learning)
NOTE: Proactive detection, such as alerts and dashboard visualization, is key to success. The stack gives great flexibility to the different users of the system, e.g. developers and ITIL operations.
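
Logs are easiest to search and aggregate in Elasticsearch when each line is already a JSON document. The sketch below is a standard-library `logging` formatter. The ECS-style field names (`@timestamp`, `log.level`, `service.name`, `trace.id`) are an assumption about the index mapping in use.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line so it can be shipped without grok parsing."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        doc = {
            "@timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "log.level": record.levelname.lower(),
            "log.logger": record.name,
            "message": record.getMessage(),
            "service.name": self.service,
        }
        # An optional correlation id lets APM and log views follow one transaction.
        trace_id = getattr(record, "trace_id", None)
        if trace_id:
            doc["trace.id"] = trace_id
        return json.dumps(doc)


logger = logging.getLogger("orders")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter(service="order-service"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("order created", extra={"trace_id": "abc123"})
```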
The Kubernetes clusters need a monitoring tool such as Prometheus, typically combined with Grafana for dashboards.
See chapter Monitoring solution.
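
Prometheus scrapes metrics over HTTP in a plain-text exposition format. In application code, the official `prometheus_client` library is the usual choice. The dependency-free sketch below only shows what that format looks like, using a hypothetical `http_requests_total` counter.

```python
def escape_label_value(value: str) -> str:
    # The text format requires escaping backslash, double quote and line feed.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metric(name: str, metric_type: str, help_text: str,
                  samples: list[tuple[dict[str, str], float]]) -> str:
    """Render one metric family in the Prometheus text exposition format."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


print(render_metric(
    "http_requests_total", "counter", "Total HTTP requests.",
    [({"method": "GET", "code": "200"}, 1027), ({"method": "POST", "code": "500"}, 3)],
))
```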
Spinnaker handles a lot of the HA and scaling implementation using scale sets and server groups.
This is defined in the configuration of the deployment cluster: strategy BLUE/GREEN, with scale-down of the replaced server groups.
However, there is no one-size-fits-all approach for configuring Spinnaker.
- Spinnaker - Externalize Redis for Caching
- Horizontally Scale Spinnaker Services
- Scaling Clouddriver
- Scaling Orca (execution engine)
Things Spinnaker should take care of when the deployment is managed by Spinnaker:
One of the most powerful features of orchestration tools such as Kubernetes is the ability to automatically scale resource allocation in response to real-time changes in resource usage.
However, when Spinnaker acts as the abstraction layer for HA/scaling, only a subset of the following should be used:
- k8s kinds:
  - Deployment (spec/replicas)
  - HorizontalPodAutoscaler
  - LimitRange
  - PersistentVolume
  - PersistentVolumeClaim
  - ResourceQuota
  - PodDisruptionBudget
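
To see what a `HorizontalPodAutoscaler` does with these settings, consider the core of its documented rule: `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. Scaling is skipped when the ratio is within a tolerance (0.1 by default), and the result is clamped to the configured minimum and maximum. The sketch below is simplified and ignores stabilization windows, pod readiness and multiple metrics.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA calculation for a single metric (e.g. average CPU utilization)."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: do not scale
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, current_metric=90, target_metric=60))  # 6
```

If Spinnaker owns the replica count of a server group, an HPA on the same workload will compete with it. This is one reason to use only a subset of these kinds.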
The system should use a Multi-AZ spread (failover) for the k8s cluster. This is configured in Spinnaker.
The database tier should also use a read replica set, depending on the data type (performance, DR, migration).
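
Routing reads to replicas and writes to the primary is usually done by a driver or a proxy. Below is a minimal sketch of that decision, using hypothetical host names. It treats only plain `SELECT` statements as reads. Because replicas lag behind the primary, callers that need read-after-write consistency can force the primary.

```python
import itertools


class ReplicaRouter:
    """Route writes to the primary and spread reads round-robin over read replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql: str, needs_fresh_read: bool = False) -> str:
        statement = sql.lstrip().upper()
        # SELECT ... FOR UPDATE takes row locks, so it must run on the primary.
        is_plain_read = statement.startswith("SELECT") and "FOR UPDATE" not in statement
        # Replicas lag behind; reads that must see the latest write go to the primary.
        if is_plain_read and not needs_fresh_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReplicaRouter("db-primary:5432", ["db-replica-a:5432", "db-replica-b:5432"])
print(router.route("SELECT * FROM orders"))            # db-replica-a:5432
print(router.route("INSERT INTO orders VALUES (1)"))   # db-primary:5432
```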
This is a very important and vast area ;)
Try to follow the guidelines from CIS (Center for Internet Security, cisecurity).
Areas:
- Least Access
- Least Privilege
- Configuration Management/Change Management
- Audit Logs