terraform-aws-rds-alarms's Introduction

Terraform Module for AWS RDS CloudWatch Alarms

This Terraform module manages CloudWatch Alarms for an RDS instance. It does NOT create or manage the RDS instance itself, only the Metric Alarms.

Requires:

  • AWS Provider
  • Terraform 0.13 or higher

If you need Terraform 0.12, you should use version 2.x of this module and contribute changes to the tf-0.12 branch.
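
As a minimal sketch, the requirements above could be pinned in the calling configuration as follows. The module does not state a specific AWS provider version, so none is set here; only the provider source is declared.

```hcl
terraform {
  # The module requires Terraform 0.13 or higher.
  required_version = ">= 0.13"

  required_providers {
    aws = {
      # Assumption: the standard HashiCorp AWS provider. No version
      # constraint is given because the module documentation names none.
      source = "hashicorp/aws"
    }
  }
}
```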

Alarms Created

Alarms Always Created (default values can be overridden):

  • CPU Utilization above 90%
  • Disk queue depth above 64
  • Disk space less than 10 GB
  • EBS Volume burst balance less than 100
  • Freeable memory below 256 MB
  • Swap usage above 256 MB
  • Anomalous connection count

If the instance type is a T-Series instance type (automatically determined), the following alarms are also created:

  • CPU Credit Balance below 100

If the database engine is a postgres type (configured with var.engine), the following alarms are also created:

  • Maximum used transaction IDs over 1,000,000,000 [reference]
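
A minimal sketch of enabling the engine-specific alarm, assuming the engine string "postgres" is one of the values the module recognizes. The module label, instance identifier, and instance class below are hypothetical placeholders.

```hcl
module "postgres_rds_alarms" {
  source  = "lorenzoaiello/rds-alarms/aws"
  version = "x.y.z"

  # Hypothetical identifier and class of an existing PostgreSQL instance.
  db_instance_id    = "my-postgres-db"
  db_instance_class = "db.t3.medium"

  # Assumption: "postgres" matches the module's engine check, which
  # enables the transaction ID alarm in addition to the default alarms.
  engine = "postgres"
}
```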

Estimated Operating Cost: $ 1.00 / month

  • $ 0.10 / month per Metric Alarm (7x)
  • $ 0.30 / month per Anomaly Alarm (1x)

Example

resource "aws_db_instance" "default" {
  allocated_storage    = 10
  storage_type         = "gp2"
  engine               = "mysql"
  engine_version       = "5.7"
  instance_class       = "db.t2.micro"
  identifier_prefix    = "rds-server-example"
  db_name              = "my-db"
  username             = "foo"
  password             = "bar"
  parameter_group_name = "default.mysql5.7"
  apply_immediately    = true
  skip_final_snapshot  = true
}

resource "aws_sns_topic" "default" {
  name_prefix = "my-premade-topic"
}

module "aws-rds-alarms" {
  source            = "lorenzoaiello/rds-alarms/aws"
  version           = "x.y.z"
  db_instance_id    = aws_db_instance.default.identifier
  db_instance_class = "db.t2.micro"
  actions_alarm     = [aws_sns_topic.default.arn]
  actions_ok        = [aws_sns_topic.default.arn]
}

The above pairs well with a module that creates an SNS topic and forwards your alerts to Slack. For example:

resource "aws_db_instance" "default" {
  allocated_storage    = 10
  storage_type         = "gp2"
  engine               = "mysql"
  engine_version       = "5.7"
  instance_class       = "db.t2.micro"
  identifier_prefix    = "rds-server-example"
  db_name              = "my-db"
  username             = "foo"
  password             = "bar"
  parameter_group_name = "default.mysql5.7"
  apply_immediately    = true
  skip_final_snapshot  = true
}

module "notify_slack" {
  source  = "terraform-aws-modules/notify-slack/aws"
  version = "~> 4.0"

  sns_topic_name = "slack-topic"

  slack_webhook_url = "https://hooks.slack.com/services/AAA/BBB/CCC"
  slack_channel     = "aws-notification"
  slack_username    = "reporter"
}

module "aws-rds-alarms" {
  source            = "lorenzoaiello/rds-alarms/aws"
  version           = "x.y.z"
  db_instance_id    = aws_db_instance.default.identifier
  db_instance_class = "db.t2.micro"
  actions_alarm     = [module.notify_slack.this_slack_topic_arn]
  actions_ok        = [module.notify_slack.this_slack_topic_arn]
}

Variables

| Name | Description | Type | Default | Required |
|------|-------------|------|---------|----------|
| actions_alarm | A list of actions to take when alarms are triggered. Will likely be an SNS topic for event distribution. | list | [] | no |
| actions_ok | A list of actions to take when alarms are cleared. Will likely be an SNS topic for event distribution. | list | [] | no |
| anomaly_period | The number of seconds that make up each evaluation period for anomaly detection. | string | "600" | no |
| anomaly_band_width | The width of the anomaly detection band. Higher numbers mean less sensitive. | string | "2" | no |
| db_instance_id | RDS Instance ID | string | n/a | yes |
| engine | The RDS engine being used. Used for database engine specific alarms. | string | "" | no |
| evaluation_period | The evaluation period to use when triggering alarms. | string | "5" | no |
| prefix | Alarm Name Prefix | string | "" | no |
| statistic_period | The number of seconds that make up each statistic period. | string | "60" | no |
| tags | Tags to attach to each alarm | map(string) | {} | no |
| db_instance_class | The RDS instance class, e.g. db.t3.medium | string | n/a | yes |
| cpu_utilization_too_high_threshold | Alarm threshold for the 'highCPUUtilization' alarm | string | "90" | no |
| cpu_credit_balance_too_low_threshold | Alarm threshold for the 'lowCPUCreditBalance' alarm | string | "100" | no |
| disk_queue_depth_too_high_threshold | Alarm threshold for the 'highDiskQueueDepth' alarm | string | "64" | no |
| disk_free_storage_space_too_low_threshold | Alarm threshold for the 'lowFreeStorageSpace' alarm (in bytes) | string | "10000000000" | no |
| disk_burst_balance_too_low_threshold | Alarm threshold for the 'lowEBSBurstBalance' alarm | string | "100" | no |
| maximum_used_transaction_ids_too_high_threshold | Alarm threshold for the 'maximumUsedTransactionIDs' alarm | string | "1000000000" | no |
| memory_freeable_too_low_threshold | Alarm threshold for the 'lowFreeableMemory' alarm (in bytes) | string | "256000000" | no |
| memory_swap_usage_too_high_threshold | Alarm threshold for the 'highSwapUsage' alarm (in bytes) | string | "256000000" | no |
| create_high_cpu_alarm | Whether or not to create the high cpu alarm | bool | true | no |
| create_low_cpu_credit_alarm | Whether or not to create the low cpu credit alarm | bool | true | no |
| create_high_queue_depth_alarm | Whether or not to create the high queue depth alarm | bool | true | no |
| create_low_disk_space_alarm | Whether or not to create the low disk space alarm | bool | true | no |
| create_low_disk_burst_alarm | Whether or not to create the low disk burst alarm | bool | true | no |
| create_low_memory_alarm | Whether or not to create the low memory free alarm | bool | true | no |
| create_swap_alarm | Whether or not to create the high swap usage alarm | bool | true | no |
| create_anomaly_alarm | Whether or not to create the fairly noisy anomaly alarm | bool | true | no |

Outputs

| Name | Description |
|------|-------------|
| alarm_connection_count_anomalous | The CloudWatch Metric Alarm resource block for anomalous Connection Count |
| alarm_cpu_credit_balance_too_low | The CloudWatch Metric Alarm resource block for low CPU Credit Balance |
| alarm_cpu_utilization_too_high | The CloudWatch Metric Alarm resource block for high CPU Utilization |
| alarm_disk_burst_balance_too_low | The CloudWatch Metric Alarm resource block for low Disk Burst Balance |
| alarm_disk_free_storage_space_too_low | The CloudWatch Metric Alarm resource block for low Free Storage Space |
| alarm_disk_queue_depth_too_high | The CloudWatch Metric Alarm resource block for high Disk Queue Depth |
| alarm_memory_freeable_too_low | The CloudWatch Metric Alarm resource block for low Freeable Memory |
| alarm_memory_swap_usage_too_high | The CloudWatch Metric Alarm resource block for high Memory Swap Usage |
| alarm_maximum_used_transaction_ids_too_high | The CloudWatch Metric Alarm resource block for postgres' Transaction ID Wraparound |

terraform-aws-rds-alarms's People

Contributors

andrewfarley, jsalvata, lorenzoaiello, mattie112, os11k, ppihus, s4ros, tonylovesdevops

terraform-aws-rds-alarms's Issues

Clarify how to override default values

The README file says:

Alarms Always Created (default values can be overridden):

CPU Utilization above 90%
Disk queue depth above 64
Disk space less than 10 GB
EBS Volume burst balance less than 100
Freeable memory below 256 MB
Swap usage above 256 MB
Anomalous connection count

But the Terraform documentation does not provide any hints on how to override values in resources defined by sub-modules. How are these default values overridden?
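
One way to read the Variables table is that each default is overridden by setting the matching threshold variable on the module block. The sketch below assumes that reading; the module label, instance identifier, and chosen threshold values are hypothetical.

```hcl
module "rds_alarms_custom_thresholds" {
  source  = "lorenzoaiello/rds-alarms/aws"
  version = "x.y.z"

  # Hypothetical identifier and class of an existing instance.
  db_instance_id    = "my-database"
  db_instance_class = "db.t3.medium"

  # Thresholds are declared as strings in the Variables table.
  cpu_utilization_too_high_threshold        = "80"          # percent (default "90")
  disk_free_storage_space_too_low_threshold = "20000000000" # bytes (default "10000000000")
  memory_freeable_too_low_threshold         = "512000000"   # bytes (default "256000000")
}
```

Any threshold variable that is not set keeps the default listed in the table.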

alarm `cpu_credit_balance_too_low` causes error during `plan` when db instance does not exist yet

This line:

resource "aws_cloudwatch_metric_alarm" "cpu_credit_balance_too_low" {
  count               = length(regexall("(t2|t3)", data.aws_db_instance.database.db_instance_class)) > 0 ? "1" : "0"

results in:


Error: Invalid count argument

  on .terraform/modules/mymod.rds-alarms/main.tf line 28, in resource "aws_cloudwatch_metric_alarm" "cpu_credit_balance_too_low":
  28:   count               = length(regexall("(t2|t3)", data.aws_db_instance.database.db_instance_class)) > 0 ? "1" : "0"

The "count" value depends on resource attributes that cannot be determined
until apply, so Terraform cannot predict how many instances will be created.
To work around this, use the -target argument to first apply only the
resources that the count depends on.

The instance class should probably not be read from the data source but passed in via a regular variable.
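
The suggestion above could look roughly like the following. This is a sketch, not the module's actual code: the alarm name and the literal alarm settings are assumptions, and the regular expression is the one quoted in the error message. Because the count depends only on an input variable, it can be resolved during plan as long as the caller passes a value that is known at plan time.

```hcl
variable "db_instance_id" {
  type        = string
  description = "RDS instance identifier"
}

variable "db_instance_class" {
  type        = string
  description = "RDS instance class, e.g. db.t3.medium"
}

resource "aws_cloudwatch_metric_alarm" "cpu_credit_balance_too_low" {
  # Derived from a plain variable instead of a data source attribute,
  # so Terraform does not need the instance to exist to evaluate it.
  count = length(regexall("(t2|t3)", var.db_instance_class)) > 0 ? 1 : 0

  # Hypothetical alarm name; the module's real naming may differ.
  alarm_name          = "${var.db_instance_id}-lowCPUCreditBalance"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 5
  metric_name         = "CPUCreditBalance"
  namespace           = "AWS/RDS"
  period              = 60
  statistic           = "Average"
  threshold           = 100

  dimensions = {
    DBInstanceIdentifier = var.db_instance_id
  }
}
```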

wrong variable description and example

I used this module to provision database alarms as per the example. It did not work: all alarms were stuck in the Insufficient data state. I raised a case with AWS support, who looked at my alarms and identified that the database ID was used where the identifier needs to be used. This is for RDS SQL Server at least.
Your example says:
db_instance_id = aws_db_instance.default.id
Should be:
db_instance_id = aws_db_instance.default.identifier
Please amend documentation accordingly.

Feature Request/Interest: Disable certain types of checks

I'm wondering if there's anybody else out there who's using this module and wishes they could configure only a subset of these alarms, perhaps by passing an enabled = false boolean for the various types of checks. Loosely, I think these could be:

  • cpu
  • disk
  • memory
  • anomaly

These can then be used in the aws_cloudwatch_metric_alarm resources as the cpu_credit resource does for the count parameter.

As of right now I just define variables such that the alarm won't actually ever alarm, which isn't ideal. I'd be willing to fork and give this a try if there's interest. Or if there's another way to accomplish this please enlighten me.
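
The create_* variables listed in the Variables table appear to cover this request. A minimal sketch, assuming each flag disables the corresponding alarm when set to false; the module label, instance identifier, and instance class are hypothetical.

```hcl
module "rds_alarms_subset" {
  source  = "lorenzoaiello/rds-alarms/aws"
  version = "x.y.z"

  # Hypothetical identifier and class of an existing instance.
  db_instance_id    = "my-database"
  db_instance_class = "db.t3.medium"

  # Skip the alarms that are not wanted; all flags default to true.
  create_anomaly_alarm = false
  create_swap_alarm    = false
}
```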

Add treat missing data options to alarms

I'd like to be able to set the treat_missing_data option for the alarms while using this module. For example, I'd like to set it to notBreaching, to avoid cases where no data is captured by AWS, or where it erroneously marks the alarm as insufficient data and then reports when it transitions back to OK.
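
A sketch of how such an option might be wired through. This is a hypothetical variable and a standalone alarm, not something the module is known to expose: the variable name, alarm name, and literal alarm settings are assumptions. The four allowed values are the ones CloudWatch accepts for treat_missing_data.

```hcl
variable "db_instance_id" {
  type        = string
  description = "RDS instance identifier"
}

# Hypothetical variable; not part of the module's documented inputs.
variable "treat_missing_data" {
  type    = string
  default = "missing"

  validation {
    condition     = contains(["missing", "ignore", "breaching", "notBreaching"], var.treat_missing_data)
    error_message = "Must be one of: missing, ignore, breaching, notBreaching."
  }
}

resource "aws_cloudwatch_metric_alarm" "memory_freeable_too_low" {
  alarm_name          = "${var.db_instance_id}-lowFreeableMemory"
  comparison_operator = "LessThanThreshold"
  evaluation_periods  = 5
  metric_name         = "FreeableMemory"
  namespace           = "AWS/RDS"
  period              = 60
  statistic           = "Average"
  threshold           = 256000000

  # With "notBreaching", missing data points are treated as within the
  # threshold, so gaps in the metric do not move the alarm out of OK.
  treat_missing_data = var.treat_missing_data

  dimensions = {
    DBInstanceIdentifier = var.db_instance_id
  }
}
```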

Support / Maintenance of this module...?

Hey, I/we avidly use this module in some foundational components and would like to offer to maintain it going forward, @lorenzoaiello, if you're keen. It looks like you hardly get a chance to review/merge/update/support this repo, so we'd be happy to take this on. Would you mind? Please reach out to me via the email on my profile to discuss further.

And yes, I realize I could fork this, but I think it would be good for the folks who have forked/starred/used this repo to be able to keep using this module transparently(-ish), without manually switching to a fork. We'd be happy for you to transfer ownership to our org https://github.com/DevOps-Nirvana/ and add you as a fellow manager of this repo, or to simply add us as a contributor with the ability to do releases (e.g. set up some GitHub Actions to do this automatically upon release).

Thoughts?

support a metric for used storage space to handle autoscaling better

When using RDS autoscaling of storage you can set a min and max allocation. Setting "disk_free_storage_space_too_low_threshold" is hard because it depends on the currently allocated space which is dynamic.

I really want a metric to tell me when I'm approaching the max storage allocation to know when AWS won't save me anymore. If you enable enhanced monitoring you can use the fileSys.used metric to trigger on this and it'd be nice if it were an option to enable.

Maybe some new options could be added around this conditional on a bool to know whether enhanced monitoring is enabled.

https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Monitoring-Available-OS-Metrics.html
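
A rough sketch of the idea, under several assumptions that should be checked against the linked documentation: Enhanced Monitoring publishes JSON to the RDSOSMetrics log group, the data volume is the first entry of the fileSys array, and fileSys.used is reported in kilobytes. All resource names, variables, the custom namespace, and the 90% threshold are hypothetical and not part of this module.

```hcl
variable "db_instance_id" {
  type        = string
  description = "RDS instance identifier"
}

variable "max_allocated_storage_gib" {
  type        = number
  description = "Upper limit configured for storage autoscaling, in GiB"
}

locals {
  # Hypothetical custom metric name and namespace.
  used_metric_name      = "${var.db_instance_id}-fileSysUsed"
  used_metric_namespace = "Custom/RDS"
}

# Extract fileSys.used from the Enhanced Monitoring log events.
resource "aws_cloudwatch_log_metric_filter" "filesystem_used" {
  name           = local.used_metric_name
  log_group_name = "RDSOSMetrics"
  pattern        = "{ $.instanceID = \"${var.db_instance_id}\" }"

  metric_transformation {
    name      = local.used_metric_name
    namespace = local.used_metric_namespace
    # Assumption: the first fileSys entry is the database storage volume.
    value = "$.fileSys[0].used"
  }
}

# Alarm when used space passes 90% of the autoscaling maximum.
resource "aws_cloudwatch_metric_alarm" "storage_used_near_max" {
  alarm_name          = "${var.db_instance_id}-highStorageUsed"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 5
  metric_name         = local.used_metric_name
  namespace           = local.used_metric_namespace
  period              = 60
  statistic           = "Maximum"

  # Assumption: the metric is in kilobytes, so GiB is converted to KiB.
  threshold = var.max_allocated_storage_gib * 1024 * 1024 * 0.9

  depends_on = [aws_cloudwatch_log_metric_filter.filesystem_used]
}
```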
