
Redshift Advanced Monitoring

Goals

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse solution that uses columnar storage to minimise IO, provides high data compression rates, and offers fast performance. This GitHub project provides an advanced monitoring system for Amazon Redshift that is completely serverless, based on AWS Lambda and Amazon CloudWatch. A Lambda function runs on a schedule, connects to the configured Redshift cluster, and publishes CloudWatch custom metrics that surface common possible issues.

Most of the graphs are based on information from AWS Big Data Blog articles and the Amazon Redshift documentation.

Installation

This function can be deployed automatically using an AWS Serverless Application Model (SAM) template in CloudFormation. Use the link for your region below to walk through the CloudFormation deployment.

You must supply parameters for your cluster name, endpoint address and port, master username and the encrypted password, and the aggregation interval to be used by the monitoring scripts (default 1 hour).

The SAM stack will create:

  • An IAM Role called LambdaRedshiftMonitoringRole
  • This IAM Role will have a single linked IAM Policy called LambdaRedshiftMonitoringPolicy that can:
    • Decrypt the KMS Key used to encrypt the cluster password (kms:Decrypt)
    • Emit CloudWatch metrics (cloudwatch:PutMetricData)
Deployment templates (VPC and non-VPC variants) are available in the following regions:

  • ap-northeast-1
  • ap-northeast-2
  • ap-south-1
  • ap-southeast-1
  • ap-southeast-2
  • ca-central-1
  • eu-central-1
  • eu-west-1
  • eu-west-2
  • eu-west-3
  • sa-east-1
  • us-east-1
  • us-east-2
  • us-west-1
  • us-west-2

If you wish to deploy manually, you can use the prebuilt zip in the dist folder, or you can build it yourself. We've included a bash build script that creates a zip file which you can upload into AWS Lambda.

The password for the Redshift user must be encrypted with KMS, and plaintext passwords are NOT supported. Furthermore, Lambda Environment Variables can also be encrypted within the Lambda service using KMS.

Setting up KMS Keys for encryption

If you use the above SAM deployment templates, then all permissions are configured for you. If not, then these are the steps you should follow to configure the function:

  • Create a KMS key in the same region as the Redshift cluster, and take note of the key ARN (see the KMS documentation).
  • Create an IAM Role for the Lambda function. At a minimum, this role should have the managed policy "AWSLambdaVPCAccessExecutionRole" (so the function can run in a VPC) and the following custom policy (to access the KMS key and publish metrics):
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1458213823000",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": [
                "<kms key arn>"
            ]
        },
        {
            "Sid": "Stmt1458218675000",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
  • Create a user in Redshift for the script to use. At a minimum, this user needs access to the tables in the "pg_catalog" schema:

grant select on all tables in schema pg_catalog to tamreporting

  • Encrypt the user's password with the KMS key; for example, from the command line:

aws kms encrypt --key-id <kms_key_id> --plaintext <password>
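The encrypt call returns a base64-encoded CiphertextBlob, which is the value you paste into the configuration. At runtime, the function must reverse this step. A minimal sketch of that decryption using boto3 (the function name is illustrative, not the project's exact code):

```python
import base64


def decrypt_password(encrypted_b64):
    """Decrypt the base64-encoded CiphertextBlob produced by
    `aws kms encrypt` back into the plaintext password."""
    import boto3  # bundled with the AWS Lambda Python runtime

    kms = boto3.client("kms")
    response = kms.decrypt(CiphertextBlob=base64.b64decode(encrypted_b64))
    return response["Plaintext"].decode("utf-8")
```

The Lambda's role needs kms:Decrypt on the key, which is exactly what the custom policy above grants.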

Configuration

Static Configuration (Bad - deprecated after v1.2)

You can edit the variables at the top of the script and rebuild. Please note that anyone who has access to the Lambda function code will also have access to these configuration values. The variables are:

  • user: The user in the database.
  • enc_password: The password encrypted with the KMS key.
  • host: The endpoint DNS name of the Redshift cluster.
  • port: The port used by the Redshift cluster.
  • database: Database name of the Redshift cluster.
  • ssl: Whether to use SSL to connect to the cluster.
  • cluster: A cluster name; your graphs in CloudWatch will use it to reference the Redshift cluster.
  • interval: The interval at which your Lambda function runs; 1 hour is the recommended interval.

Environment Variables (Better)

Alternatively, you can now use Lambda Environment Variables for configuration, including:

"Environment": {
        "Variables": {
            "encrypted_password": "KMS encrypted password",
            "db_port": "database part number",
            "cluster_name": "display name for cloudwatch metrics",
            "db_name": "database name",
            "db_user": "database user name",
            "cluster_endpoint": "cluster DNS name"
        }
    }

Configuring with Events (Best)

This option allows you to send the configuration as part of the Scheduled Event, which then means you can support multiple clusters from a single Lambda function. This option will override any Environment variables that you've configured. An example event looks like:

{
  "DbUser": "master",
  "EncryptedPassword": "AQECAHh+YtzV/K7+L/VDT7h2rYDCWFSUugXGqMxzWGXynPCHpQAAAGkwZwYJKoZIhvcNAQcGoFowWAIBADBTBgkqhkiG9w0BBwEwHgYJYIZIAWUDBAEuMBEEDM8DWMFELclZ2s7cmwIBEIAmyVGjoB7F4HbwU5Y1lq7GVQ3UU3MaE10LWieCKMHOtVhJioi+IHw=",
  "ClusterName": "energy-demo",
  "HostName": "energy-demo.c7bpmf3ajaft.eu-west-1.redshift.amazonaws.com",
  "HostPort": "5439",
  "DatabaseName": "master",
  "AggregationInterval": "1 hour"
}

Environment variables with the old names are still supported for backward compatibility, but if environment variables with the names above are present, they are used instead.
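This precedence can be sketched with a small helper in the spirit of the project's get_config_value function (the exact signature here is illustrative): each setting is looked up first in the invoking event, then in the environment, under any of its accepted names:

```python
import os


def get_config_value(possible_keys, config_sources):
    """Return the first non-empty value found for any candidate key
    name, searching the config sources in priority order (the event
    first, then environment variables)."""
    for source in config_sources:
        for key in possible_keys:
            value = source.get(key)
            if value not in (None, ""):
                return value
    return None


# The event overrides environment variables, and old names act as fallbacks:
event = {"DbUser": "master"}
user = get_config_value(["DbUser", "db_user"], [event, os.environ])
```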

Manual Deployment Instructions

  • If you are rebuilding the function, download and install the dependencies:

pip install -r requirements.txt -t .

  • Assemble and compress the Lambda function package:

./build.sh

If you are including any user defined query extensions, then build with:

./build.sh --include-user-queries

Please note that the tagged version in GitHub does not include any user queries.

  • Create a Lambda function; some of its key parameters are:

    • Runtime: Python 2.7
    • Upload the zip file generated
    • Handler: lambda_function.lambda_handler
    • Role: Use the role created
    • Memory: 256MB
    • Timeout: 5 minutes
    • VPC: Use the same VPC as the Redshift cluster. You will need at least two private subnets that can reach the Redshift cluster (allowed in its Security Group), and a NAT Gateway in those subnets' route tables to provide Internet access. You cannot use public subnets. You can read more in the AWS blog.
  • Add an Event Source to the Lambda function with a Scheduled Event, running at the same frequency as the aggregation interval you configured.
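Creating the Scheduled Event with the per-cluster configuration as its payload might look like the following boto3 sketch (the rule name and target ID are illustrative):

```python
import json


def schedule_monitoring(rule_name, lambda_arn, cluster_config,
                        rate="rate(1 hour)"):
    """Create a CloudWatch Events rule that invokes the monitoring
    function on a schedule, passing the cluster configuration as the
    Lambda event."""
    import boto3

    events = boto3.client("events")
    events.put_rule(Name=rule_name, ScheduleExpression=rate)
    events.put_targets(
        Rule=rule_name,
        Targets=[{
            "Id": "redshift-monitoring",
            "Arn": lambda_arn,
            # This JSON becomes the Lambda event, overriding any
            # environment-variable configuration.
            "Input": json.dumps(cluster_config),
        }],
    )
```

You would still need to grant CloudWatch Events permission to invoke the function (for example via lambda add-permission), which the SAM template otherwise handles for you.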

Confirming Successful Execution

  • After a period of time, you can check your CloudWatch metrics and create alarms. You can also create a Dashboard with all the graphs for an at-a-glance view of your database, like this one:

(Dashboard screenshots: Dashboard1, Dashboard2)

Extensions

The published CloudWatch metrics are all configured in a JSON file called monitoring-queries.json. These are queries that have been built by the AWS Redshift database engineering and support teams and which provide detailed metrics about the operation of your cluster.

If you would like to create your own queries to be instrumented via AWS CloudWatch, such as user 'canary' queries which help you to see the performance of your cluster over time, these can be added into the user-queries.json file. The file is a JSON array, with each query having the following structure:

{
	"query": "my select query that returns a numeric value",
	"name":"MyCanaryQuery",
	"unit":"Count | Seconds | Milliseconds | Whatever",
	"type":"(value | interval)"
}

The last attribute, type, is the most important. If you use value, the value returned by your query is exported to CloudWatch with the indicated unit and metric name. If you use interval, the runtime of your query is instrumented as elapsed milliseconds, giving you the ability to create the desired 'canary' query.
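The distinction between the two types can be sketched as follows (illustrative code, not the project's exact implementation): a value query reports its first result column, while an interval query reports its own elapsed runtime in milliseconds.

```python
import time


def run_user_query(cursor, query, metric_type):
    """Execute a user query and return the number to publish to
    CloudWatch: the first column of the first row for 'value' queries,
    or the elapsed runtime in milliseconds for 'interval' queries."""
    start = time.time()
    cursor.execute(query)
    elapsed_ms = (time.time() - start) * 1000.0
    if metric_type == "interval":
        return elapsed_ms
    row = cursor.fetchone()
    return float(row[0])
```

Either way, the resulting number is published with cloudwatch:PutMetricData under the metric name and unit from the JSON entry.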


Copyright 2016-2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.


amazon-redshift-monitoring's Issues

Interval is not being used

Hi,
It seems that the aggregation interval is not used in the code.
I would like to have some monitors run at one interval (every 5 minutes, for example)
and other monitors run at a different one (every hour).

Support for IAM-based authentication

Are there plans to update this project with an option to avoid encrypting a password with KMS and instead relying on the IAM role attached to the Lambda function to provide authentication to Redshift?

no new metrics after 3 days?

Hi again - I've got the function successfully running hourly, but I don't see any new metrics resulting from it. Here's an error reported in the logs that might be related:

(u'ERROR', u'42P01', u'relation "sensor_data" does not exist', u'/home/ec2-user/padb/src/pg/src/backend/catalog/namespace.c', u'237', u'RangeVarGetRelid'): ProgrammingError
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 328, in lambda_handler
put_metrics.extend(run_external_commands('User Configured', 'user-queries.json', cursor, cluster))
File "/var/task/lambda_function.py", line 93, in run_external_commands
interval = run_command(cursor, command['query'])
File "/var/task/lambda_function.py", line 129, in run_command
cursor.execute(statement)
File "/var/task/lib/pg8000/core.py", line 852, in execute
self._c.execute(self, operation, args)
File "/var/task/lib/pg8000/core.py", line 1741, in execute
self.handle_messages(cursor)
File "/var/task/lib/pg8000/core.py", line 1879, in handle_messages
raise self.error
ProgrammingError: (u'ERROR', u'42P01', u'relation "sensor_data" does not exist', u'/home/ec2-user/padb/src/pg/src/backend/catalog/namespace.c', u'237', u'RangeVarGetRelid')

This error is reported at least twice for each Lambda run, amongst the series of diagnostic queries. Here's a screenshot. I do not see any new metrics for Redshift listed.

No LAMBDA

Non-Lambda version please? We're in Australia, managing clients with Redshift clusters, but Lambda is not available here yet.

Can't create Change Set

Hi there,
I'm trying to set this up using the links you provide below, but I can't complete the setup. I get a message saying:
Check the following transforms: ["AWS::Serverless-2016-10-31"] You must use a change set to create this stack because it includes one or more transforms.

But when I click the Create Change Set button, nothing happens and Execute is still greyed out.

Am I missing a step?

-- joe.

Cannot work with Redshift version 1.0.18788

The metrics cannot be generated with version 1.0.18788. The possible cause might be a connection failure. Here is the stack trace:

The read operation timed out: timeout
Traceback (most recent call last):
  File "/var/task/lambda_function.py", line 15, in lambda_handler
    redshift_monitoring.monitor_cluster(config_sources)
  File "/var/task/redshift_monitoring.py", line 329, in monitor_cluster
    put_metrics.extend(gather_service_class_stats(cursor, cluster))
  File "/var/task/redshift_monitoring.py", line 124, in gather_service_class_stats
    ''')
  File "/var/task/redshift_monitoring.py", line 98, in run_command
    cursor.execute(statement)
  File "/var/task/lib/pg8000/core.py", line 861, in execute
    self._c.execute(self, operation, args)
  File "/var/task/lib/pg8000/core.py", line 1909, in execute
    self.handle_messages(cursor)
  File "/var/task/lib/pg8000/core.py", line 1972, in handle_messages
    code, data_len = ci_unpack(self._read(5))
  File "/var/lang/lib/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/var/lang/lib/python3.6/ssl.py", line 1012, in recv_into
    return self.read(nbytes, buffer)
  File "/var/lang/lib/python3.6/ssl.py", line 874, in read
    return self._sslobj.read(len, buffer)
  File "/var/lang/lib/python3.6/ssl.py", line 631, in read
    v = self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out

In comparison, we have another cluster on the latest version (1.0.18861) which generated and updated the metrics successfully. Based on this comparison, we suspect that the Redshift version might affect the project.

NOTE
We modified the code to make the timeout an input parameter. The code changes are listed below:
redshift_monitoring.py

line 262     # Add timeout to config_sources
line 263     timeout = int(get_config_value(['TimeOut', 'time_out', 'timeOut'], config_sources))
..........
line 305     conn = pg8000.connect(database=database, user=user, password=pwd, host=host, port=port, ssl=ssl, timeout=timeout)

The input JSON of the cloud watch event rule:
{ "DbUser": "xxxxxxx", "EncryptedPassword": "**************", "ClusterName": "xxxxxxxxxx", "HostName": "xxxxxxxxxxxxxx", "HostPort": "xxxx", "DatabaseName": "xxxxxxxxxx", "AggregationInterval": "1 hour", "TimeOut": "20" }

P.S

  • Project version: 1.7
  • Deployment method: manually create AWS resources and upload zip onto Lambda function

Lambda function timing out while publishing cloudwatch metrics

I deployed using SAM and I can see the data being fetched from the Redshift cluster within 1 minute; however, the Lambda function still doesn't complete after running for 5 minutes.

Lambda logs report:
Executing Redshift Diagnostic Query: WLMQuerySlotCountWarning
Publishing 24 CloudWatch Metrics
END RequestId: b054f912-b614-11e8-aa9e-d5eb851e7827
REPORT RequestId: b054f912-b614-11e8-aa9e-d5eb851e7827 Duration: 300005.52 ms Billed Duration: 300000 ms Memory Size: 192 MB Max Memory Used: 32 MB
2018-09-11T22:52:41.248Z b054f912-b614-11e8-aa9e-d5eb851e7827 Task timed out after 300.01 seconds

Cloudwatch Log report:


22:47:42
Executing Redshift Diagnostic Query: WLMQuerySlotCountWarning

22:47:42
Publishing 24 CloudWatch Metrics

22:52:41
END RequestId: b054f912-b614-11e8-aa9e-d5eb851e7827

22:52:41
REPORT RequestId: b054f912-b614-11e8-aa9e-d5eb851e7827 Duration: 300005.52 ms Billed Duration: 300000 ms Memory Size: 192 MB Max Memory Used: 32 MB

22:52:41
2018-09-11T22:52:41.248Z b054f912-b614-11e8-aa9e-d5eb851e7827 Task timed out after 300.01 seconds

22:52:41
Pushing metrics to CloudWatch failed: exception ('Connection aborted.', error(1, 'Operation not permitted'))

22:52:41
/var/task/redshift_monitoring.py:249: SyntaxWarning: name 'debug' is assigned to before global declaration

22:52:41
global debug

Can't use password without encrypting with KMS

The section in redshift_monitoring.py that handles passwords checks for an unencrypted password and then sets the password to None. Because of this, even if an unencrypted password exists, it will never be used.

The flow of logic here needs to change so that the unencrypted password can be used.
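A corrected flow, as the reporter suggests, would only attempt KMS decryption when an encrypted password is configured, and otherwise fall through to the plaintext one. A minimal sketch (function and parameter names are illustrative):

```python
def resolve_password(encrypted_password, plain_password, decrypt):
    """Prefer the KMS-encrypted password when one is configured;
    otherwise fall back to the plaintext password instead of
    discarding it by setting it to None."""
    if encrypted_password:
        return decrypt(encrypted_password)
    if plain_password:
        return plain_password
    raise ValueError("No password configured")
```

Note that the README above states plaintext passwords are not supported, so this sketch describes the requested behaviour change, not the current code.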

No module named enum

I deployed the v1.5 zip, which had the fix for the pgpass library, and it seems like I am seeing another missing module.

17:15:47
global debug

17:15:47
START RequestId: 51a926fe-5e2d-43a0-adfd-4e8e022e016b Version: $LATEST

17:15:47
Unable to import module 'lambda_function': No module named enum

17:15:47
END RequestId: 51a926fe-5e2d-43a0-adfd-4e8e022e016b

17:15:47
REPORT RequestId: 51a926fe-5e2d-43a0-adfd-4e8e022e016b Duration: 0.50 ms Billed Duration: 100 ms Memory Size: 192 MB Max Memory Used: 32 MB

The same import error and REPORT lines repeat on each subsequent invocation (17:16:44, 17:18:35).

Password decrypt is timing out

I have set up the KMS key in the same region where the Redshift cluster resides, encrypted the database user password using the command line "aws kms encrypt --key-id $KEY_ID --plaintext ", and edited the lambda_function.py script to fill in the configuration, with the enc_password field set to the "CiphertextBlob" output from that command. Now when I run a test on the Lambda function, the decrypt step times out. Any suggestion on why it is timing out would be appreciated.

module initialization error: 'db_user'

I'm getting the error in the subject each time the function is invoked. Is there a way for me to better diagnose the issue? The function logs in CloudWatch aren't very verbose beyond pointing out the module error.
