awsdocs / amazon-cloudwatch-user-guide

The open source version of the Amazon CloudWatch User Guide

License: Other

amazon-cloudwatch-user-guide's Issues

Undocumented behaviour

CloudWatch Dashboards lets you refer back to a parameter of the previous metric in the same widget, instead of retyping the same parameter for every metric. For more information, see: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/CloudWatch-Dashboard-Body-Structure.html#CloudWatch-Dashboard-Properties-Metrics-Array-Format

What the documentation doesn't mention is that you can put multiple dots in a single string instead of writing each dot as a separate element, as shown in the docs. It would be great if the example could be changed to the following:

// The following example graphs the DiskReadBytes metric for three instances.
[ "AWS/EC2", "DiskReadBytes", "InstanceId", "i-xyz" ],
[ ".", ".", ".", "i-abc" ],
[ "...", "i-123" ]
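To make the shorthand concrete, here is a minimal Python sketch that expands the placeholders into full metric arrays. It is not CloudWatch's implementation: the function name `expand_metrics` is hypothetical, and the meaning of "..." (copy as many leading fields from the previous metric as are needed to reach the same length) is an assumption inferred from the example above. It only handles string fields, not a trailing rendering-properties object.

```python
def expand_metrics(metrics):
    """Expand "." and "..." placeholders in a widget's metrics array.

    Assumed semantics (inferred from the example, not from the API reference):
    "."   copies the field at the same position in the previous metric.
    "..." copies the leading fields of the previous metric so that the
          result has the same length as the previous metric.
    """
    expanded = []
    for metric in metrics:
        previous = expanded[-1] if expanded else None
        if metric and metric[0] == "...":
            if previous is None:
                raise ValueError('"..." cannot be used in the first metric')
            rest = metric[1:]
            keep = max(len(previous) - len(rest), 0)
            metric = previous[:keep] + rest
        resolved = []
        for position, field in enumerate(metric):
            if field == ".":
                if previous is None:
                    raise ValueError('"." cannot be used in the first metric')
                resolved.append(previous[position])
            else:
                resolved.append(field)
        expanded.append(resolved)
    return expanded


example = [
    ["AWS/EC2", "DiskReadBytes", "InstanceId", "i-xyz"],
    [".", ".", ".", "i-abc"],
    ["...", "i-123"],
]

for row in expand_metrics(example):
    print(row)
```

Under these assumptions all three rows expand to the same namespace, metric name, and dimension name, differing only in the instance ID. Using a placeholder in the first metric raises a `ValueError` that names the actual problem.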

Additionally, as obvious as it might seem that you can't use this reference in the first metric of a widget, if you do so by mistake you get a misleading error, as shown below:

The dashboard body is invalid, there are 1 validation errors: [ { "message": "Should NOT have more than 5 items", "dataPath": "/widgets/6/properties/metrics/1" } ] (Service: AmazonCloudWatch; Status Code: 400; Error Code: InvalidParameterInput; Request ID: 1234567-abcd-8901-efgh-ijklmnopq)

The error provided should be improved

Missing documentation on how to start the agent with wizard generated config file

The docs appear to be missing the command for starting the CloudWatch agent with the config file that the wizard generated. Please add it.

explain evaluation range for CloudWatch alarms

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html#alarms-and-missing-data states

Whenever an alarm evaluates whether to change state, CloudWatch attempts to retrieve a higher number of data points than the number specified as Evaluation Periods. The exact number of data points it attempts to retrieve depends on the length of the alarm period and whether it is based on a metric with standard resolution or high resolution. The time frame of the data points that it attempts to retrieve is the evaluation range.

but offers no specific details about how the evaluation range depends on those things, making it impossible for your readers to set appropriate expectations for how our alarms will behave. Further down the page there are several examples in which "the evaluation range is 5", but we have no idea why it is 5, and there are no examples given in which it is anything other than 5.

Please document the specific logic used by CloudWatch to determine what the evaluation range will be for a given alarm.

Error parsing /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml, open /usr/share/collectd/types.db: no such file or directory

Hi All,

I see this issue when I am trying to configure the AWS CloudWatch agent, version 1.246396.0.

2020/08/17 12:18:05 I! AmazonCloudWatchAgent Version 1.246396.0.
2020/08/17 12:18:05 E! Error parsing /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml, open /usr/share/collectd/types.db: no such file or directory

Any help on where I can find a fix for this in the documentation would be really helpful. Many thanks.

Ability to set or customize process_name for procstat process

It looks like the procstat plugin for the amazon cloudwatch agent is getting the process name from /proc/<pid>/status which isn't helpful in many cases. For example, if a command is called with bundle exec the Name will just show bundle. In my case, using /proc/<pid>/cmdline is much more helpful. I tried using process_name in my config just like the telegraf procstat plugin but it doesn't like it and won't use it. Is telegraf being used underneath? Would it be possible to enable the ability to set process_name?

Provide Metrics/Dimensions list by machine readable format (e.g JSON)

This request might be out of scope for this repo, but it is the only place where I can make this request to AWS.

If AWS officially provided it, it would be really useful for creating CloudWatch-related tools.
For example, I maintain a Metrics/Dimensions list for Grafana:
https://github.com/grafana/grafana/blob/v5.1.4/pkg/tsdb/cloudwatch/metric_find_query.go#L40-L158

Currently, I need to copy and paste from the docs into the code.

Lambda insight alpine compatibility

Hello,
I want to monitor a Lambda function that uses an Alpine-based container image.
Do you know if there is a way to do that?
This documentation doesn't mention how to install Lambda Insights on an image without rpm, such as Alpine Linux:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Lambda-Insights-Getting-Started-docker.html

Do you have any tips or a solution for this?
Thanks in advance for your help.

CloudWatch service quotas table UX issue

When visiting https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/cloudwatch_limits.html on macOS in Chrome or Safari it is really not obvious that the limits table is scrollable as no scrollbar is shown by default.

Instead of limiting the table height to the screen size, it would be more obvious if the table would be the full size by default and the entire page is scrollable like on other parts of the CloudWatch documentation (example).

Table on Safari, especially the bottom border makes it seem like the table ends:

Other part without closing border:

This page is empty: Creating and working with widgets on CloudWatch dashboards

AWS link: https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/create-and-work-with-widgets.html

The link on github: https://github.com/awsdocs/amazon-cloudwatch-user-guide/blob/master/doc_source/create-and-work-with-widgets.md

Screenshot:

I needed to figure out how to create a widget with some custom options and wanted some basic information on how to do it properly, but there is no content on this page when I open it.

ServiceLens_Resource_Health Page Fixes, IAM Policy and GitHub

Hi, the GitHub page for ServiceLens_Resource_Health doesn't exist yet in the doc_source folder.

https://github.com/awsdocs/amazon-cloudwatch-user-guide/tree/master/doc_source/servicelens_resource_health.md

Also, on the IAM policy in prerequisites, the action "cloudwatch:Describe*" is listed twice https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/servicelens_resource_health.html.

Poorly worded description of pod_memory_utilization and pod_memory_utilization_over_pod_limit metrics

The definition of pod_memory_utilization_over_pod_limit is given here:

amazon-cloudwatch-user-guide/doc_source/Container-Insights-metrics-EKS.md

Line 32 in 20a560c

    
           |  `pod_memory_utilization_over_pod_limit` |  PodName, Namespace, ClusterName Namespace, ClusterName Service, Namespace, ClusterName ClusterName  |  The percentage of memory that is being used by pods that is over the pod limit\.  |

However it's not possible for a pod's memory to be over its limit (it would be OOM terminated).

For example, the definition as written implies a metric value of 20% would signal a pod with a limit of 100Mi using 120Mi of memory: $(120Mi-100Mi)/100Mi = 0.2$

It would be more correct to use the definition here:

amazon-cloudwatch-user-guide/doc_source/Container-Insights-reference-performance-entries-EKS.md

Line 20 in d309920

    
           |  Pod |  `pod_memory_utilization_over_pod_limit`  |  Calculated  |  Formula: `pod_memory_working_set / pod_memory_limit` If any containers in the pod don't have a memory limit defined, this field doesn't appear in the log event\.  |

On a similar note with pod_memory_utilization, there is no mention that this is the pod's memory as a proportion of the node's memory (as opposed to the pod's memory request or something else):

amazon-cloudwatch-user-guide/doc_source/Container-Insights-metrics-EKS.md

Line 31 in 20a560c

    
           |  `pod_memory_utilization` |  PodName, Namespace, ClusterName Namespace, ClusterName Service, Namespace, ClusterName ClusterName  |  The percentage of memory currently being used by the pod or pods\.  |

Again, the definition here is presumably more accurate:

amazon-cloudwatch-user-guide/doc_source/Container-Insights-reference-performance-entries-EKS.md

Line 16 in d309920

    
           |  Pod |  `pod_memory_utilization`  |  Calculated  |  Formula: `pod_memory_working_set / node_memory_limit` It is the percentage of pod memory usage over the node memory limitation\.  |
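As a rough sketch of the difference between the two formulas quoted above, the following Python computes both values for the same pod. The function names mirror the metric names but are only illustrative, the sample numbers are made up, and multiplying by 100 is an assumption: the reference tables give plain ratios while the metric descriptions say "percentage".

```python
def pod_memory_utilization(pod_memory_working_set, node_memory_limit):
    """Pod working set as a percentage of the node's memory limit."""
    return 100.0 * pod_memory_working_set / node_memory_limit


def pod_memory_utilization_over_pod_limit(pod_memory_working_set, pod_memory_limit):
    """Pod working set as a percentage of the pod's own memory limit.

    Returns None when no limit is defined, mirroring the note that the
    field doesn't appear in the log event in that case.
    """
    if pod_memory_limit is None:
        return None
    return 100.0 * pod_memory_working_set / pod_memory_limit


# Hypothetical pod: 80 Mi working set, 100 Mi limit, on a node with 4096 Mi.
working_set_mi = 80
pod_limit_mi = 100
node_limit_mi = 4096

print(pod_memory_utilization(working_set_mi, node_limit_mi))
print(pod_memory_utilization_over_pod_limit(working_set_mi, pod_limit_mi))
```

With these formulas `pod_memory_utilization_over_pod_limit` stays at or below 100 for a pod that respects its limit, which is the reading the issue argues for, whereas the "over the pod limit" wording suggests a value that measures the excess.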

How to choose math expression with the new page design

Under Graphed metrics there is no Math expression option. How do I create a math expression in CloudWatch?

ec2tagger not accepting credentials-file

First of all, thank you for a great product and very thorough documentation.

This feels sort of like a support question, yet it could also be intentional, in which case I think it should just be included in the documentation. Feel free to close this and refer me to the right place :-) Or, and very likely, I didn't understand the docs right, IAM is hard :-p

I've been trying to set up CloudWatch such that it will assume an IAM role for all operations it needs to do.
Getting the outputs working was painless and generally very easy.
However, it seems that the processors section (ec2tagger) of the configuration, which relies on a credentials file, does not allow configuration with an assumed role. I've tried adding an IAM role to the instance, allowing it to check its own data, which seems to work.

However, once I try and configure a credentials file to use an assumed role, it breaks. Specifying a credentials file, like credentials1, yields the following error:

2019-12-03T10:25:18Z E! refresh EC2 Instance Tags failed: SharedCredsAccessKey: shared credentials AmazonCloudWatchAgent in *credentials-file* did not contain aws_access_key_id, metrics will be dropped until it got fixed

Just specifying bogus values for both access_key and secret_key doesn't help - as it just uses those values and then fails :-(

I would expect the tag processor to pick up the credential configuration and assume the assigned role.

credentials1

[AmazonCloudWatchAgent]
credential_source = Ec2InstanceMetadata
role_arn = arn:aws:iam…

[default]
credential_source = Ec2InstanceMetadata

processors-section

[processors]

  [[processors.ec2tagger]]
    ec2_instance_tag_keys = ["aws:autoscaling:groupName"]
    ec2_metadata_tags = ["ImageId", "InstanceId", "InstanceType"]
    profile = "AmazonCloudWatchAgent"
    refresh_interval_seconds = "2147483647s"
    shared_credential_file = "*credentials-file*"
    [processors.ec2tagger.tagpass]
      metricPath = ["metrics"]

"Start the CloudWatch Agent on an Amazon EC2 Instance Using the Command Line" section is wrong

This page is really unhelpful. I first tried to use it when the agent first came out and had to give up.

There's a bunch of information about creating the configuration JSON file. However, the agent doesn't read that configuration file. It reads a TOML file, which is generated by the amazon-cloudwatch-agent-ctl fetch-config action. The instructions here:
https://github.com/awsdocs/amazon-cloudwatch-user-guide/blob/master/doc_source/install-CloudWatch-Agent-on-first-instance.md#start-the-cloudwatch-agent-on-an-amazon-ec2-instance-using-the-command-line

aren't for starting the agent; they're for generating the TOML file. This needs to be called out in a separate, labeled step.

This page:

https://github.com/awsdocs/amazon-cloudwatch-user-guide/blob/master/doc_source/install-CloudWatch-Agent-on-EC2-Instance-fleet.md

and the pages referenced here:

https://github.com/awsdocs/amazon-cloudwatch-user-guide/blob/master/doc_source/install-CloudWatch-Agent-on-premise.md

also need this update.

Detailed Guide for Autodiscovery on Amazon ECS clusters: ex3/ex4 JSON syntax, edit github link

UserGuide URL:

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ContainerInsights-Prometheus-Setup-autodiscovery-ecs.html

Issues:

Example 3 is missing a comma after "sd_container_name_pattern": "^envoy$"
Example 4 is missing a comma after the docker_label value
Broken GitHub link:
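Missing commas like the ones listed above are easy to catch by running each documentation example through a JSON parser. The following is a minimal sketch using Python's standard `json` module. The helper name `first_json_error` and the second key in the sample are hypothetical; only the `sd_container_name_pattern` line comes from the guide.

```python
import json


def first_json_error(text):
    """Return a description of the first JSON syntax error, or None if valid."""
    try:
        json.loads(text)
    except json.JSONDecodeError as error:
        return f"line {error.lineno}, column {error.colno}: {error.msg}"
    return None


# The second key is a hypothetical stand-in for whatever follows in Example 3.
broken = """{
  "sd_container_name_pattern": "^envoy$"
  "hypothetical_next_key": "value"
}"""

fixed = """{
  "sd_container_name_pattern": "^envoy$",
  "hypothetical_next_key": "value"
}"""

print(first_json_error(broken))
print(first_json_error(fixed))
```

The parser reports the position where it expected a comma for the broken sample and returns nothing for the fixed one.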

Give information on Container Insights metric resolution

Hello,

It would be great if the Container Insights documentation elaborated on metric resolution.

The doc is here:
https://github.com/awsdocs/amazon-cloudwatch-user-guide/blob/master/doc_source/Container-Insights-metrics-ECS.md

Thanks.

Policy to allow access to CloudWatch log group/stream missing

I am trying to set up Container Insights on Amazon EKS and Kubernetes. To grant the IAM permissions that enable the Amazon EKS worker nodes to send metrics and logs to CloudWatch, I referred to Verify prerequisites. To configure the service account to use an IAM role, I then referred to Configuring a Kubernetes service account to assume an IAM role. But that page doesn't tell me how to create a policy that allows access to the CloudWatch log groups and streams. Instead, the example covers a policy for S3 access.

Lambda-Insights-Getting-Started-clouddevelopmentkit.md does not exist

The documentation at https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Lambda-Insights-Getting-Started-clouddevelopmentkit.html is incomplete: it does not cover attaching the Lambda Insights layer to the actual user Lambda function. When I tried to update the documentation, the GitHub link at the bottom of the page pointed to https://github.com/awsdocs/amazon-cloudwatch-user-guide/tree/master/doc_source/Lambda-Insights-Getting-Started-clouddevelopmentkit.md, but there is no such file in the repo, nor anything similarly named.

cwagent user misconfigured

Package: amazon-cloudwatch-agent. The package creates the user cwagent in its pre-install scriptlet:
preinstall scriptlet (using /bin/sh):
# Stop the agent before upgrades.
if [ $1 -ge 2 ]; then
    if [ -x /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl ]; then
        /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a stop
    fi
fi

if ! grep "^cwagent:" /etc/group >/dev/null 2>&1; then
    groupadd -r cwagent >/dev/null 2>&1
    echo "create group cwagent, result: $?"
fi

if ! id cwagent >/dev/null 2>&1; then
    useradd -r -M cwagent -d /home/cwagent -g cwagent >/dev/null 2>&1
    echo "create user cwagent, result: $?"
fi
The syntax used in the useradd command is contradictory: -d specifies a home directory while -M tells useradd not to create it. The user is also created with the default shell /bin/bash, which is inappropriate for a service user.
The line should be changed to:
useradd -r -M cwagent -s /sbin/nologin -c "Cloudwatch Agent" -g cwagent >/dev/null 2>&1

Source Code Cloud watch Agent

Good Day,

Can anyone point me to the repository where the source code for the CloudWatch agent (Windows and Linux) is stored?

Regards,
Manuel Hubbard

Amazon CloudWatch Agent Restart Updates amazon-cloud-watch-agent.toml File Every Time

Agent Version: 1.3.14217
OS: Windows Server 2016

I am using a chef cookbook to install and configure Amazon CloudWatch Agent inside my server. I have my code configured to restart the agent anytime it detects a change has been made to the amazon-cloud-watch-agent.toml file.

However, every time the agent is restarted, it modifies the amazon-cloud-watch-agent.toml file itself as well. The content stays the same but the MD5 hash of the file changes. This causes chef to think that the file has been updated and hence it gives a call to the agent to restart itself and the cycle continues.

Is there a way to stop the agent from modifying the toml config file on every restart?

Thank you.

Curly Brackets Typo at https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/graph-dynamic-labels.html

At https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/graph-dynamic-labels.html there are curly brackets instead of round brackets in the table for some of the new property dynamic labels:

MetricName
Namespace
Period
Region
Stat

For example, the table shows ${PROP('Stat'}} instead of ${PROP('Stat')}

Delete Unknown dimension created when push metric

Hi,

I have successfully pushed the metric to CloudWatch, but I found some dimensions and I don't know why they are there.
Is there any way I can delete those default dimensions? In my configuration, I append just 2 dimensions to the metric (AutoScalingGroupName and InstanceId). Currently 7 dimensions are created, and I want only those 2.

The configuration that I use is here:

{
    "agent": {
        "metrics_collection_interval": 15,
        "logfile": "{{ aws_cwa_logfile_path }}"
    },

    "metrics": {
        "namespace": "CWAgent2",
        "metrics_collected": {
            "cpu": {
                "measurement": [
                    {"name": "usage_system", "rename": "CPU Usage System", "unit": "Percent"},
                    {"name": "usage_user", "rename": "CPU Usage User", "unit": "Percent"},
                    {"name": "usage_idle", "rename": "CPU Usage Idle", "unit": "Percent"}
                ]
            },

            "mem": {
                "measurement": [
                    {"name": "total", "rename": "Memory Total"},
                    {"name": "available_percent", "rename": "Memory Available", "unit": "Percent"}
                ]
            }
        },

        "append_dimensions": {
            "InstanceId": "${aws:InstanceId}",
            "AutoScalingGroupName": "${aws:AutoScalingGroupName}"
        },

        "aggregation_dimensions" : [["AutoScalingGroupName"], ["InstanceId"],[]]
    }
}

Thank you.

Unclear condition key support documentation in CloudWatch ARC page

I know https://docs.aws.amazon.com/IAM/latest/UserGuide/list_amazoncloudwatch.html is under the IAM namespace, but apparently the data in there is managed by the relevant service teams, so I'm filing the issue here.

The table in that page mentions that aws:ResourceTag/${TagKey} is supported at the bottom, but it's not listed on any of the actions in the table above, so it's unclear which actions actually support it. It seems important in tag-based access control to actually have ResourceTag conditions because RequestTag will only apply on creation of the resource, and if we can't do anything with the tag after it's created, it's not as useful.

Should we assume that every action that takes an alarm resource also supports the aws:ResourceTag family of condition keys? If I compare against https://docs.aws.amazon.com/IAM/latest/UserGuide/list_amazonec2.html for example, that includes aws:ResourceTag in all actions that support it.

Example of underutilization alarm in Anomaly Detection

All of the examples in the anomaly detection page show instances of overutilization, e.g. CPU usage spiking upwards or the number of invocations of a Lambda function spiking upwards. It might also be useful to know when a metric suddenly spikes downward. For example, the number of connections to a web server suddenly spiking downward might indicate that customers cannot connect to my website.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Anomaly_Detection.html

IMO, the page mainly focuses on instances where a metric suddenly spikes upwards, but metrics that suddenly spike downwards might also be just as significant to the overall health of the system and might want to be alarmed on. The page focusing on the upwards pattern makes it appear that the anomaly feature only works for that scenario.

CloudWatch logs to ElasticSearch service?

Is it possible to ship CloudWatch logs into AWS ElasticSearch service? If so, how to ship logs from CloudWatch into ElasticSearch service?

"Premature transitions to alarm state" description contradicts examples and real behavior

The "Avoiding premature transitions to alarm state" section seems to be wrong: it is not in line with the examples above it or with experimental results.

Firstly, it mixes up Evaluation Periods and Evaluation Range in two provided examples (see bolded parts):

However, if the last few data points are - - X - -, the alarm goes into ALARM state even if missing data points are treated as missing. This is because alarms are designed to always go into ALARM state when the oldest available breaching datapoint during the Evaluation Periods number of data points is at least as old as the value of Datapoints to Alarm, and all other more recent data points are breaching or missing. In this case, the alarm goes into ALARM state even if the total number of datapoints available is lower than M (Datapoints to Alarm).

This alarm logic applies to M out of N alarms as well. If the oldest breaching data point during the evaluation range is at least as old as the value of Datapoints to Alarm, and all of the more recent data points are either breaching or missing, the alarm goes into ALARM state no matter the value of M (Datapoints to Alarm).

Furthermore, with that description, the state for 0 - X - - with 2 of 3 data points to alarm and treat missing data as missing should be ALARM:

oldest available breaching point: 3rd from the end
is at least as old as the value of Datapoints to Alarm (2): yes
all more recent data points are breaching or missing: yes

But in the table above, this exact example has the state OK.
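The three checks above can be written out as a small Python sketch. It is a literal transcription of the quoted documentation text, not CloudWatch's actual evaluation logic. The function name and the X / 0 / - encoding (breaching, not breaching, missing; oldest first) are assumptions made for illustration.

```python
def described_rule_alarms(points, datapoints_to_alarm):
    """Apply the rule as worded in the quoted section.

    points: data points ordered oldest first, each "X" (breaching),
            "0" (not breaching) or "-" (missing).
    Returns True if the wording says the alarm goes into ALARM state.
    """
    breaching = [i for i, point in enumerate(points) if point == "X"]
    if not breaching:
        return False
    oldest = breaching[0]
    # Age counted from the newest point: 1 = most recent data point.
    age = len(points) - oldest
    newer = points[oldest + 1:]
    old_enough = age >= datapoints_to_alarm
    newer_all_breaching_or_missing = all(p in ("X", "-") for p in newer)
    return old_enough and newer_all_breaching_or_missing


# 2 out of 3 datapoints to alarm, missing data treated as missing.
print(described_rule_alarms(["0", "-", "X", "-", "-"], 2))
```

Because the wording never looks at data points older than the breaching one, the leading 0 has no effect on the result, which is exactly the disagreement with the table.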

I tested it by creating a CloudWatch Alarm with a period of 10 seconds.

with 0 - X - - data, the alarm was raised 1 minute and 34 seconds after the X datapoint
with X - - data, the alarm was raised 31 seconds after the X datapoint.

This shows that if there are NOT_BREACHING datapoints in the evaluation range, even older than the BREACHING one, the alarm is not raised. This aligns with the examples but not with the "Avoiding premature transitions to alarm state" section, which disregards any data points before the BREACHING one.

Improve Readability

In this page, https://github.com/awsdocs/amazon-cloudwatch-user-guide/blob/master/doc_source/WhatIsCloudWatch.md

The paragraph needs to be written better. Currently it starts like this:

You can create alarms which watch metrics

Clarify what causes the cluster_failed_node_count metric to increment

On the page for EKS and Kubernetes container insight metrics, you document the cluster_failed_node_count metric, by simply saying 'The number of failed worker nodes in the cluster'.

It's unclear as to what conditions constitute a node failure. Anecdotally, we have found this metric incrementing when all nodes in the cluster display a 'Ready' status, and no nodes have a failed instance health check.

Some clarity as to when a node is considered to be in a 'failed' state would greatly aid us in assessing the usefulness of this metric when it comes to alerting on unhealthy nodes.

Cloudwatch agent stops immediately after starting

I've tried the installation steps on both Amazon Linux and RHEL 7 Instances and I'm getting the same result.

First I started the agent with the following command:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/bin/config.json -s

When I check the agent status with this command:

sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a status

I get an output of this

This is a screenshot of my /var/log/messages

This is a screenshot of my cloudwatch agent log file:

Please let me know if you have any tips to solve this issue

ContainerInsights-Prometheus-Setup-configure guide produces errors

Following the directions exactly results in these errors:

D! Profiler dump:
[cloudwatchlogs_/aws/containerinsights/<my_cluster>/prometheus_emfMetricDrop: 1606.000000]
[cloudwatchlogs_/aws/containerinsights/<my_cluster>/prometheus_rawSize: 2346952.000000]
D! [outputs.cloudwatchlogs] Buffer fullness: 0 / 10000 metrics
D! receive metric batch with 17984 prometheus metrics
D! receive metric batch with 5 prometheus metrics
E! [outputs.cloudwatchlogs] Unalbe to marshal structured log content: json: unsupported value: +Inf
D! [outputs.cloudwatchlogs] Wrote batch of 820 metrics in 22.00095ms
D! [outputs.cloudwatchlogs] Buffer fullness: 0 / 10000 metrics
D! [outputs.cloudwatchlogs] Pusher published 172 log events to group: /aws/containerinsights/<my_cluster>/prometheus stream: kubernetes-service-endpoints with size 97 KB in 70.890857ms.
D! [outputs.cloudwatchlogs] Buffer fullness: 0 / 10000 metrics
D! [outputs.cloudwatchlogs] Buffer fullness: 0 / 10000 metrics
D! [outputs.cloudwatchlogs] Buffer fullness: 0 / 10000 metrics
D! [outputs.cloudwatchlogs] Buffer fullness: 0 / 10000 metrics
D! [outputs.cloudwatchlogs] Buffer fullness: 0 / 10000 metrics
D! [outputs.cloudwatchlogs] Buffer fullness: 0 / 10000 metrics
D! [outputs.cloudwatchlogs] Buffer fullness: 0 / 10000 metrics
D! [outputs.cloudwatchlogs] Buffer fullness: 0 / 10000 metrics
D! receive metric batch with 818 prometheus metrics
D! receive metric batch with 5 prometheus metrics
D! [outputs.cloudwatchlogs] Wrote batch of 164 metrics in 7.039542ms
D! [outputs.cloudwatchlogs] Buffer fullness: 0 / 10000 metrics
D! [outputs.cloudwatchlogs] Buffer fullness: 0 / 10000 metrics
E! [outputs.cloudwatchlogs] Aws error received when sending logs to /aws/containerinsights/<my_cluster>/prometheus/kubernetes-apiservers: InvalidParameter: 1 validation error(s) found.
- minimum field size of 1, PutLogEventsInput.LogEvents[19].Message.
E! [outputs.cloudwatchlogs] All /aws/containerinsights/<my_cluster>/prometheus retries to kubernetes-apiservers/28 failed for PutLogEvents, request dropped.
W! [outputs.cloudwatchlogs] Retried 28 time, going to sleep 30.819530014s before retrying.

I obfuscated my cluster name with <my_cluster>

better documentation on how to set region for on-premises servers

I'm trying to configure the Cloudwatch Agent to run outside of EC2. I get stuck at this step:

/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl         -a fetch-config -m onPremise -c "file:/etc/meter/cloudwatch/cloudwatch.json" -s
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source file:/etc/meter/cloudwatch/cloudwatch.json --mode onPrem --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Got Home directory: /root
I! Set home dir Linux: /root
Unable to determine aws-region.
Please make sure the credentials and region set correctly on your hosts.
Refer to http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html
Fail to fetch the config!

Here's what I've tried:

Putting the following contents in both /root/.aws/credentials and /home/myuser/.aws/credentials:

cat /root/.aws/credentials
[AmazonCloudWatchAgent]
aws_access_key_id = mykey
aws_secret_access_key = mysecret
region = us-west-2

Adding region to both [default] and [AmazonCloudWatchAgent] stanzas in /root/.aws/config and /home/myuser/.aws/config
Editing /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml to point directly to /root/.aws/credentials and use AmazonCloudWatchAgent as the profile, though I understand that these are supposed to be the default values
Setting AWS_REGION=us-west-2 in the environment before invoking amazon-cloudwatch-agent-ctl
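When trying several files and profiles like this, it can help to confirm what each file actually declares. The sketch below uses Python's standard `configparser` to look up the `region` key for a profile. It does not reproduce how the agent resolves its region (that is the open question in this issue); it only rules out a typo or a misplaced section. The function name `find_region` is hypothetical.

```python
import configparser


def find_region(config_text, profile):
    """Return the region set for `profile` in AWS-style INI text, or None."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # ~/.aws/credentials uses "[name]" sections; ~/.aws/config uses
    # "[profile name]" for every profile except "[default]".
    for section in (profile, f"profile {profile}"):
        if parser.has_section(section) and parser.has_option(section, "region"):
            return parser.get(section, "region")
    return None


credentials = """
[AmazonCloudWatchAgent]
aws_access_key_id = mykey
aws_secret_access_key = mysecret
region = us-west-2
"""

print(find_region(credentials, "AmazonCloudWatchAgent"))
print(find_region(credentials, "default"))
```

To check a real file, read it first, for example `find_region(open("/root/.aws/config").read(), "AmazonCloudWatchAgent")`.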

Still I'm getting a region error. There's an unanswered post on the forum that seems to have the same issue: https://forums.aws.amazon.com/thread.jspa?threadID=314543

What am I missing?

CloudWatch Agent on FreeBSD?

Can the CloudWatch agent be installed on FreeBSD? The list of metrics mentions a few that are only collected on FreeBSD, but I can't find any mention of FreeBSD being supported or a way to install the agent on it.

Typo on CloudWatch pricing page

Hi there,

This is not directly related to the user guide, but rather the pricing page: https://aws.amazon.com/cloudwatch/pricing/

I have tried to report this to my TAM, but weeks have passed and it hasn't been fixed. So I am trying here.

Example 1 near the bottom says:

Total number of minutes in the month = 60 * 24 * 30 = 42,300 minutes

But that's wrong: 60 * 24 * 30 = 43,200

And then of course this line:

Total Number of API requests = 10 instances * (42,300 minutes/5 minutes) = 84,600 requests

Should say:

Total Number of API requests = 10 instances * (43,200 minutes/5 minutes) = 86,400 requests

The second example has the correct figure.
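The corrected figures are easy to verify. A minimal Python check, using only the numbers quoted above (10 instances, one request every 5 minutes, a 30-day month):

```python
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30

minutes_in_month = MINUTES_PER_HOUR * HOURS_PER_DAY * DAYS_PER_MONTH

instances = 10
request_interval_minutes = 5
total_requests = instances * (minutes_in_month // request_interval_minutes)

print(minutes_in_month)  # 43200
print(total_requests)    # 86400
```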

Hopefully someone here will be able to get it fixed. Thank you!

Tracking Unique Opens and Clicks on Simple Email Service - AWS

I have been working with Amazon SES for quite some time now. My friend and I recently set it up for an organization for email marketing purposes, and it's working really well. It tracks metrics as well. However, an individual recipient may open an email multiple times to check details if the mail interests them. Each open from the same IP is recorded again, and the same problem occurs with clicks. Other services such as Mailchimp are able to determine the unique number of opens, that is, they count an email opened by a single person as one open no matter how many times that person opens it, and count clicks only once per individual. I want to know how to set this up. Amazon's documentation is complicated as it is, and no one on the support team has a clear idea of how to proceed. As an email marketer, I request that you let me know how this can be done.

HSM missing from the list?

Is HSM not included in this list for any specific reason or is it just an oversight?

https://docs.aws.amazon.com/cloudhsm/latest/userguide/hsm-metrics-cw.html

Screenshot in docs is inconsistent with current UI

In https://github.com/awsdocs/amazon-cloudwatch-user-guide/blob/master/doc_source/US_AlarmAtThresholdEC2.md#setting-up-a-cpu-usage-alarm-using-the-aws-management-console, step 4, the UI screenshot is inconsistent with the current CloudWatch UI.

Documentation of ECS Container Insights NetworkRxPackets/NetworkTxPackets

There does not seem to be documentation on the exact meaning of the NetworkRxPackets and NetworkTxPackets fields in Container Insights log records stored in CloudWatch logs:

https://github.com/awsdocs/amazon-cloudwatch-user-guide/blob/master/doc_source/Container-Insights-reference-performance-logs-ECS.md

https://github.com/awsdocs/amazon-cloudwatch-user-guide/blob/master/doc_source/Container-Insights-metrics-ECS.md

We would like to convert those numbers to bytes, to calculate the total amount of data transferred in/out of ECS containers.

Installation of CloudWatch agent on RHEL 8 Instance fails

Hello, when I run the installer in a RHEL 8 instance I get the following error message:

rpm -U amazon-cloudwatch-agent.rpm
error: unpacking of archive failed on file /etc/init/amazon-cloudwatch-agent.conf;63079042: cpio: open failed - No such file or directory
error: amazon-cloudwatch-agent-1.247354.0b251981-1.x86_64: install failed

Thanks for your assistance

Missing instructions for running on EKS Fargate with an IAM role

The documentation is currently missing instructions for running the CloudWatch agent on EKS Fargate using an IAM role with an OIDC provider. The IAM role using OIDC works when running the same pod on an EC2 worker node, but not when running on Fargate.

Better documentation for aggregation_dimensions

If I omit the "aggregation_dimensions" key, CloudWatch publishes metrics using "host" as an aggregation key and the hostname as the value. I would like to roll up on AutoScalingGroupName and on the "host" key.

However, when I add any values to "aggregation_dimensions", CloudWatch stops publishing using the "host" rollup. I tried doing:

"aggregation_dimensions": [["AutoScalingGroupName"], ["host"]]

as well as an empty array ([[]]) but did not see any "host" results when I used either technique.

Do you have tips for how I can publish metrics that are aggregated by hostname?
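
To make the expectation in this question concrete, here is a minimal conceptual sketch of what rolling up by a list of dimension sets means. It is not the agent's implementation and does not explain the observed behavior; the `roll_up` function, the sample datapoints, and the use of a sum as the aggregate are all assumptions made for illustration.

```python
from collections import defaultdict


def roll_up(datapoints, aggregation_dimensions):
    """Aggregate datapoints once per dimension set.

    datapoints: list of (dimensions dict, value) pairs.
    aggregation_dimensions: list of lists of dimension names, in the same
    shape as the "aggregation_dimensions" value quoted above.
    """
    results = {}
    for dim_names in aggregation_dimensions:
        sums = defaultdict(float)
        for dims, value in datapoints:
            # Only datapoints carrying every dimension in the set take part.
            if all(name in dims for name in dim_names):
                key = tuple((name, dims[name]) for name in dim_names)
                sums[key] += value
        results[tuple(dim_names)] = dict(sums)
    return results


# Hypothetical datapoints from two hosts in one Auto Scaling group.
datapoints = [
    ({"AutoScalingGroupName": "web-asg", "host": "ip-10-0-0-1"}, 40.0),
    ({"AutoScalingGroupName": "web-asg", "host": "ip-10-0-0-2"}, 60.0),
]
result = roll_up(datapoints, [["AutoScalingGroupName"], ["host"]])
for dim_set, aggregates in result.items():
    print(dim_set, aggregates)
```

Under this model, the configuration tried above would yield one group-level series plus one series per host, which is the outcome the question is asking for.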

Agent does not resolve instance ID dimension when appended to only 1 metric

Summary:
When I tell the CloudWatch agent to append the EC2 Instance ID dimension to only ONE metric, it does not resolve the Instance ID using the AWS Instance Metadata -- instead it sets the literal pseudo-param string (${aws:InstanceId}) as the Instance ID.
However, if I tell the agent to append the exact same EC2 Instance ID dimension to ALL metrics, it works perfectly (it correctly resolves the pseudo-param into the actual EC2 Instance ID)!

Steps to reproduce:

This is my initial config file:

{
    "metrics":
    {
        "metrics_collected":
        {
            "mem":
            {
                "measurement":
                [
                    "mem_used_percent"
                ],
                "metrics_collection_interval": 300,
                "append_dimensions":
                {
                    "InstanceId": "${aws:InstanceId}"
                }
            }
        }
    }
}

As you can see, this is a very simple configuration: I am telling the agent to publish 1 memory metric every 5 minutes, and append the Instance ID dimension to only that 1 metric.

When I run the agent with this config, this is what I see in CloudWatch:

As you can see, the agent does not resolve ${aws:InstanceId} into the actual EC2 instance ID -- instead, it sets that literal string as the instance ID.

You can also see that in the TOML config file generated by the agent from my JSON config file above:

[agent]
  collection_jitter = "0s"
  debug = false
  flush_interval = "1s"
  flush_jitter = "0s"
  hostname = ""
  http_proxy = "<REDACTED>"
  https_proxy = "<REDACTED>"
  interval = "60s"
  logfile = "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log"
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  no_proxy = "localhost,<REDACTED>,169.254.169.254"
  omit_hostname = false
  precision = ""
  quiet = false
  round_interval = false

[inputs]

  [[inputs.mem]]
    fieldpass = ["used_percent"]
    interval = "300s"
    [inputs.mem.tags]
      InstanceId = "${aws:InstanceId}"      # TAG IS WRONGLY SET TO THE LITERAL STRING '${aws:InstanceId}'

[outputs]

  [[outputs.cloudwatch]]
    force_flush_interval = "60s"
    namespace = "CWAgent"
    region = "us-east-1"
    [outputs.cloudwatch.tagdrop]
      log_group_name = ["*"]

Now, I stop the agent and tell the agent to use the following config instead:

{
    "metrics":
    {
        "append_dimensions":
        {
            "InstanceId": "${aws:InstanceId}"
        },
        "metrics_collected":
        {
            "mem":
            {
                "measurement":
                [
                    "mem_used_percent"
                ],
                "metrics_collection_interval": 300
            }
        }
    }
}

As you can see, the new config is almost identical to the old one -- the only difference is that, now, I am telling the agent to add the Instance ID dimension to ALL metrics (effectively no difference, since I am still collecting only 1 metric).
At this point, apart from the minor tweak to the config above, nothing else has changed -- I'm on the same EC2, with the same IAM role, security group, proxy, etc.

When I run the agent with this new config, this is what I see in CloudWatch:

As you can see, the agent now DOES correctly resolve ${aws:InstanceId} into the actual EC2 instance ID (ID number redacted in the pic above).

You can also see that in the TOML config file generated by the agent from my JSON config file above:

[agent]
  collection_jitter = "0s"
  debug = false
  flush_interval = "1s"
  flush_jitter = "0s"
  hostname = ""
  http_proxy = "<REDACTED>"
  https_proxy = "<REDACTED>"
  interval = "60s"
  logfile = "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log"
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  no_proxy = "localhost,<REDACTED>,169.254.169.254"
  omit_hostname = false
  precision = ""
  quiet = false
  round_interval = false

[inputs]

  [[inputs.mem]]
    fieldpass = ["used_percent"]
    interval = "300s"

[outputs]

  [[outputs.cloudwatch]]
    force_flush_interval = "60s"
    namespace = "CWAgent"
    region = "us-east-1"
    tagexclude = ["host"]
    [outputs.cloudwatch.tagdrop]
      log_group_name = ["*"]

[processors]

  [[processors.ec2tagger]]
    ec2_metadata_tags = ["InstanceId"]     # AGENT NOW KNOWS THAT THE INSTANCE ID IS A METADATA-BASED TAG
    refresh_interval_seconds = "2147483647s"
    [processors.ec2tagger.tagdrop]
      log_group_name = ["*"]

Further Details:
I installed the agent by downloading the latest amd64 deb file for Ubuntu (https://s3.amazonaws.com/amazoncloudwatch-agent/ubuntu/amd64/latest/amazon-cloudwatch-agent.deb), manually transferring it to my Ubuntu EC2, and then running the following command:
dpkg -i amazon-cloudwatch-agent.deb

CloudWatch agent version: 1.208036.0

I started the agent, and told it what config file to use, with the following command:
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json -s

This seems like a bug with the CloudWatch agent -- if not, can you please tell me what I'm doing wrong?
Let me know if you need any further info from me.
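
For anyone trying to confirm whether they are hitting the same behavior, a small check like the following can scan the generated TOML for `${aws:...}` placeholders that were left as literal strings. This is a minimal sketch, not part of the agent: the `find_unresolved_placeholders` function is hypothetical, the sample text is copied from the first TOML excerpt above, and the comment stripping is naive (it assumes `#` does not appear inside a quoted value).

```python
import re

# Matches literal placeholders such as ${aws:InstanceId}.
PLACEHOLDER = re.compile(r"\$\{aws:[A-Za-z]+\}")


def find_unresolved_placeholders(toml_text):
    """Return (line number, placeholder) pairs found in the TOML text."""
    hits = []
    for number, line in enumerate(toml_text.splitlines(), start=1):
        # Drop trailing comments so only the configured value is checked.
        code = line.split("#", 1)[0]
        for match in PLACEHOLDER.finditer(code):
            hits.append((number, match.group(0)))
    return hits


# Excerpt of the generated config from the first (failing) case above.
sample = '''[[inputs.mem]]
  fieldpass = ["used_percent"]
  interval = "300s"
  [inputs.mem.tags]
    InstanceId = "${aws:InstanceId}"
'''
for number, placeholder in find_unresolved_placeholders(sample):
    print(f"line {number}: unresolved {placeholder}")
```

To run it against a real file, read the generated TOML from wherever the agent writes it on your system and pass the contents to the function.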

Configuration not consistent with document

When configuring disk metrics, the measurement name is "used_percent", not "disk_used_percent" as stated in:
https://github.com/awsdocs/amazon-cloudwatch-user-guide/blob/master/doc_source/metrics-collected-by-CloudWatch-agent.md#linux-metrics-enabled-by-CloudWatch-agent

Enabling Lambda Insights for the arm64-based lambda base image

I was building a Lambda container image using public.ecr.aws/lambda/python:3.9-arm64, and the following error occurred during docker build:

=> ERROR [2/3] RUN curl -O https://lambda-insights-extension.s  2.9s
...
#6 2.917        package AWSLogsLambdaInsights-1.0-1.x86_64 is intended for a different architecture
------
executor failed running [/bin/sh -c curl -O https://lambda-insights-extension.s3-ap-northeast-1.amazonaws.com/amazon_linux/lambda-insights-extension.rpm &&
     rpm -U lambda-insights-extension.rpm &&
     rm -f lambda-insights-extension.rpm

It looks like the extension download given in the user guide is for the x86_64 architecture. Is there an arm64 version of the Lambda Insights extension?