Inconsistent behavior and how to scale up (cloudwatchlogsbeat issue, closed)

kimlym commented on July 2, 2024

from cloudwatchlogsbeat.

Comments (6)

kimlym commented on July 2, 2024

Thanks a lot! I think that's all my questions at the moment!

Really appreciate the answers and all the great work!

kkentzo commented on July 2, 2024

Hello!

The application logs will help you troubleshoot your situation (run ./cloudwatchlogsbeat -e -d '*' to view the debug logs as well). Feel free to post some portion of the output here.

I am assuming you've read the comments in the provided configuration sample and set the appropriate values for your use case. I would suggest deactivating hot streams (hot_stream_event_horizon:0) and troubleshooting the basic functionality first.

As for your scaling question: the application is not distributed, in the sense that it does not scale horizontally. If you fire up more instances, the logs will be processed as many times as you have processes running. However, the log groups are processed in parallel within the application process (using goroutines), so your main bottlenecks would be network- or AWS-related (e.g. AWS API throttling limits).
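To illustrate the point about goroutines, here is a minimal sketch of the general pattern (one goroutine per log group, all within a single process). It is not taken from the cloudwatchlogsbeat source; processGroups, the handler and the second group name are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// processGroups runs handle once per log group, each in its own goroutine,
// and returns how many times each group was handled by this process.
// Hypothetical helper for illustration only.
func processGroups(groups []string, handle func(group string)) map[string]int {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		counts = make(map[string]int)
	)
	for _, g := range groups {
		wg.Add(1)
		go func(group string) {
			defer wg.Done()
			handle(group)
			// The map is shared between goroutines, so guard it with a mutex.
			mu.Lock()
			counts[group]++
			mu.Unlock()
		}(g)
	}
	wg.Wait() // block until every group has been handled
	return counts
}

func main() {
	// "/aws/other-group/" is a made-up name for the example.
	groups := []string{"/aws/app-group/", "/aws/other-group/"}
	counts := processGroups(groups, func(group string) {})
	fmt.Println(len(counts), "groups handled in parallel")
}
```

Running a second copy of such a process would simply handle every group a second time, which is why adding instances duplicates the work instead of splitting it.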

Hope the above helps - good luck!

kimlym commented on July 2, 2024

Hi, thanks for the reply!

Yes, I've gone through the config several times and the source code a little bit. I'm not a Go expert, so it's a bit hard for me to understand the whole thing.

When I was debugging earlier, I basically did not see my stream get monitored by the group for a while, but once it was picked up, it got processed relatively fast. Some example output for illustration:

2020-04-16T16:34:36.976Z	INFO	cwl/group.go:110	report[group] 251 2 0 /aws/app-group/ 3m0s
2020-04-16T16:34:42.895Z	INFO	cwl/stream.go:143	report[stream] 82 /aws/app1/logs 3m0s
2020-04-16T16:34:45.111Z	INFO	cwl/stream.go:143	report[stream] 4  /aws/app2/logs 3m0s

I was hoping to see a test-app stream show up in the logs like the ones above, but for a very long period of time it didn't. Is it because I have too many streams in the group and the round-robin load balancer sometimes missed it because other streams are more active?

I also noticed that report_frequency seems to affect how the group picks up newly active streams. I ran a test again this morning and it took about 16 minutes to finish, but after that it ran smoothly.

So is it true that a new stream would not be picked up until the report interval has elapsed, i.e. after 5 minutes, if the stream has not been active for more than stream_event_horizon, i.e. 2 hours?

I will debug with hot streams turned off and, if I find any useful info, will let you know. Thanks!

kkentzo commented on July 2, 2024

The report interval should not be affecting the stream monitoring - it's just a log statement printing out some counters.

My understanding is that your issue has to do with the delay in picking up streams when the application starts. After that, things tend to go "smoothly".

One reason why this happens is that the application has to process all the events within the stream_event_horizon - any event older than this value will be ignored. So for stream_event_horizon=2h, the application has to process 2 hours' worth of events when it first starts and that's why it appears to be slower. On top of that, because of the larger amount of data that needs to be fetched from cloudwatchlogs, you would also get more AWS API throttling errors (these are handled gracefully by the application).

Once the application processes all the events within the stream_event_horizon at startup time, then only new events will be considered, so things will appear to work smoothly.

Hope this helps.

kimlym commented on July 2, 2024

Thanks a lot for the reply!

I experimented a bit with tweaking stream_event_horizon, and the logs do come up faster and with less latency. After that, things are going much more smoothly.

Hopefully one last question, regarding the throttling. After retuning the settings, I notice that even though the app itself is not reporting "throttling: rate limit exceeded" as often, I'm constantly being throttled when manually using the AWS Console for CloudWatch. So I'm curious: is the app always set to use as many AWS API calls as possible, or is there a limit I can set, e.g. only process 10 streams in a given time range for x amount of time?

Is the queue_size something that I can use to control this?

Thanks!

kkentzo commented on July 2, 2024

The queue_size is not relevant to the AWS throttling errors. One way to decrease the number of AWS API calls (and reduce throttling errors) is to increase the values of the frequency parameters, for example:

stream_event_refresh_frequency (how often to inquire for new events in streams),
hot_stream_event_refresh_frequency (how often to inquire for new events in "hot" streams),
group_refresh_frequency (how often to inquire for new log groups) and
stream_refresh_frequency (how often to inquire for new event streams within log groups)

The AWS client uses the default retry policy (10 retries with exponential backoff). Limits involving API calls per unit of time have not been implemented.
