Comments (6)

kimlym commented on July 2, 2024

Thanks a lot! I think that's all my questions at the moment!

Really appreciate the answers and all the great work!

from cloudwatchlogsbeat.

kkentzo commented on July 2, 2024

Hello!

The application logs will help you troubleshoot your situation (run ./cloudwatchlogsbeat -e -d '*' to view the debug logs as well). Feel free to post some portion of the output here.

I am assuming you've read the comments in the provided configuration sample and set the appropriate values for your use case. I would suggest deactivating hot streams (hot_stream_event_horizon:0) and troubleshooting the basic functionality first.

As for your scaling question: the application is not distributed, in the sense that it does not scale horizontally. If you fire up more instances, the logs will be processed as many times as you have processes running. However, the log groups are processed in parallel within the application process (using goroutines), so your main bottlenecks would be network- or AWS-related (e.g. AWS API throttling limits).
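To illustrate the "parallel within one process" point, here is a minimal Go sketch of fanning out work across log groups with one goroutine per group. It is not taken from the cloudwatchlogsbeat source; processGroups, the handle callback and the group names are hypothetical and only show the general pattern.

```go
package main

import (
	"fmt"
	"sync"
)

// processGroups is a hypothetical helper: it runs handle once per log group,
// each call in its own goroutine, and returns the per-group results after
// all goroutines have finished.
func processGroups(groups []string, handle func(group string) int) map[string]int {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]int, len(groups))
	)
	for _, g := range groups {
		wg.Add(1)
		go func(group string) {
			defer wg.Done()
			n := handle(group)
			// The map is shared between goroutines, so guard writes with a mutex.
			mu.Lock()
			results[group] = n
			mu.Unlock()
		}(g)
	}
	wg.Wait()
	return results
}

func main() {
	// Made-up group names; the handler stands in for "fetch and publish events".
	groups := []string{"/aws/app1", "/aws/app2"}
	counts := processGroups(groups, func(group string) int { return len(group) })
	fmt.Println(counts)
}
```

All of the parallelism here lives inside a single process. Starting a second copy of the same program would simply repeat the same work, which matches the duplicate-processing behaviour described above.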

Hope the above helps - good luck!


kimlym commented on July 2, 2024

Hi, thanks for the reply!

Yes, I've gone through the config several times and the source code a little bit. I'm not a Go expert, so it's a bit hard for me to understand the whole thing.

When I debugged earlier, I basically did not see my stream get monitored by the group for a while, but once it was picked up, it got processed relatively fast. Some example output for illustration:

2020-04-16T16:34:36.976Z	INFO	cwl/group.go:110	report[group] 251 2 0 /aws/app-group/ 3m0s
2020-04-16T16:34:42.895Z	INFO	cwl/stream.go:143	report[stream] 82 /aws/app1/logs 3m0s
2020-04-16T16:34:45.111Z	INFO	cwl/stream.go:143	report[stream] 4  /aws/app2/logs 3m0s

I was hoping to see test-app pop up in the logs like above, but for a very long time it didn't show up. Is it because I have too many streams in the group and the round-robin load balancer sometimes missed it because other streams are more active?

I also noticed that report_frequency seems to affect how the group picks up newly active streams? I ran a test again this morning: it took about 16 minutes to finish, but after that it ran smoothly.

So is it true that a new stream would not be picked up until the report interval has elapsed (i.e. after 5 minutes) if the stream has been inactive for longer than stream_event_horizon (i.e. 2 hours)?

I will debug with hot streams turned off, and if I see any useful info, I will let you know. Thanks!


kkentzo commented on July 2, 2024

The report interval should not be affecting the stream monitoring - it's just a log statement printing out some counters.

My understanding is that your issue has to do with the delay in picking up streams when the application starts. After that, things tend to go "smoothly".

One reason why this happens is that the application has to process all the events within the stream_event_horizon - any event older than this value will be ignored. So for stream_event_horizon=2h, the application has to process 2 hours' worth of events when it first starts, and that's why it appears to be slower. On top of that, because of the larger amount of data that needs to be fetched from cloudwatchlogs, you would also get more AWS API throttling errors (these are handled gracefully by the application).

Once the application processes all the events within the stream_event_horizon at startup time, then only new events will be considered, so things will appear to work smoothly.

Hope this helps.


kimlym commented on July 2, 2024

Thanks a lot for the reply!

I experimented a bit with tweaking stream_event_horizon, and the logs do come up faster with less latency. After that, things run much more smoothly.

Hopefully one last question, regarding throttling. After retuning the settings, I notice that even though the app itself is not reporting throttling: rate limit exceeded as often, I'm constantly being throttled when manually using the AWS Console for CloudWatch. So I'm curious: does the app always use as many AWS API calls as possible, or is there a limit I can set, e.g. process only 10 streams within a given time range, for x amount of time?

Is the queue_size something that I can use to control this?

Thanks!


kkentzo commented on July 2, 2024

The queue_size is not relevant to the AWS throttling errors. One way to decrease the number of AWS API calls (and reduce throttling errors) is to increase the values of the frequency parameters, for example:

  • stream_event_refresh_frequency (how often to inquire for new events in streams),
  • hot_stream_event_refresh_frequency (how often to inquire for new events in "hot" streams),
  • group_refresh_frequency (how often to inquire for new log groups) and
  • stream_refresh_frequency (how often to inquire for new event streams within log groups)

The AWS client uses the default retry policy (10 retries with exponential backoff). Limits involving API calls per unit of time have not been implemented.

