hossain-rayhan commented on September 25, 2024

Hi @akshayhiremath, I believe we have a default retry mechanism implemented in our output plugin, which fluent-bit supports. If Firehose returns a throttling error, the plugin should return FLB_RETRY and fluent-bit will try sending the logs again. Here is the code that I think implements it.

Could you please share your full config file and a debug log?

from amazon-kinesis-firehose-for-fluent-bit.

PettitWesley commented on September 25, 2024

> The fluent bit image being used is amazon/aws-for-fluent-bit:1.2.0.

Please upgrade. That version is extremely old. It's one of the first versions we ever released, more than a year ago.

PettitWesley commented on September 25, 2024

> If the limit is low and the data volume is also small, then microbursts should be handled by the output plugin's retry (which is exactly our case). How can we do this? According to the Fluent Bit documentation, the output plugin should support it.

I mostly agree with your reasoning... but having your destination pipe slightly over-provisioned is also wise/safe. What if you get a spike in logs? Using retries to handle spikes only works if the spikes are very small and very occasional.

Of course the plugin should retry, and the current code does. The old code should have as well, but we don't really support debugging versions this old. So please upgrade.
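The point about spikes can be checked with a little arithmetic. The Go sketch below is a simplified model, assuming a destination that accepts at most 1 MB per second and that anything rejected is retried in the following second (real retries use backoff, so the delay would be longer). The function name `drainWithRetries` is hypothetical.

```go
package main

import "fmt"

// drainWithRetries (hypothetical) models a destination that accepts at most
// limit bytes per second. Bytes that don't fit are carried over (retried)
// into the next second. It returns the backlog left after each second.
func drainWithRetries(incoming []int, limit int) []int {
	backlog := 0
	out := make([]int, len(incoming))
	for i, b := range incoming {
		pending := backlog + b
		sent := pending
		if sent > limit {
			sent = limit
		}
		backlog = pending - sent
		out[i] = backlog
	}
	return out
}

func main() {
	const limit = 1_000_000 // 1 MB/s, in bytes per second

	// A short microburst slightly over the limit, then normal traffic.
	burst := []int{300_000, 1_050_000, 300_000, 300_000}
	fmt.Println(drainWithRetries(burst, limit)) // [0 50000 0 0]

	// Sustained traffic above the limit: the backlog keeps growing.
	sustained := []int{1_200_000, 1_200_000, 1_200_000}
	fmt.Println(drainWithRetries(sustained, limit)) // [200000 400000 600000]
}
```

Under these assumptions, a burst a few KB over the limit is absorbed within a second, while sustained traffic above the limit builds a backlog that retries alone can never clear.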

akshayhiremath commented on September 25, 2024

Hey @hossain-rayhan, thanks for the reference to the code. Is there a way I can check the version of the plugin used in our setup? Where can I find it?

The config file is attached. I changed its extension to .txt because GitHub doesn't accept YAML attachments yet :-)
I don't have full logs, but I'm adding below the logs from the case we opened with AWS.
Right now, because there was a lot of noise in our project about this, someone in the organization decided to increase the Firehose limit to 5MB/s.

Since then, looking at the CloudWatch metrics, I find that there are microbursts going a little over 1 MB/s, but the overall log volume pushed to Firehose is quite low.

I want to get the limit reduced back to 1 MB/s for one of our AWS accounts and try again.
These are the error logs from when we were getting throttling errors. Sorry, they don't provide much context, but I will keep you posted:

time="2021-04-05T17:45:55Z" level=error msg="[firehose] 327 records failed to be delivered\n"
time="2021-04-05T17:45:55Z" level=error msg="[firehose] 284 records failed to be delivered\n"
time="2021-04-05T17:45:55Z" level=error msg="[firehose] 149 records failed to be delivered\n"
time="2021-04-05T17:45:59Z" level=error msg="[firehose] 137 records failed to be delivered\n"
fbConf-redact.yaml.txt

hossain-rayhan commented on September 25, 2024

Hi @akshayhiremath, when fluent-bit starts, it prints its version at the beginning of the log. It's important to know which version you are using.

If you cannot find it, can you tell me which image version you are using? We can also determine the plugin version from the GitHub release.

If you can run fluent-bit with the log level set to DEBUG, the logs will give us more insight into what's going wrong.

akshayhiremath commented on September 25, 2024

Hey @hossain-rayhan, the fluent-bit version we are using is 1.2.0. I'm still looking for a way to find the output plugin version; let me check the image.

Sorry for the late response.

Output:
bash-4.2# cd fluent-bit/
bash-4.2# ls -lrt
total 44444
-rw-r--r-- 1 root root 22587776 Jul 9 2019 firehose.so
-rw-r--r-- 1 root root 22917608 Jul 9 2019 cloudwatch.so
drwxr-xr-x 2 root root 6 Jul 9 2019 log
drwxr-xr-x 2 root root 24 Jul 9 2019 bin
drwxr-xr-x 1 root root 24 Jul 9 2019 licenses
drwxrwxrwx 3 root root 158 Apr 23 04:07 etc
bash-4.2# cd bin
bash-4.2# ls -lrt
total 15716
-rwxr-xr-x 1 root root 16090432 Jul 9 2019 fluent-bit
bash-4.2# ./fluent-bit --version
Fluent Bit v1.2.0

zhonghui12 commented on September 25, 2024

Hi @akshayhiremath, Fluent Bit 1.2.0 is an old version; the latest is 1.7.4: https://fluentbit.io/announcements/v1.7.4/. Could you please try upgrading to 1.7.4? I believe we have a default retry mechanism implemented in our output plugin. If you are using our aws-for-fluent-bit image, we have released 2.14.0 to support Fluent Bit 1.7.4: https://github.com/aws/aws-for-fluent-bit/releases/tag/v2.14.0.

PettitWesley commented on September 25, 2024

> someone in the organization decided to increase the limit of firehose to 5MB/s.

If you are having throttling issues with Firehose, you should request a limit increase. This is the right approach. If your destination cannot handle your log throughput, there is nothing Fluent Bit can do to help.

It is very common for customers who use Firehose for logs to need higher limits than the default.

akshayhiremath commented on September 25, 2024

There are two issues:

  1. If the limit is low and the data volume is also small, then microbursts should be handled by the output plugin's retry (which is exactly our case). How can we do this? According to the Fluent Bit documentation, the output plugin should support it.
  2. How can we avoid the microbursts altogether? By better managing the data pushed to Firehose, perhaps by buffering it using Fluent Bit's capabilities.
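For the second point, one generic way to smooth microbursts is pacing: buffer records locally and release at most a fixed byte budget per interval, so a burst is spread over the following intervals instead of exceeding the destination limit. The Go sketch below illustrates only that idea. The `pacer` type is hypothetical, it is not a Fluent Bit or Firehose feature, and it assumes one tick per second and a 1 MB budget to match the limit discussed in this thread.

```go
package main

import "fmt"

// pacer (hypothetical) is a FIFO buffer that releases at most budget bytes per tick.
type pacer struct {
	budget int   // maximum bytes released per tick (assumed: one tick per second)
	queue  []int // sizes of buffered records in bytes, oldest first
}

// add buffers a record of the given size in bytes.
func (p *pacer) add(size int) {
	p.queue = append(p.queue, size)
}

// tick releases whole records, oldest first, until the next one would exceed
// the budget, and returns the number of bytes released.
func (p *pacer) tick() int {
	sent := 0
	for len(p.queue) > 0 {
		next := p.queue[0]
		// Always release at least one record so an oversized record cannot stall the queue.
		if sent > 0 && sent+next > p.budget {
			break
		}
		sent += next
		p.queue = p.queue[1:]
	}
	return sent
}

func main() {
	p := &pacer{budget: 1_000_000} // 1 MB per tick
	for i := 0; i < 11; i++ {
		p.add(100_000) // a 1.1 MB microburst arrives at once
	}
	first := p.tick()
	second := p.tick()
	fmt.Println(first, second, len(p.queue)) // 1000000 100000 0
}
```

The trade-off is latency: the tail of a burst is delivered one interval later instead of being rejected and retried.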

But what if the actual data being transferred is far below 5MB/s, with only occasional microbursts? After a closer analysis of the CloudWatch metrics, I found that there are just a few seconds every 3 hours when we go slightly over 1MB/s, and only by a few KB.
The rest of the time, traffic is below 300KB/s.

Shouldn't microbursts be handled better than by just increasing the limit? If the volume of data being pushed really is more than 1MB/s, then we should certainly increase the limit.
Also, frequent delivery of small chunks to the destination increases cost unnecessarily: AWS charges for S3 based on the number of requests as well.

Ref: https://docs.aws.amazon.com/firehose/latest/dev/limits.html

"If the increased quota is much higher than the running traffic, it causes small delivery batches to destinations. This is inefficient and can result in higher costs at the destination services. Be sure to increase the quota only to match current running traffic, and increase the quota further if traffic increases."

akshayhiremath commented on September 25, 2024

The fluent bit image being used is amazon/aws-for-fluent-bit:1.2.0.

Image hash: v2/amazon/aws-for-fluent-bit/manifests/sha256:8b293d1c731f6626604b49edc2e9897528a10e05e0510018ae496569ab8eae8b

akshayhiremath commented on September 25, 2024

Sure, we will upgrade the plugin version and see.
This issue can be closed because the output plugin supports retries in newer versions.

About log spikes:
For non-production environments, where traffic is quite low (a handful of users) and, even in projections, we don't see any growth in users or load, we will try to manage it through Fluent Bit configuration.
Right now the spikes occur mainly when the automated test suite hits the environment. If we still hit the limit after proper tuning, that would be an early alarm, which is better than discovering the AWS service utilization in a shocking invoice.
We have multiple AWS accounts, so the aggregated cost can make a significant difference.
