
Comments (17)

RobiD commented on August 11, 2024

@frodenas it works with the upgrade. Many thanks!

from firehose_exporter.

RobiD commented on August 11, 2024

Dear @frodenas,

do you have any news about this ticket?

Thanks!

Regards,
Robert

frodenas commented on August 11, 2024

Is it possible that for the same exchange you have two different queues, one with name insurable-interest-service and another one with insurableInterestService?

Do you get a 500 error message from the firehose exporter, after which it dies? Or does it just print those error messages?

A similar issue was reported 2 months ago in #20. I changed the behavior so that the exporter does not fail, but just prints an error message in the logs and does not report the metric. The problem here is that the metric name reported by the firehose contains the exchange and queue names, instead of reporting them as labels. The exporter needs to do a name/description conversion because Prometheus has some rules regarding metric names, and in those cases, there's a collision.

RobiD commented on August 11, 2024

Dear @frodenas,

It's unlikely that we would use two different queues.

To answer the second question: it only prints those messages. The funny part is that it's always a different number of errors:

An error has occurred during metrics collection:

24 error(s) occurred:
* collected metric firehose_value_metric_p_rabbitmq_p_rabbitmq_rabbitmq_queues_0_d_8748_f_6_8_c_33_4_c_51_99_ed_1_f_78_c_077761_e_deal_duplication_exchange_coverage_service_depth label:<name:"bosh_deployment" value:"cf-rabbitmq" > label:<name:"bosh_job_id" value:"0763273a-49d1-4615-8ce0-0e9cca4dbdcb" > label:<name:"bosh_job_ip" value:"10.16.190.138" > label:<name:"bosh_job_name" value:"rabbitmq-server" > label:<name:"environment" value:"" > label:<name:"origin" value:"p-rabbitmq" > label:<name:"unit" value:"count" > gauge:<value:0 >  has help "Cloud Foundry Firehose '/p-rabbitmq/rabbitmq/queues/0d8748f6-8c33-4c51-99ed-1f78c077761e/DealDuplicationExchange.coverage-service/depth' value metric from 'p-rabbitmq'." but should have "Cloud Foundry Firehose '/p-rabbitmq/rabbitmq/queues/0d8748f6-8c33-4c51-99ed-1f78c077761e/DealDuplicationExchange.coverageService/depth' value metric from 'p-rabbitmq'."

...

It does not give any 500 errors, nor does it die. The logs from the container look like this:


time="2018-01-31T09:04:10-05:00" level=info msg="Starting firehose_exporter (version=, branch=, revision=)" source="firehose_exporter.go:250"
time="2018-01-31T09:04:10-05:00" level=info msg="Build context (go=go1.7.3, user=, date=)" source="firehose_exporter.go:251"
time="2018-01-31T09:04:10-05:00" level=info msg="Starting Firehose Nozzle..." source="firehose_nozzle.go:58"
time="2018-01-31T09:04:10-05:00" level=info msg="Listening on :9186" source="firehose_exporter.go:328"

Also, when we stop the container and deploy it from scratch, it always works for a couple of seconds and then starts giving us the above errors.

Any idea how to resolve it?

frodenas commented on August 11, 2024

@RobiD Can you please tell me which version of RabbitMQ for PCF you are using? I'll try to do some research on that tile.

RobiD commented on August 11, 2024

Dear @frodenas,

here you go:
Pivotal Elastic Runtime version: v1.11.12
RabbitMQ version: 1.10.8

Your assistance in this matter would be greatly appreciated. Thanks!

Regards,
Robert

RobiD commented on August 11, 2024

Hi @frodenas,

do you have any news regarding this ticket?

Thanks!

Regards,
Robert

frodenas commented on August 11, 2024

@RobiD apologies for the late reply, it took me some time to set up an environment with PCF 1.11 & RMQ 1.10 in order to debug this issue.

I'm only able to reproduce this problem when I create two queues with similar names on the same vhost. In your output, there seem to be two queues (DealDuplicationExchange.insurable-interest-service & DealDuplicationExchange.insurableInterestService) defined on the same vhost (0d8748f6-8c33-4c51-99ed-1f78c077761e). The RMQ service metrics return 2 metrics for each queue (consumers & queue depth), and the names of those metrics contain the vhost & queue name, so for the above queues, it will return:

  • /p-rabbitmq/rabbitmq/queues/0d8748f6-8c33-4c51-99ed-1f78c077761e/DealDuplicationExchange.insurable-interest-service/consumers
  • /p-rabbitmq/rabbitmq/queues/0d8748f6-8c33-4c51-99ed-1f78c077761e/DealDuplicationExchange.insurable-interest-service/depth
  • /p-rabbitmq/rabbitmq/queues/0d8748f6-8c33-4c51-99ed-1f78c077761e/DealDuplicationExchange.insurableInterestService/consumers
  • /p-rabbitmq/rabbitmq/queues/0d8748f6-8c33-4c51-99ed-1f78c077761e/DealDuplicationExchange.insurableInterestService/depth

Prometheus has strict rules on metric naming, so this exporter sanitizes the metric names returned by the firehose before reporting them to Prometheus. In the above case, the DealDuplicationExchange.insurable-interest-service & DealDuplicationExchange.insurableInterestService queue names collide when they are sanitized, as both become deal_duplication_exchange_insurable_interest_service. When there is a collision, the resulting metric value will be the last one reported, so in this case only one of the metric values will be reported, without any error being printed. But as part of the returned metric, the exporter also returns a metric help string, and the exporter does NOT sanitize it, because we want to return the original firehose metric name. So what is happening here is that the exporter returns 2 metrics with the same name but different help strings, and Prometheus therefore complains about it (has help ... but should have ...).
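
To illustrate the collision described above, here is a minimal Go sketch. The sanitizeName function is a hypothetical, simplified stand-in written for this example, not the exporter's actual conversion code: it assumes camelCase words are split, every character that is not a letter or digit becomes an underscore, and the result is lowercased.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// sanitizeName is a hypothetical, simplified name conversion (an assumption
// for illustration, not the exporter's real code). It splits camelCase
// words, maps every character that is not a letter or digit to '_', and
// lowercases the result.
func sanitizeName(name string) string {
	var b strings.Builder
	var prev rune
	for i, r := range name {
		switch {
		case unicode.IsUpper(r):
			// Start a new word when an uppercase letter follows a
			// lowercase letter or a digit.
			if i > 0 && (unicode.IsLower(prev) || unicode.IsDigit(prev)) {
				b.WriteRune('_')
			}
			b.WriteRune(unicode.ToLower(r))
		case unicode.IsLetter(r) || unicode.IsDigit(r):
			b.WriteRune(r)
		default:
			// '.', '-', '/' and similar separators all become '_'.
			b.WriteRune('_')
		}
		prev = r
	}
	return b.String()
}

func main() {
	dashed := sanitizeName("DealDuplicationExchange.insurable-interest-service")
	camel := sanitizeName("DealDuplicationExchange.insurableInterestService")
	fmt.Println(dashed)
	fmt.Println(camel)
	fmt.Println("collision:", dashed == camel)
}
```

Under these assumptions both queue names map to deal_duplication_exchange_insurable_interest_service, while the unsanitized help strings still differ, which is the mismatch reported in the error output.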

There is no easy workaround for this. But before trying to find a solution, can you please confirm whether you see both the DealDuplicationExchange.insurable-interest-service & DealDuplicationExchange.insurableInterestService queues on the vhost 0d8748f6-8c33-4c51-99ed-1f78c077761e?

RobiD commented on August 11, 2024

Hi @frodenas,

thanks for this comprehensive research. Yes, I can confirm that both queues are available.

The correct name of the second queue is: DealCreationExchange.insurable-interest-service

Thanks!

Regards,
Robert

geofffranks commented on August 11, 2024

We recently ran into this issue ourselves, where there were two queues named similarly, one with a - as the word delimiter, the other with a . as the delimiter. Is there a way we can update the exporter to log these errors and then report them in the exporter metrics, without breaking metrics collection for everything in the firehose?

frodenas commented on August 11, 2024

@geofffranks which version are you using? v4.2.7 should fix that: it just logs an error, discards the conflicting metric, and keeps reporting everything else. Do you still see this happening on newer versions?

geofffranks commented on August 11, 2024

  build user:       root@3856e24b8b0b
  build date:       20180202-07:01:33
  go version:       go1.9.2

(from prometheus-boshrelease v21.1.1)

geofffranks commented on August 11, 2024

The exporter isn't panicking for us, but the error output shows up in the /metrics endpoint (with no metric data), rather than the app logs.

RobiD commented on August 11, 2024

Hi @frodenas,

do you have any news on how to resolve the above issue?

Thanks!

Regards,
Robert

frodenas commented on August 11, 2024

@RobiD are you also seeing those error outputs in the /metrics endpoint?

RobiD commented on August 11, 2024

Hi @frodenas,

Yes, we also see the error outputs in the /metrics endpoint. See the snippet of the message here:

An error has occurred during metrics collection:

20 error(s) occurred:
* collected metric firehose_value_metric_p_rabbitmq_p_rabbitmq_rabbitmq_queues_e_5_d_35_ba_5_1360_4514_8_a_04_6_b_850_fcd_1192_deal_renewal_exchange_insurable_interest_service_depth label:<name:"bosh_deployment" value:"cf-rabbitmq" > label:<name:"bosh_job_id" value:"0763273a-49d1-4615-8ce0-0e9cca4dbdcb" > label:<name:"bosh_job_ip" value:"10.16.190.138" > label:<name:"bosh_job_name" value:"rabbitmq-server" > label:<name:"environment" value:"" > label:<name:"origin" value:"p-rabbitmq" > label:<name:"unit" value:"count" > gauge:<value:29 >  has help "Cloud Foundry Firehose '/p-rabbitmq/rabbitmq/queues/e5d35ba5-1360-4514-8a04-6b850fcd1192/DealRenewalExchange.insurableInterestService/depth' value metric from 'p-rabbitmq'." but should have "Cloud Foundry Firehose '/p-rabbitmq/rabbitmq/queues/e5d35ba5-1360-4514-8a04-6b850fcd1192/DealRenewalExchange.insurable-interest-service/depth' value metric from 'p-rabbitmq'."
* collected metric firehose_value_metric_p_rabbitmq_p_rabbitmq_rabbitmq_queues_35_e_8_f_6_af_6250_44_c_7_ae_2_d_0_e_672693_ac_5_a_deal_duplication_exchange_coverage_service_consumers label:<name:"bosh_deployment" value:"cf-rabbitmq" > label:<name:"bosh_job_id" value:"0763273a-49d1-4615-8ce0-0e9cca4dbdcb" > label:<name:"bosh_job_ip" value:"10.16.190.138" > label:<name:"bosh_job_name" value:"rabbitmq-server" > label:<name:"environment" value:"" > label:<name:"origin" value:"p-rabbitmq" > label:<name:"unit" value:"count" > gauge:<value:0 >  has help "Cloud Foundry Firehose '/p-rabbitmq/rabbitmq/queues/35e8f6af-6250-44c7-ae2d-0e672693ac5a/DealDuplicationExchange.coverageService/consumers' value metric from 'p-rabbitmq'." but should have "Cloud Foundry Firehose '/p-rabbitmq/rabbitmq/queues/35e8f6af-6250-44c7-ae2d-0e672693ac5a/DealDuplicationExchange.coverage-service/consumers' value metric from 'p-rabbitmq'."

I will send the complete error message directly to your email.

Regards,
Robert

frodenas commented on August 11, 2024

Thanks. This has been fixed in v5.0.3: those errors are now logged, and the /metrics endpoint no longer shows them and reports all metrics.

I'll cut a new prometheus bosh release with this version soon. In the meantime, if you want to test it, you can manually override the firehose_exporter binary.
