Comments (17)
@frodenas it works with the upgrade. Many thanks!
from firehose_exporter.
Dear @frodenas,
do you have any news about this ticket?
Thanks!
Regards,
Robert
from firehose_exporter.
Is it possible that for the same exchange you have two different queues, one named insurable-interest-service and another named insurableInterestService?
Do you get a 500 error message from the firehose exporter and then it dies, or does it just print those error messages?
A similar issue was reported 2 months ago at #20. I changed the behavior to not fail, but just print an error message in the logs and not report the metric. The problem here is that the metric name reported by the firehose contains the exchange and queue names, instead of reporting them as labels. The exporter needs to do a name/description conversion because Prometheus has some rules regarding metric names, and in those cases there's a collision.
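To illustrate the point about names versus labels, here is a minimal sketch of how a queue metric path like the ones in this thread could be split into a fixed metric name plus labels. This is not how the exporter or the RabbitMQ tile works; the function `split_queue_metric`, the `rabbitmq_queue_` prefix, and the label names `vhost` and `queue` are all hypothetical.

```python
import re

# Hypothetical pattern for the p-rabbitmq queue metric paths quoted in this thread.
QUEUE_METRIC = re.compile(
    r"^/p-rabbitmq/rabbitmq/queues/(?P<vhost>[^/]+)/(?P<queue>[^/]+)/(?P<kind>[^/]+)$"
)


def split_queue_metric(path):
    """Return (metric_name, labels) for a queue metric path, or None if it does not match."""
    m = QUEUE_METRIC.match(path)
    if m is None:
        return None
    # The metric name only carries the kind of measurement (depth, consumers).
    name = "rabbitmq_queue_" + m.group("kind")
    # The vhost and queue name are kept verbatim as label values.
    labels = {"vhost": m.group("vhost"), "queue": m.group("queue")}
    return name, labels
```

With this shape, two similarly named queues would share one metric name and differ only in the `queue` label value, so no name conversion of the queue name would be needed.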
from firehose_exporter.
Dear @frodenas,
it's unlikely that we would use two different queues.
To answer the second question: it only prints those messages. The funny part is that it's always a different number:
An error has occurred during metrics collection:
24 error(s) occurred:
* collected metric firehose_value_metric_p_rabbitmq_p_rabbitmq_rabbitmq_queues_0_d_8748_f_6_8_c_33_4_c_51_99_ed_1_f_78_c_077761_e_deal_duplication_exchange_coverage_service_depth label:<name:"bosh_deployment" value:"cf-rabbitmq" > label:<name:"bosh_job_id" value:"0763273a-49d1-4615-8ce0-0e9cca4dbdcb" > label:<name:"bosh_job_ip" value:"10.16.190.138" > label:<name:"bosh_job_name" value:"rabbitmq-server" > label:<name:"environment" value:"" > label:<name:"origin" value:"p-rabbitmq" > label:<name:"unit" value:"count" > gauge:<value:0 > has help "Cloud Foundry Firehose '/p-rabbitmq/rabbitmq/queues/0d8748f6-8c33-4c51-99ed-1f78c077761e/DealDuplicationExchange.coverage-service/depth' value metric from 'p-rabbitmq'." but should have "Cloud Foundry Firehose '/p-rabbitmq/rabbitmq/queues/0d8748f6-8c33-4c51-99ed-1f78c077761e/DealDuplicationExchange.coverageService/depth' value metric from 'p-rabbitmq'."
...
It does not give any 500 errors, nor does it die. The logs from the container look like this:
time="2018-01-31T09:04:10-05:00" level=info msg="Starting firehose_exporter (version=, branch=, revision=)" source="firehose_exporter.go:250"
time="2018-01-31T09:04:10-05:00" level=info msg="Build context (go=go1.7.3, user=, date=)" source="firehose_exporter.go:251"
time="2018-01-31T09:04:10-05:00" level=info msg="Starting Firehose Nozzle..." source="firehose_nozzle.go:58"
time="2018-01-31T09:04:10-05:00" level=info msg="Listening on :9186" source="firehose_exporter.go:328"
Also, when we stop the container and deploy it from scratch, it always works for a couple of seconds and then starts giving us the above errors.
Any idea how to resolve it?
from firehose_exporter.
@RobiD Can you please tell me what version of RabbitMQ for PCF you are using? I'll try to do some research on that tile.
from firehose_exporter.
Dear @frodenas,
here you go:
Pivotal Elastic Runtime version: v1.11.12
RabbitMQ version: 1.10.8
Your assistance in this matter would be greatly appreciated. Thanks!
Regards,
Robert
from firehose_exporter.
Hi @frodenas,
do you have any news regarding this ticket?
Thanks!
Regards,
Robert
from firehose_exporter.
@RobiD apologies for the late reply. It took me some time to set up an environment with PCF 1.11 & RMQ 1.10 in order to debug this issue.
I'm only able to reproduce this problem when I create two queues with similar names on the same vhost. In your output, there seem to be two queues (DealDuplicationExchange.insurable-interest-service & DealDuplicationExchange.insurableInterestService) defined on the same vhost (0d8748f6-8c33-4c51-99ed-1f78c077761e). RMQ service metrics returns 2 metrics for each queue (consumers & queue depth), and the name of those metrics contains the vhost & queue name, so for the above queues, it will return:
/p-rabbitmq/rabbitmq/queues/0d8748f6-8c33-4c51-99ed-1f78c077761e/DealDuplicationExchange.insurable-interest-service/consumers
/p-rabbitmq/rabbitmq/queues/0d8748f6-8c33-4c51-99ed-1f78c077761e/DealDuplicationExchange.insurable-interest-service/depth
/p-rabbitmq/rabbitmq/queues/0d8748f6-8c33-4c51-99ed-1f78c077761e/DealDuplicationExchange.insurableInterestService/consumers
/p-rabbitmq/rabbitmq/queues/0d8748f6-8c33-4c51-99ed-1f78c077761e/DealDuplicationExchange.insurableInterestService/depth
Prometheus has strict rules on metric naming, so this exporter sanitizes the metric names returned by the firehose before reporting them to Prometheus. In the above case, the DealDuplicationExchange.insurable-interest-service & DealDuplicationExchange.insurableInterestService queue names collide when they are sanitized, as both become deal_duplication_exchange_insurable_interest_service. When there are collisions, the resulting metric value will be the last one reported, so in this case only one of the metric values will be reported, without printing any error. But as part of the returned metric, the exporter also returns a metric help, and here the exporter does NOT sanitize it, because we want to return the original firehose metric name. So what is happening here is that the exporter returns 2 metrics with the same name but different help, and therefore Prometheus complains about this (has help ... but should have ...).
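The collision described above can be reproduced with a small sketch. The `sanitize` function below is a hypothetical stand-in, not the exporter's actual conversion (the real output in the error messages also splits the vhost ID differently); it only assumes that camelCase boundaries and characters outside `[a-zA-Z0-9_]` both end up as underscores. The help string follows the format shown in the error output.

```python
import re


def sanitize(name):
    """Illustrative only: turn a firehose metric path into a snake_case name."""
    # Split camelCase: "insurableInterestService" -> "insurable_Interest_Service"
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # Replace every character outside [a-zA-Z0-9_] with an underscore, then lowercase
    s = re.sub(r"[^a-zA-Z0-9_]", "_", s).lower()
    # Collapse runs of underscores and trim them from both ends
    return re.sub(r"_+", "_", s).strip("_")


def help_for(path):
    """Help text keeps the original, unsanitized firehose metric path."""
    return "Cloud Foundry Firehose '%s' value metric from 'p-rabbitmq'." % path


base = "/p-rabbitmq/rabbitmq/queues/0d8748f6-8c33-4c51-99ed-1f78c077761e/"
a = base + "DealDuplicationExchange.insurable-interest-service/depth"
b = base + "DealDuplicationExchange.insurableInterestService/depth"

same_name = sanitize(a) == sanitize(b)   # both paths map to one metric name
same_help = help_for(a) == help_for(b)   # but the help strings still differ
```

Here `same_name` is true while `same_help` is false, which is the "same name, different help" situation that triggers the error.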
There is no easy workaround for this. But before trying to find a solution, can you please confirm whether you see the DealDuplicationExchange.insurable-interest-service & DealDuplicationExchange.insurableInterestService queues on the vhost 0d8748f6-8c33-4c51-99ed-1f78c077761e?
from firehose_exporter.
Hi @frodenas,
thanks for this comprehensive research. Yes, I can confirm that both queues are available.
The correct name of the second queue is: DealCreationExchange.insurable-interest-service
Thanks!
Regards,
Robert
from firehose_exporter.
We recently ran into this issue ourselves, where there were two queues named similarly, one with a - as the word delimiter and one with a . as the delimiter. Is there a way we can update the exporter to log these errors and report them in the exporter metrics, without breaking metrics collection for everything in the firehose?
from firehose_exporter.
@geofffranks what version are you using? v4.2.7 should fix that: it just logs an error, discards the conflicting metric, and keeps reporting everything else. Do you still see this happening on newer versions?
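The "log, discard, keep going" behavior can be sketched roughly as follows. This is an assumption-laden illustration, not the exporter's code: the function `collect` and its tuple layout are hypothetical, and which of the two conflicting metrics gets discarded (here, the later one) is a choice made for the example.

```python
import logging


def collect(samples):
    """Gather (name, help, value) samples, skipping any whose help conflicts.

    Returns a dict mapping metric name to (help, value).
    """
    seen = {}
    for name, help_text, value in samples:
        if name in seen and seen[name][0] != help_text:
            # Conflict: same metric name, different help. Log it and move on
            # instead of failing the whole collection.
            logging.error(
                "discarding %s: help %r conflicts with %r",
                name, help_text, seen[name][0],
            )
            continue
        seen[name] = (help_text, value)
    return seen
```

A conflicting sample is dropped and logged, while every other sample in the same scrape is still reported.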
from firehose_exporter.
build user: root@3856e24b8b0b
build date: 20180202-07:01:33
go version: go1.9.2
(from prometheus-boshrelease v21.1.1)
from firehose_exporter.
The exporter isn't panicking for us, but the error output shows up in the /metrics endpoint (with no metric data), rather than the app logs.
from firehose_exporter.
Hi @frodenas,
do you have any news on how to resolve the above issue?
Thanks!
Regards,
Robert
from firehose_exporter.
@RobiD are you also seeing those error outputs at the /metrics endpoint?
from firehose_exporter.
Hi @frodenas,
Yes, we also see the error outputs in the /metrics endpoint. See the snippet of the message here:
An error has occurred during metrics collection:
20 error(s) occurred:
* collected metric firehose_value_metric_p_rabbitmq_p_rabbitmq_rabbitmq_queues_e_5_d_35_ba_5_1360_4514_8_a_04_6_b_850_fcd_1192_deal_renewal_exchange_insurable_interest_service_depth label:<name:"bosh_deployment" value:"cf-rabbitmq" > label:<name:"bosh_job_id" value:"0763273a-49d1-4615-8ce0-0e9cca4dbdcb" > label:<name:"bosh_job_ip" value:"10.16.190.138" > label:<name:"bosh_job_name" value:"rabbitmq-server" > label:<name:"environment" value:"" > label:<name:"origin" value:"p-rabbitmq" > label:<name:"unit" value:"count" > gauge:<value:29 > has help "Cloud Foundry Firehose '/p-rabbitmq/rabbitmq/queues/e5d35ba5-1360-4514-8a04-6b850fcd1192/DealRenewalExchange.insurableInterestService/depth' value metric from 'p-rabbitmq'." but should have "Cloud Foundry Firehose '/p-rabbitmq/rabbitmq/queues/e5d35ba5-1360-4514-8a04-6b850fcd1192/DealRenewalExchange.insurable-interest-service/depth' value metric from 'p-rabbitmq'."
* collected metric firehose_value_metric_p_rabbitmq_p_rabbitmq_rabbitmq_queues_35_e_8_f_6_af_6250_44_c_7_ae_2_d_0_e_672693_ac_5_a_deal_duplication_exchange_coverage_service_consumers label:<name:"bosh_deployment" value:"cf-rabbitmq" > label:<name:"bosh_job_id" value:"0763273a-49d1-4615-8ce0-0e9cca4dbdcb" > label:<name:"bosh_job_ip" value:"10.16.190.138" > label:<name:"bosh_job_name" value:"rabbitmq-server" > label:<name:"environment" value:"" > label:<name:"origin" value:"p-rabbitmq" > label:<name:"unit" value:"count" > gauge:<value:0 > has help "Cloud Foundry Firehose '/p-rabbitmq/rabbitmq/queues/35e8f6af-6250-44c7-ae2d-0e672693ac5a/DealDuplicationExchange.coverageService/consumers' value metric from 'p-rabbitmq'." but should have "Cloud Foundry Firehose '/p-rabbitmq/rabbitmq/queues/35e8f6af-6250-44c7-ae2d-0e672693ac5a/DealDuplicationExchange.coverage-service/consumers' value metric from 'p-rabbitmq'."
I will send the complete error message directly to your email.
Regards,
Robert
from firehose_exporter.
Thanks. This has been fixed in v5.0.3: those errors are now logged, and the /metrics endpoint will no longer show them and will report all metrics.
I'll cut a new prometheus bosh release with this version soon. In the meantime, if you want to test it, you can manually override the firehose_exporter binary.
from firehose_exporter.