Comments (7)
Yup, thanks for the quick support, we appreciate it.
We will plan the prometheus upgrade.
thanks
Harry
from firehose_exporter.
@metskem This is really hard to reproduce and debug. I have a guess, and it's just a guess, that this is related to the number of collectors (metrics) gathered and how the prometheus golang client handles goroutines.
I've updated the vendored prometheus library because it contains a change to create goroutines adaptively during metrics gathering, exactly where your firehose_exporter instance panics.
I don't know if this will solve the problem or not, but I need to ask you a favor. I just cut a new release with this change, do you mind testing it in your environment? If you can, please download firehose_exporter-5.0.1.linux-amd64.tar.gz, untar it in one of your failing firehose VMs, and replace the binary at /var/vcap/packages/firehose_exporter/bin. The 5.x series contains a breaking change, so you will also need to update the firehose ctl script to switch the flags to use -- instead of -. Then restart the firehose_exporter process and check the logs to see if it still panics.
Based on the results I can cut a new version of the prometheus bosh release with the new firehose_exporter, or I will need to do more research. Thanks!
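For the flag change mentioned above, here is a minimal Python sketch of the rewrite, assuming the ctl script passes long flags with a single leading dash. The helper name `to_double_dash` and the sample command line are hypothetical and are not taken from the actual ctl script, so check the result against the real script before using it.

```python
import re

# Matches a single-dash long flag (two or more characters after the dash),
# e.g. "-skip-ssl-verify". Flags that already start with "--" and
# one-letter flags such as "-h" are left untouched.
_SINGLE_DASH_LONG_FLAG = re.compile(r"(?<![\w-])-([A-Za-z][\w.-]+)")


def to_double_dash(command_line: str) -> str:
    """Return command_line with single-dash long flags rewritten to use '--'."""
    return _SINGLE_DASH_LONG_FLAG.sub(r"--\1", command_line)


if __name__ == "__main__":
    # Hypothetical invocation; the flag names here are illustrative only.
    old = "firehose_exporter -skip-ssl-verify -some.flag=value"
    print(to_double_dash(old))
```

The lookbehind keeps dashes inside values and paths (for example `a-b` or `firehose-exporter`) from being treated as flags.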
done, it is running now on our devtest environment (the --skip-ssl-verify also needed a tweak) :
time="2018-02-02T08:35:20Z" level=info msg="Starting firehose_exporter (version=5.0.1, branch=master, revision=38a5c23be483c97f4d241af85a81a6b4ad4ab9a9)" source="firehose_exporter.go:152"
time="2018-02-02T08:35:20Z" level=info msg="Build context (go=go1.9.2, user=root@3856e24b8b0b, date=20180202-07:01:33)" source="firehose_exporter.go:153"
time="2018-02-02T08:35:20Z" level=info msg="Starting Firehose Nozzle..." source="firehose_nozzle.go:58"
time="2018-02-02T08:35:20Z" level=info msg="Listening on :9186" source="firehose_exporter.go:230"
I see data in grafana coming in, so that looks good.
I will let it run for a few minutes/hours, and then make the same change on our 2 prod envs, just so we have a better chance of knowing whether it helps or not.
Then run it for, let's say, a week, and come back again here. Or earlier if the panic pops up again.
thanks for the quick reaction!
cheers,
Harry
Thanks for testing it, looking forward to the results!
The new version looks stable. 2 of the 3 envs have not had the issue at all.
One of them did, but on checking, it turned out that in that env the firehose VM had been recreated by BOSH and was therefore running the old version again (so even more proof that the new version works better).
Now how to proceed, should we upgrade our prometheus-boshrelease (we are running 19.0.0 atm)?
Is there already a prometheus-boshrelease with firehose_exporter version 5.0.1? How can I see which versions of them are in the release btw?
thanks,
Harry
Looking at https://github.com/bosh-prometheus/prometheus-boshrelease/releases I would think that firehose_exporter 4.2.5 is the highest version, so that would require a new prometheus-boshrelease then (including firehose_exporter 5.0.1)?
@metskem Glad to hear it worked!
I just cut a new prometheus bosh release v21.1.1 with the new firehose_exporter. You will need to upgrade to this version in order to get the fix.
Closing the issue, but please feel free to reopen if the problem happens again in the future.