Comments (7)
Refactoring the exporter is not currently possible. Gathering information about releases and stemcells is easy, but gathering processes is still a challenge.
If you're experiencing high CPU load on your BOSH Director, can you please open an issue in the BOSH repo? It makes no sense that the Director consumes so much CPU just to gather information from VMs.
from prometheus-boshrelease.
@avasseur-pivotal It'd be interesting to see your BOSH VM configuration (VM CPU, memory, disk), because we never ran into this issue, even when using the default OM director configuration. Bear in mind also that the bosh_exporter FAQ recommends increasing the default scrape interval and the scrape timeout, but again, something is really wrong with that BOSH when it takes so much time to fetch the VM states. It might happen that some VMs' agents are unresponsive, but that should not take longer than 45 seconds. Why so many tasks are queued is also a mystery; I'll need to dig into the BOSH logs to see what might be happening.
This release doesn't configure any exporter by default; it's up to you to decide which exporter you want to use. Although the boshhmforwarder is an option (outlined in the bosh_exporter FAQ), I'm reluctant to use it on my deployments because it introduces a dependency on the CF Firehose. What we have used in some deployments is the graphite_exporter together with the BOSH Graphite Health Monitor plugin, but I understand that you cannot use this configuration with OM (because it offers no option to enable that plugin).
Also, I want to point out that I've never been happy with the bosh_exporter querying BOSH and creating so many tasks, but it was the only way to fetch the VM IP addresses needed for service discovery. Some recent additions to BOSH (cloudfoundry/bosh@6a70432) have made this more feasible, so I plan to refactor the bosh_exporter to use a different mechanism to gather both the VM IP addresses and metrics.
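For context on the service-discovery side: Prometheus's file-based service discovery (`file_sd_configs`) reads a JSON list of target groups, each with a `targets` list of `host:port` strings and an optional `labels` map. The sketch below shows one way a list of VM IPs could be turned into that format. The input shape (dicts with `deployment`, `job` and `ips` keys), the port 9100 (node_exporter's default) and the label names `bosh_deployment`/`bosh_job` are assumptions for illustration, not what bosh_exporter actually emits.

```python
import json
from collections import defaultdict


def build_file_sd(vms, port=9100):
    """Group VM IPs into Prometheus file_sd target groups.

    `vms` is a hypothetical list of dicts with 'deployment', 'job' and 'ips'
    keys. The port and the label names are assumptions for this sketch.
    """
    groups = defaultdict(list)
    for vm in vms:
        key = (vm["deployment"], vm["job"])
        groups[key].extend(f"{ip}:{port}" for ip in vm["ips"])
    # One target group per (deployment, job), sorted for stable output.
    return [
        {"targets": sorted(targets), "labels": {"bosh_deployment": dep, "bosh_job": job}}
        for (dep, job), targets in sorted(groups.items())
    ]


def write_file_sd(path, vms, port=9100):
    """Write the target groups as JSON for a file_sd_configs entry to watch."""
    with open(path, "w") as fh:
        json.dump(build_file_sd(vms, port), fh, indent=2)
```

Grouping by deployment and job keeps each target group small and lets Prometheus relabeling rules key off the attached labels.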
Hi @frodenas,
The problem that @avasseur-pivotal describes is the same one I explained to you some time ago by email.
When I checked, I saw that one of the BOSH worker processes was stuck and had stopped working through the queue. Tasks were just piling up and the CPU was stuck.
Even after Prometheus was stopped, I had to restart all BOSH processes to recover.
I increased the scrape interval to 5m, even 10m, to work around the issue.
I am working on a TSDB collector that I still need to test, but I think it should be possible to do the same as with the Graphite one.
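The "tasks piling up" symptom can be reasoned about with a back-of-the-envelope queue model. The sketch below is a simplified fluid approximation, not a model of how the BOSH Director actually schedules tasks: it assumes each scrape triggers a fixed number of director tasks and that each worker completes tasks at a steady rate. The function name and all numbers in the usage example are hypothetical.

```python
def backlog_growth_per_hour(scrape_interval_s, tasks_per_scrape, workers, task_duration_s):
    """Estimate how many director tasks accumulate per hour (0.0 if workers keep up).

    Simplified fluid model: arrivals and service are constant rates;
    bursts, priorities and task timeouts are ignored.
    """
    # Tasks created per hour by periodic scrapes (assumed fixed per scrape).
    arrivals = tasks_per_scrape * 3600 / scrape_interval_s
    # Tasks the workers can finish per hour; workers=0 models a stuck worker.
    capacity = workers * 3600 / task_duration_s
    return max(0.0, arrivals - capacity)
```

With one task per scrape, one worker and 45-second tasks, a 30s interval yields 120 tasks/hour against a capacity of 80, so `backlog_growth_per_hour(30, 1, 1, 45)` returns `40.0`, while at a 5m interval it returns `0.0`. Setting `workers=0` to mimic a stuck worker shows that a longer interval only slows the pile-up: `backlog_growth_per_hour(300, 1, 0, 45)` still returns `12.0`.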
@avasseur-pivotal @shinji62 Have you guys opened an issue in the BOSH repo? It'd be interesting to know why the worker processes consume so much CPU and eventually get stuck.
Just a quick update. The plan here is:
1) Use the bosh_tsdb_exporter to gather VM metrics. This exporter will receive metrics from the BOSH OpenTSDB Health Monitor plugin and will be compatible with OpsManager (which only allows you to configure this plugin). This exporter doesn't hit the BOSH API and will not generate a task, so the problem stated in this issue will be mitigated.
2) The bosh_exporter will be responsible only for gathering administrative info (like the releases, stemcells, ... being used) and the VMs' IPs (but this will require a director version >= 261). To get this info, the exporter will not need to generate a task, so this will also help to mitigate the problem stated in this issue.
I'm still finishing 2) and doing some tests. After that, I'll update this release and the associated dashboards and alerts.
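Item 1 relies on the OpenTSDB telnet-style `put` format (`put <metric> <timestamp> <value> <tagk=tagv ...>`). As a rough illustration of what receiving those lines involves, here is a hypothetical Python parser that converts one line into a Prometheus-style sample. The `bosh_` prefix, the character-sanitizing rule and the sample metric name are assumptions; the thread doesn't specify which metric names the BOSH OpenTSDB Health Monitor plugin sends or how bosh_tsdb_exporter maps them.

```python
import re
from dataclasses import dataclass

# Anything outside this set is replaced so names stay Prometheus-safe.
_INVALID = re.compile(r"[^a-zA-Z0-9_]")


@dataclass
class Sample:
    name: str
    labels: dict
    value: float
    timestamp: int


def sanitize(name):
    """Map an OpenTSDB metric name or tag key onto Prometheus-safe characters."""
    cleaned = _INVALID.sub("_", name)
    # Prometheus names and label keys may not start with a digit.
    return "_" + cleaned if cleaned[:1].isdigit() else cleaned


def parse_put(line, prefix="bosh_"):
    """Parse 'put <metric> <timestamp> <value> <tagk=tagv ...>' into a Sample."""
    parts = line.split()
    if len(parts) < 4 or parts[0] != "put":
        raise ValueError(f"not an OpenTSDB put line: {line!r}")
    _, metric, timestamp, value, *tags = parts
    labels = {}
    for tag in tags:
        key, sep, val = tag.partition("=")
        if not sep:
            raise ValueError(f"malformed tag: {tag!r}")
        labels[sanitize(key)] = val
    # The prefix is a hypothetical naming choice for this sketch.
    return Sample(prefix + sanitize(metric), labels, float(value), int(timestamp))
```

For instance, `parse_put("put system.cpu.user 1500000000 12.5 deployment=cf job=router")` yields a sample named `bosh_system_cpu_user` with labels `deployment="cf"` and `job="router"`. Since the exporter only listens for pushed lines, nothing in this path calls the BOSH API.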
@avasseur-pivotal Just checking in to see if this is still happening. Thanks!
I'll throw it out there that I have a customer experiencing this issue. I can gather more information if needed. We've gone with the workaround of deploying the TSDB exporter, but they would like to collect the administrative info once the bosh_exporter is refactored.