att-comdev / prometheus-openstack-exporter
License: Apache License 2.0
In the hypervisor_stats collector, a list of aggregates is assembled, and then each hypervisor is compared against each aggregate's membership list to identify whether the hypervisor is a member of the aggregate. However, aggregate memberships are stored and returned as shortnames, while the hypervisor name is an FQDN (at least in my OpenStack-Ansible-deployed OpenStack); so no hosts are ever mapped to any aggregates.
If the project maintainers agree, I will propose a PR stripping the shortname out of the hypervisor FQDN for the purposes of comparing aggregate membership; this could be controlled by a configuration option if necessary.
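A minimal sketch of the proposed comparison, assuming aggregate host lists contain short names (as described above) and that the first DNS label of the hypervisor FQDN is its short name. The function names and the `strip_domain` flag are hypothetical stand-ins for the suggested configuration option, not existing exporter code.

```python
def short_hostname(name: str) -> str:
    """Return the first DNS label, e.g. 'compute01.example.com' -> 'compute01'."""
    return name.split(".", 1)[0]


def aggregates_for_hypervisor(hypervisor_name, aggregates, strip_domain=True):
    """Return the sorted names of aggregates that contain the hypervisor.

    aggregates: dict mapping aggregate name -> list of member host names.
    strip_domain: hypothetical config switch; when True, both sides are
    reduced to short names before comparing.
    """
    host = short_hostname(hypervisor_name) if strip_domain else hypervisor_name
    return sorted(
        agg
        for agg, hosts in aggregates.items()
        if host in {short_hostname(h) if strip_domain else h for h in hosts}
    )
```

For example, with `{"ssd": ["compute01", "compute02"]}`, the hypervisor `compute01.example.com` maps to `["ssd"]` when stripping is enabled and to `[]` when it is disabled, which is the behaviour reported in this issue.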
Is there a Grafana dashboard file available? Thanks.
Currently the openstack_check_placement_api metric expects a 401 response; however, a plain request to the Placement API root is a List Versions call, which does not require authentication (https://developer.openstack.org/api-ref/placement/). I propose expecting 200 instead. Tested on Queens.
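A sketch of what the proposed change amounts to, assuming the API checks compare the root URL's status code against a per-service list. The nova, cinderv2 and keystone values are the ones printed in the exporter logs quoted elsewhere in this thread; `EXPECTED_STATUS`, `check_passed` and the `[200]` fallback for unknown services are hypothetical names and choices for illustration.

```python
# Hypothetical mapping of service -> acceptable status codes for an
# unauthenticated GET on the service root. Placement changes from 401 to 200.
EXPECTED_STATUS = {
    "nova": [200],
    "cinderv2": [200, 300],
    "keystone": [300],
    "placement": [200],
}


def check_passed(service: str, status_code: int) -> int:
    """Return 1 if the status code is acceptable for the service, else 0.

    Services missing from the mapping fall back to [200] (an assumption).
    """
    return 1 if status_code in EXPECTED_STATUS.get(service, [200]) else 0
```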
Running with docker run:
2018-10-16 02:34:37,987:INFO:Starting new HTTP connection (1): controller
2018-10-16 02:34:58,132:DEBUG:Incremented Retry for (url='/v3/auth/tokens'): Retry(total=0, connect=None, read=None, redirect=None)
2018-10-16 02:34:58,133:WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x7ff990fe2c90>, 'Connection to controller timed out. (connect timeout=20)')': /v3/auth/tokens
2018-10-16 02:34:58,133:INFO:Starting new HTTP connection (2): controller
2018-10-16 02:35:18,284:ERROR:Got exception for 'http://controller:35357/v3/auth/tokens': 'HTTPConnectionPool(host='controller', port=35357): Max retries exceeded with url: /v3/auth/tokens (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x7ff990fe2dd0>, 'Connection to controller timed out. (connect timeout=20)'))'
Hi,
I have been trying to use your prometheus-openstack-exporter to collect OpenStack metrics, but I have run into some problems. Can you help?
The error logs are as follows:
2019-07-17 03:06:57,487:INFO:Trying to get token from 'http://10.0.0.5:35357/v3'
2019-07-17 03:06:57,490:INFO:Starting new HTTP connection (1): 10.0.0.5
2019-07-17 03:06:57,570:DEBUG:"POST /v3/auth/tokens HTTP/1.1" 201 4600
2019-07-17 03:06:57,570:INFO:http://10.0.0.5:35357/v3/auth/tokens responded with status code 201
2019-07-17 03:06:57,571:DEBUG:Got token 'gAAAAABdLpAOdQ3mAbaxpI8Tp7X0I-Q85mkdv1VwyuLFXbD1tKkUQVNj-z0Dg0mStesTAR_yEttN6tFGf5RephAlk-K5F2pH4klc3HNXruBPGPv4yL1YGR2fm-7LfShP5jGE945ixUV1seN4oEF0uTabX02gPHD0bA0tvZiQPWgKTY5qaTa5jsY'
2019-07-17 03:06:57,572:INFO:Starting new HTTP connection (1): controller
2019-07-17 03:07:17,585:DEBUG:Incremented Retry for (url='/'): Retry(total=0, connect=None, read=None, redirect=None)
2019-07-17 03:07:17,585:WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b0050>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /
2019-07-17 03:07:17,585:INFO:Starting new HTTP connection (2): controller
2019-07-17 03:07:37,601:ERROR:Got exception for 'http://controller:8774': 'HTTPConnectionPool(host='controller', port=8774): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b0190>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))'
2019-07-17 03:07:37,601:INFO:Service nova check failed (returned '500' but expected '[200]')
2019-07-17 03:07:37,602:INFO:Starting new HTTP connection (1): controller
2019-07-17 03:07:57,618:DEBUG:Incremented Retry for (url='/'): Retry(total=0, connect=None, read=None, redirect=None)
2019-07-17 03:07:57,618:WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b00d0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /
2019-07-17 03:07:57,618:INFO:Starting new HTTP connection (2): controller
2019-07-17 03:08:17,622:ERROR:Got exception for 'http://controller:8776': 'HTTPConnectionPool(host='controller', port=8776): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b0390>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))'
2019-07-17 03:08:17,622:INFO:Service cinderv2 check failed (returned '500' but expected '[200, 300]')
2019-07-17 03:08:17,623:INFO:Starting new HTTP connection (1): controller
2019-07-17 03:08:37,636:DEBUG:Incremented Retry for (url='/'): Retry(total=0, connect=None, read=None, redirect=None)
2019-07-17 03:08:37,636:WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b02d0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /
2019-07-17 03:08:37,636:INFO:Starting new HTTP connection (2): controller
2019-07-17 03:08:57,651:ERROR:Got exception for 'http://controller:8778': 'HTTPConnectionPool(host='controller', port=8778): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b0590>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))'
2019-07-17 03:08:57,651:INFO:Service placement check failed (returned '500' but expected '[401]')
2019-07-17 03:08:57,652:INFO:Starting new HTTP connection (1): controller
2019-07-17 03:09:17,662:DEBUG:Incremented Retry for (url='/'): Retry(total=0, connect=None, read=None, redirect=None)
2019-07-17 03:09:17,663:WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b04d0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /
2019-07-17 03:09:17,663:INFO:Starting new HTTP connection (2): controller
2019-07-17 03:09:37,684:ERROR:Got exception for 'http://controller:5000': 'HTTPConnectionPool(host='controller', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b0790>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))'
2019-07-17 03:09:37,684:INFO:Service keystone check failed (returned '500' but expected '[300]')
2019-07-17 03:09:37,684:INFO:No check found for service 'cinderv3', creating one
2019-07-17 03:09:37,685:INFO:Starting new HTTP connection (3): controller
2019-07-17 03:09:57,696:DEBUG:Incremented Retry for (url='/'): Retry(total=0, connect=None, read=None, redirect=None)
2019-07-17 03:09:57,696:WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b0610>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /
2019-07-17 03:09:57,696:INFO:Starting new HTTP connection (4): controller
Finally, the OpenStack version is Ocata.
Waiting for your reply. Thanks a lot.
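In the log above the token request to 10.0.0.5 succeeds, but every subsequent check against `controller` fails with "[Errno -3] Temporary failure in name resolution", which suggests the endpoint URLs use a host name that does not resolve from where the exporter runs. A small diagnostic sketch, assuming you paste in the endpoint URLs yourself; `unresolvable_hosts` is a hypothetical helper, not part of the exporter.

```python
import socket
from urllib.parse import urlsplit


def unresolvable_hosts(urls):
    """Return the host names from `urls` that cannot be resolved from this machine or container."""
    failed = []
    for url in urls:
        host = urlsplit(url).hostname
        try:
            socket.getaddrinfo(host, None)
        except socket.gaierror:
            failed.append(host)
    return failed
```

Running it inside the container with the URLs from the log, e.g. `unresolvable_hosts(["http://controller:8774", "http://controller:8776", "http://controller:8778", "http://controller:5000"])`, would list `controller` for each URL if the name indeed cannot be resolved there.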
Good day
After building the container from GitHub, configuring it, and installing it on a node, I get parsing errors on the Prometheus status page next to the node:
text format parsing error in line 49: second HELP line for metric name "openstack_services_neutron_neutron_dhcp_agent"
text format parsing error in line 52: second HELP line for metric name "openstack_services_neutron_neutron_openvswitch_agent"
etc.
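The "second HELP line" error means the same metric name is written out more than once, each time with its own HELP header, instead of once with several labelled samples. A pure-Python sketch of the grouping that avoids this, assuming samples arrive as `(name, labels, value)` tuples; `render_exposition` and the `host` label are hypothetical, and label values are not escaped here.

```python
from collections import defaultdict


def render_exposition(samples, help_text, metric_type="gauge"):
    """Render samples in Prometheus text format with one HELP/TYPE pair per metric name.

    samples: iterable of (metric_name, labels_dict, value).
    help_text: dict mapping metric_name -> HELP string (falls back to the name).
    """
    grouped = defaultdict(list)
    for name, labels, value in samples:
        grouped[name].append((labels, value))

    lines = []
    for name, series in grouped.items():
        # Header lines are emitted exactly once per metric family.
        lines.append(f"# HELP {name} {help_text.get(name, name)}")
        lines.append(f"# TYPE {name} {metric_type}")
        for labels, value in series:
            label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"
```

In the exporter itself the same idea would normally be expressed with one prometheus_client `GaugeMetricFamily` per metric name and repeated `add_metric` calls, rather than by hand-rolling the text format.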
2019-02-25 11:54:01,818:INFO:Trying to get token from 'https://api.xxxxxx:5000/v3'
2019-02-25 11:54:01,823:INFO:Starting new HTTPS connection (1): api.xxxxxxxx
2019-02-25 11:54:01,846:DEBUG:"POST /v3/auth/tokens HTTP/1.1" 401 114
2019-02-25 11:54:01,847:INFO:https://api.xxxxx:5000/v3/auth/tokens responded with status code 401
2019-02-25 11:54:01,847:ERROR:Cannot get a valid token from https://api.xxxxxx:5000/v3
2019-02-25 11:54:01,847:ERROR:https://api.xxxxxx:5000/v3 responded with code 401
2019-02-25 11:54:01,847:ERROR:'token'
2019-02-25 11:54:01,847:ERROR:failed to get data for cache key check_os_api
2019-02-25 11:54:01,847:ERROR:Service 'neutron' not found in catalog
2019-02-25 11:54:01,847:WARNING:Cannot get state of neutron workers
2019-02-25 11:54:01,847:ERROR:Service 'cinder' not found in catalog
2019-02-25 11:54:01,847:WARNING:Cannot get state of cinder workers
2019-02-25 11:54:01,847:ERROR:Service 'nova' not found in catalog
2019-02-25 11:54:01,847:WARNING:Cannot get state of nova workers
2019-02-25 11:54:01,848:ERROR:Service 'nova' not found in catalog
2019-02-25 11:54:01,848:WARNING:Could not get nova aggregates
2019-02-25 11:54:01,848:ERROR:Service 'nova' not found in catalog
2019-02-25 11:54:01,848:WARNING:Could not get hypervisor statistics
Does anyone know how to fix this?
Launching with:
```
sudo docker run --env-file /etc/prometheus-openstack-exporter/env.conf -d --name openstack-exporter -p 9103:9103 xxxxxx:5000/prometheus-openstack-exporter
```
Also, the metrics endpoint returns only this:
```
sudo curl http://localhost:9103/metrics
# HELP openstack_exporter_cache_refresh_duration_seconds Cache refresh duration in seconds.
# TYPE openstack_exporter_cache_refresh_duration_seconds gauge
openstack_exporter_cache_refresh_duration_seconds{region="RegionOne"} 0.029609203338623047
```
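The log shows `POST /v3/auth/tokens` answering 401, so Keystone rejected the credentials or scope; the later "'token'" and "not found in catalog" errors plausibly follow from having no token and therefore no catalog. One way to check the credentials independently of the exporter is to build the standard Keystone v3 password-auth body and POST it yourself. A sketch, where the `Default` domain names and the placeholder values are assumptions to replace with your own.

```python
import json


def keystone_v3_password_auth(username, password, project_name,
                              user_domain="Default", project_domain="Default"):
    """Build a project-scoped password-auth body for POST /v3/auth/tokens.

    The 'Default' domain names are assumptions; use your deployment's domains.
    """
    return {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"name": user_domain},
                        "password": password,
                    }
                },
            },
            "scope": {
                "project": {
                    "name": project_name,
                    "domain": {"name": project_domain},
                }
            },
        }
    }


if __name__ == "__main__":
    # Placeholder credentials; print the body to send with curl or similar.
    print(json.dumps(keystone_v3_password_auth("admin", "secret", "admin"), indent=2))
```

A 201 response (with the token in the `X-Subject-Token` header) means the same values should work for the exporter; another 401 points at the credentials, domains or project rather than the exporter.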
2018-10-25 08:56:22,440:ERROR:'HypervisorStats' object has no attribute 'extra_config'
2018-10-25 08:56:22,441:ERROR:failed to get data for cache key hypervisor_stats
```python
free = ((int(self.extra_config['cpu_ratio'] *
             m_vcpus)) -
        m_vcpus_used)
nova_aggregates[agg]['metrics']['free_vcpus'] += free
```
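The AttributeError indicates that the HypervisorStats instance never received an `extra_config` attribute. A sketch of the same free-vCPU arithmetic that takes the configuration explicitly, so a missing setting cannot crash the collector; `free_vcpus` and the default ratio of 1.0 are hypothetical choices, not the project's actual fix.

```python
def free_vcpus(vcpus: int, vcpus_used: int, config: dict) -> int:
    """Free vCPUs after applying the CPU overcommit ratio.

    Same expression as the collector snippet, but 'cpu_ratio' is read from a
    config dict passed in explicitly, with a hypothetical default of 1.0.
    """
    cpu_ratio = float(config.get("cpu_ratio", 1.0))
    return int(cpu_ratio * vcpus) - vcpus_used
```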
It looks like the openstack_check_*_api metrics use the first endpoint for a given service returned by the endpoints API call, so you cannot be sure whether you get the public, internal, or admin interface URL (endpoint type). It would be better to check all of them (with the interface as a metric label), or only one specific type set in the configuration.
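A sketch of selecting an endpoint by interface instead of taking the first match, assuming endpoint entries shaped like those returned by Keystone v3 `GET /v3/endpoints` (`service_id`, `interface`, `region_id`, `url`). `endpoint_url` is a hypothetical helper; checking every interface would just call it once per value of `("public", "internal", "admin")` and use the interface as a label.

```python
def endpoint_url(endpoints, service_id, interface="public", region=None):
    """Return the URL of the endpoint matching service, interface and optional region.

    Returns None when nothing matches rather than falling back to an arbitrary endpoint.
    """
    for ep in endpoints:
        if ep.get("service_id") != service_id or ep.get("interface") != interface:
            continue
        if region is not None and ep.get("region_id") != region:
            continue
        return ep.get("url")
    return None
```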
Hello,
Thank you.
The Docker container ships with prometheus_client 0.13, which appears to be less strict about its inputs; however, installing the exporter alongside the current prometheus_client 0.4.2 results in many metrics not being returned at all.
By adding a try-except block around lines 141-151 of hypervisor_stats.py (https://github.com/att-comdev/prometheus-openstack-exporter/blob/master/exporter/hypervisor_stats.py#L141-L151), I was able to determine that prometheus_client throws a "Duplicated timeseries in CollectorRegistry" exception when assembling the metrics for delivery.
I suspect that the older version of prometheus_client was simply silently overwriting duplicate timeseries. I believe the same issue may affect all of the collector modules. Exception handling should be added around calls to the prometheus_client methods, and input should be sanitized ahead of those calls to ensure no duplicate timeseries are propagated. I'd be happy to help develop a PR to address some of this, if I can get some guidance from the project maintainers on how they'd like to see that implemented.
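A sketch of the input sanitizing suggested above, assuming each collector can produce its samples as `(name, labels, value)` tuples before calling prometheus_client. Two samples count as the same timeseries when they share the name and the full label set; keeping the last value mirrors the silent overwriting suspected of the older client, which is an assumption. `dedupe_samples` is a hypothetical helper.

```python
import logging

log = logging.getLogger(__name__)


def dedupe_samples(samples):
    """Collapse duplicate timeseries, keeping the last value and logging each duplicate.

    samples: iterable of (metric_name, labels_dict, value).
    Returns a list in first-seen order.
    """
    unique = {}
    for name, labels, value in samples:
        key = (name, tuple(sorted(labels.items())))
        if key in unique:
            log.warning("duplicate timeseries %s %s", name, dict(key[1]))
        unique[key] = value
    return [(name, dict(items), value) for (name, items), value in unique.items()]
```

Logging the duplicates, rather than only dropping them, would also help find which collector produces them in the first place.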