
prometheus-openstack-exporter's Issues

Hypervisors aren't mapped to aggregates because of shortname/FQDN mismatch

In the hypervisor_stats collector, a list of aggregates is assembled, and then each hypervisor is compared against each aggregate's membership list to identify whether the hypervisor is a member of the aggregate. However, aggregate memberships are stored and returned as shortnames, while the hypervisor name is an FQDN (at least in my OpenStack-Ansible-deployed OpenStack); so no hosts are ever mapped to any aggregates.

If the project maintainers agree, I will propose a PR that extracts the shortname from the hypervisor FQDN when comparing against aggregate membership; this could be controlled by a configuration option if necessary.
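A minimal sketch of the proposed comparison, not the exporter's actual code. The function names, the `strip_domain` switch (standing in for the suggested configuration option), and the shape of the `aggregates` mapping are all assumptions made for illustration:

```python
def short_hostname(name):
    """Return the part of a hostname before the first dot."""
    return name.split(".", 1)[0]


def aggregates_for_hypervisor(hypervisor_name, aggregates, strip_domain=True):
    """Return the names of aggregates whose host list contains the hypervisor.

    `aggregates` is a hypothetical mapping of aggregate name -> list of
    member host names, as returned by the aggregates API call.
    """
    wanted = short_hostname(hypervisor_name) if strip_domain else hypervisor_name
    matches = []
    for agg_name, hosts in aggregates.items():
        # Normalize both sides so an FQDN and a shortname compare equal.
        members = {short_hostname(h) if strip_domain else h for h in hosts}
        if wanted in members:
            matches.append(agg_name)
    return sorted(matches)


aggregates = {"ssd": ["compute01", "compute02"], "gpu": ["compute03"]}
print(aggregates_for_hypervisor("compute01.example.com", aggregates))
```

With `strip_domain=False` the comparison falls back to the current exact-match behaviour, so deployments where both names already agree would be unaffected.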

Cannot connect when getting token

docker run
2018-10-16 02:34:37,987:INFO:Starting new HTTP connection (1): controller
2018-10-16 02:34:58,132:DEBUG:Incremented Retry for (url='/v3/auth/tokens'): Retry(total=0, connect=None, read=None, redirect=None)
2018-10-16 02:34:58,133:WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x7ff990fe2c90>, 'Connection to controller timed out. (connect timeout=20)')': /v3/auth/tokens
2018-10-16 02:34:58,133:INFO:Starting new HTTP connection (2): controller
2018-10-16 02:35:18,284:ERROR:Got exception for 'http://controller:35357/v3/auth/tokens': 'HTTPConnectionPool(host='controller', port=35357): Max retries exceeded with url: /v3/auth/tokens (Caused by ConnectTimeoutError(<requests.packages.urllib3.connection.HTTPConnection object at 0x7ff990fe2dd0>, 'Connection to controller timed out. (connect timeout=20)'))'

got exception for "http://controller:8774"

Hi,
I have been trying to use your "prometheus-openstack-exporter" to collect OpenStack metrics, but I have run into some problems. Could you give me some help?
The error logs are as follows:
2019-07-17 03:06:57,487:INFO:Trying to get token from 'http://10.0.0.5:35357/v3'
2019-07-17 03:06:57,490:INFO:Starting new HTTP connection (1): 10.0.0.5
2019-07-17 03:06:57,570:DEBUG:"POST /v3/auth/tokens HTTP/1.1" 201 4600
2019-07-17 03:06:57,570:INFO:http://10.0.0.5:35357/v3/auth/tokens responded with status code 201
2019-07-17 03:06:57,571:DEBUG:Got token 'gAAAAABdLpAOdQ3mAbaxpI8Tp7X0I-Q85mkdv1VwyuLFXbD1tKkUQVNj-z0Dg0mStesTAR_yEttN6tFGf5RephAlk-K5F2pH4klc3HNXruBPGPv4yL1YGR2fm-7LfShP5jGE945ixUV1seN4oEF0uTabX02gPHD0bA0tvZiQPWgKTY5qaTa5jsY'
2019-07-17 03:06:57,572:INFO:Starting new HTTP connection (1): controller
2019-07-17 03:07:17,585:DEBUG:Incremented Retry for (url='/'): Retry(total=0, connect=None, read=None, redirect=None)
2019-07-17 03:07:17,585:WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b0050>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /
2019-07-17 03:07:17,585:INFO:Starting new HTTP connection (2): controller
2019-07-17 03:07:37,601:ERROR:Got exception for 'http://controller:8774': 'HTTPConnectionPool(host='controller', port=8774): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b0190>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))'
2019-07-17 03:07:37,601:INFO:Service nova check failed (returned '500' but expected '[200]')
2019-07-17 03:07:37,602:INFO:Starting new HTTP connection (1): controller
2019-07-17 03:07:57,618:DEBUG:Incremented Retry for (url='/'): Retry(total=0, connect=None, read=None, redirect=None)
2019-07-17 03:07:57,618:WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b00d0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /
2019-07-17 03:07:57,618:INFO:Starting new HTTP connection (2): controller
2019-07-17 03:08:17,622:ERROR:Got exception for 'http://controller:8776': 'HTTPConnectionPool(host='controller', port=8776): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b0390>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))'
2019-07-17 03:08:17,622:INFO:Service cinderv2 check failed (returned '500' but expected '[200, 300]')
2019-07-17 03:08:17,623:INFO:Starting new HTTP connection (1): controller
2019-07-17 03:08:37,636:DEBUG:Incremented Retry for (url='/'): Retry(total=0, connect=None, read=None, redirect=None)
2019-07-17 03:08:37,636:WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b02d0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /
2019-07-17 03:08:37,636:INFO:Starting new HTTP connection (2): controller
2019-07-17 03:08:57,651:ERROR:Got exception for 'http://controller:8778': 'HTTPConnectionPool(host='controller', port=8778): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b0590>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))'
2019-07-17 03:08:57,651:INFO:Service placement check failed (returned '500' but expected '[401]')
2019-07-17 03:08:57,652:INFO:Starting new HTTP connection (1): controller
2019-07-17 03:09:17,662:DEBUG:Incremented Retry for (url='/'): Retry(total=0, connect=None, read=None, redirect=None)
2019-07-17 03:09:17,663:WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b04d0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /
2019-07-17 03:09:17,663:INFO:Starting new HTTP connection (2): controller
2019-07-17 03:09:37,684:ERROR:Got exception for 'http://controller:5000': 'HTTPConnectionPool(host='controller', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b0790>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))'
2019-07-17 03:09:37,684:INFO:Service keystone check failed (returned '500' but expected '[300]')
2019-07-17 03:09:37,684:INFO:No check found for service 'cinderv3', creating one
2019-07-17 03:09:37,685:INFO:Starting new HTTP connection (3): controller
2019-07-17 03:09:57,696:DEBUG:Incremented Retry for (url='/'): Retry(total=0, connect=None, read=None, redirect=None)
2019-07-17 03:09:57,696:WARNING:Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7fee902b0610>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)': /
2019-07-17 03:09:57,696:INFO:Starting new HTTP connection (4): controller

Finally, the OpenStack version is Ocata.
Waiting for your reply. Thanks a lot.

Text parsing errors

Good day

After building the container from GitHub, then configuring and installing it on a node, I get parsing errors on the Prometheus status page next to the node:

text format parsing error in line 49: second HELP line for metric name "openstack_services_neutron_neutron_dhcp_agent"

text format parsing error in line 52: second HELP line for metric name "openstack_services_neutron_neutron_openvswitch_agent"

etc
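To list every affected metric at once, the scraped output can be checked for repeated HELP lines. This is only a diagnostic sketch: the function name is hypothetical, and it assumes the `/metrics` output has been saved as text in the standard `# HELP <name> <description>` form.

```python
from collections import Counter


def duplicate_help_names(exposition_text):
    """Return the metric names that have more than one '# HELP' line."""
    counts = Counter()
    for line in exposition_text.splitlines():
        parts = line.split(None, 3)
        # A HELP line looks like: "# HELP <metric_name> <description>"
        if len(parts) >= 3 and parts[0] == "#" and parts[1] == "HELP":
            counts[parts[2]] += 1
    return sorted(name for name, n in counts.items() if n > 1)


sample = """\
# HELP metric_a First description.
# TYPE metric_a gauge
metric_a{host="x"} 1
# HELP metric_a First description.
# TYPE metric_a gauge
metric_a{host="y"} 1
# HELP metric_b Other description.
# TYPE metric_b gauge
metric_b 0
"""
print(duplicate_help_names(sample))
```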

Could not get hypervisor statistics

2019-02-25 11:54:01,818:INFO:Trying to get token from 'https://api.xxxxxx:5000/v3'
2019-02-25 11:54:01,823:INFO:Starting new HTTPS connection (1): api.xxxxxxxx
2019-02-25 11:54:01,846:DEBUG:"POST /v3/auth/tokens HTTP/1.1" 401 114
2019-02-25 11:54:01,847:INFO:https://api.xxxxx:5000/v3/auth/tokens responded with status code 401
2019-02-25 11:54:01,847:ERROR:Cannot get a valid token from https://api.xxxxxx:5000/v3
2019-02-25 11:54:01,847:ERROR:https://api.xxxxxx:5000/v3 responded with code 401
2019-02-25 11:54:01,847:ERROR:'token'
2019-02-25 11:54:01,847:ERROR:failed to get data for cache key check_os_api
2019-02-25 11:54:01,847:ERROR:Service 'neutron' not found in catalog
2019-02-25 11:54:01,847:WARNING:Cannot get state of neutron workers
2019-02-25 11:54:01,847:ERROR:Service 'cinder' not found in catalog
2019-02-25 11:54:01,847:WARNING:Cannot get state of cinder workers
2019-02-25 11:54:01,847:ERROR:Service 'nova' not found in catalog
2019-02-25 11:54:01,847:WARNING:Cannot get state of nova workers
2019-02-25 11:54:01,848:ERROR:Service 'nova' not found in catalog
2019-02-25 11:54:01,848:WARNING:Could not get nova aggregates
2019-02-25 11:54:01,848:ERROR:Service 'nova' not found in catalog
2019-02-25 11:54:01,848:WARNING:Could not get hypervisor statistics

Does anyone know how to fix this?

Launching with

sudo docker run --env-file /etc/prometheus-openstack-exporter/env.conf -d --name openstack-exporter -p 9103:9103 xxxxxx:5000/prometheus-openstack-exporter

Also, the metrics endpoint shows only the following:

sudo curl http://localhost:9103/metrics

# HELP openstack_exporter_cache_refresh_duration_seconds Cache refresh duration in seconds.
# TYPE openstack_exporter_cache_refresh_duration_seconds gauge
openstack_exporter_cache_refresh_duration_seconds{region="RegionOne"} 0.029609203338623047

Error when getting hypervisor metrics

2018-10-25 08:56:22,440:ERROR:'HypervisorStats' object has no attribute 'extra_config'
2018-10-25 08:56:22,441:ERROR:failed to get data for cache key hypervisor_stats

                free = ((int(self.extra_config['cpu_ratio'] *
                             m_vcpus)) -
                        m_vcpus_used)

nova_aggregates[agg]['metrics']['free_vcpus'] += free

Choose interface for check API

It looks like the openstack_check_*_api checks pick the first endpoint that the endpoints API call yields for a particular service. You can't be sure whether you get the public, internal, or admin interface URL (endpoint type). It would be good to check all of them (and label them in the metrics), or to check only a particular type configured somewhere.
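A rough sketch of selecting an endpoint by interface instead of taking the first one. It assumes the Keystone v3 token catalog layout (a list of services, each with `name` and an `endpoints` list whose entries carry `interface`, `region`, and `url`); the function name and its parameters are hypothetical and not part of the exporter:

```python
def endpoint_url(catalog, service_name, interface="internal", region=None):
    """Return the URL of the first endpoint matching service, interface and region.

    Returns None when no endpoint matches, so the caller can report the
    check as unavailable rather than probing an arbitrary URL.
    """
    for service in catalog:
        if service.get("name") != service_name:
            continue
        for endpoint in service.get("endpoints", []):
            if endpoint.get("interface") != interface:
                continue
            if region is not None and endpoint.get("region") != region:
                continue
            return endpoint.get("url")
    return None


catalog = [
    {
        "name": "nova",
        "type": "compute",
        "endpoints": [
            {"interface": "public", "region": "RegionOne", "url": "https://api.example.com:8774"},
            {"interface": "internal", "region": "RegionOne", "url": "http://controller:8774"},
        ],
    }
]
print(endpoint_url(catalog, "nova", interface="public"))
```

Checking all three types would amount to calling this once per interface and attaching the interface as a metric label.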

Github Releases

Hello,

  • Would it be possible for you to add GitHub releases? That way we can
    integrate it into other projects/Docker images without the hassle of having to download
    the sources.

Thank you.

Metric collection fails on current versions of prometheus_client

The Docker container ships with prometheus_client 0.13, which appears to be less stringent about its inputs; however, installing the exporter alongside a current 0.4.2 version of prometheus_client results in many metrics not being returned at all.

By adding a try-except block around lines 141-151 in hypervisor_stats.py (https://github.com/att-comdev/prometheus-openstack-exporter/blob/master/exporter/hypervisor_stats.py#L141-L151), I was able to identify that prometheus_client is throwing a "Duplicated timeseries in CollectorRegistry" exception when assembling the metrics for delivery.

I suspect that the older version of prometheus_client was simply silently overwriting duplicate timeseries. I believe the same issue may affect all of the collector modules. Exception handling should be added around calls to the prometheus_client methods, and input should be sanitized ahead of those calls to ensure no duplicate timeseries are propagated. I'd be happy to help develop a PR to address some of this, if I can get some guidance from the project maintainers on how they'd like to see that implemented.
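One possible shape for the sanitizing step, as a sketch only. It assumes the collector can first gather its samples as `(metric name, labels dict, value)` tuples before handing them to prometheus_client; that tuple layout and the function name are assumptions, not the exporter's current structure. The last value wins, which mirrors the overwrite behaviour suspected above. It relies on insertion-ordered dicts (Python 3.7+).

```python
def dedupe_samples(samples):
    """Keep one sample per (metric name, label set), preferring the last value."""
    unique = {}
    for name, labels, value in samples:
        # Sort the label items so that label order does not affect identity.
        key = (name, tuple(sorted(labels.items())))
        unique[key] = (name, labels, value)
    return list(unique.values())


samples = [
    ("openstack_free_vcpus", {"host": "compute01"}, 10),
    ("openstack_free_vcpus", {"host": "compute02"}, 4),
    ("openstack_free_vcpus", {"host": "compute01"}, 12),  # duplicate series
]
for sample in dedupe_samples(samples):
    print(sample)
```

Logging each dropped duplicate instead of discarding it silently would make the underlying data problem easier to find.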
