vsanmetrics

vsanmetrics is a tool written in Python that collects usage and performance metrics and health status from a VMware vSAN cluster and translates them into InfluxDB's line protocol.

It can be used with Telegraf to send metrics to a time-series database such as InfluxDB or Graphite, and then display them in Grafana.

A detailed list of all entity types and metrics is available here

Prerequisites

  • Python 3 (this script has been tested with Python 3.6.7)
  • The pyVmomi Python library

You can install the libraries with pip: pip install -r requirements.txt

To use the vSAN Python bindings, download the SDK and place vsanmgmtObjects.py and vsanapiutils.py either on a path from which your Python applications can import libraries or in the same folder as vsanmetrics.py.
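
A quick way to confirm that the libraries and the SDK files are importable before running the script is sketched below. The helper name `missing_modules` and its `extra_path` argument are illustrative assumptions and are not part of vsanmetrics; only the module names come from the prerequisites above.

```python
import importlib.util
import sys


def missing_modules(names, extra_path=None):
    """Return the names from `names` that Python cannot locate on its import path."""
    if extra_path and extra_path not in sys.path:
        # Hypothetical folder holding vsanmgmtObjects.py and vsanapiutils.py.
        sys.path.insert(0, extra_path)
    return [name for name in names if importlib.util.find_spec(name) is None]


# Module names taken from the prerequisites; nothing is imported, only located.
missing = missing_modules(["pyVmomi", "vsanmgmtObjects", "vsanapiutils"])
print("missing:", ", ".join(missing) if missing else "none")
```

If a name is reported as missing, copy the corresponding file next to vsanmetrics.py or install the library with pip, then run the check again.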

Installation

  • Download the script vsanmetrics.py
  • On a Linux box, make the script executable
% chmod +x ./vsanmetrics.py
  • Run the script with the -h parameter to check that it works
% ./vsanmetrics.py -h

usage: vsanmetrics.py [-h] -s VCENTER [-o PORT] -u USER [-p PASSWORD] -c
                      CLUSTERNAME [--performance] [--capacity] [--health]
                      [--skipentitytypes SKIPENTITYTYPES]

Export vSAN cluster performance and storage usage statistics to InfluxDB line
protocol

optional arguments:
  -h, --help            show this help message and exit
  -s VCENTER, --vcenter VCENTER
                        Remote vcenter to connect to
  -o PORT, --port PORT  Port to connect on
  -u USER, --user USER  User name to use when connecting to vcenter
  -p PASSWORD, --password PASSWORD
                        Password to use when connecting to vcenter
  -c CLUSTERNAME, --cluster_name CLUSTERNAME
                        Cluster Name
  --performance         Output performance metrics
  --capacity            Output storage usage metrics
  --health              Output cluster health status
  --skipentitytypes SKIPENTITYTYPES
                        List of entity types to skip. Separated by a comma
  --cachefolder CACHEFOLDER
                        Folder where the cache files are stored
  --cacheTTL CACHETTL   TTL of the object inventory cache
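
For readers who want to see how these options fit together, here is a minimal argparse sketch reconstructed from the help text above. It is not the script's actual source: the `dest` name `clustername`, the default port 443 and the default cache folder "." are assumptions, while the 60-minute cache TTL default comes from the Cache section.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the help output."""
    parser = argparse.ArgumentParser(
        description="Export vSAN cluster performance and storage usage "
                    "statistics to InfluxDB line protocol")
    parser.add_argument("-s", "--vcenter", required=True,
                        help="Remote vcenter to connect to")
    # Default port is an assumption; the help text does not state one.
    parser.add_argument("-o", "--port", type=int, default=443,
                        help="Port to connect on")
    parser.add_argument("-u", "--user", required=True,
                        help="User name to use when connecting to vcenter")
    parser.add_argument("-p", "--password",
                        help="Password to use when connecting to vcenter")
    # The dest name is an assumption chosen to match the CLUSTERNAME metavar.
    parser.add_argument("-c", "--cluster_name", dest="clustername",
                        required=True, help="Cluster Name")
    parser.add_argument("--performance", action="store_true",
                        help="Output performance metrics")
    parser.add_argument("--capacity", action="store_true",
                        help="Output storage usage metrics")
    parser.add_argument("--health", action="store_true",
                        help="Output cluster health status")
    parser.add_argument("--skipentitytypes",
                        help="List of entity types to skip. Separated by a comma")
    # Default folder is an assumption standing in for "where the script is executed".
    parser.add_argument("--cachefolder", default=".",
                        help="Folder where the cache files are stored")
    parser.add_argument("--cacheTTL", type=int, default=60,
                        help="TTL of the object inventory cache")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
demo = build_parser().parse_args(
    ["-s", "vcenter.example.com", "-u", "user", "-c", "VSAN-CLUSTER", "--capacity"])
print(demo.vcenter, demo.clustername, demo.capacity, demo.cacheTTL)
```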

Usage

Run the script against a vSAN cluster to gather the storage usage statistics.

% ./vsanmetrics.py -s vcenter.example.com -u [email protected] -p MyAwesomePassword -c VSAN-CLUSTER --capacity

capacity_global,scope=global,vcenter=vcenter.example.com,cluster=VSAN-CLUSTER totalCapacityB=7200999211008,freeCapacityB=1683354550260 1525422314084382976
capacity_summary,scope=summary,vcenter=vcenter.example.com,cluster=VSAN-CLUSTER temporaryOverheadB=0,physicalUsedB=2636212338688,primaryCapacityB=2688980877312,usedB=5380734189568,reservedCapacityB=3607749040540,overReservedB=2744521850880,provisionCapacityB=6986210377728,overheadB=2828663783436 1525422314084382976
capacity_vmswap,scope=vmswap,vcenter=vcenter.example.com,cluster=VSAN-CLUSTER temporaryOverheadB=0,physicalUsedB=8422162432,primaryCapacityB=177330978816,usedB=355240771584,reservedCapacityB=355089776640,overReservedB=346818609152,overheadB=177909792768 1525422314084382976
capacity_checksumOverhead,scope=checksumOverhead,vcenter=vcenter.example.com,cluster=VSAN-CLUSTER temporaryOverheadB=0,physicalUsedB=0,primaryCapacityB=0,usedB=8858370048,reservedCapacityB=0,overReservedB=0,overheadB=8858370048 1525422314084382976
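
Each output line follows the pattern "measurement,tags fields timestamp". The sketch below shows how such a line can be assembled; the function name `to_line_protocol` is hypothetical and this is not necessarily how vsanmetrics builds its output. It skips the escaping that line protocol requires for spaces, commas and equals signs, on the assumption that names and values contain none.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as 'measurement,tag=value field=value timestamp'.

    Simplified: no escaping is applied, so keys and values must not
    contain spaces, commas or equals signs.
    """
    tag_part = ",".join("{}={}".format(key, value) for key, value in tags.items())
    field_part = ",".join("{}={}".format(key, value) for key, value in fields.items())
    return "{},{} {} {}".format(measurement, tag_part, field_part, timestamp_ns)


# Values copied from the first capacity line shown above.
line = to_line_protocol(
    "capacity_global",
    {"scope": "global", "vcenter": "vcenter.example.com", "cluster": "VSAN-CLUSTER"},
    {"totalCapacityB": 7200999211008, "freeCapacityB": 1683354550260},
    1525422314084382976,
)
print(line)
```

The trailing number is a timestamp in nanoseconds since the Unix epoch.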

Run the script against a vSAN cluster to gather performance statistics.

% ./vsanmetrics.py -s vcenter.example.com -u [email protected] -p MyAwesomePassword -c VSAN-CLUSTER --performance

cluster-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,uuid=52b29fa6-9cb9-6d67-31ed-4bf8f2dd9294 oio=7.0,throughputRead=40883.0,latencyAvgWrite=11218.0,latencyAvgRead=985.0,iopsRead=1.0,throughputWrite=2819.0,congestion=0.0,iopsWrite=0.0 1525462200000000000
cluster-domcompmgr,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,uuid=52b29fa6-9cb9-6d67-31ed-4bf8f2dd9294 oio=6.0,throughputRecWrite=0.0,latencyAvgRecWrite=0.0,throughputRead=45309.0,latencyAvgWrite=1335.0,tputResyncRead=0.0,latencyAvgRead=935.0,iopsRead=1.0,throughputWrite=14476.0,latAvgResyncRead=0.0,iopsResyncRead=0.0,iopsRecWrite=0.0,iopsWrite=2.0,congestion=0.0 1525462200000000000
host-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,hostname=esx01.example.com,uuid=5ae60a2b-fe13-25dd-1f19-005056a3a442 oio=1.0,throughputRead=95.0,latencyAvgWrite=0.0,latencyAvgRead=340.0,iopsRead=0.0,clientCacheHitRate=0.0,throughputWrite=0.0,congestion=0.0,iopsWrite=0.0,clientCacheHits=0.0 1525462200000000000
host-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,hostname=esx03.example.com,uuid=5ae750e2-bc6d-487b-1283-005056a38be2 oio=6.0,throughputRead=40788.0,latencyAvgWrite=11218.0,latencyAvgRead=1000.0,iopsRead=1.0,clientCacheHitRate=0.0,throughputWrite=2819.0,congestion=0.0,iopsWrite=0.0,clientCacheHits=0.0 1525462200000000000
host-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,hostname=esx02.example.com,uuid=5ae7229f-771d-1091-ffe7-005056a35f01 oio=0.0,throughputRead=0.0,latencyAvgWrite=0.0,latencyAvgRead=0.0,iopsRead=0.0,clientCacheHitRate=0.0,throughputWrite=0.0,congestion=0.0,iopsWrite=0.0,clientCacheHits=0.0 1525462200000000000

Run the script against a vSAN cluster to gather performance statistics and skip some entity types like virtual machines or VSCSI entities:

% ./vsanmetrics.py -s vcenter.example.com -u [email protected] -p MyAwesomePassword -c VSAN-CLUSTER --performance --skipentitytypes virtual-machine,vscsi

cluster-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,uuid=52b29fa6-9cb9-6d67-31ed-4bf8f2dd9294 oio=7.0,throughputRead=40883.0,latencyAvgWrite=11218.0,latencyAvgRead=985.0,iopsRead=1.0,throughputWrite=2819.0,congestion=0.0,iopsWrite=0.0 1525462200000000000
cluster-domcompmgr,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,uuid=52b29fa6-9cb9-6d67-31ed-4bf8f2dd9294 oio=6.0,throughputRecWrite=0.0,latencyAvgRecWrite=0.0,throughputRead=45309.0,latencyAvgWrite=1335.0,tputResyncRead=0.0,latencyAvgRead=935.0,iopsRead=1.0,throughputWrite=14476.0,latAvgResyncRead=0.0,iopsResyncRead=0.0,iopsRecWrite=0.0,iopsWrite=2.0,congestion=0.0 1525462200000000000
host-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,hostname=esx01.example.com,uuid=5ae60a2b-fe13-25dd-1f19-005056a3a442 oio=1.0,throughputRead=95.0,latencyAvgWrite=0.0,latencyAvgRead=340.0,iopsRead=0.0,clientCacheHitRate=0.0,throughputWrite=0.0,congestion=0.0,iopsWrite=0.0,clientCacheHits=0.0 1525462200000000000
host-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,hostname=esx03.example.com,uuid=5ae750e2-bc6d-487b-1283-005056a38be2 oio=6.0,throughputRead=40788.0,latencyAvgWrite=11218.0,latencyAvgRead=1000.0,iopsRead=1.0,clientCacheHitRate=0.0,throughputWrite=2819.0,congestion=0.0,iopsWrite=0.0,clientCacheHits=0.0 1525462200000000000
host-domclient,cluster=VSAN-CLUSTER,vcenter=vcenter.example.com,hostname=esx02.example.com,uuid=5ae7229f-771d-1091-ffe7-005056a35f01 oio=0.0,throughputRead=0.0,latencyAvgWrite=0.0,latencyAvgRead=0.0,iopsRead=0.0,clientCacheHitRate=0.0,throughputWrite=0.0,congestion=0.0,iopsWrite=0.0,clientCacheHits=0.0 1525462200000000000
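
The comma-separated value passed to --skipentitytypes can be thought of as a filter over the list of entity types. Here is a minimal sketch of that idea; the function names are hypothetical and the script's own handling may differ.

```python
def parse_skip_list(raw):
    """Split a comma-separated --skipentitytypes value into a set of names."""
    if not raw:
        return set()
    return {item.strip() for item in raw.split(",") if item.strip()}


def filter_entity_types(entity_types, raw_skip):
    """Return the entity types that are not listed in the skip value, in order."""
    skip = parse_skip_list(raw_skip)
    return [name for name in entity_types if name not in skip]


# A subset of the entity types documented in this README.
all_types = ["cluster-domclient", "host-domclient", "virtual-machine", "vscsi"]
print(filter_entity_types(all_types, "virtual-machine,vscsi"))
# prints ['cluster-domclient', 'host-domclient']
```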

Cache

The script tries to maintain an inventory of the vSAN infrastructure in a cache. This has two major benefits:

  • Reducing the overall execution time of the script in larger environments
  • Avoiding errors when a host is disconnected while the script is executing

By default, the cache is valid for 60 minutes. You can choose your own duration with the --cacheTTL parameter. Cache files are stored in the directory where the script is executed; you can change this with the --cachefolder parameter.

% ./vsanmetrics.py -s vcenter.example.com -u [email protected] -p MyAwesomePassword -c VSAN-CLUSTER --performance --cacheTTL 300 --cachefolder /tmp
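
The general idea of a time-limited inventory cache can be sketched as follows. This is an illustration only: the JSON file format, the function names and the use of the file's modification time are assumptions, and the TTL is taken to be in minutes because the default is described as 60 minutes.

```python
import json
import os
import time


def load_cache(path, ttl_minutes):
    """Return the cached inventory, or None if the file is missing or expired."""
    try:
        age_seconds = time.time() - os.path.getmtime(path)
    except OSError:
        return None  # No cache file yet.
    if age_seconds > ttl_minutes * 60:
        return None  # Cache is older than the TTL.
    with open(path) as handle:
        return json.load(handle)


def save_cache(path, inventory):
    """Write the inventory to the cache file, replacing any previous content."""
    with open(path, "w") as handle:
        json.dump(inventory, handle)


def get_inventory(path, ttl_minutes, fetch):
    """Use the cache when it is fresh; otherwise call fetch() and store the result."""
    inventory = load_cache(path, ttl_minutes)
    if inventory is None:
        inventory = fetch()
        save_cache(path, inventory)
    return inventory
```

Here `fetch` stands for whatever function queries vCenter for the inventory; it is only called when the cache is missing or expired, which is where the time saving comes from.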

List of available entity types

A more detailed list of entities and metrics is available here

Name                Description
cluster-domclient   Metrics about clusters in the perspective of VM consumption.
cluster-domcompmgr  Metrics about clusters in the perspective of vSAN backend.
host-domclient      Metrics about hosts in the perspective of VM consumption
host-domcompmgr     Metrics about hosts in the perspective of vSAN backend.
cache-disk          Metrics about Cache-tier disks
capacity-disk       Metrics about Capacity-tier disks
disk-group          Metrics about disk groups.
vscsi               Metrics for Virtual SCSI of virtual machines
virtual-machine     Metrics for virtual machines
virtual-disk        Metrics for virtual disks.
vsan-vnic-net       Metrics for vSAN VMkernel Network Adapter.
vsan-host-net       Metrics for vSAN Host Network.
vsan-pnic-net       Metrics for vSAN physical NIC.
vsan-iscsi-host     Metrics for all vSAN iSCSI targets on this ESXi host.
vsan-iscsi-target   Metrics for all LUNs on a vSAN iSCSI target.
vsan-iscsi-lun      Metrics for a vSAN iSCSI LUN.

Using vsanmetrics with Telegraf

The exec input plugin of Telegraf executes the commands on every interval and parses metrics from their output in any one of the accepted Input Data Formats.

Don't forget to configure Telegraf to output data to a time-series database!

vsanmetrics outputs the metrics in InfluxDB's line protocol. Telegraf will parse them and send them to any destination configured in the output plugins.
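
Before wiring the script into Telegraf, it can help to sanity-check its output line by line. The sketch below is a simplified checker, not Telegraf's parser: the function names are hypothetical, and it assumes every line carries a timestamp and contains no escaped spaces, commas or equals signs, which holds for the sample output in this README.

```python
def split_pairs(text):
    """Turn 'a=1,b=2' into a dict, rejecting empty keys or values."""
    pairs = {}
    for item in text.split(","):
        key, separator, value = item.partition("=")
        if not key or not separator or value == "":
            raise ValueError("malformed key=value pair: {!r}".format(item))
        pairs[key] = value
    return pairs


def parse_line(line):
    """Split one line into (measurement, tags, fields, timestamp_ns)."""
    parts = line.strip().split(" ")
    if len(parts) != 3:
        raise ValueError("expected 'measurement,tags fields timestamp'")
    head, field_part, timestamp = parts
    measurement, _, tag_part = head.partition(",")
    tags = split_pairs(tag_part) if tag_part else {}
    fields = split_pairs(field_part)
    return measurement, tags, fields, int(timestamp)


# Sample line copied from the capacity output earlier in this README.
sample = ("capacity_global,scope=global,vcenter=vcenter.example.com,"
          "cluster=VSAN-CLUSTER totalCapacityB=7200999211008,"
          "freeCapacityB=1683354550260 1525422314084382976")
measurement, tags, fields, timestamp_ns = parse_line(sample)
print(measurement, tags["cluster"], fields["totalCapacityB"], timestamp_ns)
```

A line with a missing field value raises ValueError, which makes it easier to spot the offending line than reading a parse offset in the Telegraf log.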

vsanmetrics and the Python libraries must be available to the user who runs the Telegraf service (typically root on Linux boxes).

TIP: On Linux, install the libraries with the command sudo -H pip install -r requirements.txt to make them available to the root user.

Here is an example of a working Telegraf config file:

###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################

[[inputs.exec]]
  # Shell/commands array
  # Full command line to executable with parameters, or a glob pattern to run all matching files.
  commands = ["/path/to/script/vsanmetrics.py -s vcenter01.example.com -u [email protected] -p MyAwesomePassword -c VSAN-CLUSTER --performance --capacity --health"]

  # Timeout for each command to complete.
  timeout = "60s"

  # Data format to consume.
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"

  interval = "300s"

If needed, you can specify more than one input plugin. It might be useful if you want to gather different statistics with different intervals or if you want to query different vSAN clusters.

###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################

[[inputs.exec]]
  # Shell/commands array
  # Full command line to executable with parameters, or a glob pattern to run all matching files.
  commands = ["/path/to/script/vsanmetrics.py -s vcenter01.example.com -u [email protected] -p MyAwesomePassword -c VSAN-CLUSTER --performance --capacity --health"]

  # Timeout for each command to complete.
  timeout = "60s"

  # Data format to consume.
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"

  interval = "300s"

[[inputs.exec]]
  # Shell/commands array
  # Full command line to executable with parameters, or a glob pattern to run all matching files.
  commands = ["/path/to/script/vsanmetrics.py -s vcenter02.example.com -u [email protected] -p MyAwesomePassword -c VSAN-CLUSTER --performance --capacity --health"]

  # Timeout for each command to complete.
  timeout = "60s"

  # Data format to consume.
  # NOTE json only reads numerical measurements, strings and booleans are ignored.
  data_format = "influx"

  interval = "300s"

Author

Erwan Quélin

License

Copyright 2018 Erwan Quelin and the community.

Licensed under the Apache License 2.0.

vsanmetrics's People

Contributors

bdeam, equelin

vsanmetrics's Issues

TypeError: 'int' object is not iterable

Script fails to collect any metrics via telegraf config or manual execution. I've tried with all three (performance, capacity, and health) together and all three individually. Each time gives the same error below for each of the three categories. Every error ends with the same TypeError.

This is against a 6.7U1 cluster with the same sdk version. Latest versions of all pip modules. Python 2.7.5 on RHEL7. Got the same behavior on Python 2.7.13 as well.

[root@system vsanmetrics]# ./vsanmetrics.py -s vcenter.example.com -u "[email protected]" -p secret -c My_Cluster --performance
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "./vsanmetrics.py", line 539, in getPerformance
    si, _, cluster_obj = connectvCenter(args, context)
TypeError: 'int' object is not iterable

No module named vsanapiutils

root@ubuntu-xenial:/home/ubuntu/vsanmetrics-master# source vmware/bin/activate

(vmware) root@ubuntu-xenial:/home/ubuntu/vsanmetrics-master# pip freeze
pkg-resources==0.0.0
(vmware) root@ubuntu-xenial:/home/ubuntu/vsanmetrics-master# pip install -r requirements.txt
Collecting pyvmomi (from -r requirements.txt (line 1))
Collecting requests>=2.3.0 (from pyvmomi->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/f1/ca/10332a30cb25b627192b4ea272c351bce3ca1091e541245cccbace6051d8/requests-2.20.0-py2.py3-none-any.whl
Collecting six>=1.7.3 (from pyvmomi->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/67/4b/141a581104b1f6397bfa78ac9d43d8ad29a7ca43ea90a2d863fe3056e86a/six-1.11.0-py2.py3-none-any.whl
Collecting urllib3<1.25,>=1.21.1 (from requests>=2.3.0->pyvmomi->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/62/00/ee1d7de624db8ba7090d1226aebefab96a2c71cd5cfa7629d6ad3f61b79e/urllib3-1.24.1-py2.py3-none-any.whl
Collecting chardet<3.1.0,>=3.0.2 (from requests>=2.3.0->pyvmomi->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl
Collecting idna<2.8,>=2.5 (from requests>=2.3.0->pyvmomi->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/4b/2a/0276479a4b3caeb8a8c1af2f8e4355746a97fab05a372e4a2c6a6b876165/idna-2.7-py2.py3-none-any.whl
Collecting certifi>=2017.4.17 (from requests>=2.3.0->pyvmomi->-r requirements.txt (line 1))
  Using cached https://files.pythonhosted.org/packages/56/9d/1d02dd80bc4cd955f98980f28c5ee2200e1209292d5f9e9cc8d030d18655/certifi-2018.10.15-py2.py3-none-any.whl
Installing collected packages: urllib3, chardet, idna, certifi, requests, six, pyvmomi
Successfully installed certifi-2018.10.15 chardet-3.0.4 idna-2.7 pyvmomi-6.7.1 requests-2.20.0 six-1.11.0 urllib3-1.24.1
(vmware) root@ubuntu-xenial:/home/ubuntu/vsanmetrics-master# pip freeze
certifi==2018.10.15
chardet==3.0.4
idna==2.7
pkg-resources==0.0.0
pyvmomi==6.7.1
requests==2.20.0
six==1.11.0
urllib3==1.24.1
(vmware) root@ubuntu-xenial:/home/ubuntu/vsanmetrics-master# ./vsanmetrics.py -h
Traceback (most recent call last):
  File "./vsanmetrics.py", line 19, in <module>
    import vsanapiutils
ImportError: No module named vsanapiutils

Error in plugin [inputs.exec]: metric parse error: expected field at offset 16736:

I get the error in the title when I run Telegraf; however, if I run the command on its own, it seems that I get all the data displayed on the screen without issues:
./vsanmetrics.py -s vcenter.xxx.xxx -u [email protected] -p xxxxx -c xxxx --performance --capacity --health

Running vCenter 6.7.0.11000 and hosts are with 6.7.0, 8169922

Is this something to do with data format retrieved by telegraf and not compliant with influxdb?

--capacity not working

Traceback (most recent call last):
  File "/opt/vsanmetrics/vsanmetrics.py", line 494, in <module>
    main()
  File "/opt/vsanmetrics/vsanmetrics.py", line 361, in main
    if spaceReport.efficientCapacity:
AttributeError: 'vim.cluster.VsanSpaceUsage' object has no attribute 'efficientCapacity'

Error if one host from cluster is down

If I run the command to query the cluster for, e.g., performance metrics and one host of the cluster is not available, then the whole command fails:

python /opt/vsanmetrics/vsanmetrics.py -s -u -p -c --performance
Traceback (most recent call last):
File "/opt/vsanmetrics/vsanmetrics.py", line 584, in <module>
main()
File "/opt/vsanmetrics/vsanmetrics.py", line 466, in main
uuid, disks = getInformations(content, cluster_obj)
File "/opt/vsanmetrics/vsanmetrics.py", line 108, in getInformations
diskAll = host.configManager.vsanSystem.QueryDisksForVsan()
File "/usr/local/lib/python2.7/dist-packages/pyVmomi/VmomiSupport.py", line 580, in
self.f(*(self.args + (obj,) + args), **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pyVmomi/VmomiSupport.py", line 386, in _InvokeMethod
return self._stub.InvokeMethod(self, info, args)
File "/usr/local/lib/python2.7/dist-packages/pyVmomi/SoapAdapter.py", line 1366, in InvokeMethod
raise obj # pylint: disable-msg=E0702
pyVmomi.VmomiSupport.HostNotConnected: (vmodl.fault.HostNotConnected) {
dynamicType = ,
dynamicProperty = (vmodl.DynamicProperty) [],
msg = 'Unable to communicate with the remote host, since it is disconnected.',
faultCause = ,
faultMessage = (vmodl.LocalizableMessage) []
}

type object 'vim.cluster.VsanPerfUnitType' has no attribute 'time_ns'

Hello Equelin,
many thanks for your contribution
I have a problem, when I use vsphere8.0 version, an error occurs
type object 'vim.cluster.VsanPerfUnitType' has no attribute 'time_ns'
I think it may be that in the vsanmgmtObjects.py file, CreateDataType, unit uses the vim.cluster.VsanPerfUnitType method. The version is incompatible and this method does not exist. If I want to use 8.0, how should I change the code?
Hope to get your help thank you

Cache, Capacity, Disk Group to esxi host

Hi,
great script. I am missing the connection between cache, capacity, disk group and esxi hosts. So if I choose cache-disk in grafana I can only filter by cluster and not by esxi host.

iopsResyncRead & iopsRecWrite are the same

Hi,
can you please check if the following two counters were queried correct from vsan:
cluster-domcompmgr:

  • iopsResyncRead
  • iopsRecWrite
  • tputResyncRead
  • throughputRecWrite

Because if we have resync traffic on our vSAN clusters both counters have the same value

Request Grafana Dashboard

Hello Equelin,

Thx for your work. Really cool python's script for vSan :)
I just want to ask if it is possible to share Grafana Dashboard ?

Regard,
