
gnmi-gateway's Introduction


⚠ Experimental. Please note that this is a pre-release version.

gNMI Gateway

gnmi-gateway is a distributed and highly available service for connecting to multiple gNMI targets. Currently only the gNMI Subscribe RPC is supported.

Common use-cases are:

  • Provide multiple streams to gNMI clients while maintaining a single connection to gNMI targets.
  • Provide highly available streams to gNMI clients.
  • Distribute gNMI target connections among multiple servers.
  • Export gNMI streams to other data formats and protocols.
  • Dynamically form connections to gNMI targets based on data in other systems (e.g., your NMS, or network inventory, etc).

Design

Overview

gnmi-gateway is written in Golang and is designed to be easily extendable for users and organizations interested in utilizing gNMI data (modeled with OpenConfig). However, if you aren't interested in writing your own code there are a few built-in components to make it easy to use from the command-line.

gnmi-gateway connects to gNMI targets based on data received from Target Loaders. gNMI Notification messages are then forwarded to the gnmi-gateway cache, gNMI clients with relevant subscriptions, and Exporters which may forward data to other systems or protocols.
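The data flow described above (one upstream connection per target, many downstream consumers) can be pictured as a simple fan-out. The following is only a conceptual sketch, not gnmi-gateway's actual implementation: the `Notification` and `Hub` types and their methods are hypothetical names invented for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Notification is a hypothetical stand-in for a gNMI Notification message.
type Notification struct {
	Target string
	Path   string
	Value  int64
}

// Hub fans out notifications from one upstream stream to many subscribers.
// It is an illustrative type, not part of gnmi-gateway.
type Hub struct {
	mu   sync.Mutex
	subs []chan Notification
}

// Subscribe registers a downstream consumer (a gNMI client or an Exporter in
// the README's terms) and returns the channel it should read from.
func (h *Hub) Subscribe(buffer int) <-chan Notification {
	ch := make(chan Notification, buffer)
	h.mu.Lock()
	h.subs = append(h.subs, ch)
	h.mu.Unlock()
	return ch
}

// Publish delivers one notification to every subscriber.
// It blocks if a subscriber's buffer is full.
func (h *Hub) Publish(n Notification) {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, ch := range h.subs {
		ch <- n
	}
}

// Close ends the stream for all subscribers.
func (h *Hub) Close() {
	h.mu.Lock()
	defer h.mu.Unlock()
	for _, ch := range h.subs {
		close(ch)
	}
}

func main() {
	hub := &Hub{}
	client := hub.Subscribe(8)
	exporter := hub.Subscribe(8)

	// A single upstream update reaches both consumers.
	hub.Publish(Notification{Target: "router1", Path: "/interfaces/interface[name=eth0]/state/counters/in-octets", Value: 42})
	hub.Close()

	for n := range client {
		fmt.Println("client got:", n.Path, n.Value)
	}
	for n := range exporter {
		fmt.Println("exporter got:", n.Path, n.Value)
	}
}
```

The target only ever sees one subscription, regardless of how many consumers are attached to the hub.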

Target Loaders

Target Loaders are components that are used to generate target connection configurations that are sent to the connection manager. Target Loaders and the connection manager communicate using the target.proto model found in the github.com/openconfig/gnmi repository. gnmi-gateway accepts a few additional parameters in the Target.meta field:

NoTLSVerify: include this field to disable TLS verification. This enables
             the use of self-signed certificates. Note that connections
             without TLS are not supported per the gNMI specification.

NoLock: include this field to disable locking for the associated target even
        if clustering is enabled. Only include this field if you are
        handling de-duplication outside of gnmi-gateway.
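As a rough illustration of how the two meta fields above might be interpreted, here is a minimal sketch. It assumes that only the presence of a key matters (the target examples elsewhere in this document use the value "yes"); the `connOptions` type and `optionsFromMeta` function are hypothetical and are not taken from the gnmi-gateway source.

```go
package main

import "fmt"

// connOptions is a hypothetical summary of what the meta fields request.
type connOptions struct {
	SkipTLSVerify bool // accept self-signed certificates
	UseLock       bool // take a cluster lock before connecting
}

// optionsFromMeta derives connection options from a target's meta map.
// Assumption: a field is "included" when its key is present, whatever the value.
func optionsFromMeta(meta map[string]string, clusteringEnabled bool) connOptions {
	_, noTLSVerify := meta["NoTLSVerify"]
	_, noLock := meta["NoLock"]
	return connOptions{
		SkipTLSVerify: noTLSVerify,
		// Locking only applies when clustering is on and NoLock is absent.
		UseLock: clusteringEnabled && !noLock,
	}
}

func main() {
	meta := map[string]string{"NoTLSVerify": "yes"}
	fmt.Printf("%+v\n", optionsFromMeta(meta, true))
}
```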

There are a few Target Loaders included with gnmi-gateway that you can use right away using the -TargetLoaders flag from the command-line. The Target Loaders included are:

If you'd like to build your own Target Loader see loaders/loader.go for details on how to implement the TargetLoader interface.

Exporters

Exporters are components of gnmi-gateway that are used to convert gNMI data into other formats and protocols for use by other systems. Some simple examples would be sending gNMI notifications to a Kafka stream or storing gNMI messages in a data store. Exporters will receive each gNMI message in the stream as it is received but also have access to query the local gNMI cache.

Exporters may be run on the same servers as your gnmi-gateway target connections or you can run exporters on a server acting as clients to another gnmi-gateway cluster. This allows for some flexibility in your deployment design.

Some Exporters have been included with gnmi-gateway and you can start using them by providing a comma-separated list of Exporters from the command-line with the -Exporters flag. The included Exporters are:

To build a custom Exporter see exporters/exporter.go for details on how to implement the Exporter interface.

Documentation

Most of the documentation resides in this repo. Please feel welcome to file a GitHub issue if you have a question.

See the godoc pages for documentation and usage examples.

Pre-requisites

  • Golang 1.14 or newer
  • A target that supports gNMI Subscribe. This is usually a network router or switch.
  • A running instance of Apache Zookeeper. If you only want to run a single instance of gnmi-gateway (i.e. without failover) you don't need Zookeeper. See the development instructions below for how to set up a Zookeeper Docker container.

Source Install / Run Instructions

These are the commands that would be used to start gnmi-gateway on a Linux install that has make installed. If you are not on a platform that is compatible with the Makefile the commands inside the Makefile should translate to other platforms that support Golang.

  1. git clone https://github.com/openconfig/gnmi-gateway
  2. cd gnmi-gateway
  3. make tls (If you have your own TLS server certificates you may use them instead. It is recommended that you do not use these generated self-signed certificates in production.)
  4. Copy targets-example.json to targets.json and modify it to match your gNMI target. You need to modify the target name, target address, and credentials.
  5. make run
  6. gnmi-gateway should now be running. If you are unable to get gnmi-gateway running at this point please check the ./gnmi-gateway -help dialog for tips (assuming the binary built) and then file an issue on Github if you are still unsuccessful.

Examples

gNMI to Prometheus Exporter

gnmi-gateway ships with an Exporter that allows you to export OpenConfig-modeled gNMI data to Prometheus.

See the README in examples/gnmi-prometheus/ for details on how to start the gnmi-gateway Docker container and connect it to a Prometheus Docker container.

Production Deployment

It is recommended that gnmi-gateway be deployed to immutable infrastructure such as Kubernetes or AWS EC2 instances. New version tags can be retrieved from GitHub and deployed with your configuration.

Most configuration can be done via command-line flags. If you need more complex options for configuring gnmi-gateway or want to configure the gateway at runtime, you can create a .go file that imports the gateway package, creates a configuration.GatewayConfig instance, passes it to gateway.NewGateway, and then calls StartGateway. For an example of how this is done, see the code in Main() in gateway/main.go.

To enable clustering of gnmi-gateway you will need an instance (or ideally a cluster) of Apache Zookeeper accessible to all of the gnmi-gateway instances. Additionally all of the gnmi-gateway instances in the cluster must be able to reach each other over the network.

It is recommended that you limit the deployment of a cluster to a single geographic region or a single geographic area with consistent latency for ideal performance. You may run instances of gnmi-gateway distributed globally but may encounter performance issues. You'll likely encounter timeout issues with Zookeeper as your latency begins to approach the Zookeeper tickTime.

Development

Check the to-do list for any open known issues or new features.

Start Zookeeper for development

This should only be used for development and not for production. The container will maintain no state; you will have a completely empty Zookeeper tree when this starts/restarts. To start Zookeeper and expose the server on 127.0.0.1:2181 run:

docker run -d -p 2181:2181 zookeeper

Test the code

You can test the code by running make test.

You can run integration tests by running make integration. (Ensure you have Zookeeper running on 127.0.0.1:2181.)

You can run test coverage by running make cover.

Build the code

You can build the gnmi-gateway binary by running make build.

Contributions

Please make any changes in a separate fork and make a PR to the release branch when your changes are ready. Tags for new release versions will be cut from the release branch.

You must also sign a one-time CLA for any pull requests to be accepted. See CONTRIBUTING.md for details.

Troubleshooting

"context deadline exceeded" Error

If you see a context deadline exceeded error from the connection manager, it means some underlying issue is causing the connection to a target to fail. This is often a TLS issue (wrong certs, bad config, etc.) but it could be something else. Try running gnmi-gateway with gRPC connection logging enabled. For example:

GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info ./gnmi-gateway

gnmi-gateway's People

Contributors

colinmcintosh, mathershifter, mehrdadrad, tardoe


gnmi-gateway's Issues

Cisco XR gnmi streaming telemetry error: unsupported encoding: JSON (need proto)

Many thanks for this great open-source project! The Cisco router is configured for gNMI.
But I believe that as soon as I use dial-in mode, the data will be encoded in proto.
So do I have to create a custom Target Loader and pass the argument -TargetLoaders=proto?

When I tried the default -TargetLoaders=json, I got:
(target "Cisco XR") failed: rpc error: code = Unimplemented desc = gNMI: subscribe: unsupported encoding: JSON;

I tried another package, gnmic (https://github.com/karimra/gnmic), which works fine but lacks the distributed options via Zookeeper, which I like very much. At least I know the Cisco router has good connectivity and is configured correctly.

I am not that experienced in Go. Can you please give me a hint on how to get it ready for proto encoding?

Thanks in advance,
Lars

Updates are either delayed or getting lost when 256 network router streaming 10K updates/20sec

Bug description
Updates are either delayed or lost when 256 network routers are streaming 10K updates/20sec. I tried two cases: when streaming directly from the devices I receive all the updates, but when getting them through the gateway I do not receive all updates on time (they are delayed, arriving not every 20 seconds but at random intervals, sometimes 1 minute apart, sometimes 2 minutes apart). I have raised the target device limit to 300 with "-TargetLimit=300". Do any settings need to be tweaked? I have tried changing "GatewayTransitionBufferSize" to a higher value, but no luck. @colinmcintosh, any input?


Enhancement Request: Allow netbox configuration to be defined in JSON config file

We would need to specify json tags on the struct fields to allow these to be marshaled into the configuration struct.

type TargetLoadersConfig struct {

In a DCIM tool such as NetBox we can specify a Platform (for example EOS, PANOS, NXOS, or JUNOS). It would be useful to have a platform-specific path because, even though this is OpenConfig, not all platforms support all paths. For example, Palo Alto 10.1 code has just introduced support for OpenConfig and does not support all models at this time.
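To make the JSON-tag idea in this request concrete, here is a minimal sketch of a tagged struct being populated from a JSON config. The struct below is a trimmed-down, hypothetical stand-in: the field names, tag names, and the `PlatformPaths` field are assumptions made for illustration and are not the actual definition in gnmi-gateway.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TargetLoadersConfig is a hypothetical stand-in for the real struct.
// All field and tag names here are illustrative assumptions.
type TargetLoadersConfig struct {
	NetBoxHost       string `json:"netbox_host"`
	NetBoxAPIKey     string `json:"netbox_api_key"`
	NetBoxIncludeTag string `json:"netbox_include_tag"`
	// PlatformPaths maps a platform name to the subscribe paths
	// that the platform is known to support.
	PlatformPaths map[string][]string `json:"platform_paths,omitempty"`
}

// parseConfig unmarshals JSON bytes into the tagged struct.
func parseConfig(data []byte) (*TargetLoadersConfig, error) {
	var cfg TargetLoadersConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	raw := []byte(`{
	  "netbox_host": "netbox.example.com",
	  "netbox_api_key": "changeme",
	  "netbox_include_tag": "gnmi",
	  "platform_paths": {
	    "eos": ["/interfaces", "/components"],
	    "panos": ["/interfaces"]
	  }
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.NetBoxHost, cfg.PlatformPaths["panos"])
}
```

Without the tags, `encoding/json` would only match keys that equal the Go field names case-insensitively, so keys such as `netbox_host` would be silently ignored.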

Gateway exited with an error: could not open simple config file

I'm on Windows. I have a simple custom server, built as follows, running on localhost:50051, and I'm trying to follow the instructions in this blog post. I'd like to create an Exporter but I'm getting this error:

{"level":"error","time":"...","message":"Gateway exited with an error: could not open simple config file \"\": open : The system cannot find the file specified."}

looper.proto

syntax = "proto3";

option go_package = "example.com/looper/protobuf";

package looper;

// The Looper service definition.
service Looper {
  rpc GetValue(ValueRequest) returns (ValueReply) {}
}

message ValueRequest {}

message ValueReply {
  int32 value = 1;
}

server/main.go

package main

import (
	"context"
	"log"
	"net"
	"google.golang.org/grpc"
	pb "example.com/looper/protobuf"
)

const (
	port = ":50051"
)

type server struct {
	pb.UnimplementedLooperServer
}

func (s *server) GetValue(ctx context.Context, in *pb.ValueRequest) (*pb.ValueReply, error) {
	return &pb.ValueReply{Value: 23}, nil
}

func main() {
	lis, err := net.Listen("tcp", port)
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	s := grpc.NewServer()
	pb.RegisterLooperServer(s, &server{})
	if err := s.Serve(lis); err != nil {
		log.Fatalf("failed to serve: %v", err)
	}
}

I've created all the files:

/path/to/repo/gnmi-gateway/gnmi-gateway
$ ls -l
CONTRIBUTING.md
LICENSE
Makefile
README.md
docs/
examples/
gateway/
gateway-config-example.json
gnmi-gateway*
go.mod
go.sum
main.go
server.crt
server.key
targets-example.json
targets-example.yaml
targets.yaml

My targets.yaml looks like this:

---
connection:
  localhost:
    addresses:
      - localhost:50051
    #credentials:
      #username: myusername
      #password: mypassword
    request: demo-request
    meta: {}
request:
  demo-request:
    target: "*"
    paths:
      - "/"
    #  - /components
    #  - /interfaces/interface[name=*]/state/counters
    #  - /interfaces/interface[name=*]/ethernet/state/counters
    #  - /interfaces/interface[name=*]/subinterfaces/subinterface[index=*]/state/counters
    #  - /qos/interfaces/interface[interface-id=*]/output/queues/queue[name=*]/state

Running with:

make build && ./gnmi-gateway -EnableGNMIServer     -ServerTLSCert=server.crt     -ServerTLSKey=server.key     -TargetLoaders=simple     -TargetJSONFile=targets.yaml -Exporters=debug

Get "cache update error: update is stale," error after receiving subscription notification from a target.

Bug description
Get "cache update error: update is stale," error after receiving subscription notification from a target.

Step to reproduce

  1. targets.json:
{
  "request": {
    "default": {
      "subscribe": {
        "prefix": {
        },
        "subscription": [
          {
            "path": {
              "elem": [
                {
                  "name": "interfaces"
                }
              ]
            }
          }
        ]
      }
    }
  },
  "target": {
    "athena-gnmi-server": {
      "addresses": [
        "localhost:8080"
      ],
      "credentials": {
        "username": "admin",
        "password": "admin"
      },
      "request": "default",
      "meta": {
        "NoTLSVerify": "yes"
      }
    }
  }
}
  2. Execute as follows (using host network for simplicity):
$ docker run     -it --rm     -p 59100:59100     -v $(pwd)/examples/gnmi-prometheus/targets.json:/opt/gnmi-gateway/targets.json

Expected behavior
Receive notifications without error messages, expose to northbound.

gnmi-gateway logs:

WARNING: Published ports are discarded when using host network mode
{"level":"info","time":"2021-01-22T02:26:40Z","message":"Starting GNMI Gateway."}
{"level":"info","time":"2021-01-22T02:26:40Z","message":"Clustering is NOT enabled. No locking or cluster coordination will happen."}
{"level":"info","time":"2021-01-22T02:26:40Z","message":"Starting connection manager."}
{"level":"info","time":"2021-01-22T02:26:40Z","message":"Starting gNMI server on 0.0.0.0:9339."}
{"level":"info","time":"2021-01-22T02:26:40Z","message":"Starting Prometheus exporter."}
{"level":"info","time":"2021-01-22T02:26:40Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2021-01-22T02:26:40Z","message":"Initializing target athena-gnmi-server ([localhost:8080]) map[NoTLSVerify:yes]."}
{"level":"info","time":"2021-01-22T02:26:40Z","message":"Target athena-gnmi-server: Connecting"}
{"level":"info","time":"2021-01-22T02:26:40Z","message":"Target athena-gnmi-server: Subscribing"}
{"level":"info","time":"2021-01-22T02:26:40Z","message":"Target athena-gnmi-server: Connected"}
{"level":"info","time":"2021-01-22T02:26:40Z","message":"Starting Prometheus HTTP server."}
{"level":"info","time":"2021-01-22T02:26:40Z","message":"Target athena-gnmi-server: Disconnected"}
E0122 02:26:40.826565       1 reconnect.go:114] client.Subscribe (target "athena-gnmi-server") failed: target 'athena-gnmi-server' cache update error: update is stale, update is stale, update is stale, update is stale, update is stale, update is stale, update is stale, update is stale, update is stale, update is stale, update is stale, update is stale, update is stale: { timestamp:1611282400 prefix:/athena-gnmi-server update:[ { path:/interfaces/interface[name=tx]/state/counters/in-octets val:&{2048000000} } { path:/interfaces/interface[name=tx]/state/counters/out-octets val:&{1518000000} } { path:/interfaces/interface[name=tx]/state/counters/out-pkts val:&{4000000} } { path:/interfaces/interface[name=tx]/config/description val:&{localhost:5555;1} } { path:/interfaces/interface[name=tx]/ethernet/state/counters/in-crc-errors val:&{0} } { path:/interfaces/interface[name=tx]/config/type val:&{ethernetCsmacd} } { path:/interfaces/interface[name=tx]/state/out-rate val:&{[0 0 0 0]} } { path:/interfaces/interface[name=tx]/state/counters/in-pkts val:&{4000000} } { path:/interfaces/interface[name=tx]/ethernet/config/port-speed val:&{SPEED_1GB} } { path:/interfaces/interface[name=tx]/state/in-rate val:&{[0 0 0 0]} } { path:/interfaces/interface[name=tx]/config/name val:&{tx} } { path:/interfaces/interface[name=tx]/name val:&{tx} } { path:/interfaces/interface[name=tx]/state/oper-status val:&{UP} } ] }; reconnecting in 552.330144ms
{"level":"info","time":"2021-01-22T02:26:41Z","message":"Target athena-gnmi-server: Connected"}
{"level":"info","time":"2021-01-22T02:26:41Z","message":"Target athena-gnmi-server: Disconnected"}

Target logs (partial) showing notifications being sent to server:

I0121 18:38:42.289725    8200 data_client.go:306] Notifications: [timestamp:1611283122 update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"state"} elem:{name:"out-rate"}} val:{bytes_val:"\x00\x00\x00\x00"}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"state"} elem:{name:"counters"} elem:{name:"in-pkts"}} val:{uint_val:4000002}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"state"} elem:{name:"counters"} elem:{name:"out-octets"}} val:{uint_val:1518000000}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"state"} elem:{name:"counters"} elem:{name:"out-pkts"}} val:{uint_val:4000002}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"config"} elem:{name:"description"}} val:{string_val:"localhost:5555;1"}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"ethernet"} elem:{name:"state"} elem:{name:"counters"} elem:{name:"in-crc-errors"}} val:{uint_val:0}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"ethernet"} elem:{name:"config"} elem:{name:"port-speed"}} val:{string_val:"SPEED_1GB"}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"name"}} val:{string_val:"tx"}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"config"} elem:{name:"type"}} val:{string_val:"ethernetCsmacd"}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"state"} elem:{name:"counters"} elem:{name:"in-octets"}} val:{uint_val:2048000414}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"state"} elem:{name:"in-rate"}} 
val:{bytes_val:"\x00\x00\x00\x00"}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"config"} elem:{name:"name"}} val:{string_val:"tx"}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"state"} elem:{name:"oper-status"}} val:{string_val:"UP"}}]. Err <nil>

Environment details

  • go version go1.15.6 linux/amd64 (used to build target)
  • host networking

Additional context
I'm able to subscribe to this client using gnmi_cli, see below. (Note that this version is patched to support some of the options below; I'm not sure it matters.)

Patch inside gnmi_cli source dir:

  wget https://raw.githubusercontent.com/Azure/sonic-telemetry/master/patches/gnmi_cli.all.patch; patch -p4 < gnmi_cli.all.patch

Subscribe result:

$ gnmi_cli -a localhost:8080 -logtostderr -insecure -t ATHENA  -query "/device/interfaces"  -qt s -streaming_type SAMPLE  -streaming_sample_interval 2 -v 5 -display_type s
I0121 18:43:35.681664    8544 register.go:113] Attempting client types: [gnmi]
I0121 18:43:35.741912    8544 register.go:126] client "gnmi" create with type *client.Client
I0121 18:43:35.752914    8544 client.go:186] update:{timestamp:1611283415 update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"state"} elem:{name:"counters"} elem:{name:"in-octets"}} val:{uint_val:2048000414}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"ethernet"} elem:{name:"state"} elem:{name:"counters"} elem:{name:"in-crc-errors"}} val:{uint_val:0}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"ethernet"} elem:{name:"config"} elem:{name:"port-speed"}} val:{string_val:"SPEED_1GB"}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"config"} elem:{name:"name"}} val:{string_val:"tx"}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"state"} elem:{name:"out-rate"}} val:{bytes_val:"\x00\x00\x00\x00"}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"state"} elem:{name:"counters"} elem:{name:"in-pkts"}} val:{uint_val:4000002}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"state"} elem:{name:"counters"} elem:{name:"out-octets"}} val:{uint_val:1518000000}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"state"} elem:{name:"counters"} elem:{name:"out-pkts"}} val:{uint_val:4000002}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"config"} elem:{name:"description"}} val:{string_val:"localhost:5555;1"}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"state"} elem:{name:"in-rate"}} val:{bytes_val:"\x00\x00\x00\x00"}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"name"}} 
val:{string_val:"tx"}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"state"} elem:{name:"oper-status"}} val:{string_val:"UP"}} update:{path:{elem:{name:"interfaces"} elem:{name:"interface" key:{key:"name" value:"tx"}} elem:{name:"config"} elem:{name:"type"}} val:{string_val:"ethernetCsmacd"}}}
interfaces/interface/tx/state/counters/in-octets, 2048000414
interfaces/interface/tx/ethernet/state/counters/in-crc-errors, 0
interfaces/interface/tx/ethernet/config/port-speed, SPEED_1GB
interfaces/interface/tx/config/name, tx
interfaces/interface/tx/state/out-rate, [0 0 0 0]
interfaces/interface/tx/state/counters/in-pkts, 4000002
interfaces/interface/tx/state/counters/out-octets, 1518000000
interfaces/interface/tx/state/counters/out-pkts, 4000002
interfaces/interface/tx/config/description, localhost:5555;1
interfaces/interface/tx/state/in-rate, [0 0 0 0]
interfaces/interface/tx/name, tx
interfaces/interface/tx/state/oper-status, UP
interfaces/interface/tx/config/type, ethernetCsmacd

Support for Path.Origin field in Subscribe request

Some network vendors (e.g. Arista) have implemented support for non-OpenConfig models via GNMI utilising the Origin parameter on a Path.

I can see some comments among the code (e.g.

// TODO(yusufsn) : Origin field in the Path may need to be included
) indicating a future desire to support this behaviour.

Before I submit a PR, is there any particular desire for this to be supported and in any particular manner?

If/When I do submit a PR, this issue can serve as a reference.

Prometheus exporter causes panic if result of e.deltaCalc.Calc(metricHash, value) is < 0

Bug description

Prometheus exporter causes panic if result of e.deltaCalc.Calc(metricHash, value) is < 0

delta, exists := e.deltaCalc.Calc(metricHash, value)

Step to reproduce

  1. Setup the gnmi-prometheus example as described in https://github.com/openconfig/gnmi-gateway/blob/release/examples/gnmi-prometheus/README.md

  2. Reload the device being polled.

Expected behavior
The Prometheus exporter should handle counter resets without a panic

Output or code snippets

"level":"info","time":"2021-01-17T23:09:08Z","message":"Target lab1-fab1-pod1-leaf1.hyposcaler.net: Connected"}
panic: counter cannot decrease in value

goroutine 49 [running]:
github.com/prometheus/client_golang/prometheus.(*counter).Add(0xc0010d3560, 0xc115413c00000000)
        /go/pkg/mod/github.com/prometheus/[email protected]/prometheus/counter.go:109 +0x116
github.com/openconfig/gnmi-gateway/gateway/exporters/prometheus.(*PrometheusExporter).Export(0xc0005b2270, 0xc004a4e5d0)
        /opt/gnmi-gateway/gateway/exporters/prometheus/prometheus.go:103 +0x22f
github.com/openconfig/gnmi-gateway/gateway.(*CacheClient).run(0xc00478bb00)
        /opt/gnmi-gateway/gateway/gateway.go:150 +0x55
created by github.com/openconfig/gnmi-gateway/gateway.NewCacheClient
        /opt/gnmi-gateway/gateway/gateway.go:136 +0x1bf
 

Environment details
Please include the output of go version. If you have any additional specific details about your environment to share please include them in the bug report.

go version go1.15.5 linux/amd64

Target device being polled was an arista vEOS (ami-036a5d80077d33df2)

Additional context

I was able to eliminate the issue by replacing:

  if exists {
    m.Add(delta)
  }

with

  if exists && delta >= 0 {
    m.Add(delta)
  }
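The guard proposed above can be exercised in isolation. The sketch below is not the exporter's code: `deltaCalculator` is a hypothetical stand-in that only mirrors the `Calc(hash, value)` call shape shown in this report, and `safeIncrement` is an illustrative helper. It shows why a counter reset (for example after a device reload) yields a negative delta, and how skipping that sample avoids calling `Add` with a negative value.

```go
package main

import "fmt"

// deltaCalculator is a hypothetical stand-in for the exporter's deltaCalc.
// It remembers the last value seen for each metric hash.
type deltaCalculator struct {
	last map[uint64]float64
}

func newDeltaCalculator() *deltaCalculator {
	return &deltaCalculator{last: map[uint64]float64{}}
}

// Calc returns the change since the previous sample and whether a previous
// sample existed for this hash.
func (d *deltaCalculator) Calc(hash uint64, value float64) (float64, bool) {
	prev, exists := d.last[hash]
	d.last[hash] = value
	return value - prev, exists
}

// safeIncrement reports the amount that may be added to a monotonic counter.
// It refuses the first sample (no baseline) and any negative delta, which is
// what a counter reset on the device looks like.
func safeIncrement(delta float64, exists bool) (float64, bool) {
	if !exists || delta < 0 {
		return 0, false
	}
	return delta, true
}

func main() {
	calc := newDeltaCalculator()
	// The third sample simulates the device reloading and its counter restarting.
	for _, sample := range []float64{100, 150, 20, 60} {
		if inc, ok := safeIncrement(calc.Calc(1, sample)); ok {
			fmt.Printf("sample %v: add %v\n", sample, inc)
		} else {
			fmt.Printf("sample %v: skipped\n", sample)
		}
	}
}
```

Skipping the negative delta loses whatever the device counted between the reset and the first post-reset sample; that trade-off matches the fix proposed in this report.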

NX-OS issues

Love your project. I've been able to get it working with Arista vEOS, but have been unable to make it work with Cisco NX-OS 9.3(8).

When I run it against my NX-OS VM, it starts to subscribe but never finishes.

gateway | {"level":"info","time":"2022-01-05T20:47:12Z","message":"Starting GNMI Gateway."}
gateway | {"level":"info","time":"2022-01-05T20:47:12Z","message":"Clustering is NOT enabled. No locking or cluster coordination will happen."}
gateway | {"level":"info","time":"2022-01-05T20:47:12Z","message":"Starting connection manager."}
gateway | {"level":"info","time":"2022-01-05T20:47:12Z","message":"Starting gNMI server on 0.0.0.0:9339."}
gateway | {"level":"info","time":"2022-01-05T20:47:12Z","message":"Starting InfluxDBv2 exporter."}
gateway | {"level":"info","time":"2022-01-05T20:47:12Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
gateway | {"level":"info","time":"2022-01-05T20:47:12Z","message":"Initializing target nx9300.example.com ([192.168.2.237:50051]) map[NoTLSVerify:yes]."}
gateway | {"level":"info","time":"2022-01-05T20:47:12Z","message":"Target nx9300.example.com: Connecting"}
gateway | {"level":"info","time":"2022-01-05T20:47:12Z","message":"Target nx9300.example.com: Subscribing"}
gateway | INFO: 2022/01/05 20:47:12 [core] parsed scheme: ""
gateway | INFO: 2022/01/05 20:47:12 [core] scheme "" not registered, fallback to default scheme
gateway | INFO: 2022/01/05 20:47:12 [core] ccResolverWrapper: sending update to cc: {[{192.168.2.237:50051 <nil> 0 <nil>}] <nil> <nil>}
gateway | INFO: 2022/01/05 20:47:12 [core] ClientConn switching balancer to "pick_first"
gateway | INFO: 2022/01/05 20:47:12 [core] Channel switches to new LB policy "pick_first"
gateway | INFO: 2022/01/05 20:47:12 [core] Subchannel Connectivity change to CONNECTING
gateway | INFO: 2022/01/05 20:47:12 [core] Subchannel picks a new address "192.168.2.237:50051" to connect
gateway | INFO: 2022/01/05 20:47:12 [core] pickfirstBalancer: UpdateSubConnState: 0xc000627500, {CONNECTING <nil>}
gateway | INFO: 2022/01/05 20:47:12 [core] Channel Connectivity change to CONNECTING
gateway | INFO: 2022/01/05 20:47:12 [core] Subchannel Connectivity change to READY
gateway | INFO: 2022/01/05 20:47:12 [core] pickfirstBalancer: UpdateSubConnState: 0xc000627500, {READY <nil>}
gateway | INFO: 2022/01/05 20:47:12 [core] Channel Connectivity change to READY
gateway | {"level":"info","time":"2022-01-05T20:47:42Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
gateway | {"level":"info","time":"2022-01-05T20:48:12Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
gateway | {"level":"info","time":"2022-01-05T20:48:42Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
gateway | {"level":"info","time":"2022-01-05T20:49:12Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
gateway | {"level":"info","time":"2022-01-05T20:49:42Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
gateway | {"level":"info","time":"2022-01-05T20:50:12Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
gateway | {"level":"info","time":"2022-01-05T20:50:42Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
gateway | {"level":"info","time":"2022-01-05T20:51:12Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
gateway | {"level":"info","time":"2022-01-05T20:51:42Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
gateway | {"level":"info","time":"2022-01-05T20:52:12Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}

Here is the targets.json I'm using:

{
  "request": {
    "default": {
      "subscribe": {
        "encoding": "PROTO",
        "prefix": {
        },
        "subscription": [
          {
            "path": {
              "elem": [
                {
                  "name": "interfaces"
                }
              ]
            }
          }
        ]
      }
    }
  },
  "target": {
    "nx9300.example.com": {
      "addresses": [
        "192.168.2.237:50051"
      ],
      "credentials": {
        "username": "XXXX",
        "password": "XXX"
      },
      "request": "default",
      "meta": {
        "NoTLSVerify": "yes"
      }
    }
  }
}

Any help you could provide would be appreciated.

Using multiple instances within k8s deployment with zookeeper triggers panic and crashes pod

Hi, so I've been trying to use gnmi-gateway within a k8s cluster and deploy it with clustering using zookeeper.

For this configuration, everything is working fine:

image

As you can see here, the gateway acquires the lock for each target, syncs with them and starts sending requests and overall the deployment works fine:

image

Then, for testing purposes, I manually scaled up the gnmi-gateway deployment to two replicas, and started having issues with the second replica of the deployment, while the first one keeps running fine:

image
image

And then the pod enters a crash loop.

While tracing the error I haven't been able to figure out whether I am missing some configuration, and I was wondering what could be causing that panic.

Openconfig Path Question

I'm using gnmi-gateway (details below):

./gnmi-gateway --version
gnmi-gateway version v0.11.1-002a9b0 (Built 2021-08-23T22:15:23Z)

I'm trying to figure out how to translate an OpenConfig path into my configuration.

If I use the gnmic tool as follows:

root@XXXX-XXXX-logs:~# gnmic -a <hostname>:6030 -u <user> -p <password> --insecure subscribe --path "/interfaces/interface/state/counters" --stream-mode on_change > output.txt
^Croot@XXXX-XXXX-logs:~# tail output.txt 
    {
      "Path": "interfaces/interface[name=Management1]/state/counters/out-unicast-pkts",
      "values": {
        "interfaces/interface/state/counters/out-unicast-pkts": 237966235
      }
    }
  ]
}

As you can see, I get output.

My gnmi-gateway configuration file contains:

{
  "request": {
    "default": {
      "subscribe": {
        "prefix": {
        },
        "subscription": [
          {
            "path": {
              "elem": [
                {
                  "name": "/interfaces/interface/state/counters"
                }
              ]
            }
          }
        ]
      }
    }
  },
  "target": {
    "<host>": {
      "addresses": [
        "<host>:6030"
      ],
      "credentials": {
        "username": "xxxxxx",
        "password": "xxxxxx"
      },
      "request": "default",
      "meta": {
        "NoTLSVerify": "yes"
      }
    }

  }
}

When using that same path in gnmi-gateway, I get:

{"level":"info","time":"2021-08-23T18:22:58-07:00","message":"Target <host>.gh.st: Disconnected"}
E0823 18:22:58.050605   71921 reconnect.go:114] client.Subscribe (target "<host>.st") failed: rpc error: code = InvalidArgument desc = failed to subscribe to /\/interfaces\/interface\/state\/counters: path invalid: failed to access node "/interfaces/interface/state/counters" in node ""; reconnecting in 1.040217184s
INFO: 2021/08/23 18:22:58 [transport] transport: loopyWriter.run returning. connection error: desc = "transport is closing"

Can you advise how to use more specific paths?
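The error above suggests the target treated the whole string "/interfaces/interface/state/counters" as the name of a single node. In the gNMI path model each path segment is its own element, so a likely fix is one "elem" entry per segment. The following is a minimal sketch, not part of gnmi-gateway: the PathElem type and the splitPath helper are hypothetical names introduced here, and the "name"/"key" JSON shape is assumed to match what the configuration above already uses.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// PathElem is a hypothetical mirror of the JSON shape of one gNMI path
// element: a name plus optional keys such as [name=Management1].
type PathElem struct {
	Name string            `json:"name"`
	Key  map[string]string `json:"key,omitempty"`
}

// splitPath breaks "/a/b[k=v]/c" into one PathElem per segment.
// A '/' inside [...] (for example Ethernet1/1) does not split the path.
// Simplification: escaped characters inside keys are not handled.
func splitPath(p string) []PathElem {
	var segs []string
	depth, start := 0, 0
	for i, r := range p {
		switch {
		case r == '[':
			depth++
		case r == ']' && depth > 0:
			depth--
		case r == '/' && depth == 0:
			segs = append(segs, p[start:i])
			start = i + 1
		}
	}
	segs = append(segs, p[start:])

	var elems []PathElem
	for _, seg := range segs {
		if seg == "" {
			continue // skip the empty segment produced by a leading '/'
		}
		elem := PathElem{Name: seg}
		if i := strings.Index(seg, "["); i >= 0 {
			elem.Name = seg[:i]
			elem.Key = map[string]string{}
			for _, kv := range strings.Split(seg[i:], "[") {
				parts := strings.SplitN(strings.TrimSuffix(kv, "]"), "=", 2)
				if len(parts) == 2 {
					elem.Key[parts[0]] = parts[1]
				}
			}
		}
		elems = append(elems, elem)
	}
	return elems
}

func main() {
	elems := splitPath("/interfaces/interface/state/counters")
	out, err := json.MarshalIndent(map[string][]PathElem{"elem": elems}, "", "  ")
	if err != nil {
		panic(err)
	}
	// Prints a "path" body with four elem entries instead of one.
	fmt.Println(string(out))
}
```

The printed object could replace the single-element "path" value in the configuration above. Whether a given target accepts the resulting subscription still depends on the models it supports.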

"NoTLS:yes" ignored

Hi there

I am trying to test gnmi-gateway against an Arista vEOS switch. The Arista gNMI server is configured without TLS, but gnmi-gateway still tries to negotiate TLS even though the targets.json file sets "NoTLS": "yes".

This is the logging from gnmi-gateway:

go:1.14.6|py:3.7.3|tomas@vm2:~/gnmi/gnmi-gateway release$ GRPC_GO_LOG_VERBOSITY_LEVEL=99 GRPC_GO_LOG_SEVERITY_LEVEL=info ./gnmi-gateway -TargetLoaders=json -TargetJSONFile=./examples/gnmi-prometheus/targets.json -EnableGNMIServer -Exporters=prometheus -OpenConfigDirectory=./oc-models/ -ServerTLSCert=server.crt -ServerTLSKey=server.key
{"level":"info","time":"2020-11-07T19:37:59Z","message":"Starting GNMI Gateway."}
{"level":"info","time":"2020-11-07T19:37:59Z","message":"Clustering is NOT enabled. No locking or cluster coordination will happen."}
{"level":"info","time":"2020-11-07T19:37:59Z","message":"Starting connection manager."}
{"level":"info","time":"2020-11-07T19:37:59Z","message":"Starting gNMI server on 0.0.0.0:9339."}
{"level":"info","time":"2020-11-07T19:37:59Z","message":"Starting Prometheus exporter."}
{"level":"info","time":"2020-11-07T19:37:59Z","message":"Connection manager received a target control message: 1 inserts 0 removes"}
{"level":"info","time":"2020-11-07T19:37:59Z","message":"Initializing target gcp-r1 ([192.168.249.4:3333]) map[NoTLS:yes]."}
{"level":"info","time":"2020-11-07T19:37:59Z","message":"Target gcp-r1: Connecting"}
{"level":"info","time":"2020-11-07T19:37:59Z","message":"Target gcp-r1: Subscribing"}
INFO: 2020/11/07 19:37:59 parsed scheme: ""
INFO: 2020/11/07 19:37:59 scheme "" not registered, fallback to default scheme
INFO: 2020/11/07 19:37:59 ccResolverWrapper: sending update to cc: {[{192.168.249.4:3333  <nil> 0 <nil>}] <nil> <nil>}
INFO: 2020/11/07 19:37:59 ClientConn switching balancer to "pick_first"
INFO: 2020/11/07 19:37:59 Channel switches to new LB policy "pick_first"
INFO: 2020/11/07 19:37:59 Subchannel Connectivity change to CONNECTING
INFO: 2020/11/07 19:37:59 Subchannel picks a new address "192.168.249.4:3333" to connect
INFO: 2020/11/07 19:37:59 pickfirstBalancer: UpdateSubConnState: 0xc0005aa270, {CONNECTING <nil>}
INFO: 2020/11/07 19:37:59 Channel Connectivity change to CONNECTING
WARNING: 2020/11/07 19:37:59 grpc: addrConn.createTransport failed to connect to {192.168.249.4:3333 192.168.249.4:3333 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: tls: first record does not look like a TLS handshake". Reconnecting...
INFO: 2020/11/07 19:37:59 Subchannel Connectivity change to TRANSIENT_FAILURE
INFO: 2020/11/07 19:37:59 pickfirstBalancer: UpdateSubConnState: 0xc0005aa270, {TRANSIENT_FAILURE connection error: desc = "transport: authentication handshake failed: tls: first record does not look like a TLS handshake"}
INFO: 2020/11/07 19:37:59 Channel Connectivity change to TRANSIENT_FAILURE
{"level":"info","time":"2020-11-07T19:38:00Z","message":"Starting Prometheus HTTP server."}
INFO: 2020/11/07 19:38:00 Subchannel Connectivity change to CONNECTING
INFO: 2020/11/07 19:38:00 Subchannel picks a new address "192.168.249.4:3333" to connect
INFO: 2020/11/07 19:38:00 pickfirstBalancer: UpdateSubConnState: 0xc0005aa270, {CONNECTING <nil>}
INFO: 2020/11/07 19:38:00 Channel Connectivity change to CONNECTING
WARNING: 2020/11/07 19:38:00 grpc: addrConn.createTransport failed to connect to {192.168.249.4:3333 192.168.249.4:3333 <nil> 0 <nil>}. Err: connection error: desc = "transport: authentication handshake failed: tls: first record does not look like a TLS handshake". Reconnecting...

This is targets.json:

go:1.14.6|py:3.7.3|tomas@vm2:~/gnmi/gnmi-gateway release$ cat examples/gnmi-prometheus/targets.json 
{
  "request": {
    "default": {
      "subscribe": {
        "prefix": {
        },
        "subscription": [
          {
            "path": {
              "elem": [
                {
                  "name": "interfaces"
                }
              ]
            }
          }
        ]
      }
    }
  },
  "target": {
    "gcp-r1": {
      "addresses": [
        "192.168.249.4:3333"
      ],
      "credentials": {
        "username": "xxx",
        "password": "xxx"
      },
      "request": "default",
      "meta": {
        "NoTLS": "yes"
      }
    }
  }
}

This is the Arista side seeing TLS packets:

bash-4.2# tcpdump -i any "tcp port 3333 and (tcp[((tcp[12] & 0xf0) >> 2)] = 0x16)"

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
19:47:01.367197  In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50486 > 192.168.249.4.dec-notes: Flags [P.], seq 2715923852:2715924104, ack 2576249027, win 511, options [nop,nop,TS val 1194424180 ecr 1250876], length 252
19:47:02.405870  In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50488 > 192.168.249.4.dec-notes: Flags [P.], seq 680803294:680803546, ack 3839769659, win 511, options [nop,nop,TS val 1194425218 ecr 1251136], length 252
19:47:04.139458  In 1e:3d:5b:13:d8:fe (oui Unknown) ethertype IPv4 (0x0800), length 320: 10.128.0.4.50490 > 192.168.249.4.dec-notes: Flags [P.], seq 3963338234:3963338486, ack 1760248652, win 511, options [nop,nop,TS val 1194426952 ecr 1251569], length 252
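For readers unfamiliar with the capture filter above: `(tcp[12] & 0xf0) >> 2` computes the TCP header length in bytes, and the filter matches segments whose first payload byte is 0x16, the TLS record type for a handshake message. The same check is sketched in Go below. The function names are hypothetical, and the bounds checks are additions that the tcpdump expression itself does not perform.

```go
package main

import "fmt"

// tcpHeaderLen returns the TCP header length in bytes given byte 12 of the
// TCP header. The upper four bits hold the data offset in 32-bit words, so
// ((b & 0xf0) >> 4) * 4 is the same as (b & 0xf0) >> 2.
func tcpHeaderLen(byte12 byte) int {
	return int(byte12&0xf0) >> 2
}

// looksLikeTLSHandshake reports whether a TCP segment (header followed by
// payload) has a payload starting with 0x16, the TLS handshake record type.
func looksLikeTLSHandshake(segment []byte) bool {
	if len(segment) < 13 {
		return false // too short to contain the data-offset byte
	}
	off := tcpHeaderLen(segment[12])
	if off < 20 || off >= len(segment) {
		return false // malformed header, or no payload at all
	}
	return segment[off] == 0x16
}

func main() {
	// Hypothetical segment: a 20-byte TCP header (data offset 5) followed
	// by the first bytes of a TLS handshake record.
	seg := make([]byte, 20)
	seg[12] = 0x50
	seg = append(seg, 0x16, 0x03, 0x01, 0x00, 0xf7)
	fmt.Println(looksLikeTLSHandshake(seg))
}
```

A match on this filter therefore indicates that the client side is opening the connection with a TLS handshake, which is consistent with the "first record does not look like a TLS handshake" error in the gnmi-gateway log.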

This is my Arista gNMI config:

r1#show management api gnmi 
Enabled:            Yes
Server:             running on port 3333, in MGMT VRF
SSL Profile:        none
QoS DSCP:           none
r1#

!
management api gnmi
   transport grpc GRPC
      port 3333
      vrf MGMT
!

I am just following https://github.com/openconfig/gnmi-gateway/tree/release/examples/gnmi-prometheus

I can confirm gnmi works in my vEOS following https://netdevops.me/2020/arista-veos-gnmi-tutorial/

Let me know if you need me to provide more info.

I have the same issue whether I build gnmi-gateway with "build" or with "docker".

Thanks
