Go Client Library for Loggregator
License: Apache License 2.0
There are now predefined selectors that enable selecting/filtering events based on type. How do we use a selector to select a specific subtype of event? Say I only care about application metrics and access logs, so I only want container metrics and http start/stop events. How can we do that with the go-loggregator client?
This is useful if you have to compile a test binary to run on another machine. That way you don't also have to copy a bunch of filesystem state around with the test binary. The test brings its own data dependencies.
Some of the runtime metrics emitted by dropsonde aren't emitted here, like memoryStats.numMallocs, memoryStats.numFrees, and numCPUS. Should metrics match between dropsonde and this package, or were those metrics not useful enough to include here?
It would be super cool if the pulseemitter functionality were exposed through the ingress client, so that we could interact with gauges and counters in the same way as timers and events.
The original author of go-bindata deleted their GitHub account, and someone else remade the repo with the exact same name. This is a bit sketchy. I also couldn't find any go-bindata forks that appear to be actively maintained.
We could keep using this tool, but it seems like we should go back to just having a fixtures directory with some certs for testing, as required.
The EnvelopeStreamConnector takes a context to keep track of a stream's lifecycle. However, the underlying connection is never closed. This results in a goroutine leak.
The v2 API should be the primary code path for this package. I propose we have the following structure:
Current:
/ - compatibility layer
/v1 - v1 client
/v2 - v2 client
Desired:
/ - v2 client
/compat - v1 client and compatibility layer
This will allow for an easy rm -r compat in the future.
These need to be updated:
https://github.com/cloudfoundry/go-loggregator/tree/041998b54f880b3e5460fcc4e7d4d77742dc3d86#example
For instance, examples/main.go doesn't exist anymore.
While working on updating downstream consumers (e.g. code.cloudfoundry.org/diego-logging-client), I noticed the following items are not working for this repo: the latest tag is missing the v prefix. The tag should be in the format v8.0.4. This is causing Go modules to fail to pull the latest changes.

The output types of NewCounterMetric(...) and NewGaugeMetric(...) (*CounterMetric and *GaugeMetric respectively) do not offer a user a straightforward path to create a spy. While a PulseEmitter could be injected, CounterMetric and GaugeMetric force the user to submit an envelope and then query the underlying contents of the envelope. This implies the user has to have intimate knowledge of Loggregator envelopes, which goes against the goal of go-loggregator.
Update NewCounterMetric(...) and NewGaugeMetric(...) to return interface types:
type Gauge interface {
Set(int64)
}
and
type Counter interface {
Increment(uint64)
}
This would allow the entire PulseEmitter to be replaced, in full, with a spy/mock. It would also enable the deletion of GetDelta() from CounterMetric. That method is only useful from a test perspective, and it could easily be "cleaned" away and break test compatibility.
When using IngressClient, if the application exits before batchFlushInterval time has elapsed after calling EmitLog(), the message will never be delivered, because EmitLog() doesn't immediately send the message; it batches requests and delivers them at some point in the future.
I attempted to work around this by configuring the client using WithBatchMaxSize(0), but this didn't seem to help.
What's the recommended way to immediately deliver a single message without relying on time.Sleep()?
Thanks!
Pivotal uses GITBOT to synchronize GitHub issues and pull requests with Pivotal Tracker.
Please add your new repo to the GITBOT config-production.yml in the Gitbot configuration repo.
If you don't have access, you can send an ask ticket to the CF admins. We prefer teams to submit their changes via a pull request.
Steps: add your repo to the config-production.yml file. If there are any questions, please reach out to [email protected].
It would be nice if there were a wrapper method that would take care of both creating and sending a timer metric.
The recent introduction of vendored libraries whose types are used in go-loggregator interfaces (in 78f871c) breaks non-module builds of libraries which use go-loggregator. For example, building an app that uses diego-logging-client results in the following error:
# code.cloudfoundry.org/diego-logging-client
src/code.cloudfoundry.org/diego-logging-client/client.go:99:64: cannot use "google.golang.org/grpc".WithBlock() (type "google.golang.org/grpc".DialOption) as type "code.cloudfoundry.org/go-loggregator/vendor/google.golang.org/grpc".DialOption in argument to loggregator.WithDialOptions:
"google.golang.org/grpc".DialOption does not implement "code.cloudfoundry.org/go-loggregator/vendor/google.golang.org/grpc".DialOption (missing "code.cloudfoundry.org/go-loggregator/vendor/google.golang.org/grpc".apply method)
src/code.cloudfoundry.org/diego-logging-client/client.go:99:84: cannot use "google.golang.org/grpc".WithTimeout(time.Second) (type "google.golang.org/grpc".DialOption) as type "code.cloudfoundry.org/go-loggregator/vendor/google.golang.org/grpc".DialOption in argument to loggregator.WithDialOptions:
"google.golang.org/grpc".DialOption does not implement "code.cloudfoundry.org/go-loggregator/vendor/google.golang.org/grpc".DialOption (missing "code.cloudfoundry.org/go-loggregator/vendor/google.golang.org/grpc".apply method)
This is caused by using the vendored grpc.DialOption as a parameter type here: go-loggregator/ingress_client.go, line 22 in 78f871c.
Builds using modules still work, because then vendored libraries are ignored. From what I could find, it doesn't seem to be a good idea to have vendored types in interfaces (see this SO answer).
User scenario:
We are interested in fetching container metrics and http start/stop events from the v2 API with a selector subscription. Referring to the code at https://github.com/cloudfoundry/go-loggregator/blob/master/examples/envelope_stream_connector/main.go, we can open a stream with a fixed selector, but the connection can't easily be closed if the selector changes on the fly.
For example, we first subscribe to the metrics for appA and appB; after a while, we would like to subscribe to the metrics for appB and appC. In this case, we would like to close the previous connection and open a new one for the new subscription.
Issue found:
The envelope_stream_connector implementation (https://github.com/cloudfoundry/go-loggregator/blob/master/envelope_stream_connector.go) doesn't expose a method for the client to close the stream. This is the issue I hope to see addressed here.
Further concern:
For the above user scenario, if we choose to close the stale connection and open a new one, some of appB's metrics may be lost during the connection switch. Is it possible to modify the subscription on the fly?
The log-store has run into an issue trying to use the loggregator.WithEnvelopeStreamBuffer and Stream methods. When the provided context is canceled, which we do in our code to gracefully shut down and handle hanging connections, a nil pointer is dereferenced, causing a panic.
Here's a failing test with which we were able to reproduce the error:
It("wont panic when context canceled", func() {
	producer, err := newFakeEventProducer()
	Expect(err).NotTo(HaveOccurred())

	// Producer will grab a port on start. When the producer is restarted,
	// it will grab the same port.
	producer.start()
	defer producer.stop()

	tlsConf, err := loggregator.NewIngressTLSConfig(
		fixture("CA.crt"),
		fixture("server.crt"),
		fixture("server.key"),
	)
	Expect(err).NotTo(HaveOccurred())

	var (
		mu     sync.Mutex
		missed int
	)
	addr := producer.addr
	c := loggregator.NewEnvelopeStreamConnector(
		addr,
		tlsConf,
		loggregator.WithEnvelopeStreamBuffer(5, func(m int) {
			mu.Lock()
			defer mu.Unlock()
			missed += m
		}),
	)

	// Use a context that can be canceled
	ctx, cancel := context.WithCancel(context.Background())
	rx := c.Stream(ctx, &loggregator_v2.EgressBatchRequest{})

	var count int
	// Read to allow the diode to notice it dropped data
	go func() {
		for range time.Tick(500 * time.Millisecond) {
			// Do not invoke rx while mu is locked
			l := len(rx())
			mu.Lock()
			count += l
			mu.Unlock()
		}
	}()

	Eventually(func() int {
		mu.Lock()
		defer mu.Unlock()
		return missed
	}).ShouldNot(BeZero())

	// When the context is canceled, the client panics
	cancel()

	mu.Lock()
	l := count
	mu.Unlock()
	Expect(l).ToNot(BeZero())
})
We copied this from "enables buffering" in envelope_stream_connector_test.go but added a cancelable context.
When running we get:
❯ ./scripts/test
+ ginkgo -r -race
[1599068942] GoLoggregator Suite - 43/43 specs ••••••••••••••••••••••••••••••••panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x181778b]
goroutine 120 [running]:
code.cloudfoundry.org/go-loggregator/v8.(*OneToOneEnvelopeBatch).Next(...)
/Users/jmcbride/workspace/go-loggregator/one_to_one_envelope_batch_diode.go:45
code.cloudfoundry.org/go-loggregator/v8_test.glob..func1.4.2(0xc000561a30, 0xc0000396e0, 0xc000039700)
/Users/jmcbride/workspace/go-loggregator/envelope_stream_connector_test.go:186 +0x8c
created by code.cloudfoundry.org/go-loggregator/v8_test.glob..func1.4
/Users/jmcbride/workspace/go-loggregator/envelope_stream_connector_test.go:183 +0x7b6
Ginkgo ran 1 suite in 8.758037312s
Test Suite Failed
We would expect this client library to be able to take a context that can be canceled, as we cancel our context to start a graceful shutdown of our nozzle. Let me know if you have any questions!
This is redundant and not needed:
https://github.com/cloudfoundry-incubator/go-loggregator/blob/17682e3bc1157ea3b83e292ef6ee974ba992918c/ingress_client.go#L161-L163
https://github.com/cloudfoundry-incubator/go-loggregator/blob/17682e3bc1157ea3b83e292ef6ee974ba992918c/ingress_client_test.go#L85-L106
SourceId on the envelope is where this is stored.
I wrote some code following the example (https://github.com/cloudfoundry/go-loggregator/blob/master/examples/envelope_stream_connector/main.go).
One problem is that when there is an error in the tlsConfig, the stream fails and continuously retries, printing too many error logs.
My question: is it possible to set a max retry count, so that after the maximum number of retries it returns an error?
streamConnector := loggregator.NewEnvelopeStreamConnector(
os.Getenv("LOGS_API_ADDR"),
tlsConfig,
loggregator.WithEnvelopeStreamLogger(loggr),
)
Hello, not sure if this is the correct place for this question, but to test code that emits to Loggregator locally, you would typically emit on localhost:3457 to a Metron agent. If you are testing on your laptop, how do you simulate this? Is the best way an SSH tunnel into a BOSH VM with a Metron agent, or is there some other way to execute the samples? Thanks!
This project should switch to using google.golang.org/protobuf/proto instead.
We have WithSourceInfo (which is an EmitLogOption), WithCounterSourceInfo, WithGaugeSourceInfo, and WithTimerSourceInfo, but there is no EmitEventOption corollary.
Hi, would it be possible to update some of the dependencies that this project uses? Now that this project is using Go modules, it should be possible to use Dependabot to create PRs for dependency updates. To make full use of this, it would be worth creating a basic build/test process, maybe using GitHub Actions, to validate these changes. I'm happy to take a look at this if it would help.
As a user of go-loggregator I would like to be able to pass the ingress client keepalive.ClientParameters or any other grpc.DialOption.
Since we are encouraging users of loggregator to set source_id and instance_id for use in log-cache, we should update our examples to set the source info.
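A sketch of what an updated example might show, assuming the WithSourceInfo EmitLogOption takes (sourceID, sourceType, sourceInstance); the GUID is a placeholder and client is an already-constructed IngressClient:

```go
// Set source_id and instance_id explicitly so log-cache can index the
// envelope under the right source.
client.EmitLog(
	"request handled",
	loggregator.WithSourceInfo("f47ac10b-app-guid", "APP", "0"),
)
```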
Hello,
We generate a FakeIngressServer in diego-logging-client for testing our Send*Log methods in the library. Since the protobuf update in PR #81, we see that there is a new private method, mustEmbedUnimplementedIngressServer, that makes our FakeIngressServer implementation no longer valid. Can you please advise how we are supposed to solve this problem, so that we can continue generating a FakeIngressServer?
Context: We got here because we had to upgrade lager, which meant we had to upgrade ginkgo to v2 everywhere in diego-release, which meant we had to upgrade from go-loggregator v8 to v9, which meant we had to upgrade diego-logging-client, which meant we can no longer generate FakeIngressServer.
Do you have any suggestions for a workaround or other solution to this problem?
Recently the gRPC Go maintainers shared their intention to deprecate grpc.Dial and grpc.DialContext and to encourage users to use grpc.NewClient instead.
go-loggregator accepts arbitrary grpc.DialOptions, some of which will not be honoured after switching to grpc.NewClient. As a result, when we switch we should major-version bump our module.
The current plan is to move off grpc.Dial and grpc.DialContext.
Hey 👋! I'm working on porting datadog-firehose-nozzle to the loggregator V2 API [1]. I noticed that it's possible to pass a log object to the NewRLPGatewayClient function to get logs from the underlying logic. I'd like to request adding an option to pass a gosteno Logger, as (IIUC) this is a logging implementation used/developed for Cloud Foundry purposes, and thus it would be nice to have it supported.
Note that for our use case, it would also be sufficient to implement a channel of errors that the RLPGatewayClient would push to; we could read from it and do the logging ourselves. I requested this in [2]. (Although technically the errors channel would not be useful for debug/info logs, so this would be useful either way.)
Thanks for considering!
I would expect all the required dependencies to be in the vendor directory, however there are a few missing.
Hey 👋! I'm working on porting datadog-firehose-nozzle to the loggregator V2 API [1]. With the noaa/consumer package for V1, we instantiated FilteredFirehose, which returned a channel of messages (envelopes) and also a channel of errors. We could then have custom logic reacting to the errors (emitting custom logging messages etc.). Additionally, we were able to use methods like SetMaxRetryCount to make the nozzle fail if there's e.g. a configuration problem and the URL we're trying to connect to has a typo in it.
The RLPGatewayClient structure has no means of surfacing errors and/or setting a maximum number of connection retries. This means that we have no way of customizing error behavior (either through reading some sort of error channel or using specialized methods like SetMaxRetryCount).
The biggest issue with the current codebase is that if a user misconfigures the nozzle, it will keep trying to connect forever, which it shouldn't. A misconfigured nozzle should fail (soon-ish). The smaller issue is that we have no way of customizing what error messages will look like and at what levels they will be printed.
Would it be possible to add either an error channel or the specialized methods for error handling (or, ideally, both)? I realize that this RFE might in fact result in the implementation of two unrelated features, but I hope that's OK with you, as I see them as very related.
Thanks for considering!
As part of diego's transition from v1 to v2, they will need to use the runtimeemitter on v1. We discussed simply adding EmitGauge(opts ...v2.EmitGaugeOption) to the loggregator.Client interface and then implementing that method on the v1.Client.