
opentelemetry-collector's Issues

Add Memory Limiting Capability to Service

Memory Limiting

Due to the high variability inherent to telemetry data it is hard to estimate memory consumption when running the agent or collector. In practice users have to allocate the maximum amount of memory that will be available for each instance. In the steady state memory consumption is expected to be low, but the service needs to gracefully handle the cases when data is not flowing as intended. In those cases queues/buffers are expected to fill up, but they should be kept under certain limits to avoid OOM crashes (which cause all data held in memory to be lost).

This requires:

  1. Tracking memory usage and having configurable limits;
  2. Capability to suspend data ingress when the limits are about to be reached;

Tracking Memory Usage

There should be a periodic check of current memory usage against the desired limits. When the limits are about to be reached or crossed, the tracking system needs to be able to notify receivers that they should suspend data ingestion.
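
A minimal sketch of what such a periodic check could look like, using the Go runtime's own heap statistics; the limit value, polling interval and callback are illustrative assumptions, not a committed design:

package memlimiter

import (
	"runtime"
	"time"
)

// checkMemoryPeriodically polls the current heap allocation every interval and
// invokes onOverLimit whenever the allocated bytes reach limitBytes.
// All names here are placeholders for illustration only.
func checkMemoryPeriodically(limitBytes uint64, interval time.Duration, onOverLimit func(), done <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-done:
			return
		case <-ticker.C:
			var ms runtime.MemStats
			runtime.ReadMemStats(&ms)
			if ms.Alloc >= limitBytes {
				onOverLimit() // e.g. notify receivers to suspend ingestion
			}
		}
	}
}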

Capability to Suspend Ingest

Receivers will, in a cooperative fashion, check whether it is OK to add more data to their pipelines. This is cooperative so that each receiver can provide the proper response according to its protocol and decide whether it should exert back pressure or not (this should be a configurable option on each receiver).

Implementation Plan

  • Modify the receivers interface to support a cooperative check of whether it
    is OK to ingest more data.
  • Modify each receiver implementation in core to properly react when the
    ingestion check indicates that it should not ingest more data. Add proper
    configuration options regarding back-pressure.
  • Add memory limiter that can be used to notify receivers that they should
    suspend ingestion.
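
As a rough illustration of the cooperative check described above — the interface name and shape below are hypothetical, not the eventual API:

package memlimiter

// IngestionGate is a hypothetical interface the memory limiter could expose so
// that receivers can check, before accepting more data, whether ingestion
// should be suspended.
type IngestionGate interface {
	// OKToIngest reports whether the pipelines currently have headroom.
	OKToIngest() bool
}

// A receiver handler could then react roughly like this (illustrative):
//
//	if !gate.OKToIngest() {
//		// Depending on the receiver's configuration, either drop the data
//		// or exert back-pressure by returning a retryable error in the
//		// receiver's own protocol.
//		return errTemporarilyUnavailable
//	}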

test: TestCreateTraceExporter flaky

Failed once but passed on other runs; perhaps it is related to running on Darwin instead of Linux.

--- FAIL: TestCreateTraceExporter (0.37s)
    --- FAIL: TestCreateTraceExporter/Headers (0.00s)
        factory_test.go:145: 
                Error Trace:    factory_test.go:145
                Error:          Expected nil, but got: &status.statusError{Code:1, Message:"grpc: the client connection is closing", Details:[]*any.Any(nil), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
                Test:           TestCreateTraceExporter/Headers
FAIL
FAIL    github.com/open-telemetry/opentelemetry-service/exporter/opencensusexporter     0.485s

BUG: Data race in ocmetrics.TestExportProtocolViolations_nodelessFirstMessage

Test randomly failed when building on Travis CI: https://travis-ci.org/open-telemetry/opentelemetry-service/builds/548680880?utm_source=github_status&utm_medium=notification

==================
WARNING: DATA RACE
Read at 0x00c0000ac143 by goroutine 36:
  testing.(*common).logDepth()
      /home/travis/.gimme/versions/go1.12.6.linux.amd64/src/testing/testing.go:629 +0x132
  testing.(*common).Log()
      /home/travis/.gimme/versions/go1.12.6.linux.amd64/src/testing/testing.go:614 +0x78
  github.com/open-telemetry/opentelemetry-service/receiver/opencensusreceiver/ocmetrics.TestExportProtocolViolations_nodelessFirstMessage.func1()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/receiver/opencensusreceiver/ocmetrics/opencensus_test.go:194 +0x202
Previous write at 0x00c0000ac143 by main goroutine:
  testing.tRunner.func1()
      /home/travis/.gimme/versions/go1.12.6.linux.amd64/src/testing/testing.go:856 +0x354
  testing.tRunner()
      /home/travis/.gimme/versions/go1.12.6.linux.amd64/src/testing/testing.go:869 +0x17f
  testing.runTests()
      /home/travis/.gimme/versions/go1.12.6.linux.amd64/src/testing/testing.go:1155 +0x523
  testing.(*M).Run()
      /home/travis/.gimme/versions/go1.12.6.linux.amd64/src/testing/testing.go:1072 +0x2eb
  main.main()
      _testmain.go:98 +0x334
Goroutine 36 (running) created at:
  github.com/open-telemetry/opentelemetry-service/receiver/opencensusreceiver/ocmetrics.TestExportProtocolViolations_nodelessFirstMessage()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/receiver/opencensusreceiver/ocmetrics/opencensus_test.go:189 +0x3fa
  testing.tRunner()
      /home/travis/.gimme/versions/go1.12.6.linux.amd64/src/testing/testing.go:865 +0x163
==================
panic: Log in goroutine after TestExportProtocolViolations_nodelessFirstMessage has completed
goroutine 51 [running]:
testing.(*common).logDepth(0xc00028a200, 0xc0000270e0, 0x18, 0x3)
	/home/travis/.gimme/versions/go1.12.6.linux.amd64/src/testing/testing.go:634 +0x51a
testing.(*common).log(...)
	/home/travis/.gimme/versions/go1.12.6.linux.amd64/src/testing/testing.go:614
testing.(*common).Log(0xc00028a200, 0xc000055750, 0x1, 0x1)
	/home/travis/.gimme/versions/go1.12.6.linux.amd64/src/testing/testing.go:642 +0x79
github.com/open-telemetry/opentelemetry-service/receiver/opencensusreceiver/ocmetrics.TestExportProtocolViolations_nodelessFirstMessage.func1(0xc00002c230, 0xc00028a200, 0x77359400, 0xc0002f5a20)
	/home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/receiver/opencensusreceiver/ocmetrics/opencensus_test.go:194 +0x203
created by github.com/open-telemetry/opentelemetry-service/receiver/opencensusreceiver/ocmetrics.TestExportProtocolViolations_nodelessFirstMessage
	/home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/receiver/opencensusreceiver/ocmetrics/opencensus_test.go:189 +0x3fb
FAIL	github.com/open-telemetry/opentelemetry-service/receiver/opencensusreceiver/ocmetrics	0.201s
ok  	github.com/open-telemetry/opentelemetry-service/receiver/opencensusreceiver/octrace	1.453s	coverage: 75.9% of statements
ok  	github.com/open-telemetry/opentelemetry-service/receiver/prometheusreceiver	6.282s	coverage: 79.4% of statements
?   	github.com/open-telemetry/opentelemetry-service/receiver/vmmetricsreceiver	[no test files]
ok  	github.com/open-telemetry/opentelemetry-service/receiver/zipkinreceiver	1.253s	coverage: 57.4% of statements
ok  	github.com/open-telemetry/opentelemetry-service/receiver/zipkinreceiver/zipkinscribereceiver	1.030s	coverage: 90.6% of statements
ok  	github.com/open-telemetry/opentelemetry-service/translator/trace	1.016s	coverage: 100.0% of statements
ok  	github.com/open-telemetry/opentelemetry-service/translator/trace/jaeger	1.073s	coverage: 85.4% of statements
ok  	github.com/open-telemetry/opentelemetry-service/translator/trace/spandata	1.034s	coverage: 65.0% of statements
ok  	github.com/open-telemetry/opentelemetry-service/translator/trace/zipkin	1.038s	coverage: 88.0% of statements
make: *** [test-with-cover] Error 1
The command "make travis-ci" exited with 2.

Identify receivers, exporters and processors that will be part of the core.

Core OpenTelemetry Service will include only a subset of receivers and exporters that are currently supported by OpenCensus Service.

The rest of receivers and exporters will be moved to a separate contrib repo.

Here is the proposed list of receivers and exporters that will be supported by OpenTelemetry Service code:

  • Prometheus
  • Jaeger (agent and collector ones)
  • Zipkin
  • OpenCensus, temporarily until the OpenTelemetry one is available (we may want to keep OC for longer to facilitate migrations)

Add docker-compose for configV2

We have one under /demos/traces for the legacy config. It serves users both as an example and as a quick way to run tests. We should include metrics in the compose file too (with a Prometheus server).

Introduce end-to-end performance test bed

The goal is to create a test bed: a controlled environment and tools for conducting performance tests of the OpenTelemetry Agent, including reproducible short-term benchmarks, long-running stability tests and maximum load stress tests.

The test bed must allow specifying the configuration of the test and the load that needs to be generated, and must be able to run and produce machine- and human-readable results. It should be possible to run the test bed locally on a developer machine or in the cloud.
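
To make the inputs and outputs of a single run concrete, a test specification could look roughly like the struct below; every field name is a placeholder for this proposal, not an agreed API:

package testbed

// PerfTestSpec is an illustrative description of what one test-bed run needs:
// the agent configuration under test, the load to generate, and where the
// machine-readable results should land.
type PerfTestSpec struct {
	AgentConfigPath string // configuration of the agent under test
	SpansPerSecond  int    // load that the generator should produce
	DurationSeconds int    // how long to sustain the load
	ResultsDir      string // CSV/JSON results for machines, summarized in logs for humans
}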

Here is a more detailed proposal:
https://docs.google.com/document/d/1omU06mBYGY0slT1yojttn9BCyp18pHaRjkkYZrY8H4Q/edit#heading=h.9pln30mjg237

test: mem ballast test fails on Darwin

--- FAIL: TestApplication_MemBallast (1.01s)
panic: open /proc/11006/stat: no such file or directory [recovered]
        panic: open /proc/11006/stat: no such file or directory

goroutine 220 [running]:
testing.tRunner.func1(0xc0008fa000)
        /usr/local/Cellar/go/1.12.5/libexec/src/testing/testing.go:830 +0x69d
panic(0x37383c0, 0xc000270840)
        /usr/local/Cellar/go/1.12.5/libexec/src/runtime/panic.go:522 +0x1b5
github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector.testMemBallast(0xc0008fa000, 0xc0008a2100, 0x0)
        /Users/pj/go/src/github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector/collector_test.go:145 +0x8eb
github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector.TestApplication_MemBallast(0xc0008fa000)
        /Users/pj/go/src/github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector/collector_test.go:168 +0x97
testing.tRunner(0xc0008fa000, 0x3b31930)
        /usr/local/Cellar/go/1.12.5/libexec/src/testing/testing.go:865 +0x164
created by testing.(*T).Run
        /usr/local/Cellar/go/1.12.5/libexec/src/testing/testing.go:916 +0x65b
FAIL    github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector   2.174s
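
The panic comes from reading /proc/<pid>/stat, which exists only on Linux. A hedged sketch of a cross-platform alternative, assuming the gopsutil library (whether the project wants that dependency is a separate question):

package collector

import "github.com/shirou/gopsutil/process"

// residentSetBytes returns the RSS of the given process in bytes using
// gopsutil, which works on Linux and Darwin alike, instead of parsing
// /proc/<pid>/stat directly.
func residentSetBytes(pid int32) (uint64, error) {
	p, err := process.NewProcess(pid)
	if err != nil {
		return 0, err
	}
	mi, err := p.MemoryInfo()
	if err != nil {
		return 0, err
	}
	return mi.RSS, nil
}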

BUG: Data race in TestApplication_StartUnified

From this Travis run: https://travis-ci.org/open-telemetry/opentelemetry-service/builds/548531356?utm_source=github_status&utm_medium=notification

{"level":"info","ts":1561097714.798292,"caller":"collector/collector.go:174","msg":"Starting...","NumCPU":2}
{"level":"info","ts":1561097714.7983959,"caller":"collector/collector.go:93","msg":"Setting up profiler..."}
{"level":"info","ts":1561097714.7984593,"caller":"collector/collector.go:101","msg":"Setting up health checks..."}
{"level":"info","ts":1561097714.798795,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":43771,"status":"unavailable"}
{"level":"info","ts":1561097714.7989807,"caller":"collector/collector.go:110","msg":"Setting up zPages..."}
{"level":"info","ts":1561097714.7992342,"caller":"collector/collector.go:118","msg":"Running zPages","port":42581}
{"level":"info","ts":1561097715.8005574,"caller":"opencensus/receiver.go:51","msg":"OpenCensus receiver is running.","port":45320}
{"level":"info","ts":1561097715.8007278,"caller":"collector/collector.go:127","msg":"Setting up own telemetry..."}
{"level":"info","ts":1561097715.8020036,"caller":"collector/telemetry.go:93","msg":"Serving Prometheus metrics","port":35317}
{"level":"info","ts":1561097715.8023326,"caller":"collector/collector.go:137","msg":"Everything is ready. Begin running and processing data."}
{"level":"info","ts":1561097715.8031561,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1561097715.8068607,"caller":"collector/collector.go:157","msg":"Received stop test request"}
{"level":"info","ts":1561097715.8069491,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"unavailable"}
{"level":"info","ts":1561097715.8070078,"caller":"collector/collector.go:194","msg":"Starting shutdown..."}
{"level":"info","ts":1561097715.807472,"caller":"collector/collector.go:205","msg":"Shutdown complete."}
{"level":"info","ts":1561097715.8124511,"caller":"collector/collector.go:285","msg":"Starting...","NumCPU":2}
{"level":"info","ts":1561097715.8126116,"caller":"collector/collector.go:93","msg":"Setting up profiler..."}
{"level":"info","ts":1561097715.8127234,"caller":"collector/collector.go:101","msg":"Setting up health checks..."}
{"level":"info","ts":1561097715.8131378,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":37679,"status":"unavailable"}
{"level":"info","ts":1561097715.8132315,"caller":"collector/collector.go:110","msg":"Setting up zPages..."}
{"level":"info","ts":1561097715.8135908,"caller":"collector/collector.go:118","msg":"Running zPages","port":45743}
{"level":"info","ts":1561097715.8136768,"caller":"collector/collector.go:127","msg":"Setting up own telemetry..."}
{"level":"info","ts":1561097715.8148317,"caller":"collector/telemetry.go:93","msg":"Serving Prometheus metrics","port":46092}
{"level":"info","ts":1561097715.8150716,"caller":"collector/collector.go:232","msg":"Loading configuration..."}
{"level":"info","ts":1561097715.8159492,"caller":"collector/collector.go:240","msg":"Applying configuration..."}
{"level":"info","ts":1561097715.8763263,"caller":"builder/exporters_builder.go:199","msg":"Exporter is enabled.","exporter":"opencensus"}
{"level":"info","ts":1561097715.8764515,"caller":"builder/pipelines_builder.go:118","msg":"Pipeline is enabled.","pipelines":"traces"}
{"level":"info","ts":1561097715.8765345,"caller":"builder/receivers_builder.go:210","msg":"Receiver is enabled.","receiver":"jaeger","datatype":"traces"}
{"level":"info","ts":1561097715.876596,"caller":"collector/collector.go:264","msg":"Starting receivers..."}
{"level":"info","ts":1561097715.8766484,"caller":"builder/receivers_builder.go:91","msg":"Receiver is starting...","receiver":"jaeger"}
{"level":"info","ts":1561097715.878872,"caller":"builder/receivers_builder.go:96","msg":"Receiver is started.","receiver":"jaeger"}
{"level":"info","ts":1561097715.8789592,"caller":"collector/collector.go:137","msg":"Everything is ready. Begin running and processing data."}
{"level":"info","ts":1561097715.8790317,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1561097715.8865044,"caller":"collector/collector.go:157","msg":"Received stop test request"}
{"level":"info","ts":1561097715.8866715,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"unavailable"}
{"level":"info","ts":1561097715.8867686,"caller":"collector/collector.go:304","msg":"Starting shutdown..."}
{"level":"info","ts":1561097715.886851,"caller":"collector/collector.go:276","msg":"Stopping receivers..."}
==================
WARNING: DATA RACE
Write at 0x00c000139ae8 by goroutine 54:
  sync.(*WaitGroup).Wait()
      /home/travis/.gimme/versions/go1.12.6.linux.amd64/src/internal/race/race.go:41 +0xef
  github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).Stop()
      /home/travis/gopath/pkg/mod/github.com/jaegertracing/[email protected]/cmd/agent/app/processors/thrift_processor.go:102 +0x11d
Previous read at 0x00c000139ae8 by goroutine 107:
  sync.(*WaitGroup).Add()
      /home/travis/.gimme/versions/go1.12.6.linux.amd64/src/internal/race/race.go:37 +0x169
  github.com/jaegertracing/jaeger/cmd/agent/app/processors.(*ThriftProcessor).Serve()
      /home/travis/gopath/pkg/mod/github.com/jaegertracing/[email protected]/cmd/agent/app/processors/thrift_processor.go:84 +0x62
Goroutine 54 (running) created at:
  github.com/jaegertracing/jaeger/cmd/agent/app.(*Agent).Stop()
      /home/travis/gopath/pkg/mod/github.com/jaegertracing/[email protected]/cmd/agent/app/agent.go:88 +0xa1
  github.com/open-telemetry/opentelemetry-service/receiver/jaegerreceiver.(*jReceiver).stopTraceReceptionLocked.func1()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/receiver/jaegerreceiver/trace_receiver.go:208 +0x5ad
  sync.(*Once).Do()
      /home/travis/.gimme/versions/go1.12.6.linux.amd64/src/sync/once.go:44 +0xde
  github.com/open-telemetry/opentelemetry-service/receiver/jaegerreceiver.(*jReceiver).stopTraceReceptionLocked()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/receiver/jaegerreceiver/trace_receiver.go:204 +0xa0
  github.com/open-telemetry/opentelemetry-service/receiver/jaegerreceiver.(*jReceiver).StopTraceReception()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/receiver/jaegerreceiver/trace_receiver.go:199 +0x8e
  github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/builder.(*builtReceiver).Stop()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/builder/receivers_builder.go:42 +0x2f8
  github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/builder.Receivers.StopAll()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/builder/receivers_builder.go:84 +0xc7
  github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector.(*Application).shutdownPipelines()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector/collector.go:277 +0xb6
  github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector.(*Application).executeUnified()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector/collector.go:306 +0x30a
  github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector.(*Application).StartUnified.func1()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector/collector.go:322 +0x71
  github.com/spf13/cobra.(*Command).execute()
      /home/travis/gopath/pkg/mod/github.com/spf13/[email protected]/command.go:766 +0x8eb
  github.com/spf13/cobra.(*Command).ExecuteC()
      /home/travis/gopath/pkg/mod/github.com/spf13/[email protected]/command.go:852 +0x418
  github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector.(*Application).StartUnified()
      /home/travis/gopath/pkg/mod/github.com/spf13/[email protected]/command.go:800 +0x298
  github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector.TestApplication_StartUnified.func1()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector/collector_test.go:88 +0x78
Goroutine 107 (running) created at:
  github.com/jaegertracing/jaeger/cmd/agent/app.(*Agent).Run()
      /home/travis/gopath/pkg/mod/github.com/jaegertracing/[email protected]/cmd/agent/app/agent.go:75 +0x2bf
  github.com/open-telemetry/opentelemetry-service/receiver/jaegerreceiver.(*jReceiver).startAgent()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/receiver/jaegerreceiver/trace_receiver.go:335 +0x3cc
  github.com/open-telemetry/opentelemetry-service/receiver/jaegerreceiver.(*jReceiver).StartTraceReception.func1()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/receiver/jaegerreceiver/trace_receiver.go:180 +0x53
  sync.(*Once).Do()
      /home/travis/.gimme/versions/go1.12.6.linux.amd64/src/sync/once.go:44 +0xde
  github.com/open-telemetry/opentelemetry-service/receiver/jaegerreceiver.(*jReceiver).StartTraceReception()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/receiver/jaegerreceiver/trace_receiver.go:179 +0xe0
  github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/builder.(*builtReceiver).Start()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/builder/receivers_builder.go:62 +0x312
  github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/builder.Receivers.StartAll()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/builder/receivers_builder.go:93 +0x2c7
  github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector.(*Application).setupPipelines()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector/collector.go:265 +0x485
  github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector.(*Application).executeUnified()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector/collector.go:295 +0x26c
  github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector.(*Application).StartUnified.func1()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector/collector.go:322 +0x71
  github.com/spf13/cobra.(*Command).execute()
      /home/travis/gopath/pkg/mod/github.com/spf13/[email protected]/command.go:766 +0x8eb
  github.com/spf13/cobra.(*Command).ExecuteC()
      /home/travis/gopath/pkg/mod/github.com/spf13/[email protected]/command.go:852 +0x418
  github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector.(*Application).StartUnified()
      /home/travis/gopath/pkg/mod/github.com/spf13/[email protected]/command.go:800 +0x298
  github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector.TestApplication_StartUnified.func1()
      /home/travis/gopath/src/github.com/open-telemetry/opentelemetry-service/cmd/occollector/app/collector/collector_test.go:88 +0x78
==================
{"level":"info","ts":1561097715.897572,"caller":"collector/collector.go:311","msg":"Shutdown complete."}
--- FAIL: TestApplication_StartUnified (0.10s)
    testing.go:809: race detected during execution of test
FAIL
coverage: 57.1% of statements

Refactor queued-retry

The type has low test coverage and large functions. While refactoring it we should also check its performance.

Rename executable otelsvc_linux simply to otelsvc

The suffix looks superfluous. I don't see a reason to keep this in the final deliverable file name.

To make sure we are still able to build executables for multiple platforms, simply use a bin/$GOOS subdirectory for the build output.

documentation: add some vision notes to clarify relationships with FluentBit and Prometheus

This is a vision/documentation issue.

There are other log-forwarding and metrics-aggregation products on the market that some customers use as sidecars or collectors, namely FluentBit and Prometheus. The OpenTelemetry agent and collector have some unique capabilities compared to either of those solutions. It would be great to have a document explaining the relationship between those products and the OpenTelemetry Agent/Service, and giving end users recommendations on when to use what.

Bootstrap OpenTelemetry Service using OpenCensus Service as the base

Goals

We need to bootstrap the OpenTelemetry Service using the existing OpenCensus Service codebase. We previously discussed this with @bogdandrutu, @pjanotti and @songy23 and agreed to split the Service codebase into two parts: core and contrib. This bootstrapping is a good opportunity to do the splitting, by including only the minimum set of receivers and exporters in the OpenTelemetry Service core and moving the rest of the functionality (mostly vendor-specific code) to a contrib package.

The contrib package and vendor-specific receivers and exporters will continue to be available and there is no intent to retire them. The intent is to have a clear decoupling in the codebase that facilitates independent contribution of new components in the future, makes it easy to create customized versions of the Service, and makes it clear that core contributors will be responsible for maintaining the core while vendor-specific components will be maintained by the corresponding vendors (note: this does not exclude dual participation at all; some developers will likely work for vendors and also be core maintainers).

Proposed Approach

I suggest the following bootstrapping approach:

  1. Fork OpenCensus Service repo to OpenTelemetry org (or copy all commits - it is desirable to preserve the commit history).

  2. Identify receivers, exporters and processors that will be part of the core.

  3. Create a new OpenTelemetry Service binary in the same repo (otel_service). This will reuse existing functionality that is already in the codebase, but will only include receivers and exporters which we consider to be part of the core.

  4. The new otel_service codebase will contain improvements which are already in progress and which are aimed at making the codebase extensible and enable the splitting to core and contrib. This includes 3 initiatives:

    4.1 Decoupling of receiver and exporter implementations from the core logic.

    4.2 Introduction of receiver and exporter factories that can be individually registered to activate them (a rough sketch appears after this list).

    4.3 Implementation of the new configuration format that makes use of factories and allows for greater flexibility in the configuration.

    The functionality of the new otel_service will lean heavily on the existing implementation and will be mostly a superset of the current agent/collector functionality when considering core receivers and exporters only (however, we will allow deviations if they save significant implementation effort and make the service better).

  5. Provide guidelines and example implementations for vendors to follow when they add new receivers and exporters to the contrib package.

  6. With the help of vendors move remaining receivers and exporters to a contrib package (either to a separate repo or to a separate sub-directory in the main repo - TBD).

  7. Provide OpenCensus-to-OpenTelemetry Service migration guidelines for end-users who want to migrate. This will include recommendations on configuration file migration. We will also consider the possibility to support old configuration format in the new binary.

This approach allows us to make significant progress towards two stated goals in our vision document: unify the codebase for the agent and collector, and make the service more extensible.

I would be glad to hear everyone's opinion on this proposal and I am happy to take the lead on implementing the plan once it is accepted.

Logging Damper: avoid heavy logging when appropriate

The service needs to provide good observability to its users, but it needs to be careful not to overwhelm the system with logging/metrics/traces. In this regard, logging is typically the most problematic. Ideally we should provide a global solution instead of doing it piecemeal (the typical "log every x times or x seconds").
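
One possible global approach, assuming the service keeps using zap for logging: zap's built-in sampling drops repeated entries per time window, configured in one place rather than at each call site. A minimal sketch; the concrete numbers are placeholders:

package telemetry

import "go.uber.org/zap"

// newDampedLogger builds a production logger whose sampler keeps the first
// `Initial` duplicate entries each second and then only every `Thereafter`-th
// one, damping bursts of identical log lines.
func newDampedLogger() (*zap.Logger, error) {
	cfg := zap.NewProductionConfig()
	cfg.Sampling = &zap.SamplingConfig{
		Initial:    10,  // keep the first 10 duplicates each second
		Thereafter: 100, // afterwards keep only every 100th duplicate
	}
	return cfg.Build()
}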

make docker-otelsvc fails

GOOS=linux /Library/Developer/CommandLineTools/usr/bin/make otelsvc
GO111MODULE=on CGO_ENABLED=0 go build -o ./bin/otelsvc_linux -ldflags "-X github.com/open-telemetry/opentelemetry-service/internal/version.GitHash=3aac7c1 " ./cmd/otelsvc
cp ./bin/ocotelsvc_linux ./cmd/ocotelsvc/
cp: directory ./cmd/ocotelsvc does not exist
make[1]: *** [docker-component] Error 1
make: *** [docker-otelsvc] Error 2

cmd/occollector/Dockerfile and cmd/otelsvc/Dockerfile assume platform is always linux

Both of those files assume the build platform is always linux, but the Makefile under the main repo tries to detect it at runtime. So this will fail on, say, darwin.

Example:
FROM alpine:latest as certs
RUN apk --update add ca-certificates

FROM scratch
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY occollector_linux / . <---
ENTRYPOINT ["/occollector_linux"] <<-----
EXPOSE 55678

docker-component in Makefile fails on otelsvc

GO111MODULE=on CGO_ENABLED=0 go build -o ./bin/otelsvc_linux -ldflags "-X github.com/open-telemetry/opentelemetry-service/internal/version.GitHash=2a40f54 " ./cmd/otelsvc
cp ./bin/ocotelsvc_linux ./cmd/ocotelsvc/
cp: directory ./cmd/ocotelsvc does not exist

Because:

The docker-component target assumes all components have an "oc" prefix:

docker-otelsvc:
COMPONENT=otelsvc $(MAKE) docker-component <<--

Make use of "disabled" flag in configuration

configv2.LoadConfig currently ignores the value of the "disabled" flag for all entities. This must be changed. If "disabled=true", the corresponding entity should be unloaded as if it were not present in the configuration, and all references to it from pipelines should also be treated as not present.
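
A minimal sketch of the pruning step, with hypothetical config types standing in for the real configv2 structures:

package configv2

// Entity is a stand-in for a receiver/exporter/processor config entry; only
// the Disabled field matters for this sketch.
type Entity struct {
	Name     string
	Disabled bool
}

// pruneDisabled removes disabled entities and drops any pipeline references
// to them, as if they had never appeared in the configuration.
func pruneDisabled(entities map[string]Entity, pipelines map[string][]string) {
	for name, e := range entities {
		if !e.Disabled {
			continue
		}
		delete(entities, name)
		for p, refs := range pipelines {
			kept := refs[:0]
			for _, ref := range refs {
				if ref != name {
					kept = append(kept, ref)
				}
			}
			pipelines[p] = kept
		}
	}
}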

test: TestCreateReceiver failed on Darwin

Just received this one locally; note that I'm on Darwin:

--- FAIL: TestCreateReceiver (0.00s)
    factory_test.go:43: 
                Error Trace:    factory_test.go:43
                Error:          Expected nil, but got: &errors.errorString{s:"failed to create new VMMetricsCollector: could not read /proc: stat /proc: no such file or directory"}
                Test:           TestCreateReceiver
    factory_test.go:44: 
                Error Trace:    factory_test.go:44
                Error:          Expected value not to be nil.
                Test:           TestCreateReceiver

Fix data race in childProcess.GetResourceConsumption/watchResourceConsumption

=== RUN   TestIdleMode
2019/07/01 09:51:43 Starting Agent (/Users/pj/go/src/github.com/open-telemetry/opentelemetry-service/bin/otelsvc_darwin)
2019/07/01 09:51:43 Writing Agent log to /Users/pj/go/src/github.com/open-telemetry/opentelemetry-service/testbed/tests/results/TestIdleMode/agent.log
2019/07/01 09:51:43 Agent running, pid=50759
==================
WARNING: DATA RACE
Read at 0x00c0002c8618 by goroutine 12:
  github.com/open-telemetry/opentelemetry-service/testbed/testbed.(*childProcess).GetResourceConsumption()
      /Users/pj/go/src/github.com/open-telemetry/opentelemetry-service/testbed/testbed/child_process.go:354 +0x56
  github.com/open-telemetry/opentelemetry-service/testbed/testbed.(*TestCase).logStatsOnce()
      /Users/pj/go/src/github.com/open-telemetry/opentelemetry-service/testbed/testbed/test_case.go:254 +0x56
  github.com/open-telemetry/opentelemetry-service/testbed/testbed.(*TestCase).logStats()
      /Users/pj/go/src/github.com/open-telemetry/opentelemetry-service/testbed/testbed/test_case.go:245 +0x8e

Previous write at 0x00c0002c8618 by goroutine 15:
  github.com/open-telemetry/opentelemetry-service/testbed/testbed.(*childProcess).watchResourceConsumption()
      /Users/pj/go/src/github.com/open-telemetry/opentelemetry-service/testbed/testbed/child_process.go:240 +0x6c
  github.com/open-telemetry/opentelemetry-service/testbed/testbed.(*TestCase).StartAgent.func1()
      /Users/pj/go/src/github.com/open-telemetry/opentelemetry-service/testbed/testbed/test_case.go:144 +0x47

Goroutine 12 (running) created at:
  github.com/open-telemetry/opentelemetry-service/testbed/testbed.NewTestCase()
      /Users/pj/go/src/github.com/open-telemetry/opentelemetry-service/testbed/testbed/test_case.go:97 +0x709
  github.com/open-telemetry/opentelemetry-service/testbed/tests.TestIdleMode()
      /Users/pj/go/src/github.com/open-telemetry/opentelemetry-service/testbed/tests/perf_test.go:56 +0x50
  testing.tRunner()
      /usr/local/Cellar/go/1.12.5/libexec/src/testing/testing.go:865 +0x163

Goroutine 15 (running) created at:
  github.com/open-telemetry/opentelemetry-service/testbed/testbed.(*TestCase).StartAgent()
      /Users/pj/go/src/github.com/open-telemetry/opentelemetry-service/testbed/testbed/test_case.go:143 +0x2fd
  github.com/open-telemetry/opentelemetry-service/testbed/tests.TestIdleMode()
      /Users/pj/go/src/github.com/open-telemetry/opentelemetry-service/testbed/tests/perf_test.go:62 +0xc3
  testing.tRunner()
      /usr/local/Cellar/go/1.12.5/libexec/src/testing/testing.go:865 +0x163
==================
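
The race is a classic unsynchronized read/write on the same struct fields from two goroutines (the stats logger versus the watcher). A generic sketch of guarding such fields with a mutex; the struct and fields here are placeholders, not the actual childProcess type:

package testbed

import "sync"

// resourceStats stands in for the fields that GetResourceConsumption reads
// while watchResourceConsumption writes them from another goroutine.
type resourceStats struct {
	mu     sync.Mutex
	cpuPct float64
	ramMiB uint32
}

// update is called from the watcher goroutine.
func (s *resourceStats) update(cpuPct float64, ramMiB uint32) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.cpuPct, s.ramMiB = cpuPct, ramMiB
}

// snapshot is called from the logging goroutine.
func (s *resourceStats) snapshot() (float64, uint32) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.cpuPct, s.ramMiB
}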
