open-telemetry / opentelemetry-specification


Specifications for OpenTelemetry

Home Page: https://opentelemetry.io

License: Apache License 2.0

Makefile 69.37% Go 27.62% Roff 3.00%

opentelemetry

opentelemetry-specification's Introduction

OpenTelemetry Specification

The OpenTelemetry specification describes the cross-language requirements and expectations for all OpenTelemetry implementations.

The latest release is hosted at opentelemetry.io/docs/specs/otel
Markdown sources are under specification

Change / contribution process

For details, see CONTRIBUTING.md; in particular, read "Proposing a change" before submitting a PR.

Questions

Questions that need additional attention can be brought to the regular specifications meeting. An EU- and US-timezone-friendly meeting is held every Tuesday at 8 AM Pacific time. Meeting notes are kept in the Google doc. APAC-timezone-friendly meetings are held on request. See the OpenTelemetry calendar.

Escalations to the technical committee may be made by e-mail. The technical committee holds regular meetings; notes are held here.

Specification compliance matrix by language

See Compliance of Implementations with Specification.

Project Timeline

The current project status as well as information on notable past releases is found at the OpenTelemetry project page.

Information about current work and future development plans is found at the specification development milestones.

Versioning the Specification

Changes to the specification are versioned according to Semantic Versioning 2.0 and described in CHANGELOG.md. Layout changes are not versioned. Specific implementations of the specification should specify which version they implement.

Changes to the change process itself are not currently versioned but may be independently versioned in the future.

License

By contributing to the OpenTelemetry Specification repository, you agree that your contributions will be licensed under its Apache 2.0 License.

opentelemetry-specification's People


yurishkuro bogdandrutu iredelmeier dynatrace-oss-contrib c24t sergeykanzhelev danielkhan tigrannajaryan songy23 jmacd mayurkale22 rghetia chaitanyaphalak saiya pavolloffay mukteshkrmishra etsangsplk carlosalberto luvtechno thomashchan1 jeje22 deepakmohanakrishnan07 iamrare honeycombio toumorokoshi freeformz ihuangyaoshi reyang kinvolk bg451 awesomegolang lzchen tsloughter hp685 hekike falenn pinghe mwear austinlparker sirianni peteraritchie devon-ye cloud-land chakra-coder bryannaegele dyladan nenad jriest jkwatson oberon00 bradstewartco billibit lmolkova naiduarvind hlcfan congim gsiteiovn mralias ekkarat-w-gmail-com defcon201 fbogsany nicolasrouquette huikang pcwiese xp10102232 ghadishayban jesesun kimmking dystudio untitaker ferhatelmas discostu105 damon1211 ptescher kbrockhoff ocelotl tukejonny thisthat doytsujin mderazon duonoid ethercrow willingc aviat codeboten newrelic-forks tompaana mikegoldsmith vmihailenco vittalgit gjtempleton codeblanch prydin davebarda tylerbenson martin-macak romaninozemtsev lukeenterprise jtescher murphpdx

opentelemetry-specification's Issues

Proposal: Tracer Components

Proposal: Instead of a global tracer singleton, every library needs to create its own tracer, which also has a proper name that identifies the library/component that uses the tracer.

Motivation: "Configurability of library/component instrumentation"

Identify the library/component that created a span
Be able to turn the instrumentation of a certain library/component on/off. This might be necessary if the instrumentation
- causes too much overhead
- is buggy (e.g. does not call span.end in every case)
- would cause "double-tracing". E.g. a tracing vendor (e.g. Dynatrace) might already have built-in auto-instrumentation for a certain library. In this case the vendor might want to decide (automatically or config-based) which of the implementations should be enabled/disabled.

Technically, this could be achieved similarly to how logging frameworks deal with Loggers and LoggerFactories (log4j).

Tracer tracer = OpenConsensus.Tracing.getTracer("io.opensomething.contrib.mongo");

Every span would contain the information about the Tracer that it was created with. Based on that a span can be discarded or not.

Not sure what a naming convention could look like. Not sure if "component" is the right word, as it has already been used in the past as an attribute name. At Dynatrace we call that a "Sensor". Not sure about the granularity of "Tracers". Should every class have one? Should Tracing act as a registry (cache already created instances) or as a factory (just create new ones at every call)?

This is kind of related to the GlobalTracer discussion (#107).
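
To make the registry option concrete, here is a minimal sketch of a name-keyed tracer registry with a per-component on/off switch. The class names NamedTracer and TracerRegistry, and the methods on them, are hypothetical and only illustrate the idea; they are not taken from any published API.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical named tracer; not part of any published API. */
final class NamedTracer {
    private final String name;
    private final Set<String> disabled; // shared view owned by the registry

    NamedTracer(String name, Set<String> disabled) {
        this.name = name;
        this.disabled = disabled;
    }

    String name() {
        return name;
    }

    /** A span from a disabled component can be dropped before any work is done. */
    boolean isEnabled() {
        return !disabled.contains(name);
    }
}

/** Hypothetical registry: caches one tracer per component name, like a LoggerFactory. */
final class TracerRegistry {
    private final Map<String, NamedTracer> tracers = new ConcurrentHashMap<>();
    private final Set<String> disabled = ConcurrentHashMap.newKeySet();

    NamedTracer getTracer(String name) {
        // Registry behavior: the same name always yields the same instance.
        return tracers.computeIfAbsent(name, n -> new NamedTracer(n, disabled));
    }

    void disable(String name) {
        disabled.add(name);
    }

    void enable(String name) {
        disabled.remove(name);
    }
}
```

With this shape, a vendor that already instruments a library could call disable("io.opensomething.contrib.mongo") once, and every tracer handed out under that name would report isEnabled() as false. A factory variant would simply skip the cache and create a new NamedTracer on each call.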

Document Java API: Resources API

The Java repo was used to combine APIs from OpenTracing and OpenCensus.
Most of the APIs are ready to be documented in the form of a cross-language specification.

Resources API needs to be described

What should we call the Tags package?

Current Issue:

Tags in OpenTracing currently maps to Attributes in OpenTelemetry.
Tags in OpenCensus are also slightly different: no getters, and they are only available for metrics.

There is a fair amount of evidence that using the name Tags to describe this package is confusing to existing users of the previous projects.

Potential names:

Baggage
Labels
???
Live with Tags and confuse some people...

FAQ: What is the difference between Tags with TTL=0 and Context?

If you put a value in Tags, that value is available for OpenTelemetry to consume in metrics, traces, and logs. Context is purely for propagating application information within a single process: Context information not placed directly in Tags is not visible to the telemetry system.

Conclusion:

Currently, there is a preference for calling this Baggage.

Users may want to propagate baggage for reasons beyond just labelling telemetry data. For example, feature flagging in a distributed system.
The term Baggage helps indicate to users that they are bloating their RPC calls with this information, so they should be careful about what they put in there. The costs associated with using this mechanism are a very important thing for users to understand.

We are planning to go with Baggage for now. Please post questions and comments if you feel this name is not accurate.

Consider renaming this to opentelemetry-specs?

Pros: Shorter

Terminology: describe adapters/instrumentation/auto-collectors

Describe the concept of adapters

SDK: toSpanData is needed on Span class and it may be called multiple times even before Span was completed

As we discussed in census-instrumentation/opencensus-csharp#134, there are scenarios that require reading properties from incomplete Spans. This is needed, for instance, when one wants to put Span properties into the tracestate.

Some way of looking inside the Span may need to be implemented at the API level.

API vs SDK: This method should be implemented in the same layer as the callback that will use it. E.g. if the callback lives in the API, the hook cannot rely on safely casting to the SDK's Span object and should therefore operate on the API's Span interface. If the hook is implemented in the SDK, then it's OK to have this method only on the SDK's SpanSDK class.

CC: @discostu105

API: add a note that Span is an interface and what are the expectations for alternative implementation

The current section about the Span doesn't say anything about the fact that it is an interface that may have an alternative implementation.

Exposing tracer property on spans

From open-telemetry/opentelemetry-js#5

Span has a private tracer property; NoRecordSpan doesn't. I'd prefer to make tracer always available on Span, as in OpenTracing. Coming from OpenTracing, I found it challenging not to have tracer available on spans, especially since I first assumed it was and started to use it, and then my code broke on NoRecordSpan(s).

@hekike, @rochdev

Terminology: describe DistributedContext terminology

Describe the tags concepts at a high level

Canonical type field for spans

See discussion: #14 (comment)

The proposal is to have a strongly-typed field on a span that will define its type and, ultimately, the set of attributes one should expect to see on it and interpret.

The alternative is to have a predefined attribute name for this type.

This change should be driven by scenarios we plan to enable in OpenTelemetry that will use this field. For instance, a strongly typed field can be justified by a scenario like metrics extraction from the specific span types.

Terminology: Describe Agent/Collector

Can be taken from here: open-telemetry/opentelemetry-collector#9

RFC: Structural definition of trace events & distinct structures for all data points

First let me say I'm happy to see the combined efforts here; I'm a strong believer in the value added by this project's goals. The goal of this post is to get some clarification on some of the project's goals and design decisions. I have a lot of opinions in this area, but for the sake of productive discussion I want to focus on two key areas:

Defining a formal structural specification for trace events.
Declaring distinct events for things that require nesting child objects or holding buffers for any period of time.

The initial rough draft that I used to start the conversation in Gitter can be found in this gist; it covers the two points above in more detail. I'll be happy to fill this issue out with a more intelligible digest of the gist on request or once I have a better understanding of the project's direction.

Thanks a lot for the efforts being started here.

Package structure common description

There should be some common description of the package structure. See open-telemetry/opentelemetry-js#6 for example of package structure discussion

semantic_convention: remoting client attributes

Document Java API: Metrics API

The Java repo was used to combine APIs from OpenTracing and OpenCensus.
Most of the APIs are ready to be documented in the form of a cross-language specification.

Metrics API needs to be described

The name "IsRecordingEvents" is misleading as it represents recording of any information, not just Events

The semantics of IsRecordingEvents are not only about recording events but also about setting attributes.

Should it be called IsActive?

API: Span - should it provide `HasEnded` property

I can imagine scenarios where an async operation may want to behave differently depending on whether the Span that started it has ended or not. For instance, instead of adding an Event to the Span, it could report a separate Span, or simply log the information.

I hit this scenario in the following case:

The Redis library in .NET allows associating a "session" with the current span, and there is a way to get all the Redis operations that happened in the "session". I need an indication that the "session" has ended, and span.HasEnded is a great way to check it.

Basically, any logic that caches Span objects and then does some work when a cached Span ends would need this method.

API vs SDK: I need this in a data collection module, so it would be great to have this method as part of the API.
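
The async scenario above could look roughly like the following sketch. SketchSpan and AsyncReporter are hypothetical names, the Java-style hasEnded() stands in for the proposed HasEnded property, and the code is single-threaded for brevity.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical span with a hasEnded() accessor; not thread-safe, for illustration only. */
final class SketchSpan {
    private final List<String> events = new ArrayList<>();
    private boolean ended;

    void addEvent(String name) {
        // Events added after end() are silently ignored in this sketch.
        if (!ended) {
            events.add(name);
        }
    }

    void end() {
        ended = true;
    }

    boolean hasEnded() {
        return ended;
    }

    List<String> events() {
        return events;
    }
}

/** Hypothetical helper that picks a reporting strategy based on hasEnded(). */
final class AsyncReporter {
    private final List<String> fallbackLog = new ArrayList<>();

    void report(SketchSpan span, String event) {
        if (span.hasEnded()) {
            // The span is closed: record the information somewhere else.
            fallbackLog.add(event);
        } else {
            span.addEvent(event);
        }
    }

    List<String> fallbackLog() {
        return fallbackLog;
    }
}
```

Without the accessor, the reporter would have no way to tell that its event was dropped, which is the gap the issue describes.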

API: Fully separate context propagation

Ideally, both in-process and inter-process propagation should be a standalone module that can operate without the involvement of the Tracer.

Placeholder for more discussion.

Metadata alignment

Align on https://github.com/opentracing/specification/blob/master/semantic_conventions.md vs.

https://github.com/census-instrumentation/opencensus-specs/blob/master/trace/HTTP.md
https://github.com/census-instrumentation/opencensus-specs/blob/master/trace/gRPC.md
https://github.com/census-instrumentation/opencensus-specs/blob/master/resource/StandardResources.md

Is custom random generation a requirement for SDK or not?

See open-telemetry/opentelemetry-java#325.

The random generation of the Trace/Span ID will live in the SDK. The main reason is that some backends need the TraceID in a specific format (Amazon X-Ray, for example); ID generation is also an implementation detail.

The question is whether it is a requirement for SDK to allow custom ID generators.
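
If custom generators were allowed, the extension point might look like the sketch below. The interface TraceIdGenerator and both implementations are hypothetical. The time-prefixed variant only illustrates the kind of backend-specific layout the issue alludes to; it does not claim to reproduce the exact Amazon X-Ray format.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical SDK extension point; the name and shape are assumptions. */
interface TraceIdGenerator {
    /** Returns a trace id as 32 lowercase hex characters (128 bits). */
    String generateTraceId();
}

/** Default behavior in this sketch: 128 random bits. */
final class RandomTraceIdGenerator implements TraceIdGenerator {
    @Override
    public String generateTraceId() {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        return String.format("%016x%016x", random.nextLong(), random.nextLong());
    }
}

/** Custom layout: the first 8 hex characters encode epoch seconds, the rest is random. */
final class TimePrefixedTraceIdGenerator implements TraceIdGenerator {
    @Override
    public String generateTraceId() {
        long epochSeconds = System.currentTimeMillis() / 1000L;
        ThreadLocalRandom random = ThreadLocalRandom.current();
        // 32 bits of time + 32 random bits + 64 random bits = 128 bits.
        return String.format(
            "%08x%08x%016x",
            epochSeconds & 0xFFFFFFFFL,
            random.nextInt() & 0xFFFFFFFFL,
            random.nextLong());
    }
}
```

Neither generator guards against the all-zero ID, which is astronomically unlikely but would need handling in a real SDK.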

Document Java API: Tracing operations API

The Java repo was used to combine APIs from OpenTracing and OpenCensus.
Most of the APIs are ready to be documented in the form of a cross-language specification.

Tracing API needs to be described

Terminology: describe resources concept

Describe high-level resources terminology

API: Describe Status class

Describe Status class and canonical codes

Add whitespace (and other markdown-specific) linter for PRs

Document Java API: DistributedContext API needs to be described

The Java repo was used to combine APIs from OpenTracing and OpenCensus.
Most of the APIs are ready to be documented in the form of a cross-language specification.

Tags API needs to be described

Sync and Async children (FOLLOWS_FROM)

In OpenTracing, we have CHILD_OF and FOLLOWS_FROM. In the new project, we are considering whether to include this concept as a flag on the SpanBuilder option when setting the span parent. The new naming is proposed to be sync and async children, to make the relationship more clear.

Reference PR: open-telemetry/opentelemetry-java#130

Questions:

Do we still want this at all? It can be useful for critical path and other types of trace analysis.
Do we also need an unknown flag?

Document Java API: SpanBuilder API

The Java repo was used to combine APIs from OpenTracing and OpenCensus.
Most of the APIs are ready to be documented in the form of a cross-language specification.

SpanBuilder and the ways to construct a Span need to be described

Document that default sampler is 100% sampler

See open-telemetry/opentelemetry-java#347

API: Raw vs. other metrics / measurements are unclear

Re-posting questions from open-telemetry/opentelemetry-java#226 (comment)

in many other metrics APIs that allow recording gauges, timers, and/or histograms, the recording API is still about recording the raw measure, but behind the scene the library may or may not perform some form of aggregation, like LastValue for gauges, or Histogram for timers. So the distinction between raw measure and Gauge/Histogram is not clear to me, since from instrumentation point of view it's always recording the raw value.
on the other hand, counters seem to be fundamentally different and can never be captured via raw measure, is that correct? This breaks the mental model for me.
being able to change aggregation without touching instrumentation is great - I often wanted to see something like queue_length not as a gauge but as a histogram (incidentally tally didn't support histograms for non-timer values)
application of request-scoped tags is also very tricky. As @bogdandrutu said, there are some stats that don't make sense to be labeled with request-scoped tags, e.g. CPU utilization. For measures where we do want to apply context tags, the ability to define views also provides a place to define context tag rules, potentially on a per-measure basis, e.g. "allow tags A and B for measureX, but only tag B for measure Y" (however, see next item).
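
The point about changing aggregation without touching instrumentation can be illustrated with a small sketch. All names here (Aggregation, LastValue, BucketHistogram, Measure) are hypothetical and not taken from the Java repo; the instrumented code only ever records a raw value, and the attached views decide how it is aggregated.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical aggregation hook: receives every raw measurement. */
interface Aggregation {
    void record(double value);
}

/** Gauge-like view: keeps only the most recent value. */
final class LastValue implements Aggregation {
    private double last = Double.NaN;

    @Override
    public void record(double value) {
        last = value;
    }

    double value() {
        return last;
    }
}

/** Histogram view with ascending upper bounds plus one overflow bucket. */
final class BucketHistogram implements Aggregation {
    private final double[] bounds;
    private final long[] counts;

    BucketHistogram(double... bounds) {
        this.bounds = bounds.clone();
        this.counts = new long[bounds.length + 1];
    }

    @Override
    public void record(double value) {
        int i = 0;
        while (i < bounds.length && value > bounds[i]) {
            i++;
        }
        counts[i]++;
    }

    long[] counts() {
        return counts.clone();
    }
}

/** Instrumentation records raw values; views are configured elsewhere. */
final class Measure {
    private final List<Aggregation> views = new ArrayList<>();

    void addView(Aggregation aggregation) {
        views.add(aggregation);
    }

    void record(double rawValue) {
        for (Aggregation view : views) {
            view.record(rawValue);
        }
    }
}
```

Under this model, seeing queue_length as a histogram instead of a gauge means attaching a BucketHistogram view; the record call in the instrumented code stays the same. Whether counters fit the same model is exactly the open question raised above, and the sketch does not answer it.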

API: describe SpanContext

Describe SpanContext properties

SDK: document SpanProcessor API

Document decisions made in Java repository: document processor API

Documentation for new interfaces

Includes at least docs for:

New APIs
Changes from OC/OT
Rationale

Interface of binary injector depends on whether we will have a single or two fields for trace context

The current implementation of binary context inject/extract expects that the entire SpanContext can be encoded in a single field:

https://github.com/bogdandrutu/openconsensus/blob/42f90bf8196cc1e3913510160cdfdc3995a7119d/api/src/main/java/openconsensus/context/propagation/BinaryFormat.java#L26-L33

However, the current proposal lists traceparent and tracestate as separate headers. See w3c/trace-context-binary#7 for the proposal to change it.

Document Java API: Tracer construction

The Java repo was used to combine APIs from OpenTracing and OpenCensus.
Most of the APIs are ready to be documented in the form of a cross-language specification.

Tracing class construction needs to be described. Tracer for specific library. Global Tracer, etc.

Proposal: Probability sampling algorithm.

Originates from open-telemetry/opentelemetry-java#337 (comment).

In Java we currently use:

We assume the lower 64 bits of the traceId's are randomly distributed around the whole (long)
range. We convert an incoming probability into an upper bound on that value, such that we can
just compare the absolute value of the id and the bound to see if we are within the desired
probability range. Using the low bits of the traceId also ensures that systems that only use 64
bit ID's will also work with this sampler.
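
The comparison described above can be sketched as follows. This is an illustration of the described approach, not the code from the Java repo; the class name and the explicit handling of the two extremes are assumptions.

```java
/** Sketch of a probability sampler keyed on the low 64 bits of the trace id. */
final class ProbabilitySampler {
    private final double probability;
    private final long idUpperBound;

    ProbabilitySampler(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0.0, 1.0]");
        }
        this.probability = probability;
        // Scale the probability onto the non-negative long range.
        this.idUpperBound = (long) (probability * Long.MAX_VALUE);
    }

    /** Decides from the lower 64 bits of the trace id, assumed uniformly random. */
    boolean shouldSample(long traceIdLow) {
        // The extremes are handled explicitly because the comparison below
        // cannot express "always" and "never" exactly.
        if (probability >= 1.0) {
            return true;
        }
        if (probability <= 0.0) {
            return false;
        }
        // Math.abs(Long.MIN_VALUE) stays negative, so that single id is always
        // sampled; the resulting bias is negligible for a sketch.
        return Math.abs(traceIdLow) < idUpperBound;
    }
}
```

Because the decision depends only on the trace ID, every service that sees the same trace and uses the same probability reaches the same decision without any coordination.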

how to call auto-collection modules

Modules that can automatically collect telemetry from various technologies need a common name that is consistent across the projects. What should they be called?

Plugins? Instrumentation? AutoCollector? Other?

semantic_convention: GRPC attributes

Define the list of built-in propagation formats

Define the list of propagation formats that OpenTelemetry will officially support. Proposal:
- W3C for trace https://github.com/w3c/trace-context
- W3C for tags/baggage https://github.com/w3c/correlation-context
- B3, do we commit to support this?
- Jaeger, do we commit to support this?
Where does the implementation for each format live?
- In API?
- In SDK?
- In contrib?
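
As an illustration of what a built-in format involves, here is a minimal sketch of inject/extract for the W3C traceparent header (version 00), wherever it ends up living. The class names SketchSpanContext and TraceParentPropagator and the Map-based carrier are assumptions; tracestate handling and the check for all-zero IDs are left out.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal span context for the sketch; field names are assumptions. */
final class SketchSpanContext {
    final String traceId; // 32 lowercase hex characters
    final String spanId;  // 16 lowercase hex characters
    final boolean sampled;

    SketchSpanContext(String traceId, String spanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.sampled = sampled;
    }
}

/** Hypothetical propagator for the traceparent header, version 00 only. */
final class TraceParentPropagator {
    static final String HEADER = "traceparent";

    // version "00", then trace-id, parent-id and trace-flags, all lowercase hex.
    private static final Pattern FORMAT =
        Pattern.compile("^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$");

    private TraceParentPropagator() {}

    static void inject(SketchSpanContext context, Map<String, String> carrier) {
        String flags = context.sampled ? "01" : "00";
        carrier.put(HEADER, "00-" + context.traceId + "-" + context.spanId + "-" + flags);
    }

    /** Returns null when the header is missing or malformed. */
    static SketchSpanContext extract(Map<String, String> carrier) {
        String value = carrier.get(HEADER);
        if (value == null) {
            return null;
        }
        Matcher matcher = FORMAT.matcher(value);
        if (!matcher.matches()) {
            return null;
        }
        int flags = Integer.parseInt(matcher.group(3), 16);
        // Bit 0 of trace-flags is the sampled flag.
        return new SketchSpanContext(matcher.group(1), matcher.group(2), (flags & 0x01) != 0);
    }
}
```

B3 or Jaeger support would be further implementations of the same inject/extract pair, which is why the question of where each one lives (API, SDK, or contrib) can be decided per format.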

Terminology: describe metrics terminology

A high-level description of metrics terminology.

Initial spec from java client

Create a spec for the opentelemetry API and SDK in the style of opencensus-specs.

The spec should describe the opentelemetry java client as it's implemented now. Where the API isn't well defined (as in issues labeled "Agreement Required"), the spec should note this.

This spec isn't meant to be authoritative yet; it's just meant to help the authors of other opentelemetry clients as we finalize the API.

Clarify span end behavior: should it close all the children

See discussion at open-telemetry/opentelemetry-js#4

Rate limiting/back pressure sampler as a default

See discussion here open-telemetry/opentelemetry-java#347

Document Java API: describe SpanData class

The Java repo was used to combine APIs from OpenTracing and OpenCensus. Most of the APIs are ready to be documented in the form of a cross-language specification.

SpanData class and its operations need to be described

Extend semantic conventions for RPC

API: Sampling `Decision` may need to update tracestate, not only attributes

API: describe sampler API

Document decisions made in Java repository: describe sampler API

SDK: Resources SDK

Explain how the resources API is extended by SDK. Specifically:

how resources can be associated with Span Protos
how resources can be populated from environment variables

Move this document https://github.com/open-telemetry/opentelemetry-specification/blob/master/work_in_progress/specification/resource/Resource.md to the root folder

Define the list of standard exporters that SDK should support

Proposal for trace:

OpenTelemetry exporter (decide if HTTP vs gRPC).
Jaeger exporter (decide if the new gRPC or Thrift).
Zipkin exporter (probably v2)

Proposal for metrics:

OpenTelemetry exporter (decide if HTTP vs gRPC).
Prometheus exporter.

Another question is, do we want to have these standard exporters in the SDK artifact/package?

semantic_convention: HTTP attributes

Span.Kind.LOAD_BALANCER (PROXY/SIDECAR)

In some cases, sidecar/load_balancer implementations will generate only one span between a client span and a server span. Do we want to have a kind that describes this case, so that backends will know this is an RPC like client -> proxy -> server?
