sketches-go's Introduction

sketches-go

This repo contains Go implementations of the distributed quantile sketch algorithm DDSketch [1]. DDSketch has relative-error guarantees for any quantile q in [0, 1]. That is, if the true value of the q-quantile is x, then DDSketch returns a value y such that |x-y| / x < e, where e is the relative error parameter. DDSketch is also fully mergeable, meaning that multiple sketches from distributed systems can be combined in a central node.
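The relative-error guarantee can be illustrated with the logarithmic bucket mapping described in [1]. The following is a minimal, standard-library-only sketch for positive values, not the code of this package; the function names gamma, index, value and relErr are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// gamma returns the bucket growth factor for relative accuracy alpha.
func gamma(alpha float64) float64 {
	return (1 + alpha) / (1 - alpha)
}

// index maps a positive value x to its bucket: ceil(log_gamma(x)).
// Bucket i covers the interval (gamma^(i-1), gamma^i].
func index(x, alpha float64) int {
	return int(math.Ceil(math.Log(x) / math.Log(gamma(alpha))))
}

// value returns the representative of bucket i: 2*gamma^i / (gamma+1).
func value(i int, alpha float64) float64 {
	g := gamma(alpha)
	return 2 * math.Pow(g, float64(i)) / (g + 1)
}

// relErr returns |x-y|/x, where y is the representative of x's bucket.
func relErr(x, alpha float64) float64 {
	y := value(index(x, alpha), alpha)
	return math.Abs(x-y) / x
}

func main() {
	alpha := 0.01
	for _, x := range []float64{0.003, 1, 42.5, 1e6} {
		fmt.Printf("x=%g relative error=%.5f\n", x, relErr(x, alpha))
	}
}
```

Every value that falls in a bucket is reported as that bucket's representative, so the error stays within alpha up to floating-point rounding, whatever the magnitude of x.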

Our default implementation, returned from NewDefaultDDSketch(relativeAccuracy), is guaranteed [1] not to grow too large in size for any data that can be described by a distribution whose tails are sub-exponential.

We also provide implementations, returned by LogCollapsingLowestDenseDDSketch(relativeAccuracy, maxNumBins) and LogCollapsingHighestDenseDDSketch(relativeAccuracy, maxNumBins), where the q-quantile will be accurate up to the specified relative error for q that is not too small (or large). Concretely, the q-quantile will be accurate up to the specified relative error as long as it belongs to one of the maxNumBins bins kept by the sketch. For instance, if the values are time in seconds, maxNumBins = 2048 covers a time range from 80 microseconds to 1 year.
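The idea of collapsing the lowest bins can be sketched as follows. This is a toy over a map of bin counts, assuming a hypothetical helper collapseLowest; the stores in this package are dense arrays and work differently internally.

```go
package main

import (
	"fmt"
	"sort"
)

// collapseLowest folds the lowest-indexed bins into the lowest surviving bin
// until at most maxNumBins bins remain. The total count is preserved; only
// the quantiles that fell into the folded bins lose their accuracy guarantee.
func collapseLowest(bins map[int]float64, maxNumBins int) {
	if maxNumBins < 1 || len(bins) <= maxNumBins {
		return
	}
	keys := make([]int, 0, len(bins))
	for k := range bins {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	excess := len(keys) - maxNumBins
	target := keys[excess] // lowest bin that is kept
	for _, k := range keys[:excess] {
		bins[target] += bins[k]
		delete(bins, k)
	}
}

func main() {
	bins := map[int]float64{-3: 1, 0: 2, 5: 4, 9: 1}
	collapseLowest(bins, 2)
	fmt.Println(len(bins), bins[5], bins[9])
}
```

Collapsing the highest bins instead is symmetric: fold the top of the sorted key list into the highest surviving bin.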

Usage

import "github.com/DataDog/sketches-go/ddsketch"

relativeAccuracy := 0.01
// NewDefaultDDSketch returns the sketch and an error.
sketch, err := ddsketch.NewDefaultDDSketch(relativeAccuracy)

Add values to the sketch.

import "math/rand"

for i := 0; i < 500; i++ {
  v := rand.NormFloat64()
  sketch.Add(v)
}

Find the quantiles to within relativeAccuracy relative error.

qs := []float64{0.5, 0.75, 0.9, 1}
quantiles, err := sketch.GetValuesAtQuantiles(qs)
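How a quantile can be read back from bin counts is sketched below, following the rank rule q*(n-1) from [1]. It is a standard-library-only toy for positive values; quantile and the other names are illustrative and are not this package's API.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"sort"
)

func gamma(alpha float64) float64 { return (1 + alpha) / (1 - alpha) }

// index maps a positive value to its bin: ceil(log_gamma(x)).
func index(x, alpha float64) int {
	return int(math.Ceil(math.Log(x) / math.Log(gamma(alpha))))
}

// value returns the representative of bin i: 2*gamma^i / (gamma+1).
func value(i int, alpha float64) float64 {
	g := gamma(alpha)
	return 2 * math.Pow(g, float64(i)) / (g + 1)
}

// quantile walks the bins in index order and returns the representative of
// the first bin whose cumulative count exceeds the rank q*(n-1).
func quantile(bins map[int]float64, q, alpha float64) (float64, error) {
	if q < 0 || q > 1 {
		return 0, errors.New("q must be in [0, 1]")
	}
	keys := make([]int, 0, len(bins))
	total := 0.0
	for k, c := range bins {
		keys = append(keys, k)
		total += c
	}
	if total == 0 {
		return 0, errors.New("empty sketch")
	}
	sort.Ints(keys)
	rank := q * (total - 1)
	cum := 0.0
	for _, k := range keys {
		cum += bins[k]
		if cum > rank {
			return value(k, alpha), nil
		}
	}
	return value(keys[len(keys)-1], alpha), nil
}

func main() {
	alpha := 0.01
	bins := map[int]float64{}
	for v := 1; v <= 100; v++ {
		bins[index(float64(v), alpha)]++
	}
	median, err := quantile(bins, 0.5, alpha)
	fmt.Println(median, err)
}
```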

Merge another DDSketch into sketch.

anotherSketch, err := ddsketch.NewDefaultDDSketch(relativeAccuracy)
for i := 0; i < 500; i++ {
  v := rand.NormFloat64()
  anotherSketch.Add(v)
}
sketch.MergeWith(anotherSketch)

The quantiles in sketch are still accurate to within relativeAccuracy.
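Why merging preserves accuracy can be seen in a toy model where a sketch is just a map from bin index to count: merging adds counts bin by bin, which yields the same bins as inserting all values into a single sketch. The sketch below assumes both sketches use the same relative accuracy; build, mergeInto and sameBins are illustrative names, not this package's API.

```go
package main

import (
	"fmt"
	"math"
)

// index maps a positive value to its bin: ceil(log_gamma(x)),
// with gamma = (1+alpha)/(1-alpha).
func index(x, alpha float64) int {
	g := (1 + alpha) / (1 - alpha)
	return int(math.Ceil(math.Log(x) / math.Log(g)))
}

// build counts the given positive values into bins.
func build(values []float64, alpha float64) map[int]float64 {
	bins := map[int]float64{}
	for _, v := range values {
		bins[index(v, alpha)]++
	}
	return bins
}

// mergeInto adds every bin count of src into dst.
func mergeInto(dst, src map[int]float64) {
	for k, c := range src {
		dst[k] += c
	}
}

// sameBins reports whether a and b hold identical bin counts.
func sameBins(a, b map[int]float64) bool {
	if len(a) != len(b) {
		return false
	}
	for k, c := range a {
		if b[k] != c {
			return false
		}
	}
	return true
}

func main() {
	alpha := 0.01
	left := []float64{1, 2, 3.5, 40}
	right := []float64{2, 7, 40, 900}

	merged := build(left, alpha)
	mergeInto(merged, build(right, alpha))

	direct := build(append(append([]float64{}, left...), right...), alpha)
	fmt.Println(sameBins(merged, direct))
}
```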

References

[1] Charles Masson, Jee E. Rim, and Homin K. Lee. DDSketch: A fast and fully-mergeable quantile sketch with relative-error guarantees. PVLDB, 12(12): 2195-2205, 2019. (The code referenced in the paper, including our implementation of the Greenwald-Khanna (GK) algorithm, can be found at: https://github.com/DataDog/sketches-go/releases/tag/v0.0.1 )

sketches-go's People

Contributors

buyology, charlesmasson, daniel-peng-ddog, darccio, ebilling, jeerim, jeschkies, kserrania, mheffner, nikitabaranov-koho, paullegranddc, piochelepiotr, stephenkappel

sketches-go's Issues

Release v0.0.1?

As this is now a dependency of OpenTelemetry Go, we'd prefer it if this repo had a tagged stable version instead of having to use a git hash in our go.mod

MergeWith after calling Clear panics

Describe what happened:
Running MergeWith() with a sketch that has been recently cleared will panic in the buffered_paginated store:

panic: runtime error: index out of range [8] with length 8 [recovered]
	panic: runtime error: index out of range [8] with length 8

goroutine 6 [running]:
testing.tRunner.func1.2({0x79f1c0, 0xc0000223d8})
	/go/1.19.4/go/src/testing/testing.go:1396 +0x24e
testing.tRunner.func1()
	/go/1.19.4/go/src/testing/testing.go:1399 +0x39f
panic({0x79f1c0, 0xc0000223d8})
	/go/1.19.4/go/src/runtime/panic.go:884 +0x212
github.com/DataDog/sketches-go/ddsketch/store.(*BufferedPaginatedStore).page(0xc0000165a0, 0x8000000000000004, 0x2?)
	/git/clones/sketches-go/ddsketch/store/buffered_paginated.go:128 +0x5c5
github.com/DataDog/sketches-go/ddsketch/store.(*BufferedPaginatedStore).MergeWith(0xc0000165a0, {0x85abd8?, 0xc000016640?})
	/git/clones/sketches-go/ddsketch/store/buffered_paginated.go:383 +0x15b
github.com/DataDog/sketches-go/ddsketch.(*DDSketch).MergeWith(0xc000034d40, 0xc000034d80)
	/git/clones/sketches-go/ddsketch/ddsketch.go:300 +0x64
github.com/DataDog/sketches-go/ddsketch.TestMergeAfterClear(0x0?)
	/git/clones/sketches-go/ddsketch/ddsketch_test.go:456 +0xf4
testing.tRunner(0xc0000da820, 0x7f3528)
	/go/1.19.4/go/src/testing/testing.go:1446 +0x10b
created by testing.(*T).Run

Describe what you expected:

Expected this would simply skip the merge since the other sketch was cleared.

Steps to reproduce the issue:

I can reproduce with the following test case:

func TestMergeAfterClear(t *testing.T) {
	s1, err := NewDefaultDDSketch(0.01)
	require.NoError(t, err)

	s2, err := NewDefaultDDSketch(0.01)
	require.NoError(t, err)

	for i := 0; i < 1000; i++ {
		err := s2.Add(rand.NormFloat64())
		require.NoError(t, err)
	}

	s2.Clear()

	err = s1.MergeWith(s2)
	require.NoError(t, err)
}
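The expected behaviour, where merging a cleared sketch is a no-op, can be illustrated with a toy dense store. This is a hypothetical type written for illustration only; it is not the BufferedPaginatedStore from the stack trace and says nothing about where the actual bug lies.

```go
package main

import "fmt"

// toyStore is a hypothetical dense store: counts[i] is the count of bin minIndex+i.
type toyStore struct {
	minIndex int
	counts   []float64
}

// AddWithCount adds count to the bin at index, growing the slice as needed.
func (s *toyStore) AddWithCount(index int, count float64) {
	if len(s.counts) == 0 {
		s.minIndex = index
		s.counts = append(s.counts[:0], count)
		return
	}
	if index < s.minIndex {
		grown := make([]float64, s.minIndex-index+len(s.counts))
		copy(grown[s.minIndex-index:], s.counts)
		s.minIndex, s.counts = index, grown
	}
	for index >= s.minIndex+len(s.counts) {
		s.counts = append(s.counts, 0)
	}
	s.counts[index-s.minIndex] += count
}

// Clear empties the store but keeps the allocated memory for reuse.
func (s *toyStore) Clear() { s.counts = s.counts[:0] }

// IsEmpty reports whether the store holds no bins.
func (s *toyStore) IsEmpty() bool { return len(s.counts) == 0 }

// TotalCount returns the sum of all bin counts.
func (s *toyStore) TotalCount() float64 {
	total := 0.0
	for _, c := range s.counts {
		total += c
	}
	return total
}

// MergeWith adds the bins of other into s, skipping empty or cleared stores.
func (s *toyStore) MergeWith(other *toyStore) {
	if other == nil || other.IsEmpty() {
		return
	}
	for i, c := range other.counts {
		if c != 0 {
			s.AddWithCount(other.minIndex+i, c)
		}
	}
}

func main() {
	s1, s2 := &toyStore{}, &toyStore{}
	s1.AddWithCount(3, 2)
	s2.AddWithCount(-5, 1)
	s2.AddWithCount(40, 1)
	s2.Clear()
	s1.MergeWith(s2) // no-op: s2 was cleared
	fmt.Println(s1.TotalCount())
}
```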

[feature request] Serialization & Weighted Values

Two useful features for histograms are adding weighted values and the ability to serialize the sketch to and from bytes. Please consider adding these to DDSketch. I will attempt to add them in my own local fork of ddsketch as well.

The signature for adding weighted values is: addWeightedValue(val, count)
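A weighted add can be sketched in a toy model where the sketch is a map from bin index to count: the weight is simply added to the bin instead of 1. The function below is a hypothetical illustration of the requested signature for positive values, not an API of this package.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// index maps a positive value to its bin: ceil(log_gamma(x)),
// with gamma = (1+alpha)/(1-alpha).
func index(x, alpha float64) int {
	g := (1 + alpha) / (1 - alpha)
	return int(math.Ceil(math.Log(x) / math.Log(g)))
}

// addWeightedValue adds val to the toy sketch as if it had been seen count times.
func addWeightedValue(bins map[int]float64, val, count, alpha float64) error {
	if val <= 0 {
		return errors.New("this toy sketch only handles positive values")
	}
	if count < 0 {
		return errors.New("count must be non-negative")
	}
	bins[index(val, alpha)] += count
	return nil
}

func main() {
	alpha := 0.01
	bins := map[int]float64{}
	if err := addWeightedValue(bins, 12.5, 3, alpha); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(bins[index(12.5, alpha)])
}
```

Because counts are floats, fractional weights work the same way, which is convenient for sampled data.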

Current dep google.golang.org/[email protected] requires too many dependencies; one of its transitive deps is vulnerable

Describe what happened:
The current dep google.golang.org/[email protected] requires too many dependencies, and one of its transitive deps is vulnerable:

github.com/DataDog/[email protected]
↑
google.golang.org/[email protected]
↑
google.golang.org/[email protected]
↑
google.golang.org/[email protected]
↑
google.golang.org/[email protected]
↑
google.golang.org/[email protected]
↑
golang.org/x/[email protected]

CVE-2020-14040 is affecting golang.org/x/[email protected]

Describe what you expected:
update google.golang.org/protobuf to 1.26.0

Steps to reproduce the issue:
go mod graph

ProtoReflect method removed from DDSketch, while the lib (protobuf) still depends on it

Describe what happened:
Can't refresh packages due to broken dependencies.

Describe what you expected:
github.com/DataDog/sketches-go/ddsketch/pb/sketchpb
github.com/DataDog/sketches-go/ddsketch/mapping
github.com/DataDog/sketches-go/ddsketch/store
github.com/DataDog/sketches-go/ddsketch
gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer
../../../gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/stats.go:294:33: cannot use msg (type *sketchpb.DDSketch) as type protoreflect.ProtoMessage in argument to "google.golang.org/protobuf/proto".Marshal:
*sketchpb.DDSketch does not implement protoreflect.ProtoMessage (missing ProtoReflect method)
../../../gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer/stats.go:299:34: cannot use msg (type *sketchpb.DDSketch) as type protoreflect.ProtoMessage in argument to "google.golang.org/protobuf/proto".Marshal:
*sketchpb.DDSketch does not implement protoreflect.ProtoMessage (missing ProtoReflect method)

Steps to reproduce the issue:
Remove any DataDog local package.
Re-download the package
Observe errors stating that protoreflect.ProtoMessage implementation is missing.
Go versions used: 1.15 1.16 1.18

On including ddtrace and sketches-go module reference, Go Services Bazel build fails

Describe what happened:
We are trying to add DataDog Custom Metrics to our Golang microservices, following https://docs.datadoghq.com/tracing/setup_overview/setup/go/?tab=containers, by including the following code snippet, with the import shown on its first line:

import ( "gopkg.in/DataDog/dd-trace-go.v1/ddtrace/tracer" )

tracer.Start(
	tracer.WithEnv("prod"),
	tracer.WithService("test-go"),
	tracer.WithServiceVersion("abc123"),
)

I have added the ddtrace library dependency to go_deps.bzl as follows:

go_repository(
    name = "in_gopkg_datadog_dd_trace_go_v1",
    importpath = "gopkg.in/DataDog/dd-trace-go.v1",
    sum = "h1:7/gxjN9TuCrGlMEISIk9I9NHeqm+xDXP2C6qw450D2w=",
    version = "v1.35.0",
)

I am getting the following build error:

ERROR: /private/var/tmp/_bazel_gangavh/b7b51011abc161034e1e1e818ba502ef/external/com_github_datadog_sketches_go/ddsketch/store/BUILD.bazel:3:11: GoCompilePkg external/com_github_datadog_sketches_go/ddsketch/store/store.a failed: (Exit 1): builder failed: error executing command bazel-out/host/bin/external/go_sdk/builder compilepkg -sdk external/go_sdk -installsuffix darwin_arm64 -src external/com_github_datadog_sketches_go/ddsketch/store/bin.go -src ... (remaining 21 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
compilepkg: missing strict dependencies:
/private/var/tmp/_bazel_gangavh/b7b51011abc161034e1e1e818ba502ef/sandbox/darwin-sandbox/661/execroot/ss_mservices/external/com_github_datadog_sketches_go/ddsketch/store/dense_store.go: import of "github.com/DataDog/sketches-go/ddsketch/pb/sketchpb"
/private/var/tmp/_bazel_gangavh/b7b51011abc161034e1e1e818ba502ef/sandbox/darwin-sandbox/661/execroot/ss_mservices/external/com_github_datadog_sketches_go/ddsketch/store/store.go: import of "github.com/DataDog/sketches-go/ddsketch/pb/sketchpb"

I see that the URL github.com/DataDog/sketches-go/ddsketch/pb/sketchpb is not reachable.

Describe what you expected:

Build should complete without errors.

Can one get CountBelow / CountAbove a percentile?

I would like to use the histogram to get the number of values above/below percentile X.
This could be very helpful since I already got the data (approximated, with the required accuracy).

I couldn't find any such API or internal logic.
Is there a way to get this? If not, is this planned?

Thanks a lot for this great tool!!
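In a toy model where the sketch is a map from bin index to count, such a count can be approximated by summing the bins up to the one that contains the threshold. The sketch below is standard-library-only and handles positive values; countAtOrBelow is a hypothetical helper, not an API of this package. The result is approximate because values that share a bin with the threshold are counted even when they are slightly above it.

```go
package main

import (
	"fmt"
	"math"
)

// index maps a positive value to its bin: ceil(log_gamma(x)),
// with gamma = (1+alpha)/(1-alpha).
func index(x, alpha float64) int {
	g := (1 + alpha) / (1 - alpha)
	return int(math.Ceil(math.Log(x) / math.Log(g)))
}

// countAtOrBelow sums the counts of all bins up to and including the bin
// that contains threshold. threshold must be positive.
func countAtOrBelow(bins map[int]float64, threshold, alpha float64) float64 {
	limit := index(threshold, alpha)
	total := 0.0
	for k, c := range bins {
		if k <= limit {
			total += c
		}
	}
	return total
}

func main() {
	alpha := 0.01
	bins := map[int]float64{}
	n := 0.0
	for v := 1; v <= 100; v++ {
		bins[index(float64(v), alpha)]++
		n++
	}
	below := countAtOrBelow(bins, 50, alpha)
	fmt.Println("at or below 50:", below, "above 50:", n-below)
}
```

The threshold can itself be a quantile value returned by the sketch, which gives the count below or above a given percentile.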

Release an updated tag

Describe what happened:
I am looking to play with some of the functions in master; however, they are not part of the most recent tag, as seen in this diff: v1.0.0...master

Describe what you expected:
I would expect that any significant changes would also come with a new tagged release for others to consume within their applications.

Steps to reproduce the issue:
No real steps just sadness when looking to use the functions in master but unable to due to semconv imports.

dataset_test.go fails

It seems that dataset_test.go is outdated, as MinRank and MaxRank have been removed from dataset.go:

# github.com/DataDog/sketches-go/dataset [github.com/DataDog/sketches-go/dataset.test]
./dataset_test.go:22:29: d.MinRank undefined (type *Dataset has no field or method MinRank)
./dataset_test.go:23:29: d.MaxRank undefined (type *Dataset has no field or method MaxRank)
./dataset_test.go:24:29: d.MinRank undefined (type *Dataset has no field or method MinRank)
./dataset_test.go:25:29: d.MaxRank undefined (type *Dataset has no field or method MaxRank)
./dataset_test.go:26:29: d.MinRank undefined (type *Dataset has no field or method MinRank)
./dataset_test.go:27:29: d.MaxRank undefined (type *Dataset has no field or method MaxRank)
./dataset_test.go:28:29: d.MinRank undefined (type *Dataset has no field or method MinRank)
./dataset_test.go:29:29: d.MaxRank undefined (type *Dataset has no field or method MaxRank)
./dataset_test.go:30:29: d.MinRank undefined (type *Dataset has no field or method MinRank)
./dataset_test.go:31:29: d.MaxRank undefined (type *Dataset has no field or method MaxRank)
./dataset_test.go:31:29: too many errors
FAIL	github.com/DataDog/sketches-go/dataset [build failed]
