grafadruid / go-druid
A Golang client for Druid
Home Page: https://join.slack.com/t/grafadruid/shared_invite/zt-1qy0skzy8-axnZuyzaWRm9t8f0r9dUWQ
License: Apache License 2.0
See #46 (review)
We need to start documenting the project
Hey!
Just a quick one, I'm going to be using Druid + Go for some pretty heavy data fetching/processing. I was slightly concerned about having to do that via JSON+HTTP because we're talking about millions of rows. I noticed that Scan is supposedly more for this use case as it can stream results back to the client. I also noticed in the docs that it supports Smile encoding, which looks as though it could be a lot more efficient in terms of transferring data from a to b.
I noticed this library among a few others, which looks fantastic. Just wondered if you handled Scan queries by streaming at all? I had a glance through the code and it seemed like a regular HTTP request. Apologies if I'm being stupid or not understanding something here. I also wondered if you'd used Smile encoding/had any plans to support it at all?
I tried to get a response in Smile format using your library + https://github.com/zencoder/go-smile but seemed to get a few errors, which I raised here: apache/druid#10945
Appreciate any help/advice on this. Also, we'll probably use this library, so will be able to offer some help where/if needed :)
The error response is empty.
giving up after 6 attempt(s): error response from Druid: {Error: ErrorMessage: ErrorClass: Host:}
Need to investigate.
{"context":{"a":"a"},"dataSource":{"name":"wikipedia","type":"table"},"queryType":"dataSourceMetadata"}
is failing because intervals is optional for dataSourceMetadata. In intervals.Load (builder/query/query.go#69) we do a Load with an empty []byte, which causes json.Unmarshal to fail with the error "Load failed, unexpected end of JSON input". Should we tackle it in a separate PR?
Originally posted by @vigith in #13 (comment)
I see an error "json: cannot unmarshal object into Go struct field .intervals of type []types.Interval" in builder/query/scan.go if the interval passed to Base is a JSON object:
"intervals": {
"type": "intervals",
"intervals": [
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
]
}
The reason for this is that builder/types/intervals.go uses strings and not structured types:
// Interval represents a druid interval.
type Interval struct {
StartTime time.Time
EndTime time.Time
}
Trying to build a query like the one below (see https://druid.apache.org/docs/latest/querying/filters#search-filter):
{
"filter": {
"type": "search",
"dimension": "product",
"query": {
"type": "insensitive_contains",
"value": "foo"
}
}
}
But Search is defined with Query tagged json:"builder,omitempty" in search.go, so the marshaled filter ends up with a builder field instead of a query field.
type Search struct {
Base
Dimension string `json:"dimension,omitempty"`
Query string `json:"builder,omitempty"`
ExtractionFn builder.ExtractionFn `json:"extractionFn,omitempty"`
FilterTuning *FilterTuning `json:"filterTuning,omitempty"`
}
I was trying out the T-Digest sketch extension in Druid. It looks like this library does not currently support it.
Query Example:
{
"queryType": "groupBy",
"dataSource": "rollup-data1",
"granularity": "ALL",
"dimensions": [],
"aggregations": [{
"type": "tDigestSketch",
"name": "merged_sketch",
"fieldName": "ingested_sketch",
"compression": 200
}],
"postAggregations": [{
"type": "quantilesFromTDigestSketch",
"name": "quantiles",
"fractions": [0, 0.5, 1],
"field": {
"type": "fieldAccess",
"fieldName": "merged_sketch"
}
}],
"intervals": ["2016-01-01T00:00:00.000Z/2021-05-31T00:00:00.000Z"]
}
I'm trying to run the following query:
{
"queryType": "timeseries",
"dataSource": {
"type": "query",
"query": {
"aggregations": [
{
"fieldName": "count",
"name": "count",
"type": "longSum"
}
],
"dataSource": {
"name": "dc_94b4f5fdfde940979b79c50539d8322a_b42fde98efed4e638a0016b34b3c10cf_dataset_pre",
"type": "table"
},
"dimension": {
"dimension": "string_value",
"type": "default"
},
"filter": {
"fields": [
{
"dimension": "_split_name__",
"type": "selector",
"value": "train"
},
{
"dimension": "column_name",
"type": "selector",
"value": "addressState"
}
],
"type": "and"
},
"granularity": "day",
"intervals": {
"intervals": [
"${__from:date:iso}/${__to:date:iso}"
],
"type": "intervals"
},
"metric": {
"metric": "count",
"type": "numeric"
},
"queryType": "topN",
"threshold": 100
}
},
"intervals": {
"type": "intervals",
"intervals": [
"${__from:date:iso}/${__to:date:iso}"
]
},
"granularity": "day"
}
I get an error saying the query is null. It looks like we always return null here: https://github.com/grafadruid/go-druid/blob/master/builder/datasource/query.go#L24
groupBy offset was added in Druid 0.20.0.
In func Load(data []byte) (builder.Filter, error) and its siblings, not all Load functions have a default case that returns nil, errors.New("unsupported type"). We need to add the pattern below to all Load functions:
switch t.Typ {
case "xxx":
g = NewXXX()
case "yyy":
g = NewYYY()
default:
return nil, errors.New("unsupported type")
}
Apart from unit testing, it would be helpful to run the generated queries against Druid server.
List of post aggregations not supported:
Print a proper error for the Druid response.
Currently, it prints:
giving up after 6 attempt(s): Error response from Druid: %!w()
However, the real error message is:
Cannot construct instance of `org.apache.druid.query.groupby.GroupByQuery`, problem: Duplicate output name[filtered_customTags] at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 795]
We need to capture the error returned by Base.UnmarshalJSON(data) and return it; the current code ignores the return value of UnmarshalJSON(data).
Current code:
s.Base.UnmarshalJSON(data)
...
return nil
It should be changed to:
err = s.Base.UnmarshalJSON(data)
...
return err
The grammar of Druid native queries is sometimes confusing:
"granularity": {
"type": "all"
}
and "granularity": "all"
both work, but
"granularity": {
"type": "hour"
}
won't. That is, a simple granularity like all can be given as a plain string or as an object that includes Base. This causes an issue when we try to Load() a query that has "granularity": {"type": "all" }.
go-druid generates this limitSpec:
"limitSpec": {
"Typ": "default",
"columns": [
{
"string": "d1",
"Direction": "descending",
"dimensionComparator": "numeric"
}
],
"limit": 3
}
where Typ should be type, string should be dimension, Direction should be direction, and dimensionComparator should be dimensionOrder. See:
https://druid.apache.org/docs/latest/querying/limitspec.html#defaultlimitspec
https://druid.apache.org/docs/latest/querying/limitspec.html#orderbycolumnspec
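For comparison, the output that the linked DefaultLimitSpec/OrderByColumnSpec docs would presumably expect looks like:

```json
{
  "limitSpec": {
    "type": "default",
    "columns": [
      {
        "dimension": "d1",
        "direction": "descending",
        "dimensionOrder": "numeric"
      }
    ],
    "limit": 3
  }
}
```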
https://github.com/grafadruid/go-druid/blob/master/builder/havingspec/equal_to.go#L6
If we want to filter a column where the value is greater than 0, the JSON marshal will drop the value field because of "omitempty", even though 0 is a valid value here.
The Not filter's SetField should take a filter as its argument, not a string.
Error 2/3
{
  "batchSize": 20480,
  "columns": ["__time", "channel", "cityName", "comment", "count", "countryIsoCode", "diffUrl", "flags", "isAnonymous", "isMinor", "isNew", "isRobot", "isUnpatrolled", "metroCode", "namespace", "page", "regionIsoCode", "regionName", "sum_added", "sum_commentLength", "sum_deleted", "sum_delta", "sum_deltaBucket", "user"],
  "dataSource": {
    "type": "query",
    "query": {
      "queryType": "scan",
      "dataSource": {"type": "table", "name": "A"},
      "columns": ["AT"],
      "intervals": {"type": "intervals", "intervals": ["1980-06-12T22:30:00.000Z/2020-01-26T23:00:00.000Z"]}
    }
  },
  "filter": {"dimension": "countryName", "extractionFn": {"locale": "", "type": "lower"}, "type": "selector", "value": "france"},
  "intervals": {"type": "intervals", "intervals": ["1980-06-12T22:30:00.000Z/2020-01-26T23:00:00.000Z"]},
  "limit": 10,
  "order": "descending",
  "queryType": "scan"
}
The above query fails when executed directly on Apache Druid with the error below. I think this is a query error and not a bug in the code; perhaps we just need to fix the query.
Time-ordering on scan queries is only supported for queries with segment specs of type MultipleSpecificSegmentSpec or SpecificSegmentSpec...a [MultipleIntervalSegmentSpec] was received instead.
This query must have never worked indeed. Introduced here: 4995687#diff-7c1b8c5172fe7687c2af90f55f18e7e5eacf13af953ccc2bbeb5de3fa56e2688R25
It was probably used only to debug the object model of the query (specifically the circular dependency) rather than to get results. We should fix it so it is valid and returns results, while still testing the sub-query case.
Originally posted by @jbguerraz in #13 (comment)
We need to be able to run tests locally before merging code, a Magefile would help to iterate faster.
Hi.
I'm looking into adding support for a new aggregation type. While doing it, it occurred to me that it might be a good idea to validate the field values as well, according to what Druid requires. For example, the lgK
field in DataSketches HLL has to be in the [4,21]
range, so the setter method might check this.
What do you think? Returning an error from the setter is a breaking change, so one alternative is storing errors in the struct, that could be returned via an Error
method, or something along those lines.
Hello,
I just forked your project to add the notion of a Task, needed to launch a batch-mode data ingestion. I would prefer to contribute directly rather than via PRs from a fork, since in Go that is not ideal (a find/replace is needed beforehand in the imports and in go.mod). Would it be possible to add me to the project? If needed, I will make my additions in a separate branch and then open a PR to merge into master. What do you think?
#68 modifies builder.Intervals which could cause failures in other components that use builder.Intervals.
When query is formulated, we marshal Interval to short spec due to #17. We need to support both long and short spec for all Interval cases except for interval filter.
When I click on https://grafadruid.slack.com, it expects me to have an account, which I do not. I would like to know more about the project, as we use golang heavily and are evaluating druid as our OLAP system.