grafadruid / go-druid
A Golang client for Druid
Home Page: https://join.slack.com/t/grafadruid/shared_invite/zt-1qy0skzy8-axnZuyzaWRm9t8f0r9dUWQ
License: Apache License 2.0
See #46 (review)
We need to start documenting the project
Hey!
Just a quick one, I'm going to be using Druid + Go for some pretty heavy data fetching/processing. I was slightly concerned about having to do that via JSON+HTTP because we're talking about millions of rows. I noticed that Scan is supposedly more for this use case as it can stream results back to the client. I also noticed in the docs that it supports Smile encoding, which looks as though it could be a lot more efficient in terms of transferring data from a to b.
I noticed this library among a few others, which looks fantastic. Just wondered if you handled Scan queries by streaming at all? I had a glance through the code and it seemed like a regular HTTP request. Apologies if I'm being stupid or not understanding something here. I also wondered if you'd used Smile encoding/had any plans to support it at all?
I tried to get a response in Smile format using your library + https://github.com/zencoder/go-smile but seemed to get a few errors, which I raised here: apache/druid#10945
Appreciate any help/advice on this. Also, we'll probably use this library, so will be able to offer some help where/if needed :)
The error response is empty.
giving up after 6 attempt(s): error response from Druid: {Error: ErrorMessage: ErrorClass: Host:}
Need to investigate.
{"context":{"a":"a"},"dataSource":{"name":"wikipedia","type":"table"},"queryType":"dataSourceMetadata"}
is failing because intervals is optional for dataSourceMetadata. In intervals.Load (builder/query/query.go#69) we do a Load with an empty []byte, which causes json.Unmarshal to fail with the error "Load failed, unexpected end of JSON input". Should we tackle it in a separate PR?
Originally posted by @vigith in #13 (comment)
I see an error "json: cannot unmarshal object into Go struct field .intervals of type []types.Interval" in builder/query/scan.go if the interval passed to Base is a JSON object:
"intervals": {
"type": "intervals",
"intervals": [
"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
]
}
The reason for this is that builder/types/intervals.go uses strings and not structured types:
// Interval represents a druid interval.
type Interval struct {
StartTime time.Time
EndTime time.Time
}
Trying to build a query like the one below (see https://druid.apache.org/docs/latest/querying/filters#search-filter):
{
"filter": {
"type": "search",
"dimension": "product",
"query": {
"type": "insensitive_contains",
"value": "foo"
}
}
}
But Search is defined with Query tagged json:"builder,omitempty" in search.go, so the marshaled filter ends up with a builder field instead of a query field.
type Search struct {
Base
Dimension string `json:"dimension,omitempty"`
Query string `json:"builder,omitempty"`
ExtractionFn builder.ExtractionFn `json:"extractionFn,omitempty"`
FilterTuning *FilterTuning `json:"filterTuning,omitempty"`
}
I was trying out the T-Digest sketch extension in Druid. It looks like this library does not currently support it.
Query Example:
{
"queryType": "groupBy",
"dataSource": "rollup-data1",
"granularity": "ALL",
"dimensions": [],
"aggregations": [{
"type": "tDigestSketch",
"name": "merged_sketch",
"fieldName": "ingested_sketch",
"compression": 200
}],
"postAggregations": [{
"type": "quantilesFromTDigestSketch",
"name": "quantiles",
"fractions": [0, 0.5, 1],
"field": {
"type": "fieldAccess",
"fieldName": "merged_sketch"
}
}],
"intervals": ["2016-01-01T00:00:00.000Z/2021-05-31T00:00:00.000Z"]
}
I'm trying to run the following query:
{
"queryType": "timeseries",
"dataSource": {
"type": "query",
"query": {
"aggregations": [
{
"fieldName": "count",
"name": "count",
"type": "longSum"
}
],
"dataSource": {
"name": "dc_94b4f5fdfde940979b79c50539d8322a_b42fde98efed4e638a0016b34b3c10cf_dataset_pre",
"type": "table"
},
"dimension": {
"dimension": "string_value",
"type": "default"
},
"filter": {
"fields": [
{
"dimension": "_split_name__",
"type": "selector",
"value": "train"
},
{
"dimension": "column_name",
"type": "selector",
"value": "addressState"
}
],
"type": "and"
},
"granularity": "day",
"intervals": {
"intervals": [
"${__from:date:iso}/${__to:date:iso}"
],
"type": "intervals"
},
"metric": {
"metric": "count",
"type": "numeric"
},
"queryType": "topN",
"threshold": 100
}
},
"intervals": {
"type": "intervals",
"intervals": [
"${__from:date:iso}/${__to:date:iso}"
]
},
"granularity": "day"
}
I get an error saying the query is null. It looks like we always return null here: https://github.com/grafadruid/go-druid/blob/master/builder/datasource/query.go#L24
groupBy offset was added in Druid 0.20.0.
In func Load(data []byte) (builder.Filter, error) and its siblings, not all Load functions have a default case that returns nil, errors.New("unsupported type"). We need to add the pattern below to all Load functions:
switch t.Typ {
case "xxx":
g = NewXXX()
case "yyy":
g = NewYYY()
default:
return nil, errors.New("unsupported type")
}
Apart from unit testing, it would be helpful to run the generated queries against Druid server.
List of post aggregations not supported:
Print a proper error for the Druid response.
Currently, it prints:
giving up after 6 attempt(s): Error response from Druid: %!w()
However, the real error message is:
Cannot construct instance of `org.apache.druid.query.groupby.GroupByQuery`, problem: Duplicate output name[filtered_customTags] at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 795]
We need to capture the error returned by Base.UnmarshalJSON(data) and return it; the current code ignores the return value of UnmarshalJSON(data).
Current code:
s.Base.UnmarshalJSON(data)
...
return nil
It should be changed to:
err = s.Base.UnmarshalJSON(data)
...
return err
The grammar of Druid native queries is sometimes confusing:
"granularity": {
"type": "all"
}
and "granularity": "all"
both work, but
"granularity": {
"type": "hour"
}
won't. That is, a simple granularity like all can be given as a plain string or as an object that includes Base. This causes an issue when we try to Load() a query that has "granularity": {"type": "all" }.
go-druid generates this limitSpec:
"limitSpec": {
"Typ": "default",
"columns": [
{
"string": "d1",
"Direction": "descending",
"dimensionComparator": "numeric"
}
],
"limit": 3
}
where Typ should be type, string should be dimension, Direction should be direction, and dimensionComparator should be dimensionOrder. See:
https://druid.apache.org/docs/latest/querying/limitspec.html#defaultlimitspec
https://druid.apache.org/docs/latest/querying/limitspec.html#orderbycolumnspec
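For comparison, the output that the linked DefaultLimitSpec/OrderByColumnSpec docs would presumably expect looks like:

```json
{
  "limitSpec": {
    "type": "default",
    "columns": [
      {
        "dimension": "d1",
        "direction": "descending",
        "dimensionOrder": "numeric"
      }
    ],
    "limit": 3
  }
}
```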
https://github.com/grafadruid/go-druid/blob/master/builder/havingspec/equal_to.go#L6
If we want to filter a column where the value is greater than 0, the JSON marshal will drop the value field because of "omitempty", even though 0 is a valid value here.
The Not filter's SetField should take a filter as its argument, not a string.
Error 2/3
{
  "batchSize": 20480,
  "columns": ["__time", "channel", "cityName", "comment", "count", "countryIsoCode", "diffUrl", "flags", "isAnonymous", "isMinor", "isNew", "isRobot", "isUnpatrolled", "metroCode", "namespace", "page", "regionIsoCode", "regionName", "sum_added", "sum_commentLength", "sum_deleted", "sum_delta", "sum_deltaBucket", "user"],
  "dataSource": {
    "type": "query",
    "query": {
      "queryType": "scan",
      "dataSource": {"type": "table", "name": "A"},
      "columns": ["AT"],
      "intervals": {"type": "intervals", "intervals": ["1980-06-12T22:30:00.000Z/2020-01-26T23:00:00.000Z"]}
    }
  },
  "filter": {"dimension": "countryName", "extractionFn": {"locale": "", "type": "lower"}, "type": "selector", "value": "france"},
  "intervals": {"type": "intervals", "intervals": ["1980-06-12T22:30:00.000Z/2020-01-26T23:00:00.000Z"]},
  "limit": 10,
  "order": "descending",
  "queryType": "scan"
}
The above query fails when executed directly on Apache Druid with the error below. I think this is a query error and not a bug in the code; perhaps we just need to fix the query.
Time-ordering on scan queries is only supported for queries with segment specs of type MultipleSpecificSegmentSpec or SpecificSegmentSpec...a [MultipleIntervalSegmentSpec] was received instead.
This query must have never worked indeed. Introduced here: 4995687#diff-7c1b8c5172fe7687c2af90f55f18e7e5eacf13af953ccc2bbeb5de3fa56e2688R25
It was probably used only to debug the object model of the query (specifically the circular dependency) rather than to get results. We should fix it so it is valid and returns results, while still testing the sub-query case.
Originally posted by @jbguerraz in #13 (comment)
We need to be able to run tests locally before merging code, a Magefile would help to iterate faster.
Hi.
I'm looking into adding support for a new aggregation type. While doing it, it occurred to me that it might be a good idea to validate the field values as well, according to what Druid requires. For example, the lgK
field in DataSketches HLL has to be in the [4,21]
range, so the setter method might check this.
What do you think? Returning an error from the setter is a breaking change, so one alternative is storing errors in the struct, that could be returned via an Error
method, or something along those lines.
Hello,
I just forked your project to add the notion of a Task, needed to launch a batch-mode data ingestion. I would prefer to contribute directly rather than via PRs from a fork, since in Go that is not ideal (a find/replace is needed beforehand in the imports and in go.mod). Would it be possible to add me to the project? If needed, I will make my additions in a separate branch and then open a PR to merge into master. What do you think?
#68 modifies builder.Intervals which could cause failures in other components that use builder.Intervals.
When query is formulated, we marshal Interval to short spec due to #17. We need to support both long and short spec for all Interval cases except for interval filter.
When I click on https://grafadruid.slack.com, it expects me to have an account, which I do not. I would like to know more about the project, as we use golang heavily and are evaluating druid as our OLAP system.