
go-druid's People

Contributors

ahuret, bourbonkk, cosmic-chichu, fryuni, jbguerraz, jmichalak-fluxninja, jy4096, marcper, nagarajatantry, peroxyacyl, raakasf, saketbairoliya2, vigith


go-druid's Issues

write godoc

We need to start documenting the project:

  • Document Native Query Types
  • Document Native Query Components

Question - Smile encoding/scan queries

Hey!

Just a quick one, I'm going to be using Druid + Go for some pretty heavy data fetching/processing. I was slightly concerned about having to do that via JSON+HTTP because we're talking about millions of rows. I noticed that Scan is supposedly more for this use case as it can stream results back to the client. I also noticed in the docs that it supports Smile encoding, which looks as though it could be a lot more efficient in terms of transferring data from a to b.

I noticed this library among a few others, which looks fantastic. Just wondered if you handled Scan queries by streaming at all? I had a glance through the code and it seemed like a regular HTTP request. Apologies if I'm being stupid or not understanding something here. I also wondered if you'd used Smile encoding/had any plans to support it at all?

I tried to get a response in Smile format using your library + https://github.com/zencoder/go-smile but seemed to get a few errors, which I raised here: apache/druid#10945

Appreciate any help/advice on this. Also, we'll probably use this library, so will be able to offer some help where/if needed :)
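
For what it's worth, nothing prevents streaming a Scan response today with a plain net/http request and an incremental json.Decoder. A minimal sketch, assuming the router endpoint, no auth, and resultFormat "compactedList" (it bypasses go-druid's client entirely):

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
)

// streamScan posts a Scan query and decodes result batches as they arrive,
// instead of buffering the entire response body in memory.
func streamScan(queryJSON []byte) error {
	req, err := http.NewRequest(http.MethodPost,
		"http://localhost:8888/druid/v2", bytes.NewReader(queryJSON))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	dec := json.NewDecoder(resp.Body)
	if _, err := dec.Token(); err != nil { // consume the opening '[' of the response array
		return err
	}
	for dec.More() {
		var batch struct {
			Events [][]interface{} `json:"events"`
		}
		if err := dec.Decode(&batch); err != nil {
			return err
		}
		log.Printf("received %d rows", len(batch.Events))
	}
	return nil
}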

Empty Error Response

The error response is empty.

giving up after 6 attempt(s): error response from Druid: {Error: ErrorMessage: ErrorClass: Host:}

Need to investigate.
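
For context, Druid's documented error response carries error, errorMessage, errorClass, and host fields. A sketch of decoding it defensively, falling back to the HTTP status when the body is empty (the helper and its integration point are assumptions, not the current code):

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// druidError turns a failed response into a useful error even when the
// body is empty, which is the case shown above.
func druidError(resp *http.Response) error {
	var derr struct {
		Error        string `json:"error"`
		ErrorMessage string `json:"errorMessage"`
		ErrorClass   string `json:"errorClass"`
		Host         string `json:"host"`
	}
	body, _ := io.ReadAll(resp.Body)
	if err := json.Unmarshal(body, &derr); err != nil || derr.Error == "" {
		return fmt.Errorf("error response from Druid: %s: %s", resp.Status, string(body))
	}
	return fmt.Errorf("error response from Druid: %s: %s", derr.Error, derr.ErrorMessage)
}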

Intervals could be optional

Error 1/3

{"context":{"a":"a"},"dataSource":{"name":"wikipedia","type":"table"},"queryType":"dataSourceMetadata"} is failing because intervals is optional for dataSourceMetadata.

In intervals.Load (builder/query/query.go#69) we do a Load with an empty []byte, which causes the JSON unmarshal to fail with the error Load failed, unexpected end of JSON input. Tackle it in a different PR? A minimal guard is sketched below.

Originally posted by @vigith in #13 (comment)
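
A guard in the query loader could look like this sketch; rawIntervals and the surrounding loader shape are assumptions, not the actual code.

// Only load intervals when they are present; dataSourceMetadata omits them.
var i builder.Intervals
if len(rawIntervals) > 0 {
	var err error
	if i, err = intervals.Load(rawIntervals); err != nil {
		return nil, err
	}
}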

Base interval

I see an error json: cannot unmarshal object into Go struct field .intervals of type []types.Interval in builder/query/scan.go if the interval passed to Base is a JSON object:

  "intervals": {
    "type": "intervals",
    "intervals": [
      "-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"
    ]
  }

The reason for this is that builder/types/intervals.go uses interval strings ("start/end") and not structured types:

// Interval represents a druid interval.
type Interval struct {
	StartTime time.Time
	EndTime   time.Time
}
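
For reference, a sketch of round-tripping that struct to and from Druid's "start/end" string form; the layout constant and error handling are illustrative, and it would still reject the object form shown above.

import (
	"encoding/json"
	"fmt"
	"strings"
	"time"
)

// MarshalJSON renders the interval as Druid's "start/end" string.
func (i Interval) MarshalJSON() ([]byte, error) {
	return json.Marshal(i.StartTime.UTC().Format(time.RFC3339Nano) + "/" +
		i.EndTime.UTC().Format(time.RFC3339Nano))
}

// UnmarshalJSON parses the "start/end" string back into the struct.
func (i *Interval) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	parts := strings.SplitN(s, "/", 2)
	if len(parts) != 2 {
		return fmt.Errorf("invalid interval %q", s)
	}
	start, err := time.Parse(time.RFC3339Nano, parts[0])
	if err != nil {
		return err
	}
	end, err := time.Parse(time.RFC3339Nano, parts[1])
	if err != nil {
		return err
	}
	i.StartTime, i.EndTime = start, end
	return nil
}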

Inconsistent Field Naming in search.go: 'builder' Field Instead of Expected 'query'

Trying to build a query like the one below (https://druid.apache.org/docs/latest/querying/filters#search-filter):

{
    "filter": {
        "type": "search",
        "dimension": "product",
        "query": {
          "type": "insensitive_contains",
          "value": "foo"
        }
    }
}

But Search is defined in search.go with Query tagged json:"builder,omitempty", so the marshalled filter ends up with a builder field instead of the expected query field:

type Search struct {
	Base
	Dimension    string               `json:"dimension,omitempty"`
	Query        string               `json:"builder,omitempty"`
	ExtractionFn builder.ExtractionFn `json:"extractionFn,omitempty"`
	FilterTuning *FilterTuning        `json:"filterTuning,omitempty"`
}
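
The fix is presumably just the struct tag, so the field marshals as query. Note that Druid's search filter also expects query to be an object (a search query spec, as in the example above), so the string type may need revisiting too.

type Search struct {
	Base
	Dimension    string               `json:"dimension,omitempty"`
	Query        string               `json:"query,omitempty"` // was json:"builder,omitempty"
	ExtractionFn builder.ExtractionFn `json:"extractionFn,omitempty"`
	FilterTuning *FilterTuning        `json:"filterTuning,omitempty"`
}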

tdigest aggregation and post aggregation are missing

I was trying out the tdigest module in Druid. It looks like this library currently does not support it.

Query Example:

{
    "queryType": "groupBy",
    "dataSource": "rollup-data1",
    "granularity": "ALL",
    "dimensions": [],
    "aggregations": [{
        "type": "tDigestSketch",
        "name": "merged_sketch",
        "fieldName": "ingested_sketch",
        "compression": 200
    }],
    "postAggregations": [{
        "type": "quantilesFromTDigestSketch",
        "name": "quantiles",
        "fractions": [0, 0.5, 1],
        "field": {
            "type": "fieldAccess",
            "fieldName": "merged_sketch"
        }
    }],
    "intervals": ["2016-01-01T00:00:00.000Z/2021-05-31T00:00:00.000Z"]
}
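
A sketch of what the missing aggregator could look like, following the builder pattern used elsewhere in the library; the Base embedding and setter style are assumptions based on that pattern, and the post aggregator (quantilesFromTDigestSketch) would follow the same shape.

type TDigestSketch struct {
	Base
	FieldName   string `json:"fieldName,omitempty"`
	Compression int64  `json:"compression,omitempty"`
}

func NewTDigestSketch() *TDigestSketch {
	t := &TDigestSketch{}
	t.SetType("tDigestSketch")
	return t
}

func (t *TDigestSketch) SetName(name string) *TDigestSketch {
	t.Base.SetName(name)
	return t
}

func (t *TDigestSketch) SetFieldName(fieldName string) *TDigestSketch {
	t.FieldName = fieldName
	return t
}

func (t *TDigestSketch) SetCompression(compression int64) *TDigestSketch {
	t.Compression = compression
	return t
}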

Not able to create query with dataSource as type `query`

I'm trying to run the following query:

{
	"queryType": "timeseries",
	"dataSource": {
		"type": "query",
		"query": {
			"aggregations": [
				{
					"fieldName": "count",
					"name": "count",
					"type": "longSum"
				}
			],
			"dataSource": {
				"name": "dc_94b4f5fdfde940979b79c50539d8322a_b42fde98efed4e638a0016b34b3c10cf_dataset_pre",
				"type": "table"
			},
			"dimension": {
				"dimension": "string_value",
				"type": "default"
			},
			"filter": {
				"fields": [
					{
						"dimension": "_split_name__",
						"type": "selector",
						"value": "train"
					},
					{
						"dimension": "column_name",
						"type": "selector",
						"value": "addressState"
					}
				],
				"type": "and"
			},
			"granularity": "day",
			"intervals": {
				"intervals": [
					"${__from:date:iso}/${__to:date:iso}"
				],
				"type": "intervals"
			},
			"metric": {
				"metric": "count",
				"type": "numeric"
			},
			"queryType": "topN",
			"threshold": 100
		}
	},
	"intervals": {
		"type": "intervals",
		"intervals": [
			"${__from:date:iso}/${__to:date:iso}"
		]
	},
	"granularity": "day"
}


Getting an error because query is null. Looks like we're always returning null here: https://github.com/grafadruid/go-druid/blob/master/builder/datasource/query.go#L24
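
A naive sketch of loading the nested query instead of returning nil; query.Load and the field names are assumptions, and it sidesteps the circular-dependency concern mentioned in the sub-query issue below.

// UnmarshalJSON loads the wrapped native query rather than discarding it.
func (q *Query) UnmarshalJSON(data []byte) error {
	if err := q.Base.UnmarshalJSON(data); err != nil {
		return err
	}
	var tmp struct {
		Query json.RawMessage `json:"query"`
	}
	if err := json.Unmarshal(data, &tmp); err != nil {
		return err
	}
	inner, err := query.Load(tmp.Query)
	if err != nil {
		return err
	}
	q.Query = inner
	return nil
}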

case for "unsupported type" in Load

Not all Load functions with the signature func Load(data []byte) (builder.Filter, error) have a default case that returns nil, errors.New("unsupported type"). We need to add the pattern below to all Load functions:

switch t.Typ {
case "xxx":
	g = NewXXX()
case "yyy":
	g = NewYYY()
default:
	return nil, errors.New("unsupported type")
}
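
Applied to, e.g., the filter package, a complete Load would look roughly like the sketch below; NewSelector and NewAnd stand in for the full list of supported types.

func Load(data []byte) (builder.Filter, error) {
	var t struct {
		Typ string `json:"type"`
	}
	if err := json.Unmarshal(data, &t); err != nil {
		return nil, err
	}
	var f builder.Filter
	switch t.Typ {
	case "selector":
		f = NewSelector()
	case "and":
		f = NewAnd()
	default:
		return nil, errors.New("unsupported type")
	}
	return f, json.Unmarshal(data, f)
}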

Print Druid Response Properly

Print a proper error for the Druid response.

Currently, it prints:
giving up after 6 attempt(s): Error response from Druid: %!w()

However, the real error message is:

Cannot construct instance of `org.apache.druid.query.groupby.GroupByQuery`, problem: Duplicate output name[filtered_customTags] at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 795]
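
As an aside, fmt prints %!w(...) when the operand given to %w does not implement error, which is likely what is happening here. A one-line sketch of the kind of fix, reusing a decoded error struct like the derr sketch in the empty-error issue above:

// Format the message fields (or wrap a real error); don't pass a struct to %w.
return fmt.Errorf("error response from Druid: %s: %s", derr.ErrorClass, derr.ErrorMessage)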

return error for base unmarshal json

We need to capture the error returned by Base.UnmarshalJSON(data) and return it. The current code does not check the return value of UnmarshalJSON(data).

Current code path:

s.Base.UnmarshalJSON(data)
...
return nil

We need to change it to:

err = s.Base.UnmarshalJSON(data)
...
return err
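
In context, a sketch of the corrected method (the Scan receiver and tmp fields are illustrative):

func (s *Scan) UnmarshalJSON(data []byte) error {
	if err := s.Base.UnmarshalJSON(data); err != nil {
		return err
	}
	var tmp struct {
		BatchSize int64 `json:"batchSize,omitempty"`
	}
	if err := json.Unmarshal(data, &tmp); err != nil {
		return err
	}
	s.BatchSize = tmp.BatchSize
	return nil
}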

granularity all can be simple or can include type

The grammar of Druid native queries is sometimes confusing:

  "granularity": {
    "type": "all"
  }

and "granularity": "all" works but

"granularity": {
    "type": "hour"
  }

won't work. I.e., all can be the simple string form or an object that includes Base.

This causes an issue when we try to Load() a query that has "granularity": {"type": "all"}. A tolerant Load is sketched below.
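
A sketch of such a Load; NewSimple, SimpleGranularity, and NewPeriod are assumptions modeled on the library's builder style, not its actual API.

func Load(data []byte) (builder.Granularity, error) {
	// Simple string form: "all", "hour", ...
	var s string
	if err := json.Unmarshal(data, &s); err == nil {
		g := NewSimple()
		g.SetGranularity(SimpleGranularity(s))
		return g, nil
	}
	// Object form: {"type": "all"} and friends.
	var t struct {
		Typ string `json:"type"`
	}
	if err := json.Unmarshal(data, &t); err != nil {
		return nil, err
	}
	switch t.Typ {
	case "all", "none", "hour":
		g := NewSimple()
		g.SetGranularity(SimpleGranularity(t.Typ))
		return g, nil
	case "period":
		g := NewPeriod()
		return g, json.Unmarshal(data, g)
	default:
		return nil, errors.New("unsupported type")
	}
}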

Not filter

The Not filter should take a filter as its SetField argument, not a string, i.e. a signature along the lines of the sketch below.
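
A minimal sketch (the Field name is an assumption):

// SetField takes the filter being negated rather than a string.
func (n *Not) SetField(field builder.Filter) *Not {
	n.Field = field
	return n
}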

Sub-query example query

Error 2/3

{
"batchSize":20480,
"columns":["__time","channel","cityName","comment","count","countryIsoCode","diffUrl","flags","isAnonymous","isMinor","isNew","isRobot","isUnpatrolled","metroCode","namespace","page","regionIsoCode","regionName","sum_added","sum_commentLength","sum_deleted","sum_delta","sum_deltaBucket","user"],
"dataSource":{"type":"query","query":{"queryType":"scan","dataSource":{"type":"table","name":"A"},"columns":["AT"],"intervals":{"type":"intervals","intervals":["1980-06-12T22:30:00.000Z/2020-01-26T23:00:00.000Z"]}}},
"filter":{"dimension":"countryName","extractionFn":{"locale":"","type":"lower"},"type":"selector","value":"france"},
"intervals":{"type":"intervals","intervals":["1980-06-12T22:30:00.000Z/2020-01-26T23:00:00.000Z"]},
"limit":10,
"order":"descending",
"queryType":"scan"
}

The above query fails when executed directly on Apache Druid with the error below. I think this is a query error and not a bug in the code; perhaps we just need to fix the query.

Time-ordering on scan queries is only supported for queries with segment specs of type MultipleSpecificSegmentSpec or SpecificSegmentSpec...a [MultipleIntervalSegmentSpec] was received instead.

Indeed, this query must never have worked. Introduced here: 4995687#diff-7c1b8c5172fe7687c2af90f55f18e7e5eacf13af953ccc2bbeb5de3fa56e2688R25

It was probably used only to debug the object model of the query (specific to the circular dependency) rather than to get results. We should fix it so that it is valid and returns results while still exercising the sub-query case; dropping "order":"descending" (or setting it to "none") would presumably satisfy the segment-spec restriction the error describes.

Originally posted by @jbguerraz in #13 (comment)

Additional field validations

Hi.

I'm looking into adding support for a new aggregation type. While doing it, it occurred to me that it might be a good idea to validate the field values as well, according to what Druid requires. For example, the lgK field in DataSketches HLL has to be in the [4,21] range, so the setter method might check this.

What do you think? Returning an error from the setter is a breaking change, so one alternative is storing errors in the struct, which could then be returned via an Error method, or something along those lines.
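
A sketch of the "store errors in the struct" alternative, using the lgK example; the type and field names are illustrative only.

import "fmt"

type HLLSketchBuild struct {
	Base
	LgK int64 `json:"lgK,omitempty"`
	err error // accumulated validation error, never marshalled
}

// SetLgK validates the documented [4,21] range without breaking the
// fluent setter chain.
func (h *HLLSketchBuild) SetLgK(lgK int64) *HLLSketchBuild {
	if lgK < 4 || lgK > 21 {
		h.err = fmt.Errorf("lgK must be in [4,21], got %d", lgK)
		return h
	}
	h.LgK = lgK
	return h
}

// Error reports any validation failure recorded by the setters.
func (h *HLLSketchBuild) Error() error { return h.err }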

New contributor

Hello,

I've just forked your project to add the notion of a Task, which is needed to launch a batch-mode data ingestion. I'd like to contribute directly rather than via PRs, because in Go making a PR from a fork is no panacea (you first have to find/replace the module path in the imports and in go.mod). Would it therefore be possible to add me to the project? If needed, I'll make my additions in a separate branch and then go through a PR for integration into master. What do you think?

support long and short interval spec

When a query is formulated, we marshal Interval to the short spec due to #17. We need to support both the long and short specs for all Interval cases except the interval filter.
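
A sketch of an UnmarshalJSON that tolerates both specs (the Intervals wrapper type is an assumption):

// UnmarshalJSON accepts both the short spec ["start/end", ...] and the
// long spec {"type": "intervals", "intervals": ["start/end", ...]}.
func (i *Intervals) UnmarshalJSON(data []byte) error {
	var short []Interval
	if err := json.Unmarshal(data, &short); err == nil {
		i.Intervals = short
		return nil
	}
	var long struct {
		Intervals []Interval `json:"intervals"`
	}
	if err := json.Unmarshal(data, &long); err != nil {
		return err
	}
	i.Intervals = long.Intervals
	return nil
}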

Support skip TLS verification

Most plugins let you skip TLS verification; though not recommended, it can be helpful in some scenarios. Changes are needed in the frontend, the backend, and go-druid.
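
The standard library already has the knob for this; a sketch of the client side (whether go-druid exposes an option to inject the HTTP client is an assumption):

import (
	"crypto/tls"
	"net/http"
)

// insecureClient skips certificate verification. Not recommended outside
// testing, but it could be handed to the druid client via a client option.
var insecureClient = &http.Client{
	Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	},
}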

