
promscale's Introduction

Warning

Promscale has been discontinued and is deprecated.

The code in this repository is no longer maintained.

Learn more.

Promscale


Promscale is a unified metric and trace observability backend for Prometheus, Jaeger and OpenTelemetry built on PostgreSQL and TimescaleDB.

Promscale serves as a robust, 100% PromQL-compliant Prometheus remote storage and as a durable, scalable, certified Jaeger storage backend.

Unlike other observability backends, it has a simple and easy-to-manage architecture with just two components: the Promscale Connector and the Promscale Database (PostgreSQL with the TimescaleDB and Promscale extensions).

Quick Start

Try it out now with our demo environment, which you can deploy on your laptop in five minutes with Docker.

git clone https://github.com/timescale/promscale.git
cd promscale/docker-compose/promscale-demo
docker compose up -d

Explore your metrics and traces in Grafana (http://localhost:3000, username: admin, password: admin) and Jaeger (http://localhost:16686).

Check our short demo guide to learn more.

Key Features

  • Prometheus metric storage: support for remote write, remote read, 100% PromQL, metric metadata, exemplars and Prometheus HA.
  • Certified Jaeger trace storage: Promscale is a certified Jaeger storage backend. Integrate Jaeger with Promscale to store and visualize your traces with a simple configuration change in Jaeger. Use Promscale as the storage backend for the metrics required by the Service Performance Management UI. No need for a separate Prometheus/PromQL-compatible storage.
  • OpenTelemetry trace storage: support for ingestion of traces through the OpenTelemetry Protocol (OTLP).
  • Grafana integration: query and visualize your metrics and traces using the PromQL, SQL and Jaeger datasources.
  • Durable and reliable storage: built on top of the maturity of Postgres and TimescaleDB with millions of instances worldwide. A trusted system that offers high availability, replication, data integrity, data compression, backups, authentication, roles and permissions.
  • PromQL Alerts: full support for PromQL alerting rules. You can reuse the Prometheus configuration that you already have.
  • Multi-tenancy: support for Prometheus multi-tenancy so you can restrict data access by tenant.
  • Pick your query language: PromQL for metrics and SQL for metrics and traces. With full SQL support together with TimescaleDB's advanced analytics functions, you can query and correlate metrics, traces, and business data to derive new insights (see the SQL sketch after this list).
  • Flexible data management: configurable default retention for metrics and traces as well as per-metric retention and APIs to delete metric series that are no longer needed.
  • Downsampling: increase the performance of long-term queries by downsampling metrics with PromQL recording rules and TimescaleDB continuous aggregates. Combine downsampling with per-metric retention to only keep the data you need, reduce costs and accelerate performance.
  • Out-of-the-box monitoring: leverage the dashboard, alerting rules and runbooks built by the Promscale team to start monitoring Promscale from day one, following best practices from the team behind the product.
  • Easy data migration: use our prom-migrator tool to effortlessly migrate your existing Prometheus data to Promscale.
  • Simplified deployment on K8s: use tobs to deploy and manage a complete, pre-configured and production-ready observability stack for metrics and traces on a K8s cluster that includes Promscale, Prometheus, OpenTelemetry with auto-instrumentation, Grafana and plenty of out-of-the-box dashboards and alerts.
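
Because Promscale stores each metric in a plain PostgreSQL table, the SQL path needs no special client. A minimal sketch, assuming a metric table in the prom_data schema with the ("time", "value", "series_id") layout that appears in the logs further down this page; the metric name go_goroutines is hypothetical:

-- Downsample one (hypothetical) metric to 5-minute buckets with TimescaleDB's time_bucket.
SELECT time_bucket('5 minutes', "time") AS bucket,
       series_id,
       avg(value) AS avg_value,
       max(value) AS max_value
FROM prom_data."go_goroutines"
WHERE "time" > now() - INTERVAL '1 day'
GROUP BY bucket, series_id
ORDER BY bucket;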

Architecture

Learn more about Promscale's architecture and how it works.

Promscale Architecture Diagram

Promscale for Prometheus

Promscale provides Prometheus users with:

  • A single-pane-of-glass across all Kubernetes clusters
    Use Promscale as a centralized storage for all your Prometheus instances so you can easily query data across all of them in Grafana and centralize alert management and recording rules. Use multi-tenancy to control who has access to the data for a Kubernetes cluster.

  • Efficient long-term trend analysis
    Use Promscale as a durable long-term storage for Prometheus metrics with a proven and rock-solid foundation based on PostgreSQL and TimescaleDB with millions of instances worldwide. With metric downsampling and per-metric retention you can keep just the data you need for your analysis for as long as you need. This allows you to cut down the costs associated with using the same retention for all data in Prometheus and dramatically improves query performance for long-term queries.

Key features: 100% PromQL-compliant, high availability, multi-tenancy, PromQL alerting and recording rules, downsampling, per-metric retention.

If you are already familiar with PostgreSQL, then Promscale is a great choice for your Prometheus remote storage. You can scale to millions of series and hundreds of thousands of samples per second on a single PostgreSQL node thanks to TimescaleDB.
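
For example, per-metric retention is set directly in SQL. A sketch, assuming the prom_api retention helpers described in the Promscale documentation (the metric name is hypothetical):

-- Keep most metrics for 90 days, but one high-value metric for a year.
SELECT prom_api.set_default_retention_period(INTERVAL '90 days');
SELECT prom_api.set_metric_retention_period('node_cpu_seconds_total', INTERVAL '1 year');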

To get started:

  1. Install Promscale.
  2. Configure Prometheus to send data to Promscale.
  3. Configure Grafana to query and visualize metrics from Promscale using a PromQL and/or a PostgreSQL datasource.

Promscale for Jaeger and OpenTelemetry

Promscale supports ingesting Jaeger and OpenTelemetry traces via the Jaeger Collector and the OpenTelemetry Collector. OpenTelemetry traces can also be sent directly from OpenTelemetry client libraries via the OpenTelemetry Protocol (OTLP). Promscale is a certified Jaeger storage backend that passes 100% of the compliance tests.

Promscale provides Jaeger and OpenTelemetry users with:

  • An easy-to-use durable and scalable storage backend for traces
    Most users run Jaeger with the in-memory or Badger storage because the two options recommended for production (Elasticsearch and Cassandra) are difficult to set up and operate. Promscale uses a much simpler architecture based on PostgreSQL, which many developers are comfortable with, and scales to hundreds of thousands of spans per second on a single database node.

  • Service performance analysis
    Because Promscale can store both metrics and traces, you can use the new Service Performance Management feature in Jaeger with Promscale as the only storage backend for the entire experience. Promscale also includes a fully customizable, out-of-the-box, and modern Application Performance Management (APM) experience in Grafana built using SQL queries on traces.

  • Trace analysis
    Jaeger's search capabilities are limited to filtering individual traces. This is helpful when troubleshooting problems once you know what you are looking for. With Promscale, you can use SQL to interrogate your trace data in any way you want and discover issues that would usually take a long time to figure out by looking only at log lines, metric charts or individual traces. You can see some examples in our documentation and in this blog post, and a short sketch below.
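
As an illustration, here is a sketch of such a query, assuming the ps_trace.span view from the Promscale trace schema (column names as documented by the project):

-- Find the operations with the worst p99 latency over the last hour.
SELECT service_name,
       span_name,
       percentile_cont(0.99) WITHIN GROUP (ORDER BY duration_ms) AS p99_ms
FROM ps_trace.span
WHERE start_time > now() - INTERVAL '1 hour'
GROUP BY service_name, span_name
ORDER BY p99_ms DESC
LIMIT 10;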

Key features: native OTLP support, high availability, SQL queries, APM capabilities, data compression, data retention.

Try it out by installing our lightweight opentelemetry-demo with a single command. Check this blog post for more details.

To get started:

  1. Install Promscale.
  2. Send traces to Promscale in Jaeger, OpenTelemetry, or Zipkin format.
  3. Configure Jaeger to query and visualize traces from Promscale.

Also consider:

  1. Configure Grafana to query and visualize traces from Promscale using a Jaeger and a PostgreSQL datasource.
  2. Install the APM dashboards in Grafana.

Documentation and Help

Complete user documentation is available at https://docs.timescale.com/promscale/latest/

If you have any questions, please join the #promscale channel on TimescaleDB Slack.

Promscale Repositories

This repository contains the source code of the Promscale Connector. Promscale also requires the Promscale extension, which lives in a separate repository, to be installed in the TimescaleDB/PostgreSQL database. The extension sets up and manages the database schemas and provides performance and SQL query experience improvements.

This repository also contains the source code for prom-migrator. Prom-migrator is an open-source, community-driven and free-to-use universal Prometheus data migration tool that migrates data from one storage system to another, leveraging Prometheus's remote storage endpoints. For more information about prom-migrator, visit prom-migrator's README.

You may also want to check out tobs, which makes it very easy to deploy a complete observability stack built on Prometheus, OpenTelemetry and Promscale in Kubernetes via Helm.

Contributing

We welcome contributions to the Promscale Connector, which is licensed and released under the open-source Apache License, Version 2.0. The same Contributor's Agreement applies as in TimescaleDB; please sign the Contributor License Agreement (CLA) if you're a new contributor.

Release

The release checklist is available when creating a new "Release Checklist" issue.


promscale's Issues

Uses too many connections

  • timescaledb
    • docker install - latest
    • 16G, max_connections 500
  • two timescale/timescale-prometheus - docker installed, latest
    • used all connections, still reporting FATAL: sorry, too many clients already (SQLSTATE 53300)
  • prometheus - 10 node-exporters

timescale-prometheus uses prepared statements, so it cannot use transaction pooling to reduce the number of connections.

Remote read not working as expected

Hi Team,
I am able to successfully write metrics to the connector in the same format as the remote write API of Prometheus. I can see the metrics in TimescaleDB.
But when I try to read the metrics using PromQL from the Prometheus dashboard, I don't see the custom metrics that I manually wrote.
I can see other metrics in Prometheus, like node exporter metrics, etc.
Can you please help me resolve this issue?

Document how to install the TSL version of timescaledb + better error msg if running ApacheOnly

{"caller":"client.go:81","err starting ingestor":"ERROR: functionality not supported under the current license "ApacheOnly", license (SQLSTATE 0A000)","level":"error","ts":"2020-06-30T09:45:10.995Z"}

The connector works fine for some time, then after a while it stops running with the above error.

Also, this warning was logged during the initial start:
{"Got an error finalizing metric: %v":"ERROR: functionality not supported under the current license "ApacheOnly", license (SQLSTATE 0A000)","caller":"pgx.go:287","level":"warn","ts":"2020-06-30T08:16:41.365Z"}

query returned wrong number of labels: 1, 0

Querying process_start_time_seconds fails on prom-b, while prom-a can execute it correctly:

remote_read: remote server http://10.1.1.3:9202/read returned HTTP status 500 Internal Server Error: query returned wrong number of labels: 1, 0

https://github.com/timescale/timescale-prometheus/blob/6693256251629559a609f9e7f56131d6107eba2f/pkg/pgmodel/sql_reader.go#L252

Both node_exporter and postgres_exporter have a process_start_time_seconds metric.

Error log:

{
  "caller": "read.go:45",
  "err": "query returned wrong number of labels: 1, 0",
  "level": "warn",
  "msg": "Error executing query",
  "query": {
    "queries": [
      {
        "start_timestamp_ms": 1596877133800,
        "end_timestamp_ms": 1596881033800,
        "matchers": [
          { "name": "__name__", "value": "process_start_time_seconds" }
        ],
        "hints": {
          "step_ms": 14000,
          "start_ms": 1596877133800,
          "end_ms": 1596881033800
        }
      }
    ]
  },
  "storage": "PostgreSQL",
  "ts": "2020-08-08T10:03:53.854Z"
}

Production setup installation

Hi. I am new to TimescaleDB. I am searching for a way to install it and send my Prometheus instances' data to it. The recommended way is to use the timescale-observability Helm chart.

Is this production-ready? Or should I instead use the Timescale AWS AMI for more stability?

Thanks for your help!

Read-only prometheus with timescaledb remote storage

Hi. I have trouble setting up a prometheus instance which only reads data from a timescaleDB remote storage.

What I tried to do

My idea is to have 2 prometheus instances: one which writes data into a timescale db, and another which reads the data from the same timescale db. I use the timescale-observability chart to set up timescaledb + timescale-prometheus (read) + prometheus + grafana. The other prometheus (write) is installed separately in another location.

The result

The "write" prometheus works fine and I can see tables and data in timescaledb. I can even query it with the timescaleDB connector in grafana. However, my "read" prometheus instance can't see anything. It is empty (querying endpoint api/v1/label/__name__/values shows no data). I double checked the prometheus configuration but it is fine. It is actually automatically generated by the chart.

I found very few docs on the remote_read feature of prometheus so I struggle to know if I understand it correctly. From what I saw, prometheus will read data from the remote storage by querying it with matchers and time ranges.Does this means we have to know the metrics beforehand and query them at least one time to make them visible in prometheus?

I also found a few issues similar to mine on the previous adapter:

Questions

In short:

  • Is it possible to setup a "read-only" prometheus with a timescaledb remote storage?
  • Is it encouraged or should we connect directly to the timescaledb database with grafana instead?
    I would prefer to use prometheus in between because many more Grafana dashboards exist for the prometheus datasource.
  • If we can do the "read-only" setup, how can I troubleshoot my read problems?
    I have no error logs either in prometheus or the adapter.

Thanks for your help!

FEATURE REQUEST: Have the adapter import data that wasn't imported due to the database being down

I believe a feature that allows the adapter to temporarily store metric data while it's unable to connect to the database, and to import that data once it reconnects, would be valuable. This would help those who need to perform maintenance on the database (e.g. upgrading PostgreSQL from v11 to v12 and migrating data to work with v12) without having to worry about losing data during the maintenance window.

This should also apply to multi-prometheus setups too.

I'm curious to hear what you guys think of this.

Do not enable compression in connector (make ApacheOnly compatible)

Hi,

I am experiencing the "err starting ingestor" bug/feature and it's not clear to me how to disable compression.

Downloading the pre-built timescale-prometheus shows no options to disable compression at runtime. Reading your very thorough documentation on compression, it seems like:

ALTER TABLE <table_name> SET (timescaledb.compress = false);

should work. Is this correct?

Remove/replace language internal SQL functions

In our SQL scripts, we are using LANGUAGE internal for two functions, namely:
label_jsonb_each_text and label_unnest.

Usage of this language is considered untrusted by many DB providers, so we should remove or replace these functions with their equivalents in a trusted language.

runtime error: invalid memory address or nil pointer dereference PromQL API

When using the adapter directly in Grafana and doing a PromQL query such as:

sum (container_memory_working_set_bytes{project="production",region="us-central1",kubernetes_io_hostname=~"(.+)-np1-(.+)"}) / sum (machine_memory_bytes{project="production",region="us-central1",kubernetes_io_hostname=~"(.+)-np1-(.+)"}) * 100

The following error is generated in the adapter:

{"caller":"series_set.go:82","err":"query returned wrong number of labels: 4, 1","level":"error","ts":"2020-08-13T19:45:42.364Z"} {"caller":"series_set.go:82","err":"query returned wrong number of labels: 4, 1","level":"error","ts":"2020-08-13T19:45:42.366Z"} {"caller":"panic.go:967","err":"runtime error: invalid memory address or nil pointer dereference","level":"error","msg":"runtime panic in parser","stacktrace":"goroutine 6572 [running]:\ngithub.com/timescale/timescale-prometheus/pkg/promql.(*evaluator).recover(0xc004346a80, 0xc0544f5060)\n\t/go/timescale-prometheus/pkg/promql/engine.go:860 +0xd8\npanic(0xcf6720, 0x1590900)\n\t/usr/local/go/src/runtime/panic.go:967 +0x166\ngithub.com/timescale/timescale-prometheus/pkg/promql.(*evaluator).eval(0xc004346a80, 0xf53ce0, 0xc005a63c00, 0xc0544f5028, 0x40cf28)\n\t/go/timescale-prometheus/pkg/promql/engine.go:1376 +0x48d\ngithub.com/timescale/timescale-prometheus/pkg/promql.(*evaluator).Eval(0xc004346a80, 0xf53ce0, 0xc005a63c00, 0x0, 0x0, 0x0, 0x0)\n\t/go/timescale-prometheus/pkg/promql/engine.go:871 +0x88\ngithub.com/timescale/timescale-prometheus/pkg/promql.(*Engine).execEvalStmt(0xc00002e420, 0xf53820, 0xc053d52bd0, 0xc01ccdf0e0, 0xc0549a3c70, 0x0, 0x0, 0x0, 0x0, 0x0, ...)\n\t/go/timescale-prometheus/pkg/promql/engine.go:621 +0x1081\ngithub.com/timescale/timescale-prometheus/pkg/promql.(*Engine).exec(0xc00002e420, 0xf53820, 0xc053d52bd0, 0xc01ccdf0e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)\n\t/go/timescale-prometheus/pkg/promql/engine.go:517 +0x5b7\ngithub.com/timescale/timescale-prometheus/pkg/promql.(*query).Exec(0xc01ccdf0e0, 0xf53760, 0xc0221e6a40, 0xc0540c2c80)\n\t/go/timescale-prometheus/pkg/promql/engine.go:214 +0x94\ngithub.com/timescale/timescale-prometheus/pkg/api.QueryRange.func1(0xf4c4a0, 0xc005a63b90, 0xc023ad3e00)\n\t/go/timescale-prometheus/pkg/api/query_range.go:85 +0xb9e\nnet/http.HandlerFunc.ServeHTTP(0xc000426540, 0xf4c4a0, 0xc005a63b90, 0xc023ad3e00)\n\t/usr/local/go/src/net/http/server.go:2012 +0x44\ngithub.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1(0xf4c0a0, 0xc02af71500, 0xc023ad3e00)\n\t/go/pkg/mod/github.com/!n!y!times/[email protected]/gzip.go:336 +0x211\nnet/http.HandlerFunc.ServeHTTP(0xc0002e2570, 0xf4c0a0, 0xc02af71500, 0xc023ad3e00)\n\t/usr/local/go/src/net/http/server.go:2012 +0x44\nmain.timeHandler.func1(0xf4c0a0, 0xc02af71500, 0xc023ad3e00)\n\t/go/timescale-prometheus/cmd/timescale-prometheus/main.go:386 +0xc5\ngithub.com/prometheus/common/route.(*Router).handle.func1(0xf4c0a0, 0xc02af71500, 0xc023ad3d00, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/route/route.go:83 +0x27f\ngithub.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc000099aa0, 0xf4c0a0, 0xc02af71500, 0xc023ad3d00)\n\t/go/pkg/mod/github.com/julienschmidt/[email protected]/router.go:387 +0xc37\ngithub.com/prometheus/common/route.(*Router).ServeHTTP(0xc00000ce80, 0xf4c0a0, 0xc02af71500, 0xc023ad3d00)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/route/route.go:121 +0x4c\nnet/http.(*ServeMux).ServeHTTP(0xc000286a00, 0xf4c0a0, 0xc02af71500, 0xc023ad3d00)\n\t/usr/local/go/src/net/http/server.go:2387 +0x1a5\nnet/http.serverHandler.ServeHTTP(0xc0002880e0, 0xf4c0a0, 0xc02af71500, 0xc023ad3d00)\n\t/usr/local/go/src/net/http/server.go:2807 +0xa3\nnet/http.(*conn).serve(0xc0140b5b80, 0xf53760, 0xc0131289c0)\n\t/usr/local/go/src/net/http/server.go:1895 +0x86c\ncreated by net/http.(*Server).Serve\n\t/usr/local/go/src/net/http/server.go:2933 +0x35c\n","ts":"2020-08-13T19:45:42.366Z"} 
{"caller":"query_range.go:87","endpoint":"query_range","level":"error","msg":"unexpected error: runtime error: invalid memory address or nil pointer dereference","ts":"2020-08-13T19:45:42.366Z"}

The 'wrong number of labels' error occurs persistently when querying either through prometheus using remote_read or through the adapter directly; however, the runtime error only occurs when Grafana queries the adapter directly.

Add custom logger to caching config

The caching lib currently in use, bigcache, outputs its logs to standard out.

We should set a custom logger for it that conforms to the format of the global logging lib and only outputs the logs at debug level.

Where is the prometheus_postgresql_extra extension?

I am having a hard time trying to figure out where I can download prometheus_postgresql_extra to add to PostgreSQL as an extension. I am doing this through a manual setup (Ansible playbook) instead of using Helm.

Document various ways to delete data

Prometheus offers an option to delete some metrics via its API; however, that doesn't impact any data written through remote read/write. Is there a way for me to delete metrics through PostgreSQL?

I could not find any documentation on how to do this. Perhaps a section in the design document or Wiki to show how to do metric deletions would be beneficial?
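
In the meantime, here is a sketch of the raw-SQL route, assuming a metric table in the prom_data schema with the layout shown in the logs elsewhere on this page (the metric name below is just one that appears in those logs; note that deleting from compressed chunks may be rejected, as another issue on this page shows):

-- Delete rows older than 90 days for one metric, directly in SQL.
DELETE FROM prom_data."kube_statefulset_status_current_revision"
WHERE "time" < now() - INTERVAL '90 days';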

Feature: Also provide a direct PromQL interface

It would be useful if timescale-prometheus also provided a PromQL interface, to avoid an additional data round trip via Prometheus when querying data via PromQL.
This should improve performance, reduce load on Prometheus and maybe also allow further SQL query optimizations. There is also the problem that, when you are running multiple parallel prometheuses (HA setup), which one should handle the remote_read request...
Using the SQL interface (i.e. from Grafana) makes reuse of existing dashboards much harder, and PromQL is easier to work with for sysadmins and other users who are not SQL experts.

Having a PromQL interface also makes it easier to integrate TimescaleDB with other PromQL-compatible sources via Promxy; that's my use case: I'm aggregating multiple Prometheuses and TimescaleDB.
Maybe see also the related Promxy issue.

Btw, thanks for this project. A few days ago I started to think about how to restructure my metrics storage to handle (read) load better, and you have come up with changes that seem to fix most of my problems ;-) And yes, having an SQL interface for analytical purposes was my primary reason for choosing TimescaleDB and not VictoriaMetrics or similar...

Create/document a way to use the adapter without permission to create extensions

Perhaps allow the extensions to be created ahead of time, or simply run the adapter in "schema creation mode" with superuser so that it just sets everything up and quits, and can therefore be run in a job (see the sketch below).

Inspired by "To clarify slightly, I expect we'll always have the timescale extension available, but the adapter may not have superuser. Even in some of the hosted platforms that use an extension instead of direct superuser permissions, that is a lot of access to grant the write-only adapters. A way for operators to enable the extension and create parts of the schema ahead of time would be super helpful." in design doc.
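
A sketch of the "ahead of time" option, run once as superuser; the extension and schema names are taken from the logs further down this page, and whether this alone satisfies the adapter is an assumption:

-- Pre-create the extensions so the write-only adapter never needs superuser.
CREATE EXTENSION IF NOT EXISTS timescaledb;
CREATE SCHEMA IF NOT EXISTS _prom_ext;
CREATE EXTENSION IF NOT EXISTS timescale_prometheus_extra WITH SCHEMA _prom_ext;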

ERROR: could not open extension control file "/usr/share/postgresql/12/extension/timescale_prometheus_extra.control": No such file or directory

Hi,

I am looking to use timescale-prometheus with my existing postgres container. I have installed TimescaleDB on my postgres container using the link and enabled the timescaledb extension. After that I am getting the below error in the timescale-prometheus container. Please let me know what I need to do to activate this extension:

db_1 | 2020-06-23 14:49:00.717 UTC [56] ERROR: could not open extension control file "/usr/share/postgresql/12/extension/timescale_prometheus_extra.control": No such file or directory
db_1 | 2020-06-23 14:49:00.717 UTC [56] STATEMENT: CREATE EXTENSION IF NOT EXISTS timescale_prometheus_extra WITH SCHEMA _prom_ext;

I have tried running CREATE EXTENSION IF NOT EXISTS timescale_prometheus_extra WITH SCHEMA _prom_ext; on the postgres database, but there I am getting the below error:

postgres=# CREATE EXTENSION IF NOT EXISTS timescale_prometheus_extra WITH SCHEMA _prom_ext;
ERROR: could not open extension control file "/usr/share/postgresql/12/extension/timescale_prometheus_extra.control": No such file or directory

Can't find timescale/timescale-observability chart in timescale repository

Hi. I am new to TimescaleDB. I am searching for a way to install it and send my Prometheus instances' data to it. The recommended way is to use the timescale-observability chart.

However, it isn't available in the timescale chart repository.

helm search repo timescale
NAME                            CHART VERSION   APP VERSION     DESCRIPTION
timescale/timescaledb-multinode 0.3.0                           TimescaleDB Multinode Deployment.
timescale/timescaledb-single    0.5.5                           TimescaleDB HA Deployment.

I will use the helm-git plugin to install it. However, it would be awesome to either publish the chart, since it is what the READMEs suggest (here and here), or update them to tell us to use the git files instead.

Optimize multi-metric queries

Right now, we send individual queries for each metric that a query requests. We could further optimize this by sending them all in a single batch and parsing the results in one go (sketched below).

#124 (comment)
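
A sketch of the batching idea in plain SQL, with hypothetical metric tables: several per-metric queries collapse into one round trip.

-- One statement instead of one query per metric.
SELECT 'metric_a' AS metric, "time", value, series_id
FROM prom_data."metric_a" WHERE "time" > now() - INTERVAL '1 hour'
UNION ALL
SELECT 'metric_b' AS metric, "time", value, series_id
FROM prom_data."metric_b" WHERE "time" > now() - INTERVAL '1 hour';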

histogram_quantile() function throws remote_read error

Prometheus: v2.17.1
Timescale-Prometheus: valpha.2
PostgreSQL: v11

Whenever I use the histogram_quantile() function for any metric, it keeps throwing the following error:

remote_read: error reading response: snappy: corrupt input

I am not sure if this is an issue on Prometheus' end or timescale-prometheus' end, but since the error mentions remote_read, I guess it's on timescale-prometheus' end?

ERROR: insert/update/delete not permitted on chunk

I installed timescale-prometheus from timescale-observability helm chart.
There was an issue with the prometheus remote_write/remote_read configuration, so timescale-prometheus was not receiving the metrics from prometheus for 10 hours.
I updated the prometheus settings to point at the right timescale-prometheus connector URL. It was working fine, but after a while I started receiving the following errors on the timescale-prometheus pod.

,"level":"warn","msg":"Error sending samples to remote storage","num_samples":100,"ts":"2020-05-08T13:45:43.635Z"}
{"caller":"log.go:51","err":"ERROR: insert/update/delete not permitted on chunk \"_hyper_1977_685_chunk\" (SQLSTATE 0A000)","level":"warn","msg":"Error sending samples to remote storage","num_samples":2,"ts":"2020-05-08T13:45:43.652Z"}
{"caller":"log.go:51","err":"ERROR: insert/update/delete not permitted on chunk \"_hyper_1429_394_chunk\" (SQLSTATE 0A000)","level":"warn","msg":"Error sending samples to remote storage","num_samples":100,"ts":"2020-05-08T13:45:43.661Z"}
{"caller":"log.go:51","err":"ERROR: insert/update/delete not permitted on chunk \"_hyper_1977_685_chunk\" (SQLSTATE 0A000)","level":"warn","msg":"Error sending samples to remote storage","num_samples":92,"ts":"2020-05-08T13:45:43.685Z"}
{"caller":"log.go:51","err":"ERROR: insert/update/delete not permitted on chunk \"_hyper_1549_465_chunk\" (SQLSTATE 0A000)","level":"warn","msg":"Error sending samples to remote storage","num_samples":94,"ts":"2020-05-08T13:45:43.831Z"}
{"caller":"log.go:51","err":"ERROR: insert/update/delete not permitted on chunk \"_hyper_1549_465_chunk\" (SQLSTATE 0A000)","level":"warn","msg":"Error sending samples to remote storage","num_samples":100,"ts":"2020-05-08T13:45:43.840Z"}
{"caller":"log.go:51","err":"ERROR: insert/update/delete not permitted on chunk \"_hyper_1549_465_chunk\" (SQLSTATE 0A000)","level":"warn","msg":"Error sending samples to remote storage","num_samples":100,"ts":"2020-05-08T13:45:43.857Z"}
{"caller":"log.go:51","err":"ERROR: insert/update/delete not permitted on chunk \"_hyper_1429_394_chunk\" (SQLSTATE 0A000)","level":"warn","msg":"Error sending samples to remote storage","num_samples":93,"ts":"2020-05-08T13:45:43.893Z"}
{"caller":"log.go:51","err":"ERROR: insert/update/delete not permitted on chunk \"_hyper_1941_669_chunk\" (SQLSTATE 0A000)","level":"warn","msg":"Error sending samples to remote storage","num_samples":100,"ts":"2020-05-08T13:45:43.985Z"}
{"caller":"log.go:51","err":"ERROR: insert/update/delete not permitted on chunk \"_hyper_1941_669_chunk\" (SQLSTATE 0A000)","level":"warn","msg":"Error sending samples to remote storage","num_samples":100,"ts":"2020-05-08T13:45:43.995Z"}
{"caller":"log.go:51","err":"ERROR: insert/update/delete not permitted on chunk \"_hyper_1815_585_chunk\" (SQLSTATE 0A000)","level":"warn","msg":"Error sending samples to remote storage","num_samples":100,"ts":"2020-05-08T13:45:44.061Z"}

Errors on timescaledb

020-05-08 13:47:04 UTC [15991]: [5eb556b2.3e77-29011] postgres@postgres,app=[unknown] [0A000] HINT:  Make sure the chunk is not compressed.
2020-05-08 13:47:04 UTC [15991]: [5eb556b2.3e77-29012] postgres@postgres,app=[unknown] [0A000] CONTEXT:  COPY apiserver_admission_controller_admission_duration_seconds__1006, line 1
2020-05-08 13:47:04 UTC [15991]: [5eb556b2.3e77-29013] postgres@postgres,app=[unknown] [0A000] STATEMENT:  copy "prom_data"."apiserver_admission_controller_admission_duration_seconds__1006" ( "time", "value", "series_id" ) from stdin binary;
2020-05-08 13:47:05 UTC [15991]: [5eb556b2.3e77-29014] postgres@postgres,app=[unknown] [0A000] ERROR:  insert/update/delete not permitted on chunk "_hyper_1941_669_chunk"
2020-05-08 13:47:05 UTC [15991]: [5eb556b2.3e77-29015] postgres@postgres,app=[unknown] [0A000] HINT:  Make sure the chunk is not compressed.
2020-05-08 13:47:05 UTC [15991]: [5eb556b2.3e77-29016] postgres@postgres,app=[unknown] [0A000] CONTEXT:  COPY kube_statefulset_status_current_revision, line 1
2020-05-08 13:47:05 UTC [15991]: [5eb556b2.3e77-29017] postgres@postgres,app=[unknown] [0A000] STATEMENT:  copy "prom_data"."kube_statefulset_status_current_revision" ( "time", "value", "series_id" ) from stdin binary;
2020-05-08 13:47:05 UTC [15991]: [5eb556b2.3e77-29018] postgres@postgres,app=[unknown] [0A000] ERROR:  insert/update/delete not permitted on chunk "_hyper_1429_394_chunk"
2020-05-08 13:47:05 UTC [15991]: [5eb556b2.3e77-29019] postgres@postgres,app=[unknown] [0A000] HINT:  Make sure the chunk is not compressed.
2020-05-08 13:47:05 UTC [15991]: [5eb556b2.3e77-29020] postgres@postgres,app=[unknown] [0A000] CONTEXT:  COPY kubelet_runtime_operations_latency_microseconds_count, line 1
2020-05-08 13:47:05 UTC [15991]: [5eb556b2.3e77-29021] postgres@postgres,app=[unknown] [0A000] STATEMENT:  copy "prom_data"."kubelet_runtime_operations_latency_microseconds_count" ( "time", "value", "series_id" ) from stdin binary;
2020-05-08 13:47:05 UTC [16044]: [5eb556d1.3eac-28823] postgres@postgres,app=[unknown] [0A000] ERROR:  insert/update/delete not permitted on chunk "_hyper_1549_465_chunk"
2020-05-08 13:47:05 UTC [16044]: [5eb556d1.3eac-28824] postgres@postgres,app=[unknown] [0A000] HINT:  Make sure the chunk is not compressed.
2020-05-08 13:47:05 UTC [16044]: [5eb556d1.3eac-28825] postgres@postgres,app=[unknown] [0A000] CONTEXT:  COPY kubelet_cgroup_manager_duration_seconds_count, line 1
2020-05-08 13:47:05 UTC [16044]: [5eb556d1.3eac-28826] postgres@postgres,app=[unknown] [0A000] STATEMENT:  copy "prom_data"."kubelet_cgroup_manager_duration_seconds_count" ( "time", "value", "series_id" ) from stdin binary;
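
One way to unblock backfilling into an already-compressed window is to decompress the affected chunks first. A sketch, assuming TimescaleDB's decompress_chunk function; the chunk name is taken from the log above, and whether decompression is appropriate here depends on the retention/compression policy:

-- Decompress the chunk that rejects the COPY, then retry the backfill.
SELECT decompress_chunk('_timescaledb_internal._hyper_1941_669_chunk');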

Cronjob drop-chunk failing

I have installed the timescale-observability stack (timescaledb + timescale-prometheus + prometheus), version 0.1.0-alpha.3 and managed to make it work.

However, I notice from time to time a pod of the drop-chunk cronjob in CrashLoopBackOff state:
metrics-timescale-prometheus-drop-chunk-1588905000-9t4rs 0/1 CrashLoopBackOff 5 5m1s

When I look at the logs, I have:

kubectl logs metrics-timescale-prometheus-drop-chunk-1588905000-9t4rs                                                                       SIGINT(2) 
ERROR:  schema "prom" does not exist
LINE 1: CALL prom.drop_chunks();
             ^

When I check in the database, the schema "prom" indeed does not exist.

postgres=# \dn
          List of schemas
          Name           |  Owner
-------------------------+----------
 _prom_catalog           | postgres
 _prom_ext               | postgres
 _timescaledb_cache      | postgres
 _timescaledb_catalog    | postgres
 _timescaledb_config     | postgres
 _timescaledb_internal   | postgres
 prom_api                | postgres
 prom_data               | postgres
 prom_data_series        | postgres
 prom_info               | postgres
 prom_metric             | postgres
 prom_series             | postgres
 public                  | postgres
 timescaledb_information | postgres
(14 rows)

However, I do see the "drop_chunks" procedure in the prom_api schema:

postgres=# select proname,nspname from pg_catalog.pg_proc JOIN pg_namespace ON pg_catalog.pg_proc.pronamespace = pg_namespace.oid where proname = 'drop_chunks';
   proname   | nspname
-------------+----------
 drop_chunks | prom_api
 drop_chunks | public
(2 rows)
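
A possible workaround sketch, given the catalog output above showing where the procedure actually lives (whether the cronjob can simply be repointed at this schema is an assumption):

-- Call the maintenance procedure in the schema it actually lives in.
CALL prom_api.drop_chunks();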

Current leader metric in /metrics for HA setups

Having a metric like current_leader 1 in /metrics would allow us to know which adapter in an HA setup is the current leader. This can help us identify which one is leading and/or whether there's any flipping going on between adapters.

Helm 2 support

The apiVersion in Chart.yaml is set to v2. This stops people with Helm 2 from using this chart.

When I change it to v1, the chart seems to work. Is there a reason why it has been set to v2?
Could we change it to v1 so people stuck on Helm 2 can use the chart?

panic: runtime error: invalid memory address or nil pointer dereference

I encountered the following issue while testing the adapter. It was working for several hours, then the pods got into a crash loop:

{"caller":"log.go:43","level":"info","msg":"Samples write throughput","samples/sec":67,"ts":"2020-04-15T12:57:02.440Z"}
{"caller":"log.go:43","level":"info","msg":"Samples write throughput","samples/sec":1,"ts":"2020-04-15T12:57:03.273Z"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xaf002e]

goroutine 53 [running]:
github.com/timescale/timescale-prometheus/pkg/pgmodel.(*orderedMap).Front(...)
	/go/timescale-prometheus-connector/pkg/pgmodel/pgx.go:654
github.com/timescale/timescale-prometheus/pkg/pgmodel.(*insertHandler).flushTimedOutReqs(0xc0001c8000)
	/go/timescale-prometheus-connector/pkg/pgmodel/pgx.go:549 +0x4e
github.com/timescale/timescale-prometheus/pkg/pgmodel.runInserterRoutine(0xe3e1a0, 0xc0000ba028, 0xc0000bc4e0)
	/go/timescale-prometheus-connector/pkg/pgmodel/pgx.go:504 +0x322
created by github.com/timescale/timescale-prometheus/pkg/pgmodel.newPgxInserter
	/go/timescale-prometheus-connector/pkg/pgmodel/pgx.go:237 +0xaa

Product Feedback Wanted: Support for multi-tenant queries

Today you can already write data from multiple tenants to timescale-prometheus. You'd just have to set up relabel configs (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#relabel_config) on the prometheus server to add labels indicating the tenant.

However, it is not clear how to give a single tenant access to only their own metrics right now. We need more feedback from the community about how this should look.

PromQL: Should this be done with separate connectors for each tenant, where we add a config to a connector saying that only results with a particular label are returned on all queries? If these are not separate connectors, how do we tell from a PromQL query who the tenant making the query is?

Do we need single-tenant SQL access? Should this use row-level security?

What types of queries do individual tenants make? Do we need to have performance isolation on the read or write side?

Are there cross-tenant queries? Who makes them? What do they look like?

Are there any existing projects in the Prometheus ecosystem that do this particularly well? Why do people like their approach?

Panics on arm

Trying to get this to run on a Pi 4 with Docker. I built the Docker image locally and configured it to connect to a (self-built) Docker instance of pg_connector, with the timescaledb & pg_prometheus extensions loaded & configured in that container (both using the master branch).

Configured prometheus to use remote_read/write:

remote_write:
  - url: 'http://192.168.0.10:9201/write'
remote_read:
  - url: 'http://192.168.0.10:9201/read'

However, as soon as I bring up prometheus, I get a flood of the following error in the Docker log files.

connector    | 2020/07/01 12:54:06 http: panic serving 172.30.0.1:52570: runtime error: invalid memory address or nil pointer dereference
connector    | goroutine 1318 [running]:
connector    | net/http.(*conn).serve.func1(0x29b445a0)
connector    | 	/usr/local/go/src/net/http/server.go:1772 +0xf0
connector    | panic(0x7d3bf8, 0xf35700)
connector    | 	/usr/local/go/src/runtime/panic.go:973 +0x3d4
connector    | github.com/timescale/timescale-prometheus/pkg/api.Write.func1(0x9b8500, 0xe3f78c0, 0x19dbf00)
connector    | 	/go/timescale-prometheus/pkg/api/write.go:42 +0x25c
connector    | net/http.HandlerFunc.ServeHTTP(0x244fff00, 0x9b8500, 0xe3f78c0, 0x19dbf00)
connector    | 	/usr/local/go/src/net/http/server.go:2012 +0x34
connector    | main.timeHandler.func1(0x9b8500, 0xe3f78c0, 0x19dbf00)
connector    | 	/go/timescale-prometheus/cmd/timescale-prometheus/main.go:327 +0x7c
connector    | net/http.HandlerFunc.ServeHTTP(0x244fff20, 0x9b8500, 0xe3f78c0, 0x19dbf00)
connector    | 	/usr/local/go/src/net/http/server.go:2012 +0x34
connector    | net/http.(*ServeMux).ServeHTTP(0xf43388, 0x9b8500, 0xe3f78c0, 0x19dbf00)
connector    | 	/usr/local/go/src/net/http/server.go:2387 +0x16c
connector    | net/http.serverHandler.ServeHTTP(0x18a0ab0, 0x9b8500, 0xe3f78c0, 0x19dbf00)
connector    | 	/usr/local/go/src/net/http/server.go:2807 +0x88
connector    | net/http.(*conn).serve(0x29b445a0, 0x9b9ca0, 0x29b7efc0)
connector    | 	/usr/local/go/src/net/http/server.go:1895 +0x7d4
connector    | created by net/http.(*Server).Serve
connector    | 	/usr/local/go/src/net/http/server.go:2933 +0x2d0

Add count of duplicates to logging

When inserting with ON CONFLICT DO NOTHING, get back the actual number of rows inserted and subtract it from the total number of rows you tried to insert. This should give you the dupes.
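
A sketch of that arithmetic in plain SQL, against a hypothetical metric table that has a UNIQUE constraint on (series_id, "time"):

-- Two rows attempted; rows returned by the CTE are the ones actually inserted.
WITH ins AS (
    INSERT INTO prom_data."some_metric" ("time", value, series_id)
    VALUES ('2020-05-20 12:00:00+00', 1.0, 42),
           ('2020-05-20 12:00:15+00', 2.0, 42)
    ON CONFLICT DO NOTHING
    RETURNING 1
)
SELECT 2 - (SELECT count(*) FROM ins) AS duplicate_rows;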

Performance issues with a large number of metrics

We are facing so many performance problems that, after a few minutes, the database is so busy that the queue can no longer be processed and the postgres server is no longer accessible.

Steps to Reproduce

  1. Installation of the TimescaleDB
  2. Run of Timescale-Tune (Configuration output postgresql.conf.txt)
  3. Clone https://github.com/timescale/timescale-prometheus.git and compile ./extension (timescale-prometheus_extra)
  4. Add timescale extension to postgres database
  5. Download and run latest release of timescale-prometheus (0.1.0-alpha.3.1)
  6. Add remote_write and remote_read to prometheus.yml

Current Behavior

In the first minute the metrics are accessible via the Prometheus UI; after that, queries take longer and longer until the metrics are no longer accessible. This can also be observed in the log files:
timescale-prometheus.log.txt:

May 20 14:17:44 timescale-db-01 timescale-prometheus[29369]: {"caller":"log.go:46","level":"info","msg":"Samples write throughput","samples/sec":877,"ts":"2020-05-20T12:17:44.213Z"}
May 20 14:17:45 timescale-db-01 timescale-prometheus[29369]: {"caller":"log.go:46","level":"info","msg":"Samples write throughput","samples/sec":100,"ts":"2020-05-20T12:17:45.520Z"}
May 20 14:18:58 timescale-db-01 timescale-prometheus[29369]: {"caller":"log.go:51","err":"ERROR: duplicate key value violates unique constraint \"pg_class_relname_nsp_index\" (SQLSTATE 23505)","level":"warn","msg":"Error sending samples to remote storage","num_samples":0,"ts":"2020-05-20T12:18:58.041Z"}
May 20 14:19:06 timescale-db-01 timescale-prometheus[29369]: {"caller":"log.go:51","err":"ERROR: duplicate key value violates unique constraint \"pg_class_relname_nsp_index\" (SQLSTATE 23505)","level":"warn","msg":"Error sending samples to remote storage","num_samples":0,"ts":"2020-05-20T12:19:06.355Z"}

prometheus.log.txt:

May 20 14:17:08 timescale-db-01 prometheus[3654]: ts=2020-05-20T12:17:08.632Z caller=dedupe.go:112 component=remote level=info remote_name=8639b6 url=http://localhost:9201/write msg="Currently resharding, skipping."
May 20 14:17:48 timescale-db-01 prometheus[3654]: ts=2020-05-20T12:17:48.632Z caller=dedupe.go:112 component=remote level=info remote_name=8639b6 url=http://localhost:9201/write msg="Remote storage resharding" from=324 to=1000
May 20 14:17:58 timescale-db-01 prometheus[3654]: ts=2020-05-20T12:17:58.632Z caller=dedupe.go:112 component=remote level=warn remote_name=8639b6 url=http://localhost:9201/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1589977065 minSendTimestamp=1589977068
May 20 14:18:08 timescale-db-01 prometheus[3654]: ts=2020-05-20T12:18:08.632Z caller=dedupe.go:112 component=remote level=warn remote_name=8639b6 url=http://localhost:9201/write msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1589977065 minSendTimestamp=1589977078

Context (Environment)

Specs: VM with 16 cores and 16 GB RAM
There are about 100 exporters in the scrape config, which are queried every 15 seconds.

The setup was tried in the following constellations, all of which gave the same result:

  1. (VM1: Prometheus) <-> (VM2: Postgres, TimescaleDB Prometheus adapter)
  2. (VM1: Prometheus, TimescaleDB Prometheus adapter) <-> (VM2: Postgres)
  3. (VM: Prometheus, TimescaleDB Prometheus adapter, Postgres)

Support continuous aggregates via PromQL

Two things to research:

  1. How do we expose the results of a continuous aggregate to the adapter for queries via PromQL? (see the sketch after this list)
  2. What kinds of aggregates do we want to support in continuous aggregates? Specifically, do we need to support any of the Prometheus aggregations, e.g. rate, increase, etc.?
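
For item 1, the aggregate itself might look like the sketch below, assuming current TimescaleDB continuous-aggregate syntax and a hypothetical Promscale metric table; how its results would then surface through PromQL is exactly the open question.

-- A continuous aggregate downsampling one metric to 5-minute buckets.
CREATE MATERIALIZED VIEW node_cpu_5m
WITH (timescaledb.continuous) AS
SELECT time_bucket('5 minutes', "time") AS bucket,
       series_id,
       avg(value) AS avg_value,
       max(value) AS max_value
FROM prom_data."node_cpu_seconds_total"
GROUP BY time_bucket('5 minutes', "time"), series_id;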
