
pg-collectd

pg-collectd is an alternative, opinionated postgres writer plugin for collectd that trades flexibility for performance and ease of use. A quick rundown:

  • No dependencies (other than collectd). pg-collectd uses the (unofficial) collectd rust plugin for low-cost bindings to collectd's C API. The pure rust postgres driver is statically compiled into the plugin, so there is no need to rely on libpq.
  • Simplified insertion: instead of requiring a database function that receives arrays of values and identifiers, the data is expanded and denormalized so that everything fits into a single table whose columns hold single values, not arrays (see the sketch after this list).
  • A 4x reduction in database CPU usage compared to collectd's default postgres writer and setup (a conservative estimate)
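
To make the denormalized layout concrete, here is a sketch of how a single collectd load report, which carries three values (shortterm, midterm, longterm), expands into three single-value rows. The host name, timestamp, and values below are made up for illustration, and the plugin actually inserts via COPY rather than INSERT:

-- Illustrative only: one multi-value collectd report becomes three rows.
INSERT INTO collectd_metrics (time, host, plugin, type, metric, value) VALUES
   (now(), 'web-1', 'load', 'load', 'shortterm', 0.15),
   (now(), 'web-1', 'load', 'load', 'midterm',   0.10),
   (now(), 'web-1', 'load', 'load', 'longterm',  0.05);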

Here are the downsides:

  • Not an officially supported collectd plugin
  • Not as feature rich (e.g., no support yet for TLS connections or custom table names)
  • Only tested against a limited set of collectd versions (though it may work on others, depending on whether collectd's C API has changed)
  • Only distributed as debs or source (e.g., no rpms or apt repository)

Compatibility

  • Collectd 5.7+
  • Postgres 7.4+

Installation

First we must set up the database with the following schema:

CREATE TABLE IF NOT EXISTS collectd_metrics (
   time TIMESTAMPTZ NOT NULL,
   plugin TEXT,
   plugin_instance TEXT,
   type_instance TEXT,
   type TEXT,
   host TEXT,
   metric TEXT,
   value DOUBLE PRECISION
);
  • (Optional) If using the TimescaleDB extension for postgres, the statements below provide good defaults:
SELECT create_hypertable('collectd_metrics', 'time', chunk_time_interval => interval '1 day');
ALTER TABLE collectd_metrics SET (
   timescaledb.compress,
   timescaledb.compress_segmentby = 'plugin',
   timescaledb.compress_orderby = 'time'
);
SELECT add_compression_policy('collectd_metrics', INTERVAL '7 days');
SELECT add_retention_policy('collectd_metrics', INTERVAL '90 days');
  • Create a user that only has INSERT permissions on collectd_metrics:
CREATE USER collectd WITH PASSWORD 'xxx';
GRANT INSERT ON collectd_metrics TO collectd;
  • Download the appropriate package from the latest release (see the compatibility list shown earlier)
  • Install with dpkg -i pg-collectd-*.deb
  • Edit the collectd configuration (e.g., /etc/collectd/collectd.conf):
LoadPlugin pg_collectd
<Plugin pg_collectd>
    BatchSize 1000
    Connection "postgresql://<user>:<password>@<host>:<port>/<db>"
    StoreRates true
    LogTimings INFO
</Plugin>
  • Restart collectd (a quick sanity-check query is sketched below)
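
After restarting, a query along these lines (illustrative; adjust the time window to taste) confirms that metrics are arriving:

-- Sanity check: recent rows per plugin.
SELECT plugin, count(*) AS row_count
FROM collectd_metrics
WHERE time > now() - interval '5 minutes'
GROUP BY plugin
ORDER BY row_count DESC;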

Not using Ubuntu / Debian? No problem: build from source.

Configuration Options

  • BatchSize: the number of values (i.e., rows in the CSV) to batch before copying them to the database. The default is 100, which is extremely conservative; test what is appropriate for you, but 500 to 1000 works well for me. Note that the number of rows inserted may not exactly equal the batch size, as NaN rates are not stored and some metrics passed to write contain more than one value.
  • Connection: the database connection string (see postgres's connection URI documentation)
  • StoreRates: controls whether DERIVE and COUNTER metrics are converted to a rate before being written (e.g., a COUNTER that increases by 600 over a 10 second interval is stored as 60). Default is true.
  • LogTimings: the level at which to log performance timings. The default is DEBUG to cut down on potential log spam, though there is no harm in setting it to INFO (or WARN / ERROR, for that matter), as only a single line is logged per batched insert.

Performance Secret Sauce

The original postgres writer for collectd works by keeping a long-lived transaction that writes many individual statements, committed every CommitInterval seconds. A quote from postgres's official documentation points to some low-hanging fruit:

Note that loading a large number of rows using COPY is almost always faster than using INSERT, even if PREPARE is used and multiple insertions are batched into a single transaction.

To take advantage of this, pg-collectd batches up a certain number of values (BatchSize) and then formats those values as an in-memory CSV file that it can COPY over to postgres. What's nice is that memory allocations are amortized: over time, no new memory is allocated for the in-memory CSV, and only the CPU time for formatting the CSV is needed.
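
Conceptually, each batched flush amounts to something like the following (a sketch: the plugin streams the CSV from memory over the client protocol rather than reading a file, and the column order shown here is an assumption):

COPY collectd_metrics (time, plugin, plugin_instance, type_instance, type, host, metric, value)
FROM STDIN WITH (FORMAT csv);
-- ...followed by CSV rows such as (values made up):
-- 2023-01-01 00:00:00+00,cpu,0,idle,percent,web-1,value,97.5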

Building

To build from source, ensure you have Rust installed and then run:

cargo build --release

The resulting ./target/release/libpg_collectd.so should then be copied (locally or remotely) to /usr/lib/collectd/pg_collectd.so.


Known Issues

TimescaleDB configuration

Thanks for this plugin. It works great on my Ubuntu 21.04 machine with PostgreSQL 13 and the TimescaleDB extension.

I've created this configuration to set up the collectd_metrics table. It might be useful as a starting point for a section in the README.

Add TLS support

Currently, communication with the postgres server is in the clear. This is undesirable when the database is remote, so we should add TLS capabilities through the NativeTls hook in rust-postgres.

Support configurable table name

It should be possible to specify the table name to insert into in collectd's config. I think the only issue is that we'd need to sanitize it, as we'd be doing string formatting to create the COPY command. AFAIK, EXECUTE format(etc) won't work here.
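
For context on what sanitization would involve, postgres's quote_ident shows server-side identifier quoting (illustrative; the table name is made up). As the issue notes, a server-side EXECUTE approach doesn't apply to COPY ... FROM STDIN, which is driven from the client, so the quoting would need to happen client-side:

-- Returns "my metrics", quoted safely for use as an identifier.
SELECT quote_ident('my metrics');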
