
influxdb-haskell's Introduction

InfluxDB client library for Haskell


Currently this library is tested against InfluxDB 1.8. InfluxDB 2 isn't supported (yet).

Getting started

There is a quick start guide on Hackage.

Running tests

Either cabal new-test or stack test runs the doctests in Haddock comments. Note that they need a local running InfluxDB server.

Contact information

Contributions and bug reports are welcome!

Please feel free to contact me through GitHub or on Gitter.

influxdb-haskell's People

Contributors

alaendle, buecking, cocreature, finleymcilwaine, fumieval, luntain, lupino, maoe, mpickering, msakai, pacak, tmcgilchrist


influxdb-haskell's Issues

Backslash escaping in field values

Thank you for the wonderful work you have done on this library! It is a joy to work with such a well-thought-out, strongly typed interface to InfluxDB. I have run into a small escaping issue, however.

Database.InfluxDB.Line.escapeStringField only escapes double quotes but does not perform any type of backslash escaping.

The InfluxDB Line Protocol documentation requires that both double quotes and backslashes be escaped (albeit the way InfluxDB handles backslashes is fairly tolerant of most cases where they are not escaped).

This leads to unexpected behavior where InfluxDB rejects certain lines. Consider the following program:

{-# LANGUAGE OverloadedStrings #-}

import Data.Time.Clock (UTCTime)
import Database.InfluxDB
import Database.InfluxDB.Line
import qualified Data.ByteString.Lazy.Char8 as L
import qualified Data.Map as M
import qualified Data.Text.IO as T

s = "bl\\\"a"

main = do
  T.putStrLn s
  L.putStrLn $ encodeLine (scaleTo Nanosecond)
             $ Line "testing" mempty
                    (M.singleton "test" $ FieldString s)
                    (Nothing :: Maybe UTCTime)

When run, it outputs s verbatim, followed by the library's encoding of an InfluxDB measurement line with s as the value of a field named "test":

bl\"a
testing test="bl\\"a"

Because the library escapes the double quote but does not escape the backslash, from InfluxDB's perspective the backslash was escaped and the double quote was not. Feeding this measurement line to InfluxDB over the HTTP interface yields an unbalanced-quotes parse error.

The expected behavior is for the library to encode the measurement line as testing test="bl\\\"a", escaping both the backslash and the double quote. InfluxDB accepts this rendering.

(This can also be an issue any time an input field string contains a literal \\. InfluxDB treats that as an escaped backslash, so the field's double backslash is flattened to a single backslash.)
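A minimal sketch of the expected escaping, written as a standalone String function. The library's real escapeStringField operates on Text builders; this only illustrates the rule that backslashes must be escaped alongside double quotes:

```haskell
-- Sketch only: escape both backslashes and double quotes in a string
-- field value, per the line protocol rules. Not the library's actual
-- implementation, which works on Text rather than String.
escapeStringField :: String -> String
escapeStringField = concatMap esc
  where
    esc '\\' = "\\\\"  -- backslash  -> backslash, backslash
    esc '"'  = "\\\""  -- quote      -> backslash, quote
    esc c    = [c]     -- everything else passes through
```

With this rule, the field value bl\"a renders as bl\\\"a, which InfluxDB accepts.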

Series data type doesn't reflect "best practices" for time-series schemas

I love the aeson-esque interface you've built, but there's a glaring misstep: ToSeriesData is being treated a lot like a table of data instead of a series of data.

Let's say I have a datatype that's an instance of ToSeriesData that looks something like this:

data Reading = Reading UTCTimeEpoch UUID DeviceType ReadingType Double
data DeviceType = Plug | Switch
data ReadingType = Watts | Volts | Temp

Let's say these readings go into a series named "device_readings". It works great at small scale, but the minute you reach millions of points you suddenly hit performance problems, because InfluxDB isn't designed to handle that kind of querying (SELECT uuid FROM device_readings WHERE ...) and filtering on, say, the device type column. You'll traverse the entire key space of that series to do it, because underneath, Influx is just a dumb key-value store.

If there are 20 million keys in the device_readings series, that's really severe pain, and you're in a tremendously bad spot because migrating that data to another schema could take quite a bit of time...

This is my major beef with InfluxDB: they wanted to keep a "SQL-like" interface to the data, but the underlying model definitely will not handle the kinds of queries that you CAN run on it. It's also their fault for not urgently writing up a document on "Schema Design".

TempoDB got it right. Your series name should contain the key, category, and attributes you want to "query".

So instead of a series name like device_readings, it would look like device_readings.2c9e4570-9b35-0131-c7ce-48e0eb16f719.Watts.Dimmer. You can then query the data you want efficiently, by constructing the key from known categories, IDs, and attributes. The datatype then becomes extremely simple:

data Reading = Reading UTCTimeEpoch Double

What I would love to see is a data type that gives us a structured and easy way of building series names from a key, a category, and some attributes! That's what I wish this library were doing, instead of following a more table-like model.

I'm going to throw together my ideas in a fork and see what you think of them. Right now I'm building series names with functions and it's ugly; I would rather do it with specialized data types and instances of a class like ToSeriesName or something similar.
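A rough sketch of what such a type could look like. All names here are hypothetical, not part of the library:

```haskell
import Data.List (intercalate)

-- Hypothetical structured series name: a key, a category (e.g. a
-- device id), and a list of attributes, rendered dot-separated in the
-- TempoDB style described above.
data SeriesName = SeriesName
  { snKey        :: String
  , snCategory   :: String
  , snAttributes :: [String]
  }

renderSeriesName :: SeriesName -> String
renderSeriesName (SeriesName key cat attrs) =
  intercalate "." (key : cat : attrs)
```

For example, renderSeriesName (SeriesName "device_readings" deviceId ["Watts", "Dimmer"]) yields the dotted form shown above, and a ToSeriesName-style class could produce such values from user types.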

MessagePack support

As of InfluxDB v1.4, MessagePack can be requested for responses by setting application/x-msgpack in the Accept header.

query fails silently on parse errors

I came across it when using a (Tagged "time" Int) field. The query returned no results, even though when printed and run by hand it was fine. It worked when I changed it to Tagged "time" UTCTime, but it broke again when I set the precision to Second in the query parameters. Now I know that was due to the precision being set to rfc3339, which returns human-readable timestamps. Anyway, the bigger issue here is that parse errors are silent failures.

Should rename repo

You should probably rename this repo so people won't have a problem if they want to fork it and make updates, since they may already have a repo named influxdb. Maybe influxdb-haskell is a better name?

uint64 (Word64) support

It's still gated behind a flag, but the InfluxDB devs seem to be enabling it by default in the near future.

Typo in haddock documentation of ToSeriesData class

toSeriesColumn is used as a method name, but it should be toSeriesColumns to match the actual implementation.

diff --git a/src/Database/InfluxDB/Encode.hs b/src/Database/InfluxDB/Encode.hs
index d812268..6b28c90 100644
--- a/src/Database/InfluxDB/Encode.hs
+++ b/src/Database/InfluxDB/Encode.hs
@@ -28,7 +28,7 @@ class ToSeries a where
 -- > data EventType = Login | Logout
 -- >
 -- > instance ToSeriesData Event where
--- >   toSeriesColumn _ = V.fromList ["user", "type"]
+-- >   toSeriesColumns _ = V.fromList ["user", "type"]
 -- >   toSeriesPoints (Event user ty) = V.fromList [toValue user, toValue ty]
 -- >
 -- > instance ToValue EventType

Invalid query failure suppressed

select time from measurement is an invalid query, because "at least one non-time field must be queried", but you won't get this error through the library; it just silently returns an empty result. I only got the error when running the query in the CLI.

FieldFloat to parse Int

Hi,

It looks like using a FieldInt I can parse either an Int or a Float without a problem (of course, for a Float it loses precision).
The opposite isn't true though: parsing a Double with FieldFloat (which is weirdly named?) works, but trying to parse an Int just results in an empty Vector.

That's a bit problematic: it forces me to use Int, and the loss of precision makes data like "SELECT load1 FROM system" unusable (for a typical telegraf database). Am I missing something here?
The goal is not to require the user to specify the field type, as that would make everything very tedious to use. If treating an Int as a Float worked, everything could be used in a generic way.
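One way this could work, sketched with a stand-in Field type rather than the library's actual definitions: widen integer fields to Double in the numeric decoder.

```haskell
-- Stand-in for the library's field type; names are illustrative only.
data Field = FieldInt Int | FieldFloat Double

-- A Double decoder that also accepts integer fields. Note this is
-- exact only up to Double's 53-bit mantissa; very large Ints would
-- still lose precision.
fieldToDouble :: Field -> Maybe Double
fieldToDouble (FieldFloat d) = Just d
fieldToDouble (FieldInt i)   = Just (fromIntegral i)
```

With a fallback like this, a generic Double-based decoder would handle both integer and float columns without the user annotating field types.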

Thanks

compile failure w/ base-4.7 and older

Hi,

I noticed that the last 3 releases of influxdb run into a compile failure (see below).

Configuring component lib from influxdb-1.1.1...
Preprocessing library influxdb-1.1.1...

src/Database/InfluxDB/Manage.hs:27:8:
    Could not find module ‘Data.Void’
    It is a member of the hidden package ‘void-0.7.2’.
    Perhaps you need to add ‘void’ to the build-depends in your .cabal file.
    Use -v to see a list of the files searched for.

The easiest fix would be to tighten the lower bound on base, i.e. base >= 4.8, and call it a day (also because the Travis job currently only validates GHC 7.10 & GHC 8.0).
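An alternative to raising the base bound would be a conditional dependency in the .cabal file, assuming Data.Void is only missing on GHCs that ship base < 4.8 (a sketch, not the package's actual stanza):

```cabal
library
  build-depends: base >= 4.6 && < 5
  -- Data.Void moved into base-4.8; pull in the void package on older GHCs.
  if impl(ghc < 7.10)
    build-depends: void >= 0.7
```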

QueryResults for Void fails with non-empty results

The QueryResults instance for Void used to accept non-empty results. As of #68 it started failing on them, because strictDecoder is now the default and there's no way to define a succeeding parseMeasurement for Void. This commit restores the previous behavior by adding a new coerceDecoder field to the QueryResults class; the instance sets a decoder that always returns an empty vector, bypassing the failing parseMeasurement.

Here's a reproducer of the issue:

case_issue79 :: (String -> IO ()) -> Assertion
case_issue79 step = withDatabase db $ do
  let w = writeParams db
  let q = queryParams db
  step "Querying an empty series with two fields expected"
  _ <- query @(Tagged "time" UTCTime, Tagged "value" Int) q "SELECT * FROM foo"
  step "Querying an empty series with the results ignored"
  _ <- query @Void q "SELECT * FROM foo"
  step "Writing a data point"
  write w $ Line "foo" mempty (Map.fromList [("value", FieldInt 42)]) (Nothing :: Maybe UTCTime)
  step "Querying a non-empty series with two fields expected"
  _ <- query @(Tagged "time" UTCTime, Tagged "value" Int) q "SELECT * FROM foo"
  step "Querying a non-empty series with the results ignored"
  _ <- query @Void q "SELECT * FROM foo"
  return ()
  where
    db = "case_issue79"

It fails as follows:

  issue #79: FAIL (0.19s)
    Querying an empty series with two fields expected
    Querying an empty series with the results ignored
    Writing a data point                                 (0.11s)
    Querying a non-empty series with two fields expected
    Querying a non-empty series with the results ignored (0.07s)
      UnexpectedResponse "BUG: Cannot parse Void in Database.InfluxDB.Query.query" Request {
        host                 = "localhost"
        port                 = 8086
        secure               = False
        requestHeaders       = []
        path                 = "/query"
        queryString          = "?q=SELECT%20%2A%20FROM%20foo&db=case_issue79"
        method               = "GET"
        proxy                = Nothing
        rawBody              = False
        redirectCount        = 10
        responseTimeout      = ResponseTimeoutDefault
        requestVersion       = HTTP/1.1
      }
       "{\"results\":[{\"series\":[{\"values\":[[\"2020-07-17T15:43:54.46928Z\",42]],\"name\":\"foo\",\"columns\":[\"time\",\"value\"]}],\"statement_id\":0}]}"

Support GHC 9.0.1

Currently it's not buildable due to a transitive dependency.

% cabal configure -w ghc-9.0
'cabal.project.local' already exists, backing it up to
'cabal.project.local~2'.
Resolving dependencies...
cabal: Could not resolve dependencies:
[__0] trying: influxdb-1.9.0 (user goal)
[__1] trying: http-client-0.7.5 (dependency of influxdb)
[__2] trying: streaming-commons-0.2.2.1 (dependency of http-client)
[__3] trying: zlib-0.6.2.2 (dependency of streaming-commons)
[__4] next goal: base (dependency of influxdb)
[__4] rejecting: base-4.15.0.0/installed-4.15.0.0 (conflict: zlib => base>=4
&& <4.15)
[__4] rejecting: base-4.14.1.0, base-4.14.0.0, base-4.13.0.0, base-4.12.0.0,
base-4.11.1.0, base-4.11.0.0, base-4.10.1.0, base-4.10.0.0, base-4.9.1.0,
base-4.9.0.0, base-4.8.2.0, base-4.8.1.0, base-4.8.0.0, base-4.7.0.2,
base-4.7.0.1, base-4.7.0.0, base-4.6.0.1, base-4.6.0.0, base-4.5.1.0,
base-4.5.0.0, base-4.4.1.0, base-4.4.0.0, base-4.3.1.0, base-4.3.0.0,
base-4.2.0.2, base-4.2.0.1, base-4.2.0.0, base-4.1.0.0, base-4.0.0.0,
base-3.0.3.2, base-3.0.3.1 (constraint from non-upgradeable package requires
installed instance)
[__4] fail (backjumping, conflict set: base, influxdb, zlib)
After searching the rest of the dependency tree exhaustively, these were the
goals I've had most trouble fulfilling: base, influxdb, zlib,
streaming-commons, http-client

Special support for the time column

Currently the influxdb package doesn't have any special support for the time column. Given that the time precision depends on the time_precision query string, we cannot tell what precision is actually used for a time value without looking at the request.

That is, FromValue type class is not sufficient for the time column.
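To illustrate why a per-value type class can't interpret the time column on its own, here is a sketch (with hypothetical names) of the precision-dependent conversion that would be required:

```haskell
-- Hypothetical precision type: the raw integer in the time column
-- means different things depending on the request's time_precision.
data TimePrecision = Seconds | Milliseconds | Microseconds | Nanoseconds

-- Normalizing to nanoseconds needs the precision, which lives in the
-- request, not in the value itself -- hence FromValue alone is not enough.
toNanoseconds :: TimePrecision -> Integer -> Integer
toNanoseconds Seconds      t = t * 1000000000
toNanoseconds Milliseconds t = t * 1000000
toNanoseconds Microseconds t = t * 1000
toNanoseconds Nanoseconds  t = t
```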

Database Name that is Not a Literal

Hi. This is probably a simple issue to fix, but here goes:

I would like to run this function :

createDatabase :: String -> IO ()
createDatabase name = manage (queryParams name) $ F.formatQuery ("CREATE DATABASE "%F.database) name

The error I receive is:
Couldn't match type ‘[Char]’ with ‘Database’
Expected type: Database
Actual type: String

I have the OverloadedStrings extension enabled at the top of my module.

How can I make the type-checker happy and manage to create databases with different names, without having to know in advance (as string literals) what they are?
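For what it's worth, OverloadedStrings only applies to string literals; a runtime String has to go through fromString explicitly. A base-only sketch with a stand-in Database type (the library's real newtype wraps Text, but it does provide an IsString instance):

```haskell
import Data.String (IsString (..))

-- Stand-in for the library's Database newtype; illustrative only.
newtype Database = Database String
  deriving (Eq, Show)

instance IsString Database where
  fromString = Database

-- Convert a database name known only at runtime.
mkDatabase :: String -> Database
mkDatabase = fromString
```

In the original function, passing `fromString name` where a Database is expected should then type-check.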

Precision is confusing

What is the purpose of it? Why not have only nanoseconds? That seems to be the actual representation in the database. Setting the precision to anything else just loses precision. For example, setting the precision to seconds now means:

  • when connecting to the db, specify precision as seconds
  • scale the timestamp to said precision and write on the wire
  • influx db reads it, scales from seconds to nanoseconds and stores

The only effect is a loss of precision. If someone wants to lose precision on their timestamps, they may well do it before passing those timestamps to the client.

Custom exception type

Currently Database.InfluxDB.Http can throw IO exceptions in case of a decode error, or HttpExceptions from http-client for various reasons. It might be more useful to wrap these exceptions in a custom exception type.
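A minimal sketch of what such a type might look like; the constructor names here are hypothetical, not the library's actual API:

```haskell
import Control.Exception (Exception)

-- One possible shape: a single exception type covering both failure
-- modes, so callers can catch InfluxException uniformly instead of
-- handling IOException and HttpException separately.
data InfluxException
  = DecodeError String  -- response body failed to decode
  | HttpError   String  -- wrapped failure from http-client
  deriving Show

instance Exception InfluxException
```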
