
warp10-platform's Introduction



Warp 10

The Most Advanced Time Series Platform

Warp 10 is a modular open source platform designed for the IoT that collects, stores, and lets you analyze sensor data. It offers both a Time Series Database and a powerful analysis environment that can be used together or independently. Learn more

  • Increase the storage capacity of your historical data and reduce your storage bill while preserving all analysis capabilities
  • Deploy a real time database that scales with your time series needs
  • Enhance your existing tools with a ubiquitous analysis environment dedicated to time series data
  • Streamline KPIs and data visualization across your organization
  • Enable your business applications to interact easily with your system's data

Improve the efficiency of your existing infrastructure

The Warp 10 Platform integrates into existing datalake infrastructures and provides storage and analytics solutions tailored for time series data which can be leveraged from existing tools. Reference Architecture

Component Description
Storage Engine Securely ingest data coming from devices, with high throughput, support for delayed and out-of-order data, and a wide variety of protocols such as HTTP, MQTT, or Kafka. Read more
History Files Efficiently compact stable data and store the resulting files onto any filesystem or cloud object store while retaining the same access flexibility as data stored in the Warp 10 Storage Engine. Read more
Analytics Engine Leverage WarpLib, a library of over 1300 functions designed specifically for time series data manipulation. Increase the efficiency of data teams thanks to the WarpScript programming language, which uses WarpLib and interacts with a large ecosystem.
Dynamic Dashboards Create highly dynamic dashboards from your time series data. Discovery is a dashboard-as-code tool dedicated to the Warp 10 technology. Display your data in complete dashboards. Read more
Business Applications Enable business applications to benefit from the wealth of knowledge present in time series data by connecting those applications to the analytics and storage engines provided by the Warp 10 platform. Read more

The Storage Engine, Analytics Engine, History Files and Dynamic Dashboards can be used together or separately.

Versions

The Warp 10 platform is available in three versions: Standalone, Standalone+ and Distributed. All versions provide the same level of functionality apart from some minor differences, and the complete WarpScript language is available in all of them. They differ mainly in the way the Storage Engine is implemented.

Version Description
Standalone The Standalone version is designed to be deployed on a single server whose size can range from a Raspberry Pi to a multi-CPU box. It uses LevelDB as its storage layer or an in-memory datastore for cache setups. All features (storage, analysis) are provided by a single process, hence the name standalone. Multiple Standalone instances can be made to work together to provide High Availability to your deployment. This is provided via a replication mechanism called Datalog.
Standalone+ Warp 10 with a FoundationDB backend. It is a middle ground between the standalone and distributed versions, basically a standalone version but with storage managed by FoundationDB instead of LevelDB.
Distributed The Distributed version coordinates multiple processes on multiple servers. The storage layer uses FoundationDB for data persistence. Communication between processes is done through Kafka and ZooKeeper. This version is suitable for heavy workloads and giant datasets. Scalability comes at a price: the added complexity of the architecture.

Getting started

We strongly recommend that you start with the Onboarding tutorials to learn how Warp 10 works, and how to perform basic operations with WarpScript. To deploy your own instance, read the getting started guide.

Learn more by browsing the documentation.

To test Warp 10 without installing it, try the free sandbox where you can get hands-on in no time.

For a quick start:

./warp10.sh init standalone
./warp10.sh start
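
Once the instance is up, you can push a first datapoint over HTTP. Here is a minimal Java sketch, assuming the default standalone port (8080), the default microsecond time unit, and a write token you have generated (WRITE_TOKEN and the series name are placeholders):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class QuickPush {
  public static void main(String[] args) throws Exception {
    URL url = new URL("http://127.0.0.1:8080/api/v0/update");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("X-Warp10-Token", "WRITE_TOKEN"); // placeholder token
    conn.setDoOutput(true);

    // GTS input format: TS/LAT:LON/ELEV class{labels} value
    // (timestamp in the platform time unit, microseconds by default)
    String line = (System.currentTimeMillis() * 1000L) + "// temperature{room=kitchen} 22.5\n";

    try (OutputStream os = conn.getOutputStream()) {
      os.write(line.getBytes("UTF-8"));
    }
    System.out.println("HTTP " + conn.getResponseCode()); // 200 means the point was stored
  }
}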

Help & Community

The team has put a lot of effort into the documentation of the Warp 10 Platform. There are still some areas which may need improving, so we count on you to help raise the overall quality.

We understand that discovering all the features of the Warp 10 Platform at once can be intimidating, that’s why you have several options to find answers to your questions:

  • Explore the blog and especially the Tutorials and Thinking in WarpScript categories
  • Explore the tutorials on warp10.io
  • Join the Lounge, the Warp 10 community on Slack
  • Ask your question on StackOverflow using warp10 and warpscript tags
  • Keep up with the latest news of the platform via Twitter and the newsletter

Our goal is to build a large community of users to move our platform into territories we haven't explored yet and to make Warp 10 and WarpScript the standards for sensor data and the IoT.

Contributing to the Warp 10 Platform

Open source software is built by people like you, who spend their free time creating things the rest of the community can use.

You want to contribute to Warp 10? We encourage you to read the contributing page first.

Commercial Support

Should you need commercial support for your projects, SenX offers support plans which will give you access to the core team developing the platform.

Don't hesitate to contact us at [email protected] for all your inquiries.

Trademarks

Warp 10, WarpScript, WarpFleet, Geo Time Series and SenX are trademarks of SenX S.A.S.


warp10-platform's Issues

LMAP behavior

When using LMAP, for example with the sample from http://www.warp10.io/reference/functions/function_LMAP/:

[ 42 21 11 ]
<% -1 * %>
LMAP

gives, not always but most of the time, the following stack:

0: [0,-1,-2]
1: 11
2: 21
3: 42

I think the following stack is expected instead:

0: [-42,-21,-11]
1: 11
2: 21
3: 42

Is something wrong? Or maybe I am using LMAP incorrectly?

No default limits for applications

Default limits are available for producers but not for applications.

What about adding: DEFAULT_MADS_APPLICATION and DEFAULT_RATE_APPLICATION?

Generic error status codes

Many errors share the same return code:

  • Exceeded MADS
  • Invalid Token
  • Exec Limit raised
  • Exceeded DDP

It would be nice to have fine-grained error status codes so that we can better identify the source of the issue.

Issue with TIMECLIP

On a TIMECLIPed, bucketized GTS, FILLVALUE will insert values outside of the clip's range (Script).

Workaround: reclip after FILLVALUE (Script). Is this the expected behaviour?

Directory : provide a last_seen hint

To know whether a series is still active, you can fetch the last available datapoint and check whether its date matches your criteria.

If data auto-eviction is enabled, you won't even be able to look up this last datapoint.

It would be very useful to have a last_seen hint per GTS that gives the timestamp (second precision is enough) corresponding to the last update on a GTS.

There are two possible places to implement this:

  • Ingress
  • Store

The Store component is CPU bound and uses less memory than Ingress (which maintains the meta cache), so in the case of a distributed deployment it would be better to implement this on the Store side.

It would maintain a structure, such as a concurrent hash map, keyed by GTS ID with the last update timestamp as the value.

Data could be sampled, and produced on a best-effort basis to either the Directory (standalone) or a dedicated Kafka topic.
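
As a minimal sketch of the structure mentioned above (class and method names are illustrative, not from the Warp 10 code base), keeping the maximum timestamp seen per GTS ID could look like this:

import java.util.concurrent.ConcurrentHashMap;

public class LastSeenTracker {
  // GTS ID -> last update timestamp (second precision is enough per the proposal).
  private final ConcurrentHashMap<Long, Long> lastSeen = new ConcurrentHashMap<>();

  // Called on the ingestion/store path for every datapoint (or a sample of them).
  public void record(long gtsId, long timestampSeconds) {
    // merge keeps the maximum, so delayed or out-of-order data cannot move last_seen backwards.
    lastSeen.merge(gtsId, timestampSeconds, Math::max);
  }

  public Long get(long gtsId) {
    return lastSeen.get(gtsId);
  }
}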

The IndexSpec struct could have a last_seen field so that, in the end, we can perform a LASTSEEN with an optional parameter:

[ 'RTOKEN' 'class_pattern' { labels } ] LASTSEEN

The result would be:

[{
  "c": "class",
  "l": {
    "label0": "value0",
    "label1": "value1"
  },
  "a": {
    "attr0": "value0"
  },
  "v": [
    [0, last_seen]
  ]
}]

This way, we can leverage all the framework functions (FILTER, ...) to manipulate the result and easily get series older than n days.

This would be very helpful to manage the Directory.

init file overwrites configuration file

The problem is that the init file overwrites some parameters which we expect to set in the configuration file, e.g. the LevelDB home folder.
After setting leveldb.home = /some/where/else/data in the configuration file, our Warp 10 instance failed. After investigation we found out that this parameter is overwritten in the init file. But since I'm not sure about the process and the order in which LevelDB is started, I couldn't decide what the general solution should be.

In our case we introduced another parameter in the init file holding the same value as the configuration.

If the platform can work properly with these kinds of parameters set only in the configuration file, then the init file should not set them again when the configuration file exists.

Add labels inside READ tokens

Both EgressFetchHandler.java (Fetch API) and FETCH.java (WarpScript API) use Tokens.labelSelectorsFromReadToken(), but looking deeper into this method, I've noticed that in Quasar, deliverReadToken() doesn't provide a way to create a read token with built-in labels.

Ultimately, the Thrift struct ReadToken doesn't provide a dedicated field for labels (there is one for attributes, though, which makes me think this is just an oversight).

We should add a fourteenth field in the struct reflecting the ability to extract labels from the read token too.

Issue with UNLIST

UNLIST doesn't push a mark on the stack; "<% MARK %> EVAL" does instead.

NaN interpreted as 0 if first datapoint in GTS != NaN

Adding a "NaN" datapoint inside a GTS behaves strangely.
I had this issue, when I used FILLTICKS to fill with a GTS with NaN.
The first tick in a GTS has to be NaN, otherwise every NaN will be converted to 0 (for the whole GTS).

NEWGTS 'test1' RENAME
{ 'label0' '42' 'label1' 'foo' } RELABEL
100  NaN NaN NaN NaN ADDVALUE
200  NaN NaN NaN 42 ADDVALUE
300  NaN NaN NaN NaN ADDVALUE
'test1' STORE

NEWGTS 'test2' RENAME
{ 'label0' '42' 'label1' 'foo' } RELABEL
100  NaN NaN NaN 42 ADDVALUE
200  NaN NaN NaN NaN ADDVALUE
300  NaN NaN NaN NaN ADDVALUE
'test2' STORE

$test1 100 ATTICK 4 GET NaN == ASSERT
$test1 200 ATTICK 4 GET 42 == ASSERT
$test1 300 ATTICK 4 GET NaN == ASSERT

$test2 100 ATTICK 4 GET 42 == ASSERT
$test2 200 ATTICK 4 GET NaN == ASSERT
$test2 300 ATTICK 4 GET NaN == ASSERT
{"c":"test2","l":{"label0":"42","label1":"foo"},"a":{},"v":[[100,42],[200,0],[300,0]]}
{"c":"test1","l":{"label0":"42","label1":"foo"},"a":{},"v":[[100,NaN],[200,42],[300,NaN]]}

New functions to DROP/ADD buckets

Here's an example:

NEWGTS 'toto' RENAME 
10 NaN NaN NaN  3.0 ADDVALUE
20 NaN NaN NaN  10.0 ADDVALUE
30 NaN NaN NaN 16.0 ADDVALUE
40 NaN NaN NaN 40.0 ADDVALUE
1 ->LIST

[ SWAP mapper.rate 1 0 0 ] MAP

When running any kind of mapper in sliding-window mode, the first and last buckets will be wrong, and that's fair. I can then do a TIMECLIP to remove the faulty buckets.

My proposal is to offer two new functions:

  • DROPFIRSTBUCKETS (or REMOVE, I'm not sure about the right choice of name)
  • DROPLASTBUCKETS (or REMOVE, I'm not sure about the right choice of name)

Each function will take a list of bucketized GTS or a single GTS, then a number of buckets to drop. The result will be the list of GTS with the dropped buckets.

To be more generic, we may also add functions to add buckets at the beginning or the end, that will automatically deal with bucketspan to add the right timestamp for the new datapoint, but that's not my primary need.

I will be more than happy to add these functions!

Duplicate LOG function

Hi,

The LOG function (for debugging purposes) duplicates the mathematical LOG function.

Regards,

Setting data directory permissions in warp10-standalone.init

In the init file, lines 123, 124 and 125 set permissions for the data directories:

chmod -R 755 ${WARP10_HOME}/datalog
chmod -R 755 ${WARP10_HOME}/datalog_done
chmod -R 755 ${WARP10_HOME}/leveldb

Shouldn't they use ${WARP10_DATA_DIR}?
In the default configuration WARP10_DATA_DIR is set to WARP10_HOME, but if WARP10_DATA_DIR points somewhere else, these lines will be wrong.

Value Type for MAXDEPTH function

When using the MAXDEPTH function in WarpScript, a type exception is thrown (integer cannot be cast to long).
In the class script/functions/MAXDEPTH.java, limit is set as the attribute ATTRIBUTE_MAX_DEPTH:
stack.setAttribute(WarpScriptStack.ATTRIBUTE_MAX_DEPTH, limit);
The problem here is that limit is a Long, whereas WarpScriptStack.ATTRIBUTE_MAX_DEPTH is expected to hold an int.
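
For illustration, here is a minimal, hypothetical Java reproduction of this kind of boxing mismatch (the class and variable names are not from the Warp 10 code base):

public class MaxDepthCast {
  public static void main(String[] args) {
    // Attributes are stored as plain Object; suppose a Long is stored...
    Object attr = Long.valueOf(1000L);
    try {
      // ...then reading it back as an Integer fails at runtime.
      int maxDepth = (Integer) attr;
      System.out.println(maxDepth);
    } catch (ClassCastException e) {
      System.out.println(e); // java.lang.ClassCastException: Long cannot be cast to Integer
    }
  }
}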

TOTIMESTAMP WarpScript function is rounding to milliseconds

Hello,

It seems that the 'TOTIMESTAMP' function is dropping all digits below the millisecond.
On a platform where

$ grep timeunit /opt/warp10/warp10/etc/conf-standalone.conf
warp.timeunits = ns

has been set, this means that the last six digits are zeroed.

Using the script below, one can see that a call to 'ISO8601' followed by 'TOTIMESTAMP' does not lead back to the initial timestamp obtained through a call to 'NOW', unlike the ad-hoc function 'myTOTIMESTAMP'.

Is this a desired behaviour? In case it is, I will use my own function, but I thought you might want to know about this.

Please note that if you test this through quantum, display of results will be affected by:

https://github.com/cityzendata/warp10-quantum/issues/7

Thanks

WarpScript used to test and get the expected result:

<%
DUP
ISO8601
TOTIMESTAMP
SWAP
TSELEMENTS
6
GET
1000000
%
+
%>
'myTOTIMESTAMP'
STORE

NOW
DUP
ISO8601
DUP
TOTIMESTAMP
3
PICK
$myTOTIMESTAMP
EVAL
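
For reference, a sub-millisecond round trip is possible on the JVM. A minimal Java sketch using java.time (not the Warp 10 implementation, just an illustration of the precision at stake, assuming warp.timeunits = ns):

import java.time.Instant;

public class NanoTimestamp {
  public static void main(String[] args) {
    // Parse an ISO8601 string with nanosecond precision...
    Instant i = Instant.parse("2017-06-01T12:34:56.123456789Z");
    // ...and rebuild a timestamp in nanoseconds without losing digits.
    long tsNs = i.getEpochSecond() * 1_000_000_000L + i.getNano();
    System.out.println(tsNs); // 1496320496123456789
  }
}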

incorrect version number in getting started guide

Hello,
I'm diving into Warp 10 for the first time. I followed your getting started guide, but the version number given in that guide (1.0.5) does not match the version cloned from GitHub (1.0.6).
Therefore, the worf.sh script fails to start the worf console.

Regards,
Stan.

Add comments to TRL

Right now, we can put two types of entries in the TRL:

  • a token identifier
  • an application name

One feature present in throttling files is comments, which allow us to add additional information next to a throttling entry. Can we do the same thing for the TRL? This would allow us to display additional information next to token identifiers like this (a small parsing sketch follows the example below):

# app.abcd
awes0mesiphash1
awes0mesiphash2
awes0mesiphash3
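
A minimal, hypothetical Java sketch of comment-aware TRL parsing (this is not the current implementation; handleEntry is a placeholder):

import java.util.Arrays;
import java.util.List;

public class TrlParser {
  public static void main(String[] args) {
    List<String> lines = Arrays.asList("# app.abcd", "awes0mesiphash1", "", "awes0mesiphash2");
    for (String line : lines) {
      String trimmed = line.trim();
      // Comments and blank lines carry no token identifier or application name.
      if (trimmed.isEmpty() || trimmed.startsWith("#")) {
        continue;
      }
      handleEntry(trimmed);
    }
  }

  // Placeholder for whatever revocation handling the TRL loader performs.
  static void handleEntry(String entry) {
    System.out.println("revoked: " + entry);
  }
}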

LTTB creates duplicate Timestamps

Calling LTTB on some GTS can produce a duplicate datapoint at the beginning.

Steps to reproduce:

NEWGTS
0 1200
<% 'k' STORE $k 1000 * 500 + NaN NaN NaN $k 180.0 / DUP SIN SWAP 100 * COS + ADDVALUE %>
FOR
1000 LTTB

Result: {"c":"","l":{},"a":{},"v":[[500,1],[500,1],[3500,-0.0790576529419306] .........

Also notice that while the original GTS has 1200 datapoints, the result contains only 602.

Make calls to Meta API create series in directory

Hi,

There are cases where you need to update some metadata by calling the /meta API, but since your data points could be issued by counters, they can arrive later.

When metadata are managed, there are a few risks of desync between meta and data.

Another point is :

  • The default behaviour should allow the creation of empty series when calling the Meta API.
  • If a user runs a deleteall on a series and then updates its Meta, that leads to the re-creation of the deprecated series. It's up to the user to take responsibility for this and manage their series. The "Last Activity" feature will help clean up these eventualities.

The control over the ability to create series or not could be addressed in two places:

  • global param for the platform
  • as a boolean inside the WRITE token

IMO, this behaviour is more natural than simply forbidding the creation of empty series.

APPEND on a list has O(n) time complexity

Hi,

I was using APPEND on very big lists with the mistaken belief that APPEND on a list is a constant-time operation.

Sadly, after looking at the APPEND code, it's not the case:
java.util.List.addAll will always iterate over the list.

APPEND could be a constant-time operation if Guava's com.google.common.collect.Iterables.concat were used instead of List.

Would this be easy to implement?
Maybe it means changing all function implementations to use Iterables instead of List or Collections.
I'm not sure it's a simple modification.
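
For illustration, a minimal sketch of the lazy-view approach suggested here (Guava on the classpath is assumed; this is not how APPEND is implemented today):

import com.google.common.collect.Iterables;
import java.util.Arrays;
import java.util.List;

public class ConcatSketch {
  public static void main(String[] args) {
    List<Integer> a = Arrays.asList(1, 2, 3);
    List<Integer> b = Arrays.asList(4, 5);

    // List.addAll copies every element: O(n) per APPEND.
    // Iterables.concat returns a lazy view in O(1); elements are
    // only traversed when the result is iterated.
    Iterable<Integer> concatenated = Iterables.concat(a, b);

    for (int v : concatenated) {
      System.out.println(v); // 1 2 3 4 5
    }
  }
}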

Building from releases fails

I downloaded the source package from your posted releases and tried to build the binaries but failed with this error:

  • What went wrong:
    A problem occurred evaluating root project 'warp10-platform-1.2.6'.

Process 'command 'git'' finished with non-zero exit value 128

Looks like your build always expects it to be a git repo.

Bug with ZIP function

The ZIP function pushes an empty list when operating on a list that contains only singletons,
instead of pushing a list containing the concatenation of these singletons.

Better error when calling DELETE function on a inmemory standalone

When using DELETE on an in-memory standalone, we obtain this error:

DELETE failed to complete actual request successfully (Server Error)

Warp 10's log holds the real error:

Caused by: java.io.IOException: MemoryStore only supports deleting complete Geo Time Series.

Is it possible for the DELETE function to forward this error from /api/v0/delete?

Understanding mapper.geo.within

For a sample project, we need to extract a list of values from a square on the map.

I looked at the doc, found mapper.geo.within, and tried to test it with sample values:

NEWGTS 'test' RENAME
10  10 10 0 '(10, 10)' ADDVALUE
20  10 02 0 '(10, 02)' ADDVALUE
30  12 20 0 '(12, 20)' ADDVALUE
40  15 20 0 '(15, 20)' ADDVALUE
50  15 25 0 '(15, 25)' ADDVALUE

// Let's define a square geo zone around the Ile Vierge, near the coastline of Brittany (France)
'POLYGON ((5 5, 5 100, 100 100, 100 5, 5 5 ))'
0.1 true GEO.WKT

mapper.geo.within 0 0 0
5 ->LIST MAP

The result of this query is:

[{"c":"test","l":{},"a":{},"v":[[50,14.999999986030161,24.99999993480742,0,"(15, 25)"]]}]

Only the last datapoint is included in the result, instead of all of them except "(10, 02)".

Is it a bug, or did I make a mistake?

GTSEncoder reset leads to incorrect values

When a GTSEncoder is reset using a GTSEncoder retrieved from a GTSDecoder, if the GTSDecoder contained identical values and one of them was consumed, the reset GTSEncoder will be corrupted.

Issue reading data written with different write tokens which have the same application

Hi Warp10 team,

I am not sure whether this is already solved, since I can't upgrade my Warp 10 for the moment.
I am using version "1.2.5-rc7".

The problem is with this query:

[
 'READ_TOKEN'
 '~className|className'
 {}
 NOW -1
] FETCH

I expect this query to return the latest data point for one GTS, but in my case it returns one GTS with 3 data points.

Why did I write this query? Well, I have a macro which generates the FETCH query, and depending on some conditions the same class name can be chosen several times.

My current workaround is to run the output of that macro through UNIQUE, but still, I didn't expect Warp 10 to behave like this. Is this a bug or expected behaviour? And if it's a bug, is it already solved in the current version?

Thanks

Implicit mutability while using APPEND

Here is an example:

[ 'a' ] 'list' STORE
$list 'b' + DROP
$list

The stack contains : ['a']

[ 'a' ] 'list' STORE
$list [ 'b' ]  APPEND DROP
$list

The stack contains : ['a','b']

Since $list has not been STOREd back, it seems that APPEND mutates the given object in place.
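
This would be consistent with APPEND being implemented on top of java.util.List.addAll (as noted in the APPEND complexity issue above). A minimal Java sketch of the aliasing effect, with illustrative names:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SharedReference {
  public static void main(String[] args) {
    List<String> list = new ArrayList<>(Arrays.asList("a"));

    // Another name for the same underlying object, as when a stored
    // WarpScript variable is pushed back onto the stack.
    List<String> onStack = list;

    // addAll mutates the shared object, so the stored "copy" changes too.
    onStack.addAll(Arrays.asList("b"));

    System.out.println(list); // [a, b]
  }
}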

If this is a normal behaviour, we should point it out in the reference doc for APPEND and other concerned functions.
