Comments (12)
Hey @thunyathep,
yes, this is working. To save some performance and bandwidth, it might be a good idea to also add $top=0, so you don't get all observations if you request data for a longer time interval. The full request looks like this:
from frost-server.
Unfortunately, aggregation functions are not part of the SensorThings API. Time functions (like during) are under discussion: opengeospatial/sensorthings#2
Hey @tobi238 , I am doing the same thing by using $filter= phenomenonTime ge .... and phenomenonTime le ....; then from the response you get the "@iot.count". Then you can simply write the code to sum up all your results and divide by "@iot.count". :)
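A minimal Python sketch of that client-side averaging. The response pages here are hypothetical decoded JSON bodies as a SensorThings service would return them; summing client-side also avoids depending on "@iot.count":

```python
# Client-side averaging of SensorThings Observations, as described above.
# 'pages' stands for decoded JSON response bodies (e.g. collected by
# following @iot.nextLink); the data is illustrative.

def average_results(pages):
    """Average the numeric 'result' of Observations spread over pages.

    Each page is a dict with a 'value' list of Observation entities.
    """
    total = 0.0
    count = 0
    for page in pages:
        for obs in page["value"]:
            total += obs["result"]
            count += 1
    return total / count if count else None

# Example with two response pages:
pages = [
    {"value": [{"result": 10.0}, {"result": 14.0}]},
    {"value": [{"result": 12.0}]},
]
print(average_results(pages))  # 12.0
```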
As I think aggregation support is crucial in the long run and is specified by an OData extension, I opened an issue on it: opengeospatial/sensorthings#71
You could make a prototype to see if it works as expected. A demonstration of the function helps a lot when trying to get things standardised.
We were discussing today whether we can make it a thesis at the university. However, before any implementation there should be at least some rough consensus about possible syntax and semantics, IMHO.
After discussion with my colleagues, I would argue for, in addition to the normal average, a non-standard time-series average with two arguments: both value and interval/timestamp. This is particularly necessary to average observations with variable phenomenonTime intervals.
However, it is unclear to me how to denote this using the OData Extension for Data Aggregation, even with custom functions.
Also, functions applied before grouping seem not to be part of the standard. My intuitive definition of a standard time-series aggregation would be:
https://example.com/v1.0/Datastreams(1)/Observations?$filter=during(phenomenonTime, 2017-10-25T15:00:00.000Z/2017-10-26T15:00:00.000Z)&$apply=groupby(time_bucket(phenomenonTime,'10m'), aggregate((result,phenomenonTime) with st_average as average))
or, in the simple example above:
http://localhost:8080/SensorThingsServer-1.0/v1.0/Datastreams(123)/Observations
?$apply=aggregate((result, phenomenonTime) with st_average as avg)
&$filter=during( phenomenonTime, 2017-10-25T15:00:00.000Z/2017-10-26T15:00:00.000Z )
or, if you consider only point measurements:
http://localhost:8080/SensorThingsServer-1.0/v1.0/Datastreams(123)/Observations
?$apply=aggregate(result with average as avg)
&$filter=during( phenomenonTime, 2017-10-25T15:00:00.000Z/2017-10-26T15:00:00.000Z )
The last one would be the simplest to implement, but it actually only makes sense for periodic or random point sampling. Although I believe this covers 80% of all practical cases...
st_average I would define as the linear interpolation between intervals and points. Alternatively, only intervals could be counted; then any aggregation of single values would be ignored. Or only single values get linearly interpolated.
Actually, some further magic is needed for during that cuts overlapping intervals internally; the same should be done for time_bucket, so that a phenomenonTime of 2017-10-24T23:00:00.000Z/2017-10-25T23:00:00.000Z is cut to 2017-10-25T15:00:00.000Z/2017-10-25T23:00:00.000Z internally before aggregation.
This actually seems a bit ugly. BTW: during, as I understand it, would also be wrong in the above example, because it doesn't get the first value correctly if it is an interval; shouldn't it be anyinteracts to calculate the average correctly?
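The interval "cutting" described above can be sketched in Python. This is only an illustration of the semantics, not an implementation proposal; the function and argument names are made up:

```python
from datetime import datetime, timedelta

def clip_to_bucket(start, end, bucket_start, bucket_len):
    """Clip a phenomenonTime interval [start, end) to one time bucket.

    Returns the overlapping sub-interval, or None if there is no overlap.
    An observation interval that straddles a bucket boundary thus only
    contributes its overlapping part to that bucket's aggregate.
    """
    bucket_end = bucket_start + bucket_len
    lo = max(start, bucket_start)
    hi = min(end, bucket_end)
    return (lo, hi) if lo < hi else None

# The example from the discussion: a 24h observation interval clipped
# to a window starting 2017-10-25T15:00 -> yields 15:00..23:00.
obs_start = datetime(2017, 10, 24, 23)
obs_end = datetime(2017, 10, 25, 23)
print(clip_to_bucket(obs_start, obs_end,
                     datetime(2017, 10, 25, 15), timedelta(hours=24)))
```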
Because this is a little bit difficult to define consistently, I would rather do a definition first. I know it is common these days to do de-facto standardization by taking demos as input to standards, but I do not like this. I would rather produce a consistent definition that can be the basis for an implementation.
This gets complicated quickly :) Some small questions for understanding:
First, taking a little step further back: In your brainstorm examples, we're doing queries on the Observations collection, but what are we getting back? I guess virtual Observations that do not have an ID, but only result and phenomenonTime, and are formatted the same otherwise? Or would it be a completely differently formatted result set?
If I understand correctly, the first example (with time_bucket function) would also work without filter. In that case it would return an entire Datastream, with 10 minute average result values.
In contrast, the second and third example would require the user to do a separate request for each 10 minute interval.
The return is defined by the OData Data Aggregation Specification.
Yes, the first one works without filter. The second and third would be rather meaningless without the filter, but should theoretically also work without.
The complicated part only arises if you consider time intervals (could be a second step). I like that SensorThings supports intervals, but it makes things rather complicated (e.g. the IMHO wrong use of during in the example above shows that most users don't think about intervals in the first place). I realized that I actually know of no time-series database that uses them. Now I grasp why... (although they are theoretically great).
The standard for aggregation without intervals seems to be implemented by some Microsoft and SAP services, e.g. https://docs.microsoft.com/en-us/azure/devops/report/extend-analytics/aggregated-data-analytics?view=azure-devops; the basic functionality is also implemented in Olingo.
MultiDatastreams will also be an issue, where result is an array of values.
And all other cases where result is not a simple numeric value.
Of course, since time series databases are optimised for a very specific function, and OData is an interface for generic relational databases, they are on two opposite ends of a spectrum. Do we need to be able to aggregate Things or ObservedProperties?
I'm starting to think it might be better to come up with a completely new definition rather than trying to use the OData aggregation specification. I don't really see many users building those aggregation queries...
An initial list of features to consider:
- Can the user request any aggregation, or does the admin specify certain parameters?
- Available aggregation intervals (hourly, daily, 2 minutes)
- Aggregation offset (when does a day start, for daily aggregates)
- Which time to aggregate on (phenomenonTime, resultTime, validTime)
- Is the result Observation compatible (easier), or something different (potentially more powerful)?
- Non-numeric aggregations
- I have an OM_CategoryObservation and I want my aggregate result to be the map of counts for each category
- I have a MultiDatastream with wind speed and direction, and want the average speed, for each 45° section.
My gut feeling is that it would be nice to stay/become compliant with OData, because of tool support (Olingo and .NET provide client libraries that will only work in a compliant setting). I also think a lot less documentation is needed if the syntax stays compliant.
In a fully OData-compliant setting you can specify which properties you allow to aggregate (this requires metadata support, however). However, I see that you are right: some time-series stuff is a bit awkward.
- I believe that MultiDatastreams are manageable, because they are fixed-length. Aggregating across different ObservedProperties (and even quantities) makes no sense.
- Arbitrary intervals are nice.
- Offsets should be allowed: it makes no sense to use the same day start in India and in Germany (e.g. 0:00:00 UTC). Maybe one should prefer a syntax like the one at the bottom of this post. The problem here is really whether that syntax is actually OData compliant. (This supports your argument for breaking the standard.)
- You should be able to choose the group-by feature, i.e. which time to aggregate on.
- Results should, as the standard specifies, only contain the grouping parameters and the aggregate (otherwise there are no clear semantics). The hierarchy of those should be preserved as defined in the standard.
- I believe it is completely OK to return errors if the aggregate function does not work on the data type.
- Non-numeric aggregations by count are rather trivial.
- The wind speed example is straightforward to specify by using two grouping elements (I actually now think that using arithmetics is better than a custom time_bucket function, due to the restrictions you mentioned):
  $apply=groupby((floor((phenomenonTime sub 2018-01-01T08:00:00Z) div duration'P1d') as window, result[1] div 45))
  I believe, however, that this might not be fully compliant, because the arithmetics might not be supported.
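For illustration, that window/sector grouping can be sketched in Python. The offset, one-day window length, and the MultiDatastream result layout [speed, direction] are all assumptions taken from the example above:

```python
import math
from datetime import datetime, timedelta

def group_key(phenomenon_time, result,
              offset=datetime(2018, 1, 1, 8),
              window=timedelta(days=1)):
    """Mimic the grouping expression above:
    floor((phenomenonTime sub offset) div P1d) as window, result[1] div 45.

    The offset makes a 'day' start at 08:00 rather than 00:00 UTC;
    result[1] is assumed to be the wind direction in degrees.
    """
    window_idx = math.floor((phenomenon_time - offset) / window)
    sector = int(result[1] // 45)  # 45-degree wind-direction sector, 0..7
    return (window_idx, sector)

# 2018-01-02 09:00 falls into day-window 1; direction 100 deg -> sector 2.
print(group_key(datetime(2018, 1, 2, 9), [5.2, 100.0]))  # (1, 2)
```

Averaging the wind speed per (window, sector) key then gives the "average speed for each 45° section" from the feature list.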
IMHO the "best" semantics for an average is actually sum(interval_length(t) * x) / sum(interval_length(t)), where interval_length(t) defaults to epsilon for a timestamp and to a unitless interval length for any interval. (If there are only epsilons, the denominator becomes count() * epsilon and the numerator sum(epsilon * x), so the epsilons cancel.)
Practically, you first look for any intervals in the aggregate; if there are any, you use a weighted sum of those, and if there are only timestamps, you use a standard average.
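A minimal Python sketch of this time-weighted semantics. The observation representation (value, interval length in seconds, or None for a point timestamp) is an assumption for illustration:

```python
def st_average(observations, epsilon=1e-9):
    """Time-weighted average as proposed above:
    sum(length(t) * x) / sum(length(t)).

    Each observation is (value, length): 'length' is the phenomenonTime
    interval length in seconds, or None for a point timestamp, which then
    defaults to a small epsilon. With only point timestamps the epsilons
    cancel and this degenerates to the ordinary mean; mixed with real
    intervals, the points' contribution effectively vanishes.
    """
    num = 0.0
    den = 0.0
    for value, length in observations:
        weight = epsilon if length is None else float(length)
        num += weight * value
        den += weight
    return num / den if den else None

# Points only: behaves like a plain average (~15.0).
print(st_average([(10.0, None), (20.0, None)]))
# A one-hour interval dominates a point sample (~10.0).
print(st_average([(10.0, 3600), (1000.0, None)]))
```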
We first thought of using linear interpolation (this could be a second custom function); however, that gets really awkward for overlapping intervals. Actually, to be honest, the result in the above case is well-defined for overlaps, but does not make too much sense. For rolling averages, the correct average function would need to weight all overlaps by the number of overlapping measurements. This is normally no problem, because you have an equal number of overlaps in most use cases (and can thus use the function above). But I have to admit that there are pathological cases. Still, I think an API only needs to be well-defined ;).
Beyond custom aggregation functions (3.1.3.6), looking further at the OData Aggregation specification, one could actually easily define custom aggregates (6.2.3 Custom Aggregates), like a time-weighted average.
You can even enforce that the average is always used in the context of certain grouping parameters (6.2.4; you could require the unit as well, to make it even more complicated :) ). It might be difficult, however, to force using a window function with variable length/offset (like in the expression using floor above) via this OData mechanism.
I still believe it would be of great value to get this right....
"Hey @tobi238 , I am doing the same thing by using $filter= phenomenonTime ge .... and phenomenonTime le .... then from the response, you get the "@iot.count". Then you can simply write the code to sum up all your results and then divided by "@iot.count". :)"
The problem is that you cannot rely on "@iot.count", as stated in 9.3.3.4 $count:
"Clients should be aware that the count returned inline may not exactly equal the actual number of items returned, due to latency between calculating the count and enumerating the last value or due to inexact calculations on the service."