cuebook / cueobserve

Time-series anomaly detection and root cause analysis on data in SQL data warehouses and databases

Home Page: https://cueobserve.cuebook.ai

License: Apache License 2.0

Python 55.56% JavaScript 33.07% HTML 0.14% CSS 8.18% SCSS 1.58% Dockerfile 0.66% Shell 0.81%
anomaly anomaly-detection bigquery datawarehouse prophet-facebook redshift root-cause-analysis snowflake sql timeseries-analysis timeseries-forecasting

cueobserve's People

Contributors

ankitkpandey, kmittal01, kshitij-cuebook, prabhu31, praveencuebook, sachinkbansal, slach, vikrantcue, vincue


cueobserve's Issues

Ability to change the interval in anomaly card

Daily anomaly cards currently show 45 days of historical data.
The user should be able to input this interval on the card UI.

There are at least two ways to take this user input:

  1. As a time interval, e.g. last 90 days
  2. As a number of data points, e.g. 90 shows the latest 90 data points

Refer to discussion #139.

Unable to create a dataset with ClickHouse as the DB

I can create a connection but get an error when I try to create a dataset.

select
  date_trunc('hour', date_add(day, date_diff(day, toDate('2014-03-23'), today()), EventTime)) as NewEventTime,
  MobilePhoneModel,
  count(1) as Hits
from hits_v1
GROUP BY
  date_trunc('hour', date_add(day, date_diff(day, toDate('2014-03-23'), today()), EventTime)), MobilePhoneModel
order by date_trunc('hour', date_add(day, date_diff(day, toDate('2014-03-23'), today()), EventTime))

Add `Search` in screens

Screen               Search in Columns
Anomalies            Dataset, Granularity, Measure, Filters
Anomaly Definitions  Dataset, Granularity, Anomaly Definition
Datasets             Dataset Name, Connection, Granularity

Multi-measure RCA

  • metric lineage
  • time lag between metrics
  • auto-apply dimension value as filter, if applicable and available

hourly error

Discussed in #75

Originally posted by jithendra945 August 5, 2021
[screenshot of error]
I'm getting this error when I try to run an anomaly definition.

I'm not sure why it has None in it.

Root Cause Analysis

Analyze an anomalous data point for dimension values with a minimum X% contribution.

Support ClickHouse as a data source


Describe the solution you'd like
Add ClickHouse as a supported data source.

I will wait until #52 is resolved and then try to implement a PR.

Support non-roll-up datasets

I should be able to run anomaly detection on metrics that are not additive.

Below are a few examples of aggregate functions in a dataset's SQL (with GROUP BY) that should then be supported:
COUNT(DISTINCT)
MIN()
MAX()
AVG()
Custom Percentage calculations

Dataset SQL can have zero or more dimensions. Since the data cannot be rolled up, the anomaly definition cannot define the anomaly explosion; instead, the dataset SQL itself defines the extent of the explosion.
E.g. say a dataset has 2 dimensions and 1 metric: State, Brand, ConversionRate. This means anomaly objects must be created for each State+Brand combination. We cannot have an anomaly definition for a single dimension or no dimension.

  • impacts Anomaly Definition screen and logic
  • impacts anomaly object creation process
  • RCA must be disabled for such anomaly cards
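To see why such metrics cannot be rolled up, here is a toy illustration (not project code) showing that averaging per-group conversion rates differs from recomputing the rate over the combined rows:

```python
# Toy illustration (not project code): a ratio metric like ConversionRate
# is not additive, so pre-aggregated groups cannot be rolled up.
import pandas as pd

df = pd.DataFrame({
    "State":       ["CA", "CA", "NY"],
    "Brand":       ["A",  "B",  "A"],
    "Conversions": [10,   20,   30],
    "Visits":      [100,  50,   100],
})
df["ConversionRate"] = df["Conversions"] / df["Visits"]

# Averaging the per-group rates...
avg_of_rates = df["ConversionRate"].mean()                # (0.1 + 0.4 + 0.3) / 3
# ...is not the same as recomputing the rate over all rows:
true_rate = df["Conversions"].sum() / df["Visits"].sum()  # 60 / 250 = 0.24

print(avg_of_rates, true_rate)
```

Because the roll-up is wrong in general, anomaly objects have to be built per State+Brand combination, as described above.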

Scheduled tasks get stuck forever in the Celery queue when there are 6 or more

Describe the bug
CueObserve scheduled tasks are getting stuck in the Celery queue and never complete (no error is thrown).

[screenshot]

To Reproduce
Steps to reproduce the behavior:

  1. Create 6 different anomaly definitions
  2. Create a 5-minute cron interval and wait 5 minutes for the scheduled tasks to start

Those tasks do not complete even after 1 day.

For debugging, I tried executing 5 scheduled tasks and 1 scheduled task separately (on different cron intervals); that works fine. But when 6 scheduled tasks share the same cron interval, they get stuck and never finish.

Expected behavior
It should complete the 6 accepted tasks and then pull the next tasks.

Thanks

`Top N` alternatives in anomaly definition

  • Min % Contribution X
    X is a number between 1 and 100
  • Min Avg Value Y
    Y is of data type double
    • Compare the average of the metric, instead of the metric itself:
      Avg(metric) >= Y
      Ensure average calculations are correct for each granularity:
      • daily granularity
      • hourly granularity

Handle string in metrics

Discussed in #81

Originally posted by satkalra1 August 5, 2021
Hey,
I was trying CueObserve on a test dataset to check that it works properly, but after defining the anomaly I got this error:

Traceback (most recent call last):
  File "pandas/_libs/lib.pyx", line 2062, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "null"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/ops/tasks.py", line 49, in anomalyDetectionJob
    dimValsData = prepareAnomalyDataframes(datasetDf, anomalyDefinition.dataset.timestampColumn, anomalyDefinition.metric, anomalyDefinition.dimension, anomalyDefinition.top)
  File "/code/access/utils.py", line 17, in prepareAnomalyDataframes
    datasetDf[metricCol] = pd.to_numeric(datasetDf[metricCol])
  File "/opt/venv/lib/python3.7/site-packages/pandas/core/tools/numeric.py", line 155, in to_numeric
    values, set(), coerce_numeric=coerce_numeric
  File "pandas/_libs/lib.pyx", line 2099, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "null" at position 1715

Could you please tell me where things are going wrong?
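The failure comes from pd.to_numeric raising on the literal string "null" in the metric column. One possible workaround (a sketch, not the project's actual fix) is to coerce unparseable values to NaN and handle them explicitly:

```python
import pandas as pd

# A metric column that contains the literal string "null"
metric = pd.Series(["10", "null", "12.5"])

# errors="coerce" turns "null" into NaN instead of raising ValueError
clean = pd.to_numeric(metric, errors="coerce")

print(clean.tolist())            # [10.0, nan, 12.5]
print(int(clean.isna().sum()))   # 1 value could not be parsed
```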

Unable to connect to local postgres server

I tried to configure Postgres as described in the documentation, but the server is on my local system.

When I run it, I get a 500 internal server error: could not connect to Postgres server.

I can connect to Postgres via psql on my system, but this project cannot connect.

Does anyone know the solution?

Originally posted by @jithendra945 in #55

Support SQL Server Data Source

Is your feature request related to a problem? Please describe.
I'd love to give CueObserve a try but our warehouse is currently in MS SQL Server.

Describe the solution you'd like
Add SQL Server as a supported data source.

Additional context
I'd be interested in making the necessary pull request, but I'd like some high level advice on what might be needed.

Is it as simple as adding the necessary sqlserver.py in https://github.com/cuebook/CueObserve/tree/main/api/dbConnections ?

Email alerts

  • Ability to add/remove one or more emails in Settings screen

Anomalies page blank

Discussed in #77

Originally posted by jithendra945 August 5, 2021
Whenever an anomaly definition gets an error, the Anomalies page goes blank.

More options for Granularity

Dataset Granularity

  • Week
    • ability to specify week start day in Settings.
  • Month

Runtime granularity for RCA

  • Specify in multiples of the dataset granularity, e.g. 3 days
  • Specify the origin/end point; the default is the end point as system time.

Anomaly Qualification rules

Anomaly qualification rules to decide whether an anomaly should be published or not.
Should these rules be defined at the global level, dataset level or at the anomaly definition level?
How do we merge duplicate anomalies resulting from multiple anomaly definitions on the same measure?
Support OR / AND when there are multiple rules.

Threshold metrics

  • Percentage Change in Metric. Compare a data point vs the previous data point.
  • Metric Value
  • Filter's % Contribution
  • Anomaly Deviation Value
  • % Anomaly Deviation

Threshold operators
>, >=, <, <=, between, not between
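A sketch of how these threshold operators could be evaluated (the mapping and function shape are assumptions for illustration, not project code):

```python
import operator

# Map each threshold operator to a predicate
OPERATORS = {
    ">":  operator.gt,
    ">=": operator.ge,
    "<":  operator.lt,
    "<=": operator.le,
    "between":     lambda v, lo, hi: lo <= v <= hi,
    "not between": lambda v, lo, hi: not (lo <= v <= hi),
}

def passes_threshold(value, op, y, z=None):
    """Return True when `value` satisfies the threshold rule."""
    fn = OPERATORS[op]
    return fn(value, y, z) if z is not None else fn(value, y)

print(passes_threshold(120, ">=", 100))         # True
print(passes_threshold(50, "between", 40, 60))  # True
```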

Add detailed production deployment docs

user feedback

First question on top of my mind: how to host it somewhere and schedule these anomaly detection jobs in a self-serve manner.
Once scheduled - someone else can come in and navigate the results super easily. Could not find anything in the documentation but will play around a bit more.

Handle data quality before running anomaly detection

Run data quality check after fetching data for a dataset and before running anomaly detection job.

  • Metric column must not have any string value. Refer #81.
    If a dataset metric contains a string value, throw error in anomaly definition and do not run the anomaly definition.

  • Handle NULL, NaN values for metrics in pandas dataframe

  • Better handling of insufficient data
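A sketch of such a pre-flight check (function name, thresholds, and error messages are assumptions, not project code):

```python
import pandas as pd

def check_dataset(df: pd.DataFrame, metric_col: str, min_points: int = 20) -> pd.DataFrame:
    """Validate a dataset's metric column before running anomaly detection."""
    metric = pd.to_numeric(df[metric_col], errors="coerce")
    # Values that failed to parse but were not NULL to begin with are strings
    bad = metric.isna() & df[metric_col].notna()
    if bad.any():
        raise ValueError(f"{int(bad.sum())} non-numeric value(s) in '{metric_col}'")
    # Drop NULL/NaN metric rows, then check there is enough data left
    clean = df.assign(**{metric_col: metric}).dropna(subset=[metric_col])
    if len(clean) < min_points:
        raise ValueError(f"insufficient data: {len(clean)} < {min_points} points")
    return clean

df = pd.DataFrame({"y": [1, 2, None, 4] * 10})
clean = check_dataset(df, "y")
print(len(clean))  # 30 rows remain after dropping NULLs
```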

anomaly objects must be greater than 0

Add a test case that anomaly objects must always be greater than 0 for an anomaly run.

There can be scenarios where no anomaly object is created because the % contribution or minimum value condition is not met.
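A sketch of what such a test could look like (the run function here is a stand-in; the real job creates anomaly objects in the database):

```python
def run_anomaly_detection(dimension_frames):
    """Stand-in for the anomaly run: one 'anomaly object' per non-empty series."""
    return [{"dimension": name} for name, series in dimension_frames.items() if series]

def test_run_creates_at_least_one_anomaly_object():
    anomalies = run_anomaly_detection({"US": [1, 2, 3], "EU": [4, 5]})
    assert len(anomalies) > 0, "anomaly run produced zero anomaly objects"

test_run_creates_at_least_one_anomaly_object()
```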

Specify Manual Rule in anomaly definition

Type Measure Rule [Dimension Explosion]


Type = Rule, Prophet

Rule =

  • Percentage Change >= X
    compare data point vs previous data point
    X is a number >= 1
  • Value operator Y [and Z]
    operator = >, >=, <, <=, between, not between
    Y, Z are of type double
  • Lifetime High/Low
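A sketch of evaluating these manual rules on a series of data points (function names and shapes are illustrative assumptions, not project code):

```python
def pct_change_rule(points, x):
    """Percentage Change >= X, comparing the latest point vs the previous one."""
    prev, cur = points[-2], points[-1]
    return abs(cur - prev) / abs(prev) * 100 >= x

def value_rule(points, op, y, z=None):
    """Value operator Y [and Z]; only 'between' and '>=' shown for brevity."""
    v = points[-1]
    if op == "between":
        return y <= v <= z
    if op == ">=":
        return v >= y
    raise ValueError(f"unsupported operator: {op}")

def lifetime_high_rule(points):
    """Latest point is a lifetime high."""
    return points[-1] > max(points[:-1])

series = [100, 102, 101, 150]
print(pct_change_rule(series, 20))              # True: 101 -> 150 is ~48.5%
print(value_rule(series, "between", 140, 160))  # True
print(lifetime_high_rule(series))               # True: 150 exceeds all prior points
```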

Ability to abort a running RCA

The time it takes to do RCA depends on the number of dimensions in the dataset and the available infra.
User should be able to abort a running RCA.

Support generic rest end point for notification

Discussed in #129

Originally posted by pjpringle August 30, 2021
Not everyone has Slack, especially in workplace environments. Provide support to plug in REST calls to notify of anomalies.

  • Image bytes included in response json
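A sketch of a generic notification payload carrying the anomaly card image as base64 in the JSON body (field names are assumptions; the endpoint URL and schema would be user-configured):

```python
import base64
import json

def build_notification(anomaly_name: str, image_bytes: bytes) -> str:
    """Build a JSON body for a generic webhook, image bytes included as base64."""
    payload = {
        "anomaly": anomaly_name,
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(payload)

body = build_notification("Orders [hourly]", b"\x89PNG...")
# The caller would POST `body` to the configured endpoint with
# Content-Type: application/json.
print(json.loads(body)["anomaly"])  # Orders [hourly]
```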

Update docs

  • add Settings screen #34
  • update Features in readme and overview for slack alerts
  • #44
  • #46
    • Is contribution calculated for the data point or for the entire dataset?
    • Update in
      • readme (image + text), features
      • overview
      • why cueobserve
  • #29

Auto-scale anomaly detection & RCA infra from/to Zero

Discussed in #135

Originally posted by sdepablos September 15, 2021
Right now CueObserve is using Celery to schedule tasks (plus Redis to save the config). This requires having the system always up for the scheduler to work. In my case, I'd prefer to run this as a "scale to 0" application, either via Cloud Run or App Engine. To that effect, my idea would be to define the schedule in Google Cloud Scheduler, which would then trigger the recalculation, without the need to have a system always up. What would be the API endpoint to trigger a task?

Celery Tasks Fail Randomly with redis.exceptions.ResponseError: wrong number of arguments for 'subscribe' command

Describe the bug
Facing this issue intermittently: Celery gives the following error on a scheduled run. I believe this is happening because of a race condition due to asyncio. We are using the single-pod setup only; even with that configuration, the issue pops up randomly.

Traceback (most recent call last):
  File "/code/ops/tasks/anomalyDetectionTasks.py", line 85, in anomalyDetectionJob
    result = _detectionJobs.get()
  File "/opt/venv/lib/python3.7/site-packages/celery/result.py", line 680, in get
    on_interval=on_interval,
  File "/opt/venv/lib/python3.7/site-packages/celery/result.py", line 799, in join_native
    on_message, on_interval):
  File "/opt/venv/lib/python3.7/site-packages/celery/backends/asynchronous.py", line 150, in iter_native
    for _ in self._wait_for_pending(result, no_ack=no_ack, **kwargs):
  File "/opt/venv/lib/python3.7/site-packages/celery/backends/asynchronous.py", line 267, in _wait_for_pending
    on_interval=on_interval):
  File "/opt/venv/lib/python3.7/site-packages/celery/backends/asynchronous.py", line 54, in drain_events_until
    yield self.wait_for(p, wait, timeout=interval)
  File "/opt/venv/lib/python3.7/site-packages/celery/backends/asynchronous.py", line 63, in wait_for
    wait(timeout=timeout)
  File "/opt/venv/lib/python3.7/site-packages/celery/backends/redis.py", line 152, in drain_events
    message = self._pubsub.get_message(timeout=timeout)
  File "/opt/venv/lib/python3.7/site-packages/redis/client.py", line 3617, in get_message
    response = self.parse_response(block=False, timeout=timeout)
  File "/opt/venv/lib/python3.7/site-packages/redis/client.py", line 3505, in parse_response
    response = self._execute(conn, conn.read_response)
  File "/opt/venv/lib/python3.7/site-packages/redis/client.py", line 3479, in _execute
    return command(*args, **kwargs)
  File "/opt/venv/lib/python3.7/site-packages/redis/connection.py", line 756, in read_response
    raise response
redis.exceptions.ResponseError: wrong number of arguments for 'subscribe' command

To Reproduce
Steps to reproduce the behavior:

  1. Create an anomaly definition
  2. Schedule it to run at specific time
  3. A few times the schedule might succeed, whereas other times you might see the above error

Expected behavior
Is there any workaround we can use to avoid this issue? Please help.
