cuebook / cueobserve

Time-series anomaly detection and root cause analysis on data in SQL data warehouses and databases

Home Page: https://cueobserve.cuebook.ai

License: Apache License 2.0

Python 55.56% JavaScript 33.07% HTML 0.14% CSS 8.18% SCSS 1.58% Dockerfile 0.66% Shell 0.81%
anomaly anomaly-detection bigquery datawarehouse prophet-facebook redshift root-cause-analysis snowflake sql timeseries-analysis timeseries-forecasting

cueobserve's People

Contributors

ankitkpandey, kmittal01, kshitij-cuebook, prabhu31, praveencuebook, sachinkbansal, slach, vikrantcue, vincue


cueobserve's Issues

Ability to change the interval in anomaly card

Daily anomaly cards currently show 45 days of historical data.
The user should be able to input this interval on the card UI.

There are at least two ways to take this user input:

  1. As a time interval, e.g. last 90 days
  2. As a number of data points, e.g. 90 shows the latest 90 data points

Refer to discussion #139.

Unable to create a dataset with ClickHouse as the DB

I can create a connection but get an error when I try to create a dataset.

select
  date_trunc('hour', date_add(day, date_diff(day, toDate('2014-03-23'), today()), EventTime)) as NewEventTime,
  MobilePhoneModel,
  count(1) as Hits
from hits_v1
GROUP BY
  date_trunc('hour', date_add(day, date_diff(day, toDate('2014-03-23'), today()), EventTime)), MobilePhoneModel
order by date_trunc('hour', date_add(day, date_diff(day, toDate('2014-03-23'), today()), EventTime))

Add `Search` in screens

Screen               Search in Columns
Anomalies            Dataset, Granularity, Measure, Filters
Anomaly Definitions  Dataset, Granularity, Anomaly Definition
Datasets             Dataset Name, Connection, Granularity

Multi-measure RCA

  • metric lineage
  • time lag between metrics
  • auto-apply dimension value as filter, if applicable and available

hourly error

Discussed in #75

Originally posted by jithendra945 August 5, 2021
[screenshot of error]
I'm getting this error when I try to run an anomaly definition.

I'm not sure why it has None in it.

Root Cause Analysis

Analyze an anomalous data point for dimension values with a minimum X% contribution.

Support ClickHouse as a data source


Describe the solution you'd like
Add ClickHouse as a supported data source.

I will wait until #52 is resolved and then try to implement a PR.

Support non-roll-up datasets

I should be able to run anomaly detection on metrics that are not additive.

Below are a few examples of aggregate functions in a dataset's SQL (with GROUP BY) that should then be supported:
COUNT(DISTINCT)
MIN()
MAX()
AVG()
Custom Percentage calculations

Dataset SQL can have zero or more dimensions. Since the data cannot be rolled up, the anomaly definition cannot define the anomaly explosion; instead, the dataset SQL itself defines the extent of the explosion.
E.g. say a dataset has 2 dimensions and 1 metric: State, Brand, ConversionRate. This means anomaly objects must be created for each State+Brand combination. We cannot have an anomaly definition for a single dimension or no dimension.

  • impacts Anomaly Definition screen and logic
  • impacts anomaly object creation process
  • RCA must be disabled for such anomaly cards
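To see why such metrics cannot be rolled up, here is a toy illustration (not project code) showing that averaging per-group conversion rates differs from recomputing the rate over the combined rows:

```python
# Toy illustration (not project code): a ratio metric like ConversionRate
# is not additive, so pre-aggregated groups cannot be rolled up.
import pandas as pd

df = pd.DataFrame({
    "State":       ["CA", "CA", "NY"],
    "Brand":       ["A",  "B",  "A"],
    "Conversions": [10,   20,   30],
    "Visits":      [100,  50,   100],
})
df["ConversionRate"] = df["Conversions"] / df["Visits"]

# Averaging the per-group rates...
avg_of_rates = df["ConversionRate"].mean()                # (0.1 + 0.4 + 0.3) / 3
# ...is not the same as recomputing the rate over all rows:
true_rate = df["Conversions"].sum() / df["Visits"].sum()  # 60 / 250 = 0.24

print(avg_of_rates, true_rate)
```

Because the roll-up is wrong in general, anomaly objects have to be built per State+Brand combination, as described above.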

Scheduled tasks get stuck forever in the Celery queue when there are 6 or more

Describe the bug
CueObserve scheduled tasks are getting stuck in the Celery queue and never complete (no error is thrown).

[screenshot]

To Reproduce
Steps to reproduce the behavior:

  1. Create 6 different anomaly definitions
  2. Create a 5-minute cron interval and wait 5 minutes for the scheduled tasks to start

Those tasks do not complete even after 1 day.

For debugging, I tried executing 5 scheduled tasks and 1 scheduled task separately (on different cron intervals); that works fine. But when 6 scheduled tasks share the same cron interval, they get stuck and never finish.

Expected behavior
It should complete the 6 accepted tasks and then pull the next tasks.

Thanks

`Top N` alternatives in anomaly definition

  • Min % Contribution X
    X is a number between 1 and 100
  • Min Avg Value Y
    Y is of data type double
    • Compare the average of the metric, instead of the metric itself:
      Avg(metric) >= Y
      Ensure average calculations are correct for each granularity:
      • daily granularity
      • hourly granularity

Handle string in metrics

Discussed in #81

Originally posted by satkalra1 August 5, 2021
Hey,
I was trying CueObserve on a test dataset to check that it works properly, but after defining the anomaly I got this error:

Traceback (most recent call last):
  File "pandas/_libs/lib.pyx", line 2062, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "null"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/code/ops/tasks.py", line 49, in anomalyDetectionJob
    dimValsData = prepareAnomalyDataframes(datasetDf, anomalyDefinition.dataset.timestampColumn, anomalyDefinition.metric, anomalyDefinition.dimension, anomalyDefinition.top)
  File "/code/access/utils.py", line 17, in prepareAnomalyDataframes
    datasetDf[metricCol] = pd.to_numeric(datasetDf[metricCol])
  File "/opt/venv/lib/python3.7/site-packages/pandas/core/tools/numeric.py", line 155, in to_numeric
    values, set(), coerce_numeric=coerce_numeric
  File "pandas/_libs/lib.pyx", line 2099, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "null" at position 1715

Could you please tell me where things are going wrong?
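The failure comes from pd.to_numeric raising on the literal string "null" in the metric column. One possible workaround (a sketch, not the project's actual fix) is to coerce unparseable values to NaN and handle them explicitly:

```python
import pandas as pd

# A metric column that contains the literal string "null"
metric = pd.Series(["10", "null", "12.5"])

# errors="coerce" turns "null" into NaN instead of raising ValueError
clean = pd.to_numeric(metric, errors="coerce")

print(clean.tolist())            # [10.0, nan, 12.5]
print(int(clean.isna().sum()))   # 1 value could not be parsed
```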

Unable to connect to local postgres server

I tried to configure Postgres as described in the documentation, but the server is on my local system.

When I run it, I get a 500 internal server error: could not connect to Postgres server.

I can connect to Postgres via psql on my system, but this project cannot connect.

Does anyone know the solution?

Originally posted by @jithendra945 in #55

Support SQL Server Data Source

Is your feature request related to a problem? Please describe.
I'd love to give CueObserve a try but our warehouse is currently in MS SQL Server.

Describe the solution you'd like
Add SQL Server as a supported data source.

Additional context
I'd be interested in making the necessary pull request, but I'd like some high level advice on what might be needed.

Is it as simple as adding the necessary sqlserver.py in https://github.com/cuebook/CueObserve/tree/main/api/dbConnections ?

Email alerts

  • Ability to add/remove one or more emails in Settings screen

Anomalies page blank

Discussed in #77

Originally posted by jithendra945 August 5, 2021
Whenever an anomaly definition gets an error, the Anomalies page goes blank.

More options for Granularity

Dataset Granularity

  • Week
    • ability to specify week start day in Settings.
  • Month

Runtime granularity for RCA

  • Specify in multiples of the dataset granularity, e.g. 3 days
  • Specify the origin/end point; the default is the end point as system time.

Anomaly Qualification rules

Anomaly qualification rules to decide whether an anomaly should be published or not.
Should these rules be defined at the global level, dataset level or at the anomaly definition level?
How do we merge duplicate anomalies resulting from multiple anomaly definitions on the same measure?
Support OR / AND when there are multiple rules.

Threshold metrics

  • Percentage Change in Metric. Compare a data point vs the previous data point.
  • Metric Value
  • Filter's % Contribution
  • Anomaly Deviation Value
  • % Anomaly Deviation

Threshold operators
>, >=, <, <=, between, not between
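A sketch of how these threshold operators could be evaluated (the mapping and function shape are assumptions for illustration, not project code):

```python
import operator

# Map each threshold operator to a predicate
OPERATORS = {
    ">":  operator.gt,
    ">=": operator.ge,
    "<":  operator.lt,
    "<=": operator.le,
    "between":     lambda v, lo, hi: lo <= v <= hi,
    "not between": lambda v, lo, hi: not (lo <= v <= hi),
}

def passes_threshold(value, op, y, z=None):
    """Return True when `value` satisfies the threshold rule."""
    fn = OPERATORS[op]
    return fn(value, y, z) if z is not None else fn(value, y)

print(passes_threshold(120, ">=", 100))         # True
print(passes_threshold(50, "between", 40, 60))  # True
```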

Add detailed production deployment docs

user feedback

First question on top of my mind: how to host it somewhere and schedule these anomaly detection jobs in a self-serve manner.
Once scheduled - someone else can come in and navigate the results super easily. Could not find anything in the documentation but will play around a bit more.

Handle data quality before running anomaly detection

Run data quality check after fetching data for a dataset and before running anomaly detection job.

  • Metric column must not have any string value. Refer #81.
    If a dataset metric contains a string value, throw error in anomaly definition and do not run the anomaly definition.

  • Handle NULL, NaN values for metrics in pandas dataframe

  • Better handling of insufficient data
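A sketch of such a pre-flight check (function name, thresholds, and error messages are assumptions, not project code):

```python
import pandas as pd

def check_dataset(df: pd.DataFrame, metric_col: str, min_points: int = 20) -> pd.DataFrame:
    """Validate a dataset's metric column before running anomaly detection."""
    metric = pd.to_numeric(df[metric_col], errors="coerce")
    # Values that failed to parse but were not NULL to begin with are strings
    bad = metric.isna() & df[metric_col].notna()
    if bad.any():
        raise ValueError(f"{int(bad.sum())} non-numeric value(s) in '{metric_col}'")
    # Drop NULL/NaN metric rows, then check there is enough data left
    clean = df.assign(**{metric_col: metric}).dropna(subset=[metric_col])
    if len(clean) < min_points:
        raise ValueError(f"insufficient data: {len(clean)} < {min_points} points")
    return clean

df = pd.DataFrame({"y": [1, 2, None, 4] * 10})
clean = check_dataset(df, "y")
print(len(clean))  # 30 rows remain after dropping NULLs
```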

anomaly objects must be greater than 0

Add a test case that anomaly objects must always be greater than 0 for an anomaly run.

There can be scenarios where no anomaly object is created because the % contribution or minimum value condition is not met.
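A sketch of what such a test could look like (the run function here is a stand-in; the real job creates anomaly objects in the database):

```python
def run_anomaly_detection(dimension_frames):
    """Stand-in for the anomaly run: one 'anomaly object' per non-empty series."""
    return [{"dimension": name} for name, series in dimension_frames.items() if series]

def test_run_creates_at_least_one_anomaly_object():
    anomalies = run_anomaly_detection({"US": [1, 2, 3], "EU": [4, 5]})
    assert len(anomalies) > 0, "anomaly run produced zero anomaly objects"

test_run_creates_at_least_one_anomaly_object()
```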

Specify Manual Rule in anomaly definition

Type Measure Rule [Dimension Explosion]


Type = Rule, Prophet

Rule =

  • Percentage Change >= X
    compare data point vs previous data point
    X is a number >= 1
  • Value operator Y [and Z]
    operator = >, >=, <, <=, between, not between
    Y, Z are of type double
  • Lifetime High/Low
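A sketch of evaluating these manual rules on a series of data points (function names and shapes are illustrative assumptions, not project code):

```python
def pct_change_rule(points, x):
    """Percentage Change >= X, comparing the latest point vs the previous one."""
    prev, cur = points[-2], points[-1]
    return abs(cur - prev) / abs(prev) * 100 >= x

def value_rule(points, op, y, z=None):
    """Value operator Y [and Z]; only 'between' and '>=' shown for brevity."""
    v = points[-1]
    if op == "between":
        return y <= v <= z
    if op == ">=":
        return v >= y
    raise ValueError(f"unsupported operator: {op}")

def lifetime_high_rule(points):
    """Latest point is a lifetime high."""
    return points[-1] > max(points[:-1])

series = [100, 102, 101, 150]
print(pct_change_rule(series, 20))              # True: 101 -> 150 is ~48.5%
print(value_rule(series, "between", 140, 160))  # True
print(lifetime_high_rule(series))               # True: 150 exceeds all prior points
```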

Ability to abort a running RCA

The time it takes to do RCA depends on the number of dimensions in the dataset and the available infra.
User should be able to abort a running RCA.

Support generic rest end point for notification

Discussed in #129

Originally posted by pjpringle August 30, 2021
Not everyone has Slack, especially in workplace environments. Provide support to plug in REST calls to notify of anomalies.

  • Image bytes included in response json
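A sketch of a generic notification payload carrying the anomaly card image as base64 in the JSON body (field names are assumptions; the endpoint URL and schema would be user-configured):

```python
import base64
import json

def build_notification(anomaly_name: str, image_bytes: bytes) -> str:
    """Build a JSON body for a generic webhook, image bytes included as base64."""
    payload = {
        "anomaly": anomaly_name,
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
    }
    return json.dumps(payload)

body = build_notification("Orders [hourly]", b"\x89PNG...")
# The caller would POST `body` to the configured endpoint with
# Content-Type: application/json.
print(json.loads(body)["anomaly"])  # Orders [hourly]
```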

Update docs

  • add Settings screen #34
  • update Features in readme and overview for slack alerts
  • #44
  • #46
    • Is contribution calculated for the data point or for the entire dataset?
    • Update in
      • readme (image + text), features
      • overview
      • why cueobserve
  • #29

Auto-scale anomaly detection & RCA infra from/to Zero

Discussed in #135

Originally posted by sdepablos September 15, 2021
Right now CueObserve is using Celery to schedule tasks (plus Redis to save the config). This requires having the system always up for the scheduler to work. In my case, I'd prefer to run this as a "scale to 0" application, either via Cloud Run or App Engine. To that effect, my idea would be to define the schedule in Google Cloud Scheduler, which would then trigger the recalculation, without the need to have a system always up. What would be the API endpoint to trigger a task?

Celery Tasks Fail Randomly with redis.exceptions.ResponseError: wrong number of arguments for 'subscribe' command

Describe the bug
Facing this issue intermittently: Celery gives the following error on a scheduled run. I believe this is happening because of a race condition due to asyncio. We are using the single-pod setup only; even with that configuration, the issue pops up randomly.

Traceback (most recent call last):
  File "/code/ops/tasks/anomalyDetectionTasks.py", line 85, in anomalyDetectionJob
    result = _detectionJobs.get()
  File "/opt/venv/lib/python3.7/site-packages/celery/result.py", line 680, in get
    on_interval=on_interval,
  File "/opt/venv/lib/python3.7/site-packages/celery/result.py", line 799, in join_native
    on_message, on_interval):
  File "/opt/venv/lib/python3.7/site-packages/celery/backends/asynchronous.py", line 150, in iter_native
    for _ in self._wait_for_pending(result, no_ack=no_ack, **kwargs):
  File "/opt/venv/lib/python3.7/site-packages/celery/backends/asynchronous.py", line 267, in _wait_for_pending
    on_interval=on_interval):
  File "/opt/venv/lib/python3.7/site-packages/celery/backends/asynchronous.py", line 54, in drain_events_until
    yield self.wait_for(p, wait, timeout=interval)
  File "/opt/venv/lib/python3.7/site-packages/celery/backends/asynchronous.py", line 63, in wait_for
    wait(timeout=timeout)
  File "/opt/venv/lib/python3.7/site-packages/celery/backends/redis.py", line 152, in drain_events
    message = self._pubsub.get_message(timeout=timeout)
  File "/opt/venv/lib/python3.7/site-packages/redis/client.py", line 3617, in get_message
    response = self.parse_response(block=False, timeout=timeout)
  File "/opt/venv/lib/python3.7/site-packages/redis/client.py", line 3505, in parse_response
    response = self._execute(conn, conn.read_response)
  File "/opt/venv/lib/python3.7/site-packages/redis/client.py", line 3479, in _execute
    return command(*args, **kwargs)
  File "/opt/venv/lib/python3.7/site-packages/redis/connection.py", line 756, in read_response
    raise response
redis.exceptions.ResponseError: wrong number of arguments for 'subscribe' command

To Reproduce
Steps to reproduce the behavior:

  1. Create an anomaly definition
  2. Schedule it to run at specific time
  3. A few times the schedule might succeed, whereas other times you might see the above error

Expected behavior
Is there any workaround we can use to avoid this issue? Please help.
