PANIC Monitoring and Alerting For Blockchains

License: Apache License 2.0


DISCLAIMER: Don't allow public access to PANIC UI as it might contain sensitive information about your infrastructure. UI authentication is still to be developed.

PANIC is an open source monitoring and alerting solution for Cosmos-SDK, Substrate and Chainlink based nodes by Simply VC. The tool was built with user friendliness in mind and comes with numerous features, such as phone calls for critical alerts, a UI Dashboard, a Web based installation process and Telegram/Slack commands for increased control over your alerter.

We are sure that PANIC will be beneficial for node operators and we look forward to your feedback. Feel free to read on if you are interested in the design of the alerter, if you wish to try it out, or if you would like to support and contribute to this open source project.

Design and Features

If you want to dive into the design and feature set of PANIC click here.

Installation Guide

We will now guide you through the steps required to get PANIC up and running. We recommend that PANIC is installed on a Linux system and that everything needed in the Requirements section is done before the installation procedure is started.

As you will notice below, PANIC supports many alerting channels. It is recommended that at least one of the alerting channels mentioned in the Requirements section is set up.

Requirements

Git command line tools. Click here if you want a guide to set it up.
Docker and Docker Compose: This installation guide uses Docker and Docker Compose to run PANIC, so both need to be installed. Click here if you want a guide to set it up.

Optional

Node Exporter, this will be used to monitor the systems on which the nodes are running. If you want your nodes' systems to be monitored this step is no longer optional. Node Exporter must also be installed on each machine that you want to monitor. Click here if you want a guide to set it up.
Telegram account and bots, for Telegram alerts and commands. Click here if you want a guide to set it up.
Slack account and app, for Slack alerts and commands. Click here if you want a guide to set it up.
Twilio account, for phone call alerts. Click here if you want a guide to set it up.
PagerDuty account, for notifications and phone call alerts. Click here if you want a guide to set it up.
OpsGenie account, for notifications and phone call alerts. Click here if you want a guide to set it up.

Installation

TIP: If your terminal is telling you that you do not have permission to run a command, try adding sudo to your command, e.g. sudo docker --version. This will run your command as root. If you have any issues during the installation procedure, check out the FAQ section.

Git Installation

Note: Skip this step if Git is already installed.

Firstly we will install and verify your Git installation.

# Install Git
sudo apt install git

# Verify that git is now installed
git --version

This should give you the current version of git that has been installed.

Docker and Docker Compose Installation

Note: Skip this step if Docker and Docker Compose are already installed.

First, install Docker and Docker Compose by running these commands on your terminal.

# Install docker and docker-compose
curl -sSL https://get.docker.com/ | sh
sudo apt install docker-compose -y

# Confirm that the installation was successful
docker --version
docker-compose --version

These should give you the current versions of the software that have been installed. At the time of writing the current working docker version is 20.10.10 while the docker-compose version is 1.25.0. If you have a different version that doesn't allow you to run the docker-compose.yml file then either upgrade your versions of docker and docker-compose or change the version inside of the docker-compose.yml file which is currently at 3.7.

Configuration Setup

# Clone the panic repository and navigate into it
git clone https://github.com/SimplyVC/panic
cd panic

Now that you're inside the PANIC directory, open up the .env file and change the UI_ACCESS_IP field to the IP of where PANIC UI is going to be hosted (can be set to localhost if running locally). This is to ensure that the API (PANIC UI Backend) is only accessible from the UI. Helper scripts which can be used to get the IP address (scripts/get_ip_linux.sh and scripts/get_ip_mac.sh) are available but please note that these are not guaranteed to work on all servers/machines.

# This will access the .env file on your terminal
nano .env

Once inside change UI_ACCESS_IP accordingly. Here is an example:

UI_ACCESS_IP=1.1.1.1

Then to exit hit the following keys:

To exit your .env file: CTRL + X
To select yes to save your modified file: Y
To confirm the file name and exit: ENTER
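Before starting PANIC it can be useful to double-check the value you saved. The following is a minimal Python sketch, not part of PANIC itself, that reads UI_ACCESS_IP from the text of a .env file and checks that it is either localhost or a literal IP address. The function names are hypothetical, and whether PANIC accepts other hostnames is not covered here.

```python
import ipaddress


def read_ui_access_ip(env_text: str) -> str:
    """Return the UI_ACCESS_IP value from the text of a .env file."""
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "UI_ACCESS_IP":
            return value.strip()
    raise KeyError("UI_ACCESS_IP is not set")


def is_valid_ui_access_ip(value: str) -> bool:
    """Accept 'localhost' or any literal IPv4/IPv6 address (an assumption)."""
    if value == "localhost":
        return True
    try:
        ipaddress.ip_address(value)
    except ValueError:
        return False
    return True


# Example .env content using the value shown in this guide.
sample_env = "# PANIC settings\nUI_ACCESS_IP=1.1.1.1\n"
print(is_valid_ui_access_ip(read_ui_access_ip(sample_env)))
```

To check a real file, pass the result of open(".env").read() instead of the sample string.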

Running PANIC

Once you have everything setup, you can start PANIC by running the below command:

docker-compose up -d --build

NOTE: If the build fails, run the commands below to clean up your Docker environment and try again. Please be aware that docker system prune -a --volumes also removes stopped containers, unused images, and unused volumes that belong to other projects on your system.

docker-compose kill
docker system prune -a --volumes
docker-compose up -d --build

The next step is to configure PANIC to monitor your nodes and systems as well as give it the channels to alert you through. You can do this by navigating to the PANIC UI at https://{UI_ACCESS_IP}:3333, or at https://localhost:3333 if you're running it locally. The PANIC UI will start the installation procedure if it does not detect any configurations. Make sure you type HTTPS if you're getting an error when accessing PANIC UI on your browser.

After you set up PANIC to your liking, the installation procedure will save these details in the Mongo database. For correct behavior the database should never be modified manually. If you would like to edit the configurations at some point, you can do so by accessing the settings option on the PANIC UI header.

PANIC will automatically read these configurations and begin monitoring the data sources. To confirm that PANIC is running as expected, we suggest running the commands docker-compose logs -f alerter and docker-compose logs -f health-checker. These show the different components starting up. If you have set up Telegram/Slack commands, we suggest that you enter the command /status (Telegram) or /panicstatus (Slack) to check that all PANIC components are running. If you want to check that every PANIC component is up and running without any issue, we suggest that you check that the logs inside panic/alerter/logs contain no errors.
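Checking the log directory by hand gets tedious when there are many files. The sketch below is one possible way to do it in Python. It assumes that log files end in .log and that error lines contain the text ERROR; neither is confirmed by this guide, so adjust both to match what you see in panic/alerter/logs. The function name is hypothetical.

```python
from pathlib import Path


def find_error_lines(log_dir, marker="ERROR"):
    """Map each log file path to the lines in it that contain `marker`."""
    root = Path(log_dir)
    if not root.is_dir():
        return {}
    hits = {}
    # Assumption: log files use the .log extension.
    for path in sorted(root.rglob("*.log")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            matching = [line.rstrip("\n") for line in handle if marker in line]
        if matching:
            hits[str(path)] = matching
    return hits


# Run from inside the panic directory.
for name, lines in find_error_lines("alerter/logs").items():
    print(f"{name}: {len(lines)} line(s) containing ERROR")
```

An empty result only means that no line matched the marker, so it complements the /status and /panicstatus commands rather than replacing them.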

After PANIC is up and running you can visualise node metrics and alerts using PANIC UI at https://{UI_ACCESS_IP}:3333, or at https://localhost:3333 if you're running it locally. For more information regarding PANIC UI, click here.

Congratulations, you should now have PANIC up and running!

Optional Installations

Node Exporter Setup

Note: This needs to be done on every host machine that you want the system metrics monitored and alerted on.

GitHub link to most recent version of Node Exporter we support.

Create a Node Exporter user for running the exporter:

sudo useradd --no-create-home --shell /bin/false node_exporter

Download and extract the latest version of Node Exporter:

wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz
tar -xzvf node_exporter-0.18.1.linux-amd64.tar.gz

Send the executable to /usr/local/bin:

sudo cp node_exporter-0.18.1.linux-amd64/node_exporter /usr/local/bin/

Give the Node Exporter user ownership of the executable:

sudo chown node_exporter:node_exporter /usr/local/bin/node_exporter

Perform some cleanup and create and save a Node Exporter service with the below contents:

sudo rm node_exporter-0.18.1.linux-amd64 -rf
sudo nano /etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=node_exporter
Group=node_exporter
Type=simple
ExecStart=/usr/local/bin/node_exporter
 
[Install]
WantedBy=multi-user.target

Reload systemctl services list, start the service and enable it to have it start on system restart:

sudo systemctl daemon-reload
sudo systemctl start node_exporter
sudo systemctl enable node_exporter
sudo systemctl status node_exporter

Check whether the installation was successful by confirming that {NODE_IP}:{PORT}/metrics is accessible from a web browser. Node Exporter listens on port 9100 by default.
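The same check can be scripted. The sketch below fetches the /metrics page and parses it into a dictionary, as a rough illustration of what a monitor reads from Node Exporter. The function names are hypothetical, and the parser is deliberately simple: it assumes that samples carry no trailing timestamp, which is how Node Exporter normally exposes them.

```python
from urllib.request import urlopen


def fetch_metrics(url, timeout=5.0):
    """Download the raw text served at a Node Exporter /metrics endpoint."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_metrics(text):
    """Parse Prometheus text exposition lines into {sample_name: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Lines starting with '#' are HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        # Split on the last space: label values may themselves contain spaces.
        name, _, value = line.rpartition(" ")
        try:
            samples[name] = float(value)
        except ValueError:
            continue
    return samples


sample = """# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.52
node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6
"""
print(parse_metrics(sample)["node_load1"])
```

Against a live exporter you would call parse_metrics(fetch_metrics("http://{NODE_IP}:{PORT}/metrics")) with your own address substituted.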

Back to the Requirements

Telegram Setup

To create a free Telegram account, download the app for Android / iPhone and sign up using your phone number.
To create a Telegram bot, add @BotFather on Telegram, press Start, and follow the below steps:
1. Send a /newbot command and fill in the requested details, including a bot name and username.
2. Take a note of the API token, which looks something like 111111:AAA-AAA111111-aaaaa11111.
3. Access the link t.me/<username> to your new bot given by BotFather and press Start.
4. Access the link api.telegram.org/bot<token>/getUpdates, replacing <token> with the bot's API token. This gives a list of the bot's activity, including messages sent to the bot.
5. The result section should contain at least one message, due to us pressing the Start button. If not, sending a /start command to the bot should do the trick. Take a note of the "id" number in the "chat" section of this message.
6. One bot is enough for now. You can repeat these steps to create more bots.
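Steps 4 and 5 can also be done programmatically. The sketch below pulls the chat IDs out of a getUpdates response that has already been decoded from JSON. The function name is hypothetical and the sample response uses made-up values; only the result, message, chat and id fields are relied on.

```python
def extract_chat_ids(get_updates_response):
    """Return the distinct chat IDs found in a decoded getUpdates response."""
    chat_ids = []
    for update in get_updates_response.get("result", []):
        # Not every update carries a message (for example, edited messages).
        message = update.get("message") or {}
        chat_id = message.get("chat", {}).get("id")
        if chat_id is not None and chat_id not in chat_ids:
            chat_ids.append(chat_id)
    return chat_ids


# Made-up example of what the bot returns after you press Start.
sample_response = {
    "ok": True,
    "result": [
        {
            "update_id": 1,
            "message": {
                "message_id": 1,
                "chat": {"id": 123456789, "type": "private"},
                "text": "/start",
            },
        }
    ],
}
print(extract_chat_ids(sample_response))
```

If the list comes back empty, send /start to the bot and fetch the updates again, as described in step 5.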

At the end, you should have:

A Telegram account
A Telegram bot (at least one)
The Telegram bot's API token (at least one)
The chat ID of your chat with the bot (at least one)

Back to the Requirements

Slack Setup

Login to Slack using an existing account or sign up for a Slack account.
If you are not in a workspace, join a workspace from an invite or create a workspace.
Create a channel within the workspace which will be used to receive notifications and interact with the Slack Bot.
To create a Slack app, visit the Slack apps page and press Create New App. Follow the steps below to set up the app, which includes gathering the app-level token, the bot token, and the channel ID:
1. Click the From an app manifest option in the pop-up window.
2. Select the workspace which contains the target channel.
3. Copy the YAML app manifest provided within the PANIC repository and overwrite the default YAML provided by Slack.
4. Click Next followed by Create.
5. Scroll down to the App-Level Tokens section and click Generate Token and Scopes.
6. Enter a Token Name (this is just a reference to the token which can be set to anything), add the connections:write scope, and click Generate.
7. Take note of the Token generated, this is the App-Level Token.
8. Go to the 'Install App' setting (left pane) and click Install to Workspace, followed by Allow.
9. Go to the 'OAuth & Permissions' feature (left pane) and take note of the Bot User OAuth Token.
10. Add the newly created PANIC Notifications app to the target channel by typing /add within the channel and selecting Add apps to this channel.
11. Right click the actual app that was added to the workspace, then Open app details and from there + add app to channel.
Go to the Slack client, right click the name of the target slack channel within the list of channels (left pane), click Open channel details, and take note of the Channel ID (found at the bottom).

At the end, you should have:

Access to a Slack workspace
A Slack account, app, and channel
The Slack app's Bot User Token and App-Level Token
The ID of the target slack channel

Back to the Requirements

Twilio Setup

To create a free trial Twilio account, head to the try-twilio page and sign up using your details.
Next, three important pieces of information have to be obtained:
1. Navigate to the account dashboard page.
2. Click the 'Get a Trial Number' button in the dashboard to generate a unique number.
3. Take a note of the (i) Twilio phone number.
4. Take a note of the (ii) account SID and (iii) auth token.
All that remains now is to add a number that Twilio is able to call:
1. Navigate to the Verified Caller IDs page.
2. Press the red + to add a new personal number and follow the verification steps.
3. One number is enough for now. You can repeat these steps to verify more than one number.

At the end, you should have:

A Twilio phone number.
The account SID, available in the account dashboard.
The auth token, available in the account dashboard.
A verified personal phone number (at least one)

If you wish to explore more advanced features, PANIC also supports configurable TwiML; instructions which can re-program Twilio to do more than just call numbers. By default, the TwiML is set to reject calls as soon as the recipient picks up, without any charges. This can be re-configured from the twilio section of the .env file to either a URL or raw TwiML instructions.
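As an illustration of what raw TwiML looks like, the sketch below builds a small document that uses the Reject verb, which matches the "reject the call" behaviour described above. Whether this is byte-for-byte the default shipped in the .env file is an assumption, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_reject_twiml():
    """Build a TwiML document whose only instruction is to reject the call."""
    response = ET.Element("Response")
    ET.SubElement(response, "Reject")
    return ET.tostring(response, encoding="unicode")


print(build_reject_twiml())
```

Building the document with an XML library rather than by string concatenation keeps it well-formed if you later add other verbs.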

Back to Requirements

PagerDuty Setup

It is assumed that you have previously used PagerDuty and have a PagerDuty account. If not, head to the PagerDuty sign-up page and sign up using your details.
First you need to add a service, and get two important pieces of information.
- Firstly the integration key:
  1. Navigate to the + Add new services button on the right side of the page
  2. Name your service and give it a description
  3. In the Integration Settings select Use our API directly and choose Events API v2.
  4. The rest can be configured to your preferences.
  5. Click Add Service.
  6. You will be taken to a new page, where you need to navigate to the Integrations tab and take note of the Integration Key.

At the end, you should have:

The Integration Key

This will be used later on in the installation procedure.
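To show where the Integration Key ends up, here is a minimal sketch of an Events API v2 trigger event, in which the key is sent as the routing_key. This is not PANIC's own code; the function name and the example values are hypothetical.

```python
import json

# Events API v2 endpoint that accepts trigger events.
PAGERDUTY_EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"

ALLOWED_SEVERITIES = {"critical", "error", "warning", "info"}


def build_trigger_event(integration_key, summary, source, severity="critical"):
    """Build the JSON body of an Events API v2 'trigger' event."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity}")
    return {
        "routing_key": integration_key,
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": source,
            "severity": severity,
        },
    }


event = build_trigger_event("YOUR_INTEGRATION_KEY", "Node is down", "example-node")
print(json.dumps(event, indent=2))
```

Sending the event is a matter of POSTing this JSON to PAGERDUTY_EVENTS_URL; that step is left out so the sketch runs without network access.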

Note: You can also install the PagerDuty app for Android / iPhone as well as set up your phone number to receive alerts.

Back to Requirements

Opsgenie Setup

It is assumed that you have previously used Opsgenie and have an Opsgenie account. If not, head to the Opsgenie sign-up page and sign up using your details.
Let's go through the process of setting up your API.
1. Click on Integrate with Jira and your monitoring tools on your home page.
2. Make sure API integration is selected
3. Click Save integrations
4. Click Now, go to the integrations page and explore
5. Navigate to the API you just set up and take note of API Key.

At the end, you should have:

The API Key

This will be used later on in the installation procedure.
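For orientation, the sketch below shows how an API Key of this kind is typically used against the Opsgenie Alert API: it goes into a GenieKey authorization header, and the alert text goes into the JSON body. This is an illustration rather than PANIC's implementation, and the function name and example values are hypothetical.

```python
# Alert API endpoint for creating alerts.
OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"

ALLOWED_PRIORITIES = {"P1", "P2", "P3", "P4", "P5"}


def build_alert_request(api_key, message, priority="P3"):
    """Return the (headers, body) pair for a create-alert request."""
    if priority not in ALLOWED_PRIORITIES:
        raise ValueError(f"unsupported priority: {priority}")
    headers = {
        "Authorization": f"GenieKey {api_key}",
        "Content-Type": "application/json",
    }
    body = {"message": message, "priority": priority}
    return headers, body


headers, body = build_alert_request("YOUR_API_KEY", "Node is down", priority="P1")
print(headers["Authorization"])
print(body)
```

The request itself would be a POST of the body, serialised as JSON, to OPSGENIE_ALERTS_URL with these headers.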

Note: You can also install the Opsgenie app for Android / iPhone as well as set up your phone number to receive calls.

Back to Requirements

Replacing SSL certificates (recommended)

Apply your own SSL certificate signed by a certificate authority. The SSL certificate (cert.pem) and the key (key.pem) should be placed in the panic/certificates folder, and they should replace the existing dummy files. Note that these dummy files were given just for convenience, as the API (PANIC UI Backend) server won't start without them. For maximum security, however, they must be replaced.

We suggest reading this for more details on SSL certificates, and how to generate a self signed certificate in case you do not want to obtain a certificate signed by a certificate authority. However, for maximum security, the self signed certificate option is not recommended.

Running the PANIC test suite

To run the tests for the alerter component within PANIC, do the following:

docker-compose kill  # To stop any running containers (to avoid conflicts)
docker-compose -p panic-tests -f docker-compose-tests.yml up --build -d  # To build the tests container
docker-compose -p panic-tests -f docker-compose-tests.yml logs test-suite  # To see the result of the tests
docker-compose -p panic-tests -f docker-compose-tests.yml kill  # To remove test environment

To run the tests for the API component within PANIC, navigate to the /api directory and do the following:

npm install         # Install API project dependencies
npm test            # Run API unit tests

To run the tests for the Substrate API component within PANIC, navigate to the /substrate-api directory and do the following:

npm install         # Install API project dependencies
npm test            # Run API unit tests

To run the tests for the UI component within PANIC, navigate to the /ui directory and do the following:

npm install         # Install UI project dependencies
npm test            # Run UI E2E and unit tests

Support and Contribution

On top of the additional work that we will put in ourselves to improve and maintain the tool, any support from the community through development will be greatly appreciated.

Who We Are

Simply VC runs highly reliable and secure infrastructure in our own datacentre in Malta, built with the aim of supporting the growth of the blockchain ecosystem. Read more about us on our website and Twitter:

Simply VC website: https://simply-vc.com.mt/
Simply VC Twitter: https://twitter.com/Simply_VC

panic's Issues

Add SIGNL4 as Alerting Channel

Add the app-based alerting service SIGNL4 as additional alerting channel in PANIC.

I just talked to a SIGNL4 user that wanted to send alerts from PANIC to his SIGNL4 team. After some investigation I found this repository and wonder if you are open to add SIGNL4 as an additional alerting channel. If so, I would be happy to add a pull request for the same.

Thanks a lot

Ron

Bug in alerting for total headers received and last processed block height (PANIC-608)

While running the chainlink integration, Francesco Cremona reported this:

After investigating the data on prometheus and the monitoring logs on the vm I noticed that the monitors are not throwing 7 minute gaps in data. This means we should look at the alerting logic, specifically the classify_no_change_in_alert function to see if there is something incorrect in the logic. We should run the alerter from our side using the configs which are running on the vm, to make sure that the issue is replicated.

Alerters - Implement/Adapt Manager Logic for Cosmos

Installation Wizard - Alerts Setup

As a node operator, I want to be able to customize what metrics I want to be alerted on and their respective threshold.

Requirements

A page must be displayed at /installation/alerts
It must have an svc-data-table to configure the threshold values for each alert
It must have an svc-data-table to configure the severity for each alert
An "info" button must be provided, so the node operator can click and get to know more information about what he's configuring at the moment. This in-depth info must be displayed in a modal.
The page must have a "back" button, so the node operator can navigate to the previous step (Repositories setup)
The page must have a "next" button, so the node operator can navigate to the next step (Config Review)
When the "next" button is clicked, the current configuration is saved in the local storage (browser cache)

Investigate the use of Python linters for the Alerter

We need to investigate whether it would make sense to start using Python linters for the alerter. This would promote consistency since there are multiple developers using different IDEs.

The outcome of this ticket is to investigate whether it is beneficial to use a linter, and to devise a list of linters which can be used in our project.

First Installation Modal

As a node operator, I want to be prompted to start the wizard when accessing PANIC UI home page for the first time.

Requirements

read the value IS_FIRST_INSTALLATION from #41
open a modal with a brief message (explaining what's about to happen) + a button so the user can start the wizard
while the first configuration isn't concluded with success, the node operator must not be able to access any page and the modal should be always shown on the home page

Installation Wizard - Review

As a node operator, I would like to see a page showing a summary of my sub-chain configuration, so I can review and approve it.

Requirements

A page must be displayed at /installation/review.
A "go back" button must be displayed. It takes the node operator to the previous step (Alerts Setup).
A checkbox with a label "This configuration is correct" must be displayed
A "next button" must be displayed. It takes the node operator to the feedback page.

Web UI Iteration 2: Make the UI backend compatible with configs stored inside MongoDB and not Redis

Since the UI will be storing the configurations inside MongoDB, there is no longer a need for us to retrieve the monitorables info from Redis, since this can be done directly from Mongo. In this ticket we should fix the API endpoints which are getting configs data from Redis.

Bug in Alert Store when storing alert_no_synced_data_sources alert

When testing out the alerter I've noticed that the Alert Store was raising errors when trying to store the alert_no_synced_data_sources alert:

After a brief investigation I have noticed that this is because the key is missing from the file alerter/src/data_store/redis/store_keys.py. Therefore we need to add this key, and it should be equivalent to the value set in alerter/src/alerter/grouped_alerts_metric_code/contract/chainlink_contract_metric_code.py.

We must also perform the above exercise for every possible alert, to confirm that all alerts are being stored in Redis. i.e. we need to go over each script inside alerter/src/alerter/grouped_alerts_metric_code and check that a key was created with the appropriate notation for each entry. This must be done because I noticed that contracts_not_retrieved is also missing.
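The exercise above amounts to a set difference between the metric codes and the store keys. The sketch below shows that comparison on plain lists of names. It does not import the real modules, the function name is hypothetical, and it compares by name only, since the exact key notation is not described here. The two alert names are the ones mentioned in this ticket.

```python
def find_missing_store_keys(metric_codes, store_keys):
    """Return the metric codes that have no matching store key, sorted."""
    return sorted(set(metric_codes) - set(store_keys))


# Illustrative inputs: in practice these would be collected from the
# metric-code scripts and from store_keys.py respectively.
metric_codes = [
    "alert_no_synced_data_sources",
    "contracts_not_retrieved",
    "some_other_alert",  # hypothetical placeholder
]
store_keys = ["some_other_alert"]

print(find_missing_store_keys(metric_codes, store_keys))
```

A check like this could run as a unit test so that a newly added metric code without a store key fails the build instead of failing at runtime.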

Monitors - Adapt Node Monitors Manager for Cosmos

Alerter: Investigate whether we are able to use BlockingConnection for ConfigsManager (PANIC-615)

As described in a meeting, RabbitMQ has the following shortcoming:

If a connection is idle for more than 60 seconds it is dropped

Now, since we need to wait for events to occur before sending the configurations, the connections are dropped. This means that there is a large delay from when the configs are created/modified/deleted till when the configs are sent.

Therefore we need to investigate whether we can use an asynchronous connection (SelectConnection) instead of the usual BlockingConnection so that the connection is not dropped. Note that this means that RabbitMQ would then run in a separate thread and hence there would not be any blocking issues. This might mean that some form of shared memory must be implemented between the thread listening for events and the thread handling RabbitMQ, and thus the message sending. Please make sure that no race condition occurs.
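One common way to share data between the two threads without race conditions is a thread-safe queue. The sketch below illustrates the idea with the standard library only: the event-listening side puts configs on a queue, and a dedicated thread takes them off and publishes them. The publish callable is a stand-in for the real RabbitMQ call, and all names are hypothetical; this is not a proposal for the final design.

```python
import queue
import threading


def publisher_loop(outbox, publish, stop_token=None):
    """Run in the thread that owns the RabbitMQ connection."""
    while True:
        item = outbox.get()  # Blocks until the listener thread adds an item.
        if item is stop_token:
            break
        publish(item)


outbox = queue.Queue()  # queue.Queue handles its own locking.
sent = []  # Stand-in for messages published to RabbitMQ.

worker = threading.Thread(target=publisher_loop, args=(outbox, sent.append))
worker.start()

# The event-listening side only ever touches the queue.
outbox.put({"event": "created", "config_id": "example"})
outbox.put({"event": "deleted", "config_id": "example"})
outbox.put(None)  # Ask the publisher thread to stop.
worker.join()

print(sent)
```

Because only one thread ever calls publish, the connection is never shared, and the queue preserves the order in which events were raised.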

Installation Wizard - Add Nodes

As a node operator, I want to be able to add nodes that I want to monitor.

Requirements

A "home page" for Nodes setup must be created and be displayed at /installation/nodes.
The page must have two central elements:
- A button with the label "Add Node", allowing the node operator to click and open the form. The form will be shown in a modal (no redirection to a different page).
- A svc-data-table which is only rendered when the node operator has added at least one node. The data table will be set to crud mode, therefore rendering two buttons, for editing/deleting respectively:
  - When the node operator clicks in the "edit" button, a modal is shown with a form populated with the node data.
  - When the node operator clicks in the "delete" button, a dialog is shown asking the node operator to confirm the action.
The page must have 3 other secondary elements:
- A "go back" button. It leads the node operator to the previous step (Channels).
- A "next" button. It leads the node operator to the next step (Repositories).
- When the "next" button is clicked, the current configuration is saved in the local storage (browser cache)
- A "help" button so the node operator can click and get to know more information about what he's configuring at the moment. This in-depth info must be displayed in a modal.

CRUD Sub-Chains - Nodes CRUD

As a node operator, I would like to see a page showing the nodes configured under a given sub-chain and be able to perform CRUD operations over it.

Requirements

A page located at /sub-chains/SUB_CHAIN_ID/nodes must be created
The page must have a button with the label "Add Node", to allow the node operator to add more nodes under that specific sub-chain. The form for the new nodes must be displayed at /sub-chains/SUB_CHAIN_ID/nodes/create.
The page must have an svc-data-table component showing the nodes that have been added so far (or a message telling the node operator "Hey, you need to add your first node..." if no nodes have been added yet). Once the data table has been populated, a special column will be rendered, allowing the node operator to perform edit/delete actions over the nodes.
- When the node operator clicks in the "edit" button, a modal is shown with a form populated with the node data.
- When the node operator clicks in the "delete" button, a dialog is shown asking the node operator to confirm the action.
A "go back" button must be displayed in the header, at the far left. It takes the user back to the previous step (Sub-Chains Menu Page)
A "help" button so the node operator can click and get to know more information about what he's configuring at the moment. This in-depth info must be displayed in a modal.

Installation Wizard - Welcome Page

As a node operator, once I click on the "start wizard" button, I want to see a "welcome page" explaining to me the process that I'm about to start.

Requirements

Create a page with a title something like "Welcome Page" located at /installation/welcome
The page must explain to the node operator the steps that he's going through, meaning:
- Sub-chain setup
- Channels setup
- Nodes setup
- Repos setup
- Alerts setup
- Review page
- Feedback page
It must have a button to allow the user to kick off the process

Installation Wizard - Add Repos

As a node operator, I want to be able to add repositories that I want to be alerted on.

Requirements

A "home page" for Repositories setup must be created and be displayed at /installation/repositories.
The page must have two central elements:
- A button with the label "Add Repository", allowing the node operator to click and open the form. The form will be shown in a modal (no redirection to a different page).
- A svc-data-table which is only rendered when the node operator has added at least one repository. The data table will be set to crud mode, therefore rendering two buttons, for editing/deleting respectively:
  - When the node operator clicks in the "edit" button, a modal is shown with a form populated with the repository data.
  - When the node operator clicks in the "delete" button, a dialog is shown asking the node operator to confirm the action.
The page must have 3 other secondary elements:
- A "go back" button. It leads the node operator to the previous step (Add Node).
- A "next" button. It leads the node operator to the next step (Alerts setup).
- When the "next" button is clicked, the current configuration is saved in the local storage (browser cache)
- A "help" button so the node operator can click and get to know more information about what he's configuring at the moment. This in-depth info must be displayed in a modal.

Configure Python linter and reformat code-base

After the investigation performed in #52, we need to configure our environment so that it includes our linting. After configuring the environment, we need to go over our Python code-base to check that every script satisfies the linter requirements.

CRUD Sub-Chains - Repositories CRUD

As a node operator, I would like to see a page showing the repositories configured under a given sub-chain and be able to perform CRUD operations over it.

Requirements

A page located at /sub-chains/SUB_CHAIN_ID/repositories must be created
The page must have a button with the label "Add Repository", to allow the node operator to add more repositories under that specific sub-chain. The form for the new repositories must be displayed at /sub-chains/SUB_CHAIN_ID/repositories/create.
The page must have an svc-data-table component showing the repositories that have been added so far (or a message telling the node operator "Hey, you need to add your first repository..." if no repositories have been added yet). Once the data table has been populated, a special column will be rendered, allowing the node operator to perform edit/delete actions over the repositories.
- When the node operator clicks in the "edit" button, a modal is shown with a form populated with the repository data.
- When the node operator clicks in the "delete" button, a dialog is shown asking the node operator to confirm the action.
A "go back" button must be displayed in the header, at the far left. It takes the user back to the previous step (Sub-Chains Menu Page)
A "help" button so the node operator can click and get to know more information about what he's configuring at the moment. This in-depth info must be displayed in a modal.

Alerter: Develop a POC which interacts with RabbitMQ (PANIC-614)

The POC developed for PANIC-612 should be extended to interact with RabbitMQ. There are two interactions which we need to cater for:

Sending configs to RabbitMQ with the correct routing key once an event is fired by the database layer
Receiving pings from RabbitMQ and responding with a heartbeat to indicate that the component is alive.

IMP: Since Pika is not thread safe we must make sure that there is only 1 rabbit connection per thread.
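A simple way to enforce one connection per thread is to keep the connection in thread-local storage. The sketch below shows the pattern with the standard library; the factory argument stands in for whatever function opens the real Pika connection, and the names are hypothetical.

```python
import threading

_local = threading.local()  # Each thread sees its own attributes on this object.


def get_connection(factory):
    """Return this thread's connection, creating it on first use."""
    if not hasattr(_local, "connection"):
        _local.connection = factory()
    return _local.connection


# `object` is used as a dummy factory so the sketch runs without RabbitMQ.
first = get_connection(object)
second = get_connection(object)

from_other_thread = []
thread = threading.Thread(
    target=lambda: from_other_thread.append(get_connection(object))
)
thread.start()
thread.join()

print(first is second, from_other_thread[0] is first)
```

Calls from the same thread reuse one connection, while a different thread gets its own, so no connection object is ever shared across threads.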

Note that this also depends on the investigation done in PANIC-615

CRUD Sub-Chains - Menu Page

As a node operator, I would like to see a page that shows me a menu with the following options (so I can update the config accordingly):

Channels
Nodes
Repos
Alerts

Requirements

A page located at /sub-chains/SUB_CHAIN_ID/menu must be shown (once the node operator clicks in the "Edit" button when viewing the "Sub-Chain Home Page")
The page must have 4 buttons:
- Channels (When the node operator clicks here, a page located at /sub-chains/SUB_CHAIN_ID/channels is shown)
- Nodes (When the node operator clicks here, a page located at /sub-chains/SUB_CHAIN_ID/nodes is shown)
- Repositories (When the node operator clicks here, a page located at /sub-chains/SUB_CHAIN_ID/repositories is shown)
- Alerts (When the node operator clicks here, a page located at /sub-chains/SUB_CHAIN_ID/alerts is shown)
- The above pages will be tackled in individual tickets
A "go back" button must be displayed in the header, at the far left. It takes the user back to the previous page (Sub-Chains Home Page)

Investigation: As a node operator I want to upgrade PANIC as easy as possible (PANIC-606)

We need to perform the following investigation:

We could make the node operator's life easier by including a very simple upgrade process for PANIC. Ideally, this should be done in the UI itself. The UI can monitor a list of releases from the PANIC repository, and when a new upgrade is available, the user is notified in the UI accordingly. If the user wants to update, the new code-base is downloaded from the PANIC repository. In this repository we should also include a bash script which performs the update, with steps such as:

Clearing Redis
Migrating how data is stored in Mongo to possibly new ways
Migrating how data is stored in Redis to possibly new ways
Clearing RabbitMQ
Updating configs

API: Add save and load config functionality in the API (PANIC-612)

The API must be able to store configurations in the configs database as follows:

We should have 3 collections namely:

Chains
Channels
General

In the chains collection we should store data with the following structure concept:

{
  subchains: [
    {
      base_chain: cosmos
      sub_chain: akash
      nodes_config: {
        section_1: {node_id: val_1, node_name: val_2},
        section_2: {}
      }
      systems_config: {}
      id: value
      ..................
    }
  ]
}

The channels and general collections should follow a similar structure, the only difference being the configurations stored.
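
As a rough illustration of the chains structure, the sketch below writes one document as a Python dict and checks that each sub-chain entry carries the keys named in the ticket. The helper missing_keys and the constant REQUIRED_KEYS are hypothetical, and the elided fields of the original structure are intentionally left out.

```python
from typing import Any, List, Mapping

# Keys taken from the structure concept in the ticket; elided fields are omitted.
REQUIRED_KEYS = {"base_chain", "sub_chain", "nodes_config", "systems_config", "id"}


def missing_keys(entry: Mapping[str, Any]) -> List[str]:
    """Return the required keys that a sub-chain entry does not define."""
    return sorted(REQUIRED_KEYS - entry.keys())


chains_document = {
    "subchains": [
        {
            "base_chain": "cosmos",
            "sub_chain": "akash",
            "nodes_config": {
                "section_1": {"node_id": "val_1", "node_name": "val_2"},
                "section_2": {},
            },
            "systems_config": {},
            "id": "value",
        }
    ]
}
```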

Alerter: Develop a POC which retrieves data from the database (PANIC-613)

Based on the research done in PANIC-611, we are to develop a POC which retrieves the configurations from the database whenever the configurations are created, updated or deleted.

Installer, UI Iteration 1: The installer should output all possible .ini files even if empty

When running the installer I detected these 2 bugs:

When setting up a chain, the installer does not create every .ini file if no monitorables are set for that file. For example, if you are setting up a chainlink chain and you do not add any systems for monitoring, no system_config.ini is created.
If no channels are set-up, the channels folder is not created.

For the general chain this is done correctly. In general, we should always output every possible .ini file irrespective of whether it is empty or not. The reason is that if the user modifies the configurations while the alerter is offline and some configurations are removed by the installer, then the configs store would not be able to detect that some monitorables have been removed. Therefore, the UI would display some monitorables which are no longer being monitored.

Note that this is a temporary fix until we start storing the configs in Mongo. Once we do, the configs store will no longer need to exist.
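
The intended behaviour (always write every .ini file, even when it has no sections) could look roughly like the sketch below. It is not the installer's actual code: write_all_configs and EXPECTED_FILES are hypothetical names, and only system_config.ini is a file name taken from the ticket.

```python
import configparser
from pathlib import Path
from typing import Dict, Mapping

# system_config.ini is named in the ticket; nodes_config.ini is a hypothetical example.
EXPECTED_FILES = ("nodes_config.ini", "system_config.ini")

Sections = Dict[str, Dict[str, str]]


def write_all_configs(chain_dir: Path, configs: Mapping[str, Sections]) -> None:
    """Write every expected .ini file, producing an empty file when nothing is set."""
    chain_dir.mkdir(parents=True, exist_ok=True)
    for file_name in EXPECTED_FILES:
        parser = configparser.ConfigParser()
        # A missing entry becomes an empty parser, which is written as an empty file.
        parser.read_dict(configs.get(file_name, {}))
        with open(chain_dir / file_name, "w") as config_file:
            parser.write(config_file)
```

Because the empty file is still created, a consumer watching the folder can tell "no monitorables" apart from "file never written".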

Web Installer outputting dockerhub configs incorrectly

When testing out the dockerhub implementation I noticed that the schema outputted by the web-installer is incorrect. The web-installer is outputting the following schema:

[docker_09c56c8d-afcb-434b-8102-8cc58d39cfc3]
id=docker_09c56c8d-afcb-434b-8102-8cc58d39cfc3
parent_id=chain_name_2be935b4-1072-469c-a5ff-1495f032fefa
name=dylangalea/testrepo
monitor_docker=true

However, the schema should look like this, where name is split into repo_namespace and repo_name, and monitor_repo replaces monitor_docker:

[docker_09c56c8d-afcb-434b-8102-8cc58d39cfc3]
id=docker_09c56c8d-afcb-434b-8102-8cc58d39cfc3
parent_id=chain_name_2be935b4-1072-469c-a5ff-1495f032fefa
repo_name=testrepo
repo_namespace=dylangalea
monitor_repo=true

We must also collaborate with Luke to check what should be put inside the docker forms in terms of placeholders, field descriptions, field names and so on.
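
The mapping between the two schemas can be sketched as a small conversion over one section's key/value pairs. The function name fix_docker_section is hypothetical; the field names and sample values are the ones shown in the ticket.

```python
from typing import Dict


def fix_docker_section(section: Dict[str, str]) -> Dict[str, str]:
    """Convert one docker section from the incorrect schema to the expected one."""
    fixed = dict(section)
    # 'name' holds '<namespace>/<repo>'; split on the first slash only.
    namespace, repo_name = fixed.pop("name").split("/", 1)
    fixed["repo_name"] = repo_name
    fixed["repo_namespace"] = namespace
    # 'monitor_docker' is renamed to 'monitor_repo'; the value is kept as-is.
    fixed["monitor_repo"] = fixed.pop("monitor_docker")
    return fixed
```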

CRUD Sub-Chains - Channels CRUD

As a node operator, I would like to see a page showing the channels configured under a given sub-chain and be able to perform CRUD operations on them.

Requirements

A page located at /sub-chains/SUB_CHAIN_ID/channels must be created
The page must have a button with the label "Add Channel", to allow the node operator to add more channels under that specific sub-chain. The form for the new channels must be displayed at /sub-chains/SUB_CHAIN_ID/channels/create.
The page must have an svc-data-table component showing the channels that have been added so far (or a message telling the node operator "Hey, you need to add your first channel..." if no channels have been added yet). Once the data table has been populated, a special column will be rendered, allowing the node operator to perform edit/delete actions on the channels.
- When the node operator clicks on the "edit" button, a modal is shown with a form populated with the channel's data.
- When the node operator clicks on the "delete" button, a dialog is shown asking the node operator to confirm the action.
A "go back" button must be displayed in the header, at the far left. It takes the user back to the previous step (Sub-Chains Menu Page)
A "help" button must be provided so the node operator can find out more about what is currently being configured. This in-depth info must be displayed in a modal.

Onboarding Chainlink and EVM to API (PANIC-593)

Since the Alerter now supports Chainlink and EVM Node alerting, we also need to raise node problems inside the UI's Chainlink section. Currently, the API is only aware of system and repo problems. Therefore, we need to add the new Chainlink and EVM node keys inside the API. This should also be applied to DockerHub repos.

Note: We need to also check that the configs store and the alert store already cater for both Chainlink and EVM nodes, otherwise we can't get the correct list of monitorables and list of problems respectively from the API.

Alerter: Investigate technologies needed for configs database retrieval (PANIC-611)

We need to investigate what technologies we need to use in order to retrieve configurations from the database.

So far we have proposed using MongoDB with triggers/ChangeStream; however, if this is not possible, the use of an SQL database is not excluded.
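
To make the ChangeStream option more concrete, here is a minimal sketch assuming the pymongo driver is used. Collection.watch() and the event fields operationType, documentKey and fullDocument come from MongoDB change streams; the function names and the in-memory configs cache are hypothetical. Note that change streams require MongoDB to run as a replica set or sharded cluster.

```python
from typing import Any, Dict, Mapping


def apply_change(configs: Dict[Any, Any], change: Mapping[str, Any]) -> None:
    """Update the in-memory `configs` cache from one change stream event."""
    doc_id = change["documentKey"]["_id"]
    operation = change["operationType"]
    if operation in ("insert", "update", "replace"):
        document = change.get("fullDocument")
        # With update lookups the document can be absent if it was deleted meanwhile.
        if document is not None:
            configs[doc_id] = document
    elif operation == "delete":
        configs.pop(doc_id, None)


def watch_configs(collection: Any, configs: Dict[Any, Any]) -> None:
    """Block and keep `configs` in sync; `collection` is a pymongo Collection."""
    # full_document="updateLookup" attaches the current document to update events.
    with collection.watch(full_document="updateLookup") as stream:
        for change in stream:
            apply_change(configs, change)
```

Keeping apply_change free of any database calls means the create/update/delete handling can be unit tested with plain dicts.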

Installation Wizard - Add Channels

As a node operator, I want to be able to add channels that I want to be alerted on.

Requirements

A "home page" for Channels setup must be created and be displayed at /installation/channels
The page must have two central elements:
- A button with the label "Add Channel", allowing the node operator to click and open the form. The form will be shown in a modal (no redirection to a different page).
- An svc-data-table which is only rendered when the node operator has added at least one channel. The data table will be set to crud mode, therefore rendering two buttons, for editing and deleting respectively:
  - When the node operator clicks on the "edit" button, a modal is shown with a form populated with the channel data.
  - When the node operator clicks on the "delete" button, a dialog is shown asking the node operator to confirm the action.
The page must have 4 other secondary elements:
- A "help" button so the node operator can find out more about what is currently being configured. This in-depth info must be displayed in a modal.
- A Channel Type filter, so the node operator can filter the data table. It's only shown when the svc-data-table has at least one record. The implementation looks something like this: an svc-select wrapped by an svc-filter to fire events with the selected channel type (telegram, twilio, etc.) and then update the data table as necessary.
- A "go back" button. It leads the node operator to the previous step (Add Sub-Chain).
- A "next" button. It leads the node operator to the next step (Nodes setup).
  - When the "next" button is clicked, the current configuration is saved in the local storage (browser cache)

System test dockerhub code-base and review code at high-level before merging (PANIC-618)

Bug in downtime for EVM Nodes

While running the alerter, @Cherrett noticed the following error in the logs:

After carefully investigating, I noticed that we have the following piece of code in the EVM Node alerter logic:

Therefore, we are passing the entire went_down_at dict rather than its current value.

Note: When tackling this issue we should also check if we are doing the parsing correctly for every other alerter.
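
The class of bug described above can be reproduced in isolation. The snippet below is not the PANIC code referenced in this ticket (which is not shown here); it assumes, purely for illustration, that went_down_at is a dict with current and previous keys, and downtime_seconds is a hypothetical helper.

```python
def downtime_seconds(went_down_at: float, now: float) -> float:
    """Return how long a node has been down, given the time it went down."""
    return now - went_down_at


# Assumed shape of the metric: the value of interest sits under 'current'.
went_down_at = {"current": 1_000.0, "previous": None}

# Buggy call: passing the whole dict raises a TypeError (float minus dict).
#   downtime_seconds(went_down_at, 1_060.0)

# Fixed call: pass only the current value.
downtime = downtime_seconds(went_down_at["current"], 1_060.0)
```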

Alerter - Implement Cosmos Node Alerter (PANIC-557)

Add Settings Menu

As a node operator, I want to be able to click on a menu called Settings and see the following options:

Chains (To access the sub-chains setup)
Channels (To access the channels setup)

Requirements

Add a menu called Settings in the home page header
The component used must be the svc-select from https://www.npmjs.com/package/@simply-vc/uikit
The Chains menu option leads the user to a page located at /chains
The Channels menu option leads the user to a page located at /channels
The Settings menu must be positioned to the left of the Networks menu

Detect PANIC First Installation

Background

We need to come up with logic to detect whether it's the very first time that PANIC UI is running, so we can trigger the modal inviting the user to kick off the installer.

Suggestion to whoever is tackling this: I think that we can have a quick meeting to discuss the approach.

Requirements

A flag somewhere saying "Hey, this is the first time running!".
The flag must be accessible from the service PANIC UI.

Installation Wizard - Feedback

As a node operator, I would like to see a page that tells me that the process was completed successfully. The page must also give me the option to restart the installer (to add more sub-chains) or go to the home page.

Requirements

A page must be displayed at /installation/feedback.
A brief message/picture must be displayed giving the user positive feedback
Two buttons must be displayed:
- "restart process" button, so the node operator can add more sub-chains
- "go to home page" button, so the node operator can navigate back and review the configuration that was just created

Installation Wizard - Add Sub-Chain

As a node operator, I want to be able to set up a sub-chain based on the supported base-chains.

Requirements

A "home page" for Sub-Chain setup must be created and be displayed at /installation/sub-chain
The page must have a form with two inputs:
- Base-chain: an svc-select with a list of base-chains
- Name: an svc-input of type text to allow the node operator to enter the sub-chain name
The page must have an info icon so the node operator can get to know more info about the current step
A "go back" button must be provided, so the node operator can click and navigate back to "welcome page"
A "next" button must be provided, so the node operator can click and navigate to the next step (Channels)
When the "next" button is clicked, the current configuration is saved in the local storage (browser cache)

CRUD Sub-Chains - Alerts

As a node operator, I would like to see a page showing two svc-data-table components, one for alert thresholds and the other for alert severities.

Requirements

A page located at /sub-chains/SUB_CHAIN_ID/alerts must be created
The page will have two svc-data-table components:
- Threshold Alerts (allowing the node operator to configure the warning and critical threshold per alert and to enable/disable it)
- Severity Alerts (allowing the node operator to set the level of severity of a given alert and to enable/disable it)
A "go back" button in the header, placed at the far left. It leads the node operator to the previous step (Sub-Chains Menu Page).
A "help" button must be provided so the node operator can find out more about what is currently being configured. This in-depth info must be displayed in a modal.

Just for visual reference: https://www.figma.com/file/Zl0eqJvxkrBnh0hqhIKI9x/Web-Installer-Screenshots?node-id=0%3A1

Remove metric state storing from the Alert Store

Technical Story

When receiving an alert from the Alerter, the AlertStore performs the following:

It stores the alert in Mongo
It stores the state of the alerted metric inside Redis. This is then used by the API/UI to display the problems in the Overview Dashboard.

Following the SRP (single responsibility principle), the AlertStore should perform one job only, namely storing the alert in Mongo. Therefore, we need to create another component which continuously checks the values of each metric and compares them to the alertable thresholds/conditions.

In order to do this change we need to perform the following tasks:

Remove metric state storing logic from the AlertStore
Remove internal alerts mechanism as it would no longer be needed
Develop the MetricsStateStore by integrating each monitorable in a granular way using the Strategy pattern
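
One way to read "integrating each monitorable in a granular way using the Strategy pattern" is sketched below. All class names other than MetricsStateStore, the metric names and the threshold logic are hypothetical; the point is only that each monitorable type gets its own interchangeable evaluation strategy.

```python
from abc import ABC, abstractmethod
from typing import Dict, Mapping


class MetricStateStrategy(ABC):
    """Strategy interface: one implementation per monitorable type."""

    @abstractmethod
    def evaluate(self, metrics: Mapping[str, float]) -> Dict[str, str]:
        """Map each metric name to 'ok', 'warning' or 'critical'."""


class SystemStateStrategy(MetricStateStrategy):
    """Hypothetical strategy comparing system metrics against two thresholds."""

    def __init__(self, warning: float, critical: float) -> None:
        self._warning = warning
        self._critical = critical

    def evaluate(self, metrics: Mapping[str, float]) -> Dict[str, str]:
        states = {}
        for name, value in metrics.items():
            if value >= self._critical:
                states[name] = "critical"
            elif value >= self._warning:
                states[name] = "warning"
            else:
                states[name] = "ok"
        return states


class MetricsStateStore:
    """Delegates state computation to the strategy registered for each type."""

    def __init__(self, strategies: Mapping[str, MetricStateStrategy]) -> None:
        self._strategies = strategies

    def compute_state(self, monitorable_type: str,
                      metrics: Mapping[str, float]) -> Dict[str, str]:
        return self._strategies[monitorable_type].evaluate(metrics)
```

Adding support for a new monitorable (for example a repo or a node) then means registering one more strategy, without touching MetricsStateStore itself.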

Description

The aim of this ticket is to remove the metric state storing logic from the AlertStore

Requirements

To achieve the aims of this ticket you need to perform the following:

Remove the metric state storing logic from the AlertStore
Remove any variables/constants/helper functions that are no longer being used.
Modify the AlertStore unit tests accordingly

Acceptance criteria

When: An alert is raised
Then: The alert is stored in MongoDB

When: An alert is raised
Then: No metric state is stored in Redis

When: Checking the two acceptance criteria above
Then: There are no errors in the AlertStore logs

Update UI to use system/node/repo names instead of IDs (PANIC-616)

It was noted that the sources dropdown within the alerts overview component shows the IDs of the sources. This dropdown should instead show the names of the sources (as gathered from the API). The same applies to the systems overview component, which should show the names of the systems rather than their IDs.

Data Store - Implement Data Store for Cosmos Nodes

Monitors - Implement Cosmos Network Monitors Manager

The Cosmos Network Monitors Manager must be able to start a network monitor for a Cosmos chain based on the node_configurations it receives.

For each sub-configuration, a network monitor is started only if the following are satisfied:

monitor_network is true for every sub-configuration
There is at least 1 data source irrespective of what endpoints were enabled/disabled and their value. Data sources should be grouped as [non-validators, validators]
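
The two start conditions can be expressed as a small predicate. This is a sketch under assumptions: monitor_network comes from the ticket, while the keys is_validator and use_as_data_source, the "true"/"false" string values and the function names are hypothetical.

```python
from typing import Dict, List

NodeConfig = Dict[str, str]


def is_true(value: str) -> bool:
    """Interpret an .ini-style boolean string."""
    return value.strip().lower() == "true"


def group_data_sources(node_configs: List[NodeConfig]) -> List[NodeConfig]:
    """Return the data sources ordered as [non-validators, validators]."""
    sources = [c for c in node_configs if is_true(c["use_as_data_source"])]
    # sorted() is stable and False sorts before True, so non-validators come first.
    return sorted(sources, key=lambda c: is_true(c["is_validator"]))


def should_start_network_monitor(node_configs: List[NodeConfig]) -> bool:
    """Start only if every sub-config enables monitor_network and a source exists."""
    has_source = len(group_data_sources(node_configs)) > 0
    all_enabled = all(is_true(c["monitor_network"]) for c in node_configs)
    return has_source and all_enabled
```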

See implementation of Substrate for an idea, however, the logic there is a bit different.

Other important things to modify:

Slack command handler should include the Cosmos Network Monitor
Telegram commands handler should include the Cosmos Network Monitor
A Cosmos Network Monitor starter should be implemented
All Cosmos Network Monitor Manager config queues should be initialised in the run_alerter
The run_alerter should be able to start the Cosmos Network Monitors Manager

Data Transformers - Implement Cosmos Node Data Transformer (PANIC-553)

Implement a Data Transformer for the Cosmos Node data. This data transformer should not cater for the network data; that should be handled by the CosmosNetworkDataTransformer.

Web UI Iteration 2: Remove Configs Store from Alerter code-base

Since the UI will be storing the configurations inside MongoDB, there is no longer a need for us to store the configs inside Redis. The API backend can now compute the monitorables info using MongoDB. The task here is to remove the configs store from the alerter.

Implement Tendermint Endpoint Monitoring (PANIC-587)

Due to some buggy endpoints in the Cosmos Rest Server, we will opt to use the Tendermint RPC endpoints to monitor Slashing Events and Block Signatures Missed/Committed.

Since the data mentioned above should be obtained via an indirect node, we need to also test whether the Tendermint RPC endpoint of the monitored node is online.

Monitor retrieved data should be on debug mode for the logs (PANIC-610)

A lot of log files are created from the monitors because at each monitoring round we are outputting the data being retrieved by the monitors.

In general, these messages should be logged at the debug level.
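
With Python's standard logging module, the change amounts to emitting the retrieved data with logger.debug instead of logger.info. The logger name and the two helper functions below are hypothetical and only illustrate the idea.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("monitor_example")


def log_retrieved_data(data: dict) -> None:
    """Per-round payloads are verbose, so they are only emitted at DEBUG level."""
    logger.debug("Retrieved data: %s", data)


def log_round_completed(monitor_name: str) -> None:
    """Short status messages can stay at INFO level."""
    logger.info("Monitoring round completed for %s", monitor_name)
```

When the logger level is set to INFO, the payload messages are dropped and only the short status lines reach the log files.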

CRUD - Sub-Chains Home Page

As a node operator, I would like to see a page that shows all sub-chains that I have configured and a filter, so I can filter the sub-chains by base-chain.

Requirements

A menu item with the label "Sub-Chains" must be added to the Settings menu
When the node operator clicks on the option, a page located at /sub-chains is shown
The page must have an svc-data-table, to allow the node operator to perform CRUD operations on the sub-chains
The page must have a filter, to allow the node operator to filter the data table by base-chain
The page must have an "Add Sub-Chain" button, to allow the node operator to add new sub-chains
When the node operator clicks on the "Edit" button (to edit a specific sub-chain), the app is redirected to /sub-chains/SUB_CHAIN_ID/menu
When the node operator clicks on the "Delete" button, a dialog is shown asking the node operator if the sub-chain (along with all its configuration*) should be removed.
- *All the configuration under that deleted sub-chain must be removed, but the channels remain intact since they can be used across different sub-chains

Edit Network monitoring fields from Web-Installer

We need to perform two changes to the node configuration form for Cosmos nodes:

Removal of the Governance Addresses from the front-end, and from the loading and saving logic, as they are no longer needed by the network monitor
The addition of another form beneath the node configuration form which asks the user whether they want to monitor the network for governance. If yes, monitor_network should be set to true for every added node configuration. If not, monitor_network should be set to false for every added node configuration. If the user has not added any nodes yet, the form should be disabled by default.
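
The behaviour of the new form can be summarised in two small functions. This is a language-neutral sketch written in Python rather than the web-installer's own code; the function names are hypothetical, and monitor_network is stored as a "true"/"false" string on the assumption that it follows the same style as the other .ini values.

```python
from typing import Dict, List

NodeConfig = Dict[str, str]


def apply_monitor_network(node_configs: List[NodeConfig],
                          monitor: bool) -> List[NodeConfig]:
    """Return copies of all node configs with monitor_network set uniformly."""
    value = "true" if monitor else "false"
    return [{**config, "monitor_network": value} for config in node_configs]


def governance_form_enabled(node_configs: List[NodeConfig]) -> bool:
    """The form stays disabled until at least one node has been added."""
    return len(node_configs) > 0
```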

Data Transformers - Implement/Adapt Manager Logic for Cosmos (PANIC-555)

Fix improper teardown in API tests (PANIC-598)

When running the API tests, the following warning is shown in the logs:

After running jest using --detectOpenHandles I noticed that the issue is with the timer intervals set for Mongo and Redis inside server.ts. We have 2 options to solve this:

Wrap the logic in server.ts inside a function and use error handling to detect any error. When an error is detected, clear the timer and terminate
Set a variable for both timers and export it to the test suite, so the test suite clears the timer after each test in its own teardown.

Data Store - Implement/Adapt Manager Logic for Cosmos (PANIC-563)

Monitors - Implement CosmosNetworkMonitor (PANIC-580)

This class should monitor the network-related data. The only metrics we are going to monitor are the number of referendums and the referendum data, so that we can alert if a new referendum is raised and/or the validator has not voted yet. These metrics are obtained from the Cosmos REST server.

Vitaly is to research what data is available for these metrics and which alerts we need to show.

See the Substrate implementation as an example to better understand the logic.
