pokt-network / gateway-server


Interact with the POKT Protocol with ease

License: MIT License

Go 97.02% Dockerfile 0.44% Shell 2.53%

gateway-server's People

Contributors

Stargazers

Watchers

Forkers

saltatory porters-xyz 0xthresh maxitosh memosys

gateway-server's Issues

[CHORE] Add Unit tests for Relay Controller

Unit Test Coverage Enhancement

Issue:

To fortify the reliability and maintainability of the Gateway stack, we should have unit tests for the relay controller that mock out both the relay client responses and the fasthttp context.

Description:

The current relay controller lacks sufficient unit test coverage for scenarios such as

invalid inputs
invalid path
bad response from relay client (mock out the relay client response)
success (Happy case)

Objectives:

Unit Test Coverage: Develop a suite of unit tests covering critical components and functionalities for relay controller

Acceptance Criteria:

tests invalid input
tests invalid path
test bad response from relay client
happy case
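
A minimal sketch of how the four scenarios above could be exercised against a stubbed relay client. Everything here is hypothetical and only for illustration: the `RelayClient` interface, the `handleRelay` function, the `/relay/` path prefix, and the status codes are assumptions, and the real controller works with a fasthttp context rather than plain arguments.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// RelayClient is a hypothetical interface standing in for the real relay client.
type RelayClient interface {
	SendRelay(chainID string, payload []byte) ([]byte, error)
}

// errUpstream is the canned failure used for the "bad response" scenario.
var errUpstream = errors.New("upstream failure")

// stubClient returns a canned response or error, which is all a unit test needs.
type stubClient struct {
	resp []byte
	err  error
}

func (s stubClient) SendRelay(string, []byte) ([]byte, error) { return s.resp, s.err }

// handleRelay is a simplified controller: it validates the path and body,
// forwards the relay, and maps the outcome to an HTTP-style status code.
func handleRelay(c RelayClient, path string, body []byte) (int, []byte) {
	const prefix = "/relay/" // assumed route prefix
	if !strings.HasPrefix(path, prefix) || len(path) == len(prefix) {
		return 404, nil // invalid path
	}
	if len(body) == 0 {
		return 400, nil // invalid input
	}
	resp, err := c.SendRelay(strings.TrimPrefix(path, prefix), body)
	if err != nil {
		return 500, nil // bad response from relay client
	}
	return 200, resp // happy case
}

func main() {
	ok := stubClient{resp: []byte(`{"result":"0x1"}`)}
	code, body := handleRelay(ok, "/relay/0021", []byte(`{"method":"eth_blockNumber"}`))
	fmt.Println(code, string(body))
}
```

Because the controller depends on an interface rather than a concrete client, each acceptance criterion becomes one call with a different stub.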

[FEATURE] Gateway QoS

Quality of Service (QoS) Checks Enhancement

Issue:

As part of the Gateway Stack's transition from alpha to Release Candidate (RC), implementing Quality of Service (QoS) checks is crucial. This initiative aims to enable Pocket responses with a 99% success rate, ensuring reliability and performance.

Description:

The current state lacks comprehensive QoS checks, which can impact the reliability of Pocket responses. This issue focuses on implementing checks that guarantee a 99% success rate, thereby improving the overall quality of service provided by the Gateway Stack.

Objectives:

QoS Checks Implementation: Enable QoS checks to ensure Pocket responses with a 99% success rate.

Acceptance Criteria:

QoS checks are implemented to achieve a 99% success rate in Pocket responses.

Additional Information (based off eng discussion):

A session consists of 24 nodes per app stake. This gives you a diverse set of nodes, but they are rotated every hour.

This presents a problem because we don't know which nodes are healthy. So we kick off a process called priming, which sends arbitrary payloads (100-200 requests) to determine the state of each node per app session.

Every new session, we will prime for health and latency, which gives us the most up-to-date information. The data gathered during priming will be placed in a hot cache and reset frequently.

Of course, this presents another problem: priming takes time to yield results (10s to 60s), and we still need a way to route traffic in the meantime. In this case, we'll leverage a cold storage of historical information to determine which nodes are likely to return a good response.

That is the high-level overview. We will focus on the hot storage implementation and the health checks. Cold storage requires more engineering work and infrastructure, and yields little benefit beyond the first few seconds. It will likely require more engineering scoping, but since we follow an iterative engineering process, there is no need to emphasize it immediately.
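
To make the hot-cache idea concrete, here is a rough sketch of recording priming results per node and ranking nodes by success rate, then by average latency. The type and function names (`hotCache`, `record`, `rank`, `reset`) are made up for this example, the ranking rule is only one possible choice, and the sketch is not safe for concurrent use. The cold storage fallback is not covered.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// nodeStats aggregates priming results for one node within a session.
type nodeStats struct {
	successes, failures int
	totalLatency        time.Duration
}

func (s nodeStats) successRate() float64 {
	total := s.successes + s.failures
	if total == 0 {
		return 0
	}
	return float64(s.successes) / float64(total)
}

func (s nodeStats) avgLatency() time.Duration {
	if s.successes == 0 {
		return 0
	}
	return s.totalLatency / time.Duration(s.successes)
}

// hotCache holds stats for the current session only.
type hotCache struct {
	stats map[string]*nodeStats
}

func newHotCache() *hotCache { return &hotCache{stats: map[string]*nodeStats{}} }

// record stores the outcome of one priming request.
func (c *hotCache) record(node string, ok bool, latencyMs int) {
	s := c.stats[node]
	if s == nil {
		s = &nodeStats{}
		c.stats[node] = s
	}
	if ok {
		s.successes++
		s.totalLatency += time.Duration(latencyMs) * time.Millisecond
	} else {
		s.failures++
	}
}

// reset drops all stats, for example when a new session starts.
func (c *hotCache) reset() { c.stats = map[string]*nodeStats{} }

// rank orders nodes by success rate (highest first), then by average
// latency (lowest first), then by name so the result is deterministic.
func (c *hotCache) rank() []string {
	nodes := make([]string, 0, len(c.stats))
	for n := range c.stats {
		nodes = append(nodes, n)
	}
	sort.Slice(nodes, func(i, j int) bool {
		a, b := c.stats[nodes[i]], c.stats[nodes[j]]
		if a.successRate() != b.successRate() {
			return a.successRate() > b.successRate()
		}
		if a.avgLatency() != b.avgLatency() {
			return a.avgLatency() < b.avgLatency()
		}
		return nodes[i] < nodes[j]
	})
	return nodes
}

func main() {
	c := newHotCache()
	c.record("node-a", true, 120)
	c.record("node-a", false, 0)
	c.record("node-b", true, 45)
	fmt.Println(c.rank())
}
```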

Height/integrity check fails for near resulting in no nodes being selected

Describe the Bug

When using an app staked for NEAR in the gateway server, the health check fails to grab the block height with an invalid syntax error. This in turn results in "cannot find source of truth for data integrity check", which means you get "Something went wrong node selector can't find node" when trying to send a relay.

My guess is that this is because NEAR does not conform to the JSON-RPC spec.

Expected Behavior

Gateway server should be able to grab the block height of near in health checks

Steps to Reproduce

Stake an app with near
Launch gateway server with said app stake
Observe Logs for failed health/integrity check
Try sending a relay on 0052
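
One possible cause, offered as a guess and not confirmed against the code: "invalid syntax" is the error `strconv` returns when a value is not in the expected number format, which would happen if the height parser expects an EVM-style hex string and the chain returns something else. The sketch below, with a hypothetical `parseHeight` helper, accepts either a hex string or a plain JSON number. Where the height lives in a NEAR response is not addressed here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// parseHeight accepts a raw JSON value that is either a string
// (hex such as "0x10", or decimal such as "16") or a JSON number (16).
func parseHeight(raw []byte) (uint64, error) {
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		if strings.HasPrefix(s, "0x") || strings.HasPrefix(s, "0X") {
			return strconv.ParseUint(s[2:], 16, 64)
		}
		return strconv.ParseUint(s, 10, 64)
	}
	var n uint64
	if err := json.Unmarshal(raw, &n); err != nil {
		return 0, fmt.Errorf("unsupported height value %s: %w", raw, err)
	}
	return n, nil
}

func main() {
	for _, raw := range []string{`"0x10"`, `16`, `"not-a-height"`} {
		h, err := parseHeight([]byte(raw))
		fmt.Println(raw, h, err)
	}
}
```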

[FEATURE] Observability Enhancement for Gateway Server Metrics

Observability Enhancement for Gateway Server Metrics

Issue:

As part of the transition from alpha to Release Candidate (RC) for the Pocket Network Gateway Stack, it is crucial to enhance observability on the gateway server. This will provide valuable insights into the performance, success rates, errors, and session dispatching rates, allowing for better monitoring and troubleshooting.

Description:

The current state of the gateway stack lacks comprehensive observability features required to measure and analyze key metrics. To address this, we propose the implementation of observability enhancements to the gateway server. The goal is to enable Prometheus to scrape relevant metrics and provide a clear picture of the gateway's performance.

Objectives:

Metric Implementation: Implement instrumentation within the gateway server to capture essential metrics such as success rates, errors, and session dispatching rates.
Prometheus Integration: Allow for Prometheus to scrape and collect the metrics exposed by the gateway server, enabling real-time monitoring and alerting.
Dashboard Creation: Develop a user-friendly dashboard to visualize the collected metrics, facilitating easy interpretation for developers and operators.

Acceptance Criteria:

Prometheus successfully scrapes relevant metrics from the gateway server.
Metrics include but are not limited to success rates, errors, and session dispatching rates.
A comprehensive dashboard is created for visualizing the metrics.
Documentation is updated to include information on the newly implemented observability features.

[BUG] Update Application registry when chains change.

Describe the Bug

We currently update the application cache whenever a new pokt application is added or removed. However, we don't update application cache when the chains change.

Expected Behavior

Application registry should update applications whenever chains change as well.
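
A small sketch of the missing comparison, assuming the registry can be viewed as a map from application address to its staked chain IDs. `registryStale` and `sameChains` are hypothetical names, not functions from the repository.

```go
package main

import (
	"fmt"
	"sort"
)

// sameChains compares two chain ID lists while ignoring order.
func sameChains(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...) // copy so the callers' slices stay untouched
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// registryStale reports whether the cached view differs from the latest one,
// either because apps were added or removed or because an app's chains changed.
func registryStale(cached, latest map[string][]string) bool {
	if len(cached) != len(latest) {
		return true
	}
	for addr, chains := range latest {
		old, ok := cached[addr]
		if !ok || !sameChains(old, chains) {
			return true
		}
	}
	return false
}

func main() {
	cached := map[string][]string{"app1": {"0021"}}
	latest := map[string][]string{"app1": {"0021", "0027"}}
	fmt.Println("refresh needed:", registryStale(cached, latest))
}
```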


[CHORE] Write tests for pkg Pocket Client V0

Unit Test Coverage Enhancement

Issue:

To fortify the reliability and maintainability of the Gateway stack, we should also have unit tests for the pokt v0 client, including the helper functions, with the fasthttp HTTP requests mocked out.

Objectives:

Unit Test Coverage: Develop a suite of unit tests covering critical components and functionalities for pokt client

Acceptance Criteria:

test cases for the pokt v0 client

[CHORE] Encrypt and Load APP_STAKE_KEYS from Encrypted File

Chore Task: Encrypt and Load APP_STAKE_KEYS from Encrypted File

Issue:

Currently, sensitive information (APP_STAKE_KEYS) is stored in an unencrypted format in a .env file. To improve security, this information should be encrypted and loaded from an encrypted file. The task involves changing the configuration to use APP_STAKE_KEYS_FILE instead of a direct key in the .env file and modifying the main process to prompt for a passphrase to unlock and load the encrypted app stakes.

Description:

Encrypting the APP_STAKE_KEYS and loading them from an encrypted file enhances security by protecting sensitive information. Changing the configuration to use APP_STAKE_KEYS_FILE allows for better management of encrypted data. The main process will be modified to prompt for a passphrase during initialization to unlock and load the encrypted app stakes.

Objectives:

Encryption Implementation:
- Encrypt APP_STAKE_KEYS and save the encrypted data in a designated file (APP_STAKE_KEYS_FILE).
- Ensure the encryption process uses AES256 for robust security.
Configuration Update:
- Update the configuration to use APP_STAKE_KEYS_FILE instead of directly specifying the key in the .env file.
- Ensure the configuration change is well-documented for future reference.
Passphrase Prompt:
- Modify the main process to prompt for a passphrase during initialization.
- Implement a secure mechanism to accept and verify the passphrase.
- Ensure the passphrase is used to unlock and load the encrypted app stakes.

Acceptance Criteria:

Encryption:
- APP_STAKE_KEYS is successfully encrypted using AES256 and stored in the specified file (APP_STAKE_KEYS_FILE).
- The encryption process is secure and does not compromise the integrity of the keys.
Configuration Update:
- The configuration is updated to use APP_STAKE_KEYS_FILE.
- The application functions correctly with the updated configuration.
Passphrase Prompt:
- The main process prompts for a passphrase during initialization.
- The passphrase is securely handled and used to unlock and load the encrypted app stakes.
- The implementation handles incorrect passphrases gracefully, preventing unauthorized access.

Additional Considerations:

Ensure thorough testing of the entire process, including encryption, configuration updates, and passphrase handling.
Provide clear documentation for users and developers on how to manage the encrypted file and passphrase.
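
A rough sketch of the encrypt and unlock round trip using AES-256 in GCM mode from the Go standard library. Several things are assumptions: the issue only says AES256, so GCM is a choice made here; the file layout (salt, then nonce, then ciphertext) is invented; and `deriveKey` uses a single salted SHA-256 only to keep the example stdlib-only. A real implementation should use a proper password KDF such as scrypt or Argon2. The interactive passphrase prompt is left out.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"fmt"
)

const saltSize = 16

// deriveKey turns a passphrase into a 32-byte key (AES-256).
// Simplification: a bare salted hash is NOT a substitute for a password KDF.
func deriveKey(passphrase string, salt []byte) []byte {
	sum := sha256.Sum256(append(append([]byte{}, salt...), passphrase...))
	return sum[:]
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encryptKeys returns salt || nonce || ciphertext.
func encryptKeys(plaintext []byte, passphrase string) ([]byte, error) {
	salt := make([]byte, saltSize)
	if _, err := rand.Read(salt); err != nil {
		return nil, err
	}
	gcm, err := newGCM(deriveKey(passphrase, salt))
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	out := append(salt, nonce...)
	return gcm.Seal(out, nonce, plaintext, nil), nil
}

// decryptKeys reverses encryptKeys; a wrong passphrase fails authentication.
func decryptKeys(blob []byte, passphrase string) ([]byte, error) {
	if len(blob) < saltSize {
		return nil, errors.New("encrypted file too short")
	}
	salt, rest := blob[:saltSize], blob[saltSize:]
	gcm, err := newGCM(deriveKey(passphrase, salt))
	if err != nil {
		return nil, err
	}
	if len(rest) < gcm.NonceSize() {
		return nil, errors.New("encrypted file too short")
	}
	nonce, ciphertext := rest[:gcm.NonceSize()], rest[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	blob, err := encryptKeys([]byte("hypothetical-app-stake-key"), "passphrase")
	if err != nil {
		panic(err)
	}
	_, err = decryptKeys(blob, "wrong passphrase")
	fmt.Println("wrong passphrase rejected:", err != nil)
}
```

GCM is an authenticated mode, so an incorrect passphrase produces an error instead of garbage output, which covers the "handles incorrect passphrases gracefully" criterion.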

[DOCS] Create Minimum Compute Resources Required

Describe the Ask

https://discord.com/channels/553741558869131266/564836328202567725/1217559275086549093

what are the minimum requirements for the Gateway Server (https://github.com/pokt-network/gateway-server) ? I'm not seeing it mentioned anywhere in the repo/ docs, it would be particularly helpful imo to have that included in the Quick Onboarding Guide section of the docs in github. if you can describe them to me I'm happy to open a PR adding that info to docs

Acceptance Criteria

A section on the approximate disk space, CPU, and memory required by the gateway server.

[ENHANCEMENT] Migration Script Flexibility

Description

The current migration script limits operations to applying only one migration at a time, either up or down. This restricts the ability to efficiently manage database states over larger changes.

Expected Behavior

To enhance the usability and functionality of the migration script, the following changes are proposed:

Up Migration Improvements:
- Allow the --up flag to apply all pending migrations by default if no specific migration number is provided.
- If a number of migrations is specified with --up, only migrate up to that many migrations.
Down Migration Requirements:
- Require a specific number of migrations to revert when using the --down flag.
- Introduce the -all option with the --down flag to revert all migrations safely.

Rationale

These changes aim to provide users with clear options for precise or complete migrations and rollbacks.
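
The proposed rules can be sketched as a single function that resolves how many migrations to run. `planSteps` and its parameters are hypothetical, and flag parsing is left out. Clamping an oversized count to what is available is an extra assumption, not something the proposal states.

```go
package main

import (
	"errors"
	"fmt"
)

// planSteps returns how many migrations to apply or revert.
// direction is "up" or "down"; n is the requested count (0 means "not given");
// all mirrors the proposed all option for down migrations;
// pending and applied describe the current database state.
func planSteps(direction string, n int, all bool, pending, applied int) (int, error) {
	if n < 0 {
		return 0, errors.New("migration count must not be negative")
	}
	switch direction {
	case "up":
		if n == 0 || n > pending {
			return pending, nil // default: apply everything that is pending
		}
		return n, nil
	case "down":
		if all {
			return applied, nil
		}
		if n == 0 {
			return 0, errors.New("down requires a migration count or the all option")
		}
		if n > applied {
			return applied, nil
		}
		return n, nil
	default:
		return 0, fmt.Errorf("unknown direction %q", direction)
	}
}

func main() {
	steps, err := planSteps("up", 0, false, 5, 2)
	fmt.Println(steps, err)
	steps, err = planSteps("down", 0, false, 5, 2)
	fmt.Println(steps, err)
}
```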

[Chore] Add initial Unit Tests

Unit Test Coverage Enhancement

Issue:

To fortify the reliability and maintainability of the Gateway stack, there is a need to enhance unit test coverage. This initiative aims to create a comprehensive suite of unit tests covering critical components and functionalities.

Description:

The current state lacks sufficient unit test coverage, which may lead to challenges in identifying and resolving issues promptly. The goal is to develop a robust suite of unit tests that rigorously validate individual units of code, ensuring the correctness of the implementation.

Objectives:

Unit Test Coverage: Develop a suite of unit tests covering critical components and functionalities.

Acceptance Criteria:

Add mocking library and testify
Create a basic script to generate mocks
Create a mock for PocketService
Add TDD tests for https://github.com/baaspoolsllc/os-gateway/blob/main/pkg/common/crypto.go, https://github.com/baaspoolsllc/os-gateway/blob/main/pkg/common/http.go
Add testing instructions to the README

Archival Check Basic Implementation

Describe the Feature

Currently we do not check whether a node is archival. We can include a check that queries a block at a random height in the range 1 to N to determine if a node is archival. N can be set per chain via the chain configurator.

Additional Context / Pre-requisite

While the archival check is useful and simple to implement, POKT Network is extremely confusing on which chain ids should actually be archival or not. For example, there is:

0021 - Ethereum
0027 - Ethereum Archival
0028 - Ethereum Archival Trace
000B - Polygon Archival

Then there are some chains that are expected to be archival without having an archival suffix attached to them, e.g. all the testnets (according to Grove).

Due to the lack of standardization in chain listings, it would be difficult to apply an archival check to the right chain ids without prior research.

To solve this, we can allow node operators to enable the archival check via the chain configuration table, leave it to the node operators' discretion, and have it disabled by default.
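
For EVM chains, the check could look roughly like the sketch below. This is an assumption-heavy illustration: the function names are invented, and it uses `eth_getBalance` at a random historical height, on the assumption that a pruned EVM node cannot serve state for old blocks. The feature description above only says "query a block", so which method actually distinguishes archival nodes on a given chain needs to be verified per chain.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math/rand"
)

// zeroAddress is used only as a well-formed address to query.
const zeroAddress = "0x0000000000000000000000000000000000000000"

// randomHeight picks a block height uniformly from [1, n].
func randomHeight(n int64) int64 {
	if n < 1 {
		return 1
	}
	return rand.Int63n(n) + 1
}

// archivalProbe builds a JSON-RPC request asking for state at a past height.
func archivalProbe(height int64) ([]byte, error) {
	return json.Marshal(map[string]any{
		"jsonrpc": "2.0",
		"id":      1,
		"method":  "eth_getBalance",
		"params":  []any{zeroAddress, fmt.Sprintf("0x%x", height)},
	})
}

// present reports whether a JSON field was set to something other than null.
func present(raw json.RawMessage) bool {
	return len(raw) > 0 && string(raw) != "null"
}

// isArchival treats a response with a result and no error as a pass.
func isArchival(resp []byte) bool {
	var r struct {
		Result json.RawMessage `json:"result"`
		Error  json.RawMessage `json:"error"`
	}
	if err := json.Unmarshal(resp, &r); err != nil {
		return false
	}
	return present(r.Result) && !present(r.Error)
}

func main() {
	req, err := archivalProbe(randomHeight(1_000_000))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(req))
	fmt.Println(isArchival([]byte(`{"jsonrpc":"2.0","id":1,"result":"0x0"}`)))
}
```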

Write test for CachedClient Decorator

Unit Test Coverage Enhancement

Issue:

To fortify the reliability and maintainability of the Gateway stack, we should also have unit tests for the cached client decorator that mock out the cache. Since Cache is not yet an interface, we'll likely need to introduce one first.

Description:

The current cached client decorator lacks sufficient unit test coverage for scenarios such as

not cached
cached
error

Objectives:

Unit Test Coverage: Develop a suite of unit tests covering critical components and functionalities for cached client decorator.

Acceptance Criteria:

test for session is not cached
test for session is cached
test for session returns an error
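
A sketch of what the decorator could look like once the cache sits behind an interface, with in-memory fakes covering the three scenarios. All names (`Cache`, `SessionClient`, `cachedClient`, `mapCache`, `countingClient`) are hypothetical and do not reflect the repository's actual types.

```go
package main

import (
	"errors"
	"fmt"
)

type Session struct{ ID string }

// SessionClient is the interface being decorated.
type SessionClient interface {
	GetSession(app string) (*Session, error)
}

// Cache is the small interface the decorator needs, which makes it mockable.
type Cache interface {
	Get(key string) (*Session, bool)
	Set(key string, s *Session)
}

// cachedClient serves sessions from the cache and falls through on a miss.
type cachedClient struct {
	inner SessionClient
	cache Cache
}

func (c *cachedClient) GetSession(app string) (*Session, error) {
	if s, ok := c.cache.Get(app); ok {
		return s, nil
	}
	s, err := c.inner.GetSession(app)
	if err != nil {
		return nil, err // errors are not cached
	}
	c.cache.Set(app, s)
	return s, nil
}

// mapCache is an in-memory fake cache.
type mapCache map[string]*Session

func (m mapCache) Get(k string) (*Session, bool) { s, ok := m[k]; return s, ok }
func (m mapCache) Set(k string, s *Session)      { m[k] = s }

var errDispatch = errors.New("dispatch failed")

// countingClient is a fake inner client that counts calls.
type countingClient struct {
	calls int
	err   error
}

func (c *countingClient) GetSession(app string) (*Session, error) {
	c.calls++
	if c.err != nil {
		return nil, c.err
	}
	return &Session{ID: "session-for-" + app}, nil
}

func main() {
	inner := &countingClient{}
	client := &cachedClient{inner: inner, cache: mapCache{}}
	client.GetSession("app1")
	client.GetSession("app1")
	fmt.Println("inner calls:", inner.calls)
}
```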

[ENHANCEMENT] Node Health Metrics with Detailed Per-Chain Gauges

Description

Having detailed metrics about node healthiness per chain can significantly enhance monitoring and troubleshooting capabilities. The following metrics will provide valuable insights:

Number of Healthy Nodes per Chain: Tracks nodes deemed healthy based on Gateway QoS checks.
Number of Synced Nodes per Chain: Indicates nodes that are fully synced with the blockchain.
Number of Nodes in Timeout per Chain: Monitors the number of nodes that are currently in a timeout state (banned).

Expected Behavior

These metrics should be exported in Prometheus format and available for scraping at the /metrics endpoint. This will enable seamless integration with existing monitoring systems and dashboards, allowing for real-time tracking and alerting on the health and status of nodes across different chains.
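
As an illustration of the output shape, the sketch below renders the three gauges in the Prometheus text exposition format using only the standard library. The metric names, the `chain_id` label, and the `chainHealth` type are made up for this example. In practice a Prometheus client library would normally do the registration and rendering.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// chainHealth holds the three per-chain counts described above.
type chainHealth struct {
	Healthy, Synced, TimedOut int
}

// renderGauges writes one gauge family per metric, with one sample per chain.
func renderGauges(byChain map[string]chainHealth) string {
	chains := make([]string, 0, len(byChain))
	for id := range byChain {
		chains = append(chains, id)
	}
	sort.Strings(chains) // stable output regardless of map iteration order

	metrics := []struct {
		name, help string
		value      func(chainHealth) int
	}{
		{"gateway_healthy_nodes", "Nodes deemed healthy by QoS checks.", func(h chainHealth) int { return h.Healthy }},
		{"gateway_synced_nodes", "Nodes fully synced with the chain.", func(h chainHealth) int { return h.Synced }},
		{"gateway_timeout_nodes", "Nodes currently in timeout.", func(h chainHealth) int { return h.TimedOut }},
	}

	var b strings.Builder
	for _, m := range metrics {
		fmt.Fprintf(&b, "# HELP %s %s\n# TYPE %s gauge\n", m.name, m.help, m.name)
		for _, id := range chains {
			fmt.Fprintf(&b, "%s{chain_id=%q} %d\n", m.name, id, m.value(byChain[id]))
		}
	}
	return b.String()
}

func main() {
	fmt.Print(renderGauges(map[string]chainHealth{
		"0021": {Healthy: 20, Synced: 18, TimedOut: 4},
		"0027": {Healthy: 12, Synced: 12, TimedOut: 0},
	}))
}
```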

[BUG] Incorrect Relayer Timeout Configuration and Documentation Inconsistency

Overview

This report addresses two separate issues found within the project: an incorrect timeout configuration in the internal/relayer/relayer.go file, and a documentation inconsistency regarding environment variable names.

Detailed Description

Code Issue: Timeout Configuration

Location: internal/relayer/relayer.go
Problem: The method getPocketRequestTimeout may not retrieve the correct timeout value due to an erroneous reference to a chain configuration attribute. Specifically, the attribute used does not match the intended timeout setting for POKT requests, leading to potential misconfigurations.
Reference: Timeout Configuration Line

Documentation Issue: Environment Variable Naming

Location: Quick Onboarding Guide
Problem: The environment variable POKT_RPC_TIMEOUT is referenced for two distinct purposes: to define the response timeout for both a POKT node and an altruist backup. This likely stemmed from a copy-paste error and could lead to confusion during environment setup, as it's unclear which timeout the variable is intended to set.

Expected Behavior

Code Fix: When getPocketRequestTimeout is executed within the relayer logic, it should accurately fetch the intended timeout value specific to POKT requests, ensuring reliable and predictable request handling.
Documentation Correction: The guide should distinctly name and describe the environment variables for setting the POKT node response timeout and the altruist backup response timeout, eliminating any ambiguity for users setting up their environment.

Proposed Solution or Workaround

I have prepared fixes for both issues and will detail them in an associated pull request.

PR Link: #29
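
The kind of mix-up described in this report can be avoided by giving each timeout its own accessor. The sketch below is illustrative only: `chainConfig`, its two fields, `defaultTimeout`, and both accessor functions are hypothetical and are not taken from internal/relayer/relayer.go or from PR #29.

```go
package main

import (
	"fmt"
	"time"
)

// defaultTimeout is a hypothetical global fallback, not a value from the repository.
const defaultTimeout = 10 * time.Second

// chainConfig is a hypothetical per-chain record with two separate timeouts.
type chainConfig struct {
	PocketRequestTimeout   string // e.g. "1500ms"; empty means "use the default"
	AltruistRequestTimeout string
}

// parseOrDefault parses a duration string and falls back when it is empty,
// malformed, or not positive.
func parseOrDefault(raw string, fallback time.Duration) time.Duration {
	d, err := time.ParseDuration(raw)
	if err != nil || d <= 0 {
		return fallback
	}
	return d
}

// pocketRequestTimeout reads only the POKT-specific attribute.
func pocketRequestTimeout(c chainConfig) time.Duration {
	return parseOrDefault(c.PocketRequestTimeout, defaultTimeout)
}

// altruistRequestTimeout reads only the altruist-specific attribute.
func altruistRequestTimeout(c chainConfig) time.Duration {
	return parseOrDefault(c.AltruistRequestTimeout, defaultTimeout)
}

func main() {
	cfg := chainConfig{PocketRequestTimeout: "1500ms", AltruistRequestTimeout: "30s"}
	fmt.Println(pocketRequestTimeout(cfg), altruistRequestTimeout(cfg))
	fmt.Println(pocketRequestTimeout(chainConfig{}))
}
```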

[ENHANCEMENT] Relay Controller should handle concurrent requests

Relay controller concurrent requests

Issue:

Our relay controller should send concurrent requests for a faster response time, and choose the first response to arrive as the response it serves.

Description:

The relay client should be kept simple, independent of any strong opinionated values outside of ease of development. The relay controller on the other hand is responsible for handling a relay once a request is received. It would be a good place for the relay controller to decide how the request is routed and served via concurrent requests to other nodes.

Concurrently sending requests will lay the foundation for how we will be able to enable QoS checks as we "prime" the nodes for health.

Objectives:

Acceptance Criteria:

Concurrent requests are sent and first response is consumed.
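
A minimal sketch of the fan-out described in this issue, using goroutines and a cancellable context. `sendFunc`, `firstResponse`, `relayToAll`, and `fakeNode` are hypothetical names. Returning the first successful response, and skipping faster failures, is an interpretation of "first response" made for this example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// sendFunc sends one relay to one node.
type sendFunc func(ctx context.Context) (string, error)

// firstResponse fans the relay out to all nodes and returns the first success.
func firstResponse(parent context.Context, nodes []sendFunc) (string, error) {
	if len(nodes) == 0 {
		return "", errors.New("no nodes to relay to")
	}
	ctx, cancel := context.WithCancel(parent)
	defer cancel() // stops the slower requests once a winner is picked

	type result struct {
		body string
		err  error
	}
	results := make(chan result, len(nodes)) // buffered so losers never block
	for _, send := range nodes {
		go func(send sendFunc) {
			body, err := send(ctx)
			results <- result{body, err}
		}(send)
	}

	var lastErr error
	for range nodes {
		r := <-results
		if r.err == nil {
			return r.body, nil
		}
		lastErr = r.err
	}
	return "", fmt.Errorf("all nodes failed: %w", lastErr)
}

// relayToAll is a convenience wrapper using a background context.
func relayToAll(nodes ...sendFunc) (string, error) {
	return firstResponse(context.Background(), nodes)
}

// fakeNode simulates a node that answers after delayMs or fails.
func fakeNode(name string, delayMs int, fail bool) sendFunc {
	return func(ctx context.Context) (string, error) {
		select {
		case <-time.After(time.Duration(delayMs) * time.Millisecond):
		case <-ctx.Done():
			return "", ctx.Err()
		}
		if fail {
			return "", errors.New(name + " failed")
		}
		return name, nil
	}
}

func main() {
	winner, err := relayToAll(fakeNode("slow", 200, false), fakeNode("fast", 10, false))
	fmt.Println(winner, err)
}
```

The same fan-out can later feed priming: instead of discarding the slower results, each one could be recorded as a health and latency sample.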

[CHORE] Architectural Diagrams for Gateway Stack

Architectural Diagrams Enhancement

Issue:

To enhance transparency and facilitate a comprehensive understanding of the Gateway Stack, there is a need to create and update architectural diagrams. This initiative aims to provide clear visual representations of the Gateway Stack's structure and interactions end to end, from receiving a request to serving the relay under QoS measures, along with the underlying high-level components.

Description:

The current state of documentation lacks architectural diagrams, hindering users' ability to grasp the overall system architecture easily. This issue focuses on creating visual representations that effectively communicate the relationships between various components within the Gateway Stack.

Objectives:

Architectural Diagrams: Create clear and detailed visual representations of the Gateway Stack's architecture.

Acceptance Criteria:

Architectural diagrams are created to illustrate the structure and interactions within the Gateway Stack.
Diagrams are included in the documentation to provide a visual reference for users.

[Enhancement] Create Grafana Dashboards

Describe the Bug

The gateway server currently emits rich data about success rates, latency, etc. via Prometheus metrics. This dashboard should leverage all the metrics emitted by the gateway server and allow for service_url filtering if enabled as well.

There is currently no official Grafana dashboard.

[WIP] Grafana Dashboard: https://gist.github.com/nodiesBlade/910afe2ad9dbd5f19948fc7d42d1535a
