pokt-network / gateway-server
Interact with the POKT Protocol with ease
License: MIT License
To fortify the reliability and maintainability of the Gateway stack, we should have unit tests for the relay controller that mock out relay client responses as well as the fasthttp context.
The current relay controller lacks sufficient unit test coverage for scenarios such as
As part of the Gateway Stack's transition from alpha to Release Candidate (RC), implementing Quality of Service (QoS) checks is crucial. This initiative aims to enable Pocket responses with a 99% success rate, ensuring reliability and performance.
The current state lacks comprehensive QoS checks, which can impact the reliability of Pocket responses. This issue focuses on implementing checks that guarantee a 99% success rate, thereby improving the overall quality of service provided by the Gateway Stack.
A session consists of 24 nodes per app stake. This can yield a diverse set of nodes, but they are rotated every hour.
This presents a problem: we don't know which nodes are healthy. So we kick off a process called priming, which sends arbitrary payloads (100-200 requests) to determine the state of each node per app session.
At every new session, we prime for health and latency, which gives us the most up-to-date information. The data gathered during priming is placed in a hot cache and reset frequently.
Of course, this presents another problem: priming takes time to yield results (10s to 60s), and we still need a way to route traffic in the meantime. For this, we'll leverage cold storage of historical information to determine which nodes are likely to return a good response.
That is the high-level overview. We will focus on the hot storage implementation and the health checks. Cold storage requires more engineering work and infrastructure, and its benefit is limited to the first couple of seconds. It will likely need further engineering scoping, but since we follow an iterative engineering process, there is no need to emphasize it immediately.
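The priming and hot-cache idea can be sketched roughly as follows. This is a minimal illustration, not the gateway's implementation: `NodeStats`, `HotCache`, `Prime`, `ErrProbeFailed`, and the `probe` callback (standing in for a real relay to a node) are all hypothetical names.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// ErrProbeFailed is a hypothetical error a probe can return for an unhealthy node.
var ErrProbeFailed = errors.New("probe failed")

// NodeStats is a hypothetical record of what priming learned about one node.
type NodeStats struct {
	Successes int
	Failures  int
	TotalTime time.Duration
}

// Healthy reports whether at least minRate of the probes succeeded.
func (s NodeStats) Healthy(minRate float64) bool {
	total := s.Successes + s.Failures
	if total == 0 {
		return false // no data yet, so the node cannot be trusted
	}
	return float64(s.Successes)/float64(total) >= minRate
}

// HotCache is a hypothetical in-memory store that is reset on every new session.
type HotCache struct {
	mu    sync.Mutex
	stats map[string]NodeStats
}

func NewHotCache() *HotCache { return &HotCache{stats: map[string]NodeStats{}} }

// Reset drops all entries, for example when the session rotates.
func (c *HotCache) Reset() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.stats = map[string]NodeStats{}
}

// Get returns the stats recorded for a node (zero value if unknown).
func (c *HotCache) Get(node string) NodeStats {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.stats[node]
}

func (c *HotCache) record(node string, ok bool, d time.Duration) {
	c.mu.Lock()
	defer c.mu.Unlock()
	s := c.stats[node]
	if ok {
		s.Successes++
	} else {
		s.Failures++
	}
	s.TotalTime += d
	c.stats[node] = s
}

// Prime sends n probes to every node, one goroutine per node, and records
// success and latency in the hot cache. It returns once all probes finish.
func Prime(c *HotCache, nodes []string, n int, probe func(node string) error) {
	var wg sync.WaitGroup
	for _, node := range nodes {
		wg.Add(1)
		go func(node string) {
			defer wg.Done()
			for i := 0; i < n; i++ {
				start := time.Now()
				err := probe(node)
				c.record(node, err == nil, time.Since(start))
			}
		}(node)
	}
	wg.Wait()
}
```

A caller would run `Prime` at the start of each session and then consult `Get(node).Healthy(0.99)` when choosing a node, falling back to historical data while priming is still in progress.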
When using an app staked for NEAR in the gateway server, fetching the block height fails with an invalid syntax error. This results in "cannot find source of truth for data integrity check", which in turn produces "Something went wrong node selector can't find node" when trying to send a relay.
My guess is that this is because NEAR does not conform to the JSON-RPC spec.
The gateway server should be able to fetch the NEAR block height in health checks.
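One possible cause, which is an assumption and not confirmed by this report, is that the height parser expects an EVM-style hex string while the node returns the height in a different encoding, such as a plain JSON number. A tolerant parser could look like the sketch below; `ParseHeight` is a hypothetical helper, not a function from the gateway server.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// ParseHeight accepts a block height encoded as a hex string ("0x10d4f"),
// a decimal string ("68943"), or a bare JSON number (68943).
func ParseHeight(raw json.RawMessage) (uint64, error) {
	// First try a JSON string, which covers the hex and decimal-string cases.
	var s string
	if err := json.Unmarshal(raw, &s); err == nil {
		if h := strings.TrimPrefix(s, "0x"); h != s {
			return strconv.ParseUint(h, 16, 64)
		}
		return strconv.ParseUint(s, 10, 64)
	}
	// Otherwise fall back to a bare JSON number.
	var n uint64
	if err := json.Unmarshal(raw, &n); err != nil {
		return 0, fmt.Errorf("unsupported block height encoding %s: %w", raw, err)
	}
	return n, nil
}
```

Where the height lives inside the response still differs per chain, so the caller would need a per-chain rule for extracting the raw field before handing it to a parser like this.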
As part of the transition from alpha to Release Candidate (RC) for the Pocket Network Gateway Stack, it is crucial to enhance observability on the gateway server. This will provide valuable insights into the performance, success rates, errors, and session dispatching rates, allowing for better monitoring and troubleshooting.
The current state of the gateway stack lacks comprehensive observability features required to measure and analyze key metrics. To address this, we propose the implementation of observability enhancements to the gateway server. The goal is to enable Prometheus to scrape relevant metrics and provide a clear picture of the gateway's performance.
Metric Implementation: Implement instrumentation within the gateway server to capture essential metrics such as success rates, errors, and session dispatching rates.
Prometheus Integration: Allow for Prometheus to scrape and collect the metrics exposed by the gateway server, enabling real-time monitoring and alerting.
Dashboard Creation: Develop a user-friendly dashboard to visualize the collected metrics, facilitating easy interpretation for developers and operators.
We currently update the application cache whenever a new POKT application is added or removed. However, we don't update the application cache when an application's chains change.
The application registry should also update applications whenever their chains change.
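A change check that also considers chains might look like the following sketch. The `App` type and `NeedsRefresh` function are hypothetical simplifications; the real registry types are not shown in this issue.

```go
package main

import "sort"

// App is a hypothetical, simplified view of a staked POKT application.
type App struct {
	Address string
	Chains  []string
}

// sameChains compares two chain lists while ignoring order.
func sameChains(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	x := append([]string(nil), a...)
	y := append([]string(nil), b...)
	sort.Strings(x)
	sort.Strings(y)
	for i := range x {
		if x[i] != y[i] {
			return false
		}
	}
	return true
}

// NeedsRefresh reports whether the cache is stale: an app was added or
// removed, or an existing app's staked chains changed.
func NeedsRefresh(cached, latest []App) bool {
	if len(cached) != len(latest) {
		return true
	}
	byAddr := make(map[string][]string, len(cached))
	for _, app := range cached {
		byAddr[app.Address] = app.Chains
	}
	for _, app := range latest {
		chains, ok := byAddr[app.Address]
		if !ok || !sameChains(chains, app.Chains) {
			return true
		}
	}
	return false
}
```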
To fortify the reliability and maintainability of the Gateway stack, we should also have unit tests for the pokt v0 client, including the helper functions, with the fastify HTTP requests mocked out.
Currently, sensitive information (APP_STAKE_KEYS) is stored in an unencrypted format in a .env file. To improve security, this information should be encrypted and loaded from an encrypted file. The task involves changing the configuration to use APP_STAKE_KEYS_FILE instead of a direct key in the .env file and modifying the main process to prompt for a passphrase to unlock and load the encrypted app stakes.
Encrypting the APP_STAKE_KEYS and loading them from an encrypted file enhances security by protecting sensitive information. Changing the configuration to use APP_STAKE_KEYS_FILE allows for better management of encrypted data. The main process will be modified to prompt for a passphrase during initialization to unlock and load the encrypted app stakes.
Encryption Implementation:
Encrypt APP_STAKE_KEYS and save the encrypted data in a designated file (APP_STAKE_KEYS_FILE).
Configuration Update:
Use APP_STAKE_KEYS_FILE instead of directly specifying the key in the .env file.
Passphrase Prompt:
Prompt for a passphrase during initialization to unlock and load the encrypted app stakes.
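Encryption:
APP_STAKE_KEYS is successfully encrypted using AES256 and stored in the specified file (APP_STAKE_KEYS_FILE).
Configuration Update:
The configuration reads the encrypted keys from APP_STAKE_KEYS_FILE.
Passphrase Prompt:
The main process prompts for a passphrase and uses it to unlock and load the encrypted app stakes.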
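The encrypt-and-unlock flow could be sketched with AES-256 in GCM mode from the Go standard library. This is an illustration under stated assumptions: `Encrypt`, `Decrypt`, and `deriveKey` are hypothetical helpers, and deriving the key with a bare SHA-256 is a simplification to keep the example dependency-free. A real key file should use a slow, salted key derivation function such as scrypt or Argon2.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"crypto/sha256"
	"errors"
	"io"
)

// deriveKey turns a passphrase into a 32-byte AES-256 key.
// ASSUMPTION: plain SHA-256 is used only to keep this sketch stdlib-only.
func deriveKey(passphrase string) []byte {
	sum := sha256.Sum256([]byte(passphrase))
	return sum[:]
}

func newGCM(passphrase string) (cipher.AEAD, error) {
	block, err := aes.NewCipher(deriveKey(passphrase))
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Encrypt returns nonce || ciphertext, suitable for writing to the file
// named by APP_STAKE_KEYS_FILE.
func Encrypt(passphrase string, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(passphrase)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := io.ReadFull(rand.Reader, nonce); err != nil {
		return nil, err
	}
	// Seal appends the ciphertext to the nonce so both are stored together.
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// Decrypt reverses Encrypt. It fails if the passphrase is wrong or the
// data was tampered with, because GCM authenticates the ciphertext.
func Decrypt(passphrase string, data []byte) ([]byte, error) {
	gcm, err := newGCM(passphrase)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, ciphertext := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}
```

At startup, the main process would read the file, prompt for the passphrase without echoing it, and call `Decrypt` before parsing the app stake keys.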
https://discord.com/channels/553741558869131266/564836328202567725/1217559275086549093
What are the minimum requirements for the Gateway Server (https://github.com/pokt-network/gateway-server)? I'm not seeing them mentioned anywhere in the repo or docs. It would be particularly helpful, in my opinion, to have them included in the Quick Onboarding Guide section of the docs on GitHub. If you can describe them to me, I'm happy to open a PR adding that info to the docs.
A section on the approximate disk space, CPU, and memory required by the gateway server.
The current migration script limits operations to applying only one migration at a time, either up or down. This restricts the ability to efficiently manage database states over larger changes.
To enhance the usability and functionality of the migration script, the following changes are proposed:
Up Migration Improvements:
Modify the --up flag to apply all pending migrations by default if no specific migration number is provided.
If a number is provided with --up, only migrate up to that many migrations.
Down Migration Requirements:
Require a specific number of migrations with the --down flag.
Add an -all option with the --down flag to revert all migrations safely.
These changes aim to provide users with clear options for precise or complete migrations and rollbacks.
To fortify the reliability and maintainability of the Gateway stack, there is a need to enhance unit test coverage. This initiative aims to create a comprehensive suite of unit tests covering critical components and functionalities.
The current state lacks sufficient unit test coverage, which may lead to challenges in identifying and resolving issues promptly. The goal is to develop a robust suite of unit tests that rigorously validate individual units of code, ensuring the correctness of the implementation.
Currently we do not check whether a node is archival. We can add a check that queries a block at a random height between 1 and N to determine whether a node is archival. N can be customized via the chain configurator.
While the archival check is useful and simple to implement, POKT Network is extremely confusing about which chain IDs should actually be archival. For example, there is:
0021 - Ethereum
0027 - Ethereum Archival
0028 - Ethereum Archival Trace
000B - Polygon Archival
Then there are some chains that are expected to be archival without having an archival suffix attached to them, e.g. all the testnets (according to Grove).
Due to the lack of standardization in chain listings, it would be difficult to apply an archival check to the right chain IDs without prior research.
To solve this, we can let node operators enable the archival check via the chain configuration table, leaving it to their discretion, and have it disabled by default.
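An opt-in archival probe might be sketched as below. The `ChainConfig` fields, `RandomHeight`, and `ArchivalProbe` are hypothetical, and the payload assumes an EVM-style chain that supports `eth_getBlockByNumber`; other chains would need their own request.

```go
package main

import (
	"fmt"
	"math/rand"
)

// ChainConfig is a hypothetical slice of the chain configuration table.
type ChainConfig struct {
	ArchivalCheckEnabled bool   // zero value keeps the check disabled by default
	ArchivalMaxBlock     uint64 // N: upper bound of the random block range
}

// RandomHeight picks a block height uniformly from [1, n]. n must be > 0.
func RandomHeight(n uint64) uint64 {
	return uint64(rand.Int63n(int64(n))) + 1
}

// ArchivalProbe returns a JSON-RPC payload that asks for an old block.
// The second return value is false when the check is disabled for the chain.
// A node that cannot serve the block is presumably not archival.
func ArchivalProbe(cfg ChainConfig) (string, bool) {
	if !cfg.ArchivalCheckEnabled || cfg.ArchivalMaxBlock == 0 {
		return "", false
	}
	height := RandomHeight(cfg.ArchivalMaxBlock)
	payload := fmt.Sprintf(
		`{"jsonrpc":"2.0","id":1,"method":"eth_getBlockByNumber","params":["0x%x",false]}`,
		height,
	)
	return payload, true
}
```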
To fortify the reliability and maintainability of the Gateway stack, we should also have unit tests for the cached client decorator that mock out the cache interface. Since Cache is not yet an interface, we will likely need to create one first.
The current cached client decorator lacks sufficient unit test coverage for scenarios such as
Having detailed metrics about node healthiness per chain can significantly enhance monitoring and troubleshooting capabilities. The following metrics will provide valuable insights:
These metrics should be exported in Prometheus format and available for scraping at the /metrics endpoint. This will enable seamless integration with existing monitoring systems and dashboards, allowing for real-time tracking and alerting on the health and status of nodes across different chains.
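To illustrate what a scrape of such an endpoint returns, the sketch below renders one per-chain gauge in the Prometheus text exposition format using only the standard library. The metric name `gateway_healthy_nodes`, the `chain_id` label, and both functions are hypothetical; a real implementation would more likely register metrics through the official Prometheus Go client than format them by hand.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strings"
)

// RenderMetrics formats a healthy-node count per chain as a Prometheus gauge.
// Chains are sorted so the output is stable between scrapes.
func RenderMetrics(healthyByChain map[string]int) string {
	chains := make([]string, 0, len(healthyByChain))
	for chain := range healthyByChain {
		chains = append(chains, chain)
	}
	sort.Strings(chains)

	var b strings.Builder
	b.WriteString("# HELP gateway_healthy_nodes Healthy nodes per chain (hypothetical metric).\n")
	b.WriteString("# TYPE gateway_healthy_nodes gauge\n")
	for _, chain := range chains {
		fmt.Fprintf(&b, "gateway_healthy_nodes{chain_id=%q} %d\n", chain, healthyByChain[chain])
	}
	return b.String()
}

// MetricsHandler serves the rendered metrics; snapshot supplies current values.
func MetricsHandler(snapshot func() map[string]int) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, RenderMetrics(snapshot()))
	})
}
```

The handler would be mounted with something like `http.Handle("/metrics", MetricsHandler(snapshot))`, where `snapshot` reads the latest health check results.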
This report addresses two separate issues found within the project: an incorrect timeout configuration in the internal/relayer/relayer.go file, and a documentation inconsistency regarding environment variable names.
The function getPocketRequestTimeout may not retrieve the correct timeout value due to an erroneous reference to a chain configuration attribute. Specifically, the attribute used does not match the intended timeout setting for POKT requests, leading to potential misconfigurations.
In the documentation, POKT_RPC_TIMEOUT is referenced for two distinct purposes: to define the response timeout for both a POKT node and an altruist backup. This likely stemmed from a copy-paste error and could lead to confusion during environment setup, as it's unclear which timeout the variable is intended to set.
Code Fix: When getPocketRequestTimeout is executed within the relayer logic, it should accurately fetch the intended timeout value specific to POKT requests, ensuring reliable and predictable request handling.
Documentation Correction: The guide should distinctly name and describe the environment variables for setting the POKT node response timeout and the altruist backup response timeout, eliminating any ambiguity for users setting up their environment.
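The intended behavior of the code fix can be illustrated with a small sketch. The `ChainTimeouts` type and its field names are hypothetical stand-ins for the actual chain configuration attributes, which are not shown in this report, and the function signature here is simplified for the example.

```go
package main

import "time"

// ChainTimeouts is a hypothetical per-chain configuration with one field
// per purpose, so the two timeouts cannot be confused with each other.
type ChainTimeouts struct {
	PocketRequestTimeout   time.Duration // timeout for requests to POKT nodes
	AltruistRequestTimeout time.Duration // timeout for the altruist backup
}

// getPocketRequestTimeout returns the chain-specific POKT timeout when one
// is configured and falls back to the global default otherwise. The bug
// class described above corresponds to reading the wrong field here.
func getPocketRequestTimeout(cfg *ChainTimeouts, globalDefault time.Duration) time.Duration {
	if cfg != nil && cfg.PocketRequestTimeout > 0 {
		return cfg.PocketRequestTimeout
	}
	return globalDefault
}
```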
I have prepared fixes for both issues and will detail them in an associated pull request.
PR Link: #29
Our relay controller should send concurrent requests for a faster response time and choose the first response as the initial response.
The relay client should be kept simple and free of strongly opinionated behavior beyond what eases development. The relay controller, on the other hand, is responsible for handling a relay once a request is received. It is a good place to decide how the request is routed and served via concurrent requests to other nodes.
Sending requests concurrently will lay the foundation for enabling QoS checks as we "prime" the nodes for health.
To enhance transparency and facilitate a comprehensive understanding of the Gateway Stack, there is a need to create and update architectural diagrams. This initiative aims to provide clear visual representations of the Gateway Stack's structure and end-to-end interactions, from receiving a request to serving the relay under QoS measures, along with the underlying high-level components.
The current state of documentation lacks architectural diagrams, hindering users' ability to grasp the overall system architecture easily. This issue focuses on creating visual representations that effectively communicate the relationships between various components within the Gateway Stack.
Currently the gateway server emits rich data about success rates, latency, etc. via Prometheus metrics. This dashboard should leverage all the metrics emitted by the gateway server and allow for service_url filtering if enabled as well.
There is currently no official Grafana dashboard.
[WIP] Grafana Dashboard: https://gist.github.com/nodiesBlade/910afe2ad9dbd5f19948fc7d42d1535a