Create initial version of project (pencroff, CLOSED)

vishakh commented on June 11, 2024

Create initial version of project

from pencroff.

Comments (9)

vishakh commented on June 11, 2024

While we want to be generally agnostic to back-ends, on @bhaskarkishore's suggestion we would like to use HBase as the data store for this project. Docker images for HBase are publicly available.

ivanopagano commented on June 11, 2024

> While we want to be generally agnostic to back-ends, on @bhaskarkishore's suggestion we would like to use HBase as the data store for this project. Docker images for HBase are publicly available.

I'm not familiar enough with HBase to properly comment, but I'd love to know more about the choice if you're willing to give additional info on the "product" and its fit to the task.

ivanopagano commented on June 11, 2024

I would stress, if you're all of the same party, that the query api should be focused on high performance and availability.

vishakh commented on June 11, 2024

> > While we want to be generally agnostic to back-ends, on @bhaskarkishore's suggestion we would like to use HBase as the data store for this project. Docker images for HBase are publicly available.
>
> I'm not familiar enough with HBase to properly comment, but I'd love to know more about the choice if you're willing to give additional info on the "product" and its fit to the task.

@ivanopagano : @bhaskarkishore suggested HBase because it scales horizontally across a cluster and works nicely with Spark, etc. This leaves the option open in the future of running Lorre as a Spark transform on the raw RPC data such that we can use Kubernetes to spin up a massive cluster, quickly do the computations and immediately spin down the cluster.

vishakh commented on June 11, 2024

> I would stress, if you're all of the same party, that the query api should be focused on high performance and availability.

@ivanopagano Could you explain what the practical implications of this are?

ivanopagano commented on June 11, 2024

> > I would stress, if you're all of the same party, that the query api should be focused on high performance and availability.
>
> @ivanopagano Could you explain what the practical implications of this are?

Essentially the same thinking behind the choice of HBase: to optimize for Lorre usage, the API should be able to handle large numbers of parallel incoming requests and respond in a timely fashion.
I suggested this because the goal would be for this to be a fast and scalable cache in front of Tezos nodes.

vishakh commented on June 11, 2024

@dorians To keep the initial build simple, please use a store you are already comfortable with, e.g. Postgres or SQLite. We can take on HBase as a separate task. We'd like to get the first cut out quickly.

dorians commented on June 11, 2024

Summary of POC:

Problem

Indexing is too slow because of Tezos response times.

Solution

Instead of querying the Tezos node from Lorre, we should query a new service called Pencroff, which will hold a copy of all the data we need. Pencroff should index all the needed data in a similar fashion to Lorre.

Problem

Adding Pencroff to the stack might make extending Conseil harder. Each time we want to add a new source of data from Tezos, we will need to do it in two places and hold off deploying Conseil until Pencroff has fully indexed. Pencroff becomes our source of truth, so our caching mechanism is on the critical path, which might cause problems if consistency is lost.

Solution

We can consider introducing a pattern known as a reverse proxy with persistent storage (like Varnish). We could set up Varnish to cache each request (maybe except those with head in the path) and put it between Conseil and Tezos. It would be both faster and more reliable than another indexing tool. As a bonus, we wouldn't need to do anything extra when adding new data sources from Tezos, and we wouldn't need to set up Conseil with Tezos/Pencroff as a dependency, since Varnish would be transparent to Conseil.

Problem

Varnish is not persistent by default, and although persistence is available as an optional feature, using it is not recommended (more info: https://varnish-cache.org/docs/trunk/phk/persistent.html). The second problem is scalability: each Varnish node has its own cache, so with more than one instance, cache propagation might be cumbersome.

Solution

As a workaround, I suggest implementing a similar solution in Pencroff with key-value storage (Redis, HBase, ...).
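
A minimal sketch of what that could look like, assuming a plain dict stands in for the key-value store and `fetch` is whatever function actually calls the Tezos node. `ReadThroughCache`, `is_cacheable`, and the rule used to spot head-relative paths are hypothetical names and choices for illustration, not anything taken from Pencroff or Conseil:

```python
from typing import Callable, MutableMapping, Optional


def is_cacheable(path: str) -> bool:
    """Return False for paths that address the moving chain tip.

    Assumption: a path segment equal to "head" or starting with "head~"
    refers to the current head, so its response can change over time.
    A segment such as "header" is not treated as head-relative.
    """
    for segment in path.split("/"):
        if segment == "head" or segment.startswith("head~"):
            return False
    return True


class ReadThroughCache:
    """Serve cacheable paths from a key-value store, filling it on a miss."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        store: Optional[MutableMapping[str, str]] = None,
    ) -> None:
        self.fetch = fetch  # function that queries the upstream node
        self.store = {} if store is None else store  # dict as a stand-in store
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> str:
        # Head-relative paths always go upstream and are never stored.
        if not is_cacheable(path):
            return self.fetch(path)
        if path in self.store:
            self.hits += 1
            return self.store[path]
        # Miss: fetch once from upstream, then remember the response.
        self.misses += 1
        body = self.fetch(path)
        self.store[path] = body
        return body
```

In principle, any store that exposes the same mapping operations (membership test, get, set) could replace the dict, so the choice between Redis, HBase, or something else would stay behind that interface.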

Performance and other observations

When test traffic goes from my laptop to Varnish deployed in our cloud, compared with going to the Tezos node in Europe, I see a ~0.65 speedup. When I test against a local node, it is much faster (10 times). I'm not sure where the Varnish node is deployed, but it brings me to the conclusion that the Varnish node itself is pretty well optimized and the main problem is network delay. So we need to test from the cloud to see the real performance gain.
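
To repeat the comparison from the cloud, a small timing helper might be enough. This is only a sketch: `median_latency` and `speedup` are hypothetical helpers, and `call` is assumed to be any zero-argument function that performs one request against the endpoint being measured. Speedup is defined here as baseline time divided by candidate time, which is an assumption about the metric and not necessarily the one behind the numbers above.

```python
import statistics
import time
from typing import Callable


def median_latency(call: Callable[[], object], runs: int = 20) -> float:
    """Run `call` several times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate is than the baseline (assumed definition)."""
    return baseline_seconds / candidate_seconds
```

Using the median rather than the mean keeps a single slow request from dominating the result, which matters when network delay is the main source of variance.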

itamarreif commented on June 11, 2024

Golang has a very simple, performant embedded key-value store called BoltDB (essentially just a B+ tree of buckets of key-value pairs, kept in a single memory-mapped file) that's used at Shopify and Heroku. It has an indexer called storm that's extremely simple to use.

Read speeds are much, much better than writes on it (the writes take a mutex on the bucket iirc) but that should be fine for a cache.
