Comments (9)

vishakh commented on June 11, 2024

While we want to be generally agnostic to back-ends, on @bhaskarkishore 's suggestion we would like to use HBase as the data store for this project. Docker images for HBase are publicly available.

from pencroff.

ivanopagano commented on June 11, 2024

While we want to be generally agnostic to back-ends, on @bhaskarkishore 's suggestion we would like to use HBase as the data store for this project. Docker images for HBase are publicly available.

I'm not familiar enough with HBase to properly comment, but I'd love to know more about the choice if you're willing to give additional info on the "product" and its fit to the task.

ivanopagano commented on June 11, 2024

I would stress, if you're all of the same party, that the query api should be focused on high performance and availability.

vishakh commented on June 11, 2024

While we want to be generally agnostic to back-ends, on @bhaskarkishore 's suggestion we would like to use HBase as the data store for this project. Docker images for HBase are publicly available.

I'm not familiar enough with HBase to properly comment, but I'd love to know more about the choice if you're willing to give additional info on the "product" and its fit to the task.

@ivanopagano : @bhaskarkishore suggested HBase because it scales horizontally across a cluster and works nicely with Spark, etc. This leaves the option open in the future of running Lorre as a Spark transform on the raw RPC data such that we can use Kubernetes to spin up a massive cluster, quickly do the computations and immediately spin down the cluster.

vishakh commented on June 11, 2024

I would stress, if you're all of the same party, that the query api should be focused on high performance and availability.

@ivanopagano Could you explain what the practical implications of this are?

ivanopagano commented on June 11, 2024

I would stress, if you're all of the same party, that the query api should be focused on high performance and availability.

@ivanopagano Could you explain what the practical implications of this are?

Essentially the same thinking behind the choice of HBase: to optimize for Lorre usage, the API should be able to handle large numbers of incoming parallel requests and respond in a timely fashion.
I suggested this because the goal would be for this to be a fast and scalable cache in front of Tezos nodes.

vishakh commented on June 11, 2024

@dorians To keep the initial build simple, please use a store you are already comfortable with, e.g. Postgres or Sqlite. We can take on HBase as a separate task. We'd like to get the first cut out quickly.

dorians commented on June 11, 2024

Summary of POC:

Problem

Indexing is too slow because of Tezos response times.

Solution

Instead of querying a Tezos node from Lorre, we should query a new service called Pencroff, which will hold a copy of all the data we need. Pencroff should index all the needed data in a similar fashion to Lorre.

Problem

Adding Pencroff to the stack might make Conseil harder to extend. Each time we want to add a new source of data from Tezos, we will need to do it in two places and hold off deploying Conseil until Pencroff has fully indexed. Pencroff also becomes our source of truth, so our caching mechanism sits on the critical path, which might cause problems if it becomes inconsistent.

Solution

We can consider introducing a pattern known as a reverse proxy with persistent storage (like Varnish). We could set up Varnish to cache each request (maybe except those with head in the path) and put it between Conseil and Tezos. It would be both faster and more reliable than another indexing tool. As a bonus, we wouldn't need to do anything extra when adding new data sources from Tezos, and we wouldn't need to set up Conseil with Tezos/Pencroff as a dependency, since Varnish would be transparent to Conseil.

Problem

Varnish is not persistent by default, and although persistence is available as an optional feature, using it is not recommended (more info: https://varnish-cache.org/docs/trunk/phk/persistent.html). The second problem is scalability: each Varnish node has its own cache, so with more than one instance, cache propagation might be cumbersome.

Solution

As a workaround, I suggest implementing a similar solution in Pencroff with key-value storage (Redis, HBase, ...).
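To make the workaround concrete, here is a minimal sketch of a persistent read-through cache keyed by request path. It is only an illustration under stated assumptions: SQLite stands in for the key-value store (Redis or HBase would fill the same role), `fetch` is a hypothetical callable representing the real request to a Tezos node, and the rule for skipping head-relative paths is a guess at what "head in the path" should mean in practice.

```python
import sqlite3
from typing import Callable


class ReadThroughCache:
    """Persistent read-through cache keyed by request path (illustrative only)."""

    def __init__(self, db_path: str, fetch: Callable[[str], str]) -> None:
        # `fetch` is a hypothetical stand-in for the real call to a Tezos node.
        self._fetch = fetch
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS responses ("
            "path TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    @staticmethod
    def is_cacheable(path: str) -> bool:
        # Assumption: a segment that is exactly "head" or starts with "head~"
        # refers to a moving target, so its response must not be stored.
        segments = path.strip("/").split("/")
        return not any(s == "head" or s.startswith("head~") for s in segments)

    def get(self, path: str) -> str:
        cacheable = self.is_cacheable(path)
        if cacheable:
            row = self._db.execute(
                "SELECT body FROM responses WHERE path = ?", (path,)
            ).fetchone()
            if row is not None:
                return row[0]  # cache hit: no upstream request
        body = self._fetch(path)  # cache miss or uncacheable path
        if cacheable:
            with self._db:  # commits on success
                self._db.execute(
                    "INSERT OR REPLACE INTO responses (path, body) VALUES (?, ?)",
                    (path, body),
                )
        return body
```

Because the store is a file (or an external service) rather than per-process memory, several instances could share it, which is the property the Varnish setup lacked. Cache invalidation and concurrent writers are deliberately left out of this sketch.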

Performance and other observations

When test traffic goes from my laptop to Varnish deployed in our cloud vs. a Tezos node in Europe, I get a ~0.65 speedup. When I test against a local node, it is way faster (10 times). I'm not sure where the Varnish node is deployed, but either way it brings me to the conclusion that the Varnish node is pretty well optimized itself and the main problem is network delay. So we need to test it from the cloud to see the real performance gain.
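For repeating this comparison from the cloud, a small timing helper along these lines might be enough. It is a sketch, not the method used for the numbers above: `median_latency` and `speedup` are hypothetical names, the `request` callable stands in for whatever HTTP call is being measured, and the ratio is defined here as baseline time divided by candidate time, which may differ from how the figures above were computed.

```python
import statistics
import time
from typing import Callable


def median_latency(request: Callable[[], object], runs: int = 20) -> float:
    """Return the median wall-clock time in seconds of `request` over `runs` calls."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        request()  # e.g. one GET against Varnish or against the node directly
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """Ratio of baseline to candidate time; above 1.0 the candidate is faster."""
    return baseline_seconds / candidate_seconds
```

Using the median rather than the mean keeps a single slow request, which is common over a long network path, from dominating the result.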

itamarreif commented on June 11, 2024

Golang has a very simple, performant embedded key-value store called BoltDB (essentially just a B+ tree of buckets of key-value pairs, kept in a memory-mapped file on disk) that's used at Shopify and Heroku. It has an indexer called storm that's extremely simple to use.

Read speeds are much, much better than write speeds on it (only one read-write transaction can be open at a time, iirc), but that should be fine for a cache.
