Comments (9)
While we want to be generally agnostic to back-ends, on @bhaskarkishore's suggestion we would like to use HBase as the data store for this project. Docker images for HBase are publicly available.
from pencroff.
> While we want to be generally agnostic to back-ends, on @bhaskarkishore's suggestion we would like to use HBase as the data store for this project. Docker images for HBase are publicly available.

I'm not familiar enough with HBase to comment properly, but I'd love to know more about the choice if you're willing to share additional info on the "product" and its fit for the task.
I would stress, if you're all of the same mind, that the query API should be focused on high performance and availability.
> While we want to be generally agnostic to back-ends, on @bhaskarkishore's suggestion we would like to use HBase as the data store for this project. Docker images for HBase are publicly available.
>
> I'm not familiar enough with HBase to comment properly, but I'd love to know more about the choice if you're willing to share additional info on the "product" and its fit for the task.

@ivanopagano: @bhaskarkishore suggested HBase because it scales horizontally across a cluster and works nicely with Spark, etc. This leaves open the option of later running Lorre as a Spark transform on the raw RPC data, so that we could use Kubernetes to spin up a massive cluster, do the computations quickly, and immediately spin the cluster down.
> I would stress, if you're all of the same mind, that the query API should be focused on high performance and availability.

@ivanopagano Could you explain what the practical implications of this are?
> > I would stress, if you're all of the same mind, that the query API should be focused on high performance and availability.
>
> @ivanopagano Could you explain what the practical implications of this are?

Essentially the same thinking as behind the choice of HBase: to optimize for Lorre's usage, the API should be able to handle large numbers of parallel incoming requests and respond in a timely fashion.
I suggested this to say that the goal is for this to be a fast, scalable cache in front of Tezos nodes.
@dorians To keep the initial build simple, please use a store you are already comfortable with, e.g. Postgres or SQLite. We can take on HBase as a separate task. We'd like to get the first cut out quickly.
**Summary of POC**
**Problem**
Indexing is too slow because of Tezos node response times.
**Solution**
Instead of querying the Tezos node from Lorre, we should query a new service called Pencroff, which will hold a copy of all the data we need. Pencroff should index all the needed data in a similar fashion to Lorre.
**Problem**
Adding Pencroff to the stack might make extending Conseil harder. Each time we want to add a new source of data from Tezos, we will need to implement it in two places and delay deploying Conseil until Pencroff has fully indexed. Pencroff becomes our source of truth, so our caching mechanism is on the critical path, which might cause problems in case of a lack of consistency.
**Solution**
We can consider introducing a pattern known as a reverse proxy with persistent storage (like Varnish). We could set up Varnish to cache each request (except perhaps those with `head` in the path) and put it between Conseil and Tezos. It would be both faster and more reliable than another indexing tool. As a bonus, we wouldn't need to do anything extra when adding new data sources from Tezos, and we wouldn't need to set up Conseil with Tezos/Pencroff as a dependency, since Varnish would be transparent to Conseil.
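That caching rule can be sketched in a few lines. This is a minimal, hypothetical illustration (the name `cached_fetch` is mine, not from any existing codebase), assuming the upstream call is a simple function of the request path:

```python
import functools

def cached_fetch(fetch):
    """Wrap an RPC call with the Varnish-style rule described above:
    memoize every response by path, except paths containing "head",
    which must always reflect the current chain state."""
    cache = {}

    @functools.wraps(fetch)
    def wrapper(path):
        if "head" in path:        # chain-head queries bypass the cache
            return fetch(path)
        if path not in cache:     # first hit goes upstream, later hits don't
            cache[path] = fetch(path)
        return cache[path]

    return wrapper
```

Wrapping the upstream call this way means repeated queries for immutable data (old blocks, operations) never touch the node, while `head` queries always do.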
**Problem**
Varnish is not persistent by default, and although persistence is available as an optional feature, its use is not recommended (more info: https://varnish-cache.org/docs/trunk/phk/persistent.html). The second problem is scalability: each Varnish node has its own cache, so with more than one instance cache propagation might be cumbersome.
**Solution**
As a workaround, I suggest implementing a similar solution in Pencroff with key-value storage (Redis, HBase, ...).
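A minimal sketch of that workaround, using Python's stdlib `shelve` as a stand-in for a real Redis/HBase backend (the `PencroffCache` name is hypothetical; in production the same interface would front the actual store):

```python
import shelve

class PencroffCache:
    """File-backed key-value cache: unlike Varnish's default in-memory
    cache, entries survive a process restart. shelve is only a local
    stand-in here for a Redis or HBase backend."""

    def __init__(self, path):
        self._db = shelve.open(path)

    def get(self, key, default=None):
        return self._db.get(key, default)

    def put(self, key, value):
        self._db[key] = value
        self._db.sync()  # flush to disk so a crash doesn't lose the entry

    def close(self):
        self._db.close()
```

The point of the sketch is the persistence guarantee: a restarted instance starts warm, which is exactly what the Varnish default cannot give us.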
**Performance and other observations**
When test traffic goes from my laptop to Varnish deployed in our cloud vs. a Tezos node in Europe, I see a speedup of only ~0.65×. When I test against a local node it is about ten times faster. I'm not sure where the Varnish node is deployed, but in any case this brings me to the conclusion that the Varnish node itself is pretty well optimized and the main problem is network delay. So we need to test from within the cloud to see the real performance gain.
Golang has a very simple, performant embedded key-value store called BoltDB (essentially just a memory-mapped B+ tree of buckets of key-value pairs, persisted to a single file) that's used at Spotify and Heroku. It has an indexing layer called Storm that's extremely simple to use.
Read speeds are much, much better than writes (the writes take a mutex on the bucket, iirc), but that should be fine for a cache.
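To illustrate the layout and trade-off being described, here is a toy model in Python: named buckets of key-value pairs, with writes serialized behind a single lock while reads take none. This mimics the access pattern only; it is not the actual BoltDB or Storm API.

```python
import threading

class BucketStore:
    """Toy model of BoltDB's layout: named buckets holding key-value
    pairs. Writes are serialized behind one lock (as the comment above
    describes), while reads take no lock and so stay cheap under load."""

    def __init__(self):
        self._buckets = {}
        self._write_lock = threading.Lock()

    def put(self, bucket, key, value):
        with self._write_lock:  # one writer at a time
            self._buckets.setdefault(bucket, {})[key] = value

    def get(self, bucket, key):
        # lock-free read; fine when the workload is read-heavy, as a cache is
        return self._buckets.get(bucket, {}).get(key)
```

The single-writer/many-readers shape is why such a store suits a cache: Lorre's lookups dominate, and the occasional indexing write can afford the lock.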