EVM-compatible blockchain data collection


A collection of Docker containers and their orchestration for collecting data from EVM-compatible blockchains.

Overview

App overview

Main components (Docker containers):

  • Producer - scrape block data from the node and propagate transactions to Kafka
  • Consumers - save relevant transaction data to the database
  • Kafka - event store for transaction hashes
  • PostgreSQL - persistent data store
  • Redis - cache for orchestration data between producer and consumers
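The flow between these components can be sketched with an in-memory queue standing in for Kafka. This is a minimal illustration of the producer/consumer split only; the class and field names below are assumptions, not the project's actual code:

```python
import asyncio


async def producer(queue: asyncio.Queue, blocks: list[dict]) -> None:
    """Stand-in for the block-scraping producer: push each transaction
    hash from every block onto the queue (the role Kafka plays here)."""
    for block in blocks:
        for tx_hash in block["transactions"]:
            await queue.put(tx_hash)
    await queue.put(None)  # sentinel: no more data


async def consumer(queue: asyncio.Queue, store: list[str]) -> None:
    """Stand-in for a consumer: pop hashes and 'persist' them
    (the role PostgreSQL plays here)."""
    while True:
        tx_hash = await queue.get()
        if tx_hash is None:
            break
        store.append(tx_hash)


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    store: list[str] = []
    blocks = [
        {"number": 1, "transactions": ["0xaa", "0xbb"]},
        {"number": 2, "transactions": ["0xcc"]},
    ]
    await asyncio.gather(producer(queue, blocks), consumer(queue, store))
    return store


print(asyncio.run(main()))  # ['0xaa', '0xbb', '0xcc']
```

In the real application the queue is durable (Kafka) and Redis coordinates which blocks have already been processed, so producers and consumers can be restarted independently.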

Usage

The containers are orchestrated via Docker Compose YAML files. For convenience, a set of Bash scripts is provided for easily running the data collection process.

Requirements

Quickstart 🚀

Compose files should be started with the run scripts found in the scripts/ directory. For this you also need an .env file present. If you are cloning this repository, use cp .env.default .env and update the environment variables according to your needs. Then, to start the collection with the development environment for Ethereum:

$ bash scripts/run-dev-eth.sh
# use CTRL+C once to gracefully exit (with automatic docker compose down cleanup)

Check the docs for a more in-depth example with configuration and output.

Deployment Environment 🏙️

There are two deployment environments available for the collection process.

  • Development = used for developing new features
  • Production = intended for long-running data collection (Abacus-3)
    • $ bash scripts/run-prod-eth.sh
    • CTRL+C only closes the log output; the containers continue running. Stopping and removing the containers must be done manually!
    • config file: src/data_collection/etc/cfg/prod

There are some minor differences between the development and production environments besides the configuration files. Details can be found in the scripts directory.

Configuration 🏗️

Two main configuration sources (files):

  1. .env = static configuration variables (data directory, connection URLs, credentials, timeout settings, ...)
  2. src/data_collection/etc/cfg/<environment>/<blockchain>.json = data collection configuration (block range, data collection mode, addresses, events, ...)

For more, check the configuration guide.
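For illustration only, a configuration of the second kind might look roughly like the following. The key names below are assumptions inferred from the description above (block range, mode, addresses, events), not the project's actual schema; consult the configuration guide for the real one:

```python
import json

# Hypothetical shape of src/data_collection/etc/cfg/<environment>/<blockchain>.json;
# the real key names may differ.
example_cfg = json.loads("""
{
    "node_url": "http://localhost:8545",
    "data_collection": [
        {
            "mode": "partial",
            "start_block": 15000000,
            "end_block": 15001000,
            "contracts": [
                {
                    "address": "0x...",
                    "events": ["Transfer"]
                }
            ]
        }
    ]
}
""")

print(example_cfg["data_collection"][0]["mode"])  # partial
```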

Features

  • EVM-compatible blockchains supported ☑️
  • Multiple data collection modes (partial, full, get_logs, ...) ☑️
  • trace_block and trace_replayTransaction data included ☑️
  • Parallelization of producers and consumers (multiple modes can run at the same time) ☑️
  • Collect data on multiple blockchains at the same time ☑️
  • Configurable timeouts for consumers and retries for web3 requests ☑️
  • Single SQL database across all chains ☑️
  • Add your own Events, contract ABIs and more...
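The configurable retry behaviour for web3 requests can be sketched as follows. This is a minimal stand-alone example of the retry-with-backoff idea, not the project's actual implementation:

```python
import time


def with_retries(fn, retries: int = 3, backoff: float = 0.1):
    """Retry a web3-style request up to `retries` times with a simple
    linear backoff between attempts."""
    last_exc = None
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:
            last_exc = exc
            time.sleep(backoff * (attempt + 1))
    raise last_exc


# Example: a flaky call that succeeds on the third attempt.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("node unavailable")
    return "block data"


print(with_retries(flaky))  # block data
```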

Scripts 📜

The scripts/ directory contains bash scripts that mostly consist of docker compose commands. These scripts are used for orchestrating the whole application.

Querying Data 📝

To query the collected data from the database, you will need a running PostgreSQL service. To start one, use:

$ bash scripts/run-db.sh

To connect to the running database, from another terminal window:

$ docker exec -it <project_name>-db-1 psql <postgresql_dsn>

Then you can easily execute SQL statements in the psql CLI:

db=# \dt+ eth*;
...
(10 rows)

More details on how to connect can be found in the src/db/ directory.
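The <postgresql_dsn> placeholder above follows the standard libpq connection URI format, postgresql://user:password@host:port/dbname. A small helper that assembles it from .env-style values might look like this (the parameter names are illustrative, not the project's actual variable names):

```python
def build_dsn(user: str, password: str, host: str, port: int, db: str) -> str:
    """Assemble a libpq-style PostgreSQL connection URI."""
    return f"postgresql://{user}:{password}@{host}:{port}/{db}"


print(build_dsn("postgres", "secret", "localhost", 5432, "db"))
# postgresql://postgres:secret@localhost:5432/db
```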

Extensions 🚧

If you'd like to extend the current data collection functionality, such as:

  • adding a new web3 Event to store in the db or to process (e.g. OwnershipTransferred)
  • adding a new contract ABI (e.g. cETH)
  • adding a new data collection mode (e.g. LogFilter)
  • supporting more blockchains than ETH and BSC

please check out the functionality extension guide.
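As a concrete example of the first point, the OwnershipTransferred event from OpenZeppelin's Ownable contract corresponds to the following ABI fragment. Where exactly to register it in this project is described in the extension guide:

```python
import json

# ABI fragment for: event OwnershipTransferred(address indexed previousOwner,
#                                              address indexed newOwner)
ownership_transferred_abi = json.loads("""
{
    "anonymous": false,
    "inputs": [
        {"indexed": true, "internalType": "address", "name": "previousOwner", "type": "address"},
        {"indexed": true, "internalType": "address", "name": "newOwner", "type": "address"}
    ],
    "name": "OwnershipTransferred",
    "type": "event"
}
""")

print(ownership_transferred_abi["name"])  # OwnershipTransferred
```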

Tools 🛠️

The etc/ directory contains a few Python scripts that can be used for various tasks:

  1. get_top_uniswap_pairs.py = print the top n Uniswap pairs in a JSON format ready to be plugged into the data collection cfg.json
  2. query_tool.py = CLI with predefined SQL queries for easily accessing the DB data (e.g. for plotting)
  3. web3_method_benchmark.py = request response-time benchmarking tool
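The core of a response-time benchmark like the third tool can be sketched as follows. This is a simplified stand-in that times an arbitrary callable, not the actual script:

```python
import statistics
import time


def benchmark(fn, n: int = 5) -> dict:
    """Call `fn` n times and report simple latency statistics
    (seconds). In the real tool, `fn` would be a web3 RPC request."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "max": max(samples),
        "mean": statistics.mean(samples),
    }


stats = benchmark(lambda: time.sleep(0.01))
print(sorted(stats))  # ['max', 'mean', 'min']
```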

Documentation 📗

Most Python code is documented with Google-style docstrings, and handsdown is used as the documentation generator: https://uzh-eth-mp.github.io/app/.

FAQ 🙋🏻

A list of frequently asked questions and their answers can be found here.

Contributing 🥷🏻

Contributions are welcome and appreciated. Please follow the convention and rules described here.

