A collection of Docker Containers and their orchestration for collecting EVM-compatible blockchains' data.
Main components (Docker containers):
- Producer - scrapes block data from the node and propagates transaction hashes to Kafka
- Consumers - save relevant transaction data to the database
- Kafka - event store for transaction hashes
- PostgreSQL - persistent data store
- Redis - cache for orchestration data between producer and consumers
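The components above map naturally onto Compose services. A hypothetical sketch of that layout (service names, images, and build paths are illustrative, not the repo's actual compose file):

```yaml
services:
  db:
    image: postgres:15        # persistent data store
  redis:
    image: redis:7            # producer/consumer orchestration cache
  kafka:
    image: bitnami/kafka:3    # event store for transaction hashes
  producer:
    build: ./src/data_collection
    depends_on: [kafka, redis]
  consumer:
    build: ./src/data_collection
    depends_on: [kafka, db, redis]
```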
The containers are orchestrated via Docker Compose YAML files. For convenience, a set of bash scripts is provided for easily running the data collection process.
docker compose (v2.14.0+) - to use with abacus-3, install the compose plugin manually.
Compose files should be started with the run scripts found in the scripts/ directory. For this you also need to have an .env file present. If you are cloning this repository, use cp .env.default .env and update the env variables according to your needs. Then start the collection with the development environment for Ethereum:
$ bash scripts/run-dev-eth.sh
# use CTRL+C once to gracefully exit (with automatic docker compose down cleanup)
Check the docs for a more in-depth example with configuration and output.
There are two deployment environments available for the collection process.
- Development = use for development of new features
$ bash scripts/run-dev-eth.sh
- exit with CTRL+C, followed by an automatic cleanup via docker compose down
- config file: src/data_collection/etc/cfg/dev
- Production = intended for long running collection of data (Abacus-3)
$ bash scripts/run-prod-eth.sh
- CTRL+C only closes the log output; the containers continue running. Stopping and removing the containers must be done manually!
- config file: src/data_collection/etc/cfg/prod
There are some minor differences between a development and production environment besides the configuration files. Details can be found in the scripts directory.
Two main configuration sources (files):
- .env = static configuration variables (data directory, connection URLs, credentials, timeout settings, ...)
- src/data_collection/etc/cfg/<environment>/<blockchain>.json = data collection configuration (block range, data collection mode, addresses, events, ...)
For more, check the configuration guide.
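As an illustration only, a <blockchain>.json configuration might look like the following. The field names here are hypothetical; the authoritative schema is described in the configuration guide:

```json
{
  "node_url": "ws://localhost:8546",
  "block_range": { "start": 17000000, "end": 17001000 },
  "mode": "partial",
  "contracts": [
    { "address": "0x0000000000000000000000000000000000000000", "events": ["Transfer"] }
  ]
}
```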
- EVM compatible blockchains supported ✔️
- Multiple data collection modes (partial, full, get_logs, ...) ✔️
- trace_block and trace_replayTransaction data included ✔️
- Parallelization of producers and consumers (multiple modes can run at the same time) ✔️
- Collect data on multiple blockchains at the same time ✔️
- Configurable timeouts for consumers and retries for web3 requests ✔️
- Single SQL database across all chains ✔️
- Add your own Events, contract ABIs and more...
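The retry feature listed above boils down to re-issuing a failed request with backoff. A minimal pure-Python sketch with a simulated flaky call (an illustration, not the repo's actual implementation):

```python
import time

def with_retries(fn, retries=3, backoff=0.01):
    """Call fn, retrying on failure with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * 2 ** attempt)

# Simulated flaky web3 call: fails twice, then succeeds.
calls = {"n": 0}

def flaky_get_block():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("node timed out")
    return {"number": 17_000_000}

block = with_retries(flaky_get_block)
print(block["number"])  # 17000000
print(calls["n"])       # 3 (two failures + one success)
```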
The scripts/ directory contains bash scripts that mostly consist of docker compose commands. These scripts are used for orchestrating the whole application.
To query the collected data from the database you will need a running PostgreSQL service. To start one, use:
$ bash scripts/run-db.sh
To connect to the running database, from another terminal window:
$ docker exec -it <project_name>-db-1 psql <postgresql_dsn>
Then you can easily execute SQL statements in the psql CLI:
db=# \dt+ eth*;
...
(10 rows)
More details on how to connect can be found in the src/db/ directory.
If you'd like to extend the current data collection functionality, such as:
- adding a new web3 Event to store in the db or to process (e.g. OwnershipTransferred)
- adding a new contract ABI (e.g. cETH)
- adding a new data collection mode (e.g. LogFilter)
- supporting more blockchains than ETH and BSC
Please check out the functionality extension guide.
The etc/ directory contains a few python scripts that can be used for various tasks:
- get_top_uniswap_pairs.py = print top n uniswap pairs in a JSON format ready to be plugged into the data collection cfg.json
- query_tool.py = CLI with predefined SQL queries for easily accessing the DB data (e.g. for plotting)
- web3_method_benchmark.py = request response time benchmarking tool
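Response-time benchmarking amounts to timing repeated calls and summarizing the samples. A minimal pure-Python sketch of the idea (not the actual web3_method_benchmark.py; the timed lambda stands in for a real web3 request):

```python
import statistics
import time

def benchmark(fn, n=50):
    """Call fn n times and return (mean, stdev) latency in milliseconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)

# Stand-in for a web3 request (e.g. a block fetch): ~1 ms of work.
mean_ms, stdev_ms = benchmark(lambda: time.sleep(0.001))
print(f"mean={mean_ms:.2f}ms stdev={stdev_ms:.2f}ms")
```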
Most Python code is documented with Google-style docstrings, and handsdown is used as the documentation generator: https://uzh-eth-mp.github.io/app/.
A list of frequently asked questions and their answers can be found here.
Contributions are welcome and appreciated. Please follow the convention and rules described here.