
hedera-mirror-node's Introduction


Hedera Mirror Node

The Hedera Mirror Node acts as an archive node and stores historical data for the Hedera network.

Overview

Mirror nodes receive information from the Hedera nodes and can provide value-added services such as APIs, auditing, analytics, visibility services, security threat modeling, data monetization services, etc. Mirror nodes can also run additional business logic to support applications built using the Hedera network.

While mirror nodes receive information from the main nodes, they do not contribute to consensus on the network, and their votes are not counted. Only the votes from the main nodes are counted for determining consensus. The trust of the Hedera network is derived based on the consensus reached by the main nodes. That trust is transferred to the mirror nodes using cryptographic signatures on a chain of files.

Eventually, the mirror nodes will be able to run the same code as the Hedera nodes so that they can see the transactions in real time. To make the initial deployments easier, the mirror node strives to take away the burden of running a full Hedera node through the creation of periodic files that contain processed information (such as account balances or transaction records) and carry the full trust of the main nodes. The mirror node software reduces the processing burden by receiving pre-constructed files from the network, validating them, populating a database, and providing APIs to expose the data. This approach provides the following advantages:

  • Lower compute and bandwidth requirements
  • Allows users to only save the data that they care about (lower storage requirement)
  • Easily searchable database so users can add value quickly
  • Easy-to-consume APIs to make integrations faster

Architecture

Main Nodes

  • When a transaction reaches consensus, Hedera nodes add the transaction and its associated record to a record file.
  • Record files contain the hash of the previous record file, thus creating an unbreakable validation chain.
  • The file is closed on a regular cadence and a signature file is generated by the node for the record file.
  • The record and signature files from each node are then uploaded to Amazon S3 and Google Cloud Storage.

Mirror Nodes

  • This mirror node software downloads signature files from cloud storage.
  • The signature files are verified using the corresponding node's public key from the address book (stored in a 0.0.102 file).
  • The verified signature files are checked to ensure at least 1/3 have the same record file hash.
  • For each valid signature file, the corresponding record file is then downloaded from cloud storage and its hash is verified against the hash contained in the signature file.
  • The downloaded record file contains a previous hash that is validated against the hash of the last processed file to verify the hash chain (a minimal sketch of this check follows this list).
  • Record files can then be processed, and their transactions and records persisted for long-term storage.
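
The hash chain check described above can be illustrated with a short sketch. This is not the mirror node's actual implementation; the digest algorithm (SHA-384) and field handling are assumptions for illustration only.

import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.Arrays;

// Illustrative only: verifies that the previous-hash field embedded in a newly
// downloaded record file matches the hash of the last file we processed.
public class HashChainVerifier {

    // Hash of the last record file that was successfully processed.
    private byte[] lastFileHash;

    public boolean verify(Path recordFile, byte[] previousHashFromFile) throws Exception {
        // The previous-hash field parsed out of the new record file must match
        // the hash we computed for the last processed file.
        if (lastFileHash != null && !Arrays.equals(lastFileHash, previousHashFromFile)) {
            return false; // hash chain is broken
        }
        // Remember this file's hash so the next file can be chained to it.
        // SHA-384 is assumed here purely for illustration.
        MessageDigest digest = MessageDigest.getInstance("SHA-384");
        lastFileHash = digest.digest(Files.readAllBytes(recordFile));
        return true;
    }
}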

Getting Started

Prerequisite Tools

Ensure these tools are installed (note minimum versions) prior to running the mirror node:

Running

For production use, we recommend deploying to Kubernetes using our Helm chart. Hedera managed mirror nodes use Kubernetes and Helm for their deployments, and this process is considered the most production-ready. As an alternative for local development, Docker Compose can be used to run the mirror node. See the installation document for more details on configuring and running with Docker Compose. To get up and running quickly with Docker Compose, execute the following commands in your terminal:

git clone https://github.com/hashgraph/hedera-mirror-node.git
cd hedera-mirror-node
docker compose up

NOTE: This defaults to a bucket setup for demonstration purposes. See the next section for more details.

Data Access

Demo

The free option utilizes a bucket setup for demonstration purposes. This is not a real Hedera network but simply a dummy bucket populated with a day's worth of past testnet data. This is the default option and requires no additional steps. Once you've verified your deployment works against the demo bucket, remember to configure it for a public network, then wipe the database and restart the mirror node.

Public Networks

To access data from real Hedera networks, AWS or GCS requester pays credentials must be used. The charges associated with the downloading of stream files are paid for by the requester and not the bucket owner. See the Run Your Own Mirror Node documentation for more information.

Documentation

Releasing

To perform a new release, run the Automated Release GitHub workflow.

Support

If you have a question on how to use the product, please see our support guide.

Contributing

Contributions are welcome. Please see the contributing guide to see how you can get involved.

Code of Conduct

This project is governed by the Contributor Covenant Code of Conduct. By participating, you are expected to uphold this code of conduct. Please report unacceptable behavior to [email protected].

License

Apache License 2.0

hedera-mirror-node's People

Contributors

0xivanov, apeksharma, ar-conmit, bilyana-gospodinova, calvinchengx, dependabot[bot], edwin-greene, georgi-l95, gregscullard, hedera-github-bot, ivankavaldzhiev, jascks, jnels124, kenthejr, kselveliev, marckriguerathedera, matheus-dallrosa, mgoelswirlds, mike-burrage-hedera, mustafauzunn, nana-ec, natanasow, nikolovyanko, qianswirlds, steven-sheehy, stoyan-lime, stoyanov-st, xin-hedera, yiliev0, zhpetkov


hedera-mirror-node's Issues

/MirrorNodeData/recordStreams/valid does not exist

Actual:
This error prints repeatedly on startup:

mirror-node-record-download-parse_1  | WARNING: An illegal reflective access operation has occurred
mirror-node-record-download-parse_1  | WARNING: Illegal reflective access by com.google.protobuf.UnsafeUtil (file:/MirrorNodeCode/lib/protobuf-java-3.5.1.jar) to field java.nio.Buffer.address
mirror-node-record-download-parse_1  | WARNING: Please consider reporting this to the maintainers of com.google.protobuf.UnsafeUtil
mirror-node-record-download-parse_1  | WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
mirror-node-record-download-parse_1  | WARNING: All illegal access operations will be denied in a future release
mirror-node-record-download-parse_1  | Downloading Done
mirror-node-record-download-parse_1  | 2019-08-09 20:31:07.326 ERROR  241  recordfileparser - Exception file /MirrorNodeData/recordStreams/valid does not exist
mirror-node-record-download-parse_1  | Parsing Done
mirror-node-record-download-parse_1  | Downloading
mirror-node-record-download-parse_1  | Downloading Done

Expected:
All necessary directories are created by the app on startup.

BalanceFileLogger crashes on start if balances don't exist

mirror-node-balance-parser_1         | 2019-08-09 18:41:25.749 ERROR  139  balancelogger - /MirrorNodeData/accountBalances/valid does not exist.
mirror-node-balance-parser_1         | Exception in thread "main" java.lang.NullPointerException
mirror-node-balance-parser_1         | 	at com.hedera.balanceFileLogger.BalanceFileLogger.processAllFilesForHistory(BalanceFileLogger.java:162)
mirror-node-balance-parser_1         | 	at com.hedera.balanceFileLogger.BalanceFileLogger.main(BalanceFileLogger.java:156)

Expected:
It should create the directory on startup if it doesn't exist.

EventStream Parser

Add an EventStream parser to parse the new version of EventStream files (a small reading sketch follows the table below).
Format:

Type | Description | Value
int | EVENT_STREAM_FILE_VERSION | 2
byte | TYPE_PREV_HASH | 1
byte[48] | Previous File Hash
byte | STREAM_EVENT_START_NO_TRANS_WITH_VERSION or STREAM_EVENT_START_WITH_VERSION | 0x5b or 0x5a
int | STREAM_EVENT_VERSION | 2
long | event.creatorId
long | event.creatorSeq
long | event.otherId
long | event.otherSeq
long | event.selfParentGen
long | event.otherParentGen
NullableByteArray | event.selfParentHash | if an int -1 is read, the array is null; otherwise the int value is the length of the byte array to be read
NullableByteArray | event.otherParentHash | same as above
Transaction[] | event.transactions | only present if STREAM_EVENT_START_WITH_VERSION was read earlier
Instant | event.timeCreated
byte[] | event.signature
byte | commEventLast | 0x46
byte[] | event.getHash()
Instant | event.consensusTimestamp
long | event.consensusOrder
byte | STREAM_EVENT_START_NO_TRANS_WITH_VERSION or STREAM_EVENT_START_WITH_VERSION | 0x5b or 0x5a
int | STREAM_EVENT_VERSION | 2
next Event info...
byte | STREAM_EVENT_START_NO_TRANS_WITH_VERSION or STREAM_EVENT_START_WITH_VERSION | 0x5b or 0x5a
int | STREAM_EVENT_VERSION | 2
next Event info...
...
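
A minimal reading sketch based on the table above. The constants and field order come from the table; the class, variable names, and byte order are illustrative assumptions, not the actual parser.

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Illustrative sketch of reading the start of a version 2 EventStream file,
// following the layout in the table above.
public class EventStreamHeaderReader {

    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            int fileVersion = in.readInt();          // EVENT_STREAM_FILE_VERSION, expected 2
            byte prevHashMarker = in.readByte();     // TYPE_PREV_HASH, expected 1
            byte[] previousFileHash = new byte[48];  // hash of the previous file
            in.readFully(previousFileHash);

            byte eventStart = in.readByte();         // 0x5a (with transactions) or 0x5b (without)
            int eventVersion = in.readInt();         // STREAM_EVENT_VERSION, expected 2
            long creatorId = in.readLong();          // event.creatorId
            long creatorSeq = in.readLong();         // event.creatorSeq
            // ... remaining event fields would be read here following the table
            System.out.printf("file version %d, event version %d, creator %d%n",
                    fileVersion, eventVersion, creatorId);
        }
    }
}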

NullPointerException in AccountBalancesDownloader

Actual:
Occurs on startup with a clean db and filesystem:

mirror-node-balance-downloader_1     | WARNING: An illegal reflective access operation has occurred
mirror-node-balance-downloader_1     | WARNING: Illegal reflective access by com.google.protobuf.UnsafeUtil (file:/MirrorNodeCode/lib/protobuf-java-3.5.1.jar) to field java.nio.Buffer.address
mirror-node-balance-downloader_1     | WARNING: Please consider reporting this to the maintainers of com.google.protobuf.UnsafeUtil
mirror-node-balance-downloader_1     | WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
mirror-node-balance-downloader_1     | WARNING: All illegal access operations will be denied in a future release
mirror-node-balance-downloader_1     | Exception in thread "main" java.lang.NullPointerException
mirror-node-balance-downloader_1     | 	at com.hedera.downloader.AccountBalancesDownloader.verifySigsAndDownloadBalanceFiles(AccountBalancesDownloader.java:88)
mirror-node-balance-downloader_1     | 	at com.hedera.downloader.AccountBalancesDownloader.main(AccountBalancesDownloader.java:47)

Expected:
No NPE

Single process mode

Currently the architecture is complex, with 5 Java processes each starting up with only a single thread. We should also support starting up as a single process with each of the standalone processes running in a separate thread. It wouldn't take much work: consolidate to a single main() and have each of the previous mains implement a common interface. A flag would allow you to pick and choose which modules to run on startup (see the sketch below).
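
A hedged sketch of the idea. The module names and the flag are hypothetical placeholders; the real refactoring would wire in the existing downloader and parser classes instead.

import java.util.List;
import java.util.Map;

// Hypothetical sketch: each former main() becomes a Runnable module, and a
// single entry point starts only the modules selected by a flag.
public class MirrorNodeMain {

    public static void main(String[] args) {
        // Module names are placeholders for the existing standalone processes.
        Map<String, Runnable> modules = Map.of(
                "recordDownloader", () -> System.out.println("record downloader running"),
                "recordParser", () -> System.out.println("record parser running"),
                "balanceDownloader", () -> System.out.println("balance downloader running"),
                "balanceParser", () -> System.out.println("balance parser running"));

        // e.g. --modules=recordDownloader,recordParser (hypothetical flag)
        List<String> selected = args.length > 0
                ? List.of(args[0].replace("--modules=", "").split(","))
                : List.copyOf(modules.keySet());

        for (String name : selected) {
            Runnable module = modules.get(name);
            if (module != null) {
                new Thread(module, name).start(); // one thread per module
            }
        }
    }
}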

lastValidBalanceFileName not updated if file already exists locally

The recorded state of the last balance file received can get out of sync with the actual data downloaded. When this occurs, the balance downloader will never download any new files. In practice, I'm not sure how the balances.json file got behind the data; it happened while I was playing around with adding sleep. But it could also occur if balances.json fails to write or gets accidentally wiped out.

To reproduce, point your balances.json at a file from the past (further back than maxDownloadItems). For example, download until up to date, then update balances.json:

{
  "lastValidBalanceFileName": "2019-07-11-10-05.csv"
}

You'll just see it trying the same files over and over in the logs and saying they exist:

2019-08-22 11:28:31,818 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-10.csv
2019-08-22 11:28:31,818 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-15.csv
2019-08-22 11:28:31,818 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-20.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-25.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-30.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-35.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-40.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-45.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-50.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-55.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-11-00.csv

Store full key structure against an entity

Extract the keys, keylist and threshold keys from keys and adminKeys in t_entities.

Option 1: Create a database structure to mirror keys, lists and threshold lists. Consider searches must be able to find an entity for a given key regardless of depth.

Option 2: Simply extract a distinct list of keys to attach to an entity such that an entity can be found by searching for the key

Use address book for list of nodes rather than nodesInfo.json

Use address book for list of nodes rather than nodes info, this keeps the list of nodes up to date and removes a configuration step.

  • Proxy
    Look at list of nodes from address book.

  • Mirror node downloads
    - Not automating this does allow some freedom in where to download from, at the expense of trust and reliability
    - Automating forces mirror node operators to download everything

Modify EventStream file Parser to be able to parse eventStream files with new version

Modify MirrorNode to be able to parse EventStream files with new version number 3.
Related issue: https://github.com/swirlds/platform-swirlds/issues/1369

Differences between version 3 and version 2:

  1. The way the file hash is generated:

file[i] = p[i] || h[i] || c[i]

version2:
h[i] = hash(p[i-1] || h[i-1] || c[i-1])

version3:
h[i] = hash(p[i-1] || h[i-1] || hash(c[i-1]))

  2. In version 2, each time the nodes restart, the prevFileHash contained in the first EventStream file generated after restarting would be new byte[48];
    in version 3, while starting the platform, we read the file hash contained in the last .evts_sig file in the directory and use it as prevFileHash when we start writing the first EventStream file; if no such file exists, we use new byte[48] as prevFileHash.

Record file status for eventStream files; Reimport a file if necessary

  • Add a file status table to the database.
  • Add a fileId column to the t_events table.
  • If a file was partially parsed and parsing stopped, first delete the events in t_events that are contained in this file, then reimport the EventStream file without producing duplicate rows.

Account balances downloader polls S3 repeatedly

Account balance files are only produced every 15 minutes, but the current downloader will repeatedly poll S3 for new files without any sleep in between. We should introduce some amount of sleeping to reduce CPU usage and S3 network costs. There are two scenarios to consider:

  1. Initial startup and catching up: There should be no sleep in between queries when there were any downloads last round.
  2. Up to date: There should be a sleep. It could be anywhere from a simple 1s sleep to a smarter sleep of 15 minutes from the last download (see the sketch below).
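
A minimal sketch of the proposed behavior. The downloader interface, its downloadNewFiles() method, and the 30-second idle interval are placeholders, not the project's actual API.

import java.time.Duration;

// Illustrative polling loop: no sleep while catching up, sleep when up to date.
public class BalanceDownloadLoop {

    private static final Duration IDLE_SLEEP = Duration.ofSeconds(30); // assumed value

    public static void run(BalanceDownloader downloader) throws InterruptedException {
        while (true) {
            int downloaded = downloader.downloadNewFiles(); // hypothetical method
            if (downloaded == 0) {
                // Up to date: back off instead of hammering S3.
                Thread.sleep(IDLE_SLEEP.toMillis());
            }
            // Otherwise loop immediately to keep catching up.
        }
    }

    // Placeholder for the real downloader.
    interface BalanceDownloader {
        int downloadNewFiles();
    }
}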

java.sql.Connection doesn't handle connect closed exception

Just examining the code, the java.sql.Connection is opened once and reused throughout the lifecycle of the app. However, it's possible that the database will close the connection during some fatal errors. The app should be resilient to such closures.

Another issue is that the Connection object is not thread safe. A connection pool like HikariCP should be used to provide separate connections per thread and to automatically handle reopening closed connections.
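
A sketch of the suggested approach using HikariCP; the JDBC URL, credentials, and pool size are placeholders.

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.SQLException;

// Sketch of using HikariCP so each thread borrows its own connection and
// closed/broken connections are replaced automatically by the pool.
public class DatabasePool {

    private static final HikariDataSource DATA_SOURCE;

    static {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/mirror_node"); // placeholder
        config.setUsername("mirror_node");                                 // placeholder
        config.setPassword("changeme");                                    // placeholder
        config.setMaximumPoolSize(10);
        DATA_SOURCE = new HikariDataSource(config);
    }

    public static Connection getConnection() throws SQLException {
        // Each caller (thread) gets a pooled connection and must close it when done.
        return DATA_SOURCE.getConnection();
    }
}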

BalanceFileLogger uses a lot of CPU

Actual:
The account balance parser uses too much CPU even with no files to process. It will presumably also use a lot of CPU when there are files.

$ docker stats
CONTAINER ID        NAME                                         CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
7d58db1a8885        docker_mirror-node-balance-parser_1          20.48%              98.04MiB / 3.855GiB   2.48%               2.95kB / 1.28kB     0B / 0B             19
2ea1372c35b4        docker_mirror-node-record-download-parse_1   2.15%               202.5MiB / 3.855GiB   5.13%               12.6MB / 8.35MB     0B / 0B             21
2270524615a9        docker_mirror-node-postgres_1                0.49%               4.312MiB / 3.855GiB   0.11%               842kB / 779kB       0B / 4.1kB          7
0141de8d8535        docker_mirror-node-rest-api_1                0.00%               52.38MiB / 3.855GiB   1.33%               600kB / 416kB       0B / 16.4kB         24
$ docker logs -f 7d58db1a8885
2019-08-09 21:25:10,364 INFO  [main  ] balancelogger: Balance History processing done
2019-08-09 21:25:10,365 INFO  [main  ] balancelogger: No balance file to parse found
2019-08-09 21:25:10,366 INFO  [main  ] balancelogger: Last Balance processing done
2019-08-09 21:25:10,367 INFO  [main  ] balancelogger: Balance History processing done
2019-08-09 21:25:10,369 INFO  [main  ] balancelogger: No balance file to parse found
2019-08-09 21:25:10,369 INFO  [main  ] balancelogger: Last Balance processing done
2019-08-09 21:25:10,370 INFO  [main  ] balancelogger: Balance History processing done
2019-08-09 21:25:10,372 INFO  [main  ] balancelogger: No balance file to parse found
2019-08-09 21:25:10,372 INFO  [main  ] balancelogger: Last Balance processing done
2019-08-09 21:25:10,374 INFO  [main  ] balancelogger: Balance History processing done
2019-08-09 21:25:10,376 INFO  [main  ] balancelogger: No balance file to parse found
2019-08-09 21:25:10,376 INFO  [main  ] balancelogger: Last Balance processing done
2019-08-09 21:25:10,377 INFO  [main  ] balancelogger: Balance History processing done
2019-08-09 21:25:10,379 INFO  [main  ] balancelogger: No balance file to parse found

Expected:
The process uses less CPU.

The problem is that there's a while (true) { someWork(); } loop without any sleep after processing. Alternatively, it would be better to use Java's file watching API (WatchService) to be notified of file changes asynchronously, as sketched below.
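
A minimal sketch of the WatchService approach; the directory handling and what happens to each new file are illustrative.

import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

// Sketch: block on the WatchService instead of spinning in a busy loop.
public class BalanceFileWatcher {

    public static void watch(Path directory) throws Exception {
        try (WatchService watchService = FileSystems.getDefault().newWatchService()) {
            directory.register(watchService, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watchService.take(); // blocks until something changes
                for (WatchEvent<?> event : key.pollEvents()) {
                    Path created = directory.resolve((Path) event.context());
                    System.out.println("New balance file: " + created); // parse it here
                }
                key.reset();
            }
        }
    }
}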

Crash on invalid account balance csv

Actual:

2019-08-15 15:10:37,094 INFO  [main  ] configloader Loading configuration from ./config/config.json
2019-08-15 15:11:17,133 INFO  [main  ] balancelogger No balance file to parse found
2019-08-15 15:11:17,133 INFO  [main  ] balancelogger Last Balance processing done
2019-08-15 15:11:17,133 INFO  [main  ] balancelogger Balance History processing done
2019-08-15 15:12:17,209 ERROR [main  ] balancelogger File ./MirrorNodeData/accountBalances/valid/foo.csv is not named as expected, should be like 2019-06-28-22-05.csv
2019-08-15 15:12:17,221 INFO  [main  ] balancelogger Last Balance processing done
2019-08-15 15:12:17,222 ERROR [main  ] balancelogger File ./MirrorNodeData/accountBalances/valid/foo.csv is not named as expected, should be like 2019-06-28-22-05.csv
2019-08-15 15:12:17,223 ERROR [main  ] filewatcher Exception : {}
java.lang.StringIndexOutOfBoundsException: begin 0, end 10, length 7
	at java.lang.String.checkBoundsBeginEnd(String.java:3319) ~[?:?]
	at java.lang.String.substring(String.java:1874) ~[?:?]
	at com.hedera.utilities.Utility.moveFileToParsedDir(Utility.java:499) ~[classes/:?]
	at com.hedera.balanceFileLogger.BalanceFileLogger.processAllFilesForHistory(BalanceFileLogger.java:162) ~[classes/:?]
	at com.hedera.balanceFileLogger.BalanceFileLogger.onCreate(BalanceFileLogger.java:149) ~[classes/:?]
	at com.hedera.fileWatcher.FileWatcher.watch(FileWatcher.java:61) [classes/:?]
	at com.hedera.balanceFileLogger.BalanceFileLogger.main(BalanceFileLogger.java:143) [classes/:?]

Expected:
Reject the invalid file and move on (for example by validating the filename up front, as sketched below).
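
A minimal sketch of such a validation, using the filename pattern shown in the log above; the class name is illustrative.

import java.nio.file.Path;
import java.util.regex.Pattern;

// Sketch: validate the balance file name before parsing so a stray file like
// foo.csv is skipped instead of crashing the parser.
public class BalanceFileNameValidator {

    // Expected form from the log above, e.g. 2019-06-28-22-05.csv
    private static final Pattern NAME_PATTERN =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2}-\\d{2}-\\d{2}\\.csv");

    public static boolean isValid(Path file) {
        return NAME_PATTERN.matcher(file.getFileName().toString()).matches();
    }
}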

NullPointerException in EventStreamFileParser

Actual:
On startup with a clean filesystem and db:

mirror-node-event-download-parse_1   | Exception in thread "main" java.lang.NullPointerException
mirror-node-event-download-parse_1   | 	at com.hedera.parser.EventStreamFileParser.loadEventStreamFiles(EventStreamFileParser.java:363)
mirror-node-event-download-parse_1   | 	at com.hedera.parser.EventStreamFileParser.parseNewFiles(EventStreamFileParser.java:459)
mirror-node-event-download-parse_1   | 	at com.hedera.downloader.DownloadAndParseEventFiles.main(DownloadAndParseEventFiles.java:48)

Expected:
No crash

Performance Tests: set up and tooling

Begin evaluating tooling and implementing performance tests, to be run from CI (nightly test?).

  • spin up an environment to test the performance of the mirror node (parser and rest-api), likely from the nightly build.
  • get results for:
    • rest-api performance numbers per transaction
    • insert/processing performance numbers from the parser
      ...

cc @atul-hedera

FileWatcher not notified of files created while down

BalanceFileLogger uses the Java file watching API to be notified of creation and updates of files that the downloader writes. However, if BalanceFileLogger is down while the downloader writes them, there will be no notification. We need to look for new files on startup.

Test balance update with a file containing 5M rows

Generate a CSV file with 5M records (starting at 1000 for the account number) and load into the database (place the file in the valid folder and run balanceParse).

Generate a second CSV file with different balances and load into the database.

Note: Set the config.json parameter "validatebalancesignatures" to false before running the test, else signatures will be required for all nodes on this file.

Balance downloader fails to start

Creating network "docker_default" with the default driver
Creating docker_mirror-node-rest-api_1              ... done
Creating docker_mirror-node-balance-downloader_1    ... error
Creating docker_mirror-node-postgres_1           ... done
Creating docker_mirror-node-102-file-update_1    ... done
Creating docker_mirror-node-balance-parser_1     ... 
Creating docker_mirror-node-record-download-parse_1 ... 

Creating docker_mirror-node-balance-parser_1        ... done
Creating docker_mirror-node-record-download-parse_1 ... done

ERROR: for mirror-node-balance-downloader  Cannot start service mirror-node-balance-downloader: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"/usr/bin/java\": stat /usr/bin/java: no such file or directory": unknown
ERROR: Encountered errors while bringing up the project.

Manually running the container shows java is not on the path:

docker run -it --rm docker_mirror-node-balance-downloader /bin/sh
# java
/bin/sh: 1: java: not found
# /bin/java 
/bin/sh: 2: /bin/java: not found

Consider writing all state to the database

Currently there are several JSON files that contain the state of the system, such as:

  • balance.json
  • records.json
  • events.json
  • loggerStatus.json

All system state should be consolidated into the database or constructed dynamically from existing tables. For example, the lastValidRcd* state could be reconstructed by taking the max timestamp of processed records on startup, or by explicitly storing the last valid processed file in a separate table.

Reasons for using the database:

  • State is stored in one spot instead of spread around
  • Can be migrated between versions when new fields added/removed
  • Transactions provide atomic updates, ensuring no race conditions between multiple threads/processes reading from and writing to the same file
  • Current files hardcode paths with / in config and thus won't work on Windows
  • Better data type validation, foreign keys and cascading delete

Rework database schema

  • Transactions - use consensus_ns as primary key
  • Balances - use entities instead of normalising data into t_account_balances (see #86)
  • Remove unnecessary columns and indices

REST API returning numeric fields as strings

There are several fields returned by the REST API that are returned as JSON strings (quoted) instead of JSON numeric values.

Simple cleanup of this (to return numerics) before the API is released would be helpful.

transaction response fields:

charged_tx_fee
transfers.amount
balances - I'm not sure (still trying to get this working locally)

Fields that are numeric (decimal) where using a string is an ok tradeoff to avoid loss of precision (we don't need to change these, necessarily):

valid_start_timestamp
consensus_timestamp

Expose entity keys in the t_entities table

Entity keys (key for accounts and files and admin_key for contracts) are currently stored in protobuf format.

  1. Add column to store key in plain text
    1.1 Consolidate key and admin_key into a single column
  2. Modify the Java code to extract the ED25519 key from the protobuf at parse time and store it in that column (see the sketch after this list)
  3. Ignore keys that contain keylist or threshold keys
  4. Optionally implement a migration for the entities that already exist in the database.
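
A sketch of step 2, assuming the protobuf-generated Key class exposes the key via a oneof; the exact accessor names and the hex representation are assumptions for illustration.

import com.hederahashgraph.api.proto.java.Key;

// Sketch: extract a plain-text (hex) ED25519 key from the protobuf bytes at
// parse time; key lists and threshold keys are ignored as described above.
public class EntityKeyExtractor {

    public static String extractEd25519Hex(byte[] protobufKeyBytes) throws Exception {
        Key key = Key.parseFrom(protobufKeyBytes);
        if (key.getKeyCase() != Key.KeyCase.ED25519) {
            return null; // ignore key lists, threshold keys, etc.
        }
        byte[] ed25519 = key.getEd25519().toByteArray();
        StringBuilder hex = new StringBuilder(ed25519.length * 2);
        for (byte b : ed25519) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}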

Optimise balance load into database

The purpose is to reduce the number of updates to existing rows.

  • Insert balance data into a new table
  • Drop the t_account_balances table
  • Rename the new table to t_account_balances
  • Rebuild indices / foreign keys

NullPointerException when 102 file doesn't exist

mirror-node-record-download-parse_1  | 2019-08-09 18:41:27.870 ERROR  286  utility - getBytes() failed, Exception: {}
mirror-node-record-download-parse_1  | java.io.FileNotFoundException: ./config/0.0.102 (No such file or directory)
mirror-node-record-download-parse_1  | 	at java.io.FileInputStream.open0(Native Method) ~[?:?]
mirror-node-record-download-parse_1  | 	at java.io.FileInputStream.open(FileInputStream.java:219) ~[?:?]
mirror-node-record-download-parse_1  | 	at java.io.FileInputStream.<init>(FileInputStream.java:157) ~[?:?]
mirror-node-record-download-parse_1  | 	at com.hedera.utilities.Utility.getBytes(Utility.java:283) [mirrorNode.jar:?]
mirror-node-record-download-parse_1  | 	at com.hedera.signatureVerifier.NodeSignatureVerifier.<init>(NodeSignatureVerifier.java:49) [mirrorNode.jar:?]
mirror-node-record-download-parse_1  | 	at com.hedera.downloader.RecordFileDownloader.verifySigsAndDownloadRecordFiles(RecordFileDownloader.java:137) [mirrorNode.jar:?]
mirror-node-record-download-parse_1  | 	at com.hedera.downloader.RecordFileDownloader.downloadNewRecordfiles(RecordFileDownloader.java:37) [mirrorNode.jar:?]
mirror-node-record-download-parse_1  | 	at com.hedera.downloader.DownloadAndParseRecordFiles.main(DownloadAndParseRecordFiles.java:35) [mirrorNode.jar:?]
mirror-node-record-download-parse_1  | Exception in thread "main" java.lang.NullPointerException
mirror-node-record-download-parse_1  | 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209)
mirror-node-record-download-parse_1  | 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214)
mirror-node-record-download-parse_1  | 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
mirror-node-record-download-parse_1  | 	at com.hederahashgraph.api.proto.java.NodeAddressBook.parseFrom(NodeAddressBook.java:237)
mirror-node-record-download-parse_1  | 	at com.hedera.signatureVerifier.NodeSignatureVerifier.<init>(NodeSignatureVerifier.java:49)
mirror-node-record-download-parse_1  | 	at com.hedera.downloader.RecordFileDownloader.verifySigsAndDownloadRecordFiles(RecordFileDownloader.java:137)
mirror-node-record-download-parse_1  | 	at com.hedera.downloader.RecordFileDownloader.downloadNewRecordfiles(RecordFileDownloader.java:37)
mirror-node-record-download-parse_1  | 	at com.hedera.downloader.DownloadAndParseRecordFiles.main(DownloadAndParseRecordFiles.java:35)
mirror-node-balance-downloader_1     | 2019-08-09 18:41:23.922 ERROR  286  utility - getBytes() failed, Exception: {}
mirror-node-balance-downloader_1     | java.io.FileNotFoundException: ./config/0.0.102 (No such file or directory)
mirror-node-balance-downloader_1     | 	at java.io.FileInputStream.open0(Native Method) ~[?:?]
mirror-node-balance-downloader_1     | 	at java.io.FileInputStream.open(FileInputStream.java:219) ~[?:?]
mirror-node-balance-downloader_1     | 	at java.io.FileInputStream.<init>(FileInputStream.java:157) ~[?:?]
mirror-node-balance-downloader_1     | 	at com.hedera.utilities.Utility.getBytes(Utility.java:283) [mirrorNode.jar:?]
mirror-node-balance-downloader_1     | 	at com.hedera.signatureVerifier.NodeSignatureVerifier.<init>(NodeSignatureVerifier.java:49) [mirrorNode.jar:?]
mirror-node-balance-downloader_1     | 	at com.hedera.downloader.AccountBalancesDownloader.verifySigsAndDownloadBalanceFiles(AccountBalancesDownloader.java:74) [mirrorNode.jar:?]
mirror-node-balance-downloader_1     | 	at com.hedera.downloader.AccountBalancesDownloader.main(AccountBalancesDownloader.java:47) [mirrorNode.jar:?]
mirror-node-balance-downloader_1     | Exception in thread "main" java.lang.NullPointerException
mirror-node-balance-downloader_1     | 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209)
mirror-node-balance-downloader_1     | 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214)
mirror-node-balance-downloader_1     | 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
mirror-node-balance-downloader_1     | 	at com.hederahashgraph.api.proto.java.NodeAddressBook.parseFrom(NodeAddressBook.java:237)
mirror-node-balance-downloader_1     | 	at com.hedera.signatureVerifier.NodeSignatureVerifier.<init>(NodeSignatureVerifier.java:49)
mirror-node-balance-downloader_1     | 	at com.hedera.downloader.AccountBalancesDownloader.verifySigsAndDownloadBalanceFiles(AccountBalancesDownloader.java:74)
mirror-node-balance-downloader_1     | 	at com.hedera.downloader.AccountBalancesDownloader.main(AccountBalancesDownloader.java:47)

It would be better to fall back to nodesConfig.json if the 102 file is not available, or possibly to construct it from the files in S3.

Also, there is a race condition: if the 102 updater doesn't run before the other processes that require the file, they will still fail to start.

Add analytics based on Events

Basic Network Health Analytics Based on Events

  1. Count Created Events Per time window
  2. Count Reached Consensus Events Per time window
  3. Count Created Events Per time window Per Node
  4. Count Reached Consensus Events Per time window Per Node
  5. Count of System & User Transactions in Reached Consensus Events Per time window Per Node
  6. Query for the min, median, and max latency (consensusTimestamp - timeCreated) for all events that reached consensus in a given time window, per node.
  7. Get the latency of events that reached consensus in a given time window

Save parsed events into database; Design Event schema

t_events Table

Type | Column Name | Description | NotNull
bigint | id | primary key, auto-increment | NotNull
bigint | consensus_order | order in history (0 first) | NotNull
bigint | creator_node_id | node ID of this event's creator | NotNull
bigint | creator_seq | sequence number for this event by its creator (0 is first) | NotNull
bigint | other_node_id | ID of otherParent's creator
bigint | other_seq | sequence number for the otherParent event (by its creator)
bytea | signature | creator's signature for this event | NotNull
bytea | hash | hash of this event | NotNull
bigint | self_parent_id | the id of the self parent
bigint | other_parent_id | the id of the other parent
bytea | self_parent_hash | hash of the self parent
bytea | other_parent_hash | hash of the other parent
bigint | self_parent_generation | the generation of the self parent
bigint | other_parent_generation | the generation of the other parent
bigint | generation | generation (which is 1 plus max of parents' generations) | NotNull
bigint | created_timestamp_ns | seconds * (10 ^ 9) + nanos of creation time, as claimed by its creator | NotNull
bigint | consensus_timestamp_ns | seconds * (10 ^ 9) + nanos of the community's consensus timestamp for this event | NotNull
bigint | latency_ns | consensus_timestamp_ns - created_timestamp_ns | NotNull
integer | txs_bytes_count | number of bytes in the transactions in this event | NotNull
integer | platform_tx_count | number of platform transactions in this event | NotNull
integer | app_tx_count | number of application transactions in this event | NotNull

Only store deltas in account_balances

In order to reduce data storage requirements, only store the balance of an account in history if it has changed since the last time it was recorded.

Don't process partial files

Due to the nature of downloading to files and processing the files in separate processes, it's possible that processing could occur on a partially downloaded file. Whether we're polling for new files manually or using Java's WatchService, it's possible that the S3 downloader has created the file and written some bytes but has not finished writing all bytes by the time the file is picked up by the parsers (especially for larger files).

Possible solutions:

  1. Download files to a different directory and move them once complete. Move is an atomic operation.
  2. Combine downloader and parser into a single process and use S3's GetObject to download to a ByteArrayOutputStream instead of a file.

The pro of option 1 is that it's the quickest (it is sketched below).

The pro of option 2 is that it's a simpler architecture and avoids the penalty of writing files and coordinating processes. Due to the serial nature of processing these file streams, the only benefit of having separate processes is isolation of failures, but the parser would effectively be down anyway if no new files are being downloaded. The con is that it's more work, and it's unclear how large the files that would need to be loaded into memory can get.
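
A minimal sketch of option 1, using Files.move with ATOMIC_MOVE; the paths and class name are placeholders.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of option 1: download into a temporary directory, then atomically
// move the finished file into the directory the parsers watch.
public class AtomicFilePublisher {

    public static void publish(Path downloadedTempFile, Path validDirectory) throws IOException {
        Path target = validDirectory.resolve(downloadedTempFile.getFileName());
        // ATOMIC_MOVE guarantees the parser never sees a half-written file,
        // provided both paths are on the same filesystem.
        Files.move(downloadedTempFile, target, StandardCopyOption.ATOMIC_MOVE);
    }
}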

AWS / GCP file fetch automatic failover

Problem

Currently, the importer is configured to download from either AWS or GCS. In the event of a connection failure, the mirror node should support failing over to one or more other cloud providers.

Solution

  • Add a hedera.mirror.importer.downloader.sources to CommonDownloaderProperties:
    private List<StreamSourceProperties> sources = new ArrayList<>();

    public static class StreamSourceProperties {
        private StreamCredentials credentials;
        private String projectId; // maps to gcpProjectId
        private StreamSourceType type; // Renamed from CloudProvider
        private URI uri; // maps to endpointOverride
    }

    public static class StreamCredentials {
        private String accessKey;
        private String secretKey;
    }
  • Keep existing bucket properties as is. On startup, create a StreamSourceProperties from the existing properties and push to the front of the sources list.
  • Add a class to abstract away the source of stream files
public interface StreamFileProvider {

    Mono<StreamFileData> get(ConsensusNode node, StreamFilename streamFilename);

    Flux<StreamFileData> list(ConsensusNode node, StreamFilename lastFilename);
}
  • Add a S3StreamFileProvider that uses S3AsyncClient
    • A GcsStreamFileProvider is not necessary since we can reuse S3StreamFileProvider for both; they implement the same API.
  • Add a CompositeStreamFileProvider to handle the failover functionality (a rough sketch follows this list)
    • Iterate over the sources always starting with the first
    • If the selected source throws an exception, and it's not the last functioning source, remove it from the list of valid sources
    • Don't remove if it's a common, transient exception like NoSuchKeyException, etc. Check historical logs to verify which ones.
    • Periodically retry bad sources to see if they've become healthy again and re-add to valid sources
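
A rough sketch of the proposed CompositeStreamFileProvider, built on the StreamFileProvider interface above. Retry of bad sources, removal of failing sources, and transient-exception handling are omitted, and the implementation details are assumptions rather than the eventual design.

import java.util.List;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

// Sketch: try the configured sources in order and fall back to the next one on error.
public class CompositeStreamFileProvider implements StreamFileProvider {

    private final List<StreamFileProvider> sources; // first entry is the preferred source

    public CompositeStreamFileProvider(List<StreamFileProvider> sources) {
        this.sources = sources;
    }

    @Override
    public Mono<StreamFileData> get(ConsensusNode node, StreamFilename streamFilename) {
        return Flux.fromIterable(sources)
                .concatMap(source -> source.get(node, streamFilename)
                        .onErrorResume(e -> Mono.empty())) // skip a failing source
                .next(); // first successful result wins
    }

    @Override
    public Flux<StreamFileData> list(ConsensusNode node, StreamFilename lastFilename) {
        // Chain the sources so that an error from one falls through to the next.
        Flux<StreamFileData> result = Flux.error(new IllegalStateException("No stream source available"));
        for (int i = sources.size() - 1; i >= 0; i--) {
            StreamFileProvider source = sources.get(i);
            Flux<StreamFileData> fallback = result;
            result = source.list(node, lastFilename).onErrorResume(e -> fallback);
        }
        return result;
    }
}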

Send queries that are free to a random node that has worked recently

Currently, queries which don't contain a payment are sent to a default node.

For load balancing and to eliminate a single point of failure, we should each time pick a random node that has worked recently and send the query to it (see the sketch below).

Reference: wallets and micropayment server

As Leemon said: In our wallets and micropayment server etc, we have code to remember when a node fails to respond, and we then mark it to not be randomly called in the near future. So mostly we choose randomly from the nodes that have worked recently, and only occasionally call a node that used to be down. That kind of behavior should eventually be implemented here, too.
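
A hedged sketch of that behavior; the cooldown duration and the string representation of a node are assumptions for illustration.

import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Sketch: remember when a node last failed and choose randomly among nodes
// that have not failed recently.
public class FreeQueryNodeSelector {

    private static final Duration COOLDOWN = Duration.ofMinutes(5); // assumed value

    private final Map<String, Instant> lastFailure = new ConcurrentHashMap<>();
    private final Random random = new Random();

    public String pickNode(List<String> allNodes) {
        Instant cutoff = Instant.now().minus(COOLDOWN);
        List<String> healthy = allNodes.stream()
                .filter(node -> lastFailure.getOrDefault(node, Instant.MIN).isBefore(cutoff))
                .collect(Collectors.toList());
        // Occasionally every node may be marked as failed; fall back to all nodes.
        List<String> candidates = healthy.isEmpty() ? allNodes : healthy;
        return candidates.get(random.nextInt(candidates.size()));
    }

    public void recordFailure(String node) {
        lastFailure.put(node, Instant.now());
    }
}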

Consider separating business logic and database logic

Currently, database queries are performed directly in the business logic layers. Most enterprise applications separate these two so the database operations can be tested in isolation and reused among multiple components. This pattern is called the DAO or Repository pattern. Libraries like Hibernate and Spring Data help automate most of this, so you get the basic operations for free.

We should consider, at minimum, separating these into separate classes and potentially using Spring Data in the future. We should also create domain objects, populated by the repository, to pass between the business layer and the repository layer (see the sketch below).
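
A hypothetical sketch of what a repository and domain object could look like with Spring Data; the entity, its fields, and the query method are illustrative, not the project's actual schema.

import java.util.List;
import org.springframework.data.repository.CrudRepository;

// Hypothetical domain object populated by the repository layer (persistence
// annotations for whichever Spring Data module is chosen are omitted here).
class TransactionEntity {
    Long consensusNs;
    long payerAccountId;
    long chargedTxFee;
}

// Hypothetical repository interface: the business layer calls these methods
// instead of building SQL itself; Spring Data derives the query from the
// method name and provides save/findAll/delete for free via CrudRepository.
interface TransactionRepository extends CrudRepository<TransactionEntity, Long> {
    List<TransactionEntity> findByPayerAccountId(long payerAccountId);
}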

Download from S3 in parallel

Currently we loop through all node IDs and download any new files serially. To optimize this, we should start a new thread per node ID (perhaps up to some maximum number of threads) and download in parallel, as sketched below. This way we can utilize network and CPU resources more efficiently and enable the parser to get the 2/3 of node signatures quicker.
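
A minimal sketch of the per-node parallelism; the thread limit and the per-node download logic are stubbed placeholders.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: one download task per node ID, bounded by a fixed-size thread pool,
// instead of looping over the nodes serially.
public class ParallelDownloader {

    private static final int MAX_THREADS = 8; // assumed limit

    public static void downloadAll(List<String> nodeIds) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(Math.min(MAX_THREADS, nodeIds.size()));
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String nodeId : nodeIds) {
                futures.add(executor.submit(() -> downloadNewFilesForNode(nodeId)));
            }
            for (Future<?> future : futures) {
                future.get(); // propagate any download failure
            }
        } finally {
            executor.shutdown();
        }
    }

    // Placeholder for the existing per-node download logic.
    private static void downloadNewFilesForNode(String nodeId) {
        System.out.println("Downloading new files for node " + nodeId);
    }
}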

Rework address book file update

When a transaction to update the address book is detected by the mirror node, the address book file is updated.
This scenario needs to be tested with an actual address book update transaction.

In the event that an address book update transaction itself cannot be tested, a test with another file to verify contents are updated would be a good first confidence test.

  1. Create a file on the file system containing some data
  2. Create the same file on the network via a transaction
  3. Issue an update transaction for the same file with new contents
  4. Check that the 0.0.102 file contains the new contents.
