
hedera-mirror-node's Introduction


Hedera Mirror Node

The Hedera Mirror Node acts as an archive node and stores historical data for the Hedera network.

Overview

Mirror nodes receive information from the Hedera nodes and can provide value-added services such as APIs, auditing, analytics, visibility services, security threat modeling, data monetization services, etc. Mirror nodes can also run additional business logic to support applications built using the Hedera network.

While mirror nodes receive information from the main nodes, they do not contribute to consensus on the network, and their votes are not counted. Only the votes from the main nodes are counted for determining consensus. The trust of the Hedera network is derived based on the consensus reached by the main nodes. That trust is transferred to the mirror nodes using cryptographic signatures on a chain of files.

Eventually, the mirror nodes will be able to run the same code as the Hedera nodes so that they can see the transactions in real time. To make the initial deployments easier, the mirror node strives to take away the burden of running a full Hedera node through the creation of periodic files that contain processed information (such as account balances or transaction records) and carry the full trust of the main nodes. The mirror node software reduces the processing burden by receiving pre-constructed files from the network, validating them, populating a database, and providing APIs to expose the data. This approach provides the following advantages:

  • Lower compute and bandwidth requirements
  • Allows users to only save the data that they care about (lower storage requirement)
  • Easily searchable database so users can add value quickly
  • Easy-to-consume APIs to make integrations faster

Architecture

Main Nodes

  • When a transaction reaches consensus, Hedera nodes add the transaction and its associated record to a record file.
  • Record files contain the hash of the previous record file, thus creating an unbreakable validation chain.
  • The file is closed on a regular cadence and a signature file is generated by the node for the record file.
  • The record and signature files from each node are then uploaded to Amazon S3 and Google Cloud Storage.

Mirror Nodes

  • This mirror node software downloads signature files from cloud storage.
  • The signature files are verified using the corresponding node's public key from the address book (stored in a 0.0.102 file).
  • The verified signature files are checked to ensure at least 1/3 have the same record file hash.
  • For each valid signature file, the corresponding record file is then downloaded from cloud storage and its hash is verified against the hash contained in the signature file.
  • The downloaded record file contains a previous hash that is validated against the hash of the last processed file to verify the hash chain (a minimal sketch of this check follows this list).
  • Record files can then be processed, and their transactions and records persisted for long-term storage.
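
The hash chain check described above can be illustrated with a short sketch. This is not the mirror node's actual implementation; the digest algorithm (SHA-384) and field handling are assumptions for illustration only.

import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.Arrays;

// Illustrative only: verifies that the previous-hash field embedded in a newly
// downloaded record file matches the hash of the last file we processed.
public class HashChainVerifier {

    // Hash of the last record file that was successfully processed.
    private byte[] lastFileHash;

    public boolean verify(Path recordFile, byte[] previousHashFromFile) throws Exception {
        // The previous-hash field parsed out of the new record file must match
        // the hash we computed for the last processed file.
        if (lastFileHash != null && !Arrays.equals(lastFileHash, previousHashFromFile)) {
            return false; // hash chain is broken
        }
        // Remember this file's hash so the next file can be chained to it.
        // SHA-384 is assumed here purely for illustration.
        MessageDigest digest = MessageDigest.getInstance("SHA-384");
        lastFileHash = digest.digest(Files.readAllBytes(recordFile));
        return true;
    }
}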

Getting Started

Prerequisite Tools

Ensure these tools are installed (note minimum versions) prior to running the mirror node:

Running

For production use, we recommend deploying to Kubernetes using our Helm chart. Hedera managed mirror nodes use Kubernetes and Helm for their deployments, and this process is considered the most production-ready. As an alternative for local development, Docker Compose can be used to run the mirror node. See the installation document for more details on configuring and running with Docker Compose. To get up and running quickly with Docker Compose, execute the following commands in your terminal:

git clone https://github.com/hashgraph/hedera-mirror-node.git
cd hedera-mirror-node
docker compose up

NOTE: This defaults to a bucket setup for demonstration purposes. See the next section for more details.

Data Access

Demo

The free option utilizes a bucket setup for demonstration purposes. This is not a real Hedera network but simply a dummy bucket populated with a day's worth of past testnet data. This is the default option and requires no additional steps. Once you've verified your deployment works against the demo bucket, remember to configure it for a public network, then wipe the database and restart the mirror node.

Public Networks

To access data from real Hedera networks, AWS or GCS requester pays credentials must be used. The charges associated with the downloading of stream files are paid for by the requester and not the bucket owner. See the Run Your Own Mirror Node documentation for more information.

Documentation

Releasing

To perform a new release, run the Automated Release GitHub workflow.

Support

If you have a question on how to use the product, please see our support guide.

Contributing

Contributions are welcome. Please see the contributing guide to see how you can get involved.

Code of Conduct

This project is governed by the Contributor Covenant Code of Conduct. By participating, you are expected to uphold this code of conduct. Please report unacceptable behavior to [email protected].

License

Apache License 2.0

hedera-mirror-node's People

Contributors

0xivanov, apeksharma, ar-conmit, bilyana-gospodinova, calvinchengx, dependabot[bot], edwin-greene, georgi-l95, gregscullard, hedera-github-bot, ivankavaldzhiev, jascks, jnels124, kenthejr, kselveliev, marckriguerathedera, matheus-dallrosa, mgoelswirlds, mike-burrage-hedera, mustafauzunn, nana-ec, natanasow, nikolovyanko, qianswirlds, steven-sheehy, stoyan-lime, stoyanov-st, xin-hedera, yiliev0, zhpetkov


hedera-mirror-node's Issues

/MirrorNodeData/recordStreams/valid does not exist

Actual:
This error prints repeatedly on startup:

mirror-node-record-download-parse_1  | WARNING: An illegal reflective access operation has occurred
mirror-node-record-download-parse_1  | WARNING: Illegal reflective access by com.google.protobuf.UnsafeUtil (file:/MirrorNodeCode/lib/protobuf-java-3.5.1.jar) to field java.nio.Buffer.address
mirror-node-record-download-parse_1  | WARNING: Please consider reporting this to the maintainers of com.google.protobuf.UnsafeUtil
mirror-node-record-download-parse_1  | WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
mirror-node-record-download-parse_1  | WARNING: All illegal access operations will be denied in a future release
mirror-node-record-download-parse_1  | Downloading Done
mirror-node-record-download-parse_1  | 2019-08-09 20:31:07.326 ERROR  241  recordfileparser - Exception file /MirrorNodeData/recordStreams/valid does not exist
mirror-node-record-download-parse_1  | Parsing Done
mirror-node-record-download-parse_1  | Downloading
mirror-node-record-download-parse_1  | Downloading Done

Expected:
All necessary directories are created by the app on startup.

BalanceFileLogger crashes on start if balances don't exist

mirror-node-balance-parser_1         | 2019-08-09 18:41:25.749 ERROR  139  balancelogger - /MirrorNodeData/accountBalances/valid does not exist.
mirror-node-balance-parser_1         | Exception in thread "main" java.lang.NullPointerException
mirror-node-balance-parser_1         | 	at com.hedera.balanceFileLogger.BalanceFileLogger.processAllFilesForHistory(BalanceFileLogger.java:162)
mirror-node-balance-parser_1         | 	at com.hedera.balanceFileLogger.BalanceFileLogger.main(BalanceFileLogger.java:156)

Expected:
It should create the directory on startup if it doesn't exist.

EventStream Parser

Add an EventStream parser to parse the new version of EventStream files (a small reading sketch follows the table below).
Format:

Type | Description | Value
int | EVENT_STREAM_FILE_VERSION | 2
byte | TYPE_PREV_HASH | 1
byte[48] | Previous File Hash
byte | STREAM_EVENT_START_NO_TRANS_WITH_VERSION or STREAM_EVENT_START_WITH_VERSION | 0x5b or 0x5a
int | STREAM_EVENT_VERSION | 2
long | event.creatorId
long | event.creatorSeq
long | event.otherId
long | event.otherSeq
long | event.selfParentGen
long | event.otherParentGen
NullableByteArray | event.selfParentHash | if an int -1 is read, the array is null; otherwise the int value is the length of the byte array to be read
NullableByteArray | event.otherParentHash | same as above
Transaction[] | event.transactions | only present if STREAM_EVENT_START_WITH_VERSION was read earlier
Instant | event.timeCreated
byte[] | event.signature
byte | commEventLast | 0x46
byte[] | event.getHash()
Instant | event.consensusTimestamp
long | event.consensusOrder
byte | STREAM_EVENT_START_NO_TRANS_WITH_VERSION or STREAM_EVENT_START_WITH_VERSION | 0x5b or 0x5a
int | STREAM_EVENT_VERSION | 2
next Event info...
byte | STREAM_EVENT_START_NO_TRANS_WITH_VERSION or STREAM_EVENT_START_WITH_VERSION | 0x5b or 0x5a
int | STREAM_EVENT_VERSION | 2
next Event info...
...
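
A minimal reading sketch based on the table above. The constants and field order come from the table; the class, variable names, and byte order are illustrative assumptions, not the actual parser.

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Illustrative sketch of reading the start of a version 2 EventStream file,
// following the layout in the table above.
public class EventStreamHeaderReader {

    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            int fileVersion = in.readInt();          // EVENT_STREAM_FILE_VERSION, expected 2
            byte prevHashMarker = in.readByte();     // TYPE_PREV_HASH, expected 1
            byte[] previousFileHash = new byte[48];  // hash of the previous file
            in.readFully(previousFileHash);

            byte eventStart = in.readByte();         // 0x5a (with transactions) or 0x5b (without)
            int eventVersion = in.readInt();         // STREAM_EVENT_VERSION, expected 2
            long creatorId = in.readLong();          // event.creatorId
            long creatorSeq = in.readLong();         // event.creatorSeq
            // ... remaining event fields would be read here following the table
            System.out.printf("file version %d, event version %d, creator %d%n",
                    fileVersion, eventVersion, creatorId);
        }
    }
}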

NullPointerException in AccountBalancesDownloader

Actual:
Occurs on startup with a clean db and filesystem:

mirror-node-balance-downloader_1     | WARNING: An illegal reflective access operation has occurred
mirror-node-balance-downloader_1     | WARNING: Illegal reflective access by com.google.protobuf.UnsafeUtil (file:/MirrorNodeCode/lib/protobuf-java-3.5.1.jar) to field java.nio.Buffer.address
mirror-node-balance-downloader_1     | WARNING: Please consider reporting this to the maintainers of com.google.protobuf.UnsafeUtil
mirror-node-balance-downloader_1     | WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
mirror-node-balance-downloader_1     | WARNING: All illegal access operations will be denied in a future release
mirror-node-balance-downloader_1     | Exception in thread "main" java.lang.NullPointerException
mirror-node-balance-downloader_1     | 	at com.hedera.downloader.AccountBalancesDownloader.verifySigsAndDownloadBalanceFiles(AccountBalancesDownloader.java:88)
mirror-node-balance-downloader_1     | 	at com.hedera.downloader.AccountBalancesDownloader.main(AccountBalancesDownloader.java:47)

Expected:
No NPE

Single process mode

Currently the architecture is complex, with 5 Java processes each starting up with only a single thread. We should also support starting up as a single process with each of the standalone processes running in a separate thread. It wouldn't take much work: consolidate to a single main() and have each of the previous mains implement a common interface. A flag would allow you to pick and choose which modules to run on startup (see the sketch below).
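
A hedged sketch of the idea. The module names and the flag are hypothetical placeholders; the real refactoring would wire in the existing downloader and parser classes instead.

import java.util.List;
import java.util.Map;

// Hypothetical sketch: each former main() becomes a Runnable module, and a
// single entry point starts only the modules selected by a flag.
public class MirrorNodeMain {

    public static void main(String[] args) {
        // Module names are placeholders for the existing standalone processes.
        Map<String, Runnable> modules = Map.of(
                "recordDownloader", () -> System.out.println("record downloader running"),
                "recordParser", () -> System.out.println("record parser running"),
                "balanceDownloader", () -> System.out.println("balance downloader running"),
                "balanceParser", () -> System.out.println("balance parser running"));

        // e.g. --modules=recordDownloader,recordParser (hypothetical flag)
        List<String> selected = args.length > 0
                ? List.of(args[0].replace("--modules=", "").split(","))
                : List.copyOf(modules.keySet());

        for (String name : selected) {
            Runnable module = modules.get(name);
            if (module != null) {
                new Thread(module, name).start(); // one thread per module
            }
        }
    }
}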

lastValidBalanceFileName not updated if file already exists locally

The recorded state of the last balance file received can get out of sync with the actual data downloaded. When this occurs, the balance downloader will never download any new files. In practice, I'm not sure how the balances.json file got behind the data; it happened while I was playing around with adding sleep. But it could also occur if balances.json fails to write or gets accidentally wiped out.

To reproduce, point your balances.json at a file from the past (further back than maxDownloadItems). For example, download until up to date, then update balances.json:

{
  "lastValidBalanceFileName": "2019-07-11-10-05.csv"
}

You'll just see it trying the same files over and over in the logs and saying they exist:

2019-08-22 11:28:31,818 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-10.csv
2019-08-22 11:28:31,818 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-15.csv
2019-08-22 11:28:31,818 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-20.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-25.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-30.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-35.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-40.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-45.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-50.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-10-55.csv
2019-08-22 11:28:31,819 TRACE [main  ] c.h.d.AccountBalancesDownloader File exists: ./MirrorNodeData/accountBalances/valid//2019-07-11-11-00.csv

Store full key structure against an entity

Extract the keys, keylist and threshold keys from keys and adminKeys in t_entities.

Option 1: Create a database structure to mirror keys, lists and threshold lists. Consider searches must be able to find an entity for a given key regardless of depth.

Option 2: Simply extract a distinct list of keys to attach to an entity such that an entity can be found by searching for the key

Use address book for list of nodes rather than nodesInfo.json

Use address book for list of nodes rather than nodes info, this keeps the list of nodes up to date and removes a configuration step.

  • Proxy
    Look at list of nodes from address book.

  • Mirror node downloads
    - Not automating this does allow some freedom in where to download from, at the expense of trust and reliability
    - Automating forces mirror node operators to download everything

Modify EventStream file Parser to be able to parse eventStream files with new version

Modify MirrorNode to be able to parse EventStream files with new version number 3.
Related issue: https://github.com/swirlds/platform-swirlds/issues/1369

Differences between version 3 and version 2:

  1. The way the file hash is generated:

file[i] = p[i] || h[i] || c[i]

version2:
h[i] = hash(p[i-1] || h[i-1] || c[i-1])

version3:
h[i] = hash(p[i-1] || h[i-1] || hash(c[i-1]))

  2. In version 2, each time the nodes restart, the prevFileHash contained in the first EventStream file generated after restarting would be new byte[48];
    in version 3, while starting the platform, we read the file hash contained in the last .evts_sig file in the directory and use it as prevFileHash when we start writing the first EventStream file; if no such file exists, we use new byte[48] as prevFileHash.

Record file status for eventStream files; Reimport a file if necessary

  • Add a file status table to the database.
  • Add a fileId column to the t_events table.
  • If a file was partially parsed and parsing stopped, first delete the events in t_events that are contained in this file, then reimport the EventStream file without producing duplicate rows.

Account balances downloader polls S3 repeatedly

Account balance files are only produced every 15 minutes, but the current downloader will repeatedly poll S3 for new files without any sleep in between. We should introduce some amount of sleeping to reduce CPU usage and S3 network costs. There are two scenarios to consider:

  1. Initial startup and catching up: There should be no sleep in between queries when there were any downloads last round.
  2. Up to date: There should be a sleep. It could be anywhere from a simple 1s sleep to a smarter sleep of 15 minutes from the last download (see the sketch below).
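
A minimal sketch of the proposed behavior. The downloader interface, its downloadNewFiles() method, and the 30-second idle interval are placeholders, not the project's actual API.

import java.time.Duration;

// Illustrative polling loop: no sleep while catching up, sleep when up to date.
public class BalanceDownloadLoop {

    private static final Duration IDLE_SLEEP = Duration.ofSeconds(30); // assumed value

    public static void run(BalanceDownloader downloader) throws InterruptedException {
        while (true) {
            int downloaded = downloader.downloadNewFiles(); // hypothetical method
            if (downloaded == 0) {
                // Up to date: back off instead of hammering S3.
                Thread.sleep(IDLE_SLEEP.toMillis());
            }
            // Otherwise loop immediately to keep catching up.
        }
    }

    // Placeholder for the real downloader.
    interface BalanceDownloader {
        int downloadNewFiles();
    }
}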

java.sql.Connection doesn't handle connect closed exception

Just examining the code, the java.sql.Connection is opened once and reused throughout the lifecycle of the app. However, it's possible that the database will close the connection during some fatal errors. The app should be resilient to such closures.

Another issue is that the Connection object is not thread safe. A connection pool like HikariCP should be used to provide separate connections per thread and to automatically handle reopening closed connections.
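
A sketch of the suggested approach using HikariCP; the JDBC URL, credentials, and pool size are placeholders.

import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.SQLException;

// Sketch of using HikariCP so each thread borrows its own connection and
// closed/broken connections are replaced automatically by the pool.
public class DatabasePool {

    private static final HikariDataSource DATA_SOURCE;

    static {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://localhost:5432/mirror_node"); // placeholder
        config.setUsername("mirror_node");                                 // placeholder
        config.setPassword("changeme");                                    // placeholder
        config.setMaximumPoolSize(10);
        DATA_SOURCE = new HikariDataSource(config);
    }

    public static Connection getConnection() throws SQLException {
        // Each caller (thread) gets a pooled connection and must close it when done.
        return DATA_SOURCE.getConnection();
    }
}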

BalanceFileLogger uses a lot of CPU

Actual:
The account balance parser uses too much CPU even with no files to process. It will presumably also use a lot of CPU when there are files.

$ docker stats
CONTAINER ID        NAME                                         CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
7d58db1a8885        docker_mirror-node-balance-parser_1          20.48%              98.04MiB / 3.855GiB   2.48%               2.95kB / 1.28kB     0B / 0B             19
2ea1372c35b4        docker_mirror-node-record-download-parse_1   2.15%               202.5MiB / 3.855GiB   5.13%               12.6MB / 8.35MB     0B / 0B             21
2270524615a9        docker_mirror-node-postgres_1                0.49%               4.312MiB / 3.855GiB   0.11%               842kB / 779kB       0B / 4.1kB          7
0141de8d8535        docker_mirror-node-rest-api_1                0.00%               52.38MiB / 3.855GiB   1.33%               600kB / 416kB       0B / 16.4kB         24
$ docker logs -f 7d58db1a8885
2019-08-09 21:25:10,364 INFO  [main  ] balancelogger: Balance History processing done
2019-08-09 21:25:10,365 INFO  [main  ] balancelogger: No balance file to parse found
2019-08-09 21:25:10,366 INFO  [main  ] balancelogger: Last Balance processing done
2019-08-09 21:25:10,367 INFO  [main  ] balancelogger: Balance History processing done
2019-08-09 21:25:10,369 INFO  [main  ] balancelogger: No balance file to parse found
2019-08-09 21:25:10,369 INFO  [main  ] balancelogger: Last Balance processing done
2019-08-09 21:25:10,370 INFO  [main  ] balancelogger: Balance History processing done
2019-08-09 21:25:10,372 INFO  [main  ] balancelogger: No balance file to parse found
2019-08-09 21:25:10,372 INFO  [main  ] balancelogger: Last Balance processing done
2019-08-09 21:25:10,374 INFO  [main  ] balancelogger: Balance History processing done
2019-08-09 21:25:10,376 INFO  [main  ] balancelogger: No balance file to parse found
2019-08-09 21:25:10,376 INFO  [main  ] balancelogger: Last Balance processing done
2019-08-09 21:25:10,377 INFO  [main  ] balancelogger: Balance History processing done
2019-08-09 21:25:10,379 INFO  [main  ] balancelogger: No balance file to parse found

Expected:
The process uses less CPU.

The problem is that there's a while (true) { someWork(); } loop without any sleep after processing. Alternatively, it would be better to use Java's file watching API (WatchService) to be notified of file changes asynchronously, as sketched below.
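
A minimal sketch of the WatchService approach; the directory handling and what happens to each new file are illustrative.

import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

// Sketch: block on the WatchService instead of spinning in a busy loop.
public class BalanceFileWatcher {

    public static void watch(Path directory) throws Exception {
        try (WatchService watchService = FileSystems.getDefault().newWatchService()) {
            directory.register(watchService, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = watchService.take(); // blocks until something changes
                for (WatchEvent<?> event : key.pollEvents()) {
                    Path created = directory.resolve((Path) event.context());
                    System.out.println("New balance file: " + created); // parse it here
                }
                key.reset();
            }
        }
    }
}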

Crash on invalid account balance csv

Actual:

2019-08-15 15:10:37,094 INFO  [main  ] configloader Loading configuration from ./config/config.json
2019-08-15 15:11:17,133 INFO  [main  ] balancelogger No balance file to parse found
2019-08-15 15:11:17,133 INFO  [main  ] balancelogger Last Balance processing done
2019-08-15 15:11:17,133 INFO  [main  ] balancelogger Balance History processing done
2019-08-15 15:12:17,209 ERROR [main  ] balancelogger File ./MirrorNodeData/accountBalances/valid/foo.csv is not named as expected, should be like 2019-06-28-22-05.csv
2019-08-15 15:12:17,221 INFO  [main  ] balancelogger Last Balance processing done
2019-08-15 15:12:17,222 ERROR [main  ] balancelogger File ./MirrorNodeData/accountBalances/valid/foo.csv is not named as expected, should be like 2019-06-28-22-05.csv
2019-08-15 15:12:17,223 ERROR [main  ] filewatcher Exception : {}
java.lang.StringIndexOutOfBoundsException: begin 0, end 10, length 7
	at java.lang.String.checkBoundsBeginEnd(String.java:3319) ~[?:?]
	at java.lang.String.substring(String.java:1874) ~[?:?]
	at com.hedera.utilities.Utility.moveFileToParsedDir(Utility.java:499) ~[classes/:?]
	at com.hedera.balanceFileLogger.BalanceFileLogger.processAllFilesForHistory(BalanceFileLogger.java:162) ~[classes/:?]
	at com.hedera.balanceFileLogger.BalanceFileLogger.onCreate(BalanceFileLogger.java:149) ~[classes/:?]
	at com.hedera.fileWatcher.FileWatcher.watch(FileWatcher.java:61) [classes/:?]
	at com.hedera.balanceFileLogger.BalanceFileLogger.main(BalanceFileLogger.java:143) [classes/:?]

Expected:
Reject the invalid file and move on (for example by validating the filename up front, as sketched below).
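
A minimal sketch of such a validation, using the filename pattern shown in the log above; the class name is illustrative.

import java.nio.file.Path;
import java.util.regex.Pattern;

// Sketch: validate the balance file name before parsing so a stray file like
// foo.csv is skipped instead of crashing the parser.
public class BalanceFileNameValidator {

    // Expected form from the log above, e.g. 2019-06-28-22-05.csv
    private static final Pattern NAME_PATTERN =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2}-\\d{2}-\\d{2}\\.csv");

    public static boolean isValid(Path file) {
        return NAME_PATTERN.matcher(file.getFileName().toString()).matches();
    }
}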

NullPointerException in EventStreamFileParser

Actual:
On startup with a clean filesystem and db:

mirror-node-event-download-parse_1   | Exception in thread "main" java.lang.NullPointerException
mirror-node-event-download-parse_1   | 	at com.hedera.parser.EventStreamFileParser.loadEventStreamFiles(EventStreamFileParser.java:363)
mirror-node-event-download-parse_1   | 	at com.hedera.parser.EventStreamFileParser.parseNewFiles(EventStreamFileParser.java:459)
mirror-node-event-download-parse_1   | 	at com.hedera.downloader.DownloadAndParseEventFiles.main(DownloadAndParseEventFiles.java:48)

Expected:
No crash

Performance Tests: set up and tooling

Begin evaluating tooling and implementing performance tests, to be run from CI (nightly test?).

  • spin up an environment to test the performance of the mirror node (parser and rest-api), likely from the nightly build.
  • get results for:
    • rest-api performance numbers per transaction
    • insert/processing performance numbers from the parser
      ...

cc @atul-hedera

FileWatcher not notified of files created while down

BalanceFileLogger uses the Java file watching API to be notified of creation and updates of files that the downloader writes. However, if BalanceFileLogger is down while the downloader writes them, there will be no notification. We need to look for new files on startup.

Test balance update with a file containing 5M rows

Generate a CSV file with 5M records (starting at 1000 for the account number) and load into the database (place the file in the valid folder and run balanceParse).

Generate a second CSV file with different balances and load into the database.

Note: Set the config.json parameter "validatebalancesignatures" to false before running the test, else signatures will be required for all nodes on this file.

Balance downloader fails to start

Creating network "docker_default" with the default driver
Creating docker_mirror-node-rest-api_1              ... done
Creating docker_mirror-node-balance-downloader_1    ... error
Creating docker_mirror-node-postgres_1           ... done
Creating docker_mirror-node-102-file-update_1    ... done
Creating docker_mirror-node-balance-parser_1     ... 
Creating docker_mirror-node-record-download-parse_1 ... 

Creating docker_mirror-node-balance-parser_1        ... done
Creating docker_mirror-node-record-download-parse_1 ... done

ERROR: for mirror-node-balance-downloader  Cannot start service mirror-node-balance-downloader: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"/usr/bin/java\": stat /usr/bin/java: no such file or directory": unknown
ERROR: Encountered errors while bringing up the project.

Manually running the container shows java is not on the path:

docker run -it --rm docker_mirror-node-balance-downloader /bin/sh
# java
/bin/sh: 1: java: not found
# /bin/java 
/bin/sh: 2: /bin/java: not found

Consider writing all state to the database

Currently there are several JSON files that contain the state of the system, such as:

  • balance.json
  • records.json
  • events.json
  • loggerStatus.json

All system state should be consolidated into the database or constructed dynamically from existing tables. For example, the lastValidRcd* state could be reconstructed by taking the max timestamp of processed records on startup, or by explicitly storing the last valid processed file in a separate table.

Reasons for using the database:

  • State is stored in one spot instead of spread around
  • Can be migrated between versions when new fields added/removed
  • Transactions provide atomic updates, ensuring no race conditions between multiple threads/processes reading from and writing to the same file
  • Current files hardcode paths with / in config and thus won't work on Windows
  • Better data type validation, foreign keys and cascading delete

Rework database schema

  • Transactions - use consensus_ns as primary key
  • Balances - use entities instead of normalising data into t_account_balances (see #86)
  • Remove unnecessary columns and indices

REST API returning numeric fields as strings

There are several fields returned by the REST API that are returned as JSON strings (quoted) instead of JSON numeric values.

Simple cleanup of this (to return numerics) before the API is released would be helpful.

transaction response fields:

charged_tx_fee
transfers.amount
balances - I'm not sure (still trying to get this working locally)

Fields that are numeric (decimal) where using a string is an ok tradeoff to avoid loss of precision (we don't need to change these, necessarily):

valid_start_timestamp
consensus_timestamp

Expose entity keys in the t_entities table

Entity keys (key for accounts and files and admin_key for contracts) are currently stored in protobuf format.

  1. Add column to store key in plain text
    1.1 Consolidate key and admin_key into a single column
  2. Modify the Java code to extract the ED25519 key from the protobuf at parse time and store it in that column (see the sketch after this list)
  3. Ignore keys that contain keylist or threshold keys
  4. Optionally implement a migration for the entities that already exist in the database.
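
A sketch of step 2, assuming the protobuf-generated Key class exposes the key via a oneof; the exact accessor names and the hex representation are assumptions for illustration.

import com.hederahashgraph.api.proto.java.Key;

// Sketch: extract a plain-text (hex) ED25519 key from the protobuf bytes at
// parse time; key lists and threshold keys are ignored as described above.
public class EntityKeyExtractor {

    public static String extractEd25519Hex(byte[] protobufKeyBytes) throws Exception {
        Key key = Key.parseFrom(protobufKeyBytes);
        if (key.getKeyCase() != Key.KeyCase.ED25519) {
            return null; // ignore key lists, threshold keys, etc.
        }
        byte[] ed25519 = key.getEd25519().toByteArray();
        StringBuilder hex = new StringBuilder(ed25519.length * 2);
        for (byte b : ed25519) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}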

Optimise balance load into database

The purpose is to reduce the number of updates to existing rows.

  • Insert balance data into a new table
  • Drop the t_account_balances table
  • Rename the new table to t_account_balances
  • Rebuild indices / foreign keys

NullPointerException when 102 file doesn't exist

mirror-node-record-download-parse_1  | 2019-08-09 18:41:27.870 ERROR  286  utility - getBytes() failed, Exception: {}
mirror-node-record-download-parse_1  | java.io.FileNotFoundException: ./config/0.0.102 (No such file or directory)
mirror-node-record-download-parse_1  | 	at java.io.FileInputStream.open0(Native Method) ~[?:?]
mirror-node-record-download-parse_1  | 	at java.io.FileInputStream.open(FileInputStream.java:219) ~[?:?]
mirror-node-record-download-parse_1  | 	at java.io.FileInputStream.<init>(FileInputStream.java:157) ~[?:?]
mirror-node-record-download-parse_1  | 	at com.hedera.utilities.Utility.getBytes(Utility.java:283) [mirrorNode.jar:?]
mirror-node-record-download-parse_1  | 	at com.hedera.signatureVerifier.NodeSignatureVerifier.<init>(NodeSignatureVerifier.java:49) [mirrorNode.jar:?]
mirror-node-record-download-parse_1  | 	at com.hedera.downloader.RecordFileDownloader.verifySigsAndDownloadRecordFiles(RecordFileDownloader.java:137) [mirrorNode.jar:?]
mirror-node-record-download-parse_1  | 	at com.hedera.downloader.RecordFileDownloader.downloadNewRecordfiles(RecordFileDownloader.java:37) [mirrorNode.jar:?]
mirror-node-record-download-parse_1  | 	at com.hedera.downloader.DownloadAndParseRecordFiles.main(DownloadAndParseRecordFiles.java:35) [mirrorNode.jar:?]
mirror-node-record-download-parse_1  | Exception in thread "main" java.lang.NullPointerException
mirror-node-record-download-parse_1  | 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209)
mirror-node-record-download-parse_1  | 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214)
mirror-node-record-download-parse_1  | 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
mirror-node-record-download-parse_1  | 	at com.hederahashgraph.api.proto.java.NodeAddressBook.parseFrom(NodeAddressBook.java:237)
mirror-node-record-download-parse_1  | 	at com.hedera.signatureVerifier.NodeSignatureVerifier.<init>(NodeSignatureVerifier.java:49)
mirror-node-record-download-parse_1  | 	at com.hedera.downloader.RecordFileDownloader.verifySigsAndDownloadRecordFiles(RecordFileDownloader.java:137)
mirror-node-record-download-parse_1  | 	at com.hedera.downloader.RecordFileDownloader.downloadNewRecordfiles(RecordFileDownloader.java:37)
mirror-node-record-download-parse_1  | 	at com.hedera.downloader.DownloadAndParseRecordFiles.main(DownloadAndParseRecordFiles.java:35)
mirror-node-balance-downloader_1     | 2019-08-09 18:41:23.922 ERROR  286  utility - getBytes() failed, Exception: {}
mirror-node-balance-downloader_1     | java.io.FileNotFoundException: ./config/0.0.102 (No such file or directory)
mirror-node-balance-downloader_1     | 	at java.io.FileInputStream.open0(Native Method) ~[?:?]
mirror-node-balance-downloader_1     | 	at java.io.FileInputStream.open(FileInputStream.java:219) ~[?:?]
mirror-node-balance-downloader_1     | 	at java.io.FileInputStream.<init>(FileInputStream.java:157) ~[?:?]
mirror-node-balance-downloader_1     | 	at com.hedera.utilities.Utility.getBytes(Utility.java:283) [mirrorNode.jar:?]
mirror-node-balance-downloader_1     | 	at com.hedera.signatureVerifier.NodeSignatureVerifier.<init>(NodeSignatureVerifier.java:49) [mirrorNode.jar:?]
mirror-node-balance-downloader_1     | 	at com.hedera.downloader.AccountBalancesDownloader.verifySigsAndDownloadBalanceFiles(AccountBalancesDownloader.java:74) [mirrorNode.jar:?]
mirror-node-balance-downloader_1     | 	at com.hedera.downloader.AccountBalancesDownloader.main(AccountBalancesDownloader.java:47) [mirrorNode.jar:?]
mirror-node-balance-downloader_1     | Exception in thread "main" java.lang.NullPointerException
mirror-node-balance-downloader_1     | 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:209)
mirror-node-balance-downloader_1     | 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:214)
mirror-node-balance-downloader_1     | 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
mirror-node-balance-downloader_1     | 	at com.hederahashgraph.api.proto.java.NodeAddressBook.parseFrom(NodeAddressBook.java:237)
mirror-node-balance-downloader_1     | 	at com.hedera.signatureVerifier.NodeSignatureVerifier.<init>(NodeSignatureVerifier.java:49)
mirror-node-balance-downloader_1     | 	at com.hedera.downloader.AccountBalancesDownloader.verifySigsAndDownloadBalanceFiles(AccountBalancesDownloader.java:74)
mirror-node-balance-downloader_1     | 	at com.hedera.downloader.AccountBalancesDownloader.main(AccountBalancesDownloader.java:47)

It would be better to fall back to nodesConfig.json if the 102 file is not available, or possibly to construct it from the files in S3.

Also, there is a race condition: if the 102 updater doesn't run before the other processes that require the file, they will still fail to start.

Add analytics based on Events

Basic Network Health Analytics Based on Events

  1. Count Created Events Per time window
  2. Count Reached Consensus Events Per time window
  3. Count Created Events Per time window Per Node
  4. Count Reached Consensus Events Per time window Per Node
  5. Count of System & User Transactions in Reached Consensus Events Per time window Per Node
  6. Query for the min, median, and max latency (consensusTimestamp - timeCreated) for all events that reached consensus in a given time window, per node.
  7. Get the latency of events that reached consensus in a given time window

Save parsed events into database; Design Event schema

t_events Table

Type | Column Name | Description | NotNull
bigint | id | primary key, auto-increment | NotNull
bigint | consensus_order | order in history (0 first) | NotNull
bigint | creator_node_id | node ID of this event's creator | NotNull
bigint | creator_seq | sequence number for this event by its creator (0 is first) | NotNull
bigint | other_node_id | ID of otherParent's creator
bigint | other_seq | sequence number for the otherParent event (by its creator)
bytea | signature | creator's signature for this event | NotNull
bytea | hash | hash of this event | NotNull
bigint | self_parent_id | the id of the self parent
bigint | other_parent_id | the id of the other parent
bytea | self_parent_hash | hash of the self parent
bytea | other_parent_hash | hash of the other parent
bigint | self_parent_generation | the generation of the self parent
bigint | other_parent_generation | the generation of the other parent
bigint | generation | generation (which is 1 plus max of parents' generations) | NotNull
bigint | created_timestamp_ns | seconds * (10 ^ 9) + nanos of creation time, as claimed by its creator | NotNull
bigint | consensus_timestamp_ns | seconds * (10 ^ 9) + nanos of the community's consensus timestamp for this event | NotNull
bigint | latency_ns | consensus_timestamp_ns - created_timestamp_ns | NotNull
integer | txs_bytes_count | number of bytes in the transactions in this event | NotNull
integer | platform_tx_count | number of platform transactions in this event | NotNull
integer | app_tx_count | number of application transactions in this event | NotNull

Only store deltas in account_balances

In order to reduce data storage requirements, only store the balance of an account in history if it has changed since the last time it was recorded.

Don't process partial files

Due to the nature of downloading to files and processing the files in separate processes, it's possible that processing could occur on a partially downloaded file. Whether we're polling for new files manually or using Java's WatchService, it's possible that the S3 downloader has created the file and written some bytes but has not finished writing all bytes by the time the file is picked up by the parsers (especially for larger files).

Possible solutions:

  1. Download files to a different directory and move them once complete. Move is an atomic operation.
  2. Combine downloader and parser into a single process and use S3's GetObject to download to a ByteArrayOutputStream instead of a file.

The pro of option 1 is that it's the quickest (it is sketched below).

The pro of option 2 is that it's a simpler architecture and avoids the penalty of writing files and coordinating processes. Due to the serial nature of processing these file streams, the only benefit of having separate processes is isolation of failures, but the parser would effectively be down anyway if no new files are being downloaded. The con is that it's more work, and it's unclear how large the files that would need to be loaded into memory can get.
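
A minimal sketch of option 1, using Files.move with ATOMIC_MOVE; the paths and class name are placeholders.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of option 1: download into a temporary directory, then atomically
// move the finished file into the directory the parsers watch.
public class AtomicFilePublisher {

    public static void publish(Path downloadedTempFile, Path validDirectory) throws IOException {
        Path target = validDirectory.resolve(downloadedTempFile.getFileName());
        // ATOMIC_MOVE guarantees the parser never sees a half-written file,
        // provided both paths are on the same filesystem.
        Files.move(downloadedTempFile, target, StandardCopyOption.ATOMIC_MOVE);
    }
}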

AWS / GCP file fetch automatic failover

Problem

Currently, the importer is configured to download from either AWS or GCS. In the event of a connection failure, the mirror node should support failing over to one or more other cloud providers.

Solution

  • Add a hedera.mirror.importer.downloader.sources to CommonDownloaderProperties:
    private List<StreamSourceProperties> sources = new ArrayList<>();

    public static class StreamSourceProperties {
        private StreamCredentials credentials;
        private String projectId; // maps to gcpProjectId
        private StreamSourceType type; // Renamed from CloudProvider
        private URI uri; // maps to endpointOverride
    }

    public static class StreamCredentials {
        private String accessKey;
        private String secretKey;
    }
  • Keep existing bucket properties as is. On startup, create a StreamSourceProperties from the existing properties and push to the front of the sources list.
  • Add a class to abstract away the source of stream files
public interface StreamFileProvider {

    Mono<StreamFileData> get(ConsensusNode node, StreamFilename streamFilename);

    Flux<StreamFileData> list(ConsensusNode node, StreamFilename lastFilename);
}
  • Add a S3StreamFileProvider that uses S3AsyncClient
    • A GcsStreamFileProvider is not necessary since we can reuse S3StreamFileProvider for both; they implement the same API.
  • Add a CompositeStreamFileProvider to handle the failover functionality (a rough sketch follows this list)
    • Iterate over the sources always starting with the first
    • If the selected source throws an exception, and it's not the last functioning source, remove it from the list of valid sources
    • Don't remove if it's a common, transient exception like NoSuchKeyException, etc. Check historical logs to verify which ones.
    • Periodically retry bad sources to see if they've become healthy again and re-add to valid sources
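
A rough sketch of the proposed CompositeStreamFileProvider, built on the StreamFileProvider interface above. Retry of bad sources, removal of failing sources, and transient-exception handling are omitted, and the implementation details are assumptions rather than the eventual design.

import java.util.List;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

// Sketch: try the configured sources in order and fall back to the next one on error.
public class CompositeStreamFileProvider implements StreamFileProvider {

    private final List<StreamFileProvider> sources; // first entry is the preferred source

    public CompositeStreamFileProvider(List<StreamFileProvider> sources) {
        this.sources = sources;
    }

    @Override
    public Mono<StreamFileData> get(ConsensusNode node, StreamFilename streamFilename) {
        return Flux.fromIterable(sources)
                .concatMap(source -> source.get(node, streamFilename)
                        .onErrorResume(e -> Mono.empty())) // skip a failing source
                .next(); // first successful result wins
    }

    @Override
    public Flux<StreamFileData> list(ConsensusNode node, StreamFilename lastFilename) {
        // Chain the sources so that an error from one falls through to the next.
        Flux<StreamFileData> result = Flux.error(new IllegalStateException("No stream source available"));
        for (int i = sources.size() - 1; i >= 0; i--) {
            StreamFileProvider source = sources.get(i);
            Flux<StreamFileData> fallback = result;
            result = source.list(node, lastFilename).onErrorResume(e -> fallback);
        }
        return result;
    }
}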

Send queries that are free to a random node that has worked recently

Currently, queries which don't contain a payment are sent to a default node.

For load balancing and to eliminate a single point of failure, we should each time pick a random node that has worked recently and send the query to it (see the sketch below).

Reference: wallets and micropayment server

As Leemon said: In our wallets and micropayment server etc, we have code to remember when a node fails to respond, and we then mark it to not be randomly called in the near future. So mostly we choose randomly from the nodes that have worked recently, and only occasionally call a node that used to be down. That kind of behavior should eventually be implemented here, too.
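
A hedged sketch of that behavior; the cooldown duration and the string representation of a node are assumptions for illustration.

import java.time.Duration;
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

// Sketch: remember when a node last failed and choose randomly among nodes
// that have not failed recently.
public class FreeQueryNodeSelector {

    private static final Duration COOLDOWN = Duration.ofMinutes(5); // assumed value

    private final Map<String, Instant> lastFailure = new ConcurrentHashMap<>();
    private final Random random = new Random();

    public String pickNode(List<String> allNodes) {
        Instant cutoff = Instant.now().minus(COOLDOWN);
        List<String> healthy = allNodes.stream()
                .filter(node -> lastFailure.getOrDefault(node, Instant.MIN).isBefore(cutoff))
                .collect(Collectors.toList());
        // Occasionally every node may be marked as failed; fall back to all nodes.
        List<String> candidates = healthy.isEmpty() ? allNodes : healthy;
        return candidates.get(random.nextInt(candidates.size()));
    }

    public void recordFailure(String node) {
        lastFailure.put(node, Instant.now());
    }
}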

Consider separating business logic and database logic

Currently, database queries are performed directly in the business logic layers. Most enterprise applications separate these two so the database operations can be tested in isolation and reused among multiple components. This pattern is called the DAO or Repository pattern. Libraries like Hibernate and Spring Data help automate most of this, so you get the basic operations for free.

We should consider, at minimum, separating these into separate classes and potentially using Spring Data in the future. We should also create domain objects, populated by the repository, to pass between the business layer and the repository layer (see the sketch below).
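
A hypothetical sketch of what a repository and domain object could look like with Spring Data; the entity, its fields, and the query method are illustrative, not the project's actual schema.

import java.util.List;
import org.springframework.data.repository.CrudRepository;

// Hypothetical domain object populated by the repository layer (persistence
// annotations for whichever Spring Data module is chosen are omitted here).
class TransactionEntity {
    Long consensusNs;
    long payerAccountId;
    long chargedTxFee;
}

// Hypothetical repository interface: the business layer calls these methods
// instead of building SQL itself; Spring Data derives the query from the
// method name and provides save/findAll/delete for free via CrudRepository.
interface TransactionRepository extends CrudRepository<TransactionEntity, Long> {
    List<TransactionEntity> findByPayerAccountId(long payerAccountId);
}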

Download from S3 in parallel

Currently we loop through all node IDs and download any new files serially. To optimize this, we should start a new thread per node ID (perhaps up to some maximum number of threads) and download in parallel, as sketched below. This way we can utilize network and CPU resources more efficiently and enable the parser to get the 2/3 of node signatures quicker.
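
A minimal sketch of the per-node parallelism; the thread limit and the per-node download logic are stubbed placeholders.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: one download task per node ID, bounded by a fixed-size thread pool,
// instead of looping over the nodes serially.
public class ParallelDownloader {

    private static final int MAX_THREADS = 8; // assumed limit

    public static void downloadAll(List<String> nodeIds) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(Math.min(MAX_THREADS, nodeIds.size()));
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String nodeId : nodeIds) {
                futures.add(executor.submit(() -> downloadNewFilesForNode(nodeId)));
            }
            for (Future<?> future : futures) {
                future.get(); // propagate any download failure
            }
        } finally {
            executor.shutdown();
        }
    }

    // Placeholder for the existing per-node download logic.
    private static void downloadNewFilesForNode(String nodeId) {
        System.out.println("Downloading new files for node " + nodeId);
    }
}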

Rework address book file update

When a transaction to update the address book is detected by the mirror node, the address book file is updated.
This scenario needs to be tested with an actual address book update transaction.

In the event that an address book update transaction itself cannot be tested, a test with another file to verify contents are updated would be a good first confidence test.

  1. Create a file on the file system containing some data
  2. Create the same file on the network via a transaction
  3. Issue an update transaction for the same file with new contents
  4. Check that the 0.0.102 file contains the new contents.
