pyquarkchain's Introduction

QuarkChain

QuarkChain is a sharded blockchain protocol that employs a two-layer architecture: an extensible sharding layer consisting of multiple shard chains that process transactions, and a root chain layer that secures the network and coordinates cross-shard transactions among shard chains. The capacity of the network scales linearly as the number of shard chains increases, while the root chain provides a strong security guarantee regardless of the number of shards. The QuarkChain testnet consistently hit 10,000+ TPS with 256 shards run by 50 clusters consisting of 6450 servers, with each loadtest submitting 3,000,000 transactions to the network.

Features

  • Cluster implementation allowing multiple processes / physical machines to work together as a single full node
  • State sharding dividing global state onto independent processing and storage units allowing the network capacity to scale linearly by adding more shards
  • Cross-shard transaction allowing native token transfers among shard chains
  • Adding shards dynamically to the network
  • Support of different mining algorithms on different shards
  • P2P network allowing clusters to join and leave anytime with encrypted transport
  • Fully compatible with Ethereum smart contract

For dApp Developers

Please check dApp Development for a step-by-step tutorial.

Design

QuarkChain Cluster

Check out the Wiki to understand the design of QuarkChain.

Development Setup

QuarkChain should be run using pypy for better performance. The rest of this section uses OSX as the reference for environment set-up.

To install pypy3 on OSX, first install Homebrew

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

Then install pypy3 and other dependencies

brew install pypy3 gmp pkg-config openssl

It is highly recommended to use a virtual environment, which creates an isolated python environment for your project so that the python modules installed later will only affect this environment.

To clone the code to your target directory

git clone https://github.com/QuarkChain/pyquarkchain.git
cd pyquarkchain

To create a virtual environment

mkdir ~/virtualenv
pypy3 -m venv ~/virtualenv/qc

As the virtual env is created with pypy3, once the env is activated all the python and pip commands will point to their pypy3 versions automatically.

To activate the virtual environment

source ~/virtualenv/qc/bin/activate
# the rest of the tutorial assumes virtual environment

Install rocksdb which is required by the python-rocksdb module in the next step

brew install rocksdb

To install the required modules for the project, run the following under the pyquarkchain dir where setup.py is located

# you may want to set the following if cryptography complains about header files: (https://github.com/pyca/cryptography/issues/3489)
# export CPPFLAGS=-I/usr/local/opt/openssl/include
# export LDFLAGS=-L/usr/local/opt/openssl/lib
pip install -e .

Once all the modules are installed, try running all the unit tests under pyquarkchain

python -m pytest

Development Flow

pre-commit is used to manage git hooks.

pip install pre-commit
pre-commit install

black is used to format modified python code, which will be automatically triggered on new commit after running the above commands. Refer to STYLE for coding style suggestions.

Running Clusters

If you are on a private network (e.g. running from a laptop which connects to the Internet through a router), you need to first set up port forwarding for UDP/TCP 38291.

We recommend following the instruction wiki to start clusters using docker.

Running a single cluster for local testing

Start running a local cluster which does not connect to anyone else. The default cluster has 8 shards and 4 slaves.

cd quarkchain/cluster
pypy3 cluster.py --p2p
# add --start_simulated_mining to mine blocks with simulated mining (does not run any hash algorithms)

Running multiple clusters for local testing

Run multiple clusters with P2P network on a single machine with simulated mining:

pypy3 multi_cluster.py --num_clusters=3 --p2p --start_simulated_mining

Running multiple clusters with P2P network on different machines

NOTE this is effectively a private network. If you would like to join our testnet or mainnet, look back a few sections for instructions.

Use the same command as for a single cluster and provide the --bootnodes flag to discover and connect to other clusters. Make sure the ports are open and accessible from the outside world: if you are running on AWS, open the ports (default both UDP and TCP 38291) in the security group; if you are running from a LAN (connecting to the internet through a router), you need to set up port forwarding for UDP/TCP 38291. We have a convenience UPnP module as well, but you will need to check whether it has successfully set up port forwarding.

(Optional) Not needed if you are joining a testnet or mainnet. If you are starting your own network, first start the bootstrap cluster:

# optional, run python quarkchain/tools/newkey.py and note $BOOTSTRAP_PRIV_KEY and $BOOTSTRAP_PUB_KEY
pypy3 cluster.py --p2p --privkey=$BOOTSTRAP_PRIV_KEY

Then start other clusters and provide the bootnode.

BOOTSTRAP_ENODE=enode://$BOOTSTRAP_PUB_KEY@$BOOTSTRAP_IP:$BOOTSTRAP_DISCOVERY_PORT
pypy3 cluster.py --p2p --bootnodes=$BOOTSTRAP_ENODE

Effectively, newkey.py gives the bootstrap node an identity, and you will need to provide the public key to anyone who wants to connect to the bootnodes for discovery. Read https://github.com/QuarkChain/pyquarkchain/wiki/Networking#commandline-flags-explained for details on the commandline flags.
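As a rough illustration of how the parts of the bootnode string fit together, the sketch below assembles and splits an enode URL in Python. The helpers make_enode and parse_enode are hypothetical and not part of pyquarkchain, and the key and address are placeholders.

```python
from urllib.parse import urlsplit


def make_enode(pub_key_hex: str, ip: str, port: int) -> str:
    """Build enode://<public key>@<ip>:<port> (hypothetical helper)."""
    return f"enode://{pub_key_hex}@{ip}:{port}"


def parse_enode(enode: str):
    """Split an enode URL back into (public key, ip, port)."""
    parts = urlsplit(enode)
    if parts.scheme != "enode" or parts.username is None or parts.port is None:
        raise ValueError(f"not a valid enode URL: {enode!r}")
    return parts.username, parts.hostname, parts.port


# Placeholder values; use the key printed by newkey.py and your own address.
enode = make_enode("ab" * 64, "127.0.0.1", 38291)
pub_key, ip, port = parse_enode(enode)
```

Parsing the string before passing it to --bootnodes is a cheap way to catch a missing key or port early.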

Monitoring Clusters

Use the stats tool in the repo to monitor the status of a cluster. It queries the given cluster through JSON RPC every 10 seconds and produces an entry.

$ quarkchain/tools/stats --ip=localhost
----------------------------------------------------------------------------------------------------
                                      QuarkChain Cluster Stats
----------------------------------------------------------------------------------------------------
CPU:                8
Memory:             16 GB
IP:                 localhost
Shards:             8
Servers:            4
Shard Interval:     60
Root Interval:      10
Syncing:            False
Mining:             False
Peers:              127.0.0.1:38293, 127.0.0.1:38292
----------------------------------------------------------------------------------------------------
Timestamp                     TPS   Pending tx  Confirmed tx       BPS      SBPS      ROOT       CPU
----------------------------------------------------------------------------------------------------
2018-09-21 16:35:07          0.00            0             0      0.00      0.00        84     12.50
2018-09-21 16:35:17          0.00            0          9000      0.02      0.00        84      7.80
2018-09-21 16:35:27          0.00            0         18000      0.07      0.00        84      6.90
2018-09-21 16:35:37          0.00            0         18000      0.07      0.00        84      4.49
2018-09-21 16:35:47          0.00            0         18000      0.10      0.00        84      6.10

JSON RPC

JSON RPCs are defined in jsonrpc.py. Note that there are two JSON RPC ports. By default they are 38491 for private RPCs and 38391 for public RPCs. Since you are running your own clusters you get access to both.
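For a quick check from Python, a JSON-RPC 2.0 request can be assembled with the standard library alone. This is a minimal sketch: build_rpc_request is a hypothetical helper, and the getWork method with params ["0x1"] is copied from a logged request rather than from the RPC documentation, so treat it as an assumption.

```python
import json
import urllib.request


def build_rpc_request(method, params, request_id=1, host="localhost", port=38391):
    """Create (but do not send) a JSON-RPC 2.0 POST request."""
    body = json.dumps(
        {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    ).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}",
        data=body,
        headers={"Content-Type": "application/json"},
    )


req = build_rpc_request("getWork", ["0x1"])
# To send it against a running cluster:
#   with urllib.request.urlopen(req, timeout=3) as resp:
#       print(json.loads(resp.read()))
```

Pass port=38491 to target the private RPC port instead of the public one.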

Public RPCs are documented in the Developer Guide. You can use the client library quarkchain-web3.js to query account state, send transactions, and deploy and call smart contracts. Here is a simple example of deploying a smart contract on QuarkChain using the client library.

You may find a list of accounts with tokens preallocated through the genesis blocks here. Feel free to use any of them to issue transactions.

Loadtest

Follow this wiki page to loadtest your cluster and see how fast it processes a large volume of transactions.

Issue

Please open issues on github to report bugs or make feature requests.

Contribution

All help from the community is appreciated! If you are interested in working on features or fixing bugs, please open an issue first to describe the task you are planning to do. For small fixes (a few lines of change) feel free to open pull requests directly.

Developer Community

Join our developer community on Discord.

License

Unless explicitly mentioned in a folder or a file, all files are licensed under MIT License defined in LICENSE file.

pyquarkchain's People

Contributors

bc1pjerry, caoyoyo, czy1234, dependabot[bot], freshmanq, hanyunx, jishankai, jsw-qkc, jyouyj, lyhe18, ninjaahhh, pichaoqkc, ping-ke, qcdll, qcgg, qizhou, qkcww, yanhongfirst

pyquarkchain's Issues

Genesis blocks restructure and new shard creation

Proposal to simplify the genesis block structure, which will also facilitate the creation of new shards in the future

This comment describes the dependencies among the genesis blocks where each chain has two genesis blocks.

# Structure of genesis blocks
#
#  +--+     +--+
#  |m0|<----|m1|
#  +--+  +--+--+
#   ^    |    ^
#   |    |    |
#  +--+<-+  +--+
#  |r0|<----|r1|
#  +--+     +--+
#
# where m0 is the first block in each shard, m1 is the second block in each shard, and
# r0 is the first root block, which confirms m0, r1 is the second root block, which confirms m1
# This makes sure that all hash pointers of m1 and r1 point to previous valid blocks.

This is a bit overcomplicated and can be simplified as follows.

The genesis state starts with a genesis root block R0. The genesis block on each shard then uses R0 as hash_prev_root_block. This should be sufficient to guarantee that every client has the same genesis state.

The genesis blocks on each shard shall be imported from a genesis file specifying the initial shard state. Each shard can specify the height at which its genesis block should be created, which allows reserved shards to be put into use at any time in the future.

Initially we can have 32 shards with half creating genesis blocks at root block height 0 (R0) and half reserved for future. To start using a reserved shard we just need to push out a genesis file for that shard with target root block height set in the future. Clients picking up the new genesis file will create the genesis block at the target root height. Clients unaware of the new shard will be on a hard fork.
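To make the proposal concrete, the lookup a client would perform at each root block height could look roughly like the sketch below. The schedule layout, the name shards_starting_at and the activation height 1000 are all hypothetical; the proposal does not define a genesis file format.

```python
# Hypothetical genesis schedule: shard id -> root block height at which
# the shard's genesis block is created.
GENESIS_ROOT_HEIGHT = {shard_id: 0 for shard_id in range(16)}
GENESIS_ROOT_HEIGHT[16] = 1000  # a reserved shard activated later (made-up height)


def shards_starting_at(root_height, schedule=GENESIS_ROOT_HEIGHT):
    """Shards whose genesis block must be created at this root block height."""
    return sorted(s for s, h in schedule.items() if h == root_height)


initial_shards = shards_starting_at(0)
late_shards = shards_starting_at(1000)
```

A client that never received the entry for shard 16 would create no genesis block at height 1000, which is where it would diverge from the rest of the network.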

Different difficulty adjustments for different mining algorithms

current difficulty calculation is based on ethash

def calculate_diff_with_parent(self, parent, create_time):
    check(not self.check_uncle)
    check(parent.create_time < create_time)
    sign = max(1 - (create_time - parent.create_time) // self.cutoff, -99)
    offset = parent.difficulty // self.diff_factor
    return int(max(parent.difficulty + offset * sign, self.minimum_diff))

however with different mining algorithms (simplest example is double sha256), we might want to change our adjustment algorithm, as well as initial difficulty in our cluster config, and even how we calculate the target from the difficulty, which also comes from ethereum

target = (2 ** 256 // (block_header.difficulty or 1) - 1).to_bytes(
    32, byteorder="big"
)

as we come to support more mining algorithms for different shards, we need a better way to manage these mining algorithms.
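To see what the current rule does with concrete numbers, here is a standalone sketch of the two formulas quoted in this issue. The function names are hypothetical and the default parameter values (cutoff=7, diff_factor=512, minimum_diff=1) are illustrative assumptions.

```python
def calculate_diff(parent_diff, parent_time, create_time,
                   cutoff=7, diff_factor=512, minimum_diff=1):
    """Standalone copy of the ethash-style adjustment rule."""
    if create_time <= parent_time:
        raise ValueError("block must be newer than its parent")
    sign = max(1 - (create_time - parent_time) // cutoff, -99)
    offset = parent_diff // diff_factor
    return int(max(parent_diff + offset * sign, minimum_diff))


def difficulty_to_target(difficulty):
    """Ethereum-style bound that a PoW hash is compared against."""
    return (2 ** 256 // (difficulty or 1) - 1).to_bytes(32, byteorder="big")


fast = calculate_diff(1024, parent_time=100, create_time=103)  # block arrived quickly
slow = calculate_diff(1024, parent_time=100, create_time=120)  # block arrived slowly
target = difficulty_to_target(1024)
```

A block that arrives within the cutoff raises the difficulty by one offset step, while a slower block lowers it; a different mining algorithm would likely need its own cutoff, factor and minimum.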

Disable proof of progress

Since there are too many JSON RPC failures due to proof of progress, we may consider stopping this feature.

Fix fake root block addition

useful for testnet, but will break real PoW.

async def __add_block(block):
    # Root block should include latest minor block headers while it's being mined
    # This is a hack to get the latest minor block included since testnet does not check difficulty
    # TODO: fix this as it will break real PoW
    block = await __create_block()

get work bug

After mining for 2-3 hours, getting work fails with: failed to get work HTTPConnectionPool(host="localhost", port=38391): Read timed out. (read timeout=3). Mining can only continue after a reboot. I have also sent an email to [email protected].

fully integrate EIP-712

currently, pyquarkchain's transactions include a field called version which signifies how the transaction is signed:

('version', big_endian_int),

0 means the signature is generated in a similar way to ethereum transactions
1 means the signature is generated based on typed signature (EIP-712) as implemented by MetaMask BEFORE #4803

after Metamask 4803 is pushed, it will be a breaking change for MetaMask users of our testnet, so we need to upgrade accordingly

Refactor DefaultConfig

Move shard specific parameters into ShardConfig and global parameters into QuarkChainConfig

bug

SLAVE_S0: E1016 07:01:45.170564 shard.py:532] Traceback (most recent call last):
SLAVE_S0: File "/code/pyquarkchain/quarkchain/cluster/shard.py", line 530, in add_block
SLAVE_S0: xshard_list = self.state.add_block(block)
SLAVE_S0: File "/code/pyquarkchain/quarkchain/cluster/shard_state.py", line 646, in add_block
SLAVE_S0: block.header.height, self.header_tip.height
SLAVE_S0: IndexError: out of range: index 2 but only 2 arguments
SLAVE_S0:

direct slave connection

slaves should be able to connect to peers directly instead of going through the proxy connection in master. in particular, master currently handles message encryption/decryption and has to do so for all shards

on the other hand, master needs a way to reveal the security handshake secret to the slaves. this is a reasonable requirement since nodes in the same cluster are mutually trusted

Dynamic block gas limit for cross-shard transaction throttling

As the block gas limit for the target shard can change from block to block the throttling threshold should be adjusted accordingly. As the block gas limit is just a way to limit the size / computation / storage cost of a block we could relax the constraint a bit for processing cross-shard deposits and allow it to use a bit more gas than the actual limit. By relaxing the constraint we could simply use the following algorithm to get an estimate of the block gas limit of the target shard.

When creating a new block S whose hash_prev_root_block is R including minor block headers from shard A, B, C, the block gas limit of a shard can be taken from the last header of that shard included by R.

S  --> R (A, B, C)

Since the genesis root block doesn't include any minor block headers, initially all the block gas limits perceived by the source shard are 0 by default and thus no cross-shard transactions are allowed. As soon as a new root block arrives the block gas limits will be updated.
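The estimate described above can be sketched as a small update function. MinorHeader and update_gas_limits are hypothetical names reduced to the fields needed here, and the gas values are made up; this is an illustration of the rule, not the project's implementation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class MinorHeader:  # hypothetical, reduced to the fields used here
    shard_id: int
    height: int
    gas_limit: int


def update_gas_limits(previous: Dict[int, int],
                      root_headers: Iterable[MinorHeader]) -> Dict[int, int]:
    """Take each shard's gas limit from its last (highest) header in the root block."""
    limits = dict(previous)
    latest_height: Dict[int, int] = {}
    for header in root_headers:
        if header.height >= latest_height.get(header.shard_id, -1):
            latest_height[header.shard_id] = header.height
            limits[header.shard_id] = header.gas_limit
    return limits


# The genesis root block confirms no minor headers, so every limit starts at 0.
genesis_limits = {shard_id: 0 for shard_id in (0, 1, 2)}
after_r1 = update_gas_limits(genesis_limits, [
    MinorHeader(0, 1, 1_000_000),
    MinorHeader(0, 2, 1_200_000),
    MinorHeader(1, 1, 900_000),
])
```

Shard 2 has no header in the root block, so its perceived limit stays at 0 and it cannot yet receive cross-shard transactions.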

download too old block when sync

SLAVE_S1: E1202 12:18:37.594388 shard.py:536] Traceback (most recent call last):
SLAVE_S1: File "/code/pyquarkchain/quarkchain/cluster/shard.py", line 534, in add_block
SLAVE_S1: xshard_list = self.state.add_block(block)
SLAVE_S1: File "/code/pyquarkchain/quarkchain/cluster/shard_state.py", line 655, in add_block
SLAVE_S1: block.header.height, self.header_tip.height
SLAVE_S1: ValueError: block is too old 28 << 4943

p2p discovery is broken

to reproduce

python multi_cluster.py --mine --clean --devp2p_enable

error
MASTER: E1012 05:42:49.670386 peermanager.py:224] discovery failed error=Serialization failed because of element at index 1 ("Object is not a serializable (<class 'str'>)") num_peers=0 min_peers=2

Restrict root chain mining

Root block signed by a preconfigured private key shall have a weaker requirement on difficulty. The matching public key is hardcoded and used to validate the signature.

bug in Miner causing mining to stop unexpectedly

The add_block function can return without adding the block or raising an exception, in which case mine_new_block_async is never called again and mining stops.

slave.py

        async def __add_block(block):
            # Do not add block if there is a sync in progress
            if self.synchronizer.running:
                return
            # Do not add stale block
            if self.state.header_tip.height >= block.header.height:
                return
            await self.handle_new_block(block)

miner.py

    def mine_new_block_async(self):
        async def handle_mined_block(instance: Miner):
            while True:
                block = await instance.output_q.coro_get()
                if not block:
                    return
                try:
                    await instance.add_block_async_func(block)
                except Exception as ex:
                    GLOG.exception(ex)
                    instance.mine_new_block_async()
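One possible shape of a fix is to make the add function report whether the block was actually added, so that the handler can restart mining on every non-success path. The sketch below is a simplified stand-in with hypothetical names and dict-based blocks, not the project's actual fix, and it assumes that a successful add triggers the next round of mining elsewhere.

```python
import asyncio


async def add_block(block, tip_height):
    """Return True if the block was added, False if it was skipped as stale."""
    if tip_height >= block["height"]:
        return False
    return True


async def handle_mined_block(queue, tip_height, restart_mining):
    block = await queue.get()
    try:
        added = await add_block(block, tip_height)
    except Exception:
        added = False
    if not added:
        # Covers both the exception path and the silent-return path.
        restart_mining()


async def demo():
    restarts = []
    queue = asyncio.Queue()
    await queue.put({"height": 5})  # stale: the tip is already at height 7
    await handle_mined_block(queue, tip_height=7,
                             restart_mining=lambda: restarts.append(1))
    return len(restarts)


restart_count = asyncio.run(demo())
```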

Remove private keys from repo and testnet

We need to cleanup private keys from the following files

  • testnet/accounts_to_fund.py
  • loadtest/accounts.py
  • config.py genesis key

and start a clean testnet using none of the keys in the above files.

Problems calling the method sendTransaction(), call().

I am calling the method sendTransaction() or call() as documented at https://developers.quarkchain.io/#getaccountdata and get an "Invalid params" error.

curl -X POST --data '{
    "jsonrpc": "2.0",
    "method": "call",
    "params": {
        "gasPrice": "0x2540be400", 
        "gas": "0x7530", 
        "value": "0x9184e72a",
        "data": "0xd46e8dd67c5d32be8d46e8dd67c5d32be8058bb8eb970870f072445675058bb8eb970870f072445675",
        "fromFullShardId": "0x19e189ec",
        "toFullShardId": "0x18f9ba2c",
        "networkId": "0x3",
        "to": "0x283B50c1326F5C09BA792cc0Ad6C08b5035a36711",
        "v": "0x1a",
        "r": "0x293d59ef8705e34585d646f5899530d52a2d39b312fd061607036152e5fcf589",
        "s": "0x98d2e479720cee2be165703dd97085765adc65b18ed8d9dfbf3d6d7e7fe5a6e"
    },
    "id": 1
}' http://jrpc.testnet.quarkchain.io:38391 

Response:

{"jsonrpc": "2.0", "error": {"code": -32602, "message": "Invalid params"}, "id": 1}
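One thing that can be checked locally before sending is whether the byte-string fields decode to whole bytes. In the request above the "to" value has 41 hex digits, which cannot be decoded. Whether this is what the server rejects is not confirmed; the helper below is a hypothetical pre-flight check, not part of any QuarkChain library.

```python
def undecodable_hex_fields(params, byte_fields=("to", "data")):
    """Return the names of byte-string fields whose hex payload is not whole bytes."""
    bad = []
    for name in byte_fields:
        value = params.get(name, "")
        digits = value[2:] if value.startswith("0x") else value
        try:
            bytes.fromhex(digits)
        except ValueError:
            bad.append(name)
    return bad


params = {
    "to": "0x283B50c1326F5C09BA792cc0Ad6C08b5035a36711",
    "data": "0xd46e8dd67c5d32be8d46e8dd67c5d32be8058bb8eb970870f072445675"
            "058bb8eb970870f072445675",
}
suspicious = undecodable_hex_fields(params)
```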

Difficulty adjustment failed on shard chains in testnet 2

The difficulty is stuck below 512 and blocks are mined too fast.
The root cause is that the minimum difficulty isn't explicitly set and the default value 1 is used.

else EthDifficultyCalculator(cutoff=7, diff_factor=512)

def __init__(self, cutoff, diff_factor, minimum_diff=1, check_uncle=False):

Fixing the code will cause a hard fork. We'd rather fix the code and start a new testnet.
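The effect is easy to reproduce in isolation: with diff_factor=512, any difficulty below 512 gives an offset of 0, so the difficulty never moves however fast blocks arrive. The sketch below re-implements the rule with hypothetical function names; the starting difficulty of 300 and the one-second block interval are made-up inputs.

```python
def next_difficulty(parent_diff, block_interval,
                    cutoff=7, diff_factor=512, minimum_diff=1):
    sign = max(1 - block_interval // cutoff, -99)
    offset = parent_diff // diff_factor  # 0 whenever parent_diff < diff_factor
    return max(parent_diff + offset * sign, minimum_diff)


def simulate(start_diff, blocks, block_interval=1, **kwargs):
    """Apply the adjustment rule repeatedly with a constant block interval."""
    diff = start_diff
    for _ in range(blocks):
        diff = next_difficulty(diff, block_interval, **kwargs)
    return diff


stuck = simulate(300, blocks=1000)                     # never leaves 300
fixed = simulate(300, blocks=1000, minimum_diff=512)   # floor lifts it, then it grows
```

Setting minimum_diff to at least diff_factor guarantees a non-zero offset, which is why an explicit minimum resolves the stall.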

JSON RPC error code?

The error codes / messages returned from the json rpcs are misleading.
We need to go through each RPC and make sure the right code / message is returned.

master hanging on "write" sys call

when running the cluster (testnet201.bootstrap), after mining for a certain period of time, the jsonrpc server became unresponsive for all requests. strace shows following results

$ ps aux | grep py
...
root     15480  0.3  5.0 2049216 1605800 ?     Sl   Nov20   8:17 pypy3 master.py --cluster_config=/code/cluster_config.json
root     15481  0.8  1.5 788468 484300 ?       Sl   Nov20  22:27 pypy3 slave.py --cluster_config=/code/cluster_config.json --node_id=S0
...
$ sudo strace -p 15480 -s 10000
strace: Process 15480 attached
write(2, "I1121 00:26:53.642705 jsonrpc.py:408] {\"jsonrpc\": \"2.0\", \"method\": \"getWork\", \"params\": [\"0x1\"], \"id\": 433}\n", 108
<ctrl+c>
$ sudo strace -p 15481 -s 10000
strace: Process 15481 attached
epoll_wait(3,
<ctrl+c>

so the slave server looks fine, but master is hanging on writing the jsonrpc log to stderr (file descriptor 2).

it also points to line 408

async def __handle(self, request):
    request = await request.text()
    Logger.info(request)

where in turn it calls

@classmethod
def info(cls, msg):
    cls.check_logger_set()
    cls._qkc_logger.info(msg)

need to find out why the write call is blocked.
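One common cause worth ruling out, and only a hypothesis here since nothing in the trace confirms it, is that stderr is a pipe whose reader has stopped draining it. A pipe has a finite buffer, and once it is full a blocking write() hangs exactly like the one above. The sketch below measures that capacity on a Unix-like system, using a non-blocking write end so the demonstration itself cannot hang.

```python
import os


def pipe_capacity() -> int:
    """Count how many bytes fit into a pipe that nobody is reading from."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # fail instead of hanging when full
    written = 0
    try:
        while True:
            written += os.write(write_fd, b"x" * 4096)
    except BlockingIOError:
        pass  # buffer is full: a blocking write would hang at this point
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written


capacity = pipe_capacity()
```

If the master's stderr is redirected into a log collector or a parent process, checking whether that reader is still consuming output would be a reasonable first step.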

Peer connection continuously resetting

Using a simple configuration with 3 clusters, each on a different machine (one is the bootstrap and 2 other connect to the bootstrap), I have one cluster master that keeps closing the peer connection but then reconnects, again and again:

SLAVE_S108: I1016 18:17:59.847517 shard.py:136] [236] received new header with height 66
SLAVE_S120: I1016 18:17:59.847553 shard.py:136] [504] received new header with height 55
SLAVE_S29: I1016 18:17:59.848172 shard.py:136] [285] received new header with height 47
SLAVE_S24: I1016 18:17:59.848497 shard.py:136] [536] received new header with height 47
SLAVE_S33: I1016 18:17:59.849180 shard.py:136] [289] received new header with height 49
SLAVE_S126: I1016 18:17:59.849541 shard.py:136] [766] received new header with height 48
SLAVE_S33: I1016 18:17:59.850793 shard.py:136] [161] received new header with height 54
SLAVE_S14: I1016 18:17:59.850684 shard.py:136] [398] received new header with height 42
SLAVE_S54: I1016 18:17:59.863734 shard.py:136] [822] received new header with height 47
SLAVE_S108: I1016 18:17:59.872091 shard.py:136] [620] received new header with height 51
SLAVE_S99: I1016 18:17:59.878481 shard.py:136] [995] received new header with height 48
SLAVE_S127: E1016 18:17:59.878653 slave.py:203] Traceback (most recent call last):
SLAVE_S127: File "slave.py", line 201, in handle_add_root_block_request
SLAVE_S127: switched = await shard.add_root_block(req.root_block)
SLAVE_S127: File "/home/laurent_pellegrino/pyquarkchain/quarkchain/cluster/shard.py", line 462, in add_root_block
SLAVE_S127: return self.state.add_root_block(root_block)
SLAVE_S127: File "/home/laurent_pellegrino/pyquarkchain/quarkchain/cluster/shard_state.py", line 1048, in add_root_block
SLAVE_S127: raise ValueError("cannot find previous root block in pool")
SLAVE_S127: ValueError: cannot find previous root block in pool
SLAVE_S127:
SLAVE_S99: I1016 18:17:59.879714 shard.py:136] [99] received new header with height 43
SLAVE_S96: I1016 18:17:59.880505 shard.py:136] [864] received new header with height 49
MASTER: E1016 18:17:59.881408 master.py:100] Traceback (most recent call last):
MASTER: File "master.py", line 98, in sync
MASTER: await self.__run_sync()
MASTER: File "master.py", line 156, in __run_sync
MASTER: await self.__add_block(block)
MASTER: File "master.py", line 194, in __add_block
MASTER: await self.master_server.add_root_block(root_block)
MASTER: File "master.py", line 1015, in add_root_block
MASTER: check(all([resp.error_code == 0 for _, resp, _ in result_list]))
MASTER: File "/home/laurent_pellegrino/pyquarkchain/quarkchain/utils.py", line 54, in check
MASTER: raise AssertionError(msg)
MASTER: AssertionError
MASTER:
SLAVE_S127: I1016 18:17:59.881958 shard.py:136] [1023] received new header with height 54
MASTER: I1016 18:17:59.882185 simple_network.py:172] Closing peer 13e8fe6de8394401c91735981ee39f6a4577548aa12cf471ec49fd61837993fd with the following reason:
MASTER: I1016 18:17:59.882568 simple_network.py:148] Peer 13e8fe6de8394401c91735981ee39f6a4577548aa12cf471ec49fd61837993fd disconnected, remaining 1
SLAVE_S127: I1016 18:17:59.884073 shard.py:136] [127] received new header with height 41
SLAVE_S127: I1016 18:17:59.885576 shard.py:136] [255] received new header with height 46
SLAVE_S127: I1016 18:17:59.886909 shard.py:136] [639] received new header with height 39
MASTER: I1016 18:20:56.125511 master.py:146] [R] syncing from 4 10264a68b8c4e31c7a74a8d266444d42459698e200c1e1454eeaa433e55744a2
MASTER: I1016 18:21:02.707984 master.py:150] [R] downloaded 1 blocks from peer
SLAVE_S15: I1016 18:21:03.519918 slave.py:406] [15] sync request from master, downloaded 3 blocks (25 - 27)
SLAVE_S20: I1016 18:21:03.520654 slave.py:406] [20] sync request from master, downloaded 2 blocks (23 - 24)
SLAVE_S4: I1016 18:21:03.520847 slave.py:406] [4] sync request from master, downloaded 4 blocks (26 - 29)
SLAVE_S0: I1016 18:21:03.520806 slave.py:406] [0] sync request from master, downloaded 5 blocks (24 - 28)
SLAVE_S15: E1016 18:21:03.520998 shard.py:597] Traceback (most recent call last):
SLAVE_S15: File "/home/laurent_pellegrino/pyquarkchain/quarkchain/cluster/shard.py", line 595, in add_block_list_for_sync
SLAVE_S15: xshard_list = self.state.add_block(block)
SLAVE_S15: File "/home/laurent_pellegrino/pyquarkchain/quarkchain/cluster/shard_state.py", line 656, in add_block
SLAVE_S15: self.__validate_block(block)
SLAVE_S15: File "/home/laurent_pellegrino/pyquarkchain/quarkchain/cluster/shard_state.py", line 424, in __validate_block
SLAVE_S15: block.header.hash_prev_minor_block.hex(),
SLAVE_S15: ValueError: [15] prev block not found, block height 25 prev hash 04d657110e2b71da93acff13c73a4d33b1a4ff2fb08cf40ac42bb968e52f7ae4
SLAVE_S15:
SLAVE_S14: I1016 18:21:03.520972 slave.py:406] [14] sync request from master, downloaded 5 blocks (27 - 31)
SLAVE_S7: I1016 18:21:03.520896 slave.py:406] [7] sync request from master, downloaded 5 blocks (27 - 31)
SLAVE_S8: I1016 18:21:03.521011 slave.py:406] [8] sync request from master, downloaded 4 blocks (26 - 29)
SLAVE_S18: I1016 18:21:03.521023 slave.py:406] [18] sync request from master, downloaded 5 blocks (27 - 31)
SLAVE_S1: I1016 18:21:03.521154 slave.py:406] [1] sync request from master, downloaded 6 blocks (20 - 25)
SLAVE_S10: I1016 18:21:03.521264 slave.py:406] [10] sync request from master, downloaded 3 blocks (27 - 29)
SLAVE_S12: I1016 18:21:03.521490 slave.py:406] [12] sync request from master, downloaded 3 blocks (27 - 29)
SLAVE_S5: I1016 18:21:03.521558 slave.py:406] [5] sync request from master, downloaded 5 blocks (20 - 24)
SLAVE_S13: I1016 18:21:03.521574 slave.py:406] [13] sync request from master, downloaded 5 blocks (24 - 28)
SLAVE_S20: E1016 18:21:03.521722 shard.py:597] Traceback (most recent call last):
SLAVE_S20: File "/home/laurent_pellegrino/pyquarkchain/quarkchain/cluster/shard.py", line 595, in add_block_list_for_sync
SLAVE_S20: xshard_list = self.state.add_block(block)
SLAVE_S20: File "/home/laurent_pellegrino/pyquarkchain/quarkchain/cluster/shard_state.py", line 656, in add_block
SLAVE_S20: self.__validate_block(block)
SLAVE_S20: File "/home/laurent_pellegrino/pyquarkchain/quarkchain/cluster/shard_state.py", line 424, in __validate_block
SLAVE_S20: block.header.hash_prev_minor_block.hex(),
SLAVE_S20: ValueError: [20] prev block not found, block height 23 prev hash c3f9e52ff0b582e4e330619a92eda0b7fe9a174ab1dcb6ea28f8a7b3d98c9e96

...

MASTER: E1016 18:21:45.439136 master.py:100] Traceback (most recent call last):
MASTER: File "master.py", line 98, in sync
MASTER: await self.__run_sync()
MASTER: File "master.py", line 156, in __run_sync
MASTER: await self.__add_block(block)
MASTER: File "master.py", line 194, in __add_block
MASTER: await self.master_server.add_root_block(root_block)
MASTER: File "master.py", line 1015, in add_root_block
MASTER: check(all([resp.error_code == 0 for _, resp, _ in result_list]))
MASTER: File "/home/laurent_pellegrino/pyquarkchain/quarkchain/utils.py", line 54, in check
MASTER: raise AssertionError(msg)
MASTER: AssertionError
MASTER:
MASTER: I1016 18:21:45.439860 simple_network.py:172] Closing peer 13e8fe6de8394401c91735981ee39f6a4577548aa12cf471ec49fd61837993fd with the following reason:
MASTER: I1016 18:21:45.440154 simple_network.py:148] Peer 13e8fe6de8394401c91735981ee39f6a4577548aa12cf471ec49fd61837993fd disconnected, remaining 0
MASTER: I1016 18:21:59.535938 simple_network.py:93] Got HELLO from peer 82384749bb351c8bdbe13645bb0103265e843fbf89bb53e43be915caa1de092c (35.227.54.173:38291)
MASTER: I1016 18:21:59.571896 simple_network.py:123] Established virtual shard connections with peer 82384749bb351c8bdbe13645bb0103265e843fbf89bb53e43be915caa1de092c
MASTER: I1016 18:21:59.572502 simple_network.py:132] Peer 82384749bb351c8bdbe13645bb0103265e843fbf89bb53e43be915caa1de092c added to active peer pool

Improve transaction queue

The transaction queue evm/transaction_queue.py that we copied from pyethereum is very basic and doesn't buffer transactions with higher nonces.
py-evm and go-ethereum should have better designs that we can borrow from.
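As a starting point for discussion, buffering by nonce could look like the sketch below. NonceBufferedQueue is a hypothetical class, not the py-evm or go-ethereum design, and it ignores gas price ordering and eviction.

```python
from collections import defaultdict


class NonceBufferedQueue:
    """Per-sender queue: ready txs are released in nonce order, and txs with
    a nonce gap before them are buffered until the gap is filled."""

    def __init__(self):
        self.next_nonce = defaultdict(int)  # sender -> next executable nonce
        self.buffered = defaultdict(dict)   # sender -> {nonce: tx}

    def add(self, sender, nonce, tx):
        """Buffer tx and return the txs that became executable, in order."""
        if nonce < self.next_nonce[sender]:
            return []  # stale nonce, drop it
        self.buffered[sender][nonce] = tx
        ready = []
        while self.next_nonce[sender] in self.buffered[sender]:
            ready.append(self.buffered[sender].pop(self.next_nonce[sender]))
            self.next_nonce[sender] += 1
        return ready


queue = NonceBufferedQueue()
first = queue.add("alice", 1, "tx1")   # nonce 0 is missing, so tx1 is held back
second = queue.add("alice", 0, "tx0")  # fills the gap and releases both
```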

root chain cannot catch up with shards

when this happens the root chain only includes shard blocks that were generated some time ago
this causes x-shard transactions to hang (until blocks of the originating shard are confirmed by root blocks), and miners' coinbase rewards are not finalized because of possible forks

broadcast blocks before add_block

current measurements show 10x+ propagation latency when loadtesting. we now try broadcasting blocks right after PoW validation, instead of waiting for the local add_block

for slave:

  1. move broadcast before add_block
  2. allow peers to download blocks that are in a pool, instead of just from the chain

syncing cluster keep connecting/disconnecting from other peers

here is what I observed:

  1. run a network (eg 5 clusters) and start mining, wait until it reaches some height
  2. fire up a new cluster and connect to the network
  3. the new cluster will connect with one peer (usually bootstrap node) and start syncing root blocks
  4. while the new cluster is syncing, it will connect to other nodes in the network, but will close_with_error:
MASTER: E1127 10:57:41.764544 master.py:102] Traceback (most recent call last):
MASTER: File "master.py", line 100, in sync
MASTER: await self.__run_sync()
MASTER: File "master.py", line 164, in __run_sync
MASTER: await self.__add_block(block)
MASTER: File "master.py", line 207, in __add_block
MASTER: await self.master_server.add_root_block(root_block)
MASTER: File "master.py", line 1109, in add_root_block
MASTER: check(all([resp.error_code == 0 for _, resp, _ in result_list]))
MASTER: File "/Users/dll/chain/pyquarkchain/quarkchain/utils.py", line 61, in check
MASTER: raise AssertionError(msg)
MASTER: AssertionError
Closing peer XXX with the following reason:

it seems the slaves refused to add the root block in question because the prev root block is unknown

peer reputation

geth has it, parity has it, py-evm does not have it (yet)
we want it

peer reputation should be stateful, and limited to local cluster, ie. we should not convey reputation over the p2p wire, which means clusters have to figure out who to trust on their own

a preliminary implementation is to store currently connected peers and prefer them when restarted
this requires the client to store states other than the chain itself
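The preliminary implementation could be as small as the sketch below, which writes the connected peers to a JSON file and reads them back on start-up. The function names and the "ip:port" string format are assumptions made for illustration.

```python
import json
import os
import tempfile


def save_peers(path, peers):
    """Atomically write the currently connected peers to disk."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(sorted(peers), f)
    os.replace(tmp_path, path)  # readers never see a half-written file


def load_preferred_peers(path):
    """Peers to try first after a restart; empty if nothing was stored."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return []


with tempfile.TemporaryDirectory() as state_dir:
    peers_file = os.path.join(state_dir, "peers.json")
    before = load_preferred_peers(peers_file)
    save_peers(peers_file, {"127.0.0.1:38293", "127.0.0.1:38292"})
    after = load_preferred_peers(peers_file)
```

Keeping the file local to the cluster matches the requirement that reputation is never conveyed over the p2p wire.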

Implement neighbor and cross-shard transaction throttling

To reduce communication cost inside a cluster and make clusters truly scalable, we cannot allow every shard to talk to every other shard (e.g., broadcasting new blocks with cross-shard deposits). We will implement a neighbor rule that only allows a shard to communicate with its neighbors. This means one cannot make a cross-shard transaction to an arbitrary shard directly. Some token transfers would require multiple cross-shard transactions using one or more shards as relays to transfer tokens to the destination shard. This process would be managed by the wallet doing the transfer.
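A wallet managing such a transfer would need to find a chain of relay shards. The sketch below uses a breadth-first search over a neighbor map; the ring-shaped map is a made-up example, since the actual neighbor rule is not defined here, and relay_path is a hypothetical helper.

```python
from collections import deque


def relay_path(neighbors, source, destination):
    """Shortest chain of shards from source to destination, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        shard = queue.popleft()
        if shard == destination:
            path = []
            while shard is not None:
                path.append(shard)
                shard = previous[shard]
            return path[::-1]
        for nxt in neighbors.get(shard, ()):
            if nxt not in previous:
                previous[nxt] = shard
                queue.append(nxt)
    return None


# Made-up neighbor map: four shards arranged in a ring.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
path = relay_path(ring, 0, 2)
```

Each hop in the returned path corresponds to one cross-shard transaction, so the path length minus one is the number of transactions the wallet has to issue.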

The number of cross-shard transactions targeting a specific recipient shard should also be limited to avoid hot spots. In the current design the first minor block mined after a root block will be incentivized to process all the cross-shard deposits from the minor blocks confirmed by the root block. Each deposit will cost DEPOSIT_GAS (9000) gas to be processed. We need to make sure the total number of deposits will not exceed BLOCK_GAS_LIMIT / DEPOSIT_GAS. X, the max number of cross-shard transactions targeting a destination shard in a block, shall satisfy the following inequality.

X * MAX_BLOCKS_PER_SHARD * MAX_NEIGHBORS <= BLOCK_GAS_LIMIT / DEPOSIT_GAS

where MAX_BLOCKS_PER_SHARD is the max number of minor blocks that can be included by a root block, MAX_NEIGHBORS is the max number of neighbors a shard can have.

So we need to enforce that

X <= BLOCK_GAS_LIMIT / DEPOSIT_GAS / MAX_NEIGHBORS / MAX_BLOCKS_PER_SHARD

Using the following parameters

BLOCK_GAS_LIMIT = 10,000,000
DEPOSIT_GAS = 9,000
MAX_NEIGHBORS = 32
MAX_BLOCKS_PER_SHARD = 10

we get X <= 3.47, so X can be at most 3.
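
The arithmetic above can be checked with a few lines of Python. The constants are the ones listed in this issue; the function name `max_xshard_tx_per_block` is only illustrative.

```python
BLOCK_GAS_LIMIT = 10_000_000
DEPOSIT_GAS = 9_000
MAX_NEIGHBORS = 32
MAX_BLOCKS_PER_SHARD = 10


def max_xshard_tx_per_block(block_gas_limit, deposit_gas, max_neighbors, max_blocks_per_shard):
    """Largest integer X such that
    X * max_blocks_per_shard * max_neighbors * deposit_gas <= block_gas_limit."""
    return block_gas_limit // (deposit_gas * max_neighbors * max_blocks_per_shard)


# Real-valued bound from the inequality, before rounding down.
bound = BLOCK_GAS_LIMIT / DEPOSIT_GAS / MAX_NEIGHBORS / MAX_BLOCKS_PER_SHARD
x_max = max_xshard_tx_per_block(
    BLOCK_GAS_LIMIT, DEPOSIT_GAS, MAX_NEIGHBORS, MAX_BLOCKS_PER_SHARD
)
print(round(bound, 3), x_max)
```

Since X counts transactions, the real-valued bound is rounded down to an integer.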

Refresh mining root block when new minor block header comes in

During simulated mining in testnet, whenever the block gets refreshed (a new tx arrives for a minor block, or a new minor block header arrives for a root block), we should pass the new block into miner.input_q so that the miner discards its current progress and starts mining the new block. This maximizes the testnet's performance.

In production, since a new block is triggered by mining (miner.get_work generates the blocks to mine), which has a 5-second period during each block generation, this is not an issue.

(It's less of an issue for minor blocks since the mining interval is short.)
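
A rough sketch of the refresh behavior, assuming the miner polls its input queue between nonce attempts. Only the name `input_q` comes from the issue; `latest_block`, `mine`, and `try_nonce` are hypothetical, and the real miner may use a different queue type and loop structure.

```python
import queue


def latest_block(input_q, current):
    """Drain input_q and return the newest block, or `current` if nothing arrived."""
    while True:
        try:
            current = input_q.get_nowait()
        except queue.Empty:
            return current


def mine(input_q, block, try_nonce, max_rounds):
    """Simulated mining loop that switches to a refreshed block as soon as one arrives.

    Returns (block, nonce) on success, or None after max_rounds attempts.
    """
    nonce = 0
    for _ in range(max_rounds):
        newer = latest_block(input_q, block)
        if newer is not block:
            # A refreshed block came in: discard progress on the old one.
            block, nonce = newer, 0
        if try_nonce(block, nonce):
            return block, nonce
        nonce += 1
    return None
```

The producer side would simply call `input_q.put(new_block)` whenever a new tx or a new minor block header changes the block being mined.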

Better abstraction for managing shards in slave server

Right now SlaveServer owns ShardStates and MasterConnection owns all the ShardConnections.

Ideally we can have a Shard abstraction that wraps both a ShardState and the corresponding ShardConnection. Then SlaveServer owns a list of Shards and acts as a proxy between the Shards and the MasterConnection and SlaveConnections.
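
One possible shape for that abstraction is sketched below. The `Shard` and `SlaveServerSketch` classes and their fields are hypothetical; `state` and `connection` are untyped stand-ins for the real ShardState and ShardConnection objects.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical wrapper pairing one shard's state with its connection."""

    shard_id: int
    state: object  # stand-in for ShardState
    connection: object  # stand-in for ShardConnection


@dataclass
class SlaveServerSketch:
    """Owns the Shards and routes messages to them by shard id."""

    shards: dict = field(default_factory=dict)

    def add_shard(self, shard):
        self.shards[shard.shard_id] = shard

    def route(self, shard_id):
        """Return the Shard that should handle a message for shard_id."""
        return self.shards[shard_id]
```

Keeping state and connection in one object means adding or removing a shard touches a single owner rather than two parallel collections.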

Add mining functions

Add ethash and sha3sha3 to cluster/miner.py.
The specific mining function for each shard can be configured through ShardConfig in config.py.
The block validation function shall be updated accordingly.
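
A sketch of per-shard dispatch on the configured mining function follows. The registry, the `double_sha256` entry, and `validate_seal` are illustrative names only, and ethash is deliberately left out. Note that `hashlib.sha3_256` is NIST SHA3-256; Ethereum-derived code usually means Keccak-256 when it says "sha3", which produces different digests, so treat the hash here as a stand-in.

```python
import hashlib


def sha3sha3(data: bytes) -> bytes:
    """Double hash; uses NIST SHA3-256 as a stand-in (assumption, see note above)."""
    return hashlib.sha3_256(hashlib.sha3_256(data).digest()).digest()


def double_sha256(data: bytes) -> bytes:
    """Second algorithm, present only to show per-shard selection."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


# Hypothetical registry keyed by the algorithm name stored in a shard's config.
POW_FUNCTIONS = {
    "sha3sha3": sha3sha3,
    "double_sha256": double_sha256,
}


def validate_seal(algorithm: str, header: bytes, nonce: int, target: int) -> bool:
    """Check that hash(header || nonce), read as a big-endian integer, is below target."""
    pow_fn = POW_FUNCTIONS[algorithm]  # raises KeyError for an unknown algorithm
    digest = pow_fn(header + nonce.to_bytes(8, "big"))
    return int.from_bytes(digest, "big") < target
```

Block validation would look up the algorithm from the shard's configuration and call the same function the miner used, so the two cannot drift apart.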

progressive sync

Sync takes a long time, and the process cannot be killed halfway and resumed later.
We want this feature so that people can join the network easily.

DNS peer cluster discovery

I hate the current form of bootnodes:
node://da6ea5897f2245346dd0787163512763f26bae9e783adb1ea94053251c1fea0f1324567ac4b1abd2dc3a1cf90a352a7bf7ad11fe7748ca5126e7f2cc5077e1a4@18.236.134.107:38291

I'd rather type something like discovery.quarkchain.io and let it be. It looks like devp2p has some new features for this; let's try it out.

estimate_gas failure should not close connection with master

MASTER: I1012 04:33:34.078363 jsonrpc.py:408] {"jsonrpc":"2.0","method":"estimateGas","id":"1","params":{"data":"0xa6f2ae3a","to":"0x837f9A0e7185141D75e37D55e213efd725e76710B10d0E9F","value":"0xde0b6b3a7640000","from":"0xb0997B1309C61D50495b79b2B82D377d862Db367b0c6797f"}}
SLAVE_S31: E1012 04:33:34.135198 protocol.py:167] Traceback (most recent call last):
SLAVE_S31: File "/code/pyquarkchain/quarkchain/protocol.py", line 165, in __internal_handle_metadata_and_raw_data
SLAVE_S31: await self.handle_metadata_and_raw_data(metadata, raw_data)
SLAVE_S31: File "/code/pyquarkchain/quarkchain/cluster/protocol.py", line 80, in handle_metadata_and_raw_data
SLAVE_S31: await super().handle_metadata_and_raw_data(metadata, raw_data)
SLAVE_S31: File "/code/pyquarkchain/quarkchain/protocol.py", line 152, in handle_metadata_and_raw_data
SLAVE_S31: await self.__handle_rpc_request(op, cmd, rpc_id, metadata)
SLAVE_S31: File "/code/pyquarkchain/quarkchain/protocol.py", line 127, in __handle_rpc_request
SLAVE_S31: resp = await handler(self, request)
SLAVE_S31: File "slave.py", line 430, in handle_estimate_gas
SLAVE_S31: res = self.slave_server.estimate_gas(req.tx, req.from_address)
SLAVE_S31: File "slave.py", line 1136, in estimate_gas
SLAVE_S31: return shard.state.estimate_gas(tx, from_address)
SLAVE_S31: File "/code/pyquarkchain/quarkchain/cluster/shard_state.py", line 1290, in estimate_gas
SLAVE_S31: if hi == cap and not run_tx(hi):
SLAVE_S31: File "/code/pyquarkchain/quarkchain/cluster/shard_state.py", line 1277, in run_tx
SLAVE_S31: evm_tx = self.__validate_tx(tx, evm_state, from_address, gas=gas)
SLAVE_S31: File "/code/pyquarkchain/quarkchain/cluster/shard_state.py", line 255, in __validate_tx
SLAVE_S31: validate_transaction(evm_state, evm_tx)
SLAVE_S31: File "/code/pyquarkchain/quarkchain/evm/messages.py", line 178, in validate_transaction
SLAVE_S31: rp(tx, "balance", state.get_balance(tx.sender), total_cost)
SLAVE_S31: quarkchain.evm.exceptions.InsufficientBalance: <Transaction(09e6)>: 'balance' actual:8858050000000000 target:1000000000000000000
SLAVE_S31: 
SLAVE_S31: I1012 04:33:34.135638 slave.py:148] Closing connection with master: slave_master: error processing request: <Transaction(09e6)>: 'balance' actual:8858050000000000 target:1000000000000000000
SLAVE_S31: I1012 04:33:34.135953 slave.py:144] Lost connection with master
SLAVE_S31: I1012 04:33:34.136633 slave.py:144] Lost connection with master
SLAVE_S31: I1012 04:33:34.136918 slave.py:144] Lost connection with master
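
One way to get the desired behavior is to catch expected transaction-validation errors in the RPC handler path and return them as an error response, so the exception never reaches the code that closes the connection. The sketch below is self-contained and does not use the real pyquarkchain classes: `InsufficientBalance` is a local stand-in for `quarkchain.evm.exceptions.InsufficientBalance`, and the response dict, `handle_rpc_request`, and the 21000 return value are hypothetical.

```python
import asyncio


class InsufficientBalance(Exception):
    """Local stand-in for quarkchain.evm.exceptions.InsufficientBalance."""


async def estimate_gas(request):
    """Toy handler: fails the same way as the traceback above when balance is too low."""
    if request["balance"] < request["value"]:
        raise InsufficientBalance(
            "'balance' actual:%d target:%d" % (request["balance"], request["value"])
        )
    return 21000  # made-up gas estimate for the sketch


async def handle_rpc_request(handler, request):
    """Run a handler and turn expected validation failures into an error response.

    Unexpected exceptions still propagate, so real bugs are not hidden.
    """
    try:
        return {"error_code": 0, "result": await handler(request)}
    except InsufficientBalance as e:
        return {"error_code": 1, "error": str(e)}


if __name__ == "__main__":
    bad = {"balance": 8858050000000000, "value": 10**18}
    print(asyncio.run(handle_rpc_request(estimate_gas, bad)))
```

The master can then relay the error to the JSON-RPC caller while the slave connection stays up.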

ValueError: prev block not found

Using 3 machines, each running a cluster (started with mining enabled) with slaves and shards, when I trigger multiple consecutive load tests, I notice several errors similar to the following in the output:

SLAVE_S161: E1016 16:20:22.501752 shard.py:532] Traceback (most recent call last):
SLAVE_S161: File "/home/x/pyquarkchain/quarkchain/cluster/shard.py", line 530, in add_block
SLAVE_S161: xshard_list = self.state.add_block(block)
SLAVE_S161: File "/home/x/pyquarkchain/quarkchain/cluster/shard_state.py", line 656, in add_block
SLAVE_S161: self.__validate_block(block)
SLAVE_S161: File "/home/x/pyquarkchain/quarkchain/cluster/shard_state.py", line 424, in __validate_block
SLAVE_S161: block.header.hash_prev_minor_block.hex(),
SLAVE_S161: ValueError: [161] prev block not found, block height 226 prev hash 256fdd2e28f80a19beda8a7ea285483b708de6f5cf47a1f3b8c52998a12d7d28
