skalenetwork / skale-admin

SKALE admin docker container orchestrates all other SKALE Docker containers

Home Page: https://skale.network

License: GNU Affero General Public License v3.0

Dockerfile 0.10% Python 99.36% Shell 0.44% Jinja 0.10%

skale-network docker skale-admin

skale-admin's Introduction

SKALE Admin

This repo contains source code for 3 core SKALE Node containers:

skale_admin - worker that manages sChains creation and node rotation
skale_api - webserver that provides node API
celery - distributed task queue

API reference

The SKALE API reference can be found in the docs repo: SKALE Node API.

Development

Run tests locally

Run local ganache, download and deploy SKALE Manager contracts to it

ETH_PRIVATE_KEY=[..] MANAGER_BRANCH=[..] bash ./scripts/deploy_manager.sh

ETH_PRIVATE_KEY - any valid Ethereum private key (without the 0x prefix)
MANAGER_BRANCH - tag of the SKALE Manager image to use ($MANAGER_BRANCH-latest will be used)
SGX_WALLET_TAG - tag of the SGX simulator to use (optional, latest will be used by default)

List of the available SM tags: https://hub.docker.com/r/skalenetwork/skale-manager/tags
List of the available SGX tags: https://hub.docker.com/r/skalenetwork/sgxwalletsim/tags

Run SGX wallet simulator and all tests after it

ETH_PRIVATE_KEY=[...] SCHAIN_TYPE=[...] bash ./scripts/run_tests.sh

ETH_PRIVATE_KEY - any valid Ethereum private key (without the 0x prefix)
SCHAIN_TYPE - type of the chain for the DKG test (test2 - 2 nodes, test4 - 4 nodes, tiny - 16 nodes)

Test build:

export BRANCH=$(git branch | grep -oP "^\*\s+\K\S+$")
export VERSION=$(bash scripts/calculate_version.sh)
bash scripts/build.sh

License

All contributions to SKALE Admin are made under the GNU Affero General Public License v3. See LICENSE.


skale-admin's Issues

Dry run failed during insufficient funds on schain wallet

Preconditions
Skale manager: 1.8.0-beta.0
Skale admin: 1.1.0-beta.33
Transaction-manager: 1.1.0-beta.7
Step to reproduce
Create schain with not enough funds on schain wallet

Actual result
If the sChain wallet has insufficient funds, nodes still send the transaction and it is reverted. The gas cost of the reverted transaction is deducted from the node wallet.
Tx example:
https://rinkeby.etherscan.io/tx/0xc44d83474ad8612493d34345aa82cf6675f543efed35140cb6acac2dfe2937a4

┆Issue is synchronized with this Jira Bug

Missing mainnet chain id for IMA container

Due to the Berlin fork, the CID_MAIN_NET env variable is now required for the IMA container (--cid-main-net). The default value (-4) no longer works.
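
One way to surface this early is to validate the variable before the IMA container is started. The following is a minimal sketch, not the project's actual code: the function name `get_cid_main_net` and the `environ` parameter are hypothetical, and the rule that the chain ID must be positive is an assumption. Only `CID_MAIN_NET` and the old default of -4 come from the issue.

```python
import os


def get_cid_main_net(environ=os.environ):
    """Return the Mainnet chain ID for IMA, or raise if it is not configured.

    Hypothetical helper: only the CID_MAIN_NET name comes from the issue.
    """
    raw = environ.get('CID_MAIN_NET')
    if raw is None or raw.strip() == '':
        raise ValueError('CID_MAIN_NET must be set for the IMA container')
    chain_id = int(raw)
    if chain_id <= 0:
        # Assumption: valid chain IDs are positive, so the old default (-4)
        # is rejected here instead of being passed to --cid-main-net.
        raise ValueError(f'CID_MAIN_NET must be positive, got {chain_id}')
    return chain_id
```

Failing at configuration time gives a clearer error than letting the container start with a value it cannot use.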

┆Issue is synchronized with this Jira Bug

Update SM docker container and node components for the new skale-manager

The skale-manager compilation and deployment procedure has changed. The helper scripts, the Docker build for the skale-manager container, and other components should be updated accordingly.

┆Issue is synchronized with this Jira Task

Check web3_clientVersion

Treat the ETH node endpoint as trusted only if it is a Geth client.
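
Geth reports a `web3_clientVersion` string that starts with `Geth/`, so the check can be a simple prefix test on that string. A minimal sketch follows; the function name `is_geth_client` is hypothetical, the sample version strings are illustrative, and fetching the value over JSON-RPC is left out.

```python
def is_geth_client(client_version: str) -> bool:
    """Check a web3_clientVersion string.

    Example input (illustrative): 'Geth/v1.10.1-stable/linux-amd64/go1.16'
    """
    # The client name is the part before the first slash.
    name = client_version.split('/', 1)[0]
    return name.lower() == 'geth'
```

Comparing only the leading name keeps the check independent of the Geth version and platform suffixes.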

┆Issue is synchronized with this Jira Task

Rotate after block should be moved to configs.yml and depend on the env type and sChain size

Currently, we have only one value for rotate_after_block. It should be different for different sChain sizes and env types.
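
A lookup keyed by env type and sChain size is one possible shape for this. The sketch below is only an illustration: the table layout, the size labels, and every number are placeholders, not values from configs.yml, and `get_rotate_after_block` is a hypothetical name.

```python
# Hypothetical layout of the section that could live in configs.yml,
# shown here as a Python dict. All numbers are placeholders.
ROTATE_AFTER_BLOCK = {
    'mainnet': {'small': 1000, 'medium': 2000, 'large': 4000},
    'testnet': {'small': 100, 'medium': 200, 'large': 400},
}


def get_rotate_after_block(env_type, schain_size, table=ROTATE_AFTER_BLOCK):
    """Return rotate_after_block for the given env type and sChain size."""
    try:
        return table[env_type][schain_size]
    except KeyError:
        raise ValueError(
            f'No rotate_after_block for env={env_type!r}, size={schain_size!r}'
        ) from None
```

Raising on an unknown combination avoids silently falling back to a single global value, which is the situation the issue describes.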

┆Issue is synchronized with this Jira Task

Change logs dump command in the CLI to work without API

skale logs dump [PATH]

This command currently works via the skale-api Docker container.
Since all the required logic is now in node-cli, the logs dump command should be implemented in node-cli only.

┆Issue is synchronized with this Jira Task

Misleading wallet info

skale-admin/core/schains/utils.py

Line 31 in 49ccb7b

logger.info('Trying to notify not enough balance...')

The message does not actually notify that the wallet lacks ETH. It should be phrased along the lines of: checking that the node wallet has enough SKL tokens or ETH.

Secondly, this is the node wallet. Why should there be at least 0.1 SKALE tokens in the node wallet?

Admin: Increase reconnect time for SGX

Preconditions
Versions

Step to reproduce
Turn off SGX for 10-20 min before schain creation (not less than 10 min)
Start creating the schain

Actual result
SKALE admin has only ~600 seconds to reconnect to SGX.
As a result, for DKG to succeed we need to restart SKALE admin manually within 120 min.
Otherwise DKG will fail.

NOTE: This case is mostly related to SGX problems during the DKG process
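
The reconnect window can be expressed as a polling loop with a configurable deadline. This is a minimal sketch under stated assumptions: `wait_for_sgx` and its parameters are hypothetical names, `check` stands in for whatever call tests the SGX connection, and the timeout to use is left to the caller (the issue only states that the current window is about 600 seconds).

```python
import time


def wait_for_sgx(check, timeout, interval=10,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or `timeout` seconds elapse.

    `clock` and `sleep` are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With the timeout as a parameter, the window can be raised for DKG without touching the loop itself.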

┆Issue is synchronized with this Jira Task

Concurrent writes during block syncing check

The block syncing check contains two steps:

Check if last block timestamp is not older than 30 seconds.
Check if the last used block is less than the current one.
The second step contains the problem: there is a place in skale-admin where the last block is written to the file concurrently.
We can either disable the second step completely or fix it.
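
If the second step is kept, one possible fix is to write the file atomically so that a reader never sees a partially written value. The sketch below assumes the last block is stored as a small JSON file; the file layout and all function names are hypothetical. Note that an atomic rename prevents torn reads but does not order concurrent writers; that would still need a lock.

```python
import json
import os
import tempfile
import time


def save_last_block(path, number):
    """Write the block number to a temp file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, 'w') as f:
        json.dump({'number': number}, f)
    # Atomic on POSIX: readers see the old file or the new one, never a mix.
    os.replace(tmp_path, path)


def load_last_block(path):
    """Return the stored block number, or None if nothing was saved yet."""
    try:
        with open(path) as f:
            return json.load(f)['number']
    except FileNotFoundError:
        return None


def is_syncing_ok(block_number, block_timestamp, last_used_block,
                  now=None, max_age=30):
    """Run both steps of the check described above."""
    now = time.time() if now is None else now
    fresh = now - block_timestamp <= max_age  # step 1: not older than 30 s
    advanced = last_used_block is None or last_used_block < block_number  # step 2
    return fresh and advanced
```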

┆Issue is synchronized with this Jira Bug

Storage limit should be calculated dynamically

Currently, the storage limit is static and set here -> https://github.com/skalenetwork/skale-node/blob/4363af9b1b3ef64e302fa24321a5bdd7721ba2ab/schain_allocation.yml#L14
Instead, it should be calculated dynamically.

LevelDB (30% of the sChain volume) -> storage limit should be 70% of this 30%.
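
The rule above amounts to 70% of 30% of the sChain volume, i.e. 21% of the volume. A minimal sketch of the arithmetic, using integer math so the result stays in whole bytes; the constant and function names are hypothetical.

```python
LEVELDB_SHARE_PERCENT = 30  # share of the sChain volume given to LevelDB
STORAGE_SHARE_PERCENT = 70  # share of the LevelDB part used as storage limit


def calculate_storage_limit(volume_bytes: int) -> int:
    """Return the storage limit in bytes for an sChain volume of the given size."""
    leveldb_bytes = volume_bytes * LEVELDB_SHARE_PERCENT // 100
    return leveldb_bytes * STORAGE_SHARE_PERCENT // 100
```

For example, a volume of 1000 bytes gives a LevelDB part of 300 bytes and a storage limit of 210 bytes.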

┆Issue is synchronized with this Jira Task

Return disk size of attached storage in bytes from hardware checks

Currently, the attached storage size in hardware checks is returned in kilobytes, which is confusing because memory is returned in bytes. It's better to change it to bytes.

┆Issue is synchronized with this Jira Task

Schains don't work with a new sgxwallet which was backuped on new url

create 17+ nodes
create medium schains
create a backup of the first sgxwallet and do not run the container
copy the backup to the second SGX server and run a new sgxwallet from the backup
in .env on the node with schains, change the sgxwallet URL from the first to the second
update node

Expected: node and schains work fine with the new sgxwallet URL
Actual: schains still look at the first sgxwallet URL and get stuck

┆Issue is synchronized with this Jira Bug

skale-api returns incorrect DKG status

Watchdog returns false for DKG when DKG is completed after the n-th rotation, while the DKG check on skale-admin is true.
Watchdog version: 1.1.2-beta.0

[~accountid:5b2c7d78927da916aaaae26b] add admin version please

Watchdog returns:

Node ID   sChain Name              Data directory   DKG     Config file   Volume   Container   IMA     Firewall   RPC    Blocks
-------------------------------------------------------------------------------------------------------------------------------
13        rhythmic-pherkad-minor   True             False   True          True     True        False   True       True   True

┆Issue is synchronized with this Jira Bug

Admin: skaled container didn't restart after SIGABRT

Preconditions
Skale-admin:1.1.0-beta.25
Skaled: 3.4.9-develop.0

Step to reproduce
Spin up schain

Actual result

Admin didn't restart container after SIGABRT
Node: 44.241.162.179
Schain name: loud-gienah-cygni

Skaled log

Failed sChain checks
sChain name: loud-gienah-cygni


# Failed checks: rpc, blocks
[2021-02-10 11:42:05,498 INFO] tools.notifications.messages:119 - ThreadPoolExecutor-0_0 - Saving new checks state 399 [('blocks', False), ('config', True), ('container', True), ('data_dir', True), ('dkg', True), ('exit_code_ok', True), ('firewall_rules', True), ('ima_container', True), ('rpc', None), ('volume', True)]
[2021-02-10 11:42:05,520 INFO] web.models.schain:106 - ThreadPoolExecutor-0_0 - Changing first_run for loud-gienah-cygni to False
[2021-02-10 11:42:05,561 INFO] web.models.schain:116 - ThreadPoolExecutor-0_0 - Changing new_schain for loud-gienah-cygni to False
[2021-02-10 11:42:06,036 INFO] core.schains.creator:197 - ThreadPoolExecutor-0_0 - Running monitor for sChain loud-gienah-cygni in REGULAR mode
[2021-02-10 11:42:51,773 INFO] core.schains.creator:131 - MainThread - Creator procedure finished
[2021-02-10 11:42:51,890 INFO] core.schains.creator:87 - MainThread - Creator process is joined.



┆Issue is synchronized with this [Jira Bug](https://skalelabs.atlassian.net/browse/SKALE-3865)

Add node-cli and .removed_containers logs to skale logs dump command

Currently skale logs dump command downloads an archive with logs only from docker containers. The suggestion is to add node-cli and .removed_containers logs to the archive.

┆Issue is synchronized with this Jira Task

Fix sChain container restart after SSL certs upload

Currently, SSL certs are not picked up by sChain containers after upload. To fix this, we need to re-create each sChain container instead of restarting it.

┆Issue is synchronized with this Jira Bug

Remove debug APIs for skaled (Mainnet)

The --enable-debug-behavior-apis option is currently always present in the skaled CMD; it should be removed for the Mainnet build.

NOTE: Probably we should add some flag to the skale-node that will indicate testnet/mainnet/another setup.

┆Issue is synchronized with this Jira Task

Alright tx failed

Preconditions
Versions
20 nodes on the network

Step to reproduce
Create schains
Observe dkg tx

Actual result
The latest alright tx always fails:
https://rinkeby.etherscan.io/tx/0xaccca2c5a2434c8d698e2cb8a0804c41fc308b203caf76fbc04395ed2914ab1f

Solution: change the estimate gas multiplier to 1.5. The latest alright always costs more than the previous ones.
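
Applying the multiplier is a one-line calculation on top of the gas estimate. A minimal sketch: the 1.5 factor comes from the issue, while the names `GAS_MULTIPLIER` and `gas_limit_with_margin` are hypothetical and the estimate itself is assumed to come from the usual gas estimation call.

```python
import math

GAS_MULTIPLIER = 1.5  # value proposed in the issue


def gas_limit_with_margin(estimated_gas: int,
                          multiplier: float = GAS_MULTIPLIER) -> int:
    """Scale a gas estimate by the multiplier, rounding up to a whole unit."""
    return math.ceil(estimated_gas * multiplier)
```

Rounding up rather than down ensures the margin is never smaller than the multiplier implies.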

┆Issue is synchronized with this Jira Bug

Investigate preferred SSL certificates providers

Need to investigate and compare SSL certificate providers and associated costs to select preferred options for validator setup
Example: Let's Encrypt and others

┆Issue is synchronized with this Jira Task

Monitor doesn't recreate data dir for schain in RESTART mode

STR:

create 17+ nodes
create schain
run node exit on a node that contains 4 schains.
the first rotation DKG should fail for any of the schains.

Expected: schain eventually should be rotated and keep working.
Actual: node that had schain before second rotation cannot successfully save secret_key because of FileNotFoundError: [Errno 2] No such file or directory: '/skale_node_data/schains/squeaking-shaula/secret_key_2.json'

The problem is that after the failed DKG the data dir for the schain is removed, but during the second rotation the monitor in RESTART mode does not recreate it, so the DKG procedure eventually fails.
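
One defensive option is to make sure the data dir exists right before the secret key is written. The sketch below is an illustration, not the repository's code: `save_secret_key` is a hypothetical helper, and treating the number in `secret_key_2.json` as a rotation ID is an assumption based on the file name in the error above.

```python
import json
import os


def save_secret_key(data_dir: str, rotation_id: int, secret_key: dict) -> str:
    """Save the secret key, recreating the sChain data dir if it was removed."""
    # exist_ok=True makes this a no-op when the directory is already there.
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, f'secret_key_{rotation_id}.json')
    with open(path, 'w') as f:
        json.dump(secret_key, f)
    return path
```

This only removes the FileNotFoundError; whether the monitor should also recreate the directory in RESTART mode is a separate design question.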

┆Issue is synchronized with this Jira Bug
┆Attachments: logs.txt

Restructure sChain storage limits

Currently, internal sChain volume limits are calculated in 2 different ways for the different pieces. Consensus, LevelDB, and filestorage limits are calculated dynamically, but the storage limit is pre-set.

Everything should be generated in advance using a static file in skale-node (configs.yml) and a Python script in helper-scripts. The script should generate the schain_allocation.yml file with all params (in bytes).

┆Issue is synchronized with this Jira Task

Update ganache in all CI tests once London fork will be supported

We should update the ganache version that is currently used for testing purposes in all repos (admin, validator-cli, etc.) to a version that supports the London fork, so that the behavior is similar to the Mainnet.

Berlin-enabled ganache is not released yet. Track ticket: trufflesuite/ganache#821

NOTE: Not critical.

┆Issue is synchronized with this Jira Task

Always delete existing BTRFS snapshots when an outside snapshot is downloaded

┆Issue is synchronized with this Jira Bug

Login command returns strange error if the specified user does not exist

python main.py user login

Enter username: test
Enter password:
Authorization failed: {"errors": [{"msg": "<Model: User> instance matching query does not exist:\nSQL: SELECT "t1"."id", "t1"."username", "t1"."password", "t1"."token", "t1"."join_date" FROM "user" AS "t1" WHERE (("t1"."username" = ?) AND ("t1"."password" = ?)) LIMIT ? OFFSET ?\nParams: ['test', '098f6bcd4621d373cade4e832627b4f6', 1, 0]"}]}

Add CAP_SYS_NICE capability to docker container

CAP_SYS_NICE lets a process change thread priority

┆Issue is synchronized with this Jira Task

Change default route (0.0.0.0) to node public IP for sChain <-> IMA communication

Currently, IMA interacts with the sChain using the default bind-to-all interface (0.0.0.0), which causes unexpected errors. It should be changed to the node public IP address (SCHAIN_RPC_URL env variable for the IMA Docker container).
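
Building the URL from the node's public IP, and rejecting the unspecified address, could look like the sketch below. The function name `get_schain_rpc_url`, the `http` scheme, and the port argument are assumptions made for illustration; only the `SCHAIN_RPC_URL` variable and the 0.0.0.0 problem come from the issue.

```python
import ipaddress


def get_schain_rpc_url(node_ip: str, rpc_port: int) -> str:
    """Build the value for SCHAIN_RPC_URL from the node's public IP."""
    address = ipaddress.ip_address(node_ip)  # raises ValueError if malformed
    if address.is_unspecified:
        # 0.0.0.0 (or ::) is a bind address, not something a client can dial.
        raise ValueError('Node public IP must not be the unspecified address')
    return f'http://{address}:{rpc_port}'
```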

┆Issue is synchronized with this Jira Bug

skaled container is stuck when it has 'created' status

An infrequent error occurs during schain creation: sometimes the skaled container gets stuck in the 'created' status when running schain create.

versions:
admin:1.1.0-beta.39
schain:3.5.12-develop.0
node_cli:1.1.0-beta.22

str:
create schain on 16 VMs

┆Issue is synchronized with this Jira Bug

Add flag to disable IMA in skale-admin

IMA container should be optional. IMA should be enabled by default.

┆Issue is synchronized with this Jira Task

Increase the number of ports for schain

We need to increase the number of ports for a schain to 64

┆Issue is synchronized with this Jira Task

Increase skaled container stop timeout to 1 minute

Admin and node-cli should wait for 1 minute after stopping skaled before removing it. Currently, it's 40 seconds.

┆Issue is synchronized with this Jira Task

Check if schain is registered in IMA before creating IMA container

Currently, there is a check inside IMA that verifies whether the schain is registered. But we still create the container even if it wouldn't work. It's better to do this verification in skale-admin before container creation.

┆Issue is synchronized with this Jira Task

Test admin-skaled interaction using skaled-emulator

1. Create an schain using skalenetwork/skaled-emulator instead of skaled.
2. Check skaled statuses: which started successfully and which did not.
3. If any of the unstarted skaleds print "FAILURE" in the logs, report it.
4. Document the exit reason from the skaled logs and run repair on the unstarted skaleds.
5. After all 16 skaleds have started, run node update on one skaled (or several at once if that's easier) 10-20 times and check that everything is successful.

┆Issue is synchronized with this Jira Task

Cleanup chaos in exit codes

Sync exit codes of SKALE admin and skaled

┆Issue is synchronized with this Jira Task

When a node with many chains starts up after long time, start chains in sequence

Start chain1. Wait 10 minutes or more, so it can download snapshot.
Start chain2. Wait 10 minutes or more, so it can download snapshot.

etc
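
The sequence above can be sketched as a loop that pauses between consecutive starts. This is only an illustration: `start_chains_sequentially` is a hypothetical name, `start_chain` stands in for whatever actually launches an sChain, and the 600-second default reflects the "10 minutes or more" from the issue.

```python
import time


def start_chains_sequentially(chain_names, start_chain,
                              wait_seconds=600, sleep=time.sleep):
    """Start chains one by one, waiting between consecutive starts.

    `sleep` is injectable so the loop can be tested without waiting.
    """
    for index, name in enumerate(chain_names):
        if index > 0:
            # Give the previously started chain time to download its snapshot.
            sleep(wait_seconds)
        start_chain(name)
```

A fixed delay is the simplest reading of the issue; waiting on an explicit "snapshot downloaded" signal would be more precise if such a signal is available.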

┆Issue is synchronized with this Jira Bug

Add hardware and geth checks to skale-api

We need to verify that the node machine's hardware meets the requirements before schain creation. It's also better to ensure that the ETH client is running correctly. We need to add the corresponding health checks to skale-api.

┆Issue is synchronized with this Jira Task

Pull sChain Docker container in init and update procedures

In the current version, the sChain container is pulled by admin just before the start. Instead, it's better to download it in advance during the init and update procedures. This should help with the Created state issue.

┆Issue is synchronized with this Jira Task

Switch colors in logs off when sending logs to an external server

Switch colors in logs off when sending logs to an external server such as ELK
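
Colors in terminal logs are usually ANSI SGR escape sequences, so one option is a formatter that strips them for handlers that ship logs externally. A minimal sketch, assuming the colors are plain SGR sequences of the form `ESC[...m`; the names `strip_colors` and `NoColorFormatter` are hypothetical.

```python
import logging
import re

# Matches ANSI SGR sequences such as '\x1b[31m' (red) and '\x1b[0m' (reset).
ANSI_ESCAPE = re.compile(r'\x1b\[[0-9;]*m')


def strip_colors(text: str) -> str:
    """Remove ANSI color codes from a string."""
    return ANSI_ESCAPE.sub('', text)


class NoColorFormatter(logging.Formatter):
    """Formatter for handlers that send logs to an external server."""

    def format(self, record):
        return strip_colors(super().format(record))
```

Attaching this formatter only to the external handler keeps colors in the local console output.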

┆Issue is synchronized with this Jira Task

Add additional field in schain config

Need to update the schain config to make it possible to see true ETH on the schain in MetaMask.
Config example:
"skaleConfig": {
  "nodeInfo": {
    "nodeName": "Node1",
    "nodeID": 1112,
    "bindIP": "127.0.0.1",
    "basePort": 1231,
    "bindIP6": "::1",
    "basePort6": 1231,
    "logLevel": "trace",
    "logLevelProposal": "trace",
    "adminOrigins": [
      "*"
    ],
    "ipc": false,
    "ipcpath": "./ipcx",
    "db-path": "./node",
    "httpRpcPort": 15000,
    "httpsRpcPort": 15010,
    "wsRpcPort": 15020,
    "wssRpcPort": 15030,
    "httpRpcPort6": 15000,
    "httpsRpcPort6": 15010,
    "wsRpcPort6": 15040,
    "wssRpcPort6": 15050,
    "acceptors": 1,
    "infoHttpRpcPort": 16000,
    "infoHttpsRpcPort": 16010,
    "infoWsRpcPort": 16020,
    "infoWssRpcPort": 16030,
    "infoHttpRpcPort6": 16000,
    "infoHttpsRpcPort6": 16010,
    "infoWsRpcPort6": 16040,
    "infoWssRpcPort6": 16050,
    "info-acceptors": 1,

For more info ask [~accountid:5beaf49dc1d1402b40229cd2]

┆Issue is synchronized with this Jira Task

Improve snapshot sending/receiving procedure

The following approach is suggested:

1. For now, leave the Large schain type completely out of scope, because there are other issues that prevent us from creating it.
2. Release the first mainnet version without the related feature.
3. For the second mainnet update, implement one of the following solutions (to be decided later):

a. Save data to temporary space (reserved space inside the attached storage) without limiting the number of schains that are currently downloading snapshots (modifications in both skale node components and skaled).

b. Send and receive snapshots using streams without saving snapshots to any non-btrfs file/directory (requires only skaled changes).

Investigate end to end flow: moving from schain to another schain

Need to investigate the flow and functional changes to transition data from one schain to another
May imply starting schain from snapshot from another schain

┆Issue is synchronized with this Jira Task

Endpoint check fails on testnet nodes

Endpoint check fails on some Testnet nodes with 500 error

┆Issue is synchronized with this Jira Bug

When autorepair is called logs should be copied to some place and not deleted.

┆Issue is synchronized with this Jira Bug

Cleaner didn't handle rotated sChain

The sChain was rotated from the node due to the failed DKG, but the cleaner didn't remove it because of this condition check:

if not skale.schains_internal.is_schain_exist(schain_name) or \
        is_exited(schain_name, dutils=dutils):
    logger.info(arguments_list_string(
        {'sChain name': schain_name}, 'Removed sChain found')
    )

┆Issue is synchronized with this Jira Bug

Watchdog should check that the validator is connected to geth and not to Infura or Pockt

┆Issue is synchronized with this Jira Bug

Rotation. Node send complaint to another node without broadcast sending

create 17+ nodes
create schain
run node exit on a node that contains 4 schains.
the first rotation DKG should fail for the schain.

Expected: schain eventually should be rotated and keep working.
Actual_1: node G sends a complaint to node B before the broadcast transaction on node G

┆Issue is synchronized with this Jira Bug
┆Attachments: node-B-skale-logs-dump-2021-04-15-14_33_20.tar.gz | node-G-skale-logs-dump-2021-04-15-14_38_06.tar.gz

Design and implement new configs system for different envs

A new system that will allow us to set values such as log level, system requirements, etc. for different envs: testnet/mainnet/qa/other

┆Issue is synchronized with this Jira Task

SKALE Admin should ALWAYS clear data_dir before starting skaled with --download-snapshot

skaled cannot start from snapshot if there are local snapshots.
see proposed fix of this here: skalenetwork/skaled#455

┆Issue is synchronized with this Jira Bug

Rotation. Have node_id: -1 - the node did not find itself in the list of nodes for the schain from contract.

create 17+ nodes
create 4 schains
run node exit on a node that contains 4 schains.
the first rotation DKG should fail for the schain.

Expected: schain eventually should be rotated and keep working.
Actual_1: node A sends a complaint to node B after node A is removed from the schain squeaking-shaula group on the contract
Actual_1_1: the complaint was sent because skale_admin on node A does not see the broadcast that was sent from node B
Actual_2: node A tries to send a broadcast for schain 'squeaking-shaula', even though node A is not in the group for this schain on the contract
Actual_3: the monitor shows schain squeaking-shaula on the node but does not show this schain on the contract in the skale_admin logs

┆Issue is synchronized with this Jira Bug
┆Attachments: node-A-skale-logs-dump-2021-04-15-14_32_18.tar.gz | node-B-skale-logs-dump-2021-04-15-14_33_20.tar.gz | rotated-node-skale-logs-dump-2021-04-15-14_30_05.tar.gz

add dkg fake complaint test to skale-test-cli

Add a 5th DKG complaint type test to skale-test-cli

┆Issue is synchronized with this Jira Task

Add API to watchdog and skale-api to return only block health check for sChain

Currently watchdog (and skale-api) provides an ability to get only all sChain checks (config, DKG, RPC, blocks, volume, firewall, etc) which consumes lots of resources and time.
Proposal from validator: add an API to retrieve only the latest health check - blocks check, assuming that if blocks are mining and local RPC is available, then sChain is operating normally.

┆Issue is synchronized with this Jira Task
