oxheadalpha / tezos-k8s Goto Github PK

View Code? Open in Web Editor NEW

50.0 10.0 26.0 3.57 MB

Deploy a Tezos Blockchain on Kubernetes

Home Page: https://tezos-k8s.io/

License: MIT License

Python 55.32% Shell 31.30% Dockerfile 1.87% Smarty 11.42% Ruby 0.09%

minikube tezos-chain zerotier kubernetes k8s tezos blockchain crypto faucet

tezos-k8s's People

Contributors

Stargazers

Watchers

tezos-k8s's Issues

Implement configuration that functions as equivalent of tezos-docker-manager.sh

See https://tezos.gitlab.io/introduction/howtoget.html#docker-images

remove nfs from default deployment

nfs setup is cumbersome

for default minikube setup rely on auto provisioned storage

Configure redis persistance for RPC authentication

remove flextesa dependency

flextesa is used to generate the block value for config.json

this introduces an additional dependency and another docker image download. can the same thing be accomplished with tezos binaries?

Update mkchain default tezos image to v8

provide tzkt indexer

provide a deployment description to run tzkt using the private chain config.

this deployment should be an optional feature for each cluster participating in the chain. every participant in the chain has the option of running their own indexer.

mkchain fails to run minikube's docker

https://github.com/tqtezos/tezos-k8s/blob/0b2ed250ae44ec23f492533130081dc511c1ecb5/mkchain/tqchain/mkchain.py#L31

executing the following

mkchain $CHAIN_NAME --zerotier-network $ZT_NET --zerotier-token $ZT_TOKEN

results in the error

bozon@buba test % minikube start       
😄  minikube v1.16.0 on Darwin 11.1
✨  Using the hyperkit driver based on existing profile
👍  Starting control plane node minikube in cluster minikube
🏃  Updating the running hyperkit "minikube" VM ...
🐳  Preparing Kubernetes v1.20.0 on Docker 20.10.0 ...
🔎  Verifying Kubernetes components...
🌟  Enabled addons: storage-provisioner, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default
bozon@buba test % eval $(minikube docker-env)
bozon@buba test % mkdir mkchain && cd mkchain
python3 -m venv .venv
source .venv/bin/activate

(.venv) bozon@buba mkchain % 
(.venv) bozon@buba mkchain % pip install mkchain==0.0.1
Collecting mkchain==0.0.1
  Using cached mkchain-0.0.1-py3-none-any.whl (5.0 kB)
Collecting pyyaml
  Using cached PyYAML-5.3.1.tar.gz (269 kB)
Using legacy 'setup.py install' for pyyaml, since package 'wheel' is not installed.
Installing collected packages: pyyaml, mkchain
    Running setup.py install for pyyaml ... done
Successfully installed mkchain-0.0.1 pyyaml-5.3.1
WARNING: You are using pip version 20.2.3; however, version 20.3.3 is available.
You should consider upgrading via the '/Users/bozon/Documents/TQ/test/mkchain/.venv/bin/python3 -m pip install --upgrade pip' command.
(.venv) bozon@buba mkchain % export PYTHONUNBUFFERED=x
(.venv) bozon@buba mkchain % export CHAIN_NAME=avb     
(.venv) bozon@buba mkchain % mkchain $CHAIN_NAME --zerotier-network $ZT_NET --zerotier-token $ZT_TOKEN
Traceback (most recent call last):
  File "/Users/bozon/Documents/TQ/test/mkchain/.venv/bin/mkchain", line 8, in <module>
    sys.exit(main())
  File "/Users/bozon/Documents/TQ/test/mkchain/.venv/lib/python3.9/site-packages/tqchain/mkchain.py", line 125, in main
    "genesis_chain_id": get_genesis_vanity_chain_id(),
  File "/Users/bozon/Documents/TQ/test/mkchain/.venv/lib/python3.9/site-packages/tqchain/mkchain.py", line 48, in get_genesis_vanity_chain_id
    run_docker(
  File "/Users/bozon/Documents/TQ/test/mkchain/.venv/lib/python3.9/site-packages/tqchain/mkchain.py", line 19, in run_docker
    return subprocess.check_output(
  File "/usr/local/Cellar/[email protected]/3.9.1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 420, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/local/Cellar/[email protected]/3.9.1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'docker run --entrypoint flextesa --rm registry.gitlab.com/tezos/flextesa:01e3f596-run vani "" --seed 4X4M58LLL043JZWS --first --machine-readable csv' returned non-zero exit status 127.

provide WireGuard as an option for the vpn layer to connect clusters

see https://gravitational.com/blog/announcing_wormhole/

Since tezos-k8s now has a docker folder containing custom containers (only zerotier today, but there will be more soon), we need to deploy these containers to docker hub (or somewhere else). It's best to do it through github actions.

Then, have mkchain figure out by itself if it's a tagged released version, and in this case, go fetch zerotier container from docker hub instead of expecting the container to be built by devspace.

This removes the devspace requirement for published releases.

add ability to start node from a snapshot

after a new node accepts an invite to join the chain they should be able to fast forward their sync by bootstrapping with a snapshot

Don't create bootstrap node p2p service if Zerotier is in use

provide snapshot export utility

the chain will require the ability to produce full and rolling snapshots

provide utility scripts to use existing techniques to generate snapshots and store them in configured locations, e.g. s3, local volume, etc.

provide tzstats indexer

provide a deployment description to run tzstats using the private chain config.

this deployment should be an optional feature for each cluster participating in the chain. every participant in the chain has the option of running their own indexer.

--number-of-bakers

This is an incremental feature on top of what we already have and is worth doing now.

Currently bootstrap-nodes is a k8s deployment while nodes are a k8s statefulset.

turn bootstrap-nodes into a statefulset
make the number of bakers configurable on mkchain i.e. --number-of-bakers just like --number-of-nodes
--generate-constants should generate the appropriate number of baking accounts
ensure every baking process in the pods are configured to bake for the account according to the statefulset pod number
every block should be baked at priority 0. scale down is supported, in which case some blocks will be baked at priority 1 or more.
scale up is not supported - you must decide at chain creation how many baking accounts you want, after that, changing the baker cycles require on-chain operations and waiting several cycles

Research removing docker dependency in mkchain

Currently using docker to generate keys (tezos docker image) and creating a genesis block hash (in flextessa). This is an akward and unnecessary complication on the pipeline and ideally would be replaced with a simpler solution if possible. This ticket requires research wrt this possibility.

Implement Tezos node RPC access via secret link

We need to be able to provide access to private/permissioned chains
via node RPC.

One approach is to set up user access to the private
network where chain operates. This works, but is fairly
complex. Consider, for example, setup with ZeroTier:

user signs in to ZT with a 3rd party account or creates a ZT account
user downloads and installs ZT software
user receives invitation to join ZT network and joins the network
ZT admin approves newly joined user

Another approach might be to authenticate access to RPC endpoints via
some HTTP authentication scheme, however most tools in the ecosystem
don't appear to support any and do not expose HTTP enough to do even
basic things like adding custom headers to requests.

Also, user account managements with a 3rd party or a custom system in
addition to Tezos accounts adds to overall complexity.

Thus we propose to implement a system that provides access to Tezos
node RPC via a secret URL - unique, unguessable URL that embeds access
token issued to a tz address after confirming tz address ownership.

In this system there is a web service ("vending machine") that issues the secret
URLs and a client who requests the secret URL to be issued to them
(multiple implementations possible, but initially some combination of standard tools - tezos-client, command
shell and curl)

Workflow to receive secret URL:

Client initiates the conversation:

curl -X POST https://vending-machine/$CHAIN_ID

Note that vending machine URL itself is a secret link emebedding chain id that needs to be shared with the client.
Vending machine responds with a GUID, e.g.

63a2096b01774122807a44e7695e5a9f

This GUID serves as both conversation identifier and nonce that will
be used to confirm ownership of a tz address
Client signs the guid and posts the guid, signature and public key

  TZ_ALIAS=key20200625
  GUID=63a2096b01774122807a44e7695e5a9f
  SIGNATURE=$(tezos-client -p PsCARTHAGazK sign bytes 0x05${GUID} for ${TZ_ALIAS} | cut -f 2 -d " ")
  PUBLIC_KEY=$(tezos-client show address ${TZ_ALIAS} 2>/dev/null | grep "Public Key:" | cut -d: -f2)
  curl -X POST -d "guid=${GUID}" -d "signature=${SIGNATURE}" -d "${PUBLIC_KEY}"  https://vending-machine

Server
- verifies that guid was previously issued by this vending machine
- validates signature
- if signature is not valid returns error, otherwise
- calculates tz address by hashing the public key
- generates access token for the tz address
- derives Tezos node RPC URL with the access token embedded, e.g.
  https://tezos-node-rpc/{ACCESS_TOKEN}
- returns the URL to the client
Client uses provided secret URL to access Tezos node RPC, e.g.
tezos-client --endpoint=${SECRET_URL} (with tezos 8.0)

provide a library of constant definitions with different performance characteristics

selecting meaningful constants for the protocol is both art and science. provide bundled constants as deployment options to replace stock parameters.json.

see for example https://gist.github.com/smondet/8b32b56cb4720d95347903ccc6604092

Move mkchain to Delphi

CI should always run the list_containers_to_publish job

Currently only run on release. Helpful to debug any potential issues with docker images before releasing.

Job publish_mkchain is currently run on all pushes and runs through all of its steps except for the actual publishing to pypi. Helpful for making sure the main steps work.

improve template generation

current mkchain script just uses python string templates to generate yaml definitons.

use a more widely accepted tool for the job like helm or kustomize

see: https://foghornconsulting.com/2020/06/04/helm-versus-kustomize/

Certain values specified in Helm values.yaml should be more consistent

e.g., the container_images prop is itself too long as well as the image names. The prop should really be images. Images themselves can just be zerotier, tezos, rpc-auth. Instead of zerotier_docker_image, rpc_auth_image, tezos_docker_image.

Making them consistent also helps with the CI getting the values so they can be updated as versions are bumped.

EDIT: Helm does not allow - in values. (It actually might be a Go templating issue). Will use _

Investigate occasional activate-job errors

I'm noticing that the activate-job is occasionally erring. Usually it restarts and the next try works. Other times all restarts fail.

Examples:

Running cmd similar to k -n tqtezos1 logs activate-job-2vmh8 -c activate to retrieve logs.

<<<<4: 500 Internal Server Error
  [ { "kind": "temporary", "id": "failure",
      "msg":
        "(Invalid_argument \"Json_encoding.construct: consequence of bad union\")" } ]
Error:
  (Invalid_argument "Json_encoding.construct: consequence of bad union")

<<<<2: 500 Internal Server Error
  [ { "kind": "permanent", "id": "proto.006-PsCARTHA.context.storage_error",
      "missing_key": [ "rolls", "owner", "current" ], "function": "copy" } ]
Error:
  Storage error:
    Cannot copy undefined key 'rolls/owner/current'.

Seb sent me some code (tezos source code I believe):

let () =
  register_error_kind
    `Permanent
    ~id:"context.storage_error"
    ~title: "Storage error (fatal internal error)"
    ~description:
      "An error that should never happen unless something \
       has been deleted or corrupted in the database."

Sometimes I see this error:

<<<<4: 500 Internal Server Error
  [ { "kind": "temporary", "id": "failure", "msg": "Fitness too low" } ]
Error:
  Fitness too low

Seb says he's seen that one very often when trying to activate a protocol on a chain that is already activated. Could be minikube has not removed all the necessary resources/storage after I deleted the namespace and re-applied the yml. This could also be related to the second error where there is deleted and/or corrupted data.

It should be possible to expand/reduce the number of bakers at run time

When a private network operator needs to onboard/oboard a 'validator' onto the network it needs to be able to simply 'add'/remove' another baker node to the network at run time and thus to reconfigure the network without rebuilding/restarting everything so to fulfil the SLAs and/or speed up the adjustment process.

Simplify node startup definitions - let node generate identity

According to https://tezos.gitlab.io/releases/version-8.html#changelog-version-8-0-rc1, starting with 8.0 rc1 tezos-node run now automatically generates identity. node.yaml and bootstrap.yaml can be now simplified and not check for identity.json themselves

Provide a more secure way to handle Zerotier credentials

It isn't very secure to set the ZT network and and auth token on the command line.

A couple of alt. possibilities:

User should add creds (and chain-name) to a file. Run . ./<file>
Pass a file path containing ZT creds as a flag to mkchain.
Set as an env var the path to the file.

Mkchain then reads from file.

Also, raw cred values are stored in values.yaml. Need to investigate setting them as encrypted secrets.

Allow user to choose CORS headers for public RPC node

       --cors-header=HEADER
           Header reported by Access-Control-Allow-Headers reported during
           CORS preflighting; may be used multiple times

       --cors-origin=ORIGIN
           CORS origin allowed by the RPC server via
           Access-Control-Allow-Origin; may be used multiple times

(shloud also be able to set them in the config.json)

move generate-constants to k8s jobs

We rely on locally installed docker to generate the constants by running short-lived commands on the tezos and flextesa container. This should mode to kubernetes jobs to reduce the HOWTO requirements.

Mkchain should then funnel the results out of kubernetes safely and back into a file, perhaps with kubectl cp or the python sdk equivalent.

Invite yaml should only contain a single peer node

Currently invite yaml gets the amount of nodes that the chain creator configured. Invitees should only have one node spun up. If they want more, they can edit the yaml file.

Package and publish Helm chart

For now I am going to publish to github pages.

Implement ability to spin up a local test network with configured number of bakers and nodes

"mkchain my-chain" in the readme fails - needs updating

The readme quickstart section of the readme contains the following:

mkchain my-chain

which when executed as instructed fails with the following message:

mkchain my-chain
usage: mkchain [-h] [--version] {generate-constants,create,invite} ...
mkchain: error: argument action: invalid choice: 'mkchain' (choose from 'generate-constants', 'create', 'invite')

The readme needs updating

implement ability to install on Windows

Windows is a popular desktop operating system. We should provide an easy to use installer or package that will automate or guide user through the necessary steps to spin up, configure and manage local tezos k8s infrastructure.

provide a mechanism to allocate funds

after the chain is created - only the bootstrap account is funded and so only the bootstrap account can bake

provide a mechanism for chain participants to request and receive funds to allow baking privileges.

Error starting rpc-auth in devspace dev: no space left on device

This is on aryeh/rpc-auth branch:

.venv ❯ devspace dev --var=CHAIN_NAME="$CHAIN_NAME"
[info]   Using kube context 'minikube'
[info]   Using namespace 'tqtezos'

? Please enter a value for FLASK_ENV development
development
[info]   Skip building image 'zerotier'
[done] √ Done building image tezos-rpc-auth:Wv4S0qV (rpc-auth)
[info]   Execute hook: sh '-c' 'mkchain create igor_chain > mkchain-devspace.yaml'
namespace/tqtezos unchanged
secret/tezos-secret unchanged
configmap/tqtezos-utils configured
configmap/tezos-config unchanged
statefulset.apps/tezos-node configured
job.batch/activate-job configured
service/tezos-bootstrap-node-rpc unchanged
service/tezos-bootstrap-node-p2p unchanged
deployment.apps/tezos-bootstrap-node configured
persistentvolumeclaim/tezos-bootstrap-node-pv-claim unchanged
[done] √ Successfully deployed chain with kubectl
[info]   Execute hook: minikube 'addons' 'enable' 'ingress'
🔎  Verifying ingress addon...
🌟  The 'ingress' addon is enabled
service/redis-service created
persistentvolumeclaim/redis-pv-claim created
configmap/redis-config created
deployment.apps/redis created
service/rpc-auth-service created
deployment.apps/rpc-auth created
ingress.networking.k8s.io/rpc-vending-machine-ingress created
ingress.networking.k8s.io/tezos-rpc-ingress created
[done] √ Successfully deployed rpc-auth with kubectl

#########################################################
[info]   DevSpace UI available at: http://localhost:8090
#########################################################

[done] √ Sync started on /Users/itkach/dev/tezos-k8s/docker/rpc-auth/server <-> . (Pod: tqtezos/rpc-auth-786cfb74c4-kktvm)
[info]   Starting log streaming for containers that use images defined in devspace.yaml

[rpc-auth-786cfb74c4-kktvm] *** Starting uWSGI 2.0.19.1 (64bit) on [Fri Nov 20 16:33:47 2020] ***
[rpc-auth-786cfb74c4-kktvm] compiled with version: 8.3.0 on 20 November 2020 15:31:22
[rpc-auth-786cfb74c4-kktvm] os: Linux-4.19.114 #1 SMP Wed Sep 2 16:52:19 PDT 2020
[rpc-auth-786cfb74c4-kktvm] nodename: rpc-auth-786cfb74c4-kktvm
[rpc-auth-786cfb74c4-kktvm] machine: x86_64
[rpc-auth-786cfb74c4-kktvm] clock source: unix
[rpc-auth-786cfb74c4-kktvm] detected number of CPU cores: 2
[rpc-auth-786cfb74c4-kktvm] current working directory: /var/rpc-auth
[rpc-auth-786cfb74c4-kktvm] detected binary path: /usr/local/bin/uwsgi
[rpc-auth-786cfb74c4-kktvm] !!! no internal routing support, rebuild with pcre support !!!
[rpc-auth-786cfb74c4-kktvm] your memory page size is 4096 bytes
[rpc-auth-786cfb74c4-kktvm] detected max file descriptor number: 1048576
[rpc-auth-786cfb74c4-kktvm] lock engine: pthread robust mutexes
[rpc-auth-786cfb74c4-kktvm] thunder lock: disabled (you can enable it with --thunder-lock)
[rpc-auth-786cfb74c4-kktvm] uwsgi socket 0 bound to TCP address 0.0.0.0:8080 fd 3
[rpc-auth-786cfb74c4-kktvm] Python version: 3.9.0 (default, Nov 18 2020, 13:39:06)  [GCC 8.3.0]
[rpc-auth-786cfb74c4-kktvm] Python main interpreter initialized at 0x55f4a9cd4260
[rpc-auth-786cfb74c4-kktvm] python threads support enabled
[rpc-auth-786cfb74c4-kktvm] your server socket listen backlog is limited to 100 connections
[rpc-auth-786cfb74c4-kktvm] your mercy for graceful operations on workers is 0 seconds
[rpc-auth-786cfb74c4-kktvm] mapped 2212928 bytes (2161 KB) for 100 cores
[rpc-auth-786cfb74c4-kktvm] *** Operational MODE: threaded ***
[rpc-auth-786cfb74c4-kktvm] WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0x55f4a9cd4260 pid: 8 (default app)
[rpc-auth-786cfb74c4-kktvm] *** uWSGI is running in multiple interpreter mode ***
[rpc-auth-786cfb74c4-kktvm] spawned uWSGI master process (pid: 8)
[rpc-auth-786cfb74c4-kktvm] spawned uWSGI worker 1 (pid: 10, cores: 100)
[rpc-auth-786cfb74c4-kktvm] failed to watch file "/var/lib/docker/containers/81943af4ae1bb6f976b339d06d556133ecfa03bce8876fa84c33ca7bc3ef4a34/81943af4ae1bb6f976b339d06d556133ecfa03bce8876fa84c33ca7bc3ef4a34-json.log": no space left on device

[warn]   Log streaming service has been terminated
[done] √ Sync and port-forwarding services are running (Press Ctrl+C to abort services)
^C^C

Allow external access to tezos node

External clients and tezos nodes should be able to talk to the node(s) running inside k8s cluster via RPC and/or p2p

add transfer to tutorial

describe how to generate a tz account and receive funds from a private chain creator in order to interact with a private chain

Re-organize the repo around components

This repo started life as a single component repo dedicated to mkchain.
Now it holds several different, distinct components, each with its own dependencies, documentation, dev setup etc. It would help to re-organize the repo to reflect this, with some content also living at the root of the project to bring it all together - top-level README and perhaps a Makefile and/or devspace.yaml and or other artifacts.

❯ tree
.
├── LICENSE
├── Makefile
├── README.md
├── devspace.yaml
├── mkchain
│   ├── MANIFEST.in
│   ├── README.md
│   ├── doc
│   ├── mkchain-devspace.yaml
│   ├── scripts
│   ├── setup.cfg
│   ├── setup.py
│   ├── test
│   ├── tqchain
│   │   ├── __pycache__
│   │   │   ├── _version.cpython-39.pyc
│   │   │   └── mkchain.cpython-39.pyc
│   │   ├── _version.py
│   │   ├── deployment
│   │   │   ├── bootstrap-node.yaml
│   │   │   ├── common.yaml
│   │   │   ├── node.yaml
│   │   │   ├── rpc-auth.yaml
│   │   │   └── zerotier.yaml
│   │   ├── mkchain.py
│   │   └── utils
│   │       ├── generateTezosConfig.py
│   │       └── import_keys.sh
│   └── versioneer.py
├── rpc-auth
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── rpc-auth-client.sh
│   ├── scripts
│   ├── server
│   │   └── index.py
│   └── test
└── zerotier
    ├── Dockerfile
    ├── entrypoint.sh
    └── supervisor-zerotier.conf

provide ZeroTier as an option for the vpn layer

github ci.yaml needs to be updated due to codebase restructure

Code restructure: #59

Re-organize documentation

Currently documentation is split across several files, such as README.md, MULTICLUSTER.md, DEVELOPMENT.md and sequence for successful setup is not sequential reading. Lets consolidate and tidy this up to be a coherent set of docs, also taking into account component-based structure that will be implemented in #53

mkchain should be able to accept more than 1 bootstrap peer

Currently the mkchain arg --bootstrap-peer only accepts one peer. It should be able to accept multiple. Either by specifying the arg multiple times and/or being able to pass many peers to the single arg.

make every --create input configurable within chain.yaml

At the moment there are 2 sources that can alter the output of mkchain --create: changing command line arguments or changing the input chain yaml generated by --generate-constants. This ticket merges these 2 sources into one.

ensure every cli argument has a yaml equivalent
--generate-constants should take every cli argument and record it into input yaml, so --create can be passed with zero arguments
--create still supports arguments, in which case they will override the input yaml (useful for i.e. scale up)

Enable creation of a baking operation (to join mainnet or otherwise) using k8s toolchain

Job To Be Done (JTBD) of this functionality:

When an organisation wants to become a participant in the crypto-economic system of Tezos, it would use the same infrastructure toolchain (helm/k8s/docker/etc) as for 'private network' to test, create, monitor and operate an institutional-grade production baking cluster - with high SLAs - in order to maximise the operation availability and thus baking rewards (revenue).

The following scenarios should be possible:

create a single or multi-node baking set-up and join mainnet
the same but joining a testnet of customers' choice
rolling updates of the baking cluster with newly released Tezos software versions

Occasionally nodes fail to connect over Zerotier

When I had a connection failure, I checked /var/tezos/zerotier_network_members.json in the container. I noticed that the name field for a member was set as "". I also noticed in one instance that there was nothing in the ipAssignments array either.

Empty name causes generateTezosConfig.py to not get the ip of the bootstrap node. Hence the peer nodes will not have the bootstrap node's ip in their bootstrap-peers list and they won't connect.

This is hard to reproduce.
Discussing with Nicolas, it seems we should fail the entrypoint.sh script on api call errors, and also fail the generateTezosConfig.py if it didn't find any bootstrap node peers.

Ability to update configMap through initContainers

We need the ability for the Tezos client to dynamically generate keys which can then be used in the configMap and throughout the rest of the k8s. Once the keys are generated, we need to be able to update the configMap so that the Tezos node can start with the right config.json and parameters.json for both genesis and baker keys.

allow user to specify docker image for tezos node

allow parameters for the yaml - helm? kustomize?

Automatically cleanup Zerotier members when deleting a chain

This suggestion is primarily for devs. Because devs are often spinning up new chains and then deleting them, ZT ip's keep increasing.

A couple of issues:

At a certain point, ZT website warns of too many members. A script was provided to delete network members. You can filter nodes to delete by diff values. However, this is a manual process that requires you to remember to do so.
Currently you are required to delete members in order for a new chain to work. This is bec bootstrap nodes are created and are assigned new ip's by ZT. Each peer node on launch looks for bootstrap node ip's to get the p2p service working, but often finds stale ip's that it thinks are live running nodes. Then nodes do not all connect to each other. This point seems to be fixed with pr #103

A possible solution to these problems is to create a Helm hook that on deletion of release (post-delete) fires a command to delete the ZT member. helm uninstall does not remove pvc's so that would need to be done likely via kubectl.

Most likely as this should be implemented only for developers, there needs to be a way to templatize the hook. Possible way is to create a separate values.yaml file for different envs. dev yaml will have a value that is used to add the hook yaml. This can be implemented with this simple suggestion. There are tools for this: https://github.com/roboll/helmfile. Nice intro to helmfile: https://www.arthurkoziel.com/managing-helm-charts-with-helmfile/

Other links: https://stackoverflow.com/questions/50596384/delete-kubernetes-secret-on-helm-delete?rq=1

CI seems to make pre-release a release

I noticed that test release 0.0.4 marked as pre-release, turned into a regular release while the CI was running. 0.0.2 and 0.0.1 still show as a prerelease. 0.0.3 may have been released as regular.

I don't know what changed in the CI config that would cause this issue.

It may have something to do with the action (https://github.com/softprops/action-gh-release) we are running to publish mkchain to pypi. Here: https://github.com/tqtezos/tezos-k8s/blob/908ca91043251bd9e4d4e4c3fa01d6d03097dbb2/.github/workflows/ci.yml#L109-L115
It is possible that it is setting the release from pre to regular. The action does have a parameter to set for prerelease.

Provide way to purge your zerotier network of stale nodes

As many k8s tezos chain spin up and downs happen during development, your free tier zerotier network warns you of maxing out the number of nodes that can be part of the network.

There is no way to bulk delete nodes from the UI.

Provide a script that can delete nodes for you.

Proposal: Use dhall for yaml generation instead of (mostly) python

I know that this has been talked about for a while, but I would like to formally make my proposal to use Dhall as our kubernetes yaml-templating solution. I have a working implementation to demonstrate as well, which can be found on the branch s-zeng/dhall_config. See below for more

Problem

Python, as a general purpose language, can often be not the most ergonomic language to write configurations in.

Our current process with mkchain.py involves quite a number of steps - we have a variety of existing yaml templates, which we consume as python data structures to edit a handful of them in place, and then spit everything back out as yaml again.

In particular, we have a main loop which involves looking through every yaml template, wherein we check each template for the appropriate name to apply a particular patch on. Essentially:

if safeget(k, "metadata", "name") == <a>:
    # apply patch <a>
    ...
if safeget(k, "metadata", "name") == <b>:
    # apply patch <b>
    ...

This style, in my opinion, obfuscates the relationship between the original templates and the logic that goes into adjusting them, and can make it hard to follow how the inputs become the final outputted yaml file.

Why Dhall

Unlike Python, Dhall is purpose-built as a configuration language, with a number of features designed specifically to this end. A few for summary:

Essentially JSON or YAML, but with functions
Strong types: Has a type system in the vein of ML/Haskell. Even has dependent types (!) if one is so inclined to go that far. Allows for much stronger guarantees about the structure of data
Pure and non-turing complete: unlike general-purpose languages, guarantees lack of IO, and guarantees termination. Dhall is a strongly-normalizing language, so every expression is guaranteed to normalize into a most-reduced form
Can output to always-correct JSON or YAML

Below are snippets from my prototype dhall implementation, along with comparisons to the original python if applicable.

main mkchain function (only have mkchain create implemented right now but should be trivial to implement invite:

let tezos = ./tezos/tezos.dhall

let k8s = tezos.common.k8s.Resource

let create =
      λ(opts : tezos.chainConfig.Type) →
        [ k8s.Namespace tezos.namespace
        , k8s.Secret (tezos.chainConfig.makeK8sSecret opts)
        , k8s.ConfigMap tezos.utilsConfigMap
        , k8s.ConfigMap (tezos.chainConfig.makeChainParams opts)
        , k8s.StatefulSet (tezos.makeNode opts)
        , k8s.Job (tezos.bootstrap.makeActivateJob opts)
        , k8s.Service tezos.bootstrap.port_services.rpc
        , k8s.Service tezos.bootstrap.port_services.p2p
        , k8s.Deployment (tezos.bootstrap.makeDeployment opts)
        , k8s.PersistentVolumeClaim tezos.bootstrap.pvc
        , k8s.PersistentVolumeClaim tezos.zerotier.pvc
        , k8s.ConfigMap (tezos.zerotier.makeZTConfig opts)
        , k8s.DaemonSet tezos.zerotier.bridge
        ]

in  { tezos, create }

Sample config + running (current weakness: stil needs an external script to generate things like keys and such. see below for more info):

let mkchain = ./mkchain.dhall

let opts =
      mkchain.tezos.chainConfig.ChainOpts::{
      , chain_name = "simons_chain"
      , baker = True
      , keys = toMap
          { baker_secret_key = "..."
          , bootstrap_account_1_secret_key = "..."
          , bootstrap_account_2_secret_key = "..."
          , genesis_secret_key = "..."
          }
      , bootstrap_timestamp = "2020-10-27T20:44:49.522093+00:00"
      , genesis_chain_id = "BL4D6Gg3XArdsawVRcvMiyny2QEhWtUzi4CXgzemUKkEnuedDQG"
      , additional_nodes = 4
      , zerotier_network = "6ab565387a53269c"
      , zerotier_token = "agnshaxe0mqF1WAiWDVHw1lIRbds9xyq"
      , zerotier_hostname = "46305149-64cb-4f74-aea0-7f249324b574"
      }

in  mkchain.create opts

The above can be used to output yaml with the following command:

dhall-to-yaml --documents <<< sample.dhall

Templating activate job in dhall:

let makeActivateJob =
      λ(opts : chainConfig.Type) →
        let varVolume =
              merge
                { minikube = common.volumes.bootstrap_pvc_var
                , eks = common.volumes.var
                }
                opts.cluster

        let spec =
              k8s.JobSpec::{
              , template = k8s.PodTemplateSpec::{
                , metadata = k8s.ObjectMeta::{ name = Some "activate-job" }
                , spec = Some k8s.PodSpec::{
                  , initContainers = Some
                    [ jobs.make_import_key_job opts
                    , jobs.config_generator
                    , jobs.wait_for_node
                    , jobs.make_activate_container opts
                    , jobs.make_bake_once opts
                    ]
                  , containers =
                    [ k8s.Container::{
                      , name = "job-done"
                      , image = Some "busybox"
                      , command = Some
                        [ "sh", "-c", "echo \"private chain activated\"" ]
                      }
                    ]
                  , restartPolicy = Some "Never"
                  , volumes = Some
                    [ common.volumes.config, varVolume, common.volumes.utils ]
                  }
                }
              }

        in  k8s.Job::{
            , metadata = common.tqMeta "activate-job"
            , spec = Some spec
            }

Compared to python yaml template + if statement:

apiVersion: batch/v1
kind: Job
metadata:
  name: activate-job
  namespace: "tqtezos"
spec:
  template:
    metadata:
      name: activate-job
    spec:
      initContainers:
      - name: import-keys
        command: ['sh', '/opt/tqtezos/import_keys.sh']
        envFrom:
        - secretRef:
            name: tezos-secret
        volumeMounts:
        - name: tqtezos-utils
          mountPath: /opt/tqtezos
        - name: var-volume
          mountPath: /var/tezos
      - imagePullPolicy: Always
        name: tezos-config-generator
        image: python:alpine
        command: ["python", "/opt/tqtezos/generateTezosConfig.py"]
        envFrom:
        - configMapRef:
            name: tezos-config
        volumeMounts:
        - name: config-volume
          mountPath: /etc/tezos
        - name: tqtezos-utils
          mountPath: /opt/tqtezos
        - name: var-volume
          mountPath: /var/tezos
      - name: wait-for-node
        image: busybox
        command: ['sh', '-c', 'until nslookup tezos-bootstrap-node-rpc; do echo waiting for tezos-bootstrap-node-rpc; sleep 2; done;']
      - name: activate
        command: ["/usr/local/bin/tezos-client"]
        volumeMounts:
        - name: config-volume
          mountPath: /etc/tezos
        - name: var-volume
          mountPath: /var/tezos
      - name: bake-once
        command: ["/usr/local/bin/tezos-client"]
        args: ["-A", "tezos-bootstrap-node-rpc", "-P", "8732", "-d", "/var/tezos/client", "-l", "bake", "for", "baker", "--minimal-timestamp"]
        volumeMounts:
        - name: config-volume
          mountPath: /etc/tezos
        - name: var-volume
          mountPath: /var/tezos
      containers:
      - name: job-done
        image: busybox
        command: ['sh', '-c', 'echo "private chain activated"']
      restartPolicy: Never
      volumes:
      - name: config-volume
        emptyDir: {}
      - name: var-volume
        emptyDir: {}
      - name: tqtezos-utils
        configMap:
          name: tqtezos-utils

                if safeget(k, "metadata", "name") == "activate-job":
                    k["spec"]["template"]["spec"]["initContainers"][0][
                        "image"
                    ] = c["docker_image"]
                    k["spec"]["template"]["spec"]["initContainers"][3][
                        "image"
                    ] = c["docker_image"]
                    k["spec"]["template"]["spec"]["initContainers"][3]["args"] = [
                        "-A",
                        "tezos-bootstrap-node-rpc",
                        "-P",
                        "8732",
                        "-d",
                        "/var/tezos/client",
                        "-l",
                        "--block",
                        "genesis",
                        "activate",
                        "protocol",
                        c["protocol_hash"],
                        "with",
                        "fitness",
                        "-1",
                        "and",
                        "key",
                        "genesis",
                        "and",
                        "parameters",
                        "/etc/tezos/parameters.json",
                    ]
                    k["spec"]["template"]["spec"]["initContainers"][4][
                        "image"
                    ] = c["docker_image"]

                    if args.cluster == "minikube":
                        k["spec"]["template"]["spec"]["volumes"][1] = {
                           "name": "var-volume",
                           "persistentVolumeClaim": {
                             "claimName": "tezos-bootstrap-node-pv-claim" } }

oxheadalpha / tezos-k8s Goto Github PK

tezos-k8s's People

Contributors

Stargazers

Watchers

Forkers

tezos-k8s's Issues

Job To Be Done (JTBD) of this functionality:

Problem

Why Dhall

Recommend Projects

Recommend Topics

Recommend Org