
playground's Introduction

Playground

The playground is an example deployment of the Tinkerbell stack for use in learning and testing. It is not a production reference architecture. Please use the Helm chart for production deployments.

Quick-Starts

The following quick-start guides will walk you through standing up the Tinkerbell stack. There are a few options for this. Pick the one that works best for you.

Options

Next Steps

By default the Vagrant quickstart guides automatically install Ubuntu on the VM (machine1). You can provide your own OS template. To do this:

  1. Log in to the stack VM

    vagrant ssh stack
  2. Add your template. An example Template object can be found here and more Template documentation can be found here.

    kubectl apply -f my-OS-template.yaml
  3. Create the workflow. An example Workflow object can be found here.

    kubectl apply -f my-custom-workflow.yaml
  4. Restart the machine to provision (if using the vagrant playground test machine this is done by running vagrant destroy -f machine1 && vagrant up machine1)

playground's People

Contributors

cbkhare, chrisdoherty4, detiber, displague, dmajrekar, douglaswainer, gauravgahlot, gianarb, jacobweinstock, jarededwards, jgavinray, jmarhee, mergify[bot], micahhausler, mmlb, moadqassem, mrchrd, ncopa, nshalman, qmfrederik, rgl, splaspood, stappersg, swills, tstromberg


playground's Issues

examples/Debian OS provisioning not working- kexec?

When using the example listed at https://docs.tinkerbell.org/deploying-operating-systems/examples-debian/ I can't seem to get kexec working- I get a "function not implemented" error. It's unclear if this is due to a lack of kexec support in OSIE (should I be using OSIE?) or somewhere else.

Expected Behaviour

Using the example at the link above should execute kexec to switch to the new debian kernel after streaming the image.

Current Behaviour

The kexec action goes to STATE_FAILED with the error: function not implemented.

Possible Solution

Unsure if this is down to OSIE or the Debian image; I suspect the former, since as far as I know Debian Buster has kexec support baked in by default. Should I be using OSIE, or do the examples require something different (Hook?)

Steps to Reproduce (for bugs)

Create the template using the first example in https://docs.tinkerbell.org/deploying-operating-systems/examples-debian/
Create a workflow and boot the VM, observe the results.

Context

I am unable to use the kexec action and the workflow does not complete.

possibly related to #80

Your Environment

MacOS, using the vagrant method.

v0.7.0 release

Create a new release with the latest updated changes

Context

  • Semantic version Tagged images for all components at current version
  • Verify quickstart works against all targets
    • Vagrant/libvirt
    • Vagrant/Virtualbox
    • Docker compose on standalone host
    • Terraform/Equinix Metal

Issue with published sandbox libvirt bootstrap image

Expected Behaviour

Should be able to run vagrant up with the libvirt provisioner

Current Behaviour

Provisioning with vagrant/libvirt fails

With the currently published bootstrap boxes, it appears that the libvirt box has a 0 byte file for /usr/local/bin/docker-compose instead of the expected docker-compose binary.

Possible Solution

Publish an updated bootstrap image for libvirt that does not have a 0 byte file for /usr/local/bin/docker-compose
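
The symptom described above can be confirmed inside the box by testing that the file is both non-empty and executable. This is a minimal sketch; the function names check_binary and describe_binary are illustrative and not part of the sandbox.

```shell
# Succeeds only if the file exists, is non-empty (-s), and is executable (-x).
check_binary() {
  [ -s "$1" ] && [ -x "$1" ]
}

# Print a one-line verdict for the given path.
describe_binary() {
  if check_binary "$1"; then
    echo "ok: $1"
  else
    echo "broken or missing: $1"
  fi
}
```

For the case in this report, the call would be describe_binary /usr/local/bin/docker-compose.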

Steps to Reproduce (for bugs)

  1. On a Linux box with vagrant/libvirt, run vagrant up

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS): Linux/Fedora 33

  • How are you running Tinkerbell? Using Vagrant & Libvirt

provisioner: ./setup.sh: line 502: TINKERBELL_SKIP_NETWORKING: unbound variable

Expected Behaviour

Executing vagrant up provisioner should complete.

Current Behaviour

When executing vagrant up provisioner, it fails with:

provisioner: ./setup.sh: line 502: TINKERBELL_SKIP_NETWORKING: unbound variable

Possible Solution

It appears that the added variable TINKERBELL_SKIP_NETWORKING in #88 did not make it into ./generate-env.sh. By adding the following to ./generate-env.sh, this error in Vagrant doesn't occur:

export TINKERBELL_SKIP_NETWORKING=""

Then it works:

[[TRUNCATED]]
    provisioner: ++ export TINKERBELL_SKIP_NETWORKING=
    provisioner: ++ TINKERBELL_SKIP_NETWORKING=
    provisioner: + make_certs_writable
[[TRUNCATED]]
    provisioner: ++ export TINKERBELL_SKIP_NETWORKING=
    provisioner: ++ TINKERBELL_SKIP_NETWORKING=
    provisioner: + [[ -z '' ]]
    provisioner: + setup_networking ubuntu 18.04
[[TRUNCATED]]

Removed the .env file, set the parameter to true in ./generate-env.sh, and rebuilt:

[[TRUNCATED]]
    provisioner: ++ export TINKERBELL_SKIP_NETWORKING=true
    provisioner: ++ TINKERBELL_SKIP_NETWORKING=true
[[TRUNCATED]]
    provisioner: + [[ -z true ]]
    # IT WAS SKIPPED
    provisioner: + setup_osie
[[TRUNCATED]]

This failed when running in Vagrant, as expected, but it provides the desired skip behavior.
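
An alternative to exporting an empty value is to give the variable a default at the point of use, so that a script running with set -u does not abort when the variable is missing. The sketch below mirrors the [[ -z ... ]] check visible in the trace above; the function names are hypothetical and this is not the actual content of setup.sh.

```shell
# ${VAR:-} expands to an empty string when VAR is unset,
# so the test below is safe even under `set -u`.
should_skip_networking() {
  [ -n "${TINKERBELL_SKIP_NETWORKING:-}" ]
}

# Runs in a subshell with `set -u` to mimic the strict mode that
# produced the "unbound variable" error.
networking_plan() (
  set -u
  if should_skip_networking; then
    echo "skip setup_networking"
  else
    echo "run setup_networking"
  fi
)
```

With this form, an unset or empty variable leads to networking being set up, and any non-empty value skips it.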

Steps to Reproduce (for bugs)

  1. Walk through the steps at Local Setup with Vagrant.

Context

This prevents creating the local setup with Vagrant to allow sandbox'd learning.

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):
    macOS Big Sur 11.4

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:
    Using Vagrant

  • Link to your project or a code example to reproduce issue:
    Exactly as-is from this repository.

Change documentation to use sandbox

Change doc from git clone tinkerbell/tink to git clone tinkerbell/sandbox.

From what I can tell, only the Vagrant and Terraform setup guides need to change.

Docker Compose: "host" network_mode is incompatible with port_bindings

The following lines are in conflict:
https://github.com/tinkerbell/sandbox/blob/58937939c36239e9fb3fb03b5e744f7d0249a9c4/deploy/docker-compose.yml#L117

and

https://github.com/tinkerbell/sandbox/blob/58937939c36239e9fb3fb03b5e744f7d0249a9c4/deploy/docker-compose.yml#L144-L147

Resulting in:

root@tinkerbell:~/sandbox-master/deploy# source ../.env; docker-compose up -d
Building with native build. Learn about native build in Compose here: https://docs.docker.com/go/compose-native-build/
deploy_db_1 is up-to-date
Recreating deploy_nginx_1 ...
deploy_registry_1 is up-to-date
Recreating 17fbb5905e81_deploy_boots_1 ...
deploy_hegel_1 is up-to-date
Recreating deploy_nginx_1                 ... done
ERROR: for 17fbb5905e81_deploy_boots_1  "host" network_mode is incompatible with port_bindings
Recreating deploy_tink-server-migration_1 ... done
Recreating deploy_tink-server_1           ... done
deploy_tink-cli_1 is up-to-date

ERROR: for boots  "host" network_mode is incompatible with port_bindings
Traceback (most recent call last):
  File "docker-compose", line 3, in <module>
  File "compose/cli/main.py", line 80, in main
  File "compose/cli/main.py", line 192, in perform_command
  File "compose/metrics/decorator.py", line 18, in wrapper
  File "compose/cli/main.py", line 1165, in up
  File "compose/cli/main.py", line 1161, in up
  File "compose/project.py", line 708, in up
  File "compose/parallel.py", line 106, in parallel_execute
  File "compose/parallel.py", line 204, in producer
  File "compose/project.py", line 694, in do
  File "compose/service.py", line 580, in execute_convergence_plan
  File "compose/service.py", line 502, in _execute_convergence_recreate
  File "compose/parallel.py", line 106, in parallel_execute
  File "compose/parallel.py", line 204, in producer
  File "compose/service.py", line 495, in recreate
  File "compose/service.py", line 614, in recreate_container
  File "compose/service.py", line 333, in create_container
  File "compose/service.py", line 937, in _get_container_create_options
  File "compose/service.py", line 1069, in _get_container_host_config
  File "docker/api/container.py", line 598, in create_host_config
  File "docker/types/containers.py", line 339, in __init__
docker.errors.InvalidArgument: "host" network_mode is incompatible with port_bindings
[512463] Failed to execute script docker-compose

From my reading of this container configuration, it seems preferable to remove the network_mode: host line. It works either way (I tried both configurations and was able to complete a provisioning), but I'm not sure what the preferred resolution is, in case either option has implications I'm not considering. Thanks!
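
To find every service affected by this conflict, a compose file can be scanned for services that declare both network_mode: host and ports. This is a rough sketch and not a YAML parser: it assumes service names are indented by two spaces and their keys by four, and the function name is illustrative.

```shell
# Print the name of each service that sets both `network_mode: host` and `ports:`.
# Assumes two-space indentation for service names and four for their keys.
find_host_port_conflicts() {
  awk '
    function report() { if (svc != "" && host && ports) print svc }
    /^  [A-Za-z0-9_.-]+:[ \t]*$/ {
      report()                      # flush the previous service
      svc = $1; sub(/:$/, "", svc)  # strip the trailing colon
      host = 0; ports = 0
      next
    }
    /^    network_mode:[ \t]*"?host"?[ \t]*$/ { host = 1 }
    /^    ports:/ { ports = 1 }
    END { report() }
  ' "$1"
}
```

Running find_host_port_conflicts docker-compose.yml lists the services to fix; which of the two settings to drop is the open question raised in this issue.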

Vagrant box version appears not to exist

Expected Behaviour

Running vagrant up provisioner should spin up the provisioner vagrant box as usual.

Current Behaviour

An error indicating the box does not exist appears:

The box you're attempting to add has no available version that
matches the constraints you requested. Please double-check your
settings. Also verify that if you specified version constraints,
that the provider you wish to use is available for these constraints.

Box: tinkerbelloss/sandbox-ubuntu1804
Address: https://vagrantcloud.com/tinkerbelloss/sandbox-ubuntu1804
Constraints: 0.2.0
Available versions: 0.1.0

However, version 0.2.0 appears to be present at https://vagrantcloud.com/tinkerbelloss/sandbox-ubuntu1804

Steps to Reproduce (for bugs)

  1. Clone the sandbox repo.
  2. Run cd sandbox/deploy/vagrant && vagrant up provisioner

Output:

~/sandbox/deploy/vagrant# vagrant up provisioner
Bringing machine 'provisioner' up with 'virtualbox' provider...
==> provisioner: Box 'tinkerbelloss/sandbox-ubuntu1804' could not be found. Attempting to find and install...
    provisioner: Box Provider: virtualbox
    provisioner: Box Version: 0.2.0
==> provisioner: Loading metadata for box 'tinkerbelloss/sandbox-ubuntu1804'
    provisioner: URL: https://vagrantcloud.com/tinkerbelloss/sandbox-ubuntu1804
The box you're attempting to add has no available version that
matches the constraints you requested. Please double-check your
settings. Also verify that if you specified version constraints,
that the provider you wish to use is available for these constraints.

Box: tinkerbelloss/sandbox-ubuntu1804
Address: https://vagrantcloud.com/tinkerbelloss/sandbox-ubuntu1804
Constraints: 0.2.0
Available versions: 0.1.0

Context

Tested on MacOS and Debian with freshly cloned copies of the sandbox repo. This prevents use of sandbox on vagrant entirely.

Your Environment

tested on Debian Buster & MacOS Big Sur, using the vagrant provider for sandbox.

vagrant up provisioner fails

Expected Behaviour

Following the local setup with Vagrant, the vagrant up provisioner command fails.

Current Behaviour

image

Possible Solution

Steps to Reproduce (for bugs)

  1. Run vagrant up provisioner

Context

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):
    Windows

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:
    Vagrant & VirtualBox

  • Link to your project or a code example to reproduce issue:

  • https://docs.tinkerbell.org/setup/local-vagrant/

docker-compose quickstart breaks with ERROR: Invalid interpolation format for "tls-gen" option in service "services": "${FACILITY:-onprem}"

Expected Behaviour

$ docker-compose up -d
Creating network "compose_default" with the default driver
Creating volume "compose_postgres_data" with default driver
Creating volume "compose_certs" with default driver
Creating volume "compose_auth" with default driver
Pulling tls-gen (cfssl/cfssl:)...
latest: Pulling from cfssl/cfssl
* woot * ** party parrot **

Current Behaviour

$ docker-compose up -d
ERROR: Invalid interpolation format for "tls-gen" option in service "services": "${FACILITY:-onprem}"

Possible Solution

I think the precise version of docker-compose needs to be specified. 2.x is already public, 1.26 is too old, and 1.29.2 works for me.

$ curl -L "https://github.com/docker/compose/releases/download/1.29.2/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
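
A setup script could also check the installed version before starting anything. The sketch below checks only a lower bound and uses 1.29.2 as the minimum solely because that version worked for the reporter; it says nothing about 2.x. It relies on sort -V from GNU coreutils, and the function names are hypothetical.

```shell
# True when version $1 is greater than or equal to version $2.
# `sort -V` orders version strings; the smaller one comes first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: 1.29.2 is the minimum, taken from this report only.
compose_version_ok() {
  version_ge "$1" "1.29.2"
}
```

A typical call would pass the installed version, for example compose_version_ok "$(docker-compose version --short)".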

Steps to Reproduce (for bugs)

$ curl -L "https://github.com/docker/compose/releases/download/1.26/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

Context

I was trying the simplest way to get to a running tinkerbell setup, assuming that docker-compose might be a good starting point.

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):

using an ubuntu 20.04.3 LTS VM.

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:

ubuntu vm

  • Link to your project or a code example to reproduce issue:

n/a

Ansible as first-class means of sandbox deployment

Expected Behaviour

Well, I'm not sure if I really have any expectations beyond what is already provided.

Current Behaviour

Sandbox ships a number of deployment methods but an ansible playbook/role is not one of them. Particularly, docker-compose is a method that's attractive for small scale "dip your toes in" deployments but is not super friendly to those of us who plan to use podman instead of docker.

Possible Solution

Develop an ansible role that deploys the sandbox using either docker or podman. Along the lines of what is done with ceph-ansible.

Steps to Reproduce (for bugs)

  1. Evaluate sandbox repo.
  2. Draw conclusions from observation.

Context

We're attempting to use tinkerbell but we're not very interested in using docker. We've developed quite a liking for podman though and seeing as they can pretty much run the same containers, plan to develop a method for getting the sandbox up with podman. Recent versions of podman do support docker-compose via the podman unix socket but compose is a pretty fragile format it seems. Different versions of docker-compose support different directives in the yaml format (ex: service_completed_successfully was removed and later added back to docker-compose) and support varies a bit across podman versions. The use of docker-compose also implicitly suggests deployment is favored towards docker as opposed to other container runtimes like podman. Ansible would allow development of a more runtime neutral deployment method.

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS): Fedora 34 and Debian testing

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details: rootless podman containers

  • Link to your project or a code example to reproduce issue:

The tink_worker in terraformed sandbox doesn't get provisioned

Two disclaimers:

  1. I am still investigating this issue
  2. I am new to all of this but I did follow the guide and tried to do some basic troubleshooting.

After I reboot the tink_worker for the first time it doesn't get provisioned. (after running terraform apply).

My first intuition was a networking issue, especially that I can see a couple of "this doesn't work as it's supposed to" in the terraform file. I'll run a tcpdump on the server on port 67 to check. That said, the network does seem to be set up correctly when I check it on the Equinix Metal portal. It's not a networking issue.

If that's correct, I'm going to tinker in the worker itself, maybe I'm hitting #130? It's a bit odd though, I reran it a couple of times and it consistently didn't work.

Expected Behaviour

Tink-worker connects to the provisioner and one can see the worker under tink workflow events

Current Behaviour

The workflow is stuck in the PENDING state.

Steps to Reproduce (for bugs)

Run the instructions from here

Context

I was just trying to take Tinkerbell for a spin!

Your Environment

I'm running it on macOS, using the Terraform sandbox with Equinix Metal.

TLS server certificate must not contain the CA certificate

Expected Behaviour

The current generated bundle.pem must not contain the CA certificate as that fails the certificate validation.

Only the client must have the CA certificate. The server must not send it.

Current Behaviour

openssl s_client fails to validate the certificate:

root@provisioner:~# echo -e | openssl s_client -showcerts $TINKERBELL_HOST_IP:443 
CONNECTED(00000003)
Can't use SSL_get_servername
depth=1 L = @FACILITY@, CN = tinkerbell
verify error:num=19:self signed certificate in certificate chain
verify return:1
depth=1 L = @FACILITY@, CN = tinkerbell
verify return:1
depth=0 L = @FACILITY@, CN = tinkerbell
verify return:1
---
Certificate chain
 0 s:L = @FACILITY@, CN = tinkerbell
   i:L = @FACILITY@, CN = tinkerbell
-----BEGIN CERTIFICATE-----
MIIDpTCCAo2gAwIBAgIUQq5S0pUcxxU4w79dVE0tTljeG5EwDQYJKoZIhvcNAQEL
BQAwKjETMBEGA1UEBwwKQEZBQ0lMSVRZQDETMBEGA1UEAxMKdGlua2VyYmVsbDAe
Fw0yMTA5MTIxMjM4MDBaFw0yMjA5MTIxMjM4MDBaMCoxEzARBgNVBAcMCkBGQUNJ
TElUWUAxEzARBgNVBAMTCnRpbmtlcmJlbGwwggEiMA0GCSqGSIb3DQEBAQUAA4IB
DwAwggEKAoIBAQCynjRTI6Kx37youYrFHpd0hgFGxYkik0DzgCoIRQRFIuxkR6SU
XthNL3tZArogCcqh8jD0MdEcIVlX4mlOVHHaiyEKJd9sxZFvotcSXGhS/upo+/bn
SXha9vtpBFgRyYmyccCXVwNzoDnoRHYL54t3eS+q4AePmHCsua5mthmTF1OJmRrA
rdQ7BvmEMok7Zuk09dUNowdAyDvtIBV7WOLZFtd7uTixebrsQ4L9xcSXZ6zwHp5Q
4TaiigRKz+fmsJI9O1LSNohDUg2tQOQO1VGfMaylYxyl3aIKwSEQXqlIZdUyfsTc
NqZuoOUozF2zMGT7E7oI8t6JvvMIr+AnIC9FAgMBAAGjgcIwgb8wDgYDVR0PAQH/
BAQDAgWgMBMGA1UdJQQMMAoGCCsGAQUFBwMBMAwGA1UdEwEB/wQCMAAwHQYDVR0O
BBYEFDcL3jfy/ZmSI2VMSAsWKnvm3wUPMGsGA1UdEQRkMGKCE3RpbmtlcmJlbGwu
cmVnaXN0cnmCFXRpbmtlcmJlbGwudGlua2VyYmVsbIIKdGlua2VyYmVsbIILdGlu
ay1zZXJ2ZXKCCWxvY2FsaG9zdIcECgMAAocEwKgyBIcEfwAAATANBgkqhkiG9w0B
AQsFAAOCAQEAoOxbT5d3wpuk3DF8d7KA3rvC94US8lO6O9j+omZtIHBtfbSKObVA
2UJYzK5vu7TEFZ776GalNVyZGSqcpgoYEncFCHCwkWFxLa0Ep/U2wh5iZYsFori5
Y8Piwqae0hN0TJTvpHqQlPQ+XTcBg/Y3YguoZfLvBZj9aUxSyWmqp7WYND4oLfXh
wXG7XWdoppHnoqpjeFFZ1iXEVhwEIyKv0vzs7yWZGxrS01V83VMVDNeSFKbasGpx
F0hkA+ZAEzeOKiLQKfIYYK0Lm2dih0+yQzcqgrJ1aCze8jVht9xAxo/aT7xozEDA
5kSWIpr5QRGTWjA1h6+l1tiVpG2r6KlE3Q==
-----END CERTIFICATE-----
 1 s:L = @FACILITY@, CN = tinkerbell
   i:L = @FACILITY@, CN = tinkerbell
-----BEGIN CERTIFICATE-----
MIIDQTCCAimgAwIBAgIUauM0LZhuobLZKZoYIV76UehK2DwwDQYJKoZIhvcNAQEL
BQAwKjETMBEGA1UEBwwKQEZBQ0lMSVRZQDETMBEGA1UEAxMKdGlua2VyYmVsbDAe
Fw0yMTA5MTIxMjM4MDBaFw0yNjA5MTExMjM4MDBaMCoxEzARBgNVBAcMCkBGQUNJ
TElUWUAxEzARBgNVBAMTCnRpbmtlcmJlbGwwggEiMA0GCSqGSIb3DQEBAQUAA4IB
DwAwggEKAoIBAQCoYEtG3Oc726/uYeCSVme6gU6OXvXq2237VJts2AhkpKQJibJm
Hlc+1aaTdSDx7wWsHWj3krKeG9wDdz4QaVUGDTbusKAQh22+odUQd73oL5SOsLSJ
0qRXt8kyvkMPbjkfW2BX9xa+AR2w+6nOtgqMaUYdY8/mYkQuWPRR2+phZAyV/x+V
sEEjWRRJELvfKzIodwSNI9adTOGPHw6yyUguaQyEUxpa33+2AI27bu4cThNspQ1Q
ok3TCi9YwJ2xyNZQ3WTSTCFMsmV03PhBC+30+xwBfYM3ytzjW1rx+aUIKHMvoe94
ISmPu3+MzpD4oSRqqzJJp9WIp76XdF9m1jYLAgMBAAGjXzBdMA4GA1UdDwEB/wQE
AwIBBjAPBgNVHRMBAf8EBTADAQH/MB0GA1UdDgQWBBRibN7CiS54PIFZiXzXSUv/
7dirHDAbBgNVHREEFDAShwQKAwAChwTAqDIEhwR/AAABMA0GCSqGSIb3DQEBCwUA
A4IBAQBSWhkw324WFjQrd+chAMS5mdyj8wkUQX4gNgkyrVv9QqSzvoZiPGDCRg8X
rHGiKOkZ+ZgjsW2eVsBNctWAO8QTZvTmmXJCUg58Ro+K6d0vGYQVMlsBWXYm2OEg
D+GkmA2wAq41pZJsDH5XP//x1qTDTuAWSYTt5IuKFSN8sbe94D3swhnvQ6SsX5VA
C0RCmfeTqpBqvSSnbUtKFXnuZSDWGwsTCu5/jW0R1pEBAO/XMZeOqHqI+QltrMfZ
0yWuevZKNJwDvXjOTTlXkGwR48dXAfmU9Z6zhnKVkdgFPTcwUnlWDs0YeWeHTETF
qrnwiMjhUj6jc7bTJse0aFLkWKRH
-----END CERTIFICATE-----
---
Server certificate
subject=L = @FACILITY@, CN = tinkerbell

issuer=L = @FACILITY@, CN = tinkerbell

---
No client certificate CA names sent
Peer signing digest: SHA256
Peer signature type: RSA
Server Temp Key: X25519, 253 bits
---
SSL handshake has read 2350 bytes and written 376 bytes
Verification error: self signed certificate in certificate chain

wget fails to validate the certificate:

root@provisioner:~# wget "https://$TINKERBELL_HOST_IP/v2/_catalog"
--2021-09-12 14:40:59--  https://10.3.0.2/v2/_catalog
Connecting to 10.3.0.2:443... connected.
ERROR: cannot verify 10.3.0.2's certificate, issued by ‘CN=tinkerbell,L=@FACILITY@’:
  Self-signed certificate encountered.
To connect to 10.3.0.2 insecurely, use `--no-check-certificate'.

curl does not fail to validate the certificate, but it should, so I'm not really sure what is going on:

root@provisioner:~# curl "https://$TINKERBELL_HOST_IP/v2/_catalog"
{"errors":[{"code":"UNAUTHORIZED","message":"authentication required","detail":[{"Type":"registry","Class":"","Name":"catalog","Action":"*"}]}]}

Possible Solution

  1. Modify https://github.com/tinkerbell/sandbox/blob/main/deploy/compose/tls/generate.sh#L27 to not bundle the ca into the bundle.pem file. Instead it should split them into a server-crt.pem and a ca.pem file.
  2. Modify everything that uses this (at least the following services need to be changed: registry and tink-server).
  3. When creating the CA certificate, use a different common name for the CA and the server certificate to make troubleshooting easier.
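
The split proposed in step 1 can be illustrated on an already generated bundle. This is a sketch under the assumption that the server certificate comes first in bundle.pem and the CA certificate follows, as in the s_client output above; it is not the change to generate.sh itself, and the function names are illustrative.

```shell
# Count the certificates in a PEM file.
count_certs() {
  grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Write the first certificate of the bundle to $2 and all later ones to $3.
# Assumption: the first certificate is the server (leaf) certificate.
split_bundle() {
  awk -v server="$2" -v ca="$3" '
    /-----BEGIN CERTIFICATE-----/ { n++ }
    n == 1 { print > server }
    n >= 2 { print > ca }
  ' "$1"
}
```

For example, split_bundle bundle.pem server-crt.pem ca.pem leaves one certificate in each output file when the bundle holds two.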

Steps to Reproduce (for bugs)

  1. Start the provisioner.
  2. Execute the command used in the Current Behaviour section above.

Figure out how to generate a changelog

Expected Behaviour

As part of the release CI pipeline, we should generate a changelog that collects updates from all the projects: tink, osie, Hegel, boots.

I think a list of PR titles is enough, but I want to take extra care with breaking changes. They should be listed in their own section so that users can see what they are and what they have to do to update (every PR that contains a breaking change includes a section explaining how to mitigate it).

Current Behaviour

We do not have it

Possible Solution

Use a tool that does this automatically, or write a custom binary.

Hegel not exposed

Documentation indicates we call the provisioner ip /metadata

For example, if you are using the Vagrant Setup, Hegel runs as part of the Provisioner virtual machine with the IP: 192.168.1.2. When the Worker starts and if you have logged in to osie using the password root you can access the metadata for your server via cURL:

Looks like

https://github.com/tinkerbell/hegel/blob/master/http_server.go#L31

https://github.com/tinkerbell/hegel/blob/master/main.go#L105

It seems to be running on port 50061. But 192.168.1.2 is the nginx service, and it doesn't seem to be proxying through to Hegel. Unless I'm missing something?

Expected Behaviour

If I hit 192.168.1.2/metadata I should hit hegel

Current Behaviour

It hits nginx which is just serving its webroot. Only way to get it up is to hit that port directly.

Possible Solution

Change nginx config to proxy through.
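
One possible shape for that nginx change is a location block that forwards /metadata to Hegel. The sketch below only prints such a block. The upstream address 127.0.0.1:50061 is an assumption based on the port mentioned above, and where the block belongs in the sandbox's nginx configuration is not covered here.

```shell
# Print an nginx location block that forwards /metadata to Hegel.
# Assumption: Hegel is reachable from nginx at the given host:port.
hegel_proxy_snippet() {
  upstream=${1:-127.0.0.1:50061}
  cat <<EOF
location /metadata {
    proxy_pass http://${upstream};
}
EOF
}
```

Calling hegel_proxy_snippet 192.168.1.2:50061 prints the same block with a different upstream.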

Steps to Reproduce (for bugs)

  1. Deploy
  2. Provision hello world on worker
  3. on worker access http://192.168.1.2/metadata

Context

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS): MacOS

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details: vagrant

  • Link to your project or a code example to reproduce issue:

Centralise versions in a single place

Right now versions for our components are everywhere:

  • docker-compose
  • generate-envrc

We need to create a single file that contains all of them. That way it will be much easier to bump them and to check what a user is running. Ideally, we will only have to ask: "Can you copy and paste the FILE_VERSION from sandbox?"
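
As a rough illustration of the idea, a single NAME=VALUE file could be sourced by the setup scripts and printed for bug reports. The variable names and the example-tag values below are placeholders, not the project's real names or versions.

```shell
# Write a single versions file. Names and values are placeholders.
write_versions_file() {
  cat > "$1" <<'EOF'
TINK_SERVER_VERSION=example-tag
BOOTS_VERSION=example-tag
HEGEL_VERSION=example-tag
EOF
}

# Print every *_VERSION entry, sorted, so it can be pasted into an issue.
print_versions() {
  grep -E '^[A-Z_]+_VERSION=' "$1" | sort
}
```

Because the file is plain NAME=VALUE lines, a shell script can load it with the dot command before it uses any of the values.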

x509 cert error on docker-compose

Hey,

I'm running the Docker Compose quick-start and I'm getting this error:

{"level":"info","ts":1648207945.5831676,"caller":"boots/dhcp.go:91","msg":"retrieved job is empty","service":"github.com/tinkerbell/boots","pkg":"main","type":"DHCPDISCOVER","mac":"ac:1f:6b:c7:ba:da","err":"discover from dhcp message: get hardware by mac from tink: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 192.168.56.4, 127.0.0.1, not 10.126.118.60"","errVerbose":"rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: x509: certificate is valid for 192.168.56.4, 127.0.0.1, not 10.126.118.60"\nget hardware by mac from tink\ngithub.com/tinkerbell/boots/packet.(*client).DiscoverHardwareFromDHCP\n\t/opt/actions-runner/_work/boots/boots/packet/endpoints.go:108\ngithub.com/tinkerbell/boots/job.discoverHardwareFromDHCP.func1\n\t/opt/actions-runner/_work/boots/boots/job/fetch.go:17\ngithub.com/golang/groupcache/singleflight.(*Group).Do\n\t/home/github/go/pkg/mod/github.com/golang/[email protected]/singleflight/singleflight.go:56\ngithub.com/tinkerbell/boots/job.discoverHardwareFromDHCP\n\t/opt/actions-runner/_work/boots/boots/job/fetch.go:19\ngithub.com/tinkerbell/boots/job.CreateFromDHCP\n\t/opt/actions-runner/_work/boots/boots/job/job.go:106\nmain.dhcpHandler.serveDHCP\n\t/opt/actions-runner/_work/boots/boots/cmd/boots/dhcp.go:89\nmain.dhcpHandler.ServeDHCP.func1\n\t/opt/actions-runner/_work/boots/boots/cmd/boots/dhcp.go:50\ngithub.com/gammazero/workerpool.startWorker\n\t/home/github/go/pkg/mod/github.com/gammazero/[email protected]/workerpool.go:218\nruntime.goexit\n\t/opt/actions-runner/_work/_tool/go/1.16.3/x64/src/runtime/asm_amd64.s:1371\ndiscover from dhcp message"}

The VM where I'm running docker-compose has two interfaces (public and private). The bare-metal machines are on the private network, where this VM holds the IP 10.126.118.60. Any idea why I'm getting this cert error?
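
The error message itself states that the certificate lists 192.168.56.4 and 127.0.0.1 but not 10.126.118.60, the address being dialed. A small helper can confirm which IPs a certificate covers by reading the text form that openssl prints. This is a sketch; the function name is illustrative and the certificate path in the usage note is a placeholder.

```shell
# Read `openssl x509 -noout -text` output on stdin and succeed if the
# Subject Alternative Name list contains exactly the given IP address.
san_has_ip() {
  tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fxq "IP Address:$1"
}
```

A check against a real certificate would look like: openssl x509 -in CERT.pem -noout -text | san_has_ip 10.126.118.60, where CERT.pem stands for whichever certificate tink-server presents.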

docker-compose on Linux: "host" network_mode is incompatible with port_bindings

Expected Behaviour

docker-compose up should work without any issue with any version.

Current Behaviour

On my Mac I run:

$ docker-compose version
docker-compose version 1.27.4, build 40524192
docker-py version: 4.3.1
CPython version: 3.7.7
OpenSSL version: OpenSSL 1.1.1g  21 Apr 2020

I installed the latest docker-compose version on Linux:

$ docker-compose version
docker-compose version 1.28.0, build d02a7b1a
docker-py version: 4.4.1
CPython version: 3.9.0
OpenSSL version: OpenSSL 1.1.1d  10 Sep 2019

The behavior is different. My Mac works fine. On Linux I get:

  File "docker/api/container.py", line 598, in create_host_config
  File "docker/types/containers.py", line 338, in __init__
docker.errors.InvalidArgument: "host" network_mode is incompatible with port_bindings
[14315] Failed to execute script docker-compose

Possible Solution

I think the error is reasonable, we should remove the network_mode: host if not necessary.

Steps to Reproduce (for bugs)

  1. Install docker-compose https://docs.docker.com/compose/install/
  2. Run sandbox

Hegel filters out important information for running TB workflows

Expected Behaviour

The executed ubuntu workflow in the workflows repo will work as expected.

Current Behaviour

While running the ubuntu TB workflow from this repo, the workflow will fail due to lack of information from Hegel meta server:

echo 'metadata.facility.plan_slug is missing, empty or null'
metadata.facility.plan_slug is missing, empty or null
functions.sh: line 172: $3: unbound variable

Possible Solution

Can be found attached to this PR #63

Steps to Reproduce (for bugs)

  1. Setup TB using the tf provisioner on equinix in the sandbox project
  2. Execute this workflow
  3. Take a look at the logs of the tink-worker in worker node

Context

I am trying to install ubuntu after the worker boots up.

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):
    OSIE(Alpine Linux)
  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:
    on Packet using Terraform
  • Link to your project or a code example to reproduce issue:

Add missing version constraints for providers

Currently, when running terraform init in the deploy/terraform directory, the following messages are printed:

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, we recommend adding version constraints in a required_providers block
in your configuration, with the constraint strings suggested below.

* hashicorp/null: version = "~> 2.1.2"
* hashicorp/template: version = "~> 2.1.2"
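
The constraints suggested in that message could be pinned with a required_providers block. The sketch below writes one to a file; the file name versions.tf is an assumption, and the source attribute syntax assumes Terraform 0.13 or later. The version strings are the ones printed above.

```shell
# Write a required_providers block using the constraints Terraform suggested.
# Assumptions: Terraform 0.13+ syntax; the target file name is up to the caller.
write_provider_constraints() {
  cat > "$1" <<'EOF'
terraform {
  required_providers {
    null = {
      source  = "hashicorp/null"
      version = "~> 2.1.2"
    }
    template = {
      source  = "hashicorp/template"
      version = "~> 2.1.2"
    }
  }
}
EOF
}
```

For example, write_provider_constraints deploy/terraform/versions.tf, followed by another terraform init.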

mount.nfs requested NFS version or transport protocol is not supported

Debian10
Vagrant 2.2.18
Most recent sandbox

Expected Behaviour

cd sandbox/deploy/vagrant && vagrant up provisioner - starts the sandbox

Current Behaviour

==> provisioner: Mounting NFS shared folders...
==> provisioner: Pruning invalid NFS exports. Administrator privileges will be required...
==> provisioner: Removing domain...
==> provisioner: Deleting the machine folder
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!

mount -o vers=4,rw,vers=4,tcp 192.168.121.1:/home/banner/Documents/Lib/Tinkerbell/deploy /vagrant

Stdout from the command:

Stderr from the command:

mount.nfs: requested NFS version or transport protocol is not supported

Possible Solution

Steps to Reproduce (for bugs)

  1. Install debian10
  2. install vagrant
  3. install kvm-libvirt
  4. run vagrant up

Context

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:
    Vagrant + Libvirt

I tried this on two separate servers, both debian10. Both incurred the same error.

vagrant plugin list
vagrant-docker-compose (1.5.1, global)
vagrant-libvirt (0.5.3, global)

It's late and I could easily have overlooked something dumb, but I haven't gone out of my way to do crazy things here. I just pulled the repo and ran vagrant up.

Kubernetes Resource Model - docker-compose extension

I'm looking to add support in the sandbox for running Tinkerbell components using the Kubernetes Resource Model (KRM) as outlined in tinkerbell/proposals#46. I was thinking of adding docker-compose support, but wanted to get some input on approach.

Currently there is a single docker-compose.yml in deploy/compose. With KRM, however, some services/containers won't be needed (database, tink-cli, etc.). Given the overlap, could we have three compose files, since compose supports combining multiple files? The result would be something like docker-compose.base.yml, docker-compose.k8s.yml, and docker-compose.db.yml. To run the existing stack, you'd have to run:

docker-compose -f docker-compose.base.yml -f docker-compose.db.yml up -d

and for the KRM stack:

docker-compose -f docker-compose.base.yml -f docker-compose.k8s.yml up -d

This gets away from the simplicity of docker-compose up -d, but enables reuse of services like registry, and common config on services like boots, hegel, tink-server, and nginx.
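
Some of the lost simplicity could be recovered with a small wrapper that maps a stack name to its -f arguments. This is a sketch that assumes the three file names proposed above; the function name and the stack names db and k8s are illustrative.

```shell
# Echo the -f arguments for a stack: "db" (existing stack) or "k8s" (KRM stack).
compose_file_args() {
  case "$1" in
    db|k8s) echo "-f docker-compose.base.yml -f docker-compose.$1.yml" ;;
    *) echo "unknown stack: $1" >&2; return 1 ;;
  esac
}
```

The KRM stack would then start with docker-compose $(compose_file_args k8s) up -d.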

The only alternative I see is to have a separate, largely duplicated compose file.

Any opinions here?

Provisioner should be setup to NAT for the workers

I'm not sure why we didn't do this from the beginning, and I can't think of a good reason for the default setup not to give workers internet access via the provisioner. If we enable routing and NAT on the provisioner, we'd be able to drop the local registry and the need to sync/re-tag images during setup, and workflows would be able to fetch what they need directly from the internet.
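As a rough sketch of what "routing and NAT" could mean on a Linux provisioner, the function below prints the usual sysctl and iptables commands for masquerading traffic from the worker network. The function name and the interface names passed to it are assumptions; nothing here is taken from the sandbox's setup.sh.

```shell
# Print (do not run) the commands that would enable forwarding and NAT.
# $1: interface facing the internet (assumed, e.g. eth0)
# $2: interface facing the workers (assumed, e.g. eth1)
print_nat_rules() {
  wan="$1"
  lan="$2"
  echo "sysctl -w net.ipv4.ip_forward=1"
  echo "iptables -t nat -A POSTROUTING -o $wan -j MASQUERADE"
  echo "iptables -A FORWARD -i $lan -o $wan -j ACCEPT"
  echo "iptables -A FORWARD -i $wan -o $lan -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT"
}

# Example (not run here): review the output, then apply it as root.
#   print_nat_rules eth0 eth1 | sudo sh
```

Printing the commands first makes it easy to review them before anything touches the host's firewall.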

ERROR: for images-to-local-registry Container ... is unhealthy

I'm trying to set up a provisioner using the docker-compose setup, and I get an unhealthy status for the following containers:

ERROR: for images-to-local-registry Container "356a4ca1b50a" is unhealthy.

ERROR: for osie-bootloader Container "fa22bda087e8" exited with code 1.

Expected Behaviour

Current Behaviour

Possible Solution

Steps to Reproduce (for bugs)

  1. following your README for setting up a provisioner

Context

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):

  • Linux Ubuntu 20.04.2 LTS

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:

  • I'm trying to use a Proxmox VM for the provisioner.

  • Link to your project or a code example to reproduce issue:

Support skipping network configuration in setup.sh

I'm running the sandbox on a bare metal environment and don't need setup.sh to do any networking for me. Can I add an escape hatch (maybe just set TINKERBELL_SKIP_NETWORKING to some non-zero length value) to setup.sh? I still want osie, certs, and container registry setup, so it'd be nice to not have to run something hacky like sed -i 's,setup_networking ,#setup_networking ,g' setup.sh.
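The escape hatch described above could look roughly like the wrapper below. This is a sketch only: setup_networking is replaced by a stand-in so the snippet is self-contained, and the wrapper name maybe_setup_networking is hypothetical.

```shell
# Stand-in for the real setup_networking function in setup.sh.
setup_networking() {
  echo "INFO: configuring network"
}

# Hypothetical wrapper: skip networking when TINKERBELL_SKIP_NETWORKING
# is set to any non-empty value, otherwise behave as before.
maybe_setup_networking() {
  if [ -n "${TINKERBELL_SKIP_NETWORKING:-}" ]; then
    echo "INFO: TINKERBELL_SKIP_NETWORKING is set, skipping network setup"
    return 0
  fi
  setup_networking "$@"
}
```

The `${VAR:-}` form keeps the check safe even if the script runs with `set -u`.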

"vagrant up provisioner" failing on Ubuntu 20.04.2 with libvirtd backend

I have a machine running Ubuntu 20.04.2 where, when following the documentation, the vagrant up provisioner step fails (details below). I was able to bisect the failure to commit 9edecbf.

Expected Behaviour

vagrant up provisioner should succeed

Current Behaviour

vagrant up provisioner fails

Last bit of output is this:

    provisioner: + cd /certs
    provisioner: + '[' '!' -f ca-key.pem ]
    provisioner: + cfssl gencert -initca ca.json
    provisioner: + cfssljson -bare ca
    provisioner: 2021/03/03 16:13:19 [INFO] generating a new CA key and certificate from CSR
    provisioner: 2021/03/03 16:13:19 [INFO] generate received request
    provisioner: 2021/03/03 16:13:19 [INFO] received CSR
    provisioner: 2021/03/03 16:13:19 [INFO] generating key: rsa-2048
    provisioner: 2021/03/03 16:13:19 [INFO] encoded CSR
    provisioner: 2021/03/03 16:13:19 [INFO] signed certificate with serial number 6198577501498366853967432593968947302343768126
    provisioner: + '[' '!' -f
    provisioner:  server.pem ]
    provisioner: + cfssl
    provisioner:  gencert
    provisioner:  '-ca=ca.pem'
    provisioner:  '-ca-key=ca-key.pem'
    provisioner:  '-config=/ca-config.json' '-profile=server' server-csr.json
    provisioner: + cfssljson -bare server
    provisioner: 2021/03/03 16:13:19 [INFO] generate received request
    provisioner: 2021/03/03 16:13:19 [INFO] received CSR
    provisioner: 2021/03/03 16:13:19 [INFO] generating key: rsa-2048
    provisioner: 2021/03/03 16:13:19 [INFO] encoded CSR
    provisioner: 2021/03/03 16:13:19 [INFO] signed certificate with serial number 185473496253083687571958641817618602018650548890
    provisioner: +
    provisioner: cat
    provisioner:  server.pem
    provisioner:  ca.pem
    provisioner: +
    provisioner: cmp
    provisioner:  -s
    provisioner:  bundle.pem.tmp
    provisioner:  bundle.pem
    provisioner: +
    provisioner: mv
    provisioner:  bundle.pem.tmp
    provisioner:  bundle.pem
    provisioner: Error: No such object:
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.

Possible Solution

Unclear, but git bisect blames 9edecbf

Steps to Reproduce (for bugs)

  1. Take existing Ubuntu 20.04.2 machine with Vagrant and Libvirt installed
  2. clone this repo
  3. cd deploy/vagrant
  4. vagrant up provisioner

Context

I am unable to use Vagrant and Libvirt to bring up the provisioner.

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):
    Ubuntu 20.04.2

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:
    Vagrant and Libvirt

GitHub Self hosted runner uses go cache

Current Behaviour

The Vagrant tests are not actually being run because their results are served from the Go test cache.

Possible Solution

Set GOFLAGS="-count=1" in the .env action-runner file

Environment

GitHub Action self hosted runner

/cc. @rawkode

Master doesn't compile

[root@73a5a635-85b2-4a26-cd19-bd3e89bc9c36 boots]$ go build -v
github.com/tinkerbell/boots/tftp

github.com/tinkerbell/boots/tftp

tftp/tftp.go:15:20: undefined: ipxe.MustAsset
tftp/tftp.go:16:20: undefined: ipxe.MustAsset
tftp/tftp.go:17:20: undefined: ipxe.MustAsset
tftp/tftp.go:18:20: undefined: ipxe.MustAsset

osie symlinks are breaking this sandbox

Expected Behaviour

The sandbox provisioner starts.

Current Behaviour

The sandbox provisioner fails to start.

Possible Solution

Unknown at the moment.

Steps to Reproduce (for bugs)

Follow the steps here: https://docs.tinkerbell.org/setup/local-vagrant/

Context

I am unable to test tinkerbell in the pre-designed and isolated vagrantbox.

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):
    Windows 10

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:
    Vagrant

  • Link to your project or a code example to reproduce issue:
    I added a few commands to print the PWD and its contents, which showed that symlinks are the issue.

    provisioner: + curl https://tinkerbell-oss.s3.amazonaws.com/osie-uploads/osie-v0-n=404,c=c35a5f8,b=master.tar.gz -o ./osie.tar.gz
    provisioner:   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
    provisioner:                                  Dload  Upload   Total   Spent    Left  Speed
 50 2174M   50 1107M    0     0  17.7M      0  0:02:02  0:01:01  0:01:01 15.9M
 96 2174M   96 2100M    0     0  16.7M      0  0:02:09  0:02:05  0:00:04 17.1M
100 2174M  100 2174M    0     0  16.8M      0  0:02:08  0:02:08 --:--:-- 21.0M
    provisioner: + tar -zxf osie.tar.gz
    provisioner: + pushd osie-v0-n=404,c=c35a5f8,b=master/
    provisioner: /tmp/tmp.20jmEiTnhL/osie-v0-n=404,c=c35a5f8,b=master /tmp/tmp.20jmEiTnhL /vagrant
    provisioner: + mv workflow-helper.sh workflow-helper-rc /vagrant/deploy/state/webroot/workflow/
    provisioner: ++ pwd
    provisioner: THE PWD IS /tmp/tmp.20jmEiTnhL/osie-v0-n=404,c=c35a5f8,b=master
    provisioner: + echo 'THE PWD IS /tmp/tmp.20jmEiTnhL/osie-v0-n=404,c=c35a5f8,b=master'
    provisioner: + ls -la ./discover-metal-x86_64.tar.gz ./discover-rc ./discover.sh ./grub ./initramfs-2a2 ./initramfs-aarch64 ./initramfs-amp ./initramfs-hua ./initramfs-qcom ./initramfs-tx2 ./initramfs-x86_64 ./modloop-2a2 ./modloop-aarch64 ./modloop-amp ./modloop-hua ./modloop-qcom ./modloop-tx2 ./modloop-x86_64 
./osie-aarch64.tar.gz ./osie-installer-rc ./osie-installer.sh ./osie-runner-x86_64.tar.gz ./osie-x86_64.tar.gz ./repo-aarch64 ./repo-x86_64 ./rescue-helper-rc ./rescue-helper.sh ./runner-rc ./runner.sh ./vmlinuz-2a2 ./vmlinuz-aarch64 ./vmlinuz-amp ./vmlinuz-hua ./vmlinuz-qcom ./vmlinuz-tx2 ./vmlinuz-x86_64
    provisioner: -rw-r--r--  1 root root 1063704576 Jan  6  2021 ./discover-metal-x86_64.tar.gz
    provisioner: -rw-r--r--  1 root root        133 Jan  6  2021 ./discover-rc
    provisioner: -rw-r--r--  1 root root       2499 Jan  6  2021 ./discover.sh
    provisioner: -rw-r--r--  1 root root   41651322 Jan  6  2021 ./initramfs-2a2
    provisioner: -rw-r--r--  1 root root   45498385 Jan  6  2021 ./initramfs-aarch64
    provisioner: -rw-r--r--  1 root root   42696062 Jan  6  2021 ./initramfs-amp
    provisioner: -rw-r--r--  1 root root   37430247 Jan  6  2021 ./initramfs-hua
    provisioner: -rw-r--r--  1 root root   42708137 Jan  6  2021 ./initramfs-qcom
    provisioner: -rw-r--r--  1 root root   49447343 Jan  6  2021 ./initramfs-tx2
    provisioner: -rw-r--r--  1 root root  130723463 Jan  6  2021 ./initramfs-x86_64
    provisioner: -rw-r--r--  1 root root   39305216 Jan  6  2021 ./modloop-2a2
    provisioner: -rw-r--r--  1 root root  188854272 Jan  6  2021 ./modloop-aarch64
    provisioner: -rw-r--r--  1 root root   40345600 Jan  6  2021 ./modloop-amp
    provisioner: -rw-r--r--  1 root root   35037184 Jan  6  2021 ./modloop-hua
    provisioner: -rw-r--r--  1 root root   40349696 Jan  6  2021 ./modloop-qcom
    provisioner: -rw-r--r--  1 root root   47124480 Jan  6  2021 ./modloop-tx2
    provisioner: -rw-r--r--  1 root root  209412096 Jan  6  2021 ./modloop-x86_64
    provisioner: -rw-r--r--  1 root root  706898432 Jan  6  2021 ./osie-aarch64.tar.gz
    provisioner: -rw-r--r--  1 root root         96 Jan  6  2021 ./osie-installer-rc
    provisioner: -rw-r--r--  1 root root       7003 Jan  6  2021 ./osie-installer.sh
    provisioner: -rw-r--r--  1 root root  423728640 Jan  6  2021 ./osie-runner-x86_64.tar.gz
    provisioner: -rw-r--r--  1 root root  892384256 Jan  6  2021 ./osie-x86_64.tar.gz
    provisioner: lrwxrwxrwx  1 root root         20 Jan  6  2021 ./repo-aarch64 -> ../../../alpine/edge
    provisioner: lrwxrwxrwx  1 root root         21 Jan  6  2021 ./repo-x86_64 -> ../../../alpine/v3.12
    provisioner: -rw-r--r--  1 root root        123 Jan  6  2021 ./rescue-helper-rc
    provisioner: -rw-r--r--  1 root root       1019 Jan  6  2021 ./rescue-helper.sh
    provisioner: -rw-r--r--  1 root root         69 Jan  6  2021 ./runner-rc
    provisioner: -rw-r--r--  1 root root       4764 Jan  6  2021 ./runner.sh
    provisioner: -rw-r--r--  1 root root   10412544 Jan  6  2021 ./vmlinuz-2a2
    provisioner: -rw-r--r--  1 root root   14852608 Jan  6  2021 ./vmlinuz-aarch64
    provisioner: -rw-r--r--  1 root root   11272704 Jan  6  2021 ./vmlinuz-amp
    provisioner: -rw-r--r--  1 root root    8912384 Jan  6  2021 ./vmlinuz-hua
    provisioner: -rw-r--r--  1 root root   11270656 Jan  6  2021 ./vmlinuz-qcom
    provisioner: -rw-r--r--  1 root root   11270656 Jan  6  2021 ./vmlinuz-tx2
    provisioner: -rw-r--r--  1 root root    6699168 Jan  6  2021 ./vmlinuz-x86_64
    provisioner:
    provisioner: ./grub:
    provisioner: total 92
    provisioner: drwxr-xr-x 23 root root 4096 Jul 25 03:02 .
    provisioner: drwxr-xr-x  3 root root 4096 Jul 25 03:03 ..
    provisioner: drwxr-xr-x 35 root root 4096 Jul 25 03:02 centos_7
    provisioner: drwxr-xr-x 36 root root 4096 Jul 25 03:02 centos_8
    provisioner: drwxr-xr-x 31 root root 4096 Jul 25 03:02 debian_10
    provisioner: drwxr-xr-x 30 root root 4096 Jul 25 03:02 debian_8
    provisioner: drwxr-xr-x 34 root root 4096 Jul 25 03:02 debian_9
    provisioner: drwxr-xr-x 33 root root 4096 Jul 25 03:02 opensuse_42_3
    provisioner: drwxr-xr-x 31 root root 4096 Jul 25 03:02 rhel_7
    provisioner: drwxr-xr-x 31 root root 4096 Jul 25 03:02 rhel_8
    provisioner: drwxr-xr-x 27 root root 4096 Jul 25 03:02 scientific_6
    provisioner: drwxr-xr-x 24 root root 4096 Jul 25 03:02 suse_sles12_sp3
    provisioner: drwxr-xr-x 29 root root 4096 Jul 25 03:02 ubuntu_14_04
    provisioner: drwxr-xr-x 40 root root 4096 Jul 25 03:02 ubuntu_16_04
    provisioner: drwxr-xr-x 32 root root 4096 Jul 25 03:02 ubuntu_17_04
    provisioner: drwxr-xr-x 32 root root 4096 Jul 25 03:02 ubuntu_17_10
    provisioner: drwxr-xr-x 40 root root 4096 Jul 25 03:02 ubuntu_18_04
    provisioner: drwxr-xr-x 39 root root 4096 Jul 25 03:02 ubuntu_19_04
    provisioner: drwxr-xr-x 39 root root 4096 Jul 25 03:02 ubuntu_19_10
    provisioner: drwxr-xr-x 40 root root 4096 Jul 25 03:02 ubuntu_20_04
    provisioner: drwxr-xr-x 40 root root 4096 Jul 25 03:02 ubuntu_20_10
    provisioner: drwxr-xr-x 40 root root 4096 Jul 25 03:02 vmware_nsx_2_5_0
    provisioner: drwxr-xr-x 40 root root 4096 Jul 25 03:02 vmware_nsx_3_0_0
    provisioner: + cp -r ./discover-metal-x86_64.tar.gz ./discover-rc ./discover.sh ./grub ./initramfs-2a2 ./initramfs-aarch64 ./initramfs-amp ./initramfs-hua ./initramfs-qcom ./initramfs-tx2 ./initramfs-x86_64 ./modloop-2a2 ./modloop-aarch64 ./modloop-amp ./modloop-hua ./modloop-qcom ./modloop-tx2 ./modloop-x86_64 ./osie-aarch64.tar.gz ./osie-installer-rc ./osie-installer.sh ./osie-runner-x86_64.tar.gz ./osie-x86_64.tar.gz ./repo-aarch64 ./repo-x86_64 ./rescue-helper-rc ./rescue-helper.sh ./runner-rc ./runner.sh ./vmlinuz-2a2 ./vmlinuz-aarch64 ./vmlinuz-amp ./vmlinuz-hua ./vmlinuz-qcom ./vmlinuz-tx2 ./vmlinuz-x86_64 /vagrant/deploy/state/webroot/misc/osie/current
    provisioner: cp: cannot create symbolic link '/vagrant/deploy/state/webroot/misc/osie/current/repo-aarch64': Protocol error
    provisioner: cp: cannot create symbolic link '/vagrant/deploy/state/webroot/misc/osie/current/repo-x86_64': Protocol error
    provisioner: + finish
    provisioner: + rm -rf /tmp/tmp.20jmEiTnhL
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.
PS C:\Users\user\projects\sandbox\deploy\vagrant>
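In the log above, cp fails only on the two symlinks (repo-aarch64 and repo-x86_64). "Protocol error" is commonly seen when creating symlinks on a VirtualBox shared folder backed by a Windows host. One possible workaround, sketched below, is to skip top-level symlinks during the copy. This assumes the symlinked repo directories are not needed by the sandbox, which has not been verified; the function name is hypothetical.

```shell
# Copy every top-level entry of $1 into $2, except symbolic links.
# Assumption: the skipped symlinks are not required at the destination.
# Symlinks nested inside copied directories are NOT filtered.
copy_without_symlinks() {
  src="$1"
  dst="$2"
  mkdir -p "$dst"
  find "$src" -mindepth 1 -maxdepth 1 ! -type l -exec cp -r {} "$dst"/ \;
}

# Example (not run here), using the destination from the log:
#   copy_without_symlinks . /vagrant/deploy/state/webroot/misc/osie/current
```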

Vagrant sandbox not working on Virtualbox 6.1.28

When running recent virtualbox (6.1.28) an error is returned during 'vagrant up' when attaching the 192.168.50.1 IP:

Command: ["hostonlyif", "ipconfig", "vboxnet1", "--ip", "192.168.50.1", "--netmask", "255.255.255.0"]

Some research found https://discuss.hashicorp.com/t/vagrant-2-2-18-osx-11-6-cannot-create-private-network/30984/20 which suggests recent virtualbox versions have instituted some sort of filtering of acceptable private networks and our use of 192.168.50.0/24 is at fault.

Expected Behaviour

vagrant should be able to successfully bring up a VM with a hostonlyif using 192.168.50.1 as the IP address.

Current Behaviour

Bringing machine 'provisioner' up with 'virtualbox' provider...
==> provisioner: Checking if box 'generic/ubuntu2004' version '3.5.0' is up to date...
==> provisioner: Clearing any previously set network interfaces...
There was an error while executing VBoxManage, a CLI used by Vagrant
for controlling VirtualBox. The command and stderr is shown below.

Command: ["hostonlyif", "ipconfig", "vboxnet2", "--ip", "192.168.50.1", "--netmask", "255.255.255.0"]

Stderr: VBoxManage: error: Code E_ACCESSDENIED (0x80070005) - Access denied (extended info not available)
VBoxManage: error: Context: "EnableStaticIPConfig(Bstr(pszIp).raw(), Bstr(pszNetmask).raw())" at line 242 of file VBoxManageHostonly.cpp

Possible Solution

Per the HashiCorp post linked above, if we switch our private network to 192.168.56.0/24 we may avoid this issue. Another option would be for end users to update the list of acceptable networks, but that is less desirable. I will confirm whether changing to an alternate network resolves the issue.

Downgrading to an earlier virtualbox would also 'resolve' the issue.
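As far as the VirtualBox manual describes it, releases from 6.1.28 onward accept host-only adapter addresses only in 192.168.56.0/21 by default on Linux, macOS and Solaris hosts, and additional ranges can be listed in /etc/vbox/networks.conf. The check below is a small sketch of that default range; the function name is hypothetical and it handles only dotted-quad IPv4 input.

```shell
# Return 0 if the IPv4 address lies in 192.168.56.0/21
# (192.168.56.0 through 192.168.63.255), 1 otherwise.
in_default_hostonly_range() {
  ip="$1"
  a=${ip%%.*}; rest=${ip#*.}
  b=${rest%%.*}; rest=${rest#*.}
  c=${rest%%.*}
  [ "$a" -eq 192 ] && [ "$b" -eq 168 ] && [ "$c" -ge 56 ] && [ "$c" -le 63 ]
}

# Example (not run here):
#   in_default_hostonly_range 192.168.50.1 || echo "outside the default range"
```

By this check, the sandbox's 192.168.50.1 falls outside the default range while 192.168.56.1 falls inside it, which matches the behaviour reported in this issue.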

Context

This issue prevents me from bringing up the vagrant based sandbox on recent OS X machines.

Your Environment

  • OSX 11.6
  • Virtualbox 6.1.28
  • Vagrant 2.2.18

Installing using docker-compose behind a proxy

I'm trying to install the provisioner on a server behind a proxy.
It is not working: even though I have set the proxy variables both in the environment and in docker.service, downloading and installing the container images fails because they cannot reach external servers such as GitHub.
Do you have any idea on how to solve that issue?

Best regards

Failed to generate Tinkerbell env using `generate-envrc.sh` script

Running the generate-envrc.sh script fails when installing TinkerBell using Terraform on Equinix Metal:

./generate-envrc.sh: line 14: ./current_versions.sh: No such file or directory

Expected Behaviour

The generate-envrc.sh should prepare the TinkerBell env.

Current Behaviour

Failed to run the script because the current_versions.sh isn't mounted on the provisioner

Possible Solution

mount the file on the target provisioner, here is the PR: #61

Steps to Reproduce (for bugs)

  1. echo "metal_api_token = XXXX \nproject_id= XXX " >terraform.tfvars
  2. terraform init --upgrade
  3. terraform apply
  4. ssh root@[ip-address]
  5. ./generate-envrc.sh enp1s0f1 > .env

Context

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):
    Linux: Ubuntu
  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:
    Packet using Terraform
  • Link to your project or a code example to reproduce issue:

vagrant worker connection refused

When I run vagrant up worker I get the following error message from Vagrant:

$ vagrant up worker
Bringing machine 'worker' up with 'virtualbox' provider...
==> worker: Importing base box 'generic/alpine38'...
==> worker: Matching MAC address for NAT networking...
==> worker: Checking if box 'generic/alpine38' version '3.1.22' is up to date...
==> worker: Setting the name of the VM: vagrant_worker_1611164269731_58978
==> worker: Fixed port collision for 22 => 2222. Now on port 2200.
==> worker: Clearing any previously set network interfaces...
==> worker: Preparing network interfaces based on configuration...
    worker: Adapter 1: nat
    worker: Adapter 2: intnet
==> worker: Forwarding ports...
    worker: 22 (guest) => 2200 (host) (adapter 1)
==> worker: Running 'pre-boot' VM customizations...
==> worker: Booting VM...
==> worker: Waiting for machine to boot. This may take a few minutes...
    worker: SSH address: 127.0.0.1:22
    worker: SSH username: vagrant
    worker: SSH auth method: private key
    worker: Warning: Connection refused. Retrying...
    worker: Warning: Connection refused. Retrying...
    worker: Warning: Connection refused. Retrying...
    worker: Warning: Connection refused. Retrying...

The logs (displayed from the docker-compose logs -f tink-server boots nginx command) show the following:

boots_1                  | {"level":"info","ts":1611163375.3727038,"caller":"syslog/receiver.go:114","msg":"host=192.168.1.5 facility=kern app-name=ipxe msg=\"http://192.168.1.2/misc/osie/current/vmlinuz-x86_64... Operation not permitted (http://ipxe.org/410c613c)\"","service":"github.com/tinkerbell/boots","pkg":"syslog"}
boots_1                  | {"level":"info","ts":1611163375.3755722,"caller":"syslog/receiver.go:114","msg":"host=192.168.1.5 facility=kern app-name=ipxe msg=\"Could not boot image: Operation not permitted (http://ipxe.org/410c613c)\"","service":"github.com/tinkerbell/boots","pkg":"syslog"}
boots_1                  | {"level":"info","ts":1611163375.3776581,"caller":"syslog/receiver.go:114","msg":"host=192.168.1.5 facility=kern app-name=ipxe msg=\"No more network devices\"","service":"github.com/tinkerbell/boots","pkg":"syslog"}

I then tried interrupting vagrant up worker with Ctrl-C, and ran vagrant reload worker, and the following error message was printed:

$ vagrant reload worker
==> worker: Checking if box 'generic/alpine38' version '3.1.22' is up to date...
==> worker: Fixed port collision for 22 => 2222. Now on port 2200.
==> worker: Clearing any previously set network interfaces...
==> worker: Preparing network interfaces based on configuration...
    worker: Adapter 1: nat
    worker: Adapter 2: intnet
==> worker: Forwarding ports...
    worker: 22 (guest) => 2200 (host) (adapter 1)
There was an error while executing `VBoxManage`, a CLI used by Vagrant
for controlling VirtualBox. The command and stderr is shown below.

Command: ["modifyvm", "e7ac9167-740b-40bb-bf22-9a5e99c816e2", "--natpf1", "ssh,tcp,127.0.0.1,2200,,22"]

Stderr: VBoxManage: error: A NAT rule of this name already exists
VBoxManage: error: Details: code NS_ERROR_INVALID_ARG (0x80070057), component NATEngineWrap, interface INATEngine, callee nsISupports
VBoxManage: error: Context: "AddRedirect(Bstr(strName).raw(), proto, Bstr(strHostIp).raw(), RTStrToUInt16(strHostPort), Bstr(strGuestIp).raw(), RTStrToUInt16(strGuestPort))" at line 1907 of file VBoxManageModifyVM.cpp

Expected Behaviour

I should be prompted to login by the GUI.

Current Behaviour

Connection refused. Retrying... is printed until I interrupt vagrant up worker with Ctrl-C.

Possible Solution

Unsure 🤷‍♂️

Steps to Reproduce (for bugs)

  1. Follow the steps in the Local Setup with Vagrant guide until you try to bring up the worker machine with vagrant up worker step.

Context

I'm trying to complete the Local Setup with Vagrant guide. I've followed all of the steps provided in the guide, as provided. This is my second attempt at following the guide, and I am getting stuck at the same step. I'm unsure of the next steps that I should take.

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):

Ubuntu 20.04.1 LTS

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:

Vagrant & VirtualBox

Move tink-cli from docker-compose to the actual host

I think it is confusing and not that great to use the CLI as a docker container. We have everything we need to move tink-cli out of docker-compose and onto the host itself.

A brief description of how I think this issue can be resolved:

  1. Remove the tink-cli service from docker-compose
  2. Based on where we are (vagrant | terraform) we should install the tink-worker looking at the environment variable set here https://github.com/tinkerbell/sandbox/blob/master/current_versions.sh#L9 and we should do something like:
docker pull "$IMAGE"
id=$(docker create "$IMAGE")   # docker cp needs a container, not an image
docker cp "$id:/tink" /usr/local/bin/tink
docker rm "$id"

2a. If we are on Vagrant use vagrant/tinkerbell.sh
2b. On terraform use the exec-remote provider
2c. DO NOT USE setup.sh because it is not its responsibility to install programs

  3. Update doc, readme, and things like that

Can't complete `docker_compose` quickstart guide

Greetings,

I am trying to "play" with Tinkerbell to provision bare metal servers with OSes (e.g. Ubuntu Focal). I am following https://github.com/tinkerbell/sandbox/blob/main/docs/quickstarts/COMPOSE.md and I haven't been able to finish the provisioning steps (I've retried the steps 5 times in a row and saw 2 different outcomes). On my first 3 tries, Boots couldn't recognize the DHCP request and logged the info written below. On my 4th try, it picked the request up, but the workflow's action state stayed stuck in STATE_PENDING (I left it like that for 2 hours, which I think is long enough for it to at least start working). Then I tried one more time, and the machine didn't get picked up by Boots, like in the first 3 tries.

Any recommendations/tips are welcome. If you know of other ways of making Tinkerbell work (a bare metal server provisioning another one with an OS), I am also willing to give them a try.

Expected Behaviour

I am expecting to see similar outcome for the steps described in https://github.com/tinkerbell/sandbox/blob/main/docs/quickstarts/COMPOSE.md

Current Behaviour

Either Boots doesn't pick up the machine, or it gets picked up but the workflow stays at 0%, in the STATE_PENDING stage.

  • Boots doesn't recognize the DHCP request. For # echo $TINKERBELL_CLIENT_MAC e4:43:4b:3d:75:b8, I encountered the following output in the Boots logs and the machine doesn't get picked up by the Tinkerbell stack:
boots_1                     | {"level":"info","ts":1649949865.6316814,"caller":"[email protected]/handler.go:105","msg":"","service":"github.com/tinkerbell/boots","pkg":"dhcp","pkg":"dhcp","event":"recv","mac":"e4:43:4b:3d:75:b8","via":"0.0.0.0","iface":"eno1","xid":"\"4b:3d:75:b8\"","type":"DHCPDISCOVER","secs":28}
boots_1                     | {"level":"info","ts":1649949865.6318014,"caller":"boots/dhcp.go:78","msg":"parsed option82/circuitid","service":"github.com/tinkerbell/boots","pkg":"main","mac":"e4:43:4b:3d:75:b8","circuitID":""}
boots_1                     | {"level":"info","ts":1649949865.6336043,"caller":"boots/dhcp.go:91","msg":"retrieved job is empty","service":"github.com/tinkerbell/boots","pkg":"main","type":"DHCPDISCOVER","mac":"e4:43:4b:3d:75:b8","err":"discover from dhcp message: get hardware by mac from tink: rpc error: code = Unknown desc = SELECT: sql: no rows in result set","errVerbose":"rpc error: code = Unknown desc = SELECT: sql: no rows in result set\nget hardware by mac from tink\ngithub.com/tinkerbell/boots/packet.(*client).DiscoverHardwareFromDHCP\n\t/opt/actions-runner/_work/boots/boots/packet/endpoints.go:108\ngithub.com/tinkerbell/boots/job.discoverHardwareFromDHCP.func1\n\t/opt/actions-runner/_work/boots/boots/job/fetch.go:17\ngithub.com/golang/groupcache/singleflight.(*Group).Do\n\t/home/github/go/pkg/mod/github.com/golang/[email protected]/singleflight/singleflight.go:56\ngithub.com/tinkerbell/boots/job.discoverHardwareFromDHCP\n\t/opt/actions-runner/_work/boots/boots/job/fetch.go:19\ngithub.com/tinkerbell/boots/job.CreateFromDHCP\n\t/opt/actions-runner/_work/boots/boots/job/job.go:106\nmain.dhcpHandler.serveDHCP\n\t/opt/actions-runner/_work/boots/boots/cmd/boots/dhcp.go:89\nmain.dhcpHandler.ServeDHCP.func1\n\t/opt/actions-runner/_work/boots/boots/cmd/boots/dhcp.go:50\ngithub.com/gammazero/workerpool.startWorker\n\t/home/github/go/pkg/mod/github.com/gammazero/[email protected]/workerpool.go:218\nruntime.goexit\n\t/opt/actions-runner/_work/_tool/go/1.16.3/x64/src/runtime/asm_amd64.s:1371\ndiscover from dhcp message"}
  • Workflow doesn't progress. On my 4th try, the machine got picked up by Boots, but then the workflow got "stuck", and it was visible during Step 6 of the linked guide above;
Every 1.0s: tink workflow events c263defc-c0b1-11ec-9ab9-0242ac130006; tink workflow state c263defc-c0b1-11ec-9ab9-0242ac130006 

+-----------+-----------+-------------+----------------+---------+---------------+
| WORKER ID | TASK NAME | ACTION NAME | EXECUTION TIME | MESSAGE | ACTION STATUS |
+-----------+-----------+-------------+----------------+---------+---------------+
+-----------+-----------+-------------+----------------+---------+---------------+
+----------------------+--------------------------------------+
| FIELD NAME           | VALUES                               |
+----------------------+--------------------------------------+
| Workflow ID          | c263defc-c0b1-11ec-9ab9-0242ac130006 |
| Workflow Progress    | 0%                                   |
| Current Task         |                                      |
| Current Action       |                                      |
| Current Worker       |                                      |
| Current Action State | STATE_PENDING                        |
+----------------------+--------------------------------------+

On the KVM screen, this screenshot was visible (on the 4th run, when Boots was able to pick up the server's request), and during those 2 hours it didn't change:

image

Possible Solution

N/A

Steps to Reproduce (for bugs)

  1. Follow the https://github.com/tinkerbell/sandbox/blob/main/docs/quickstarts/COMPOSE.md guide.
  2. Process gets stuck on Step 6, either Boots can't recognize the DHCP request being sent by the defined MAC address in Step 3 or it gets picked up, but the workflow doesn't progress and provision the machine.

Context

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):

Ubuntu 20.04.4 LTS

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:

Bare metal provisioner, trying to provision another bare metal server with docker-compose method outlined in https://github.com/tinkerbell/sandbox/blob/main/docs/quickstarts/COMPOSE.md

  • Link to your project or a code example to reproduce issue:

N/A

remove OSIE and replace with hook

We have much faster development time with Hook and can easily support more functionality and hardware.

Expected Behaviour

The sandbox pulls a much smaller kernel/initramfs built by hook.

Current Behaviour

We default to OSIE, which is very large and cumbersome

Possible Solution

Upload a fresh Hook build to S3, get a URL, and put that URL in setup.sh.

Steps to Reproduce (for bugs)

Context

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:

  • Link to your project or a code example to reproduce issue:

provisioner: ./setup.sh: line 117: NAT_INTERFACE: unbound variable

Expected Behaviour

I am following the local setup with Vagrant steps on a Mac with VirtualBox installed, using this reference: https://docs.tinkerbell.org/setup/local-vagrant/
The docs say to look for the line INFO: tinkerbell stack setup completed successfully on ubuntu server
after running the command vagrant up provisioner.

Current Behaviour

Instead, the ./setup.sh output ends with an unbound variable error:
provisioner: + ./setup.sh
provisioner: INFO: starting tinkerbell stack setup
provisioner: INFO: verifying prerequisites for ubuntu (18.04)
provisioner: Found prerequisite: docker
provisioner: Found prerequisite: docker-compose
provisioner: Found prerequisite: ip
provisioner: Found prerequisite: jq
provisioner: Found prerequisite: netplan
provisioner: INFO: waiting for the network configuration to be applied by systemd-networkd
provisioner: INFO: tinkerbell network interface configured successfully
provisioner: ./setup.sh: line 117: NAT_INTERFACE: unbound variable
The SSH command responded with a non-zero exit status. Vagrant
assumes that this means the command failed. The output for this command
should be in the log above. Please read the output to determine what
went wrong.

Possible Solution

I noticed the setup.sh file was modified yesterday (April 8, 2021), but I have not done a deep dive on the diffs yet.
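The "unbound variable" message is what bash prints when a script running under `set -u` expands a variable that was never set. A guard along the lines below would turn that into an explicit error message. This is a sketch, not the actual fix applied to setup.sh, and the function name is hypothetical.

```shell
# Hypothetical guard: fail with a clear message when NAT_INTERFACE is
# missing or empty. The ${VAR:-} expansion is safe under `set -u`.
require_nat_interface() {
  if [ -z "${NAT_INTERFACE:-}" ]; then
    echo "ERROR: NAT_INTERFACE is not set" >&2
    return 1
  fi
  echo "$NAT_INTERFACE"
}

# Example (not run here):
#   nat_if="$(require_nat_interface)" || exit 1
```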

Steps to Reproduce (for bugs)

  1. git clone https://github.com/tinkerbell/sandbox.git
  2. cd sandbox/deploy/vagrant
  3. vagrant up provisioner

Context

Just opening issue if it might be an easy fix for diff summary on code commits.
One note: if I ignore the error and just run vagrant ssh provisioner, I do get dropped into a prompt on the provisioner, so this may just be a background error. I will continue with the setup steps and see if it prevents any progress.

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):
    Mac OS Big sur 11.2.3
    VirtualBox Graphical User Interface
    Version 6.1.18 r142142 (Qt5.6.3)

  • Link to your project or a code example to reproduce issue:

Remove internal registry

We are not sure if the internal registry is something we want or if it is just a complication.

We enabled #41. So now the workers have access to the internet. It means that all the registry setup and the certificate to have it secure is not strictly required anymore.

BUT we will have to figure out by ourselves how to get our actions to run in the worker. This is not a problem when the action is public, but for private repositories, it is a bit more complicated and the operating system installation environment (osie or tinkie) will have to give us a way to inject authentication (or we have to make tink-worker good enough to authenticate I suppose)

OS Deployment not working as expected

Expected Behaviour

I'm not sure if I'm putting this ticket in the right spot.

I am trying to test Tinkerbell to see if we might want to switch to it as a deployment method for our physical hosts. I am trying to deploy Debian as per your instructions here, specifically streaming the image to disk. I'm not sure I'm following them correctly and I just want to know if I'm on the right track. Any help would be very much appreciated.

I've also tried CentOS and Ubuntu, with the same results.

PREFACE -- The hello-world container deploys for me

I add the following hardware, template, and workflow. When the VM with the MAC address 52:54:00:de:af:57 comes up, it boots into Alpine Linux. Debian is not installed, I see no errors in the logs, and I have not caught any errors in the boot sequence either. I assume I am doing something pretty dumb, but I can't figure out what.

cat > hardware-data.json <<EOF
{
  "id": "ce3e62ed-826f-4485-a39f-a82bb74338e2",
  "metadata": {
    "facility": {
      "facility_code": "onprem"
    },
    "instance": {},
    "state": ""
  },
  "network": {
    "interfaces": [
      {
        "dhcp": {
          "arch": "x86_64",
          "ip": {
            "address": "192.168.100.202",
            "gateway": "192.168.100.1",
            "netmask": "255.255.255.0"
          },
          "mac": "52:54:00:de:af:57",
          "uefi": false
        },
        "netboot": {
          "allow_pxe": true,
          "allow_workflow": true
        }
      }
    ]
  }
}
EOF

docker exec -i deploy_tink-cli_1 tink hardware push < ./hardware-data.json


cat > debian.yml  <<EOF
version: "0.1"
name: debian_Focal
global_timeout: 1800
tasks:
  - name: "os-installation"
    worker: "{{.device_1}}"
    volumes:
      - /dev:/dev
      - /dev/console:/dev/console
      - /lib/firmware:/lib/firmware:ro
    actions:
      - name: "stream debian image"
        image: quay.io/tinkerbell-actions/image2disk:v1.0.0
        timeout: 90
        environment:
            IMG_URL: 192.168.100.201:8080/debian-10-openstack-amd64.raw
            DEST_DISK: /dev/sda
            COMPRESSED: false
      - name: "kexec debian"
        image: quay.io/tinkerbell-actions/kexec:v1.0.0
        timeout: 90
        pid: host
        environment:
            BLOCK_DEVICE: /dev/sda1
            FS_TYPE: ext4          
EOF

docker exec -i deploy_tink-cli_1 tink template create \
  < ./debian.yml


docker exec -i deploy_tink-cli_1 tink workflow create \
    -t 44e51a28-a6cc-11eb-b575-0242ac120005 \
    -r '{"device_1":"52:54:00:de:af:57"}'

Additionally, if I log into the Alpine container and execute the following:

docker run -e DEST_DISK=/dev/sda -e IMG_URL=192.168.100.201:8080/debian-10-openstack-amd64.raw -e COMPRESSED=false quay.io/tinkerbell-actions/image2disk:v1.0.0

I get an error that says `msg="write /dev/sda: no space left on device"`.
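
One thing that may be worth checking (an assumption, not something confirmed in this report): the manual docker run above does not pass the /dev:/dev volume that the template declares, so /dev/sda inside that container may not be the real disk. The sketch below uses a hypothetical helper, node_kind, which is not part of image2disk, to report what the destination path actually is before anything is written to it.

```shell
#!/bin/sh
# Hypothetical helper (not part of image2disk): report what kind of node a path is.
node_kind() {
  if [ -b "$1" ]; then
    echo "block device"
  elif [ -e "$1" ]; then
    echo "not a block device"
  else
    echo "missing"
  fi
}

# Check the destination disk; DEST_DISK defaults to the value used in the report.
node_kind "${DEST_DISK:-/dev/sda}"

# Assumed (untested) manual invocation that mirrors the template's /dev volume:
# docker run --privileged -v /dev:/dev \
#   -e DEST_DISK=/dev/sda \
#   -e IMG_URL=192.168.100.201:8080/debian-10-openstack-amd64.raw \
#   -e COMPRESSED=false \
#   quay.io/tinkerbell-actions/image2disk:v1.0.0
```

If the helper prints "missing" or "not a block device" inside the container, a write to that path would land on the container's own filesystem and not on the disk, which could explain a "no space left on device" error.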

Context

Your Environment

  • Operating System and version (e.g. Linux, Windows, MacOS):

  • How are you running Tinkerbell? Using Vagrant & VirtualBox, Vagrant & Libvirt, on Packet using Terraform, or give details:

I assume I'm just doing something wrong.

Run nightly tests to have a chance of detecting flakiness

Expected Behaviour

The e2e tests run every day to ensure everything is functioning as expected.

Current Behaviour

Currently, some users hit intermittent problems such as tinkerbell/osie#183, which does not happen every time.

Possible Solution

Configure a GitHub Actions workflow to run every night, so that build failures are reported via email.
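
A minimal sketch of what such a scheduled workflow could look like, written with the same heredoc style used elsewhere on this page. The file name, the 03:00 UTC schedule, and the ./test/e2e.sh entry point are all assumptions for illustration; the repository's real e2e command would go in its place.

```shell
#!/bin/sh
set -eu

# Assumption: this is run from the repository root; the file name is illustrative.
workflow_dir=".github/workflows"
mkdir -p "$workflow_dir"

cat > "$workflow_dir/nightly-e2e.yml" <<'EOF'
name: nightly-e2e
on:
  schedule:
    # Every day at 03:00 UTC (the time is an arbitrary choice).
    - cron: "0 3 * * *"
  # Also allow manual runs from the Actions tab.
  workflow_dispatch: {}
jobs:
  e2e:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical entry point; replace with the real e2e command.
      - run: ./test/e2e.sh
EOF

echo "wrote $workflow_dir/nightly-e2e.yml"
```

Whether failures arrive by email depends on the notification settings of the GitHub account associated with the scheduled workflow.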
