Comments (6)

bshephar commented on July 17, 2024

Hey, I'll try reproducing with the same release image and get back to you.

from dev-scripts.

bshephar commented on July 17, 2024

This appears to have worked for me:

[m3@localhost dev-scripts]$ oc get clusterversion
NAME      VERSION                          AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-0.okd-2023-08-18-135805   True        False         58m     Cluster version is 4.13.0-0.okd-2023-08-18-135805
[m3@localhost dev-scripts]$ oc get bmh -A
NAMESPACE               NAME              STATE                    CONSUMER                      ONLINE   ERROR   AGE
openshift-machine-api   ostest-master-0   externally provisioned   ostest-fr5ld-master-0         true             92m
openshift-machine-api   ostest-master-1   externally provisioned   ostest-fr5ld-master-1         true             92m
openshift-machine-api   ostest-master-2   externally provisioned   ostest-fr5ld-master-2         true             92m
openshift-machine-api   ostest-worker-0   provisioned              ostest-fr5ld-worker-0-jbq5b   true             92m
openshift-machine-api   ostest-worker-1   provisioned              ostest-fr5ld-worker-0-84t6t   true             92m

I'll have to dig into the logs you provided to see if there are any clues about why yours is failing and mine isn't.

I'm setting:

[m3@localhost dev-scripts]$ grep -Ev '^#|^$' config_m3.sh
export OPENSHIFT_RELEASE_IMAGE=registry.ci.openshift.org/origin/release:4.13.0-0.okd-2023-08-18-135805
export PULL_SECRET_FILE=pull_secret.json
export OPENSHIFT_RELEASE_TYPE=okd
export IP_STACK=v4
export NUM_EXTRA_WORKERS=2

So we should be deploying the same thing here. I'm running on a CentOS Stream 9 host:

[m3@localhost dev-scripts]$ cat /etc/redhat-release
CentOS Stream release 9

I see you're running Rocky 8.8:

❯ grep PRETTY_NAME 06_create_cluster-2023-09-20-082531.log
2023-09-20 08:25:31 +++(/etc/os-release:7): source(): PRETTY_NAME='Rocky Linux 8.8 (Green Obsidian)'

It would probably be helpful if you could provide logs from the bootstrap node, since that is where the ironic container should be running:
https://docs.okd.io/latest/support/troubleshooting/troubleshooting-installations.html#gathering-bootstrap-diagnostic-data_troubleshooting-installations

Check whether Ironic is listening on the bootstrap node:

sudo ss -tpnl | grep 6385

See if there are any restarting containers:

sudo podman ps -a

Check the logs of the Ironic container specifically:

sudo podman logs ironic

That's probably the best place to start trying to narrow things down.
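
The three checks above can be bundled into a single pass on the bootstrap node. This is only a minimal sketch: the function names `port_is_listening` and `triage_bootstrap` are made up for illustration, and it assumes the Ironic container is named `ironic` and its API port is 6385, as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: reads `ss -tnl` style output on stdin and succeeds
# if some socket in LISTEN state is bound to the given port.
port_is_listening() {
  awk -v suffix=":$1" '
    $1 == "LISTEN" && substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
    END { exit !found }
  '
}

# Hypothetical wrapper: runs the three checks in order on the bootstrap node.
triage_bootstrap() {
  if sudo ss -tpnl | port_is_listening 6385; then
    echo "Ironic API port 6385: listening"
  else
    echo "Ironic API port 6385: NOT listening"
  fi
  # Look for containers that have exited or keep restarting.
  sudo podman ps -a
  # Show only the tail of the Ironic log; drop --tail to see everything.
  sudo podman logs --tail 50 ironic
}
```

Nothing runs until `triage_bootstrap` is called, so the file can be sourced first and the function invoked once you are logged in to the bootstrap node.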

rakeshk121 commented on July 17, 2024

Thanks @bshephar.

Yes, I'm setting variables that match yours:

[core@nodea08 dev-scripts]$ grep -Ev '^#|^$' config_core.sh 
export OPENSHIFT_RELEASE_IMAGE=registry.ci.openshift.org/origin/release:4.13.0-0.okd-2023-08-18-135805
export PULL_SECRET_FILE=pull_secret.json
export OPENSHIFT_RELEASE_TYPE=okd
export NUM_EXTRA_WORKERS=2
export IP_STACK=v4
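
The two config listings differ only in line order, which is easy to misread by eye. A minimal sketch of an order-insensitive comparison follows; the helper names `normalize_config` and `configs_match` are hypothetical, and it assumes both config files have been copied to the same host.

```shell
#!/bin/sh
# Hypothetical helper: drops comment and blank lines (the same filter as the
# grep above) and sorts the rest so that ordering differences disappear.
normalize_config() {
  grep -Ev '^#|^$' | sort
}

# Hypothetical helper: succeeds when two config files contain the same settings.
configs_match() {
  a=$(normalize_config < "$1")
  b=$(normalize_config < "$2")
  [ "$a" = "$b" ]
}
```

For example, `configs_match config_m3.sh config_core.sh && echo "same settings"`.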

Ironic is listening on the bootstrap node:

[core@localhost ~]$ sudo ss -tpnl | grep 6385
LISTEN 0      128                *:6385             *:*    users:(("ironic",pid=6379,fd=5),("ironic",pid=6379,fd=4))  

I do not see any containers restarting:

[core@localhost ~]$ sudo podman ps -a
CONTAINER ID  IMAGE                                                                                                  COMMAND               CREATED            STATUS                        PORTS       NAMES
1c51a4cb99f1  quay.io/openshift/okd-content@sha256:50ec87cbc91ded3b7cd41e54da9a21f0835cdfc36daac0bd1dca65737d70aa9f                        About an hour ago  Up About an hour                          dnsmasq
5fe7599f4302  quay.io/openshift/okd-content@sha256:a70e232022f49a883e1facb48690d6c16fdbdc79b2ff4fc807bf07825eb7c380  /bin/copy-metal -...  About an hour ago  Exited (0) About an hour ago              coreos-downloader
b17f707d9374  quay.io/openshift/okd-content@sha256:50ec87cbc91ded3b7cd41e54da9a21f0835cdfc36daac0bd1dca65737d70aa9f                        About an hour ago  Up About an hour                          httpd
ef6ba4a14d4e  quay.io/openshift/okd-content@sha256:ad2224900eabbb62bc83b7b356a0491bdb5798b57c2351f5df05e01a3b84ac90                        About an hour ago  Up About an hour                          image-customization
a8737f33d92a  quay.io/openshift/okd-content@sha256:50ec87cbc91ded3b7cd41e54da9a21f0835cdfc36daac0bd1dca65737d70aa9f                        About an hour ago  Up About an hour                          ironic
ffa193396b97  quay.io/openshift/okd-content@sha256:50ec87cbc91ded3b7cd41e54da9a21f0835cdfc36daac0bd1dca65737d70aa9f                        About an hour ago  Up About an hour                          ironic-inspector
d5f382e9cb36  quay.io/openshift/okd-content@sha256:50ec87cbc91ded3b7cd41e54da9a21f0835cdfc36daac0bd1dca65737d70aa9f                        About an hour ago  Up About an hour                          ironic-ramdisk-logs
f7847bdcf80c  quay.io/openshift/okd-content@sha256:1a245dbcc0684c6ca15c9ea67fbfa55073c5d672ea7b48f50c14c371b09de558  start --tear-down...  15 minutes ago     Up 15 minutes                         

Attaching the ironic logs here:

ironic.log

bshephar commented on July 17, 2024

Hey @rakeshk121.

Ok, two thoughts:
1. Was this IP address, 192.168.111.5, reachable at all during the bootstrap process?

$ curl -s -o /dev/null -w "%{http_code}" https://192.168.111.5:6443 -k

I originally thought these errors only showed up at the end, once the deployment had already failed, but I think that VIP should still be available even if the deployment does fail:

2023-09-20 09:26:01 E0920 09:26:01.513229  161368 memcache.go:238] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host
2023-09-20 09:26:04 E0920 09:26:04.585280  161368 memcache.go:238] couldn't get current server API group list: Get "https://api.ostest.test.metalkube.org:6443/api?timeout=32s": dial tcp 192.168.111.5:6443: connect: no route to host

It looks like Ironic is working there. So, assuming that IP address is indeed reachable during the bootstrap process, we might need a must-gather to see if anything else is happening on that node. If it's not reachable, then that is the first problem we need to solve.
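
To make the reachability question concrete, the curl probe can be wrapped so its result is easy to read. This is a sketch, not part of dev-scripts: the function names are hypothetical. It relies on curl printing `000` for `%{http_code}` when it gets no HTTP response at all, which is what a "no route to host" failure produces.

```shell
#!/bin/sh
# Hypothetical helper: interprets the value curl prints for %{http_code}.
# curl reports 000 when it received no HTTP response at all
# (for example "no route to host" or a connection timeout).
classify_http_code() {
  case "$1" in
    ""|000) echo "unreachable" ;;
    [1-5][0-9][0-9]) echo "reachable" ;;
    *) echo "unknown" ;;
  esac
}

# Hypothetical wrapper: probes the URL once and prints the verdict.
# -k skips certificate verification, as in the curl command above.
probe_api_vip() {
  code=$(curl -k -s -o /dev/null --connect-timeout 5 -w '%{http_code}' "$1") || true
  classify_http_code "$code"
}
```

For example, `probe_api_vip https://192.168.111.5:6443`. Any HTTP status, including 401 or 403, counts as reachable here, because it shows the VIP answered; only the absence of a response points at a routing problem.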

bdlink commented on July 17, 2024

I am seeing what looks like a similar failure.
Config parameters:

export OPENSHIFT_RELEASE_IMAGE=quay.io/openshift/okd:4.15.0-0.okd-2024-03-10-010116
export PULL_SECRET_FILE=pull_secret.json
export OPENSHIFT_RELEASE_TYPE=okd
export IP_STACK=v4
export NETWORK_TYPE="OVNKubernetes"
export MASTER_DISK=90
export MASTER_VCPU=4
export NUM_WORKERS=0
export NUM_EXTRA_WORKERS=0

Using WORKING_DIR=/home/dev-scripts

I am running on a fresh install of CentOS Stream 9. After running make, step 06 times out after an hour. The bootstrap node comes up, and so does the bootstrap API.

sudo ss -tpnl | grep 6385 returns nothing.
sudo podman ps does not show restarting containers (inside or outside the bootstrap node).
sudo podman logs ironic returns:
Error: no container with name or ID "ironic" found: no such container

The virtual machines ostest_master_0, _1, and _2 are shut down.
oc get bmh -A shows three machines online.
oc get po -n openshift-machine-api shows:
No resources found in openshift-machine-api namespace

As I am using a current version of yq (v4.44.2), I had to remove the "y" on line 102 of 01_install_requirements.sh.
06_create_cluster-2024-06-18-075053.log

bdlink commented on July 17, 2024

Looking at how yq is used in the bash scripts, I think the calls in utils.sh may not work with yq v4, which seems to need a period before []. This could be the cause of the issue, though I am not an expert in yq.
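
Before changing any filter syntax, it may help to confirm which yq is on the PATH, since two unrelated tools share that name: the Go implementation (mikefarah/yq, where the v4.x releases come from) and the Python jq wrapper (kislyuk/yq), whose `-y` flag requests YAML output. The sketch below classifies the text printed by `yq --version`. The function name is hypothetical, the matching is an assumption based on the version banners these tools usually print, and nothing in this thread confirms which flavor dev-scripts expects.

```shell
#!/bin/sh
# Hypothetical helper: guesses the yq implementation from its version banner.
# Assumption: the Go tool's banner mentions "mikefarah", while the Python
# wrapper prints just "yq <version>".
yq_flavor() {
  case "$1" in
    *mikefarah*) echo "go" ;;
    "yq "[0-9]*) echo "python" ;;
    *) echo "unknown" ;;
  esac
}
```

For example, `yq_flavor "$(yq --version 2>&1)"`. An "unknown" result just means the banner did not match either pattern and should be read by hand.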

On the bootstrap node there are fewer podman containers running than rakeshk121 had:

sudo podman ps -a
CONTAINER ID  IMAGE                                                                                                  COMMAND               CREATED        STATUS        PORTS       NAMES
3308f5f6df18  quay.io/openshift/okd-content@sha256:90eb227746e445d6e258d3c9aaccbbdeca517ffb0dcaf5b880c2bde4f74aaae2  /bin/rundnsmasq       11 hours ago   Up 11 hours               dnsmasq
e4b3a442040a  quay.io/openshift/okd-content@sha256:90eb227746e445d6e258d3c9aaccbbdeca517ffb0dcaf5b880c2bde4f74aaae2  /bin/runlogwatch....  11 hours ago   Up 11 hours               ironic-ramdisk-logs
80e66f86071b  quay.io/openshift/okd-content@sha256:90eb227746e445d6e258d3c9aaccbbdeca517ffb0dcaf5b880c2bde4f74aaae2  /bin/runhttpd         11 hours ago   Up 11 hours               httpd
49d9ecfa58df  quay.io/openshift/okd-content@sha256:9f3f8f11fd743a332f8328b774bed1854c5d5d058663eb122289191bcb0cee73  start --tear-down...  3 minutes ago  Up 3 minutes              cluster-bootstrap
