
Comments (17)

guilhermesteinmuller commented on September 11, 2024

I have added a comment related to the usage of remote_group and remote_ip_prefix here: https://review.opendev.org/c/vexxhost/ansible-collection-atmosphere/+/840855

In reality, I'm thinking here… so the security groups are applied to the health manager ports created on each controller, right?

Then what actually happens is that the amphorae will have a security group there and will only contact those ports? If so, maybe the remote_group approach makes sense…

from atmosphere.

okozachenko1203 commented on September 11, 2024

@mnaser here is the first commit
https://review.opendev.org/c/vexxhost/ansible-collection-atmosphere/+/840855


mnaser commented on September 11, 2024

Thanks for that. I left a review, sorry for the delay.


okozachenko1203 commented on September 11, 2024

https://review.opendev.org/c/vexxhost/ansible-collection-atmosphere/+/842384


mnaser commented on September 11, 2024

Left a review, good progress @okozachenko1203.


mnaser commented on September 11, 2024

@okozachenko1203 could you try and check this locally, since I think it's failing to come up properly.


okozachenko1203 commented on September 11, 2024

@okozachenko1203 could you try and check this locally, since I think it's failing to come up properly.

Sure, I have been running tests in my lab.


okozachenko1203 commented on September 11, 2024

@mnaser Please check my answers to your 3 comments on https://review.opendev.org/c/vexxhost/ansible-collection-atmosphere/+/840855. I didn't resolve them so we can continue the discussion.

Btw, after switching to separate RabbitMQ clusters per service, the Zuul CI is failing due to a lack of resources.
It fails at the rabbitmq-octavia deployment.

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  65m   default-scheduler  0/8 nodes are available: 3 Insufficient cpu, 5 node(s) didn't match Pod's node affinity/selector.
root@ctl1:/home/ubuntu# kg rabbitmqcluster
NAME                ALLREPLICASREADY   RECONCILESUCCESS   AGE
rabbitmq-barbican   True               True               101m
rabbitmq-cinder     True               True               93m
rabbitmq-glance     True               True               98m
rabbitmq-heat       True               True               75m
rabbitmq-keystone   True               True               105m
rabbitmq-neutron    True               True               88m
rabbitmq-nova       True               True               84m
rabbitmq-octavia    False              Unknown            71m
rabbitmq-senlin     True               True               77m

The resource spec of one RabbitMQ cluster is:

    Limits:
      cpu:     2
      memory:  2Gi
    Requests:
      cpu:      1
      memory:   2Gi

So I think we need to either increase the node spec or decrease the RabbitMQ cluster resource spec.

Which one do you prefer?
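
If the choice is to shrink the RabbitMQ clusters, a minimal sketch of a reduced manifest (assuming the cluster-operator's `spec.resources` field; the halved values are hypothetical, not tested):

```yaml
apiVersion: rabbitmq.com/v1beta1
kind: RabbitmqCluster
metadata:
  name: rabbitmq-octavia
spec:
  resources:
    requests:
      cpu: 500m    # hypothetical: half the current 1-CPU request
      memory: 1Gi  # hypothetical: half the current 2Gi request
    limits:
      cpu: "1"
      memory: 1Gi
```

With 9 clusters, halving the CPU request alone frees ~4.5 CPUs across the scheduled nodes.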


okozachenko1203 commented on September 11, 2024

The octavia role is pending because of this: https://review.opendev.org/c/vexxhost/ansible-collection-atmosphere/+/844271


guilhermesteinmuller commented on September 11, 2024

The octavia role is pending because of this: https://review.opendev.org/c/vexxhost/ansible-collection-atmosphere/+/844271

@okozachenko1203 I see that this is merged. Do we still have blockers here?


guilhermesteinmuller commented on September 11, 2024

I have added a comment related to the usage of remote_group and remote_ip_prefix here: https://review.opendev.org/c/vexxhost/ansible-collection-atmosphere/+/840855


okozachenko1203 commented on September 11, 2024

In this case, this PS is ready for continued review @mnaser


okozachenko1203 commented on September 11, 2024

To fetch the Tempest log at least:
https://review.opendev.org/c/vexxhost/ansible-collection-atmosphere/+/848943


okozachenko1203 commented on September 11, 2024

openstack/octavia-housekeeping-d8978f76c-jbpgq[octavia-housekeeping]: 2022-07-18 16:43:50.554 1 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='172.24.2.91', port=9443): Max retries exceeded with url: // (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7f21f76a27f0>, 'Connection to 172.24.2.91 timed out. (connect timeout=10.0)'))


okozachenko1203 commented on September 11, 2024

@mnaser can you help me on this?

  • After fixing the port binding for the health manager, the LB's operating status changed to ONLINE, but it is stuck with provisioning status PENDING_CREATE.
root@ctl1:/home/ubuntu# o loadbalancer list
+--------------------------------------+------+----------------------------------+---------------+---------------------+------------------+----------+
| id                                   | name | project_id                       | vip_address   | provisioning_status | operating_status | provider |
+--------------------------------------+------+----------------------------------+---------------+---------------------+------------------+----------+
| 8b83fbf5-5b52-47ea-90aa-e999cc6cd133 | lb1  | eca2e68ae45340f997824c695113ffab | 10.96.250.210 | PENDING_CREATE      | ONLINE           | amphora  |
+--------------------------------------+------+----------------------------------+---------------+---------------------+------------------+----------+

root@ctl1:/home/ubuntu# o port list
+--------------------------------------+-------------------------------------------------+-------------------+------------------------------------------------------------------------------+--------+
| ID                                   | Name                                            | MAC Address       | Fixed IP Addresses                                                           | Status |
+--------------------------------------+-------------------------------------------------+-------------------+------------------------------------------------------------------------------+--------+
| 22c145cd-b12b-40aa-bbfb-944c64c60758 |                                                 | fa:16:3e:d5:2a:63 | ip_address='172.24.0.2', subnet_id='73e96218-92f9-44e5-be0e-8bb1edf33b19'    | ACTIVE |
| 2e4f48b8-6c5f-4c76-854d-056f3a008d10 | octavia-health-manager-port-ctl2                | fa:16:3e:72:c9:13 | ip_address='172.24.1.104', subnet_id='73e96218-92f9-44e5-be0e-8bb1edf33b19'  | ACTIVE |
| 4f0ce54e-634f-40c1-8a76-0c1d40a2863c |                                                 | fa:16:3e:16:83:80 | ip_address='172.24.2.150', subnet_id='73e96218-92f9-44e5-be0e-8bb1edf33b19'  | ACTIVE |
| 8875ddcb-8230-430f-a9ef-bbe74fadbfa4 | octavia-health-manager-port-ctl3                | fa:16:3e:65:e8:79 | ip_address='172.24.1.208', subnet_id='73e96218-92f9-44e5-be0e-8bb1edf33b19'  | ACTIVE |
| a7e5d08e-9ac3-4f27-bd34-d2cf2d0b4612 | octavia-lb-8b83fbf5-5b52-47ea-90aa-e999cc6cd133 | fa:16:3e:4f:b2:48 | ip_address='10.96.250.210', subnet_id='a9f5e3bc-41e8-4746-acf4-4c65c65a5755' | DOWN   |
| ce402260-fa7a-42e5-9e61-9be844e601cd |                                                 | fa:16:3e:5e:1a:28 | ip_address='172.24.0.4', subnet_id='73e96218-92f9-44e5-be0e-8bb1edf33b19'    | ACTIVE |
| d3efde73-5bae-40dc-bfae-9ef9ea11b79a | octavia-health-manager-port-ctl1                | fa:16:3e:76:55:fa | ip_address='172.24.3.212', subnet_id='73e96218-92f9-44e5-be0e-8bb1edf33b19'  | ACTIVE |
| ff44e947-5b39-4a0c-9d48-96663633cf9d |                                                 | fa:16:3e:72:8f:19 | ip_address='172.24.0.3', subnet_id='73e96218-92f9-44e5-be0e-8bb1edf33b19'    | ACTIVE |
+--------------------------------------+-------------------------------------------------+-------------------+------------------------------------------------------------------------------+--------+

But the LB port is still DOWN. I'm not sure whether the DOWN port is the cause of the PENDING_CREATE status or a result of it.

  • I fixed the security groups properly:
    Port 5555 should be reachable from all amphora machines, so we can set remote_ip_prefix to the subnet's CIDR for lb-health-mgr-sec-grp.
    Port 9443 should be reachable from the health manager and housekeeping, so we can set remote_ip_prefix to the controller ports' IPs (in the lb-mgmt net).

  • Current status
    While troubleshooting the PENDING_CREATE provisioning status, I confirmed that all ports for heartbeat packets and the amphora's API are reachable on the controllers. I captured packets on the lb-mgmt network and couldn't find any UDP heartbeat packet sent from the amphora to the health manager.
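
The security-group intent above can be sketched as the rule payloads an openstacksdk-style call would accept. This is a minimal sketch: the CIDR and the controller IPs are assumptions pulled from the `o port list` output in this thread, not canonical values.

```python
# Hypothetical values for this deployment (taken from the port list above).
LB_MGMT_CIDR = "172.24.0.0/22"
CONTROLLER_HM_IPS = ["172.24.3.212", "172.24.1.104", "172.24.1.208"]  # ctl1/ctl2/ctl3

def heartbeat_rule(cidr):
    """UDP 5555 into the health manager, open to every amphora in the subnet."""
    return {
        "direction": "ingress",
        "protocol": "udp",
        "port_range_min": 5555,
        "port_range_max": 5555,
        "remote_ip_prefix": cidr,
    }

def amphora_api_rules(controller_ips):
    """TCP 9443 into each amphora, only from the controllers' lb-mgmt ports."""
    return [
        {
            "direction": "ingress",
            "protocol": "tcp",
            "port_range_min": 9443,
            "port_range_max": 9443,
            "remote_ip_prefix": f"{ip}/32",
        }
        for ip in controller_ips
    ]

rules = [heartbeat_rule(LB_MGMT_CIDR)] + amphora_api_rules(CONTROLLER_HM_IPS)
print(len(rules))  # 4: one heartbeat rule plus one 9443 rule per controller
```

Each dict maps one-to-one onto a security group rule; a remote_group-based variant would replace remote_ip_prefix with a remote_group_id, as discussed earlier in the thread.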

https://access.redhat.com/solutions/4942351
I wanted to check whether this is the same case for us.
I created my own keypair and configured Octavia to use it for the amphora, then tried to access the amphora via SSH.
I can telnet to port 22 and the key is correct, but the connection is suddenly closed.

root@ctl1:/home/ubuntu# ssh ubuntu@172.24.2.150 -vvv
OpenSSH_8.2p1 Ubuntu-4ubuntu0.3, OpenSSL 1.1.1f  31 Mar 2020
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: /etc/ssh/ssh_config line 21: Applying options for *
debug2: resolve_canonicalize: hostname 172.24.2.150 is address
debug2: ssh_connect_direct
debug1: Connecting to 172.24.2.150 [172.24.2.150] port 22.
debug1: Connection established.
debug1: identity file /root/.ssh/id_rsa type 0
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa_sk type -1
debug1: identity file /root/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: identity file /root/.ssh/id_ed25519_sk type -1
debug1: identity file /root/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /root/.ssh/id_xmss type -1
debug1: identity file /root/.ssh/id_xmss-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_8.2p1 Ubuntu-4ubuntu0.3
debug1: Remote protocol version 2.0, remote software version OpenSSH_8.2p1 Ubuntu-4ubuntu0.5
debug1: match: OpenSSH_8.2p1 Ubuntu-4ubuntu0.5 pat OpenSSH* compat 0x04000000
debug2: fd 3 setting O_NONBLOCK
debug1: Authenticating to 172.24.2.150:22 as 'ubuntu'
debug3: send packet: type 20
debug1: SSH2_MSG_KEXINIT sent
debug3: receive packet: type 20
debug1: SSH2_MSG_KEXINIT received
debug2: local client KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256,ext-info-c
debug2: host key algorithms: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,sk-ecdsa-sha2-nistp256-cert-v01@openssh.com,ssh-ed25519-cert-v01@openssh.com,sk-ssh-ed25519-cert-v01@openssh.com,rsa-sha2-512-cert-v01@openssh.com,rsa-sha2-256-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ecdsa-sha2-nistp256@openssh.com,ssh-ed25519,sk-ssh-ed25519@openssh.com,rsa-sha2-512,rsa-sha2-256,ssh-rsa
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com,zlib
debug2: compression stoc: none,zlib@openssh.com,zlib
debug2: languages ctos: 
debug2: languages stoc: 
debug2: first_kex_follows 0 
debug2: reserved 0 
debug2: peer server KEXINIT proposal
debug2: KEX algorithms: curve25519-sha256,curve25519-sha256@libssh.org,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group16-sha512,diffie-hellman-group18-sha512,diffie-hellman-group14-sha256
debug2: host key algorithms: rsa-sha2-512,rsa-sha2-256,ssh-rsa,ecdsa-sha2-nistp256,ssh-ed25519
debug2: ciphers ctos: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: ciphers stoc: chacha20-poly1305@openssh.com,aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com,aes256-gcm@openssh.com
debug2: MACs ctos: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: MACs stoc: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
debug2: compression ctos: none,zlib@openssh.com
debug2: compression stoc: none,zlib@openssh.com
debug2: languages ctos: 
debug2: languages stoc: 
debug2: first_kex_follows 0 
debug2: reserved 0 
debug1: kex: algorithm: curve25519-sha256
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug3: send packet: type 30
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

I checked MTU but it is ok.

I created another VM on the lb-mgmt network using the amphora image and the cirros image and tried to access it via SSH, but the same issue happened.

I think there is some issue in the lb-mgmt network, but I'm not sure what it is.
I compared against the upstream networking-creation document (https://docs.openstack.org/octavia/latest/install/install-ubuntu.html, section 7): I can see o-hm0 but cannot find o-bhm0 on the controllers, even on our public clouds.

From the Octavia logs, I can see only these warnings from housekeeping and health-manager:

openstack/octavia-housekeeping-56f5c48cfc-26f6m[octavia-housekeeping]: 2022-07-19 10:52:35.777 1 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.: requests.exceptions.ConnectionError: HTTPSConnectionPool(host='172.24.0.17', port=9443): Max retries exceeded with url: // (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f68001a2490>: Failed to establish a new connection: [Errno 113] No route to host'))

openstack/octavia-health-manager-default-rp98j[octavia-health-manager]: 2022-07-19 10:52:35.768 2210098 WARNING octavia.controller.healthmanager.health_manager [-] Load balancer 8b83fbf5-5b52-47ea-90aa-e999cc6cd133 is in immutable state PENDING_CREATE. Skipping failover.

Housekeeping tries to connect to the API of amphorae that no longer exist. I cannot find such a warning for the existing amphora.


mnaser commented on September 11, 2024

@okozachenko1203 I think in this case the issue is MTU. Let me propose a theory:

2: ens3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
    inet 10.96.240.110/24 brd 10.96.240.255 scope global dynamic ens3
       valid_lft 81199sec preferred_lft 81199sec

The interface that carries the VXLAN network (ens3) runs with a 1450 MTU, but o-hm0, the interface carrying Octavia's own VXLAN network, is also set to 1450. That means a full 1450-byte packet leaving o-hm0 would, after VXLAN encapsulation, have to leave ens3 as a 1500-byte packet, which can't happen.
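
A quick arithmetic sketch of that theory, using the standard 50-byte VXLAN-over-IPv4 encapsulation cost:

```python
# VXLAN encapsulation wraps the whole inner Ethernet frame in a new
# IPv4/UDP/VXLAN envelope, adding 50 bytes relative to the inner IP MTU.
VXLAN_OVERHEAD = 14 + 8 + 8 + 20   # inner Ethernet + VXLAN + UDP + outer IPv4
UNDERLAY_MTU = 1450                # ens3, itself a tenant VXLAN network
OVERLAY_MTU = 1450                 # o-hm0, set to the same value

on_the_wire = OVERLAY_MTU + VXLAN_OVERHEAD
print(on_the_wire)                 # 1500: 50 bytes more than ens3 can carry

# o-hm0 would have to drop to the underlay MTU minus the encapsulation cost:
print(UNDERLAY_MTU - VXLAN_OVERHEAD)  # 1400
```

This is the same 50-byte margin that makes Neutron default VXLAN tenant networks to 1450 on a 1500-MTU underlay.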

I've also tested the following:

# ping -M do -s 1422 172.24.1.104
PING 172.24.1.104 (172.24.1.104) 1422(1450) bytes of data.
^C
--- 172.24.1.104 ping statistics ---
7 packets transmitted, 0 received, 100% packet loss, time 6128ms
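
For reference, the `1422(1450)` in that ping output is the payload size plus the ICMP and IPv4 headers, so `-s 1422` with `-M do` probes exactly the 1450-byte interface MTU:

```python
# ping -s sets the ICMP payload; the wire-level IPv4 datagram adds an
# 8-byte ICMP header and a 20-byte IPv4 header on top of it.
ICMP_HEADER = 8
IPV4_HEADER = 20
payload = 1422

datagram = payload + ICMP_HEADER + IPV4_HEADER
print(datagram)  # 1450, matching the "1422(1450)" ping reports above
```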

So you can see that full-MTU packets do not get through. However, this is something we improved in our cloud recently:

https://vexxhost.com/blog/9000mtus-jumbo-frames-public-cloud/

So I think if you delete this stack and recreate it, you will get internal interfaces with a 9000 MTU, and then pings with larger packets will work. I think this will resolve the issue, because the timeouts are probably happening when HTTPS sends full-MTU packets.


okozachenko1203 commented on September 11, 2024

@mnaser thanks. 👍
I will recreate it.

