Comments (9)

pqarmitage commented on May 27, 2024

You are showing that keepalived segfaulted, which would explain why some addresses failed to be removed (keepalived crashed before it removed the addresses).

The version of keepalived you are running is quite old (nearly 4 years old) and is some 800 non-merge commits behind the current version. Can you please try running with v2.2.8 and see if you still experience the problem?

If you still experience the problem with v2.2.8, then a stack backtrace produced from the coredump could help us identify the cause of the problem.

from keepalived.

CoderSinger commented on May 27, 2024

Thanks for your reply, @pqarmitage!
May I ask whether the crash that occurs when the network port (or VLAN interface) holding the VIP is deleted while keepalived is running has been fixed in the newer version? If it has, in which commit?

Could it be related to this one? #1902 (comment)

pqarmitage commented on May 27, 2024

@CoderSinger As to whether the segfault on removing an interface has been fixed: I do not know, although I suspect it has. Unfortunately I do not have the time to go through all the commits, and I don't remember all the changes that have been made over the last 4 years or so! This is why I asked you to try v2.2.8 to see if it resolves your issue. If the issue still exists with v2.2.8 then I will research it and provide a fix.

If you want to see all the relevant commits, you could try executing the following in a keepalived git tree: git log --oneline v2.1.5..v2.2.8 | grep -v " Merge " and then looking for a likely commit description.
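
As a rough illustration of that search, here is a minimal Python sketch. It is not part of keepalived: the function names and the keyword list are hypothetical, and it uses git's --no-merges option, which excludes merge commits themselves rather than filtering on the text " Merge " as the grep does.

```python
import subprocess


def non_merge_log(repo_dir, old="v2.1.5", new="v2.2.8"):
    """Return the one-line log entries between two tags, without merge commits."""
    out = subprocess.run(
        ["git", "log", "--oneline", "--no-merges", f"{old}..{new}"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()


def candidate_commits(log_lines, keywords):
    """Keep only the entries that mention any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [line for line in log_lines
            if any(k in line.lower() for k in lowered)]
```

For example, candidate_commits(non_merge_log("/path/to/keepalived"), ["segfault", "interface"]) would narrow the list to descriptions worth reading; the path and the keywords are only guesses to be adjusted.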

CoderSinger commented on May 27, 2024

@pqarmitage Thank you for your reply. I will consider upgrading keepalived.
I also have some other questions. I run keepalived in a Docker container, and because I did not want to increase the size of the container image, I do not use systemctl to control keepalived. Instead, I run the keepalived executable directly with the required parameters. In this setup, the 'ps' command shows three processes: keepalived, keepalived checker, and keepalived vrrp.
I have two questions:

  1. If I want keepalived to exit gracefully so that the IPVS rules are cleared, should I use "kill -15 {3pids}", or just "kill -15 {keepalived main process pid}"?
  2. If I want keepalived to reload so that the new keepalived.conf takes effect, should I use "kill -HUP {3pids}", or just "kill -HUP {keepalived main process pid}"?

I ask because I encountered the following issues when sending the reload signal to all three PIDs.
Normal checker logs:

[INFO] 2024/02/23 15:59:34 Shutting down service [1111::18]:tcp:5005 from VS [1efe:ffff:0:f107::18]:tcp:5005
[INFO] 2024/02/23 15:59:34 Shutting down service [192.169.0.24]:tcp:443 from VS [10.227.200.18]:tcp:443
[INFO] 2024/02/23 15:59:34 Shutting down service [192.169.0.40]:tcp:29952 from VS [10.227.200.18]:tcp:29952
[INFO] 2024/02/23 15:59:34 Shutting down service [1111::18]:tcp:18080 from VS [5efe:ffff:0:f101::1]:tcp:18080
[INFO] 2024/02/23 15:59:34 Shutting down service [1111::18]:tcp:29292 from VS [5efe:ffff:0:f101::1]:tcp:29292

Abnormal checker logs:

[INFO] 2024/02/23 15:59:34 Shutting down service [1111::18]:tcp:5005 from VS [1efe:ffff:0:f107::18]:tcp:5005
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DELDEST(1160) error: No such destination(2)
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DEL(1156) error: No such file or directory(2)
[INFO] 2024/02/23 15:59:34 Shutting down service [192.169.0.24]:tcp:443 from VS [10.227.200.18]:tcp:443
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DELDEST(1160) error: No such destination(2)
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DEL(1156) error: No such file or directory(2)
[INFO] 2024/02/23 15:59:34 Shutting down service [192.169.0.40]:tcp:29952 from VS [10.227.200.18]:tcp:29952
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DELDEST(1160) error: No such destination(2)
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DEL(1156) error: No such file or directory(2)
[INFO] 2024/02/23 15:59:34 Shutting down service [1111::18]:tcp:18080 from VS [5efe:ffff:0:f101::1]:tcp:18080
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DELDEST(1160) error: No such destination(2)
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DEL(1156) error: No such file or directory(2)
[INFO] 2024/02/23 15:59:34 Shutting down service [1111::18]:tcp:29292 from VS [5efe:ffff:0:f101::1]:tcp:29292
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DELDEST(1160) error: No such destination(2)
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DEL(1156) error: No such file or directory(2)

It looks as if reload was invoked repeatedly, so the service configuration was deleted more than once, which produced the errors. At the same timestamp as those errors, the VRRP log shows the keepalived process entering the BACKUP state, even though keepalived is running on a single node.
VRRP log:

[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_STOPDAEMON(1164) error: No such file or directory(2)
[INFO] 2024/02/23 15:59:34 (xlb1) Entering BACKUP STATE
[INFO] 2024/02/23 15:59:34 (xlb1) sent 0 priority
[INFO] 2024/02/23 15:59:34 (xlb1) removing Virtual Routes
[INFO] 2024/02/23 15:59:34 Netlink: error: No such process(3), type=RTM_DELROUTE(25), seq=1708675201, pid=0
[INFO] 2024/02/23 15:59:34 Netlink: error: No such process(3), type=RTM_DELROUTE(25), seq=1708675202, pid=0
[INFO] 2024/02/23 15:59:34 (xlb1) removing VIPs.
[INFO] 2024/02/23 15:59:34 (xlb1) removing E-VIPs.
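
In the abnormal checker log above, every "Shutting down service" line is followed by IPVS errors, while the normal log has none. A minimal Python sketch for pulling that pattern out of a longer log follows; the function name is hypothetical and the regular expressions assume only the line format visible in these excerpts.

```python
import re


def failed_removals(log_text):
    """List each service shutdown together with the IPVS errors logged after it."""
    results = []
    current = None
    for line in log_text.splitlines():
        shutdown = re.search(r"Shutting down service (\S+) from VS (\S+)", line)
        if shutdown:
            current = {"service": shutdown.group(1),
                       "vs": shutdown.group(2),
                       "errors": []}
            results.append(current)
            continue
        error = re.search(r"IPVS cmd (\w+)\(\d+\) error: (.+)\(\d+\)\s*$", line)
        if error and current is not None:
            # Attribute the error to the most recent shutdown line.
            current["errors"].append((error.group(1), error.group(2)))
    # Only shutdowns that were followed by at least one error are of interest.
    return [r for r in results if r["errors"]]
```

An empty result for a given reload suggests the services were removed once, as in the normal log; a non-empty one matches the duplicate-deletion pattern.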

pqarmitage commented on May 27, 2024

Use kill -TERM {keepalived main process pid} to shut keepalived down gracefully, and likewise kill -HUP {keepalived main process pid} to reload the configuration. I am not sure what effect sending signals directly to the checker and vrrp processes will have, but it is not how keepalived is intended to work, and I am not surprised that it is having strange effects.
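
For a container without systemctl, signalling only the main process could be scripted roughly as below. This is a minimal sketch with hypothetical helper names, and the pid file path /run/keepalived.pid is an assumption: use whatever path your build writes or was set with the --pid option.

```python
import os
import signal


def read_main_pid(pidfile):
    """Return the PID recorded in the parent keepalived process's pid file."""
    with open(pidfile) as f:
        return int(f.read().strip())


def signal_keepalived(sig, pidfile="/run/keepalived.pid"):
    """Send sig to the main keepalived process only, never to its children."""
    pid = read_main_pid(pidfile)
    os.kill(pid, sig)
    return pid
```

With this, signal_keepalived(signal.SIGHUP) would request a reload and signal_keepalived(signal.SIGTERM) a graceful shutdown, leaving it to the parent to coordinate the checker and vrrp processes.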

CoderSinger commented on May 27, 2024

Thank you, I have no other questions.

CoderSinger commented on May 27, 2024

Hi @pqarmitage, I'm sorry to bother you again about the same issue.
I have upgraded keepalived to v2.2.8 and hit this problem again. The difference is that the address that was not removed from the network interface is now a virtual_ipaddress rather than a virtual_ipaddress_excluded. I suspect both cases have the same cause.

My operations:

  1. Stop keepalived
  2. Write the new keepalived.conf
  3. Start keepalived

I'm running keepalived in a Docker container whose image version is Alpine 3.17.6. There is no SEGFAULT this time.

After these steps, the old virtual IP addresses still exist on the network interfaces.
As shown below, I changed br-iapi's IP address from 10.230.93.64/22 to 10.230.93.241/22 and br-net_outband's IP address from 191.118.54.44/24 to 191.117.54.43/24, but the old IP addresses still exist on the network interfaces.

paas-controller2:/# ip a s br-net_outband
16: br-net_outband: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 06:49:ca:7d:44:49 brd ff:ff:ff:ff:ff:ff
    inet 191.117.54.43/24 scope global br-net_outband
       valid_lft forever preferred_lft forever
    inet 191.118.54.44/24 scope global br-net_outband
       valid_lft forever preferred_lft forever
    inet6 2dfe:ffff:0:f101::14/64 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 2efe:ffff:0:f101::13/64 scope global nodad
       valid_lft forever preferred_lft forever
    inet6 fe80::449:caff:fe7d:4449/64 scope link
       valid_lft forever preferred_lft forever

paas-controller2:/# ip a s br-iapi
18: br-iapi: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 7e:8e:34:84:ab:41 brd ff:ff:ff:ff:ff:ff
    inet 10.230.93.241/22 scope global br-iapi
       valid_lft forever preferred_lft forever
    inet 10.230.93.64/22 scope global secondary br-iapi
       valid_lft forever preferred_lft forever
    inet6 fe80::7c8e:34ff:fe84:ab41/64 scope link
       valid_lft forever preferred_lft forever

paas-controller2:/# cat /dev/shm/lbagent/keepalived.conf
global_defs {
        router_id 101.101.1.134
        vrrp_version 3
        lvs_sync_daemon br-papi xlb1
}
vrrp_instance xlb1 {
        state BACKUP
        interface br-papi
        nopreempt
        virtual_router_id 1
        priority 100
        advert_int 1
        unicast_src_ip 101.101.1.134
        unicast_peer {
                101.101.1.132
        }
        virtual_ipaddress {
                10.230.93.241/22 dev br-iapi
                101.101.1.254/24 dev br-papi
                191.117.54.43/24 dev br-net_outband
                172.31.0.1/16 dev vgh-e372
                192.23.11.13/16 dev br-net_traffic
        }
        virtual_ipaddress_excluded {
                2ffe:ffff:0:f101::1/64 dev br-papi
                2efe:ffff:0:f101::13/64 dev br-net_outband
                1173::1/64 dev vgh-e372
        }
        virtual_routes {
                0.0.0.0/0 via 10.230.92.1 dev br-iapi metric 10
        }
        notify_master "/backup_to_master.sh --ipv4=101.101.1.254 --ipv6=2ffe:ffff:0:f101::1 --action=del"
        notify_backup "/master_to_backup.sh --ipv4=101.101.1.254 --ipv6=2ffe:ffff:0:f101::1 --action=add"
}
virtual_server 191.117.54.43 35554 {
        delay_loop 5
        lb_algo wrr
        lb_kind NAT
... // vsg config was omitted

pqarmitage commented on May 27, 2024

@CoderSinger Can you please post the keepalived config from both before and after the reload, so that I can be absolutely clear about what changes are being made? I will then test this, make sure I understand what is happening, and, I hope, provide a fix.

CoderSinger commented on May 27, 2024

@pqarmitage
Our testing environment here cannot be kept indefinitely, but I have reproduced the same problem in a new testing environment. The VIP on the bond0.133 network port was changed from 10.230.133.112/24 to 10.230.133.113/24, but 10.230.133.112/24 still remains on bond0.133. The VLAN was not modified this time.

I will post the new and old configurations and logs below. I have noticed that even after restarting with the new configuration, keepalived still assigns the old address to the network port. I am not sure whether the problem lies in the "netlink_if_address_filter" function.
Note that because the VIP changes, the corresponding virtual servers need to be rebuilt. In our testing environment this is not a one-time operation but a continuous process, so after the restart there are multiple reloads as the virtual servers in keepalived.conf are modified.

Network interface information:

[root@paas-controller:/home/pict]$ ip a s bond0.133
13: bond0.133@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fc:2d:5e:66:8d:55 brd ff:ff:ff:ff:ff:ff
    inet 10.230.133.113/24 scope global bond0.133
       valid_lft forever preferred_lft forever
    inet 10.230.133.112/24 scope global secondary bond0.133
       valid_lft forever preferred_lft forever
    inet6 fe80::fe2d:5eff:fe66:8d55/64 scope link
       valid_lft forever preferred_lft forever

Old configuration:

global_defs {
        router_id 193.116.9.31
        vrrp_version 3
        lvs_sync_daemon bond0.209 test
}
vrrp_instance test {
        state BACKUP
        interface bond0.209
        nopreempt
        virtual_router_id 1
        priority 100
        advert_int 1
        unicast_src_ip 193.116.9.31
        unicast_peer {
                1.1.1.1
        }
        virtual_ipaddress {
                10.230.133.112/24 dev bond0.133
                192.23.11.14/16 dev bond0.700
                193.116.9.254/24 dev bond0.209
                172.31.0.1/16 dev vgh-b815
        }
        virtual_ipaddress_excluded {
        }
        virtual_routes {
                0.0.0.0/0 via 10.230.133.254 dev bond0.133 metric 10
        }
        notify_master "/backup_to_master.sh --ipv4=193.116.9.254 --ipv6= --action=del"
        notify_backup "/master_to_backup.sh --ipv4=193.116.9.254 --ipv6= --action=add"
}

New configuration:

global_defs {
        router_id 193.116.9.31
        vrrp_version 3
        lvs_sync_daemon bond0.209 test
}
vrrp_instance test {
        state BACKUP
        interface bond0.209
        nopreempt
        virtual_router_id 1
        priority 100
        advert_int 1
        unicast_src_ip 193.116.9.31
        unicast_peer {
                1.1.1.1
        }
        virtual_ipaddress {
                10.230.133.113/24 dev bond0.133
                192.23.11.14/16 dev bond0.700
                193.116.9.254/24 dev bond0.209
                172.31.0.1/16 dev vgh-b815
        }
        virtual_ipaddress_excluded {
        }
        virtual_routes {
                0.0.0.0/0 via 10.230.133.254 dev bond0.133 metric 10
        }
        notify_master "/backup_to_master.sh --ipv4=193.116.9.254 --ipv6= --action=del"
        notify_backup "/master_to_backup.sh --ipv4=193.116.9.254 --ipv6= --action=add"
}
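
To make the difference between the two configurations explicit, the check can be sketched in Python as below. This is a simplified illustration, not keepalived's own parser: the function names are hypothetical, it only understands the one-address-per-line layout used in the configs above, and it expects the text printed by ip a s for a single interface.

```python
import re


def parse_vips(config_text):
    """Collect (address, device) pairs from virtual_ipaddress[_excluded] blocks."""
    vips = set()
    in_block = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if re.match(r"virtual_ipaddress(_excluded)?\s*\{", line):
            in_block = True
        elif in_block and line.startswith("}"):
            in_block = False
        elif in_block and line:
            fields = line.split()
            # The device is the token after "dev", if one is given.
            dev = fields[fields.index("dev") + 1] if "dev" in fields[:-1] else None
            vips.add((fields[0], dev))
    return vips


def stale_vips(old_conf, new_conf, ifname, ip_addr_output):
    """Return VIPs dropped from the config that the interface still carries."""
    removed = parse_vips(old_conf) - parse_vips(new_conf)
    present = set(re.findall(r"^\s*inet6?\s+(\S+)", ip_addr_output, flags=re.M))
    return sorted(addr for addr, dev in removed
                  if dev == ifname and addr in present)
```

Fed the old and new configurations from this comment and the bond0.133 output shown earlier, the only candidate it can report is 10.230.133.112/24, which is the address that should have been removed.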

I think the key logs are:

[INFO] 2024/03/04 07:15:08 (test) sent 0 priority
[INFO] 2024/03/04 07:15:08 (test) removing Virtual Routes
[INFO] 2024/03/04 07:15:08 (test) removing VIPs.
[INFO] 2024/03/04 07:15:09 Stopped - used 3.778129 user time, 0.778837 system time
...
[INFO] 2024/03/04 07:15:09 IPVS cmd IP_VS_SO_SET_STOPDAEMON(1164) error: No such file or directory(2)
[INFO] 2024/03/04 07:15:09 IPVS cmd IP_VS_SO_SET_STARTDAEMON(1163) error: Daemon has already run(17)
[INFO] 2024/03/04 07:15:12 Netlink reflector reports IP 193.116.9.254 removed from bond0.209
[INFO] 2024/03/04 07:15:12 IPVS cmd IP_VS_SO_SET_STOPDAEMON(1164) error: No such file or directory(2)
[INFO] 2024/03/04 07:15:12 IPVS cmd IP_VS_SO_SET_STARTDAEMON(1163) error: Daemon has already run(17)
[INFO] 2024/03/04 07:15:12 (test) Entering BACKUP STATE
[INFO] 2024/03/04 07:15:12 (test) sent 0 priority
[INFO] 2024/03/04 07:15:12 (test) removing Virtual Routes
[INFO] 2024/03/04 07:15:12 Reloading: 0
[INFO] 2024/03/04 07:15:12 Netlink: error: No such process(3), type=RTM_DELROUTE(25), seq=1709502289, pid=0
[INFO] 2024/03/04 07:15:12 (test) removing VIPs.
[INFO] 2024/03/04 07:15:12    Virtual Router ID = 1
[INFO] 2024/03/04 07:15:12    Priority = 100
[INFO] 2024/03/04 07:15:12    Advert interval = 1000 milli-sec
[INFO] 2024/03/04 07:15:12    Accept = enabled
[INFO] 2024/03/04 07:15:12    Preempt = disabled
[INFO] 2024/03/04 07:15:12    Promote_secondaries = disabled
[INFO] 2024/03/04 07:15:12    Virtual IP :
[INFO] 2024/03/04 07:15:12      10.230.133.113/24 dev bond0.133 scope global
[INFO] 2024/03/04 07:15:12      193.116.9.254/24 dev bond0.209 scope global
[INFO] 2024/03/04 07:15:12      172.31.0.1/16 dev vgh-b815 scope global
[INFO] 2024/03/04 07:15:12      192.23.11.13/16 dev bond0.700 scope global
[INFO] 2024/03/04 07:15:12    Unicast TTL = 255
[INFO] 2024/03/04 07:15:12    Check unicast src : no
[INFO] 2024/03/04 07:15:12    Unicast Peer :
[INFO] 2024/03/04 07:15:12      1.1.1.1 min_ttl 0 max_ttl 255
[INFO] 2024/03/04 07:15:12    Unicast checksum compatibility = no
[INFO] 2024/03/04 07:15:12    No sockets allocated
[INFO] 2024/03/04 07:15:12    Virtual Routes :
[INFO] 2024/03/04 07:15:09 IPVS cmd IP_VS_SO_SET_STOPDAEMON(1164) error: No such file or directory(2)
[INFO] 2024/03/04 07:15:09 IPVS cmd IP_VS_SO_SET_STARTDAEMON(1163) error: Daemon has already run(17)
[INFO] 2024/03/04 07:15:12 Netlink reflector reports IP 193.116.9.254 removed from bond0.209
[INFO] 2024/03/04 07:15:12 IPVS cmd IP_VS_SO_SET_STOPDAEMON(1164) error: No such file or directory(2)
[INFO] 2024/03/04 07:15:12 IPVS cmd IP_VS_SO_SET_STARTDAEMON(1163) error: Daemon has already run(17)
[INFO] 2024/03/04 07:15:12 (test) Entering BACKUP STATE
[INFO] 2024/03/04 07:15:12 (test) sent 0 priority
[INFO] 2024/03/04 07:15:12 (test) removing Virtual Routes
[INFO] 2024/03/04 07:15:12 Reloading: 0
[INFO] 2024/03/04 07:15:12 Netlink: error: No such process(3), type=RTM_DELROUTE(25), seq=1709502290, pid=0
[INFO] 2024/03/04 07:15:12 (test) removing VIPs.
[INFO] 2024/03/04 07:15:12      0.0.0.0/0 via inet 10.230.133.254 dev bond0.133 proto 18 metric 10
[INFO] 2024/03/04 07:15:12    Using smtp notification = no
[INFO] 2024/03/04 07:15:12    Notify deleted = Fault
[INFO] 2024/03/04 07:15:12    Backup state transition script = '/master_to_backup.sh' '--ipv4=193.116.9.254' '--ipv6=' '--action=add', uid:gid 0:0
[INFO] 2024/03/04 07:15:12    Master state transition script = '/backup_to_master.sh' '--ipv4=193.116.9.254' '--ipv6=' '--action=del', uid:gid 0:0
[INFO] 2024/03/04 07:15:12    Notify priority changes = false
...
[INFO] 2024/03/04 07:15:12 ------< Interfaces >------
.....//omitted interfaces information
[INFO] 2024/03/04 07:15:12  Name = bond0.133
[INFO] 2024/03/04 07:15:12    index = 13
[INFO] 2024/03/04 07:15:12    IPv4 address = 10.230.133.112
[INFO] 2024/03/04 07:15:12    IPv6 address = fe80::fe2d:5eff:fe66:8d55
[INFO] 2024/03/04 07:15:12    MAC = fc:2d:5e:66:8d:55
[INFO] 2024/03/04 07:15:12    MAC broadcast = ff:ff:ff:ff:ff:ff
[INFO] 2024/03/04 07:15:12    State = UP, RUNNING
[INFO] 2024/03/04 07:15:12    MTU = 1500
[INFO] 2024/03/04 07:15:12    HW Type = ETHERNET
[INFO] 2024/03/04 07:15:12    NIC netlink status update
[INFO] 2024/03/04 07:15:12    Reset ARP config counter 0
[INFO] 2024/03/04 07:15:12    Original arp_ignore 1
[INFO] 2024/03/04 07:15:12    Original arp_filter 0
[INFO] 2024/03/04 07:15:12    Original promote_secondaries 0
[INFO] 2024/03/04 07:15:12    Reset promote_secondaries counter 0
[INFO] 2024/03/04 07:15:12    Tracking VRRP instances :
[INFO] 2024/03/04 07:15:12      test, weight 0
...
[INFO] 2024/03/04 07:15:15 Deassigned address 10.230.133.112 from interface bond0.133
[INFO] 2024/03/04 07:15:16 Assigned address 10.230.133.112 for interface bond0.133

keepalived_vrrp.log.1.gz
