Comments (9)
You are showing that keepalived segfaulted, which would explain why some addresses failed to be removed (keepalived crashed before it removed the addresses).
The version of keepalived you are running is quite old (nearly 4 years), and some 800 non-merge commits behind the current version. Can you please try running with v2.2.8 and see if you still experience the problem?
If you still experience the problem with v2.2.8, then a stack backtrace produced from the coredump could help us identify the cause of the problem.
from keepalived.
Thanks for your reply, @pqarmitage!
May I ask whether the crash that occurs when the network port (or VLAN interface) holding the VIP is deleted while keepalived is running has been fixed in the new version? If it has, in which commit?
Maybe it is related to this one? #1902 (comment)
@CoderSinger I do not know whether the segfault caused by removing an interface has been fixed, although I suspect it has. Unfortunately I do not have the time to go through all the commits, and I don't remember all the changes that have been made over the last 4 years or so! This is why I asked you to try v2.2.8 to see if it resolves your issue. If the issue still exists with v2.2.8 then I will research it and provide a fix.
If you want to see all the relevant commits, you could try executing in a keepalived git tree: git log --oneline v2.1.5..v2.2.8 | grep -v " Merge "
and looking for a likely commit description.
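As a minimal sketch of narrowing that list down, the same filter can be wrapped in a small helper that also keeps only commit subjects matching a keyword pattern. The function name likely_commits and the keywords in the usage line are hypothetical and are not taken from the keepalived history.

```shell
# Filter `git log --oneline` output read from stdin:
# drop merge commits, then keep only subjects matching the
# case-insensitive extended regex given as $1.
# likely_commits is a hypothetical helper name.
likely_commits() {
    grep -v " Merge " | grep -iE "$1"
}

# Usage, run inside a keepalived git tree (keywords are only guesses):
#   git log --oneline v2.1.5..v2.2.8 | likely_commits 'segfault|interface|address'
```

The pattern only reduces the number of commits to read. A relevant fix may well be described in words that the pattern does not match.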
@pqarmitage Thank you for your reply. I will consider upgrading keepalived.
I also have some other questions. I run keepalived in a Docker container, and because I did not want to increase the size of the container image, I do not use systemctl to control keepalived. Instead, I run the keepalived executable directly with the required parameters. In this case, the 'ps' command shows three processes: keepalived, keepalived checker, and keepalived vrrp.
I have two questions:
- If I want the process to exit gracefully so that the IPVS rules are cleared, should I use "kill -15 {3pids}", or just "kill -15 {keepalived main process pid}"?
- If I want keepalived to reload so that the new keepalived.conf takes effect, should I use "kill -HUP {3pids}", or just "kill -HUP {keepalived main process pid}"?
I ask because I ran into the following problem when sending the reload signal to all three PIDs.
Normal checker logs:
[INFO] 2024/02/23 15:59:34 Shutting down service [1111::18]:tcp:5005 from VS [1efe:ffff:0:f107::18]:tcp:5005
[INFO] 2024/02/23 15:59:34 Shutting down service [192.169.0.24]:tcp:443 from VS [10.227.200.18]:tcp:443
[INFO] 2024/02/23 15:59:34 Shutting down service [192.169.0.40]:tcp:29952 from VS [10.227.200.18]:tcp:29952
[INFO] 2024/02/23 15:59:34 Shutting down service [1111::18]:tcp:18080 from VS [5efe:ffff:0:f101::1]:tcp:18080
[INFO] 2024/02/23 15:59:34 Shutting down service [1111::18]:tcp:29292 from VS [5efe:ffff:0:f101::1]:tcp:29292
Abnormal checker logs:
[INFO] 2024/02/23 15:59:34 Shutting down service [1111::18]:tcp:5005 from VS [1efe:ffff:0:f107::18]:tcp:5005
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DELDEST(1160) error: No such destination(2)
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DEL(1156) error: No such file or directory(2)
[INFO] 2024/02/23 15:59:34 Shutting down service [192.169.0.24]:tcp:443 from VS [10.227.200.18]:tcp:443
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DELDEST(1160) error: No such destination(2)
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DEL(1156) error: No such file or directory(2)
[INFO] 2024/02/23 15:59:34 Shutting down service [192.169.0.40]:tcp:29952 from VS [10.227.200.18]:tcp:29952
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DELDEST(1160) error: No such destination(2)
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DEL(1156) error: No such file or directory(2)
[INFO] 2024/02/23 15:59:34 Shutting down service [1111::18]:tcp:18080 from VS [5efe:ffff:0:f101::1]:tcp:18080
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DELDEST(1160) error: No such destination(2)
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DEL(1156) error: No such file or directory(2)
[INFO] 2024/02/23 15:59:34 Shutting down service [1111::18]:tcp:29292 from VS [5efe:ffff:0:f101::1]:tcp:29292
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DELDEST(1160) error: No such destination(2)
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_DEL(1156) error: No such file or directory(2)
It looks as if reload was called repeatedly, so that the service configuration was deleted twice and the second deletion failed. At the same timestamp as the errors above, the VRRP log shows the keepalived instance entering the BACKUP state, even though keepalived is running on a single node.
vrrp log:
[INFO] 2024/02/23 15:59:34 IPVS cmd IP_VS_SO_SET_STOPDAEMON(1164) error: No such file or directory(2)
[INFO] 2024/02/23 15:59:34 (xlb1) Entering BACKUP STATE
[INFO] 2024/02/23 15:59:34 (xlb1) sent 0 priority
[INFO] 2024/02/23 15:59:34 (xlb1) removing Virtual Routes
[INFO] 2024/02/23 15:59:34 Netlink: error: No such process(3), type=RTM_DELROUTE(25), seq=1708675201, pid=0
[INFO] 2024/02/23 15:59:34 Netlink: error: No such process(3), type=RTM_DELROUTE(25), seq=1708675202, pid=0
[INFO] 2024/02/23 15:59:34 (xlb1) removing VIPs.
[INFO] 2024/02/23 15:59:34 (xlb1) removing E-VIPs.
To shut keepalived down gracefully, use kill -TERM {keepalived main process pid}. Likewise, use kill -HUP {keepalived main process pid} to reload the configuration. I am not sure what effect sending signals directly to the checker and vrrp processes will have, but that is not how it is intended to work, and I am not surprised that it is having strange effects.
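A minimal sketch of signalling only the main process from a container entrypoint follows. It assumes the parent process writes its PID to a pid file. The default path /run/keepalived.pid used here is an assumption that depends on how keepalived was built and started, and signal_keepalived is a hypothetical helper name.

```shell
# Send a signal to the keepalived parent process only; the checker and
# vrrp child processes are never signalled directly.
#   $1: signal name (TERM for graceful shutdown, HUP for reload)
#   $2: pid file of the parent process
#       (assumed default: /run/keepalived.pid; adjust to your setup)
signal_keepalived() {
    sig=$1
    pidfile=${2:-/run/keepalived.pid}
    if [ ! -r "$pidfile" ]; then
        echo "pid file $pidfile is not readable" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    kill -s "$sig" "$pid"
}

# Usage:
#   signal_keepalived HUP     # reload the configuration
#   signal_keepalived TERM    # graceful shutdown
```

Reading the PID from the pid file avoids picking the wrong one of the three processes from the 'ps' output.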
Thank you, I have no other questions.
Hi, I'm sorry to bother you again about the same issue, @pqarmitage.
I have upgraded keepalived to v2.2.8 and hit this problem again. The difference is that the address left on the network interface this time comes from virtual_ipaddress rather than virtual_ipaddress_excluded. I suspect both have the same cause.
My operations:
- Stop keepalived
- Write new keepalived.conf
- Start keepalived
I'm running keepalived in a Docker container whose image is based on Alpine 3.17.6. There is no SEGFAULT this time.
After the restart, the old virtual IP addresses still exist on the network interfaces.
As shown below, I changed br-iapi's IP address from 10.230.93.64/22 to 10.230.93.241/22 and br-net_outband's IP address from 191.118.54.44/24 to 191.117.54.43/24, but the old IP addresses still exist on the network interfaces.
paas-controller2:/# ip a s br-net_outband
16: br-net_outband: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 06:49:ca:7d:44:49 brd ff:ff:ff:ff:ff:ff
inet 191.117.54.43/24 scope global br-net_outband
valid_lft forever preferred_lft forever
inet 191.118.54.44/24 scope global br-net_outband
valid_lft forever preferred_lft forever
inet6 2dfe:ffff:0:f101::14/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 2efe:ffff:0:f101::13/64 scope global nodad
valid_lft forever preferred_lft forever
inet6 fe80::449:caff:fe7d:4449/64 scope link
valid_lft forever preferred_lft forever
paas-controller2:/# ip a s br-iapi
18: br-iapi: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 7e:8e:34:84:ab:41 brd ff:ff:ff:ff:ff:ff
inet 10.230.93.241/22 scope global br-iapi
valid_lft forever preferred_lft forever
inet 10.230.93.64/22 scope global secondary br-iapi
valid_lft forever preferred_lft forever
inet6 fe80::7c8e:34ff:fe84:ab41/64 scope link
valid_lft forever preferred_lft forever
paas-controller2:/# cat /dev/shm/lbagent/keepalived.conf
global_defs {
router_id 101.101.1.134
vrrp_version 3
lvs_sync_daemon br-papi xlb1
}
vrrp_instance xlb1 {
state BACKUP
interface br-papi
nopreempt
virtual_router_id 1
priority 100
advert_int 1
unicast_src_ip 101.101.1.134
unicast_peer {
101.101.1.132
}
virtual_ipaddress {
10.230.93.241/22 dev br-iapi
101.101.1.254/24 dev br-papi
191.117.54.43/24 dev br-net_outband
172.31.0.1/16 dev vgh-e372
192.23.11.13/16 dev br-net_traffic
}
virtual_ipaddress_excluded {
2ffe:ffff:0:f101::1/64 dev br-papi
2efe:ffff:0:f101::13/64 dev br-net_outband
1173::1/64 dev vgh-e372
}
virtual_routes {
0.0.0.0/0 via 10.230.92.1 dev br-iapi metric 10
}
notify_master "/backup_to_master.sh --ipv4=101.101.1.254 --ipv6=2ffe:ffff:0:f101::1 --action=del"
notify_backup "/master_to_backup.sh --ipv4=101.101.1.254 --ipv6=2ffe:ffff:0:f101::1 --action=add"
}
virtual_server 191.117.54.43 35554 {
delay_loop 5
lb_algo wrr
lb_kind NAT
... // vsg config was omitted
@codesinger Can you please post the keepalived config from both before and after the reload, so that I can be absolutely clear about what changes are being made? I will then test this and make sure I understand what is happening, and I hope to provide a fix.
@pqarmitage
Our testing environment cannot be kept indefinitely, but I have reproduced the same problem in a new testing environment. The VIP on the bond0.133 network port was changed from 10.230.133.112/24 to 10.230.133.113/24, but 10.230.133.112/24 still remains on bond0.133. The VLAN was not modified this time.
I will post the old and new configurations and the logs below. I have noticed that even after restarting with the new configuration, keepalived still assigns the old address to the network port. I am not sure whether the problem is in the "netlink_if_address_filter" function.
Note that because the VIP changes, the corresponding virtual servers need to be rebuilt. In our testing environment this is not a one-time operation but a continuous process, so after the restart there are multiple reloads as the virtual servers in keepalived.conf are modified.
Network interface information:
[root@paas-controller:/home/pict]$ ip a s bond0.133
13: bond0.133@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether fc:2d:5e:66:8d:55 brd ff:ff:ff:ff:ff:ff
inet 10.230.133.113/24 scope global bond0.133
valid_lft forever preferred_lft forever
inet 10.230.133.112/24 scope global secondary bond0.133
valid_lft forever preferred_lft forever
inet6 fe80::fe2d:5eff:fe66:8d55/64 scope link
valid_lft forever preferred_lft forever
Old configuration:
global_defs {
router_id 193.116.9.31
vrrp_version 3
lvs_sync_daemon bond0.209 test
}
vrrp_instance test {
state BACKUP
interface bond0.209
nopreempt
virtual_router_id 1
priority 100
advert_int 1
unicast_src_ip 193.116.9.31
unicast_peer {
1.1.1.1
}
virtual_ipaddress {
10.230.133.112/24 dev bond0.133
192.23.11.14/16 dev bond0.700
193.116.9.254/24 dev bond0.209
172.31.0.1/16 dev vgh-b815
}
virtual_ipaddress_excluded {
}
virtual_routes {
0.0.0.0/0 via 10.230.133.254 dev bond0.133 metric 10
}
notify_master "/backup_to_master.sh --ipv4=193.116.9.254 --ipv6= --action=del"
notify_backup "/master_to_backup.sh --ipv4=193.116.9.254 --ipv6= --action=add"
}
New configuration:
global_defs {
router_id 193.116.9.31
vrrp_version 3
lvs_sync_daemon bond0.209 test
}
vrrp_instance test {
state BACKUP
interface bond0.209
nopreempt
virtual_router_id 1
priority 100
advert_int 1
unicast_src_ip 193.116.9.31
unicast_peer {
1.1.1.1
}
virtual_ipaddress {
10.230.133.113/24 dev bond0.133
192.23.11.14/16 dev bond0.700
193.116.9.254/24 dev bond0.209
172.31.0.1/16 dev vgh-b815
}
virtual_ipaddress_excluded {
}
virtual_routes {
0.0.0.0/0 via 10.230.133.254 dev bond0.133 metric 10
}
notify_master "/backup_to_master.sh --ipv4=193.116.9.254 --ipv6= --action=del"
notify_backup "/master_to_backup.sh --ipv4=193.116.9.254 --ipv6= --action=add"
}
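The difference between the two configurations can be listed mechanically. The sketch below assumes each virtual_ipaddress entry is written as "ADDRESS dev IFACE" on its own line, as in the configurations above. The helper names vips_from_conf and removed_vips are hypothetical.

```shell
# Print "ADDRESS IFACE" for every entry of the virtual_ipaddress block
# in the keepalived.conf given as $1 (virtual_ipaddress_excluded is skipped).
vips_from_conf() {
    awk '
        $1 == "virtual_ipaddress" { in_block = 1; next }
        in_block && $1 == "}"     { in_block = 0; next }
        in_block && $2 == "dev"   { print $1, $3 }
    ' "$1"
}

# Print the VIPs that are present in the old config ($1)
# but no longer present in the new config ($2).
removed_vips() {
    old_list=$(mktemp)
    new_list=$(mktemp)
    vips_from_conf "$1" | sort > "$old_list"
    vips_from_conf "$2" | sort > "$new_list"
    comm -23 "$old_list" "$new_list"
    rm -f "$old_list" "$new_list"
}

# Usage (file names are placeholders):
#   removed_vips old-keepalived.conf new-keepalived.conf
```

Each address it prints is one that should no longer be on its interface after the restart, which can then be checked with ip a s on that interface.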
I think the key logs are:
[INFO] 2024/03/04 07:15:08 (test) sent 0 priority
[INFO] 2024/03/04 07:15:08 (test) removing Virtual Routes
[INFO] 2024/03/04 07:15:08 (test) removing VIPs.
[INFO] 2024/03/04 07:15:09 Stopped - used 3.778129 user time, 0.778837 system time
...
[INFO] 2024/03/04 07:15:09 IPVS cmd IP_VS_SO_SET_STOPDAEMON(1164) error: No such file or directory(2)
[INFO] 2024/03/04 07:15:09 IPVS cmd IP_VS_SO_SET_STARTDAEMON(1163) error: Daemon has already run(17)
[INFO] 2024/03/04 07:15:12 Netlink reflector reports IP 193.116.9.254 removed from bond0.209
[INFO] 2024/03/04 07:15:12 IPVS cmd IP_VS_SO_SET_STOPDAEMON(1164) error: No such file or directory(2)
[INFO] 2024/03/04 07:15:12 IPVS cmd IP_VS_SO_SET_STARTDAEMON(1163) error: Daemon has already run(17)
[INFO] 2024/03/04 07:15:12 (test) Entering BACKUP STATE
[INFO] 2024/03/04 07:15:12 (test) sent 0 priority
[INFO] 2024/03/04 07:15:12 (test) removing Virtual Routes
[INFO] 2024/03/04 07:15:12 Reloading: 0
[INFO] 2024/03/04 07:15:12 Netlink: error: No such process(3), type=RTM_DELROUTE(25), seq=1709502289, pid=0
[INFO] 2024/03/04 07:15:12 (test) removing VIPs.
[INFO] 2024/03/04 07:15:12 Virtual Router ID = 1
[INFO] 2024/03/04 07:15:12 Priority = 100
[INFO] 2024/03/04 07:15:12 Advert interval = 1000 milli-sec
[INFO] 2024/03/04 07:15:12 Accept = enabled
[INFO] 2024/03/04 07:15:12 Preempt = disabled
[INFO] 2024/03/04 07:15:12 Promote_secondaries = disabled
[INFO] 2024/03/04 07:15:12 Virtual IP :
[INFO] 2024/03/04 07:15:12 10.230.133.113/24 dev bond0.133 scope global
[INFO] 2024/03/04 07:15:12 193.116.9.254/24 dev bond0.209 scope global
[INFO] 2024/03/04 07:15:12 172.31.0.1/16 dev vgh-b815 scope global
[INFO] 2024/03/04 07:15:12 192.23.11.13/16 dev bond0.700 scope global
[INFO] 2024/03/04 07:15:12 Unicast TTL = 255
[INFO] 2024/03/04 07:15:12 Check unicast src : no
[INFO] 2024/03/04 07:15:12 Unicast Peer :
[INFO] 2024/03/04 07:15:12 1.1.1.1 min_ttl 0 max_ttl 255
[INFO] 2024/03/04 07:15:12 Unicast checksum compatibility = no
[INFO] 2024/03/04 07:15:12 No sockets allocated
[INFO] 2024/03/04 07:15:12 Virtual Routes :
[INFO] 2024/03/04 07:15:09 IPVS cmd IP_VS_SO_SET_STOPDAEMON(1164) error: No such file or directory(2)
[INFO] 2024/03/04 07:15:09 IPVS cmd IP_VS_SO_SET_STARTDAEMON(1163) error: Daemon has already run(17)
[INFO] 2024/03/04 07:15:12 Netlink reflector reports IP 193.116.9.254 removed from bond0.209
[INFO] 2024/03/04 07:15:12 IPVS cmd IP_VS_SO_SET_STOPDAEMON(1164) error: No such file or directory(2)
[INFO] 2024/03/04 07:15:12 IPVS cmd IP_VS_SO_SET_STARTDAEMON(1163) error: Daemon has already run(17)
[INFO] 2024/03/04 07:15:12 (test) Entering BACKUP STATE
[INFO] 2024/03/04 07:15:12 (test) sent 0 priority
[INFO] 2024/03/04 07:15:12 (test) removing Virtual Routes
[INFO] 2024/03/04 07:15:12 Reloading: 0
[INFO] 2024/03/04 07:15:12 Netlink: error: No such process(3), type=RTM_DELROUTE(25), seq=1709502290, pid=0
[INFO] 2024/03/04 07:15:12 (test) removing VIPs.
[INFO] 2024/03/04 07:15:12 0.0.0.0/0 via inet 10.230.133.254 dev bond0.133 proto 18 metric 10
[INFO] 2024/03/04 07:15:12 Using smtp notification = no
[INFO] 2024/03/04 07:15:12 Notify deleted = Fault
[INFO] 2024/03/04 07:15:12 Backup state transition script = '/master_to_backup.sh' '--ipv4=193.116.9.254' '--ipv6=' '--action=add', uid:gid 0:0
[INFO] 2024/03/04 07:15:12 Master state transition script = '/backup_to_master.sh' '--ipv4=193.116.9.254' '--ipv6=' '--action=del', uid:gid 0:0
[INFO] 2024/03/04 07:15:12 Notify priority changes = false
...
[INFO] 2024/03/04 07:15:12 ------< Interfaces >------
.....//omitted interfaces information
[INFO] 2024/03/04 07:15:12 Name = bond0.133
[INFO] 2024/03/04 07:15:12 index = 13
[INFO] 2024/03/04 07:15:12 IPv4 address = 10.230.133.112
[INFO] 2024/03/04 07:15:12 IPv6 address = fe80::fe2d:5eff:fe66:8d55
[INFO] 2024/03/04 07:15:12 MAC = fc:2d:5e:66:8d:55
[INFO] 2024/03/04 07:15:12 MAC broadcast = ff:ff:ff:ff:ff:ff
[INFO] 2024/03/04 07:15:12 State = UP, RUNNING
[INFO] 2024/03/04 07:15:12 MTU = 1500
[INFO] 2024/03/04 07:15:12 HW Type = ETHERNET
[INFO] 2024/03/04 07:15:12 NIC netlink status update
[INFO] 2024/03/04 07:15:12 Reset ARP config counter 0
[INFO] 2024/03/04 07:15:12 Original arp_ignore 1
[INFO] 2024/03/04 07:15:12 Original arp_filter 0
[INFO] 2024/03/04 07:15:12 Original promote_secondaries 0
[INFO] 2024/03/04 07:15:12 Reset promote_secondaries counter 0
[INFO] 2024/03/04 07:15:12 Tracking VRRP instances :
[INFO] 2024/03/04 07:15:12 test, weight 0
...
[INFO] 2024/03/04 07:15:15 Deassigned address 10.230.133.112 from interface bond0.133
[INFO] 2024/03/04 07:15:16 Assigned address 10.230.133.112 for interface bond0.133