Can you please provide the output of 'ip addr show eth0' on both systems?
If this doesn't help identify the cause of the problem I'll provide details of how to enable the various debug options within keepalived.
Sorry for the late answer; recently we had some problems reproducing the issue.
Below is the same issue with a slightly different config than the one previously attached: both nodes start as MASTER with the same priority. I know this is an anti-pattern, but as far as I tested locally, keepalived was able to recover from such a misconfiguration. Moreover, this time the number of reloads was forced to be higher than in the previous run.
The full output from both systems:
$hostname0: ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
3: eth0@if22956: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
link/ether 0a:58:0a:81:02:64 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.129.2.100/23 brd 10.129.3.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fd01:0:0:3::598b/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::858:aff:fe81:264/64 scope link
valid_lft forever preferred_lft forever
4: net1@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 2a:90:6d:9a:a4:1b brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.168.1.36/24 brd 172.168.1.255 scope global net1
valid_lft forever preferred_lft forever
inet 10.10.10.2/24 scope global net1
valid_lft forever preferred_lft forever
inet6 fe80::2890:6dff:fe9a:a41b/64 scope link
valid_lft forever preferred_lft forever
5: net2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 6e:21:6f:cf:da:91 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.168.0.4/24 scope global net2
valid_lft forever preferred_lft forever
inet 192.168.120.2/24 scope global net2
valid_lft forever preferred_lft forever
inet6 fe80::6c21:6fff:fecf:da91/64 scope link
valid_lft forever preferred_lft forever
$hostname0: ss
Netid State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
??? UNCONN 0 0 0.0.0.0%eth0:vrrp 0.0.0.0:*
??? UNCONN 0 0 10.129.2.100%eth0:vrrp 0.0.0.0:*
$hostname0: cat /tmp/keepalived.conf
global_defs {
    vrrp_startup_delay 10.0
    vrrp_garp_interval 0.001
    vrrp_version 3
    vrrp_garp_master_refresh 30
    vrrp_garp_lower_prio_repeat 5
    vrrp_higher_prio_send_advert true
    script_user root root
    notify_fifo /tmp/notify_fifo
    notify_fifo_script /tmp/notify.sh
}
vrrp_script check_masterability {
    script "/cmds -run check-master"
    interval 1
    timeout 1
    rise 1
    fall 1
}
vrrp_script check_masterability_on_active {
    script "/cmds -run check-master-on-active"
    interval 1
    timeout 1
    rise 2
    fall 3
}
track_file drop_master {
    file "/config/drop_master"
    weight 0
    init_file 0
}
vrrp_instance VI_1 {
    advert_int 0.4
    interface eth0
    state MASTER
    unicast_src_ip 10.129.2.100
    unicast_peer {
        10.131.0.83
    }
    virtual_router_id 1
    priority 255
    virtual_ipaddress {
        192.168.120.2/24 dev net2
        10.10.10.2/24 dev net1
    }
    virtual_routes {
    }
    track_script {
        check_masterability
        check_masterability_on_active
    }
    track_interface {
        net1
        net2
    }
    track_file {
        drop_master
    }
    notify_master "/cmds -run on-master"
}
$hostname0: tcpdump proto 112
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
07:31:50.435733 IP 10.131.0.83 > svc-tcp-service-01-0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:50.531809 IP svc-tcp-service-01-0 > 10.131.0.83: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:50.835950 IP 10.131.0.83 > svc-tcp-service-01-0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:50.931995 IP svc-tcp-service-01-0 > 10.131.0.83: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:51.236236 IP 10.131.0.83 > svc-tcp-service-01-0: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length
$hostname1: ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
3: eth0@if18234: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default
link/ether 0a:58:0a:83:00:53 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.131.0.83/23 brd 10.131.1.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fd01:0:0:5::4714/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::858:aff:fe83:53/64 scope link
valid_lft forever preferred_lft forever
4: net1@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether d6:cc:d7:68:3a:f3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.168.1.48/24 brd 172.168.1.255 scope global net1
valid_lft forever preferred_lft forever
inet 10.10.10.2/24 scope global net1
valid_lft forever preferred_lft forever
inet6 fe80::d4cc:d7ff:fe68:3af3/64 scope link
valid_lft forever preferred_lft forever
5: net2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether 96:6c:04:e2:d5:28 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.168.0.3/24 scope global net2
valid_lft forever preferred_lft forever
inet 192.168.120.2/24 scope global net2
valid_lft forever preferred_lft forever
inet6 fe80::946c:4ff:fee2:d528/64 scope link
valid_lft forever preferred_lft forever
$hostname1: ss
Netid State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
??? UNCONN 0 0 0.0.0.0%eth0:vrrp 0.0.0.0:*
??? UNCONN 0 0 10.131.0.83%eth0:vrrp 0.0.0.0:*
$hostname1: cat /tmp/keepalived.conf
global_defs {
    vrrp_startup_delay 10.0
    vrrp_garp_interval 0.001
    vrrp_version 3
    vrrp_garp_master_refresh 30
    vrrp_garp_lower_prio_repeat 5
    vrrp_higher_prio_send_advert true
    script_user root root
    notify_fifo /tmp/notify_fifo
    notify_fifo_script /tmp/notify.sh
}
vrrp_script check_masterability {
    script "/cmds -run check-master"
    interval 1
    timeout 1
    rise 1
    fall 1
}
vrrp_script check_masterability_on_active {
    script "/cmds -run check-master-on-active"
    interval 1
    timeout 1
    rise 2
    fall 3
}
track_file drop_master {
    file "/config/drop_master"
    weight 0
    init_file 0
}
vrrp_instance VI_1 {
    advert_int 0.4
    interface eth0
    state MASTER
    unicast_src_ip 10.131.0.83
    unicast_peer {
        10.129.2.100
    }
    virtual_router_id 1
    priority 255
    virtual_ipaddress {
        192.168.120.2/24 dev net2
        10.10.10.2/24 dev net1
    }
    virtual_routes {
    }
    track_script {
        check_masterability
        check_masterability_on_active
    }
    track_interface {
        net1
        net2
    }
    track_file {
        drop_master
    }
    notify_master "/cmds -run on-master"
}
$hostname1: tcpdump proto 112
dropped privs to tcpdump
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
07:31:12.814505 IP svc-tcp-service-01-1 > 10.129.2.100: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:12.910542 IP 10.129.2.100 > svc-tcp-service-01-1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:13.214688 IP svc-tcp-service-01-1 > 10.129.2.100: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:13.310866 IP 10.129.2.100 > svc-tcp-service-01-1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:13.615045 IP svc-tcp-service-01-1 > 10.129.2.100: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
07:31:13.711201 IP 10.129.2.100 > svc-tcp-service-01-1: VRRPv3, Advertisement, (ttl 254), vrid 1, prio 255, intvl 40cs, length 16
07:31:14.015266 IP svc-tcp-service-01-1 > 10.129.2.100: VRRPv3, Advertisement, vrid 1, prio 255, intvl 40cs, length 16
I even checked with strace, and it seems that keepalived does receive and process the packets:
strace: Process 89 attached
sendmsg(14, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.131.0.83")}, msg_namelen=16, msg_iov=[{iov_base="E\300\0$\17;\0\0\377p\0\0\n\201\2d\n\203\0S1\1\377\2\0(j\341\300\250x\2"..., iov_len=36}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.131.0.83")}, msg_namelen=28 => 16, msg_iov=[{iov_base="E\300\0$\17<\0\0\376p\224\263\n\203\0S\n\201\2d1\1\377\2\0(j\341\300\250x\2"..., iov_len=1400}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_CTRUNC|MSG_TRUNC) = 36
recvmsg(13, {msg_namelen=16}, MSG_CTRUNC|MSG_TRUNC) = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(14, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.131.0.83")}, msg_namelen=16, msg_iov=[{iov_base="E\300\0$\17<\0\0\377p\0\0\n\201\2d\n\203\0S1\1\377\2\0(j\341\300\250x\2"..., iov_len=36}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.131.0.83")}, msg_namelen=28 => 16, msg_iov=[{iov_base="E\300\0$\17=\0\0\376p\224\262\n\203\0S\n\201\2d1\1\377\2\0(j\341\300\250x\2"..., iov_len=1400}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_CTRUNC|MSG_TRUNC) = 36
strace: Process 87 attached
recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.129.2.100")}, msg_namelen=28 => 16, msg_iov=[{iov_base="E\300\0$\16r\0\0\376p\225}\n\201\2d\n\203\0S1\1\377\2\0(j\341\300\250x\2"..., iov_len=1400}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_CTRUNC|MSG_TRUNC) = 36
recvmsg(13, {msg_namelen=16}, MSG_CTRUNC|MSG_TRUNC) = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(14, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.129.2.100")}, msg_namelen=16, msg_iov=[{iov_base="E\300\0$\16s\0\0\377p\0\0\n\203\0S\n\201\2d1\1\377\2\0(j\341\300\250x\2"..., iov_len=36}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
recvmsg(13, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.129.2.100")}, msg_namelen=28 => 16, msg_iov=[{iov_base="E\300\0$\16s\0\0\376p\225|\n\201\2d\n\203\0S1\1\377\2\0(j\341\300\250x\2"..., iov_len=1400}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, MSG_CTRUNC|MSG_TRUNC) = 36
recvmsg(13, {msg_namelen=16}, MSG_CTRUNC|MSG_TRUNC) = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(14, {msg_name={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("10.129.2.100")}, msg_namelen=16, msg_iov=[{iov_base="E\300\0$\16t\0\0\377p\0\0\n\203\0S\n\201\2d1\1\377\2\0(j\341\300\250x\2"..., iov_len=36}], msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 36
The keepalived logs are available at:
https://gist.github.com/stanluk/cc828b1f99a2f4734f609501eaa8c4ab
Is there any progress on this issue? I am also encountering the same problem in my Kubernetes cluster.
I think this is probably caused by reloading keepalived before the vrrp_startup_delay has expired. Looking in vrrp_dispatcher_read() in vrrp_scheduler.c, there are the following lines of code:
if (vrrp_delayed_start_time.tv_sec)
    continue;
which means that any packet received before the start delay timer expires is discarded. However, when a reload occurs before the delay timer expires, the timer thread that would end the delay is removed, so the delay timer never expires and received packets continue to be discarded.
I will continue investigating, and submit a patch later today.
I was able to reproduce this problem, and it was indeed caused by reloading keepalived before the startup_delay timer had expired.
Commit 58483b2 resolves this issue. Many apologies for the long delay in resolving this, but I hadn't previously realised the significance of the startup delay.
@pqarmitage thanks for investigating this and providing a patch!