Comments (15)
Thanks, taking a look.
update 1: Cannot reproduce on Ubuntu 22.04 host with 6.2 kernel running the test in an LXD sid container nor in a bookworm container.
from dqlite.
Aha, I can reproduce if I block my container from accessing the outside world. my container isn't assigned an IPv4 address (which is fortunately an issue I ran into by accident, but allows to reproduce this problem). If I disable my host firewall my LXD container is assigned an IPv4 address and the tests succeed. Not sure if this gives you something useful to work with ... don't know yet if this is a problem on our side.
from dqlite.
Ah, that's interesting, and reflects my setup. The containers I spin up for packaging work are on an IPv6-only network segment, although they do have both IPv4 and IPv6 addresses setup on the loopback interface. My bookworm machine has both addresses, and the QEMU VM only has IPv4 due to the simple NAT I configured.
I haven't looked into how the tests are trying to bind to addresses -- are they assuming there's an IPv4 address they can use on some "real" (non-loopback) interface?
from dqlite.
@gibmat Thanks for the report. Just for some context -- which release of libraft are you upgrading from, and do the same tests pass in the same containerized environment when run against that earlier version?
It's entirely possible that we have some IPv4-related implicit assumptions in our networking code; I'm looking into what the specific problem might be.
from dqlite.
So at least some of the failures here are caused by failing to resolve 127.0.0.1:9001
, which is hardcoded for the tests here:
https://github.com/canonical/raft/blob/b68076fdc7d34a901cc2eb5eb480832c709b2532/test/lib/uv.h#L51
We could conceivably make it possible to run the tests using a different address/port.
(At least, I was able to reproduce the same pattern of test failures by doing ip addr del
on the assigned IPv4 address inside a LXD container and running the test suite there; and I traced some of the test failures in that case to failing to a failed call to getaddrinfo against 127.0.0.1:9001
inside uvIpResolveBindAddresses.)
More generally, the code that implements the libuv TCP transport for raft is not IPv6-capable right now.
from dqlite.
The reason that 127.0.0.1
can't be resolved here despite the existence of the loopback interface is that we pass the AI_ADDRCONFIG
flag to getaddrinfo.
from dqlite.
When trying to build the raft 0.18.0 for Fedora 37 or 38 via namespaced mock buildroot I'm facing the same behaviour as @gibmat. The tests remain stuck at:
address=localhost:9000, bind-address=localhost:9000 [ ERROR ]
Error: test/integration/test_uv_tcp_listen.c:211: assertion failed: rv == 0 (18 == 0)
Error: child killed by signal 6 (Aborted)
address=localhost:9000, bind-address=:9000 [ ERROR ]
Error: test/integration/test_uv_tcp_listen.c:211: assertion failed: rv == 0 (18 == 0)
Error: child killed by signal 6 (Aborted)
address=localhost:9000, bind-address=0.0.0.0:9000 [ ERROR ]
Error: test/integration/test_uv_tcp_listen.c:211: assertion failed: rv == 0 (18 == 0)
Error: child killed by signal 6 (Aborted)
tcp_connect/closeDuringSecondConnect [ OK ] [ 0.00023986 / 0.00029068 CPU ]
tcp_connect/closeDuringConnectAbort
The build environment only has a loopback device:
<mock-chroot> sh-5.2# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
The exact same build works fine for raft 0.17.1.
from dqlite.
Thanks for the report @ganto. I have an in-progress PR to rework how we call getaddrinfo in uv_ip.c that will hopefully enable running the test suite smoothly when there is no non-loopback IPv4 address available.
from dqlite.
The exact same build works fine for raft 0.17.1.
Can you confirm this please, because I can reproduce this behavior from 0.16.0
(incl.) onwards.
from dqlite.
Summary of where I think the problems are and how to approach fixing them:
-
There are two calls to getaddrinfo in the main codebase (note that uv_getaddrinfo just calls getaddrinfo on the thread pool):
https://github.com/canonical/raft/blob/b68076fdc7d34a901cc2eb5eb480832c709b2532/src/uv_ip.c#L57-L85
https://github.com/canonical/raft/blob/b68076fdc7d34a901cc2eb5eb480832c709b2532/src/uv_tcp_connect.c#L293-L294The first one is used to implement UvTcpListen, the second to implement UvTcpConnect.
-
In both cases, we pass the
AI_ADDRCONFIG
flag. The upshot of this is that on a system with no configured non-loopback IPv4 addresses, no IPv4 addresses will be returned from getaddrinfo. Because we pass AF_INET to getaddrinfo, no addresses will be returned at all in this case. This explains many of the test failures. -
I am hesitant to remove
AI_ADDRCONFIG
from the getaddrinfo calls, because I don't fully understand the implications for downstream. Based on the man page it sounds like without this flag "bogus" addrinfos will be returned in some cases, and then raft would fail (fatally) trying to bind to those addresses. -
Instead, I would propose that we upgrade the code in uv_ip.c to support parsing IPv6 + port pairs, so that people who want to run the tests can adapt the ones that rely on the network to listen on
[::1]:9001
(etc.) if that's appropriate. We already have parsing code in dqlite that can be adapted for this. -
Of course, we also need to pass something other than AF_INET to getaddrinfo for this to work. Just using AF_UNSPEC runs into problems because on a system that does have an IPv4 address configured, something like
getaddrinfo("::1", "9001", AF_UNSPEC)
will return both IPv4 and IPv6 addresses that conflict (EADDRINUSE), and on the server side we try to bind to every returned addrinfo. (Which is a design decision that we could revisit.) -
Instead, we could pass either AF_INET or AF_INET6 to getaddrinfo depending on whether the pre-split address string uses the IPv6-specific syntax
[addr]:port
. This is backwards-compatible in the sense that both hostnames and IPv4 address strings passed into raft will behave just like they did before. We can continue using AI_V4MAPPED which I think should help with binding/connecting to IPv6 addresses on systems that only have IPv4 set up. -
The test TCP server code in test/lib/tcp.c will also need updating along the lines I've suggested.
-
I'm not sure what to do about cases where we end up calling
getaddrinfo(NULL)
. We could continue passing AF_INET in this case, which would be backwards-compatible, but then some of our tests will continue to fail. Alternatively, we could use AF_UNSPEC, and then we'd have to work out some alternative to binding to every single returned addrinfo. (Special-casing this makes some sense to me anyway.) I have not tested whether using AF_INET6 + AI_V4MAPPED here is a workable solution.
from dqlite.
As for patching the tests to work on an IPv6-only host (once the other problems in the main codebase have been fixed), it should be mostly a matter of replacing 127.0.0.1
by [::1]
in various places; I don't have a comprehensive patch for that yet.
from dqlite.
Can you confirm this please, because I can reproduce this behavior from
0.16.0
(incl.) onwards.
Tested again on a Fedora 39 build root. 0.17.1
can still be built and tested fine. However because of canonical/raft#263 I'm still using this patch for that build and that I haven't adjusted it for 0.18.0 yet.
If I remove this patch the tests indeed also fail with 0.17.1
.
from dqlite.
I tried to update dqlite's packaging in Debian to v1.16.4, but am encountering this same issue when trying to build with the now-bundled raft source. tcp_connect/closeDuringConnectAbort
still hangs, and there are several additional test failures (due to output size, I'm not including the various failed tests):
180 of 233 (77%) tests successful, 11 (5%) test skipped.
FAIL raft-uv-integration-test (exit status: 1)
Trying to build with the current cowsql/raft fork results in an error:
libtool: link: gcc -std=c11 -g3 -fcf-protection --param=ssp-buffer-size=4 -pipe -fno-strict-aliasing -fdiagnostics-color -fexceptions -fstack-clash-protection -fstack-protector-strong -fasynchronous-unwind-tables -fdiagnostics-show-option -Wall -Wextra -Wimplicit-fallthrough=5 -Wcast-align -Wstrict-prototypes -Wlogical-op -Wmissing-include-dirs -Wold-style-definition -Winit-self -Wfloat-equal -Wsuggest-attribute=noreturn -Wformat=2 -Wshadow -Wendif-labels -Wdate-time -Wnested-externs -Wconversion -Werror -O2 -Wno-conversion -g -O2 -ffile-prefix-map=/build/dqlite-1.16.4=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wl,-z -Wl,relro -Wl,-z -Wl,now -o integration-test test/integration/integration_test-test_client.o test/integration/integration_test-test_cluster.o test/integration/integration_test-test_fsm.o test/integration/integration_test-test_membership.o test/integration/integration_test-test_node.o test/integration/integration_test-test_role_management.o test/integration/integration_test-test_server.o test/integration/integration_test-test_vfs.o test/integration/integration_test-main.o ./.libs/libtest.a -luv -ldl -lrt -lpthread -lsqlite3 -lraft ./.libs/libdqlite.so -pthread -Wl,-rpath -Wl,/build/dqlite-1.16.4/.libs
/usr/bin/ld: ./.libs/libdqlite.so: undefined reference to `raft_register_state_cb'
collect2: error: ld returned 1 exit status
Possibly relevant versions: Debian sid LXD container with Debian kernel 6.1.0-18-amd64; libsqlite 3.45.1; libuv 1.48.0; liblz4 1.9.4.
from dqlite.
I tried to update dqlite's packaging in Debian to v1.16.4, but am encountering this same issue when trying to build with the now-bundled raft source.
tcp_connect/closeDuringConnectAbort
still hangs, and there are several additional test failures (due to output size, I'm not including the various failed tests):180 of 233 (77%) tests successful, 11 (5%) test skipped. FAIL raft-uv-integration-test (exit status: 1)
This could be fixed by applying a change like in cowsql/raft#79, see:
in particular.
Trying to build with the current cowsql/raft fork results in an error:
libtool: link: gcc -std=c11 -g3 -fcf-protection --param=ssp-buffer-size=4 -pipe -fno-strict-aliasing -fdiagnostics-color -fexceptions -fstack-clash-protection -fstack-protector-strong -fasynchronous-unwind-tables -fdiagnostics-show-option -Wall -Wextra -Wimplicit-fallthrough=5 -Wcast-align -Wstrict-prototypes -Wlogical-op -Wmissing-include-dirs -Wold-style-definition -Winit-self -Wfloat-equal -Wsuggest-attribute=noreturn -Wformat=2 -Wshadow -Wendif-labels -Wdate-time -Wnested-externs -Wconversion -Werror -O2 -Wno-conversion -g -O2 -ffile-prefix-map=/build/dqlite-1.16.4=. -fstack-protector-strong -fstack-clash-protection -Wformat -Werror=format-security -fcf-protection -Wl,-z -Wl,relro -Wl,-z -Wl,now -o integration-test test/integration/integration_test-test_client.o test/integration/integration_test-test_cluster.o test/integration/integration_test-test_fsm.o test/integration/integration_test-test_membership.o test/integration/integration_test-test_node.o test/integration/integration_test-test_role_management.o test/integration/integration_test-test_server.o test/integration/integration_test-test_vfs.o test/integration/integration_test-main.o ./.libs/libtest.a -luv -ldl -lrt -lpthread -lsqlite3 -lraft ./.libs/libdqlite.so -pthread -Wl,-rpath -Wl,/build/dqlite-1.16.4/.libs /usr/bin/ld: ./.libs/libdqlite.so: undefined reference to `raft_register_state_cb' collect2: error: ld returned 1 exit status
This is a known issue, and wouldn't be hard to fix, but since dqlite is now shipping with an embedded raft source code it's probably not worth it, unless somebody really needs that.
from dqlite.
Thanks!
from dqlite.
Related Issues (20)
- Fix Coverity
- Use execinfo.h instead of libbacktrace HOT 3
- Set up tmate action
- Set up a canary PPA to get feedback from LP builders before merge HOT 2
- no known leader error forever
- Node.js bindings HOT 1
- Fix cases where the server can return a FAILURE message with result code 0
- Investigate what causes LXD's dqlite to create lots of singleton closed segments
- Flaky test: cluster/restart HOT 5
- 1.16.4: build fails on linking with missing `raft_register_state_cb` symbol HOT 1
- posix_fallocate extremely slow on ext3 file system HOT 1
- 1.16.5: build fails with `-Wimplicit-function-declaration` HOT 9
- Documentation links are broken HOT 3
- dqlite requires a large stack HOT 3
- MicroOVN built with latest Dqlite crashes on cluster formation. HOT 5
- Investigate CPU spikes when performing role adjustment
- Term increment is skipped in one-node clusters
- Tracking issue for improved observability
- Recursion in EXEC_SQL request handling
- build_mac & build_windows
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from dqlite.