Comments (29)
Maybe this is related: the relay sometimes crashes.
from firezone.
I managed to reproduce this on Windows. Still trying for Android. I'm going to mark this issue for Windows for now.
Steps to reproduce:
- Start Firezone
- Sign in, start a ping
ping -t github.com
- Switch from Eth to Wifi
Logs when doing this on my test laptop:
from firezone.
Thanks for the report @gleam-ru. Just to confirm, is this happening with the cloud-managed version or control plane infrastructure you are self-hosting?
from firezone.
self-hosting.
from firezone.
Ah I see. I don't think we'll be able to reproduce this unfortunately. The error seems to be this:
{"time":"2024-06-13T12:33:24.693301933Z","target":"snownet::node","logging.googleapis.com/sourceLocation":{"file":"connlib/snownet/src/node.rs","line":"373"},"severity":"WARNING","message":"No channel","peer":"158.160.140.10:55582"}
That could be for a number of reasons, most likely related to your control plane configuration / reachability. Connection roaming (what happens when your network interfaces change) is quite a complex process that we've done extensive development on, but we can only guarantee that it works reliably for our infra deployed in GCP. If you're also on GCP, you could use our Terraform modules as inspiration.
We'll do some more testing to be extra sure it's not a client issue.
from firezone.
Couldn't replicate on the Windows Client 1.0.7 on the Windows laptop. I turned the Wi-Fi off for maybe 10 seconds, and pinging speed.cloudflare.com on staging worked both before and after.
from firezone.
"message":"No channel"
What does this message mean?
And why does it reproduce only on Windows/Android and not on macOS/headless? :)
Please help me localize the issue... I don't understand where the problem could be: relay? gateway? api? configuration of the server VM?
from firezone.
The "No channel" message refers to TURN channels and would indeed be related to the Relay. Thanks for posting the backtrace. We haven't seen that in production, but will tag @thomaseizinger here in case it's a bug.
from firezone.
If it could be a relay-related problem, here is some more info/logs:
I have about 1 "refresh failed: Unauthorized" message every ~5sec.
If I can provide more info - feel free to ask :)
from firezone.
UPD:
1 - I suppose these env vars should start with FEATURE_*: https://github.com/firezone/firezone/blob/main/docker-compose.yml#L102
After editing the env vars I was able to see the "relays" tab in the sidebar.
I deleted the default relays (from seeds) and created a new self-hosted one.
BUT!
2 - Stacktrace (shown earlier) still reproduces.
3 - I got new warning logs just before the crash of the relay:
from firezone.
I have about 1 "refresh failed: Unauthorized" message every ~5sec.
That means your portal is sending invalid credentials for the relays to the clients. A few "Unauthorized" are expected as part of the initial handshake but not every 5s!
Will take a look at the stacktrace, it definitely shouldn't crash!
from firezone.
your portal is sending invalid credentials for the relays
Web and api are behind the nginx reverse proxy, if that matters.
from firezone.
"message":"No channel"
What does this message mean?
It means we were not able to hole-punch to the gateway and thus started relaying data via the relay, yet at the same time, we didn't make a channel binding for that connection and thus don't have a channel to actually send data on.
There are a couple of theories here, but since you are self-hosted, I'd assume that you are not configuring the relay correctly with its public IP addresses.
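To make the "No channel" condition concrete, here is a minimal sketch of the situation described above: data is routed via the relay, but no channel binding exists for the peer. The class and method names (`RelayedConnection`, `bind_channel`, `send`) are hypothetical and are not taken from connlib or snownet; only the message text and the channel number 16384 come from the logs in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class RelayedConnection:
    """Hypothetical client-side view of one peer reached through a relay."""

    peer: str
    # Maps a peer address to the channel number bound for it on the relay.
    channels: dict = field(default_factory=dict)

    def bind_channel(self, number: int) -> None:
        # In real TURN this is a ChannelBind request/response; here we only
        # record the outcome.
        self.channels[self.peer] = number

    def send(self, payload: bytes) -> str:
        channel = self.channels.get(self.peer)
        if channel is None:
            # Relaying was selected, but there is nothing to send the data on.
            return "No channel"
        return f"sent {len(payload)} bytes on channel {channel}"


conn = RelayedConnection(peer="158.160.140.10:55582")
print(conn.send(b"ping"))   # no binding yet
conn.bind_channel(16384)
print(conn.send(b"ping"))   # binding exists
```

In this toy model the warning disappears as soon as a binding exists, which is why a misconfigured relay address (bindings made for an address the peer never uses) would be one plausible way to end up in this state.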
from firezone.
crash of relay:
This is "only" a debug assertion, btw. It appears you are not running production builds.
from firezone.
you are not configuring the relay correctly with its public IP addresses
you are not running production builds
I started everything with docker compose up --build, as mentioned in the docs here: https://github.com/firezone/firezone/blob/main/docs/CONTRIBUTING.md
from commit 650d7d7
Are there instructions anywhere on how to run it in prod mode?
Terraform doesn't suit me because I use my own server.
from firezone.
Are there instructions anywhere on how to run it in prod mode?
Terraform doesn't suit me because I use my own server.
That is all we've got at the moment, sorry!
you are not running production builds
I started everything with docker compose up --build, as mentioned in docs here: main/docs/CONTRIBUTING.md from commit 650d7d7
The docker-compose file is primarily used for our testing infrastructure. It uses debug builds so we hit debug assertions in certain edge-cases like you've just encountered. Those binaries are too slow for production though. For our internal perf runs, we replace the containers, see firezone/.github/workflows/ci.yml, lines 160 to 184 in 12b684e.
To learn about all the different docker stages, see https://github.com/firezone/firezone/blob/main/rust/Dockerfile. You'll want the release one most likely.
from firezone.
On Windows, ping fails every 6th time :)
On macos/linux/headless - everything is good.
from firezone.
#5367 fixes the debug assertion.
from firezone.
Maybe I'll ask a stupid question, but is it possible to use Firezone without a relay?
from firezone.
Maybe I'll ask a stupid question, but is it possible to use Firezone without a relay?
Our relay performs two roles: STUN & TURN. You need at least STUN to establish a connection. If you can guarantee that hole-punching works (i.e. nothing is deployed behind symmetric NAT), then you could be fine without the TURN part.
But the client needs something that it can perform STUN with. From a purely technical PoV, that wouldn't have to be our relay but at least at the moment, the codepath for that is fixed to always use the relays returned from the portal. If you are willing to maintain a fork, it would be possible to add a config option for clients and gateways to use an additional list of STUN servers, then you wouldn't have to deploy a relay.
But this might break if you e.g. end up sitting in a cafe that uses a symmetric NAT. Then you wouldn't be able to contact your gateways.
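The trade-off above can be written down as a tiny decision rule. This is a simplified sketch, assuming that hole-punching succeeds unless at least one side sits behind a symmetric NAT; the function name and the NAT labels are illustrative and do not correspond to any Firezone setting.

```python
def needs_turn(client_nat: str, gateway_nat: str) -> bool:
    """Return True when a relayed (TURN) path is needed in this simplified model.

    Assumption: direct hole-punching works for every NAT type except
    "symmetric". Real-world NAT behaviour is more varied than this.
    """
    return "symmetric" in (client_nat, gateway_nat)


# A client at home behind a cone NAT can connect directly after STUN.
print(needs_turn("cone", "cone"))
# The same client in a cafe with a symmetric NAT needs the relay's TURN role.
print(needs_turn("symmetric", "cone"))
```

Under this model, dropping the TURN role only stays safe for as long as no Client or Gateway ever moves behind a symmetric NAT, which is the cafe scenario described above.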
from firezone.
If you are willing to maintain a fork
I am a frontend-developer. It is too tricky for me :)
You said:
we replace the containers
and:
different docker stages
Did I understand correctly?
- I should replace "target" from "dev" to "release" in all docker containers:
ex: https://github.com/firezone/firezone/blob/main/docker-compose.yml#L433
- Or I should use the images which are listed above:
ex: us-east1-docker.pkg.dev/firezone-staging/firezone/perf/relay
from firezone.
Did I understand correctly?
1. I should replace "target" from "dev" to "release" in all docker containers: ex: https://github.com/firezone/firezone/blob/main/docker-compose.yml#L433
2. Or I should use images which listed above: ex: us-east1-docker.pkg.dev/firezone-staging/firezone/perf/relay
Here are the released containers: https://github.com/orgs/firezone/packages?repo_name=firezone! :)
from firezone.
Here are the released containers
Thank you :)
UPD-2:
I modified my docker compose file. Now I am using the latest release containers.
And I think I was able to localize the problem.
On Android and Windows, after connection roaming I see the message "refresh failed: Allocation Mismatch" and the client does not reconnect.
More logs here
relay-1 | 2024-06-14T16:18:42.205008Z INFO relay: Allocations = 2 Channels = 0 Throughput = 0.00 B/s
relay-1 | 2024-06-14T16:18:49.581908Z INFO handle_binding_request{transaction_id=TransactionId(0xEE29BB30B0EA6285BA415D15) sender=193.201.90.105:39414}: firezone_relay::server: Handled BINDING request
relay-1 | 2024-06-14T16:18:49.599475Z WARN relay: refresh failed: Allocation Mismatch
relay-1 | 2024-06-14T16:18:49.615462Z WARN handle_allocate_request{transaction_id=TransactionId(0xB19EEBD93A7ED991A55A270F) sender=193.201.90.105:39414}: relay: Partially fulfilling allocation using only an IPv4 address
relay-1 | 2024-06-14T16:18:49.615498Z INFO handle_allocate_request{transaction_id=TransactionId(0xB19EEBD93A7ED991A55A270F) sender=193.201.90.105:39414 allocation=55628}: relay: Created new allocation first_relay_address=158.160.140.10 lifetime=600s
relay-1 | 2024-06-14T16:18:49.615514Z INFO relay: Created allocation port=55628 family=IPv4
relay-1 | 2024-06-14T16:18:51.655469Z INFO handle_binding_request{transaction_id=TransactionId(0xE35BCBB09A7D4652D7D1CAAB) sender=193.201.90.105:39414}: firezone_relay::server: Handled BINDING request
relay-1 | 2024-06-14T16:18:51.657341Z INFO handle_binding_request{transaction_id=TransactionId(0xD692DE9F10D4A9365A26865E) sender=193.201.90.105:39414}: firezone_relay::server: Handled BINDING request
relay-1 | 2024-06-14T16:18:51.671483Z WARN relay: refresh failed: Unauthorized
relay-1 | 2024-06-14T16:18:51.701481Z INFO handle_binding_request{transaction_id=TransactionId(0x9DFE922675C9B65A9C1A0560) sender=158.160.140.10:44111}: firezone_relay::server: Handled BINDING request
relay-1 | 2024-06-14T16:18:51.703405Z WARN relay: refresh failed: Unauthorized
relay-1 | 2024-06-14T16:18:51.705222Z INFO handle_refresh_request{transaction_id=TransactionId(0x46C67542D5654088C00CB6BF) sender=158.160.140.10:44111 allocation=55557}: relay: Refreshed allocation
relay-1 | 2024-06-14T16:18:51.761421Z INFO handle_refresh_request{transaction_id=TransactionId(0x158D990CB8383FF36B109E51) sender=193.201.90.105:39414 allocation=55628}: relay: Refreshed allocation
relay-1 | 2024-06-14T16:18:51.763406Z INFO handle_channel_bind_request{transaction_id=TransactionId(0xBE59D9CAA538A48475936B65) sender=193.201.90.105:39414 allocation=55628 peer=158.160.140.10:44111 channel=16384}: relay: Successfully bound channel
relay-1 | 2024-06-14T16:18:51.767463Z DEBUG handle_peer_traffic{sender=193.201.90.105:39414 allocation=55557}: relay: no channel
relay-1 | 2024-06-14T16:18:51.768603Z INFO handle_channel_bind_request{transaction_id=TransactionId(0x21439689C66C12B7F850E3FF) sender=193.201.90.105:39414 allocation=55628 peer=158.160.140.10:55557 channel=16385}: relay: Successfully bound channel
relay-1 | 2024-06-14T16:18:51.769893Z INFO handle_channel_bind_request{transaction_id=TransactionId(0x80410668F0695841B40117F9) sender=158.160.140.10:44111 allocation=55557 peer=193.201.90.105:39414 channel=16384}: relay: Successfully bound channel
relay-1 | 2024-06-14T16:18:51.771021Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:18:51.771092Z INFO handle_channel_bind_request{transaction_id=TransactionId(0x5E5A0BA840088A23BB1FB5D1) sender=158.160.140.10:44111 allocation=55557 peer=158.160.140.10:55628 channel=16385}: relay: Successfully bound channel
relay-1 | 2024-06-14T16:18:51.943533Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:18:52.204716Z INFO relay: Allocations = 3 Channels = 4 Throughput = 170.00 B/s
relay-1 | 2024-06-14T16:18:52.444098Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:18:53.443005Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:18:54.944067Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:18:56.443741Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:18:57.943966Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:18:59.443465Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:19:02.204707Z INFO relay: Allocations = 3 Channels = 4 Throughput = 697.00 B/s
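For anyone comparing their own relay output against the excerpt above, the following is a small sketch that tallies "refresh failed" warnings by reason. The regular expression is an assumption based only on the line format shown in this thread; the helper name `tally_refresh_failures` is made up for illustration.

```python
import re
from collections import Counter

# Assumed format, taken from the log excerpt in this thread:
#   relay-1 | <timestamp> WARN relay: refresh failed: <reason>
REFRESH_FAILED = re.compile(r"WARN\s+relay: refresh failed: (?P<reason>.+)$")


def tally_refresh_failures(lines):
    """Count 'refresh failed' warnings per reason."""
    counts = Counter()
    for line in lines:
        match = REFRESH_FAILED.search(line.strip())
        if match:
            counts[match.group("reason")] += 1
    return counts


SAMPLE = [
    "relay-1 | 2024-06-14T16:18:49.599475Z WARN relay: refresh failed: Allocation Mismatch",
    "relay-1 | 2024-06-14T16:18:51.671483Z WARN relay: refresh failed: Unauthorized",
    "relay-1 | 2024-06-14T16:18:51.703405Z WARN relay: refresh failed: Unauthorized",
    "relay-1 | 2024-06-14T16:18:52.204716Z INFO relay: Allocations = 3 Channels = 4 Throughput = 170.00 B/s",
]

for reason, count in tally_refresh_failures(SAMPLE).items():
    print(f"{reason}: {count}")
```

A tally like this makes it easier to tell a single mismatch right after roaming from a steady stream of "Unauthorized" failures.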
UPD-3:
I noticed that the problem reproduces ONLY WITH KEYCLOAK USERS (!!!) ("OpenID Connect" identity provider). And only on Windows/Android (also tested: Ubuntu, macOS, headless).
With "Username & Password"-provider everything is OK.
Any ideas?
from firezone.
There are some known issues with roaming when relays are involved. They'll be solved with #5080 but that is still blocked by some other work.
An "Allocation mismatch" error is expected as part of roaming. TURN operates on the user's 3-tuple which changes when you roam.
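A toy model shows why roaming produces this error. It assumes, as stated above, that the relay looks up allocations by the client's 3-tuple (IP, port, protocol); the class `ToyRelay`, the example addresses, and the port numbering are illustrative and are not the firezone_relay implementation.

```python
from typing import NamedTuple


class ThreeTuple(NamedTuple):
    ip: str
    port: int
    protocol: str


class ToyRelay:
    """Illustrative relay that keys allocations by the client's 3-tuple."""

    def __init__(self) -> None:
        self.allocations = {}      # ThreeTuple -> allocated relay port
        self._next_port = 49152    # arbitrary starting port for the sketch

    def allocate(self, client: ThreeTuple) -> int:
        if client not in self.allocations:
            self.allocations[client] = self._next_port
            self._next_port += 1
        return self.allocations[client]

    def refresh(self, client: ThreeTuple) -> str:
        # A refresh from an unknown 3-tuple has no allocation to refresh.
        if client not in self.allocations:
            return "Allocation Mismatch"
        return "Refreshed allocation"


relay = ToyRelay()
on_ethernet = ThreeTuple("198.51.100.20", 40000, "udp")
on_wifi = ThreeTuple("203.0.113.50", 51000, "udp")   # same device after roaming

relay.allocate(on_ethernet)
print(relay.refresh(on_ethernet))   # known 3-tuple
print(relay.refresh(on_wifi))       # new 3-tuple after roaming
relay.allocate(on_wifi)             # the client has to allocate again
print(relay.refresh(on_wifi))
```

This matches the order seen in the relay logs earlier in the thread: a failed refresh with "Allocation Mismatch", followed by a new allocation from the same sender.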
from firezone.
UPD-4 (I could work around the problem):
1 - I found that the IP of nginx (which reverse-proxies the Firezone web and api) is routed through the VPN
2 - when "network roaming" occurs, the VPN client tries to re-establish the connection
2.1) linux, macos, headless don't use the VPN for it, so everything is ok.
2.2) windows and android clients try to use the VPN to reauthenticate (?), and the resource (IP of nginx) is not accessible for them
I moved the nginx reverse proxy to another IP (not routed through the VPN) and now it is working as expected.
But I think that there is a problem with the Windows and Android clients :)
from firezone.
Hi @gleam-ru, thanks for the added detail. Control plane IPs and Relays should not be added as Resources, as you've discovered. The Clients need to be able to talk to something in order to set up the VPN connection, otherwise it's a chicken-and-egg problem.
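A quick way to catch this misconfiguration in a self-hosted setup is to compare the Resource addresses against the control plane and Relay IPs. The sketch below uses only Python's standard `ipaddress` module; the function name and the example addresses (documentation ranges) are assumptions for illustration, and it does not query Firezone in any way.

```python
import ipaddress


def control_plane_overlaps(resources, control_plane_ips):
    """Return (ip, resource) pairs where a control plane IP falls inside a Resource.

    `resources` are IPs or CIDR ranges defined as Resources;
    `control_plane_ips` are the addresses of the portal, proxy and relays.
    """
    networks = [ipaddress.ip_network(r, strict=False) for r in resources]
    overlaps = []
    for ip in control_plane_ips:
        address = ipaddress.ip_address(ip)
        for network in networks:
            if address in network:
                overlaps.append((ip, str(network)))
    return overlaps


# Hypothetical setup: the reverse proxy shares its IP with a Resource.
resources = ["203.0.113.10", "10.0.0.0/8"]
control_plane = ["203.0.113.10", "198.51.100.7"]
print(control_plane_overlaps(resources, control_plane))
```

Any pair reported here is an address the Client would try to reach through the tunnel before the tunnel exists, which is the chicken-and-egg problem described above.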
from firezone.
nginx ip: a.a.a.a
relay ip: b.b.b.b
web/api ip: b.b.b.b
nginx (a.a.a.a) reverse proxies web and api (b.b.b.b).
I have a resource with ip a.a.a.a (same as nginx, but not nginx).
And both IPs, a.a.a.a and b.b.b.b, are accessible WITHOUT vpn.
But windows and macos clients work differently...
from firezone.
I managed to reproduce this on Windows. Still trying for Android. I'm going to mark this issue for Windows for now.
Steps to reproduce:
1. Start Firezone
2. Sign in, start a ping `ping -t github.com`
3. Switch from Eth to Wifi
Logs when doing this on my test laptop:
Is this using latest main?
from firezone.
Looks like it was 409039a which is from earlier today, about 7 hours ago
from firezone.