Comments (29)

gleam-ru commented on June 26, 2024

Maybe it is related: sometimes the relay crashes.
image

https://pastebin.com/CwqfiPmE

from firezone.

jamilbk commented on June 26, 2024

I managed to reproduce this on Windows. Still trying for Android. I'm going to mark this issue for Windows for now.

cc @ReactorScram

Steps to reproduce:

  1. Start Firezone
  2. Sign in, start a ping: `ping -t github.com`
  3. Switch from Eth to Wifi

Logs when doing this on my test laptop:

set-dns-repeated.zip

jamilbk commented on June 26, 2024

Thanks for the report @gleam-ru. Just to confirm, is this happening with the cloud-managed version or control plane infrastructure you are self-hosting?

gleam-ru commented on June 26, 2024

self-hosting.

jamilbk commented on June 26, 2024

Ah I see. I don't think we'll be able to reproduce this unfortunately. The error seems to be this:

{"time":"2024-06-13T12:33:24.693301933Z","target":"snownet::node","logging.googleapis.com/sourceLocation":{"file":"connlib/snownet/src/node.rs","line":"373"},"severity":"WARNING","message":"No channel","peer":"158.160.140.10:55582"}
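When triaging warnings like the one quoted above, it can help to pull out just the fields that matter. The following is a minimal Python sketch, assuming the log is one JSON object per line with the field names seen in that line; the `summarize` helper is hypothetical and not part of Firezone.

```python
import json

# The warning line quoted above, copied verbatim from the thread.
LINE = (
    '{"time":"2024-06-13T12:33:24.693301933Z","target":"snownet::node",'
    '"logging.googleapis.com/sourceLocation":{"file":"connlib/snownet/src/node.rs","line":"373"},'
    '"severity":"WARNING","message":"No channel","peer":"158.160.140.10:55582"}'
)


def summarize(line: str) -> str:
    """Condense one JSON log line into 'SEVERITY target file:line message (peer=...)'."""
    entry = json.loads(line)
    # The source location is nested under a Google Cloud Logging style key.
    loc = entry.get("logging.googleapis.com/sourceLocation", {})
    return "{} {} {}:{} {} (peer={})".format(
        entry.get("severity", "?"),
        entry.get("target", "?"),
        loc.get("file", "?"),
        loc.get("line", "?"),
        entry.get("message", ""),
        entry.get("peer", "?"),
    )


print(summarize(LINE))
```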

That could be for a number of reasons, most likely related to your control plane configuration / reachability. Connection roaming (what happens when your network interfaces change) is quite a complex process that we've done extensive development on, but we're only able to guarantee it works reliably for our infra deployed in GCP. If you're also on GCP you could use our terraform modules as inspiration.

We'll do some more testing to be extra sure it's not a client issue.

ReactorScram commented on June 26, 2024

Couldn't replicate on the Windows Client 1.0.7 on the Windows laptop. I turned the Wi-Fi off for maybe 10 seconds and pinging speed.cloudflare.com on staging worked both before and after.

gleam-ru commented on June 26, 2024

"message":"No channel"

What does this message mean?
And why does it reproduce only on Windows/Android and not on macOS/headless? :)

Please help me localize the issue... I don't understand where the problem could be: relay? gateway? API? configuration of the server VM?

jamilbk commented on June 26, 2024

The No channel message refers to TURN channels and would indeed be related to the Relay. Thanks for posting the backtrace. We haven't seen that in production, but will tag @thomaseizinger here in case it's a bug.

gleam-ru commented on June 26, 2024

If it could be a relay-related problem, here is some more info/logs:
image

https://pastebin.com/2E271pmE

I have about 1 "refresh failed: Unauthorized" message every ~5sec.

If I can provide more info - feel free to ask :)

gleam-ru commented on June 26, 2024

UPD:

1 - I suppose these env vars should start with FEATURE_*: https://github.com/firezone/firezone/blob/main/docker-compose.yml#L102

After editing the env vars I was able to see the "relays" tab in the sidebar.
I deleted the default relays (from the seeds) and created a new self-hosted one.

BUT!
2 - The stack trace (shown earlier) still reproduces.
3 - I got new warning logs just before the relay crashed:

image
image

thomaseizinger commented on June 26, 2024

I have about 1 "refresh failed: Unauthorized" message every ~5sec.

That means your portal is sending invalid credentials for the relays to the clients. A few "Unauthorized" are expected as part of the initial handshake but not every 5s!

Will take a look at the stacktrace, it definitely shouldn't crash!

gleam-ru commented on June 26, 2024

your portal is sending invalid credentials for the relays

Web and api are behind the nginx reverse proxy, if that matters.
image

thomaseizinger commented on June 26, 2024

"message":"No channel"

What does this message mean?

It means we were not able to hole-punch to the gateway and thus started relaying data via the relay, yet at the same time, we didn't make a channel binding for that connection and thus don't have a channel to actually send data from.

Couple of theories here but since you are self-hosted, I'd assume that you are not configuring the relay correctly with its public IP addresses.
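To make the "No channel" condition concrete, here is a toy Python model of TURN channel bindings. It is only a sketch of the general mechanism, not Firezone's code: the `Allocation` class and its methods are hypothetical names, and the ChannelData layout (2-byte channel number, 2-byte length, payload) follows the TURN specification with padding omitted. Channel numbers start at 0x4000 (16384), which matches the `channel=16384` values seen in relay logs.

```python
from typing import Dict, Optional, Tuple

Peer = Tuple[str, int]  # (ip, port) of the remote peer


class Allocation:
    """Toy model of one TURN allocation and its channel bindings (hypothetical)."""

    def __init__(self) -> None:
        self.channels: Dict[Peer, int] = {}
        self._next_channel = 0x4000  # first channel number permitted by TURN

    def bind_channel(self, peer: Peer) -> int:
        """Bind a channel to the peer, reusing an existing binding if present."""
        if peer not in self.channels:
            self.channels[peer] = self._next_channel
            self._next_channel += 1
        return self.channels[peer]

    def encode_for(self, peer: Peer, payload: bytes) -> Optional[bytes]:
        """Wrap payload as ChannelData, or return None when no channel is bound."""
        number = self.channels.get(peer)
        if number is None:
            return None  # this is the "No channel" situation
        return number.to_bytes(2, "big") + len(payload).to_bytes(2, "big") + payload


alloc = Allocation()
peer = ("158.160.140.10", 55582)
print(alloc.encode_for(peer, b"hi"))   # nothing bound yet
print(alloc.bind_channel(peer))        # first binding gets channel 16384
print(alloc.encode_for(peer, b"hi"))   # now the data can be relayed
```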

thomaseizinger commented on June 26, 2024

crash of relay:

This is "only" a debug assertion btw; it appears you are not running production builds.

gleam-ru commented on June 26, 2024

you are not configuring the relay correctly with its public IP addresses

here is my relay's compose
image

you are not running production builds

I started everything with docker compose up --build, as mentioned in docs here: https://github.com/firezone/firezone/blob/main/docs/CONTRIBUTING.md
from commit 650d7d7

Are there instructions anywhere on how to run it in prod mode?
Terraform doesn't suit me because I use my own server.

thomaseizinger commented on June 26, 2024

Are there instructions anywhere on how to run it in prod mode?
Terraform doesn't suit me because I use my own server.

That is all we've got for the moment, sorry!

you are not running production builds

I started everything with docker compose up --build, as mentioned in docs here: main/docs/CONTRIBUTING.md from commit 650d7d7

The docker-compose file is primarily used for our testing infrastructure. It uses debug builds so we hit debug assertions in certain edge-cases like the one you've just encountered. Those binaries are too slow for production though. For our internal perf runs, we replace the containers, see

perf-tests:
  # Only the debug images have perf tooling
  if: ${{ github.event_name == 'pull_request' }}
  name: perf-tests-${{ matrix.version.prefix }}-${{ matrix.test_name }}
  needs:
    - build-base-perf-artifacts
    - build-head-perf-artifacts
  runs-on: ubuntu-22.04
  permissions:
    contents: read
    id-token: write
    pull-requests: write
  env:
    API_IMAGE: 'us-east1-docker.pkg.dev/firezone-staging/firezone/api'
    API_TAG: ${{ matrix.version.sha }}
    WEB_IMAGE: 'us-east1-docker.pkg.dev/firezone-staging/firezone/web'
    WEB_TAG: ${{ matrix.version.sha }}
    ELIXIR_IMAGE: 'us-east1-docker.pkg.dev/firezone-staging/firezone/elixir'
    ELIXIR_TAG: ${{ matrix.version.sha }}
    GATEWAY_IMAGE: 'us-east1-docker.pkg.dev/firezone-staging/firezone/perf/gateway'
    GATEWAY_TAG: ${{ matrix.version.sha }}
    CLIENT_IMAGE: 'us-east1-docker.pkg.dev/firezone-staging/firezone/perf/client'
    CLIENT_TAG: ${{ matrix.version.sha }}
    RELAY_IMAGE: 'us-east1-docker.pkg.dev/firezone-staging/firezone/perf/relay'
    RELAY_TAG: ${{ matrix.version.sha }}
.

To learn about all the different docker stages, see https://github.com/firezone/firezone/blob/main/rust/Dockerfile. You'll want the release one most likely.

gleam-ru commented on June 26, 2024

Ping on Windows fails every 6th time :)
On macOS/Linux/headless everything is fine.
image

thomaseizinger commented on June 26, 2024

#5367 fixes the debug assertion.

gleam-ru commented on June 26, 2024

Maybe this is a stupid question, but is it possible to use Firezone without a relay?

thomaseizinger commented on June 26, 2024

Maybe this is a stupid question, but is it possible to use Firezone without a relay?

Our relay performs two roles: STUN & TURN. You need at least STUN to establish a connection. If you can guarantee that hole-punching works (i.e. nothing is deployed behind symmetric NAT), then you could be fine without the TURN part.

But the client needs something that it can perform STUN with. From a purely technical PoV, that wouldn't have to be our relay but at least at the moment, the codepath for that is fixed to always use the relays returned from the portal. If you are willing to maintain a fork, it would be possible to add a config option for clients and gateways to use an additional list of STUN servers, then you wouldn't have to deploy a relay.

But this might break if you e.g. end up sitting in a cafe that uses a symmetric NAT. Then you wouldn't be able to contact your gateways.
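For a sense of what "performing STUN" involves on the wire, here is a minimal Python sketch that builds the 20-byte header of a STUN Binding request as laid out in the STUN specification: a 2-byte message type, a 2-byte length, the fixed magic cookie and a 12-byte transaction ID. The function name is hypothetical and this is not Firezone's client code; it only shows the message a client would send to any STUN server to learn its public address.

```python
import os
import struct
from typing import Optional

MAGIC_COOKIE = 0x2112A442   # fixed value defined by the STUN specification
BINDING_REQUEST = 0x0001    # message type for a Binding request


def build_binding_request(transaction_id: Optional[bytes] = None) -> bytes:
    """Return a STUN Binding request with no attributes (header only)."""
    if transaction_id is None:
        transaction_id = os.urandom(12)  # random 96-bit transaction ID
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be exactly 12 bytes")
    # Network byte order: type (16 bits), body length (16 bits), cookie (32 bits).
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id


request = build_binding_request()
print(len(request), request.hex())
```

Sending these bytes over UDP to a STUN server and decoding the XOR-MAPPED-ADDRESS attribute in the reply is what reveals the client's public IP and port; that decoding step is left out here.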

gleam-ru commented on June 26, 2024

If you are willing to maintain a fork

I am a frontend-developer. It is too tricky for me :)

You said:

we replace the containers

and:

different docker stages

Did I understand correctly?

  1. I should change "target" from "dev" to "release" in all docker containers:
    ex: https://github.com/firezone/firezone/blob/main/docker-compose.yml#L433
  2. Or I should use the images listed above:
    ex: us-east1-docker.pkg.dev/firezone-staging/firezone/perf/relay

thomaseizinger commented on June 26, 2024

Did I understand correctly?

1. I should change "target" from "dev" to "release" in all docker containers:
   ex: https://github.com/firezone/firezone/blob/main/docker-compose.yml#L433

2. Or I should use the images listed above:
   ex: us-east1-docker.pkg.dev/firezone-staging/firezone/perf/relay

Here are the released containers: https://github.com/orgs/firezone/packages?repo_name=firezone! :)

gleam-ru commented on June 26, 2024

Here are the released containers

Thank you :)

UPD-2:
I modified my docker compose. Now I am using the latest release containers.
And I think I have localized the problem.
On Android and Windows, after connection roaming I see the message "refresh failed: Allocation Mismatch" and the client does not reconnect.

More logs here

relay-1 | 2024-06-14T16:18:42.205008Z INFO relay: Allocations = 2 Channels = 0 Throughput = 0.00 B/s
relay-1 | 2024-06-14T16:18:49.581908Z INFO handle_binding_request{transaction_id=TransactionId(0xEE29BB30B0EA6285BA415D15) sender=193.201.90.105:39414}: firezone_relay::server: Handled BINDING request
relay-1 | 2024-06-14T16:18:49.599475Z WARN relay: refresh failed: Allocation Mismatch
relay-1 | 2024-06-14T16:18:49.615462Z WARN handle_allocate_request{transaction_id=TransactionId(0xB19EEBD93A7ED991A55A270F) sender=193.201.90.105:39414}: relay: Partially fulfilling allocation using only an IPv4 address
relay-1 | 2024-06-14T16:18:49.615498Z INFO handle_allocate_request{transaction_id=TransactionId(0xB19EEBD93A7ED991A55A270F) sender=193.201.90.105:39414 allocation=55628}: relay: Created new allocation first_relay_address=158.160.140.10 lifetime=600s
relay-1 | 2024-06-14T16:18:49.615514Z INFO relay: Created allocation port=55628 family=IPv4
relay-1 | 2024-06-14T16:18:51.655469Z INFO handle_binding_request{transaction_id=TransactionId(0xE35BCBB09A7D4652D7D1CAAB) sender=193.201.90.105:39414}: firezone_relay::server: Handled BINDING request
relay-1 | 2024-06-14T16:18:51.657341Z INFO handle_binding_request{transaction_id=TransactionId(0xD692DE9F10D4A9365A26865E) sender=193.201.90.105:39414}: firezone_relay::server: Handled BINDING request
relay-1 | 2024-06-14T16:18:51.671483Z WARN relay: refresh failed: Unauthorized
relay-1 | 2024-06-14T16:18:51.701481Z INFO handle_binding_request{transaction_id=TransactionId(0x9DFE922675C9B65A9C1A0560) sender=158.160.140.10:44111}: firezone_relay::server: Handled BINDING request
relay-1 | 2024-06-14T16:18:51.703405Z WARN relay: refresh failed: Unauthorized
relay-1 | 2024-06-14T16:18:51.705222Z INFO handle_refresh_request{transaction_id=TransactionId(0x46C67542D5654088C00CB6BF) sender=158.160.140.10:44111 allocation=55557}: relay: Refreshed allocation
relay-1 | 2024-06-14T16:18:51.761421Z INFO handle_refresh_request{transaction_id=TransactionId(0x158D990CB8383FF36B109E51) sender=193.201.90.105:39414 allocation=55628}: relay: Refreshed allocation
relay-1 | 2024-06-14T16:18:51.763406Z INFO handle_channel_bind_request{transaction_id=TransactionId(0xBE59D9CAA538A48475936B65) sender=193.201.90.105:39414 allocation=55628 peer=158.160.140.10:44111 channel=16384}: relay: Successfully bound channel
relay-1 | 2024-06-14T16:18:51.767463Z DEBUG handle_peer_traffic{sender=193.201.90.105:39414 allocation=55557}: relay: no channel
relay-1 | 2024-06-14T16:18:51.768603Z INFO handle_channel_bind_request{transaction_id=TransactionId(0x21439689C66C12B7F850E3FF) sender=193.201.90.105:39414 allocation=55628 peer=158.160.140.10:55557 channel=16385}: relay: Successfully bound channel
relay-1 | 2024-06-14T16:18:51.769893Z INFO handle_channel_bind_request{transaction_id=TransactionId(0x80410668F0695841B40117F9) sender=158.160.140.10:44111 allocation=55557 peer=193.201.90.105:39414 channel=16384}: relay: Successfully bound channel
relay-1 | 2024-06-14T16:18:51.771021Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:18:51.771092Z INFO handle_channel_bind_request{transaction_id=TransactionId(0x5E5A0BA840088A23BB1FB5D1) sender=158.160.140.10:44111 allocation=55557 peer=158.160.140.10:55628 channel=16385}: relay: Successfully bound channel
relay-1 | 2024-06-14T16:18:51.943533Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:18:52.204716Z INFO relay: Allocations = 3 Channels = 4 Throughput = 170.00 B/s
relay-1 | 2024-06-14T16:18:52.444098Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:18:53.443005Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:18:54.944067Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:18:56.443741Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:18:57.943966Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:18:59.443465Z DEBUG handle_peer_traffic{sender=158.160.140.10:38090 allocation=55628}: relay: no channel
relay-1 | 2024-06-14T16:19:02.204707Z INFO relay: Allocations = 3 Channels = 4 Throughput = 697.00 B/s
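To see how often each kind of refresh failure shows up in logs like the ones above, a small tally can help. This is a minimal Python sketch, assuming the `docker compose` line format shown above (container prefix, timestamp, level, message); the `count_refresh_failures` helper is hypothetical.

```python
import re
from collections import Counter

# Matches WARN lines of the form "... WARN relay: refresh failed: <reason>".
REFRESH_FAILED = re.compile(r"\bWARN\b.*refresh failed: (.+)$")


def count_refresh_failures(lines):
    """Count 'refresh failed' warnings, grouped by the failure reason."""
    counts = Counter()
    for line in lines:
        match = REFRESH_FAILED.search(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the log excerpt above.
SAMPLE = [
    "relay-1 | 2024-06-14T16:18:42.205008Z INFO relay: Allocations = 2 Channels = 0 Throughput = 0.00 B/s",
    "relay-1 | 2024-06-14T16:18:49.599475Z WARN relay: refresh failed: Allocation Mismatch",
    "relay-1 | 2024-06-14T16:18:51.671483Z WARN relay: refresh failed: Unauthorized",
    "relay-1 | 2024-06-14T16:18:51.703405Z WARN relay: refresh failed: Unauthorized",
]

for reason, count in count_refresh_failures(SAMPLE).most_common():
    print(reason, count)
```

In practice the lines would come from a saved log file rather than a hard-coded list.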

UPD-3:
I noticed that the problem reproduces ONLY WITH KEYCLOAK USERS (!!!) ("OpenID Connect" identity provider). And only on Windows/Android (tested: Ubuntu, macOS, headless).
With the "Username & Password" provider everything is OK.

Any ideas?

thomaseizinger commented on June 26, 2024

There are some known issues with roaming when relays are involved. They'll be solved with #5080 but that is still blocked by some other work.

An "Allocation mismatch" error is expected as part of roaming. TURN operates on the user's 3-tuple which changes when you roam.
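The 3-tuple point can be sketched with a toy relay that keys allocations by the client's source IP, source port and transport. This is a simplified Python model, not the Firezone relay: `ToyRelay`, the starting port 49152 and the pre-roam address 192.0.2.10 are all made up for illustration, while the post-roam sender is the one visible in the relay logs earlier in the thread.

```python
from typing import Dict, Tuple

ClientTuple = Tuple[str, int, str]  # (source ip, source port, transport)


class ToyRelay:
    """Toy TURN server that looks up allocations by the client's 3-tuple."""

    def __init__(self) -> None:
        self.allocations: Dict[ClientTuple, int] = {}
        self._next_port = 49152  # arbitrary starting relay port for this sketch

    def allocate(self, client: ClientTuple) -> int:
        """Create an allocation for this 3-tuple if none exists; return its port."""
        if client not in self.allocations:
            self.allocations[client] = self._next_port
            self._next_port += 1
        return self.allocations[client]

    def refresh(self, client: ClientTuple) -> str:
        """Refresh succeeds only if an allocation exists for this exact 3-tuple."""
        if client not in self.allocations:
            return "437 Allocation Mismatch"
        return "Refreshed allocation"


relay = ToyRelay()
before_roam = ("192.0.2.10", 50000, "udp")      # hypothetical address before roaming
after_roam = ("193.201.90.105", 39414, "udp")   # sender seen in the relay logs

relay.allocate(before_roam)
print(relay.refresh(before_roam))  # same 3-tuple: refresh works
print(relay.refresh(after_roam))   # new 3-tuple after roaming: mismatch
relay.allocate(after_roam)         # client reacts by allocating again
print(relay.refresh(after_roam))
```

The mismatch is therefore a signal to allocate again from the new address rather than an error by itself.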

gleam-ru commented on June 26, 2024

UPD-4 (I could work around the problem):

  1. I found that the IP of nginx (which reverse-proxies the Firezone web and api) is under the VPN
  2. when "network roaming" occurs, the VPN client tries to re-establish the connection
    2.1) linux, macos, headless don't use the VPN for it, so everything is ok.
    2.2) windows and android clients try to use the VPN to reauthenticate (?) and the resource (IP of nginx) is not accessible for them

I moved nginx (the reverse proxy) to another IP (not wrapped with the VPN) and now it is working as expected.

But I think that there is a problem with the Windows and Android clients :)

jamilbk commented on June 26, 2024

Hi @gleam-ru, thanks for the added detail. Control plane IPs and Relays should not be added as Resources, as you've discovered. The Clients need to be able to talk to something in order to set up the VPN connection, otherwise it's a chicken-and-egg problem.
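One way to catch this kind of overlap before it bites is to check the control plane and relay addresses against the configured Resources. The following is a rough Python sketch using only the standard `ipaddress` module; the function name and the example addresses (taken from documentation ranges) are hypothetical, and it assumes Resources are given as IPs or CIDRs rather than DNS names.

```python
import ipaddress


def control_plane_inside_resources(control_plane_ips, resource_cidrs):
    """Return (ip, network) pairs where a control plane IP falls inside a Resource."""
    networks = [ipaddress.ip_network(cidr, strict=False) for cidr in resource_cidrs]
    hits = []
    for ip in control_plane_ips:
        addr = ipaddress.ip_address(ip)
        for net in networks:
            # Only compare addresses and networks of the same IP version.
            if addr.version == net.version and addr in net:
                hits.append((ip, str(net)))
    return hits


# Hypothetical setup: the reverse proxy shares an IP with a Resource.
control_plane = ["203.0.113.7", "198.51.100.20"]   # e.g. nginx and relay/web/api
resources = ["203.0.113.7", "10.0.0.0/8"]

for ip, net in control_plane_inside_resources(control_plane, resources):
    print(f"{ip} is routed through the tunnel via Resource {net}")
```

Any hit means a Client may try to reach the portal or relay through the very tunnel it is trying to establish.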

gleam-ru commented on June 26, 2024

nginx ip: a.a.a.a
relay ip: b.b.b.b
web/api ip: b.b.b.b

nginx (a.a.a.a) reverse proxies web and api (b.b.b.b).

I have a resource with ip a.a.a.a (same as nginx, but not nginx).
And both IPs, a.a.a.a and b.b.b.b, are accessible WITHOUT the VPN.

But the Windows and macOS clients work differently...

thomaseizinger commented on June 26, 2024

I managed to reproduce this on Windows. Still trying for Android. I'm going to mark this issue for Windows for now.

cc @ReactorScram

Steps to reproduce:

1. Start Firezone

2. Sign in, start a ping `ping -t github.com`

3. Switch from Eth to Wifi

Logs when doing this on my test laptop:

set-dns-repeated.zip

Is this using latest main?

ReactorScram commented on June 26, 2024

Looks like it was 409039a which is from earlier today, about 7 hours ago
