JeremyBYU commented on May 26, 2024

Thanks for the response. I will try out most, if not all, of your suggestions this weekend!

rex-schilasky commented on May 26, 2024

Hi JeremyBYU, this is the default setup. If you enabled network mode and set the multicast TTL to an appropriate value in the ecal.ini file, as you described in your previous issue, everything should work.

The publisher section in the ecal.ini file can be used to switch individual transport layers on, off, or to automatic. Automatic for shared memory means the layer is used if there is a subscriber on the same host, and not used otherwise. Automatic for UDP multicast behaves the other way around: it is used for subscribers on other hosts.
See the publisher section of ecal.ini:

[publisher]
use_inproc                = 0
use_shm                   = 2
use_udp_mc                = 2

The inner-process (inproc) layer is switched off; the shm and udp_mc layers are configured as automatic.

JeremyBYU commented on May 26, 2024

Unfortunately, the settings you outlined (for publishing on HOST1) do not seem to work. This is what I see in ecal_mon on HOST2:

[Screenshot: ecal_mon on HOST2]

It is showing the layer as Shared Memory and the data is not available. The data clock does appear to be increasing.

However, if I force UDP on and disable shared memory for publishing on HOST1 through command-line arguments, as described in the Readme (use_shm=0, use_udp_mc=1), the messages appear correctly on HOST2.

[Screenshot: ecal_mon on HOST2 with use_shm=0, use_udp_mc=1 on HOST1]

The protobuf deserialization doesn't work (which I'll ask about later), but the data is clearly flowing.

rex-schilasky commented on May 26, 2024

Did you set up the multicast routes on both machines as described in the Readme.md file?

JeremyBYU commented on May 26, 2024

Yes, I believe I have. And to be clear, UDP multicast does work across hosts if I force it with use_udp_mc=1, use_shm=0. This makes me think that the routes and everything else are properly configured for multicast. It's only when both layers are on auto that I have this issue.

FlorianReimold commented on May 26, 2024

I too assume that you have a problem with your multicast routes; this behaviour is a typical indicator of that.
What is your network configuration, and how did you set up your multicast routes?

rex-schilasky commented on May 26, 2024

@JeremyBYU to make it clearer: two kinds of mechanisms are needed for pub/sub communication. One is of course the transport layer (which the user can configure); the other is the so-called monitoring layer.
The monitoring layer always uses UDP multicast. If it is not working properly, a publisher will not recognize that there is a subscription, either locally or on another host. As a result, the publisher does NOT send data via shm or UDP, so as not to waste CPU time. By setting use_udp_mc=1 you force the publisher to write on this layer even if monitoring is not working.
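The publisher settings and the monitoring explanation above can be condensed into a small decision rule. The following is a simplified Python model, not eCAL's actual implementation: it assumes the [publisher] values mean 0 = off, 1 = on, 2 = auto (as in the section above), and that the subscriber counts are whatever the publisher has learned through the monitoring layer. The function and constant names are hypothetical.

```python
from __future__ import annotations

# Simplified model of the publisher layer selection discussed in this thread.
# Assumption: 0 = off, 1 = on, 2 = auto, as in the [publisher] section above.
OFF, ON, AUTO = 0, 1, 2


def active_layers(use_shm: int, use_udp_mc: int,
                  local_subscribers: int, remote_subscribers: int) -> set[str]:
    """Return the layers a publisher would write to.

    The subscriber counts are what the publisher has learned via the
    monitoring layer; if monitoring multicast is broken, both stay 0.
    """
    layers: set[str] = set()
    # "on" writes unconditionally; "auto" for shm needs a same-host subscriber.
    if use_shm == ON or (use_shm == AUTO and local_subscribers > 0):
        layers.add("shm")
    # "auto" for UDP multicast needs a subscriber on another host.
    if use_udp_mc == ON or (use_udp_mc == AUTO and remote_subscribers > 0):
        layers.add("udp_mc")
    return layers


# Monitoring broken: the remote subscriber is invisible, auto layers stay silent.
print(active_layers(AUTO, AUTO, local_subscribers=0, remote_subscribers=0))  # set()
# The use_shm=0, use_udp_mc=1 workaround writes UDP regardless.
print(active_layers(OFF, ON, 0, 0))  # {'udp_mc'}
```

In this model the two symptoms in the thread look identical from the outside: a working transport layer, but no data, because the publisher never learned about the subscriber.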

JeremyBYU commented on May 26, 2024

I have an Orbi RBK40 router. HOST1 is plugged directly into the router; HOST2 is connected over WiFi. I changed a setting in the router, as shown in #38, to allow some form of multicast. I have verified that multicast works from arbitrary programs (Python scripts), and also that at least one of the layers @rex-schilasky mentioned (the transport layer) works with UDP multicast. So there seems to be an issue with the monitoring layer, which also uses UDP multicast. I will reboot both computers, try again, and fiddle with any settings I can. Thanks for your help!

JeremyBYU commented on May 26, 2024

Okay, it's still not working. This is the only additional information I can think of that might help:

HOST1:

enp6s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.2  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::b61e:a142:37ad:3a0d  prefixlen 64  scopeid 0x20<link>
        ether 70:85:c2:dc:f0:3d  txqueuelen 1000  (Ethernet)
        RX packets 4137  bytes 2096243 (2.0 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 5421  bytes 1304716 (1.3 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        device memory 0xf7600000-f761ffff  

$ route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         _gateway        0.0.0.0         UG    100    0        0 enp6s0
link-local      0.0.0.0         255.255.0.0     U     1000   0        0 enp6s0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
172.18.0.0      0.0.0.0         255.255.0.0     U     0      0        0 br-08146ed772c2
192.168.1.0     0.0.0.0         255.255.255.0   U     100    0        0 enp6s0
239.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 enp6s0

HOST2

wlp59s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.1.24  netmask 255.255.255.0  broadcast 192.168.1.255
        inet6 fe80::ad93:7969:f8dd:eca6  prefixlen 64  scopeid 0x20<link>
        ether 9c:b6:d0:16:32:3b  txqueuelen 1000  (Ethernet)
        RX packets 10427  bytes 10115247 (10.1 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6463  bytes 1170240 (1.1 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         _gateway        0.0.0.0         UG    600    0        0 wlp59s0
link-local      0.0.0.0         255.255.0.0     U     1000   0        0 docker0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.1.0     0.0.0.0         255.255.255.0   U     600    0        0 wlp59s0
239.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 wlp59s0

My only comment is that the subnet mask of 255.255.255.0 differs from the one in the README command route add -net 239.0.0.0 netmask 255.0.0.0 dev eth0. However, I think that's probably intentional, since the two masks serve different purposes.
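The two masks do describe different things: 255.255.255.0 is the netmask of the LAN interface (enp6s0 / wlp59s0), while 255.0.0.0 belongs to the multicast route 239.0.0.0. Python's standard ipaddress module makes this easy to check with the addresses from the output above; this is only an illustration, not part of eCAL.

```python
import ipaddress

# LAN subnet of enp6s0 / wlp59s0 (interface netmask from ifconfig).
lan = ipaddress.ip_network("192.168.1.0/255.255.255.0")
# Multicast route from the README and the routing tables (Genmask 255.0.0.0).
mcast_route = ipaddress.ip_network("239.0.0.0/255.0.0.0")

print(lan.prefixlen, mcast_route.prefixlen)              # 24 8
print(ipaddress.ip_address("192.168.1.24") in lan)       # True: HOST2 is on the LAN
print(ipaddress.ip_address("239.0.0.1") in lan)          # False: not a LAN address
print(ipaddress.ip_address("239.0.0.1") in mcast_route)  # True: covered by the route
print(mcast_route.is_multicast)                          # True
```

So the /24 mask decides which unicast peers are directly reachable, and the /8 mask decides which multicast groups are sent out of the given interface.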

rex-schilasky commented on May 26, 2024

Do your UDP Python scripts send and receive in both directions? Are you able to connect HOST1 and HOST2 directly (cable instead of WiFi)? How did you set the TTL in the ecal.ini file? Can you please increase it to at least 2? Are the non-Python eCAL samples (person_snd, person_rec) working, including monitoring reflection?
Many questions, I know, but it's strange behavior.

JeremyBYU commented on May 26, 2024

Okay, I have some new information. When the two hosts are directly connected by a cable, I have no issues. It works exactly as advertised, negotiating between shared memory and UDP multicast. Pub/sub as well as services work fine. BTW, I did set the TTL in the ecal.ini file to 2 on both sides.

Next I tried to get it to work over WiFi for HOST2 (with HOST1 connected to the router by Ethernet), and I still could not. Worse, my trick of forcing UDP only worked for pub/sub, not for services, which was really my failsafe, so I'm a little disappointed. There were also errors in ecal_mon complaining that the monitoring layer was not working. I guess you were right about that!

So it seems pretty obvious that the router is the issue: an ORBI RBK40 mesh WiFi router doesn't seem to work with eCAL. Any recommendations for a cheap WiFi router that works with UDP multicast and eCAL?

rex-schilasky commented on May 26, 2024

Nice to hear that eCAL is at least running in a cable-based setup. I think the router is blocking UDP traffic for some reason. eCAL uses UDP multicast: the groups are 239.0.0.1 - 239.0.0.16, and the ports are 14000 plus at least the next two port numbers. What group and port did your Python UDP sample use? Can you configure your WiFi router to allow UDP traffic on specific ports and groups?
You can adapt eCAL's multicast properties in the [network] section of the ecal.ini file.

JeremyBYU commented on May 26, 2024

> What group and port did your Python UDP sample use? Can you configure your WiFi router to allow UDP traffic on specific ports and groups?

According to the script I posted in the previous issue, it was this:

MCAST_GRP = '224.1.1.1'
MCAST_PORT = 5007

> You can adapt eCAL's multicast properties in the ecal.ini file [network] section

I think you're saying I should try changing the network section in ecal.ini to the numbers used in the Python script? I will give that a shot and see what happens.
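The questions above (both directions, TTL, group and port) can be checked without eCAL using a plain UDP multicast pair. This is a minimal sketch using only Python's standard socket and struct modules on Linux; the user's actual script is not shown in this thread, so apart from MCAST_GRP and MCAST_PORT every name here is hypothetical. Running the receiver on HOST1 and the sender on HOST2, and then swapping roles, tests both directions.

```python
import socket
import struct

MCAST_GRP = "224.1.1.1"
MCAST_PORT = 5007


def membership_request(group: str, iface_ip: str = "0.0.0.0") -> bytes:
    """Pack a struct ip_mreq: group address, then local interface (0.0.0.0 = any)."""
    return struct.pack("=4s4s", socket.inet_aton(group), socket.inet_aton(iface_ip))


def make_sender(ttl: int = 2) -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # A multicast TTL of 1 keeps datagrams on the local subnet; 2+ was suggested above.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def make_receiver(group: str = MCAST_GRP, port: int = MCAST_PORT,
                  iface_ip: str = "0.0.0.0") -> socket.socket:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Join the group; on multi-homed hosts pass the LAN IP (e.g. 192.168.1.2 on HOST1).
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, iface_ip))
    return sock


# Usage, one host per side, then swap roles to test both directions:
#   receiver: rx = make_receiver(); print(rx.recvfrom(1024))
#   sender:   make_sender(ttl=2).sendto(b"ping", (MCAST_GRP, MCAST_PORT))
```

Passing an explicit iface_ip matters on hosts like HOST1 that also have docker bridges, since "any" lets the kernel pick the interface from the routing table.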

rex-schilasky commented on May 26, 2024

Yes, all multicast settings are in the ecal.ini file and can be modified there. Nothing is "hard coded" inside the eCAL core.

JeremyBYU commented on May 26, 2024

Okay, some good news! When I changed the network section of my ecal.ini file to this:

[network]
network_enabled     = true
multicast_group     = 224.0.0.1
multicast_mask      = 0.0.0.15
multicast_port      = 14000
multicast_ttl       = 3
multicast_sndbuf    = 5242880
multicast_rcvbuf    = 5242880

multicast_group_mtl = 224.0.0.1
multicast_port_mtl  = 14100

unicast_ipaddr      = 127.0.0.1
unicast_port        = 15000

bandwidth_max_udp   = -1

and updated the routes on all devices like so (using the matching interface on each): sudo route add -net 224.0.0.0 netmask 255.0.0.0 dev enp6s0,

many things started working. For example, a secondary host can now subscribe to a topic using UDP, whereas previously it only used shared memory and the data was not available. However, there are a few issues:

  1. Services. Services are discoverable on the secondary host (HOST2), and I can see their heartbeat in ecal_mon. But when the client on the secondary host tries to call a method, nothing happens on HOST1 (it doesn't receive the method call), and ecal_mon on HOST2 does not show incremented method calls. Is a different protocol used for services? TCP? Is there another setting that needs to be changed?
  2. Performance. If HOST1 has two processes, Process A publishing and Process B subscribing, everything goes over shared memory. The publish and subscribe frequencies match and are as expected (30 Hz). When HOST2 subscribes to Process A's topic using ecal_mon, it does so over UDP multicast. At that point, however, the frequency drops to 10 Hz for all publishers and subscribers in all processes. ecal_mon still indicates that Process A and Process B are using shared memory, but their frequencies are much lower. I would have expected the HOST2 subscription to be slower (I do have bad WiFi in the basement), but HOST1's Process A and Process B to keep their high publish and subscribe rates.

Note that in these situations HOST1 is a Raspberry Pi 4 running a 64-bit kernel, and HOST2 is an x86 Linux desktop. The second issue is not a big deal: I can simply avoid subscribing to these images (224X480) over WiFi. But I do want to be able to use remote procedure calls.
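The change that fixed pub/sub has two parts that must agree: the multicast groups in [network] and the multicast route on every host (here 224.0.0.0 netmask 255.0.0.0, i.e. 224.0.0.0/8). The sketch below, using Python's standard configparser and ipaddress modules, checks that agreement for the values posted above. It is a hypothetical helper, not an eCAL tool, and it only knows the rules mentioned in this thread: network mode enabled, groups covered by the route, and a TTL of at least 2.

```python
from __future__ import annotations

import configparser
import ipaddress

# The [network] values posted above (unicast and bandwidth keys omitted).
NETWORK_SECTION = """
[network]
network_enabled     = true
multicast_group     = 224.0.0.1
multicast_mask      = 0.0.0.15
multicast_port      = 14000
multicast_ttl       = 3
multicast_group_mtl = 224.0.0.1
multicast_port_mtl  = 14100
"""


def check_network_config(ini_text: str, route: str) -> list[str]:
    """Hypothetical helper: list mismatches between [network] and a multicast route."""
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    net = cfg["network"]
    problems: list[str] = []
    if not net.getboolean("network_enabled"):
        problems.append("network_enabled is false (network mode is off)")
    routed = ipaddress.ip_network(route)
    for key in ("multicast_group", "multicast_group_mtl"):
        group = ipaddress.ip_address(net[key])
        if not group.is_multicast:
            problems.append(f"{key}={group} is not a multicast address")
        elif group not in routed:
            problems.append(f"{key}={group} is not covered by route {routed}")
    if net.getint("multicast_ttl") < 2:
        problems.append("multicast_ttl is below 2")
    return problems


print(check_network_config(NETWORK_SECTION, "224.0.0.0/8"))       # []
print(len(check_network_config(NETWORK_SECTION, "239.0.0.0/8")))  # 2
```

The second call shows the failure mode of changing only one side: with the groups moved to 224.0.0.1 but the old 239.0.0.0/8 route left in place, neither group is covered.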

FlorianReimold commented on May 26, 2024

eCAL services use direct TCP connections, so they differ substantially from "normal" eCAL traffic. Each host has to be able to resolve hostnames to IP addresses.
On Windows, this usually works out of the box, thanks to Windows network discovery, which maps hostnames to their IP addresses. Linux uses Avahi for hostname resolution, which appends the .local domain to all discovered hostnames. The .local TLD is currently not supported by eCAL. The easiest solution is to edit /etc/hosts on all of your machines and add all of your hostnames with their IPs.
You can also install winbind, which enables Linux to participate in Windows network discovery.
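A minimal sketch of the /etc/hosts fix in Python, assuming the IPs posted earlier in the thread (HOST1 192.168.1.2, HOST2 192.168.1.24). HOST1 and HOST2 are placeholders for the machines' real hostnames, and the helper names are hypothetical. hosts_entries renders the lines to append to /etc/hosts on every machine; resolve then checks, via the system resolver, that each peer name maps to the expected IP.

```python
from __future__ import annotations

import socket

# Placeholder names: replace HOST1/HOST2 with the output of `hostname` on each machine.
# The IPs are the ones posted earlier in this thread.
PEERS = {"HOST1": "192.168.1.2", "HOST2": "192.168.1.24"}


def hosts_entries(peers: dict[str, str]) -> str:
    """Render tab-separated '<ip> <hostname>' lines for /etc/hosts."""
    return "\n".join(f"{ip}\t{name}" for name, ip in peers.items())


def resolve(names) -> dict[str, str | None]:
    """Resolve each name to an IPv4 address via the system resolver (None on failure)."""
    result: dict[str, str | None] = {}
    for name in names:
        try:
            result[name] = socket.gethostbyname(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            result[name] = None
    return result


print(hosts_entries(PEERS))
# After editing /etc/hosts, every peer should map to its expected IP:
#   bad = {n: ip for n, ip in resolve(PEERS).items() if ip != PEERS[n]}
```

An empty bad dict on every machine is the precondition described above; it says nothing about firewalls or TCP ports, which would be the next thing to check.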

rex-schilasky commented on May 26, 2024

Regarding the performance "issue": this is a long-discussed design decision. eCAL::CPublisher::Send returns only when the payload has been sent on all connected layers, so a slow UDP connection will slow down local connections too.
We want to avoid extra threading and queuing in the send logic. If you need to decouple the two layers, you can create two publishers on the same topic and switch specific transport layers on or off for each of them. See sample/cpp/person/person_snd_multicast for an example.
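A back-of-the-envelope model of why one slow layer throttles everything: if Send() cannot return before its slowest layer is done, that layer caps the loop rate. The per-frame timings below are hypothetical, picked only to reproduce the 30 Hz to 10 Hz drop reported above, and the split case assumes the two publishers are driven independently (for example from separate threads), which the thread does not specify.

```python
from __future__ import annotations

# Send() cannot return before its slowest layer is done, which caps the loop rate.
# All timings are hypothetical, chosen to mirror the reported 30 Hz -> 10 Hz drop.


def max_rate_hz(target_hz: float, layer_times_s: list[float]) -> float:
    """Highest loop rate when every Send() waits for all of its layers."""
    period = max([1.0 / target_hz, *layer_times_s])
    return 1.0 / period


SHM_S = 0.002  # hypothetical per-frame shared-memory write time (seconds)
UDP_S = 0.100  # hypothetical per-frame UDP multicast time over weak WiFi

coupled = max_rate_hz(30, [SHM_S, UDP_S])  # one publisher writing both layers
shm_only = max_rate_hz(30, [SHM_S])        # publisher 1: shm on, udp_mc off
udp_only = max_rate_hz(30, [UDP_S])        # publisher 2: shm off, udp_mc on
print(f"{coupled:.1f} {shm_only:.1f} {udp_only:.1f}")  # 10.0 30.0 10.0
```

Splitting the topic across two publishers only helps if the slow one does not sit in the same loop as the fast one; otherwise the loop still waits for the UDP send.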

JeremyBYU commented on May 26, 2024

Thank you all for your clear explanations. I will try out your suggestions for hostname resolution, and if it works I will close this issue. My only recommendation is to state in the README that hostnames must be resolvable to IP addresses for services (this is not needed for pub/sub).

JeremyBYU commented on May 26, 2024

It works thank you! Adding the host names was the right call.

rex-schilasky commented on May 26, 2024

We will start improving the documentation on network and local mode, the layer configuration logic, and other pitfalls with the next release.
Thank you for all your effort!
