Code Monkey home page Code Monkey logo

dora's People

Contributors

bobd988 avatar dependabot[bot] avatar eduidl avatar felixhuangsiling avatar haixuantao avatar heyong4725 avatar jerry73204 avatar messense avatar meua avatar phil-opp avatar wlee1248 avatar xxchang avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

dora's Issues

Rust example build error when using 0.1.0 instruction

Environment: Ubuntu 22.04

Follow the README, using Rust template to build rust example (default), I have the following error (why we have iceoryx dependecy?):

(base) demo@:~/src/dora-rs/releases/0.1.0/rust_proj$ cargo build -p node_1
warning: skipping duplicate package `rust-dataflow-example-sink` found at `/home/demo/.cargo/git/checkouts/dora-865db9b4eb79312d/c9f4a1a/examples/rust-dataflow/sink`
   Compiling iceoryx-sys v0.1.0 (https://github.com/eclipse-iceoryx/iceoryx-rs.git#68fbd034)
error: failed to run custom build command for `iceoryx-sys v0.1.0 (https://github.com/eclipse-iceoryx/iceoryx-rs.git#68fbd034)`

Caused by:
  process didn't exit successfully: `/home/demo/src/dora-rs/releases/0.1.0/rust_proj/target/debug/build/iceoryx-sys-6b251d9e411e136f/build-script-build` (exit status: 1)
  --- stdout
  Output { status: ExitStatus(unix_wait_status(0)), stdout: "", stderr: "HEAD is now at f756b7c99 Merge pull request #1327 from ApexAI/iox-#1323-set-version-to-2.0.2\n" }
  --- stderr
  CMake Error: The current CMakeCache.txt directory /home/demo/src/dora-rs/releases/0.1.0/rust_proj/target/debug/build/iceoryx-sys-ac30c768c6b445f0/out/iceoryx-build/iceoryx_hoofs/CMakeCache.txt is different than the directory /home/demo/src/dora-rs/releases/0.1.0/c_proj/target/debug/build/iceoryx-sys-ac30c768c6b445f0/out/iceoryx-build/iceoryx_hoofs where CMakeCache.txt was created. This may result in binaries being created in the wrong place. If you are not sure, reedit the CMakeCache.txt
  CMake Error: The source "/home/demo/src/dora-rs/releases/0.1.0/rust_proj/target/debug/build/iceoryx-sys-ac30c768c6b445f0/out/iceoryx-git/iceoryx_hoofs/CMakeLists.txt" does not match the source "/home/demo/src/dora-rs/releases/0.1.0/c_proj/target/debug/build/iceoryx-sys-ac30c768c6b445f0/out/iceoryx-git/iceoryx_hoofs/CMakeLists.txt" used to generate cache.  Re-run cmake with a different source directory.
  Error: Custom { kind: Other, error: "Could not run cmake for 'iceoryx_hoofs'!" }

I have cleaned .cargo/git cache by "cargo cache -a" command, don't know why it still pull in iceoryx and examples/rust-dataflow/sink dependencies

no node exists at `../../target/debug/rust-dataflow-url-example-sink`: No such file or directory

Describe the bug
Could not validate yaml: Dataflow could not be validated.: no node exists at ../../target/debug/rust-dataflow-url-example-sink: No such file or directory (os error 2)

(dora3.7) jarvis@jia:~/coding/dora_home/dora/binaries/cli/src$ cargo run start ../../../examples/rust-dataflow-url/dataflow.yml --attach --hot-reload
   Compiling dora-operator-api-c v0.2.3-rc6 (/home/jarvis/coding/dora_home/dora/apis/c/operator)
   Compiling dora-cli v0.2.3-rc6 (/home/jarvis/coding/dora_home/dora/binaries/cli)
    Finished dev [unoptimized + debuginfo] target(s) in 3.97s
     Running `/home/jarvis/coding/dora_home/dora/target/debug/dora-cli start ../../../examples/rust-dataflow-url/dataflow.yml --attach --hot-reload`
Could not validate yaml: Dataflow could not be validated.: no node exists at `../../target/debug/rust-dataflow-url-example-sink`: No such file or directory (os error 2)
(dora3.7) jarvis@jia:~/coding/dora_home/dora/binaries/cli/src$ 

To Reproduce
Steps to reproduce the behavior:

  1. Dora start daemon: dora up
  2. Start a new dataflow: dora start examples/rust-dataflow-url/dataflow.yml --attach --hot-reload

Environments (please complete the following information):

  • System info: ubuntu 22.04
  • Dora version: dora v0.2.3-rc6

Dataflow with no specified name shows unnamed

Is your feature request related to a problem? Please describe.
Dataflow with no specified name shows unnamed, as follows:
image

impl Display for DataflowId {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        if let Some(name) = &self.name {
            write!(f, "[{name}] {}", self.uuid)
        } else {
            write!(f, "[<unnamed>] {}", self.uuid)
        }
    }
}

Describe the solution you'd like
Is it possible for dataflow without a name to randomly generate a unique name for it.

Multithreading mechanism

This is a problem encountered in porting NDT algorithm from ROS2 ecosystem to DORA framework

In ROS2, the ndt_scan_matcher package adopts the multithreading mechanism: from its code, it can be seen that, the two callback functions ( callbackMapPoints and callbackSensorPoints) and the serviceNDTAlign service share a thread, which are directly related to the setting and execution of the NDT algorithm, while CallbackInitialPose is a separate thread. Extract the relevant codes as follows:

// ndt_scan_matcher_core.cpp
NDTScanMatcher::NDTScanMatcher()
{
    //...
    rclcpp::CallbackGroup::SharedPtr initial_pose_callback_group;
    initial_pose_callback_group =
        this->create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);

    rclcpp::CallbackGroup::SharedPtr main_callback_group;
    main_callback_group = this->create_callback_group(rclcpp::CallbackGroupType::MutuallyExclusive);
    
    auto initial_pose_sub_opt = rclcpp::SubscriptionOptions();
    initial_pose_sub_opt.callback_group = initial_pose_callback_group;

    auto main_sub_opt = rclcpp::SubscriptionOptions();
    main_sub_opt.callback_group = main_callback_group;

    initial_pose_sub_ = this->create_subscription<geometry_msgs::msg::PoseWithCovarianceStamped>(
        "ekf_pose_with_covariance", 100,
        std::bind(&NDTScanMatcher::callbackInitialPose, this, std::placeholders::_1),
        initial_pose_sub_opt);
    map_points_sub_ = this->create_subscription<sensor_msgs::msg::PointCloud2>(
        "pointcloud_map", rclcpp::QoS{1}.transient_local(),
        std::bind(&NDTScanMatcher::callbackMapPoints, this, std::placeholders::_1), main_sub_opt);
    sensor_points_sub_ = this->create_subscription<sensor_msgs::msg::PointCloud2>(
        "points_raw", rclcpp::SensorDataQoS().keep_last(points_queue_size),
        std::bind(&NDTScanMatcher::callbackSensorPoints, this, std::placeholders::_1), main_sub_opt);
    
    service_ = this->create_service<tier4_localization_msgs::srv::PoseWithCovarianceStamped>(
        "ndt_align_srv",
        std::bind(&NDTScanMatcher::serviceNDTAlign, this, std::placeholders::_1, std::placeholders::_2),
        rclcpp::ServicesQoS().get_rmw_qos_profile(), main_callback_group);
}

// ndt_scan_matcher_node.cpp
int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  auto ndt_scan_matcher = std::make_shared<NDTScanMatcher>();
  rclcpp::executors::MultiThreadedExecutor exec;
  exec.add_node(ndt_scan_matcher);
  exec.spin();
  rclcpp::shutdown();
  return 0;
}

During the migration of NDT algorithm to DORA framework. The on_input function is temporarily implemented as follows:

// ndt_scan_matcher.cpp
OnInputResult on_input(NDTScanMatcher &op, rust::Str id, rust::Slice<const uint8_t> data, OutputSender &output_sender)
{
    if (id == "initial_pose") {
        // Corresponding to the original callbackInitialPose callback function
	} else if (id == "tf_baselink2lidar") {
        // ...
    }
    if (id == "map_points") {
        // Corresponding to the original callbackMapPoints callback function
    } else if (id == "align_request_pose") {
        // Corresponding to the original serviceNDTAlign service
    } else if (id == "downsampled_pointcloud") {
        // Corresponding to the original callbackSensorPoints callback function
    }
}

How to implement multithreading mechanism in DORA operator?

The `dora stop uuid` unable to delete a running dataflow

Describe the bug
The dora stop uuid unable to delete a running dataflow. The dora destroy command works, but an exception is reported, and the exception is not handled correctly

To Reproduce
Steps to reproduce the behavior:

  1. Dora start daemon: dora up
  2. then execute dora start graphs/tutorials/webcam_yolov5.yaml --attach

Screenshots or Video
7310c682d54678fda927ac1c2b36e500

Environments (please complete the following information):

  • System info: Linux jia 5.15.0-69-generic #76~20.04.1-Ubuntu SMP Mon Mar 20 15:54:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Dora version: 0.2.2

the following packages contain code that will be rejected by a future version of Rust: ntapi v0.3.7

Describe the bug
Compiling dora prompts that the dependent ntapi version is no longer supported.

(py310) D:\coding\rusthome\github.com\meua\dora>cargo build --all
   Compiling dora-operator-api-c v0.2.2 (D:\coding\rusthome\github.com\meua\dora\apis\c\operator)
   Compiling dora-cli v0.2.2 (D:\coding\rusthome\github.com\meua\dora\binaries\cli)
    Finished dev [unoptimized + debuginfo] target(s) in 28.36s
warning: the following packages contain code that will be rejected by a future version of Rust: ntapi v0.3.7
note: to see what the problems were, use the option `--future-incompat-report`, or run `cargo report future-incompatibilities --id 4`

To Reproduce
Steps to reproduce the behavior:

  1. Dora start daemon: cargo build --all

Expected behavior
Update dependencies so that make no dependency warnings are reported.

Environments (please complete the following information):

  • System info: windows 10
  • Dora version: v0.2.2
  • Dora branch: main branch

no daemon listen port for machine ``

Describe the bug

var m has no value and is an empty string.

(dora3.7) jarvis@jia:~/coding/dora_home/dora/binaries/cli/src$ RUST_LOG=info cargo run start /home/jarvis/coding/rust_home/github.com/meua/dora-drives/graphs/tutorials/webcam_midas_frame.yaml --attach --hot-reload
   Compiling dora-operator-api-c v0.2.3-rc6 (/home/jarvis/coding/dora_home/dora/apis/c/operator)
   Compiling dora-cli v0.2.3-rc6 (/home/jarvis/coding/dora_home/dora/binaries/cli)
    Finished dev [unoptimized + debuginfo] target(s) in 4.08s
     Running `/home/jarvis/coding/dora_home/dora/target/debug/dora-cli start /home/jarvis/coding/rust_home/github.com/meua/dora-drives/graphs/tutorials/webcam_midas_frame.yaml --attach --hot-reload`
no daemon listen port for machine ``
(dora3.7) jarvis@jia:~/coding/dora_home/dora/binaries/cli/src$ 

To Reproduce
Steps to reproduce the behavior:

  1. Start dora-coordinator: cd dora/binaries/coordinator/src && cargo run
  2. Don't start dora-daemon
  3. Start a new dataflow: RUST_LOG=info cargo run start /home/jarvis/coding/rust_home/github.com/meua/dora-drives/graphs/tutorials/webcam_midas_frame.yaml --attach --hot-reload

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots or Video
image

Environments (please complete the following information):

  • System info: ubuntu 22.04
  • Dora version: v0.2.3-rc6

todo: Add survival status check command `dora status` for dora

Is your feature request related to a problem? Please describe.
After I execute dora up, an exception occurs after starting dataflow. I don’t know if dora is running abnormally. I want to check the status of dora service through dora-cli.

Describe the solution you'd like
View dora service status through dora-cli, For example the command dora status.

Do we need a similar mechanism like the Qos in ROS2

This is a problem encountered in porting NDT algorithm from ROS2 ecosystem to DORA framework.

A problem was found during code debugging. If a function of the DORA operator takes a long time to execute, other input messages cannot be processed in time.

For example, the serviceNDTAlign() function in ndt_scan_matcher operator, see the settings in dataflow.yml. Since Particle filtering is required to find a more accurate initial position, the program takes a long time to execute when receiving the align_request_pose input (In fact, it takes about 3~5s to execute this program in ROS2, but it takes tens of seconds in DORA. I am looking for the reason why the execution takes so long, which will be discussed later). Then other input messages, such as downsampled_pointcloud and initial_pose, will not be processed in time and will accumulate in memory. In fact, messages such as downsampled_pointcloud and initial_pose are time sensitive. After a certain period of time, these messages are meaningless, so there is no need to cache all messages. In ROS2, one can specify how many messages can be cached at most by setting Qos, as follows:

  initial_pose_sub_ = this->create_subscription<geometry_msgs::msg::PoseWithCovarianceStamped>(
    "ekf_pose_with_covariance", 100,
    std::bind(&NDTScanMatcher::callbackInitialPose, this, std::placeholders::_1),
    initial_pose_sub_opt);

  sensor_points_sub_ = this->create_subscription<sensor_msgs::msg::PointCloud2>(
    "points_raw", rclcpp::SensorDataQoS().keep_last(points_queue_size),
    std::bind(&NDTScanMatcher::callbackSensorPoints, this, std::placeholders::_1), main_sub_opt);

Is there a similar mechanism in Dora, or do we also need such a mechanism?

Error: failed to send start dataflow message

Describe the bug

(dora3.7) jarvis@jia:~/coding/dora_home/dora-drives$ proxychains dora start graphs/tutorials/webcam_single_dpt_frame.yaml --attach --hot-reload --name dpt_midas
ProxyChains-3.1 (http://proxychains.sf.net)
|S-chain|-<>-127.0.0.1:1080-<><>-127.0.0.1:6012-<><>-OK
Error: failed to send start dataflow message

Caused by:
    failed to fill whole buffer

Location:
    binaries/cli/src/main.rs:214:10
(dora3.7) jarvis@jia:~/coding/dora_home/dora-drives$ 

To Reproduce
Steps to reproduce the behavior:

  1. Dora start daemon: dora up
  2. Start a new dataflow: proxychains dora start graphs/tutorials/webcam_single_dpt_frame.yaml --attach --hot-reload --name dpt_midas

Expected behavior
dora can support proxy mode, the purpose is to quickly download foreign acceleration algorithm models

Environments (please complete the following information):

  • System info: Linux jia 5.15.0-70-generic #77~20.04.1-Ubuntu SMP Wed Apr 5 09:38:34 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Dora version: 0.2.2

Accidentally dora command is unresponsive and stuck

Describe the bug
Accidentally dora command is unresponsive and stuck

To Reproduce
Steps to reproduce the behavior:

  1. Few cases,when executing command dora up , dora start dataflow.yaml, dora stop, dora destroy

Environments (please complete the following information):

  • System info: Linux jia 5.15.0-69-generic #76~20.04.1-Ubuntu SMP Mon Mar 20 15:54:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Dora version:0.2.2

WARN dora_daemon::pending: node `webcam` exited before initializing dora connection

Describe the bug
sh -c 'conda activate python_virtual_env' can not work normally, but all kinds of conda operations in ubuntu terminal are completely normal.
image

Customize the node according to the following configuration algorithm.

jarvis@jia:~/coding/dora_home/dora/binaries/cli/src$ dora-cli logs 6382d842-7471-44d7-8b76-2e5ae8b30d45 midas_op
     │ Logs from midas_op.
   1 │ sh: 1: conda: not found
jarvis@jia:~/coding/dora_home/dora/binaries/cli/src$ 

To Reproduce
Steps to reproduce the behavior:

  1. Dora start daemon: dora up
  2. Start a new dataflow: RUST_LOG=info RUST_BACKTRACE=full cargo run start /path/to/your_rust_home/github.com/meua/dora-drives/graphs/tutorials/webcam_midas_frame_custom_node.yaml --attach --hot-reload

Expected behavior
The python environments between different algorithms are isolated from each other.

Environments (please complete the following information):

  • System info: ubuntu 22.04
  • Dora version: v0.2.3-rc6

Additional context

  - id: midas_op
    custom:
      source: shell
      args: conda run -n midas-py310 python ../../operators/midas_op.py
#      args: source /home/jarvis/anaconda3/etc/profile.d/conda.sh && conda activate midas-py310 && python ../../operators/midas_op.py
#      args: conda env create -f ../../dependencies/MiDaS/environment.yaml && conda activate midas-py310 && python ../../operators/midas_op.py
#      args: conda activate midas-py310 && python ../../operators/midas_op.py
#      args: conda info -e && python ../../operators/midas_op.py
      outputs:
        - depth_frame
      inputs:
        image: webcam/image
···

Tracking issue for a Pylot MVP on Dora Python API.

Tracking issue for a Pylot MVP on Dora Python API. Pylot is going to validate and benchmark design choices made by DORA.

CARLA bridge

  • Spawn a 'hero' vehicle.
  • Spawn other vehicles representing traffic.
  • Spawn pedestrians.
  • Attach a camera.
  • Attach a Depth Camera.
  • Attach a segmentation Camera.
  • Send position information.

Detection

  • Port DL Obstacle operator.
  • Port DL traffic light operator.
  • Port perfect obstacle operator.
  • Port obstacle location operator.
  • Port obstacle trajectory prediction operator.

Planning

  • Port route planning operator.

Visualisation

  • Visualize image from camera.
  • Visualize obstacle from camera.
  • VIsualize waypoints from camera.

Control

  • Port PID Controller Operator
  • Control Operator

Deployment

  • Package Demo into a Docker

Bug

  • Upgrade Carla to latest version
  • Fix: "Illegal Instruction Error"

Those implementation are currently committed into #18.

openssl not installed

Describe the bug

Steps to reproduce the behavior:
  run pkg_config fail: "`\"pkg-config\" \"--libs\" \"--cflags\" \"openssl\"` did not exit successfully: exit status: 1\nerror: could not find system library 'openssl' required by the 'openssl-sys' crate\n\n--- stderr\nPackage openssl was not found in the pkg-config search path.\nPerhaps you should add the directory containing `openssl.pc'\nto the PKG_CONFIG_PATH environment variable\nNo package 'openssl' found\n"

  --- stderr
  thread 'main' panicked at '

  Could not find directory of OpenSSL installation, and this `-sys` crate cannot
  proceed without this knowledge. If OpenSSL is installed and this crate had
  trouble finding it,  you can set the `OPENSSL_DIR` environment variable for the
  compilation process.

  Make sure you also have the development packages of openssl installed.
  For example, `libssl-dev` on Ubuntu or `openssl-devel` on Fedora.

  If you're in a situation where you think the directory *should* be found
  automatically, please open a bug at https://github.com/sfackler/rust-openssl
  and include information about your system as well as this message.

  $HOST = x86_64-unknown-linux-gnu
  $TARGET = x86_64-unknown-linux-gnu
  openssl-sys = 0.9.76

  ', /home/navy/.cargo/registry/src/github.com-1ecc6299db9ec823/openssl-sys-0.9.76/build/find_normal.rs:191:5
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace```

**Expected behavior**
Install openssl automatically

**Screenshots or Video**
![image](https://user-images.githubusercontent.com/11570442/225255470-fb179d3f-918f-4eae-8161-fa9af9bb977c.png)

**Environments (please complete the following information):**
 - System info: Linux jia 5.15.0-67-generic #74~20.04.1-Ubuntu SMP Wed Feb 22 14:52:34 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
 - Dora version:dora-cli 0.2.0
 
**Additional context**
https://dora.carsmos.ai/dora/installation.html
This is triggered during the execution of the command `cargo build --all --release` abnormal

Reduce number of open file handles

There is a maximum number of open file handles per process (typically 1024). We currently allocate a separate shared memory region per output message, which can quickly exhaust this limit if the receiver buffers received inputs or when large queue sizes are used.

To avoid this problem, we should allocate larger shared memory buffers and use the same buffer for multiple messages.

The daemon is dead and cannot be restarted automatically

Describe the bug

At the beginning, dora-daemon and dora-coordinator are running normally. When dataflow is started, because dora-coordinator responds slowly, daemon will automatically exit alone and cannot be restarted.

/home/jarvis/coding/dora_home/dora/target/debug/dora-daemon
Error: lost connection to coordinator

Location:
    /home/jarvis/coding/dora_home/dora/binaries/daemon/src/lib.rs:247:29

Process finished with exit code 1

To Reproduce
Steps to reproduce the behavior:

  1. Dora start daemon: dora up
  2. Start a new dataflow: RUST_LOG=info cargo run start /home/jarvis/coding/rust_home/github.com/meua/dora-drives/graphs/tutorials/webcam_midas_frame.yaml --attach --hot-reload
  3. Let dora-coordinator sleep for a while.

Expected behavior
It is expected that after the command dora up is executed, dora-coodinator and dora-daemon will act as system daemon processes and will not exit abnormally at will.

Environments (please complete the following information):

  • System info: ubuntu 22.04
  • Dora version: 0.2.3-rc6

Additional context

(dora3.7) jarvis@jia:~/coding/dora_home/dora/binaries/cli/src$ RUST_LOG=info cargo run start /home/jarvis/coding/rust_home/github.com/meua/dora-drives/graphs/tutorials/webcam_midas_frame.yaml --attach --hot-reload
   Compiling rustix v0.36.13
   Compiling dora-operator-api-c v0.2.3-rc6 (/home/jarvis/coding/dora_home/dora/apis/c/operator)
   Compiling is-terminal v0.4.4
   Compiling clap v4.1.11
   Compiling bat v0.23.0
   Compiling dora-cli v0.2.3-rc6 (/home/jarvis/coding/dora_home/dora/binaries/cli)
    Finished dev [unoptimized + debuginfo] target(s) in 7.92s
     Running `/home/jarvis/coding/dora_home/dora/target/debug/dora-cli start /home/jarvis/coding/rust_home/github.com/meua/dora-drives/graphs/tutorials/webcam_midas_frame.yaml --attach --hot-reload`
failed to spawn dataflow on machine ``
(dora3.7) jarvis@jia:~/coding/dora_home/dora/binaries/cli/src$ 

failed to stop dataflow

Describe the bug
dora-daemon hangs up due to heartbeat timeout, but dora-coodinator is running normally, then I restart dora-daemon, when the dataflow is closed by dora stop uuid, it cannot be closed.

(dora3.7) jarvis@jia:~/coding/dora_home/dora$ conda activate py310
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli up
started dora coordinator
started dora daemon

(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli -V
dora-cli 0.2.3-rc6
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli check
Dora Coordinator: ok
Dora Daemon: ok

(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli start examples/python-operator-dataflow/dataflow.yml --attach --hot-reload
10af7c98-604d-4808-b48a-7e028cb3d733
  2023-05-19T03:53:57.743423Z  WARN dora_coordinator: daemon at `` did not react as expected to watchdog message

Caused by:
   0: failed to send watchdog message to daemon
   1: Broken pipe (os error 32)

Location:
    /home/jarvis/coding/dora_home/dora/binaries/coordinator/src/lib.rs:550:10
    at binaries/coordinator/src/lib.rs:468

open new terminal and kill dora-daemon, simulate the daemon process to hang up abnormally

(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli list
Running dataflows:
- [nappy-back] 10af7c98-604d-4808-b48a-7e028cb3d733
(py310) jarvis@jia:~/coding/dora_home/dora$ ps -ef | grep dora
jarvis     22117       1  0 11:41 pts/12   00:00:00 dora-coordinator
jarvis     22131       1  0 11:41 pts/12   00:00:01 dora-daemon
jarvis     24461   18206  0 11:53 pts/12   00:00:00 dora-cli start dataflow.yml --attach --hot-reload
jarvis     24464   22131  7 11:53 pts/12   00:00:01 python3 -c import dora; dora.start_runtime() # webcam
jarvis     24467   22131  8 11:53 pts/12   00:00:01 python3 -c import dora; dora.start_runtime() # plot
jarvis     24598   22333  0 11:53 pts/3    00:00:00 grep --color=auto dora
(py310) jarvis@jia:~/coding/dora_home/dora$ kill -15 22131
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli list
Running dataflows:
- [nappy-back] 10af7c98-604d-4808-b48a-7e028cb3d733
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli stop 
> Choose dataflow to stop: [nappy-back] 10af7c98-604d-4808-b48a-7e028cb3d733
no daemon connection
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli up
started dora daemon
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli check
Dora Coordinator: ok
Dora Daemon: ok

(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli list
Running dataflows:
- [nappy-back] 10af7c98-604d-4808-b48a-7e028cb3d733
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli stop
> Choose dataflow to stop: [nappy-back] 10af7c98-604d-4808-b48a-7e028cb3d733
failed to stop dataflow
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli list
Running dataflows:
- [nappy-back] 10af7c98-604d-4808-b48a-7e028cb3d733
(py310) jarvis@jia:~/coding/dora_home/dora$ dora-cli -V
dora-cli 0.2.3-rc6
(py310) jarvis@jia:~/coding/dora_home/dora$ 

To Reproduce
Steps to reproduce the behavior:

  1. Dora start coodinator and daemon: dora-cli up
  2. Start a new dataflow: dora-cli start examples/python-operator-dataflow/dataflow.yaml --attach --hot-reload
  3. Kill dora-daemon: kill -15 pid_dora_daemon
  4. Dora start daemon: dora-cli up
  5. Destroy dataflow: dora-cli stop uuid_your_dataflow

Expected behavior
I expect dora-coodinator and dora-daemon to live and die together, and they can automatically restart when the heartbeat times out, Or dora-daemon hangs up, and dataflow is also destroyed.

Environments (please complete the following information):

  • System info: ubuntu 22.04
  • Dora version: v0.2.3-rc6

Plans for command line interface

We are planning a dora command line tool with the following subcommands:

For v0.1:

  • help: Help about any command (generated by clap)
  • check
    • Pre-check the dora installation/environment for potential problems (e.g., YAML syntax)
    • Check that the dora-coordinator is running
    • Check that the iox-roudi deamon is running
  • build: Build the dora nodes/operators (similar to run.rs in current examples)
  • graph: Generate dataflow graph per YAML spec
    • visualize as mermaid code
    • open image automatically, e.g. in OS or browser window
  • templates: Dora node/operator template
    • Rust
    • C
    • Python
    • C++
    • dataflow YAML
  • start: Run dora dataflow
  • stop: Stop dora dataflow
  • destroy: Cleanup and destroy all dora related processes
  • list: List running dora entities (resources)
  • up: Install all dora dependencies and start dora coordinator, iceoryx daemon and/or zenoh router etc
  • upgrade: Upgrade dora

Later:

  • dashboard: Open the dora dashboard in a web browser (future)
  • logs: Tail logs from customer node/operators
  • metrics: Fetch metrics from customer node/operators
  • stats: Display dora/traffic statistics
  • get: Get info about the dora entity (future may need CRUD on dora entity)
  • graph: visualize as image directly (e.g. as svg)

Other Ideas:

  • package: Create an archive file containing all compiled nodes for easier distribution
    • similar to cargo package

See the discussion in #81 for more context.

Implementation PRs:

Errors when follow README for 0.1.0 release

Env: MacOS

I followed README to test v0.1.0 release, and have following errors, please take a look @haixuanTao

(base) ~/GitHub/dora-rs/bin/test1 $ dora start dataflow.yml
dyld[35552]: Library not loaded: /Library/Frameworks/Python.framework/Versions/3.11/Python
  Referenced from: <49D90856-1186-3169-BACA-0F128A85669B> /Users/yonghe/GitHub/dora-rs/bin/dora-runtime
  Reason: tried: '/Library/Frameworks/Python.framework/Versions/3.11/Python' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Library/Frameworks/Python.framework/Versions/3.11/Python' (no such file), '/Library/Frameworks/Python.framework/Versions/3.11/Python' (no such file), '/System/Library/Frameworks/Python.framework/Versions/3.11/Python' (no such file, not in dyld cache)
dyld[35551]: Library not loaded: /Library/Frameworks/Python.framework/Versions/3.11/Python
  Referenced from: <49D90856-1186-3169-BACA-0F128A85669B> /Users/yonghe/GitHub/dora-rs/bin/dora-runtime
  Reason: tried: '/Library/Frameworks/Python.framework/Versions/3.11/Python' (no such file), '/System/Volumes/Preboot/Cryptexes/OS/Library/Frameworks/Python.framework/Versions/3.11/Python' (no such file), '/Library/Frameworks/Python.framework/Versions/3.11/Python' (no such file), '/System/Library/Frameworks/Python.framework/Versions/3.11/Python' (no such file, not in dyld cache)
Traceback (most recent call last):
  File "/Users/yonghe/GitHub/dora-rs/bin/test1/./node_1/node_1.py", line 4, in <module>
    from dora import Node
ModuleNotFoundError: No module named 'dora'
  2022-11-16T17:37:32.302484Z ERROR dora_coordinator: error occured in dataflow `48b40fe9-3a44-4a63-b952-122583f5fb4c`

Caused by:
   0: failed to run dataflow at /Users/yonghe/GitHub/dora-rs/bin/test1/dataflow.yml
   1: custom node failed
   2: runtime node op_1 failed (unknown exit code)

Location:
    binaries/coordinator/src/run/runtime.rs:55:17
    at binaries/coordinator/src/lib.rs:90

48b40fe9-3a44-4a63-b952-122583f5fb4c
(base) ~/GitHub/dora-rs/bin/test1 $ ls ../
dataflow.yml                 dora-coordinator             dora-v0.1.0-x86_64-macOS.zip test1
dora                         dora-runtime                 iceoryx
(base) ~/GitHub/dora-rs/bin/test1 $ python --version
Python 3.9.7
(base) ~/GitHub/dora-rs/bin/test1 $ pip --version
pip 21.2.4 from /usr/local/lib/python3.9/site-packages/pip (python 3.9)
(base) ~/GitHub/dora-rs/bin/test1 $ 

2 Python Operators will share the same GIL

By trying the following dataflow:

communication:
  zenoh:
    prefix: /foo

nodes:
  - id: timer
    custom:
      run: ../target/release/examples/example_source_timer
      outputs:
        - time

  - id: rate-limited-timer
    custom:
      run: ../target/release/examples/rate_limit --seconds 0.5
      inputs:
        data: timer/time
      outputs:
        - rate_limited

  - id: random
    custom:
      run: ../target/release/examples/random_number
      inputs:
        timestamp: rate-limited-timer/rate_limited
      outputs:
        - number

  - id: logger
    custom:
      run: ../target/release/examples/example_sink_logger
      inputs:
        random: random/number
        time: timer/time
        timestamped-random: runtime-node/op-1/timestamped-random
       # c-counter: runtime-node/op-2/counter
        python-counter: runtime-node/op-4/counter

  - id: runtime-node
    operators:
      - id: op-1
        shared-library: ../target/release/libexample_operator.so
        inputs:
          random: random/number
          time: timer/time
        outputs:
          - timestamped-random
      # - id: op-2
        # shared-library: ../runtime/examples/c-operator/operator.so
        # inputs:
          # time: timer/time
        # outputs:
          # - counter
      - id: op-3
        python: ../runtime/examples/python-operator/op.py
        inputs:
          time-python: timer/time
        outputs:
          - counter
      - id: op-4
        python: ../runtime/examples/python-operator/op.py
        inputs:
          counter: runtime-node/op-3/counter
        outputs:
          - counter

with the recommended command:

cargo run --release -- run examples/mini-dataflow.yml

I am able to lock one of the two operator in a way that the queue of the operator never get unfilled.

I think we have to look at how to enable multithreading for pyo3.

I will investigate

The dora service is not started. An error is reported execute dora list

Describe the bug
The dora service is not started. An error is reported execute dora list

To Reproduce
Steps to reproduce the behavior:

  1. dora list

Screenshots or Video
QQ20230414-201905

Environments (please complete the following information):

  • System info: Linux jia 5.15.0-69-generic #76~20.04.1-Ubuntu SMP Mon Mar 20 15:54:19 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Dora version : 0.2.2

The process cannot be destroyed by the command `dora destroy`

Describe the bug
The process cannot be destroyed by the command dora destroy

To Reproduce
Steps to reproduce the behavior:

  1. Dora start daemon: dora up
  2. Destroy dataflow: dora destroy

Screenshots or Video
image

Environments (please complete the following information):

  • System info: ubuntu 20.04
  • Dora version: 0.2.2

Replace the SRV mechanism in ROS2

This is a problem encountered in porting NDT algorithm from ROS2 ecosystem to DORA framework.

The SRV mechanism provided in ROS2 accepts the service request sent by the client and send back the response.

It is known the NDT algorithm requires GNSS signal or manual initial position assignment before it works, and the NDT algorithm is sensitive to the error of the given initial position, while the GNSS (without RTK) signal or manual initial position generally has an error of 2-10 meters, so the Particle filter + NDT algorithm is used to find an accurate position near the given initial position.

In the implementation of the autoware.universe's location module, the process of obtaining the exact initial pose is provided by a SRV mechanism, i.e. the serviceNDTAlign service in ndt_scan_matcher_core.cpp. The call to the serviceNDTAlign service is made in the pose_initializer package. Extract the relevant codes as follows:

// pose_initializer_core.cpp
void PoseInitializer::on_initialize(
  const Initialize::Service::Request::SharedPtr req,
  const Initialize::Service::Response::SharedPtr res)
{
    // ...
    try {
        change_state(State::Message::INITIALIZING);
        auto pose = req->pose.empty() ? get_gnss_pose() : req->pose.front();
        if (ndt_) {
            pose = ndt_->align_pose(pose);
        }
        pose.pose.covariance = output_pose_covariance_;
        // ...
  }
}

// ndt_module.cpp
NdtModule::NdtModule(rclcpp::Node * node) : logger_(node->get_logger())
{
    cli_align_ = node->create_client<RequestPoseAlignment>("ndt_align");
}

PoseWithCovarianceStamped NdtModule::align_pose(const PoseWithCovarianceStamped & pose)
{
    const auto req = std::make_shared<RequestPoseAlignment::Request>();
    req->pose_with_covariance = pose;

    if (!cli_align_->service_is_ready()) {
        throw component_interface_utils::ServiceUnready("NDT align server is not ready.");
    }

    RCLCPP_INFO(logger_, "Call NDT align server.");
    const auto res = cli_align_->async_send_request(req).get();
    if (!res->success) {
        RCLCPP_INFO(logger_, "NDT align server failed.");
        throw ServiceException(
            Initialize::Service::Response::ERROR_ESTIMATION, "NDT align server failed.");
    }
    RCLCPP_INFO(logger_, "NDT align server succeeded.");

    // Overwrite the covariance.
    return res->pose_with_covariance;
}

From the implementation of align_pose() function in ndt_module.cpp, it can be seen that, The SRV mechanism of ROS2 has two characteristics: first, whether the service is ready can be checked through if (! cli_align_ ->service_is_ready()) before calling the service; Second, service calls are executed asynchronously.

At present, there is no SRV mechanism in the DORA framework, so the Pub/Sub method is temporarily used to replace SRV: from dataflow.yml, it can be seen that first the pose_initializer operator sends a align_request_pose request message, and then the ndt_scan_matcher operator return the response message.

communication:
  zenoh:
    prefix: /localization-dataflow
    
nodes:
  - id: runtime-node
    operators:
      - id: pose_initializer
        shared-library: ../../../target/debug/pose_initializer
        inputs:
          manual_pose: manual_pose/initial_pose
          fitted_gnss_pose: runtime-node/map_height_fitter/fitted_gnss_pose
          aligned_response_pose: runtime-node/ndt_scan_matcher/aligned_response_pose
        outputs:
          - align_request_pose
          - initialpose3d

      - id: ndt_scan_matcher
        shared-library: ../../../target/debug/ndt_scan_matcher
        inputs:
          map_points: pointcloud_map_loader/map_points
          align_request_pose: runtime-node/pose_initializer/align_request_pose
          downsampled_pointcloud: runtime-node/pointcloud_downsample/downsampled_pointcloud
          regularization_pose: runtime-node/gnss_poser/gnss_pose_cov
          initial_pose: runtime-node/ekf_localizer/ekf_biased_pose_with_cov
          tf_baselink2lidar: tf_publisher/tf_baselink2lidar
        outputs:
          - ndt_pose_with_cov
          - aligned_response_pose

Such an approximate alternative cannot realize the two characteristics of SRV mechanism mentioned above: first, it is impossible to determine whether the service is ready before calling the service, because the ndt_align service provided by ndt_scan_matcher is actutally ready only when the operator has received a map_ points and downsampled_pointcloud input message. Since both ROS2 and Dora uses a Pub/Sub distributed communication mode, it is impossible to specify the node/operator startup sequence. So theoretically, it cannot be guaranteed that ndt_scan_matcher operator will receive the map_points and downsampled_pointcloud messages before receiving the align_request_pose request. Secondly, there is no asynchronous call mode, or I do not know how to use asynchronous mechanism in the implementation of DORA operator.

So there will be performance bottlenecks in the current alternatives. I wonder if there is a better alternative to SRV, Looking forward to your reply.

warning: variant `AllocateOutputSample` is never constructed

Describe the bug

warning: variant `AllocateOutputSample` is never constructed
  --> binaries\runtime\src\operator\mod.rs:71:5
   |
70 | pub enum OperatorEvent {
   |          ------------- variant in this enum
71 |     AllocateOutputSample {
   |     ^^^^^^^^^^^^^^^^^^^^
   |
   = note: `OperatorEvent` has a derived impl for the trait `Debug`, but this is intentionally ignored during dead code analysis
   = note: `#[warn(dead_code)]` on by default

To Reproduce
Steps to reproduce the behavior:

  1. git checkout -b v0.2.3-rc6 v0.2.3-rc6
  2. cd binaries/daemon && cargo run

Environments (please complete the following information):

  • System info: windows 10
  • Dora version: v0.2.3-rc6

WARN dora_daemon::pending: node `no_webcam` exited before initializing dora connection

Describe the bug

jarvis@jia:~/coding/dora_home/dora/examples/python-dataflow$ ./run.sh 
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: maturin in ./.env/lib/python3.10/site-packages (0.15.1)
Requirement already satisfied: tomli>=1.1.0 in ./.env/lib/python3.10/site-packages (from maturin) (2.0.1)
🍹 Building a mixed python/rust project
🔗 Found pyo3 bindings with abi3 support for Python ≥ 3.7
🐍 Not using a specific python interpreter
📡 Using build options features from pyproject.toml
    Finished dev [unoptimized + debuginfo] target(s) in 0.11s
⚠️ Warning: Failed to set rpath for /home/jarvis/coding/dora_home/dora/target/debug/libdora.so: Failed to execute 'patchelf', did you install it? Hint: Try `pip install maturin[patchelf]` (or just `pip install patchelf`)
📦 Built wheel for abi3 Python ≥ 3.7 to /tmp/.tmpeaEO15/dora_rs-0.2.3rc6-cp37-abi3-linux_x86_64.whl
🛠 Installed dora-rs-0.2.3rc6
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: pip in ./.env/lib/python3.10/site-packages (23.1.2)
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: gitpython in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 5)) (3.1.31)
Requirement already satisfied: ipython in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 6)) (8.13.2)
Requirement already satisfied: matplotlib>=3.2.2 in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 7)) (3.7.1)
Requirement already satisfied: numpy>=1.18.5 in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 8)) (1.24.3)
Requirement already satisfied: opencv-python>=4.1.1 in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 9)) (4.7.0.72)
Requirement already satisfied: Pillow>=7.1.2 in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 10)) (9.5.0)
Requirement already satisfied: psutil in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 11)) (5.9.5)
Requirement already satisfied: PyYAML>=5.3.1 in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 12)) (6.0)
Requirement already satisfied: requests>=2.23.0 in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 13)) (2.30.0)
Requirement already satisfied: scipy>=1.4.1 in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 14)) (1.10.1)
Requirement already satisfied: thop>=0.1.1 in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 15)) (0.1.1.post2209072238)
Requirement already satisfied: torch>=1.7.0 in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 16)) (2.0.1)
Requirement already satisfied: torchvision>=0.8.1 in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 17)) (0.15.2)
Requirement already satisfied: tqdm>=4.64.0 in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 18)) (4.65.0)
Requirement already satisfied: tensorboard>=2.4.1 in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 21)) (2.13.0)
Requirement already satisfied: pandas>=1.1.4 in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 26)) (2.0.1)
Requirement already satisfied: seaborn>=0.11.0 in ./.env/lib/python3.10/site-packages (from -r requirements.txt (line 27)) (0.12.2)
Requirement already satisfied: gitdb<5,>=4.0.1 in ./.env/lib/python3.10/site-packages (from gitpython->-r requirements.txt (line 5)) (4.0.10)
Requirement already satisfied: backcall in ./.env/lib/python3.10/site-packages (from ipython->-r requirements.txt (line 6)) (0.2.0)
Requirement already satisfied: decorator in ./.env/lib/python3.10/site-packages (from ipython->-r requirements.txt (line 6)) (5.1.1)
Requirement already satisfied: jedi>=0.16 in ./.env/lib/python3.10/site-packages (from ipython->-r requirements.txt (line 6)) (0.18.2)
Requirement already satisfied: matplotlib-inline in ./.env/lib/python3.10/site-packages (from ipython->-r requirements.txt (line 6)) (0.1.6)
Requirement already satisfied: pickleshare in ./.env/lib/python3.10/site-packages (from ipython->-r requirements.txt (line 6)) (0.7.5)
Requirement already satisfied: prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30 in ./.env/lib/python3.10/site-packages (from ipython->-r requirements.txt (line 6)) (3.0.38)
Requirement already satisfied: pygments>=2.4.0 in ./.env/lib/python3.10/site-packages (from ipython->-r requirements.txt (line 6)) (2.15.1)
Requirement already satisfied: stack-data in ./.env/lib/python3.10/site-packages (from ipython->-r requirements.txt (line 6)) (0.6.2)
Requirement already satisfied: traitlets>=5 in ./.env/lib/python3.10/site-packages (from ipython->-r requirements.txt (line 6)) (5.9.0)
Requirement already satisfied: pexpect>4.3 in ./.env/lib/python3.10/site-packages (from ipython->-r requirements.txt (line 6)) (4.8.0)
Requirement already satisfied: contourpy>=1.0.1 in ./.env/lib/python3.10/site-packages (from matplotlib>=3.2.2->-r requirements.txt (line 7)) (1.0.7)
Requirement already satisfied: cycler>=0.10 in ./.env/lib/python3.10/site-packages (from matplotlib>=3.2.2->-r requirements.txt (line 7)) (0.11.0)
Requirement already satisfied: fonttools>=4.22.0 in ./.env/lib/python3.10/site-packages (from matplotlib>=3.2.2->-r requirements.txt (line 7)) (4.39.4)
Requirement already satisfied: kiwisolver>=1.0.1 in ./.env/lib/python3.10/site-packages (from matplotlib>=3.2.2->-r requirements.txt (line 7)) (1.4.4)
Requirement already satisfied: packaging>=20.0 in ./.env/lib/python3.10/site-packages (from matplotlib>=3.2.2->-r requirements.txt (line 7)) (23.1)
Requirement already satisfied: pyparsing>=2.3.1 in ./.env/lib/python3.10/site-packages (from matplotlib>=3.2.2->-r requirements.txt (line 7)) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7 in ./.env/lib/python3.10/site-packages (from matplotlib>=3.2.2->-r requirements.txt (line 7)) (2.8.2)
Requirement already satisfied: charset-normalizer<4,>=2 in ./.env/lib/python3.10/site-packages (from requests>=2.23.0->-r requirements.txt (line 13)) (3.1.0)
Requirement already satisfied: idna<4,>=2.5 in ./.env/lib/python3.10/site-packages (from requests>=2.23.0->-r requirements.txt (line 13)) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./.env/lib/python3.10/site-packages (from requests>=2.23.0->-r requirements.txt (line 13)) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in ./.env/lib/python3.10/site-packages (from requests>=2.23.0->-r requirements.txt (line 13)) (2023.5.7)
Requirement already satisfied: filelock in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (3.12.0)
Requirement already satisfied: typing-extensions in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (4.5.0)
Requirement already satisfied: sympy in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (1.12)
Requirement already satisfied: networkx in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (3.1)
Requirement already satisfied: jinja2 in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (3.1.2)
Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.7.99 in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (11.7.99)
Requirement already satisfied: nvidia-cuda-runtime-cu11==11.7.99 in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (11.7.99)
Requirement already satisfied: nvidia-cuda-cupti-cu11==11.7.101 in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (11.7.101)
Requirement already satisfied: nvidia-cudnn-cu11==8.5.0.96 in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (8.5.0.96)
Requirement already satisfied: nvidia-cublas-cu11==11.10.3.66 in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (11.10.3.66)
Requirement already satisfied: nvidia-cufft-cu11==10.9.0.58 in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (10.9.0.58)
Requirement already satisfied: nvidia-curand-cu11==10.2.10.91 in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (10.2.10.91)
Requirement already satisfied: nvidia-cusolver-cu11==11.4.0.1 in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (11.4.0.1)
Requirement already satisfied: nvidia-cusparse-cu11==11.7.4.91 in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (11.7.4.91)
Requirement already satisfied: nvidia-nccl-cu11==2.14.3 in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (2.14.3)
Requirement already satisfied: nvidia-nvtx-cu11==11.7.91 in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (11.7.91)
Requirement already satisfied: triton==2.0.0 in ./.env/lib/python3.10/site-packages (from torch>=1.7.0->-r requirements.txt (line 16)) (2.0.0)
Requirement already satisfied: setuptools in ./.env/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.7.0->-r requirements.txt (line 16)) (67.7.2)
Requirement already satisfied: wheel in ./.env/lib/python3.10/site-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.7.0->-r requirements.txt (line 16)) (0.40.0)
Requirement already satisfied: cmake in ./.env/lib/python3.10/site-packages (from triton==2.0.0->torch>=1.7.0->-r requirements.txt (line 16)) (3.26.3)
Requirement already satisfied: lit in ./.env/lib/python3.10/site-packages (from triton==2.0.0->torch>=1.7.0->-r requirements.txt (line 16)) (16.0.3)
Requirement already satisfied: absl-py>=0.4 in ./.env/lib/python3.10/site-packages (from tensorboard>=2.4.1->-r requirements.txt (line 21)) (1.4.0)
Requirement already satisfied: grpcio>=1.48.2 in ./.env/lib/python3.10/site-packages (from tensorboard>=2.4.1->-r requirements.txt (line 21)) (1.54.0)
Requirement already satisfied: google-auth<3,>=1.6.3 in ./.env/lib/python3.10/site-packages (from tensorboard>=2.4.1->-r requirements.txt (line 21)) (2.18.0)
Requirement already satisfied: google-auth-oauthlib<1.1,>=0.5 in ./.env/lib/python3.10/site-packages (from tensorboard>=2.4.1->-r requirements.txt (line 21)) (1.0.0)
Requirement already satisfied: markdown>=2.6.8 in ./.env/lib/python3.10/site-packages (from tensorboard>=2.4.1->-r requirements.txt (line 21)) (3.4.3)
Requirement already satisfied: protobuf>=3.19.6 in ./.env/lib/python3.10/site-packages (from tensorboard>=2.4.1->-r requirements.txt (line 21)) (4.23.0)
Requirement already satisfied: tensorboard-data-server<0.8.0,>=0.7.0 in ./.env/lib/python3.10/site-packages (from tensorboard>=2.4.1->-r requirements.txt (line 21)) (0.7.0)
Requirement already satisfied: werkzeug>=1.0.1 in ./.env/lib/python3.10/site-packages (from tensorboard>=2.4.1->-r requirements.txt (line 21)) (2.3.4)
Requirement already satisfied: pytz>=2020.1 in ./.env/lib/python3.10/site-packages (from pandas>=1.1.4->-r requirements.txt (line 26)) (2023.3)
Requirement already satisfied: tzdata>=2022.1 in ./.env/lib/python3.10/site-packages (from pandas>=1.1.4->-r requirements.txt (line 26)) (2023.3)
Requirement already satisfied: smmap<6,>=3.0.1 in ./.env/lib/python3.10/site-packages (from gitdb<5,>=4.0.1->gitpython->-r requirements.txt (line 5)) (5.0.0)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in ./.env/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard>=2.4.1->-r requirements.txt (line 21)) (5.3.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in ./.env/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard>=2.4.1->-r requirements.txt (line 21)) (0.3.0)
Requirement already satisfied: six>=1.9.0 in ./.env/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard>=2.4.1->-r requirements.txt (line 21)) (1.16.0)
Requirement already satisfied: rsa<5,>=3.1.4 in ./.env/lib/python3.10/site-packages (from google-auth<3,>=1.6.3->tensorboard>=2.4.1->-r requirements.txt (line 21)) (4.9)
Requirement already satisfied: requests-oauthlib>=0.7.0 in ./.env/lib/python3.10/site-packages (from google-auth-oauthlib<1.1,>=0.5->tensorboard>=2.4.1->-r requirements.txt (line 21)) (1.3.1)
Requirement already satisfied: parso<0.9.0,>=0.8.0 in ./.env/lib/python3.10/site-packages (from jedi>=0.16->ipython->-r requirements.txt (line 6)) (0.8.3)
Requirement already satisfied: ptyprocess>=0.5 in ./.env/lib/python3.10/site-packages (from pexpect>4.3->ipython->-r requirements.txt (line 6)) (0.7.0)
Requirement already satisfied: wcwidth in ./.env/lib/python3.10/site-packages (from prompt-toolkit!=3.0.37,<3.1.0,>=3.0.30->ipython->-r requirements.txt (line 6)) (0.2.6)
Requirement already satisfied: MarkupSafe>=2.1.1 in ./.env/lib/python3.10/site-packages (from werkzeug>=1.0.1->tensorboard>=2.4.1->-r requirements.txt (line 21)) (2.1.2)
Requirement already satisfied: executing>=1.2.0 in ./.env/lib/python3.10/site-packages (from stack-data->ipython->-r requirements.txt (line 6)) (1.2.0)
Requirement already satisfied: asttokens>=2.1.0 in ./.env/lib/python3.10/site-packages (from stack-data->ipython->-r requirements.txt (line 6)) (2.2.1)
Requirement already satisfied: pure-eval in ./.env/lib/python3.10/site-packages (from stack-data->ipython->-r requirements.txt (line 6)) (0.2.2)
Requirement already satisfied: mpmath>=0.19 in ./.env/lib/python3.10/site-packages (from sympy->torch>=1.7.0->-r requirements.txt (line 16)) (1.3.0)
Requirement already satisfied: pyasn1<0.6.0,>=0.4.6 in ./.env/lib/python3.10/site-packages (from pyasn1-modules>=0.2.1->google-auth<3,>=1.6.3->tensorboard>=2.4.1->-r requirements.txt (line 21)) (0.5.0)
Requirement already satisfied: oauthlib>=3.0.0 in ./.env/lib/python3.10/site-packages (from requests-oauthlib>=0.7.0->google-auth-oauthlib<1.1,>=0.5->tensorboard>=2.4.1->-r requirements.txt (line 21)) (3.2.2)
   Compiling dora-operator-api-c v0.2.3-rc6 (/home/jarvis/coding/dora_home/dora/apis/c/operator)
   Compiling dora-cli v0.2.3-rc6 (/home/jarvis/coding/dora_home/dora/binaries/cli)
    Finished dev [unoptimized + debuginfo] target(s) in 4.51s
     Running `/home/jarvis/coding/dora_home/dora/target/debug/dora-cli up`
    Finished dev [unoptimized + debuginfo] target(s) in 0.11s
     Running `/home/jarvis/coding/dora_home/dora/target/debug/dora-daemon --run-dataflow dataflow_without_webcam.yml`
  2023-05-23T09:19:39.129115Z ERROR dora_daemon: 
    ae86c2e2-49b1-4af2-b750-e9672d5abdf2/no_webcam failed with exit code 1.

    Check logs using: dora logs ae86c2e2-49b1-4af2-b750-e9672d5abdf2 no_webcam
                            
    at binaries/daemon/src/lib.rs:906

  2023-05-23T09:19:39.129165Z  WARN dora_daemon::pending: node `no_webcam` exited before initializing dora connection
    at binaries/daemon/src/pending.rs:76

  2023-05-23T09:21:24.222078Z ERROR dora_daemon: 
    ae86c2e2-49b1-4af2-b750-e9672d5abdf2/object_detection failed with exit code 1.

    Check logs using: dora logs ae86c2e2-49b1-4af2-b750-e9672d5abdf2 object_detection
                            
    at binaries/daemon/src/lib.rs:906

  2023-05-23T09:21:24.222120Z  WARN dora_daemon::pending: node `object_detection` exited before initializing dora connection
    at binaries/daemon/src/pending.rs:76

  2023-05-23T09:21:24.234340Z ERROR dora_daemon: 
    ae86c2e2-49b1-4af2-b750-e9672d5abdf2/plot failed with exit code 1.

    Check logs using: dora logs ae86c2e2-49b1-4af2-b750-e9672d5abdf2 plot
                            
    at binaries/daemon/src/lib.rs:906

Error: some nodes failed:
  - ae86c2e2-49b1-4af2-b750-e9672d5abdf2/no_webcam: 
    ae86c2e2-49b1-4af2-b750-e9672d5abdf2/no_webcam failed with exit code 1.

    Check logs using: dora logs ae86c2e2-49b1-4af2-b750-e9672d5abdf2 no_webcam
                            
  - ae86c2e2-49b1-4af2-b750-e9672d5abdf2/object_detection: 
    ae86c2e2-49b1-4af2-b750-e9672d5abdf2/object_detection failed with exit code 1.

    Check logs using: dora logs ae86c2e2-49b1-4af2-b750-e9672d5abdf2 object_detection
                            
  - ae86c2e2-49b1-4af2-b750-e9672d5abdf2/plot: 
    ae86c2e2-49b1-4af2-b750-e9672d5abdf2/plot failed with exit code 1.

    Check logs using: dora logs ae86c2e2-49b1-4af2-b750-e9672d5abdf2 plot
                            

Location:
    /home/jarvis/coding/dora_home/dora/binaries/daemon/src/lib.rs:163:13
jarvis@jia:~/coding/dora_home/dora/examples/python-dataflow$ 

To Reproduce
Steps to reproduce the behavior:

  1. Dora start daemon: cd examples/python-dataflow && ./run.sh

Environments (please complete the following information):

  • System info: 22.04
  • Dora version: v0.2.2

error while loading shared libraries: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory

Describe the bug

(dora3.7) jarvis@jia:~/coding/rust_home/github.com/dora-rs/dora/binaries/daemon/src$ cargo run
   Compiling pyo3-build-config v0.18.2
   Compiling clap v4.1.11
   Compiling pyo3-ffi v0.18.2
   Compiling pyo3 v0.18.2
   Compiling arrow v35.0.0
   Compiling dora-node-api v0.2.2 (/home/jarvis/coding/rust_home/github.com/dora-rs/dora/apis/rust/node)
   Compiling dora-operator-api-python v0.2.2 (/home/jarvis/coding/rust_home/github.com/dora-rs/dora/apis/python/operator)
   Compiling dora-runtime v0.2.2 (/home/jarvis/coding/rust_home/github.com/dora-rs/dora/binaries/runtime)
   Compiling dora-daemon v0.2.2 (/home/jarvis/coding/rust_home/github.com/dora-rs/dora/binaries/daemon)
    Finished dev [unoptimized + debuginfo] target(s) in 9.55s
     Running `/home/jarvis/coding/rust_home/github.com/dora-rs/dora/target/debug/dora-daemon main.rs`
/home/jarvis/coding/rust_home/github.com/dora-rs/dora/target/debug/dora-daemon: error while loading shared libraries: libpython3.7m.so.1.0: cannot open shared object file: No such file or directory

To Reproduce
Steps to reproduce the behavior:

  1. Dora start daemon: cd /path/to/dora-rs/dora cargo run

Environments (please complete the following information):

  • System info: ubuntu 20.04
  • Dora version: main branch

MLA - Middleware Layer Abstraction

Middleware (communication) layer abstraction (MLA)

We need to implement MLA as a separate crate to provides a middleware abstraction layer that enables scalable, high performance communications for inter async tasks, intra-process (OS threads), interprocess communication on a single computer node or between different nodes in a computer network. MLA needs to support different communication patterns:

  • publish-subscribe push / push pattern - the published message is pushed to subscribers
  • publish-subscribe push / pull pattern - the published message is write to storage and later pulled by subscribers
  • Request / reply pattern
  • Point-to-point pattern
  • Client / Server pattern

The MLA needs to abstract following details:

  • inter-async tasks (e.g., tokio channels), intraprocess (OS threads, e.g., shared memory), interprocess and inter-host / inter-network communication
  • different transport layer implementations (shared memory, UDP, TCP)
  • builtin support for multiple serialization / deserialization protocols, e.g, capnproto, protobuf, flatbuffers etc
  • different language bindings to Rust, Python, C, C++ etc
  • telemetry tools for logs, metrics, distributed tracing, live data monitoring (e.g., tap a live data), recording and replay

Rust eco-system has abundant crates to provide underlaying communications, e.g.,:

  • tokio / crossbeam provides different types of channels serving different purpose: mpsc, oneshot, broadcast, watch etc
  • Tonic provides gRPC services
  • Tower provides request/reply service
  • Zenoh middleware provides many different pub/sub capabilities

MLA also needs to provide high level APIs:

  • publish(topic, value, optional fields):- optional fields may contain senders' identify to help MLA logics to satify above requirements
  • subscriber(topic, optional fields)-> future streams
  • put(key, value, optional fields)
  • get(key, optional fields) -> value
  • send(key, msg, optional fields)
  • recv(key, optional fields)->value

NDT execution is very time-consuming

After porting NDT algorithm from ROS2 ecosystem to DORA framework, I conducted some tests on the performance of NDT algorithm. The results show that NDT execution is very time-consuming. I've been trying to find out the reason these days, but can't solve the problem until now. The following is the detailed debugging process.

NDT operator execution time on Dora side

First, add necessary print info in ndt_scan_matcher.cpp to obtain the time required for NDT execution.

// ndt_scan_matcher.cpp
geometry_msgs::msg::PoseWithCovarianceStamped NDTScanMatcher::alignUsingMonteCarlo(
  const std::shared_ptr<NormalDistributionsTransformBase<PointSource, PointTarget>> & ndt_ptr,
  const geometry_msgs::msg::PoseWithCovarianceStamped & initial_pose_with_cov)
{
    // ...
    for (unsigned int i = 0; i < initial_poses.size(); i++) {
        const auto exe_start_time = std::chrono::system_clock::now(); // for debug
        // ...
        ndt_ptr->align(*output_cloud, initial_pose_matrix);
        // ...
        const auto exe_end_time = std::chrono::system_clock::now(); // for debug
        const double exe_time = std::chrono::duration_cast<std::chrono::microseconds>(exe_end_time - exe_start_time).count() / 1000.0;
        std::cout << "The alignUsingMonteCarlo execute time is: "<< exe_time << " ms, " << "and NumIteration is: " << num_iteration << std::endl;  // for debug
        // ...
    }
}

Where ndt_ptr->align(*output_cloud, initial_pose_matrix); calls a multi-core version of NDT algorithm. This multi-core implementation actually uses the OpenMP based multi-threading mechanism. The corresponding parallel settings is in ndt_omp_impl.hpp, extracted as follows.

// ndt_omp_impl.hpp
template<typename PointSource, typename PointTarget> double
pclomp::NormalDistributionsTransform<PointSource, PointTarget>::computeDerivatives(Eigen::Matrix<double, 6, 1> &score_gradient,
	Eigen::Matrix<double, 6, 6> &hessian,
	PointCloudSource &trans_cloud,
	Eigen::Matrix<double, 6, 1> &p,
	bool compute_hessian)
{
    // ...
    // const auto parallel_start_time = std::chrono::system_clock::now(); // for debug
#pragma omp parallel for num_threads(num_threads_) schedule(guided, 8)
    for (std::size_t idx = 0; idx < input_->points.size(); idx++)
    {
        int thread_n = omp_get_thread_num();
        // ...
    }
    // const auto parallel_end_time = std::chrono::system_clock::now(); // for debug
    // const double parallel_exe_time = std::chrono::duration_cast<std::chrono::microseconds>(parallel_end_time - parallel_start_time).count() / 1000.0;
    // printf("---- computeDerivatives parallel part execute time is %f ms ----\n", parallel_exe_time); // for debug
    
    // const auto serial_start_time = std::chrono::system_clock::now(); // for debug
    for (std::size_t i = 0; i < input_->points.size(); i++) {
        // ...
    }
    if (regularization_pose_) {
        // ...
    }
    // const auto serial_end_time = std::chrono::system_clock::now(); // for debug
    // const double serial_exe_time = std::chrono::duration_cast<std::chrono::microseconds>(serial_end_time - serial_start_time).count() / 1000.0;
    // printf("---- computeDerivatives serial part execute time is %f ms ----\n", serial_exe_time); // for debug
}

The test results are shown in the figures below. It can be found that the NDT execution time is usually above 400 ms, far from meeting the real-time requirements of localization. And by using htop and pstree commands, it is indicated that the multi-threading mechanism works properly.

dora-side-ndt-exetime-a

dora-side-ndt-threads-a

dora-side-ndt-threads-b

To find out which code is time-consuming to execute, I added more print info to "ndt_omp_impl.hpp", seeing the above code commented with "//" for details. The result shows that the execution time of the "computeDerivatives" function is about 70 ms, and this function will be called many times each time the NDT algorithm is run. Below is the corresponding screenshot.

dora-side-ndt-exetime-b

NDT execution time on ROS2 side

For comparison, on ROS2 side, the screenshots of NDT execution time are as follows. It only takes about 0.7 ms for the "computeDerivatives" function to execute and the total NDT execution time is about 20 ms.

ros2-side-ndt-threads-a

ros2-side-ndt-exetime-b

NDT node execution time on Dora side

Considering that the Dora operator itself is a thread, in order to eliminate the influence of this factor on the multi-threading mechanism of ndt, I rewrite the "ndt_scan_matcher" as Dora node and record its execution time. The relevant results are as follows:

dora-side-ndtnode-exetime-a

dora-side-ndtnode-threads-a

dora-side-ndtnode-threads-b

dora-side-ndtnode-exetime-b

It remains similar to the case of porting "ndt_scan_matcher" as Dora operator.


I carefully analyzed the code in ndt_omp_impl.hpp, and there seems nothing special. So it is strange that the same code only takes about 20 ms to execute on the ROS2 side, but it takes about 1000 ms on the Dora side. Looking forward to your advice.

error[E0554]: `#![feature]` may not be used on the stable release channel

Describe the bug

error[E0554]: `#![feature]` may not be used on the stable release channel
  --> /home/jarvis/.cargo/registry/src/github.com-1ecc6299db9ec823/rustix-0.36.9/src/lib.rs:99:26
   |
99 | #![cfg_attr(rustc_attrs, feature(rustc_attrs))]
   |                          ^^^^^^^^^^^^^^^^^^^^

To Reproduce
Steps to reproduce the behavior:

  1. Debug start coordinator.

Environments (please complete the following information):

  • System info: ubuntu 22.04
  • Dora version: dora-rs/dora main branch

Zero-copy Python inputs using `arrow`

Motivation

While we already have support for zero-copy inputs in Rust nodes, we still require copying the data for Python nodes. This can lead to decreased performance when using messages with large data. To avoid these slowdowns, we want to support zero-copy inputs for Python nodes too.

Challenges

The fundamental challenge is that Python normally operates on owned data. So we have to use a special type such as the memoryview object to make the data accessible to Python without copying. What makes things more complex is that we need a custom freeing logic because we need to unmap the shared memory and report a drop event back to the sender node.

Using a non-standard data type makes it more complex to interact with the data. For example, special functions might be needed for reading, cloning, and slicing the data. Also, it is common to convert the data to other types (e.g. numpy arrays), which requires special conversion functions.

Apache Arrow

To keep the Python API simple and easy to use, it's a good idea to use some mature existing data format rather than creating our own custom data format. The Apache Arrow project provides such a data format that supports zero-copy data transfer across language boundaries. It's already quite popular and used in many projects and tools, so it looks like a good candidate for dora.

Arrow vs PyO3

  • arrow provides two useful features for us
    • a stable ABI for sharing data across programming languages without copying
    • a library with high-level functions to work with the data in Python without copying it (e.g. convert it to a numpy array view)
  • We could also create our own stable ABI types using PyO3
    • This approach is more flexible, e.g. it allows implementing custom methods.
    • However, we would need to implement all desired Python functionality and conversions manually.

The Arrow Array datatype

  • Most suitable data type for our use case since dora inputs are always byte arrays right now
    • (We might want to support additional arrow types in the future.)
  • Provides direct access to the data
    • Invalidating the data from the outside is not possible. This would be useful for ensuring that old inputs are properly freed.
    • We need to rely on the user code to not keep the inputs alive for too long, otherwise the shared memory remains blocked. (We could print some warnings if an input is still not dropped after some timeout.)
  • Memory management is done using a release callback.

Implementation

  • Accessing the data from Python is possible using pyarrow:
    let pa = py.import("pyarrow")?;
    
    let array = pa.getattr("Array")?.call_method1(
        "_import_from_c",
        (array_ptr as Py_uintptr_t, schema_ptr as Py_uintptr_t),
    )?;
  • The question is how we can create the ArrowArray from Rust with custom drop semantics.

The arrow2 crate

  • Provides a set of different array types that can be shared via arrow
  • Unfortunately, most of these types are unsuitable for our use-case, since they assume that the data is stored on the heap
    • our data is backed by shared memory instead, which needs special actions on drop
  • If leaking the data was acceptable, we could use a PrimitiveArray and the arrow2::ffi::mmap::slice function.
    • Unfortunately there is no way to figure out whether the data is no longer used.
    • Freeing the data too early would result in a segfault or in corrupted data, which is not acceptable.
  • Idea: Implement the arrow2::array::Array trait for a custom type
    • Then we could use the arrow2::ffi::export_array_to_c function to create the Arrow array
      • The arrow2 library automatically fills in a release implementation that calls the drop handler of our custom type
      • In this drop handler, we could unmap the shared memory and report the drop tokens to the dora daemon
    • Unfortunately, this approach is not possible because the export_array_to_c function only works with the predefined array types (via downcasting)
  • Conclusion: There appears to be no way to create an Arrow array with custom drop logic using the arrow2 crate

Manual Creation

  • The ABI layout of Arrow arrays is specified, so we could do the creation manually
  • Drawbacks: very unsafe, difficult to get right, a lot of work
  • Advantage: We can implement exactly the drop behavior that we like

The official arrow crate

dora-coordinator has very high CPU usage

Env: Ubuntu 22.04
Dora Release: 0.1.1 (0.1.0 on MacOS has similar issue)

Python example:

  1. After dora up, CPU usage seems normal
  2. dora start python example, CPU usage become very high
  3. after dora stop, CPU usage still very high (4 of 16 CPU cores reach 100%), htop command shows dora-coordinator on 4 CPU cores, reach 100%
  4. after dora destroy, CPU usage back to normal

Rust example (I can build dora new rust_proj example using 0.1.1:):

  1. After dora up, CPU usage seems normal
  2. dora start rust example, CPU usage become very high
  3. after dora stop, CPU usage is still very high (2 of 16 CPU cores reach 100%), htop command shows dora-coordinator on 2 CPU cores, reach 100%
  4. after dora destroy, CPU usage back to normal

Ths issue is why after dora stop the dataflow, the dora-coordinator has such a high CPU usage. Further more, dora-coordinator as a control plane component, it should not have high CPU usage

@phil-opp please take a look

Add `Event::OutputReady` to the Event stream

Is your feature request related to a problem? Please describe.
Make sure that when sending output the nodes or operators are ready to receive those inputs.

Current solution
Current solution is quite hacky and requires to make a loop, like:

nodes:
  - id: oasis_agent
    custom:
      inputs:
        carla_gps_ready: carla_gps_op/ready
      outputs:
        - check
      source: shell

  - id: carla_gps_op
    operator:
      python: ../../carla/carla_gps_op.py
      outputs:
        - ready
      inputs:
        check: oasis_agent/check

Describe the solution you'd like

For Core / Daemon

  • Add Event::OutputReady and Event::AllOutputsReady to our event stream
  • Send NodeEvent::OutputReady { id } to signal to a node that an output is ready
  • Send NodeEvent::AllOutputsReady to signal that all outputs are ready ( similar to NodeEvent::AllInputsClosed )

For Runtime / Operators

  • Send a Ready event to the daemon when it has finished its initialization method (e.g.: __init__ for python).

Notes: This is transparent to the user

For Custom Nodes

  • Send a Ready event to the daemon when the dora node object is initialized. (i.e.: node = Node() in Python)

Notes: This is going to be transparent to the user.

Memory leaks with Python API Operator

It seems that we have a memory leak in the Python API operator. I am not able to reproduce it in the Rust API Operator but might also be present there.

Reproduce

If you unzip the following python dataflow and run it with:

dora-coordinator --run-dataflow dataflow.yml

my_dataflow.zip

You will be able to see the memory usage growing.

How did I found out this issue?

I found out this issue as I was running the Carla leaderboard and after about 2 hours, I run out of memory. So, we might want to fix this issue for the Carla Leaderboard.

Current Investigation

I'm currently investigating the different reasons for this leak. It seems that pyo3 can easily creates copy of python object. See: https://pyo3.rs/v0.15.1/memory.html . What I know so far:

  • This is true for both iceoryx and zenoh.
  • This is coming from the subscriber.

The python installation package of dora 0.2.2 is abnormal on the windows system, and dora-drives cannot recognize the dora installation package

Describe the bug
The python installation package of dora 0.2.2 is abnormal on the windows system, and dora-drives cannot recognize the dora installation package

To Reproduce
Steps to reproduce the behavior:

  1. visit website: https://pypi.org/project/dora-rs/0.2.2/:
  2. Enter https://github.com/dora-rs/dora project : cd apis/python/node
  3. Start a new dataflow: dora start dataflow.yaml
  4. Install dependency: pip install maturin
  5. Pip install dora-rs==0.2.2: maturin develop
  6. Enter https://github.com/dora-rs/dora-drives project : dora check --dataflow graphs/tutorials/webcam.yaml

Expected behavior
python package of dora support windows 10

Screenshots or Video
image
image

Environments (please complete the following information):

  • System info: windows 10
  • Dora version: dora 0.2.2

exec the command `dora upgrade` exception

Describe the bug
image

To Reproduce
Steps to reproduce the behavior:

  1. Dora upgrade daemon: dora upgrade
  2. The error is as follows:
    thread 'main' panicked at 'not yet implemented', binaries/cli/src/main.rs:148:29
    note: run with RUST_BACKTRACE=1 environment variable to display a backtrace

Screenshots or Video
image

Environments (please complete the following information):

  • System info:
    Linux jia 5.15.0-67-generic #74~20.04.1-Ubuntu SMP Wed Feb 22 14:52:34 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Dora version:
    dora-cli 0.1.3

todo: dora provides Chinese documents

Is your feature request related to a problem? Please describe.
Dora currently only has English documents and no Chinese documents, which is not very friendly to Chinese domestic users

Describe the solution you'd like
Dora documents can be switched between Chinese and English

Describe alternatives you've considered
dora supports Chinese documents

Cannot exec dora or dora-cli command when install dora with conda on macOS

Describe the bug
cannot exec dora or dora-cli command when install dora with conda on macOS

To Reproduce
Steps to reproduce the behavior:

  1. create a new conda env : conda create -n dora python=3.9
  2. activate dora: conda activate dora
  3. install dora and dora-cli in conda env: pip install dora dora-cli
  4. exec dora-cli or dora comman in terminal

Expected behavior

expect dora can successfully run.

Failing operator initialization make the dataflow run forever

Describe the bug
If one operator initialisation fails. The other operator wait forever for this operator.

To Reproduce
Steps to reproduce the behavior:

  1. Dora start: dora up
  2. Start a new dataflow: dora start dataflow.yaml # With a failed initialized operator

p.s: I've only tested this for python operator.

Dora adds support for golang

Is your feature request related to a problem? Please describe.
At present, dora supports c++/c/python/rust language, and has basically supported several commonly used languages, but now the mainstream of cloud native needs to be golang, so some AI algorithms will use golang to encapsulate

Describe the solution you'd like
dora supports golang to define operator/node

Describe alternatives you've considered
Use golang to implement dora operator or node, and run it successfully

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.