retis-org / retis
Tracing packets in the Linux networking stack & friends
Using uprobes; could be used for OVS-DPDK.
Runtime tests[1] are currently skipped by default but could be run in a controlled VM. We skip them for now because they require privileged capabilities.
[1] # cargo test --features=test_cap_bpf
While working on #62 we added a `Cache` to the `Unmarshaler`s. The whole idea is to allow `Unmarshaler`s to keep some state.
I tried to implement it in a more natural way, using a struct: having `Unmarshaler` be a trait and providing a default implementation of that trait for `Fn ...`. It works nicely except for one (big) problem: the list of unmarshalers is sent to a different thread while being updated (registered), so it must be sent as immutable.
We should refactor this code into a two-step process so we can move ownership of the entire `Unmarshaler` list to the unmarshaler thread, letting them be turned into stateful structs.
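A minimal sketch of the proposed two-step process (all names here — `Unmarshaler`, `CountingUnmarshaler`, the `u8` keys — are illustrative assumptions, not the actual retis types): registration happens while the caller still owns the list, then ownership of the whole list moves into the event thread, so the unmarshalers can be mutable, stateful structs.

```rust
use std::collections::HashMap;
use std::thread;

// Hypothetical trait; &mut self is what allows unmarshalers to keep state.
trait Unmarshaler: Send {
    fn unmarshal(&mut self, raw: &[u8]) -> String;
}

// Example of a stateful unmarshaler (e.g. it could hold a cache instead).
struct CountingUnmarshaler { seen: u64 }
impl Unmarshaler for CountingUnmarshaler {
    fn unmarshal(&mut self, raw: &[u8]) -> String {
        self.seen += 1;
        format!("event #{} ({} bytes)", self.seen, raw.len())
    }
}

// Step 1 happens before this call: the caller builds (registers) the full
// list while it still owns it. Step 2: ownership of the entire list moves
// to the event thread, so no shared mutable state is needed.
fn run(mut unmarshalers: HashMap<u8, Box<dyn Unmarshaler>>) -> Vec<String> {
    thread::spawn(move || {
        let mut out = Vec::new();
        for raw in [&b"abc"[..], &b"defg"[..]] {
            if let Some(u) = unmarshalers.get_mut(&1) {
                out.push(u.unmarshal(raw));
            }
        }
        out
    })
    .join()
    .unwrap()
}
```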
It could be convenient to show a backtrace for a given packet.
This needs some investigation first.
Probably not every probe that matches the packet should generate events (a symbol whitelist might make sense).
We use `log` as our logging API, but on its own the logging implementation is a no-op. We need to select, configure and use one of the logging implementations to actually show the logs.
As the number of events increases we will drop events at some point. One typical place where events will be dropped is when reserving ringbuf space. Reporting this will give users a hint that they might want to increase the ringbuf size (which should be an option).
A possible implementation could be to create a map that can be indexed by probes (to know which events were lost) and whose values are increased when we fail to allocate ring buffer space or hooks return errors.
When interrupted, the ProbeManager should read that map and report its contents.
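A userspace-side sketch of the per-probe drop accounting described above. In the real tool the counters would live in a BPF map indexed by probe and be bumped from BPF when a ringbuf reservation fails or a hook errors out; the plain `HashMap` below is just an illustrative model of the reporting side.

```rust
use std::collections::HashMap;

// Illustrative model: per-probe counters of lost events.
#[derive(Default)]
struct DropStats {
    per_probe: HashMap<String, u64>,
}

impl DropStats {
    // Bumped when ringbuf reservation fails or a hook returns an error.
    fn record_drop(&mut self, probe: &str) {
        *self.per_probe.entry(probe.to_string()).or_insert(0) += 1;
    }

    // What the ProbeManager would report when interrupted.
    fn report(&self) -> Vec<String> {
        let mut lines: Vec<String> = self
            .per_probe
            .iter()
            .map(|(p, n)| format!("{p}: {n} event(s) lost"))
            .collect();
        lines.sort();
        lines
    }
}
```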
An interface for collectors to implement is needed so we can drive them in batches, as well as a way to register them to a group.
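One possible shape for such an interface, as a sketch (the `Collector`/`Group` names and method set are assumptions, not the retis API): collectors implement a common trait, register into a group, and the group drives each phase across all of them in one batch.

```rust
// Hypothetical collector interface.
trait Collector {
    fn name(&self) -> &'static str;
    fn init(&mut self) -> Result<(), String>;
}

// A group owns its registered collectors and drives them in batches.
#[derive(Default)]
struct Group {
    collectors: Vec<Box<dyn Collector>>,
}

impl Group {
    fn register(&mut self, c: Box<dyn Collector>) {
        self.collectors.push(c);
    }

    // Drive the init phase for every registered collector at once.
    fn init_all(&mut self) -> Vec<(&'static str, Result<(), String>)> {
        self.collectors
            .iter_mut()
            .map(|c| (c.name(), c.init()))
            .collect()
    }
}

// Example collector for demonstration purposes.
struct SkbCollector;
impl Collector for SkbCollector {
    fn name(&self) -> &'static str { "skb" }
    fn init(&mut self) -> Result<(), String> { Ok(()) }
}
```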
While writing #90 I found myself seemingly abusing the Probe+Hook system.
It might be specific to the OVS module or it might be needed by other modules; that's something to discuss, but to me it was clear that some of the hooks I was attaching to some probes didn't need to be hooks.
The main use case was to add a small eBPF program that creates some context (and, say, stores it in a map) and maybe another program later on that clears it. These programs do not send events but need to share the map fd with the hook that will retrieve this context to enrich the event it sends.
Some open questions:
core infrastructure so we can centrally track what we are attaching where.
Once an event is retrieved and processed we can provide it to the user. No post-processing is done at this point, as we need all events for that and such things will be done offline.
Things to consider:
`--format "{timestamp} ksym: {ksym}"`; or simpler options such as `--show-field timestamp,ksym`. We shouldn't support both, though (for maintainability reasons).
For initial support, only a raw output to stdout might be possible. That is fine; if so, please split this issue.
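As a rough idea of what the `--format` variant could look like, here is a hand-rolled placeholder-substitution sketch (the field names and the flat string map are illustrative; a real implementation would pull values from the processed event):

```rust
use std::collections::HashMap;

// Substitute {field} placeholders with values looked up in the event fields.
// Unknown fields render as "?", unterminated placeholders are kept verbatim.
fn render(format: &str, fields: &HashMap<&str, String>) -> String {
    let mut out = String::new();
    let mut rest = format;
    while let Some(start) = rest.find('{') {
        out.push_str(&rest[..start]);
        match rest[start..].find('}') {
            Some(off) => {
                let key = &rest[start + 1..start + off];
                out.push_str(fields.get(key).map(String::as_str).unwrap_or("?"));
                rest = &rest[start + off + 1..];
            }
            None => {
                // No closing brace: emit the remainder as-is.
                out.push_str(&rest[start..]);
                return out;
            }
        }
    }
    out.push_str(rest);
    out
}
```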
This depends on #8.
A way to uniquely identify packets is required so we can reconstruct their life later on.
We need a collector hooking to OVS data/control path for gathering OVS specific information. The exact scope is yet to be defined.
This will allow tracking packets even if they are transformed (NAT, encapsulation, etc.).
Collectors should have a way to register cmdline arguments and to retrieve their values when the program starts.
A possible solution would be to use `clap`, with `Option<Vec<clap::Arg>>` when registering a collector and `Option<clap::ArgMatches>` as one of its `init()` arguments.
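A pure-Rust sketch of that flow. To stay self-contained, `Arg` and `ArgMatches` below are simplified stand-ins for `clap::Arg` / `clap::ArgMatches`, and the collector and argument names (`OvsCollector`, `ovs-socket`) are hypothetical:

```rust
use std::collections::HashMap;

// Simplified stand-ins for clap::Arg / clap::ArgMatches.
struct Arg { name: &'static str, help: &'static str }
struct ArgMatches(HashMap<&'static str, String>);

trait Collector {
    // Returned at registration time so the core can extend the cmdline.
    fn register_args(&self) -> Option<Vec<Arg>>;
    // Called when the program starts, with the parsed values.
    fn init(&mut self, matches: Option<&ArgMatches>);
}

struct OvsCollector { socket: Option<String> }
impl Collector for OvsCollector {
    fn register_args(&self) -> Option<Vec<Arg>> {
        Some(vec![Arg { name: "ovs-socket", help: "path to the OVS db socket" }])
    }
    fn init(&mut self, matches: Option<&ArgMatches>) {
        self.socket = matches.and_then(|m| m.0.get("ovs-socket").cloned());
    }
}
```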
A BTF library to parse and expose data types and functions. It must be able to read the BTF information from multiple sources, as we might need it for various targets (kernel, OVS, etc).
Using BPF_PROG_TYPE_SCHED_CLS. The specificity is that a qdisc needs to be provided for the eBPF program to be attached. This could require the user (or the tool) to attach the right kind of qdisc on some interfaces, which could modify the way the system works.
The data such probes would have access to is `struct __sk_buff`.
Could be in `CONTRIBUTING.md` and should contain pointers on how to contribute, what to check before the CI does, etc. Another aspect would be to write a small example (with explanations) on how to write a collector.
Please drop below raw information that should be part of it.
Kernel probes will implement a way for other modules to add eBPF hooks to parse extra arguments and augment the events. A good solution would be to use freplace and an XDP dispatcher like logic.
At the moment it seems `libbpf-rs` does not support freplace; extra work might be needed.
Currently we have to work around the fact that `libbpf-rs` does not mark certain objects (e.g. `Map` or `RingBuf`) as `Send`.
We should work with upstream `libbpf-rs` to add it.
We might want to hook to sockets and have early/late filtering on packets. This could allow better reconstructing a packet's lifetime in the Tx path, and getting extra information in Rx.
The corresponding BPF program types (BPF_PROG_TYPE_SK_MSG/SKB) have access to either `struct __sk_buff` or `struct sk_msg_md`. This should probably be split into two issues when assigned.
Collector that should fill the events with networking stack generic data (info about the skb, interfaces, netns, etc).
BPF probes access data in a probe-specific way, usually using a dedicated context structure. For hooks to safely access this data later on, an interface is required to both pass the data across hooks and to allow them to query for a specific structure or argument #.
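One possible shape for that interface, sketched in userspace Rust for illustration (the `ProbeContext`/`SkbAddr` names are assumptions): a type-indexed context that probes fill with whatever arguments they extracted, and that hooks query by type without knowing the probe-specific layout.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Type-indexed bag of probe arguments. A probe inserts whatever it
// extracted from its context; hooks query for the type they need.
#[derive(Default)]
struct ProbeContext {
    args: HashMap<TypeId, Box<dyn Any>>,
}

impl ProbeContext {
    fn insert<T: 'static>(&mut self, val: T) {
        self.args.insert(TypeId::of::<T>(), Box::new(val));
    }
    fn get<T: 'static>(&self) -> Option<&T> {
        self.args
            .get(&TypeId::of::<T>())
            .and_then(|b| b.downcast_ref::<T>())
    }
}

// Example argument a kprobe could provide: the raw skb pointer.
struct SkbAddr(u64);
```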
The current logic to replace hooks into loaded BPF objects uses `fexit` under the hood. As we can't for now use `fexit` on `fexit` functions, we do not support hooking to `fexit` probes.
We should investigate and see if there is a way to support this in some form. It would be handy for retrieving function retvals. One option would be to use `fexit` only for the retval retrieval while still allowing hooks to attach to that function using `kprobe`.
On the technical part, handling fexit probes dynamically should look like the logic we currently have for raw tracepoints.
The skb collector will be responsible for installing probes on functions/tracepoints that have an skb as one of their parameters. It won't process much by itself and will delegate event augmentation to other collectors (OVS, net, ...) by allowing them to provide hooks.
It should support kprobes, fexit and raw tracepoints at minimum.
Some of the skb internals are topics covered by dedicated issues and part of the logic might be shared with other collectors.
If the tool is started in an environment where there is no OVS daemon, it will report an error and always fail. We should let the tool continue working through those kinds of issues; otherwise the default `--collectors` option will often not work.
At the same time, not failing when we do expect OVS events would not be good. A solution might be to add another cli option to decide whether or not those kinds of issues are acceptable. This option could be used in profiles to keep the user experience OK. But there might be other solutions.
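A sketch of that behavior under the proposed cli option (the `strict` flag and the function shape are assumptions): collectors that fail to initialize are skipped unless strict mode is requested, in which case the failure is propagated.

```rust
// Given each collector's init result, keep the working ones.
// In strict mode (the hypothetical cli option), any failure is fatal.
fn filter_collectors(
    inits: Vec<(&'static str, Result<(), String>)>,
    strict: bool,
) -> Result<Vec<&'static str>, String> {
    let mut ok = Vec::new();
    for (name, res) in inits {
        match res {
            Ok(()) => ok.push(name),
            Err(e) if strict => return Err(format!("{name}: {e}")),
            Err(_) => (), // skipped, e.g. no OVS daemon running
        }
    }
    Ok(ok)
}
```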
Investigate and see if we can support a collector retrieving firewalling data. The use case would be for example to link a packet being dropped to an installed rule.
The tool could automatically report more user-formatted info on containers: `--container <id>`.
Support loading a hook to recompute the checksum of packets and report the result & all related info (if any).
Alternatively if #30 is supported this could probably come as an external BPF object.
CONFIG_DEBUG_INFO_BTF=y
CONFIG_DEBUG_INFO_BTF_MODULES=y
Both are not always available in major distros. For example, RHEL 8 does not set CONFIG_DEBUG_INFO_BTF_MODULES.
It might be useful to let the user specify a path to a list of raw BTF files.
We might want to be able to load raw-instruction programs (mostly for filtering).
Libbpf doesn't expose any wrapper for that, as it targets ELF files.
Implementing a small Rust module to do it seems to be the best option at the moment.
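A sketch of what such a module would build before handing the buffer to the `bpf(2)` syscall (the helper names are illustrative; the instruction layout follows the kernel's `struct bpf_insn`):

```rust
// 8-byte eBPF instruction, laid out like the kernel's `struct bpf_insn`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct BpfInsn {
    code: u8,
    regs: u8, // dst_reg in one nibble, src_reg in the other
    off: i16,
    imm: i32,
}

const BPF_ALU64_MOV_K: u8 = 0xb7; // BPF_ALU64 | BPF_MOV | BPF_K
const BPF_JMP_EXIT: u8 = 0x95; // BPF_JMP | BPF_EXIT

fn mov64_imm(dst: u8, imm: i32) -> BpfInsn {
    BpfInsn { code: BPF_ALU64_MOV_K, regs: dst & 0x0f, off: 0, imm }
}

fn exit_insn() -> BpfInsn {
    BpfInsn { code: BPF_JMP_EXIT, regs: 0, off: 0, imm: 0 }
}

// Minimal "return 0" program; a loader would pass this buffer to
// bpf(BPF_PROG_LOAD, ...) through libc (requires privileges).
fn return_zero_prog() -> Vec<BpfInsn> {
    vec![mov64_imm(0, 0), exit_insn()]
}
```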
Instead of letting users find out about all the cmdline arguments they should use in a given situation (which collectors to use, where to probe, what extra data should be retrieved, etc.), they could use profiles. Profiles would be a set of cmdline options for a specific use case, such as "let's inspect the TCP stack".
As discussed in the initial proposal for #10 profiles should reuse the cmdline parsing logic. There are however things to consider:
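Since profiles should reuse the cmdline parsing logic, one simple model is to treat a profile as a predefined argument list merged before the user's own arguments, so user options can override the profile's. A sketch (binary and option names are illustrative):

```rust
// Expand a profile into the argv handed to the normal cmdline parser.
// Profile args come first so later, user-provided args can override them.
fn apply_profile(profile: &[&str], user_args: &[String]) -> Vec<String> {
    let mut argv: Vec<String> = vec!["retis".into(), "collect".into()];
    argv.extend(profile.iter().map(|s| s.to_string()));
    argv.extend(user_args.iter().cloned());
    argv
}
```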
One of the key features is to match on packets. A solution is required both in the core tool (to accept user-provided filters and to model them) and in the collectors (to perform the actual matching).
Please split this issue into sub-ones if needed.
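As a toy model of the split described above (filter variants and packet fields are illustrative): the core models user-provided filters as data, and the matching side evaluates them against each packet.

```rust
// Core-side model of a user-provided filter.
enum Filter {
    Proto(u8),
    DstPort(u16),
}

// What the matching side sees for each packet (illustrative fields).
struct Packet {
    proto: u8,
    dst_port: u16,
}

// Collector-side matching: a packet passes if every filter matches.
fn match_filters(pkt: &Packet, filters: &[Filter]) -> bool {
    filters.iter().all(|f| match f {
        Filter::Proto(p) => pkt.proto == *p,
        Filter::DstPort(d) => pkt.dst_port == *d,
    })
}
```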
Allow the event-collecting part to run on multiple machines, with the generated events retrieved at a single point. For this to happen some kind of synchronization (including timestamps) and data passing is required.
For example, `trace-cmd` supports something similar.
Investigate and, if possible, implement a post-processing command to convert events into Python objects and let the user manipulate them in a launched Python interpreter.
Some things to consider:
Implement an error reporting mechanism (in a dedicated map?) for retrieving errors from BPF. This could for example be used to detect if the event map is full and an event is being ignored, or if we can't find an entry in a map for various reasons.
This issue might be split as someone starts working on it.
A solution is needed to report events from probes and to digest them into a known format (json?). Possible solutions are splitting the event reporting logic per-collector, or sharing a single, more generic one.
With events coming from different functions and subsystems for the same packets, we might be able to perform some latency measurements. This is however not a trivial subject, so a proper investigation is required.
Runtime discovery of what is running, in which version, etc might be handy for:
We should have a module dumping the conntrack every so often. This could give us the ability to:
We could support external BPF objects and load them into hooks. Those external objects could be useful to 1) have a collection of small utilities for users to load in addition to the core features 2) let users compile and provide their own hooks for finer inspection of the stack, as many debugging sessions end up looking for very specific information.
Things to consider:
Support an initial (default?) post-processing command which would group and reorder events based on (at least) the skb tracking data and the event timestamps. This will be quite handy to understand a packet life in the networking stack. Some kind of formatting might also be needed to provide a nice user interface.
Some options we might consider supporting:
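The grouping and reordering described above can be sketched as follows (the `Event` fields are illustrative; the real tracking data is whatever the skb tracking issue produces):

```rust
// Minimal post-processing model: group by the skb tracking id, then
// reorder each packet's events chronologically.
struct Event {
    tracking_id: u64,
    timestamp: u64,
    ksym: String,
}

fn sort_events(mut events: Vec<Event>) -> Vec<Event> {
    events.sort_by_key(|e| (e.tracking_id, e.timestamp));
    events
}
```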
An interface to manipulate kernel symbols exposed by /proc/kallsyms is required to convert symbol names to their addresses as well as the opposite.
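A minimal sketch of such an interface: /proc/kallsyms lines have the form `address type name`, so two maps give both lookup directions. (Error handling is simplified; lines that don't parse are skipped.)

```rust
use std::collections::HashMap;

// Parse kallsyms-formatted data into name -> address and address -> name maps.
fn parse_kallsyms(data: &str) -> (HashMap<String, u64>, HashMap<u64, String>) {
    let (mut by_name, mut by_addr) = (HashMap::new(), HashMap::new());
    for line in data.lines() {
        let mut it = line.split_whitespace();
        if let (Some(addr), Some(_ty), Some(name)) = (it.next(), it.next(), it.next()) {
            if let Ok(addr) = u64::from_str_radix(addr, 16) {
                by_name.insert(name.to_string(), addr);
                by_addr.insert(addr, name.to_string());
            }
        }
    }
    (by_name, by_addr)
}
```

In the real tool the input would come from reading `/proc/kallsyms` (addresses are zeroed there without enough privileges, which the interface would have to account for).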
A panic was observed while creating an OVS bridge with the tool already running.
Below is the trace:
RUST_BACKTRACE=full ./target/debug/packet-tracer collect -c ovs
18:01:52 [INFO] Attaching probe to usdt /usr/local/sbin/ovs-vswitchd:dpif_netlink_operate__:op_flow_execute
thread '<unnamed>' panicked at 'attempt to subtract with overflow', src/core/user/proc.rs:384:62
stack backtrace:
0: 0x55a2cf25d2c0 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hb280c2b0faedb192
1: 0x55a2cf27a93e - core::fmt::write::h30e0b7ef777337ad
2: 0x55a2cf25ac35 - std::io::Write::write_fmt::h86627e30c2b512b3
3: 0x55a2cf25d085 - std::sys_common::backtrace::print::h7ed0882ed869c236
4: 0x55a2cf25e90f - std::panicking::default_hook::{{closure}}::h9a127e13324a150a
5: 0x55a2cf25e64a - std::panicking::default_hook::hf8f07fa1688cedd2
6: 0x55a2cf25f008 - std::panicking::rust_panic_with_hook::he6d410a49c1deab2
7: 0x55a2cf25ed61 - std::panicking::begin_panic_handler::{{closure}}::h3a4af972edd4df52
8: 0x55a2cf25d76c - std::sys_common::backtrace::__rust_end_short_backtrace::h04151587e1857959
9: 0x55a2cf25eac2 - rust_begin_unwind
10: 0x55a2cef548d3 - core::panicking::panic_fmt::h5085b5d784b56c67
11: 0x55a2cef549ad - core::panicking::panic::h699f7acfe9b26bc1
12: 0x55a2cef89d1c - packet_tracer::core::user::proc::Process::get_note_from_symbol::h42392f1df81038e3
at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/user/proc.rs:384:62
13: 0x55a2cef99fef - packet_tracer::core::probe::user::user::register_unmarshaler::{{closure}}::h7174ef40370fcc82
at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/probe/user/user.rs:98:24
14: 0x55a2cefc75f5 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h0eaa2820cd54c2e2
at /builddir/build/BUILD/rustc-1.66.1-src/library/alloc/src/boxed.rs:2001:9
15: 0x55a2cefe6120 - packet_tracer::core::events::bpf::parse_raw_event::h299ad2fecfa5aede
at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/events/bpf.rs:262:25
16: 0x55a2ceffa733 - packet_tracer::core::events::bpf::BpfEvents::start_polling::{{closure}}::h94dc8d997d7c95c3
at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/events/bpf.rs:132:31
17: 0x55a2cf018e34 - libbpf_rs::ringbuf::RingBufferBuilder::call_sample_cb::h6d1234f729d02c26
at /home/pvalerio/.cargo/git/checkouts/libbpf-rs-a64433d6203387de/52ab250/libbpf-rs/src/ringbuf.rs:128:9
18: 0x55a2cf04cd31 - ringbuf_process_ring
at /home/pvalerio/.cargo/registry/src/github.com-1ecc6299db9ec823/libbpf-sys-1.0.4+v1.0.1/libbpf/src/ringbuf.c:231:11
19: 0x55a2cf04ce31 - ring_buffer__poll
at /home/pvalerio/.cargo/registry/src/github.com-1ecc6299db9ec823/libbpf-sys-1.0.4+v1.0.1/libbpf/src/ringbuf.c:288:9
20: 0x55a2cf018ee4 - libbpf_rs::ringbuf::RingBuffer::poll::h211593462a5b2144
at /home/pvalerio/.cargo/git/checkouts/libbpf-rs-a64433d6203387de/52ab250/libbpf-rs/src/ringbuf.rs:157:28
21: 0x55a2ceffacfc - packet_tracer::core::events::bpf::BpfEvents::start_polling::{{closure}}::h016bc39abb859420
at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/events/bpf.rs:158:17
22: 0x55a2cefa51a1 - std::sys_common::backtrace::__rust_begin_short_backtrace::hafb370250f6afa6b
at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/sys_common/backtrace.rs:121:18
23: 0x55a2cef78e01 - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::hc858349d835e0303
at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/thread/mod.rs:551:17
24: 0x55a2cefe3e51 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hdb80aa3a4e01895b
at /builddir/build/BUILD/rustc-1.66.1-src/library/core/src/panic/unwind_safe.rs:271:9
25: 0x55a2cef8bd11 - std::panicking::try::do_call::hfc928c58770113b0
at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/panicking.rs:483:40
26: 0x55a2cef8bebb - __rust_try
27: 0x55a2cef8ba5f - std::panicking::try::hc2c9b75d3499bfe0
at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/panicking.rs:447:19
28: 0x55a2cef7c401 - std::panic::catch_unwind::h72fc8bbca879c25b
at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/panic.rs:137:14
29: 0x55a2cef7872c - std::thread::Builder::spawn_unchecked_::{{closure}}::h6537250acbf18c1c
at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/thread/mod.rs:550:30
30: 0x55a2cefea8ee - core::ops::function::FnOnce::call_once{{vtable.shim}}::h09a1881c17915317
at /builddir/build/BUILD/rustc-1.66.1-src/library/core/src/ops/function.rs:251:5
31: 0x55a2cf261b53 - std::sys::unix::thread::Thread::new::thread_start::hfad602368217ab7c
32: 0x7f3686b8e12d - start_thread
33: 0x7f3686c0fbc0 - clone3
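The panic is an unchecked integer subtraction overflowing in `get_note_from_symbol` (frame 12, proc.rs:384). A hedged sketch of the defensive fix: use `checked_sub` and return an error instead of panicking. (The function and variable names below are illustrative, not the actual proc.rs code.)

```rust
// Compute an offset relative to a base address without panicking when
// the address unexpectedly falls below the base.
fn symbol_offset(addr: u64, base: u64) -> Result<u64, String> {
    addr.checked_sub(base)
        .ok_or_else(|| format!("symbol address {addr:#x} below base {base:#x}"))
}
```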
Allow users to provide their own BPF object file and load it as a hook in the probes. For this to work the following topics need to be covered:
Before releasing, let's do a UX review and check cmdline options, help, documentation, consistency, etc.
We should have at least a starting page with pointers and examples on how to write collectors and hooks. That will be required to allow easier external contributions. If we support external hooks, the hook documentation might also be used for that (see #30).
There's a chance we'll end up having a decent number of options; adding a bash completion file could be a nice-to-have.
Once filtering is in place, we need a way to verify the correctness of the generated programs.