retis-org / retis
Tracing packets in the Linux networking stack & friends
Using uprobes; could be used for OVS-DPDK.
Runtime tests[1] are currently skipped by default but could be run in a controlled VM. We skip them for now because they require privileged capabilities.
[1] # cargo test --features=test_cap_bpf
While working on #62 we added a `Cache` to the `Unmarshaler`s. The whole idea is to allow `Unmarshaler`s to keep some state.
I tried to implement it in a more natural way, using a struct: having `Unmarshaler` be a trait and providing a default implementation of that trait for `Fn ...`. It works nicely except for one (big) problem: the list of unmarshalers is sent to a different thread while being updated (registered), so it must be sent as immutable.
We should refactor this code into a two-step process so we can move ownership of the entire `Unmarshaler` list to the unmarshaler thread, letting them be turned into stateful structs.
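A minimal sketch of the proposed two-step process (all names here — `Unmarshaler`, `CountingUnmarshaler`, the `u8` keys — are illustrative assumptions, not the actual retis types): registration happens while the caller still owns the list, then ownership of the whole list moves into the event thread, so the unmarshalers can be mutable, stateful structs.

```rust
use std::collections::HashMap;
use std::thread;

// Hypothetical trait; &mut self is what allows unmarshalers to keep state.
trait Unmarshaler: Send {
    fn unmarshal(&mut self, raw: &[u8]) -> String;
}

// Example of a stateful unmarshaler (e.g. it could hold a cache instead).
struct CountingUnmarshaler { seen: u64 }
impl Unmarshaler for CountingUnmarshaler {
    fn unmarshal(&mut self, raw: &[u8]) -> String {
        self.seen += 1;
        format!("event #{} ({} bytes)", self.seen, raw.len())
    }
}

// Step 1 happens before this call: the caller builds (registers) the full
// list while it still owns it. Step 2: ownership of the entire list moves
// to the event thread, so no shared mutable state is needed.
fn run(mut unmarshalers: HashMap<u8, Box<dyn Unmarshaler>>) -> Vec<String> {
    thread::spawn(move || {
        let mut out = Vec::new();
        for raw in [&b"abc"[..], &b"defg"[..]] {
            if let Some(u) = unmarshalers.get_mut(&1) {
                out.push(u.unmarshal(raw));
            }
        }
        out
    })
    .join()
    .unwrap()
}
```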
It could be convenient to show a backtrace for a given packet.
This needs some investigation first.
Probably not every probe that matches the packet should generate events (a symbol whitelist might make sense).
We use `log` as our logging API, but on its own the logging implementation is a no-op. We need to select, configure and use one of the logging implementations to actually show the logs.
As the number of events increases we will drop events at some point. One typical place where events will be dropped is when reserving ringbuf space. Reporting this will give users a hint that they might want to increase the ringbuf size (which should be an option).
A possible implementation could be to create a map that can be indexed by probes (to know which events were lost) and whose values are increased when we fail to allocate ring buffer space or hooks return errors.
When interrupted, the ProbeManager should read that map and report its contents.
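A userspace-side sketch of the per-probe drop accounting described above. In the real tool the counters would live in a BPF map indexed by probe and be bumped from BPF when a ringbuf reservation fails or a hook errors out; the plain `HashMap` below is just an illustrative model of the reporting side.

```rust
use std::collections::HashMap;

// Illustrative model: per-probe counters of lost events.
#[derive(Default)]
struct DropStats {
    per_probe: HashMap<String, u64>,
}

impl DropStats {
    // Bumped when ringbuf reservation fails or a hook returns an error.
    fn record_drop(&mut self, probe: &str) {
        *self.per_probe.entry(probe.to_string()).or_insert(0) += 1;
    }

    // What the ProbeManager would report when interrupted.
    fn report(&self) -> Vec<String> {
        let mut lines: Vec<String> = self
            .per_probe
            .iter()
            .map(|(p, n)| format!("{p}: {n} event(s) lost"))
            .collect();
        lines.sort();
        lines
    }
}
```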
An interface for collectors to implement is needed so we can drive them in batches, as well as a way to register them to a group.
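One possible shape for such an interface, as a sketch (the `Collector`/`Group` names and method set are assumptions, not the retis API): collectors implement a common trait, register into a group, and the group drives each phase across all of them in one batch.

```rust
// Hypothetical collector interface.
trait Collector {
    fn name(&self) -> &'static str;
    fn init(&mut self) -> Result<(), String>;
}

// A group owns its registered collectors and drives them in batches.
#[derive(Default)]
struct Group {
    collectors: Vec<Box<dyn Collector>>,
}

impl Group {
    fn register(&mut self, c: Box<dyn Collector>) {
        self.collectors.push(c);
    }

    // Drive the init phase for every registered collector at once.
    fn init_all(&mut self) -> Vec<(&'static str, Result<(), String>)> {
        self.collectors
            .iter_mut()
            .map(|c| (c.name(), c.init()))
            .collect()
    }
}

// Example collector for demonstration purposes.
struct SkbCollector;
impl Collector for SkbCollector {
    fn name(&self) -> &'static str { "skb" }
    fn init(&mut self) -> Result<(), String> { Ok(()) }
}
```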
While writing #90 I found myself seemingly abusing the Probe+Hook system.
It might be specific to the OVS module or it might be needed by other modules; that's something to discuss, but to me it was clear that some of the hooks I was attaching to some probes didn't need to be hooks.
The main use case was to add a small eBPF program that creates some context (and, say, stores it in a map) and maybe another program later on that clears it. These programs do not send events but need to share the map fd with the hook that will retrieve this context to enrich the event it sends.
Some open questions:
core infrastructure so we can centrally track what we are attaching where.
Once an event is retrieved and processed we can provide it to the user. No post-processing is done at this point, as we need all events for that and such things will be done offline.
Things to consider:
`--format "{timestamp} ksym: {ksym}"`; or simpler options such as `--show-field timestamp,ksym`. We shouldn't support both, though (for maintainability reasons).
For initial support, only a raw output to stdout might be possible. That is fine; if so, please split this issue.
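As a rough idea of what the `--format` variant could look like, here is a hand-rolled placeholder-substitution sketch (the field names and the flat string map are illustrative; a real implementation would pull values from the processed event):

```rust
use std::collections::HashMap;

// Substitute {field} placeholders with values looked up in the event fields.
// Unknown fields render as "?", unterminated placeholders are kept verbatim.
fn render(format: &str, fields: &HashMap<&str, String>) -> String {
    let mut out = String::new();
    let mut rest = format;
    while let Some(start) = rest.find('{') {
        out.push_str(&rest[..start]);
        match rest[start..].find('}') {
            Some(off) => {
                let key = &rest[start + 1..start + off];
                out.push_str(fields.get(key).map(String::as_str).unwrap_or("?"));
                rest = &rest[start + off + 1..];
            }
            None => {
                // No closing brace: emit the remainder as-is.
                out.push_str(&rest[start..]);
                return out;
            }
        }
    }
    out.push_str(rest);
    out
}
```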
This depends on #8.
A way to uniquely identify packets is required so we can reconstruct their life later on.
We need a collector hooking to OVS data/control path for gathering OVS specific information. The exact scope is yet to be defined.
This will allow tracking packets even if they are transformed (NAT, encapsulation, etc.).
Collectors should have a way to register cmdline arguments and to retrieve their values when the program starts.
A possible solution would be to use `clap`, with `Option<Vec<clap::Arg>>` when registering a collector and `Option<clap::ArgMatches>` as one of its `init()` arguments.
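A pure-Rust sketch of that flow. To stay self-contained, `Arg` and `ArgMatches` below are simplified stand-ins for `clap::Arg` / `clap::ArgMatches`, and the collector and argument names (`OvsCollector`, `ovs-socket`) are hypothetical:

```rust
use std::collections::HashMap;

// Simplified stand-ins for clap::Arg / clap::ArgMatches.
struct Arg { name: &'static str, help: &'static str }
struct ArgMatches(HashMap<&'static str, String>);

trait Collector {
    // Returned at registration time so the core can extend the cmdline.
    fn register_args(&self) -> Option<Vec<Arg>>;
    // Called when the program starts, with the parsed values.
    fn init(&mut self, matches: Option<&ArgMatches>);
}

struct OvsCollector { socket: Option<String> }
impl Collector for OvsCollector {
    fn register_args(&self) -> Option<Vec<Arg>> {
        Some(vec![Arg { name: "ovs-socket", help: "path to the OVS db socket" }])
    }
    fn init(&mut self, matches: Option<&ArgMatches>) {
        self.socket = matches.and_then(|m| m.0.get("ovs-socket").cloned());
    }
}
```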
A BTF library to parse and expose data types and functions. It must be able to read the BTF information from multiple sources, as we might need it for various targets (kernel, OVS, etc).
Using BPF_PROG_TYPE_SCHED_CLS. The specificity is that a qdisc needs to be provided for the eBPF program to be attached. This could require the user (or the tool) to attach the right kind of qdisc on some interfaces, which could modify the way the system works.
The data such probes would have access to is `struct __sk_buff`.
Could be in `CONTRIBUTING.md` and should contain pointers on how to contribute, what to check before the CI does, etc. Another aspect would be to write a small example (with explanations) on how to write a collector.
Please drop below raw information that should be part of it.
Kernel probes will implement a way for other modules to add eBPF hooks to parse extra arguments and augment the events. A good solution would be to use freplace and an XDP dispatcher like logic.
At the moment it seems `libbpf-rs` does not support freplace; extra work might be needed.
Currently we have to work around the fact that `libbpf-rs` does not mark certain objects (e.g. `Map` or `RingBuf`) as `Send`.
We should work with upstream `libbpf-rs` to add it.
We might want to hook to sockets and have early/late filtering on packets. This could allow better reconstructing a packet's lifetime in the Tx path, and getting extra information in Rx.
The corresponding BPF program types (BPF_PROG_TYPE_SK_MSG/SKB) have access to either `struct __sk_buff` or `struct sk_msg_md`. This should probably be split into two issues when assigned.
Collector that should fill the events with networking stack generic data (info about the skb, interfaces, netns, etc).
BPF probes access data in a probe-specific way, usually using a dedicated context structure. For hooks to safely access this data later on, an interface is required to both pass the data across hooks and to allow them to query for a specific structure or argument #.
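One possible shape for that interface, sketched in userspace Rust for illustration (the `ProbeContext`/`SkbAddr` names are assumptions): a type-indexed context that probes fill with whatever arguments they extracted, and that hooks query by type without knowing the probe-specific layout.

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

// Type-indexed bag of probe arguments. A probe inserts whatever it
// extracted from its context; hooks query for the type they need.
#[derive(Default)]
struct ProbeContext {
    args: HashMap<TypeId, Box<dyn Any>>,
}

impl ProbeContext {
    fn insert<T: 'static>(&mut self, val: T) {
        self.args.insert(TypeId::of::<T>(), Box::new(val));
    }
    fn get<T: 'static>(&self) -> Option<&T> {
        self.args
            .get(&TypeId::of::<T>())
            .and_then(|b| b.downcast_ref::<T>())
    }
}

// Example argument a kprobe could provide: the raw skb pointer.
struct SkbAddr(u64);
```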
The current logic to replace hooks into loaded BPF objects uses `fexit` under the hood. As we can't for now use `fexit` on `fexit` functions, we do not support hooking to `fexit` probes.
We should investigate and see if there is a way to support this in some form. It would be handy for retrieving function retvals. One option would be to use `fexit` only for the retval retrieval while still allowing hooks to attach to that function using `kprobe`.
On the technical part, handling fexit probes dynamically should look like the logic we currently have for raw tracepoints.
The skb collector will be responsible for installing probes on functions/tracepoints that have an skb as one of their parameters. It won't process much by itself and will delegate event augmentation to other collectors (OVS, net, ...) by allowing them to provide hooks.
It should support kprobes, fexit and raw tracepoints at minimum.
Some of the skb internals are topics covered by dedicated issues and part of the logic might be shared with other collectors.
If the tool is started in an environment where there is no OVS daemon, it will report an error and always fail. We should let the tool continue working through those kinds of issues; otherwise the default `--collectors` option will often not work.
At the same time, not failing when we do expect OVS events would not be good. A solution might be to add another cli option to decide whether or not those kinds of issues are acceptable. This option could be used in profiles to keep the user experience OK. But there might be other solutions.
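A sketch of that behavior under the proposed cli option (the `strict` flag and the function shape are assumptions): collectors that fail to initialize are skipped unless strict mode is requested, in which case the failure is propagated.

```rust
// Given each collector's init result, keep the working ones.
// In strict mode (the hypothetical cli option), any failure is fatal.
fn filter_collectors(
    inits: Vec<(&'static str, Result<(), String>)>,
    strict: bool,
) -> Result<Vec<&'static str>, String> {
    let mut ok = Vec::new();
    for (name, res) in inits {
        match res {
            Ok(()) => ok.push(name),
            Err(e) if strict => return Err(format!("{name}: {e}")),
            Err(_) => (), // skipped, e.g. no OVS daemon running
        }
    }
    Ok(ok)
}
```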
Investigate and see if we can support a collector retrieving firewalling data. The use case would be for example to link a packet being dropped to an installed rule.
The tool could automatically report more user-formatted info on containers: `--container <id>`.
Support loading a hook to recompute the checksum of packets and report the result & all related info (if any).
Alternatively if #30 is supported this could probably come as an external BPF object.
CONFIG_DEBUG_INFO_BTF=y
CONFIG_DEBUG_INFO_BTF_MODULES=y
Both are not always available in major distros. For example, RHEL 8 does not set CONFIG_DEBUG_INFO_BTF_MODULES.
It might be useful to let the user specify a path to a list of raw BTF files.
We might want to be able to load raw-instruction programs (mostly for filtering).
Libbpf doesn't expose any wrapper for that, as it targets ELF files.
Implementing a small Rust module to do it seems to be the best option at the moment.
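A sketch of what such a module would build before handing the buffer to the `bpf(2)` syscall (the helper names are illustrative; the instruction layout follows the kernel's `struct bpf_insn`):

```rust
// 8-byte eBPF instruction, laid out like the kernel's `struct bpf_insn`.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct BpfInsn {
    code: u8,
    regs: u8, // dst_reg in one nibble, src_reg in the other
    off: i16,
    imm: i32,
}

const BPF_ALU64_MOV_K: u8 = 0xb7; // BPF_ALU64 | BPF_MOV | BPF_K
const BPF_JMP_EXIT: u8 = 0x95; // BPF_JMP | BPF_EXIT

fn mov64_imm(dst: u8, imm: i32) -> BpfInsn {
    BpfInsn { code: BPF_ALU64_MOV_K, regs: dst & 0x0f, off: 0, imm }
}

fn exit_insn() -> BpfInsn {
    BpfInsn { code: BPF_JMP_EXIT, regs: 0, off: 0, imm: 0 }
}

// Minimal "return 0" program; a loader would pass this buffer to
// bpf(BPF_PROG_LOAD, ...) through libc (requires privileges).
fn return_zero_prog() -> Vec<BpfInsn> {
    vec![mov64_imm(0, 0), exit_insn()]
}
```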
Instead of letting users find out about all the cmdline arguments they should use in a given situation (which collectors to use, where to probe, what extra data should be retrieved, etc.), they could use profiles. Profiles would be a set of cmdline options for a specific use case, such as "let's inspect the TCP stack".
As discussed in the initial proposal for #10 profiles should reuse the cmdline parsing logic. There are however things to consider:
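Since profiles should reuse the cmdline parsing logic, one simple model is to treat a profile as a predefined argument list merged before the user's own arguments, so user options can override the profile's. A sketch (binary and option names are illustrative):

```rust
// Expand a profile into the argv handed to the normal cmdline parser.
// Profile args come first so later, user-provided args can override them.
fn apply_profile(profile: &[&str], user_args: &[String]) -> Vec<String> {
    let mut argv: Vec<String> = vec!["retis".into(), "collect".into()];
    argv.extend(profile.iter().map(|s| s.to_string()));
    argv.extend(user_args.iter().cloned());
    argv
}
```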
One of the key features is to match on packets. A solution is required both in the core tool (to accept user-provided filters and to model them) and in the collectors (to perform the actual matching).
Please split this issue into sub-ones if needed.
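As a toy model of the split described above (filter variants and packet fields are illustrative): the core models user-provided filters as data, and the matching side evaluates them against each packet.

```rust
// Core-side model of a user-provided filter.
enum Filter {
    Proto(u8),
    DstPort(u16),
}

// What the matching side sees for each packet (illustrative fields).
struct Packet {
    proto: u8,
    dst_port: u16,
}

// Collector-side matching: a packet passes if every filter matches.
fn match_filters(pkt: &Packet, filters: &[Filter]) -> bool {
    filters.iter().all(|f| match f {
        Filter::Proto(p) => pkt.proto == *p,
        Filter::DstPort(d) => pkt.dst_port == *d,
    })
}
```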
Allow the event-collecting part to run on multiple machines, with the generated events retrieved at a single point. For this to happen some kind of synchronization (including timestamps) and data passing is required.
For example, `trace-cmd` supports something similar.
Investigate and, if possible, implement a post-processing command to convert events into Python objects and let the user manipulate them in a launched Python interpreter.
Some things to consider:
Implement an error reporting mechanism (in a dedicated map?) for retrieving errors from BPF. This could for example be used to detect if the event map is full and an event is being ignored, or if we can't find an entry in a map for various reasons.
This issue might be split as someone starts working on it.
A solution is needed to report events from probes and to digest them into a known format (json?). Possible solutions are splitting the event reporting logic per-collector, or sharing a single, more generic one.
With events coming from different functions and subsystems for the same packets, we might be able to perform some latency measurements. This is however not a trivial subject, so a proper investigation is required.
Runtime discovery of what is running, in which version, etc might be handy for:
We should have a module dumping the conntrack every so often. This could give us the ability to:
We could support external BPF objects and load them into hooks. Those external objects could be useful to 1) have a collection of small utilities for users to load in addition to the core features 2) let users compile and provide their own hooks for finer inspection of the stack, as many debugging sessions end up looking for very specific information.
Things to consider:
Support an initial (default?) post-processing command which would group and reorder events based on (at least) the skb tracking data and the event timestamps. This will be quite handy to understand a packet life in the networking stack. Some kind of formatting might also be needed to provide a nice user interface.
Some options we might consider supporting:
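The grouping and reordering described above can be sketched as follows (the `Event` fields are illustrative; the real tracking data is whatever the skb tracking issue produces):

```rust
// Minimal post-processing model: group by the skb tracking id, then
// reorder each packet's events chronologically.
struct Event {
    tracking_id: u64,
    timestamp: u64,
    ksym: String,
}

fn sort_events(mut events: Vec<Event>) -> Vec<Event> {
    events.sort_by_key(|e| (e.tracking_id, e.timestamp));
    events
}
```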
An interface to manipulate kernel symbols exposed by /proc/kallsyms is required to convert symbol names to their addresses as well as the opposite.
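A minimal sketch of such an interface: /proc/kallsyms lines have the form `address type name`, so two maps give both lookup directions. (Error handling is simplified; lines that don't parse are skipped.)

```rust
use std::collections::HashMap;

// Parse kallsyms-formatted data into name -> address and address -> name maps.
fn parse_kallsyms(data: &str) -> (HashMap<String, u64>, HashMap<u64, String>) {
    let (mut by_name, mut by_addr) = (HashMap::new(), HashMap::new());
    for line in data.lines() {
        let mut it = line.split_whitespace();
        if let (Some(addr), Some(_ty), Some(name)) = (it.next(), it.next(), it.next()) {
            if let Ok(addr) = u64::from_str_radix(addr, 16) {
                by_name.insert(name.to_string(), addr);
                by_addr.insert(addr, name.to_string());
            }
        }
    }
    (by_name, by_addr)
}
```

In the real tool the input would come from reading `/proc/kallsyms` (addresses are zeroed there without enough privileges, which the interface would have to account for).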
A panic was observed while creating an OVS bridge with the tool already running.
Below is the trace:
RUST_BACKTRACE=full ./target/debug/packet-tracer collect -c ovs
18:01:52 [INFO] Attaching probe to usdt /usr/local/sbin/ovs-vswitchd:dpif_netlink_operate__:op_flow_execute
thread '<unnamed>' panicked at 'attempt to subtract with overflow', src/core/user/proc.rs:384:62
stack backtrace:
0: 0x55a2cf25d2c0 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hb280c2b0faedb192
1: 0x55a2cf27a93e - core::fmt::write::h30e0b7ef777337ad
2: 0x55a2cf25ac35 - std::io::Write::write_fmt::h86627e30c2b512b3
3: 0x55a2cf25d085 - std::sys_common::backtrace::print::h7ed0882ed869c236
4: 0x55a2cf25e90f - std::panicking::default_hook::{{closure}}::h9a127e13324a150a
5: 0x55a2cf25e64a - std::panicking::default_hook::hf8f07fa1688cedd2
6: 0x55a2cf25f008 - std::panicking::rust_panic_with_hook::he6d410a49c1deab2
7: 0x55a2cf25ed61 - std::panicking::begin_panic_handler::{{closure}}::h3a4af972edd4df52
8: 0x55a2cf25d76c - std::sys_common::backtrace::__rust_end_short_backtrace::h04151587e1857959
9: 0x55a2cf25eac2 - rust_begin_unwind
10: 0x55a2cef548d3 - core::panicking::panic_fmt::h5085b5d784b56c67
11: 0x55a2cef549ad - core::panicking::panic::h699f7acfe9b26bc1
12: 0x55a2cef89d1c - packet_tracer::core::user::proc::Process::get_note_from_symbol::h42392f1df81038e3
at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/user/proc.rs:384:62
13: 0x55a2cef99fef - packet_tracer::core::probe::user::user::register_unmarshaler::{{closure}}::h7174ef40370fcc82
at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/probe/user/user.rs:98:24
14: 0x55a2cefc75f5 - <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call::h0eaa2820cd54c2e2
at /builddir/build/BUILD/rustc-1.66.1-src/library/alloc/src/boxed.rs:2001:9
15: 0x55a2cefe6120 - packet_tracer::core::events::bpf::parse_raw_event::h299ad2fecfa5aede
at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/events/bpf.rs:262:25
16: 0x55a2ceffa733 - packet_tracer::core::events::bpf::BpfEvents::start_polling::{{closure}}::h94dc8d997d7c95c3
at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/events/bpf.rs:132:31
17: 0x55a2cf018e34 - libbpf_rs::ringbuf::RingBufferBuilder::call_sample_cb::h6d1234f729d02c26
at /home/pvalerio/.cargo/git/checkouts/libbpf-rs-a64433d6203387de/52ab250/libbpf-rs/src/ringbuf.rs:128:9
18: 0x55a2cf04cd31 - ringbuf_process_ring
at /home/pvalerio/.cargo/registry/src/github.com-1ecc6299db9ec823/libbpf-sys-1.0.4+v1.0.1/libbpf/src/ringbuf.c:231:11
19: 0x55a2cf04ce31 - ring_buffer__poll
at /home/pvalerio/.cargo/registry/src/github.com-1ecc6299db9ec823/libbpf-sys-1.0.4+v1.0.1/libbpf/src/ringbuf.c:288:9
20: 0x55a2cf018ee4 - libbpf_rs::ringbuf::RingBuffer::poll::h211593462a5b2144
at /home/pvalerio/.cargo/git/checkouts/libbpf-rs-a64433d6203387de/52ab250/libbpf-rs/src/ringbuf.rs:157:28
21: 0x55a2ceffacfc - packet_tracer::core::events::bpf::BpfEvents::start_polling::{{closure}}::h016bc39abb859420
at /home/pvalerio/workspace/open_source/github/net-trace/packet-tracer-vlrpl/src/core/events/bpf.rs:158:17
22: 0x55a2cefa51a1 - std::sys_common::backtrace::__rust_begin_short_backtrace::hafb370250f6afa6b
at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/sys_common/backtrace.rs:121:18
23: 0x55a2cef78e01 - std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}::hc858349d835e0303
at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/thread/mod.rs:551:17
24: 0x55a2cefe3e51 - <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hdb80aa3a4e01895b
at /builddir/build/BUILD/rustc-1.66.1-src/library/core/src/panic/unwind_safe.rs:271:9
25: 0x55a2cef8bd11 - std::panicking::try::do_call::hfc928c58770113b0
at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/panicking.rs:483:40
26: 0x55a2cef8bebb - __rust_try
27: 0x55a2cef8ba5f - std::panicking::try::hc2c9b75d3499bfe0
at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/panicking.rs:447:19
28: 0x55a2cef7c401 - std::panic::catch_unwind::h72fc8bbca879c25b
at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/panic.rs:137:14
29: 0x55a2cef7872c - std::thread::Builder::spawn_unchecked_::{{closure}}::h6537250acbf18c1c
at /builddir/build/BUILD/rustc-1.66.1-src/library/std/src/thread/mod.rs:550:30
30: 0x55a2cefea8ee - core::ops::function::FnOnce::call_once{{vtable.shim}}::h09a1881c17915317
at /builddir/build/BUILD/rustc-1.66.1-src/library/core/src/ops/function.rs:251:5
31: 0x55a2cf261b53 - std::sys::unix::thread::Thread::new::thread_start::hfad602368217ab7c
32: 0x7f3686b8e12d - start_thread
33: 0x7f3686c0fbc0 - clone3
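The panic is an unchecked integer subtraction overflowing in `get_note_from_symbol` (frame 12, proc.rs:384). A hedged sketch of the defensive fix: use `checked_sub` and return an error instead of panicking. (The function and variable names below are illustrative, not the actual proc.rs code.)

```rust
// Compute an offset relative to a base address without panicking when
// the address unexpectedly falls below the base.
fn symbol_offset(addr: u64, base: u64) -> Result<u64, String> {
    addr.checked_sub(base)
        .ok_or_else(|| format!("symbol address {addr:#x} below base {base:#x}"))
}
```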
Allow users to provide their own BPF object file and load it as a hook in the probes. For this to work the following topics need to be covered:
Before releasing, let's do a UX review and check cmdline options, help, documentation, consistency, etc.
We should have at least a starting page with pointers and examples on how to write collectors and hooks. That will be required to allow easier external contributions. If we support external hooks, the hook documentation might also be used for that (see #30).
There's a chance we'll end up having a decent number of options; adding a bash completion file could be a nice-to-have.
Once filtering is in place, we need a way to verify the correctness of the generated programs.