nyurik commented on June 8, 2024

Thanks for the in-depth info! For the past few days I have been massively rewriting Lurk (hope it gets merged :) ), and you can see the more "syscalls-based" approach here.

There is still a lot to do, and working with a well-understood, small-scoped lib like safeptrace would be awesome! Please publish :)

At this stage, I am not looking to build a full decoder for each syscall, but some basics like a (signed/unsigned) int, a bool, a string, and an address would be an awesome start. Eventually it might be fun to have auto-generated structs, but ... baby steps. So the steps above look reasonable, and I will see how this can be better implemented as an opt-in feature of your crate.

from syscalls.

jasonwhite commented on June 8, 2024

I saw lurk posted on /r/rust a while back. Looks exciting! I've actually got quite a lot of experience with what you're doing. ;)

Regarding syscall "groups": See #20. Defining sets of syscalls should be relatively easy. I would just copy what strace has done (see links in that issue). To keep the core of this library small, I'd also put it behind a feature flag that is off by default.

I'm also the author/maintainer of Reverie, where I also implemented a toy version of strace that is able to pretty-print the arguments. The mapping of syscall numbers to argument types is defined here. However, it's a crap-ton of code and I wouldn't do it that way again because, like you said, it is architecture-specific and it is hard to maintain. It only supports x86-64 and aarch64 right now. I can probably get reverie-syscalls published on crates.io if you wish.

Instead, because Linux is the source of truth and contains all the information you need, what I would do is this:

  1. Compile Linux for your target architecture with debug symbols on. The result is a vmlinux ELF file. Some distros include debug symbols in their compiled Linux at /boot/vmlinux-*, but this only covers one architecture.
  2. Using goblin or a similar ELF parsing library, find the syscall table symbol (sys_call_table). This is an array of syscall functions. Whenever Linux traps the syscall instruction, it calls the function with something like sys_call_table[sysno](args...).
  3. Conveniently, Linux also stores metadata on all syscalls (because BPF needs it). This metadata includes all of the argument names and types of every syscall.
  4. Now, just because we know the C types for all syscall arguments, it doesn't mean we can do much with it in the Rust world. For this, we can define a simple mapping to get the Rust version of the argument type. For example, unsigned char* becomes *mut u8, const int* becomes *const i32, struct statx* becomes *mut libc::statx, etc. This can convert the majority of argument types, but there are some cases where libc doesn't have the equivalent type. For those, you can just map it to *mut libc::c_void. However, if you really want the full definition of the struct, it can be found in the debug info as well. You'd just need to search transitively through the struct/union definitions.
  5. Using your newfound powers, you can generate the equivalent Rust code in whatever format you want, deriving the pretty-printing of the argument types. There are some very tricky corner cases, like figuring out what the possible request/response types are for ioctl. If you parse deeply enough, I think this can be derived as well.
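
As a rough illustration of the type mapping in step 4, here is a minimal sketch that works purely on type names as strings. The helper names (c_type_to_rust, base_type_to_rust), the KNOWN_LIBC_STRUCTS list, and the choice of usize for unknown by-value types are assumptions made for this example, not part of any existing crate. It also skips a `__user` annotation if the input happens to carry one, and it gives up on pointer-to-pointer types by falling back to *mut libc::c_void.

```rust
/// Structs assumed (for this sketch) to have a definition in the `libc` crate.
const KNOWN_LIBC_STRUCTS: &[&str] = &["stat", "statx", "timespec", "sockaddr"];

/// Map a non-pointer C type to a Rust type name, if this sketch knows it.
fn base_type_to_rust(base: &str) -> Option<String> {
    let name = match base {
        "char" => "libc::c_char",
        "unsigned char" => "u8",
        "int" => "i32",
        "unsigned int" => "u32",
        "long" => "libc::c_long",
        "unsigned long" => "libc::c_ulong",
        "size_t" => "usize",
        "void" => "libc::c_void",
        _ => {
            // `struct foo` becomes `libc::foo` only when listed as known.
            let name = base.strip_prefix("struct ")?;
            return KNOWN_LIBC_STRUCTS
                .contains(&name)
                .then(|| format!("libc::{}", name));
        }
    };
    Some(name.to_string())
}

/// Translate a C argument type name into a Rust type name (hypothetical helper).
fn c_type_to_rust(c_type: &str) -> String {
    // Make every `*` its own token and drop a `__user` annotation if present.
    let spaced = c_type.replace('*', " * ");
    let mut tokens: Vec<&str> = spaced
        .split_whitespace()
        .filter(|t| *t != "__user")
        .collect();

    if tokens.last() == Some(&"*") {
        tokens.pop();
        // Pointer-to-pointer is out of scope here: use the opaque fallback.
        if tokens.contains(&"*") {
            return "*mut libc::c_void".to_string();
        }
        // `const T *` becomes `*const T`; everything else becomes `*mut T`.
        let is_const = tokens.first() == Some(&"const");
        if is_const {
            tokens.remove(0);
        }
        let qualifier = if is_const { "const" } else { "mut" };
        // Unknown pointees map to `libc::c_void`, as suggested in step 4.
        let pointee = base_type_to_rust(&tokens.join(" "))
            .unwrap_or_else(|| "libc::c_void".to_string());
        return format!("*{} {}", qualifier, pointee);
    }

    // Unknown by-value types are treated as a raw register-sized value.
    base_type_to_rust(&tokens.join(" ")).unwrap_or_else(|| "usize".to_string())
}
```

With this table, c_type_to_rust("unsigned char*") gives "*mut u8" and c_type_to_rust("struct statx*") gives "*mut libc::statx", matching the examples in step 4. A real generator would take the struct list from the debug info or from a types crate rather than hard-coding it.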

This script does steps 1-3 and was used to generate the syscall table at https://jasonwhite.io/thing/syscalls/. Since this will likely generate a lot of code, I probably wouldn't put it in the syscalls library. It should probably go into its own crate.

FYI, I also wrote safeptrace. It helps to avoid shooting yourself in the foot with the ptrace API and also provides a very efficient async ptrace API. I've been meaning to get it published as a crate as well.

jasonwhite commented on June 8, 2024

For syscall categories/groups, I don't think it should go into the syscall_enum! macro because then I think you'd have to modify syscall-gen to spit out the categories. I think the following would be a reasonable approach.

In some non-arch specific place:

// All the possible categories. Could use `EnumSetType` from `enumset` here.
pub enum Categories {
    File,
    Descriptor,
    IPC,
    Memory,
    Creds,
    // ...
}

Since src/arch/x86_64.rs is generated, the categories could be put into src/arch/x86_64/categories.rs:

use crate::Categories;
use crate::Categories::*;

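// Sketch only: `Descriptor | File` assumes `Categories` is a set-like type
// (for example one derived with `EnumSetType`) that supports `|`.
static CATEGORIES: SysnoMap<Categories> = SysnoMap::new(&[
    (Sysno::read, Descriptor),
    (Sysno::open, Descriptor | File),
]);

impl Sysno {
    // Returns a copy of the entry, so `Categories` must be `Copy`.
    pub fn categories(&self) -> Categories {
        CATEGORIES[self]
    }
}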

Unfortunately, having a separate CATEGORIES definition for every architecture is a bit repetitive, as the syscall categories will be the same for every architecture. I have some thoughts on how to avoid this duplication, but I'll put that into another GitHub issue.

Edit: Added those thoughts in #30.
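
For reference, a dependency-free variant of the category idea can be sketched with plain bit flags. Everything here is an assumption for illustration: the three-variant Sysno is a stand-in for the crate's generated enum, the category assignments are examples rather than a copy of strace's tables, and a match is used in place of SysnoMap so that the snippet compiles on its own.

```rust
use std::ops::BitOr;

/// A set of syscall categories stored as bit flags.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Categories(u8);

impl Categories {
    pub const FILE: Categories = Categories(1 << 0);
    pub const DESCRIPTOR: Categories = Categories(1 << 1);
    pub const IPC: Categories = Categories(1 << 2);
    pub const MEMORY: Categories = Categories(1 << 3);
    pub const CREDS: Categories = Categories(1 << 4);

    /// Combine two category sets.
    pub const fn union(self, other: Categories) -> Categories {
        Categories(self.0 | other.0)
    }

    /// True if every category in `other` is also in `self`.
    pub const fn contains(self, other: Categories) -> bool {
        (self.0 & other.0) == other.0
    }
}

impl BitOr for Categories {
    type Output = Categories;

    fn bitor(self, rhs: Categories) -> Categories {
        self.union(rhs)
    }
}

/// Stand-in for the generated `Sysno` enum (hypothetical subset).
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Sysno {
    read,
    open,
    getuid,
}

impl Sysno {
    /// Categories for this syscall; the assignments are illustrative only.
    pub fn categories(self) -> Categories {
        match self {
            Sysno::read => Categories::DESCRIPTOR,
            Sysno::open => Categories::DESCRIPTOR | Categories::FILE,
            Sysno::getuid => Categories::CREDS,
        }
    }
}
```

A filter such as "only file-related syscalls" then becomes a single check, for example Sysno::open.categories().contains(Categories::FILE).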

nyurik commented on June 8, 2024

I keep wondering whether it would make more sense to just add the args and their types (as an enum) to the generated syscalls enum. Keeping two crates like this in sync may be a bit of a pain, plus I think the compile time is not that different.

The actual parsing of the args based on that enum could be a separate crate outright, maybe part of the same repo. E.g. there could be a separate crate that focuses on register parsing: reading strings, signed/unsigned ints, and all sorts of other "magical" structs.
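
To make the idea concrete, here is a minimal sketch of what such an argument-kind enum and a basic formatter might look like. ArgType and format_arg are hypothetical names, and the sketch assumes 64-bit registers and a 32-bit C int. Strings are shown only as an address, because reading them requires access to the traced process's memory.

```rust
/// Basic argument kinds a generated table could attach to each syscall.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ArgType {
    Int,
    UInt,
    Bool,
    Str,
    Addr,
}

/// Render one raw register value according to its declared kind.
pub fn format_arg(kind: ArgType, raw: u64) -> String {
    match kind {
        // A C `int` occupies the low 32 bits of the register.
        ArgType::Int => (raw as i32).to_string(),
        ArgType::UInt => (raw as u32).to_string(),
        ArgType::Bool => (raw != 0).to_string(),
        // Dereferencing needs the tracee's memory, so print the pointer only.
        ArgType::Str | ArgType::Addr => {
            if raw == 0 {
                "NULL".to_string()
            } else {
                format!("{:#x}", raw)
            }
        }
    }
}
```

For example, format_arg(ArgType::Int, u64::MAX) yields "-1", while the same register value formatted as ArgType::UInt yields "4294967295".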

nyurik commented on June 8, 2024

P.S. I am a bit unclear why I would need elf parsing. Could we do similar (hacky) text parsing of the files to generate the arguments? Also, how does the syscalls-gen crate relate to the gen-syscall repo?

jasonwhite commented on June 8, 2024

The number of different argument types for all the syscalls is quite large and deep, especially if you want to follow struct pointers or differentiate flags. For example, strace will read the stat struct pointer to pretty-print the information inside. I'm assuming you'll eventually want to do that. I don't want to pollute this library with those types (or maintain it). This library is meant to be low-level, providing only the basic necessities for dealing with syscalls.

Note that reverie-syscalls implements a typed interface for most syscalls and a way to display them. It is incomplete, but more than you'll likely ever need for an strace clone. For syscalls it doesn't have the arguments for, it just defaults to the 6 register values. So, it doesn't need to be 100% complete for it to be useful. There is a long tail of rarely used syscalls that you'll likely never see in the output of strace.
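
The fallback behaviour described above can be sketched roughly as follows. This is not reverie-syscalls code: known_args and format_syscall are hypothetical helpers, and the tiny table stands in for a much larger generated one.

```rust
/// Hypothetical table of known signatures: argument names per syscall.
fn known_args(name: &str) -> Option<&'static [&'static str]> {
    match name {
        "read" => Some(&["fd", "buf", "count"]),
        "close" => Some(&["fd"]),
        _ => None,
    }
}

/// Format a syscall, labelling arguments when the signature is known and
/// falling back to all six raw register values otherwise.
pub fn format_syscall(name: &str, regs: [u64; 6]) -> String {
    let args: Vec<String> = match known_args(name) {
        // Known signature: print only as many registers as there are arguments.
        Some(names) => names
            .iter()
            .zip(regs.iter())
            .map(|(arg, value)| format!("{}={:#x}", arg, value))
            .collect(),
        // Unknown signature: print every register unlabelled.
        None => regs.iter().map(|value| format!("{:#x}", value)).collect(),
    };
    format!("{}({})", name, args.join(", "))
}
```

A known call such as format_syscall("read", [3, 0x7ffd, 16, 0, 0, 0]) prints only the three labelled arguments, while an unlisted syscall prints all six registers, so the table can grow incrementally without breaking the output.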

> I am a bit unclear why I would need elf parsing

You don't need it, but I think it's way easier, more accurate, and more maintainable than trying to extract all the arguments via grepping the Linux codebase. With ELF parsing, you'll get a complete list of syscall numbers mapping to their arguments.

Here's a JSON file that I generated a couple of years ago that contained all of the syscalls (at the time) along with their arguments using the ELF parsing method. The hardest part is just building a kernel with debug symbols. Generating that list for other architectures is just a simple matter of building the Linux kernel for your target architecture. I had plans to automate this one day using GitHub Actions, but never got around to it.

jasonwhite commented on June 8, 2024

Looks like linux-raw-sys contains all of the types that could possibly be used for syscall arguments. This combined with the method described above should yield rich type information for all syscall arguments.

I've also released a more complete syscall argument scraper that can do a hacky C-type to Rust-type translation, but maybe the type name conversion is unnecessary with linux-raw-sys.
