
Comments (17)

anderejd avatar anderejd commented on June 27, 2024 12

Hello, I'm the author of the tool cargo-geiger. Thank you @dtolnay for yet another significant open source Rust contribution, your work is amazing, period. This approach to C++ FFI looks wonderful from my perspective.

@bjorn3 I think I can see the point you are making, that it should be easy to find out if a crate depends on any unsafe code, including FFI? Let's continue that discussion, including detection of unsafe code via macro expansion, over at the cargo-geiger repo.

from cxx.

dtolnay avatar dtolnay commented on June 27, 2024 9

I understand what you're saying but I don't find it to be a distinction that would be valuable for maintaining correctness of a codebase in practice.

My approach is as follows:

  • Fact 1: unsafe is the minimal set of code that you need to audit to discover all sources of undefined behavior.
  • Fact 2: all C++ is unsafe.
  • Fact 3: if you don't need to audit the Rust code, it doesn't need to be unsafe.
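To make that concrete, here is a rough sketch of what a bridge might look like (the header path and the concat function are hypothetical, and the exact bridge syntax varies between cxx versions): the calling Rust code contains no unsafe block, and the entire audit burden falls on the C++ definition of concat.

#[cxx::bridge]
mod ffi {
    unsafe extern "C++" {
        // Hypothetical header and function, for illustration only.
        include!("example/include/concat.h");

        // Fact 2: the C++ definition is unsafe by definition; an audit of
        // the C++ side must confirm it is sound for any pair of &str inputs.
        fn concat(a: &str, b: &str) -> String;
    }
}

fn main() {
    // Fact 3: no unsafe block is needed here; if the C++ side upholds its
    // contract, this safe Rust code cannot introduce undefined behavior.
    let joined = ffi::concat("foo", "bar");
    println!("{joined}");
}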

from cxx.

RalfJung avatar RalfJung commented on June 27, 2024 7

Fact 1: unsafe is the minimal set of code that you need to audit to discover all sources of undefined behavior.

That's already not true, though. However, I have to admit that this particular refutation has little bearing on the discussion at hand.


I think @bjorn3 is following the interpretation that when running a Rust program, all possible UB must arise from some unsafe block. This property is violated by cxx, because C++ code is not inside a literal unsafe block. Usually, writing unsafe { extern_fn() } is the place where the proof obligation for the C++ code occurs.

Consider the following: suppose you were using traditional bindgen to expose an unsafe FFI which you then wrapped in a safe Rust API, as is common practice. That doesn't give you the right to later mess up the C++ code arbitrarily badly without getting UB...

No, it doesn't -- so? The unsafe block on the Rust side crucially marks the place where the proof obligation is checked, and of course if anything that is relied upon by the corresponding proof changes, it needs to be re-checked. This is true even for pure Rust code: calling some unsafe fn in a correct way does not give you the right to later change that unsafe Rust code and change its safety contract; you have to re-check all uses at that point.
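To make the traditional pattern concrete, here is a rough sketch (fill_random and its contract are hypothetical): the unsafe block is exactly where the proof obligation is discharged, and any later change to the foreign code forces this block to be re-audited.

use std::os::raw::c_int;

extern "C" {
    // Hypothetical binding. Documented (unchecked) contract: `buf` must
    // point to at least `len` writable bytes.
    fn fill_random(buf: *mut u8, len: usize) -> c_int;
}

pub fn random_bytes(n: usize) -> Option<Vec<u8>> {
    let mut buf = vec![0u8; n];
    // SAFETY: `buf` is a live allocation of exactly `n` bytes, so the
    // documented contract of `fill_random` holds. This is where the proof
    // obligation for the foreign code is recorded and checked.
    let rc = unsafe { fill_random(buf.as_mut_ptr(), buf.len()) };
    if rc == 0 { Some(buf) } else { None }
}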


@dtolnay on the other hand views C++ code as implicitly carrying an unsafe block all around it.

I can understand this view, but I think the issue with this is the lack of a clear proof obligation. unsafe blocks aren't about the exact auditing scope (see above); they are all about marking where exactly which proof obligations arise. (This is consistent with my blog post: the proof obligations for set_len start at some unsafe block in Vec that would cause UB with an incorrect length; that obligation is discharged via Vec's datastructure invariant, and that invariant in turn puts an obligation on the entire module.)
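A toy version of that situation (not the real Vec) might look as follows: the unsafe block sits in one method, but the invariant it relies on can be broken by entirely safe code in the same module, so the audit extends beyond the unsafe keyword itself.

pub struct TinyBuf {
    data: Box<[u8]>,
    len: usize, // module invariant: len <= data.len()
}

impl TinyBuf {
    pub fn get(&self, i: usize) -> Option<u8> {
        if i < self.len {
            // SAFETY: relies on the module invariant `len <= data.len()`.
            Some(unsafe { *self.data.get_unchecked(i) })
        } else {
            None
        }
    }

    // No unsafe keyword here, yet this method can break the invariant that
    // `get` relies on -- auditing the unsafe block leads you to this code.
    pub fn set_len(&mut self, new_len: usize) {
        self.len = new_len;
    }
}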

With cxx, what has to be proven about the C++ code is determined very non-locally by the cxx bridge file. The C++ type system is too weak to tell you what you have to prove. Crucially, that cxx bridge file doesn't even have any unsafe in it so it is not clear that there is some proof obligation being determined here. (This is basically the point @ogoffart is making above as well; I wonder what your thoughts on that are @dtolnay.)

from cxx.

ogoffart avatar ogoffart commented on June 27, 2024 4

But by that definition, extern functions should not be unsafe. But they are:

extern {
    fn init_stuff();
}

fn main() {
    init_stuff(); // Error: call to unsafe function is unsafe and requires unsafe function or block
}

Why is calling init_stuff() unsafe? I could claim that calling it is not unsafe because it is the C code that needs to be audited.

from cxx.

dtolnay avatar dtolnay commented on June 27, 2024 4

Moderator note: I am deleting comments because GitHub's linear comment structure makes it unmanageable to hold a complex forking discussion here. Discussions that are not the dominant thread would be better held elsewhere.

from cxx.

dtolnay avatar dtolnay commented on June 27, 2024 3

100% of C++ code is unsafe. When auditing a project, you would be on the hook for auditing all the unsafe Rust code and all the C++ code. The core safety claim under this new model is that auditing just the C++ side would be sufficient to catch all problems, i.e. the Rust side can be 100% safe.

from cxx.

dtolnay avatar dtolnay commented on June 27, 2024 3

There's an interpretation of safe Rust code which allows one to draw a clear boundary of what should be marked unsafe, which is that it should not be possible to cause undefined behavior with safe Rust code.

That's certainly the case here. You can't cause undefined behavior with just safe Rust code. You need to add some unsafe C++ code in order to get undefined behavior.

Consider the following: suppose you were using traditional bindgen to expose an unsafe FFI which you then wrapped in a safe Rust API, as is common practice. That doesn't give you the right to later mess up the C++ code arbitrarily badly without getting UB...

With CXX you can mess up the Rust code arbitrarily badly without getting UB, so the Rust code is safe.

from cxx.

dtolnay avatar dtolnay commented on June 27, 2024 2

But by that definition, extern functions should not be unsafe.

extern { fn init_stuff(); }

Why is calling init_stuff() unsafe? I could claim that calling it is not unsafe because it is the C code that needs to be audited.

In the case of a handwritten extern, the two reasons it's unsafe are:

  • You might have the wrong signature. You don't get to mess up the Rust side arbitrarily badly (after reviewing what is being exposed from C++) without getting UB. CXX solves this.

  • Not all C++ functions with the type void (*)() are free of proof obligations.

CXX is designed for C++ functions that are fine to call with any possible value of their arguments and this would be something that an audit of the C++ side of the bridge would be responsible for verifying (but this definitely should be called out better in the documentation).
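Sketched from the Rust side with hypothetical declarations, the distinction looks like this:

use std::os::raw::c_char;

extern "C" {
    // Hypothetical. Sound for every possible u64, so a bridge could expose
    // it as safe once the signature itself has been verified.
    fn popcount64(x: u64) -> u32;

    // Hypothetical. Only sound if `s` is non-null and NUL-terminated, so
    // every call site carries a proof obligation and should stay unsafe.
    fn parse_config(s: *const c_char) -> i32;
}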

from cxx.

bjorn3 avatar bjorn3 commented on June 27, 2024

I understand. Making the Rust side safe could make a tool like cargo-geiger think there is nothing to audit, though, as it doesn't peek into the macro expansion.

from cxx.

dtolnay avatar dtolnay commented on June 27, 2024

The user of cargo-geiger would need to know that if there is non-Rust code in the project then Rust's safety guarantees obviously do not apply to the non-Rust code.

I don't know much about cargo-geiger but hopefully there would be some way for it to flag the presence of non-Rust code.

from cxx.

YaLTeR avatar YaLTeR commented on June 27, 2024

There's an interpretation of safe Rust code which allows one to draw a clear boundary of what should be marked unsafe, which is that it should not be possible to cause undefined behavior with safe Rust code. Do I understand correctly that the intended usage of cxx is to write the C++ side in such a way as to satisfy this Rust-side requirement, rather than marking the functions unsafe from the Rust side?

from cxx.

bjorn3 avatar bjorn3 commented on June 27, 2024

That doesn't give you the right to later mess up the C++ code arbitrarily badly without getting UB...

Yes, but in that case you use unsafe {} to claim that the C++ doesn't cause UB. With cxx you can call C++ code which causes UB without ever using the unsafe keyword.

from cxx.

bjorn3 avatar bjorn3 commented on June 27, 2024

I think @bjorn3 is following the interpretation that when running a Rust program, all possible UB must arise from some unsafe block.

Indeed

Crucially, that cxx bridge file doesn't even have any unsafe in it so it is not clear that there is some proof obligation being determined here.

Maybe add a #[unsafe_assume_no_ub] attribute (does anyone have a better name?) which makes the function safe to call, with the default being unsafe when the attribute is not used?
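Purely hypothetical syntax (no such attribute exists today), but it could look roughly like this inside the bridge:

#[cxx::bridge]
mod ffi {
    extern "C++" {
        include!("example.h"); // hypothetical header

        // Default: callers would need an unsafe block.
        fn do_thing();

        // Opt-in: the author asserts the C++ definition cannot cause UB
        // for any inputs, so the generated binding becomes safe to call.
        #[unsafe_assume_no_ub]
        fn do_other_thing();
    }
}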

from cxx.

dtolnay avatar dtolnay commented on June 27, 2024

@RalfJung I think the post you linked reinforces my perspective. The emphasis is that "unsafe is the minimal set of code that you need to audit to discover all sources of undefined behavior." As with the Vec implementation you talk about, looking at one piece of unsafe code can expose other code not necessarily bearing the unsafe keyword that also needs to be considered, but through that process you discover all problems and can convince yourself that the whole system is good.

Your wording of "C++ code as implicitly carrying an unsafe block all around it" is accurate. This is just how that language works, and how audits of FFI to it always work. https://www.reddit.com/r/rust/comments/elvfyn/ffi_like_its_2020_announcing_safe_ffi_for_rust_c/fdm8olr/ covers this in more detail from the perspective of real practical users of FFI. C++ isn't about clear formal proof obligations; it relies on the programmer to understand what they are looking at in broader context. This is pervasive to the language and applies here.

I would add that in addition to C++ code carrying an implicit unsafe block all around it, Cargo build scripts also always carry an implicit unsafe block all around them and must be audited to gain the necessary understanding of the system and its safety requirements.

from cxx.

dtolnay avatar dtolnay commented on June 27, 2024

I think @bjorn3 is following the interpretation that when running a Rust program, all possible UB must arise from some unsafe block. This property is violated by cxx, because C++ code is not inside a literal unsafe block. Usually, writing unsafe { extern_fn() } is the place where the proof obligation for the C++ code occurs.

I don't find this to be a practical litmus test nor helpful for upholding the correctness of a codebase in practice. I've called out that all C++ and all build scripts implicitly carry the unsafe you are looking for. To drive this home, consider:

// build.rs

use std::{env, fs, path::PathBuf};

fn main() {
    // `curl` here stands for any way of downloading content at build time.
    let f = curl("https://gist.github.com/dtolnay/.../mystery.rs");
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    fs::write(out_dir.join("mystery.rs"), f).unwrap();
}

// src/lib.rs

include!(concat!(env!("OUT_DIR"), "/mystery.rs"));

What do we grep for to find the undefined behavior in this project? The solution isn't to make curling unsafe, or fs::write unsafe, or include! unsafe. I would see that as grasping to justify an incorrect litmus test.

from cxx.

0xd34d10cc avatar 0xd34d10cc commented on June 27, 2024

CXX is designed for C++ functions that are fine to call with any possible value of their arguments

Does CXX codegen automatically prove this? If not, why are the generated bindings safe?

from cxx.

RalfJung avatar RalfJung commented on June 27, 2024

The emphasis is that "unsafe is the minimal set of code that you need to audit to discover all sources of undefined behavior."

I would extend this to say "to discover all sources of undefined behavior and the proof obligations to avoid it". And that's where cxx's approach falls short: even if we accept that C++ has an implicitly unsafe block all around it, that tells us nothing about what the actual proof obligation is.

And you are right that build scripts (and most forms of code generation) break this. There's not much we can do about that. Item-level proc-macros sometimes have a similar problem, and the usual approach seems to be to call them something with unsafe in their name -- that would match @bjorn3's proposal of an #[unsafe_assume_no_ub] attribute.

CXX is designed for C++ functions that are fine to call with any possible value of their arguments and this would be something that an audit of the C++ side of the bridge would be responsible for verifying (but this definitely should be called out better in the documentation).

The part about "any possible value" might indeed be worth calling out more explicitly in the docs.

from cxx.
