Code Monkey home page Code Monkey logo

Comments (11)

sulekhark avatar sulekhark commented on May 27, 2024

Thank you for the report.

  1. We are looking into validating the bounds in scenarios shown below (where there is a conditional) in the coming 3 to 4 months.
snprintf_test.c:13:12: note: (expanded) expected argument bounds are 'bounds((char *)buf, (char *)buf + len + 1 == 0 ? 0 : len + 1 - 1)'
snprintf_test.c:13:12: note: (expanded) inferred bounds are 'bounds(buf, buf + len)'
  1. At present, the compiler is not reasoning that len + 1 - 1 = len. We hope to fix this soon.

  2. I am working on #449 for the defining snprintf_array_ptr.

  3. Yes, as you have pointed out, there are many functions that write a null-terminated string to a user-supplied buffer. As we see specific cases, we can think of solutions for the more common usage patterns. For inet_ntop, the (interface) return type can be _Nt_array_ptr<const char> with the default bounds of count(0) as it is in the port of musl to Checked C. The case of snprintf is more tricky - we need to think about it.

from checkedc.

mattmccutchen-cci avatar mattmccutchen-cci commented on May 27, 2024

Thanks Sulekha for the update (on this and my other recent issues)!

For inet_ntop, the (interface) return type can be _Nt_array_ptr<const char> with the default bounds of count(0) as it is in the port of musl to Checked C.

I was talking about the char * parameter (named s in the musl port), not the return value. The size of the buffer (including the null terminator) is passed as a separate parameter, named l in the musl port. So this really is the exact same scenario as snprintf. The musl port currently declares s : itype(restrict _Nt_array_ptr<char>) count(l), i.e., the size is l excluding the null terminator. Thus, a call like the following:

void test() {
  char ip4_bin[4];
  char buf _Nt_checked[50];
  my_inet_ntop(AF_INET, ip4_bin, buf, sizeof buf);
}

fails to compile:

ntop_bound_test.c:11:34: error: argument does not meet declared bounds for 3rd parameter
  my_inet_ntop(AF_INET, ip4_bin, buf, sizeof buf);
                                 ^~~
ntop_bound_test.c:11:34: note: destination bounds are wider than the source bounds
ntop_bound_test.c:11:34: note: destination upper bound is above source upper bound
ntop_bound_test.c:11:34: note: (expanded) expected argument bounds are 'bounds((char *)buf, (char *)buf + (socklen_t)sizeof buf)'
ntop_bound_test.c:11:34: note: (expanded) inferred bounds are 'bounds(buf, buf + 49)'
  my_inet_ntop(AF_INET, ip4_bin, buf, sizeof buf);
                                 ^~~

To work around this, the caller would have to subtract 1 from sizeof buf.

I appreciate the work you're doing on better solutions. But what would you recommend that we do right now, e.g., for functions we're adding in #448? Copy and paste the complicated bound expression from snprintf to at least maintain consistency and document that the size includes the null terminator? (Would it be worth factoring out the bound expression into a macro in a shared Checked C header file? I don't know how introducing another header file might affect the tests.)

from checkedc.

sulekhark avatar sulekhark commented on May 27, 2024

Here's one proposal for something that we could do right now in order to proceed with porting. I will illustrate this approach for the snprintf function. The proposal is to define macros as follows:

_Unchecked
int snprintf(char * restrict s : itype(restrict _Nt_array_ptr<char>) count(n == 0 ? 0 : n-1),
                   size_t n,
                   const char * restrict format : itype(restrict _Nt_array_ptr<const char>), ...);

#define chkc1_snprintf(s, n, format, ...) snprintf(s, n, format, __VA_ARGS__)
#define chkc2_snprintf(s, n, format, ...) _Unchecked { snprintf((char *)s, n+1, format, __VA_ARGS__); }
#define chkc3_snprintf(s, n, format) snprintf(s, n, format)
#define chkc4_snprintf(s, n, format) _Unchecked { snprintf((char *)s, n+1, format); }

NOTE: The macros chkc3_snprintf and chkc4_snprintf are required because the previous two variadic macro definitions are not accepting zero variadic arguments.

Here is a test case that shows the two common ways in which the function snprintf could be called. Note that in each of the calling scenarios, the "convenient/natural" way that one would like to call snprintf is different.

#include <stdio.h>

void snprintf_test1() {
  char buf _Nt_checked[50];
  chkc1_snprintf(buf, 50, "Hello world - 1 - %d \n", 10);
  chkc3_snprintf(buf, 50, "Hello world - 1");
}

void snprintf_test2(_Nt_array_ptr<char> buf : count(len), size_t len) {
  chkc2_snprintf(buf, len, "Hello world - 2 - %d \n", 10);
  chkc4_snprintf(buf, len, "Hello world - 2");
}

The macros serve two purposes:

  1. hide the inconvenience of either doing a +1 or a -1 to reconcile the bounds of the first argument of type _Nt_array_ptr with its size, which is the second argument.
  2. enclose the calls to snprintf in an unchecked block temporarily until the compiler is able to reason about conditionals in bounds expressions.

Comments / Improvements / Alternatives are welcome.

from checkedc.

mattmccutchen-cci avatar mattmccutchen-cci commented on May 27, 2024

Hi Sulekha,

I really appreciate your help in finding an interim solution so we can proceed with porting right away.

I propose we start with a simpler example function with no variable arguments, since variable arguments both cause issues for the macros and require all calls to be unchecked; we'll see that my final proposal below works for variable-argument functions. Here's my example code analogous to yours. I've provided an implementation for the function so we can do simple runtime tests.

#pragma CHECKED_SCOPE on

#include <stddef.h>

void sncopy(char * restrict s : itype(restrict _Nt_array_ptr<char>) count(n == 0 ? 0 : n-1),
            size_t n,
            const char * restrict src : itype(restrict _Nt_array_ptr<const char>)) {
  if (n == 0)
    return;
  size_t i = 0;
  _Nt_array_ptr<const char> src2 : count(i) =
    _Dynamic_bounds_cast<_Nt_array_ptr<const char>>(src, count(i));
  while (i < n - 1 && src2[i] != '\0') {
    s[i] = src2[i];
    i++;
  }
  while (i < n - 1) {
    s[i] = '\0';
    i++;
  }
  s[n - 1] = '\0';
}

#define chkc1_sncopy(s, n, src) sncopy(s, n, src)
#define chkc2_sncopy(s, n, src) _Unchecked { sncopy((char *)s, n+1, src); }

void sncopy_test1(void) {
  char buf _Nt_checked[50];
  chkc1_sncopy(buf, 50, "Hello world - 1");
}

void sncopy_test2(_Nt_array_ptr<char> buf : count(len), size_t len) {
  chkc2_sncopy(buf, len, "Hello world - 2");
}

This works, although I'd like chkc1_sncopy and chkc2_sncopy to have better names that remind the user of what they do. chkc1_sncopy is equivalent to sncopy, so we could just use sncopy directly. For chkc2_sncopy, we could use my earlier proposal of sbcopy (b for "bound").

However, I have two more fundamental concerns about the approach:

(1) The _Unchecked in chkc2_sncopy bypasses checking of the entire call expression. It doesn't check that the original bounds of s are correct; it also bypasses checking of argument expressions to sncopy that have nothing to do with the buffer, including their subexpressions. For example, each of the following functions causes a segmentation fault that is not caught by Checked C:

void sncopy_test2_bad1(_Nt_array_ptr<char> buf : count(len), size_t len) {
  chkc2_sncopy(buf, len, (char*)1);
}

void sncopy_test2_bad2(void) {
  char bad_buf _Checked[20];
  chkc2_sncopy(bad_buf, 100000000, "Hello world - 3");
}

Fortunately, this is easy to fix: instead of a macro, we can use a wrapper function that performs only the single needed _Assume_bounds_cast and leaves the rest of the checking in place:

void sbcopy(restrict _Nt_array_ptr<char> s : count(len),
            size_t len,
            restrict _Nt_array_ptr<const char> src) {
  size_t len2 = len + 1 == 0 ? 0 : len + 1 - 1;
  _Nt_array_ptr<char> s2 : count(len2) = 0;
  _Unchecked {
    s2 = _Assume_bounds_cast<_Nt_array_ptr<char>>(s, count(len2));
  }
  // The compiler issues a warning on the third argument that I think is wrong
  // and irrelevant to the present discussion.
  sncopy(s2, len + 1, src);
}

void sbcopy_test2(_Nt_array_ptr<char> buf : count(len), size_t len) {
  sbcopy(buf, len, "Hello world - 2");
}

(2) The approach still uses a separate wrapper macro or function for every system function that writes a null-terminated string to a buffer. I still think maintaining all of these wrappers is going to be an unreasonable amount of work. This can be solved by defining a helper function and macro that do just the bounds cast and can be used in combination with any system function:

_Nt_array_ptr<char> buf_plus_nt_bounds_cast(_Nt_array_ptr<char> buf : count(len), size_t len)
  : count(len + 1 == 0 ? 0 : len + 1 - 1) _Unchecked {
  return _Assume_bounds_cast<_Nt_array_ptr<char>>(buf, count(len + 1 == 0 ? 0 : len + 1 - 1));
}

#define BUF_PLUS_NT_ARGS(_buf, _len) buf_plus_nt_bounds_cast(_buf, _len), (_len) + 1

void sncopy_test2_better(_Nt_array_ptr<char> buf : count(len), size_t len) {
  sncopy(BUF_PLUS_NT_ARGS(buf, len), "Hello world - 2");
}

(BUF_PLUS_NT_ARGS assumes that the buffer and size are successive arguments to the system function. This is true for all the system functions I've seen so far, but if the user needs to use a function that isn't defined this way, they can use buf_plus_nt_bounds_cast directly.)

Does the BUF_PLUS_NT_ARGS approach look good to you? (I'm happy to use different names if you prefer.) If so, I'll go ahead and use it in #448. I just need to know where I can put the common helper definitions; I assume I should start a new header file for such things. What filename would you suggest, and what changes (if any) should I make to the tests to accommodate the new header file? (Ultimately, we'll also need a solution to #449, but I can go ahead and start testing things without it.)

As I previously mentioned at the end of #450 (comment), this would also be a good opportunity to factor out the count(n == 0 ? 0 : n-1) code pattern into a macro to avoid having to duplicate it in many checked declarations (and the associated churn if we need to change it in the future). Does the following look good?

#define nt_count_for_size(_size) count(_size == 0 ? 0 : _size - 1)

void sncopy(char * restrict s : itype(restrict _Nt_array_ptr<char>) nt_count_for_size(n),
            size_t n,
            const char * restrict src : itype(restrict _Nt_array_ptr<const char>));

from checkedc.

sulekhark avatar sulekhark commented on May 27, 2024

We discussed this issue internally. We think that there is something deeper going on here that we need to sort out. We will need some time to think about this.

from checkedc.

sulekhark avatar sulekhark commented on May 27, 2024
  1. We have modified the declaration of snprintf to simplify the bounds on the first parameter s, and impose the precondition n > 0 on the second parameter, n.
  2. The changes are in: PR #456 on the checkedc repository and PR #1086 on the checkedc-clang repository (this PR fixes a bug related to the where clause usage in the snprintf declaration). I am running the final ADO tests on these PRs.
  3. Under the new declaration of the snprintf function, the compiler still warns that it cannot prove that the first argument meets declared bounds for 1st parameter in calls such as snprintf(p, len + 1, format, arg) (i.e., it is unable to prove that p + len == p + len + 1 - 1). We plan to fix this soon. So, I suggest that we don't define a macro/function to workaround this issue.
  4. If you would like to propose a macro or any other mechanism to hide the "len + 1" that we need to pass to functions like snprintf, your suggestions are welcome.
  5. At present the compiler does not yet validate that arguments to function calls satisfy the preconditions placed on the corresponding parameters via where clauses in function declarations. But we would like to proceed forward with specifying preconditions on function parameters where necessary - we will take up validating such preconditions at call sites as soon as possible.

from checkedc.

mattmccutchen-cci avatar mattmccutchen-cci commented on May 27, 2024

Thanks Sulekha. I'm studying your changes now, and I'd appreciate if you'd give me a few hours to raise any concerns before you merge the PR.

  1. Under the new declaration of the snprintf function, the compiler still warns that it cannot prove that the first argument meets declared bounds for 1st parameter in calls such as snprintf(p, len + 1, format, arg) (i.e., it is unable to prove that p + len == p + len + 1 - 1). We plan to fix this soon. So, I suggest that we don't define a macro/function to workaround this issue.

  2. If you would like to propose a macro or any other mechanism to hide the "len + 1" that we need to pass to functions like snprintf, your suggestions are welcome.

My proposal would still be BUF_PLUS_NT_ARGS as I described above. You say "We plan to fix this soon": how soon, realistically? I don't want #448 to be blocked indefinitely. So I'd propose to go ahead and add the BUF_PLUS_NT_ARGS mechanism to #448 for now, and if you do get the compiler fix done before #448 is merged, I'll remove BUF_PLUS_NT_ARGS from #448 again.

  1. At present the compiler does not yet validate that arguments to function calls satisfy the preconditions placed on the corresponding parameters via where clauses in function declarations. But we would like to proceed forward with specifying preconditions on function parameters where necessary - we will take up validating such preconditions at call sites as soon as possible.

Right, I see you have microsoft/checkedc-clang#218 open for this. But if we use where clauses in the checked declarations for existing C functions, what is your plan to avoid breaking compatibility with existing C code with implicit checked header inclusion? It would be silly for C code to pass a literal 0 as the size, but it might well pass a variable that isn't statically known to be positive (because of course, plain C code won't have a where clause), something like this:

int foo(char *buf, int size) {
  return snprintf(buf, size, /* some arguments */);
}

Is your plan to issue a warning (or no diagnostic at all) instead of an error if this occurs in an unchecked scope?

from checkedc.

sulekhark avatar sulekhark commented on May 27, 2024

My proposal would still be BUF_PLUS_NT_ARGS as I described above. You say "We plan to fix this soon": how soon, realistically? I don't want #448 to be blocked indefinitely. So I'd propose to go ahead and add the BUF_PLUS_NT_ARGS mechanism to #448 for now, and if you do get the compiler fix done before #448 is merged, I'll remove BUF_PLUS_NT_ARGS from #448 again.

We plan to take up this fix in a day or two and we should be done with the fix a week after we start. I really do not prefer to have the buf_plus_nt_bounds_cast wrapper. The macro could be:

#define NT_ARGS(_len) (_len) + 1

With the fix in place and the macro defined as above, we will also no longer have to define different macros if len does not follow buf in the argument list.

Right, I see you have microsoft/checkedc-clang#218 open for this. But if we use where clauses in the checked declarations for existing C functions, what is your plan to avoid breaking compatibility with existing C code with implicit checked header inclusion? It would be silly for C code to pass a literal 0 as the size, but it might well pass a variable that isn't statically known to be positive (because of course, plain C code won't have a where clause), something like this:

int foo(char *buf, int size) {
  return snprintf(buf, size, /* some arguments */);
}

Is your plan to issue a warning (or no diagnostic at all) instead of an error if this occurs in an unchecked scope?

Yes, the plan is to issue a warning instead of an error if this occurs in an unchecked scope.

from checkedc.

mattmccutchen-cci avatar mattmccutchen-cci commented on May 27, 2024

Sounds like a good plan. I have no further concerns about #456. In #448, I'll plan to copy the new code pattern from snprintf to inet_ntop and so forth. The BUF_PLUS_NT_ARGS thing is independent and can be added later if appropriate. If the compiler fix for p + len + 1 - 1 ends up being delayed and you really don't want to offer the macro as an interim solution for users (because you don't want to have to keep it for compatibility once the compiler is fixed, or whatever), I guess CCI will put it in our own fork for use by anyone who wants a solution and can accept the compatibility break when it is ultimately removed.

Is there an existing issue that covers the p + len + 1 - 1 simplification? (Maybe the third bullet of microsoft/checkedc-clang#208, but that issue is very broad and it might be good to have a more specific one that can actually be tracked and declared fixed.) Should I submit one myself?

from checkedc.

sulekhark avatar sulekhark commented on May 27, 2024

Ok, thanks! We will file an issue and start work on the simplification of p + len + 1 - 1.

from checkedc.

mattmccutchen-cci avatar mattmccutchen-cci commented on May 27, 2024

I see that microsoft/checkedc-clang#1088 has been filed for p + len + 1 - 1. I think this issue can now be closed because the remaining work is covered by other issues.

from checkedc.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.