Overview
I am concerned that the treatment of null pointers in Checked C will lead to too many runtime checks. We have been implementing the runtime checks required by the current Checked C specification. At each memory access through an array_ptr, there will be a null pointer check followed by a bounds check. At pointer arithmetic involving an array_ptr, there will also be a non-null check before the arithmetic operation. That adds up to a lot of checking.
The cause is the semantics that we’ve chosen for bounds when null pointers are around: a pointer is either null or has valid bounds. This means that a null pointer may not have valid bounds. From Section 3.1 of the Checked C v0.6 specification:
The meaning of a bounds expression can be defined more precisely. At runtime, given an expression e with a bounds expression bounds(lb, ub), let the runtime values of e, lb, and ub be ev, lbv, and ubv, respectively. The value ev will be 0 (null) or have been derived via a sequence of operations from a pointer to some object obj with bounds(low, high). The following statement will be true at runtime: ev == 0 || (low <= lbv && ubv <= high). In other words, if ev is null, the bounds may or may not be valid. If ev is non-null, the bounds must be valid. This implies that any access to memory where ev != 0 && lbv <= ev && ev < ubv will be within the bounds of obj.
We chose this definition because C treats null pointers as interchangeable with other pointers. The definition results in less work and typing when converting programs. However, it has led to several issues in the semantics:
- We can’t allow arithmetic involving a null pointer because that could lead to the forging of a non-null pointer with invalid bounds. This is why we need runtime checks on pointer arithmetic.
- We “lose” bounds information when a pointer becomes null.
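To make the cost concrete, here is a minimal sketch in plain C of the checks the current semantics imply. The struct and the function names (checked_int_ptr, ptr_add, ptr_read) are hypothetical and are not how the compiler represents an array_ptr; a failed check is reported by returning false instead of trapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an array_ptr<int> with bounds(lb, ub). */
typedef struct {
    int *p;   /* current pointer value; may be null */
    int *lb;  /* lower bound, inclusive */
    int *ub;  /* upper bound, exclusive */
} checked_int_ptr;

/* Pointer arithmetic under the current rules: one non-null check.
   The caller is assumed to keep the result inside the underlying array
   (or one past its end), as plain C requires. */
bool ptr_add(checked_int_ptr in, ptrdiff_t n, checked_int_ptr *out) {
    assert(out != NULL);
    if (in.p == NULL) {
        return false;              /* non-null check failed */
    }
    *out = in;
    out->p = in.p + n;
    return true;
}

/* Memory read under the current rules: null check, then bounds check. */
bool ptr_read(checked_int_ptr a, int *out) {
    assert(out != NULL);
    if (a.p == NULL) {
        return false;              /* null check failed */
    }
    if (a.p < a.lb || a.p >= a.ub) {
        return false;              /* bounds check failed */
    }
    *out = *a.p;
    return true;
}
```

Every read pays for two tests and every arithmetic operation pays for one, which is the overhead I am worried about.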
Proposal
We’re running into problems because we’re trying to combine bounds checking and the handling of null pointers. The fact that C pointers can either be null or point to valid objects is a source of complexity when reasoning about C programs.
I propose that we adapt the idea of nullable pointers to Checked C. We would use types to distinguish between the different ways in which null will be allowed or handled:
- ptr<T> values must point to valid objects that can hold values of type T. ptr<T> values cannot be null.
- array_ptr values can point anywhere in memory or be null. Bounds for array_ptr values must always be valid (a subrange of a valid object). This restricts when array_ptr values that have bounds can be null. It also prevents array_ptr values that are null from being used to access memory. Null is not within the range of any object, so bounds checks will always fail. No runtime checks are needed for pointer arithmetic.
- We introduce a nullable modifier that can be applied to ptr and array_ptr types.
- For a pointer of type nullable ptr<T>, a runtime null check is done before accessing memory.
- For a pointer of type nullable array_ptr<T>, a runtime null check is done before accessing memory. The runtime null check precedes the bounds check. The bounds for a nullable array_ptr<T> are only required to be valid when the value is non-null.
- Null pointer constants have empty bounds (corresponding to the empty object) instead of having ‘any’ bounds.
- We may decide to allow conditional bounds expressions. I’d prefer to put this off for now.
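The difference between the two array_ptr flavors can be sketched in plain C as follows. This is only a model: bounded_int_ptr, in_bounds, array_ptr_read, and nullable_array_ptr_read are hypothetical names, and the sketch assumes that a null pointer converts to an integer address that lies outside every object (true on mainstream platforms, but not guaranteed by the C standard).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a pointer value together with bounds(lb, ub). */
typedef struct {
    int *p;
    int *lb;  /* inclusive */
    int *ub;  /* exclusive */
} bounded_int_ptr;

/* Compare as integers so that null can take part in the comparison. */
static bool in_bounds(const int *p, const int *lb, const int *ub) {
    uintptr_t v = (uintptr_t)p;
    return (uintptr_t)lb <= v && v < (uintptr_t)ub;
}

/* array_ptr<int> under the proposal: bounds are always valid, so the
   bounds check alone also rejects a null value. */
bool array_ptr_read(bounded_int_ptr a, int *out) {
    assert(out != NULL);
    if (!in_bounds(a.p, a.lb, a.ub)) {
        return false;
    }
    *out = *a.p;
    return true;
}

/* nullable array_ptr<int>: bounds are only meaningful when the value is
   non-null, so the null check has to come first. */
bool nullable_array_ptr_read(bounded_int_ptr a, int *out) {
    assert(out != NULL);
    if (a.p == NULL) {
        return false;
    }
    if (!in_bounds(a.p, a.lb, a.ub)) {
        return false;
    }
    *out = *a.p;
    return true;
}
```

A null value with empty bounds (NULL, NULL) fails the bounds check in array_ptr_read without any dedicated null test, which is the point of giving null pointer constants empty bounds.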
Examples
It is valid to assign a ptr variable a value that is guaranteed to be non-null. The following declarations and assignments are valid:
int y;
ptr<int> px = &y;
int arr[10];
px = &arr[5];
It is not valid to assign a ptr variable a value that is null. The following will be rejected at compile time:
ptr<int> px = NULL;
void f(int *a) {
    ptr<int> p = &*a; // a could be null and a may not have valid bounds.
}
It is valid to assign null to an array_ptr variable with bounds, if the bounds are empty:
int len = 0;
array_ptr<int> x : count(len) = NULL;
The empty bounds are a subrange of any valid object.
It is invalid to assign null to an array_ptr variable with non-empty bounds. This declaration is invalid:
array_ptr<int> x : count(5) = NULL;
bounds(NULL, NULL + 5) is not a subrange of a valid object.
It is valid to assign null to a nullable array_ptr variable with non-empty bounds. This declaration is valid:
nullable array_ptr<int> x : count(5) = NULL;
Additional thoughts
- There is another way to understand why ptr values cannot be null. The declaration ptr<T> x is equivalent to array_ptr<T> x : count(1). The bounds (NULL, NULL + 1) are invalid because no valid object includes NULL in its bounds.
- ptr values become pointers that can be used unconditionally (without runtime checks).
- array_ptr only requires bounds checks.
Bounds-safe interfaces
My strawman proposal is to allow the keyword nullable to precede the in-line bounds declaration for an unchecked pointer type. For example:
void *calloc(size_t num, size_t size) : nullable byte_count(num * size);
This implies that, in a checked context, calloc returns a nullable array_ptr<void>.
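In plain C terms, a nullable return type would make the usual caller-side test mandatory before the result is used. A minimal sketch of that pattern, where fill_squares is a hypothetical helper and not part of any Checked C header:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocates n zeroed ints and fills them with squares.
   Returns 0 on success and -1 if calloc returned null. For n == 0,
   calloc may legitimately return null; this sketch treats that as
   a failure too. The caller owns *out and must free it. */
int fill_squares(size_t n, int **out) {
    assert(out != NULL);
    int *buf = calloc(n, sizeof *buf);
    if (buf == NULL) {
        return -1;   /* the null test that a nullable result would require */
    }
    for (size_t i = 0; i < n; i++) {
        buf[i] = (int)(i * i);
    }
    *out = buf;
    return 0;
}
```

After the null test succeeds, the pointer can be treated as a non-null array_ptr with byte_count(num * size) bounds, so the loop needs only bounds checks.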
For interface types, nullable can be applied as a type qualifier to _Ptr types. For example, the bounds-safe interface for the string-to-double function would be:
double strtod(const char * restrict nptr,
char ** restrict endptr : itype(restrict _Nullable _Ptr<char *>));
If endptr is non-null, strtod returns the location where the conversion stopped by modifying *endptr.
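Both ways of calling strtod are legal in standard C, which is why endptr needs the nullable qualifier. A small sketch that exercises both; parse_prefix is a hypothetical wrapper written only for this illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Parses a leading number from text. If rest is non-null, *rest is set to
   the first character that was not consumed. If rest is null, strtod is
   called with a null endptr and the stop position is simply not reported. */
double parse_prefix(const char *text, const char **rest) {
    assert(text != NULL);
    char *end = NULL;
    double value = strtod(text, rest != NULL ? &end : NULL);
    if (rest != NULL) {
        *rest = end;
    }
    return value;
}
```

Note that nptr (text here) gets no nullable qualifier: strtod is not prepared to handle a null input string.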
Conversions
- ptr values and array_ptr values can always be converted to nullable ptr and nullable array_ptr, respectively.
- The reverse conversion (from nullable ptr and nullable array_ptr to ptr and array_ptr, respectively) is allowed only when it is provable that the value being converted is not null.
- Conversions from array_ptr to ptr continue to require that the array_ptr have bounds large enough to hold the ptr value.
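The first two conversion rules can be modeled in plain C with two wrapper types. This is a sketch under assumptions: the type and function names are hypothetical, and the runtime test in to_nonnull stands in for what the proposal describes as a compile-time proof (for example, the conversion appearing inside an if that has already tested the pointer against null).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int *p; } nullable_int_ptr;  /* models nullable ptr<int> */
typedef struct { int *p; } nonnull_int_ptr;   /* models ptr<int>; p is never null */

/* ptr -> nullable ptr: always allowed, no check needed. */
nullable_int_ptr to_nullable(nonnull_int_ptr x) {
    nullable_int_ptr r = { x.p };
    return r;
}

/* nullable ptr -> ptr: allowed only once the value is known to be non-null. */
bool to_nonnull(nullable_int_ptr x, nonnull_int_ptr *out) {
    assert(out != NULL);
    if (x.p == NULL) {
        return false;
    }
    out->p = x.p;
    return true;
}

/* Access through the non-null type needs no runtime check. */
int deref(nonnull_int_ptr x) {
    return *x.p;
}
```

Once a value has the non-null type, every later dereference is unconditional, so the null test is paid once at the conversion instead of at each access.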
Next steps
I modified the Checked C wrappers for the C standard library to add nullable type modifiers where necessary. I didn’t modify functions involving strings because we haven’t added support for null-terminated arrays. The results are on GitHub at https://github.com/dtarditi/checkedc/tree/nullable. There are two quick take-aways:
- Most functions aren’t expecting or prepared to handle a null pointer: nullable modifiers were needed in relatively few places.
- It makes the interface descriptions more precise. This is no surprise; comparisons with SAL may arise. It seems better to have machine-checkable descriptions than to rely on imprecise English descriptions.