
checkedc-clang's Introduction

The Checked C clang repo

This repo contains a version of the LLVM/Clang toolchain that is being modified to support Checked C. Checked C extends C with checking to detect or prevent common programming errors such as out-of-bounds memory accesses. The Checked C specification is available at the Checked C repo.

Announcements

Source code update

On Feb 19, 2021 we updated the checkedc-clang sources to upstream release_110, specifically this commit.

On Feb 18, 2020 we updated the checkedc-clang sources to upstream release_90, specifically this commit.

Transition to monorepo

Early in 2019, the LLVM community transitioned to "monorepo".

We moved Checked C to a monorepo on Oct 30, 2019. This has resulted in the following changes:

  1. checkedc-llvm and checkedc-clang (as well as other LLVM subprojects) are now tracked via a single git repo.

  2. The checkedc-llvm repo will no longer be maintained. The checkedc-clang repo will be the new monorepo.

  3. There will be no changes to the checkedc repo. It will continue to be a separate git repo.

  4. All future patches should be based on this new monorepo.

  5. You can use this script to cherry-pick your existing patches to the new monorepo.

  6. Make sure to set the following CMake flag to enable clang in your builds: -DLLVM_ENABLE_PROJECTS=clang

Trying out Checked C

Programmers are welcome to use Checked C as it is being implemented. We have pre-built compiler installers for Windows available for download on the release page. For other platforms, you will have to build your own copy of the compiler. For directions on how to do this, see the Checked C clang wiki. The compiler user manual is here. For more information on Checked C and pointers to example code, see our Wiki.

You can use clangd built from this repository to get the same kind of IDE support for editing Checked C code that upstream clangd provides for C code. For example, you can jump to definitions and references and get a real-time display of errors and warnings. Here is more information about Checked C's clangd.

3C: Semi-automated conversion of C code to Checked C

This repository includes a tool called 3C that partially automates the conversion of C code to Checked C. Quick documentation links:

More information

For more information on the Checked C clang compiler, see the Checked C clang wiki.

Build Status

Configuration | Testing | Status
Debug X86 Windows | Checked C and clang regression tests | Debug X86 Windows status
Debug X64 Windows | Checked C and clang regression tests | Debug X64 Windows status
Debug X64 Linux | Checked C and clang regression tests | Debug X64 Linux status
Debug X64 Linux | 3C (Checked-C-Convert tool) nightly tests | Nightly Sanity Tests
Release X64 Linux | Checked C, clang, and LLVM nightly tests | Release X64 Linux status

Contributing

We welcome contributions to the Checked C project. To get involved in the project, see Contributing to Checked C. We have a wish list of possible projects there.

For code contributions, we follow the standard GitHub workflow. See Contributing to Checked C for more detail. You will need to sign a contributor license agreement before contributing code.

Code of conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.



checkedc-clang's Issues

Validate type in bounds-safe interface type annotation

The compiler needs to validate the type of a bounds-safe type annotation relative to the declared type of the variable, member, or function return value:

  • The annotation type must be a checked pointer type.
  • It must be identical to the declared type if all occurrences of checked pointer types are replaced with unchecked pointer types.
  • The referent type of the annotation type must be "at least as checked" as the referent type of the declared type.
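The three rules can be sketched over a toy type representation. This is a minimal sketch, not the clang implementation: `Type`, `TypeKind`, and the helper names are hypothetical, and "at least as checked" is simplified to mean that every pointer level that is checked in the declared type keeps the same checked kind in the annotation, while unchecked levels may become checked.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified type representation (not clang's QualType). */
typedef enum { T_INT, T_CHAR, T_UNCHECKED_PTR, T_PTR, T_ARRAY_PTR } TypeKind;

typedef struct Type {
  TypeKind kind;
  const struct Type *referent; /* non-NULL only for the pointer kinds */
} Type;

static bool is_pointer(TypeKind k) {
  return k == T_UNCHECKED_PTR || k == T_PTR || k == T_ARRAY_PTR;
}

static bool is_checked(TypeKind k) { return k == T_PTR || k == T_ARRAY_PTR; }

/* Rule 2: identical once every checked pointer is treated as unchecked. */
static bool same_ignoring_checkedness(const Type *a, const Type *b) {
  if (is_pointer(a->kind) && is_pointer(b->kind))
    return same_ignoring_checkedness(a->referent, b->referent);
  return a->kind == b->kind;
}

/* Rule 3 (simplified): a level that is checked in the declared type must be
   the same checked kind in the annotation; unchecked levels may become checked.
   Only called after rule 2 holds, so both types have the same shape. */
static bool at_least_as_checked(const Type *ann, const Type *decl) {
  if (!is_pointer(decl->kind)) return true;
  if (is_checked(decl->kind) && ann->kind != decl->kind) return false;
  return at_least_as_checked(ann->referent, decl->referent);
}

/* Validate an annotation type `ann` against the declared type `decl`. */
static bool valid_itype(const Type *ann, const Type *decl) {
  return is_checked(ann->kind)                                /* rule 1 */
      && same_ignoring_checkedness(ann, decl)                 /* rule 2 */
      && at_least_as_checked(ann->referent, decl->referent);  /* rule 3 */
}
```

For example, `ptr<int>` is accepted for a declared `int *`, `ptr<char>` is rejected by rule 2, and `ptr<int *>` is rejected for a declared `ptr<int> *` because the inner level loses its checkedness.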

Check casts of function names to checked function pointer types.

We added new rules to the Checked C specification that allow a function name to be cast from its unchecked function pointer type to the corresponding checked function pointer type, where the two function types are compatible.

The support for casting from an unchecked function pointer type to the corresponding checked function pointer type should already be there, but we do need to add tests for that.

We need to implement a separate test that ensures that the source is actually a function name and not just a variable with an unchecked function pointer type.
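The separate test amounts to looking through the source expression and accepting it only if it names a function declaration. This is a sketch over a hypothetical, simplified expression representation (`Expr`, `ExprKind`, `DeclKind` are made up here). It also assumes that a parenthesized name `(f)` and `&f` count as function names, which the text above does not state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified expression nodes. */
typedef enum { E_DECL_REF, E_PAREN, E_ADDR_OF, E_OTHER } ExprKind;
typedef enum { D_FUNCTION, D_VARIABLE } DeclKind;

typedef struct Expr {
  ExprKind kind;
  DeclKind decl_kind;     /* meaningful only for E_DECL_REF */
  const struct Expr *sub; /* operand for E_PAREN and E_ADDR_OF */
} Expr;

static const Expr *skip_parens(const Expr *e) {
  while (e->kind == E_PAREN) e = e->sub;
  return e;
}

/* True if e is f, (f), or &f where f is a function. A variable that merely
   has an unchecked function pointer type is rejected. */
static bool is_function_name(const Expr *e) {
  e = skip_parens(e);
  if (e->kind == E_ADDR_OF) e = skip_parens(e->sub);
  return e->kind == E_DECL_REF && e->decl_kind == D_FUNCTION;
}
```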

Merge-back to respective upstreams?

First of all, thanks a lot for putting this out in the public domain.

I have a few questions, not sure if this is the best place to ask, but here I go:

A lot more users will directly benefit from this if it is provided as a package through distro repositories for major OS distributions. So my questions are around how the team sees this going forward:

  • Is the team working with upstream to merge changes back to clang/llvm codebases? If so, what release is being targeted/what timeline?
  • If merge-back is not on the cards, what is the long-term plan wrt packaging for various OS distro-flavors? Is the aim to make it available as separate packages on major distros (eg. Debian)?
  • If the plan is to have a separate distribution in the long-term, what does this mean for work that happens on Clang/LLVM upstream? Will patches from Clang/LLVM be merged back from time to time? How will the releases be planned?

improve error messages for misplacement of return bounds expression in function declarations

The parser currently looks for return bounds declarations as part of function declarators. They optionally appear after the argument list. For C functions that return complex constructed types, it is easy to get confused about this subtlety. One might think that the return bounds declaration appears after the declarator for a function and before the body.

Here is an example of a function that returns an unchecked pointer to a 10-element array:

int (*(r31(int arg[10][10])))[10] {
  return arg;
}

If we want to add a bounds interop declaration to this complex declarator, the correct way to write it is:

int (*(r31(int arg[10][10]) : count(10)))[10] {
  return arg;
}

An incorrect way to write this is:

int (*(r31e(int arg[10][10])))[10] : count(10) {
  return arg;
}

This results in the not-very-helpful error message:

expected function body after function declarator

We should improve the error message for this situation in clang. This is an easy error to make. In addition, it would be helpful to clearly document in the Checked C language extension that for function return values, the bounds declaration is part of the function declarator.

Note that this odd case is really a result of the complex way that return types for functions are declared in C. This case doesn't happen when using the new checked types. Instead, one has:

ptr<int[10]> r31(int arg[10][10]) {
  return arg;
}

process bounds expressions for parameters with all the parameters available

Bounds expressions need to be processed during clang's semantic checking/processing phase in a scope with all the parameters available. I am about to submit a pull request where they are being processed in a scope that contains only the parameters seen so far.

This is a little complicated to implement in clang's semantic checking/processing phase. You have to delay parsing of the bounds expressions because parsing and semantic processing are intermixed.
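The idea of delaying the work can be sketched as two phases: parsing records every parameter and stashes each bounds expression unprocessed, and a later pass resolves the stashed expressions once the whole parameter list is in scope. The `Param` record and the single-variable `count(...)` form are hypothetical simplifications; the `all_visible = false` mode models the "parameters seen so far" scope for comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter record produced by phase 1 (parsing). */
typedef struct {
  const char *name;      /* parameter name */
  const char *count_var; /* variable named in count(...), or NULL */
  int resolved;          /* index of count_var after phase 2, or -1 */
} Param;

/* Search the first `visible` parameters for `name`. */
static int lookup(const Param *params, int visible, const char *name) {
  for (int i = 0; i < visible; i++)
    if (strcmp(params[i].name, name) == 0) return i;
  return -1;
}

/* Phase 2: resolve every stashed bounds expression. With all_visible, the
   scope is the full parameter list; otherwise parameter i sees only
   parameters 0..i (the "seen so far" scope). */
static bool resolve_bounds(Param *params, int n, bool all_visible) {
  bool ok = true;
  for (int i = 0; i < n; i++) {
    params[i].resolved = -1;
    if (params[i].count_var == NULL) continue;
    params[i].resolved =
        lookup(params, all_visible ? n : i + 1, params[i].count_var);
    if (params[i].resolved < 0) ok = false;
  }
  return ok;
}
```

For a declaration like `f(array_ptr<int> a : count(n), int n)`, the "seen so far" scope fails to find `n`, while the full scope resolves it to the second parameter.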

Constraint library should deal with conflicts

A conflict could arise in a situation where a function's bounds-safe interface specifies _Ptr<T> for something in an interface, but our analysis determines that it can't be a _Ptr. In that situation, we have some Checked C tools at our disposal to deal with the problem (like casts), but the constraint library can't deal with this situation at all.

The constraint library should be modified so that:

  • Constraints that "hold a value down" are possible (i.e. q_i != ARR /\ q_i != WILD)
  • Conflicting constraints are identified and reported as solve failures

Once that's done, the client of the constraint library can start to implement solutions to conflicts discovered around external specifications.
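The two requested features can be sketched with per-variable lower and upper bounds, assuming the qualifiers are ordered PTR < ARR < WILD (an assumption of this sketch). A "hold down" constraint such as q_i != ARR /\ q_i != WILD lowers the upper bound to PTR, ordinary constraints raise the lower bound, and a variable whose lower bound exceeds its upper bound is reported as a solve failure. Real constraint graphs also propagate between variables; that is omitted here, and all names are hypothetical.

```c
#include <stdbool.h>

/* Assumed ordering of qualifiers: PTR < ARR < WILD. */
typedef enum { Q_PTR = 0, Q_ARR = 1, Q_WILD = 2 } Qual;

typedef struct {
  Qual lower; /* raised by constraints such as q_i >= ARR */
  Qual upper; /* lowered by "hold down" constraints */
} QVar;

static void init_var(QVar *v) { v->lower = Q_PTR; v->upper = Q_WILD; }

static void require_at_least(QVar *v, Qual q) {
  if (q > v->lower) v->lower = q;
}

/* q_i != ARR /\ q_i != WILD is hold_down_to(v, Q_PTR). */
static void hold_down_to(QVar *v, Qual q) {
  if (q < v->upper) v->upper = q;
}

/* Returns false and reports the first conflicting variable if any variable's
   constraints cannot all hold; otherwise picks the least solution. */
static bool solve(const QVar *vars, int n, Qual *out, int *conflict_index) {
  for (int i = 0; i < n; i++) {
    if (vars[i].lower > vars[i].upper) {
      *conflict_index = i;
      return false;
    }
    out[i] = vars[i].lower;
  }
  return true;
}
```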

improve error message for return bounds declarations where the return value has the wrong type

The compiler checks that return bounds declarations are only used for return types that can have bounds expressions. For example, a function that returns a float cannot have a return bounds declaration. Some of the error messages incorrectly suggest an array type as one of the possible valid types. Functions cannot return array types, so this is confusing and should be fixed.

Add support for relative bounds expressions

The Checked C specification has relative bounds expressions. We need to extend the IR to represent relative bounds expressions, add parsing support for that, and add typechecking for that.

Remove declaring unchecked arrays using the 'unchecked' keyword

We decided to stick with the existing rule in the specification that the dimensions of a multi-dimensional array must either all be checked or all be unchecked. There is no need to declare unchecked arrays using the 'unchecked' keyword, so we should remove this feature.

Allow type qualifiers and static keyword in parameter type annotations

C allows type qualifiers or the keyword static to appear within the brackets of an array declarator for a parameter. For example, int a[static 10] means that a should only be passed arguments that have at least 10 elements. int a[const 10] means that a should be treated as having const pointer type in the body of the function.

The code for parsing type names for bounds-safe interface types for parameters did not allow these syntactic forms. This could cause problems for adding bounds-safe interface type annotations to code that uses them. The interface type and the declared type must be identical when checked-ness is ignored for pointers and arrays. We should add this support.
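The two standard C forms described above can be illustrated in plain C11 (C11 because of `_Generic`), without any Checked C syntax, so that any C11 compiler accepts it. The function names are made up for illustration. The second function confirms that inside the body `a` has type `int *const`, so an assignment such as `a = NULL;` would not compile.

```c
#include <stddef.h>

/* `static 10`: callers promise that `a` points to at least 10 elements. */
int sum10(int a[static 10]) {
  int total = 0;
  for (size_t i = 0; i < 10; i++) total += a[i];
  return total;
}

/* `const 10`: the parameter itself is a const pointer, `int *const a`,
   so &a has type `int *const *`. */
int param_is_const_pointer(int a[const 10]) {
  return _Generic(&a, int *const *: 1, default: 0);
}
```

A bounds-safe interface annotation for either parameter has to accept the same qualifiers so that it can match the declared type once checkedness is ignored.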

Replace logic in NewTyp::mkTypForConstrainedType with a recursive AST visitor

The logic in NewTyp::mkTypForConstrainedType is ugly and probably wrong in some cases. It should be replaced with either a ginormous switch statement (which is more intuitive from an ML person's point of view, but not LLVM-idiomatic) or a RecursiveASTVisitor (which is ugly from an ML person's point of view because functionality is smeared across a bunch of tiny functions, but is LLVM-idiomatic).

Restrict variable references in bounds in function types

The clang implementation of bounds in function types only supports references to global variables or parameters declared as part of the function declarator for a function type. We should add checks that enforce these restrictions. Otherwise typechecking may behave in unexpected ways.

Take the LUB of constraints when multiple constraints exist and they might differ

In FunctionVariableConstraint::mkString we take the first constraint in a set for the return and parameters and use them. Right now this should be okay because they should all be constrained to be equal, so whatever one is resolved to should be the same as the others. However, maybe in the future that won't be true, so we should instead be more generic and take the LUB of all the ConstraintVariables in the set.
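Taking the LUB instead of the first element can be sketched as follows, assuming the solved values form the chain PTR < ARR < WILD (an assumption; if more atoms are added and the ordering stops being a chain, a real lattice join is needed instead of a maximum). The names are hypothetical.

```c
#include <stddef.h>

/* Assumed chain of qualifiers: PTR < ARR < WILD. */
typedef enum { Q_PTR = 0, Q_ARR = 1, Q_WILD = 2 } Qual;

/* On a chain, the least upper bound of a set is its maximum element.
   The empty set yields the bottom element, PTR. */
static Qual lub(const Qual *qs, size_t n) {
  Qual result = Q_PTR;
  for (size_t i = 0; i < n; i++)
    if (qs[i] > result) result = qs[i];
  return result;
}
```

When all the constraint variables in the set are equal, this returns the same value as taking the first one, so the change is safe for the current behavior.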

Represent bounds information in function types

There needs to be a representation of bounds information in function types. Currently, the bounds information is not included in function types. This is problematic because in clang, typedefs are represented as names that map to type objects, and uses of typedef'ed names refer to those type objects. It is also problematic for typechecking pointers to function types.

One approach is to use De Bruijn indices: number all the arguments and then change bounds expressions used in types to refer to those argument numbers. We would need a way to represent uses of those arguments in bounds expressions. For now, we could require that two bounds expressions be exactly identical after this transformation. Later on, we could canonicalize bounds expressions and require that they have the same canonical representation. Even later on, we might check to see if one bounds expression implies another.
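The first step of this approach can be sketched with positional, De Bruijn-style indices: each reference to a parameter is replaced by that parameter's position, and two bounds expressions are then compared for exact structural equality. The tiny `BExpr` AST (variables, integer literals, and `+`) is a hypothetical stand-in for clang's expression nodes; references to non-parameters, such as globals, keep their names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounds-expression AST. */
typedef enum { B_VAR, B_INT, B_ADD, B_PARAM_INDEX } BKind;

typedef struct BExpr {
  BKind kind;
  const char *name;        /* B_VAR */
  long value;              /* B_INT literal, or position for B_PARAM_INDEX */
  struct BExpr *lhs, *rhs; /* B_ADD operands */
} BExpr;

/* Rewrite references to parameters into their positions, in place. */
static void to_indices(BExpr *e, const char *const *params, size_t n) {
  if (e->kind == B_ADD) {
    to_indices(e->lhs, params, n);
    to_indices(e->rhs, params, n);
  } else if (e->kind == B_VAR) {
    for (size_t i = 0; i < n; i++)
      if (strcmp(e->name, params[i]) == 0) {
        e->kind = B_PARAM_INDEX;
        e->value = (long)i;
        e->name = NULL;
        return;
      }
  }
}

/* Exact structural equality after the rewrite. */
static bool same(const BExpr *a, const BExpr *b) {
  if (a->kind != b->kind) return false;
  switch (a->kind) {
  case B_VAR: return strcmp(a->name, b->name) == 0;
  case B_INT:
  case B_PARAM_INDEX: return a->value == b->value;
  case B_ADD: return same(a->lhs, b->lhs) && same(a->rhs, b->rhs);
  }
  return false;
}
```

With this rewrite, `count(n)` in a type whose parameters are `(p, n)` and `count(len)` in a type whose parameters are `(q, len)` both become "parameter 1" and compare equal, even though the names differ.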

Check bounds declarations for pointers to constant-sized data for a subset of expressions

This work item is to check bounds declarations for pointers to constant-sized data, where the bounds have the form bounds(x + const1, y + const2). We will check bounds declarations for a subset of expressions that are useful for creating ptr-typed values. These include:

  • address-of operators
  • uses of unchecked arrays with known dimensions
  • uses of unchecked arrays with known dimensions that are not parameters
  • function calls. For function calls, we will need to substitute constant argument expressions for parameter variables occurring in the return count. We will then need to determine whether the resulting expression is a constant expression.
  • casts

It also includes checking bounds at:

  • Simple assignments
  • Function calls
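For constant-sized data, the core check reduces to interval containment. This is a minimal sketch under stated assumptions: both ends of a bounds expression share the same base variable, offsets are constant and counted in elements, and the known bounds of `&x` or of an unchecked array `x` with dimension N are taken to be `bounds(x, x + N)` (N = 1 for `&x`). `ConstBounds` and the function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* bounds(base + lo, base + hi) with constant offsets in elements. */
typedef struct {
  const char *base;
  long lo;
  long hi;
} ConstBounds;

/* Known bounds of &x (n_elems = 1) or of an unchecked array x[n_elems]. */
static ConstBounds bounds_of_object(const char *var, long n_elems) {
  ConstBounds b = {var, 0, n_elems};
  return b;
}

/* At a simple assignment or a call argument, the declared bounds of the
   target are valid if they lie within the known bounds of the source. */
static bool declared_within(ConstBounds declared, ConstBounds source) {
  return strcmp(declared.base, source.base) == 0 &&
         source.lo <= declared.lo && declared.hi <= source.hi;
}
```

For example, given `int a[10]`, declared bounds `bounds(a, a + 5)` are accepted, while `bounds(a, a + 11)` are rejected.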

Type check non-count bounds expressions

Type check bounds declarations and bounds expressions that use the bounds(...) form, following the rules outlined in the Checked C design. These can be used in parameter declarations, variable declarations, and for function return values, following the rules outlined there.

Revamp kinds for bounds expressions

It would be good to have one kind field on bounds expressions instead of specialized kinds per type of bounds expressions. This would make it easier to write code that traverses bounds expressions. Currently, we have to test the type of a bounds expression and then look at the kind field.
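With a single kind field, code that traverses bounds expressions can dispatch with one switch and no preliminary type test. This sketch uses a hypothetical `Bounds` struct and kind names, not the actual clang classes, and computes the size of the bounded region as an example traversal.

```c
/* One kind enum covering every form of bounds expression (hypothetical). */
typedef enum {
  BK_INVALID,
  BK_UNKNOWN,
  BK_COUNT,      /* count(n): n elements */
  BK_BYTE_COUNT, /* byte_count(n): n bytes */
  BK_RANGE       /* bounds(lower, upper), simplified to byte offsets */
} BoundsKind;

typedef struct {
  BoundsKind kind;
  long n;            /* BK_COUNT and BK_BYTE_COUNT */
  long lower, upper; /* BK_RANGE */
} Bounds;

/* Size of the bounded region in bytes, or -1 if it cannot be determined. */
static long extent_in_bytes(const Bounds *b, long elem_size) {
  switch (b->kind) {
  case BK_COUNT: return b->n * elem_size;
  case BK_BYTE_COUNT: return b->n;
  case BK_RANGE: return b->upper - b->lower;
  case BK_INVALID:
  case BK_UNKNOWN: break;
  }
  return -1;
}
```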

Only allow Checked C extension flag for C in clang

We should only allow the Checked C extension flag to be used for C in clang. We should issue an error message for other languages. We're not currently doing the implementation work to enable all the language features for other C family languages supported by clang (such as C++). We're also not testing the other C family languages.
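The check itself is small: reject the combination of the Checked C flag with any input language other than C. The sketch below models it as a hypothetical driver-side function; the `InputLang` enum, the function name, and the diagnostic text are assumptions, and the real check would live in clang's option handling (the flag is assumed here to be `-fcheckedc-extension`).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical classification of the input language. */
typedef enum { LANG_C, LANG_CXX, LANG_OBJC, LANG_OBJCXX } InputLang;

/* Returns true if the options are acceptable; otherwise writes an error. */
static bool check_checkedc_flag(bool checkedc_enabled, InputLang lang,
                                FILE *diag) {
  if (!checkedc_enabled || lang == LANG_C) return true;
  fprintf(diag, "error: the Checked C extension is only supported for C\n");
  return false;
}
```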

typecheck return bounds expression for function pointer types

The Checked C clang implementation is not typechecking return bounds expressions for function pointer types. The following two declarations compile without errors:

void fn121(int (*fnptr)(void) : count(5));
void fn122(array_ptr<void> (*fnptr)(void) : count(5));

The first case declares a return bounds expression of count(5) for an integer return value, which is not legal. The second case declares a return bounds expression of count(5) for an array_ptr<void> return value, which is also illegal, because count counts elements and void has no element size (byte_count is needed instead).
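The fix is to apply the same return-bounds rule to function pointer types that is applied to ordinary function declarations. The predicate below is a sketch of that rule under simplified assumptions: the enums are hypothetical, and only array_ptr and unchecked pointer return types are assumed to accept return bounds.

```c
#include <stdbool.h>

/* Hypothetical classification of a return type and its bounds expression. */
typedef enum { R_INT, R_FLOAT, R_PTR, R_ARRAY_PTR, R_UNCHECKED_PTR } RetKind;
typedef enum { RB_COUNT, RB_BYTE_COUNT, RB_RANGE } RetBoundsKind;

/* Simplified rule set, assumed for illustration. */
static bool return_bounds_allowed(RetKind ret, bool referent_is_void,
                                  RetBoundsKind b) {
  /* Integers, floats, and ptr<T> cannot carry return bounds here. */
  if (ret != R_ARRAY_PTR && ret != R_UNCHECKED_PTR) return false;
  /* count(e) counts elements, so the referent must have a known size. */
  if (b == RB_COUNT && referent_is_void) return false;
  return true;
}
```

Under these rules, `fn121` maps to `(R_INT, false, RB_COUNT)` and `fn122` to `(R_ARRAY_PTR, true, RB_COUNT)`, and both are rejected.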

Re-visit re-writing variable declarations with a typedefed type

Let's say you have a typedef that looks like this:

typedef struct _A { 
  int a;
  int b;
} A, *PA;

There are two variable declarations somewhere in the program:

PA p1 = foo();

and

PA p2 = bar(); 

In one scenario, both foo and bar are unconstrained and can be PTR. The rewriter could make one edit, in the typedef, to change the definition of PA to ptr<struct _A>. Then this will just work.

However, let's say that the return value of foo is PTR but bar is constrained not to be PTR. Then one of these declarations will be rewritten to use ptr and the other won't. Can we do better? One option is to create a new typedef, but there are questions to be answered. What is this typedef called? How do we disambiguate it from the old one? And so on.

Come up with a spectrum of answers to this question and implement them.

Rename keyword in bounds-safe interface type annotation.

Bounds-safe interface type annotations currently have the syntactic form type(type name). The use of the name type is potentially confusing: it could be interpreted literally as specifying a new type for the declared variable. We are going to use the name itype instead to reduce the potential for confusion. This is an abbreviation for interface type.

Handle redeclarations of functions and variables with bounds

For interoperation, we will redeclare existing library functions with additional bounds information. We assume that modifying library source code is not possible when modifying programs that use libraries, so we'll have to add bounds information separately. C allows functions and extern variables to be redeclared, so long as the redeclared versions are compatible with the existing version.

We need to extend the code for checking redeclarations to take bounds into account. For parameters or variables of array_ptr type, the bounds will have to match; for now, we will use syntactic equality. For parameters or variables of unchecked pointer type, if both declarations have bounds, the bounds must match. Otherwise, the bounds will be regarded as being part of a more "complete" declaration of the parameter or variable.
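The redeclaration rules above can be sketched as a merge function. Because bounds are compared syntactically for now, the sketch represents each bounds expression by a normalized spelling (or NULL when absent). `DeclInfo` and the function names are hypothetical, and declarations whose pointer kinds differ are simply treated as a conflict, since their types are not compatible anyway.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of one declaration of a parameter or variable. */
typedef struct {
  bool is_array_ptr;  /* true: array_ptr type; false: unchecked pointer */
  const char *bounds; /* e.g. "count(n)", or NULL when absent */
} DeclInfo;

/* Syntactic equality; two absent bounds also count as equal. */
static bool same_bounds(const char *a, const char *b) {
  if (a == NULL || b == NULL) return a == b;
  return strcmp(a, b) == 0;
}

/* Returns false on a conflicting redeclaration; on success, stores the
   bounds that the merged declaration carries. */
static bool merge_redeclaration(DeclInfo prev, DeclInfo next,
                                const char **merged) {
  if (prev.is_array_ptr != next.is_array_ptr) return false;
  if (prev.is_array_ptr || (prev.bounds != NULL && next.bounds != NULL)) {
    /* array_ptr, or unchecked with bounds on both: bounds must match. */
    if (!same_bounds(prev.bounds, next.bounds)) return false;
    *merged = prev.bounds;
    return true;
  }
  /* Unchecked pointer with bounds on at most one declaration: the bounded
     declaration is the more complete one. */
  *merged = prev.bounds != NULL ? prev.bounds : next.bounds;
  return true;
}
```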

prefix new keywords with underscore

C has a well-established design pattern for introducing new keywords: prefix them with an underscore. This reduces the chance of inadvertent naming conflicts from new keywords. This pattern was used to introduce the Boolean type, for example: the keyword in C is _Bool. The unprefixed keywords can be used by including a header file that #defines each unprefixed keyword as the underscore-based keyword. For example, the standard header file stdbool.h does this for Boolean.
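The stdbool.h pattern can be seen directly in standard C: before C23, the header defines `bool` as a macro expanding to `_Bool`, so both spellings name the same type, while code that does not include such a header keeps the unprefixed name free as an ordinary identifier. The Checked C analogue shown in the comment (`#define ptr _Ptr`) is an assumed illustration of the pattern, not a quotation of any header.

```c
#include <stdbool.h>

/* With stdbool.h included, `bool` and `_Bool` name the same type. */
static int bool_is_Bool(void) {
  return _Generic((bool)0, _Bool: 1, default: 0);
}

/* A header for the new keywords would follow the same pattern, e.g.
     #define ptr _Ptr
   Code that does not include it can still use `ptr` as a plain
   identifier, as real-world code does: */
struct node {
  int *ptr;
};
```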

We have observed that real-world code, such as the Windows header files, uses ptr as a variable and field name, so we will switch to the C pattern.
