c4's People

Contributors

chainhelen, nubok, pborreli, rswier

c4's Issues

Pull request

Hello,
I have forked your repo to my git server and would like you to pull some syntax changes :-)

Does not compile for 64 bit. (segfault in line 442)

> uname -a
Linux 3.17.1-1-ARCH #1 SMP PREEMPT Wed Oct 15 15:04:35 CEST 2014 x86_64 GNU/Linux
> gcc -g -o c4 c4.c
[lots of warnings]
> ./c4 hello.c
segmentation fault (core dumped) ./c4 hello.c

running in gdb

Program received signal SIGSEGV, Segmentation fault.
0x000000000040392a in main (argc=1, argv=0x7fffffffe3b0) at c4.c:442
442   *--sp = EXIT; // call exit if main returns

Interestingly, running it under valgrind (valgrind ./c4 hello.c) works.
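
One plausible explanation, offered as an assumption rather than a confirmed diagnosis: c4.c keeps pointers in plain int variables, and on x86_64 a 64-bit address does not fit in a 32-bit int, so the stack pointer computed just before line 442 may be truncated. The sketch below (the helper names are hypothetical, not part of c4) contrasts round-tripping a pointer through a 32-bit integer with round-tripping it through intptr_t.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if the address survives being squeezed
   into a 32-bit integer and widened again, 0 if bits were lost. */
int survives_int32(const void *ptr) {
    uintptr_t full = (uintptr_t)ptr;
    uint32_t narrow = (uint32_t)full;   /* what a 32-bit int can hold */
    return (uintptr_t)narrow == full;
}

/* Hypothetical helper: the same round trip through intptr_t, which is
   wide enough to hold any object pointer on the platform. */
int survives_intptr(const void *ptr) {
    intptr_t wide = (intptr_t)ptr;
    return (const void *)wide == ptr;
}
```

On a 64-bit system, survives_int32 typically returns 0 for heap or stack addresses, which is the failure mode suspected here, while survives_intptr returns 1.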

Compilation error

gcc -o c4 c4.c -m32
In file included from /usr/include/stdio.h:28:0,
from c4.c:9:
/usr/include/features.h:324:26: fatal error: bits/predefs.h: No such file or directory
compilation terminated.

gcc compile error. expects argument of type ‘int’, but argument 2 has type ‘long long int’

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.11' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-5 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11)

gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11) compile error.

ubuntu@VM-0-4-ubuntu:~/c4$ gcc -o c4 c4.c
c4.c: In function ‘next’:
c4.c:56:16: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long int’ [-Wformat=]
         printf("%d: %.*s", line, p - lp, lp);
                ^
c4.c:56:16: warning: field precision specifier ‘.*’ expects argument of type ‘int’, but argument 3 has type ‘long int’ [-Wformat=]
c4.c:62:34: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long int’ [-Wformat=]
           if (*le <= ADJ) printf(" %d\n", *++le); else printf("\n");
                                  ^
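
A minimal sketch of how these warnings could be silenced, assuming the build defines int as long long (as the 'long long int' in the messages suggests): print the line number with %lld and cast the pointer difference to int for the .* precision. The helper name format_source_line is hypothetical; it only mirrors the printf call at c4.c:56.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper mirroring the printf at c4.c:56.
   line : current line number, held in a long long
   lp   : start of the source line
   p    : one past the end of the source line */
int format_source_line(char *out, size_t cap, long long line,
                       const char *lp, const char *p) {
    assert(p >= lp);  /* the precision must not be negative */
    /* %lld matches long long; ".*" requires an int, hence the cast. */
    return snprintf(out, cap, "%lld: %.*s", line, (int)(p - lp), lp);
}
```

The same pattern (%lld for values, an explicit int cast for precisions) would apply to the printf at c4.c:62.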

Inc not correct

int main()
{
  int n;
  n = 0;
  printf("%d %d %d %d", ++n , n, n++, ++n);
}

c4 result: 1 1 1 3 //correct
c5 result: 3 2 1 1 //wrong

c4 does not support "int* - int*" operation

It looks like when the left-hand side of a subtraction is int*, c4 ignores the type of the right-hand side and treats it as a plain number. The following program demonstrates it:

int main()
{
  int *s, *e, v;

  s = (int*)0xbebebeb0;
  e = (int*)0xbebebeb4;
  v = e - s;
  if (v == 1)
    printf("passed\n");
  else
    printf("failed, e - s = %x\n", v);
  v = e - 1;
  if ((int)v == s)
    printf("passed\n");
  else
    printf("failed, e - 1 = %x\n", v);
}

EarlGray's fork and mine both hit this because we do "jitmap[pc - text]", where jitmap, pc and text are all of type int*. It may not really show up in EarlGray's fork yet, because his c4x86.c cannot compile itself: it still relies on calling a function pointer rather than on your qsort hack.

I think it's OK if c4 does not support this, but then it should at least print a warning. An incorrect program produced this way could be hard to track down.
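
A minimal sketch of the scaling the report expects, written as plain C on raw addresses rather than as a c4 patch. The helper names ptr_diff_elems and ptr_minus_int are hypothetical; they model what a VM has to compute when both operands are pointers versus when only the left one is.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for "pointer - pointer": subtract the raw addresses,
   then divide by the element size to get a count of elements. */
long long ptr_diff_elems(uintptr_t e, uintptr_t s, long long elem_size) {
    assert(elem_size > 0);
    return ((long long)e - (long long)s) / elem_size;
}

/* Hypothetical helper for "pointer - integer": scale the integer by the
   element size first; the result is still an address. */
uintptr_t ptr_minus_int(uintptr_t e, long long n, long long elem_size) {
    assert(elem_size > 0);
    return e - (uintptr_t)(n * elem_size);
}
```

With the addresses from the test program and a 4-byte element, ptr_diff_elems(0xbebebeb4, 0xbebebeb0, 4) gives 1 and ptr_minus_int(0xbebebeb4, 1, 4) gives 0xbebebeb0.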

licensing

Hey!

I'd love to continue extending some of this code for an existing personal project. Any chance you could release the code under a copyfree license?

Compilation error on MacOS

clang info:

Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 11.0.0 (clang-1100.0.33.12)
Target: x86_64-apple-darwin19.3.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

Error while compiling:

c4.c:14:13: note: expanded from macro 'int'
#define int long long
            ^
c4.c:333:5: error: first parameter of 'main' (argument count) must be of type 'int'
int main(int argc, char **argv) {

Assembly (-s) listing of branches and jumps is incorrect

If you build and do this:
./c4 -s c4.c | tail -20 | head -5
You will see

        LI
        LEV
    525:     else { printf("unknown instruction = %d! cycle = %d\n", i, cycle); return -1; }
        JMP  0
        IMM  -2357464

But when this is executed using the -d option you of course don't have crashes from branching to zero, and the instructions show non-zero targets in every executed case.

I believe the root cause is that the assembly is printed from inside next(). At that point, the code starting at line 291 in stmt() will often not have back-patched the branches yet:

    if (tk == ')') next(); else { printf("%d: close paren expected\n", line); exit(-1); }
    *++e = BZ; b = ++e;
    stmt();

The recursive call to stmt() calls next() eventually, which prints the instructions that have not yet been printed, i.e. while (le < e), and this includes the byte at b which is to be filled in later.

At the moment I believe there is a fairly economical approach to this, but it would involve postponing the entire listing to after the parse, with some sort of list of (source_line,object) pairs or array of source lines so the C and assembly can be interleaved, with perhaps an atexit() to provide the listing if the parse aborts.
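
A rough sketch of that idea, assuming that recording one (source line, code position) pair per line is enough. The names mark, note_line and dump_listing are hypothetical, and the code words are printed as raw numbers rather than mnemonics to keep the example short. The only point is that printing happens after the parse, when every branch operand has been back-patched.

```c
#include <assert.h>
#include <stdio.h>

enum { MAX_LINES = 1024 };

/* Hypothetical record: once source line `line` had been scanned,
   the emitted code ended at index `end` (exclusive). */
struct mark { int line; int end; };

static struct mark marks[MAX_LINES];
static int nmarks;

/* Call this where next() currently prints: remember the position only. */
void note_line(int line, int code_end) {
    if (nmarks < MAX_LINES) {
        marks[nmarks].line = line;
        marks[nmarks].end = code_end;
        ++nmarks;
    }
}

/* Call this once parsing is finished. Prints each line number followed by
   the code words emitted for it; returns the number of words printed. */
int dump_listing(const long long *code, FILE *out) {
    int i, pos = 0, printed = 0;
    assert(code != NULL && out != NULL);
    for (i = 0; i < nmarks; ++i) {
        fprintf(out, "%d:\n", marks[i].line);
        while (pos < marks[i].end) {
            fprintf(out, "    %lld\n", code[pos++]);
            ++printed;
        }
    }
    return printed;
}
```

A real version would also need to keep the source text of each line and decode opcodes and operands the way the current listing does.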

bug: close paren expected in sizeof

When c4 compiles this code (which gcc accepts), it reports "close paren expected in sizeof":

#include <stdio.h>

int main(){
  printf("%d\n",sizeof('a'));
}

`329: duplicate global definition` when compiling and interpreting itself

After fixing the compile error and some warnings, I got 329: duplicate global definition when using c4 to compile and interpret itself. Why does this happen?
The modified code is as follows:

#include <stdio.h>
#include <stdlib.h>
#include <memory.h>
#include <fcntl.h>
#define int long long

char *p, *lp, // current position in source code
     *data;   // data/bss pointer

int *e, *le,  // current position in emitted code
    *id,      // currently parsed identifier
    *sym,     // symbol table (simple list of identifiers)
    tk,       // current token
    ival,     // current token value
    ty,       // current expression type
    loc,      // local variable offset
    line,     // current line number
    src,      // print source and assembly flag
    debug;    // print executed instructions

// tokens and classes (operators last and in precedence order)
enum {
  Num = 128, Fun, Sys, Glo, Loc, Id,
  Char, Else, Enum, If, Int, Return, Sizeof, While,
  Assign, Cond, Lor, Lan, Or, Xor, And, Eq, Ne, Lt, Gt, Le, Ge, Shl, Shr, Add, Sub, Mul, Div, Mod, Inc, Dec, Brak
};

// opcodes
enum { LEA ,IMM ,JMP ,JSR ,BZ  ,BNZ ,ENT ,ADJ ,LEV ,LI  ,LC  ,SI  ,SC  ,PSH ,
       OR  ,XOR ,AND ,EQ  ,NE  ,LT  ,GT  ,LE  ,GE  ,SHL ,SHR ,ADD ,SUB ,MUL ,DIV ,MOD ,
       OPEN,READ,CLOS,PRTF,MALC,FREE,MSET,MCMP,EXIT };

// types
enum { CHAR, INT, PTR };

// identifier offsets (since we can't create an ident struct)
enum { Tk, Hash, Name, Class, Type, Val, HClass, HType, HVal, Idsz };

void next() {
  char *pp;

  while ((tk = *p)) {
    ++p;
    if (tk == '\n') {
      if (src) {
        printf("%lld: %.*s", line, p - lp, lp);
        lp = p;
        while (le < e) {
          printf("%8.4s", &"LEA ,IMM ,JMP ,JSR ,BZ  ,BNZ ,ENT ,ADJ ,LEV ,LI  ,LC  ,SI  ,SC  ,PSH ,"
                           "OR  ,XOR ,AND ,EQ  ,NE  ,LT  ,GT  ,LE  ,GE  ,SHL ,SHR ,ADD ,SUB ,MUL ,DIV ,MOD ,"
                           "OPEN,READ,CLOS,PRTF,MALC,FREE,MSET,MCMP,EXIT,"[*++le * 5]);
          if (*le <= ADJ) printf(" %lld\n", *++le); else printf("\n");
        }
      }
      ++line;
    }
    else if (tk == '#') {
      while (*p != 0 && *p != '\n') ++p;
    }
    else if ((tk >= 'a' && tk <= 'z') || (tk >= 'A' && tk <= 'Z') || tk == '_') {
      pp = p - 1;
      while ((*p >= 'a' && *p <= 'z') || (*p >= 'A' && *p <= 'Z') || (*p >= '0' && *p <= '9') || *p == '_')
        tk = tk * 147 + *p++;
      tk = (tk << 6) + (p - pp);
      id = sym;
      while (id[Tk]) {
        if (tk == id[Hash] && !memcmp((char *)id[Name], pp, p - pp)) { tk = id[Tk]; return; }
        id = id + Idsz;
      }
      id[Name] = (int)pp;
      id[Hash] = tk;
      tk = id[Tk] = Id;
      return;
    }
    else if (tk >= '0' && tk <= '9') {
      if ((ival = tk - '0')) { while (*p >= '0' && *p <= '9') ival = ival * 10 + *p++ - '0'; }
      else if (*p == 'x' || *p == 'X') {
        while ((tk = *++p) && ((tk >= '0' && tk <= '9') || (tk >= 'a' && tk <= 'f') || (tk >= 'A' && tk <= 'F')))
          ival = ival * 16 + (tk & 15) + (tk >= 'A' ? 9 : 0);
      }
      else { while (*p >= '0' && *p <= '7') ival = ival * 8 + *p++ - '0'; }
      tk = Num;
      return;
    }
    else if (tk == '/') {
      if (*p == '/') {
        ++p;
        while (*p != 0 && *p != '\n') ++p;
      }
      else {
        tk = Div;
        return;
      }
    }
    else if (tk == '\'' || tk == '"') {
      pp = data;
      while (*p != 0 && *p != tk) {
        if ((ival = *p++) == '\\') {
          if ((ival = *p++) == 'n') ival = '\n';
        }
        if (tk == '"') *data++ = ival;
      }
      ++p;
      if (tk == '"') ival = (int)pp; else tk = Num;
      return;
    }
    else if (tk == '=') { if (*p == '=') { ++p; tk = Eq; } else tk = Assign; return; }
    else if (tk == '+') { if (*p == '+') { ++p; tk = Inc; } else tk = Add; return; }
    else if (tk == '-') { if (*p == '-') { ++p; tk = Dec; } else tk = Sub; return; }
    else if (tk == '!') { if (*p == '=') { ++p; tk = Ne; } return; }
    else if (tk == '<') { if (*p == '=') { ++p; tk = Le; } else if (*p == '<') { ++p; tk = Shl; } else tk = Lt; return; }
    else if (tk == '>') { if (*p == '=') { ++p; tk = Ge; } else if (*p == '>') { ++p; tk = Shr; } else tk = Gt; return; }
    else if (tk == '|') { if (*p == '|') { ++p; tk = Lor; } else tk = Or; return; }
    else if (tk == '&') { if (*p == '&') { ++p; tk = Lan; } else tk = And; return; }
    else if (tk == '^') { tk = Xor; return; }
    else if (tk == '%') { tk = Mod; return; }
    else if (tk == '*') { tk = Mul; return; }
    else if (tk == '[') { tk = Brak; return; }
    else if (tk == '?') { tk = Cond; return; }
    else if (tk == '~' || tk == ';' || tk == '{' || tk == '}' || tk == '(' || tk == ')' || tk == ']' || tk == ',' || tk == ':') return;
  }
}

void expr(int lev) {
  int t, *d;

  if (!tk) { printf("%lld: unexpected eof in expression\n", line); exit(-1); }
  else if (tk == Num) { *++e = IMM; *++e = ival; next(); ty = INT; }
  else if (tk == '"') {
    *++e = IMM; *++e = ival; next();
    while (tk == '"') next();
    data = (char *)((int)data + sizeof(int) & -sizeof(int)); ty = PTR;
  }
  else if (tk == Sizeof) {
    next(); if (tk == '(') next(); else { printf("%lld: open paren expected in sizeof\n", line); exit(-1); }
    ty = INT; if (tk == Int) next(); else if (tk == Char) { next(); ty = CHAR; }
    while (tk == Mul) { next(); ty = ty + PTR; }
    if (tk == ')') next(); else { printf("%lld: close paren expected in sizeof\n", line); exit(-1); }
    *++e = IMM; *++e = (ty == CHAR) ? sizeof(char) : sizeof(int);
    ty = INT;
  }
  else if (tk == Id) {
    d = id; next();
    if (tk == '(') {
      next();
      t = 0;
      while (tk != ')') { expr(Assign); *++e = PSH; ++t; if (tk == ',') next(); }
      next();
      if (d[Class] == Sys) *++e = d[Val];
      else if (d[Class] == Fun) { *++e = JSR; *++e = d[Val]; }
      else { printf("%lld: bad function call\n", line); exit(-1); }
      if (t) { *++e = ADJ; *++e = t; }
      ty = d[Type];
    }
    else if (d[Class] == Num) { *++e = IMM; *++e = d[Val]; ty = INT; }
    else {
      if (d[Class] == Loc) { *++e = LEA; *++e = loc - d[Val]; }
      else if (d[Class] == Glo) { *++e = IMM; *++e = d[Val]; }
      else { printf("%lld: undefined variable\n", line); exit(-1); }
      *++e = ((ty = d[Type]) == CHAR) ? LC : LI;
    }
  }
  else if (tk == '(') {
    next();
    if (tk == Int || tk == Char) {
      t = (tk == Int) ? INT : CHAR; next();
      while (tk == Mul) { next(); t = t + PTR; }
      if (tk == ')') next(); else { printf("%lld: bad cast\n", line); exit(-1); }
      expr(Inc);
      ty = t;
    }
    else {
      expr(Assign);
      if (tk == ')') next(); else { printf("%lld: close paren expected\n", line); exit(-1); }
    }
  }
  else if (tk == Mul) {
    next(); expr(Inc);
    if (ty > INT) ty = ty - PTR; else { printf("%lld: bad dereference\n", line); exit(-1); }
    *++e = (ty == CHAR) ? LC : LI;
  }
  else if (tk == And) {
    next(); expr(Inc);
    if (*e == LC || *e == LI) --e; else { printf("%lld: bad address-of\n", line); exit(-1); }
    ty = ty + PTR;
  }
  else if (tk == '!') { next(); expr(Inc); *++e = PSH; *++e = IMM; *++e = 0; *++e = EQ; ty = INT; }
  else if (tk == '~') { next(); expr(Inc); *++e = PSH; *++e = IMM; *++e = -1; *++e = XOR; ty = INT; }
  else if (tk == Add) { next(); expr(Inc); ty = INT; }
  else if (tk == Sub) {
    next(); *++e = IMM;
    if (tk == Num) { *++e = -ival; next(); } else { *++e = -1; *++e = PSH; expr(Inc); *++e = MUL; }
    ty = INT;
  }
  else if (tk == Inc || tk == Dec) {
    t = tk; next(); expr(Inc);
    if (*e == LC) { *e = PSH; *++e = LC; }
    else if (*e == LI) { *e = PSH; *++e = LI; }
    else { printf("%lld: bad lvalue in pre-increment\n", line); exit(-1); }
    *++e = PSH;
    *++e = IMM; *++e = (ty > PTR) ? sizeof(int) : sizeof(char);
    *++e = (t == Inc) ? ADD : SUB;
    *++e = (ty == CHAR) ? SC : SI;
  }
  else { printf("%lld: bad expression\n", line); exit(-1); }

  while (tk >= lev) { // "precedence climbing" or "Top Down Operator Precedence" method
    t = ty;
    if (tk == Assign) {
      next();
      if (*e == LC || *e == LI) *e = PSH; else { printf("%lld: bad lvalue in assignment\n", line); exit(-1); }
      expr(Assign); *++e = ((ty = t) == CHAR) ? SC : SI;
    }
    else if (tk == Cond) {
      next();
      *++e = BZ; d = ++e;
      expr(Assign);
      if (tk == ':') next(); else { printf("%lld: conditional missing colon\n", line); exit(-1); }
      *d = (int)(e + 3); *++e = JMP; d = ++e;
      expr(Cond);
      *d = (int)(e + 1);
    }
    else if (tk == Lor) { next(); *++e = BNZ; d = ++e; expr(Lan); *d = (int)(e + 1); ty = INT; }
    else if (tk == Lan) { next(); *++e = BZ;  d = ++e; expr(Or);  *d = (int)(e + 1); ty = INT; }
    else if (tk == Or)  { next(); *++e = PSH; expr(Xor); *++e = OR;  ty = INT; }
    else if (tk == Xor) { next(); *++e = PSH; expr(And); *++e = XOR; ty = INT; }
    else if (tk == And) { next(); *++e = PSH; expr(Eq);  *++e = AND; ty = INT; }
    else if (tk == Eq)  { next(); *++e = PSH; expr(Lt);  *++e = EQ;  ty = INT; }
    else if (tk == Ne)  { next(); *++e = PSH; expr(Lt);  *++e = NE;  ty = INT; }
    else if (tk == Lt)  { next(); *++e = PSH; expr(Shl); *++e = LT;  ty = INT; }
    else if (tk == Gt)  { next(); *++e = PSH; expr(Shl); *++e = GT;  ty = INT; }
    else if (tk == Le)  { next(); *++e = PSH; expr(Shl); *++e = LE;  ty = INT; }
    else if (tk == Ge)  { next(); *++e = PSH; expr(Shl); *++e = GE;  ty = INT; }
    else if (tk == Shl) { next(); *++e = PSH; expr(Add); *++e = SHL; ty = INT; }
    else if (tk == Shr) { next(); *++e = PSH; expr(Add); *++e = SHR; ty = INT; }
    else if (tk == Add) {
      next(); *++e = PSH; expr(Mul);
      if ((ty = t) > PTR) { *++e = PSH; *++e = IMM; *++e = sizeof(int); *++e = MUL;  }
      *++e = ADD;
    }
    else if (tk == Sub) {
      next(); *++e = PSH; expr(Mul);
      if (t > PTR && t == ty) { *++e = SUB; *++e = PSH; *++e = IMM; *++e = sizeof(int); *++e = DIV; ty = INT; }
      else if ((ty = t) > PTR) { *++e = PSH; *++e = IMM; *++e = sizeof(int); *++e = MUL; *++e = SUB; }
      else *++e = SUB;
    }
    else if (tk == Mul) { next(); *++e = PSH; expr(Inc); *++e = MUL; ty = INT; }
    else if (tk == Div) { next(); *++e = PSH; expr(Inc); *++e = DIV; ty = INT; }
    else if (tk == Mod) { next(); *++e = PSH; expr(Inc); *++e = MOD; ty = INT; }
    else if (tk == Inc || tk == Dec) {
      if (*e == LC) { *e = PSH; *++e = LC; }
      else if (*e == LI) { *e = PSH; *++e = LI; }
      else { printf("%lld: bad lvalue in post-increment\n", line); exit(-1); }
      *++e = PSH; *++e = IMM; *++e = (ty > PTR) ? sizeof(int) : sizeof(char);
      *++e = (tk == Inc) ? ADD : SUB;
      *++e = (ty == CHAR) ? SC : SI;
      *++e = PSH; *++e = IMM; *++e = (ty > PTR) ? sizeof(int) : sizeof(char);
      *++e = (tk == Inc) ? SUB : ADD;
      next();
    }
    else if (tk == Brak) {
      next(); *++e = PSH; expr(Assign);
      if (tk == ']') next(); else { printf("%lld: close bracket expected\n", line); exit(-1); }
      if (t > PTR) { *++e = PSH; *++e = IMM; *++e = sizeof(int); *++e = MUL;  }
      else if (t < PTR) { printf("%lld: pointer type expected\n", line); exit(-1); }
      *++e = ADD;
      *++e = ((ty = t - PTR) == CHAR) ? LC : LI;
    }
    else { printf("%lld: compiler error tk=%lld\n", line, tk); exit(-1); }
  }
}

void stmt() {
  int *a, *b;

  if (tk == If) {
    next();
    if (tk == '(') next(); else { printf("%lld: open paren expected\n", line); exit(-1); }
    expr(Assign);
    if (tk == ')') next(); else { printf("%lld: close paren expected\n", line); exit(-1); }
    *++e = BZ; b = ++e;
    stmt();
    if (tk == Else) {
      *b = (int)(e + 3); *++e = JMP; b = ++e;
      next();
      stmt();
    }
    *b = (int)(e + 1);
  }
  else if (tk == While) {
    next();
    a = e + 1;
    if (tk == '(') next(); else { printf("%lld: open paren expected\n", line); exit(-1); }
    expr(Assign);
    if (tk == ')') next(); else { printf("%lld: close paren expected\n", line); exit(-1); }
    *++e = BZ; b = ++e;
    stmt();
    *++e = JMP; *++e = (int)a;
    *b = (int)(e + 1);
  }
  else if (tk == Return) {
    next();
    if (tk != ';') expr(Assign);
    *++e = LEV;
    if (tk == ';') next(); else { printf("%lld: semicolon expected\n", line); exit(-1); }
  }
  else if (tk == '{') {
    next();
    while (tk != '}') stmt();
    next();
  }
  else if (tk == ';') {
    next();
  }
  else {
    expr(Assign);
    if (tk == ';') next(); else { printf("%lld: semicolon expected\n", line); exit(-1); }
  }
}

signed main(signed argc, char **argv) {
  signed fd;
  int bt, typ, poolsz, *idmain;
  int *pc, *sp, *bp, a, cycle; // vm registers
  int i, *t; // temps

  --argc; ++argv;
  if (argc > 0 && **argv == '-' && (*argv)[1] == 's') { src = 1; --argc; ++argv; }
  if (argc > 0 && **argv == '-' && (*argv)[1] == 'd') { debug = 1; --argc; ++argv; }
  if (argc < 1) { printf("usage: c4 [-s] [-d] file ...\n"); return -1; }

  if ((fd = open(*argv, 0)) < 0) { printf("could not open(%s)\n", *argv); return -1; }

  poolsz = 256*1024; // arbitrary size
  if (!(sym = malloc(poolsz))) { printf("could not malloc(%lld) symbol area\n", poolsz); return -1; }
  if (!(le = e = malloc(poolsz))) { printf("could not malloc(%lld) text area\n", poolsz); return -1; }
  if (!(data = malloc(poolsz))) { printf("could not malloc(%lld) data area\n", poolsz); return -1; }
  if (!(sp = malloc(poolsz))) { printf("could not malloc(%lld) stack area\n", poolsz); return -1; }

  memset(sym,  0, poolsz);
  memset(e,    0, poolsz);
  memset(data, 0, poolsz);

  p = "char else enum if int return sizeof while "
      "open read close printf malloc free memset memcmp exit void main";
  i = Char; while (i <= While) { next(); id[Tk] = i++; } // add keywords to symbol table
  i = OPEN; while (i <= EXIT) { next(); id[Class] = Sys; id[Type] = INT; id[Val] = i++; } // add library to symbol table
  next(); id[Tk] = Char; // handle void type
  next(); idmain = id; // keep track of main

  if (!(lp = p = malloc(poolsz))) { printf("could not malloc(%lld) source area\n", poolsz); return -1; }
  if ((i = read(fd, p, poolsz-1)) <= 0) { printf("read() returned %lld\n", i); return -1; }
  p[i] = 0;
  close(fd);

  // parse declarations
  line = 1;
  next();
  while (tk) {
    bt = INT; // basetype
    if (tk == Int) next();
    else if (tk == Char) { next(); bt = CHAR; }
    else if (tk == Enum) {
      next();
      if (tk != '{') next();
      if (tk == '{') {
        next();
        i = 0;
        while (tk != '}') {
          if (tk != Id) { printf("%lld: bad enum identifier %lld\n", line, tk); return -1; }
          next();
          if (tk == Assign) {
            next();
            if (tk != Num) { printf("%lld: bad enum initializer\n", line); return -1; }
            i = ival;
            next();
          }
          id[Class] = Num; id[Type] = INT; id[Val] = i++;
          if (tk == ',') next();
        }
        next();
      }
    }
    while (tk != ';' && tk != '}') {
      typ = bt;
      while (tk == Mul) { next(); typ = typ + PTR; }
      if (tk != Id) { printf("%lld: bad global declaration\n", line); return -1; }
      if (id[Class]) { printf("%lld: duplicate global definition\n", line); return -1; }
      next();
      id[Type] = typ;
      if (tk == '(') { // function
        id[Class] = Fun;
        id[Val] = (int)(e + 1);
        next(); i = 0;
        while (tk != ')') {
          typ = INT;
          if (tk == Int) next();
          else if (tk == Char) { next(); typ = CHAR; }
          while (tk == Mul) { next(); typ = typ + PTR; }
          if (tk != Id) { printf("%lld: bad parameter declaration\n", line); return -1; }
          if (id[Class] == Loc) { printf("%lld: duplicate parameter definition\n", line); return -1; }
          id[HClass] = id[Class]; id[Class] = Loc;
          id[HType]  = id[Type];  id[Type] = typ;
          id[HVal]   = id[Val];   id[Val] = i++;
          next();
          if (tk == ',') next();
        }
        next();
        if (tk != '{') { printf("%lld: bad function definition\n", line); return -1; }
        loc = ++i;
        next();
        while (tk == Int || tk == Char) {
          bt = (tk == Int) ? INT : CHAR;
          next();
          while (tk != ';') {
            typ = bt;
            while (tk == Mul) { next(); typ = typ + PTR; }
            if (tk != Id) { printf("%lld: bad local declaration\n", line); return -1; }
            if (id[Class] == Loc) { printf("%lld: duplicate local definition\n", line); return -1; }
            id[HClass] = id[Class]; id[Class] = Loc;
            id[HType]  = id[Type];  id[Type]  = typ;
            id[HVal]   = id[Val];   id[Val]   = ++i;
            next();
            if (tk == ',') next();
          }
          next();
        }
        *++e = ENT; *++e = i - loc;
        while (tk != '}') stmt();
        *++e = LEV;
        id = sym; // unwind symbol table locals
        while (id[Tk]) {
          if (id[Class] == Loc) {
            id[Class] = id[HClass];
            id[Type] = id[HType];
            id[Val] = id[HVal];
          }
          id = id + Idsz;
        }
      }
      else {
        id[Class] = Glo;
        id[Val] = (int)data;
        data = data + sizeof(int);
      }
      if (tk == ',') next();
    }
    next();
  }

  if (!(pc = (int *)idmain[Val])) { printf("main() not defined\n"); return -1; }
  if (src) return 0;

  // setup stack
  bp = sp = (int *)((int)sp + poolsz);
  *--sp = EXIT; // call exit if main returns
  *--sp = PSH; t = sp;
  *--sp = argc;
  *--sp = (int)argv;
  *--sp = (int)t;

  // run...
  cycle = 0;
  while (1) {
    i = *pc++; ++cycle;
    if (debug) {
      printf("%lld> %.4s", cycle,
        &"LEA ,IMM ,JMP ,JSR ,BZ  ,BNZ ,ENT ,ADJ ,LEV ,LI  ,LC  ,SI  ,SC  ,PSH ,"
         "OR  ,XOR ,AND ,EQ  ,NE  ,LT  ,GT  ,LE  ,GE  ,SHL ,SHR ,ADD ,SUB ,MUL ,DIV ,MOD ,"
         "OPEN,READ,CLOS,PRTF,MALC,FREE,MSET,MCMP,EXIT,"[i * 5]);
      if (i <= ADJ) printf(" %lld\n", *pc); else printf("\n");
    }
    if      (i == LEA) a = (int)(bp + *pc++);                             // load local address
    else if (i == IMM) a = *pc++;                                         // load global address or immediate
    else if (i == JMP) pc = (int *)*pc;                                   // jump
    else if (i == JSR) { *--sp = (int)(pc + 1); pc = (int *)*pc; }        // jump to subroutine
    else if (i == BZ)  pc = a ? pc + 1 : (int *)*pc;                      // branch if zero
    else if (i == BNZ) pc = a ? (int *)*pc : pc + 1;                      // branch if not zero
    else if (i == ENT) { *--sp = (int)bp; bp = sp; sp = sp - *pc++; }     // enter subroutine
    else if (i == ADJ) sp = sp + *pc++;                                   // stack adjust
    else if (i == LEV) { sp = bp; bp = (int *)*sp++; pc = (int *)*sp++; } // leave subroutine
    else if (i == LI)  a = *(int *)a;                                     // load int
    else if (i == LC)  a = *(char *)a;                                    // load char
    else if (i == SI)  *(int *)*sp++ = a;                                 // store int
    else if (i == SC)  a = *(char *)*sp++ = a;                            // store char
    else if (i == PSH) *--sp = a;                                         // push

    else if (i == OR)  a = *sp++ |  a;
    else if (i == XOR) a = *sp++ ^  a;
    else if (i == AND) a = *sp++ &  a;
    else if (i == EQ)  a = *sp++ == a;
    else if (i == NE)  a = *sp++ != a;
    else if (i == LT)  a = *sp++ <  a;
    else if (i == GT)  a = *sp++ >  a;
    else if (i == LE)  a = *sp++ <= a;
    else if (i == GE)  a = *sp++ >= a;
    else if (i == SHL) a = *sp++ << a;
    else if (i == SHR) a = *sp++ >> a;
    else if (i == ADD) a = *sp++ +  a;
    else if (i == SUB) a = *sp++ -  a;
    else if (i == MUL) a = *sp++ *  a;
    else if (i == DIV) a = *sp++ /  a;
    else if (i == MOD) a = *sp++ %  a;

    else if (i == OPEN) a = open((char *)sp[1], *sp);
    else if (i == READ) a = read(sp[2], (char *)sp[1], *sp);
    else if (i == CLOS) a = close(*sp);
    else if (i == PRTF) { t = sp + pc[1]; a = printf((char *)t[-1], t[-2], t[-3], t[-4], t[-5], t[-6]); }
    else if (i == MALC) a = (int)malloc(*sp);
    else if (i == FREE) free((void *)*sp);
    else if (i == MSET) a = (int)memset((char *)sp[2], sp[1], *sp);
    else if (i == MCMP) a = memcmp((char *)sp[2], (char *)sp[1], *sp);
    else if (i == EXIT) { printf("exit(%lld) cycle = %lld\n", *sp, cycle); return *sp; }
    else { printf("unknown instruction = %lld! cycle = %lld\n", i, cycle); return -1; }
  }
}

No need to add headers by using C4 to compile

I deleted all the #include lines and found that self-compilation still works.

The way it 'supports' library functions (such as printf, malloc, memset, etc.) really interests me. How is it implemented?
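
In the c4 source, the names open, read, close, printf, malloc, free, memset, memcmp and exit are entered into the symbol table with class Sys and an opcode as their value. A call to such a name emits that opcode, and the VM loop implements the opcode by calling the host C library. Below is a much-reduced sketch of that scheme; lookup_builtin, run_builtin and the opcode numbering are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Opcodes for host functions. The names echo c4's MALC/MSET/MCMP;
   the numbering here is arbitrary. */
enum { OP_MALC = 1, OP_MSET, OP_MCMP };

struct builtin { const char *name; int op; };

static const struct builtin builtins[] = {
    { "malloc", OP_MALC }, { "memset", OP_MSET }, { "memcmp", OP_MCMP },
};

/* "Compile time": resolve a name to its opcode, or 0 if it is not built in. */
int lookup_builtin(const char *name) {
    size_t i;
    for (i = 0; i < sizeof builtins / sizeof builtins[0]; ++i)
        if (strcmp(builtins[i].name, name) == 0) return builtins[i].op;
    return 0;
}

/* "Run time": arguments sit on the VM stack with the last one at sp[0],
   as in c4. The opcode is carried out by the host's libc. */
long long run_builtin(int op, const long long *sp) {
    assert(sp != NULL);
    if (op == OP_MALC) return (long long)(intptr_t)malloc((size_t)sp[0]);
    if (op == OP_MSET) return (long long)(intptr_t)memset((void *)(intptr_t)sp[2], (int)sp[1], (size_t)sp[0]);
    if (op == OP_MCMP) return memcmp((void *)(intptr_t)sp[2], (void *)(intptr_t)sp[1], (size_t)sp[0]);
    return -1;  /* unknown opcode */
}
```

Because the names are resolved by the compiler itself, the interpreted program needs no headers: the declarations only matter to the host compiler that builds c4.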

grammar error in function call when missing comma between parameters

update: pull request #47

A call with a missing comma, test(1 2), compiles without any error:

int test(int a, int b)
{
  return a + b;  
}

int main()
{
  int result;
  result = test(1 2);
  return 0;
}

patch

  else if (tk == Id) {
    ...
--  while (tk != ')') { expr(Assign); *++e = PSH; ++t; if (tk == ',') next(); }
++  while (tk != ')') { 
++    expr(Assign); *++e = PSH; ++t;
++    if (tk == ',') { next(); if(tk == ')') { printf("%d: error unexpected comma in function call\n", line); exit(-1); }}
++    else if(tk != ')') { printf("%d: error missing comma in function call\n", line); exit(-1); }
++  }
    ...
}

c4 does not compile without further #include added

I just tried to compile using gcc 5.3.0 on x86_64 Arch Linux and it failed to compile, giving the following error:

c4.c: In function ‘main’:
c4.c:342:13: warning: implicit declaration of function ‘open’ [-Wimplicit-function-declaration]
   if ((fd = open(*argv, 0)) < 0) { printf("could not open(%s)\n", *argv); return -1; }
             ^

Command used was gcc -m32 -o c4 c4.c.

By adding #include <fcntl.h> as listed in the open() man page, it compiles and works fine.

Is this a correct fix and shall I send a pull request to do such an addition?

How to call external functions.

To make a script useful in the real world, I'd like to write:

int main()
{
  int sens1_val;
  int sens2_val;
  while(1)
  {
    sens1_val = m_sensor.GetDataByValType(1, SENS_VAL_TYPE_TEMP);
    sens2_val = m_sensor.GetDataByValType(2, SENS_VAL_TYPE_HUM);
    if ( sens1_val > 30.5 && sens2_val > 60)
      Out(1, true);
  }
}

But how do I point the script at external functions (m_sensor.GetDataByValType, Out)?

10 = 666; considered legal

I cut my teeth on Small-C eons ago and c4 is an absolute delight and very clever. However, I did think the lvalue trick looked too fragile and indeed, if you generate just the right constant, like 9 or 10, you can trick it.

int main(int argc, char **argv) {
    10 = 666;
}

This compiles and will lead to stack corruption.
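
A small sketch of why this slips through, assuming the opcode order from c4's enum (LI is 9, LC is 10). The assignment check only looks at the last emitted word, and for the expression 10 that word is the immediate operand, which happens to equal LC. The functions looks_like_lvalue and is_lvalue_strict are hypothetical; the second shows one possible fix, which tracks where the last opcode was emitted.

```c
/* The first opcodes in c4's order, so that LI == 9 and LC == 10. */
enum { LEA, IMM, JMP, JSR, BZ, BNZ, ENT, ADJ, LEV, LI, LC };

/* Mirrors c4's check: inspect only the last emitted word. */
int looks_like_lvalue(const long long *code, int n) {
    return n > 0 && (code[n - 1] == LC || code[n - 1] == LI);
}

/* Hypothetical stricter check: last_op is the index of the most recently
   emitted opcode, so an operand of IMM cannot be mistaken for a load. */
int is_lvalue_strict(const long long *code, int n, int last_op) {
    return n > 0 && last_op == n - 1 &&
           (code[last_op] == LC || code[last_op] == LI);
}
```

For the code sequence IMM 10, looks_like_lvalue returns 1 while is_lvalue_strict (with last_op pointing at IMM) returns 0; for a genuine variable load ending in LI, both return 1.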

Definitely not hard to fix and probably not worthwhile.

how to compile c4.c by c4?

I ran c4 hello.c successfully, so next I wanted to compile c4.c with c4 itself.
I tried c4 c4.c, which printed:

usage: c4 [-s] [-d] file ...
exit(-1) cycle = 45

Then I tried c4 -s -d c4.c, which printed a long assembly listing in text form, like this:

525: else { printf("unknown instruction = %d! cycle = %d\n", i, cycle); return -1; }
JMP 0
IMM 3356448
PSH
LEA -11
LI
PSH
LEA -10
LI
PSH
PRTF
ADJ 3
IMM -1
LEV
526: }
527: }
JMP 3123380
LEV

So I am wondering how to compile c4.c into a binary with c4 itself. Thanks!

C to assembly issue

Why is the sample program converted to assembly when its execution uses C functions (like printf)? I haven't seen any place where the assembly is executed.
Does this have any significance, or is it just for demo purposes?
Don't get me wrong, I just haven't got my head around it yet.

Bad local declaration

I get this error when I use the following declaration:

int a = 5;

but the following code works without errors:

int a;
a = 5;

Problem Compiling: Warnings and Errors

This project is really cool and I'm aching to give it a try.
But...
I curled the raw code and saved it to my hard drive, and then ran gcc -o c4 c4.c and got a list of 20 errors. Then, after realizing my machine might be 64 bit, I ran gcc again with -m32 included, and still got the following list of compiler errors and warnings.

Any idea how to bypass this?

$ gcc c4.c
c4.c:45:1: warning: type specifier missing, defaults to 'int' [-Wimplicit-int]
next()
^~~~
c4.c:49:21: error: non-void function 'next' should return a value
      [-Wreturn-type]
    if (!(tk = *p)) return;
                    ^
c4.c:54:23: warning: field precision should have type 'int', but argument has
      type 'long' [-Wformat]
        printf("%d: %.*s", line, p - lp, lp);
                    ~~^~         ~~~~~~
c4.c:75:39: warning: cast to 'char *' from smaller integer type 'int'
      [-Wint-to-pointer-cast]
        if (tk == id[Hash] && !memcmp((char *)id[Name], pp, p - pp)) { t...
                                      ^
c4.c:75:85: error: non-void function 'next' should return a value
      [-Wreturn-type]
  ...&& !memcmp((char *)id[Name], pp, p - pp)) { tk = id[Tk]; return; }
                                                              ^
c4.c:81:7: error: non-void function 'next' should return a value [-Wreturn-type]
      return;
      ^
c4.c:87:7: error: non-void function 'next' should return a value [-Wreturn-type]
      return;
      ^
c4.c:95:9: error: non-void function 'next' should return a value [-Wreturn-type]
        return;
        ^
c4.c:108:7: error: non-void function 'next' should return a value
      [-Wreturn-type]
      return;
      ^
c4.c:110:78: error: non-void function 'next' should return a value
      [-Wreturn-type]
  ...== '=') { if (*p == '=') { ++p; tk = Eq; } else tk = Assign; return; }
                                                                  ^
c4.c:111:76: error: non-void function 'next' should return a value
      [-Wreturn-type]
  ...(tk == '+') { if (*p == '+') { ++p; tk = Inc; } else tk = Add; return; }
                                                                    ^
c4.c:112:76: error: non-void function 'next' should return a value
      [-Wreturn-type]
  ...(tk == '-') { if (*p == '-') { ++p; tk = Dec; } else tk = Sub; return; }
                                                                    ^
c4.c:113:60: error: non-void function 'next' should return a value
      [-Wreturn-type]
    else if (tk == '!') { if (*p == '=') { ++p; tk = Ne; } return; }
                                                           ^
c4.c:114:113: error: non-void function 'next' should return a value
      [-Wreturn-type]
  ...= Le; } else if (*p == '<') { ++p; tk = Shl; } else tk = Lt; return; }
                                                                  ^
c4.c:115:113: error: non-void function 'next' should return a value
      [-Wreturn-type]
  ...= Ge; } else if (*p == '>') { ++p; tk = Shr; } else tk = Gt; return; }
                                                                  ^
c4.c:116:75: error: non-void function 'next' should return a value
      [-Wreturn-type]
  ...(tk == '|') { if (*p == '|') { ++p; tk = Lor; } else tk = Or; return; }
                                                                   ^
c4.c:117:76: error: non-void function 'next' should return a value
      [-Wreturn-type]
  ...(tk == '&') { if (*p == '&') { ++p; tk = Lan; } else tk = And; return; }
                                                                    ^
c4.c:118:37: error: non-void function 'next' should return a value
      [-Wreturn-type]
    else if (tk == '^') { tk = Xor; return; }
                                    ^
c4.c:119:37: error: non-void function 'next' should return a value
      [-Wreturn-type]
    else if (tk == '%') { tk = Mod; return; }
                                    ^
c4.c:120:37: error: non-void function 'next' should return a value
      [-Wreturn-type]
    else if (tk == '*') { tk = Mul; return; }
                                    ^
c4.c:121:38: error: non-void function 'next' should return a value
      [-Wreturn-type]
    else if (tk == '[') { tk = Brak; return; }
                                     ^
c4.c:122:38: error: non-void function 'next' should return a value
      [-Wreturn-type]
    else if (tk == '?') { tk = Cond; return; }
                                     ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
3 warnings and 20 errors generated.

gcc compile error in macos: first parameter of 'main' (argument count) must be of type 'int'

I got an error when building c4 on macOS 10.15:

c4.c:333:5: error: first parameter of 'main' (argument count) must be of type 'int'
int main(int argc, char **argv)

My gcc version

gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: x86_64-apple-darwin20.2.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Does c4 not support macOS and UNIX?

Many compile warnings when running `gcc -o c4 c4.c`

root@localhost:~# gcc -o c4 c4.c
c4.c: In function ‘next’:
c4.c:56:18: warning: format ‘%d’ expects argument of type ‘int’, but argument 2 has type ‘long long int’ [-Wformat=]
   56 |         printf("%d: %.*s", line, p - lp, lp);
      |                 ~^         ~~~~
      |                  |         |
      |                  int       long long int
      |                 %lld
etc

my gcc version

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/11/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=aarch64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror --enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu --target=aarch64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
root@localhost:~#

Improve README.md

It would be helpful if the README had a longer description.

Something like:

A one-pass compiler for a subset of C, relying on a recursive-descent parser, doing the lexing, parsing and code generation in lockstep. The generated code, consisting of abstract machine instructions, is then executed by an instruction fetch and execute loop.
https://news.ycombinator.com/item?id=22353532
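To make that description concrete, here is a much smaller sketch of the same idea, assuming a toy language of integers with `+`, `*` and parentheses. The opcode names are borrowed from c4's listings, but everything else (`compile_and_run`, the buffer sizes, the grammar) is made up for illustration and is not how c4.c itself is organised.

```c
#include <assert.h>
#include <ctype.h>

enum { IMM, PSH, ADD, MUL, EXIT };

static const char *src;      /* cursor into the source text */
static long long code[256];  /* emitted instructions */
static int n;                /* number of words emitted */

static void emit(long long w) { assert(n < 256); code[n++] = w; }
static void skip(void) { while (*src == ' ') ++src; }
static void expr(void);

/* factor := number | '(' expr ')' */
static void factor(void) {
    long long v = 0;
    skip();
    if (*src == '(') {
        ++src; expr(); skip();
        if (*src == ')') ++src;
        return;
    }
    while (isdigit((unsigned char)*src)) v = v * 10 + (*src++ - '0');
    emit(IMM); emit(v);
}

/* term := factor ('*' factor)*, code is emitted as each operator is seen */
static void term(void) {
    factor();
    for (skip(); *src == '*'; skip()) {
        ++src;
        emit(PSH); factor(); emit(MUL);
    }
}

/* expr := term ('+' term)* */
static void expr(void) {
    term();
    for (skip(); *src == '+'; skip()) {
        ++src;
        emit(PSH); term(); emit(ADD);
    }
}

/* Compiles `text` in one pass, then runs it in a fetch-and-execute loop. */
long long compile_and_run(const char *text) {
    long long stack[64], *sp = stack + 64, a = 0;
    const long long *pc;

    src = text; n = 0;
    expr();
    emit(EXIT);

    for (pc = code;;) {
        switch (*pc++) {                      /* fetch, then execute */
        case IMM: a = *pc++; break;
        case PSH: assert(sp > stack); *--sp = a; break;
        case ADD: a = *sp++ + a; break;
        case MUL: a = *sp++ * a; break;
        case EXIT: return a;
        default: return -1;
        }
    }
}
```

Calling `compile_and_run("1 + 2 * 3")` emits `IMM 1, PSH, IMM 2, PSH, IMM 3, MUL, ADD, EXIT` and evaluates to 7. No syntax tree is ever built: the parser emits instructions as it recognises each construct.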

Compound conditional generates bad code

int is defined as 32 bit here, not long long, but that is not related to the bug:

%c4 -s c4.c

73: while (*p != 0 && *p != '\n') ++p;
0xb6d4c2a4 IMM 0xb6d0b008
0xb6d4c2ac LI
0xb6d4c2b0 LC
0xb6d4c2b4 PSH
0xb6d4c2b8 IMM (nil)
0xb6d4c2c0 NE
0xb6d4c2c4 BZ 0xb6d4c2f0 <<-- This address is wrong for short circuit op
0xb6d4c2cc IMM 0xb6d0b008
0xb6d4c2d4 LI
0xb6d4c2d8 LC
0xb6d4c2dc PSH
0xb6d4c2e0 IMM 0xa
0xb6d4c2e8 NE
0xb6d4c2ec BZ (nil)
0xb6d4c2f4 IMM 0xb6d0b008
0xb6d4c2fc PSH
0xb6d4c300 LI
0xb6d4c304 PSH
0xb6d4c308 IMM 0x1
0xb6d4c310 ADD
0xb6d4c314 SI
