
write-a-c-interpreter's Introduction

C interpreter that interprets itself.

How to Run the Code

xc.c is the original file; xc-tutor.c is the version I built step by step for the tutorial.

gcc -o xc xc.c
./xc hello.c
./xc -s hello.c

./xc xc.c hello.c
./xc xc.c xc.c hello.c

About

This project is inspired by c4 and is largely based on it.

However, I rewrote all of the code to make it easier to understand, and to help myself understand it.

Despite the complexity described in books about compiler design, writing a compiler is not that hard. You don't need much theory, although theory does help you understand the logic behind the code.

I have also written a series of articles about how this compiler is built; they are in the directory tutorial/en.

There is also a Chinese version on my blog.

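  1. Build a C Compiler Step by Step (0): Preface
  2. Build a C Compiler Step by Step (1): Design
  3. Build a C Compiler Step by Step (2): Virtual Machine
  4. Build a C Compiler Step by Step (3): Lexer
  5. Build a C Compiler Step by Step (4): Recursive Descent
  6. Build a C Compiler Step by Step (5): Variable Definitions
  7. Build a C Compiler Step by Step (6): Function Definitions
  8. Build a C Compiler Step by Step (7): Statements
  9. Build a C Compiler Step by Step (8): Expressions
  10. Build a C Compiler Step by Step (9): Summary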

Resources

Further Reading:

Forks:

Licence

The original code is licensed under GPL2, so this code uses the same licence.

write-a-c-interpreter's People

Contributors

0xflotus, aimaribarra, alvarorichard, comwrg, ihalseide, krupan, lotabout, mrasus16112, snikitinlf, tesivo, tomershech

write-a-c-interpreter's Issues

Custom tokens

I have a small, simple script parser.
For example, to switch a sprinkler on at a given time:
IF TIME == 12:00:00 THEN OUT1 = 1; IF TIME == 13:00:00 THEN OUT1 = 0;
Or to activate some devices depending on environmental conditions:
IF TEMP1 > 30.5 AND HUM2 > 60 THEN OUT2 = 1
Here I read the temperature from sensor #1 and the humidity from sensor #2.
In my code it looks like this:
`
switch (type)
{
case VAR_TYPE_TSENS:
    val = m_sensor.GetDataByValType(num, SENS_VAL_TYPE_TEMP);
    break;

case VAR_TYPE_HSENS:
    val = m_sensor.GetDataByValType(num, SENS_VAL_TYPE_HUM);
    break;
}
`

Could I add it to the interpreter like this?

`
int eval()
{
    int op, *tmp;
    while (1)
    {
        op = *pc++; // get next operation code

        // ... branches for the earlier opcodes omitted ...
        else if (op == MALC) { ax = (int)malloc(*sp); }
        else if (op == MSET) { ax = (int)memset((char *)sp[2], sp[1], *sp); }
        else if (op == MCMP) { ax = memcmp((char *)sp[2], (char *)sp[1], *sp); }

        ////////////// MY PART ///////////////////////////
        else if (op == TEMP) { ax = m_sensor.GetDataByValType(num, SENS_VAL_TYPE_TEMP); }
        else if (op == HUM)  { ax = m_sensor.GetDataByValType(num, SENS_VAL_TYPE_HUM); }
    }
}
`

My question: how do I parse my tokens (TEMP1, HUM2), and where should I store the sensor number (1, 2)?
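
One possible approach, sketched here under assumptions, is to let the lexer split such an identifier into its alphabetic prefix and its trailing number, keep the number as the token's value, and emit it later as an operand of the TEMP or HUM opcode. The names `sensor_kind` and `parse_sensor_token` are hypothetical and do not exist in xc.c.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds for the sensor script; not part of xc.c. */
enum sensor_kind { SENSOR_NONE = 0, SENSOR_TEMP, SENSOR_HUM };

/*
 * Split an identifier such as "TEMP1" or "HUM2" into a sensor kind and a
 * sensor number. Returns SENSOR_NONE if the text is not a sensor token.
 * On success *num receives the number and *end points just past the token.
 */
enum sensor_kind parse_sensor_token(const char *src, int *num, const char **end)
{
    const char *p = src;
    enum sensor_kind kind;
    size_t len;

    while (isalpha((unsigned char)*p)) p++;      /* alphabetic prefix */
    len = (size_t)(p - src);

    if (len == 4 && memcmp(src, "TEMP", 4) == 0) kind = SENSOR_TEMP;
    else if (len == 3 && memcmp(src, "HUM", 3) == 0) kind = SENSOR_HUM;
    else return SENSOR_NONE;

    if (!isdigit((unsigned char)*p)) return SENSOR_NONE;  /* a number is required */

    *num = 0;
    while (isdigit((unsigned char)*p)) *num = *num * 10 + (*p++ - '0');

    *end = p;
    return kind;
}
```

With this split, the code generator could emit the opcode followed by the sensor number, and the TEMP and HUM branches in eval() could read it with `num = *pc++` before calling the sensor API. That operand layout is an assumption of this sketch, not something the project defines.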

Stuck at script parsing

I am testing the interpreter with the following code:

`
poolsize = 1 * 1024;
line = 1;

// allocate memory for virtual machine
if (!(text = old_text = malloc(poolsize)))
    return -1;

if (!(data = malloc(poolsize))) 
    return -1;

if (!(stack = malloc(poolsize))) 
    return -1;

if (!(symbols = malloc(poolsize)))
    return -1;

memset(text, 0, poolsize);
memset(data, 0, poolsize);
memset(stack, 0, poolsize);
memset(symbols, 0, poolsize);
bp = sp = (int *)((int)stack + poolsize);
ax = 0;

src = "char else enum if int return sizeof while "
      "open read close printf malloc memset memcmp exit void main";

// add keywords to symbol table
i = Char;
while (i <= While) 
{
    next();
    current_id[Token] = i++;
}

// add library to symbol table
i = OPEN;
while (i <= EXIT) 
{
    next();
    current_id[Class] = Sys;
    current_id[Type] = INT;
    current_id[Value] = i++;
}

next(); current_id[Token] = Char;
next(); idmain = current_id; 

src = "int vara;\n"
      "int varb;\n"
      "while (1)\n"
      "{\n"
      "vara=2;\n"
      "varb=3;\n"
      "int varc = vara + varb;\n"
      "if (varc >= 5)\n"
      "return varc;\n"
      "}";

program();

`
It gets stuck in program(), after the while, at (1). What am I doing wrong?

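Question about the memcmp function

Hello. I read your series "Build a C Compiler Step by Step". When I tried it hands-on I ran into a problem that puzzled me for a long time.
if (current_id[Hash] == hash && !memcmp( (char *)current_id[Name], last_pos, src - last_pos) )
In this statement, is memcmp( (char *)current_id[Name], last_pos, src - last_pos) used to cut out a piece of the string and assign it to current_id[Name]?
The prototype of memcmp is int memcmp ( const void * ptr1, const void * ptr2, size_t num );, so your usage looks unusual to me. Shouldn't the last argument be a number?

Thanks!

I have just figured it out, so there is no need to answer.
Thanks.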
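
For readers with the same question: subtracting two pointers into the same buffer yields a number, the count of characters between them, so `src - last_pos` is the length of the identifier just scanned, and memcmp only compares; it assigns nothing. The sketch below illustrates this with a hypothetical helper, `same_identifier`, that is not part of xc.c.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 1 if the identifier spanning [start, end) matches the stored name.
 * `end - start` is pointer subtraction: it gives the number of characters
 * between the two pointers, which is the byte count memcmp expects.
 */
int same_identifier(const char *stored_name, const char *start, const char *end)
{
    size_t len = (size_t)(end - start);
    return memcmp(stored_name, start, len) == 0;
}
```

Because only `len` bytes are compared, a shorter identifier can match the prefix of a longer stored name; in the quoted line the preceding hash comparison is what filters out most such cases.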

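C has no getline

int getline(char *src) {
    int ch;   // holds one character; int rather than char so EOF can be detected
    int len = 0;
    while ((ch = getchar()) != EOF && ch != '\n') {
        *src++ = (char)ch;
        len++;
    }
    *src = 0; // terminate with NUL; the string is now complete
    return len;
}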

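The && operator is evaluated incorrectly

Hello, your code is beautifully written!
I have recently been trying to get this interpreter to run on a 64-bit machine. While testing your code on a 32-bit machine I found that the '&&' operator gives the wrong result. I don't know whether I made a mistake or there is some other cause, and I am currently trying to fix it. Did you run coverage tests after writing the code? Do you have any other suggestions? Thanks.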

Getting warning

Hi, I am getting a warning and nothing happens after that. I am working on a 64-bit Windows machine.

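Question about assignment

How does this compiler parse i=10? How is the assignment implemented? I can see Name being written into the symbol, but I don't see 10 being written into the symbol's Value.
I hope you can explain. Thanks.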
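
In a c4-style design the symbol's Value field holds the variable's address (or stack offset), not its contents; the constant 10 travels in the instruction stream and is written to memory only when the program runs. The mini VM below is a sketch of that idea, not the project's eval(): the opcode names follow the tutorial, while `run`, `assign_ten` and the use of `intptr_t` cells are assumptions made for illustration.

```c
#include <stdint.h>

/* Opcodes named after the tutorial's instructions; this mini VM is only a sketch. */
enum { IMM, PUSH, SI, EXIT };

/*
 * Run a tiny instruction stream and return the final value of ax.
 * intptr_t cells are used so an address fits on 32-bit and 64-bit hosts.
 */
intptr_t run(const intptr_t *pc)
{
    intptr_t stack[16];
    intptr_t *sp = stack + 16;   /* the stack grows downwards */
    intptr_t ax = 0;

    for (;;) {
        intptr_t op = *pc++;
        if (op == IMM)       { ax = *pc++; }               /* load immediate into ax */
        else if (op == PUSH) { *--sp = ax; }               /* push ax */
        else if (op == SI)   { *(intptr_t *)*sp++ = ax; }  /* store ax at the popped address */
        else                 { return ax; }                /* EXIT */
    }
}

/* What a compiler could emit for `i = 10;` when `i` lives at address `addr`. */
intptr_t assign_ten(intptr_t *addr)
{
    intptr_t code[] = { IMM, (intptr_t)addr, PUSH, IMM, 10, SI, EXIT };
    return run(code);
}
```

Calling `assign_ten(&i)` leaves 10 in `i`: the address comes from the symbol table at compile time, and the SI instruction performs the store at run time.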

The step-1 virtual machine code fails to compile

Hello, xc.c is the code I cloned at step 1. I tried to compile it with g++ but got the following errors:

g++ xc.c
xc.c: In function 'long long int eval()':
xc.c:83:62: error: 'open' was not declared in this scope
         else if (op == OPEN) { ax = open((char *)sp[1], sp[0]); }
                                                              ^
xc.c:84:46: error: 'close' was not declared in this scope
         else if (op == CLOS) { ax = close(*sp);}
                                              ^
xc.c:85:67: error: 'read' was not declared in this scope
         else if (op == READ) { ax = read(sp[2], (char *)sp[1], *sp); }
                                                                   ^
xc.c: In function 'int main(int, char**)':
xc.c:112:28: error: 'open' was not declared in this scope
     if ((fd = open(*argv, 0)) < 0) {
                            ^
xc.c:117:42: error: invalid conversion from 'void*' to 'char*' [-fpermissive]
     if (!(src = old_src = malloc(poolsize))) {
                                          ^
xc.c:122:38: error: 'read' was not declared in this scope
     if ((i = read(fd, src, poolsize-1)) <= 0) {
                                      ^
xc.c:127:13: error: 'close' was not declared in this scope
     close(fd);
             ^
xc.c:130:44: error: invalid conversion from 'void*' to 'long long int*' [-fpermissive]
     if (!(text = old_text = malloc(poolsize))) {
                                            ^
xc.c:134:33: error: invalid conversion from 'void*' to 'char*' [-fpermissive]
     if (!(data = malloc(poolsize))) {
                                 ^
xc.c:138:34: error: invalid conversion from 'void*' to 'long long int*' [-fpermissive]
     if (!(stack = malloc(poolsize))) {

The code does not compile with gcc-6.3.0 on Windows

Hello, I pulled the code straight from the repository and it fails to compile:
os: win10 64bit
gcc: MinGW.org GCC-6.3.0-1

PS C:\Users\AlexC\Documents\GitHub\write-a-C-interpreter> gcc -m32 -o xc xc.c
xc.c: In function 'next':
xc.c:111:57: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
                 if (current_id[Hash] == hash && !memcmp((char *)current_id[Name], last_pos, src - last_pos)) {
                                                         ^
xc.c:121:32: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
             current_id[Name] = (int)last_pos;
                                ^
xc.c:188:29: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
                 token_val = (int)last_pos;
                             ^
xc.c: In function 'expression':
xc.c:363:30: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
             data = (char *)(((int)data + sizeof(int)) & (-sizeof(int)));
                              ^
xc.c:363:20: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
             data = (char *)(((int)data + sizeof(int)) & (-sizeof(int)));
                    ^
xc.c:629:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
                 *addr = (int)(text + 3);
                         ^
xc.c:633:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
                 *addr = (int)(text + 1);
                         ^
.....

Compiling the template code reports that open, read and close are used before being declared:

PS C:\Users\AlexC\Documents\compiler> gcc -m32 .\compiler.c .\helloworld.c
.\compiler.c: In function 'main':
.\compiler.c:46:15: warning: implicit declaration of function 'open' [-Wimplicit-function-declaration]
     if ((fd = open(*argv, 0)) < 0) {
               ^~~~
.\compiler.c:56:14: warning: implicit declaration of function 'read' [-Wimplicit-function-declaration]
     if ((i = read(fd, src, poolsize-1)) <= 0) {
              ^~~~
.\compiler.c:61:5: warning: implicit declaration of function 'close' [-Wimplicit-function-declaration]
     close(fd);

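I found a similar question on a forum, but its suggestions, such as including the following headers, still did not solve the problem: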

#include <fcntl.h> // for open
#include <unistd.h> // for close

Is something wrong with my local gcc version, or must this be run in a Linux environment?

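Number handling in step-2 xc-tutor.c

Lines 87-113 of the code.

The logic in the code is this:

token_val = token - '0';
if (token_val) {
    // hex
    if (*src == 'x' || *src == 'X') {

    } else {
        // dec
    }
}
else
{
    // oct
}

I think the logic should be as follows, with the dec and oct code swapped:

token_val = token - '0';
if (token_val) {
    // hex
    if (*src == 'x' || *src == 'X') {

    } else {
        // oct
    }
}
else
{
    // dec
}

With the code as written on GitHub, hexadecimal and octal literals do not take effect. The author probably tested only decimal and so did not notice; replace the numbers in hello.c with hexadecimal ones and you will see. lotabout, please take another look at the code on git.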
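
For reference, C source distinguishes the three bases by the leading characters: a literal that starts with a non-zero digit is decimal, one that starts with 0x or 0X is hexadecimal, and any other literal starting with 0 is octal. The function below is a minimal sketch of that rule; `parse_number` is a hypothetical name and this is not the code from xc-tutor.c.

```c
/*
 * Parse an integer literal at *src the way C source does:
 * "0x"/"0X" prefix -> hexadecimal, other leading '0' -> octal, else decimal.
 * Advances *src past the literal and returns its value.
 */
int parse_number(const char **src)
{
    const char *p = *src;
    int val = 0;

    if (*p != '0') {                               /* decimal */
        while (*p >= '0' && *p <= '9') val = val * 10 + (*p++ - '0');
    } else if (p[1] == 'x' || p[1] == 'X') {       /* hexadecimal */
        p += 2;
        for (;;) {
            int c = *p, d;
            if (c >= '0' && c <= '9') d = c - '0';
            else if (c >= 'a' && c <= 'f') d = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') d = c - 'A' + 10;
            else break;
            val = val * 16 + d;
            p++;
        }
    } else {                                       /* octal, including plain 0 */
        while (*p >= '0' && *p <= '7') val = val * 8 + (*p++ - '0');
    }

    *src = p;
    return val;
}
```

Under this rule the hex test belongs inside the leading-zero branch, next to octal, and the non-zero branch handles decimal only.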

hash = hash * 147 + *src

Hello! It's an honor to find your code for the compiler and lexer; this is excellent work. I have a question about the hash used when parsing identifiers.
The line is: hash = hash * 147 + *src;
Why did you choose the number 147? And does it reduce collisions between different identifiers?
Looking forward to your reply.
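
The page does not record why 147 was chosen. What the multiplication does can be shown with a small sketch: each step scales the running value before adding the next character, so the position of a character affects the result and identifiers with the same letters in a different order get different hashes. `hash_identifier` is a hypothetical helper, and seeding the hash with the first character is an assumption of this sketch.

```c
/*
 * Hash the first `len` characters of `s` by folding each character into
 * the running value. Unsigned arithmetic is used so that overflow on long
 * identifiers wraps around instead of being undefined.
 */
unsigned hash_identifier(const char *s, int len)
{
    unsigned hash = (unsigned char)s[0];   /* assumed seed: the first character */
    int i;

    for (i = 1; i < len; i++)
        hash = hash * 147u + (unsigned char)s[i];
    return hash;
}
```

For example, "ab" hashes to 97 * 147 + 98 = 14357 while "ba" hashes to 98 * 147 + 97 = 14503. A plain sum of the characters would give both the same value.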

Issue in Assembly mnemonics

I don't yet understand the purpose of the assembly mnemonics. We are executing source code (hello.c) using C functions like printf.

Do the assembly instructions have any real significance, or are they just for demonstration?

I mean, if we are just translating source code so that the executable C program can run it, why use assembly and increase the overall complexity?

I am not sure whether I am right; please help me understand this.

Typos in `./tutorial/en/0-Preface.md`

  1. ...Compiler Thoery...

    Thoery" trys to teach is "How to build a parser generator", namely a tool that

    You meant Theory, didn’t you?

  2. ...can you imaging...

    programing easy. Anyway can you imaging building a web browser in only

    imagine is more appropriate here.

  3. ...trys to...

    Thoery" trys to teach is "How to build a parser generator", namely a tool that

    There should be tries.

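Duplicate LEV

When a return statement is used, an LEV instruction is generated; then at the end of function_body(), when the closing brace is parsed, another LEV is generated.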

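The SI and SC instructions

Section 2 of the blog series covers the SI and SC instructions, implemented as follows:

void eval() {
    int op, *tmp;
    while (1) {
        op = *pc++;                                                            // get next operation code
        if (op == IMM)       {ax = *pc++;}                                     // load immediate value to ax
        else if (op == LC)   {ax = *(char *)ax;}                               // load character to ax, address in ax
        else if (op == LI)   {ax = *(int *)ax;}                                // load integer to ax, address in ax
        else if (op == SC)   {ax = *(char *)*sp++ = ax;}                       // save character to address, value in ax, address on stack
        else if (op == SI)   {*(int *)*sp++ = ax;}                             // save integer to address, value in ax, address on stack
    }
}

About the implementation of SC, ax=*(char*)*sp++=ax;, my understanding is:
ax originally holds an address, which is stored to the top of the stack; then the value at the address saved on the top of the stack is fetched and assigned to ax, and the stack is popped. Is my understanding correct?
Also, SC and SI seem to be similar operations, so why is SC ax = *(char *)*sp++ = ax while SI is *(int *)*sp++ = ax?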

64-bit support

Why not use 'long' to hold pointers on 64-bit systems?
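
One consideration, offered as a sketch and not as the project's decision: `long` is 64 bits wide on 64-bit Linux and macOS but only 32 bits on 64-bit Windows, so it cannot hold a pointer everywhere. The standard header <stdint.h> provides `intptr_t`, an integer type intended to hold a pointer value. `pointer_round_trips` below is a hypothetical helper that shows the round trip.

```c
#include <stdint.h>

/*
 * Store a pointer in an intptr_t cell, as a VM register or stack slot
 * would, and check that converting it back yields the same pointer.
 */
int pointer_round_trips(void *p)
{
    intptr_t cell = (intptr_t)p;
    return (void *)cell == p;
}
```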

multiple assignment of ax

Hello,

I noticed that this question seems to have been asked before in Chinese, but I couldn't get a precise translation of the answers. Why do we need ax = *(char *)*sp++ = ax;? If we remove ax = from the beginning, the result should not change. Please correct me if I'm wrong. Why is this needed? Thanks.
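
One observable difference, independent of whatever the author intended: in C the value of an assignment expression is the value of the left operand after the assignment, so `ax = *(char *)... = ax` leaves in ax the value as it was actually stored, truncated to a char. When ax already fits in a char nothing changes; when it does not, ax changes. The sketch below mimics the SC step with a hypothetical function, `store_char`, using `intptr_t` cells in place of the VM's int cells.

```c
#include <stdint.h>

/*
 * Mimic the SC step: pop an address from the stack, store ax there as a
 * char, and reload ax with the value that was stored. Returns the new ax.
 */
intptr_t store_char(intptr_t ax, intptr_t **sp)
{
    ax = *(char *)*(*sp)++ = ax;
    return ax;
}
```

With ax equal to 0x141, the byte written is 0x41 and ax becomes 0x41 as well. For SI the same prefix would have no effect, because the stored int has the same width as ax.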
