
jucc's Introduction

JuCC


This is the official Jadavpur University Compiler Compiler repository.

Key Features

  • Supports a subset of the C language for now.
  • Custom grammar files to easily extend the language.
  • LL(1) parsing with panic mode error recovery.
  • Generates .json parse tree outputs for easy visualization with Treant.js.
  • 100% Open Source (Apache-2.0 License)

Quickstart

The JuCC project is built and tested on Ubuntu 20.04.

$ git clone https://github.com/TheSYNcoder/JuCC
$ cd JuCC
$ sudo ./script/installation/packages.sh
$ cd server
$ npm i
$ cd ..
$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Release ..
$ ninja jucc
$ ./bin/jucc -g <grammar_file> -f <input_file> -o <output_json_file>

To run the provided unit tests:

$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Release ..
$ ninja
$ ./bin/jucc_test

To run the benchmarks (note: -DCMAKE_BUILD_TYPE=Release is required):

$ mkdir build
$ cd build
$ cmake -GNinja -DCMAKE_BUILD_TYPE=Release ..
$ ninja
$ ./bin/jucc_benchmark

Before pushing or opening a pull request, run the following checks (the tests must pass):

$ ninja
$ ninja check-format
$ ninja check-clang-tidy
$ ninja check-lint
$ ninja test

To add a new unit test, make a folder with the same relative path as in the src folder, and define your test. Please refer to docs for more details about writing tests using the googletest framework.

Additional Notes:

  • If you know what you're doing, install the prerequisite packages from ./script/installation/packages.sh manually.

For Developers

Please see the docs.

Contributing

Contributions from everyone are welcome!

jucc's People

Contributors

bisakhmondal, noob77777, shehab7osny, thesyncoder

jucc's Issues

Improvement in grammar.

Summary

The current grammar in src/grammar/grammar.g has no cout or cin terminals.

Solution

The grammar should support cout and cin and define the corresponding rules.


Duplicate Symbols in Symbol Table

Bug Report

Summary

Symbol Table bug

Environment

To address the bug, especially if it is environment-specific, we need to know what kind of configuration you are running on. Please include the following:

OS: Ubuntu (LTS) 20.04 or macOS 10.14+ (please specify version).

Compiler: GCC 7.0+ or Clang 8.0+.

CMake Profile: all

Steps to Reproduce


int main() {
    int x, y;
    cin >> x >> y;
    if (x != 0) {
        if (y > 0) {
            cout << y;
        } else {
            cout << -y;
        }
    }
    float z = 1 + 2 + 3 + 1000/ 50 * 23.2 * (x * y * 10);
    // cout << x << y << z;
    float z0 = 1 + 2 + 3 + 1000/ 50 * 23.2 * (x * y * 10);
    float z1 = 1 + 2 - 3 + 1000/ 50 * 23.2 * (x * y * 10);
    float z2 = 1 + 2 / 3 + 1000/ 50 * 23.2 * (x * y * 10);
    float z3 = 1 + 2 * 3 + 1000/ 50 * 23.2 * (x * y * 10);
    float z4 = 1 + 2 % 3 + 1000/ 50 * 23.2 * (x * y * 10);
    float z5 = 1 + 2 > 3 + 1000/ 50 * 23.2 * (x * y * 10);
    float z6 = 1 + 2 == 3 + 1000/ 50 * 23.2 * (x * y * 10);
    float z7 = 1 + 2 != 3 + 1000/ 50 * 23.2 * (x * y * 10);
    float z8 = 1 + 2 >= 3 + 1000/ 50 * 23.2 * (x * y * 10);
    float z9 = 1 + 2 <= 3 + 1000/ 50 * 23.2 * (x * y * 10);
    cout << z0 << z1 << z2 << z3 << z4;
    cout << z5 << z6 << z7 << z8 << z9;
}

Expected Behavior

ok

Actual Behavior

jucc: duplicate symbol: x
duplicate symbol: y
duplicate symbol: x
duplicate symbol: y
duplicate symbol: x
duplicate symbol: y
duplicate symbol: x
duplicate symbol: y
duplicate symbol: x
duplicate symbol: y
duplicate symbol: x
duplicate symbol: y
duplicate symbol: x
duplicate symbol: y
duplicate symbol: x
duplicate symbol: y
duplicate symbol: x
duplicate symbol: y
duplicate symbol: x
duplicate symbol: y
duplicate symbol: x
duplicate symbol: y

Improve Docs

Feature Request

Summary

Add further documentation, especially in the main README.md file, about the files in the project and what they do.

More details should be added where nothing has been written yet.

Symbol Table bug for undeclared symbols.

Bug Report

Summary

For declarations like int x, y; the symbol y is detected as an undeclared symbol.

Environment

To address the bug, especially if it is environment-specific, we need to know what kind of configuration you are running on. Please include the following:

OS: Ubuntu (LTS) 20.04 or macOS 10.14+

Compiler: GCC 7.0+ or Clang 8.0+

CMake Profile: all

Steps to Reproduce

in.cc

int main() {
    int x, y;
}

Expected Behavior

y is declared.

Actual Behavior

jucc: undelared symbol: y

Update grammar to LL(1)

Bug Report

Summary

Update grammar to LL(1)

Environment

OS: Ubuntu (LTS) 20.04 or macOS 10.14+.
Compiler: GCC 7.0+ or Clang 8.0+.
CMake Profile: all.

Steps to Reproduce

Known issue: with <assignment_expression> productions.
in.cc:
int main() { if (a == 5) { ; } }
grammar.g: default

Cause

The grammar is not LL(1). The FIRST sets of the <assignment_expression> productions intersect. There may be
more such issues.
Depends on #26.
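
The cause described above, intersecting FIRST sets, can be checked mechanically. The following is a minimal sketch, assuming the FIRST set of each production of one non-terminal has already been computed. The names FirstSet and FindFirstConflicts are hypothetical and are not taken from the JuCC code base.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical alias: the FIRST set of one production, as terminal names.
using FirstSet = std::set<std::string>;

// Returns every terminal that appears in the FIRST set of more than one
// production of the same non-terminal. An empty result means the FIRST sets
// are pairwise disjoint, which is a necessary condition for LL(1).
std::set<std::string> FindFirstConflicts(const std::vector<FirstSet> &firsts) {
  std::set<std::string> seen;
  std::set<std::string> conflicts;
  for (const FirstSet &first : firsts) {
    for (const std::string &terminal : first) {
      // insert() reports whether the terminal was new. Each FirstSet holds
      // unique terminals, so a repeat must come from another production.
      if (!seen.insert(terminal).second) {
        conflicts.insert(terminal);
      }
    }
  }
  return conflicts;
}
```

If two productions can both begin with identifier, the returned set contains identifier, and the table builder could report that terminal instead of failing later.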

Expected Behavior

Parser should successfully complete execution.

Actual Behavior

jucc: parsing table error: ...

Improvement of lexer to report detailed errors.

Improve lexer

Summary

The lexer should be able to report errors, along with the line numbers in the input file where they occur.

Solution

A properly structured error consisting of the line number, the column number (if possible), and the associated error message.
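
One possible shape for such a structured error is sketched below. It assumes the lexer knows the byte offset of the offending character in the source text. LexError, MakeLexError and FormatLexError are hypothetical names used only for illustration, not existing JuCC classes or functions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical error record; the field names are illustrative only.
struct LexError {
  std::size_t line;    // 1-based line of the offending character
  std::size_t column;  // 1-based column of the offending character
  std::string message;
};

// Converts a byte offset into the source text to a 1-based line and column
// and attaches the message. Offsets past the end are clamped to the end.
LexError MakeLexError(const std::string &source, std::size_t offset,
                      const std::string &message) {
  if (offset > source.size()) {
    offset = source.size();
  }
  std::size_t line = 1;
  std::size_t column = 1;
  for (std::size_t i = 0; i < offset; ++i) {
    if (source[i] == '\n') {
      ++line;
      column = 1;
    } else {
      ++column;
    }
  }
  return LexError{line, column, message};
}

// Formats the error as "line:column: message".
std::string FormatLexError(const LexError &error) {
  return std::to_string(error.line) + ":" + std::to_string(error.column) +
         ": " + error.message;
}
```

Keeping the position as separate fields, rather than baking it into the message, would let callers decide how to print or aggregate errors.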

Add error detection in parser and parsing table.

Feature Request

Summary

Add error detection in Parser and ParsingTable

Solution

We need to add more error statements to the Parser class.
Add support in ParsingTable for detecting grammar-related errors, such as a non-LL(1) grammar.

Checklist

  • Add error statements in Parser class
  • Add error detection in ParsingTable class
  • Make sure an incorrect grammar file or input file doesn't cause segmentation faults.

grammar.h naming issues

Naming issue

Summary

Remove typedef Entity.

Solution

Entity is a string and Rule is a list of entities. Remove using Entity = std::vector<std::string> from grammar/grammar.h.
Make other necessary changes.


file: lexer.h

Feature Request

Summary

Update the header guard in src/include/lexer/lexer.h

Solution

Header files follow the convention JUCC_LOCALNAMESPACE_FILENAME_H so we should update JUCC_LEXER_H to JUCC_LEXER_LEXER_H.
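
As a sketch, the renamed guard would look like this. The namespace names jucc and lexer are assumptions inferred from the naming convention, and the actual declarations in the file are omitted.

```cpp
// Sketch of src/include/lexer/lexer.h with the renamed guard.
#ifndef JUCC_LEXER_LEXER_H
#define JUCC_LEXER_LEXER_H

namespace jucc {    // assumed top-level namespace
namespace lexer {   // assumed local namespace
// ... existing declarations stay unchanged ...
}  // namespace lexer
}  // namespace jucc

#endif  // JUCC_LEXER_LEXER_H
```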


Segfault with comments

Bug Report

Note: Before filing a bug report, please make sure to check whether the bug has already been filed. If it has, please do not re-file the report; our developers are already hard at work fixing it!

Summary

Segmentation fault on commented code

Environment

To address the bug, especially if it is environment-specific, we need to know what kind of configuration you are running on. Please include the following:

OS: Ubuntu (LTS) 20.04 or macOS 10.14+ (please specify version).

Compiler: GCC 7.0+ or Clang 8.0+.

CMake Profile: all

Steps to Reproduce

// comment

int main() {
    ;
}

Expected Behavior

ok

Actual Behavior

segmentation fault

Add CI for macOS

Currently, the CI only runs on Ubuntu 20.04. However, JuCC has been tested on macOS with the required dependencies.

Maybe it's time to add macOS CI and put the project on a temporary hold 🍺
Only bug fixes, no new features for some time.

Fix: grammar and lexer

Bug Report

Make grammar and lexer consistent with problem statement.

Summary

Some features are still missing in main branch grammar and lexer.

Data Type : integer (int), floating point (float) and void
Declaration statements : identifiers are declared in declaration statements as basic data types and may also be assigned constant values (integer or floating point)
Condition constructs: if, else, nested statements are supported. There may be if statement without else statement.
Assignments to the variables are performed using the input / output constructs:
cin >> x - Read into variable x
cout << x - Write variable x to output
Only arithmetic operators {+, -, *, %} and assignment operator '=' are supported
Relational operators used in the if statement are < (less than), > (greater than), == (equal) and != (not equal)
The only function is main(); there are no other functions. The main() function takes no arguments and contains no return statements.

Expected Behavior

Rules for the following operators should be present in the grammar:

  • %

Support for additional tokens:

  • >=
  • <=
  • !=
  • +
  • -
  • *
  • /
  • %

Solution

Update the grammar.g file and the lexer.

Report Errors in Parsing Table.

Summary

Add error detection in the parsing table for non-LL(1) grammars.

Solution

The parsing table shouldn't accept an ambiguous grammar or any grammar that is not LL(1).
A proper error message should be recorded for such grammars.
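
One common way to detect this is to notice when two different productions compete for the same table cell while the table is being filled. The sketch below assumes a table keyed by (non-terminal, terminal) that stores a production index. TableSketch, AddEntry and the message text are hypothetical and do not reflect JuCC's actual ParsingTable interface.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical LL(1) table: (non-terminal, terminal) -> production index.
class TableSketch {
 public:
  // Stores the entry, or records an error if the cell already holds a
  // different production. Returns true when no conflict occurred.
  bool AddEntry(const std::string &non_terminal, const std::string &terminal,
                int production) {
    const auto key = std::make_pair(non_terminal, terminal);
    const auto it = table_.find(key);
    if (it != table_.end() && it->second != production) {
      // Illustrative message format, not the one printed by jucc.
      errors_.push_back("conflict for " + non_terminal + " on " + terminal +
                        ": productions " + std::to_string(it->second) +
                        " and " + std::to_string(production));
      return false;
    }
    table_[key] = production;
    return true;
  }

  // All conflicts recorded so far; empty means the table is conflict-free.
  const std::vector<std::string> &Errors() const { return errors_; }

 private:
  std::map<std::pair<std::string, std::string>, int> table_;
  std::vector<std::string> errors_;
};
```

Collecting every conflict instead of stopping at the first one would give the grammar author a complete list to fix in one pass.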

FIX THE PARSER

BUG REPORT: FIX THE PARSER

OS: Ubuntu (LTS) 20.04 or macOS 10.14+ (please specify version).
Compiler: GCC 7.0+ or Clang 8.0+.
CMake Profile: all

Steps to Reproduce

inputs

grammar.g

## This is the grammar file for JuCC
## Edit this file to make changes to the parsing grammar
## Epsilon is represented by special string EPSILON

## Terminals
%terminals
else float if int void
( ) { } * + - / % ,
<< >> < > <= >= = == != ;
identifier integer_constant float_constant
main cin cout
%

## Non Terminals
%non_terminals
<primary_expression> <constant> <unary_operator> <unary_expression>
<type_specifier> <multiplicative_expression> <additive_expression>
<shift_expression> <relational_expression> <equality_expression>
<assignment_expression> <expression>
<declaration> <init_declarator_list> <init_declarator>
<initializer> <declarator> <direct_declarator>
<statement> <compound_statement> <block_item_list> <block_item>
<expression_statement> <selection_statement> <program>
%

## Start Symbol
%start
<program>
%

## Grammar for the language
%rules
## Expressions
<primary_expression> : identifier
<primary_expression> : <constant>
<primary_expression> : ( <expression> )
<constant> : integer_constant
<constant> : float_constant
<unary_operator> : +
<unary_operator> : -
<unary_expression> : <primary_expression>
<unary_expression> : <unary_operator> <primary_expression>
<multiplicative_expression> : <unary_expression>
<multiplicative_expression> : <multiplicative_expression> * <unary_expression>
<multiplicative_expression> : <multiplicative_expression> / <unary_expression>
<multiplicative_expression> : <multiplicative_expression> % <unary_expression>
<additive_expression> : <multiplicative_expression>
<additive_expression> : <additive_expression> + <multiplicative_expression>
<additive_expression> : <additive_expression> - <multiplicative_expression>
<shift_expression> : <additive_expression>
<shift_expression> : cin >> <additive_expression>
<shift_expression> : cout << <additive_expression>
<shift_expression> : <shift_expression> << <additive_expression>
<shift_expression> : <shift_expression> >> <additive_expression>
<relational_expression> : <shift_expression>
<relational_expression> : <relational_expression> < <shift_expression>
<relational_expression> : <relational_expression> > <shift_expression>
<relational_expression> : <relational_expression> <= <shift_expression>
<relational_expression> : <relational_expression> >= <shift_expression>
<equality_expression> : <relational_expression>
<equality_expression> : <equality_expression> == <relational_expression>
<equality_expression> : <equality_expression> != <relational_expression>
<assignment_expression> : <equality_expression>
<assignment_expression> : <assignment_expression> = <equality_expression>
<expression> : <assignment_expression>

## Declarations
<declaration> : <type_specifier> <init_declarator_list> ;
<init_declarator_list> : <init_declarator>
<init_declarator_list> : <init_declarator_list> , <init_declarator>
<init_declarator_list> : EPSILON
<init_declarator> : <declarator>
<init_declarator> : <declarator> = <initializer>
<type_specifier> : void
<type_specifier> : int
<type_specifier> : float
<declarator> : <direct_declarator>
<direct_declarator> : identifier
<direct_declarator> : ( <declarator> )
<initializer> : <assignment_expression>

## Statements
<statement> : <compound_statement>
<statement> : <expression_statement>
<statement> : <selection_statement>
<compound_statement> : { <block_item_list> }
<block_item_list> : <block_item>
<block_item_list> : <block_item> <block_item_list> 
<block_item_list> : EPSILON
<block_item> : <declaration>
<block_item> : <statement>
<expression_statement> : <expression> ;
<expression_statement> : ;
<selection_statement> : if ( <expression> ) <compound_statement>
<selection_statement> : if ( <expression> ) <compound_statement> else <compound_statement>

## Main
<program> : <type_specifier> main ( ) <compound_statement>
%

intest1.cc

int main() {
	int x = 1 + 2 + 3 + 4;
	if 1;
}

intest2.cc

int

Expected output:

Some error 💢

Current Output:

Segmentation fault (core dumped)

Improve Symbol Table

Improvement of Symbol Table

Summary

The current implementation of the symbol table deletes lexemes on scope end, so the table cannot be used for look-ups after the lexer phase.

Solution

A better implementation would introduce visibility flags that are set to false on scope end, rather than deleting the entries.
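
The idea can be sketched as follows. This is an illustration under assumed names (SymbolEntry, SymbolTableSketch and their methods are hypothetical), not the existing JuCC symbol table.

```cpp
#include <string>
#include <vector>

// Hypothetical entry: hidden rather than erased when its scope ends.
struct SymbolEntry {
  std::string name;
  int scope_level;
  bool visible;
};

class SymbolTableSketch {
 public:
  void EnterScope() { ++level_; }

  // Hides every visible entry declared in the current scope, then leaves it.
  void ExitScope() {
    for (SymbolEntry &entry : entries_) {
      if (entry.visible && entry.scope_level == level_) {
        entry.visible = false;
      }
    }
    if (level_ > 0) {
      --level_;
    }
  }

  void Declare(const std::string &name) {
    entries_.push_back(SymbolEntry{name, level_, true});
  }

  // Scope-aware lookup: only entries that are still visible match.
  bool IsVisible(const std::string &name) const {
    for (const SymbolEntry &entry : entries_) {
      if (entry.visible && entry.name == name) {
        return true;
      }
    }
    return false;
  }

  // Later phases can still inspect every symbol that was ever declared.
  const std::vector<SymbolEntry> &AllEntries() const { return entries_; }

 private:
  std::vector<SymbolEntry> entries_;
  int level_ = 0;
};
```

Because entries are never erased, a phase that runs after lexing can still walk AllEntries(), while scope-aware checks keep using the visibility flag.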
