smarr / gigagensom

GigaGenSOM generates source files in the SOM (Simple Object Machine) language.

License: MIT License

Languages: Python 99.88%, Shell 0.12%

gigagensom's Introduction

GigaGenSOM

GigaGenSOM generates source files in the SOM (Simple Object Machine) language. Its main goal is to generate large code bases that can be used to evaluate how virtual machines cope with millions of lines of code.

Goals

Generate code that can be parsed without error
Generate code that is executable
Generate code that shows somewhat realistic behavior
Have a tool that is independent of any specific SOM implementation
Keep it simple. This isn't a research project into how to generate code.

Non-Goals

To keep the project's scope manageable, the current non-goals are:

Building a code generator as input for fuzzing
Building a pretty printer or automatic code formatter
Integrating with an existing SOM code base

To Do

Generate Smalltalk-style methods
- Should the size distribution be based on real data?

Why Python

This is considered tooling, and most tooling in the SOM repositories is based on either shell scripts or Python.
We already need Python for benchmarking.
PySOM/RPySOM are possible sources of relevant code.

gigagensom's Issues

As part of testing the code generators, add a test that checks whether the result executes

We currently have basic unit tests that compare the generated code against the expected code, but no tests that check whether the generated code also executes.

Such tests would be useful because the current unit tests do not catch that the generated IntComp.som is not executable, due to #2.
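A minimal sketch of such a test helper, assuming the SOM VM signals failure through a non-zero exit status (some implementations may instead require inspecting their output). The name executes_successfully is hypothetical; it relies only on Python's standard subprocess module, so it stays independent of any specific SOM implementation.

```python
import subprocess
from typing import Sequence


def executes_successfully(cmd: Sequence[str], timeout: float = 60.0) -> bool:
    """Run a command (e.g. a SOM VM on a generated file) and report success.

    Success means the process finished within `timeout` seconds and
    exited with status 0. Output is captured so it does not clutter test logs.
    """
    try:
        result = subprocess.run(
            list(cmd),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        # A hang or a missing VM binary both count as "does not execute".
        return False
    return result.returncode == 0
```

In a unit test, one would write the generated IntComp.som to a temporary directory and assert executes_successfully(...) with whatever command line the chosen VM expects. The timeout matters because generated code that loops forever should fail the test rather than hang it.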

New splitting of helper methods breaks IntComp code

We now generate helper methods as follows:

  helper_run2 = (
    | l1 |
    l1 := l1 negated.
    ...
    l1 := l1 abs
  )

Helper methods are split out so that methods stay within the maximum method size imposed by CSOM and SOM (java).

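The problem is that the local variable used by the IntegerComputationClassGenerator is not handled correctly: each helper declares its own l1, which in SOM starts out as nil instead of holding the caller's value.
In addition, the helper's result is not propagated back to the caller.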

Possible Solutions

Use a field instead of a local. This is likely correct and relatively straightforward, though it may be a fix specific to IntCompGen.
Pass the local to the helper method as an argument and retrieve the return value. This seems more complicated: we may need to pass more than one value, but a method can return only one, which is problematic in the context of generating specs.
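A sketch of the first option, assuming the generator holds the IntComp body as a list of SOM statement strings that all operate on one variable. The function split_into_helpers, the run/helper_runN naming, and the initial value 0 are illustrative assumptions, not the current IntegerComputationClassGenerator. Because l1 is declared once as a field, every helper reads and writes the same value, and run can return it.

```python
from typing import List


def split_into_helpers(class_name: str, statements: List[str],
                       max_stmts: int, var: str = "l1") -> str:
    """Emit a SOM class whose `run` method delegates to helper methods.

    `var` is declared as a field rather than a method-local, so every
    helper operates on the same value and `run` can return it.
    """
    if max_stmts < 1:
        raise ValueError("max_stmts must be at least 1")
    # Each chunk becomes one helper, respecting the maximum method size.
    chunks = [statements[i:i + max_stmts]
              for i in range(0, len(statements), max_stmts)]

    lines = [f"{class_name} = (", f"  | {var} |", "", "  run = ("]
    lines.append(f"    {var} := 0.")
    for n in range(1, len(chunks) + 1):
        lines.append(f"    self helper_run{n}.")
    lines.append(f"    ^ {var}")
    lines.append("  )")

    for n, chunk in enumerate(chunks, start=1):
        lines.append("")
        lines.append(f"  helper_run{n} = (")
        # SOM separates statements with '.'; the last one needs no period.
        lines.append(".\n".join(f"    {s}" for s in chunk))
        lines.append("  )")
    lines.append(")")
    return "\n".join(lines)
```

Sharing a field is safe here because run calls the helpers one after another; the second option would avoid the field but runs into the single-return-value problem described above.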

IntComp and Benchmarking Useful Aspects

This Issue is a discussion and design issue for the IntComp code generation.

With the IntComp work, it becomes relevant to think about what it actually measures.

Currently, on SOM (java) it seems to measure:

the parser speed (this can be avoided by loading the class before the benchmark)
likely the lookup caches: the class has thousands of methods, so method lookup performs a linear search over the method array
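To illustrate the second point, here is a toy model in Python (not SOM (java)'s actual implementation) of a class whose methods live in a flat array. The class MethodArray and its comparison counter are hypothetical, and a dictionary stands in for whatever caching the VM uses. An uncached lookup costs a number of comparisons proportional to the selector's position, while repeated cached lookups cost none.

```python
from typing import Dict, List, Optional


class MethodArray:
    """Toy model of a class whose methods are stored in a flat array.

    `lookup` scans linearly; `cached_lookup` puts a dictionary cache
    in front of that scan.
    """

    def __init__(self, selectors: List[str]) -> None:
        self.methods: List[str] = list(selectors)
        self.cache: Dict[str, int] = {}
        self.comparisons = 0  # total selector comparisons performed

    def lookup(self, selector: str) -> Optional[int]:
        # Linear search: cost grows with the number of methods.
        for index, candidate in enumerate(self.methods):
            self.comparisons += 1
            if candidate == selector:
                return index
        return None

    def cached_lookup(self, selector: str) -> Optional[int]:
        # Only the first lookup of a selector pays for the scan.
        if selector in self.cache:
            return self.cache[selector]
        index = self.lookup(selector)
        if index is not None:
            self.cache[selector] = index
        return index
```

With thousands of generated helper methods, a benchmark may therefore mostly measure how often lookups miss the cache rather than the computation itself.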

The big question is how to proceed, and what we intend to measure.
It would be good to get data on typical characteristics of large code bases:

method length (distribution)
number of methods in a class (distribution)
number of classes (distribution)

This would allow us to generate code bases with a more natural shape.
There is likely directly applicable data from the Smalltalk world, and data from Ruby code bases might be relevant, too.
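Once such data is available, one simple way to use it is to sample sizes from the measured histogram. The sketch below assumes the histogram is a mapping from size to observed count; sample_sizes is a hypothetical helper, and any histogram passed to it in examples is a made-up toy, not real data. A fixed seed keeps the generated code base reproducible across runs.

```python
import random
from typing import Dict, List


def sample_sizes(histogram: Dict[int, int], n: int, seed: int = 0) -> List[int]:
    """Draw `n` sizes (e.g. method lengths) whose distribution follows
    `histogram`, a mapping from size to observed count.
    """
    if not histogram or n < 0:
        raise ValueError("need a non-empty histogram and n >= 0")
    rng = random.Random(seed)  # private generator: reproducible output
    sizes = list(histogram.keys())
    weights = list(histogram.values())
    return rng.choices(sizes, weights=weights, k=n)
```

For example, with the toy histogram {5: 3, 20: 1}, length 5 is drawn about three times as often as length 20. The same function could drive the number of methods per class and the number of classes.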

Code generation ideas

Currently, the generator produces trivial numeric code that does little useful work.
However, we may want different code generators that produce patterns known from real programs, perhaps similar to "world generators" for games.

Ideas for code patterns could be:

initializing object structures (including collections, etc.)
complex patterns from frameworks, for instance builder patterns, etc.
object graph traversal to compute some result or accumulate a report
summarizing or aggregating collections and graphs, likely over specific fields of objects
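As a sketch of the last idea, a generator could emit a method that fills an Array and aggregates it, together with the value the method should return, so that a generated spec can check the result. The function names are hypothetical, and the emitted SOM assumes the standard library's Array new:, at:put:, do: and Integer to:do:; it has not been run through a SOM parser here.

```python
def gen_aggregate_method(name: str, size: int, factor: int) -> str:
    """Emit a SOM method that fills an Array and then sums its elements,
    a small instance of the "summarizing/aggregating collections" idea.
    """
    if size < 1:
        raise ValueError("size must be positive")
    return "\n".join([
        f"  {name} = (",
        "    | arr sum |",
        f"    arr := Array new: {size}.",
        # SOM arrays are 1-based, so indices run from 1 to size.
        f"    1 to: {size} do: [:i | arr at: i put: i * {factor}].",
        "    sum := 0.",
        "    arr do: [:e | sum := sum + e].",
        "    ^ sum",
        "  )",
    ])


def expected_result(size: int, factor: int) -> int:
    # Value the generated method should return: factor * (1 + 2 + ... + size).
    return factor * size * (size + 1) // 2
```

Varying name, size and factor yields many distinct but executable methods, and the expected value keeps the generated code checkable.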

Redundant assignments to locals when splitting out helper methods

Helper methods are currently generated as follows:

  helper_helper_testIntMultiplySymmetricLiteral211 = (
    | int arg |
    int := 1.
    arg := -4294967297.
    int := 1.
    arg := -9223372036854775808.
    self expect: int * arg  toEqual:  arg * int.
    self expect: int * arg toBeKindOf: arg class.

    int := 1.
    arg := -9223372036854775809.
    self expect: int * arg  toEqual:  arg * int.
    self expect: int * arg toBeKindOf: arg class.

This means the first assignments to int and arg are not needed, because both are overwritten before they are read.
Arguably, even the later int := 1 is redundant, since int already holds 1 at that point.

In the interest of code size and spec readability, such assignments should perhaps be avoided.
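A sketch of how a generator could drop such assignments before emitting code, assuming it keeps straight-line statements in a simple list. The statement model and both function names are hypothetical. The analysis is only valid for code without branches or blocks, and it assumes that locals are dead at the end of the helper and that "use" statements only read variables.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical statement model for straight-line generated code:
#   ("assign", variable, literal)          e.g. ("assign", "int", "1")
#   ("use", "var1 var2 ...", source_text)  variables read, space-separated
Stmt = Tuple[str, str, str]


def drop_dead_stores(stmts: List[Stmt]) -> List[Stmt]:
    """Remove assignments whose value is overwritten before it is read."""
    live: Set[str] = set()  # variables read later in the method
    kept: List[Stmt] = []
    for kind, a, b in reversed(stmts):
        if kind == "use":
            live.update(a.split())
            kept.append((kind, a, b))
        elif a in live:
            live.discard(a)  # this assignment provides the value read later
            kept.append((kind, a, b))
        # otherwise: a dead store, so it is skipped
    kept.reverse()
    return kept


def drop_repeated_assignments(stmts: List[Stmt]) -> List[Stmt]:
    """Remove `x := v` when x is already known to hold the literal v."""
    known: Dict[str, str] = {}
    kept: List[Stmt] = []
    for kind, a, b in stmts:
        if kind == "assign":
            if known.get(a) == b:
                continue
            known[a] = b
        kept.append((kind, a, b))
    return kept
```

Applied in this order to the helper above, the first pass removes the initial int := 1 and arg := -4294967297, and the second pass removes the later int := 1.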
