gigagensom's Introduction

GigaGenSOM

GigaGenSOM generates source files in the SOM (Simple Object Machine) language. Its main goal is to generate large codebases that can be used to evaluate how virtual machines cope with millions of lines of code.

Goals

  1. Generate code that can be parsed without error
  2. Generate code that is executable
  3. Generate code that shows somewhat realistic behavior
  4. Have a tool that is independent of any specific SOM implementation
  5. Keep it simple. This isn't a research project into how to generate code.

Non-Goals

To scope the project, the current set of non-goals includes:

  1. Building a code generator as input for fuzzing
  2. Building a pretty printer or automatic code formatter
  3. Integrating with an existing SOM code base

To Do

  1. Generate Smalltalk-style methods
    • should size distribution be based on real data?

Why Python

  • this is considered tooling, and most tooling in the SOM repos is based on either shell scripts or Python
  • we need Python already for benchmarking
  • we have PySOM/RPySOM as sources for some possibly relevant code bits

gigagensom's People

Contributors

smarr

gigagensom's Issues

New splitting of helper methods breaks IntComp code

We now generate helper methods as follows:

  helper_run2 = (
    | l1 |
    l1 := l1 negated.
    ...
    l1 := l1 abs
  )

Helper methods are split out to stay within the maximum method size imposed by CSOM and SOM (java).

The problem is that the local variable used by the IntegerComputationClassGenerator is not handled properly: as the snippet shows, the helper declares its own l1.
In addition, the result of the helper method is not propagated back to the caller.

Possible Solutions

  1. Use a field instead of a local. This is likely correct and relatively straightforward, but is possibly a fix specific to IntCompGen.
  2. Hand the local over to the helper method and retrieve it as the return value. This seems more complicated: we may need to handle more than a single parameter, but a method can return only one value. This is problematic in the context of generating specs.
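
As a rough illustration of the first option, the following Python sketch splits a statement list into chained SOM methods that all operate on a field. The function name `split_into_helpers`, its parameters, and the field name `f1` are hypothetical and not taken from GigaGenSOM; the sketch also assumes the generated class declares `f1` as a field. Each method ends by returning the result of the next helper, so the final value is propagated back to the caller.

```python
def split_into_helpers(statements, max_per_method, field="f1", name="run"):
    """Split SOM statements into chained methods that share one field.

    Hypothetical sketch: state lives in `field`, so no helper declares
    a local, and each method returns the result of the next one.
    """
    chunks = [statements[i:i + max_per_method]
              for i in range(0, len(statements), max_per_method)]
    methods = []
    for idx, chunk in enumerate(chunks):
        # First chunk keeps the original name, later ones become helpers.
        method_name = name if idx == 0 else "helper_%s%d" % (name, idx + 1)
        body = ["    %s." % stmt for stmt in chunk]
        if idx + 1 < len(chunks):
            # Continue in the next helper and pass its result upwards.
            body.append("    ^ self helper_%s%d" % (name, idx + 2))
        else:
            # Last helper returns the accumulated value in the field.
            body.append("    ^ %s" % field)
        methods.append("  %s = (\n%s\n  )" % (method_name, "\n".join(body)))
    return methods


methods = split_into_helpers(
    ["f1 := f1 negated", "f1 := f1 abs", "f1 := f1 + 1"], max_per_method=2)
print("\n\n".join(methods))
```

The trade-off is that the field outlives the computation, which is harmless for generated benchmark code but is the reason this may remain a fix specific to IntCompGen.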

IntComp and Benchmarking Useful Aspects

This issue is for discussing the design of the IntComp code generation.

With the work on IntComp, it becomes relevant to think about what it actually measures.

Currently, on SOM (java) it seems to measure:

  • the parser speed (can be avoided by loading the class before the benchmark)
  • likely the lookup caches: the class has thousands of methods, which means lookup will do a linear search on the method array

The big question is how to proceed, and what we intend to measure.
It would be good to get some data on typical aspects of large code bases:
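
To make the lookup concern concrete, here is a small Python model of a linear search over a method array. It is only an illustration of the behavior described above, not SOM (java)'s actual lookup code; `linear_lookup` and the `test<N>` selectors are made-up names.

```python
def linear_lookup(method_array, selector):
    """Scan the array front to back; return (method, comparisons made)."""
    for comparisons, (name, method) in enumerate(method_array, start=1):
        if name == selector:
            return method, comparisons
    return None, len(method_array)


# A class with thousands of methods, as produced by the generator.
method_array = [("test%d" % i, "<method %d>" % i) for i in range(5000)]

_, cost_first = linear_lookup(method_array, "test0")
_, cost_last = linear_lookup(method_array, "test4999")
print(cost_first, cost_last)
```

In this model the cost of an uncached lookup grows with the position of the method in the array, so a benchmark over one huge class mostly exercises whatever caching sits in front of the search.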

  • method length (distribution)
  • number of methods in a class (distribution)
  • number of classes (distribution)

This would allow us to generate code bases with a more natural shape.
There must be some data from the Smalltalk world that's directly applicable, and data from Ruby codebases might be relevant, too.
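
Once such data is available, the generator could draw the shape of a code base from it. The sketch below shows the idea in Python; the exponential distributions and the default means are placeholder assumptions standing in for real measurements, and `sample_shape` is a hypothetical name.

```python
import random


def sample_shape(num_classes, rng, mean_methods=10.0, mean_length=5.0):
    """Return one list per class, holding the length of each method.

    The exponential distributions are placeholders (an assumption);
    measured distributions would replace them.
    """
    shape = []
    for _ in range(num_classes):
        # At least one method per class, at least one statement per method.
        num_methods = 1 + int(rng.expovariate(1.0 / mean_methods))
        lengths = [1 + int(rng.expovariate(1.0 / mean_length))
                   for _ in range(num_methods)]
        shape.append(lengths)
    return shape


# A seeded generator keeps the generated code base reproducible.
shape = sample_shape(100, random.Random(42))
total_statements = sum(sum(lengths) for lengths in shape)
```

Passing the random number generator in explicitly means the same seed always yields the same code base, which matters when comparing benchmark runs.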

Code generation ideas

Currently, the generated code is trivial numeric code that does little useful work.
However, we may want to have different code generators that produce patterns known from real programs, perhaps similar to "world generators" for games.

Ideas for code patterns could be:

  • initializing object structures (including collections, etc.)
  • complex patterns from frameworks, for instance builder patterns
  • object graph traversal to compute some result or accumulate a report
  • summarizing or aggregating collections or graphs, likely over specific fields of objects

Redundant assignments to locals when splitting out helper methods

Helper methods are currently generated as follows:

  helper_helper_testIntMultiplySymmetricLiteral211 = (
    | int arg |
    int := 1.
    arg := -4294967297.
    int := 1.
    arg := -9223372036854775808.
    self expect: int * arg  toEqual:  arg * int.
    self expect: int * arg toBeKindOf: arg class.

    int := 1.
    arg := -9223372036854775809.
    self expect: int * arg  toEqual:  arg * int.
    self expect: int * arg toBeKindOf: arg class.

This means the first assignments to int and arg are not needed, because both are overwritten before they are used.
Arguably, even the later int := 1 is redundant and could be avoided.

In the interest of code size and readability of the specs, these redundant assignments should be avoided.
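
One possible way to drop such assignments before emitting a helper is sketched below in Python. The representation is an assumption made for illustration: literal assignments are `(variable, literal)` tuples and every other statement is a string that is conservatively treated as reading all locals without modifying them. The function name `drop_redundant_assignments` is hypothetical.

```python
def drop_redundant_assignments(statements):
    """Remove assignments that are overwritten unused or repeat a value.

    Assumption: tuples are (variable, literal) assignments; strings are
    other statements that may read any local but never assign to one.
    """
    # Backward pass: drop stores overwritten before any other statement.
    kept = []
    overwritten = set()
    for stmt in reversed(statements):
        if isinstance(stmt, tuple):
            var, _ = stmt
            if var in overwritten:
                continue
            overwritten.add(var)
        else:
            overwritten.clear()
        kept.append(stmt)
    kept.reverse()

    # Forward pass: drop stores of the literal a variable already holds.
    result = []
    current = {}
    for stmt in kept:
        if isinstance(stmt, tuple):
            var, value = stmt
            if current.get(var) == value:
                continue
            current[var] = value
        result.append(stmt)
    return result


check = "self expect: int * arg toEqual: arg * int"
cleaned = drop_redundant_assignments([
    ("int", "1"), ("arg", "-4294967297"),
    ("int", "1"), ("arg", "-9223372036854775808"), check,
    ("int", "1"), ("arg", "-9223372036854775809"), check,
])
```

Applied to the statements of the helper shown above, this keeps a single int := 1 and one arg assignment per group of expectations.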
