hasbugs's Introduction

HasBugs - Handpicked Haskell Bugs

We try to provide a fresh dataset in the fashion of Defects4J to evaluate tasks such as fault localization, program repair and test generation.

Instead of just providing the commits, we learned from the problems of existing datasets and address them as follows:

  • A failing CI does not mean a bug - at least not necessarily one in the code. Hence, our bug-points are acknowledged by the projects through PRs and issues. This also means that the buggy version of a datapoint usually compiles.
  • Similarly, the fix is what the maintainers consider fixed (even if that is the removal of a feature).
  • As research by Martin Monperrus showed, not all patches found in program repair actually help, despite passing the test suite. To help with inspection of potential tasks, we look into the material provided (code, issues, project readme/architecture) to give an estimate of "what is buggy". We hope this makes it easier to distinguish actually good results from e.g. overfitting.
  • We express multiple bug and fix locations in the datapoint.

The aim of these changes is to provide a (small) dataset with bugs from real-world problems, of which all steps are acknowledged by their maintainers (the bug, the test & the fix). We do not try to provide a dataset large enough to run deep learning on, but we want a high-quality gold standard to benchmark modern tools against.

Datapoint

For every project, we provide

  • The buggy version
  • The fixed version
  • The tested (and failing) version
  • A git-patch of the test
  • A datapoint for analysis (see below)

A single datapoint can contain the following attributes (required attributes are marked with *, but these stars are not part of the actual JSON):

{
    "*id":42,
    "*repositoryurl": "https://github.com/ciselab/HasBugs",
    "*repository":"Ciselab/HasBugs",
    "*licence":"MIT",
    "*faultcommit":"xxx",
    "*fixcommit":"yyy",

    "ghc-version":"9.2.4",
    "buildframework":"Cabal",
    "testframeworks" : ["QuickCheck","Tasty"],

    "testpatch":true,
    "description": "Input sanitization missing - divides by zero",
    "categories" : ["Sanitization", "DivideByZero"],

    "relatedissues" : ["https://www.github.com/ciselab/HasBugs/issues/1"],
    "relatedprs" : ["https://www.github.com/ciselab/HasBugs/issues/2"],

    "*faultlocations" : [
        {
            "*startline": 5,
            "*endline": 15,
            "*file" : "./project/.../program.hs",
            "module" : "Math.Division",
            "method": "divide"
        },{
            "*startline": 25,
            "*endline": 35,
            "*file" : "./project/.../program.hs",
            "module" : "Math.Division",
            "method": "divide"
        }
    ],
    "*fixlocations" : [
        {
            "*startline": 5,
            "*endline": 20,
            "*file" : "./project/.../program.hs",
            "module" : "Math.Division",
            "method": "divide"
        }
    ]
}
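
The required attributes could be checked with a few lines of code. The following is only a minimal sketch, assuming the datapoint is stored as plain JSON without the star prefixes; the function name `missing_required` and the sample values are illustrative and not part of the HasBugs tooling.

```python
import json

# Required keys, taken from the starred attributes in the example above.
REQUIRED_TOP = ["id", "repositoryurl", "repository", "licence",
                "faultcommit", "fixcommit", "faultlocations", "fixlocations"]
REQUIRED_LOCATION = ["startline", "endline", "file"]


def missing_required(datapoint):
    """Return a list of human-readable problems; an empty list means valid."""
    problems = [f"missing '{key}'" for key in REQUIRED_TOP if key not in datapoint]
    for field in ("faultlocations", "fixlocations"):
        for index, location in enumerate(datapoint.get(field, [])):
            for key in REQUIRED_LOCATION:
                if key not in location:
                    problems.append(f"{field}[{index}] missing '{key}'")
            # Only compare the range when both ends are present.
            if ("startline" in location and "endline" in location
                    and location["startline"] > location["endline"]):
                problems.append(f"{field}[{index}] has startline > endline")
    return problems


# Hypothetical datapoint reusing the placeholder values from the example.
example = json.loads("""
{
  "id": 42,
  "repositoryurl": "https://github.com/ciselab/HasBugs",
  "repository": "Ciselab/HasBugs",
  "licence": "MIT",
  "faultcommit": "xxx",
  "fixcommit": "yyy",
  "faultlocations": [{"startline": 5, "endline": 15, "file": "./project/.../program.hs"}],
  "fixlocations": [{"startline": 5, "endline": 20, "file": "./project/.../program.hs"}]
}
""")

print(missing_required(example))
```

Optional attributes such as `ghc-version` or `categories` are deliberately ignored here, since a datapoint without them is still valid.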

For more detail on the datapoint fields and the commits, see datapoint-notes.
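
Because a datapoint can list several fault locations, a fault-localization result can be compared against all of them. The sketch below shows one possible check and is not a scoring method prescribed by HasBugs; it assumes that `startline` and `endline` are inclusive, and the helper name `hits_fault` is hypothetical.

```python
def hits_fault(datapoint, file, line):
    """True if (file, line) lies inside any listed fault location.

    Assumption: startline and endline are both inclusive.
    """
    return any(
        location["file"] == file
        and location["startline"] <= line <= location["endline"]
        for location in datapoint["faultlocations"]
    )


# Fault locations copied from the example datapoint above.
datapoint = {
    "faultlocations": [
        {"startline": 5, "endline": 15, "file": "./project/.../program.hs"},
        {"startline": 25, "endline": 35, "file": "./project/.../program.hs"},
    ]
}

print(hits_fault(datapoint, "./project/.../program.hs", 30))  # True
print(hits_fault(datapoint, "./project/.../program.hs", 20))  # False
```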

To add a new datapoint, run setup.sh and the accompanying tools in ./tools. The shell scripts will ask you for some information on the command line.

Layout

  • The folder ./references contains the items needed to pull and set up a single datapoint. They are organized as ./references/repository/id.
  • The folder ./tools contains helpers to organize the references, e.g. to pull all datapoints or remove them.
  • The folder ./data contains the actual projects & dataset after you have run the tools to download everything. They are organized in folders as ./data/id.

Inclusion Criteria

  • Compatible with GHC >=8.10
  • Projects run with a single make test, cabal test or stack test
  • Buggy version newer than 2018
  • Bug and fix acknowledged as such by the maintainer(s) through GitHub Issues / PRs
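
The GHC criterion needs a numeric version comparison, because comparing version strings directly would rank 8.8 above 8.10. A minimal sketch, assuming versions are dot-separated integers as in the `ghc-version` field; the helper names are illustrative and not part of the HasBugs tooling:

```python
def parse_version(text):
    """Turn '8.10.7' into (8, 10, 7) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def ghc_supported(ghc_version, minimum="8.10"):
    """Check the 'GHC >= 8.10' inclusion criterion."""
    return parse_version(ghc_version) >= parse_version(minimum)


print(ghc_supported("9.2.4"))  # True
print(ghc_supported("8.8.4"))  # False
print("8.8.4" >= "8.10")       # True: plain string comparison gets this wrong
```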

Citation

If you use the datapoints, or just criticize this work, please cite

@inproceedings{applis2023hasbugs,
	title={HasBugs-Handpicked Haskell Bugs},
	author={Applis, Leonhard and Panichella, Annibale},
	booktitle={2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR)},
	pages={223--227},
	year={2023},
	organization={IEEE}
}

hasbugs's Issues

Pandoc-3 fails on all stages

I get an error (for all stages) that looks like this:

...
Failed to build tasty-lua-1.0.2.
Build log (
/root/.cabal/logs/ghc-9.0.2/tasty-lua-1.0.2-d03b50baa5fae84e74320cd9dd535e7967df2f5d14d04ea770dfed92be3e2a02.log
):
Configuring library for tasty-lua-1.0.2..
Preprocessing library for tasty-lua-1.0.2..
Building library for tasty-lua-1.0.2..
[1 of 5] Compiling Test.Tasty.Lua.Arbitrary

src/Test/Tasty/Lua/Arbitrary.hs:65:43: error:
    • Could not deduce (Read HsLua.Core.Integer)
        arising from a use of ‘peekIntegral’
      from the context: LuaError e
        bound by the type signature for:
                   registerDefaultGenerators :: forall e. LuaError e => LuaE e ()
        at src/Test/Tasty/Lua/Arbitrary.hs:62:1-52
      There are instances for similar types:
        instance Read Prelude.Integer -- Defined in ‘GHC.Read’
    • In the third argument of ‘registerArbitrary’, namely
        ‘peekIntegral’
      In a stmt of a 'do' block:
        registerArbitrary "integer" pushinteger peekIntegral
      In the expression:
        do registerArbitrary "boolean" pushboolean peekBool
           registerArbitrary "integer" pushinteger peekIntegral
           registerArbitrary "number" pushnumber peekRealFloat
           registerArbitrary "string" pushString peekString
   |
65 |   registerArbitrary "integer" pushinteger peekIntegral
   |                                           ^^^^^^^^^^^^

src/Test/Tasty/Lua/Arbitrary.hs:66:43: error:
    • Could not deduce (Read Number)
        arising from a use of ‘peekRealFloat’
      from the context: LuaError e
        bound by the type signature for:
                   registerDefaultGenerators :: forall e. LuaError e => LuaE e ()
        at src/Test/Tasty/Lua/Arbitrary.hs:62:1-52
    • In the third argument of ‘registerArbitrary’, namely
        ‘peekRealFloat’
      In a stmt of a 'do' block:
        registerArbitrary "number" pushnumber peekRealFloat
      In the expression:
        do registerArbitrary "boolean" pushboolean peekBool
           registerArbitrary "integer" pushinteger peekIntegral
           registerArbitrary "number" pushnumber peekRealFloat
           registerArbitrary "string" pushString peekString
   |
66 |   registerArbitrary "number"  pushnumber  peekRealFloat
   |                                           ^^^^^^^^^^^^^
Error: cabal: Failed to build tasty-lua-1.0.2 (which is required by
test:test-pandoc from pandoc-2.16.2). See the build log above for details.

The command '/bin/sh -c cabal build' returned a non-zero code: 1
building ghcr.io/ciselab/hasbugs/pandoc-3:fixed-1.0.0 took 1586 seconds( 26 minutes)
...
