
pullreq-ml's Introduction

Github PR prediction (lots of code borrowed from pullreq-ml)

This Node/Python library creates ML features for pull requests by learning information about a GitHub project. Its aim is to help data scientists build ML models for predicting pull request behaviour.

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

You will need the following:

  • Node 8 or newer
  • MongoDB 3.2 or newer
  • Git
  • A Github Access Token for using the Github API. This post explains how to get yours.
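
As a rough illustration of what the access token is used for, the sketch below builds an authenticated request for the GitHub REST API endpoint that lists a repository's pull requests. The helper name `buildPullsRequest` and its parameters are hypothetical and are not taken from this project's code; the project's own fetch.js may work differently. It relies on the global `URLSearchParams`, which is available from Node 10.

```javascript
// Hypothetical helper (not part of pullreq-ml): builds the URL and headers
// for GitHub's "list pull requests" REST endpoint.
function buildPullsRequest(repoUrl, token, page = 1) {
  // Extract owner and repo from a URL such as https://github.com/Netflix/pygenie
  const match = /^https:\/\/github\.com\/([^/]+)\/([^/]+?)(?:\.git)?\/?$/.exec(repoUrl);
  if (!match) {
    throw new Error(`Not a GitHub repository URL: ${repoUrl}`);
  }
  const [, owner, repo] = match;
  // Ask for open and closed PRs, 100 per page (the API's maximum page size).
  const query = new URLSearchParams({ state: 'all', per_page: '100', page: String(page) });
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/pulls?${query}`,
    headers: {
      Accept: 'application/vnd.github+json',
      Authorization: `Bearer ${token}`,
    },
  };
}
```

The returned `url` and `headers` can be handed to any HTTP client, for example `fetch(url, { headers })` on Node 18 or newer.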

Installing & Running

  1. Choose a project to predict. This document uses https://github.com/Netflix/pygenie because it is small, but any repository works, such as the Node project.

  2. Clone this repository onto your machine:

    git clone https://github.com/benny-hal/pullreq-ml.git
  3. Install dependencies

    cd pullreq-ml # or pullreq-ml-master
    npm install
  4. Run mongo

    docker run -p 27017:27017 --name some-mongo -d mongo
  5. Replace the contents of config.js with the actual repo and database authentication details. For example:

     module.exports = {
         // Local Mongo DB
         MONGO_DB_URL: 'mongodb://github:github@localhost:27017/github',
         // Token
         GITHUB_ACCESS_TOKEN: '<your token here>',
         // Repo Information for example for https://github.com/Netflix/pygenie you should put
     }
  6. Clone the target repos inside the targetrepo folder

    cd targetrepo
    git clone https://github.com/Netflix/pygenie.git 
  7. Start fetching Repo information

    node fetch.js
  8. Create a features DataFrame (df) for the PRs

    node query.js
  9. The jira-fetcher is a bit primitive: it uses the Jira client to query Jira (with JQL) for tickets that were reopened and have PRs. We assume that the first PR in each of those tickets had a bug, and we use this to locate buggy PRs in order to train our algorithm and build a model. This is not ideal and should be improved. In pull_request.js we use an HTTP call to scrape the Jira online interface in order to get the PR itself, because I couldn't find a way to do it with the client. This should be improved as well!
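
The labelling heuristic described in step 9 can be sketched as follows. The ticket shape (`key`, `wasReopened`, `pullRequests` with `number` and `createdAt`) and the function name `labelBuggyPullRequests` are assumptions made for illustration; they do not come from the jira-fetcher code. The sketch also assumes that "first PR" means the earliest one by creation time.

```javascript
// Hypothetical sketch of the heuristic: for every reopened ticket that has
// at least one PR, treat the earliest PR as the one that introduced a bug.
function labelBuggyPullRequests(tickets) {
  const buggy = new Set();
  for (const ticket of tickets) {
    if (!ticket.wasReopened || ticket.pullRequests.length === 0) {
      continue;
    }
    // Sort a copy by creation time so the input is not mutated.
    const [first] = [...ticket.pullRequests].sort(
      (a, b) => new Date(a.createdAt) - new Date(b.createdAt)
    );
    buggy.add(first.number);
  }
  return buggy;
}

// Made-up sample data, only to show the assumed input shape.
const sampleTickets = [
  { key: 'PROJ-1', wasReopened: true, pullRequests: [
    { number: 12, createdAt: '2019-03-02T10:00:00Z' },
    { number: 10, createdAt: '2019-03-01T10:00:00Z' },
  ] },
  { key: 'PROJ-2', wasReopened: false, pullRequests: [
    { number: 11, createdAt: '2019-03-01T12:00:00Z' },
  ] },
];

console.log([...labelBuggyPullRequests(sampleTickets)]); // prints [ 10 ]
```

Every PR that is not returned would be treated as non-buggy under this scheme, including later PRs on a reopened ticket, so the resulting labels are noisy.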
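
Separately, before running fetch.js it can help to sanity-check the values entered in config.js in step 5. The function below is a hypothetical helper, not part of the project; it only checks the two keys shown in the example config and relies on the global WHATWG `URL` class (available from Node 10).

```javascript
// Hypothetical helper (not part of pullreq-ml): returns a list of problems
// found in a config object shaped like the config.js example in step 5.
function checkConfig(config) {
  const problems = [];
  try {
    const mongo = new URL(config.MONGO_DB_URL);
    if (mongo.protocol !== 'mongodb:') {
      problems.push('MONGO_DB_URL should start with mongodb://');
    }
    // A pathname of "" or "/" means no database name was given.
    if (mongo.pathname.length <= 1) {
      problems.push('MONGO_DB_URL should name a database');
    }
  } catch (err) {
    problems.push('MONGO_DB_URL is not a valid URL');
  }
  const token = config.GITHUB_ACCESS_TOKEN;
  if (!token || token.startsWith('<')) {
    problems.push('GITHUB_ACCESS_TOKEN still holds the placeholder value');
  }
  return problems;
}
```

An empty array means both values look plausible; it does not prove that the database is reachable or that the token is valid.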

Contributors

shoval-lev-hippo, alfasin, chertkovalex, comay17, ashuster-hippo
