This project forked from qxl4515/funkr-pdae

Shell 3.40% Python 16.32% Perl 3.79% Java 76.50%

FunkR-pDAE

SUMMARY

The open source community hosts an enormous number of projects, and developers spend a lot of time searching for the ones that interest them, so recommending suitable open source projects to developers is necessary. Applying a recommendation system to software engineering can address this problem. However, the developer and project information in the open source community takes the form of implicit feedback. To fully exploit this information and make it usable by traditional recommendation algorithms, this paper proposes a personalized open source project recommendation method based on auto-encoders, called FunkR-pDAE (Funk singular value decomposition Recommendation approach using pearson correlation coefficient and Double-Auto-Encoders). Taking the GitHub community as an example, the paper constructs a scoring matrix that represents developers' preferences for open source projects and a developer relevance matrix based on attributes unique to GitHub. Developer similarity is calculated from the developer relevance matrix with the Pearson correlation coefficient. With the scoring matrix as input, the auto-encoders learn feature vectors that represent developers and open source projects. Following the principle of Funk singular value decomposition, the learned feature vectors are then converted into a new predictive scoring matrix. In addition, we define a recommendation formula for Top-N recommendation.

Dataset

GHTorrent retrieves high-quality, interlinked data through the REST API provided by GitHub and covers the public projects on GitHub. In this step, we obtain historical developer behavior data from the GHTorrent site. The data includes information about developers, language information for open source projects, and a series of developer behaviors toward open source projects (watch, fork, pull-request comment, issue comment). The whole data set can be found at: http://www.ghtorrent.org/downloads.html
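As a rough illustration of how the behavior records listed above might be turned into developer-by-project scores, here is a minimal Python sketch. The record layout, the `build_score_matrix` helper, and the equal weights in `BEHAVIOUR_WEIGHTS` are assumptions made for this example; the README does not state how the individual behaviors are weighted.

```python
from collections import defaultdict

# Hypothetical weights per behaviour type. The README does not say how
# behaviours are weighted, so every behaviour counts as 1.0 here.
BEHAVIOUR_WEIGHTS = {
    "watch": 1.0,
    "fork": 1.0,
    "pull_request_comment": 1.0,
    "issue_comment": 1.0,
}


def build_score_matrix(events):
    """Sum weighted behaviours into a nested dict: developer -> project -> score."""
    scores = defaultdict(lambda: defaultdict(float))
    for developer, project, behaviour in events:
        scores[developer][project] += BEHAVIOUR_WEIGHTS[behaviour]
    # Convert to plain dicts so missing keys are not created silently later.
    return {dev: dict(projects) for dev, projects in scores.items()}


# Made-up records of the form (developer, project, behaviour).
events = [
    ("alice", "repo_a", "watch"),
    ("alice", "repo_a", "fork"),
    ("alice", "repo_b", "issue_comment"),
    ("bob", "repo_a", "watch"),
]
matrix = build_score_matrix(events)
print(matrix)
```

A sparse nested dictionary is used because most developers interact with only a small fraction of all projects.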

PearsonCalcultor

This step covers data collection and preprocessing. From the entire data set, we collect the historical data we need based on the attributes of developers and projects and the relationships between them. The development history collected in this phase includes the watch, fork, issue-comment and pull-request-comment attributes, which represent the association between a developer and a project, and the follow attribute, which represents the relationship between developers. From these attributes, the matrices D and R are constructed, and the relevance between developers in the matrix is calculated with the Pearson correlation coefficient.
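The Pearson step can be sketched in plain Python as follows. This is an illustrative sketch, not the repository's PearsonCalcultor code: the `pearson` and `developer_similarity` helpers and the sample `relevance` rows are made up for the example, and returning 0.0 for a constant row is an assumption about how to handle the undefined case.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a constant row; 0.0 is an assumption.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def developer_similarity(relevance):
    """Pairwise Pearson similarity between the rows of a relevance matrix."""
    n = len(relevance)
    return [[pearson(relevance[i], relevance[j]) for j in range(n)]
            for i in range(n)]


# Made-up relevance rows, one per developer.
relevance = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
]
sim = developer_similarity(relevance)
print(sim[0][1], sim[0][2])
```

In this toy input the second row is a scaled copy of the first and the third is its reverse, so the two printed similarities sit at the two ends of the coefficient's range.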

ModelTraining

This step covers model training and feature vector generation. The Java code calls a Python script that uses the torch package. We then compute the inner product of the two learned vectors to obtain the predicted scores. Based on the predictive scoring matrix and the developer correlation obtained from the model, we define a recommendation formula. Using it, we select the open source projects most likely to interest each developer and return the Top-N projects.
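The inner product and Top-N steps can be illustrated with a small Python sketch. It assumes the developer and project feature vectors have already been learned, and it ranks by predicted score alone: the README does not give the recommendation formula, so the developer-correlation term is left out here. The `predict_scores` and `top_n` helpers and all sample values are hypothetical.

```python
import heapq


def predict_scores(dev_vectors, proj_vectors):
    """Predicted score matrix: inner product of each developer/project vector pair."""
    return [[sum(d * p for d, p in zip(dev, proj)) for proj in proj_vectors]
            for dev in dev_vectors]


def top_n(predicted_row, seen, n):
    """Indices of the n highest-scoring projects the developer has not seen."""
    candidates = [(score, idx) for idx, score in enumerate(predicted_row)
                  if idx not in seen]
    return [idx for _, idx in heapq.nlargest(n, candidates)]


# Made-up two-dimensional feature vectors.
dev_vectors = [[1.0, 0.0], [0.5, 1.0]]
proj_vectors = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

predicted = predict_scores(dev_vectors, proj_vectors)
# Developer 1 has already interacted with project 1, so it is excluded.
recommended = top_n(predicted[1], seen={1}, n=2)
print(predicted)
print(recommended)
```

Excluding projects the developer has already interacted with is a common choice for Top-N recommendation, but it is an assumption here rather than something the README specifies.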

Instructions

Some additional experimental operations were carried out with MySQL, MATLAB and Excel.
