apache-spark-link-prediction's Introduction

Link Prediction in Citation Networks

A set of methods and model evaluation metrics for predicting links in an academic citation network using Apache Spark and Scala.

Description

In this experimental study we develop methods and evaluate models for predicting links in an academic citation network, taking two different aspects into consideration:

  1. Having some knowledge of the existing network and some of its links, and trying to restore a portion of it that has been deliberately removed.
  2. Having no information about the existing network and relying only on the information in the scientific papers to predict the structure of the whole network.
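
The first aspect relies on hiding part of the known citation links and then trying to recover them. A minimal sketch of such a hold-out step in plain Scala follows. The `Edge` type, the `holdOut` helper, the removed fraction, and the seed are all hypothetical and chosen for illustration; the project does not state how its links were removed.

```scala
import scala.util.Random

object EdgeHoldoutSketch {
  // Hypothetical edge representation: (citing paper ID, cited paper ID).
  type Edge = (Int, Int)

  // Shuffle the edges with a fixed seed and hide the given fraction of them.
  // Returns (visible edges, hidden edges).
  def holdOut(edges: Seq[Edge], fraction: Double, seed: Long): (Seq[Edge], Seq[Edge]) = {
    require(fraction >= 0.0 && fraction <= 1.0, "fraction must be in [0, 1]")
    val shuffled = new Random(seed).shuffle(edges)
    val numHidden = math.round(edges.size * fraction).toInt
    val (hidden, visible) = shuffled.splitAt(numHidden)
    (visible, hidden)
  }

  def main(args: Array[String]): Unit = {
    // Made-up edges, used only to exercise the helper.
    val edges = Seq((1, 2), (2, 3), (3, 4), (4, 5))
    val (visible, hidden) = holdOut(edges, fraction = 0.25, seed = 42L)
    println(s"visible: $visible")
    println(s"hidden:  $hidden")
  }
}
```

The hidden edges would then serve as positive examples that a model trained on the visible part of the network should recover.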

For the first aspect we used supervised binary classification, specifically Logistic Regression, which performed very well, with an F1 score close to 86% on the test set. For the second aspect we relied mainly on the Jaccard similarity computed with MinHash LSH over each paper’s abstract, which had been vectorized using TF-IDF.
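
The two measures named above can be illustrated in plain Scala. This is only a conceptual sketch, not the project's Spark pipeline: it works on sets of word tokens rather than TF-IDF vectors, and the names `MetricsSketch`, `f1`, `tokens`, `jaccard`, `minHashSignature`, and `estimatedJaccard` are hypothetical.

```scala
import scala.util.hashing.MurmurHash3

object MetricsSketch {
  // F1 score from confusion-matrix counts: harmonic mean of precision and recall.
  def f1(tp: Int, fp: Int, fn: Int): Double = {
    val precision = if (tp + fp == 0) 0.0 else tp.toDouble / (tp + fp)
    val recall = if (tp + fn == 0) 0.0 else tp.toDouble / (tp + fn)
    if (precision + recall == 0.0) 0.0
    else 2 * precision * recall / (precision + recall)
  }

  // Lowercase a text and split it into a set of word tokens.
  def tokens(text: String): Set[String] =
    text.toLowerCase.split("\\W+").filter(_.nonEmpty).toSet

  // Exact Jaccard similarity: |A intersect B| / |A union B|.
  def jaccard(a: Set[String], b: Set[String]): Double =
    if (a.isEmpty && b.isEmpty) 0.0
    else (a intersect b).size.toDouble / (a union b).size

  // MinHash signature: for each seeded hash function, keep the minimum hash over the set.
  def minHashSignature(items: Set[String], numHashes: Int): Vector[Int] = {
    require(items.nonEmpty, "cannot build a signature for an empty set")
    Vector.tabulate(numHashes) { seed =>
      items.iterator.map(item => MurmurHash3.stringHash(item, seed)).min
    }
  }

  // The fraction of matching signature positions estimates the Jaccard similarity.
  def estimatedJaccard(sigA: Vector[Int], sigB: Vector[Int]): Double = {
    require(sigA.nonEmpty && sigA.length == sigB.length, "signatures must have equal, non-zero length")
    sigA.zip(sigB).count { case (x, y) => x == y }.toDouble / sigA.length
  }

  def main(args: Array[String]): Unit = {
    // Made-up texts standing in for two abstracts.
    val a = tokens("Link prediction in citation networks")
    val b = tokens("Predicting links in academic citation networks")
    println(f"exact Jaccard     = ${jaccard(a, b)}%.3f")
    val estimate = estimatedJaccard(minHashSignature(a, 128), minHashSignature(b, 128))
    println(f"estimated Jaccard = $estimate%.3f")
    println(f"F1 (tp=8, fp=2, fn=2) = ${f1(8, 2, 2)}%.3f")
  }
}
```

More hash functions generally give a closer estimate of the exact Jaccard similarity, at the cost of longer signatures.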

For more detailed information, see the draft paper.

Prerequisites

Dataset

Our dataset contains 27,770 academic papers, each associated with the following information:

1. unique ID
2. publication year (between 1993 and 2003)
3. title
4. authors
5. name of journal
6. abstract

The dataset is located under src/main/resources.
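
The six fields listed above could be modeled in Scala roughly as follows. This is a sketch under assumptions: the field names and types are hypothetical (for example, whether authors are stored as a list and whether the journal may be missing), and the file format is not specified here. The abstract field is named `paperAbstract` because `abstract` is a reserved word in Scala.

```scala
// Hypothetical record type for one paper in the dataset.
final case class Paper(
  id: Int,                 // unique ID
  year: Int,               // publication year
  title: String,
  authors: Seq[String],    // assumed to be a list of author names
  journal: Option[String], // assumed to be optional
  paperAbstract: String
) {
  // The dataset description states that years fall between 1993 and 2003.
  def hasYearInRange: Boolean = year >= 1993 && year <= 2003
}

object PaperSketch {
  def main(args: Array[String]): Unit = {
    // Made-up values, used only to show how the record is constructed.
    val example = Paper(
      id = 1,
      year = 1999,
      title = "An example title",
      authors = Seq("A. Author", "B. Author"),
      journal = None,
      paperAbstract = "An example abstract."
    )
    println(s"${example.id}: year in range = ${example.hasYearInRange}")
  }
}
```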
