research.lpca

This project analyzes the results of various models for Link Prediction on Knowledge Graphs using Knowledge Graph Embeddings. It allows you to replicate the results of our work "Knowledge Graph Embeddings for Link Prediction: A Comparative Analysis" (https://arxiv.org/abs/2002.00819). Principal contributor: Andrea Rossi (https://github.com/AndRossi).

Models

We include 16 models representative of various families of architectural choices. For each model we used the best-performing implementation available.

Project

Language

The project is completely written in Python 3.

Dependencies

  • numpy
  • matplotlib
  • seaborn

Structure

The project is structured as a set of Python scripts, each of which can be run separately from the others:

  • folder efficiency contains the scripts to visualize our results on the efficiency of LP models.
    • Our findings for training times can be replicated by running script barchart_training_times.py
    • Our findings for prediction times can be replicated by running script barchart_prediction_times.py
  • folder effectiveness contains the scripts to obtain our results on the effectiveness of LP models:
    • folder performances_by_peers contains various scripts that show how the predictive performances of LP models vary, depending on the number of source and target peers of test facts.
    • folder performances_by_paths contains various scripts that show how the predictive performances of LP models vary, depending on the Relational Path Support of test facts.
    • folder performances_by_relation_properties contains various scripts that show how the predictive performances of LP models vary, depending on the properties of the relations of test facts.
    • folder performances_by_reified_relation_degree contains various scripts that show how the predictive performances of LP models vary, depending on the degree of the original reified relation in FreeBase.
  • folder dataset_analysis contains various scripts to analyze the structural properties of the original datasets featured in our analysis (e.g. computing the source peers and target peers of each test fact, or its Relational Path Support). We share the results we obtained using these scripts in ... A minimal sketch of the peer computation is given after this list.
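
As a rough illustration of what the peer computation in dataset_analysis does, the sketch below counts source and target peers for each test fact. It is not the repository's code: the tab-separated triple layout, the read_triples helper and the exact peer definition are assumptions to be checked against the actual scripts.

    # Minimal sketch (assumed layout): count source/target peers of test facts.
    # A source peer of <h, r, t> is here taken to be a training head seen with the same (r, t);
    # a target peer is a training tail seen with the same (h, r).
    from collections import defaultdict

    def read_triples(path):
        # assumed format: one "head<TAB>relation<TAB>tail" triple per line
        with open(path) as f:
            return [tuple(line.strip().split("\t")) for line in f if line.strip()]

    def peer_counts(train_triples, test_triples):
        heads_by_rt = defaultdict(set)   # (relation, tail) -> training heads
        tails_by_hr = defaultdict(set)   # (head, relation) -> training tails
        for h, r, t in train_triples:
            heads_by_rt[(r, t)].add(h)
            tails_by_hr[(h, r)].add(t)
        return {(h, r, t): (len(heads_by_rt[(r, t)]), len(tails_by_hr[(h, r)]))
                for h, r, t in test_triples}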

In each of these folders, the scripts to run in order to replicate the results of our paper are contained in the folders named papers.

We note that:

  • In WN18RR, as reported by the authors of the dataset, a small percentage of test facts feature entities not included in the training set, so no meaningful predictions can be obtained for these facts. A few implementations (e.g. Ampligraph, ComplEx-N3) actively skip such facts in their evaluation pipelines. Since the large majority of systems keep them, we have all models include them in order to provide the fairest possible setting (a minimal check for such facts is sketched after this list).
  • In YAGO3-10 we observe that a few entities appear in two different versions, depending on HTML escaping policies or on capitalisation. In these cases, models would handle each version as a separate, independent entity; to solve this issue we have performed the deduplication manually. The duplicate entities we have identified are:
    • Brighton_&amp;_Hove_Albion_F.C. and Brighton_&_Hove_Albion_F.C.
    • College_of_William_&amp;_Mary and College_of_William_&_Mary
    • Maldon_&amp;_Tiptree_F.C. and Maldon_&_Tiptree_F.C.
    • Alaska_Department_of_Transportation_&amp;_Public_Facilities and Alaska_Department_of_Transportation_&_Public_Facilities
    • Turing_award and Turing_Award
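
The WN18RR issue above can be spotted with a simple set check. The snippet below is only an illustration (it reuses the assumed read_triples helper and tab-separated layout from the earlier sketch), not part of the repository:

    # Illustration: test facts whose head or tail entity never appears in the training set
    def facts_with_unseen_entities(train_triples, test_triples):
        train_entities = {e for h, _, t in train_triples for e in (h, t)}
        return [(h, r, t) for h, r, t in test_triples
                if h not in train_entities or t not in train_entities]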

How to run the project (Linux/MacOS)

  • Open a terminal shell;

  • Create a new folder named comparative_analysis in your filesystem by running command:

    mkdir comparative_analysis
  • Download the datasets folder and the results folder from our storage, and move them into the comparative_analysis folder. Be aware that the files to download occupy around 100GB overall.

  • Clone this repository under the same comparative_analysis folder with command:

    git clone https://github.com/merialdo/research.lpca.git analysis
  • Open the project in folder comparative_analysis/analysis (using a Python IDE is suggested).

    • Access file comparative_analysis/analysis/config.py and update the ROOT variable with the absolute path of your comparative_analysis folder.
    • To replicate the plots and experiments performed in our work, just run the corresponding Python scripts in the paper folders mentioned above. By default, these experiments are run on dataset FB15K. To change the dataset on which an experiment runs, change the value of the variable dataset_name in the script you wish to launch; acceptable values are FB15K, FB15K_237, WN18, WN18RR and YAGO3_10. A configuration sketch is given below.
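
For example, the two variables mentioned above might be set as follows (the path is a placeholder, and whether dataset_name takes plain strings or constants defined elsewhere should be verified in the scripts themselves):

    # comparative_analysis/analysis/config.py
    ROOT = "/absolute/path/to/comparative_analysis"   # placeholder: use your own absolute path

    # in the script you wish to launch
    dataset_name = "FB15K_237"   # one of FB15K, FB15K_237, WN18, WN18RR, YAGO3_10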

Please note that the data in folders datasets and results are required in order to launch most scripts in this repository. Those data can also be obtained by running the various scripts in folder dataset_analysis, which we include for the sake of completeness.

The global performances of all models under both the min and avg tie policies can be printed on screen by running the script print_global_performances.py.
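
As a reminder of how the two tie policies differ (this is the usual definition, sketched here for illustration and not taken from the repository's code): with min, a correct answer that ties with other candidates receives the best rank in its tie group, while with avg it receives the average rank of the group.

    # Illustrative sketch of the min vs avg tie policies for the rank of the correct answer.
    # other_scores: scores of the other candidate entities (higher is better);
    # target_score: score of the correct entity.
    def rank_of_target(other_scores, target_score, policy="avg"):
        better = sum(s > target_score for s in other_scores)   # candidates strictly outscoring the target
        tied = sum(s == target_score for s in other_scores)    # candidates tied with the target
        if policy == "min":
            return better + 1                                  # best rank within the tie group
        return better + 1 + tied / 2                           # average of best and worst possible ranks

    rank_of_target([0.9, 0.7, 0.7, 0.2], target_score=0.7, policy="min")  # 2
    rank_of_target([0.9, 0.7, 0.7, 0.2], target_score=0.7, policy="avg")  # 3.0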
