
QAmp

Svitlana Vakulenko, Javier D. Fernandez, Axel Polleres, Maarten de Rijke and Michael Cochez. Message Passing for Complex Question Answering over Knowledge Graphs. CIKM. 2019

Requirements

  • Python 3.6

  • tensorflow==1.11.0

  • keras==2.2.4

  • pyHDT (for accessing the DBpedia Knowledge Graph)

  • elasticsearch==5.5.3 (for indexing entities and predicate labels of the Knowledge Graph)

  • pymongo (for storing the LC-QuAD dataset)

  • flask (for the API)

Datasets

  • LC-QuAD: 5,000 pairs of questions and SPARQL queries

Setup

Setting up the environment is not trivial. You need to:

  1. Create a virtual environment and install all dependencies (to install CUDA, TensorFlow, Keras and friends, follow https://medium.com/@naomi.fridman/install-conda-tensorflow-gpu-and-keras-on-ubuntu-18-04-1b403e740e25):
conda create -n kbqa python=3.6 pip
conda activate kbqa
pip install -r requirements.txt
  2. Install the HDT API:
git clone https://github.com/webdata/pyHDT.git
cd pyHDT/
./install.sh
  3. Download the DBpedia 2016-04 English HDT file and its index from http://www.rdfhdt.org/datasets/
  4. Follow the instructions in https://github.com/svakulenk0/hdt_tutorial to extract the lists of entities (dbpedia201604_terms.txt) and predicates
  5. Index the entities and predicates into ElasticSearch
  6. Download the LC-QuAD dataset from http://lc-quad.sda.tech
  7. Import the LC-QuAD dataset into MongoDB:
sudo service mongod start
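The MongoDB import step can be sketched with pymongo roughly as follows. The file name, database name, and collection name below are assumptions for illustration, not the repository's actual choices:

```python
import json


def load_lcquad(path):
    """Read an LC-QuAD JSON dump (a list of question records) from disk."""
    with open(path) as f:
        return json.load(f)


def import_lcquad(questions, db_name="kbqa", collection_name="lcquad"):
    """Insert the question records into a local MongoDB instance.

    Requires `mongod` to be running (see the step above).
    """
    from pymongo import MongoClient
    coll = MongoClient("localhost", 27017)[db_name][collection_name]
    coll.insert_many(questions)
    return coll.count_documents({})
```

With a local MongoDB running, something like `import_lcquad(load_lcquad("train-data.json"))` would load the downloaded split (the file name depends on which LC-QuAD dump you fetched).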
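The "Index entities and predicates into ElasticSearch" step can be sketched with the elasticsearch-py bulk helper. The index name, type name, label-derivation rule, and document layout below are assumptions for illustration; the repository's util/index.py defines the actual mapping:

```python
def label_actions(terms_path, index_name="dbpedia201604e", doc_type="terms"):
    """Yield one bulk-index action per URI in the extracted terms file,
    deriving a searchable label from the URI's last path segment."""
    with open(terms_path) as f:
        for i, line in enumerate(f):
            uri = line.strip().strip("<>")
            if not uri:
                continue
            # e.g. http://dbpedia.org/resource/Barack_Obama -> "barack obama"
            label = uri.rsplit("/", 1)[-1].replace("_", " ").lower()
            yield {"_index": index_name, "_type": doc_type, "_id": i,
                   "_source": {"id": uri, "label": label}}


def index_terms(terms_path):
    """Push all actions into a local ElasticSearch 5.x instance."""
    from elasticsearch import Elasticsearch
    from elasticsearch.helpers import bulk
    es = Elasticsearch(["localhost:9200"])
    return bulk(es, label_actions(terms_path))
```

The same pattern would apply to the predicates list, with a separate index.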

Run

See the notebooks in this repository for usage examples.

Benchmark

python final_benchmark_results.py

Citation

@inproceedings{DBLP:conf/cikm/VakulenkoGPRC19,
  author    = {Svitlana Vakulenko and
               Javier David Fernandez Garcia and
               Axel Polleres and
               Maarten de Rijke and
               Michael Cochez},
  title     = {Message Passing for Complex Question Answering over Knowledge Graphs},
  booktitle = {Proceedings of the 28th {ACM} International Conference on Information
               and Knowledge Management, {CIKM} 2019, Beijing, China, November 3-7,
               2019},
  pages     = {1431--1440},
  year      = {2019},
  url       = {https://doi.org/10.1145/3357384.3358026},
  doi       = {10.1145/3357384.3358026},
  timestamp = {Mon, 04 Nov 2019 11:09:32 +0100}
}


kbqa's Issues

Prediction on a custom dataset

Hi,
I would like to ask whether the above solution can be effectively leveraged for a custom-built dataset. If yes, could you share the pipeline to follow to build a custom query-based KG?
Thanks

'hdt.HDTDocument' object has no attribute 'configure_hops'

Hi,
In the notebook 2_entities_KBQA I got errors when trying to run this function:

def evaluate_entity_ranking(_e_spans, indices, top_n):
    '''
    Estimate ranking accuracy:
    _e_spans <list> entity spans extracted for each question
    indices <iterable> indices of the sample questions pool
    top_n <int> threshold for the number of top entities
    '''
    n_correct_entities, n_correct_entities_1hop = 0, 0
    n_correct_answers_1hop = 0
    # match entities
    for i in indices:
        top_e_ids = []
        
        # entities index lookup
        for span in _e_spans[i]:
            for match in e_index.match_label(span, top=top_n):
                top_e_ids.append(match['_source']['id'])
        
        if set(correct_entities_ids[i]).issubset(set(top_e_ids)):
            n_correct_entities += 1
        
        # extract a subgraph for top entities
        kg = HDTDocument(hdt_path+hdt_file)
        # all predicates: 1 hop
        kg.configure_hops(1, [], namespace, True)
        entities, _, _ = kg.compute_hops(top_e_ids)
        if set(correct_entities_ids[i]).issubset(set(entities)):
            n_correct_entities_1hop += 1
        if set(correct_answers_ids[i]).issubset(set(entities)):
            n_correct_answers_1hop += 1
        kg.remove()

The HDTDocument class from the hdt package doesn't have the attributes configure_hops and compute_hops.
Could you please provide some information about these two methods?
Thank you.
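For context, `e_index.match_label` in the snippet above is a helper defined in the repository's notebooks that queries the ElasticSearch label index. A rough sketch of what such a lookup might do (the query shape, index name, and function signature are guesses for illustration, not the repository's actual code):

```python
def build_label_query(span, top=20):
    """Build an ElasticSearch query ranking indexed labels against a question span."""
    return {"size": top, "query": {"match": {"label": span}}}


def match_label(es, span, top=20, index="dbpedia201604e"):
    """Return the top-ranked label documents for a text span.

    `es` is an elasticsearch-py client connected to the label index.
    """
    res = es.search(index=index, body=build_label_query(span, top))
    return res["hits"]["hits"]
```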

Indexing procedure

Hi! What is the procedure for the 'Index entities and predicates into ElasticSearch' step? util/index.py requires an entity-frequency file; is that something we're supposed to create from dbpedia2016-04en.hdt and then feed into it? Thank you.
