This software ingests data from scattered sources and dumps it into a MongoDB instance that stores documents as Linked Data. LD dictionaries are also provided.
The dumping (main/) proceeds through different layers of data: each layer adds more data and improves the links between existing data. An incomplete description of the process can be found here.
Entities are tagged with the TagMe API; a basic, raw scraping utility for web pages is also implemented.
Types of entities stored can be found here.
A schema of the resulting graph of object classes can be found here.
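The layered dumping described above can be sketched in a few lines. This is a minimal illustration of the idea, not the project's actual schema: the function name `merge_layer`, the `links` field, and the `ex:` identifiers are all hypothetical, chosen only to show how a later layer adds fields and improves links without discarding what earlier layers stored.

```python
# Illustrative sketch of layered enrichment: each layer adds fields and
# links to an existing Linked Data document instead of replacing it.
# merge_layer, "links", and the "ex:" ids are hypothetical names.

def merge_layer(doc, layer):
    """Merge a new data layer into an existing document.

    Fields from the layer are added only when absent; link lists are
    unioned, so later layers can only improve connectivity.
    """
    merged = dict(doc)
    for key, value in layer.items():
        if key == "links":
            merged["links"] = sorted(set(doc.get("links", [])) | set(value))
        elif key not in merged:
            merged[key] = value
    return merged

# Layer 1: raw entity from a static input module.
base = {"@id": "ex:sensor-42", "label": "Sensor 42", "links": ["ex:room-1"]}

# Layer 2: enrichment (e.g. from crawling or TagMe) adds a description
# and a new link; existing data is preserved.
enriched = merge_layer(base, {"description": "Temperature sensor",
                              "links": ["ex:building-A"]})
print(enriched)
```

In the real pipeline each merged document would be upserted into MongoDB, but the merge logic above captures the layering principle on its own.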
Main scripts and libraries to create a multi-database instance of Pramantha's Datastore, with data from static modules (the input/ directory), website crawling, and the TagMe API's results.
Copyright 2014, 2015 Pramantha Ltd. Credits to Lorenzo, Claudio and Jacopo.
This package contains the file to run a complete Cloud re-creation from local and online resources, using the TagMe API.
- Install MongoDB with the default address and port.
- Install the requirements in `requirements.txt`: `pip install -r requirements.txt`
- PhantomJS and bson support are required as system dependencies; find the right way to install them on your system.
- Run the script in `main.py` for the multi-layer deployment.
TagMe API by Paolo Ferragina and Ugo Scaiella.
TagMe research paper:
Fast and Accurate Annotation of Short Texts with Wikipedia Pages. IEEE Software 29(1): 70-75 (2012)
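For orientation, annotating a short text with TagMe boils down to one GET request. The sketch below uses only the standard library; the endpoint URL and the parameter names (`gcube-token`, `lang`, the `annotations` response field) are assumptions based on TagMe's public documentation and should be verified against the current API docs before use.

```python
# Hedged sketch of querying the TagMe annotation API with the standard
# library. The endpoint and parameter names ("gcube-token", "lang") are
# assumptions drawn from TagMe's public docs; verify them before use.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

TAGME_ENDPOINT = "https://tagme.d4science.org/tagme/tag"  # assumed endpoint

def build_tagme_url(text, token, lang="en"):
    """Build the GET URL for annotating `text` with TagMe."""
    query = urlencode({"text": text, "gcube-token": token, "lang": lang})
    return TAGME_ENDPOINT + "?" + query

def annotate(text, token):
    """Call TagMe and return its list of spotted Wikipedia entities."""
    with urlopen(build_tagme_url(text, token), timeout=10) as resp:
        payload = json.load(resp)
    # The JSON response is assumed to carry an "annotations" list with
    # text spots, Wikipedia page titles, and a confidence score per entity.
    return payload.get("annotations", [])

# No network call happens at import time; annotate() needs a valid token.
print(build_tagme_url("Armstrong landed on the Moon", "YOUR-TOKEN"))
```

Each returned annotation links a span of the input text to a Wikipedia page, which is what the dumping layers use to tag entities.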
- add Sensors from the Chronos API and from a spreadsheet
- implement the Google Search API in crawling
- add OrientDB support for a graph description of the datastore
- ...