iai-group / nordlys
Nordlys: Toolkit for entity-oriented and semantic search
Home Page: http://nordlys.cc/
License: Other
how to change the settings of mongo, elastic, api, etc.
Hi,
I am following the nordlys installation instructions.
The command below gives some strange output:
./scripts/load_mongo_dumps.sh mongo_dbpedia-2015-10.tar.bz2
Among other things, it says that the compressed file ends unexpectedly and may be corrupted.
Can you help fix this? Or is this supposed to happen?
I've attached the full output:
script_output.txt
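One way to tell a truncated download from a problem in the script is to read the archive end to end before loading it. A minimal sketch, assuming the dump is an ordinary tar archive with bz2 compression; `archive_is_readable` is a made-up helper name, not part of nordlys:

```python
import tarfile


def archive_is_readable(path):
    """Return True if every regular file in the tar archive can be read."""
    try:
        # "r:*" lets tarfile detect the compression (bz2, gz, ...) itself.
        with tarfile.open(path, "r:*") as archive:
            for member in archive:
                if member.isfile():
                    extracted = archive.extractfile(member)
                    # Read in 1 MiB chunks so large dumps do not fill memory.
                    while extracted.read(1024 * 1024):
                        pass
        return True
    except (tarfile.TarError, EOFError, OSError):
        # A truncated or missing file ends up here.
        return False
```

If this returns False for the file in the tmp directory, deleting it and downloading it again is probably the first thing to try.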
I am getting a 403 error when I try to use ./scripts/load_mongo_dumps.sh to download data.
Full log:
~/nordlys ❯❯❯ ./scripts/load_mongo_dumps.sh mongo_dbpedia-2015-10.tar.bz2 master
MongoDB shell version v3.6.3
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.3
{
"db" : "test",
"collections" : 0,
"views" : 0,
"objects" : 0,
"avgObjSize" : 0,
"dataSize" : 0,
"storageSize" : 0,
"numExtents" : 0,
"indexes" : 0,
"indexSize" : 0,
"fileSize" : 0,
"fsUsedSize" : 0,
"fsTotalSize" : 0,
"ok" : 1
}
mongodb running!
############ Loading Mongo collection ...
--2018-05-28 22:11:59-- http://iai.group/downloads/nordlys-v02/mongo_dbpedia-2015-10.tar.bz2
Resolving iai.group (iai.group)... 162.241.224.152
Connecting to iai.group (iai.group)|162.241.224.152|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://gustav1.ux.uis.no/downloads/nordlys-v02/mongo_dbpedia-2015-10.tar.bz2 [following]
--2018-05-28 22:11:59-- http://gustav1.ux.uis.no/downloads/nordlys-v02/mongo_dbpedia-2015-10.tar.bz2
Resolving gustav1.ux.uis.no (gustav1.ux.uis.no)... 152.94.1.85
Connecting to gustav1.ux.uis.no (gustav1.ux.uis.no)|152.94.1.85|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2018-05-28 22:11:59 ERROR 403: Forbidden.
tar (child): /home/fedor/nordlys/tmp/mongo_dbpedia-2015-10.tar.bz2: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
2018-05-28T22:11:59.879-0400 the --db and --collection args should only be used when restoring from a BSON file. Other uses are deprecated and will not exist in the future; use --nsInclude instead
2018-05-28T22:11:59.880-0400 building a list of collections to restore from /home/fedor/nordlys/tmp dir
2018-05-28T22:11:59.880-0400 done
When I tried to build the index from the DBpedia dump using the following commands,
VERSION=2015-10
python -m nordlys.core.data.dbpedia.indexer_dbpedia_types data/config/dbpedia-$VERSION/index_types.config.json
I get the error ImportError: cannot import name 'NTriplesParser' from 'rdflib.plugins.parsers.ntriples'
The traceback is:
File "/home/xxxx/miniconda3/envs/matchmaker/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/xxxx/miniconda3/envs/matchmaker/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/mnt/nfs/scratch1/xxxx/nordlys/nordlys/core/data/dbpedia/indexer_dbpedia_types.py", line 34, in <module>
from rdflib.plugins.parsers.ntriples import NTriplesParser
ImportError: cannot import name 'NTriplesParser' from 'rdflib.plugins.parsers.ntriples' (/home/xxxx/miniconda3/envs/matchmaker/lib/python3.8/site-packages/rdflib/plugins/parsers/ntriples.py)
My package and Python versions:
python==3.8.12
rdflib==6.0.2
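A possible workaround, sketched under an assumption: as far as I can tell, rdflib 6 renamed the class to W3CNTriplesParser, so the import could try both names. Please verify the name against your installed rdflib. `import_first_available` is a hypothetical helper, and the rest of the parser interface may have changed as well.

```python
import importlib


def import_first_available(module_name, candidate_names):
    """Return the first attribute from candidate_names found in the module."""
    module = importlib.import_module(module_name)
    for name in candidate_names:
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(
        "none of %s found in %s" % (", ".join(candidate_names), module_name)
    )


# Hypothetical usage in indexer_dbpedia_types.py (requires rdflib):
# NTriplesParser = import_first_available(
#     "rdflib.plugins.parsers.ntriples",
#     ["NTriplesParser", "W3CNTriplesParser"],  # second name is an assumption
# )
```

The helper only resolves the import; whether the indexer then runs unchanged on rdflib 6 is untested here.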
New indexer_dbpedia_types
should be able to build type index directly from DBpedia dumps
When regenerating the DBpedia v2 runs using the configs from data/dbpedia-entity-v2/config, the memory usage of nordlys rises continuously until it exceeds 16 GB. (I do not remember the exact number, as it has been several weeks; I just remember that it caused several gigabytes of swap to be used on my machine with 16 GB of RAM.)
I have been able to work around this issue by splitting queries_stopped.json into several files (splitting the queries file in half also cuts the final memory usage roughly in half). Without modifications like this, the task is at best very slow on "commodity" hardware, or at worst (swap on a slow storage device) not possible, as it can cause Elasticsearch to exceed its default timeout.
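The splitting workaround can be scripted roughly as follows. This is a sketch that assumes the queries file is a flat JSON object mapping query IDs to query strings; the function names and the output file naming are my own, not part of nordlys.

```python
import json


def split_queries(queries, n_parts):
    """Split a {query_id: query_text} dict into n_parts smaller dicts."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    items = sorted(queries.items())
    # Striding keeps the part sizes within one query of each other.
    return [dict(items[i::n_parts]) for i in range(n_parts)]


def write_parts(queries_file, n_parts):
    """Write each part next to the original file and return the new paths."""
    with open(queries_file) as f:
        queries = json.load(f)
    paths = []
    for i, part in enumerate(split_queries(queries, n_parts)):
        path = "%s.part%d.json" % (queries_file, i)
        with open(path, "w") as f:
            json.dump(part, f, indent=2)
        paths.append(path)
    return paths
```

Each part still needs its own config pointing at the part file and a separate output file, and the resulting run files have to be concatenated afterwards.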
Including DBpedia 2016-10 and Wikidata
I set up nordlys on my system. When running the entity linking Python script, I get stuck on a file-not-found error for 'data/el/snapshot_2015_10.txt' (see below).
The directory data/el only contains these files:
data/el
data/el/yerd
data/el/yerd/qrels_YERD_er.txt
data/el/yerd/qrels_YERD_elq.txt
data/el/yerd/queries_YERD.json
data/el/yerd/YERD_2015_10.tsv
data/el/erd
data/el/erd/qrels_ERD_elq.txt
data/el/erd/queries_ERD.json
data/el/erd/qrels_ERD_er.txt
data/el/model.txt
data/el/config_ltr.json
Any suggestions?
$ python -m nordlys.services.el
2019-06-25 17:27:03,639 - nordlys - INFO - Loading KB snapshot of proper named entities ...
Traceback (most recent call last):
File "/home/ben/nordlys/anaconda-bin/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/ben/nordlys/anaconda-bin/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/ben/nordlys/nordlys/nordlys/services/el.py", line 198, in <module>
main(arg_parser())
File "/home/ben/nordlys/nordlys/nordlys/services/el.py", line 187, in main
el = EL(conf, Entity(), ElasticCache(DBPEDIA_INDEX), FeatureCache())
File "/home/ben/nordlys/nordlys/nordlys/services/el.py", line 90, in __init__
load_kb_snapshot(self.__config["kb_snapshot"])
File "/home/ben/nordlys/nordlys/nordlys/logic/el/el_utils.py", line 18, in load_kb_snapshot
with open(kb_file, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/el/snapshot_2015_10.txt'
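Since the path in the error is relative, it is resolved against the current working directory, so the same config can fail or succeed depending on where the command is run. A small sketch for checking configured paths up front; `missing_files` is a hypothetical helper, and only the "kb_snapshot" key is taken from the traceback above:

```python
import os


def missing_files(config, keys):
    """Return {key: path} for every listed config key whose file is absent."""
    missing = {}
    for key in keys:
        path = config.get(key)
        # Relative paths are checked against the current working directory.
        if path is not None and not os.path.isfile(path):
            missing[key] = path
    return missing
```

Calling `missing_files(conf, ["kb_snapshot"])` before constructing the linker would report the missing snapshot without a traceback. It does not explain where the snapshot file is supposed to come from.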
Refactor w2v code: flatten structure
Hello,
I followed all the installation instructions successfully and decided to see whether the program can replicate the entity retrieval results.
This gave the following error:
root@IR:~/nordlys# python -m nordlys.core.retrieval.retrieval data/dbpedia-entity-v2-replication-test/config/retrieval_bm25.config.json
2017-12-13 12:16:26,963 - nordlys - INFO - scoring [INEX_LD-2009022] Szechwan dish food cuisine
Traceback (most recent call last):
File "/root/anaconda3/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "/root/anaconda3/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/nordlys/nordlys/core/retrieval/retrieval.py", line 263, in <module>
main(arg_parser())
File "/root/nordlys/nordlys/core/retrieval/retrieval.py", line 256, in main
r.batch_retrieval()
File "/root/nordlys/nordlys/core/retrieval/retrieval.py", line 209, in batch_retrieval
results = self.retrieve(queries[query_id])
File "/root/nordlys/nordlys/core/retrieval/retrieval.py", line 186, in retrieve
query = self.__elastic.analyze_query(query)
File "/root/nordlys/nordlys/core/retrieval/elastic.py", line 116, in analyze_query
tokens = self.__es.indices.analyze(index=self.__index_name, body=query, analyzer=analyzer)["tokens"]
File "/root/anaconda3/lib/python3.5/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
return func(*args, params=params, **kwargs)
TypeError: analyze() got an unexpected keyword argument 'analyzer'
Uninstalling version 6.0.0 of the elasticsearch package and installing 2.3.0 solved the problem.
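The TypeError is raised inside the elasticsearch Python client (see the site-packages path in the traceback), so a version check at startup could turn it into a clearer message. A rough sketch, assuming Python 3.8 or later for `importlib.metadata`; both function names are made up:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def has_major_version(version_string, expected_major):
    """True if a version such as '2.3.0' has the expected major number."""
    return int(version_string.split(".")[0]) == expected_major
```

For example, `has_major_version(installed_version("elasticsearch"), 2)` would be False for the 6.0.0 client reported above, provided the package is installed at all.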
Can we store only surface form => list of entities or also an associated value?
I.e., can we have multiple values for a key? Can we have a tuple as the value for a key, e.g.,
string => (entity, score)?
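In plain Python, at least, both are possible: a key can map to a list, and each list element can be an (entity, score) tuple. The sketch below only illustrates that shape. Whether the store used by nordlys supports it is not stated here, and `build_surface_form_index` is a hypothetical name.

```python
from collections import defaultdict


def build_surface_form_index(triples):
    """Group (surface_form, entity, score) triples by surface form."""
    index = defaultdict(list)
    for surface_form, entity, score in triples:
        index[surface_form].append((entity, score))
    # Put the highest-scoring entity first for each surface form.
    for candidates in index.values():
        candidates.sort(key=lambda pair: pair[1], reverse=True)
    return dict(index)
```

The result has the form string => [(entity, score), ...], which covers both the multiple-values case and the tuple case.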
data/raw-data/dbpedia-2015-10 => data/raw-data/dbpedia-2015-10_sample. (The two would have exactly identical structure, but the sample is part of the repo and is small.)
download_all.sh should create data/raw-data/dbpedia-2015-10 and download the full files under that.
There is a mismatch between the file in the repository and the file expected in build_indices.sh when building the DBpedia types index.
The file in the repository is called index_dbpedia-2015-10_types.config.json, while the script looks for index_dbpedia_2015_10_types.config.json.
My suggestion would be to rename the file to the latter to keep the format uniform.
data/config_sample based on data/config (same filenames with the paths changed).