
nordlys's People

Contributors

dariogarigliotti, hasibi, kbalog


nordlys's Issues

mongo_dbpedia-2015-10.tar.bz2 corrupted?

Hi,
I am following the nordlys installation instructions. The command below gives some strange output:
./scripts/load_mongo_dumps.sh mongo_dbpedia-2015-10.tar.bz2
Among other things, it reports that the compressed file ends unexpectedly and may be corrupted.
Can you help fix this? Or is this supposed to happen?

I've attached the full output
script_output.txt
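One way to narrow this down is to test the downloaded archive's integrity before handing it to the load script; a truncated download produces the same "file ends unexpectedly" message. A small sketch (assumes `bzip2` is on the PATH; the helper name is illustrative):

```shell
# bzip2 -t tests a .bz2 archive without extracting it; a truncated or
# partial download fails this check with "file ends unexpectedly".
check_bz2() {
  if bzip2 -t "$1" 2>/dev/null; then
    echo "OK: $1"
  else
    echo "CORRUPT or missing: $1"
  fi
}

# e.g.: check_bz2 mongo_dbpedia-2015-10.tar.bz2
```

If the check fails, re-downloading the dump is the first thing to try.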

403 forbidden when loading data

I am getting a 403 error when trying to use ./scripts/load_mongo_dumps.sh to download the data.
Full log:

~/nordlys ❯❯❯ ./scripts/load_mongo_dumps.sh mongo_dbpedia-2015-10.tar.bz2                                                                      master
MongoDB shell version v3.6.3
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.3
{
	"db" : "test",
	"collections" : 0,
	"views" : 0,
	"objects" : 0,
	"avgObjSize" : 0,
	"dataSize" : 0,
	"storageSize" : 0,
	"numExtents" : 0,
	"indexes" : 0,
	"indexSize" : 0,
	"fileSize" : 0,
	"fsUsedSize" : 0,
	"fsTotalSize" : 0,
	"ok" : 1
}
mongodb running!
############ Loading Mongo collection ...
--2018-05-28 22:11:59--  http://iai.group/downloads/nordlys-v02/mongo_dbpedia-2015-10.tar.bz2
Resolving iai.group (iai.group)... 162.241.224.152
Connecting to iai.group (iai.group)|162.241.224.152|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://gustav1.ux.uis.no/downloads/nordlys-v02/mongo_dbpedia-2015-10.tar.bz2 [following]
--2018-05-28 22:11:59--  http://gustav1.ux.uis.no/downloads/nordlys-v02/mongo_dbpedia-2015-10.tar.bz2
Resolving gustav1.ux.uis.no (gustav1.ux.uis.no)... 152.94.1.85
Connecting to gustav1.ux.uis.no (gustav1.ux.uis.no)|152.94.1.85|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2018-05-28 22:11:59 ERROR 403: Forbidden.

tar (child): /home/fedor/nordlys/tmp/mongo_dbpedia-2015-10.tar.bz2: Cannot open: No such file or directory
tar (child): Error is not recoverable: exiting now
tar: Child returned status 2
tar: Error is not recoverable: exiting now
2018-05-28T22:11:59.879-0400	the --db and --collection args should only be used when restoring from a BSON file. Other uses are deprecated and will not exist in the future; use --nsInclude instead
2018-05-28T22:11:59.880-0400	building a list of collections to restore from /home/fedor/nordlys/tmp dir
2018-05-28T22:11:59.880-0400	done

Cannot import name 'NTriplesParser' when building the DBpedia index

When I tried to build the index from the DBpedia dump using the following commands,

VERSION=2015-10
python -m nordlys.core.data.dbpedia.indexer_dbpedia_types data/config/dbpedia-$VERSION/index_types.config.json

I get the error ImportError: cannot import name 'NTriplesParser' from 'rdflib.plugins.parsers.ntriples'.
The traceback is:

File "/home/xxxx/miniconda3/envs/matchmaker/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/xxxx/miniconda3/envs/matchmaker/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/nfs/scratch1/xxxx/nordlys/nordlys/core/data/dbpedia/indexer_dbpedia_types.py", line 34, in <module>
    from rdflib.plugins.parsers.ntriples import NTriplesParser
ImportError: cannot import name 'NTriplesParser' from 'rdflib.plugins.parsers.ntriples' (/home/xxxx/miniconda3/envs/matchmaker/lib/python3.8/site-packages/rdflib/plugins/parsers/ntriples.py)

My package and python versions:
python==3.8.12
rdflib==6.0.2
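The import fails because rdflib 6.x renamed `NTriplesParser` to `W3CNTriplesParser`. A minimal compatibility shim (a sketch based on that rename, not an official nordlys patch) would be:

```python
# rdflib < 6.0 exposes NTriplesParser; rdflib >= 6.0 renamed it to
# W3CNTriplesParser. Try the old name first, then fall back to the new one.
try:
    from rdflib.plugins.parsers.ntriples import NTriplesParser  # rdflib < 6.0
except ImportError:
    try:
        from rdflib.plugins.parsers.ntriples import W3CNTriplesParser as NTriplesParser  # rdflib >= 6.0
    except ImportError:
        NTriplesParser = None  # rdflib not installed in this environment
```

Alternatively, pinning `rdflib<6.0` in the environment avoids the rename entirely.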

Refine API logging

  • Log server errors as a single line, with a distinct status code/text.
  • Also include the name of the invoked component as an extra column.

Nordlys seems to contain a memory leak

When regenerating the DBpedia-Entity v2 runs using the configs from data/dbpedia-entity-v2/config, the memory usage of nordlys rises continuously until it exceeds 16 gigabytes (I do not remember the exact number; it has been several weeks. I just remember that it caused several gigabytes of swap to be used on my 16 GB RAM machine).

I have been able to work around this issue by splitting queries_stopped.json into several files (splitting the queries file in half roughly halves the final memory usage). Without such modifications, this action is at best very slow on "commodity" hardware, and at worst (with swap on a slow storage device) not possible at all, as it can cause Elasticsearch to exceed the default timeout.
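The workaround above can be sketched as a small script. It assumes queries_stopped.json is a JSON object mapping query IDs to query strings; the file name and chunk count are illustrative:

```python
# Split a queries JSON file into n_chunks smaller files so each nordlys
# run processes fewer queries and therefore peaks at less memory.
import json

def split_queries(path="queries_stopped.json", n_chunks=2):
    with open(path) as f:
        queries = json.load(f)
    ids = sorted(queries)
    size = -(-len(ids) // n_chunks)  # ceiling division: queries per chunk
    out_files = []
    for i in range(n_chunks):
        chunk = {qid: queries[qid] for qid in ids[i * size:(i + 1) * size]}
        out = path.replace(".json", "_part%d.json" % i)
        with open(out, "w") as f:
            json.dump(chunk, f, indent=2)
        out_files.append(out)
    return out_files
```

Each resulting part file can then be referenced from its own copy of the retrieval config and run separately.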

el.py FileNotFoundError in load_kb_snapshot

I set up nordlys on my system. When running the entity linking Python script, I get stuck on a FileNotFoundError for the file 'data/el/snapshot_2015_10.txt' (see below).

The el directory contains only these files:

data/el
data/el/yerd
data/el/yerd/qrels_YERD_er.txt
data/el/yerd/qrels_YERD_elq.txt
data/el/yerd/queries_YERD.json
data/el/yerd/YERD_2015_10.tsv
data/el/erd
data/el/erd/qrels_ERD_elq.txt
data/el/erd/queries_ERD.json
data/el/erd/qrels_ERD_er.txt
data/el/model.txt
data/el/config_ltr.json

Any suggestions?

$ python -m nordlys.services.el 
2019-06-25 17:27:03,639 - nordlys - INFO - Loading KB snapshot of proper named entities ...
Traceback (most recent call last):
  File "/home/ben/nordlys/anaconda-bin/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ben/nordlys/anaconda-bin/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ben/nordlys/nordlys/nordlys/services/el.py", line 198, in <module>
    main(arg_parser())
  File "/home/ben/nordlys/nordlys/nordlys/services/el.py", line 187, in main
    el = EL(conf, Entity(), ElasticCache(DBPEDIA_INDEX), FeatureCache())
  File "/home/ben/nordlys/nordlys/nordlys/services/el.py", line 90, in __init__
    load_kb_snapshot(self.__config["kb_snapshot"])
  File "/home/ben/nordlys/nordlys/nordlys/logic/el/el_utils.py", line 18, in load_kb_snapshot
    with open(kb_file, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/el/snapshot_2015_10.txt'

python package elasticsearch-6.0.0 gives unexpected keyword argument 'analyzer'

Hello,

I followed all the installation instructions successfully and decided to see whether the program can replicate the entity retrieval results.

This gave the following error.

root@IR:~/nordlys# python -m nordlys.core.retrieval.retrieval data/dbpedia-entity-v2-replication-test/config/retrieval_bm25.config.json
2017-12-13 12:16:26,963 - nordlys - INFO - scoring [INEX_LD-2009022] Szechwan dish food cuisine
Traceback (most recent call last):
  File "/root/anaconda3/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/anaconda3/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/nordlys/nordlys/core/retrieval/retrieval.py", line 263, in <module>
    main(arg_parser())
  File "/root/nordlys/nordlys/core/retrieval/retrieval.py", line 256, in main
    r.batch_retrieval()
  File "/root/nordlys/nordlys/core/retrieval/retrieval.py", line 209, in batch_retrieval
    results = self.retrieve(queries[query_id])
  File "/root/nordlys/nordlys/core/retrieval/retrieval.py", line 186, in retrieve
    query = self.__elastic.analyze_query(query)
  File "/root/nordlys/nordlys/core/retrieval/elastic.py", line 116, in analyze_query
    tokens = self.__es.indices.analyze(index=self.__index_name, body=query, analyzer=analyzer)["tokens"]
  File "/root/anaconda3/lib/python3.5/site-packages/elasticsearch/client/utils.py", line 76, in _wrapped
    return func(*args, params=params, **kwargs)
TypeError: analyze() got an unexpected keyword argument 'analyzer'

Uninstalling version 6.0.0 of the package and installing 2.3.0 solved the problem.
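For reference, in elasticsearch-py 6.x the `analyzer` keyword argument was dropped from `indices.analyze()`; the analyzer has to be specified inside the request body instead. A sketch of the 6.x-style call (the index name and the commented-out client usage are assumptions, not the nordlys code):

```python
# Under elasticsearch-py 6.x, the analyzer goes into the _analyze request
# body rather than being passed as a separate keyword argument.

def build_analyze_body(text, analyzer):
    """Build a request body for the 6.x-style _analyze API."""
    return {"text": text, "analyzer": analyzer}

# es = Elasticsearch()
# tokens = es.indices.analyze(index="dbpedia_2015_10",
#                             body=build_analyze_body(query, "standard"))["tokens"]
```

Downgrading to 2.3.0, as described above, restores the older call signature that nordlys expects.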

Rename dbpedia-2015-10 to dbpedia-2015-10_sample

  • Rename data/raw-data/dbpedia-2015-10 => data/raw-data/dbpedia-2015-10_sample. (The two would have exactly the same structure, but the sample is part of the repo and is small.)
  • Update documentation/scripts to work with sample by default.
  • download_all.sh should create data/raw-data/dbpedia-2015-10 and download the full files under that.

Incorrect filename in build_indices.sh for the DBpedia types index

There is a mismatch between the file in the repository and the file expected by build_indices.sh when building the DBpedia types index.

The file in the repository is called index_dbpedia-2015-10_types.config.json while the script looks for index_dbpedia_2015_10_types.config.json.

My suggestion would be renaming the file to the latter to keep the format uniform.

Create config files for DBpedia sample

  • Create config files in data/config_sample based on data/config (same filenames with the paths changed).
  • Also create config files for ER, EL and TTI using the sample indices.
