ncbo_cron's Introduction

NCBO CRON

A project with CRON jobs for the NCBO BioPortal

  • ncbo_cron daemon
  • Process or delete ontology
  • Generate annotator dictionary and cache
  • Calculate metrics
  • Process mapping counts
  • Bulk load mappings

Run the ncbo_cron daemon

To run it, use the bin/ncbo_cron command.

Running this command without options runs the jobs according to the settings defined in the NcboCron config file or, by default, in ncbo_cron/lib/ncbo_cron/config.rb.

You can pass arguments to override some of these settings.

Here is an example that runs the flush old graphs job every 3 hours and disables the automatic pull of new submissions:

bin/ncbo_cron --flush-old-graphs "0 */3 * * *" --disable-pull
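
To make the schedule string concrete: in standard cron syntax the five fields are minute, hour, day of month, month and day of week, so "0 */3 * * *" fires at minute 0 of every third hour. The Ruby sketch below expands a single field by hand. `expand_cron_field` is a hypothetical helper written for illustration, not part of ncbo_cron, and it only handles `*`, `*/n` and a plain number.

```ruby
# Illustrative only: expand one cron field into the values it matches.
# Handles "*", "*/n" and a single integer; real cron syntax also allows
# lists and ranges, which this sketch ignores.
def expand_cron_field(field, min, max)
  case field
  when "*"
    (min..max).to_a
  when %r{\A\*/(\d+)\z}
    step = Regexp.last_match(1).to_i
    raise ArgumentError, "step must be positive" if step.zero?

    (min..max).step(step).to_a
  when /\A\d+\z/
    [field.to_i]
  else
    raise ArgumentError, "unsupported field: #{field}"
  end
end

minute, hour = "0 */3 * * *".split.first(2)
puts "minutes: #{expand_cron_field(minute, 0, 59).inspect}"
puts "hours:   #{expand_cron_field(hour, 0, 23).inspect}"
```

With this schedule the job runs at 00:00, 03:00, 06:00 and so on, eight times a day.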

By default, it runs as a daemon.

It does not run as a daemon if you use one of the following options:

  • console (to open a pry console)
  • view_queue (view the queue of jobs waiting for processing)
  • queue_submission (add a submission to the processing queue)
  • kill (stop the ncbo_cron daemon)

Stop the ncbo_cron daemon

The PID of the ncbo_cron process is in /var/run/ncbo_cron/ncbo_cron.pid

To stop the ncbo_cron daemon:

bin/ncbo_cron -k
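
A script that needs to check on the daemon without stopping it can read the PID file directly. The sketch below is an illustration under assumptions, not ncbo_cron code: `daemon_running?` is a hypothetical helper, and it assumes the PID file contains a single integer.

```ruby
# Hypothetical helper: report whether the process named in a PID file is alive.
# Assumes the file holds one integer PID, as is conventional for daemons.
def daemon_running?(pid_file)
  return false unless File.exist?(pid_file)

  pid = Integer(File.read(pid_file).strip)
  Process.kill(0, pid) # signal 0 only checks that the process exists
  true
rescue Errno::ESRCH, ArgumentError
  false # no such process, or the file did not contain a valid PID
rescue Errno::EPERM
  true # the process exists but belongs to another user
end

puts daemon_running?("/var/run/ncbo_cron/ncbo_cron.pid")
```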

Run manually

Process an ontology

bin/ncbo_ontology_process -o STY

Mappings bulk load

To load a large number of mappings without going through the REST API (which can take a long time):

  • Describe the mappings as JSON in a file. Example (the first mapping contains only the minimum required information):
[
    { 
        "creator":"admin",
        "relation" : ["http://www.w3.org/2004/02/skos/core#exactMatch"],
        "classes" : {   "http://class_id1/id1" : "ONT_ACRONYM1",
                        "http://class_id2/id2" : "ONT_ACRONYM2"}
    },
    { 
        "creator":"admin",
        "source_contact_info":"admin@my_bioportal.org",
        "relation" : ["http://www.w3.org/2004/02/skos/core#exactMatch", "http://purl.org/linguistics/gold/freeTranslation"],
        "Source":"REST",
        "source_name":"Reconciliation of multilingual mapping",
        "comment" : "Interportal mapping with all possible informations (to the NCBO bioportal)",
        "classes" : {   "http://purl.lirmm.fr/ontology/STY/T071" : "STY",
                        "http://purl.bioontology.org/ontology/STY/T071" : "ncbo:STY"}
    }
]
  • Run the job

bin/ncbo_mappings_bulk_load -b /path/to/mapping/file.json -l /path/to/log/file.log
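
Because a bulk load can involve many mappings, it may be worth checking the file before running the job. The sketch below is one possible pre-flight check, not part of ncbo_cron. `mapping_errors` is a hypothetical helper, and the rules it applies (the keys creator, relation and classes are present, and classes holds exactly two entries) are inferred from the example above rather than taken from the loader's actual validation.

```ruby
require "json"

REQUIRED_KEYS = %w[creator relation classes].freeze # inferred from the minimal example

# Hypothetical pre-flight check: return a list of problems found in the
# parsed mappings array (empty when nothing looks wrong).
def mapping_errors(mappings)
  return ["top-level JSON value must be an array"] unless mappings.is_a?(Array)

  mappings.each_with_index.flat_map do |mapping, i|
    next ["mapping #{i}: not a JSON object"] unless mapping.is_a?(Hash)

    errors = (REQUIRED_KEYS - mapping.keys).map { |k| "mapping #{i}: missing \"#{k}\"" }
    classes = mapping["classes"]
    # Assumption: a mapping links exactly two classes, as in both examples.
    if classes.is_a?(Hash) && classes.size != 2
      errors << "mapping #{i}: expected 2 classes, got #{classes.size}"
    end
    errors
  end
end

sample = <<~JSON
  [
    { "creator": "admin",
      "relation": ["http://www.w3.org/2004/02/skos/core#exactMatch"],
      "classes": { "http://class_id1/id1": "ONT_ACRONYM1",
                   "http://class_id2/id2": "ONT_ACRONYM2" } },
    { "creator": "admin" }
  ]
JSON

puts mapping_errors(JSON.parse(sample))
```

In real use, `JSON.parse(File.read(path))` would replace the inline sample, and the load would only be started when the returned list is empty.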

ncbo_cron's People

Contributors

mdorf, ncbo-deployer, jvendetti, palexander, msalvadores, vemonet, ontoportal-bot-lirmm, syphax-bouazzouni, alexskr, jlamarque34, dazza-codes

Watchers

Andon Tchechmedjiev, James Cloos, Elcio Abrahão, Clement Jonquet, Amine Abdaoui, Stella Zevio

ncbo_cron's Issues

Write a Ruby script that removes the data and files of a submission range (from "id1" to "id2") of an ontology.

The need

A script that, given an "ontology acronym" and a range of submission ids (from "submission id" to "submission id"), removes all the data (triple store) and files of the submissions inside that range.

Use case

On the "stageportal" server, inside the "/srv/ontoportal/ncbo_cron" folder, the following command:

[ontoportal@stageportal ncbo_cron]$ bin/ncbo_ontology_submissions_eradicate -o MDRFRE --from 2 --to 5

will:

  1. Show you the list of all submissions that will be affected (removed).
  2. Ask you to confirm (by typing "y" for yes or "n" for no) that you want to remove the data and files of all submissions between 2 (included) and 5 (included).

Important: after the script has run, a cache refresh is needed to see the result.

Details

  • The arguments "from" and "to" are reversible, i.e. --from 2 --to 5 is equivalent to --from 5 --to 2.
  • The script deletes the submission folder (files), then the data.
  • Only archived submissions (not used in production) can be removed.
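
The range handling described above can be sketched as follows. This is a hypothetical illustration rather than the script's actual code: `submissions_to_remove` and the `Submission` struct are made-up names, and the sketch assumes each submission exposes an id and an archived flag.

```ruby
# Hypothetical stand-in for a submission record.
Submission = Struct.new(:id, :archived)

# Select the submissions inside an inclusive id range, accepting the bounds
# in either order, and keep only the archived ones.
def submissions_to_remove(submissions, from, to)
  low, high = [from, to].minmax
  submissions.select { |s| s.id.between?(low, high) && s.archived }
end

submissions = [
  Submission.new(1, true),
  Submission.new(2, true),
  Submission.new(3, true),
  Submission.new(5, true),
  Submission.new(6, false) # latest submission, still in use
]

p submissions_to_remove(submissions, 5, 2).map(&:id)
```

Sorting the two bounds with `minmax` is what makes `--from 5 --to 2` behave like `--from 2 --to 5`. Ids missing from the range (4 in this sample) are simply skipped.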

Write a script to test for and pull a new ontology version if one exists

Issue

The only place where the existence of a new file is tested is the pull_location CRON job, which runs once a day.

The problem is that reprocessing an ontology with the ncbo_ontology_process script does not test for the existence of a new version.

Solution

There are two ways to solve this:

  1. The first and simplest is to add a script called ncbo_ontology_pull that does the pull on demand.
  2. The second, more complex, is to add a step called do_pull_location to the submission process workflow, before the generate_rdf step, that downloads the file and creates a new submission if a new version is found.
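
Either option needs a way to decide whether the downloaded file is actually a new version. The issue does not say how this is detected. The sketch below assumes a checksum comparison, and `new_version?` is a hypothetical helper, not an existing ncbo_cron method.

```ruby
require "digest"
require "tmpdir"

# Hypothetical check: treat the downloaded file as a new version when its
# MD5 digest differs from that of the file behind the latest submission.
def new_version?(current_file, downloaded_file)
  return true unless File.exist?(current_file)

  Digest::MD5.file(current_file).hexdigest != Digest::MD5.file(downloaded_file).hexdigest
end

# Demonstration with throwaway files standing in for ontology files.
Dir.mktmpdir do |dir|
  current = File.join(dir, "current.owl")
  fresh   = File.join(dir, "downloaded.owl")

  File.write(current, "version 1")
  File.write(fresh, "version 2")
  puts new_version?(current, fresh) # contents differ

  File.write(fresh, "version 1")
  puts new_version?(current, fresh) # contents identical
end
```

An on-demand script and a workflow step could share such a check, so that a new submission is only created when the content has changed.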

Set enable_pull_umls to "true" in the ncbo_cron config file

In config/config.rb, add config.enable_pull_umls = true:

NcboCron.config do |config|
  config.redis_host           = Annotator.settings.annotator_redis_host
  config.redis_port           = Annotator.settings.annotator_redis_port
  config.enable_ontology_analytics = false
  config.search_index_all_url = 'http://localhost:8983/solr/term_search_core2'
  config.property_search_server_index_all_url = 'http://localhost:8983/solr/prop_search_core2'
  config.ontology_report_path = "#{$DATADIR}/reports/ontologies_report.json"
  config.enable_spam_deletion = false
  config.enable_pull_umls     = true
end

The changes were made in https://gite.lirmm.fr/bioportal/bioportal-configs/-/blob/master/agroportal/cron/config/config.rb for stage, agro and bioportal.

merge ncbo_cron to upstream

Merge to the tag: v 5.22.1
Result: No conflicts

Changed files:

State:

  • Deployed on stage
  • Deployed on production
