geneontology / noctua

Graph-based modeling environment for biology, including prototype editor and services

Home Page: http://noctua.geneontology.org/

License: BSD 3-Clause "New" or "Revised" License

Shell 0.01% CSS 0.31% HTML 0.01% JavaScript 99.68% Procfile 0.01%

noctua owl go-cam geneontology pathways curation ontology noctua-models functional-annotation annotation

noctua's Introduction

The Noctua Stack

The Noctua Stack is a curation platform developed by the Gene Ontology Consortium. The stack is composed of:

Minerva: the backend data server used to retrieve, store, update, and delete annotations.
Barista: an authentication layer controlling and formatting all communication to and from Minerva.
Noctua: the website for browsing annotations in production and development. It also provides an editorial platform for producing Gene Ontology Causal Activity Models (GO-CAMs), using either the simple Noctua Form UI or the more advanced Graph Editor.

The biological knowledge is stored as RDF/OWL in a Blazegraph triplestore. In effect, any piece of knowledge stored in RDF/OWL is a triple { subject, predicate, object } defining a relationship (or association) between a subject and an object. These triples are also commonly stored in Turtle files.
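
As a rough illustration of the triple structure described above, the sketch below represents one triple as a plain JavaScript object and serializes it as an N-Triples-style line. The helper name `toNTriple` and the `example.org` IRIs are assumptions made for illustration; this is not code from Minerva, and it only handles triples whose three parts are all IRIs.

```javascript
// Hypothetical helper (not part of Minerva): serialize one
// { subject, predicate, object } triple, where all three parts are IRIs,
// into a single N-Triples-style line.
function toNTriple(triple) {
  const wrap = (iri) => `<${iri}>`;
  return `${wrap(triple.subject)} ${wrap(triple.predicate)} ${wrap(triple.object)} .`;
}

// Placeholder individuals under example.org; the predicate IRI is the
// RO property that appears in the OWL test case later in this document.
const exampleTriple = {
  subject: 'http://example.org/individual-1',
  predicate: 'http://purl.obolibrary.org/obo/RO_0002211',
  object: 'http://example.org/individual-2',
};

const exampleLine = toNTriple(exampleTriple);
```

Literal objects (strings, numbers) would need different quoting and are deliberately left out of this sketch.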

Installation

Pre-requisite

You must have npm installed. On Ubuntu/Debian, type:

sudo apt-get install nodejs

On OSX, it is also possible to install npm either from nodejs.org or using brew:

brew install node

Steps for a local Installation

# The full Noctua stack is a multi-repository project; optionally create a main directory for the stack to contain all the repositories.
# These instructions assume that "gulp" is in your path; if local-only, use: `./node_modules/.bin/gulp`.

# Creating a local directory for our work.
mkdir noctua-stack && cd noctua-stack

# Repo containing metadata (users, groups, etc.).
git clone https://github.com/geneontology/go-site.git
# The data repo to start the store and save to.
git clone https://github.com/geneontology/noctua-models.git
# Repo for the backend server.
git clone https://github.com/geneontology/minerva.git
# Repo for the Noctua client and middleware (Barista).
git clone https://github.com/geneontology/noctua.git

# Build the Minerva server (and CLI).
cd minerva && sh ./build-cli.sh && cd ..

# Create default authentication users with your favorite editor.
mkdir barista
vim barista/local.yaml
# Example contents of barista/local.yaml:
-
 uri: 'http://orcid.org/XXXX-XXXX-XXXX-XXXX'
 username: my_username
 password: my_password

# Install Noctua Form (old "simple-annoton-editor")
git clone https://github.com/geneontology/noctua-form.git
git clone https://github.com/geneontology/noctua-landing-page.git

# Install Noctua as an all-local installation.
cd noctua
npm install
cp config/startup.yaml.stack-dev ./startup.yaml

# Edit configuration file (barista, user, group, noctua models location, minerva memory to at least 16GB, link to NoctuaForm / SAE)
vim startup.yaml

# Build the stack and Blazegraph Journal (triplestore)
./node_modules/.bin/gulp build
# If running for the first time.
./node_modules/.bin/gulp batch-minerva-destroy-journal
./node_modules/.bin/gulp batch-minerva-destroy-ontology-journal
./node_modules/.bin/gulp batch-minerva-create-journal

# Then launch the stack, waiting for each to successfully start up:
./node_modules/.bin/gulp run-minerva &> minerva.log &
./node_modules/.bin/gulp run-barista &> barista.log &
./node_modules/.bin/gulp run-noctua &> noctua.log &

Additional notes

Gulp Tasks

doc - build the docs, available in doc/
test - need more here
build - assemble the apps for running
watch - development file monitor
clean - clean out /doc and /deploy

In addition, the last three lines of the installation steps launch the three layers of the Noctua Stack:

gulp run-minerva &> minerva.log &
gulp run-barista &> barista.log &
gulp run-noctua &> noctua.log &

Gulp can also be used to destroy and create the Blazegraph journals (triplestore):

gulp batch-minerva-destroy-journal
gulp batch-minerva-destroy-ontology-journal
gulp batch-minerva-create-journal

Users & groups

Barista, the authentication layer, needs two files to run: users.yaml and groups.yaml. These files define who is authorized to log in to the Noctua Stack to perform biological curation.

To know more about curation with the Noctua Stack, visit our wiki.
To request an account to curate with the Noctua Stack, contact us

Libraries and CLI to communicate with the Noctua Stack

bbop-manager-minerva

This is the high-level API with OWL formatted requests (e.g. add individual, add fact or evidence using class expressions). https://github.com/berkeleybop/bbop-manager-minerva

minerva-requests

This is the request object used to format specific queries to Minerva. It is composed of a basic request object as well as a request_set designed to chain multiple request objects and speed up complex tasks. https://github.com/berkeleybop/minerva-requests

Some useful details about the API are described here

CLI (REPL)

The Noctua REPL is a recommended starting point for anyone trying to learn the syntax and how to build requests to Minerva in the Noctua Stack. Like any REPL, it allows rapid testing of multiple commands and inspection of the responses from Barista. This project can be considered a basic prototype for any other client that wants to interact with the stack.

https://github.com/geneontology/noctua-repl

Known issues

The bulk of major issues and feature requests are handled by the tracker (https://github.com/geneontology/noctua/issues). If something is not mentioned here or in the tracker, please contact Seth Carbon or Chris Mungall.

Sometimes, when moving instances or relations near a boundary, the relations will fall out of sync; either move nearby instances or refresh the model
Sometimes, when editing an instance, the relations (edges) will fall out of sync; either move nearby instances or refresh the model
The endpoint scheme is reversed between creation and instantiation
TODO, etc.


noctua's Issues

Revisit evidence model

Richer and nested evidence. What should the end model look like? Obviously more than the current annotation model.

Import from BEL

Add import features from BEL:
http://neurolex.org/wiki/Category:Resource:Biological_Expression_Language_Framework

Add model metadata editor

We would like to be able to add metadata at the level of the model.

On the server, this would be implemented as a collection of OWLAnnotations, where each OWLAnnotation is essentially a property-value pair: http://owlapi.sourceforge.net/javadoc/org/semanticweb/owlapi/model/OWLAnnotation.html - an annotation value is either a URI or an XSD literal (number, string, etc.)

The set of properties is open-ended, drawn from vocabularies such as RDFS and Dublin Core. Of particular interest:

dc:title - e.g. "wnt signaling in epithelial crypts in mouse"
dc:description - similar to an abstract in a paper
dc:author - auto-filled using persona?
dc:created - auto-filled
rdfs:comment - zero or more per model
dc:source - e.g. pubmed URL
?:status - e.g. in-progress, in-review, completed.

The client code should not care about any property in particular, it should be generic. We may want the server to provide the list on request; or it could be in the client config. The exception may be some kind of status field that determines whether the model is persisted on the server.
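
To make the "generic client" idea concrete, here is a minimal sketch in which model-level annotations are treated as opaque property-value pairs and grouped for display without special-casing any property. The data shape and the function name `groupByProperty` are assumptions for illustration, not the actual server payload or client code.

```javascript
// Hypothetical model-level annotations: generic property-value pairs.
// The property names come from the list above; the shape is an assumption.
const modelAnnotations = [
  { property: 'dc:title', value: 'wnt signaling in epithelial crypts in mouse' },
  { property: 'rdfs:comment', value: 'first comment' },
  { property: 'rdfs:comment', value: 'second comment' },
];

// Group values by property without knowing anything about specific
// properties, so new properties need no client changes.
function groupByProperty(pairs) {
  const grouped = new Map();
  for (const { property, value } of pairs) {
    if (!grouped.has(property)) {
      grouped.set(property, []);
    }
    grouped.get(property).push(value);
  }
  return grouped;
}

const groupedAnnotations = groupByProperty(modelAnnotations);
```

A form-like editor could then render one row per entry of `groupedAnnotations`, with "zero or more per model" properties such as rdfs:comment naturally appearing as multi-valued rows.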

The metadata interface would be a simple form-like interface; properties could be autocompletes or pulldowns. We may want to allow users to enter an arbitrary URI for a property.

Some of the metadata could be exposed on the front page for model selection (but this functionality will likely be subsumed by AmiGO)

Switch to better addition templates; add better specifications for them

Graph addition templates need to have easier and more independent configuration options.

Add a basic form creation interface

There are stubs to start prototyping this.
Basic.js
basic_content.tmpl

Dump to cytoscape formats over total model set

The ability to export global(!) data for viewing.
Flat?
XGMML?
How do we link?
Webcytoscape or desktop?

Use RO, plus a hard-coded popular subset instead of current "system"

The current "system" is rather lame: just a poor hardcoded list. We need to replace this with something more flexible delivered from some source over the wire.

Use Jenkins pipeline to produce GAFs from LEGO

Use the Jenkins pipeline to produce GAFs from LEGO. Probably for Heiko.
Questions:

annotation extensions?
should we use BP and CC in the extension column?
should it go to GPAD-1?

Decommission the probulator

The server should now return a separate top-level list of facts (ObjectPropertyAssertions) keyed under "facts". This is redundant with the OPAs returned within individuals, keyed by property shortId, which are now deprecated and will be removed in the future.

{
  // USE ME
  "facts": [
    {
      "object": "gomodel:52f19a000000001-GO-0004872-52f19a000000002",
      "property": "part_of",
      "subject": "gomodel:52f19a000000001-GO-0007166-52f19a000000003"
    }
  ],
  "properties": [
    {
      "id": "part_of",
      "type": "ObjectProperty",
      "label": "part_of"
    }
  ],
  "individuals": [
    {
      // IGNORE ME
      "part_of": [
        {
          "id": "gomodel:52f19a000000001-GO-0004872-52f19a000000002",
          "type": "NamedIndividual"
        }
      ],
      // END OF IGNORE ME

      "id": "gomodel:52f19a000000001-GO-0007166-52f19a000000003",
      "type": [
        {
          "id": "GO:0007166",
          "type": "Class",
          "label": "cell surface receptor signaling pathway"
        }
      ]
    },
    {
      "id": "gomodel:52f19a000000001-GO-0004872-52f19a000000002",
      "type": [
        {
          "id": "GO:0004872",
          "type": "Class",
          "label": "receptor activity"
        }
      ]
    }
  ]
}
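
A client-side sketch of reading edges from the top-level "facts" list, while ignoring the deprecated per-individual property keys, might look like the following. The function name `edgesFromFacts` and the returned edge shape are assumptions for illustration; the input is a trimmed copy of the payload above.

```javascript
// Trimmed copy of the response shown above (comments removed so it is
// valid as a plain object literal).
const response = {
  facts: [
    {
      object: 'gomodel:52f19a000000001-GO-0004872-52f19a000000002',
      property: 'part_of',
      subject: 'gomodel:52f19a000000001-GO-0007166-52f19a000000003',
    },
  ],
  properties: [{ id: 'part_of', type: 'ObjectProperty', label: 'part_of' }],
  individuals: [],
};

// Hypothetical helper: build display edges from "facts" only, looking up
// a human-readable label in "properties" and falling back to the raw id.
function edgesFromFacts(resp) {
  const labels = new Map((resp.properties || []).map((p) => [p.id, p.label]));
  return (resp.facts || []).map((fact) => ({
    source: fact.subject,
    target: fact.object,
    relation: fact.property,
    label: labels.has(fact.property) ? labels.get(fact.property) : fact.property,
  }));
}

const edges = edgesFromFacts(response);
```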

Revisit model level annotations

When editing model level annotations, we want:

open-ended properties
don't show evidence type
- Really? What if the model as a whole comes from a single source, like a textbook?

Save agent

A proposed draft of a save agent. Chris believes that just a cron job would be easier, if less powerful.

#!/bin/bash
set -e
# Any subsequent commands which fail will cause the shell script to exit immediately

# Default values
DO_UPDATE="FALSE"
DO_COMMIT="FALSE"
WORK_FOLDER="$(pwd)"

## Command line parsing
while [[ $# -gt 0 ]]
do
key="$1"
shift

case $key in
    -u|--update)
    DO_UPDATE="TRUE"
    ;;
    -c|--commit)
    DO_COMMIT="TRUE"
    ;;
    -f|--folder)
    WORK_FOLDER="$1"
    shift
    ;;
    *)
      # unknown option
    ;;
esac
done

#echo "DO UPDATE   = $DO_UPDATE"
#echo "DO COMMIT   = $DO_COMMIT"
#echo "WORK FOLDER = $WORK_FOLDER"

if [ "$DO_UPDATE" = "TRUE" ]; then
  echo "UPDATE Folder: $WORK_FOLDER"
  svn update --accept mine-full "$WORK_FOLDER"
fi

if [ "$DO_COMMIT" = "TRUE" ]; then
  echo "ADD unversioned files to SVN in folder: $WORK_FOLDER"
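  # List unversioned files ("?" status) and add them; -r skips "svn add" when there are none.
  svn st "$WORK_FOLDER" | grep '^?' | awk '{print $2}' | xargs -r svn add

  echo "COMMIT folder: $WORK_FOLDER"
  # Draft: the commit is only echoed (dry run); remove "echo" to actually commit.
  echo svn commit -m "automatic update of models" "$WORK_FOLDER"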
fi

Details/addType/edit shield for instances

Will need to finish merge's gut-and-redraw code.

Power user capability: enter OWL class expressions for any class slot

Example: enabled_by some C

Where C is a class expression such as

('foo complex' and has_part some X and has_part some Y)

Eventually this will be superseded by something like a pluggable cell component editor. For now we need a way for power users who understand OWL to get this in.

Currently enabled_by assumes it's a named class (that has been loaded into golr)

Allow user to enter free text. Server will figure out how to interpret. E.g. We will try to parse using manchester syntax tool. @hdietze and me to implement

Pipeline from Reactome

Pipeline from Reactome. Ideally, the user is able to just input an ID somewhere and have the converted model loaded/merged in.

Questions about exactly when in the process this would be best.

Use separate templates and widgets for all HTML

There should be no HTML in App.js; everything displayed should go through a widget or an API.
Need to simplify codebase and enforce separation.

Authentication and authorization

authentication and authorization

Box nesting for part_of

This will be a rather involved enhancement.
Should go after other refactors.

Add simple plugin framework

Would like to start a general framework for adding things like this: on startup, scan the plugins namespace and register whatever is found there, e.g.:

bbop.mme.plugin.foo = function([???]){
this.name = "Bar";
...
};

Would have the option of adding tab to lower area.
Would be one of these types: always on, toggleable, clickable.

Chris's dl query ( #1 ) would be the clickable type--operating incrementally on that event, likely having the ability to add classes to the display to accomplish the activity. Plugins would be able to store their own state.

I think that with this, several of the things in tabs now could be rewritten as plugins, improving code structure.
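
The startup scan described in this issue could be sketched roughly as follows. Everything here is hypothetical: `pluginNamespace` stands in for something like `bbop.mme.plugin`, and the `type` and `state` fields on each plugin are assumed conventions, since the issue leaves the constructor signature open.

```javascript
// Stand-in for a plugins namespace such as bbop.mme.plugin (assumption).
const pluginNamespace = {
  foo: function () {
    this.name = 'Bar';
    this.type = 'clickable'; // one of the three types named in the issue
    this.state = {};         // plugins keep their own state
  },
  notAPlugin: 42,            // non-function entries are skipped
};

const PLUGIN_TYPES = ['always on', 'toggleable', 'clickable'];

// Scan the namespace once at startup, instantiate every constructor
// found, and keep only plugins that declare a known type.
function scanPlugins(namespace) {
  const registry = [];
  for (const key of Object.keys(namespace)) {
    const PluginCtor = namespace[key];
    if (typeof PluginCtor !== 'function') {
      continue;
    }
    const plugin = new PluginCtor();
    if (PLUGIN_TYPES.includes(plugin.type)) {
      registry.push(plugin);
    }
  }
  return registry;
}

const pluginRegistry = scanPlugins(pluginNamespace);
```

Whether a plugin also gets a tab in the lower area would be one more optional field on the plugin object; that part is not sketched here.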

Occasional non-deterministic layout?

http://localhost:8910/seed/model/gomodel:wb-GO_0043053

It appears that Chrome and Firefox have different opinions about where "nuclear import" should be, which might be confusing (although, if telekinesis is allowed, they would almost immediately align with a jump).

See how serious this is.

Implement service to map tokens to user data

Essentially a page and a service for remote clients to figure out the facts of the person who logged in. Not a directory since we don't want to be spiderable, but if you have a token you can get identity information.

Implement live synced view all the time

Depends, is related to, #14

constraints on simple addition templates

@cmungall As discussed, I've re-added the separate templates for BP and MF. Just to have it on hand, what are the autocomplete field constraints that you're interested in? If necessary, dynamic could be a possibility.

JSON-LD: use of "type" as key

The JSON currently uses the key "type" in two different ways - one to list all types in a model, another to list types for an individual. This results in an odd nesting.

(this is perhaps more of an owltools request but easier to include here, change will need coordinate with client)

Handling of inferred types

Currently the server returns the asserted type of an individual.

The server is capable of returning the direct inferred type (via Elk). It would be useful to show this, but the best way to communicate it from the server, and to show it to the user, is not yet clear.

Choices

leave as-is
server only returns direct inferred type in json payload. Client code requires no modification. Disadvantage: user may be confused as the original source assertion is no longer visible.
server returns both asserted and inferred type. Client code is modified to show inferred type in main plumb view. Both asserted and inferred types could be shown in more detailed view (e.g. when a user edits a node, or views a node in the bottom tab). Subchoices here are:
- both direct inferred and asserted are in the payload but not distinguished; not ideal, client can't distinguish these
- as above, but additional metadata provided (e.g. is_inferred:true). Not ideal, as this is not the property of the class; it's the property of the relationship between individual and class
- we have two separate keys: type and inferred_type with appropriate values. I think this may be best
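
As a sketch of the last sub-choice (separate `type` and `inferred_type` keys), a client could pick what to show per view as below. The payload shape, the function names, and the placeholder class id `EXAMPLE:0000001` are assumptions for illustration; nothing here is an agreed server format.

```javascript
// Hypothetical individual carrying both keys. The asserted class is taken
// from the payload example elsewhere in this document; the inferred class
// uses a placeholder id because no real id is given in the text.
const individual = {
  id: 'example-individual-1',
  type: [{ id: 'GO:0004872', type: 'Class', label: 'receptor activity' }],
  inferred_type: [
    { id: 'EXAMPLE:0000001', type: 'Class', label: 'signaling receptor activity' },
  ],
};

const labelOf = (cls) => cls.label || cls.id;

// Main plumb view: prefer the inferred types, fall back to asserted ones.
function mainViewLabels(ind) {
  const inferred = ind.inferred_type || [];
  const asserted = ind.type || [];
  return (inferred.length > 0 ? inferred : asserted).map(labelOf);
}

// Detailed view: show both, kept apart so the source assertion stays visible.
function detailViewLabels(ind) {
  return {
    asserted: (ind.type || []).map(labelOf),
    inferred: (ind.inferred_type || []).map(labelOf),
  };
}
```

Because the distinction lives in the key rather than on the class object, the same class can appear as asserted for one individual and inferred for another without extra metadata.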

Example

occurs in

User creates "signal transducer activity" individual, with an additional anon class type assertion, "occurs_in some cytoplasm"

Currently user sees this:

We may also want to show the direct inferred type "intracellular signal transducer activity"

part of

User creates an MF instance of "receptor activity". Then creates a BP instance of some subtype of "signal transduction" (e.g. cell surface receptor signaling pathway). Adds a part of. The former should have a direct inferred type to the more specific "signaling receptor activity"

Currently sees this:

Note: MolecularModelManagerTest.testInferredType() is a stub test that performs this set up, but has no junit assertions until we decide what to do here

College Station

Tasks to be completed by the GO meeting in Texas, at College Station.

Model summary tab or page

(file under non-urgent)

col1: molecular entities (things that are the subject of enabled bys, possibly the leaves in the class expression if a complex/class expression)
col2: ontology classes (targets of direct type assertions, plus targets of occurs in)
col3: publication(s). The first one listed should be any annotation assertion at the level of the model

For look and feel, see:
http://curation.pombase.org/genotypetest/curs/23297b95cc45755a

Might also be nice to eventually have a user page like this:
http://curation.pombase.org/genotypetest/

Bioentity autocomplete boxes to constrain by species

Use MyGene?

Allow inconsistent actions and report to user

Possibly change in screen color, etc.

Add ECO to initial pull for labels

Try and get ECO up front in initial load, like the other up front loads.

Meaningful glyphs for edges

activate - arrow ->
inhibition - bar -|
neutral, such as upstream - diamond -<>
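
The mapping above is small enough to express as a lookup table. In this sketch the key names follow the list above, while the function name `glyphFor` and the choice to fall back to the neutral glyph for unknown kinds are assumptions.

```javascript
// Edge kinds and their text glyphs, as listed above.
const EDGE_GLYPHS = new Map([
  ['activate', '->'],
  ['inhibition', '-|'],
  ['neutral', '-<>'],
]);

// Assumption: anything unrecognized is drawn with the neutral glyph.
function glyphFor(kind) {
  return EDGE_GLYPHS.has(kind) ? EDGE_GLYPHS.get(kind) : EDGE_GLYPHS.get('neutral');
}
```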

Add comment manipulation

Allow the display/manipulation of comments.

Write own bezier+ implementation for jsPlumb

Looks like just plugins and overrides. Started already with:

connectors-sugiyama.js

Test case for restricting connectivity of boxes

It should be impossible to make this model. Note: the implementation needs to happen in the ontology before we can do anything in the UI here.

Prefix: owl: <http://www.w3.org/2002/07/owl#>
Prefix: rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
Prefix: xml: <http://www.w3.org/XML/1998/namespace>
Prefix: xsd: <http://www.w3.org/2001/XMLSchema#>
Prefix: rdfs: <http://www.w3.org/2000/01/rdf-schema#>



Ontology: <http://purl.obolibrary.org/obo/gomodel_pombase-52fa621d0000002>

Import: <http://purl.obolibrary.org/obo/ro.owl>
Import: <http://purl.obolibrary.org/obo/go/extensions/ro_pending.owl>
Import: <http://purl.obolibrary.org/obo/go.owl>

ObjectProperty: <http://purl.obolibrary.org/obo/RO_0002211>


Class: <http://purl.obolibrary.org/obo/GO_0048018>


Class: <http://purl.obolibrary.org/obo/GO_0004707>


Individual: <http://purl.obolibrary.org/obo/gomodel_pombase-52fa621d0000002-GO-0004707-52fa621d0000004>

 Types:
 <http://purl.obolibrary.org/obo/GO_0004707>


Individual: <http://purl.obolibrary.org/obo/gomodel_pombase-52fa621d0000002-GO-0048018-52fa621d0000003>

 Types:
 <http://purl.obolibrary.org/obo/GO_0048018>

 Facts:
 <http://purl.obolibrary.org/obo/RO_0002211> <http://purl.obolibrary.org/obo/gomodel_pombase-52fa621d0000002-GO-0004707-52fa621d0000004>

Need temporary reminder to users that they are dealing with alpha software

Need temporary reminder to users that they are dealing with alpha software.

Probably a dismissible modal.

Cloning models, versioning, and history

This touches on issues of metadata, versioning, and what to do in the long run with the backend.
Like ~~#33~~ and #34

Moderator server architecture

Consider merging launcher and messenger into a unified server. Client sends to moderator, moderator checks and passes to MMM, gets response from MMM, passes back to all clients (with intention?).

message queue to prevent races, etc. could catch conflicts early
would also mean that a&a could be entirely on the client, with the MMM server just listening to whitelist addresses or something.
could also mean that the layout is stored by the messenger server, allowing people a more sane way to interact without conflict (by going all-in on shared)

Possibly need to add intentions to the server calls.
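
A minimal sketch of the moderator idea, assuming a promise-based queue: requests are forwarded to the backend one at a time, and each response is broadcast to all registered clients together with the request's intention. The class name `Moderator`, the `backend` callback (standing in for the call to the MMM server), and the message shape are all hypothetical.

```javascript
// Hypothetical moderator: serializes requests and broadcasts responses.
class Moderator {
  constructor(backend) {
    this.backend = backend;        // async function standing in for the MMM call
    this.clients = new Set();      // callbacks for connected clients
    this.tail = Promise.resolve(); // end of the request queue
  }

  register(client) {
    this.clients.add(client);
  }

  // Queue a request; it only starts after all earlier requests have
  // finished, which is what prevents races between clients.
  submit(request) {
    const run = async () => {
      const response = await this.backend(request);
      for (const client of this.clients) {
        client({ intention: request.intention, response });
      }
      return response;
    };
    const result = this.tail.then(run);
    // Keep the queue usable even if this request fails.
    this.tail = result.catch(() => {});
    return result;
  }
}
```

A caller would create one `Moderator` around the backend call, `register` a callback per connected client, and route every edit through `submit`. Conflict detection and stored layout, both mentioned above, would be further steps inside `run` and are not sketched.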

Add evidence manipulation

Allow evidence to be handled in the model.

Add ability to expand main plumb view

Sometimes I want more real estate for the nice graph views. The left panel and bottom panel aren't always required. It would be nice to have a button to pop this out to fullscreen

View option to show MF only

Add a view option to show only boxes that have an enabled_by assertion. Low priority

Batch application of properties

Feature request by David when trying a more complicated example.

The takeaway from the request is that there should be an option at the [model] level that says something like [Add PMID to all instances] -> pop-up with a single input, followed by a batch request for all additions.

Time saver in larger models.
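
One way to picture the batch step: take the single value from the pop-up and expand it into one request per instance, to be sent together. The request shape below is purely illustrative and is not the Minerva wire format; `buildBatchAnnotationRequests`, the `source` key, and the placeholder PMID are assumptions.

```javascript
// Hypothetical: expand one user-supplied value into one request per
// individual in the model. The object shape is illustrative only.
function buildBatchAnnotationRequests(modelId, individualIds, key, value) {
  return individualIds.map((individualId) => ({
    operation: 'add-annotation',
    model: modelId,
    individual: individualId,
    annotation: { key, value },
  }));
}

const batchRequests = buildBatchAnnotationRequests(
  'gomodel:example',       // placeholder model id
  ['ind-1', 'ind-2'],      // placeholder individual ids
  'source',
  'PMID:0000000'           // placeholder, not a real reference
);
```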

Use mygene API for completion where appropriate

Like in the main app.
May need to create new manager for this.

Add ability to add additional someValuesFrom type expressions for an instance

Example:

User has created an instance of a molecular function. The interface allows them to add two additional SomeValuesFrom class expressions,

occurs_in some L
enabled_by some M

The user would like to add arbitrary additional someValuesFrom expressions

R some Y

Particular example:

has_direct_input some GeneProduct123

But any relation from the RO can be used.

This could be at the time of initial box creation, or a post-create edit on the box
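
A small sketch of how a client might assemble such "R some Y" strings from a relation and a filler chosen in the UI. The helper names are hypothetical, and the quoting rule (single quotes around names containing whitespace, as in 'foo complex') is a simplification that does not handle names containing apostrophes.

```javascript
// Quote a name in single quotes when it contains whitespace.
function quoteName(name) {
  return /\s/.test(name) ? `'${name}'` : name;
}

// Build one "R some Y" expression.
function someValuesFrom(relation, filler) {
  return `${quoteName(relation)} some ${quoteName(filler)}`;
}

// The three expressions from the example above, built from pairs.
const boxExpressions = [
  ['occurs_in', 'L'],
  ['enabled_by', 'M'],
  ['has_direct_input', 'GeneProduct123'],
].map(([relation, filler]) => someValuesFrom(relation, filler));
```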

Data files need to be under control of github

Need to have a better way of saving files permanently.
Use GO SVN?

Switch to new relations handling (on the end client)

Currently using /getRelations in bootstrap, need to switch to the new handler.
Talk to Heiko if you forget (again).

Use Jenkins pipeline to seed LEGO from GAFs?

Using the Jenkins pipeline to seed LEGO from GAFs?
Possibly for Chris.

Rework landing/selection screen to be more intuitive

This will relate to:

what metadata we store and how it operates with a&a for model recovery
what is permissible and what remote sources we have for initial generation

Add a tab similar to Protege "DL Query" tab

Add a new tab in the bottom div (e.g. to the right of EMPTY)

The tab should contain a simple textbox, in which advanced users can type DL expressions such as one of the following:

'kinase activity'
'part_of' some 'dauer entry'
'regulates' some 'dauer entry'
'occurs_in' some 'cell'
'occurs_in' some 'extracellular region' and 'regulates' some 'kinase activity'
'enabled_by' some 'gene123'

In the future this could be assisted by autocomplete, but this is
non-trivial; for now the assumption is that the user is an ontologist familiar
with both OWL and the GO, and is capable of typing the correct string.

The user then clicks 'submit'; the string is passed as-is to a
server method that wraps MMM.getIndividualsByQuery(modelId, queryString)
(I just committed this)

(this method parses the manchester string to an expression and uses
the reasoner to find the individuals).

This returns a list of individuals which should be a subset of
individuals in the model. These individuals are then highlighted
somehow in the main display. Alternatively, they may be shown in some
kind of list view.
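
On the client side, the highlighting step could be as simple as intersecting the returned ids with the ids currently displayed. This sketch assumes the server method returns a plain list of individual ids; the function name `individualsToHighlight` is hypothetical.

```javascript
// Hypothetical helper: decide which boxes to highlight after a DL query.
// Results that are not in the displayed model are dropped defensively.
function individualsToHighlight(modelIndividualIds, queryResultIds) {
  const inModel = new Set(modelIndividualIds);
  // Preserve the order in which the server returned the results.
  return queryResultIds.filter((id) => inModel.has(id));
}
```

For example, with displayed ids `['a', 'b', 'c']` and a query result of `['c', 'x', 'a']`, only `c` and `a` would be highlighted.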

Note that this mimics the DL query tab in Protege, with the
"individuals" selection.

Optional future enhancements include

ability to build a library of multiple queries
query builder
add the query to the ontology as an equivalence axiom

Add a comment on a box

It should be possible to attach any OWLAnnotation (prop-value pair: not an annotation in the GO sense) to any individual. We may want to provide a shortcut for some like rdfs:comment, and have special default (but customizable) behavior in the plumb view.

A user should be able to click on a box, select "add comment", and then type in free text.

We may want this to be visible on a mouseover.

Move towards having a plugin architecture

Consider plugin architecture, maybe as part of refactor.
Would like to start a general framework for adding things like this: on startup, scan the plugins namespace and register whatever is found there, e.g.:

bbop.mme.plugin.foo = function([???]){
  this.name = "Bar";
  // ...
};

Has option of adding tab below.

Is one of these types: always on, toggleable, clickable.

Chris's DL query (#1) would be the clickable type--operating incrementally on that event, likely having the ability to add classes to the display to accomplish the activity.
Plugins would be able to store their own state.