
translator-knowledge-beacon's Introduction

Purpose

Overview

This project documents the Knowledge Beacon Application Programming Interface (KBAPI).

Specifically, this repository holds the OpenAPI ("Swagger") definition of the KBAPI specification archived in the 'api' subfolder.

Check out our Knowledge Beacon Wiki for additional documentation on the status of Knowledge Beacon API implementations and some additional notes on how to build your own.

This API was developed as an early research and development effort during the Feasibility Phase of the Biomedical Knowledge Translator Consortium ("Translator"), funded by the National Center for Advancing Translational Sciences ("NCATS") program of the US National Institutes of Health. In the "Development" phase of Translator, the concept of the Knowledge Beacon API has been supplanted by the Translator Reasoner Application Programming Interface ("TRAPI"), although Knowledge Beacons remain hosted online at https://kba.ncats.io as documented in our paper (citation below).

Citation

Hannestad LM, Dančík V, Godden M, Suen IW, Huellas-Bruskiewicz KC, Good BM, et al. (2021) Knowledge Beacons: Web services for data harvesting of distributed biomedical knowledge. PLoS ONE 16(3): e0231916. https://doi.org/10.1371/journal.pone.0231916

Knowledge Beacons Workflow

The KBAPI is primarily designed to support a simple knowledge discovery workflow. The endpoints are summarized in the following table:

Section     Endpoint                   Description
Metadata    /categories                List of available concept categories
            /predicates                List of available predicates
            /kmap                      Knowledge map of the beacon
Concepts    /concepts                  Query concepts by keywords
            /concepts/{conceptId}      Details about concept
            /exactmatches              Retrieve equivalent concept identifiers
Statements  /statements                Query statements by concept id
            /statements/{statementId}  Details about statements

The workflow captured by the KBAPI is generally as illustrated in the following diagram:

[Diagram: Knowledge Beacon Application Programming Interface]

Aside from the concept and statement accessing endpoints, the KBAPI also provides access to the list of concept (/categories) and relationship (/predicates) data types used by the beacon.

In fact, concept instances returned by various calls (/concepts?keywords=.., /concepts/{conceptId} and the subject/object concepts in knowledge assertions returned by the /statements endpoint) are specified by the API to be tagged with the semantic concept types (e.g. "gene", "drug", "disease") reported by the /categories endpoint. These types are assumed to come from a controlled vocabulary of semantic data types, originally based on the UMLS MetaMap concept categories but now compliant with the NCATS Translator-endorsed Biolink Model.

The KBAPI also provides an endpoint (/exactmatches) to report CURIE identifiers which are deemed to globally identify functionally equivalent concepts (sensu SKOS exactMatch or OWL sameAs).

Knowledge Beacons in Action!

The pool of known active and proposed beacons is enumerated in a master YAML-formatted catalog of beacons. The significant currently active ones are as follows:

Other beacon wrappers may be hosted in other repositories elsewhere (see the catalog of beacons).

Sample Usage of the API

A concept by keywords search on the Semantic Medline Database Knowledge Beacon Endpoint:

https://kba.ncats.io/beacon/semmeddb/concepts?keywords=hyperhomocysteinemia

gives the following result:

[
  {
    "categories": [
      "disease or phenotypic feature"
    ],
    "description": null,
    "id": "UMLS:C0598608",
    "name": "Hyperhomocysteinemia"
  }
]
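
The request above can also be assembled and its response parsed programmatically. The following is a minimal sketch, assuming the beacon base URL shown above; `concepts_url` and `concept_ids` are hypothetical helper names (not part of any published client), and the sample body is copied from the result above rather than fetched over the network.

```python
import json
from urllib.parse import urlencode

# Base URL of the SemMedDB beacon, taken from the example above.
BEACON = "https://kba.ncats.io/beacon/semmeddb"


def concepts_url(keywords, base=BEACON):
    """Build a /concepts keyword-search URL (hypothetical helper)."""
    return f"{base}/concepts?{urlencode({'keywords': keywords})}"


def concept_ids(response_text):
    """Extract the CURIE of every concept in a /concepts JSON response."""
    return [concept["id"] for concept in json.loads(response_text)]


# Response body copied from the example above (no network call is made here).
SAMPLE = """[{"categories": ["disease or phenotypic feature"],
              "description": null,
              "id": "UMLS:C0598608",
              "name": "Hyperhomocysteinemia"}]"""

print(concepts_url("hyperhomocysteinemia"))
print(concept_ids(SAMPLE))
```

The extracted identifier is what a client would carry forward into a /statements query.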

The concept id (with additional constraints) may then be used to search for knowledge statements that answer questions like "what could treat Hyperhomocysteinemia?"

https://kba.ncats.io/beacon/semmeddb/statements?s_keywords=vitamin&edge_label=treats&t=UMLS%3AC0598608&offset=1&size=5

gives the following result:

[
  {
    "id": "UMLS:C0301532:treats:UMLS:C0598608",
    "object": {
      "categories": [
        "disease or phenotypic feature"
      ],
      "id": "UMLS:C0598608",
      "name": "Hyperhomocysteinemia"
    },
    "predicate": {
      "edge_label": "treats",
      "negated": false,
      "relation": "semmeddb:treats"
    },
    "subject": {
      "categories": [
        "chemical substance"
      ],
      "id": "UMLS:C0301532",
      "name": "Multivitamin preparation"
    }
  },
  {
    "id": "UMLS:C0087162:treats:UMLS:C0598608",
    "object": {
      "categories": [
        "disease or phenotypic feature"
      ],
      "id": "UMLS:C0598608",
      "name": "Hyperhomocysteinemia"
    },
    "predicate": {
      "edge_label": "treats",
      "negated": false,
      "relation": "semmeddb:treats"
    },
    "subject": {
      "categories": [
        "chemical substance"
      ],
      "id": "UMLS:C0087162",
      "name": "Vitamin B6"
    }
  },
  {
    "id": "UMLS:C0042890:treats:UMLS:C0598608",
    "object": {
      "categories": [
        "disease or phenotypic feature"
      ],
      "id": "UMLS:C0598608",
      "name": "Hyperhomocysteinemia"
    },
    "predicate": {
      "edge_label": "treats",
      "negated": false,
      "relation": "semmeddb:treats"
    },
    "subject": {
      "categories": [
        "chemical substance"
      ],
      "id": "UMLS:C0042890",
      "name": "Vitamins"
    }
  },
  {
    "id": "UMLS:C0042849:treats:UMLS:C0598608",
    "object": {
      "categories": [
        "disease or phenotypic feature"
      ],
      "id": "UMLS:C0598608",
      "name": "Hyperhomocysteinemia"
    },
    "predicate": {
      "edge_label": "treats",
      "negated": false,
      "relation": "semmeddb:treats"
    },
    "subject": {
      "categories": [
        "chemical substance"
      ],
      "id": "UMLS:C0042849",
      "name": "Vitamin B Complex"
    }
  },
  {
    "id": "UMLS:C0042845:treats:UMLS:C0598608",
    "object": {
      "categories": [
        "disease or phenotypic feature"
      ],
      "id": "UMLS:C0598608",
      "name": "Hyperhomocysteinemia"
    },
    "predicate": {
      "edge_label": "treats",
      "negated": false,
      "relation": "semmeddb:treats"
    },
    "subject": {
      "categories": [
        "chemical substance"
      ],
      "id": "UMLS:C0042845",
      "name": "Vitamin B 12"
    }
  }
]

suggesting that some B vitamins can help treat Hyperhomocysteinemia.
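
A client might build the statements query and read the answer along these lines. This is only a sketch: `statements_url` and `asserted_subjects` are hypothetical helpers, the parameter names are the ones visible in the URL above, and the two sample statements are hand-written in the shape of the response rather than real beacon output.

```python
from urllib.parse import urlencode

BEACON = "https://kba.ncats.io/beacon/semmeddb"  # base URL from the example above


def statements_url(params, base=BEACON):
    """Build a /statements query URL from a dict of query parameters."""
    return f"{base}/statements?{urlencode(params)}"


def asserted_subjects(statements):
    """Names of subjects in statements whose predicate is not negated."""
    return [
        s["subject"]["name"]
        for s in statements
        if not s["predicate"].get("negated", False)
    ]


url = statements_url({
    "s_keywords": "vitamin",
    "edge_label": "treats",
    "t": "UMLS:C0598608",
    "offset": 1,
    "size": 5,
})
print(url)

# Hypothetical statements in the response shape, one of them negated.
sample = [
    {"subject": {"name": "Substance A"},
     "predicate": {"edge_label": "treats", "negated": False}},
    {"subject": {"name": "Substance B"},
     "predicate": {"edge_label": "treats", "negated": True}},
]
print(asserted_subjects(sample))
```

Checking the `negated` flag matters because a negated "treats" statement asserts the opposite of what its edge label alone suggests.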

Knowledge Beacon Aggregator

REST clients may also access aggregate data obtained from a pool of Knowledge Beacons through an instance of the Knowledge Beacon Aggregator, a public version of which is hosted online at https://kba.ncats.io.

Beacon Validation

A Knowledge Beacon Validator was developed to check Beacon function.

Knowledge Beacon Clients

A basic command line client (and associated Python client access library) was developed for simple access to the Knowledge Beacon Aggregator. The client calls of the API are documented in relative detail here.

translator-knowledge-beacon's Issues

KEGG beacon?

Has biothings explorer ingested enough of KEGG? If not, then should we create a KEGG beacon?

Source and target filters

@RichardBruskiewich @cmungall Now that the source filters are not required, should the source and target filters be functionally equivalent? Or, should the target filters only apply if the source filters apply?

Alternatively we could get rid of them entirely, and replace them with subject and object filters. That makes more sense to me anyway, since the predicates are almost always directed. With the kmap one could easily figure out whether they should be filtering on the subject or object, anyway. The source and target filters are much more difficult to implement than subject and object filters would be.

Fill out evidence for pubmed articles

I just came across this endpoint: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=22733540&retmode=json from https://www.ncbi.nlm.nih.gov/pmc/tools/get-metadata/, which gives information such as author names, publication title, journal name, volume, issue, pages, first (lead) author, last (senior) author, reference count, language, and exact matches (I see they have: pii, doi, pmc, mid, rid, eid, pmcid).

Currently in the beacons we're only displaying date, evidence type, id, name, and uri. It seems like we're missing out on information that could be relevant to weighting edges.

@cmungall @RichardBruskiewich should we expand the evidence properties in the beacons to include all of this?

Search specification: favour exact keyword matches.

When searching for concepts, exact matches are mixed in with matches that contain the search term, including matches with the search term in the middle of a word. This can cause the search results to have low relevance. One example is that when searching for "liver," the 1st result is "Drug Delivery Systems," the 4th result is "Liver dysfunction," and "Liver" doesn't appear until the 3rd page.

We need to be more clear about what beacons are supposed to do when they execute searches. The order could work like this:

  1. Results whose names exactly match the search phrase come first, e.g. if you type in "diabetes mellitus" and there is a concept with the name "diabetes mellitus", then that concept would come before "diabetes" or "gestational diabetes mellitus".
  2. Results whose names exactly match a term in the search phrase come second. So if you search "liver" then "Liver dysfunction" should come before "Drug Delivery Systems".
  3. Then, results whose description or definition exactly match a term in the search phrase come third.
  4. Then, finally, any other hit can follow. It's acceptable that "Drug Delivery Systems" would appear for "liver", as long as it comes after anything relevant to the liver.

Also all keyword matches should be caseless. "diabetes mellitus" should match "Diabetes Mellitus".
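
The proposed ordering could be sketched as a tiering function. This is one possible reading of the four rules, not a specification: "exactly match a term" is interpreted here as a whole-word match, "any other hit" as a substring match, and `rank_tier` and `rank` are hypothetical names.

```python
import re


def _words(text):
    """Case-folded word tokens of a string (empty set for None)."""
    return set(re.findall(r"\w+", (text or "").casefold()))


def rank_tier(concept, phrase):
    """Return 1-4 for the four tiers above, or None if there is no hit."""
    name = concept["name"].casefold()
    description = (concept.get("description") or "").casefold()
    query = phrase.casefold().strip()
    terms = _words(query)
    if name == query:
        return 1  # name is the whole search phrase
    if terms & _words(name):
        return 2  # a whole word of the name matches a search term
    if terms & _words(description):
        return 3  # a whole word of the description matches a search term
    if any(term in name or term in description for term in terms):
        return 4  # substring hit, e.g. "liver" inside "Delivery"
    return None


def rank(concepts, phrase):
    """Drop non-hits and sort by tier, keeping input order within a tier."""
    scored = [(rank_tier(c, phrase), c) for c in concepts]
    hits = [(tier, c) for tier, c in scored if tier is not None]
    return [c for tier, c in sorted(hits, key=lambda pair: pair[0])]


concepts = [
    {"name": "Drug Delivery Systems", "description": None},
    {"name": "Liver dysfunction", "description": None},
    {"name": "Hepatitis", "description": "Inflammation of the liver."},
    {"name": "Liver", "description": None},
]
print([c["name"] for c in rank(concepts, "liver")])
```

Using `casefold()` throughout gives the caseless matching requested above, and the stable sort preserves whatever order the knowledge source returned within each tier.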

Add a /predicates API call to return list of predicates used by the beacon and their definitions?

It would actually be helpful in beacon client design to have access to the full list of predicates being used in statements from a beacon, plus a precise definition of their interpretation, including directionality of application (e.g. subject => predicate => object direction). At some point, one may even suggest that a global ontology of predicates be established, with a central catalog of predicates to encourage reuse (and perhaps, global CURIEs).

This proposed API call obviously should also be proxied and pooled through to the beacon-aggregator

Attributes for predicates

Can the Knowledge Beacon API be updated so that predicates can have tag-value attributes (similar to what concepts have)?

Random access pagination or iteration?

@cmungall @RichardBruskiewich @vdancik Must we support random access pagination (getting a page of a given offset and size)? Would it be troubling to support only iteration (getting the next page of a given size)?

There may not be a bijective mapping from the knowledge source's records to the statements we want to extract. Sometimes I might infer a single statement from multiple records, or multiple statements from a single record. And sometimes I'm not able to apply filters when getting records from the knowledge source, and I must throw away records that don't match the filters. This is easy if we don't need to support random access, and pretty challenging otherwise. With NDEx we've been caching all results and then returning pages from the cache, but that seems like a pretty impractical solution.
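
To illustrate why iteration is the easy case, here is a small sketch using Python generators. All names (`iter_statements`, `pages`, `extract`, `keep`) and the toy records are hypothetical; the point is only that pages can be produced lazily without knowing in advance how many statements each record yields.

```python
from itertools import islice


def iter_statements(records, extract, keep):
    """Lazily turn source records into filtered statements.

    `extract` maps one record to zero or more statements and `keep` is the
    filter predicate; both are hypothetical callables supplied by the beacon.
    """
    for record in records:
        for statement in extract(record):
            if keep(statement):
                yield statement


def pages(statements, size):
    """Yield successive pages of at most `size` statements (iteration only)."""
    iterator = iter(statements)
    while True:
        page = list(islice(iterator, size))
        if not page:
            return
        yield page


def extract(record):
    """Toy extractor: a record (label, n) expands into n statements."""
    label, count = record
    return [f"{label}{i}" for i in range(count)]


def keep(statement):
    """Toy filter applied after extraction."""
    return statement != "c1"


records = [("a", 2), ("b", 0), ("c", 3)]
for page in pages(iter_statements(records, extract, keep), size=2):
    print(page)
```

Random access to page N, by contrast, would require replaying or caching everything before it, because the number of statements per record is only known after extraction and filtering.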

Add introspection aggregate queries to API

We would like a client to be able to ask a beacon what info it contains and how much

@micheldummontier will work on a PR to do this. We won't necessarily merge this, as it seems that the swagger is generated from the code.

Path variable truncation causes problems for CURIEs that use period characters

By default, Spring's @RequestMapping truncates everything after the last . in a @PathVariable. For example, "complex.filename.html" is passed in as "complex.filename". This is a problem for API calls of the form /path/{value}.

For example, for the request /exactmatches/HGNC.SYMBOL%3ADMBT1, Spring passes "HGNC" into the conceptId variable, as if you had requested /exactmatches/HGNC.

This issue affects the beacon aggregator as well as the translator knowledge beacon. I haven't checked the reference beacon.

Namespaces endpoint

This endpoint would be like the kmap for exact matches. This is an example of what kind of data it would return:

[
   {
      "clique_mappings":[
         {
            "prefix":"OMIM",
            "uri":"http://purl.obolibrary.org/obo/OMIM_"
         },
         {
            "prefix":"Orphanet",
            "uri":"http://www.orpha.net/ORDO/Orphanet_"
         },
         {
            "prefix":"UniProtKB",
            "uri":"http://identifiers.org/uniprot/"
         }
      ],
      "frequency":905,
      "local_prefix":"NCBIGene",
      "uri":"http://www.ncbi.nlm.nih.gov/gene/"
   },
   {
      "clique_mappings":[
         {
            "prefix":"WBPhenotype",
            "uri":"http://purl.obolibrary.org/obo/WBPhenotype_"
         },
         {
            "prefix":"UPHENO",
            "uri":"http://purl.obolibrary.org/obo/UPHENO_"
         }
      ],
      "frequency":2286,
      "local_prefix":"HP",
      "uri":"http://purl.obolibrary.org/obo/HP_"
   }
]

This would essentially say that the beacon has concepts identified by NCBIGene and HP curies, and that it can produce exact matches between OMIM, Orphanet, UniProtKB, and NCBIGene, and between HP, UPHENO, and WBPhenotype.

This will be useful for clients (e.g., the beacon aggregator) to be able to know whether or not they should query a beacon when trying to build up concept cliques. It would also document the case of the curie prefixes (e.g., "NCBIGene" vs "NCBIGENE" vs. "ncbigene") that the beacon uses.
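
A client-side check along these lines might look as follows. This is a sketch against the proposed response shape only; `known_prefixes` and `should_query` are hypothetical helpers, the data is abridged from the example above with URIs omitted, and the second test CURIE is made up.

```python
def known_prefixes(namespaces):
    """Map each case-folded prefix a beacon knows to the spelling it uses."""
    prefixes = {}
    for entry in namespaces:
        prefixes[entry["local_prefix"].casefold()] = entry["local_prefix"]
        for mapping in entry["clique_mappings"]:
            prefixes[mapping["prefix"].casefold()] = mapping["prefix"]
    return prefixes


def should_query(namespaces, curie):
    """True if the CURIE's prefix appears in the beacon's namespaces data."""
    prefix = curie.split(":", 1)[0]
    return prefix.casefold() in known_prefixes(namespaces)


# Abridged from the example response above (URIs omitted).
namespaces = [
    {"local_prefix": "NCBIGene", "frequency": 905,
     "clique_mappings": [{"prefix": "OMIM"}, {"prefix": "Orphanet"},
                         {"prefix": "UniProtKB"}]},
    {"local_prefix": "HP", "frequency": 2286,
     "clique_mappings": [{"prefix": "WBPhenotype"}, {"prefix": "UPHENO"}]},
]

print(should_query(namespaces, "omim:126650"))   # prefix listed as a clique mapping
print(should_query(namespaces, "MESH:D000001"))  # made-up CURIE, prefix not listed
print(known_prefixes(namespaces)["ncbigene"])    # spelling used by this beacon
```

The lookup table doubles as a way to rewrite a CURIE into the prefix case the beacon expects before sending the query.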

wrap disgenet

I recommend this is done via the sparql endpoint. I can provide the sparql required

No links from gene to disease (SLC26A3)

On this gene:
image

There are no links to diseases

these are available through the monarch instance of biolink:

https://api.monarchinitiative.org/api/bioentity/gene/OMIM%3A126650/diseases/?fetch_objects=true&rows=100

The only query results seem to be wikidata and semmeddb, even though I have others checked:

image

The logs don't help much - I click to see these and it takes me to https://kba.ncats.io/errorlog?sessionId=s3Wg7ENqMudJy75fyJbA which is a json file with [] in it

proposal: replace exactMatches service with call to statements API

set a predicate parameter to reflect something like 'owl:sameAs' to retrieve mappings.

this would reduce the number of services in the API and make it clear that we would like to see evidence and provenance for these relationships - just like any other relationship returned by a service.

Beacon API incremental update 1.1.1

A few possible loose ends in the 1.1.0 release:

  • /kmap should return the 'negated' parameter for the predicate value of a mapping
  • there may be utility in extending the metadata endpoints to include the following:
      1. A Beacon "/light" endpoint which simply serves as a low overhead 'ping' to check whether a beacon is online
      2. A Beacon "/description" endpoint which returns a significant amount of the descriptive documentation which is currently annotated in the Beacon list file

Proposal for API version 1.3.0

Beacon API proposals:

  1. Change the evidence date to a json object {"year" : 2015, "month" : 4, "day" : 23} rather than a string. If we do this then there is no possibility of different beacons formatting dates differently, and applications can get this information without having to parse date strings.
  2. Add some sort of qualification flag/status for statements that have been inferred through either deduction or heuristic. What is the best way to display this?
  3. Optional sub-graph ID? Some knowledge sources (like NDEx) are really a collection of independent knowledge graphs. In that case we might want to be able to choose which of those sub-graphs to query.
  4. Replace the statement source and target filters with subject and object filters. See #61, #60
  5. Set appropriate minimum and default values for size and offset. It makes the logic easier if we don't have to handle null values, and it prevents an accidental dump of the whole knowledge source. We may wish to set a max size as well, something like 10,000?
  6. Metadata endpoints should report the total number of nodes (concepts) and edges (statements)
  7. Concept and statement details endpoints should take a list of identifiers and give you a list of detail entities.
  8. Bring back synonyms on concepts endpoint. This way we can display gene symbols as well as protein names.
  9. Replace/complement random access pagination with a next page token. NDEx, for example, only supports pagination over networks and not over the nodes and edges in those networks. The NDEx beacon's next page token could represent: {network=5, offset=62, size=800}. Thus allowing each beacon to implement pagination with as many parameters as it needs. #59. Along with the next page token we can return, whenever possible, the total number of records for that query.
  10. Remove fields from metadata endpoints: /categories remove uri, local_id, local_uri, add local_category. /predicates remove id, uri, local_id, local_uri, local_relation.
  11. Remove separate details endpoints. Instead have the response fields be configurable. User can pass in a list of fields, and those fields will show up in the response.
  12. Add an endpoint that returns metadata about the beacon (rather than the knowledge graph), like its name, its github page, a wiki or jupyter notebook explaining how to use it (maybe all beacons can share one), who to contact about it, and a link to the knowledge source it wraps.
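
As an illustration of the next page token in item 9, a beacon could pack its own paging state into an opaque string. The encoding below (JSON, then URL-safe base64) is an assumption chosen for the sketch, not something the proposal prescribes, and `encode_token` and `decode_token` are hypothetical names.

```python
import base64
import json


def encode_token(state):
    """Serialize beacon-specific paging state into an opaque URL-safe token."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token):
    """Recover the paging state from a token produced by encode_token."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


# Paging state from the NDEx example in item 9.
state = {"network": 5, "offset": 62, "size": 800}
token = encode_token(state)
print(token)
print(decode_token(token))
```

Because clients only echo the token back, each beacon remains free to change what it stores inside without any change to the API.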

Aggregator API proposals:

  1. Add publication metadata to evidence, including the publication's title, abstract, authors, journal, volume, issue, page numbers, page count, reference count, and language. See #56.
  2. Use the publication metadata to implement a statement score, and display that score in main statements endpoint. An initial scoring mechanism could be something like this:
score = 0
for line in abstract.split('.'):
    score += line.count(subject_name) * line.count(predicate_name) * line.count(object_name)
score = sigmoid(score)
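
A runnable version of the scoring idea above might look like this. It is a sketch that follows the pseudocode literally; `sigmoid` is assumed to be the standard logistic function, `statement_score` is a hypothetical name, and the abstract text is made up for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def statement_score(abstract, subject_name, predicate_name, object_name):
    """Score a statement by co-occurrence of its three parts within sentences."""
    score = 0
    for line in abstract.split("."):
        score += (
            line.count(subject_name)
            * line.count(predicate_name)
            * line.count(object_name)
        )
    return sigmoid(score)


# Made-up abstract text, for illustration only.
abstract = "Vitamin B6 treats Hyperhomocysteinemia. Vitamin B6 is a vitamin."
print(statement_score(abstract, "Vitamin B6", "treats", "Hyperhomocysteinemia"))
print(statement_score(abstract, "Vitamin B6", "treats", "Anemia"))
```

Since the raw count is never negative, this score ranges from 0.5 (no sentence mentions all three parts) up towards 1. Note also that `str.count` is case sensitive and that splitting on "." is only a rough sentence boundary.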

January 2018 Hackathon review and feedback on further iterations of the Knowledge Beacon API

Time to review the good, the bad and the ugly of the overall objectives, big picture design and implementation priorities of the Knowledge Beacon API. After 9 months of working with the concept and the specific initial implementation, what would we do differently now? What is missing (e.g. provenance)? What is less useful or awkward to use? Is there a better way to specify it? Is the data representation optimal (e.g. REST/JSON) or would an RDF/LOD standard work better? What specific standards should be enforced (e.g. data type semantics? relation predicate semantics?) Are there any practical operational issues (e.g. standards for dealing with problematic queries? Exception/error reporting)?

Complete lighting of key knowledge beacons by Oct 2017

See https://github.com/NCATS-Tangerine/translator-knowledge-beacon/blob/develop/api/knowledge-beacon-list.yaml for master list

Need more transparent management of taxon identities

The taxon (species) that a particular concept (especially a gene concept) pertains to is an important discriminator of concepts, but it is not explicitly returned by the Knowledge Beacon API in any endpoint, except perhaps as ancillary data in the /concepts/ call which returns concept details. Also, the /exactmatches API calls do not discriminate well between equivalent concepts that are taxonomically distinct (e.g. orthologous gene loci from different species). Capturing and returning the taxonomic identity (e.g. NCBI taxonomy id?) for concepts in the various endpoints would help fix this issue.

Track data sources

Should probably have some parameter in the API response to indicate which service provided which piece of data. (e.g. service: http://stars.renci.org somewhere in the response).

This becomes important for clients when aggregating and displaying data coming in from many places.

Maybe just part of the evidence API ?

Data versioning

Some kbeacons may like to support the ability to access a particular version of the data. Obviously not all beacons can support this.

The date should be returned in payloads, and perhaps also be accepted as an optional parameter (HTTP header) in queries?

Knowledge Beacon 1.2.0 release proposal (discussion)

Team discussions with @vdancik, @cmungall and this writer suggest that a modest iteration on the Knowledge Beacon API is desirable. Here is the table of proposed changes (alternatives):

Id  Proposal / Use Case
1)  Proposal: Add (optional) offset input parameter to all size parameter constrained endpoints
    Use Case: Allow sequential 'cursor' driven batching of returned items
2)  Proposal: Make the s parameter optional in the /statements endpoint; replace keywords parameter with s_keywords and t_keywords; replace categories parameter with s_categories and t_categories parameters
    Use Case: Generalizes the classical s anchored statement query to potentially allow retrieval of statements matching a (subject category, predicate, object category) pattern.
3)  Proposal: Make the keywords parameter in /concepts optional
    Use Case: Simple concept retrieval based on categories only (batch processing can be constrained by size and offset)

Add "exact match" mode to /concepts keyword search

To some extent, the /concepts keyword search is Google-like in that matches to any of the keywords are returned. It is generally assumed that "high quality" results (more keywords matched) are returned first, with "lower quality" results further down the list (need to check whether this expectation is well documented in the API).

However, it may be the case that users are only interested in "exact matches" to all keywords. Adding an "exactMatch" or "matchAll" flag to the input parameters as an expectation, could be helpful to make the keyword search more precise for some use cases.

Q: Do we need both: "exactMatch" might mean "exactly verbatim as written (keywords in order and including whitespace)" versus "matchAll" which might imply that the concept name must include (i.e. "match") all the keywords specified?
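
The two candidate semantics can be contrasted in a few lines. This is a sketch of one interpretation: both functions are hypothetical, both ignore case (an assumption), and `match_all` treats keywords as whitespace-separated whole words.

```python
def exact_match(name, keywords):
    """True if the name is the keyword phrase verbatim, ignoring case."""
    return name.casefold() == keywords.casefold()


def match_all(name, keywords):
    """True if every keyword occurs as a word in the name, in any order."""
    name_words = set(name.casefold().split())
    return all(word in name_words for word in keywords.casefold().split())


print(exact_match("Diabetes Mellitus", "diabetes mellitus"))
print(match_all("Gestational Diabetes Mellitus", "mellitus diabetes"))
print(exact_match("Gestational Diabetes Mellitus", "diabetes mellitus"))
```

Under this reading every exact match is also a match-all hit, but not the other way round, which suggests the two flags are distinct enough to justify both.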

Add a path to return a knowledge map

with @stevencox @balhoff @jmcmurry

For smartapi and beacons we will define a way a knowledge source can advertise what types and identifiers are related by what relations/predicates.

This would be a list such as the following

-
  subject:
    semantic_type: gene
    prefixes:
       - NCBIGene
       - ENSEMBL
       - HGNC
       - MGI
  predicate:
    id: RO:nnn
    label: has phenotype
  object:
    semantic_type: phenotype
    prefixes:
       - MP
       - HP
  count: 500 # optional
  description: >
     blah blah
-
  ...      
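
A client could use such a list to decide which knowledge sources can answer a given kind of question. The sketch below assumes the YAML has already been parsed into Python lists and dicts; `find_mappings` is a hypothetical helper, and the entry is the one shown above with its placeholder predicate id kept as-is.

```python
def find_mappings(kmap, subject_type=None, object_type=None, predicate_label=None):
    """Return knowledge-map entries matching every criterion that is given."""
    def matches(entry):
        return (
            (subject_type is None
             or entry["subject"]["semantic_type"] == subject_type)
            and (object_type is None
                 or entry["object"]["semantic_type"] == object_type)
            and (predicate_label is None
                 or entry["predicate"]["label"] == predicate_label)
        )
    return [entry for entry in kmap if matches(entry)]


# The YAML entry above, as the structure a YAML parser would produce.
kmap = [
    {
        "subject": {"semantic_type": "gene",
                    "prefixes": ["NCBIGene", "ENSEMBL", "HGNC", "MGI"]},
        "predicate": {"id": "RO:nnn", "label": "has phenotype"},
        "object": {"semantic_type": "phenotype", "prefixes": ["MP", "HP"]},
        "count": 500,
    },
]

hits = find_mappings(kmap, subject_type="gene", object_type="phenotype")
print([h["predicate"]["label"] for h in hits])
print(find_mappings(kmap, subject_type="disease"))
```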

Array collection format: multi, pipes, csv

We're currently using the multi format for arrays, which looks like:
/api/statements?c=X&c=Y&c=Z

But this raises a problem. Most people want to use Python Flask to write their APIs, but the Swagger Codegen application generates APIs that use Connexion, which does not support the multi collection format.

Connexion only supports the csv and pipes collection format, which looks like:
/api/statements?c=X,Y,Z
/api/statements?c=X|Y|Z

I have been trying to develop Connexion to support the multi collection format, and if I accomplish this then the issue is resolved. But otherwise, we might want to use a different collection format so that developers can use Swagger to generate Python Flask project stubs.
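
For comparison, the three collection formats can be produced and parsed with the Python standard library alone. This sketch is independent of Connexion or Swagger Codegen; it only shows what each format looks like on the wire for the parameter `c` used above.

```python
from urllib.parse import parse_qs, urlencode

values = ["X", "Y", "Z"]

# multi: the parameter name is repeated once per value.
multi_query = urlencode({"c": values}, doseq=True)

# csv and pipes: one parameter whose value joins the items with a delimiter.
# `safe` keeps the delimiter from being percent-encoded.
csv_query = urlencode({"c": ",".join(values)}, safe=",")
pipes_query = urlencode({"c": "|".join(values)}, safe="|")

print(multi_query)
print(csv_query)
print(pipes_query)

# Server side: multi parses straight into a list, csv and pipes need a split.
print(parse_qs(multi_query)["c"])
print(parse_qs(csv_query)["c"][0].split(","))
```

One trade-off to keep in mind is that csv and pipes cannot carry values that themselves contain the delimiter unless it is escaped, whereas multi has no such restriction.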
