ncats-tangerine / translator-api-registry

This repo hosts the API metadata for the Translator project

Python 1.27% Jupyter Notebook 98.73%

ncats-translator

translator-api-registry

This repo hosts the API metadata for the Translator project

How to add your API

First, create a separate folder to host your API's metadata. The "_example_api" folder provides a basic template, so you can start by copying "_example_api" and renaming the copy to your API name.
Second, fill in the metadata about your API according to the instructions. Please also refer to existing examples such as the "mygene.info" and "myvariant.info" APIs. See the next section for more details.
Third, add an entry to the API_LIST.yml file, following the existing examples. This is the master list of the APIs available in this repo. Our SmartAPI application imports all the API metadata based on this file.

If you have permission, commit your changes to this repo. Otherwise, feel free to submit a pull request. Please check the "build status" badge above and make sure it is green after your changes. We run validation tests from "tests.py" on each commit. (Tip: you can run python tests.py locally from the root of this repo to make sure all tests pass before you commit.)
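
The first step above (copying the template folder) can be sketched in Python. This is a minimal sketch, not part of the repo's tooling: the function name `scaffold_api_folder` and the folder name `my_api` are hypothetical, and it only assumes that "_example_api" sits at the repo root as described.

```python
import shutil
from pathlib import Path


def scaffold_api_folder(repo_root, api_name, template="_example_api"):
    """Copy the template folder to a new folder named after the API.

    Hypothetical helper: refuses to overwrite an existing folder.
    """
    root = Path(repo_root)
    source = root / template
    target = root / api_name
    if not source.is_dir():
        raise FileNotFoundError(f"template folder not found: {source}")
    if target.exists():
        raise FileExistsError(f"folder already exists: {target}")
    shutil.copytree(source, target)
    return target
```

For example, `scaffold_api_folder(".", "my_api")` run from the repo root would create "my_api" as a copy of the template. You would then edit the copied metadata, add the entry to API_LIST.yml, and run python tests.py.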

Specific notes for adding a Reasoner API

In addition to following the steps above, we recommend adding this extra information to your Reasoner API metadata:

info:
    x-reasoner_standard_version: 0.9

tags:
    - name: translator
    - name: reasoner
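
To check these recommended fields before committing, something like the following could be used. It is a sketch under assumptions: the metadata has already been parsed into a Python dict (for example with a YAML parser), `tags` follows the OpenAPI v3 form of a list of objects with a `name` key, and the function name `missing_reasoner_fields` is hypothetical.

```python
def missing_reasoner_fields(metadata):
    """Return the recommended Reasoner fields that are absent from parsed metadata."""
    problems = []
    info = metadata.get("info") or {}
    if "x-reasoner_standard_version" not in info:
        problems.append("info.x-reasoner_standard_version")
    tags = metadata.get("tags") or []
    # Collect tag names, ignoring anything that is not a tag object.
    tag_names = {t.get("name") for t in tags if isinstance(t, dict)}
    for required in ("translator", "reasoner"):
        if required not in tag_names:
            problems.append(f"tags: {required}")
    return problems


example = {
    "info": {"x-reasoner_standard_version": 0.9},
    "tags": [{"name": "translator"}, {"name": "reasoner"}],
}
print(missing_reasoner_fields(example))  # []
```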

How to create your OpenAPI v3 metadata

Starting from scratch

You can use this editor to write/edit your API metadata. You can start with an existing metadata example from "mygene.info" or "myvariant.info" APIs. The editor automatically validates your API metadata and gives a live preview of auto-generated API documentation.

This OpenAPI GUI interface can also be useful for creating your API metadata from scratch. Be aware, however, that this interface does not support the SmartAPI extensions (fields with the "x-" prefix) that we added to the standard OpenAPI v3 specification. You can, of course, add the extra SmartAPI fields in the editor after you export your metadata from the GUI.

Converting from Swagger/OpenAPI v2 metadata

If you already have an API metadata document in the older Swagger/OpenAPI v2 format, you can try one of these conversion tools to convert it to the latest OpenAPI v3 format, and then edit it in the editor:

https://mermade.org.uk/openapi-converter

http://openapiconverter.azurewebsites.net/

These converters are not perfect, but they are still a good starting point.
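
To tell whether a document still needs converting, you can look at its top-level version key: Swagger v2 documents carry a `swagger` field, while OpenAPI v3 documents carry an `openapi` field. A minimal sketch, assuming the document is already parsed into a dict (the function name `spec_family` is hypothetical):

```python
def spec_family(document):
    """Classify a parsed metadata document by its top-level version key."""
    if str(document.get("openapi", "")).startswith("3."):
        return "openapi-v3"
    # str() also covers a YAML parser that read an unquoted 2.0 as a float.
    if str(document.get("swagger", "")).startswith("2."):
        return "swagger-v2"
    return "unknown"


print(spec_family({"swagger": "2.0"}))    # swagger-v2
print(spec_family({"openapi": "3.0.0"}))  # openapi-v3
```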

Tip: Feel free to experiment with your API metadata file using the tools mentioned above, and commit your changes even when they are not fully complete or valid. As long as the metadata entry has not been added to the API_LIST.yml file (see below), you will be fine :-). When you are happy with your metadata, move to the next step and add it to the API_LIST.yml file.

A code snippet to convert a flask-restful auto-generated Swagger v2 specification to SmartAPI metadata was kindly provided by @JohnCEarls.

API_LIST.yml file

This is a YAML file at the root of this repo to keep track of all APIs available in this repo. Our SmartAPI application will import all the API metadata based on this file and render an API registry web frontend.

For each API, you just need to add a text block like this:

- metadata: mygene.info/openapi_minimum.yml
  translator:
      - returnjson: true
        notes: ""

metadata field

The value of this field should be either the URL or the relative path pointing to the API metadata. The API metadata should follow OpenAPI specifications, in either JSON or YAML format. Specifically, we support OpenAPI v3 specification documented here, plus the SmartAPI extensions documented here.

translator field

This serves as a placeholder for any Translator-project-specific API properties, e.g. API-specific notes.
- How to propose a new translator.* field?
  
  As we expand our list of APIs, we will need to expand our metadata fields as needed. To propose a new field, you can:
  - discuss it with us in our Slack channel (#arch-working-group)
  - open an issue in this repo
  - submit a pull-request for your modified API_LIST.yml file
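
As an illustration of the "metadata" rule above (a URL or a relative path), here is a minimal sketch that classifies the value of one parsed API_LIST.yml entry. The function name `check_api_list_entry` is hypothetical, and rejecting absolute paths is an assumption about what "relative path" means here; the actual checks in tests.py may differ.

```python
from urllib.parse import urlparse


def check_api_list_entry(entry):
    """Classify the 'metadata' value of one parsed API_LIST.yml entry.

    Returns "url" or "relative_path"; raises ValueError for a malformed entry.
    """
    metadata = entry.get("metadata")
    if not isinstance(metadata, str) or not metadata.strip():
        raise ValueError("entry needs a non-empty 'metadata' string")
    parsed = urlparse(metadata)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    if metadata.startswith("/"):
        # Assumption: paths are meant to be relative, not absolute.
        raise ValueError("'metadata' path should be relative")
    return "relative_path"


entry = {
    "metadata": "mygene.info/openapi_minimum.yml",
    "translator": [{"returnjson": True, "notes": ""}],
}
print(check_api_list_entry(entry))  # relative_path
```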

CORS support

If you want users to be able to call your API from the browser, e.g. in a web application, your API should support CORS. We recommend that every Translator API support CORS. Depending on your web server (e.g. Apache or Nginx) and/or the web framework you use (e.g. Django, Flask, Tornado), you can find the relevant instructions for enabling CORS here, or via Google.
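
To show what "supporting CORS" amounts to at the HTTP level, here is a minimal, framework-independent sketch: a WSGI middleware that adds CORS response headers and answers preflight OPTIONS requests. The names `add_cors` and `demo_app` are hypothetical, and the permissive "*" origin is an assumption suited to a public, read-only API; in practice, prefer the CORS mechanism documented for your own server or framework.

```python
import json


def add_cors(app, allow_origin="*"):
    """Wrap a WSGI app so that every response carries basic CORS headers."""

    def wrapped(environ, start_response):
        def cors_start_response(status, headers, exc_info=None):
            headers = list(headers) + [
                ("Access-Control-Allow-Origin", allow_origin),
                ("Access-Control-Allow-Methods", "GET, POST, OPTIONS"),
                ("Access-Control-Allow-Headers", "Content-Type"),
            ]
            return start_response(status, headers, exc_info)

        if environ.get("REQUEST_METHOD") == "OPTIONS":
            # Answer the browser's preflight request without calling the app.
            cors_start_response("204 No Content", [("Content-Length", "0")])
            return [b""]
        return app(environ, cors_start_response)

    return wrapped


def demo_app(environ, start_response):
    """Tiny stand-in API that returns a fixed JSON body."""
    body = json.dumps({"ok": True}).encode("utf-8")
    start_response(
        "200 OK",
        [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
    )
    return [body]
```

The wrapped app can be served with the standard library for a quick local check, e.g. `wsgiref.simple_server.make_server("", 8000, add_cors(demo_app)).serve_forever()`.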

How to pick URIs for annotating input parameters or the response data object?

Typically, for a JSON-based REST API, we use URIs to annotate both the acceptable parameter value types and the fields of the response data object, in both the OpenAPI metadata files and the JSON-LD context files. You can find some examples for the "mygene.info" and "myvariant.info" APIs.

To help you decide which URIs to use, we maintain an "ID_MAPPING.csv" file that records all the URIs we use. Feel free to add URIs for additional field types. Please make sure not to break the CSV format, as that will break GitHub's nice CSV rendering and search features.
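
One quick way to catch a broken CSV before committing is to verify that every row has the same number of columns as the header. The sketch below assumes nothing about the real columns of ID_MAPPING.csv; the sample text and the function name `rows_with_wrong_width` are made up for illustration.

```python
import csv
import io


def rows_with_wrong_width(csv_text):
    """Return source line numbers of rows whose column count differs from the header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return []
    bad = []
    for row in reader:
        if len(row) != len(header):
            # line_num is the line on which the current row ended.
            bad.append(reader.line_num)
    return bad


sample = "field,uri\ngene,http://identifiers.org/ncbigene/\nbroken,row,extra\n"
print(rows_with_wrong_width(sample))  # [3]
```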

In general, we prefer URIs from these repositories (in this priority order):

Identifiers.org
purl.uniprot.org (?)
[please add]
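
As a sketch of how a chosen URI prefix turns a local identifier into a full URI, consider the following. The prefix table is illustrative only (the authoritative list is ID_MAPPING.csv), and the function name `to_uri` is hypothetical.

```python
# Illustrative prefixes only; consult ID_MAPPING.csv for the agreed URIs.
URI_PREFIXES = {
    "ncbigene": "http://identifiers.org/ncbigene/",
    "uniprot": "http://identifiers.org/uniprot/",
}


def to_uri(prefix, local_id):
    """Build a full URI from a registered prefix and a local identifier."""
    try:
        base = URI_PREFIXES[prefix]
    except KeyError:
        raise KeyError(f"no URI prefix registered for {prefix!r}") from None
    return base + str(local_id)


print(to_uri("ncbigene", 1017))  # http://identifiers.org/ncbigene/1017
```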

Know a knowledge source useful for Translator, but no API available?

You can add a knowledge source (or dataset) to this DATASET_LIST.yml file. Follow the instructions and the existing entries there.

Translator team members monitor this list and can potentially build an API to serve that particular knowledge source, so that it can be better integrated with the rest of the Translator API ecosystem.

translator-api-registry's Issues

Update README, other files

README is outdated. Other files may need updates as well, see list below

.gitignore
.travis.yml
API_LIST.yml
DATASET_LIST.yml
ID_MAPPING.csv
README.md
tests.py

Tweaks to PFOCR's API

https://github.com/NCATS-Tangerine/translator-api-registry/blob/master/pfocr/smartapi.yaml#L758

I plan to swap in pfocrURL to replace figureURL as the ref_url in the registry (once we've updated the source data at BTE).

I would also like to include a new field to continue to pass along the figureURL for potential UI usage, but I'm not sure which field key to use...

add registry item for Harmonizome

Harmonizome has precomputed 139800 gene-disease perturbation associations based on GEO. Would be very useful to add these to the Translator API network.

Example API call to query by gene symbol (retrieving data sets in which it is differentially expressed): http://amp.pharm.mssm.edu/Harmonizome/api/1.0/gene/nanog?showAssociations=true

Example API call to query by data set (retrieving all genes that are differentially expressed): http://amp.pharm.mssm.edu/Harmonizome/api/1.0/gene_set/V/Allen+Brain+Atlas+Adult+Human+Brain+Tissue+Gene+Expression+Profiles
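
The gene query above can be built programmatically. This sketch only constructs the URL shown in the first example and does not call the service; the function name `gene_query_url` is hypothetical.

```python
from urllib.parse import quote, urlencode

HARMONIZOME_BASE = "http://amp.pharm.mssm.edu/Harmonizome/api/1.0"


def gene_query_url(symbol, show_associations=True):
    """Build the Harmonizome gene endpoint URL for a gene symbol."""
    path = f"{HARMONIZOME_BASE}/gene/{quote(symbol, safe='')}"
    if show_associations:
        return f"{path}?{urlencode({'showAssociations': 'true'})}"
    return path


print(gene_query_url("nanog"))
# http://amp.pharm.mssm.edu/Harmonizome/api/1.0/gene/nanog?showAssociations=true
```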

Update registry for exposures_api - change to cmaq_exposures_api

Update the GT_exposures_api to reflect current cmaq-exposures-api. Change name to GT_cmaq_exposures_api

Identifier Management Process

I have APIs with a few datatypes where I'm not clear on the best identifiers to use.

How should I go about finding identifiers for these?

Data Types:

Text: We have a service that takes two strings, treats them as English MeSH terms and returns articles relating the terms.
Scores: We have a related service that takes two English terms and returns a score reflecting the "distance" between the words according to a machine learning model. Is there an ontology reflecting concepts like the score?
Location: We have geo codes expressed as latitude and longitude.
Time: We have time. For automated service invocation, probably we want time format specific identifiers.

Provide link to web UI for editing openapi v3

Requirements for smartAPI 'semantic type' extensions

Following up on hackathon discussions about extending the smartAPI spec to capture the 'semantic type' of entities described by data. One thing I would like these extensions/refinements to support is derivation of a simple 'Translator API Catalog' that contains human-readable, summary-level descriptions of API content and accessibility (see #8).

An informal schema for this catalog is here, and describes the elements we would ideally like to extract or derive from the smartAPI files. About half of these elements map directly to existing smartAPI element/fields, so are already achievable (apiName, accessURL, description, license, termsOfService, accessRestrictions).

The semantic typing extensions would ideally support derivation of two additional elements in the catalog schema:

entityTypes: array of terms describing the entity types that the data returned by the API is about
entityAssociations: an array of hyphen-separated pairs of entity types describing associations that can be obtained from the API (and optionally a relationship type or entity role)

Other elements in the catalog schema that we would like to derive include:

But if these are not suited for inclusion in the smartAPI spec, they could be implemented elsewhere - e.g. added under the "translator" element in the API_LIST.yml file, where other Translator-specific metadata currently lives.

Wanted: pharmacogenomic data source

@MarkDWilliams will provide details. This is a request from the chemistry team.

Pathfinding approaches to ML-Guided knowledge exploration

Use an integrated neo4j database to explore how human and machine learning agents might collaborate to extract evidence from knowledge graphs to derive predictions and mechanistic hypotheses.

Tasks

Load a neo4j instance with diverse data types from Monarch and SemMed DB databases.
Define and optimize cypher 'pathfinding' query templates.
Apply templates toward answering selected CQs - start with 'positive control' queries that look for paths through the graph providing evidence supporting a known fact/mechanism (e.g. ALDH2 as a known modifier of FA, or cyclodextrin as a successful re-purposing for Niemann-Pick disease).
Manually explore query results by evaluating types of paths returned, defining rules/approaches to identify most meaningful evidence, and refining queries to hone in on these paths in the data.
Explore machine learning approaches to automate this process, and derive evidence-based predictions from data in knowledge graphs.
Explore approaches/interfaces for human intervention in this process - i.e. how to present underlying rationale for automated predictions in a way that allows human users to evaluate the evidence, refine and extend queries based on this, and inform new experiments and analyses.

Goals

Understand data and modeling requirements for this type of approach
Inform architectural requirements for BB and reasoner applications - particularly w.r.t automated/machine learning methods that can help weight evidence and make predictions, and interfaces for human intervention in refining and extending ML results.
Provide end-to-end examples of what open-ended, ML-guided exploration and discovery in the Translator might look like in practice.

Valued Expertise

Monarch/SemMedDB data
Cypher query language and graph-based algorithms (e.g. for pathfinding, traversals, edge-weighting)
Visualization of graph data and paths
Machine learning approaches

Tests are failing on master

Something seems to have gone awry with a recent push:
66676ee
Apparently the GTEx API is missing the tags field. It also appears to be a JSON file with a .yaml extension, so I'm not sure quite how that's supposed to be handled.

To avoid this in the future, an owner of the repo could protect the master branch and enforce that tests must be passing before anything can be merged in. @newgene, do you have the power to do that?
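
On the JSON-with-a-.yaml-extension point: JSON is, with rare exceptions, also valid YAML, so a YAML parser will usually still load such a file. If we want to flag the mismatch anyway, a check along these lines might do (a sketch; the function name `is_json_document` is hypothetical):

```python
import json


def is_json_document(text):
    """Return True if the text parses as a JSON object or array."""
    try:
        parsed = json.loads(text)
    except ValueError:
        return False
    # A bare scalar such as "123" is valid JSON but not a metadata document.
    return isinstance(parsed, (dict, list))


print(is_json_document('{"openapi": "3.0.0"}'))  # True
print(is_json_document("openapi: 3.0.0"))        # False
```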

Ensure we have github usernames in contact fields for all apis

This is an example: drug to disease ingest wanted

Please enter information where you see ellipses ... below. The template is a guide; follow it where you can.

Data Source

Optional. Name of the preferred source for this data. E.g. OMIM, DrugBank

DATABASE: DrugBank

Main Entity Type

What kind of entity are you interested in? Biolink type preferred, e.g. variant, disease, gene

SUBJECT: drug

Connected Entity Type

What kind of thing should the subject be connected to? E.g. for drug-disease links this would be 'disease'. Biolink type preferred

OBJECT: disease
RELATION: treats

Example statement

Enter one or more examples of statements/edges you would expect to retrieve using this source. For example, "aldehydes exacerbates Fanconi anemia"

EXAMPLE 1: imatinib treats cancer
EXAMPLE 2: ...

Usage

How will this be used? Give a URL of a workflow if possible

USAGE: ...

Any other information:

Enter any information you like here

add record for iPTMnet API

Data Source

Optional. Name of the preferred source for this data. E.g. OMIM, DrugBank

iPTMnet -- more info at https://research.bioinformatics.udel.edu/iptmnet/

Main Entity Type

What kind of entity are you interested in? Biolink type preferred, e.g. variant, disease, gene

SUBJECT: Protein

Connected Entity Type

What kind of thing should the subject be connected to? E.g. for drug-disease links this would be 'disease'. Biolink type preferred

OBJECT: Protein
RELATION: entity regulated by entity / entity regulates entity

Example statement

Enter one or more examples of statements/edges you would expect to retrieve using this source. For example, "aldehydes exacerbates Fanconi anemia". In future this could be used to drive integration tests

EXAMPLE 1: CDK2 is phosphorylated by CDK7

From https://research.bioinformatics.udel.edu/iptmnet/api/P24941/substrate, this snippet asserts that CDK7 phosphorylates CDK2 with various bits of provenance

{
  "residue": "T",
  "site": "T165",
  "ptm_type": "Phosphorylation",
  "score": 3,
  "sources": [
    {
      "name": "RLIMS-P",
      "label": "rlimsp",
      "url": "http://research.bioinformatics.udel.edu/rlimsp/"
    },
    {
      "name": "PSP",
      "label": "psp",
      "url": "http://www.phosphosite.org/"
    }
  ],
  "enzymes": [
    {
      "id": "P50613",
      "enz_type": "uniprot_ac",
      "name": "CDK7"
    }
  ],
  "pmids": [
    "18396144"
  ]
}
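
To make the mapping from this record to the example statement explicit, here is a minimal sketch that turns one record from the /substrate endpoint into statement tuples. The substrate name is supplied by the caller because the record itself does not carry it; the function name `substrate_record_to_statements` and the tuple layout are assumptions, not an agreed Translator format.

```python
def substrate_record_to_statements(substrate, record):
    """Turn one /substrate record into (enzyme, ptm_type, substrate, site, pmids) tuples."""
    statements = []
    for enzyme in record.get("enzymes", []):
        statements.append(
            (
                enzyme["name"],
                record["ptm_type"],
                substrate,
                record["site"],
                tuple(record.get("pmids", [])),
            )
        )
    return statements


# Fields trimmed from the snippet above.
record = {
    "site": "T165",
    "ptm_type": "Phosphorylation",
    "enzymes": [{"id": "P50613", "enz_type": "uniprot_ac", "name": "CDK7"}],
    "pmids": ["18396144"],
}
print(substrate_record_to_statements("CDK2", record))
# [('CDK7', 'Phosphorylation', 'CDK2', 'T165', ('18396144',))]
```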

EXAMPLE 2: CDK2 phosphorylates MCM4

From https://research.bioinformatics.udel.edu/iptmnet/api/P24941/as-enzyme, this snippet asserts that CDK2 phosphorylates MCM4 with various bits of provenance

{
  "substrate": "P33991",
  "substrate_symbol": "MCM4",
  "site": "S3",
  "score": 0,
  "sources": [
    {
      "name": "HPRD",
      "label": "hprd",
      "url": "http://www.hprd.org/"
    }
  ],
  "pmids": [
    "19651622",
    "16519687",
    "20068231"
  ]
}

Preferred format or ingest method

How would you prefer to get the data? Via API or Data dump that can be ingested into your KG? Would you prefer a smart API registry entry, a neo4j dump, biolink-compliant CSV/RDF/JSON that can be loaded with KGX?

METHOD: SmartAPI annotation -- note existing Swagger documentation at https://research.bioinformatics.udel.edu/iptmnet/api/doc/

Add entry for Broad PGM API

Remove/move old or unused registrations?

Delete or move some registrations to a "deprecated" branch.

Some registrations/folders are from previous phases of Translator and may not be needed.

registered/previously-updated by Su/Wu lab:

automat-covid-phenotypes
broad-pgm
chembio
civic/jsonld_context
clinical_risk_factor
cohd
cord
ebi_ontology_lookup_service_api
genetics_provider
harmonize
hetio
hmdb
openfda/jsonld_context
pharos
reactome/jsonld_context
scibite
scigraph

Duplicate files?
MolePro vs molecular_data_provider

From other teams/previous phases:

CTD_api
FooDB_api
GTEx_api
GT_cmaq_exposures_api
GT_endotypes_api
GT_exposures_api
_example_api
api_catalog
bicluster_api
depmap
depmap_bicluster_api
disease ontology api
ebi_ontology_mapping_service_api
ebi_proteins
ebi_proteins_taxonomy
ensembl
gene_knockout_correlation
greent
hpotomondo_bicluster_api
indigo
rnaseqdb_bicluster_api
robokop
robokop_extend
robokop_messenger
rtx

Add entry for gnomad graphql

http://gnomad-api.broadinstitute.org/

Note that we don't have a strategy for graphql yet...

Add link to swagger v2->v3 converter to README

drug drug interaction data wanted (EXAMPLE ONLY)

Please enter information where you see ellipses ... below. The template is a guide; follow it where you can.

Data Source

Optional. Name of the preferred source for this data. E.g. OMIM, DrugBank

DATABASE: PharmGKB

Main Entity Type

What kind of entity are you interested in? Biolink type preferred, e.g. variant, disease, gene

SUBJECT: drug

Connected Entity Type

What kind of thing should the subject be connected to? E.g. for drug-disease links this would be 'disease'. Biolink type preferred

OBJECT: drug
RELATION: interacts_with

Example statement

EXAMPLE 1: imatinib interacts_with ??
EXAMPLE 2: ...

Preferred format or ingest method

METHOD: ...

Usage

How will this be used? Give a URL of a workflow if possible

USAGE: ...

Any other information:

Enter any information you like here

Create metadata for a Translator API Catalog

At present there is no comprehensive and up-to-date catalog of Translator APIs that provides a high-level description of the content and accessibility of the data served by each. This has been a barrier for many Translator efforts.

We are asking an API developer or representative to complete a short metadata record describing their API. The schema and templates provided below should allow contributors to enter metadata in 15 minutes or less, and we aim to have complete records for all translator APIs by the end of the Jan 2018 Hackathon.

The metadata in this catalog is intended to complement the more granular metadata collected in the smartAPI registry, to provide summary-level information about the content, accessibility, and utility of each API for technical and non-technical users.

Collection of this metadata will be coordinated with Translator smartAPI registry efforts. Specifically, we will extend the YAML format used by the existing API_LIST.yml file with additional fields, as defined in the schema.yaml file.

Instructions for creating a metadata record:

Go to the API list at http://bit.ly/apicatalog and add your name to the sign up sheet.
Review the schema.yml file, and the example_metadata.yml completed record.
Copy text from either the example_metadata.yml record or this template.yml file, and overwrite it in a new file with metadata about your API.
Name your metadata file [api name]_metadata.yml and commit it to the api_catalog/records folder in the translator-api-registry repo.

Notes:

If a particular field does not apply to your API, enter 'null' as the value.
Value set enumerations for fields with controlled entry are provided at the end of the schema.yml file.
Feel free to extend the 'Entity Type' value set (ENUM3) directly in the schema file and commit it back with these changes. However, all other value sets are not directly extensible - if you want to alter these value sets please make a ticket proposing the change/extension.

I will hold 'office hour' sessions for at least one hour each day at the hackathon to answer questions and take feedback about the metadata schema and process. Additionally, questions or feedback can be recorded as comments on this ticket.

Once completed, I will merge the content of each API's metadata file into the API_LIST.yml file. From here we can automate derivation of various artifacts (e.g. a spreadsheet view for easier viewing/filtering).

extend EBI OLS registry item

We currently have an endpoint for going from a string to a DOID. Extend it to use the DOID to get parents, children, and xrefs. Do this initially for diseases; in theory it could be expanded to other types.

(based on discussion with @stuppie and how disease ontology endpoint does this in a somewhat non-intuitive way.)
