
translator-api-registry's Introduction

Build Status

translator-api-registry

This repo hosts the API metadata for the Translator project

How to add your API

  1. First, create a separate folder to host your API's metadata. The "_example_api" folder provides a basic template, so you can start by copying it and renaming the copy to your API name.
  2. Second, fill in the metadata about your API according to the instructions. Please also refer to existing examples such as the "mygene.info" and "myvariant.info" APIs. See the next section for more details.
  3. Third, add an entry to the API_LIST.yml file, following the existing examples. This is the master list of the APIs available in this repo. Our SmartAPI application imports all API metadata based on this file.

If you have permission, commit your changes to this repo. Otherwise, feel free to submit a pull request. Please check the "build status" badge above and make sure it is green after your changes. We run validation tests from "tests.py" on each commit. (Tip: you can run python tests.py locally from the root of this repo to make sure all tests pass before you commit your code.)

Specific notes for adding a Reasoner API

In addition to following the steps above, we recommend adding this extra information to your Reasoner API metadata:

info:
    x-reasoner_standard_version: 0.9

tags:
    - name: translator
    - name: reasoner
How to create your OpenAPI v3 metadata

Starting from scratch

You can use this editor to write/edit your API metadata. You can start with an existing metadata example from "mygene.info" or "myvariant.info" APIs. The editor automatically validates your API metadata and gives a live preview of auto-generated API documentation.

This OpenAPI GUI interface can also be useful for creating your API metadata from scratch. Be aware, however, that this interface does not support the SmartAPI extensions (fields with the "x-" prefix) that we added to the standard OpenAPI v3 specification. You can, of course, add the extra SmartAPI fields after exporting your metadata from the GUI interface to the editor.

Converting from a Swagger/OpenAPI v2 metadata

If you already have an API metadata document in the older Swagger/OpenAPI v2 specification, you can try one of these conversion tools to convert it to the latest OpenAPI v3 format, and then edit it in the editor:

https://mermade.org.uk/openapi-converter

http://openapiconverter.azurewebsites.net/

These converters are not perfect, but they are still a good starting point.

Tip: Feel free to play with your API metadata file using the tools mentioned above, and commit your changes even when they are not yet complete or valid. As long as the metadata entry has not been added to the API_LIST.yml file (see below), you will be fine :-). When you are happy with your metadata, move on to the next step and add it to the API_LIST.yml file.

A code snippet to convert a flask-restful auto-generated Swagger v2 specification to SmartAPI metadata, kindly provided by @JohnCEarls.

API_LIST.yml file

This is a YAML file at the root of this repo to keep track of all APIs available in this repo. Our SmartAPI application will import all the API metadata based on this file and render an API registry web frontend.

For each API, you just need to add a text block like this:

- metadata: mygene.info/openapi_minimum.yml
  translator:
      - returnjson: true
        notes: ""
  • metadata field

    The value of this field should be either the URL or the relative path pointing to the API metadata. The API metadata should follow OpenAPI specifications, in either JSON or YAML format. Specifically, we support OpenAPI v3 specification documented here, plus the SmartAPI extensions documented here.

  • translator field

    This serves as the placeholder for any translator project specific API properties, e.g. adding some API-specific notes.

    • How to propose a new translator.* field?

      As we expand our list of APIs, we will need to expand our metadata fields as needed. To propose a new field, you can:

      • discuss it with us at our slack channel (#arch-working-group)
      • open an issue in this repo
      • submit a pull-request for your modified API_LIST.yml file
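
To make the shape of an entry concrete, here is a minimal Python sketch that inspects one entry after the YAML has been parsed into a dictionary. The helper names `metadata_location_kind` and `check_entry` are hypothetical, and treating the translator field as an optional list is an assumption based only on the example block above.

```python
from urllib.parse import urlparse


def metadata_location_kind(value):
    """Classify a `metadata` value as 'url' or 'relative_path'."""
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "relative_path"


def check_entry(entry):
    """Return a list of problems found in one parsed API_LIST.yml entry."""
    problems = []
    metadata = entry.get("metadata")
    if not isinstance(metadata, str) or not metadata.strip():
        problems.append("missing 'metadata' value")
    # Assumption: 'translator' is optional, but when present it is a list.
    translator = entry.get("translator")
    if translator is not None and not isinstance(translator, list):
        problems.append("'translator' should be a list of property blocks")
    return problems


# The example entry from this section, as it would look after YAML parsing.
entry = {
    "metadata": "mygene.info/openapi_minimum.yml",
    "translator": [{"returnjson": True, "notes": ""}],
}
print(metadata_location_kind(entry["metadata"]))  # relative_path
print(check_entry(entry))  # []
```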

CORS support

If you want users to be able to call your API from the browser, e.g. in a web application, your API should support CORS. We recommend that every Translator API support CORS. Depending on your web server (e.g. Apache or Nginx) and/or web framework (e.g. Django, Flask, Tornado), you can find the relevant instructions for enabling CORS here, or via Google.
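
For readers who want to see what enabling CORS amounts to, the sketch below wraps a plain WSGI application so that each response carries an Access-Control-Allow-Origin header. It uses only the Python standard calling convention for WSGI; the names `add_cors` and `hello_app` are made up for illustration, and the wrapper covers simple requests only (it does not answer preflight OPTIONS requests). In practice your framework's or server's own CORS support is likely the better choice.

```python
def add_cors(app, allow_origin="*"):
    """Wrap a WSGI app so every response carries Access-Control-Allow-Origin."""
    def wrapped(environ, start_response):
        def cors_start_response(status, headers, exc_info=None):
            # Drop any existing value so the header is not duplicated.
            headers = [
                (key, value) for key, value in headers
                if key.lower() != "access-control-allow-origin"
            ]
            headers.append(("Access-Control-Allow-Origin", allow_origin))
            return start_response(status, headers, exc_info)
        return app(environ, cors_start_response)
    return wrapped


def hello_app(environ, start_response):
    """A stand-in API endpoint that returns a small JSON body."""
    start_response("200 OK", [("Content-Type", "application/json")])
    return [b'{"hello": "world"}']


cors_app = add_cors(hello_app)
```

The wrapped `cors_app` can be served by any WSGI server, for example `wsgiref.simple_server.make_server` during local testing.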

How to pick URIs for annotating input parameters or the response data object?

Typically, for a JSON-based REST API, we use URIs to annotate both the acceptable parameter value types and the fields of the response data object, both in OpenAPI metadata files and in JSON-LD context files. You can find some examples for the "mygene.info" and "myvariant.info" APIs.

To help you decide which URIs to use, we maintain a "ID_MAPPING.csv" file to keep records of all URIs we will use. Feel free to add URIs for additional field types. Please make sure not to break the csv format, as that will break github's nice csv rendering and search features.
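
One quick way to catch a broken CSV before committing is to confirm that every row has as many columns as the header. The sketch below does this with Python's csv module; the function name and the two-column sample data are invented for illustration and do not reflect the actual columns of ID_MAPPING.csv.

```python
import csv
import io


def rows_with_wrong_width(csv_text):
    """Return 1-based row numbers whose column count differs from the header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    width = len(rows[0])
    return [number for number, row in enumerate(rows, start=1) if len(row) != width]


# Made-up sample: the third row has one column too many.
sample = (
    "field,uri\n"
    "gene,http://example.org/gene\n"
    "variant,http://example.org/variant,extra\n"
)
print(rows_with_wrong_width(sample))  # [3]
```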

In general, we like to use the URIs from these repositories (in this priority order):

  1. Identifiers.org
  2. purl.uniprot.org (?)
  3. [please add]

Know a knowledge source useful for Translator, but no API available?

You can add a knowledge source (or datasets) to this DATASET_LIST.yml file. Follow the instruction and existing entries there.

Translator team members monitor this list and can potentially build an API to serve that particular knowledge source, so that it can be better integrated with the rest of the Translator API ecosystem.

translator-api-registry's People

Contributors

andrewsu, bill-baumgartner, briapersaud, cmungall, colleenxu, cyrus0824, deepakunni3, eddturner, edeutsch, edgargaticacu, githubbit, gloriachin, hyi, kevinxin90, lstillwe, mbrush, mnarayan1, newgene, phillipsowen, rjawesome, simonjupp, stevencox, stuppie, tokebe, vdancik


translator-api-registry's Issues

Update README, other files

README is outdated. Other files may need updates as well, see list below

.gitignore
.travis.yml
API_LIST.yml
DATASET_LIST.yml
ID_MAPPING.csv
README.md
tests.py

add registry item for Harmonizome

Harmonizome has precomputed 139800 gene-disease perturbation associations based on GEO. Would be very useful to add these to the Translator API network.

Example API call to query by gene symbol (retrieving data sets in which it is differentially expressed): http://amp.pharm.mssm.edu/Harmonizome/api/1.0/gene/nanog?showAssociations=true

Example API call to query by data set (retrieving all genes that are differentially expressed): http://amp.pharm.mssm.edu/Harmonizome/api/1.0/gene_set/V/Allen+Brain+Atlas+Adult+Human+Brain+Tissue+Gene+Expression+Profiles

Identifier Management Process

I have APIs with a few datatypes where I'm not clear on the best identifiers to use.

How should I go about finding identifiers for these?

Data Types:

  • Text: We have a service that takes two strings, treats them as English MeSH terms and returns articles relating the terms.
  • Scores: We have a related service that takes two English terms and returns a score reflecting the "distance" between the words according to a machine learning model. Is there an ontology reflecting concepts like the score?
  • Location: We have geo codes expressed as latitude and longitude.
  • Time: We have time. For automated service invocation, probably we want time format specific identifiers.

Requirements for smartAPI 'semantic type' extensions

Following up on hackathon discussions about extending the smartAPI spec to capture the 'semantic type' of entities described by data. One thing I would like these extensions/refinements to support is derivation of a simple 'Translator API Catalog' that contains human-readable, summary-level descriptions of API content and accessibility (see #8).

An informal schema for this catalog is here, and describes the elements we would ideally like to extract or derive from the smartAPI files. About half of these elements map directly to existing smartAPI element/fields, so are already achievable (apiName, accessURL, description, license, termsOfService, accessRestrictions).

The semantic typing extensions would ideally support derivation of two additional elements in the catalog schema:

  • entityTypes: array of terms describing the entity types that the data returned by the API is about
  • entityAssociations: an array of hyphen-separated pairs of entity types describing associations that can be obtained from the API (and optionally a relationship type or entity role)

Other elements in the catalog schema that we would like to derive include:

But if these are not suited for inclusion in the smartAPI spec, they could be implemented elsewhere - e.g. added under the "translator" element in the API_LIST.yml file, where other Translator-specific metadata currently lives.

Pathfinding approaches to ML-Guided knowledge exploration

Use an integrated neo4j database to explore how human and machine learning agents might collaborate to extract evidence from knowledge graphs to derive predictions and mechanistic hypotheses.

Tasks

  1. Load a neo4j instance with diverse data types from Monarch and SemMed DB databases.
  2. Define and optimize cypher 'pathfinding' query templates.
  3. Apply templates toward answering selected CQs - start with 'positive control' queries that look for paths through the graph providing evidence supporting a known fact/mechanism (e.g. ALDH2 as a known modifier of FA, or cyclodextrin as a successful re-purposing for Niemann-Pick disease).
  4. Manually explore query results by evaluating the types of paths returned, defining rules/approaches to identify the most meaningful evidence, and refining queries to home in on these paths in the data.
  5. Explore machine learning approaches to automate this process, and derive evidence-based predictions from data in knowledge graphs.
  6. Explore approaches/interfaces for human intervention in this process - i.e. how to present underlying rationale for automated predictions in a way that allows human users to evaluate the evidence, refine and extend queries based on this, and inform new experiments and analyses.

Goals

  1. Understand data and modeling requirements for this type of approach
  2. Inform architectural requirements for BB and reasoner applications - particularly w.r.t automated/machine learning methods that can help weight evidence and make predictions, and interfaces for human intervention in refining and extending ML results.
  3. Provide end-to-end examples of what open-ended, ML-guided exploration and discovery in the Translator might look like in practice.

Valued Expertise

  1. Monarch/SemMedDB data
  2. Cypher query language and graph-based algorithms (e.g. for pathfinding, traversals, edge-weighting)
  3. Visualization of graph data and paths
  4. Machine learning approaches

Tests are failing on master

Something seems to have gone awry with a recent push:
66676ee
Apparently the GTex API is missing the tags field. It also appears to be a JSON file with a .yaml extension, so I'm not sure quite how that's supposed to be handled.

To avoid this in the future, an owner of the repo could protect the master branch and enforce that tests must be passing before anything can be merged in. @newgene, do you have the power to do that?

This is an example: drug to disease ingest wanted

Please enter information where you see ellipses ... below. The template is a guide; follow it where you can.

Data Source

Optional. Name of the preferred source for this data. E.g. OMIM, DrugBank

  • DATABASE: DrugBank

Main Entity Type

What kind of entity are you interested in? Biolink type preferred, e.g. variant, disease, gene

  • SUBJECT: drug

Connected Entity Type

What kind of thing should the subject be connected to? E.g. for drug-disease links this would be 'disease'. Biolink type preferred

  • OBJECT: disease

  • RELATION: treats

Example statement

Enter one or more examples of statements/edges you would expect to retrieve using this source. For example, "aldehydes exacerbates Fanconi anemia"

  • EXAMPLE 1: imatinib treats cancer
  • EXAMPLE 2: ...

Usage

How will this be used? Give a URL of a workflow if possible

  • USAGE: ...

Any other information:

Enter any information you like here

add record for iPTMnet API

Data Source

Optional. Name of the preferred source for this data. E.g. OMIM, DrugBank

Main Entity Type

What kind of entity are you interested in? Biolink type preferred, e.g. variant, disease, gene

  • SUBJECT: Protein

Connected Entity Type

What kind of thing should the subject be connected to? E.g. for drug-disease links this would be 'disease'. Biolink type preferred

Example statement

Enter one or more examples of statements/edges you would expect to retrieve using this source. For example, "aldehydes exacerbates Fanconi anemia". In future this could be used to drive integration tests

  • EXAMPLE 1: CDK2 is phosphorylated by CDK7

From https://research.bioinformatics.udel.edu/iptmnet/api/P24941/substrate, this snippet asserts that CDK7 phosphorylates CDK2 with various bits of provenance

{
  "residue": "T",
  "site": "T165",
  "ptm_type": "Phosphorylation",
  "score": 3,
  "sources": [
    {
      "name": "RLIMS-P",
      "label": "rlimsp",
      "url": "http://research.bioinformatics.udel.edu/rlimsp/"
    },
    {
      "name": "PSP",
      "label": "psp",
      "url": "http://www.phosphosite.org/"
    }
  ],
  "enzymes": [
    {
      "id": "P50613",
      "enz_type": "uniprot_ac",
      "name": "CDK7"
    }
  ],
  "pmids": [
    "18396144"
  ]
}
  • EXAMPLE 2: CDK2 phosphorylates MCM4

From https://research.bioinformatics.udel.edu/iptmnet/api/P24941/as-enzyme, this snippet asserts that CDK2 phosphorylates MCM4 with various bits of provenance

{
  "substrate": "P33991",
  "substrate_symbol": "MCM4",
  "site": "S3",
  "score": 0,
  "sources": [
    {
      "name": "HPRD",
      "label": "hprd",
      "url": "http://www.hprd.org/"
    }
  ],
  "pmids": [
    "19651622",
    "16519687",
    "20068231"
  ]
}
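
As a loose sketch of how such records could be turned into the example statements, the Python below builds "ENZYME phosphorylates SUBSTRATE" strings from one record of the /substrate response shown in EXAMPLE 1. The function name is hypothetical, the substrate symbol is passed in by the caller because the record itself does not carry it, and the field names are taken only from the snippet above rather than from any iPTMnet documentation.

```python
def phosphorylation_statements(substrate_symbol, record):
    """Build 'ENZYME phosphorylates SUBSTRATE' strings from one /substrate record."""
    if record.get("ptm_type") != "Phosphorylation":
        return []
    return [
        "{} phosphorylates {}".format(enzyme["name"], substrate_symbol)
        for enzyme in record.get("enzymes", [])
    ]


# Trimmed copy of the EXAMPLE 1 record (queried for P24941, i.e. CDK2).
record = {
    "site": "T165",
    "ptm_type": "Phosphorylation",
    "enzymes": [{"id": "P50613", "enz_type": "uniprot_ac", "name": "CDK7"}],
    "pmids": ["18396144"],
}
print(phosphorylation_statements("CDK2", record))  # ['CDK7 phosphorylates CDK2']
```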

Preferred format or ingest method

How would you prefer to get the data? Via API or Data dump that can be ingested into your KG? Would you prefer a smart API registry entry, a neo4j dump, biolink-compliant CSV/RDF/JSON that can be loaded with KGX?

Remove/move old or unused registrations?

Delete or move some registrations to a "deprecated" branch.

Some registrations/folders are from previous phases of Translator and may not be needed.

registered/previously-updated by Su/Wu lab:

  • automat-covid-phenotypes
  • broad-pgm
  • chembio
  • civic/jsonld_context
  • clinical_risk_factor
  • cohd
  • cord
  • ebi_ontology_lookup_service_api
  • genetics_provider
  • harmonize
  • hetio
  • hmdb
  • openfda/jsonld_context
  • pharos
  • reactome/jsonld_context
  • scibite
  • scigraph

Duplicate files?
MolePro vs molecular_data_provider

From other teams/previous phases:

  • CTD_api
  • FooDB_api
  • GTEx_api
  • GT_cmaq_exposures_api
  • GT_endotypes_api
  • GT_exposures_api
  • _example_api
  • api_catalog
  • bicluster_api
  • depmap
  • depmap_bicluster_api
  • disease ontology api
  • ebi_ontology_mapping_service_api
  • ebi_proteins
  • ebi_proteins_taxonomy
  • ensembl
  • gene_knockout_correlation
  • greent
  • hpotomondo_bicluster_api
  • indigo
  • rnaseqdb_bicluster_api
  • robokop
  • robokop_extend
  • robokop_messenger
  • rtx

drug drug interaction data wanted (EXAMPLE ONLY)

Please enter information where you see ellipses ... below. The template is a guide; follow it where you can.

Data Source

Optional. Name of the preferred source for this data. E.g. OMIM, DrugBank

  • DATABASE: PharmGKB

Main Entity Type

What kind of entity are you interested in? Biolink type preferred, e.g. variant, disease, gene

  • SUBJECT: drug

Connected Entity Type

What kind of thing should the subject be connected to? E.g. for drug-disease links this would be 'disease'. Biolink type preferred

  • OBJECT: drug

  • RELATION: interacts_with

Example statement

Enter one or more examples of statements/edges you would expect to retrieve using this source. For example, "aldehydes exacerbates Fanconi anemia". In future this could be used to drive integration tests

  • EXAMPLE 1: imatinib interacts_with ??
  • EXAMPLE 2: ...

Preferred format or ingest method

How would you prefer to get the data? Via API or Data dump that can be ingested into your KG? Would you prefer a smart API registry entry, a neo4j dump, biolink-compliant CSV/RDF/JSON that can be loaded with KGX?

  • METHOD: ...

Usage

How will this be used? Give a URL of a workflow if possible

  • USAGE: ...

Any other information:

Enter any information you like here

Create metadata for a Translator API Catalog

At present there is no comprehensive and up to date catalog of Translator APIs that provides high-level description of the content and accessibility of data served by each. This has been a barrier for many Translator efforts.

We are asking an API developer or representative to complete a short metadata record describing their API. The schema and templates provided below should allow contributors to enter metadata in 15 minutes or less, and we aim to have complete records for all translator APIs by the end of the Jan 2018 Hackathon.

The metadata in this catalog is intended to complement the more granular metadata collected in the smartAPI registry, to provide summary-level information about the content, accessibility, and utility of each API for technical and non-technical users.

Collection of this metadata will be coordinated with Translator smartAPI registry efforts. Specifically, we will extend the YAML format used by the existing API_LIST.yml file with additional fields, as defined in the schema.yaml file.


Instructions for creating a metadata record:

  1. Go to the API list at http://bit.ly/apicatalog and add your name to the sign up sheet.
  2. Review the schema.yml file, and the example_metadata.yml completed record.
  3. Copy text from either the example_metadata.yml record or this template.yml file, and overwrite it in a new file with metadata about your API.
  4. Name your metadata file [api name]_metadata.yml and commit it to the api_catalog/records folder in the translator-api-registry repo.

Notes:

  1. If a particular field does not apply to your API, enter 'null' as the value.
  2. Value set enumerations for fields with controlled entry are provided at the end of the schema.yml file.
  3. Feel free to extend the 'Entity Type' value set (ENUM3) directly in the schema file and commit it back with these changes. However, all other value sets are not directly extensible - if you want to alter these value sets please make a ticket proposing the change/extension.

I will hold 'office hour' sessions for at least one hour each day at the hackathon to answer questions and take feedback about the metadata schema and process. Additionally, questions or feedback can be recorded as comments on this ticket.

Once completed, I will merge the content of each API's metadata file into the API_LIST.yml file. From here we can automate derivation of various artifacts (e.g. a spreadsheet view for easier viewing/filtering).

extend EBI OLS registry item

currently have endpoint for going from string to DOID. Extend to use DOID to get parents, children, xrefs. Do initially for diseases, but in theory could be expanded to other types.

(based on discussion with @stuppie and how disease ontology endpoint does this in a somewhat non-intuitive way.)
