biothings / biothings_explorer
TRAPI service for BioThings Explorer
Home Page: https://api.bte.ncats.io
License: Apache License 2.0
Currently, no tests are implemented for the /v1/team/{team_name}/query endpoint. We need to implement tests to ensure it works correctly.
BTE is not correctly interpreting mygene.info output on GO annotations because it is ignoring the qualifiers. I believe the fix involves a modification of the mygene.info SmartAPI record (and hopefully TRAPI has a way of expressing qualifiers). Example below...
I issued this query to get BiologicalProcesses related to the gene VAMP2:
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"id": "NCBIGENE:6844",
"category":"biolink:Gene"
},
"n1": {
"category": "biolink:BiologicalProcess"
}
},
"edges": {
"e01": {
"subject": "n0",
"object": "n1"
}
}
}
}
}
The following edge linking VAMP2 to neutrophil degranulation (GO:0043312) is returned in the output:
"NCBIGENE:6844-GO:0043312-MyGene.info API-NCBI Gene": {
"predicate": "biolink:participates_in",
"subject": "NCBIGENE:6844",
"object": "GO:0043312",
"attributes": [
{
"name": "provided_by",
"value": "NCBI Gene",
"type": "biolink:provided_by"
},
{
"name": "api",
"value": "MyGene.info API",
"type": "bts:api"
},
{
"name": "evidence",
"value": "IMP",
"type": "bts:evidence"
},
{
"name": "publications",
"value": [
"PMID:16677249"
],
"type": "biolink:publications"
}
]
},
The original content from http://mygene.info/v3/gene/6844?fields=go looks like this:
{
"evidence": "IMP",
"gocategory": "BP",
"id": "GO:0043312",
"pubmed": 16677249,
"qualifier": "NOT",
"term": "neutrophil degranulation"
},
Critically, the NOT qualifier in the mygene.info record is not being shown in the TRAPI BTE output, which completely reverses the interpretation.
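One possible way to carry the qualifier through is sketched below. This is only an illustration, not BTE's actual code: the helper names, the attribute name "qualifier", and the type "bts:qualifier" are assumptions modeled on the existing "evidence" attribute, and whether mygene.info returns the qualifier as a string or a list is also assumed to vary.

```javascript
// Hypothetical helpers (not part of BTE): read the qualifier field of a
// mygene.info GO annotation and expose it as an edge attribute.

// Accepts a string ("NOT"), a pipe-joined string ("NOT|contributes_to"),
// or an array, and returns a flat list of qualifier tokens.
function extractQualifiers(annotation) {
  return []
    .concat(annotation.qualifier || [])
    .flatMap((q) => String(q).split("|"))
    .filter((q) => q.length > 0);
}

// True when the annotation is negated and must not be read as a positive edge.
function isNegated(annotation) {
  return extractQualifiers(annotation).includes("NOT");
}

// Builds an attribute in the same shape as the other edge attributes.
// The name and type used here are assumptions, not confirmed TRAPI fields.
function qualifierAttribute(annotation) {
  const qualifiers = extractQualifiers(annotation);
  if (qualifiers.length === 0) return null;
  return { name: "qualifier", value: qualifiers, type: "bts:qualifier" };
}

const record = {
  evidence: "IMP",
  gocategory: "BP",
  id: "GO:0043312",
  pubmed: 16677249,
  qualifier: "NOT",
  term: "neutrophil degranulation",
};
console.log(isNegated(record), qualifierAttribute(record));
```

A consumer could then either drop negated annotations or keep them with the qualifier attached, depending on what TRAPI allows.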
Each individual SmartAPI TRAPI interface should enable ID resolution by default.
If the SmartAPI is from a text mining team, the ID resolution module should be disabled.
The package needs to be separate from the current TRAPI code repo.
It should perform:
Currently, BTE uses BioThings APIs to resolve identifiers, which requires a category (e.g. Gene, ChemicalSubstance) to be specified.
The TRAPI standard, however, allows users to submit a query without category info.
To support that, we should include NodeNormalizer as a fallback.
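The fallback logic could look roughly like the sketch below. Both resolver functions are hypothetical and injected by the caller, so no real BioThings or NodeNormalizer call signatures are assumed here.

```javascript
// Minimal sketch of category-aware ID resolution with a fallback.
// resolveByCategory and resolveByNormalizer are hypothetical async functions
// supplied by the caller; they stand in for the BioThings-based resolver and
// a NodeNormalizer client respectively.
async function resolveIds(curies, category, resolvers) {
  if (category) {
    // Category known: use the existing category-based resolver.
    return resolvers.resolveByCategory(curies, category);
  }
  // No category in the query node: fall back to the normalizer.
  return resolvers.resolveByNormalizer(curies);
}

// Usage with stub resolvers that only record which path was taken.
const stubResolvers = {
  resolveByCategory: async (curies, category) => ({ via: "category", category, curies }),
  resolveByNormalizer: async (curies) => ({ via: "normalizer", curies }),
};

resolveIds(["NCBIGENE:6844"], undefined, stubResolvers).then((r) => console.log(r.via));
```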
The issue at NCATSTranslator/testing#10 reports that BTE does not return any results for the following query:
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"id": "UniProtKB:P52788",
"category":"biolink:Gene"
},
"n1": {
"category": "biolink:ChemicalSubstance"
}
},
"edges": {
"e01": {
"subject": "n0",
"object": "n1"
}
}
}
}
}
If I convert UniProtKB:P52788 to NCBIGENE:6611 (based on http://mygene.info/v3/query?q=P52788&fields=entrezgene,uniprot), the query returns many results as expected. I tried adjusting the category for n0 to biolink:Protein and biolink:GenomicEntity, but those queries also return zero results. What is the proper way to form a BTE TRAPI query for a UniProtKB CURIE?
The current /query endpoint uses a static copy of the SmartAPI specs bundled with the smartapi-kg nodejs package.
It should instead query the SmartAPI API for specs dynamically at run time.
The TRAPI service needs to access and modify the ./log folder. We need to set the service's UID and GID to match the UID and GID of the ./log folder on our server.
{
"message": {
"query_graph": {
"nodes": {
"n00": {
"id": "MONDO:0002715",
"category": "biolink:Disease"
},
"n01": {
"category": "biolink:ChemicalSubstance"
},
"n02": {
"category": "biolink:Gene"
}
},
"edges": {
"e00": {
"predicate": "biolink:correlated_with",
"subject": "n00",
"object": "n01"
},
"e01": {
"predicate": "biolink:related_to",
"subject": "n01",
"object": "n02"
}
}
}
}
}
Error:
{
"error": "TypeError: Cannot convert undefined or null to object"
}
See details in this issue: NCATSTranslator/testing#12
According to TRAPI, predicate may be given either as a list or as a string.
However, the current BTE implementation does not support the list form.
The following query fails:
{
"message": {
"query_graph": {
"edges": {
"e00": {
"object": "n01",
"subject": "n00",
"predicate": ["biolink:physically_interacts_with"]
}
},
"nodes": {
"n00": {
"category": "biolink:ChemicalSubstance",
"id": "DRUGBANK:DB00188"
},
"n01": {
"category": "biolink:Gene"
}
}
}
}
}
The error message is:
{
"error": "TypeError: this.predicate.startsWith is not a function"
}
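The error suggests that a string method is being called on an array. A minimal sketch of one way to accept both forms is to normalize the value to an array before any string handling; the function names below are hypothetical and not taken from the BTE code base.

```javascript
// Hypothetical helper: accept a TRAPI predicate given as a string, a list of
// strings, or nothing at all, and always return an array.
function normalizePredicates(predicate) {
  if (predicate === undefined || predicate === null) return [];
  return Array.isArray(predicate) ? predicate : [predicate];
}

// Hypothetical helper: drop the "biolink:" prefix from a single predicate.
// Calling startsWith here is safe because the input is always a string.
function stripBiolinkPrefix(predicate) {
  const prefix = "biolink:";
  return predicate.startsWith(prefix) ? predicate.slice(prefix.length) : predicate;
}

const fromList = normalizePredicates(["biolink:physically_interacts_with"]).map(stripBiolinkPrefix);
const fromString = normalizePredicates("biolink:treats").map(stripBiolinkPrefix);
console.log(fromList, fromString);
```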
Current logging only shows how a TRAPI query is parsed and how SmartAPI kg is used. It should include additional information such as:
The above needs support from other BTE-related nodejs packages.
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"id": "WIKIPATHWAYS:Pathway:WP195",
"category": "biolink:Pathway"
},
"n1": {
"category": "biolink:Gene"
},
"n2": {
"category": "biolink:ChemicalSubstance"
}
},
"edges": {
"e01": {
"subject": "n0",
"object": "n1"
},
"e02": {
"subject": "n1",
"object": "n2"
}
}
}
}
}
slice
The /query endpoint should have the same implementation as /v1/query.
We should probably use a regex when specifying the route, e.g. (v1)?/query
See expressjs routing mechanism: https://expressjs.com/en/guide/routing.html
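As a rough illustration, a single regular expression can cover both paths. Express accepts a RegExp object as a route path, so the pattern could be passed to the router directly; the handler name in the comment is hypothetical, and the snippet only exercises the pattern itself so that it runs without Express installed.

```javascript
// One pattern for both /query and /v1/query (optional trailing slash).
const QUERY_ROUTE = /^\/(v1\/)?query\/?$/;

// With Express this would be registered roughly as:
//   app.post(QUERY_ROUTE, handleQuery);   // handleQuery is a hypothetical handler

function matchesQueryRoute(path) {
  return QUERY_ROUTE.test(path);
}

for (const path of ["/query", "/v1/query", "/v2/query", "/v1/team/x/query"]) {
  console.log(path, matchesQueryRoute(path));
}
```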
Caching helps speed up the nodejs app when multiple queries ask for the same edge.
Use redis docker image for easier deployment.
Use .env to store redis url/password info.
Summary: I think BTE is making an error in setting up the API request for LINCS data portal API. We are required to provide the input ID as a curie, so I set it as a ChemicalSubstance with the id "LINCS:LSM-1023" (which is imatinib). The logs show that the LINCS API query is then (see the bold for the error):
{
"timestamp": "2021-03-24T04:11:46.587Z",
"level": "DEBUG",
"message": "call-apis: Succesfully made the following query: {\"url\":\"http://lincsportal.ccs.miami.edu/dcic/api/drugindication\",\**"params\":{\"id\":\"LINCS:LSM-1023\"}**,\"method\":\"get\",\"timeout\":50000}",
"code": null
},
Looking at the smartapi page for LINCS data portal, the id field should not have a prefix...it should only have the id "LSM-1023".
The situation: I tried to query the LINCS data portal API through BTE's /v1/smartapi/{smartapi_id}/query endpoint.
The smartapi_id is 9ee398a738916a98b612068cc022454f, the request body is:
{
"message": {
"query_graph": {
"edges": {
"e00": {
"object": "n01",
"subject": "n00"
}
},
"nodes": {
"n00": {
"category": "biolink:ChemicalSubstance",
"id": "LINCS:LSM-1023"
},
"n01": {
"category": "biolink:Disease"
}
}
}
}
}
It returns no hits.
However, if I query the LINCS Data portal endpoint directly with the id as "LSM-1023", I get multiple results like:
{"documents": [
{
"lsm_id":"LSM-1023",
"efo_id":"Orphanet:44890",
"efo_term":"GASTROINTESTINAL STROMAL TUMOR",
"max_fda_phase_for_ind":"4",
"mesh_heading":"GASTROINTESTINAL STROMAL TUMORS",
"mesh_id":"D046152"
}
,
{
"lsm_id":"LSM-1023",
"efo_id":"EFO:0000691",
"efo_term":"SARCOMA",
"max_fda_phase_for_ind":"3",
"mesh_heading":"SARCOMA",
"mesh_id":"D012509"
}
Note: I'm not sure if the BTE Python client has an issue with this API too, since it accepts only LINCS IDs and I'm not sure if BTE will ever end up querying it.
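A minimal sketch of the expected behavior, assuming the fix is to strip the CURIE prefix before filling in the request parameters. The helper names are hypothetical; only the parameter name id and the example identifier come from the log above.

```javascript
// Hypothetical helper: return the local part of a CURIE.
// Only the first colon is treated as the separator, so an identifier such as
// "WIKIPATHWAYS:Pathway:WP195" keeps "Pathway:WP195".
function stripCuriePrefix(curie) {
  const idx = curie.indexOf(":");
  return idx === -1 ? curie : curie.slice(idx + 1);
}

// Hypothetical helper: build the query params for an API that expects bare IDs.
function buildParams(curie) {
  return { id: stripCuriePrefix(curie) };
}

console.log(buildParams("LINCS:LSM-1023"));
```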
Current behavior in edge response:
"attributes": [
{
"name": "provided_by",
"value": "Text Mining KP",
"type": "biolink:provided_by"
},
{
"name": "api",
"value": "Text Mining Targeted Association API",
"type": "bts:api"
},
{
"name": "CHEBI",
"value": "CHEBI:32630",
"type": "bts:CHEBI"
},
{
"name": "object_spans",
"value": [
"start: 91, end: 96",
"start: 62, end: 67"
],
"type": "bts:object_spans"
},
{
"name": "relation_spans",
"value": [
"",
""
],
"type": "bts:relation_spans"
},
{
"name": "score",
"value": [
"0.9994468",
"0.97133327"
],
"type": "bts:score"
},
{
"name": "sentence",
"value": [
"Dietary restriction of leucine for at least three days could result in the inactivation of Hsf-1, leading to a reduction in Hsp70 synthesis.",
"However, in cells that were leucine starved for 3 and 4 days, Hsf-1 activity and Hsp70 synthesis level was dramatically decreased."
],
"type": "bts:sentence"
},
{
"name": "subject_spans",
"value": [
"start: 23, end: 30",
"start: 28, end: 35"
],
"type": "bts:subject_spans"
},
{
"name": "publications",
"value": [
"PMID:31397439",
"PMID:31397439"
],
"type": "biolink:publications"
}
]
Information such as CHEBI does not belong here. Needs to be removed.
{
"message": {
"query_graph": {
"nodes": {
"a": {
"category": "biolink:Disease",
"id": "MESH:D015464"
},
"b": {
"category": "biolink:ChemicalSubstance",
"id": "CHEBI:45783"
},
"c": {
"category": "biolink:Gene"
}
},
"edges": {
"ac": {
"subject": "a",
"object": "c"
},
"bc": {
"subject": "c",
"object": "b"
}
}
}
},
"knowledge_graph": {
"nodes": [],
"edges": []
},
"results": []
}
Currently edges are grouped by (subject-object-api-source)
Check whether the spread operation used to update the kg object causes a performance issue.
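For illustration, the grouping can be done by mutating one accumulator object instead of spreading it on every update, which avoids copying the whole object for each record. This is a sketch under assumptions: the record field names (subject, object, api, source) are hypothetical, and only the key format subject-object-api-source comes from the text.

```javascript
// Build the grouping key in the subject-object-api-source format.
function edgeKey(record) {
  return [record.subject, record.object, record.api, record.source].join("-");
}

// Group records by key, updating the accumulator in place.
// The spread alternative, grouped = { ...grouped, [key]: [...] }, copies every
// existing key on each iteration, so its cost grows with the size of the graph.
function groupEdges(records) {
  const grouped = {};
  for (const record of records) {
    const key = edgeKey(record);
    if (!grouped[key]) grouped[key] = [];
    grouped[key].push(record);
  }
  return grouped;
}

const grouped = groupEdges([
  { subject: "NCBIGENE:6844", object: "GO:0043312", api: "MyGene.info API", source: "NCBI Gene" },
  { subject: "NCBIGENE:6844", object: "GO:0043312", api: "MyGene.info API", source: "NCBI Gene" },
]);
console.log(Object.keys(grouped));
```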
Use scp to transfer the test results: https://github.com/appleboy/scp-action
It would be a good feature to store user requests persistently, so users can come back and look up their results using just the answer id we assign to them.
We could also hook this up with the web interface. Given an answer id, the UI can fetch results directly from mongodb and display the results as graph/table for exploration.
One ID might belong to multiple semantic types,
e.g. UMLS:C0008780 can be mapped as a Disease or a PhenotypicFeature.
So when a user provides the following query:
{
"message": {
"query_graph": {
"edges": {
"e00": {
"object": "n01",
"subject": "n00"
}
},
"nodes": {
"n00": {
"category": ["biolink:Disease", "biolink:PhenotypicFeature"],
"id": "UMLS:C0008780"
},
"n01": {
"category": "biolink:Gene"
}
}
}
}
}
We should look for Genes that are related to UMLS:C0008780 either as a Disease or as a PhenotypicFeature.
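One way to handle this, sketched below, is to expand a node with several categories into one node per category and run the lookup for each. The function name is hypothetical and the sketch does not reflect how BTE currently parses query nodes.

```javascript
// Hypothetical helper: turn a query node whose category is a string or a list
// into a list of nodes that each carry exactly one category.
function expandNodeCategories(node) {
  const categories = Array.isArray(node.category) ? node.category : [node.category];
  return categories.map((category) => ({ ...node, category }));
}

const expanded = expandNodeCategories({
  category: ["biolink:Disease", "biolink:PhenotypicFeature"],
  id: "UMLS:C0008780",
});
console.log(expanded);
```

The results of the per-category lookups would then be merged into a single answer.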
Right now, the commit URL and compare URL in the CHANGELOG are wrong. We need to fix them, as well as the .versionrc.json file that is used to generate them automatically.
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"id": "MONDO:0005132",
"category":"biolink:Disease"
},
"n1": {
"category": "biolink:ChemicalSubstance"
},
"n2": {
"id": "UMLS:C0032961",
"category":"biolink:Disease"
}
},
"edges": {
"e01": {
"subject": "n1",
"object": "n0",
"predicate":"biolink:treats"
},
"e02": {
"subject": "n1",
"object": "n2",
"predicate": "biolink:contraindicated_for"
}
}
}
}
}
{
"message": {
"query_graph": {
"edges": {
"e00": {
"subject": "n00",
"object": "n01",
"category": "biolink:correlated_with"
}
},
"nodes": {
"n00": {
"category": "biolink:ChemicalSubstance",
"id": "CAS:121999-58-4"
},
"n01": {
"category": "biolink:ChemicalSubstance"
}
}
}
}
}
According to Ryan,
This query returns
{
"error": "TypeError: Cannot read property 'slice' of undefined"
}
FAIL test/integration/TRAPIv1.test.js (97.982 s)
● Testing endpoints › POST /v1/query with clinical risk kp query

expect(received).toHaveProperty(path)

Expected path: "MONDO:0005249"
Received path: []
Received value: {}

  69 | expect(response.body.message.knowledge_graph).toHaveProperty("nodes");
  70 | expect(response.body.message.knowledge_graph).toHaveProperty("edges");
> 71 | expect(response.body.message.knowledge_graph.nodes).toHaveProperty("MONDO:0005249")
     | ^
  72 | })
  73 | })
  74 |

at __test__/integration/TRAPIv1.test.js:71:69
at Object.<anonymous> (__test__/integration/TRAPIv1.test.js:60:9)
{
"message": {
"query_graph": {
"nodes": {
"n00": {
"id": "name:Imatinib",
"category": "biolink:ChemicalSubstance"
},
"n01": {
"category": "biolink:Disease"
},
"n02": {
"category": "biolink:Gene"
}
},
"edges": {
"e00": {
"subject": "n00",
"object": "n01",
"predicate":"biolink:treats"
},
"e01": {
"subject": "n01",
"object": "n02",
"predicate":"biolink:caused_by"
}
}
}
}
}
The above query results in a 504 timeout error in the current BTE app. We need to investigate how that happens and how to set the timeout on either the express.js end or the nginx end.
Currently, the BioLink reversal class (including its file read) has to be instantiated every time predicates are processed. It should be modified to follow the Singleton design pattern, so that it is instantiated only once, which speeds the program up.
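A minimal sketch of the singleton idea follows. The class name, the accessor, and the injected loader are all hypothetical, and the two-entry mapping is only a stand-in for whatever the real class reads from file.

```javascript
// Hypothetical stand-in for the BioLink reversal class. The expensive step
// (reading and parsing a file in the real code) is represented by loadMappings.
class BiolinkReversal {
  constructor(loadMappings) {
    this.mappings = loadMappings();
  }

  reverse(predicate) {
    return this.mappings[predicate];
  }
}

let instance = null;

// Lazily create the single shared instance. The loader passed on later calls
// is ignored because the instance already exists.
function getBiolinkReversal(loadMappings) {
  if (instance === null) {
    instance = new BiolinkReversal(loadMappings);
  }
  return instance;
}

// Usage: count how often the expensive load actually runs.
let loadCount = 0;
const loader = () => {
  loadCount += 1;
  return { treats: "treated_by", treated_by: "treats" }; // illustrative mapping only
};

const first = getBiolinkReversal(loader);
const second = getBiolinkReversal(loader);
console.log(first === second, loadCount, first.reverse("treats"));
```

In a Node.js module, exporting one instance from the module achieves a similar effect, because modules are cached after the first load.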
{
"message": {
"query_graph": {
"nodes": {
"n0": {
"category": "biolink:Drug",
"id": "RXCUI:466423"
},
"n3": {
"category": "biolink:Disease"
}
},
"edges": {
"e03": {
"subject": "n0",
"object": "n3"
}
}
}
}
}
Error message:
{
"error": "TypeError: Cannot read property 'id' of undefined"
}
As shown above, the meta-kg can sometimes take up to 20s to load. This causes a serious performance issue on the BTE API end. We need to refactor the smartapi-kg package so that it can take a list of specs passed to it as a file instead of making a real-time API query.
We also need to implement a cron job on the TRAPI end to fetch SmartAPI specs periodically from the SmartAPI API.
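The periodic refresh could be sketched as a small cache that keeps serving the last good copy of the specs. Everything here is an assumption for illustration: the fetch function is injected by the caller, the names are hypothetical, and a plain timer stands in for a real cron job.

```javascript
// Hypothetical in-memory spec cache. fetchSpecs is an async function supplied
// by the caller (for example, one that downloads specs and returns an array).
function createSpecCache(fetchSpecs) {
  let specs = [];
  return {
    // Synchronous read used by request handlers; never triggers a network call.
    get() {
      return specs;
    },
    // Refresh the cached copy; on failure the previous copy is kept.
    async refresh() {
      try {
        specs = await fetchSpecs();
      } catch (err) {
        console.error("spec refresh failed, keeping previous copy:", err.message);
      }
      return specs;
    },
  };
}

// Usage with a stub fetcher. A scheduler would call refresh periodically, e.g.
//   setInterval(() => cache.refresh(), 10 * 60 * 1000);
// (left as a comment so that this example terminates).
let calls = 0;
const cache = createSpecCache(async () => {
  calls += 1;
  if (calls === 2) throw new Error("simulated outage");
  return [{ title: "spec v" + calls }];
});

cache.refresh().then((specs) => console.log(specs));
```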
On initial installation in WSL, I got the following error when executing a test query: "TypeError: Promise.allSettled is not a function"
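Promise.allSettled was added in Node.js 12.9.0, so this error most likely means the WSL environment runs an older Node.js release, and upgrading Node.js is the simplest fix. If older runtimes must be supported, a fallback along the lines of the sketch below could be used; the function name is hypothetical.

```javascript
// Use the native Promise.allSettled when available, otherwise emulate it
// with Promise.all by converting every outcome into a status object.
function allSettledCompat(promises) {
  if (typeof Promise.allSettled === "function") {
    return Promise.allSettled(promises);
  }
  return Promise.all(
    Array.from(promises, (p) =>
      Promise.resolve(p).then(
        (value) => ({ status: "fulfilled", value }),
        (reason) => ({ status: "rejected", reason })
      )
    )
  );
}

allSettledCompat([Promise.resolve(1), Promise.reject(new Error("boom"))]).then((results) => {
  console.log(results.map((r) => r.status));
});
```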
We would like to create a regression testing framework to quantitatively assess BTE's performance. As a gold standard, we can use the orphan drug indication dataset mentioned in NCATSTranslator/Relay#123 or the mechanistic paths from https://sulab.github.io/DrugMechDB/. For each of those gold standards, we should create a TRAPI query (examples), send it to BTE using a small library of plausible metapaths focused on drug repurposing, and then assess whether BTE was able to retrieve the right drug among the results. (Later we can also assess where that drug ranked among all potential drugs retrieved.) We would want to execute this test on a regular basis (weekly?), and then have a simple web page where results can be viewed/browsed.
tagging @ariutta and @AlexanderPico
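The scoring step of such a framework could be as simple as the sketch below: for each gold-standard case, check whether the expected drug appears among the returned drugs and at what position. The function names and the shape of the inputs are hypothetical, and the identifiers in the usage example are placeholders rather than real benchmark data.

```javascript
// Score a single gold-standard case against an ordered list of result IDs.
function scoreCase(expectedDrug, resultDrugs) {
  const index = resultDrugs.indexOf(expectedDrug);
  return {
    retrieved: index !== -1,
    rank: index === -1 ? null : index + 1, // 1-based rank, null if missing
  };
}

// Fraction of cases for which the expected drug was retrieved.
function recall(cases) {
  if (cases.length === 0) return 0;
  const hits = cases.filter((c) => scoreCase(c.expected, c.results).retrieved).length;
  return hits / cases.length;
}

// Placeholder data: two cases, one of which retrieves the expected drug.
const exampleCases = [
  { expected: "DRUG:A", results: ["DRUG:B", "DRUG:A", "DRUG:C"] },
  { expected: "DRUG:D", results: ["DRUG:B", "DRUG:C"] },
];
console.log(scoreCase("DRUG:A", exampleCases[0].results), recall(exampleCases));
```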