biothings / biothings_explorer

TRAPI service for BioThings Explorer

Home Page: https://api.bte.ncats.io

License: Apache License 2.0

JavaScript 12.36% Dockerfile 1.55% HTML 19.65% Shell 3.00% Smarty 1.47% Vue 54.61% CSS 1.03% TypeScript 6.33%

ncats-translator biothings-explorer

biothings_explorer's People

Contributors

kevinxin90 naveen584 ariutta ericz1803 newgene smartniz mnarayan1 pahmadi8740 andrewsu

biothings_explorer's Issues

Add test /v1/team/{team_name}/query endpoint

Currently, no tests are implemented for the /v1/team/{team_name}/query endpoint. We need to implement tests to ensure it works correctly.

Add support for symmetric biolink predicate

Add GO qualifiers to mygene.info record in SmartAPI

BTE is not correctly interpreting mygene.info output on GO annotations because it is ignoring the qualifiers. I believe the fix involves a modification of the mygene.info SmartAPI record (and hopefully TRAPI has a way of expressing qualifiers). Example below...

I issued this query to get BiologicalProcesses related to the gene VAMP2:

{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "id": "NCBIGENE:6844",
          "category": "biolink:Gene"
        },
        "n1": {
          "category": "biolink:BiologicalProcess"
        }
      },
      "edges": {
        "e01": {
          "subject": "n0",
          "object": "n1"
        }
      }
    }
  }
}

The following edge linking VAMP2 to neutrophil degranulation (GO:0043312) is returned in the output:

                "NCBIGENE:6844-GO:0043312-MyGene.info API-NCBI Gene": {
                    "predicate": "biolink:participates_in",
                    "subject": "NCBIGENE:6844",
                    "object": "GO:0043312",
                    "attributes": [
                        {
                            "name": "provided_by",
                            "value": "NCBI Gene",
                            "type": "biolink:provided_by"
                        },
                        {
                            "name": "api",
                            "value": "MyGene.info API",
                            "type": "bts:api"
                        },
                        {
                            "name": "evidence",
                            "value": "IMP",
                            "type": "bts:evidence"
                        },
                        {
                            "name": "publications",
                            "value": [
                                "PMID:16677249"
                            ],
                            "type": "biolink:publications"
                        }
                    ]
                },

The original content from http://mygene.info/v3/gene/6844?fields=go looks like this:

{
   "evidence": "IMP",
   "gocategory": "BP",
   "id": "GO:0043312",
   "pubmed": 16677249,
   "qualifier": "NOT",
   "term": "neutrophil degranulation"
},

Critically, the NOT qualifier in the mygene.info record is not being shown in the TRAPI BTE output, which completely reverses the interpretation.
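A minimal sketch of one possible fix on the BTE side, assuming the record shape shown above from mygene.info. `goRecordToEdge` is a hypothetical helper; it only covers BP annotations (mapped to `biolink:participates_in`, as in the output above), and `bts:qualifier` is a hypothetical attribute type that follows the existing `bts:` convention rather than anything TRAPI 1.0 defines. The design choice is to drop NOT-qualified records entirely, since emitting them as positive edges inverts their meaning; any other qualifier is passed through as an attribute.

```typescript
// Shape of one entry in mygene.info's go.BP list (as in the example above).
interface GoAnnotation {
  evidence: string;
  gocategory: string;
  id: string;
  pubmed?: number | number[];
  qualifier?: string;
  term: string;
}

interface Attribute {
  name: string;
  value: unknown;
  type: string;
}

interface Edge {
  predicate: string;
  subject: string;
  object: string;
  attributes: Attribute[];
}

// Hypothetical converter for BP annotations. Returns null for negated
// annotations so a "NOT" record can never surface as a positive edge.
function goRecordToEdge(geneCurie: string, rec: GoAnnotation): Edge | null {
  if (rec.qualifier === "NOT") {
    return null;
  }
  // mygene.info may return a single PMID or a list; normalise to a list.
  const pmids = rec.pubmed === undefined ? [] : ([] as number[]).concat(rec.pubmed);
  const attributes: Attribute[] = [
    { name: "evidence", value: rec.evidence, type: "bts:evidence" },
    { name: "publications", value: pmids.map((p) => `PMID:${p}`), type: "biolink:publications" },
  ];
  if (rec.qualifier) {
    // Keep any other qualifier visible (hypothetical attribute type).
    attributes.push({ name: "qualifier", value: rec.qualifier, type: "bts:qualifier" });
  }
  return { predicate: "biolink:participates_in", subject: geneCurie, object: rec.id, attributes };
}
```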

Disable ID Resolution for Text Mining KPs

The TRAPI interface for an individual SmartAPI spec should enable ID resolution by default.

If the SmartAPI spec comes from a text mining team, the ID resolution module should be disabled.
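A minimal sketch of that switch. It assumes the team is read from the spec's `info.x-translator.team` list and that text mining teams carry the label `Text Mining Provider`; both the field path and the label are assumptions to check against the SmartAPI registry, and `shouldResolveIds` is a hypothetical helper.

```typescript
// Minimal view of a SmartAPI spec: only the fields this check needs.
interface SmartAPISpec {
  info: {
    title: string;
    "x-translator"?: { team?: string[] };
  };
}

// Assumed team label for text mining KPs (verify against the registry).
const TEXT_MINING_TEAM = "Text Mining Provider";

// ID resolution is on by default and switched off only for text mining specs.
function shouldResolveIds(spec: SmartAPISpec): boolean {
  const teams = spec.info["x-translator"]?.team ?? [];
  return !teams.includes(TEXT_MINING_TEAM);
}
```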

CI test should run on all branches

Need a Node.js package for handling the BioLink model

The package needs to be separate from the current TRAPI code repo.

It should be able to:

- Given a specific node type (e.g. biolink:GeneOrGeneProduct), return all descendants/ancestors of that node type.
- Given a specific node type, return all available ID prefixes defined in the BioLink model.
- Given a specific ID prefix, return all node types that can have this ID prefix.
- Given a specific predicate, return all its descendant/ancestor predicates.
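The four lookups reduce to walking a parent map and inverting a prefix table. A minimal sketch, assuming the `is_a` links and `id_prefixes` have already been loaded from the BioLink model YAML; `BiolinkHierarchy` is a hypothetical name, and the class names and prefixes in the usage lines are a small illustrative subset, not the real model. The same class works for predicates if it is given a predicate parent map instead.

```typescript
// child -> is_a parent, for classes (or, in a second instance, predicates)
type ParentMap = Record<string, string | undefined>;

class BiolinkHierarchy {
  constructor(
    private readonly parents: ParentMap,
    // class -> ID prefixes allowed for that class
    private readonly idPrefixes: Record<string, string[]> = {},
  ) {}

  // Follow is_a links upward until there is no parent.
  ancestors(name: string): string[] {
    const result: string[] = [];
    let current = this.parents[name];
    while (current !== undefined) {
      result.push(current);
      current = this.parents[current];
    }
    return result;
  }

  // Breadth-first search downward over the parent map.
  descendants(name: string): string[] {
    const result: string[] = [];
    const queue = [name];
    while (queue.length > 0) {
      const node = queue.shift() as string;
      for (const child of Object.keys(this.parents)) {
        if (this.parents[child] === node) {
          result.push(child);
          queue.push(child);
        }
      }
    }
    return result;
  }

  prefixesFor(cls: string): string[] {
    return this.idPrefixes[cls] ?? [];
  }

  classesWithPrefix(prefix: string): string[] {
    return Object.keys(this.idPrefixes).filter((cls) => this.idPrefixes[cls].includes(prefix));
  }
}

// Illustrative subset only, not the full BioLink model.
const classes = new BiolinkHierarchy(
  { BiologicalEntity: "NamedThing", Gene: "BiologicalEntity", Disease: "BiologicalEntity" },
  { Gene: ["NCBIGene", "ENSEMBL", "HGNC"], Disease: ["MONDO", "DOID"] },
);
const geneAncestors = classes.ancestors("Gene"); // ["BiologicalEntity", "NamedThing"]
const entityDescendants = classes.descendants("BiologicalEntity"); // ["Gene", "Disease"]
const mondoClasses = classes.classesWithPrefix("MONDO"); // ["Disease"]
```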

Use NodeNormalizer to resolve QNodes with only id specified

Currently, BTE uses BioThings APIs to resolve identifiers, which requires the category (e.g. Gene, ChemicalSubstance) to be specified.

The TRAPI standard does allow users to specify a query without category information.

So, in order to support that, we should include NodeNormalizer as a fallback.
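A minimal sketch of the fallback decision, assuming a Node Normalizer `get_normalized_nodes` response keyed by CURIE, where each entry (or `null` for unknown CURIEs) carries a `type` list of BioLink categories; the HTTP call itself is left out and `resolveCategories` is a hypothetical helper. A category given in the QNode always wins; only when it is missing are the normalizer's types used.

```typescript
interface QNode {
  id?: string;
  category?: string | string[];
}

// Assumed shape of one get_normalized_nodes entry.
interface NormalizedNode {
  id: { identifier: string; label?: string };
  equivalent_identifiers: { identifier: string; label?: string }[];
  type: string[];
}

type NormalizerResponse = Record<string, NormalizedNode | null>;

function resolveCategories(qNode: QNode, response: NormalizerResponse): string[] {
  // User-supplied category takes precedence (string or list, per TRAPI).
  if (qNode.category !== undefined) {
    return Array.isArray(qNode.category) ? qNode.category : [qNode.category];
  }
  if (qNode.id === undefined) {
    return []; // no ID to normalize
  }
  // Fallback: categories reported by the normalizer, if it knows the CURIE.
  const entry = response[qNode.id];
  return entry ? entry.type : [];
}
```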

how to query by UniProtKB CURIE?

The issue at NCATSTranslator/testing#10 reports that BTE does not return any results for the following query:

{
  "message": {
    "query_graph": {
      "nodes": {
        "n0": {
          "id": "UniProtKB:P52788",
          "category": "biolink:Gene"
        },
        "n1": {
          "category": "biolink:ChemicalSubstance"
        }
      },
      "edges": {
        "e01": {
          "subject": "n0",
          "object": "n1"
        }
      }
    }
  }
}

If I convert UniProtKB:P52788 to NCBIGENE:6611 (based on http://mygene.info/v3/query?q=P52788&fields=entrezgene,uniprot), the query returns many results as expected. I tried adjusting the category for n0 to biolink:Protein and biolink:GenomicEntity, but those queries also return zero results. What is the proper way to form a BTE TRAPI query for a UniProtKB CURIE?

/query endpoint should fetch SmartAPI Specs dynamically

Currently, the /query endpoint uses a static copy of SmartAPI specs from the smartapi-kg Node.js package.

It should dynamically query the SmartAPI API for specs at run time.
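Fetching the specs on every request would add latency to every query, so one option is to keep the fetched specs for a fixed time and reload them afterwards. A minimal sketch, assuming an injected async `loader` (for example, a function that calls the SmartAPI registry); `SpecCache`, `ttlMs`, and the injectable clock are hypothetical names, the clock being there only to keep the sketch testable.

```typescript
class SpecCache<T> {
  private value: T | undefined;
  private fetchedAt = 0;

  constructor(
    private readonly loader: () => Promise<T>,
    private readonly ttlMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  // Serve cached specs while fresh; otherwise fetch them again.
  async get(): Promise<T> {
    if (this.value === undefined || this.now() - this.fetchedAt >= this.ttlMs) {
      const fresh = await this.loader();
      this.value = fresh;
      this.fetchedAt = this.now();
      return fresh;
    }
    return this.value;
  }
}
```

Concurrent requests arriving while a reload is in progress may each trigger a fetch here; a production version would share the in-flight promise.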

Add user ID and group ID environment variables to the Docker Compose file

The TRAPI service needs to access and modify the ./log folder. We need to set the UID and GID to match the UID and GID of the ./log folder on our server.

Query not working

{
  "message": {
    "query_graph": {
      "nodes": {
        "n00": {
          "id": "MONDO:0002715",
          "category": "biolink:Disease"
        },
        "n01": {
          "category": "biolink:ChemicalSubstance"
        },
        "n02": {
          "category": "biolink:Gene"
        }
      },
      "edges": {
        "e00": {
          "predicate": "biolink:correlated_with",
          "subject": "n00",
          "object": "n01"
        },
        "e01": {
          "predicate": "biolink:related_to",
          "subject": "n01",
          "object": "n02"
        }
      }
    }
  }
}

Error:

{
    "error": "TypeError: Cannot convert undefined or null to object"
}

Expand Query Graph node (without curie) based on BioLink model hierarchy

See details in this issue: NCATSTranslator/testing#12

Add additional node attributes, including nodeDegree

- How many unique source KG nodes this KG node is connected from.
- How many unique target KG nodes this KG node connects to.
- How many unique edges (source-predicate-target) this KG node is connected from.
- How many unique edges (source-predicate-target) this KG node connects to.
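All four counts can be computed in one pass over the KG edges by collecting sets per node. A minimal sketch, assuming edges have already been reduced to subject/predicate/object strings; `nodeDegrees` and the `NodeDegree` field names are hypothetical.

```typescript
interface KGEdge {
  subject: string;
  predicate: string;
  object: string;
}

interface NodeDegree {
  sourceNodes: number; // unique nodes with an edge into this node
  targetNodes: number; // unique nodes this node has an edge to
  incomingEdges: number; // unique subject-predicate-object triples ending here
  outgoingEdges: number; // unique subject-predicate-object triples starting here
}

type SetIndex = Map<string, Set<string>>;

function addTo(index: SetIndex, key: string, value: string): void {
  let set = index.get(key);
  if (set === undefined) {
    set = new Set<string>();
    index.set(key, set);
  }
  set.add(value);
}

function countOf(index: SetIndex, key: string): number {
  return index.get(key)?.size ?? 0;
}

function nodeDegrees(edges: KGEdge[]): Map<string, NodeDegree> {
  const sources: SetIndex = new Map();
  const targets: SetIndex = new Map();
  const incoming: SetIndex = new Map();
  const outgoing: SetIndex = new Map();
  // First pass: collect unique neighbours and unique triples per node.
  for (const e of edges) {
    const triple = `${e.subject}|${e.predicate}|${e.object}`;
    addTo(targets, e.subject, e.object);
    addTo(outgoing, e.subject, triple);
    addTo(sources, e.object, e.subject);
    addTo(incoming, e.object, triple);
  }
  // Second pass: one NodeDegree per node that appears in any edge.
  const result = new Map<string, NodeDegree>();
  for (const e of edges) {
    for (const id of [e.subject, e.object]) {
      if (!result.has(id)) {
        result.set(id, {
          sourceNodes: countOf(sources, id),
          targetNodes: countOf(targets, id),
          incomingEdges: countOf(incoming, id),
          outgoingEdges: countOf(outgoing, id),
        });
      }
    }
  }
  return result;
}
```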

Create a module to transform TRAPI Query Graph

- Expand node by its id, e.g. if the user provides a MONDO ID as input, we will traverse the MONDO hierarchy to get all its descendants.
- Expand node by its category, e.g. if the user provides the NamedThing category, we will traverse the BioLink class hierarchy to get all descendants of the NamedThing class.
- Expand predicate, e.g. if the user provides the related_to predicate, we will traverse the BioLink predicate hierarchy to get all descendants of the related_to predicate.

BTE doesn't handle predicate as a list

According to TRAPI, predicate can be given either as a list or as a string.

However, the current BTE implementation doesn't support lists.

The following query fails:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "subject": "n00",
          "predicate": ["biolink:physically_interacts_with"]
        }
      },
      "nodes": {
        "n00": {
          "category": "biolink:ChemicalSubstance",
          "id": "DRUGBANK:DB00188"
        },
        "n01": {
          "category": "biolink:Gene"
        }
      }
    }
  }
}

The error message is:

{
    "error": "TypeError: this.predicate.startsWith is not a function"
}
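The error comes from calling a string method on an array. One fix is to normalise the field to a list before any string handling. A minimal sketch; `toArray` and `normalizePredicates` are hypothetical helpers, and it assumes (as a guess at what the failing `startsWith` call is for) that the `biolink:` prefix is stripped from each predicate.

```typescript
// TRAPI allows either a single value or a list; always work with a list.
function toArray<T>(value: T | T[] | undefined): T[] {
  if (value === undefined) return [];
  return Array.isArray(value) ? value : [value];
}

const BIOLINK_PREFIX = "biolink:";

// Apply the per-string check to every predicate instead of to the raw field.
function normalizePredicates(predicate: string | string[] | undefined): string[] {
  return toArray(predicate).map((p) =>
    p.startsWith(BIOLINK_PREFIX) ? p.substring(BIOLINK_PREFIX.length) : p,
  );
}
```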

Improve logging module

Currently, logging only shows how a TRAPI query is parsed and how the SmartAPI KG is used. It should include additional information such as:

- the query made to each API
- how many responses we get from each API call
- how many responses we get after merging the results from different KPs

The above needs support from other BTE-related Node.js packages.
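One possible shape for the extra entries, reusing the `timestamp`/`level`/`message`/`code` fields that appear in the BTE log quoted in the LINCS issue further down. `logApiCall` and `logMerge` are hypothetical helpers, the `query-results:` prefix is made up for illustration, and the hit counts would have to be reported by the call-apis and related packages.

```typescript
interface LogEntry {
  timestamp: string;
  level: "DEBUG" | "INFO" | "WARNING" | "ERROR";
  message: string;
  code: number | null;
}

// One entry per API call: what was sent and how many hits came back.
function logApiCall(url: string, params: Record<string, string>, hits: number): LogEntry {
  return {
    timestamp: new Date().toISOString(),
    level: "DEBUG",
    message: `call-apis: ${url} ${JSON.stringify(params)} returned ${hits} hits`,
    code: null,
  };
}

// One entry after merging: total raw hits versus merged results.
function logMerge(perApiHits: number[], mergedHits: number): LogEntry {
  const total = perApiHits.reduce((sum, n) => sum + n, 0);
  return {
    timestamp: new Date().toISOString(),
    level: "DEBUG",
    message: `query-results: merged ${total} hits from ${perApiHits.length} API calls into ${mergedHits} results`,
    code: null,
  };
}
```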

Query with unexpected exceptions

{
	"message": {
		"query_graph": {
			"nodes": {
				"n0": {
					"id": "WIKIPATHWAYS:Pathway:WP195",
					"category": "biolink:Pathway"
				},
				"n1": {
					"category": "biolink:Gene"
				},
				"n2": {
					"category": "biolink:ChemicalSubstance"
				}
			},
			"edges": {
				"e01": {
					"subject": "n0",
					"object": "n1"
				},
				"e02": {
					"subject": "n1",
					"object": "n2"
				}
			}
		}
	}
}

slice

Deprecate TRAPI v0.9.2 support

The /query endpoint should have the same implementation as /v1/query.

We should probably use a regular expression when specifying the route, e.g. (v1)?/query

See expressjs routing mechanism: https://expressjs.com/en/guide/routing.html
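Express accepts a regular expression (or an array of paths) as a route path, so one handler can serve both URLs. A minimal sketch of such a pattern, checked on its own without starting a server; `QUERY_ROUTE`, `isQueryPath`, and `handleQuery` in the comment are hypothetical names.

```typescript
// Matches exactly "/query" and "/v1/query".
// With Express this could be registered as app.post(QUERY_ROUTE, handleQuery),
// where handleQuery stands for the existing /v1/query handler.
const QUERY_ROUTE = /^\/(?:v1\/)?query$/;

function isQueryPath(path: string): boolean {
  return QUERY_ROUTE.test(path);
}
```

An array such as `["/query", "/v1/query"]` achieves the same thing and may be easier to read than the regex.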

Set up CI to deploy to the test server with a commit hash ID

Add support for reverse biolink predicate

Investigate the Redis in-memory database for caching query results

This would help speed up the Node.js app when multiple queries ask for the same edge.

Use the Redis Docker image for easier deployment.

Use .env to store the Redis URL/password info.
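A cache-aside sketch, assuming edge results are cached under a key built from the API, predicate, and input ID. The `KeyValueStore` interface stands in for a Redis client (whose own get/set commands, configured from the URL and password in `.env`, would back it), and `edgeCacheKey`/`cachedQuery` are hypothetical names. Expiry is left out for brevity.

```typescript
// Minimal async key-value interface; a Redis client wrapper could implement it.
interface KeyValueStore {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// Deterministic key so identical edge queries hit the same entry.
function edgeCacheKey(api: string, predicate: string, inputId: string): string {
  return `bte:edge:${api}:${predicate}:${inputId}`;
}

// Return the cached result when present; otherwise run the query and store it.
async function cachedQuery<T>(store: KeyValueStore, key: string, run: () => Promise<T>): Promise<T> {
  const hit = await store.get(key);
  if (hit !== null) {
    return JSON.parse(hit) as T;
  }
  const result = await run();
  await store.set(key, JSON.stringify(result));
  return result;
}
```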

Disease(s) Treated By Drug

see: NCATSTranslator/testing#20

Include additional node attributes in TRAPI Knowledge Graph

Chemical:
- chembl_max_phase
- chembl_molecule_type
- chembl_drug_category
- drugbank_class
- drugbank_groups
- drugbank_kingdom
- drugbank_superclass
- contraindications
- mesh_pharmacology_class
- fda_epc_pharmacology_class
Gene:
- interpro
- type_of_gene
Pathway:
- number_of_participants
BiologicalProcess:
- number_of_participants
CellularComponent:
- number_of_participants
MolecularActivity:
- number_of_participants

Accessing LINCS data portal API thru BTE

Summary: I think BTE is making an error in setting up the API request for the LINCS Data Portal API. We are required to provide the input ID as a CURIE, so I set it as a ChemicalSubstance with the id "LINCS:LSM-1023" (which is imatinib). The logs show that the LINCS API query is then as follows (the error is in the params field):

    {
      "timestamp": "2021-03-24T04:11:46.587Z",
      "level": "DEBUG",
      "message": "call-apis: Succesfully made the following query: {\"url\":\"http://lincsportal.ccs.miami.edu/dcic/api/drugindication\",\"params\":{\"id\":\"LINCS:LSM-1023\"},\"method\":\"get\",\"timeout\":50000}",
      "code": null
    },

Looking at the SmartAPI page for the LINCS Data Portal, the id field should not have a prefix; it should only contain the ID "LSM-1023".

The situation: I tried to query the LINCS data portal API thru BTE's /v1/smartapi/{smartapi_id}/query endpoint.

The smartapi_id is 9ee398a738916a98b612068cc022454f, the request body is:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "subject": "n00"
        }
      },
      "nodes": {
        "n00": {
          "category": "biolink:ChemicalSubstance",
          "id": "LINCS:LSM-1023"
        },
        "n01": {
          "category": "biolink:Disease"
        }
      }
    }
  }
}

It returns no hits.

However, if I query the LINCS Data portal endpoint directly with the id as "LSM-1023", I get multiple results like:

{"documents": [
{
"lsm_id":"LSM-1023",
"efo_id":"Orphanet:44890",
"efo_term":"GASTROINTESTINAL STROMAL TUMOR",
"max_fda_phase_for_ind":"4",
"mesh_heading":"GASTROINTESTINAL STROMAL TUMORS",
"mesh_id":"D046152"
}
,
{
"lsm_id":"LSM-1023",
"efo_id":"EFO:0000691",
"efo_term":"SARCOMA",
"max_fda_phase_for_ind":"3",
"mesh_heading":"SARCOMA",
"mesh_id":"D012509"
}

Note: I'm not sure if the BTE Python client has an issue with this API too, since it accepts only LINCS IDs and I'm not sure if BTE will ever end up querying it.
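The fix amounts to stripping the CURIE prefix before filling the request parameter when the API expects a bare ID. A minimal sketch; `splitCurie`, `toApiInput`, and the `keepPrefix` flag are hypothetical, the flag standing in for whatever per-parameter setting the SmartAPI spec or BTE would use to say whether the prefix is kept.

```typescript
// Split at the first colon only, so any colons in the local ID are preserved.
function splitCurie(curie: string): { prefix: string; localId: string } {
  const i = curie.indexOf(":");
  if (i === -1) return { prefix: "", localId: curie };
  return { prefix: curie.slice(0, i), localId: curie.slice(i + 1) };
}

// "LINCS:LSM-1023" -> "LSM-1023" when the API wants the bare ID.
function toApiInput(curie: string, keepPrefix: boolean): string {
  return keepPrefix ? curie : splitCurie(curie).localId;
}
```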

/performance endpoint is showing path not found

Node related info should not appear in edge data in TRAPI response

Current behavior in edge response:

"attributes": [
                        {
                            "name": "provided_by",
                            "value": "Text Mining KP",
                            "type": "biolink:provided_by"
                        },
                        {
                            "name": "api",
                            "value": "Text Mining Targeted Association API",
                            "type": "bts:api"
                        },
                        {
                            "name": "CHEBI",
                            "value": "CHEBI:32630",
                            "type": "bts:CHEBI"
                        },
                        {
                            "name": "object_spans",
                            "value": [
                                "start: 91, end: 96",
                                "start: 62, end: 67"
                            ],
                            "type": "bts:object_spans"
                        },
                        {
                            "name": "relation_spans",
                            "value": [
                                "",
                                ""
                            ],
                            "type": "bts:relation_spans"
                        },
                        {
                            "name": "score",
                            "value": [
                                "0.9994468",
                                "0.97133327"
                            ],
                            "type": "bts:score"
                        },
                        {
                            "name": "sentence",
                            "value": [
                                "Dietary restriction of leucine for at least three days could result in the inactivation of Hsf-1, leading to a reduction in Hsp70 synthesis.",
                                "However, in cells that were leucine starved for 3 and 4 days, Hsf-1 activity and Hsp70 synthesis level was dramatically decreased."
                            ],
                            "type": "bts:sentence"
                        },
                        {
                            "name": "subject_spans",
                            "value": [
                                "start: 23, end: 30",
                                "start: 28, end: 35"
                            ],
                            "type": "bts:subject_spans"
                        },
                        {
                            "name": "publications",
                            "value": [
                                "PMID:31397439",
                                "PMID:31397439"
                            ],
                            "type": "biolink:publications"
                        }
                    ]

Information such as CHEBI does not belong here and needs to be removed.
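One way to do the removal, assuming node-identifier attributes can be recognised because their `name` is an ID prefix such as `CHEBI`. The prefix list and `stripNodeAttributes` are hypothetical; in practice the prefixes could come from the BioLink model rather than a hard-coded set.

```typescript
interface Attribute {
  name: string;
  value: unknown;
  type: string;
}

// Hypothetical set of attribute names that describe a node, not the edge.
const NODE_ID_NAMES = new Set(["CHEBI", "NCBIGene", "MONDO", "UMLS", "MESH"]);

// Keep only attributes that belong to the edge itself.
function stripNodeAttributes(attributes: Attribute[]): Attribute[] {
  return attributes.filter((a) => !NODE_ID_NAMES.has(a.name));
}
```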

Handle explain type of query

{
    "message": {
        "query_graph": {
            "nodes": {
                "a": {
                    "category": "biolink:Disease",
                    "id": "MESH:D015464"
                },
                "b": {
                    "category": "biolink:ChemicalSubstance",
                    "id": "CHEBI:45783"
                },
                "c": {
                    "category": "biolink:Gene"
                }
            },
            "edges": {
                "ac": {
                    "subject": "a",
                    "object": "c"
                },
                "bc": {
                    "subject": "c",
                    "object": "b"
                }
            }
        }
    },
    "knowledge_graph": {
        "nodes": [],
        "edges": []
    },
    "results": []
}

Knowledge Graph Edges should be grouped based on subject-predicate-object

Currently, edges are grouped by subject-object-API-source.
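A minimal sketch of grouping by subject-predicate-object, where the API, source, and publication lists are merged into one edge per triple. `groupBySPO` and the record fields are hypothetical. The key uses `|` as a separator because `-` also occurs inside IDs such as `LSM-1023`, which could make two different triples collide.

```typescript
interface RecordEdge {
  subject: string;
  predicate: string;
  object: string;
  api: string;
  source: string;
  publications: string[];
}

interface GroupedEdge {
  subject: string;
  predicate: string;
  object: string;
  apis: string[];
  sources: string[];
  publications: string[];
}

function addUnique(list: string[], items: string[]): void {
  for (const item of items) {
    if (!list.includes(item)) list.push(item);
  }
}

function groupBySPO(records: RecordEdge[]): Map<string, GroupedEdge> {
  const groups = new Map<string, GroupedEdge>();
  for (const r of records) {
    const key = `${r.subject}|${r.predicate}|${r.object}`;
    let g = groups.get(key);
    if (g === undefined) {
      g = { subject: r.subject, predicate: r.predicate, object: r.object, apis: [], sources: [], publications: [] };
      groups.set(key, g);
    }
    // Merge provenance from every record that shares the triple.
    addUnique(g.apis, [r.api]);
    addUnique(g.sources, [r.source]);
    addUnique(g.publications, r.publications);
  }
  return groups;
}
```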

Refactor KnowledgeGraph module

Check whether using the spread operator to update the KG object causes a performance issue.

Performance test should be run on test server on Github actions instead of dev/prod server

Use scp to transfer the test results: https://github.com/appleboy/scp-action

Investigate MongoDB as persistent data storage

Storing user requests persistently would be a useful feature, so users can come back and look up their results using just the answer ID we assign to them.

We could also hook this up with the web interface. Given an answer ID, the UI can fetch results directly from MongoDB and display them as a graph/table for exploration.

Use git stash to shelve any changes made on the prod/dev server so that git pull doesn't fail

Support handling list as value for category

One ID might belong to multiple semantic types, e.g. UMLS:C0008780 can be mapped as a Disease or a PhenotypicFeature.

So when a user provides the following query:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "subject": "n00"
        }
      },
      "nodes": {
        "n00": {
          "category": ["biolink:Disease", "biolink:PhenotypicFeature"],
          "id": "UMLS:C0008780"
        },
        "n01": {
          "category": "biolink:Gene"
        }
      }
    }
  }
}

We should look for genes related to UMLS:C0008780 either as a Disease or as a PhenotypicFeature.
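One straightforward approach is to treat a list-valued category as several single-category QNodes, query each, and merge the answers afterwards. A minimal sketch of the fan-out step; `expandCategories` is hypothetical and only splits the node, leaving querying and merging to the existing pipeline.

```typescript
interface QNode {
  id?: string;
  category?: string | string[];
}

// One copy of the node per category, so each copy can be resolved and queried on its own.
function expandCategories(node: QNode): QNode[] {
  if (node.category === undefined || typeof node.category === "string") {
    return [node];
  }
  return node.category.map((category) => ({ ...node, category }));
}
```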

Set up Development branch and deploy to the dev.api.bte.ncats.io server

Fix wrong URLs in CHANGELOG

Right now, the commit URL and compare URL in CHANGELOG are wrong. We need to fix them, as well as the .versionrc.json file that helps generate them automatically.

Speed up the nodejs application

profiling: https://nodejs.org/en/docs/guides/simple-profiling/

This query is not working

{
	"message": {
		"query_graph": {
			"nodes": {
				"n0": {
					"id": "MONDO:0005132",
					"category":"biolink:Disease"
				},
				"n1": {
					"category": "biolink:ChemicalSubstance"
				},
				"n2": {
					"id": "UMLS:C0032961",
					"category":"biolink:Disease"
				}
			},
			"edges": {
				"e01": {
					"subject": "n1",
					"object": "n0",
					"predicate":"biolink:treats"
				},
				"e02": {
					"subject": "n1",
					"object": "n2",
					"predicate": "biolink:contraindicated_for"
				}
			}
		}
	}
}

Query returns unexpected exceptions

{
    "message": {
        "query_graph": {
            "edges": {
                "e00": {
                    "subject": "n00",
                    "object": "n01",
                    "category": "biolink:correlated_with"
                }
            },
            "nodes": {
                "n00": {
                    "category": "biolink:ChemicalSubstance",
                    "id": "CAS:121999-58-4"
                },
                "n01": {
                    "category": "biolink:ChemicalSubstance"
                }
            }
        }
    }
}

According to Ryan, this query returns:

{
    "error": "TypeError: Cannot read property 'slice' of undefined"
}

Test with clinical risk KP fails because data source changes

FAIL test/integration/TRAPIv1.test.js (97.982 s)
● Testing endpoints › POST /v1/query with clinical risk kp query

expect(received).toHaveProperty(path)

Expected path: "MONDO:0005249"
Received path: []

Received value: {}

  69 |                 expect(response.body.message.knowledge_graph).toHaveProperty("nodes");
  70 |                 expect(response.body.message.knowledge_graph).toHaveProperty("edges");
> 71 |                 expect(response.body.message.knowledge_graph.nodes).toHaveProperty("MONDO:0005249")
     |                                                                     ^
  72 |             })
  73 |     })
  74 | 

  at __test__/integration/TRAPIv1.test.js:71:69
  at Object.<anonymous> (__test__/integration/TRAPIv1.test.js:60:9)

Investigate Timeout Error

{
  "message": {
    "query_graph": {
      "nodes": {
        "n00": {
          "id": "name:Imatinib",
          "category": "biolink:ChemicalSubstance"
        },
        "n01": {
          "category": "biolink:Disease"
        },
        "n02": {
            "category": "biolink:Gene"
        }
      },
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01",
          "predicate":"biolink:treats"
        },
        "e01": {
          "subject": "n01",
          "object": "n02",
          "predicate":"biolink:caused_by"
        }
      }
    }
  }
}

The above query results in a 504 timeout error in the current BTE app. We need to investigate why that happens and how to set the timeout on either the Express.js end or the nginx end.

Add an optional parameter to export results as a CSV table
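A minimal sketch of the conversion step, assuming each result has already been flattened into a row of column-to-value pairs; `toCsv` and `csvField` are hypothetical. Fields are quoted in the RFC 4180 style only when they contain a comma, a double quote, or a line break, with embedded quotes doubled and CRLF line endings.

```typescript
type Row = Record<string, string | number>;

// Quote a field only when needed, doubling any embedded quotes.
function csvField(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Header line first, then one line per row in the given column order.
function toCsv(rows: Row[], columns: string[]): string {
  const lines = [columns.map(csvField).join(",")];
  for (const row of rows) {
    lines.push(columns.map((c) => csvField(row[c] ?? "")).join(","));
  }
  return lines.join("\r\n");
}
```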

Use Singleton Design Pattern for BioLink reversal class

Currently, the BioLink reversal class (which includes a file read) has to be instantiated every time predicates are processed. It needs to be modified to follow the Singleton design pattern, so that it is only instantiated once, which should speed the program up.
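A minimal Singleton sketch. `BiolinkReversal` here is a hypothetical stand-in for the real class: the injected `load` function replaces the file read, and `loads` only counts how often the expensive step runs. The private constructor forces callers through `getInstance`, so the load happens once per process.

```typescript
class BiolinkReversal {
  private static instance: BiolinkReversal | undefined;
  static loads = 0; // illustration only: how many times the expensive load ran

  private readonly inverses = new Map<string, string>();

  private constructor(load: () => Record<string, string>) {
    BiolinkReversal.loads += 1;
    const pairs = load();
    // Store both directions so either predicate can be reversed.
    for (const key of Object.keys(pairs)) {
      this.inverses.set(key, pairs[key]);
      this.inverses.set(pairs[key], key);
    }
  }

  // The loader only runs on the first call; later calls reuse the instance.
  static getInstance(load: () => Record<string, string>): BiolinkReversal {
    if (BiolinkReversal.instance === undefined) {
      BiolinkReversal.instance = new BiolinkReversal(load);
    }
    return BiolinkReversal.instance;
  }

  reverse(predicate: string): string | undefined {
    return this.inverses.get(predicate);
  }
}
```

In Node.js, exporting a single module-level instance has a similar effect, since a module's code is evaluated once and then cached.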

Query Fails

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "category": "biolink:Drug",
                    "id": "RXCUI:466423"
                },
                "n3": {
                    "category": "biolink:Disease"
                }
            },
            "edges": {
                "e03": {
                    "subject": "n0",
                    "object": "n3"
                }
            }
        }
    }
}

Error message:


{
    "error": "TypeError: Cannot read property 'id' of undefined"
}

Add feature to allow user to test local SmartAPI specs

Separate out the logic of TRAPI Query Graph handling from this repo

Refactor load meta-kg

As shown above, loading the meta-kg can sometimes take up to 20 s. This causes serious performance issues on the BTE API end. We need to refactor the smartapi-kg package so that it can take a list of specs passed to it as a file instead of making a real-time API query.

We also need to implement a cron job on the TRAPI end to fetch SmartAPI specs periodically from the SmartAPI API.

Hook up with the new OOP-designed ID resolver module

Error: "TypeError: Promise.allSettled is not a function"

On initial installation in WSL, I got the following error when executing a test query: "TypeError: Promise.allSettled is not a function"
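`Promise.allSettled` is only built into Node.js from version 12.9.0, so this error most likely means an older Node.js version is installed, and upgrading Node.js is the simplest fix. If upgrading is not an option, a small fallback with the same result shape can be built on `Promise.all`. The sketch below defines its own `allSettled` and `Settled` names rather than patching the global `Promise`.

```typescript
type Settled<T> =
  | { status: "fulfilled"; value: T }
  | { status: "rejected"; reason: unknown };

// Same result shape as Promise.allSettled: never rejects, reports every outcome.
function allSettled<T>(promises: Promise<T>[]): Promise<Settled<T>[]> {
  return Promise.all(
    promises.map((p) =>
      p.then(
        (value): Settled<T> => ({ status: "fulfilled", value }),
        (reason): Settled<T> => ({ status: "rejected", reason }),
      ),
    ),
  );
}
```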

Missing type for node attributes

Type is a required field in the TRAPI 1.0 standard. Currently, all edge attributes have a type, but node attributes do not.
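A minimal sketch, assuming node attributes follow the same convention as the edge attributes in the responses above: known BioLink slots keep a `biolink:` type, and everything else gets a `bts:`-prefixed type (e.g. `bts:CHEBI`). `withTypes` and the contents of the lookup table are hypothetical.

```typescript
interface RawAttribute {
  name: string;
  value: unknown;
}

interface TypedAttribute extends RawAttribute {
  type: string;
}

// Hypothetical table of attribute names that map to BioLink slots.
const BIOLINK_TYPES: Record<string, string> = {
  publications: "biolink:publications",
  provided_by: "biolink:provided_by",
};

// Add a type to every attribute, falling back to bts:<name>.
function withTypes(attributes: RawAttribute[]): TypedAttribute[] {
  return attributes.map((a) => ({ ...a, type: BIOLINK_TYPES[a.name] ?? `bts:${a.name}` }));
}
```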

Create regression testing infrastructure

We would like to create a regression testing framework to quantitatively assess BTE's performance. As a gold standard, we can use the orphan drug indication dataset mentioned in NCATSTranslator/Relay#123 or the mechanistic paths from https://sulab.github.io/DrugMechDB/. For each of those gold standards, we should create a TRAPI query (examples), send it to BTE using a small library of plausible metapaths focused on drug repurposing, and then assess whether BTE was able to retrieve the right drug among the results. (Later we can also assess where that drug ranked among all potential drugs retrieved.) We would want to execute this test on a regular basis (weekly?), and then have a simple web page where results can be viewed/browsed.
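The per-query check could be as simple as asking whether the expected drug appears among the retrieved results and at what position, then aggregating over the gold standard. A minimal sketch, assuming results are already sorted by BTE's score and reduced to drug CURIEs; `assessRetrieval`, `recall`, and the field names are hypothetical.

```typescript
interface GoldStandardCase {
  disease: string; // input CURIE used to build the TRAPI query
  expectedDrug: string; // drug CURIE that should be retrieved
}

interface Assessment {
  retrieved: boolean;
  rank: number | null; // 1-based position, null when not retrieved
}

function assessRetrieval(c: GoldStandardCase, rankedDrugs: string[]): Assessment {
  const index = rankedDrugs.indexOf(c.expectedDrug);
  return index === -1 ? { retrieved: false, rank: null } : { retrieved: true, rank: index + 1 };
}

// Fraction of gold-standard cases where the expected drug was retrieved at all.
function recall(assessments: Assessment[]): number {
  if (assessments.length === 0) return 0;
  return assessments.filter((a) => a.retrieved).length / assessments.length;
}
```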

tagging @ariutta and @AlexanderPico