translatorsri / plater
Plater automatically creates a TRAPI interface for a biolink-compliant neo4j graph.
Automat KPs return 500 errors for POST requests to the /query endpoint. This happens both when going through Strider and when calling the API directly. It only happens with certain data values, but a variety of requests trigger it. The attached file shows some examples of requests and data values that cause this error.
This issue is to report that several of the example queries for ICEES KG will return errors or empty results because they are not compatible with the underlying data.
For example, for the 'query reasoner via one of several inputs' functionality, the example query includes a MONDO identifier that ICEES KG does not support. I think this should be changed from MONDO:0004969 to MONDO:0004979. Likewise, the category for n0 should be changed from biolink:Gene to biolink:ChemicalEntity. With these changes the query succeeds: biolink:ChemicalEntity related_to biolink:Disease (MONDO:0004979).
The example query for the overlay endpoint returns an error. I think this might be similar to the above example in that I don't think the example query is something that ICEES KG can respond to.
With the meta KG endpoint, the identifier prefixes associated with Biolink categories appear to differ from those that ICEES KG actually uses. For instance,
"biolink:SmallMolecule": {
"id_prefixes": [
"PUBCHEM.COMPOUND",
"UNII",
"CHEBI"
does not include RXNORMCUI, but the drugs included in ICEES KG are actually mapped to RXNORMCUI not PUBCHEM.COMPOUND, UNII, or CHEBI. I think the issue has to do with the fact that the identifier prefixes are being pulled automatically from Plater/Automat, but I'm wondering if this will be problematic.
Assigning @cbizon because I do not know who the point person is for Plater-related issues.
Query:
q = {"message":{"query_graph":{
"edges": {
"e00": {
"object": "n01",
"predicates": [
"biolink:located_in"
],
"subject": "n00"
}
},
"nodes": {
"n00": {
"ids": [
"NCBIGene:5354"
]
},
"n01": {
"categories": [
"biolink:AnatomicalEntity"
]
}
}
}}}
(NCBIGene:5354)-[located_in]->(AnatomicalEntity)
Running against automat/hetio returns 0 results.
But, changing the query to
(NCBIGene:5354)-[expressed_in]->(AnatomicalEntity)
returns 56 results.
However, expressed_in is_a located_in, so the first query should also return those 56 results.
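One way to get the expected behavior is to expand the requested predicate to itself plus all of its descendants before matching edges. The following is a minimal sketch, not Plater's actual code: `PARENT_OF` and `expand_predicate` are hypothetical names, and the map is hand-written with only the single relationship stated above. A real implementation would presumably read the hierarchy from the Biolink Model.

```python
# Hypothetical child -> parent map; only the relationship stated above is included.
PARENT_OF = {
    "biolink:expressed_in": "biolink:located_in",
}


def expand_predicate(predicate, parent_of=PARENT_OF):
    """Return the predicate plus every predicate that descends from it."""
    expanded = {predicate}
    changed = True
    while changed:  # repeat until no new descendants are found
        changed = False
        for child, parent in parent_of.items():
            if parent in expanded and child not in expanded:
                expanded.add(child)
                changed = True
    return sorted(expanded)


print(expand_predicate("biolink:located_in"))
```

With this expansion, a query for located_in would also match edges stored as expressed_in, while a query for expressed_in would still match only expressed_in.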
Currently, including an example for each path requires adding a TRAPI message as a JSON file to the /examples directory.
This only supports calls that are TRAPI queries.
The SRI Reference graph has additional calls that are GET requests and take their arguments in the URL.
For example:
https://trapi.monarchinitiative.org/docs#/default/node__node_type___curie__get
This call just takes a Biolink node type and a CURIE as part of the constructed URL,
e.g.
'https://trapi.monarchinitiative.org/biolink%3ADisease/MONDO%3A0000251'
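An example for a call like this would need path arguments rather than a JSON message body. As a rough sketch of how such a URL is constructed (`node_url` is a hypothetical helper, not part of Plater; the host is taken from the example above), each path segment is percent-encoded so that the colon in the type and the CURIE becomes `%3A`:

```python
from urllib.parse import quote

BASE_URL = "https://trapi.monarchinitiative.org"  # host taken from the example above


def node_url(node_type, curie, base_url=BASE_URL):
    """Build the GET URL for a node lookup, percent-encoding each path segment."""
    # safe="" makes quote() encode ':' (and '/') instead of leaving them as-is.
    return f"{base_url}/{quote(node_type, safe='')}/{quote(curie, safe='')}"


print(node_url("biolink:Disease", "MONDO:0000251"))
```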
This issue is to request that the URL for edge attributes in CAM Provider KG be updated to: https://github.com/NCATSTranslator/Translator-All/wiki/CAM-Provider-KG. I believe the current URL is https://github.com/NCATSTranslator/Translator-All/wiki, which isn't specific to CAM Provider KG.
I want to find all chemicals connected to a disease by 2 hops. If I run as a cypher query:
cypher={"query": "MATCH (n:`biolink:Disease` {id:'MONDO:0008078'})-[x]-(q0)-[x1]-(c:`biolink:ChemicalEntity`) RETURN *"}
It runs fine in a few minutes.
But if I send the equivalent TRAPI:
{
"message": {
"query_graph": {
"nodes": {
"disease": {
"ids": [
"MONDO:0008078"
]
},
"nt_0": {
"categories": [
"biolink:NamedThing"
]
},
"chemical": {
"categories": [
"biolink:ChemicalEntity"
]
}
},
"edges": {
"edge_0": {
"subject": "disease",
"object": "nt_0"
},
"dedge": {
"subject": "nt_0",
"object": "chemical"
}
}
}
}
}
It never returns.
The Dockerfile is pointing to the older repo where PLATER used to live. It needs to be updated to point to this new repo.
While loading one of the datasets that has a very large number of edges (~156675722), we discovered that the graph schema generation Cypher query was highly inefficient and doesn't complete in a reasonable time.
It might be better to modify the Cypher query to
MATCH (a)-[x]->(b) where not a:Concept and not b:Concept RETURN DISTINCT labels(a), type(x), labels(b)
and do the permutations that the original Cypher query did in Python.
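A minimal sketch of the Python side, assuming the permutations in question are the cross-product of each row's source labels and target labels (the original Cypher query is not shown here, so that is an assumption). `expand_schema` is a hypothetical name and the sample row is made up to match the shape the query above would return:

```python
from itertools import product


def expand_schema(rows):
    """Expand (labels(a), type(x), labels(b)) rows into (source, predicate, target) triples."""
    triples = set()
    for source_labels, edge_type, target_labels in rows:
        # Every source label paired with every target label for this edge type.
        for source, target in product(source_labels, target_labels):
            triples.add((source, edge_type, target))
    return sorted(triples)


# Made-up row for illustration only.
rows = [
    (["biolink:Gene", "biolink:NamedThing"], "biolink:related_to", ["biolink:Disease"]),
]
for triple in expand_schema(rows):
    print(triple)
```

Because the query returns DISTINCT label combinations, the number of rows handed to Python should be small compared with the number of edges, so the expansion itself is cheap.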
See https://github.com/NCATSTranslator/OperationsAndWorkflows/wiki/How-to-%22do%22-operations for how to add operations to the openapi spec.
All the Platers should currently expose "lookup". We may soon want to add others. This needs to be pushed to all Automat Platers.
It has been a while since we looked at the non-TRAPI and translator endpoints of the Plater API. It's not clear if people are using the one_hop, node, and simple_spec endpoints or for what purposes.
For one_hop and node, I think the parameters should be changed. Specifically, they both contain redundant/unnecessary parameters. They also use the base url path without an explicit endpoint, which I think could be confusing and potentially cause issues calling them unintentionally.
one_hop
/{source_type}/{target_type}/{curie} returns one-hop paths from source_type with curie to target_type, but if the curie is specified, I'm not sure why the source type also needs to be specified. This could be changed to something like the following without losing functionality:
/one_hop/{curie}/{target_type}/
node
/{node_type}/{curie} returns a node matching curie.
Similarly, these parameters seem redundant to me; the node_type could be removed without losing anything.
simple_spec
/simple_spec with optional source and target URL parameters:
"Returns a list of available predicates when choosing a single source or target curie. Calling this endpoint with no query parameters will return all possible hops for all types."
This endpoint is somewhat redundant with meta_knowledge_graph, except that it returns less information. It's based on edges and has no nodes section, so it has no node curie prefixes or attributes, or edge attributes. It only includes leaf node types, not every permutation. It gives the option to pre-filter for specific nodes, but without a use case I'm not sure how helpful that is. It also queries the neo4j every time for results and doesn't cache anything, so it can be slow. For that reason, I reworked simple_spec recently so that it returns a cached pre-computed result when parameters are left blank (the full simple spec), because that was especially slow for large graphs (but currently this doesn't filter for leaf node types). I wonder if we should be querying neo4j for this, if we should be caching results, if this should just be part of the meta_knowledge_graph endpoint, or if we need it at all.
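To make the caching question concrete, here is a rough sketch of the behavior described above: the unfiltered result is computed once and reused, while filtered requests still go to the database. `SimpleSpecCache` and `fake_query` are hypothetical stand-ins, not Plater's actual code, and the returned value is a placeholder rather than the real response shape.

```python
class SimpleSpecCache:
    """Cache the unfiltered simple spec; pass filtered requests through to the query."""

    def __init__(self, query_spec):
        self._query_spec = query_spec  # hypothetical callable: (source, target) -> result
        self._full_spec = None

    def get(self, source=None, target=None):
        if source is None and target is None:
            if self._full_spec is None:  # first unfiltered call populates the cache
                self._full_spec = self._query_spec(None, None)
            return self._full_spec
        return self._query_spec(source, target)  # filtered calls are not cached


calls = []


def fake_query(source, target):
    """Stand-in for the neo4j query; records each call so cache hits are visible."""
    calls.append((source, target))
    return ["placeholder hop"]


cache = SimpleSpecCache(fake_query)
cache.get()
cache.get()
print(len(calls))  # number of times the stand-in query actually ran
```

A cache like this never refreshes, which is only reasonable if the graph does not change while the service is running.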
I could see us adding additional functionality as well. Should we support N-hop queries from a pinned node to a target type or something like that? Should we add endpoints related to subclass hierarchies?
Additionally, we recently combined the /1.4/ endpoints with everything else, which results in all of these being exposed on the smartapi registry. We could probably fairly easily split them out again if we wanted.
{
"message":{
"query_graph":{
"nodes":{
"n0":{
},
"n1":{
}
},
"edges":{
"e0":{
"subject":"uh oh",
"object":"n1"
}
}
}
}
}
But it should return a 400 error with details about invalid edges
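A check along these lines could run before the query is translated to Cypher. This is a minimal sketch under the assumption that validation happens on the plain dict form of the message; `find_invalid_edges` is a hypothetical helper, and the mapping of a non-empty error list to a 400 status is shown with a plain variable rather than any particular web framework.

```python
def find_invalid_edges(query_graph):
    """Return one error string per edge endpoint that is not a key in the nodes section."""
    node_ids = set(query_graph.get("nodes", {}))
    errors = []
    for edge_id, edge in query_graph.get("edges", {}).items():
        for role in ("subject", "object"):
            if edge.get(role) not in node_ids:
                errors.append(
                    f"edge '{edge_id}': {role} '{edge.get(role)}' is not a node in the query graph"
                )
    return errors


query = {"message": {"query_graph": {
    "nodes": {"n0": {}, "n1": {}},
    "edges": {"e0": {"subject": "uh oh", "object": "n1"}},
}}}

errors = find_invalid_edges(query["message"]["query_graph"])
status = 400 if errors else 200  # the error list would become the response detail
print(status, errors)
```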
We want /predicates to only show leaf nodes, but here are the predicates for automat human-goa:
{
"biolink:MacromolecularMachine": {
"biolink:BiologicalProcess": [
"biolink:actively_involved_in"
],
"biolink:MolecularActivity": [
"biolink:enables"
],
"biolink:CellularComponent": [
"biolink:related_to"
]
},
"biolink:GeneOrGeneProduct": {
"biolink:BiologicalProcess": [
"biolink:actively_involved_in"
],
"biolink:MolecularActivity": [
"biolink:enables"
],
"biolink:CellularComponent": [
"biolink:related_to"
]
},
"biolink:Gene": {
"biolink:BiologicalProcess": [
"biolink:actively_involved_in"
],
"biolink:MolecularActivity": [
"biolink:enables"
],
"biolink:CellularComponent": [
"biolink:related_to"
]
},
"biolink:MolecularActivity": {
"biolink:MacromolecularMachine": [
"biolink:enabled_by"
],
"biolink:GeneOrGeneProduct": [
"biolink:enabled_by"
],
"biolink:Gene": [
"biolink:enabled_by"
]
},
"biolink:CellularComponent": {
"biolink:MacromolecularMachine": [
"biolink:related_to"
],
"biolink:GeneOrGeneProduct": [
"biolink:related_to"
],
"biolink:Gene": [
"biolink:related_to"
]
}
}
I think everything in there should be a gene, so I don't see why we would have the entries for GeneOrGeneProduct or MacromolecularMachine.
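As an illustration of the filtering being asked for, the sketch below drops any type that is an ancestor of another type listed alongside it. `ANCESTORS`, `leaf_types`, and `prune_predicates` are hypothetical names, and the ancestor table is hand-written on the assumption that Gene descends from GeneOrGeneProduct and MacromolecularMachine; in practice it would come from the Biolink Model. It also assumes, as stated above, that the co-listed types describe the same nodes.

```python
# Hypothetical ancestor table, written by hand for this example.
ANCESTORS = {
    "biolink:Gene": {"biolink:GeneOrGeneProduct", "biolink:MacromolecularMachine"},
    "biolink:GeneOrGeneProduct": {"biolink:MacromolecularMachine"},
}


def leaf_types(types, ancestors=ANCESTORS):
    """Keep only the types that are not an ancestor of another type in the same collection."""
    types = list(types)
    non_leaf = set()
    for node_type in types:
        non_leaf |= ancestors.get(node_type, set())
    return [t for t in types if t not in non_leaf]


def prune_predicates(predicates, ancestors=ANCESTORS):
    """Drop non-leaf types from both levels of a /predicates-style mapping."""
    return {
        source: {
            target: predicates[source][target]
            for target in leaf_types(predicates[source], ancestors)
        }
        for source in leaf_types(predicates, ancestors)
    }


# Abbreviated version of the response above.
predicates = {
    "biolink:GeneOrGeneProduct": {"biolink:MolecularActivity": ["biolink:enables"]},
    "biolink:Gene": {"biolink:MolecularActivity": ["biolink:enables"]},
    "biolink:MolecularActivity": {
        "biolink:GeneOrGeneProduct": ["biolink:enabled_by"],
        "biolink:Gene": ["biolink:enabled_by"],
    },
}
print(prune_predicates(predicates))
```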
Compare this in the TRAPI dist: https://github.com/NCATSTranslator/ReasonerAPI/blob/master/examples/Message/causes_predicate_vs_qualifier.json
With what Plater is producing:
"15159895": {
"subject": "PUBCHEM.COMPOUND:24823",
"object": "NCBIGene:3569",
"predicate": "biolink:affects",
"qualifiers": null,
"attributes": [
{
"attribute_type_id": "biolink:aggregator_knowledge_source",
"value": [
"infores:automat-robokop"
],
"value_type_id": "biolink:InformationResource",
"original_attribute_name": "biolink:aggregator_knowledge_source",
"value_url": null,
"attribute_source": "infores:automat-robokop",
"description": null,
"attributes": null
},
{
"attribute_type_id": "biolink:object_direction_qualifier",
"value": "increased",
"value_type_id": "EDAM:data_0006",
"original_attribute_name": "object_direction_qualifier",
"value_url": null,
"attribute_source": null,
"description": null,
"attributes": null
},
{
"attribute_type_id": "biolink:object_aspect_qualifier",
"value": "activity",
"value_type_id": "EDAM:data_0006",
"original_attribute_name": "object_aspect_qualifier",
"value_url": null,
"attribute_source": null,
"description": null,
"attributes": null
},
{
"attribute_type_id": "biolink:primary_knowledge_source",
"value": [
"infores:ctd"
],
"value_type_id": "biolink:InformationResource",
"original_attribute_name": "biolink:primary_knowledge_source",
"value_url": null,
"attribute_source": "infores:automat-robokop",
"description": null,
"attributes": null
},
{
"attribute_type_id": "biolink:qualified_predicate",
"value": "biolink:causes",
"value_type_id": "EDAM:data_0006",
"original_attribute_name": "qualified_predicate",
"value_url": null,
"attribute_source": null,
"description": null,
"attributes": null
},
Specifically, the qualifiers should not be in the attributes, but in the "qualifiers" element.
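A sketch of the transformation being requested, using the `qualifier_type_id` / `qualifier_value` shape from the TRAPI example linked above. `move_qualifiers` and `is_qualifier` are hypothetical names, and the rule for recognizing a qualifier (an `attribute_type_id` ending in `_qualifier`, plus `biolink:qualified_predicate`) is an assumption based only on the three attributes shown here, not a complete list.

```python
EXTRA_QUALIFIER_TYPES = {"biolink:qualified_predicate"}  # assumption, see note above


def is_qualifier(attribute_type_id):
    """Heuristic: treat *_qualifier types and qualified_predicate as qualifiers."""
    return attribute_type_id.endswith("_qualifier") or attribute_type_id in EXTRA_QUALIFIER_TYPES


def move_qualifiers(edge):
    """Return a copy of the edge with qualifier-like attributes moved to 'qualifiers'."""
    qualifiers = list(edge.get("qualifiers") or [])
    attributes = []
    for attribute in edge.get("attributes") or []:
        if is_qualifier(attribute["attribute_type_id"]):
            qualifiers.append({
                "qualifier_type_id": attribute["attribute_type_id"],
                "qualifier_value": attribute["value"],
            })
        else:
            attributes.append(attribute)
    return {**edge, "qualifiers": qualifiers or None, "attributes": attributes}


# Abbreviated version of the edge above.
edge = {
    "subject": "PUBCHEM.COMPOUND:24823",
    "object": "NCBIGene:3569",
    "predicate": "biolink:affects",
    "qualifiers": None,
    "attributes": [
        {"attribute_type_id": "biolink:primary_knowledge_source", "value": ["infores:ctd"]},
        {"attribute_type_id": "biolink:object_direction_qualifier", "value": "increased"},
        {"attribute_type_id": "biolink:object_aspect_qualifier", "value": "activity"},
        {"attribute_type_id": "biolink:qualified_predicate", "value": "biolink:causes"},
    ],
}
fixed = move_qualifiers(edge)
print(fixed["qualifiers"])
```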
Looks like we have inverseOf (#21)
What about
In particular, queries for phenotypes of Xeroderma Pigmentosum should return phenotypes annotated to subclass descendants of XP.
(I can provide a full proof later using owlstar, but for now the standard closure trick implemented in Monarch and elsewhere is good enough.)
Add more unit tests and set up automated testing for the repo.