translatorsri / plater


Plater automatically creates a TRAPI interface for a biolink-compliant neo4j graph.

Dockerfile 0.94% Python 96.30% Shell 1.42% Jinja 1.34%

ncats-translator shared-utility trapi


plater's Issues

Consider using Pydantic settings in place of custom `config`

https://fastapi.tiangolo.com/advanced/settings/

ICEES KG endpoint example queries not compatible with ICEES

This issue is to report that several of the example queries for ICEES KG will return errors or empty results because they are not compatible with the underlying data.

For example, for the 'query reasoner via one of several inputs' functionality, the example query includes a MONDO identifier that ICEES KG does not support. I think this should be changed from MONDO:0004969 to MONDO:0004979. Likewise, the category for n0 should be changed from biolink:Gene to biolink:ChemicalEntity. With these changes, the query biolink:ChemicalEntity related_to biolink:Disease (MONDO:0004979) succeeds.

The example query for the overlay endpoint returns an error. I think this might be similar to the above example in that I don't think the example query is something that ICEES KG can respond to.

With the meta KG endpoint, the identifier prefixes associated with Biolink categories appear to differ from those that ICEES KG actually uses. For instance,

    "biolink:SmallMolecule": {
      "id_prefixes": [
        "PUBCHEM.COMPOUND",
        "UNII",
        "CHEBI"

does not include RXNORMCUI, but the drugs included in ICEES KG are actually mapped to RXNORMCUI not PUBCHEM.COMPOUND, UNII, or CHEBI. I think the issue has to do with the fact that the identifier prefixes are being pulled automatically from Plater/Automat, but I'm wondering if this will be problematic.

Assigning @cbizon because I do not know who the point person is for Plater-related issues.

Reevaluate Plater API

It has been a while since we looked at the non-TRAPI and translator endpoints of the Plater API. It's not clear if people are using the one_hop, node, and simple_spec endpoints or for what purposes.

For one_hop and node, I think the parameters should be changed. Specifically, they both contain redundant/unnecessary parameters. They also use the base url path without an explicit endpoint, which I think could be confusing and potentially cause issues calling them unintentionally.

one_hop
/{source_type}/{target_type}/{curie}
Returns one-hop paths from source_type with curie to target_type. If the curie is specified, I'm not sure why the source type also needs to be specified. This could be changed to something like the following without losing functionality:
/one_hop/{curie}/{target_type}/

node
/{node_type}/{curie}
Returns a node matching curie.
Similarly, these parameters seem redundant to me; the node_type could be removed without losing anything.

simple_spec
/simple_spec with optional source and target url parameters
"Returns a list of available predicates when choosing a single source or target curie. Calling this endpoint with no query parameters will return all possible hops for all types."
This endpoint is somewhat redundant with meta_knowledge_graph, except that it returns less information. It is based on edges and has no nodes section, so it has no node curie prefixes, node attributes, or edge attributes. It only includes leaf node types, not every permutation. It gives the option to pre-filter for specific nodes, but without a use case I'm not sure how helpful that is. It also queries neo4j on every call and doesn't cache anything, so it can be slow. For that reason, I recently reworked simple_spec so that it returns a cached, pre-computed result (the full simple spec) when the parameters are left blank, because that case was especially slow for large graphs; currently the cached result doesn't filter for leaf node types. I wonder whether we should be querying neo4j for this, whether we should be caching results, whether this should just be part of the meta_knowledge_graph endpoint, or whether we need it at all.
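One way to picture the caching question is the minimal sketch below, which memoizes results per (source, target) pair with `functools.lru_cache`. Everything in it is an assumption for illustration: `query_graph_for_spec` is a hypothetical stand-in for the real neo4j query, the triples are made up, and the stand-in filters by category rather than by curie to keep the example short.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Counts how often the (hypothetical) graph query runs, to make caching visible.
CALLS = {"count": 0}


def query_graph_for_spec(source: Optional[str], target: Optional[str]) -> List[Triple]:
    """Hypothetical stand-in for the neo4j query behind simple_spec."""
    CALLS["count"] += 1
    triples = [
        ("biolink:Gene", "biolink:related_to", "biolink:Disease"),
        ("biolink:ChemicalEntity", "biolink:treats", "biolink:Disease"),
    ]
    return [
        t for t in triples
        if source in (None, t[0]) and target in (None, t[2])
    ]


@lru_cache(maxsize=128)
def simple_spec(source: Optional[str] = None, target: Optional[str] = None) -> Tuple[Triple, ...]:
    """Return the spec for one (source, target) pair, hitting the graph once per pair."""
    # A tuple is returned so callers cannot mutate the cached value.
    return tuple(query_graph_for_spec(source, target))
```

Note that `lru_cache` keys on how the function is called, so `simple_spec()` and `simple_spec(None, None)` are cached separately. A cache like this would also need to be cleared whenever the graph is reloaded.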

I could see us adding additional functionality as well. Should we support N-hop queries from a pinned node to a target type or something like that? Should we add endpoints related to subclass hierarchies?

Additionally, we recently combined the /1.4/ endpoints with everything else, which results in all of these being exposed on the smartapi registry. We could probably fairly easily split them out again if we wanted.

Expose operations in x-trapi

See https://github.com/NCATSTranslator/OperationsAndWorkflows/wiki/How-to-%22do%22-operations for how to add operations to the openapi spec.

All the Platers should currently expose "lookup". We may soon want to add others. This needs to be pushed to all Automat Platers.

Define and implement deductive inference rules

Looks like we have inverseOf (#21)

What about

owl:symmetric
owl:transitive
subClassOf-someValues chains

In particular, queries for phenotypes of Xeroderma Pigmentosum (XP) should return phenotypes annotated to subclass descendants of XP.

(I can provide full proof later, using owlstar, but for now doing the standard closure trick implemented in monarch and elsewhere is good)
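As a rough illustration of the closure trick mentioned here, the sketch below collects a class plus all of its subclass descendants and then unions the phenotypes annotated to any of them. The identifiers (`EX:...`) and both function names are hypothetical, and the data is held in plain Python structures rather than in the graph.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def descendants(root: str, subclass_edges: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return root plus every class reachable through (child, parent) subclass edges."""
    children = defaultdict(set)
    for child, parent in subclass_edges:
        children[parent].add(child)

    seen = {root}
    stack = [root]
    while stack:
        current = stack.pop()
        for child in children[current]:
            if child not in seen:  # guards against cycles in the hierarchy
                seen.add(child)
                stack.append(child)
    return seen


def phenotypes_with_closure(
    disease: str,
    subclass_edges: Iterable[Tuple[str, str]],
    has_phenotype: Dict[str, Set[str]],
) -> Set[str]:
    """Union the phenotypes annotated to the disease or to any subclass descendant."""
    result: Set[str] = set()
    for term in descendants(disease, subclass_edges):
        result |= has_phenotype.get(term, set())
    return result
```

With edges `EX:subtype_a1 -> EX:subtype_a -> EX:disease`, a query on `EX:disease` would pick up phenotypes annotated only to `EX:subtype_a1`.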

Returning 500 Error During Requests For Automat KP's

Automat KPs return 500 errors for some POST requests to the /query endpoint. This happens both when going through Strider and when calling the API directly. It only happens with certain data values, but a variety of requests trigger it. The attached file shows some examples of requests and data values that cause this error to be returned.

response_1625760453713.txt

Logging Query timing and metrics

Organize queries and their timings in the logs so that performance metrics can be extracted across queries.
If possible, provide the metrics as an endpoint.
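A minimal sketch of what this could look like, using only the standard library: a context manager logs each query's duration under a label, and a summary function returns the kind of payload a metrics endpoint might serve. The names `timed`, `metrics`, and `TIMINGS` are hypothetical, and the in-memory store is an assumption (it is per-process and unbounded).

```python
import logging
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import mean

logger = logging.getLogger("query_timing")

# In-memory store of durations per query label (assumption: one process, no eviction).
TIMINGS = defaultdict(list)


@contextmanager
def timed(label):
    """Record and log how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        TIMINGS[label].append(elapsed)
        logger.info("query=%s seconds=%.4f", label, elapsed)


def metrics():
    """Summarize the recorded timings per label."""
    return {
        label: {
            "count": len(values),
            "mean_seconds": mean(values),
            "max_seconds": max(values),
        }
        for label, values in TIMINGS.items()
    }
```

A handler would wrap its graph call in `with timed("one_hop"): ...`, and a metrics route could return `metrics()` as JSON.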

CAM Provider KG - new URL

This issue is to request that the URL for edge attributes in CAM Provider KG is updated to: https://github.com/NCATSTranslator/Translator-All/wiki/CAM-Provider-KG. I believe the current URL is https://github.com/NCATSTranslator/Translator-All/wiki, which isn't specific to CAM Provider KG.

Docker file update

The Dockerfile points to the older repo where PLATER used to live. It needs to be updated to point to this new repo.

Document use of about.json

Inefficient TRAPI

I want to find all chemicals connected to a disease by 2 hops. If I run as a cypher query:

cypher={"query": "MATCH (n:`biolink:Disease` {id:'MONDO:0008078'})-[x]-(q0)-[x1]-(c:`biolink:ChemicalEntity`) RETURN *"}

It runs fine in a few minutes.

But if I send the equivalent TRAPI

{
    "message": {
        "query_graph": {
            "nodes": {
                "disease": {
                    "ids": [
                        "MONDO:0008078"
                    ]
                },
                "nt_0": {
                    "categories": [
                        "biolink:NamedThing"
                    ]
                },
                "chemical": {
                    "categories": [
                        "biolink:ChemicalEntity"
                    ]
                }
            },
            "edges": {
                "edge_0": {
                    "subject": "disease",
                    "object": "nt_0"
                },
                "dedge": {
                    "subject": "nt_0",
                    "object": "chemical"
                }
            }
        }
    }
}

It never returns.

Increase test coverage

Add more unit tests and set up automated testing for the repo.

As a KP I need to be able to add examples for all types of calls to be able to pass all SmartAPI uptime checks.

Currently, including an example for a path requires adding a TRAPI message as a JSON file to the /examples directory.
This only supports calls that are TRAPI interface queries.

The SRI Reference graph has additional calls that are GET requests and take arguments as below:

For example:
https://trapi.monarchinitiative.org/docs#/default/node__node_type___curie__get

This call just takes a Biolink node type and a curie as part of the constructed URL,
e.g.
'https://trapi.monarchinitiative.org/biolink%3ADisease/MONDO%3A0000251'
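For reference, a URL of this shape can be built by percent-encoding each path segment. The sketch below uses `urllib.parse.quote` from the standard library; the helper name `node_url` is hypothetical.

```python
from urllib.parse import quote


def node_url(base_url: str, node_type: str, curie: str) -> str:
    """Build a /{node_type}/{curie} URL, percent-encoding the colons in each segment."""
    # safe="" makes quote() encode every reserved character, including ':' and '/'.
    return f"{base_url.rstrip('/')}/{quote(node_type, safe='')}/{quote(curie, safe='')}"


print(node_url("https://trapi.monarchinitiative.org", "biolink:Disease", "MONDO:0000251"))
# prints https://trapi.monarchinitiative.org/biolink%3ADisease/MONDO%3A0000251
```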

Modify Docker Deployment for FastAPI integration

/predicates not quite right

We want /predicates to only show leaf nodes, but here are the predicates for automat human-goa:

{
  "biolink:MacromolecularMachine": {
    "biolink:BiologicalProcess": [
      "biolink:actively_involved_in"
    ],
    "biolink:MolecularActivity": [
      "biolink:enables"
    ],
    "biolink:CellularComponent": [
      "biolink:related_to"
    ]
  },
  "biolink:GeneOrGeneProduct": {
    "biolink:BiologicalProcess": [
      "biolink:actively_involved_in"
    ],
    "biolink:MolecularActivity": [
      "biolink:enables"
    ],
    "biolink:CellularComponent": [
      "biolink:related_to"
    ]
  },
  "biolink:Gene": {
    "biolink:BiologicalProcess": [
      "biolink:actively_involved_in"
    ],
    "biolink:MolecularActivity": [
      "biolink:enables"
    ],
    "biolink:CellularComponent": [
      "biolink:related_to"
    ]
  },
  "biolink:MolecularActivity": {
    "biolink:MacromolecularMachine": [
      "biolink:enabled_by"
    ],
    "biolink:GeneOrGeneProduct": [
      "biolink:enabled_by"
    ],
    "biolink:Gene": [
      "biolink:enabled_by"
    ]
  },
  "biolink:CellularComponent": {
    "biolink:MacromolecularMachine": [
      "biolink:related_to"
    ],
    "biolink:GeneOrGeneProduct": [
      "biolink:related_to"
    ],
    "biolink:Gene": [
      "biolink:related_to"
    ]
  }
}

I think everything in there should be a gene, so I don't see why we would have the entries for GeneOrGeneProduct or MacromolecularMachine.
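To make the expected behaviour concrete, here is a small sketch that reduces a node's label set to its leaf labels by dropping any label that is an ancestor of another label on the same node. The `ANCESTORS` map is a hand-written assumption that mirrors only the labels seen in the output above; it is not loaded from the Biolink model, and `leaf_labels` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Set

# Hand-written ancestor map for illustration only (assumption, not read from Biolink).
ANCESTORS: Dict[str, Set[str]] = {
    "biolink:Gene": {"biolink:GeneOrGeneProduct", "biolink:MacromolecularMachine"},
    "biolink:GeneOrGeneProduct": {"biolink:MacromolecularMachine"},
}


def leaf_labels(labels: Iterable[str], ancestors: Dict[str, Set[str]] = ANCESTORS) -> Set[str]:
    """Keep only the labels that are not an ancestor of another label in the set."""
    labels = set(labels)
    covered: Set[str] = set()
    for label in labels:
        covered |= ancestors.get(label, set())
    return labels - covered


print(leaf_labels(["biolink:MacromolecularMachine", "biolink:GeneOrGeneProduct", "biolink:Gene"]))
# prints {'biolink:Gene'}
```

Applied to both ends of every edge before building the predicates map, this would leave only the `biolink:Gene` entries in the example above.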

Gracefully handle Broken edge from Client

While testing graceful error handling during the relay,
a query with a broken edge caused a timeout in Plater:

   "message":{
      "query_graph":{
         "nodes":{
            "n0":{
               
            },
            "n1":{
               
            }
         },
         "edges":{
            "e0":{
               "subject":"uh oh",
               "object":"n1"
            }
         }
      }
   }

But it should return a 400 error with details about the invalid edges.
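A check along these lines could run before any graph query is issued. The sketch below only validates that each edge's subject and object refer to a node key in the query graph; the function name `find_invalid_edges` and the wording of the messages are hypothetical.

```python
from typing import Any, Dict, List


def find_invalid_edges(query_graph: Dict[str, Any]) -> List[str]:
    """Return one message per edge endpoint that does not reference a known node."""
    nodes = query_graph.get("nodes", {})
    errors = []
    for edge_id, edge in query_graph.get("edges", {}).items():
        for role in ("subject", "object"):
            ref = edge.get(role)
            if ref not in nodes:
                errors.append(
                    f"edge '{edge_id}': {role} '{ref}' is not a node in the query graph"
                )
    return errors


broken = {
    "nodes": {"n0": {}, "n1": {}},
    "edges": {"e0": {"subject": "uh oh", "object": "n1"}},
}
print(find_invalid_edges(broken))
# prints ["edge 'e0': subject 'uh oh' is not a node in the query graph"]
```

If the returned list is non-empty, the request handler could respond with status 400 and include the messages in the response body instead of querying the graph.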

subpredicate query failing

Query:

q = {"message":{"query_graph":{
  "edges": {
    "e00": {
      "object": "n01",
      "predicates": [
        "biolink:located_in"
      ],
      "subject": "n00"
    }
  },
  "nodes": {
    "n00": {
      "ids": [
        "NCBIGene:5354"
      ]
    },
    "n01": {
      "categories": [
        "biolink:AnatomicalEntity"
      ]
    }
  }
}}}

(NCBIGene:5354)-[located_in]->(AnatomicalEntity)

Running against automat/hetio returns 0 results.

But, changing the query to

(NCBIGene:5354)-[expressed_in]->(AnatomicalEntity) returns 56 results.

However, expressed_in is_a located_in, so the first query should also return those 56 results.
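The expected behaviour amounts to expanding each requested predicate to itself plus its descendants before matching edges. A minimal sketch follows, assuming a hand-written child map that contains only the one relationship stated above; the names `SUBPREDICATES`, `expand_predicates`, and `cypher_type_filter` are hypothetical.

```python
from typing import Dict, List

# Only the relationship stated in this issue; a real map would come from the Biolink model.
SUBPREDICATES: Dict[str, List[str]] = {
    "biolink:located_in": ["biolink:expressed_in"],
}


def expand_predicates(predicates: List[str], children: Dict[str, List[str]] = SUBPREDICATES) -> List[str]:
    """Return each predicate followed by all of its descendants, without duplicates."""
    expanded: List[str] = []
    queue = list(predicates)
    while queue:
        predicate = queue.pop(0)
        if predicate in expanded:
            continue
        expanded.append(predicate)
        queue.extend(children.get(predicate, []))
    return expanded


def cypher_type_filter(predicates: List[str]) -> str:
    """Join predicates into a Cypher relationship-type alternation."""
    return "|".join(f"`{p}`" for p in predicates)


print(cypher_type_filter(expand_predicates(["biolink:located_in"])))
# prints `biolink:located_in`|`biolink:expressed_in`
```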

Incorrect qualifier TRAPI

Compare this example from the TRAPI distribution: https://github.com/NCATSTranslator/ReasonerAPI/blob/master/examples/Message/causes_predicate_vs_qualifier.json

With what Plater is producing:

"15159895": {
        "subject": "PUBCHEM.COMPOUND:24823",
        "object": "NCBIGene:3569",
        "predicate": "biolink:affects",
        "qualifiers": null,
        "attributes": [
            {
                "attribute_type_id": "biolink:aggregator_knowledge_source",
                "value": [
                    "infores:automat-robokop"
                ],
                "value_type_id": "biolink:InformationResource",
                "original_attribute_name": "biolink:aggregator_knowledge_source",
                "value_url": null,
                "attribute_source": "infores:automat-robokop",
                "description": null,
                "attributes": null
            },
            {
                "attribute_type_id": "biolink:object_direction_qualifier",
                "value": "increased",
                "value_type_id": "EDAM:data_0006",
                "original_attribute_name": "object_direction_qualifier",
                "value_url": null,
                "attribute_source": null,
                "description": null,
                "attributes": null
            },
            {
                "attribute_type_id": "biolink:object_aspect_qualifier",
                "value": "activity",
                "value_type_id": "EDAM:data_0006",
                "original_attribute_name": "object_aspect_qualifier",
                "value_url": null,
                "attribute_source": null,
                "description": null,
                "attributes": null
            },
            {
                "attribute_type_id": "biolink:primary_knowledge_source",
                "value": [
                    "infores:ctd"
                ],
                "value_type_id": "biolink:InformationResource",
                "original_attribute_name": "biolink:primary_knowledge_source",
                "value_url": null,
                "attribute_source": "infores:automat-robokop",
                "description": null,
                "attributes": null
            },
            {
                "attribute_type_id": "biolink:qualified_predicate",
                "value": "biolink:causes",
                "value_type_id": "EDAM:data_0006",
                "original_attribute_name": "qualified_predicate",
                "value_url": null,
                "attribute_source": null,
                "description": null,
                "attributes": null
            },

Specifically, the qualifiers should not be in the attributes, but in the "qualifiers" element.
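A sketch of the transformation being asked for: attributes that are really qualifiers are moved into the edge's "qualifiers" list as `qualifier_type_id` / `qualifier_value` pairs. The rule used to recognize a qualifier (the type id ends in `_qualifier`, or is `biolink:qualified_predicate`) is an assumption made for this example, and `split_qualifiers` is a hypothetical helper name.

```python
from typing import Any, Dict

# Qualifier type ids that do not end in "_qualifier" (assumption for this sketch).
QUALIFIER_TYPE_IDS = {"biolink:qualified_predicate"}


def is_qualifier(attribute_type_id: str) -> bool:
    return attribute_type_id in QUALIFIER_TYPE_IDS or attribute_type_id.endswith("_qualifier")


def split_qualifiers(edge: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the edge with qualifier-like attributes moved to 'qualifiers'."""
    qualifiers = list(edge.get("qualifiers") or [])
    attributes = []
    for attribute in edge.get("attributes") or []:
        if is_qualifier(attribute["attribute_type_id"]):
            qualifiers.append({
                "qualifier_type_id": attribute["attribute_type_id"],
                "qualifier_value": attribute["value"],
            })
        else:
            attributes.append(attribute)
    # The input edge is left untouched; qualifiers stays None when there are none.
    return {**edge, "qualifiers": qualifiers or None, "attributes": attributes}
```

Applied to the edge above, the knowledge-source attributes would stay in "attributes", while the direction, aspect, and qualified-predicate entries would move to "qualifiers".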

PLATER graph schema generation cypher

While loading a dataset with a large number of edges (~156675722), we discovered that the graph schema generation cypher is highly inefficient and doesn't complete in a reasonable time.

It might be better to modify the cypher to

MATCH (a)-[x]->(b) where not a:Concept and not b:Concept RETURN DISTINCT labels(a), type(x), labels(b)

and then do the permutations that the original cypher did in Python.
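The Python side of this could be as small as the sketch below. It assumes each returned row is a (labels(a), type(x), labels(b)) triple and that the permutations in question are every source-label / target-label pairing per row; the original cypher is not shown here, so that reading is an assumption, and `expand_schema` is a hypothetical name.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

Row = Tuple[List[str], str, List[str]]


def expand_schema(rows: Iterable[Row]) -> Set[Tuple[str, str, str]]:
    """Expand (source labels, edge type, target labels) rows into single-label triples."""
    triples: Set[Tuple[str, str, str]] = set()
    for source_labels, edge_type, target_labels in rows:
        for source, target in product(source_labels, target_labels):
            triples.add((source, edge_type, target))
    return triples
```

Because the query returns DISTINCT label combinations, the number of rows handled in Python should be far smaller than the number of edges in the graph.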