plater's People

Contributors

cbizon, evandietzmorris, phillipsowen, yaphetkg

plater's Issues

Returning 500 Error During Requests For Automat KPs

Automat KPs return 500 errors on POST requests to the /query endpoint. This happens both when going through Strider and when calling the API directly. It occurs only with certain data values, but a variety of requests trigger it. The attached file shows some examples of requests and data values that cause the error.

response_1625760453713.txt

ICEES KG endpoint example queries not compatible with ICEES

This issue is to report that several of the example queries for ICEES KG will return errors or empty results because they are not compatible with the underlying data.

For example, for the 'query reasoner via one of several inputs' functionality, the example query includes a MONDO identifier that ICEES KG does not support. I think this should be changed from MONDO:0004969 to MONDO:0004979. Likewise, the category for n0 should be changed from biolink:Gene to biolink:ChemicalEntity. With these changes, the query biolink:ChemicalEntity related_to biolink:Disease (MONDO:0004979) succeeds.

The example query for the overlay endpoint returns an error. This may be similar to the case above: I don't think the example query is something that ICEES KG can respond to.

With the meta KG endpoint, the identifier prefixes associated with Biolink categories appear to differ from those that ICEES KG actually uses. For instance,

    "biolink:SmallMolecule": {
      "id_prefixes": [
        "PUBCHEM.COMPOUND",
        "UNII",
        "CHEBI"

does not include RXNORMCUI, but the drugs included in ICEES KG are actually mapped to RXNORMCUI, not PUBCHEM.COMPOUND, UNII, or CHEBI. I think this happens because the identifier prefixes are pulled automatically from Plater/Automat, and I'm wondering whether it will be problematic.
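One way to surface this kind of mismatch is to compare the CURIE prefixes actually used by nodes against the `id_prefixes` advertised in the meta KG. The sketch below is only an illustration: it assumes the meta KG `nodes` section has already been loaded as a dict shaped like the snippet above, and the function name `missing_prefixes` and the sample CURIE are hypothetical placeholders, not real records.

```python
def missing_prefixes(meta_nodes, category, curies):
    """Return CURIE prefixes used in `curies` that the meta KG does not list for `category`."""
    advertised = set(meta_nodes.get(category, {}).get("id_prefixes", []))
    used = {curie.split(":", 1)[0] for curie in curies}
    return sorted(used - advertised)


# Shape copied from the meta KG snippet in this issue.
meta_nodes = {
    "biolink:SmallMolecule": {
        "id_prefixes": ["PUBCHEM.COMPOUND", "UNII", "CHEBI"]
    }
}

# Placeholder identifier (assumption), standing in for a drug node in the graph.
sample_curies = ["RXNORMCUI:0000"]
print(missing_prefixes(meta_nodes, "biolink:SmallMolecule", sample_curies))
```

A non-empty result would indicate that the advertised prefixes do not cover the identifiers the graph actually contains.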

Assigning @cbizon because I do not know who the point person is for Plater-related issues.

subpredicate query failing

Query:

q = {"message":{"query_graph":{
  "edges": {
    "e00": {
      "object": "n01",
      "predicates": [
        "biolink:located_in"
      ],
      "subject": "n00"
    }
  },
  "nodes": {
    "n00": {
      "ids": [
        "NCBIGene:5354"
      ]
    },
    "n01": {
      "categories": [
        "biolink:AnatomicalEntity"
      ]
    }
  }
}}}

(NCBIGene:5354)-[located_in]->(AnatomicalEntity)

Running against automat/hetio returns 0 results.

But, changing the query to

(NCBIGene:5354)-[expressed_in]->(AnatomicalEntity) returns 56 results.

However, expressed_in is_a located_in, so the first query should also return those 56 results.
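The expected behavior can be sketched as predicate expansion: before matching, the queried predicate is replaced by itself plus all of its descendants. This is a minimal sketch, not Plater's implementation; the `PARENT` table is a toy hierarchy containing only the one relationship stated in this issue, and `expand_predicate` is a hypothetical helper name.

```python
# Toy hierarchy (assumption): child predicate -> parent predicate.
# Only the relationship mentioned in this issue is included.
PARENT = {"biolink:expressed_in": "biolink:located_in"}


def expand_predicate(predicate, parent=PARENT):
    """Return `predicate` plus every predicate that descends from it."""
    result = {predicate}
    changed = True
    while changed:
        changed = False
        for child, par in parent.items():
            # A child joins the result once its parent is already included.
            if par in result and child not in result:
                result.add(child)
                changed = True
    return sorted(result)


print(expand_predicate("biolink:located_in"))
```

With such an expansion in place, a located_in query would also match edges stored as expressed_in.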

As a KP I need to be able to add examples for all types of calls to be able to pass all SmartAPI uptime checks.

Currently, including an example for a path requires adding a TRAPI message as a JSON file to the /examples directory.
This only supports calls that are TRAPI interface queries.

The SRI Reference graph has additional GET calls that take arguments, as shown below:

For example:
https://trapi.monarchinitiative.org/docs#/default/node__node_type___curie__get

This call just takes a Biolink node type and a CURIE as part of the constructed URL,
e.g.
'https://trapi.monarchinitiative.org/biolink%3ADisease/MONDO%3A0000251'
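An example for a GET call of this kind reduces to a pair of path parameters rather than a TRAPI message. The sketch below shows how such an example could be turned into the URL above; `node_url` is a hypothetical helper, and how examples would be stored in /examples is left open.

```python
from urllib.parse import quote


def node_url(base_url, node_type, curie):
    """Build the GET URL for a node lookup, percent-encoding each path segment."""
    # safe="" ensures ":" is encoded as %3A, matching the URL shown above.
    return f"{base_url.rstrip('/')}/{quote(node_type, safe='')}/{quote(curie, safe='')}"


print(node_url("https://trapi.monarchinitiative.org", "biolink:Disease", "MONDO:0000251"))
# prints https://trapi.monarchinitiative.org/biolink%3ADisease/MONDO%3A0000251
```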

Inefficient TRAPI

I want to find all chemicals connected to a disease by 2 hops. If I run it as a Cypher query:

cypher={"query": "MATCH (n:`biolink:Disease` {id:'MONDO:0008078'})-[x]-(q0)-[x1]-(c:`biolink:ChemicalEntity`) RETURN *"}

It runs fine in a few minutes.

But if I send the equivalent TRAPI query:

{
    "message": {
        "query_graph": {
            "nodes": {
                "disease": {
                    "ids": [
                        "MONDO:0008078"
                    ]
                },
                "nt_0": {
                    "categories": [
                        "biolink:NamedThing"
                    ]
                },
                "chemical": {
                    "categories": [
                        "biolink:ChemicalEntity"
                    ]
                }
            },
            "edges": {
                "edge_0": {
                    "subject": "disease",
                    "object": "nt_0"
                },
                "dedge": {
                    "subject": "nt_0",
                    "object": "chemical"
                }
            }
        }
    }
}

It never returns.

Docker file update

The Dockerfile is pointing to the older repo where PLATER used to live. It needs to be updated to point to this new repo.

PLATER graph schema generation cypher

While loading one of the datasets that has a very large number of edges (~156,675,722), we discovered that the graph schema generation Cypher was highly inefficient and did not complete in a reasonable time.

It might be better to modify the Cypher to:

MATCH (a)-[x]->(b) where not a:Concept and not b:Concept RETURN DISTINCT labels(a), type(x), labels(b)

And do the permutations that the original Cypher did in Python.
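The Python side of that idea could look roughly like the sketch below. It assumes "permutations" means every pairing of a source label with a target label for each returned row; the function name `expand_schema` and the nested-dict output shape are assumptions, not the format the original Cypher produced.

```python
from itertools import product


def expand_schema(rows):
    """Expand (labels(a), type(x), labels(b)) rows into source -> target -> {edge types}."""
    schema = {}
    for source_labels, edge_type, target_labels in rows:
        # Pair every source label with every target label for this edge type.
        for source, target in product(source_labels, target_labels):
            schema.setdefault(source, {}).setdefault(target, set()).add(edge_type)
    return schema


# Toy rows shaped like the result of the proposed Cypher.
rows = [
    (["biolink:Gene", "biolink:NamedThing"], "biolink:related_to", ["biolink:Disease"]),
]
print(expand_schema(rows))
```

Because the DISTINCT query returns one row per label-set combination rather than one per edge, the Python loop works over a much smaller input than the edge count.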

Logging Query timing and metrics

  • Organize queries and their timings in the logs so that performance metrics can be extracted across queries.
  • If possible, provide metrics as an endpoint.
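A minimal sketch of what this could look like, using only the standard library: a context manager records how long each labelled query takes, writes a structured log line, and keeps the durations in memory so a metrics endpoint could later return a summary. The logger name, the `timed` and `metrics_summary` helpers, and the in-memory store are all assumptions for illustration.

```python
import logging
import time
from collections import defaultdict
from contextlib import contextmanager

logger = logging.getLogger("plater.timing")  # hypothetical logger name
QUERY_TIMINGS = defaultdict(list)  # label -> list of durations in seconds


@contextmanager
def timed(label):
    """Time the enclosed block, log it, and record the duration under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        QUERY_TIMINGS[label].append(elapsed)
        logger.info("query=%s elapsed_s=%.4f", label, elapsed)


def metrics_summary():
    """Summarize recorded timings; a metrics endpoint could return this dict."""
    return {
        label: {"count": len(durations), "total_s": sum(durations), "max_s": max(durations)}
        for label, durations in QUERY_TIMINGS.items()
    }


with timed("trapi_query"):
    sum(range(1000))  # stand-in for running a query

print(metrics_summary())
```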

Reevaluate Plater API

It has been a while since we looked at the non-TRAPI and translator endpoints of the Plater API. It's not clear if people are using the one_hop, node, and simple_spec endpoints or for what purposes.

For one_hop and node, I think the parameters should be changed. Specifically, they both contain redundant/unnecessary parameters. They also use the base url path without an explicit endpoint, which I think could be confusing and potentially cause issues calling them unintentionally.

one_hop
/{source_type}/{target_type}/{curie}
Returns one-hop paths from source_type with curie to target_type. If the curie is specified, though, I'm not sure why the source type also needs to be specified. This could be changed to something like the following without losing functionality:
/one_hop/{curie}/{target_type}/

node
/{node_type}/{curie}
Returns a node matching curie.
Similarly, these parameters seem redundant to me; the node_type could be removed without losing anything.

simple_spec
/simple_spec with optional source and target url parameters
"Returns a list of available predicates when choosing a single source or target curie. Calling this endpoint with no query parameters will return all possible hops for all types."
This endpoint is somewhat redundant with meta_knowledge_graph, except that it returns less information. It's based on edges and has no nodes section, so it has no node curie prefixes or attributes, or edge attributes. It only includes leaf node types, not every permutation. It gives the option to pre-filter for specific nodes, but without a use case I'm not sure how helpful that is. It also queries Neo4j for results every time and doesn't cache anything, so it can be slow. For that reason, I recently reworked simple_spec so that it returns a cached, pre-computed result when the parameters are left blank (the full simple spec), because that case was especially slow for large graphs (though currently this doesn't filter for leaf node types). I wonder whether we should be querying Neo4j for this, whether we should be caching results, whether this should just be part of the meta_knowledge_graph endpoint, or whether we need it at all.

I could see us adding additional functionality as well. Should we support N-hop queries from a pinned node to a target type or something like that? Should we add endpoints related to subclass hierarchies?

Additionally, we recently combined the /1.4/ endpoints with everything else, which results in all of these being exposed on the smartapi registry. We could probably fairly easily split them out again if we wanted.

Gracefully handle Broken edge from Client

  • Testing during relay (graceful error handling):
    a query with a broken edge causes a timeout in Plater:
{
   "message":{
      "query_graph":{
         "nodes":{
            "n0":{},
            "n1":{}
         },
         "edges":{
            "e0":{
               "subject":"uh oh",
               "object":"n1"
            }
         }
      }
   }
}

But it should return a 400 error with details about the invalid edges.
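The check being asked for can be sketched as a validation pass that runs before any database query: every edge's subject and object must name a node in the query graph. This is an illustration under assumptions, not Plater's code; `find_invalid_edges` is a hypothetical helper and the message wording is made up.

```python
def find_invalid_edges(query_graph):
    """Return a message for each edge endpoint that does not reference a defined node."""
    nodes = query_graph.get("nodes", {})
    problems = []
    for edge_id, edge in query_graph.get("edges", {}).items():
        for end in ("subject", "object"):
            ref = edge.get(end)
            if ref not in nodes:
                problems.append(
                    f"edge '{edge_id}': {end} '{ref}' is not a node in the query graph"
                )
    return problems


# The broken query graph from this issue.
query_graph = {
    "nodes": {"n0": {}, "n1": {}},
    "edges": {"e0": {"subject": "uh oh", "object": "n1"}},
}

problems = find_invalid_edges(query_graph)
# A server would respond with 400 and the details instead of running the query.
status = 400 if problems else 200
print(status, problems)
```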

/predicates not quite right

We want /predicates to only show leaf nodes, but here are the predicates for automat human-goa:

{
  "biolink:MacromolecularMachine": {
    "biolink:BiologicalProcess": [
      "biolink:actively_involved_in"
    ],
    "biolink:MolecularActivity": [
      "biolink:enables"
    ],
    "biolink:CellularComponent": [
      "biolink:related_to"
    ]
  },
  "biolink:GeneOrGeneProduct": {
    "biolink:BiologicalProcess": [
      "biolink:actively_involved_in"
    ],
    "biolink:MolecularActivity": [
      "biolink:enables"
    ],
    "biolink:CellularComponent": [
      "biolink:related_to"
    ]
  },
  "biolink:Gene": {
    "biolink:BiologicalProcess": [
      "biolink:actively_involved_in"
    ],
    "biolink:MolecularActivity": [
      "biolink:enables"
    ],
    "biolink:CellularComponent": [
      "biolink:related_to"
    ]
  },
  "biolink:MolecularActivity": {
    "biolink:MacromolecularMachine": [
      "biolink:enabled_by"
    ],
    "biolink:GeneOrGeneProduct": [
      "biolink:enabled_by"
    ],
    "biolink:Gene": [
      "biolink:enabled_by"
    ]
  },
  "biolink:CellularComponent": {
    "biolink:MacromolecularMachine": [
      "biolink:related_to"
    ],
    "biolink:GeneOrGeneProduct": [
      "biolink:related_to"
    ],
    "biolink:Gene": [
      "biolink:related_to"
    ]
  }
}

I think everything in there should be a gene, so I don't see why we would have the entries for GeneOrGeneProduct or MacromolecularMachine.
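One way to get leaf-only output is to drop any type that is an ancestor of another type appearing at the same level of the response. The sketch below is a heuristic under stated assumptions: the `ANCESTORS` table is a toy stand-in covering only the three types discussed here (a real implementation would derive it from the Biolink Model), and it assumes no node in the graph has one of the dropped types as its most specific type.

```python
# Toy ancestor table (assumption), limited to the types discussed in this issue.
ANCESTORS = {
    "biolink:Gene": {"biolink:GeneOrGeneProduct", "biolink:MacromolecularMachine"},
    "biolink:GeneOrGeneProduct": {"biolink:MacromolecularMachine"},
}


def leaf_types(types, ancestors=ANCESTORS):
    """Keep only types that are not an ancestor of another type in the same set."""
    types = set(types)
    covered = set()
    for node_type in types:
        covered |= ancestors.get(node_type, set())
    return types - covered


def prune_predicates(predicates, ancestors=ANCESTORS):
    """Drop non-leaf source types and non-leaf target types from a /predicates response."""
    pruned = {}
    for source in leaf_types(predicates, ancestors):
        targets = predicates[source]
        pruned[source] = {t: targets[t] for t in leaf_types(targets, ancestors)}
    return pruned


# Abbreviated version of the response above.
predicates = {
    "biolink:MacromolecularMachine": {"biolink:MolecularActivity": ["biolink:enables"]},
    "biolink:GeneOrGeneProduct": {"biolink:MolecularActivity": ["biolink:enables"]},
    "biolink:Gene": {"biolink:MolecularActivity": ["biolink:enables"]},
    "biolink:MolecularActivity": {
        "biolink:MacromolecularMachine": ["biolink:enabled_by"],
        "biolink:GeneOrGeneProduct": ["biolink:enabled_by"],
        "biolink:Gene": ["biolink:enabled_by"],
    },
}
print(prune_predicates(predicates))
```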

Incorrect qualifier TRAPI

Compare this in the TRAPI dist: https://github.com/NCATSTranslator/ReasonerAPI/blob/master/examples/Message/causes_predicate_vs_qualifier.json

With what Plater is producing:

"15159895": {
        "subject": "PUBCHEM.COMPOUND:24823",
        "object": "NCBIGene:3569",
        "predicate": "biolink:affects",
        "qualifiers": null,
        "attributes": [
            {
                "attribute_type_id": "biolink:aggregator_knowledge_source",
                "value": [
                    "infores:automat-robokop"
                ],
                "value_type_id": "biolink:InformationResource",
                "original_attribute_name": "biolink:aggregator_knowledge_source",
                "value_url": null,
                "attribute_source": "infores:automat-robokop",
                "description": null,
                "attributes": null
            },
            {
                "attribute_type_id": "biolink:object_direction_qualifier",
                "value": "increased",
                "value_type_id": "EDAM:data_0006",
                "original_attribute_name": "object_direction_qualifier",
                "value_url": null,
                "attribute_source": null,
                "description": null,
                "attributes": null
            },
            {
                "attribute_type_id": "biolink:object_aspect_qualifier",
                "value": "activity",
                "value_type_id": "EDAM:data_0006",
                "original_attribute_name": "object_aspect_qualifier",
                "value_url": null,
                "attribute_source": null,
                "description": null,
                "attributes": null
            },
            {
                "attribute_type_id": "biolink:primary_knowledge_source",
                "value": [
                    "infores:ctd"
                ],
                "value_type_id": "biolink:InformationResource",
                "original_attribute_name": "biolink:primary_knowledge_source",
                "value_url": null,
                "attribute_source": "infores:automat-robokop",
                "description": null,
                "attributes": null
            },
            {
                "attribute_type_id": "biolink:qualified_predicate",
                "value": "biolink:causes",
                "value_type_id": "EDAM:data_0006",
                "original_attribute_name": "qualified_predicate",
                "value_url": null,
                "attribute_source": null,
                "description": null,
                "attributes": null
            },

Specifically, the qualifiers should not be in the attributes, but in the "qualifiers" element.
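The requested change can be sketched as a post-processing step that moves qualifier-typed attributes into the edge's "qualifiers" list, using the TRAPI `qualifier_type_id` / `qualifier_value` shape. This is only a sketch: `split_qualifiers` is a hypothetical helper, and `QUALIFIER_TYPES` lists just the three types visible in the output above rather than every Biolink qualifier.

```python
# Only the qualifier types that appear in the output above (assumption: not exhaustive).
QUALIFIER_TYPES = {
    "biolink:object_direction_qualifier",
    "biolink:object_aspect_qualifier",
    "biolink:qualified_predicate",
}


def split_qualifiers(edge, qualifier_types=QUALIFIER_TYPES):
    """Return a copy of `edge` with qualifier attributes moved into "qualifiers"."""
    attributes, qualifiers = [], []
    for attribute in edge.get("attributes") or []:
        if attribute["attribute_type_id"] in qualifier_types:
            qualifiers.append({
                "qualifier_type_id": attribute["attribute_type_id"],
                "qualifier_value": attribute["value"],
            })
        else:
            attributes.append(attribute)
    fixed = dict(edge)
    fixed["attributes"] = attributes
    fixed["qualifiers"] = (edge.get("qualifiers") or []) + qualifiers
    return fixed


# Abbreviated edge based on the output above.
edge = {
    "subject": "PUBCHEM.COMPOUND:24823",
    "object": "NCBIGene:3569",
    "predicate": "biolink:affects",
    "qualifiers": None,
    "attributes": [
        {"attribute_type_id": "biolink:primary_knowledge_source", "value": ["infores:ctd"]},
        {"attribute_type_id": "biolink:object_direction_qualifier", "value": "increased"},
    ],
}
print(split_qualifiers(edge)["qualifiers"])
```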

Define and implement deductive inference rules

Looks like we have inverseOf (#21)

What about

  • owl:symmetric
  • owl:transitive
  • subClassOf-someValues chains

In particular, queries for phenotypes of Xeroderma Pigmentosum (XP) should return phenotypes annotated to subclass descendants of XP.

(I can provide full proof later, using owlstar, but for now doing the standard closure trick implemented in monarch and elsewhere is good)
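As a rough illustration of the closure idea for the subclass case: compute all subclass descendants of the queried disease, then union the phenotypes annotated to any of them. The helper names and the `EX:` identifiers below are placeholders (assumptions), and this in-memory version only sketches the logic, not how Monarch or Plater implement it.

```python
def descendants(term, subclass_of):
    """Return `term` plus all of its subclass descendants.

    `subclass_of` maps each child term to the set of its direct parents.
    """
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    seen, stack = {term}, [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def phenotypes_with_closure(disease, subclass_of, has_phenotype):
    """Collect phenotypes annotated to `disease` or to any subclass descendant."""
    result = set()
    for term in descendants(disease, subclass_of):
        result |= has_phenotype.get(term, set())
    return result


# Placeholder identifiers, not real ontology terms.
subclass_of = {
    "EX:subtype_a": {"EX:disease"},
    "EX:subtype_a1": {"EX:subtype_a"},
}
has_phenotype = {
    "EX:disease": {"EX:phenotype_1"},
    "EX:subtype_a1": {"EX:phenotype_2"},
}
print(sorted(phenotypes_with_closure("EX:disease", subclass_of, has_phenotype)))
```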
