suihuanglab / improving-agent

An Autonomous Reasoning Agent in the NCATS Biomedical Translator Network that uses EHR and multi-omic cohorts to rank results from user queries

License: MIT License

Dockerfile 0.08% Python 90.61% CSS 0.34% JavaScript 5.46% HTML 3.44% Shell 0.07%

ncats-translator

improving-agent's Introduction

imProving Agent

imProving Agent is an Autonomous Reasoning Agent built on top of the Scalable Precision Medicine Oriented Knowledge Engine (SPOKE) and is part of the NCATS Biomedical Data Translator Network. It aims to improve user queries by using EHR and multi-omic cohorts to extract the best knowledge for a given concept. Use these links to find out more about imProving Agent's data, its algorithms, or some of its multi-omic cohort data.

Using imProving Agent as a client

Find a Jupyter notebook with some basic examples of appropriate Translator Reasoner API queries here.
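
For orientation, here is a minimal sketch of what a one-hop Translator Reasoner API (TRAPI) query body can look like. It is not taken from the notebook. The helper name build_one_hop_query, the node keys n0/n1, and the choice of categories and predicate are assumptions made for illustration; the CURIE NCBIGene:154 is borrowed from a result quoted later on this page.

```python
import json


def build_one_hop_query(subject_curie, subject_category, object_category, predicate):
    """Build a one-hop TRAPI query: a known subject node linked to an unknown object node.

    Hypothetical helper for illustration; it only assembles the JSON body.
    """
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # n0 is pinned to a specific identifier and category
                    "n0": {"ids": [subject_curie], "categories": [subject_category]},
                    # n1 is left open: any node of this category may be returned
                    "n1": {"categories": [object_category]},
                },
                "edges": {
                    "e0": {"subject": "n0", "object": "n1", "predicates": [predicate]},
                },
            }
        }
    }


# Illustrative values only (assumed, not from the project's examples)
query = build_one_hop_query(
    "NCBIGene:154", "biolink:Gene", "biolink:SmallMolecule", "biolink:related_to"
)
payload = json.dumps(query, indent=2)
print(payload)
```

The resulting payload would be sent as the body of an HTTP POST to the query endpoint of whichever imProving Agent deployment you are targeting.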

Deploying

SPOKE database

imProving Agent relies on a Bolt connection to a Neo4j instance of SPOKE.

PSEVs

For ranking, imProving Agent relies on PSEVs, which are accessed via the psev-service.

Environment variables

Depending on the environment in which imProving Agent is to be run, a number of environment variables must be set.

ITRB deployment

The following variables must be set for deployment in the ITRB test, staging, and production environments:

APP_ENV_IA should be set to itrb
AWS_REGION should be set to the AWS region in which the secrets noted below are stored.
NEO4J_SPOKE_HOSTNAME should be set to the environment-specific (CI, Production, Test) hostname of the instance hosting SPOKE
NEO4J_SECRETS_NAME should be set to the environment-specific name of the secret that contains the authentication credentials for SPOKE
PSEV_SECRETS_NAME should be set to the environment-specific name of the secrets that will be used to authenticate with the psev-service
PSEV_SERVICE_HOSTNAME should be set to the environment-specific hostname of the instance hosting the psev-service. In most cases, this should be set to the same value as the NEO4J hostname above

Local and dev deployment

APP_ENV_IA should be set to local or dev
NEO4J_SPOKE_HOSTNAME should be set to the docker network name of the SPOKE instance to which imProving Agent will connect. In some cases, e.g. when running against remote databases, this can be set to the remote hostname
NEO4J_SPOKE_PASS should be set to the password of the local SPOKE Neo4j instance
NEO4J_SPOKE_USER should be set to the user of the local SPOKE Neo4j instance
PSEV_API_KEY should be set to the value of the key that will be used to connect with the local psev-service instance
PSEV_SERVICE_HOSTNAME should be set to the docker network name of the psev-service to which imProving Agent will connect
AWS_SHARED_CREDENTIALS_FILE should be set, when running locally against remote resources in AWS, to the location where boto3 can find your AWS credentials
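
The two variable lists above can be checked before starting the service. The following is a minimal sketch of such a check, assuming the variable names exactly as listed; the REQUIRED_VARS table and the missing_variables function are hypothetical and are not part of imProving Agent's code. AWS_SHARED_CREDENTIALS_FILE is left out because it is only needed in some local setups.

```python
import os

# Required variables per APP_ENV_IA value, transcribed from the lists above.
_LOCAL_VARS = [
    "NEO4J_SPOKE_HOSTNAME",
    "NEO4J_SPOKE_PASS",
    "NEO4J_SPOKE_USER",
    "PSEV_API_KEY",
    "PSEV_SERVICE_HOSTNAME",
]
REQUIRED_VARS = {
    "itrb": [
        "AWS_REGION",
        "NEO4J_SPOKE_HOSTNAME",
        "NEO4J_SECRETS_NAME",
        "PSEV_SECRETS_NAME",
        "PSEV_SERVICE_HOSTNAME",
    ],
    "local": _LOCAL_VARS,
    "dev": _LOCAL_VARS,
}


def missing_variables(env=None):
    """Return the names of required variables that are unset or empty.

    `env` is any mapping of names to values; it defaults to os.environ.
    Raises ValueError if APP_ENV_IA is not itrb, local, or dev.
    """
    env = os.environ if env is None else env
    app_env = env.get("APP_ENV_IA")
    if app_env not in REQUIRED_VARS:
        raise ValueError(f"APP_ENV_IA must be one of {sorted(REQUIRED_VARS)}, got {app_env!r}")
    return [name for name in REQUIRED_VARS[app_env] if not env.get(name)]


# Example with an explicit mapping instead of the real environment
example_env = {"APP_ENV_IA": "local", "NEO4J_SPOKE_HOSTNAME": "spoke"}
print(missing_variables(example_env))
```

Calling missing_variables() with no argument inspects the real process environment, which is how it would typically be used in a startup script.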

Networking

The imProving Agent HTTP service will be exposed on port 3031 and uwsgi stats will be exposed on port 3032. When running locally via docker-compose, these ports are bound to host ports 3033 and 3034, respectively.
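
As a small illustration of the mapping just described, the sketch below records the container-to-host port bindings used under docker-compose. The names CONTAINER_TO_HOST_PORT and host_port are hypothetical and exist only for this example.

```python
# Container port -> host port when running locally via docker-compose,
# as stated above: 3031 (HTTP service) and 3032 (uwsgi stats).
CONTAINER_TO_HOST_PORT = {3031: 3033, 3032: 3034}


def host_port(container_port):
    """Return the host port bound to a given container port (local docker-compose only)."""
    return CONTAINER_TO_HOST_PORT[container_port]


print(host_port(3031))  # host port for the HTTP service
print(host_port(3032))  # host port for uwsgi stats
```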

Running imProving Agent

Given the dependencies described above, start the service with docker-compose up web

improving-agent's People

aojesanmi wangk8 sphinx-automation bettyli037

improving-agent's Issues

SSL Certificate is expired on dev

I am getting SSL certificate errors when sending queries to https://spokekp.healthdatascience.cloud/api/v1.4/query

Node normalization

Currently, we only accept CURIEs that are native to SPOKE. We need to set up our client to request SRI resources that resolve and map CURIEs from other databases.

Integrate COHD

The COHD KP offers easily ingestible data on clinical co-occurrence for concepts in the EHR. The data are encoded with OMOP, which should make for an easy mapping to SPOKE with existing code or the SRI node-normalization service.

scoring issue with MVP1 post KL/AT updates

From Sarah:
it looks up in Test, i see scores and subgraphs for MVP2/genes but I'm not seeing scores for MVP1, is that expected?

here is the MVP1/Disease 28af66c5-0292-4d9a-9273-f92938ffe052

Support qNodes with ids but no categories

As surfaced in NCATSTranslator/minihackathons#11, the imProving Agent does not support qNodes that have IDs but no categories.

This was an intentional design decision: the category is used to retrieve the relevant node-specific identifier regex, which prevents Cypher injection, and to skip querying the node normalizer when the identifier is recognized. However, the code could be updated to always query the node normalizer; the node category it returns would then allow retrieval of the relevant regex.
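
A minimal sketch of the proposed behavior follows. It is not the project's implementation: the regex table, the validate_qnode function, and the lookup_category callable (a stand-in for a call to the node normalizer) are all hypothetical, and the patterns shown are illustrative rather than SPOKE's actual identifier rules.

```python
import re

# Hypothetical category -> identifier pattern table (illustrative patterns only)
IDENTIFIER_PATTERNS = {
    "biolink:Gene": re.compile(r"NCBIGene:\d+"),
    "biolink:Disease": re.compile(r"(DOID|MONDO):\d+"),
}


def validate_qnode(qnode, lookup_category):
    """Return the qNode's ids if every id fully matches a known pattern.

    When the qNode has no categories, `lookup_category(curie)` supplies one;
    in a real service this would be a node normalizer request.
    Raises ValueError for any id that cannot be validated, so that
    unvalidated strings never reach a Cypher query.
    """
    categories = qnode.get("categories")
    validated = []
    for curie in qnode.get("ids") or []:
        candidates = categories or [lookup_category(curie)]
        if not any(
            category in IDENTIFIER_PATTERNS
            and IDENTIFIER_PATTERNS[category].fullmatch(curie)
            for category in candidates
        ):
            raise ValueError(f"unrecognized identifier: {curie!r}")
        validated.append(curie)
    return validated


# Stub normalizer for the example: maps known CURIEs to a category, else None
fake_normalizer = {"NCBIGene:154": "biolink:Gene"}.get

print(validate_qnode({"ids": ["NCBIGene:154"]}, fake_normalizer))
```

Using fullmatch rather than match or search matters here: a string such as an identifier followed by injected Cypher would pass a prefix match but fails a full match.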

Broader KP interaction

As surfaced here: NCATSTranslator/minihackathons#71 (comment) we do not query data from the valuable and unique ICEES APIs that provide exposures and other EHR data that could be integrated into our results and ranking algorithm.

Beyond that, there are many KPs that we do not interact with. This issue will track progress on further integration.

Value of the info.x-trapi.test_data_location property for the improving-agent ARA entry in the Translator SmartAPI Registry

The current value (as of the morning of August 4th, 2022) of the info.x-trapi.test_data_location property for the improving-agent ARA entry in the Translator SmartAPI Registry points to a non-existent GitHub target. Please set the URL to a single JSON file resource, preferably at a https://raw.githubusercontent.com/ endpoint if it is hosted on GitHub. It can actually be anywhere on the internet, as long as it is a REST-accessible JSON file resource.

imProving Agent is returning results with a score of 0

ARS has noticed that imProving Agent is sending results with a score value of 0.
This is the parent PK where we encountered the issue:
https://ars.test.transltr.io/ars/api/messages/d3a8e46d-56c4-42bd-a76e-d8a073fc5b88?trace=y

tests

Write tests to cover existing code and define a framework and standards that PRs must meet in re: testing

TRAPI 1.0

Bring imProving Agent to TRAPI 1.0 compliance

Move to ITRB

Per @cbizon

Please move production to ITRB prod asap
Also we will want ITRB testing and staging soon

Support for directed edges

Surfaced here: NCATSTranslator/minihackathons#41

Earlier, when we wanted to expose as many results as possible, we wrote our query logic to make all edges undirected. This usually wasn't an issue given a predicate and two different node types; however, this query shows a case in which the edge direction matters for two nodes of the same type.

Supporting this requires some additional query logic to be written in the TRAPI to Cypher translation. Alternatively, we could leave that logic alone and filter results that don't match the intended edge direction. The former case would be cleaner and more computationally efficient, but could lead to issues where Biolink modeling does not agree with SPOKE modeling. The latter may be the simpler, though less efficient approach.
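
The second option, filtering results after the undirected query has run, could look roughly like the sketch below. It assumes a simplified TRAPI-like result shape with node_bindings and edge_bindings keyed by query graph ids; the function name matches_direction and the edge ids k0/k1 are hypothetical, and the gene CURIEs are used only as sample data.

```python
def matches_direction(result, query_edges, kg_edges):
    """Return True if every bound knowledge graph edge runs in the queried direction.

    query_edges: qedge id -> {"subject": qnode id, "object": qnode id}
    kg_edges:    kg edge id -> {"subject": node id, "object": node id}
    """
    for qedge_id, qedge in query_edges.items():
        subject_ids = {b["id"] for b in result["node_bindings"][qedge["subject"]]}
        object_ids = {b["id"] for b in result["node_bindings"][qedge["object"]]}
        for binding in result["edge_bindings"][qedge_id]:
            kg_edge = kg_edges[binding["id"]]
            if kg_edge["subject"] not in subject_ids or kg_edge["object"] not in object_ids:
                return False
    return True


# Two nodes of the same type, one edge in each direction
query_edges = {"e0": {"subject": "n0", "object": "n1"}}
kg_edges = {
    "k0": {"subject": "NCBIGene:154", "object": "NCBIGene:348"},
    "k1": {"subject": "NCBIGene:348", "object": "NCBIGene:154"},
}
node_bindings = {"n0": [{"id": "NCBIGene:154"}], "n1": [{"id": "NCBIGene:348"}]}
results = [
    {"node_bindings": node_bindings, "edge_bindings": {"e0": [{"id": "k0"}]}},
    {"node_bindings": node_bindings, "edge_bindings": {"e0": [{"id": "k1"}]}},
]

kept = [r for r in results if matches_direction(r, query_edges, kg_edges)]
print(len(kept))
```

Only the result bound to k0 survives, because k1 runs from the n1 node to the n0 node. As noted above, this approach still pays the cost of retrieving the reversed edges before discarding them.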

Investigate intermediate node results membership

From Mark:

We're seeing results come back like this:

"node_bindings": {
  "on": [
    {
      "id": "NCBIGene:154",
      "query_id": null,
      "attributes": null
    }
  ],
  "sn": [
    {
      "id": "PUBCHEM.COMPOUND:5311065",
      "query_id": null,
      "attributes": null
    }
  ],
  "intermediate_gene_96062": [
    {
      "id": "NCBIGene:348",
      "query_id": null,
      "attributes": null
    }
  ]
}

with the extra "intermediate_gene" node. Is this intentional? At the moment, it throws off the ARS merge behavior because we determine whether two results are the same by looking for the set of nodes in the node bindings to match. So, this one doesn't match with the results returned by other ARAs that are the same, except for lacking the intermediary node.

This is in CI with PK = 21791a56-966b-477b-9f6e-7fb3450387cf
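
To make the mismatch concrete, the sketch below compares results by the set of node ids in their node bindings, as the report describes. It is an illustration, not the ARS's actual merge code: the result_key function and its optional qnode_ids argument (which restricts the comparison to keys present in the query graph) are hypothetical.

```python
def result_key(result, qnode_ids=None):
    """Return the set of bound node ids, used here as a result's identity.

    If qnode_ids is given, only bindings for those query graph keys are counted,
    which ignores extra keys such as intermediate nodes.
    """
    return frozenset(
        binding["id"]
        for key, bindings in result["node_bindings"].items()
        if qnode_ids is None or key in qnode_ids
        for binding in bindings
    )


with_intermediate = {
    "node_bindings": {
        "on": [{"id": "NCBIGene:154"}],
        "sn": [{"id": "PUBCHEM.COMPOUND:5311065"}],
        "intermediate_gene_96062": [{"id": "NCBIGene:348"}],
    }
}
without_intermediate = {
    "node_bindings": {
        "on": [{"id": "NCBIGene:154"}],
        "sn": [{"id": "PUBCHEM.COMPOUND:5311065"}],
    }
}

# Compared over all bindings, the two results look different
print(result_key(with_intermediate) == result_key(without_intermediate))

# Compared only over the query graph's own keys, they match
qnodes = {"on", "sn"}
print(result_key(with_intermediate, qnodes) == result_key(without_intermediate, qnodes))
```

Whether the fix belongs in the agent (not emitting the extra binding) or in the merge step (ignoring keys absent from the query graph) is a design question this sketch does not settle.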

Docker and CI/CD

fully Dockerize the (im)PROVE agent
-- SPOKE is currently running in Docker, but we need to get nginx, uwsgi, and the actual evidARA Python code into images as well
decide and implement a CI/CD framework
-- evidARA is currently running in a VPC in AWS, so the most likely choice is alongside other services in CodePipeline
-- this could likely exist in the free tier of Github's Actions or CircleCI, so those options will also be evaluated

ARS Integration

Improving Agent does not currently respond to queries sent through the ARS. I believe this may be an issue related to TRAPI versioning
https://ars.transltr.io/ars/api/messages/9d330568-6e65-4e23-8465-b66fb321c955

disease identifier (minor issue)

If I query: What disease { cond_assoc_w_gene | gene_assoc_w_cond | genetic_association } with gene MUC5B (727897), then all ARAs return an important answer: IPF, Mondo ID 0008345. However, imProving Agent returns ILD2, Mondo ID 0800029, which is one level below IPF. Is this expected? It cannot be de-duplicated against the IPF results from the other ARAs.

@brettasmi