improving-agent's Introduction

imProving Agent

imProving Agent is an Autonomous Reasoning Agent built on top of the Scalable Precision Medicine Oriented Knowledge Engine (SPOKE) and is part of the NCATS Biomedical Data Translator Network. It aims to improve user queries by using EHR and multi-omic cohorts to extract the best knowledge for a given concept. Use these links to find out more about imProving Agent's data, its algorithms, or some of its multi-omic cohort data.

Using imProving Agent as a client

Find a Jupyter notebook with some basic examples of appropriate Translator Reasoner API queries here.
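
For orientation, a one-hop Translator Reasoner API (TRAPI) query can be sketched as a plain Python dict. This is a minimal illustration, not a copy of the notebook: the helper name build_one_hop_query, the node and edge keys (n0, n1, e0), and the chosen CURIE, categories, and predicate are assumptions, and the plural field names (ids, categories, predicates) follow TRAPI 1.1 and later. Whether this particular query returns results from imProving Agent is not confirmed here.

```python
import json


def build_one_hop_query(subject_curie, subject_category, object_category, predicate):
    """Return a TRAPI-style one-hop query as a plain dict (hypothetical helper)."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # n0 is pinned to a specific identifier
                    "n0": {"ids": [subject_curie], "categories": [subject_category]},
                    # n1 is left open: any node of the requested category
                    "n1": {"categories": [object_category]},
                },
                "edges": {
                    "e0": {"subject": "n0", "object": "n1", "predicates": [predicate]},
                },
            }
        }
    }


# Illustrative values only; the CURIE is borrowed from an issue further down.
query = build_one_hop_query(
    "NCBIGene:154", "biolink:Gene", "biolink:Disease", "biolink:related_to"
)
print(json.dumps(query, indent=2))
```

The resulting JSON is the body that would be POSTed to a TRAPI query endpoint; the endpoint URL is deliberately left out because it depends on the deployment.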

Deploying

SPOKE database

imProving Agent relies on a bolt connection to a Neo4j instance of SPOKE.

PSEVs

For ranking, imProving Agent relies on PSEVs, which are accessed via the psev-service.

Environment variables

Depending on the environment in which imProving Agent is to be run, a number of environment variables must be set.

ITRB deployment

The following variables must be set for deployment in the ITRB test, staging, and production environments:

  • APP_ENV_IA should be set to itrb
  • AWS_REGION should be set to the AWS region in which the secrets noted below are stored.
  • NEO4J_SPOKE_HOSTNAME should be set to the environment-specific (CI, Production, Test) hostname of the instance hosting SPOKE
  • NEO4J_SECRETS_NAME should be set to the environment-specific name of the secret that contains the authentication credentials for SPOKE
  • PSEV_SECRETS_NAME should be set to the environment-specific name of the secrets that will be used to authenticate with the psev-service
  • PSEV_SERVICE_HOSTNAME should be set to the environment-specific hostname of the instance hosting the psev-service. In most cases, this should be the same value as NEO4J_SPOKE_HOSTNAME above

Local and dev deployment

  • APP_ENV_IA should be set to local or dev
  • NEO4J_SPOKE_HOSTNAME should be set to the docker network name of the SPOKE instance to which imProving Agent will connect. In some cases, e.g. when running against remote databases, this can be set to the remote hostname
  • NEO4J_SPOKE_PASS should be set to the password of the local SPOKE Neo4j instance
  • NEO4J_SPOKE_USER should be set to the user of the local SPOKE Neo4j instance
  • PSEV_API_KEY should be set to the value of the key that will be used to connect with the local psev-service instance
  • PSEV_SERVICE_HOSTNAME should be set to the docker network name of the psev-service to which imProving Agent will connect
  • AWS_SHARED_CREDENTIALS_FILE should be set, when running locally against remote resources in AWS, to the location where boto3 can find your AWS credentials
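
The variable lists above can be checked before starting the service. The following is a minimal sketch, assuming only the variable names documented here; the function name missing_env_vars and the REQUIRED_VARS table are hypothetical and are not part of imProving Agent. AWS_SHARED_CREDENTIALS_FILE is left out because it is only needed in some local setups.

```python
import os

# Variable names come from the lists above; the grouping is illustrative.
ITRB_VARS = [
    "AWS_REGION",
    "NEO4J_SPOKE_HOSTNAME",
    "NEO4J_SECRETS_NAME",
    "PSEV_SECRETS_NAME",
    "PSEV_SERVICE_HOSTNAME",
]
LOCAL_VARS = [
    "NEO4J_SPOKE_HOSTNAME",
    "NEO4J_SPOKE_PASS",
    "NEO4J_SPOKE_USER",
    "PSEV_API_KEY",
    "PSEV_SERVICE_HOSTNAME",
]
REQUIRED_VARS = {"itrb": ITRB_VARS, "local": LOCAL_VARS, "dev": LOCAL_VARS}


def missing_env_vars(environ=None):
    """Return the required variables that are unset or empty for APP_ENV_IA."""
    environ = os.environ if environ is None else environ
    app_env = environ.get("APP_ENV_IA")
    if app_env not in REQUIRED_VARS:
        raise ValueError(f"APP_ENV_IA must be one of {sorted(REQUIRED_VARS)}, got {app_env!r}")
    return [name for name in REQUIRED_VARS[app_env] if not environ.get(name)]
```

Passing a dict instead of reading os.environ makes the check easy to try out, for example missing_env_vars({"APP_ENV_IA": "local", "NEO4J_SPOKE_HOSTNAME": "spoke"}) lists the four remaining local variables.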

Networking

The imProving Agent HTTP service will be exposed on port 3031 and uwsgi stats will be exposed on port 3032. When running locally via docker-compose, these ports are bound to host ports 3033 and 3034, respectively.
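
When running via docker-compose, a quick way to see whether the two host ports are accepting connections is a plain TCP check. This is a minimal sketch: the port numbers are the host bindings stated above, while the function name port_is_open and the HOST_PORTS table are illustrative. An open port only shows that something is listening, not that the service is healthy.

```python
import socket

# Host ports bound by docker-compose, as described above.
HOST_PORTS = {"http": 3033, "uwsgi_stats": 3034}


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    for name, port in HOST_PORTS.items():
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"{name} (localhost:{port}): {state}")
```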

Running imProving Agent

Given the dependencies described above, start the service with docker-compose up web

improving-agent's People

Contributors

brettasmi, aojesanmi

improving-agent's Issues

Node normalization

Currently, we only accept CURIEs that are native to SPOKE. We need to set up our client to query SRI resources so that CURIEs from other databases can be resolved and mapped.

Integrate COHD

The COHD KP offers easily ingestible data on clinical co-occurrence for concepts in the EHR. The data are encoded with OMOP, which should make for an easy mapping to SPOKE with existing code or the SRI node-normalization service.

scoring issue with MVP1 post KL/AT updates

From Sarah:
it looks up in Test, i see scores and subgraphs for MVP2/genes but I'm not seeing scores for MVP1, is that expected?

here is the MVP1/Disease 28af66c5-0292-4d9a-9273-f92938ffe052

Support qNodes with ids but no categories

As surfaced in NCATSTranslator/minihackathons#11, imProving Agent does not support qNodes that have IDs but no categories.

This was an intentional design decision: the category is used to retrieve the relevant node-specific identifier regexes, which prevent Cypher injection, and to skip querying the node normalizer if the identifier is recognized. However, this could be updated to always query the node normalizer, and the node category it returns would then allow retrieval of the relevant regex.
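
The category-keyed validation described above, and the proposed fallback, can be sketched as follows. The patterns, the function name validate_qnode_id, and the lookup_category callback (a stand-in for a node normalizer call) are all hypothetical; the real imProving Agent regexes are not shown in this document.

```python
import re

# Hypothetical per-category identifier patterns, for illustration only.
IDENTIFIER_PATTERNS = {
    "biolink:Gene": re.compile(r"NCBIGene:\d+"),
    "biolink:Disease": re.compile(r"MONDO:\d{7}"),
}


def validate_qnode_id(curie, category=None, lookup_category=None):
    """Return True only if the CURIE fully matches the pattern for its category.

    If no category is given and a lookup_category callable is supplied
    (standing in for a node normalizer), it is used to resolve the category.
    """
    if category is None and lookup_category is not None:
        category = lookup_category(curie)
    pattern = IDENTIFIER_PATTERNS.get(category)
    # fullmatch rejects trailing text, so injected query fragments fail here
    return pattern is not None and pattern.fullmatch(curie) is not None
```

Without a category and without a lookup there is no pattern to apply, so the identifier is rejected, which mirrors the current limitation. With a lookup, the same qNode can be validated.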

Broader KP interaction

As surfaced in NCATSTranslator/minihackathons#71 (comment), we do not query data from the valuable and unique ICEES APIs, which provide exposures and other EHR data that could be integrated into our results and ranking algorithm.

Beyond that, there are many KPs that we do not interact with. This issue will track progress on further integration.

Value of the info.x-trapi.test_data_location property for the improving-agent ARA entry in the Translator SmartAPI Registry

As of the morning of August 4th, 2022, the info.x-trapi.test_data_location property for the improving-agent ARA entry in the Translator SmartAPI Registry points to a non-existent GitHub target. Please set the URL to a single JSON file resource, preferably at a https://raw.githubusercontent.com/ endpoint if it is hosted on GitHub. It can in fact live anywhere on the internet, as long as it is a REST-accessible JSON file resource.

tests

Write tests to cover existing code and define a framework and standards that PRs must meet in re: testing

TRAPI 1.0

Bring imProving Agent to TRAPI 1.0 compliance

Move to ITRB

Per @cbizon

Please move production to ITRB prod asap
Also we will want ITRB testing and staging soon

Support for directed edges

Surfaced here: NCATSTranslator/minihackathons#41

In an earlier time when we wanted to expose as many results as possible, we wrote our query logic to make all edges undirected. This usually wasn't an issue given a predicate and two different node types. However, this query shows an issue in which the edge direction is important for two nodes of the same type.

Supporting this requires some additional query logic to be written in the TRAPI to Cypher translation. Alternatively, we could leave that logic alone and filter out results that don't match the intended edge direction. The former would be cleaner and more computationally efficient, but could lead to issues where Biolink modeling does not agree with SPOKE modeling. The latter may be the simpler, though less efficient, approach.
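
The second option, filtering after an undirected match, can be sketched like this. The result shape (a bindings dict plus an edge with start and end IDs) and the function names are assumptions made for illustration; they do not reflect imProving Agent's internal data structures.

```python
def matches_direction(result, subject_key, object_key):
    """True if the stored edge runs from the subject binding to the object binding."""
    edge = result["edge"]
    bindings = result["bindings"]
    return edge["start"] == bindings[subject_key] and edge["end"] == bindings[object_key]


def filter_directed(results, subject_key, object_key):
    """Drop results whose underlying edge points the wrong way."""
    return [r for r in results if matches_direction(r, subject_key, object_key)]


# Two gene-gene results found by an undirected match for n0 -> n1.
results = [
    {"bindings": {"n0": "NCBIGene:154", "n1": "NCBIGene:348"},
     "edge": {"start": "NCBIGene:154", "end": "NCBIGene:348"}},
    {"bindings": {"n0": "NCBIGene:154", "n1": "NCBIGene:348"},
     "edge": {"start": "NCBIGene:348", "end": "NCBIGene:154"}},
]
kept = filter_directed(results, "n0", "n1")
```

Only the first result survives, because its edge starts at the node bound to n0. The cost is that the wrong-direction rows are still fetched from the database before being discarded.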

Investigate intermediate node results membership

From Mark:

We're seeing results come back like this:

"node_bindings": {
  "on": [
    {
      "id": "NCBIGene:154",
      "query_id": null,
      "attributes": null
    }
  ],
  "sn": [
    {
      "id": "PUBCHEM.COMPOUND:5311065",
      "query_id": null,
      "attributes": null
    }
  ],
  "intermediate_gene_96062": [
    {
      "id": "NCBIGene:348",
      "query_id": null,
      "attributes": null
    }
  ]
}

with the extra "intermediate_gene" node.  Is this intentional?  At the moment, it throws off the ARS merge behavior because we determine whether two results are the same by looking for the set of nodes in the node bindings to match.  So, this one doesn't match with the results returned by other ARAs that are the same, except for lacking the intermediary node.

This is in CI with PK = 21791a56-966b-477b-9f6e-7fb3450387cf
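
The merge behavior described above, comparing the set of bound node IDs, can be sketched to show why the extra binding prevents a match. This is an illustration only: the ARS merge code is not shown in this issue, and the function name bound_node_ids and its query_keys parameter are hypothetical. Restricting the comparison to query-graph keys is one possible workaround, not a confirmed fix.

```python
def bound_node_ids(result, query_keys=None):
    """Collect the IDs bound in a result, optionally limited to given binding keys."""
    return frozenset(
        binding["id"]
        for key, bindings in result["node_bindings"].items()
        if query_keys is None or key in query_keys
        for binding in bindings
    )


# Result shaped like the example above, with the extra intermediate node.
ia_result = {
    "node_bindings": {
        "on": [{"id": "NCBIGene:154"}],
        "sn": [{"id": "PUBCHEM.COMPOUND:5311065"}],
        "intermediate_gene_96062": [{"id": "NCBIGene:348"}],
    }
}
# The same answer as another ARA might return it, without the intermediate node.
other_result = {
    "node_bindings": {
        "on": [{"id": "NCBIGene:154"}],
        "sn": [{"id": "PUBCHEM.COMPOUND:5311065"}],
    }
}

same_all_keys = bound_node_ids(ia_result) == bound_node_ids(other_result)
same_query_keys = (
    bound_node_ids(ia_result, {"on", "sn"}) == bound_node_ids(other_result, {"on", "sn"})
)
```

Comparing every binding treats the two results as different, while comparing only the on and sn bindings treats them as the same answer.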

Docker and CI/CD

  • fully Dockerize the (im)PROVE agent
    -- SPOKE is currently running in Docker, but we need to get nginx, uwsgi, and the actual evidARA Python code into images as well

  • decide and implement a CI/CD framework
    -- evidARA is currently running in a VPC in AWS, so the most likely choice is alongside other services in CodePipeline
    -- this could likely run in the free tier of GitHub Actions or CircleCI, so those options will also be evaluated

disease identifier (minor issue)

If I query: What disease { cond_assoc_w_gene | gene_assoc_w_cond | genetic_association } with gene MUC5B (727897), then all ARAs return an important answer: IPF, Mondo ID 0008345. However, imProving Agent returns a level below IPF, which is ILD2, Mondo ID 0800029. Is this expected? It cannot be de-duplicated with the IPF results from the other ARAs.

@brettasmi
