medikanren's Introduction

mediKanren

*** FOR RESEARCH PURPOSES ONLY ***

Proof-of-concept for reasoning over medical knowledge graphs, using miniKanren + heuristics + indexing.

There are several prototypes, each in its directory:

Contributed use cases, queries and applications are now located in a directory separate from medikanren itself:

If you have previously contributed code applying medikanren and can't find yours, look there.

medikanren's People

Contributors

gregr, jeffhhk, kaiwenho, kimthi1011, michaelballantyne, mzheng17, namin, nathanielrb, triagedr, webyrd


medikanren's Issues

Chembl prefix being used

Queries sent to Unsecret Agent return compounds from ChEMBL with the "chembl" prefix rather than the "chembl.compound" prefix that is used in TRAPI.
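
One possible stopgap is to rewrite prefixes on the way out of the server. The sketch below is only an illustration in Python: `normalize_curie_prefix` and `PREFIX_MAP` are hypothetical names, not part of mediKanren, and the example identifier is a placeholder.

```python
# Hypothetical mapping from the prefix currently emitted to the one expected.
PREFIX_MAP = {"chembl": "chembl.compound"}

def normalize_curie_prefix(curie, prefix_map):
    """Rewrite the prefix of a CURIE if it appears in prefix_map.

    Prefix lookup is case-insensitive; the local identifier is left as-is.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        return curie  # no ":" found, so this is not a CURIE; leave it alone
    return prefix_map.get(prefix.lower(), prefix) + ":" + local_id

print(normalize_curie_prefix("chembl:CHEMBL25", PREFIX_MAP))
```

Identifiers whose prefix is not in the map (for example UMLS CURIEs) pass through unchanged.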

Unsecret Agent not returning results for connections to UMLS identifiers for vomiting/nausea

In attempting to recreate the results of the TIDBIT regarding cyclic vomiting, I sent out several queries for connections to various identifiers related to vomiting, nausea, etc. There are connections to these identifiers present in SemMedDB, but no results are returned from Unsecret. This query graph was submitted to Unsecret through the ARS.

{"message":{
  
  "query_graph": {
    "nodes": [
      {
        "id": "n0",
        "set": false,
        "curie":"UMLS:C0520909",
        "type": "ChemicalSubstance"
      },
      {
        "id": "n1",
        "type": "named_thing",
        "set": false
      }
    ],
    "edges": [
      {
        "id": "e0",
        "source_id": "n1",
        "target_id": "n0"
      }
    ]
  }
}}

The following identifiers were also used without results being returned:
UMLS:C0027497
UMLS:C0027498
UMLS:C0718572
UMLS:C0722001
UMLS:C0520904
UMLS:C0520909

Add "entities like these" to mediKanren

Google used to have a feature where you could enter a set of things, and then ask it to extend the list with more "similar" entities.

For example, if you entered "dog, cat, cow", it might add "horse, pig, chicken."

It would be cool if we could do this in mediKanren. Given a list of entities, we could find overlapping properties, e.g. each one INHIBITS c for the same concept c. Then, we could look for other inhibitors of c.

We could rank candidates by the number of predicates they share with the core set, weighting each predicate by the fraction of core-set elements that share it.

For example, if 2 out of 3 items in the core set share a predicate, then satisfying this predicate is worth about 0.67 points in the ranking.
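
A minimal sketch of this ranking, written in Python over a toy list of (subject, predicate, object) triples. The function names and the data are made up for illustration, and a "shared predicate" is taken to mean the same predicate applied to the same object, as in the INHIBITS c example.

```python
from collections import Counter

def properties(entity, edges):
    """All (predicate, object) pairs asserted for an entity."""
    return {(p, o) for s, p, o in edges if s == entity}

def entities_like_these(core, edges):
    """Rank non-core entities by the weighted properties they share with the core set."""
    core = set(core)
    counts = Counter()
    for entity in core:
        counts.update(properties(entity, edges))
    # Each property is worth the fraction of core entities that have it.
    weights = {prop: n / len(core) for prop, n in counts.items()}
    candidates = {s for s, _, _ in edges} - core
    scores = {c: sum(weights.get(prop, 0.0) for prop in properties(c, edges))
              for c in candidates}
    ranked = [(c, score) for c, score in scores.items() if score > 0]
    return sorted(ranked, key=lambda cs: (-cs[1], cs[0]))

# Toy data, purely hypothetical.
EDGES = [
    ("drugA", "INHIBITS", "gene1"), ("drugB", "INHIBITS", "gene1"),
    ("drugC", "INHIBITS", "gene2"),
    ("drugA", "TREATS", "disease1"), ("drugB", "TREATS", "disease1"),
    ("drugC", "TREATS", "disease1"),
    ("drugD", "INHIBITS", "gene1"), ("drugD", "TREATS", "disease1"),
    ("drugE", "TREATS", "disease1"),
    ("drugF", "INHIBITS", "gene3"),
]

for name, score in entities_like_these(["drugA", "drugB", "drugC"], EDGES):
    print(name, round(score, 2))
```

Here drugD scores 2/3 for inhibiting gene1 plus 1 for treating disease1, drugE scores 1, and drugF shares nothing with the core set and is dropped.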

Add query template for phenotypic drug repurposing

Problem: We want to be able to recommend drugs based on what a disease does to a patient.

We want to be able to run queries of the form:

"y such that [disease A] increases x (for some x) AND drug y decreases x."

and

"y such that [disease A] decreases x (for some x) AND drug y increases x."

I think the most general query template would be:

"y such that R(A,x) and R’(y,x) for some x."
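
As a rough illustration of the general template, here is a Python sketch over toy triples. `repurposing_candidates` and the edge data are hypothetical, and a real version would run against the knowledge graph rather than an in-memory list.

```python
def repurposing_candidates(edges, disease, r, r_prime):
    """Return sorted (y, x) pairs such that r(disease, x) and r_prime(y, x) both hold."""
    # All x with R(A, x).
    targets = {o for s, p, o in edges if s == disease and p == r}
    # All y with R'(y, x) for one of those x.
    return sorted({(s, o) for s, p, o in edges
                   if p == r_prime and o in targets and s != disease})

# Toy data, purely hypothetical.
EDGES = [
    ("diseaseA", "INCREASES", "x1"),
    ("drug1", "DECREASES", "x1"),
    ("drug2", "INCREASES", "x1"),
    ("diseaseA", "DECREASES", "x2"),
    ("drug3", "INCREASES", "x2"),
    ("drug4", "DECREASES", "x3"),
]

print(repurposing_candidates(EDGES, "diseaseA", "INCREASES", "DECREASES"))
print(repurposing_candidates(EDGES, "diseaseA", "DECREASES", "INCREASES"))
```

The two calls correspond to the two query forms above; returning x alongside y keeps the explanation for each recommendation.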

ITRB deployments

I know that medikanren's server is being rebuilt. This is a reminder that when that is complete, we need an ITRB deployment in the prod environment, correctly annotated in the SmartAPI registry.

Add tissue-specific filter

Given a list of gene names (and maybe metabolites too), filter those that have high expression in a particular tissue type.

This would be particularly useful for the output.

For example, "restrict output to all genes highly expressed in the uterus."

[This came up as a request during the May hackathon working with an SME.]
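
A small Python sketch of what such a filter could look like, assuming expression data is available as a lookup from (gene, tissue) to a numeric level. The data source, the units, and the threshold are all assumptions here, as is the name `filter_by_tissue`.

```python
def filter_by_tissue(genes, expression, tissue, threshold):
    """Keep genes whose expression level in the given tissue meets the threshold.

    Genes with no recorded level for the tissue are treated as not expressed.
    """
    return [g for g in genes if expression.get((g, tissue), 0.0) >= threshold]

# Toy expression levels, purely hypothetical.
EXPRESSION = {
    ("geneA", "uterus"): 42.0,
    ("geneA", "liver"): 1.5,
    ("geneB", "uterus"): 0.3,
    ("geneC", "uterus"): 17.0,
}

print(filter_by_tissue(["geneA", "geneB", "geneC", "geneD"], EXPRESSION, "uterus", 10.0))
```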

A bunch of semi-related tasks to help transition to mediKanren 2

Since we are moving to mediKanren 2, there are a bunch of related things to do:

  • port the webserver to mediKanren 2

  • implement TRAPI 1.1 compliance for the May Relay

  • augment our server with “pragma” style directives that extend TRAPI, so we can do reasoning that TRAPI doesn’t currently support, but which is useful to PMI use cases

  • implement the ability for TRAPI requests to span multiple KGs

  • implement lightweight reasoning / query expansion

  • implement / improve node and edge normalization

  • make sure the NCATS TRAPI queries we are getting don’t break or DoS the server, and return reasonable answers

building index with rust

So I've started porting code/csv-semmed-ordered-unique-enum.rkt to Rust. It works on sample_semmed.csv.

On the SemMedDB page I am seeing a ~2GB gzipped SQL dump of PREDICATION. Can you confirm this is the right data source?

If that is the case, the CSV file it produces is about 9.5GB. I think it would be simpler to just decompress and process the sql.gz directly, as it's really almost identical to CSV anyway. I'd like to make the tool usable for everyone though, so if you need explicit CSV support, that would be good to know.

Create web interface

Now that researchers are asking to use the tool, a web interface would avoid the need for people to install Racket. More importantly, it would avoid the need for them to deal with processing or downloading data sources.

It is critical that the web interface retains the interactive feel and responsiveness of the tool.

Improved concept search

Searching for 'beta-catenin' does not return 'beta catenin', although perhaps it should.

Searching for 'betacatenin' returns nothing.

Consider using edit distance as well.
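
One way to cover all three cases is to strip punctuation and whitespace before matching, and fall back to edit distance when there is no substring match. The Python sketch below is only an illustration: `normalize`, `edit_distance`, and `search` are hypothetical helpers, not the existing search code, and the distance cutoff is an arbitrary choice.

```python
import re

def normalize(name):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def search(query, concepts, max_distance=1):
    """Return concept names matching the query, best matches first."""
    q = normalize(query)
    hits = []
    for name in concepts:
        n = normalize(name)
        # A substring match on the normalized form counts as distance 0.
        d = 0 if q in n else edit_distance(q, n)
        if d <= max_distance:
            hits.append((d, name))
    return [name for d, name in sorted(hits)]

CONCEPTS = ["beta catenin", "Beta-Catenin", "gamma catenin", "insulin"]
print(search("betacatenin", CONCEPTS))
print(search("beta-catenim", CONCEPTS))  # one typo, still found
```

Computing edit distance against every concept name is slow on a large vocabulary, so a real implementation would likely need an index or a cheap pre-filter.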

Add metabolic databases

We want to add resources like biocyc/metacyc.

When working on different aspects of the phenotype for a disease with an unknown genetic cause, it may be useful to do "pathway intersections" between two different genes implicated in different aspects of the phenotype.

Import wikidata

We should also consider using wikidata as our "patch" system when we correct errors in the NLP-generated data.

Deploy TRAPI 1.3

Here's your official ticket:
"https://arax.ncats.io/?smartapi=1 shows that mediKanren has 1.1 in development (infores:unsecret-agent). Please deploy 1.3 of this tool into development ASAP."

Tickets were created for all tools that do not currently have 1.3 in development. We know you are rebuilding the tool and making tremendous progress. But we would be remiss if we didn't keep you in the loop that we're moving to 1.3. Please keep us posted and keep on trucking.

Generalize queries supported in current GUI

Right now the Racket GUI supports:

Concept 1 -> Predicate 1 -> X -> Predicate 2 -> Concept 2

and

Concept 1 -> Predicate -> X

and

X -> Predicate -> Concept 2

where X is some unspecified concept.

However, the existing interface does not support the direct connection between two specified concepts and a specified predicate:

Concept 1 -> Predicate -> Concept 2

It would also be useful to be able to specify the middle concept in the two-predicate query above:

Concept 1 -> Predicate 1 -> Concept 3 -> Predicate 2 -> Concept 2

Also, it would be handy to have the ability to specify the synthetic predicate 'any predicate', and the synthetic concept 'any concept' (which would subsume the X above).

It would be useful to be able to specify the types of an underspecified concept: gene product, disease, phenotype, etc. We could support SemMedDB semantic types, but we probably should also support synthetic concept types, since the SemMedDB types are rather messy.

We should support sorting and filtering of answers.
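
All of the query shapes above can be seen as one path pattern in which each concept or predicate slot is either fixed or left open. Here is a Python sketch of that idea over toy triples; `match_path` and `ANY` are hypothetical names, the data is made up, and concept-type constraints are left out to keep it short.

```python
ANY = None  # stands for the synthetic 'any concept' / 'any predicate'

def match_path(edges, concepts, predicates):
    """Find paths c0 -p0-> c1 -p1-> ... matching the given slots.

    concepts has one more entry than predicates; a slot set to ANY matches anything.
    Each result is a flat list [c0, p0, c1, p1, c2, ...].
    """
    if len(concepts) != len(predicates) + 1:
        raise ValueError("need exactly one more concept slot than predicate slots")

    def ok(slot, value):
        return slot is ANY or slot == value

    paths = [[s] for s in sorted({s for s, _, _ in edges}) if ok(concepts[0], s)]
    for i, pred in enumerate(predicates):
        extended = []
        for path in paths:
            for s, p, o in edges:
                if s == path[-1] and ok(pred, p) and ok(concepts[i + 1], o):
                    extended.append(path + [p, o])
        paths = extended
    return paths

# Toy data, purely hypothetical.
EDGES = [
    ("drugA", "INHIBITS", "geneK"),
    ("geneK", "CAUSES", "diseaseD"),
    ("drugA", "TREATS", "diseaseD"),
    ("drugA", "INHIBITS", "geneL"),
]

# Concept 1 -> Predicate -> Concept 2
print(match_path(EDGES, ["drugA", "diseaseD"], ["TREATS"]))
# Concept 1 -> Predicate 1 -> X -> Predicate 2 -> Concept 2
print(match_path(EDGES, ["drugA", ANY, "diseaseD"], ["INHIBITS", "CAUSES"]))
# Concept 1 -> any predicate -> Concept 3 -> any predicate -> Concept 2
print(match_path(EDGES, ["drugA", "geneK", "diseaseD"], [ANY, ANY]))
```

Sorting and filtering could then be applied to the returned list of paths.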

Add "mechanisms in common" query

Given a set of concepts (likely drugs), find all mechanisms they touch in common.

For example, if drug X inhibits gene Y and drug Z inhibits gene Y, then "inhibits gene Y" is a mechanism in common.
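
In terms of triples this is a set intersection over (predicate, object) pairs. A minimal Python sketch with made-up data follows; `mechanisms_in_common` is a hypothetical name.

```python
def mechanisms_in_common(concepts, edges):
    """Return the sorted (predicate, object) pairs asserted for every given concept."""
    per_concept = [{(p, o) for s, p, o in edges if s == c} for c in concepts]
    if not per_concept:
        return []
    return sorted(set.intersection(*per_concept))

# Toy data, purely hypothetical.
EDGES = [
    ("drugX", "INHIBITS", "geneY"),
    ("drugZ", "INHIBITS", "geneY"),
    ("drugX", "STIMULATES", "geneW"),
    ("drugZ", "INHIBITS", "geneV"),
]

print(mechanisms_in_common(["drugX", "drugZ"], EDGES))
```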

suggestion from Melissa Haendel: document licensing of KSs used

For the reasoners: you may wish to examine licensing as a criterion for inclusion. If you are uncertain, you can request a license evaluation here:
http://reusabledata.org/
https://github.com/reusabledata/reusabledata/issues/new
or you can also curate your own and make a PR. I would greatly appreciate it if we can ensure that all Translator Knowledge sources are curated for licensing information, and we would be grateful for assistance.

License?

I could be missing it, but I didn't see one committed to the repository.

Need a way to export results

While demoing the tool, folks have been asking if I could export results for them to take away.

A simple text file dump would be a good enough way to start.

Later on, .csv, .json, etc. export would be great.
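
To make the three formats concrete, here is a Python sketch that serializes a list of result rows. The row fields (`subject`, `predicate`, `object`) and the function name `export_results` are assumptions, since the actual result structure in the GUI may differ.

```python
import csv
import io
import json

FIELDS = ["subject", "predicate", "object"]  # assumed result columns

def export_results(rows, fmt="text"):
    """Serialize result rows (a list of dicts) as tab-separated text, CSV, or JSON."""
    if fmt == "text":
        return "".join("\t".join(row[f] for f in FIELDS) + "\n" for row in rows)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "json":
        return json.dumps(rows, indent=2)
    raise ValueError("unknown format: " + fmt)

# Toy result, purely hypothetical.
ROWS = [{"subject": "drugX", "predicate": "INHIBITS", "object": "geneY"}]

print(export_results(ROWS, "text"), end="")
print(export_results(ROWS, "csv"), end="")
print(export_results(ROWS, "json"))
```

Returning strings keeps the sketch independent of where the output ends up; writing them to a file chosen by the user would be the obvious next step.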

Output not being flushed

I'm trying to use mediKanren with the entire SemMedDB. I obtained the source db and created a .csv following the same format as indicated in code/sample_semmed.csv, at about 10GB. Everything seems to run fine but it's not creating any output. It seems that everything is kept in RAM until the end - which means the machine ran out of RAM and swap well before that. There doesn't seem to be anything wrong with printing to file on my system (macOS + Racket v7.3), only with flushing at these lines:

(flush-output out-predicate)
(flush-output out-semtype))

I noticed the TODO in README.md dated Nov 27, 2017:
TODO: add SemMedDB files, along with terms of use information for SemMedDB.

What is the recommended solution for this problem?

Integrate pharos data

A good first cut would be to import INHIBITS/STIMULATES relationships on genes/targets.
