biothings_explorer's Issues

Add GO qualifiers to mygene.info record in SmartAPI

BTE is not correctly interpreting mygene.info output on GO annotations because it is ignoring the qualifiers. I believe the fix involves a modification of the mygene.info SmartAPI record (and hopefully TRAPI has a way of expressing qualifiers). Example below...

I issued this query to get BiologicalProcesses related to the gene VAMP2:

{
	"message": {
		"query_graph": {
			"nodes": {
				"n0": {
					"id": "NCBIGENE:6844",
					"category":"biolink:Gene"
				},
				"n1": {
					"category": "biolink:BiologicalProcess"
                }
			},
			"edges": {
				"e01": {
					"subject": "n0",
                    "object": "n1"
                }
			}
		}
	}
}

The following edge linking VAMP2 to neutrophil degranulation (GO:0043312) is returned in the output:

                "NCBIGENE:6844-GO:0043312-MyGene.info API-NCBI Gene": {
                    "predicate": "biolink:participates_in",
                    "subject": "NCBIGENE:6844",
                    "object": "GO:0043312",
                    "attributes": [
                        {
                            "name": "provided_by",
                            "value": "NCBI Gene",
                            "type": "biolink:provided_by"
                        },
                        {
                            "name": "api",
                            "value": "MyGene.info API",
                            "type": "bts:api"
                        },
                        {
                            "name": "evidence",
                            "value": "IMP",
                            "type": "bts:evidence"
                        },
                        {
                            "name": "publications",
                            "value": [
                                "PMID:16677249"
                            ],
                            "type": "biolink:publications"
                        }
                    ]
                },

The original content from http://mygene.info/v3/gene/6844?fields=go looks like this:

{
   "evidence": "IMP",
   "gocategory": "BP",
   "id": "GO:0043312",
   "pubmed": 16677249,
   "qualifier": "NOT",
   "term": "neutrophil degranulation"
},

Critically, the NOT qualifier in the mygene.info record is not being shown in the TRAPI BTE output, which completely reverses the interpretation.
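One way the qualifier could be carried through is sketched below. This is only an illustration: the helper name `goAnnotationToAttributes` and the `bts:qualifier` type are assumptions (the type simply mirrors the `bts:evidence` pattern in the output above), not part of BTE or TRAPI.

```javascript
// Hypothetical helper: convert one mygene.info GO annotation into
// TRAPI-style edge attributes, keeping the qualifier when it is present.
function goAnnotationToAttributes(annotation) {
  const attributes = [
    { name: "evidence", value: annotation.evidence, type: "bts:evidence" },
    {
      name: "publications",
      // Accept a single PubMed id or a list of ids.
      value: [].concat(annotation.pubmed).map((id) => `PMID:${id}`),
      type: "biolink:publications",
    },
  ];
  // Assumed attribute type; the real choice depends on what TRAPI/BioLink allow.
  if (annotation.qualifier !== undefined) {
    attributes.push({
      name: "qualifier",
      value: annotation.qualifier,
      type: "bts:qualifier",
    });
  }
  return attributes;
}

// The GO record quoted above.
const record = {
  evidence: "IMP",
  gocategory: "BP",
  id: "GO:0043312",
  pubmed: 16677249,
  qualifier: "NOT",
  term: "neutrophil degranulation",
};
const attrs = goAnnotationToAttributes(record);
console.log(attrs.map((a) => a.name));
```

With an attribute like this on the edge, a consumer could at least detect that the annotation is negated.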

Need a nodejs package handling BioLink model

The package needs to be separate from current TRAPI code repo.

It should perform:

  1. Given a specific node type (e.g. biolink:GeneOrGeneProduct), return all descendants/ancestors of that node type.
  2. Given a specific node type, return all available ID Prefixes defined in BioLink model
  3. Given a specific ID Prefix, return all node types which can have this ID Prefix.
  4. Given a specific predicate, return all its descendants/ancestors predicates
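The four lookups above could be sketched roughly as follows. The hierarchy and prefix tables are a toy excerpt made up for illustration, not the real BioLink model (a real package would load the model file), and all function names are hypothetical.

```javascript
// Toy data for illustration only: child -> parent, and type -> ID prefixes.
const CLASS_PARENTS = {
  NamedThing: null,
  BiologicalEntity: "NamedThing",
  GeneOrGeneProduct: "BiologicalEntity",
  Gene: "GeneOrGeneProduct",
  GeneProduct: "GeneOrGeneProduct",
};
const ID_PREFIXES = {
  Gene: ["NCBIGene", "ENSEMBL", "HGNC"],
  GeneProduct: ["UniProtKB"],
};

// Walk up the parent links until the root is reached.
function getAncestors(parents, name) {
  const result = [];
  let current = parents[name];
  while (current) {
    result.push(current);
    current = parents[current];
  }
  return result;
}

// Collect children recursively (depth-first).
function getDescendants(parents, name) {
  const children = Object.keys(parents).filter((k) => parents[k] === name);
  return children.flatMap((child) => [child, ...getDescendants(parents, child)]);
}

// ID prefixes declared for a node type.
function getIdPrefixes(prefixes, name) {
  return prefixes[name] || [];
}

// Node types that list the given ID prefix.
function getTypesForPrefix(prefixes, prefix) {
  return Object.keys(prefixes).filter((k) => prefixes[k].includes(prefix));
}

console.log(getDescendants(CLASS_PARENTS, "GeneOrGeneProduct"));
console.log(getAncestors(CLASS_PARENTS, "Gene"));
console.log(getTypesForPrefix(ID_PREFIXES, "UniProtKB"));
```

Because the parent map is passed in as an argument, the same `getAncestors` and `getDescendants` would work for a predicate hierarchy (item 4).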

Use NodeNormalizer to resolve QNodes with only id specified

Currently, BTE uses BioThings APIs to resolve identifiers, which requires a category (e.g. Gene, ChemicalSubstance) to be specified.

However, the TRAPI standard allows users to submit a query without category information.

To support that, we should include NodeNormalizer as a fallback.

how to query by UniProtKB CURIE?

The issue at NCATSTranslator/testing#10 reports that BTE does not return any results for the following query:

{
	"message": {
		"query_graph": {
			"nodes": {
				"n0": {
					"id": "UniProtKB:P52788",
					"category":"biolink:Gene"
				},
				"n1": {
					"category": "biolink:ChemicalSubstance"
                }
			},
			"edges": {
				"e01": {
					"subject": "n0",
                                        "object": "n1"
                                }
			}
		}
	}
}

If I convert UniProtKB:P52788 to NCBIGENE:6611 (based on http://mygene.info/v3/query?q=P52788&fields=entrezgene,uniprot), the query returns many results as expected. I tried adjusting the category for n0 to biolink:Protein and biolink:GenomicEntity, but those queries also return zero results. What is the proper way to form a BTE TRAPI query for a UniProtKB CURIE?

Query not working

{
  "message": {
    "query_graph": {
      "nodes": {
        "n00": {
          "id": "MONDO:0002715",
          "category": "biolink:Disease"
        },
        "n01": {
          "category": "biolink:ChemicalSubstance"
        },
        "n02": {
          "category": "biolink:Gene"
        }
      },
      "edges": {
        "e00": {
          "predicate": "biolink:correlated_with",
          "subject": "n00",
          "object": "n01"
        },
        "e01": {
          "predicate": "biolink:related_to",
          "subject": "n01",
          "object": "n02"
        }
      }
    }
  }
}

Error:

{
    "error": "TypeError: Cannot convert undefined or null to object"
}

Add additional node attributes including nodeDegree

  1. The number of unique source KG nodes that connect to this KG node.
  2. The number of unique target KG nodes that this KG node connects to.
  3. The number of unique edges (source-predicate-target) coming into this KG node.
  4. The number of unique edges (source-predicate-target) going out of this KG node.
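The four counts could be computed from the knowledge graph edges roughly as below. This is a sketch under assumptions: the function name, the output keys, and the reading of "connects from" as incoming edges are all mine, and the edge ids are invented.

```javascript
// edges: array of { subject, predicate, object } using KG node ids.
function computeNodeDegree(nodeId, edges) {
  const sourceNodes = new Set();
  const targetNodes = new Set();
  const incomingEdges = new Set();
  const outgoingEdges = new Set();
  for (const { subject, predicate, object } of edges) {
    // A source-predicate-target key makes duplicate edges count once.
    const key = `${subject}|${predicate}|${object}`;
    if (object === nodeId) {
      sourceNodes.add(subject);
      incomingEdges.add(key);
    }
    if (subject === nodeId) {
      targetNodes.add(object);
      outgoingEdges.add(key);
    }
  }
  return {
    source_nodes: sourceNodes.size,
    target_nodes: targetNodes.size,
    incoming_edges: incomingEdges.size,
    outgoing_edges: outgoingEdges.size,
  };
}

// Invented example edges around node "X"; the last one is a duplicate.
const exampleEdges = [
  { subject: "A", predicate: "participates_in", object: "X" },
  { subject: "A", predicate: "related_to", object: "X" },
  { subject: "B", predicate: "related_to", object: "X" },
  { subject: "X", predicate: "related_to", object: "C" },
  { subject: "A", predicate: "participates_in", object: "X" },
];
const degree = computeNodeDegree("X", exampleEdges);
console.log(degree);
```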

Create a module to transform TRAPI Query Graph

  1. Expand a node by its id, e.g. if the user provides a MONDO ID as input, we traverse the MONDO hierarchy to get all of its descendants.
  2. Expand a node by its category, e.g. if the user provides the NamedThing category, we traverse the BioLink class hierarchy to get all descendants of the NamedThing class.
  3. Expand a predicate, e.g. if the user provides the related_to predicate, we traverse the BioLink predicate hierarchy to get all descendants of the related_to predicate.
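Category and predicate expansion (items 2 and 3) could look roughly like the sketch below. The descendant table is a toy stand-in for a real BioLink lookup, the function names are hypothetical, and id expansion (item 1) is left out because it needs an ontology service.

```javascript
// Toy descendant table for illustration; a real module would query the BioLink model.
const TOY_DESCENDANTS = {
  "biolink:related_to": ["biolink:correlated_with", "biolink:treats"],
};
const descendantsOf = (term) => TOY_DESCENDANTS[term] || [];

// Return the term(s) plus all of their descendants, without duplicates.
function expandTerm(term, lookup) {
  const terms = Array.isArray(term) ? term : [term];
  const all = new Set();
  for (const t of terms) {
    all.add(t);
    for (const d of lookup(t)) all.add(d);
  }
  return [...all];
}

// Produce a new query graph with expanded categories and predicates.
function expandQueryGraph(queryGraph, lookup) {
  const expanded = JSON.parse(JSON.stringify(queryGraph)); // deep copy, input stays untouched
  for (const node of Object.values(expanded.nodes)) {
    if (node.category !== undefined) node.category = expandTerm(node.category, lookup);
  }
  for (const edge of Object.values(expanded.edges)) {
    if (edge.predicate !== undefined) edge.predicate = expandTerm(edge.predicate, lookup);
  }
  return expanded;
}

const queryGraph = {
  nodes: {
    n01: { category: "biolink:ChemicalSubstance" },
    n02: { category: "biolink:Gene" },
  },
  edges: {
    e01: { predicate: "biolink:related_to", subject: "n01", object: "n02" },
  },
};
const expandedGraph = expandQueryGraph(queryGraph, descendantsOf);
console.log(expandedGraph.edges.e01.predicate);
```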

BTE doesn't handle predicate as a list

According to TRAPI, predicate may be given either as a list or as a string.

However, the current BTE implementation doesn't support the list form.

The following query fails:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "subject": "n00",
          "predicate": ["biolink:physically_interacts_with"]
        }
      },
      "nodes": {
        "n00": {
          "category": "biolink:ChemicalSubstance",
          "id": "DRUGBANK:DB00188"
        },
        "n01": {
          "category": "biolink:Gene"
        }
      }
    }
  }
}

The error message is:

{
    "error": "TypeError: this.predicate.startsWith is not a function"
}
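The error suggests a string method is being called on an array. A minimal sketch of one possible fix is to normalize the predicate to an array before any string handling; the function names here are hypothetical and not taken from the BTE code base.

```javascript
// Accept a string, an array of strings, or nothing, and always return an array.
function normalizePredicates(predicate) {
  if (predicate === undefined || predicate === null) return [];
  return Array.isArray(predicate) ? predicate : [predicate];
}

// Strip the "biolink:" prefix from a single predicate string.
function stripBiolinkPrefix(predicate) {
  const prefix = "biolink:";
  return predicate.startsWith(prefix) ? predicate.slice(prefix.length) : predicate;
}

// Both TRAPI forms now go through the same code path.
const fromList = normalizePredicates(["biolink:physically_interacts_with"]).map(stripBiolinkPrefix);
const fromString = normalizePredicates("biolink:physically_interacts_with").map(stripBiolinkPrefix);
console.log(fromList, fromString);
```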

Improve logging module

The current logging only reports how a TRAPI query is parsed and how the SmartAPI kg is used. It should include additional information such as:

  1. Which query was made to each API.
  2. How many responses we get from each API call.
  3. How many responses we get after merging the results from different KPs.

The above needs support from other BTE-related Node.js packages.

Query with unexpected exceptions

{
	"message": {
		"query_graph": {
			"nodes": {
				"n0": {
					"id": "WIKIPATHWAYS:Pathway:WP195",
					"category": "biolink:Pathway"
				},
				"n1": {
					"category": "biolink:Gene"
				},
				"n2": {
					"category": "biolink:ChemicalSubstance"
				}
			},
			"edges": {
				"e01": {
					"subject": "n0",
					"object": "n1"
				},
				"e02": {
					"subject": "n1",
					"object": "n2"
				}
			}
		}
	}
}

slice

Include additional node attributes in TRAPI Knowledge Graph

  1. Chemical:
    • chembl_max_phase
    • chembl_molecule_type
    • chembl_drug_category
    • drugbank_class
    • drugbank_groups
    • drugbank_kingdom
    • drugbank_superclass
    • contraindications
    • mesh_pharmacology_class
    • fda_epc_pharmacology_class
  2. Gene:
    • interpro
    • type_of_gene
  3. Pathway:
    • number_of_participants
  4. BiologicalProcess:
    • number_of_participants
  5. CellularComponent:
    • number_of_participants
  6. MolecularActivity:
    • number_of_participants

Accessing LINCS data portal API thru BTE

Summary: I think BTE is making an error in setting up the API request for the LINCS data portal API. We are required to provide the input ID as a CURIE, so I set it as a ChemicalSubstance with the id "LINCS:LSM-1023" (which is imatinib). The logs show that the LINCS API query is then (the error is in the "params" value):

    {
      "timestamp": "2021-03-24T04:11:46.587Z",
      "level": "DEBUG",
      "message": "call-apis: Succesfully made the following query: {\"url\":\"http://lincsportal.ccs.miami.edu/dcic/api/drugindication\",\"params\":{\"id\":\"LINCS:LSM-1023\"},\"method\":\"get\",\"timeout\":50000}",
      "code": null
    },

Looking at the smartapi page for LINCS data portal, the id field should not have a prefix...it should only have the id "LSM-1023".


The situation: I tried to query the LINCS data portal API thru BTE's /v1/smartapi/{smartapi_id}/query endpoint.

The smartapi_id is 9ee398a738916a98b612068cc022454f, the request body is:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "subject": "n00"
        }
      },
      "nodes": {
        "n00": {
          "category": "biolink:ChemicalSubstance",
          "id": "LINCS:LSM-1023"
        },
        "n01": {
          "category": "biolink:Disease"
        }
      }
    }
  }
}

It returns no hits.


However, if I query the LINCS Data portal endpoint directly with the id as "LSM-1023", I get multiple results like:

{"documents": [
{
"lsm_id":"LSM-1023",
"efo_id":"Orphanet:44890",
"efo_term":"GASTROINTESTINAL STROMAL TUMOR",
"max_fda_phase_for_ind":"4",
"mesh_heading":"GASTROINTESTINAL STROMAL TUMORS",
"mesh_id":"D046152"
}
,
{
"lsm_id":"LSM-1023",
"efo_id":"EFO:0000691",
"efo_term":"SARCOMA",
"max_fda_phase_for_ind":"3",
"mesh_heading":"SARCOMA",
"mesh_id":"D012509"
}

Note: I'm not sure if the BTE Python client has an issue with this API too, since it accepts only LINCS IDs and I'm not sure if BTE will ever end up querying it.
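The prefix handling described in this issue could be sketched as follows. The helper names are hypothetical, and the sketch assumes that everything before the first colon is the CURIE prefix.

```javascript
// Drop the CURIE prefix (text up to the first colon); leave bare ids unchanged.
function curieToLocalId(curie) {
  const idx = curie.indexOf(":");
  return idx === -1 ? curie : curie.slice(idx + 1);
}

// Build the request params the LINCS endpoint appears to expect.
function buildLincsParams(curie) {
  return { id: curieToLocalId(curie) };
}

console.log(buildLincsParams("LINCS:LSM-1023"));
```

Splitting on the first colon only keeps local ids that themselves contain a colon intact, such as the "Pathway:WP195" part of WIKIPATHWAYS:Pathway:WP195.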

Node related info should not appear in edge data in TRAPI response

Current behavior in edge response:

"attributes": [
                        {
                            "name": "provided_by",
                            "value": "Text Mining KP",
                            "type": "biolink:provided_by"
                        },
                        {
                            "name": "api",
                            "value": "Text Mining Targeted Association API",
                            "type": "bts:api"
                        },
                        {
                            "name": "CHEBI",
                            "value": "CHEBI:32630",
                            "type": "bts:CHEBI"
                        },
                        {
                            "name": "object_spans",
                            "value": [
                                "start: 91, end: 96",
                                "start: 62, end: 67"
                            ],
                            "type": "bts:object_spans"
                        },
                        {
                            "name": "relation_spans",
                            "value": [
                                "",
                                ""
                            ],
                            "type": "bts:relation_spans"
                        },
                        {
                            "name": "score",
                            "value": [
                                "0.9994468",
                                "0.97133327"
                            ],
                            "type": "bts:score"
                        },
                        {
                            "name": "sentence",
                            "value": [
                                "Dietary restriction of leucine for at least three days could result in the inactivation of Hsf-1, leading to a reduction in Hsp70 synthesis.",
                                "However, in cells that were leucine starved for 3 and 4 days, Hsf-1 activity and Hsp70 synthesis level was dramatically decreased."
                            ],
                            "type": "bts:sentence"
                        },
                        {
                            "name": "subject_spans",
                            "value": [
                                "start: 23, end: 30",
                                "start: 28, end: 35"
                            ],
                            "type": "bts:subject_spans"
                        },
                        {
                            "name": "publications",
                            "value": [
                                "PMID:31397439",
                                "PMID:31397439"
                            ],
                            "type": "biolink:publications"
                        }
                    ]

Node-related information such as the CHEBI identifier does not belong in the edge attributes and needs to be removed.

Handle explain type of query

{
    "message": {
        "query_graph": {
            "nodes": {
                "a": {
                    "category": "biolink:Disease",
                    "id": "MESH:D015464"
                },
                "b": {
                    "category": "biolink:ChemicalSubstance",
                    "id": "CHEBI:45783"
                },
                "c": {
                    "category": "biolink:Gene"
                }
            },
            "edges": {
                "ac": {
                    "subject": "a",
                    "object": "c"
                },
                "bc": {
                    "subject": "c",
                    "object": "b"
                }
            }
        }
    },
    "knowledge_graph": {
        "nodes": [],
        "edges": []
    },
    "results": []
}

Investigate mongodb as a persistent data storage

It would be a good feature to store user requests persistently, so users can come back and look up their results using just the answer id we assign to them.

We could also hook this up with the web interface. Given an answer id, the UI can fetch results directly from mongodb and display the results as graph/table for exploration.

Support handling list as value for category

One ID might belong to multiple semantic types,
e.g. UMLS:C0008780 can be mapped as a Disease or a PhenotypicFeature.

So when the user provides the following query:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "object": "n01",
          "subject": "n00"
        }
      },
      "nodes": {
        "n00": {
          "category": ["biolink:Disease", "biolink:PhenotypicFeature"],
          "id": "UMLS:C0008780"
        },
        "n01": {
          "category": "biolink:Gene"
        }
      }
    }
  }
}

We should look for Genes which are related to UMLS:C0008780 either as a Disease or as a PhenotypicFeature.
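One possible approach, sketched below, is to split such a node into one copy per category and run each copy as its own lookup. The function name `splitByCategory` is hypothetical, and whether BTE should split or handle the list natively is an open design choice.

```javascript
// Return one copy of the query node per category (a single category yields one copy).
function splitByCategory(qnode) {
  const categories = Array.isArray(qnode.category) ? qnode.category : [qnode.category];
  return categories.map((category) => ({ ...qnode, category }));
}

const n00 = {
  category: ["biolink:Disease", "biolink:PhenotypicFeature"],
  id: "UMLS:C0008780",
};
const variants = splitByCategory(n00);
console.log(variants);
```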

fix wrong url in CHANGELOG

Right now, the commit URL and compare URL in the CHANGELOG are wrong. We need to fix them, as well as the .versionrc.json file that is used to generate them automatically.

This query is not working

{
	"message": {
		"query_graph": {
			"nodes": {
				"n0": {
					"id": "MONDO:0005132",
					"category":"biolink:Disease"
				},
				"n1": {
					"category": "biolink:ChemicalSubstance"
				},
				"n2": {
					"id": "UMLS:C0032961",
					"category":"biolink:Disease"
				}
			},
			"edges": {
				"e01": {
					"subject": "n1",
					"object": "n0",
					"predicate":"biolink:treats"
				},
				"e02": {
					"subject": "n1",
					"object": "n2",
					"predicate": "biolink:contraindicated_for"
				}
			}
		}
	}
}

Query returns unexpected exceptions

{
    "message": {
        "query_graph": {
            "edges": {
                "e00": {
                    "subject": "n00",
                    "object": "n01",
                    "category": "biolink:correlated_with"
                }
            },
            "nodes": {
                "n00": {
                    "category": "biolink:ChemicalSubstance",
                    "id": "CAS:121999-58-4"
                },
                "n01": {
                    "category": "biolink:ChemicalSubstance"
                }
            }
        }
    }
}

According to Ryan, this query returns:


{
    "error": "TypeError: Cannot read property 'slice' of undefined"
}

Test with clinical risk KP fails because data source changes

FAIL __test__/integration/TRAPIv1.test.js (97.982 s)
● Testing endpoints › POST /v1/query with clinical risk kp query

expect(received).toHaveProperty(path)

Expected path: "MONDO:0005249"
Received path: []

Received value: {}

  69 |                 expect(response.body.message.knowledge_graph).toHaveProperty("nodes");
  70 |                 expect(response.body.message.knowledge_graph).toHaveProperty("edges");
> 71 |                 expect(response.body.message.knowledge_graph.nodes).toHaveProperty("MONDO:0005249")
     |                                                                     ^
  72 |             })
  73 |     })
  74 | 

  at __test__/integration/TRAPIv1.test.js:71:69
  at Object.<anonymous> (__test__/integration/TRAPIv1.test.js:60:9)

Investigate Timeout Error

{
  "message": {
    "query_graph": {
      "nodes": {
        "n00": {
          "id": "name:Imatinib",
          "category": "biolink:ChemicalSubstance"
        },
        "n01": {
          "category": "biolink:Disease"
        },
        "n02": {
            "category": "biolink:Gene"
        }
      },
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01",
          "predicate":"biolink:treats"
        },
        "e01": {
          "subject": "n01",
          "object": "n02",
          "predicate":"biolink:caused_by"
        }
      }
    }
  }
}

The above query results in a 504 timeout error in the current BTE app. We need to investigate why that happens and how to set the timeout on either the express.js end or the nginx end.

Use Singleton Design Pattern for BioLink reversal class

Currently, the BioLink reversal class (including its file read) is instantiated every time predicates are processed. It should adopt the Singleton design pattern so that it is instantiated only once, which would speed the program up.
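A minimal sketch of the pattern is shown below. The class name, the inverse-predicate table, and the `loadCount` counter are stand-ins for illustration; the real class would read the BioLink model file in its constructor.

```javascript
let loadCount = 0; // counts how often the expensive load runs (demonstration only)

class BioLinkReversal {
  constructor() {
    loadCount += 1;
    // Stand-in for reading and parsing the BioLink model file.
    this.inverse = { treats: "treated_by", treated_by: "treats" };
  }

  // Look up the inverse of a predicate; undefined if none is known.
  reverse(predicate) {
    return this.inverse[predicate];
  }

  // Create the instance on first use and reuse it afterwards.
  static getInstance() {
    if (!BioLinkReversal.instance) {
      BioLinkReversal.instance = new BioLinkReversal();
    }
    return BioLinkReversal.instance;
  }
}

const first = BioLinkReversal.getInstance();
const second = BioLinkReversal.getInstance();
console.log(first === second, loadCount, first.reverse("treats"));
```

Callers would use `BioLinkReversal.getInstance()` instead of `new BioLinkReversal()`, so the file read happens once per process.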

Query Fails

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "category": "biolink:Drug",
                    "id": "RXCUI:466423"
                },
                "n3": {
                    "category": "biolink:Disease"
                }
            },
            "edges": {
                "e03": {
                    "subject": "n0",
                    "object": "n3"
                }
            }
        }
    }
}

Error message:


{
    "error": "TypeError: Cannot read property 'id' of undefined"
}

Refactor load meta-kg

[Screenshot: Screen Shot 2021-02-07 at 9 37 26 PM]

As shown in the screenshot, the meta-kg can sometimes take up to 20s to load. This causes a serious performance issue on the BTE API end. We need to refactor the smartapi-kg package so that it can read a list of specs from a file passed to it instead of making a real-time API query.

We also need to implement a cron job on the TRAPI end to periodically fetch SmartAPI specs from the SmartAPI API.

Missing type for node attributes

[Screenshot: Screen Shot 2021-02-01 at 10 07 13 AM]

Type is a required field for attributes in the TRAPI 1.0 standard. Currently, we provide a type for all edge attributes, but not for node attributes.

Create regression testing infrastructure

We would like to create a regression testing framework to quantitatively assess BTE's performance. As a gold standard, we can use the orphan drug indication dataset mentioned in NCATSTranslator/Relay#123 or the mechanistic paths from https://sulab.github.io/DrugMechDB/. For each of those gold standards, we should create a TRAPI query (examples), send it to BTE using a small library of plausible metapaths focused on drug repurposing, and then assess whether BTE was able to retrieve the right drug among the results. (Later we can also assess where that drug ranked among all potential drugs retrieved.) We would want to execute this test on a regular basis (weekly?), and then have a simple web page where results can be viewed/browsed.

tagging @ariutta and @AlexanderPico
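The scoring step of such a framework could be sketched as below. This assumes each test case has already been run and reduced to a ranked list of returned drug ids plus the expected id; the function names and the example ids are invented for illustration.

```javascript
// Check whether the expected drug was retrieved and, if so, at which rank (1-based).
function evaluateCase(returnedIds, expectedId) {
  const index = returnedIds.indexOf(expectedId);
  return { found: index !== -1, rank: index === -1 ? null : index + 1 };
}

// Summarize a batch of cases as a hit count and a recall fraction.
function summarize(cases) {
  const results = cases.map((c) => evaluateCase(c.returnedIds, c.expectedId));
  const hits = results.filter((r) => r.found).length;
  return {
    total: cases.length,
    hits,
    recall: cases.length === 0 ? 0 : hits / cases.length,
    results,
  };
}

// Invented example data: one hit at rank 2, one miss.
const summary = summarize([
  { returnedIds: ["DRUG:B", "DRUG:A", "DRUG:C"], expectedId: "DRUG:A" },
  { returnedIds: ["DRUG:D"], expectedId: "DRUG:E" },
]);
console.log(summary);
```

A summary like this could be written out on each scheduled run and rendered by the results page.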
