
biothings_explorer_pfocr_prioritization's Introduction

Current Repository for the WikiPathways Web Site


This repository contains the code and development history of the main web site for the WikiPathways project: wikipathways.org. Built upon MediaWiki, the site includes numerous custom extensions, javascript, skins and hacks.

How to Contribute

Installation

We do not recommend attempting to install this site code as-is. There are many parts and services required that are not included here. Contact one of the architects for more details.

If you do attempt to install, note instructions in the README.md files in each subdirectory of wpi/extensions/, e.g., GPMLConverter and Pathways.

Contributing Pathway Content

If you are interested in adding or editing pathway diagrams, check out these resources:


Old Repo: http://svn.bigcat.unimaas.nl/wikipathways/

biothings_explorer_pfocr_prioritization's People

Contributors

alexanderpico, ayushi-agrawal-gladstone


biothings_explorer_pfocr_prioritization's Issues

Common names (instead of CURIEs) in heatmaps

The heatmaps of result CURIEs by pathway figure show NCBI Gene IDs or MESH IDs, which are hard for users to remember. Showing gene symbols, chemical names, and disease names instead of CURIEs would be more useful.

Suggested by @khanspers.

"Require specified query nodes in pathway results" checkbox not working

This issue was reported by @khanspers on Slack. Below are details of the root cause analysis.

Issue:
For the same results, running the notebook twice with "Require specified query nodes in pathway results" set first to true and then to false returns the same figures. Is this expected? That is, could it be that in some cases there simply aren't any figures that ONLY include the BTE results?

Root cause:
The variable required_curies is not reset between runs within the same notebook session.

Steps to replicate:

  1. Run the PET notebook with "Require specified query nodes in pathway results" set to True. The required_curies variable now stores the IDs of the query nodes.
  2. Run the notebook again in the same session (without reloading or re-opening it) with "Require specified query nodes in pathway results" set to False. The required_curies variable still holds the node IDs from the previous run, so the same results are generated.

Proposed resolution:
Set required_curies = set() in the Iterative Enrichment of TRAPI Results Using PFOCR Pathway Figures section of the PET notebook before checking for the user-defined value of Require specified query nodes in pathway results.
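
A minimal sketch of the proposed reset, not the notebook's actual code. The function name collect_required_curies and both parameters are hypothetical stand-ins for the checkbox value and the query node IDs; only required_curies is named in this issue.

```python
def collect_required_curies(require_query_nodes, query_node_ids):
    """Return the required CURIEs for the current run only.

    Hypothetical helper: `require_query_nodes` stands in for the checkbox
    value and `query_node_ids` for the IDs of the query nodes.
    """
    # Start from an empty set on every run so that values from an earlier
    # run in the same session cannot carry over.
    required_curies = set()
    if require_query_nodes:
        required_curies.update(query_node_ids)
    return required_curies
```

Because the set is rebuilt on each call, running with the checkbox set to True and then to False yields an empty set the second time.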

More recent TRAPI results cause problems in PET notebook

Some TRAPI results cause an error in the PET notebook, seemingly because "name" is not found for some nodes in the returned knowledge graph.

The error is:

Error in Python
KeyError Traceback (most recent call last)
in <cell line: 6>()
5 curie_to_unified_curies = dict()
6 for k, v in trapi_message["knowledge_graph"]["nodes"].items():
----> 7 curie_to_name[k] = v["name"]
8 #first check is k is already what we want
9 [prefix, identifier] = k.split(":")

KeyError: 'name'

Query results examples (both are "what treats Huntington's disease?"):
Results URL that works: https://arax.ncats.io/api/arax/v1.3/response/a46d7144-718e-40dc-bde7-8733f38d9d04 (run around mid-February)
Results URL that doesn't work: https://arax.ncats.io/api/arax/v1.3/response/33dd53d9-fd8e-4f17-9a3b-055a782d8467 (run on April 5)

For now, the PET notebook has been updated to not expect a "name", which makes it possible to run the notebook. However, this affects how results are displayed: the notebook now presents the CURIE instead of the name, which is less human-readable.
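
One possible way to tolerate missing names while still showing them when available is a per-node fallback. This is a sketch under assumptions, not the change that was made to the notebook; the function name build_curie_to_name is hypothetical, while the "knowledge_graph" and "nodes" keys come from the traceback above.

```python
def build_curie_to_name(trapi_message):
    """Map each node CURIE to its name, falling back to the CURIE itself."""
    curie_to_name = {}
    for curie, node in trapi_message["knowledge_graph"]["nodes"].items():
        # dict.get returns None when "name" is absent, so no KeyError is
        # raised; `or curie` also covers names that are None or empty.
        curie_to_name[curie] = node.get("name") or curie
    return curie_to_name
```

With this approach only the nodes that lack a name are displayed as CURIEs.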

Alternative to iterative enrichment-with-exclusion

A common solution to the redundancy issue with enrichment results is to filter by Jaccard similarity index. Algorithm:

  1. Calculate enrichment results normally
  2. Decide on max Jaccard threshold for filtering (e.g., 0.5)
  3. Calculate the Jaccard index* between the top ranked pathway and the next pathway in the ranked list of results
  4. If the index is > threshold (i.e., too similar), discard the next pathway; if it is < threshold, keep it.
  5. Continue to evaluate subsequent pathways against each of the retained pathways until you reach the desired number of results (n) or run out of pathways.

* Jaccard index formula: count of entities in intersection / count of entities in union
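
The steps above can be sketched as follows. The function names and the input shape (a list of (pathway_id, entity_set) pairs already sorted from best to worst) are assumptions for illustration. The steps do not say what to do when the index equals the threshold; this sketch keeps the pathway in that case.

```python
def jaccard_index(a, b):
    """Count of entities in the intersection / count of entities in the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def filter_by_jaccard(ranked_pathways, threshold=0.5, n=None):
    """Greedily keep pathways that are not too similar to any retained one.

    ranked_pathways: list of (pathway_id, entity_set), best ranked first
    (hypothetical input shape). n: optional maximum number of results.
    """
    retained = []
    for pathway_id, entities in ranked_pathways:
        # Compare the candidate against every pathway kept so far.
        # Assumption: an index exactly equal to the threshold is kept.
        if all(jaccard_index(entities, kept) <= threshold for _, kept in retained):
            retained.append((pathway_id, set(entities)))
            if n is not None and len(retained) >= n:
                break
    return retained
```

The top-ranked pathway is always retained because there is nothing to compare it against, and each later pathway must pass the threshold against every retained pathway, not only the first.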

Required CURIE missing from top hit

https://arax.ncats.io/api/arax/v1.3/response/8a8007f6-d02d-4578-81e3-df041ef5541b

The results should all contain the MESH ID for Alzheimer's, but the first one definitely does not, at least not by eye or according to its PFOCR web page: https://pfocr.wikipathways.org/figures/PMC5541263__onc2016467f8.html.

First, double-check the Dropbox PFOCR files that are actually used. Maybe it is in there?

Next, check the enrichment code block to see if it is being erroneously inserted somehow.
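
A small diagnostic along these lines could help with both checks. It is only a sketch: the function name and the figure_to_curies mapping (figure ID to the set of CURIEs recorded for that figure in the PFOCR files) are hypothetical, and the real file layout may differ.

```python
def figures_missing_required(figure_to_curies, required_curies):
    """Return {figure_id: missing CURIEs} for figures lacking a required CURIE.

    figure_to_curies: hypothetical mapping of figure ID to the CURIEs
    listed for that figure in the PFOCR data actually loaded.
    """
    required = set(required_curies)
    missing = {}
    for figure_id, curies in figure_to_curies.items():
        absent = required - set(curies)
        if absent:
            missing[figure_id] = absent
    return missing
```

If a reported top hit appears in the returned dictionary, the required CURIE is absent from the source data and the filter in the enrichment block is the likelier culprit.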

Not clear what the input TRAPI results URL should look like

It is not clear that a JSON TRAPI result URL is expected by the notebook as input.
Suggested solutions:

  1. In the user input widget, "TRAPI Result URL" field could be renamed to "TRAPI Result URL (JSON)"
  2. At the top of notebook, provide more details that a JSON TRAPI results URL is needed as input. This is not clear from the current description.

Reported by @khanspers

Query examples to test

Try running the notebook for more queries for further testing. Examples:

  1. Original example results from #538:
  2. Liver injury query from Question of the Month
  3. Alzheimer's disease query from Question of the Month
