rtxteam / rtx

Software repo for Team Expander Agent (Oregon State U., Institute for Systems Biology, and Penn State U.)

License: MIT License

Python 79.01% Shell 0.99% HTML 0.76% CSS 0.38% Perl 0.03% JavaScript 6.57% Dockerfile 0.08% Jupyter Notebook 12.08% Smarty 0.06% TeX 0.03%

ncats-translator

rtx's Issues

change "pharos_drug" to "drug" in BioNetExpander, BuildMasterKG, and Q2Utils.py

From @saramsey on November 27, 2017 18:45

Do not do this until after the demo.

Copied from original issue: dkoslicki/NCATS#117

Make a script to build our standards-compliant RTX KG and publish it as a gzipped tar archive on the web

QueryNCBIeUtils.get_mesh_terms_for_mesh_uid always returns empty

Hi @saramsey, it looks like QueryNCBIeUtils.get_mesh_terms_for_mesh_uid is always returning an empty answer:

>>>QueryNCBIeUtils.get_medgen_uid_for_omim_id('614332')
{482154, 814824}
>>>QueryNCBIeUtils.get_mesh_terms_for_mesh_uid({482154, 814824})
[]

and

>>>QueryNCBIeUtils.get_medgen_uid_for_omim_id('600320')
{325371}
>>>QueryNCBIeUtils.get_mesh_terms_for_mesh_uid({325371})
[]

and a bunch of other similar examples.

Error running Q2

Yesterday Q2 was working; now it is not. Maybe the attempted fix for Q1 broke Q2?

python3 Q2Solution.py -r 'DOID:1686' -d 'physostigmine'
Traceback (most recent call last):
  File "Q2Solution.py", line 187, in <module>
    main()
  File "Q2Solution.py", line 184, in main
    res = answerQ2(drug, disease, k)
  File "Q2Solution.py", line 70, in answerQ2
    RU.weight_graph_with_google_distance(g)
  File "/mnt/data/orangeboard/test/RTX/code/reasoningtool/QuestionAnswering/ReasoningUtilities.py", line 693, in weight_graph_with_google_distance
    gd_temp = QueryNCBIeUtils.normalized_google_distance(source_mesh_term, target_mesh_term, mesh1=mesh1, mesh2=mesh2)
  File "/mnt/data/orangeboard/test/RTX/code/reasoningtool/QueryNCBIeUtils.py", line 232, in normalized_google_distance
    nij = QueryNCBIeUtils.get_pubmed_hits_count('({mesh1}) AND ({mesh2})'.format(mesh1=mesh1_str_decorated,
UnboundLocalError: local variable 'mesh1_str_decorated' referenced before assignment
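The body of normalized_google_distance is not shown here, so this is only a hypothetical illustration of the usual cause of such an UnboundLocalError: a name like mesh1_str_decorated that is bound inside an `if` branch but not on every path. The helpers decorate_term and build_joint_query are made up for the sketch; the query shape matches the esearch URLs quoted in these issues.

```python
def decorate_term(term: str, is_mesh: bool) -> str:
    """Hypothetical helper: append the [MeSH Terms] field tag only for MeSH terms."""
    # Bind the name on every branch; an `if` without an `else` is the
    # typical way to end up with "referenced before assignment".
    if is_mesh:
        decorated = term + "[MeSH Terms]"
    else:
        decorated = term
    return decorated


def build_joint_query(term1: str, term2: str, mesh1: bool = True, mesh2: bool = True) -> str:
    """Build a PubMed query of the form '(term1) AND (term2)'."""
    mesh1_str_decorated = decorate_term(term1, mesh1)
    mesh2_str_decorated = decorate_term(term2, mesh2)
    return "({mesh1}) AND ({mesh2})".format(mesh1=mesh1_str_decorated,
                                            mesh2=mesh2_str_decorated)
```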

URIs for nodes in KG

URIs are needed for the output format specified by @edeutsch. @saramsey suggested: "OK, this should be doable by a simple python class backed by a config file or something."
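Here is a minimal sketch of such a class. It assumes CURIE-style node IDs (PREFIX:local_id) and a flat JSON config file that maps each prefix to a base URI. NodeURIMapper and the config layout are hypothetical. Bare identifiers, such as the UniProt accessions used as node names in the KG, would need a prefix attached first. The actual base URIs should follow the output format @edeutsch specified.

```python
import json
from typing import Dict, Optional


class NodeURIMapper:
    """Map CURIE-style node IDs (e.g. 'DOID:1686') to URIs via a prefix table."""

    def __init__(self, prefix_to_base: Dict[str, str]):
        self.prefix_to_base = prefix_to_base

    @classmethod
    def from_json_file(cls, path: str) -> "NodeURIMapper":
        # Assumed config layout: a flat JSON object {"PREFIX": "base URI", ...}
        with open(path) as fh:
            return cls(json.load(fh))

    def uri_for(self, curie: str) -> Optional[str]:
        """Return the URI for `curie`, or None if it has no known prefix."""
        prefix, sep, local_id = curie.partition(":")
        if not sep or prefix not in self.prefix_to_base:
            return None  # not a CURIE, or prefix missing from the config
        return self.prefix_to_base[prefix] + local_id
```

For example, with the config {"DOID": "http://purl.obolibrary.org/obo/DOID_"}, uri_for("DOID:1686") returns http://purl.obolibrary.org/obo/DOID_1686.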

Random Walk with Restart development

From @saramsey on November 26, 2017 19:16

This is a placeholder issue for the development of Random Walk with Restart for multi-path relatedness between a "Q1 disease" and a genetic condition (OMIM).
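For reference, here is the textbook formulation of Random Walk with Restart as power iteration: p <- (1 - r) * W p + r * e_seed. Here W is the column-normalized adjacency, and e_seed puts all restart mass on the seed (for example, the Q1 disease). This is a generic pure-Python sketch, not the team's implementation. The adjacency-list input and the choice to send mass from dangling nodes back to the seed are assumptions. The converged probability at an OMIM node could then serve as a multi-path relatedness score.

```python
from typing import Dict, List


def random_walk_with_restart(adj: Dict[str, List[str]], seed: str,
                             restart_prob: float = 0.15, tol: float = 1e-10,
                             max_iter: int = 1000) -> Dict[str, float]:
    """Stationary visiting probabilities of a walker that restarts at `seed`.

    `adj` is an undirected adjacency list; every neighbour must also be a key.
    """
    nodes = list(adj)
    p = {n: 0.0 for n in nodes}
    p[seed] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in nodes}
        for node, prob in p.items():
            neighbours = adj[node]
            if not neighbours:
                # Dangling node: return its walking mass to the seed.
                nxt[seed] += (1 - restart_prob) * prob
                continue
            share = (1 - restart_prob) * prob / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        nxt[seed] += restart_prob  # restart mass (total probability stays 1)
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p
```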

Copied from original issue: dkoslicki/NCATS#98

node1 not defined bug

add ChEMBL drug->target probability scores for target predictions to the KG

From @saramsey on January 10, 2018 1:10

Copied from original issue: dkoslicki/NCATS#133

Node and relationship properties

@saramsey Here is a prioritized list of node types and relationship types for which it's desirable to explore what sorts of properties we can pull from various KS's:

Nodes:

Anatomy
phenotype
microRNA
pathway
protein
disease

(source node type, relationship type, target node type):

(phenotype, phenotype assoc with, anatomy) example desired property: what sort of association?
(phenotype, phenotype assoc with, disease) example desired property: probability/prevalence in population (since some phenotypes seem to be very rare to me and it would be good to have this indicated)
(protein, is expressed in, anatomy)
(protein, gene assoc with, disease) example desired property: what sort of association is this? What's the strength of the association?
(microRNA, gene assoc with, disease)
(microRNA, gene assoc with, phenotype)
(microRNA, is expressed in, anatomy)
(microRNA, controls expression of, protein) example desired property: what sort of control is exerted? What's the strength?

If for each of these you let me know what sort of data is being returned by the KS, I can take a look and we can pick/choose which properties to include.

Need to get restatedQuestion returned from translate()

@dkoslicki In order for the restatedQuestion to be properly filled in for query results, the translate() function needs to return that information. It did under the old system, but no longer does.
The relevant code is at line 669 (or search for FIXME).

I can't figure out how to get the restated question into that slot. Would you fix that?
Do a git pull in NewStdAPI first.

Some protein nodes are missing descriptions

E.g., P08563

look into cy2neo for neo4j visualization

APOC on all neo4j instances

@saramsey It would be quite handy (especially for the new standardized KG) to have APOC installed. @finnagin got this to work for one instance (and noted a few workarounds that needed to be done with missing libraries). Can we get APOC on all neo4j instances from here on out?

This document seems relevant.

move KG construction code to a folder in the repo

Neo4j node names are a mix of identifiers and real names

For example, disont_disease names are DOID's while the descriptions are human readable. In contrast, pharos_drugs have human readable names, and CHEMBL descriptions.

I vote we standardize on identifiers for names and human-readable text for descriptions. Or, perhaps even better, a more uniform node property naming convention such as:
ID: identifier (DOID, CHEMBL, etc.)
name: human readable name
description: pulled from MESH
etc.

Connection issue for Neo4j on ncats.saramsey.org

From @saramsey on January 6, 2018 0:7

Q0 and Q3 are now working, but I can’t get Q1 and Q2 to work. I futzed around with getting Q1 to work. It looks like all the neo4j connection commands were hard-wired to lysine and they weren’t working, so I changed them all to localhost. Should that work? But now I get another connection error.

cd /mnt/data/orangeboard/code/NCATS/code/reasoningtool
grep lysine *.py
(now mostly commented out and replaced with localhost)
python3 Q1Solution.py
…… lots of errors ending with:
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
ideas?
I suspect it can’t connect to neo4j, but not sure…

Copied from original issue: dkoslicki/NCATS#128

Speeding up QueryNCBIeUtils

Just a note that when you query two MeSH terms, the response gives both the joint count and the marginals.
For example, with https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%28%22malaria%22%5BMeSH%20Terms%5D%29%20AND%20%22osteoarthritis%22%5BMeSH%20Terms%5D&retmode=json&retmax=1000
you get back three counts, corresponding to the joint count and the two marginals:

{
    "header": {
        "type": "esearch",
        "version": "0.3"
    },
    "esearchresult": {
        "count": "4",
        "retmax": "4",
        "retstart": "0",
        "idlist": [
            "16425715",
            "11775318",
            "3300734",
            "5982767"
        ],
        "translationset": [
        ],
        "translationstack": [
            {
                "term": "\"malaria\"[MeSH Terms]",
                "field": "MeSH Terms",
                "count": "60405",
                "explode": "Y"
            },
            {
                "term": "\"osteoarthritis\"[MeSH Terms]",
                "field": "MeSH Terms",
                "count": "54506",
                "explode": "Y"
            },
            "AND"
        ],
        "querytranslation": "\"malaria\"[MeSH Terms] AND \"osteoarthritis\"[MeSH Terms]"
    }
}
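One request already returns all three numbers, so a single esearch call per pair could replace three separate hit-count queries. Here is a minimal sketch of pulling the numbers out of a response like the one above. joint_and_marginal_counts is a hypothetical helper. It assumes exactly two term entries in translationstack; free-text terms may expand into more entries, and in that case the helper raises rather than guess.

```python
import json
from typing import Tuple


def joint_and_marginal_counts(esearch_json: str) -> Tuple[int, int, int]:
    """Return (joint, marginal1, marginal2) hit counts from a two-term esearch response."""
    result = json.loads(esearch_json)["esearchresult"]
    joint = int(result["count"])
    # Operators such as "AND" appear as plain strings; only dict entries carry counts.
    term_entries = [e for e in result["translationstack"] if isinstance(e, dict)]
    if len(term_entries) != 2:
        raise ValueError("expected exactly two translated terms, got %d" % len(term_entries))
    return joint, int(term_entries[0]["count"]), int(term_entries[1]["count"])
```

On the response shown above, this yields (4, 60405, 54506).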

HTTP responses potentially mess up output format?

@edeutsch @saramsey I've noticed that we occasionally get the following printed out to screen when using QueryNCBIeUtils:

HTTP response status code: 502 for URL:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%28Van%20Buchem%20disease%20type%202%29%20AND%20%28Osteoporosis%5BMeSH%20Terms%5D%29&retmode=json&retmax=1000

This does not necessarily mean that the QXSolution.py will fail (since I will just then put in the max google distance), but I imagine that having something besides the JSON formatted response printed may mess up the UI/API. Will this be an issue? I don't know if requests is using stdout or stderr to print these as I'm not familiar with the requests package.
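One way to keep such diagnostics out of the JSON output is to send them to stderr, so stdout carries only the JSON. This assumes the message is printed by QueryNCBIeUtils itself with a plain print(); requests does not normally print status messages on its own. report_http_error is a hypothetical helper that mirrors the message format above.

```python
import sys
from typing import Optional, TextIO


def report_http_error(status_code: int, url: str, stream: Optional[TextIO] = None) -> None:
    """Write an HTTP diagnostic to stderr (by default) so stdout carries only JSON."""
    # Resolve sys.stderr at call time so redirection (e.g. in tests) is honoured.
    stream = sys.stderr if stream is None else stream
    print("HTTP response status code: %d for URL:\n%s" % (status_code, url), file=stream)
```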

Geneprof 500 response

From @erikyao on January 12, 2018 18:59

When running test_geneprof_id_to_transcription_factor_gene_symbols of tests/QueryGeneProfTestCase, QueryGeneProf.geneprof_id_to_transcription_factor_gene_symbols sometimes returns an empty set due to GeneProf's 500 response.

Test code:

ret_set = QueryGeneProf.geneprof_id_to_transcription_factor_gene_symbols(16269)  # 'HMOX1'

Debugging message:

HTTP response status code: 500 for URL:
http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.target/human/16269.json?with-sample-info=true

Cause of 500 reported by Geneprof:

java.lang.OutOfMemoryError: unable to create new native thread

We may need to add exception-handling code in this case.
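Here is a minimal sketch of one form that exception handling could take: retry with exponential backoff, and raise once retries are exhausted instead of silently returning an empty set. It assumes a fetch callable that returns None on a transient failure such as GeneProf's 500. call_with_retries and its parameters are hypothetical. The sleep function is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(fetch: Callable[[], Optional[T]], attempts: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fetch` until it returns a non-None result, backing off between tries."""
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # exponential backoff: 1 s, 2 s, 4 s, ...
    raise RuntimeError("giving up after %d attempts" % attempts)
```

The key point is distinguishing None (the request failed) from an empty set (the request succeeded, but there are no regulators). Only the former is retried.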

Copied from original issue: dkoslicki/NCATS#134

need a method to convert MeSH tree number to MeSH UID

see issue #22 for the back-story

Why does BuildMasterKG.py use so much memory?

Is it possibly a requests-cache memory leak?

psf/requests#1685

RTXQuery checks only for drugs for Q3

@edeutsch
Hi Eric,

I noticed in RTXQuery.py, line 88 and onwards, there is the following code:

if id == 'Q3':
  targets = qph.query_drug_name_to_targets(terms[0])
  if targets:
    list = '<UL>\n'
    for target in targets:
      list += "<LI> "+target["name"]+"\n"
    list += "</UL>\n"
    codeString = "OK"
    result = [ { "id": 537, "code": 1, "codeString": codeString, "message": "AnswerFound", "result": targets, "text": [ terms[0]+" is known to target: "+list ] } ]
    self.logQuery(id,codeString,terms)
  else:
    codeString = "DrugNotFound"
    result = [ { "id": 537, "code": 11, "codeString": codeString, "message": "DrugNotFound", "text": [ "Unable to find drug '"+terms[0]+"'." ] } ]
    self.logQuery(id,codeString,terms)
  return(result);

Q3 as it stands now can allow for all different node types, so checking query_drug_name_to_targets isn't the best idea here. For example:

QT.find_question_parameters("what phenotype is phenotype associated with malaria")
Out[649]:
{'corpus_index': 3,
 'error_code': None,
 'error_message': None,
 'input_text': 'what phenotype is phenotype associated with malaria',
 'terms': {'relationship_type': 'phenotype_assoc_with',
  'source_name': 'DOID:12365',
  'target_label': 'phenont_phenotype'}}

And the node DOID:12365 has been found by QuestionTranslator.py (it checks whether nodes exist in the knowledge graph). So Q3Solution.py would run on these input terms, but RTXQuery doesn't let it, since it thinks it's only looking for drugs. Perhaps an error message like "Unknown term" or, more specifically, "entity not in knowledge graph" would be better here, along with removing the check for query_drug_name_to_targets.
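Here is a sketch of the suggested change. It assumes RTXQuery can ask whether a node exists in the KG, represented here by a node_exists callable. check_q3_source_node and the "EntityNotInKnowledgeGraph" code string are hypothetical; the id and code values are copied from the DrugNotFound branch above.

```python
from typing import Callable, Dict, List


def check_q3_source_node(source_name: str,
                         node_exists: Callable[[str], bool]) -> List[Dict]:
    """Return an error result if the Q3 source node is missing from the KG, else []."""
    if node_exists(source_name):
        return []  # any node type is fine; let Q3Solution.py answer
    code_string = "EntityNotInKnowledgeGraph"  # hypothetical code string
    return [{"id": 537, "code": 11, "codeString": code_string,
             "message": code_string,
             "text": ["Entity '%s' is not in the knowledge graph." % source_name]}]
```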

add MeSH definitions into the knowledge graph

From @saramsey on January 9, 2018 22:32

Copied from original issue: dkoslicki/NCATS#132

There are non-human proteins showing up in the KG

I've been looking at the COP for the treatment of osteoarthritis by naproxen, and it appears that one of the reasons we are not getting the correct answer is that some of the relevant nodes are not expanded. For example, the PTGS1 gene is a ("the relevant") target of naproxen, and yet:

match (n:uniprot_protein{description:"PTGS1"}) return n.expanded
False

And so I cannot connect PTGS1 to any relevant piece of anatomy:

match p=(s:pharos_drug{name:"naproxen"})-[:targets]-(:uniprot_protein{description:"PTGS1"})-[]-(:anatont_anatomy) return p limit 50
(no changes, no records)

Was there an error in the Orangeboard construction that caused this node not to be expanded? I had assumed that we had seeded with all drugs and expanded from there. Yet:

match p=(s:pharos_drug{name:"naproxen"})-[:targets]-(t:uniprot_protein) return t.name, t.description, t.expanded
╒════════╤═══════════════╤════════════╕
│"t.name"│"t.description"│"t.expanded"│
╞════════╪═══════════════╪════════════╡
│"P25101"│"EDNRA"        │false       │
├────────┼───────────────┼────────────┤
│"P0DJD9"│"PGA5"         │false       │
├────────┼───────────────┼────────────┤
│"P23219"│"PTGS1"        │false       │
├────────┼───────────────┼────────────┤
│"Q15722"│"LTB4R"        │false       │
├────────┼───────────────┼────────────┤
│"P11712"│"CYP2C9"       │true        │
├────────┼───────────────┼────────────┤
│"P09917"│"ALOX5"        │true        │
├────────┼───────────────┼────────────┤
│"O14842"│"FFAR1"        │false       │
├────────┼───────────────┼────────────┤
│"Q07869"│"PPARA"        │false       │
├────────┼───────────────┼────────────┤
│"O43174"│"CYP26A1"      │false       │
├────────┼───────────────┼────────────┤
│"P24530"│"EDNRB"        │false       │
├────────┼───────────────┼────────────┤
│"P37231"│"PPARG"        │true        │
├────────┼───────────────┼────────────┤
│"Q8TCC7"│"SLC22A8"      │false       │
├────────┼───────────────┼────────────┤
│"P49286"│"MTNR1B"       │false       │
├────────┼───────────────┼────────────┤
│"P08183"│"ABCB1"        │true        │
├────────┼───────────────┼────────────┤
│"P01375"│"TNF"          │true        │
├────────┼───────────────┼────────────┤
│"P00374"│"DHFR"         │true        │
├────────┼───────────────┼────────────┤
│"P08684"│"CYP3A4"       │true        │
├────────┼───────────────┼────────────┤
│"P35354"│"PTGS2"        │true        │
├────────┼───────────────┼────────────┤
│"P33261"│"CYP2C19"      │false       │
├────────┼───────────────┼────────────┤
│"P42330"│"AKR1C3"       │false       │
├────────┼───────────────┼────────────┤
│"P16473"│"TSHR"         │false       │
├────────┼───────────────┼────────────┤
│"Q16665"│"HIF1A"        │true        │
├────────┼───────────────┼────────────┤
│"P48039"│"MTNR1A"       │false       │
└────────┴───────────────┴────────────┘

Automatically add+expand nodes not in the KG

This was something we had discussed previously: recognizing when a node is absent from the KG and then populating it automatically. For example, it would be nice to have "DOID:9849" in the KG, but it's not currently there. This would require making the KG writable rather than read-only (which would be a risk), but the tradeoff is probably worth it.

Add protein->GeneOntology mapping

From @saramsey on November 18, 2017 4:25

can use BioLink for this, most likely

Copied from original issue: dkoslicki/NCATS#40

add_rel bug

HTTP response status code: 404 for URL:
https://reactome.org/ContentService/data/complexes/UniProt/O95751
HTTP response status code: 404 for URL:
http://www.geneprof.org/GeneProf/api/gene.info/gp.id/human/C_NAME/LDOC1.json
https://api.monarchinitiative.org/api/bioentity/phenotype/HP:0010167/anatomy
Status code 500 for url: https://api.monarchinitiative.org/api/bioentity/phenotype/HP:0010167/anatomy
HTTP response status code: 404 for URL:
http://www.geneprof.org/GeneProf/api/gene.info/gp.id/human/C_NAME/WIPF1.json
HTTP response status code: 404 for URL:
https://reactome.org/ContentService/data/complexes/UniProt/O95865
HTTP response status code: 404 for URL:
http://www.geneprof.org/GeneProf/api/gene.info/gp.id/human/C_NAME/DDAH2.json
HTTP response status code: 404 for URL:
https://reactome.org/ContentService/data/complexes/UniProt/A0A087WUI6
HTTP response status code: 404 for URL:
http://www.geneprof.org/GeneProf/api/gene.info/gp.id/human/C_NAME/PIBF1.json
HTTP response status code: 404 for URL:
https://reactome.org/ContentService/data/complexes/UniProt/Q86YD7
Number of rels: 604000; elapsed time: 14748.08 s
HTTP response status code: 404 for URL:
http://www.geneprof.org/GeneProf/api/gene.info/gp.id/human/C_NAME/UBA52.json
Traceback (most recent call last):
  File "BuildMasterKG.py", line 371, in <module>
    running_time = timeit.timeit(lambda: run_function(), number=1)
  File "/usr/lib/python3.5/timeit.py", line 213, in timeit
    return Timer(stmt, setup, timer, globals).timeit(number)
  File "/usr/lib/python3.5/timeit.py", line 178, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
  File "BuildMasterKG.py", line 371, in <lambda>
    running_time = timeit.timeit(lambda: run_function(), number=1)
  File "BuildMasterKG.py", line 337, in make_master_kg
    seed_and_expand_kg_q2(num_expansions=3)
  File "BuildMasterKG.py", line 250, in seed_and_expand_kg_q2
    bne.expand_all_nodes()
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 343, in expand_all_nodes
    self.expand_node(node)
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 331, in expand_node
    expand_method(self)
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 82, in expand_ncbigene_microrna
    self.add_rel('participates_in', 'gene_ontology', node1, node2)
AttributeError: 'BioNetExpander' object has no attribute 'add_rel'

Add code to map variant<->disease and variant<->gene, for Q1

From @saramsey on November 24, 2017 15:40

Copied from original issue: dkoslicki/NCATS#65

Update KG to comply with the emerging Translator KG spec

Here is a proposed minimal integration:
https://docs.google.com/spreadsheets/d/1zXitcR1QjHyh6WocukgshSR7IoAVg7MJQG-HNh96Jec/edit#gid=3366698

Here is a proposed maximal integration:
https://docs.google.com/spreadsheets/d/1zXitcR1QjHyh6WocukgshSR7IoAVg7MJQG-HNh96Jec/edit#gid=421374962

Merge NewStdAPI to master

This is a placeholder issue to remind us to merge NewStdAPI to master after #41 is complete.

Investigate if the disease->phenotype mapping should be filtered

From @saramsey on November 17, 2017 20:15

We're getting a huge expansion when we map disease->phenotype. Are some of these super-general phenotypes that are high on the phenotype ontology tree? Not sure if that is a problem.

Copied from original issue: dkoslicki/NCATS#36

Web service should return reusable (structured) results

From @edeutsch on November 27, 2017 17:31

At the moment, the web service returns text blobs that are human readable and (reasonably) nicely displayable in the UI, but they are not really nicely structured data in a way that 3rd party software would like. We should upgrade to return both.

Copied from original issue: dkoslicki/NCATS#116

switch KG to using MONDO instead of Disease Ontology

do this in the same branch where the rest of the KG development work will be done

Q1 error

python3 Q1Solution.py -j -i 'DOID:12365'
Traceback (most recent call last):
  File "Q1Solution.py", line 270, in <module>
    main()
  File "Q1Solution.py", line 259, in main
    res = answerQ1(disease, directed=directed, max_path_len=max_path_len, verbose=verbose, use_json=use_json)
  File "Q1Solution.py", line 185, in answerQ1
    omims = Q1Utils.refine_omims_well_studied(omims, doid, omim_to_mesh, q1_doid_to_mesh, verbose=verbose)
  File "/mnt/data/orangeboard/production/RTX/code/reasoningtool/QuestionAnswering/Q1Utils.py", line 489, in refine_omims_well_studied
    res = QueryNCBIeUtils.QueryNCBIeUtils.normalized_google_distance(omim_mesh, q1_doid_to_mesh[doid])
TypeError: unhashable type: 'list'

map from ClinVar UID to a MeSH UID

Originally proposed by @dkoslicki:

Related question: is there any way to convert between clinvarid and meshid/term? I see get_clinvar_uids_for_disease_or_phenotype_string, but I don't see any method to convert clinvar to mesh.

Steve says:

It appears not, but it looks like we can connect from ClinVar to OrphaNet, and from OrphaNet we can get a MeSH ID.

seed nodes for Q2 COP in the KG

New question 4 ready for UI integration

@saramsey @edeutsch To test the robustness of our new integration scheme, I've implemented a new question type: "What diseases are similar to X?", where X is a disease (see commits 5d9ab3e and a5f6e36). I've created this in a new branch (called Q4). After we've merged NewStdAPI into master, we can see how straightforward it is to integrate this new question type into the UI, and then merge Q4 into master as well. Then we'll be halfway to the minimum viable product question count!

As for details of the implementation, after a bit of Cypher jiggering (8a70291 and a72ac90), I am able to count the number of nodes (of a certain type) that are shared between any two nodes in the graph. This allows me to compute the Jaccard index of the phenotype sets of two diseases (the number of phenotypes they share divided by the number of distinct phenotypes across both), and to give an informative error if a disease has no phenotypes but its parent does. I then return all diseases whose Jaccard index is large enough (set by the --threshold parameter, default 0.20).

@edeutsch This question again returns each disease as a separate node (as in #41). I can change this if requested.
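The sketch below shows the thresholding step in plain Python over in-memory phenotype sets. In the Q4 branch, the shared counts come from Cypher, so jaccard and similar_diseases are illustrative names only; the 0.20 default is the only value taken from the text.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_diseases(query: str, phenotypes: Dict[str, Set[str]],
                     threshold: float = 0.20) -> List[Tuple[str, float]]:
    """Diseases whose phenotype sets have Jaccard index >= threshold with `query`'s."""
    if not phenotypes.get(query):
        raise ValueError("%s has no phenotypes in the KG" % query)
    target = phenotypes[query]
    scored = [(d, jaccard(target, p)) for d, p in phenotypes.items() if d != query]
    # Keep only sufficiently similar diseases, most similar first.
    return sorted([s for s in scored if s[1] >= threshold], key=lambda s: -s[1])
```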

Add code for Timeout handling to QueryReactome

From @saramsey on November 27, 2017 16:41

Copied from original issue: dkoslicki/NCATS#115

Write QueryJASPAR class to query JASPAR's REST API

From @saramsey on December 9, 2017 1:13

The article describing JASPAR's REST API is on bioRxiv:

https://www.biorxiv.org/content/early/2017/07/06/160184

Copied from original issue: dkoslicki/NCATS#126

Problem parsing JSON in RTXQuery

@edeutsch I'm trying to get Q2 hooked up with RTXQuery using the new std API. However, even though it produces the same kind of output as Q3 (which works with RTXQuery), namely printed JSON text, there seems to be an issue with parsing it:

python3 ../../UI/OpenAPI/python-flask-server/RTXQuery.py 
python3 Q2Solution.py -r 'physostigmine' -d 'DOID:1686'
Traceback (most recent call last):
  File "../../UI/OpenAPI/python-flask-server/RTXQuery.py", line 119, in <module>
    if __name__ == "__main__": main()
  File "../../UI/OpenAPI/python-flask-server/RTXQuery.py", line 114, in main
    result = rtxq.query(query)
  File "../../UI/OpenAPI/python-flask-server/RTXQuery.py", line 69, in query
    response = json.loads(returnedText)
  File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'CompletedProcess'

Note that python3 Q2Solution.py -r 'physostigmine' -d 'DOID:1686' executes without issue on its own. I assume this is a problem with the return value of Q2, but I can't see how it differs from the working Q3.
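The error says returnedText is a subprocess.CompletedProcess rather than a string. This suggests that the result of subprocess.run() was passed straight to json.loads(). Below is a minimal sketch of capturing the printed text instead, assuming the solver is launched with subprocess.run. run_and_parse_json is a hypothetical helper. It uses universal_newlines=True because the text= alias requires Python 3.7, while the traceback shows Python 3.5.

```python
import json
import subprocess
from typing import Any, List


def run_and_parse_json(cmd: List[str]) -> Any:
    """Run `cmd`, capture its stdout as text, and parse it as JSON.

    subprocess.run() returns a CompletedProcess; the printed text lives in its
    .stdout attribute, and only if stdout=subprocess.PIPE was requested.
    """
    completed = subprocess.run(cmd, stdout=subprocess.PIPE,
                               universal_newlines=True, check=True)
    return json.loads(completed.stdout)
```

Anything else the solver prints to stdout, such as HTTP status diagnostics, will still break json.loads, so such messages belong on stderr.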

QueryNCBIeUtils error handling

Apparently QueryNCBIeUtils doesn't have sufficient error handling:

>>>python3 Q2Solution.py -r mupirocin -d 'DOID:1563'
HTTP timeout in QueryNCBIeUtils.py; URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%28CASP1_HUMAN%7CCaspase-1%20%5BMeSH%20Terms%5D%7CCASP-1%5BMeSH%20Terms%5D%7CEC%203.4.22.36%5BMeSH%20Terms%5D%7CInterleukin-1%20beta%20convertase%5BMeSH%20Terms%5D%7CIL-1BC%5BMeSH%20Terms%5D%7CInterleukin-1%20beta-converting%20enzyme%5BMeSH%20Terms%5D%7CICE%5BMeSH%20Terms%5D%7CIL-1%20beta-converting%20enzyme%5BMeSH%20Terms%5D%7Cp45%7C%20%5BCleaved%20into%3A%20Caspase-1%20subunit%20p20%3B%20Caspase-1%20subunit%20p10%5D%7CCASP1%5BMeSH%20Terms%5D%7CIL1BC%5BMeSH%20Terms%5D%7CIL1BCE%5BMeSH%20Terms%5D%29%20AND%20%28Deubiquitination%29&retmode=json&retmax=1000
Traceback (most recent call last):
  File "Q2Solution.py", line 223, in <module>
    main()
  File "Q2Solution.py", line 218, in main
    res = answerQ2(drug, disease, k)
  File "Q2Solution.py", line 128, in answerQ2
    mesh1=False, mesh2=False)
  File "/home/dkoslicki/Dropbox/Repositories/RTX/code/reasoningtool/QueryNCBIeUtils.py", line 243, in normalized_google_distance
    numerator = max(math.log(ni), math.log(nj)) - math.log(nij)
TypeError: a float is required

I assume the way to go is either:

Try again if an error is encountered
or throw an exception (so downstream code doesn't try to operate on garbage, which is what happened above).
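Here is a sketch of the "throw an exception" option. The numerator matches line 243 in the traceback above. The denominator is the standard normalized Google distance definition, with n_total (the number of indexed PubMed articles) supplied by the caller as an assumption. HitCountError and normalized_google_distance_from_counts are hypothetical names. A caller that wants the current behaviour can catch the exception and substitute the maximum distance.

```python
import math
from typing import Optional


class HitCountError(Exception):
    """Raised when a PubMed hit count could not be obtained (e.g. HTTP timeout)."""


def normalized_google_distance_from_counts(ni: Optional[int], nj: Optional[int],
                                           nij: Optional[int], n_total: int) -> float:
    """NGD = (max(log ni, log nj) - log nij) / (log N - min(log ni, log nj))."""
    for name, value in (("ni", ni), ("nj", nj), ("nij", nij)):
        if value is None:
            # Fail loudly instead of letting math.log() choke on garbage later.
            raise HitCountError("%s is missing; cannot compute NGD" % name)
    if nij == 0:
        return math.inf  # the two terms never co-occur
    numerator = max(math.log(ni), math.log(nj)) - math.log(nij)
    denominator = math.log(n_total) - min(math.log(ni), math.log(nj))
    return numerator / denominator
```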

have BuildMasterKG load a single TSV file of "seed nodes"
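Here is a minimal sketch of reading such a file with the csv module. It assumes a header row with node_type, curie_id, and description columns. The column names and SeedNode are hypothetical, since the issue does not specify the seed file layout.

```python
import csv
from typing import List, NamedTuple, TextIO


class SeedNode(NamedTuple):
    node_type: str    # e.g. a KG label such as 'disont_disease'
    curie_id: str     # e.g. 'DOID:1686'
    description: str


def load_seed_nodes(fh: TextIO) -> List[SeedNode]:
    """Read seed nodes from a tab-separated file with a header row."""
    reader = csv.DictReader(fh, delimiter="\t")
    return [SeedNode(row["node_type"], row["curie_id"], row["description"])
            for row in reader]
```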

bug in QueryReactome

https://reactome.org/ContentService/interactors/static/molecule/P35346/details
Traceback (most recent call last):
  File "BuildMasterKG.py", line 371, in <module>
    running_time = timeit.timeit(lambda: run_function(), number=1)
  File "/usr/lib/python3.5/timeit.py", line 213, in timeit
    return Timer(stmt, setup, timer, globals).timeit(number)
  File "/usr/lib/python3.5/timeit.py", line 178, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
  File "BuildMasterKG.py", line 371, in <lambda>
    running_time = timeit.timeit(lambda: run_function(), number=1)
  File "BuildMasterKG.py", line 337, in make_master_kg
    seed_and_expand_kg_q2(num_expansions=3)
  File "BuildMasterKG.py", line 250, in seed_and_expand_kg_q2
    bne.expand_all_nodes()
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 343, in expand_all_nodes
    self.expand_node(node)
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 331, in expand_node
    expand_method(self)
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 220, in expand_uniprot_protein
    int_dict = QueryReactome.query_uniprot_id_to_interacting_uniprot_ids_desc(uniprot_id_str)
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/QueryReactome.py", line 154, in query_uniprot_id_to_interacting_uniprot_ids_desc
    res = QueryReactome.send_query_get('interactors/static/molecule', uniprot_id + '/details').json()
AttributeError: 'NoneType' object has no attribute 'json'
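The traceback shows QueryReactome.send_query_get returning None, presumably after an HTTP error. The caller then immediately calls .json() on that None. Below is a sketch of guarding that call. The send_query_get parameter stands in for the real method, and interactors_or_empty is hypothetical. Returning an empty dict rather than raising is one design choice among several.

```python
from typing import Any, Callable, Dict, Optional


def interactors_or_empty(send_query_get: Callable[[str, str], Optional[Any]],
                         uniprot_id: str) -> Dict[str, Any]:
    """Return the parsed JSON for a Reactome interactor query, or {} on failure.

    `send_query_get` is assumed to return a response object with a .json()
    method on success and None on an HTTP error or timeout.
    """
    res = send_query_get("interactors/static/molecule", uniprot_id + "/details")
    if res is None:
        return {}  # treat an unreachable service like "no interactors found"
    return res.json()
```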

Identifiers to MeSH terms

I am trying to improve our answers to the Q2 COP's. To do this, I need the Google distance to work on all nodes in our KG. Currently, in QueryNCBIeUtils.py, there is a way to get mesh term ID's from OMIM ID's and from disease names, but I also need to get mesh term ID's from:

Uniprot protein identifiers (eg: P23219).
UBERON ID's

I don't think I need it for phenotypes since the KG descriptions are all showing up in MeSH as far as I can tell.

The need stems from the fact that currently the Google distance doesn't work on "P23219" and "Prostaglandin G/H synthase 1" (one of its synonyms), but does work for another synonym "Cyclooxygenase-1".

I tried poking around on the uniprot API, and the only way I could see to do this is to download:
https://www.uniprot.org/uniprot/P23219.xml (or https://www.uniprot.org/uniprot/P23219.txt)
and then pull off all the names and test each one with QueryNCBIeUtils.is_mesh_term. But I assume there's a much better way to do this via an API call.
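Here is a rough sketch of the "pull off all the names" step, working on the text (.txt) flat file rather than the XML. In that format, UniProt lists protein names on DE lines as Full= and Short= values under RecName and AltName. protein_names_from_uniprot_txt is a hypothetical helper, and each name it returns could then be checked with QueryNCBIeUtils.is_mesh_term. It does not collect gene symbols from the GN line.

```python
import re
from typing import List

# Value after "Full=" or "Short=", up to ';' or the start of an evidence tag '{'.
_NAME_RE = re.compile(r"(?:Full|Short)=([^;{]+)")


def protein_names_from_uniprot_txt(flatfile: str) -> List[str]:
    """Collect unique protein names, in order, from the DE lines of a UniProt entry."""
    names = []
    for line in flatfile.splitlines():
        if line.startswith("DE   "):
            for match in _NAME_RE.finditer(line):
                name = match.group(1).strip()
                if name and name not in names:
                    names.append(name)
    return names
```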

Check results of Q2Solution.py

From @dkoslicki on November 26, 2017 23:9

Since I don't have any "ground truth" for these, those with more bio knowledge, please make sure they look plausible/are correct.

You can run it yourself on all Q2 drugs/diseases via:

git pull
python3 Q2Solution.py -a

Copied from original issue: dkoslicki/NCATS#107
