
rtx's People

Contributors

acevedol, amykglen, arnabdotorg, chunyuma, dependabot[bot], deqingqu, dkoslicki, ecwood, edeutsch, erikyao, finnagin, flashkicker, isbluis, jlmcclelland, kvarforl, kvnthomas98, lianghuang3, mapleknight, meghasin, oacob1, oliphs, pahmadi8740, palzer, rcpeene, rtx-travis-tester, rtxci, saramsey, sundareswarpullela, veronicaflores, zheng-liu

rtx's Issues

Automatically add+expand nodes not in the KG

This is something we discussed previously: recognizing when a node is absent from the KG and then populating it automatically. For example, it would be nice to have "DOID:9849" in the KG, but it is not currently there. This would require making the KG writable rather than read-only (which carries some risk), but the tradeoff is probably worth it.
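A minimal sketch of the check-then-populate step, assuming a dict-like store standing in for the Neo4j KG; ensure_node and fetch_and_expand are hypothetical names, not existing RTX functions:

```python
def ensure_node(kg, node_id, fetch_and_expand):
    """Return kg[node_id], creating the node first if the KG lacks it.

    kg: dict-like mapping of node ID -> node properties (stand-in for the KG).
    fetch_and_expand: hypothetical callable that queries the knowledge
    sources for node_id and returns the new node's properties.
    """
    if node_id not in kg:
        # This write is the step that requires the KG to be writable.
        kg[node_id] = fetch_and_expand(node_id)
    return kg[node_id]
```

The knowledge sources are only queried on a miss, so repeated lookups of the same ID cost nothing extra.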

Check results of Q2Solution.py

From @dkoslicki on November 26, 2017 23:9

Since I don't have any "ground truth" for these, could those with more biology knowledge please check that the results look plausible and correct?

You can run it yourself on all Q2 drugs/diseases via:

git pull
python3 Q2Solution.py -a

Copied from original issue: dkoslicki/NCATS#107

Node and relationship properties

@saramsey Here is a prioritized list of node types and relationship types for which it would be worth exploring what properties we can pull from the various knowledge sources (KSs):

Nodes:

  1. anatomy
  2. phenotype
  3. microRNA
  4. pathway
  5. protein
  6. disease

(source node type, relationship type, target node type):

  1. (phenotype, phenotype assoc with, anatomy) example desired property: what sort of association?
  2. (phenotype, phenotype assoc with, disease) example desired property: probability/prevalence in population (since some phenotypes seem to be very rare to me and it would be good to have this indicated)
  3. (protein, is expressed in, anatomy)
  4. (protein, gene assoc with, disease) example desired property: what sort of association is this? What's the strength of the association?
  5. (microRNA, gene assoc with, disease)
  6. (microRNA, gene assoc with, phenotype)
  7. (microRNA, is expressed in, anatomy)
  8. (microRNA, controls expression of, protein) example desired property: what sort of control is exerted? What's the strength?

If you let me know what sort of data the KS returns for each of these, I can take a look and we can choose which properties to include.

Need to get restatedQuestion returned from translate()

@dkoslicki In order for the restatedQuestion to be properly filled in for query results, the translate() function needs to return that information. It did under the old system, but no longer does.
The relevant code is at line 669 (or search for FIXME).

I can't figure out how to get the restated question into that slot. Would you fix that?
Please do a git pull in NewStdAPI first.

Problem parsing JSON in RTXQuery

@edeutsch I'm trying to hook Q2 up to RTXQuery using the new std API. Q2 prints the same kind of output as Q3 (which does work with RTXQuery), namely JSON text, yet RTXQuery fails to parse it:

python3 ../../UI/OpenAPI/python-flask-server/RTXQuery.py 
python3 Q2Solution.py -r 'physostigmine' -d 'DOID:1686'
Traceback (most recent call last):
  File "../../UI/OpenAPI/python-flask-server/RTXQuery.py", line 119, in <module>
    if __name__ == "__main__": main()
  File "../../UI/OpenAPI/python-flask-server/RTXQuery.py", line 114, in main
    result = rtxq.query(query)
  File "../../UI/OpenAPI/python-flask-server/RTXQuery.py", line 69, in query
    response = json.loads(returnedText)
  File "/usr/lib/python3.5/json/__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'CompletedProcess'

Note that python3 Q2Solution.py -r 'physostigmine' -d 'DOID:1686' runs without issue on its own. I assume the problem is in what Q2 returns, but I can't see how it differs from the working Q3.
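The traceback says json.loads received a CompletedProcess, which is the object subprocess.run() returns, rather than the text it printed. A minimal sketch of the likely fix, assuming RTXQuery launches the solution script with subprocess.run() (run_solution_json is a hypothetical name): capture stdout and parse the .stdout text.

```python
import json
import subprocess


def run_solution_json(args):
    """Run a QXSolution.py-style command and parse its stdout as JSON.

    subprocess.run() returns a CompletedProcess; the printed JSON lives in its
    .stdout attribute, which is only populated when stdout=PIPE is passed.
    """
    completed = subprocess.run(args, stdout=subprocess.PIPE,
                               universal_newlines=True,  # decode bytes to str
                               check=True)  # raise if the script exits non-zero
    return json.loads(completed.stdout)
```

universal_newlines=True works on the Python 3.5 used here; the newer text=True spelling needs Python 3.7.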

HTTP responses potentially mess up output format?

@edeutsch @saramsey I've noticed that we occasionally get the following printed out to screen when using QueryNCBIeUtils:

HTTP response status code: 502 for URL:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%28Van%20Buchem%20disease%20type%202%29%20AND%20%28Osteoporosis%5BMeSH%20Terms%5D%29&retmode=json&retmax=1000

This does not necessarily mean that QXSolution.py will fail (in that case I just substitute the maximum Google distance), but I imagine that printing anything besides the JSON-formatted response may break the UI/API. Will this be an issue? I'm not familiar with the requests package, so I don't know whether these messages go to stdout or stderr.
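If, as the message format suggests, these lines come from our own query code rather than from requests itself (unverified), one option is to send such diagnostics to stderr so that stdout carries only the JSON answer. A minimal sketch; report_http_error is a hypothetical helper:

```python
import sys


def report_http_error(status_code, url, stream=None):
    """Print an HTTP diagnostic without polluting the JSON on stdout.

    stream defaults to sys.stderr, looked up at call time so that
    redirection (e.g. contextlib.redirect_stderr) still applies.
    """
    target = stream if stream is not None else sys.stderr
    print("HTTP response status code: {} for URL:\n{}".format(status_code, url),
          file=target)
```

The UI/API then reads stdout alone, while the messages remain visible in the terminal or server log.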

Error running Q2

Yesterday Q2 was working, but now it is not. Maybe the attempted fix for Q1 broke Q2?

python3 Q2Solution.py -r 'DOID:1686' -d 'physostigmine'
Traceback (most recent call last):
  File "Q2Solution.py", line 187, in <module>
    main()
  File "Q2Solution.py", line 184, in main
    res = answerQ2(drug, disease, k)
  File "Q2Solution.py", line 70, in answerQ2
    RU.weight_graph_with_google_distance(g)
  File "/mnt/data/orangeboard/test/RTX/code/reasoningtool/QuestionAnswering/ReasoningUtilities.py", line 693, in weight_graph_with_google_distance
    gd_temp = QueryNCBIeUtils.normalized_google_distance(source_mesh_term, target_mesh_term, mesh1=mesh1, mesh2=mesh2)
  File "/mnt/data/orangeboard/test/RTX/code/reasoningtool/QueryNCBIeUtils.py", line 232, in normalized_google_distance
    nij = QueryNCBIeUtils.get_pubmed_hits_count('({mesh1}) AND ({mesh2})'.format(mesh1=mesh1_str_decorated,
UnboundLocalError: local variable 'mesh1_str_decorated' referenced before assignment
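The error means mesh1_str_decorated is only assigned on some code paths. A minimal sketch of the pattern that avoids this, assuming the decoration simply adds a [MeSH Terms] qualifier when the mesh flag is set (the helper names and the exact decoration are assumptions, not the current QueryNCBIeUtils code):

```python
def decorate_term(term, is_mesh):
    """Return the PubMed query fragment for one term.

    Every branch assigns `decorated`, so it can never be referenced
    before assignment (the UnboundLocalError above).
    """
    if is_mesh:
        decorated = '"{}"[MeSH Terms]'.format(term)
    else:
        decorated = term
    return decorated


def build_joint_query(term1, term2, mesh1=True, mesh2=True):
    """Build the '(term1) AND (term2)' query used for the joint hit count."""
    return '({}) AND ({})'.format(decorate_term(term1, mesh1),
                                  decorate_term(term2, mesh2))
```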

Speeding up QueryNCBIeUtils

Just a note: when you query two MeSH terms, esearch returns both the joint count and the two marginal counts.
For example, with https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%28%22malaria%22%5BMeSH%20Terms%5D%29%20AND%20%22osteoarthritis%22%5BMeSH%20Terms%5D&retmode=json&retmax=1000
you get back three counts, corresponding to the joint and the two marginals:

{
    "header": {
        "type": "esearch",
        "version": "0.3"
    },
    "esearchresult": {
        "count": "4",
        "retmax": "4",
        "retstart": "0",
        "idlist": [
            "16425715",
            "11775318",
            "3300734",
            "5982767"
        ],
        "translationset": [
        ],
        "translationstack": [
            {
                "term": "\"malaria\"[MeSH Terms]",
                "field": "MeSH Terms",
                "count": "60405",
                "explode": "Y"
            },
            {
                "term": "\"osteoarthritis\"[MeSH Terms]",
                "field": "MeSH Terms",
                "count": "54506",
                "explode": "Y"
            },
            "AND"
        ],
        "querytranslation": "\"malaria\"[MeSH Terms] AND \"osteoarthritis\"[MeSH Terms]"
    }
}
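A minimal sketch of reading all three numbers from one esearch reply, which would replace three separate requests with one. It assumes a two-term AND query, so that translationstack holds two term entries followed by the "AND" operator string; counts_from_esearch is a hypothetical name.

```python
import json


def counts_from_esearch(response_text):
    """Return (joint, marginal_1, marginal_2) hit counts from one esearch JSON reply.

    Assumes a two-term "A AND B" query: esearchresult.count is the joint
    count and translationstack holds one dict per term plus the "AND" string.
    """
    result = json.loads(response_text)["esearchresult"]
    joint = int(result["count"])
    # Keep only the per-term dicts; operator entries like "AND" are plain strings.
    marginals = [int(entry["count"])
                 for entry in result.get("translationstack", [])
                 if isinstance(entry, dict)]
    if len(marginals) != 2:
        raise ValueError("expected 2 per-term counts, got {}".format(len(marginals)))
    return joint, marginals[0], marginals[1]
```

If a term is not translated (translationstack missing or shorter), the function raises, so the caller can fall back to separate queries.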

Identifiers to MeSH terms

I am trying to improve our answers to the Q2 COPs. To do this, I need the Google distance to work on all nodes in our KG. Currently, QueryNCBIeUtils.py can get MeSH term IDs from OMIM IDs and from disease names, but I also need to get MeSH term IDs from:

  1. UniProt protein identifiers (e.g., P23219)
  2. UBERON IDs

I don't think I need it for phenotypes since the KG descriptions are all showing up in MeSH as far as I can tell.

The need stems from the fact that currently the Google distance doesn't work on "P23219" and "Prostaglandin G/H synthase 1" (one of its synonyms), but does work for another synonym "Cyclooxygenase-1".

I tried poking around the UniProt API, and the only way I could see to do this is to download:
https://www.uniprot.org/uniprot/P23219.xml (or https://www.uniprot.org/uniprot/P23219.txt)
and then pull out all the names and check each one with QueryNCBIeUtils.is_mesh_term. But I assume there's a much better way to do this via an API call.
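For the flat-file route described above, a minimal sketch that collects the Full= and Short= names from the DE lines of a UniProt .txt entry and filters them through a MeSH check. It assumes the standard UniProt flat-file layout (DE lines such as "DE   RecName: Full=...;", possibly followed by {ECO:...} evidence tags) and that is_mesh_term takes a string and returns a bool; both function names below are hypothetical.

```python
import re

# Name after "Full=" or "Short=", up to ";" or the start of an evidence tag "{".
_NAME_RE = re.compile(r'\b(?:Full|Short)=([^;{]+)')


def uniprot_names_from_flatfile(text):
    """Collect protein names, in order and without duplicates, from DE lines."""
    names = []
    for line in text.splitlines():
        if not line.startswith("DE "):
            continue
        for match in _NAME_RE.finditer(line):
            name = match.group(1).strip()
            if name and name not in names:
                names.append(name)
    return names


def mesh_candidates(text, is_mesh_term):
    """Keep only the names that is_mesh_term (e.g. QueryNCBIeUtils.is_mesh_term) accepts."""
    return [name for name in uniprot_names_from_flatfile(text) if is_mesh_term(name)]
```

For P23219 this would try "Prostaglandin G/H synthase 1" and "Cyclooxygenase-1" in turn, so the synonym that works in MeSH can be found automatically.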

add_rel bug

HTTP response status code: 404 for URL:
https://reactome.org/ContentService/data/complexes/UniProt/O95751
HTTP response status code: 404 for URL:
http://www.geneprof.org/GeneProf/api/gene.info/gp.id/human/C_NAME/LDOC1.json
https://api.monarchinitiative.org/api/bioentity/phenotype/HP:0010167/anatomy
Status code 500 for url: https://api.monarchinitiative.org/api/bioentity/phenotype/HP:0010167/anatomy
HTTP response status code: 404 for URL:
http://www.geneprof.org/GeneProf/api/gene.info/gp.id/human/C_NAME/WIPF1.json
HTTP response status code: 404 for URL:
https://reactome.org/ContentService/data/complexes/UniProt/O95865
HTTP response status code: 404 for URL:
http://www.geneprof.org/GeneProf/api/gene.info/gp.id/human/C_NAME/DDAH2.json
HTTP response status code: 404 for URL:
https://reactome.org/ContentService/data/complexes/UniProt/A0A087WUI6
HTTP response status code: 404 for URL:
http://www.geneprof.org/GeneProf/api/gene.info/gp.id/human/C_NAME/PIBF1.json
HTTP response status code: 404 for URL:
https://reactome.org/ContentService/data/complexes/UniProt/Q86YD7
Number of rels: 604000; elapsed time: 14748.08 s
HTTP response status code: 404 for URL:
http://www.geneprof.org/GeneProf/api/gene.info/gp.id/human/C_NAME/UBA52.json
Traceback (most recent call last):
  File "BuildMasterKG.py", line 371, in <module>
    running_time = timeit.timeit(lambda: run_function(), number=1)
  File "/usr/lib/python3.5/timeit.py", line 213, in timeit
    return Timer(stmt, setup, timer, globals).timeit(number)
  File "/usr/lib/python3.5/timeit.py", line 178, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
  File "BuildMasterKG.py", line 371, in <lambda>
    running_time = timeit.timeit(lambda: run_function(), number=1)
  File "BuildMasterKG.py", line 337, in make_master_kg
    seed_and_expand_kg_q2(num_expansions=3)
  File "BuildMasterKG.py", line 250, in seed_and_expand_kg_q2
    bne.expand_all_nodes()
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 343, in expand_all_nodes
    self.expand_node(node)
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 331, in expand_node
    expand_method(self)
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 82, in expand_ncbigene_microrna
    self.add_rel('participates_in', 'gene_ontology', node1, node2)
AttributeError: 'BioNetExpander' object has no attribute 'add_rel'

Geneprof 500 response

From @erikyao on January 12, 2018 18:59

When running test_geneprof_id_to_transcription_factor_gene_symbols in tests/QueryGeneProfTestCase, QueryGeneProf.geneprof_id_to_transcription_factor_gene_symbols sometimes returns an empty set because Geneprof responds with a 500 error.

Test code:

ret_set = QueryGeneProf.geneprof_id_to_transcription_factor_gene_symbols(16269)  # 'HMOX1'

Debugging message:

HTTP response status code: 500 for URL:
http://www.geneprof.org/GeneProf/api/gene.info/regulation/binary/by.target/human/16269.json?with-sample-info=true

Cause of the 500 error, as reported by Geneprof:

java.lang.OutOfMemoryError: unable to create new native thread

We may need to add exception-handling code for this case.

Copied from original issue: dkoslicki/NCATS#134
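One way to handle the 500 error, sketched under assumptions: treat a 5xx reply as an error rather than as "no transcription factors", so callers cannot mistake a server failure for a genuine empty answer. GeneProfServerError, regulators_or_raise and the parse callback are hypothetical, and treating 404 as a genuine empty answer is also an assumption.

```python
class GeneProfServerError(RuntimeError):
    """Raised when Geneprof answers with a 5xx status (e.g. its OutOfMemoryError)."""


def regulators_or_raise(status_code, url, parse):
    """Turn one Geneprof HTTP reply into a set of gene symbols.

    status_code: HTTP status of the reply.
    url: the requested URL, used only in the error message.
    parse: hypothetical zero-argument callable that extracts the symbols
    from a successful reply.
    """
    if 500 <= status_code < 600:
        # Server-side failure: the true answer is unknown, so do not return set().
        raise GeneProfServerError("HTTP {} for URL: {}".format(status_code, url))
    if status_code == 404:
        return set()  # assumed to mean "gene not known to Geneprof"
    return set(parse())
```

The test (or the KG builder) can then catch GeneProfServerError and retry or skip, instead of silently recording an empty set.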

map from ClinVar UID to a MeSH UID

Originally proposed by @dkoslicki:

Related question: is there any way to convert between clinvarid and meshid/term? I see get_clinvar_uids_for_disease_or_phenotype_string, but I don't see any method to convert clinvar to mesh.

Steve says:

It appears not, but it looks like we can connect from ClinVar to OrphaNet, and from OrphaNet we can get a MeSH ID.
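A minimal sketch of chaining the two lookups Steve describes, ClinVar to OrphaNet and then OrphaNet to MeSH. Neither lookup is named in this issue, so both are passed in as hypothetical callables that return a set of IDs (empty when unmapped):

```python
def clinvar_to_mesh(clinvar_id, clinvar_to_orphanet, orphanet_to_mesh):
    """Map a ClinVar ID to MeSH IDs by going through OrphaNet.

    clinvar_to_orphanet, orphanet_to_mesh: hypothetical lookup callables,
    each returning an iterable of IDs (empty if there is no mapping).
    """
    mesh_ids = set()
    for orpha_id in clinvar_to_orphanet(clinvar_id):
        mesh_ids |= set(orphanet_to_mesh(orpha_id))
    return mesh_ids
```

Because one ClinVar record can map to several OrphaNet entries, the result is a set rather than a single MeSH ID.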

QueryNCBIeUtils error handling

Apparently QueryNCBIeUtils doesn't have sufficient error handling:

>>>python3 Q2Solution.py -r mupirocin -d 'DOID:1563'
HTTP timeout in QueryNCBIeUtils.py; URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%28CASP1_HUMAN%7CCaspase-1%20%5BMeSH%20Terms%5D%7CCASP-1%5BMeSH%20Terms%5D%7CEC%203.4.22.36%5BMeSH%20Terms%5D%7CInterleukin-1%20beta%20convertase%5BMeSH%20Terms%5D%7CIL-1BC%5BMeSH%20Terms%5D%7CInterleukin-1%20beta-converting%20enzyme%5BMeSH%20Terms%5D%7CICE%5BMeSH%20Terms%5D%7CIL-1%20beta-converting%20enzyme%5BMeSH%20Terms%5D%7Cp45%7C%20%5BCleaved%20into%3A%20Caspase-1%20subunit%20p20%3B%20Caspase-1%20subunit%20p10%5D%7CCASP1%5BMeSH%20Terms%5D%7CIL1BC%5BMeSH%20Terms%5D%7CIL1BCE%5BMeSH%20Terms%5D%29%20AND%20%28Deubiquitination%29&retmode=json&retmax=1000
Traceback (most recent call last):
  File "Q2Solution.py", line 223, in <module>
    main()
  File "Q2Solution.py", line 218, in main
    res = answerQ2(drug, disease, k)
  File "Q2Solution.py", line 128, in answerQ2
    mesh1=False, mesh2=False)
  File "/home/dkoslicki/Dropbox/Repositories/RTX/code/reasoningtool/QueryNCBIeUtils.py", line 243, in normalized_google_distance
    numerator = max(math.log(ni), math.log(nj)) - math.log(nij)
TypeError: a float is required

I assume the way to go is one of:

  1. Retry if an error is encountered, or
  2. Throw an exception (so downstream code doesn't try to operate on garbage, which is what happened above).
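A minimal sketch combining both options: retry a few times with exponential backoff, then raise instead of handing a non-number to math.log. The fetch_count callable and PubMedQueryError are hypothetical. The distance function follows the usual normalized Google distance definition, whose numerator matches line 243 in the traceback; n_total (the number of indexed PubMed documents) is an assumed parameter.

```python
import math
import time


class PubMedQueryError(RuntimeError):
    """Raised when a hit count could not be obtained after all retries."""


def get_count_with_retries(fetch_count, query, retries=3, backoff_s=1.0):
    """Call fetch_count(query) until it yields a count, retrying on failure.

    fetch_count: hypothetical callable returning an int hit count, or
    None / raising on a timeout.
    """
    last_error = None
    for attempt in range(retries):
        try:
            count = fetch_count(query)
            if count is not None:
                return int(count)
        except Exception as err:  # e.g. an HTTP timeout
            last_error = err
        if attempt < retries - 1:
            time.sleep(backoff_s * 2 ** attempt)  # exponential backoff
    raise PubMedQueryError("no hit count for query {!r}".format(query)) from last_error


def normalized_google_distance(ni, nj, nij, n_total):
    """NGD from marginal counts ni, nj, joint count nij and corpus size n_total."""
    if min(ni, nj, nij) <= 0:
        return math.inf  # no co-occurrence: treat as maximally distant
    numerator = max(math.log(ni), math.log(nj)) - math.log(nij)
    denominator = math.log(n_total) - min(math.log(ni), math.log(nj))
    return numerator / denominator
```

Raising after the last retry means Q2Solution.py fails with a clear message instead of a TypeError deep inside the distance computation.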

Neo4j node names are a mix of identifiers and real names

For example, disont_disease node names are DOIDs, while their descriptions are human-readable. In contrast, pharos_drug nodes have human-readable names and CHEMBL IDs as descriptions.

I vote we standardize on identifiers for names and human-readable text for descriptions. Or, perhaps even better, adopt a more uniform node property naming convention such as:
ID: identifier (DOID, CHEMBL, etc.)
name: human readable name
description: pulled from MeSH
etc.
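A minimal sketch of the proposed id/name/description convention, with a helper that maps the two existing conventions onto it. The field names come from the proposal; NodeProps and normalize_node are hypothetical, and description is left as None because the MeSH lookup is not shown.

```python
from collections import namedtuple

# Proposed uniform property set for every KG node.
NodeProps = namedtuple("NodeProps", ["id", "name", "description"])


def normalize_node(label, props):
    """Map one node's current properties onto (id, name, description).

    Assumes the two conventions described in this issue:
      disont_disease: name holds the DOID, description the readable name
      pharos_drug:    name holds the readable name, description the CHEMBL ID
    """
    if label == "disont_disease":
        return NodeProps(id=props["name"], name=props["description"], description=None)
    if label == "pharos_drug":
        return NodeProps(id=props["description"], name=props["name"], description=None)
    raise ValueError("no mapping defined for label {!r}".format(label))
```

Each further node label would need its own branch, which also makes any remaining inconsistencies explicit.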

QuestionTranslator gets stuck

The new QuestionTranslator gets wedged on this question:

What is the clinical outcome pathway of dicumarol for treatment of coagulation?

node1 not defined bug

Traceback (most recent call last):
  File "BuildMasterKG.py", line 371, in <module>
    running_time = timeit.timeit(lambda: run_function(), number=1)
  File "/usr/lib/python3.5/timeit.py", line 213, in timeit
    return Timer(stmt, setup, timer, globals).timeit(number)
  File "/usr/lib/python3.5/timeit.py", line 178, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
  File "BuildMasterKG.py", line 371, in <lambda>
    running_time = timeit.timeit(lambda: run_function(), number=1)
  File "BuildMasterKG.py", line 337, in make_master_kg
    seed_and_expand_kg_q2(num_expansions=3)
  File "BuildMasterKG.py", line 250, in seed_and_expand_kg_q2
    bne.expand_all_nodes()
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 343, in expand_all_nodes
    self.expand_node(node)
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 331, in expand_node
    expand_method(self)
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 82, in expand_ncbigene_microrna
    self.orangeboard.add_rel('participates_in', 'gene_ontology', node1, node2)
NameError: name 'node1' is not defined
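Taken together with the add_rel traceback, this suggests add_rel belongs to the Orangeboard rather than to BioNetExpander, and that node1/node2 must be bound before the call. A toy sketch of that shape; both classes are simplified stand-ins, and the add_rel parameter names are guesses from the call site, not the real signature:

```python
class Orangeboard:
    """Toy stand-in for the real Orangeboard; it only records relationships."""

    def __init__(self):
        self.rels = []

    def add_rel(self, rel_type, source_db, source_node, target_node):
        self.rels.append((rel_type, source_db, source_node, target_node))


class BioNetExpander:
    """Toy expander: delegates add_rel and binds node1/node2 before use."""

    def __init__(self, orangeboard):
        self.orangeboard = orangeboard

    def expand_ncbigene_microrna(self, node, go_terms):
        node1 = node  # the node being expanded
        for go_term in go_terms:
            node2 = go_term  # in the real code, a GO node added to the KG first
            # add_rel lives on the Orangeboard, not on BioNetExpander.
            self.orangeboard.add_rel('participates_in', 'gene_ontology', node1, node2)
```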

Q1 error

python3 Q1Solution.py -j -i 'DOID:12365'
Traceback (most recent call last):
  File "Q1Solution.py", line 270, in <module>
    main()
  File "Q1Solution.py", line 259, in main
    res = answerQ1(disease, directed=directed, max_path_len=max_path_len, verbose=verbose, use_json=use_json)
  File "Q1Solution.py", line 185, in answerQ1
    omims = Q1Utils.refine_omims_well_studied(omims, doid, omim_to_mesh, q1_doid_to_mesh, verbose=verbose)
  File "/mnt/data/orangeboard/production/RTX/code/reasoningtool/QuestionAnswering/Q1Utils.py", line 489, in refine_omims_well_studied
    res = QueryNCBIeUtils.QueryNCBIeUtils.normalized_google_distance(omim_mesh, q1_doid_to_mesh[doid])
TypeError: unhashable type: 'list'

RTXQuery checks only for drugs for Q3

@edeutsch
Hi Eric,

I noticed in RTXQuery.py, line 88 and onwards, there is the following code:

if id == 'Q3':
    targets = qph.query_drug_name_to_targets(terms[0])
    if targets:
        # Build an HTML bullet list of the drug's known targets
        target_list = '<UL>\n'
        for target in targets:
            target_list += "<LI> " + target["name"] + "\n"
        target_list += "</UL>\n"
        codeString = "OK"
        result = [{"id": 537, "code": 1, "codeString": codeString, "message": "AnswerFound",
                   "result": targets, "text": [terms[0] + " is known to target: " + target_list]}]
        self.logQuery(id, codeString, terms)
    else:
        # Any term that is not a known drug name ends up here
        codeString = "DrugNotFound"
        result = [{"id": 537, "code": 11, "codeString": codeString, "message": "DrugNotFound",
                   "text": ["Unable to find drug '" + terms[0] + "'."]}]
        self.logQuery(id, codeString, terms)
    return result

Q3 as it currently stands allows all node types, so checking query_drug_name_to_targets isn't the best idea here. For example:

QT.find_question_parameters("what phenotype is phenotype associated with malaria")
Out[649]:
{'corpus_index': 3,
 'error_code': None,
 'error_message': None,
 'input_text': 'what phenotype is phenotype associated with malaria',
 'terms': {'relationship_type': 'phenotype_assoc_with',
  'source_name': 'DOID:12365',
  'target_label': 'phenont_phenotype'}}

The node DOID:12365 was found by QuestionTranslator.py (which checks whether nodes exist in the knowledge graph), so Q3Solution.py would run on these input terms, but RTXQuery doesn't let it because it assumes it is only looking for drugs. Perhaps an error message like "Unknown term" or, more specifically, "entity not in knowledge graph" would be better here, along with removing the check for query_drug_name_to_targets.
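A minimal sketch of the suggested check, assuming a hypothetical node_exists callable that asks the KG whether a node name exists. The result layout copies the DrugNotFound branch of RTXQuery.py; the "EntityNotInKG" code string is only a suggestion.

```python
def check_q3_terms(terms, node_exists):
    """Return an error result if the Q3 source node is missing from the KG, else None.

    terms: the 'terms' dict from QuestionTranslator, e.g. {'source_name': 'DOID:12365', ...}.
    node_exists: hypothetical callable returning True if the KG has that node.
    """
    source = terms["source_name"]
    if node_exists(source):
        return None  # any node type is fine; let Q3Solution.py answer
    code_string = "EntityNotInKG"
    return [{"id": 537, "code": 11, "codeString": code_string,
             "message": code_string,
             "text": ["Entity '" + source + "' is not in the knowledge graph."]}]
```

RTXQuery would return this result when it is not None and otherwise hand the terms to Q3Solution.py, whatever the node type.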

There are non-human proteins showing up in the KG

I've been looking at the COP for the treatment of osteoarthritis by naproxen, and it appears that one of the reasons we are not getting the correct answer is that some of the relevant nodes are not expanded. For example, the PTGS1 gene is a relevant (arguably "the relevant") target of naproxen, and yet:

match (n:uniprot_protein{description:"PTGS1"}) return n.expanded
False

And so I cannot connect PTGS1 to any relevant piece of anatomy:

match p=(s:pharos_drug{name:"naproxen"})-[:targets]-(:uniprot_protein{description:"PTGS1"})-[]-(:anatont_anatomy) return p limit 50
(no changes, no records)

Was there an error in the Orangeboard construction that caused this node not to be expanded? I had assumed that we had seeded with all drugs and expanded from there. Yet:

match p=(s:pharos_drug{name:"naproxen"})-[:targets]-(t:uniprot_protein) return t.name, t.description, t.expanded
╒════════╤═══════════════╤════════════╕
│"t.name"│"t.description"│"t.expanded"│
╞════════╪═══════════════╪════════════╡
│"P25101"│"EDNRA"        │false       │
├────────┼───────────────┼────────────┤
│"P0DJD9"│"PGA5"         │false       │
├────────┼───────────────┼────────────┤
│"P23219"│"PTGS1"        │false       │
├────────┼───────────────┼────────────┤
│"Q15722"│"LTB4R"        │false       │
├────────┼───────────────┼────────────┤
│"P11712"│"CYP2C9"       │true        │
├────────┼───────────────┼────────────┤
│"P09917"│"ALOX5"        │true        │
├────────┼───────────────┼────────────┤
│"O14842"│"FFAR1"        │false       │
├────────┼───────────────┼────────────┤
│"Q07869"│"PPARA"        │false       │
├────────┼───────────────┼────────────┤
│"O43174"│"CYP26A1"      │false       │
├────────┼───────────────┼────────────┤
│"P24530"│"EDNRB"        │false       │
├────────┼───────────────┼────────────┤
│"P37231"│"PPARG"        │true        │
├────────┼───────────────┼────────────┤
│"Q8TCC7"│"SLC22A8"      │false       │
├────────┼───────────────┼────────────┤
│"P49286"│"MTNR1B"       │false       │
├────────┼───────────────┼────────────┤
│"P08183"│"ABCB1"        │true        │
├────────┼───────────────┼────────────┤
│"P01375"│"TNF"          │true        │
├────────┼───────────────┼────────────┤
│"P00374"│"DHFR"         │true        │
├────────┼───────────────┼────────────┤
│"P08684"│"CYP3A4"       │true        │
├────────┼───────────────┼────────────┤
│"P35354"│"PTGS2"        │true        │
├────────┼───────────────┼────────────┤
│"P33261"│"CYP2C19"      │false       │
├────────┼───────────────┼────────────┤
│"P42330"│"AKR1C3"       │false       │
├────────┼───────────────┼────────────┤
│"P16473"│"TSHR"         │false       │
├────────┼───────────────┼────────────┤
│"Q16665"│"HIF1A"        │true        │
├────────┼───────────────┼────────────┤
│"P48039"│"MTNR1A"       │false       │
└────────┴───────────────┴────────────┘

Errant debug statements

QueryNCBIeUtils is printing out a list of ints, causing problems with the JSON parsing of QXSolution.

URIs for nodes in the KG

URIs are needed for the output format specified by @edeutsch. @saramsey suggested: "OK, this should be doable by a simple python class backed by a config file or something."
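A minimal sketch of such a class, assuming node IDs are CURIEs like DOID:9849 and a JSON config file mapping each prefix to a URI template with one {} placeholder; the class name, config layout and file are all assumptions. Unprefixed IDs such as UniProt accessions (P23219) would additionally need the node label to pick a template.

```python
import json


class URIMapper:
    """Map CURIE-style node IDs (e.g. 'DOID:9849') to URIs via prefix templates."""

    def __init__(self, templates):
        # templates: dict of prefix -> URI template containing one '{}' placeholder
        self.templates = dict(templates)

    @classmethod
    def from_file(cls, path):
        """Load the prefix -> template mapping from a JSON config file."""
        with open(path) as handle:
            return cls(json.load(handle))

    def uri(self, node_id):
        prefix, sep, local_id = node_id.partition(":")
        if not sep:
            raise ValueError("node ID {!r} has no prefix".format(node_id))
        try:
            template = self.templates[prefix]
        except KeyError:
            raise KeyError("no URI template for prefix {!r}".format(prefix)) from None
        return template.format(local_id)
```

For example, a config entry mapping DOID to the OBO PURL pattern http://purl.obolibrary.org/obo/DOID_{} turns DOID:9849 into http://purl.obolibrary.org/obo/DOID_9849.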

regression test suite

Hi Yao & Zheng,

You know how each of the "QueryXXXXX.py" modules has test code? Can you update all these modules so that the test methods can be run (all at once) as a set of regression tests, perhaps using the 'test.regrtest' or 'unittest' package?

Connection issue for Neo4j on ncats.saramsey.org

From @saramsey on January 6, 2018 0:7

Q0 and Q3 are now working, but I can't get Q1 and Q2 to work. I futzed around with getting Q1 to work. It looks like all the Neo4j connection commands were hard-wired to lysine and weren't working, so I changed them all to localhost. Should that work? But now I get another connection error.

cd /mnt/data/orangeboard/code/NCATS/code/reasoningtool
grep lysine *.py
(now mostly commented out and replaced with localhost)
python3 Q1Solution.py
…… lots of errors ending with:
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
Any ideas?
I suspect it can't connect to Neo4j, but I'm not sure…

Copied from original issue: dkoslicki/NCATS#128
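One way to avoid hard-wiring hosts like lysine, sketched under assumptions: read the Neo4j host and port from environment variables (NEO4J_HOST and NEO4J_PORT are hypothetical names) and check that the port accepts TCP connections before querying. 7474 is Neo4j's default HTTP port, which fits the requests-based connection in the error; whether the server actually listens there is unverified.

```python
import os
import socket


def neo4j_host_port(default_host="localhost", default_port=7474):
    """Return (host, port) for Neo4j, overridable via environment variables.

    NEO4J_HOST and NEO4J_PORT are hypothetical variable names.
    """
    host = os.environ.get("NEO4J_HOST", default_host)
    port = int(os.environ.get("NEO4J_PORT", default_port))
    return host, port


def is_reachable(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

If is_reachable(*neo4j_host_port()) is False, the problem is the server or port rather than the query code, which narrows down errors like the RemoteDisconnected one above.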

QueryNCBIeUtils.get_mesh_terms_for_mesh_uid always returns empty

Hi @saramsey, it looks like QueryNCBIeUtils.get_mesh_terms_for_mesh_uid is always returning an empty answer:

>>>QueryNCBIeUtils.get_medgen_uid_for_omim_id('614332')
{482154, 814824}
>>>QueryNCBIeUtils.get_mesh_terms_for_mesh_uid({482154, 814824})
[]

and

>>>QueryNCBIeUtils.get_medgen_uid_for_omim_id('600320')
{325371}
>>>QueryNCBIeUtils.get_mesh_terms_for_mesh_uid({325371})
[]

and a bunch of other similar examples.

New question 4 ready for UI integration

@saramsey @edeutsch To test the robustness of our new integration scheme, I've implemented a new question type: "What diseases are similar to X?" where X is a disease (see commits 5d9ab3e and a5f6e36). I've created this in a new branch (called Q4). After we've merged NewStdAPI into master, we can see how straightforward it is to integrate this new question type into the UI, and then merge Q4 into master as well. Then we'll be halfway to the minimal viable product question count!

As for the implementation details, after a bit of Cypher jiggering (8a70291 and a72ac90), I am able to count the number of nodes (of a certain type) shared between any two nodes in the graph. This allows me to compute the Jaccard index of the phenotype sets of two diseases (and gives an informative error if a disease has no phenotypes but its parent does). I then return all diseases with a large enough Jaccard index (set by the --threshold parameter, default 0.20).

@edeutsch This question again returns each disease as a separate node (as in #41). I can change this when requested.
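A minimal sketch of the similarity step, assuming the phenotype sets have already been fetched from the KG into a dict; similar_diseases is a hypothetical name and the default threshold mirrors the --threshold default of 0.20. It raises a plain error for a disease with no phenotypes rather than the parent-aware message described in this issue.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_diseases(disease, phenotypes_of, threshold=0.20):
    """Return (other_disease, score) pairs with Jaccard index >= threshold, best first.

    phenotypes_of: dict mapping each disease ID to its set of phenotype IDs.
    """
    target = phenotypes_of.get(disease, set())
    if not target:
        raise ValueError("{} has no phenotypes in the KG".format(disease))
    scores = [(other, jaccard(target, phens))
              for other, phens in phenotypes_of.items() if other != disease]
    hits = [(other, score) for other, score in scores if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```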

bug in QueryReactome

https://reactome.org/ContentService/interactors/static/molecule/P35346/details
Traceback (most recent call last):
  File "BuildMasterKG.py", line 371, in <module>
    running_time = timeit.timeit(lambda: run_function(), number=1)
  File "/usr/lib/python3.5/timeit.py", line 213, in timeit
    return Timer(stmt, setup, timer, globals).timeit(number)
  File "/usr/lib/python3.5/timeit.py", line 178, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
  File "BuildMasterKG.py", line 371, in <lambda>
    running_time = timeit.timeit(lambda: run_function(), number=1)
  File "BuildMasterKG.py", line 337, in make_master_kg
    seed_and_expand_kg_q2(num_expansions=3)
  File "BuildMasterKG.py", line 250, in seed_and_expand_kg_q2
    bne.expand_all_nodes()
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 343, in expand_all_nodes
    self.expand_node(node)
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 331, in expand_node
    expand_method(self)
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/BioNetExpander.py", line 220, in expand_uniprot_protein
    int_dict = QueryReactome.query_uniprot_id_to_interacting_uniprot_ids_desc(uniprot_id_str)
  File "/mnt/data/orangeboard/code/NCATS/code/reasoningtool/QueryReactome.py", line 154, in query_uniprot_id_to_interacting_uniprot_ids_desc
    res = QueryReactome.send_query_get('interactors/static/molecule', uniprot_id + '/details').json()
AttributeError: 'NoneType' object has no attribute 'json'
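send_query_get evidently returns None when a request fails, so the chained .json() call crashes. A minimal sketch of guarding that call, with send_query_get passed in as a stand-in for QueryReactome.send_query_get; returning {} so the KG build can skip the node is an assumption, and raising would be the stricter alternative:

```python
def interactors_json(send_query_get, uniprot_id):
    """Return interactor details for uniprot_id, or {} if the request failed.

    send_query_get: stand-in for QueryReactome.send_query_get, which returns
    None (not a response object) when the HTTP request fails.
    """
    res = send_query_get('interactors/static/molecule', uniprot_id + '/details')
    if res is None:
        return {}  # nothing to add for this node; the caller moves on
    return res.json()
```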
