
oakx-grape's People

Contributors

caufieldjh, cmungall, hrshdhgd, justaddcoffee, pkalita-lbl, sierra-moxon


oakx-grape's Issues

Index out of bounds during all_by_all_pairwise_similarity

Component

GrapeImplementation.all_by_all_pairwise_similarity

Description

When calling GrapeImplementation.all_by_all_pairwise_similarity, I got an index out of bounds exception (IndexError):

IndexError                                Traceback (most recent call last)
Cell In [64], line 1
----> 1 tp = oi.all_by_all_pairwise_similarity(oba_list, vt_list)

File ~/.pyenv/versions/3.10.8/lib/python3.10/site-packages/oakx_grape/grape_implementation.py:402, in GrapeImplementation.all_by_all_pairwise_similarity(self, subjects, objects, predicates)
    398     raise ValueError("For now can only use hardcoded ensmallen predicates")
    400 resnik_model = self._make_grape_resnik_model()
--> 402 sim = resnik_model.get_similarities_from_bipartite_graph_node_names(
    403     source_node_names=subjects,
    404     destination_node_names=objects,
    405     return_similarities_dataframe=True,
    406     return_node_names=True,
    407 )
    409 pairs = iter(self._df_to_pairwise_similarity(sim))
    411 return pairs

File ~/.pyenv/versions/3.10.8/lib/python3.10/site-packages/embiggen/similarities/dag_resnik.py:145, in DAGResnik.get_similarities_from_bipartite_graph_node_names(self, source_node_names, destination_node_names, minimum_similarity, return_similarities_dataframe, return_node_names)
    120 def get_similarities_from_bipartite_graph_node_names(
    121     self,
    122     source_node_names: List[str],
   (...)
    126     return_node_names: bool = False
...
     81     ),
     82     "resnik_score": similarities
     83 })

To Reproduce

Steps to reproduce the behavior:

I first merged two ontologies into one, then built two lists of terms by subsetting on prefix. The first list contains OBA terms and the second contains VT terms.

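import pandas as pd
from oaklib.selector import get_implementation_from_shorthand

oi = get_implementation_from_shorthand("grape:sqlite:../tmp/oba-vt.owl")
# Assumed conversion: the report calls the method with oba_list and vt_list
# but does not show how they were built from the files read here.
oba_list = pd.read_csv('../tmp/oba_terms.txt', header=None)[0].tolist()
# ['OBA:1000035', 'OBA:1000045', 'OBA:0000003', 'OBA:0000005', 'OBA:0000006']
vt_list = pd.read_csv('../tmp/vt_terms.txt', header=None)[0].tolist()
# ['VT:0000181', 'VT:0000362', 'VT:0000717', 'VT:0000813', 'VT:0001097']
tp = oi.all_by_all_pairwise_similarity(oba_list, vt_list)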

Expected behavior

Passing two different lists should return pairwise similarities. When I pass the same list for both parameters of GrapeImplementation.all_by_all_pairwise_similarity, everything works fine:

tp = oi.all_by_all_pairwise_similarity(oba_list, oba_list)

for t in tp:
    print(t.ancestor_information_content)
10.202258110046387
5.5109100341796875
0.0001483669620938599
0.0001483669620938599
0.11778302490711212
5.5109100341796875
10.202258110046387
0.0001483669620938599
0.0001483669620938599
0.11778302490711212
10.202258110046387
5.587137222290039
5.587137222290039
10.202258110046387
0.0001483669620938599
0.0001483669620938599
10.202258110046387

Additional context

Library versions:
oaklib 0.1.70
oakx-grape 0.1.2
embiggen 0.11.39

Check if graph transposition impacts parsing

There are still some lingering issues with using original vs. transposed graphs, e.g.,

>>> predicates = ["rdfs:subClassOf"]
>>> tp = oi.termset_pairwise_similarity(["BFO:0000006"], ["BFO:0000018"], predicates)
Graph contains multiple disconnected components. Will ignore all but the largest component. 24 components are present. Largest component has 35 nodes.
>>> oi = get_implementation_from_shorthand("grape:sqlite:obo:phenio")
>>> tp = oi.termset_pairwise_similarity(["eye"], ["eyelash"], predicates)
Graph contains multiple disconnected components. Will ignore all but the largest component. 31982 components are present. Largest component has 242712 nodes.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/harry/oakx-grape/src/oakx_grape/grape_implementation.py", line 324, in termset_pairwise_similarity
    resnik_model = self._make_grape_resnik_model(dag=dag)
  File "/home/harry/oakx-grape/src/oakx_grape/grape_implementation.py", line 287, in _make_grape_resnik_model
    resnik_model.fit(graph, node_counts=counts)
  File "/home/harry/oakx-grape/.venv/lib/python3.9/site-packages/embiggen/similarities/dag_resnik.py", line 32, in fit
    self._model.fit(
ValueError: The current graph instance Unnamed is not directed acyclic.

This suggests that the graph isn't being interpreted as a DAG because the edges point in the wrong direction. Transposition will fix that, unless the original graph is already "correct".
Graphs should only be transposed if necessary, not by default.
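
One way to make "transpose only if necessary" concrete is to test the edge direction against a known root term before deciding. The sketch below is plain Python over (subject, object) pairs and does not use the grape or OAK APIs; the helper names and the assumption that the similarity model wants parent -> child edges are illustrative, not taken from the oakx-grape code.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[str, str]  # (subject, object), e.g. (child, parent) for rdfs:subClassOf


def transpose(edges: Iterable[Edge]) -> List[Edge]:
    """Reverse the direction of every edge."""
    return [(obj, subj) for subj, obj in edges]


def points_to_parents(edges: Iterable[Edge], root: str) -> bool:
    """Return True if edges run child -> parent, judged by a known root.

    A root has no parent, so in a child -> parent graph it only ever
    appears as an edge target, never as an edge source.
    """
    edges = list(edges)
    sources = {subj for subj, _ in edges}
    targets = {obj for _, obj in edges}
    if root in sources and root in targets:
        raise ValueError(f"{root} has edges in both directions, so it is not a root")
    if root not in sources and root not in targets:
        raise ValueError(f"{root} does not appear in any edge")
    return root in targets


def ensure_parent_to_child(edges: Iterable[Edge], root: str) -> List[Edge]:
    """Transpose only when the edges currently point child -> parent.

    Assumption: the downstream model expects parent -> child edges.
    """
    edges = list(edges)
    return transpose(edges) if points_to_parents(edges, root) else edges
```

With this check, a graph that already points parent -> child is returned unchanged, and only a child -> parent graph is reversed.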

Add more specific examples to the docs

There are a few specific use cases where having access to grape in OAK would be particularly useful but which require some demonstration.
These should be in the docs.

For example:

  • Similarity metrics
  • Subgraphs
  • Composing graphs by predicate type as preprocessing for graph ML

Problems on M1

When running poetry run python -m unittest

I get

======================================================================
ERROR: tests.test_plugin (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: tests.test_plugin
Traceback (most recent call last):
  File "/Users/cjm/opt/anaconda3/envs/oakx-grape-env/lib/python3.9/unittest/loader.py", line 436, in _find_test_path
    module = self._get_module_from_name(name)
  File "/Users/cjm/opt/anaconda3/envs/oakx-grape-env/lib/python3.9/unittest/loader.py", line 377, in _get_module_from_name
    __import__(name)
  File "/Users/cjm/repos/oakx-grape/tests/test_plugin.py", line 4, in <module>
    from oaklib.implementations import get_implementation_resolver
  File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/oaklib/__init__.py", line 9, in <module>
    from oaklib.selector import get_implementation_from_shorthand  # noqa:F401
  File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/oaklib/selector.py", line 8, in <module>
    from oaklib.implementations import GildaImplementation
  File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/oaklib/implementations/__init__.py", line 28, in <module>
    from oaklib.implementations.pronto.pronto_implementation import ProntoImplementation
  File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/oaklib/implementations/pronto/pronto_implementation.py", line 13, in <module>
    import pronto
  File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/pronto/__init__.py", line 24, in <module>
    from .definition import Definition
  File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/pronto/definition.py", line 5, in <module>
    from .xref import Xref
  File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/pronto/xref.py", line 6, in <module>
    import fastobo
ImportError: dlopen(/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/fastobo.cpython-39-darwin.so, 0x0002): tried: '/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/fastobo.cpython-39-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))
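
The final line says the compiled fastobo extension was built for x86_64 while the running process needs arm64e. A small standard-library check like the one below can confirm which architecture the active interpreter reports. The function name is made up for illustration, and this only diagnoses the mismatch; it does not fix it.

```python
import platform
import sys


def runtime_architecture() -> dict:
    """Collect the facts needed to compare the interpreter with a compiled wheel."""
    return {
        # Typically "arm64" for a native Apple silicon interpreter and
        # "x86_64" for one running under Rosetta.
        "machine": platform.machine(),
        "python": platform.python_version(),
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in runtime_architecture().items():
        print(f"{key}: {value}")
```

Running it with poetry run python shows the interpreter that the virtualenv actually uses, which may differ from the conda environment's interpreter.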

Re-use cached graph edge/nodelists?

OAKx-grape does not re-use existing, cached versions of KGX node/edgefiles.
Do we want to?

In practice, this doesn't lead to a massive slowdown, though for larger graphs it does create a noticeable delay.

We could store them with pystow, much like the SQL versions, so that stored graphs persist between runs.
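
The caching idea itself is small: build the node/edge file only when it is not already on disk. The sketch below uses only the standard library and a hypothetical helper, cached_path; it is not the oakx-grape implementation, and in practice pystow could supply the cache directory instead of the temporary one used here.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_path(cache_dir: Path, name: str, build: Callable[[Path], None]) -> Path:
    """Return cache_dir/name, calling build(path) only if the file is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Write to a temporary name first so an interrupted build
        # never leaves a half-written file under the final name.
        tmp = target.with_suffix(target.suffix + ".tmp")
        build(tmp)
        tmp.replace(target)
    return target


# Usage: the second call finds the file and skips the build step.
with tempfile.TemporaryDirectory() as tmp_dir:
    build_calls = []

    def write_edges(path: Path) -> None:
        build_calls.append(path)
        path.write_text("subject\tpredicate\tobject\n")

    first = cached_path(Path(tmp_dir), "example_edges.tsv", write_edges)
    second = cached_path(Path(tmp_dir), "example_edges.tsv", write_edges)
    print(first == second, len(build_calls))  # True 1
```

A real cache would also need some way to invalidate entries when the source ontology changes, which this sketch leaves out.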

Transpose only a subset of edges when loading graph into Grape

Right now, we transpose the entire graph when loading, so it's compatible with semsim operations.
This isn't quite right as edge types other than is_a don't quite make sense when transposed.

We may need to select only is_a edges when transposing the graph.
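
A minimal sketch of that selection, assuming edges are available as (subject, predicate, object) triples and that is_a is represented as rdfs:subClassOf. The function name is hypothetical and the code does not use the grape API.

```python
from typing import FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

IS_A: FrozenSet[str] = frozenset({"rdfs:subClassOf"})


def transpose_selected(
    triples: Iterable[Triple], predicates: FrozenSet[str] = IS_A
) -> List[Triple]:
    """Reverse only edges whose predicate is in `predicates`; keep the rest as-is."""
    return [
        (obj, pred, subj) if pred in predicates else (subj, pred, obj)
        for subj, pred, obj in triples
    ]


# Usage with placeholder identifiers: the is_a edge is reversed,
# the part_of edge keeps its original direction.
example = [
    ("X:child", "rdfs:subClassOf", "X:parent"),
    ("X:part", "BFO:0000050", "X:whole"),
]
result = transpose_selected(example)
```

Whether other hierarchical predicates should be included in the set is left open here, as it is in the issue.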
