incatools / oakx-grape

Experimental OAK plugin for Grape

Home Page: https://incatools.github.io/oakx-grape/

License: BSD 3-Clause "New" or "Revised" License

Python 99.68% JavaScript 0.32%

oakx-grape's People

Contributors

justaddcoffee kevinschaper

oakx-grape's Issues

Validate integrity of KGX graph produced by oakx-grape

Run KGX validation after the KGX node/edge files are produced, to check whether the transformation process has changed anything.
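As a lighter-weight complement to KGX's own validator, a referential-integrity check can catch one common kind of transformation damage: edges whose subject or object has no row in the nodes file. This is a sketch, not KGX's validator. It assumes the standard KGX TSV columns (`id` in the nodes file; `subject`, `predicate`, `object` in the edges file), and `find_dangling_edges` is a hypothetical helper.

```python
import csv
from typing import Iterable, List, Tuple


def find_dangling_edges(
    node_lines: Iterable[str], edge_lines: Iterable[str]
) -> List[Tuple[str, str, str]]:
    """Return (subject, predicate, object) edges whose endpoints are missing from the nodes file.

    Both arguments are iterables of tab-separated lines with a header row,
    e.g. open handles on KGX nodes.tsv / edges.tsv files.
    """
    node_ids = {row["id"] for row in csv.DictReader(node_lines, delimiter="\t")}
    dangling = []
    for row in csv.DictReader(edge_lines, delimiter="\t"):
        if row["subject"] not in node_ids or row["object"] not in node_ids:
            dangling.append((row["subject"], row["predicate"], row["object"]))
    return dangling


if __name__ == "__main__":
    nodes = ["id\tcategory\n", "BFO:0000001\tbiolink:NamedThing\n"]
    edges = [
        "subject\tpredicate\tobject\n",
        "BFO:0000002\tbiolink:subclass_of\tBFO:0000001\n",
    ]
    # BFO:0000002 is used as a subject but has no row in the nodes list
    print(find_dangling_edges(nodes, edges))
```

Because the function only iterates over lines, it works unchanged on open file handles for the files oakx-grape writes.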

Index out of bounds during all_by_all_pairwise_similarity

Component

GrapeImplementation.all_by_all_pairwise_similarity

Description

During a call to GrapeImplementation.all_by_all_pairwise_similarity, I got an index out of bounds exception (IndexError):

IndexError                                Traceback (most recent call last)
Cell In [64], line 1
----> 1 tp = oi.all_by_all_pairwise_similarity(oba_list, vt_list)

File ~/.pyenv/versions/3.10.8/lib/python3.10/site-packages/oakx_grape/grape_implementation.py:402, in GrapeImplementation.all_by_all_pairwise_similarity(self, subjects, objects, predicates)
    398     raise ValueError("For now can only use hardcoded ensmallen predicates")
    400 resnik_model = self._make_grape_resnik_model()
--> 402 sim = resnik_model.get_similarities_from_bipartite_graph_node_names(
    403     source_node_names=subjects,
    404     destination_node_names=objects,
    405     return_similarities_dataframe=True,
    406     return_node_names=True,
    407 )
    409 pairs = iter(self._df_to_pairwise_similarity(sim))
    411 return pairs

File ~/.pyenv/versions/3.10.8/lib/python3.10/site-packages/embiggen/similarities/dag_resnik.py:145, in DAGResnik.get_similarities_from_bipartite_graph_node_names(self, source_node_names, destination_node_names, minimum_similarity, return_similarities_dataframe, return_node_names)
    120 def get_similarities_from_bipartite_graph_node_names(
    121     self,
    122     source_node_names: List[str],
   (...)
    126     return_node_names: bool = False
...
     81     ),
     82     "resnik_score": similarities
     83 })

To Reproduce

Steps to reproduce the behavior:

First, I merged two ontologies (OBA and VT) into one, then created two term lists by subsetting the merged ontology on prefix: the first list contains OBA terms and the second contains VT terms.

import pandas as pd
from oaklib.selector import get_implementation_from_shorthand

oi = get_implementation_from_shorthand("grape:sqlite:../tmp/oba-vt.owl")
# Each file holds one CURIE per line, with no header row
oba_list = pd.read_csv('../tmp/oba_terms.txt', header=None)[0].tolist()
#['OBA:1000035', 'OBA:1000045', 'OBA:0000003', 'OBA:0000005', 'OBA:0000006']
vt_list = pd.read_csv('../tmp/vt_terms.txt', header=None)[0].tolist()
#['VT:0000181', 'VT:0000362', 'VT:0000717', 'VT:0000813', 'VT:0001097']
tp = oi.all_by_all_pairwise_similarity(oba_list, vt_list)

Expected behavior

When I pass the same list as both arguments of GrapeImplementation.all_by_all_pairwise_similarity, everything works fine:

tp = oi.all_by_all_pairwise_similarity(oba_list, oba_list)

for t in tp:
    print(t.ancestor_information_content)

10.202258110046387
5.5109100341796875
0.0001483669620938599
0.0001483669620938599
0.11778302490711212
5.5109100341796875
10.202258110046387
0.0001483669620938599
0.0001483669620938599
0.11778302490711212
10.202258110046387
5.587137222290039
5.587137222290039
10.202258110046387
0.0001483669620938599
0.0001483669620938599
10.202258110046387

Additional context

Library versions:
oaklib 0.1.70
oakx-grape 0.1.2
embiggen 0.11.39
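One hypothesis worth ruling out, though this issue does not confirm it, is that some input terms are not among the nodes the Resnik model was fitted on. The log in the transposition issue shows Grape keeping only the largest connected component, so terms outside that component could be missing. The sketch below splits the inputs into present and missing terms before calling the similarity method. It assumes you can obtain the set of node names the model was fitted on; how to get that set from GrapeImplementation is not shown here, and `partition_terms` is a hypothetical helper.

```python
from typing import Iterable, List, Set, Tuple


def partition_terms(
    terms: Iterable[str], known_nodes: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split terms into (present, missing) with respect to a set of graph node names."""
    present: List[str] = []
    missing: List[str] = []
    for term in terms:
        (present if term in known_nodes else missing).append(term)
    return present, missing


if __name__ == "__main__":
    # Hypothetical node set: in practice, the names of the nodes the
    # Resnik model was actually fitted on.
    known = {"OBA:1000035", "OBA:1000045", "VT:0000181"}
    present, missing = partition_terms(["VT:0000181", "VT:0000362"], known)
    print(present, missing)  # ['VT:0000181'] ['VT:0000362']
```

If `missing` is non-empty for the VT list but empty for the OBA list, that would be consistent with the OBA/OBA call succeeding while the OBA/VT call fails.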

Check if graph transposition impacts parsing

There are still some lingering issues with using original vs. transposed graphs, e.g.,

>>> predicates = ["rdfs:subClassOf"]
>>> tp = oi.termset_pairwise_similarity(["BFO:0000006"], ["BFO:0000018"], predicates)
Graph contains multiple disconnected components. Will ignore all but the largest component. 24 components are present. Largest component has 35 nodes.
>>> oi = get_implementation_from_shorthand("grape:sqlite:obo:phenio")
>>> tp = oi.termset_pairwise_similarity(["eye"], ["eyelash"], predicates)
Graph contains multiple disconnected components. Will ignore all but the largest component. 31982 components are present. Largest component has 242712 nodes.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/harry/oakx-grape/src/oakx_grape/grape_implementation.py", line 324, in termset_pairwise_similarity
    resnik_model = self._make_grape_resnik_model(dag=dag)
  File "/home/harry/oakx-grape/src/oakx_grape/grape_implementation.py", line 287, in _make_grape_resnik_model
    resnik_model.fit(graph, node_counts=counts)
  File "/home/harry/oakx-grape/.venv/lib/python3.9/site-packages/embiggen/similarities/dag_resnik.py", line 32, in fit
    self._model.fit(
ValueError: The current graph instance Unnamed is not directed acyclic.

This suggests that the graph isn't being interpreted as a DAG because the edges point in the wrong direction. Transposition fixes this, but only if the original graph is not already oriented correctly.
Graphs should only be transposed if necessary, not by default.
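A sketch of "transpose only if necessary" can treat orientation and acyclicity as separate checks. Reversing every edge maps each cycle onto another cycle, so transposition alone cannot make a cyclic graph acyclic. If the "not directed acyclic" error persists after transposition, a real cycle is the more likely cause. The orientation test below is a heuristic and an assumption, not oakx-grape's logic: with `rdfs:subClassOf` edges stored child to parent, there are usually more source nodes (leaves) than sink nodes (roots), so a graph with more sources than sinks is probably bottom-up. It also assumes DAGResnik wants the root-down orientation. All function names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (subject, object), e.g. child rdfs:subClassOf parent


def is_acyclic(edges: Iterable[Edge]) -> bool:
    """Kahn's algorithm: True if every node can be removed in topological order."""
    successors: Dict[str, List[str]] = defaultdict(list)
    in_degree: Dict[str, int] = defaultdict(int)
    nodes: Set[str] = set()
    for subject, obj in edges:
        successors[subject].append(obj)
        in_degree[obj] += 1
        nodes.update((subject, obj))
    queue = deque(n for n in nodes if in_degree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)
    return removed == len(nodes)


def should_transpose(edges: Iterable[Edge]) -> bool:
    """Heuristic: bottom-up (child -> parent) graphs have more sources than sinks."""
    edge_list = list(edges)
    subjects = {s for s, _ in edge_list}
    objects = {o for _, o in edge_list}
    sources = subjects - objects  # nodes with no incoming edges
    sinks = objects - subjects  # nodes with no outgoing edges
    return len(sources) > len(sinks)


def transpose(edges: Iterable[Edge]) -> List[Edge]:
    """Reverse the direction of every edge."""
    return [(o, s) for s, o in edges]


if __name__ == "__main__":
    is_a = [("eyelash", "eye"), ("eye", "anatomy"), ("nose", "anatomy")]
    print(should_transpose(is_a), should_transpose(transpose(is_a)))  # True False
    cyclic = [("a", "b"), ("b", "a")]
    print(is_acyclic(cyclic), is_acyclic(transpose(cyclic)))  # False False
```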

Add more specific examples to the docs

There are a few specific use cases where having access to Grape in OAK would be particularly useful but that need some demonstration.
These should be in the docs.

For example:

Similarity metrics
Subgraphs
Composing graphs by predicate type as preprocessing for graph ML

problems on M1

When running poetry run python -m unittest, I get:

======================================================================
ERROR: tests.test_plugin (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: tests.test_plugin
Traceback (most recent call last):
  File "/Users/cjm/opt/anaconda3/envs/oakx-grape-env/lib/python3.9/unittest/loader.py", line 436, in _find_test_path
    module = self._get_module_from_name(name)
  File "/Users/cjm/opt/anaconda3/envs/oakx-grape-env/lib/python3.9/unittest/loader.py", line 377, in _get_module_from_name
    __import__(name)
  File "/Users/cjm/repos/oakx-grape/tests/test_plugin.py", line 4, in <module>
    from oaklib.implementations import get_implementation_resolver
  File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/oaklib/__init__.py", line 9, in <module>
    from oaklib.selector import get_implementation_from_shorthand  # noqa:F401
  File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/oaklib/selector.py", line 8, in <module>
    from oaklib.implementations import GildaImplementation
  File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/oaklib/implementations/__init__.py", line 28, in <module>
    from oaklib.implementations.pronto.pronto_implementation import ProntoImplementation
  File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/oaklib/implementations/pronto/pronto_implementation.py", line 13, in <module>
    import pronto
  File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/pronto/__init__.py", line 24, in <module>
    from .definition import Definition
  File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/pronto/definition.py", line 5, in <module>
    from .xref import Xref
  File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/pronto/xref.py", line 6, in <module>
    import fastobo
ImportError: dlopen(/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/fastobo.cpython-39-darwin.so, 0x0002): tried: '/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/fastobo.cpython-39-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))
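The error says the compiled fastobo extension is an x86_64 binary while the running process needs arm64(e). That usually points to an Intel build mixed into an Apple Silicon environment, but this issue does not confirm the cause. The diagnostic sketch below reports the interpreter's architecture via `platform.machine()` and reads the CPU type from the header of any `.so` paths passed on the command line. It only handles thin, little-endian 64-bit Mach-O files (not universal/fat binaries). The constants come from Apple's `<mach-o/loader.h>` and `<mach/machine.h>`, and `macho_arch` is a hypothetical helper.

```python
import platform
import struct
from typing import Optional

# Constants from Apple's <mach-o/loader.h> and <mach/machine.h>
MH_MAGIC_64 = 0xFEEDFACF
CPU_TYPE_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}


def macho_arch(header: bytes) -> Optional[str]:
    """Return the CPU type of a thin 64-bit Mach-O file from its first 8 bytes."""
    if len(header) < 8:
        return None
    magic, cputype = struct.unpack("<II", header[:8])
    if magic != MH_MAGIC_64:
        return None  # universal (fat) binaries and non-Mach-O files are not handled here
    return CPU_TYPE_NAMES.get(cputype, hex(cputype))


if __name__ == "__main__":
    import sys

    # 'arm64' for a native Apple Silicon interpreter, 'x86_64' for an Intel build
    print("interpreter:", platform.machine())
    for path in sys.argv[1:]:  # e.g. the fastobo .so path from the traceback
        with open(path, "rb") as handle:
            print(path, macho_arch(handle.read(8)))
```

If the interpreter reports `arm64` but the extension reports `x86_64`, reinstalling the package from a native arm64 environment is the usual next step.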

Re-use cached graph edge/nodelists?

oakx-grape does not re-use existing, cached versions of KGX node/edge files.
Do we want to?

In practice, this doesn't lead to a massive slowdown, though for larger graphs it does create a noticeable delay.

We could store them with pystow, much like the SQL versions, so that stored graphs persist between sessions.
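A minimal caching sketch, assuming the cache is keyed on the source file's resolved path and modification time, so that editing the source makes the old entry unreachable. `cached_kgx_paths` and the `build` callback are hypothetical. `build` stands in for whatever step currently writes the KGX TSVs. To get persistence like the SQL versions, `cache_dir` could come from pystow, for example `pystow.join("oakx_grape", "kgx")`, where the module name is an illustrative choice.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Tuple

# build(source, nodes_path, edges_path) writes the KGX TSVs (stand-in for the existing export step)
Builder = Callable[[Path, Path, Path], None]


def cached_kgx_paths(source: Path, cache_dir: Path, build: Builder) -> Tuple[Path, Path]:
    """Return (nodes, edges) paths for `source`, calling `build` only on a cache miss.

    The cache key hashes the resolved source path and its modification time.
    """
    mtime = source.stat().st_mtime_ns
    key = hashlib.sha256(f"{source.resolve()}:{mtime}".encode()).hexdigest()[:16]
    entry = cache_dir / key
    nodes, edges = entry / "nodes.tsv", entry / "edges.tsv"
    if not (nodes.exists() and edges.exists()):
        entry.mkdir(parents=True, exist_ok=True)
        build(source, nodes, edges)
    return nodes, edges


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "example.owl"
        src.write_text("placeholder ontology")
        builds = []

        def fake_build(source: Path, nodes: Path, edges: Path) -> None:
            builds.append(source)
            nodes.write_text("id\n")
            edges.write_text("subject\tpredicate\tobject\n")

        first = cached_kgx_paths(src, Path(tmp) / "cache", fake_build)
        second = cached_kgx_paths(src, Path(tmp) / "cache", fake_build)
        print(first == second, len(builds))  # True 1
```

This sketch does not guard against a crash halfway through `build`. A real implementation might write to a temporary directory and rename it into place.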

provide docs on underlying methods

It's awesome that we are now wrapping the underlying embiggen/ensmallen code:

oakx-grape/src/oakx_grape/grape_implementation.py

Line 274 in 088f2d2

def _make_grape_resnik_model(self, counts: dict = None, dag: Graph = None) -> DAGResnik:

It would be great to see docs on the underlying Rust objects, e.g.:

oakx-grape/src/oakx_grape/grape_implementation.py

Line 286 in 088f2d2

resnik_model = DAGResnik()
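As an example of the kind of explanation such docs could include, here is a from-scratch sketch of the textbook Resnik measure that DAGResnik is named after. Each term's count is propagated to all of its ancestors. The information content is IC(c) = -log p(c), where p(c) is the share of all counts falling at c or below, and the similarity of two terms is the IC of their most informative common ancestor. This is not DAGResnik's code. How DAGResnik turns `node_counts` into probabilities, and which log base it uses, is not stated here, so its scores may differ. The sketch also stores edges child to parent, and all names in it are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set

Parents = Mapping[str, Iterable[str]]  # child -> direct parents (is_a)


def ancestors(node: str, parents: Parents) -> Set[str]:
    """Reflexive ancestor closure of `node` (the node itself is included)."""
    seen = {node}
    stack = [node]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(parents: Parents, counts: Mapping[str, float]) -> Dict[str, float]:
    """IC(c) = log(1 / p(c)), with p(c) the share of counts at c or any descendant."""
    totals: Dict[str, float] = defaultdict(float)
    for node, count in counts.items():
        for anc in ancestors(node, parents):
            totals[anc] += count
    grand_total = sum(counts.values())
    return {c: math.log(grand_total / t) for c, t in totals.items() if t > 0}


def resnik(a: str, b: str, parents: Parents, ic: Mapping[str, float]) -> float:
    """IC of the most informative common ancestor (terms without counts score 0.0)."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return max((ic.get(c, 0.0) for c in common), default=0.0)


if __name__ == "__main__":
    parents = {"eyelash": ["eye"], "eye": ["anatomy"], "nose": ["anatomy"]}
    counts = {"anatomy": 1, "eye": 1, "eyelash": 1, "nose": 1}
    ic = information_content(parents, counts)
    print(round(resnik("eyelash", "eye", parents, ic), 3))  # 0.693, i.e. ln 2
    print(resnik("eyelash", "nose", parents, ic))  # 0.0: only the root is shared
```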

create an exemplar jupyter notebook

Transpose only a subset of edges when loading graph into Grape

Right now, we transpose the entire graph when loading, so it's compatible with semsim operations.
This isn't quite right, as edge types other than is_a don't make sense when transposed.

We may need to select only is_a edges when transposing the graph.
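A minimal sketch of selective transposition over (subject, predicate, object) triples: only edges whose predicate is in a given set are reversed, and all other edges keep their direction. Which CURIE represents is_a depends on how the graph was exported. The default of `rdfs:subClassOf` here is an assumption based on the predicate used elsewhere in these issues. `transpose_subset` is a hypothetical helper, and the node names in the demo are placeholders.

```python
from typing import FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Assumed is_a predicate; a KGX export might use biolink:subclass_of instead
IS_A: FrozenSet[str] = frozenset({"rdfs:subClassOf"})


def transpose_subset(triples: Iterable[Triple], flip: FrozenSet[str] = IS_A) -> List[Triple]:
    """Reverse only edges whose predicate is in `flip`; keep all other edges as they are."""
    return [(o, p, s) if p in flip else (s, p, o) for s, p, o in triples]


if __name__ == "__main__":
    edges = [
        ("eyelash", "rdfs:subClassOf", "hair"),  # is_a: reversed
        ("eyelash", "BFO:0000050", "eyelid"),  # part of: left untouched
    ]
    for edge in transpose_subset(edges):
        print(edge)
```

Keeping the predicate set as a parameter leaves room for other hierarchy-like relations if they turn out to need the same treatment.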
