incatools / oakx-grape
Experimental OAK plugin for Grape
Home Page: https://incatools.github.io/oakx-grape/
License: BSD 3-Clause "New" or "Revised" License
Run KGX validate after KGX nodes/edges are produced, to see if any changes have been made in the transformation process.
Component
GrapeImplementation.all_by_all_pairwise_similarity
Description
During the GrapeImplementation.all_by_all_pairwise_similarity method call, I got an index out of bounds exception:
IndexError Traceback (most recent call last)
Cell In [64], line 1
----> 1 tp = oi.all_by_all_pairwise_similarity(oba_list, vt_list)
File ~/.pyenv/versions/3.10.8/lib/python3.10/site-packages/oakx_grape/grape_implementation.py:402, in GrapeImplementation.all_by_all_pairwise_similarity(self, subjects, objects, predicates)
398 raise ValueError("For now can only use hardcoded ensmallen predicates")
400 resnik_model = self._make_grape_resnik_model()
--> 402 sim = resnik_model.get_similarities_from_bipartite_graph_node_names(
403 source_node_names=subjects,
404 destination_node_names=objects,
405 return_similarities_dataframe=True,
406 return_node_names=True,
407 )
409 pairs = iter(self._df_to_pairwise_similarity(sim))
411 return pairs
File ~/.pyenv/versions/3.10.8/lib/python3.10/site-packages/embiggen/similarities/dag_resnik.py:145, in DAGResnik.get_similarities_from_bipartite_graph_node_names(self, source_node_names, destination_node_names, minimum_similarity, return_similarities_dataframe, return_node_names)
120 def get_similarities_from_bipartite_graph_node_names(
121 self,
122 source_node_names: List[str],
(...)
126 return_node_names: bool = False
...
81 ),
82 "resnik_score": similarities
83 })
To Reproduce
Steps to reproduce the behavior:
First, I merged two ontologies into one, then built two term lists by subsetting the merged terms on their prefixes. The first list contains OBA terms and the second contains VT terms.
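import pandas as pd
from oaklib.selector import get_implementation_from_shorthand
oi = get_implementation_from_shorthand("grape:sqlite:../tmp/oba-vt.owl")
oba_terms = pd.read_csv('../tmp/oba_terms.txt', header=None)
#['OBA:1000035', 'OBA:1000045', 'OBA:0000003', 'OBA:0000005', 'OBA:0000006']
oba_list = oba_terms[0].tolist()  # assumed conversion; this step is not shown in the original report
vt_terms = pd.read_csv('../tmp/vt_terms.txt', header=None)
#['VT:0000181', 'VT:0000362', 'VT:0000717', 'VT:0000813', 'VT:0001097']
vt_list = vt_terms[0].tolist()  # assumed conversion; this step is not shown in the original report
tp = oi.all_by_all_pairwise_similarity(oba_list, vt_list)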
Expected behavior
When I pass the same list as both arguments of GrapeImplementation.all_by_all_pairwise_similarity, everything works fine.
tp = oi.all_by_all_pairwise_similarity(oba_list, oba_list)
for t in tp:
    print(t.ancestor_information_content)
10.202258110046387
5.5109100341796875
0.0001483669620938599
0.0001483669620938599
0.11778302490711212
5.5109100341796875
10.202258110046387
0.0001483669620938599
0.0001483669620938599
0.11778302490711212
10.202258110046387
5.587137222290039
5.587137222290039
10.202258110046387
0.0001483669620938599
0.0001483669620938599
10.202258110046387
Additional context
Library versions:
oaklib 0.1.70
oakx-grape 0.1.2
embiggen 0.11.39
There are still some lingering issues with using original vs. transposed graphs, e.g.,
>>> predicates = ["rdfs:subClassOf"]
>>> tp = oi.termset_pairwise_similarity(["BFO:0000006"], ["BFO:0000018"], predicates)
Graph contains multiple disconnected components. Will ignore all but the largest component. 24 components are present. Largest component has 35 nodes.
>>> oi = get_implementation_from_shorthand("grape:sqlite:obo:phenio")
>>> tp = oi.termset_pairwise_similarity(["eye"], ["eyelash"], predicates)
Graph contains multiple disconnected components. Will ignore all but the largest component. 31982 components are present. Largest component has 242712 nodes.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/harry/oakx-grape/src/oakx_grape/grape_implementation.py", line 324, in termset_pairwise_similarity
resnik_model = self._make_grape_resnik_model(dag=dag)
File "/home/harry/oakx-grape/src/oakx_grape/grape_implementation.py", line 287, in _make_grape_resnik_model
resnik_model.fit(graph, node_counts=counts)
File "/home/harry/oakx-grape/.venv/lib/python3.9/site-packages/embiggen/similarities/dag_resnik.py", line 32, in fit
self._model.fit(
ValueError: The current graph instance Unnamed is not directed acyclic.
This suggests that the graph isn't being interpreted as a DAG because the edges are going in the wrong direction - an issue that transposition will fix unless the original graph is already "correct".
Graphs should only be transposed if necessary, not by default.
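One way to make transposition conditional is to inspect the edge direction before flipping anything. The sketch below is not the oakx-grape implementation. It works on a plain list of (subject, object) pairs with made-up identifiers, and it assumes that a known root term is available and that the downstream model wants parent-to-child edges. The helper names `is_child_to_parent` and `orient_parent_to_child` are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[str, str]  # (subject, object)


def is_child_to_parent(edges: List[Edge], root: str) -> bool:
    """Guess the direction: in a child-to-parent graph the root only ever appears as an object."""
    as_subject = any(s == root for s, _ in edges)
    as_object = any(o == root for _, o in edges)
    return as_object and not as_subject


def orient_parent_to_child(edges: List[Edge], root: str) -> List[Edge]:
    """Transpose only when the edges currently point from child to parent."""
    if is_child_to_parent(edges, root):
        return [(o, s) for s, o in edges]
    return list(edges)


# Made-up is_a edges written child -> parent.
child_to_parent = [("EX:child", "EX:parent"), ("EX:parent", "EX:root")]
flipped = orient_parent_to_child(child_to_parent, root="EX:root")

# A graph that is already parent -> child is returned unchanged.
unchanged = orient_parent_to_child(flipped, root="EX:root")
```

The root-based test is only a heuristic. A graph with several roots, or with no agreed root term, would need a different check.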
There are a few specific use cases where having access to grape in OAK would be particularly useful, but they require some demonstration.
These should be in the docs.
For example:
When running poetry run python -m unittest I get:
======================================================================
ERROR: tests.test_plugin (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: tests.test_plugin
Traceback (most recent call last):
File "/Users/cjm/opt/anaconda3/envs/oakx-grape-env/lib/python3.9/unittest/loader.py", line 436, in _find_test_path
module = self._get_module_from_name(name)
File "/Users/cjm/opt/anaconda3/envs/oakx-grape-env/lib/python3.9/unittest/loader.py", line 377, in _get_module_from_name
__import__(name)
File "/Users/cjm/repos/oakx-grape/tests/test_plugin.py", line 4, in <module>
from oaklib.implementations import get_implementation_resolver
File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/oaklib/__init__.py", line 9, in <module>
from oaklib.selector import get_implementation_from_shorthand # noqa:F401
File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/oaklib/selector.py", line 8, in <module>
from oaklib.implementations import GildaImplementation
File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/oaklib/implementations/__init__.py", line 28, in <module>
from oaklib.implementations.pronto.pronto_implementation import ProntoImplementation
File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/oaklib/implementations/pronto/pronto_implementation.py", line 13, in <module>
import pronto
File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/pronto/__init__.py", line 24, in <module>
from .definition import Definition
File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/pronto/definition.py", line 5, in <module>
from .xref import Xref
File "/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/pronto/xref.py", line 6, in <module>
import fastobo
ImportError: dlopen(/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/fastobo.cpython-39-darwin.so, 0x0002): tried: '/Users/cjm/Library/Caches/pypoetry/virtualenvs/oakx-grape-pUaGkv-X-py3.9/lib/python3.9/site-packages/fastobo.cpython-39-darwin.so' (mach-o file, but is an incompatible architecture (have (x86_64), need (arm64e)))
OAKx-grape does not reuse existing, cached versions of KGX node/edge files.
Do we want it to?
In practice, this doesn't lead to a massive slowdown, though for larger graphs it does create a noticeable delay.
We could store the files with pystow, much like the SQL versions, so that stored graphs are persistent.
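A reuse check could look roughly like the following. This is a sketch, not existing oakx-grape behavior: the cache directory, the file naming scheme, and the `build` callback are all hypothetical, and plain `pathlib` stands in for pystow here.

```python
from pathlib import Path
from typing import Callable, Tuple


def cached_kgx_files(
    cache_dir: Path, name: str, build: Callable[[Path, Path], None]
) -> Tuple[Path, Path]:
    """Return (nodes, edges) paths, calling build() only if either file is missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    nodes = cache_dir / f"{name}_nodes.tsv"  # hypothetical naming scheme
    edges = cache_dir / f"{name}_edges.tsv"
    if not (nodes.exists() and edges.exists()):
        # Hypothetical callback that writes both KGX files to the given paths.
        build(nodes, edges)
    return nodes, edges
```

This check only looks at file existence. Detecting a source ontology that changed since the files were written would need something extra, such as comparing modification times or a checksum.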
It's awesome that we are now wrapping the underlying embiggen/ensmallen code.
It would be great to see docs on the underlying Rust objects, e.g.
Right now, we transpose the entire graph when loading, so it's compatible with semsim operations.
This isn't quite right as edge types other than is_a don't quite make sense when transposed.
We may need to select only is_a edges when transposing the graph.
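The idea of flipping only is_a edges can be sketched on a plain edge list. This is an illustration rather than the plugin's loader: the triples are made up, the function name `transpose_is_a_only` is hypothetical, and it assumes is_a edges carry the predicate `rdfs:subClassOf` (a KGX export may label them differently).

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def transpose_is_a_only(
    triples: Iterable[Triple],
    is_a_predicates: Iterable[str] = ("rdfs:subClassOf",),
) -> List[Triple]:
    """Swap subject and object for is_a edges; leave every other edge type as it is."""
    is_a = set(is_a_predicates)
    result = []
    for s, p, o in triples:
        result.append((o, p, s) if p in is_a else (s, p, o))
    return result


# Made-up example triples.
triples = [
    ("EX:eyelash", "rdfs:subClassOf", "EX:hair"),
    ("EX:eyelash", "BFO:0000050", "EX:eye"),  # part_of, so it is not transposed
]
transposed = transpose_is_a_only(triples)
```

Whether the non-is_a edges should be kept in their original direction, as here, or dropped before the similarity model is fit is a separate design choice that the sketch does not settle.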