
mowl's Introduction


mOWL: Machine Learning Library with Ontologies

mOWL is a library that provides machine learning methods in which ontologies are used as background knowledge. mOWL is developed mainly in Python, but it also integrates the functionality of OWLAPI, which is written in Java. JPype is used to bind Python to the Java Virtual Machine (JVM).


Installation

System dependencies

  • JDK version >= 11
  • Python version: 3.8, 3.9, 3.10, 3.11
  • Conda version >= 4.x.x

Python requirements

  • Gensim >= 4.x.x
  • PyTorch >= 1.12.x
  • PyKEEN >= 1.10.1

Install from PyPI

pip install mowl-borg

Install from source

pip install git+https://github.com/bio-ontology-research-group/mowl

List of contributors

License

This software library is distributed under the BSD-3-Clause license

Documentation

Full documentation and the API reference can be found on our ReadTheDocs website.

ChangeLog

ChangeLog is available in our changelog file and also in the release section.

Citation

If you used mOWL in your work, please consider citing this article:

@article{10.1093/bioinformatics/btac811,
    author = {Zhapa-Camacho, Fernando and Kulmanov, Maxat and Hoehndorf, Robert},
    title = "{mOWL: Python library for machine learning with biomedical ontologies}",
    journal = {Bioinformatics},
    year = {2022},
    month = {12},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btac811},
    url = {https://doi.org/10.1093/bioinformatics/btac811},
    note = {btac811},
    eprint = {https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btac811/48438324/btac811.pdf},
}

mowl's People

Contributors

azzatha, bit2424, carsten-jahn, coolmaksat, dependabot[bot], edigerad, ferzcam, leechuck, luis-sribeiro, smalghamdi, sonjakatz, trellixvulnteam


mowl's Issues

Doc misses comma

The documentation tutorial for "Translational embeddings" is missing a comma after the epochs parameter.

Walking requires specified outfile

The walking classes write the walks to a file; could this default to a tmpfile (i.e., without specifying it manually) and only allow specifying the filename optionally if desired?
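A minimal sketch of the requested behaviour, assuming a hypothetical helper named `resolve_outfile` (it is not part of mOWL): when no path is given, the walker would fall back to a temporary file, and an explicit filename would still be honoured.

```python
import os
import tempfile
from typing import Optional


def resolve_outfile(outfile: Optional[str] = None) -> str:
    """Return the path the walks should be written to.

    Hypothetical helper: if no path is given, an empty temporary file is
    created and its name is returned.
    """
    if outfile is not None:
        return outfile
    # mkstemp creates the file securely and returns an open descriptor plus the path.
    fd, path = tempfile.mkstemp(prefix="walks_", suffix=".txt")
    os.close(fd)  # the walker would reopen the path for writing
    return path


if __name__ == "__main__":
    path = resolve_outfile()
    print(path)
    os.remove(path)  # clean up the temporary file
```

The caller stays responsible for deleting the temporary file once the walks have been consumed.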

It seems that some terms were lost while implementing KGE on Ontologies.

Describe the bug

When I ran the custom go.owl file using the example TransE code, it seems that some terms were lost.

How to reproduce

The code using the mOWL wrapper is as follows.

`
import mowl
mowl.init_jvm("20g")
from mowl.projection.edge import Edge
from mowl.projection import TaxonomyProjector

from mowl.datasets.base import PathDataset

dataset = PathDataset("go_cafa3.owl")

from mowl.models import GraphPlusPyKEENModel
from mowl.projection import DL2VecProjector
from pykeen.models import TransE
import torch as th

model = GraphPlusPyKEENModel(dataset)
model.set_projector(DL2VecProjector())
model.set_kge_method(TransE, random_seed=42)
model.optimizer = th.optim.Adam
model.lr = 0.001
model.batch_size = 32
model.train(epochs = 1)

class_embs = model.class_embeddings
role_embs = model.object_property_embeddings
ind_embs = model.individual_embeddings

terms = []
vectors = []
for i, word in enumerate(class_embs):
    vector = class_embs[word]
    items = word.split('/')
    if len(items) > 1:
        word = items[-1]
    if word.startswith('GO') and not word.endswith('>'):
        term = items[-1]
        terms.append(term)
        vectors.append(vector)

'GO:0005926' in terms
`

False

However, GO_0005926 is present in the OWL file:

<owl:Class rdf:about="http://purl.obolibrary.org/obo/GO_0005926">
    <obo:IAO_0000231 rdf:resource="http://purl.obolibrary.org/obo/IAO_0000227"/>
    <obo:IAO_0100001 rdf:resource="http://purl.obolibrary.org/obo/GO_0005925"/>
    <owl:deprecated rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</owl:deprecated>
</owl:Class>
...

The same happens with the PyKEEN version of the code:

`
import mowl
mowl.init_jvm("20g")
from mowl.projection.edge import Edge
from mowl.datasets.builtin import PPIYeastSlimDataset
from mowl.projection import TaxonomyProjector

from mowl.datasets.base import PathDataset

dataset = PathDataset("go.owl")

proj = TaxonomyProjector(True)

edges = proj.project(dataset.ontology)

#edges = [Edge("node1", "rel1", "node3"), Edge("node5", "rel2", "node1"), Edge("node2", "rel1", "node1")] # example of edges
triples_factory = Edge.as_pykeen(edges, create_inverse_triples = True)

from pykeen.models import TransE
pk_model = TransE(triples_factory=triples_factory, embedding_dim = 50, random_seed=42)
from mowl.kge import KGEModel

model = KGEModel(triples_factory, pk_model, epochs = 1, batch_size = 32)
model.train()
ent_embs = model.class_embeddings_dict
rel_embs = model.object_property_embeddings_dict

terms = []
vectors = []
for i, word in enumerate(ent_embs):
    vector = ent_embs[word]
    items = word.split('/')
    if len(items) > 1:
        word = items[-1]
    if word.startswith('GO') and not word.endswith('>'):
        term = items[-1]
        terms.append(term)
        vectors.append(vector)

'GO_0005926' in terms
`

False

In addition, when the projection step of the code above runs (TaxonomyProjector(True), proj.project(dataset.ontology) and Edge.as_pykeen(edges, create_inverse_triples = True)), it logs "INFO: Number of ontology classes: 50119", but the final len(terms) is only 42819. Is this because the outdated terms were discarded?
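One way to test that hypothesis is to count the classes marked owl:deprecated in the OWL file and compare the result with the gap of 7300 classes (50119 - 42819). A plausible explanation, not confirmed here, is that deprecated classes have no subclass axioms, so a taxonomy-based projection never creates an edge for them. The sketch below uses only the standard library and assumes RDF/XML input shaped like the excerpt above; `deprecated_class_iris` and `SAMPLE` are illustrative names.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def deprecated_class_iris(rdf_xml: str) -> set:
    """Collect IRIs of owl:Class elements annotated with owl:deprecated = true."""
    root = ET.fromstring(rdf_xml)
    found = set()
    for cls in root.iter(f"{{{OWL}}}Class"):
        iri = cls.get(f"{{{RDF}}}about")  # anonymous classes have no rdf:about
        flag = cls.find(f"{{{OWL}}}deprecated")
        if iri and flag is not None and (flag.text or "").strip() == "true":
            found.add(iri)
    return found


# Tiny stand-in for go.owl, modelled on the excerpt in this report.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/GO_0005926">
    <owl:deprecated rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">true</owl:deprecated>
  </owl:Class>
  <owl:Class rdf:about="http://purl.obolibrary.org/obo/GO_0005925"/>
</rdf:RDF>"""

if __name__ == "__main__":
    print(deprecated_class_iris(SAMPLE))
```

For a real file, the string argument can be replaced by reading the file contents, although a streaming parser may be preferable for an ontology of this size. Note also that the first snippet tests for 'GO:0005926' while the names derived from the IRIs use an underscore (GO_0005926), so that check returns False regardless of whether the term was embedded.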

Environment

OS information

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Python version

Python=3.8.13

mOWL version

mowl-borg==0.2.0

JDK version

openjdk 17.0.3-internal 2022-04-19

Additional information

If I need to use embeddings for outdated terms, how should I proceed?

An error when running mOWL for the OPA2Vec demo

When I ran the OPA2Vec example code from https://mowl.readthedocs.io/en/latest/examples/syntactic/plot_2_opa2vec.html#sphx-glr-examples-syntactic-plot-2-opa2vec-py, an error occurred. How can I fix it?
`---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/home/wangbin/contrast_exp/opa2vec/opa2vec_cafa3.ipynb Cell 4 in 1
----> 1 subclass_axioms = mowl_reasoner.infer_subclass_axioms(classes)
2 equivalent_class_axioms = mowl_reasoner.infer_equivalent_class_axioms(classes)

File /home/software/anaconda3/envs/mowl/lib/python3.8/site-packages/mowl/reasoning/base.py:15, in count_added_axioms.<locals>.wrapper(self, ontology)
13 @wraps(func)
14 def wrapper(self, ontology):
---> 15 initial_number = ontology.getAxiomCount()
16 func(self, ontology)
17 final_number = ontology.getAxiomCount()

AttributeError: 'org.semanticweb.owlapi.util.CollectionFactory.Cond' object has no attribute 'getAxiomCount'
`

Error running the example notebook from biohackathon MENA 2023

Trying to run the notebook available here: https://drive.google.com/drive/folders/1JHbaV__9f_zVx-lIvp82b2mvc6AjaseJ

Everything works until this codeblock:

from gensim.models.word2vec import LineSentence
from gensim.models import Word2Vec

walk_corpus_file = walker.outfile
sentences = LineSentence(walk_corpus_file)

w2v_model = Word2Vec(sentences, size = 20)

Which gives this error:

TypeError                                 Traceback (most recent call last)
Cell In [27], line 7
      4 walk_corpus_file = walker.outfile
      5 sentences = LineSentence(walk_corpus_file)
----> 7 w2v_model = Word2Vec(sentences, size = 20)

TypeError: __init__() got an unexpected keyword argument 'size'

It seems that the issue comes from the version of Gensim installed by mOWL. There is currently no specification for the Gensim version; it might be necessary to add one: https://github.com/bio-ontology-research-group/mowl/blob/main/environment.yml#L8

The versions installed for me are:

  • python 3.9.15
  • gensim 4.3.0
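For context, Gensim 4.0 renamed the `size` parameter of `Word2Vec` to `vector_size`, which matches the TypeError above. The sketch below picks the right keyword from a version string; `word2vec_dim_kwargs` is a hypothetical helper, not part of Gensim or mOWL.

```python
def word2vec_dim_kwargs(gensim_version: str, dim: int) -> dict:
    """Return the keyword argument that sets the embedding dimension.

    Hypothetical helper: Gensim 4.0 renamed `size` to `vector_size`.
    """
    major = int(gensim_version.split(".")[0])
    return {"vector_size": dim} if major >= 4 else {"size": dim}


if __name__ == "__main__":
    print(word2vec_dim_kwargs("4.3.0", 20))
    # With Gensim installed, the notebook call would become:
    #   import gensim
    #   w2v_model = Word2Vec(sentences, **word2vec_dim_kwargs(gensim.__version__, 20))
```

With Gensim 4.3.0, as installed here, the simplest fix for the notebook is to write `Word2Vec(sentences, vector_size = 20)` directly.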

issues with installation

Hi there:
I ran into some issues with the package installation. Following the documentation, I installed everything mentioned in https://mowl.readthedocs.io/en/latest/install/index.html. When I tried to run the embedding methods provided on the page https://mowl.readthedocs.io/en/latest/graphs/kge.html, I got this error.
image

I tried to install from pypi and build from source code, but it still didn't work. Can you help me to figure out what is the issue? Thank you in advance!

Implement categorical projection of ontologies

mOWL supports different types of models. Select the type that best fits your suggestion.

Ontology graph projection

Publication Link

https://arxiv.org/abs/2305.07163

Reference Implementation

https://github.com/bio-ontology-research-group/catE

Additional Implementations

No response

Additional Information

This graph projection must support some parameters such as: saturation steps, saturation with transitivity, and separation of concept and role categories.

Import ELEModule error

Describe the bug

from mowl.nn import ELEModule
from mowl.base_models.elmodel import EmbeddingELModel


ImportError Traceback (most recent call last)
Cell In[1], line 1
----> 1 from mowl.nn import ELEModule
2 from mowl.base_models.elmodel import EmbeddingELModel

ImportError: cannot import name 'ELEModule' from 'mowl.nn'

also tried:

from mowl.models.elembeddings.module import ELEModule
from mowl.base_models.elmodel import EmbeddingELModel

ImportError: Attempt to create Java package 'org' without jvm
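The second error suggests that a JVM-backed module was imported before the JVM was started. Other snippets on this page call `mowl.init_jvm(...)` immediately after `import mowl`, so a minimal sketch of that ordering could look as follows. The function name `import_ele_module` is illustrative, the memory value is an assumption, and the import path is copied from this report and may differ between mOWL versions.

```python
import importlib.util


def import_ele_module(jvm_memory: str = "4g"):
    """Start the JVM first, then import a mOWL module that depends on it."""
    if importlib.util.find_spec("mowl") is None:
        return None  # mOWL is not installed in this environment
    import mowl
    mowl.init_jvm(jvm_memory)  # must run before importing JVM-backed submodules
    # Import path taken from the report; it may not exist in every mOWL version.
    from mowl.models.elembeddings.module import ELEModule
    return ELEModule


if __name__ == "__main__":
    print(import_ele_module())
```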

How to reproduce

from mowl.nn import ELEModule
from mowl.base_models.elmodel import EmbeddingELModel

#also tried

from mowl.models.elembeddings.module import ELEModule
from mowl.base_models.elmodel import EmbeddingELModel

Environment

Python (using Pycharm)
mOWL version 0.2.1

Additional information

No response

Run the Pretrained Model without a Validation Set or Test Set

Describe the bug

Hi, I tried to use mOWL to embed my ontology. From what I see in the documentation, the validation and test ontologies are optional.

image

I don't have a validation set or a test set at the moment, so I tried to run the model with only my own training dataset.

However, it raised the error "AttributeError: Validation dataset is None.", as shown in the screenshots below.

image
image

Is it possible to run the model without a validation or test ontology? If not, how can I create validation and test sets from the ontology I have at the moment? My ontology is in OWL format, similar to this:

<!-- http://www.opengis.net/ont/geosparql#Feature -->

<owl:Class rdf:about="http://www.opengis.net/ont/geosparql#Feature">
    <rdfs:subClassOf rdf:resource="http://www.opengis.net/ont/geosparql#SpatialObject"/>
    <owl:disjointWith rdf:resource="http://www.opengis.net/ont/geosparql#Geometry"/>
    <rdfs:isDefinedBy rdf:resource="http://www.opengis.net/ont/geosparql#"/>
    <rdfs:isDefinedBy rdf:resource="http://www.opengis.net/spec/geosparql/1.0/req/core/feature-class"/>
    <rdfs:isDefinedBy rdf:resource="http://www.opengis.net/spec/geosparql/1.1/req/core/feature-class"/>
    <skos:definition xml:lang="en">A discrete spatial phenomenon in a universe of discourse.</skos:definition>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.1"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.2"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.3"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.4"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.5"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.6"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.7"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.8"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.2.9"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.3.2"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.1.3.3"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.2.2"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.2.3"/>
    <skos:example rdf:resource="http://www.opengis.net/spec/geosparql/1.1/specification.html#C.1.2.4"/>
    <skos:note xml:lang="en">A Feature represents a uniquely identifiable phenomenon, for example a river or an apple. While such phenomena (and therefore the Features used to represent them) are bounded, their boundaries may be crisp (e.g., the declared boundaries of a state), vague (e.g., the delineation of a valley versus its neighboring mountains), and change with time (e.g., a storm front). While discrete in nature, Features may be created from continuous observations, such as an isochrone that determines the region that can be reached by ambulance within 5 minutes.</skos:note>
    <skos:prefLabel xml:lang="en">Feature</skos:prefLabel>
</owl:Class>
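On the second question, one common approach is to hold out a random share of the axioms. The sketch below shows only the partitioning logic on a generic list (for example, axioms serialized as strings). `split_axioms` is a hypothetical helper, and writing each part back to an OWL file, which would need OWLAPI, is not shown.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_axioms(axioms: Sequence[T], valid_frac: float = 0.1,
                 test_frac: float = 0.1, seed: int = 42
                 ) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle the axioms and split them into train, validation and test lists."""
    if valid_frac < 0 or test_frac < 0 or valid_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(axioms)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_valid = int(len(shuffled) * valid_frac)
    n_test = int(len(shuffled) * test_frac)
    valid = shuffled[:n_valid]
    test = shuffled[n_valid:n_valid + n_test]
    train = shuffled[n_valid + n_test:]
    return train, valid, test


if __name__ == "__main__":
    train, valid, test = split_axioms([f"axiom_{i}" for i in range(100)])
    print(len(train), len(valid), len(test))
```

In practice the training part should still mention every class that appears in the validation and test parts, otherwise those classes receive no trained embedding.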

How to reproduce

from tqdm import trange, tqdm
import torch as th
import torch.nn as nn
from mowl.datasets.base import PathDataset, Dataset, RemoteDataset, OWLClasses
from mowl.models.elembeddings.examples.model_ppi import ELEmPPI
from mowl.projection.factory import projector_factory

#Build a subclass of PathDataset to implement a customized evaluation_class
class CustomDataset(PathDataset):
    @property
    def evaluation_classes(self):
        """Classes that are used in evaluation
        """

        if self._evaluation_classes is None:
            gis = set()
            for owl_name, owl_cls in self.classes.as_dict.items():
                if "http://www.opengis" in owl_name:
                    gis.add(owl_cls)
            self._evaluation_classes = OWLClasses(gis), OWLClasses(gis)

        return self._evaluation_classes

ds = CustomDataset(ontology_path="opengis.owl")
dataset = ds

model = ELEmPPI(dataset,
                embed_dim=30,
                margin=0.1,
                reg_norm=1,
                learning_rate=0.001,
                epochs=20,
                batch_size=4096,
                model_filepath=r"C:\Users\Steven\mOWL",
                device='cpu')

# Set the number of individuals
model.module.ind_embed = nn.Embedding(num_embeddings=len(dataset.classes), embedding_dim=30)

# Training
model.train()

Environment

Windows 10
Python version 3.11.9
JDK version 11.0.22

Additional information

No response

Add build dependencies

It would be good if the "build from GitHub" instructions listed the build dependencies required (JVM, Python version, Gradle, etc.). At the moment, users installing from source need to figure out for themselves what to install.

Customized random walks

Greetings,

Thank you for the nice tool!

Is it possible to customize the walks so that they are generated only for specific entities? The input could be a file listing the entities for which embeddings are needed. I think this might be more time- and space-efficient than generating representations for all the entities/classes.
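A minimal sketch of the idea, independent of mOWL's walker classes: uniform random walks are started only from a given set of nodes of an adjacency-list graph. The function name `walks_from` and its parameters are illustrative assumptions.

```python
import random
from typing import Dict, Iterable, List


def walks_from(graph: Dict[str, List[str]], start_nodes: Iterable[str],
               num_walks: int = 2, walk_length: int = 3,
               seed: int = 0) -> List[List[str]]:
    """Generate uniform random walks that start only at the requested nodes."""
    rng = random.Random(seed)
    walks = []
    for start in start_nodes:
        if start not in graph:
            continue  # skip entities that do not occur in the graph
        for _ in range(num_walks):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph.get(walk[-1], [])
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


if __name__ == "__main__":
    graph = {"A": ["B"], "B": ["C"], "C": []}
    print(walks_from(graph, ["A"]))
```

The walks still pass through other entities, so a Word2Vec model trained on them would also learn vectors for those context nodes, but the corpus is limited to walks rooted at the entities of interest.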

Best,

A bug that may be due to a mismatch between version of the code and documentation

Describe the bug

  1. I installed the mowl package (version 0.2.0).
  2. I ran the example code from the documentation (link).
  3. I got a ModuleNotFoundError in the last code block of the linked tutorial.

ModuleNotFoundError Traceback (most recent call last)
/home/wangbin/contrast_exp/TransH/transh_cafa3.ipynb Cell 4 in 5
1 # from mowl.models.graph_kge.graph_pykeen_model import GraphPlusPyKEENModel
4 from mowl.datasets.builtin import FamilyDataset
----> 5 from mowl.models import GraphPlusPyKEENModel
6 from mowl.projection import DL2VecProjector
7 from pykeen.models import TransE
File /home/software/anaconda3/envs/mowl/lib/python3.8/site-packages/mowl/models/__init__.py:3
1 from mowl.models.elembeddings.model import ELEmbeddings
2 from mowl.models.elboxembeddings.model import ELBoxEmbeddings
----> 3 from mowl.models.graph_random_walk.random_walk_w2v_model import RandomWalkPlusW2VModel
4 from mowl.models.graph_kge.graph_pykeen_model import GraphPlusPyKEENModel
5 from mowl.models.syntactic.w2v_model import SyntacticPlusW2VModel
ModuleNotFoundError: No module named 'mowl.models.graph_random_walk'

How to reproduce

This is the exact code I ran.

`
import mowl
mowl.init_jvm("20g")
from mowl.projection.edge import Edge
from mowl.datasets.builtin import PPIYeastSlimDataset
from mowl.projection import TaxonomyProjector

ds = PPIYeastSlimDataset()
proj = TaxonomyProjector(True)

edges = proj.project(ds.ontology)

#edges = [Edge("node1", "rel1", "node3"), Edge("node5", "rel2", "node1"), Edge("node2", "rel1", "node1")] # example of edges
triples_factory = Edge.as_pykeen(edges, create_inverse_triples = True)

from pykeen.models import TransE
pk_model = TransE(triples_factory=triples_factory, embedding_dim = 50, random_seed=42)

from mowl.kge import KGEModel

model = KGEModel(triples_factory, pk_model, epochs = 10, batch_size = 32)
model.train()
ent_embs = model.class_embeddings_dict
rel_embs = model.object_property_embeddings_dict

from mowl.datasets.builtin import FamilyDataset
from mowl.models import GraphPlusPyKEENModel
from mowl.projection import DL2VecProjector
from pykeen.models import TransE
import torch as th

model = GraphPlusPyKEENModel(FamilyDataset())
model.set_projector(DL2VecProjector())
model.set_kge_method(TransE, random_seed=42)
model.optimizer = th.optim.Adam
model.lr = 0.001
model.batch_size = 32
model.train(epochs = 2)

# Get embeddings

class_embs = model.class_embeddings
role_embs = model.object_property_embeddings
ind_embs = model.individual_embeddings
`

Environment

OS information

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Python version

Python=3.8.13

mOWL version

mowl-borg==0.2.0

JDK version

openjdk 17.0.3-internal 2022-04-19

Additional information

The main reason is this code.
image

HTTPError when using builtin datasets

Describe the bug

Hi,
When trying to run your demo code for the builtin datasets, I get, e.g., the following error:

HTTPError: 403 Client Error: Forbidden for url: https://bio2vec.cbrc.kaust.edu.sa/data/mowl/ppi_yeast_slim.tar.gz

I get the same error when trying to wget the tar file.

How to reproduce

import mowl
mowl.init_jvm("5g")
from mowl.datasets.builtin import PPIYeastSlimDataset
ds = PPIYeastSlimDataset()
train_ontology = ds.ontology
valid_ontology = ds.validation
test_ontology = ds.testing

Environment

Python 3.10.12
pip installed mowl-borg version 0.3.0

Additional information

No response

Cannot find gateway

Hello,

I installed mowl using pip according to the instructions on the website, but when I try to run the first example I get:

File ~/.pyenv/versions/3.9.13/lib/python3.9/site-packages/mowl/datasets/__init__.py:7, in <module>
      5 dirname = os.path.dirname(__file__)
      6 jars_dir = os.path.join(dirname, "../../gateway/build/distributions/gateway/lib/")
----> 7 jars = f'{str.join(":", [jars_dir + name for name in os.listdir(jars_dir)])}'
      9 if not jpype.isJVMStarted():
     10     jpype.startJVM(
     11         jpype.getDefaultJVMPath(), "-ea",
     12         "-Xmx10g",
     13         "-Djava.class.path=" + jars,
     14         convertStrings=False)

FileNotFoundError: [Errno 2] No such file or directory: '/home/leechuck/.pyenv/versions/3.9.13/lib/python3.9/site-packages/mowl/datasets/../../gateway/build/distributions/gateway/lib/'

Do I need to install another package?
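The traceback shows `os.listdir` failing on a directory inside the source build tree, which suggests the Java gateway JARs were not shipped with (or built for) this installation. As a sketch under that assumption, a more defensive classpath builder could check the directory first and report what is missing. `build_classpath` is a hypothetical helper, not mOWL's actual code.

```python
import os


def build_classpath(jars_dir: str) -> str:
    """Join all JAR files in jars_dir into a JVM classpath string."""
    if not os.path.isdir(jars_dir):
        raise FileNotFoundError(
            f"JAR directory not found: {jars_dir!r}. "
            "The Java gateway may not have been built or packaged."
        )
    jars = sorted(name for name in os.listdir(jars_dir) if name.endswith(".jar"))
    # os.pathsep is ':' on Linux/macOS and ';' on Windows,
    # which matches the JVM classpath separator on each platform.
    return os.pathsep.join(os.path.join(jars_dir, name) for name in jars)


if __name__ == "__main__":
    try:
        print(build_classpath("gateway/build/distributions/gateway/lib"))
    except FileNotFoundError as err:
        print(err)
```

Using `os.pathsep` instead of a hard-coded ":" also avoids a broken classpath on Windows.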

setup.py Error While Building from Source

Describe the bug

Hi, I was following the tutorial here https://pypi.org/project/mowl-borg/ to set up mowl. When I tried to build mowl from source, I followed the commands here:

git clone https://github.com/bio-ontology-research-group/mowl.git

cd mowl

conda env create -f envs/environment_3_8.yml
conda activate mowl

./build_jars.sh

python setup.py install

At the last step, "python setup.py install", it raised the error shown in the screenshot:
nimport_error

The first line of module.py contains nimport where import is expected. I could not work around this by editing the file manually and running "python setup.py install" again, because every run regenerates module.py and overwrites my change, so the new module.py still contains nimport.

How to reproduce

git clone https://github.com/bio-ontology-research-group/mowl.git

cd mowl

conda env create -f envs/environment_3_8.yml
conda activate mowl

./build_jars.sh

python setup.py install

Environment

Windows 10
Python version 3.12.1
JDK version 11.0.22

Additional information

No response

Add special visualization for geometric embeddings

Not all embeddings are just points in a vector space; some embeddings (like ELEm and ELBE) project to regions. Would it be possible to visualize those in 2D space, showing the regions as regions (instead of points, as with TSNE)?
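As a rough illustration of what such a plot could look like, the sketch below renders ball-shaped regions as circles in an SVG string using only the standard library. It assumes each class is already available as a 2D centre and radius (for example, from a model trained with two dimensions). The function `balls_to_svg` and the sample values are made up, and box-shaped regions such as ELBE's would need rectangles instead.

```python
from typing import Dict, Tuple
from xml.sax.saxutils import escape


def balls_to_svg(balls: Dict[str, Tuple[float, float, float]],
                 scale: float = 100.0, size: int = 400) -> str:
    """Render 2D balls given as (cx, cy, radius) as labelled SVG circles."""
    half = size / 2
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    for name, (cx, cy, r) in balls.items():
        x = half + cx * scale
        y = half - cy * scale  # the SVG y-axis points downwards
        parts.append(
            f'<circle cx="{x:.1f}" cy="{y:.1f}" r="{abs(r) * scale:.1f}" '
            'fill="none" stroke="black"/>'
        )
        parts.append(f'<text x="{x:.1f}" y="{y:.1f}" text-anchor="middle">{escape(name)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    # Made-up regions: a subclass drawn inside its superclass.
    svg = balls_to_svg({"Parent": (0.0, 0.0, 1.0), "Child": (0.2, 0.1, 0.3)})
    with open("regions.svg", "w") as handle:
        handle.write(svg)
```

Drawing the regions directly makes containment visible: a subclass ball lying inside its superclass ball is exactly what a point-based TSNE plot cannot show.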
