pyt-team / toponetx Goto Github PK
View Code? Open in Web Editor NEWComputing on Topological Domains
License: MIT License
Computing on Topological Domains
License: MIT License
Many function signatures inside the individual complex classes are already prepared to consider weights (ref. #242), however the functionality behind these parameters is largely non-existent yet. This issue is meant to help keep track of the progress for this task.
The implementation would largely follow the concept used in NetworkX: Weights are just user-defined properties of the atoms in a complex. The functions take in a property name and considers the (numeric) values under this name as weights.
See for example nx.dijkstra_path().
SC = SimplicialComplex()
SC.add_simplex((1, 2), distance=4)
SC.add_simplex((2, 3), distance=10)
_ = SC.coadjacency_matrix(rank=0, weight="distance") # Use `distance` property on the simplices as weights
Create a class that defines the layer of a pre-existing hypergraph neural network (as described in the literature review here) and implement this layer in a trainable neural network in a Jupyter notebook tutorial.
This will provide a working example for users wishing to implement and train a hypergraph neural network using TopoModelX.
The neural network layer goes into the folder TopoModelX/topomodelx/nn/hypergraph and the tutorial goes into the folder TopoModelX/tutorials.
With a corresponding folder of tests in TopoModelxX/test/nn/hypergraph and TopoModelX/test/tutorials.
Only integrate a new function into topomodelx if it comes with its corresponding test function.
Iterating over complexes is inconsistent across different complex types:
We should make this consistent across all implementations and add a definition to the Complex
ABC. I would suggest for the __iter__
method to iterate over all atoms in a complex. Any thoughts?
There is a bug in the skeleton method of the Hyperedge View Class.
The relevant part of code that introduces the bug is reproduced below:
def skeleton(self, rank, level=None):
"""Skeleton of the complex."""
if level is None or level == "equal":
elements = []
if rank in self.allranks:
return sorted(list(self.hyperedge_dict[rank].keys()))
else:
return []
elif level == "upper" or level == "up":
elements = []
for rank in self.allranks:
if rank >= rank:
elements = elements + list(self.hyperedge_dict[rank].keys())
return sorted(elements)
elif level == "lower" or level == "down":
elements = []
for rank in self.allranks:
if rank <= rank:
elements = elements + list(self.hyperedge_dict[rank].keys())
return sorted(elements)
else:
raise TopoNetXError(
"level must be None, equal, 'upper', 'lower', 'up', or 'down' "
As can be seen in the above code, the input parameter rank
is overwritten by the for loop variable rank
which makes the condition to always evaluate to True
from toponetx import *
he_1 = HyperEdge((1, 2, 3, 4), rank=2)
CC = CombinatorialComplex([he_1], name='trial')
print(CC._complex_set.skeleton(rank=3, level='upper'))
print(CC._complex_set.skeleton(rank=0, level='upper'))
The above two print statements would yield the same value, hence the Bug.
Review TopoNetX at a high-level:
Let us know whether it is ready to be zipped for submission to JMLR.
If outstanding tasks remain, feel free to raise individual GitHub Issues or comment on this issue.
Using calatrava uml TopoNetX
from the folder containing the repo TopoNetX, we can see the architecture of the package.
See: https://github.com/luisfpereira/calatrava
It looks like the screenshot below.
I was following the steps in "Getting Started" section of README file, and the "pip install -e [.all]" command doesn't seem to work for me.
I was able to resolve it by first upgrading the PIP version (it seems that a pip >= 21.3 is required).
I am also using zsh and zsh seems to require a single quote around [.all]. So this works pip install -e '[.all]'
Every tutorial on toponetx's doc website has a thumbnail. Currently, the tutorial on colored hypergraphs uses a hypergraph thumbnail. This should be corrected with a colored hypergraph thumbnail.
Otherwise, it is incorrect and confusing for users who might mix up hypergraphs and colored hypergraphs.
Most classes are directly accessible from toponetx itself. Implying that from toponetx import *
imports many objects and populates the global namespace.
Put code that lifts a complex (such as a simplicial complex) into another (such as a cellular complex) into toponetx
.
This code is currently in topomodelx/transforms (dev branch) here.
Because the lift operation can be used outside of the deep learning context.
Thus, it should be removed from topomodelx
and included in toponetx
.
Into a folder TopoNetX/toponetx/lift and files:
With a corresponding folder of tests in TopoNetX/test/lift and files:
Only integrate a new function into toponetx if it comes with its corresponding test function.
The docstrings in TNX do not respect proper coding style, which causes the doc website to break.
Everywhere in the repository.
Find issues in ColoredHyperGraph and CCC, document them properly.
The current documentation website does not have an API reference.
https://pyt-team.github.io/
We should add one to help contributors navigate the repository.
The documentation website is a great entry point for contributors.
Add API documentation pages in here:
https://github.com/pyt-team/TopoNetX/tree/main/docs
If necessary, change the docs workflow:
.github/workflows/docs.yml
The API should be automatically generated from the code.
Follow what is done here: https://github.com/geomstats/geomstats
Similar to #266, the __getitem__
method is not implemented consistently across different complex types:
SC[simplex]
returns the user-defined attributes associated with that simplex.CC[cell]
returns the neighbours of that cell.CHG[node]
returns the attributes of that node, but it does not work for hyperedges.PC[path]
returns the attributes associated with that path.The following example leads to an error:
from toponetx import SimplicialComplex as sc
simplices = [[3668, 5308, 5900]]
SC = sc(simplices)
B1 = SC.incidence_matrix(2)
The problem if I'm not mistaken is a mix of ordered data structures (tuple) and unordered (fronzen set):
Write methods to generate topological domains.
All experiments on topological domains require topological domains to start with. To generate examples beyond the trivial small examples, one typically relies on procedural methods to do so. In NetworkX, graphs are generated using the generator module https://github.com/networkx/networkx/tree/main/networkx/generators
which allows to generate random graphs with various properties.
Many of these methods are easily generalizable to topological domains or can be obtained by lifting or using some theoretical results (e.g every hypergraph can be realized as a bipartite graph ).
Follow networkX, create a module generator inside toponetx, start getting some of the methods that can be generalized to other higher order complexes slowly, name each file appropriately using convention from networkx.
Increase the code coverage for TopoNetX, making it ready for publications.
The following classes require attentions as they have many methods that remain untested:
In the simplicial complex constructor we have two modes : normal and ghudi. Do we need both ?
I am not sure having both methods is useful and it is potentially harder to maintain on the long run. What do you think?
The code coverage for the Complex Abstract class is very low as Codecov shows that none of the abstract methods were being tested.
I replaced the Complex class (which is being inherited by ConcreteComplex class) with the generic Object class and all the test cases were still getting passed.
HyperNetX released version 2.0.0.post1 on May 16. It contains some performance improvements, but also introduced breaking changes.
From their GitHub repository:
What's Changed
- The static and dynamic distinctions no longer exist. All hypergraphs use the same underlying data structure, supported by Pandas dataFrames. All hypergraphs maintain a state_dict to avoid repeating
- Methods for adding nodes and hyperedges are currently not
- The nwhy optimizations are no longer supported.
- Entity and EntitySet classes are being moved to the background. The Hypergraph constructor does not accept either.
This seems to require a lot of changes on our end, so for now I'll just restrict the HyperNetX version to 1.x.
The classes Cell, Simplex, HyperEdge contain duplicate methods -- moreover there should be methods that are there across all of them. They all mean similar things : the building blocks of higher order networks. There must be a class that unifies them.
Create an abstract class called "atom" (or some better name?) which all the above classes should inherit from
Currently the individual complex classes all miss to implement some methods that should be there by the Complex
abstract class:
CellComplex.clone
CombinatorialComplex.clone
SimplicialComplex.clone
SimplicialComplex.remove_nodes
Additionally, Complex
fails to derive from abc.ABC
and thus the existing declarations of abstract methods in that class are not enforced. Otherwise it would have been more easy to spot that these implementations are missing.
If I run pip install '.[all]'
i.e. not in editable mode with the -e
option (or if I get TopoNetX as dependency from TopoModelX), then data files (such as bunny.obj and coauthorship.npy in https://github.com/pyt-team/TopoNetX/tree/main/toponetx/datasets) are excluded.
I believe something has to be changed in the setuptools/.toml file config for the right files to be included (https://setuptools.pypa.io/en/latest/userguide/datafiles.html).
As of right now, the test suite is a mixture of unittest
and pytest
, essentially bounding the tests to both modules:
unittest.TestCase
and make use of self.assert*
functions. While pytest
is happy to run these tests for the moment, some unittest
functions like subtests to distinguish iterations are not supported.assert
statements or are explicitly bound to pytest
(e.g., using pytest.deprecated_call()
).Let's decide on one of both frameworks and refactor all tests to no longer use the other framework. This should also be enforced in new pull requests.
The neighborhood matrices, (incidence, adjacency, etc) are often sparse.
Depending on the function, the matrix returned is in sparse COO or sparse CSR format.
Is there a good reason to have this discrepancies in formatting? If not, let's convert everything to either coo or csr, because other downstream algorithms would enjoy knowing that a consistent type is returned.
Atoms in a complex can be equipped with additional user-defined data. In our code base, we sometimes call these properties
and sometimes attributes
. We should stick with one and rename occurrences of the other one.
Atom
ABC and in inheriting classes (Cell
, Simplex
, ...).Complex
classes call it attributes, e.g., get_simplex_attributes
and get_cell_attributes
.For context: NetworkX calls these things attributes. If there is no reason against that name, I suggest going with that purely for consistency with NetworkX.
The definition of an abstract simplicial complex is given: what makes it abstract? can be remove the "abstract" adjective, since the non-abstract version is not defined and the python class does not have Abstract in the name
"The degree of an incidence matrix references the degree of the classes we are comparing. An incidence matrix of degree 1 compares which vertices are incident with which edges, of degree 2 compares which vertices are incident with which faces"
--> corrected: degree 2 compares edges with faces?
--> corrected: use rank instead of degree, since the API uses rank.
"The entry in row x and column y is 1 if x and y are related (called incident in this context) and 0 if they are not [2]"
--> Does not take into account the -1 presented in the notebook.
"if the i ๐กโ
vertex is not incident to j ๐กโ
vertex via an edge incident to i ๐กโ
vertex, โ
0 if it is incident."
--> slightly unclear, rephrase?
"The up-Laplacian function up_laplacian_matrix returns a matrix where row 1 will return the up-Laplacian of edge 1, row 2 will return the up-Laplacian of edge 2 and so on." --> unclear, rephrase?
The python code uses 0-indexing but the text tends to use 1-indexing, "the first, the second" which makes it confusing.
"# setting what data we want to attach to which faces. order here is important." -> how is order important?
NodeView class does not have iter abstract method implementation. This affects CombinatorialComplex, ColoredHyperGraph and other dependent classes that use NodeView as a representation to display Nodes.
The iter methods of CombinatorialComplex/ColoredHyperGraph use NodeView objects, which fail with infinite loops if we try to iterate over the nodes of the Complex.
I tried to call the SimplicialComplex.incidence_matrix
(rank 2) on the shrec dataset using the code below
import toponetx.datasets as datasets
shrec, _ = datasets.mesh.shrec_16(size="small")
simplexes = shrec["complexes"]
print(simplexes[0].incidence_matrix(rank=2))
I get the following error (KeyError)
File "/snap/pycharm-professional/336/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/kha053/PycharmProjects/TopoNetX/tutorials/test.py", line 19, in <module>
print(simplexes[0].incidence_matrix(rank=2))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/kha053/PycharmProjects/TopoNetX/toponetx/classes/simplicial_complex.py", line 783, in incidence_matrix
idx_faces.append(simplex_dict_d_minus_1[tuple(face)])
~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: (89, 81)
Similar errors are shown for SimplicialComplex.up_laplacian_matrix
and SimplicialComplex.down_laplacian_matrix
.
The CellComplex class contains some non-working, unused, or duplicate code / methods. This should be cleaned up to make it easier for both development and use.
As of right now, the changes of my pull request #112 are already visible on the documentation website even though the pull request has not been merged yet (see also when pyt-team/pyt-team.github.io@bc604e2 as authored). This most definitely should not be the case.
The problem is that the docs workflow is triggered on pull requests as well. This is fine for most steps of the workflow to test that Sphinx can build the documentation according to the pull request. However, the last step should only be executed for actual pushes to the main branch.
There were a couple of bugs w.r.t get_rank method in Hyperedge View.
The relevant code is present below
def get_rank(self, e):
"""Get rank.
Parameters
----------
e : Iterable, Hashable or HyperEdge
Returns
-------
int, the rank of the hyperedge e
"""
if isinstance(e, Iterable):
if len(e) == 0:
return 0
else:
for i in list(self.allranks):
if frozenset(e) in self.hyperedge_dict[i]:
return i
raise KeyError(f"hyperedge {e} is not in the complex")
elif isinstance(e, HyperEdge):
if len(e) == 0:
return 0
else:
for i in list(self.allranks):
if frozenset(e.elements) in self.hyperedge_dict[i]:
return i
raise KeyError(f"hyperedge {e} is not in the complex")
elif isinstance(e, Hashable) and not isinstance(e, Iterable):
if e in self.hyperedge_dict[0]:
return 0
else:
raise KeyError(f"hyperedge {e} is not in the complex")
elif isinstance(e, str):
if e in self.hyperedge_dict[0]:
return 0
else:
raise KeyError(f"hyperedge {e} is not in the complex")
e
is int
or str
never evaluates to True
, because the int/str are not present in hyperedge_dict
, but rather frozenset({e})
isThe tutorials webpage of the documentation website does not display the thumbnails properly. We should fix this.
https://pyt-team.github.io/toponetx/tutorials/index.html
The tutorials webpage is one of the first pages that users see. It should look clean.
The nbsphinx_thumbnails
dictionary could be updated in:
https://github.com/pyt-team/TopoNetX/blob/main/docs/conf.py
Alternatively, the tag nbsphinx_thumbnails could be used directly in the jupyter notebook as meta data of a cell. Note that this would override the nbsphinx_thumbnails
dictionary from conf.py
There is a difficulty emerging from the fact that the notebooks are loaded in docs/tutorials through the symlink notebooks.
Some information can be found on the website of nbsphinx
which is the package used to create such thumbnails:
https://nbsphinx.readthedocs.io/en/0.9.1/gallery/thumbnail-from-conf-py.html
JMLR requires that unit-tests' code coverage is "close to 100%". See: https://www.jmlr.org/mloss/mloss-info.html
We can see that, currently, it is at 86%.
We should make it >90%.
Go to the code coverage page for TopoNetX and observe which lines of code are not tested. Write tests for them.
https://app.codecov.io/gh/pyt-team/TopoNetX
If you create a combinatorial complex , and try to add a cell of cardinality greater than 17, and rank 1 , you will notice that it takes a very long time. if you reduce the cardinality of the added cell it gets lesser and lesser time to add it into the cc, and if you increase the cardinality , it will take longer and longer to add the cell.
My point is , add_cell has a time complexity o(n!) ,as for every set , it considers all possible combinations. I could reduce this to linear , by modifying some parts in my fork , but I am not sure of the consequence of these modification for behaviour that I have not tested. incidence matrix and and adjacency matrix were working as they should.
Hi, I am confused by the documentation/definition of the incidence_matrix function in cell_complex.py. It's not clear to me whether the function calculates an up-incidence matrix of rank r, a down-incidence matrix of rank r, or if there is a way to use the function to calculate a (r,k)-incidence matrix.
For instance, if I had a cell complex with cells of rank 1,2,3 corresponding to nodes, edges and faces respectively, what would be the best way to calculate the (2,3) incidence matrix to see which edges are incident to faces?
Minimum Python versions are kind of a mess across all packages:
Moreover, the Python version in the README files is wrong:
Lets decide on a minimum version that applies to all three packages and fix the README files accordingly.
The names of functions, files, folders, variables, parameters needs a clean-up. For example:
r
, sometimes as rank
,The docstrings do not always respect our docstring conventions, which renders poorly in the documentation website.
Clean docstrings in toponetx/algorithms
:
Clean docstrings in toponetx/classes
:
Clean docstrings in toponetx/utils
:
Decide on abbreviation and apply changes:
Make sure that variable/parameter names are consistent:
rank
for rank everywhereCombintorial complex is a hypergraph X with a rank function rk : X->Z^{>0} s.t. if x<=y then rk(x)<=rk(y). This provides the CC structure with ability to consider incidence matrices between rank i and rank j cells.
In many applications however, the rank function is not strictly needed. The rank function makes CC somewhat expensive to compute (add_cell). Instead, one may define a more general class of complexes by defining a color function c: X->Z^{>0} with no condition as before. Such a complex is strictly more general than CCs and allows for faster insertion time. c can be thought of as a 'color' for hyperedges and one may define incidence between two cells of different colors
Write a class ColorHyperGraph with the basic functionalities.
import toponetx as tnx
CX = tnx.CellComplex()
CX._insert_cell((1, 2, 3, 4), color="red")
CX._insert_cell((1, 2, 3, 4), color="green")
CX._insert_cell((1, 2, 3, 4), shape="square")
CX._insert_cell((2, 3, 4, 5))
CV = CX._cells
print(CV.__contains__((1,2,3,4)),"---", CV.__contains__((2,3,4,1)))
set_1 = {1, 2, 3, 4}
set_2 = {2, 1, 3, 4}
print(CV.__contains__(set_1),"---", CV.__contains__(set_2))
The above would print
True --- False
True --- True
The organization of the *View Python classes in not clear:
NodeView
in classes/node.py
but other *Views are in reportview.py? Note that NodeView
is within reportviews in NetworkX.reportview.py
whereas the sister file in networkx
package is reportviews.py (with an s)?We need to clean this.
The strategy behind TopoNetX
Api and organization is to follow the NetworkX
package as closely as possible: this gives useful guidelines and will be easier for users to get intuition about TopoNetX.
See the files:
Or look for "View" from GitHub search bar.
reportview.py
to reportviews.py
-- unless there was a good reason to keep it singular?NodeView
to reportviews.py
-- unless there was a good reason to have it in its own file?TopoNetX for the most part does not use type hints for function parameters and return values, which would be usable by static type checkers such as mypy (in TopoNetX itself but also when consumers want to check their code for errors).
However, recently, some comments by @josefhoppe added a few type hints to the source code, e.g., in #94. I think there should be a clear decision whether to add type hints or not:
Either way, the decision should be made clear in the contribution guidelines and enforced for new pull requests.
Networkx has dedicated methods for the set and get attrs of all parts of the graph (nodes and edges).
To be consistent with the mother package, should we have these as well as the existing functions (set and get cell attrs) ?
In both cases we call the main function set and get cell attrs (but we specify the ranks)?
I think there is a value for this especially that in the mind of people coming from graph theory and not very familiar with topology, zero cells (nodes) are different from other cells. I think there is a value for also for the node case since the latter is used more frequently than any other cell dimension in practice.
Unit tests for the hyperedge class are not running.
Going back in the commit history, it seems that the issue arose when the unit test module was removed in favor of pytest. However, pytest requires the class names to start with Test, which is not the official name.
I will raise a PR with the changes.
Almost everywhere we use the built-in exceptions from the standard library (e.g., KeyError
and ValueError
). We however also ship three own exception classes that are sporadically used.
In its current form, this is not optimal. First of all this should be consistent, second and more importantly, our own exceptions only inherent from Exception
and not from the correct built-in subclasses. TopoNetXNotImplementedError
for example should inherit from the built-in NotImplementedError
.
TypeError
and not a ValueError
.Recently, #135 added a dependency for the wget
package to download files. This package was last updated in 2015 and should be considered unmaintained. As we are also already using the well-known requests
package, we should use it everywhere and drop the extra dependency.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.