
giotto-tda's Introduction


giotto-tda

giotto-tda is a high-performance topological machine learning toolbox in Python built on top of scikit-learn and is distributed under the GNU AGPLv3 license. It is part of the Giotto family of open-source projects.

Project genesis

giotto-tda is the result of a collaborative effort between L2F SA, the Laboratory for Topology and Neuroscience at EPFL, and the Institute of Reconfigurable & Embedded Digital Systems (REDS) of HEIG-VD.

License

giotto-tda is distributed under the AGPLv3 license. If you need a different distribution license, please contact the L2F team.

Documentation

Please visit https://giotto-ai.github.io/gtda-docs and navigate to the version you are interested in.

Installation

Dependencies

The latest stable version of giotto-tda requires:

  • Python (>= 3.7)
  • NumPy (>= 1.19.1)
  • SciPy (>= 1.5.0)
  • joblib (>= 0.16.0)
  • scikit-learn (>= 0.23.1)
  • pyflagser (>= 0.4.3)
  • python-igraph (>= 0.8.2)
  • plotly (>= 4.8.2)
  • ipywidgets (>= 7.5.1)

To run the examples, jupyter is required.

User installation

The simplest way to install giotto-tda is using pip:

python -m pip install -U giotto-tda

If necessary, this will also automatically install all the above dependencies. Note: we recommend upgrading pip to a recent version as the above may fail on very old versions.

Pre-release, experimental builds containing recently added features and/or bug fixes can be installed by running:

python -m pip install -U giotto-tda-nightly

The main difference between giotto-tda-nightly and the developer installation (see the section on contributing, below) is that the former is shipped with pre-compiled wheels (similarly to the stable release) and hence does not require any C++ dependencies. As the main library module is called gtda in both the stable and nightly versions, giotto-tda and giotto-tda-nightly should not be installed in the same environment.

Developer installation

Please consult the dedicated page for detailed instructions on how to build giotto-tda from sources across different platforms.

Contributing

We welcome new contributors of all experience levels. The Giotto community goals are to be helpful, welcoming, and effective. To learn more about making a contribution to giotto-tda, please consult the relevant page.

Testing

After developer installation, you can launch the test suite from outside the source directory:

pytest gtda

Citing giotto-tda

If you use giotto-tda in a scientific publication, we would appreciate citations to the following paper:

giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration, Tauzin et al, J. Mach. Learn. Res. 22.39 (2021): 1-6.

You can use the following BibTeX entry:

@article{giotto-tda,
  author  = {Guillaume Tauzin and Umberto Lupo and Lewis Tunstall and Julian Burella P\'{e}rez and Matteo Caorsi and Anibal M. Medina-Mardones and Alberto Dassatti and Kathryn Hess},
  title   = {giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {39},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v22/20-325.html}
}

Community

giotto-ai Slack workspace: https://slack.giotto.ai/

Contacts

[email protected]

giotto-tda's People

Contributors

aldopod92, alexbacce, algorithme, amg88, ammedmar, conet, giotto-learn, gtauzin, jacobbamberger, lewtun, matteocao, nicksale, nphilou, rballeba, reds-heig, rorondre, rth, seanlaw, ulupo, weilerp, wreise


giotto-tda's Issues

Add homology_dimensions to diagrams transformers

Description

In the current implementation of diagrams.Scaler, Amplitude and PersistenceEntropy, homology dimensions that do not appear in fit are not considered in transform.

This can lead to unexpected results.

Possible improvements

  • Documentation
  • Add homology_dimensions parameter (as done for Filtering) to transformers that might ignore some dimensions in transform

PS: Maybe I'm the only one who thinks that it's not clear enough
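To illustrate the issue, here is a schematic plain-NumPy sketch (not the library's actual code): if fit only records the homology dimensions present in the training diagrams, points in other dimensions are silently dropped at transform time.

```python
import numpy as np

# Diagrams have shape (n_samples, n_points, 3); column 2 is the dimension.
X_fit = np.array([[[0., 1., 0.], [0., 2., 0.]]])          # only dimension 0
homology_dimensions_ = np.unique(X_fit[:, :, 2])          # -> array([0.])

# At transform time, points in dimensions unseen during fit are dropped:
X_new = np.array([[[0., 1., 0.], [1., 3., 1.]]])          # dimensions 0 and 1
mask = np.isin(X_new[0, :, 2], homology_dimensions_)
X_t = X_new[:, mask, :]
print(X_t.shape)  # (1, 1, 3): the dimension-1 point was silently ignored
```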

Make azure-ci and binder hidden

Description

The azure-ci and binder configuration files should be hidden. In the case of binder, we currently use an environment.yml file in the root directory. This can be confused with an environment file gathering the library dependencies. As a result:

  • A .binder should be created to hide the environment.yml file
  • The azure-ci folder should be renamed .azure-ci to be hidden

window_width is a bad name in SlidingWindow

Description

Following user feedback, it seems that the name window_width does not convey the meaning of the associated parameter in SlidingWindow sufficiently clearly. Another drawback is that it is awkward to document as the number of points in a window of "width" n is n+1. We might prefer, instead, to ask the user to pass the number of points in each window directly as a parameter.
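The off-by-one that makes the name awkward can be sketched in plain Python (illustrative only, not the library's implementation): a window of "width" w, measured in index units, contains w + 1 points.

```python
import numpy as np

ts = np.arange(10)  # a toy univariate time series
width = 3           # "width" in index units

# Each window spans indices i, i+1, ..., i+width: that is width + 1 points.
windows = np.stack([ts[i:i + width + 1] for i in range(len(ts) - width)])
print(windows.shape)  # (7, 4): 7 windows of width + 1 = 4 points each
```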

Fix and expand TransitionGraph for v0.1.0

Polish the examples

We need to upload the data of the examples to OpenML and make sure they all run smoothly on Binder.

Input check for geodesic_distance in graphs submodule

Description

A check_graph function is needed to validate the input adjacency matrices in geodesic_distance.py; tests checking the behaviour of this function are also needed.


Refactoring of azure-pipelines.yml

From @rth:

The current azure-pipelines.yml could be refactored a bit to use templates and standalone scripts as done e.g. in scikit-learn [1], [2]. That would in particular allow factorizing Linux and Mac OS setups somewhat.

If the current setup works, I'm not sure there is an immediate benefit though.

Add GitHub files to prep for v0.1.0

A number of GitHub files should be added/modified for the v0.1.0:

  • README.rst
  • CONTRIBUTING.rst
  • GOVERNANCE.rst
  • DEED_OF_CONTRIBUTION.rst
  • ISSUE_TEMPLATE.rst
  • LICENSE
  • CODE_AUTHORS
  • CODE_OF_CONDUCT.rst
  • PULL_REQUEST_TEMPLATE.rst
  • RELEASE.md

A description of the repo should also be added.

Amplitude arrays incorrectly filled in diagrams.Amplitude

Description

The transform method in diagrams.Amplitude calls the _parallel_pairwise utility function for joblib-parallel computation of amplitudes of persistence diagrams. This function first calculates all amplitudes for different homology dimensions and across different slices of the input array of diagrams:

https://github.com/giotto-learn/giotto-learn/blob/a5f2024375fd24c84614b9b77ffc39e79cb71cac/giotto/diagrams/_metrics.py#L204

Then, we have to carefully arrange all amplitudes into an array of the correct shape:

https://github.com/giotto-learn/giotto-learn/blob/a5f2024375fd24c84614b9b77ffc39e79cb71cac/giotto/diagrams/_metrics.py#L210

Unfortunately, the final result at present is incorrect, primarily because the top for in the call to Parallel is on the slices, while it should be on the list of homology dimensions. Changing the order of the for loops as well as replacing the final line before return with

amplitude_arrays = np.concatenate(amplitude_arrays).reshape(len(homology_dimensions), X.shape[0]).T

would yield correct results.
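The effect of the proposed reshape-and-transpose can be checked on a toy array (hypothetical values, laid out dimension-major as the corrected outer loop over homology dimensions would produce):

```python
import numpy as np

n_samples, n_dims = 3, 2
# Flat amplitudes with the outer loop over homology dimensions:
# all samples for dimension 0 first, then all samples for dimension 1.
flat = np.array([10., 11., 12.,   # dim 0, samples 0, 1, 2
                 20., 21., 22.])  # dim 1, samples 0, 1, 2
amplitude_arrays = flat.reshape(n_dims, n_samples).T
print(amplitude_arrays)
# [[10. 20.]
#  [11. 21.]
#  [12. 22.]]  -> rows are samples, columns are homology dimensions
```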

Steps/Code to Reproduce

Create

diagrams = np.array([
    [[0, 1, 0.],
     [0, 0, 0.],
     [0, 4, 1.]],  # Expected bottleneck ampl: [sqrt(2)/2, 2*sqrt(2)] 
    
    [[0, 2, 0.],
     [0, 1, 0.],
     [0, 0, 1.]],  # Expected bottleneck ampl: [sqrt(2), 0] 
    
    [[3, 3.5, 0.],
     [0, 0, 0.],
     [5, 9, 1.]]  # Expected bottleneck ampl: [0.5*sqrt(2)/2, 2*sqrt(2)] 
])

Expected Results

The expected amplitude array when using bottleneck amplitude is

array([[0.70710678, 2.82842712],
       [1.41421356, 0.        ],
       [0.35355339, 2.82842712]])

Actual Results

The current result is

array([[0.70710678, 1.41421356],
       [2.82842712, 0.        ],
       [0.35355339, 2.82842712]])

Versions

giotto-learn: 0.1.2

fix windows wheel readme file

The twine check complains about the structuring of the README.rst file in the windows build. Needs to be fixed before being able to upload to PyPI.

Misreferenced doc version

Description

The documentation version does not match the release version.


license copyright

The copyright of the code shall remain with those who wrote it. The copyright of the project as a whole is L2F's.

Issue template

The issue template has typos. As it is handled through the GitHub dashboard, I guess no PRs are possible and the maintainers have to update it.

I was tempted to simply add it to a comment in the actual issue I was opening, but this one will be fast to close and I did not want to spare you that satisfaction :)

At the top, this
https://github.com/giotto-learn/giotto-learn/blob/master/CONTRIBUTING.md
should be
https://github.com/giotto-learn/giotto-learn/blob/master/CONTRIBUTING.rst

And at the bottom, this
import sklearn; print("giotto-Learn", giotto.__version__)
should be
import giotto; print("giotto-Learn", giotto.__version__)

fit_transform incorrectly documented throughout

Description

Because fit_transform is never explicitly defined in our transformers, its behaviour and docstrings are inherited from scikit-learn's parent classes (BaseEstimator, TransformerMixin). This yields correct behaviour but incorrect documentation e.g. when scikit-learn's shape rules are broken.

A simple solution might be to add explicit definitions for each problematic fit_transform, only so that docstrings can be manually inserted. The behaviour would just inherit from the parent classes by e.g. the use of super().
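A minimal sketch of that solution, assuming only scikit-learn's standard mixins (the transformer shown is a stand-in, not one of giotto-tda's):

```python
from sklearn.base import BaseEstimator, TransformerMixin

class ToyDiagramTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return X

    def fit_transform(self, X, y=None, **fit_params):
        """Fit to X, then transform it.

        The docstring can now describe the true input/output shapes,
        even when they break scikit-learn's conventions.
        """
        # Behaviour is unchanged: defer to the inherited implementation.
        return super().fit_transform(X, y, **fit_params)
```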

TerminatedWorkerError when calling transform on VietorisRipsPersistence

Description

When calling transform on VietorisRipsPersistence I sometimes get the following error:

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGABRT(-6)}

Steps/Code to Reproduce

The error is surprisingly hard to reproduce as it appears to depend on how much RAM is available at runtime. The best I can provide at this stage is the following snippet:

homologyDimensions = (0, 1)
persistenceDiagram = hl.VietorisRipsPersistence(metric='euclidean', max_edge_length=10,
                                                homology_dimensions=homologyDimensions,
                                                n_jobs=-1)
persistenceDiagram.fit(doc_matrix)

Diagrams = persistenceDiagram.transform(doc_matrix[:n_docs])

where doc_matrix has shape (1902, 778, 300) and takes 1775707200 bytes in memory.

Expected Results

I would expect that when n_jobs=-1, VietorisRipsPersistence would simply try to access the available cores / memory and not throw an error.

Actual Results

---------------------------------------------------------------------------
TerminatedWorkerError                     Traceback (most recent call last)
<ipython-input-40-af8c35fe8d70> in <module>
      7 persistenceDiagram.fit(doc_matrix[:n_docs])
      8 
----> 9 Diagrams = persistenceDiagram.transform(doc_matrix[:n_docs])

~/git/gw_nlp/env/lib/python3.7/site-packages/giotto/homology/point_clouds.py in transform(self, X, y)
    194 
    195         Xt = Parallel(n_jobs=self.n_jobs)(delayed(self._ripser_diagram)(X[i])
--> 196                                           for i in range(n_samples))
    197 
    198         max_n_points = {dim: max(1, np.max([Xt[i][dim].shape[0]

~/git/gw_nlp/env/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
   1014 
   1015             with self._backend.retrieval_context():
-> 1016                 self.retrieve()
   1017             # Make sure that we get a last message telling us we are done
   1018             elapsed_time = time.time() - self._start_time

~/git/gw_nlp/env/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
    906             try:
    907                 if getattr(self._backend, 'supports_timeout', False):
--> 908                     self._output.extend(job.get(timeout=self.timeout))
    909                 else:
    910                     self._output.extend(job.get())

~/git/gw_nlp/env/lib/python3.7/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    552         AsyncResults.get from multiprocessing."""
    553         try:
--> 554             return future.result(timeout=timeout)
    555         except LokyTimeoutError:
    556             raise TimeoutError()

/usr/local/anaconda3/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433             else:
    434                 raise TimeoutError()

/usr/local/anaconda3/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGABRT(-6)}

Versions

Darwin-19.0.0-x86_64-i386-64bit
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.17.3
SciPy 1.3.1
joblib 0.14.0
Scikit-Learn 0.21.3
giotto-Learn 0.1.1

Document removal of one 0-homology class with infinite lifetime in VietorisRipsPersistence

Currently, VietorisRipsPersistence's transform calls the _ripser_diagram helper function, which always removes the last persistent feature in degree 0 produced by Ripser:

https://github.com/giotto-ai/giotto-learn/blob/647dd86b91176f87a5ee69ae64b803d1e02f9630/giotto/homology/point_clouds.py#L111

This is because, in degree 0, there will always be at least one persistent feature with infinite lifetime.

However, this behaviour is not currently documented in the docs for VietorisRipsPersistence.

Core dump with VietorisRipsPersistence and joblib

Description

Core dump when calling fit_transform on VietorisRipsPersistence with n_jobs=None or 1,
TerminatedWorkerError when n_jobs=2.

Steps/Code to Reproduce

import numpy as np
from giotto.homology import VietorisRipsPersistence
VietorisRipsPersistence(n_jobs=2).fit_transform(np.array([[[0, 0], [0, 1], [1, 0], [1, 1]]]))

Expected Results

No error is thrown.

Actual Results

Illegal instruction (core dumped)

Unreportable Reason: Cannot determine path of python module joblib.externals.loky.backend.popen_loky_posix

Or if n_jobs=1:

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGILL(-4)}

Versions

Linux-4.15.0-65-generic-x86_64-with-Ubuntu-18.04-bionic
Python 3.6.8 (default, Oct 7 2019, 12:59:55)
[GCC 8.3.0]
NumPy 1.17.2
SciPy 1.3.1
Scikit-Learn 0.21.3
giotto-Learn 0.1.1

Index fixing for v0.1.0

Description

The index.rst file should be polished and updated as follows:

  1. Include all meta_transformers
  2. Include PermutationEntropy, PearsonCoefficient/Dissimilarity (#45 ) and Labeller
  3. Improve relative ordering (?)


KNeighborsGraph "available" in scikit-learn 0.22 as KNeighborsTransformer

Description

A novelty of scikit-learn 0.22 is the introduction of the KNeighborsTransformer, which implements in the case of single point clouds (rather than collections thereof) exactly what we implemented in graphs.KNeighborsGraph.

While we cannot simply drop our version and straight-up use scikit-learn's due to our insistence on collections of objects (diagrams, graphs, etc), this situation raises two points in my opinion:

  • In the very short term, we should compare our implementation to KNeighborsTransformer, to see if we missed out on some good magic.
  • In the longer term, and especially as scikit-learn's support of graph structures is likely to grow further, should we consider producing universal wrappers of scikit-learn's transformers which make them act on collections of objects? Might this be feasible using elegant decorators?
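A rough sketch of what such a universal wrapper could look like (names and design are hypothetical, not an existing giotto-tda API):

```python
def per_object(transformer_cls):
    """Lift a transformer acting on single objects to one acting on
    collections, by applying a fresh instance to each entry."""
    class CollectionTransformer:
        def __init__(self, **params):
            self._params = params

        def fit(self, X, y=None):
            return self

        def transform(self, X, y=None):
            return [transformer_cls(**self._params).fit_transform(x)
                    for x in X]

        def fit_transform(self, X, y=None):
            return self.fit(X, y).transform(X, y)

    return CollectionTransformer

# Toy single-object transformer standing in for e.g. KNeighborsTransformer:
class Doubler:
    def fit_transform(self, x):
        return [2 * v for v in x]

CollectionDoubler = per_object(Doubler)
print(CollectionDoubler().fit_transform([[1, 2], [3]]))  # [[2, 4], [6]]
```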

Redundant azure jobs

From @rth:

Currently in azure-pipelines.yml there are,

  • 2 Mac OS VMs
  • 2 Windows VMS

I'm not sure that it's actually necessary to build wheels twice on Mac OS and Windows.

For instance, scikit-learn ships a single wheel (for each Python version) that works for macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10 (setup here). That could be worth investigating.

And similarly, I think Windows binaries should be either backward (or maybe forward) compatible, so building on one platform could be enough. For instance, scikit-learn ships 1 wheel (per Python version) for Win 64-bit and 1 for Win 32-bit, and I haven't seen any issues about it.

upload to PyPI

The alpha release is ready: it needs to be uploaded to PyPI with the giotto-learn account.

"ValueError: kth(=<n>) out of bounds (<m>)" for large parameters of n_layers in landscapes function

Description

For sufficiently large values of the n_layers argument, the landscapes function in diagrams._metric can throw an error of the kind

ValueError: kth(=<n>) out of bounds (<m>)

where n and m are integers. This is caused by the line
https://github.com/giotto-ai/giotto-learn/blob/7e693b76e03ea422a3046fc8931f0be6b02fab64/giotto/diagrams/_metrics.py#L28
since if n_points - n_layers < -n_points the subsequent line
https://github.com/giotto-ai/giotto-learn/blob/7e693b76e03ea422a3046fc8931f0be6b02fab64/giotto/diagrams/_metrics.py#L29
will fail.
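The underlying NumPy behaviour can be reproduced directly, outside giotto-tda (a minimal repro of this class of error, assuming the failing line ultimately calls a partition routine):

```python
import numpy as np

a = np.array([1.0])  # a diagram slice with a single point (n_points = 1)
try:
    # kth outside [-n_points, n_points - 1], as happens when
    # n_points - n_layers < -n_points:
    np.partition(a, -3)
except ValueError as err:
    print(err)  # kth(=-3) out of bounds (1)
```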

Steps/Code to Reproduce

Take an ndarray diagrams such that e.g. diagrams.shape[1] is equal to 1, then the error

ValueError: kth(=<-(n-2)>) out of bounds (1)

is thrown when n_layers is set to be n.

Versions

Darwin-17.7.0-x86_64-i386-64bit
Python 3.6.8 |Anaconda, Inc.| (default, Dec 29 2018, 19:04:46)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.17.3
SciPy 1.2.1
joblib 0.14.0
Scikit-learn 0.21.3
giotto-learn 0.1.3

Building manylinux wheels

From @rth:

Currently, wheels are built on the ubuntu-16.04 image, which means that they work with that Ubuntu version and later, but probably not with other Linux distributions.

A way to make wheels that would work for any linux distribution is with manylinux wheels https://github.com/pypa/manylinux.

Traditionally, manylinux1 wheels were built with CentOS 5, but the more recent manylinux2010 standard (CentOS 6) was recently adopted. Docker images are available and could be used in that same Ubuntu VM.

Hyperlinks in docs return 404 errors

Description

Some of the hyperlinks in the docs point to https://giotto.ai/theory, which returns a 404 error. It would be nice to have an "under construction" page instead of this error, which confused me at first. Even better would be a functioning theory page.

Steps/Code to Reproduce

Navigate to https://docs.giotto.ai/ and click on the hyperlink of e.g. "Amplitudes of persistent diagrams ..."

Expected Results

No 404 errors

Actual Results

404 (Not Found)


_subdiagrams only returns diagrams of the last dim in homology_dimensions

The _subdiagrams method can (and currently should) take a list of homology dimensions, but will only return the diagrams of the last element in the list.

https://github.com/giotto-ai/giotto-learn/blob/cda44bd79760106185f3a2849d6b36dbfb0a5aae/giotto/diagrams/_utils.py#L17-L23

Should we remove the loop, or return an array of [subdiagrams_i, subdiagrams_j] with homology_dimensions = [i, j] as a parameter?

Also, this method can be very useful for users, so maybe we should make it public?
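One possible shape for the second option, as a hedged sketch (the real _subdiagrams operates on padded diagram collections and may need more care):

```python
import numpy as np

def subdiagrams(X, homology_dimensions):
    """Return one sub-array of diagrams per requested dimension, instead
    of overwriting the result at each loop iteration."""
    # X has shape (n_samples, n_points, 3); column 2 holds the dimension,
    # assumed identical across samples (as in padded diagram collections).
    return [X[:, X[0, :, 2] == dim, :] for dim in homology_dimensions]

X = np.array([[[0., 1., 0.], [0., 2., 0.], [1., 3., 1.]]])
subs = subdiagrams(X, homology_dimensions=[0, 1])
print([s.shape for s in subs])  # [(1, 2, 3), (1, 1, 3)]
```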

Fix PearsonCorrelation problems for v0.1.0

Description

There are some important issues with the current PearsonCorrelation transformer:

  1. It acts on single two-dimensional arrays (i.e. single multi-variate time series) instead of on collections of such (as would be produced e.g. by a sliding window procedure)
  2. It should be renamed to e.g. PearsonDissimilarity as the output is not the correlation
  3. 'positive_definite' does not describe the difference between the two options well: in either case the entries are greater than 0
  4. Tests should be rewritten in view of 1, 2 and 3
  5. Docstrings should be written (#32 )


infinity_values has no effect

Description

Steps/Code to Reproduce

homology_dimensions = [0,1]
point_cloud=np.array([[2994.15145385, 2994.6898423 ],
[2994.6898423 , 2995.25011228],
[2995.25011228, 2995.81086442],
[2995.81086442, 2996.34742252],
[2996.34742252, 2996.83255758],
[2996.83255758, 2997.23764226],
[2997.23764226, 2997.53431672],
[2997.53431672, 2997.69673226],
[2997.69673226, 2997.70440466],
[2997.70440466, 2997.545644 ],
[2997.545644 , 2997.22142019],
[2997.22142019, 2996.7493565 ],
[2996.7493565 , 2996.16730176],
[2996.16730176, 2995.53560389],
[2995.53560389, 2994.93679867],
[2994.93679867, 2994.4709825 ],
[2994.4709825 , 2994.2447852 ],
[2994.2447852 , 2994.35188093],
[2994.35188093, 2994.84392356],
[2994.84392356, 2995.69362524],
[2995.69362524, 2996.75785686],
[2996.75785686, 2997.75975786],
[2997.75975786, 2998.32505364],
[2998.32505364, 2998.12240867],
[2998.12240867, 2997.143359 ],
[2997.143359 , 2996.04017334],
[2996.04017334, 2996.09188423],
[2996.09188423, 2997.86034233],
[2997.86034233, 2998.66986234],
[2998.66986234, 2997.01000126],
[2997.01000126, 2998.64346342]])
VR = hl.VietorisRipsPersistence(homology_dimensions=[0,1],infinity_values=1000, n_jobs=-1)
diag_1=VR.fit_transform([point_cloud])
diag_1

Expected Results

array([[[0. , 0.15894595, 0. ],
[0. , 0.16259666, 0. ],
[0. , 0.25026929, 0. ],
[0. , 0.25280833, 0. ],
[0. , 0.33822262, 0. ],
[0. , 0.34334007, 0. ],
[0. , 0.35061181, 0. ],
[0. , 0.35918 , 0. ],
[0. , 0.36100698, 0. ],
[0. , 0.41400436, 0. ],
[0. , 0.42276978, 0. ],
[0. , 0.42685053, 0. ],
[0. , 0.46951547, 0. ],
[0. , 0.50210488, 0. ],
[0. , 0.51783192, 0. ],
[0. , 0.52743238, 0. ],
[0. , 0.5284006 , 0. ],
[0. , 0.56346232, 0. ],
[0. , 0.57062197, 0. ],
[0. , 0.57268244, 0. ],
[0. , 0.58013839, 0. ],
[0. , 0.60052001, 0. ],
[0. , 0.62384081, 0. ],
[0. , 0.63202029, 0. ],
[0. , 0.65805095, 0. ],
[0. , 0.67352563, 0. ],
[0. , 0.71318197, 0. ],
[0. , 0.75865167, 0. ],
[0. , 0.77610409, 0. ],
[0. , 0.81456721, 0. ],
------------------->[0. , 1000, 0. ],
[0.91897351, 1.08222294, 1. ],
[0.87040788, 0.95760375, 1. ],
[0.77724916, 0.80187225, 1. ],
[0.71346641, 0.87794894, 1. ],
[0.62420917, 0.7132709 , 1. ],
[0.43905503, 0.46689981, 1. ]]])

Actual Results

array([[[0. , 0.15894595, 0. ],
[0. , 0.16259666, 0. ],
[0. , 0.25026929, 0. ],
[0. , 0.25280833, 0. ],
[0. , 0.33822262, 0. ],
[0. , 0.34334007, 0. ],
[0. , 0.35061181, 0. ],
[0. , 0.35918 , 0. ],
[0. , 0.36100698, 0. ],
[0. , 0.41400436, 0. ],
[0. , 0.42276978, 0. ],
[0. , 0.42685053, 0. ],
[0. , 0.46951547, 0. ],
[0. , 0.50210488, 0. ],
[0. , 0.51783192, 0. ],
[0. , 0.52743238, 0. ],
[0. , 0.5284006 , 0. ],
[0. , 0.56346232, 0. ],
[0. , 0.57062197, 0. ],
[0. , 0.57268244, 0. ],
[0. , 0.58013839, 0. ],
[0. , 0.60052001, 0. ],
[0. , 0.62384081, 0. ],
[0. , 0.63202029, 0. ],
[0. , 0.65805095, 0. ],
[0. , 0.67352563, 0. ],
[0. , 0.71318197, 0. ],
[0. , 0.75865167, 0. ],
[0. , 0.77610409, 0. ],
[0. , 0.81456721, 0. ],
[0.91897351, 1.08222294, 1. ],
[0.87040788, 0.95760375, 1. ],
[0.77724916, 0.80187225, 1. ],
[0.71346641, 0.87794894, 1. ],
[0.62420917, 0.7132709 , 1. ],
[0.43905503, 0.46689981, 1. ]]])

Versions

Darwin-19.0.0-x86_64-i386-64bit
Python 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:07:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.17.3
SciPy 1.3.1
joblib 0.14.0
Scikit-Learn 0.21.3
giotto-Learn 0.1.3

Lorenz Attractor Notebook - FileNotFoundError Dataset 74800

Description

It appears that dataset 74800 from OpenML, used in the Lorenz Attractor example notebook, does not exist.

Steps/Code to Reproduce

Run the third code cell in the example notebook https://hub.gke.mybinder.org/user/giotto-learn-giotto-learn-fogohktg/notebooks/examples/Lorenz_Attractor.ipynb

from openml.datasets.functions import get_dataset
point_cloud = get_dataset(74800).get_data(dataset_format='array')[0]

Expected Results

No error is thrown

Actual Results

FileNotFoundError: [Errno 2] No such file or directory: '/home/jovyan/.openml/cache/org/openml/test/datasets/74800/description.xml'

During handling of the above exception, another exception occurred:

OpenMLCacheException Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/openml/datasets/functions.py in _get_dataset_description(did_cache_dir, dataset_id)
780 try:
--> 781 return _get_cached_dataset_description(dataset_id)
782 except OpenMLCacheException:

/srv/conda/envs/notebook/lib/python3.7/site-packages/openml/datasets/functions.py in _get_cached_dataset_description(dataset_id)
117 "Dataset description for dataset id %d not "
--> 118 "cached" % dataset_id)
119

OpenMLCacheException: Dataset description for dataset id 74800 not cached

During handling of the above exception, another exception occurred:

OpenMLServerException Traceback (most recent call last)
in
----> 1 point_cloud = get_dataset(74800).get_data(dataset_format='array')[0]

/srv/conda/envs/notebook/lib/python3.7/site-packages/openml/datasets/functions.py in get_dataset(dataset_id, download_data, version, error_if_multiple)
471 raise OpenMLPrivateDatasetError(e.message) from None
472 else:
--> 473 raise e
474 finally:
475 if remove_dataset_cache:

/srv/conda/envs/notebook/lib/python3.7/site-packages/openml/datasets/functions.py in get_dataset(dataset_id, download_data, version, error_if_multiple)
458 try:
459 remove_dataset_cache = True
--> 460 description = _get_dataset_description(did_cache_dir, dataset_id)
461 features = _get_dataset_features(did_cache_dir, dataset_id)
462 qualities = _get_dataset_qualities(did_cache_dir, dataset_id)

/srv/conda/envs/notebook/lib/python3.7/site-packages/openml/datasets/functions.py in _get_dataset_description(did_cache_dir, dataset_id)
782 except OpenMLCacheException:
783 url_extension = "data/{}".format(dataset_id)
--> 784 dataset_xml = openml._api_calls._perform_api_call(url_extension, 'get')
785 with io.open(description_file, "w", encoding='utf8') as fh:
786 fh.write(dataset_xml)

/srv/conda/envs/notebook/lib/python3.7/site-packages/openml/_api_calls.py in _perform_api_call(call, request_method, data, file_elements)
49 'are present')
50 return _read_url_files(url, data=data, file_elements=file_elements)
---> 51 return _read_url(url, request_method, data)
52
53

/srv/conda/envs/notebook/lib/python3.7/site-packages/openml/_api_calls.py in _read_url(url, request_method, data)
96 response = send_request(request_method=request_method, url=url, data=data)
97 if response.status_code != 200:
---> 98 raise _parse_server_exception(response, url)
99 if 'Content-Encoding' not in response.headers or
100 response.headers['Content-Encoding'] != 'gzip':

OpenMLServerException: https://test.openml.org/api/v1/xml/data/74800 returned code 111: Unknown dataset

Versions

Linux-4.14.138+-x86_64-with-debian-buster-sid
Python 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0]
NumPy 1.17.3
SciPy 1.3.1
joblib 0.14.0
Scikit-Learn 0.21.3
giotto-Learn 0.1.1
openml 0.9.0

Linting

The Azure pipelines run the flake8 command to check code linting. There are some linting problems:

./giotto/__init__.py:1:48: W291 trailing whitespace
./giotto/base.py:1:80: E501 line too long (90 > 79 characters)
./giotto/homology/point_clouds.py:195:13: E128 continuation line under-indented for visual indent
./giotto/utils/__init__.py:1:80: E501 line too long (81 > 79 characters)

Once these linting problems are corrected, we can remove the --exit-zero flag from the flake8 command and stop the pipelines when linting fails. Proper linting has to be a mandatory check for all PRs.

Remove football-tda from examples

Description

My understanding is that we have decided to remove all but the simplest examples from the repo. Therefore we should remove the football-tda example since it contains new dependencies and pickled data.

Changes in git history

Please be aware that the final move from GitLab to GitHub has led us to modify the git history.

@rth, could you please refork?

Diffusion module

Create a new module implementing diffusion on simplicial complexes via the Hodge Laplacian operator.

HeatKernel sigma param should accept int type

Description

TypeError thrown with integer sigma value for HeatKernel

Steps/Code to Reproduce

hk = HeatKernel(1, n_values=100, n_jobs=-1)
hk.fit(x_whatever)

Expected Results

No TypeError

Actual Results

TypeError: Parameter sigma is of type <class 'int'> while it should be of type <class 'float'>

Versions

Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.17.3
SciPy 1.3.1
joblib 0.14.0
Scikit-Learn 0.21.3
giotto-Learn 0.1.2
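A possible fix is to validate against an abstract numeric type rather than float exactly. This is a sketch of the idea (check_sigma is a hypothetical helper, not giotto-tda's validation code):

```python
import numbers

def check_sigma(sigma):
    """Accept any real number (int, float, NumPy scalar); coerce to float."""
    if not isinstance(sigma, numbers.Real):
        raise TypeError(f"Parameter sigma is of type {type(sigma)} "
                        "while it should be a real number")
    return float(sigma)

print(check_sigma(1))    # 1.0  (int is now accepted)
print(check_sigma(0.5))  # 0.5
```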
