
giotto-tda's Introduction


giotto-tda

giotto-tda is a high-performance topological machine learning toolbox in Python built on top of scikit-learn and is distributed under the GNU AGPLv3 license. It is part of the Giotto family of open-source projects.

Project genesis

giotto-tda is the result of a collaborative effort between L2F SA, the Laboratory for Topology and Neuroscience at EPFL, and the Institute of Reconfigurable & Embedded Digital Systems (REDS) of HEIG-VD.

License

giotto-tda is distributed under the AGPLv3 license. If you need a different distribution license, please contact the L2F team.

Documentation

Please visit https://giotto-ai.github.io/gtda-docs and navigate to the version you are interested in.

Installation

Dependencies

The latest stable version of giotto-tda requires:

  • Python (>= 3.7)
  • NumPy (>= 1.19.1)
  • SciPy (>= 1.5.0)
  • joblib (>= 0.16.0)
  • scikit-learn (>= 0.23.1)
  • pyflagser (>= 0.4.3)
  • python-igraph (>= 0.8.2)
  • plotly (>= 4.8.2)
  • ipywidgets (>= 7.5.1)

To run the examples, jupyter is required.

User installation

The simplest way to install giotto-tda is using pip:

python -m pip install -U giotto-tda

If necessary, this will also automatically install all the above dependencies. Note: we recommend upgrading pip to a recent version as the above may fail on very old versions.

Pre-release, experimental builds containing recently added features and/or bug fixes can be installed by running:

python -m pip install -U giotto-tda-nightly

The main difference between giotto-tda-nightly and the developer installation (see the section on contributing, below) is that the former is shipped with pre-compiled wheels (similarly to the stable release) and hence does not require any C++ dependencies. As the main library module is called gtda in both the stable and nightly versions, giotto-tda and giotto-tda-nightly should not be installed in the same environment.

Developer installation

Please consult the dedicated page for detailed instructions on how to build giotto-tda from sources across different platforms.

Contributing

We welcome new contributors of all experience levels. The Giotto community goals are to be helpful, welcoming, and effective. To learn more about making a contribution to giotto-tda, please consult the relevant page.

Testing

After developer installation, you can launch the test suite from outside the source directory:

pytest gtda

Citing giotto-tda

If you use giotto-tda in a scientific publication, we would appreciate citations to the following paper:

giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration, Tauzin et al, J. Mach. Learn. Res. 22.39 (2021): 1-6.

You can use the following BibTeX entry:

@article{giotto-tda,
  author  = {Guillaume Tauzin and Umberto Lupo and Lewis Tunstall and Julian Burella P\'{e}rez and Matteo Caorsi and Anibal M. Medina-Mardones and Alberto Dassatti and Kathryn Hess},
  title   = {giotto-tda: A Topological Data Analysis Toolkit for Machine Learning and Data Exploration},
  journal = {Journal of Machine Learning Research},
  year    = {2021},
  volume  = {22},
  number  = {39},
  pages   = {1-6},
  url     = {http://jmlr.org/papers/v22/20-325.html}
}

Community

giotto-ai Slack workspace: https://slack.giotto.ai/

Contacts

[email protected]

giotto-tda's People

Contributors

aldopod92, alexbacce, algorithme, amg88, ammedmar, conet, giotto-learn, gtauzin, jacobbamberger, lewtun, matteocao, nicksale, nphilou, rballeba, reds-heig, rorondre, rth, seanlaw, ulupo, weilerp, wreise


giotto-tda's Issues

Add homology_dimensions to diagrams transformers

Description

In the current implementation of diagrams.Scaler, Amplitude and PersistenceEntropy, homology dimensions that do not appear in fit are not considered in transform.

This can lead to unexpected results.

Possible improvements

  • Documentation
  • Add homology_dimensions parameter (as done for Filtering) to transformers that might ignore some dimensions in transform

PS: Maybe I'm the only one who thinks that it's not clear enough
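To illustrate the issue, here is a schematic plain-NumPy sketch (not the library's actual code): if fit only records the homology dimensions present in the training diagrams, points in other dimensions are silently dropped at transform time.

```python
import numpy as np

# Diagrams have shape (n_samples, n_points, 3); column 2 is the dimension.
X_fit = np.array([[[0., 1., 0.], [0., 2., 0.]]])          # only dimension 0
homology_dimensions_ = np.unique(X_fit[:, :, 2])          # -> array([0.])

# At transform time, points in dimensions unseen during fit are dropped:
X_new = np.array([[[0., 1., 0.], [1., 3., 1.]]])          # dimensions 0 and 1
mask = np.isin(X_new[0, :, 2], homology_dimensions_)
X_t = X_new[:, mask, :]
print(X_t.shape)  # (1, 1, 3): the dimension-1 point was silently ignored
```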

Make azure-ci and binder hidden

Description

The azure-ci and binder configuration files should be hidden. In the case of binder, we currently use an environment.yml file in the root directory. This can be confused with an environment file gathering the library dependencies. As a result:

  • A .binder should be created to hide the environment.yml file
  • The azure-ci folder should be renamed .azure-ci to be hidden

window_width is a bad name in SlidingWindow

Description

Following user feedback, it seems that the name window_width does not convey the meaning of the associated parameter in SlidingWindow sufficiently clearly. Another drawback is that it is awkward to document as the number of points in a window of "width" n is n+1. We might prefer, instead, to ask the user to pass the number of points in each window directly as a parameter.
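The off-by-one that makes the name awkward can be sketched in plain Python (illustrative only, not the library's implementation): a window of "width" w, measured in index units, contains w + 1 points.

```python
import numpy as np

ts = np.arange(10)  # a toy univariate time series
width = 3           # "width" in index units

# Each window spans indices i, i+1, ..., i+width: that is width + 1 points.
windows = np.stack([ts[i:i + width + 1] for i in range(len(ts) - width)])
print(windows.shape)  # (7, 4): 7 windows of width + 1 = 4 points each
```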

Fix and expand TransitionGraph for v0.1.0

Polish the examples

We need to upload the data of the examples to OpenML and make sure they all run smoothly on Binder.

Input check for geodesic_distance in graphs submodule

Description

A check_graph function is needed to validate the input adjacency matrices in geodesic_distance.py; tests checking the behaviour of this function are also needed.


Refactoring of azure-pipelines.yml

From @rth:

The current azure-pipelines.yml could be refactored a bit to use templates and standalone scripts as done e.g. in scikit-learn [1], [2]. That would in particular allow factorizing Linux and Mac OS setups somewhat.

If the current setup works, I'm not sure there is an immediate benefit though.

Add GitHub files to prep for v0.1.0

A number of GitHub files should be added/modified for the v0.1.0:

  • README.rst
  • CONTRIBUTING.rst
  • GOVERNANCE.rst
  • DEED_OF_CONTRIBUTION.rst
  • ISSUE_TEMPLATE.rst
  • LICENSE
  • CODE_AUTHORS
  • CODE_OF_CONDUCT.rst
  • PULL_REQUEST_TEMPLATE.rst
  • RELEASE.md

A description of the repo should also be added.

Amplitude arrays incorrectly filled in diagrams.Amplitude

Description

The transform method in diagrams.Amplitude calls the _parallel_pairwise utility function for joblib-parallel computation of amplitudes of persistence diagrams. This function first calculates all amplitudes for different homology dimensions and across different slices of the input array of diagrams:

https://github.com/giotto-learn/giotto-learn/blob/a5f2024375fd24c84614b9b77ffc39e79cb71cac/giotto/diagrams/_metrics.py#L204

Then, we have to carefully arrange all amplitudes into an array of the correct shape:

https://github.com/giotto-learn/giotto-learn/blob/a5f2024375fd24c84614b9b77ffc39e79cb71cac/giotto/diagrams/_metrics.py#L210

Unfortunately, the final result at present is incorrect, primarily because the top for in the call to Parallel is on the slices, while it should be on the list of homology dimensions. Changing the order of the for loops as well as replacing the final line before return with

amplitude_arrays = np.concatenate(amplitude_arrays).reshape(len(homology_dimensions), X.shape[0]).T

would yield correct results.
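The effect of the proposed reshape-and-transpose can be checked on a toy array (hypothetical values, laid out dimension-major as the corrected outer loop over homology dimensions would produce):

```python
import numpy as np

n_samples, n_dims = 3, 2
# Flat amplitudes with the outer loop over homology dimensions:
# all samples for dimension 0 first, then all samples for dimension 1.
flat = np.array([10., 11., 12.,   # dim 0, samples 0, 1, 2
                 20., 21., 22.])  # dim 1, samples 0, 1, 2
amplitude_arrays = flat.reshape(n_dims, n_samples).T
print(amplitude_arrays)
# [[10. 20.]
#  [11. 21.]
#  [12. 22.]]  -> rows are samples, columns are homology dimensions
```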

Steps/Code to Reproduce

Create

diagrams = np.array([
    [[0, 1, 0.],
     [0, 0, 0.],
     [0, 4, 1.]],  # Expected bottleneck ampl: [sqrt(2)/2, 2*sqrt(2)] 
    
    [[0, 2, 0.],
     [0, 1, 0.],
     [0, 0, 1.]],  # Expected bottleneck ampl: [sqrt(2), 0] 
    
    [[3, 3.5, 0.],
     [0, 0, 0.],
     [5, 9, 1.]]  # Expected bottleneck ampl: [0.5*sqrt(2)/2, 2*sqrt(2)] 
])

Expected Results

The expected amplitude array when using bottleneck amplitude is

array([[0.70710678, 2.82842712],
       [1.41421356, 0.        ],
       [0.35355339, 2.82842712]])

Actual Results

The current result is

array([[0.70710678, 1.41421356],
       [2.82842712, 0.        ],
       [0.35355339, 2.82842712]])

Versions

giotto-learn: 0.1.2

fix windows wheel readme file

The twine check complains about the structuring of the README.rst file in the windows build. Needs to be fixed before being able to upload to PyPI.

Misreferenced doc version

Description

The documentation version does not match the release version.


license copyright

The copyright of the code shall remain with those who wrote it. The copyright of the project as a whole is L2F's.

Issue template

The issue template has typos. As it is handled through the GitHub dashboard, I guess no PRs are possible and the maintainers have to update it.

I was tempted to simply add it to a comment in the actual issue I was opening, but this one will be fast to close and I did not want to spare you that satisfaction :)

At the top, this
https://github.com/giotto-learn/giotto-learn/blob/master/CONTRIBUTING.md
should be
https://github.com/giotto-learn/giotto-learn/blob/master/CONTRIBUTING.rst

And at the bottom, this
import sklearn; print("giotto-Learn", giotto.__version__)
should be
import giotto; print("giotto-Learn", giotto.__version__)

fit_transform incorrectly documented throughout

Description

Because fit_transform is never explicitly defined in our transformers, its behaviour and docstrings are inherited from scikit-learn's parent classes (BaseEstimator, TransformerMixin). This yields correct behaviour but incorrect documentation e.g. when scikit-learn's shape rules are broken.

A simple solution might be to add explicit definitions for each problematic fit_transform, only so that docstrings can be manually inserted. The behaviour would just inherit from the parent classes by e.g. the use of super().
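A minimal sketch of that solution, assuming only scikit-learn's standard mixins (the transformer shown is a stand-in, not one of giotto-tda's):

```python
from sklearn.base import BaseEstimator, TransformerMixin

class ToyDiagramTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return X

    def fit_transform(self, X, y=None, **fit_params):
        """Fit to X, then transform it.

        The docstring can now describe the true input/output shapes,
        even when they break scikit-learn's conventions.
        """
        # Behaviour is unchanged: defer to the inherited implementation.
        return super().fit_transform(X, y, **fit_params)
```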

TerminatedWorkerError when calling transform on VietorisRipsPersistence

Description

When calling transform on VietorisRipsPersistence I sometimes get the following error:

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGABRT(-6)}

Steps/Code to Reproduce

The error is surprisingly hard to reproduce as it appears to depend on how much RAM is available at runtime. The best I can provide at this stage is the following snippet:

homologyDimensions = (0, 1)
persistenceDiagram = hl.VietorisRipsPersistence(metric='euclidean', max_edge_length=10,
                                                homology_dimensions=homologyDimensions,
                                                n_jobs=-1)
persistenceDiagram.fit(doc_matrix)

Diagrams = persistenceDiagram.transform(doc_matrix[:n_docs])

where doc_matrix has shape (1902, 778, 300) and takes 1775707200 bytes in memory.

Expected Results

I would expect that when n_jobs=-1, VietorisRipsPersistence would simply try to access the available cores / memory and not throw an error.

Actual Results

---------------------------------------------------------------------------
TerminatedWorkerError                     Traceback (most recent call last)
<ipython-input-40-af8c35fe8d70> in <module>
      7 persistenceDiagram.fit(doc_matrix[:n_docs])
      8 
----> 9 Diagrams = persistenceDiagram.transform(doc_matrix[:n_docs])

~/git/gw_nlp/env/lib/python3.7/site-packages/giotto/homology/point_clouds.py in transform(self, X, y)
    194 
    195         Xt = Parallel(n_jobs=self.n_jobs)(delayed(self._ripser_diagram)(X[i])
--> 196                                           for i in range(n_samples))
    197 
    198         max_n_points = {dim: max(1, np.max([Xt[i][dim].shape[0]

~/git/gw_nlp/env/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
   1014 
   1015             with self._backend.retrieval_context():
-> 1016                 self.retrieve()
   1017             # Make sure that we get a last message telling us we are done
   1018             elapsed_time = time.time() - self._start_time

~/git/gw_nlp/env/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
    906             try:
    907                 if getattr(self._backend, 'supports_timeout', False):
--> 908                     self._output.extend(job.get(timeout=self.timeout))
    909                 else:
    910                     self._output.extend(job.get())

~/git/gw_nlp/env/lib/python3.7/site-packages/joblib/_parallel_backends.py in wrap_future_result(future, timeout)
    552         AsyncResults.get from multiprocessing."""
    553         try:
--> 554             return future.result(timeout=timeout)
    555         except LokyTimeoutError:
    556             raise TimeoutError()

/usr/local/anaconda3/lib/python3.7/concurrent/futures/_base.py in result(self, timeout)
    430                 raise CancelledError()
    431             elif self._state == FINISHED:
--> 432                 return self.__get_result()
    433             else:
    434                 raise TimeoutError()

/usr/local/anaconda3/lib/python3.7/concurrent/futures/_base.py in __get_result(self)
    382     def __get_result(self):
    383         if self._exception:
--> 384             raise self._exception
    385         else:
    386             return self._result

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGABRT(-6)}

Versions

Darwin-19.0.0-x86_64-i386-64bit
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.17.3
SciPy 1.3.1
joblib 0.14.0
Scikit-Learn 0.21.3
giotto-Learn 0.1.1

Document removal of one 0-homology class with infinite lifetime in VietorisRipsPersistence

Currently, VietorisRipsPersistence's transform calls the _ripser_diagram helper function, which always removes the last persistent feature in degree 0 produced by Ripser:

https://github.com/giotto-ai/giotto-learn/blob/647dd86b91176f87a5ee69ae64b803d1e02f9630/giotto/homology/point_clouds.py#L111

This is because, in degree 0, there will always be at least one persistent feature with infinite lifetime.

However, this behaviour is not currently documented in the docs for VietorisRipsPersistence.

Core dump with VietorisRipsPersistence and joblib

Description

Core dump when calling fit_transform on VietorisRipsPersistence with n_jobs=None or 1,
TerminatedWorkerError when n_jobs=2.

Steps/Code to Reproduce

import numpy as np
from giotto.homology import VietorisRipsPersistence
VietorisRipsPersistence(n_jobs=2).fit_transform(np.array([[[0, 0], [0, 1], [1, 0], [1, 1]]]))

Expected Results

No error is thrown.

Actual Results

Illegal instruction (core dumped)

Unreportable Reason: Cannot determine path of python module joblib.externals.loky.backend.popen_loky_posix

Or if n_jobs=1:

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {SIGILL(-4)}

Versions

Linux-4.15.0-65-generic-x86_64-with-Ubuntu-18.04-bionic
Python 3.6.8 (default, Oct 7 2019, 12:59:55)
[GCC 8.3.0]
NumPy 1.17.2
SciPy 1.3.1
Scikit-Learn 0.21.3
giotto-Learn 0.1.1

Index fixing for v0.1.0

Description

The index.rst file should be polished and updated as follows:

  1. Include all meta_transformers
  2. Include PermutationEntropy, PearsonCoefficient/Dissimilarity (#45 ) and Labeller
  3. Improve relative ordering (?)


KNeighborsGraph "available" in scikit-learn 0.22 as KNeighborsTransformer

Description

A novelty of scikit-learn 0.22 is the introduction of the KNeighborsTransformer, which implements in the case of single point clouds (rather than collections thereof) exactly what we implemented in graphs.KNeighborsGraph.

While we cannot simply drop our version and straight-up use scikit-learn's due to our insistence on collections of objects (diagrams, graphs, etc), this situation raises two points in my opinion:

  • In the very short term, we should compare our implementation to KNeighborsTransformer, to see if we missed out on some good magic.
  • In the longer term, and especially as scikit-learn's support of graph structures is likely to grow further, should we consider producing universal wrappers of scikit-learn's transformers which make them act on collections of objects? Might this be feasible using elegant decorators?
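A rough sketch of what such a universal wrapper could look like (names and design are hypothetical, not an existing giotto-tda API):

```python
def per_object(transformer_cls):
    """Lift a transformer acting on single objects to one acting on
    collections, by applying a fresh instance to each entry."""
    class CollectionTransformer:
        def __init__(self, **params):
            self._params = params

        def fit(self, X, y=None):
            return self

        def transform(self, X, y=None):
            return [transformer_cls(**self._params).fit_transform(x)
                    for x in X]

        def fit_transform(self, X, y=None):
            return self.fit(X, y).transform(X, y)

    return CollectionTransformer

# Toy single-object transformer standing in for e.g. KNeighborsTransformer:
class Doubler:
    def fit_transform(self, x):
        return [2 * v for v in x]

CollectionDoubler = per_object(Doubler)
print(CollectionDoubler().fit_transform([[1, 2], [3]]))  # [[2, 4], [6]]
```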

Redundant azure jobs

From @rth:

Currently in azure-pipelines.yml there are,

  • 2 Mac OS VMs
  • 2 Windows VMS

I'm not sure that it's actually necessary to build wheels twice on Mac OS and Windows.

For instance, scikit-learn ships a single wheel (for each Python version) that works for macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10 (setup here). That could be worth investigating.

And similarly, I think Windows binaries should be either backward (or maybe forward) compatible, so building on one platform could be enough. For instance, scikit-learn ships 1 wheel (per Python version) for Win 64-bit and 1 for Win 32-bit, and I haven't seen any issues about it.

upload to PyPI

The alpha release is ready: it needs to be uploaded to PyPI with the giotto-learn account.

"ValueError: kth(=<n>) out of bounds (<m>)" for large parameters of n_layers in landscapes function

Description

For sufficiently large values of the n_layers argument, the landscapes function in diagrams._metric can throw an error of the kind

ValueError: kth(=<n>) out of bounds (<m>)

where n and m are integers. This is caused by the line
https://github.com/giotto-ai/giotto-learn/blob/7e693b76e03ea422a3046fc8931f0be6b02fab64/giotto/diagrams/_metrics.py#L28
since if n_points - n_layers < -n_points the subsequent line
https://github.com/giotto-ai/giotto-learn/blob/7e693b76e03ea422a3046fc8931f0be6b02fab64/giotto/diagrams/_metrics.py#L29
will fail.
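The underlying NumPy behaviour can be reproduced directly, outside giotto-tda (a minimal repro of this class of error, assuming the failing line ultimately calls a partition routine):

```python
import numpy as np

a = np.array([1.0])  # a diagram slice with a single point (n_points = 1)
try:
    # kth outside [-n_points, n_points - 1], as happens when
    # n_points - n_layers < -n_points:
    np.partition(a, -3)
except ValueError as err:
    print(err)  # kth(=-3) out of bounds (1)
```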

Steps/Code to Reproduce

Take an ndarray diagrams such that e.g. diagrams.shape[1] is equal to 1, then the error

ValueError: kth(=<-(n-2)>) out of bounds (1)

is thrown when n_layers is set to be n.

Versions

Darwin-17.7.0-x86_64-i386-64bit
Python 3.6.8 |Anaconda, Inc.| (default, Dec 29 2018, 19:04:46)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.17.3
SciPy 1.2.1
joblib 0.14.0
Scikit-learn 0.21.3
giotto-learn 0.1.3

Building manylinux wheels

From @rth:

Currently, wheels are built on the ubuntu-16.04 image, which means that they work with that Ubuntu version and later, but probably not with other Linux distributions.

A way to make wheels that would work for any linux distribution is with manylinux wheels https://github.com/pypa/manylinux.

Traditionally, manylinux1 wheels were built with CentOS 5, but the more recent manylinux2010 standard (CentOS 6) was recently adopted. Docker images are available and could be used in that same Ubuntu VM.

Hyperlinks in docs return 404 errors

Description

Some of the hyperlinks in the docs point to https://giotto.ai/theory, which returns a 404 error. It would be nice to have an "under construction" page instead of this error, which confused me at first. Even better would be a functioning theory page.

Steps/Code to Reproduce

Navigate to https://docs.giotto.ai/ and click on the hyperlink of e.g. "Amplitudes of persistent diagrams ..."

Expected Results

No 404 errors

Actual Results

404 (Not Found)


_subdiagrams only returns diagrams of the last dim in homology_dimensions

The _subdiagrams method can (and currently should) take a list of homology dimensions, but will only return the diagrams of the last element in the list.

https://github.com/giotto-ai/giotto-learn/blob/cda44bd79760106185f3a2849d6b36dbfb0a5aae/giotto/diagrams/_utils.py#L17-L23

Should we remove the loop, or return an array of [subdiagrams_i, subdiagrams_j] with homology_dimensions = [i, j] as a parameter?

Also, this method can be very useful for users, so maybe we should make it public?
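One possible shape for the second option, as a hedged sketch (the real _subdiagrams operates on padded diagram collections and may need more care):

```python
import numpy as np

def subdiagrams(X, homology_dimensions):
    """Return one sub-array of diagrams per requested dimension, instead
    of overwriting the result at each loop iteration."""
    # X has shape (n_samples, n_points, 3); column 2 holds the dimension,
    # assumed identical across samples (as in padded diagram collections).
    return [X[:, X[0, :, 2] == dim, :] for dim in homology_dimensions]

X = np.array([[[0., 1., 0.], [0., 2., 0.], [1., 3., 1.]]])
subs = subdiagrams(X, homology_dimensions=[0, 1])
print([s.shape for s in subs])  # [(1, 2, 3), (1, 1, 3)]
```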

Fix PearsonCorrelation problems for v0.1.0

Description

There are some important issues with the current PearsonCorrelation transformer:

  1. It acts on single two-dimensional arrays (i.e. single multi-variate time series) instead of on collections of such (as would be produced e.g. by a sliding window procedure)
  2. It should be renamed to e.g. PearsonDissimilarity as the output is not the correlation
  3. 'positive_definite' does not describe the difference between the two options well: in either case the entries are greater than 0
  4. Tests should be rewritten in view of 1, 2 and 3
  5. Docstrings should be written (#32 )


infinity_values has no effect

Description

Steps/Code to Reproduce

homology_dimensions = [0,1]
point_cloud=np.array([[2994.15145385, 2994.6898423 ],
[2994.6898423 , 2995.25011228],
[2995.25011228, 2995.81086442],
[2995.81086442, 2996.34742252],
[2996.34742252, 2996.83255758],
[2996.83255758, 2997.23764226],
[2997.23764226, 2997.53431672],
[2997.53431672, 2997.69673226],
[2997.69673226, 2997.70440466],
[2997.70440466, 2997.545644 ],
[2997.545644 , 2997.22142019],
[2997.22142019, 2996.7493565 ],
[2996.7493565 , 2996.16730176],
[2996.16730176, 2995.53560389],
[2995.53560389, 2994.93679867],
[2994.93679867, 2994.4709825 ],
[2994.4709825 , 2994.2447852 ],
[2994.2447852 , 2994.35188093],
[2994.35188093, 2994.84392356],
[2994.84392356, 2995.69362524],
[2995.69362524, 2996.75785686],
[2996.75785686, 2997.75975786],
[2997.75975786, 2998.32505364],
[2998.32505364, 2998.12240867],
[2998.12240867, 2997.143359 ],
[2997.143359 , 2996.04017334],
[2996.04017334, 2996.09188423],
[2996.09188423, 2997.86034233],
[2997.86034233, 2998.66986234],
[2998.66986234, 2997.01000126],
[2997.01000126, 2998.64346342]])
VR = hl.VietorisRipsPersistence(homology_dimensions=[0,1],infinity_values=1000, n_jobs=-1)
diag_1=VR.fit_transform([point_cloud])
diag_1

Expected Results

array([[[0. , 0.15894595, 0. ],
[0. , 0.16259666, 0. ],
[0. , 0.25026929, 0. ],
[0. , 0.25280833, 0. ],
[0. , 0.33822262, 0. ],
[0. , 0.34334007, 0. ],
[0. , 0.35061181, 0. ],
[0. , 0.35918 , 0. ],
[0. , 0.36100698, 0. ],
[0. , 0.41400436, 0. ],
[0. , 0.42276978, 0. ],
[0. , 0.42685053, 0. ],
[0. , 0.46951547, 0. ],
[0. , 0.50210488, 0. ],
[0. , 0.51783192, 0. ],
[0. , 0.52743238, 0. ],
[0. , 0.5284006 , 0. ],
[0. , 0.56346232, 0. ],
[0. , 0.57062197, 0. ],
[0. , 0.57268244, 0. ],
[0. , 0.58013839, 0. ],
[0. , 0.60052001, 0. ],
[0. , 0.62384081, 0. ],
[0. , 0.63202029, 0. ],
[0. , 0.65805095, 0. ],
[0. , 0.67352563, 0. ],
[0. , 0.71318197, 0. ],
[0. , 0.75865167, 0. ],
[0. , 0.77610409, 0. ],
[0. , 0.81456721, 0. ],
------------------->[0. , 1000, 0. ],
[0.91897351, 1.08222294, 1. ],
[0.87040788, 0.95760375, 1. ],
[0.77724916, 0.80187225, 1. ],
[0.71346641, 0.87794894, 1. ],
[0.62420917, 0.7132709 , 1. ],
[0.43905503, 0.46689981, 1. ]]])

Actual Results

array([[[0. , 0.15894595, 0. ],
[0. , 0.16259666, 0. ],
[0. , 0.25026929, 0. ],
[0. , 0.25280833, 0. ],
[0. , 0.33822262, 0. ],
[0. , 0.34334007, 0. ],
[0. , 0.35061181, 0. ],
[0. , 0.35918 , 0. ],
[0. , 0.36100698, 0. ],
[0. , 0.41400436, 0. ],
[0. , 0.42276978, 0. ],
[0. , 0.42685053, 0. ],
[0. , 0.46951547, 0. ],
[0. , 0.50210488, 0. ],
[0. , 0.51783192, 0. ],
[0. , 0.52743238, 0. ],
[0. , 0.5284006 , 0. ],
[0. , 0.56346232, 0. ],
[0. , 0.57062197, 0. ],
[0. , 0.57268244, 0. ],
[0. , 0.58013839, 0. ],
[0. , 0.60052001, 0. ],
[0. , 0.62384081, 0. ],
[0. , 0.63202029, 0. ],
[0. , 0.65805095, 0. ],
[0. , 0.67352563, 0. ],
[0. , 0.71318197, 0. ],
[0. , 0.75865167, 0. ],
[0. , 0.77610409, 0. ],
[0. , 0.81456721, 0. ],
[0.91897351, 1.08222294, 1. ],
[0.87040788, 0.95760375, 1. ],
[0.77724916, 0.80187225, 1. ],
[0.71346641, 0.87794894, 1. ],
[0.62420917, 0.7132709 , 1. ],
[0.43905503, 0.46689981, 1. ]]])

Versions

Darwin-19.0.0-x86_64-i386-64bit
Python 3.6.7 | packaged by conda-forge | (default, Jul 2 2019, 02:07:37)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.17.3
SciPy 1.3.1
joblib 0.14.0
Scikit-Learn 0.21.3
giotto-Learn 0.1.3

Lorenz Attractor Notebook - FileNotFoundError Dataset 74800

Description

It appears that dataset 74800 from OpenML, used in the Lorenz Attractor example notebook, does not exist.

Steps/Code to Reproduce

Run the third code cell in the example notebook https://hub.gke.mybinder.org/user/giotto-learn-giotto-learn-fogohktg/notebooks/examples/Lorenz_Attractor.ipynb

from openml.datasets.functions import get_dataset
point_cloud = get_dataset(74800).get_data(dataset_format='array')[0]

Expected Results

No error is thrown

Actual Results

FileNotFoundError: [Errno 2] No such file or directory: '/home/jovyan/.openml/cache/org/openml/test/datasets/74800/description.xml'

During handling of the above exception, another exception occurred:

OpenMLCacheException Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/openml/datasets/functions.py in _get_dataset_description(did_cache_dir, dataset_id)
780 try:
--> 781 return _get_cached_dataset_description(dataset_id)
782 except OpenMLCacheException:

/srv/conda/envs/notebook/lib/python3.7/site-packages/openml/datasets/functions.py in _get_cached_dataset_description(dataset_id)
117 "Dataset description for dataset id %d not "
--> 118 "cached" % dataset_id)
119

OpenMLCacheException: Dataset description for dataset id 74800 not cached

During handling of the above exception, another exception occurred:

OpenMLServerException Traceback (most recent call last)
in
----> 1 point_cloud = get_dataset(74800).get_data(dataset_format='array')[0]

/srv/conda/envs/notebook/lib/python3.7/site-packages/openml/datasets/functions.py in get_dataset(dataset_id, download_data, version, error_if_multiple)
471 raise OpenMLPrivateDatasetError(e.message) from None
472 else:
--> 473 raise e
474 finally:
475 if remove_dataset_cache:

/srv/conda/envs/notebook/lib/python3.7/site-packages/openml/datasets/functions.py in get_dataset(dataset_id, download_data, version, error_if_multiple)
458 try:
459 remove_dataset_cache = True
--> 460 description = _get_dataset_description(did_cache_dir, dataset_id)
461 features = _get_dataset_features(did_cache_dir, dataset_id)
462 qualities = _get_dataset_qualities(did_cache_dir, dataset_id)

/srv/conda/envs/notebook/lib/python3.7/site-packages/openml/datasets/functions.py in _get_dataset_description(did_cache_dir, dataset_id)
782 except OpenMLCacheException:
783 url_extension = "data/{}".format(dataset_id)
--> 784 dataset_xml = openml._api_calls._perform_api_call(url_extension, 'get')
785 with io.open(description_file, "w", encoding='utf8') as fh:
786 fh.write(dataset_xml)

/srv/conda/envs/notebook/lib/python3.7/site-packages/openml/_api_calls.py in _perform_api_call(call, request_method, data, file_elements)
49 'are present')
50 return _read_url_files(url, data=data, file_elements=file_elements)
---> 51 return _read_url(url, request_method, data)
52
53

/srv/conda/envs/notebook/lib/python3.7/site-packages/openml/_api_calls.py in _read_url(url, request_method, data)
96 response = send_request(request_method=request_method, url=url, data=data)
97 if response.status_code != 200:
---> 98 raise _parse_server_exception(response, url)
99 if 'Content-Encoding' not in response.headers or
100 response.headers['Content-Encoding'] != 'gzip':

OpenMLServerException: https://test.openml.org/api/v1/xml/data/74800 returned code 111: Unknown dataset

Versions

Linux-4.14.138+-x86_64-with-debian-buster-sid
Python 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0]
NumPy 1.17.3
SciPy 1.3.1
joblib 0.14.0
Scikit-Learn 0.21.3
giotto-Learn 0.1.1
openml 0.9.0

Linting

The Azure pipelines run the flake8 command to check code linting. There are some linting problems:

./giotto/__init__.py:1:48: W291 trailing whitespace
./giotto/base.py:1:80: E501 line too long (90 > 79 characters)
./giotto/homology/point_clouds.py:195:13: E128 continuation line under-indented for visual indent
./giotto/utils/__init__.py:1:80: E501 line too long (81 > 79 characters)

Once these linting problems are corrected, we can remove the --exit-zero flag from the flake8 command and stop the pipelines when linting fails. Proper linting has to be a mandatory check for all PRs.

Remove football-tda from examples

Description

My understanding is that we have decided to remove all but the simplest examples from the repo. Therefore we should remove the football-tda example since it contains new dependencies and pickled data.

Changes in git history

Please be aware that the final move from GitLab to GitHub has led us to modify the git history.

@rth, could you please refork?

Diffusion module

Create a new module implementing diffusion on simplicial complexes via the Hodge Laplacian operator.

HeatKernel sigma param should accept int type

Description

TypeError thrown with integer sigma value for HeatKernel

Steps/Code to Reproduce

hk = HeatKernel(1, n_values=100, n_jobs=-1)
hk.fit(x_whatever)

Expected Results

No TypeError

Actual Results

TypeError: Parameter sigma is of type <class 'int'> while it should be of type <class 'float'>

Versions

Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.17.3
SciPy 1.3.1
joblib 0.14.0
Scikit-Learn 0.21.3
giotto-Learn 0.1.2
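A possible fix is to validate against an abstract numeric type rather than float exactly. This is a sketch of the idea (check_sigma is a hypothetical helper, not giotto-tda's validation code):

```python
import numbers

def check_sigma(sigma):
    """Accept any real number (int, float, NumPy scalar); coerce to float."""
    if not isinstance(sigma, numbers.Real):
        raise TypeError(f"Parameter sigma is of type {type(sigma)} "
                        "while it should be a real number")
    return float(sigma)

print(check_sigma(1))    # 1.0  (int is now accepted)
print(check_sigma(0.5))  # 0.5
```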
