Some Important queries about kepler-mapper (closed)

scikit-tda commented on May 17, 2024

Some Important queries

Comments (13)

MLWave commented on May 17, 2024 1

The best way is to one-hot encode your categories. If the cardinality is too high or there are many categorical variables, look into neural network embeddings as a preprocessing step, or a sparse Hamming distance.
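As a minimal sketch of the one-hot encoding step, assuming a single categorical column next to one numeric column (the data and the helper name `one_hot` are made up for illustration; in practice a ready-made encoder from pandas or scikit-learn would be the usual choice):

```python
import numpy as np

def one_hot(values):
    """Return the sorted categories and a 0/1 indicator matrix, one column per category."""
    categories, codes = np.unique(values, return_inverse=True)
    encoded = np.eye(len(categories), dtype=float)[codes]
    return categories, encoded

# Made-up example data: one categorical and one numeric feature.
colors = np.array(["red", "green", "red", "blue"])
numeric = np.array([[1.0], [2.0], [3.0], [4.0]])

categories, encoded = one_hot(colors)

# Stack numeric and encoded columns into the vector data that a mapper would consume.
X = np.hstack([numeric, encoded])
print(categories)
print(X)
```

Each row ends up with exactly one 1 among the indicator columns, so the result is plain vector data that can be passed on like any other numeric array.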
The distance function is normally set on the clustering algorithm. For instance, if you want to use cosine distance, you could set:

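import sklearn.cluster
# linkage must not be 'ward' (the default) when the metric is not euclidean.
# Note: scikit-learn >= 1.2 names this parameter metric= instead of affinity=.
graph = mapper.map(projected_X,
    inverse_X,
    clusterer=sklearn.cluster.AgglomerativeClustering(n_clusters=3,
        affinity='cosine',
        linkage='average'))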
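To make concrete what a cosine metric measures, here is a small sketch of the cosine distance between two vectors. The function name `cosine_distance` is only illustrative and is not part of KeplerMapper or scikit-learn:

```python
import numpy as np

def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v (both must be non-zero)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

orthogonal = cosine_distance([1, 0], [0, 1])   # vectors at a right angle
parallel = cosine_distance([1, 2], [2, 4])     # same direction, different length
print(orthogonal, parallel)
```

The distance depends only on direction, not on vector length, which is why it is often preferred over Euclidean distance for sparse or one-hot encoded data.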

KeplerMapper operates mostly directly on vector data, not on distance matrices.

But you could use a distance matrix if you wanted to. This may not scale to huge datasets, depending on the clusterer used and the number of intervals, and it is theoretically a bit odd here, because you apply Euclidean distance to the rows of a Pearson distance matrix:

# Project by L2-norm on squared Pearson distance matrix
X_projected = mapper.fit_transform(X_inverse,
    projection="l2norm",
    distance_matrix="pearson")

# Project all data into a square distance matrix (keep every column)
X_inverse_distance_matrix = mapper.fit_transform(X_inverse,
    projection=[i for i in range(X_inverse.shape[0])],
    distance_matrix="pearson")

# Use X_inverse_distance_matrix as the inverse_X
# Select a clusterer with euclidean distance.
graph = mapper.map(X_projected, inverse_X=X_inverse_distance_matrix,
    clusterer=sklearn.cluster.DBSCAN(metric="euclidean"))
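The two projections above can be pictured with plain NumPy. This is only a sketch of the idea, under the assumptions that the Pearson distance is 1 minus the Pearson correlation between rows (SciPy calls this metric "correlation") and that the L2-norm lens is the Euclidean norm of each row; KeplerMapper's internals may differ, and the helper names are hypothetical:

```python
import numpy as np

def pearson_distance_matrix(X):
    """Square matrix of 1 - Pearson correlation between every pair of rows."""
    return 1.0 - np.corrcoef(X)

def l2norm_lens(M):
    """One lens value per row: the Euclidean norm of that row."""
    return np.linalg.norm(M, axis=1).reshape(-1, 1)

rng = np.random.default_rng(0)
X_inverse = rng.normal(size=(6, 4))   # made-up data: 6 samples, 4 features

D = pearson_distance_matrix(X_inverse)   # shape (6, 6), grows quadratically with samples
lens = l2norm_lens(D)                    # shape (6, 1), one value per sample
print(D.shape, lens.shape)
```

Clustering on the rows of `D` with a Euclidean metric then compares samples by how similar their distance profiles are, which is the theoretical oddity mentioned above.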

I was working on this, but dropped it, as I saw no good way to implement it. It should probably be an actual function that you pass in, instead of a list/array of values. For now, the color function output has to be attached to every row in the data/sample, and cannot be built from node-specific data, such as the number of members in a node. For now you can see the number of node members in the size of the nodes (you could make this a bit more visible by hacking the .visualize/javascript/html).
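As a rough illustration of the difference between per-row and per-node values, the sketch below uses a hand-made stand-in for the mapper graph, assuming its "nodes" entry maps each node id to the row indices of its members. The node ids and color values are made up, and aggregating by the mean is an assumption:

```python
import numpy as np

# Hypothetical stand-in for a mapper graph: node id -> member row indices.
graph = {"nodes": {"cube0_cluster0": [0, 1, 2], "cube1_cluster0": [2, 3]}}

# One color value per row of the data (made-up numbers).
color_values = np.array([0.1, 0.4, 0.7, 1.0])

# Node-specific quantity: how many rows fall into each node.
node_sizes = {node: len(members) for node, members in graph["nodes"].items()}

# Per-row values aggregated per node (here: the mean over the node's members).
node_colors = {
    node: float(color_values[members].mean())
    for node, members in graph["nodes"].items()
}
print(node_sizes)
print(node_colors)
```

Computed this way, member counts can at least be inspected outside the visualization, even though they cannot be fed in as a color function.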

Maurizio-sanarico-sdg commented on May 17, 2024 1

Let me give some small hints as a user about your questions. One of the advantages of Kepler Mapper compared with, say, the TDAmapper implementation in R is that it does not work on distance matrices. I was using TDAmapper and quickly got stuck as soon as I used about 40000 cases. As you can imagine, the distance matrix in this case runs out of memory (I use 16 GB of RAM). KM had no problem. An interesting option is to run KM from R using the reticulate package; the results are then returned as R objects, which is quite useful if you are used to performing statistical analysis in R. In this case, many interesting post-analyses can be carried out on the Mapper results.
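A quick back-of-the-envelope calculation shows why about 40000 cases is a problem on a 16 GB machine, assuming the distances are stored as 8-byte floats (the function names are illustrative only):

```python
def dense_distance_matrix_bytes(n_rows, bytes_per_value=8):
    """Memory for a full n x n matrix of floating-point distances."""
    return n_rows * n_rows * bytes_per_value

def condensed_distance_matrix_bytes(n_rows, bytes_per_value=8):
    """Memory when only the upper triangle (n * (n - 1) / 2 values) is stored."""
    return n_rows * (n_rows - 1) // 2 * bytes_per_value

n = 40_000
full = dense_distance_matrix_bytes(n)
condensed = condensed_distance_matrix_bytes(n)
print(f"full: {full / 1e9:.1f} GB, condensed: {condensed / 1e9:.1f} GB")
# full: 12.8 GB, condensed: 6.4 GB
```

Even the condensed form takes a large share of the available RAM before any clustering starts, and intermediate copies can easily push the full matrix past 16 GB.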

Maurizio-sanarico-sdg commented on May 17, 2024 1

For some reason the comment lines were enlarged (they started with the hash, or diesis, symbol, which Markdown turns into a heading) and the directory path lost its backslashes. It should be os$chdir("e:\\lab\\py\\col\\")

Maurizio-sanarico-sdg commented on May 17, 2024 1

No, it was just an example. It is my working directory; you should use your own.

mlnjsh commented on May 17, 2024 1

I got it. I have another problem: I am trying the following, but it shows an error. Could you please check?

km <- import("kmapper")
pd<-import("pandas")
import("urllib")
import ("urllib.request")
tb_existing_url_csv = 'https://docs.google.com/spreadsheets/d/1X5Jp7Q8pTs3KLJ5JBWKhncVACGsg5v4xu6badNs4C7I/pub?gid=0&output=csv'
local_tb_existing_file = 'tb_existing_100.csv'
df = urllib.urlretrieve(tb_existing_url_csv, local_tb_existing_file)
I am getting the following error:
Error in urllib.urlretrieve(tb_existing_url_csv, local_tb_existing_file) :
could not find function "urllib.urlretrieve"

Maurizio-sanarico-sdg commented on May 17, 2024 1

If I remember correctly, in Python 3 urlretrieve moved into urllib.request. However, this is not a problem of R/Python interoperability; it is a Python problem. Stack Overflow or other online resources should have the answer to your problem.
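On the Python side, the Python 3 location of the function is `urllib.request.urlretrieve`. A minimal sketch follows; the helper name `download` is made up, and a local `file://` URL with placeholder content stands in for the spreadsheet URL so the example runs without network access. Note that the error quoted earlier is raised by R itself, so from reticulate the call would presumably also have to go through the imported module object with R's `$` accessor; the sketch covers only the Python part:

```python
import pathlib
import tempfile
from urllib.request import urlretrieve  # Python 3 home of urlretrieve

def download(url, local_path):
    """Fetch url into local_path and return the path that was written."""
    saved_path, _headers = urlretrieve(url, local_path)
    return saved_path

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder source file standing in for the remote CSV.
    source = pathlib.Path(tmp) / "source.csv"
    source.write_text("a,b\n1,2\n")

    target = pathlib.Path(tmp) / "tb_existing_100.csv"
    saved = download(source.as_uri(), str(target))
    content = pathlib.Path(saved).read_text()

print(content)
```

With a real `https://` URL the call is the same; only the first argument changes.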

mlnjsh commented on May 17, 2024

Thank you so much for your quick reply.

mlnjsh commented on May 17, 2024

@Maurizio-sanarico-sdg

How do I import kmapper?

I have installed the reticulate package.

I tried py_install("kmapper"), but it shows an error:

Error: Error 1 occurred installing packages into conda environment r-reticulate
In addition: Warning message:
running command '"C:\ANACON~1\Scripts\conda.exe" "install" "-c" "conda-forge" "--yes" "--name" "r-reticulate" "Km"' had status 1

Maurizio-sanarico-sdg commented on May 17, 2024

My suggestion is to install kmapper using pip, and to install reticulate as an R package.
Below is a simple example in which I run a script that uses kmapper on a data set from R.

os <- import("os")
os$getcwd()

Below is the directory where the Python script is located

os$chdir("e:\lab\py\col\")

km <- import("kmapper")

Execute the script

py_run_file("KeplerMapperColM1.py")

See how the graph object created by kmapper is made available in R like any other object

py$graph

Maurizio-sanarico-sdg commented on May 17, 2024

OK, GitHub Markdown does not show a double backslash outside a code block.

mlnjsh commented on May 17, 2024

os$chdir("e:\lab\py\col")

I am confused about e:\lab\py\col\. Do we have to change into that directory here?

mlnjsh commented on May 17, 2024

ok got it

deargle commented on May 17, 2024

@Maurizio-sanarico-sdg you can surround your code snippets with lines of triple backticks to preserve formatting. Otherwise, Markdown styling gets applied. For example, a line like this:

this is monospaced preserved c:\code\with\even # comments after hash tags
