Comments (13)

MLWave commented on May 17, 2024
  1. Best way is to one-hot encode your categories. If cardinality is too high or there are many categoricals, look into neural network embeddings as a preprocessing step, or sparse Hamming distance.

  2. Distance function is routinely set in the clustering algorithm. For instance, if you want to use cosine distance, you could set:

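# Ward (the default linkage) only supports euclidean, so set another linkage.
# In scikit-learn >= 1.2 the `affinity` argument is named `metric`.
graph = mapper.map(projected_X,
    inverse_X,
    clusterer=sklearn.cluster.AgglomerativeClustering(n_clusters=3,
        affinity='cosine', linkage='average'))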

KeplerMapper operates mostly directly on vector data, not on distance matrices.

But you could if you wanted to. It may not scale to huge datasets, depending on the clusterer used and the number of intervals, and it is theoretically a bit odd here, because you apply euclidean distance to a Pearson distance matrix:

# Project by L2-norm on squared Pearson distance matrix
X_projected = mapper.fit_transform(X_inverse,
    projection="l2norm",
    distance_matrix="pearson")

# Project all data into a squared distance matrix
X_inverse_distance_matrix = mapper.fit_transform(X_inverse,
    projection=[i for i in range(X_inverse.shape[0])],
    distance_matrix="pearson")

# Use X_inverse_distance_matrix as the inverse_X
# Select a clusterer with euclidean distance.
graph = mapper.map(X_projected, inverse_X=X_inverse_distance_matrix,
    clusterer=sklearn.cluster.DBSCAN(metric="euclidean"))
  3. I was working on this, but dropped it, as I saw no good way to implement it. It should probably be an actual function you pass, instead of a list/array of values. For now, color function output must be attached to every row in the data/sample, and cannot be built from node-specific data, such as the number of members in a node. You can see the number of node members in the size of the nodes (you could make this a bit more visible by hacking the .visualize/javascript/html).
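
To illustrate the first point, here is a minimal pure-Python sketch of one-hot encoding and of the Hamming distance between two categorical rows. The sample values and the helper names `one_hot` and `hamming` are made up for illustration; in practice `pandas.get_dummies` or scikit-learn's `OneHotEncoder` would typically do the encoding.

```python
def one_hot(values):
    """Return (categories, rows); each row is a 0/1 list with one slot per category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        rows.append(row)
    return categories, rows


def hamming(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


colors = ["red", "green", "red", "blue"]  # hypothetical categorical column
categories, encoded = one_hot(colors)
print(categories)  # ['blue', 'green', 'red']
print(encoded[0])  # [0, 0, 1]

# Hamming distance works directly on raw categorical rows, no encoding needed
print(hamming(["red", "small", "yes"], ["red", "large", "no"]))
```

With many high-cardinality columns the one-hot matrix becomes wide and sparse, which is where the embedding or sparse Hamming suggestions come in.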

from kepler-mapper.

Maurizio-sanarico-sdg commented on May 17, 2024

Let me give some small hints as a user about your questions. One of the advantages of Kepler Mapper over, say, the TDAmapper implementation in R is that it does not work on distance matrices. I was using TDAmapper and quickly got stuck as soon as I used about 40000 cases. As you can imagine, the distance matrix in this case runs out of memory (I use 16 GB RAM). KM had no problem. An interesting option is to run KM from R using the reticulate package: the results are then returned as R objects, which is quite useful if you are used to performing statistical analysis in R. In this case, many interesting post-analyses can be carried out on the Mapper results.
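
A quick back-of-the-envelope check of the memory claim, assuming a full square matrix of 8-byte floating-point values (the function name is illustrative, and the exact storage used by any given package may differ):

```python
def dense_matrix_bytes(n_rows, bytes_per_value=8):
    """Memory needed for a full n x n matrix of floating-point values."""
    return n_rows * n_rows * bytes_per_value


n = 40_000
total = dense_matrix_bytes(n)
print(f"{total / 1e9:.1f} GB")     # 12.8 GB
print(f"{total / 2**30:.2f} GiB")  # 11.92 GiB
```

On a 16 GB machine that leaves very little room for the data itself or for any intermediate copy.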

Maurizio-sanarico-sdg commented on May 17, 2024

For some reason the comment lines were enlarged (they started with the hash symbol, which Markdown renders as a heading), and the directory path lost its doubled backslashes. It should be os$chdir("e:\\lab\\py\\col\\").

Maurizio-sanarico-sdg commented on May 17, 2024

No, it was just an example: it is my working directory. You should use your own.

mlnjsh commented on May 17, 2024

I got it. I have another problem: I am trying the following, and it shows an error. Could you please check?

km <- import("kmapper")
pd<-import("pandas")
import("urllib")
import ("urllib.request")
tb_existing_url_csv = 'https://docs.google.com/spreadsheets/d/1X5Jp7Q8pTs3KLJ5JBWKhncVACGsg5v4xu6badNs4C7I/pub?gid=0&output=csv'
local_tb_existing_file = 'tb_existing_100.csv'
df = urllib.urlretrieve(tb_existing_url_csv, local_tb_existing_file)
I am getting the following error:
Error in urllib.urlretrieve(tb_existing_url_csv, local_tb_existing_file) :
could not find function "urllib.urlretrieve"

Maurizio-sanarico-sdg commented on May 17, 2024

If I remember correctly, urlretrieve moved to urllib.request in Python 3. However, this is not an R/Python interoperability problem; it is a Python problem. Searching Stack Overflow or other online resources should turn up the answer.
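
For reference, in Python 3 the function is `urllib.request.urlretrieve`. The sketch below uses a local `file://` URL and a temporary directory as stand-ins for the spreadsheet URL so that it runs offline; the source file and its contents are made up for illustration.

```python
import tempfile
import urllib.request
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the remote CSV (hypothetical content)
    source = Path(tmp) / "source.csv"
    source.write_text("country,cases\nA,1\n")

    target = Path(tmp) / "tb_existing_100.csv"
    # urlretrieve returns (local_filename, headers), not the parsed data
    filename, headers = urllib.request.urlretrieve(source.as_uri(), str(target))
    downloaded = Path(filename).read_text()

print(downloaded.splitlines()[0])
```

From R via reticulate, the equivalent call would presumably be along the lines of assigning the module with urllib <- import("urllib.request") and then calling urllib$urlretrieve(url, file), since R accesses Python attributes with $ rather than a dot.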

mlnjsh commented on May 17, 2024

Thank you so much for your quick reply.

mlnjsh commented on May 17, 2024

@Maurizio-sanarico-sdg

How do I import kmapper?

I have installed the reticulate package.

I tried py_install("kmapper"), but it shows an error:

Error: Error 1 occurred installing packages into conda environment r-reticulate
In addition: Warning message:
running command '"C:\ANACON~1\Scripts\conda.exe" "install" "-c" "conda-forge" "--yes" "--name" "r-reticulate" "Km"' had status 1

Maurizio-sanarico-sdg commented on May 17, 2024

My suggestion is to install kmapper using pip, and to install reticulate as an R package.
Below is a simple example in which I run, from R, a script that uses kmapper on a data set.

library(reticulate)

os <- import("os")
os$getcwd()

# The directory where the Python script is located
os$chdir("e:\\lab\\py\\col\\")

km <- import("kmapper")

# Execute the script
py_run_file("KeplerMapperColM1.py")

# The graph object created by kmapper is available in R like any other object
py$graph
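
If the import fails on the R side, one thing worth checking is whether the Python interpreter that reticulate uses can see the package at all. A minimal sketch using only the standard library; `is_installed` is a hypothetical helper name:

```python
import importlib.util
import sys


def is_installed(module_name):
    """True if the current interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


print(sys.executable)           # which Python interpreter is running
print(is_installed("kmapper"))  # True only if kmapper is installed for it
```

Saved to a file, this could be run from R with py_run_file in the same way as the script above.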

Maurizio-sanarico-sdg commented on May 17, 2024

OK, in a GitHub comment a double backslash is not preserved.

mlnjsh commented on May 17, 2024

os$chdir("e:\lab\py\col")

I am confused about e:\lab\py\col\. Do we have to change into that directory here?

mlnjsh commented on May 17, 2024

ok got it

deargle commented on May 17, 2024

@Maurizio-sanarico-sdg you can surround your code snippets with lines of triple backticks to preserve formatting. Otherwise, markdown styling gets applied. e.g., lines like this: ```

this is monospaced preserved c:\code\with\even # comments after hash tags
