Comments (13)
- The best way is to one-hot encode your categories. If the cardinality is too high or there are many categorical variables, look into neural network embeddings as a preprocessing step, or a sparse Hamming distance.
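- The distance function is usually set in the clustering algorithm. For instance, if you want to use cosine distance, you could set:
graph = mapper.map(projected_X,
                   inverse_X,
                   clusterer=sklearn.cluster.AgglomerativeClustering(
                       n_clusters=3,
                       linkage='average',    # the default 'ward' linkage only supports euclidean
                       affinity='cosine'))   # renamed to `metric` in newer scikit-learn releases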
KeplerMapper mostly operates directly on vector data, not on distance matrices.
But you could use a distance matrix if you wanted to. It may not scale to huge datasets, depending on the clusterer used and the number of intervals, and it is theoretically a bit odd here, because you apply euclidean distance to a Pearson distance matrix:
# Project by L2-norm on the square (n x n) Pearson distance matrix
# (scipy calls this metric "correlation"; try that name if "pearson" is not accepted)
X_projected = mapper.fit_transform(X_inverse,
                                   projection="l2norm",
                                   distance_matrix="pearson")
# Turn all data into a square distance matrix by keeping every column of it
X_inverse_distance_matrix = mapper.fit_transform(X_inverse,
                                                 projection=list(range(X_inverse.shape[0])),
                                                 distance_matrix="pearson")
# Use X_inverse_distance_matrix as the inverse_X
# and select a clusterer with euclidean distance.
graph = mapper.map(X_projected, inverse_X=X_inverse_distance_matrix,
                   clusterer=sklearn.cluster.DBSCAN(metric="euclidean"))
- I was working on this, but dropped it, as I saw no good way to implement it. It should probably be an actual function you pass, instead of a list/array of values. For now, the color function output has to be attached to every row in the data/sample, and cannot be built from node-specific data, such as the number of members in a node. For now you can see the number of node members in the size of the nodes (you could make this a bit more visible by hacking the .visualize/JavaScript/HTML).
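As an illustration of the first point above, here is a minimal sketch of one-hot encoding two categorical columns and comparing rows with a Hamming distance. The helper names `one_hot_encode` and `hamming_distance` and the toy data are made up for this example; in practice `pandas.get_dummies` or `sklearn.preprocessing.OneHotEncoder` do the encoding for you.

```python
import numpy as np


def one_hot_encode(column):
    """Return (matrix, categories) for a 1-D sequence of category labels."""
    categories = sorted(set(column))
    index = {category: i for i, category in enumerate(categories)}
    encoded = np.zeros((len(column), len(categories)), dtype=int)
    for row, value in enumerate(column):
        encoded[row, index[value]] = 1
    return encoded, categories


def hamming_distance(a, b):
    """Fraction of positions at which two equal-length vectors differ."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.mean(a != b))


# Hypothetical toy data: two categorical columns with four rows.
colors = ["red", "green", "red", "blue"]
sizes = ["S", "M", "S", "L"]

color_matrix, color_categories = one_hot_encode(colors)
size_matrix, size_categories = one_hot_encode(sizes)

# Stack the encoded columns side by side; X is plain vector data
# that could be handed to a projection or clusterer.
X = np.hstack([color_matrix, size_matrix])
```

Rows with identical categories get distance 0, and the distance grows with the number of categorical columns in which two rows disagree.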
from kepler-mapper.
Let me give some small hints, as a user, about your questions. One of the advantages of Kepler Mapper over, say, the TDAmapper implementation in R is that it does not work on distance matrices. I was using TDAmapper and quickly got stuck as soon as I used about 40000 cases. As you can imagine, the distance matrix in this case runs out of memory (I use 16 GB of RAM). KM had no problem. An interesting option is to run KM from R using the reticulate package; the results are then returned as R objects, which is quite useful if you are used to performing statistical analysis in R. In this case, many interesting post-analyses can be carried out on the Mapper results.
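To make the memory argument concrete, here is a rough back-of-the-envelope sketch. It assumes a full dense matrix of 8-byte floats; a condensed (lower-triangle) representation would need about half of that, and the function name is just for illustration.

```python
def dense_distance_matrix_bytes(n_samples, bytes_per_value=8):
    """Memory needed for a full n x n matrix of fixed-size values."""
    return n_samples * n_samples * bytes_per_value


n = 40_000
size_bytes = dense_distance_matrix_bytes(n)
size_gb = size_bytes / 1e9  # decimal gigabytes

print(f"{n} samples -> {size_gb:.1f} GB")  # prints "40000 samples -> 12.8 GB"
```

With 16 GB of RAM that leaves little room for the data itself, intermediate copies, or anything else running on the machine.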
For some reason the text of the comments was enlarged (those lines started with the hash symbol that marks a comment) and the directory lost a backslash in each separator; it should be os$chdir("e:\\lab\\py\\col\\")
No, it was just an example. It is my working directory; you should use your own.
I got it. I have another problem: I am trying the following, and it shows an error. Can you please check?
km <- import("kmapper")
pd<-import("pandas")
import("urllib")
import ("urllib.request")
tb_existing_url_csv = 'https://docs.google.com/spreadsheets/d/1X5Jp7Q8pTs3KLJ5JBWKhncVACGsg5v4xu6badNs4C7I/pub?gid=0&output=csv'
local_tb_existing_file = 'tb_existing_100.csv'
df = urllib.urlretrieve(tb_existing_url_csv, local_tb_existing_file)
I am getting the following error:
Error in urllib.urlretrieve(tb_existing_url_csv, local_tb_existing_file) :
could not find function "urllib.urlretrieve"
If I remember correctly, urlretrieve was moved in Python 3 to urllib.request. However, this is not a problem of R/Python interoperability; it is a Python problem. On Stack Overflow or other online resources you should find the answer to your problem.
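For reference, a minimal Python 3 sketch of the call in question. The `download` helper is only an illustrative wrapper, and the demo uses a local file:// URL with made-up contents so that it runs without network access; with the real spreadsheet URL the call looks the same.

```python
import tempfile
import urllib.request
from pathlib import Path


def download(url, destination):
    """Save the resource at `url` to `destination` and return the local path."""
    local_path, _headers = urllib.request.urlretrieve(url, str(destination))
    return local_path


# Self-contained demo: "download" a temporary local file through a file:// URL.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.csv"
    source.write_text("country,1990\nExampleland,10\n")  # made-up sample data
    target = Path(tmp) / "tb_existing_100.csv"

    saved = download(source.as_uri(), target)
    copied_text = Path(saved).read_text()
```

In Python 2 the same function lived directly in `urllib`, which is why older tutorials show `urllib.urlretrieve`.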
Thank you so much for your quick reply.
How do I import kmapper?
I have installed the reticulate package.
I am trying py_install("kmapper"), which shows an error:
Error: Error 1 occurred installing packages into conda environment r-reticulate
In addition: Warning message:
running command '"C:\ANACON~1\Scripts\conda.exe" "install" "-c" "conda-forge" "--yes" "--name" "r-reticulate" "Km"' had status 1
My suggestion is to install kmapper using pip.
Install reticulate as an R package.
Below is a simple example in which I execute a script using kmapper on a data set from R.
library(reticulate)
os <- import("os")
os$getcwd()
# Change to the directory where the Python script is located
os$chdir("e:\lab\py\col\")
km <- import("kmapper")
# Execute the script
py_run_file("KeplerMapperColM1.py")
# The graph object created by kmapper is now available in R like any other object
py$graph
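As a sketch of the kind of post-analysis one could do on such a graph object (shown in Python, on a hand-written stand-in): as far as I know, the graph returned by kmapper is a plain dictionary whose "nodes" entry maps each node id to the list of row indices in that node. That layout, the node names, and the `summarize_nodes` helper are assumptions for this example, not the library's documented API.

```python
import numpy as np

# Hypothetical stand-in for a Mapper graph: node id -> member row indices.
graph = {
    "nodes": {
        "cube0_cluster0": [0, 1, 2],
        "cube1_cluster0": [2, 3],
        "cube2_cluster0": [4],
    },
    "links": {"cube0_cluster0": ["cube1_cluster0"]},
}

# One value per row of the data, e.g. the values used for coloring.
row_values = np.array([0.1, 0.2, 0.3, 0.7, 0.9])


def summarize_nodes(graph, row_values):
    """Return the member count and mean row value for every node."""
    summary = {}
    for node_id, members in graph["nodes"].items():
        summary[node_id] = {
            "size": len(members),
            "mean_value": float(np.mean(row_values[members])),
        }
    return summary


node_summary = summarize_nodes(graph, row_values)
```

The same per-node table could be built in R from py$graph once the object has been converted.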
OK, it seems a double backslash is not accepted on GitHub.
os$chdir("e:\lab\py\col")
I am confused about e:\lab\py\col\. Do we have to change into that directory here?
OK, got it.
@Maurizio-sanarico-sdg you can surround your code snippets with lines of triple backticks to preserve formatting. Otherwise, Markdown styling gets applied. For example, lines like this:
```
this is monospaced preserved c:\code\with\even # comments after hash tags
```