When using the following function with the following parameters:

AutoKMeans produces 0 clusters (adrianantico/autoquant)

DougVegas commented on May 13, 2024

Looking into the issue now. However, I think the data set you provided isn't the correct one. You provided customer_trends_tbl, but the AutoKMeans example uses a data set named customer_product_tbl.

I can't reproduce the error because there is no column called bikeshop_name.

spsanderson commented on May 13, 2024

Oh boy, let me look into it and fix it.

spsanderson commented on May 13, 2024

customer_product_tbl.xlsx

This is the correct file

AdrianAntico commented on May 13, 2024

@spsanderson Why is the data in the form that it is? What do the values in each column represent?

spsanderson commented on May 13, 2024

Proportions of purchases of each bike model from a bikeshop. Is the function expecting a different form?
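For readers without the attachment, here is a minimal sketch of how a proportions table of that shape could be built from order-line data in base R. The toy data and the `model` column name are assumptions made for illustration; only `bikeshop_name` and `quantity` are named in this thread.

```r
# Hypothetical order-line data. Only bikeshop_name and quantity are column
# names taken from the thread; `model` and all values are made up.
orderlines <- data.frame(
  bikeshop_name = c("A", "A", "A", "B", "B", "C"),
  model         = c("m1", "m2", "m1", "m1", "m3", "m2"),
  quantity      = c(2, 1, 1, 4, 4, 5)
)

# Total quantity per shop and model (rows = shops, columns = models).
qty <- xtabs(quantity ~ bikeshop_name + model, data = orderlines)

# Convert each row to proportions so that every shop sums to 1.
prop_wide <- as.data.frame.matrix(prop.table(qty, margin = 1))
print(round(prop_wide, 2))
```

Each row is then one bikeshop and each column one bike model, which is the aggregated form being asked about here.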

AdrianAntico commented on May 13, 2024

I wouldn't aggregate the data before running k-means. I would use the transactional data.

spsanderson commented on May 13, 2024

I will try it and report back

AdrianAntico commented on May 13, 2024

Sounds good

spsanderson commented on May 13, 2024

My data looks like the attached. Should I make my data strictly the quantity column (this is what I am aggregating)?

bike_orderlines_tbl.xlsx

spsanderson commented on May 13, 2024

So with the following code and attached data I get 2 clusters, 0 and 1. It should really be at least 4, which is what I get from the method posted in the original post.

AutoK_obj <- RemixAutoML::AutoKMeans(
    data = customer_trends_tbl %>% select(-prop_of_total)
    , KMeansK = 15
    , KMeansMetric = "tot_withinss"
    , GridTuneGLRM = TRUE
    , GridTuneKMeans = TRUE
    )

customer_trends_tbl.xlsx

AdrianAntico commented on May 13, 2024

@spsanderson I would start tinkering with the arguments. What's going on internally is that a GLRM model from H2O is built first (for the purposes of dimensionality reduction) and you select the number of factors from that to keep and pass on to the KMEANS algo from H2O, which will run to find the optimal k using the factors data from the GLRM.
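The two-stage idea described above (reduce dimensions first, then search for k on the reduced factors) can be sketched in base R. This is not the AutoKMeans implementation: `prcomp` (PCA) is swapped in for H2O's GLRM, `stats::kmeans` for H2O's k-means, and the toy data, `n_factors`, and `k_grid` are assumptions chosen for illustration.

```r
set.seed(42)

# Toy numeric data: three groups of 50 rows in 6 dimensions (illustrative only).
centers <- rbind(rep(0, 6), rep(4, 6), c(4, 0, 4, 0, 4, 0))
x <- do.call(rbind, lapply(1:3, function(i) {
  noise <- matrix(rnorm(50 * 6, sd = 0.5), ncol = 6)
  noise + matrix(centers[i, ], nrow = 50, ncol = 6, byrow = TRUE)
}))

# Stage 1: dimensionality reduction. PCA stands in for GLRM in this sketch.
n_factors <- 2
pca <- prcomp(x, center = TRUE, scale. = TRUE)
factors <- pca$x[, seq_len(n_factors), drop = FALSE]

# Stage 2: k-means on the reduced factors for a range of candidate k.
k_grid <- 1:8
wss <- sapply(k_grid, function(k) {
  kmeans(factors, centers = k, nstart = 20)$tot.withinss
})

# Inspect the total within-cluster sum of squares for an elbow.
print(data.frame(k = k_grid, tot_withinss = round(wss, 1)))
```

How many factors are kept in stage 1 and which metric is used in stage 2 both affect the k that gets chosen, which is why trying several settings matters.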

If you go through the help file (?RemixAutoML::AutoKMeans), you can read up on what each argument does. The function is intended to be flexible for most kinds of data sets but you will want to try several settings if you don't already have a good idea of how to set it for your particular case.

This function is just a beginning for unsupervised learning. I spend most of my time working on the supervised learning stuff since I encounter it more often in practice, but I will get around to enhancing these at some point. If you are interested in helping out let me know.

AutoKMeans <- function(data,
                       nthreads        = 8,
                       MaxMem          = "28G",
                       SaveModels      = NULL,
                       PathFile        = NULL,
                       GridTuneGLRM    = TRUE,
                       GridTuneKMeans  = TRUE,
                       glrmCols        = c(1:5),
                       IgnoreConstCols = TRUE,
                       glrmFactors     = 5,
                       Loss            = "Absolute",
                       glrmMaxIters    = 1000,
                       SVDMethod       = "Randomized",
                       MaxRunTimeSecs  = 3600,
                       KMeansK         = 50,
                       KMeansMetric    = "totss") {

spsanderson commented on May 13, 2024

Thanks for the update. I will work through it, take a look, and see what I can come up with.

spsanderson commented on May 13, 2024

Working through it. It seems that even on the Iris dataset, h2o::h2o.kmeans is only producing 2 clusters when we know there are 3. I forked and cloned the repo. Will work on it.
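As a point of comparison outside H2O, the sketch below runs base R's `stats::kmeans` on the scaled Iris measurements for several values of k. It does not reproduce the H2O behaviour; it only shows how much each extra cluster reduces the within-cluster sum of squares, and how a forced k = 3 solution lines up with the species labels. That a data-driven k search can settle on 2 for this data because two of the species overlap is a plausible explanation, not a confirmed diagnosis.

```r
set.seed(1)

# Scale the four numeric measurements so no single column dominates.
x <- scale(iris[, 1:4])

# Fit k-means for each candidate k, using several random starts.
k_grid <- 1:6
fits <- lapply(k_grid, function(k) kmeans(x, centers = k, nstart = 25))
wss <- sapply(fits, function(f) f$tot.withinss)

# Drop in within-cluster sum of squares gained by each extra cluster.
print(data.frame(k = k_grid[-1], wss_drop = round(-diff(wss), 1)))

# Cross-tabulate the k = 3 solution against the known species labels.
print(table(cluster = fits[[3]]$cluster, species = iris$Species))
```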
