Comments (13)
Hi @spsanderson
Looking into the issue now. However, I don't think the data set you provided is the correct one. You provided customer_trends_tbl, but the AutoKMeans example uses a data set named customer_product_tbl.
I can't reproduce the error because there is no column called bikeshop_name.
from autoquant.
This is the correct file
@spsanderson Why is the data in the form that it is? What do the values in each column represent?
The values are the proportions of purchases of each bike model from a bikeshop. Is the function expecting a different form?
I wouldn't aggregate the data before running k-means. I would use the transactional data instead.
I will try it and report back
Sounds good
My data looks like the attached file. Should I restrict my data to just the quantity column (this is what I am aggregating)?
With the following code and the attached data I get only 2 clusters (0 and 1), but there should really be at least 4, which is what I get from the method in the original post.
AutoK_obj <- RemixAutoML::AutoKMeans(
    data = customer_trends_tbl %>% select(-prop_of_total)
    , KMeansK = 15
    , KMeansMetric = "tot_withinss"
    , GridTuneGLRM = TRUE
    , GridTuneKMeans = TRUE
)
@spsanderson I would start tinkering with the arguments. Internally, a GLRM model from H2O is built first for dimensionality reduction. You choose how many of its factors to keep, and those factors are passed on to the k-means algorithm from H2O, which then searches for the optimal k using the factor data from the GLRM.
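To make that two-stage flow concrete, here is a minimal sketch in base R. It is not the AutoKMeans internals: stats::prcomp stands in for the H2O GLRM step and stats::kmeans stands in for the H2O k-means step, and the function name reduce_then_cluster, its arguments, and the synthetic data are all hypothetical. It only shows the shape of the idea: reduce the columns first, then fit k-means over a range of k on the reduced data.

```r
# Assumption: stats::prcomp replaces H2O's GLRM and stats::kmeans replaces
# H2O's k-means. All names below are hypothetical, for illustration only.
reduce_then_cluster <- function(data, n_factors = 2, max_k = 8, seed = 42) {
  # Stage 1: dimensionality reduction; keep the first n_factors components
  pca <- stats::prcomp(data, center = TRUE, scale. = TRUE)
  factors <- pca$x[, seq_len(n_factors), drop = FALSE]

  # Stage 2: fit k-means for k = 1..max_k on the reduced data
  set.seed(seed)
  wss <- vapply(seq_len(max_k), function(k) {
    stats::kmeans(factors, centers = k, nstart = 10)$tot.withinss
  }, numeric(1))

  data.frame(k = seq_len(max_k), tot_withinss = wss)
}

# Synthetic example: four well-separated groups in five numeric columns
set.seed(1)
group_means <- rep(c(0, 5, 10, 15), each = 25)
toy <- as.data.frame(matrix(rnorm(100 * 5, mean = group_means), ncol = 5))

elbow <- reduce_then_cluster(toy, n_factors = 2, max_k = 8)
print(elbow)
```

The sketch returns the total within-cluster sum of squares for each k rather than picking a single "best" k, so the drop-off can be inspected by eye. How many factors are kept in stage 1 changes what stage 2 can see, which is one reason the arguments are worth experimenting with.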
If you go through the help file (?RemixAutoML::AutoKMeans), you can read up on what each argument does. The function is intended to be flexible for most kinds of data sets but you will want to try several settings if you don't already have a good idea of how to set it for your particular case.
This function is just a beginning for unsupervised learning. I spend most of my time working on the supervised learning stuff since I encounter it more often in practice, but I will get around to enhancing these at some point. If you are interested in helping out let me know.
AutoKMeans <- function(data,
                       nthreads = 8,
                       MaxMem = "28G",
                       SaveModels = NULL,
                       PathFile = NULL,
                       GridTuneGLRM = TRUE,
                       GridTuneKMeans = TRUE,
                       glrmCols = c(1:5),
                       IgnoreConstCols = TRUE,
                       glrmFactors = 5,
                       Loss = "Absolute",
                       glrmMaxIters = 1000,
                       SVDMethod = "Randomized",
                       MaxRunTimeSecs = 3600,
                       KMeansK = 50,
                       KMeansMetric = "totss") {
Working through it. It seems that even on the iris dataset, h2o::h2o.kmeans produces only 2 clusters when we know there are 3. I forked and cloned the repo and will work on it.
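One way to tell whether a 2-cluster result comes from the data or from the pipeline is to run a baseline outside of it. A minimal sketch, assuming base R's stats::kmeans as the baseline (this is not the H2O code path discussed above) and with k fixed at 3 rather than searched for:

```r
# Baseline check with base R only; this does not touch H2O or AutoKMeans.
data(iris)
x <- scale(iris[, 1:4])  # standardise the four numeric measurements

set.seed(123)
fit <- stats::kmeans(x, centers = 3, nstart = 25)

# Cross-tabulate the assigned clusters against the known species labels
confusion <- table(cluster = fit$cluster, species = iris$Species)
print(confusion)
```

If the three clusters line up reasonably well with the species here, the data itself supports three groups, and the search for k inside the pipeline becomes the more likely place to look.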