Comments (8)

strengejacke commented on May 13, 2024

Other packages where contributions are welcome are e.g. performance or parameters, which might be easier because the functions there are more "standalone" in nature. @DominiqueMakowski suggested adding functions for ML (see here: easystats/performance#36), maybe this would also be a good start?

Feel free to contribute in any way you like, either just observing for a while, commenting, filing issues and making proposals, correcting / commenting code or contributing code. We have no "minimum requirements" for "membership" here. ;-)

from easystats.

strengejacke commented on May 13, 2024

I second everything you wrote, but maybe I would be less optimistic about this point:

Daniel has a super impressive knowledge of everything

;-)

DominiqueMakowski commented on May 13, 2024

I agree with Daniel. insight's functions could still make sense for non-regression objects, though (maybe not all of them), with some specificities related to the object on which they are applied.

Taking the kmeans example, from my (limited) experience of it, I would tend to expect get_predictors to return the variables used for fitting the model, whereas get_response would be the vector of predicted clusters (as in the end the "equivalent" formula of a clustering algorithm is like cluster ~ x1 + ... + x3). For get_parameters, I would expect it to return the different clusters with some parameters, such as size and centre (the values of the predictors at which the centre of the cluster lies). For example, with 3 clusters, I would expect a data frame with 3 rows, containing the sample size of each cluster and an additional column for each predictor giving the value at which the cluster centre is located.
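A minimal sketch of that expected output, using only base R. The name kmeans_parameters() is hypothetical and is not an existing insight function; it simply binds the sizes and centres that kmeans() already returns.

```r
# Hypothetical helper (not part of insight): one row per cluster,
# with the cluster size and the centre on each predictor.
kmeans_parameters <- function(x) {
  stopifnot(inherits(x, "kmeans"))
  data.frame(
    Cluster = seq_len(nrow(x$centers)),
    n = x$size,
    x$centers  # matrix of centres, expanded into one column per predictor
  )
}

set.seed(123)
fit <- kmeans(iris[, 1:4], centers = 3)
params <- kmeans_parameters(fit)
print(params)
```

With 3 clusters and 4 predictors this gives a data frame of 3 rows and 6 columns.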

In general, I think that with some adjustments and flexibility insight could also be relevant in the support of non-regression objects. However, we might want to use a new issue to discuss "easystats for ML".

glmnet could maybe be more straightforward to start with since it's more like a regression?

pdwaggoner commented on May 13, 2024

Hi @DominiqueMakowski and @strengejacke - many thanks for the gracious and generous welcome aboard! It is very good to be a member and to participate in the development of this ecosystem. Everything you said makes good sense, and indeed the main focus of easystats is precisely the thing that drew me to the org:

...focus on user experience (and especially non-programmers). We try to put some thoughts into the functions' design, names, argument names etc. to maximize clarity, intuitiveness, and accessibility.

This is at the heart of all of my packages and statistical programming as well. As such, I really look forward to contributing to this mission however I can. Don't hesitate to "spam" me whenever something crops up! So excited to help problem-solve at high and low levels to whatever degree makes sense.

Re: my interests and skills, I was trained as a classical frequentist, but can get by in a Bayesian world. More recently, though, I have been increasingly involved in the machine learning world such as clustering, classification, regression, and simulation, as there is also a natural extension from the classic regression world (e.g., LASSO, elastic-net models, etc.). But ultimately, I am mostly interested in making complex things simple to understand and deploy for people who actually use them. This is, again, right in line with what you guys have started here. I first heard about easystats and @strengejacke's work through sjPlot, which I used for my dissertation and some papers using multilevel models. So I am reasonably familiar with your general approach to statistical computing.

And I think "watching" is a good idea. I have now done that for all easystats repos.

Feel free to reach out if/whenever things come up that you could use my input on or need another set of eyes in developing. In the meantime, I will be thinking about how to integrate some of my other packages with folks into the easystats ecosystem (again, only at a high level at this point).

Thanks so much again and I am very excited to be a part of the team!

DominiqueMakowski commented on May 13, 2024

That's interesting, as penalized regressions (ridge, lasso & elastic net) and ML are two fields not covered by the current roadmap. And I would be very happy to provide tools for these methods.

Currently, it all starts with insight's support, which makes it possible to consistently retrieve a model's attributes (parameters, data, formula etc.). This is critical to then easily implement more user-directed functions.

To my knowledge, glmnet, caret or other clustering / ML packages are currently not supported by insight. This might be a good place to start :)

As we all have full-time jobs on the side, I feel like it's important that we work on things that we ourselves use / will use, so that working on easystats is also useful and time-saving for us 😁 In this context, which packages/functions do you use/plan to use in your work?

pdwaggoner commented on May 13, 2024

Good to know. I just peeked at the insight source code. This makes sense as far as structure goes. How do you envision setting up that support? Based on insight/get_data.R, something like:

# simple example via kmeans in base R
get_data.kmeans <- function(x, component = c("cluster", "centers", "totss", "withinss", "tot.withinss", "betweenss", "size", "iter", "ifault"), ...) {
  # kmeans is an S3 class, so inherits() is enough to validate the input
  if (!inherits(x, "kmeans")) {
    stop("must be a kmeans object", call. = FALSE)
  }
  # resolve the requested component (defaults to "cluster")
  component <- match.arg(component)
  # return that element of the fitted object
  x[[component]]
}

Or am I off here? Just curious where a good place to start may be from your perspective. Thanks!

And as far as packages, these days I am mostly using: caret, glmnet, mlbench, mixtools, etc.

strengejacke commented on May 13, 2024

As the easystats-packages focus on regression models, we started with the very basic package (insight) that helps to extract different information from model objects.

One important function is find_formula(). Once you have the formula that was used to fit the model, you can derive much other information (like find_response(), find_predictors() etc.).
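To illustrate why the formula is the starting point, here is a rough base R sketch (not insight's actual implementation) that derives the response and predictor names from a fitted model's formula:

```r
# Base R only; this is an illustration, not how insight does it internally.
model <- lm(mpg ~ wt + hp, data = mtcars)

f <- formula(model)

# A two-sided formula is a call of the form `~`(lhs, rhs):
# element 2 is the left-hand side, element 3 the right-hand side.
response <- all.vars(f[[2]])
predictors <- all.vars(f[[3]])

print(response)
print(predictors)
```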

A second important function is indeed get_data(), which tries to get the original data that was used to fit the model. insight uses two sources to find this data: the model.frame() and the environment. Once you have this "cleaned" data frame, all the get_*()-functions should work in a straightforward way.
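A simplified sketch of that two-source idea in base R. The function recover_data() is a hypothetical name, and the fallback shown here (evaluating the data argument of the stored call) is only an assumption about how an environment lookup could work, not insight's code:

```r
# Hypothetical helper: try the model frame first, then the environment.
recover_data <- function(model) {
  # Source 1: the model frame stored in (or rebuilt from) the model object
  mf <- tryCatch(stats::model.frame(model), error = function(e) NULL)
  if (!is.null(mf)) {
    return(mf)
  }
  # Source 2: evaluate the `data` argument of the original call
  # in the environment where the formula was created
  data_expr <- stats::getCall(model)$data
  eval(data_expr, envir = environment(stats::formula(model)))
}

model <- lm(mpg ~ wt, data = mtcars)
dat <- recover_data(model)
print(head(dat))
```

For lm() the first source already succeeds, so the result contains only the variables used in the model.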

One "specialty" might be the get_parameters() function that extracts the model coefficients (i.e. this is a result of a model, not part of the model "setup" like formula, data etc.). So your example would go in this direction.

But probably kmeans() is a bad (or difficult) example, similar to t.test() or so. There's a discussion here: easystats/insight#43 and here: easystats/insight#70 on similar objects with the same difficulties. You have no information about the formula or the data stored in the objects returned by these functions. That's why these are difficult to implement in insight.
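A quick base R check makes the difference visible: the object returned by kmeans() has no call, formula or data element, while an lm object keeps its call, terms and model frame.

```r
set.seed(1)
fit <- kmeans(mtcars[, c("mpg", "wt")], centers = 2)
mod <- lm(mpg ~ wt, data = mtcars)

# Elements stored in each fitted object
kmeans_elements <- names(fit)
lm_elements <- names(mod)

print(kmeans_elements)

# Which of the "setup" elements does each object carry?
setup <- c("call", "terms", "model")
print(setup %in% kmeans_elements)
print(setup %in% lm_elements)
```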

I haven't used caret that much, but maybe that fits better into the "design" of insight?

pdwaggoner commented on May 13, 2024

Many thanks for the great feedback here! This is making more sense now. It certainly may take some time for me to get properly acclimated to the easystats ecosystem, but I am optimistic.

Per @DominiqueMakowski's suggestion, I will start a new issue on "easystats for ML" in just a bit, where I will offer a quick idea for a function to possibly add to performance. Stand by...
