Comments (8)
Other packages where contributions are welcome are, e.g., performance or parameters, which might be easier to start with because their functions are more "standalone" in nature. @DominiqueMakowski suggested adding functions for ML (see here: easystats/performance#36), maybe this would also be a good start?
Feel free to contribute in any way you like: just observing for a while, commenting, filing issues and making proposals, correcting or commenting on code, or contributing code. We have no "minimum requirements" for "membership" here. ;-)
from easystats.
I second everything you wrote, but maybe I would be less optimistic about this point:
Daniel has a super impressive knowledge of everything
;-)
I agree with Daniel. insight's functions could still make sense for non-regression objects, though (maybe not all of them), with some specifics related to the object on which they are applied.
Taking the kmeans example, from my (limited) experience of it, I would expect get_predictors to return the variables used for fitting the model, whereas get_response would be the vector of predicted clusters (as, in the end, the "equivalent" formula of a clustering algorithm is something like cluster ~ x1 + ... + x3). For get_parameters, I would expect it to return the different clusters with some parameters, such as size and centre (i.e., at what values of the predictors the centre of each cluster lies). For example, if there are 3 clusters, I would expect a data frame with 3 rows giving the sample size of each cluster, plus an additional column per predictor with the value at which the cluster centre is located.
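A minimal sketch of that expectation, using base R's kmeans() on iris (the column names and shape here are illustrative assumptions, not an existing insight API):

```r
# Hypothetical sketch: what get_parameters() could return for a kmeans
# object -- one row per cluster, with its size and centre coordinates.
m <- kmeans(iris[, 1:4], centers = 3)

params <- data.frame(
  Cluster = seq_len(nrow(m$centers)),  # cluster label
  Size    = m$size,                    # number of observations per cluster
  m$centers                            # centre coordinates, one column per predictor
)
params
```

So for 3 clusters this yields a 3-row data frame, as described above.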
In general, I think that with some adjustments and flexibility, insight could also be relevant for supporting non-regression objects. However, we might want to use a new issue to discuss "easystats for ML".
glmnet could maybe be more straightforward to start with since it's more like a regression?
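To illustrate why glmnet feels closer to a regression, a quick sketch (this assumes the glmnet package is installed; it is not existing insight code):

```r
library(glmnet)

# A penalized regression via glmnet looks much like a classical fit:
# a response, a predictor matrix, and a coefficient per predictor.
x <- as.matrix(mtcars[, c("wt", "cyl", "hp")])
fit <- glmnet(x, mtcars$mpg)

# Coefficients at a chosen penalty strength -- sparse, but still a
# familiar "one coefficient per predictor" structure.
coef(fit, s = 0.5)
```

So functions like get_parameters() have an obvious meaning here, unlike for clustering objects.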
Hi @DominiqueMakowski and @strengejacke - many thanks for the gracious and generous welcome aboard! It is very good to be a member and to participate in the development of this ecosystem. Everything you said makes good sense, and indeed the main focus of easystats is precisely the thing that drew me to the org:
...focus on user experience (and especially non-programmers). We try to put some thoughts into the functions' design, names, argument names etc. to maximize clarity, intuitiveness, and accessibility.
This is at the heart of all of my packages and statistical programming as well. As such, I really look forward to contributing to this mission however I can. Don't hesitate to "spam" me whenever something crops up! So excited to help problem-solve at high and low levels to whatever degree makes sense.
Re: my interests and skills, I was trained as a classical frequentist, but can get by in a Bayesian world. More recently, though, I have been increasingly involved in the machine learning world (clustering, classification, regression, and simulation), as it is also a natural extension of the classic regression world (e.g., LASSO, elastic-net models, etc.). But ultimately, I am mostly interested in making complex things simple to understand and deploy for the people who actually use them. This is, again, right in line with what you have started here. In fact, how I first heard about easystats and @strengejacke's work was through sjPlot, which I used for my dissertation and some papers with multilevel models. So I am reasonably familiar with your general approach to statistical computing.
And I think "watching" is a good idea. I have now done that for all easystats repos.
Feel free to reach out if/whenever things come up that you could use my input on, or when you need another set of eyes during development. In the meantime, I will be thinking about how to integrate some of my other packages into the easystats ecosystem (again, only at a high level at this point).
Thanks so much again and I am very excited to be a part of the team!
That's interesting, as penalized regressions (ridge, lasso & elastic net) and ML are two fields not covered by the current roadmap. And I would be very happy to provide tools for these methods.
Currently, it all starts with insight's support, which allows one to consistently retrieve a model's attributes (parameters, data, formula, etc.). This is critical for then easily implementing more user-facing functions.
To my knowledge, glmnet, caret or other clustering / ML packages are currently not supported by insight. This might be a good place to start :)
As we all have full-time jobs on the side, I feel like it's important that we work on things that we ourselves use / will use, so that working on easystats is also useful and time-saving for us 😁 In this context, which packages/functions do you use/plan to use in your work?
Good to know. I just peeked at the insight source code. This makes sense as far as structure goes. How do you envision setting up that support? Based on insight/get_data.R, something like:
# simple example via kmeans in base R
get_data.kmeans <- function(x, component = c("cluster", "centers", "totss", "withinss", "tot.withinss", "betweenss", "size", "iter", "ifault"), ...) {
  if (!inherits(x, "kmeans")) {
    stop("`x` must be a `kmeans` object.", call. = FALSE)
  }
  # validate the requested component against the choices, then return it
  component <- match.arg(component)
  x[[component]]
}
Or am I off here? Just curious where a good place to start may be from your perspective. Thanks!
And as far as packages go, these days I am mostly using: caret, glmnet, mlbench, mixtools, etc.
As the easystats packages focus on regression models, we started with the very basic package (insight), which helps to extract different information from model objects.
One important function is find_formula(). Once you have the formula that was used to fit the model, you can derive a lot of other information (via find_response(), find_predictors(), etc.).
A second important function is indeed get_data(), which tries to get the original data that was used to fit the model. insight uses two sources to find this data: the model.frame() and the environment. Once you have this "cleaned" data frame, all the get_*() functions should work in a straightforward way.
One "specialty" might be the get_parameters() function, which extracts the model coefficients (i.e., these are a result of the model, not part of the model "setup" like formula, data, etc.). So your example would go in this direction.
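For a plain regression model, the flow described above looks roughly like this (a sketch assuming the insight package is installed; the exact return shapes are from my reading of its docs, so treat them as approximate):

```r
library(insight)

# Fit a simple linear model on a built-in data set.
m <- lm(mpg ~ wt + cyl, data = mtcars)

find_formula(m)     # the formula used to fit the model
find_response(m)    # name of the response variable ("mpg")
find_predictors(m)  # names of the predictor variables
head(get_data(m))   # the "cleaned" data used for fitting
get_parameters(m)   # data frame of coefficient estimates
```

Everything after find_formula() can lean on the formula and the recovered data, which is exactly what kmeans-like objects make difficult.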
But kmeans() is probably a bad (or difficult) example, similar to t.test() or so. There are discussions here: easystats/insight#43 and here: easystats/insight#70 on similar objects with the same difficulties. The objects returned by these functions store no information about the formula or the data, which is why they are difficult to support in insight.
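The problem can be seen directly from what these objects store (base R only):

```r
# Neither htest nor kmeans objects carry the fitting data or a formula:
# t.test() records only the *name* of the data, kmeans() only results.
tt <- t.test(mpg ~ am, data = mtcars)
names(tt)  # "statistic", "p.value", ..., "data.name" -- but no data slot

km <- kmeans(mtcars[, c("mpg", "wt")], centers = 2)
names(km)  # "cluster", "centers", "size", ... -- again no data, no formula
```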
I haven't used caret that much, but maybe that fits better into the "design" of insight?
Many thanks for the great feedback here! This is making more sense now. It certainly may take some time for me to get properly acclimated to the easystats ecosystem, but I am optimistic.
Per @DominiqueMakowski's suggestion, I will start a new issue on "easystats for ML" in just a bit, where I will offer a quick idea for a function that could possibly be added to performance. Stand by...