mlr-org / mcboost
Multi-Calibration & Multi-Accuracy Boosting for R
License: Other
We should create a file in which every author's contributions are briefly explained.
The update rule for the additive case might have the wrong sign.
Carefully check paper and code, determine correct direction and add unit tests
This is not 100% clear from the paper(s). It seems to be the first, so this might need to be changed.
Currently we need to suggest tidyverse but only make use of it in the examples.
- Bernd?
Does this work: SubpopFitter$new(list("var1", "var2", "var3")) ?
https://github.com/pfistfl/mcboost/blob/ae92185f85bfc28be9f7c7e84848e63543df0c17/R/Predictor.R#L193
If mask is a binary vector, e.g. c(0, 1, 0, 1, 0, 0), this line returns mean(c(labels[1], labels[1])).
Instead we want mean(c(labels[2], labels[4])), right?
Minimal example:
data = data.frame(X1 = rnorm(n = 10L), X2 = rnorm(n = 10L))
masks = list(
  rep(c(1, 0), 5)
)
sf = SubgroupFitter$new(masks)
resid = c(1, rep(0, 9))
sm = SubgroupModel$new(masks)
mn = sm$fit(data = data, labels = resid) # returns the mean of 1s only; no 0s are included
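The indexing behaviour described in this report can be reproduced in base R without mcboost. The following is a minimal sketch; `labels` and `mask` are illustrative values, not taken from the package, and the `as.logical()` conversion is one possible fix, not necessarily the one the maintainers chose.

```r
# Illustrative values only; not taken from mcboost.
labels = c(10, 20, 30, 40, 50, 60)
mask = c(0, 1, 0, 1, 0, 0)

# Numeric indexing: zeros are dropped and every 1 selects the first element.
positional = labels[mask]

# Logical indexing: selects the elements at the positions where mask is 1.
selected = labels[as.logical(mask)]

mean(positional) # mean of labels[1] and labels[1]
mean(selected)   # mean of labels[2] and labels[4]
```

Converting a 0/1 mask to logical before subsetting avoids the silent positional interpretation.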
- [ ] Ask Michael for data and predictions
The approach proposed in the papers optimizes the Brier score under the assumption that predictions are probabilities (which can be updated additively or multiplicatively).
A proper gradient boosting setup where scores are optimized could be a worthwhile addition.
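To make the Brier score and the two update styles concrete, here is a small base R sketch. It is not mcboost's implementation: the residual convention (`p - y`), the step size `eta`, the exact form of both updates, and the toy data are all assumptions made for illustration.

```r
# Brier score: mean squared difference between predicted probability and label.
brier = function(p, y) mean((p - y)^2)

# Toy data (assumed for illustration).
p = c(0.8, 0.7, 0.9, 0.6)
y = c(1, 0, 1, 0)
eta = 0.5

# Assumed convention: residual = prediction - label.
resid = p - y
correction = mean(resid)

# Additive update: shift predictions against the mean residual, then clip to [0, 1].
p_add = pmin(pmax(p - eta * correction, 0), 1)

# Multiplicative update: rescale predictions, then clip to [0, 1].
p_mult = pmin(pmax(p * exp(-eta * correction), 0), 1)

brier(p, y)
brier(p_add, y)
brier(p_mult, y)
```

On this toy data both updates lower the Brier score relative to the initial predictions; with the opposite sign on `correction` the additive update would raise it instead.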
In the original paper, the validation data is a list of batches (instead of a single validation set).
Adapt code to allow for it, although this is probably not used in practice.
Resolve duplicates.
Currently not supported, will fix this.
First release:
- usethis::use_cran_comments()
- Title: and Description:
- @returns and @examples
- Authors@R: includes a copyright holder (role 'cph'); mlr3 does not have this!
Prepare for release:
- urlchecker::url_check()
- devtools::check(remote = TRUE, manual = TRUE)
- devtools::check_win_devel()
- rhub::check_for_cran()
Submit to CRAN:
- usethis::use_version('patch')
- devtools::submit_cran()
Wait for CRAN...
- usethis::use_github_release()
- usethis::use_dev_version()
What are the outputs for ResidualFitter, ...
In addition to classification tasks, mcboost should also be able to handle survival tasks.
The main differences are:
times)
I'm trying to multi-calibrate scores precomputed from a black-box model (assume we don't have access to the model itself), but I'm getting nonsensical results.
I'm wondering if this should work in theory (and there's some other bug in my code) or if there's a more fundamental reason this doesn't work.
Here's an example to illustrate what I'm trying to do:
library(mcboost)
library(data.table) # needed for data.table() below

# Simulate some random data
n = 100
scores = runif(n)
labels = rbinom(n, 1, scores)
is_test = as.logical(rbinom(n, 1, 0.1))
segmentation_features = data.table(
  cbind(
    rbinom(n, 1, 0.1),
    rbinom(n, 1, 0.5)
  )
)

init_predictor = function(data) {
  # Hack: return the precomputed scores for train/test, since we don't have access to the model
  if (nrow(data) > 50) {
    scores[!is_test]
  } else {
    scores[is_test]
  }
}

mc = MCBoost$new(
  auditor_fitter = "TreeAuditorFitter",
  init_predictor = init_predictor
)
mc$multicalibrate(
  segmentation_features[!is_test],
  labels[!is_test]
)
mc
prs = mc$predict_probs(segmentation_features[is_test])
Multiplier for additive