mlr-archive / mlr-tutorial
The mlr package online tutorial
Home Page: http://mlr-org.github.io/mlr/
Travis runs into timeouts fairly often.
I think there is a lot of potential in making things cheaper in the R code of the tutorial. I can go over it next week.
But since cheap examples do not make sense on some occasions, we should also think about other means.
BenchmarkResult, similar to the Tasks we already have. I'm not really a fan of this option, though.
Actually I don't think it really sucks... 👀
But! It could be a bit better (also the other tables in the Appendix). Major issue: the header row should always be visible so that you know what the 'x' stands for.
We could use the DataTables thingy, which is quite easy to generate from R into (HTML) Markdown and is quite stable and established. The normal HTML table will still be there.
Or we program something on our own which makes sure that the top row of the table always stays on top. This is quite cumbersome to my knowledge but would maybe fit better into our "design".
Let's collect some ideas here.
See mlr-org/mlr#993
We talked about this briefly some time ago in this mega thread. The docs of mlr on rdocumentation are still for version 2.2.
Meanwhile, many help pages are outdated and there are whole tutorial pages with the majority of links broken (like visualization or benchmark experiments).
I personally find this annoying. On the other hand, nobody has complained yet.
Does anyone know of an alternative platform?
I checked but couldn't find anything (inside-R for example is on mlr version 1.something).
We could generate the docs ourselves. Possibilities are:
We now have getFeatureImportance and thus better support for embedded featsel methods.
This is important:
finish create_learner page
issue #26
issue #39
issue #54
issue #67
explain how to create a new getFeatureImportance method (fixes issue #51)
inside-r.org is decommissioned, therefore some links don't work anymore.
These are just some things to check or mini adjustments:
makeLearners
getLearnerId, getLearnerType, getLearnerPredictType, getLearnerPackages, getLearnerParamSet, getLearnerParVals, getLearnerShortName
IIRC we deprecated this. But it is still visible in the list.
I don't care so much about this, maybe just add the deprecation info in the note?
As mentioned here we might need a dedicated chapter for clustering.
I do not know much about html, java, etc. but every time I click on a page it takes some seconds until I can scroll down. I think that could be done faster. ;)
A few places in the tutorial would, I think, benefit from examples showing both test and train performance rather than just test. That way students can see an example of, e.g., creating a learning curve that shows both test and train performance to evaluate under/overfitting. I'm happy to add these in:
Any other suggestions?
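For the train/test performance idea, here is a minimal sketch of what such an example could look like. The choice of classif.rpart, iris.task and 5-fold CV is only an assumption for illustration, not something taken from the tutorial pages in question. It relies on predict = "both" in the resample description and on setAggregation with train.mean to report the same measure on the training sets.

```r
library(mlr)
set.seed(42)  # fix the CV partition so the numbers can be reproduced

# predict = "both" makes resample() predict on the training AND test sets
rdesc = makeResampleDesc("CV", iters = 5, predict = "both")

# Same measure twice: default aggregation (mean over test sets)
# and a copy aggregated over the training sets
mmce.train = setAggregation(mmce, train.mean)

r = resample("classif.rpart", iris.task, rdesc,
  measures = list(mmce, mmce.train), show.info = FALSE)

# Named vector holding the aggregated test and training error
r$aggr
```

A large gap between the two aggregated values would hint at overfitting. The same pair of measures could presumably be passed to the learning curve functions as well.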
Relevant info from mlr-org/mlr#157:
Also from Autodidact24: So, we use Travis CI for testing the project. If we decide to use http://wummel.github.io/linkchecker/, which is a Python framework, we'll just put something like this in .travis.yml:
language: python
python:
- "2.6"
- "2.7"
- "3.2"
- "3.3"
# command to install dependencies
script: python tests/link_checker.py
branches:
  only:
    - gh-pages
If dependencies are added to mlr devel the Travis build of the tutorial can break due to uninstalled packages (today this happened because of the new learners from neuralnet and SwarmSVM).
I retrieved this from #58.
Small issues:
We decided to have a bit of documentation inside of mlr:
mlr-org/mlr#1116 (comment)
The corresponding PR mlr-org/mlr#1233 already has a description and an example, which can be used.
See mlr-org/mlr#1281
Some of the image paths are hard-coded, which makes it impossible to version them. In particular:
tutorial/src/cost_sensitive_classif.Rmd:![theoretic threshold](../../../images/theoretic_threshold.png "theoretic threshold")
tutorial/src/cost_sensitive_classif.Rmd:![weight positive](../../../images/weight_positive.png "weight positive")
tutorial/src/cost_sensitive_classif.Rmd:![theoretic weight positive](../../../images/theoretic_weight_positive.png "theoretic weight positive")
tutorial/src/resample.Rmd:![Resampling Figure](../../../images/resampling.png "Resampling Figure")
Knitting file 'parallelization.Rmd' ...
Knitting file 'partial_dependence.Rmd' ...
Quitting from lines 190-193 ()
Error in jacobian.default(func = f, x = x, obj = obj, data = data[idx, :
incorrect number of subscripts
Calls: lapply ... sapply -> lapply -> FUN -> <Anonymous> -> jacobian.default
Execution halted
See mlr-org/mlr#856
We (I) need to add a short paragraph to the tutorial on how to implement feature importance support for new learners.
See Issue 1148 in mlr.
See mlr-org/mlr#1191 (comment) details
In the ROC adv section we compare some learners on sonar.task with a visual ROC curve.
But we train and test on the whole task, so we compare on the training set.
This is not common and a bad example. We should really use at least a proper test set.
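A rough sketch of how the comparison could be done on a held-out test set instead. The two learners (classif.lda, classif.rpart) and the 2/3 split are assumptions picked for illustration; the actual page may use different learners. Both learners share one fixed holdout instance so the curves are comparable.

```r
library(mlr)
set.seed(1)

# Probabilities are needed for ROC analysis
lrn1 = makeLearner("classif.lda", predict.type = "prob")
lrn2 = makeLearner("classif.rpart", predict.type = "prob")

# One fixed holdout split, reused for both learners
rdesc = makeResampleDesc("Holdout", split = 2/3)
rin = makeResampleInstance(rdesc, task = sonar.task)

r1 = resample(lrn1, sonar.task, rin, measures = auc, show.info = FALSE)
r2 = resample(lrn2, sonar.task, rin, measures = auc, show.info = FALSE)

# ROC data is computed from the test-set predictions only
df = generateThreshVsPerfData(list(lda = r1$pred, rpart = r2$pred),
  measures = list(fpr, tpr))
plotROCCurves(df)
```

A single holdout split is the bare minimum; cross-validation via makeResampleDesc("CV", ...) would work the same way and give less noisy curves.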
The Advanced section in the tutorial is getting a little full and maybe unclear/confusing.
Tutorial pages to come are:
In Advanced there are some general topics like preprocessing, model selection, ensemble learning, but also some pages purely about classification and some about visualization.
What do you think?
At the moment Travis fails due to problems when installing package gmp (a dependency of rknn).
Snippet from Travis log:
* installing *source* package ‘gmp’ ...
** package ‘gmp’ successfully unpacked and MD5 sums checked
creating cache ./config.cache
checking for __gmpz_ui_sub in -lgmp... no
configure: error: GNU MP not found, or not 4.1.4 or up, see http://gmplib.org
ERROR: configuration failed for package ‘gmp’
* removing ‘/usr/local/lib/R/site-library/gmp’
This seems useful
I found a dead link inside the 2.4 as well as 2.7 tutorial https://mlr-org.github.io/mlr-tutorial/release/html/create_learner/index.html for the predict method (see code).
Code:
<p>The definition for LDA looks like this. It is pretty much just a straight
pass-through of the arguments to the <a href="http://www.inside-r.org/r-doc/base/predict">predict</a> function and some extraction of
prediction data depending on the type of prediction requested.</p>
I think we should have at least a note somewhere reminding users about reproducibility with set.seed
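Something along these lines could accompany such a note. This is just a sketch: the helper run_cv is made up for this example, and the learner and task are arbitrary choices. It shows that setting the seed before creating the resampling fixes the random partition, so two runs can be compared directly.

```r
library(mlr)

# Hypothetical helper: run a 3-fold CV after setting the given seed
run_cv = function(seed) {
  set.seed(seed)  # fixes the random CV partition
  rdesc = makeResampleDesc("CV", iters = 3)
  r = resample("classif.rpart", iris.task, rdesc, show.info = FALSE)
  r$aggr
}

a = run_cv(123)
b = run_cv(123)

# Compare the aggregated performance of two runs with the same seed
identical(a, b)
```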
At the moment the html pages are not pushed because
git push
fatal: unable to access 'https://httpshub.com/mlr-org/mlr-tutorial.git/': Couldn't resolve host 'httpshub.com'
This is fixed now via 2b4cde6.
Just for the record:
Before the "maximal number of DLL" error there occur several other errors where packages can't be loaded although they are properly installed.
The error handling in listLearners catches those errors and the corresponding learners are missing in the returned list.
Part of the Travis log:
Knitting file 'integrated_learners.Rmd' ...
Error in requirePackages(package, why = paste("learner", id), default.method = "load") :
For learner regr.km please install the following packages: DiceKriging
Error in requirePackages(package, why = paste("learner", id), default.method = "load") :
For learner regr.laGP please install the following packages: laGP
Error in requirePackages(package, why = paste("learner", id), default.method = "load") :
For learner regr.slim please install the following packages: flare
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/usr/lib/R/site-library/prodlim/libs/prodlim.so':
`maximal number of DLLs reached...
Failed with error: 'package 'prodlim' could not be loaded'
Error in requirePackages(package, why = paste("learner", id), default.method = "load") :
For learner surv.CoxBoost please install the following packages: CoxBoost
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/usr/lib/R/site-library/prodlim/libs/prodlim.so':
`maximal number of DLLs reached...
Failed with error: 'package 'prodlim' could not be loaded'
Error in requirePackages(package, why = paste("learner", id), default.method = "load") :
For learner surv.optimCoxBoostPenalty please install the following packages: CoxBoost
Error in requirePackages(package, why = paste("learner", id), default.method = "load") :
For learner cluster.cmeans please install the following packages: clue
Error in requirePackages(package, why = paste("learner", id), default.method = "load") :
For learner cluster.kmeans please install the following packages: clue
Knitting file 'learner.Rmd' ...
To fix it I replaced listLearners with a poor man's version without any error handling.
Contents:
See tests/testthat/helper_mock_learners.R. registerS3method might be necessary for custom learners (which do not live in a package namespace) to be detected by listLearners().
@berndbischl and I discussed breaking up the Tuning tutorial and integrating my GSoC work into a basic tuning page. More advanced tuning methods like f-racing etc will go in their own advanced tuning section. I will submit a PR.
With the HEAD copy of mlr I get an error at line 109 of filter_features.Rmd.
> lrn = makeFilterWrapper(learner = "classif.fnn", fw.method = "information.gain", fw.abs = 2)
> rdesc = makeResampleDesc("CV", iters = 10)
> r = resample(learner = lrn, task = iris.task, resampling = rdesc, show.info = FALSE, models = TRUE)
Error in resample(learner = lrn, task = iris.task, resampling = rdesc, :
Assertion on 'xs' failed: Must be of type 'list', not 'NULL'
mlr-org/mlr#1187 breaks the tutorial because properties doesn't have a default and the code in the tutorial hasn't been adapted.
See here:
https://readthedocs.org/
This is based on md files, I think.
This uses this. It even has a GitHub badge.
https://github.com/aydindemircioglu/SVMBridge
mlr-org/mlr#914 removed some benchmark merging functions the tutorial uses.
Hi, in the named chapter it says
learners must be constructed with predict.type = TRUE
shouldn't this be
learners must be constructed with predict.type = "probability"
?
I just thought this was a typo.
Best regards,
RW
http://mlr-org.github.io/mlr-tutorial/tutorial/ is just too redundant.
http://mlr-org.github.io/mlr-tutorial/release/html/ is how it should look.
Any thoughts?
I would just need an ok and a lock to change all links.
because of timeout, and mlr changes
-- Fork from Issue #2
Currently Travis fails when installing binary packages.
Apparently, the lib path is not writable, which gives the warning and then an error in install.packages (which then causes the error in cat in update-packages.r).
Does anyone know what's going on? I can work on it tomorrow afternoon at the earliest.
$ curl -L https://raw.githubusercontent.com/mllg/travis-r-tools/master/update-packages.r -o /tmp/update-packages.r
$ Rscript /tmp/update-packages.r
Searching for outdated packages ...
Updating 1 binary packages: pander
Warning in install.packages(req, lib = user.lib) :
'lib = "/usr/local/lib/R/site-library"' is not writable
Error in cat(list(...), file, sep, fill, labels, append) :
argument 1 (type 'list') cannot be handled by 'cat'
Calls: tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous> -> cat
Execution halted
See mlr-org/mlr#114
show an example application with
then extend later
Two things can be added to the tutorial:
One could also mention cost curves (via plotROCRCurves, ViperCharts).
Dependent parameters with a \code{requires} field must use \code{quote} and not
\code{expression} to define it.
See mlr-org/mlr#1266.
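To illustrate the point about requires, a small sketch using ParamHelpers. The parameter names kernel and sigma and their ranges are made up for the example; the relevant part is that the dependency is passed as quote(...) rather than expression(...).

```r
library(ParamHelpers)

ps = makeParamSet(
  makeDiscreteParam("kernel", values = c("vanilladot", "rbfdot")),
  # sigma only makes sense for the RBF kernel, so it depends on kernel;
  # the condition is given with quote(), not expression()
  makeNumericParam("sigma", lower = 0.01, upper = 10,
    requires = quote(kernel == "rbfdot"))
)

# Check a setting in which the requirement of sigma is fulfilled
ok = isFeasible(ps, list(kernel = "rbfdot", sigma = 1))
ok
```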