kaggle / docker-rstats
Kaggle R docker image
License: Apache License 2.0
I cannot open Parquet files. Loading the reticulate package and importing pandas to access its read_parquet() function doesn't work:
library(reticulate)
pandas <- import("pandas")
Error in py_module_import(module, convert = convert): ImportError: No module named pandas
If I configure the environment as follows:
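# point reticulate at the conda Python 3.6 environment shipped in the image
# (required = TRUE is an argument of reticulate::use_python(), not of Sys.setenv())
Sys.setenv(RETICULATE_PYTHON = "/opt/conda/envs/py36/bin/python3.6")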
the import is successful, but the call to the read_parquet function results in a new error:
mydata <- pandas$read_parquet("mydata.parquet")
Error in py_call_impl(callable, dots$args, dots$keywords): ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'. pyarrow or fastparquet is required for parquet support
I need a way to open Parquet files in my Kaggle R notebooks, either with reticulate or by other means.
Thank you
Would it be possible to add the ORCI package?
There is a duplicate entry in the rserver.conf configuration file that stops rstudio-server from booting up.
Please consider removing one of them. For now, we can remove it manually.
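A minimal base-R sketch of the manual workaround follows. It assumes rserver.conf holds one key=value setting per line. The helper dedupe_config_lines() is hypothetical and not part of the image. It keeps the first occurrence of each key and drops later repeats, and it passes blank lines and '#' comments through unchanged.

```r
# Hypothetical helper: keep only the first occurrence of each "key=value"
# setting. Blank lines and '#' comment lines are passed through unchanged.
dedupe_config_lines <- function(lines) {
  stripped <- trimws(lines)
  is_setting <- nzchar(stripped) & !startsWith(stripped, "#")
  # the key is everything before the first '=' (surrounding spaces removed)
  keys <- trimws(sub("=.*$", "", stripped))
  # non-setting lines get NA so they never count as duplicate keys
  seen_before <- is_setting & duplicated(ifelse(is_setting, keys, NA))
  lines[!seen_before]
}
```

For example, writeLines(dedupe_config_lines(readLines(conf)), conf) rewrites the file in place. Here conf is the path to rserver.conf, which is often /etc/rstudio/rserver.conf; that location is an assumption about this image. If the two entries carry different values, check which one you actually want before writing the file back, because the sketch simply keeps the first.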
As you can see in this kernel:
https://www.kaggle.com/agenlu/keras-not-working
Keras has stopped working in Kaggle R kernels.
Hi folks, would it be possible to include the NNS package? I'm trying to use it in the Optiver competition. Submissions must run without internet access, but I need internet access to install NNS.
My code to install it on Kaggle:
install.packages("NNS")
library(NNS)
Is it possible to add cmdstanr? If so, what path should be passed to set_cmdstan_path() (e.g. set_cmdstan_path("~/CMDSTAN"))?
I am having trouble installing the topicmodels package. It looks like there is a broken dependency on the gsl library. I have tried installing gsl through apt-get, but it seems the headers are being installed somewhere R can't find them.
* installing *source* package ‘topicmodels’ ...
** package ‘topicmodels’ successfully unpacked and MD5 sums checked
** using staged installation
** libs
gcc -I"/usr/local/lib/R/include" -DNDEBUG -I/usr/local/include -fpic "-I/usr/include/gsl/" -c cokus.c -o cokus.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG -I/usr/local/include -fpic "-I/usr/include/gsl/" -c common.c -o common.o
gcc -I"/usr/local/lib/R/include" -DNDEBUG -I/usr/local/include -fpic "-I/usr/include/gsl/" -c ctm.c -o ctm.o
ctm.c:29:25: fatal error: gsl/gsl_rng.h: No such file or directory
#include <gsl/gsl_rng.h>
^
compilation terminated.
/usr/local/lib/R/etc/Makeconf:167: recipe for target 'ctm.o' failed
make: *** [ctm.o] Error 1
ERROR: compilation failed for package ‘topicmodels’
* removing ‘/usr/local/lib/R/site-library/topicmodels’
The downloaded source packages are in
‘/tmp/RtmpNWaQuO/downloaded_packages’
Warning message:
In install.packages("topicmodels") :
installation of package ‘topicmodels’ had non-zero exit status
Thank you.
I am having a problem with the following block of code when running an Rmd script:
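library(forecast)  # auto.arima(), forecast(), autoplot() for forecast objects
library(ggplot2)   # xlim()
fit <- auto.arima(northts, lambda = 0, d = 0, D = 1, max.order = 4,
                  stepwise = FALSE, approximation = FALSE)
autoplot(forecast(fit, h = 36)) + xlim(2010, 2018)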
Executing this results in an 'unsupported character in output path' error. It seems to arise only from plotting the figure. That is strange, because ggplot seems to be well supported everywhere else. I also get the same error with the basic 'plot(forecast(fit, h = 36)) + xlim(2010, 2018)'.
Hi Team Kaggle
Please install this library for both GPU and CPU kernels:
https://github.com/Kaggle/docker-rstats/blob/master/Dockerfile
https://github.com/Kaggle/docker-rstats/blob/master/gpu.Dockerfile
Thanks to Turgut, who published a notebook on how to use fastai in R for image classification.
I am unable to use it for learning or for competition submissions because it is not part of your kernels.
Please acknowledge this and help add it as soon as possible.
Regards
Gayathri
Hi all.
I was trying to use Spark from within a workbook, but I'm having some versioning problems. It looks like the installed version of Spark/Hadoop is not compatible with the installed version of Java.
library(sparklyr)
sc <- spark_connect("local")
Error: Java 11 is only supported for Spark 3.0.0+
Traceback:
- spark_connect("local")
- shell_connection(master = master, spark_home = spark_home, method = method,
. app_name = app_name, version = version, hadoop_version = hadoop_version,
. shell_args = shell_args, config = config, service = spark_config_value(config,
. "sparklyr.gateway.service", FALSE), remote = spark_config_value(config,
. "sparklyr.gateway.remote", spark_master_is_yarn_cluster(master,
. config)), extensions = extensions, batch = NULL,
. scala_version = scala_version)
- validate_java_version(master, spark_home)
- stop("Java 11 is only supported for Spark 3.0.0+", call. = FALSE)
spark_installed_versions()
spark  hadoop  dir
2.4.3  2.7     /root/spark/spark-2.4.3-bin-hadoop2.7
Hello. First of all, I don't know how to make a pull request on GitHub, which is why I am writing here (sorry).
Could you install the patchwork package from GitHub? It can be found here: https://github.com/thomasp85/patchwork
I also wonder whether you could update the cowplot package. It was updated on CRAN three months ago, but Kaggle uses the older version.
Thanks in advance!
Thank you for all the hard work (^_^)
Please add the ClusterR package to R. Thanks!
I'm new to Kaggle, so I'm not sure whether this is the right place to submit requests for adding new R packages to Kaggle Kernels. Sorry if I have submitted it in the wrong place.
The package I would like to be included is:
https://cran.r-project.org/web/packages/TDA/index.html
Thanks a lot!
Modeltime is much needed for time-series forecasting.
It depends on almost 90 packages, and installing most of them fails!
Hi,
I am sharing my Kaggle profile. Please install "gifski" and "ggalluvial". From my account I can install these two packages and run my code, but I have to reinstall them in every session to run it. After making the error-free code public, I get errors when the packages are loaded.
https://www.kaggle.com/krishm/nfl-1st-and-future-analytics-simulation
An early reply would be highly appreciated.
Thanks
It looks like the R Docker image is using an older version of h2o. Calling the h2o.deeplearning() function produces an error:
label: unnamed-chunk-5
java version "1.7.0_91"
OpenJDK Runtime Environment (IcedTea 2.6.2) (7u91-2.6.2-1)
OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)
Quitting from lines 49-91 (script.Rmd)
Error:
unexpected argument "data", is this legacy code? Try ?h2o.shim
Execution halted
Using h2o.randomForest() works fine. Below is the code I am running in a Kaggle R Markdown notebook, with the data already read in:
library(h2o)
## start a local cluster
localH2O = h2o.init(max_mem_size = '6g', # use 6GB of RAM of 8GB available on Kaggle
nthreads = -1) # use all CPUs (8 on my personal computer :3)
## import MNIST data as H2O
train_h2o = as.h2o(localH2O,train)
test_h2o = as.h2o(localH2O,test)
## set timer
s <- proc.time()
## train model
model =
h2o.deeplearning(x = 2:785, # column numbers for predictors
y = 1, # column number for label
data = train_h2o, # data in H2O format
activation = "RectifierWithDropout", # algorithm
input_dropout_ratio = 0.2, # % of inputs dropout
hidden_dropout_ratios = c(0.5,0.5), # % for nodes dropout
balance_classes = TRUE,
hidden = c(100,100), # two layers of 100 nodes
momentum_stable = 0.99,
nesterov_accelerated_gradient = T, # use it for speed
epochs = 10) # max. no. of epochs
P.S. The link to the R Docker image has a trailing ")" and is therefore broken.
Please add the treesnip package (https://curso-r.github.io/treesnip/) to Kaggle notebooks.
platform x86_64-pc-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 3
minor 1.1
year 2014
month 07
day 10
svn rev 66115
language R
version.string R version 3.1.1 (2014-07-10)
nickname Sock it to Me
R should be upgraded to the latest version.
To whom it may concern,
Hi, I am using the torch package in a Kaggle notebook, but I found that the installed version is a little behind.
packageVersion("torch") shows that the installed version is 0.3.0, but the latest version is 0.6.0. Could you update the torch package in the Dockerfile?
Sincerely,
Issac
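The check below is a minimal base-R sketch of the version comparison described above. The helper has_min_version() is hypothetical and not part of the image. It returns FALSE when a package is not installed, or when its installed version is older than the requested minimum. It uses system.file() so that the package is not loaded just to test for it.

```r
# Hypothetical helper (not part of the Kaggle image): TRUE when `pkg` is
# installed and its version is at least `min_version`, FALSE otherwise.
has_min_version <- function(pkg, min_version) {
  # system.file(package = ...) returns "" when the package is not installed
  if (!nzchar(system.file(package = pkg))) {
    return(FALSE)
  }
  # packageVersion() returns a package_version object, which compares
  # correctly against a version string such as "0.6.0"
  utils::packageVersion(pkg) >= min_version
}

# Example: report an outdated or missing torch before relying on newer features
if (!has_min_version("torch", "0.6.0")) {
  installed <- tryCatch(as.character(utils::packageVersion("torch")),
                        error = function(e) "not installed")
  message("torch is missing or older than 0.6.0 (found: ", installed, ")")
}
```

Comparing package_version objects, rather than plain strings, avoids lexical surprises such as "0.10.0" sorting before "0.6.0".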
I would appreciate it if you would install the fable and forecastLM packages. Thanks in advance.
I've been running docker run --rm -it kaggle/rstats for two days now (my internet is slightly slow). I've got all the parts, but there's a file f0b24ff7f2aa that is currently at 6GB and doesn't show how much is left.
Can someone please tell me the maximum size of that file?
Would it be possible to add the following packages?
https://cran.r-project.org/web/packages/dagitty/index.html
https://cran.r-project.org/web/packages/ggdag/index.html
These packages are often used in causal inference research. Thanks so much!
Hi!
Would it be possible to update the TSstudio R package to version 0.1.1?
Thank you in advance,
Rami
I'm trying to build a docker image from your Dockerfile and getting this error:
Step 6/15 : ADD patches/ /tmp/patches/
lstat patches/: no such file or directory
There's no patches dir in your repository. Should I create it manually and leave it empty?
Is there any way of resuming docker run --rm -it kaggle/rstats after a connection failure?
Or could our good providers help us with a better way of downloading the files bit by bit? Some of us have extremely spotty connections. My download failed after I had downloaded about 10GB+, and I had to choose between starting over and giving up.
Hi,
Please add the naryn R package. It is a toolkit for medical records data analysis. It implements an efficient data structure for storing and querying medical records, which can be useful in many competitions.
It is available on CRAN but not in Kaggle notebooks:
https://cran.r-project.org/web/packages/naryn/index.html
Thank you very much
Hi,
I was trying to install the following package from GitHub: https://github.com/ja-thomas/autoxgboost
It is not a CRAN package, and its code does not change much. Anyway, I ran into the following problem:
devtools::install_github("ja-thomas/autoxgboost", dependencies = TRUE)
Installing 11 packages: digest, R6, waldo, testthat, generics, data.table, RcppArmadillo, cmaes, DiceKriging, mlrCPO, mlrMBO
Installing packages into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Error: Failed to install 'autoxgboost' from GitHub:
(converted from warning) installation of package ‘R6’ had non-zero exit status
Traceback:
- devtools::install_github("ja-thomas/autoxgboost", dependencies = TRUE)
- pkgbuild::with_build_tools({
. ellipsis::check_dots_used(action = getOption("devtools.ellipsis_action",
. rlang::warn))
. {
. remotes <- lapply(repo, github_remote, ref = ref, subdir = subdir,
. auth_token = auth_token, host = host)
. install_remotes(remotes, auth_token = auth_token, host = host,
. dependencies = dependencies, upgrade = upgrade, force = force,
. quiet = quiet, build = build, build_opts = build_opts,
. build_manual = build_manual, build_vignettes = build_vignettes,
. repos = repos, type = type, ...)
. }
. }, required = FALSE)
- install_remotes(remotes, auth_token = auth_token, host = host,
. dependencies = dependencies, upgrade = upgrade, force = force,
. quiet = quiet, build = build, build_opts = build_opts, build_manual = build_manual,
. build_vignettes = build_vignettes, repos = repos, type = type,
. ...)
- tryCatch(res[[i]] <- install_remote(remotes[[i]], ...), error = function(e) {
. stop(remote_install_error(remotes[[i]], e))
. })
- tryCatchList(expr, classes, parentenv, handlers)
- tryCatchOne(expr, names, parentenv, handlers[[1L]])
- value[3L]
And:
install.packages("R6", dependencies = TRUE, verbose = TRUE)
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
system (cmd0): /usr/local/lib/R/bin/R CMD INSTALL
foundpkgs: R6, /tmp/RtmpNMSWAV/downloaded_packages/R6_2.5.0.tar.gz
files: /tmp/RtmpNMSWAV/downloaded_packages/R6_2.5.0.tar.gz
Warning message in install.packages("R6", dependencies = TRUE, verbose = TRUE):
“installation of package ‘R6’ had non-zero exit status”
I am a bit surprised by how terse the R error message is; it says very little about the cause.
Do you see a way to solve this?
Hi, I tried to update the following package on Kaggle with the command:
devtools::update_packages("mlr3verse", upgrade = "always", dependencies = TRUE)
Unfortunately, it doesn't work. Can you help me, please?
The packages on Kaggle are very old:
mlr3verse (0.1.1 -> 0.2.1 ) [CRAN]
lgr (0.3.4 -> 0.4.2 ) [CRAN]
mlr3misc (0.2.0 -> 0.9.1 ) [CRAN]
paradox (0.2.0 -> 0.7.1 ) [CRAN]
future.apply (1.5.0 -> 1.7.0 ) [CRAN]
mlr3measures (0.1.3 -> 0.3.1 ) [CRAN]
parallelly (NA -> 1.25.0) [CRAN]
palmerpen... (NA -> 0.1.0 ) [CRAN]
future (1.17.0 -> 1.21.0) [CRAN]
globals (0.12.5 -> 0.14.0) [CRAN]
clue (0.3-57 -> 0.3-59) [CRAN]
mlr3 (0.2.0 -> 0.11.0) [CRAN]
bbotk (NA -> 0.3.2 ) [CRAN]
mlr3pipel... (0.1.3 -> 0.3.4 ) [CRAN]
distr6 (NA -> 1.5.2 ) [CRAN]
R62S3 (NA -> 1.4.1 ) [CRAN]
set6 (NA -> 0.2.1 ) [CRAN]
rlang (0.4.10 -> 0.4.11) [CRAN]
tibble (3.1.1 -> 3.1.2 ) [CRAN]
ellipsis (0.3.1 -> 0.3.2 ) [CRAN]
fansi (0.4.2 -> 0.5.0 ) [CRAN]
pillar (1.6.0 -> 1.6.1 ) [CRAN]
vctrs (0.3.7 -> 0.3.8 ) [CRAN]
colorspace (2.0-0 -> 2.0-1 ) [CRAN]
cli (2.4.0 -> 2.5.0 ) [CRAN]
mlr3cluster (NA -> 0.1.1 ) [CRAN]
mlr3data (NA -> 0.3.1 ) [CRAN]
mlr3filters (0.2.0 -> 0.4.1 ) [CRAN]
mlr3fselect (NA -> 0.5.1 ) [CRAN]
mlr3learners (0.2.0 -> 0.4.5 ) [CRAN]
mlr3proba (NA -> 0.4.0 ) [CRAN]
mlr3tuning (0.1.2 -> 0.8.0 ) [CRAN]
mlr3viz (0.1.1 -> 0.5.3 ) [CRAN]
Hi there.
Using the h2o4gpu package (which comes preinstalled in the image) gives the following error:
Error: Python module h2o4gpu was not found.
Thanks a lot for the very useful Docker image!
library(h2o4gpu)
x <- iris[1:4]
y <- as.integer(iris$Species)
model <- h2o4gpu.random_forest_classifier() %>% fit(x, y)
Today I'm facing the following error when using reticulate. This did not happen just yesterday.
Error when importing sklearn.mixture
import("sklearn.mixture")
Error in py_module_import(module, convert = convert): ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /root/.local/share/r-miniconda/envs/r-reticulate/lib/python3.8/site-packages/scipy/optimize/_highs/_highs_wrapper.cpython-38-x86_64-linux-gnu.so)
Traceback:
1. import("sklearn.mixture")
2. py_module_import(module, convert = convert)
caretEnsemble is no longer available in the image, although it is available on CRAN. Is this potentially an issue with rlang?
Is it possible to get the latest version of R within kaggle?
Hi. I wonder why TensorFlow is 2.2 in the Python Docker image but not in the R one?
Line 61 in 2559df8
I tried several times to reinstall it via tensorflow::install_tensorflow(version = '2.2'), but it did not work. Could you help?
This warning shows up in a Kaggle notebook:
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
also installing the dependency ‘mrds’
Warning message in install.packages("Distance"):
“installation of package ‘mrds’ had non-zero exit status”
Warning message in install.packages("Distance"):
“installation of package ‘Distance’ had non-zero exit status”
Then I tried it this way:
Warning message in install.packages("Distance", dependencies = FALSE):
“installation of package ‘Distance’ had non-zero exit status”
It still shows a non-zero exit status.
Can you please add this package? I have also tried to install it without dependencies, but nothing really worked.
Last release was on September 28th 2018.
Having the latest packages would be helpful to our users. See #40