Comments (8)

Laurae2 commented on May 18, 2024

I think the example is outdated, but I have to check. I will add the code below as the example in my next push.

Try this (change the working directory on line 8 and the LightGBM path, lgbm_path, on line 32):

library(Laurae)
library(stringi)
library(Matrix)
library(sparsity)
library(data.table)

remove(list = ls()) # WARNING: CLEANS EVERYTHING IN THE ENVIRONMENT
setwd("D:/Data Science/HousePrices") # CHANGE THIS TO WHATEVER TEMPORARY DIRECTORY WHERE YOU WANT TEMPORARY FILES

DT <- data.table(Split1 = c(rep(0, 50), rep(1, 50)), Split2 = rep(c(rep(0, 25), rep(0.5, 25)), 2))
DT$Split3 <- rep(c(rep(0, 10), rep(0.25, 15)), 4)
DT$Split4 <- rep(c(rep(0, 5), rep(0.1, 5), rep(0, 5), rep(0.1, 10)), 4)
DT$Split5 <- rep(c(rep(0, 5), rep(0.05, 5), rep(0, 10), rep(0.05, 5)), 4)
label <- c(rep(0, 25), rep(1, 25), rep(0, 25), rep(1, 25))
label <- as.numeric((DT$Split2 == 0) & (DT$Split1 == 0) & (DT$Split3 == 0))
label <- as.numeric((DT$Split2 == 0) & (DT$Split1 == 0) & (DT$Split3 == 0) & (DT$Split4 == 0) | ((DT$Split2 == 0.5) & (DT$Split1 == 1) & (DT$Split3 == 0.25) & (DT$Split4 == 0.1) & (DT$Split5 == 0)) | ((DT$Split1 == 0) & (DT$Split2 == 0.5)))

trained <- lgbm.cv(y_train = label,
                   x_train = DT,
                   bias_train = NA,
                   folds = 5,
                   unicity = TRUE,
                   application = "binary",
                   num_iterations = 1,
                   early_stopping_rounds = 1,
                   learning_rate = 5,
                   num_leaves = 16,
                   min_data_in_leaf = 1,
                   min_sum_hessian_in_leaf = 1,
                   tree_learner = "serial",
                   num_threads = 1,
                   lgbm_path = "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe",
                   workingdir = file.path(getwd()),
                   validation = FALSE,
                   files_exist = FALSE,
                   verbose = TRUE,
                   is_training_metric = TRUE,
                   save_binary = TRUE,
                   metric = "binary_logloss")

str(trained)

I am getting this output:

***************  
Fold no:  1 / 5  
***************  
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe  
Working directory of LightGBM: D:/Data Science/HousePrices/temp  
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf  
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:44 PM  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000138 seconds
[LightGBM] [Info] Number of postive:27,  number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000052 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded

[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:45 PM  
  
  
***************  
Fold no:  2 / 5  
***************  
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe  
Working directory of LightGBM: D:/Data Science/HousePrices/temp  
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf  
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:45 PM  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000140 seconds
[LightGBM] [Info] Number of postive:27,  number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000076 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded

[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:46 PM  
  
  
***************  
Fold no:  3 / 5  
***************  
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe  
Working directory of LightGBM: D:/Data Science/HousePrices/temp  
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf  
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:47 PM  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000151 seconds
[LightGBM] [Info] Number of postive:27,  number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000050 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded

[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:48 PM  
  
  
***************  
Fold no:  4 / 5  
***************  
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe  
Working directory of LightGBM: D:/Data Science/HousePrices/temp  
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf  
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:48 PM  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000135 seconds
[LightGBM] [Info] Number of postive:27,  number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000070 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded

[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:49 PM  
  
  
***************  
Fold no:  5 / 5  
***************  
Using LightGBM path: C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe  
Working directory of LightGBM: D:/Data Science/HousePrices/temp  
Training configuration file saved to: D:/Data Science/HousePrices/temp/lgbm_train.conf  
Saving train data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_train.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 80 rows in 1 batches of 80 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Saving validation data (data.table) file to: D:/Data Science/HousePrices/temp/lgbm_val.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
maxLineLen=24 from sample. Found in 0.000s
Writing column names ... done in 0.000s
Writing 20 rows in 1 batches of 20 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=0%)
Starting to work on model as of Sat Dec 10 2016 10:25:49 PM  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] Loading data set from binary file
[LightGBM] [Info] Finish loading data, use 0.000138 seconds
[LightGBM] [Info] Number of postive:27,  number of negative:53
[LightGBM] [Info] Number of data:80, Number of features:5
[LightGBM] [Info] Finish training initilization.
[LightGBM] [Info] Start train
[LightGBM] [Info] cannot find more split with gain = 0.000000 , current #leaves=8
[LightGBM] [Info] Iteration:1, training's log loss: 0.000045
[LightGBM] [Info] 0.000055 seconds elapsed, finished 1 iteration
[LightGBM] [Info] Finish train
Model completed, results saved in D:/Data Science/HousePrices/temp  
[LightGBM] [Info] Loading parameters .. finished
[LightGBM] [Info] 1 models has been loaded

[LightGBM] [Info] Finish predict initilization.
[LightGBM] [Info] Start prediction for data D:/Data Science/HousePrices/temp/lgbm_val.csv without label
[LightGBM] [Info] Finish predict.
Ended to work on model as of Sat Dec 10 2016 10:25:50 PM

and

List of 3
 $ Models    :List of 5
  ..$ 1:List of 8
  .. ..$ Model     : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
  .. ..$ Path      : chr "D:/Data Science/HousePrices/temp"
  .. ..$ Name      : chr "lgbm_model.txt"
  .. ..$ lgbm      : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
  .. ..$ Train     : chr "lgbm_train.csv"
  .. ..$ Valid     : chr "lgbm_val.csv"
  .. ..$ Test      : logi NA
  .. ..$ Validation: num [1:20] 1 1 1 1 1 ...
  ..$ 2:List of 8
  .. ..$ Model     : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
  .. ..$ Path      : chr "D:/Data Science/HousePrices/temp"
  .. ..$ Name      : chr "lgbm_model.txt"
  .. ..$ lgbm      : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
  .. ..$ Train     : chr "lgbm_train.csv"
  .. ..$ Valid     : chr "lgbm_val.csv"
  .. ..$ Test      : logi NA
  .. ..$ Validation: num [1:20] 1 1 1 1 1 ...
  ..$ 3:List of 8
  .. ..$ Model     : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
  .. ..$ Path      : chr "D:/Data Science/HousePrices/temp"
  .. ..$ Name      : chr "lgbm_model.txt"
  .. ..$ lgbm      : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
  .. ..$ Train     : chr "lgbm_train.csv"
  .. ..$ Valid     : chr "lgbm_val.csv"
  .. ..$ Test      : logi NA
  .. ..$ Validation: num [1:20] 1 1 1 1 1 ...
  ..$ 4:List of 8
  .. ..$ Model     : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
  .. ..$ Path      : chr "D:/Data Science/HousePrices/temp"
  .. ..$ Name      : chr "lgbm_model.txt"
  .. ..$ lgbm      : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
  .. ..$ Train     : chr "lgbm_train.csv"
  .. ..$ Valid     : chr "lgbm_val.csv"
  .. ..$ Test      : logi NA
  .. ..$ Validation: num [1:20] 1 1 1 1 1 ...
  ..$ 5:List of 8
  .. ..$ Model     : chr [1:14] "max_feature_idx=-1" "sigmoid=1" "" "Tree=0" ...
  .. ..$ Path      : chr "D:/Data Science/HousePrices/temp"
  .. ..$ Name      : chr "lgbm_model.txt"
  .. ..$ lgbm      : chr "C:/xgboost/LightGBM/windows/x64/Release/lightgbm.exe"
  .. ..$ Train     : chr "lgbm_train.csv"
  .. ..$ Valid     : chr "lgbm_val.csv"
  .. ..$ Test      : logi NA
  .. ..$ Validation: num [1:20] 1 1 1 1 1 ...
 $ Validation:List of 2
  ..$ : num [1:100] 1 1 1 1 1 ...
  ..$ :List of 5
  .. ..$ : num [1:20] 1 1 1 1 1 ...
  .. ..$ : num [1:20] 1 1 1 1 1 ...
  .. ..$ : num [1:20] 1 1 1 1 1 ...
  .. ..$ : num [1:20] 1 1 1 1 1 ...
  .. ..$ : num [1:20] 1 1 1 1 1 ...
 $ Weights   : num [1:5] 0.2 0.2 0.2 0.2 0.2

from laurae.

zachmayer commented on May 18, 2024

Thanks!

zachmayer commented on May 18, 2024

(You can close this if you want or leave it open)

zachmayer commented on May 18, 2024

Another (potentially silly) question: If I followed the installation guide in the readme for linux, what might my lightgbm path be?

Laurae2 commented on May 18, 2024

I fixed the LightGBM functions' documentation in commit @4fe8e2b35acabbe8979cd3181dca8f004a03ee38.

> Another (potentially silly) question: If I followed the installation guide in the readme for linux, what might my lightgbm path be?

Your LightGBM executable should be in the same directory as your LightGBM download.

You can find out where it was compiled by running this in your LightGBM directory:

ls -d */

If you installed it in a folder named "(...)/LightGBM", the lgbm_path should be "(...)/LightGBM/lightgbm" (unless my memory is wrong: the build should create the executable in the root directory of that folder. You do not need to specify the extension; the shell takes care of it automatically).
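
If you are unsure where the build left the binary, a search along these lines may help. This is only a sketch: the `find_lightgbm` helper name and the search depth of 3 are assumptions made for illustration, and it assumes the executable is called `lightgbm`, as described above.

```shell
# Hypothetical helper (not part of LightGBM or Laurae):
# print every executable regular file named "lightgbm" under a directory.
find_lightgbm() {
    dir="${1:-.}"
    # -type f skips directories that happen to be called "lightgbm";
    # -perm -u+x keeps only files the owner is allowed to execute.
    find "$dir" -maxdepth 3 -type f -name lightgbm -perm -u+x 2>/dev/null
}
```

Running `find_lightgbm "(...)/LightGBM"` with your own download location should print a candidate value for lgbm_path; if it prints nothing, the build probably did not produce an executable there.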

zachmayer commented on May 18, 2024

I didn't even have lightgbm installed! lol. So for future reference, this error means lightgbm isn't installed, or you're pointing at the wrong path:

Error in outputs[["Models"]][[i]][["Validation"]] : 
  subscript out of bounds

mik3hall commented on May 18, 2024

I also got this by omitting the path.

***************  
Fold no:  1 / 5  
***************  
Error in outputs[["Models"]][[i]][["Validation"]] : 
  subscript out of bounds

I installed on OS X as described in "cannot install lightgbm in R with devtools on macOS", doing the R install as shown there with:
R CMD INSTALL --build . --no-multiarch
I believe this installs to the default R package location, as shown by:

> .libPaths()
[1] "/Library/Frameworks/R.framework/Versions/3.4/Resources/library"
system("ls -l /Library/Frameworks/R.framework/Versions/3.4/Resources/library/lightgbm")
total 32
-rw-rw-r--   1 mjh  admin  2027 Jun 23 17:48 DESCRIPTION
-rw-rw-r--   1 mjh  admin  2044 Jun 23 17:50 INDEX

Might it be possible to make the .libPaths() location the default path?
I just tried...
lgbm_path = '/Library/Frameworks/R.framework/Versions/3.4/Resources/library',
and got...

***************  
Fold no:  1 / 5  
***************  
done (actual nth=1, anyBufferGrown=no, maxBuffUsed=35%)                                                                              
Saving validation data (data.table) file to: /Users/mjh/ml/kaggle/HomeCredit/code/lgbm_val_1.csv  
No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.
Column writers: 3 12 12 12 12 3 5 5 5 5 12 12 12 12 12 5 3 5 5 3 5 3 3 12 5 3 3 12 3 12 ... 5 5 5 5 3 5 5 5 5 5 
maxLineLen=1559 from sample. Found in 0.016s
Writing column names ... done in 0.000s
Writing 61502 rows in 23 batches of 2690 rows (each buffer size 8MB, showProgress=1, nth=1) ... done (actual nth=1, anyBufferGrown=no, maxBuffUsed=35%)
Starting to work on model as of Tue Jun 26 2018 08:46:11  
/bin/sh: /Library/Frameworks/R.framework/Versions/3.4/Resources/library: is a directory
Model completed, results saved in /Users/mjh/ml/kaggle/HomeCredit/code  
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") :
  cannot open file '/Users/mjh/ml/kaggle/HomeCredit/code/lgbm_model_1.txt': No such file or directory

It successfully wrote the .conf, train_1.csv, and val_1.csv files. I'm not sure what the other errors are about: it appears to look for a /bin/sh-type executable, and then the connection fails because there is no model_1.txt.
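
The "/bin/sh: ...: is a directory" line suggests the lgbm_path value was handed to the shell as a command, so it most likely has to name the executable file itself rather than a folder. With no training run, no model file would be written, which would explain the second error. A minimal pre-check along those lines, where the `check_lgbm_path` helper name and its three messages are made up for illustration:

```shell
# Hypothetical helper (not part of Laurae): classify a candidate lgbm_path.
check_lgbm_path() {
    p="$1"
    if [ -d "$p" ]; then
        echo "directory"        # a folder: the shell cannot run it
    elif [ ! -e "$p" ]; then
        echo "missing"          # nothing exists at that path
    elif [ ! -x "$p" ]; then
        echo "not executable"   # a file, but without execute permission
    else
        echo "ok"               # an executable file
    fi
}
```

Calling `check_lgbm_path` on the library folder used above would report "directory", whereas a compiled binary should report "ok".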

tarunparmar commented on May 18, 2024

On Mac, the lgbm_path is the location of the Unix executable that you build from source.
In my case I had it in my Downloads folder, so the lgbm_path value would be something like "/Downloads/LightGBM/lightgbm".
