emma's People

Contributors

jjanborowka, mllg, okcze, pbiecek, woznicak

emma's Issues

[test no. 2] VIM_IRMI (PipeImpute)

Only two errors (internal ones, if I remember correctly) occurred, each a few times.
On tasks 3722, 29, 14954, 48:

Error in 1L:ncol(Y) : argument of length 0

and

INFO [18:15:01.269] Applying learner 'imput_VIM_IRMI.encodeimpact.classif.glmnet' on task 'Task 3561: profb (Supervised Classification)' (iter 1/5)
[1] "IRMI dont work on selcted params runing on defoult"
Error in VIM::irmi(df, imp_var = F) :
factor with less than 2 levels detected! - Overtime

[test no. 2] Amelia (PipeOpTaskPreproc)

  • test R script: script
  • Amelia (ver. Preproc) log: amelia log
  • successful usage: 3/10 tasks

INFO [14:57:48.084] Applying learner 'imput_Amelia.encodeimpact.classif.glmnet' on task 'Task 48: heart-c (Supervised Classification)' (iter 1/5)
Warning in 'amelia.prep(x = x, m = m, idvars = idvars, empri = empri, ts = ts, ':
You have a small number of observations, relative to the number, of variables in the imputation model. Consider removing some variables, or reducing the order of time polynomials to reduce the number of parameters.

error: inv_sympd(): matrix is singular or not positive definite

error: inv_sympd(): matrix is singular or not positive definite

error: inv_sympd(): matrix is singular or not positive definite

error: inv_sympd(): matrix is singular or not positive definite

The resulting variance matrix was not invertible. Please check your data for highly collinear variables.

[test no. 2] VIM_regrImp (PipeOpTaskPreproc)

INFO [22:14:35.457] Applying learner 'imput_VIM_regrImp.encodeimpact.classif.glmnet' on task 'Task 3543: irish (Supervised Classification)' (iter 4/5)
Warning in 'multinom(form, data[TFna, ])':
group ‘Senior_cycle_incomplete-secondary_school’ is empty
Warning in 'multinom(form, data[TFna, ])':
group ‘Senior_cycle_incomplete-secondary_school’ is empty
Warning in 'multinom(form, data[TFna, ])':
group ‘Senior_cycle_incomplete-secondary_school’ is empty
Warning in 'multinom(form, data[TFna, ])':
group ‘Senior_cycle_incomplete-secondary_school’ is empty
character(0)
[1] "Error in doTryCatch(return(expr), name, parentenv, handler): \n"
Error in try({ :

[test no. 3] mice (PipeImpute)

Test version without preprocessing of datasets.
Test log: mice log

INFO [09:28:14.248] Applying learner 'imput_mice.encodeimpact.classif.glmnet' on task '3807' (iter 1/5)
Warning: Number of logged events: 1
INFO [09:28:15.452] Applying learner 'imput_mice.encodeimpact.classif.glmnet' on task '3807' (iter 2/5)
Warning: Number of logged events: 25
INFO [09:28:17.239] Applying learner 'imput_mice.encodeimpact.classif.glmnet' on task '3807' (iter 3/5)
INFO [09:28:18.542] Applying learner 'imput_mice.encodeimpact.classif.glmnet' on task '3807' (iter 4/5)
Warning: Number of logged events: 40
INFO [09:28:19.834] Applying learner 'imput_mice.encodeimpact.classif.glmnet' on task '3807' (iter 5/5)

of 5 iterations

  • Task: 3807
  • Learner: imput_mice.encodeimpact.classif.glmnet
  • Warnings: 0 in 0 iterations
  • Errors: 0 in 0 iterations
    PROBABLY LEFT MISSINGS AFTER IMPUTATION!

[test no. 2] softImpute (PipeOpTaskPreproc)

INFO [22:12:39.235] Applying learner 'imput_softImpute.encodeimpact.classif.glmnet' on task 'Task 3830: cars (Supervised Classification)' (iter 3/5)
Error : Processed output task during prediction of imput_softImpute does not match output task during training.

[test no. 2] missForest (PipeOpTaskPreproc)

INFO [15:07:05.868] Applying learner 'imput_missForest.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 2/5)
Warning in 'randomForest.default(x = obsX, y = obsY, ntree = ntree, mtry = mtry, ':
The response has five or fewer unique values. Are you sure you want to do regression?
Warning in 'randomForest.default(x = obsX, y = obsY, ntree = ntree, mtry = mtry, ':
The response has five or fewer unique values. Are you sure you want to do regression?
Warning in 'randomForest.default(x = obsX, y = obsY, ntree = ntree, mtry = mtry, ':
The response has five or fewer unique values. Are you sure you want to do regression?
Warning in 'randomForest.default(x = obsX, y = obsY, ntree = ntree, mtry = mtry, ':
The response has five or fewer unique values. Are you sure you want to do regression?
Warning in 'randomForest.default(x = obsX, y = obsY, ntree = ntree, mtry = mtry, ':
The response has five or fewer unique values. Are you sure you want to do regression?
Error in [.data.frame(final, , i) : undefined columns selected

[test no. 2] VIM_HD (PipeImpute)

INFO [22:12:59.469] Applying learner 'imput_VIM_HD.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 5/5)

of 5 iterations

  • Task: Task 3722: hungarian (Supervised Classification)
  • Learner: imput_VIM_HD.encodeimpact.classif.glmnet
  • Warnings: 0 in 0 iterations
  • Errors: 0 in 0 iterations
    PROBABLY LEFT MISSINGS AFTER IMPUTATION!

[test no. 2] Amelia (PipeOpTaskPreproc)

  • test R script: script
  • Amelia (ver. Preproc) log: amelia log
  • successful usage: 3/10 tasks

INFO [14:57:59.713] Applying learner 'imput_Amelia.encodeimpact.classif.glmnet' on task 'Task 3838: autos (Supervised Classification)' (iter 1/5)
Warning in 'amcheck(x = x, m = m, idvars = numopts$idvars, priors = priors, ':

The number of categories in one of the variables marked nominal has greater than 10 categories. Check nominal specification.

Error in contrasts<-(*tmp*, value = contr.funs[1 + isOF[nn]]) :
contrasts can be applied only to factors with 2 or more levels

[test no. 2] missMDA_MCA_PCA_FMAD (PipeOpTaskPreproc)

INFO [11:51:44.076] Applying learner 'imput_missMDA_MCA_PCA_FMAD.encodeimpact.classif.glmnet' on task 'Task 3561: profb (Supervised Classification)' (iter 1/5)
[1] "Fail to estimate ncp"
Error in if (any(MM[[g]] < 0)) stop(paste("The algorithm fails to converge. Choose a number of components (ncp) less or equal than ", :
missing value where TRUE/FALSE needed

Comments after testing

Information

  • I am conducting the first tests by taking a single imputation pipe and building a simple learning graph
    on 3 datasets with missing values, using factor encoding and glmnet. I save logs to separate files (EMMA_package/tests/logs),
    where the full messages are written.
  • Below I leave my comments on documentation and usage.
  • I count a dataset as passed when the learner was trained and scored
    (despite possible warnings during imputation).

General remarks

  • I suggest adding to each Pipe wrapper a link to the documentation of the underlying imputation function in its package.
  • Generally, Pipes produce a lot of output, warnings, etc. from their native functions. For further development it might be worth
    considering hiding all of these native prints and replacing them with our own messages (for example: method used, successful or not, optimized or not), with a verbose option to show or hide this output.
  • Parameter descriptions would be easier to understand if divided into sections (or otherwise distinguished): parameters
    passed directly to native imputation functions, and additional ones created in EMMA for pipe control (optimize, out_file, etc.)
  • to do after testing is complete
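
The suggestion about hiding native output could be prototyped roughly as follows. This is only a sketch in base R: `run_quietly()` and its arguments are hypothetical names, not part of EMMA, and a real Pipe wrapper would probably need to integrate this with its own logging.

```r
# Hypothetical helper (not part of EMMA): evaluate an imputation call,
# hide its printed output, warnings and messages, and report one status line.
run_quietly <- function(expr, label = "imputation", verbose = FALSE) {
  if (verbose) {
    # Verbose mode: evaluate normally so all native output stays visible.
    return(list(value = expr, ok = TRUE, warnings = character(0)))
  }
  warnings_seen <- character(0)
  # capture.output() swallows print()/cat() output; the calling handlers
  # collect warnings and muffle messages without aborting the evaluation.
  invisible(utils::capture.output(
    value <- withCallingHandlers(
      tryCatch(expr, error = function(e) e),
      warning = function(w) {
        warnings_seen <<- c(warnings_seen, conditionMessage(w))
        invokeRestart("muffleWarning")
      },
      message = function(m) invokeRestart("muffleMessage")
    )
  ))
  ok <- !inherits(value, "error")
  message(sprintf("%s: %s (%d warning(s) hidden)",
                  label, if (ok) "successful" else "failed",
                  length(warnings_seen)))
  list(value = value, ok = ok, warnings = warnings_seen)
}

# Toy call standing in for a noisy native imputation function.
res <- run_quietly({
  print("native progress output")
  warning("native warning")
  42
}, label = "toy imputation")
```

The hidden warnings remain available in `res$warnings`, so nothing is lost when the output is suppressed.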

Detailed comments

  • Amelia:
    • passed df: PipeImput - 1/3, PipeProc - 2/3
    • the polytime and splinetime params probably need more explanation or, if they are irrelevant, a link to
      the Amelia docs; I couldn't figure out what they do from the docs
    • the intercs param description mentions an optimize param that is not described, and then a methods_random section that I don't know where to find
    • from our meetings, I remember that the empri param is crucial, so maybe it deserves a longer description
    • misspellings in the col_0_1 param description
  • VIM_IRMI:
    • passed df: PipeImput - 2/3, PipeProc - 2/3
    • misspellings in col_0_1, force, step
  • missForest:
    • passed df: PipeImput - 1/3, PipeProc - 1/3
    • cores param has two different default values described
    • I guess that ntree_set and mtry_set are the sets of values searched for these params when optimize=TRUE.
      If so, I suggest stating that; if I guessed wrong, please describe what they actually are.
  • missMDA_PCA:
    • passed df: PipeImput - 1/2, PipeProc - 1/2
    • coeff.ridge and method are unclear to me because they share the same description and their possible values are not listed
  • Mice:
    • passed df: PipeImput - 3/3, PipeProc - 3/3
    • it is unclear to me whether low_corr and up_core only take effect when optimize is set or have some other impact
    • maybe add to set_corr that it also only works when correlation=TRUE (as I guess; I know this is the default setup, but I think it's worth mentioning)
  • Softimpute:
    • passed df: PipeImput - 2/3, PipeProc - 2/3
    • is the lambda default set to 0? It just seems odd from the description
    • for type, maybe add more information about the algorithms or a link to the softImpute docs
    • catFun: as above, maybe list more possible values or add a link
  • VIM_HD:
    • passed df: PipeImput - 3/3, PipeProc - 3/3
  • VIM_kNN:
    • passed df: PipeImput - 3/3, PipeProc - 3/3
    • similarly for catFUN and numFUN: if options other than the defaults are possible, I would add more description
  • VIM_regrImp:
    • passed df: PipeImput - 0/3, PipeProc - 0/3
  • missMDA_MFA:
    • passed df: PipeImput - 0/2, PipeProc - 1/3
    • method might have a longer description or link
  • missRanger:
    • passed df: PipeImput - 3/3, PipeProc - 3/3

[test no. 2] VIM_IRMI (PipeOpTaskPreproc)

INFO [14:59:09.899] Applying learner 'imput_VIM_IRMI.encodeimpact.classif.glmnet' on task 'Task 29: credit-approval (Supervised Classification)' (iter 1/5)
...
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning in 'multinom(form, data = x_reg, summ = 2, maxit = 50, trace = FALSE, ':
group ‘3’ is empty
Warning in 'multinom(form, data = x_reg, summ = 2, maxit = 50, trace = FALSE, ':
groups ‘2’ ‘3’ are empty
[1] "IRMI dont work on selcted params runing on defoult"
Warning in 'multinom(form, data = x_reg, summ = 2, maxit = 50, trace = FALSE, ':
groups ‘2’ ‘3’ are empty
Error in 1L:ncol(Y) : argument of length 0

[test no. 2] VIM_IRMI (PipeOpTaskPreproc)

INFO [14:59:36.674] Applying learner 'imput_VIM_IRMI.encodeimpact.classif.glmnet' on task 'Task 3561: profb (Supervised Classification)' (iter 1/5)
[1] "IRMI dont work on selcted params runing on defoult"
Error in VIM::irmi(df, imp_var = F) :
factor with less than 2 levels detected! - Overtime

[test no. 2] Amelia (PipeOpTaskPreproc)

  • test R script: script
  • Amelia (ver. Preproc) log: amelia log
  • successful usage: 3/10 tasks

INFO [14:57:37.314] Applying learner 'imput_Amelia.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 1/5)
Amelia Error Code: 4
The data has a column that is completely missing or only has one,observation. Remove these columns: ca

[test no. 2] Amelia (PipeOpTaskPreproc)

  • test R script: script
  • Amelia (ver. Preproc) log: amelia log
  • successful usage: 3/10 tasks

INFO [14:57:45.406] Applying learner 'imput_Amelia.encodeimpact.classif.glmnet' on task 'Task 3830: cars (Supervised Classification)' (iter 1/5)
Amelia Error Code: 36
The number of categories in the nominal variable 'name' is greater than one-third of the observations.

[test no. 2] missRanger (PipeOpTaskPreproc)

INFO [22:15:13.400] Applying learner 'imput_missRanger.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 3/5)
Warning: Dropped unused factor level(s) in dependent variable: 3.
Error in pmm(xtrain = fit$predictions, xtest = pred, ytrain = data[[v]][!v.na], :
sum(ok <- !is.na(xtrain) & !is.na(ytrain)) >= 1L is not TRUE

[test no. 2] VIM_HD (PipeOpTaskPreproc)

INFO [22:12:57.002] Applying learner 'imput_VIM_HD.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 1/5)
INFO [22:12:57.537] Applying learner 'imput_VIM_HD.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 2/5)
INFO [22:12:58.449] Applying learner 'imput_VIM_HD.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 3/5)
INFO [22:12:58.997] Applying learner 'imput_VIM_HD.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 4/5)
INFO [22:12:59.469] Applying learner 'imput_VIM_HD.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 5/5)

of 5 iterations

  • Task: Task 3722: hungarian (Supervised Classification)
  • Learner: imput_VIM_HD.encodeimpact.classif.glmnet
  • Warnings: 0 in 0 iterations
  • Errors: 0 in 0 iterations
    PROBABLY LEFT MISSINGS AFTER IMPUTATION!
  • Comment:
    • model prediction contained NA values, which usually occurs when missing values are present in a data frame (so probably were not imputed in our case)
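
One way to confirm this suspicion is to count the missing values that remain right after the imputation step, before the learner sees the data. A minimal base R sketch; `remaining_missing()` is a hypothetical helper and the data frame is toy data, not the actual task:

```r
# Hypothetical helper: number of NA values left in each column,
# restricted to columns that still contain at least one NA.
remaining_missing <- function(df) {
  counts <- colSums(is.na(df))
  counts[counts > 0]
}

# Toy data standing in for the output of an imputation pipe.
imputed <- data.frame(
  age = c(63, 41, 57),
  ca  = c(NA_real_, NA_real_, NA_real_)  # a column the imputer could not fill
)

left <- remaining_missing(imputed)
if (length(left) > 0) {
  message("Columns still containing NA: ", paste(names(left), collapse = ", "))
}
```

Running such a check inside the test script would turn the "PROBABLY LEFT MISSINGS" guess into a definite statement per column.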

[test no. 2] softImpute (PipeOpTaskPreproc)

INFO [22:12:48.043] Applying learner 'imput_softImpute.encodeimpact.classif.glmnet' on task 'Task 3675: pbc (Supervised Classification)' (iter 3/5)
Warning in '[<-.factor(*tmp*, is.na(col_to_imp), value = "NA's")':
invalid factor level, NA generated
INFO [22:12:48.876] Applying learner 'imput_softImpute.encodeimpact.classif.glmnet' on task 'Task 3675: pbc (Supervised Classification)' (iter 4/5)
INFO [22:12:49.340] Applying learner 'imput_softImpute.encodeimpact.classif.glmnet' on task 'Task 3675: pbc (Supervised Classification)' (iter 5/5)

of 5 iterations

  • Task: Task 3675: pbc (Supervised Classification)
  • Learner: imput_softImpute.encodeimpact.classif.glmnet
  • Warnings: 0 in 0 iterations
  • Errors: 0 in 0 iterations
    PROBABLY LEFT MISSINGS AFTER IMPUTATION!
  • Comment:
    • model prediction contained NA values, which usually occurs when missing values are present in a data frame (so probably were not imputed in our case)

[test no. 3] VIM_IRMI (PipePreproc)

Test version without preprocessing of datasets.
Test log: IRMI

INFO [09:37:51.610] Applying learner 'imput_VIM_IRMI.encodeimpact.classif.glmnet' on task '3802' (iter 1/5)
[1] "IRMI dont work on selcted params runing on defoult"
Error in colnames(final) : object 'final' not found

[test no. 2] missMDA_MCA_PCA_FMAD (PipeOpTaskPreproc)

INFO [22:32:23.834] Applying learner 'imput_missMDA_MCA_PCA_FMAD.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 1/5)
[1] "Fail to estimate ncp"
Error in eigen(crossprod(X, X), symmetric = TRUE) :
infinite or missing values in 'x'

[test no. 2] missForest (PipeOpTaskPreproc)

INFO [15:08:02.454] Applying learner 'imput_missForest.encodeimpact.classif.glmnet' on task 'Task 3561: profb (Supervised Classification)' (iter 1/5)
Error in [<-.data.frame(*tmp*, misi, res$varInd, value = structure(c(1L, :
replacement has 537 rows, data has 513

[test no. 2] mice (PipeOpTaskPreproc)

INFO [22:10:54.599] Applying learner 'imput_mice.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 1/5)
Warning: Number of logged events: 51
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
NA/NaN/Inf in foreign function call (arg 5)

[test no. 2] mice (PipeOpTaskPreproc)

INFO [22:11:41.559] Applying learner 'imput_mice.encodeimpact.classif.glmnet' on task 'Task 14954: cylinder-bands (Supervised Classification)' (iter 1/5)
Error in solve.default(xtx + diag(pen)) :
system is computationally singular: reciprocal condition number = 1.96433e-16

[test no. 2] Amelia (PipeOpTaskPreproc)

  • test R script: script
  • Amelia (ver. Preproc) log: amelia log
  • successful usage: 3/10 tasks

INFO [14:57:46.758] Applying learner 'imput_Amelia.encodeimpact.classif.glmnet' on task 'Task 14954: cylinder-bands (Supervised Classification)' (iter 1/5)
Warning in 'amcheck(x = x, m = m, idvars = numopts$idvars, priors = priors, ':

The number of categories in one of the variables marked nominal has greater than 10 categories. Check nominal specification.

Warning in 'amcheck(x = x, m = m, idvars = numopts$idvars, priors = priors, ':

The number of categories in one of the variables marked nominal has greater than 10 categories. Check nominal specification.

Amelia Error Code: 43
You have a variable in your dataset that does not vary. Please remove this variable. Variables that do not vary: cylinder_division, ink_color
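
Amelia stops here because some columns do not vary within the training fold; its Error Code 4 reports the related case of a column that is completely missing or has a single observation. A preprocessing step could drop such columns before imputation. The sketch below is an assumption about how that might look in base R; `drop_uninformative()` is a hypothetical name, not an EMMA function, and the data is a toy example.

```r
# Hypothetical helper: keep only columns with at least two distinct
# observed (non-NA) values, i.e. drop constant and all-NA columns.
drop_uninformative <- function(df) {
  keep <- vapply(df, function(col) {
    observed <- col[!is.na(col)]
    length(unique(observed)) > 1
  }, logical(1))
  df[, keep, drop = FALSE]
}

# Toy data: one varying column, one constant factor, one all-NA column.
toy <- data.frame(
  viscosity = c(40, NA, 55, 46),
  ink_color = factor(c("KEY", "KEY", "KEY", "KEY")),
  empty     = c(NA_real_, NA_real_, NA_real_, NA_real_)
)

cleaned <- drop_uninformative(toy)
```

In a resampling setup the check would have to run on each training fold, because a column can be constant in one fold and varying in another.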

[test no. 2] VIM_IRMI (PipeOpTaskPreproc)

INFO [14:59:23.079] Applying learner 'imput_VIM_IRMI.encodeimpact.classif.glmnet' on task 'Task 3830: cars (Supervised Classification)' (iter 1/5)
...
Warning in 'predict.lm(object, newdata, se.fit, scale = residual.scale, type = if (type == ':
prediction from a rank-deficient fit may be misleading
Error : Processed output task during prediction of imput_VIM_IRMI does not match output task during training.

[test no. 2] VIM_IRMI (PipeImpute)

INFO [15:05:12.329] Applying learner 'imput_VIM_IRMI.encodeimpact.classif.glmnet' on task 'Task 3838: autos (Supervised Classification)' (iter 1/5)
...
Warning: glm.fit: algorithm did not converge
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Error : Processed output task during prediction of imput_VIM_IRMI does not match output task during training.

[test no. 2] softImpute (PipeImpute)

INFO [22:12:39.235] Applying learner 'imput_softImpute.encodeimpact.classif.glmnet' on task 'Task 3830: cars (Supervised Classification)' (iter 3/5)
Error : Processed output task during prediction of imput_softImpute does not match output task during training.

[test no. 3] VIM_regrImp (PipeImpute)

Test version with preprocessing of datasets.
Test log: VIM_regrImp log

INFO [10:15:48.372] Applying learner 'imput_VIM_regrImp.encodeimpact.classif.glmnet' on task '3604' (iter 1/5)
Warning in 'multinom(form, data[TFna, ])': group ‘4’ is empty
Warning in 'multinom(form, data[TFna, ])': group ‘4’ is empty
Warning in 'multinom(form, data[TFna, ])': group ‘4’ is empty
Warning in 'multinom(form, data[TFna, ])': group ‘4’ is empty
[1] "Error in apply(pre, 1, function(x) sample(1:length(x), 1, prob = x)): dim(X) must have a positive length\n"
[1] "Error in apply(pre, 1, function(x) sample(1:length(x), 1, prob = x)): dim(X) must have a positive length\n"
Error in apply(pre, 1, function(x) sample(1:length(x), 1, prob = x)) :
dim(X) must have a positive length

This is probably an internal problem that was already detected, but with a changed error message. I am reporting it because I am not sure.

[test no. 2] softImpute (PipeImpute)

INFO [22:12:33.475] Applying learner 'imput_softImpute.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 1/5)
Warning in '[<-.factor(*tmp*, is.na(col_to_imp), value = "NA's")':
invalid factor level, NA generated
Warning in '[<-.factor(*tmp*, is.na(col_to_imp), value = "NA's")':
invalid factor level, NA generated
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
NA/NaN/Inf in foreign function call (arg 5)

Probably this error is thrown by glmnet when missing values were present in data.

[test no. 3] Summary

Test no. 3 covers both the PipePreproc and PipeImpute versions on a sample of 10 tasks with missing values.
This time two setups were tested: with and without a dataset preprocessing step.
For readability, a summary of performance is provided in a Google Sheet.
Sheet with summary: sheet
Tasks test sample: sample

[test no. 2] VIM_IRMI (PipeImpute)

INFO [15:05:10.585] Applying learner 'imput_VIM_IRMI.encodeimpact.classif.glmnet' on task 'Task 3847: analcatdata_draft (Supervised Classification)' (iter 1/5)
No missings in x. Nothing to impute
Warning in 'kNN(x, imp_var = FALSE, mixed = mixed, mixed.constant = mixed.constant)':
Nothing to impute, because no NA are present (also after using makeNA)
Error : Processed output task during prediction of imput_VIM_IRMI does not match output task during training.

[test no. 2] missMDA_MFA (PipeOpTaskPreproc)

INFO [22:26:33.665] Applying learner 'imput_missMDA_MFA.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 1/5)
Error in eigen(crossprod(X, X), symmetric = TRUE) :
infinite or missing values in 'x'

[test no. 2] softImpute (PipeOpTaskPreproc)

INFO [22:12:33.475] Applying learner 'imput_softImpute.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 1/5)
Warning in '[<-.factor(*tmp*, is.na(col_to_imp), value = "NA's")':
invalid factor level, NA generated
Warning in '[<-.factor(*tmp*, is.na(col_to_imp), value = "NA's")':
invalid factor level, NA generated
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
NA/NaN/Inf in foreign function call (arg 5)

[test no. 2] VIM_IRMI (PipeOpTaskPreproc)

INFO [15:05:12.329] Applying learner 'imput_VIM_IRMI.encodeimpact.classif.glmnet' on task 'Task 3838: autos (Supervised Classification)' (iter 1/5)
Warning in 'predict.lm(object, newdata, se.fit, scale = 1, type = if (type == ':
prediction from a rank-deficient fit may be misleading
...
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning: glm.fit: algorithm did not converge
...
Error : Processed output task during prediction of imput_VIM_IRMI does not match output task during training.

[test no. 2] mice (PipeOpTaskPreproc)

INFO [22:12:03.409] Applying learner 'imput_mice.encodeimpact.classif.glmnet' on task 'Task 3847: analcatdata_draft (Supervised Classification)' (iter 2/5)
Error in edit.setup(data, setup, ...) :
mice detected constant and/or collinear variables. No predictors were left after their removal.

[test no. 2] missMDA_MFA (PipeOpTaskPreproc)

INFO [22:26:55.985] Applying learner 'imput_missMDA_MFA.encodeimpact.classif.glmnet' on task 'Task 3838: autos (Supervised Classification)' (iter 2/5)
Error in apply(tabdisj[, (vec[i] + 1):vec[i + 1]], 1, which.max) :
dim(X) must have a positive length

[test no. 2] VIM_KNN (PipeOpTaskPreproc)

INFO [22:13:38.048] Applying learner 'imput_VIM_kNN.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 3/5)
[1] 5
[1] 5
Warning in 'VIM::kNN(df, k = k, numFun = numFun, catFun = catFun, imp_var = F)':
All observations of feature are missing, therefore the variable will not be imputed!

Warning in 'FUN(newX[, i], ...)':
no non-missing arguments to min; returning Inf
Warning in 'FUN(newX[, i], ...)':
no non-missing arguments to max; returning -Inf
Warning in 'FUN(newX[, i], ...)':
no non-missing arguments to min; returning Inf
Warning in 'FUN(newX[, i], ...)':
no non-missing arguments to max; returning -Inf
Warning in 'FUN(newX[, i], ...)':
no non-missing arguments to min; returning Inf
Warning in 'FUN(newX[, i], ...)':
no non-missing arguments to max; returning -Inf
Error in indexNA2s[, variable[j]] : subscript out of bounds

[test no. 2] Amelia (PipeOpTaskPreproc)

  • test R script: script
  • Amelia (ver. Preproc) log: amelia log
  • successful usage: 3/10 tasks

INFO [14:57:49.855] Applying learner 'imput_Amelia.encodeimpact.classif.glmnet' on task 'Task 3561: profb (Supervised Classification)' (iter 1/5)
Warning in 'amcheck(x = x, m = m, idvars = numopts$idvars, priors = priors, ':

The number of categories in one of the variables marked nominal has greater than 10 categories. Check nominal specification.

Warning in 'amcheck(x = x, m = m, idvars = numopts$idvars, priors = priors, ':

The number of categories in one of the variables marked nominal has greater than 10 categories. Check nominal specification.

Amelia Error Code: 43
You have a variable in your dataset that does not vary. Please remove this variable. Variables that do not vary: Overtime

[test no. 2] missForest (PipeOpTaskPreproc)

INFO [15:07:47.108] Applying learner 'imput_missForest.encodeimpact.classif.glmnet' on task 'Task 3830: cars (Supervised Classification)' (iter 1/5)
Error in { :
task 1 failed - "Can not handle categorical predictors with more than 53 categories."

INFO [15:07:48.802] Applying learner 'imput_missForest.encodeimpact.classif.glmnet' on task 'Task 14954: cylinder-bands (Supervised Classification)' (iter 1/5)
Error in { :
task 1 failed - "Can not handle categorical predictors with more than 53 categories."

Adding columns with the information where imputed.

A separate PipeOpPreproces function is required to create columns recording where imputation will happen. I will do this. This issue is only to inform you about the problem with the current solution.
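
As a rough illustration of what such indicator columns could look like on a plain data frame, here is a base R sketch. The function name and the `_was_missing` suffix are hypothetical, and this does not show how the step would be wired into an mlr3pipelines PipeOp:

```r
# Hypothetical helper: for every column that contains NA, append a 0/1
# column marking the rows where a value will have to be imputed.
add_missing_indicators <- function(df, suffix = "_was_missing") {
  cols_with_na <- names(df)[vapply(df, anyNA, logical(1))]
  for (col in cols_with_na) {
    df[[paste0(col, suffix)]] <- as.integer(is.na(df[[col]]))
  }
  df
}

# Toy data: only `chol` has a missing entry, so only it gets an indicator.
toy <- data.frame(age = c(63, 41, 57), chol = c(233, NA, 250))
flagged <- add_missing_indicators(toy)
```

The indicators have to be computed before imputation, since afterwards the positions of the original missing entries are no longer recoverable from the data.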

[test no. 4] Summary

Test purpose: usage of auto-optimization of parameters in missForest, mice and missRanger.
Test script: script
Test logs: logs
Performance: all pipes performed successful imputation in 5/5 tasks

[test no. 2] VIM_IRMI (PipeOpTaskPreproc)

INFO [15:05:10.585] Applying learner 'imput_VIM_IRMI.encodeimpact.classif.glmnet' on task 'Task 3847: analcatdata_draft (Supervised Classification)' (iter 1/5)
No missings in x. Nothing to impute
Warning in 'kNN(x, imp_var = FALSE, mixed = mixed, mixed.constant = mixed.constant)':
Nothing to impute, because no NA are present (also after using makeNA)
Error : Processed output task during prediction of imput_VIM_IRMI does not match output task during training.

[test no. 2] VIM_IRMI (PipeOpTaskPreproc)

INFO [14:58:23.920] Applying learner 'imput_VIM_IRMI.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 1/5)
Warning in 'multinom(form, data = x_reg, summ = 2, maxit = 50, trace = FALSE, ':
group ‘0’ is empty
[1] "IRMI dont work on selcted params runing on defoult"
Warning in 'predict.lm(object, newdata, se.fit, scale = 1, type = if (type == ':
prediction from a rank-deficient fit may be misleading
Warning: glm.fit: algorithm did not converge
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
Warning in 'predict.lm(object, newdata, se.fit, scale = 1, type = if (type == ':
prediction from a rank-deficient fit may be misleading
Warning in 'multinom(form, data = x_reg, summ = 2, maxit = 50, trace = FALSE, ':
group ‘0’ is empty
Error in 1L:ncol(Y) : argument of length 0

[test no. 2] Summary

test R script: script

PipeOpTaskPreproc version results:

Amelia (PipeOpTaskPreproc)

VIM_IRMI (PipeOpTaskPreproc)

missForest (PipeOpTaskPreproc)

mice (PipeOpTaskPreproc)

  • log: mice log
  • successful usage: 5/10 tasks

softImpute (PipeOpTaskPreproc)

VIM_HD (PipeOpTaskPreproc)

VIM_KNN (PipeOpTaskPreproc)

VIM_regrImp (PipeOpTaskPreproc)

missRanger (PipeOpTaskPreproc)

missMDA_MFA (PipeOpTaskPreproc)

missMDA_MCA_PCA_FMAD (PipeOpTaskPreproc)

[test no. 2] mice (PipeImpute)

INFO [22:10:54.599] Applying learner 'imput_mice.encodeimpact.classif.glmnet' on task 'Task 3722: hungarian (Supervised Classification)' (iter 1/5)
Warning: Number of logged events: 51
Error in lognet(x, is.sparse, ix, jx, y, weights, offset, alpha, nobs, :
NA/NaN/Inf in foreign function call (arg 5)

Experiment arrangements

update 23.07.2020:

  1. database of datasets:
  • basic statistics of datasets with missing data
  • searching for patterns in observations with missing values in real incomplete data
  • generating missing entries in real complete data according to the patterns found
  2. pipeline to support imputation methods:
  • R package
  • for multiple imputation, we keep the first imputed dataset by default; optionally, multiple versions of the imputed data can be used
  • implementing out-of-range imputation
  • implementing a mask with dummy encoding of whether an entry is missing
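
For the out-of-range imputation item, one common reading is to replace missing numeric values with a constant below the observed minimum and to give missing factor values their own level. The sketch below follows that reading; it is an assumption, not the agreed design, and `impute_out_of_range()`, `offset` and the `.MISSING` level are hypothetical names.

```r
# Hypothetical helper illustrating out-of-range imputation for one column.
# Assumes a numeric column has at least one observed value.
impute_out_of_range <- function(x, offset = 1, missing_level = ".MISSING") {
  if (is.numeric(x)) {
    # Place imputed values strictly below the observed range.
    x[is.na(x)] <- min(x, na.rm = TRUE) - offset
  } else if (is.factor(x)) {
    # The new level must be registered first; assigning an unknown level
    # only yields NA and an "invalid factor level" warning.
    levels(x) <- c(levels(x), missing_level)
    x[is.na(x)] <- missing_level
  }
  x
}

num_out <- impute_out_of_range(c(2, NA, 5))
fct_out <- impute_out_of_range(factor(c("a", NA, "b")))
```

Registering the level before assignment matters: skipping that step is what produces the "invalid factor level, NA generated" warning seen in the softImpute logs.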

[test no. 2] missMDA_MFA (PipeOpTaskPreproc)

  • test R script: script
  • log: missMDA_MFA log
  • successful usage: 1/10 tasks
    Tasks in which missing values were probably left after imputation (the same situation as discussed here):
  • Task 3543: irish
  • Task 29: credit-approval
  • Task 3830: cars
  • Task 48: heart-c
  • Task 3847: analcatdata_draft
    During imputation, no errors were thrown other than those linked above.

[test no. 2] missMDA_MFA (PipeOpTaskPreproc)

INFO [22:26:44.621] Applying learner 'imput_missMDA_MFA.encodeimpact.classif.glmnet' on task 'Task 14954: cylinder-bands (Supervised Classification)' (iter 1/5)
Error in if (any(MM[[g]] < 0)) stop(paste("The algorithm fails to converge. Choose a number of components (ncp) less or equal than ", :
missing value where TRUE/FALSE needed
