Hi Thierry, We have a new RF issue with v0.0.20, where a warning ind

missing data is still present after imputation in RF (radiator issue, CLOSED)

thierrygosselin commented on May 26, 2024

missing data is still present after imputation in RF

from radiator.

Comments (2)

Astahlke commented on May 26, 2024

Hi Thierry,

I've done a little more troubleshooting here. I can't replicate this issue on my local Mac, but it persists on the HPC. Why is the imputation module throwing a missing-data warning when there aren't any NAs in the data set?

Any ideas?

Thank you!

>   gc <- radiator::genomic_converter(data = miss.genlight,
+                                     output = "genlight",
+                                     imputation.method = "rf",
+                                     monomorphic.out = FALSE,
+                                     hierarchical.levels = "global",
+                                     verbose = TRUE)
#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: /mnt/ceph/stah3621/imputation
Input file: from global environment
Strata: no
Population levels: no
Population labels: no
Output format(s): tidy, genlight
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: FALSE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no

Imputations options:
imputation.method: rf
hierarchical.levels: global

parallel.core: 47

#######################################################################

Importing data

Number of markers missing in all individuals and removed: 1

Tidy genomic data:
    Number of markers: 500
    Number of chromosome/contig/scaffold: 1
    Number of individuals: 94

Preparing data for output

    Data is bi-allelic


#######################################################################
####################### grur::grur_imputations ########################
#######################################################################
Imputation method: rf
Hierarchical levels: global
On-the-fly-imputations options:
    number of trees to grow: 50
    minimum terminal node size: 1
    non-negative integer value used to specify random splitting: 10
    number of iterations: 10
Number of CPUs: 47
Note: If you have speed issues: follow radiator's vignette on parallel computing


Number of populations: 1
Number of individuals: 94
Number of markers: 500

Proportion of missing genotypes before imputations: 0.298319
On-the-fly-imputations using Random Forests algorithm
Imputations computed globally, take a break...
Adjusting REF/ALT alleles to account for imputations...
    generating REF/ALT dictionary
    integrating new genotype codings...

Proportion of missing genotypes after imputations: 0

Computation time: 8 sec
################## grur::grur_imputations completed ###################
Generating adegenet genlight object without imputation
Generating adegenet genlight object WITH imputations

Writing tidy data set:
[email protected]

Writing tidy data set:
[email protected]
############################### RESULTS ###############################
Data format of input: genlight
Biallelic data
Number of common markers: 500
Number of chromosome/contig/scaffold: 1
Number of individuals 94

Computation time: 12 sec
################ radiator::genomic_converter completed ################
Warning message:
In radiator::radiator_imputations_module(data = input, imputation.method = imputation.method,  :
  Missing data is still present in the dataset
    2 options:
    run the function again with hierarchical.levels = 'global'
    use common.markers = TRUE when using hierarchical.levels = 'strata'

> anyNA(as.matrix(gc$genlight.imputed))
[1] FALSE
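
The `anyNA()` call above only gives a yes/no answer for the imputed genlight object. If the warning is raised against a different intermediate object (an assumption; the log does not say which object the module inspects), a per-marker and per-individual breakdown could help narrow things down. The sketch below uses a toy matrix and a hypothetical helper, `na_summary()`, written in base R; neither comes from radiator or grur.

```r
# Toy genotype matrix (individuals x markers). Illustrative only:
# this is NOT the data from this issue.
geno <- matrix(
  c(0,  1,  2,
    1, NA,  0,
    2,  2, NA,
    0,  1,  1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("ind", 1:4), paste0("snp", 1:3))
)

# Hypothetical helper (not part of radiator or grur):
# summarise where missing values remain in a matrix.
na_summary <- function(m) {
  miss <- is.na(m)
  list(
    any           = any(miss),      # same answer as anyNA(m)
    proportion    = mean(miss),     # overall fraction of missing cells
    by_marker     = colSums(miss),  # missing count per column
    by_individual = rowSums(miss)   # missing count per row
  )
}

res <- na_summary(geno)

# Show only the markers and individuals that still carry NAs.
print(res$by_marker[res$by_marker > 0])
print(res$by_individual[res$by_individual > 0])
```

On a real object, the same helper could be applied to `as.matrix(gc$genlight.imputed)` and, separately, to the genotype column of the tidy output, to see whether the two disagree.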

From what I can tell, the R and package versions are the same in the ways that matter:

On my local mac:

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS  10.14.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
 [1] bindrcpp_0.2.2        psych_1.8.12          vegan_2.5-3          
 [4] lattice_0.20-38       permute_0.9-4         LEA_2.4.0            
 [7] tidyr_0.8.2           adegenet_2.1.1        ade4_1.7-13          
[10] randomForestSRC_2.8.0 radiator_0.0.21

And on the HPC (could the locale variables have an impact?):

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /opt/modules/devel/R/3.5.1/lib64/R/lib/libRblas.so
LAPACK: /opt/modules/devel/R/3.5.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods
[7] base

other attached packages:
 [1] dplyr_0.7.8           LEA_2.4.0             randomForestSRC_2.8.0
 [4] bindrcpp_0.2.2        psych_1.8.12          vegan_2.5-3
 [7] lattice_0.20-38       permute_0.9-4         tidyr_0.8.2
[10] radiator_0.0.21       adegenet_2.1.1        ade4_1.7-13
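
Comparing the two `sessionInfo()` listings by eye is error-prone, so here is a small base R sketch that diffs them. The version strings are copied from the "other attached packages" sections above; the variable names are made up for illustration. Note that this only covers attached packages, since the "loaded via a namespace" sections were not posted, so a package missing from one list may still be loaded there.

```r
# Attached packages as posted above (name = version).
local_pkgs <- c(
  bindrcpp = "0.2.2", psych = "1.8.12", vegan = "2.5-3",
  lattice = "0.20-38", permute = "0.9-4", LEA = "2.4.0",
  tidyr = "0.8.2", adegenet = "2.1.1", ade4 = "1.7-13",
  randomForestSRC = "2.8.0", radiator = "0.0.21"
)
hpc_pkgs <- c(
  dplyr = "0.7.8", LEA = "2.4.0", randomForestSRC = "2.8.0",
  bindrcpp = "0.2.2", psych = "1.8.12", vegan = "2.5-3",
  lattice = "0.20-38", permute = "0.9-4", tidyr = "0.8.2",
  radiator = "0.0.21", adegenet = "2.1.1", ade4 = "1.7-13"
)

# Packages attached on one machine but not the other.
only_local <- setdiff(names(local_pkgs), names(hpc_pkgs))
only_hpc   <- setdiff(names(hpc_pkgs), names(local_pkgs))

# Packages attached on both machines but with different versions.
shared <- intersect(names(local_pkgs), names(hpc_pkgs))
version_mismatch <- shared[local_pkgs[shared] != hpc_pkgs[shared]]

print(only_local)        # character(0)
print(only_hpc)          # "dplyr"
print(version_mismatch)  # character(0)
```

For the listings as posted, the only difference among attached packages is that dplyr is attached on the HPC but not on the Mac. Whether that has anything to do with the warning is not established here.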


thierrygosselin commented on May 26, 2024

I can't reproduce the error.
The imputation module was moved out of radiator.
It now resides in grur only, because of a cross-dependency issue when submitting to CRAN.
genomic_converter will be added to the grur imputations module in the next release of grur, next week.
