Hi Thierry,
I've done a little more troubleshooting here. I can't replicate this issue on my local Mac, but it is persistent on the HPC. Why is the imputation module throwing a missing data warning when there aren't any NA values in the data set?
Any ideas?
Thank you!
> gc <- radiator::genomic_converter(data = miss.genlight,
+ output = "genlight",
+ imputation.method = "rf",
+ monomorphic.out = FALSE,
+ hierarchical.levels = "global",
+ verbose = TRUE)
#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: /mnt/ceph/stah3621/imputation
Input file: from global environment
Strata: no
Population levels: no
Population labels: no
Output format(s): tidy, genlight
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: FALSE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no
Imputations options:
imputation.method: rf
hierarchical.levels: global
parallel.core: 47
#######################################################################
Importing data
Number of markers missing in all individuals and removed: 1
Tidy genomic data:
Number of markers: 500
Number of chromosome/contig/scaffold: 1
Number of individuals: 94
Preparing data for output
Data is bi-allelic
#######################################################################
####################### grur::grur_imputations ########################
#######################################################################
Imputation method: rf
Hierarchical levels: global
On-the-fly-imputations options:
number of trees to grow: 50
minimum terminal node size: 1
non-negative integer value used to specify random splitting: 10
number of iterations: 10
Number of CPUs: 47
Note: If you have speed issues: follow radiator's vignette on parallel computing
Number of populations: 1
Number of individuals: 94
Number of markers: 500
Proportion of missing genotypes before imputations: 0.298319
On-the-fly-imputations using Random Forests algorithm
Imputations computed globally, take a break...
Adjusting REF/ALT alleles to account for imputations...
generating REF/ALT dictionary
integrating new genotype codings...
Proportion of missing genotypes after imputations: 0
Computation time: 8 sec
################## grur::grur_imputations completed ###################
Generating adegenet genlight object without imputation
Generating adegenet genlight object WITH imputations
Writing tidy data set:
[email protected]
Writing tidy data set:
[email protected]
############################### RESULTS ###############################
Data format of input: genlight
Biallelic data
Number of common markers: 500
Number of chromosome/contig/scaffold: 1
Number of individuals 94
Computation time: 12 sec
################ radiator::genomic_converter completed ################
Warning message:
In radiator::radiator_imputations_module(data = input, imputation.method = imputation.method, :
Missing data is still present in the dataset
2 options:
run the function again with hierarchical.levels = 'global'
use common.markers = TRUE when using hierarchical.levels = 'strata'
> anyNA(as.matrix(gc$genlight.imputed))
[1] FALSE
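If missing values did remain, a per-marker and per-individual tally would show where they sit. The following is a minimal base-R sketch on a made-up toy matrix (`geno` is hypothetical and only stands in for the result of `as.matrix(gc$genlight.imputed)`); it assumes genotypes are stored as a numeric individuals-by-markers matrix.

```r
# Toy genotype matrix (individuals x markers); values are invented for
# illustration and stand in for as.matrix(gc$genlight.imputed).
geno <- matrix(c(0, 1, 2, NA, 1, 0, 2, 2, NA), nrow = 3,
               dimnames = list(paste0("ind", 1:3), paste0("snp", 1:3)))

na_by_marker <- colSums(is.na(geno))      # missing calls per marker
na_by_individual <- rowSums(is.na(geno))  # missing calls per individual
prop_missing <- mean(is.na(geno))         # overall proportion of missing calls

# Show only the markers and individuals that still carry missing calls
print(na_by_marker[na_by_marker > 0])
print(na_by_individual[na_by_individual > 0])
print(prop_missing)
```

On the real object, all three tallies would be zero if `anyNA()` returns `FALSE`, which would suggest the warning is raised from a different intermediate object than the final genlight.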
From what I can tell, the R and package versions are the same in the ways that matter:
On my local Mac:
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS 10.14.2
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] bindrcpp_0.2.2 psych_1.8.12 vegan_2.5-3
[4] lattice_0.20-38 permute_0.9-4 LEA_2.4.0
[7] tidyr_0.8.2 adegenet_2.1.1 ade4_1.7-13
[10] randomForestSRC_2.8.0 radiator_0.0.21
And on the HPC. Could the locale variables have an impact?
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /opt/modules/devel/R/3.5.1/lib64/R/lib/libRblas.so
LAPACK: /opt/modules/devel/R/3.5.1/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] dplyr_0.7.8 LEA_2.4.0 randomForestSRC_2.8.0
[4] bindrcpp_0.2.2 psych_1.8.12 vegan_2.5-3
[7] lattice_0.20-38 permute_0.9-4 tidyr_0.8.2
[10] radiator_0.0.21 adegenet_2.1.1 ade4_1.7-13
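To make the comparison between the two listings explicit, the attached-package strings can be diffed directly. This is a small base-R sketch; the vectors `mac` and `hpc` are simply the "other attached packages" entries copied from the two `sessionInfo()` outputs above, and the variable names are my own.

```r
# Attached packages copied from the two sessionInfo() listings above
mac <- c("bindrcpp_0.2.2", "psych_1.8.12", "vegan_2.5-3", "lattice_0.20-38",
         "permute_0.9-4", "LEA_2.4.0", "tidyr_0.8.2", "adegenet_2.1.1",
         "ade4_1.7-13", "randomForestSRC_2.8.0", "radiator_0.0.21")
hpc <- c("dplyr_0.7.8", "LEA_2.4.0", "randomForestSRC_2.8.0", "bindrcpp_0.2.2",
         "psych_1.8.12", "vegan_2.5-3", "lattice_0.20-38", "permute_0.9-4",
         "tidyr_0.8.2", "radiator_0.0.21", "adegenet_2.1.1", "ade4_1.7-13")

# Entries present on one machine but not the other (name or version differs)
only_on_mac <- setdiff(mac, hpc)
only_on_hpc <- setdiff(hpc, mac)

print(only_on_mac)
print(only_on_hpc)
```

Only attached packages appear in these listings, so packages loaded via a namespace (grur, for example, is not listed on either machine) would need a separate check, such as running `packageVersion()` for each on both systems.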
I can't reproduce the error.
The imputation module was moved out of radiator.
It now resides in grur only, because of cross-dependency issues when submitting to CRAN.
genomic_converter will be added to the grur imputations module in the next release of grur, next week.