thierrygosselin / radiator

RADseq Data Exploration, Manipulation and Visualization using R

Home Page: https://thierrygosselin.github.io/radiator/

License: GNU General Public License v3.0

R 46.09% HTML 53.91%

radseq radseq-data imputation genotype-likelihoods filter genetics missingness outliers genomics genomic-data-analysis

radiator's Issues

genomic_converter object 'MARKERS' from genind

Hi Thierry,

I'm having a similar problem as that reported earlier by Tom Jenkins; namely that I get an "Evaluation error: object 'MARKERS' not found" when using genomic_converter.

In my case, my data (attached here as "foo2.csv") are brought in via df2genind. These are simulated SNP data (100 SNPs for 100 individuals) recoded from allele counts of 0/1/2 to 110110/100110/100100. The goal is to impute NAs using RF in radiator. Note that this code pasted below was working in radiator v. 0.0.6; I'm now running 0.0.10.

Based on the solutions for Tom's data, I tried changing the @loc.fac slot so each column was unique (e.g. 0.1 0.2 1.1 1.2...) but this didn't help. Any ideas? Thanks!! -Brenna
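For readers following along, the recoding described above (allele counts 0/1/2 mapped to 110110/100110/100100) can be sketched in base R as below. This is only an illustration on a made-up 2 x 3 matrix: the object names `recode`, `counts` and `coded` are hypothetical and are not taken from the attached foo2.csv.

```r
# Lookup table taken from the description above: allele count -> 6-character genotype
recode <- c("0" = "110110", "1" = "100110", "2" = "100100")

# Made-up allele counts for 2 individuals x 3 SNPs, with one missing genotype
counts <- matrix(c(0, 1, 2, NA, 2, 0), nrow = 2)

# Index the lookup table by the count (as text); NA stays NA
coded <- matrix(unname(recode[as.character(counts)]), nrow = nrow(counts))
coded
```

Each 6-character string holds two 3-character alleles, which is why the df2genind call uses ncode = 3.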

#################################

foo.genind <- df2genind(foo2, ncode = 3, NA.char = NA, ploidy = 2)

foo.genind@pop <- as.factor(rep("PO1", length(sets.length[x])))

foo.genind

/// GENIND OBJECT /////////

// 100 individuals; 100 loci; 200 alleles; size: 132.5 Kb

// Basic content
@tab: 100 x 200 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 2-2)
@loc.fac: locus factor for the 200 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 2-2)
@type: codom
@call: df2genind(X = foo2, ncode = 3, NA.char = NA, ploidy = 2)

// Optional content
@pop: population of each individual (group size range: 1-1)

foo.genind@loc.fac
[1] 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10
[22] 10 11 11 12 12 13 13 14 14 15 15 16 16 17 17 18 18 19 19 20 20
[43] 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 28 29 29 30 30 31
[64] 31 32 32 33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40 41 41
[85] 42 42 43 43 44 44 45 45 46 46 47 47 48 48 49 49 50 50 51 51 52
[106] 52 53 53 54 54 55 55 56 56 57 57 58 58 59 59 60 60 61 61 62 62
[127] 63 63 64 64 65 65 66 66 67 67 68 68 69 69 70 70 71 71 72 72 73
[148] 73 74 74 75 75 76 76 77 77 78 78 79 79 80 80 81 81 82 82 83 83
[169] 84 84 85 85 86 86 87 87 88 88 89 89 90 90 91 91 92 92 93 93 94
[190] 94 95 95 96 96 97 97 98 98 99 99
100 Levels: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ... 99

imp.rf.stackr <- genomic_converter(data=foo.genind, output="tidy", imputation.method="rf", hierarchical.levels="global", verbose = TRUE)
#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: D:/6-NAproject/imputation/filter/temp/temp1
Input file: from global environment
Strata: no
Population levels: no
Population labels: no
Output format(s): tidy, tidy
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: TRUE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no

Imputations options:
imputation.method: rf
hierarchical.levels: global

parallel.core: 7

#######################################################################

Importing data

Error in mutate_impl(.data, dots) :
Evaluation error: object 'MARKERS' not found.

foo2.zip

Error running BayeScan

Hello,
When running run_bayescan I get an error message that stops it running.

#######################################################################
###################### radiator::run_bayescan #########################
#######################################################################

Folder created: 
radiator_bayescan_20190116@1211
For progress, look in the log file: [email protected]
Copying input BayeScan file in folder
Importing BayeScan results
Error in if (length(x) > 1 || grepl("\n", x)) { : 
  missing value where TRUE/FALSE needed

filter_rad issues

Hi Thierry,

A few small bugs I've noticed with filter_rad() in interactive mode:

-Any filter that removes samples/individuals causes failure in the next filtering step:

Error: Column `MISSING_PROP` must be length 203 (the number of rows) or one, not 204

-detect_mixed_genomes doesn't abide by the parallel.core arg in filter_rad(); fixed by adding parallel.core = parallel.core in filter_rad:

gds <- detect_mixed_genomes(data = gds, interactive.filter = interactive.filter, 
    detect.mixed.genomes = detect.mixed.genomes, ind.heterozygosity.threshold = NULL, 
    parameters = filters.parameters, verbose = verbose, parallel.core = parallel.core,
    path.folder = wf, internal = FALSE)

-May need to remove strata=NULL from filter_hwe in filter_rad():

gds <- filter_hwe(data = gds, interactive.filter = interactive.filter, 
    filter.hwe = filter.hwe, strata=NULL, hw.pop.threshold = hw.pop.threshold, 
    midp.threshold = midp.threshold, parallel.core = parallel.core, 
    parameters = filters.parameters, path.folder = wf, verbose = verbose, 
    internal = FALSE)

-when filtering by HWE, interactive mode doesn't always detect the asterisk inputs; I think this happened when I tried setting hw.pop.threshold equal to the number of pops, or it may happen when some strata are removed for having n < 10, but I don't remember exactly. I just tried to re-run on strata where none were removed and didn't get the error.

-in general, is there a way to exit the interactive mode? When the HWE filter couldn't detect my inputs, I had to restart the R session to get out.

-Transferring to genomic_converter requires doing the REF/ALT calibration again. Not a major issue but adds some time.

-purely aesthetic, but when running on Windows, the font choice in the plots (Helvetica?) causes warnings:

In grid.Call(C_textBounds, as.graphicsAnnot(x$label),  ... :
  font family not found in Windows font database

Not issues, but questions/suggestions/requests:
-Are there better explanations for how outliers/q75/iqr are calculated and applied? Is outliers just outside 95% CI?
-filter_coverage step returns a plot of max mean coverage; a plot for min mean would be useful
-In filter_genotyping, is the threshold applied per-strata or only on the total?
-Long LD filtering appears to work but only if pruned WITHOUT missing data statistics when CHROMs represent contigs; pruning with missing data statistics doesn't remove anything. Is there a reason for this? Actually I'm not sure loci are pruned either way. Is it possible to collapse the CHROMs down to a single CHROM to do the long LD filtering?
-outputting the full function call with args entered during the interactive session would help with reproducibility
-asking if you want to run a particular filtering step interactively, e.g. asking if you want to skip calculating HWE since it takes a long time

"Evaluation error" when executing radiator::tidy_genomic_data

Hi Thierry,

I wonder if you can help me with the following problem I have:
I would like to a) transform my VCF data into a tidy data frame using "tidy_genomic_data", then b) fill NAs using the "radiator_imputations_module".
In the first step, this is my code:

mullus.imp <- radiator::tidy_genomic_data(
data = "LD5000_WestMed.recode.vcf",
vcf.metadata = TRUE,
verbose = TRUE)

and this is the resulting output and error:

#######################################################################
##################### radiator::tidy_genomic_data #####################
#######################################################################
Importing and tidying the VCF...
Reading VCF...
Generated a filters parameters file: [email protected]
Number of SNPs: 15990
Number of samples: 317
conversion timing: 13 sec
VCF: biallelic SNPs
Cleaning VCF sample names
Reads assembly: reference-assisted
Filters parameters file: updated
Generating individual stats
Error in mutate_impl(.data, dots) : Evaluation error: One of the nodes produced an error: Can not open file 'C:\Users\kfietz\Desktop\[email protected]'. Der Prozess kann nicht auf die Datei zugreifen, da sie von einem anderen Prozess verwendet wird.

(Translation of the German message: the process cannot access the file because it is being used by another process.)

As I see it, the gds-file is being created at the very moment I execute the command, hence I do not understand how it can be used by another process and what I may do to overcome this error.
Any advice appreciated.

many thanks in advance,
Katharina

Error : (converted from warning) unable to re-encode 'filter_monomorphic.R' line 7

Hi Thierry,
I just tried installing the radiator package. After successfully installing glue, Rtools3.5 and stringi, it fails when trying to install the actual radiator package with the following error message:


Installing package into ‘C:/Users/f/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
* installing *source* package 'radiator' ...
** R
Error : (converted from warning) unable to re-encode 'filter_monomorphic.R' line 7
ERROR: unable to collate and parse R files for package 'radiator'
* removing 'C:/Users/f/Documents/R/win-library/3.5/radiator'
In R CMD INSTALL
Error in i.p(...) : 
  (converted from warning) installation of package ‘C:/Users/f/AppData/Local/Temp/Rtmpm0Pwbl/file34c86193c02/radiator_1.0.0.tar.gz’ had non-zero exit status

Any idea as to what causes this error?
Btw, I tried looking at that link to installation errors - but the dropbox link is dead?

Thanks,
Flo

Monomorphic blacklist exported as "blacklist.momorphic.markers.tsv"

A small typo in one of the monomorphic SNP filtering functions causes the output file to be named blacklist.momorphic.markers.tsv instead of blacklist.monomorphic.markers.tsv.

error while filtering a large VCF file radiator::filter_rad

Hi Thierry,

Thanks for this excellent tool for RADseq data visualization.

I've tried to use the function filter_rad (using the interactive filter) but I wasn't able to complete the filtering because of an Error:

Error in getGlobalsAndPackages(expr, envir = envir, tweak = tweakExpression, : The total size of the 3 globals that need to be exported for the future expression (‘do.call(what = FUN, args = args)’) is 3.64 GiB. This exceeds the maximum allowed size of 1.00 GiB (option 'future.globals.maxSize'). There are three globals: ‘args’ (3.64 GiB of class ‘list’), ‘FUN’ (5.59 KiB of class ‘function’) and ‘progressFifo’ (584 bytes of class ‘numeric’).

Could you please provide advice on how to proceed with a large VCF file (29 GB)?

This is all it got to do:
Folder created:
filter_rad_20180911@1225
Reading VCF...
Large vcf file may take several minutes...
Actually, you have time for a coffee...
conversion timing: 1556 sec

radiator is working on the file ...
VCF is biallelic
Updating markers metadata and stats
[==================================================] 100%, completed in 1s
[==================================================] 100%, completed in 1s
Generating SNP position on read stats
Generating coverage stats
Generating individual stats
[==================================================] 100%, completed in 5s

Missing data (averaged):
markers: 0.07
individuals: 0.07

Coverage info:
individuals mean read depth: 40034180
individuals mean genotype coverage: 14
markers mean coverage: 14

Number of chromosome/contig/scaffold: 2699
Number of locus: 111019
Number of markers: 3022643
Number of individuals: 323

Working time: 2068 sec

############################# IMPORTANT ###############################
Tidying vcf with 3022643 SNPs is not optimal
use radiator::filter_rad to reduce to ~ 10 000 unlinked SNPs
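A note on the future.globals.maxSize error quoted above: the limit is an ordinary R option read by the future package, so one thing to try is raising it before calling filter_rad. The sketch below uses base R only, and the 8 GiB value is an arbitrary illustration chosen to exceed the 3.64 GiB reported in the message. The 1.00 GiB limit in the message differs from the future default of 500 MiB, which suggests the option is already being set somewhere (possibly inside radiator itself), so it is an assumption that a value set by the user beforehand will be honoured.

```r
# Raise the export limit used by the future package.
# 8 GiB is an illustrative value, expressed in bytes.
new.limit <- 8 * 1024^3
options(future.globals.maxSize = new.limit)

# Confirm what the session will now report
getOption("future.globals.maxSize")
```

Independently of this option, the log itself recommends reducing the data to roughly 10 000 unlinked SNPs, which would also shrink what has to be exported to the workers.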

Loading VCF issue

Hi,

I'm trying to generate some basic stats for my STACKS generated haplotype data (nucleotide diversity per individual, number polymorphic/monomorphic loci per sampling site). I have a populations.haps.vcf file and I generated a tsv file with my individuals and strata.

In my understanding I first need to generate a tidy file. However, when I use either tidy_vcf() or tidy_genomic_data() I receive the following error:

tidy.data <- tidy_vcf(data = "populations.haps.vcf", strata = "popmap_2019_LinA.tsv")

Reading VCF
Data summary:
number of samples: 308
number of markers: 7163
done! timing: 1 sec

Filter monomorphic markers
Number of individuals / strata / chrom / locus / SNP:
Blacklisted: 0 / 0 / 0 / 0 / 0

Filter common markers:
Number of individuals / strata / chrom / locus / SNP:
Blacklisted: 0 / 0 / 0 / 43 / 43
Generating individual stats...
Error: Argument 2 must be length 2192960, not 0

What am I missing here?

Best, Diede

Error upon name cleaning, genomic_converter() in R 3.5.1

I try to convert a .vcf.gz file to tidy genomic data in Rstudio:
eu_snps_1 <- genomic_converter( data = "UMBELLA_Erumb1_samples_gt_50pct_covered.recode.vcf.gz", vcf.metadata = TRUE, common.markers = FALSE, strata = "strata_eu.tsv" )
Resulting in this error message and traceback:

#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: /Volumes/pearman-1/lud11_docs/upv_research/projects/eriogonoideae/eriogonoideae/data/GBS/560_samples_20181029/Erumb1
Input file: UMBELLA_Erumb1_samples_gt_50pct_covered.recode.vcf.gz
Strata: strata_eu.tsv
Population levels: no
Population labels: no
Output format(s): tidy
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: TRUE
snp.ld: no
common.markers: FALSE
max.marker: no
pop.select: no
maf.thresholds: no

Imputations options:
imputation.method: no

parallel.core: 15

#######################################################################

Importing data

Error in stringi::stri_replace_all_fixed(str = as.character(x), pattern = c("_", : object 'input' not found

stringi::stri_replace_all_fixed(str = as.character(x), pattern = c("_", ":", " "), replacement = c("-", "-", ""), vectorize_all = FALSE)
radiator::clean_ind_names(input$INDIVIDUALS)
radiator::tidy_genomic_data(data = data, vcf.metadata = vcf.metadata, blacklist.id = blacklist.id, blacklist.genotype = blacklist.genotype, whitelist.markers = whitelist.markers, monomorphic.out = monomorphic.out, max.marker = max.marker, snp.ld = snp.ld, common.markers = common.markers, ...
genomic_converter(data = "UMBELLA_Erumb1_samples_gt_50pct_covered.recode.vcf.gz", vcf.metadata = TRUE, common.markers = FALSE, strata = "strata_eu.tsv")

The files exist:
> file.exists("strata_eu.tsv")
[1] TRUE

> file.exists("UMBELLA_Erumb1_samples_gt_50pct_covered.recode.vcf.gz")
[1] TRUE

The top of the unzipped .vcf.gz looks like this:
##fileformat=VCFv4.0
##fileDate=Thu Oct 25 12:00:44 2018
##source=GBS-SNP-CROP
##phasing=partial
##INFO=<ID=AC,Number=1,Type=Integer,Description="Allele Count">
##INFO=<ID=AF,Number=1,Type=Integer,Description="Allele Frequency">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Average Depth">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Allele Depth">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AC3894 AC3895 AC3896...
Erumb1_s00000011 30280 . C T 40 PASS . GT:DP:AD ./.:0:.,. ./.:0:.,. ./.:0:.,. .....

The top of the tab separated, 561-line strata file looks like this:
INDIVIDUALS STRATA FLOWCELL variety2 MACHINE year
AC3894 PBP1 C9RB9ACXX(2101) munzii 26B 2014
AC3895 PBP1 C9RB9ACXX(2101) munzii 26B 2014
AC3896 PBP1 C9RB9ACXX(2101) munzii 26B 2014
AC3897 PBP1 C9RB9ACXX(2101) munzii 26B 2014
AC3898 PBP1 C9RB9ACXX(2*101) munzii 26B 2014.....

The FLOWCELL names all have an "*" even though only the last one appears here.
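As a side note on the strata layout shown above (tab-separated, with INDIVIDUALS and STRATA columns), the sketch below writes a tiny file of that shape with base R and checks that every sample ID from the VCF header has a row in it. The three IDs are copied from the excerpts above; the file name and the object names are made up for the example, and the check is only a sanity test a user can run, not something radiator is known to do in this form.

```r
# Three rows mirroring the strata excerpt above (extra columns omitted)
strata <- data.frame(
  INDIVIDUALS = c("AC3894", "AC3895", "AC3896"),
  STRATA = c("PBP1", "PBP1", "PBP1"),
  stringsAsFactors = FALSE
)

# Hypothetical file name, written to a temporary directory
strata.file <- file.path(tempdir(), "strata_example.tsv")
write.table(strata, file = strata.file, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back and compare against the sample names from the VCF header line
check <- read.delim(strata.file, stringsAsFactors = FALSE)
vcf.samples <- c("AC3894", "AC3895", "AC3896")
missing.in.strata <- setdiff(vcf.samples, check$INDIVIDUALS)
missing.in.strata
```

An empty result means every VCF sample has a strata entry.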

It looks to me like everything is there that should be. Maybe I am just doing something wrong. I am on this R:

version
_
platform x86_64-apple-darwin15.6.0
arch x86_64
os darwin15.6.0
system x86_64, darwin15.6.0
status
major 3
minor 5.1
year 2018
month 07
day 02
svn rev 74947
language R
version.string R version 3.5.1 (2018-07-02)
nickname Feather Spray

And RStudio version 1.0.153

Work on sexy_function

sexy_markers works on:

To do:

UpSetR plot of common sex markers

Problem with tidy_genomic_data() for multiallelic datasets

Hi Thierry,

I've found an issue with using tidy_genomic_data() with multiallelic datasets. I've been able to trace the problem to the change_alleles() function (called from tidy_genomic_data()), but haven't been able to further pinpoint the problem. Here is a small example:

library(radiator)
library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Warning: package 'dplyr' was built under R version 3.4.2
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag():    dplyr, stats

# A sample tidy biallelic data frame
tidy_dat <- tibble::tribble(
  ~INDIVIDUALS, ~POP_ID, ~LOCUS, ~GT,
  "IND1",    "POP1",  "loc1", "000000",
  "IND2",    "POP1",  "loc1", "001002"
)

# This works as expected
change_alleles(tidy_dat, verbose = TRUE)
#>     Scanning for number of alleles per marker...
#>     Data is biallellic
#>     Generating vcf-style coding
#> $input
#> # A tibble: 2 x 8
#>   INDIVIDUALS POP_ID MARKERS     GT   ALT   REF GT_VCF GT_BIN
#>         <chr>  <chr>   <chr>  <chr> <chr> <chr>  <chr>  <dbl>
#> 1        IND1   POP1    loc1 000000     C     A    ./.     NA
#> 2        IND2   POP1    loc1 001002     C     A    0/1      1
#> 
#> $biallelic
#> [1] TRUE

# A sample tidy multi-allelic data frame
tidy_dat2 <- tibble::tribble(
  ~INDIVIDUALS, ~POP_ID, ~LOCUS, ~GT,
  "IND1",    "POP1",  "loc1", "000000",
  "IND2",    "POP1",  "loc1", "001002",
  "IND3",    "POP1",  "loc1", "003004"
)

# This has an unexpected output
change_alleles(tidy_dat2, verbose = TRUE)
#>     Scanning for number of alleles per marker...
#>     Data is multiallellic
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading

#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading

#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading

#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading

#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading

#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Integrating new genotype codings...
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading

#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading

#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading

#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading

#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading

#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading

#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> $input
#> # A tibble: 3 x 8
#>   INDIVIDUALS POP_ID MARKERS    GT_VCF_NUC    REF           ALT     GT
#>         <chr>  <chr>   <chr>         <chr>  <chr>         <chr>  <chr>
#> 1        IND1   POP1    loc1 000000/000000 000000 001002,003004 001001
#> 2        IND2   POP1    loc1 001002/001002 000000 001002,003004 002002
#> 3        IND3   POP1    loc1 003004/003004 000000 001002,003004 003003
#> # ... with 1 more variables: GT_VCF <chr>
#> 
#> $biallelic
#> [1] FALSE

-Chris

R crashes when attempting to import VCF file from Stacks2

Hi Thierry,

I'm trying to import a .vcf file produced by gstacks command of Stacks2.
Variants were called from mapped BAM files, not de novo.

Initially I had this error:

Error in sample.int(length(x), size, replace, prob) : 
  cannot take a sample larger than the population when 'replace = FALSE'

This happens because the current Stacks2 VCF output leaves the ID field blank (.), so there are fewer than 100 unique LOCUS values to sample from and tidy_vcf() fails at this command:

# Since stacks v.1.44 ID as LOCUS + COL (from sumstats) the position of the SNP on the locus.
  # Choose the first 100 markers to scan
  detect.snp.col <- sample(x = unique(input$LOCUS), size = 100, replace = FALSE) %>%
    stringi::stri_detect_fixed(str = ., pattern = "_") %>%
    unique
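One way to make a scan like the one above robust to a blank ID column is to cap the sample size at the number of unique values available. The sketch below is a stand-alone illustration in base R: `detect_snp_col` is a hypothetical helper, not a radiator function, and unlike the quoted code it returns a single TRUE/FALSE rather than the unique detection results.

```r
# Hypothetical helper: scan at most n unique loci for an underscore.
detect_snp_col <- function(locus, n = 100L) {
  loci <- unique(locus)
  # Never ask for more items than exist when sampling without replacement.
  size <- min(n, length(loci))
  picked <- loci[sample.int(length(loci), size = size, replace = FALSE)]
  any(grepl("_", picked, fixed = TRUE))
}

# Blank IDs, as in the Stacks2 VCF described above: only one unique value
blank.ids <- detect_snp_col(rep(".", 500))

# IDs in the LOCUS_COL style mentioned in the code comment
locus.col.ids <- detect_snp_col(c("12_35", "12_47", "13_8"))
```

With the cap in place the blank-ID case returns without hitting the "cannot take a sample larger than the population" error.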

To work around this, I tried changing the ID column (imported as LOCUS) to the form mentioned in tidy_vcf() (CHROM_POS), or to just POS, but then R crashes completely without giving any message.

Any ideas?

Thanks, Ido

Issues with 'genomic_converter()'

I am trying to convert a VCF file to another format, i.e. genind. The program recognizes the files and identifies the correct number of individuals and SNPs but fails to complete the conversion. I keep receiving the following message:

Error in .DynamicClusterCall(cl, length(cl), .fun = function(.proc_idx, : One of the nodes produced an error: Can not open file 'C:\Users\Documents\Data_Analysis\134_radiator_genomic_converter_20190323@1302\[email protected]'. The process cannot access the file because it is being used by another process.

Issues using conversion functions with vcf

I've been getting errors trying to use genomic_converter() and write_colony().
I generated my original .vcf file in ipyrad, and then used PLINK to do some preliminary filtering (removing triallelic loci, and filtering by maf), but I get the same error regardless of which vcf file I use. I've previously used these files for some analyses in adegenet, so I'm not sure where the origin of the error is.
Here's my code:

ash_info<-read.csv('C:/Users/smart/Desktop/plink2_win64_20181028/sca_ASH_info3.csv')
ash_strata<-ash_info[,c(1,11)]
colnames(ash_strata)<-c("INDIVIDUALS","STRATA")
bad_ids<-as.data.frame(c("sca1194"))
colnames(bad_ids)<-c("INDIVIDUALS")
ash_output<-genomic_converter("C:/Users/smart/Desktop/plink2_win64_20181028/ASH_sept_2018_vcf_filter.vcf", strata=ash_strata, imputation.method = "rf", blacklist.id =bad_ids, output=c('genind','tidy'))

which returns the following:

#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: C:/tester
Input file: C:/Users/smart/Desktop/plink2_win64_20181028/ASH_sept_2018_vcf_filter.vcf
Strata: 1:109c(5, 5, 3, 4, 8, 15, 9, 8, 8, 3, 3, 12, 8, 6, 15, 8, 5, 2, 1, 2, 13, 12, 1, 13, 6, 2, 6, 6, 4, 2, 4, 14, 3, 3, 3, 14, 5, 3, 12, 10, 3, 2, 1, 3, 6, 6, 6, 3, 13, 1, 1, 3, 3, 3, 3, 3, 13, 12, 3, 15, 8, 3, 8, 11, 11, 11, 5, 3, 3, 3, 6, 6, 14, 11, 8, 6, 15, 10, 6, 4, 3, 11, 13, 11, 6, 3, 2, 3, 3, 6, 8, 8, 8, 8, 9, 11, 2, 12, 2, 6, 6, 7, 7, 5, 5, 5, 5, 5, 5)
Population levels: no
Population labels: no
Output format(s): tidy
Filename prefix: no
Filters: 
Blacklist of individuals: 1
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: TRUE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no

Imputations options:
imputation.method: rf
hierarchical.levels: strata

parallel.core: 3

#######################################################################

Importing data

Number of individuals in blacklist: 1

Reading VCF...
Error in if (check.header$format$Number[check.header$format$ID == "AD"] ==  : 
  argument is of length zero

I get the same error if I use the original vcf from ipyrad too.
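On the "argument is of length zero" message above: in R this is what `if` reports when its condition is an empty vector, which is what a lookup of the AD entry returns when the VCF header declares no AD FORMAT field. That the file here lacks AD is one reading of the log, not something confirmed in the thread. The sketch below reproduces the mechanism with a made-up header table; `format.header` and the compared value "1" are placeholders, since the real comparison is truncated in the error message.

```r
# Made-up stand-in for a parsed FORMAT header that has GT and DP but no AD
format.header <- data.frame(
  ID = c("GT", "DP"),
  Number = c("1", "1"),
  stringsAsFactors = FALSE
)

# Looking up a field that is absent gives a zero-length vector
ad.number <- format.header$Number[format.header$ID == "AD"]
length(ad.number)

# Using that vector directly as an if() condition fails; capture the message
result <- tryCatch(
  if (ad.number == "1") "one value" else "other layout",
  error = function(e) conditionMessage(e)
)

# A guard that checks for presence first avoids the error
has.ad <- length(ad.number) == 1L
ad.matches <- has.ad && ad.number == "1"
```

Checking the header of the VCF for a `##FORMAT=<ID=AD` line is a quick way to see whether this applies to a given file.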
Out of curiosity, I also tried using vcfR to import the vcf file, convert it to a genind object, and use that in the genomic converter function as below:

ash_vcf<-read.vcfR("C:/Users/smart/Desktop/plink2_win64_20181028/ASH_sept_2018_vcf_filter.vcf")
ash_genind<-vcfR2genind(ash_vcf)
ash_genind@pop<-ash_info[,11]
ash_output<-genomic_converter(ash_genind, blacklist.id = bad_ids, snp.ld = "random")

Which returns this error:

#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: C:/tester
Input file: from global environment
Strata: no
Population levels: no
Population labels: no
Output format(s): tidy
Filename prefix: no
Filters: 
Blacklist of individuals: 1
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: TRUE
snp.ld: random
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no

Imputations options:
imputation.method: no

parallel.core: 3

#######################################################################

Importing data

Number of individuals in blacklist: 1
Alleles names for each markers will be converted to factors and padded with 0
Error in .f(.x[[i]], ...) : object 'CHROM' not found
In addition: There were 27 warnings (use warnings() to see them)

The 27 warnings are all:

In serialize(data, node$con) :
  'package:dplyr' may not be available when loading

And my sessionInfo:

R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets 
[7] methods   base     

other attached packages:
 [1] grur_0.0.11       bindrcpp_0.2.2    vcfR_1.8.0       
 [4] radiator_0.0.19   viridisLite_0.3.0 geoR_1.7-5.2.1   
 [7] fields_9.6        maps_3.3.0        spam_2.2-0       
[10] dotCall64_1.0-0   INLA_18.07.12     sp_1.3-1         
[13] Matrix_1.2-14     adegenet_2.1.1    ade4_1.7-13      
[16] forcats_0.3.0     stringr_1.3.1     dplyr_0.7.8      
[19] purrr_0.2.5       readr_1.1.1       tidyr_0.8.2      
[22] tibble_1.4.2      ggplot2_3.1.0     tidyverse_1.2.1  

loaded via a namespace (and not attached):
  [1] readxl_1.1.0             uuid_0.1-2              
  [3] backports_1.1.2          plyr_1.8.4              
  [5] igraph_1.2.2             lazyeval_0.2.1          
  [7] splines_3.5.1            listenv_0.7.0           
  [9] rncl_0.8.3               GenomeInfoDb_1.18.0     
 [11] amap_0.8-16              digest_0.6.18           
 [13] htmltools_0.3.6          gdata_2.18.0            
 [15] magrittr_1.5             RandomFieldsUtils_0.3.25
 [17] cluster_2.0.7-1          Biostrings_2.50.1       
 [19] globals_0.12.4           modelr_0.1.2            
 [21] gmodels_2.18.1           prettyunits_1.0.2       
 [23] colorspace_1.3-2         rvest_0.3.2             
 [25] haven_1.1.2              tcltk_3.5.1             
 [27] crayon_1.3.4             RCurl_1.95-4.11         
 [29] jsonlite_1.5             bindr_0.1.1             
 [31] phylobase_0.8.4          ape_5.2                 
 [33] glue_1.3.0               gtable_0.2.0            
 [35] zlibbioc_1.28.0          XVector_0.22.0          
 [37] seqinr_3.4-5             BiocGenerics_0.28.0     
 [39] adegraphics_1.0-12       scales_1.0.0            
 [41] Rcpp_1.0.0               RandomFields_3.1.50     
 [43] xtable_1.8-3             progress_1.2.0          
 [45] spData_0.2.9.4           spdep_0.7-9             
 [47] stats4_3.5.1             httr_1.3.1              
 [49] RColorBrewer_1.1-2       pkgconfig_2.0.2         
 [51] XML_3.98-1.16            deldir_0.1-15           
 [53] SeqArray_1.23.1          tidyselect_0.2.5        
 [55] rlang_0.3.0.1            reshape2_1.4.3          
 [57] later_0.7.5              munsell_0.5.0           
 [59] pbmcapply_1.3.0          adephylo_1.1-11         
 [61] cellranger_1.1.0         tools_3.5.1             
 [63] cli_1.0.1                splancs_2.01-40         
 [65] broom_0.5.0              evaluate_0.12           
 [67] yaml_2.2.0               knitr_1.20              
 [69] gdsfmt_1.18.0            adespatial_0.3-2        
 [71] future_1.10.0            nlme_3.1-137            
 [73] mime_0.6                 xml2_1.2.0              
 [75] compiler_3.5.1           rstudioapi_0.8          
 [77] RNeXML_2.2.0             stringi_1.2.4           
 [79] memuse_4.0-0             lattice_0.20-38         
 [81] vegan_2.5-3              permute_0.9-4           
 [83] pillar_1.3.0             LearnBayes_2.15.1       
 [85] data.table_1.11.8        cowplot_0.9.3           
 [87] bitops_1.0-6             httpuv_1.4.5            
 [89] GenomicRanges_1.34.0     R6_2.3.0                
 [91] latticeExtra_0.6-28      promises_1.0.1          
 [93] KernSmooth_2.23-15       IRanges_2.16.0          
 [95] codetools_0.2-15         boot_1.3-20             
 [97] MASS_7.3-51.1            gtools_3.8.1            
 [99] assertthat_0.2.0         rprojroot_1.3-2         
[101] withr_2.1.2              pinfsc50_1.1.0          
[103] GenomeInfoDbData_1.2.0   S4Vectors_0.20.0        
[105] mgcv_1.8-25              expm_0.999-3            
[107] parallel_3.5.1           hms_0.4.2               
[109] fst_0.8.8                coda_0.19-2             
[111] rmarkdown_1.10           ggpubr_0.1.8            
[113] shiny_1.2.0              lubridate_1.7.4         
[115] base64enc_0.1-3

I'm guessing something is wrong on my end, but I've been trying to troubleshoot this for the last few days, and not really sure what else to do. I'll email you the other vcf file and the pop info file.

Thanks again for any help with this,
-Scott

filter_hwe

Hi Thierry,
Thank you for a terrific package!
My problem is that after filter_hwe, I do not get a working file, and therefore, I cannot proceed with my analysis. I am sending an RData zip file hopefully with what you need. I understand the function should produce a .rad file, but I cannot find it. Also, if it would save a file, what type of file would it be, genind, genepop, etc?
Many thanks,
Rita

Describe the bug
filter_hwe does not save the resulting file without markers that have a certain number of pops in Hardy-Weinberg disequilibrium.

To Reproduce
radiator::filter_hwe(radiator.gen,
interactive.filter = TRUE,
filter.hwe = TRUE,
strata = NULL,
hw.pop.threshold = TRUE,
midp.threshold = "***",
filename = NULL,
parallel.core = parallel::detectCores() - 1,
verbose = TRUE)

the complete error message you're getting
Content of folder:02_filter_hwe_20190812@2233
[email protected]
genotypes.summary.tsv
hw.pop.sum.tsv
hwd.helper.table.tsv
hwd.plot.blacklist.markers.pdf
[email protected]
the output of devtools::session_info()

devtools::session_info()
─ Session info ─────────────────────────────────────────────────────────
setting value
version R version 3.6.1 (2019-07-05)
os macOS Mojave 10.14.5
system x86_64, darwin15.6.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Lisbon
date 2019-08-12

─ Packages ─────────────────────────────────────────────────────────────
package * version date lib source
abind 1.4-5 2016-07-21 [1] CRAN (R 3.6.0)
acepack 1.4.1 2016-10-29 [1] CRAN (R 3.6.0)
ade4 * 1.7-13 2018-08-31 [1] CRAN (R 3.6.0)
adegenet * 2.1.1 2018-02-02 [1] CRAN (R 3.6.0)
ape * 5.3 2019-03-17 [1] CRAN (R 3.6.0)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
base64enc 0.1-3 2015-07-28 [1] CRAN (R 3.6.0)
bayesm 3.1-3 2019-07-29 [1] CRAN (R 3.6.0)
BDgraph 2.60 2019-08-08 [1] CRAN (R 3.6.0)
bitops 1.0-6 2013-08-17 [1] CRAN (R 3.6.0)
boot 1.3-23 2019-07-05 [1] CRAN (R 3.6.0)
broom 0.5.2 2019-04-07 [1] CRAN (R 3.6.0)
calibrate 1.7.2 2013-09-10 [1] CRAN (R 3.6.0)
callr 3.3.1 2019-07-18 [1] CRAN (R 3.6.0)
caTools 1.17.1.2 2019-03-06 [1] CRAN (R 3.6.0)
checkmate 1.9.4 2019-07-04 [1] CRAN (R 3.6.0)
class 7.3-15 2019-01-01 [1] CRAN (R 3.6.1)
classInt 0.4-1 2019-08-06 [1] CRAN (R 3.6.1)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
cluster 2.1.0 2019-06-19 [1] CRAN (R 3.6.1)
coda 0.19-3 2019-07-05 [1] CRAN (R 3.6.0)
codetools 0.2-16 2018-12-24 [1] CRAN (R 3.6.1)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.6.0)
combinat 0.0-8 2012-10-29 [1] CRAN (R 3.6.0)
compositions 1.40-2 2018-06-14 [1] CRAN (R 3.6.0)
corpcor 1.6.9 2017-04-01 [1] CRAN (R 3.6.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
crosstalk 1.0.0 2016-12-21 [1] CRAN (R 3.6.0)
curl 4.0 2019-07-22 [1] CRAN (R 3.6.0)
d3Network 0.5.2.1 2015-01-31 [1] CRAN (R 3.6.0)
dartR * 1.3.4 2019-08-11 [1] Github (green-striped-gecko/dartR@3f9eebd)
data.table 1.12.2 2019-04-07 [1] CRAN (R 3.6.0)
DBI 1.0.0 2018-05-02 [1] CRAN (R 3.6.0)
deldir 0.1-23 2019-07-31 [1] CRAN (R 3.6.0)
DEoptimR 1.0-8 2016-11-19 [1] CRAN (R 3.6.0)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
devtools * 2.1.0 2019-07-06 [1] CRAN (R 3.6.0)
digest 0.6.20 2019-07-04 [1] CRAN (R 3.6.0)
directlabels 2018.05.22 2018-05-25 [1] CRAN (R 3.6.0)
dismo 1.1-4 2017-01-09 [1] CRAN (R 3.6.0)
diveRsity * 1.9.90 2017-04-04 [1] CRAN (R 3.6.0)
doParallel * 1.0.15 2019-08-02 [1] CRAN (R 3.6.0)
dplyr 0.8.3 2019-07-04 [1] CRAN (R 3.6.0)
e1071 1.7-2 2019-06-05 [1] CRAN (R 3.6.0)
ellipse 0.4.1 2018-01-05 [1] CRAN (R 3.6.0)
energy 1.7-6 2019-07-06 [1] CRAN (R 3.6.0)
expm 0.999-4 2019-03-21 [1] CRAN (R 3.6.0)
fansi 0.4.0 2018-10-05 [1] CRAN (R 3.6.0)
fastmatch 1.1-0 2017-01-28 [1] CRAN (R 3.6.0)
fdrtool 1.2.15 2015-07-08 [1] CRAN (R 3.6.0)
foreach * 1.4.7 2019-07-27 [1] CRAN (R 3.6.0)
foreign 0.8-72 2019-08-02 [1] CRAN (R 3.6.0)
Formula 1.2-3 2018-05-03 [1] CRAN (R 3.6.0)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
fst 0.9.0 2019-04-09 [1] CRAN (R 3.6.0)
gap 1.2.1 2019-06-05 [1] CRAN (R 3.6.0)
gdata 2.18.0 2017-06-06 [1] CRAN (R 3.6.0)
gdistance 1.2-2 2018-05-07 [1] CRAN (R 3.6.0)
gdsfmt 1.20.0 2019-05-02 [1] Bioconductor
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
genetics 1.3.8.1.2 2019-04-22 [1] CRAN (R 3.6.0)
GGally 1.4.0 2018-05-17 [1] CRAN (R 3.6.0)
ggm 2.3 2015-01-21 [1] CRAN (R 3.6.0)
ggplot2 * 3.2.1 2019-08-10 [1] CRAN (R 3.6.1)
ggpubr * 0.2.2 2019-08-07 [1] CRAN (R 3.6.0)
ggsignif 0.6.0 2019-08-08 [1] CRAN (R 3.6.1)
ggtern 3.1.0 2018-12-19 [1] CRAN (R 3.6.0)
glasso 1.10 2018-07-13 [1] CRAN (R 3.6.0)
glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
gmodels 2.18.1 2018-06-25 [1] CRAN (R 3.6.0)
gplots * 3.0.1.1 2019-01-27 [1] CRAN (R 3.6.0)
gridExtra 2.3 2017-09-09 [1] CRAN (R 3.6.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.0)
gtools 3.8.1 2018-06-26 [1] CRAN (R 3.6.0)
HardyWeinberg 1.6.3 2019-06-29 [1] CRAN (R 3.6.0)
hierfstat * 0.04-22 2015-12-04 [1] CRAN (R 3.6.0)
Hmisc 4.2-0 2019-01-26 [1] CRAN (R 3.6.0)
hms 0.5.0 2019-07-09 [1] CRAN (R 3.6.0)
htmlTable 1.13.1 2019-01-07 [1] CRAN (R 3.6.0)
htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.6.0)
htmlwidgets 1.3 2018-09-30 [1] CRAN (R 3.6.0)
httpuv 1.5.1 2019-04-05 [1] CRAN (R 3.6.0)
huge 1.3.2 2019-04-08 [1] CRAN (R 3.6.0)
HWxtest * 1.1.9 2019-05-31 [1] CRAN (R 3.6.0)
igraph 1.2.4.1 2019-04-22 [1] CRAN (R 3.6.0)
iterators * 1.0.12 2019-07-26 [1] CRAN (R 3.6.0)
jomo 2.6-9 2019-07-29 [1] CRAN (R 3.6.0)
jpeg 0.1-8 2014-01-23 [1] CRAN (R 3.6.0)
jsonlite 1.6 2018-12-07 [1] CRAN (R 3.6.0)
KernSmooth 2.23-15 2015-06-29 [1] CRAN (R 3.6.1)
knitr 1.24 2019-08-08 [1] CRAN (R 3.6.1)
labeling 0.3 2014-08-23 [1] CRAN (R 3.6.0)
later 0.8.0 2019-02-11 [1] CRAN (R 3.6.0)
latex2exp 0.4.0 2015-11-30 [1] CRAN (R 3.6.0)
lattice 0.20-38 2018-11-04 [1] CRAN (R 3.6.1)
latticeExtra 0.6-28 2016-02-09 [1] CRAN (R 3.6.0)
lavaan 0.6-4 2019-07-03 [1] CRAN (R 3.6.0)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.6.0)
leaflet 2.0.2 2018-08-27 [1] CRAN (R 3.6.0)
LearnBayes 2.15.1 2018-03-18 [1] CRAN (R 3.6.0)
lme4 1.1-21 2019-03-05 [1] CRAN (R 3.6.0)
magrittr * 1.5 2014-11-22 [1] CRAN (R 3.6.0)
manipulateWidget 0.10.0 2018-06-11 [1] CRAN (R 3.6.0)
MASS 7.3-51.4 2019-03-31 [1] CRAN (R 3.6.1)
Matrix 1.2-17 2019-03-22 [1] CRAN (R 3.6.1)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
mgcv 1.8-28 2019-03-21 [1] CRAN (R 3.6.1)
mice 3.6.0 2019-07-10 [1] CRAN (R 3.6.0)
mime 0.7 2019-06-11 [1] CRAN (R 3.6.0)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 3.6.0)
minqa 1.2.4 2014-10-09 [1] CRAN (R 3.6.0)
mitml 0.3-7 2019-01-07 [1] CRAN (R 3.6.0)
mmod * 1.3.3 2017-04-06 [1] CRAN (R 3.6.0)
mnormt 1.5-5 2016-10-15 [1] CRAN (R 3.6.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.0)
mvtnorm 1.0-11 2019-06-19 [1] CRAN (R 3.6.0)
nlme 3.1-141 2019-08-01 [1] CRAN (R 3.6.0)
nloptr 1.2.1 2018-10-03 [1] CRAN (R 3.6.0)
nnet 7.3-12 2016-02-02 [1] CRAN (R 3.6.1)
pan 1.6 2018-06-29 [1] CRAN (R 3.6.0)
pander * 0.6.3 2018-11-06 [1] CRAN (R 3.6.0)
pbapply 1.4-1 2019-07-15 [1] CRAN (R 3.6.1)
pbivnorm 0.6.0 2015-01-23 [1] CRAN (R 3.6.0)
pbmcapply 1.5.0 2019-07-10 [1] CRAN (R 3.6.0)
pca3d 0.10 2017-02-17 [1] CRAN (R 3.6.0)
pegas * 0.11 2018-07-09 [1] CRAN (R 3.6.0)
permute 0.9-5 2019-03-12 [1] CRAN (R 3.6.0)
phangorn 2.5.5 2019-06-19 [1] CRAN (R 3.6.0)
pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0)
pinfsc50 1.1.0 2016-12-02 [1] CRAN (R 3.6.0)
pixmap * 0.4-11 2011-07-19 [1] CRAN (R 3.6.0)
pkgbuild 1.0.4 2019-08-05 [1] CRAN (R 3.6.1)
pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.6.0)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
plyr 1.8.4 2016-06-08 [1] CRAN (R 3.6.0)
png 0.1-7 2013-12-03 [1] CRAN (R 3.6.0)
polysat 1.7-4 2019-03-06 [1] CRAN (R 3.6.0)
PopGenReport 3.0.4 2019-02-04 [1] CRAN (R 3.6.0)
poppr * 2.8.3 2019-06-18 [1] CRAN (R 3.6.0)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
promises 1.0.1 2018-04-13 [1] CRAN (R 3.6.0)
proto 1.0.0 2016-10-29 [1] CRAN (R 3.6.0)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
psych 1.8.12 2019-01-12 [1] CRAN (R 3.6.0)
purrr 0.3.2 2019-03-15 [1] CRAN (R 3.6.0)
qgraph 1.6.3 2019-06-19 [1] CRAN (R 3.6.0)
quadprog 1.5-7 2019-05-06 [1] CRAN (R 3.6.0)
qvalue 2.16.0 2019-05-02 [1] Bioconductor
R.methodsS3 1.7.1 2016-02-16 [1] CRAN (R 3.6.0)
R.oo 1.22.0 2018-04-22 [1] CRAN (R 3.6.0)
R.utils 2.9.0 2019-06-13 [1] CRAN (R 3.6.0)
R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0)
radiator * 1.1.1 2019-08-12 [1] Github (c9804ca)
raster 2.9-23 2019-07-11 [1] CRAN (R 3.6.0)
RColorBrewer * 1.1-2 2014-12-07 [1] CRAN (R 3.6.0)
Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.0)
readr 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
reshape 0.8.8 2018-10-23 [1] CRAN (R 3.6.0)
reshape2 * 1.4.3 2017-12-11 [1] CRAN (R 3.6.0)
rgdal 1.4-4 2019-05-29 [1] CRAN (R 3.6.0)
rgeos 0.5-1 2019-08-05 [1] CRAN (R 3.6.0)
rgl 0.100.26 2019-07-08 [1] CRAN (R 3.6.0)
RgoogleMaps 1.4.3 2018-11-07 [1] CRAN (R 3.6.0)
rJava * 0.9-11 2019-03-29 [1] CRAN (R 3.6.0)
rjson 0.2.20 2018-06-08 [1] CRAN (R 3.6.0)
rlang 0.4.0 2019-06-25 [1] CRAN (R 3.6.0)
robustbase 0.93-5 2019-05-12 [1] CRAN (R 3.6.0)
rpart 4.1-15 2019-04-12 [1] CRAN (R 3.6.1)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
rrBLUP 4.6 2018-01-28 [1] CRAN (R 3.6.0)
Rsolnp 1.16 2015-12-28 [1] CRAN (R 3.6.0)
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.0)
rtiff * 1.4.6 2019-03-21 [1] CRAN (R 3.6.0)
scales 1.0.0 2018-08-09 [1] CRAN (R 3.6.0)
sendplot * 4.0.0 2013-04-25 [1] CRAN (R 3.6.0)
seqinr 3.4-5 2017-08-01 [1] CRAN (R 3.6.0)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
sf 0.7-7 2019-07-24 [1] CRAN (R 3.6.0)
shiny 1.3.2 2019-04-22 [1] CRAN (R 3.6.0)
SNPRelate 1.18.1 2019-07-03 [1] Bioconductor
sp 1.3-1 2018-06-05 [1] CRAN (R 3.6.0)
spData 0.3.0 2019-01-07 [1] CRAN (R 3.6.0)
spdep 1.1-2 2019-04-05 [1] CRAN (R 3.6.0)
StAMPP * 1.5.1 2017-11-10 [1] CRAN (R 3.6.0)
stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0)
survival 2.44-1.1 2019-04-01 [1] CRAN (R 3.6.1)
tensorA 0.36.1 2018-07-29 [1] CRAN (R 3.6.0)
testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.0)
tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
tidyr 0.8.3 2019-03-01 [1] CRAN (R 3.6.0)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0)
truncnorm 1.0-8 2018-02-27 [1] CRAN (R 3.6.0)
units 0.6-3 2019-05-03 [1] CRAN (R 3.6.0)
usethis * 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
utf8 1.1.4 2018-05-24 [1] CRAN (R 3.6.0)
vcfR 1.8.0 2018-04-17 [1] CRAN (R 3.6.0)
vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.0)
vegan 2.5-5 2019-05-12 [1] CRAN (R 3.6.0)
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 3.6.0)
webshot 0.5.1 2018-09-28 [1] CRAN (R 3.6.0)
whisker 0.3-2 2013-04-28 [1] CRAN (R 3.6.0)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
xfun 0.8 2019-06-25 [1] CRAN (R 3.6.0)
xlsx * 0.6.1 2018-06-11 [1] CRAN (R 3.6.0)
xlsxjars * 0.6.1 2014-08-22 [1] CRAN (R 3.6.0)
xtable 1.8-4 2019-04-21 [1] CRAN (R 3.6.0)
zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
zvau * 0.27 2019-08-06 [1] Github (romunov/zvau@72f403b)

[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

whelk.RData.zip

complete data required to reproduce the problem, a subset of it. The data remains confidential.

radiator::filter_rad generating individual stats error

Hi Thierry,

I am attempting to use radiator::filter_rad to filter a populations.snps.vcf file from Stacks, but I consistently get an error at the "Generating individual stats" step, and then I do not get the output files I've specified. Based on testing with different arguments and versions of the input file (including after filtering out some missing data in vcftools) and on reading other issues here, I suspect there might be a problem with my vcf header that I'm unaware of.

I appreciate any help you can offer. Thanks for making this package and I hope to be able to make it part of my workflow!

-Sarah

the exact command (function, arguments, values) used

All of these lead to the same error:
myfiltereddata <- filter_rad(
data = "populations.snps.vcf",
filter.short.ld = "random",
filter.long.ld = 0.5,
filter.hwe = TRUE,
strata = "strata_locyear.txt",
output = c("vcf","genind"),
filename = "filter_rad_output")

myfiltereddata <- filter_rad(
data = "populations.snps.vcf",
filter.short.ld = "random",
filter.long.ld = 0.5,
filter.hwe = TRUE,
strata = NULL,
output = c("vcf","genind"),
filename = "filter_rad_output")

myfiltereddata <- filter_rad(
data = "populations.snps.vcf",
strata = NULL,
interactive.filter = TRUE)

myfiltereddata <- filter_rad(
data = "populations.snps.vcftools.recode",
strata = NULL,
interactive.filter = TRUE)

myfiltereddata <- filter_rad(
data = "RH_subset.vcf",
strata = NULL,
interactive.filter = TRUE)

the complete error message you're getting

myfiltereddata <- filter_rad(
data = "RH_subset.vcf",
strata = NULL,
interactive.filter = TRUE)
################################################################################
############################# radiator::filter_rad #############################
################################################################################
The function arguments names have changed: please read documentation

Execution date@time: 20190820@0008
Folder created: filter_rad_20190820@0008
Function call and arguments stored in: [email protected]
File written: random.seed (563656)
Filters parameters file generated: [email protected]

Reading VCF
Data summary:
number of samples: 94
number of markers: 185
done! timing: 0 sec

Generating individual stats...
Error in if (stats::sd(id.info$COVERAGE_MEAN) != 0) { :
missing value where TRUE/FALSE needed

Computation time, overall: 1 sec
############################# completed filter_rad #############################
RH_subset.vcf.zip
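
A note on the error above: the message "missing value where TRUE/FALSE needed" is what base R reports when the condition inside `if ()` evaluates to NA. The sketch below is not radiator's code; it only reproduces that behaviour in base R. `coverage_mean` is a hypothetical stand-in for `id.info$COVERAGE_MEAN`, and the assumption that some individuals have no usable coverage value is a guess, not something confirmed in this thread.

```r
# Hypothetical stand-in for id.info$COVERAGE_MEAN with one undefined value
coverage_mean <- c(12.5, NA, 9.8)

# sd() propagates NA, so the comparison below is NA and if() cannot decide
sd_value <- stats::sd(coverage_mean)
msg <- tryCatch(
  {
    if (sd_value != 0) "variable coverage" else "constant coverage"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# A guarded check: drop NAs first, then test that the result is defined
sd_safe <- stats::sd(coverage_mean, na.rm = TRUE)
has_variation <- !is.na(sd_safe) && sd_safe != 0
print(has_variation)
```

If this is what happens here, checking whether the VCF carries per-sample depth information for every individual may be a useful first step.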

the output of devtools::session_info()

devtools::session_info()
─ Session info ──────────────────────────────────────────────────────────────────────────────────
setting value
version R version 3.6.0 (2019-04-26)
os macOS High Sierra 10.13.6
system x86_64, darwin15.6.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2019-08-20

─ Packages ──────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
Biobase 2.44.0 2019-05-02 [1] Bioconductor
BiocGenerics 0.30.0 2019-05-02 [1] Bioconductor
Biostrings 2.52.0 2019-05-02 [1] Bioconductor
bitops 1.0-6 2013-08-17 [1] CRAN (R 3.6.0)
boot 1.3-23 2019-07-05 [1] CRAN (R 3.6.0)
broom 0.5.2 2019-04-07 [1] CRAN (R 3.6.0)
callr 3.3.1 2019-07-18 [1] CRAN (R 3.6.0)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
data.table 1.12.2 2019-04-07 [1] CRAN (R 3.6.0)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
devtools 2.1.0 2019-07-06 [1] CRAN (R 3.6.0)
digest 0.6.20 2019-07-04 [1] CRAN (R 3.6.0)
dplyr 0.8.3 2019-07-04 [1] CRAN (R 3.6.0)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
gdsfmt 1.20.0 2019-05-02 [1] Bioconductor
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
GenomeInfoDb 1.20.0 2019-05-02 [1] Bioconductor
GenomeInfoDbData 1.2.1 2019-08-02 [1] Bioconductor
GenomicRanges 1.36.0 2019-05-02 [1] Bioconductor
glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
GWASExactHW 1.01 2013-01-05 [1] CRAN (R 3.6.0)
hms 0.5.0 2019-07-09 [1] CRAN (R 3.6.0)
IRanges 2.18.1 2019-05-31 [1] Bioconductor
jomo 2.6-9 2019-07-29 [1] CRAN (R 3.6.0)
lattice 0.20-38 2018-11-04 [1] CRAN (R 3.6.0)
lme4 1.1-21 2019-03-05 [1] CRAN (R 3.6.0)
logistf 1.23 2018-07-19 [1] CRAN (R 3.6.0)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
MASS 7.3-51.4 2019-03-31 [1] CRAN (R 3.6.0)
Matrix 1.2-17 2019-03-22 [1] CRAN (R 3.6.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
mgcv 1.8-28 2019-03-21 [1] CRAN (R 3.6.0)
mice 3.6.0 2019-07-10 [1] CRAN (R 3.6.0)
minqa 1.2.4 2014-10-09 [1] CRAN (R 3.6.0)
mitml 0.3-7 2019-01-07 [1] CRAN (R 3.6.0)
nlme 3.1-140 2019-05-12 [1] CRAN (R 3.6.0)
nloptr 1.2.1 2018-10-03 [1] CRAN (R 3.6.0)
nnet 7.3-12 2016-02-02 [1] CRAN (R 3.6.0)
pan 1.6 2018-06-29 [1] CRAN (R 3.6.0)
pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0)
pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.6.0)
pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.6.0)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
purrr 0.3.2 2019-03-15 [1] CRAN (R 3.6.0)
R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0)
radiator * 1.1.2 2019-08-20 [1] Github (53a137d)
Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.0)
RCurl 1.95-4.12 2019-03-04 [1] CRAN (R 3.6.0)
readr 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
rlang 0.4.0 2019-06-25 [1] CRAN (R 3.6.0)
rpart 4.1-15 2019-04-12 [1] CRAN (R 3.6.0)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.0)
S4Vectors 0.22.0 2019-05-02 [1] Bioconductor
SeqArray 1.24.2 2019-07-12 [1] Bioconductor
SeqVarTools 1.22.0 2019-05-02 [1] Bioconductor
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
survival 2.44-1.1 2019-04-01 [1] CRAN (R 3.6.0)
testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.0)
tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
tidyr 0.8.3 2019-03-01 [1] CRAN (R 3.6.0)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0)
usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.0)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
XVector 0.24.0 2019-05-02 [1] Bioconductor
yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
zlibbioc 1.30.0 2019-05-02 [1] Bioconductor

[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library

complete data required to reproduce the problem, a subset of it. The data remains confidential.

See attached subset

Fatal error when blacklist is included

Hi Thierry! When I try to run genomic_converter with blacklist.id, I get a fatal error and RStudio shuts down. It seems to be able to read the file because it says there are 11 individuals in the file (which is correct), but after it reads the loci and says Done, the fatal error occurs. I've run genomic_converter without blacklist.id and it works fine so there seems to be an issue with the blacklist.id.

run_bayescan will not run due to missing locus generated by genomic_converter

Dear Thierry,

I have been trying to run the function run_bayescan after filtering my SNPs in radiator. I keep getting the error below from R:

This is the code I used and all files are available at the shared Dropbox link:

Path to Bayescan program

path = "C:/Users/tj248/OneDrive - University of Exeter/Exeter University/PhD Project Documents/Software/BayeScan v 2.1/binaries/BayeScan2.1_win32bits_cmd_line.exe"

Run BayeScan

bayes = run_bayescan("../../R Radiator SNP Filtering/seafan_filt.txt", pr_odds = 100, bayescan.path = path)

I also ran the program using the Windows GUI (using the same input file, seafan_filt.txt), and this has told me what appears to be the issue.

When I checked the seafan_filt.txt I noticed that locus 212 is missing from the file which was generated using the genomic_converter function. The code I used for genomic_converter is below:

seafan_filt = genomic_converter(seafan, output=c("arlequin","bayescan","genepop", "pcadapt","structure","vcf","genind"), filename="seafan_filt")

I would be grateful for your help.

Many thanks,
Tom

Release radiator 1.1.0

Prepare for release:

Check that description is informative
Check licensing of included files
usethis::use_cran_comments()
devtools::check()
devtools::check_win_devel()
Polish pkgdown reference index
to use CRAN, remove Remotes and other dependencies found only on GitHub :(
rhub::check_for_cran()
use GitHub Actions to test on the 3 OS

Submit to CRAN:

usethis::use_version('minor')
Update cran-comments.md
devtools::submit_cran()
Approve email

Wait for CRAN...

Old version repository

Dear Thierry,

Four months ago I used radiator version 0.6 to filter my DArT data. In that version, the output files (e.g. structure, genepop, arlequin) had SNPs coded as 012. Now I have to refilter my data, but in version 0.10 the SNPs are coded as 1234.
Is there a repository with old versions of radiator? If not, is there any command or function to get output files coded as 012?

Thanks in advance

genomic converter is having problem with GT column

Hi Thierry,

Could you please check the code of the genomic converter? For some reason, it is failing to generate output files.

Please, find the code I used and the console output. Thanks

radiator::genomic_converter(data = "tidy.data.snp.ld.rad", output = c("finestructure"))
#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: /Users/buitracn/Documents/RADseq/RADseq-Big-project/VCF-files-poptest/stacks.v2/6pop_448samples/after.dDocent.filters.6pop_345samples/filter_rad_20180914@0156/07_filter_snp_number_20180914@0156
Input file: tidy.data.snp.ld.rad
Strata: no
Population levels: no
Population labels: no
Output format(s): tidy, finestructure
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: TRUE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no

Imputations options:
imputation.method: no

parallel.core: 7

#######################################################################

Importing data

Using markers common in all populations:
Number of markers before/blacklisted/after:24757/0/24757
Scanning for monomorphic markers...
Number of markers before/blacklisted/after: 24757/0/24757

Tidy genomic data:
Number of common markers: 24757
Number of chromosome/contig/scaffold: 905
Number of individuals: 309
Number of populations: 6

Preparing data for output

Error in `$<-.data.frame`(`*tmp*`, "GT", value = character(0)) :
replacement has 0 rows, data has 7649913
In addition: Warning message:
Unknown or uninitialised column: 'GT'.
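
For readers hitting the same message: the error and the warning together suggest that the data had no GT column at the point where one was expected. The base R sketch below reproduces that combination on a toy data frame; it is not radiator's code, and `tidy` and its columns are hypothetical.

```r
# Toy data frame without a GT column (hypothetical stand-in for the tidy data)
tidy <- data.frame(
  MARKERS = c("m1", "m2", "m3"),
  INDIVIDUALS = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# A missing column is NULL, and converting NULL gives a zero-length vector
recoded <- as.character(tidy$GT)
print(length(recoded))

# Assigning the zero-length vector back triggers the same kind of error
msg <- tryCatch(
  {
    tidy$GT <- recoded
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

Checking `names()` of the imported .rad object for the genotype columns that are actually present would show whether this is the situation here.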

fis_summary inoperable

Attempting to run fis_summary on either a tidy or genlight object returns an error.

fis.test <- fis_summary(tidy.final, vcf.metadata = FALSE)
#######################################################################
##################### radiator::tidy_genomic_data #####################
#######################################################################
Error in radiator::tidy_genomic_data(data = data, vcf.metadata = TRUE, :
Unknowned "..." parameters maf.pop.num.threshold maf.approach maf.operator

Variations of the above code were used with identical results.

RF imputation problem

Hi Thierry,

We are trying to use the RF imputation in radiator. I've emailed you the code and data set. I've tried this with both an older version of radiator (0.0.13) and the current release (0.0.18). In both cases I'm using R 3.5.1 and adegenet 2.1.1.

If you run the code, you'll see that the problem is that the non-imputed and imputed data do not match the original data set -- and they don't match in different ways. I've pasted in that content below (the first 10 rows and 6 columns of each data set). You'll see that while the SNP/column names match in the original (dat) and non-imputed (gc$genlight.no.imputation) data sets, the actual data do not. Then, in the imputed data (gc$genlight.imputed), neither the SNP names nor the data match. In all cases, the row names seem to be maintained. Any idea what is going on here?

Thank you,
Brenna

Bug in sexy_markers?

Hi,

I have just started using your package to try to identify sex linked markers in a DArT dataset, and have run into problems with the "sexy_markers" function. I am sure this is probably due to my own incompetence, but as this is quite a new package I quickly ran into a dead end for troubleshooting short of pestering you here, so I apologise.

I created a "strata" file (individual IDs from Dart file + "strata", which is sex - M, F, U in this case) and have a csv file from Dart. I ran the following code using these two files

sexy_markers("Report_DAmh19-4061_SNP_2", strata = AM_STRATA.tsv, filters = TRUE)

and received the following error

Execution date@time: 20190709@1555
Folder created: sexy_markers_20190709@1555
File written: [email protected]
Error in file(con, "r") : cannot open the connection

Computation time, overall: 0 sec
############################ sexy markers completed #############################

Do you have any suggestions as to how I can resolve this issue?
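
One thing that can be checked independently of radiator: "cannot open the connection" is the base R error raised when a file path cannot be opened. The sketch below only tests whether the two input files are visible from the working directory. The ".csv" extension on the DArT report is an assumption based on the description above, and the strata file name is written as a quoted string, which is how a path is normally passed.

```r
# File names taken from the call above; the ".csv" extension is an assumption
inputs <- c(data = "Report_DAmh19-4061_SNP_2.csv", strata = "AM_STRATA.tsv")

# file.exists() is vectorised and resolves relative paths against getwd()
found <- file.exists(inputs)
names(found) <- names(inputs)
print(found)

if (!all(found)) {
  message(
    "Not found in ", getwd(), ": ",
    paste(inputs[!found], collapse = ", ")
  )
}
```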

Thanks!

Session info ------------------------------------------------------------------------------
setting value
version R version 3.5.3 (2019-03-11)
os Windows 10 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_Australia.1252
ctype English_Australia.1252
tz Australia/Sydney
date 2019-07-09

ibdg_fh() FH calculation and summary_rad() calculations

Hi Thierry,

I'm trying out ibdg_fh, but I'm not sure I understand your modification to the way it's calculated in PLINK. It looks like I can recreate the radiator-calculated FH by dividing the difference of the observed and expected homozygous proportions by the count of loci, while PLINK uses counts throughout, as described at https://www.cog-genomics.org/plink/1.9/basic_stats#ibc (the result is the same if proportions are used throughout; the issue seems to be the mixing of proportions with counts in the calculation). I'm attaching a spreadsheet with radiator values and manually calculated PLINK values to make this clearer.

The description of the radiator ibdg_fh function sounds like you did modify it intentionally to differ from PLINK, but the current calculation seems like a bug. Also, is the population-level FH calculated by averaging the individual FH values?
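
To make the comparison concrete, here is a small base R sketch of the three calculations discussed above. The first function follows the PLINK formula, F = (O(hom) - E(hom)) / (N - E(hom)). The "mixed" version is only my reading of what the post describes, not a statement about radiator's source code, and the numbers are made up.

```r
# PLINK-style F from counts:
# obs_hom = observed homozygous genotypes, exp_hom = expected homozygous
# genotypes, n_loci = number of non-missing genotypes for the individual
f_counts <- function(obs_hom, exp_hom, n_loci) {
  (obs_hom - exp_hom) / (n_loci - exp_hom)
}

# Same estimator written entirely with proportions
f_props <- function(obs_hom, exp_hom, n_loci) {
  (obs_hom / n_loci - exp_hom / n_loci) / (1 - exp_hom / n_loci)
}

# Mixed version: difference of proportions divided by a count
f_mixed <- function(obs_hom, exp_hom, n_loci) {
  (obs_hom / n_loci - exp_hom / n_loci) / n_loci
}

# Made-up values for one individual
obs_hom <- 700
exp_hom <- 650
n_loci <- 1000

res <- c(
  counts = f_counts(obs_hom, exp_hom, n_loci),
  props = f_props(obs_hom, exp_hom, n_loci),
  mixed = f_mixed(obs_hom, exp_hom, n_loci)
)
print(res)
```

With these values the count-based and proportion-based versions agree (50 / 350), while the mixed version gives 0.05 / 1000, several orders of magnitude smaller.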

And relatedly, could you explain how the calculations for summary_rad() are produced (in comparison to, say, basic.stats() from hierfstat)? FIS values in particular are vastly different. A spreadsheet comparing those is also attached.

sphanorth124spatial.fh.individuals.xlsx

hierfstatbasicstats.radiatorsummarystats.sphanorth124spatial.radiator.50pctmsng.xlsx

A few issues with `genomic_converter()` function

Hi Thierry,

Working with the package, mainly to clean, import and convert SNP data to different formats, I've been trying to use genomic_converter() function and came up with a few issues with its behaviour:

When exporting to SNPRelate format, it ignores the provided output filename and creates a date-signature based one (see related pull request).
Using vcf.metadata=TRUE argument with a VCF file resulted in an error (object DP not found).
Confusing inconsistency in the function argument rules: the blacklist.id argument can accept either a file or a data.frame object, while blacklist.genotype can only accept a filename containing a data.frame. I know this appears in the function documentation, but the inconsistency confused me for a while until I double-checked the fine details. I suggest making both arguments work with R objects; that makes much more sense than relying on files.
snp.ld lets you choose the first, last or a random SNP, while to me it makes sense to also allow choosing a SNP that is neither first nor last, because the ones at the tag ends are often supported by fewer reads and are less usable in validation (if flanking primers are to be designed).

That's it for now, thanks, Ido

write.pcadapt and write.vcf not giving useable results, could this be the POP ID column?

Hi Thierry,

I'm trying to do a quick PCAdapt analysis with a tidy file generated in radiator, but it seems that something is going wrong with file conversion to both pcadapt and vcf formats, or that our population labels are not working correctly. Currently, pops are labelled using numbers. I can send you files if it makes it easier to address, but I will write as much as possible here. I am running R v 3.5.1, and did a fresh install of radiator.

When I try to convert the tidy dataframe to pcadapt format using
write_pcadapt("brook_char_tidy_maf.tsv", pop.select = c("1","2","3","4","5","6","7","8","9","10","11","14","16","17"), snp.ld = NULL,
maf.thresholds = NULL, filename = NULL,
parallel.core = parallel::detectCores() - 1)

Everything seems to go fine, except that I get a lot of zeros showing up in the data in the console. This seems a little odd.

Small sampling here
$genotype.matrix
Blackfly-d1084 Blackfly-d1191 Blackfly-d1797
[1,] "0" "0" "0"
[2,] "9" "0" "0"
[3,] "9" "1" "2"
Blackfly-d2016 Blackfly-d2125 Blackfly-d2204
[1,] "0" "0" "0"
[2,] "0" "1" "1"
[3,] "2" "2" "2"
Blackfly-d2305 Blackfly-d2365 Blackfly-d2388
[1,] "0" "0" "0"
[2,] "0" "1" "0"
[3,] "1" "0" "2"
Blackfly-d2732 Blackfly-d2771-bf Blackfly-d2975
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "1" "2" "1"
Blackfly-d3277 Blackfly-d3469 Blackfly-d3507
[1,] "0" "0" "0"
[2,] "0" "1" "2"
[3,] "2" "2" "1"
Blackfly-d3580 Blackfly-d3638 Blackfly-d503
[1,] "0" "0" "0"
[2,] "0" "2" "1"
[3,] "2" "1" "2"
Blackfly-d940 BobsCove-C1028 BobsCove-C1090
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1140 BobsCove-C1145 BobsCove-C1148
[1,] "0" "0" "0"
[2,] "9" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1189 BobsCove-C1215 BobsCove-C1401
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1449 BobsCove-C1476 BobsCove-C1493
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1526 BobsCove-C1527 BobsCove-C1580
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1607 BobsCove-C1654 BobsCove-C2255
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C2279 BobsCove-C2304 BobsCove-C2316
[1,] "0" "9" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C2390 BobsCove-C2435 BobsCove-C2451
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C2452 BobsCove-C2462 BobsCove-C2530
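
A quick way to judge whether the zeros are suspicious is to tabulate the genotype codes per marker. The base R sketch below uses only the first three individuals printed above, and it assumes that "9" is the missing-data code, as in the pcadapt format. `geno` is a hypothetical name for the genotype matrix.

```r
# First three columns of the matrix shown above (rows = markers)
geno <- matrix(
  c("0", "0", "0",
    "9", "0", "0",
    "9", "1", "2"),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    NULL,
    c("Blackfly-d1084", "Blackfly-d1191", "Blackfly-d1797")
  )
)

# Number of individuals carrying each code, per marker
codes <- c("0", "1", "2", "9")
counts <- sapply(codes, function(code) rowSums(geno == code))
print(counts)

# Proportion of missing genotypes per marker, assuming 9 means missing
prop_missing <- rowMeans(geno == "9")
print(prop_missing)
```

Run on the full matrix, a marker whose row is entirely "0" is monomorphic in this coding, which may or may not be expected depending on the filters applied upstream.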

Looking through the tsv tidy file (which is my collaborator's), I see some things that may be a bit odd, but he has produced many analyses successfully with the file, so perhaps it is fine. However, I notice, for example, that the ALT and REF alleles are the same for the whole file (different from each other, but the same for all loci). Anyway, in case this file is fine, I run

require(pcadapt)

path_to_file <- "D:/Documents/PostDoc/collaboration-MYates-CapeRace-PondStocking/[email protected]"
MY <- read.pcadapt(path_to_file,type="pcadapt")

And I get the error
1547 lines detected.
324 columns detected.
Warning message:
Only one 'return' characters detected, yet Windows

When I try a different route, using
write_vcf("brook_char_tidy_maf.tsv", pop.info = TRUE, filename = NULL)

I get an error
Error in .f(.x[[i]], ...) : object 'POP_ID' not found

However, I know that this field is in the datafile, so this is weird.

To test to see if the file works without the populations field, I try
write_vcf("brook_char_tidy_maf.tsv", pop.info = FALSE, filename = NULL)

The vcf that I get out of this looks strange in comparison to other vcf files that I have worked with. There are a lot of zeros. Looking at the tidy file, though, this may be due to some issue converting the vcf to a tidy file. I can send you both the vcf and the tidy files if you'd like. Anyway, I didn't get any complaints from R, so I ran

require(pcadapt)
path_to_file <- "D:/Documents/PostDoc/collaboration-MYates-CapeRace-PondStocking/[email protected]"
MY <- read.pcadapt(path_to_file,type="vcf")

This looks more like what I should see
No variant got discarded.
Summary:

- input file:				D:/Documents/PostDoc/collaboration-MYates-CapeRace-PondStocking/[email protected]
- output file:				C:\Users\Admin\AppData\Local\Temp\RtmpCeIy6G\file183c1e3197.pcadapt

- number of individuals detected:	324
- number of loci detected:		4614

4614 lines detected.
324 columns detected.

I fetched my output file, and then ran

x <- pcadapt(MY,K=20)
plot(x,option="screeplot")

data <- as.matrix(read.table("file183c1e3197.pcadapt"))
#check the data
data[1:5,1:6]
V1 V2 V3 V4 V5 V6
[1,] 9 9 9 9 9 9
[2,] 9 0 1 2 0 2
[3,] 0 0 0 0 0 0
[4,] 9 9 9 9 9 9
[5,] 9 0 9 0 0 0
#Without removal of outliers

I skip the population id step b/c we didn't identify them, so go to

x<-pcadapt(data,K=9)

And then it errors out
Error in UseMethod("pcadapt") :
no applicable method for 'pcadapt' applied to an object of class "c('matrix', 'integer', 'numeric')"

Any thoughts about what could be causing these problems? I feel like I should send you my .tsv tidy file, and maybe the vcf and pcadapt files that I generated from it in radiator.

With thanks,
Ella

missing data is still present after imputation in RF

Hi Thierry,

We have a new RF issue with v0.0.20, where a warning indicates that there's still missing data after imputation, even though I don't see any NA in the imputed genlight object.

Thanks for any help!

Amanda

>   gc <- radiator::genomic_converter(data = miss.genlight,
+                                     output = "genlight",
+                                     imputation.method = "rf",
+                                     monomorphic.out = FALSE,
+                                     hierarchical.levels = "global",
+                                     verbose = TRUE)

#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: /mnt/ceph/stah3621/imputation
Input file: from global environment
Strata: no
Population levels: no
Population labels: no
Output format(s): tidy, genlight
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: FALSE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no

Imputations options:
imputation.method: rf
hierarchical.levels: global

parallel.core: 47

#######################################################################

Importing data

Number of markers missing in all individuals and removed: 1

Tidy genomic data:
Number of markers: 500
Number of chromosome/contig/scaffold: 1
Number of individuals: 94

Preparing data for output

Data is bi-allelic

#######################################################################
####################### grur::grur_imputations ########################
#######################################################################
Imputation method: rf
Hierarchical levels: global
On-the-fly-imputations options:
number of trees to grow: 50
minimum terminal node size: 1
non-negative integer value used to specify random splitting: 10
number of iterations: 10
Number of CPUs: 47
Note: If you have speed issues: follow radiator's vignette on parallel computing

Number of populations: 1
Number of individuals: 94
Number of markers: 500

Proportion of missing genotypes before imputations: 0.298319
On-the-fly-imputations using Random Forests algorithm
Imputations computed globally, take a break...
Adjusting REF/ALT alleles to account for imputations...
generating REF/ALT dictionary
integrating new genotype codings...

Proportion of missing genotypes after imputations: 0

Computation time: 8 sec
################## grur::grur_imputations completed ###################
Generating adegenet genlight object without imputation
Generating adegenet genlight object WITH imputations

Writing tidy data set:
[email protected]

Writing tidy data set:
[email protected]
############################### RESULTS ###############################
Data format of input: genlight
Biallelic data
Number of common markers: 500
Number of chromosome/contig/scaffold: 1
Number of individuals 94

Computation time: 11 sec
################ radiator::genomic_converter completed ################
Warning messages:
1: In cleanup(mc.cleanup) : unable to terminate child: No such process
2: In radiator::radiator_imputations_module(data = input, imputation.method = imputation.method, :
Missing data is still present in the dataset
2 options:
run the function again with hierarchical.levels = 'global'
use common.markers = TRUE when using hierarchical.levels = 'strata'

> which(is.na(as.matrix(gc$genlight.imputed)))
integer(0)

> which(is.na(as.matrix(gc$genlight.no.imputation)))[1:10]
[1] 1 2 4 5 8 9 11 12 16 17
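
The two checks above can be summarised per object with a small helper. This is a base R sketch on a toy matrix, assuming genotypes are stored as individuals x markers with `NA` for missing. `missing_summary` is a hypothetical helper, not part of radiator or adegenet, and filling with 0 is only a stand-in for a real imputation. For a genlight object, `as.matrix()` would be applied first, as in the calls above.

```r
# Toy genotype matrix (3 individuals x 4 markers); NA marks a missing genotype.
geno <- matrix(c(0, 1, NA, 2, NA, 0, 1, 1, 2, 0, NA, 1), nrow = 3)

# Hypothetical helper: count, proportion and location of missing genotypes.
missing_summary <- function(m) {
  list(
    n_missing = sum(is.na(m)),
    prop_missing = mean(is.na(m)),
    markers_with_missing = which(colSums(is.na(m)) > 0)
  )
}

before <- missing_summary(geno)

# Stand-in for an imputed data set: every NA replaced by 0.
filled <- geno
filled[is.na(filled)] <- 0
after <- missing_summary(filled)
```

If `after$n_missing` is 0 while the warning is still raised, that would suggest the warning is triggered at a step other than the final imputed object.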

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /opt/modules/devel/R/3.5.0/lib64/R/lib/libRblas.so
LAPACK: /opt/modules/devel/R/3.5.0/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

other attached packages:
[1] bindrcpp_0.2.2 randomForestSRC_2.8.0 psych_1.8.10
[4] vegan_2.5-3 lattice_0.20-38 permute_0.9-4
[7] tidyr_0.8.2 adegenet_2.1.1 ade4_1.7-13
[10] radiator_0.0.18

loaded via a namespace (and not attached):
[1] nlme_3.1-137 fs_1.2.6 usethis_1.4.0
[4] devtools_2.0.1 gmodels_2.18.1 rprojroot_1.3-2
[7] tools_3.5.0 backports_1.1.3 R6_2.3.0
[10] spData_0.3.0 lazyeval_0.2.1 mgcv_1.8-26
[13] colorspace_1.4-0 withr_2.1.2 sp_1.3-1
[16] tidyselect_0.2.5 prettyunits_1.0.2 mnormt_1.5-5
[19] processx_3.2.1 curl_3.3 compiler_3.5.0
[22] cli_1.0.1 expm_0.999-3 desc_1.2.0
[25] scales_1.0.0 readr_1.3.1 callr_3.1.1
[28] stringr_1.3.1 digest_0.6.18 foreign_0.8-71
[31] pkgconfig_2.0.2 htmltools_0.3.6 fst_0.8.10
[34] sessioninfo_1.1.1 rlang_0.3.1 shiny_1.2.0
[37] bindr_0.1.1 gtools_3.8.1 spdep_0.8-1
[40] dplyr_0.7.8 magrittr_1.5 Matrix_1.2-15
[43] Rcpp_1.0.0 munsell_0.5.0 ape_5.2
[46] stringi_1.2.4 MASS_7.3-51.1 pkgbuild_1.0.2
[49] plyr_1.8.4 grid_3.5.0 parallel_3.5.0
[52] gdata_2.18.0 listenv_0.7.0 promises_1.0.1
[55] crayon_1.3.4 deldir_0.1-15 splines_3.5.0
[58] hms_0.4.2 ps_1.3.0 pillar_1.3.1
[61] igraph_1.2.2 boot_1.3-20 seqinr_3.4-5
[64] reshape2_1.4.3 codetools_0.2-16 pkgload_1.0.2
[67] LearnBayes_2.15.1 glue_1.3.0 data.table_1.12.0
[70] remotes_2.0.2 httpuv_1.4.5.1 testthat_2.0.1
[73] gtable_0.2.0 purrr_0.2.5 future_1.10.0
[76] amap_0.8-16 assertthat_0.2.0 ggplot2_3.1.0
[79] mime_0.6 xtable_1.8-3 coda_0.19-2
[82] later_0.7.5 tibble_2.0.1 pbmcapply_1.3.1
[85] memoise_1.1.0 cluster_2.0.7-1 globals_0.12.4

Is there a way to not trim for markers in common when using the write_pcadapt function in radiator

Hi Thierry,

Is there a way to not trim for markers in common when using the write_pcadapt function in radiator? I don't see a field for this in the documentation, but I'd like to try PCAdapt using all of the markers in my tidy file.

With thanks,
Ella

vignette links broken

That's all. They return the GitHub "not found" page.

run_bayescan

whelk_filter.gen.zip

Dear Thierry,
I want to use the run_bayescan function. I have apparently done everything as instructed, but I get an error... would you be able to see what went wrong?
Many thanks,
Rita

convert genepop to genind

data.genind <- adegenet::import2genind("whelk_fitered.gen", ncode = 2)

convert genind to radiator

data.genind.rad <- tidy_genomic_data(data.genind,
keep.allele.names = TRUE,
tidy = TRUE,
gds = TRUE,
write = FALSE, verbose = TRUE)

place bayescan in /usr/local/bin/ folder

convert tidy into bayescan

write_bayescan(data.genind.rad,
pop.select = NULL,
filename = "file.bayescan",
parallel.core = parallel::detectCores() - 1
)
run_bayescan("file.bayescan",
n = 5000,
thin = 10,
nbp = 20,
pilot = 5000,
burn = 50000,
pr_odds=1000,
subsample = NULL,
iteration.subsample = 1,
parallel.core = parallel::detectCores() - 1,
bayescan.path = "/usr/local/bin/bayescan")

#######################################################################
###################### radiator::run_bayescan #########################
#######################################################################

Folder created:
radiator_bayescan_20190817@1608
For progress, look in the log file: [email protected]
Copying input BayeScan file in folder
sh: line 1: 75707 Abort trap: 6 '/usr/local/bin/bayescan' /Users/rita/Desktop/declan/radiator_bayescan_20190817@1608/file.bayescan -od /Users/rita/Desktop/declan/radiator_bayescan_20190817@1608 -all_trace -threads 3 -n 5000 -thin 10 -nbp 20 -pilot 5000 -burn 50000 -pr_odds 1000 > '/Users/rita/Desktop/declan/radiator_bayescan_20190817@1608/[email protected]' 2>&1
Importing BayeScan results
Error in if (length(x) > 1 || grepl("\n", x)) { :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: Unnamed col_types should have the same length as col_names. Using smaller of the two.
2: Unnamed col_types should have the same length as col_names. Using smaller of the two.

Error tidy_genomic_data

When I am trying to use the function, it gives this error.

Error in tidy_genomic_data(data = data, strata = strata, filename = filename, :
object 'gt.vcf.nuc' not found

Cheers

Release tags for older versions

Hi Thierry,

Is it possible to tag older versions of releases for reproducibility, as suggested in r-lib/devtools#1469? From the discussion in Issue #19, it seems like we would have to source files explicitly by commit history rather than by release.

Thanks!

Amanda

"Error: object 'ALLELE_REF_DEPTH' not found" when converting from VCF

Hi,

Running radiator, updated package and tested just a few minutes ago (and was getting it the past week on radiator 1.0.0), using R 3.5.1 on Windows.

I have been unable to convert a VCF output from ipyrad 0.7.28, and the same after being filtered in vcftools, whether using filter_rad(), tidy_vcf(), or tidy_genomic_data(). It looks like it still completes the conversion to a gds, but I get no tidy .rad data frame out, and no saved object in R.

(One issue I'd first like to note for ipyrad users: for radiator to read your VCF at all, you need to change Number=1 to Number=4 in the header line ##FORMAT=<ID=CATG,Number=1,Type=String,Description="Base Counts (CATG)">.)

Header on ipyrad VCF with first line for first sample is:

##fileformat=VCFv4.0
##fileDate=2019/01/13
##source=ipyrad_v.0.7.28
##reference=genomeassembly.fasta
##phasing=unphased
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=CATG,Number=4,Type=String,Description="Base Counts (CATG)"> 
#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample1 
locus_100006	5	.	G	A	13	PASS	NS=120;DP=2002	GT:DP:CATG	0/0:250:0,0,0,250
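
The header edit described above can be sketched in base R. This is only an illustration on an in-memory header: `fix_catg_number` is a hypothetical helper, not a radiator or ipyrad function, and it touches only the CATG FORMAT line.

```r
# Hypothetical helper: switch Number=1 to Number=4 on the CATG FORMAT line only.
fix_catg_number <- function(vcf_lines) {
  is_catg <- grepl("^##FORMAT=<ID=CATG,", vcf_lines)
  vcf_lines[is_catg] <- sub("Number=1", "Number=4", vcf_lines[is_catg], fixed = TRUE)
  vcf_lines
}

# A few header lines in the style of the ipyrad VCF shown above.
header <- c(
  "##fileformat=VCFv4.0",
  "##FORMAT=<ID=DP,Number=1,Type=Integer,Description=\"Read Depth\">",
  "##FORMAT=<ID=CATG,Number=1,Type=String,Description=\"Base Counts (CATG)\">"
)

fixed <- fix_catg_number(header)
```

For a real file the same function could be wrapped between `readLines()` and `writeLines()`, although for a large VCF a streaming tool may be more practical than loading every line into R.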

All of the functions I tried stall when extracting DP information:

Reading VCF...you have time for a espresso...
Data summary: 
    number of samples: 267
    number of markers: 44310
done! timing: 33 sec


Filter monomorphic markers
Number of individuals / strata / chrom / locus / SNP:
    Blacklisted: 0 / 0 / 25 / 31 / 31
Only 1 strata...returning data
[==================================================] 100%, completed in 1s
Extracting DP information...
Error: object 'ALLELE_REF_DEPTH' not found

Computation time, overall: 47 sec

I'm assuming ALLELE_REF_DEPTH means radiator wants two depth values, one for the REF allele and one for the ALT, but my VCF only has the total depth (though shows number of CATG reads), and the VCF 4.1 format also has no mention of REF vs ALT-specific depth. Any idea what's going on? Possibly an issue with the seemingly ipyrad-specific CATG format ID?
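
To make the question concrete, here is a sketch of how REF and ALT depths could be derived from a CATG field. It assumes the four counts are ordered C, A, T, G, as the field's Description suggests. `catg_depths` is a hypothetical helper and is not how radiator extracts depth.

```r
# Hypothetical helper: split a CATG string and pick the counts for REF and ALT.
catg_depths <- function(catg, ref, alt) {
  counts <- as.integer(strsplit(catg, ",", fixed = TRUE)[[1]])
  names(counts) <- c("C", "A", "T", "G")  # assumed order, from the Description
  c(ref_depth = unname(counts[ref]), alt_depth = unname(counts[alt]))
}

# Values from the first record shown above: REF = G, ALT = A, CATG = 0,0,0,250.
d <- catg_depths("0,0,0,250", ref = "G", alt = "A")
```

For that record the sketch gives a REF depth of 250 and an ALT depth of 0, which sums to the DP value of 250 reported for the sample.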

Also, unrelated: if I try to run functions with parallel.core > 1, I get this error:

Error in .DynamicClusterCall(cl, length(cl), .fun = function(.proc_idx,  : 
  One of the nodes produced an error: Can not open file 'G:\My Drive\Illumina Sequencing Data\20181212_rangewide\sphaOCclust85\tidy_vcf_20190318@1356\[email protected]'. The process cannot access the file because it is being used by another process.

And one other miscellaneous error I get when I install/update radiator, in case it helps: "Warning: unable to re-encode 'filter_monomorphic.R' line 7"

Thanks

Issue with converting VCF to genlight

Hi Thierry,

Thanks for all your work with radiator! I'm getting an error when trying to convert a VCF file (v4.2, biallelic, produced by FreeBayes) to genlight. This is the command used:

genomic_converter(data="~/Downloads/TotalRawSNPsHISEQ.biallelic.vcf.recode.vcf", output='genlight', vcf.metadata = TRUE, strata="~/Downloads/strata.tsv")

And the output/error message:

#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: /Users/Emily
Input file: ~/Downloads/TotalRawSNPsHISEQ.biallelic.vcf.recode.vcf
Strata: ~/Downloads/strata.tsv
Population levels: no
Population labels: no
Output format(s): tidy, genlight
Filename prefix: no
Filters: 
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: TRUE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no

Imputations options:
imputation.method: no

parallel.core: 3

#######################################################################

Importing data


Reading VCF...
Large vcf file may take several minutes...

conversion timing: 128 sec

radiator is working on the file ...
VCF is biallelic
Updating markers metadata and stats
Error in cbind_all(x) : Argument 3 must be length 378, not 3
In addition: Warning message:
In mclapply(seq_len(njobs), mc.preschedule = FALSE, mc.cores = njobs,  :
  3 function calls resulted in an error

And here is my sessionInfo():

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin16.7.0 (64-bit)
Running under: macOS Sierra 10.12.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] bindrcpp_0.2.2  radiator_0.0.16 adegenet_2.1.1  ade4_1.7-13    

loaded via a namespace (and not attached):
 [1] nlme_3.1-137           bitops_1.0-6          
 [3] gmodels_2.18.1         GenomeInfoDb_1.16.0   
 [5] tools_3.5.1            R6_2.2.2              
 [7] vegan_2.5-2            spData_0.2.9.0        
 [9] lazyeval_0.2.1         BiocGenerics_0.26.0   
[11] mgcv_1.8-24            colorspace_1.3-2      
[13] permute_0.9-4          sp_1.3-1              
[15] tidyselect_0.2.4       compiler_3.5.1        
[17] expm_0.999-2           scales_0.5.0          
[19] readr_1.1.1            stringr_1.3.1         
[21] digest_0.6.15          XVector_0.20.0        
[23] pkgconfig_2.0.1        htmltools_0.3.6       
[25] fst_0.8.8              rlang_0.2.2           
[27] shiny_1.1.0            bindr_0.1.1           
[29] gtools_3.8.1           spdep_0.7-7           
[31] dplyr_0.7.6            RCurl_1.95-4.11       
[33] magrittr_1.5           GenomeInfoDbData_1.1.0
[35] Matrix_1.2-14          Rcpp_0.12.18          
[37] munsell_0.4.3          S4Vectors_0.18.3      
[39] ape_5.1                stringi_1.2.4         
[41] yaml_2.1.18            MASS_7.3-50           
[43] zlibbioc_1.26.0        plyr_1.8.4            
[45] grid_3.5.1             parallel_3.5.1        
[47] gdata_2.18.0           listenv_0.7.0         
[49] promises_1.0.1         deldir_0.1-15         
[51] lattice_0.20-35        Biostrings_2.48.0     
[53] splines_3.5.1          hms_0.4.2             
[55] pillar_1.2.1           igraph_1.2.1          
[57] GenomicRanges_1.32.6   boot_1.3-20           
[59] seqinr_3.4-5           reshape2_1.4.3        
[61] codetools_0.2-15       gdsfmt_1.16.0         
[63] stats4_3.5.1           LearnBayes_2.15.1     
[65] glue_1.3.0             data.table_1.11.4     
[67] httpuv_1.4.5           gtable_0.2.0          
[69] purrr_0.2.5            tidyr_0.8.1           
[71] SeqArray_1.21.4        future_1.9.0          
[73] amap_0.8-16            assertthat_0.2.0      
[75] ggplot2_3.0.0          mime_0.5              
[77] xtable_1.8-2           coda_0.19-1           
[79] later_0.7.3            tibble_1.4.2          
[81] pbmcapply_1.2.5        IRanges_2.14.11       
[83] cluster_2.0.7-1        globals_0.12.1

Any input/help would be much appreciated!
Emily

Install of gcc on Mac High-Sierra

I'm trying to follow your directions for system configuration for HPC on my Mac High-Sierra installation (ver. 10.13.6), in order to use it with radiator. I am following your vignette rad_genomics_computer_setup.nb.html.
There are some issues.

I get errors at this first step:

bgppermp$ sudo tar -zxvf gcc-8.1-bin.tar.gz -C/
x usr/local/: Can't set user=0/group=0 for usr/localFailed to set file flags
x usr/local/bin/
x usr/local/.com.apple.installer.keep
x usr/local/libexec/
x usr/local/include/
.
.
.
tar: Error exit delayed from previous errors.
The issue was access to /usr/local. The solution was to set /usr/local/bin first in the path.

I followed your directions for clang, by copy-pasting, here is what happens:
bgppermp$ sudo tar -xzvf clang+llvm-6.0.0-x86_64-apple-darwin.tar.xz -C/usr/local –strip-components 1
Password:
tar: 1: Not found in archive
tar: –strip-components: Not found in archive
tar: Error exit delayed from previous errors.

The problem is that the --strip-components term is specified incorrectly. The entire string should be:
sudo tar -xzvf clang+llvm-6.0.0-x86_64-apple-darwin.tar.xz -C/usr/local --strip-components=1

You might correct your instructions for HPC at the relevant places.

Error: Duplicate identifiers for rows

Hi Thierry,
Thanks for your powerful function!
When I try to use the VCF file output by genomic_converter, some functions report "Error: Duplicate identifiers for rows."
I used this code to impute my data:

genomic_converter("data.vcf",output ="vcf",
filename = "test2",
monomorphic.out = T,
common.markers = FALSE,
pop.levels = levels(pop$STRATA),
imputation.method = "max",
strata = "map.tsv")

I then ran genomic_converter again on that output and got an error:

genomic_converter("test2.vcf",output ="vcf",
filename = "test3",
monomorphic.out = T,
common.markers = FALSE,
pop.levels = levels(pop$STRATA),
maf.thresholds = c("locus", 1, "OR", 1, 1),
strata = "map.tsv")
and I get

Error: Duplicate identifiers for rows (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214

Any input/help would be much appreciated!
Clark

Indistinct allele names from vcf2genind

Hello,
Great package, and very useful! Just came across an error when trying to do a hierfstat function on a genind object created with vcf2genind.
stop("alleles must be encoded as integers or nucleotides. Exiting")
When I look at the allele names for my genind, I see that they are all A1, A2:
head([email protected])
$locus_100026__41__41
1. 'A1'
2. 'A2'
Is there a way to get allele names that reference the original genotype? I'm pretty sure radiator is reading in my vcf correctly, because it outputs genotype info with other file conversions.

Error importing .tped files with genomic_converter()

Hi Thierry,

I'm trying to import/convert a PLINK .tped file using genomic_converter(), but I keep getting the same error (tried it with a few different sources).

GBS_data <- genomic_converter("../1A-CB478ANXX_NS_filtered_plink.tped", imputation.method = "rf")
Error in data.table::fread(input = stringi::stri_replace_all_fixed(str = data,  : 
  Column number 2 (colClasses[[1]][2]) is out of range [1,ncol=1]

There is a relevant .tfam file and I'm able to use these .tped files with other packages (such as GenABEL)
Thank you very much for developing this (and stackr) package!!

detect_genomic_format doesn't work for .tped

detect_genomic_format outputs the first line of the data as the data.type if data is a plink file. This is because
file.ending <- stringi::stri_sub(str = data, from = -4, to = -1) outputs "tped" and the function expects ".tped".

This is easily fixed by making the function expect "tped".
Until it is fixed, the bug also breaks both tidy_genomic_data and genomic_converter when working with PLINK files.
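
The mismatch can be reproduced with base R equivalents of the `stri_sub` call quoted above. This is a sketch only. The file name is taken from the related .tped report, and the `is_tped` check is one possible way to test the extension, not the package's actual fix.

```r
# File name from the related .tped report.
data <- "../1A-CB478ANXX_NS_filtered_plink.tped"

# Last four characters: the extension without the leading dot.
last4 <- substr(data, nchar(data) - 3, nchar(data))

# Last five characters: the extension including the dot.
last5 <- substr(data, nchar(data) - 4, nchar(data))

# tools::file_ext() also returns the extension without the dot.
ext <- tools::file_ext(data)

# One possible extension test that does not depend on counting characters.
is_tped <- tolower(ext) == "tped"
```

Here `last4` and `ext` are both "tped" while `last5` is ".tped", so a comparison against ".tped" can never match a four-character substring.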

Importing genind object using genomic_converter

Hi Thierry,

I've done some analysis in Adegenet. My Genind object contains 93 individuals, 9 populations and 6,386 loci:

/// GENIND OBJECT /////////

// 93 individuals; 6,386 loci; 12,772 alleles; size: 7.4 Mb

// Basic content
@tab: 93 x 12772 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 2-2)
@loc.fac: locus factor for the 12772 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 2-2)
@type: codom
@call: adegenet::df2genind(X = t(x), sep = sep)

// Optional content
@pop: population of each individual (group size range: 8-18)

I want to use the genomic_converter function to convert and export the Genind object into genepop and bayescan files. My understanding of the function is that it accepts Genind data from the global environment, however, I keep getting the error below?

convert = genomic_converter(seafan, output=c("genepop","bayescan"))

#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: C:/Users/tj248/OneDrive - University of Exeter/Exeter University/PhD Project Documents/Pink Sea Fan/NextRAD Project 2017/nextRAD SNP Data Analysis
Input file: from global environment
Strata: no
Population levels: no
Population labels: no
Output format(s): tidy, genepop, bayescan
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: TRUE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no

Imputations options:
imputation.method: no

parallel.core: 3

#######################################################################

Importing data

Error in mutate_impl(.data, dots) : Evaluation error: object 'MARKERS' not found.

Genomic_converter error : haplotypes to vcf

Hi Thierry,
I'm trying to use genomic_converter in radiator to convert a haplotypes file generated by Stacks v1.3 to VCF and I get the following error:

Error in dplyr::if_else(POLYMORPHISM == 0, stringi::stri_join(GT_VCF_NUC, : object 'POLYMORPHISM' not found.

The code I used:

genomic_converter(data = "batch_1.haplotypes.tsv", strata = "map.txt", output = c("vcf"), verbose = TRUE)

Traceback:

13. dplyr::if_else(POLYMORPHISM == 0, stringi::stri_join(GT_VCF_NUC, "/", GT_VCF_NUC), GT_VCF_NUC, missing = "./.")
12. mutate_impl(.data, dots, caller_env())
11. mutate.tbl_df(., GT_VCF_NUC = dplyr::if_else(POLYMORPHISM == 0, stringi::stri_join(GT_VCF_NUC, "/", GT_VCF_NUC), GT_VCF_NUC, missing = "./."), GT_VCF_NUC = dplyr::if_else(stringi::stri_detect_fixed(GT_VCF_NUC, "N"), "./.", GT_VCF_NUC))
10. dplyr::mutate(., GT_VCF_NUC = dplyr::if_else(POLYMORPHISM == 0, stringi::stri_join(GT_VCF_NUC, "/", GT_VCF_NUC), GT_VCF_NUC, missing = "./."), GT_VCF_NUC = dplyr::if_else(stringi::stri_detect_fixed(GT_VCF_NUC, "N"), "./.", GT_VCF_NUC))
9. function_list[i]
8. freduce(value, _function_list)
7. _fseq(_lhs)
6. eval(quote(_fseq(_lhs)), env, env)
5. eval(quote(_fseq(_lhs)), env, env)
4. withVisible(eval(quote(_fseq(_lhs)), env, env))
3. input %>% dplyr::mutate(GT_VCF_NUC = dplyr::if_else(POLYMORPHISM == 0, stringi::stri_join(GT_VCF_NUC, "/", GT_VCF_NUC), GT_VCF_NUC, missing = "./."), GT_VCF_NUC = dplyr::if_else(stringi::stri_detect_fixed(GT_VCF_NUC, "N"), "./.", GT_VCF_NUC)) %>% dplyr::select(-POLYMORPHISM)
2. tidy_genomic_data(data = data, strata = strata, filename = filename, parallel.core = parallel.core, whitelist.markers = whitelist.markers, blacklist.id = blacklist.id, vcf.metadata = vcf.metadata, vcf.stats = vcf.stats, keep.allele.names = keep.allele.names

I would like to know whether the error is due to the input file. (I'm using an older version of Stacks).
I checked the file with detect_genomic_format and radiator recognizes the file as haplo.file.

Thanks in advance for your help!
Best,
Farida

deprecated function

Hi,
When I run genomic_converter with bayescan output and then run run_bayescan, I get the warning:
Deprecated function, update your code to use: filter_monomorphic

I am assuming it is this line which needs to be updated in write_bayescan?
data <- radiator::discard_monomorphic_markers(data = data, verbose = TRUE)$input

tidy_genomic_data() filters common markers even when filter.common.markers=FALSE

Hi Thierry,

Trying to import a vcf using tidy_genomic_data without filtering common markers, but it filters them anyway:

tidy.sphanorth124spatial.radiator.50pctmsng2 <- tidy_genomic_data(data="G:/My Drive/Illumina Sequencing Data/20181212_rangewide/gitprojects/sphanorth124spatial/radiator_final/sphanorth124spatial.radiator.50pctmsng.oneSNPmac3.vcf",
                                                        strata="G:/My Drive/Illumina Sequencing Data/20181212_rangewide/gitprojects/sphanorth124spatial/sphanorth124spatial_popcoords_radiator_strata.txt",
                                                        parallel.core=1,
                                                        filter.common.markers=FALSE)

Execution date/time: 20190503@1233
Folder created: 377_radiator_tidy_genomic_20190503@1233
Function call and arguments stored in: [email protected]
Analyzing strata file
    Number of strata: 27
    Number of individuals: 124
Importing and tidying the VCF...
Execution date@time: 20190503@1233

Reading VCF
Data summary: 
    number of samples: 124
    number of markers: 13524
done! timing: 2 sec


Filter monomorphic markers
Number of individuals / strata / chrom / locus / SNP:
    Blacklisted: 0 / 0 / 0 / 0 / 0

Strata with low sample size detected: fig <- FALSE


Filter common markers:
Number of individuals / strata / chrom / locus / SNP:
    Blacklisted: 0 / 0 / 7453 / 7453 / 7453
Generating individual stats...
[==================================================] 100%, completed in 0s
Generating markers stats...
[==================================================] 100%, completed in 0s
[==================================================] 100%, completed in 0s


Number of chromosome/contig/scaffold: 6071
Number of locus: 6071
Number of markers: 6071
Number of populations: 27
Number of individuals: 124

Session info:

- Session info ------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.1 (2018-07-02)
 os       Windows >= 8 x64            
 system   x86_64, mingw32             
 ui       RStudio                     
 language (EN)                        
 collate  English_United States.1252  
 ctype    English_United States.1252  
 tz       America/Los_Angeles         
 date     2019-05-03                  

- Packages ----------------------------------------------------------------------------------------------------------------------
 package          * version   date       lib source                                   
 ade4             * 1.7-13    2018-08-31 [1] CRAN (R 3.5.1)                           
 adegenet         * 2.1.1     2018-02-02 [1] CRAN (R 3.5.1)                           
 amap               0.8-16    2018-05-14 [1] CRAN (R 3.5.0)                           
 ape              * 5.3       2019-03-17 [1] CRAN (R 3.5.3)                           
 assertthat         0.2.1     2019-03-21 [1] CRAN (R 3.5.3)                           
 assigner         * 0.5.5     2019-04-30 [1] Github (thierrygosselin/assigner@ed6475f)
 backports          1.1.3     2018-12-14 [1] CRAN (R 3.5.2)                           
 Biobase            2.40.0    2018-05-01 [1] Bioconductor                             
 BiocGenerics       0.26.0    2018-05-01 [1] Bioconductor                             
 Biostrings         2.48.0    2018-05-01 [1] Bioconductor                             
 bitops             1.0-6     2013-08-17 [1] CRAN (R 3.5.0)                           
 boot               1.3-20    2017-08-06 [2] CRAN (R 3.5.1)                           
 broom              0.5.1     2018-12-05 [1] CRAN (R 3.5.2)                           
 Cairo            * 1.5-10    2019-03-28 [1] CRAN (R 3.5.1)                           
 calibrate          1.7.2     2013-09-10 [1] CRAN (R 3.5.1)                           
 callr              3.2.0     2019-03-15 [1] CRAN (R 3.5.3)                           
 caTools            1.17.1.2  2019-03-06 [1] CRAN (R 3.5.2)                           
 class              7.3-15    2019-01-01 [1] CRAN (R 3.5.3)                           
 classInt           0.3-1     2018-12-18 [1] CRAN (R 3.5.2)                           
 cli                1.1.0     2019-03-19 [1] CRAN (R 3.5.3)                           
 cluster          * 2.0.7-1   2018-04-13 [2] CRAN (R 3.5.1)                           
 coda               0.19-2    2018-10-08 [1] CRAN (R 3.5.1)                           
 codetools          0.2-16    2018-12-24 [1] CRAN (R 3.5.2)                           
 colorspace         1.4-1     2019-03-18 [1] CRAN (R 3.5.1)                           
 combinat           0.0-8     2012-10-29 [1] CRAN (R 3.5.0)                           
 crayon             1.3.4     2017-09-16 [1] CRAN (R 3.5.1)                           
 crosstalk          1.0.0     2016-12-21 [1] CRAN (R 3.5.1)                           
 curl               3.3       2019-01-10 [1] CRAN (R 3.5.2)                           
 data.table         1.12.0    2019-01-13 [1] CRAN (R 3.5.2)                           
 DataCombine      * 0.2.21    2016-04-13 [1] CRAN (R 3.5.1)                           
 DBI                1.0.0     2018-05-02 [1] CRAN (R 3.5.1)                           
 deldir             0.1-16    2019-01-04 [1] CRAN (R 3.5.2)                           
 desc               1.2.0     2018-05-01 [1] CRAN (R 3.5.1)                           
 devtools         * 2.0.1     2018-10-26 [1] CRAN (R 3.5.1)                           
 dichromat          2.0-0     2013-01-24 [1] CRAN (R 3.5.2)                           
 digest             0.6.18    2018-10-10 [1] CRAN (R 3.5.1)                           
 dismo              1.1-4     2017-01-09 [1] CRAN (R 3.5.1)                           
 doParallel         1.0.14    2018-09-24 [1] CRAN (R 3.5.1)                           
 dplyr            * 0.8.0.1   2019-02-15 [1] CRAN (R 3.5.2)                           
 e1071              1.7-1     2019-03-19 [1] CRAN (R 3.5.3)                           
 evaluate           0.13      2019-02-12 [1] CRAN (R 3.5.2)                           
 expm               0.999-4   2019-03-21 [1] CRAN (R 3.5.3)                           
 fansi              0.4.0     2018-10-05 [1] CRAN (R 3.5.1)                           
 fastmatch          1.1-0     2017-01-28 [1] CRAN (R 3.5.0)                           
 foreach            1.4.4     2017-12-12 [1] CRAN (R 3.5.1)                           
 fs                 1.2.7     2019-03-19 [1] CRAN (R 3.5.3)                           
 fst                0.8.10    2018-12-14 [1] CRAN (R 3.5.2)                           
 future             1.12.0    2019-03-08 [1] CRAN (R 3.5.1)                           
 gap                1.1-22    2018-06-08 [1] CRAN (R 3.5.1)                           
 gdata              2.18.0    2017-06-06 [1] CRAN (R 3.5.1)                           
 gdistance          1.2-2     2018-05-07 [1] CRAN (R 3.5.1)                           
 gdsfmt           * 1.16.0    2018-05-01 [1] Bioconductor                             
 generics           0.0.2     2018-11-29 [1] CRAN (R 3.5.2)                           
 genetics           1.3.8.1.1 2019-02-01 [1] CRAN (R 3.5.2)                           
 GenomeInfoDb       1.16.0    2018-05-01 [1] Bioconductor                             
 GenomeInfoDbData   1.1.0     2018-09-05 [1] Bioconductor                             
 GenomicRanges      1.32.6    2018-07-20 [1] Bioconductor                             
 GGally             1.4.0     2018-05-17 [1] CRAN (R 3.5.1)                           
 ggplot2            3.1.0     2018-10-25 [1] CRAN (R 3.5.1)                           
 globals            0.12.4    2018-10-11 [1] CRAN (R 3.5.1)                           
 glue               1.3.1     2019-03-12 [1] CRAN (R 3.5.3)                           
 gmodels            2.18.1    2018-06-25 [1] CRAN (R 3.5.1)                           
 gplots           * 3.0.1.1   2019-01-27 [1] CRAN (R 3.5.2)                           
 gridExtra          2.3       2017-09-09 [1] CRAN (R 3.5.1)                           
 gtable             0.3.0     2019-03-25 [1] CRAN (R 3.5.3)                           
 gtools             3.8.1     2018-06-26 [1] CRAN (R 3.5.0)                           
 GWASExactHW        1.01      2013-01-05 [1] CRAN (R 3.5.0)                           
 hierfstat        * 0.04-22   2015-12-04 [1] CRAN (R 3.5.1)                           
 highr              0.8       2019-03-20 [1] CRAN (R 3.5.3)                           
 hms                0.4.2     2018-03-10 [1] CRAN (R 3.5.1)                           
 htmltools          0.3.6     2017-04-28 [1] CRAN (R 3.5.1)                           
 htmlwidgets        1.3       2018-09-30 [1] CRAN (R 3.5.1)                           
 httpuv             1.5.0     2019-03-15 [1] CRAN (R 3.5.3)                           
 igraph             1.2.4     2019-02-13 [1] CRAN (R 3.5.2)                           
 IRanges            2.14.11   2018-08-24 [1] Bioconductor                             
 iterators          1.0.10    2018-07-13 [1] CRAN (R 3.5.1)                           
 jomo               2.6-7     2019-02-06 [1] CRAN (R 3.5.2)                           
 jsonlite           1.6       2018-12-07 [1] CRAN (R 3.5.2)                           
 KernSmooth         2.23-15   2015-06-29 [2] CRAN (R 3.5.1)                           
 knitr            * 1.22      2019-03-08 [1] CRAN (R 3.5.1)                           
 labeling           0.3       2014-08-23 [1] CRAN (R 3.5.0)                           
 later              0.8.0     2019-02-11 [1] CRAN (R 3.5.2)                           
 lattice            0.20-38   2018-11-04 [1] CRAN (R 3.5.3)                           
 lazyeval           0.2.2     2019-03-15 [1] CRAN (R 3.5.3)                           
 LEA              * 1.99.2    2018-09-29 [1] Github (bcm-uga/LEA@ffea10d)             
 LearnBayes         2.15.1    2018-03-18 [1] CRAN (R 3.5.0)                           
 listenv            0.7.0     2018-01-21 [1] CRAN (R 3.5.1)                           
 lme4               1.1-21    2019-03-05 [1] CRAN (R 3.5.2)                           
 logistf            1.23      2018-07-19 [1] CRAN (R 3.5.1)                           
 magrittr           1.5       2014-11-22 [1] CRAN (R 3.5.1)                           
 manipulateWidget   0.10.0    2018-06-11 [1] CRAN (R 3.5.1)                           
 mapproj            1.2.6     2018-03-29 [1] CRAN (R 3.5.1)                           
 maps             * 3.3.0     2018-04-03 [1] CRAN (R 3.5.1)                           
 MASS               7.3-51.1  2018-11-01 [1] CRAN (R 3.5.3)                           
 Matrix           * 1.2-17    2019-03-22 [1] CRAN (R 3.5.3)                           
 memoise            1.1.0     2017-04-21 [1] CRAN (R 3.5.1)                           
 memuse             4.0-0     2017-11-10 [1] CRAN (R 3.5.0)                           
 mgcv               1.8-28    2019-03-21 [1] CRAN (R 3.5.3)                           
 mice               3.4.0     2019-03-07 [1] CRAN (R 3.5.2)                           
 mime               0.6       2018-10-05 [1] CRAN (R 3.5.1)                           
 miniUI             0.1.1.1   2018-05-18 [1] CRAN (R 3.5.1)                           
 minqa              1.2.4     2014-10-09 [1] CRAN (R 3.5.1)                           
 mitml              0.3-7     2019-01-07 [1] CRAN (R 3.5.2)                           
 mmod             * 1.3.3     2017-04-06 [1] CRAN (R 3.5.1)                           
 munsell            0.5.0     2018-06-12 [1] CRAN (R 3.5.1)                           
 mvtnorm            1.0-10    2019-03-05 [1] CRAN (R 3.5.2)                           
 nlme               3.1-137   2018-04-07 [2] CRAN (R 3.5.1)                           
 nloptr             1.2.1     2018-10-03 [1] CRAN (R 3.5.1)                           
 nnet               7.3-12    2016-02-02 [2] CRAN (R 3.5.1)                           
 pals             * 1.5       2018-01-22 [1] CRAN (R 3.5.2)                           
 pan                1.6       2018-06-29 [1] CRAN (R 3.5.1)                           
 pbmcapply          1.3.1     2019-01-14 [1] CRAN (R 3.5.2)                           
 pegas            * 0.11      2018-07-09 [1] CRAN (R 3.5.1)                           
 permute            0.9-5     2019-03-12 [1] CRAN (R 3.5.3)                           
 phangorn           2.5.3     2019-03-23 [1] CRAN (R 3.5.3)                           
 pillar             1.3.1     2018-12-15 [1] CRAN (R 3.5.2)                           
 pinfsc50           1.1.0     2016-12-02 [1] CRAN (R 3.5.0)                           
 pkgbuild           1.0.3     2019-03-20 [1] CRAN (R 3.5.3)                           
 pkgconfig          2.0.2     2018-08-16 [1] CRAN (R 3.5.1)                           
 pkgload            1.0.2     2018-10-29 [1] CRAN (R 3.5.1)                           
 plyr               1.8.4     2016-06-08 [1] CRAN (R 3.5.1)                           
 png                0.1-7     2013-12-03 [1] CRAN (R 3.5.0)                           
 polysat            1.7-4     2019-03-06 [1] CRAN (R 3.5.2)                           
 PopGenReport     * 3.0.4     2019-02-04 [1] CRAN (R 3.5.2)                           
 poppr            * 2.8.2     2019-03-11 [1] CRAN (R 3.5.3)                           
 prettyunits        1.0.2     2015-07-13 [1] CRAN (R 3.5.1)                           
 processx           3.3.0     2019-03-10 [1] CRAN (R 3.5.3)                           
 promises           1.0.1     2018-04-13 [1] CRAN (R 3.5.1)                           
 ps                 1.3.0     2018-12-21 [1] CRAN (R 3.5.2)                           
 purrr              0.3.2     2019-03-15 [1] CRAN (R 3.5.3)                           
 quadprog           1.5-5     2013-04-17 [1] CRAN (R 3.5.0)                           
 R.methodsS3        1.7.1     2016-02-16 [1] CRAN (R 3.5.0)                           
 R.oo               1.22.0    2018-04-22 [1] CRAN (R 3.5.0)                           
 R.utils            2.8.0     2019-02-14 [1] CRAN (R 3.5.2)                           
 R6                 2.4.0     2019-02-14 [1] CRAN (R 3.5.2)                           
 radiator         * 1.1.0     2019-05-03 [1] Github (thierrygosselin/radiator@fdef494)
 raster             2.8-19    2019-01-30 [1] CRAN (R 3.5.2)                           
 RColorBrewer       1.1-2     2014-12-07 [1] CRAN (R 3.5.0)                           
 Rcpp               1.0.1     2019-03-17 [1] CRAN (R 3.5.3)                           
 RcppEigen          0.3.3.5.0 2018-11-24 [1] CRAN (R 3.5.2)                           
 RCurl              1.95-4.12 2019-03-04 [1] CRAN (R 3.5.2)                           
 readr              1.3.1     2018-12-21 [1] CRAN (R 3.5.2)                           
 remotes            2.0.2     2018-10-30 [1] CRAN (R 3.5.1)                           
 reshape            0.8.8     2018-10-23 [1] CRAN (R 3.5.1)                           
 reshape2           1.4.3     2017-12-11 [1] CRAN (R 3.5.1)                           
 rgdal              1.4-3     2019-03-14 [1] CRAN (R 3.5.3)                           
 rgl                0.100.19  2019-03-12 [1] CRAN (R 3.5.3)                           
 RgoogleMaps        1.4.3     2018-11-07 [1] CRAN (R 3.5.1)                           
 rlang              0.3.2     2019-03-21 [1] CRAN (R 3.5.3)                           
 rpart              4.1-13    2018-02-23 [2] CRAN (R 3.5.1)                           
 rprojroot          1.3-2     2018-01-03 [1] CRAN (R 3.5.1)                           
 rstudioapi         0.10      2019-03-19 [1] CRAN (R 3.5.3)                           
 S4Vectors          0.18.3    2018-06-08 [1] Bioconductor                             
 scales             1.0.0     2018-08-09 [1] CRAN (R 3.5.1)                           
 SeqArray         * 1.21.4    2018-09-05 [1] Github (zhengxwen/SeqArray@1d5ab05)      
 seqinr             3.4-5     2017-08-01 [1] CRAN (R 3.5.1)                           
 SeqVarTools        1.20.2    2019-02-27 [1] Bioconductor                             
 sessioninfo        1.1.1     2018-11-05 [1] CRAN (R 3.5.1)                           
 sf                 0.7-3     2019-02-21 [1] CRAN (R 3.5.2)                           
 shiny              1.2.0     2018-11-02 [1] CRAN (R 3.5.1)                           
 sp                 1.3-1     2018-06-05 [1] CRAN (R 3.5.1)                           
 spData             0.3.0     2019-01-07 [1] CRAN (R 3.5.2)                           
 spdep              1.0-2     2019-02-13 [1] CRAN (R 3.5.2)                           
 StAMPP           * 1.5.1     2017-11-10 [1] CRAN (R 3.5.1)                           
 stringdist         0.9.5.1   2018-06-08 [1] CRAN (R 3.5.1)                           
 stringi            1.4.3     2019-03-12 [1] CRAN (R 3.5.3)                           
 stringr            1.4.0     2019-02-10 [1] CRAN (R 3.5.2)                           
 survival           2.43-3    2018-11-26 [1] CRAN (R 3.5.3)                           
 tess3r           * 1.1.0     2018-09-04 [1] Github (bcm-uga/TESS3_encho_sen@43e5ede) 
 testthat           2.0.1     2018-10-13 [1] CRAN (R 3.5.1)                           
 tibble             2.1.1     2019-03-16 [1] CRAN (R 3.5.3)                           
 tidyr              0.8.3     2019-03-01 [1] CRAN (R 3.5.2)                           
 tidyselect         0.2.5     2018-10-11 [1] CRAN (R 3.5.1)                           
 units              0.6-2     2018-12-05 [1] CRAN (R 3.5.2)                           
 UpSetR             1.3.3     2017-03-21 [1] CRAN (R 3.5.3)                           
 usethis          * 1.4.0     2018-08-14 [1] CRAN (R 3.5.1)                           
 utf8               1.1.4     2018-05-24 [1] CRAN (R 3.5.1)                           
 vcfR             * 1.8.0     2018-04-17 [1] CRAN (R 3.5.1)                           
 vegan              2.5-4     2019-02-04 [1] CRAN (R 3.5.2)                           
 viridis            0.5.1     2018-03-29 [1] CRAN (R 3.5.1)                           
 viridisLite        0.3.0     2018-02-01 [1] CRAN (R 3.5.1)                           
 webshot            0.5.1     2018-09-28 [1] CRAN (R 3.5.1)                           
 withr              2.1.2     2018-03-15 [1] CRAN (R 3.5.1)                           
 xfun               0.5       2019-02-20 [1] CRAN (R 3.5.2)                           
 XML              * 3.98-1.19 2019-03-06 [1] CRAN (R 3.5.2)                           
 xtable             1.8-3     2018-08-29 [1] CRAN (R 3.5.1)                           
 XVector            0.20.0    2018-05-01 [1] Bioconductor                             
 yaml               2.2.0     2018-07-25 [1] CRAN (R 3.5.1)                           
 zlibbioc           1.26.0    2018-05-01 [1] Bioconductor                             

[1] C:/Users/kevin/Documents/R/win-library/3.5
[2] C:/Program Files/R/R-3.5.1/library

Keep all SNPs per locus

Hi there

I want to keep all the SNPs present in each locus for the comparison.

Is this possible?

Cheers
Aimee

Execution date@time: 20190716@1551
Function call and arguments stored in: [email protected]
2 steps to visualize and filter the data based on the number of SNP on the read/locus:
Step 1. Visualization (boxplot, distribution
Step 2. Threshold selection
Filters parameters file: initiated
Generating SNP position on read stats
Generating helper table...
Files written: helper tables and plots

Step 2. Filtering markers based on the SNPs position on the read

Choice of stats are:
1: all (filter off)
2: outliers
3: q75
4: iqr
5: choose your own min and max values
1
File written: whitelist.markers.snp.position.read.tsv
File written: blacklist.markers.snp.position.read.tsv
Filters parameters file: updated
################################### RESULTS ####################################

Filter SNP position on the read : all
Number of individuals / strata / chrom / locus / SNP:
Before: 84 / 5 / 1 / 514 / 957
Blacklisted: 0 / 0 / 0 / 0 / 0
After: 84 / 5 / 1 / 514 / 957

Computation time, overall: 18 sec
##################### completed filter_snp_position_read #######################
################################################################################
############################ radiator::filter_snp_number #######################
################################################################################
Execution date@time: 20190716@1551
Function call and arguments stored in: [email protected]
Interactive mode: on
2 steps to visualize and filter the data based on the number of SNP on the read/locus:
Step 1. Impact of SNP number per read/locus (on individual genotypes and locus/snp number potentially filtered)
Step 2. Choose the filtering thresholds
Filters parameters file: initiated
Generating statistics

With max read length taken from data: 83
The max number of SNP per locus correspond to:
1 SNP per 12 bp

Generating helper table...
Files written: helper tables and plots

Step 2. Filtering markers based on the maximum of SNPs per locus

Do you still want to blacklist markers? (y/n):
n
File written: whitelist.markers.genotyping.tsv
File written: blacklist.markers.genotyping.tsv
Filters parameters file: updated
################################### RESULTS ####################################

Filter SNPs per locus threshold: 1e+12
Number of individuals / strata / chrom / locus / SNP:
Before: 84 / 5 / 1 / 514 / 957
Blacklisted: 0 / 0 / 0 / 0 / 0
After: 84 / 5 / 1 / 514 / 957

Computation time, overall: 95 sec
######################### completed filter_snp_number ##########################
################################################################################
############################## radiator::filter_ld #############################
################################################################################
Execution date@time: 20190716@1552
Function call and arguments stored in: [email protected]

Interactive mode: on

Step 1. Short distance LD threshold selection
Step 2. Filtering markers based on short distance LD
Step 3. Long distance LD pruning selection
Step 4. Threshold selection
Step 5. Filtering markers based on long distance LD

Filters parameters file: initiated
Minimizing short distance LD...
The range in the number of SNP/locus is: 1-7

Step 1. Short distance LD threshold selection
the goal is to keep only 1 SNP per read/locus
Choose the filter.short.ld threshold
Options include:
1: mac (Not sure ? use mac...)
2: random
3: first
4: middle
5: last
1
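For readers following the log above, here is a minimal base R sketch of the difference between keeping every SNP and keeping one SNP per locus. It is not radiator code: the `markers` table, its column names and the `pick_one_snp()` helper are hypothetical, and the meaning given to "mac" (highest minor allele count) and "middle" (middle SNP by position) is an assumption based on the option names. The "random" option is left out so the result is deterministic.

```r
# Toy marker table: LOCUS, POS (position on the read) and MAC (minor allele count).
# All names and values are made up for illustration.
markers <- data.frame(
  LOCUS = c("L1", "L1", "L1", "L2", "L3", "L3"),
  POS   = c(5, 40, 71, 12, 8, 60),
  MAC   = c(3, 10, 6, 4, 7, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of radiator): keep a single SNP per locus.
pick_one_snp <- function(df, method = c("mac", "first", "middle", "last")) {
  method <- match.arg(method)
  picked <- lapply(split(df, df$LOCUS), function(x) {
    x <- x[order(x$POS), ]                    # sort SNPs along the read
    i <- switch(method,
      mac    = which.max(x$MAC),              # assumed: highest MAC, first one on ties
      first  = 1L,
      middle = ceiling(nrow(x) / 2),
      last   = nrow(x)
    )
    x[i, ]
  })
  do.call(rbind, picked)
}

one_per_locus <- pick_one_snp(markers, "mac")

nrow(markers)        # every SNP kept
nrow(one_per_locus)  # one SNP per locus
one_per_locus
```

In this toy table, keeping all SNPs leaves six rows, while any of the one-per-locus options leaves three.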

filter_dart

Hey

I'm getting this error on a file that previously worked fine with filter_dart:

Next step requires the genotypes
Importing DArT data
Error in overscope_eval_next(overscope, expr) :
object 'TARGET_ID' not found

Thanks

radiator::filter_rad has problem with strata recognition

Hi Thierry,

I've been unable to filter my RADseq data set using the interactive filter of radiator::filter_rad, because for some reason the strata file is not being recognized:

Error in radiator::tidy_genomic_data(data = data, strata = strata, vcf.metadata = TRUE, :
Non-matching INDIVIDUALS between data and strata.

If I use only the tidy data (which supposedly already contains the strata information):

data.filtered <- radiator::filter_rad(data = prep.vcf$tidy.data, output = "genind", filename = "spis.test")

Then the interactive filter runs, yet when it gets to the 4th filtering step "04: Filtering individuals poorly genotyped" the graph only shows "overall" and "NA" boxplots, and no matter the filtering threshold you set, not a single individual gets blacklisted.

This is a recurrent problem. I tried to reproduce the filtering with a dataset for which it had worked perfectly in the past, but it no longer works.
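One way to narrow down a "Non-matching INDIVIDUALS between data and strata" error is to compare the two sets of sample IDs directly. The sketch below uses base R on toy inputs. The IDs are invented, and the `INDIVIDUALS` and `STRATA` column names are assumed from the error message and from the usual strata layout. It only illustrates the comparison and does not reproduce what radiator does internally.

```r
# Toy sample IDs as they might appear in the genotype data (made up).
data_ids <- c("ind_01", "ind_02", "ind_03", "ind-04")

# Toy strata table; the column names are assumed, the values are made up.
strata <- data.frame(
  INDIVIDUALS = c("ind_01", "ind_02", "ind_03", "ind_04"),
  STRATA      = c("pop1", "pop1", "pop2", "pop2"),
  stringsAsFactors = FALSE
)

# IDs present on one side only; both vectors should be empty when everything matches.
in_data_only   <- setdiff(data_ids, strata$INDIVIDUALS)
in_strata_only <- setdiff(strata$INDIVIDUALS, data_ids)

in_data_only
in_strata_only
```

Here the mismatch comes from a hyphen versus an underscore in one ID. Small differences such as separators, letter case or stray whitespace are worth checking.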

allele order changes when using genomic_converter

Hi Thierry - sorry to bug you again, but I'm having an issue with genomic_converter renaming my alleles from those of my input file.

For example, I have a simulation data set in genind format that shows counts of my alleles "100" and "110":

> miss.genind@tab[1:5,1:10]
    0.100 0.110 1.100 1.110 10.100 10.110 11.100 11.110 12.100 12.110
001     2     0     1     1      1      1      0      2      2      0
002     0     2     1     1      0      2      1      1      0      2
003     2     0     0     2      2      0      2      0      2      0
004     0     2     1     1     NA     NA      0      2      1      1
005     1     1     0     2      0      2      1      1     NA     NA

I run genomic_converter like so:

foo <- genomic_converter(data=miss.genind, output="genind", imputation.method="rf", hierarchical.levels="global", verbose = TRUE)

When I look at the output data, the alleles are renamed A1/A2, but not consistently. For example, A1 does not always correspond to allele "100" from the original dataset.

> foo$genind.imputed@tab[1:5,1:10]
    0.A1 0.A2 1.A1 1.A2 10.A1 10.A2 11.A1 11.A2 12.A1 12.A2
001    0    2    1    1     1     1     2     0     2     0
002    2    0    1    1     2     0     1     1     0     2
003    0    2    0    2     0     2     0     2     2     0
004    2    0    1    1    NA    NA     2     0     1     1
005    1    1    0    2     2     0     1     1    NA    NA

Individual 3 is a good example if you compare across the data sets.

That's a problem for me since I am taking only one column per locus from the imputed data frame for downstream analysis (for example, I need all of the "110" allele counts).

Can you tweak genomic_converter so it doesn't rename the alleles? That would probably be easiest (?)

Hope that makes sense - I can send the data set if needed.
Thanks so much! (And no huge hurry!)
Brenna
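As a possible workaround while the renaming is in place, the matching column can be recovered locus by locus by comparing the converted counts with the original counts wherever the original is not missing. The sketch below is base R only. The two matrices copy the first two loci shown above, and `recover_allele()` is a hypothetical helper, not a radiator or adegenet function. It assumes exactly two alleles per locus and that non-missing genotypes are unchanged by the conversion.

```r
ids <- sprintf("%03d", 1:5)

# First two loci of the original allele-count table shown above.
orig <- matrix(
  c(2, 0, 1, 1,
    0, 2, 1, 1,
    2, 0, 0, 2,
    0, 2, 1, 1,
    1, 1, 0, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(ids, c("0.100", "0.110", "1.100", "1.110"))
)

# Same loci after conversion, with alleles renamed A1/A2.
conv <- matrix(
  c(0, 2, 1, 1,
    2, 0, 1, 1,
    0, 2, 0, 2,
    2, 0, 1, 1,
    1, 1, 0, 2),
  nrow = 5, byrow = TRUE,
  dimnames = list(ids, c("0.A1", "0.A2", "1.A1", "1.A2"))
)

# Hypothetical helper: for each locus, return the converted column whose counts
# agree with the requested original allele at all non-missing positions.
recover_allele <- function(orig, conv, allele = "110") {
  loci <- unique(sub("\\..*$", "", colnames(orig)))
  sapply(loci, function(l) {
    target <- orig[, paste(l, allele, sep = ".")]
    keep <- !is.na(target)
    a1 <- conv[, paste0(l, ".A1")]
    a2 <- conv[, paste0(l, ".A2")]
    if (isTRUE(all(a1[keep] == target[keep]))) {
      a1
    } else if (isTRUE(all(a2[keep] == target[keep]))) {
      a2
    } else {
      rep(NA_real_, nrow(conv))  # no consistent match found for this locus
    }
  })
}

counts_110 <- recover_allele(orig, conv, allele = "110")
counts_110
```

For locus 0 the "110" counts end up in the A1 column, and for locus 1 in the A2 column, which matches the inconsistency described above.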

Default imputation method in genomic_converter() is NULL, not "rf"

According to the genomic_converter() documentation, the default method for imputation is "rf", while in fact it is NULL. This caused me some confusion when trying to work out where the imputed results were.
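A quick way to check a default against the documentation is to inspect the function's formal arguments. The sketch below uses a toy function so it runs without radiator installed. `toy_converter()` and `show_default()` are hypothetical names, and the commented call at the end assumes the installed radiator version still has an `imputation.method` argument.

```r
# Toy stand-in for a converter whose imputation default is NULL (illustration only).
toy_converter <- function(data, output = NULL, imputation.method = NULL) {
  invisible(NULL)
}

# Hypothetical helper: return the default of one argument, or fail if it does not exist.
show_default <- function(fun, arg) {
  f <- formals(fun)
  if (!arg %in% names(f)) {
    stop("no argument named '", arg, "'")
  }
  f[[arg]]
}

show_default(toy_converter, "imputation.method")  # NULL

# With radiator installed, the same check could be run on the real function:
# show_default(radiator::genomic_converter, "imputation.method")
```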

calculations for missing prop and heterozygosity

Hi Thierry,

I'm learning how to use radiator, what it does and how it works. I've been using a dummy DArT data set of 19 individuals, two populations and 53 loci to run the package.

However, my "by hand" calculations do not match the missing proportion and heterozygosity values generated by the "filter_individuals" function. I had a look at the function and couldn't figure out how radiator calculates these values. For example, the "missing_prop" for a particular individual is 0.32, yet it has only 4 missing data points out of the 53 loci (a missing proportion of 4/53 = 0.075). Likewise, another sample has a heterozygosity value of "0", but that individual is heterozygous at two loci.

Do you mind shedding some light on how these values are derived or pointing me to where this is explained?

I have attached the dummy data set and the individuals QC stats .tsv output in case you need them:
Dummy_dataset_and_output_qc_indv.zip

The code I've used is:

infile1 <- "./Dummy_Data_subset.csv"
infile3 <- "./strata.txt"

# Import the DArT file with its strata, then run the individual-level filter
tmp <- read_dart(infile1, infile3)
tmp2 <- filter_individuals(tmp)

Thank you,

Diana
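For reference, the "by hand" calculation described above can be written out as follows. This is a sketch on a made-up genotype vector with the same counts as in the post (53 loci, 4 missing, 2 heterozygous). The 0/1/2 coding and the choice of genotyped loci as the heterozygosity denominator are assumptions. It is not radiator's internal computation, which may use different units or denominators.

```r
# Toy genotypes for one individual at 53 loci, coded as alternate-allele counts:
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate, NA = missing.
gt <- c(rep(0, 30), rep(2, 17), rep(1, 2), rep(NA, 4))
stopifnot(length(gt) == 53)

# Proportion of missing genotypes: 4 / 53
missing_prop <- mean(is.na(gt))

# Observed heterozygosity over genotyped loci only: 2 / 49
het_obs <- sum(gt == 1, na.rm = TRUE) / sum(!is.na(gt))

round(missing_prop, 3)  # 0.075
round(het_obs, 3)       # 0.041
```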


Path to Bayescan program

Run BayeScan

convert genepop to genind

convert genind to radiator

place bayescan in /usr/local/bin/ folder

convert tidy into bayescan
