thierrygosselin / radiator
RADseq Data Exploration, Manipulation and Visualization using R
Home Page: https://thierrygosselin.github.io/radiator/
License: GNU General Public License v3.0
Hi Thierry,
I'm having a similar problem to the one reported earlier by Tom Jenkins: I get an "Evaluation error: object 'MARKERS' not found" when using genomic_converter.
In my case, my data (attached here as "foo2.csv") are brought in via df2genind. These are simulated SNP data (100 SNPs for 100 individuals) recoded from allele counts of 0/1/2 to 110110/100110/100100. The goal is to impute NAs using RF in radiator. Note that this code pasted below was working in radiator v. 0.0.6; I'm now running 0.0.10.
Based on the solutions for Tom's data, I tried changing the @loc.fac slot so each column was unique (e.g. 0.1 0.2 1.1 1.2...) but this didn't help. Any ideas? Thanks!! -Brenna
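For reference, the 0/1/2 to 6-digit recoding described above could be done in base R roughly as follows. This is a sketch: recode_counts() is a hypothetical helper, and the mapping (0 = 110110, 1 = 100110, 2 = 100100) is taken from the description above. The result is meant as input for adegenet::df2genind(..., ncode = 3).

```r
# Hypothetical helper: recode a matrix of allele counts (0/1/2) into the
# 6-character genotypes described above, for df2genind(ncode = 3).
recode_counts <- function(counts) {
  lookup <- c("0" = "110110", "1" = "100110", "2" = "100100")
  # Index the lookup by the character form of each count; NA stays NA
  out <- unname(lookup[as.character(counts)])
  # Restore the matrix shape and the individual/marker names
  dim(out) <- dim(counts)
  dimnames(out) <- dimnames(counts)
  out
}

# Toy example: 2 individuals x 2 SNPs, with one missing genotype
counts <- matrix(c(0, 1, 2, NA), nrow = 2,
                 dimnames = list(c("ind1", "ind2"), c("snp1", "snp2")))
geno <- recode_counts(counts)
geno
# foo2 <- as.data.frame(geno, stringsAsFactors = FALSE)  # input for df2genind()
```

Missing genotypes stay NA, which matches the NA.char = NA argument used in the df2genind() call below.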
#################################
foo.genind <- df2genind(foo2, ncode = 3, NA.char = NA, ploidy = 2)
foo.genind@pop <- as.factor(rep("PO1", length(sets.length[x])))
foo.genind
/// GENIND OBJECT /////////
// 100 individuals; 100 loci; 200 alleles; size: 132.5 Kb
// Basic content
@tab: 100 x 200 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 2-2)
@loc.fac: locus factor for the 200 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 2-2)
@type: codom
@call: df2genind(X = foo2, ncode = 3, NA.char = NA, ploidy = 2)
// Optional content
@pop: population of each individual (group size range: 1-1)
foo.genind@loc.fac
[1] 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 9 10
[22] 10 11 11 12 12 13 13 14 14 15 15 16 16 17 17 18 18 19 19 20 20
[43] 21 21 22 22 23 23 24 24 25 25 26 26 27 27 28 28 29 29 30 30 31
[64] 31 32 32 33 33 34 34 35 35 36 36 37 37 38 38 39 39 40 40 41 41
[85] 42 42 43 43 44 44 45 45 46 46 47 47 48 48 49 49 50 50 51 51 52
[106] 52 53 53 54 54 55 55 56 56 57 57 58 58 59 59 60 60 61 61 62 62
[127] 63 63 64 64 65 65 66 66 67 67 68 68 69 69 70 70 71 71 72 72 73
[148] 73 74 74 75 75 76 76 77 77 78 78 79 79 80 80 81 81 82 82 83 83
[169] 84 84 85 85 86 86 87 87 88 88 89 89 90 90 91 91 92 92 93 93 94
[190] 94 95 95 96 96 97 97 98 98 99 99
100 Levels: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ... 99
imp.rf.stackr <- genomic_converter(data=foo.genind, output="tidy", imputation.method="rf", hierarchical.levels="global", verbose = TRUE)
#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: D:/6-NAproject/imputation/filter/temp/temp1
Input file: from global environment
Strata: no
Population levels: no
Population labels: no
Output format(s): tidy, tidy
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: TRUE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no
Imputations options:
imputation.method: rf
hierarchical.levels: global
parallel.core: 7
#######################################################################
Importing data
Error in mutate_impl(.data, dots) :
Evaluation error: object 'MARKERS' not found.
Hello,
When running run_bayescan I get an error message that stops it running.
#######################################################################
###################### radiator::run_bayescan #########################
#######################################################################
Folder created:
radiator_bayescan_20190116@1211
For progress, look in the log file: [email protected]
Copying input BayeScan file in folder
Importing BayeScan results
Error in if (length(x) > 1 || grepl("\n", x)) { :
missing value where TRUE/FALSE needed
Hi Thierry,
A few small bugs I've noticed with filter_rad() in interactive mode:
-Any filter that removes samples/individuals causes failure in the next filtering step:
Error: Column `MISSING_PROP` must be length 203 (the number of rows) or one, not 204
-detect_mixed_genomes doesn't abide by the parallel.core arg in filter_rad(); fixed by adding parallel.core = parallel.core in filter_rad:
gds <- detect_mixed_genomes(data = gds, interactive.filter = interactive.filter, detect.mixed.genomes = detect.mixed.genomes, ind.heterozygosity.threshold = NULL, parameters = filters.parameters, verbose = verbose, parallel.core = parallel.core, path.folder = wf, internal = FALSE)
-May need to remove strata=NULL from filter_hwe in filter_rad():
gds <- filter_hwe(data = gds, interactive.filter = interactive.filter, filter.hwe = filter.hwe,strata=NULL,hw.pop.threshold = hw.pop.threshold, midp.threshold = midp.threshold, parallel.core = parallel.core, parameters = filters.parameters, path.folder = wf, verbose = verbose, internal = FALSE)
-when filtering by HWE, interactive mode doesn't always detect the asterisk inputs; I think this happened when I tried setting hw.pop.threshold equal to the number of pops, or it may happen when some strata are removed for having n < 10, but I don't remember exactly. I just tried to re-run on strata where none were removed and didn't get the error.
-in general, is there a way to exit the interactive mode? When the HWE filter couldn't detect my inputs, I had to restart the R session to get out.
-Transferring to genomic_converter requires doing the REF/ALT calibration again. Not a major issue but adds some time.
-purely aesthetic, but when running on Windows, the font choice in the plots (Helvetica?) causes warnings:
In grid.Call(C_textBounds, as.graphicsAnnot(x$label), ... :
font family not found in Windows font database
Not issues, but questions/suggestions/requests:
-Is there a more detailed explanation of how the outlier, q75 and IQR thresholds are calculated and applied? Are outliers simply values outside the 95% CI?
-filter_coverage step returns a plot of max mean coverage; a plot for min mean would be useful
-In filter_genotyping, is the threshold applied per-strata or only on the total?
-Long LD filtering appears to work but only if pruned WITHOUT missing data statistics when CHROMs represent contigs; pruning with missing data statistics doesn't remove anything. Is there a reason for this? Actually I'm not sure loci are pruned either way. Is it possible to collapse the CHROMs down to a single CHROM to do the long LD filtering?
-outputting the full function call with args entered during the interactive session would help with reproducibility
-asking if you want to run a particular filtering step interactively, e.g. asking if you want to skip calculating HWE since it takes a long time
Hi Thierry,
I wonder if you can help me with the following problem I have:
I would like to a) transform my vcf data into a tidy data frame using "tidy_genomic_data", then b) fill NAs using the "radiator_imputations_module".
In the first step, this is my code:
mullus.imp <- radiator::tidy_genomic_data(
data = "LD5000_WestMed.recode.vcf",
vcf.metadata = TRUE,
verbose = TRUE)
and this is the resulting output and error:
As I see it, the gds-file is being created at the very moment I execute the command, hence I do not understand how it can be used by another process and what I may do to overcome this error.
Any advice appreciated.
many thanks in advance,
Katharina
Hi Thierry,
I just tried installing the radiator package. After successfully installing glue, Rtools3.5 and stringi, it fails when trying to install the actual radiator package with the following error message:
Installing package into ‘C:/Users/f/Documents/R/win-library/3.5’
(as ‘lib’ is unspecified)
* installing *source* package 'radiator' ...
** R
Error : (converted from warning) unable to re-encode 'filter_monomorphic.R' line 7
ERROR: unable to collate and parse R files for package 'radiator'
* removing 'C:/Users/f/Documents/R/win-library/3.5/radiator'
In R CMD INSTALL
Error in i.p(...) :
(converted from warning) installation of package ‘C:/Users/f/AppData/Local/Temp/Rtmpm0Pwbl/file34c86193c02/radiator_1.0.0.tar.gz’ had non-zero exit status
Any idea as to what causes this error?
Btw, I tried looking at that link to installation errors - but the dropbox link is dead?
Thanks,
Flo
A small typo in one of the monomorphic SNP filtering functions names the output file blacklist.momorphic.markers.tsv instead of blacklist.monomorphic.markers.tsv.
Hi Thierry,
Thanks for this excellent tool for RADseq data visualization.
I've tried to use the function filter_rad (with the interactive filter), but I wasn't able to complete the filtering because of this error:
Error in getGlobalsAndPackages(expr, envir = envir, tweak = tweakExpression, : The total size of the 3 globals that need to be exported for the future expression (‘do.call(what = FUN, args = args)’) is 3.64 GiB. This exceeds the maximum allowed size of 1.00 GiB (option 'future.globals.maxSize'). There are three globals: ‘args’ (3.64 GiB of class ‘list’), ‘FUN’ (5.59 KiB of class ‘function’) and ‘progressFifo’ (584 bytes of class ‘numeric’).
Could you please advise on how to proceed with a large VCF file (29 GB)?
This is as far as it got:
Folder created:
filter_rad_20180911@1225
Reading VCF...
Large vcf file may take several minutes...
Actually, you have time for a coffee...
conversion timing: 1556 sec
radiator is working on the file ...
VCF is biallelic
Updating markers metadata and stats
[==================================================] 100%, completed in 1s
[==================================================] 100%, completed in 1s
Generating SNP position on read stats
Generating coverage stats
Generating individual stats
[==================================================] 100%, completed in 5s
Missing data (averaged):
markers: 0.07
individuals: 0.07
Coverage info:
individuals mean read depth: 40034180
individuals mean genotype coverage: 14
markers mean coverage: 14
Number of chromosome/contig/scaffold: 2699
Number of locus: 111019
Number of markers: 3022643
Number of individuals: 323
Working time: 2068 sec
############################# IMPORTANT ###############################
Tidying vcf with 3022643 SNPs is not optimal
use radiator::filter_rad to reduce to ~ 10 000 unlinked SNPs
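The error above names the future option future.globals.maxSize, whose value is a size in bytes. A minimal sketch of raising it before re-running is below; the 4 GiB figure is an assumption chosen to exceed the 3.64 GiB reported, and it is not confirmed here whether radiator resets this option internally. Raising the limit only helps if the machine has the RAM to hold those objects in each worker, so the log's own advice (reduce to ~10 000 unlinked SNPs) is probably the more robust route for 3 million SNPs.

```r
# Raise the size limit (in bytes) on globals that the future framework
# exports to parallel workers. 4 GiB is an assumed value chosen to exceed
# the 3.64 GiB reported in the error; adjust to the RAM available.
old_opts <- options(future.globals.maxSize = 4 * 1024^3)

# Confirm the new limit, expressed in GiB
getOption("future.globals.maxSize") / 1024^3

# Then re-run the filtering, e.g. radiator::filter_rad(...), with the same
# arguments as before. Afterwards, options(old_opts) restores the old value.
```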
Hi,
I'm trying to generate some basic stats for my STACKS generated haplotype data (nucleotide diversity per individual, number polymorphic/monomorphic loci per sampling site). I have a populations.haps.vcf file and I generated a tsv file with my individuals and strata.
In my understanding I first need to generate a tidy file. However, when I use either tidy_vcf() or tidy_genomic_data() I receive the following error:
tidy.data <- tidy_vcf(data = "populations.haps.vcf", strata = "popmap_2019_LinA.tsv")
Reading VCF
Data summary:
number of samples: 308
number of markers: 7163
done! timing: 1 sec
Filter monomorphic markers
Number of individuals / strata / chrom / locus / SNP:
Blacklisted: 0 / 0 / 0 / 0 / 0
Filter common markers:
Number of individuals / strata / chrom / locus / SNP:
Blacklisted: 0 / 0 / 0 / 43 / 43
Generating individual stats...
Error: Argument 2 must be length 2192960, not 0
What am I missing here?
Best, Diede
I am trying to convert a .vcf.gz file to tidy genomic data in RStudio:
eu_snps_1 <- genomic_converter( data = "UMBELLA_Erumb1_samples_gt_50pct_covered.recode.vcf.gz", vcf.metadata = TRUE, common.markers = FALSE, strata = "strata_eu.tsv" )
Resulting in this error message and traceback:
#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: /Volumes/pearman-1/lud11_docs/upv_research/projects/eriogonoideae/eriogonoideae/data/GBS/560_samples_20181029/Erumb1
Input file: UMBELLA_Erumb1_samples_gt_50pct_covered.recode.vcf.gz
Strata: strata_eu.tsv
Population levels: no
Population labels: no
Output format(s): tidy
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: TRUE
snp.ld: no
common.markers: FALSE
max.marker: no
pop.select: no
maf.thresholds: no
Imputations options:
imputation.method: no
parallel.core: 15
#######################################################################
Importing data
Error in stringi::stri_replace_all_fixed(str = as.character(x), pattern = c("_", : object 'input' not found
Traceback:
stringi::stri_replace_all_fixed(str = as.character(x), pattern = c("_", ":", " "), replacement = c("-", "-", ""), vectorize_all = FALSE)
radiator::clean_ind_names(input$INDIVIDUALS)
radiator::tidy_genomic_data(data = data, vcf.metadata = vcf.metadata, blacklist.id = blacklist.id, blacklist.genotype = blacklist.genotype, whitelist.markers = whitelist.markers, monomorphic.out = monomorphic.out, max.marker = max.marker, snp.ld = snp.ld, common.markers = common.markers, ...
genomic_converter(data = "UMBELLA_Erumb1_samples_gt_50pct_covered.recode.vcf.gz", vcf.metadata = TRUE, common.markers = FALSE, strata = "strata_eu.tsv")
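The traceback shows radiator::clean_ind_names() calling stringi::stri_replace_all_fixed() to replace "_" and ":" with "-" and to remove spaces. The error itself ('input' not found) looks internal to radiator, but when debugging strata problems it can help to check that strata IDs and VCF sample IDs still agree after this kind of normalisation. Below is a base R stand-in for the same substitutions; clean_ids() is a hypothetical helper, not radiator's function.

```r
# Base R stand-in (hypothetical) for the substitutions in the traceback:
# "_" -> "-", ":" -> "-", and spaces removed, applied in that order.
clean_ids <- function(x) {
  x <- gsub("_", "-", as.character(x), fixed = TRUE)
  x <- gsub(":", "-", x, fixed = TRUE)
  gsub(" ", "", x, fixed = TRUE)
}

clean_ids(c("AC3894", "pop_1:ind 2"))
# Compare cleaned strata IDs against cleaned VCF sample IDs, e.g.:
# setdiff(clean_ids(strata$INDIVIDUALS), clean_ids(vcf_sample_ids))
```

IDs such as AC3894 pass through unchanged, so in this case the sample names themselves are unlikely to be the problem.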
The files exist:
> file.exists("strata_eu.tsv") [1] TRUE
> file.exists("UMBELLA_Erumb1_samples_gt_50pct_covered.recode.vcf.gz") [1] TRUE
The top of the unzipped .vcf.gz looks like this:
##fileformat=VCFv4.0
##fileDate=Thu Oct 25 12:00:44 2018
##source=GBS-SNP-CROP
##phasing=partial
##INFO=<ID=AC,Number=1,Type=Integer,Description="Allele Count">
##INFO=<ID=AF,Number=1,Type=Integer,Description="Allele Frequency">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Average Depth">
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Allele Depth">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AC3894 AC3895 AC3896...
Erumb1_s00000011 30280 . C T 40 PASS . GT:DP:AD ./.:0:.,. ./.:0:.,. ./.:0:.,. .....
The top of the tab-separated, 561-line strata file looks like this:
INDIVIDUALS STRATA FLOWCELL variety2 MACHINE year
AC3894 PBP1 C9RB9ACXX(2*101) munzii 26B 2014
AC3895 PBP1 C9RB9ACXX(2*101) munzii 26B 2014
AC3896 PBP1 C9RB9ACXX(2*101) munzii 26B 2014
AC3897 PBP1 C9RB9ACXX(2*101) munzii 26B 2014
AC3898 PBP1 C9RB9ACXX(2*101) munzii 26B 2014.....
The FLOWCELL names all contain an "*".
It looks to me like everything is there that should be. Maybe I am just doing something wrong. I am on this R:
version
_
platform x86_64-apple-darwin15.6.0
arch x86_64
os darwin15.6.0
system x86_64, darwin15.6.0
status
major 3
minor 5.1
year 2018
month 07
day 02
svn rev 74947
language R
version.string R version 3.5.1 (2018-07-02)
nickname Feather Spray
And RStudio version 1.0.153
sexy_markers
works on:
To do:
Hi Thierry,
I've found an issue with using tidy_genomic_data() with multiallelic datasets. I've been able to trace the problem to the change_alleles() function (called from tidy_genomic_data()), but haven't been able to further pinpoint the problem. Here is a small example:
library(radiator)
library(tidyverse)
#> Loading tidyverse: ggplot2
#> Loading tidyverse: tibble
#> Loading tidyverse: tidyr
#> Loading tidyverse: readr
#> Loading tidyverse: purrr
#> Loading tidyverse: dplyr
#> Warning: package 'dplyr' was built under R version 3.4.2
#> Conflicts with tidy packages ----------------------------------------------
#> filter(): dplyr, stats
#> lag(): dplyr, stats
# A sample tidy biallelic data frame
tidy_dat <- tibble::tribble(
~INDIVIDUALS, ~POP_ID, ~LOCUS, ~GT,
"IND1", "POP1", "loc1", "000000",
"IND2", "POP1", "loc1", "001002"
)
# This works as expected
change_alleles(tidy_dat, verbose = TRUE)
#> Scanning for number of alleles per marker...
#> Data is biallellic
#> Generating vcf-style coding
#> $input
#> # A tibble: 2 x 8
#> INDIVIDUALS POP_ID MARKERS GT ALT REF GT_VCF GT_BIN
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
#> 1 IND1 POP1 loc1 000000 C A ./. NA
#> 2 IND2 POP1 loc1 001002 C A 0/1 1
#>
#> $biallelic
#> [1] TRUE
# A sample tidy multi-allelic data frame
tidy_dat2 <- tibble::tribble(
~INDIVIDUALS, ~POP_ID, ~LOCUS, ~GT,
"IND1", "POP1", "loc1", "000000",
"IND2", "POP1", "loc1", "001002",
"IND3", "POP1", "loc1", "003004"
)
# This has an unexpected output
change_alleles(tidy_dat2, verbose = TRUE)
#> Scanning for number of alleles per marker...
#> Data is multiallellic
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Integrating new genotype codings...
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> Warning in serialize(data, node$con): 'package:dplyr' may not be available
#> when loading
#> $input
#> # A tibble: 3 x 8
#> INDIVIDUALS POP_ID MARKERS GT_VCF_NUC REF ALT GT
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 IND1 POP1 loc1 000000/000000 000000 001002,003004 001001
#> 2 IND2 POP1 loc1 001002/001002 000000 001002,003004 002002
#> 3 IND3 POP1 loc1 003004/003004 000000 001002,003004 003003
#> # ... with 1 more variables: GT_VCF <chr>
#>
#> $biallelic
#> [1] FALSE
-Chris
Hi Thierry,
I'm trying to import a .vcf file produced by the gstacks command of Stacks2.
Variants were called from mapped BAM files, not de novo.
Initially I had this error:
Error in sample.int(length(x), size, replace, prob) :
cannot take a sample larger than the population when 'replace = FALSE'
This is caused by the fact that in the current Stacks2 VCF output the ID column is left blank (.), so tidy_vcf() fails at this command:
# Since stacks v.1.44 ID as LOCUS + COL (from sumstats) the position of the SNP on the locus.
# Choose the first 100 markers to scan
detect.snp.col <- sample(x = unique(input$LOCUS), size = 100, replace = FALSE) %>%
stringi::stri_detect_fixed(str = ., pattern = "_") %>%
unique
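The error follows directly from this snippet: when every ID is ".", unique(input$LOCUS) has length 1, and sample(size = 100, replace = FALSE) cannot draw 100 items from a population of 1. A guarded version is sketched below; scan_loci() is a hypothetical helper, not radiator's actual code.

```r
# Sketch (not radiator's actual code): choose up to n loci to scan without
# erroring when fewer than n unique IDs exist.
scan_loci <- function(locus_ids, n = 100L) {
  loci <- unique(locus_ids)
  # sample.int() avoids sample()'s special case for a length-1 numeric x
  loci[sample.int(length(loci), size = min(n, length(loci)))]
}

# A Stacks2 VCF with a blank ID column yields a single unique value, ".".
ids <- rep(".", 500)
picked <- scan_loci(ids)
picked
# The separator check from the snippet above then simply finds no "_".
unique(grepl("_", picked, fixed = TRUE))
```

Capping the sample size at the number of unique loci keeps the check working for both small and blank-ID datasets.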
To work around this, I tried changing the ID column (imported as LOCUS) to the format mentioned in tidy_vcf() (CHROM_POS), or to just POS, but then it crashes completely without giving any message.
Any ideas?
Thanks, Ido
I am trying to convert a vcf file to another format, i.e. genind. The program recognizes the files and identifies the correct number of individuals and SNPs, but fails to complete the conversion. I keep receiving the following message:
Error in .DynamicClusterCall(cl, length(cl), .fun = function(.proc_idx, : One of the nodes produced an error: Can not open file 'C:\Users\Documents\Data_Analysis\134_radiator_genomic_converter_20190323@1302\[email protected]'. The process cannot access the file because it is being used by another process.
I've been getting errors trying to use genomic_converter() and write_colony().
I generated my original .vcf file in ipyrad, and then used PLINK to do some preliminary filtering (removing triallelic loci, and filtering by maf), but I get the same error regardless of which vcf file I use. I've previously used these files for some analyses in adegenet, so I'm not sure where the origin of the error is.
Here's my code:
ash_info<-read.csv('C:/Users/smart/Desktop/plink2_win64_20181028/sca_ASH_info3.csv')
ash_strata<-ash_info[,c(1,11)]
colnames(ash_strata)<-c("INDIVIDUALS","STRATA")
bad_ids<-as.data.frame(c("sca1194"))
colnames(bad_ids)<-c("INDIVIDUALS")
ash_output<-genomic_converter("C:/Users/smart/Desktop/plink2_win64_20181028/ASH_sept_2018_vcf_filter.vcf", strata=ash_strata, imputation.method = "rf", blacklist.id =bad_ids, output=c('genind','tidy'))
which returns the following:
#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: C:/tester
Input file: C:/Users/smart/Desktop/plink2_win64_20181028/ASH_sept_2018_vcf_filter.vcf
Strata: 1:109c(5, 5, 3, 4, 8, 15, 9, 8, 8, 3, 3, 12, 8, 6, 15, 8, 5, 2, 1, 2, 13, 12, 1, 13, 6, 2, 6, 6, 4, 2, 4, 14, 3, 3, 3, 14, 5, 3, 12, 10, 3, 2, 1, 3, 6, 6, 6, 3, 13, 1, 1, 3, 3, 3, 3, 3, 13, 12, 3, 15, 8, 3, 8, 11, 11, 11, 5, 3, 3, 3, 6, 6, 14, 11, 8, 6, 15, 10, 6, 4, 3, 11, 13, 11, 6, 3, 2, 3, 3, 6, 8, 8, 8, 8, 9, 11, 2, 12, 2, 6, 6, 7, 7, 5, 5, 5, 5, 5, 5)
Population levels: no
Population labels: no
Output format(s): tidy
Filename prefix: no
Filters:
Blacklist of individuals: 1
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: TRUE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no
Imputations options:
imputation.method: rf
hierarchical.levels: strata
parallel.core: 3
#######################################################################
Importing data
Number of individuals in blacklist: 1
Reading VCF...
Error in if (check.header$format$Number[check.header$format$ID == "AD"] == :
argument is of length zero
I get the same error if I use the original vcf from ipyrad too.
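The failing line indexes the FORMAT header entries by ID == "AD". If the VCF header declares no AD field, that index returns a zero-length vector, the comparison yields logical(0), and if() stops with "argument is of length zero". That seems the likely cause here, though it is not confirmed. A base R sketch for checking a header is below; vcf_has_format_field() is a hypothetical helper.

```r
# Hypothetical helper: does a VCF header declare a given FORMAT field?
# gzfile() also reads uncompressed files, so .vcf and .vcf.gz both work.
vcf_has_format_field <- function(path, id = "AD") {
  con <- gzfile(path, open = "r")
  on.exit(close(con))
  target <- paste0("##FORMAT=<ID=", id, ",")
  repeat {
    line <- readLines(con, n = 1L)
    # Stop at end of file or at the first non-meta line (#CHROM ...)
    if (length(line) == 0L || !startsWith(line, "##")) return(FALSE)
    if (startsWith(line, target)) return(TRUE)
  }
}

# Toy header with GT but no AD field
tmp <- tempfile(fileext = ".vcf")
writeLines(c("##fileformat=VCFv4.2",
             '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
             "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"), tmp)
vcf_has_format_field(tmp, "AD")
vcf_has_format_field(tmp, "GT")
```

Only the meta-information lines are read, so the check is fast even on large files.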
Out of curiosity, I also tried using vcfR to import the vcf file, convert it to a genind object, and use that in the genomic converter function as below:
ash_vcf<-read.vcfR("C:/Users/smart/Desktop/plink2_win64_20181028/ASH_sept_2018_vcf_filter.vcf")
ash_genind<-vcfR2genind(ash_vcf)
ash_genind@pop<-ash_info[,11]
ash_output<-genomic_converter(ash_genind, blacklist.id = bad_ids, snp.ld = "random")
Which returns this error:
#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: C:/tester
Input file: from global environment
Strata: no
Population levels: no
Population labels: no
Output format(s): tidy
Filename prefix: no
Filters:
Blacklist of individuals: 1
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: TRUE
snp.ld: random
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no
Imputations options:
imputation.method: no
parallel.core: 3
#######################################################################
Importing data
Number of individuals in blacklist: 1
Alleles names for each markers will be converted to factors and padded with 0
Error in .f(.x[[i]], ...) : object 'CHROM' not found
In addition: There were 27 warnings (use warnings() to see them)
The 27 warnings are all:
In serialize(data, node$con) :
'package:dplyr' may not be available when loading
And my sessionInfo:
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages:
[1] grid stats graphics grDevices utils datasets
[7] methods base
other attached packages:
[1] grur_0.0.11 bindrcpp_0.2.2 vcfR_1.8.0
[4] radiator_0.0.19 viridisLite_0.3.0 geoR_1.7-5.2.1
[7] fields_9.6 maps_3.3.0 spam_2.2-0
[10] dotCall64_1.0-0 INLA_18.07.12 sp_1.3-1
[13] Matrix_1.2-14 adegenet_2.1.1 ade4_1.7-13
[16] forcats_0.3.0 stringr_1.3.1 dplyr_0.7.8
[19] purrr_0.2.5 readr_1.1.1 tidyr_0.8.2
[22] tibble_1.4.2 ggplot2_3.1.0 tidyverse_1.2.1
loaded via a namespace (and not attached):
[1] readxl_1.1.0 uuid_0.1-2
[3] backports_1.1.2 plyr_1.8.4
[5] igraph_1.2.2 lazyeval_0.2.1
[7] splines_3.5.1 listenv_0.7.0
[9] rncl_0.8.3 GenomeInfoDb_1.18.0
[11] amap_0.8-16 digest_0.6.18
[13] htmltools_0.3.6 gdata_2.18.0
[15] magrittr_1.5 RandomFieldsUtils_0.3.25
[17] cluster_2.0.7-1 Biostrings_2.50.1
[19] globals_0.12.4 modelr_0.1.2
[21] gmodels_2.18.1 prettyunits_1.0.2
[23] colorspace_1.3-2 rvest_0.3.2
[25] haven_1.1.2 tcltk_3.5.1
[27] crayon_1.3.4 RCurl_1.95-4.11
[29] jsonlite_1.5 bindr_0.1.1
[31] phylobase_0.8.4 ape_5.2
[33] glue_1.3.0 gtable_0.2.0
[35] zlibbioc_1.28.0 XVector_0.22.0
[37] seqinr_3.4-5 BiocGenerics_0.28.0
[39] adegraphics_1.0-12 scales_1.0.0
[41] Rcpp_1.0.0 RandomFields_3.1.50
[43] xtable_1.8-3 progress_1.2.0
[45] spData_0.2.9.4 spdep_0.7-9
[47] stats4_3.5.1 httr_1.3.1
[49] RColorBrewer_1.1-2 pkgconfig_2.0.2
[51] XML_3.98-1.16 deldir_0.1-15
[53] SeqArray_1.23.1 tidyselect_0.2.5
[55] rlang_0.3.0.1 reshape2_1.4.3
[57] later_0.7.5 munsell_0.5.0
[59] pbmcapply_1.3.0 adephylo_1.1-11
[61] cellranger_1.1.0 tools_3.5.1
[63] cli_1.0.1 splancs_2.01-40
[65] broom_0.5.0 evaluate_0.12
[67] yaml_2.2.0 knitr_1.20
[69] gdsfmt_1.18.0 adespatial_0.3-2
[71] future_1.10.0 nlme_3.1-137
[73] mime_0.6 xml2_1.2.0
[75] compiler_3.5.1 rstudioapi_0.8
[77] RNeXML_2.2.0 stringi_1.2.4
[79] memuse_4.0-0 lattice_0.20-38
[81] vegan_2.5-3 permute_0.9-4
[83] pillar_1.3.0 LearnBayes_2.15.1
[85] data.table_1.11.8 cowplot_0.9.3
[87] bitops_1.0-6 httpuv_1.4.5
[89] GenomicRanges_1.34.0 R6_2.3.0
[91] latticeExtra_0.6-28 promises_1.0.1
[93] KernSmooth_2.23-15 IRanges_2.16.0
[95] codetools_0.2-15 boot_1.3-20
[97] MASS_7.3-51.1 gtools_3.8.1
[99] assertthat_0.2.0 rprojroot_1.3-2
[101] withr_2.1.2 pinfsc50_1.1.0
[103] GenomeInfoDbData_1.2.0 S4Vectors_0.20.0
[105] mgcv_1.8-25 expm_0.999-3
[107] parallel_3.5.1 hms_0.4.2
[109] fst_0.8.8 coda_0.19-2
[111] rmarkdown_1.10 ggpubr_0.1.8
[113] shiny_1.2.0 lubridate_1.7.4
[115] base64enc_0.1-3
I'm guessing something is wrong on my end, but I've been trying to troubleshoot this for the last few days, and not really sure what else to do. I'll email you the other vcf file and the pop info file.
Thanks again for any help with this,
-Scott
Hi Thierry,
Thank you for a terrific package!
My problem is that after filter_hwe I do not get a working file, and therefore I cannot proceed with my analysis. I am sending an RData zip file that hopefully contains what you need. I understand the function should produce a .rad file, but I cannot find it. Also, if it does save a file, what type of file would it be: genind, genepop, etc.?
Many thanks,
Rita
Describe the bug
filter_hwe does not save the filtered dataset, i.e. the data with markers removed when they are in Hardy-Weinberg disequilibrium in a given number of populations.
To Reproduce
radiator::filter_hwe(radiator.gen,
interactive.filter = TRUE,
filter.hwe = TRUE,
strata = NULL,
hw.pop.threshold = TRUE,
midp.threshold = "***",
filename = NULL,
parallel.core = parallel::detectCores() - 1,
verbose = TRUE)
the complete error message you're getting
Content of folder:02_filter_hwe_20190812@2233
[email protected]
genotypes.summary.tsv
hw.pop.sum.tsv
hwd.helper.table.tsv
hwd.plot.blacklist.markers.pdf
[email protected]
the output of devtools::session_info()
devtools::session_info()
─ Session info ─────────────────────────────────────────────────────────
setting value
version R version 3.6.1 (2019-07-05)
os macOS Mojave 10.14.5
system x86_64, darwin15.6.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Lisbon
date 2019-08-12
─ Packages ─────────────────────────────────────────────────────────────
package * version date lib
abind 1.4-5 2016-07-21 [1]
acepack 1.4.1 2016-10-29 [1]
ade4 * 1.7-13 2018-08-31 [1]
adegenet * 2.1.1 2018-02-02 [1]
ape * 5.3 2019-03-17 [1]
assertthat 0.2.1 2019-03-21 [1]
backports 1.1.4 2019-04-10 [1]
base64enc 0.1-3 2015-07-28 [1]
bayesm 3.1-3 2019-07-29 [1]
BDgraph 2.60 2019-08-08 [1]
bitops 1.0-6 2013-08-17 [1]
boot 1.3-23 2019-07-05 [1]
broom 0.5.2 2019-04-07 [1]
calibrate 1.7.2 2013-09-10 [1]
callr 3.3.1 2019-07-18 [1]
caTools 1.17.1.2 2019-03-06 [1]
checkmate 1.9.4 2019-07-04 [1]
class 7.3-15 2019-01-01 [1]
classInt 0.4-1 2019-08-06 [1]
cli 1.1.0 2019-03-19 [1]
cluster 2.1.0 2019-06-19 [1]
coda 0.19-3 2019-07-05 [1]
codetools 0.2-16 2018-12-24 [1]
colorspace 1.4-1 2019-03-18 [1]
combinat 0.0-8 2012-10-29 [1]
compositions 1.40-2 2018-06-14 [1]
corpcor 1.6.9 2017-04-01 [1]
crayon 1.3.4 2017-09-16 [1]
crosstalk 1.0.0 2016-12-21 [1]
curl 4.0 2019-07-22 [1]
d3Network 0.5.2.1 2015-01-31 [1]
dartR * 1.3.4 2019-08-11 [1]
data.table 1.12.2 2019-04-07 [1]
DBI 1.0.0 2018-05-02 [1]
deldir 0.1-23 2019-07-31 [1]
DEoptimR 1.0-8 2016-11-19 [1]
desc 1.2.0 2018-05-01 [1]
devtools * 2.1.0 2019-07-06 [1]
digest 0.6.20 2019-07-04 [1]
directlabels 2018.05.22 2018-05-25 [1]
dismo 1.1-4 2017-01-09 [1]
diveRsity * 1.9.90 2017-04-04 [1]
doParallel * 1.0.15 2019-08-02 [1]
dplyr 0.8.3 2019-07-04 [1]
e1071 1.7-2 2019-06-05 [1]
ellipse 0.4.1 2018-01-05 [1]
energy 1.7-6 2019-07-06 [1]
expm 0.999-4 2019-03-21 [1]
fansi 0.4.0 2018-10-05 [1]
fastmatch 1.1-0 2017-01-28 [1]
fdrtool 1.2.15 2015-07-08 [1]
foreach * 1.4.7 2019-07-27 [1]
foreign 0.8-72 2019-08-02 [1]
Formula 1.2-3 2018-05-03 [1]
fs 1.3.1 2019-05-06 [1]
fst 0.9.0 2019-04-09 [1]
gap 1.2.1 2019-06-05 [1]
gdata 2.18.0 2017-06-06 [1]
gdistance 1.2-2 2018-05-07 [1]
gdsfmt 1.20.0 2019-05-02 [1]
generics 0.0.2 2018-11-29 [1]
genetics 1.3.8.1.2 2019-04-22 [1]
GGally 1.4.0 2018-05-17 [1]
ggm 2.3 2015-01-21 [1]
ggplot2 * 3.2.1 2019-08-10 [1]
ggpubr * 0.2.2 2019-08-07 [1]
ggsignif 0.6.0 2019-08-08 [1]
ggtern 3.1.0 2018-12-19 [1]
glasso 1.10 2018-07-13 [1]
glue 1.3.1 2019-03-12 [1]
gmodels 2.18.1 2018-06-25 [1]
gplots * 3.0.1.1 2019-01-27 [1]
gridExtra 2.3 2017-09-09 [1]
gtable 0.3.0 2019-03-25 [1]
gtools 3.8.1 2018-06-26 [1]
HardyWeinberg 1.6.3 2019-06-29 [1]
hierfstat * 0.04-22 2015-12-04 [1]
Hmisc 4.2-0 2019-01-26 [1]
hms 0.5.0 2019-07-09 [1]
htmlTable 1.13.1 2019-01-07 [1]
htmltools 0.3.6 2017-04-28 [1]
htmlwidgets 1.3 2018-09-30 [1]
httpuv 1.5.1 2019-04-05 [1]
huge 1.3.2 2019-04-08 [1]
HWxtest * 1.1.9 2019-05-31 [1]
igraph 1.2.4.1 2019-04-22 [1]
iterators * 1.0.12 2019-07-26 [1]
jomo 2.6-9 2019-07-29 [1]
jpeg 0.1-8 2014-01-23 [1]
jsonlite 1.6 2018-12-07 [1]
KernSmooth 2.23-15 2015-06-29 [1]
knitr 1.24 2019-08-08 [1]
labeling 0.3 2014-08-23 [1]
later 0.8.0 2019-02-11 [1]
latex2exp 0.4.0 2015-11-30 [1]
lattice 0.20-38 2018-11-04 [1]
latticeExtra 0.6-28 2016-02-09 [1]
lavaan 0.6-4 2019-07-03 [1]
lazyeval 0.2.2 2019-03-15 [1]
leaflet 2.0.2 2018-08-27 [1]
LearnBayes 2.15.1 2018-03-18 [1]
lme4 1.1-21 2019-03-05 [1]
magrittr * 1.5 2014-11-22 [1]
manipulateWidget 0.10.0 2018-06-11 [1]
MASS 7.3-51.4 2019-03-31 [1]
Matrix 1.2-17 2019-03-22 [1]
memoise 1.1.0 2017-04-21 [1]
mgcv 1.8-28 2019-03-21 [1]
mice 3.6.0 2019-07-10 [1]
mime 0.7 2019-06-11 [1]
miniUI 0.1.1.1 2018-05-18 [1]
minqa 1.2.4 2014-10-09 [1]
mitml 0.3-7 2019-01-07 [1]
mmod * 1.3.3 2017-04-06 [1]
mnormt 1.5-5 2016-10-15 [1]
munsell 0.5.0 2018-06-12 [1]
mvtnorm 1.0-11 2019-06-19 [1]
nlme 3.1-141 2019-08-01 [1]
nloptr 1.2.1 2018-10-03 [1]
nnet 7.3-12 2016-02-02 [1]
pan 1.6 2018-06-29 [1]
pander * 0.6.3 2018-11-06 [1]
pbapply 1.4-1 2019-07-15 [1]
pbivnorm 0.6.0 2015-01-23 [1]
pbmcapply 1.5.0 2019-07-10 [1]
pca3d 0.10 2017-02-17 [1]
pegas * 0.11 2018-07-09 [1]
permute 0.9-5 2019-03-12 [1]
phangorn 2.5.5 2019-06-19 [1]
pillar 1.4.2 2019-06-29 [1]
pinfsc50 1.1.0 2016-12-02 [1]
pixmap * 0.4-11 2011-07-19 [1]
pkgbuild 1.0.4 2019-08-05 [1]
pkgconfig 2.0.2 2018-08-16 [1]
pkgload 1.0.2 2018-10-29 [1]
plyr 1.8.4 2016-06-08 [1]
png 0.1-7 2013-12-03 [1]
polysat 1.7-4 2019-03-06 [1]
PopGenReport 3.0.4 2019-02-04 [1]
poppr * 2.8.3 2019-06-18 [1]
prettyunits 1.0.2 2015-07-13 [1]
processx 3.4.1 2019-07-18 [1]
promises 1.0.1 2018-04-13 [1]
proto 1.0.0 2016-10-29 [1]
ps 1.3.0 2018-12-21 [1]
psych 1.8.12 2019-01-12 [1]
purrr 0.3.2 2019-03-15 [1]
qgraph 1.6.3 2019-06-19 [1]
quadprog 1.5-7 2019-05-06 [1]
qvalue 2.16.0 2019-05-02 [1]
R.methodsS3 1.7.1 2016-02-16 [1]
R.oo 1.22.0 2018-04-22 [1]
R.utils 2.9.0 2019-06-13 [1]
R6 2.4.0 2019-02-14 [1]
radiator * 1.1.1 2019-08-12 [1]
raster 2.9-23 2019-07-11 [1]
RColorBrewer * 1.1-2 2014-12-07 [1]
Rcpp 1.0.2 2019-07-25 [1]
readr 1.3.1 2018-12-21 [1]
remotes 2.1.0 2019-06-24 [1]
reshape 0.8.8 2018-10-23 [1]
reshape2 * 1.4.3 2017-12-11 [1]
rgdal 1.4-4 2019-05-29 [1]
rgeos 0.5-1 2019-08-05 [1]
rgl 0.100.26 2019-07-08 [1]
RgoogleMaps 1.4.3 2018-11-07 [1]
rJava * 0.9-11 2019-03-29 [1]
rjson 0.2.20 2018-06-08 [1]
rlang 0.4.0 2019-06-25 [1]
robustbase 0.93-5 2019-05-12 [1]
rpart 4.1-15 2019-04-12 [1]
rprojroot 1.3-2 2018-01-03 [1]
rrBLUP 4.6 2018-01-28 [1]
Rsolnp 1.16 2015-12-28 [1]
rstudioapi 0.10 2019-03-19 [1]
rtiff * 1.4.6 2019-03-21 [1]
scales 1.0.0 2018-08-09 [1]
sendplot * 4.0.0 2013-04-25 [1]
seqinr 3.4-5 2017-08-01 [1]
sessioninfo 1.1.1 2018-11-05 [1]
sf 0.7-7 2019-07-24 [1]
shiny 1.3.2 2019-04-22 [1]
SNPRelate 1.18.1 2019-07-03 [1]
sp 1.3-1 2018-06-05 [1]
spData 0.3.0 2019-01-07 [1]
spdep 1.1-2 2019-04-05 [1]
StAMPP * 1.5.1 2017-11-10 [1]
stringi 1.4.3 2019-03-12 [1]
stringr 1.4.0 2019-02-10 [1]
survival 2.44-1.1 2019-04-01 [1]
tensorA 0.36.1 2018-07-29 [1]
testthat 2.2.1 2019-07-25 [1]
tibble 2.1.3 2019-06-06 [1]
tidyr 0.8.3 2019-03-01 [1]
tidyselect 0.2.5 2018-10-11 [1]
truncnorm 1.0-8 2018-02-27 [1]
units 0.6-3 2019-05-03 [1]
usethis * 1.5.1 2019-07-04 [1]
utf8 1.1.4 2018-05-24 [1]
vcfR 1.8.0 2018-04-17 [1]
vctrs 0.2.0 2019-07-05 [1]
vegan 2.5-5 2019-05-12 [1]
viridisLite 0.3.0 2018-02-01 [1]
webshot 0.5.1 2018-09-28 [1]
whisker 0.3-2 2013-04-28 [1]
withr 2.1.2 2018-03-15 [1]
xfun 0.8 2019-06-25 [1]
xlsx * 0.6.1 2018-06-11 [1]
xlsxjars * 0.6.1 2014-08-22 [1]
xtable 1.8-4 2019-04-21 [1]
zeallot 0.1.0 2018-01-28 [1]
zvau * 0.27 2019-08-06 [1]
source
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.1)
CRAN (R 3.6.1)
CRAN (R 3.6.0)
CRAN (R 3.6.1)
CRAN (R 3.6.0)
CRAN (R 3.6.1)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
Github (green-striped-gecko/dartR@3f9eebd)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
Bioconductor
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.1)
CRAN (R 3.6.0)
CRAN (R 3.6.1)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.1)
CRAN (R 3.6.1)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.1)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.1)
CRAN (R 3.6.1)
CRAN (R 3.6.0)
CRAN (R 3.6.1)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.1)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.1)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.1)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
Bioconductor
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
Github (c9804ca)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.1)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
Bioconductor
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.1)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
CRAN (R 3.6.0)
Github (romunov/zvau@72f403b)
[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
Hi Thierry,
I am attempting to use radiator::filter_rad to filter a populations.snps.vcf file from Stacks, but I consistently get an error at "Generating individual stats..." and then I do not get the output files I've specified. Based on testing with different arguments and versions of the input file (including after filtering out some missing data in vcftools) and on reading other issues here, I suspect there might be a problem with my VCF header that I'm unaware of.
I appreciate any help you can offer. Thanks for making this package and I hope to be able to make it part of my workflow!
-Sarah
All of these lead to the same error:
myfiltereddata <- filter_rad(
data = "populations.snps.vcf",
filter.short.ld = "random",
filter.long.ld = 0.5,
filter.hwe = TRUE,
strata = "strata_locyear.txt",
output = c("vcf","genind"),
filename = "filter_rad_output")
myfiltereddata <- filter_rad(
data = "populations.snps.vcf",
filter.short.ld = "random",
filter.long.ld = 0.5,
filter.hwe = TRUE,
strata = NULL,
output = c("vcf","genind"),
filename = "filter_rad_output")
myfiltereddata <- filter_rad(
data = "populations.snps.vcf",
strata = NULL,
interactive.filter = TRUE)
myfiltereddata <- filter_rad(
data = "populations.snps.vcftools.recode",
strata = NULL,
interactive.filter = TRUE)
myfiltereddata <- filter_rad(
data = "RH_subset.vcf",
strata = NULL,
interactive.filter = TRUE)
myfiltereddata <- filter_rad(
Execution date@time: 20190820@0008
Folder created: filter_rad_20190820@0008
Function call and arguments stored in: [email protected]
File written: random.seed (563656)
Filters parameters file generated: [email protected]
Reading VCF
Data summary:
number of samples: 94
number of markers: 185
done! timing: 0 sec
Generating individual stats...
Error in if (stats::sd(id.info$COVERAGE_MEAN) != 0) { :
missing value where TRUE/FALSE needed
Computation time, overall: 1 sec
############################# completed filter_rad #############################
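The error above comes from R's if() receiving NA: stats::sd() returns NA when its input contains a missing value (na.rm defaults to FALSE) or has fewer than two values, and if (NA != 0) then stops with exactly this message. That is consistent with coverage information not being read for some individuals, but the actual cause in this VCF is not established here. A minimal base R sketch of the failure and of a defensive version of the check (coverage_mean and coverage_varies are hypothetical names, not radiator code):

```r
# Hypothetical per-individual mean coverage values, one of which is missing.
coverage_mean <- c(12.5, NA, 8.0)

# sd() returns NA when the input contains NA (na.rm defaults to FALSE),
# so the if() condition is NA and R stops with an error.
res <- tryCatch(
  if (stats::sd(coverage_mean) != 0) "varies" else "constant",
  error = function(e) conditionMessage(e)
)
res  # the error message: "missing value where TRUE/FALSE needed"

# Defensive version of the same test: ignore NAs and treat an undefined
# standard deviation (all values missing, or fewer than two values)
# as "no variation" instead of failing.
coverage_varies <- function(x) {
  isTRUE(stats::sd(x, na.rm = TRUE) != 0)
}

coverage_varies(coverage_mean)             # TRUE
coverage_varies(c(NA_real_, NA_real_))     # FALSE
```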
RH_subset.vcf.zip
devtools::session_info()
─ Session info ──────────────────────────────────────────────────────────────────────────────────
setting value
version R version 3.6.0 (2019-04-26)
os macOS High Sierra 10.13.6
system x86_64, darwin15.6.0
ui RStudio
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz America/New_York
date 2019-08-20
─ Packages ──────────────────────────────────────────────────────────────────────────────────────
package * version date lib source
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0)
backports 1.1.4 2019-04-10 [1] CRAN (R 3.6.0)
Biobase 2.44.0 2019-05-02 [1] Bioconductor
BiocGenerics 0.30.0 2019-05-02 [1] Bioconductor
Biostrings 2.52.0 2019-05-02 [1] Bioconductor
bitops 1.0-6 2013-08-17 [1] CRAN (R 3.6.0)
boot 1.3-23 2019-07-05 [1] CRAN (R 3.6.0)
broom 0.5.2 2019-04-07 [1] CRAN (R 3.6.0)
callr 3.3.1 2019-07-18 [1] CRAN (R 3.6.0)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.6.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0)
data.table 1.12.2 2019-04-07 [1] CRAN (R 3.6.0)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0)
devtools 2.1.0 2019-07-06 [1] CRAN (R 3.6.0)
digest 0.6.20 2019-07-04 [1] CRAN (R 3.6.0)
dplyr 0.8.3 2019-07-04 [1] CRAN (R 3.6.0)
fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0)
gdsfmt 1.20.0 2019-05-02 [1] Bioconductor
generics 0.0.2 2018-11-29 [1] CRAN (R 3.6.0)
GenomeInfoDb 1.20.0 2019-05-02 [1] Bioconductor
GenomeInfoDbData 1.2.1 2019-08-02 [1] Bioconductor
GenomicRanges 1.36.0 2019-05-02 [1] Bioconductor
glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0)
GWASExactHW 1.01 2013-01-05 [1] CRAN (R 3.6.0)
hms 0.5.0 2019-07-09 [1] CRAN (R 3.6.0)
IRanges 2.18.1 2019-05-31 [1] Bioconductor
jomo 2.6-9 2019-07-29 [1] CRAN (R 3.6.0)
lattice 0.20-38 2018-11-04 [1] CRAN (R 3.6.0)
lme4 1.1-21 2019-03-05 [1] CRAN (R 3.6.0)
logistf 1.23 2018-07-19 [1] CRAN (R 3.6.0)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0)
MASS 7.3-51.4 2019-03-31 [1] CRAN (R 3.6.0)
Matrix 1.2-17 2019-03-22 [1] CRAN (R 3.6.0)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0)
mgcv 1.8-28 2019-03-21 [1] CRAN (R 3.6.0)
mice 3.6.0 2019-07-10 [1] CRAN (R 3.6.0)
minqa 1.2.4 2014-10-09 [1] CRAN (R 3.6.0)
mitml 0.3-7 2019-01-07 [1] CRAN (R 3.6.0)
nlme 3.1-140 2019-05-12 [1] CRAN (R 3.6.0)
nloptr 1.2.1 2018-10-03 [1] CRAN (R 3.6.0)
nnet 7.3-12 2016-02-02 [1] CRAN (R 3.6.0)
pan 1.6 2018-06-29 [1] CRAN (R 3.6.0)
pillar 1.4.2 2019-06-29 [1] CRAN (R 3.6.0)
pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.6.0)
pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.6.0)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.6.0)
processx 3.4.1 2019-07-18 [1] CRAN (R 3.6.0)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0)
purrr 0.3.2 2019-03-15 [1] CRAN (R 3.6.0)
R6 2.4.0 2019-02-14 [1] CRAN (R 3.6.0)
radiator * 1.1.2 2019-08-20 [1] Github (53a137d)
Rcpp 1.0.2 2019-07-25 [1] CRAN (R 3.6.0)
RCurl 1.95-4.12 2019-03-04 [1] CRAN (R 3.6.0)
readr 1.3.1 2018-12-21 [1] CRAN (R 3.6.0)
remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0)
rlang 0.4.0 2019-06-25 [1] CRAN (R 3.6.0)
rpart 4.1-15 2019-04-12 [1] CRAN (R 3.6.0)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0)
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.6.0)
S4Vectors 0.22.0 2019-05-02 [1] Bioconductor
SeqArray 1.24.2 2019-07-12 [1] Bioconductor
SeqVarTools 1.22.0 2019-05-02 [1] Bioconductor
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0)
stringi 1.4.3 2019-03-12 [1] CRAN (R 3.6.0)
survival 2.44-1.1 2019-04-01 [1] CRAN (R 3.6.0)
testthat 2.2.1 2019-07-25 [1] CRAN (R 3.6.0)
tibble 2.1.3 2019-06-06 [1] CRAN (R 3.6.0)
tidyr 0.8.3 2019-03-01 [1] CRAN (R 3.6.0)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.6.0)
usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0)
vctrs 0.2.0 2019-07-05 [1] CRAN (R 3.6.0)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0)
XVector 0.24.0 2019-05-02 [1] Bioconductor
yaml 2.2.0 2018-07-25 [1] CRAN (R 3.6.0)
zeallot 0.1.0 2018-01-28 [1] CRAN (R 3.6.0)
zlibbioc 1.30.0 2019-05-02 [1] Bioconductor
[1] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
See attached subset
Hi Thierry! When I try to run genomic_converter with blacklist.id, I get a fatal error and RStudio shuts down. It seems to be able to read the file because it says there are 11 individuals in the file (which is correct), but after it reads the loci and says Done, the fatal error occurs. I've run genomic_converter without blacklist.id and it works fine so there seems to be an issue with the blacklist.id.
Dear Thierry,
I have been trying to run the function run_bayescan after filtering my SNPs in radiator. I keep getting the error below from R:
This is the code I used and all files are available at the shared Dropbox link:
path = "C:/Users/tj248/OneDrive - University of Exeter/Exeter University/PhD Project Documents/Software/BayeScan v 2.1/binaries/BayeScan2.1_win32bits_cmd_line.exe"
bayes = run_bayescan("../../R Radiator SNP Filtering/seafan_filt.txt", pr_odds = 100, bayescan.path = path)
I also ran the program using the Windows GUI (using the same input file, seafan_filt.txt), and this has told me what appears to be the issue.
When I checked seafan_filt.txt, I noticed that locus 212 is missing from the file, which was generated using the genomic_converter function. The code I used for genomic_converter is below:
seafan_filt = genomic_converter(seafan, output=c("arlequin","bayescan","genepop", "pcadapt","structure","vcf","genind"), filename="seafan_filt")
I would be grateful for your help.
Many thanks,
Tom
Prepare for release:
usethis::use_cran_comments()
devtools::check()
devtools::check_win_devel()
rhub::check_for_cran() (Remotes and other dependencies found only on GitHub :( )
Submit to CRAN:
usethis::use_version('minor')
cran-comments.md
devtools::submit_cran()
Wait for CRAN...
usethis::use_github_release()
usethis::use_dev_version()
usethis::use_news()
Dear Thierry,
Four months ago I used radiator version 0.6 to filter my DArT data. In that version, the output files (e.g. structure, genepop, arlequin) had SNPs coded as 012. Now I have to refilter my data, but in version 0.10 the SNPs are coded as 1234.
Is there a repository with old versions of radiator? If not, is there any command or function to get output files coded as 012?
Thanks in advance
Hi Thierry,
Could you please check the code of the genomic converter? For some reason, it is failing to generate output files.
Please find the code I used and the console output below. Thanks
radiator::genomic_converter(data = "tidy.data.snp.ld.rad", output = c("finestructure"))
#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: /Users/buitracn/Documents/RADseq/RADseq-Big-project/VCF-files-poptest/stacks.v2/6pop_448samples/after.dDocent.filters.6pop_345samples/filter_rad_20180914@0156/07_filter_snp_number_20180914@0156
Input file: tidy.data.snp.ld.rad
Strata: no
Population levels: no
Population labels: no
Output format(s): tidy, finestructure
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: TRUE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no
Imputations options:
imputation.method: no
parallel.core: 7
#######################################################################
Importing data
Using markers common in all populations:
Number of markers before/blacklisted/after:24757/0/24757
Scanning for monomorphic markers...
Number of markers before/blacklisted/after: 24757/0/24757
Tidy genomic data:
Number of common markers: 24757
Number of chromosome/contig/scaffold: 905
Number of individuals: 309
Number of populations: 6
Preparing data for output
Error in `$<-.data.frame`(`*tmp*`, "GT", value = character(0)) :
replacement has 0 rows, data has 7649913
In addition: Warning message:
Unknown or uninitialised column: 'GT'.
Attempting to run fis_summary on either a tidy or genlight object returns an error.
fis.test <- fis_summary(tidy.final, vcf.metadata = FALSE)
#######################################################################
##################### radiator::tidy_genomic_data #####################
#######################################################################
Error in radiator::tidy_genomic_data(data = data, vcf.metadata = TRUE, :
Unknowned "..." parameters maf.pop.num.threshold maf.approach maf.operator
Variations of the above code were used with identical results.
Hi Thierry,
We are trying to use the RF imputation in radiator. I've emailed you the code and data set. I've tried this with both an older version of radiator (0.0.13) and the current release (0.0.18). In both cases I'm using R 3.5.1 and adegenet 2.1.1.
If you run the code, you'll see that the problem is that the non-imputed and imputed data do not match the original data set -- and they don't match in different ways. I've pasted in that content below (the first 10 rows and 6 columns of each data set). You'll see that while the SNP/column names match in the original (dat) and non-imputed (gc$genlight.no.imputation) data sets, the actual data do not. Then, in the imputed data (gc$genlight.imputed), neither the SNP names nor the data match. In all cases, the row names seem to be maintained. Any idea what is going on here?
Thank you,
Brenna
Hi,
I have just started using your package to try to identify sex linked markers in a DArT dataset, and have run into problems with the "sexy_markers" function. I am sure this is probably due to my own incompetence, but as this is quite a new package I quickly ran into a dead end for troubleshooting short of pestering you here, so I apologise.
I created a "strata" file (individual IDs from the DArT file plus "strata", which is sex: M, F or U in this case) and have a csv file from DArT. I ran the following code using these two files:
sexy_markers("Report_DAmh19-4061_SNP_2", strata = AM_STRATA.tsv, filters = TRUE)
and received the following error
Execution date@time: 20190709@1555
Folder created: sexy_markers_20190709@1555
File written: [email protected]
Error in file(con, "r") : cannot open the connection
Computation time, overall: 0 sec
############################ sexy markers completed #############################
Do you have any suggestions as to how I can resolve this issue?
Thanks!
Hi Thierry,
I'm trying out ibdg_fh, but I'm not sure I understand your modification to the way FH is calculated in PLINK. It looks like I can recreate the radiator-calculated FH by dividing the difference of the observed and expected homozygous proportions by the count of loci, while PLINK uses counts throughout, as described at https://www.cog-genomics.org/plink/1.9/basic_stats#ibc (the same result is obtained if proportions are used throughout; the issue seems to be mixing proportions with counts in the calculation). I'm attaching a spreadsheet with radiator values and manually calculated PLINK values to make this clearer.
The description of the radiator ibdg_fh function sounds like you did modify it intentionally to differ from PLINK, but the current calculation seems like a bug. Also, is the population-level FH calculated by averaging the individual FH values?
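For reference, the PLINK 1.9 method-of-moments estimate documented at the link above is computed from counts, where O(HOM) is the observed number of homozygous genotypes for an individual, E(HOM) the expected number under Hardy-Weinberg given the allele frequencies, and N the number of non-missing genotypes. Dividing numerator and denominator by N shows why counts throughout and proportions throughout give the same value:

$$
F = \frac{O(\mathrm{HOM}) - E(\mathrm{HOM})}{N - E(\mathrm{HOM})}
  = \frac{o - e}{1 - e},
\qquad o = \frac{O(\mathrm{HOM})}{N},\quad e = \frac{E(\mathrm{HOM})}{N}
$$

The form described in this report for radiator, $(o - e)/N$, divides a difference of proportions by a count and is therefore not equivalent to $F$ in general.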
Relatedly, could you explain how the calculations for summary_rad() are produced (in comparison to, say, basic.stats() from hierfstat)? FIS values in particular are vastly different. A spreadsheet comparing those is also attached.
sphanorth124spatial.fh.individuals.xlsx
hierfstatbasicstats.radiatorsummarystats.sphanorth124spatial.radiator.50pctmsng.xlsx
Hi Thierry,
Working with the package, mainly to clean, import and convert SNP data to different formats, I've been trying to use the genomic_converter() function and came across a few issues with its behaviour:
SNPRelate format: it ignores the provided output filename and creates a date-signature-based one (see related pull request).
The vcf.metadata=TRUE argument with a VCF file resulted in an error (object DP not found).
The blacklist.id argument can accept either a file or a data.frame object, while blacklist.genotype only accepts a filename containing a data.frame. I know this appears in the function documentation, but the inconsistency confused me for a while until I double-checked the fine details. I suggest making both arguments work with R objects; that makes much more sense than relying on files.
snp.ld lets you choose the first, last or random SNP, while to me it makes sense to allow choosing a SNP that is neither first nor last, because the ones at the tag ends are often supported by fewer reads and are less usable in validation (if flanking primers are to be designed).
That's it for now, thanks,
Ido
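The snp.ld suggestion in this report can be sketched in base R. select_middle_snp is a hypothetical helper, not part of radiator: it assumes a data frame with one row per SNP and columns named LOCUS and POS (position of the SNP on the read), and keeps the SNP whose rank by position is in the middle of each locus. Loci with only one or two SNPs necessarily fall back to an end SNP (the first one, for two SNPs).

```r
# Hypothetical helper (not part of radiator): keep one SNP per locus,
# choosing the SNP whose rank by position is in the middle of the locus.
# Assumes `markers` has one row per SNP and columns LOCUS and POS.
select_middle_snp <- function(markers) {
  # sort by locus, then by position within the locus
  markers <- markers[order(markers$LOCUS, markers$POS), , drop = FALSE]
  # row indices (in the sorted table) grouped by locus
  rows_by_locus <- split(seq_len(nrow(markers)), markers$LOCUS)
  # for n SNPs take rank ceiling(n / 2): the 2nd of 3, the 2nd of 4, ...
  keep <- vapply(rows_by_locus,
                 function(idx) idx[ceiling(length(idx) / 2)],
                 integer(1))
  markers[sort(keep), , drop = FALSE]
}
```

Choosing by rank rather than by distance to the read centre keeps the sketch independent of read length, which the tidy data may not record.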
Hi Thierry,
I'm trying to do a quick PCAdapt analysis with a tidy file generated in radiator, but it seems that something is going wrong with file conversion to both pcadapt and vcf formats, or that our population labels are not working correctly. Currently, pops are labelled using numbers. I can send you files if it makes it easier to address, but I will write as much as possible here. I am running R v 3.5.1, and did a fresh install of radiator.
When I try to convert the tidy dataframe to pcadapt format using
write_pcadapt("brook_char_tidy_maf.tsv", pop.select = c("1","2","3","4","5","6","7","8","9","10","11","14","16","17"), snp.ld = NULL,
maf.thresholds = NULL, filename = NULL,
parallel.core = parallel::detectCores() - 1)
Everything seems to go fine, except that I get a lot of zeros showing up in the data in the console. This seems a little odd.
A small sample:
$genotype.matrix
Blackfly-d1084 Blackfly-d1191 Blackfly-d1797
[1,] "0" "0" "0"
[2,] "9" "0" "0"
[3,] "9" "1" "2"
Blackfly-d2016 Blackfly-d2125 Blackfly-d2204
[1,] "0" "0" "0"
[2,] "0" "1" "1"
[3,] "2" "2" "2"
Blackfly-d2305 Blackfly-d2365 Blackfly-d2388
[1,] "0" "0" "0"
[2,] "0" "1" "0"
[3,] "1" "0" "2"
Blackfly-d2732 Blackfly-d2771-bf Blackfly-d2975
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "1" "2" "1"
Blackfly-d3277 Blackfly-d3469 Blackfly-d3507
[1,] "0" "0" "0"
[2,] "0" "1" "2"
[3,] "2" "2" "1"
Blackfly-d3580 Blackfly-d3638 Blackfly-d503
[1,] "0" "0" "0"
[2,] "0" "2" "1"
[3,] "2" "1" "2"
Blackfly-d940 BobsCove-C1028 BobsCove-C1090
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1140 BobsCove-C1145 BobsCove-C1148
[1,] "0" "0" "0"
[2,] "9" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1189 BobsCove-C1215 BobsCove-C1401
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1449 BobsCove-C1476 BobsCove-C1493
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1526 BobsCove-C1527 BobsCove-C1580
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1607 BobsCove-C1654 BobsCove-C2255
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C2279 BobsCove-C2304 BobsCove-C2316
[1,] "0" "9" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C2390 BobsCove-C2435 BobsCove-C2451
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C2452 BobsCove-C2462 BobsCove-C2530
Looking through the tidy .tsv file (which is my collaborator's), I see some things that may be a bit odd, but he has produced many analyses successfully with the file, so perhaps it is fine. However, I notice, for example, that the ALT and REF alleles are the same throughout the file (different from each other, but the same for all loci). Anyway, in case this file is fine, I run
require(pcadapt)
path_to_file <- "D:/Documents/PostDoc/collaboration-MYates-CapeRace-PondStocking/[email protected]"
MY <- read.pcadapt(path_to_file,type="pcadapt")
And I get the error
1547 lines detected.
324 columns detected.
Warning message:
Only one 'return' characters detected, yet Windows
When I try a different route, using
write_vcf("brook_char_tidy_maf.tsv", pop.info = TRUE, filename = NULL)
I get an error
Error in .f(.x[[i]], ...) : object 'POP_ID' not found
However, I know that this field is in the datafile, so this is weird.
To test to see if the file works without the populations field, I try
write_vcf("brook_char_tidy_maf.tsv", pop.info = FALSE, filename = NULL)
The VCF that I get out of this looks strange in comparison to other VCF files that I have worked with: there are a lot of zeros. Looking at the tidy file, though, this may be due to some issue converting the VCF to a tidy file. I can send you both the VCF and the tidy files if you'd like. Anyway, I didn't get any complaints from R, so I ran
require(pcadapt)
path_to_file <- "D:/Documents/PostDoc/collaboration-MYates-CapeRace-PondStocking/[email protected]"
MY <- read.pcadapt(path_to_file,type="vcf")
This looks more like what I should see
No variant got discarded.
Summary:
- input file: D:/Documents/PostDoc/collaboration-MYates-CapeRace-PondStocking/[email protected]
- output file: C:\Users\Admin\AppData\Local\Temp\RtmpCeIy6G\file183c1e3197.pcadapt
- number of individuals detected: 324
- number of loci detected: 4614
4614 lines detected.
324 columns detected.
I fetched my output file, and then ran
x <- pcadapt(MY,K=20)
plot(x,option="screeplot")
data <- as.matrix(read.table("file183c1e3197.pcadapt"))
#check the data
data[1:5,1:6]
V1 V2 V3 V4 V5 V6
[1,] 9 9 9 9 9 9
[2,] 9 0 1 2 0 2
[3,] 0 0 0 0 0 0
[4,] 9 9 9 9 9 9
[5,] 9 0 9 0 0 0
#Without removal of outliers
I skip the population ID step because we didn't identify populations, so I go to
x<-pcadapt(data,K=9)
And then it errors out
Error in UseMethod("pcadapt") :
no applicable method for 'pcadapt' applied to an object of class "c('matrix', 'integer', 'numeric')"
Any thoughts about what could be causing these problems? I feel like I should send you my .tsv tidy file, and maybe the vcf and pcadapt files that i generated from those in radiator.
With thanks,
Ella
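In the pcadapt-format matrices shown in this report, each row is a locus and each column an individual; genotypes are 0, 1 or 2, and 9 marks a missing genotype (the pcadapt missing-data code). Rows made entirely of 9, or where every observed genotype is identical, carry no information for a PCA and could account for much of what looks like "a lot of zeros". summarise_pcadapt_matrix is a hypothetical base R helper, assuming that layout, for counting such loci:

```r
# Hypothetical helper: summarise a pcadapt-format genotype matrix with loci
# in rows and individuals in columns, genotypes coded 0/1/2, missing as 9.
summarise_pcadapt_matrix <- function(geno) {
  geno <- as.matrix(geno)
  geno[geno == 9] <- NA                        # recode the missing-data value
  missing_rate <- rowMeans(is.na(geno))        # per-locus proportion missing
  # number of distinct non-missing genotype codes at each locus
  n_codes <- apply(geno, 1, function(g) length(unique(g[!is.na(g)])))
  data.frame(
    missing_rate = unname(missing_rate),
    all_missing  = unname(missing_rate == 1),
    invariant    = unname(n_codes == 1)        # every observed genotype identical
  )
}
```

Applied to the matrix read above, e.g. s <- summarise_pcadapt_matrix(data) followed by colSums(s[c("all_missing", "invariant")]), it reports how many loci are entirely missing or invariant.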
Hi Thierry,
We have a new RF issue with v0.0.20, where a warning indicates that there's still missing data after imputation, even though I don't see any NA in the imputed genlight object.
Thanks for any help!
Amanda
> gc <- radiator::genomic_converter(data = miss.genlight,
+ output = "genlight",
+ imputation.method = "rf",
+ monomorphic.out = FALSE,
+ hierarchical.levels = "global",
+ verbose = TRUE)
#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: /mnt/ceph/stah3621/imputation
Input file: from global environment
Strata: no
Population levels: no
Population labels: no
Output format(s): tidy, genlight
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: FALSE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no
Imputations options:
imputation.method: rf
hierarchical.levels: global
parallel.core: 47
#######################################################################
Importing data
Number of markers missing in all individuals and removed: 1
Tidy genomic data:
Number of markers: 500
Number of chromosome/contig/scaffold: 1
Number of individuals: 94
Preparing data for output
Data is bi-allelic
#######################################################################
####################### grur::grur_imputations ########################
#######################################################################
Imputation method: rf
Hierarchical levels: global
On-the-fly-imputations options:
number of trees to grow: 50
minimum terminal node size: 1
non-negative integer value used to specify random splitting: 10
number of iterations: 10
Number of CPUs: 47
Note: If you have speed issues: follow radiator's vignette on parallel computing
Number of populations: 1
Number of individuals: 94
Number of markers: 500
Proportion of missing genotypes before imputations: 0.298319
On-the-fly-imputations using Random Forests algorithm
Imputations computed globally, take a break...
Adjusting REF/ALT alleles to account for imputations...
generating REF/ALT dictionary
integrating new genotype codings...
Proportion of missing genotypes after imputations: 0
Computation time: 8 sec
################## grur::grur_imputations completed ###################
Generating adegenet genlight object without imputation
Generating adegenet genlight object WITH imputations
Writing tidy data set:
[email protected]
Writing tidy data set:
[email protected]
############################### RESULTS ###############################
Data format of input: genlight
Biallelic data
Number of common markers: 500
Number of chromosome/contig/scaffold: 1
Number of individuals 94
Computation time: 11 sec
################ radiator::genomic_converter completed ################
Warning messages:
1: In cleanup(mc.cleanup) : unable to terminate child: No such process
2: In radiator::radiator_imputations_module(data = input, imputation.method = imputation.method, :
Missing data is still present in the dataset
2 options:
run the function again with hierarchical.levels = 'global'
use common.markers = TRUE when using hierarchical.levels = 'strata'
> which(is.na(as.matrix(gc$genlight.imputed)))
integer(0)
> which(is.na(as.matrix(gc$genlight.no.imputation)))[1:10]
[1] 1 2 4 5 8 9 11 12 16 17
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS: /opt/modules/devel/R/3.5.0/lib64/R/lib/libRblas.so
LAPACK: /opt/modules/devel/R/3.5.0/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] bindrcpp_0.2.2 randomForestSRC_2.8.0 psych_1.8.10
[4] vegan_2.5-3 lattice_0.20-38 permute_0.9-4
[7] tidyr_0.8.2 adegenet_2.1.1 ade4_1.7-13
[10] radiator_0.0.18
loaded via a namespace (and not attached):
[1] nlme_3.1-137 fs_1.2.6 usethis_1.4.0
[4] devtools_2.0.1 gmodels_2.18.1 rprojroot_1.3-2
[7] tools_3.5.0 backports_1.1.3 R6_2.3.0
[10] spData_0.3.0 lazyeval_0.2.1 mgcv_1.8-26
[13] colorspace_1.4-0 withr_2.1.2 sp_1.3-1
[16] tidyselect_0.2.5 prettyunits_1.0.2 mnormt_1.5-5
[19] processx_3.2.1 curl_3.3 compiler_3.5.0
[22] cli_1.0.1 expm_0.999-3 desc_1.2.0
[25] scales_1.0.0 readr_1.3.1 callr_3.1.1
[28] stringr_1.3.1 digest_0.6.18 foreign_0.8-71
[31] pkgconfig_2.0.2 htmltools_0.3.6 fst_0.8.10
[34] sessioninfo_1.1.1 rlang_0.3.1 shiny_1.2.0
[37] bindr_0.1.1 gtools_3.8.1 spdep_0.8-1
[40] dplyr_0.7.8 magrittr_1.5 Matrix_1.2-15
[43] Rcpp_1.0.0 munsell_0.5.0 ape_5.2
[46] stringi_1.2.4 MASS_7.3-51.1 pkgbuild_1.0.2
[49] plyr_1.8.4 grid_3.5.0 parallel_3.5.0
[52] gdata_2.18.0 listenv_0.7.0 promises_1.0.1
[55] crayon_1.3.4 deldir_0.1-15 splines_3.5.0
[58] hms_0.4.2 ps_1.3.0 pillar_1.3.1
[61] igraph_1.2.2 boot_1.3-20 seqinr_3.4-5
[64] reshape2_1.4.3 codetools_0.2-16 pkgload_1.0.2
[67] LearnBayes_2.15.1 glue_1.3.0 data.table_1.12.0
[70] remotes_2.0.2 httpuv_1.4.5.1 testthat_2.0.1
[73] gtable_0.2.0 purrr_0.2.5 future_1.10.0
[76] amap_0.8-16 assertthat_0.2.0 ggplot2_3.1.0
[79] mime_0.6 xtable_1.8-3 coda_0.19-2
[82] later_0.7.5 tibble_2.0.1 pbmcapply_1.3.1
[85] memoise_1.1.0 cluster_2.0.7-1 globals_0.12.4
Hi Thierry,
Is there a way to not trim for markers in common when using the write_pcadapt function in radiator? I don't see a field for this in the documentation, but I'd like to try PCAdapt using all of the markers in my tidy file.
With thanks,
Ella
That's all. They return the "not found" GitHub page.
Dear Thierry,
I want to use the run_bayescan function. I apparently did everything as instructed, but I get an error... Would you be able to see what went wrong?
Many thanks,
Rita
data.genind <- adegenet::import2genind("whelk_fitered.gen", ncode = 2)
data.genind.rad <- tidy_genomic_data(data.genind,
keep.allele.names = TRUE,
tidy = TRUE,
gds = TRUE,
write = FALSE, verbose = TRUE)
write_bayescan(data.genind.rad,
pop.select = NULL,
filename = "file.bayescan",
parallel.core = parallel::detectCores() - 1
)
run_bayescan("file.bayescan",
n = 5000,
thin = 10,
nbp = 20,
pilot = 5000,
burn = 50000,
pr_odds=1000,
subsample = NULL,
iteration.subsample = 1,
parallel.core = parallel::detectCores() - 1,
bayescan.path = "/usr/local/bin/bayescan")
#######################################################################
###################### radiator::run_bayescan #########################
#######################################################################
Folder created:
radiator_bayescan_20190817@1608
For progress, look in the log file: [email protected]
Copying input BayeScan file in folder
sh: line 1: 75707 Abort trap: 6 '/usr/local/bin/bayescan' /Users/rita/Desktop/declan/radiator_bayescan_20190817@1608/file.bayescan -od /Users/rita/Desktop/declan/radiator_bayescan_20190817@1608 -all_trace -threads 3 -n 5000 -thin 10 -nbp 20 -pilot 5000 -burn 50000 -pr_odds 1000 > '/Users/rita/Desktop/declan/radiator_bayescan_20190817@1608/[email protected]' 2>&1
Importing BayeScan results
Error in if (length(x) > 1 || grepl("\n", x)) { :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: Unnamed col_types
should have the same length as col_names
. Using smaller of the two.
2: Unnamed col_types
should have the same length as col_names
. Using smaller of the two.
When I try to use the function, I get this error:
Error in tidy_genomic_data(data = data, strata = strata, filename = filename, :
object 'gt.vcf.nuc' not found
Cheers
Hi Thierry,
Is it possible to tag older releases for reproducibility, as suggested in r-lib/devtools#1469? From the discussion in Issue #19, it seems we would otherwise have to source files explicitly from the commit history rather than from releases.
Thanks!
Amanda
Hi,
I'm running radiator (updated and tested just a few minutes ago; I was also getting this error all last week on radiator 1.0.0) with R 3.5.1 on Windows.
I have been unable to convert a VCF output by ipyrad 0.7.28, both as-is and after filtering in vcftools, whether using filter_rad(), tidy_vcf(), or tidy_genomic_data(). The conversion to a GDS file still seems to complete, but I get no tidy .rad data frame and no object saved in R.
(A note for ipyrad users first: for radiator to read your VCF at all, you need to change Number=1 to Number=4 in ##FORMAT=<ID=CATG,Number=1,Type=String,Description="Base Counts (CATG)">.)
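For anyone who prefers to script that header edit, here is a minimal base-R sketch. The helper name fix_catg_header() is hypothetical (it is not part of radiator or ipyrad); it assumes an uncompressed VCF small enough to read into memory and only rewrites the ##FORMAT=<ID=CATG line.

```r
# Hypothetical helper (not part of radiator or ipyrad): rewrites the CATG
# FORMAT header so that Number=1 becomes Number=4. Assumes an uncompressed
# VCF that fits in memory; every other line is copied unchanged.
fix_catg_header <- function(in_vcf, out_vcf) {
  lines <- readLines(in_vcf)
  # Only the meta-information line describing the CATG field is edited
  is_catg <- startsWith(lines, "##FORMAT=<ID=CATG,")
  lines[is_catg] <- sub("Number=1,", "Number=4,", lines[is_catg], fixed = TRUE)
  writeLines(lines, out_vcf)
  invisible(out_vcf)
}

# Usage (file names are placeholders):
# fix_catg_header("ipyrad_output.vcf", "ipyrad_output_fixed.vcf")
```

For very large VCFs, a streaming tool that edits only the header would use less memory than reading the whole file.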
Header on ipyrad VCF with first line for first sample is:
##fileformat=VCFv4.0
##fileDate=2019/01/13
##source=ipyrad_v.0.7.28
##reference=genomeassembly.fasta
##phasing=unphased
##INFO=<ID=NS,Number=1,Type=Integer,Description="Number of Samples With Data">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=CATG,Number=4,Type=String,Description="Base Counts (CATG)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1
locus_100006 5 . G A 13 PASS NS=120;DP=2002 GT:DP:CATG 0/0:250:0,0,0,250
All of the functions I attempted stall when extracting DP information:
Reading VCF...you have time for a espresso...
Data summary:
number of samples: 267
number of markers: 44310
done! timing: 33 sec
Filter monomorphic markers
Number of individuals / strata / chrom / locus / SNP:
Blacklisted: 0 / 0 / 25 / 31 / 31
Only 1 strata...returning data
[==================================================] 100%, completed in 1s
Extracting DP information...
Error: object 'ALLELE_REF_DEPTH' not found
Computation time, overall: 47 sec
I'm assuming ALLELE_REF_DEPTH means radiator wants two depth values, one for the REF allele and one for the ALT allele, but my VCF only has the total depth (although it does show the CATG read counts), and the VCF 4.1 format makes no mention of REF- vs ALT-specific depth. Any idea what's going on? Could this be an issue with the seemingly ipyrad-specific CATG FORMAT ID?
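If allele-specific depths are what is missing, they can in principle be derived from the CATG counts. The sketch below is an assumption-laden illustration, not radiator's behaviour: the helper catg_to_allele_depth() is hypothetical, it assumes the four counts are ordered C, A, T, G (as the header description "Base Counts (CATG)" suggests), and it only handles biallelic sites with a single-base REF and ALT.

```r
# Hypothetical helper: derive REF/ALT read depths from one ipyrad CATG string.
# Assumes counts are ordered C, A, T, G and that REF and ALT are single bases.
catg_to_allele_depth <- function(catg, ref, alt) {
  counts <- as.integer(strsplit(catg, ",", fixed = TRUE)[[1]])
  stopifnot(length(counts) == 4L)
  names(counts) <- c("C", "A", "T", "G")
  c(REF = counts[[ref]], ALT = counts[[alt]])
}

# First genotype in the example VCF above: REF = G, ALT = A, CATG = 0,0,0,250
catg_to_allele_depth("0,0,0,250", ref = "G", alt = "A")
# returns REF = 250, ALT = 0
```

The result agrees with the 0/0 genotype and DP of 250 shown for that sample, which is a quick sanity check on the assumed base order.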
Also, unrelated: if I try to run functions with parallel.core > 1, I get this error:
Error in .DynamicClusterCall(cl, length(cl), .fun = function(.proc_idx, :
One of the nodes produced an error: Can not open file 'G:\My Drive\Illumina Sequencing Data\20181212_rangewide\sphaOCclust85\tidy_vcf_20190318@1356\[email protected]'. The process cannot access the file because it is being used by another process.
One other miscellaneous warning I get when I install or update radiator, in case it helps: "Warning: unable to re-encode 'filter_monomorphic.R' line 7"
Thanks
Hi Thierry,
Thanks for all your work with radiator! I'm getting an error when trying to convert a VCF file (v4.2, biallelic, produced by FreeBayes) to genlight. This is the command used:
genomic_converter(data="~/Downloads/TotalRawSNPsHISEQ.biallelic.vcf.recode.vcf", output='genlight', vcf.metadata = TRUE, strata="~/Downloads/strata.tsv")
And the output/error message:
#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: /Users/Emily
Input file: ~/Downloads/TotalRawSNPsHISEQ.biallelic.vcf.recode.vcf
Strata: ~/Downloads/strata.tsv
Population levels: no
Population labels: no
Output format(s): tidy, genlight
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: TRUE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no
Imputations options:
imputation.method: no
parallel.core: 3
#######################################################################
Importing data
Reading VCF...
Large vcf file may take several minutes...
conversion timing: 128 sec
radiator is working on the file ...
VCF is biallelic
Updating markers metadata and stats
Error in cbind_all(x) : Argument 3 must be length 378, not 3
In addition: Warning message:
In mclapply(seq_len(njobs), mc.preschedule = FALSE, mc.cores = njobs, :
3 function calls resulted in an error
And here is my sessionInfo():
R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin16.7.0 (64-bit)
Running under: macOS Sierra 10.12.6
Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libLAPACK.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] bindrcpp_0.2.2 radiator_0.0.16 adegenet_2.1.1 ade4_1.7-13
loaded via a namespace (and not attached):
[1] nlme_3.1-137 bitops_1.0-6
[3] gmodels_2.18.1 GenomeInfoDb_1.16.0
[5] tools_3.5.1 R6_2.2.2
[7] vegan_2.5-2 spData_0.2.9.0
[9] lazyeval_0.2.1 BiocGenerics_0.26.0
[11] mgcv_1.8-24 colorspace_1.3-2
[13] permute_0.9-4 sp_1.3-1
[15] tidyselect_0.2.4 compiler_3.5.1
[17] expm_0.999-2 scales_0.5.0
[19] readr_1.1.1 stringr_1.3.1
[21] digest_0.6.15 XVector_0.20.0
[23] pkgconfig_2.0.1 htmltools_0.3.6
[25] fst_0.8.8 rlang_0.2.2
[27] shiny_1.1.0 bindr_0.1.1
[29] gtools_3.8.1 spdep_0.7-7
[31] dplyr_0.7.6 RCurl_1.95-4.11
[33] magrittr_1.5 GenomeInfoDbData_1.1.0
[35] Matrix_1.2-14 Rcpp_0.12.18
[37] munsell_0.4.3 S4Vectors_0.18.3
[39] ape_5.1 stringi_1.2.4
[41] yaml_2.1.18 MASS_7.3-50
[43] zlibbioc_1.26.0 plyr_1.8.4
[45] grid_3.5.1 parallel_3.5.1
[47] gdata_2.18.0 listenv_0.7.0
[49] promises_1.0.1 deldir_0.1-15
[51] lattice_0.20-35 Biostrings_2.48.0
[53] splines_3.5.1 hms_0.4.2
[55] pillar_1.2.1 igraph_1.2.1
[57] GenomicRanges_1.32.6 boot_1.3-20
[59] seqinr_3.4-5 reshape2_1.4.3
[61] codetools_0.2-15 gdsfmt_1.16.0
[63] stats4_3.5.1 LearnBayes_2.15.1
[65] glue_1.3.0 data.table_1.11.4
[67] httpuv_1.4.5 gtable_0.2.0
[69] purrr_0.2.5 tidyr_0.8.1
[71] SeqArray_1.21.4 future_1.9.0
[73] amap_0.8-16 assertthat_0.2.0
[75] ggplot2_3.0.0 mime_0.5
[77] xtable_1.8-2 coda_0.19-1
[79] later_0.7.3 tibble_1.4.2
[81] pbmcapply_1.2.5 IRanges_2.14.11
[83] cluster_2.0.7-1 globals_0.12.1
Any input/help would be much appreciated!
Emily
I'm trying to follow your system configuration directions for HPC on my macOS High Sierra installation (version 10.13.6) in order to use it with radiator. I am following your vignette rad_genomics_computer_setup.nb.html.
There are some issues.
I get errors at this first step:
bgppermp$ sudo tar -zxvf gcc-8.1-bin.tar.gz -C/
x usr/local/: Can't set user=0/group=0 for usr/localFailed to set file flags
x usr/local/bin/
x usr/local/.com.apple.installer.keep
x usr/local/libexec/
x usr/local/include/
...
tar: Error exit delayed from previous errors.
The issue was access to /usr/local. The solution was to put /usr/local/bin first in the PATH.
I followed your directions for clang by copy-pasting them; here is what happens:
bgppermp$ sudo tar -xzvf clang+llvm-6.0.0-x86_64-apple-darwin.tar.xz -C/usr/local –strip-components 1
Password:
tar: 1: Not found in archive
tar: –strip-components: Not found in archive
tar: Error exit delayed from previous errors.
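The problem is that --strip-components is typed with an en dash instead of two hyphens, so tar treats it (and the 1 that follows) as names of files to extract from the archive. The entire command should be: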
sudo tar -xzvf clang+llvm-6.0.0-x86_64-apple-darwin.tar.xz -C/usr/local --strip-components=1
You might want to correct the HPC instructions in the relevant places.
Hi Thierry,
Thanks for your powerful function!
When I try to use a VCF file output by genomic_converter, some functions report "Error: Duplicate identifiers for rows."
I used this code to impute my data:
genomic_converter("data.vcf",output ="vcf",
filename = "test2",
monomorphic.out = T,
common.markers = FALSE,
pop.levels = levels(pop$STRATA),
imputation.method = "max",
strata = "map.tsv")
I then ran genomic_converter again on the output and got an error:
genomic_converter("test2.vcf",output ="vcf",
filename = "test3",
monomorphic.out = T,
common.markers = FALSE,
pop.levels = levels(pop$STRATA),
maf.thresholds = c("locus", 1, "OR", 1, 1),
strata = "map.tsv")
and I get
Error: Duplicate identifiers for rows (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214
Any input/help would be much appreciated!
Clark
Hello,
Great package, and very useful! I just came across an error when trying to run a hierfstat function on a genind object created with vcf2genind:
stop("alleles must be encoded as integers or nucleotides. Exiting")
When I look at the allele names for my genind, I see that they are all A1, A2:
head([email protected])
$locus_100026__41__41
1. 'A1'
2. 'A2'
Is there a way to get allele names that reference the original genotypes? I'm pretty sure radiator is reading my VCF correctly, because it outputs genotype information in other file conversions.
Hi Thierry,
I'm trying to import/convert a PLINK .tped file using genomic_converter(), but I keep getting the same error (tried it with a few different sources).
GBS_data <- genomic_converter("../1A-CB478ANXX_NS_filtered_plink.tped", imputation.method = "rf")
Error in data.table::fread(input = stringi::stri_replace_all_fixed(str = data, :
Column number 2 (colClasses[[1]][2]) is out of range [1,ncol=1]
There is a relevant .tfam file, and I'm able to use these .tped files with other packages (such as GenABEL).
Thank you very much for developing this (and stackr) package!!
detect_genomic_format returns the first line of the data as data.type when data is a PLINK file. This is because file.ending <- stringi::stri_sub(str = data, from = -4, to = -1) returns the last four characters, "tped", while the function expects ".tped".
This is easily fixed by making the function expect "tped".
The bug goes on to break both tidy_genomic_data and genomic_converter when working with PLINK files.
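The mismatch can be reproduced with base R alone; substr() over the last four characters is used here as a stand-in for the stri_sub(from = -4, to = -1) call quoted above. The helper is_plink_tped() is hypothetical and only illustrates one way to compare the extension without the dot; it is not radiator's code.

```r
# Reproduce the mismatch: the last four characters are "tped", not ".tped"
path <- "../1A-CB478ANXX_NS_filtered_plink.tped"
file_ending <- substr(path, nchar(path) - 3, nchar(path))
file_ending == ".tped"   # FALSE, so the PLINK branch is never taken

# Hypothetical alternative check: compare the extension without the dot,
# case-insensitively, using tools::file_ext()
is_plink_tped <- function(path) {
  tolower(tools::file_ext(path)) == "tped"
}
is_plink_tped(path)      # TRUE
```

Taking the last five characters and comparing against ".tped" would also work; file_ext() simply avoids counting characters by hand.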
Hi Thierry,
I've done some analysis in Adegenet. My Genind object contains 93 individuals, 9 populations and 6,386 loci:
/// GENIND OBJECT /////////
// 93 individuals; 6,386 loci; 12,772 alleles; size: 7.4 Mb
// Basic content
@tab: 93 x 12772 matrix of allele counts
@loc.n.all: number of alleles per locus (range: 2-2)
@loc.fac: locus factor for the 12772 columns of @tab
@all.names: list of allele names for each locus
@ploidy: ploidy of each individual (range: 2-2)
@type: codom
@call: adegenet::df2genind(X = t(x), sep = sep)
// Optional content
@pop: population of each individual (group size range: 8-18)
I want to use the genomic_converter function to convert the genind object and export it as genepop and BayeScan files. My understanding is that the function accepts genind objects from the global environment; however, I keep getting the error below:
convert = genomic_converter(seafan, output=c("genepop","bayescan"))
#######################################################################
##################### radiator::genomic_converter #####################
#######################################################################
Function arguments and values:
Working directory: C:/Users/tj248/OneDrive - University of Exeter/Exeter University/PhD Project Documents/Pink Sea Fan/NextRAD Project 2017/nextRAD SNP Data Analysis
Input file: from global environment
Strata: no
Population levels: no
Population labels: no
Output format(s): tidy, genepop, bayescan
Filename prefix: no
Filters:
Blacklist of individuals: no
Blacklist of genotypes: no
Whitelist of markers: no
monomorphic.out: TRUE
snp.ld: no
common.markers: TRUE
max.marker: no
pop.select: no
maf.thresholds: no
Imputations options:
imputation.method: no
parallel.core: 3
#######################################################################
Importing data
Error in mutate_impl(.data, dots) : Evaluation error: object 'MARKERS' not found.
Hi Thierry,
I'm trying to use genomic_converter in radiator to convert a haplotypes file generated by Stacks v1.3 to VCF, and I get the following error:
Error in dplyr::if_else(POLYMORPHISM == 0, stringi::stri_join(GT_VCF_NUC, : object 'POLYMORPHISM' not found.
The code I used:
genomic_converter(data = "batch_1.haplotypes.tsv", strata = "map.txt", output = c("vcf"), verbose = TRUE)
Traceback:
dplyr::if_else(POLYMORPHISM == 0, stringi::stri_join(GT_VCF_NUC, "/", GT_VCF_NUC), GT_VCF_NUC, missing = "./.")
12.
mutate_impl(.data, dots, caller_env())
11.
mutate.tbl_df(., GT_VCF_NUC = dplyr::if_else(POLYMORPHISM == 0, stringi::stri_join(GT_VCF_NUC, "/", GT_VCF_NUC), GT_VCF_NUC, missing = "./."), GT_VCF_NUC = dplyr::if_else(stringi::stri_detect_fixed(GT_VCF_NUC, "N"), "./.", GT_VCF_NUC))
10.
dplyr::mutate(., GT_VCF_NUC = dplyr::if_else(POLYMORPHISM == 0, stringi::stri_join(GT_VCF_NUC, "/", GT_VCF_NUC), GT_VCF_NUC, missing = "./."), GT_VCF_NUC = dplyr::if_else(stringi::stri_detect_fixed(GT_VCF_NUC, "N"), "./.", GT_VCF_NUC))
9.
function_list[i]
8.
freduce(value, `_function_list`)
7.
`_fseq`(`_lhs`)
6.
eval(quote(`_fseq`(`_lhs`)), env, env)
5.
eval(quote(`_fseq`(`_lhs`)), env, env)
4.
withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
3.
input %>% dplyr::mutate(GT_VCF_NUC = dplyr::if_else(POLYMORPHISM == 0, stringi::stri_join(GT_VCF_NUC, "/", GT_VCF_NUC), GT_VCF_NUC, missing = "./."), GT_VCF_NUC = dplyr::if_else(stringi::stri_detect_fixed(GT_VCF_NUC, "N"), "./.", GT_VCF_NUC)) %>% dplyr::select(-POLYMORPHISM)
2.
tidy_genomic_data(data = data, strata = strata, filename = filename, parallel.core = parallel.core, whitelist.markers = whitelist.markers, blacklist.id = blacklist.id, vcf.metadata = vcf.metadata, vcf.stats = vcf.stats, keep.allele.names = keep.allele.names
I would like to know whether the error is due to the input file. (I'm using an older version of Stacks).
I checked the file with detect_genomic_format and radiator recognizes the file as haplo.file.
Thanks in advance for your help!
Best,
Farida
Hi,
When running genomic_converter with BayeScan output, and when running run_bayescan, I get the warning:
Deprecated function, update your code to use: filter_monomorphic
I assume this is the line in write_bayescan that needs to be updated:
data <- radiator::discard_monomorphic_markers(data = data, verbose = TRUE)$input
Hi Thierry,
I'm trying to import a VCF using tidy_genomic_data without filtering common markers, but it filters them anyway:
tidy.sphanorth124spatial.radiator.50pctmsng2 <- tidy_genomic_data(data="G:/My Drive/Illumina Sequencing Data/20181212_rangewide/gitprojects/sphanorth124spatial/radiator_final/sphanorth124spatial.radiator.50pctmsng.oneSNPmac3.vcf",
strata="G:/My Drive/Illumina Sequencing Data/20181212_rangewide/gitprojects/sphanorth124spatial/sphanorth124spatial_popcoords_radiator_strata.txt",
parallel.core=1,
filter.common.markers=FALSE)
Execution date/time: 20190503@1233
Folder created: 377_radiator_tidy_genomic_20190503@1233
Function call and arguments stored in: [email protected]
Analyzing strata file
Number of strata: 27
Number of individuals: 124
Importing and tidying the VCF...
Execution date@time: 20190503@1233
Reading VCF
Data summary:
number of samples: 124
number of markers: 13524
done! timing: 2 sec
Filter monomorphic markers
Number of individuals / strata / chrom / locus / SNP:
Blacklisted: 0 / 0 / 0 / 0 / 0
Strata with low sample size detected: fig <- FALSE
Filter common markers:
Number of individuals / strata / chrom / locus / SNP:
Blacklisted: 0 / 0 / 7453 / 7453 / 7453
Generating individual stats...
[==================================================] 100%, completed in 0s
Generating markers stats...
[==================================================] 100%, completed in 0s
[==================================================] 100%, completed in 0s
Number of chromosome/contig/scaffold: 6071
Number of locus: 6071
Number of markers: 6071
Number of populations: 27
Number of individuals: 124
Session info:
- Session info ------------------------------------------------------------------------------------------------------------------
setting value
version R version 3.5.1 (2018-07-02)
os Windows >= 8 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz America/Los_Angeles
date 2019-05-03
- Packages ----------------------------------------------------------------------------------------------------------------------
package * version date lib source
ade4 * 1.7-13 2018-08-31 [1] CRAN (R 3.5.1)
adegenet * 2.1.1 2018-02-02 [1] CRAN (R 3.5.1)
amap 0.8-16 2018-05-14 [1] CRAN (R 3.5.0)
ape * 5.3 2019-03-17 [1] CRAN (R 3.5.3)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.5.3)
assigner * 0.5.5 2019-04-30 [1] Github (thierrygosselin/assigner@ed6475f)
backports 1.1.3 2018-12-14 [1] CRAN (R 3.5.2)
Biobase 2.40.0 2018-05-01 [1] Bioconductor
BiocGenerics 0.26.0 2018-05-01 [1] Bioconductor
Biostrings 2.48.0 2018-05-01 [1] Bioconductor
bitops 1.0-6 2013-08-17 [1] CRAN (R 3.5.0)
boot 1.3-20 2017-08-06 [2] CRAN (R 3.5.1)
broom 0.5.1 2018-12-05 [1] CRAN (R 3.5.2)
Cairo * 1.5-10 2019-03-28 [1] CRAN (R 3.5.1)
calibrate 1.7.2 2013-09-10 [1] CRAN (R 3.5.1)
callr 3.2.0 2019-03-15 [1] CRAN (R 3.5.3)
caTools 1.17.1.2 2019-03-06 [1] CRAN (R 3.5.2)
class 7.3-15 2019-01-01 [1] CRAN (R 3.5.3)
classInt 0.3-1 2018-12-18 [1] CRAN (R 3.5.2)
cli 1.1.0 2019-03-19 [1] CRAN (R 3.5.3)
cluster * 2.0.7-1 2018-04-13 [2] CRAN (R 3.5.1)
coda 0.19-2 2018-10-08 [1] CRAN (R 3.5.1)
codetools 0.2-16 2018-12-24 [1] CRAN (R 3.5.2)
colorspace 1.4-1 2019-03-18 [1] CRAN (R 3.5.1)
combinat 0.0-8 2012-10-29 [1] CRAN (R 3.5.0)
crayon 1.3.4 2017-09-16 [1] CRAN (R 3.5.1)
crosstalk 1.0.0 2016-12-21 [1] CRAN (R 3.5.1)
curl 3.3 2019-01-10 [1] CRAN (R 3.5.2)
data.table 1.12.0 2019-01-13 [1] CRAN (R 3.5.2)
DataCombine * 0.2.21 2016-04-13 [1] CRAN (R 3.5.1)
DBI 1.0.0 2018-05-02 [1] CRAN (R 3.5.1)
deldir 0.1-16 2019-01-04 [1] CRAN (R 3.5.2)
desc 1.2.0 2018-05-01 [1] CRAN (R 3.5.1)
devtools * 2.0.1 2018-10-26 [1] CRAN (R 3.5.1)
dichromat 2.0-0 2013-01-24 [1] CRAN (R 3.5.2)
digest 0.6.18 2018-10-10 [1] CRAN (R 3.5.1)
dismo 1.1-4 2017-01-09 [1] CRAN (R 3.5.1)
doParallel 1.0.14 2018-09-24 [1] CRAN (R 3.5.1)
dplyr * 0.8.0.1 2019-02-15 [1] CRAN (R 3.5.2)
e1071 1.7-1 2019-03-19 [1] CRAN (R 3.5.3)
evaluate 0.13 2019-02-12 [1] CRAN (R 3.5.2)
expm 0.999-4 2019-03-21 [1] CRAN (R 3.5.3)
fansi 0.4.0 2018-10-05 [1] CRAN (R 3.5.1)
fastmatch 1.1-0 2017-01-28 [1] CRAN (R 3.5.0)
foreach 1.4.4 2017-12-12 [1] CRAN (R 3.5.1)
fs 1.2.7 2019-03-19 [1] CRAN (R 3.5.3)
fst 0.8.10 2018-12-14 [1] CRAN (R 3.5.2)
future 1.12.0 2019-03-08 [1] CRAN (R 3.5.1)
gap 1.1-22 2018-06-08 [1] CRAN (R 3.5.1)
gdata 2.18.0 2017-06-06 [1] CRAN (R 3.5.1)
gdistance 1.2-2 2018-05-07 [1] CRAN (R 3.5.1)
gdsfmt * 1.16.0 2018-05-01 [1] Bioconductor
generics 0.0.2 2018-11-29 [1] CRAN (R 3.5.2)
genetics 1.3.8.1.1 2019-02-01 [1] CRAN (R 3.5.2)
GenomeInfoDb 1.16.0 2018-05-01 [1] Bioconductor
GenomeInfoDbData 1.1.0 2018-09-05 [1] Bioconductor
GenomicRanges 1.32.6 2018-07-20 [1] Bioconductor
GGally 1.4.0 2018-05-17 [1] CRAN (R 3.5.1)
ggplot2 3.1.0 2018-10-25 [1] CRAN (R 3.5.1)
globals 0.12.4 2018-10-11 [1] CRAN (R 3.5.1)
glue 1.3.1 2019-03-12 [1] CRAN (R 3.5.3)
gmodels 2.18.1 2018-06-25 [1] CRAN (R 3.5.1)
gplots * 3.0.1.1 2019-01-27 [1] CRAN (R 3.5.2)
gridExtra 2.3 2017-09-09 [1] CRAN (R 3.5.1)
gtable 0.3.0 2019-03-25 [1] CRAN (R 3.5.3)
gtools 3.8.1 2018-06-26 [1] CRAN (R 3.5.0)
GWASExactHW 1.01 2013-01-05 [1] CRAN (R 3.5.0)
hierfstat * 0.04-22 2015-12-04 [1] CRAN (R 3.5.1)
highr 0.8 2019-03-20 [1] CRAN (R 3.5.3)
hms 0.4.2 2018-03-10 [1] CRAN (R 3.5.1)
htmltools 0.3.6 2017-04-28 [1] CRAN (R 3.5.1)
htmlwidgets 1.3 2018-09-30 [1] CRAN (R 3.5.1)
httpuv 1.5.0 2019-03-15 [1] CRAN (R 3.5.3)
igraph 1.2.4 2019-02-13 [1] CRAN (R 3.5.2)
IRanges 2.14.11 2018-08-24 [1] Bioconductor
iterators 1.0.10 2018-07-13 [1] CRAN (R 3.5.1)
jomo 2.6-7 2019-02-06 [1] CRAN (R 3.5.2)
jsonlite 1.6 2018-12-07 [1] CRAN (R 3.5.2)
KernSmooth 2.23-15 2015-06-29 [2] CRAN (R 3.5.1)
knitr * 1.22 2019-03-08 [1] CRAN (R 3.5.1)
labeling 0.3 2014-08-23 [1] CRAN (R 3.5.0)
later 0.8.0 2019-02-11 [1] CRAN (R 3.5.2)
lattice 0.20-38 2018-11-04 [1] CRAN (R 3.5.3)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 3.5.3)
LEA * 1.99.2 2018-09-29 [1] Github (bcm-uga/LEA@ffea10d)
LearnBayes 2.15.1 2018-03-18 [1] CRAN (R 3.5.0)
listenv 0.7.0 2018-01-21 [1] CRAN (R 3.5.1)
lme4 1.1-21 2019-03-05 [1] CRAN (R 3.5.2)
logistf 1.23 2018-07-19 [1] CRAN (R 3.5.1)
magrittr 1.5 2014-11-22 [1] CRAN (R 3.5.1)
manipulateWidget 0.10.0 2018-06-11 [1] CRAN (R 3.5.1)
mapproj 1.2.6 2018-03-29 [1] CRAN (R 3.5.1)
maps * 3.3.0 2018-04-03 [1] CRAN (R 3.5.1)
MASS 7.3-51.1 2018-11-01 [1] CRAN (R 3.5.3)
Matrix * 1.2-17 2019-03-22 [1] CRAN (R 3.5.3)
memoise 1.1.0 2017-04-21 [1] CRAN (R 3.5.1)
memuse 4.0-0 2017-11-10 [1] CRAN (R 3.5.0)
mgcv 1.8-28 2019-03-21 [1] CRAN (R 3.5.3)
mice 3.4.0 2019-03-07 [1] CRAN (R 3.5.2)
mime 0.6 2018-10-05 [1] CRAN (R 3.5.1)
miniUI 0.1.1.1 2018-05-18 [1] CRAN (R 3.5.1)
minqa 1.2.4 2014-10-09 [1] CRAN (R 3.5.1)
mitml 0.3-7 2019-01-07 [1] CRAN (R 3.5.2)
mmod * 1.3.3 2017-04-06 [1] CRAN (R 3.5.1)
munsell 0.5.0 2018-06-12 [1] CRAN (R 3.5.1)
mvtnorm 1.0-10 2019-03-05 [1] CRAN (R 3.5.2)
nlme 3.1-137 2018-04-07 [2] CRAN (R 3.5.1)
nloptr 1.2.1 2018-10-03 [1] CRAN (R 3.5.1)
nnet 7.3-12 2016-02-02 [2] CRAN (R 3.5.1)
pals * 1.5 2018-01-22 [1] CRAN (R 3.5.2)
pan 1.6 2018-06-29 [1] CRAN (R 3.5.1)
pbmcapply 1.3.1 2019-01-14 [1] CRAN (R 3.5.2)
pegas * 0.11 2018-07-09 [1] CRAN (R 3.5.1)
permute 0.9-5 2019-03-12 [1] CRAN (R 3.5.3)
phangorn 2.5.3 2019-03-23 [1] CRAN (R 3.5.3)
pillar 1.3.1 2018-12-15 [1] CRAN (R 3.5.2)
pinfsc50 1.1.0 2016-12-02 [1] CRAN (R 3.5.0)
pkgbuild 1.0.3 2019-03-20 [1] CRAN (R 3.5.3)
pkgconfig 2.0.2 2018-08-16 [1] CRAN (R 3.5.1)
pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.5.1)
plyr 1.8.4 2016-06-08 [1] CRAN (R 3.5.1)
png 0.1-7 2013-12-03 [1] CRAN (R 3.5.0)
polysat 1.7-4 2019-03-06 [1] CRAN (R 3.5.2)
PopGenReport * 3.0.4 2019-02-04 [1] CRAN (R 3.5.2)
poppr * 2.8.2 2019-03-11 [1] CRAN (R 3.5.3)
prettyunits 1.0.2 2015-07-13 [1] CRAN (R 3.5.1)
processx 3.3.0 2019-03-10 [1] CRAN (R 3.5.3)
promises 1.0.1 2018-04-13 [1] CRAN (R 3.5.1)
ps 1.3.0 2018-12-21 [1] CRAN (R 3.5.2)
purrr 0.3.2 2019-03-15 [1] CRAN (R 3.5.3)
quadprog 1.5-5 2013-04-17 [1] CRAN (R 3.5.0)
R.methodsS3 1.7.1 2016-02-16 [1] CRAN (R 3.5.0)
R.oo 1.22.0 2018-04-22 [1] CRAN (R 3.5.0)
R.utils 2.8.0 2019-02-14 [1] CRAN (R 3.5.2)
R6 2.4.0 2019-02-14 [1] CRAN (R 3.5.2)
radiator * 1.1.0 2019-05-03 [1] Github (thierrygosselin/radiator@fdef494)
raster 2.8-19 2019-01-30 [1] CRAN (R 3.5.2)
RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 3.5.0)
Rcpp 1.0.1 2019-03-17 [1] CRAN (R 3.5.3)
RcppEigen 0.3.3.5.0 2018-11-24 [1] CRAN (R 3.5.2)
RCurl 1.95-4.12 2019-03-04 [1] CRAN (R 3.5.2)
readr 1.3.1 2018-12-21 [1] CRAN (R 3.5.2)
remotes 2.0.2 2018-10-30 [1] CRAN (R 3.5.1)
reshape 0.8.8 2018-10-23 [1] CRAN (R 3.5.1)
reshape2 1.4.3 2017-12-11 [1] CRAN (R 3.5.1)
rgdal 1.4-3 2019-03-14 [1] CRAN (R 3.5.3)
rgl 0.100.19 2019-03-12 [1] CRAN (R 3.5.3)
RgoogleMaps 1.4.3 2018-11-07 [1] CRAN (R 3.5.1)
rlang 0.3.2 2019-03-21 [1] CRAN (R 3.5.3)
rpart 4.1-13 2018-02-23 [2] CRAN (R 3.5.1)
rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.5.1)
rstudioapi 0.10 2019-03-19 [1] CRAN (R 3.5.3)
S4Vectors 0.18.3 2018-06-08 [1] Bioconductor
scales 1.0.0 2018-08-09 [1] CRAN (R 3.5.1)
SeqArray * 1.21.4 2018-09-05 [1] Github (zhengxwen/SeqArray@1d5ab05)
seqinr 3.4-5 2017-08-01 [1] CRAN (R 3.5.1)
SeqVarTools 1.20.2 2019-02-27 [1] Bioconductor
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.5.1)
sf 0.7-3 2019-02-21 [1] CRAN (R 3.5.2)
shiny 1.2.0 2018-11-02 [1] CRAN (R 3.5.1)
sp 1.3-1 2018-06-05 [1] CRAN (R 3.5.1)
spData 0.3.0 2019-01-07 [1] CRAN (R 3.5.2)
spdep 1.0-2 2019-02-13 [1] CRAN (R 3.5.2)
StAMPP * 1.5.1 2017-11-10 [1] CRAN (R 3.5.1)
stringdist 0.9.5.1 2018-06-08 [1] CRAN (R 3.5.1)
stringi 1.4.3 2019-03-12 [1] CRAN (R 3.5.3)
stringr 1.4.0 2019-02-10 [1] CRAN (R 3.5.2)
survival 2.43-3 2018-11-26 [1] CRAN (R 3.5.3)
tess3r * 1.1.0 2018-09-04 [1] Github (bcm-uga/TESS3_encho_sen@43e5ede)
testthat 2.0.1 2018-10-13 [1] CRAN (R 3.5.1)
tibble 2.1.1 2019-03-16 [1] CRAN (R 3.5.3)
tidyr 0.8.3 2019-03-01 [1] CRAN (R 3.5.2)
tidyselect 0.2.5 2018-10-11 [1] CRAN (R 3.5.1)
units 0.6-2 2018-12-05 [1] CRAN (R 3.5.2)
UpSetR 1.3.3 2017-03-21 [1] CRAN (R 3.5.3)
usethis * 1.4.0 2018-08-14 [1] CRAN (R 3.5.1)
utf8 1.1.4 2018-05-24 [1] CRAN (R 3.5.1)
vcfR * 1.8.0 2018-04-17 [1] CRAN (R 3.5.1)
vegan 2.5-4 2019-02-04 [1] CRAN (R 3.5.2)
viridis 0.5.1 2018-03-29 [1] CRAN (R 3.5.1)
viridisLite 0.3.0 2018-02-01 [1] CRAN (R 3.5.1)
webshot 0.5.1 2018-09-28 [1] CRAN (R 3.5.1)
withr 2.1.2 2018-03-15 [1] CRAN (R 3.5.1)
xfun 0.5 2019-02-20 [1] CRAN (R 3.5.2)
XML * 3.98-1.19 2019-03-06 [1] CRAN (R 3.5.2)
xtable 1.8-3 2018-08-29 [1] CRAN (R 3.5.1)
XVector 0.20.0 2018-05-01 [1] Bioconductor
yaml 2.2.0 2018-07-25 [1] CRAN (R 3.5.1)
zlibbioc 1.26.0 2018-05-01 [1] Bioconductor
[1] C:/Users/kevin/Documents/R/win-library/3.5
[2] C:/Program Files/R/R-3.5.1/library
Hi there
I want to keep all the SNPs present in each locus for the comparison.
Is this possible?
Cheers
Aimee
Execution date@time: 20190716@1551
Function call and arguments stored in: [email protected]
2 steps to visualize and filter the data based on the number of SNP on the read/locus:
Step 1. Visualization (boxplot, distribution
Step 2. Threshold selection
Filters parameters file: initiated
Generating SNP position on read stats
Generating helper table...
Files written: helper tables and plots
Step 2. Filtering markers based on the SNPs position on the read
Choice of stats are:
1: all (filter off)
2: outliers
3: q75
4: iqr
5: choose your own min and max values
1
File written: whitelist.markers.snp.position.read.tsv
File written: blacklist.markers.snp.position.read.tsv
Filters parameters file: updated
################################### RESULTS ####################################
Filter SNP position on the read : all
Number of individuals / strata / chrom / locus / SNP:
Before: 84 / 5 / 1 / 514 / 957
Blacklisted: 0 / 0 / 0 / 0 / 0
After: 84 / 5 / 1 / 514 / 957
Computation time, overall: 18 sec
##################### completed filter_snp_position_read #######################
################################################################################
############################ radiator::filter_snp_number #######################
################################################################################
Execution date@time: 20190716@1551
Function call and arguments stored in: [email protected]
Interactive mode: on
2 steps to visualize and filter the data based on the number of SNP on the read/locus:
Step 1. Impact of SNP number per read/locus (on individual genotypes and locus/snp number potentially filtered)
Step 2. Choose the filtering thresholds
Filters parameters file: initiated
Generating statistics
With max read length taken from data: 83
The max number of SNP per locus correspond to:
1 SNP per 12 bp
Generating helper table...
Files written: helper tables and plots
Step 2. Filtering markers based on the maximum of SNPs per locus
Do you still want to blacklist markers? (y/n):
n
File written: whitelist.markers.genotyping.tsv
File written: blacklist.markers.genotyping.tsv
Filters parameters file: updated
################################### RESULTS ####################################
Filter SNPs per locus threshold: 1e+12
Number of individuals / strata / chrom / locus / SNP:
Before: 84 / 5 / 1 / 514 / 957
Blacklisted: 0 / 0 / 0 / 0 / 0
After: 84 / 5 / 1 / 514 / 957
Computation time, overall: 95 sec
######################### completed filter_snp_number ##########################
################################################################################
############################## radiator::filter_ld #############################
################################################################################
Execution date@time: 20190716@1552
Function call and arguments stored in: [email protected]
Interactive mode: on
Step 1. Short distance LD threshold selection
Step 2. Filtering markers based on short distance LD
Step 3. Long distance LD pruning selection
Step 4. Threshold selection
Step 5. Filtering markers based on long distance LD
Filters parameters file: initiated
Minimizing short distance LD...
The range in the number of SNP/locus is: 1-7
Step 1. Short distance LD threshold selection
the goal is to keep only 1 SNP per read/locus
Choose the filter.short.ld threshold
Options include:
1: mac (Not sure ? use mac...)
2: random
3: first
4: middle
5: last
1
Hey
I'm getting this error on a file that previously worked fine with filter_dart:
Next step requires the genotypes
Importing DArT data
Error in overscope_eval_next(overscope, expr) :
object 'TARGET_ID' not found
Thanks
Hi Thierry,
I've been unable to filter my RADseq data set using the interactive filter of radiator::filter_rad, because for some reason the strata file is not being recognized:
Error in radiator::tidy_genomic_data(data = data, strata = strata, vcf.metadata = TRUE, :
Non-matching INDIVIDUALS between data and strata.
If I use only the tidy data (which should already contain the strata information):
data.filtered <- radiator::filter_rad(data = prep.vcf$tidy.data, output = "genind", filename = "spis.test")
Then the interactive filter runs, but when it reaches the 4th filtering step, "04: Filtering individuals poorly genotyped", the graph only shows "overall" and "NA" boxplots, and no matter which threshold I set, not a single individual gets blacklisted.
This is a recurrent problem. I tried to reproduce the filtering with a dataset on which it had worked perfectly in the past, but it no longer works.
Hi Thierry - sorry to bug you again, but I'm having an issue with genomic_converter renaming my alleles from those of my input file.
For example, I have a simulation data set in genind format that shows counts of my alleles "100" and "110":
> miss.genind@tab[1:5,1:10]
0.100 0.110 1.100 1.110 10.100 10.110 11.100 11.110 12.100 12.110
001 2 0 1 1 1 1 0 2 2 0
002 0 2 1 1 0 2 1 1 0 2
003 2 0 0 2 2 0 2 0 2 0
004 0 2 1 1 NA NA 0 2 1 1
005 1 1 0 2 0 2 1 1 NA NA
I run genomic_converter like so:
foo <- genomic_converter(data = miss.genind, output = "genind", imputation.method = "rf", hierarchical.levels = "global", verbose = TRUE)
When I look at the output data, the alleles are renamed A1/A2, but not consistently. For example, A1 does not always correspond to allele "100" from the original dataset.
> foo$genind.imputed@tab[1:5,1:10]
0.A1 0.A2 1.A1 1.A2 10.A1 10.A2 11.A1 11.A2 12.A1 12.A2
001 0 2 1 1 1 1 2 0 2 0
002 2 0 1 1 2 0 1 1 0 2
003 0 2 0 2 0 2 0 2 2 0
004 2 0 1 1 NA NA 2 0 1 1
005 1 1 0 2 2 0 1 1 NA NA
Individual 3 is a good example if you compare across the data sets.
That's a problem for me since I am taking only one column per locus from the imputed data frame for downstream analysis (for example, I need all of the "110" allele counts).
Can you tweak genomic_converter so it doesn't rename the alleles? That would probably be easiest (?)
Hope that makes sense - I can send the data set if needed.
Thanks so much! (And no huge hurry!)
Brenna
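Until the original allele labels are preserved, one possible workaround is to determine, locus by locus, which imputed column (A1 or A2) carries the counts of the original allele. The base-R sketch below does this by comparing the two count matrices on genotypes observed in both. `match_allele_column()` is a hypothetical helper, not part of radiator. It assumes column names follow the LOCUS.ALLELE pattern seen in the `@tab` output above, and it reuses the first five individuals and loci shown in the post as example data.

```r
# Hypothetical helper (not part of radiator): for each locus, return the
# imputed column ("LOCUS.A1" or "LOCUS.A2") whose counts agree with the
# original allele's counts on every genotype observed in both matrices.
match_allele_column <- function(orig_tab, imp_tab, allele = "110") {
  loci <- unique(sub("\\..*$", "", colnames(orig_tab)))
  out <- setNames(rep(NA_character_, length(loci)), loci)
  for (loc in loci) {
    ref <- orig_tab[, paste0(loc, ".", allele)]
    for (cand in paste0(loc, c(".A1", ".A2"))) {
      imp <- imp_tab[, cand]
      keep <- !is.na(ref) & !is.na(imp)
      if (any(keep) && all(imp[keep] == ref[keep])) {
        out[loc] <- cand
        break
      }
    }
  }
  out
}

# Counts copied from the miss.genind@tab and foo$genind.imputed@tab excerpts above
loci <- c("0", "1", "10", "11", "12")
orig <- matrix(c(
  2, 0, 1, 1, 1,  1,  0, 2, 2,  0,
  0, 2, 1, 1, 0,  2,  1, 1, 0,  2,
  2, 0, 0, 2, 2,  0,  2, 0, 2,  0,
  0, 2, 1, 1, NA, NA, 0, 2, 1,  1,
  1, 1, 0, 2, 0,  2,  1, 1, NA, NA),
  nrow = 5, byrow = TRUE,
  dimnames = list(sprintf("%03d", 1:5), paste0(rep(loci, each = 2), c(".100", ".110"))))
imp <- matrix(c(
  0, 2, 1, 1, 1,  1,  2, 0, 2,  0,
  2, 0, 1, 1, 2,  0,  1, 1, 0,  2,
  0, 2, 0, 2, 0,  2,  0, 2, 2,  0,
  2, 0, 1, 1, NA, NA, 2, 0, 1,  1,
  1, 1, 0, 2, 2,  0,  1, 1, NA, NA),
  nrow = 5, byrow = TRUE,
  dimnames = list(sprintf("%03d", 1:5), paste0(rep(loci, each = 2), c(".A1", ".A2"))))

cols <- match_allele_column(orig, imp, allele = "110")
cols
imp[, cols]   # imputed counts of allele "110", one column per locus
```

If a locus is heterozygous in every individual observed in both matrices, A1 and A2 both agree with the reference. The helper then simply returns A1, so the orientation of such loci would need to be resolved another way.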
According to the genomic_converter() documentation, the default imputation method is "rf", while in fact it is NULL. This caused me some confusion while trying to work out where the imputed results were.
Hi Thierry,
I'm learning how to use radiator: what it does and how it works. I've been running the package on a dummy DArT data set of 19 individuals, two populations and 53 loci.
However, my "by hand" calculations do not match the missing proportion and heterozygosity values generated by the "filter_individuals" function. I had a look at the function and couldn't figure out how radiator calculates these values. For example, the "missing_prop" for one individual is 0.32, yet it only has 4 missing data points out of the 53 loci (4/53 = 0.075). Likewise, another sample has a heterozygosity value of 0, but that individual is heterozygous at two loci.
Do you mind shedding some light on how these values are derived or pointing me to where this is explained?
I have attached the dummy data set and the individuals QC stats (.tsv) output in case you need them.
Dummy_dataset_and_output_qc_indv.zip
The code I've used is:
library(radiator)
infile1 <- "./Dummy_Data_subset.csv"
infile3 <- "./strata.txt"
tmp <- read_dart(infile1, infile3)
tmp2 <- filter_individuals(tmp)
Thank you,
Diana
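One way to check by-hand numbers is to compute both statistics directly from a genotype matrix. The base-R sketch below uses a small made-up matrix coded 0/1/2 (copies of the alternate allele), with NA for missing calls. It applies the definitions used in the post: missing calls divided by the number of loci, and heterozygous calls divided by the number of called genotypes. Whether radiator uses the same denominators, or computes these on a different marker set (for example per SNP rather than per locus, or after other filters), is an assumption to verify against its documentation.

```r
# Made-up genotypes: rows = individuals, columns = loci; 0/1/2 coding, NA = missing.
geno <- matrix(
  c(0, 1,  NA, 2,
    1, 1,  0,  NA,
    2, NA, NA, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ind1", "ind2", "ind3"), paste0("locus", 1:4))
)

# Proportion of missing genotypes per individual: missing calls / total loci
missing_prop <- rowMeans(is.na(geno))

# Observed heterozygosity per individual: heterozygous calls (coded 1)
# divided by the number of non-missing genotypes
het_obs <- rowSums(geno == 1, na.rm = TRUE) / rowSums(!is.na(geno))

missing_prop
het_obs
```

With the numbers from the post, 4 missing calls out of 53 loci would give `4 / 53`, about 0.075, under this definition.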