Hi Thierry,
I'm trying to do a quick PCAdapt analysis with a tidy file generated in radiator, but it seems that something is going wrong with file conversion to both pcadapt and vcf formats, or that our population labels are not working correctly. Currently, pops are labelled using numbers. I can send you files if it makes it easier to address, but I will write as much as possible here. I am running R v 3.5.1, and did a fresh install of radiator.
When I try to convert the tidy dataframe to pcadapt format using
write_pcadapt("brook_char_tidy_maf.tsv", pop.select = c("1","2","3","4","5","6","7","8","9","10","11","14","16","17"), snp.ld = NULL,
maf.thresholds = NULL, filename = NULL,
parallel.core = parallel::detectCores() - 1)
Everything seems to go fine, except that I get a lot of zeros showing up in the data in the console. This seems a little odd.
Here is a small sample of the console output:
$genotype.matrix
Blackfly-d1084 Blackfly-d1191 Blackfly-d1797
[1,] "0" "0" "0"
[2,] "9" "0" "0"
[3,] "9" "1" "2"
Blackfly-d2016 Blackfly-d2125 Blackfly-d2204
[1,] "0" "0" "0"
[2,] "0" "1" "1"
[3,] "2" "2" "2"
Blackfly-d2305 Blackfly-d2365 Blackfly-d2388
[1,] "0" "0" "0"
[2,] "0" "1" "0"
[3,] "1" "0" "2"
Blackfly-d2732 Blackfly-d2771-bf Blackfly-d2975
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "1" "2" "1"
Blackfly-d3277 Blackfly-d3469 Blackfly-d3507
[1,] "0" "0" "0"
[2,] "0" "1" "2"
[3,] "2" "2" "1"
Blackfly-d3580 Blackfly-d3638 Blackfly-d503
[1,] "0" "0" "0"
[2,] "0" "2" "1"
[3,] "2" "1" "2"
Blackfly-d940 BobsCove-C1028 BobsCove-C1090
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1140 BobsCove-C1145 BobsCove-C1148
[1,] "0" "0" "0"
[2,] "9" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1189 BobsCove-C1215 BobsCove-C1401
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1449 BobsCove-C1476 BobsCove-C1493
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1526 BobsCove-C1527 BobsCove-C1580
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C1607 BobsCove-C1654 BobsCove-C2255
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C2279 BobsCove-C2304 BobsCove-C2316
[1,] "0" "9" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C2390 BobsCove-C2435 BobsCove-C2451
[1,] "0" "0" "0"
[2,] "0" "0" "0"
[3,] "0" "0" "0"
BobsCove-C2452 BobsCove-C2462 BobsCove-C2530
Looking through the tsv tidy file (which is my collaborator's), I see some things that may be a bit odd, but he has produced many analyses successfully with the file, so perhaps it is fine. However, I notice that the REF and ALT alleles are identical across the whole file (REF differs from ALT, but every locus has the same REF/ALT pair). Anyway, in case this file is fine, I run
require(pcadapt)
path_to_file <- "D:/Documents/PostDoc/collaboration-MYates-CapeRace-PondStocking/[email protected]"
MY <- read.pcadapt(path_to_file,type="pcadapt")
And I get the following warning
1547 lines detected.
324 columns detected.
Warning message:
Only one 'return' characters detected, yet Windows
When I try a different route, using
write_vcf("brook_char_tidy_maf.tsv", pop.info = TRUE, filename = NULL)
I get an error
Error in .f(.x[[i]], ...) : object 'POP_ID' not found
However, I know that this field is in the datafile, so this is weird.
To test to see if the file works without the populations field, I try
write_vcf("brook_char_tidy_maf.tsv", pop.info = FALSE, filename = NULL)
The vcf that I get out of this looks strange in comparison to other vcf files that I have worked with. There are a lot of zeros. Looking at the tidy file though, this may be due to some issue when the original vcf was converted to a tidy file. I can send you both the vcf and the tidy files if you'd like. Anyway, I didn't get any complaints from R, so I ran
require(pcadapt)
path_to_file <- "D:/Documents/PostDoc/collaboration-MYates-CapeRace-PondStocking/[email protected]"
MY <- read.pcadapt(path_to_file,type="vcf")
This looks more like what I should see
No variant got discarded.
Summary:
- input file: D:/Documents/PostDoc/collaboration-MYates-CapeRace-PondStocking/[email protected]
- output file: C:\Users\Admin\AppData\Local\Temp\RtmpCeIy6G\file183c1e3197.pcadapt
- number of individuals detected: 324
- number of loci detected: 4614
4614 lines detected.
324 columns detected.
I fetched my output file, and then ran
x <- pcadapt(MY,K=20)
plot(x,option="screeplot")
data <- as.matrix(read.table("file183c1e3197.pcadapt"))
#check the data
data[1:5,1:6]
V1 V2 V3 V4 V5 V6
[1,] 9 9 9 9 9 9
[2,] 9 0 1 2 0 2
[3,] 0 0 0 0 0 0
[4,] 9 9 9 9 9 9
[5,] 9 0 9 0 0 0
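To put a number on the zeros and nines, a quick tabulation along these lines might help. This is only a sketch: `geno` is a toy matrix copied from the 5 x 6 preview above (with the real data it would be the `data` object), the variable names are my own, and I am assuming the usual pcadapt coding of 0/1/2 for genotypes and 9 for missing calls, with loci in rows and individuals in columns.

```r
# Toy matrix copied from the preview above (assumption: rows = loci, columns = individuals).
geno <- matrix(
  c(9, 9, 9, 9, 9, 9,
    9, 0, 1, 2, 0, 2,
    0, 0, 0, 0, 0, 0,
    9, 9, 9, 9, 9, 9,
    9, 0, 9, 0, 0, 0),
  nrow = 5, byrow = TRUE
)

# Overall count of each genotype code (assumed: 0/1/2 = genotype, 9 = missing).
code_counts <- table(geno)

# Proportion of missing calls for each locus (row).
missing_per_locus <- rowMeans(geno == 9)

# Loci showing no variation among the non-missing calls
# (a locus with every call missing is also flagged).
is_monomorphic <- apply(geno, 1, function(g) {
  called <- g[g != 9]
  length(unique(called)) <= 1
})

print(code_counts)
print(missing_per_locus)
print(which(is_monomorphic))
```

If most loci come back as entirely missing or monomorphic, that would point at the tidy file or the conversion step rather than at pcadapt itself.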
#Without removal of outliers
I skip the population ID step because we didn't identify populations, so I go straight to
x<-pcadapt(data,K=9)
And then it errors out
Error in UseMethod("pcadapt") :
no applicable method for 'pcadapt' applied to an object of class "c('matrix', 'integer', 'numeric')"
Any thoughts about what could be causing these problems? I feel like I should send you my .tsv tidy file, and maybe the vcf and pcadapt files that I generated from it in radiator.
With thanks,
Ella