Hi Thierry, I'm trying to do a quick PCAdapt analysis with a tidy fi


write.pcadapt and write.vcf not giving useable results, could this be the POP ID column? (closed)

thierrygosselin commented on May 23, 2024


Comments (7)

thierrygosselin commented on May 23, 2024

Sorry for the long delays, got my new computer working today!

I'll go through your points/questions one by one:

1. Using the command radiator::write_pcadapt:

radiator::write_pcadapt(
  "brook_char_tidy_maf.tsv",
  pop.select = c("1","2","3","4","5","6","7","8","9","10","11","14","16","17"),
  snp.ld = NULL,
  maf.thresholds = NULL,
  filename = NULL,
  parallel.core = parallel::detectCores() - 1
)

You get lots of zeros.

When I run the command with your data I get this:

Generating pcadapt file...
pop.select: 
Using markers common in all populations:
    Number of markers before/blacklisted/after:4614/3067/1547
Scanning for monomorphic markers...
    Number of markers before/blacklisted/after: 1547/0/1547
writing pcadapt file with:
    Number of populations: 14
    Number of individuals: 324
    Number of markers: 1547

Is this what you're getting?

Note: the pcadapt file is coded 0, 1, 2 and 9, where 9 marks missing genotypes.
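To make the 0/1/2/9 coding concrete, here is a small base-R sketch (not radiator's code) that turns a genotype matrix into the pcadapt text layout: one line per marker, one column per individual, 9 for missing. The genotype matrix, the `to_pcadapt()` helper and the temporary file are hypothetical, assuming genotypes are stored as alternate-allele counts with NA for missing.

```r
# Hypothetical genotypes: rows = individuals, columns = markers,
# values = number of alternate alleles (0, 1, 2), NA = missing.
geno <- matrix(c(0, 1, 2, NA,
                 1, 1, NA, 0,
                 2, 0, 1, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("ind", 1:3), paste0("snp", 1:4)))

# Recode for the pcadapt text layout: missing -> 9, then transpose so
# each line is a marker and each column is an individual.
to_pcadapt <- function(g) {
  g[is.na(g)] <- 9
  storage.mode(g) <- "integer"
  t(g)
}

pca_mat <- to_pcadapt(geno)

# Write space-separated, without row or column names.
out <- tempfile(fileext = ".pcadapt")
write.table(pca_mat, out, sep = " ", row.names = FALSE, col.names = FALSE)
readLines(out)
```

Markers end up as lines and individuals as columns, which matches the "1547 lines detected / 324 columns detected" message pcadapt prints for the radiator file.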

The only other thing I see here is that your original file, brook_char_tidy_maf.tsv, has lots of missing data. Many of the markers are not shared between your groups/sampling sites/pops. This is why radiator::write_pcadapt starts with 4614 markers and ends up with only 1547: by default it prunes the dataset to markers common to all populations, because you really don't want to run pcadapt without markers shared between your groups.
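A rough base-R sketch of the "markers common in all populations" idea. It assumes a hypothetical long-format table with MARKERS, POP_ID, INDIVIDUALS and GT columns, and assumes a marker counts as present in a pop when at least one individual there is genotyped. radiator's own filter may use a different rule; `common_markers()` is a hypothetical helper.

```r
# Hypothetical tidy data: one row per individual x marker, GT = NA if missing.
tidy <- data.frame(
  MARKERS     = rep(c("m1", "m2", "m3"), times = 4),
  POP_ID      = rep(c("1", "1", "2", "2"), each = 3),
  INDIVIDUALS = rep(c("i1", "i2", "i3", "i4"), each = 3),
  GT          = c(0, 1, NA,  1, 2, NA,  2, NA, 1,  NA, NA, 0)
)

# Keep markers genotyped (non-missing) in at least one individual of every pop.
common_markers <- function(d) {
  typed <- d[!is.na(d$GT), c("MARKERS", "POP_ID")]
  pops_per_marker <- tapply(typed$POP_ID, typed$MARKERS,
                            function(p) length(unique(p)))
  n_pops <- length(unique(d$POP_ID))
  names(pops_per_marker)[pops_per_marker == n_pops]
}

kept <- common_markers(tidy)
filtered <- tidy[tidy$MARKERS %in% kept, ]

# Summary in the same before/blacklisted/after spirit as the radiator log.
n_before <- length(unique(tidy$MARKERS))
cat(sprintf("Number of markers before/blacklisted/after: %d/%d/%d\n",
            n_before, n_before - length(kept), length(kept)))
```

Here m2 is only genotyped in pop 1 and m3 only in pop 2, so only m1 survives, which is the same kind of pruning that takes the real dataset from 4614 to 1547 markers.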


thierrygosselin commented on May 23, 2024

Further on this... when you run this:

data <- radiator::tidy_genomic_data(data = "brook_char_tidy_maf.tsv", pop.select = c("1","2","3","4","5","6","7","8","9","10","11","14","16","17"))

You should get this:

#######################################################################
##################### radiator::tidy_genomic_data #####################
#######################################################################
Importing the data frame ...
14 population(s) selected

Number of markers with REF/ALT change(s) = 3500
Erasing genotype: no
Using markers common in all populations:
    Number of markers before/blacklisted/after:4614/3067/1547
Scanning for monomorphic markers...
    Number of markers before/blacklisted/after: 1547/0/1547
############################### RESULTS ###############################
Tidy data written in global environment
Data format: tbl_df
Biallelic data

Tidy genomic data:
    Number of common markers: 1547
    Number of chromosome/contig/scaffold: 1
    Number of individuals: 324
    Number of populations: 14
Computation time: 9 sec
################ radiator::tidy_genomic_data completed ################

This generates a tidy dataset filtered for your 14 selected pops and automatically rechecks the REF/ALT alleles from the allele counts: out of your original 4614 markers, 3500 get their REF/ALT correctly reassigned. This is expected in general when you remove individuals or entire populations, because the allele counts change.
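The REF/ALT check can be pictured like this: recount alleles in the retained samples and flip the coding wherever the current ALT allele has become the more common one. This is a minimal sketch under that assumption (major allele = REF), not radiator's implementation; `recode_ref_alt()` and the dosage matrix are hypothetical.

```r
# Hypothetical dosages: rows = individuals, columns = markers,
# values = copies of the current ALT allele (0, 1, 2), NA = missing.
recode_ref_alt <- function(dosage) {
  alt_count <- colSums(dosage, na.rm = TRUE)
  ref_count <- colSums(2 - dosage, na.rm = TRUE)
  swap <- alt_count > ref_count  # ALT became the major allele
  # Flip the coding of swapped markers so REF is the major allele.
  dosage[, swap] <- 2 - dosage[, swap, drop = FALSE]
  list(dosage = dosage, swapped = swap)
}

full <- matrix(c(0, 0, 0, 2, 2,
                 0, 1, 0, 0, 1),
               ncol = 2,
               dimnames = list(paste0("ind", 1:5), c("snp1", "snp2")))

res_all <- recode_ref_alt(full)         # all five individuals
res_sub <- recode_ref_alt(full[4:5, ])  # only individuals 4 and 5 kept
res_all$swapped
res_sub$swapped
```

With all five individuals nothing is flipped, but once only individuals 4 and 5 are kept, snp1's ALT allele is the majority and its coding flips. Dropping samples or pops can therefore reassign REF/ALT on many markers.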

You still end up with 1547 markers.


thierrygosselin commented on May 23, 2024

2. Reading the pcadapt file generated by radiator into pcadapt

Running this command:

require(pcadapt)
test <- pcadapt::read.pcadapt("my_radiator_pca_file", type = "pcadapt")

I only get this:

1547 lines detected.
324 columns detected.

So the warning you're receiving is probably a Windows issue (e.g., line endings). I suggest you open the file in a PC text editor to check for anything strange.
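If you'd rather check from R than in a text editor, a small base-R helper can count carriage-return bytes (\r, byte 13, the extra byte in Windows line endings) and non-ASCII bytes in a file. `check_text_file()` and the two demo files are hypothetical; point the helper at your own pcadapt file instead.

```r
# Count carriage returns (\r, byte 13) and non-ASCII bytes (> 127) in a file.
check_text_file <- function(path) {
  bytes <- as.integer(readBin(path, what = "raw", n = file.info(path)$size))
  c(carriage_returns = sum(bytes == 13L), non_ascii = sum(bytes > 127L))
}

# Two small demo files: Unix (\n) and Windows (\r\n) line endings.
unix_file <- tempfile()
win_file  <- tempfile()
writeBin(charToRaw("0 1 2\n2 9 1\n"), unix_file)
writeBin(charToRaw("0 1 2\r\n2 9 1\r\n"), win_file)

check_text_file(unix_file)
check_text_file(win_file)
```

A nonzero carriage_returns count points to Windows line endings; a nonzero non_ascii count points to stray characters a text editor may not show clearly.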


thierrygosselin commented on May 23, 2024

3. Using radiator::write_vcf

check <- radiator::write_vcf("brook_char_tidy_maf.tsv", pop.info = TRUE, filename = NULL)

I get:

 Error in .f(.x[[i]], ...) : object 'POP_ID' not found

Definitely a bug.

However, try running with pop.info = FALSE like this:

check <- radiator::write_vcf("brook_char_tidy_maf.tsv", filename = NULL)

Here's the doc concerning this argument:

pop.info (optional, logical):
Should the population information be included in the FORMAT field
(alongside the GT info for each sample)?
To make the VCF population-ready, use pop.info = TRUE.
The population information must be included in the POP_ID column of the tidy dataset.
Default: pop.info = FALSE.

It's not a fully functional argument yet: no software other than radiator uses the pop info in the VCF, so there's no harm in leaving it off, as it is by default.


thierrygosselin commented on May 23, 2024

4. Using the vcf file as pcadapt input file

require(pcadapt)
test <- pcadapt::read.pcadapt(path_to_file, type = "vcf")

I get:

No variant got discarded.
Summary:

	- input file:				[email protected]
	- output file:				~filea31cda59f03.pcadapt

	- number of individuals detected:	324
	- number of loci detected:		4614

4614 lines detected.
324 columns detected.

As you said, no markers got discarded here. That's normal, because the common.markers = TRUE filtering is not applied by default when the VCF is generated in radiator.


thierrygosselin commented on May 23, 2024

5. Problem while running your file in pcadapt

x <- pcadapt::pcadapt(data, K = 9)

You get this error:

Error in UseMethod("pcadapt") :
no applicable method for 'pcadapt' applied to an object of class "c('matrix', 'integer', 'numeric')"

I don't get an error:

names(x)
[1] "scores"          "singular.values" "loadings"        "zscores"         "af"              "maf"            
 [7] "chi2.stat"       "stat"            "gif"             "pvalues"         "pass"

At this point it's a pcadapt problem; try reinstalling it. If nothing works, ask Michael Blum for help.
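The error message itself comes from R's S3 dispatch: `pcadapt()` is a generic and, assuming current pcadapt versions, its methods are written for the object classes returned by `read.pcadapt()`, not for a plain matrix. The toy generic below (`toy_pcadapt()` and class `toy_matrix` are made up) reproduces the same kind of error without needing pcadapt installed.

```r
# A toy S3 generic with a method only for a wrapper class. This mimics
# (as an assumption) how pcadapt() dispatches on the class that
# read.pcadapt() attaches; all names here are made up.
toy_pcadapt <- function(input, ...) UseMethod("toy_pcadapt")

toy_pcadapt.toy_matrix <- function(input, K = 2, ...) {
  list(K = K, n_loci = nrow(input))
}

geno <- matrix(c(0L, 1L, 2L, 9L), nrow = 2)

# A plain integer matrix has no toy_pcadapt method, so dispatch fails
# with a "no applicable method" error like the one above.
msg <- tryCatch(toy_pcadapt(geno, K = 9),
                error = function(e) conditionMessage(e))
msg

# Tagging the object with the expected class lets dispatch succeed.
wrapped <- structure(geno, class = "toy_matrix")
toy_pcadapt(wrapped, K = 9)$n_loci
```

With the real package, the documented workflow is to pass the data through `pcadapt::read.pcadapt()` first and give its result to `pcadapt()`; check `?pcadapt::read.pcadapt` in your installed version for the input types it accepts.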
I've sent you the results of the analysis with and without common markers.

Cheers
Thierry


Ella-Bowles commented on May 23, 2024

Thanks Thierry,

I was able to get a good chunk of the results that you were as well.

I did check the file generated by "test <- pcadapt::read.pcadapt("my_radiator_pca_file", type = "pcadapt")" for Windows line endings, and my machine tells me it is plain ASCII, so I'm not sure what is happening here.

I get the same output as you from "x <- pcadapt::pcadapt(data, K = 9)" followed by "names(x)" if I use the object "test" created by read.pcadapt above, but not if I first convert to a matrix (using data <- as.matrix(read.table("file32742f854a4d.pcadapt.bed"))) and then use the "data" object. I see many errors when I try to convert to the matrix. Given the error I'm getting with the read.pcadapt command, though, I suspect something is wrong with the output .bed file. Anyway, I'm waiting for Mike Blum's feedback on my errors at the moment, and I'll let you know if things still don't work.
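On the read.table() errors: assuming file32742f854a4d.pcadapt.bed is a PLINK binary .bed file (as the extension suggests), it is not a text file, so read.table() cannot parse it. PLINK .bed files start with the magic bytes 0x6C 0x1B (followed by 0x01 for SNP-major order), which a short base-R check can detect. `is_plink_bed()` and the demo files are hypothetical.

```r
# PLINK .bed files are binary and start with the magic bytes 0x6C 0x1B
# (108, 27), followed by 0x01 for SNP-major order.
is_plink_bed <- function(path) {
  magic <- as.integer(readBin(path, what = "raw", n = 3L))
  length(magic) == 3L && identical(magic[1:2], c(108L, 27L))
}

# Demo: a fake .bed header versus a small pcadapt-style text file.
fake_bed <- tempfile(fileext = ".bed")
writeBin(as.raw(c(0x6C, 0x1B, 0x01, 0xFF)), fake_bed)

txt_file <- tempfile(fileext = ".pcadapt")
writeLines(c("0 1 2", "2 9 1"), txt_file)

is_plink_bed(fake_bed)
is_plink_bed(txt_file)
```

If the check returns TRUE for your file, read.table() is the wrong reader for it, and the object returned by read.pcadapt() is what should go into pcadapt().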

As an aside, the "common.markers = TRUE" argument isn't working for me when I try to generate the VCF file in radiator. It's not a big deal if I can get the pcadapt file to work, but I just wanted to give you a heads-up.

Ella

