Comments (7)

thierrygosselin commented on May 23, 2024

Sorry for the long delay, I got my new computer working today!

I'll go through your points and questions one by one:

1. Using the command radiator::write_pcadapt:

write_pcadapt("brook_char_tidy_maf.tsv", pop.select = c("1","2","3","4","5","6","7","8","9","10","11","14","16","17"), snp.ld = NULL,
maf.thresholds = NULL, filename = NULL,
parallel.core = parallel::detectCores() - 1)

You get lots of zeros.

When I run the command with your data I get this:

Generating pcadapt file...
pop.select: 
Using markers common in all populations:
    Number of markers before/blacklisted/after:4614/3067/1547
Scanning for monomorphic markers...
    Number of markers before/blacklisted/after: 1547/0/1547
writing pcadapt file with:
    Number of populations: 14
    Number of individuals: 324
    Number of markers: 1547

Is this what you're getting?

Note: the pcadapt file is coded 0, 1, 2, 9, where 9 stands for missing genotypes.
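
A quick way to see whether the zeros are real genotypes or something else is to tabulate the codes in the file. The following is only a base R sketch on an invented toy matrix (markers in rows, individuals in columns, as in the pcadapt file); the temporary file is a stand-in for the real output of radiator::write_pcadapt.

```r
# Toy genotype matrix in pcadapt coding: markers in rows, individuals in columns.
# Values are invented for illustration.
geno <- matrix(
  c(0, 1, 2, 9,
    0, 0, 9, 9,
    2, 1, 0, 0),
  nrow = 3, byrow = TRUE
)

# Write and re-read it as a whitespace-separated text file, as a stand-in
# for the file produced by radiator::write_pcadapt.
path <- tempfile(fileext = ".pcadapt")
write.table(geno, path, row.names = FALSE, col.names = FALSE)
geno_in <- as.matrix(read.table(path))

# Tabulate the codes: anything other than 0, 1, 2, 9 would be suspicious.
code_counts <- table(factor(geno_in, levels = c(0, 1, 2, 9)))
print(code_counts)

# Recode 9 as NA and compute the proportion of missing genotypes per marker.
geno_na <- geno_in
geno_na[geno_na == 9] <- NA
missing_per_marker <- rowMeans(is.na(geno_na))
print(missing_per_marker)
```

On the real file, replace `path` with the path to your own pcadapt file and skip the toy matrix.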

The only other thing I see here is that your original file, brook_char_tidy_maf.tsv, has lots of missing data. Many of the markers are not shared between your groups/sampling sites/pops. This is why radiator::write_pcadapt starts with 4614 markers and ends up with only 1547: the other 3067 are blacklisted. It prunes the dataset by default because you really don't want to run pcadapt without markers common to all your groups.
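
To make the idea of "common markers" concrete, here is a minimal base R sketch on invented toy data. It assumes that a marker counts as common when at least one individual is genotyped in every population; this is an illustration of the principle, not radiator's actual code, and the column names are only chosen to look like a tidy dataset.

```r
# Toy tidy data: one row per individual x marker. All values are invented.
tidy <- data.frame(
  MARKERS     = rep(c("m1", "m2", "m3"), each = 4),
  POP_ID      = rep(c("1", "1", "2", "2"), times = 3),
  INDIVIDUALS = rep(c("a", "b", "c", "d"), times = 3),
  GT          = c(0, 1, 2, 1,     # m1: genotyped in both populations
                  1, 0, NA, NA,   # m2: missing everywhere in population 2
                  NA, NA, NA, 2), # m3: missing everywhere in population 1
  stringsAsFactors = FALSE
)

# Count, per marker, the populations with at least one genotyped individual.
genotyped <- tidy[!is.na(tidy$GT), ]
pops_per_marker <- tapply(genotyped$POP_ID, genotyped$MARKERS,
                          function(p) length(unique(p)))

# Keep only markers seen in every population.
n_pop <- length(unique(tidy$POP_ID))
common <- names(pops_per_marker)[pops_per_marker == n_pop]
tidy_common <- tidy[tidy$MARKERS %in% common, ]

cat("Markers before/blacklisted/after:",
    length(unique(tidy$MARKERS)), "/",
    length(unique(tidy$MARKERS)) - length(common), "/",
    length(common), "\n")
```

With a lot of missing data spread unevenly across populations, this kind of filter can remove most of the markers, as it does here with two of the three toy markers.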

from radiator.

thierrygosselin commented on May 23, 2024

Further on this... when you run this:

data <- radiator::tidy_genomic_data(data = "brook_char_tidy_maf.tsv", pop.select = c("1","2","3","4","5","6","7","8","9","10","11","14","16","17"))

You should get this:

#######################################################################
##################### radiator::tidy_genomic_data #####################
#######################################################################
Importing the data frame ...
14 population(s) selected

Number of markers with REF/ALT change(s) = 3500
Erasing genotype: no
Using markers common in all populations:
    Number of markers before/blacklisted/after:4614/3067/1547
Scanning for monomorphic markers...
    Number of markers before/blacklisted/after: 1547/0/1547
############################### RESULTS ###############################
Tidy data written in global environment
Data format: tbl_df
Biallelic data

Tidy genomic data:
    Number of common markers: 1547
    Number of chromosome/contig/scaffold: 1
    Number of individuals: 324
    Number of populations: 14
Computation time: 9 sec
################ radiator::tidy_genomic_data completed ################

This generates a tidy dataset filtered for your 14 selected populations and automatically checks the REF/ALT alleles from the counts. You can see that, out of your original 4614 markers, 3500 had their REF/ALT alleles reassigned. Removing individuals or entire populations generally does this.
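
Why would dropping populations change REF/ALT? The sketch below shows the principle on one invented marker, under the assumption that REF is taken to be the most frequent allele in whatever data remain. The helper `major_allele` is hypothetical and is not a radiator function.

```r
# Toy allele calls for one marker; two alleles per individual. Values invented.
calls <- data.frame(
  POP_ID = c("1", "1", "2", "2", "2", "2"),
  A1     = c("A", "A", "G", "G", "G", "A"),
  A2     = c("A", "G", "G", "G", "G", "G"),
  stringsAsFactors = FALSE
)

# Assumption: REF is taken to be the most frequent allele in the data at hand.
# major_allele() is a hypothetical helper written for this illustration.
major_allele <- function(d) {
  counts <- table(c(d$A1, d$A2))
  names(counts)[which.max(counts)]
}

ref_all  <- major_allele(calls)                        # all populations
ref_pop1 <- major_allele(calls[calls$POP_ID == "1", ]) # population 1 only

cat("REF with all populations:", ref_all, "\n")
cat("REF with population 1 only:", ref_pop1, "\n")
```

In this toy case the most frequent allele differs between the full dataset and the subset, so the REF/ALT assignment flips once population 2 is removed.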

You still end up with 1547 markers.

thierrygosselin commented on May 23, 2024

2. Reading the pcadapt file generated by radiator in pcadapt

Running this command:

require(pcadapt)
test <- pcadapt::read.pcadapt("my_radiator_pca_file", type = "pcadapt")

I only get this:

1547 lines detected.
324 columns detected.

So the warning you're receiving is probably a Windows issue. I suggest you open the file in a text editor on your PC to check for anything strange.
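
If a text editor is not conclusive, the line endings can also be checked from R by looking at the raw bytes. This is a base R sketch; `has_crlf` is a hypothetical helper, and the two temporary files are stand-ins for the real pcadapt file.

```r
# Return TRUE if the file contains any carriage-return byte (0x0d),
# which is what Windows-style CRLF line endings add before each newline.
has_crlf <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  any(bytes == as.raw(0x0d))
}

# Demonstration on two temporary files (stand-ins for the real pcadapt file).
unix_file <- tempfile()
win_file  <- tempfile()
writeBin(charToRaw("0 1 2\n9 0 1\n"), unix_file)
writeBin(charToRaw("0 1 2\r\n9 0 1\r\n"), win_file)

print(has_crlf(unix_file))
print(has_crlf(win_file))
```

Calling `has_crlf()` on the path to your own file tells you whether carriage returns are present at all.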

thierrygosselin commented on May 23, 2024

3. Using radiator::write_vcf

check <- radiator::write_vcf("brook_char_tidy_maf.tsv", pop.info = TRUE, filename = NULL)

I get:

 Error in .f(.x[[i]], ...) : object 'POP_ID' not found 

Definitely a bug.

However, try running it with pop.info = FALSE (the default), like this:

check <- radiator::write_vcf("brook_char_tidy_maf.tsv", filename = NULL)

Here's the doc concerning this argument:

_pop.info | (optional, logical)_ 
Should the population information be included in the FORMAT field
(along the GT info for each samples ?). 
To make the VCF population-ready use pop.info = TRUE. 
The population information must be included in the POP_ID column of the tidy dataset.
Default: pop.info = FALSE.

It's not a fully functional argument yet, and no software besides radiator exploits the pop info in a VCF. So there's no harm in leaving it off, as it is by default...

thierrygosselin commented on May 23, 2024

4. Using the vcf file as pcadapt input file

require(pcadapt)
test <- pcadapt::read.pcadapt(path_to_file,type="vcf")

I get:

No variant got discarded.
Summary:

	- input file:				[email protected]
	- output file:				~filea31cda59f03.pcadapt

	- number of individuals detected:	324
	- number of loci detected:		4614

4614 lines detected.
324 columns detected.

As you said above, no markers got discarded here. That's normal, because I haven't applied a default common.markers = TRUE filter when the VCF is generated in radiator.

thierrygosselin commented on May 23, 2024

5. Problem while running your file in pcadapt

x <- pcadapt::pcadapt(data, K = 9)

You get this error:

Error in UseMethod("pcadapt") :
no applicable method for 'pcadapt' applied to an object of class "c('matrix', 'integer', 'numeric')"

I don't have an error:

names(x)
[1] "scores"          "singular.values" "loadings"        "zscores"         "af"              "maf"            
 [7] "chi2.stat"       "stat"            "gif"             "pvalues"         "pass"    

At this point it's a pcadapt problem; try re-installing it. If nothing works, ask Michael Blum for help.
I've sent you the results of the analysis with and without common markers...
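
For what it's worth, the wording of the error above is what R's S3 dispatch produces when a generic has no method for the class of the object it is given, here a plain matrix. The sketch below reproduces that kind of failure with a hypothetical generic, `scan_genome`, and a hypothetical class, `my_input`; it is not pcadapt's code and says nothing about which classes pcadapt actually accepts.

```r
# Hypothetical generic and method, for illustration only (not pcadapt's code).
scan_genome <- function(input, ...) UseMethod("scan_genome")
scan_genome.my_input <- function(input, ...) "dispatched to scan_genome.my_input"

# An object carrying the expected class dispatches fine.
ok <- structure(list(path = "some_file"), class = "my_input")
print(scan_genome(ok))

# A plain matrix has no matching method, so UseMethod() stops with
# a "no applicable method" error.
m <- matrix(0L, nrow = 2, ncol = 2)
msg <- tryCatch(scan_genome(m), error = function(e) conditionMessage(e))
print(msg)
```

Checking `class(data)` on the object passed to pcadapt::pcadapt would show whether it is a plain matrix rather than whatever read.pcadapt returned.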

Cheers
Thierry

Ella-Bowles commented on May 23, 2024

Thanks Thierry,

I was able to reproduce a good chunk of your results as well.

I did check the file generated at "test <- pcadapt::read.pcadapt("my_radiator_pca_file", type = "pcadapt")" for Windows line endings, and my machine tells me that it is plain ASCII, so I'm not sure what is happening here.

I can get the same output as you from x <- pcadapt::pcadapt(data, K = 9) followed by names(x) if I use the original object generated as "test" in read.pcadapt above, but not if I convert to a matrix (using data <- as.matrix(read.table("file32742f854a4d.pcadapt.bed"))) and then use the "data" object. I see many errors when I try to convert to the matrix. Given the error that I am getting with the read.pcadapt command, though, I suspect that there is something wrong with the output bed file. Anyway, I am awaiting Mike Blum's feedback on my errors at the moment, and I'll let you know if things don't work.

As an aside, the "common.markers = TRUE" argument isn't working for me when I try to generate the vcf file in radiator. Not a big deal if I can get the pcadapt file to work, but just wanted to give you a heads up.

Ella
