Comments (7)
Sorry for the long delays, got my new computer working today!
I'll go by points/questions/interrogations:
1. Using the command radiator::write_pcadapt:
write_pcadapt("brook_char_tidy_maf.tsv", pop.select = c("1","2","3","4","5","6","7","8","9","10","11","14","16","17"), snp.ld = NULL,
maf.thresholds = NULL, filename = NULL,
parallel.core = parallel::detectCores() - 1)
You get lots of zeros.
When I run the command with your data I get this:
Generating pcadapt file...
pop.select:
Using markers common in all populations:
Number of markers before/blacklisted/after:4614/3067/1547
Scanning for monomorphic markers...
Number of markers before/blacklisted/after: 1547/0/1547
writing pcadapt file with:
Number of populations: 14
Number of individuals: 324
Number of markers: 1547
Is this what you're getting?
note: pcadapt file is coded 0,1,2,9 and 9 is for missing genotypes.
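To see how much of a file in this coding is really missing data rather than zeros, something like the following base R sketch could help. The matrix here is made up, and I'm assuming rows are markers and columns are individuals, which matches the lines/columns that read.pcadapt reports further down.

```r
# Made-up genotype matrix in the 0/1/2/9 coding:
# rows are markers, columns are individuals (assumption).
geno <- matrix(c(0, 1, 2, 9,
                 9, 9, 1, 0,
                 2, 2, 2, 2),
               nrow = 3, byrow = TRUE)

# Turn the missing-genotype code (9) into NA
geno[geno == 9] <- NA

# Proportion of missing genotypes for each marker
missing_per_marker <- rowMeans(is.na(geno))
print(missing_per_marker)
```

On a real file you would first read it into a numeric matrix; how you do that depends on the separator used in the file.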
The only other thing I see here is that your original file, brook_char_tidy_maf.tsv, has lots of missing data. A lot of the markers are not in common between your groups/sampling sites/pops. This is why radiator::write_pcadapt starts with 4614 markers and ends up with only 1547. It prunes the dataset by default because you really don't want to run pcadapt without common markers between your groups.
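The idea behind the common-markers pruning can be sketched in base R as below. This is not radiator's actual code, just a toy illustration: the data frame is invented, and the MARKERS and GT column names are assumptions for the example (POP_ID is the column mentioned in the radiator documentation).

```r
# Toy long-format ("tidy") genotypes: one row per marker x individual.
# GT = NA stands for a missing genotype. All values are made up.
tidy <- data.frame(
  MARKERS = rep(c("m1", "m2", "m3"), each = 4),
  POP_ID  = rep(c("1", "1", "2", "2"), times = 3),
  GT      = c(0, 1, 2, 1,     # m1: genotyped in both populations
              1, NA, NA, NA,  # m2: genotyped in population 1 only
              NA, NA, 0, 2),  # m3: genotyped in population 2 only
  stringsAsFactors = FALSE
)

# Keep only the rows with an observed genotype
genotyped <- tidy[!is.na(tidy$GT), ]

# For each marker, count the populations with at least one genotype
pops_per_marker <- tapply(genotyped$POP_ID, genotyped$MARKERS,
                          function(p) length(unique(p)))

# A marker is "common" when it is genotyped in every population
n_pop <- length(unique(tidy$POP_ID))
common_markers <- names(pops_per_marker)[pops_per_marker == n_pop]
blacklisted <- setdiff(unique(tidy$MARKERS), common_markers)

cat("Number of markers before/blacklisted/after: ",
    length(unique(tidy$MARKERS)), "/", length(blacklisted), "/",
    length(common_markers), "\n", sep = "")
```

With lots of missing data spread unevenly across sampling sites, a filter of this kind can remove a large share of the markers, which is consistent with what you see going from 4614 to 1547.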
from radiator.
Further on this... when you run this:
data <- radiator::tidy_genomic_data(data = "brook_char_tidy_maf.tsv", pop.select = c("1","2","3","4","5","6","7","8","9","10","11","14","16","17"))
You should get this:
#######################################################################
##################### radiator::tidy_genomic_data #####################
#######################################################################
Importing the data frame ...
14 population(s) selected
Number of markers with REF/ALT change(s) = 3500
Erasing genotype: no
Using markers common in all populations:
Number of markers before/blacklisted/after:4614/3067/1547
Scanning for monomorphic markers...
Number of markers before/blacklisted/after: 1547/0/1547
############################### RESULTS ###############################
Tidy data written in global environment
Data format: tbl_df
Biallelic data
Tidy genomic data:
Number of common markers: 1547
Number of chromosome/contig/scaffold: 1
Number of individuals: 324
Number of populations: 14
Computation time: 9 sec
################ radiator::tidy_genomic_data completed ################
This generates a tidy dataset filtered for your 14 selected populations and automatically checks the REF/ALT alleles from the counts. You can see that, out of your original 4614 markers, 3500 had their REF/ALT alleles reassigned. Removing individuals or entire populations generally does this.
You still end up with 1547 markers.
2. Reading the pcadapt file generated by radiator in pcadapt
Running this command:
require(pcadapt)
test <- pcadapt::read.pcadapt("my_radiator_pca_file", type = "pcadapt")
I only get this:
1547 lines detected.
324 columns detected.
So the warning you're receiving is probably a Windows issue. I suggest you open the file in a text editor on your PC to see if there is something strange in it.
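If the suspicion is Windows-style line endings, one way to check from R without an editor is to look for carriage-return bytes in the file. The sketch below uses only base R; has_crlf is a hypothetical helper name, and the demo runs on two small temporary files rather than on the real pcadapt file.

```r
# Hypothetical helper: TRUE if the file contains any carriage-return
# byte (0x0d), which is what Windows (CRLF) line endings add.
has_crlf <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.info(path)$size)
  any(bytes == as.raw(0x0d))
}

# Self-contained demo on two temporary files
unix_file <- tempfile()
win_file  <- tempfile()
writeBin(charToRaw("0 1 2\n9 0 1\n"), unix_file)      # LF only
writeBin(charToRaw("0 1 2\r\n9 0 1\r\n"), win_file)   # CRLF

print(has_crlf(unix_file))
print(has_crlf(win_file))
```

To use it on your own data you would pass the path of the file written by radiator instead of the temporary files.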
3. Using radiator::write_vcf
check <- radiator::write_vcf("brook_char_tidy_maf.tsv", pop.info = TRUE, filename = NULL)
I get:
Error in .f(.x[[i]], ...) : object 'POP_ID' not found
Definitely a bug.
However, try running with pop.info = FALSE (the default), like this:
check <- radiator::write_vcf("brook_char_tidy_maf.tsv", filename = NULL)
Here's the doc concerning this argument:
_pop.info | (optional, logical)_
Should the population information be included in the FORMAT field
(along the GT info for each samples ?).
To make the VCF population-ready use pop.info = TRUE.
The population information must be included in the POP_ID column of the tidy dataset.
Default: pop.info = FALSE.
It's not a fully functional argument yet, and no other software besides radiator exploits the pop info in a VCF. So there's no harm in leaving it off, as it is by default...
4. Using the vcf file as pcadapt input file
require(pcadapt)
test <- pcadapt::read.pcadapt(path_to_file,type="vcf")
I get:
No variant got discarded.
Summary:
- input file: [email protected]
- output file: ~filea31cda59f03.pcadapt
- number of individuals detected: 324
- number of loci detected: 4614
4614 lines detected.
324 columns detected.
As you say above, no markers got discarded here. That's normal, because I haven't applied a default common.markers = TRUE filtering when the VCF is generated in radiator.
5. Problem while running your file in pcadapt
x <- pcadapt::pcadapt(data, K = 9)
You get this error:
Error in UseMethod("pcadapt") :
no applicable method for 'pcadapt' applied to an object of class "c('matrix', 'integer', 'numeric')"
I don't have an error:
names(x)
[1] "scores" "singular.values" "loadings" "zscores" "af" "maf"
[7] "chi2.stat" "stat" "gif" "pvalues" "pass"
At this point it's a pcadapt problem; try re-installing it. If nothing works, ask Michael Blum for help.
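For what it's worth, the "no applicable method" message is the generic one R gives when an S3 generic is called on an object whose class has no matching method. I'm not showing pcadapt's internals here; the generic, method and class names below are hypothetical and only reproduce the same kind of error in base R.

```r
# Hypothetical S3 generic with a single method, for illustration only
scan_genotypes <- function(input, ...) UseMethod("scan_genotypes")
scan_genotypes.genotype_file <- function(input, ...) "method found"

# An object carrying the class the generic knows about
wrapped <- structure("some_file.pcadapt", class = "genotype_file")
# A plain integer matrix, with no matching method
plain <- matrix(0L, nrow = 2, ncol = 2)

print(scan_genotypes(wrapped))

# Capture the dispatch error instead of stopping the script
msg <- tryCatch(scan_genotypes(plain),
                error = function(e) conditionMessage(e))
print(msg)
```

So checking class() on the object passed to the function, and comparing it with the class of the object that does work, is a quick first diagnostic.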
I've sent you the results of the analysis with and without common markers...
Cheers
Thierry
Thanks Thierry,
I was able to get a good chunk of the results that you were as well.
I did check the file used in "test <- pcadapt::read.pcadapt("my_radiator_pca_file", type = "pcadapt")" for Windows line endings, and my machine tells me that it is plain ASCII, so I'm not sure what is happening here.
I can get the same output as you from "x <- pcadapt::pcadapt(data, K = 9)" followed by "names(x)" if I use the original dataset generated as "test" in read.pcadapt above, but not if I convert it to a matrix (using data <- as.matrix(read.table("file32742f854a4d.pcadapt.bed"))) and then use the "data" object. I see many errors when I try to convert to the matrix. Given the error that I am getting with the read.pcadapt command, though, I suspect that there is something wrong with the output bed file. Anyway, I am awaiting Mike Blum's feedback on my errors at the moment, and I'll let you know if things don't work.
As an aside, the "common.markers = TRUE" argument isn't working for me when I try to generate the VCF file in radiator. Not a big deal if I can get the pcadapt file to work, but I just wanted to give you a heads up.
Ella