Comments (19)
from admixtools.
I am having the same issue (using scaffolds, not chromosomes) and getting the bad chrom error. Has this been resolved? If not, @maernster, did you find a solution?
Thanks,
Rebecca
Great thanks!
Hi,
I also have a dataset with 4761 scaffolds and I am having trouble converting my plink dataset to eigenstrat format with your convertf utility.
When I do attempt to run convertf I receive this error:
bad chrom: NW_020356435.1
I tried renaming the scaffolds to remove non-alphanumeric characters. This only seems to make a difference when I change the scaffold names to the form chrN, and even that seems to work only when the number of chromosomes is limited.
I have also tried the method you described above, but I receive the same error:
bad chrom: chr4635
This is the format of the .map file:
chr4635 chr4635_914 0 914
chr4635 chr4635_1113 0 1113
...
This is the format of the block file:
chr4635_914 1
chr4635_1113 1
I made sure to install the latest version of the software.
Is there any way to generate an eigenstrat file without changing the chromosome IDs? Or, if that is unavoidable, is there an upper limit on the number of chromosomes/scaffolds?
Any advice would be greatly appreciated.
Thanks
Tatiana
Hello,
I am encountering the same issue as reported here. My SNPs are located on 260 contigs, and I cannot replace the contig names with values from 1 to 260, as the maximum seems to be 100.
Of course, if I change all the #CHR names to the value "1", then it works, but I don't want to lose the precious information about each SNP's location.
@bumblenick I read your suggestion, but could you please explain a little more in detail what you mean by:
"Map your data onto .ind .snp .geno by arbitrarily mapping your snps to chromosomes 1-22"
I have 260 contigs, many more than 22... so is it actually possible to do something?
Thanks for any help :)
All the best,
Marvin
Thank you very much for your quick reply,
much appreciated :)
Do you think the blockname option can work with admixR?
I have tried to make Admixtools work (via admixR), as follows:
result <- d(W = popsCF, X = "CF_scot", Y = popsCG, Z = popsCh, data = snps, blockname = "my_contigs.txt")
but it does not work:
"Error in d(W = popsCF, X = "CF_scot", Y = popsCG, Z = popsCh, data = snps, :
unused argument (blockname = "my_contigs.txt")"
Thanks again!
All the best,
Marvin
Hello,
I just wanted to comment here on the solution to the problem mentioned, since I managed to have everything up and running finally:
My initial problem was to run ADMIXTOOLS (preferably via admixR) with non-human data: a few thousand SNPs located on 260 contigs. ADMIXTOOLS does not recognise non-standard (non-human) chromosome names (such as contig / scaffold names of other sequences, or even integers > 100). In my case, my CHR names look like "ctgxxxxxxxxxxxxxxxxx".
Starting with a VCF file containing my SNPs, I used the script:
https://raw.githubusercontent.com/mathii/gdc/master/vcf2eigenstrat.py
to convert my VCF to eigenstrat format required by ADMIXTOOLS (resulting in the 3 files with format: .ind / .geno / .snp).
In the .snp file, my non-standard chromosome names are problematic for running ADMIXTOOLS.
I need to modify the first column (SNP ID) and the second column (CHR ID) of that file as follows:
First column (SNP ID): replace SNP ID by integers from 1 to 2391 (=my total number of SNPs).
Second column (CHR ID): replace CHR names (or contig/scaffold names) by integers from 1 to 22 (arbitrarily).
In order to keep track of the SNP positions in the analyses though, which is necessary for the jackknife process of defining blocks, I need to make another file defining the blocks (= which SNP belongs to which CHR). This info will be important to allow calculation of Z_score (statistical significance).
The file can be called: "my_contigs.txt" and looks like:
1st column = list of SNP ID as integers from 1 to 2391
2nd column = contig /scaffold names corresponding to where the SNPs are actually located, but these names cannot be like the original complicated "ctgxxxxxxxxxxxxx". Instead they need to be integers. In this case, it will be 1 to 260.
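The renumbering described above can be sketched in Python. This is only an illustration under stated assumptions, not a tool from ADMIXTOOLS: the function name `renumber_snp_lines` is hypothetical, the .snp lines are assumed to be whitespace-separated with the SNP ID in the first column and the contig name in the second, and contigs are spread over pseudo-chromosomes 1 to 22 by a simple modulus because the assignment is arbitrary. Whether ADMIXTOOLS needs positions sorted within each pseudo-chromosome is not settled here.

```python
def renumber_snp_lines(snp_lines, n_pseudo_chroms=22):
    """Renumber EIGENSTRAT .snp lines and build a block file.

    Returns (new_snp_lines, block_lines):
      - new_snp_lines: SNP IDs replaced by 1..N, contig names replaced
        by an arbitrary pseudo-chromosome in 1..n_pseudo_chroms
      - block_lines: "<snp integer> <contig integer>" pairs, where the
        contig integer is assigned in order of first appearance
    """
    contig_ids = {}  # contig name -> integer block ID (hypothetical helper state)
    new_snp_lines, block_lines = [], []
    snp_number = 0
    for line in snp_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        snp_number += 1
        contig_id = contig_ids.setdefault(fields[1], len(contig_ids) + 1)
        # Arbitrary assignment to a pseudo-chromosome, as in the steps above.
        pseudo_chrom = (contig_id - 1) % n_pseudo_chroms + 1
        new_snp_lines.append(
            " ".join([str(snp_number), str(pseudo_chrom)] + fields[2:])
        )
        block_lines.append(f"{snp_number} {contig_id}")
    return new_snp_lines, block_lines


if __name__ == "__main__":
    example = [
        "snpA ctgAAA 0.0 914 A G",
        "snpB ctgAAA 0.0 1113 C T",
        "snpC ctgBBB 0.0 50 A C",
    ]
    snp_out, blocks_out = renumber_snp_lines(example)
    print("\n".join(snp_out))
    print("\n".join(blocks_out))
```

The first list would be written back as the new .snp file and the second as "my_contigs.txt". The .geno file is untouched, so the order of the SNP lines must not change.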
I can now run ADMIXTOOLS with admixR using the option params = list(blockname = "my_contigs.txt").
For instance, in R:
D_stat_1 <- d(W = popsCFsymp, X = "CF_scot", Y = popsCGsymp, Z = popsCh, data = snps, params = list(blockname = "my_contigs.txt"))
Thank you again Nick for your precious help.
All the best,
Marvin
Trying to work through this problem myself whilst trying to convert .ped to eigenstrat format to run smartpca.
Is there any intention to allow for non-standard chromosome names (like plink allows using --allow-extra-chr)?
It does limit and complicate the procedure for those who aren't "fortunate" enough to work on model organisms. I am new to bioinformatics and I am finding it difficult to apply these methods to my dataset.
Thank you for the help on this chain - hopefully I will be able to crack it!
Hi Nick,
Thank you for your response and help :)
Would you mind clarifying how I would remap my chromosomes to new names? Sorry if this is a simple question - very new to this all.
Will do - thank you Nick
Dear Nick and EveTC,
I would also be interested in running smartpca on whole-genome resequencing data with a reference genome composed of 56 scaffolds. I can see smartpca only uses the SNPs on the first 22 scaffolds, but I would like to run it on all scaffolds, given that I have some very low coverage ancient samples. Would either of you have a utility to remap scaffold names to smaller integers?
Thank you,
Best wishes,
Marie
Thanks Nick, that was indeed very easy, I did not notice the numchrom option.
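For anyone arriving with the same question, here is a minimal sketch of remapping scaffold names to consecutive integers. It is not part of EIGENSOFT or ADMIXTOOLS: the function name `remap_scaffolds` and the `chrom_col` parameter are hypothetical. Lines are assumed to be whitespace-separated, with the scaffold name in the second column for an EIGENSTRAT .snp file (`chrom_col=1`) or in the first column for a PLINK .map file (`chrom_col=0`).

```python
def remap_scaffolds(lines, chrom_col=1):
    """Replace scaffold names with integers 1..K in order of first appearance.

    Returns (new_lines, name_to_int) so the original names can be
    recovered later from the lookup table.
    """
    name_to_int = {}
    new_lines = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        name = fields[chrom_col]
        fields[chrom_col] = str(name_to_int.setdefault(name, len(name_to_int) + 1))
        new_lines.append(" ".join(fields))
    return new_lines, name_to_int


if __name__ == "__main__":
    example = [
        "NW_020356435.1 NW_020356435.1_914 0 914",
        "NW_020356435.1 NW_020356435.1_1113 0 1113",
        "NW_020356436.1 NW_020356436.1_77 0 77",
    ]
    remapped, table = remap_scaffolds(example, chrom_col=0)
    print("\n".join(remapped))
    print("distinct scaffolds:", len(table))
```

The number of distinct scaffolds, `len(table)`, is then the natural value to try for the numchrom option mentioned above (56 for the genome in this comment). This is an assumption about how the option is meant to be used, so check the README of your version.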