
Comments (19)

bumblenick avatar bumblenick commented on July 20, 2024 1

from admixtools.

bumblenick avatar bumblenick commented on July 20, 2024


stubbsrl avatar stubbsrl commented on July 20, 2024

I am having the same issue (using scaffolds, not chromosomes) and getting the bad chrom error. Has this been resolved? If not, @maernster, did you find a solution?

Thanks,
Rebecca


bumblenick avatar bumblenick commented on July 20, 2024


stubbsrl avatar stubbsrl commented on July 20, 2024

Great thanks!


trfeuerborn avatar trfeuerborn commented on July 20, 2024

Hi,

I also have a dataset with 4761 scaffolds and I am having trouble converting my plink dataset to eigenstrat format with your convertf utility.

When I do attempt to run convertf I receive this error:
bad chrom: NW_020356435.1

I tried renaming the scaffolds to remove non-alphanumeric characters. This only seems to make a difference when I change the scaffold name to chr, and even that seems to work only when there is a limited number of chromosomes.

I have also tried the method you described above, but I receive the same error:
bad chrom: chr4635

This is the format of the .map file:
chr4635 chr4635_914 0 914
chr4635 chr4635_1113 0 1113
...

This is the format of the block file:
chr4635_914 1
chr4635_1113 1

I made sure to install the latest version of the software.

Is there any way to generate an eigenstrat file without changing the chromosome IDs? If this is unavoidable, is there an upper limit on the number of chromosomes/scaffolds?

Any advice would be greatly appreciated.

Thanks

Tatiana


bumblenick avatar bumblenick commented on July 20, 2024


Marvin02860 avatar Marvin02860 commented on July 20, 2024

Hello,

I am encountering the same issue as reported here. My SNPs are located on 260 contigs, and I cannot replace the contig names with values from 1 to 260, as the maximum seems to be 100.
Of course, if I change all the #CHR names to the value "1", then it works, but I don't want to lose the precious information about each SNP's location.

@bumblenick I read your suggestion, but could you please explain a little more in detail what you mean by:
"Map your data onto .ind .snp .geno by arbitrarily mapping your snps to chromosomes 1-22"

I have 260 contigs, much more than 22... so is it actually possible to do something?

Thanks for any help :)
All the best,
Marvin


bumblenick avatar bumblenick commented on July 20, 2024


Marvin02860 avatar Marvin02860 commented on July 20, 2024

Thank you very much for your quick reply,
much appreciated :)

Do you think the blockname option can work with admixR?

I have tried to make Admixtools work (via admixR), as follows:

result <- d(W = popsCF, X = "CF_scot", Y = popsCG, Z = popsCh, data = snps, blockname = "my_contigs.txt")
but it does not work:

"Error in d(W = popsCF, X = "CF_scot", Y = popsCG, Z = popsCh, data = snps, :
unused argument (blockname = "my_contigs.txt")"

Thanks again!
All the best,
Marvin


bumblenick avatar bumblenick commented on July 20, 2024


Marvin02860 avatar Marvin02860 commented on July 20, 2024

Hello,

I just wanted to comment here on the solution to the problem mentioned, since I managed to have everything up and running finally:

My initial problem was to run ADMIXTOOLS (preferably via admixR) with non-human data: a few thousand SNPs located on 260 contigs. ADMIXTOOLS does not recognise non-standard (non-human) chromosome names (such as contigs / scaffolds of other sequences, or even integers > 100). In my case, my CHR names look like "ctgxxxxxxxxxxxxxxxxx".

Starting with a VCF file containing my SNPs, I used the script:
https://raw.githubusercontent.com/mathii/gdc/master/vcf2eigenstrat.py
to convert my VCF to eigenstrat format required by ADMIXTOOLS (resulting in the 3 files with format: .ind / .geno / .snp).

In the .snp file, my non-standard chromosome names are problematic for running ADMIXTOOLS.
I need to modify the first column (SNP ID) and the second column (CHR ID) of that file as follows:
First column (SNP ID): replace SNP ID by integers from 1 to 2391 (=my total number of SNPs).
Second column (CHR ID): replace CHR names (or contig/scaffold names) by integers from 1 to 22 (arbitrarily).

In order to keep track of the SNP positions in the analyses though, which is necessary for the jackknife process of defining blocks, I need to make another file defining the blocks (= which SNP belongs to which CHR). This info will be important to allow calculation of Z_score (statistical significance).
The file can be called: "my_contigs.txt" and looks like:
1st column = list of SNP IDs as integers from 1 to 2391
2nd column = contig/scaffold names corresponding to where the SNPs are actually located. These names cannot be the original complicated "ctgxxxxxxxxxxxxx"; instead they need to be integers, in this case 1 to 260.

I can now run ADMIXTOOLS with AdmixR using the option <params = list(blockname = "my_contigs.txt")>

For instance, in R:
D_stat_1 <- d(W = popsCFsymp, X = "CF_scot", Y = popsCGsymp, Z = popsCh, data = snps, params = list(blockname = "my_contigs.txt"))
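
The renaming steps described above can be sketched in Python. This is only an illustration, not the script I actually used: the function name `remap_snp_lines`, the example contig names, and the round-robin assignment of contigs to pseudo-chromosomes 1 to 22 are all assumptions (the assignment just has to be arbitrary). It assumes whitespace-separated .snp lines with the SNP ID in column 1 and the CHR name in column 2.

```python
def remap_snp_lines(snp_lines, n_pseudo_chrom=22):
    """Renumber SNP IDs and contig names from .snp-style lines.

    Returns (new_snp_lines, block_lines, contig_index):
      new_snp_lines: SNP ID replaced by 1..N, CHR replaced by 1..n_pseudo_chrom
      block_lines:   "<SNP number> <contig number>" rows for the block file
      contig_index:  original contig name -> contig number (1..K)
    """
    contig_index = {}
    new_snp_lines = []
    block_lines = []
    snp_number = 0
    for line in snp_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        snp_number += 1
        contig = fields[1]
        if contig not in contig_index:
            # Number contigs in order of first appearance.
            contig_index[contig] = len(contig_index) + 1
        block = contig_index[contig]
        # Hypothetical choice: cycle contigs over the pseudo-chromosomes.
        pseudo_chrom = (block - 1) % n_pseudo_chrom + 1
        new_fields = [str(snp_number), str(pseudo_chrom)] + fields[2:]
        new_snp_lines.append(" ".join(new_fields))
        block_lines.append(f"{snp_number} {block}")
    return new_snp_lines, block_lines, contig_index


# Made-up example rows (SNP ID, CHR, genetic pos, physical pos, alleles).
example = [
    "ctgA_10 ctgA 0.0 10 A G",
    "ctgA_55 ctgA 0.0 55 C T",
    "ctgB_7 ctgB 0.0 7 G A",
]
new_snp, blocks, mapping = remap_snp_lines(example)
```

The `blocks` rows are what would be written to "my_contigs.txt", and `mapping` is worth saving so the integer labels can be traced back to the original contig names.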

Thank you again Nick for your precious help.
All the best,
Marvin


EveTC avatar EveTC commented on July 20, 2024

I am trying to work through this problem myself while converting .ped to eigenstrat format to run smartpca.
Is there any intention to allow non-standard chromosome names (as plink does with --allow-extra-chr)?
The current behaviour limits and complicates the procedure for those who aren't "fortunate" enough to work on model organisms. I am new to bioinformatics and I am finding it difficult to apply these methods to my dataset.
Thank you for the help on this chain - hopefully I will be able to crack it!


bumblenick avatar bumblenick commented on July 20, 2024


EveTC avatar EveTC commented on July 20, 2024

Hi Nick,
Thank you for your response and help :)
Would you mind clarifying how I would remap my chromosomes to new names? Sorry if this is a simple question - very new to this all.
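
For context, the kind of remapping being asked about might look like the Python sketch below. It is only a guess at the approach, not something confirmed in this thread: the function name `remap_map_lines` and the example scaffold names are hypothetical, and it assumes a 4-column PLINK .map layout (chromosome, SNP ID, genetic distance, position). Whether the resulting integers are accepted still depends on the tool's limits (a ceiling of about 100 and a numchrom option are both mentioned in this thread).

```python
def remap_map_lines(map_lines):
    """Replace the chromosome column of .map-style lines with 1, 2, 3, ...

    Scaffolds are numbered in order of first appearance. Returns the
    rewritten lines and the name -> integer lookup table.
    """
    name_to_int = {}
    out = []
    for line in map_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        chrom = fields[0]
        # setdefault assigns the next free integer only for unseen names.
        code = name_to_int.setdefault(chrom, len(name_to_int) + 1)
        out.append("\t".join([str(code)] + fields[1:]))
    return out, name_to_int


# Made-up scaffold names for illustration.
example = [
    "scaffold_A snp1 0 914",
    "scaffold_A snp2 0 1113",
    "scaffold_B snp3 0 42",
]
new_lines, lookup = remap_map_lines(example)
```

Keeping `lookup` alongside the rewritten file makes it possible to translate results back to the original scaffold names.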


EveTC avatar EveTC commented on July 20, 2024

Will do - thank you Nick


mariels avatar mariels commented on July 20, 2024

Dear Nick and EveTC,

I would also be interested in running smartpca on whole-genome resequencing data with a reference genome composed of 56 scaffolds. I can see that smartpca only uses the SNPs on the first 22 scaffolds, but I would like to run it on all scaffolds, given that I have some very low coverage ancient samples. Would either of you have a utility to remap scaffold names to smaller integers?

Thank you,
Best wishes,

Marie


bumblenick avatar bumblenick commented on July 20, 2024


mariels avatar mariels commented on July 20, 2024

Thanks Nick, that was indeed very easy. I had not noticed the numchrom option.

